8.1 Introduction to surveys
8.2 Methodological approaches
8.3 Doing survey research
8.4 Statistical Analysis
8.4.1 Descriptive statistics
8.4.2 Exploring relationships
8.4.3 Analysing samples
8.4.3.1 Generalising from samples
8.4.3.2 Dealing with sampling error
8.4.3.3 Confidence limits
8.4.3.4 Statistical significance
8.4.3.5 Hypothesis testing
8.4.3.6 Significance tests
8.4.3.6.1 Parametric tests of significance
8.4.3.6.2 Non-parametric tests of significance
8.4.3.6.2.1 Chi-square test
8.4.3.6.2.2 Mann-Whitney U test
8.4.3.6.2.3 Kolmogorov-Smirnov Test
8.4.3.6.2.4 H test
8.4.3.6.2.5 Sign test
8.4.3.6.2.6 Wilcoxon test
8.4.3.6.2.7 Friedman test
8.4.3.6.2.8 Q test
8.4.3.7 Summary of significance testing and association: an example
8.4.4 Report writing
8.5 Summary and conclusion
When to use the Mann-Whitney U test
Hypotheses
What we need to know
Sampling distribution
Assumptions
Testing Statistic
Critical values
Computational procedures for calculating U
Worked examples
Limitation of the Mann-Whitney U test
Unworked examples
A note on the Wald-Wolfowitz runs test
When to use the Mann-Whitney U test
The Mann-Whitney U test is a test of two samples to see whether they come from the same population.
The samples must be independent but can be of any size. The U test is only applicable in the two-sample case.
The test can only be used if the two samples can be ranked in numerical size and both samples are in the same units; thus the data must be at least ordinal.
If the samples contain a large number of ties, the U test may become inapplicable. A tie is a value that occurs in both samples (see the section on ties below).
The U test is most applicable with small samples and is used instead of the parametric t-test when the t-test's assumption that both populations are normal with equal variances does not hold, or when the data are not on an interval scale.
Hypotheses
H0: Both samples come from identical distributions.
HA: The samples are not from identical distributions, i.e. the samples are significantly different (two-tail test).
HA: The majority of one sample lies above the majority of the other sample (one-tail test).
What we need to know
To use the U test it must be possible to rank the component variables of each sample in numerical size and thus rank the combined sample. Therefore, both samples must be in the same units and the numbers should represent a score rather than representing frequencies.
For example, the numbers of examination passes achieved by 6 students are: 3, 5, 6, 2, 8, 4. These numbers can be ranked meaningfully in ascending order.
However, if the frequency distribution for the number of passes is as in the table below, then to be able to rank the data meaningfully, the data would need to be extracted from the frequency table.
| Number of passes | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
| Frequency | 2 | 0 | 4 | 3 | 6 | 7 | 7 |
Thus
0, 0, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5... etc.
Therefore, if the categories into which the frequencies fall are qualitative rather than quantitative, this ranking will not always be possible, as there may be no way of ordering the qualitative categories. The following data cannot be ranked in any meaningful way:
| Colour of eyes | Blue | Grey | Green | Brown |
| Frequency | 6 | 4 | 4 | 3 |
It would be meaningless to categorise blue as numerically higher or lower than green.
Sampling distribution
The sampling distribution is the U distribution. The Mann-Whitney test compares two sample distributions to see if they are likely to be sub-groups of a single population distribution.
The U test compares two sample distributions to see if 'the bulk of one sample lies above the bulk of the other sample'. By simply ranking the data in the two samples in order, and then counting to see how much of one sample lies above the other, the U test shows whether the difference between the samples could have occurred by chance.
The value of U is computed by counting the total number of values from one sample that fall below each value from the other sample. Consider the following explanatory example. The data below are the scores of two cricket batsmen, A and B, in successive innings.
A: 2, 3, 4
B: 6, 12, 14, 24
Clearly, batsman B has a better performance than A: all four of his scores are higher than all three of batsman A's scores. The U test, when applied to these data, should show a significant difference.
The mechanics of calculating U are as follows:
i) Combine both the samples, retaining the sample identity of each value, thus:
| Combined sample | 2 | 3 | 4 | 6 | 12 | 14 | 24 |
| Identity | A | A | A | B | B | B | B |
ii) Count the total number of values in one sample that fall before each value in the other sample. Let's count the A scores before each B score. Before the first B score of 6 there are three A scores (namely 2, 3 and 4).
Before the second B score (of 12) there are three A scores (namely 2, 3 and 4; note they are counted again).
Before the third B score there are also 3; similarly for the fourth B score.
Hence the total number of A scores before each B is 3+3+3+3 = 12. Thus U = 12.
U could also have been computed by counting the number of B scores before each A. There are no B scores before the lowest A score of 2, none before the second lowest A score, and none before the highest of the 3 A scores. Hence there is another value of U, calculated by counting the number of B scores before each A, thus:
U = 0+0+0 = 0
When calculating U there are always two U scores, depending upon which way U is computed. The more distinct the two samples are, the larger the gap will be between the two U scores. When the two samples do not 'overlap' at all, as in the example above, the two U scores will be at their maximum difference.
The lowest value U can take is zero, and the highest is (n1)(n2) (where n1 and n2 are the sample sizes, in this case 3 and 4 respectively), so (n1)(n2) = 3 x 4 = 12.
Note that U1 + U2 = (n1).(n2)
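For readers who want to automate the counting, here is a minimal Python sketch of the counting method applied to the cricket example (the function name u_by_counting is an illustrative choice, not part of any standard library):

```python
# Minimal sketch of the counting method for U (illustrative helper name).
def u_by_counting(first, second):
    """For each value in `second`, count how many values in `first` fall below it."""
    return sum(sum(1 for x in first if x < y) for y in second)

a = [2, 3, 4]           # batsman A's scores
b = [6, 12, 14, 24]     # batsman B's scores

u1 = u_by_counting(a, b)    # A scores before each B: 3+3+3+3 = 12
u2 = u_by_counting(b, a)    # B scores before each A: 0+0+0   = 0
print(u1, u2, u1 + u2 == len(a) * len(b))   # 12 0 True
```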
When the two samples 'overlap perfectly', as in the data below, then the two U scores will be equal and the samples will not be significantly different.
A: 3 6 14
B: 2 4 12 24
To calculate U combine the samples, retaining the identity thus:
| Combined sample | 2 | 3 | 4 | 6 | 12 | 14 | 24 |
| Identity | B | A | B | A | B | A | B |
Counting the number of A's before each B
U = 0+1+2+3 = 6
Similarly, counting the number of B's before each A
U = 1+2+3 = 6
Note that U1 + U2 = (n1).(n2)
i.e. 6+6=3x4=12
So if you know one value of U the other one can be computed from the formula U1 + U2 = (n1).(n2).
In situations where the overlap is neither perfect nor completely absent, the two values of U will lie between zero and (n1)(n2).
For example, if the distribution in the two samples above had been somewhat different and generated values of U1=8 and U2=4, would the difference be significant?
To see if the two samples are significantly different, it is necessary to calculate the probability of getting the U scores of 4 and 8 if the samples were from the same population. If the probability is less than the significance level (e.g. 5% (p=0.05)) then the two samples are significantly different.
The probabilities for all sample sizes between 3 and 20 have been calculated and are embodied in tables of critical values for the Mann-Whitney U test (see below). For samples over 20 in size, the distribution of U values is approximately normal.
Assumptions
There are no assumptions as to the shape of the parent populations when applying the U test.
Testing statistic
(i) For small samples the testing statistic is the smaller of the two U values.
(ii) For large samples the U distribution approximates a normal distribution and the testing statistic is z:
z = (U - Mean of U)/SD of U
where Mean of U = (n1)(n2)/2
and SD of U = √((n1n2)(n1+n2+1)/12)
Either of the two U values can be used. The only difference will be that the sign of z will change. This will make no difference for a two-tail test and common sense should be used to see if, in the case of a one-tail test, the difference in the original data is in the direction specified.
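As a rough illustration of the large-sample formula, here is a minimal sketch (the function name is illustrative and no correction for ties is applied):

```python
import math

# Sketch of the large-sample z approximation for U (no tie correction).
def mann_whitney_z(u, n1, n2):
    mean_u = n1 * n2 / 2                                # mean of U = (n1)(n2)/2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)      # SD of U
    return (u - mean_u) / sd_u
```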
Critical values
For samples up to size twenty, the critical values can be found from tables of U critical values. Only the lower critical U value is stated in most tables (the upper value may be found by subtracting the lower critical value from (n1)(n2) if required).
Thus the null hypothesis is rejected if the smaller calculated U value is less than the lower critical value given in the tables. The tables usually state critical values for significance levels of 0.05, 0.025 and 0.01.
Tables can be found at:
University of Saskatchewan (accessed 24 May 2020)
UMass Boston (accessed 24 May 2020)
Open Door website (for samples up to n=30 but only for 5% significance) (accessed 24 May 2020)
Computational procedures for calculating U
A) Note that as U1 + U2 = (n1)(n2), it is not necessary to count U both ways. As long as one U value has been derived, the other can be found as the values of n1 and n2 will be known.
B) Another method of deriving U, rather than counting (which can become tedious when samples are large), is to give each value in the combined sample a rank score. The method is as follows.
Give each element of the combined sample a rank number. The lowest value has a rank of 1, the next lowest a rank of 2 and so on (irrespective of the differences between the numbers) and finally the highest score will have a rank equal to the sum of the sample sizes.
When the rank numbers have been assigned, add them up for sample one and call the total R1
OR
add them for sample two and call the total R2.
U is then given by the following formulae:
U1 = (n1)(n2) + (n1(n1+1)/2) - R1
OR
U2 = (n1)(n2) + (n2(n2+1)/2) - R2
The two formulae will give the two different values for U. However, only one needs to be calculated and the other can be derived from the property that U1 + U2 = (n1)(n2).
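The ranking method translates directly into a short routine. The sketch below (illustrative names, not library functions) assigns average ranks to ties and applies the formula for U1, deriving U2 from U1 + U2 = (n1)(n2):

```python
def average_ranks(combined):
    """Map each value in the combined sample to its rank (1..N), averaging ranks over ties."""
    ordered = sorted(combined)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    return ranks

def u_by_ranking(sample1, sample2):
    """Return (U1, U2) using the rank-sum formula."""
    ranks = average_ranks(list(sample1) + list(sample2))
    n1, n2 = len(sample1), len(sample2)
    r1 = sum(ranks[x] for x in sample1)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u1, n1 * n2 - u1

# The 'perfect overlap' example above: A = 3, 6, 14 and B = 2, 4, 12, 24
print(u_by_ranking([3, 6, 14], [2, 4, 12, 24]))   # -> (6.0, 6.0)
```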
As an example of calculating U by the ranking technique, reconsider the example above. The combined sample is:
| Combined sample | 2 | 3 | 4 | 6 | 12 | 14 | 24 |
| Identity | B | A | B | A | B | A | B |
Add the rank numbers:
| Combined sample | 2 | 3 | 4 | 6 | 12 | 14 | 24 |
| Identity | B | A | B | A | B | A | B |
| Rank | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Thus:
R1 = Total rank of sample A = 2 + 4 + 6 = 12
OR
R2 = Total rank of sample B = 1 + 3 + 5 + 7= 16
Thus:
U1 = (n1)(n2) + (n1(n1+1)/2) - R1 = (3)(4) + (3)(4)/2 - 12 = 12 + 6 - 12 = 6
OR
U2 = (n1)(n2) + (n2(n2+1)/2) - R2 = (3)(4) + (4)(5)/2 - 16 = 12 + 10 -16 = 6
C) When the same score occurs in both samples, each tied score is given the average of the ranks that would have been assigned had the scores differed, i.e. the sum of those ranks divided by the number of scores tied for the same position. For example, the two samples in the table below both contain values of X equal to 2 and 10.
| A | 1 | 2 | 3 | 4 | 8 | 8 | 10 | 10 |
| B | 2 | 5 | 7 | 10 | 10 | 10 | 11 |  |
Ranking the values for A and B:
| A |  | B |  |
| X | Rank | X | Rank |
| 1 | 1 | 2 | 2.5 |
| 2 | 2.5 | 5 | 6 |
| 3 | 4 | 7 | 7 |
| 4 | 5 | 10 | 12 |
| 8 | 8 | 10 | 12 |
| 8 | 9 | 10 | 12 |
| 10 | 12 | 11 | 15 |
| 10 | 12 |  |  |
The 1 has a rank of 1, being the lowest. The two 2s would take ranks 2 and 3, but to be fair to both samples each 2 is given the average rank, namely 2.5. The 3 has a rank of 4, and so on. There is no need to average the ranks of the two 8s as they are both in sample A (the rank sum is the same either way). When we get to the 10s, the first 9 ranks have been used, so the five 10s would take ranks 10, 11, 12, 13 and 14; the average is 12, so each 10 is given a rank of 12. Finally, the 11 has a rank of 15.
Worked examples
1. Small sample U test, using the counting method
The Students' Union elections at a college were all held in the same week. The candidates could be broadly separated into two categories, the left-wing and the non-left. The number of votes cast for each of the two groups for each elected post were as follows.
|  | Left | Non-Left |
| President | 263 | 166 |
| Secretary | 86 | 130 |
| Treasurer | 230 | 185 |
| Social Secretary | 183 | 178 |
| Internal Affairs | 144 | 164 |
| External Affairs | 131 | 120 |
| Catering Liaison Committee | 93 | 194 |
| Welfare Secretary | 149 | 74 |
| Sports Secretary | 89 | 69 |
Test the data to see if the college student politics can be said to be left wing.
A U test will be carried out rather than a Kolmogorov-Smirnov (K-S) test or parametric test of means (t-test).
The K-S test would be applicable as both small samples are the same size; however, it would be rather awkward to decide on the class intervals to use in building up the cumulative frequencies, and as the total frequency is only nine, the class groupings could greatly influence the result. (An exercise would be to carry out a K-S test on the data, grouped in class intervals of 25 votes with the lowest class ranging from 51 to 75 inclusive.) A parametric test of means is possible, but the ranges of votes are quite different and the standard deviations are likely to differ significantly. The assumption that the populations are normally distributed is also tenuous.
Hypotheses:
H0: The two samples come from the same population, i.e. there is no significant difference in the distribution of votes cast for left and non-left candidates.
HA: There is a significant difference in the distribution of votes cast for the two types of candidates.
This is a two-tail test of significance.
Significance level: 5%
Testing statistic: U
Critical value: From tables of critical U values for two samples of 9, critical U at 5% (two-tail) is 18.
Decision rule: reject H0 if U<18
Computation: List the 18 items from the two samples (from the table above) in ascending order of size, retaining their identity (L = left, N = non-left).
69 (N), 74 (N), 86 (L), 89 (L), 93 (L), 120 (N), 130 (N), 131 (L), 144 (L), 149 (L), 164 (N), 166 (N), 178 (N), 183 (L), 185 (N), 194 (N), 230 (L), 263 (L)
Number of non-left values before each left value = 2+2+2+4+4+4+7+9+9 = 43. Thus U1 = 43.
Is this the smallest U? Use the relationship U1 + U2 = (n1)(n2) to find the other value:
U1 + U2 = (n1)(n2)
Thus 43 + U2 = (9)(9)
U2 = 38
Decision: Cannot reject the null hypothesis (H0) as the smallest U (38) is greater than the critical value of 18. Student politics at the college is not significantly left wing.
NOTE: This was also calculated using the Social Science Statistics U test calculator (accessed 27 May 2019; no longer available 25 June 2019), which gave a U value of 38 using the z-approximation approach outlined below (with a critical U of 17, not 18, and a z score of -0.1766, p = 0.85617).
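As a cross-check, the counting sketch introduced earlier (a hypothetical helper, not part of the original working) reproduces the two U values for this example:

```python
left     = [263, 86, 230, 183, 144, 131, 93, 149, 89]
non_left = [166, 130, 185, 178, 164, 120, 194, 74, 69]

# Reusing the hypothetical u_by_counting sketch from above.
print(u_by_counting(non_left, left), u_by_counting(left, non_left))   # 43 38
```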
2. Small sample U test, using the ranking method
It was contended that female divorcees were socially acquainted with their husbands for a shorter period of time prior to marriage than currently married women. In a survey conducted in January 2018 the following data was collected for two independent random samples.
Number of months acquainted with husband prior to marriage:
Divorcees: 13, 36, 12, 45, 30, 10, 12, 8, 18, 3, 10, 1, 28, 25, 12, 19, 7, 24, 18
Married women: 4, 18, 24, 21, 16, 9, 14, 29, 21, 15, 25, 32, 36, 24, 18
Is there a significant difference in the length of time divorcees and currently married women knew their husbands prior to marriage?
Because of the ties involved in the data, a K-S test might have been more appropriate, but it is not possible due to the unequal sizes of the small samples.
A parametric test of means would involve calculating two means and two standard deviations and would require the assumption that the populations are normal with equal variances. This latter assumption may well be false, as the amount of time people have been acquainted prior to marriage tails off over time, to the extent that two people could have been acquainted since childhood. Therefore a Mann-Whitney U test will be used.
Hypotheses:
H0: There is no significant difference in the time divorcees and married women knew their husbands prior to marriage.
HA: The difference is significant.
This is a two-tail test of significance.
Significance level: 5%
Testing statistic: U
Critical value: From tables of critical U values, critical U at 5% (two-tail) is U=86.
Decision rule: reject H0 if U<86
Computation: Call divorcees sample 1
| Divorcees |  | Married |  |
| Months | Rank | Months | Rank |
| 1 | 1 | 4 | 3 |
| 3 | 2 | 9 | 6 |
| 7 | 4 | 14 | 13 |
| 8 | 5 | 15 | 14 |
| 10 | 7.5 | 16 | 15 |
| 10 | 7.5 | 18 | 17.5 |
| 12 | 10 | 18 | 17.5 |
| 12 | 10 | 21 | 21.5 |
| 12 | 10 | 21 | 21.5 |
| 13 | 12 | 24 | 24 |
| 18 | 17.5 | 24 | 24 |
| 18 | 17.5 | 25 | 26.5 |
| 19 | 20 | 29 | 29 |
| 24 | 24 | 32 | 31 |
| 25 | 26.5 | 36 | 32.5 |
| 28 | 28 | R2 | 296 |
| 30 | 30 |  |  |
| 36 | 32.5 |  |  |
| 45 | 34 |  |  |
U2 = n1n2 + n2(n2+1)/2 - R2 = 19x15 + 15x16/2 - 296 = 285 + 120 - 296 = 109
U1 + U2 = n1n2
U1 + 109 = 285
U1 = 176
So the smallest U = 109.
As a check, compute U1 directly by finding R1 and applying the formula U1 = n1n2 + n1(n1+1)/2 - R1.
R1 = 299
Therefore U1 = 285 + (19x20)/2 – 299 = 285 + 190 - 299 = 176.
Therefore the check confirms result above.
Decision: Cannot reject the null hypothesis (H0) as the smallest U (109) is greater than the critical value of 86. Hence there is no reason to uphold the contention that divorcees knew their husbands for a significantly different length of time prior to marriage than currently married women.
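The ranking sketch from the computational procedures section (again, a hypothetical helper) gives the same pair of U values for this example:

```python
divorcees = [13, 36, 12, 45, 30, 10, 12, 8, 18, 3, 10, 1, 28, 25, 12, 19, 7, 24, 18]
married   = [4, 18, 24, 21, 16, 9, 14, 29, 21, 15, 25, 32, 36, 24, 18]

# Reusing the hypothetical u_by_ranking sketch from above.
print(u_by_ranking(divorcees, married))   # -> (176.0, 109.0)
```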
3. Large samples
A questionnaire containing three key questions, among many others, was given to two separate groups of first-year students. The questionnaire was about drugs in general and the three key questions were concerned with the individual's potential inclination to experiment with certain drugs. Each of the three key questions was accompanied by a choice of answers:
Definitely not
Unlikely
Undecided
Likely
Yes, if available
Each answer related to the drug in question and the possibility of the student experimenting with it. In the analysis of the results, the three key questions were isolated and each student's total score on them was calculated from the answers given: an answer of "Definitely not" scored 5 points, "Unlikely" 4 points, and so on, so that "Yes, if available" scored 1 point. Hence the maximum score any student could get for the three questions was 15 and the minimum was 3; the higher the score, the less likely the student is to experiment.
The two separate groups of students differed in that Group 2 had seen an "anti-drug propaganda" film before answering the questionnaire, whilst Group 1 had not. Group 1 consisted of 30 students and Group 2 of 28 students. The results are shown below.
Group 1: 7 6 8 9 8 7 6 8 9 15 8 7 10 12 7 9 10 12 7 9 10 8 7 12 14 8 11 12 13 7 10 6 5 9
Group 2: 6 14 13 14 13 12 8 12 12 11 10 15 15 9 12 14 11 10 9 11 11 9 11 14 9 13 7 4
Test the effect of the film as an influence on the potential experimentation of students with drugs.
A chi-square test would have a lot of expected frequencies below 5 so is not really applicable. A parametric test of means is not viable with ordinal data. (The coding of the responses does not make the data into interval scale data!)
Hypotheses:
H0: The film has no significant effect; there is no significant difference in scores between the two groups.
HA: The film has an effect; it reduces potential drug experimentation.
This is a one-tail test of significance.
Significance level: 1% (as a considerable amount of time, effort and money goes into making and watching films, the tester wants to reduce the probability of wrongly rejecting H0)
Testing statistic: z = (U - Mean of U)/SD of U
where Mean of U = (n1)(n2)/2
and SD of U = √((n1n2)(n1+n2+1)/12)
Critical value: z = -2.33 (one-tail z test at 99%) from normal distribution tables
Decision rule: reject H0 if z<-2.33
Computation:
| Non-film group | U2 | Film group | U1 |
| Scores | Rank | Scores | Rank |
| 5 | 2 | 4 | 1 |
| 6 | 4.5 | 6 | 4.5 |
| 6 | 4.5 | 7 | 10 |
| 6 | 4.5 | 8 | 17 |
| 7 | 10 | 9 | 24.5 |
| 7 | 10 | 9 | 24.5 |
| 7 | 10 | 9 | 24.5 |
| 7 | 10 | 9 | 24.5 |
| 7 | 10 | 10 | 31 |
| 7 | 10 | 10 | 31 |
| 8 | 17 | 11 | 36.5 |
| 8 | 17 | 11 | 36.5 |
| 8 | 17 | 11 | 36.5 |
| 8 | 17 | 11 | 36.5 |
| 8 | 17 | 11 | 36.5 |
| 8 | 17 | 12 | 43 |
| 9 | 24.5 | 12 | 43 |
| 9 | 24.5 | 12 | 43 |
| 9 | 24.5 | 12 | 43 |
| 9 | 24.5 | 13 | 48.5 |
| 10 | 31 | 13 | 48.5 |
| 10 | 31 | 13 | 48.5 |
| 10 | 31 | 14 | 53 |
| 11 | 36.5 | 14 | 53 |
| 12 | 43 | 14 | 53 |
| 12 | 43 | 14 | 53 |
| 12 | 43 | 15 | 57 |
| 13 | 48.5 | 15 | 57 |
| 14 | 53 |  |  |
| 15 | 57 |  |  |
| Total | 692.5 |  | 1018.5 |
U1 = n1n2 + n1(n1+1)/2 -R1 = 28x30 + 28x29/2 - 1018.5 = 840+406-1018.5 = 227.5
z = (U- Mean of U)/SD of U
where Mean of U = (n1)(n2)/2 = 30x28/2 = 840/2 = 420
and SD of U = √((n1n2)(n1+n2+1)/12) = √((840)(59)/12) = √4130 = 64.26
z = (U - Mean of U)/SD of U = (227.5 - 420)/64.26 = -192.5/64.26 = -3.00
Decision: Reject the null hypothesis (H0) at the 99% level of confidence. The film reduces potential experimentation (provided there are no other variables that may have induced the different responses).
Note: All the test in fact shows is that the difference between the samples is unlikely to be due to sampling error. Whether the difference is caused by the film is not proven; other variables that may have an effect on responses need to be controlled before any causality can validly be attributed to the film.
Above it was noted that the sign of the calculated z value does not always correspond to the direction of the test. However, in this case a cursory glance shows that it is the right direction.
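A quick numerical check of this computation, reusing the hypothetical mann_whitney_z sketch from the testing statistic section:

```python
# U = 227.5 from the ranking table above; n1 = 28 (film group), n2 = 30 (non-film group).
print(mann_whitney_z(227.5, 28, 30))   # ≈ -3.0 (mean of U = 420, SD of U ≈ 64.26)
```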
4 Ties
The effect of tied scores on the U Test could be to alter the U score and result in a rejection of the null hypothesis (H0) if the scores are ordered in one way and non-rejection if the scores are ordered in a different way, as the following example illustrates.
Sample X: 3, 4, 5, 6, 7, 8
Sample Y: 3, 4, 4, 4, 4
The scores 3 and 4 are tied across the samples.
If the combined sample has all the Y scores for the ties first (Y priority), the result is:
3,3,4,4,4,4,4,5,6,7,8
Y,X,Y,Y,Y,Y,X,X,X,X,X
If the combined sample has all the X scores for the ties first (X priority), the result is:
3,3,4,4,4,4,4,5,6,7,8
X,Y,X,Y,Y,Y,Y,X,X,X,X
Counting the number of X's that precede each Y in each case results in:
Y priority: U = 0+1+1+1+1 = 4
X priority: U = 1+2+2+2+2=9
Consequently this results in two very different values for U (both legitimate in theory). The critical value for samples of size 5 and 6 at 95% is 6. So H0 would have been rejected in one case and not the other. This is clearly unsatisfactory.
(Note that these are the smaller values of U. Had the count been of Y's preceding each X, the values would have been bigger in both cases.)
A compromise would be to use the ranking method of determining U.
Thus:
| Sample X |  | Sample Y |  |
| Score | Rank | Score | Rank |
| 3 | 1.5 | 3 | 1.5 |
| 4 | 5 | 4 | 5 |
| 5 | 8 | 4 | 5 |
| 6 | 9 | 4 | 5 |
| 7 | 10 | 4 | 5 |
| 8 | 11 |  |  |
| R1 | 44.5 | R2 | 21.5 |
U1 = 30 + 21 - 44.5 = 6.5
U2 = 30 + 15 - 21.5 = 23.5
Therefore, smallest U = 6.5
Consequently H0 would not be rejected as the smallest calculated U is not less than the critical value, which was 6.
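As a quick check, the hypothetical u_by_ranking sketch from the computational procedures section reproduces these values:

```python
# X = 3, 4, 5, 6, 7, 8 and Y = 3, 4, 4, 4, 4, with average ranks for the ties.
print(u_by_ranking([3, 4, 5, 6, 7, 8], [3, 4, 4, 4, 4]))   # -> (6.5, 23.5)
```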
The compromise brought about by the ranking method (which gives tied scores an 'average' rank) is not totally accurate, and there is a correction procedure that can be applied in cases where ties are prevalent (explained below).
A final note: if the number of tied scores is very large, if, for example, every score occurs in both samples, the U test should be avoided. (It does not matter that the same score occurs more than once in the same sample.)
Procedure for dealing with a large number of ties
The correction for ties applies to samples over size 20, when the normal-curve analysis is applied as shown above (3. Large samples).
The correction factor enters the calculation of the standard deviation of U.
Instead of SD of U = √((n1n2)(n1+n2+1)/12),
the SD of U corrected for ties becomes: √( (n1n2/((n1+n2)(n1+n2-1))) x ( ((n1+n2)³ - (n1+n2))/12 - ∑T ) )
where T = (t³ - t)/12 and t is the number of scores tied at a given rank,
and ∑T is the sum of the T values over all tied ranks.
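A sketch of the corrected standard deviation, assuming the tie counts t have already been tallied (the function name is illustrative):

```python
import math

def sd_u_tie_corrected(n1, n2, tie_counts):
    """SD of U corrected for ties; tie_counts lists t for each group of tied scores."""
    n = n1 + n2
    sum_t = sum((t**3 - t) / 12 for t in tie_counts)          # ∑T
    return math.sqrt((n1 * n2 / (n * (n - 1))) * ((n**3 - n) / 12 - sum_t))

# Tie counts from the large-sample example below: 4, 7, 7, 8, 5, 6, 7, 4, 5, 3
print(sd_u_tie_corrected(30, 28, [4, 7, 7, 8, 5, 6, 7, 4, 5, 3]))   # ≈ 63.9
```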
Example:
In the large sample example shown above (3 Large Samples), there were a large number of ties so applying the correction:
The following tied groups occur in the example:
4 scores of 6
7 scores of 7
7 scores of 8
8 scores of 9
5 scores of 10
6 scores of 11
7 scores of 12
4 scores of 13
5 scores of 14
3 scores of 15
Thus, there are t values of 4, 7, 7, 8, 5, 6, 7, 4, 5, 3.
To find ∑T, compute (t³ - t)/12 for each tied rank and sum:
Thus, ∑T = (4³-4)/12 + (7³-7)/12 + (7³-7)/12 + (8³-8)/12 + (5³-5)/12 + (6³-6)/12 + (7³-7)/12 + (4³-4)/12 + (5³-5)/12 + (3³-3)/12
∑T = 5 + 28 + 28 + 42 + 10 + 17.5 + 28 + 5 + 10 + 2 = 175.5
n1=30, n2=28,
so SD of U corrected is: √( (n1n2/((n1+n2)(n1+n2-1))) x ( ((n1+n2)³ - (n1+n2))/12 - ∑T ) )
= √( (840/((58)(57))) x ((58³ - 58)/12 - 175.5) )
= √( (0.2541)(16254.5 - 175.5) ) = 63.92
So
z = (U - (n1n2)/2)/SD of U corrected = (U - (30)(28)/2)/63.92
z = (U - 420)/63.92
In the large sample example shown above (3. Large samples), U was calculated as 227.5,
so
z = (227.5 - 420)/63.92 = -192.5/63.92 = -3.01
Having corrected for ties, the resulting value is slightly larger in magnitude (without the correction z was -3.00) but the correction does not alter the decision to reject the null hypothesis (-3.01 < -2.33).
This correction for ties is only necessary when there are a substantial number of ties; when only two or three ties occur it can be ignored.
Limitation of the Mann-Whitney U test
The U test will not show a significant difference when the values of one sample are grouped within a very narrow range in the middle of the other sample's range, even though the distributions are clearly different. Thus:
Sample 1: 0,1,2,3,4,6,7,8,9,10
Sample 2: 5,5,5,5,5,5,5,5,5,5
A U test would result in the smallest U score being 50 (the number of sample 2 values below each sample 1 value is 10 for each of the items 6 to 10, i.e. 10 x 5 = 50). The critical value of U for two samples of size 10 at 95% is 24, and the U score of 50 is not less than 24, so the test says the two samples are not significantly different. (A parametric test of sample means would produce the same outcome. A Kolmogorov-Smirnov test is preferable in such circumstances.)
Unworked examples
1. It was hypothesised that, at University X, Arts students were more politically left wing than Science students. Eleven Arts students and ten Science students were given a political test in an attempt to locate their position on a political spectrum. A high score on the test indicates left wing and a low score right wing. The maximum score was 50 and the minimum score was zero. The results are shown below. Is the hypothesis correct?
Arts: 42, 40, 28, 17, 24, 44, 28, 38, 48, 29, 7
Science: 36, 21, 6, 11, 18, 22, 18, 31, 10, 47
2. The numbers of days of absenteeism of 15 shop-floor workers and 8 office workers at a large factory are shown below. Is there a significant difference in absenteeism between office and shop-floor workers at a five per cent significance level, using a non-parametric test?
Shop floor: 10, 9, 13, 9, 6, 12, 15, 3, 26, 2, 6, 7, 12, 5, 10
Office: 14, 6, 4, 17, 18, 11, 8, 4
3. A sample of delegates from both the Republican and Democrat Conventions were asked to indicate, on a scale from 0 to 9, whether or not they agreed that greenhouse gases cause climate change. Zero represented totally disagree and 9 totally agree. The results are below. Is there any significant difference between the two parties on this issue?
Republican: 4, 0, 5, 4, 6, 4, 5, 8, 3, 2, 0, 3, 2, 7, 5, 0, 3, 8, 2, 2
Democrat: 1, 3, 7, 7, 8, 9, 7, 1, 5, 5, 6, 5, 9, 9, 5, 3, 6, 8, 7, 6
A note on the Wald-Wolfowitz runs test
Another test, similar to the Mann-Whitney U test, which could have been applied to the example data, is the Wald-Wolfowitz runs test. However, it is statistically less powerful than the U test, although it employs a similar method when applied to two independent samples. It is as restricted in its use as the U test, i.e. the sample data have to be at least ordinal. The only advantage of the Wald-Wolfowitz runs test is that it is slightly easier to compute, an advantage that is completely outweighed by the loss of statistical power (see power of significance tests). For this reason the test will not be explored further here.