The Mann–Whitney Test ∗

We have developed two procedures for performing a hypothesis test to compare the means of two populations: the pooled and nonpooled t-tests. Both tests require simple random samples, independent samples, and normal populations or large samples. The pooled t-test also requires equal population standard deviations.

Recall that the shape of a normal distribution is determined by its standard deviation. In other words, two normal distributions have the same shape if and only if they have equal standard deviations. Consequently, the pooled t-test applies when the two distributions (one for each population) of the variable under consideration are normal and have the same shape; the nonpooled t-test applies when the two distributions are normal, even if they don’t have the same shape.

Another procedure for performing a hypothesis test based on independent simple random samples to compare the means of two populations is the Mann–Whitney test.

This nonparametric test, introduced by Wilcoxon and further developed by Mann and Whitney, is also commonly referred to as the Wilcoxon rank-sum test or the Mann–Whitney–Wilcoxon test.

The Mann–Whitney test applies when the two distributions of the variable under consideration have the same shape, but it does not require that they be normal or have any other specific shape. See Fig. 10.9.

FIGURE 10.9 Appropriate procedure for comparing two population means based on independent simple random samples

(a) Normal populations, same shape. Use pooled t-test.

(b) Normal populations, different shapes. Use nonpooled t-test.

(c) Not both normal populations, same shape. Use Mann–Whitney test.

(d) Not both normal populations, different shapes. Use nonpooled t-test for large samples; otherwise, consult a statistician.
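The decision chart in Fig. 10.9 can be read as a small function. The sketch below is a hypothetical helper: the name `choose_procedure` and its boolean flags are illustrative, not part of the text, and the flags stand for judgments an analyst makes from plots such as stem-and-leaf diagrams, not for automatic tests.

```python
def choose_procedure(both_normal: bool, same_shape: bool, large_samples: bool) -> str:
    """Return the procedure that Fig. 10.9 suggests (hypothetical helper)."""
    if both_normal and same_shape:
        return "pooled t-test"  # case (a): normal populations, same shape
    if both_normal:
        return "nonpooled t-test"  # case (b): normal populations, different shapes
    if same_shape:
        return "Mann-Whitney test"  # case (c): not both normal, same shape
    # case (d): not both normal, different shapes
    return "nonpooled t-test" if large_samples else "consult a statistician"


print(choose_procedure(both_normal=False, same_shape=True, large_samples=False))
# Mann-Whitney test
```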

EXAMPLE 10.9 Introducing the Mann–Whitney Test

Computer-System Training A nationwide shipping firm purchased a new computer system to track its shipments, pickups, and deliveries. Employees were expected to need about 2 hours to learn how to use the system. In fact, some employees could use the system in very little time, whereas others took considerably longer.

Someone suggested that the reason for this difference might be that only some employees had experience with this kind of computer system. To test this suggestion, independent samples of employees with and without such experience were randomly selected.

The times, in minutes, required for these employees to learn how to use the system are given in Table 10.9. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean learning time for all employees without experience exceeds the mean learning time for all employees with experience?

TABLE 10.9 Times, in minutes, required to learn how to use the system

Without experience | With experience
139 | 142
118 | 109
164 | 130
151 | 107
182 | 155
140 | 88
134 | 95
 | 104

Solution Let $\mu_1$ and $\mu_2$ denote the mean learning times for all employees without experience and with experience, respectively. Then the null and alternative hypotheses are, respectively,

$H_0$: $\mu_1 = \mu_2$ (mean time for inexperienced employees is not greater)
$H_a$: $\mu_1 > \mu_2$ (mean time for inexperienced employees is greater).

To use the Mann–Whitney test, the learning-time distributions for employees without and with experience should have the same shape. If they do, then the distributions of the two samples in Table 10.9 should also have roughly the same shape.

To check this condition, we constructed Fig. 10.10, a back-to-back stem-and-leaf diagram of the two samples in Table 10.9. In such a diagram, the leaves for one sample are on the left, the stems are in the middle, and the leaves for the other sample are on the right. The stem-and-leaf diagrams in Fig. 10.10 have roughly the same shape and so do not reveal any obvious violations of the same-shape condition.†

FIGURE 10.10 Back-to-back stem-and-leaf diagram of the two learning-time samples in Table 10.9

With experience | Stem | Without experience
8 | 8 |
5 | 9 |
9 7 4 | 10 |
 | 11 | 8
 | 12 |
0 | 13 | 4 9
2 | 14 | 0
5 | 15 | 1
 | 16 | 4
 | 17 |
 | 18 | 2

To apply the Mann–Whitney test, we first rank all the data from both samples combined. (Referring to Fig. 10.10 is helpful in ranking the data.) The ranking, depicted in Table 10.10, shows, for instance, that the first employee without experience had the ninth-shortest learning time among all 15 employees in the two samples combined.

The idea behind the Mann–Whitney test is simple: If the sum of the ranks for the sample of employees without experience is too large, we conclude that the null hypothesis is false and, therefore, that the mean learning time for all employees without experience exceeds that for all employees with experience. From Table 10.10, the sum of the ranks for the sample of employees without experience, denoted $M$, is

$$9 + 6 + 14 + 12 + 15 + 10 + 8 = 74.$$

TABLE 10.10 Results of ranking the combined data from Table 10.9

Without experience | Overall rank | With experience | Overall rank
139 | 9 | 142 | 11
118 | 6 | 109 | 5
164 | 14 | 130 | 7
151 | 12 | 107 | 4
182 | 15 | 155 | 13
140 | 10 | 88 | 1
134 | 8 | 95 | 2
 | | 104 | 3
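As a check on Table 10.10, here is a minimal Python sketch that ranks the combined samples of Table 10.9 and sums the ranks of the sample without experience. It assumes there are no ties, which holds for these data; the variable names are illustrative.

```python
without_exp = [139, 118, 164, 151, 182, 140, 134]  # Population 1 (Table 10.9)
with_exp = [142, 109, 130, 107, 155, 88, 95, 104]  # Population 2 (Table 10.9)

# Rank the combined data, smallest value = rank 1. With no ties, a value's
# rank is simply its 1-based position in sorted order.
combined = sorted(without_exp + with_exp)
rank = {value: i + 1 for i, value in enumerate(combined)}

ranks_1 = [rank[x] for x in without_exp]
ranks_2 = [rank[y] for y in with_exp]
M = sum(ranks_1)  # test statistic: sum of the Population 1 ranks

print(ranks_1)  # [9, 6, 14, 12, 15, 10, 8]
print(ranks_2)  # [11, 5, 7, 4, 13, 1, 2, 3]
print(M)        # 74
```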

†For ease in explaining the Mann–Whitney test, we have chosen an example in which the sample sizes are very small. However, very small sample sizes make effectively checking the same-shape condition difficult, so proceed cautiously when dealing with very small samples.

466 CHAPTER 10 Inferences for Two Population Means

To decide whether $M = 74$ is large enough to reject the null hypothesis, we first need to discuss some preliminary material.

Using the Mann–Whitney Table†

Table VI in Appendix A gives values of $M_\alpha$ for a Mann–Whitney test.‡ The size of the sample from Population 2 is given in the leftmost column of Table VI, the values of $\alpha$ in the next column, and the size of the sample from Population 1 along the top.

As expected, the symbol $M_\alpha$ denotes the $M$-value with area (percentage, probability) $\alpha$ to its right.

We can express the critical value(s) for a Mann–Whitney test at the significance level $\alpha$ as follows:

• For a two-tailed test, the critical values are the $M$-value with area $\alpha/2$ to its left (or, equivalently, area $1-\alpha/2$ to its right) and the $M$-value with area $\alpha/2$ to its right, which are $M_{1-\alpha/2}$ and $M_{\alpha/2}$, respectively. See Fig. 10.11(a).

• For a left-tailed test, the critical value is the $M$-value with area $\alpha$ to its left or, equivalently, area $1-\alpha$ to its right, which is $M_{1-\alpha}$. See Fig. 10.11(b).

• For a right-tailed test, the critical value is the $M$-value with area $\alpha$ to its right, which is $M_\alpha$. See Fig. 10.11(c).
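The three rules above translate directly into a decision function. In this sketch the name `in_rejection_region` and its parameters are hypothetical; it treats each critical value as belonging to the rejection region, which is the convention this section uses for Table VI. The example calls use the critical values derived in Example 10.10 together with made-up values of $M_0$.

```python
def in_rejection_region(m0, tail, lower=None, upper=None):
    """Critical-value decision for a Mann-Whitney test (illustrative sketch).

    tail:  "right", "left", or "two".
    lower: left-side critical value, M_{1-alpha} or M_{1-alpha/2}.
    upper: right-side critical value, M_alpha or M_{alpha/2}.
    A critical value itself counts as part of the rejection region.
    """
    if tail == "right":
        return m0 >= upper
    if tail == "left":
        return m0 <= lower
    if tail == "two":
        return m0 <= lower or m0 >= upper
    raise ValueError(f"unknown tail: {tail!r}")


# Hypothetical M0 values checked against the critical values of Example 10.10
print(in_rejection_region(92, "right", upper=92))           # True
print(in_rejection_region(41, "two", lower=40, upper=64))   # False
```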

FIGURE 10.11 Critical value(s) for a Mann–Whitney test at the significance level $\alpha$ if the test is (a) two tailed, (b) left tailed, or (c) right tailed

[Graphs: (a) two tailed, area $\alpha/2$ in each tail, reject $H_0$ if $M \le M_{1-\alpha/2}$ or $M \ge M_{\alpha/2}$; (b) left tailed, area $\alpha$ in the left tail, reject $H_0$ if $M \le M_{1-\alpha}$; (c) right tailed, area $\alpha$ in the right tail, reject $H_0$ if $M \ge M_\alpha$.]

Note the following:

• A critical value from Table VI is to be included as part of the rejection region.

• Although the variable $M$ is discrete, we drew the “histograms” in Fig. 10.11 in the shape of a normal curve. This approach is acceptable because $M$ is close to normally distributed except for very small sample sizes. We use this graphical convention throughout this section.
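For small samples, the exact distribution of $M$ can be enumerated directly, which makes its discreteness and its symmetric shape easy to see. The sketch below assumes the null hypothesis holds, so that every set of $n_1$ ranks out of $1, \ldots, n_1+n_2$ is equally likely to be the set of Population 1 ranks; that reasoning is standard for rank-sum tests but is not spelled out in this section. The function name `rank_sum_distribution` is illustrative, and brute-force enumeration is practical only for small samples.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rank_sum_distribution(n1, n2):
    """Exact null distribution of M, as {m: P(M = m)}, by enumeration.

    Each n1-element subset of the ranks 1..n1+n2 is one equally likely
    assignment of ranks to Population 1; its sum is one value of M.
    """
    N = n1 + n2
    counts = Counter(sum(c) for c in combinations(range(1, N + 1), n1))
    total = comb(N, n1)
    return {m: counts[m] / total for m in sorted(counts)}


dist = rank_sum_distribution(7, 8)   # sample sizes of Table 10.9
center = 7 * (7 + 8 + 1) / 2         # n1(n1 + n2 + 1)/2
print(min(dist), max(dist), center)  # 28 84 56.0
# Symmetry about the center: P(M = m) equals P(M = 2*center - m)
print(all(dist[m] == dist[int(2 * center) - m] for m in dist))  # True
```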

The distribution of the variable $M$ is symmetric about $n_1(n_1+n_2+1)/2$. Because of this symmetry, the $M$-value with area $A$ to its left lies exactly as far below $n_1(n_1+n_2+1)/2$ as the $M$-value with area $A$ to its right lies above it. In other words, the $M$-value with area $A$ to its left (or, equivalently, area $1-A$ to its right) equals $n_1(n_1+n_2+1)$ minus the $M$-value with area $A$ to its right. In symbols,

$$M_{1-A} = n_1(n_1+n_2+1) - M_A. \qquad (10.2)$$

Referring to Fig. 10.11, we see that by using Equation (10.2) and Table VI, we can determine the critical value for a left-tailed Mann–Whitney test and the critical values for a two-tailed Mann–Whitney test. The next example illustrates the use of Table VI to determine critical values for a Mann–Whitney test.
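Equation (10.2) is a one-line computation. The hypothetical helper below applies it to the Table VI values quoted in Example 10.10: 41 for $n_1 = 5$, $n_2 = 7$, $\alpha = 0.10$, and 64 for $n_1 = 8$, $n_2 = 4$, $\alpha = 0.025$.

```python
def lower_critical_value(n1, n2, m_a):
    """Equation (10.2): M_{1-A} = n1(n1 + n2 + 1) - M_A.

    m_a is the M-value with area A to its right (e.g., read from Table VI);
    the result is the M-value with area A to its left.
    """
    return n1 * (n1 + n2 + 1) - m_a


print(lower_critical_value(5, 7, 41))  # 24, i.e., M_{1-0.10} for n1 = 5, n2 = 7
print(lower_critical_value(8, 4, 64))  # 40, i.e., M_{1-0.025} for n1 = 8, n2 = 4
```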

†We can use the Mann–Whitney table to estimate the P-value of a Mann–Whitney test. However, because doing so can be awkward or tedious, using statistical software is preferable. Thus, those concentrating on the P-value approach to hypothesis testing can skip to the subsection “Performing the Mann–Whitney Test.”

‡Actually, the $\alpha$-levels in Table VI are only approximate but are used in practice.

EXAMPLE 10.10 Using the Mann–Whitney Table

In each case, use Table VI to determine the critical value(s) for a Mann–Whitney test. Sketch graphs to illustrate your results.

a. $n_1 = 9$, $n_2 = 6$; significance level = 0.01; right tailed
b. $n_1 = 5$, $n_2 = 7$; significance level = 0.10; left tailed
c. $n_1 = 8$, $n_2 = 4$; significance level = 0.05; two tailed

Solution In solving these problems, it helps to refer to Fig. 10.11.

a. The critical value for a right-tailed test at the 1% significance level is $M_{0.01}$. To find the critical value, we use Table VI. First we go down the leftmost column, labeled $n_2$, to “6.” Then, going across the row for $\alpha$ labeled 0.01 to the column labeled “9,” we reach 92, the required critical value. See Fig. 10.12(a).

b. The critical value for a left-tailed test at the 10% significance level is $M_{1-0.10}$. To find the critical value, we use Table VI and Equation (10.2). First we go down the leftmost column, labeled $n_2$, to “7.” Then, going across the row for $\alpha$ labeled 0.10 to the column labeled “5,” we reach 41; thus $M_{0.10} = 41$. Now we apply Equation (10.2) and the result just obtained to get

$$M_{1-0.10} = 5(5 + 7 + 1) - M_{0.10} = 65 - 41 = 24,$$

which is the required critical value. See Fig. 10.12(b).

c. The critical values for a two-tailed test at the 5% significance level are $M_{1-0.05/2}$ and $M_{0.05/2}$, that is, $M_{1-0.025}$ and $M_{0.025}$. First we use Table VI to find $M_{0.025}$. We go down the leftmost column, labeled $n_2$, to “4.” Then, going across the row for $\alpha$ labeled 0.025 to the column labeled “8,” we reach 64; thus $M_{0.025} = 64$. Now we apply Equation (10.2) and the result just obtained to get $M_{1-0.025}$:

$$M_{1-0.025} = 8(8 + 4 + 1) - M_{0.025} = 104 - 64 = 40.$$

See Fig. 10.12(c).

Exercise 10.99 on page 474

FIGURE 10.12 Critical value(s) for a Mann–Whitney test: (a) right tailed, $\alpha = 0.01$, $n_1 = 9$, $n_2 = 6$; (b) left tailed, $\alpha = 0.10$, $n_1 = 5$, $n_2 = 7$; (c) two tailed, $\alpha = 0.05$, $n_1 = 8$, $n_2 = 4$

[Graphs: (a) reject $H_0$ if $M \ge 92$ (area 0.01 to the right); (b) reject $H_0$ if $M \le 24$ (area 0.10 to the left); (c) reject $H_0$ if $M \le 40$ or $M \ge 64$ (area 0.025 in each tail).]

Performing the Mann–Whitney Test

Procedure 10.5 on the following page provides a step-by-step method for performing a Mann–Whitney test. Note that we often use the phrase same-shape populations to indicate that the two distributions (one for each population) of the variable under consideration have the same shape.

Note: When there are ties in the sample data, ranks are assigned in the same way as in the Wilcoxon signed-rank test. Namely, if two or more observations are tied, each is assigned the mean of the ranks they would have had if there had been no ties.
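The mean-rank rule for ties can be sketched as follows. The function name `midranks` and the sample lists are hypothetical illustrations; the rule itself is the one stated in the note above.

```python
def midranks(values):
    """Rank values from smallest (rank 1) to largest; tied values share
    the mean of the ranks they would have had if there were no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) would get ranks i+1..j+1; average them.
        avg = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


print(midranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(midranks([5, 3, 5, 5, 1]))   # [4.0, 2.0, 4.0, 4.0, 1.0]
```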


PROCEDURE 10.5 Mann–Whitney Test

Purpose To perform a hypothesis test to compare two population means, $\mu_1$ and $\mu_2$

Assumptions

1. Simple random samples
2. Independent samples
3. Same-shape populations

Step 1 The null hypothesis is $H_0$: $\mu_1 = \mu_2$, and the alternative hypothesis is

$H_a$: $\mu_1 \neq \mu_2$ (two tailed) or $H_a$: $\mu_1 < \mu_2$ (left tailed) or $H_a$: $\mu_1 > \mu_2$ (right tailed).

Step 2 Decide on the significance level, $\alpha$.

Step 3 Compute the value of the test statistic

$$M = \text{sum of the ranks for sample data from Population 1}$$

and denote that value $M_0$. To do so, construct a work table of the following form.

Sample from Population 1 | Overall rank | Sample from Population 2 | Overall rank
⋮ | ⋮ | ⋮ | ⋮

CRITICAL-VALUE APPROACH

Step 4 The critical value(s) are

$M_{1-\alpha/2}$ and $M_{\alpha/2}$ (two tailed) or $M_{1-\alpha}$ (left tailed) or $M_\alpha$ (right tailed).

Use Table VI to find the critical value(s). For a left-tailed or two-tailed test, you will also need the relation $M_{1-A} = n_1(n_1+n_2+1) - M_A$.

[Graphs: two tailed, reject $H_0$ if $M_0 \le M_{1-\alpha/2}$ or $M_0 \ge M_{\alpha/2}$; left tailed, reject $H_0$ if $M_0 \le M_{1-\alpha}$; right tailed, reject $H_0$ if $M_0 \ge M_\alpha$.]

Step 5 If the value of the test statistic falls in the rejection region, reject $H_0$; otherwise, do not reject $H_0$.

OR P-VALUE APPROACH

Step 4 Obtain the P-value by using technology.

[Graphs: the P-value for two-tailed, left-tailed, and right-tailed tests, shown as tail areas of the distribution of $M$ beyond $M_0$.]

Step 5 If $P \le \alpha$, reject $H_0$; otherwise, do not reject $H_0$.

Step 6 Interpret the results of the hypothesis test.
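Step 4 of the P-value approach says to obtain the P-value by using technology. As one illustration of what such software can compute, the sketch below finds an exact P-value from the exact null distribution of $M$, built by dynamic programming over the ranks $1, \ldots, n_1+n_2$. Assumptions: no ties (the sketch rejects tied data), samples small enough for exact counting, and, for a two-tailed test, doubling the smaller tail probability, which is one common convention rather than the only one. The function names are hypothetical. Applied right tailed to the data of Table 10.9, it reproduces $M_0 = 74$.

```python
from math import comb


def rank_sum_counts(n1, n2):
    """counts[m] = number of n1-element subsets of {1, ..., n1+n2} with sum m."""
    N = n1 + n2
    max_sum = N * (N + 1) // 2
    # ways[k][s]: number of k-element subsets of the ranks processed so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for r in range(1, N + 1):
        # Go through k from high to low so rank r enters each subset at most once.
        for k in range(min(r, n1), 0, -1):
            for s in range(r, max_sum + 1):
                ways[k][s] += ways[k - 1][s - r]
    return ways[n1]


def mann_whitney_exact(sample1, sample2, tail):
    """Return (M0, exact P-value); tail is "right", "left", or "two"."""
    combined = sorted(sample1 + sample2)
    if len(set(combined)) != len(combined):
        raise ValueError("this sketch assumes there are no ties")
    rank = {value: i + 1 for i, value in enumerate(combined)}
    m0 = sum(rank[x] for x in sample1)            # test statistic M0
    counts = rank_sum_counts(len(sample1), len(sample2))
    total = comb(len(combined), len(sample1))     # equally likely rank sets under H0
    p_right = sum(counts[m0:]) / total            # P(M >= M0)
    p_left = sum(counts[: m0 + 1]) / total        # P(M <= M0)
    if tail == "right":
        return m0, p_right
    if tail == "left":
        return m0, p_left
    if tail == "two":
        return m0, min(1.0, 2 * min(p_left, p_right))
    raise ValueError(f"unknown tail: {tail!r}")


without_exp = [139, 118, 164, 151, 182, 140, 134]  # Population 1 (Table 10.9)
with_exp = [142, 109, 130, 107, 155, 88, 95, 104]  # Population 2 (Table 10.9)
m0, p = mann_whitney_exact(without_exp, with_exp, "right")
print(m0, f"{p:.4f}")  # 74 0.0200
```

With $\alpha = 0.05$, this sketch's P-value of about 0.020 would lead to rejecting $H_0$ under Step 5. Software may report the result differently: recent versions of SciPy's `scipy.stats.mannwhitneyu`, for example, report the $U$ statistic for the first sample, which equals $M - n_1(n_1+1)/2$ (here $74 - 28 = 46$).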

EXAMPLE 10.11 The Mann–Whitney Test

Computer-System Training Let’s complete the hypothesis test of Example 10.9.

Independent simple random samples of employees with and without computer-system experience were obtained. The employees selected were timed to see how long it would take them to learn how to use a certain computer system.

The times, in minutes, are given in Table 10.9 on page 465. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean learning time for employees without experience exceeds that for employees with experience?

Solution We apply Procedure 10.5.

Step 1 State the null and alternative hypotheses.

Let $\mu_1$ and $\mu_2$ denote the mean learning times for all employees without and with experience, respectively. Then the null and alternative hypotheses are, respectively,

$H_0$: $\mu_1 = \mu_2$ (mean time for inexperienced employees is not greater)
$H_a$: $\mu_1 > \mu_2$ (mean time for inexperienced employees is greater).

Note that the hypothesis test is right tailed.

Step 2 Decide on the significance level, $\alpha$.

We are to perform the test at the 5% significance level; so, $\alpha = 0.05$.

Step 3 Compute the value of the test statistic

$$M = \text{sum of the ranks for sample data from Population 1}.$$

The critical value for a two-tailed test

Inferences for Two Population Means, Using Paired Samples