JERZY NEYMAN: A PRINCIPAL FOUNDER OF MODERN STATISTICAL THEORY
PROCEDURE 10.8 Paired Wilcoxon Signed-Rank Test
Purpose To perform a hypothesis test to compare two population means, μ1 and μ2
Assumptions
1. Simple random paired sample
2. Symmetric differences
Step 1 The null hypothesis is H0: μ1 = μ2, and the alternative hypothesis is
Ha: μ1 ≠ μ2 (Two tailed) or Ha: μ1 < μ2 (Left tailed) or Ha: μ1 > μ2 (Right tailed).
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic
W = sum of the positive ranks
and denote that value W0. To do so, first calculate the paired differences of the sample pairs, next discard all paired differences that equal 0 and reduce the sample size accordingly, and then construct a work table of the following form.
Paired difference d   |d|   Rank of |d|   Signed rank R
·   ·   ·   ·
·   ·   ·   ·
·   ·   ·   ·
CRITICAL-VALUE APPROACH OR P-VALUE APPROACH
Step 4 The critical value(s) are
W1−α/2 and Wα/2 (Two tailed) or W1−α (Left tailed) or Wα (Right tailed).
Use Table V to find the critical value(s). For a left-tailed or two-tailed test, you will also need the relation W1−A = n(n+1)/2 − WA.
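The relation used in Step 4 can be justified in a few lines. The sketch below introduces W⁺ and W⁻ (the sums of the positive ranks and of the absolute values of the negative ranks) purely as illustrative notation; the procedure itself uses only W = W⁺. Because W is discrete, the tail areas hold only approximately for tabled critical values.

```latex
% After zero differences are discarded, the ranks 1, 2, ..., n
% (or their tie-averaged versions) are split between W^+ and W^-:
W^{+} + W^{-} = 1 + 2 + \cdots + n = \frac{n(n+1)}{2}.
% If H_0 is true and the differences are symmetric about 0, each rank is
% equally likely to carry a plus or a minus sign, so W^+ and W^- have the
% same distribution. With W_A defined by P(W^{+} \ge W_A) = A,
P\!\left(W^{+} \le \tfrac{n(n+1)}{2} - W_A\right)
  = P\!\left(W^{-} \ge W_A\right)
  = P\!\left(W^{+} \ge W_A\right) = A,
% which is the statement
W_{1-A} = \frac{n(n+1)}{2} - W_A.
```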
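[Figure: rejection regions on the W-axis. Two tailed: Reject H0 | Do not reject H0 | Reject H0, with critical values W1−α/2 and Wα/2 and area α/2 in each tail. Left tailed: Reject H0 | Do not reject H0, with critical value W1−α and area α. Right tailed: Do not reject H0 | Reject H0, with critical value Wα and area α.]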
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
P-VALUE APPROACH
Step 4 Obtain the P-value by using technology.
[Figure: the P-value shown as a tail area on the W-axis determined by W0, for the Two tailed, Left tailed, and Right tailed cases.]
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
Step 6 Interpret the results of the hypothesis test.
In Example 10.16 on pages 511–512, we used a paired t-test to decide whether a difference exists in the mean ages of married men and married women. Now we do so by using the paired Wilcoxon signed-rank test.
EXAMPLE 10.19 The Paired Wilcoxon Signed-Rank Test
Ages of Married People The U.S. Census Bureau publishes information on the ages of married people in Current Population Reports. A random sample of 10 married couples gave the data on ages, in years, shown in the second and third columns of Table 10.14. The fourth column shows the paired differences, obtained by subtracting the age of each wife from that of her husband.
TABLE 10.14 Ages, in years, of a random sample of 10 married couples
Couple Husband Wife Difference,d
1 59 53 6
2 21 22 −1
3 33 36 −3
4 78 74 4
5 70 64 6
6 33 35 −2
7 68 67 1
8 32 28 4
9 54 41 13
10 52 44 8
At the 5% significance level, do the data provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women?
Solution First, we check the two conditions required for using the paired Wilcoxon signed-rank test, as listed in Procedure 10.8.
FIGURE 10.17 Stem-and-leaf diagram (using five lines per stem) of the paired differences in Table 10.14
−0 | 3 2
−0 | 1
 0 | 1
 0 |
 0 | 4 4
 0 | 6 6
 0 | 8
 1 |
 1 | 3
• Assumption 1 is satisfied because we have a simple random paired sample. Each pair consists of a married couple.
• Figure 10.17 shows a stem-and-leaf diagram for the sample of paired differences in the last column of Table 10.14. Because the diagram is roughly symmetric, we can consider Assumption 2 satisfied.
From the preceding items, we see that the paired Wilcoxon signed-rank test can be used to conduct the required hypothesis test. We apply Procedure 10.8.
Step 1 State the null and alternative hypotheses.
Let μ1 denote the mean age of all married men, and let μ2 denote the mean age of all married women. Then the null and alternative hypotheses are, respectively,
H0: μ1 = μ2 (mean ages are equal)
Ha: μ1 ≠ μ2 (mean ages differ).
Note that the hypothesis test is two tailed.
Step 2 Decide on the significance level, α.
We are to perform the test at the 5% significance level, so α = 0.05.
Step 3 Compute the value of the test statistic
W = sum of the positive ranks.
The paired differences (d-values) are shown in the fourth column of Table 10.14.
We note that none of the paired differences equal 0 and proceed to construct the following work table. Observe that, in several instances, ties occur among the absolute paired differences (|d|-values). To deal with such ties, we proceed in the usual manner. Specifically, if two or more absolute paired differences are tied, each is assigned the mean of the ranks they would have had if there had been no ties. For instance, the second and seventh paired differences (−1 and 1) share the smallest absolute paired difference, 1, so each is assigned rank (1 + 2)/2, or 1.5, as shown in the third column of the table.
Paired difference Rank Signed rank
d |d| of|d| R
6 6 7.5 7.5
−1 1 1.5 −1.5
−3 3 4 −4
4 4 5.5 5.5
6 6 7.5 7.5
−2 2 3 −3
1 1 1.5 1.5
4 4 5.5 5.5
13 13 10 10
8 8 9 9
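The work table above can be reproduced with a short script. This is a minimal sketch, not the textbook's own software: the function name `signed_rank_table` is hypothetical, and the tie handling implements the mean-rank rule described in Step 3.

```python
def signed_rank_table(differences):
    """Return (rows, W) for the paired Wilcoxon signed-rank test.

    Each row is (d, |d|, rank of |d|, signed rank R); W is the sum of
    the positive signed ranks. Zero differences are discarded first.
    """
    nonzero = [d for d in differences if d != 0]
    # Sort positions by absolute difference so ranks can be assigned in order.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        # Find the block of tied absolute differences beginning at `start`.
        end = start
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        # The block occupies ranks start+1 .. end+1 (1-based);
        # every member receives the mean of those ranks.
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    rows = [(d, abs(d), r, r if d > 0 else -r) for d, r in zip(nonzero, ranks)]
    w = sum(row[3] for row in rows if row[3] > 0)
    return rows, w


# Ages from Table 10.14; difference = husband's age minus wife's age.
husband = [59, 21, 33, 78, 70, 33, 68, 32, 54, 52]
wife = [53, 22, 36, 74, 64, 35, 67, 28, 41, 44]
differences = [h - wf for h, wf in zip(husband, wife)]

rows, W = signed_rank_table(differences)
for row in rows:
    print(row)
print("W =", W)  # W = 46.5
```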
Referring to the last column of the preceding table, we find that the value of the test statistic is
W = 7.5 + 5.5 + 7.5 + 1.5 + 5.5 + 10 + 9 = 46.5.
CRITICAL-VALUE APPROACH OR P-VALUE APPROACH
Step 4 The critical values for a two-tailed test are W1−α/2 and Wα/2. Use Table V and the relation W1−A = n(n+1)/2 − WA to find the critical values.
From Table 10.14, we see that n = 10. The critical values for a two-tailed test at the 5% significance level are W1−0.05/2 and W0.05/2, that is, W1−0.025 and W0.025. First we use Table V to find W0.025. We go down the outside columns, labeled n, to “10.” Then, going across that row to the column labeled W0.025, we reach 47; thus W0.025 = 47. Now we apply the aforementioned relation and the result just obtained to get W1−0.025:
W1−0.025 = 10(10 + 1)/2 − W0.025 = 55 − 47 = 8.
See Fig. 10.18A.
FIGURE 10.18A [Two-tailed rejection regions on the W-axis: Reject H0 | Do not reject H0 | Reject H0, with critical values 8 and 47 and area 0.025 in each tail.]
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
The value of the test statistic is W = 46.5, as found in Step 3, which does not fall in the rejection region shown in Fig. 10.18A. Thus we do not reject H0. The test results are not statistically significant at the 5% level.
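Steps 4 and 5 of the critical-value approach amount to one subtraction and one comparison. The sketch below uses the table value 47 quoted in the text. The function name is hypothetical, and treating the critical values themselves as part of the rejection region is an assumption of this illustration; it does not affect the outcome here because 46.5 lies strictly between 8 and 47.

```python
def two_tailed_decision(w, n, w_upper):
    """Apply the critical-value approach for a two-tailed test.

    w        observed sum of the positive ranks
    n        number of nonzero paired differences
    w_upper  upper critical value W_{alpha/2}, read from a table
    """
    # Lower critical value from the relation W_{1-A} = n(n+1)/2 - W_A.
    w_lower = n * (n + 1) / 2 - w_upper
    # Assumption: the critical values belong to the rejection region.
    reject = w <= w_lower or w >= w_upper
    return w_lower, reject


w_lower, reject = two_tailed_decision(w=46.5, n=10, w_upper=47)
print(w_lower, reject)  # 8.0 False
```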
P-VALUE APPROACH
Step 4 Obtain the P-value by using technology.
Using technology, we find that the P-value for the hypothesis test is P = 0.059, as shown in Fig. 10.18B.
FIGURE 10.18B [P-value for the test on the W-axis: W = 46.5, P = 0.059.]
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
From Step 4, P = 0.059. Because the P-value exceeds the specified significance level of 0.05, we do not reject H0. The test results are not statistically significant at the 5% level but (see Table 9.8 on page 408) the data do nonetheless provide moderate evidence against the null hypothesis.
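The text obtains P = 0.059 from technology without saying how the software computes it. One common method, assumed here purely for illustration, is a normal approximation to the null distribution of W with a continuity correction and no adjustment for ties. The sketch below (hypothetical function name) reproduces 0.059 for these data, although that does not confirm which method the software actually used.

```python
import math


def wilcoxon_normal_p_two_tailed(w, n):
    """Two-tailed P-value for W via a normal approximation.

    Under H0 (and with no ties), W has mean n(n+1)/4 and variance
    n(n+1)(2n+1)/24. A continuity correction of 0.5 is applied.
    """
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # Shrink the distance from the mean by 0.5, but never below 0.
    distance = max(abs(w - mean) - 0.5, 0.0)
    z = distance / sd
    # Upper-tail standard normal probability via the complementary error function.
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))
    return min(1.0, 2 * upper_tail)


p = wilcoxon_normal_p_two_tailed(w=46.5, n=10)
print(round(p, 3))  # 0.059
```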
Step 6 Interpret the results of the hypothesis test.
Interpretation At the 5% significance level, the data do not provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
Report 10.8
Exercise 10.191 on page 528
It is interesting to note that, although we reject the null hypothesis of equal mean ages at the 5% significance level by using the paired t-test (Example 10.16), we do not reject it by using the paired Wilcoxon signed-rank test (Example 10.19). Nonetheless, the evidence against the null hypothesis is comparable with both tests: P = 0.048 and P = 0.059, respectively.
Comparing the Paired Wilcoxon Signed-Rank Test and the Paired t-Test
As we demonstrated in Section 10.5, a paired t-test can be used to conduct a hypothesis test to compare two population means when we have a paired sample and the paired-difference variable is normally distributed. Because normally distributed variables have symmetric distributions, we can also use the paired Wilcoxon signed-rank test to perform such a hypothesis test.
For a normally distributed paired-difference variable, the paired t-test is more powerful than the paired Wilcoxon signed-rank test because it is designed expressly for such paired-difference variables; surprisingly, though, the paired t-test is not much more powerful than the paired Wilcoxon signed-rank test. However, if the paired-difference variable has a symmetric distribution but is not normally distributed, the paired Wilcoxon signed-rank test is usually more powerful than the paired t-test and is often considerably more powerful.
KEY FACT 10.7 Paired Wilcoxon Signed-Rank Test Versus the Paired t-Test
Suppose that you want to perform a hypothesis test using a paired sample to compare the means of two populations. When deciding between the paired t-test and the paired Wilcoxon signed-rank test, follow these guidelines:
• If you are reasonably sure that the paired-difference variable is normally distributed, use the paired t-test.
• If you are not reasonably sure that the paired-difference variable is normally distributed but are reasonably sure that it has a symmetric distribution, use the paired Wilcoxon signed-rank test.
THE TECHNOLOGY CENTER
Most statistical technologies have programs that automatically perform a paired Wilcoxon signed-rank test. In this subsection, we present output and step-by-step instructions for such programs. Although many statistical technologies present the output of the paired Wilcoxon signed-rank test in terms of medians, it can also be interpreted in terms of means.
As you will see, different programs may report slightly different P-values for a paired Wilcoxon signed-rank test. These differences are due to the fact that different programs may use different methods for obtaining or approximating such P-values.
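The effect of the computing method can be seen on the age data of Example 10.19. The sketch below obtains an exact two-tailed P-value by enumerating every assignment of plus and minus signs to the tie-averaged ranks. This is one possible method, chosen for illustration, and not a description of what any particular program does; the function name is hypothetical. For these data it gives about 0.053, somewhat smaller than the 0.059 quoted in the example, and either value leads to the same decision at the 5% level.

```python
from itertools import product


def exact_two_tailed_p(ranks, w_observed):
    """Exact two-tailed P-value for W by enumerating all sign patterns.

    `ranks` are the (tie-averaged) ranks of the nonzero absolute differences.
    Under H0 each rank is positive or negative with probability 1/2,
    independently, so all 2**n sign patterns are equally likely.
    """
    n = len(ranks)
    center = sum(ranks) / 2  # mean of W under H0
    observed_distance = abs(w_observed - center)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        # Count patterns at least as far from the center as the observed W.
        if abs(w - center) >= observed_distance:
            extreme += 1
    return extreme / 2 ** n


# Ranks of |d| from the work table in Example 10.19.
ranks = [7.5, 1.5, 4, 5.5, 7.5, 3, 1.5, 5.5, 10, 9]
p_exact = exact_two_tailed_p(ranks, w_observed=46.5)
print(round(p_exact, 4))  # 0.0527
```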
Note to Minitab users: At the time of this writing, Minitab does not have a built-in program for a paired Wilcoxon signed-rank test. You can conduct such a test, however, by applying Minitab’s (one-sample) Wilcoxon signed-rank test to the sample of paired differences, using the null hypothesis H0: μd = 0.
Note to TI-83/84 Plus users: At the time of this writing, the TI-83/84 Plus does not have a built-in program for conducting a paired Wilcoxon signed-rank test. However, we have written a TI program called PDWILCOX for performing that test. It is located in the TI Programs section on the WeissStats site. Your instructor can show you how to download the program to your calculator. Warning: Any data that you may have previously stored in Lists 1–6 will be erased during program execution, so copy those data to other lists prior to program execution if you want to retain them.
EXAMPLE 10.20 Using Technology to Conduct Paired Wilcoxon Signed-Rank Test
Ages of Married People The second and third columns of Table 10.14 on page 522 give the ages of 10 randomly selected married couples. Use Minitab, Excel, or the TI-83/84 Plus to decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
Solution Let μ1 denote the mean age of all married men, and let μ2 denote the mean age of all married women. We want to perform the hypothesis test
H0: μ1 = μ2 (mean ages are equal)
Ha: μ1 ≠ μ2 (mean ages differ),
at the 5% significance level.
We applied the Wilcoxon signed-rank programs to the data, resulting in Output 10.5. Steps for generating that output are presented in Instructions 10.5 on the following page. Note to Excel users: For brevity, we have presented only the essential portions of the actual output.
OUTPUT 10.5 Paired Wilcoxon signed-rank test on the age data [Output panels: MINITAB, EXCEL, TI-83/84 PLUS]
As shown in Output 10.5, the P-value for the hypothesis test is 0.059. Because the P-value exceeds the specified significance level of 0.05, we do not reject H0. At the 5% significance level, the data do not provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
INSTRUCTIONS 10.5 Steps for generating Output 10.5
MINITAB
1 Store the data from the second and third columns of Table 10.14 in columns named HUSBAND and WIFE
2 Choose Calc ➤ Calculator. . .
3 Type DIFFERENCE in the Store result in variable text box
4 Specify ‘HUSBAND’–‘WIFE’ in the Expression text box and click OK
5 Choose Stat ➤ Nonparametrics ➤ 1-Sample Wilcoxon. . .
6 Press the F3 key to reset the dialog box
7 Specify DIFFERENCE in the Variables text box
8 Select the Test median option button
9 Click the arrow button at the right of the Alternative drop-down list box and select not equal
10 Click OK
EXCEL
1 Store the age data from the second and third columns of Table 10.14 in columns named HUSBAND and WIFE
2 Choose XLSTAT ➤ Nonparametric tests ➤ Comparison of two samples (Wilcoxon, Mann-Whitney, . . . )
3 Click the reset button in the lower left corner of the dialog box
4 Click in the Sample 1 selection box and then select the column of the worksheet that contains the HUSBAND data
5 Click in the Sample 2 selection box and then select the column of the worksheet that contains the WIFE data
6 Uncheck the Sign test check box
7 Click the Options tab
8 Click the arrow button at the right of the Alternative hypothesis drop-down list box and select Sample 1 – Sample 2 ≠ D
9 Type 5 in the Significance level (%) text box
10 Check the Exact p-value check box
11 Click OK
12 Click the Continue button in the XLSTAT – Selections dialog box
TI-83/84 PLUS
1 Store the data from Table 10.14 in lists named HUSB and WIFE
2 Press PRGM
3 Arrow down to PDWILCOX and press ENTER twice
4 Press 2ND ➤ LIST, arrow down to HUSB, and press ENTER twice
5 Press 2ND ➤ LIST, arrow down to WIFE, and press ENTER twice
6 Type 0 for TYPE and press ENTER
Exercises 10.6
Understanding the Concepts and Skills
10.177 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations and you know that the paired-difference variable has a symmetric distribution that is far from normal.
a. Is use of the paired t-test acceptable if the sample size is small or moderate? Why or why not?
b. Is use of the paired t-test acceptable if the sample size is large? Why or why not?
c. Is use of the paired Wilcoxon signed-rank test acceptable? Why or why not?
d. If both the paired t-test and the paired Wilcoxon signed-rank test are acceptable, which test is preferable? Explain your answer.
10.178 A hypothesis test based on a simple random paired sample is to be performed to compare the means of two populations. The sample of 15 paired differences contains an outlier but otherwise is approximately bell shaped. Assuming that removing the outlier is not legitimate, which test is better to use: the paired t-test or the paired Wilcoxon signed-rank test? Explain your answer.
10.179 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations. For each part, decide whether you would use the paired t-test, the paired Wilcoxon signed-rank test, or neither of these tests. Preliminary data analyses of the sample of paired differences suggest that the distribution of the paired-difference variable is
a. approximately normal.
b. highly skewed; the sample size is 20.
c. symmetric bimodal.
10.180 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations. For each part, decide whether you would use the paired t-test, the paired Wilcoxon signed-rank test, or neither of these tests. Preliminary data analyses of the sample of paired differences suggest that the distribution of the paired-difference variable is
a. uniform.
b. not symmetric; the sample size is 132.
c. moderately skewed but otherwise approximately bell shaped.
In Exercises 10.181–10.186, the null hypothesis is H0: μ1 = μ2 and the alternative hypothesis is as specified. We have provided data from a simple random paired sample from the two populations under consideration. In each case, use the paired Wilcoxon signed-rank test to perform the required hypothesis test at the 10% significance level. (Note: These problems were presented as Exercises 10.149–10.154 in Section 10.5, where they were to be solved by using the paired t-test.)
10.181 Ha: μ1 ≠ μ2
Pair   Observation from Population 1   Observation from Population 2
1 13 11
2 16 15
3 13 10
4 14 8
5 12 8
6 8 9
7 17 14
10.182 Ha:μ1< μ2
Pair   Observation from Population 1   Observation from Population 2
1 7 13
2 4 9
3 10 6
4 0 2
5 20 19
6 −1 5
7 12 10
10.183 Ha:μ1> μ2
Pair   Observation from Population 1   Observation from Population 2
1 7 3
2 4 5
3 9 8
4 7 2
5 19 16
6 12 12
7 13 18
8 5 11
10.184 Ha: μ1 ≠ μ2
Pair   Observation from Population 1   Observation from Population 2
1 10 12
2 8 7
3 13 11
4 13 16
5 17 15
6 12 9
7 12 12
8 11 7
10.185 Ha:μ1< μ2
Pair   Observation from Population 1   Observation from Population 2
1 15 18
2 22 25
3 15 17
4 27 24
5 24 30
6 23 23
7 8 10
8 20 27
9 2 3
10.186 Ha:μ1> μ2
Pair   Observation from Population 1   Observation from Population 2
1 40 32
2 30 29
3 34 36
4 22 18
5 35 31
6 26 26
7 26 25
8 27 25
9 11 15
10 35 31
Applying the Concepts and Skills
Exercises 10.187–10.192 repeat Exercises 10.155–10.160 of Section 10.5. There, you applied the paired t-test to solve each problem. Now solve each problem by applying the paired Wilcoxon signed-rank test.
10.187 Behavioral Genetics. In the article “Growth references for height, weight and BMI of Twins aged 0–2.5 years” (ACTA Pediatrica, Vol. 97, pp. 1099–1104), the researchers, P. Dommelen et al., determined the size of the growth deficit in Dutch monozygotic and dizygotic twins aged 0–2.5 years as compared to singletons and constructed reference growth charts for twins. The following table shows the difference of the height of the twins at various age groups from 0 to 2.5 years.
5.2 4.5 3.2 4.1 4.8 5.0 4.3 2.7 3.4 3.8 4.7 3.9 1.9 2.3 1.0
a. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean heights of twins separately differ?
b. Repeat part (a) at the 1% significance level.
10.188 Sleep. In 1908, W. S. Gosset published “The Probable Error of a Mean” (Biometrika, Vol. 6, pp. 1–25). In this pioneering paper, published under the pseudonym “Student,” he introduced what later became known as Student’s t-distribution. Gosset used the following data set, which gives the additional sleep in hours obtained by 10 patients who used laevohysocyamine hydrobromide.
1.9 0.8 1.1 0.1 −0.1
4.4 5.5 1.6 4.6 3.4
a. At the 5% significance level, do the data provide sufficient evidence to conclude that laevohysocyamine hydrobromide is effective in increasing sleep?
b. Repeat part (a) at the 1% significance level.
10.189 Anorexia Treatment. Anorexia nervosa is a serious eating disorder, particularly among young women. The following data provide the weights, in pounds, of 17 anorexic young women before and after receiving a family therapy treatment for anorexia nervosa. [SOURCE: D. Hand et al., ed., A Handbook of Small Data Sets, London: Chapman & Hall, 1994; raw data from B. Everitt (personal communication)]
Before After Before After Before After
83.3 94.3 76.9 76.8 82.1 95.5
86.0 91.5 94.2 101.6 77.6 90.7
82.5 91.9 73.4 94.9 83.5 92.5
86.7 100.3 80.5 75.2 89.9 93.8
79.6 76.7 81.6 77.8 86.0 91.7
87.3 98.0 83.8 95.2
Does family therapy appear to be effective in helping anorexic young women gain weight? Perform the appropriate hypothesis test at the 5% significance level.
10.190 Measuring Treadwear. R. Stichler et al. compared two methods of measuring treadwear in their paper “Measurement of Treadwear of Commercial Tires” (Rubber Age, 73:2). Eleven tires were each measured for treadwear by two methods, one based on weight and the other on groove wear. The following are the data, in thousands of miles.
Weight method   Groove method   Weight method   Groove method
30.5 28.7 24.5 16.1
30.9 25.9 20.9 19.9
31.9 23.3 18.9 15.2
30.4 23.1 13.7 11.5
27.3 23.7 11.4 11.2
20.4 20.9
At the 5% significance level, do the data provide sufficient evidence to conclude that, on average, the two measurement methods give different results?
10.191 Glaucoma and Corneal Thickness. Glaucoma is a leading cause of blindness in the United States. N. Ehlers measured the corneal thickness of eight patients who had glaucoma in one eye but not in the other. The results of the study were published in the paper “On Corneal Thickness and Intraocular Pressure, II” (Acta Ophthalmologica, Vol. 48, pp. 1107–1112). The following are the data on corneal thickness, in microns.
Patient Normal Glaucoma
1 484 488
2 478 478
3 492 480
4 444 426
5 436 440
6 398 410
7 464 458
8 476 460
At the 10% significance level, do the data provide sufficient evidence to conclude that mean corneal thickness is greater in normal eyes than in eyes with glaucoma?
10.192 Cooling Down. Cooling down with a cold drink before exercise in the heat is believed to help an athlete perform. Researcher J. Dugas explored the difference between cooling down with an ice slurry (slushy) and with cold water in the article “Ice Slurry Ingestion Increases Running Time in the Heat” (Clinical Journal of Sports Medicine, Vol. 21, No. 6, pp. 541–542). Ten male participants drank a flavored ice slurry and ran on a treadmill in a controlled hot and humid environment. Days later, the same participants drank cold water and ran on a treadmill in the same hot and humid environment. The following table shows the times, in minutes, it took to fatigue on the treadmill for both the ice slurry and the cold water.
Subject Cold Water Ice Slurry
1 52 56
2 37 43
3 44 52
4 51 58
5 34 38
6 38 45
7 41 45
8 50 58
9 29 34
10 38 44
At the 1% significance level, do the data provide sufficient evidence to conclude that, on average, cold water is less effective than ice slurry for optimizing athletic performance in the heat?
In each of Exercises 10.193–10.195, use the technology of your choice to perform the required tasks.
10.193 Font Readability. In the online paper “A Comparison of Two Computer Fonts: Serif versus Ornate Sans Serif” (Usability News, Issue 5.3), researchers S. Morrison and J. Noyes studied whether the type of font used in a document affects reading speed or comprehension. The fonts used for the comparisons were the serif font Times New Roman (TNR) and a more ornate sans serif font called Gigi. There were 10 substitution words used for testing the comprehensibility of the two fonts. The substitution words were inappropriate to the context of the passage and varied grammatically from the original words in the paragraphs. The following table gives the number of inappropriate words out of the 10 that were identified in the TNR and Gigi fonts by each of the 25 participants.