In Section 10.5, we discussed the paired t-procedures, which provide methods for comparing two population means using paired samples. An assumption for use of those procedures is that the paired-difference variable is (approximately) normally distributed or that the sample size is large. For a small or moderate sample size where the distribution of the paired-difference variable is far from normal, a paired t-procedure is inappropriate and a nonparametric procedure should be used instead.
For instance, if the distribution of the paired-difference variable is symmetric (but not necessarily normal), we can perform a hypothesis test to compare the means of the two populations by applying the Wilcoxon signed-rank test (Procedure 9.3 on page 404) to the sample of paired differences. In this context, the Wilcoxon signed-rank test is called the paired Wilcoxon signed-rank test.
Procedure 10.8 on the next page provides the steps for performing a paired Wilcoxon signed-rank test. Note that we use the phrase symmetric differences as shorthand for “the paired-difference variable has a symmetric distribution.”
In Example 10.16 on page 480, we used a paired t-test to decide whether a difference exists in the mean ages of married men and married women. Now we do so by using the paired Wilcoxon signed-rank test.
EXAMPLE 10.19 The Paired Wilcoxon Signed-Rank Test
Ages of Married People The U.S. Census Bureau publishes information on the ages of married people in Current Population Reports. A random sample of 10 married couples gave the data on ages, in years, shown in the second and third columns of Table 10.14. The fourth column shows the paired differences, obtained by subtracting the age of each wife from that of her husband.
TABLE 10.14 Ages, in years, of a random sample of 10 married couples
Couple Husband Wife Difference, d
1 59 53 6
2 21 22 −1
3 33 36 −3
4 78 74 4
5 70 64 6
6 33 35 −2
7 68 67 1
8 32 28 4
9 54 41 13
10 52 44 8
At the 5% significance level, do the data provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women?
492 CHAPTER 10 Inferences for Two Population Means
PROCEDURE 10.8 Paired Wilcoxon Signed-Rank Test
Purpose To perform a hypothesis test to compare two population means, μ1 and μ2
Assumptions
1. Simple random paired sample
2. Symmetric differences
Step 1 The null hypothesis is H0: μ1 = μ2, and the alternative hypothesis is
Ha: μ1 ≠ μ2 (Two tailed) or Ha: μ1 < μ2 (Left tailed) or Ha: μ1 > μ2 (Right tailed)
Step 2 Decide on the significance level, α.
Step 3 Compute the value of the test statistic
W = sum of the positive ranks
and denote that value W0. To do so, first calculate the paired differences of the sample pairs, next discard all paired differences that equal 0 and reduce the sample size accordingly, and then construct a work table of the following form.
Paired difference d   |d|   Rank of |d|   Signed rank R
·   ·   ·   ·
·   ·   ·   ·
·   ·   ·   ·
CRITICAL-VALUE APPROACH OR P-VALUE APPROACH
Step 4 The critical value(s) are
W1−α/2 and Wα/2 (Two tailed) or W1−α (Left tailed) or Wα (Right tailed)
Use Table V to find the critical value(s). For a left-tailed or two-tailed test, you will also need the relation W1−A = n(n + 1)/2 − WA.
[Figure: rejection and nonrejection regions on the W-axis for the two-tailed test (critical values W1−α/2 and Wα/2, area α/2 in each tail), the left-tailed test (critical value W1−α, area α), and the right-tailed test (critical value Wα, area α)]
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
Step 4 Obtain the P-value by using technology.
[Figure: the P-value shown as the area beyond the observed value W0 for two-tailed, left-tailed, and right-tailed tests]
Step 5 If P≤α, reject H0; otherwise, do not reject H0.
Step 6 Interpret the results of the hypothesis test.
Solution First, we check the two conditions required for using the paired Wilcoxon signed-rank test, as listed in Procedure 10.8.
FIGURE 10.17 Stem-and-leaf diagram (using five lines per stem) of the paired differences in Table 10.14
−0 | 3 2
−0 | 1
 0 | 1
 0 |
 0 | 4 4
 0 | 6 6
 0 | 8
 1 |
 1 | 3
• Assumption 1 is satisfied because we have a simple random paired sample. Each pair consists of a married couple.
• Figure 10.17 shows a stem-and-leaf diagram for the sample of paired differences in the last column of Table 10.14. Because the diagram is roughly symmetric, we can consider Assumption 2 satisfied.
From the preceding items, we see that the paired Wilcoxon signed-rank test can be used to conduct the required hypothesis test. We apply Procedure 10.8.
Step 1 State the null and alternative hypotheses.
Let μ1 denote the mean age of all married men, and let μ2 denote the mean age of all married women. Then the null and alternative hypotheses are, respectively,
H0: μ1 = μ2 (mean ages are equal)
Ha: μ1 ≠ μ2 (mean ages differ).
Note that the hypothesis test is two tailed.
Step 2 Decide on the significance level, α.
We are to perform the test at the 5% significance level, so α = 0.05.
Step 3 Compute the value of the test statistic
W = sum of the positive ranks.
The paired differences (d-values) are shown in the fourth column of Table 10.14.
We note that none of the paired differences equal 0 and proceed to construct the following work table. Observe that, in several instances, ties occur among the absolute paired differences (|d|-values). To deal with such ties, we proceed in the usual manner. Specifically, if two or more absolute paired differences are tied, each is assigned the mean of the ranks they would have had if there had been no ties. For instance, the second and seventh paired differences (−1 and 1) both have the smallest absolute paired difference, each of which is assigned rank (1 + 2)/2 or 1.5, as shown in the third column of the table.
Paired difference d   |d|   Rank of |d|   Signed rank R
6 6 7.5 7.5
−1 1 1.5 −1.5
−3 3 4 −4
4 4 5.5 5.5
6 6 7.5 7.5
−2 2 3 −3
1 1 1.5 1.5
4 4 5.5 5.5
13 13 10 10
8 8 9 9
Referring to the last column of the preceding table, we find that the value of the test statistic is
W = 7.5 + 5.5 + 7.5 + 1.5 + 5.5 + 10 + 9 = 46.5.
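The work table and the value of W can be checked with a few lines of code. The following is a minimal sketch in Python, not part of Procedure 10.8 itself; the function name `signed_ranks` is hypothetical, and the data are the ages from Table 10.14.

```python
def signed_ranks(diffs):
    """Return rows (d, |d|, rank of |d|, signed rank R) for the nonzero differences."""
    nonzero = [d for d in diffs if d != 0]        # Step 3: discard differences equal to 0
    abs_sorted = sorted(abs(d) for d in nonzero)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        # Tied values occupy ranks i+1 through j; each gets the mean of those ranks.
        rank_of[abs_sorted[i]] = (i + 1 + j) / 2
        i = j
    rows = []
    for d in nonzero:
        r = rank_of[abs(d)]
        rows.append((d, abs(d), r, r if d > 0 else -r))
    return rows

# Ages from Table 10.14.
husband = [59, 21, 33, 78, 70, 33, 68, 32, 54, 52]
wife = [53, 22, 36, 74, 64, 35, 67, 28, 41, 44]
diffs = [h - w for h, w in zip(husband, wife)]

rows = signed_ranks(diffs)
W = sum(r for (_, _, _, r) in rows if r > 0)      # sum of the positive ranks

for row in rows:
    print(row)
print("W =", W)
```

Each printed row corresponds to one line of the work table, and the final line gives the sum of the positive signed ranks.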
CRITICAL-VALUE APPROACH OR P-VALUE APPROACH
Step 4 The critical values for a two-tailed test are W1−α/2 and Wα/2. Use Table V and the relation W1−A = n(n + 1)/2 − WA to find the critical values.
From Table 10.14, we see that n = 10. The critical values for a two-tailed test at the 5% significance level are W1−0.05/2 and W0.05/2, that is, W1−0.025 and W0.025. First we use Table V to find W0.025. We go down the outside columns, labeled n, to “10.” Then, going across that row to the column labeled W0.025, we reach 47; thus W0.025 = 47. Now we apply the aforementioned relation and the result just obtained to get W1−0.025:
W1−0.025 = 10(10 + 1)/2 − W0.025 = 55 − 47 = 8.
See Fig. 10.18A.
FIGURE 10.18A
[Figure: W-axis with critical values 8 and 47; a rejection region of area 0.025 in each tail and the nonrejection region between them]
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
The value of the test statistic is W = 46.5, as found in Step 3, which does not fall in the rejection region shown in Fig. 10.18A. Thus we do not reject H0. The test results are not statistically significant at the 5% level.
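The critical-value steps just carried out can be sketched in code. This is an illustrative Python fragment only: the function names are hypothetical, the upper critical value 47 is the Table V entry quoted above, and treating the critical values themselves as part of the rejection region is an assumption that does not affect this example.

```python
def two_tailed_critical_values(n, w_upper):
    """Return (lower, upper) critical values, where w_upper is W_{alpha/2} from a table
    and the lower value follows from W_{1-A} = n(n + 1)/2 - W_A."""
    w_lower = n * (n + 1) // 2 - w_upper
    return w_lower, w_upper

def in_rejection_region(w, lower, upper):
    # Assumption: the boundaries count as part of the rejection region.
    return w <= lower or w >= upper

lower, upper = two_tailed_critical_values(10, 47)   # n = 10, W_0.025 = 47
decision = in_rejection_region(46.5, lower, upper)

print("critical values:", lower, upper)
print("reject H0:", decision)
```

With W = 46.5 lying strictly between the two critical values, the fragment reports that H0 is not rejected, in agreement with Step 5.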
Step 4 Obtain the P-value by using technology.
Using technology, we find that the P-value for the hypothesis test is P = 0.059, as shown in Fig. 10.18B.
FIGURE 10.18B
[Figure: P-value for the observed value W = 46.5; P = 0.059]
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
From Step 4, P = 0.059. Because the P-value exceeds the specified significance level of 0.05, we do not reject H0. The test results are not statistically significant at the 5% level but (see Table 9.8 on page 378) the data do nonetheless provide moderate evidence against the null hypothesis.
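The text does not say how the technology obtains the P-value. One common approach, assumed here purely for illustration, is a normal approximation to the distribution of W with a continuity correction and no adjustment for ties; under H0, W has mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24. The Python sketch below (function name hypothetical) applies that approximation to W = 46.5 and n = 10.

```python
import math

def normal_approx_two_tailed_p(w, n):
    """Two-tailed P-value for the signed-rank statistic W by normal approximation.
    Assumption: continuity correction of 0.5 and no tie adjustment."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    # Move the observed deviation half a unit toward the mean (never below 0).
    z = max(abs(w - mean) - 0.5, 0.0) / sd
    # 2 * P(Z >= z) for a standard normal Z equals erfc(z / sqrt(2)).
    return min(1.0, math.erfc(z / math.sqrt(2)))

p = normal_approx_two_tailed_p(46.5, 10)
print(round(p, 3))   # 0.059
```

That this approximation reproduces the reported value to three decimal places suggests, but does not establish, that the technology used a similar method.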
Step 6 Interpret the results of the hypothesis test.
Interpretation At the 5% significance level, the data do not provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
Report 10.8
Exercise 10.177 on page 498
It is interesting to note that, although we reject the null hypothesis of equal mean ages at the 5% significance level by using the paired t-test (Example 10.16), we do not reject it by using the paired Wilcoxon signed-rank test (Example 10.19). Nonetheless, the evidence against the null hypothesis is comparable with both tests: P = 0.048 and P = 0.059, respectively.
Comparing the Paired Wilcoxon Signed-Rank Test and the Paired t -Test
As we demonstrated in Section 10.5, a paired t-test can be used to conduct a hypothesis test to compare two population means when we have a paired sample and the paired-difference variable is normally distributed. Because normally distributed variables have symmetric distributions, we can also use the paired Wilcoxon signed-rank test to perform such a hypothesis test.
For a normally distributed paired-difference variable, the paired t-test is more powerful than the paired Wilcoxon signed-rank test because it is designed expressly for such paired-difference variables; surprisingly, though, the paired t-test is not much more powerful than the paired Wilcoxon signed-rank test. However, if the paired-difference variable has a symmetric distribution but is not normally distributed, the paired Wilcoxon signed-rank test is usually more powerful than the paired t-test and is often considerably more powerful.
KEY FACT 10.7 Paired Wilcoxon Signed-Rank Test Versus the Paired t-Test
Suppose that you want to perform a hypothesis test using a paired sample to compare the means of two populations. When deciding between the paired t-test and the paired Wilcoxon signed-rank test, follow these guidelines:
• If you are reasonably sure that the paired-difference variable is normally distributed, use the paired t-test.
• If you are not reasonably sure that the paired-difference variable is normally distributed but are reasonably sure that it has a symmetric distribution, use the paired Wilcoxon signed-rank test.
THE TECHNOLOGY CENTER
Some statistical technologies have programs that automatically perform a paired Wilcoxon signed-rank test. In this subsection, we present output and step-by-step instructions for such programs. Although many statistical technologies present the output of the paired Wilcoxon signed-rank test in terms of medians, it can also be interpreted in terms of means.
Note to Minitab users: At the time of this writing, Minitab does not have a built-in program for a paired Wilcoxon signed-rank test. You can conduct such a test, however, by applying Minitab’s (one-sample) Wilcoxon signed-rank test to the sample of paired differences, using the null hypothesis H0: μd = 0.
Note to TI-83/84 Plus users: At the time of this writing, the TI-83/84 Plus does not have a built-in program for a paired Wilcoxon signed-rank test. However, a TI program, WILCOX, to help with the calculations is located in the TI Programs folder on the WeissStats CD. See the TI-83/84 Plus Manual for details.
EXAMPLE 10.20 Using Technology to Conduct Paired Wilcoxon Signed-Rank Test
Ages of Married People The second and third columns of Table 10.14 on page 491 give the ages of 10 randomly selected married couples. Use Minitab or Excel to decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
Solution Let μ1 denote the mean age of all married men, and let μ2 denote the mean age of all married women. We want to perform the hypothesis test
H0: μ1 = μ2 (mean ages are equal)
Ha: μ1 ≠ μ2 (mean ages differ),
at the 5% significance level.
We applied the Wilcoxon signed-rank programs to the data, resulting in Output 10.5 on the following page. Steps for generating that output are presented in Instructions 10.5, also on the following page.
OUTPUT 10.5
Paired Wilcoxon signed-rank test on the age data
[Output: Excel and Minitab results for the test; screenshots not reproduced]
As shown in Output 10.5, the P-value for the hypothesis test exceeds the specified significance level of 0.05; hence, we do not reject H0. At the 5% significance level, the data do not provide sufficient evidence to conclude that the mean age of married men differs from the mean age of married women.
INSTRUCTIONS 10.5
Steps for generating Output 10.5
MINITAB
1 Store the data from the second and third columns of Table 10.14 in columns named HUSBAND and WIFE
2 Choose Calc ➤ Calculator. . .
3 Type DIFFERENCE in the Store result in variable text box
4 Specify ‘HUSBAND’–‘WIFE’ in the Expression text box and click OK
5 Choose Stat ➤ Nonparametrics ➤ 1-Sample Wilcoxon. . .
6 Specify DIFFERENCE in the Variables text box
7 Select the Test median option button
8 Type 0 in the Test median text box
9 Click the arrow button at the right of the Alternative drop-down list box and select not equal
10 Click OK
EXCEL
1 Store the data from the second and third columns of Table 10.14 in ranges named HUSBAND and WIFE
2 Choose DDXL ➤ Nonparametric Tests
3 Select Paired Wilcoxon from the Function type drop-down list box
4 Specify HUSBAND in the 1st Quantitative Variable text box
5 Specify WIFE in the 2nd Quantitative Variable text box
6 Click OK
7 Click the 0.05 button
8 Click the Two Tailed button
9 Click the Compute button
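For readers without access to Minitab or Excel, a P-value for a small sample can also be obtained by brute force. The sketch below, written in Python as an illustration and not as a description of what either program does, enumerates all 2^n equally likely assignments of signs to the observed ranks under H0 and counts those at least as far from the mean of W as the observed value. The function name is hypothetical, and the ranks are those of the work table in Example 10.19.

```python
from itertools import product

def exact_two_tailed_p(ranks, w_obs):
    """Exact two-tailed P-value for W, conditional on the observed (possibly tied) ranks.
    Under H0 each rank is positive or negative with probability 1/2, independently."""
    n = len(ranks)
    mean = sum(ranks) / 2                 # mean of W under H0
    dev = abs(w_obs - mean)               # observed distance from the mean
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)   # sum of the "positive" ranks
        if abs(w - mean) >= dev:
            count += 1
    return count / 2 ** n

# Ranks of |d| from the work table, in the order of the couples.
ranks = [7.5, 1.5, 4, 5.5, 7.5, 3, 1.5, 5.5, 10, 9]
p_exact = exact_two_tailed_p(ranks, 46.5)
print(p_exact)
```

The enumeration gives a P-value of 54/1024, or about 0.053, which differs somewhat from the reported 0.059 but leads to the same decision at the 5% significance level. Because the cost grows as 2^n, this approach is practical only for small samples.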
Exercises 10.6
Understanding the Concepts and Skills
10.162 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations and that you know that the paired-difference variable is normally distributed. Answer each question and explain your answers.
a. Is it acceptable to use the paired t-test?
b. Is it acceptable to use the paired Wilcoxon signed-rank test?
c. Which test is preferable, the paired t-test or the paired Wilcoxon signed-rank test?
10.163 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations and you know that the paired-difference variable has a symmetric distribution that is far from normal.
a. Is use of the paired t-test acceptable if the sample size is small or moderate? Why or why not?
b. Is use of the paired t-test acceptable if the sample size is large? Why or why not?
c. Is use of the paired Wilcoxon signed-rank test acceptable? Why or why not?
d. If both the paired t-test and the paired Wilcoxon signed-rank test are acceptable, which test is preferable? Explain your answer.
10.164 A hypothesis test based on a simple random paired sample is to be performed to compare the means of two populations. The sample of 15 paired differences contains an outlier but otherwise is approximately bell shaped. Assuming that removing the outlier is not legitimate, which test is better to use: the paired t-test or the paired Wilcoxon signed-rank test? Explain your answer.
10.165 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations. For each part, decide whether you would use the paired t-test, the paired Wilcoxon signed-rank test, or neither of these tests. Preliminary data analyses of the sample of paired differences suggest that the distribution of the paired-difference variable is
a. approximately normal.
b. highly skewed; the sample size is 20.
c. symmetric bimodal.
10.166 Suppose that you want to perform a hypothesis test based on a simple random paired sample to compare the means of two populations. For each part, decide whether you would use the paired t-test, the paired Wilcoxon signed-rank test, or neither of these tests. Preliminary data analyses of the sample of paired differences suggest that the distribution of the paired-difference variable is
a. uniform.
b. not symmetric; the sample size is 132.
c. moderately skewed but otherwise approximately bell shaped.
In each of Exercises 10.167–10.172, the null hypothesis is H0: μ1 = μ2 and the alternative hypothesis is as specified. We have provided data from a simple random paired sample from the two populations under consideration. In each case, use the paired Wilcoxon signed-rank test to perform the required hypothesis test at the 10% significance level. (Note: These problems were presented as Exercises 10.135–10.140 in Section 10.5, where they were to be solved by using the paired t-test.)
10.167 Ha: μ1 ≠ μ2
Pair   Observation from Population 1   Observation from Population 2
1 13 11
2 16 15
3 13 10
4 14 8
5 12 8
6 8 9
7 17 14
10.168 Ha: μ1 < μ2
Pair   Observation from Population 1   Observation from Population 2
1 7 13
2 4 9
3 10 6
4 0 2
5 20 19
6 −1 5
7 12 10
10.169 Ha: μ1 > μ2
Pair   Observation from Population 1   Observation from Population 2
1 7 3
2 4 5
3 9 8
4 7 2
5 19 16
6 12 12
7 13 18
8 5 11
10.170 Ha: μ1 ≠ μ2
Pair   Observation from Population 1   Observation from Population 2
1 10 12
2 8 7
3 13 11
4 13 16
5 17 15
6 12 9
7 12 12
8 11 7
10.171 Ha: μ1 < μ2
Pair   Observation from Population 1   Observation from Population 2
1 15 18
2 22 25
3 15 17
4 27 24
5 24 30
6 23 23
7 8 10
8 20 27
9 2 3
10.172 Ha: μ1 > μ2
Pair   Observation from Population 1   Observation from Population 2
1 40 32
2 30 29
3 34 36
4 22 18
5 35 31
6 26 26
7 26 25
8 27 25
9 11 15
10 35 31
Exercises 10.173–10.178 repeat Exercises 10.141–10.146 of Section 10.5. There, you applied the paired t-test to solve each problem. Now solve each problem by applying the paired Wilcoxon signed-rank test.
10.173 Zea Mays. Charles Darwin, author of Origin of Species, investigated the effect of cross-fertilization on the heights of plants. In one study he planted 15 pairs of Zea mays plants. Each pair consisted of one cross-fertilized plant and one self-fertilized plant grown in the same pot. The following table gives the height differences, in eighths of an inch, for the 15 pairs. Each difference is obtained by subtracting the height of the self-fertilized plant from that of the cross-fertilized plant.
49 −67 8 16 6
23 28 41 14 29
56 24 75 60 −48
a. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean heights of cross-fertilized and self-fertilized Zea mays differ?
b. Repeat part (a) at the 1% significance level.
10.174 Sleep. In 1908, W. S. Gosset published “The Probable Error of a Mean” (Biometrika, Vol. 6, pp. 1–25). In this pioneering paper, published under the pseudonym “Student,” he introduced what later became known as Student’s t-distribution. Gosset used the following data set, which gives the additional sleep in hours obtained by 10 patients who used laevohysocyamine hydrobromide.
1.9 0.8 1.1 0.1 −0.1 4.4 5.5 1.6 4.6 3.4
a. At the 5% significance level, do the data provide sufficient evidence to conclude that laevohysocyamine hydrobromide is effective in increasing sleep?
b. Repeat part (a) at the 1% significance level.
10.175 Anorexia Treatment. Anorexia nervosa is a serious eating disorder, particularly among young women. The following data provide the weights, in pounds, of 17 anorexic young women before and after receiving a family therapy treatment for anorexia nervosa. [SOURCE: D. Hand et al., ed., A Handbook of Small Data Sets, London: Chapman & Hall, 1994; raw data from B. Everitt (personal communication)]
Before After Before After Before After
83.3 94.3 76.9 76.8 82.1 95.5
86.0 91.5 94.2 101.6 77.6 90.7
82.5 91.9 73.4 94.9 83.5 92.5
86.7 100.3 80.5 75.2 89.9 93.8
79.6 76.7 81.6 77.8 86.0 91.7
87.3 98.0 83.8 95.2
Does family therapy appear to be effective in helping anorexic young women gain weight? Perform the appropriate hypothesis test at the 5% significance level.
10.176 Measuring Treadwear. R. Stichler et al. compared two methods of measuring treadwear in their paper “Measurement of Treadwear of Commercial Tires” (Rubber Age, 73:2). Eleven tires were each measured for treadwear by two methods, one based on weight and the other on groove wear. The following are the data, in thousands of miles.
Weight method   Groove method   Weight method   Groove method
30.5 28.7 24.5 16.1
30.9 25.9 20.9 19.9
31.9 23.3 18.9 15.2
30.4 23.1 13.7 11.5
27.3 23.7 11.4 11.2
20.4 20.9
At the 5% significance level, do the data provide sufficient evidence to conclude that, on average, the two measurement methods give different results?
10.177 Glaucoma and Corneal Thickness. Glaucoma is a leading cause of blindness in the United States. N. Ehlers measured the corneal thickness of eight patients who had glaucoma in one eye but not in the other. The results of the study were published in the paper “On Corneal Thickness and Intraocular Pressure, II” (Acta Opthalmologica, Vol. 48, pp. 1107–1112). The following are the data on corneal thickness, in microns.
Patient Normal Glaucoma
1 484 488
2 478 478
3 492 480
4 444 426
5 436 440
6 398 410
7 464 458
8 476 460
At the 10% significance level, do the data provide sufficient evidence to conclude that mean corneal thickness is greater in normal eyes than in eyes with glaucoma?
10.178 Fortified Orange Juice. V. Tangpricha et al. conducted a study to determine whether fortifying orange juice with vitamin D would increase serum 25-hydroxyvitamin D [25(OH)D] concentration in the blood. The researchers reported their findings in the paper “Fortification of Orange Juice with Vitamin D: A Novel Approach for Enhancing Vitamin D Nutritional Health” (American Journal of Clinical Nutrition, Vol. 77, pp. 1478–1483). A double-blind experiment was used in which 14 subjects drank 240 mL per day of orange juice fortified with 1000 IU of vitamin D and 12 subjects drank 240 mL per day of unfortified orange juice. Concentration levels were recorded at the beginning of the experiment and again at the end of 12 weeks. The following data, based on the results of the study, provide the before and after serum 25(OH)D concentrations in the blood, in nanomoles per liter (nmol/L), for the group that drank the fortified juice.
Before After Before After
8.6 33.8 3.9 75.0
32.3 137.0 1.5 83.3
60.7 110.6 18.1 71.5
20.4 52.7 100.9 142.0
39.4 110.5 84.3 171.4
15.7 39.1 32.3 52.1
58.3 124.1 41.7 112.9
At the 1% significance level, do the data provide sufficient evidence to conclude that, on average, drinking fortified orange juice increases the serum 25(OH)D concentration in the blood?
10.179 Tobacco Mosaic Virus. To assess the effects of two different strains of the tobacco mosaic virus, W. Youden and H. Beale randomly selected eight tobacco leaves. Half of each leaf was subjected to one of the strains of tobacco mosaic virus and the other half to the other strain. The researchers then counted the number of local lesions apparent on each half of each leaf. The results of their study were published in the paper “A Statistical Study of the Local Lesion Method for Estimating Tobacco Mosaic Virus” (Contributions to Boyce Thompson Institute, Vol. 6, p. 437). Here are the data.
Leaf 1 2 3 4 5 6 7 8
Virus 1 31 20 18 17 9 8 10 7
Virus 2 18 17 14 11 10 7 5 6
Suppose that you want to perform a hypothesis test to determine whether a difference exists between the mean numbers of local lesions resulting from the two viral strains. Conduct preliminary graphical analyses to decide whether applying the paired Wilcoxon signed-rank test is reasonable. Explain your decision.
10.180 Improving Car Emissions? The makers of the MAGNETIZER Engine Energizer System (EES) claim that it improves gas mileage and reduces emissions in automobiles by using magnetic free energy to increase the amount of oxygen in the fuel for greater combustion efficiency. Following are test results, performed under international and U.S. Government agency standards, on a random sample of 14 vehicles. The data give the carbon monoxide (CO) levels, in parts per million, of each vehicle tested, both before installation of EES and after installation. [SOURCE: Global Source Marketing]
Before After Before After
1.60 0.15 2.60 1.60
0.30 0.20 0.15 0.06
3.80 2.80 0.06 0.16
6.20 3.60 0.60 0.35
3.60 1.00 0.03 0.01
1.50 0.50 0.10 0.00
2.00 1.60 0.19 0.00
Suppose that you want to perform a hypothesis test to determine whether, on average, EES reduces CO emissions. Conduct preliminary graphical data analyses to decide whether applying the paired Wilcoxon signed-rank test is reasonable. Explain your decision.
10.181 Consonantal Inventory Size. In the article “Intervocalic Consonants in the Speech of Typically Developing Children: Emergence and Early Use” (Clinical Linguistics and Phonetics, Vol. 16, Issue 3, pp. 155–168), C. Stoel-Gammon examined the development of intervocalic consonants (consonants appearing between two vowels) by children during the first years of life. The following data provide word-initial and word-final consonantal inventory sizes for nine children at age 21 months.
Child 1 2 3 4 5 6 7 8 9
Initial 16 14 13 12 12 11 8 7 6
Final 4 10 0 7 7 6 3 4 6
Suppose that you want to use these data to perform a hypothesis test to determine whether mean word-initial consonantal inventory size is greater than mean word-final consonantal inventory size. Conduct preliminary graphical data analyses to decide whether it is reasonable to apply the
a. paired t-test.
b. paired Wilcoxon signed-rank test.
Working with Large Data Sets
10.182 Faculty Salaries. The American Association of University Professors (AAUP) conducts salary studies of college professors and publishes its findings in AAUP Annual Report on the Economic Status of the Profession. Pairs were formed by matching faculty in private and public institutions by rank and specialty.