Referring to Table 16.11, we see that we can declare the means μ2 and μ3 different and the means μ2 and μ4 different; all other pairs of means are not declared different.
Step 5 Summarize the results in Step 4 by ranking the sample means from smallest to largest and by connecting with lines those whose population means were not declared different.
In light of Table 16.10, Step 4, and the numbering used to represent the U.S. regions (shown parenthetically), we obtain the following diagram.
West (4)    South (3)    Northeast (1)    Midwest (2)
7.2         7.5          11.0             12.5
(One line connects West, South, and Northeast; a second line connects Northeast and Midwest.)
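Step 5 can also be sketched in code. The following minimal Python sketch sorts the sample means shown above and lists the maximal runs of consecutive regions in which no pair of means was declared different; each run corresponds to one connecting line. The names `means`, `different`, and `connected_groups` are illustrative assumptions, not part of any statistical package, and the set of differing pairs is taken from Step 4.

```python
# Sample means from the diagram in Step 5 (region -> sample mean).
means = {"Northeast": 11.0, "Midwest": 12.5, "South": 7.5, "West": 7.2}

# Pairs of means declared different in Step 4 (mu2 vs. mu3 and mu2 vs. mu4).
different = {frozenset(("Midwest", "South")), frozenset(("Midwest", "West"))}


def connected_groups(means, different):
    """Return the regions ordered by sample mean and the maximal runs of
    consecutive regions in which no pair was declared different."""
    order = sorted(means, key=means.get)
    groups = []
    last_end = -1
    for start in range(len(order)):
        end = start
        # Extend the run while the next region differs from none in the run.
        while end + 1 < len(order) and all(
            frozenset((order[i], order[end + 1])) not in different
            for i in range(start, end + 1)
        ):
            end += 1
        # Keep the run only if it is not contained in the previous run.
        if end > last_end:
            groups.append(order[start:end + 1])
            last_end = end
    return order, groups


order, groups = connected_groups(means, different)
print(order)
for group in groups:
    print(" - ".join(group))
```

Under these assumptions the sketch reports two runs, one joining West, South, and Northeast and one joining Northeast and Midwest, which matches the interpretation in Step 6.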
Step 6 Interpret the results of the multiple comparison.
Interpretation Referring to the diagram in Step 5, we conclude that last year’s mean energy consumption in the Midwest exceeds that in the West and South and that no other means can be declared different. All of this can be said with 95% confidence, the family confidence level.
Report 16.2
Exercise 16.95 on page 748
THE TECHNOLOGY CENTER
Some statistical technologies have programs that automatically perform a Tukey multiple comparison. In this subsection, we present output and step-by-step instructions for such programs. (Note to TI-83/84 Plus users: At the time of this writing, the TI-83/84 Plus does not have a built-in program for conducting a Tukey multiple comparison. However, a TI program, TUKEY, to help with the analysis is located in the TI Programs section on the WeissStats site.)
EXAMPLE 16.7 Using Technology to Conduct a Tukey Multiple Comparison
Energy Consumption Table 16.9 on page 744 shows last year’s energy consumptions for independent simple random samples of households in the four U.S. regions. Apply Minitab or Excel to conduct a Tukey multiple comparison, using a 95% family confidence level.
Solution We applied the Tukey multiple-comparison programs to the data. Output 16.2 shows only the portion of the output essential to the Tukey multiple comparison. Steps for generating that output are presented in Instructions 16.2 on this and the next page.
OUTPUT 16.2 Tukey multiple comparison on the energy-consumption data (panels: MINITAB, EXCEL)
In Output 16.2, means that do not share a common letter are declared different.
Thus, we see that last year’s mean energy consumption in the Midwest exceeds that in the West and South and that no other means can be declared different. All of this can be said with 95% confidence, the family confidence level.
INSTRUCTIONS 16.2 Steps for generating Output 16.2
MINITAB
1 Store all 20 energy consumptions from Table 16.9 in a column named ENERGY
2 Store the regions corresponding to the energy consumptions in a column named REGION
3 Choose Stat ➤ ANOVA ➤ One-Way...
4 Press the F3 key to reset the dialog box
5 Specify ENERGY in the Response text box
6 Specify REGION in the Factor text box
7 Click the Comparisons... button
8 Type 5 (100 minus the family confidence level expressed as a percentage) in the Error rate for comparisons text box
9 In the Comparison procedures assuming equal variances check-box list, check the Tukey check box
10 In the Results list, uncheck the Interval plot for differences of means check box
11 Click OK
12 Click the Graphs... button, uncheck the Interval plot check box, and click OK
13 Click the Results... button, uncheck all the check boxes, and click OK twice
EXCEL
1 Store all 20 energy consumptions from Table 16.9 in a column named ENERGY
2 Store the regions corresponding to the energy consumptions in a column named REGION
3 Choose XLSTAT ➤ Modeling data ➤ ANOVA
4 Click the reset button in the lower left corner of the dialog box
5 In the Y / Dependent variables list, click in the Quantitative selection box and then select the column of the worksheet that contains the ENERGY data
6 In the X / Explanatory variables list, click in the Qualitative selection box and then select the column of the worksheet that contains the REGION data
7 Click the Options tab and then type 95 in the Confidence interval (%) text box
8 Click the Outputs tab and uncheck all check boxes
9 Click the Multiple comparisons subtab
10 Check the Pairwise comparisons check box
11 In the Pairwise comparisons list box, ensure that only Tukey (HSD) is checked
12 Click the Charts tab and then uncheck the Regression charts and Means charts check boxes
13 Click OK
14 Click the Continue button in the XLSTAT – Selections dialog box
Exercises 16.4
Understanding the Concepts and Skills
16.76 What is the purpose of doing a multiple comparison?
16.77 Fill in the blank: If a confidence interval for the difference between two population means does not contain ______, we can reject the null hypothesis that the two means are equal in favor of the alternative hypothesis that the two means are different; and vice versa.
16.78 Explain the difference between the family confidence level and the individual confidence level.
16.79 Regarding family and individual confidence levels, answer the following questions and explain your answers.
a. Which is smaller for multiple comparisons involving three or more means, the family confidence level or the individual confidence level?
b. For multiple comparisons involving two means, what is the relationship between the family confidence level and the individual confidence level?
16.80 What is the name of the distribution on which the Tukey multiple-comparison method is based? What is its abbreviation?
16.81 The parameter ν for the q-curve in a Tukey multiple comparison equals one of the degrees of freedom for the F-curve in a one-way ANOVA. Which one?
16.82 Explain the essential difference between obtaining a confidence interval by using the pooled t-interval procedure and obtaining a confidence interval by using the Tukey multiple-comparison procedure.
16.83 Determine the following for a q-curve with parameters κ = 6 and ν = 13.
a. The q-value having area 0.05 to its right
b. q_0.01
16.84 Determine the following for a q-curve with parameters κ = 8 and ν = 20.
a. The q-value having area 0.01 to its right
b. q_0.05
16.85 Find the following for a q-curve with parameters κ = 9 and ν = 30.
a. The q-value having area 0.01 to its right
b. q_0.05
16.86 Find the following for a q-curve with parameters κ = 4 and ν = 11.
a. The q-value having area 0.05 to its right
b. q_0.01
16.87 Suppose that you conduct a one-way ANOVA test and find that the test is not statistically significant at the 5% level. If you subsequently perform a Tukey multiple comparison at a family confidence level of 0.95, what will be the results? Explain your answer.
In Exercises 16.88–16.93, we repeat the data from Exercises 16.42–16.47 of Section 16.3 for independent simple random samples from several populations. In each case, conduct a Tukey multiple comparison at the 95% family confidence level. Interpret your results.
16.88
Sample 1 Sample 2 Sample 3
1 10 4
9 4 16
8 10
6 2
16.89
Sample 1 Sample 2 Sample 3
8 2 4
4 1 3
6 3 6
3
16.90
Sample 1 Sample 2 Sample 3 Sample 4
6 9 4 8
3 5 4 4
3 7 2 6
8 2
6 3
16.91
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5
7 5 6 3 7
4 9 7 7 9
5 4 5 7 11
4 4 4
8 4
16.92
Sample 1 Sample 2 Sample 3 Sample 4 Sample 5
4 8 9 4 3
2 5 6 0 6
3 5 9 2 9
16.93
Sample 1 Sample 2 Sample 3 Sample 4
11 9 16 5
6 2 10 1
7 4 10 3
Applying the Concepts and Skills
In Exercises 16.94–16.99, use Procedure 16.2 on page 743 to perform a Tukey multiple comparison at the specified family confidence level.
16.94 Book Review. Following are the data from Exercise 16.48 on number of pages for random samples of books in five rating groups.
Use a family confidence level of 0.99.
1* 2* 3* 4* 5*
382 560 384 325 360
391 343 458 390 298
335 512 409 304 272
368 329 309 240 368
400 391 374 306 320
372 367 459 169 326
16.95 Copepod Cuisine. Following are the data on the number of copepods in each of 12 containers after 14 days for three different diets from Exercise 16.49. Use a family confidence level of 0.95.
Diatoms Bacteria Macroalgae
426 303 277
467 301 324
438 293 302
497 328 272
16.96 From Exercise 16.50: In Section 16.2, we considered two hypothetical examples to explain the logic for one-way ANOVA.
a. Refer to Table 16.1 on page 724 (95% family confidence level).
b. Refer to Table 16.2 on page 724 (95% family confidence level).
16.97 Theraphosaidea. Following are the data from Exercise 16.51 on Femur size of the four legs of the new genus of Theraphosaidea.
Leg I Leg II Leg III Leg IV
7.72 6.74 6.44 9.16
6.38 5.39 5.82 8.37
8.59 7.36 6.92 10.0
7.36 6.22 6.72 9.69
6.93 6.98 6.01 8.71
Use a 95% family confidence level.
16.98 Permeation Sampling. Following are the data from Exercise 16.52 on experimentally obtained calibration constants for samples of compounds in each of four compound groups.
Esters    Alcohols    Aliphatic hydrocarbons    Aromatic hydrocarbons
0.185 0.185 0.230 0.166
0.155 0.160 0.184 0.144
0.131 0.142 0.160 0.117
0.103 0.122 0.132 0.072
0.064 0.117 0.100
0.115 0.064
0.110 0.095 0.085 0.075
a. Use a 95% family confidence level.
b. Without doing any further work or referring to Exercise 16.52, decide at the 5% significance level whether the data provide sufficient evidence to conclude that a difference exists in mean calibration constant among the four compound groups. Explain your reasoning.
16.99 SmartPhone Battery Life. In the following table are the data from Exercise 16.53 on battery lives, in hours, for samples of SmartPhones made by four different mobile companies. The four brands are iPhone 6s, iPhone 6s+, LG G3, and Samsung Galaxy S5, but we have not used the names and have permuted the order. Use a 95% family confidence level.
Brand A Brand B Brand C Brand D
19.60 21.10 10.31 17.02
18.82 20.00 10.02 16.71
19.00 20.43 9.41 17.78
18.45 19.67 9.89 18.65
19.79 18.99 10.05 15.98
19.03 19.98 10.52 17.63
17.89 20.14 11.02 17.00
19.42 19.78 10.42 16.78
16.92 17.14
In Exercises 16.100–16.105, use the technology of your choice to perform and interpret a Tukey multiple comparison at the specified family confidence level. All data sets are on the WeissStats site.
16.100 Empty Stomachs. The data from Exercise 16.54 on the proportions of fish with empty stomachs among four species in African waters. Use a 99% family confidence level.
16.101 Monthly Rents. The data from Exercise 16.55 on monthly rents, in dollars, for independent random samples of newly completed apartments in the four U.S. regions. Use a 95% family confidence level.
16.102 Ground Water. The data from Exercise 16.56 on the concentrations, in milligrams per liter, of each of four chemicals among three different wells. Use a 99% family confidence level.
16.103 Rock Sparrows. The data from Exercise 16.57 on the number of minutes per hour that male Rock Sparrows sang in the vicinity of the nests after patch-size manipulations were done on three different groups of females. Use a 99% family confidence level.
16.104 Artificial Teeth: Wear. The data from Exercise 16.58 on the volume of material worn away, in cubic millimeters, among three different materials for making artificial teeth. Use a 95% family confidence level.
16.105 Artificial Teeth: Hardness. The data from Exercise 16.59 on the Vickers microhardness (VHN) of the occlusal surfaces among three different materials for making artificial teeth. Use a 95% family confidence level.
In Exercises 16.106–16.109, use Procedure 16.2 on page 743 to perform a Tukey multiple comparison at the specified family confidence level. Note: We have provided values of q_α not given in Table IX or X.
16.106 Breast Milk and IQ. Following are summary statistics from Exercise 16.60 on IQ for samples of children at age 7½–8 years who were born preterm. The researchers used the following designations. Group I: mothers declined to provide breast milk; Group IIa: mothers had chosen but were unable to provide breast milk; and Group IIb: mothers had chosen and were able to provide breast milk. Use a family confidence level of 0.99. Here q_α = 4.15.
Group    n_j    x̄_j     s_j
I         90     92.8    15.2
IIa       17     94.8    19.0
IIb      193    103.7    15.3
16.107 Denosumab and Osteoporosis. In the following table are summary statistics from Exercise 16.61 on body-mass indexes (BMI) of the women in five denosumab treatment groups. Use a family confidence level of 0.90. Here q_α = 3.50.
Treatment    n_j    x̄_j    s_j
Placebo      46     25.9    4.3
14 mg        54     25.8    5.3
60 mg        47     27.5    5.8
100 mg       42     26.0    4.6
210 mg       47     25.9    4.3
16.108 Minke Whales. In the following table are summary statistics from Exercise 16.62 on body lengths, in meters, of minke whales entangled at four different ocean depths, in meters. Use a 99% family confidence level. Here q_α = 4.52.
Depth      n_j    x̄_j    s_j
0–49       39     4.49    0.80
50–99      20     5.06    0.79
100–149    28     5.75    1.20
150–199    14     5.99    0.97
16.109 Starting Salaries. Following are summary statistics from Exercise 16.63 on starting salaries, in thousands of dollars, for samples of bachelor’s-degree graduates in six fields. Use a family confidence level of 0.99. Here q_α = 4.85.
Field               n_j    x̄_j    s_j
Business            46     55.1    5.6
Communications      11     44.6    4.7
Computer Science    30     59.1    4.0
Education           11     40.6    5.0
Engineering         44     62.6    5.7
Math & Sciences     18     43.0    4.8
Working with Large Data Sets
In Exercises 16.110–16.118, we repeat information from Exercises 16.64–16.72, where you were asked to decide whether conducting a one-way ANOVA test on the data is reasonable. For those exercises where it is, use the technology of your choice to perform and interpret a Tukey multiple comparison at the 95% family confidence level. All data sets are on the WeissStats site.
16.110 Daily TV Viewing Time. The data from Exercise 16.64 on the daily TV viewing times, in hours, of independent simple random samples of men, women, teens, and children.
16.111 Fish of Lake Laengelmaevesi. The data from Exercise 16.65 on weight (in grams) and length (in centimeters) from the nose to the beginning of the tail for four species of fish caught in Lake Laengelmaevesi, Finland. Consider both the weight and length data for possible analysis.
16.112 Popular Diets. The data from Exercise 16.66 on weight losses, in kilograms, over a 1-year period of four popular diets. Recall that negative losses are gains and that WW = Weight Watchers.
16.113 Cuckoo Care. The data from Exercise 16.67 on the lengths, in millimeters, of cuckoo eggs found in the nests of six bird species.
16.114 Doing Time. The data from Exercise 16.68 on times served, in months, of independent simple random samples of released prisoners among five different offense categories.
16.115 Book Prices. The data from Exercise 16.69 on book prices, in dollars, for independent random samples of hardcover books in law, science, medicine, and technology.
16.116 Magazine Ads. The data from Exercise 16.70 on the number of words of three syllables or more in advertisements from magazines of three different educational levels.
16.117 Sickle Cell Disease. The data from Exercise 16.71 on the steady-state hemoglobin levels of patients with three different types of sickle cell disease.
16.118 Prolonging Life. The data from Exercise 16.72 on the survival times, in days, among samples of patients in advanced stages of cancer, grouped by the affected organ, who were given a vitamin C supplement.
Extending the Concepts and Skills
16.119 Explain why the family confidence level, not the individual confidence level, is the appropriate level for comparing all population means simultaneously.
16.120 In Step 3 of Procedure 16.2, we obtain confidence intervals only when i < j. Explain how to determine the remaining confidence intervals from those obtained.
16.121 Energy Consumption. Apply Table 16.11 on page 745 and your answer from Exercise 16.120 to determine the remaining six confidence intervals for the differences between the energy consumption means.
16.5 The Kruskal–Wallis Test∗
In this section, we examine the Kruskal–Wallis test, a nonparametric alternative to the one-way ANOVA procedure discussed in Section 16.3. The Kruskal–Wallis test applies when the distributions (one for each population) of the variable under consideration have the same shape in the sense of Definition 10.1 on page 492; it does not require that the distributions be normal or have any other specific shape.
Like the Mann–Whitney test, the Kruskal–Wallis test is based on ranks. When ties occur, ranks are assigned in the same way as in the Mann–Whitney test: If two or more observations are tied, each is assigned the mean of the ranks they would have had if there were no ties.
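The tie rule can be illustrated with a short Python sketch. The function name `midranks` and the four-value list are hypothetical illustrations, not part of the text's procedures: tied observations share the mean of the ranks they would otherwise occupy.

```python
def midranks(values):
    """Rank the values from smallest to largest, assigning tied values
    the mean of the ranks they would have had if there were no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Locate the block of sorted positions that hold the same value.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Sorted positions pos..end would receive ranks pos+1..end+1.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical illustration: the two 7.2s would occupy ranks 3 and 4,
# so each is assigned their mean.
sample = [7.2, 1.8, 7.2, 6.5]
print(midranks(sample))  # [3.5, 1.0, 3.5, 2.0]
```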
EXAMPLE 16.8 Introducing the Kruskal–Wallis Test
Vehicle Miles The Federal Highway Administration conducts annual surveys on motor vehicle travel by type of vehicle and publishes its findings in Highway Statistics. Independent simple random samples of cars, buses, and trucks yielded the data on number of thousands of miles driven last year shown in Table 16.12.
Suppose that we want to use the sample data in Table 16.12 to decide whether a difference exists in last year’s mean number of miles driven among cars, buses, and trucks.
TABLE 16.12 Number of miles driven (1000s) last year for independent samples of cars, buses, and trucks
Cars     Buses    Trucks
19.9      1.8     24.6
15.3      7.2     37.0
 2.2      7.2     21.2
 6.8      6.5     23.6
34.2     13.3     23.0
 8.3     25.4     15.3
12.0              57.1
 7.0              14.5
 9.5              26.0
 1.1
a. Formulate the problem statistically by posing it as a hypothesis test.
b. Is it appropriate to apply the one-way ANOVA test here? What about the Kruskal–Wallis test?
c. Explain the basic idea for carrying out a Kruskal–Wallis test.
d. Discuss the use of the sample data in Table 16.12 to make a decision concerning the hypothesis test.
Solution
a. Let μ1, μ2, and μ3 denote last year’s mean number of miles driven for cars, buses, and trucks, respectively. Then the null and alternative hypotheses are, respectively,
H0: μ1 = μ2 = μ3 (mean miles driven are equal)
Ha: Not all the means are equal.
FIGURE 16.12 Stem-and-leaf diagrams of the three samples in Table 16.12: (a) Cars, (b) Buses, (c) Trucks
b. We constructed stem-and-leaf diagrams of the three samples, as shown in Fig. 16.12. These diagrams suggest that the distributions of miles driven have roughly the same shape for cars, buses, and trucks but that those distributions are far from normal. Thus, although the one-way ANOVA test of Section 16.3 is probably inappropriate, the Kruskal–Wallis procedure appears suitable.†
c. To apply the Kruskal–Wallis test, we first rank the data from all three samples combined, as shown in Table 16.13.
The idea behind the Kruskal–Wallis test is simple: If the null hypothesis of equal population means is true, the means of the ranks for the three samples should be roughly equal. Put another way, if the variation among the mean ranks for the three samples is too large, we have evidence against the null hypothesis.
†To explain the Kruskal–Wallis test, we have chosen an example with very small sample sizes. However, because having very small sample sizes makes effectively checking the same-shape condition difficult, proceed cautiously when dealing with them.
TABLE 16.13 Results of ranking the combined data from Table 16.12
Cars     Rank     Buses    Rank     Trucks   Rank
19.9     16        1.8      2       24.6     20
15.3     14.5      7.2      7.5     37.0     24
 2.2      3        7.2      7.5     21.2     17
 6.8      5        6.5      4       23.6     19
34.2     23       13.3     12       23.0     18
 8.3      9       25.4     21       15.3     14.5
12.0     11                         57.1     25
 7.0      6                         14.5     13
 9.5     10                         26.0     22
 1.1      1
Mean ranks: 9.850 (cars), 9.000 (buses), 19.167 (trucks)
To measure the variation among the mean ranks, we use the treatment sum of squares, SSTR, computed for the ranks. To decide whether that quantity is too large, we compare it to the variance of all the ranks, which can be expressed as SST/(n − 1), where SST is the total sum of squares for the ranks and n is the total number of observations.† More precisely, the test statistic for a Kruskal–Wallis test, denoted K, is
\[ K = \frac{SSTR}{SST/(n-1)}. \tag{16.5} \]
What Does It Mean? The K-statistic is the ratio of the variation among the mean ranks to the variation of all the ranks.
Large values of K indicate that the variation among the mean ranks is large (relative to the variance of all the ranks) and hence that the null hypothesis of equal population means should be rejected.
d. For the ranks in Table 16.13, we find that SSTR = 537.475, SST = 1299, and n = 25. Thus the value of the test statistic is
\[ K = \frac{SSTR}{SST/(n-1)} = \frac{537.475}{1299/24} = 9.930. \]
Is this value of K large enough to conclude that the null hypothesis of equal population means is false? To answer this question, we need to know the distribution of the variable K.
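The computation in part (d) can be checked with a short Python sketch. It ranks the combined data from Table 16.12 (using mean ranks for ties), applies the sums of squares to the ranks, and evaluates Formula (16.5). The function name `kruskal_wallis_K` and the variable names are illustrative assumptions, not a library API.

```python
# Data from Table 16.12 (thousands of miles driven last year).
cars = [19.9, 15.3, 2.2, 6.8, 34.2, 8.3, 12.0, 7.0, 9.5, 1.1]
buses = [1.8, 7.2, 7.2, 6.5, 13.3, 25.4]
trucks = [24.6, 37.0, 21.2, 23.6, 23.0, 15.3, 57.1, 14.5, 26.0]


def kruskal_wallis_K(samples):
    """Return (mean ranks, SSTR, SST, K), with the sums of squares
    computed for the ranks of the combined data."""
    combined = sorted(x for sample in samples for x in sample)
    n = len(combined)

    def rank(value):
        # Mean of the 1-based sorted positions occupied by this value.
        first = combined.index(value) + 1
        count = combined.count(value)
        return first + (count - 1) / 2

    rank_samples = [[rank(x) for x in sample] for sample in samples]
    grand_mean = (n + 1) / 2  # mean of all n ranks, with or without ties
    mean_ranks = [sum(r) / len(r) for r in rank_samples]
    sstr = sum(len(r) * (m - grand_mean) ** 2
               for r, m in zip(rank_samples, mean_ranks))
    sst = sum((x - grand_mean) ** 2 for r in rank_samples for x in r)
    K = sstr / (sst / (n - 1))
    return mean_ranks, sstr, sst, K


mean_ranks, sstr, sst, K = kruskal_wallis_K([cars, buses, trucks])
print([round(m, 3) for m in mean_ranks])  # [9.85, 9.0, 19.167]
print(round(sstr, 3), round(sst, 3), round(K, 3))  # 537.475 1299.0 9.93
```

The printed values agree with the mean ranks in Table 16.13 and with SSTR = 537.475, SST = 1299, and K = 9.930.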
KEY FACT 16.5 Distribution of the K-Statistic for a Kruskal–Wallis Test
Suppose that the k distributions (one for each population) of the variable under consideration have the same shape. Then, for independent samples from the k populations, the variable
\[ K = \frac{SSTR}{SST/(n-1)} \]
has approximately a chi-square distribution with df = k − 1 if the null hypothesis of equal population means is true. Here, n denotes the total number of observations.
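As a rough illustration of how Key Fact 16.5 might be used, the Python sketch below evaluates the area to the right of an observed K under a chi-square curve. It relies on the closed-form right-tail area that holds only when df is even (for df = 2 the area is simply e^(−x/2)), which covers the vehicle-miles example, where k = 3. The function name is a hypothetical helper; for odd df a statistical table or software would be needed instead.

```python
import math


def chi_square_right_tail_even_df(x, df):
    """Area to the right of x under the chi-square curve with an even
    number of degrees of freedom (closed form; not valid for odd df)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles only positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    # Sum (x/2)^i / i! for i = 0, ..., df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Vehicle-miles example: K = 9.930 with df = k - 1 = 2.
p_value = chi_square_right_tail_even_df(9.930, 2)
print(round(p_value, 4))  # 0.007
```

Under the chi-square approximation, an area this small (about 0.007) would count as evidence against the null hypothesis at the 5% significance level.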
Note the following:
• A rule of thumb for using the chi-square distribution as an approximation to the true distribution of K is that all sample sizes should be 5 or greater. Although we adopt that rule of thumb, some statisticians consider it too restrictive. Instead, they regard the chi-square approximation to be adequate unless k = 3 and none of the sample sizes exceed 5.
†Recall from Sections 16.2 and 16.3 that the treatment sum of squares, SSTR, is a measure of variation among means and that the total sum of squares, SST, is a measure of variation among all the data. The defining and computing formulas for SSTR and SST are given in Formula 16.1 on page 731. For the Kruskal–Wallis test, we apply those formulas to the ranks of the sample data, not to the sample data themselves.