The Kruskal–Wallis Test ∗

Part of the document: Ebook Introductory statistics (9th edition) Part 2 (pages 315 - 318)


16.5 The Kruskal–Wallis Test ∗

In this section, we examine the Kruskal–Wallis test, a nonparametric alternative to the one-way ANOVA procedure discussed in Section 16.3. The Kruskal–Wallis test applies when the distributions (one for each population) of the variable under consideration have the same shape; it does not require that the distributions be normal or have any other specific shape.

Like the Mann–Whitney test, the Kruskal–Wallis test is based on ranks. When ties occur, ranks are assigned in the same way as in the Mann–Whitney test: If two or more observations are tied, each is assigned the mean of the ranks they would have had if there were no ties.
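The tie rule above can be sketched in a few lines of Python. This is only an illustration: the function name `average_ranks` and the four sample values are assumptions made for the example, not part of the text.

```python
def average_ranks(values):
    """Return 1-based ranks; tied observations share the mean of their ranks."""
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # The run occupies ranks pos+1 .. end+1; every member gets their mean.
        mean_rank = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical data: the two 7.2s would hold ranks 2 and 3, so each gets 2.5.
example_ranks = average_ranks([7.2, 6.5, 7.2, 8.3])
print(example_ranks)  # [2.5, 1.0, 2.5, 4.0]
```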

EXAMPLE 16.8 Introducing the Kruskal–Wallis Test

Vehicle Miles: The Federal Highway Administration conducts annual surveys on motor vehicle travel by type of vehicle and publishes its findings in Highway Statistics. Independent simple random samples of cars, buses, and trucks yielded the data on number of thousands of miles driven last year shown in Table 16.10.

Suppose that we want to use the sample data in Table 16.10 to decide whether a difference exists in last year’s mean number of miles driven among cars, buses, and trucks.

TABLE 16.10 Number of miles driven (1000s) last year for independent samples of cars, buses, and trucks

Cars    Buses   Trucks
19.9     1.8    24.6
15.3     7.2    37.0
 2.2     7.2    21.2
 6.8     6.5    23.6
34.2    13.3    23.0
 8.3    25.4    15.3
12.0            57.1
 7.0            14.5
 9.5            26.0
 1.1

a. Formulate the problem statistically by posing it as a hypothesis test.

b. Is it appropriate to apply the one-way ANOVA test here? What about the Kruskal–Wallis test?

c. Explain the basic idea for carrying out a Kruskal–Wallis test.

d. Discuss the use of the sample data in Table 16.10 to make a decision concerning the hypothesis test.

Solution

a. Let μ1, μ2, and μ3 denote last year’s mean number of miles driven for cars, buses, and trucks, respectively. Then the null and alternative hypotheses are, respectively,

H0: μ1 = μ2 = μ3 (mean miles driven are equal)
Ha: Not all the means are equal.

b. We constructed stem-and-leaf diagrams of the three samples, as shown in Fig. 16.12. These diagrams suggest that the distributions of miles driven have roughly the same shape for cars, buses, and trucks but that those distributions are far from normal. Thus, although the one-way ANOVA test of Section 16.3 is probably inappropriate, the Kruskal–Wallis procedure appears suitable.†

FIGURE 16.12 Stem-and-leaf diagrams of the three samples in Table 16.10: (a) Cars, (b) Buses, (c) Trucks [diagrams not reproduced]

c. To apply the Kruskal–Wallis test, we first rank the data from all three samples combined, as shown in Table 16.11.

TABLE 16.11 Results of ranking the combined data from Table 16.10

Cars   Rank    Buses  Rank    Trucks  Rank
19.9   16       1.8    2      24.6    20
15.3   14.5     7.2    7.5    37.0    24
 2.2    3       7.2    7.5    21.2    17
 6.8    5       6.5    4      23.6    19
34.2   23      13.3   12      23.0    18
 8.3    9      25.4   21      15.3    14.5
12.0   11                     57.1    25
 7.0    6                     14.5    13
 9.5   10                     26.0    22
 1.1    1

Mean ranks:  9.850 (cars)   9.000 (buses)   19.167 (trucks)

The idea behind the Kruskal–Wallis test is simple: If the null hypothesis of equal population means is true, the means of the ranks for the three samples should be roughly equal. Put another way, if the variation among the mean ranks for the three samples is too large, we have evidence against the null hypothesis.
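As a check on Table 16.11, the ranking of the combined data and the three mean ranks can be reproduced with a short Python sketch. The helper `average_ranks` and the dictionary layout are assumptions chosen for illustration; the data are those of Table 16.10.

```python
# Data from Table 16.10 (thousands of miles driven last year).
samples = {
    "cars":   [19.9, 15.3, 2.2, 6.8, 34.2, 8.3, 12.0, 7.0, 9.5, 1.1],
    "buses":  [1.8, 7.2, 7.2, 6.5, 13.3, 25.4],
    "trucks": [24.6, 37.0, 21.2, 23.6, 23.0, 15.3, 57.1, 14.5, 26.0],
}


def average_ranks(values):
    """Rank values from 1; tied values share the mean of their ranks."""
    ordered = sorted(values)
    # ordered.index(v) is the 0-based position of the first copy of v, and
    # ordered.count(v) is the size of its tie group.
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


# Rank all 25 observations together, then split the ranks back by sample.
combined = [x for xs in samples.values() for x in xs]
ranks = average_ranks(combined)

mean_ranks = {}
start = 0
for name, xs in samples.items():
    group = ranks[start:start + len(xs)]
    mean_ranks[name] = sum(group) / len(group)
    start += len(xs)

for name, value in mean_ranks.items():
    print(f"{name}: mean rank = {value:.3f}")
```

The mean rank for trucks comes out well above those for cars and buses, which is the kind of variation among mean ranks that the test statistic is designed to measure.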

What Does It Mean?

The H-statistic is the ratio of the variation among the mean ranks to the variation of all the ranks.

To measure the variation among the mean ranks, we use the treatment sum of squares, SSTR, computed for the ranks. To decide whether that quantity is too large, we compare it to the variance of all the ranks, which can be expressed as SST/(n − 1), where SST is the total sum of squares for the ranks and n is the total number of observations.‡ More precisely, the test statistic for a Kruskal–Wallis test, denoted H, is

$$H = \frac{\mathit{SSTR}}{\mathit{SST}/(n-1)}.$$

†To explain the Kruskal–Wallis test, we have chosen an example with very small sample sizes. However, because having very small sample sizes makes effectively checking the same-shape condition difficult, proceed cautiously when dealing with them.

‡Recall from Sections 16.2 and 16.3 that the treatment sum of squares, SSTR, is a measure of variation among means and that the total sum of squares, SST, is a measure of variation among all the data. The defining and computing formulas for SSTR and SST are given in Formula 16.1 on page 726. For the Kruskal–Wallis test, we apply those formulas to the ranks of the sample data, not to the sample data themselves.


Large values of H indicate that the variation among the mean ranks is large (relative to the variance of all the ranks) and hence that the null hypothesis of equal population means should be rejected.

d. For the ranks in Table 16.11, we find that SSTR = 537.475, SST = 1299, and n = 25. Thus the value of the test statistic is

$$H = \frac{\mathit{SSTR}}{\mathit{SST}/(n-1)} = \frac{537.475}{1299/24} = 9.930.$$
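The arithmetic in part (d) can be verified with the following Python sketch, which applies the defining formulas for SSTR and SST to the ranks of Table 16.11. The variable names are illustrative assumptions.

```python
# Ranks copied from Table 16.11, one list per sample.
ranks = {
    "cars":   [16, 14.5, 3, 5, 23, 9, 11, 6, 10, 1],
    "buses":  [2, 7.5, 7.5, 4, 12, 21],
    "trucks": [20, 24, 17, 19, 18, 14.5, 25, 13, 22],
}

all_ranks = [r for group in ranks.values() for r in group]
n = len(all_ranks)
grand_mean = sum(all_ranks) / n  # mean of all the ranks, (n + 1) / 2

# Treatment sum of squares: variation among the sample mean ranks.
sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in ranks.values())

# Total sum of squares: variation among all the ranks.
sst = sum((r - grand_mean) ** 2 for r in all_ranks)

# H compares SSTR with the variance of all the ranks, SST / (n - 1).
h = sstr / (sst / (n - 1))
print(f"n = {n}, SSTR = {sstr:.3f}, SST = {sst:.0f}, H = {h:.3f}")
```

Run on these ranks, the sketch should agree with the values SSTR = 537.475, SST = 1299, and H = 9.930 stated above.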

Is this value of H large enough to conclude that the null hypothesis of equal population means is false? To answer this question, we need to know the distribution of the variable H.

KEY FACT 16.5 Distribution of the H-Statistic for a Kruskal–Wallis Test

Suppose that the k distributions (one for each population) of the variable under consideration have the same shape. Then, for independent samples from the k populations, the variable

$$H = \frac{\mathit{SSTR}}{\mathit{SST}/(n-1)}$$

has approximately a chi-square distribution with df = k − 1 if the null hypothesis of equal population means is true. Here, n denotes the total number of observations.

Note: A rule of thumb for using the chi-square distribution as an approximation to the true distribution of H is that all sample sizes should be 5 or greater. Although we adopt that rule of thumb, some statisticians consider it too restrictive. Instead, they regard the chi-square approximation to be adequate unless k = 3 and none of the sample sizes exceed 5.
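To illustrate how Key Fact 16.5 might be used with the value H = 9.930 from Example 16.8, here is a minimal Python sketch. It relies on the fact that, for 2 degrees of freedom, the area under the chi-square curve to the right of x is exp(−x/2). The significance level of 0.05 is a hypothetical choice made only for this illustration, and the function name `chi2_sf_df2` is likewise an assumption.

```python
import math


def chi2_sf_df2(x):
    """Upper-tail area of the chi-square distribution with df = 2 (valid for df = 2 only)."""
    return math.exp(-x / 2)


h = 9.930      # test statistic from Example 16.8
k = 3          # number of populations, so df = k - 1 = 2
alpha = 0.05   # hypothetical significance level, assumed for this sketch

# The test is right tailed, so the P-value is the area to the right of H.
p_value = chi2_sf_df2(h)

# For df = 2, the critical value solves exp(-x / 2) = alpha.
critical_value = -2 * math.log(alpha)

print(f"df = {k - 1}")
print(f"P-value = {p_value:.4f}")
print(f"critical value = {critical_value:.3f}")
print("reject H0" if h >= critical_value else "do not reject H0")
```

Under these assumptions the P-value is roughly 0.007, well below 0.05, so the sample data would provide evidence against the null hypothesis of equal means. For other values of df, a chi-square table or statistical software is needed in place of the closed form.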

Computing Formula for H

Usually, an easier way to compute the test statistic H by hand from the raw data is to apply the computing formula

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1),$$

where R1 denotes the sum of the ranks for the sample data from Population 1, R2 denotes the sum of the ranks for the sample data from Population 2, and so on.

What Does It Mean?

This is the computing formula for H, used for hand calculations.

Strictly speaking, the computing formula for H is equivalent to the defining formula for H only if no ties occur. In practice, however, the computing formula provides a sufficiently accurate approximation unless the number of ties is relatively large.
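The effect of ties can be seen on the data of Example 16.8, where two pairs of observations are tied (7.2 and 15.3). The following Python sketch applies the computing formula to the rank sums from Table 16.11 and compares the result with the value 9.930 obtained from the defining formula; the variable names are illustrative assumptions.

```python
# Ranks copied from Table 16.11, one list per sample.
ranks = {
    "cars":   [16, 14.5, 3, 5, 23, 9, 11, 6, 10, 1],
    "buses":  [2, 7.5, 7.5, 4, 12, 21],
    "trucks": [20, 24, 17, 19, 18, 14.5, 25, 13, 22],
}

n = sum(len(g) for g in ranks.values())            # total number of observations
rank_sums = {name: sum(g) for name, g in ranks.items()}  # R_1, R_2, R_3

# Computing formula: H = 12 / (n(n+1)) * sum(R_j^2 / n_j) - 3(n+1).
h_computing = (
    12 / (n * (n + 1))
    * sum(rank_sums[name] ** 2 / len(g) for name, g in ranks.items())
    - 3 * (n + 1)
)

h_defining = 9.930  # value from the defining formula in Example 16.8

print(f"rank sums: {rank_sums}")
print(f"H (computing formula) = {h_computing:.3f}")
print(f"difference from defining formula = {h_defining - h_computing:.3f}")
```

With only two tied pairs among 25 observations, the two values differ by less than 0.01, consistent with the remark that the computing formula is a close approximation when ties are few.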

Performing the Kruskal–Wallis Test

Procedure 16.3 provides a step-by-step method for conducting a Kruskal–Wallis test by using either the critical-value approach or the P-value approach. Because the null hypothesis is rejected only when the test statistic, H, is too large, a Kruskal–Wallis test is always right tailed.

Although the Kruskal–Wallis test can be used to compare several population medians as well as several population means, we state Procedure 16.3 in terms of population means. To apply the procedure for population medians, simply replace μ1 by η1, μ2 by η2, and so on.

PROCEDURE 16.3 Kruskal–Wallis Test

Purpose To perform a hypothesis test to compare k population means, μ1, μ2, ..., μk

Assumptions

1. Simple random samples
2. Independent samples
3. Same-shape populations
4. All sample sizes are 5 or greater

Step 1 The null and alternative hypotheses are, respectively,

H0: μ1 = μ2 = · · · = μk
Ha: Not all the means are equal.

Step 2 Decide on the significance level, α.

Step 3 Compute the value of the test statistic

$$H = \frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(n+1)$$

and denote that value H. Here, n is the total number of observations and R1, R2, ..., Rk denote the sums of the ranks for the sample data from Populations 1, 2, ..., k, respectively. To obtain H, first construct a work table to rank the data from all the samples combined.
