Chi-Square Goodness-of-Fit Test

Part of the document: Ebook Introductory Statistics (9th edition), Part 2 (pages 151-154)


13.2 Chi-Square Goodness-of-Fit Test

Our first chi-square procedure is called the chi-square goodness-of-fit test. We can use this procedure to perform a hypothesis test about the distribution of a qualitative (categorical) variable or a discrete quantitative variable that has only finitely many possible values. We introduce and explain the reasoning behind the chi-square goodness-of-fit test next.

EXAMPLE 13.2 Introduces the Chi-Square Goodness-of-Fit Test

Violent Crimes The Federal Bureau of Investigation (FBI) compiles data on crimes and crime rates and publishes the information in Crime in the United States.

A violent crime is classified by the FBI as murder, forcible rape, robbery, or aggravated assault. Table 13.1 gives a relative-frequency distribution for (reported) violent crimes in 2000. For instance, in 2000, 28.6% of violent crimes were robberies.

A simple random sample of 500 violent-crime reports from last year yielded the frequency distribution shown in Table 13.2. Suppose that we want to use the data in Tables 13.1 and 13.2 to decide whether last year’s distribution of violent crimes has changed from the 2000 distribution.

a. Formulate the problem statistically by posing it as a hypothesis test.

b. Explain the basic idea for carrying out the hypothesis test.

c. Discuss the details for making a decision concerning the hypothesis test.

TABLE 13.1 Distribution of violent crimes in the United States, 2000

Type of            Relative
violent crime      frequency
Murder             0.011
Forcible rape      0.063
Robbery            0.286
Agg. assault       0.640
Total              1.000

TABLE 13.2 Sample results for 500 randomly selected violent-crime reports from last year

Type of
violent crime      Frequency
Murder             3
Forcible rape      37
Robbery            154
Agg. assault       306
Total              500

Solution

a. The population is last year’s (reported) violent crimes. The variable is “type of violent crime,” and its possible values are murder, forcible rape, robbery, and aggravated assault. We want to perform the following hypothesis test.

H0: Last year’s violent-crime distribution is the same as the 2000 distribution.

Ha: Last year’s violent-crime distribution is different from the 2000 distribution.

b. The idea behind the chi-square goodness-of-fit test is to compare the observed frequencies in the second column of Table 13.2 to the frequencies that would be expected (the expected frequencies) if last year’s violent-crime distribution is the same as the 2000 distribution. If the observed and expected frequencies match fairly well (i.e., each observed frequency is roughly equal to its corresponding expected frequency), we do not reject the null hypothesis; otherwise, we reject the null hypothesis.

c. To formulate a precise procedure for carrying out the hypothesis test, we need to answer two questions:

1. What frequencies should we expect from a random sample of 500 violent-crime reports from last year if last year’s violent-crime distribution is the same as the 2000 distribution?

2. How do we decide whether the observed and expected frequencies match fairly well?

The first question is easy to answer, as we illustrate with robberies. If last year’s violent-crime distribution is the same as the 2000 distribution, then, according to Table 13.1, 28.6% of last year’s violent crimes would have been robberies. Therefore, in a random sample of 500 violent-crime reports from last year, we would expect about 28.6% of the 500 to be robberies. In other words, we would expect the number of robberies to be $500 \times 0.286$, or 143.

In general, we compute each expected frequency, denoted E, by using the formula

$E = np$,

where $n$ is the sample size and $p$ is the appropriate relative frequency from the second column of Table 13.1. Using this formula, we calculated the expected frequencies for all four types of violent crime. The results are displayed in the second column of Table 13.3.

TABLE 13.3 Expected frequencies if last year’s violent-crime distribution is the same as the 2000 distribution

Type of            Expected
violent crime      frequency
Murder             5.5
Forcible rape      31.5
Robbery            143.0
Agg. assault       320.0

The second column of Table 13.3 answers the first question. It gives the frequencies that we would expect if last year’s violent-crime distribution is the same as the 2000 distribution.
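As a minimal sketch of the formula E = np, the following Python snippet recomputes the expected frequencies from the relative frequencies in Table 13.1 and the sample size of 500. The names `relative_freq_2000` and `expected_frequencies` are illustrative choices made here; they do not come from the text.

```python
# Relative frequencies from Table 13.1 (the distribution stated in the null hypothesis).
relative_freq_2000 = {
    "Murder": 0.011,
    "Forcible rape": 0.063,
    "Robbery": 0.286,
    "Agg. assault": 0.640,
}


def expected_frequencies(n, null_probs):
    """Return E = n * p for each category (hypothetical helper for illustration)."""
    return {category: n * p for category, p in null_probs.items()}


# Sample size of 500 violent-crime reports, as in Table 13.2.
expected = expected_frequencies(500, relative_freq_2000)

for category, e in expected.items():
    print(f"{category:<15}{e:>8.1f}")
```

Because the relative frequencies sum to 1, the expected frequencies sum to the sample size, which is a quick check on the arithmetic.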


The second question—whether the observed and expected frequencies match fairly well—is harder to answer. We need to calculate a number that measures the goodness of fit.

In Table 13.4, the second column repeats the observed frequencies from the second column of Table 13.2. The third column of Table 13.4 repeats the expected frequencies from the second column of Table 13.3.

TABLE 13.4 Calculating the goodness of fit

Type of          Observed    Expected                  Square of      Chi-square
violent crime    frequency   frequency   Difference    difference     subtotal
x                $O$         $E$         $O - E$       $(O - E)^2$    $(O - E)^2/E$
Murder           3           5.5         −2.5          6.25           1.136
Forcible rape    37          31.5        5.5           30.25          0.960
Robbery          154         143.0       11.0          121.00         0.846
Agg. assault     306         320.0       −14.0         196.00         0.613
Total            500         500.0       0                            3.555

To measure the goodness of fit of the observed and expected frequencies, we look at the differences, $O - E$, shown in the fourth column of Table 13.4.

Summing these differences to obtain a measure of goodness of fit isn’t very useful because the sum is 0. Instead, we square each difference (shown in the fifth column) and then divide by the corresponding expected frequency. Doing so gives the values $(O - E)^2/E$, called chi-square subtotals, shown in the sixth column. The sum of the chi-square subtotals,

$\sum (O - E)^2/E = 3.555$,

is the statistic used to measure the goodness of fit of the observed and expected frequencies.†

If the null hypothesis is true, the observed and expected frequencies should be roughly equal, resulting in a small value of the test statistic, $\sum (O - E)^2/E$.

In other words, large values of $\sum (O - E)^2/E$ provide evidence against the null hypothesis.
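The calculation in Table 13.4 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, using the observed frequencies from Table 13.2 and the expected frequencies from Table 13.3; the function name `chi_square_statistic` is a hypothetical helper introduced here.

```python
# Observed frequencies (Table 13.2) and expected frequencies (Table 13.3),
# listed in the order: murder, forcible rape, robbery, aggravated assault.
observed = [3, 37, 154, 306]
expected = [5.5, 31.5, 143.0, 320.0]


def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E together with the individual subtotals."""
    subtotals = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(subtotals), subtotals


statistic, subtotals = chi_square_statistic(observed, expected)

# The raw differences O - E always sum to 0, which is why they are squared first.
differences = [o - e for o, e in zip(observed, expected)]

print("differences:", differences)
print("subtotals:  ", [round(s, 3) for s in subtotals])
print("statistic:  ", round(statistic, 3))
```

Dividing each squared difference by its expected frequency puts the categories on a comparable scale, so a difference of 2.5 for murders (expected 5.5) contributes more than a difference of 14 for aggravated assaults (expected 320).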

As we have seen, $\sum (O - E)^2/E = 3.555$. Can this value be reasonably attributed to sampling error, or is it large enough to suggest that the null hypothesis is false? To answer this question, we need to know the distribution of the test statistic $\sum (O - E)^2/E$.

First we present the formula for expected frequencies in a chi-square goodness-of-fit test, as discussed in the preceding example, and then we provide the distribution of the test statistic for a chi-square goodness-of-fit test.

FORMULA 13.1 Expected Frequencies for a Goodness-of-Fit Test

In a chi-square goodness-of-fit test, the expected frequency for each possible value of the variable is found by using the formula

$E = np$,

where $n$ is the sample size and $p$ is the relative frequency (or probability) given for the value in the null hypothesis.

What Does It Mean?

To obtain an expected frequency, multiply the sample size by the null-hypothesis relative frequency.

†Using subscripts alone or both subscripts and indices, we would write $\sum (O - E)^2/E$ as $\sum (O_i - E_i)^2/E_i$ or

$\sum_{i=1}^{c} (O_i - E_i)^2/E_i$,

where $c$ denotes the number of possible values for the variable, in this case, four ($c = 4$). However, because no confusion can arise, we use the simpler notation without subscripts or indices.

KEY FACT 13.2 Distribution of the $\chi^2$-Statistic for a Goodness-of-Fit Test

For a chi-square goodness-of-fit test, the test statistic

$\chi^2 = \sum (O - E)^2/E$

has approximately a chi-square distribution if the null hypothesis is true. The number of degrees of freedom is 1 less than the number of possible values for the variable under consideration.
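To see how this fact could be applied to the violent-crime example, the sketch below computes the degrees of freedom and the area under the chi-square curve to the right of the observed statistic. This right-tail probability is an illustration added here, not a step quoted from the text. The function `chi_square_sf` is a hypothetical helper that uses only Python's standard library and the standard closed-form expressions for integer degrees of freedom.

```python
import math


def chi_square_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("requires df >= 1 and x >= 0")
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{m} (x/2)^(k - 1/2) / Gamma(k + 1/2)
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Four possible values of the variable, so df = 4 - 1 = 3.
num_categories = 4
df = num_categories - 1

# Observed value of the test statistic from Table 13.4.
right_tail_area = chi_square_sf(3.555, df)
print(f"df = {df}, right-tail area = {right_tail_area:.3f}")
```

A large right-tail area means that a statistic at least this large is unsurprising when the null hypothesis is true. If SciPy is available, `scipy.stats.chisquare(f_obs, f_exp)` reports both the statistic and this probability in one call.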

What Does It Mean?

To obtain a chi-square subtotal, square the difference between an observed and expected frequency and divide the result by the expected frequency. Adding the chi-square subtotals gives the $\chi^2$-statistic, which has approximately a chi-square distribution.

Procedure for the Chi-Square Goodness-of-Fit Test

In light of Key Fact 13.2, we now present, in Procedure 13.1, a step-by-step method for conducting a chi-square goodness-of-fit test. Because the null hypothesis is rejected only when the test statistic is too large, a chi-square goodness-of-fit test is always right tailed.

PROCEDURE 13.1 Chi-Square Goodness-of-Fit Test

Purpose To perform a hypothesis test for the distribution of a variable

Assumptions

1. All expected frequencies are 1 or greater

2. At most 20% of the expected frequencies are less than 5

3. Simple random sample
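The first two assumptions can be checked mechanically once the expected frequencies are known. The following is a minimal sketch, with the hypothetical function name `expected_frequencies_ok`; the third assumption (simple random sample) concerns how the data were collected and cannot be verified from the frequencies alone.

```python
def expected_frequencies_ok(expected):
    """Check Assumptions 1 and 2 of the goodness-of-fit test (illustrative helper)."""
    # Assumption 1: all expected frequencies are 1 or greater.
    all_at_least_one = all(e >= 1 for e in expected)
    # Assumption 2: at most 20% of the expected frequencies are less than 5.
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Expected frequencies from Table 13.3.
violent_crime_expected = [5.5, 31.5, 143.0, 320.0]
print(expected_frequencies_ok(violent_crime_expected))
```

For the violent-crime example, the smallest expected frequency is 5.5, so both conditions hold.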
