Note: Two associated variables are also called statistically dependent variables.
2. At most 20% of the expected frequencies are less than 5
What can we do if one or both of these assumptions are violated? Three approaches are possible. We can combine rows or columns to increase the expected frequencies in those cells in which they are too small; we can eliminate certain rows or columns in which the small expected frequencies occur; or we can increase the sample size.
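The check behind Assumption 2, and the first remedy above (combining columns), can be sketched in a few lines of Python. This is a minimal illustration and not part of any of the technologies discussed in this chapter: the helper names `expected_frequencies`, `fraction_below`, and `combine_columns` are our own, and the counts are those of Table 13.15 in Section 13.5, used here only as convenient sample data.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell: (row total * column total) / sample size."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def fraction_below(expected, threshold=5):
    """Fraction of cells whose expected frequency is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e < threshold for e in cells) / len(cells)


def combine_columns(observed, i, j):
    """Merge column j into column i (requires i < j) by adding the counts."""
    return [row[:i] + [row[i] + row[j]] + row[i + 1:j] + row[j + 1:] for row in observed]


# Observed counts from Table 13.15 (rows: Northeast, Midwest, South, West).
table = [
    [7, 13, 7, 4, 10, 6],
    [5, 18, 13, 6, 9, 4],
    [11, 30, 14, 7, 19, 10],
    [8, 16, 13, 2, 10, 8],
]

before = fraction_below(expected_frequencies(table))
# Illustrative remedy: merge "Assoc's degree" (index 3) into "Some college" (index 2).
merged = combine_columns(table, 2, 3)
after = fraction_below(expected_frequencies(merged))
print(f"fraction below 5 before merging: {before:.3f}, after merging: {after:.3f}")
```

For these counts, 3 of the 24 expected frequencies (12.5%) are less than 5, so Assumption 2 already holds; the merge is shown only to illustrate how combining columns raises the small expected frequencies.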
Association and Causation
Two variables may be associated without being causally related. In Example 13.10, we concluded that the variables marital status and alcohol consumption are associated.
This result means that knowing the marital status of a person imparts information about the alcohol consumption of that person, and vice versa. It does not necessarily mean, however, that being single causes a person to drink more.
What Does It Mean?
Association does not imply causation!
Although we must keep in mind that association does not imply causation, we must also note that, if two variables are not associated, there is no point in looking for a causal relationship. In other words, association is a necessary but not sufficient condition for causation.
THE TECHNOLOGY CENTER
Most statistical technologies have programs that automatically perform a chi-square independence test. In this subsection, we present output and step-by-step instructions for such programs.
EXAMPLE 13.11 Using Technology to Perform an Independence Test
Marital Status and Drinking A random sample of 1772 U.S. adults yielded the data on marital status and alcohol consumption shown in Table 13.13 on page 619.
Use Minitab, Excel, or the TI-83/84 Plus to decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that an association exists between marital status and alcohol consumption.
Solution We want to perform the hypothesis test
H0: Marital status and alcohol consumption are not associated
Ha: Marital status and alcohol consumption are associated
at the 5% significance level.
We applied the chi-square independence test programs to the data, resulting in Output 13.4. Steps for generating that output are presented in Instructions 13.4.
Note to Excel users: For brevity, we have presented only the essential portions of the actual output.
As shown in Output 13.4, the P-value for the hypothesis test is 0.000 to three decimal places. Because the P-value is less than the specified significance level of 0.05, we reject H0. At the 5% significance level, the data provide sufficient evidence to conclude that marital status and alcohol consumption are associated.
13.4 Chi-Square Independence Test 625
OUTPUT 13.4 Chi-square independence test on the data on marital status and alcohol consumption (panels: MINITAB, EXCEL, TI-83/84 PLUS)
INSTRUCTIONS 13.4 Steps for generating Output 13.4
MINITAB
1 Store the marital-status categories from Table 13.13 in a column named MARITAL STATUS
2 Store the cell data from Table 13.13 in columns named Abstain, 1–60, and Over 60
3 Choose Stat ➤ Tables ➤ Chi-Square Test for Association...
4 Press the F3 key to reset the dialog box
5 Select Summarized data in a two-way table from the drop-down list box
6 Specify Abstain, ‘1–60’, and ‘Over 60’ in the Columns containing the table text box
7 Specify ‘MARITAL STATUS’ in the Rows text box
8 Type DRINKS PER MONTH in the Columns text box
9 Click OK
EXCEL
1 Store the marital-status categories from Table 13.13 in a column named MARITAL STATUS
2 Directly to the right of the column entered at step 1, store the cell data from Table 13.13 in columns named Abstain, 1–60, and Over 60
3 Choose XLSTAT ➤ Correlation/Association tests ➤ Tests on contingency tables (Chi-square...)
4 Click the reset button in the lower left corner of the dialog box
5 Click in the Contingency table selection box and then select the columns of the worksheet that contain the data stored in steps 1 and 2
6 Click the Options tab
7 Check the Chi-square test check box
8 Type 5 in the Significance level (%) text box
9 Click the Outputs tab and then uncheck all check boxes
10 Click OK
11 Click the Continue button in the XLSTAT – Selections dialog box
TI-83/84 PLUS
1 Press 2nd ➤ MATRIX, arrow over to EDIT, and press 1
2 Type 4 and press ENTER
3 Type 3 and press ENTER
4 Enter the cell data from Table 13.13, pressing ENTER after each entry
5 Press STAT, arrow over to TESTS, and press ALPHA ➤ C
6 Press 2nd ➤ MATRIX, press 1, and press ENTER
7 Press 2nd ➤ MATRIX, press 2, and press ENTER
8 Arrow down to Calculate, and press ENTER
Note to Excel users: You can obtain results in addition to those shown in Output 13.4 by checking boxes in the Outputs tab as desired. For instance, to get tables of the observed and expected frequencies, check the Observed frequencies and Theoretical frequencies check boxes, respectively.
Exercises 13.4
Understanding the Concepts and Skills
13.62 To decide whether two variables of a population are associated, we usually need to resort to inferential methods such as the chi-square independence test. Why?
13.63 Step 1 of Procedure 13.2 gives generic statements for the null and alternative hypotheses of a chi-square independence test. Use the terms statistically dependent and statistically independent, introduced on page 612, to restate those hypotheses.
13.64 In Example 13.9, we made the following statement: If no association exists between marital status and alcohol consumption, the proportion of married adults who abstain is the same as the proportion of all adults who abstain. Explain why that statement is true.
13.65 A chi-square independence test is to be conducted to decide whether an association exists between two variables of a population. One variable has six possible values, and the other variable has four. How many degrees of freedom does the χ²-statistic have?
13.66 We stated earlier that, if two variables are not associated, there is no point in looking for a causal relationship. Why is that so?
13.67 Education and Salary. Studies have shown that a positive association exists between educational level and annual salary; in other words, people with more education tend to make more money.
a. Does this finding mean that more education causes a person to make more money? Explain your answer.
b. Do you think there is a causal relationship between educational level and annual salary? Explain your answer.
13.68 Identify three techniques that can be tried as a remedy when one or more of the expected-frequency assumptions for a chi-square independence test are violated.
In each of Exercises 13.69–13.74, we have given the number of possible values for two variables of a population. For each exercise, determine the maximum number of expected frequencies that can be less than 5 in order that Assumption 2 of Procedure 13.2 on page 622 be satisfied. Note: The number of cells for a contingency table with m rows and n columns is m · n.
13.69 four and five
13.70 five and three
13.71 two and three
13.72 four and three
13.73 two and two
13.74 six and seven
In each of Exercises 13.75–13.78, we have presented a contingency table that gives a cross-classification of a random sample of values for two variables, x and y, of a population. For each exercise, perform the following tasks.
a. Find the expected frequencies. Note: You will first need to compute the row totals, column totals, and grand total.
b. Determine the value of the chi-square statistic.
c. Decide at the 5% significance level whether the data provide sufficient evidence to conclude that the two variables are associated.
13.75
          y
x       A     B
a      10    20
b      30    40
13.76
          y
x       A     B
a      10    20
b      40    30
13.77
             y
x       A     B     C
a      10    15    75
b       0    25    75
13.78
          y
x       A     B
a       5    35
b      20    80
c      25    85
Applying the Concepts and Skills
In Exercises 13.79–13.86, use either the critical-value approach or the P-value approach to perform a chi-square independence test, provided the conditions for using the test are met.
13.79 Siskel and Ebert. In the classic TV show Sneak Previews, originally hosted by the late Gene Siskel and Roger Ebert, the two Chicago movie critics reviewed the week’s new movie releases and then rated them thumbs up (positive), mixed, or thumbs down (negative). These two critics often saw the merits of a movie differently. In general, however, were the ratings given by Siskel and Ebert associated? The answer to this question was the focus of the paper “Evaluating Agreement and Disagreement Among Movie Reviewers” by A. Agresti and L. Winner that appeared in Chance (Vol. 10(2), pp. 10–14). The following contingency table summarizes the ratings by Siskel and Ebert for 160 movies.
                              Ebert’s rating
Siskel’s rating   Thumbs down   Mixed   Thumbs up   Total
Thumbs down            24          8        13        45
Mixed                   8         13        11        32
Thumbs up              10          9        64        83
Total                  42         30        88       160
At the 1% significance level, do the data provide sufficient evidence to conclude that an association exists between the ratings of Siskel and Ebert?
13.80 Diabetes in Native Americans. Preventable chronic diseases are increasing rapidly in Native American populations, particularly diabetes. F. Gilliland et al. examined the diabetes issue in the paper
“Preventative Health Care among Rural American Indians in New Mexico” (Preventative Medicine, Vol. 28, pp. 194–202). Following is a contingency table showing cross-classification of educational attainment and diabetic state for a sample of 1273 Native Americans (HS is high school).
                       Diabetic state
Education       Diabetes   No diabetes   Total
Less than HS       33          218         251
HS grad            25          389         414
Some college       20          393         413
College grad       17          178         195
Total              95         1178        1273
At the 1% significance level, do the data provide sufficient evidence to conclude that an association exists between educational level and diabetic state for Native Americans?
13.81 Learning at Home. M. Stuart et al. studied various aspects of grade-school children and their mothers and reported their findings in the article “Learning to Read at Home and at School” (British Journal of Educational Psychology, 68(1), pp. 3–14). The researchers gave a questionnaire to parents of 66 children in kindergarten through second grade. Two social-class groups, middle and working, were identified based on the mother’s occupation.
a. One of the questions dealt with the children’s knowledge of nursery rhymes. The following data were obtained.
                 Nursery-rhyme knowledge
Social class   A few   Some   Lots
Middle            4     13     15
Working           5     11     18
Are Assumptions 1 and 2 satisfied for a chi-square independence test? If so, conduct the test at the 1% significance level and interpret your results.
b. Another question dealt with whether the parents played “I Spy” games with their children. The following data were obtained.
                   Frequency of games
Social class   Never   Sometimes   Often
Middle            2        8         22
Working          11       10         13
Are Assumptions 1 and 2 satisfied for a chi-square independence test? If so, conduct the test at the 1% significance level and interpret your results.
13.82 Deceptive News. In the article “When News Reporters Deceive: The Production of Stereotypes” (Journalism & Mass Communication Quarterly, Vol. 84, No. 2, pp. 281–298), researchers D. Lasorsa and J. Dai investigated the relationship between authenticity and tone for news stories. Tone was measured by using a method of coding sentences and was categorized as positive, neutral, or negative. A sample of stories yielded the following data.
                 Authenticity
Tone       Authentic   Deceptive   Total
Negative       59         111       170
Neutral        49          61       110
Positive       20          11        31
Total         128         183       311
At the 5% significance level, do the data provide sufficient evidence to conclude that an association exists between authenticity and tone?
13.83 Fear of Gangs. In the article “Growing Pains and Fear of Gangs” (Applied Psychology in Criminal Justice, Vol. 5, No. 2, pp. 139–164), B. Brown and W. Benedict examined the relationship between worry about gang theft and actually being a victim of a theft. Interviews of a sample of high school students yielded the following contingency table.
                    Worry about gang theft
Victim of theft     Yes     No    Total
Yes                  58     45     103
No                   29     71     100
Total                87    116     203
At the 5% significance level, do the data provide sufficient evidence to conclude that an association exists between worry about a gang theft and actually being a victim of a theft?
13.84 HPV Vaccine. In the article “Correlates for Completion of 3-dose Regimen of HPV Vaccine in Female Members of a Managed Care Organization” (Mayo Clinical Proceedings, Vol. 84, pp. 864–870), C. Chao et al. examined factors that may influence whether young female patients complete a three-injection sequence of the Gardasil quadrivalent human papillomavirus vaccine (HPV4). HPV is a virus that has been linked to the development of cervical cancer. The following contingency table summarizes the data obtained for completion of treatment versus practice type.
                Completion
Practice     Yes     No    Total
Pediatric    162    353     515
Family       106    259     365
OB/GYN       201    332     533
Total        469    944    1413
a. At the 5% significance level, do the data provide sufficient evidence to conclude that an association exists between completion of treatment and practice type?
b. Repeat part (a) at the 1% significance level.
13.85 Religion. A worldwide poll on religion was conducted by WIN-Gallup International and published as the document Global Index of Religiosity and Atheism. One question involved religious belief and educational attainment. The following data are based on the answers to that question.
                           Education
Religiosity      Basic   Secondary   Advanced   Total
Religious          77       149         78       304
Not religious      23        56         36       115
Atheist             8        24         29        61
Don’t know          6        15          8        29
Total             114       244        151       509
a. At the 5% significance level, do the data provide sufficient evidence to conclude that an association exists between religiosity and education?
b. Repeat part (a) at the 1% significance level.
13.86 BMD and Depression. In the paper “Depression and Bone Mineral Density: Is There a Relationship in Elderly Asian Men?” (Osteoporosis International, Vol. 16, pp. 610–615), S. Wong et al. published results of their study on bone mineral density (BMD) and depression for 1999 Hong Kong men aged 65 to 92 years. Here are the cross-classified data.
                      Depression
BMD            Depressed   Not depressed   Total
Osteoporotic        3            35           38
Low BMD            69           533          602
Normal             97          1262         1359
Total             169          1830         1999
At the 1% significance level, do the data provide sufficient evidence to conclude that BMD and depression are statistically dependent for elderly Asian men?
Job Satisfaction. A CNN/USA TODAY poll conducted by Gallup asked a sample of employed Americans the following question: “Which do you enjoy more, the hours when you are on your job, or the hours when you are not on your job?” The responses to this question were cross-tabulated against several characteristics, among which were gender, age, type of community, educational attainment, income, and type of employer. The data are provided on the WeissStats site. In each of Exercises 13.87–13.92, use the technology of your choice to decide, at the 5% significance level, whether an association exists between the specified pair of variables.
13.87 gender and response (to the question)
13.88 age and response
13.89 type of community and response
13.90 educational attainment and response
13.91 income and response
13.92 type of employer and response
13.5 Chi-Square Homogeneity Test
The purpose of a chi-square homogeneity test is to compare the distributions of a variable of two or more populations. As a special case, it can be used to decide whether a difference exists among two or more population proportions.
For a chi-square homogeneity test, the null hypothesis is that the distributions of the variable are the same for all the populations, and the alternative hypothesis is that the distributions of the variable are not all the same (i.e., the distributions differ for at least two of the populations).
When the populations under consideration have the same distribution for a variable, they are said to be homogeneous with respect to the variable; otherwise, they are said to be nonhomogeneous with respect to the variable. Using this terminology, we can state the null and alternative hypotheses for a chi-square homogeneity test simply as follows:
H0: The populations are homogeneous with respect to the variable
Ha: The populations are nonhomogeneous with respect to the variable.
The assumptions for use of the chi-square homogeneity test are simple random samples, independent samples, and the same two expected-frequency assumptions required for performing a chi-square independence test.
Although the context of and assumptions for the chi-square homogeneity test differ from those of the chi-square independence test, the steps for carrying out the two tests are the same. In particular, the test statistics for the two tests are identical.
As with a chi-square independence test, the observed frequencies for a chi-square homogeneity test are arranged in a contingency table. Moreover, the expected frequencies are computed in the same way.
FORMULA 13.3 Expected Frequencies for a Homogeneity Test
In a chi-square homogeneity test, the expected frequency for each cell is found by using the formula
$$E = \frac{R \cdot C}{n},$$
where R is the row total, C is the column total, and n is the sample size.
What Does It Mean?
To obtain an expected frequency, multiply the row total by the column total and divide by the sample size.
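As a worked instance of Formula 13.3, consider the Northeast, "Not HS grad" cell of Table 13.15 in Example 13.12. The row total there is R = 47, the column total is C = 31, and the sample size is n = 250, so the arithmetic runs as follows.

```latex
E = \frac{R \cdot C}{n} = \frac{47 \cdot 31}{250} = \frac{1457}{250} = 5.828
```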
The distribution of the test statistic for a chi-square homogeneity test is presented in Key Fact 13.4.
KEY FACT 13.4 Distribution of the χ²-Statistic for a Chi-Square Homogeneity Test
For a chi-square homogeneity test, the test statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
has approximately a chi-square distribution if the null hypothesis of homogeneity is true. The number of degrees of freedom is (r − 1)(c − 1), where r is the number of populations and c is the number of possible values for the variable under consideration.
What Does It Mean?
To obtain a chi-square subtotal, square the difference between an observed and expected frequency and divide the result by the expected frequency. Adding the chi-square subtotals gives the χ²-statistic, which has approximately a chi-square distribution.
Procedure for the Chi-Square Homogeneity Test
In light of Key Fact 13.4, we present, in Procedure 13.3 (next page), a step-by-step method for conducting a chi-square homogeneity test by using either the critical-value approach or the P-value approach. Because the null hypothesis is rejected only when the test statistic is too large, a chi-square homogeneity test is always right tailed.
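The computations behind such a right-tailed test can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the procedure of any particular statistical package: the function names `chi_square_sf` and `chi_square_test` are our own, and the right-tail area is obtained from a series expansion of the incomplete gamma function, which is adequate for the moderate values of the test statistic that arise in problems like those in this chapter. The decision rule assumed here is to reject the null hypothesis when the P-value is at most the significance level. The sample counts are those of Table 13.15 in Example 13.12.

```python
import math


def chi_square_sf(x, df, max_terms=10_000, tol=1e-15):
    """Right-tail area P(chi-square with df degrees of freedom >= x).

    Uses the power series for the regularized lower incomplete gamma
    function P(a, t) with a = df / 2 and t = x / 2, then returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, t = df / 2, x / 2
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= t / (a + k)
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-t + a * math.log(t) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(observed, alpha=0.05):
    """Return the chi-square statistic, degrees of freedom, P-value, and decision."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency (Formula 13.3)
            statistic += (o - e) ** 2 / e          # chi-square subtotal
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    p_value = chi_square_sf(statistic, df)         # right-tailed test
    return statistic, df, p_value, p_value <= alpha


# Observed counts from Table 13.15 (rows: Northeast, Midwest, South, West).
table_13_15 = [
    [7, 13, 7, 4, 10, 6],
    [5, 18, 13, 6, 9, 4],
    [11, 30, 14, 7, 19, 10],
    [8, 16, 13, 2, 10, 8],
]

statistic, df, p_value, reject = chi_square_test(table_13_15, alpha=0.05)
print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.3f}, reject H0: {reject}")
```

Only the right tail is used because, as noted above, the null hypothesis is rejected only when the test statistic is too large. A critical-value version would compare the statistic with a tabulated chi-square value for the same degrees of freedom instead of computing the tail area.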
EXAMPLE 13.12 The Chi-Square Homogeneity Test
Region and Educational Attainment The U.S. Census Bureau compiles data on the resident population by region and educational attainment. Results are published in Current Population Survey. Independent simple random samples of (adult) residents in the four U.S. regions gave the following data on educational attainment (HS is high school; Assoc’s is Associate’s). At the 5% significance level, do the data provide sufficient evidence to conclude that a difference exists in educational-attainment distributions among residents of the four U.S. regions?
TABLE 13.15 Sample data for educational attainment in the four U.S. regions

                                        Educational attainment
Region      Not HS grad   HS grad   Some college   Assoc’s degree   Bachelor’s degree   Advanced degree   Total
Northeast        7           13           7               4                 10                  6            47
Midwest          5           18          13               6                  9                  4            55
South           11           30          14               7                 19                 10            91
West             8           16          13               2                 10                  8            57
Total           31           77          47              19                 48                 28           250
Solution We first calculate the expected frequencies by using Formula 13.3. Doing so, we obtain Table 13.16 shown at the bottom of the next page, which displays the expected frequencies below the observed frequencies from Table 13.15.
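For readers who want to reproduce this step themselves, the following Python sketch applies Formula 13.3 to every cell of Table 13.15 and prints each expected frequency beneath the corresponding observed frequency. The layout and the one-decimal rounding of the printout are our own choices for illustration and are not claimed to match Table 13.16 exactly.

```python
regions = ["Northeast", "Midwest", "South", "West"]

# Observed frequencies from Table 13.15; columns run from "Not HS grad"
# through "Advanced degree".
observed = [
    [7, 13, 7, 4, 10, 6],
    [5, 18, 13, 6, 9, 4],
    [11, 30, 14, 7, 19, 10],
    [8, 16, 13, 2, 10, 8],
]

row_totals = [sum(row) for row in observed]        # R for each region
col_totals = [sum(col) for col in zip(*observed)]  # C for each attainment level
n = sum(row_totals)                                # total sample size

# Formula 13.3: E = R * C / n for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for region, obs_row, exp_row in zip(regions, observed, expected):
    print(f"{region:<10}" + "".join(f"{o:>8d}" for o in obs_row))
    print(" " * 10 + "".join(f"{e:>8.1f}" for e in exp_row))
```

A useful check on hand computations is that the expected frequencies in each row add up to that row's total, and likewise for each column.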