ADRIEN LEGENDRE: INTRODUCING THE METHOD OF LEAST SQUARES
Step 4 Use Table IX to estimate the P-value, or obtain it exactly by using technology.
Step 5 If P≤α, reject H0; otherwise, do not reject H0.
Step 6 Interpret the results of the hypothesis test.
Step 2 Decide on the significance level, α.
We are to perform the hypothesis test at the 5% significance level, or α = 0.05.
Step 3 Compute the value of the test statistic
$$R_p = \frac{\sum x_i w_i}{\sqrt{S_{xx} \sum w_i^2}}.$$
To compute the value of the test statistic, we need a table for x, w, xw, x², and w², as given in Table 15.7. The normal scores are from Table III in Appendix A.
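Substituting the sums from Table 15.7 into the equation for Rp yields
$$R_p = \frac{\sum x_i w_i}{\sqrt{S_{xx} \sum w_i^2}} = \frac{\sum x_i w_i}{\sqrt{\left[\sum x_i^2 - (\sum x_i)^2/n\right] \sum w_i^2}} = \frac{274.370}{\sqrt{\left[22{,}255.50 - (395.0)^2/12\right] \cdot 9.8656}} = 0.908.$$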
TABLE 15.7 Table for computing Rp

Adjusted gross   Normal
income           score
x                w        xw         x²          w²
7.8              −1.64    −12.792    60.84       2.6896
9.7              −1.11    −10.767    94.09       1.2321
10.6             −0.79    −8.374     112.36      0.6241
12.7             −0.53    −6.731     161.29      0.2809
12.8             −0.31    −3.968     163.84      0.0961
18.1             −0.10    −1.810     327.61      0.0100
21.2             0.10     2.120      449.44      0.0100
33.0             0.31     10.230     1,089.00    0.0961
43.5             0.53     23.055     1,892.25    0.2809
51.1             0.79     40.369     2,611.21    0.6241
81.4             1.11     90.354     6,625.96    1.2321
93.1             1.64     152.684    8,667.61    2.6896
Sum: 395.0       0.00     274.370    22,255.50   9.8656
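The arithmetic in Step 3 can be checked with a few lines of code. The following Python sketch is illustrative only: the variable names are our own, and the x and w values are copied from Table 15.7 rather than looked up in Table III.

```python
from math import sqrt

# Adjusted gross incomes (x) and normal scores (w), copied from Table 15.7.
x = [7.8, 9.7, 10.6, 12.7, 12.8, 18.1, 21.2, 33.0, 43.5, 51.1, 81.4, 93.1]
w = [-1.64, -1.11, -0.79, -0.53, -0.31, -0.10,
     0.10, 0.31, 0.53, 0.79, 1.11, 1.64]

n = len(x)
sum_xw = sum(xi * wi for xi, wi in zip(x, w))  # sum of the xw column
sum_x2 = sum(xi ** 2 for xi in x)              # sum of the x² column
sum_w2 = sum(wi ** 2 for wi in w)              # sum of the w² column
s_xx = sum_x2 - sum(x) ** 2 / n                # Sxx = Σx² − (Σx)²/n

r_p = sum_xw / sqrt(s_xx * sum_w2)
print(round(r_p, 3))  # 0.908
```

The intermediate sums should agree with the bottom row of Table 15.7, which gives a quick way to catch data-entry errors.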
CRITICAL-VALUE APPROACH OR P-VALUE APPROACH

Critical-value approach

Step 4 The critical value is Rp*. Use Table IX to find the critical value.
We have α = 0.05 and n = 12. From Table IX, the critical value is Rp* = 0.927, as shown in Fig. 15.13A.
FIGURE 15.13A [Rp axis ending at 1, with the critical value 0.927 marked: reject H0 to the left of 0.927 (area α = 0.05); do not reject H0 to the right.]
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.
From Step 3, the value of the test statistic is Rp = 0.908, which, as Fig. 15.13A shows, falls in the rejection region. So we reject H0. The test results are statistically significant at the 5% level.
P-value approach

Step 4 Use Table IX to estimate the P-value, or obtain it exactly by using technology.
From Step 3, we see that the value of the test statistic is Rp = 0.908. Because the test is left tailed, the P-value is the probability of observing a value of Rp of 0.908 or less if the null hypothesis is true. That probability equals the shaded area shown in Fig. 15.13B.
FIGURE 15.13B [Rp axis ending at 1, with the P-value shown as the shaded area to the left of Rp = 0.908.]
Referring to Fig. 15.13B and to Table IX with n = 12, we find that 0.01 < P < 0.05. (Using technology, we obtain P = 0.0266.)
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
From Step 4, 0.01 < P < 0.05. Because the P-value is less than the specified significance level of 0.05, we reject H0. The test results are statistically significant at the 5% level and (see Table 9.8 on page 378) provide strong evidence against the null hypothesis of normality.
Step 6 Interpret the results of the hypothesis test.
Interpretation At the 5% significance level, the data provide sufficient evidence to conclude that adjusted gross incomes are not normally distributed.
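The two decision rules in Step 5 can be written side by side. This is a minimal sketch using only numbers quoted in the example (the test statistic, the Table IX critical value, and the P-value reported from technology); the variable names are our own.

```python
alpha = 0.05
r_p = 0.908             # test statistic from Step 3
critical_value = 0.927  # from Table IX with n = 12
p_value = 0.0266        # obtained with technology, as reported in Step 4

# The test is left tailed, so small values of Rp lead to rejection.
reject_by_critical_value = r_p <= critical_value
reject_by_p_value = p_value <= alpha

print(reject_by_critical_value, reject_by_p_value)  # True True
```

Both approaches lead to the same decision, as they must, because they are two descriptions of the same rejection region.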
Exercise 15.131 on page 708
706 CHAPTER 15 Inferential Methods in Regression and Correlation
Correlation Tests for Normality in Residual Analysis
An important use of correlation tests for normality is in residual analysis. Recall that, if the assumptions for regression inferences are met, we can regard the residuals as independent observations of a variable, called the error term, having a normal distribution. Thus a normal probability plot of the residuals should be roughly linear. We can more precisely check the condition of normality for the error term by conducting a correlation test for normality on the residuals.
EXAMPLE 15.14 The Correlation Test for Normality
Age and Price of Orions The residuals for the age and price data of a sample of 11 Orions were calculated in the fourth column of Table 14.8 on page 652 and are repeated here in Table 15.8. At the 5% significance level, do the data provide sufficient evidence to conclude that the normality assumption for regression inferences is violated by the variables age and price for Orions?
TABLE 15.8 Residuals for the Orion data

−9.16  −11.42  −3.90  −12.16
−5.16  3.84  −7.90  21.10
14.05  16.36  −5.64

Solution We apply Procedure 15.6 to the residuals in Table 15.8. The null and alternative hypotheses are, respectively,
H0: The normality assumption for regression inferences is not violated
Ha: The normality assumption for regression inferences is violated.
Proceeding as in Example 15.13, we find that the value of the test statistic is Rp=0.934.
Critical-value approach: From Table IX, the critical value for a test at the 5% significance level is 0.923. Because the value of the test statistic exceeds the critical value, we do not reject H0.
P-value approach: From Table IX, we find that 0.05 < P < 0.10. (Using technology, we get P = 0.084.) Because the P-value exceeds the specified significance level of 0.05, we do not reject H0. Table 9.8 on page 378 shows, however, that the data do provide moderate evidence against the null hypothesis.
Interpretation At the 5% significance level, the data do not provide sufficient evidence to conclude that the normality assumption for regression inferences is violated by the variables age and price for Orions.
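The computation of Rp for the residuals can be sketched in Python as follows. One assumption needs to be stated loudly: the text takes normal scores from Table III, whereas this sketch approximates them with the plotting positions (i − 3/8)/(n + 1/4), so the result may differ slightly from the tabled calculation. The function names are our own.

```python
from math import sqrt
from statistics import NormalDist

def approx_normal_scores(n):
    """Approximate normal scores for a sample of size n.

    ASSUMPTION: uses the plotting positions (i - 3/8)/(n + 1/4);
    the text instead reads the scores from Table III.
    """
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation_statistic(data):
    """Return Rp = Σxw / sqrt(Sxx Σw²), pairing sorted data with normal scores."""
    x = sorted(data)
    n = len(x)
    w = approx_normal_scores(n)
    sum_xw = sum(xi * wi for xi, wi in zip(x, w))
    s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    sum_w2 = sum(wi ** 2 for wi in w)
    return sum_xw / sqrt(s_xx * sum_w2)

# Residuals for the Orion data, copied from Table 15.8.
residuals = [-9.16, -11.42, -3.90, -12.16, -5.16, 3.84,
             -7.90, 21.10, 14.05, 16.36, -5.64]

r_p = correlation_statistic(residuals)
critical_value = 0.923  # from Table IX, as quoted in the solution

print(round(r_p, 3), "reject H0" if r_p <= critical_value else "do not reject H0")
```

The printed statistic can be compared with the value Rp = 0.934 reported in the solution; a small discrepancy would be attributable to the approximated normal scores.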
Exercise 15.141 on page 709
THE TECHNOLOGY CENTER
Some statistical technologies have programs that automatically perform a correlation test for normality. In this subsection, we present output and step-by-step instructions for such programs. (Note to Excel and TI-83/84 Plus users: At the time of this writing, neither Excel nor the TI-83/84 Plus has a built-in program for conducting a correlation test for normality. However, they can be used to help perform such a test.)
EXAMPLE 15.15 Using Technology to Perform a Correlation Test for Normality
Adjusted Gross Incomes A random sample of 12 federal individual income tax returns from last year gave the adjusted gross incomes (AGI), in thousands of dollars, shown in Table 15.6 on page 703. Use Minitab to decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that adjusted gross incomes are not normally distributed.
Solution We want to perform the hypothesis test
H0: Adjusted gross incomes are normally distributed
Ha: Adjusted gross incomes are not normally distributed
at the 5% significance level.
We applied Minitab’s correlation test for normality program to the data, resulting in Output 15.4. Steps for generating that output are presented in Instructions 15.4.
OUTPUT 15.4 Correlation test for normality on the AGI data
MINITAB
As shown in Output 15.4, the P-value for the hypothesis test is 0.027. Because the P-value is less than the specified significance level of 0.05, we reject H0. At the 5% significance level, the data provide sufficient evidence to conclude that adjusted gross incomes are not normally distributed.
INSTRUCTIONS 15.4 Steps for generating Output 15.4
MINITAB
1 Store the data from Table 15.6 in a column named AGI
2 Choose Stat ➤ Basic Statistics ➤ Normality Test...
3 Specify AGI in the Variable text box
4 Select the Ryan-Joiner option button from the Tests for Normality list
5 Click OK
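For readers working without Minitab, the P-value can also be approximated by simulation. The following Python sketch is not Minitab's algorithm; it is a Monte Carlo estimate under two stated assumptions: normal scores are approximated by the plotting positions (i − 3/8)/(n + 1/4) instead of Table III, and the seed and number of simulated samples are arbitrary choices. Because Rp is unchanged by shifting or rescaling the data, standard normal samples suffice for simulating the null distribution.

```python
import random
from math import sqrt
from statistics import NormalDist

def normal_scores(n):
    # ASSUMPTION: approximate normal scores via (i - 3/8)/(n + 1/4).
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def r_p(data, w):
    # Correlation statistic Rp = Σxw / sqrt(Sxx Σw²) for the sorted data.
    x = sorted(data)
    n = len(x)
    s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    return sum(a * b for a, b in zip(x, w)) / sqrt(s_xx * sum(b * b for b in w))

# Adjusted gross incomes in thousands of dollars (the x column of Table 15.7).
agi = [7.8, 9.7, 10.6, 12.7, 12.8, 18.1, 21.2, 33.0, 43.5, 51.1, 81.4, 93.1]
w = normal_scores(len(agi))
observed = r_p(agi, w)

rng = random.Random(2024)  # arbitrary seed, fixed for reproducibility
n_sims = 5000              # arbitrary number of simulated normal samples
count = 0
for _ in range(n_sims):
    sample = [rng.gauss(0, 1) for _ in agi]
    if r_p(sample, w) <= observed:  # left-tailed test
        count += 1

p_estimate = count / n_sims
print(round(observed, 3), p_estimate)
```

The estimate is the proportion of simulated normal samples whose Rp is at most the observed value; it can be compared with the P-value of 0.027 shown in Output 15.4, keeping in mind that simulation error and the approximated scores both introduce small differences.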
Exercises 15.5
Understanding the Concepts and Skills
15.127 Regarding normal probability plots,
a. what are they?
b. what is an important use for them?
c. how is one used to assess the normality of a variable?
d. why is the method described in part (c) subjective?
15.128 In a correlation test for normality, what correlation is computed?
15.129 If you examine Procedure 15.6, you will note that a correlation test for normality is always left tailed. Explain why this is so.
15.130 Suppose that you perform a correlation test for normality at the 1% significance level. Further suppose that you reject the null hypothesis that the variable under consideration is normally distributed. Can you be confident in stating that the variable is not normally distributed? Explain your answer.
In Exercises 15.131–15.138, perform a correlation test for normality, using either the critical-value approach or the P-value approach.
15.131 Exam Scores. A sample of the final exam scores in a large introductory statistics course is as follows.
88 67 64 76 86
85 82 39 75 34
90 63 89 90 84
81 96 100 70 96
At the 5% significance level, do the data provide sufficient evidence to conclude that final exam scores in this introductory statistics class are not normally distributed?
15.132 Cell Phone Rates. In an issue of Consumer Reports, different cell-phone providers and plans were compared. The monthly fees, in dollars, for a sample of the providers and plans are shown in the following table.
40 110 90 30 70
70 30 60 60 50
60 70 35 80 75
Do the data provide sufficient evidence to conclude that monthly fees for cell phones are not normally distributed? Use α = 0.05.
15.133 Thoroughbred Racing. The following table displays finishing times, in seconds, for the winners of fourteen 1-mile thoroughbred horse races, as found in two recent issues of Thoroughbred Times.
94.15 93.37 103.02 95.57 97.73 101.09 99.38 97.19 96.63 101.05 97.91 98.44 97.47 95.10
Do the data provide sufficient evidence to conclude that the finishing times for the winners of 1-mile thoroughbred horse races are not normally distributed? Use α = 0.10.
15.134 Beverage Expenditures. The Bureau of Labor Statistics publishes information on average annual expenditures by consumers in the Consumer Expenditure Survey. In 2005, the mean amount spent by consumers on nonalcoholic beverages was $303.
A random sample of 12 consumers yielded the following data, in dollars, on last year’s expenditures on nonalcoholic beverages.
423 238 246 327 321 343 302 335 321 311 256 320
At the 10% significance level, do the data provide sufficient evidence to conclude that last year’s expenditures by consumers on nonalcoholic beverages are not normally distributed?
15.135 Shoe and Apparel E-Tailers. In the special report
“Mousetrap: The Most-Visited Shoe and Apparel E-tailers”
(Footwear News, Vol. 58, No. 3, p. 18), we found the following data on the average time, in minutes, spent per user per month from January to June of one year for a sample of 15 shoe and apparel retail Web sites.
13.3 9.0 11.1 9.1 8.4
15.6 8.1 8.3 13.0 17.1
16.3 13.5 8.0 15.1 5.8
At the 10% significance level, do the data provide sufficient evidence to conclude that the average time spent per user per month from January to June of the year in question is not normally distributed?
15.136 Hotels and Motels. The following table provides the daily charges, in dollars, for a sample of 15 hotels and motels operating in South Carolina. The data were found in the report South Carolina Statistical Abstract sponsored by the South Carolina Budget and Control Board.
81.05 69.63 74.25 53.39 57.48 47.87 61.07 51.40 50.37 106.43 47.72 58.07 56.21 130.17 95.23
At the 5% significance level, do the data provide sufficient evidence to conclude that daily charges by hotels and motels in South Carolina are not normally distributed?
15.137 Oxygen Distribution. In the article “Distribution of Oxygen in Surface Sediments from Central Sagami Bay, Japan:
In Situ Measurements by Microelectrodes and Planar Optodes”
(Deep Sea Research Part I: Oceanographic Research Papers, Vol. 52, Issue 10, pp. 1974–1987), R. Glud et al. explore the distributions of oxygen in surface sediments from central Sagami Bay. The oxygen distribution gives important information on the general biogeochemistry of marine sediments. Measurements were performed at 16 sites. A sample of 22 depths yielded the following data, in millimoles per square meter per day (mmol m⁻² d⁻¹), on diffusive oxygen uptake (DOU).
1.8 2.0 1.8 2.3 3.8 3.4 2.7 1.1 3.3 1.2 3.6 1.9 7.6 2.0 1.5 2.0 1.1 0.7 1.0 1.8 1.8 6.7
Do the data provide sufficient evidence to conclude that diffusive oxygen uptakes in surface sediments from central Sagami Bay are not normally distributed? Use α = 0.01.
15.138 Medieval Cremation Burials. In the article “Material Culture as Memory: Combs and Cremations in Early Medieval Britain” (Early Medieval Europe, Vol. 12, Issue 2, pp. 89–128), H. Williams discussed the frequency of cremation burials found in 17 archaeological sites in eastern England. Here are the data.
83 64 46 48 523 35 34 265 2484
46 385 21 86 429 51 258 119
At the 1% significance level, do the data provide sufficient evidence to conclude that frequency of cremation burials in archaeological sites in eastern England is not normally distributed?
15.139 Explain how the normality assumption for regression inferences can be checked by using a correlation test for normality.
In Exercises 15.140–15.145, we repeat the information from Exercises 15.16–15.21. For each exercise, use a correlation test for normality to decide at the specified significance level whether the data provide sufficient evidence to conclude that the normality assumption for regression inferences is violated by the two variables under consideration.
15.140 Tax Efficiency. Following are the data on percentage of investments in energy securities and tax efficiency from Exercise 15.16. Use α = 0.05.
x 3.1 3.2 3.7 4.3 4.0 5.5 6.7 7.4 7.4 10.6
y 98.1 94.7 92.0 89.8 87.5 85.0 82.0 77.8 72.1 53.5
15.141 Corvette Prices. Following are the age and price data for Corvettes from Exercise 15.17. Use α = 0.10.
x 6 6 6 2 2 5 4 5 1 4
y 290 280 295 425 384 315 355 328 425 325
15.142 Custom Homes. Following are the size and price data for custom homes from Exercise 15.18. Use α = 0.01.
x 26 27 33 29 29 34 30 40 22
y 540 555 575 577 606 661 738 804 496
15.143 Plant Emissions. Following are the data on plant weight and quantity of volatile emissions from Exercise 15.19. Use α=0.05.
x 57 85 57 65 52 67 62 80 77 53 68
y 8.0 22.0 10.5 22.5 12.0 11.5 7.5 13.0 16.5 21.0 12.0
15.144 Crown-Rump Length. Following are the data on age of fetuses and length of crown-rump from Exercise 15.20. Use α=0.10.
x 10 10 13 13 18 19 19 23 25 28
y 66 66 108 106 161 166 177 228 235 280
15.145 Study Time and Score. Following are the data on total hours studied over 2 weeks and test score at the end of the 2 weeks from Exercise 15.21. Use α = 0.01.
x 10 15 12 20 8 16 14 22
y 92 81 84 74 85 80 84 80
15.146 Age and BMI. In the article “Childhood Overweight Problem in a Selected School District in Hawaii” (American Journal of Human Biology, Vol. 12, Issue 2, pp. 164–177), D. Chai et al. examined the serious problem of obesity among boys and girls of Hawaiian ancestry. A sample of six children gave the following data on age (x) and body mass index (y).
x 7 7 10 11 14 15
y 18.1 16.8 20.6 21.5 23.8 24.5
At the 5% significance level, do the data provide sufficient evidence to conclude that the variables age and body mass index violate the normality assumption for regression inferences?
Working with Large Data Sets
In each of Exercises 15.147–15.149, use the technology of your choice to perform and interpret a correlation test for normality at the specified significance level for the variable under consideration.
15.147 Body Temperature. A study by researchers at the University of Maryland addressed the question of whether the mean body temperature of humans is 98.6°F. The results of the study by P. Mackowiak et al. appeared in the article “A Critical Appraisal of 98.6°F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich” (Journal of the American Medical Association, Vol. 268, pp. 1578–1580).
Among other data, the researchers obtained the body temperatures of 93 healthy humans, as provided on the WeissStats CD.
Use α = 0.05.
15.148 Vegetarians and Omnivores. Philosophical and health issues are prompting an increasing number of Taiwanese to switch to a vegetarian lifestyle. In the paper “LDL of Taiwanese Vegetarians Are Less Oxidizable than Those of Omnivores”
(Journal of Nutrition, Vol. 130, pp. 1591–1596), S. Lu et al.
compared the daily intake of nutrients by vegetarians and omnivores living in Taiwan. Among the nutrients considered was protein. Too little protein stunts growth and interferes with all bodily functions; too much protein puts a strain on the kidneys, can cause diarrhea and dehydration, and can leach calcium from bones and teeth. The data on the WeissStats CD, based on the results of the aforementioned study, give the daily protein intake, in grams, by samples of 51 female vegetarians and 53 female omnivores. Use α = 0.05.
15.149 “Chips Ahoy! 1,000 Chips Challenge.” Students in an introductory statistics course at the U.S. Air Force Academy participated in Nabisco’s “Chips Ahoy! 1,000 Chips Challenge” by confirming that there were at least 1000 chips in every 18-ounce bag of cookies that they examined. As part of their assignment, they concluded that the number of chips per bag is approximately normally distributed. Their conclusion was based on the data shown on the WeissStats CD, which gives the number of chips per bag for 42 bags. Do you agree with the conclusion of the students? Explain your answer. [SOURCE: B. Warner and J. Rutledge, “Checking the Chips Ahoy! Guarantee,” Chance, Vol. 12(1), pp. 10–14]
a. Use α = 0.05. b. Use α = 0.10.
In Exercises 15.150–15.160, use the technology of your choice to do the following tasks.
a. Decide whether finding a regression line for the data is reasonable. If so, also do part (b).
b. Decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the normality assumption for regression inferences is violated by the variables under consideration.
15.150 Birdies and Score. The data from Exercise 15.30 for number of birdies during a tournament and final score of 63 women golfers are on the WeissStats CD.
15.151 U.S. Presidents. The data from Exercise 15.31 for the ages at inauguration and of death of the presidents of the United States are on the WeissStats CD.
15.152 Health Care. The data from Exercise 15.32 for percentage of gross domestic product (GDP) spent on health care and life expectancy, in years, of selected countries are on the WeissStats CD. Do the required parts separately for each gender.
15.153 Acreage and Value. The data from Exercise 15.33 for lot size (in acres) and assessed value (in thousands of dollars) of a sample of homes in a particular area are on the WeissStats CD.
15.154 Home Size and Value. The data from Exercise 15.34 for home size (in square feet) and assessed value (in thousands of dollars) for the same homes as in Exercise 15.153 are on the WeissStats CD.
15.155 High and Low Temperature. The data from Exercise 15.35 for average high and low temperatures in January of a random sample of 50 cities are on the WeissStats CD.
15.156 PCBs and Pelicans. The data from Exercise 15.36 for shell thickness and concentration of PCBs of 60 Anacapa pelican eggs are on the WeissStats CD.
15.157 Gas Guzzlers. The data from Exercise 15.37 for gas mileage and engine displacement of 121 vehicles are on the WeissStats CD.
15.158 Estriol Level and Birth Weight. The data from Exercise 15.38 for estriol levels of pregnant women and birth weights of their children are on the WeissStats CD.
15.159 Shortleaf Pines. The data from Exercise 15.39 for volume, in cubic feet, and diameter at breast height, in inches, of 70 shortleaf pines are on the WeissStats CD.
15.160 Body Fat. The data from Exercise 15.72 for age and body fat of 18 randomly selected adults are on the WeissStats CD.
Extending the Concepts and Skills
15.161 Finger Length of Criminals. In 1902, W. R. Macdonell published the article “On Criminal Anthropometry and the Identification of Criminals” (Biometrika, 1, pp. 177–227). Among other things, the author presented data on the left middle finger length, in centimeters (cm). The following table provides the midpoints and frequencies of the finger length classes used.
Midpoint Midpoint
(cm) Frequency (cm) Frequency
9.5 1 11.6 691
9.8 4 11.9 509
10.1 24 12.2 306
10.4 67 12.5 131
10.7 193 12.8 63
11.0 417 13.1 16
11.3 575 13.4 3
At the 5% significance level, do the data provide sufficient evidence to conclude that left middle finger length of criminals is not normally distributed?
15.162 Gestation Periods of Humans. For humans, gestation periods are normally distributed with a mean of 266 days and a standard deviation of 16 days.
a. Simulate four random samples of 50 human gestation periods each.
b. Perform a correlation test for normality on each sample in part (a). Use α = 0.05.
c. Are the conclusions in part (b) what you expected? Explain your answer.
15.163 Emergency Room Traffic. Desert Samaritan Hospital, in Mesa, Arizona, keeps records of emergency room traffic.
Those records reveal that the times between arriving patients have a special type of reverse-J-shaped distribution called an exponential distribution. The records also show that the mean time between arriving patients is 8.7 minutes.
a. Simulate four random samples of 75 interarrival times each.
b. Perform a correlation test for normality on each sample in part (a). Use α = 0.05.
c. Are the conclusions in part (b) what you expected? Explain your answer.
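As a starting point for part (a) of Exercises 15.162 and 15.163, the samples could be simulated along the following lines in Python. This is only one possible approach; the seed is an arbitrary choice made for reproducibility, and the variable names are our own. Parts (b) and (c) are left to the reader.

```python
import random

rng = random.Random(1)  # arbitrary seed so the samples can be reproduced

# Exercise 15.162(a): four samples of 50 gestation periods,
# normally distributed with mean 266 days and standard deviation 16 days.
gestation_samples = [[rng.gauss(266, 16) for _ in range(50)] for _ in range(4)]

# Exercise 15.163(a): four samples of 75 interarrival times,
# exponentially distributed with mean 8.7 minutes.
# expovariate takes the rate parameter, which is 1 / mean.
interarrival_samples = [[rng.expovariate(1 / 8.7) for _ in range(75)]
                        for _ in range(4)]

print(len(gestation_samples), len(gestation_samples[0]))        # 4 50
print(len(interarrival_samples), len(interarrival_samples[0]))  # 4 75
```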
CHAPTER IN REVIEW
You Should Be Able to
1. use and understand the formulas in this chapter.
2. state the assumptions for regression inferences.
3. understand the difference between the population regression line and a sample regression line.
4. estimate the regression parameters β0, β1, and σ.
5. determine the standard error of the estimate.
6. perform a residual analysis to check the assumptions for regression inferences.
7. perform a hypothesis test to decide whether the slope, β1, of the population regression line is not 0 and hence whether x is useful for predicting y.
8. obtain a confidence interval for β1.
9. determine a point estimate and a confidence interval for the conditional mean of the response variable corresponding to a particular value of the predictor variable.
10. determine a predicted value and a prediction interval for the response variable corresponding to a particular value of the predictor variable.
11. understand the difference between the population correlation coefficient and a sample correlation coefficient.
12. perform a hypothesis test for a population linear correlation coefficient.
*13. perform a correlation test for normality.
Key Terms
conditional distribution, 669
conditional mean, 669
conditional mean t-interval procedure, 689
correlation t-test, 698
correlation test for normality,* 704
error term, 706
linearly correlated variables, 697
linearly uncorrelated variables, 696
multiple regression analysis, 692
negatively linearly correlated variables, 697
population linear correlation coefficient (ρ), 696
population regression equation, 670
population regression line, 670
positively linearly correlated variables, 697
predicted value t-interval procedure, 691
prediction interval, 690
regression model, 670
regression t-interval procedure, 685
regression t-test, 682
residual (e), 673
residual plot, 674
residual standard deviation, 673
sampling distribution of the slope of the regression line, 681
simple linear regression, 692
standard error of the estimate (se), 672
REVIEW PROBLEMS
Understanding the Concepts and Skills
1. Suppose that x and y are two variables of a population with x a predictor variable and y a response variable.
a. The distribution of all possible values of the response variable y corresponding to a particular value of the predictor variable x is called a ______ distribution of the response variable.
b. State the four assumptions for regression inferences.
2. Suppose that x and y are two variables of a population and that the assumptions for regression inferences are met with x as the predictor variable and y as the response variable.
a. What statistic is used to estimate the slope of the population regression line?
b. What statistic is used to estimate the y-intercept of the population regression line?
c. What statistic is used to estimate the common conditional standard deviation of the response variable corresponding to fixed values of the predictor variable?
3. What two plots did we use in this chapter to decide whether we can reasonably presume that the assumptions for regression inferences are met by two variables of a population? What properties should those plots have?
4. Regarding analysis of residuals, decide in each case which assumption for regression inferences may be violated.
a. A residual plot (that is, a plot of the residuals against the observed values of the predictor variable) shows curvature.
b. A residual plot becomes wider with increasing values of the predictor variable.
c. A normal probability plot of the residuals shows extreme curvature.
d. A normal probability plot of the residuals shows outliers but is otherwise roughly linear.
5. Suppose that you perform a hypothesis test for the slope of the population regression line with the null hypothesis H0: β1 = 0 and the alternative hypothesis Ha: β1 ≠ 0. If you reject the null hypothesis, what can you say about the utility of the regression equation for making predictions?
6. Identify three statistics that can be used as a basis for testing the utility of a regression.
7. For a particular value of a predictor variable, is there a difference between the predicted value of the response variable and the point estimate for the conditional mean of the response variable?
Explain your answer.
8. Generally speaking, what is the difference between a confidence interval and a prediction interval?
9. Fill in the blank: x̄ is to μ as r is to ______.
10. Identify the relationship between two variables and the terminology used to describe that relationship if
a. ρ >0. b. ρ=0. c. ρ <0.
11. Graduation Rates. Graduation rate—the percentage of entering freshmen attending full time and graduating within 5 years—and what influences it have become a concern in