ADRIEN LEGENDRE: INTRODUCING THE METHOD OF LEAST SQUARES
Step 2 The endpoints of the confidence interval for \(\beta_1\) are
\(b_1 \pm t_{\alpha/2} \cdot \dfrac{s_e}{\sqrt{S_{xx}}}\).
Step 3 Interpret the confidence interval.
EXAMPLE 15.6 The Regression t-Interval Procedure
Age and Price of Orions Use the data in Table 15.3 on page 681 to determine a 95% confidence interval for the slope of the population regression line that relates price to age for Orions.
Solution We apply Procedure 15.2.
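Step 1 For a confidence level of \(1 - \alpha\), use Table IV to find \(t_{\alpha/2}\) with \(\text{df} = n - 2\).
For a 95% confidence interval, \(\alpha = 0.05\). Because \(n = 11\), \(\text{df} = 11 - 2 = 9\). From Table IV, \(t_{\alpha/2} = t_{0.05/2} = t_{0.025} = 2.262\).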
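Step 2 The endpoints of the confidence interval for \(\beta_1\) are \(b_1 \pm t_{\alpha/2} \cdot \dfrac{s_e}{\sqrt{S_{xx}}}\).
From Example 14.4, \(b_1 = -20.26\), \(\sum x_i^2 = 326\), and \(\sum x_i = 58\). Also, from Example 15.2, \(s_e = 12.58\). Because \(S_{xx} = \sum x_i^2 - (\sum x_i)^2/n\), the endpoints of the confidence interval for \(\beta_1\) are
\(-20.26 \pm 2.262 \cdot \dfrac{12.58}{\sqrt{326 - (58)^2/11}}\), or \(-20.26 \pm 6.33\), or \(-26.59\) to \(-13.93\).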
Step 3 Interpret the confidence interval.
Interpretation We can be 95% confident that the slope of the population regression line is somewhere between −26.59 and −13.93. In other words, we can be 95% confident that the yearly decrease in mean price for Orions is somewhere between $1393 and $2659.
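As a check on the arithmetic in Example 15.6, the interval can be recomputed from the raw age and price data for the 11 Orions (Table 15.3, repeated later as Table 15.4). This is a minimal sketch using only the Python standard library; the function name `slope_confidence_interval` is hypothetical, and the critical value \(t_{0.025} = 2.262\) is read from Table IV rather than computed.

```python
import math

# Age (yr) and price ($100) for the sample of 11 Orions (Table 15.3 / Table 15.4)
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]


def slope_confidence_interval(x, y, t_crit):
    """Return the endpoints b1 -/+ t_crit * se / sqrt(Sxx).

    t_crit is t_{alpha/2} with df = n - 2, supplied by the caller
    (for example, looked up in Table IV).
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    b1 = sxy / sxx                     # sample slope
    sse = syy - sxy ** 2 / sxx         # error sum of squares
    se = math.sqrt(sse / (n - 2))      # standard error of the estimate
    margin = t_crit * se / math.sqrt(sxx)
    return b1 - margin, b1 + margin


low, high = slope_confidence_interval(ages, prices, t_crit=2.262)  # 95%, df = 9
print(f"{low:.2f} to {high:.2f}")  # -26.59 to -13.93
```

Because the sketch works from unrounded sums, it reproduces the endpoints found by hand.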
Report 15.4
Exercise 15.57 on page 687
686 CHAPTER 15 Inferential Methods in Regression and Correlation
THE TECHNOLOGY CENTER
Most statistical technologies provide the information needed to perform a regression t-test as part of their regression analysis output. For instance, consider the Minitab and Excel regression analysis in Output 14.2 on page 643 for the age and price data of 11 Orions. The items circled in orange give the t-statistic and the P-value for the regression t-test.
To perform a regression t-test with the TI-83/84 Plus, we use the LinRegTTest program. See the TI-83/84 Plus Manual for details.
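For readers without Minitab, Excel, or a TI-83/84 Plus, the two numbers those outputs report for the regression t-test, the t-statistic \(t = b_1/(s_e/\sqrt{S_{xx}})\) and its two-tailed P-value, can be approximated with the Python standard library alone. This is a sketch, not a replacement for a statistics package: the P-value comes from integrating the t density numerically with Simpson's rule, and all function names are hypothetical.

```python
import math

ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t_stat, df, steps=2000):
    """Approximate P(|T| >= |t_stat|) with Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # approximately P(0 <= T <= |t_stat|)
    return 2 * (0.5 - area)


def regression_t_test(x, y):
    """Return (t, P) for testing H0: beta1 = 0 against Ha: beta1 != 0."""
    n = len(x)
    sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    syy = sum(yi * yi for yi in y) - sum(y) ** 2 / n
    b1 = sxy / sxx
    se = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))
    t_stat = b1 / (se / math.sqrt(sxx))
    return t_stat, two_tailed_p_value(t_stat, n - 2)


t_stat, p_value = regression_t_test(ages, prices)
print(f"t = {t_stat:.2f}, P = {p_value:.5f}")
```

For the Orion data the statistic is about −7.24 with df = 9, and the P-value is far below 0.05, so the sketch leads to rejecting \(H_0: \beta_1 = 0\) at the 5% level.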
Exercises 15.2
Understanding the Concepts and Skills
15.40 Explain why the predictor variable is useless as a predictor of the response variable if the slope of the population regression line is 0.
15.41 For two variables satisfying Assumptions 1–3 for regression inferences, the population regression equation is \(y = 20 - 3.5x\). For samples of size 10 and given values of the predictor variable, the distribution of slopes of all possible sample regression lines is a ______ distribution with mean ______.
15.42 Consider the standardized variable
\[ z = \frac{b_1 - \beta_1}{\sigma/\sqrt{S_{xx}}}. \]
a. Identify its distribution.
b. Why can't it be used as the test statistic for a hypothesis test concerning \(\beta_1\)?
c. What statistic is used? What is the distribution of that statistic?
15.43 In this section, we used the statistic \(b_1\) as a basis for conducting a hypothesis test to decide whether a regression equation is useful for prediction. Identify two other statistics that can be used as a basis for such a test.
In Exercises 15.44–15.49, we repeat the information from Exercises 15.10–15.15.
a. Decide, at the 10% significance level, whether the data provide sufficient evidence to conclude that x is useful for predicting y.
b. Find a 90% confidence interval for the slope of the population regression line.
15.44 \(\hat{y} = 2 + x\)
x  2  4  3
y  3  5  7
15.45 \(\hat{y} = 1 - 2x\)
x  3  1  2
y  −4  0  −5
15.46 \(\hat{y} = 1 + 2x\)
x  0  4  3  1  2
y  1  9  8  4  3
15.47 \(\hat{y} = -3 + 2x\)
x  3  4  1  2
y  4  5  0  −1
15.48 \(\hat{y} = 1.75 + 0.25x\)
x  1  1  5  5
y  1  3  2  4
15.49 \(\hat{y} = 2.875 - 0.625x\)
x  0  2  2  5  6
y  4  2  0  −2  1
In Exercises 15.50–15.55, we repeat the information from Exercises 15.16–15.21. Presuming that the assumptions for regression inferences are met, decide at the specified significance level whether the data provide sufficient evidence to conclude that the predictor variable is useful for predicting the response variable.
15.50 Tax Efficiency. Following are the data on percentage of investments in energy securities and tax efficiency from Exercise 15.16. Use \(\alpha = 0.05\).
x  3.1  3.2  3.7  4.3  4.0  5.5  6.7  7.4  7.4  10.6
y  98.1  94.7  92.0  89.8  87.5  85.0  82.0  77.8  72.1  53.5
15.51 Corvette Prices. Following are the age and price data for Corvettes from Exercise 15.17. Use \(\alpha = 0.10\).
x  6  6  6  2  2  5  4  5  1  4
y  290  280  295  425  384  315  355  328  425  325
15.52 Custom Homes. Following are the size and price data for custom homes from Exercise 15.18. Use \(\alpha = 0.01\).
x  26  27  33  29  29  34  30  40  22
y  540  555  575  577  606  661  738  804  496
15.53 Plant Emissions. Following are the data on plant weight and quantity of volatile emissions from Exercise 15.19. Use \(\alpha = 0.05\).
x  57  85  57  65  52  67  62  80  77  53  68
y  8.0  22.0  10.5  22.5  12.0  11.5  7.5  13.0  16.5  21.0  12.0
15.54 Crown-Rump Length. Following are the data on age of fetuses and length of crown-rump from Exercise 15.20. Use \(\alpha = 0.10\).
x  10  10  13  13  18  19  19  23  25  28
y  66  66  108  106  161  166  177  228  235  280
15.55 Study Time and Score. Following are the data on total hours studied over 2 weeks and test score at the end of the 2 weeks from Exercise 15.21. Use \(\alpha = 0.01\).
x  10  15  12  20  8  16  14  22
y  92  81  84  74  85  80  84  80
In each of Exercises 15.56–15.61, apply Procedure 15.2 on page 685 to find and interpret a confidence interval, at the specified confidence level, for the slope of the population regression line that relates the response variable to the predictor variable.
15.56 Tax Efficiency. Refer to Exercise 15.50; 95%.
15.57 Corvette Prices. Refer to Exercise 15.51; 90%.
15.58 Custom Homes. Refer to Exercise 15.52; 99%.
15.59 Plant Emissions. Refer to Exercise 15.53; 95%.
15.60 Crown-Rump Length. Refer to Exercise 15.54; 90%.
15.61 Study Time and Score. Refer to Exercise 15.55; 99%.
Working with Large Data Sets
In Exercises 15.62–15.72, use the technology of your choice to do the following tasks.
a. Decide whether you can reasonably apply the regression t-test. If so, then also do part (b).
b. Decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the predictor variable is useful for predicting the response variable.
15.62 Birdies and Score. The data from Exercise 15.30 for number of birdies during a tournament and final score for 63 women golfers are on the WeissStats CD.
15.63 U.S. Presidents. The data from Exercise 15.31 for the ages at inauguration and of death for the presidents of the United States are on the WeissStats CD.
15.64 Health Care. The data from Exercise 15.32 for percentage of gross domestic product (GDP) spent on health care and life expectancy, in years, for selected countries are on the WeissStats CD. Do the required parts separately for each gender.
15.65 Acreage and Value. The data from Exercise 15.33 for lot size (in acres) and assessed value (in thousands of dollars) for a sample of homes in a particular area are on the WeissStats CD.
15.66 Home Size and Value. The data from Exercise 15.34 for home size (in square feet) and assessed value (in thousands of dollars) for the same homes as in Exercise 15.65 are on the WeissStats CD.
15.67 High and Low Temperature. The data from Exercise 15.35 for average high and low temperatures in January for a random sample of 50 cities are on the WeissStats CD.
15.68 PCBs and Pelicans. Use the data points given on the WeissStats CD for shell thickness and concentration of PCBs for 60 Anacapa pelican eggs referred to in Exercise 15.36.
15.69 Gas Guzzlers. Use the data on the WeissStats CD for gas mileage and engine displacement for 121 vehicles referred to in Exercise 15.37.
15.70 Estriol Level and Birth Weight. Use the data on the WeissStats CD for estriol levels of pregnant women and birth weights of their children referred to in Exercise 15.38.
15.71 Shortleaf Pines. The data from Exercise 15.39 for volume, in cubic feet, and diameter at breast height, in inches, for 70 shortleaf pines are on the WeissStats CD.
15.72 Body Fat. In the paper “Total Body Composition by Dual-Photon (¹⁵³Gd) Absorptiometry” (American Journal of Clinical Nutrition, Vol. 40, pp. 834–839), R. Mazess et al. studied methods for quantifying body composition. Eighteen randomly selected adults were measured for percentage of body fat, using dual-photon absorptiometry. Each adult’s age and percentage of body fat are shown on the WeissStats CD.
15.3 Estimation and Prediction
In this section, we examine how a sample regression equation can be used to make two important inferences:
• Estimate the conditional mean of the response variable corresponding to a particular value of the predictor variable.
• Predict the value of the response variable for a particular value of the predictor variable.
We again use the Orion data to illustrate the pertinent ideas. In doing so, we presume that the assumptions for regression inferences (Key Fact 15.1 on page 670) are satisfied by the variables age and price for Orions. Example 15.3 on page 674 shows that to presume so is not unreasonable.
EXAMPLE 15.7 Estimating Conditional Means in Regression
Age and Price of Orions Use the data on age and price for a sample of 11 Orions, repeated in Table 15.4, to estimate the mean price of all 3-year-old Orions.
TABLE 15.4 Age and price data for a sample of 11 Orions
Age (yr)  Price ($100)
x y
5 85
4 103
6 70
5 82
5 89
5 98
6 66
6 95
2 169
7 70
7 48
Solution By Assumption 1 for regression inferences, the population regression line gives the mean prices for the various ages of Orions. In particular, the mean price of all 3-year-old Orions is \(\beta_0 + \beta_1 \cdot 3\). Because \(\beta_0\) and \(\beta_1\) are unknown, we estimate the mean price of all 3-year-old Orions (\(\beta_0 + \beta_1 \cdot 3\)) by the corresponding value on the sample regression line, namely, \(b_0 + b_1 \cdot 3\).
Recalling that the sample regression equation for the age and price data in Table 15.4 is \(\hat{y} = 195.47 - 20.26x\), we estimate that the mean price of all 3-year-old Orions is
\[ \hat{y} = 195.47 - 20.26 \cdot 3 = 134.69, \]
or $13,469. Note that the estimate for the mean price of all 3-year-old Orions is the same as the predicted price for a 3-year-old Orion. Both are obtained by substituting x =3 into the sample regression equation.
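The estimate in Example 15.7 needs only the least-squares coefficients. The following Python sketch (standard library only; the helper name `estimate_conditional_mean` is hypothetical) recomputes \(b_0\) and \(b_1\) from Table 15.4 and evaluates the sample regression line at \(x = 3\).

```python
# Age (yr) and price ($100) for the 11 Orions in Table 15.4
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(prices) / n
sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, prices))
b1 = sxy / sxx              # sample slope
b0 = y_bar - b1 * x_bar     # sample intercept


def estimate_conditional_mean(xp):
    """Point estimate b0 + b1*xp of the mean response at x = xp."""
    return b0 + b1 * xp


print(f"y-hat = {b0:.2f} + ({b1:.2f})x")                         # y-hat = 195.47 + (-20.26)x
print(f"estimate at x = 3: {estimate_conditional_mean(3):.2f}")  # 134.68
```

The unrounded coefficients give 134.68; the 134.69 in the text comes from substituting the rounded values 195.47 and −20.26.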
Report 15.5
Exercise 15.81(a) on page 695
Confidence Intervals for Conditional Means in Regression
The estimate of $13,469 for the mean price of all 3-year-old Orions found in the previous example is a point estimate. Providing a confidence-interval estimate for the mean price of all 3-year-old Orions would be more informative.
To that end, consider all possible samples of 11 Orions whose ages are the same as those given in the first column of Table 15.4. For such samples, the predicted price of a 3-year-old Orion varies from one sample to another and is therefore a variable.
Using the assumptions for regression inferences, we can show that its distribution is a normal distribution whose mean equals the mean price of all 3-year-old Orions. More generally, we have Key Fact 15.5.
KEY FACT 15.5 Distribution of the Predicted Value of a Response Variable
Suppose that the variables x and y satisfy the four assumptions for regression inferences. Let \(x_p\) denote a particular value of the predictor variable, and let \(\hat{y}_p\) be the corresponding value predicted for the response variable by the sample regression equation; that is, \(\hat{y}_p = b_0 + b_1 x_p\). Then, for samples of size n, each with the same values \(x_1, x_2, \ldots, x_n\) for the predictor variable, the following properties hold for \(\hat{y}_p\).
• The mean of \(\hat{y}_p\) equals the conditional mean of the response variable corresponding to the value \(x_p\) of the predictor variable: \(\mu_{\hat{y}_p} = \beta_0 + \beta_1 x_p\).
• The standard deviation of \(\hat{y}_p\) is
\[ \sigma_{\hat{y}_p} = \sigma \sqrt{\frac{1}{n} + \frac{(x_p - \sum x_i/n)^2}{S_{xx}}}. \]
• The variable \(\hat{y}_p\) is normally distributed.
In particular, for fixed values of the predictor variable, the possible predicted values of the response variable corresponding to \(x_p\) have a normal distribution with mean \(\beta_0 + \beta_1 x_p\).
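Key Fact 15.5 can be checked by simulation: hold the predictor values fixed, generate many samples that satisfy the regression assumptions, and look at the spread of the resulting \(\hat{y}_p\) values. In this Python sketch the population parameters \(\beta_0 = 195\), \(\beta_1 = -20\), and \(\sigma = 12.5\) are hypothetical values chosen only for illustration; the predictor values are the ages in Table 15.4, and \(x_p = 3\).

```python
import math
import random
import statistics

AGES = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]  # fixed predictor values (Table 15.4)
BETA0, BETA1, SIGMA = 195.0, -20.0, 12.5  # hypothetical population parameters
XP = 3                                    # predictor value of interest

n = len(AGES)
x_bar = sum(AGES) / n
sxx = sum((x - x_bar) ** 2 for x in AGES)


def simulate_predicted_values(reps=20000, seed=1):
    """Draw `reps` samples with the same x-values; return each sample's y-hat_p."""
    rng = random.Random(seed)
    predicted = []
    for _ in range(reps):
        # Independent normal errors with constant SD around the population line
        ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in AGES]
        y_bar = sum(ys) / n
        b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(AGES, ys)) / sxx
        b0 = y_bar - b1 * x_bar
        predicted.append(b0 + b1 * XP)
    return predicted


theory_mean = BETA0 + BETA1 * XP
theory_sd = SIGMA * math.sqrt(1 / n + (XP - x_bar) ** 2 / sxx)

predicted = simulate_predicted_values()
print(f"mean: simulated {statistics.mean(predicted):.2f}, theory {theory_mean:.2f}")
print(f"SD:   simulated {statistics.stdev(predicted):.3f}, theory {theory_sd:.3f}")
```

The simulated mean and standard deviation should land close to the theoretical values of 135 and about 7.36; a histogram of `predicted` would also look bell-shaped, in line with the third property.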
In light of Key Fact 15.5, if we standardize the variable \(\hat{y}_p\), the resulting variable has the standard normal distribution. However, because the standardized variable contains the unknown parameter \(\sigma\), it cannot be used as a basis for a confidence-interval formula. Therefore we replace \(\sigma\) by its estimate \(s_e\), the standard error of the estimate. The resulting variable has a t-distribution.
KEY FACT 15.6 t-Distribution for Confidence Intervals for Conditional Means in Regression
Suppose that the variables x and y satisfy the four assumptions for regression inferences. Then, for samples of size n, each with the same values \(x_1, x_2, \ldots, x_n\) for the predictor variable, the variable
\[ t = \frac{\hat{y}_p - (\beta_0 + \beta_1 x_p)}{s_e \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \sum x_i/n)^2}{S_{xx}}}} \]
has the t-distribution with \(\text{df} = n - 2\).
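The effect of replacing \(\sigma\) by \(s_e\) shows up in a similar simulation. The sketch below (Python, standard library; the population parameters are again hypothetical) computes the variable of Key Fact 15.6 for many simulated samples and counts how often it falls beyond ±2.262, the Table IV value of \(t_{0.025}\) for df = 9, and beyond ±1.96, the corresponding standard normal value.

```python
import math
import random

AGES = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]  # fixed predictor values (Table 15.4)
BETA0, BETA1, SIGMA = 195.0, -20.0, 12.5  # hypothetical population parameters
XP = 3


def studentized_prediction(rng):
    """One draw of (y-hat_p - (beta0 + beta1*xp)) / (se * sqrt(1/n + (xp - x_bar)^2 / Sxx))."""
    n = len(AGES)
    x_bar = sum(AGES) / n
    sxx = sum((x - x_bar) ** 2 for x in AGES)
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in AGES]
    y_bar = sum(ys) / n
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(AGES, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(AGES, ys))
    se = math.sqrt(sse / (n - 2))      # s_e takes the place of the unknown sigma
    y_hat_p = b0 + b1 * XP
    return (y_hat_p - (BETA0 + BETA1 * XP)) / (se * math.sqrt(1 / n + (XP - x_bar) ** 2 / sxx))


rng = random.Random(2)
draws = [studentized_prediction(rng) for _ in range(20000)]
share_t = sum(abs(t) > 2.262 for t in draws) / len(draws)  # t_{0.025}, df = 9
share_z = sum(abs(t) > 1.96 for t in draws) / len(draws)   # z_{0.025}
print(f"beyond +/-2.262: {share_t:.3f}   beyond +/-1.96: {share_z:.3f}")
```

Roughly 5% of the draws fall beyond ±2.262, as the t-distribution with df = 9 predicts, while noticeably more than 5% fall beyond ±1.96, so normal critical values would give intervals that are too short.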
Recalling that \(\beta_0 + \beta_1 x_p\) is the conditional mean of the response variable corresponding to the value \(x_p\) of the predictor variable, we can apply Key Fact 15.6 to derive a confidence-interval procedure for conditional means in regression. We call that procedure the conditional mean t-interval procedure.
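Solving \(-t_{\alpha/2} \le t \le t_{\alpha/2}\) from Key Fact 15.6 for \(\beta_0 + \beta_1 x_p\) gives the endpoints \(\hat{y}_p \pm t_{\alpha/2} \cdot s_e \sqrt{1/n + (x_p - \sum x_i/n)^2/S_{xx}}\). As a sketch (Python, standard library; the function name `conditional_mean_interval` is hypothetical and \(t_{\alpha/2}\) is supplied from Table IV), here is that computation for the mean price of all 3-year-old Orions at the 95% level.

```python
import math

ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]                 # Table 15.4, age (yr)
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]  # Table 15.4, price ($100)


def conditional_mean_interval(x, y, xp, t_crit):
    """Endpoints y-hat_p -/+ t_crit * se * sqrt(1/n + (xp - x_bar)^2 / Sxx)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(sse / (n - 2))     # standard error of the estimate
    y_hat_p = b0 + b1 * xp            # point estimate of the conditional mean
    margin = t_crit * se * math.sqrt(1 / n + (xp - x_bar) ** 2 / sxx)
    return y_hat_p - margin, y_hat_p + margin


low, high = conditional_mean_interval(ages, prices, xp=3, t_crit=2.262)  # 95%, df = 9
print(f"{low:.2f} to {high:.2f}")  # 117.93 to 151.44
```

Prices are in hundreds of dollars, so the interval runs from roughly $11,793 to $15,144.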
PROCEDURE 15.3 Conditional Mean t-Interval Procedure
Purpose To find a confidence interval for the conditional mean of the response variable corresponding to a particular value of the predictor variable, \(x_p\)
Assumptions The four assumptions for regression inferences
Step 1 For a confidence level of \(1 - \alpha\), use Table IV to find \(t_{\alpha/2}\) with \(\text{df} = n - 2\).
Step 2 Compute the point estimate, \(\hat{y}_p = b_0 + b_1 x_p\).
Step 3 The endpoints of the confidence interval for the conditional mean of the response variable are
\(\hat{y}_p \pm t_{\alpha/2} \cdot s_e\)
1