CHAPTER 9
Hypothesis Tests for One Population Mean

CHAPTER OUTLINE
9.1 The Nature of Hypothesis Testing
9.2 Critical-Value Approach to Hypothesis Testing
9.3 P-Value Approach to Hypothesis Testing
9.4 Hypothesis Tests for One Population Mean When σ Is Known
9.5 Hypothesis Tests for One Population Mean When σ Is Unknown

CHAPTER OBJECTIVES
In Chapter 8, we examined methods for obtaining confidence intervals for one population mean. We know that a confidence interval for a population mean, μ, is based on a sample mean, x̄. Now we show how that statistic can be used to make decisions about hypothesized values of a population mean.

For example, suppose that we want to decide whether the mean prison sentence, μ, of all people imprisoned last year for drug offenses exceeds the year 2000 mean of 75.5 months. To make that decision, we can take a random sample of people imprisoned last year for drug offenses, compute their sample mean sentence, x̄, and then apply a statistical-inference technique called a hypothesis test.

In this chapter, we describe hypothesis tests for one population mean. In doing so, we consider two different procedures. They are called the one-mean z-test and the one-mean t-test, which are the hypothesis-test analogues of the one-mean z-interval and one-mean t-interval confidence-interval procedures, respectively, discussed in Chapter 8. We also examine two different approaches to hypothesis testing, namely, the critical-value approach and the P-value approach.

CASE STUDY
Gender and Sense of Direction

Many of you have been there, a classic scene: mom yelling at dad to turn left, while dad decides to do just the opposite. Well, who made the right call? More generally, who has a better sense of direction, women or men?
Dr. J. Sholl et al. considered these and related questions in the paper “The Relation of Sex and Sense of Direction to Spatial Orientation in an Unfamiliar Environment” (Journal of Environmental Psychology, Vol. 20, pp. 17–28). In their study, the spatial orientation skills of 30 male students and 30 female students from Boston College were challenged in Houghton Garden Park, a wooded park near campus in Newton, Massachusetts. Before driving to the park, the participants were asked to rate their own sense of direction as either good or poor.

In the park, students were instructed to point to predesignated landmarks and also to the direction of south. Pointing was carried out by students moving a pointer attached to a 360° protractor; the angle of the pointing response was then recorded to the nearest degree. For the female students who had rated their sense of direction to be good, the following table displays the pointing errors (in degrees) when they attempted to point south.

14  91  27  122  68  128  78  20  109  31  69  12  36  18

Based on these data, can you conclude that, in general, women who consider themselves to have a good sense of direction really do better, on average, than they would by randomly guessing at the direction of south?
To answer that question, you need to conduct a hypothesis test, which you will do after you study hypothesis testing in this chapter.

9.1 The Nature of Hypothesis Testing

We often use inferential statistics to make decisions or judgments about the value of a parameter, such as a population mean. For example, we might need to decide whether the mean weight, μ, of all bags of pretzels packaged by a particular company differs from the advertised weight of 454 grams (g), or we might want to determine whether the mean age, μ, of all cars in use has increased from the year 2000 mean of 9.0 years. One of the most commonly used methods for making such decisions or judgments is to perform a hypothesis test.

A hypothesis is a statement that something is true. For example, the statement “the mean weight of all bags of pretzels packaged differs from the advertised weight of 454 g” is a hypothesis.

Typically, a hypothesis test involves two hypotheses: the null hypothesis and the alternative hypothesis (or research hypothesis), which we define as follows.

DEFINITION 9.1

What Does It Mean?
Originally, the word null in null hypothesis stood for “no difference” or “the difference is null.” Over the years, however, null hypothesis has come to mean simply a hypothesis to be tested.

Null and Alternative Hypotheses; Hypothesis Test
Null hypothesis: A hypothesis to be tested. We use the symbol H0 to represent the null hypothesis.
Alternative hypothesis: A hypothesis to be considered as an alternative to the null hypothesis. We use the symbol Ha to represent the alternative hypothesis.
Hypothesis test: The problem in a hypothesis test is to decide whether the null hypothesis should be rejected in favor of the alternative hypothesis.

For instance, in the pretzel-packaging example, the null hypothesis might be “the mean weight of all bags of pretzels packaged equals the advertised weight of 454 g,” and the alternative hypothesis might be “the mean weight of all bags of pretzels packaged differs from the advertised weight of 454 g.”

Choosing the Hypotheses

The first step in setting up a hypothesis test is to decide on the null hypothesis and the alternative hypothesis. The following are some guidelines for choosing these two hypotheses. Although the guidelines refer specifically to hypothesis tests for one population mean, μ, they apply to any hypothesis test concerning one parameter.

Null Hypothesis
In this book, the null hypothesis for a hypothesis test concerning a population mean, μ, always specifies a single value for that parameter. Hence we can express the null hypothesis as H0: μ = μ0, where μ0 is some number.

Alternative Hypothesis
The choice of the alternative hypothesis depends on and should reflect the purpose of the hypothesis test. Three choices are possible for the alternative hypothesis.
• If the primary concern is deciding whether a population mean, μ, is different from a specified value μ0, we express the alternative hypothesis as Ha: μ ≠ μ0. A hypothesis test whose alternative hypothesis has this form
is called a two-tailed test.
• If the primary concern is deciding whether a population mean, μ, is less than a specified value μ0, we express the alternative hypothesis as Ha: μ < μ0. A hypothesis test whose alternative hypothesis has this form is called a left-tailed test.
• If the primary concern is deciding whether a population mean, μ, is greater than a specified value μ0, we express the alternative hypothesis as Ha: μ > μ0. A hypothesis test whose alternative hypothesis has this form is called a right-tailed test.

A hypothesis test is called a one-tailed test if it is either left tailed or right tailed.

EXAMPLE 9.1 Choosing the Null and Alternative Hypotheses

Quality Assurance A snack-food company produces a 454-g bag of pretzels. Although the actual net weights deviate slightly from 454 g and vary from one bag to another, the company insists that the mean net weight of the bags be 454 g. As part of its program, the quality assurance department periodically performs a hypothesis test to decide whether the packaging machine is working properly, that is, to decide whether the mean net weight of all bags packaged is 454 g.
a. Determine the null hypothesis for the hypothesis test.
b. Determine the alternative hypothesis for the hypothesis test.
c. Classify the hypothesis test as two tailed, left tailed, or right tailed.

Solution Let μ denote the mean net weight of all bags packaged.
a. The null hypothesis is that the packaging machine is working properly, that is, that the mean net weight, μ, of all bags packaged equals 454 g. In symbols, H0: μ = 454 g.
b. The alternative hypothesis is that the packaging machine is not working properly, that is, that the mean net weight, μ, of all bags packaged is different from 454 g. In symbols, Ha: μ ≠ 454 g.
c. This hypothesis test is two tailed because a does-not-equal sign (≠) appears in the alternative hypothesis.

EXAMPLE 9.2 Choosing the Null and Alternative Hypotheses

Prices of History Books The R. R.
Bowker Company collects information on the retail prices of books and publishes the data in The Bowker Annual Library and Book Trade Almanac. In 2005, the mean retail price of history books was $78.01. Suppose that we want to perform a hypothesis test to decide whether this year’s mean retail price of history books has increased from the 2005 mean.
a. Determine the null hypothesis for the hypothesis test.
b. Determine the alternative hypothesis for the hypothesis test.
c. Classify the hypothesis test as two tailed, left tailed, or right tailed.

Solution Let μ denote this year’s mean retail price of history books.
a. The null hypothesis is that this year’s mean retail price of history books equals the 2005 mean of $78.01; that is, H0: μ = $78.01.
b. The alternative hypothesis is that this year’s mean retail price of history books is greater than the 2005 mean of $78.01; that is, Ha: μ > $78.01.
c. This hypothesis test is right tailed because a greater-than sign (>) appears in the alternative hypothesis.

EXAMPLE 9.3 Choosing the Null and Alternative Hypotheses

Poverty and Dietary Calcium Calcium is the most abundant mineral in the human body and has several important functions. Most body calcium is stored in the bones and teeth, where it functions to support their structure. Recommendations for calcium are provided in Dietary Reference Intakes, developed by the Institute of Medicine of the National Academy of Sciences. The recommended adequate intake (RAI) of calcium for adults (ages 19–50 years) is 1000 milligrams (mg) per day. Suppose that we want to perform a hypothesis test to decide whether the average adult with an income below the poverty level gets less than the RAI of 1000 mg.
a. Determine the null hypothesis for the hypothesis test.
b. Determine the alternative hypothesis for the hypothesis test.
c. Classify the hypothesis test as two tailed, left tailed, or right tailed.

Solution Let μ denote the mean calcium intake (per day) of all adults with incomes below the poverty level.
a. The null hypothesis is that the mean calcium intake of all adults with incomes below the poverty level equals the RAI of 1000 mg per day; that is, H0: μ = 1000 mg.
b. The alternative hypothesis is that the mean calcium intake of all adults with incomes below the poverty level is less than the RAI of 1000 mg per day; that is, Ha: μ < 1000 mg.
c. This hypothesis test is left tailed because a less-than sign (<) appears in the alternative hypothesis.

Exercise 9.5 on page 346

[…]

CHAPTER 14 Inferential Methods in Regression and Correlation
14.4 Inferences in Correlation

• If ρ > 0, the variables are positively linearly correlated, meaning that y tends to increase linearly as x increases (and vice versa), with the tendency being greater the closer ρ is to 1.
• If ρ < 0, the variables are negatively linearly correlated, meaning that y tends to decrease linearly as x increases (and vice versa), with the tendency being greater the closer ρ is to −1.
• If ρ ≠ 0, the variables are linearly correlated. Linearly correlated variables are either positively linearly correlated or negatively linearly correlated.

As we mentioned, a sample linear correlation coefficient, r, is an estimate of the population linear correlation coefficient, ρ. Consequently, we can use r as a basis for performing a hypothesis test for ρ. To do so, we require the following fact.

KEY FACT 14.9 t-Distribution for a Correlation Test
Suppose that the variables x and y satisfy the four assumptions for regression inferences and that ρ = 0. Then, for samples of size n, the variable

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

has the t-distribution with df = n − 2.

In light of Key Fact 14.9, for a hypothesis test with the null hypothesis H0: ρ = 0, we can use the variable

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

as the test statistic and obtain the critical values or P-value from the t-table, Table IV. We call this hypothesis-testing procedure the correlation t-test. Procedure 14.5 on the next page provides a step-by-step method for performing a correlation t-test by using either the critical-value approach or the P-value approach.

EXAMPLE 14.11 The Correlation t-Test

Age and Price of Orions The data on age and price for a sample of 11 Orions are repeated in Table 14.5. At the 5% significance level, do the data provide sufficient evidence to conclude that age and price of Orions are negatively linearly correlated?

TABLE 14.5 Age and price data for a sample of 11 Orions

Age (yr) x    Price ($100) y
5             85
4             103
6             70
5             82
5             89
5             98
6             66
6             95
2             169
7             70
7             48

Solution As we discovered in Example 14.3 on page 556, considering that the assumptions for regression inferences are met by the variables age and price for Orions is not unreasonable, at least for Orions between 2 and 7 years old. Consequently, we apply Procedure 14.5 to carry out the required hypothesis test.

Step 1 State the null and alternative hypotheses.
Let ρ denote the population linear correlation coefficient for the variables age and price of Orions. Then the null and alternative hypotheses are, respectively,
H0: ρ = 0 (age and price are linearly uncorrelated)
Ha: ρ < 0 (age and price are negatively linearly correlated).
Note that the hypothesis test is left tailed.

Step 2 Decide on the significance level, α.
We are to use α = 0.05.

PROCEDURE 14.5 Correlation t-Test

Purpose To perform a hypothesis test for a population linear correlation coefficient, ρ
Assumptions The four assumptions for regression inferences

Step 1 The null hypothesis is H0: ρ = 0, and the alternative hypothesis is
Ha: ρ ≠ 0 (Two tailed) or Ha: ρ < 0 (Left tailed) or Ha: ρ > 0 (Right tailed).

Step 2 Decide on the significance level, α.

Step 3 Compute the value of the test statistic

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

and denote that value t0.

CRITICAL-VALUE APPROACH
Step 4 The critical value(s) are
±tα/2 (Two tailed) or −tα (Left tailed) or tα (Right tailed)
with df = n − 2. Use Table IV to find the critical value(s).
[Figure: rejection and nonrejection regions under the t-curve for two-tailed (area α/2 beyond each of ±tα/2), left-tailed (area α to the left of −tα), and right-tailed (area α to the right of tα) tests]
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.

OR

P-VALUE APPROACH
Step 4 The t-statistic has df = n − 2. Use Table IV to estimate the P-value, or obtain it exactly by using technology.
[Figure: P-value as the area under the t-curve beyond the observed value t0 for two-tailed (beyond ±|t0|), left-tailed, and right-tailed tests]
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.

Step 6 Interpret the results of the hypothesis test.

Step 3 Compute the value of the test statistic

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}.$$

In Example 4.10 on page 173, we found that r = −0.924, so the value of the test statistic is

$$t = \frac{-0.924}{\sqrt{\dfrac{1 - (-0.924)^2}{11 - 2}}} = -7.249.$$

FIGURE 14.12A Critical-value approach: t-curve with df = 9; the rejection region is the area of 0.05 to the left of −1.833.
FIGURE 14.12B P-value approach: t-curve with df = 9; the P-value is the area to the left of t = −7.249.

CRITICAL-VALUE APPROACH
Step 4 The critical value for a left-tailed test is −tα with df = n − 2. Use Table IV to find the critical value.
For n = 11, df = 9. Also, α = 0.05. From Table IV, for df = 9, t0.05 = 1.833. Consequently, the critical value is −t0.05 = −1.833, as shown in Fig. 14.12A.
Step 5 If the value of the test statistic falls in the rejection region, reject H0; otherwise, do not reject H0.

OR

P-VALUE APPROACH
Step 4 The t-statistic has df = n − 2. Use Table IV to estimate the P-value or obtain it exactly by using technology.
From Step 3, the value of the test statistic is t = −7.249. Because the test is left tailed, the P-value is the probability of observing a value of t of −7.249 or less if the null hypothesis is true. That probability equals the shaded area shown in Fig. 14.12B. For n = 11, df = 9. Referring to Fig. 14.12B and Table IV, we find that P < 0.005. (Using technology, we obtain P = 0.0000244.)
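The arithmetic in Steps 3 and 4 can be checked numerically. What follows is a minimal Python sketch under stated assumptions, not the textbook's own method: the function names correlation_t, t_pdf, and t_cdf are hypothetical, and the P-value is approximated by Simpson's rule on the t density rather than read from Table IV. The only inputs are values given in the example: r = −0.924, n = 11, α = 0.05, and t0.05 = 1.833.

```python
import math


def correlation_t(r, n):
    """Test statistic t = r / sqrt((1 - r^2) / (n - 2)) from Key Fact 14.9."""
    return r / math.sqrt((1 - r * r) / (n - 2))


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=20000):
    """Approximate P(T <= t) with Simpson's rule on [0, |t|] plus symmetry.

    steps must be even; this is an illustrative approximation, not a
    replacement for statistical software.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


# Values taken from Example 14.11.
r, n, alpha = -0.924, 11, 0.05
df = n - 2
t0 = correlation_t(r, n)

# Critical-value approach (left tailed): critical value is -t_0.05 from Table IV.
t_crit = -1.833
reject_by_critical_value = t0 <= t_crit

# P-value approach (left tailed): P = P(T <= t0).
p_value = t_cdf(t0, df)
reject_by_p_value = p_value <= alpha

print(f"t0 = {t0:.3f}, df = {df}")
print(f"critical value = {t_crit}, reject H0: {reject_by_critical_value}")
print(f"P-value = {p_value:.7f}, reject H0: {reject_by_p_value}")
```

Because r is rounded to three decimal places here, the computed P-value may differ in its later digits from the value the text reports from technology. Both approaches lead to the same decision by construction, since the rejection region and the P-value are defined from the same t-distribution.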
CRITICAL-VALUE APPROACH
Step 5 (decision) The value of the test statistic, found in Step 3, is t = −7.249. Figure 14.12A shows that this value falls in the rejection region, so we reject H0. The test results are statistically significant at the 5% level.

P-VALUE APPROACH
Step 5 If P ≤ α, reject H0; otherwise, do not reject H0.
From Step 4, P < 0.005. Because the P-value is less than the specified significance level of 0.05, we reject H0. The test results are statistically significant at the 5% level and (see Table 9.8 on page 360) provide very strong evidence against the null hypothesis.

Step 6 Interpret the results of the hypothesis test.
Interpretation At the 5% significance level, the data provide sufficient evidence to conclude that age and price of Orions are negatively linearly correlated. Prices for 2- to 7-year-old Orions tend to decrease linearly with increasing age.

Report 14.8
Exercise 14.109 on page 583

THE TECHNOLOGY CENTER

Most statistical technologies have programs that automatically perform a correlation t-test. In this subsection, we present output and step-by-step instructions for such programs.

Note to Minitab users: At the time of this writing, Minitab does only a two-tailed correlation t-test. However, we can get a one-tailed P-value from the provided two-tailed P-value by using the result of Exercise 9.63 on page 361. This result implies, for instance, that if the sign of the sample linear correlation coefficient is in the same direction as the alternative hypothesis, then the one-tailed P-value equals one-half of the two-tailed P-value.

EXAMPLE 14.12 Using Technology to Conduct a Correlation t-Test

Age and Price of Orions Table 14.5 on page 579 gives the age and price data for a sample of 11 Orions. Use Minitab, Excel, or the TI-83/84 Plus to decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that age and price of Orions are negatively linearly correlated.

Solution Let ρ denote the population linear correlation
coefficient for the variables age and price of Orions. We want to perform the hypothesis test
H0: ρ = 0 (age and price are linearly uncorrelated)
Ha: ρ < 0 (age and price are negatively linearly correlated)
at the 5% significance level. Note that the hypothesis test is left tailed.

We applied the correlation t-test programs to the data, resulting in Output 14.3. Steps for generating that output are presented in Instructions 14.3.

[OUTPUT 14.3 Correlation t-test on the Orion data: panels for Excel, Minitab, and TI-83/84 Plus]

As shown in Output 14.3, the P-value is less than the specified significance level of 0.05, so we reject H0. At the 5% significance level, the data provide sufficient evidence to conclude that age and price of Orions are negatively linearly correlated.

INSTRUCTIONS 14.3 Steps for generating Output 14.3

MINITAB
1 Store the age and price data from Table 14.5 in columns named AGE and PRICE, respectively
2 Choose Stat ➤ Basic Statistics ➤ Correlation
3 Specify AGE and PRICE in the Variables text box
4 Check the Display p-values check box
5 Click OK

EXCEL
1 Store the age and price data from Table 14.5 in ranges named AGE and PRICE, respectively
2 Choose DDXL ➤ Regression
3 Select Correlation from the Function type drop-down list box
4 Specify AGE in the x-Axis Quantitative Variable text box
5 Specify PRICE in the y-Axis Quantitative Variable text box
6 Click OK
7 Click the Perform a Left Tailed Test button

TI-83/84 PLUS
1 Store the age and price data from Table 14.5 in lists named AGE and PRICE, respectively
2 Press STAT, arrow over to TESTS, and press ALPHA ➤ F for the TI-84 Plus and ALPHA ➤ E for the TI-83 Plus
3 Press 2nd ➤ LIST, arrow down to AGE, and press ENTER twice
4 Press 2nd ➤ LIST, arrow down to PRICE, and press ENTER three times
5 Highlight […]

[…]

14.103 x, y […] Ha: ρ < 0

[…]

x  […]
y  290  280  295  425  384  315  355  328  425  325

At the 5% level of significance, do the data provide sufficient evidence to conclude that age and price of Corvettes are negatively linearly correlated?
14.110 Custom Homes Following are the size and price data for custom homes from Exercise 14.18.

x  26   27   33   29   29   34   30   40   22
y  540  555  575  577  606  661  738  804  496

At the 0.5% significance level, do the data provide sufficient evidence to conclude that, for custom homes in the Equestrian Estates, size and price are positively linearly correlated?

14.111 Plant Emissions Following are the data on plant weight and quantity of volatile emissions from Exercise 14.19.

x  57   85    57    65    52    67    62   80    77    53    68
y  8.0  22.0  10.5  22.5  12.0  11.5  7.5  13.0  16.5  21.0  12.0

Do the data suggest that, for the potato plant Solanum tuberosum, weight and quantity of volatile emissions are linearly correlated? Use α = 0.05.

14.104 […] Ha: ρ ≠ 0
14.105 x, y […] Ha: ρ > 0
14.106 x, y […] Ha: ρ < 0
14.107 […] Ha: ρ ≠ 0

In Exercises 14.108–14.113, we repeat the information from Exercises 14.16–14.21. Presuming that the assumptions for regression inferences are met, perform the required correlation t-tests, using either the critical-value approach or the P-value approach.

14.108 Tax Efficiency Following are the data on percentage of investments in energy securities and tax efficiency from Exercise 14.16.

14.112 Crown-Rump Length Following are the data on age of fetuses and length of crown-rump from Exercise 14.20.

x  10  10  13   13   18   19   19   23   25   28
y  66  66  108  106  161  166  177  228  235  280

At the 10% significance level, do the data provide sufficient evidence to conclude that age and crown-rump length are linearly correlated?

14.113 Study Time and Score Following are the data on total hours studied over weeks and test score at the end of the weeks from Exercise 14.21.

x  10 15 12 20 16 14 22
y  92 81 84 74 85 80 84 80

a. At the 1% significance level, do the data provide sufficient evidence to conclude that a negative linear correlation exists between study time and test score for beginning calculus students?
b. Repeat part (a) using a 5% significance level.

14.114 Height and Score A random sample of 10 students was taken from an introductory statistics class. The following data were obtained, where x denotes height, in inches, and y denotes score on the final exam.

x  71  68  71  65  66  68  68  64  62  65
y  87  96  66  71  71  55  83  67  86  60

At the 5% significance level, do the data provide sufficient evidence to conclude that, for students in introductory statistics courses, height and final exam score are linearly correlated?

14.115 Is ρ a parameter or a statistic? What about r? Explain your answers.

Working with Large Data Sets

In each of Exercises 14.116–14.126, use the technology of your choice to decide whether you can reasonably apply the correlation t-test. If so, perform and interpret the required correlation t-test(s) at the 5% significance level.

14.116 Birdies and Score The data from Exercise 14.30 for number of birdies during a tournament and final score of 63 women golfers are on the WeissStats CD. Do the data provide sufficient evidence to conclude that, for women golfers, number of birdies and score are negatively linearly correlated?

14.117 U.S. Presidents The data from Exercise 14.31 for the ages at inauguration and of death of the presidents of the United States are on the WeissStats CD. Do the data provide sufficient evidence to conclude that, for U.S. presidents, age at inauguration and age at death are positively linearly correlated?
14.118 Health Care The data from Exercise 14.32 for percentage of gross domestic product (GDP) spent on health care and life expectancy, in years, of selected countries are on the WeissStats CD. Do each gender separately.

14.119 Acreage and Value The data from Exercise 14.33 for lot size (in acres) and assessed value (in thousands of dollars) of a sample of homes in a particular area are on the WeissStats CD. Do the data provide sufficient evidence to conclude that, for homes in this particular area, lot size and assessed value are positively linearly correlated?

14.120 Home Size and Value The data from Exercise 14.34 for home size (in square feet) and assessed value (in thousands of dollars) for the same homes as in Exercise 14.119 are on the WeissStats CD. Do the data provide sufficient evidence to conclude that, for homes in this particular area, home size and assessed value are positively linearly correlated?

14.121 High and Low Temperature The data from Exercise 14.35 for average high and low temperatures in January of a random sample of 50 cities are on the WeissStats CD. Do the data provide sufficient evidence to conclude that, for cities, average high and low temperatures in January are linearly correlated?

14.122 PCBs and Pelicans The data from Exercise 14.36 for shell thickness and concentration of PCBs of 60 Anacapa pelican eggs are on the WeissStats CD. Do the data provide sufficient evidence to conclude that concentration of PCBs and shell thickness are linearly correlated for Anacapa pelican eggs?

14.123 Gas Guzzlers The data from Exercise 14.37 for gas mileage and engine displacement of 121 vehicles are on the WeissStats CD. Do the data provide sufficient evidence to conclude that engine displacement and gas mileage are negatively linearly correlated?
14.124 Estriol Level and Birth Weight The data from Exercise 14.38 for estriol levels of pregnant women and birth weights of their children are on the WeissStats CD. Do the data provide sufficient evidence to conclude that estriol level and birth weight are positively linearly correlated?

14.125 Shortleaf Pines The data from Exercise 14.39 for volume, in cubic feet, and diameter at breast height, in inches, of 70 shortleaf pines are on the WeissStats CD. Do the data provide sufficient evidence to conclude that diameter at breast height and volume are positively linearly correlated for shortleaf pines?

14.126 Body Fat The data from Exercise 14.72 for age and body fat of 18 randomly selected adults are on the WeissStats CD.
a. Do the data provide sufficient evidence to conclude that, for adults, age and percentage of body fat are positively linearly correlated?
b. Remove the potential outlier and repeat part (a).
c. Compare your results with and without the removal of the potential outlier and state your conclusions.

CHAPTER IN REVIEW

You Should Be Able to
1. use and understand the formulas in this chapter.
2. state the assumptions for regression inferences.
3. understand the difference between the population regression line and a sample regression line.
4. estimate the regression parameters β0, β1, and σ.
5. determine the standard error of the estimate.
6. perform a residual analysis to check the assumptions for regression inferences.
7. perform a hypothesis test to decide whether the slope, β1, of the population regression line is not 0 and hence whether x is useful for predicting y.
8. obtain a confidence interval for β1.
9. determine a point estimate and a confidence interval for the conditional mean of the response variable corresponding to a particular value of the predictor variable.
10. determine a predicted value and a prediction interval for the response variable corresponding to a particular value of the predictor variable.
11. understand the difference between the
population correlation coefficient and a sample correlation coefficient.
12. perform a hypothesis test for a population linear correlation coefficient.

Key Terms
conditional distribution
conditional mean
conditional mean t-interval procedure
correlation t-test
linearly correlated variables
linearly uncorrelated variables
multiple regression analysis
negatively linearly correlated variables
population linear correlation coefficient (ρ)
population regression equation
population regression line
positively linearly correlated variables
predicted value t-interval procedure
prediction interval
regression model
regression t-interval procedure
regression t-test
residual (e)
residual plot
residual standard deviation
sampling distribution of the slope of the regression line
simple linear regression
standard error of the estimate (se)

REVIEW PROBLEMS

Understanding the Concepts and Skills

1. Suppose that x and y are two variables of a population with x a predictor variable and y a response variable.
a. The distribution of all possible values of the response variable y corresponding to a particular value of the predictor variable x is called a ______ distribution of the response variable.
b. State the four assumptions for regression inferences.

2. Suppose that x and y are two variables of a population and that the assumptions for regression inferences are met with x as the predictor variable and y as the response variable.
a. What statistic is used to estimate the slope of the population regression line?
b. What statistic is used to estimate the y-intercept of the population regression line?
c. What statistic is used to estimate the common conditional standard deviation of the response variable corresponding to fixed values of the predictor variable?
3. What two plots did we use in this chapter to decide whether we can reasonably presume that the assumptions for regression inferences are met by two variables of a population? What properties should those plots have?

4. Regarding analysis of residuals, decide in each case which assumption for regression inferences may be violated.
a. A residual plot (that is, a plot of the residuals against the observed values of the predictor variable) shows curvature.
b. A residual plot becomes wider with increasing values of the predictor variable.
c. A normal probability plot of the residuals shows extreme curvature.
d. A normal probability plot of the residuals shows outliers but is otherwise roughly linear.

5. Suppose that you perform a hypothesis test for the slope of the population regression line with the null hypothesis H0: β1 = 0 and the alternative hypothesis Ha: β1 ≠ 0. If you reject the null hypothesis, what can you say about the utility of the regression equation for making predictions?

6. Identify three statistics that can be used as a basis for testing the utility of a regression.

7. For a particular value of a predictor variable, is there a difference between the predicted value of the response variable and the point estimate for the conditional mean of the response variable? Explain your answer.

8. Generally speaking, what is the difference between a confidence interval and a prediction interval?
9. Fill in the blank: x̄ is to μ as r is to ______.

10. Identify the relationship between two variables and the terminology used to describe that relationship if
a. ρ > 0.
b. ρ = 0.
c. ρ < 0.

11. Graduation Rates Graduation rate (the percentage of entering freshmen attending full time and graduating within years) and what influences it have become a concern in U.S. colleges and universities. U.S. News and World Report’s “College Guide” provides data on graduation rates for colleges and universities as a function of the percentage of freshmen in the top 10% of their high school class, total spending per student, and student-to-faculty ratio. A random sample of 10 universities gave the following data on student-to-faculty ratio (S/F ratio) and graduation rate (Grad rate).

S/F ratio x    Grad rate y
16             45
20             55
17             70
19             50
22             47
17             46
17             50
17             66
10             26
18             60

Discuss what satisfying the assumptions for regression inferences would mean with student-to-faculty ratio as the predictor variable and graduation rate as the response variable.

12. Graduation Rates Refer to Problem 11.
a. Determine the regression equation for the data.
b. Compute and interpret the standard error of the estimate.
c. Presuming that the assumptions for regression inferences are met, interpret your answer to part (b).

13. Graduation Rates Refer to Problems 11 and 12. Perform a residual analysis to decide whether considering the assumptions for regression inferences to be met by the variables student-to-faculty ratio and graduation rate is reasonable.

For Problems 14–16, presume that the variables student-to-faculty ratio and graduation rate satisfy the assumptions for regression inferences.

14. Graduation Rates Refer to Problems 11 and 12.
a. At the 5% significance level, do the data provide sufficient evidence to conclude that student-to-faculty ratio is useful as a predictor of graduation rate?
b. Determine a 95% confidence interval for the slope, β1, of the population regression line that relates graduation rate to student-to-faculty ratio. Interpret your answer.

15. Graduation Rates. Refer to Problems 11 and 12.
a. Find a point estimate for the mean graduation rate of all universities that have a student-to-faculty ratio of 17.
b. Determine a 95% confidence interval for the mean graduation rate of all universities that have a student-to-faculty ratio of 17.
c. Find the predicted graduation rate for a university that has a student-to-faculty ratio of 17.
d. Find a 95% prediction interval for the graduation rate of a university that has a student-to-faculty ratio of 17.
e. Explain why the prediction interval in part (d) is wider than the confidence interval in part (b).

16. Graduation Rates. Refer to Problem 11. At the 2.5% significance level, do the data provide sufficient evidence to conclude that the variables student-to-faculty ratio and graduation rate are positively linearly correlated?

Working with Large Data Sets

In Problems 17–20, use the technology of your choice to
a. determine the sample regression equation.
b. find and interpret the standard error of the estimate.
c. decide, at the 5% significance level, whether the data provide sufficient evidence to conclude that the predictor variable is useful for predicting the response variable.
d. determine and interpret a point estimate for the conditional mean of the response variable corresponding to the specified value of the predictor variable.
e. find and interpret a 95% confidence interval for the conditional mean of the response variable corresponding to the specified value of the predictor variable.
f. determine and interpret the predicted value of the response variable corresponding to the specified value of the predictor variable.
g. find and interpret a 95% prediction interval for the value of the response variable corresponding to the specified value of the predictor variable.
h. compare and discuss the differences between the confidence interval that you obtained in part (e) and the prediction interval that you obtained in part (g).
i. perform and interpret the required correlation t-test at the 5% significance level.
j. perform a residual analysis to decide whether making the preceding inferences is reasonable. Explain your answer.

17. IMR and Life Expectancy. From the International Data Base, published by the U.S. Census Bureau, we obtained data on infant mortality rate (IMR) and life expectancy (LE), in years, for a sample of 60 countries. The data are presented on the WeissStats CD.
• For the estimations and predictions, use an IMR of 30.
• For the correlation test, decide whether IMR and life expectancy are negatively linearly correlated.

18. High Temperature and Precipitation. The National Oceanic and Atmospheric Administration publishes temperature and precipitation information for cities around the world in Climates of the World. Data on average high temperature (in degrees Fahrenheit) in July and average precipitation (in inches) in July for 48 cities are on the WeissStats CD.
• For the estimations and predictions, use an average July temperature of 83°F.
• For the correlation test, decide whether average high temperature in July and average precipitation in July are linearly correlated.

19. Fat Consumption and Prostate Cancer. Researchers have asked whether there is a relationship between nutrition and cancer, and many studies have shown that there is. In fact, one of the conclusions of a study by B. Reddy et al., "Nutrition and Its Relationship to Cancer" (Advances in Cancer Research, Vol. 32, pp. 237–345), was that "none of the risk factors for cancer is probably more significant than diet and nutrition." One dietary factor that has been studied for its relationship with prostate cancer is fat consumption. On the WeissStats CD, you will find data on per capita fat consumption (in grams per day) and prostate cancer death rate (per 100,000 males) for nations of the world. The data were obtained
from a graph (adapted from information in the article mentioned) in J. Robbins's classic book Diet for a New America (Walpole, NH: Stillpoint, 1987, p. 271).
• For the estimations and predictions, use a per capita fat consumption of 92 grams per day.
• For the correlation test, decide whether per capita fat consumption and prostate cancer death rate are positively linearly correlated.

20. Masters Golf. In the article "Statistical Fallacies in Sports" (Chance, Vol. 19, No. 4, pp. 50–56), S. Berry discussed, among other things, the relation between scores for the first and second rounds of the 2006 Masters golf tournament. You will find those scores on the WeissStats CD. Take these scores to be a sample of those of all Masters golf tournaments.
• For the estimations and predictions, use a first-round score of 72.
• For the correlation test, decide whether first-round and second-round scores are positively linearly correlated.

FOCUSING ON DATA ANALYSIS
UWEC UNDERGRADUATES

Recall from Chapter (refer to page 30) that the Focus database and Focus sample contain information on the undergraduate students at the University of Wisconsin - Eau Claire (UWEC). Now would be a good time for you to review the discussion about these data sets. Open the Focus sample worksheet (FocusSample) in the technology of your choice and do the following.
a. Perform a residual analysis to decide whether considering the assumptions for regression inferences met by the variables high school percentile and cumulative GPA appears reasonable.
b. With high school percentile as the predictor variable and cumulative GPA as the response variable, determine and interpret the standard error of the estimate.
c. At the 5% significance level, do the data provide sufficient evidence to conclude that high school percentile is useful for predicting cumulative GPA of UWEC undergraduates?
d. Determine a point estimate for the mean cumulative GPA of all UWEC undergraduates who had high school percentiles of 74.
e. Find a 95% confidence interval for the mean cumulative GPA of all UWEC undergraduates who had high school percentiles of 74.
f. Determine the predicted cumulative GPA of a UWEC undergraduate who had a high school percentile of 74.
g. Find a 95% prediction interval for the cumulative GPA of a UWEC undergraduate who had a high school percentile of 74.
h. At the 5% significance level, do the data provide sufficient evidence to conclude that high school percentile and cumulative GPA are positively linearly correlated?

CASE STUDY DISCUSSION
SHOE SIZE AND HEIGHT

At the beginning of this chapter, we repeated data from Chapter 4 on shoe size and height for a sample of students at Arizona State University. In Chapter 4, you used those data to perform some descriptive regression and correlation analyses. Now you are to employ those same data to carry out several inferential procedures in regression and correlation. We recommend that you use statistical software or a graphing calculator to solve the following problems, but they can also be done by hand.
a. Separate the data in the table on page 551 into two tables, one for males and the other for females. Parts (b)–(j) are for the male data.
b. Determine the sample regression equation with shoe size as the predictor variable for height.
c. Perform a residual analysis to decide whether considering Assumptions 1–3 for regression inferences to be satisfied by the variables shoe size and height appears reasonable.
d. Find and interpret the standard error of the estimate.
e. Determine the P-value for a test of whether shoe size is useful for predicting height. Then refer to Table 9.8 on page 360 to assess the evidence in favor of utility.
f. Find a point estimate for the mean height of all males who wear a size 10½ shoe.
g. Obtain a 95% confidence interval for the mean height of all males who wear a size 10½ shoe. Interpret your answer.
h. Determine the predicted height of a male who wears a size 10½ shoe.
i. Find a 95% prediction interval for the height of a male who wears a size 10½ shoe. Interpret your answer.
j. At the 5% significance level, do the data provide sufficient evidence to conclude that shoe size and height are positively linearly correlated?
k. Repeat parts (b)–(j) for the unabridged data on shoe size and height for females. Do the estimation and prediction problems for a size shoe.
l. Repeat part (k) for the data on shoe size and height for females with the outlier removed. Compare your results with those obtained in part (k).

BIOGRAPHY
SIR FRANCIS GALTON: DISCOVERER OF REGRESSION AND CORRELATION

Francis Galton was born on February 16, 1822, into a wealthy Quaker family of bankers and gunsmiths on his father's side and as a cousin of Charles Darwin on his mother's side. Although his IQ was estimated to be about 200, his formal education was unfinished. He began training in medicine in Birmingham and London but quit when, in his words, "A passion for travel seized me as if I had been a migratory bird." After a tour through Germany and southeastern Europe, he went to Trinity College in Cambridge to study mathematics. He left Cambridge in his third year, broken from overwork. He recovered quickly and resumed his medical studies in London. However, his father died before he had finished medical school and left to him, at 22, "a sufficient fortune to make me independent of the medical profession."

Galton held no professional or academic positions; nearly all his experiments were conducted at his home or performed by friends. He was curious about almost everything, and carried out research in fields that included meteorology, biology, psychology, statistics, and genetics.

The origination of the concepts of regression and correlation, developed by Galton as tools for measuring the influence of heredity, is summed up in his work Natural
Inheritance. He discovered regression during experiments with sweet-pea seeds to determine the law of inheritance of size. He made his other great discovery, correlation, while applying his techniques to the problem of measuring the degree of association between the sizes of two different body organs of an individual.

In his later years, Galton was associated with Karl Pearson, who became his champion and an extender of his ideas. Pearson was the first holder of the chair of eugenics at University College in London, which Galton had endowed in his will. Galton was knighted in 1909. He died in Haslemere, Surrey, England, in 1911.
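
As an illustration of the kind of computation that Problems 12, 14, and 15 call for, the following is a minimal Python sketch applied to the student-to-faculty ratio and graduation rate data of Problem 11. It is not the textbook's worked solution. The variable names are our own, and the critical value 2.306 is an assumption taken from a standard t-table (t with area 0.025 to its right, 8 degrees of freedom) rather than computed in the code. The sketch also presumes that the assumptions for regression inferences hold, which Problem 13 asks you to check.

```python
import math

# Data from Problem 11: student-to-faculty ratio (x), graduation rate (y)
x = [16, 20, 17, 19, 22, 17, 17, 17, 10, 18]
y = [45, 55, 70, 50, 47, 46, 50, 66, 26, 60]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross products about the means
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample regression equation: y-hat = b0 + b1 * x
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, with n - 2 degrees of freedom
sse = s_yy - s_xy ** 2 / s_xx
s_e = math.sqrt(sse / (n - 2))

# Test statistic for H0: beta1 = 0
t_slope = b1 / (s_e / math.sqrt(s_xx))

# Assumed value from a t-table: t_{0.025} with df = n - 2 = 8
T_CRIT = 2.306

# Estimation and prediction at a student-to-faculty ratio of 17
x_p = 17
y_hat = b0 + b1 * x_p
ci_half = T_CRIT * s_e * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)
pi_half = T_CRIT * s_e * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / s_xx)

print(f"regression equation: y-hat = {b0:.2f} + {b1:.2f} x")
print(f"standard error of the estimate: {s_e:.2f}")
print(f"t statistic for the slope: {t_slope:.2f} (critical value {T_CRIT})")
print(f"95% CI for the conditional mean at x = {x_p}: "
      f"{y_hat - ci_half:.1f} to {y_hat + ci_half:.1f}")
print(f"95% prediction interval at x = {x_p}: "
      f"{y_hat - pi_half:.1f} to {y_hat + pi_half:.1f}")
```

The two interval half-widths differ only by the extra "1 +" under the square root in the prediction interval, which accounts for the variation of an individual response about the conditional mean. That term is why the prediction interval is always the wider of the two.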