Chapter 15
Multiple Regression

Learning Objectives

1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables.
2. Be able to interpret the coefficients in a multiple regression analysis.
3. Know the assumptions necessary to conduct statistical tests involving the hypothesized regression model.
4. Understand the role of computer packages in performing multiple regression analysis.
5. Be able to interpret and use computer output to develop the estimated regression equation.
6. Be able to determine how good a fit is provided by the estimated regression equation.
7. Be able to test for the significance of the regression equation.
8. Understand how multicollinearity affects multiple regression analysis.
9. Know how residual analysis can be used to make a judgement as to the appropriateness of the model, identify outliers, and determine which observations are influential.
10. Understand how logistic regression is used for regression analyses involving a binary dependent variable.

15 - © 2010 Cengage Learning All Rights Reserved May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part

Chapter 15 Solutions:

1.
a. \(b_1 = .5906\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_1\) when \(x_2\) is held constant.
   \(b_2 = .4980\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_2\) when \(x_1\) is held constant.
b. \(\hat{y} = 29.1270 + .5906(180) + .4980(310) = 289.82\)

2.
a. The estimated regression equation is \(\hat{y} = 45.06 + 1.94x_1\).
   An estimate of \(y\) when \(x_1 = 45\) is \(\hat{y} = 45.06 + 1.94(45) = 132.36\).
b. The estimated regression equation is \(\hat{y} = 85.22 + 4.32x_2\).
   An estimate of \(y\) when \(x_2 = 15\) is \(\hat{y} = 85.22 + 4.32(15) = 150.02\).
c. The estimated regression equation is \(\hat{y} = -18.37 + 2.01x_1 + 4.74x_2\).
   An estimate of \(y\) when \(x_1 = 45\) and \(x_2 = 15\) is \(\hat{y} = -18.37 + 2.01(45) + 4.74(15) = 143.18\).

3.
a. \(b_1 = 3.8\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_1\) when \(x_2\), \(x_3\), and \(x_4\) are held
constant.
   \(b_2 = -2.3\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_2\) when \(x_1\), \(x_3\), and \(x_4\) are held constant.
   \(b_3 = 7.6\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_3\) when \(x_1\), \(x_2\), and \(x_4\) are held constant.
   \(b_4 = 2.7\) is an estimate of the change in \(y\) corresponding to a unit change in \(x_4\) when \(x_1\), \(x_2\), and \(x_3\) are held constant.
b. \(\hat{y} = 17.6 + 3.8(10) - 2.3(5) + 7.6(1) + 2.7(2) = 57.1\)

4.
a. \(\hat{y} = 25 + 10(15) + 8(10) = 255\); sales estimate: $255,000
b. Sales can be expected to increase by $10 for every dollar increase in inventory investment when advertising expenditure is held constant. Sales can be expected to increase by $8 for every dollar increase in advertising expenditure when inventory investment is held constant.

5.
a. The Minitab output is shown below:

The regression equation is
Revenue = 88.6 + 1.60 TVAdv

Predictor     Coef  SE Coef      T      P
Constant    88.638    1.582  56.02  0.000
TVAdv       1.6039   0.4778   3.36  0.015

S = 1.215   R-Sq = 65.3%   R-Sq(adj) = 59.5%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  16.640  16.640  11.27  0.015
Residual Error   6   8.860   1.477
Total            7  25.500

b. The Minitab output is shown below:

The regression equation is
Revenue = 83.2 + 2.29 TVAdv + 1.30 NewsAdv

Predictor     Coef  SE Coef      T      P
Constant    83.230    1.574  52.88  0.000
TVAdv       2.2902   0.3041   7.53  0.001
NewsAdv     1.3010   0.3207   4.06  0.010

S = 0.6426   R-Sq = 91.9%   R-Sq(adj) = 88.7%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       2  23.435  11.718  28.38  0.002
Residual Error   5   2.065   0.413
Total            7  25.500

c. No. The coefficient of TVAdv is 1.60 in part (a) and 2.29 in part (b). In part (b) it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant.
d. \(\text{Revenue} = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.56\), that is, $93,560

6.
a. The Minitab output is shown below:

The regression equation is
Proportion Won = 0.354 + 0.000888 HR

Predictor       Coef    SE Coef     T      P
Constant     0.35402    0.09591  3.69  0.002
HR         0.0008880  0.0005580  1.59  0.134

S = 0.0666633   R-Sq = 15.3%   R-Sq(adj) = 9.3%

Analysis of Variance

Source          DF        SS        MS     F      P
Regression       1  0.011253  0.011253  2.53  0.134
Residual Error  14  0.062216  0.004444
Total           15  0.073469

b. A portion of the Minitab output is shown below:

The regression equation is
Proportion Won = 0.865 - 0.0837 ERA

Predictor      Coef  SE Coef      T      P
Constant    0.86474  0.09661   8.95  0.000
ERA        -0.08367  0.02223  -3.76  0.002

S = 0.0510721   R-Sq = 50.3%   R-Sq(adj) = 46.7%

Analysis of Variance

Source          DF        SS        MS      F      P
Regression       1  0.036952  0.036952  14.17  0.002
Residual Error  14  0.036517  0.002608
Total           15  0.073469

c. A portion of the Minitab output is shown below:

The regression equation is
Proportion Won = 0.709 + 0.00140 HR - 0.103 ERA

Predictor       Coef    SE Coef      T      P
Constant     0.70919    0.06006  11.81  0.000
HR         0.0014006  0.0002453   5.71  0.000
ERA         -0.10260    0.01276  -8.04  0.000

S = 0.0282980   R-Sq = 85.8%   R-Sq(adj) = 83.7%

Analysis of Variance

Source          DF        SS        MS      F      P
Regression       2  0.063059  0.031530  39.37  0.000
Residual Error  13  0.010410  0.000801
Total           15  0.073469

d. \(\hat{y} = .709 + .00140(180) - .103(4) = .549\)
   The estimated regression equation indicates that if San Diego can make these changes, the estimated proportion of games won increases to .549, or 54.9%.

7.
a. The Minitab output is shown below:

The regression equation is
PCW Rating = 66.1 + 0.170 Performance

Predictor       Coef  SE Coef      T      P
Constant      66.062    3.793  17.42  0.000
Performance  0.16989  0.05407   3.14  0.014

S = 2.59221   R-Sq = 55.2%   R-Sq(adj) = 49.6%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       1   66.343  66.343  9.87  0.014
Residual Error   8   53.757   6.720
Total            9  120.100

b. The Minitab output is shown below:

The regression equation is
PCW Rating = 40.0 + 0.113 Performance + 0.382 Features

Predictor       Coef  SE Coef     T      P
Constant      39.982    7.855  5.09  0.001
Performance  0.11338  0.03846  2.95  0.021
Features      0.3820   0.1093  3.49  0.010

S = 1.67285   R-Sq = 83.7%   R-Sq(adj) = 79.0%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  100.511  50.255  17.96  0.002
Residual Error   7   19.589   2.798
Total            9  120.100

c. \(\hat{y} = 40.0 + .113(80) + .382(70) = 75.78\), or approximately 76

8.
a. The Minitab output is shown below:

The regression equation is
Price ($) = 31054 - 1329 Reliability

Predictor       Coef  SE Coef      T      P
Constant       31054     2217  14.01  0.000
Reliability  -1328.7    619.1  -2.15  0.040

S = 3717.85   R-Sq = 12.9%   R-Sq(adj) = 10.1%

Analysis of Variance

Source          DF         SS        MS     F      P
Regression       1   63665063  63665063  4.61  0.040
Residual Error  31  428495631  13822440
Total           32  492160694

\(\hat{y} = 31054 - 1328.7\,\text{Reliability}\)
Because the p-value = .040 < α = .05, there is a significant relationship between price and the reliability rating.

b. The Minitab output is shown below:

The regression equation is
Price ($) = 21313 + 137 Road-Test Score - 1446 Reliability

Predictor           Coef  SE Coef      T      P
Constant           21313     5067   4.21  0.000
Road-Test Score   136.69    64.69   2.11  0.043
Reliability      -1446.3    589.8  -2.45  0.020

S = 3526.04   R-Sq = 24.2%   R-Sq(adj) = 19.2%

Analysis of Variance

Source          DF         SS        MS     F      P
Regression       2  119172860  59586430  4.79  0.016
Residual Error  30  372987834  12432928
Total           32  492160694

\(\hat{y} = 21313 + 136.69\,\text{Score} - 1446.3\,\text{Reliability}\)

c. \(\hat{y} = 21313 + 136.69(80) - 1446.3(4) = 26463\), or $26,463

9.
a. The Minitab output is shown below:

The regression equation is
TopSpeed = 65.0 - 0.390 Beam + 0.0511 HP

Predictor      Coef  SE Coef      T      P
Constant     64.966    9.009   7.21  0.000
Beam       -0.38959  0.09579  -4.07  0.001
HP          0.05106  0.01312   3.89  0.001

S = 1.59538   R-Sq = 59.7%   R-Sq(adj) = 55.0%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   64.157  32.078  12.60  0.000
Residual Error  17   43.269   2.545
Total           19  107.426

b. \(\hat{y} = 64.966 - .38959\,\text{Beam} + .05106\,\text{HP} = 64.966 - .38959(85) + .05106(330) = 48.70\)
   Thus, an estimate of the top speed for the Svfara SV609 is 48.7 mph.

10.
a. A portion of the Minitab output is shown below:

The regression equation is
PCT = - 1.22 + 3.96 FG%

Predictor     Coef  SE Coef      T      P
Constant   -1.2207   0.6617  -1.84  0.076
FG%          3.958    1.519   2.60  0.015

S = 0.126636   R-Sq = 20.1%   R-Sq(adj) = 17.1%

Analysis of Variance

Source          DF       SS       MS     F      P
Regression       1  0.10882  0.10882  6.79  0.015
Residual Error  27  0.43299  0.01604
Total           28  0.54181

b. An increase of 1% in the percentage of field goals made will increase the percentage of games won by \(3.96(.01) = .0396\), or approximately .04.
c. A portion of the Minitab output is shown below:

The regression equation is
PCT = - 1.23 + 4.82 FG% - 2.59 Opp Pt% + 0.0344 Opp TO

Predictor     Coef  SE Coef      T      P
Constant   -1.2346   0.6003  -2.06  0.050
FG%          4.817    1.183   4.07  0.000
Opp Pt%    -2.5895   0.7041  -3.68  0.001
Opp TO     0.03443  0.01253   2.75  0.011

S = 0.0972325   R-Sq = 56.4%   R-Sq(adj) = 51.1%

Analysis of Variance

Source          DF       SS       MS      F      P
Regression       3  0.30546  0.10182  10.77  0.000
Residual Error  25  0.23635  0.00945
Total           28  0.54181

d. To increase the percentage of games won, a team needs to increase the percentage of field goals made, decrease the percentage of three-point shots made by the team's opponent, and increase the number of turnovers committed by the team's opponent.
e. \(\hat{y} = -1.2346 + 4.817(.45) - 2.5895(.34) + .03443(17) = .638\)

11.
a. \(\text{SSE} = \text{SST} - \text{SSR} = 6724.125 - 6216.375 = 507.75\)
b. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{6216.375}{6724.125} = .924\)
c. \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .924)\frac{10 - 1}{10 - 2 - 1} = .902\)
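The SSE, \(R^2\), and \(R_a^2\) arithmetic in Exercise 11, and the MSR, MSE, and F computations that reuse the same data in Exercise 19, follow directly from the ANOVA sums of squares. The following is a minimal Python sketch of those formulas; the function names `fit_statistics` and `f_pvalue_two_numerator_df` are hypothetical helpers, not part of Minitab or Excel. The closed-form p-value is an assumption that holds only when the F statistic has exactly 2 numerator degrees of freedom (two independent variables, as in Exercises 19, 22, and 23); for other cases use the F table or a statistics library.

```python
def fit_statistics(sst, ssr, n, p):
    """ANOVA-based fit measures for n observations and p independent variables."""
    sse = sst - ssr                                   # SSE = SST - SSR
    r2 = ssr / sst                                    # R^2 = SSR/SST
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)     # adjusted R^2
    msr = ssr / p                                     # MSR = SSR/p
    mse = sse / (n - p - 1)                           # MSE = SSE/(n - p - 1)
    return {"SSE": sse, "R2": r2, "R2_adj": r2_adj,
            "MSR": msr, "MSE": mse, "F": msr / mse}


def f_pvalue_two_numerator_df(f, df_denominator):
    """Upper-tail p-value of F when the numerator has exactly 2 degrees of freedom.

    For that special case the survival function is (1 + 2F/df2) ** (-df2/2);
    it does not apply when p != 2.
    """
    return (1 + 2 * f / df_denominator) ** (-df_denominator / 2)


# Exercise 11 / 19 data: SST = 6,724.125, SSR = 6,216.375, n = 10, p = 2
stats = fit_statistics(6724.125, 6216.375, n=10, p=2)
for name, value in stats.items():
    print(f"{name:7s} {value:10.3f}")
print("p-value", round(f_pvalue_two_numerator_df(stats["F"], 10 - 2 - 1), 5))
```

With the unrounded \(R^2\) the adjusted value comes out as .903; the .902 in part (c) results from rounding \(R^2\) to .924 before substituting. The same function with `n=30, p=4` reproduces the .971 of Exercise 13.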
d. The estimated regression equation provided an excellent fit.

12.
a. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{14052.2}{15182.9} = .926\)
b. \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .926)\frac{10 - 1}{10 - 2 - 1} = .905\)
c. Yes; after adjusting for the number of independent variables in the model, we see that 90.5% of the variability in \(y\) has been accounted for.

13.
a. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{1760}{1805} = .975\)
b. \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .975)\frac{30 - 1}{30 - 4 - 1} = .971\)
c. The estimated regression equation provided an excellent fit.

14.
a. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{12000}{16000} = .75\)
b. \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - .25\frac{9}{7} = .68\)
c. The adjusted coefficient of determination shows that 68% of the variability has been explained by the two independent variables; thus, we conclude that the model does not explain a large amount of variability.

15.
a. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{23.435}{25.5} = .919\)
   \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .919)\frac{8 - 1}{8 - 2 - 1} = .887\)
b. Multiple regression analysis is preferred since both \(R^2\) and \(R_a^2\) show an increased percentage of the variability of \(y\) explained when both independent variables are used.

16.
a. No; \(r^2 = .153\).
b. Using both independent variables provides a much better fit: \(R^2 = .858\) and \(R_a^2 = .837\).

17.
a. \(R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{64.157}{107.426} = .597\)
   \(R_a^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1} = 1 - (1 - .597)\frac{20 - 1}{20 - 2 - 1} = .550\)
b. The fit is not very good.

18.
a. \(R^2 = .564\) and \(R_a^2 = .511\)
b. Although the fit is not very good, the estimated regression equation does explain over 50% of the variability in the dependent variable.

19.
a. \(\text{MSR} = \text{SSR}/p = 6216.375/2 = 3108.188\)
   \(\text{MSE} = \frac{\text{SSE}}{n - p - 1} = \frac{507.75}{10 - 2 - 1} = 72.536\)
b. \(F = \text{MSR}/\text{MSE} = 3108.188/72.536 = 42.85\)
   Using the F table (2 degrees of freedom numerator and 7 denominator), the p-value is less than .01. Actual p-value = .0001. Because p-value ≤ α = .05, the overall model is significant.
c. \(t = .5906/.0813 = 7.26\)
   Using the t table (7
degrees of freedom), the area in the tail is less than .005, so the p-value is less than .01. Actual p-value = .0002. Because p-value ≤ α, \(\beta_1\) is significant.
d. \(t = .4980/.0567 = 8.78\)
   Using the t table (7 degrees of freedom), the area in the tail is less than .005, so the p-value is less than .01. Actual p-value = .0001. Because p-value ≤ α, \(\beta_2\) is significant.

20.
A portion of the Minitab output is shown below.

The regression equation is
Y = - 18.4 + 2.01 X1 + 4.74 X2

Predictor    Coef  SE Coef      T      P
Constant   -18.37    17.97  -1.02  0.341
X1         2.0102   0.2471   8.13  0.000
X2         4.7378   0.9484   5.00  0.002

S = 12.71   R-Sq = 92.6%   R-Sq(adj) = 90.4%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2  14052.2  7026.1  43.50  0.000
Residual Error   7   1130.7   161.5
Total            9  15182.9

a. Since the p-value corresponding to F = 43.50 is .000 < α = .05, we reject \(H_0: \beta_1 = \beta_2 = 0\); there is a significant relationship.
b. Since the p-value corresponding to t = 8.13 is .000 < α = .05, we reject \(H_0: \beta_1 = 0\); \(\beta_1\) is significant.
c. Since the p-value corresponding to t = 5.00 is .002 < α = .05, we reject \(H_0: \beta_2 = 0\); \(\beta_2\) is significant.

21.
a. In the two-independent-variable case, the coefficient of \(x_1\) represents the expected change in \(y\) corresponding to a one-unit increase in \(x_1\) when \(x_2\) is held constant. In the single-independent-variable case, the coefficient of \(x_1\) represents the expected change in \(y\) corresponding to a one-unit increase in \(x_1\).
b. Yes. If \(x_1\) and \(x_2\) are correlated, one would expect a change in \(x_1\) to be accompanied by a change in \(x_2\).

22.
a. \(\text{SSE} = \text{SST} - \text{SSR} = 16000 - 12000 = 4000\)
   \(s^2 = \frac{\text{SSE}}{n - p - 1} = \frac{4000}{7} = 571.43\)
   \(\text{MSR} = \frac{\text{SSR}}{p} = \frac{12000}{2} = 6000\)
b. \(F = \text{MSR}/\text{MSE} = 6000/571.43 = 10.50\)
   Using the F table (2 degrees of freedom numerator and 7 denominator), the p-value is less than .01. Actual p-value = .008. Because p-value ≤ α, we reject \(H_0\). There is a significant relationship among the variables.

23.
a. F = 28.38
   Using the F table (2 degrees of
freedom numerator and 5 denominator), the p-value is less than .01. Actual p-value = .002. Because p-value ≤ α, there is a significant relationship.
b. t = 7.53
   Using the t table (5 degrees of freedom), the area in the tail is less than .005, so the p-value is less than .01. Actual p-value = .001. Because p-value ≤ α, \(\beta_1\) is significant and \(x_1\) should not be dropped from the model.
c. t = 4.06. Actual p-value = .010.

Source    DF  Seq SS
Price      1  406.39
Horsepwr   1  509.27

Unusual Observations
Obs  Price    Speed      Fit  SE Fit  Residual  St Resid
      93.8  108.000  105.882   2.007     2.118    1.45 X

X denotes an observation whose X value gives it large influence.

b. The standardized residual plot is shown below. There appears to be a very unusual trend in the standardized residuals.

[Figure: standardized residuals versus fitted values]

c. The Minitab output shown in part (a) did not identify any observations with a large standardized residual; thus, there do not appear to be any outliers in the data.
d. The Minitab output shown in part (a) identifies one observation (Price = 93.8, Speed = 108.000) as an influential observation.

43.
a. The Minitab output is shown below:

The regression equation is
Scoring Avg = 58.1 - 10.7 Greens in Reg + 11.7 Putting Avg

Predictor         Coef  SE Coef      T      P
Constant        58.090    6.053   9.60  0.000
Greens in Reg  -10.736    3.016  -3.56  0.001
Putting Avg     11.707    2.899   4.04  0.000

S = 0.428970   R-Sq = 58.3%   R-Sq(adj) = 55.2%

Analysis of Variance

Source          DF       SS      MS      F      P
Regression       2   6.9351  3.4675  18.84  0.000
Residual Error  27   4.9684  0.1840
Total           29  11.9035

Unusual Observations
Obs  Greens in Reg  Scoring Avg      Fit  SE Fit  Residual  St Resid
             0.772      69.3300  70.2887  0.2403   -0.9587  -2.70RX
 14          0.631      71.8000  72.0366  0.2478   -0.2366  -0.68 X
 30          0.728      72.1300  70.8781  0.1410    1.2519    3.09R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.

b. The standardized residual plot is shown below:

[Figure: standardized residuals versus fitted values]

The standardized residual plot does not support the assumption about ε. There are three unusual observations, and the variance of the residuals appears to increase for larger values of \(\hat{y}\).
c. The Minitab output in part (a) identified two outliers: the observation with a scoring average of 69.3300 and observation 30. The first corresponds to Annika Sorenstam; her scoring average was much lower than those of the other players. Observation 30 corresponds to Karine Icher; although her performance in terms of greens in regulation and putting average was very good, her scoring average was much higher than those of most of the other players.
d. The Minitab output in part (a) identified two influential observations: the Annika Sorenstam observation and observation 14, which corresponds to Soo-Yun Kang.

44.
a. \(E(y) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\)
b. It is an estimate of the probability that a customer who does not have a Simmons credit card will make a purchase.
c. A portion of the Minitab binary logistic regression output follows:

Logistic Regression Table
                                               Odds     95% CI
Predictor     Coef  SE Coef      Z      P     Ratio  Lower  Upper
Constant   -0.9445   0.3150  -3.00  0.003
Card        1.0245   0.4235   2.42  0.016   2.79   1.21   6.39

Log-Likelihood = -64.265
Test that all slopes are zero: G = 6.072, DF = 1, P-Value = 0.014

Thus, the estimated logit is \(\hat{g}(x) = -0.9445 + 1.0245x\).
d. For customers who do not have a Simmons credit card (\(x = 0\)):
   \(\hat{g}(0) = -0.9445 + 1.0245(0) = -0.9445\)
   and
   \(\hat{y} = \frac{e^{\hat{g}(0)}}{1 + e^{\hat{g}(0)}} = \frac{e^{-0.9445}}{1 + e^{-0.9445}} = \frac{0.3889}{1.3889} = 0.28\)
   For customers who have a Simmons credit card (\(x = 1\)):
   \(\hat{g}(1) = -0.9445 + 1.0245(1) = 0.0800\)
   and
   \(\hat{y} = \frac{e^{\hat{g}(1)}}{1 + e^{\hat{g}(1)}} = \frac{e^{0.08}}{1 + e^{0.08}} = \frac{1.0833}{2.0833} = 0.52\)
e. Using the Minitab output shown in part (c), the estimated odds ratio is 2.79. We can conclude that the estimated odds of making a purchase for customers who have a Simmons credit card are 2.79 times greater than the estimated odds of making a purchase for customers who do not have a Simmons credit card.

45.
a. \(\text{odds} = \frac{.3143}{1 - .3143} = .4584\)
b. \(\text{odds} = \frac{.5790}{1 - .5790} = 1.3753\)
c. \(\text{odds}_0 = .4584\) (from part (a))
   \(\text{odds ratio} = \frac{\text{odds}_1}{\text{odds}_0} = \frac{1.3753}{.4584} = 3.00\)
   The odds ratio for \(x_2\) computed holding annual spending constant at $2000 is also 3.00. This shows that the odds ratio for \(x_2\) is independent of the value of \(x_1\).

46.
a. \(E(y) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\)
b. A portion of the Minitab binary logistic regression output follows:

Logistic Regression Table
                                               Odds     95% CI
Predictor     Coef  SE Coef      Z      P     Ratio  Lower  Upper
Constant   -2.6335   0.7985  -3.30  0.001
Balance    0.22018  0.09002   2.45  0.014   1.25   1.04   1.49

Log-Likelihood = -25.813
Test that all slopes are zero: G = 9.460, DF = 1, P-Value = 0.002

Thus, the estimated logistic regression equation is
\(E(y) = \frac{e^{-2.6335 + 0.22018x}}{1 + e^{-2.6335 + 0.22018x}}\)
c. Significant result: the p-value corresponding to the G test statistic is 0.002.
d. For an average monthly balance of $1000, \(x = 10\):
   \(E(y) = \frac{e^{-2.6335 + 0.22018(10)}}{1 + e^{-2.6335 + 0.22018(10)}} = \frac{e^{-0.4317}}{1 + e^{-0.4317}} = \frac{0.6494}{1.6494} = 0.39\)
   Thus, an estimate of the probability that customers with an average monthly balance of $1000 will sign up for direct payroll deposit is 0.39.
e. Repeating the calculations in part (d) using various values for x, a value of x = 12, or an average monthly balance of approximately $1200, is required to achieve this level of probability.
f. Using the Minitab output shown
in part (b), the estimated odds ratio is 1.25. Because values of x are measured in hundreds of dollars, the estimated odds of signing up for payroll direct deposit for customers that have an average monthly balance of $600 are 1.25 times greater than the estimated odds of signing up for payroll direct deposit for customers that have an average monthly balance of $500. Moreover, this interpretation is true for any one-hundred-dollar increment in the average monthly balance.

47.
a. \(E(y) = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}\)
b. For a given GPA, it is an estimate of the probability that a student who did not attend the orientation program will return to Lakeland for the sophomore year.
c. A portion of the Minitab binary logistic regression output follows:

Logistic Regression Table
                                              Odds     95% CI
Predictor    Coef  SE Coef      Z      P     Ratio  Lower  Upper
Constant   -6.893    1.747  -3.94  0.000
GPA        2.5388   0.6729   3.77  0.000  12.66   3.39  47.35
Program    1.5608   0.5631   2.77  0.006   4.76   1.58  14.36

Log-Likelihood = -40.169
Test that all slopes are zero: G = 47.869, DF = 2, P-Value = 0.000

Thus, the estimated logit is \(\hat{g}(x_1, x_2) = -6.893 + 2.5388x_1 + 1.5608x_2\).
d. Significant result: the p-value corresponding to the G test statistic is 0.000.
e. Both variables are significant at α = .01: the p-value for \(x_1\) is 0.000 and the p-value for \(x_2\) is 0.006.
f. For \(x_1 = 2.5\) and \(x_2 = 0\):
   \(\hat{g}(2.5, 0) = -6.893 + 2.5388(2.5) + 1.5608(0) = -0.5460\)
   and
   \(\hat{y} = \frac{e^{\hat{g}(2.5,0)}}{1 + e^{\hat{g}(2.5,0)}} = \frac{e^{-0.5460}}{1 + e^{-0.5460}} = \frac{0.5793}{1.5793} = 0.37\)
   For \(x_1 = 2.5\) and \(x_2 = 1\):
   \(\hat{g}(2.5, 1) = -6.893 + 2.5388(2.5) + 1.5608(1) = 1.0148\)
   and
   \(\hat{y} = \frac{e^{\hat{g}(2.5,1)}}{1 + e^{\hat{g}(2.5,1)}} = \frac{e^{1.0148}}{1 + e^{1.0148}} = \frac{2.7588}{3.7588} = 0.73\)
g. From the Minitab output in part (c), we see that the estimated odds ratio for the orientation program is 4.76. This means that the odds of students who attended the orientation
program continuing are 4.76 times greater than the odds for students who did not attend the program.
h. We recommend making the orientation program required. From part (g), we see that the odds of continuing are much higher for students who have attended the orientation program.

48.
a. \(E(y) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}\)
b. A portion of the Minitab binary logistic regression output follows:

Logistic Regression Table
                                              Odds     95% CI
Predictor    Coef  SE Coef      Z      P     Ratio  Lower  Upper
Constant   -2.805    1.432  -1.96  0.050
Price      1.1492   0.5143   2.23  0.025   3.16   1.15   8.65

Log-Likelihood = -8.200
Test that all slopes are zero: G = 9.465, DF = 1, P-Value = 0.002

Thus, the estimated logit is \(\hat{g}(x) = -2.805 + 1.1492x\).
c. For chocolates that have a price per serving of $4.00:
   \(\hat{g}(4) = -2.805 + 1.1492(4) = 1.7918\)
   and
   \(\hat{y} = \frac{e^{\hat{g}(4)}}{1 + e^{\hat{g}(4)}} = \frac{e^{1.7918}}{1 + e^{1.7918}} = \frac{6.0002}{7.0002} = 0.86\)
d. Using the Minitab output shown in part (b), the estimated odds ratio is 3.16. We can conclude that the estimated odds of having a quality rating of very good or excellent for a chocolate that has a price of $4.00 per serving are 3.16 times greater than the estimated odds for a chocolate with a price of $3.00 per serving. Moreover, this interpretation is true for any one-dollar difference in the price per serving.

49.
a. The expected increase in final college grade point average corresponding to a one-point increase in high school grade point average is .0235 when the SAT mathematics score does not change. Similarly, the expected increase in final college grade point average corresponding to a one-point increase in the SAT mathematics score is .00486 when the high school grade point average does not change.
b. \(\hat{y} = -1.41 + .0235(84) + .00486(540) = 3.19\)

50.
a. Job satisfaction can be expected to decrease by 8.69 units with a one-unit increase in length of service if
the wage rate does not change. A dollar increase in the wage rate is associated with a 13.5-point increase in the job satisfaction score when the length of service does not change.
b. \(\hat{y} = 14.4 - 8.69(4) + 13.5(6.5) = 67.39\)

51.
a. The computer output with the missing values filled in is as follows:

The regression equation is
Y = 8.103 + 7.602 X1 + 3.111 X2

Predictor   Coef  SE Coef     T
Constant   8.103    2.667  3.04
X1         7.602    2.105  3.61
X2         3.111    0.613  5.08

S = 3.35   R-sq = 92.3%   R-sq (adj) = 91.0%

Analysis of Variance

SOURCE          DF       SS       MS      F
Regression       2     1612      806  71.82
Residual Error  12   134.67  11.2225
Total           14  1746.67

b. \(F_{.05} = 3.89\)
   \(F = 71.82 > F_{.05}\); significant relationship. Actual p-value = .000. Because p-value ≤ α = .05, the overall relationship is significant.
c. Using the t table (12 degrees of freedom), the area in the tail corresponding to t = 3.61 is less than .005, so the p-value is less than .01. Actual p-value = .0036. Because p-value ≤ α, reject \(H_0: \beta_1 = 0\).
   Using the t table (12 degrees of freedom), the area in the tail corresponding to t = 5.08 is less than .005, so the p-value is less than .01. Actual p-value = .0003. Because p-value ≤ α, reject \(H_0: \beta_2 = 0\).
d. See computer output (R-sq = 92.3%).
e. \(R_a^2 = 1 - (1 - .923)\frac{14}{12} = .91\)

52.
a. The regression equation is
Y = -1.41 + .0235 X1 + .00486 X2

Predictor       Coef   SE Coef      T
Constant     -1.4053    0.4848  -2.90
X1          0.023467  0.008666   2.71
X2            .00486  0.001077   4.51

S = 0.1298   R-sq = 93.7%   R-sq (adj) = 91.9%

Analysis of Variance

SOURCE          DF       SS     MS      F
Regression       2  1.76209   .881  52.44
Residual Error   7    .1179  .0168
Total            9  1.88000

b. Using the F table (2 degrees of freedom numerator and 7 degrees of freedom denominator), the p-value is less than .01. Actual p-value = .0001. Because p-value ≤ α, there is a significant relationship.
c. For \(\beta_1\): p-value = .0302; reject \(H_0: \beta_1 = 0\).
   For \(\beta_2\): p-value = .0028; reject \(H_0: \beta_2 = 0\).
d. \(R^2 = \frac{\text{SSR}}{\text{SST}} = .937\)
   \(R_a^2 = 1 - (1 - .937)\frac{9}{7} = .919\); good fit.

53.
a. The regression equation is
Y = 14.4 - 8.69 X1 + 13.52 X2

Predictor    Coef  SE Coef      T
Constant   14.448    8.191   1.76
X1          -8.69    1.555  -5.59
X2         13.517    2.085   6.48

S = 3.773   R-sq = 90.1%   R-sq (adj) = 86.1%

Analysis of Variance

SOURCE          DF      SS       MS      F
Regression       2  648.83  324.415  22.79
Residual Error   5   71.17   14.234
Total            7  720.00

b. \(F_{.05} = 5.79\)
   \(F = 22.79 > F_{.05}\); significant relationship. Actual p-value = .0031. Because p-value ≤ α = .05, the overall relationship is significant.
c. \(R^2 = \frac{\text{SSR}}{\text{SST}} = .901\)
   \(R_a^2 = 1 - (1 - .901)\frac{7}{5} = .861\); good fit.
d. For \(\beta_1\): t = -5.59, p-value = .0025; reject \(H_0: \beta_1 = 0\).
   For \(\beta_2\): t = 6.48, p-value = .0013; reject \(H_0: \beta_2 = 0\).

54.
a. A portion of the Minitab output follows:

The regression equation is
Buy Again = - 7.52 + 1.82 Steering

Predictor    Coef  SE Coef      T      P
Constant   -7.522    1.467  -5.13  0.000
Steering   1.8151   0.1958   9.27  0.000

S = 0.841071   R-Sq = 84.3%   R-Sq(adj) = 83.3%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  60.787  60.787  85.93  0.000
Residual Error  16  11.318   0.707
Total           17  72.105

Because the p-value = .000 < α = .05, there is a significant relationship.
b. The estimated regression equation provided a good fit; 84.3% of the variability in the Buy Again rating was explained by the linear effect of the Steering rating.
c. A portion of the Minitab output follows:

The regression equation is
Buy Again = - 5.39 + 0.690 Steering + 0.911 Treadwear

Predictor     Coef  SE Coef      T      P
Constant    -5.388    1.110  -4.86  0.000
Steering    0.6899   0.2875   2.40  0.030
Treadwear   0.9113   0.2063   4.42  0.001

S = 0.572723   R-Sq = 93.2%   R-Sq(adj) = 92.3%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       2  67.185  33.592  102.41  0.000
Residual Error  15   4.920   0.328
Total           17  72.105

d. For the Treadwear independent variable, the
p-value = .001 < α = .05; thus, the addition of Treadwear is significant.

55.
a. A portion of the Minitab output is shown below:

The regression equation is
Score = 67.7 + 0.00462 Price

Predictor      Coef   SE Coef      T      P
Constant     67.676     2.305  29.36  0.000
Price      0.004615  0.001100   4.20  0.000

S = 4.51094   R-Sq = 44.5%   R-Sq(adj) = 41.9%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       1  358.29  358.29  17.61  0.000
Residual Error  22  447.67   20.35
Total           23  805.96

b. Because the p-value = .000 < α = .05, there is a significant relationship.
c. A portion of the Minitab output is shown below:

The regression equation is
Score = 65.7 + 0.00232 Price + 10.2 Quality-E + 5.92 Quality-VG

Predictor      Coef   SE Coef      T      P
Constant     65.660     2.507  26.19  0.000
Price      0.002316  0.001229   1.88  0.074
Quality-E    10.210     3.438   2.97  0.008
Quality-VG    5.925     2.759   2.15  0.044

S = 3.93071   R-Sq = 61.7%   R-Sq(adj) = 55.9%

Analysis of Variance

Source          DF      SS      MS      F      P
Regression       3  496.95  165.65  10.72  0.000
Residual Error  20  309.01   15.45
Total           23  805.96

Unusual Observations
Obs  Price   Score     Fit  SE Fit  Residual  St Resid
      3200  72.000  78.995   2.001    -6.995    -2.07R

R denotes an observation with a large standardized residual.

d. Because the p-value = .000 < α = .10, there is a significant overall relationship.
e. Price, Quality-E, and Quality-VG are all significant because, for each independent variable, the corresponding p-value is less than α = .10.
f. The standardized residual plot is shown below:

[Figure: standardized residuals versus fitted values]

The pattern of the points in the residual plot appears to be reasonable.
g. In the Minitab output in part (c), one observation (Price = 3200, Score = 72.000) was identified as an observation with a large
standardized residual and hence we consider this observation to be an outlier; no influential observations were identified h In part (c) we developed the following estimated regression equation: Score = 65.7 + 0.00232 Price + 10.2 Quality-E + 5.92 Quality-VG Thus, for a treadmill with a price of $2,000 with a good quality rating, an estimate of the overall score is: Score = 65.7 + 0.00232(2000) + 10.2(0) + 5.92(0) = 70.34 An estimate of the score for a price of $2000 with a very good quality rating is: Score = 65.7 + 0.00232(2000) + 10.2(0) + 5.92(1) = 76.26 Thus, the estimate changes by 76.26 – 70.34 = 5.92, the estimated coefficient for the Quality-VG dummy variable 56 a Type of Fund is a categorical variable with three levels Let FundDE = for a domestic equity fund and FundIE = for an international fund The Excel output is shown below: Regression Statistics Multiple R 0.7838 R Square 0.6144 Adjusted R Square 0.5960 Standard Error 5.5978 Observations 45 15 - 32 © 2010 Cengage Learning All Rights Reserved May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part Multiple Regression ANOVA df Regression SS MS 2096.84891048.4245 Residual 42 1316.0771 Total 44 3412.9260 Coefficient s Standard Error F 33.4584 Significance F 2.03818E-09 31.3352 t Stat P-value Lower 95% Upper 95% Intercept 4.9090 1.7702 2.7732 0.0082 1.3366 8.4814 FundDE 10.4658 2.0722 5.0505 9.033E-06 6.2839 14.6477 FundIE 21.6823 2.6553 8.1658 3.288E-10 16.3237 27.0408 yˆ = 4.9090+ 10.4658 FundDE + 21.6823 FundIE Since the p-value corresponding to F = 33.4584 is 0000 < α = 05, there is a significant relationship b R Square = 6144 A reasonably good fit using only Type of Fund c The Excel output follows: Regression Statistics Multiple R 0.8135 R Square 0.6617 Adjusted R Square 0.6279 Standard Error 5.3726 Observations 45 ANOVA df Regression Residual Total Intercept FundDE FundIE Net Asset Value 40 44 SS MS 2258.3432 564.5858 1154.5827 28.8646 3412.9260 
                     Coefficients  Standard Error   t Stat     P-value  Lower 95%  Upper 95%
Intercept                  1.1899          2.3781   0.5004      0.6196    -3.6164     5.9961
FundDE                     6.8969          2.7651   2.4942      0.0169     1.3083    12.4854
FundIE                    17.6800          3.3161   5.3315   4.096E-06    10.9778    24.3821
Net Asset Value ($)        0.0265          0.0670   0.3950      0.6950    -0.1089     0.1619
Expense Ratio (%)          6.4564          2.7593   2.3399      0.0244     0.8798    12.0331

Since the p-value corresponding to F = 19.5598 is .0000 < α = .05, there is a significant relationship. For Net Asset Value ($), the p-value corresponding to t = .3950 is .6950 > α = .05; Net Asset Value ($) is not significant and can be deleted from the model.

d Morningstar Rank is a categorical variable. The data set only contains funds with four ranks (2-Star through 5-Star), so three dummy variables are needed. Let 3StarRank = 1 for a 3-Star rank, 4StarRank = 1 for a 4-Star rank, and 5StarRank = 1 for a 5-Star rank (each is 0 otherwise). The Excel output follows:

Regression Statistics
Multiple R           0.8501
R Square             0.7227
Adjusted R Square    0.6789
Standard Error       4.9904
Observations         45

ANOVA
               df          SS          MS         F  Significance F
Regression      6   2466.5721    411.0954   16.5072     2.96759E-09
Residual       38    946.3539     24.9040
Total          44   3412.9260

                     Coefficients  Standard Error   t Stat     P-value  Lower 95%  Upper 95%
Intercept                 -4.6074          3.2909  -1.4000      0.1696   -11.2694     2.0547
FundDE                     8.1713          2.2754   3.5912      0.0009     3.5650    12.7776
FundIE                    19.5194          2.7795   7.0227   2.292E-08    13.8926    25.1461
Expense Ratio (%)          5.5197          2.5862   2.1343      0.0393     0.2843    10.7552
3StarRank                  5.9237          2.8250   2.0969      0.0427     0.2048    11.6426
4StarRank                  8.2367          2.8474   2.8927      0.0063     2.4725    14.0009
5StarRank                  6.6241          3.1425   2.1079      0.0417     0.2624    12.9858

ŷ = -4.6074 + 8.1713 FundDE + 19.5194 FundIE + 5.5197 Expense Ratio (%) + 5.9237 3StarRank + 8.2367 4StarRank + 6.6241 5StarRank

At the .05 level of significance, all the independent variables are significant.

e ŷ = -4.6074 + 8.1713(1) + 19.5194(0) + 5.5197(1.05) + 5.9237(1) + 8.2367(0) + 6.6241(0) = 15.28%

57 a The Excel output follows:

Regression Statistics
Multiple R           0.8333
R Square             0.6945
Adjusted R Square    0.6935
Standard Error       2.2212
Observations         311

ANOVA
               df          SS          MS         F  Significance F
Regression      1   3464.8213   3464.8213  702.3017     1.51247E-81
Residual      309   1524.4584      4.9335
Total         310   4989.2797

                     Coefficients  Standard Error   t Stat     P-value  Lower 95%  Upper 95%
Intercept                 35.3950          0.3818  92.6977   1.47E-227    34.6437    36.1464
Displacement              -2.8821          0.1088 -26.5010   1.512E-81    -3.0961    -2.6681

Since the p-value corresponding to F = 702.3017 is .0000 < α = .05, there is a significant relationship.

b The Excel output follows:

Regression Statistics
Multiple R           0.8827
R Square             0.7791
Adjusted R Square    0.7769
Standard Error       1.8947
Observations         311

ANOVA
               df          SS          MS         F  Significance F
Regression      3   3887.1290   1295.7097  360.9151     2.6788E-100
Residual      307   1102.1508      3.5901
Total         310   4989.2797

                     Coefficients  Standard Error   t Stat     P-value  Lower 95%  Upper 95%
Intercept                 30.2289          0.5948  50.8225   1.63E-151    29.0585    31.3993
ClassMidsize               3.4563          0.3233  10.6900   6.956E-23     2.8201     4.0925
ClassLarge                 1.7078          0.4048   4.2190   3.235E-05     0.9113     2.5043
Displacement              -1.9347          0.1293 -14.9591   2.216E-38    -2.1891    -1.6802

c For ClassMidsize, the p-value corresponding to t = 10.6900 is .000 < α = .05; significant. For ClassLarge, the p-value corresponding to t = 4.2190 is .000 < α = .05; significant. The addition of the dummy variables is significant.

d The Excel output follows:

Regression Statistics
Multiple R           0.8976
R Square             0.8057
Adjusted R Square    0.8031
Standard Error       1.7801
Observations         311

ANOVA
               df          SS          MS         F  Significance F
Regression      4   4019.6456   1004.9114  317.1329     1.7602E-107
Residual      306    969.6341      3.1687
Total         310   4989.2797

                     Coefficients  Standard Error   t Stat     P-value  Lower 95%  Upper 95%
Intercept                 30.9548          0.5700  54.3099   3.66E-159    29.8333    32.0764
ClassMidsize               2.9832          0.3124   9.5479   4.436E-19     2.3684     3.5980
ClassLarge                 1.5088          0.3815   3.9545   9.534E-05     0.7580     2.2595
Displacement              -1.8533          0.1222 -15.1719    3.72E-39    -2.0937    -1.6129
FuelPremium               -1.4306          0.2212  -6.4668   3.954E-10    -1.8659    -0.9953

Since the p-value corresponding to F = 317.1329 is .0000 < α = .05, there is a significant overall relationship. Because the p-values for each independent variable are also < α = .05, each of the independent variables is significant.

e
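As an arithmetic check on Problem 55(h) and Problem 56(e), the sketch below encodes a categorical variable with k levels as k - 1 dummy variables (the baseline level gets all zeros) and evaluates the estimated regression equations reported above. Only the coefficients come from the solutions; the helper names `dummy_code` and `predict` and the level labels (`Good`, `VG`, `E`, `2Star`, ...) are hypothetical. The Type of Fund dummies are set directly because the baseline fund type is not named in this section.

```python
def dummy_code(value, levels, baseline, name_fmt="{}"):
    """Return {dummy name: 0 or 1} for every non-baseline level.

    A categorical variable with k levels yields k - 1 dummies; the baseline
    level is the one whose dummies are all 0.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {name_fmt.format(level): int(value == level)
            for level in levels if level != baseline}


def predict(intercept, coefs, values):
    """Evaluate b0 + sum(b_j * x_j) over the named independent variables."""
    missing = set(coefs) - set(values)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    return intercept + sum(coefs[name] * values[name] for name in coefs)


# Problem 55(h): Score = 65.7 + 0.00232 Price + 10.2 Quality-E + 5.92 Quality-VG
quality = ["Good", "VG", "E"]  # hypothetical labels; Good is the baseline level
score_coefs = {"Price": 0.00232, "Quality-E": 10.2, "Quality-VG": 5.92}
good = predict(65.7, score_coefs,
               {"Price": 2000, **dummy_code("Good", quality, "Good", "Quality-{}")})
very_good = predict(65.7, score_coefs,
                    {"Price": 2000, **dummy_code("VG", quality, "Good", "Quality-{}")})

# Problem 56(e): the part (d) equation for a domestic equity fund with an
# expense ratio of 1.05% and a 3-Star rank (2-Star is the baseline rank).
ranks = ["2Star", "3Star", "4Star", "5Star"]
fund_coefs = {"FundDE": 8.1713, "FundIE": 19.5194, "ExpenseRatio": 5.5197,
              "3StarRank": 5.9237, "4StarRank": 8.2367, "5StarRank": 6.6241}
fund_values = {"FundDE": 1, "FundIE": 0, "ExpenseRatio": 1.05,
               **dummy_code("3Star", ranks, "2Star", "{}Rank")}
fund = predict(-4.6074, fund_coefs, fund_values)

print(f"Score (good): {good:.2f}, Score (very good): {very_good:.2f}, "
      f"change: {very_good - good:.2f}")
print(f"Problem 56(e) estimate: {fund:.2f}%")
```

Because the baseline level has all dummies equal to 0, the difference between the very good and good predictions is exactly the Quality-VG coefficient, which is the point made in part (h).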
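Every derived entry in the Excel ANOVA and Regression Statistics tables (MS, F, R Square, Adjusted R Square, Standard Error) follows from SS, the number of observations n, and the number of independent variables p. The sketch below recomputes them for the three Problem 57 models using the standard identities noted in its comments; `anova_summary` is a hypothetical helper name, and the SS values are copied from the output above.

```python
import math


def anova_summary(ssr, sse, n, p):
    """Derive the summary statistics for a model with p independent
    variables fitted to n observations, given SSR and SSE."""
    dfe = n - p - 1                      # residual (error) degrees of freedom
    if p < 1 or dfe < 1:
        raise ValueError("need p >= 1 and n - p - 1 >= 1")
    sst = ssr + sse                      # SST = SSR + SSE
    msr = ssr / p                        # MSR = SSR / p
    mse = sse / dfe                      # MSE = SSE / (n - p - 1)
    r2 = ssr / sst                       # R^2 = SSR / SST
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,                  # F = MSR / MSE
        "R2": r2,
        "adjR2": 1 - (1 - r2) * (n - 1) / dfe,
        "s": math.sqrt(mse),             # standard error of the estimate
    }


# (SSR, SSE, p) copied from the Problem 57 Excel output; n = 311 throughout.
models = {
    "57(a)": (3464.8213, 1524.4584, 1),
    "57(b)": (3887.1290, 1102.1508, 3),
    "57(d)": (4019.6456, 969.6341, 4),
}
for label, (ssr, sse, p) in models.items():
    out = anova_summary(ssr, sse, n=311, p=p)
    print(f"{label}: F = {out['F']:.2f}, R Square = {out['R2']:.4f}, "
          f"Adjusted R Square = {out['adjR2']:.4f}, s = {out['s']:.4f}")
```

The recomputed values agree with the Excel output to the displayed rounding, so this is a quick way to catch transcription errors when copying regression tables by hand.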