Multiple Regression
Learning Objectives
1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables.
2. Be able to interpret the coefficients in a multiple regression analysis.
3. Know the assumptions necessary to conduct statistical tests involving the hypothesized regression model.
4. Understand the role of computer packages in performing multiple regression analysis.
5. Be able to interpret and use computer output to develop the estimated regression equation.
6. Be able to determine how good a fit is provided by the estimated regression equation.
7. Be able to test for the significance of the regression equation.
8. Understand how multicollinearity affects multiple regression analysis.
9. Know how residual analysis can be used to make a judgment as to the appropriateness of the model, identify outliers, and determine which observations are influential.
10. Understand how logistic regression is used for regression analyses involving a binary dependent variable.
3 a b1 = 3.8 is an estimate of the change in y corresponding to a 1 unit change in x1 when x2, x3, and x4 are held constant.
b2 = -2.3 is an estimate of the change in y corresponding to a 1 unit change in x2 when x1, x3, and x4 are held constant.
b3 = 7.6 is an estimate of the change in y corresponding to a 1 unit change in x3 when x1, x2, and x4 are held constant.
b4 = 2.7 is an estimate of the change in y corresponding to a 1 unit change in x4 when x1, x2, and x3 are held constant.
b ŷ = 17.6 + 3.8(10) - 2.3(5) + 7.6(1) + 2.7(2) = 57.1
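The point estimate in Exercise 3(b) can be checked with a few lines of Python. This is a minimal sketch: the helper name `predict` is illustrative and not part of the original solution, and the coefficients and x values are the ones quoted above.

```python
# Coefficients quoted in Exercise 3: intercept b0, then b1..b4.
b0 = 17.6
slopes = [3.8, -2.3, 7.6, 2.7]
x_values = [10, 5, 1, 2]  # x1, x2, x3, x4 used in part (b)


def predict(intercept, coefficients, values):
    """Return intercept + sum(coefficient * value) for one observation."""
    return intercept + sum(b * v for b, v in zip(coefficients, values))


y_hat = predict(b0, slopes, x_values)
print(round(y_hat, 1))
```

Changing one entry of `x_values` by 1 while leaving the others fixed changes `y_hat` by exactly the matching coefficient, which is the interpretation given in part (a).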
4 a ŷ = 25 + 10(15) + 8(10) = 255; sales estimate: $255,000
b Sales can be expected to increase by $10 for every dollar increase in inventory investment when advertising expenditure is held constant. Sales can be expected to increase by $8 for every dollar increase in advertising expenditure when inventory investment is held constant.
5 a The Minitab output is shown below:
The regression equation is
Total 7 25.500
b The Minitab output is shown below:
The regression equation is
Revenue = 83.2 + 2.29 TVAdv + 1.30 NewsAdv
Predictor Coef SE Coef T P
Total 7 25.500
c No, it is 1.60 in part (a) and 2.29 above. In part (b) it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant.
d Revenue = 83.2 + 2.29(3.5) + 1.30(1.8) = $93.56 or $93,560
6 a The Minitab output is shown below:
The regression equation is
Proportion Won = 0.354 + 0.000888 HR
Predictor Coef SE Coef T P
Constant 0.35402 0.09591 3.69 0.002
b A portion of the Minitab output is shown below:
The regression equation is
Proportion Won = 0.865 - 0.0837 ERA
Predictor Coef SE Coef T P
c A portion of the Excel output is shown below:
The regression equation is
Proportion Won = 0.709 + 0.00140 HR - 0.103 ERA
Predictor Coef SE Coef T P
7 a The Minitab output is shown below:
The regression equation is
b The Minitab output is shown below:
The regression equation is
PCW Rating = 40.0 + 0.113 Performance + 0.382 Features
Predictor Coef SE Coef T P
8 a The Minitab output is shown below:
The regression equation is
Because the p-value = .040 < α = .05, there is a significant relationship between price and the reliability rating.
b The Minitab output is shown below:
The regression equation is
Price ($) = 21313 + 137 Road-Test Score - 1446 Reliability
Predictor Coef SE Coef T P
9 a The Minitab output is shown below:
The regression equation is
10 a A portion of the Minitab output is shown below:
The regression equation is
c A portion of the Minitab output is shown below:
The regression equation is
e ŷ = -1.2346 + 4.817(.45) - 2.5895(.34) + .03443(17) = .638
11 a SSE = SST - SSR = 6,724.125 - 6,216.375 = 507.75
b R² = SSR/SST = 6,216.375/6,724.125 = .924
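The sums-of-squares arithmetic in Exercise 11 can be reproduced as a quick check. This sketch uses only the SST and SSR values quoted above; the variable names are illustrative.

```python
# Sums of squares quoted in Exercise 11.
sst = 6724.125  # total sum of squares
ssr = 6216.375  # sum of squares due to regression

sse = sst - ssr    # sum of squares due to error, part (a)
r_sq = ssr / sst   # multiple coefficient of determination, part (b)

print(sse, round(r_sq, 3))
```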
d The estimated regression equation provided an excellent fit.
c Yes; after adjusting for the number of independent variables in the model, we see that 90.5% of the
variability in y has been accounted for.
R² = SSR/SST = .975 (SST = 1805)
b Multiple regression analysis is preferred since both R² and Ra² (the adjusted R²) show an increased percentage of the variability of y explained when both independent variables are used.
16 a No, r² = .153
b Using both independent variables provides a much better fit: R² = .858 and Ra² = .837
R² = SSR/SST = .597 (SST = 107.426)
Because p-value ≤ α, β2 is significant.
20 A portion of the Minitab output is shown below
The regression equation is
Total 9 15182.9
a Since the p-value corresponding to F = 43.50 is .000 < α = .05, we reject H0: β1 = β2 = 0; there is a significant relationship.
b Since the p-value corresponding to t = 8.13 is .000 < α = .05, we reject H0: β1 = 0; β1 is significant.
c Since the p-value corresponding to t = 5.00 is .002 < α = .05, we reject H0: β2 = 0; β2 is significant.
21 a In the two independent variable case the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1 when x2 is held constant. In the single independent variable case the coefficient of x1 represents the expected change in y corresponding to a one unit increase in x1.
b Yes. If x1 and x2 are correlated, one would expect a change in x1 to be accompanied by a change in x2.
Because p-value ≤ α, β2 is significant and x2 should not be dropped from the model.
24 a The Minitab output is shown below:
The regression equation is
Salary = - 0.682 + 0.0498 Revenue + 0.0147 %Wins
Predictor Coef SE Coef T P
b Because the p-value = .001 < α = .05, there is a significant relationship.
c For Revenue: Because the p-value = .001 < α = .05, Revenue is significant.
For %Wins: Because the p-value = .025 < α = .05, %Wins is significant.
25 a The Minitab output is shown below:
The regression equation is
Rating = 0.345 + 0.255 TradeEx + 0.132 Use + 0.459 Range
Predictor Coef SE Coef T P
Total 9 3.10000
b Because the p-value = .003 < α = .05, there is a significant relationship.
c For TradeEx: Because the p-value = .025 < α = .05, TradeEx is significant.
For Use: Because the p-value = .382 > α = .05, Use is not significant.
For Range: Because the p-value = .010 < α = .05, Range is significant.
The Minitab output after removing Use is shown below:
The regression equation is
Rating = 0.672 + 0.264 TradeEx + 0.485 Range
Predictor Coef SE Coef T P
Total 9 3.1000
The coefficient of determination for the estimated regression equation developed in part (a) is .886. After the removal of Use, the coefficient of determination is .869. There is very little difference in the fit provided by the two estimated regression equations. But, because Use is not significant, this result is as expected.
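The comparison of the two models in Exercise 25 can also be made with the adjusted coefficient of determination, which penalizes extra independent variables. The sketch below is an illustration rather than part of the original solution. It assumes n = 10 (inferred from the Total row showing 9 degrees of freedom) and uses the rounded R² values .886 and .869, so the results are approximate.

```python
def adjusted_r_sq(r_sq, n, p):
    """Adjusted R-squared for n observations and p independent variables."""
    return 1 - (1 - r_sq) * (n - 1) / (n - p - 1)


n = 10  # assumption: Total DF = 9 implies 10 observations

full_model = adjusted_r_sq(0.886, n, p=3)     # TradeEx, Use, Range
reduced_model = adjusted_r_sq(0.869, n, p=2)  # TradeEx, Range

print(round(full_model, 3), round(reduced_model, 3))
```

Under these assumptions the adjusted value does not fall when Use is removed, which is consistent with the conclusion that Use adds little to the model.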
26 a Since the p-value corresponding to F = 10.77 is .0000 < α = .05, there is a significant relationship between the percentage of games won and the independent variables.
b All of the independent variables are significant because the p-values corresponding to the t test are all less than α = .05.
27 a ŷ = 29.1270 + .5906(180) + .4980(310) = 289.8150
b The point estimate for an individual value is ŷ = 289.8150, the same as the point estimate of the mean value.
28 a Using Minitab, the 95% confidence interval is 132.16 to 154.16
b Using Minitab, the 95% prediction interval is 111.13 to 175.18
29 a ŷ = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.555 or $93,555
Note: In Exercise 5b, the Minitab output also shows that b0 = 83.230, b1 = 2.2902, and b2 = 1.3010; hence, ŷ = 83.230 + 2.2902x1 + 1.3010x2. Using this estimated regression [...] computations and this will not be an issue.
The Minitab output is shown below:
Fit Stdev.Fit 95% C.I 95% P.I
93.588 0.291 ( 92.840, 94.335) ( 91.774, 95.401)
Note that the value of FIT (ŷ) is 93.588.
b Confidence interval estimate: 92.840 to 94.335 or $92,840 to $94,335
c Prediction interval estimate: 91.774 to 95.401 or $91,774 to $95,401
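The gap between the hand-computed estimate (93.555) and the Minitab FIT value (93.588) in Exercise 29 comes from coefficient rounding. The sketch below evaluates both versions of the equation using the coefficients quoted in the note above; the variable names are illustrative, and the more precise coefficients are themselves rounded, so the last digit may differ slightly from the Minitab output.

```python
x1, x2 = 3.5, 1.8  # television and newspaper advertising values from Exercise 29

# Coefficients as printed in the regression equation (rounded).
rounded_fit = 83.2 + 2.29 * x1 + 1.30 * x2

# Coefficients with the extra digits quoted from the Minitab output.
precise_fit = 83.230 + 2.2902 * x1 + 1.3010 * x2

# Revenue is reported in thousands of dollars, so scale the gap to dollars.
difference_dollars = (precise_fit - rounded_fit) * 1000

print(round(rounded_fit, 3), round(precise_fit, 2), round(difference_dollars, 1))
```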
30 The Minitab output used to answer parts (a) and (b) follows:
The regression equation is
a The 95% confidence interval is 46.758 to 50.646
b The 95% prediction interval for the Svfara SV609 is 44.815 to 52.589
31 a A portion of the Minitab output is shown below:
The regression equation is
Overall = - 0.06 + 0.276 Handling + 0.447 Dependability + 0.270 Fit and Finish
Predictor Coef SE Coef T P
b The Minitab output showing the confidence and prediction intervals is shown below:
Predicted Values for New Observations
d β2 = E(y | level 2) - E(y | level 1)
β1 is the change in E(y) for a 1 unit change in x1 holding x2 constant.
β3 = E(y | level 3) - E(y | level 1)
β1 is the change in E(y) for a 1 unit change in x1 holding x2 and x3 constant.
34 a $15,300
b Estimate of sales = 10.1 - 4.2(2) + 6.8(8) + 15.3(0) = 56.1 or $56,100
c Estimate of sales = 10.1 - 4.2(1) + 6.8(3) + 15.3(1) = 41.6 or $41,600
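The estimates in Exercise 34 can be reproduced with a small function. This is a sketch under stated assumptions: the argument names x1, x2, and x3 are generic placeholders, and treating x3 as a 0/1 dummy variable is inferred from the values substituted in parts (b) and (c) and from the $15,300 answer in part (a).

```python
def estimate_sales(x1, x2, x3):
    """Estimated sales (in $1000s) from the equation used in Exercise 34."""
    return 10.1 - 4.2 * x1 + 6.8 * x2 + 15.3 * x3


part_b = estimate_sales(2, 8, 0)
part_c = estimate_sales(1, 3, 1)

# Effect of switching the dummy variable from 0 to 1 with x1 and x2 fixed.
dummy_effect = estimate_sales(1, 3, 1) - estimate_sales(1, 3, 0)

print(round(part_b, 1), round(part_c, 1), round(dummy_effect, 1))
```

The dummy effect equals the coefficient 15.3 regardless of the x1 and x2 values chosen, which matches the $15,300 reported in part (a).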
35 a Let Type = 0 if a mechanical repair
Type = 1 if an electrical repair
The Minitab output is shown below:
The regression equation is
Total 9 10.476
b The estimated regression equation did not provide a good fit. In fact, the p-value of .408 shows that the relationship is not significant for any reasonable value of α.
c Person = 0 if Bob Jones performed the service and Person = 1 if Dave Newton performed the service. The Minitab output is shown below:
The regression equation is
Residual Error 8 4.0760 0.5095
Total 9 10.4760
d We see that 61.1% of the variability in repair time has been explained by the repair person that performed the service; an acceptable, but not good, fit.
36 a The Minitab output is shown below:
The regression equation is
Time = 1.86 + 0.291 Months + 1.10 Type - 0.609 Person
Predictor Coef SE Coef T P
Total 9 10.4760
b Since the p-value corresponding to F = 18.04 is .002 < α = .05, the overall model is statistically significant.
c The p-value corresponding to t = -1.57 is .167 > α = .05; thus, the addition of Person is not statistically significant. Person is highly correlated with Months (the sample correlation coefficient is -.691); thus, once the effect of Months has been accounted for, Person will not add much to the model.
37 a A portion of the Minitab output follows:
The regression equation is
b Because the p-value = .005 < α = .05, there is a significant relationship.
c Let Type_Italian = 1 if the restaurant is an Italian restaurant; 0 otherwise.
d A portion of the Minitab output follows:
The regression equation is
Score = 67.4 + 0.573 Price + 3.04 Type_Italian
Predictor Coef SE Coef T P
e For the Type_Italian dummy variable, the p-value = .017 < α = .05; thus, type of restaurant is a significant factor in overall customer satisfaction.
f The estimated regression equation computed in part (d) is ŷ = 67.4 + .573(Price) + 3.04(Type_Italian). Thus, for a given price, the satisfaction score increases by 3.04 points when Type_Italian changes from 0 to 1.
38 a The Minitab output is shown below:
The regression equation is
Risk = - 91.8 + 1.08 Age + 0.252 Pressure + 8.74 Smoker
Predictor Coef SE Coef T P
Source DF SS MS F P
Regression 3 3660.7 1220.2 36.82 0.000
Residual Error 16 530.2 33.1
Total 19 4190.9
b Since the p-value corresponding to t = 2.91 is .010 < α = .05, smoking is a significant factor.
c Using Minitab, the point estimate is 34.27; the 95% prediction interval is 21.35 to 47.18. Thus, the probability of a stroke (.2135 to .4718 at the 95% confidence level) appears to be quite high. The physician would probably recommend that Art quit smoking and begin some type of treatment designed to reduce his blood pressure.
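The analysis of variance figures reported for Exercise 38 are internally consistent, which can be confirmed by recomputing the mean squares and the F statistic from the sums of squares and degrees of freedom. This is a sketch using only the values printed in the output; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom from the Exercise 38 ANOVA output.
ss_regression, df_regression = 3660.7, 3
ss_error, df_error = 530.2, 16

ms_regression = ss_regression / df_regression  # mean square regression
ms_error = ss_error / df_error                 # mean square error
f_statistic = ms_regression / ms_error         # test statistic for overall significance

ss_total = ss_regression + ss_error

print(round(ms_regression, 1), round(ms_error, 1), round(f_statistic, 2))
```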
39 a The Minitab output is shown below:
The regression equation is
c Using Minitab, we obtained the following values:
Studentized Deleted Residual
40 a The Minitab output is shown below:
The regression equation is
Revenue = 83.2 + 2.29 TVAdv + 1.30 NewsAdv
b Using Minitab we obtained the following values:
ŷi    Standardized Residual
With the relatively few observations, it is difficult to determine if any of the assumptions regarding the error term have been violated. For instance, an argument could be made that there does not appear to be any pattern in the plot; alternatively, an argument could be made that there is a curvilinear pattern in the plot.
c The values of the standardized residuals are greater than -2 and less than +2; thus, using this test, there are no outliers. As a further check for outliers, we used Minitab to compute the following studentized deleted residuals:
Observation    Studentized Deleted Residual
Since none of the studentized deleted residuals is less than -2.776 or greater than 2.776, we conclude that there are no outliers in the data.
d Using Minitab we obtained the following values:
The critical leverage value is 3(p + 1)/n = 3(2 + 1)/8 = 1.125.
Since none of the leverage values exceed 1.125, we conclude that there are no influential observations. However, using Cook's distance measure, we see that D1 > 1 (rule of thumb critical value); thus, we conclude the first observation is influential. Final conclusion: observation 1 is an influential observation.
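The two screening rules used in this exercise can be sketched in code. The cutoff of 1.125 is consistent with the rule of thumb 3(p + 1)/n, assuming p = 2 independent variables and n = 8 observations (inferred from the Total row with 7 degrees of freedom). The leverage and Cook's distance values in the example are hypothetical placeholders, since the original table of values is not reproduced here.

```python
def leverage_cutoff(n, p):
    """Rule-of-thumb leverage cutoff 3(p + 1)/n."""
    return 3 * (p + 1) / n


def flag_influential(leverages, cooks_d, n, p, d_cutoff=1.0):
    """Return 1-based observation numbers flagged by either rule."""
    h_cutoff = leverage_cutoff(n, p)
    return [
        i + 1
        for i, (h, d) in enumerate(zip(leverages, cooks_d))
        if h > h_cutoff or d > d_cutoff
    ]


cutoff = leverage_cutoff(n=8, p=2)

# Hypothetical values for illustration only; not taken from the Minitab output.
hypothetical_h = [0.63, 0.25, 0.30, 0.40, 0.27, 0.46, 0.33, 0.36]
hypothetical_d = [1.52, 0.70, 0.22, 0.01, 0.11, 0.06, 0.02, 0.05]

flagged = flag_influential(hypothetical_h, hypothetical_d, n=8, p=2)
print(cutoff, flagged)
```

With these placeholder values no leverage exceeds the cutoff, but the first observation is flagged by Cook's distance, mirroring the pattern described in the solution.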
42 a The Minitab output is shown below:
The regression equation is
Speed = 71.3 + 0.107 Price + 0.0845 Horsepwr
Predictor Coef SE Coef T P
Total 15 995.95
X denotes an observation whose X value gives it large influence.
b The standardized residual plot is shown below. There appears to be a very unusual trend in the standardized residuals.
[Figure: standardized residual plot]
d The Minitab output shown in part (a) identifies observation 2 as an influential observation.
43 a The Minitab output is shown below:
The regression equation is
Scoring Avg = 58.1 - 10.7 Greens in Reg + 11.7 Putting Avg.
Predictor Coef SE Coef T P
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
b The standardized residual plot is shown below:
[Figure: standardized residual plot]
The standardized residual plot does not support the assumption about ε. There are three unusual observations and the variance of the residuals appears to be increasing for larger values of ŷ.
c The Minitab output in part (a) identified two outliers: observations 1 and 30. Observation 1 corresponds to Annika Sorenstam; her scoring average was much lower than the other players. Observation 30 corresponds to Karine Icher; although her performance in terms of greens in regulation and putting average was very good, her scoring average was much higher than most of the other players.
d The Minitab output in part (a) identified two influential observations: observations 1 and 14. Observation 1 corresponds to Annika Sorenstam and observation 14 corresponds to Soo-Yun Kang.
44 a E(y) = e^(β0 + β1x) / (1 + e^(β0 + β1x))
b It is an estimate of the probability that a customer who does not have a Simmons credit card will make a purchase.
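For a logistic regression with one independent variable, the estimated probability takes the standard form e^(b0 + b1x) / (1 + e^(b0 + b1x)). The sketch below evaluates that form at x = 0 and x = 1. The coefficient values are hypothetical placeholders, since the estimated coefficients for this exercise are not shown here, and reading x as a 0/1 credit card indicator is an assumption based on the wording of part (b).

```python
import math


def logistic_probability(b0, b1, x):
    """Estimated probability from a one-variable logistic regression equation."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))


# Hypothetical coefficients for illustration only.
b0, b1 = -1.0, 1.5

p_without_card = logistic_probability(b0, b1, x=0)  # depends only on b0
p_with_card = logistic_probability(b0, b1, x=1)

print(round(p_without_card, 4), round(p_with_card, 4))
```

At x = 0 the probability depends on b0 alone, which is why the value in part (b) is read as the estimate for a customer without the card.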