Statistics for Business and Economics, Chapter 15: Multiple Regression


Multiple Regression

Learning Objectives

1. Understand how multiple regression analysis can be used to develop relationships involving one dependent variable and several independent variables.
2. Be able to interpret the coefficients in a multiple regression analysis.
3. Know the assumptions necessary to conduct statistical tests involving the hypothesized regression model.
4. Understand the role of computer packages in performing multiple regression analysis.
5. Be able to interpret and use computer output to develop the estimated regression equation.
6. Be able to determine how good a fit is provided by the estimated regression equation.
7. Be able to test for the significance of the regression equation.
8. Understand how multicollinearity affects multiple regression analysis.
9. Know how residual analysis can be used to make a judgement as to the appropriateness of the model, identify outliers, and determine which observations are influential.
10. Understand how logistic regression is used for regression analyses involving a binary dependent variable.


3 a b1 = 3.8 is an estimate of the change in y corresponding to a 1-unit change in x1 when x2, x3, and x4 are held constant.

b2 = -2.3 is an estimate of the change in y corresponding to a 1-unit change in x2 when x1, x3, and x4 are held constant.

b3 = 7.6 is an estimate of the change in y corresponding to a 1-unit change in x3 when x1, x2, and x4 are held constant.

b4 = 2.7 is an estimate of the change in y corresponding to a 1-unit change in x4 when x1, x2, and x3 are held constant.

b ŷ = 17.6 + 3.8(10) - 2.3(5) + 7.6(1) + 2.7(2) = 57.1
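Part (b) is the intercept plus each coefficient times its x value. The Python sketch below reproduces that arithmetic with the coefficients from part (a) and the x values from part (b). The helper name `predict` is illustrative and does not come from the text. The second computation shows the "held constant" interpretation from part (a): raising x1 by one unit while x2, x3, and x4 stay fixed moves ŷ by exactly b1.

```python
def predict(b0, coefs, xs):
    """Evaluate an estimated regression equation y-hat = b0 + b1*x1 + ... + bp*xp."""
    if len(coefs) != len(xs):
        raise ValueError("need exactly one x value per coefficient")
    return b0 + sum(b * x for b, x in zip(coefs, xs))


# Exercise 3: b0 = 17.6; b1..b4 = 3.8, -2.3, 7.6, 2.7; x1..x4 = 10, 5, 1, 2
coefs = [3.8, -2.3, 7.6, 2.7]
y_hat = predict(17.6, coefs, [10, 5, 1, 2])
print(round(y_hat, 1))  # 57.1

# Increase x1 by one unit, holding x2, x3, x4 fixed: the change equals b1
delta = predict(17.6, coefs, [11, 5, 1, 2]) - y_hat
print(round(delta, 1))  # 3.8
```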


4 a ŷ = 25 + 10(15) + 8(10) = 255; sales estimate: $255,000

b Sales can be expected to increase by $10 for every dollar increase in inventory investment when advertising expenditure is held constant. Sales can be expected to increase by $8 for every dollar increase in advertising expenditure when inventory investment is held constant.

5 a The Minitab output is shown below:

The regression equation is

Total 7 25.500

b The Minitab output is shown below:

Revenue = 83.2 + 2.29 TVAdv + 1.30 NewsAdv

Predictor Coef SE Coef T P

Total 7 25.500

c No, it is 1.60 in part (a) and 2.29 above. In part (b), it represents the marginal change in revenue due to an increase in television advertising with newspaper advertising held constant.

d Revenue = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.56, or $93,560

Proportion Won = 0.354 + 0.000888 HR

Constant 0.35402 0.09591 3.69 0.002


b A portion of the Minitab output is shown below:

Proportion Won = 0.865 - 0.0837 ERA

c A portion of the Excel output is shown below:

Proportion Won = 0.709 + 0.00140 HR - 0.103 ERA


PCW Rating = 40.0 + 0.113 Performance + 0.382 Features


Because the p-value = .040 < α = .05, there is a significant relationship between price and the reliability rating.

Price ($) = 21313 + 137 Road-Test Score - 1446 Reliability


10 a A portion of the Minitab output is shown below:

c A portion of the Minitab output is shown below:

e ŷ = -1.2346 + 4.817(.45) - 2.5895(.34) + .03443(17) = .638

11 a SSE = SST - SSR = 6,724.125 - 6,216.375 = 507.75

b R² = SSR/SST = 6,216.375/6,724.125 = .924
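Parts (a) and (b) are plain sums-of-squares arithmetic. Here is a short Python check that uses only the SST and SSR values given in the exercise. The variable names are illustrative.

```python
# Exercise 11: sums of squares reported in the text
sst = 6724.125  # total sum of squares
ssr = 6216.375  # sum of squares due to regression

sse = sst - ssr        # part (a): SSE = SST - SSR
r_squared = ssr / sst  # part (b): R^2 = SSR / SST

print(sse)                  # 507.75
print(round(r_squared, 3))  # 0.924
```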


d The estimated regression equation provided an excellent fit.

c Yes; after adjusting for the number of independent variables in the model, we see that 90.5% of the variability in y has been accounted for.

R² = SSR/SST = .975, where SST = 1805

b Multiple regression analysis is preferred, since both R² and the adjusted R² (R²a) show an increased percentage of the variability of y explained when both independent variables are used.

16 a No, r² = .153.

b Using both independent variables provides a much better fit: R² = .858 and R²a = .837.

R² = SSR/SST = .597, where SST = 107.426

Because p-value ≤ α, β2 is significant.

20 A portion of the Minitab output is shown below

Total 9 15182.9


a Since the p-value corresponding to F = 43.50 is .000 < α = .05, we reject H0: β1 = β2 = 0; there is a significant relationship.

b Since the p-value corresponding to t = 8.13 is .000 < α = .05, we reject H0: β1 = 0; β1 is significant.

c Since the p-value corresponding to t = 5.00 is .002 < α = .05, we reject H0: β2 = 0; β2 is significant.
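The .000 and .002 p-values can be checked without tables. The Total row (df = 9) implies n = 10, and H0: β1 = β2 = 0 implies p = 2, so the F test has 2 and 7 degrees of freedom and each t test has 7. The Python sketch below is an illustration under those assumptions and is not Minitab's algorithm. It uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, and Simpson's rule on the t density for the two-sided t tests. Both function names are hypothetical.

```python
import math


def f_pvalue_df1_2(f_stat, df2):
    """Upper-tail p-value P(F > f_stat) for an F distribution with 2 and df2 degrees of freedom.

    With 2 numerator degrees of freedom the tail has the closed form (1 + 2F/df2)^(-df2/2).
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


def t_pvalue_two_sided(t_stat, df, steps=2000):
    """Two-sided p-value P(|T| > |t_stat|) for a t distribution with df degrees of freedom.

    Integrates the t density from 0 to |t_stat| with Simpson's rule (steps must be even).
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t_stat)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * area   # symmetric density: probability in both tails beyond |t_stat|


# Exercise 20: n = 10 (Total df = 9) and p = 2, so error df = n - p - 1 = 7
p_overall = f_pvalue_df1_2(43.50, 7)
p_beta1 = t_pvalue_two_sided(8.13, 7)
p_beta2 = t_pvalue_two_sided(5.00, 7)
print(f"{p_overall:.3f} {p_beta1:.3f} {p_beta2:.3f}")  # 0.000 0.000 0.002
```

A general-purpose library such as SciPy would give the same tail areas. The hand-rolled version keeps the sketch dependency-free.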

21 a In the two-independent-variable case, the coefficient of x1 represents the expected change in y corresponding to a one-unit increase in x1 when x2 is held constant. In the single-independent-variable case, the coefficient of x1 represents the expected change in y corresponding to a one-unit increase in x1.

b Yes. If x1 and x2 are correlated, one would expect a change in x1 to be accompanied by a change in x2.


Because p-value ≤ α, β2 is significant and x2 should not be dropped from the model.

Salary = - 0.682 + 0.0498 Revenue + 0.0147 %Wins

b Because the p-value = .001 < α = .05, there is a significant relationship.

c For Revenue: Because the p-value = .001 < α = .05, Revenue is significant.

For %Wins: Because the p-value = .025 < α = .05, %Wins is significant.

Rating = 0.345 + 0.255 TradeEx + 0.132 Use + 0.459 Range

Total 9 3.10000

b Because the p-value = .003 < α = .05, there is a significant relationship.

c For TradeEx: Because the p-value = .025 < α = .05, TradeEx is significant.

For Use: Because the p-value = .382 > α = .05, Use is not significant.

For Range: Because the p-value = .010 < α = .05, Range is significant.

The Minitab output after removing Use is shown below:


Rating = 0.672 + 0.264 TradeEx + 0.485 Range

Total 9 3.1000

The coefficient of determination for the estimated regression equation developed in part (a) is .886. After the removal of Use, the coefficient of determination is .869. There is very little difference in the fit provided by the two estimated regression equations. Because Use is not significant, this result is as expected.
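The adjusted coefficient of determination makes this comparison sharper, because it penalizes the extra independent variable. The Python sketch below assumes n = 10 (the ANOVA Total df is 9) and compares p = 3 against p = 2. It uses the rounded R² values quoted above, so its results are approximate. `adjusted_r2` is a hypothetical helper name.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


n = 10  # Total df = 9 in the ANOVA tables
full = adjusted_r2(0.886, n, 3)     # TradeEx, Use, Range
reduced = adjusted_r2(0.869, n, 2)  # TradeEx, Range
print(round(full, 3), round(reduced, 3))  # 0.829 0.832
```

Under these assumptions the adjusted value is slightly higher without Use. That agrees with the conclusion that dropping Use costs essentially nothing in fit.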

26 a Since the p-value corresponding to F = 10.77 is .0000 < α = .05, there is a significant relationship between the percentage of games won and the independent variables.

b All of the independent variables are significant, because the p-values corresponding to the t tests are all less than α = .05.

27 a ŷ = 29.1270 + .5906(180) + .4980(310) = 289.8150

b The point estimate for an individual value is ŷ = 289.8150, the same as the point estimate of the mean value.

28 a Using Minitab, the 95% confidence interval is 132.16 to 154.16

b Using Minitab, the 95% prediction interval is 111.13 to 175.18

29 a ˆy = 83.2 + 2.29(3.5) + 1.30(1.8) = 93.555 or $93,555

Note: In Exercise 5b, the Minitab output also shows that b0 = 83.230, b1 = 2.2902, and b2 = 1.3010; hence, ŷ = 83.230 + 2.2902x1 + 1.3010x2. Using this estimated regression


computations and this will not be an issue.

The Minitab output is shown below:

Fit      Stdev.Fit   95% C.I.            95% P.I.
93.588   0.291       (92.840, 94.335)    (91.774, 95.401)

Note that the value of FIT (ŷ) is 93.588.
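The gap between 93.555 in part (a) and the FIT of 93.588 comes from rounding the coefficients. This quick Python check evaluates both versions of the Exercise 5b equation at TV advertising = 3.5 and newspaper advertising = 1.8. The variable names are illustrative.

```python
# Exercise 29: x1 = TV advertising, x2 = newspaper advertising (both in $1000s)
x1, x2 = 3.5, 1.8

rounded = 83.2 + 2.29 * x1 + 1.30 * x2        # coefficients as printed in the equation
precise = 83.230 + 2.2902 * x1 + 1.3010 * x2  # coefficients from the Coef column

print(f"{rounded:.3f}")  # 93.555
print(f"{precise:.4f}")  # 93.5875
```

The more precise coefficients give about 93.5875, which agrees with the FIT of 93.588 to rounding.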

b Confidence interval estimate: 92.840 to 94.335 or $92,840 to $94,335

c Prediction interval estimate: 91.774 to 95.401 or $91,774 to $95,401
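The confidence interval can be reproduced from Fit and Stdev.Fit as ŷ ± t.025 s_ŷ. The Python sketch below assumes n = 8 (the Exercise 5 ANOVA Total df is 7) and p = 2, so the t value has n - p - 1 = 5 degrees of freedom (t.025 ≈ 2.5706). The last digit can differ from Minitab because Stdev.Fit is printed rounded. The prediction interval also needs the standard error of the estimate s, which is not shown here, so it is omitted.

```python
# Minitab output for Exercise 29: the fitted value and its standard error
fit, se_fit = 93.588, 0.291

# Assumed n = 8 and p = 2, so df = n - p - 1 = 5
t_crit = 2.5706  # t_.025 with 5 degrees of freedom

margin = t_crit * se_fit
lower, upper = fit - margin, fit + margin
print(f"({lower:.3f}, {upper:.3f})")  # (92.840, 94.336)
```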

30 The Minitab output used to answer parts (a) and (b) follows:

a The 95% confidence interval is 46.758 to 50.646

b The 95% prediction interval for the Svfara SV609 is 44.815 to 52.589

31 a A portion of the Minitab output is shown below:

Overall = - 0.06 + 0.276 Handling + 0.447 Dependability + 0.270 Fit and Finish

b The Minitab output showing the confidence and prediction intervals is shown below:

Predicted Values for New Observations

d β2 = E(y | level 2) - E(y | level 1)

β1 is the change in E(y) for a 1 unit change in x1 holding x2 constant


β3 = E(y | level 3) - E(y | level 1)

β1 is the change in E(y) for a 1 unit change in x1 holding x2 and x3 constant

34 a $15,300

b Estimate of sales = 10.1 - 4.2(2) + 6.8(8) + 15.3(0) = 56.1 or $56,100

c Estimate of sales = 10.1 - 4.2(1) + 6.8(3) + 15.3(1) = 41.6 or $41,600
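The three answers can be checked in a few lines of Python. The equation ŷ = 10.1 - 4.2x1 + 6.8x2 + 15.3x3 is read off the substitutions in parts (b) and (c). Treating x3 as the 0/1 dummy variable is an assumption consistent with those substitutions and with part (a). The function name is hypothetical.

```python
def sales_estimate(x1, x2, x3):
    """Exercise 34 equation: y-hat = 10.1 - 4.2*x1 + 6.8*x2 + 15.3*x3, in $1000s.

    x3 is assumed to be a 0/1 dummy variable, so it simply shifts the intercept by 15.3.
    """
    return 10.1 - 4.2 * x1 + 6.8 * x2 + 15.3 * x3


print(round(sales_estimate(2, 8, 0), 1))  # 56.1 (part b)
print(round(sales_estimate(1, 3, 1), 1))  # 41.6 (part c)

# Switching the dummy from 0 to 1 with x1, x2 fixed adds 15.3, i.e. $15,300 (part a)
print(round(sales_estimate(1, 3, 1) - sales_estimate(1, 3, 0), 1))  # 15.3
```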

35 a Let Type = 0 if a mechanical repair

Type = 1 if an electrical repair

The Minitab output is shown below:

Total 9 10.476

b The estimated regression equation did not provide a good fit. In fact, the p-value of .408 shows that the relationship is not significant for any reasonable value of α.

c Person = 0 if Bob Jones performed the service and Person = 1 if Dave Newton performed the service. The Minitab output is shown below:

Residual Error 8 4.0760 0.5095

Total 9 10.4760

d We see that 61.1% of the variability in repair time has been explained by the repair person who performed the service; this is an acceptable, but not good, fit.
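The 61.1% figure follows directly from the Residual Error and Total rows of the ANOVA output. This short Python sketch rebuilds the rest of the table from those two rows. The variable names are illustrative.

```python
# Exercise 35(c) ANOVA rows: Residual Error (df 8, SS 4.0760) and Total (df 9, SS 10.4760)
sse, df_error = 4.0760, 8
sst, df_total = 10.4760, 9

ssr = sst - sse               # sum of squares due to regression
df_reg = df_total - df_error  # one independent variable (Person)
mse = sse / df_error          # matches the 0.5095 in the output
msr = ssr / df_reg
f_stat = msr / mse
r_squared = ssr / sst

print(f"SSR={ssr:.3f} MSE={mse:.4f} F={f_stat:.2f} R^2={r_squared:.3f}")
# SSR=6.400 MSE=0.5095 F=12.56 R^2=0.611
```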

Time = 1.86 + 0.291 Months + 1.10 Type - 0.609 Person

Total 9 10.4760

b Since the p-value corresponding to F = 18.04 is .002 < α = .05, the overall model is statistically significant.

c The p-value corresponding to t = -1.57 is .167 > α = .05; thus, the addition of Person is not statistically significant. Person is highly correlated with Months (the sample correlation coefficient is -.691); thus, once the effect of Months has been accounted for, Person will not add much to the model.

37 a A portion of the Minitab output follows:


b Because the p-value = .005 < α = .05, there is a significant relationship.

c Let Type_Italian = 1 if the restaurant is an Italian restaurant; 0 otherwise

d A portion of the Minitab output follows:

Score = 67.4 + 0.573 Price + 3.04 Type_Italian

e For the Type_Italian dummy variable, the p-value = .017 < α = .05; thus, type of restaurant is a significant factor in overall customer satisfaction.

f The estimated regression equation computed in part (d) is ŷ = 67.4 + .573(Price) + 3.04(Type_Italian). Thus, holding price constant, the satisfaction score increases by 3.04 points for an Italian restaurant.

Risk = - 91.8 + 1.08 Age + 0.252 Pressure + 8.74 Smoker


Regression 3 3660.7 1220.2 36.82 0.000

Residual Error 16 530.2 33.1

Total 19 4190.9

b Since the p-value corresponding to t = 2.91 is .010 < α = .05, smoking is a significant factor.

c Using Minitab, the point estimate is 34.27; the 95% prediction interval is 21.35 to 47.18. Thus, the probability of a stroke (.2135 to .4718 at the 95% confidence level) appears to be quite high. The physician would probably recommend that Art quit smoking and begin some type of treatment designed to reduce his blood pressure.

c Using Minitab, we obtained the following values:

Studentized Deleted Residual


Revenue = 83.2 + 2.29 TVAdv + 1.30 NewsAdv

b Using Minitab we obtained the following values:

ŷi    Standardized Residual

With the relatively few observations, it is difficult to determine whether any of the assumptions regarding the error term have been violated. For instance, an argument could be made that there does not appear to be any pattern in the plot; alternatively, an argument could be made that there is a curvilinear pattern in the plot.

c The values of the standardized residuals are greater than -2 and less than +2; thus, using this test, there are no outliers. As a further check for outliers, we used Minitab to compute the following studentized deleted residuals:

Observation    Studentized Deleted Residual


Since none of the studentized deleted residuals is less than -2.776 or greater than 2.776, we conclude that there are no outliers in the data.

d Using Minitab we obtained the following values:

Since none of the leverage values exceeds 3(p + 1)/n = 3(2 + 1)/8 = 1.125, we conclude that there are no influential observations.

However, using Cook's distance measure, we see that D1 > 1 (the rule-of-thumb critical value); thus, we conclude that the first observation is influential. Final conclusion: observation 1 is an influential observation.
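The cutoffs in parts (c) and (d) depend only on n and p. The Python sketch below assumes this is the 8-observation data set from Exercise 5 (the equation matches, and that ANOVA's Total df is 7) with p = 2 independent variables. The helper names are hypothetical. The Cook's distance helper expects D values read from the software output, which this excerpt does not reproduce.

```python
def leverage_cutoff(n, p):
    """Rule of thumb: an observation with leverage h_i > 3(p + 1)/n has high leverage."""
    return 3 * (p + 1) / n


def deleted_residual_df(n, p):
    """Degrees of freedom for the t test on studentized deleted residuals."""
    return n - p - 2


def influential_by_cooks(cooks_distances):
    """Rule of thumb: observations with Cook's D_i > 1 are influential (returns 1-based indices)."""
    return [i for i, d in enumerate(cooks_distances, start=1) if d > 1]


# Assumed Exercise 5 data set: n = 8 observations, p = 2 independent variables
print(leverage_cutoff(8, 2))      # 1.125
print(deleted_residual_df(8, 2))  # 4 (t_.025 with 4 df is 2.776)
```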

Speed = 71.3 + 0.107 Price + 0.0845 Horsepwr

Total 15 995.95


X denotes an observation whose X value gives it large influence.

b The standardized residual plot is shown below. There appears to be a very unusual trend in the standardized residuals.

d The Minitab output shown in part (a) identifies observation 2 as an influential observation

Scoring Avg = 58.1 - 10.7 Greens in Reg + 11.7 Putting Avg

Predictor Coef SE Coef T P

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.

b The standardized residual plot is shown below:


The standardized residual plot does not support the assumption about ε. There are three unusual observations, and the variance of the residuals appears to increase for larger values of ŷ.

c The Minitab output in part (a) identified two outliers: observations 1 and 30. Observation 1 corresponds to Annika Sorenstam; her scoring average was much lower than those of the other players. Observation 30 corresponds to Karine Icher; although her performance in terms of greens in regulation and putting average was very good, her scoring average was much higher than those of most of the other players.

d The Minitab output in part (a) identified two influential observations: observations 1 and 14. Observation 1 corresponds to Annika Sorenstam, and observation 14 corresponds to Soo-Yun Kang.

44 a $E(y) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$

b It is an estimate of the probability that a customer who does not have a Simmons credit card will make a purchase.
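Part (b) reads the estimated logistic equation at x = 0, where it reduces to e^(b0)/(1 + e^(b0)). This is a minimal Python sketch for evaluating an estimated logistic regression equation. b0 and b1 stand for the estimated coefficients, which this excerpt does not reproduce, so no values are filled in. Coding x = 1 for cardholders and x = 0 otherwise is an assumption implied by part (b). The function name is hypothetical.

```python
import math


def logistic_probability(b0, b1, x):
    """Estimated P(y = 1 | x) = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    # Two algebraically identical forms, chosen by the sign of z so math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

At x = 0 the term b1·x vanishes, so the estimated purchase probability for a customer without the card depends on b0 alone.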
