Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

When you have completed this chapter, you will be able to:

1. Understand the importance of an appropriate model specification and multiple regression analysis
2. Comprehend the nature and technique of multiple regression models and the concept of partial regression coefficients
3. Use the estimation techniques for multiple regression models
4. Conduct an analysis of variance of an estimated model
5. Explain the goodness of fit of an estimated model
6. Draw inferences about the assumed (true) model through a joint test of hypothesis (F test) on the coefficients of all variables
7. Draw inferences about the importance of the independent variables through tests of hypothesis (t-tests)
8. Identify the problems raised, and the remedies thereof, by the presence of multicollinearity in the data sets
9. Identify the problems raised, and the remedies thereof, by the presence of outliers/influential observations in the data sets
10. Identify the violation of model assumptions, including linearity, homoscedasticity, autocorrelation, and normality, through simple diagnostic procedures
11. Use some simple remedial measures in the presence of violations of the model assumptions
12. Write a research report on an investigation using multiple regression analysis
13. Comprehend the concept of partial correlations and their importance in multiple regression analysis
14. Draw inferences about the importance of a subset of the independent variables through a joint test of hypothesis
15. Use qualitative variables, as well as their interactions with other independent variables
16. Apply some advanced diagnostic checks and remedies in multiple regression analysis

Multiple Regression Analysis

For two independent variables, the general form of the multiple regression equation is:

$$\hat{y} = a + b_1 x_1 + b_2 x_2$$

… $x_1$ and $x_2$ are the independent variables
… $a$ is the y-intercept
… $b_1$ is the net change in $\hat{y}$ for each unit change in $x_1$, holding $x_2$ constant. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.

The general multiple regression equation with k independent variables is:

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

The least squares criterion is used to develop this equation. Because determining $b_1$, $b_2$, etc. by hand is very tedious, a software package such as Excel or MINITAB is recommended.

Multiple Standard Error of Estimate

… is a measure of the effectiveness of the regression equation
… is measured in the same units as the dependent variable
… it is difficult to determine what is a large value and what is a small value of the standard error!

$$s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - (k + 1)}}$$

Multiple Regression and Correlation Assumptions

… the independent variables and the dependent variable have a linear relationship
… the dependent variable must be continuous and at least interval scale
… the variation in $(y - \hat{y})$, the residual, must be the same for all values of $\hat{y}$.
When this is the case, we say the residuals exhibit homoscedasticity
… the residuals should follow the normal distribution with a mean of 0
… successive values of the dependent variable must be uncorrelated

The ANOVA Table

… reports the variation in the dependent variable
… the variation is divided into two components:
a. … the Explained Variation is that accounted for by the set of independent variables
b. … the Unexplained or Random Variation is not accounted for by the independent variables

… continued

Family   Food   Income   Size   Student
   1     3900    376      …       …
   2     5300    515      …       …
   3     4300    516      …       …
   4     4900    468      …       …
   5     6400    538      …       …
   6     7300    626      …       …
   7     4900    543      …       …
   8     5300    437      …       …
   9     6100    608      …       …
  10     6400    513      …       …
  11     7400    493      …       …
  12     5800    563      …       …

Use a computer software package, such as MINITAB or Excel, to develop a correlation matrix.

From the analysis provided by MINITAB, write out the regression equation.

What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?

The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor   Coef    SE Coef   T      P
Constant    954     1581      0.60   0.563
Income      1.092   3.153     0.35   0.738
Size        748.4   303.0     2.47   0.039
Student     564.5   495.1     1.14   0.287

S = 572.7   R-Sq = 80.4%   R-Sq(adj) = 73.1%

Analysis of Variance
Source           DF   SS         MS        F       P
Regression        3   10762903   3587634   10.94   0.003
Residual Error    8    2623764    327970
Total            11   13386667

$$\hat{y} = 954 + 1.09 x_1 + 748 x_2 + 565 x_3$$
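The summary figures in the MINITAB output can be cross-checked from the sums of squares alone. The following is a minimal sketch, assuming n = 12 families and k = 3 independent variables (so 3 and 8 degrees of freedom, which is what MS = SS/DF implies); the variable names are illustrative and are not part of MINITAB.

```python
import math

# Sums of squares copied from the Analysis of Variance table.
ss_regression = 10762903
ss_residual = 2623764
ss_total = 13386667

# Assumed from the example: 12 families and 3 independent variables.
n = 12
k = 3
df_regression = k
df_residual = n - (k + 1)
df_total = n - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_statistic = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)  # multiple standard error of estimate
r_sq = ss_regression / ss_total     # coefficient of determination
r_sq_adj = 1 - ms_residual / (ss_total / df_total)

print(f"F = {f_statistic:.2f}")
print(f"S = {std_error:.1f}")
print(f"R-Sq = {r_sq:.1%}")
print(f"R-Sq(adj) = {r_sq_adj:.1%}")
```

Up to rounding, these agree with the reported F = 10.94, S = 572.7, R-Sq = 80.4% and R-Sq(adj) = 73.1%.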
From the regression output we note:

… The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.
… An additional family member is estimated to increase the amount spent per year on food by $748, holding income and student status constant.
… A family with a college student is estimated to spend $565 more per year on food than a family without a college student, holding income and family size constant.
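To see what a net (partial) regression coefficient means numerically, the sketch below evaluates the fitted equation with the rounded coefficients from the output. The function name `predict_food`, the 0/1 coding of the student variable, and the baseline family (income entered as 500, four members, no college student, as in the question posed earlier) are illustrative assumptions.

```python
def predict_food(income, size, student):
    """Estimated yearly food expenditure from the fitted equation.

    income is in hundreds of dollars (500 means $50,000); student is
    assumed to be 1 if the family has a college student and 0 otherwise.
    """
    return 954 + 1.09 * income + 748 * size + 565 * student


baseline = predict_food(income=500, size=4, student=0)
one_more_member = predict_food(income=500, size=5, student=0)
with_student = predict_food(income=500, size=4, student=1)

print(round(baseline))                    # estimate for the baseline family
print(round(one_more_member - baseline))  # net effect of one more member
print(round(with_student - baseline))     # net effect of a college student
```

Changing one input while the others stay fixed shifts the estimate by exactly that variable's coefficient: the script prints 4491, 748 and 565.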
The correlation matrix is as follows:

          Food    Income   Size
Income    0.587
Size      0.876   0.609
Student   0.773   0.491    0.743

The strongest correlation between the dependent variable (Food) and an independent variable is between family size and amount spent on food (0.876).

Among the independent variables, the correlations of Income with Size (0.609) and with Student (0.491) are between -.70 and .70 and should not cause problems. The correlation between Size and Student (0.743) is slightly above .70, so it is worth watching for multicollinearity.

Find the estimated food expenditure for a family of 4 with an income of 500 (that is, $50,000) and no college student.

The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

$$\hat{y} = 954 + 1.09(500) + 748(4) + 565(0) = \$4{,}491$$

Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero.

$$H_0: \beta_1 = \beta_2 = \beta_3 = 0$$
$$H_1: \text{at least one } \beta_i \neq 0$$

H0 is rejected if F > 4.07.
From the MINITAB output, the computed value of F is 10.94 (P = 0.003).
Decision: H0 is rejected. Not all the regression coefficients are zero.
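The two screening steps in this part of the example, checking the pairwise correlations among the independent variables and comparing the computed F with its critical value, can be sketched as follows. The .70 cutoff and the critical value 4.07 are taken from the text; the dictionary layout and the function names are illustrative assumptions.

```python
# Pairwise correlations among the independent variables, from the matrix.
predictor_correlations = {
    ("Income", "Size"): 0.609,
    ("Income", "Student"): 0.491,
    ("Size", "Student"): 0.743,
}


def flag_high_correlations(correlations, cutoff=0.70):
    """Return the pairs whose absolute correlation exceeds the cutoff."""
    return [pair for pair, r in correlations.items() if abs(r) > cutoff]


def reject_global_null(f_computed, f_critical):
    """Global test decision: reject H0 when computed F exceeds critical F."""
    return f_computed > f_critical


print(flag_high_correlations(predictor_correlations))
print(reject_global_null(f_computed=10.94, f_critical=4.07))
```

On the values shown, the only pair above the cutoff is Size and Student (0.743), and the global null hypothesis is rejected because 10.94 exceeds 4.07.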
Conduct an individual test to determine which coefficients are not zero.

$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0$$

(This is the hypothesis for the independent variable family size.)

The regression equation is
Food = 954 + 1.09 Income + 748 Size + 565 Student

Predictor   Coef    SE Coef   T      P
Constant    954     1581      0.60   0.563
Income      1.092   3.153     0.35   0.738
Size        748.4   303.0     2.47   0.039
Student     564.5   495.1     1.14   0.287

Using the 5% level of significance, reject H0 if the P value