Econometrics – lecture 7 – model specifications

MODEL SPECIFICATIONS

Dr Tu Thuy Anh
Faculty of International Economics

Problems

• Model misspecification:
   - Omitting relevant variables
   - Including irrelevant variables
• Function specification: the Ramsey test
• Structural change: the Chow test

OMISSION OF A RELEVANT VARIABLE

Consequences of Variable Misspecification

                            True model
Fitted model                Y = β1 + β2 X2 + u          Y = β1 + β2 X2 + β3 X3 + u
Ŷ = b1 + b2 X2              Correct specification,      Coefficients are biased (in general);
                            no problems                 standard errors are invalid
Ŷ = b1 + b2 X2 + b3 X3                                  Correct specification,
                                                        no problems

To keep the analysis simple, we will assume that there are only two possibilities: either Y depends only on X2, or it depends on both X2 and X3.

If Y depends only on X2 and we fit a simple regression model, we will not encounter any problems, assuming of course that the regression model assumptions are valid. Likewise, we will not encounter any problems if Y depends on both X2 and X3 and we fit the multiple regression.

In this sequence we will examine the consequences of fitting a simple regression when the true model is multiple. The omission of a relevant explanatory variable causes the regression coefficients to be biased and the standard errors to be invalid.

True model: $Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2$

$$b_2 = \frac{\sum (X_{2i} - \bar{X}_2)(Y_i - \bar{Y})}{\sum (X_{2i} - \bar{X}_2)^2}$$

$$E(b_2) = \beta_2 + \beta_3 \frac{\sum (X_{2i} - \bar{X}_2)(X_{3i} - \bar{X}_3)}{\sum (X_{2i} - \bar{X}_2)^2}$$

[Diagram: X2 acts on Y directly (β2, the direct effect of X2, holding X3 constant); X3 acts on Y (β3, the effect of X3); X2 also has an apparent effect on Y, acting as a mimic for X3.]

The strength of the proxy effect depends on two factors: the strength of the effect of X3 on Y, which is given by β3, and the ability of X2 to mimic X3. The ability of X2 to mimic X3 is determined by the slope coefficient obtained when X3 is regressed on X2, which is the ratio multiplying β3 in the expression for E(b2).

• If we estimate the restricted (bad) model instead of the right (unrestricted) one, the estimate of the parameter β2 will be biased, and the bias will depend on:
   - The magnitude of the omitted parameter (β3)
   - The correlation between the included and the omitted variables (X2 and X3)
• R-squared is affected
• The coefficients of variables that are highly correlated with the omitted one are affected more

Model: R-squared 0,942443

              coefficient   std. error   t-ratio     p-value
const            -4,85518      14,5403   -0,3339      0,7418
k                0,158596    0,0416823     3,805      0,0010  ***
l                0,916609     0,149560     6,129    4,42e-06  ***

Model: R-squared 0,839496

              coefficient   std. error   t-ratio     p-value
const             77,5787      9,01119     8,609    1,71e-08  ***
k                0,377244    0,0351677     10,73   3,32e-010  ***

Model: R-squared 0,902764

              coefficient   std. error   t-ratio     p-value
const            -38,7267      14,5994    -2,653      0,0145  **
l                 1,40367    0,0982155     14,29   1,29e-012  ***

Example: Teaching Ratings

Correlation coefficients, using the observations 1 - 463
5% critical value (two-tailed) = 0,0911 for n = 463

           minority      age   female onecredit   beauty    intro nnenglish
minority     1,0000  -0,1031   0,1146    0,2475   0,0331   0,1361    0,2922
age                   1,0000  -0,2851   -0,0253  -0,2979  -0,0915   -0,0021
female                         1,0000   -0,0443   0,1257  -0,0566    0,0038
onecredit                                1,0000  -0,0847   0,3279   -0,0245
beauty                                            1,0000   0,0326    0,0103
intro                                                      1,0000   -0,1434
nnenglish                                                            1,0000

Example: Teaching Ratings

Model 1: OLS, using observations 1-463
Dependent variable: course_eval

              coefficient   std. error   t-ratio     p-value
const             4,16853     0,141806     29,40   3,14e-107  ***
minority        -0,169428    0,0764135    -2,217      0,0271  **
age           -0,00195448   0,00266615   -0,7331      0,4639
female          -0,183234    0,0510690    -3,588      0,0004  ***
onecredit        0,633000     0,111415     5,681    2,39e-08  ***
beauty           0,159209    0,0319610     4,981    8,98e-07  ***
intro          0,00794885    0,0546997    0,1453      0,8845
nnenglish       -0,243840     0,107013    -2,279      0,0232  **

Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     120,0996    S.E. of regression    0,513766
R-squared             0,155647    Adjusted R-squared    0,142657
F(7, 455)             11,98202    P-value(F)            4,67e-14
Log-likelihood       -344,5811    Akaike criterion      705,1623
Schwarz criterion     738,2641    Hannan-Quinn          718,1936

Example: Teaching Ratings

Model 3: OLS, using observations 1-463
Dependent variable: course_eval

              coefficient   std. error   t-ratio     p-value
const             4,33803     0,141212     30,72   4,02e-113  ***
minority        -0,162938    0,0783720    -2,079      0,0382  **
age           -0,00558276   0,00263085    -2,122      0,0344  **
female          -0,172939    0,0523426    -3,304      0,0010  ***
onecredit        0,574941     0,113660     5,058    6,14e-07  ***
intro           0,0194039    0,0560603    0,3461      0,7294
nnenglish       -0,239690     0,109768    -2,184      0,0295  **

Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     126,6494    S.E. of regression    0,527010
R-squared             0,109599    Adjusted R-squared    0,097883
F(6, 456)             9,354823    P-value(F)            1,09e-09
Log-likelihood       -356,8740    Akaike criterion      727,7480
Schwarz criterion     756,7121    Hannan-Quinn          739,1504

Model 3 omits beauty. The coefficient on age, whose correlation with beauty is -0,2979, moves from -0,00195448 in Model 1 to -0,00558276 and becomes significant at the 5% level.

INCLUSION OF AN IRRELEVANT VARIABLE

Consequences of Variable Misspecification

                            True model
Fitted model                Y = β1 + β2 X2 + u          Y = β1 + β2 X2 + β3 X3 + u
Ŷ = b1 + b2 X2              Correct specification,      Coefficients are biased (in general);
                            no problems                 standard errors are invalid
Ŷ = b1 + b2 X2 + b3 X3      Coefficients are unbiased   Correct specification,
                            (in general), but           no problems
                            inefficient; standard
                            errors are valid
                            (in general)

The effects of including an irrelevant variable are different from those of omitting a relevant one. In this case the coefficients in general remain unbiased, but they are inefficient. The standard errors remain valid, but are needlessly large.

True model: $Y = \beta_1 + \beta_2 X_2 + u$
Fitted model: $\hat{Y} = b_1 + b_2 X_2 + b_3 X_3$
True model rewritten: $Y = \beta_1 + \beta_2 X_2 + 0 \cdot X_3 + u$

$$\sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum (X_{2i} - \bar{X}_2)^2} \times \frac{1}{1 - r_{X_2,X_3}^2}$$

Rewrite the true model adding X3 as an explanatory variable, with a coefficient of 0. Now the true model and the fitted model coincide. Hence b2 will be an unbiased estimator of β2 and b3 will be an unbiased estimator of 0.

However, the variance of b2 will be larger than it would have been if the correct simple regression had been run, because it includes the factor 1 / (1 - r²), where r is the correlation between X2 and X3. The standard errors remain valid, but they will tend to be larger than those obtained in a simple regression, reflecting the loss of efficiency.

Example: Teaching Ratings

Model 9: OLS, using observations 1-463
Dependent variable: course_eval

              coefficient   std. error   t-ratio     p-value
const             4,17279     0,138589     30,11   1,87e-110  ***
minority        -0,168200    0,0758629    -2,217      0,0271  **
age           -0,00198710   0,00265383   -0,7488      0,4544
female          -0,183884    0,0508185    -3,618      0,0003  ***
onecredit        0,637712     0,106477     5,989    4,28e-09  ***
beauty           0,159404    0,0318984     4,997    8,30e-07  ***
nnenglish       -0,246515     0,105304    -2,341      0,0197  **

Mean dependent var    3,998272    S.D. dependent var    0,554866
Sum squared resid     120,1052    S.E. of regression    0,513214
R-squared             0,155608    Adjusted R-squared    0,144497
F(6, 456)             14,00557    P-value(F)            1,19e-14
Log-likelihood       -344,5919    Akaike criterion      703,1838
Schwarz criterion     732,1479    Hannan-Quinn          714,5861

Model 9 drops intro, which is insignificant in Model 1 (p-value 0,8845). Compared with Model 1 above, the remaining coefficients barely change, every standard error is slightly smaller, and the adjusted R-squared rises from 0,142657 to 0,144497.

Function specification

• Why should the model be linear?
• A badly specified functional form has consequences as serious as the omitted variables problem: biased estimates
• Ramsey (1969) proposes a test with the following hypotheses:
   - H0: the functional form is correct
   - Ha: the functional form is incorrect

Ramsey RESET test

• Original model:

  $Y = \beta_1 X_1 + \dots + \beta_k X_k + U$

• We may suspect that some variables should enter non-linearly (e.g. in quadratic or cubic terms). To detect this, the RESET test adds powers of the fitted values of the dependent variable from the original equation to an auxiliary regression:

  $Y = \beta_1 X_1 + \dots + \beta_k X_k + \gamma_1 \hat{Y}^2 + U$

• Hypotheses: H0: γ1 = 0 versus Ha: γ1 ≠ 0
• If more than one power is added, the added terms are tested jointly with an F test

Example: Teaching Ratings

RESET test for specification (squares and cubes)
Test statistic: F = 1,821350,
with p-value = P(F(2,453) > 1,82135) = 0,163

RESET test for specification (cubes only)
Test statistic: F = 1,089669,
with p-value = P(F(1,454) > 1,08967) = 0,297

RESET test for specification (squares only)
Test statistic: F = 1,213414,
with p-value = P(F(1,454) > 1,21341) = 0,271

None of the three variants rejects H0 at the 5% level.

Structural change

• What if the parameters are NOT constant over the sample? This is structural change:
   - Male vs female
   - ASEAN vs non-ASEAN membership
   - HQ class vs regular class
• If we do not control for structural change, our predictions will not be reliable, and our estimates will be inefficient, and biased for the parameters whose change is not controlled for

The Chow test

• We assume that we may have two subsamples: N = N1 + N2
• Perform three different regressions:
   - Y = β1 X1 + ... + βk Xk + U for the full sample (N)
   - Y = β1 X1 + ... + βk Xk + U for the first subsample (N1)
   - Y = β1 X1 + ... + βk Xk + U for the second subsample (N2)
• Compute the sum of squared residuals for every model (SSRT, SSR1 and SSR2)
• Finally, compute the following test statistic, which follows an F(k, N - 2k) distribution under the null hypothesis of no structural change:

  $$F = \frac{[SSR_T - (SSR_1 + SSR_2)] / k}{(SSR_1 + SSR_2) / (N - 2k)}$$

Example: Teaching Ratings

                coefficient   std. error   t-ratio     p-value
const               3,74308     0,179284     20,88   3,81e-068  ***
minority          0,0397926     0,128113    0,3106      0,7562
age              0,00589585   0,00335918     1,755      0,0799  *
female             0,753851     0,272189     2,770      0,0058  ***
onecredit          0,729636     0,151391     4,820    1,97e-06  ***
beauty             0,270447    0,0446577     6,056    2,95e-09  ***
intro             0,0530256    0,0706963    0,7500      0,4536
nnenglish         -0,381114     0,151789    -2,511      0,0124  **
sd_minority       -0,291008     0,164043    -1,774      0,0767  *
sd_age           -0,0184194   0,00555496    -3,316      0,0010  ***
sd_onecredit      -0,325877     0,235494    -1,384      0,1671
sd_beauty         -0,191189    0,0642592    -2,975      0,0031  ***
sd_intro         -0,0809758     0,112427   -0,7203      0,4717
sd_nnenglish       0,315239     0,219198     1,438      0,1511

Chow test for structural break at observation 269
F(6, 449) = 4,10488 with p-value 0,0005

The sd_ variables are the regressors interacted with the split dummy, here female. The reported statistic is the joint F test on these six interaction terms, and its p-value of 0,0005 rejects the hypothesis of constant parameters.
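The three-regression procedure of the Chow test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the gretl computation behind the output above: the data are simulated, the helper names `ssr` and `chow_f` are hypothetical, and the statistic is taken in its standard textbook form, ((SSRT - (SSR1 + SSR2)) / k) / ((SSR1 + SSR2) / (N - 2k)).

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on X (X holds the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def chow_f(X, y, in_first):
    """Chow F statistic for the split defined by the boolean mask in_first."""
    n, k = X.shape
    ssr_t = ssr(X, y)                          # pooled regression (N)
    ssr_1 = ssr(X[in_first], y[in_first])      # first subsample (N1)
    ssr_2 = ssr(X[~in_first], y[~in_first])    # second subsample (N2)
    ssr_u = ssr_1 + ssr_2
    return ((ssr_t - ssr_u) / k) / (ssr_u / (n - 2 * k))

# Simulated data (an assumption for illustration): two groups of 200 observations.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
group = np.arange(n) < 200
X = np.column_stack([np.ones(n), x])
u = rng.normal(size=n)

y_stable = 1.0 + 2.0 * x + u                                   # same parameters in both groups
y_break = np.where(group, 1.0 + 2.0 * x, 3.0 - 1.0 * x) + u    # parameters differ by group

f_stable = chow_f(X, y_stable, group)
f_break = chow_f(X, y_break, group)
print(f"F without structural change: {f_stable:.2f}")
print(f"F with structural change:    {f_break:.2f}")
```

Each statistic would be compared with the critical value of F(k, N - 2k), here F(2, 396). In the simulated case with a break, the pooled regression fits far worse than the two separate regressions, so the statistic is large.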

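The expression for E(b2) when a relevant variable is omitted can be checked by simulation. The sketch below assumes hypothetical parameter values (β1 = 1, β2 = 2, β3 = 3) and simulated regressors in which X3 is correlated with X2. It fits the simple regression of Y on X2 many times and compares the average slope with β2 plus β3 times the slope from regressing X3 on X2.

```python
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2, beta3 = 1.0, 2.0, 3.0   # hypothetical true parameters
n, reps = 200, 2000

# Regressors are held fixed across replications; X3 is correlated with X2 by construction.
x2 = rng.normal(size=n)
x3 = 0.5 * x2 + rng.normal(size=n)

d2 = x2 - x2.mean()
d3 = x3 - x3.mean()
mimic = (d2 @ d3) / (d2 @ d2)          # slope from regressing X3 on X2
predicted = beta2 + beta3 * mimic      # E(b2) according to the formula

b2_draws = np.empty(reps)
for r in range(reps):
    u = rng.normal(size=n)
    y = beta1 + beta2 * x2 + beta3 * x3 + u           # true model includes X3
    b2_draws[r] = (d2 @ (y - y.mean())) / (d2 @ d2)   # slope of the regression on X2 only

print(f"E(b2) from the formula: {predicted:.3f}")
print(f"mean of simulated b2:   {b2_draws.mean():.3f}")
print(f"true beta2:             {beta2:.3f}")
```

The two printed averages should agree closely, and both should lie well away from the true β2, because X2 picks up part of the effect of the omitted X3.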