Applied Econometrics                                Simple Linear Regression Model

Applied Econometrics
Lecture 2: Simple Regression Model
Written by Nguyen Hoang Bao, May 20, 2004

‘It does require maturity to realize that models are to be used but not to be believed’
HENRI THEIL, Principles of Econometrics

1) Assumptions of the two-variable linear regression model

The estimation process begins by assuming, or hypothesizing, that the least squares linear regression model (estimated from a sample) is valid. The formal two-variable linear regression model is based on the following assumptions:
(1) The population regression is adequately represented by a straight line: $E(Y_i) = \mu(X_i) = \beta_0 + \beta_1 X_i$
(2) The error terms have zero mean: $E(\varepsilon_i) = 0$
(3) The error terms have a constant variance (homoscedasticity): $V(\varepsilon_i) = \sigma^2$
(4) The error terms have zero covariance (no correlation between error terms): $E(\varepsilon_i \varepsilon_j) = 0$ for all $i \neq j$
(5) X is non-stochastic, implying that $E(X_i \varepsilon_i) = X_i E(\varepsilon_i) = 0$

2) Least squares estimation

The sample regression model can be written as follows:
$$Y_i = b_0 + b_1 X_i + e_i$$
Its least squares estimators $b_0$ and $b_1$ are obtained by minimizing the sum of squared residuals with respect to $b_0$ and $b_1$:
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \rightarrow \min$$
The resulting estimators of $b_0$ and $b_1$ are then given by:
$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$

3) Analysis of variance

The least squares regression splits the variation in the Y variable into two components: the explained variation, due to the variation in $X_i$, and the residual variation:
$$TSS = RSS + ESS$$
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$
where
TSS is the total sum of squares, i.e., the total variation observed in the dependent variable Y;
RSS is the residual sum of squares;
ESS is the explained sum of squares, i.e., the variation of the predicted values $\hat{Y}_i = b_0 + b_1 X_i$.

The coefficient of determination measures the goodness of fit of the estimated sample regression:
$$R^2 = \frac{ESS}{TSS}, \qquad 0 \le R^2 \le 1$$
Because the explained variation cannot exceed the total variation, the maximum value of $R^2$ is one; that is, 100 percent of the variation is explained or accounted for. Conversely, its minimum value is zero, which implies that none of the variation in the dependent variable is explained by the independent variable. A high $R^2$ may also reflect spurious correlation, especially with time-series data.

The correlation coefficient (r) can be calculated as follows:
$$r = \frac{COV(X, Y)}{\sqrt{VAR(X)\,VAR(Y)}} = \frac{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 \cdot \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}, \qquad -1 \le r \le +1$$
Values close to ±1 indicate very strong linear relationships; values close to zero indicate very weak ones.

Now that we have r, how do we know whether it is significantly different from zero? We first convert r into a calculated t. The null hypothesis is $H_0: r = 0$, and
$$t_{calculated} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
If $|t_{calculated}| > t_{critical}(0.05;\, n-2)$: we reject the hypothesis.
If $|t_{calculated}| \le t_{critical}(0.05;\, n-2)$: we maintain the hypothesis.

High correlation does not by itself provide grounds for an inference of causality. There are numerous cases in economics in which two variables are highly correlated, but both are determined by a third underlying variable. If such is the case, the underlying variable should appear in the regression model as the independent variable.

3) Standard errors of the regression and coefficients

The standard error of the regression (s) is given as follows:
$$s = \sqrt{\frac{RSS}{n-2}}$$
The standard errors of the coefficients (intercept and slope) are respectively given by:
$$SE(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}, \qquad SE(b_1) = \frac{s}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

4) Are b0 and b1 statistically significantly different from the hypothesized values β0 and β1?
For the intercept, the null hypothesis is that the true intercept equals a hypothesized value $\beta_0$ (often zero); the calculated t is given by:
$$t_{calculated} = \frac{b_0 - \beta_0}{SE(b_0)}$$
For the slope, the null hypothesis is that the true slope equals a hypothesized value $\beta_1$ (often zero); the calculated t is given by:
$$t_{calculated} = \frac{b_1 - \beta_1}{SE(b_1)}$$
In each case we compare the calculated t with the Student's t distribution with (n - 2) degrees of freedom at the desired level of significance (usually 5%):
If $|t_{calculated}| > t_{critical}$: we reject the hypothesis.
If $|t_{calculated}| \le t_{critical}$: we maintain the hypothesis.
The smaller s is, the better the fit.

5) Confidence intervals for the parameters β0 and β1, for the conditional mean of Y and for the predicted Y values

5.1) Confidence intervals for the parameters β0 and β1 are respectively given by:
$$b_0 \pm t_{(\alpha/2;\, n-2)} \times SE(b_0) = \left(b_0 - t_{(\alpha/2;\, n-2)} \times SE(b_0);\; b_0 + t_{(\alpha/2;\, n-2)} \times SE(b_0)\right)$$
$$b_1 \pm t_{(\alpha/2;\, n-2)} \times SE(b_1) = \left(b_1 - t_{(\alpha/2;\, n-2)} \times SE(b_1);\; b_1 + t_{(\alpha/2;\, n-2)} \times SE(b_1)\right)$$
where α is the desired level of significance, so that the confidence level is 1 - α.

5.2) Confidence interval for the conditional mean of Y
If $X = X_0$, the point estimate of Y is given by $\hat{Y}_0 = b_0 + b_1 X_0$. The confidence interval for the conditional mean of Y can be obtained by:
$$\hat{Y}_0 \pm t_{(\alpha/2;\, n-2)} \times SE(\hat{Y}_0), \qquad SE(\hat{Y}_0) = s\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$

5.3) Confidence interval for the predicted Y values
One use of a least squares regression equation is the prediction of Y for a given X. Prediction is often used when direct observation of Y is expensive or impossible, while X is readily and cheaply observed. If $X = X_0$, the point estimate of Y is again $\hat{Y}_0 = b_0 + b_1 X_0$. The confidence (prediction) interval for the predicted Y value can be obtained by:
$$\hat{Y}_0 \pm t_{(\alpha/2;\, n-2)} \times SE(Y_0 - \hat{Y}_0), \qquad SE(Y_0 - \hat{Y}_0) = s\sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$$
This standard error is therefore larger than $SE(\hat{Y}_0)$ in the previous section: the latter corresponds to the conditional mean of Y for a given level of X, while the former corresponds to an individual predicted value of Y, which also carries the variance of its own error term (the extra 1 under the square root).
Note that both standard errors are smaller if (1) s is smaller; (2) $X_0$ is closer to the sample mean $\bar{X}$; (3) n is larger; and (4) the sample variation in X is greater.

6) Standard error of a residual
The residuals $e_i$ are the estimators of the errors $\varepsilon_i$. The standard error of $e_i$ is obtained as follows:
$$SE(e_i) = s\sqrt{1 - h_i}, \qquad h_i = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum_{j=1}^{n} (X_j - \bar{X})^2}$$
Here the error term is assumed to be homoscedastic (its standard deviation σ is constant); even so, the standard error of the residuals varies across observations through $h_i$.

7) Example
Cross-country data for 66 developing countries for 1987 are collected from the World Development Report. The variables are:
Dependent variable: national savings rate (S/Y)
Independent variable: foreign aid as per cent of GDP (A/Y)

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.570665305
R Square              0.32565889
Adjusted R Square     0.31512231
Standard Error        12.72184023
Observations          66

ANOVA
              df    SS             MS          F          Significance F
Regression     1    5002.224173    5002.224    30.90746   5.65198E-07
Residual      64    10358.09401    161.8452
Total         65    15360.31818

              Coefficients    Standard Error   t Stat      P-value    Lower 95%      Upper 95%
Intercept     19.1411184      2.062147087      9.282131    1.82E-13   15.02150973    23.2607271
A/Y           -0.872432774    0.156927962      -5.55945    5.65E-07   -1.18593213    -0.55893341

In a simple regression, Multiple R equals |r|; because the estimated slope is negative, r = -0.571 here.

RESIDUAL OUTPUT
Observation   Predicted S/Y   Residuals
1             8.846411667     -5.846411667
2             9.806087718     0.193912282
…             …               …
66            19.1411184      5.858881602

[Figure 7.1: Impact of Foreign Aid on National Savings; vertical axis: National Savings ratio; horizontal axis: Foreign Aid as per cent of GDP]
[Figure 7.2: Residual Plot (S/Y - S/Yfit) versus A/Y; vertical axis: Residual (e = S/Y - S/Yfit); horizontal axis: Foreign Aid as per cent of GDP]

8) Functional forms of regression models
We often observe that the relationship between Y and X is nonlinear rather than linear. The functional form to adopt can often be seen from the data, and this is something you should become practised in.

Semi-logarithmic transformations: $Y = \beta_0 + \beta_1 \ln X$ and $\ln Y = \beta_0 + \beta_1 X$
Double-logarithmic transformation: $\ln Y = \beta_0 + \beta_1 \ln X$
Reciprocal transformation: $Y = \beta_0 + \beta_1 (1/X)$. Regressing Y on 1/X allows Y to grow or decline towards a limiting value given by the intercept of the model.

The interpretation of the regression coefficients depends on the functional form adopted:

Specific functional form            Derivative                                        Interpretation
$Y = \beta_0 + \beta_1 X$           $\beta_1 = \partial Y / \partial X$               Constant absolute increment in Y for a one-unit change in X
$Y = \beta_0 + \beta_1 \ln X$       $\beta_1 = \partial Y / (\partial X / X)$         Constant absolute increment in Y (about $\beta_1/100$) for a one per cent increase in X
$\ln Y = \beta_0 + \beta_1 X$       $\beta_1 = (\partial Y / Y) / \partial X$         Constant rate of growth in Y (about $100\beta_1$ per cent) for a one-unit change in X
$\ln Y = \beta_0 + \beta_1 \ln X$   $\beta_1 = (\partial Y / Y) / (\partial X / X)$   Constant elasticity (i.e., constant per cent increase in Y for a one per cent increase in X)
$Y = \beta_0 + \beta_1 (1/X)$       $\beta_1 = \partial Y / \partial (1/X)$           Constant absolute increment in Y for a one-unit change in 1/X

If we transform either X or Y, or both, to achieve linearity of the regression line, we should not forget that in the process we also transform the distribution of Y or X or both. In practice, a useful rule of thumb is that linear regression tends to work best when both variables are similarly, preferably symmetrically, shaped (Hamilton, 1992: 148).

In econometric practice, the logarithmic transformation is very popular. One reason is that functions which can be linearised with the aid of logarithms have coefficients that lend themselves to meaningful economic interpretations, such as an elasticity or a growth rate. Another reason is that the logarithmic transformation frequently, but not always, does the trick with socioeconomic data, which are often skewed to the right.

References
Bao, Nguyen Hoang (1995), ‘Applied Econometrics’, Lecture Notes and Readings, Vietnam-Netherlands Project for MA Program in Economics of Development.
Hamilton, Lawrence C. (1992), ‘Regression with Graphics: A Second Course in Applied Statistics’, Pacific Grove, CA: Brooks/Cole.
Maddala, G.S. (1992), ‘Introduction to Econometrics’, New York: Macmillan Publishing Company.
Mukherjee, Chandan, Howard White and Marc Wuyts (1998), ‘Econometrics and Data Analysis for Developing Countries’, London: Routledge.

Workshop 2: Simple Regression Models

1) Retrieve the data file AIDSAV.WK1, which contains the data for the savings rate (S/Y) and the aid ratio (A/Y).
1.1) Plot the scatter plot of S/Y against A/Y.
1.2) Calculate the mean deviations $(S/Y - \overline{S/Y})$ and $(A/Y - \overline{A/Y})$; the squared mean deviations $(S/Y - \overline{S/Y})^2$ and $(A/Y - \overline{A/Y})^2$; and the product of the mean deviations $(S/Y - \overline{S/Y})(A/Y - \overline{A/Y})$.
1.3) Calculate $\hat{\beta}_0$ and $\hat{\beta}_1$ from the regression equation $S/Y = \beta_0 + \beta_1 (A/Y)$.
1.4) Calculate the fitted values of S/Y.
1.5) Calculate the residuals $\hat{u}_i$.
1.6) Check that they sum to zero: $\sum \hat{u}_i = 0$.
1.7) Calculate the total sum of squares (TSS), the explained sum of squares (ESS) and the residual sum of squares (RSS).
1.8) Calculate the coefficient of determination ($R^2$) and the correlation coefficient (r).
1.9) Calculate the standard error of the regression (s), the standard error of the slope $SE(\hat{\beta}_1)$ and the standard error of the intercept $SE(\hat{\beta}_0)$.
1.10) Calculate the t-statistics.
1.11) Test the null hypothesis $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. Comment on your results.
1.12) Draw the scatter plot showing the fitted line. Comment on your findings.
1.13) Construct 95% confidence intervals for $\beta_0$ and $\beta_1$.
1.14) Given a specific value of A/Y = 0.12, calculate the prediction interval for S/Y.
2) Retrieve the data file TOT.WK1, which contains the data for TOT.
2.1) Estimate the following two models:
$\ln TOT = \alpha_0 + \alpha_1 T$, where T = 1, 2, …, 37
$\ln TOT = \beta_0 + \beta_1 t$, where t = 1950, 1951, …, 1986
2.2) What is the relationship between $\alpha_0$ and $\beta_0$, and between $\alpha_1$ and $\beta_1$? Discuss this relationship, first by looking at your regression results and then by demonstrating the relationship between the coefficients algebraically.
3) Draw a scatter plot of consumption against income using the data in the data file SRINA.WK1. Plot on this graph:
3.1) the fitted values of consumption from the consumption function;
3.2) the upper and lower limits of the confidence interval for the fitted values; then discuss the shape of the curves you have plotted.
4) Demonstrate algebraically that adding a point $(X_{n+1}, Y_{n+1})$ to a sample of n observations will:
4.1) not influence the slope coefficient of the regression of Y on X if $X_{n+1}$ is equal to the sample mean of X for the n observations;
4.2) probably change the intercept of the regression, even if $Y_{n+1}$ is equal to the sample mean of Y for the n observations;
4.3) leave the regression line unchanged if $(X_{n+1}, Y_{n+1})$ lies at the point of means of the sample of observations.
4.4) Generate numerical data to illustrate your findings.
5) Retrieve the data file INDO.WK1, which contains the data for income (Y) and expenditure (E = C + I).
5.1) Estimate the following two models:
$\ln E = \alpha_0 + \alpha_1 Y$
$\ln E = \beta_0 + \beta_1 \ln Y$
5.2) Draw the scatter plots and the fitted lines on each; write one paragraph of discussion.
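The least squares formulas of section 2 and the variance decomposition of section 3 (workshop steps 1.2 to 1.8) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the author's procedure: the function names `ols_fit` and `anova` are hypothetical, and the five-point data set is made up purely for demonstration (it is not the AIDSAV.WK1 data).

```python
import math


def ols_fit(x, y):
    """Least squares estimates (b0, b1) for Y_i = b0 + b1*X_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def anova(x, y):
    """Split the variation in Y into RSS and ESS; also return R^2 and r."""
    n = len(y)
    b0, b1 = ols_fit(x, y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum(e ** 2 for e in residuals)
    ess = sum((fi - y_bar) ** 2 for fi in fitted)
    # Correlation coefficient: the 1/n factors in COV and VAR cancel,
    # so the raw sums of deviations can be used directly.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * tss)
    return {"tss": tss, "rss": rss, "ess": ess, "r2": ess / tss,
            "r": r, "residuals": residuals}


# Made-up toy data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
stats = anova(x, y)
print("b0, b1:", b0, b1)
print("TSS vs RSS + ESS:", stats["tss"], stats["rss"] + stats["ess"])
print("R^2, r:", stats["r2"], stats["r"])
```

For this toy data the sketch gives $b_1 = 0.6$, $b_0 = 2.2$, TSS = 6 = 2.4 + 3.6 = RSS + ESS, and $R^2 = r^2 = 0.6$, up to floating-point rounding; the residuals sum to zero, as workshop step 1.6 asks you to check.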
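The standard errors of section 3, the t tests of section 4, the t test for r, and the coefficient interval of section 5.1 can be sketched as follows, again on a small made-up data set. To keep the sketch within the standard library, the critical value 3.182 (5% two-sided, n - 2 = 3 degrees of freedom) is hard-coded from a t table; with SciPy installed, `scipy.stats.t.ppf(0.975, n - 2)` gives the same quantile. The helper names are hypothetical.

```python
import math


def regression_inference(x, y):
    """OLS estimates plus s, SE(b0), SE(b1) and r for a simple regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))                  # standard error of the regression
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = s / math.sqrt(sxx)
    r = sxy / math.sqrt(sxx * syy)                # correlation coefficient
    return b0, b1, s, se_b0, se_b1, r


def t_test(estimate, hypothesized, se, t_crit):
    """Two-sided test: return the calculated t and whether H0 is rejected."""
    t = (estimate - hypothesized) / se
    return t, abs(t) > t_crit


def t_from_r(r, n):
    """Calculated t for H0: r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


x = [1, 2, 3, 4, 5]        # made-up toy data, not from the lecture
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182             # t(0.025; 3) from a t table

b0, b1, s, se_b0, se_b1, r = regression_inference(x, y)
t_b0, reject_b0 = t_test(b0, 0.0, se_b0, T_CRIT)     # H0: intercept = 0
t_b1, reject_b1 = t_test(b1, 0.0, se_b1, T_CRIT)     # H0: slope = 0
t_r = t_from_r(r, len(x))                            # t for H0: r = 0
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)   # 95% interval for the slope
print(f"t(b1) = {t_b1:.3f}, reject H0: {reject_b1}, 95% CI for b1: {ci_b1}")
```

Here $t(b_1) \approx 2.12$ is below 3.182, so the hypothesis of a zero slope is maintained, and the interval for $b_1$ straddles zero. The calculated t for r coincides with $t(b_1)$, which always holds in the two-variable model.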
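The two interval formulas of sections 5.2 and 5.3 (used in workshop questions 1.14 and 3.2) differ only by the extra 1 under the square root. A minimal sketch, assuming a made-up data set and a hard-coded table value t(0.025; 3) = 3.182, compares them at two values of $X_0$; the function `interval_estimates` is a hypothetical helper, not part of any library.

```python
import math


def interval_estimates(x, y, x0, t_crit):
    """Point estimate of Y at X = x0, the confidence interval for the
    conditional mean E(Y | x0), and the prediction interval for a new Y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    y0_hat = b0 + b1 * x0
    distance = (x0 - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(1 / n + distance)       # section 5.2
    se_pred = s * math.sqrt(1 + 1 / n + distance)   # section 5.3: extra 1 for the new error
    mean_ci = (y0_hat - t_crit * se_mean, y0_hat + t_crit * se_mean)
    pred_ci = (y0_hat - t_crit * se_pred, y0_hat + t_crit * se_pred)
    return y0_hat, mean_ci, pred_ci


def width(interval):
    """Length of an interval given as (lower, upper)."""
    return interval[1] - interval[0]


x = [1, 2, 3, 4, 5]        # made-up toy data, not from the lecture
y = [2, 4, 5, 4, 5]
T_CRIT = 3.182             # t(0.025; 3) from a t table

y0_hat, mean_ci, pred_ci = interval_estimates(x, y, x0=3, t_crit=T_CRIT)   # X0 at the mean
_, mean_ci_far, pred_ci_far = interval_estimates(x, y, x0=5, t_crit=T_CRIT)
print("X0 = 3:", y0_hat, mean_ci, pred_ci)
print("X0 = 5:", mean_ci_far, pred_ci_far)
```

At $X_0 = \bar{X} = 3$ the mean interval is $4 \pm 3.182 \times 0.4$, while the prediction interval uses $s\sqrt{1.2}$ and is wider. Moving $X_0$ to 5 widens both, and this widening away from $\bar{X}$ is what gives plotted interval limits their curved shape.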
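The residual standard error of section 6 depends on each observation's $h_i$ (often called its leverage). The sketch below computes $h_i$, $SE(e_i) = s\sqrt{1 - h_i}$ and the standardized residuals $e_i / SE(e_i)$ for a small made-up data set; the helper name and the data are illustrative assumptions only.

```python
import math


def residual_standard_errors(x, y):
    """Residuals e_i, leverages h_i, SE(e_i) = s*sqrt(1 - h_i), standardized residuals."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    se_resid = [s * math.sqrt(1 - h) for h in leverages]
    # Standardized residuals: each residual divided by its own standard error.
    standardized = [e / se for e, se in zip(residuals, se_resid)]
    return residuals, leverages, se_resid, standardized


x = [1, 2, 3, 4, 5]        # made-up toy data, not from the lecture
y = [2, 4, 5, 4, 5]
residuals, leverages, se_resid, standardized = residual_standard_errors(x, y)
for xi, h, se, z in zip(x, leverages, se_resid, standardized):
    print(f"X = {xi}: h = {h:.2f}, SE(e) = {se:.3f}, standardized e = {z:.3f}")
```

For this data the $h_i$ are 0.6, 0.3, 0.2, 0.3, 0.6; they sum to 2, the number of estimated coefficients. Observations with $X_i$ far from $\bar{X}$ therefore have residuals with the smallest standard errors, even though the errors themselves are homoscedastic.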
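As a consistency check on the SUMMARY OUTPUT in section 7, the sketch below recomputes several reported statistics from the table's own sums of squares and coefficients, using the formulas of sections 3 to 5. It uses only numbers printed in the table (no new data); the comparison tolerances are an assumption meant to absorb the rounding in the printout.

```python
import math

# Figures copied from the SUMMARY OUTPUT table in section 7 (n = 66).
n = 66
ess, rss, tss = 5002.224173, 10358.09401, 15360.31818
b0, se_b0, lower_b0 = 19.1411184, 2.062147087, 15.02150973
b1, se_b1, lower_b1 = -0.872432774, 0.156927962, -1.18593213

r2 = ess / tss                                    # R Square
adj_r2 = 1 - (rss / (n - 2)) / (tss / (n - 1))    # Adjusted R Square
s = math.sqrt(rss / (n - 2))                      # Standard Error of the regression
f_stat = (ess / 1) / (rss / (n - 2))              # F with (1, n - 2) degrees of freedom
t_b0, t_b1 = b0 / se_b0, b1 / se_b1               # t Stat for H0: coefficient = 0
r = math.copysign(math.sqrt(r2), b1)              # r takes the sign of the slope

# Critical t implied by each "Lower 95%" bound; both should equal t(0.025; 64).
t_crit_b0 = (b0 - lower_b0) / se_b0
t_crit_b1 = (b1 - lower_b1) / se_b1

print(f"R2 = {r2:.6f}, adj R2 = {adj_r2:.6f}, s = {s:.4f}, F = {f_stat:.4f}")
print(f"t(b0) = {t_b0:.4f}, t(b1) = {t_b1:.4f}, r = {r:.4f}")
print(f"implied t critical: {t_crit_b0:.5f} and {t_crit_b1:.5f}")
```

The recomputed values agree with the printout, and the check also exposes two identities of the two-variable model: $F = t^2$ for the slope, and a single critical value $t(0.025; 64) \approx 1.998$ behind both "Lower 95%" bounds.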
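The coefficient interpretations in the functional-forms table of section 8 can be illustrated by fitting transformed variables to noise-free curves whose parameters are known in advance. This is a sketch under loud assumptions: the three curves and their parameters (elasticity 0.7, growth rate 0.05, limit 10) are invented for the demonstration, and `ols_fit` is a hypothetical helper implementing the section 2 formulas.

```python
import math


def ols_fit(x, y):
    """Least squares intercept and slope of y on x (section 2 formulas)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - b1 * x_bar, b1


# Noise-free illustrative curves (invented for this sketch).
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y_power = [3.0 * xi ** 0.7 for xi in X]              # constant elasticity 0.7
Y_growth = [2.0 * math.exp(0.05 * xi) for xi in X]   # constant growth rate 0.05 per unit of X
Y_recip = [10.0 - 4.0 / xi for xi in X]              # approaches the limit 10 as X grows

# Double-log: regress lnY on lnX; the slope is the elasticity.
_, elasticity = ols_fit([math.log(v) for v in X], [math.log(v) for v in Y_power])
# Semi-log: regress lnY on X; the slope is the growth rate of Y per unit of X.
_, growth = ols_fit(X, [math.log(v) for v in Y_growth])
# Reciprocal: regress Y on 1/X; the intercept is the limiting value of Y.
limit, _ = ols_fit([1 / v for v in X], Y_recip)

print(f"elasticity = {elasticity:.4f}, growth = {growth:.4f}, limit = {limit:.4f}")
```

Because the curves contain no error term, each transformed regression recovers its parameter exactly (up to rounding). With real data, such as the INDO.WK1 series in workshop question 5, the same transformations give estimates rather than exact values.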