Fulbright Economics Teaching Program
Analytical Methods
Lecture notes

Lecture: MULTIPLE LINEAR REGRESSION MODEL: Introduction and Estimation
Nguyen Trong Hoai

1) Introduction to the multiple linear regression model

The simple linear regression model cannot explain everything. So far we have considered only the simple linear regression model, but in both theory and practice there are many cases in which a given economic variable cannot be explained by such a model. Some examples:

- Quantity demanded depends on price, income, the prices of related goods, etc. Recall consumer behaviour theory: $Q_D = f(P, I, P_s, P_c, \text{market size}, P_f \text{ (expected price)}, T \text{ (preferences)})$.
- Output depends on price, primary inputs, intermediate inputs, technology, etc. Recall production function theory: $Q_S = f(K, L, TECH)$.
- The growth rate of an economy depends on investment, labour, technological change, etc. Recall total factor productivity theory.
- Wages depend on education, experience, gender, age, etc.
- House prices depend on size, the number of bedrooms and bathrooms, etc.
- Household expenditure on food depends on household size, income, location, etc.
- National child mortality rates depend on income per capita, education, etc.
- The demand for money depends on the rate of interest, the price level, GDP, etc.

When we collect data on an economic variable (called the dependent variable) and its determinants (called the explanatory variables), the separate (direct, or net) influence of each factor on the economic variable can be studied with the multiple regression model.

2) Data requirement

The data are arranged in a spreadsheet, as mentioned above: one column for the dependent variable, one column for each explanatory variable, and one row per observation.

3) Population Regression Function (PRF)

Study the model:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_K X_{Ki} + \varepsilon_i$$

$$\text{PRF:}\quad E(Y_i \mid X\text{'s}) = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_K X_{Ki}, \qquad \text{since } E(\varepsilon_i \mid X\text{'s}) = 0$$

The beta coefficients are called the partial regression coefficients, and each one has the following interpretation:

$$\beta_k = \frac{\partial E(Y_i \mid X\text{'s})}{\partial X_k}$$

4) Important assumptions of the multiple linear regression model

The PRF consists of two components: a controlled (deterministic) part and a stochastic part (the stochastic, or random, disturbance). The disturbance $\varepsilon_i$ is a random variable that follows a normal distribution, $\varepsilon_i \sim N(0, \sigma^2)$, while the X's are controlled (given) variables. Since $Y_i$ is the sum of these two parts, $Y_i$ is also a random variable.

4.1 The OLS assumptions of the simple regression model carry over to the multiple regression model; they all concern the stochastic disturbance $\varepsilon_i$:

a) The mean value of $\varepsilon_i$ is zero: $E(\varepsilon_i \mid X\text{'s}) = 0$
b) No serial correlation (autocorrelation): $\mathrm{cov}(\varepsilon_i, \varepsilon_j \mid X\text{'s}) = 0$ for $i \neq j$
c) Homoscedasticity: $\mathrm{var}(\varepsilon_i) = \sigma^2$
d) The random disturbance is uncorrelated with the X's: $\mathrm{cov}(\varepsilon_i, X_{ki}) = 0$ for $k = 2, \dots, K$ (K being the number of explanatory variables in the model)
e) No model specification error

4.2 An additional OLS assumption for the multiple regression model: the regressors must not satisfy any exact linear relationship (no perfect multicollinearity). That is, there is no set of coefficients $\lambda_2, \lambda_3, \dots, \lambda_K$, not all zero, for which

$$\lambda_2 X_{2i} + \lambda_3 X_{3i} + \dots + \lambda_K X_{Ki} = 0$$

holds for every observation. We will explain this condition more clearly with the two-explanatory-variable (two-regressor) model; for now we simply accept it. A simulation sketch of a data-generating process that satisfies these assumptions follows.
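To make the assumptions concrete, here is a minimal Python sketch of a data-generating process that satisfies assumptions 4.1(a)-(d) and 4.2. All parameter values (sample size, coefficients, error variance) are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # sample size (hypothetical)

# Fixed ("controlled") regressors. X3 is related to X2 but is not an
# exact linear function of it, so there is no perfect multicollinearity
# (assumption 4.2).
X2 = rng.uniform(0, 10, n)
X3 = 0.5 * X2 + rng.uniform(0, 5, n)

# True population coefficients -- illustrative values only.
beta1, beta2, beta3 = 2.0, 1.5, -0.8
sigma = 1.0

# Disturbances: i.i.d. normal with zero mean and constant variance
# sigma^2, drawn independently of the X's -- assumptions 4.1(a)-(d).
eps = rng.normal(0.0, sigma, n)

# PRF (controlled part) plus stochastic part: Y is itself random.
Y = beta1 + beta2 * X2 + beta3 * X3 + eps
```

The arrays Y, X2, X3 generated here are reused in the sketches that follow.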
5) Sample Regression Function (SRF)

We address the estimation problem by specifying the sample regression function (SRF):

$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \dots + \hat{\beta}_K X_{Ki}$$

The residuals are defined in just the same way as in the simple regression framework:

$$e_i = Y_i - \hat{Y}_i$$

6) Ordinary Least Squares Estimators (OLS)

By definition, we invoke the ordinary least squares principle to choose the estimators of the partial regression coefficients: choose $\hat{\beta}_1, \hat{\beta}_2, \dots, \hat{\beta}_K$ to minimize $\sum e_i^2$. Note that

$$\sum e_i^2 = \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \hat{\beta}_3 X_{3i} - \dots - \hat{\beta}_K X_{Ki} \right)^2$$

The first-order conditions of the minimization exercise are:

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_1} = -2 \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki} \right) = 0$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_2} = -2 \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki} \right) X_{2i} = 0$$

$$\vdots$$

$$\frac{\partial \sum e_i^2}{\partial \hat{\beta}_K} = -2 \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_K X_{Ki} \right) X_{Ki} = 0$$

This system is called the "normal equation system": K normal equations that we solve for the K unknown beta coefficients. The most straightforward representation of the solution uses matrix algebra. However, since our main purpose is application, and EViews and other data analysis software are readily available, we can find the regression coefficients easily without memorizing the algebraic expressions.

7) The Two-Explanatory-Variable (Two-Regressor) Regression Model

We can present an explicit solution for the model that contains two regressors:

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$

First write down the normal equation system for this case, then solve it (matrix algebra is convenient). Writing lowercase letters for deviations from the sample means ($y_i = Y_i - \bar{Y}$, $x_{2i} = X_{2i} - \bar{X}_2$, $x_{3i} = X_{3i} - \bar{X}_3$), the least-squares estimators are:

$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$$

$$\hat{\beta}_2 = \frac{\left( \sum y_i x_{2i} \right)\left( \sum x_{3i}^2 \right) - \left( \sum y_i x_{3i} \right)\left( \sum x_{2i} x_{3i} \right)}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right) - \left( \sum x_{2i} x_{3i} \right)^2}$$

$$\hat{\beta}_3 = \frac{\left( \sum y_i x_{3i} \right)\left( \sum x_{2i}^2 \right) - \left( \sum y_i x_{2i} \right)\left( \sum x_{2i} x_{3i} \right)}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right) - \left( \sum x_{2i} x_{3i} \right)^2}$$

We do not need to memorize these expressions, but we will use them to demonstrate certain results. The calculation becomes more laborious as the model gains regressors; with the help of EViews and other data analysis software, however, we can obtain the estimators of the multiple regression model quickly and easily. Note also what happens under perfect multicollinearity: the common denominator above becomes zero, so no finite solutions for the regression coefficients can be obtained.
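As a check on these formulas, here is a short Python sketch (the function name is illustrative, and it reuses the simulated Y, X2, X3 from above) that computes the two-regressor estimates from the deviation-form expressions and compares them with the matrix solution of the normal equations:

```python
import numpy as np

def ols_two_regressors(Y, X2, X3):
    """Estimates for Y = b1 + b2*X2 + b3*X3 + e via the
    deviation-from-means formulas of section 7."""
    y, x2, x3 = Y - Y.mean(), X2 - X2.mean(), X3 - X3.mean()
    # Common denominator; it is exactly zero under perfect multicollinearity.
    den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
    b2 = ((y @ x2) * (x3 @ x3) - (y @ x3) * (x2 @ x3)) / den
    b3 = ((y @ x3) * (x2 @ x2) - (y @ x2) * (x2 @ x3)) / den
    b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
    return b1, b2, b3

print(ols_two_regressors(Y, X2, X3))

# The matrix solution of the normal equations gives the same three numbers.
X = np.column_stack([np.ones_like(X2), X2, X3])
print(np.linalg.lstsq(X, Y, rcond=None)[0])
```

With the simulated data above, both routes return estimates close to the true values (2.0, 1.5, -0.8); they differ from the true values only because of the random disturbances.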
8) Meaning of estimated coefficients in the multiple regression model

Name: partial slope coefficient, or partial regression coefficient.

Meaning: the partial slope coefficient of a regressor in the multiple regression model describes by how many units the dependent variable changes when that explanatory variable changes by one unit, holding the other explanatory variables constant. In other words, the partial slope coefficient reflects the net (direct) effect on the dependent variable of a one-unit change in the explanatory variable, after the influences of the other regressors have been removed.

This is the great convenience of the multiple regression model: it estimates the direct effect of each regressor on the dependent variable in one step. If instead we tried to recover the direct effect of one regressor using simple regressions (for example, when the dependent variable Y depends on two regressors X2 and X3, and we want the direct effect of X2 on Y), we would have to run three simple regressions.

For example, suppose the child mortality rate (CM) depends on GNP per capita (PGNP) and the female literacy rate (FLR). If we want the direct effect of PGNP on CM by simple regressions, we must remove the effect of FLR from both CM and PGNP (see the example in the Reading, pages 206 and 214, English version):

- Regress CM on FLR: $\widehat{CM}_i = 263.8635 - 2.3905\,FLR_i + e_{1i}$
- Regress PGNP on FLR: $\widehat{PGNP}_i = -39.3033 + 28.1427\,FLR_i + e_{2i}$
- Regress the residuals of the first equation on the residuals of the second: $\hat{e}_{1i} = -0.0056\,e_{2i}$

Multiple regression gives us the direct effect of PGNP on CM immediately, with the same value as calculated in the third simple regression:

$$\widehat{CM}_i = 263.6416 - 0.0056\,PGNP_i - 2.2326\,FLR_i$$

We can explain this further using a graph. A numerical sketch of the same three-regression procedure appears below.

9) The variance (VAR) and standard error (SE) of the estimators

The standard error of an estimator is the square root of its variance: $SE(\hat{\beta}_k) = \sqrt{VAR(\hat{\beta}_k)}$. The variances in multiple regression are also quite complicated; as an example, we write down only the variance of $\hat{\beta}_2$:

$$VAR(\hat{\beta}_2) = \frac{\sigma^2 \sum x_{3i}^2}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right) - \left( \sum x_{2i} x_{3i} \right)^2}$$

Recall the definition of the squared correlation coefficient between X2 and X3:

$$r_{23}^2 = \frac{\left( \sum x_{2i} x_{3i} \right)^2}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right)}$$

By manipulating a little, we can rewrite the variance of $\hat{\beta}_2$ as follows:

$$VAR(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2 \left( 1 - r_{23}^2 \right)}$$

Again, if the two regressors are uncorrelated, the variance simplifies to its simple-regression counterpart.

Sampling probability distributions of the OLS estimators

In order to construct confidence intervals for the unknown parameters, and to test hypotheses about them, we need to know the sampling probability distributions of the estimators. Describing a sampling distribution requires three things: the mathematical expectation, the variance, and the functional form.

First consider the expectation of the estimator $\hat{\beta}_2$:

$$\hat{\beta}_2 = \frac{\left( \sum Y_i x_{2i} \right)\left( \sum x_{3i}^2 \right) - \left( \sum Y_i x_{3i} \right)\left( \sum x_{2i} x_{3i} \right)}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right) - \left( \sum x_{2i} x_{3i} \right)^2}$$

Now substitute $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$ into this expression and rearrange algebraically:

$$\hat{\beta}_2 = \beta_2 + \frac{\left( \sum \varepsilon_i x_{2i} \right)\left( \sum x_{3i}^2 \right) - \left( \sum \varepsilon_i x_{3i} \right)\left( \sum x_{2i} x_{3i} \right)}{\left( \sum x_{2i}^2 \right)\left( \sum x_{3i}^2 \right) - \left( \sum x_{2i} x_{3i} \right)^2}$$

Taking the expectation of this expression, the second term vanishes because $E(\varepsilon_i \mid X\text{'s}) = 0$, so the estimator is unbiased:

$$E(\hat{\beta}_2) = \beta_2$$

We are already familiar with the variance. Finally, it is apparent from the expression above that each estimator is a linear combination of normally distributed random variables, so each estimator is also normally distributed. The same results hold for the K-variable multiple regression: the estimators are unbiased, their variances are known, and they are normally distributed (though these results are impractical to demonstrate without matrix algebra). We summarize the typical result this way:

$$\hat{\beta}_k \sim N\!\left( \beta_k, \sigma^2_{\hat{\beta}_k} \right)$$
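The three-regression procedure can be verified numerically. We do not have the CM/PGNP/FLR data set here, so this Python sketch (with an illustrative helper function) reuses the simulated Y, X2, X3 from above and shows that regressing "residuals on residuals" reproduces the partial slope of X2 from the multiple regression:

```python
import numpy as np

def simple_regression(y, x):
    """Slope and residuals from a simple regression of y on x (with intercept)."""
    xd, yd = x - x.mean(), y - y.mean()
    slope = (xd @ yd) / (xd @ xd)
    intercept = y.mean() - slope * x.mean()
    return slope, y - (intercept + slope * x)

_, e1 = simple_regression(Y, X3)           # purge X3's influence from Y
_, e2 = simple_regression(X2, X3)          # purge X3's influence from X2
b2_partial, _ = simple_regression(e1, e2)  # residuals on residuals

print(b2_partial)  # equals b2 from ols_two_regressors(Y, X2, X3) above
```

This equality is exact (up to rounding), not approximate: it is the two-variable case of the Frisch-Waugh-Lovell theorem.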
10) Properties of OLS estimators in the multiple regression model

10.1 BLUE: "Best Linear Unbiased Estimator." This property is the same as for the simple regression model. We should understand the three parts of BLUE:

- Linear: each estimator is a linear function of the observations on the dependent variable (give some examples).
- Unbiased: based on the estimation expression, we can take the expectation of both sides and recover the true coefficient.
- Minimum variance: among all linear unbiased estimators, the OLS estimators have the smallest variance. This is proved by the Gauss-Markov theorem; we can also interpret the minimum-variance property directly by referring to the covariances of the regression variables, assuming there is no perfect collinearity.

10.2 When there is perfect multicollinearity (i.e., the additional OLS assumption for the multiple regression model is not satisfied), the variances of the estimated coefficients are not minimized; indeed, we cannot find the estimators of the coefficients at all. A small numerical sketch of this appears below.

10.3 The more a regression variable varies around its mean, the smaller the variance of its estimated coefficient and the more precise the estimated parameter becomes. Likewise, a larger sample size (more observations) normally brings more variation in the regressor and therefore more accurate estimates. This can be explained using a probability density function graph. So, what can be said to be a large enough sample size?
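To illustrate sections 10.2 and 10.3 together, here is a brief Python sketch (hypothetical numbers, with sigma^2 set to 1) of how correlation between the regressors inflates $VAR(\hat{\beta}_2) = \sigma^2 / \left[ \sum x_{2i}^2 (1 - r_{23}^2) \right]$, and of what goes wrong under perfect collinearity:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(0.0, 1.0, n)
x2 -= x2.mean()  # work in deviations from the mean

for r in [0.0, 0.5, 0.9, 0.99]:
    # Construct X3 with correlation approximately r with X2.
    x3 = r * x2 + np.sqrt(1 - r**2) * rng.normal(0.0, 1.0, n)
    x3 -= x3.mean()
    r23_sq = (x2 @ x3) ** 2 / ((x2 @ x2) * (x3 @ x3))
    var_b2 = 1.0 / ((x2 @ x2) * (1.0 - r23_sq))  # sigma^2 = 1
    print(f"r23^2 = {r23_sq:.3f}  ->  VAR(b2) = {var_b2:.6f}")

# Perfect collinearity: X3 = 2*X2 drives the common denominator of the
# estimator formulas to zero, so no unique solution exists.
x3 = 2.0 * x2
print((x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2)  # 0 (up to rounding)
```

As r23 approaches 1 the printed variance explodes; at exactly 1 the normal equations become singular and the coefficients are not identified.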
