
Foundations of Econometrics, Part 5

7.7 Testing for Serial Correlation

… the alternative that ρ > 0. An investigator will reject the null hypothesis if d < d_L, fail to reject if d > d_U, and come to no conclusion if d_L < d < d_U. For example, for a test at the .05 level when n = 100 and k = 8, including the constant term, the bounding critical values are d_L = 1.528 and d_U = 1.826. Therefore, one would reject the null hypothesis if d < 1.528 and not reject it if d > 1.826. Notice that, even for this not particularly small sample size, the indeterminate region between 1.528 and 1.826 is quite large.

It should by now be evident that the Durbin-Watson statistic, despite its popularity, is not very satisfactory. Using it with standard tables is relatively cumbersome and often yields inconclusive results. Moreover, the standard tables only allow us to perform one-tailed tests against the alternative that ρ > 0. Since the alternative that ρ < 0 is often of interest as well, the inability to perform a two-tailed test, or a one-tailed test against this alternative, using standard tables is a serious limitation. Although exact P values for both one-tailed and two-tailed tests, which depend on the X matrix, can be obtained by using appropriate software, many computer programs do not offer this capability. In addition, the DW statistic is not valid when the regressors include lagged dependent variables, and it cannot easily be generalized to test for higher-order processes. Happily, the development of simulation-based tests has made the DW statistic obsolete.

Monte Carlo Tests for Serial Correlation

We discussed simulation-based tests, including Monte Carlo tests and bootstrap tests, at some length in Section 4.6. The techniques discussed there can readily be applied to the problem of testing for serial correlation in linear and nonlinear regression models. All the test statistics we have discussed, namely, t_GNR, t_SR, and d, are pivotal under the null hypothesis that ρ = 0 when the assumptions of the classical normal linear model are satisfied. This makes it possible to perform Monte Carlo tests that are exact in finite samples.

Pivotalness follows from two properties shared by all these statistics. The first of these is that they depend only on the residuals ũ_t obtained by estimation under the null hypothesis. The distribution of the residuals depends on the exogenous explanatory variables X, but these are given and the same for all DGPs in a classical normal linear model. The distribution does not depend on the parameter vector β of the regression function, because, if y = Xβ + u, then M_X y = M_X u whatever the value of the vector β. The second property that all the statistics we have considered share is scale invariance. By this, we mean that multiplying the dependent variable by an arbitrary scalar λ leaves the statistic unchanged. In a linear regression model, multiplying the dependent variable by λ causes the residuals to be multiplied by λ. But the statistics defined in (7.51), (7.52), and (7.53) are clearly unchanged if all the residuals are multiplied by the same constant, and so these statistics are scale invariant. Since the residuals ũ are equal to M_X u, it follows that multiplying σ by an arbitrary λ multiplies the residuals by λ. Consequently, the distributions of the statistics are independent of σ² as well as of β. This implies that, for the classical normal linear model, all three statistics are pivotal.
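As a quick numerical illustration of these two properties, the sketch below computes the Durbin-Watson statistic d = Σ_{t=2}^n (û_t − û_{t−1})² / Σ_{t=1}^n û_t² from OLS residuals and checks that it is unchanged when the dependent variable is rescaled and shifted by a linear combination of the columns of X. The data and coefficient values are illustrative assumptions, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 100, 4
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.standard_normal(n)

    def durbin_watson(y, X):
        """DW statistic computed from the OLS residuals of y on X."""
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta
        return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

    d = durbin_watson(y, X)
    # Changing beta (adding X @ gamma) or rescaling y leaves d unchanged,
    # which is exactly why d is pivotal in the classical normal linear model.
    d_alt = durbin_watson(3.7 * y + X @ np.array([2.0, -1.0, 0.0, 4.0]), X)
    print(d, d_alt)   # identical up to rounding error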
We now outline how to perform Monte Carlo tests for serial correlation in the context of the classical normal linear model. Let us call the test statistic we are using τ and its realized value τ̂. If we want to test for AR(1) errors, the best choice for the statistic τ is the t statistic t_GNR from the GNR (7.43), but it could also be the DW statistic, the t statistic t_SR from the simple regression (7.46), or even ρ̃ itself. If we want to test for AR(p) errors, the best choice for τ would be the F statistic from the GNR (7.45), but it could also be the F statistic from a regression of ũ_t on ũ_{t−1} through ũ_{t−p}.

The first step, evidently, is to compute τ̂. The next step is to generate B sets of simulated residuals and use each of them to compute a simulated test statistic, say τ*_j, for j = 1, …, B. Because the parameters do not matter, we can simply draw B vectors u*_j from the N(0, I) distribution and regress each of them on X to generate the simulated residuals M_X u*_j, which are then used to compute τ*_j. This can be done very inexpensively. The final step is to calculate an estimated P value for whatever null hypothesis is of interest. For example, for a two-tailed test of the null hypothesis that ρ = 0, the P value would be the proportion of the τ*_j that exceed τ̂ in absolute value:

    p̂*(τ̂) = (1/B) Σ_{j=1}^B I(|τ*_j| > |τ̂|).      (7.54)

We would then reject the null hypothesis at level α if p̂*(τ̂) < α. As we saw in Section 4.6, such a test will be exact whenever B is chosen so that α(B + 1) is an integer.
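The following sketch implements this Monte Carlo procedure for a test of AR(1) errors. For simplicity the statistic is the t ratio from regressing ũ_t on ũ_{t−1} over observations 2, …, n, one of the asymptotically equivalent versions of t_SR; the choice B = 999 and the use of a standard-normal error distribution follow the description above, while the helper names are assumptions for this illustration.

    import numpy as np

    def ols_residuals(y, X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return y - X @ beta

    def t_sr(u):
        """t ratio from the regression of u_t on u_{t-1}, t = 2, ..., n."""
        u0, u1 = u[1:], u[:-1]
        rho = (u0 @ u1) / (u1 @ u1)
        resid = u0 - rho * u1
        s2 = resid @ resid / (len(u0) - 1)
        return rho / np.sqrt(s2 / (u1 @ u1))

    def monte_carlo_pvalue(y, X, B=999, rng=None):
        """Estimated P value (7.54) for the two-tailed test of rho = 0.
        Exact in finite samples under the classical normal linear model."""
        if rng is None:
            rng = np.random.default_rng()
        n = len(y)
        tau_hat = t_sr(ols_residuals(y, X))
        tau_star = np.empty(B)
        for j in range(B):
            u_star = rng.standard_normal(n)      # beta and sigma are irrelevant
            tau_star[j] = t_sr(ols_residuals(u_star, X))
        return np.mean(np.abs(tau_star) > np.abs(tau_hat))

With B = 999, rejection at α = .05 occurs whenever the returned P value is below .05, and the test is exact because .05 × (999 + 1) = 50 is an integer.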
Bootstrap Tests for Serial Correlation

Whenever the regression function is nonlinear or contains lagged dependent variables, or whenever the distribution of the error terms is unknown, none of the standard test statistics for serial correlation will be pivotal. Nevertheless, it is still possible to obtain very accurate inferences, even in quite small samples, by using bootstrap tests. The procedure is essentially the one described in the previous subsection. We still generate B simulated test statistics and use them to compute a P value according to (7.54) or its analog for a one-tailed test. For best results, the test statistic used should be asymptotically valid for the model that is being tested. In particular, we should avoid d and t_SR whenever there are lagged dependent variables.

It is extremely important to generate the bootstrap samples in such a way that they are compatible with the model under test. Ways of generating bootstrap samples for regression models were discussed in Section 4.6. If the model is nonlinear or includes lagged dependent variables, we need to generate y*_j rather than just u*_j. For this, we need estimates of the parameters of the regression function. If the model includes lagged dependent variables, we must generate the bootstrap samples recursively, as in (4.66). Unless we are going to assume that the error terms are normally distributed, we should draw the bootstrap error terms from the EDF of the residuals for the model under test, after they have been appropriately rescaled. Recall that there is more than one way to do this. The simplest approach is just to multiply each residual by (n/(n − k))^1/2, as in expression (4.68).

We strongly recommend the use of simulation-based tests for serial correlation, rather than asymptotic tests. Monte Carlo tests are appropriate only in the context of the classical normal linear model, but bootstrap tests are appropriate under much weaker assumptions. It is generally a good idea to test for both AR(1) errors and higher-order autoregressive errors, at least fourth-order in the case of quarterly data, and at least twelfth-order in the case of monthly data.

Heteroskedasticity-Robust Tests

The tests for serial correlation that we have discussed are based on the assumption that the error terms are homoskedastic. When this crucial assumption is violated, the test statistics no longer follow the distributions they are supposed to follow asymptotically. However, as we saw in Section 6.8, it is not difficult to modify GNR-based tests to make them robust to heteroskedasticity of unknown form.

Suppose we wish to test the linear regression model (7.42), in which the error terms are serially uncorrelated, against the alternative that the error terms follow an AR(p) process. Under the assumption of homoskedasticity, we could simply run the GNR (7.45) and use an asymptotic F test. If we let Z denote an n × p matrix with typical element Z_ti = ũ_{t−i}, where any missing lagged residuals are replaced by zeros, this GNR can be written as

    ũ = Xb + Zc + residuals.      (7.55)

The ordinary F test for c = 0 in (7.55) is not robust to heteroskedasticity, but a heteroskedasticity-robust test can easily be computed using the procedure described in Section 6.8. This procedure works as follows:

1. Create the matrices ŨX and ŨZ by multiplying the t-th row of X and the t-th row of Z by ũ_t for all t.

2. Create the matrices Ũ⁻¹X and Ũ⁻¹Z by dividing the t-th row of X and the t-th row of Z by ũ_t for all t.

3. Regress each of the columns of Ũ⁻¹X and Ũ⁻¹Z on ŨX and ŨZ jointly. Save the resulting matrices of fitted values and call them X̄ and Z̄, respectively.

4. Regress ι, a vector of 1s, on X̄. Retain the sum of squared residuals from this regression, and call it RSSR. Then regress ι on X̄ and Z̄ jointly, retain the sum of squared residuals, and call it USSR.

5. Compute the test statistic RSSR − USSR, which will be asymptotically distributed as χ²(p) under the null hypothesis.

Although this heteroskedasticity-robust test is asymptotically valid, it will not be exact in finite samples. In principle, it should be possible to obtain more reliable results by using bootstrap P values instead of asymptotic ones. However, none of the methods of generating bootstrap samples for regression models that we have discussed so far (see Section 4.6) is appropriate for a model with heteroskedastic error terms. Several methods exist, but they are beyond the scope of this book, and there currently exists no method that we can recommend with complete confidence; see Davison and Hinkley (1997) and Horowitz (2001).
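As an illustration, the sketch below follows the five-step procedure above for testing against AR(p) errors. It assumes that no residual is exactly zero; the function name and the use of scipy.stats.chi2 for the asymptotic P value are implementation choices for this example, not part of the text.

    import numpy as np
    from scipy.stats import chi2

    def hetero_robust_ar_test(y, X, p):
        """Heteroskedasticity-robust test of no serial correlation against
        AR(p) errors, following steps 1-5 above."""
        n, k = X.shape
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ beta                                  # residuals under the null

        # Z: n x p matrix of lagged residuals, missing lags replaced by zeros
        Z = np.column_stack([np.concatenate([np.zeros(i), u[:-i]])
                             for i in range(1, p + 1)])

        UX, UZ = u[:, None] * X, u[:, None] * Z           # step 1
        UiX, UiZ = X / u[:, None], Z / u[:, None]         # step 2

        W = np.hstack([UX, UZ])                           # step 3: fitted values
        coef, *_ = np.linalg.lstsq(W, np.hstack([UiX, UiZ]), rcond=None)
        fitted = W @ coef
        Xbar, Zbar = fitted[:, :k], fitted[:, k:]

        iota = np.ones(n)                                 # step 4
        def ssr(R):
            c, *_ = np.linalg.lstsq(R, iota, rcond=None)
            r = iota - R @ c
            return r @ r
        rssr, ussr = ssr(Xbar), ssr(np.hstack([Xbar, Zbar]))

        stat = rssr - ussr                                # step 5: asymptotically chi2(p)
        return stat, chi2.sf(stat, p)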
Other Tests Based on OLS Residuals

The tests for serial correlation that we have discussed in this section are by no means the only scale-invariant tests based on least squares residuals that are regularly encountered in econometrics. Many tests for heteroskedasticity, skewness, kurtosis, and other deviations from the NID assumption also have these properties. For example, consider tests for heteroskedasticity based on regression (7.28). Nothing in that regression depends on y except for the squared residuals that constitute the regressand. Further, it is clear that both the F statistic for the hypothesis that b_γ = 0 and n times the centered R² are scale invariant. Therefore, for a classical normal linear model with X and Z fixed, these statistics are pivotal. Consequently, Monte Carlo tests based on them, in which we draw the error terms from the N(0, 1) distribution, are exact in finite samples.

When the normality assumption is not appropriate, we have two options. If some other distribution that is known up to a scale parameter is thought to be appropriate, we can draw the error terms from it instead of from the N(0, 1) distribution. If the assumed distribution really is the true one, we obtain an exact test. Alternatively, we can perform a bootstrap test in which the error terms are obtained by resampling the rescaled residuals. This is also appropriate when there are lagged dependent variables among the regressors. The bootstrap test will not be exact, but it should still perform well in finite samples no matter how the error terms actually happen to be distributed.

7.8 Estimating Models with Autoregressive Errors

If we decide that the error terms of a regression model are serially correlated, either on the basis of theoretical considerations or as a result of specification testing, and we are confident that the regression function itself is not misspecified, the next step is to estimate a modified model which takes account of the serial correlation. The simplest such model is (7.40), which is the original regression model modified by having the error terms follow an AR(1) process. For ease of reference, we rewrite (7.40) here:

    y_t = X_t β + u_t,   u_t = ρu_{t−1} + ε_t,   ε_t ∼ IID(0, σ²_ε).      (7.56)

In many cases, as we will discuss in the next section, the best approach may actually be to specify a more complicated, dynamic, model for which the error terms are not serially correlated. In this section, however, we ignore this important issue and simply discuss how to estimate the model (7.56) under various assumptions.
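To make the model (7.56) concrete, the following sketch generates data from it with a stationary AR(1) error process, drawing u_1 from its stationary distribution. The sample size and parameter values are illustrative assumptions only; the simulated (y, X) are reused in the estimation sketches below.

    import numpy as np

    rng = np.random.default_rng(42)
    n, rho, sigma_eps = 200, 0.6, 1.0
    beta = np.array([1.0, 0.5])

    X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # exogenous regressors
    eps = sigma_eps * rng.standard_normal(n)

    u = np.empty(n)
    u[0] = eps[0] / np.sqrt(1 - rho ** 2)    # Var(u_1) = sigma_eps^2 / (1 - rho^2)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + eps[t]       # u_t = rho u_{t-1} + eps_t

    y = X @ beta + u                         # y_t = X_t beta + u_t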
Estimation by Feasible GLS

We have seen that, if the u_t follow a stationary AR(1) process, that is, if |ρ| < 1 and Var(u_1) = σ²_u = σ²_ε/(1 − ρ²), then the covariance matrix of the entire vector u is the n × n matrix Ω(ρ) given in (7.32). In order to compute GLS estimates, we need to find a matrix Ψ with the property that ΨΨ⊤ = Ω⁻¹. This property will be satisfied whenever the covariance matrix of Ψ⊤u is proportional to the identity matrix, which it will be if we choose Ψ in such a way that Ψ⊤u = ε. For t = 2, …, n, we know from (7.29) that

    ε_t = u_t − ρu_{t−1},      (7.57)

and this allows us to construct the rows of Ψ⊤ except for the first row. The t-th row must have 1 in the t-th position, −ρ in the (t − 1)-st position, and 0s everywhere else. For the first row of Ψ⊤, however, we need to be a little more careful. Under the hypothesis of stationarity of u, the variance of u_1 is σ²_u. Further, since the ε_t are innovations, u_1 is uncorrelated with the ε_t for t = 2, …, n. Thus, if we define ε_1 by the formula

    ε_1 = (σ_ε/σ_u)u_1 = (1 − ρ²)^1/2 u_1,      (7.58)

it can be seen that the n-vector ε, with the first component ε_1 defined by (7.58) and the remaining components ε_t defined by (7.57), has a covariance matrix equal to σ²_ε I.

Putting together (7.57) and (7.58), we conclude that Ψ⊤ should be defined as an n × n matrix with all diagonal elements equal to 1 except for the first, which is equal to (1 − ρ²)^1/2, and all other elements equal to 0 except for the ones on the diagonal immediately below the principal diagonal, which are equal to −ρ. In terms of Ψ rather than of Ψ⊤, we have:

    Ψ(ρ) = ⎡ (1 − ρ²)^1/2   −ρ    0   ···   0    0 ⎤
           ⎢      0           1   −ρ   ···   0    0 ⎥
           ⎢      ⋮           ⋮    ⋮          ⋮    ⋮ ⎥
           ⎢      0           0    0   ···   1   −ρ ⎥
           ⎣      0           0    0   ···   0    1 ⎦ ,      (7.59)

where the notation Ψ(ρ) emphasizes that the matrix depends on the usually unknown parameter ρ. The calculations needed to show that the matrix ΨΨ⊤ is proportional to the inverse of Ω, as given by (7.32), are outlined in Exercises 7.9 and 7.10.

It is essential that the AR(1) parameter ρ either be known or be consistently estimable. If we know ρ, we can obtain GLS estimates. If we do not know it but can estimate it consistently, we can obtain feasible GLS estimates. For the case in which the explanatory variables are all exogenous, the simplest way to estimate ρ consistently is to use the estimator ρ̃ from regression (7.46), defined in (7.47). Whatever estimate of ρ is used must satisfy the stationarity condition that |ρ| < 1, without which the process would not be stationary, and the transformation for the first observation would involve taking the square root of a negative number. Unfortunately, the estimator ρ̃ is not guaranteed to satisfy the stationarity condition, although, in practice, it is very likely to do so when the model is correctly specified, even if the true value of ρ is quite large in absolute value.

Whether ρ is known or estimated, the next step in GLS estimation is to form the vector Ψ⊤y and the matrix Ψ⊤X. It is easy to do this without having to store the n × n matrix Ψ in computer memory. The first element of Ψ⊤y is (1 − ρ²)^1/2 y_1, and the remaining elements have the form y_t − ρy_{t−1}. Each column of Ψ⊤X has precisely the same form as Ψ⊤y and can be calculated in precisely the same way. The final step is to run an OLS regression of Ψ⊤y on Ψ⊤X. This regression yields the (feasible) GLS estimates

    β̂_GLS = (X⊤ΨΨ⊤X)⁻¹X⊤ΨΨ⊤y      (7.60)

along with the estimated covariance matrix

    Var̂(β̂_GLS) = s²(X⊤ΨΨ⊤X)⁻¹,      (7.61)

where s² is the usual OLS estimate of the variance of the error terms. Of course, the estimator (7.60) is formally identical to (7.04), since (7.60) is valid for any Ψ matrix.
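A compact sketch of this feasible GLS procedure, applied to the simulated (y, X) from the earlier sketch: ρ is first estimated from the OLS residuals by a simple version of ρ̃, the data are then transformed observation by observation exactly as described above, and the transformed regression is estimated by OLS. The helper name is an assumption for this illustration.

    import numpy as np

    def feasible_gls_ar1(y, X):
        """Feasible GLS for y_t = X_t beta + u_t with stationary AR(1) errors."""
        # 1. OLS residuals and a consistent estimate of rho (u~_t on u~_{t-1})
        b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
        u = y - X @ b_ols
        rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])    # must satisfy |rho| < 1

        # 2. Form Psi'y and Psi'X: first observation scaled by (1 - rho^2)^1/2,
        #    remaining observations quasi-differenced.
        scale = np.sqrt(1 - rho ** 2)
        y_t = np.concatenate([[scale * y[0]], y[1:] - rho * y[:-1]])
        X_t = np.vstack([scale * X[0], X[1:] - rho * X[:-1]])

        # 3. OLS on the transformed data gives (7.60) and (7.61)
        beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
        resid = y_t - X_t @ beta
        s2 = resid @ resid / (len(y) - X.shape[1])
        cov = s2 * np.linalg.inv(X_t.T @ X_t)
        return beta, cov, rho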
Estimation by Nonlinear Least Squares

If we ignore the first observation, then (7.56), the linear regression model with AR(1) errors, can be written as the nonlinear regression model (7.41). Since the model (7.41) is written in such a way that the error terms are innovations, NLS estimation is consistent whether the explanatory variables are exogenous or merely predetermined. NLS estimates can be obtained by any standard nonlinear minimization algorithm of the type that was discussed in Section 6.4, where the function to be minimized is SSR(β, ρ), the sum of squared residuals for observations 2 through n. Such procedures generally work well, and they can also be used for models with higher-order autoregressive errors; see Exercise 7.17. However, some care must be taken to ensure that the algorithm does not terminate at a local minimum which is not also the global minimum. There is a serious risk of this, especially for models with lagged dependent variables among the regressors; see Dufour, Gaudry, and Liem (1980) and Betancourt and Kelejian (1981).

Whether or not there are lagged dependent variables in X_t, a valid estimated covariance matrix can always be obtained by running the GNR (6.67), which corresponds to the model (7.41), with all variables evaluated at the NLS estimates β̂ and ρ̂. This GNR is

    y_t − ρ̂y_{t−1} − X_t β̂ + ρ̂X_{t−1} β̂ = (X_t − ρ̂X_{t−1})b + b_ρ(y_{t−1} − X_{t−1} β̂) + residual.      (7.62)

Since the OLS estimates of b and b_ρ will be equal to zero, the sum of squared residuals from regression (7.62) is simply SSR(β̂, ρ̂). Therefore, the estimated covariance matrix is

    Var̂(β̂, ρ̂) = (SSR(β̂, ρ̂)/(n − k − 2)) ⎡ (X − ρ̂X_1)⊤(X − ρ̂X_1)   (X − ρ̂X_1)⊤û_1 ⎤⁻¹
                                            ⎣ û_1⊤(X − ρ̂X_1)          û_1⊤û_1        ⎦ ,      (7.63)

where the n × k matrix X_1 has typical row X_{t−1}, and the vector û_1 has typical element y_{t−1} − X_{t−1} β̂. This is the estimated covariance matrix that a good nonlinear regression package should print. The first factor in (7.63) is just the NLS estimate of σ²_ε. The SSR is divided by n − k − 2 because there are k + 1 parameters in the regression function, one of which is ρ, and we estimate using only n − 1 observations.

It is instructive to compute the limit in probability of the matrix (7.63) when n → ∞ for the case in which all the explanatory variables in X_t are exogenous. The parameters are all estimated consistently by NLS, and so the estimates converge to the true parameter values β_0, ρ_0, and σ²_ε as n → ∞. In computing the limit of the denominator of the simple estimator ρ̃ given by (7.47), we saw that n⁻¹û_1⊤û_1 tends to σ²_ε/(1 − ρ²_0). The limit of n⁻¹(X − ρ̂X_1)⊤û_1 is the same as that of n⁻¹(X − ρ_0 X_1)⊤û_1 by the consistency of ρ̂. In addition, given the exogeneity of X, and thus also of X_1, it follows at once from the law of large numbers that n⁻¹(X − ρ_0 X_1)⊤û_1 tends to zero. Thus, in this special case, the asymptotic covariance matrix of n^1/2(β̂ − β_0) and n^1/2(ρ̂ − ρ_0) is

    σ²_ε ⎡ plim n⁻¹(X − ρ_0 X_1)⊤(X − ρ_0 X_1)          0          ⎤⁻¹
         ⎣                 0                     σ²_ε/(1 − ρ²_0)   ⎦ .      (7.64)

Because the two off-diagonal blocks are zero, this matrix is said to be block-diagonal. As can be verified immediately, the inverse of such a matrix is itself a block-diagonal matrix, of which each block is the inverse of the corresponding block of the original matrix. Thus the asymptotic covariance matrix (7.64) is the limit as n → ∞ of

    ⎡ nσ²_ε ((X − ρ_0 X_1)⊤(X − ρ_0 X_1))⁻¹        0       ⎤
    ⎣                 0                        1 − ρ²_0    ⎦ .      (7.65)

The block-diagonality of (7.65), which holds only if everything in X_t is exogenous, implies that the covariance matrix of β̂ can be estimated using the GNR (7.62) without the regressor corresponding to ρ. The estimated covariance matrix will just be (7.63) without its last row and column. It is easy to see that n times this matrix tends to the top left block of (7.65) as n → ∞.
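As an illustration of NLS estimation of (7.56), the sketch below minimizes SSR(β, ρ) over observations 2 through n with a general-purpose least-squares routine, scipy.optimize.least_squares, and then computes the covariance matrix (7.63) from the GNR regressors. The starting values and data are the illustrative ones used earlier; this is one possible implementation rather than the book's own code.

    import numpy as np
    from scipy.optimize import least_squares

    def nls_ar1(y, X, beta0, rho0=0.0):
        """NLS for y_t = X_t beta + rho (y_{t-1} - X_{t-1} beta) + eps_t, t = 2..n."""
        n, k = len(y), X.shape[1]

        def residuals(theta):
            beta, rho = theta[:k], theta[k]
            return (y[1:] - rho * y[:-1]) - (X[1:] - rho * X[:-1]) @ beta

        fit = least_squares(residuals, x0=np.concatenate([beta0, [rho0]]))
        beta_hat, rho_hat = fit.x[:k], fit.x[k]

        # Covariance matrix (7.63) built from the GNR (7.62) regressors
        u1 = y[:-1] - X[:-1] @ beta_hat              # typical element y_{t-1} - X_{t-1} beta
        G = np.column_stack([X[1:] - rho_hat * X[:-1], u1])
        r = residuals(fit.x)
        cov = (r @ r) / (n - k - 2) * np.linalg.inv(G.T @ G)
        return beta_hat, rho_hat, cov

    # Example call with the simulated data: nls_ar1(y, X, beta0=np.zeros(X.shape[1]))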
The lower right-hand element of the matrix (7.65) tells us that, when all the regressors are exogenous, the asymptotic variance of n^1/2(ρ̂ − ρ_0) is 1 − ρ²_0. A sensible estimate of the variance is therefore Var̂(ρ̂) = n⁻¹(1 − ρ̂²). It may seem surprising that the variance of ρ̂ does not depend on σ²_ε. However, we saw earlier that, with exogenous regressors, the consistent estimator ρ̃ of (7.47) is scale invariant. The same is true, asymptotically, of the NLS estimator ρ̂, and so its asymptotic variance is independent of σ²_ε.

Comparison of GLS and NLS

The most obvious difference between estimation by GLS and estimation by NLS is the treatment of the first observation: GLS takes it into account, and NLS does not. This difference reflects the fact that the two procedures are estimating slightly different models. With NLS, all that is required is the stationarity condition that |ρ| < 1. With GLS, on the other hand, the error process must actually be stationary. Recall that the stationarity condition is necessary but not sufficient for stationarity of the process. A sufficient condition requires, in addition, that Var(u_1) = σ²_u = σ²_ε/(1 − ρ²), the stationary value of the variance. Thus, if we suspect that Var(u_1) ≠ σ²_u, GLS estimation is not appropriate, because the matrix (7.32) is not the covariance matrix of the error terms.

The second major difference between estimation by GLS and estimation by NLS is that the former method estimates β conditional on ρ, while the latter method estimates β and ρ jointly. Except in the unlikely case in which the value of ρ is known, the first step in GLS is to estimate ρ consistently. If the explanatory variables in the matrix X are all exogenous, there are several procedures that will deliver a consistent estimate of ρ. The weak point is that the estimate is not unique, and in general it is not optimal. One possible solution to this difficulty is to iterate the feasible GLS procedure, as suggested at the end of Section 7.4, and we will consider this solution below.

A more fundamental weakness of GLS arises whenever one or more of the explanatory variables are lagged dependent variables, or, more generally, predetermined but not exogenous variables. Even with a consistent estimator of ρ, one of the conditions for the applicability of feasible GLS, condition (7.23), does not hold when any elements of X_t are not exogenous. It is not simple to see directly just why this is so, but, in the next paragraph, we will obtain indirect evidence by showing that feasible GLS gives an invalid estimator of the covariance matrix. Fortunately, there is not much temptation to use GLS if the non-exogenous explanatory variables are lagged variables, because lagged variables are not observed for the first observation. In all events, the conclusion is simple: We should avoid GLS if the explanatory variables are not all exogenous.

The GLS covariance matrix estimator is (7.61), which is obtained by regressing Ψ⊤(ρ̂)y on Ψ⊤(ρ̂)X for some consistent estimate ρ̂. Since Ψ⊤(ρ)u = ε by construction, s² is an estimator of σ²_ε. Moreover, the first observation has no impact asymptotically. Therefore, the limit as n → ∞ of n times (7.61) is the matrix

    σ²_ε plim_{n→∞} (n⁻¹(X − ρX_1)⊤(X − ρX_1))⁻¹.      (7.66)
In contrast, the NLS covariance matrix estimator is (7.63). With exogenous regressors, n times (7.63) tends to the same limit as (7.65), of which the top left block is just (7.66). But when the regressors are not all exogenous, the argument that the off-diagonal blocks of n times (7.63) tend to zero no longer works, and, in fact, the limits of these blocks are in general nonzero. When a matrix that is not block-diagonal is inverted, the top left block of the inverse is not the same as the inverse of the top left block of the original matrix; see Exercise 7.11. In fact, as readers are asked to show in Exercise 7.12, the top left block of the inverse is greater by a positive semidefinite matrix than the inverse of the top left block. Consequently, the GLS covariance matrix estimator underestimates the true covariance matrix asymptotically.

NLS has only one major weak point, which is that it does not take account of the first observation. Of course, this is really an advantage if the error process satisfies the stationarity condition without actually being stationary, or if some of the explanatory variables are not exogenous. But with a stationary error process and exogenous regressors, we wish to retain the information in the first observation, because it appears that retaining the first observation can sometimes lead to a noticeable efficiency gain in finite samples. The reason is that the transformation for observation 1 is quite different from the transformation for all the other observations. In consequence, the transformed first observation may well be a high leverage point; see Section 2.6. This is particularly likely to happen if one or more of the regressors is strongly trending. If so, dropping the first observation can mean throwing away a lot of information. See Davidson and MacKinnon (1993, Section 10.6) for a much fuller discussion and references.

Efficient Estimation by GLS or NLS

When the error process is stationary and all the regressors are exogenous, it is possible to obtain an estimator with the best features of GLS and NLS by modifying NLS so that it makes use of the information in the first observation and therefore yields an efficient estimator. The first-order conditions (7.07) for GLS estimation of the model (7.56) can be written as X⊤ΨΨ⊤(y − Xβ) = 0. Using (7.59) for Ψ, we see that these conditions are

    Σ_{t=2}^n (X_t − ρX_{t−1})⊤(y_t − X_t β − ρ(y_{t−1} − X_{t−1} β)) + (1 − ρ²)X_1⊤(y_1 − X_1 β) = 0.      (7.67)

With NLS estimation, the first-order conditions that define the NLS estimator are the conditions that the regressors in the GNR (7.62) should be orthogonal to the regressand:

    Σ_{t=2}^n (X_t − ρX_{t−1})⊤(y_t − X_t β − ρ(y_{t−1} − X_{t−1} β)) = 0,  and

    Σ_{t=2}^n (y_{t−1} − X_{t−1} β)(y_t − X_t β − ρ(y_{t−1} − X_{t−1} β)) = 0.      (7.68)

For given β, the second of the NLS conditions can be solved for ρ. If we write u(β) = y − Xβ, and u_1(β) = Lu(β), where L is the matrix lag operator defined in (7.49), we see that

    ρ(β) = u⊤(β)u_1(β) / (u_1⊤(β)u_1(β)).      (7.69)

This formula is similar to the estimator (7.47), except that β may take on any value instead of just β̃.

In Section 7.4, we mentioned the possibility of using an iterated feasible GLS procedure. We can now see precisely how such a procedure would work for this model. In the first step, we obtain the OLS parameter vector β̃. […]
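The passage breaks off at this point. One natural way to implement the iterated feasible GLS procedure it begins to describe is to alternate between updating ρ with the formula (7.69) evaluated at the current β and updating β by the GLS regression based on Ψ(ρ), which uses all n observations. The sketch below follows that reading; it is offered as an assumption about the iteration, not as the book's exact algorithm.

    import numpy as np

    def iterated_fgls_ar1(y, X, tol=1e-10, max_iter=100):
        """Iterated feasible GLS for (7.56): alternate rho(beta) from (7.69)
        with the GLS step based on Psi(rho), which uses all n observations."""
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # step 1: OLS
        rho = 0.0
        for _ in range(max_iter):
            u = y - X @ beta
            rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])    # (7.69) with u_1(beta) = Lu(beta)
            scale = np.sqrt(1 - rho_new ** 2)                 # GLS step, first observation included
            y_t = np.concatenate([[scale * y[0]], y[1:] - rho_new * y[:-1]])
            X_t = np.vstack([scale * X[0], X[1:] - rho_new * X[:-1]])
            beta, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
            if abs(rho_new - rho) < tol:
                break
            rho = rho_new
        return beta, rho_new

At convergence, β satisfies the GLS conditions (7.67) for the current ρ, while ρ satisfies the second of the NLS conditions (7.68), so the estimator combines the information in the first observation with a jointly determined ρ.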
… correlation between all of the error terms and all of the endogenous variables. If there are g endogenous variables and g equations, the solution will look very much like (8.09), with the inverse of a g × g matrix premultiplying the sum of a g-vector of linear combinations of the exogenous and predetermined variables and a g-vector of error terms. If we want to estimate the full system of equations, there …

… asymptotic F test. For the example of equations (7.72) and (7.73), the restricted sum of squared residuals, RSSR, is obtained from NLS estimation of H1, and the unrestricted one, USSR, is obtained from OLS estimation of H2. Then the test statistic is

    ((RSSR − USSR)/r) / (USSR/(n − k − r − 2)) ∼ F(r, n − k − r − 2)  asymptotically,      (7.79)

where r is the number of restrictions. The number of degrees of freedom in the denominator …

… testing for first-order serial correlation of the error terms in the regression model

    y = βy_1 + u,   |β| < 1,      (7.96)

where y_1 is the vector with typical element y_{t−1}, by use of the statistics t_GNR and t_SR defined in (7.51) and (7.52), respectively. Show first that the vector denoted as M_X ũ_1 in (7.51) and (7.52) is equal to −β̃M_X y_2, where β̃ is the OLS estimate of β and y_2 is the vector with typical element …

… (t − 1) = 1. The version of H2 that can actually be estimated has regression function

    δ_1 + β_2 z_t + δ_2 t + δ_3 z_{t−1} + δ_4 y_{t−1} + γ_4 z_{t−2} + γ_5 y_{t−2},      (7.78)

where δ_1 = β_1 + γ_1 − γ_3, δ_2 = β_3 + γ_3, δ_3 = β_4 + γ_2, and δ_4 = ρ + γ_5. We see that (7.78) has only 7 identifiable parameters: β_2, γ_4, γ_5, δ_1, δ_2, δ_3, and δ_4, instead of the 11 parameters, many of them not identifiable, of expression (7.77). In …

… will explain. If the e_t and v_i are thought of as fixed effects, then they are treated as parameters to be estimated. It turns out that they can then be estimated by OLS using dummy variables. If they are thought of as random effects, then we must figure out the covariance matrix of the u_it as functions of the variances of the e_t, v_i, and ε_it, and use feasible GLS. Each of these approaches can be appropriate …

… y on D, on M_D X, the matrix of residuals from regressing each of the columns of X on D. The fixed-effects estimator is therefore

    β̂_FE = (X⊤M_D X)⁻¹X⊤M_D y.      (7.85)

For any n-vector x, let x̄_i denote the group mean T⁻¹ Σ_{t=1}^T x_it. Then it is easy to check that element it of the vector M_D x is equal to x_it − x̄_i, the deviation from the group mean. Since all the variables in (7.85) are premultiplied by M_D …
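The fixed-effects estimator in the last excerpt is just OLS after subtracting group means. A minimal sketch, assuming a panel stored as flat arrays with a group index; the variable names and data layout are illustrative assumptions.

    import numpy as np

    def fixed_effects(y, X, groups):
        """Within (fixed-effects) estimator: demean y and X within each group
        and run OLS, i.e. beta_FE = (X' M_D X)^{-1} X' M_D y as in (7.85)."""
        # X should not contain the group dummies or a constant; M_D absorbs them.
        yd = y.astype(float).copy()
        Xd = X.astype(float).copy()
        for g in np.unique(groups):
            idx = groups == g
            yd[idx] -= yd[idx].mean()            # y_it - ybar_i
            Xd[idx] -= Xd[idx].mean(axis=0)      # X_it - Xbar_i
        beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
        return beta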
… estimates of β, we would need to know the values of σ_ε and σ_v, or, at least, the value of their ratio, since, as we saw in Section 7.3, GLS estimation requires only that Ω should be specified up to a factor. To obtain feasible GLS estimates, we need a consistent estimate of that ratio. However, the reader may have noticed that we have made no use in this section so far of asymptotic concepts, such as that of …

… introduced in the first four sections of this chapter, which dealt with the basic theory of generalized least squares estimation. The concept of an efficient MM estimator, which we introduced in Section 7.2, will be encountered again in the context of generalized instrumental variables estimation (Chapter 8) and generalized method of moments estimation (Chapter 9). The key idea of feasible GLS estimation, namely, … consistent estimate of that matrix without changing the asymptotic properties of the resulting estimator, will also be encountered again in Chapter 9. The remainder of the chapter dealt with the treatment of heteroskedasticity and serial correlation in linear regression models, and with error-components models for panel data. Although this material is of considerable practical importance, most of the techniques …

… (7.74) and H2 becomes … ε_t. (7.75) It is evident that in (7.74), but not in (7.75), the common factor 1 − ρL appears on both sides of the equation. This is where the term "common factor restrictions" comes from.

How Many Common Factor Restrictions Are There?

There is one feature of common factor restrictions that can be tricky: It is often not obvious just how many restrictions there are. For the case of testing …
