CHAPTER 65

Disturbance Related (Seemingly Unrelated) Regressions

One has m time-series regression equations $y_i = X_i\beta_i + \varepsilon_i$. Everything is different: the dependent variables, the explanatory variables, the coefficient vectors. Even the numbers of observations may be different. The ith regression has $k_i$ explanatory variables and $t_i$ observations. They may be time series covering different but partly overlapping time periods. This is why they are called "seemingly unrelated" regressions. The only connection between the regressions is that for those observations which overlap in time the disturbances for different regressions are contemporaneously correlated, and these correlations are assumed to be constant over time. In tiles, this model is

(65.0.18)  $Y = X B \Delta + E$, with $Y$ and $E$ of format $t \times m$, $X$ of format $t \times k$, and $B$ of format $k \times m$.

65.1. The Supermatrix Representation

One can combine all these regressions into one big "supermatrix" as follows:

(65.1.1)
$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} X_1 & O & \cdots & O \\ O & X_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & X_m \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}$$

The covariance matrix of the disturbance term in (65.1.1) has the following "striped" form:

(65.1.2)
$$\mathcal{V}\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix} = \begin{bmatrix} \sigma_{11} I_{11} & \sigma_{12} I_{12} & \cdots & \sigma_{1m} I_{1m} \\ \sigma_{21} I_{21} & \sigma_{22} I_{22} & \cdots & \sigma_{2m} I_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m1} I_{m1} & \sigma_{m2} I_{m2} & \cdots & \sigma_{mm} I_{mm} \end{bmatrix}$$

Here $I_{ij}$ is the $t_i \times t_j$ matrix which has zeros everywhere except at the intersections of rows and columns denoting the same time period. In the special case that all time periods are identical, i.e., all $t_i = t$, one can define the matrices $Y = \begin{bmatrix} y_1 & \cdots & y_m \end{bmatrix}$ and $E = \begin{bmatrix} \varepsilon_1 & \cdots & \varepsilon_m \end{bmatrix}$, and write the equations in matrix form as follows:

(65.1.3)  $Y = \begin{bmatrix} X_1\beta_1 & \cdots & X_m\beta_m \end{bmatrix} + E = H(B) + E$

The vector of dependent variables and the vector of disturbances in the supermatrix representation (65.1.1) can in this special case be written in terms of the vectorization operator as $\operatorname{vec} Y$ and $\operatorname{vec} E$. And the covariance matrix can be written as a Kronecker product: $\mathcal{V}[\operatorname{vec} E] = \Sigma \otimes I$, since all $I_{ij}$ in (65.1.2) are $t \times t$ identity matrices. If $t = 5$ and $m = 3$, the covariance matrix would be the $15 \times 15$ matrix

$$\begin{bmatrix} \sigma_{11} I_5 & \sigma_{12} I_5 & \sigma_{13} I_5 \\ \sigma_{21} I_5 & \sigma_{22} I_5 & \sigma_{23} I_5 \\ \sigma_{31} I_5 & \sigma_{32} I_5 & \sigma_{33} I_5 \end{bmatrix}$$

where $I_5$ is the $5 \times 5$ identity matrix; written out in full, each $5 \times 5$ block has the corresponding $\sigma_{ij}$ on its diagonal and zeros everywhere else.

If in addition all regressions have the same number of regressors, one can combine the coefficients into a matrix $B$ and can write the system as

(65.1.4)  $\operatorname{vec} Y = Z \operatorname{vec} B + \operatorname{vec} E, \qquad \operatorname{vec} E \sim (o, \Sigma \otimes I),$

where $Z$ contains the regressors arranged in a block-diagonal "supermatrix." If one knows $\Sigma$ up to a multiplicative factor, and if all regressions cover the same time period, then one can apply (26.0.2) to (65.1.4) to get the following formula for the GLS estimator and at the same time maximum likelihood estimator:

(65.1.5)  $\operatorname{vec}(\hat{B}) = \bigl(Z'(\Sigma^{-1} \otimes I)Z\bigr)^{-1} Z'(\Sigma^{-1} \otimes I) \operatorname{vec}(Y).$
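Formula (65.1.5) is easy to evaluate numerically. The following is only an illustrative numpy sketch of it, not part of the text: the data, the dimensions, and the variable names (m, t, k, X_list, and so on) are made up, and Σ is treated as known, as assumed here.

```python
# Illustrative numpy sketch of the GLS/ML formula (65.1.5); data and names are made up.
import numpy as np

rng = np.random.default_rng(0)
m, t, k = 3, 50, 2                       # m equations, t common observations, k regressors each
Sigma = np.array([[1.0, 0.7, 0.3],       # known contemporaneous disturbance covariance
                  [0.7, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
X_list = [rng.normal(size=(t, k)) for _ in range(m)]
betas = [rng.normal(size=k) for _ in range(m)]
E = rng.multivariate_normal(np.zeros(m), Sigma, size=t)     # rows are contemporaneous
Y = np.column_stack([X_list[i] @ betas[i] + E[:, i] for i in range(m)])

# Block-diagonal "supermatrix" Z of (65.1.4) and the stacked vector vec(Y)
Z = np.zeros((m * t, m * k))
for i in range(m):
    Z[i * t:(i + 1) * t, i * k:(i + 1) * k] = X_list[i]
vecY = Y.reshape(-1, order="F")          # stacks the columns y_1, ..., y_m

# (65.1.5): vec(B-hat) = (Z'(Sigma^{-1} x I)Z)^{-1} Z'(Sigma^{-1} x I) vec(Y)
W = np.kron(np.linalg.inv(Sigma), np.eye(t))
vecB_hat = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ vecY)
print(vecB_hat.reshape(m, k))            # row i holds the GLS estimate of beta_i
```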
To evaluate this, note first that

$$Z'(\Sigma^{-1} \otimes I) = \begin{bmatrix} X_1' & O & \cdots & O \\ O & X_2' & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & X_m' \end{bmatrix} \begin{bmatrix} \sigma^{11} I & \sigma^{12} I & \cdots & \sigma^{1m} I \\ \sigma^{21} I & \sigma^{22} I & \cdots & \sigma^{2m} I \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^{m1} I & \sigma^{m2} I & \cdots & \sigma^{mm} I \end{bmatrix} = \begin{bmatrix} \sigma^{11} X_1' & \cdots & \sigma^{1m} X_1' \\ \vdots & \ddots & \vdots \\ \sigma^{m1} X_m' & \cdots & \sigma^{mm} X_m' \end{bmatrix}$$

where $\sigma^{ij}$ are the elements of the inverse of $\Sigma$, therefore

(65.1.6)
$$\begin{bmatrix} \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_m \end{bmatrix} = \begin{bmatrix} \sigma^{11} X_1'X_1 & \cdots & \sigma^{1m} X_1'X_m \\ \vdots & \ddots & \vdots \\ \sigma^{m1} X_m'X_1 & \cdots & \sigma^{mm} X_m'X_m \end{bmatrix}^{-1} \begin{bmatrix} X_1' \sum_{i=1}^m \sigma^{1i} y_i \\ \vdots \\ X_m' \sum_{i=1}^m \sigma^{mi} y_i \end{bmatrix}.$$

In the seemingly unrelated regression model, OLS on each equation singly is therefore less efficient than an approach which estimates all the equations simultaneously. If the numbers of observations in the different regressions are unequal, then the formula for the GLSE is no longer so simple. It is given in [JHG+88, (11.2.59) on p. 464].

65.2. The Likelihood Function

We know therefore what to do in the hypothetical case that $\Sigma$ is known. What if it is not known? We will derive here the maximum likelihood estimator. For the exponent of the likelihood function we need the following mathematical tool:

Problem 532. Show that $\sum_{s=1}^t a_s' \Omega a_s = \operatorname{tr} A' \Omega A$ where $A = \begin{bmatrix} a_1 & \cdots & a_t \end{bmatrix}$.

Answer.
$$A' \Omega A = \begin{bmatrix} a_1 & \cdots & a_t \end{bmatrix}' \Omega \begin{bmatrix} a_1 & \cdots & a_t \end{bmatrix} = \begin{bmatrix} a_1'\Omega a_1 & a_1'\Omega a_2 & \cdots & a_1'\Omega a_t \\ a_2'\Omega a_1 & a_2'\Omega a_2 & \cdots & a_2'\Omega a_t \\ \vdots & \vdots & \ddots & \vdots \\ a_t'\Omega a_1 & a_t'\Omega a_2 & \cdots & a_t'\Omega a_t \end{bmatrix}$$
Now take the trace of this.

To derive the likelihood function, define the matrix function $H(B)$ as follows: $H(B)$ is a $t \times m$ matrix the ith column of which is $X_i\beta_i$, i.e., $H(B)$ as a column-partitioned matrix is $H(B) = \begin{bmatrix} X_1\beta_1 & \cdots & X_m\beta_m \end{bmatrix}$. In tiles,

(65.2.1)  $H(B) = X B \Delta$, the same tile product as in (65.0.18) without the disturbance term.

The above notation follows [DM93, 315–318]. [Gre97, p. 683 top] writes this same $H$ as the matrix product

(65.2.2)  $H(B) = Z\Pi(B)$

where $Z$ has all the different regressors in the different regressions as columns (it is $Z = \begin{bmatrix} X_1 & \cdots & X_m \end{bmatrix}$ with duplicate columns deleted), and the ith column of $\Pi$ has zeros for those regressors which are not in the ith equation, and elements of $B$ for those regressors which are in the ith equation. Using $H$, the model is simply, as in (65.0.18),

(65.2.3)  $Y = H(B) + E, \qquad \operatorname{vec}(E) \sim N(o, \Sigma \otimes I)$

This is a matrix generalization of (56.0.21). The likelihood function which we are going to derive now is valid not only for this particular $H$ but for more general, possibly nonlinear $H$. Define $\eta_s(B)$ to be the sth row of $H$, written as a column vector, i.e., as a row-partitioned matrix we have $H(B) = \begin{bmatrix} \eta_1(B)' \\ \vdots \\ \eta_t(B)' \end{bmatrix}$. Then (65.2.3) in row-partitioned form reads

(65.2.4)
$$\begin{bmatrix} y_1' \\ \vdots \\ y_t' \end{bmatrix} = \begin{bmatrix} \eta_1(B)' \\ \vdots \\ \eta_t(B)' \end{bmatrix} + \begin{bmatrix} \varepsilon_1' \\ \vdots \\ \varepsilon_t' \end{bmatrix}$$

We assume Normality: the sth row vector is $y_s' \sim N(\eta_s(B)', \Sigma)$, or equivalently $y_s \sim N(\eta_s(B), \Sigma)$, and we assume that different rows are independent. Therefore the density function is

(65.2.5)
$$\begin{aligned} f_Y(Y) &= \prod_{s=1}^t (2\pi)^{-m/2} (\det\Sigma)^{-1/2} \exp\Bigl(-\tfrac{1}{2}(y_s - \eta_s(B))'\Sigma^{-1}(y_s - \eta_s(B))\Bigr) \\ &= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2} \exp\Bigl(-\tfrac{1}{2}\sum_s (y_s - \eta_s(B))'\Sigma^{-1}(y_s - \eta_s(B))\Bigr) \\ &= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2} \exp\Bigl(-\tfrac{1}{2}\operatorname{tr}(Y - H(B))\Sigma^{-1}(Y - H(B))'\Bigr) \\ &= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2} \exp\Bigl(-\tfrac{1}{2}\operatorname{tr}(Y - H(B))'(Y - H(B))\Sigma^{-1}\Bigr). \end{aligned}$$
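The step from the product over rows to the single trace expression in (65.2.5) uses the identity of Problem 532 with $\Omega = \Sigma^{-1}$ and $A = (Y - H(B))'$. The following numpy fragment is only a numerical plausibility check of that identity with made-up numbers, not part of the text:

```python
# Numerical check of the trace identity used in (65.2.5), with made-up data:
#   sum_s (y_s - eta_s)' Sigma^{-1} (y_s - eta_s) = tr[(Y - H)'(Y - H) Sigma^{-1}]
import numpy as np

rng = np.random.default_rng(1)
t, m = 6, 3
Y = rng.normal(size=(t, m))
H = rng.normal(size=(t, m))                           # stands in for H(B)
A = rng.normal(size=(m, m))
Sigma_inv = np.linalg.inv(A @ A.T + m * np.eye(m))    # inverse of a positive definite Sigma

R = Y - H                                             # row s is (y_s - eta_s(B))'
lhs = sum(R[s] @ Sigma_inv @ R[s] for s in range(t))  # sum of quadratic forms
rhs = np.trace(R.T @ R @ Sigma_inv)                   # trace expression
print(np.isclose(lhs, rhs))                           # True
```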
Problem 533. Explain exactly the step in the derivation of (65.2.5) in which the trace enters.

Answer. Write the quadratic form in the exponent as follows:

$$\begin{aligned} \sum_{s=1}^t (y_s - \eta_s(B))'\Sigma^{-1}(y_s - \eta_s(B)) &= \sum_{s=1}^t \operatorname{tr}\bigl((y_s - \eta_s(B))'\Sigma^{-1}(y_s - \eta_s(B))\bigr) && (65.2.6) \\ &= \sum_{s=1}^t \operatorname{tr}\bigl(\Sigma^{-1}(y_s - \eta_s(B))(y_s - \eta_s(B))'\bigr) && (65.2.7) \\ &= \operatorname{tr}\Bigl(\Sigma^{-1}\sum_{s=1}^t (y_s - \eta_s(B))(y_s - \eta_s(B))'\Bigr) && (65.2.8) \\ &= \operatorname{tr}\Bigl(\Sigma^{-1}\begin{bmatrix} y_1 - \eta_1(B) & \cdots & y_t - \eta_t(B) \end{bmatrix}\begin{bmatrix} (y_1 - \eta_1(B))' \\ \vdots \\ (y_t - \eta_t(B))' \end{bmatrix}\Bigr) && (65.2.9) \\ &= \operatorname{tr}\bigl(\Sigma^{-1}(Y - H(B))'(Y - H(B))\bigr) && (65.2.10) \end{aligned}$$

The log likelihood function $\ell(Y; B, \Sigma)$ is therefore

(65.2.11)  $\ell = -\dfrac{mt}{2}\log 2\pi - \dfrac{t}{2}\log\det\Sigma - \dfrac{1}{2}\operatorname{tr}(Y - H(B))'(Y - H(B))\Sigma^{-1}.$
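In practice $\Sigma$ in (65.2.11) is unknown; a common route is to alternate the GLS step (65.1.6) with re-estimating $\Sigma$ from the current residuals (iterated EGLS, taken up again in the exercise excerpted below). Iterated to convergence this yields the (normal) maximum likelihood estimator in the SUR case. The sketch below is only an illustration under the assumptions of this section (all equations cover the same t periods); the function name and data layout are mine, not the text's.

```python
# Illustrative sketch (not from the text) of iterated EGLS for the SUR model with a
# common sample period: alternate the GLS step (65.1.6) with Sigma-hat = R'R / t.
import numpy as np

def sur_iterated_egls(Y, X_list, n_iter=100, tol=1e-10):
    """Y is t x m; X_list[i] is the t x k_i regressor matrix of equation i."""
    t, m = Y.shape
    k_list = [X.shape[1] for X in X_list]
    offs = np.concatenate(([0], np.cumsum(k_list)))
    # start from equation-by-equation OLS
    betas = [np.linalg.lstsq(X_list[i], Y[:, i], rcond=None)[0] for i in range(m)]
    for _ in range(n_iter):
        R = np.column_stack([Y[:, i] - X_list[i] @ betas[i] for i in range(m)])
        S_inv = np.linalg.inv(R.T @ R / t)            # inverse of the current Sigma estimate
        lhs = np.zeros((offs[-1], offs[-1]))
        rhs = np.zeros(offs[-1])
        for i in range(m):                            # assemble the blocks of (65.1.6)
            for j in range(m):
                lhs[offs[i]:offs[i+1], offs[j]:offs[j+1]] = S_inv[i, j] * (X_list[i].T @ X_list[j])
            rhs[offs[i]:offs[i+1]] = X_list[i].T @ (Y @ S_inv[i])
        new = np.linalg.solve(lhs, rhs)
        new_betas = [new[offs[i]:offs[i+1]] for i in range(m)]
        if max(np.max(np.abs(a - b)) for a, b in zip(new_betas, betas)) < tol:
            return new_betas, np.linalg.inv(S_inv)
        betas = new_betas
    return betas, np.linalg.inv(S_inv)
```

With all X_i identical, the very first GLS step reproduces the starting OLS estimates, so the iteration stops at once; that special case is taken up again after the excerpts below.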
[...]

• a. 1 point In a seemingly unrelated regression framework, joint estimation of the whole model is much better than estimation of each equation singly if the errors are highly correlated. True or false?

Answer. True.

• b. 1 point In a seemingly unrelated regression framework, joint estimation of the whole model is much better than estimation of each equation singly if the independent variables in the different ...

... valid for all the different models, including nonlinear models, which can be written in the form (65.2.3). As a next step we will write, following [Gre97, p. 683], $H(B) = Z\Pi(B)$ and derive the following formula from [Gre97, p. 685]:

(65.3.3)  $\dfrac{\partial \ell^{c}}{\partial \Pi} = \hat{\Sigma}^{-1}(Y - Z\Pi)'Z$

Here is a derivation of this using tile notation. We use the notation ...

... term). Then the first claim is: y is correlated with ε, because y and c are determined simultaneously once i and ε is given, and both depend on i and ε. Let us do that in more detail and write the reduced form equation for y. That means, let us express y in terms of the exogenous variable and the disturbances only. Plug c = y − i into (66.1.9) to get

(66.1.11)  $y - i = \alpha + \beta y + \varepsilon$

or (66.1.12) ...

... and the second has all variables that the first has, plus some additional ones. Then the inclusion of the second equation does not give additional information for the first; however, including the first gives additional information for the second! What is the rationale for this? Since the first equation has fewer variables than the second, I know the disturbances better. For instance, ...

... variables in $X_1$ is a subset of those in $X_2$. One of the following two statements is correct, the other is false. Which is correct? (a) in order to estimate $\beta_1$, OLS on the first equation singly is as good as SUR; (b) in order to estimate $\beta_2$, OLS on the second equation singly is as good as SUR. Which of these two is true?

Answer. The first is true. One cannot obtain a more efficient estimator of $\beta_1$ by considering ...

Problem 537. 4 points Explain how to do iterated EGLS (i.e., GLS with an estimated covariance matrix) in a model with first-order autoregression, and in a seemingly unrelated regression model. Will you end up with the (normal) maximum likelihood estimator if you iterate until convergence?

Answer. You will only get the Maximum Likelihood estimator in the SUR case, not in the AR1 case, because the determinant term ...

... $\varepsilon_d$ and $\varepsilon_s$ are independent of y, but amongst each other they are contemporaneously correlated, with their covariance constant over time:

(66.1.6)  $\operatorname{cov}[\varepsilon_{dt}, \varepsilon_{su}] = \begin{cases} \sigma_{ds} & \text{if } t = u \\ 0 & \text{if } t \neq u \end{cases}$

• a. 1 point Which variables are exogenous and which are endogenous?

Answer. p and q are called jointly dependent or endogenous; y is determined outside the system, or exogenous.

• b. 2 points Assuming ... an intercept).

Answer. By (66.1.7) (the reduced form equation for p), $\operatorname{cov}[\varepsilon_{st}, p_t] = \operatorname{cov}\Bigl[\varepsilon_{st}, \dfrac{\varepsilon_{dt} - \varepsilon_{st}}{\beta_1 - \alpha_1}\Bigr] = \dfrac{\sigma_{sd} - \sigma_s^2}{\beta_1 - \alpha_1}$. This is generally $\neq 0$, therefore inconsistency.

• d. 2 points If one estimates the supply function by instrumental variables, using y as an instrument for p and ι as instrument for itself, write down the formula for the resulting estimator ... the same $\tilde{\beta}_1$ as in part d.

• f. 1 point Since the error terms in the reduced form equations are contemporaneously correlated, wouldn't one get more precise estimates if one estimates the reduced form equations as a seemingly unrelated system, instead of OLS?

Answer. Not as long as one does not impose any constraints on the reduced form equations, since all regressors are the same.

• g. 2 points We have shown ... equivalent to

(65.4.3)  $\hat{B} = (X'X)^{-1}X'Y$

• d. 3 points [DM93, p. 313] appeals to Kruskal's theorem, which is Question 499, to prove this. Supply the details of this proof.

Answer. Look at the derivation of (65.4.3) again. The $\Sigma^{-1}$ in numerator and denominator cancel out since they commute with Z. Defining $\Omega = \Sigma \otimes I$, this "commuting" is the formula $\Omega Z = ZK$ for some K, i.e., (65.4.4) ...
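The last excerpts concern the special case in which every equation has exactly the same regressor matrix X, so that the GLS estimator (65.1.5) collapses to equation-by-equation OLS, $\hat{B} = (X'X)^{-1}X'Y$, whatever Σ is. The following is a quick numerical illustration of this point (numpy, made-up data; not part of the text):

```python
# Illustrative check that with identical regressors in every equation the GLS
# estimator (65.1.5) reduces to OLS on each equation, i.e. (65.4.3) B-hat = (X'X)^{-1}X'Y,
# regardless of Sigma.
import numpy as np

rng = np.random.default_rng(2)
t, k, m = 40, 3, 2
X = rng.normal(size=(t, k))                        # the same X in every equation
Y = rng.normal(size=(t, m))
A = rng.normal(size=(m, m))
Sigma = A @ A.T + np.eye(m)                        # an arbitrary positive definite Sigma

# OLS equation by equation: (65.4.3)
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# GLS on the stacked system with Z = I_m (x) X and weight Sigma^{-1} (x) I_t
Z = np.kron(np.eye(m), X)
W = np.kron(np.linalg.inv(Sigma), np.eye(t))
vecB_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Y.reshape(-1, order="F"))
B_gls = vecB_gls.reshape(m, k).T                   # back to k x m, one column per equation

print(np.allclose(B_ols, B_gls))                   # True
```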