Remark 1. It is apparent from the definitions above that it makes very little substantive difference whether we deal with the forecast or the forecast error.
2.2.3 Forecasting from the RRF
In this section we take up the question of whether a gain in the efficiency of forecasts would accrue when we use the restricted reduced form (RRF), i.e. when we use estimates of $\Pi$ that take account of the fact that $\Pi = CD$. Given the preceding discussion, the question may be rephrased as: Is the restricted reduced form efficient relative to the unrestricted reduced form? In terms of the intuitive perception of the problem, one would expect that, since in the RRF we take into account the a priori restrictions on the structural parameters, the resulting estimators would be relatively efficient. It will turn out that this view is basically correct, although not literally as stated.
Properties of the RRF
Since the RRF is derived from the relationship $\Pi = CD$, $D = (I - B)^{-1}$, we must look to the properties of the estimators of the matrices $C$ and $B$ for determining the properties of $\hat\Pi$. Let their estimators be, respectively, $\hat C$, $\hat B$, and consider
$$\hat\Pi - \Pi = \hat C\hat D - CD = \hat C\hat D - \hat CD + \hat CD - CD = \hat C\hat D(D^{-1} - \hat D^{-1})D + (\hat C - C)D.$$
Since $D^{-1} - \hat D^{-1} = (I - B) - (I - \hat B) = \hat B - B$ and $\hat C\hat D = \hat\Pi$, this may be rewritten more constructively as
$$\hat\Pi - \Pi = (\hat\Pi, I)(\hat A - A)D, \qquad A = (B', C')'. \qquad (2.73)$$
We note that $\mathrm{vec}(A) = L\delta$, where $\delta$ is the vector of structural parameters in the entire system not specified a priori (to be zero or one). Using the results in Ch. 4 of Dhrymes (1984), we may write
$$\sqrt{T}\,\mathrm{vec}(\hat\Pi - \Pi) = \sqrt{T}(\hat\pi - \pi) = (D' \otimes (\hat\Pi, I))\sqrt{T}\,\mathrm{vec}(\hat A - A) = (D' \otimes (\hat\Pi, I))L[\sqrt{T}(\hat\delta - \delta)] \sim (D' \otimes I)S[\sqrt{T}(\hat\delta - \delta)], \qquad (2.74)$$
where $S = [I \otimes (\Pi, I)]L$ is as defined in Chapter 1. Combining the discussion above with that of the previous two (sub)sections, we conclude that, whether the GLSEM is static or dynamic, the difference between "a $\tau$ period ahead forecast" and the (relevant) conditional mean is given by
$$\sqrt{T}\,\mathrm{vec}(\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{(i)} = (I \otimes p_{T+\tau\cdot})[\sqrt{T}(\hat\pi - \pi)_{(i)}],$$
where $i = 1$ stands for the URF, $i = 2$ stands for the RRF induced by the 2SLS estimator of the structural parameters, and $i = 3$ stands for the RRF induced by the 3SLS estimator of the structural parameters; $p_{T+\tau\cdot}$ is the row vector of predetermined variables relevant for the forecast period, which in the dynamic case contains the requisite lagged (previously forecast) values of the dependent variables. Thus we have, for the URF, whether the model is static or dynamic,
$$\sqrt{T}\,\mathrm{vec}(\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{(1)} = (I \otimes p_{T+\tau\cdot})[\sqrt{T}(\hat\pi - \pi)_{URF}]. \qquad (2.75)$$
For the 2SLS induced restricted reduced form, we find
$$\sqrt{T}(\hat\pi - \pi)_{RRF(2SLS)} \sim (D' \otimes I)S[\sqrt{T}(\hat\delta - \delta)_{2SLS}],$$
so that
$$\sqrt{T}\,\mathrm{vec}(\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{(2)} = (I \otimes p_{T+\tau\cdot})[\sqrt{T}(\hat\pi - \pi)_{RRF(2SLS)}], \qquad (2.76)$$
again for both the static and the dynamic model. Finally, for the 3SLS induced restricted reduced form, we find
$$\sqrt{T}(\hat\pi - \pi)_{RRF(3SLS)} \sim (D' \otimes I)S[\sqrt{T}(\hat\delta - \delta)_{3SLS}],$$
so that
$$\sqrt{T}\,\mathrm{vec}(\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{(3)} = (I \otimes p_{T+\tau\cdot})[\sqrt{T}(\hat\pi - \pi)_{RRF(3SLS)}]. \qquad (2.77)$$
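The representations (2.75) through (2.77) rest on the exact algebraic identity (2.73), whose vec form underlies (2.74). A quick numerical check may make the bookkeeping transparent; the sketch below (Python, with made-up toy dimensions and matrices — none of the names are from the text, and $B$ is taken strictly upper triangular only so that $I - B$ is guaranteed invertible) verifies both forms:

```python
# Toy check of Eq. (2.73) and the vec identity behind (2.74).
import numpy as np

rng = np.random.default_rng(0)
m, G = 3, 4                                   # endogenous / predetermined counts

B = np.triu(rng.normal(size=(m, m)), 1)       # strictly upper triangular (toy)
C = rng.normal(size=(G, m))
D = np.linalg.inv(np.eye(m) - B)              # D = (I - B)^{-1}
Pi = C @ D                                    # Pi = C D

Bh = B + 0.01 * np.triu(rng.normal(size=(m, m)), 1)   # stand-ins for estimators
Ch = C + 0.01 * rng.normal(size=(G, m))
Pih = Ch @ np.linalg.inv(np.eye(m) - Bh)

A, Ah = np.vstack([B, C]), np.vstack([Bh, Ch])        # A = (B', C')'

lhs = Pih - Pi                                        # left member of (2.73)
rhs = np.hstack([Pih, np.eye(G)]) @ (Ah - A) @ D      # (Pi_hat, I)(A_hat - A)D

vec = lambda M: M.flatten(order="F")                  # column-major vec
rhs_vec = np.kron(D.T, np.hstack([Pih, np.eye(G)])) @ vec(Ah - A)

print(np.allclose(lhs, rhs), np.allclose(vec(lhs), rhs_vec))   # True True
```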
Remark 3. As we found in the discussion of the URF, the (suitably normalized) deviation of the forecast from the conditional mean of the dependent variables is, asymptotically, i.e. for large samples, a linear transformation of the deviation of the reduced form estimator from the underlying parameter. This holds true whether we deal with static or dynamic models. Thus, whether one type of forecast is "efficient" relative to another depends crucially on a similar comparison of the limiting distributions of the underlying reduced form estimators.
Up to this point we have considered three types of estimators for the reduced form: first, the unrestricted reduced form, obtained through the regression of the jointly dependent on the predetermined variables; second, the 2SLS induced restricted reduced form, in which the estimator of the matrix $\Pi$ is derived from the 2SLS estimators of the underlying structural matrices, $B$, $C$; and, finally, the 3SLS induced reduced form, which derives an estimator of the reduced form matrix $\Pi$ through the 3SLS estimator of the underlying structural parameter matrices, $B$, $C$.$^{14}$ It follows from the preceding discussion that before we can deal with the relative efficiencies of various forecasting procedures we must first prove a result regarding the relative efficiencies of the various estimators of the reduced form.

$^{14}$ In subsequent chapters we shall examine other structural estimators such as Full Information Maximum Likelihood (FIML), Limited Information Maximum Likelihood (LIML), and Indirect Least Squares (ILS) estimators. We shall show, in due course, that the first two are equivalent to the 3SLS and 2SLS estimators, respectively, in the sense that their limiting distributions are identical. ILS is an estimator that reverses the process and, thus, derives the structural parameter estimators from those of the unrestricted reduced form.
The properties of the reduced form estimators considered are given in the theorems below.
Theorem 4. Under the conditions of Theorem 1, the following statements are true:
i. asymptotically,
$$\sqrt{T}(\hat\pi - \pi)_{URF} \sim N(0,\ G_{(1)}), \qquad G_{(1)} = (D \otimes R^{-1})'\Phi(D \otimes R^{-1}),$$
and $\Phi = \Sigma \otimes I_G$;

ii. asymptotically,
$$\sqrt{T}(\hat\pi - \pi)_{RRF(2SLS)} \sim N(0,\ G_{(2)}),$$
where
$$G_{(2)} = (D \otimes R^{-1})'J\Phi J(D \otimes R^{-1}), \qquad J = S^*(S^{*\prime}S^*)^{-1}S^{*\prime}, \qquad S^* = (I \otimes R')S;$$

iii. asymptotically,
$$\sqrt{T}(\hat\pi - \pi)_{RRF(3SLS)} \sim N(0,\ G_{(3)}), \qquad G_{(3)} = (D \otimes R^{-1})'S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime}(D \otimes R^{-1}).$$
Proof: The proof of i is straightforward; since $\hat\Pi = (X'X)^{-1}X'Y$, we find, upon substitution,
$$\hat\Pi - \Pi = (X'X)^{-1}X'UD,$$
or
$$\sqrt{T}(\hat\pi - \pi) = (D \otimes I)'\left[I \otimes \left(\frac{X'X}{T}\right)^{-1}\right]\frac{1}{\sqrt{T}}(I \otimes X')u.$$
Noting that $M_{xx} = RR'$, the conclusion is immediate.

As for parts ii and iii, we have that
$$(\hat\pi - \pi)_{RRF} = (D \otimes I)'S(\hat\delta - \delta);$$
thus, for the 2SLS estimator we conclude
$$\sqrt{T}(\hat\pi - \pi)_{RRF(2SLS)} \sim N\left(0,\ (D \otimes I)'SC_2S'(D \otimes I)\right),$$
while for the 3SLS estimator we find
$$\sqrt{T}(\hat\pi - \pi)_{RRF(3SLS)} \sim N\left(0,\ (D \otimes I)'SC_3S'(D \otimes I)\right).$$
From Eqs. (2.4), (2.6) and (2.8) we determine
$$C_2 = (S^{*\prime}S^*)^{-1}S^{*\prime}\Phi S^*(S^{*\prime}S^*)^{-1}, \qquad (2.78)$$
$$C_3 = (S^{*\prime}\Phi^{-1}S^*)^{-1}, \qquad (2.79)$$
and $(D \otimes I)'S = (D \otimes R^{-1})'S^*$, which concludes the proof of the theorem.
q.e.d.
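The last step of the proof rests on the identity $(D \otimes I)'S = (D \otimes R^{-1})'S^*$, with $S^* = (I \otimes R')S$, which carries the argument from the structural to the reduced form covariance. A toy numerical confirmation is sketched below; all matrices are arbitrary illustrative inputs, not from the text:

```python
# Toy check of (D ⊗ I)'S = (D ⊗ R^{-1})'S*, where S* = (I ⊗ R')S.
import numpy as np

rng = np.random.default_rng(1)
m, G, k = 3, 4, 7
D = rng.normal(size=(m, m))                                  # any m x m matrix
R = np.linalg.cholesky(np.eye(G) + 0.25 * np.ones((G, G)))   # any nonsingular R
S = rng.normal(size=(m * G, k))                              # any mG x k matrix

Sstar = np.kron(np.eye(m), R.T) @ S                          # S* = (I ⊗ R')S
lhs = np.kron(D, np.eye(G)).T @ S                            # (D ⊗ I)'S
rhs = np.kron(D, np.linalg.inv(R)).T @ Sstar                 # (D ⊗ R^{-1})'S*
print(np.allclose(lhs, rhs))                                 # True
```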
While the discussion above has established the limiting distribution of the various reduced form estimators, the following theorem establishes their relative efficiencies.
Theorem 5. Under the conditions of Theorem 3, the following statements are true:

i. $G_{(1)} - G_{(3)} \ge 0$;

ii. $G_{(2)} - G_{(3)} \ge 0$;

iii. $G_{(1)} - G_{(2)} \ge 0$, if and only if $\Phi^*_{21} = 0$; otherwise it is indefinite;

iv. $\Phi^*_{21} = 0$, if at least one of the following three conditions holds; otherwise $\Phi^*_{21} \ne 0$:

1. $\sigma_{ij} = 0$, for all $i \ne j$;
2. all equations are just identified;
3. $\sigma_{ij} \ne 0$, for some pair, say $(i_0, j_0)$, implies that the corresponding equations are just identified.
Proof: Put
$$G_{(1)} - G_{(3)} = (D \otimes R^{-1})'[\Phi - S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime}](D \otimes R^{-1});$$
the matrix in the left member is positive semidefinite if and only if
$$\Phi - S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime} \ge 0.$$
Consider the characteristic equation
$$|\lambda\Phi - S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime}| = 0,$$
and note that it has exactly the same characteristic roots as
$$|\lambda I - \Phi^{-1}S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime}| = 0,$$
whose nonzero characteristic roots are exactly those of
$$|\mu I - (S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime}\Phi^{-1}S^*| = |\mu I - I_k| = 0, \qquad k = \sum_{i=1}^{m}(m_i + G_i).$$
Using the results in Ch. 2 of Dhrymes (1984), we conclude that there exists a nonsingular matrix $P$ such that
$$\Phi = PP', \qquad S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime} = P\begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}P',$$
so that
$$\Phi - S^*(S^{*\prime}\Phi^{-1}S^*)^{-1}S^{*\prime} = P\begin{bmatrix} 0 & 0 \\ 0 & I_{mG-k} \end{bmatrix}P' \ge 0.$$

To prove part ii we note that
$$G_{(2)} - G_{(3)} = (D \otimes R^{-1})'S^*(C_2 - C_3)S^{*\prime}(D \otimes R^{-1}),$$
which is evidently positive semidefinite due to the fact that 3SLS is efficient relative to 2SLS, i.e. that $C_2 - C_3 \ge 0$.
As for part iii, we have that
$$G_{(1)} - G_{(2)} = (D \otimes R^{-1})'[\Phi - J\Phi J](D \otimes R^{-1}), \qquad J = S^*(S^{*\prime}S^*)^{-1}S^{*\prime}.$$
In view of the fact that $(D \otimes R^{-1})$ is nonsingular, the matrix in the left member above is positive semidefinite, negative semidefinite or indefinite according to whether the matrix in square brackets (in the right member above) does or does not share these properties. Since $J$ is a symmetric idempotent matrix, it obeys $k = \mathrm{rank}(J) = \sum_{i=1}^{m}(m_i + G_i) \le mG$. Thus, there exists an orthogonal matrix $E$, such that
$$E'JE = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}.$$
Consequently, we can write
$$E'[\Phi - J\Phi J]E = \Phi^* - \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}\Phi^*\begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & \Phi^*_{12} \\ \Phi^*_{21} & \Phi^*_{22} \end{bmatrix},$$
where $\Phi^* = E'\Phi E$. Since, by assumption, $\Phi$ is nonsingular, it follows that $\Phi^*_{22}$ is also nonsingular. Thus, $G_{(1)} - G_{(2)} \ge 0$, if and only if $\Phi^*_{21} = 0$.
The sufficiency part of this statement is obvious, so let us demonstrate its necessity. Thus, suppose $G_{(1)} - G_{(2)} \ge 0$ and $\Phi^*_{21} \ne 0$; we derive a contradiction. Since the latter holds, there exists at least one vector, say $a \ne 0$, such that $\Phi^*_{21}a \ne 0$. Consider, then,
$$\phi(a, \beta) = (a', \beta')\begin{bmatrix} 0 & \Phi^*_{12} \\ \Phi^*_{21} & \Phi^*_{22} \end{bmatrix}\begin{pmatrix} a \\ \beta \end{pmatrix} = 2a'\Phi^*_{12}\beta + \beta'\Phi^*_{22}\beta,$$
and note that
$$\phi(a, \beta) = 3a'\Phi^*_{12}\Phi^{*-1}_{22}\Phi^*_{21}a > 0, \qquad \text{for } \beta = \Phi^{*-1}_{22}\Phi^*_{21}a,$$
$$\phi(a, \beta) = -a'\Phi^*_{12}\Phi^{*-1}_{22}\Phi^*_{21}a < 0, \qquad \text{for } \beta = -\Phi^{*-1}_{22}\Phi^*_{21}a.$$
Thus, the matrix $G_{(1)} - G_{(2)}$ is indefinite unless special circumstances hold, which brings us to the consideration of part iv.
To prove part iv, we note that $J = \mathrm{diag}(J_1, J_2, \ldots, J_m)$, where $J_i = S_i^*(S_i^{*\prime}S_i^*)^{-1}S_i^{*\prime}$, for $i = 1, 2, \ldots, m$, and each is symmetric idempotent of rank $m_i + G_i$. Let $E_i$ be their respective matrices of characteristic vectors and partition $E_i = (E_{i1}, E_{i2})$, such that $E_{i1}$ corresponds to the unit (characteristic) roots and $E_{i2}$ corresponds to the zero roots. Define
$$E_{(1)} = \mathrm{diag}(E_{11}, E_{21}, \ldots, E_{m1}), \qquad E_{(2)} = \mathrm{diag}(E_{12}, E_{22}, \ldots, E_{m2}),$$
and note that $E = (E_{(1)}, E_{(2)})$. Consequently,
$$\Phi^*_{12} = E'_{(1)}\Phi E_{(2)} = (\sigma_{ij}E'_{i1}E_{j2}), \qquad (2.80)$$
and we see that $\Phi^*_{12} = 0$ if $\sigma_{ij} = 0$, or if $E'_{i1}E_{j2} = 0$, for all $i$ and $j$. It is interesting to note that if all equations of the system are just identified, the condition $E'_{i1}E_{j2} = 0$ obviously does hold;$^{15}$ conversely, if $E'_{i1}E_{j2} = 0$, for all $i$ and $j$, then all equations in the system are just identified.

$^{15}$ A particularly simple way of justifying this claim is to note that, in the case where all equations are just identified, $J_s = I_G$, for all $s = 1, 2, \ldots, m$; thus, $\Phi - J\Phi J = 0$!

To see this, let $\nu_s$ denote the dimension of the column null space, i.e. the nullity, of $E'_{s1}$. By the condition above we have
$$E'_{11}E_{s2} = 0, \qquad \text{for all } s.$$
Since $E_{12}$ is a basis for the column null space of $E'_{11}$, there exists a unique matrix, say $C_{s2}$, such that $E_{s2} = E_{12}C_{s2}$, and moreover, we must have $\nu_s \le \nu_1$. Repeating this argument with $E_{21}$, $E_{31}$, etc., we conclude that
$$\nu_1 \ge \nu_s,\quad \nu_2 \ge \nu_s,\quad \ldots,\quad \nu_m \ge \nu_s, \qquad \text{for } s = 1, 2, \ldots, m.$$
But this implies that $\nu_s = \nu_1$, for all $s$. Similarly, from the condition
$$E'_{12}E_{s1} = 0, \qquad \text{for all } s,$$
we conclude that $E_{s1} = E_{11}C_{s1}$ and, moreover, that the dimension of the column null space of $E'_{12}$ is equal to that of the column null space of $E'_{s2}$, for all $s$. This means that the matrices $C_{s1}$
are square and nonsingular and have the same dimension for all $s$; the same is true of the matrices $C_{s2}$. Using the results above we find
$$J_sE_{11} = E_{11}, \qquad \text{for all } s, \qquad (2.81)$$
which implies that $E_{11}$ is the matrix of characteristic vectors corresponding to the unit roots of $J_s$, for all $s$. By a similar argument we establish that $J_sE_{12} = 0$, for all $s$. We have now established that $E_s = E_1C_s$, where $C_s = \mathrm{diag}(C_{s1}, C_{s2})$; since we must also have $E'_sE_s = E_sE'_s = I_G$, we conclude that the block components of $C_s$ must each be orthogonal matrices. Consequently, since
$$J_s = E_s\begin{bmatrix} I_{m_s+G_s} & 0 \\ 0 & 0 \end{bmatrix}E'_s = E_1C_s\begin{bmatrix} I_{m_s+G_s} & 0 \\ 0 & 0 \end{bmatrix}C'_sE'_1,$$
we conclude that $J_s = J_1$, for all $s$. Moreover,
$$J_sS_i^* = S_i^*, \qquad \text{for all } s, i. \qquad (2.82)$$
Since all predetermined variables of the system appear in at least one equation, the part of Eq. (2.82) above that reads $J_1R'L_{2i} = R'L_{2i}$, for all $i$, implies $J_1 = I$, or $J_i = I$, for all $i$. In turn, this result leads to the conclusion that $S_i$ is a square invertible matrix, for every $i$, or that every equation of the system is just identified.
Finally, as to part iv.3, we note that if, whenever $\sigma_{ij} \ne 0$, the $i$th and $j$th equations are just identified, then from the discussion above $E'_{i1}E_{j2} = 0$, so that $\Phi^*_{12} = 0$.
q.e.d.
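Theorem 5 lends itself to a simple numerical illustration. The sketch below builds a block diagonal $S^* = \mathrm{diag}(S_1^*, \ldots, S_m^*)$, as in the proof of part iv, forms $G_{(1)}$, $G_{(2)}$, $G_{(3)}$ from the formulas of Theorem 4, and inspects the smallest eigenvalues of the differences. All dimensions and matrices are made up for illustration; one should find the first two differences nonnegative up to roundoff, and the third strictly negative only in the generic (non-diagonal $\Sigma$) case:

```python
# Toy illustration of Theorem 5: G1 - G3 >= 0 and G2 - G3 >= 0 always,
# while G1 - G2 is indefinite for a generic Sigma but becomes positive
# semidefinite when Sigma is diagonal (part iv.1).
import numpy as np

rng = np.random.default_rng(2)
m, G = 3, 4
ks = [2, 3, 4]                                   # k_i = m_i + G_i per equation

Sstar = np.zeros((m * G, sum(ks)))               # S* = diag(S*_1, ..., S*_m)
r = c = 0
for k in ks:
    Sstar[r:r + G, c:c + k] = rng.normal(size=(G, k))
    r += G; c += k

D = np.linalg.inv(np.eye(m) - np.triu(rng.normal(size=(m, m)), 1))
R = np.linalg.cholesky(np.eye(G) + 0.3 * np.ones((G, G)))
DR = np.kron(D, np.linalg.inv(R))                # D ⊗ R^{-1}
J = Sstar @ np.linalg.solve(Sstar.T @ Sstar, Sstar.T)   # projection on col(S*)

def covs(Sigma):
    """G1, G2, G3 per Theorem 4, given Sigma (Phi = Sigma ⊗ I_G)."""
    Phi = np.kron(Sigma, np.eye(G))
    mid = Sstar @ np.linalg.solve(Sstar.T @ np.linalg.solve(Phi, Sstar), Sstar.T)
    return DR.T @ Phi @ DR, DR.T @ J @ Phi @ J @ DR, DR.T @ mid @ DR

eigmin = lambda M: np.linalg.eigvalsh((M + M.T) / 2).min()

W = rng.normal(size=(m, m))
Sig = W @ W.T + m * np.eye(m)                    # generic positive definite Sigma
for Sigma, label in [(Sig, "generic Sigma"), (np.diag(np.diag(Sig)), "diagonal Sigma")]:
    G1, G2, G3 = covs(Sigma)
    # first two values are >= 0 up to roundoff; the third is < 0 unless Phi*_21 = 0
    print(label, eigmin(G1 - G3), eigmin(G2 - G3), eigmin(G1 - G2))
```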
Remark 4. The results of Theorem 5 appear to be partly counterintuitive, particularly those in parts iii and iv. One would be tempted to think that if "more information is used", as in the case of RRF(2SLS), one would gain some efficiency over the case where "less information is used", as in the case of the URF. A little reflection, however, would show that the conclusions of this theorem do not violate this intuition. The trouble is that we are not necessarily, or unambiguously, using "more information" in the case of RRF(2SLS) vis-à-vis the case of the URF. We remind the reader that, in this context, there are two types of information; one is sample information, the other is "a priori" information. In the URF we are using all sample information in estimating every element of the parameter matrix $\Pi$, but none of the prior information, i.e. we do not use the fact that $\Pi = CD$ and, thus, we do not use any of the prior restrictions placed on the elements of these structural matrices. By contrast, RRF(2SLS) uses all prior information, i.e. uses the fact that $\Pi = CD$, and respects all prior restrictions on the structural matrices, but does not necessarily use all sample information.$^{16}$ Since the two procedures leave out, or fail to use, a different subset of the information available, it is not surprising that our formal discussion finds that it is not possible to place an unambiguous ranking on the relative efficiency of the URF and RRF(2SLS) estimators.

$^{16}$ In this context, it is the failure of 2SLS to use all sample information that renders it inefficient relative to the 3SLS estimation procedure.
It is now almost anticlimactic to discuss the limiting distribution and ranking for various forecasting procedures.
Theorem 6. Under the conditions of Theorem 5, for $\tau \ge 1$, and on the assertion that the sample size, $T$, is sufficiently large, the following statements are true:$^{17}$

For static models:

i.1 $\sqrt{T}\,\mathrm{vec}((\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{URF}) \sim N(0, \bar G_{(1)})$, where $\bar G_{(1)} = (I \otimes p_{T+\tau\cdot})G_{(1)}(I \otimes p_{T+\tau\cdot})'$;

i.2 $\sqrt{T}\,\mathrm{vec}((\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{RRF(2SLS)}) \sim N(0, \bar G_{(2)})$, where $\bar G_{(2)} = (I \otimes p_{T+\tau\cdot})G_{(2)}(I \otimes p_{T+\tau\cdot})'$;

i.3 $\sqrt{T}\,\mathrm{vec}((\hat Y_{T+\tau\cdot} - \bar Y_{T+\tau\cdot})_{RRF(3SLS)}) \sim N(0, \bar G_{(3)})$, where $\bar G_{(3)} = (I \otimes p_{T+\tau\cdot})G_{(3)}(I \otimes p_{T+\tau\cdot})'$.

For dynamic models, the corresponding statements hold with $p_{T+\tau\cdot}$ containing the requisite lagged (forecast) values of the dependent variables.

$^{17}$ The statement of this theorem contains certain logical incongruities, viz. when we contemplate the distribution of the normalized deviations of the left members, we simultaneously want to take the "initial conditions", i.e. the $y_{T\cdot}, y_{T-1\cdot}, \ldots, y_{T-k+1\cdot}$, as given, but we also wish to utilize the limiting distribution of the underlying structural parameters, which, of course, depends on these very initial conditions! Thus, if one wished to be exceedingly puristic, one should not have stated these results as a "theorem". On the other hand, on rare occasions such as now, we may use the statement of a theorem in order to summarize prominently the results of a certain discussion. Thus, Theorem 6 is to be viewed in this light, and should be thought of as a framework for a good working approximation in empirical applications, and not as a rigorous derivation of the distribution of forecasts from a dynamic GLSEM. This is quite reasonable since, normally, one would expect that the number of lags, $k$, would be quite small relative to the number of observations, $T$.
Proof: The results follow immediately from Eqs. (2.75), (2.76), (2.77), and Theorem 4.
q.e.d.
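In applied work the content of Theorem 6 is simply a transformation of the reduced form covariance: given (an estimate of) $G_{(i)}$ and the forecast-period row vector $p_{T+\tau\cdot}$, the covariance matrix of the forecast is approximated by $(I \otimes p_{T+\tau\cdot})G_{(i)}(I \otimes p_{T+\tau\cdot})'/T$. A minimal sketch, with all inputs (dimensions, the stand-in for $G_{(i)}$, the row $p$) made up for illustration:

```python
# Sketch of the forecast covariance of Theorem 6:
# Gbar = (I ⊗ p) G (I ⊗ p)'; divide by T for the unnormalized forecast.
import numpy as np

rng = np.random.default_rng(3)
m, Gdim, T = 3, 4, 200                     # equations, predetermined vars, sample size
W = rng.normal(size=(m * Gdim, m * Gdim))
Gmat = W @ W.T                             # stand-in for G_(i) of Theorem 4
p = rng.normal(size=(1, Gdim))             # forecast-period row p_{T+tau.}

IP = np.kron(np.eye(m), p)                 # (I ⊗ p_{T+tau.}), an m x mG matrix
Gbar = IP @ Gmat @ IP.T                    # covariance of sqrt(T)*(forecast deviation)
se = np.sqrt(np.diag(Gbar) / T)            # approximate standard errors, one per equation
print(se)
```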
Remark 5. In view of our discussion thus far, it would be rather pedantic to set forth another theorem regarding the ranking of the various forecasting procedures in terms of relative efficiencies. It is quite apparent from Theorem 6 that these rankings follow exactly the results of Theorem 5. Thus, further discussion of this aspect is quite unnecessary.
Remark 6. If we wish to consider the behavior of the forecast error, i.e. if we examine $y_{T+\tau\cdot} - \hat y_{T+\tau\cdot}$ instead of $\hat y_{T+\tau\cdot} - \bar y_{T+\tau\cdot}$, then in the case of static models we ought to modify the results of Theorem 6 only to the extent of adding $\Omega$ to the covariance matrix of the relevant limiting distribution. This is so since $y_{T+\tau\cdot} = \bar y_{T+\tau\cdot} + v_{T+\tau\cdot}$.

In the case of dynamic models, however, the correction is more substantial, since from Eqs. (2.65) and (2.66)
$$y'_{T+\tau\cdot} = \bar y'_{T+\tau\cdot} + \sum_{i=1}^{\tau} B_i^{(\tau)}v'_{T+i\cdot} = \bar y'_{T+\tau\cdot} + \sum_{i=1}^{\tau} A_1^{(\tau-i)}v'_{T+i\cdot},$$
and the forecast error contains the additional term $\sum_{i=1}^{\tau} A_1^{(\tau-i)}v'_{T+i\cdot}$, whose covariance matrix is
$$\Omega(\tau) = \sum_{i=1}^{\tau} A_1^{(\tau-i)}\Omega A_1^{(\tau-i)\prime}.$$
Thus, the correction to be made to the relevant limiting distribution in dynamic cases is $\Omega(\tau)$. Two features of this situation are to be pointed out. First, for a given lag structure, $\Omega(\tau)$ is an increasing function of $\tau$. This can be seen most readily with the change in index $j = \tau - i$, so that $\Omega(\tau) = \sum_{j=0}^{\tau-1} A_1^{(j)}\Omega A_1^{(j)\prime}$. Second, no matter how well the conditional mean is forecast, the variance of the forecast error is bound to increase the farther away we move from the sample. This is largely a consequence of the fact that, even if the exogenous sequence is constant over the forecasting horizon, the other uncertainty component obeys $\Omega(\tau_2) > \Omega(\tau_1)$, whenever $\tau_2 > \tau_1$. This is a reflection of the fact that the farther away we move from the end of the sample, the less reliable become the "initial conditions" we employ in the forecast; hence, the uncertainty attached to the forecast error increases!