2.5 IV and Insufficient Sample Size
2.5.1 The Nature of the Problem
The 2SLS (and 3SLS) procedure creates "excessive" data demands, even though the number of parameters to be estimated in each structural equation may be quite small. The problem lies with the necessity of estimating the reduced form, which is an integral (conceptual) part of the two procedures. Thus, when we are dealing with a sizable model, the fact that the latter may contain quite a large number of predetermined variables may render it impossible to estimate its reduced form. This occurs because the matrix X, which is T × G, may be such that T − G is quite small or even negative! When this is so, of course, the reduced form may be estimated very poorly, or may not be obtainable at all. The first will eventuate when T − G ≥ 0 but small, and the second when T − G < 0, thus violating condition (A.1a) of Chapter 1.
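The dimensional difficulty is easy to exhibit numerically. The following is a minimal sketch (synthetic data and NumPy, not part of the text) showing that with T < G the matrix X'X is necessarily singular, so its ordinary inverse does not exist:

```python
import numpy as np

rng = np.random.default_rng(0)
T, G = 10, 15                       # T - G < 0: more predetermined variables than observations
X = rng.standard_normal((T, G))

XtX = X.T @ X                       # G x G, but rank(X'X) <= min(T, G) = T < G
print(np.linalg.matrix_rank(XtX))   # 10: X'X is singular, so (X'X)^{-1} does not exist
```

The same computation with T ≥ G (and X of full column rank) would give rank G, and the inverse would exist.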
When the problem was first encountered, applied econometricians responded with a number of ingenious, if futile, alternatives. One such alternative was the use of the generalized inverse. More precisely, for the case T − G < 0, it was proposed that since the entity
Π̂ = (X'X)^{-1} X'Y
does not exist (because the inverse (X'X)^{-1} does not exist), it should be replaced by
Π̄ = (X'X)_g X'Y = X_g Y, where (X'X)_g is the generalized inverse of X'X.
As is made clear in Ch. 3 of Dhrymes (1984), this approach obtains a (unique) solution at the cost of imposing certain restrictions on the parameter estimates. Such restrictions have no economic content or motivation and merely require that, of all matrices Π satisfying the condition X'XΠ = X'Y, the one chosen, Π̄, should have minimal norm!
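The minimal-norm character of the generalized-inverse "solution" can be verified numerically. In the sketch below (synthetic data; `numpy.linalg.pinv` stands in for the generalized inverse), Π̄ satisfies the normal equations X'XΠ = X'Y, but so does any perturbation lying in the null space of X, and Π̄ has the smallest norm among all such solutions:

```python
import numpy as np

rng = np.random.default_rng(1)
T, G, m = 8, 12, 2                  # T - G < 0, so (X'X)^{-1} does not exist
X = rng.standard_normal((T, G))
Y = rng.standard_normal((T, m))

# generalized-inverse "reduced form": Pi_bar = (X'X)_g X'Y = X_g Y
Pi_bar = np.linalg.pinv(X) @ Y

# (i) it satisfies the normal equations X'X Pi = X'Y ...
assert np.allclose(X.T @ X @ Pi_bar, X.T @ Y)

# (ii) ... but so does Pi_bar plus anything in the null space of X,
# and among all such solutions Pi_bar has minimal (Frobenius) norm
N = np.eye(G) - np.linalg.pinv(X) @ X        # projector onto null(X)
Pi_alt = Pi_bar + N @ rng.standard_normal((G, m))
assert np.allclose(X.T @ X @ Pi_alt, X.T @ Y)
assert np.linalg.norm(Pi_bar) <= np.linalg.norm(Pi_alt)
```

The non-uniqueness exhibited in (ii) is precisely the arbitrariness the text objects to: the minimal-norm restriction that singles out Π̄ has no economic motivation.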
Another proposal was to use the principal components of the variables in X for estimating the reduced form. Without going into the details of the definition and derivation of principal components, let us say, for the purposes of this discussion, that the process of obtaining principal components entails the extraction of the characteristic roots of X'X and their associated characteristic vectors; subsequently, one chooses a number of these characteristic vectors, say k*, which must be "much less" than T (typically only a small fraction of T), corresponding to the k* largest characteristic roots. Let these characteristic vectors be denoted by E_{k*}. The principal components, for the purposes of this discussion, may be defined by W_{k*} = X E_{k*}. One then estimates Γ̂^{(k*)} = (W_{k*}'W_{k*})^{-1} W_{k*}'Y, and for the explanatory dependent variables appearing in the structural equations, say Y_i, one substitutes Ŷ_i = W_{k*} Γ̂_i^{(k*)}. Thereafter one follows the standard 2SLS procedure.²³ Of course, these procedures can be shown to be consistent only by showing that, as the sample increases in size, the need for them disappears, so that we ultimately revert to the standard 2SLS technique.²⁴ Needless to say, the same may be said of taking k* = T.
This choice implies that Ŷ = W_T Γ̂^{(T)} = Y, so that the principal components procedure with k* = T is simply an elaborate process for applying Ordinary Least Squares (OLS) in order to estimate the parameters of a structural equation(s). The fact that, as the sample size increases, we move toward the standard 2SLS procedure cannot possibly be used to defend the use of this procedure in any instance where T < G!
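The collapse of the principal-components device to OLS when k* = T can be checked directly. The sketch below (synthetic data, purely illustrative) extracts the characteristic roots and vectors of X'X, forms W_{k*} = X E_{k*}, and confirms that with k* = T the "fitted" values reproduce y exactly, while with k* < T they do not:

```python
import numpy as np

rng = np.random.default_rng(2)
T, G = 10, 15                        # T < G, the problematic case
X = rng.standard_normal((T, G))
y = rng.standard_normal((T, 1))

# characteristic roots/vectors of X'X, ordered largest first
roots, vectors = np.linalg.eigh(X.T @ X)
vectors = vectors[:, np.argsort(roots)[::-1]]

def pc_fit(k_star):
    E = vectors[:, :k_star]          # k* leading characteristic vectors E_{k*}
    W = X @ E                        # principal components W_{k*} = X E_{k*}
    Gamma = np.linalg.solve(W.T @ W, W.T @ y)
    return W @ Gamma                 # fitted values W_{k*} Gamma_hat^{(k*)}

# with k* = T the T components span the whole sample space, so the
# "fitted" values reproduce y exactly -- the procedure is just OLS
assert np.allclose(pc_fit(T), y)
# with k* < T they do not
assert not np.allclose(pc_fit(3), y)
```

Since the first stage with k* = T leaves the explanatory dependent variables unchanged, the "2SLS" second stage is simply OLS on the structural equation, as the text remarks.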
2.5.2 Iterated Instrumental Variables (IIV)
The resolution of the problem encountered in the previous section is contained in the observation that, while collectively the number of predetermined variables in a model may be quite large, the number of parameters to be estimated in each structural equation is typically rather small.
The method of IIV proceeds as follows: let X be the matrix of predetermined variables in the system as a whole, and let it be desired to estimate the parameters of the i-th structural equation. It is assumed that it is always possible to choose instrumental variables (columns of X) for every equation. To this effect, choose

X_{0i} = X L_{0i} = X(L_{1i}, L_{2i}),   (2.123)

where L_{1i} is of dimension G × m_i and corresponds to the choice of instruments in lieu of the dependent variables in Y_i, and L_{2i} corresponds to choosing as instruments the predetermined variables in X_i. Thus, in this scheme, we always
²³ Since there are as many well-defined characteristic vectors as the rank of the matrix X'X, it follows that the recommended procedure fails to utilize some of the information contained in the X matrix. Thus, it would appear that we wish to advance matters simply by ignoring some of the relevant information!
²⁴ This is the justification given, e.g., in Dhrymes (1970), which is fairly representative of the prevailing view at the time.
choose as instruments the predetermined variables appearing in the structural equation dealt with. Given the choice of instruments, we proceed in three steps: (a) we obtain the initial IV estimators
δ̂_i^{(0)} = (X_{0i}'Z_i)^{-1} X_{0i}'y_{·i},   i = 1, 2, …, m;   (2.124)

(b) subsequently, we estimate the restricted reduced form

Π̂^{(0)} = Ĉ^{(0)} D̂^{(0)},   Ŷ^{(0)} = X Π̂^{(0)},   Ẑ_i^{(0)} = (Ŷ_i^{(0)}, X_i);

(c) we obtain the IIV estimator as

δ̂_i^{(1)} = (Ẑ_i^{(0)'} Z_i)^{-1} Ẑ_i^{(0)'} y_{·i}.   (2.125)

Although this estimator may be further iterated, i.e., we may now recompute the (restricted) reduced form, obtaining the new estimator Π̂^{(1)}, and thus the second iterate δ̂_i^{(2)}, and so on until convergence, nothing will be gained (asymptotically) by iterating to convergence, since the first iterate, i.e. the estimator in Eq. (2.125), has exactly the same asymptotic properties as the converged iterate.
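Steps (a) through (c) can be sketched numerically for a hypothetical two-equation system; the coefficient values, the instrument choices, and all variable names below are illustrative, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20000

# A hypothetical two-equation system (numbers purely illustrative):
#   y1 = 0.5*y2 + 1.0*x1 + u1
#   y2 = 0.3*y1 + 1.0*x2 + u2
Bc = np.array([[0.0, 0.5],
               [0.3, 0.0]])          # coefficients of the dependent variables
C  = np.eye(2)                       # coefficients of the predetermined variables
X  = rng.standard_normal((T, 2))
U  = rng.standard_normal((T, 2))
Y  = np.linalg.solve(np.eye(2) - Bc, C @ X.T + U.T).T   # reduced-form data

x1, x2 = X[:, [0]], X[:, [1]]
y1, y2 = Y[:, [0]], Y[:, [1]]

def iv(X0, Z, y):
    """delta_hat = (X0'Z)^{-1} X0'y."""
    return np.linalg.solve(X0.T @ Z, X0.T @ y)

# (a) initial IV estimators; instruments drawn from the predetermined set
Z1, X01 = np.hstack([y2, x1]), np.hstack([x2, x1])
Z2, X02 = np.hstack([y1, x2]), np.hstack([x1, x2])
d1_0, d2_0 = iv(X01, Z1, y1), iv(X02, Z2, y2)

# (b) restricted reduced form Pi^(0) built from the structural estimates
Bc_hat = np.array([[0.0,        d1_0[0, 0]],
                   [d2_0[0, 0], 0.0       ]])
C_hat  = np.diag([d1_0[1, 0], d2_0[1, 0]])
Pi0    = np.linalg.solve(np.eye(2) - Bc_hat, C_hat).T
Y_hat  = X @ Pi0

# (c) the IIV (first-iterate) estimator for equation 1
Z1_hat = np.hstack([Y_hat[:, [1]], x1])
d1_1   = iv(Z1_hat, Z1, y1)
print(d1_1.ravel())                  # close to the true (0.5, 1.0) for large T
```

Only the first iterate is computed; by the asymptotic-equivalence result discussed in the text, further iterations of steps (b) and (c) would gain nothing in the limit.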
This is a single equation estimator and hence does not take advantage of relevant information that may be contained in other equations. Thus, it is more appropriately termed the limited information iterated instrumental variables (LIIV) estimator.
Evidently, we can define an "efficient" estimator by analogy to 3SLS. In particular, we may repeat steps (a), (b) and (c) of the LIIV estimator, but at stage (c) we also obtain estimators of the covariance matrix parameters

σ̃_{ij} = (1/T)(y_{·i} − Z_i δ̂_{·i})'(y_{·j} − Z_j δ̂_{·j}),   (2.126)

and the instrumental matrix

(Σ̃^{-1} ⊗ I) Ẑ*,  where  Ẑ* = diag(Ẑ_1, Ẑ_2, …, Ẑ_m);   (2.127)

finally, (d) we obtain the "efficient" estimator as

δ̂ = [Ẑ*'(Σ̃^{-1} ⊗ I) Z*]^{-1} Ẑ*'(Σ̃^{-1} ⊗ I) y,

where we have written the entire system not in CSF but as

y = Z*δ + u,   Z* = diag(Z_1, Z_2, …, Z_m).   (2.128)

Again by analogy with 3SLS, we term this the full information iterated instrumental variables (FIIV) estimator. We shall justify this terminology below, where we show that FIIV is efficient relative to the LIIV estimator. We now establish the salient properties of these estimators.
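The FIIV formula can be checked schematically with synthetic matrices; the Ẑ_i below are arbitrary stand-ins, not derived from a structural model. As a sanity check, when Σ̃ = I the FIIV estimator collapses to equation-by-equation LIIV, just as 3SLS collapses to 2SLS:

```python
import numpy as np

rng = np.random.default_rng(5)
T, m, k = 30, 2, 2                  # k regressors per equation (illustrative)

# hypothetical per-equation regressors Z_i, instruments Zhat_i, and y_i
Z    = [rng.standard_normal((T, k)) for _ in range(m)]
Zhat = [Z[i] + 0.1 * rng.standard_normal((T, k)) for i in range(m)]
y    = [rng.standard_normal((T, 1)) for _ in range(m)]

def blockdiag(mats):
    """Z* = diag(Z_1, ..., Z_m) assembled explicitly."""
    rows = sum(A.shape[0] for A in mats); cols = sum(A.shape[1] for A in mats)
    out = np.zeros((rows, cols)); r = c = 0
    for A in mats:
        out[r:r + A.shape[0], c:c + A.shape[1]] = A
        r += A.shape[0]; c += A.shape[1]
    return out

Zs, Zhs = blockdiag(Z), blockdiag(Zhat)
ys = np.vstack(y)

def fiiv(Sigma):
    """delta_hat = [Zhat*'(Sigma^{-1} (x) I) Z*]^{-1} Zhat*'(Sigma^{-1} (x) I) y."""
    W = np.kron(np.linalg.inv(Sigma), np.eye(T))
    return np.linalg.solve(Zhs.T @ W @ Zs, Zhs.T @ W @ ys)

# with Sigma = I the FIIV estimator reduces to equation-by-equation LIIV
d_fiiv = fiiv(np.eye(m))
d_liiv = np.vstack([np.linalg.solve(Zhat[i].T @ Z[i], Zhat[i].T @ y[i])
                    for i in range(m)])
assert np.allclose(d_fiiv, d_liiv)
```

With an estimated Σ̃ (from step (c) residuals) in place of the identity, the same function yields the FIIV estimate proper.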
Theorem 10. Consider the GLSEM of Chapter 1, subject to conditions (A.1) through (A.5), and as exhibited in Eq. (2.128) above. Then, the following statements are true:
i. the LIIV estimator, as discussed above, is consistent;
ii. the LIIV estimator is asymptotically equivalent to the 2SLS estimator, in the sense that their limiting distributions are identical;
iii. the FIIV estimator, as described above, is consistent;
iv. the FIIV estimator is asymptotically equivalent to the 3SLS estimator, in the sense that their limiting distributions are identical;
v. the FIIV estimator is efficient relative to the LIIV estimator.
Proof: In proving parts i and ii we shall employ the systemwide form of the LIIV estimator. From the description of the estimator above, the instrumental matrix is Ẑ* = diag(Ẑ_1, …, Ẑ_m). Thus, the systemwide LIIV estimator is given by
δ̂ = (Ẑ*'Z*)^{-1} Ẑ*'y. Since Ẑ_i = X(Π̂_i, L_{2i}) = X S̄_i, and setting S̄ = diag(S̄_1, …, S̄_m), we see that the LIIV estimator above may also be rendered as
√T(δ̂ − δ)_{LIIV} = [S̄'(I ⊗ X'X/T) S̄]^{-1} S̄'(1/√T)(I ⊗ X')u,   (2.129)

which shows (a) that the LIIV estimator is consistent, and, comparing with Eqs. (2.3) and (2.4), (b) that its limiting distribution is identical with that of the 2SLS estimator.
To prove iii and iv we note that, since Ẑ* = (I ⊗ X)S̄, the FIIV estimator may be rendered as
√T(δ̂ − δ)_{FIIV} = [S̄'(Σ̃^{-1} ⊗ X'X/T) S̄]^{-1} S̄'(Σ̃^{-1} ⊗ I)(1/√T)(I ⊗ X')u.   (2.130)

Moreover, since Π̂ converges in probability to Π, and Σ̃ converges in probability to Σ, we see that the FIIV estimator is consistent and, comparing with Eqs. (2.7) and (2.8), that its limiting distribution is identical with that of the 3SLS estimator.
As for part v, this follows from parts ii and iv and Theorem 3.
q.e.d.
Corollary 2. Neither the LIIV nor the FIIV estimator is improved, in its asymptotic properties, by iteration.
Proof: From the representations in Eqs. (2.129) and (2.130) it is clear that only the consistency of Π̂ and/or Σ̃ enters into the argument for the consistency or asymptotic normality of the LIIV and FIIV estimators.
Hence, how Π and Σ are initially estimated is immaterial.
q.e.d.
Remark 11. The corollary should not convey the impression that how we choose the initial instruments is immaterial; only that this is so in the limit, i.e. as the sample increases indefinitely. Nonetheless, it is good practice, in finite samples, to choose one's initial instruments with some care and to iterate the procedure at least once, so as to overcome any adverse effects of an inept initial instrument selection.