Corollary 1. In the context of Theorem 4, the system as a whole is identified by the exclusion and normalization conditions $L_0^{*\prime}\operatorname{vec}(A^*) = \operatorname{vec}(H)$.
4.1 The "Concentrated" Likelihood Function
4.1.1 A Subset of m* Structural Equations
As we pointed out in the previous chapter, LIML is an estimation procedure that uses a priori information pertaining only to the equation (or equations) whose parameters we are interested in estimating. A priori restrictions on the parameters of the remaining equations are completely ignored.
Because LIML is a relatively complex procedure, let us give an outline of the estimation strategy before actually deriving the LIML estimator.
Thus, suppose we are interested in estimating the parameters of a subset of $m^*\;(\leq m)$ equations. Or, alternatively, suppose that, because of the cumbersome nature of the system we must solve in order to obtain FIML estimators, we wish to break up the system into small subsystems, thus reducing the computational complexity involved, and estimate the parameters of each subsystem separately. In doing so, of course, we cannot ignore the remainder of the system. On the other hand, we cannot handle all equations symmetrically, as we do with FIML, for then no computational economy arises. The strategy is to concentrate on the parameters of the subsystem of interest without being forced to ignore a great deal of information in the process. How can this be done? We could eliminate the parameters of the remainder of the system by partially maximizing the likelihood function with respect to such parameters. Of course, proceeding in this fashion we shall ignore all a priori restrictions applicable to them.
This last remark is essential. First, by ignoring such a priori restrictions we greatly simplify the process of maximization, and second, this feature constitutes the essential distinction between FIML and LIML.
Having expressed the likelihood function in "concentrated" form, i.e. in a form involving only the parameters of the subsystem of interest, we proceed to obtain maximum likelihood estimators in much the same way as in FIML. We maximize the "concentrated" likelihood function with respect to the unknown parameters it contains, taking the a priori restrictions on the subsystem in question fully into account.
Suppose that we are interested in the parameters of the first $m^*\;(\leq m)$ equations. Partition the covariance matrix of the error terms of the system as

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \qquad (4.1)$$

where $\Sigma_{11}$ is the covariance matrix of the error terms appearing in the first $m^*$ equations, $\Sigma_{22}$ is the covariance matrix corresponding to the error terms appearing in the remaining $(m - m^*)$ equations, and $\Sigma_{12}$, $\Sigma_{21}$ are the appropriate "cross covariance" matrices.
Next, we wish to transform the system in such a way that
i. the two subsystems are mutually independent;
ii. we do not disturb the parameters of the subsystem of interest.
If condition i is satisfied, the information "lost" by not explicitly taking it into account is "minimized," since the second subsystem is unrelated (stochastically) to the first. If ii is also satisfied, it is possible to obtain estimators for the parameters of interest directly.
Of course, we should point out that a price is still being paid for this simplification: the maximizing values of the parameters of the second (ignored) subsystem do not necessarily satisfy the a priori restrictions on them. Thus, when used to obtain the concentrated likelihood function, these values entail LIML estimators that will generally be different from what they would have been if all a priori restrictions were respected in the estimation scheme. If the latter holds, we are, of course, reduced to FIML estimation.
Let us now see whether a transformation that accomplishes i and ii actually exists. Recall that $A^* = (B^{*\prime}, -C')'$ and partition $A^* = (a_0, A_0)$, so that $a_0$ contains the parameters of the first $m^*$ equations of interest, i.e. it consists of the first $m^*$ columns of $A^*$. If

$$H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}$$

is the (conformably partitioned) transforming matrix, and if it is to accomplish our objectives under conditions i and ii above, we must have

$$A^*H = (a_0, A_0^*), \qquad H'\Sigma H = \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & P \end{bmatrix}, \qquad (4.2)$$

for some positive definite matrix $P$.
To make the requirements on $H$ perfectly transparent, we note that the sample on the structural model can be written as $ZA^* = U$, $Z = (Y, X)$. Transforming by $H$ on the right,1 we find

$$ZA^*H = UH, \qquad (4.3)$$
and from the set of requirements in Eq. (4.2) we infer that the transforming matrix, $H$, must (at least) be of the form

$$H = \begin{bmatrix} I & H_{12} \\ 0 & H_{22} \end{bmatrix}. \qquad (4.4)$$
The covariance matrix of the transformed system is

$$H'\Sigma H = \begin{bmatrix} \Sigma_{11} & \Sigma_{11}H_{12} + \Sigma_{12}H_{22} \\ H_{12}'\Sigma_{11} + H_{22}'\Sigma_{21} & \bar{\Sigma}_{22} \end{bmatrix}, \qquad (4.5)$$

where $\bar{\Sigma}_{22} = H_{12}'\Sigma_{11}H_{12} + H_{22}'\Sigma_{21}H_{12} + H_{12}'\Sigma_{12}H_{22} + H_{22}'\Sigma_{22}H_{22}$.
By the requirements of Eq. (4.2), and the representation in Eq. (4.5), we see that H12 must be chosen so that
(4.6) But this means
H 12 = -~li1~12H22' (4.7)
Finally, substituting Eq. (4.7) in the lower right hand block of Eq. (4.5), we find

$$\bar{\Sigma}_{22} = H_{22}'(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})H_{22}.$$
Since $\Sigma$ is positive definite, so is $\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$. Thus, there exists a matrix, say $H_{22}^{-1}$, such that

$$H_{22}^{\prime\,-1}H_{22}^{-1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. \qquad (4.8)$$

In particular, we may choose $H_{22}$ so that the second equation in Eq. (4.2) is satisfied with $P = I$, simply by choosing $H_{22}$ so as to satisfy Eq. (4.8).
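To make the construction of $H$ concrete, here is a minimal numerical sketch (not part of the original argument; it assumes NumPy, an arbitrary positive definite $\Sigma$, and illustrative dimensions $m = 4$, $m^* = 2$). It builds $H_{22}$ from a Cholesky factor of the Schur complement as in Eq. (4.8), sets $H_{12}$ by Eq. (4.7), and checks that $H'\Sigma H = \operatorname{diag}(\Sigma_{11}, I)$, i.e. that conditions i and ii hold with $P = I$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, m_star = 4, 2                                  # m equations, the first m* of interest

# An arbitrary positive definite Sigma, partitioned as in Eq. (4.1)
R = rng.standard_normal((m, m))
Sigma = R @ R.T + m * np.eye(m)
S11, S12 = Sigma[:m_star, :m_star], Sigma[:m_star, m_star:]
S21, S22 = Sigma[m_star:, :m_star], Sigma[m_star:, m_star:]

# A choice of H22 satisfying Eq. (4.8): H22'^{-1} H22^{-1} = S22 - S21 S11^{-1} S12
schur = S22 - S21 @ np.linalg.solve(S11, S12)
L = np.linalg.cholesky(schur)                     # schur = L L'
H22 = np.linalg.inv(L.T)                          # then H22' schur H22 = I, Eq. (4.11)

# H12 from Eq. (4.7), and the full transformation of Eq. (4.10)
H12 = -np.linalg.solve(S11, S12) @ H22
H = np.block([[np.eye(m_star), H12],
              [np.zeros((m - m_star, m_star)), H22]])

# The transformed covariance matrix is block diagonal: diag(Sigma_11, I)
target = np.block([[S11, np.zeros((m_star, m - m_star))],
                   [np.zeros((m - m_star, m_star)), np.eye(m - m_star)]])
assert np.allclose(H.T @ Sigma @ H, target)
```

Any other factorization of the Schur complement would serve equally well; the choice of $H_{22}$ satisfying Eq. (4.8) is not unique.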
We further note that, with this choice of $H$, the transformed coefficient matrix becomes

$$A^*H = \left(a_0,\; (A_0 - a_0\Sigma_{11}^{-1}\Sigma_{12})H_{22}\right). \qquad (4.9)$$

Thus, we have established that there exists a matrix $H$ satisfying conditions i and ii above, and it is of the form

$$H = \begin{bmatrix} I & -\Sigma_{11}^{-1}\Sigma_{12}H_{22} \\ 0 & H_{22} \end{bmatrix}, \qquad (4.10)$$
1 As an exercise, the reader might ask himself: Why transform on the right and not, for example, on the left? Hint: Recall that we are dealing with observations on the vectors $z_{t\cdot}$, i.e. $z_{t\cdot}A^* = u_{t\cdot}$.
where $H_{22}$ is chosen so that

$$H_{22}'(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})H_{22} = I. \qquad (4.11)$$

Now, what is the LF of the transformed system? Since $\{u_{t\cdot}: t = 1, 2, \ldots, T\}$ is an i.i.d. sequence of zero mean normal vectors with covariance matrix $\Sigma > 0$, we conclude that the vectors $H'u_{t\cdot}'$ have the same properties, except that

$$\operatorname{Cov}(H'u_{t\cdot}') = H'E(u_{t\cdot}'u_{t\cdot})H = H'\Sigma H. \qquad (4.12)$$

The joint density of these vectors is
$$p(u_{1\cdot}H, \ldots, u_{T\cdot}H) = (2\pi)^{-mT/2}\,|H'\Sigma H|^{-T/2}\exp\left[-\frac{1}{2}\sum_{t=1}^{T}(u_{t\cdot}H)(H^{-1}\Sigma^{-1}H'^{-1})(u_{t\cdot}H)'\right]. \qquad (4.13)$$

Consider the transformation
$$u_{t\cdot}H = y_{t\cdot}B^*H - x_{t\cdot}CH, \qquad (4.14)$$

and note that the Jacobian is given by

$$J = |H'B^{*\prime}B^*H|^{1/2}. \qquad (4.15)$$

Hence the LF of the current endogenous variables is

$$L(A^*, H, \Sigma; Y, X) = -\frac{mT}{2}\ln(2\pi) - \frac{T}{2}\ln|H'\Sigma H| + \frac{T}{2}\ln|H'B^{*\prime}B^*H| - \frac{1}{2}\sum_{t=1}^{T}(z_{t\cdot}A^*H)(H'\Sigma H)^{-1}(z_{t\cdot}A^*H)', \qquad (4.16)$$
where $z_{t\cdot} = (y_{t\cdot}, x_{t\cdot})$. Using the notation in Eqs. (4.3) and (4.4), we can rewrite the last term of the right hand member of Eq. (4.16) as

$$\sum_{t=1}^{T}(z_{t\cdot}A^*H)(H'\Sigma H)^{-1}(z_{t\cdot}A^*H)' = \operatorname{tr}(ZA^*H)(H'\Sigma H)^{-1}(ZA^*H)'. \qquad (4.17)$$

Our next step is to use the properties of the transforming matrix $H$ so as to derive a notation that separates, as far as possible, the parameters of interest from those in which we have no interest. Thus, partition
$$A^*H = \begin{bmatrix} B^*H \\ -CH \end{bmatrix} = \begin{bmatrix} B_I & B_{II} \\ C_I & C_{II} \end{bmatrix} = (a_0, A_0^*). \qquad (4.18)$$

The obvious meaning is that

$$B^{**} = B^*H = (B_I, B_{II}), \qquad a_0 = \begin{bmatrix} B_I \\ C_I \end{bmatrix}, \qquad -CH = (C_I, C_{II}), \qquad (4.19)$$

and $A_0^* = (B_{II}', C_{II}')'$. We have, therefore,

$$\operatorname{tr}(ZA^*H)(H'\Sigma H)^{-1}(ZA^*H)' = T\operatorname{tr}\begin{bmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} a_0' \\ A_0^{*\prime} \end{bmatrix}M\begin{bmatrix} a_0' \\ A_0^{*\prime} \end{bmatrix}' = T\operatorname{tr}(\Sigma_{11}^{-1}a_0'Ma_0) + T\operatorname{tr}(A_0^{*\prime}MA_0^*), \qquad (4.20)$$
where $M = (Z'Z/T)$. We may, therefore, write the LF as

$$L(a_0, \Sigma_{11}, A_0^*; Y, X) = c - \frac{T}{2}\ln|\Sigma_{11}| + \frac{T}{2}\ln|B^{**\prime}B^{**}| - \frac{T}{2}\operatorname{tr}(\Sigma_{11}^{-1}a_0'Ma_0) - \frac{T}{2}\operatorname{tr}(A_0^{*\prime}MA_0^*), \qquad (4.21)$$

where $c = -(mT/2)\ln(2\pi)$. We observe that the transformation has simplified matters considerably, in that the parameters of interest, $(a_0, \Sigma_{11})$, are almost completely segregated from those in which we have no interest, namely, $A_0^*$. A substantial nonlinearity still remains, however, in that the Jacobian term contains a submatrix of $a_0$, viz. $B_I$, and a submatrix of $A_0^*$, viz. $B_{II}$. Our next step is to eliminate the nuisance parameters in $A_0^*$ by partially maximizing Eq. (4.21) with respect to them, and substituting therein their maximizing values. We stress again that in maximizing with respect to $A_0^*$ we neglect all a priori restrictions on its elements; thus, we are left to deal with a smaller, and perhaps simpler, system of equations in obtaining an estimator for $a_0$.
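Before turning to the first order conditions, the decomposition used in Eqs. (4.20)-(4.21) can be checked numerically. The sketch below is illustrative only; it assumes NumPy, simulated $Z$, and arbitrary trial values for $a_0$, $A_0^*$, and $\Sigma_{11}$ (none of which are estimates), and verifies that the weighted trace term splits into the two pieces shown in Eq. (4.20).

```python
import numpy as np

rng = np.random.default_rng(1)
T, m, m_star, G = 50, 4, 2, 3

Z = rng.standard_normal((T, m + G))              # observations on z_t. = (y_t., x_t.)
M = Z.T @ Z / T                                  # M = Z'Z/T

a0  = rng.standard_normal((m + G, m_star))       # trial value for a_0 (first m* columns of A*H)
A0s = rng.standard_normal((m + G, m - m_star))   # trial value for A_0^*
S11 = np.eye(m_star) + 0.3 * np.ones((m_star, m_star))  # an illustrative Sigma_11 > 0

# Left member of (4.20): tr[(Z A*H)(H' Sigma H)^{-1}(Z A*H)'] with H' Sigma H = diag(Sigma_11, I)
ZAH = Z @ np.hstack([a0, A0s])
D = np.block([[S11, np.zeros((m_star, m - m_star))],
              [np.zeros((m - m_star, m_star)), np.eye(m - m_star)]])
lhs = np.trace(ZAH @ np.linalg.inv(D) @ ZAH.T)

# Right member of (4.20): T tr(Sigma_11^{-1} a0' M a0) + T tr(A0*' M A0*)
rhs = (T * np.trace(np.linalg.solve(S11, a0.T @ M @ a0))
       + T * np.trace(A0s.T @ M @ A0s))
assert np.allclose(lhs, rhs)
```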
The first order conditions for partial maximization with respect to $A_0^*$ are

$$\frac{\partial L}{\partial\operatorname{vec}(A_0^*)} = \frac{T}{2}\left(\frac{\partial\ln|B^{**\prime}B^{**}|}{\partial\operatorname{vec}(B_{II})},\; 0\right) - \frac{T}{2}\,\frac{\partial\operatorname{tr}(A_0^{*\prime}MA_0^*)}{\partial\operatorname{vec}(A_0^*)} = 0, \qquad (4.22)$$

where we have made use of the notation of Eq. (4.19), i.e. $B^{**} = (B_I, B_{II})$.
From Proposition 104, Corollaries 36 and 37, Dhrymes (1984), we obtain

$$\frac{\partial\ln|B^{**\prime}B^{**}|}{\partial\operatorname{vec}(B_{II})} = 2\operatorname{vec}(B^{**})'\left[(B^{**\prime}B^{**})^{-1}\otimes I\right]\frac{\partial\operatorname{vec}(B^{**})}{\partial\operatorname{vec}(B_{II})} = 2\operatorname{vec}\left[B^{**}(B^{**\prime}B^{**})^{-1}\right]'\begin{bmatrix} 0 \\ I \end{bmatrix} = 2\operatorname{vec}\left[(B^{**\prime})^{-1}\right]'\begin{bmatrix} 0 \\ I \end{bmatrix}. \qquad (4.23)$$

If we put

$$(B^{**\prime})^{-1} = (J_1, J_2), \qquad (4.24)$$

we obtain

$$\frac{\partial\ln|B^{**\prime}B^{**}|}{\partial\operatorname{vec}(B_{II})} = 2\operatorname{vec}(J_2)',$$

or, in matrix form,

$$\frac{\partial\ln|B^{**\prime}B^{**}|}{\partial B_{II}} = 2J_2. \qquad (4.25)$$

Moreover, since

$$\frac{\partial\operatorname{tr}(A_0^{*\prime}MA_0^*)}{\partial\operatorname{vec}(A_0^*)} = 2\operatorname{vec}(A_0^*)'(I\otimes M),$$

we have

$$\frac{\partial\operatorname{tr}(A_0^{*\prime}MA_0^*)}{\partial A_0^*} = 2MA_0^*. \qquad (4.26)$$
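These derivative formulas are easy to confirm numerically. The following sketch (again illustrative, assuming NumPy and arbitrary nonsingular $B^{**}$, $M$, and $A_0^*$) checks Eqs. (4.25) and (4.26) against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(2)
m, m_star, G, T = 4, 2, 3, 50

B = rng.standard_normal((m, m))                  # plays the role of B** = (B_I, B_II)
Z = rng.standard_normal((T, m + G))
M = Z.T @ Z / T                                  # M = Z'Z/T, symmetric
A0s = rng.standard_normal((m + G, m - m_star))   # plays the role of A_0^*

# Analytic derivatives, Eqs. (4.24)-(4.26): (B**')^{-1} = (J_1, J_2)
J2 = np.linalg.inv(B.T)[:, m_star:]
grad_logdet = 2 * J2                             # d ln|B**'B**| / d B_II
grad_trace = 2 * M @ A0s                         # d tr(A0*' M A0*) / d A0*

# Central finite-difference checks
eps = 1e-6
fd_logdet = np.zeros_like(grad_logdet)
for i in range(m):
    for j in range(m - m_star):
        Bp, Bm = B.copy(), B.copy()
        Bp[i, m_star + j] += eps
        Bm[i, m_star + j] -= eps
        fd_logdet[i, j] = (np.linalg.slogdet(Bp.T @ Bp)[1]
                           - np.linalg.slogdet(Bm.T @ Bm)[1]) / (2 * eps)

fd_trace = np.zeros_like(grad_trace)
for i in range(m + G):
    for j in range(m - m_star):
        Ap, Am = A0s.copy(), A0s.copy()
        Ap[i, j] += eps
        Am[i, j] -= eps
        fd_trace[i, j] = (np.trace(Ap.T @ M @ Ap) - np.trace(Am.T @ M @ Am)) / (2 * eps)

assert np.allclose(fd_logdet, grad_logdet, atol=1e-5)
assert np.allclose(fd_trace, grad_trace, atol=1e-5)
```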
Thus, writing the first order conditions of Eq. (4.22) in matrix, rather than vector, form yields

$$\begin{bmatrix} J_2 \\ 0 \end{bmatrix} - MA_0^* = 0. \qquad (4.27)$$

We further observe that, since

$$I = B^{**\prime}(B^{**\prime})^{-1} = B^{**\prime}(J_1, J_2) = \begin{bmatrix} B_I'J_1 & B_I'J_2 \\ B_{II}'J_1 & B_{II}'J_2 \end{bmatrix}, \qquad (4.28)$$

we must have

$$B_I'J_2 = 0, \qquad B_{II}'J_2 = I_{m-m^*}. \qquad (4.29)$$

Let us partition $M$ by
$$M = \frac{1}{T}\begin{bmatrix} Y'Y & Y'X \\ X'Y & X'X \end{bmatrix} = \begin{bmatrix} M_{yy} & M_{yx} \\ M_{xy} & M_{xx} \end{bmatrix}, \qquad (4.30)$$
and note that

$$MA_0^* = \begin{bmatrix} M_{yy}B_{II} + M_{yx}C_{II} \\ M_{xy}B_{II} + M_{xx}C_{II} \end{bmatrix}. \qquad (4.31)$$

Hence, Eq. (4.27) merely states that

$$J_2 = M_{yy}B_{II} + M_{yx}C_{II}, \qquad 0 = M_{xy}B_{II} + M_{xx}C_{II}. \qquad (4.32)$$
Eliminating $C_{II}$ from Eq. (4.32), we find2

$$J_2 = (M_{yy} - M_{yx}M_{xx}^{-1}M_{xy})B_{II} = WB_{II}, \qquad W = M_{yy} - M_{yx}M_{xx}^{-1}M_{xy}. \qquad (4.33)$$

It will be extremely difficult, if not impossible, to find an explicit expression for the elements of $A_0^*$ in terms of the elements of $a_0$. Fortunately, this is not necessary; what we need, in order to obtain the "concentrated" likelihood function, are the maximizing values of $\operatorname{tr}(A_0^{*\prime}MA_0^*)$ and
2Notice, incidentally, that W is the second moment matrix of residuals of the OLS estimated reduced form of the (entire) system.
$\ln|B^{**\prime}B^{**}|$. But from Eqs. (4.19) and (4.27) we have

$$A_0^{*\prime}MA_0^* = (B_{II}', C_{II}')\begin{bmatrix} J_2 \\ 0 \end{bmatrix} = B_{II}'J_2. \qquad (4.34)$$

Furthermore, in view of Eq. (4.29), $B_{II}'J_2 = I_{m-m^*}$, so that

$$\operatorname{tr}(A_0^{*\prime}MA_0^*) = \operatorname{tr}(I_{m-m^*}) = m - m^*. \qquad (4.35)$$
To obtain the maximizing value of $\ln|B^{**\prime}B^{**}|$, we note

$$\ln|B^{**\prime}B^{**}| = \ln|B^{**\prime}WB^{**}| - \ln|W|. \qquad (4.36)$$

However,

$$B^{**\prime}WB^{**} = \begin{bmatrix} B_I'WB_I & B_I'WB_{II} \\ B_{II}'WB_I & B_{II}'WB_{II} \end{bmatrix}, \qquad (4.37)$$

and from Eqs. (4.29) and (4.33), we find

$$B_I'WB_{II} = 0, \qquad B_{II}'WB_{II} = I_{m-m^*}. \qquad (4.38)$$

Finally, from Eqs. (4.36), (4.37), and (4.38), we conclude

$$\ln|B^{**\prime}B^{**}| = \ln|B_I'WB_I| - \ln|W|. \qquad (4.39)$$
Inserting the maximizing values of Eqs. (4.35) and (4.39) into Eq. (4.21), we obtain the "concentrated" likelihood function

$$L(a_0, \Sigma_{11}; Y, X) = c^* - \frac{T}{2}\ln|\Sigma_{11}| + \frac{T}{2}\ln|B_I'WB_I| - \frac{T}{2}\operatorname{tr}(\Sigma_{11}^{-1}a_0'Ma_0), \qquad (4.40)$$

where $c^* = -(mT/2)[\ln(2\pi) + 1] + \frac{T}{2}m^* - \frac{T}{2}\ln|W|$.
This essentially accomplishes the task we set out to perform. The parameters in which we had no interest, viz. $\Sigma_{12}$, $\Sigma_{22}$, $A_0^*$, have been eliminated, and we are now dealing solely with $a_0$ and $\Sigma_{11}$. Next, as in FIML, we make use of all a priori restrictions3 on $a_0$, and maximize Eq. (4.40) with respect to the unknown parameters of $a_0$ and $\Sigma_{11}$. As in the case of FIML, it is not easy to give an explicit representation for the LIML estimator of $a_0$. However, the same remarks concerning algorithms for obtaining FIML estimators apply here as well.
3 There may well be a priori restrictions on $\Sigma_{11}$, and such restrictions, of course, must be used. Typically, however, we do not assert that we know more about $\Sigma_{11}$ beyond the fact that it is positive definite.
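As just noted, no closed form for the LIML estimator of $a_0$ is available in general, and in practice one maximizes Eq. (4.40) numerically. The sketch below is an illustration only, not the algorithm of the text: it assumes NumPy and SciPy, simulated data, a single equation of interest ($m^* = 1$), and a hypothetical normalization and exclusion pattern, and it maximizes Eq. (4.40) jointly over the free elements of $a_0$ and over $\Sigma_{11}$, the latter parameterized through its logarithm to keep it positive.

```python
import numpy as np
from scipy.optimize import minimize   # SciPy is assumed available; any numerical optimizer would do

rng = np.random.default_rng(4)
T, m, G = 200, 3, 4                   # observations, endogenous variables, predetermined variables
m_star = 1                            # a single equation of interest

# Simulated data, purely to illustrate the mechanics of the maximization
Y, X = rng.standard_normal((T, m)), rng.standard_normal((T, G))
Z = np.hstack([Y, X])
M = Z.T @ Z / T
Myy, Myx = M[:m, :m], M[:m, m:]
Mxy, Mxx = M[m:, :m], M[m:, m:]
W = Myy - Myx @ np.linalg.solve(Mxx, Mxy)        # W as in Eq. (4.33)

# Hypothetical a priori restrictions: first endogenous coefficient normalized to 1,
# last predetermined variable excluded from the equation of interest.
def build_a0(theta):
    b_I = np.concatenate(([1.0], theta[:m - 1]))
    c_I = np.concatenate((theta[m - 1:], [0.0]))
    return b_I, np.concatenate((b_I, c_I))

def neg_loglik(params):                          # negative of Eq. (4.40), additive constants omitted
    theta, log_s11 = params[:-1], params[-1]
    b_I, a0 = build_a0(theta)
    s11 = np.exp(log_s11)
    return -(-(T / 2) * log_s11
             + (T / 2) * np.log(b_I @ W @ b_I)
             - (T / 2) * (a0 @ M @ a0) / s11)

start = np.zeros(m - 1 + G - 1 + 1)              # free betas, free gammas, log Sigma_11
res = minimize(neg_loglik, start, method="BFGS")
print(res.success, res.x[:-1])                   # maximizing values of the free parameters
```

For $m^* > 1$ the same idea applies, with $\Sigma_{11}$ parameterized through a Cholesky factor and the free elements of $a_0$ determined by whatever exclusion and normalization restrictions are imposed.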