6.1.4 Relation of Structural and Covariance Parameters (ML)
In the discussion of Example 4 we notice, from Eqs. (6.58) and (6.59), that the covariance matrix of the limiting distribution of the estimator of $\theta$ is not $[I(\theta^0)]^{-1}$, as one might have expected through intuition carried over from the GLM and the GNLM. On the other hand, when the covariance matrix is known the familiar result reappears. This raises the question of whether the intuition we carry from such models is based on an incomplete understanding of the conceptual framework, which is sufficiently correct for the GLM and GNLM but is not appropriate in the context of NLSE. The objective of this section is to draw attention to certain fundamental aspects of ML theory, to similarities and differences between ML and distribution free procedures, as well as between ML estimators in the context of NLSE and other models.
Remark 3. Even though, in Eq. (6.59), we employ the notation $I(\theta^0)$,¹¹ which suggests the Fisher information matrix, we should be careful to point out that, in the discussion leading to Eq. (6.59), $I(\theta^0)$ denotes neither the information matrix nor its inverse. The reader may perhaps be perplexed as to why the matrix $I(\theta^0)$ is not the information matrix. Indeed, in a great deal of the econometrics literature this would appear to be the case.
To illuminate the issues raised in Remark 3, we begin with basic ML theory.
It is well established therein that if $L_T(\phi)$ is the (log) likelihood function, the Fisher information matrix, as well as the inverse of the covariance matrix of the limiting distribution of the ML estimator of $\phi$, is given by

¹¹ This is often the case with authors who use derivations based on the concentrated LF.
\[
I(\phi^0) = -\lim_{T\to\infty} E_{\phi^0}\!\left(\frac{\partial^2 L_T}{\partial\phi\,\partial\phi'}(\phi^0)\right)
= -\operatorname*{plim}_{T\to\infty}\,\frac{\partial^2 L_T}{\partial\phi\,\partial\phi'}(\phi^0).
\]
We may think of this matrix as a block matrix with blocks $I_{ij}$, $i,j = 1,2$, such that (using the last representation above)
\begin{align*}
I_{11}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\theta\,\partial\theta'}(\phi^0), &
I_{12}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\theta\,\partial\sigma'}(\phi^0),\\
I_{22}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\sigma\,\partial\sigma'}(\phi^0), &
I_{21}(\phi^0) &= I_{12}'(\phi^0).
\end{align*}
The limiting distribution of the ML estimator, say $\hat\phi_T$, of $\phi$ is obtained from the relation
\[
\sqrt{T}(\hat\phi_T - \phi^0) \xrightarrow{d} N\bigl(0,\; I(\phi^0)^{-1}\bigr).
\]
In the preceding, as in all discussion in this chapter, we take $\theta$ to correspond to the structural parameters, and $\sigma$ to correspond to the covariance parameters. Now, if we are interested in the structural parameters alone, we can obtain their marginal distribution from the joint distribution of the ML estimator above. It is an elementary property of normal distributions that the marginal distribution in question is also normal; more precisely,
\[
\sqrt{T}(\hat\theta_T - \theta^0) \xrightarrow{d} N\bigl(0,\; \bigl[I_{11}(\phi^0) - I_{12}(\phi^0)\,I_{22}^{-1}(\phi^0)\,I_{21}(\phi^0)\bigr]^{-1}\bigr),
\]
i.e. the covariance matrix is the $(1,1)$ block of $I(\phi^0)^{-1}$.
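The passage from the joint to the marginal limiting distribution is just the partitioned-inverse (Schur complement) computation; the sketch below checks it numerically on an arbitrary positive definite "information" matrix (illustrative values, not from the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an arbitrary symmetric positive definite "information matrix",
# partitioned with a 3x3 block I11 (structural) and 2x2 block I22 (covariance).
A = rng.standard_normal((5, 5))
I_full = A @ A.T + 5 * np.eye(5)          # SPD by construction
I11, I12 = I_full[:3, :3], I_full[:3, 3:]
I21, I22 = I_full[3:, :3], I_full[3:, 3:]

# Covariance of the marginal limiting distribution of the theta-block:
# the (1,1) block of the full inverse ...
marg_cov = np.linalg.inv(I_full)[:3, :3]

# ... equals the inverse of the Schur complement I11 - I12 I22^{-1} I21.
schur = I11 - I12 @ np.linalg.inv(I22) @ I21
assert np.allclose(marg_cov, np.linalg.inv(schur))
```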
Next, note that we may obtain an estimate of the structural parameters either by solving for the entire set of parameters and confining our attention to the subset of interest, which is, implicitly, what we did above, or by concentrating the LF: first maximizing with respect to $\sigma$, inserting the maximizing value in the LF, and then maximizing the concentrated LF with respect to $\theta$. It is intuitively clear that the two approaches yield the same result. Thus, whatever method is employed, the ML estimator of $\theta$ would have the same limiting distribution, which was easily obtained above from the limiting distribution of $\hat\phi_T$.
On the other hand, if one considers the GLM, either with i.i.d. errors or autocorrelated errors, or the GNLM, it would appear that the covariance matrix of the limiting distribution of the structural parameters is the equivalent of $I_{11}^{-1}(\phi^0)$, and this is true whether the variance or covariance parameters are known a priori or are estimated. For example, in the GLM $y = X\beta + u$, under normality and the standard assumptions we have, asymptotically,
\[
\sqrt{T}(\hat\beta - \beta^0) \sim N\bigl(0,\, [I_{11}(\phi^0)]^{-1}\bigr),
\]
where $\phi = (\beta', \sigma^2)'$ and
\begin{align*}
I_{11}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\beta\,\partial\beta'}(\phi^0),\\
I_{22}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\sigma^2\,\partial\sigma^2}(\phi^0) = \frac{1}{2\sigma_0^{4}},\qquad I_{12}(\phi^0) = 0.
\end{align*}
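That the information matrix is block diagonal here can be seen directly: the cross partial of the normalized log likelihood is $\partial^2 L_T/\partial\beta\,\partial\sigma^2 = -X'u/(T\sigma^4)$, whose expectation vanishes because $E(X'u) = 0$. A small Monte Carlo sketch (made-up design matrix and parameter values):

```python
import numpy as np

rng = np.random.default_rng(1)

T, k = 200, 3
sigma2 = 2.0
X = rng.standard_normal((T, k))           # made-up design matrix

# For the GLM y = X beta + u with normal errors, the cross partial of the
# normalized log likelihood is d^2 L_T / d beta d sigma^2 = -X'u / (T sigma^4).
# Its expectation is zero because E[X'u] = 0, so I_12(phi^0) = 0.
reps = 5000
cross = np.zeros(k)
for _ in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=T)
    cross += -X.T @ u / (T * sigma2**2)
cross /= reps

assert np.all(np.abs(cross) < 0.05)       # Monte Carlo average near zero
```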
For a system of GLMs,¹² say $Y = XB + U$, with covariance matrix $\Sigma_0$, where restrictions may be imposed on $B$ so that not all variables appear in all equations, say $\mathrm{vec}(B) = L^{*}\beta$ for an appropriate selection matrix $L^{*}$, we similarly obtain
\begin{align*}
I_{11}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\beta\,\partial\beta'}(\phi^0) = L^{*\prime}\bigl[\Sigma_0^{-1}\otimes M_{xx}\bigr]L^{*},\\
I_{12}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\beta\,\partial\sigma'}(\phi^0) = \operatorname*{plim}_{T\to\infty}\, L^{*\prime}\Bigl[I\otimes\frac{X'U}{T}\Bigr]\bigl(\Sigma_0^{-1}\otimes\Sigma_0^{-1}\bigr) = 0,\\
I_{22}(\phi^0) &= -\operatorname*{plim}_{T\to\infty}\frac{\partial^2 L_T}{\partial\sigma\,\partial\sigma'}(\phi^0) = \frac{1}{2}\bigl(\Sigma_0^{-1}\otimes\Sigma_0^{-1}\bigr),\qquad
I_{21}(\phi^0) = I_{12}'(\phi^0).
\end{align*}
In the relations above, convergence may be a.c., depending on the underlying assumptions. In any event, $I_{12}(\phi^0) = 0$, and this remains true whether $\Sigma_0$ is known a priori or is concurrently estimated. As for the GNLM, we note from the discussion of Chapter 5 (Sections 5.9 and 5.10) that, again, whether the covariance matrix $\Sigma_0$ is known or not has no effect on the limiting distribution of the structural parameters.
¹² This is the so-called seemingly unrelated regressions (SUR) model.
Let us now take up the issue of the GLSEM and examine whether knowing the covariance matrix $\Sigma_0$ has any effect on the limiting distribution of $\sqrt{T}(\hat\delta - \delta^0)_{FIML}$; a casual reading of the discussion in Chapter 4 might suggest that the answer is negative, since the limiting distribution of $\sqrt{T}(\hat\delta - \delta^0)_{FIML}$ is the same as that of $\sqrt{T}(\hat\delta - \delta^0)_{3SLS}$. In fact, this is not generally true, and the limiting distribution of the ML estimator of the structural parameters does depend on what is known about $\Sigma_0$. This aspect was the subject of a paper in the early sixties, Rothenberg and Leenders (1964), which basically investigated the consequences of knowing that $\Sigma$ is diagonal. Their findings were presented as an isolated result, and it is fair to say that over the years the result has not received the attention it deserved.

Here we shall examine how knowledge of $\Sigma_0$ affects the properties of the ML estimator of the structural parameter vector, $\delta$. We may write the LF, in the notation of Chapter 3, as
\[
L_T(\delta,\sigma) = -\frac{m}{2}\ln(2\pi) + \frac{1}{2}\ln|B^{*\prime}WB^{*}| - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\bigl(\Sigma^{-1}A'MA\bigr),\tag{6.70}
\]
where
\[
P = B^{*\prime}WB^{*},\qquad W = \frac{1}{T}\,Y'NY,\qquad N = I - X(X'X)^{-1}X'.
\]
Recalling that
\[
\frac{\partial\,\mathrm{vec}(A)}{\partial\delta'} = -L,\qquad
\frac{\partial\,\mathrm{vec}(B^{*})}{\partial\delta'} = -(I_m\otimes I^{*})L,\qquad
I^{*} = (I_m,\; 0_{m\times G}),
\]
we obtain,
\[
\frac{\partial L_T}{\partial\delta} = -L'(I\otimes I^{*\prime})\,\mathrm{vec}(WB^{*}P^{-1}) + L'\,\mathrm{vec}(MA\Sigma^{-1}),
\]
\[
\frac{\partial L_T}{\partial\sigma} = \frac{1}{2}\bigl[\mathrm{vec}(\Sigma^{-1}A'MA\,\Sigma^{-1}) - \mathrm{vec}(\Sigma^{-1})\bigr].
\]
We also establish that
\begin{align*}
-\frac{\partial^2 L_T}{\partial\delta\,\partial\delta'} &= L'(\Sigma^{-1}\otimes M)L - L'(P^{-1}\otimes I^{*\prime}WI^{*})L + 2L'(P^{-1}\otimes I^{*\prime}WB^{*}P^{-1}B^{*\prime}WI^{*})L,\\
-\frac{\partial^2 L_T}{\partial\delta\,\partial\sigma'} &= L'(\Sigma^{-1}\otimes MA\Sigma^{-1}),\\
-\frac{\partial^2 L_T}{\partial\sigma\,\partial\sigma'} &= \frac{1}{2}\bigl(\Sigma^{-1}A'MA\Sigma^{-1}\otimes\Sigma^{-1} + \Sigma^{-1}\otimes\Sigma^{-1}A'MA\Sigma^{-1} - \Sigma^{-1}\otimes\Sigma^{-1}\bigr).
\end{align*}
It is apparent that
\[
-\frac{\partial^2 L_T}{\partial\delta\,\partial\delta'} \;\xrightarrow{\,P\ \text{or a.c.}\,}\; L'\bigl[\Sigma_0^{-1}\otimes(\Pi^0, I)'M_{xx}(\Pi^0, I)\bigr]L + 2L'\bigl[\Sigma_0^{-1}\otimes I^{*\prime}\Omega^0 I^{*}\bigr]L,
\]
\[
-\frac{\partial^2 L_T}{\partial\delta\,\partial\sigma'} \;\xrightarrow{\,P\ \text{or a.c.}\,}\; L'\bigl(\Sigma_0^{-1}\otimes I^{*\prime}B_0^{*\prime\,-1}\bigr),\qquad
-\frac{\partial^2 L_T}{\partial\sigma\,\partial\sigma'} \;\xrightarrow{\,P\ \text{or a.c.}\,}\; \frac{1}{2}\bigl(\Sigma_0^{-1}\otimes\Sigma_0^{-1}\bigr).
\]
We shall denote the limits of the entities on the left of the relations above by $I_{ij}(\phi^0)$. Since, by normality, fourth order moments exist (in fact, all higher even moments are finite and all odd moments are null), applying one of the standard CLTs we find
\[
\sqrt{T}\,\frac{\partial L_T}{\partial\phi}(\phi^0) \xrightarrow{d} N(0,\, C),\qquad
C = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix},\quad C_{ij} = I_{ij}(\phi^0),
\]
and consequently
\[
\sqrt{T}(\hat\phi_T - \phi^0) \xrightarrow{d} N(0,\, C^{-1}).
\]
A crucial difference in this result, as compared to that in the GLM, SUR, or the GNLM, is that the block element $C_{12} = I_{12}(\phi^0)$ is not null.
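The algebra that keeps the limits above in clean Kronecker form leans on the reduced form relation $\Omega^0 = B^{*\prime -1}\Sigma_0 B^{*-1}$, one consequence of which is the collapse $\Omega\,B^{*}\Sigma^{-1}B^{*\prime}\,\Omega = \Omega$. A small numerical sketch of that identity (arbitrary illustrative matrices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4

# Arbitrary nonsingular B* and SPD Sigma (illustrative values only).
B = rng.standard_normal((m, m)) + 5 * np.eye(m)
S = rng.standard_normal((m, m))
Sigma = S @ S.T + m * np.eye(m)

# Reduced form covariance: Omega = B*'^{-1} Sigma B*^{-1}.
Binv = np.linalg.inv(B)
Omega = Binv.T @ Sigma @ Binv

# The sandwich Omega B* Sigma^{-1} B*' Omega collapses back to Omega.
lhs = Omega @ B @ np.linalg.inv(Sigma) @ B.T @ Omega
assert np.allclose(lhs, Omega)
```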
From standard normal theory we find that the marginal distribution of the ML estimator of $\delta$ is given by
\[
\sqrt{T}(\hat\delta_T - \delta^0) \xrightarrow{d} N\bigl(0,\, C^{*-1}\bigr),
\]
where
\[
C^{*-1} = (C_{11} - C_{12}C_{22}^{-1}C_{21})^{-1} = \bigl\{L'\bigl[\Sigma_0^{-1}\otimes(\Pi^0, I)'M_{xx}(\Pi^0, I)\bigr]L\bigr\}^{-1},
\]
in view of the fact that $-C_{12}C_{22}^{-1}C_{21} = -2L'\bigl(\Sigma_0^{-1}\otimes I^{*\prime}\Omega^0 I^{*}\bigr)L$.
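The evaluation of $C_{12}C_{22}^{-1}C_{21}$ is essentially Kronecker algebra: with $C_{22}^{-1} = 2(\Sigma_0\otimes\Sigma_0)$ and a cross block of the form $(\Sigma_0^{-1}\otimes H)$, the mixed-product rule collapses the product to $2(\Sigma_0^{-1}\otimes H\Sigma_0 H')$. The sketch below checks the mixed-product step numerically (the matrices, and the stand-in $H$ for $I^{*\prime}B^{*\prime -1}$, are illustrative assumptions; the selection matrix $L$ is suppressed):

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 3, 4

S = rng.standard_normal((m, m))
Sigma = S @ S.T + m * np.eye(m)           # SPD, stands in for Sigma_0
H = rng.standard_normal((p, m))           # stands in for I*' B*'^{-1}
Si = np.linalg.inv(Sigma)

# Mixed-product rule (A x B)(C x D) = (AC) x (BD), applied twice:
# (Si x H) * 2(Sigma x Sigma) * (Si x H') = 2(Si x H Sigma H').
lhs = np.kron(Si, H) @ (2 * np.kron(Sigma, Sigma)) @ np.kron(Si, H).T
rhs = 2 * np.kron(Si, H @ Sigma @ H.T)
assert np.allclose(lhs, rhs)
```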
Now, what would happen if the covariance matrix were known? First, we would not need to estimate it; thus, we would have no need for the derivatives
\[
\frac{\partial L_T}{\partial\sigma},\qquad \frac{\partial^2 L_T}{\partial\delta\,\partial\sigma'},\qquad \frac{\partial^2 L_T}{\partial\sigma\,\partial\sigma'}.
\]
Second, the derivative $\partial L_T/\partial\delta$ and the entity $I_{11}(\phi^0)$ will remain the same as before. Consequently,
\[
\sqrt{T}(\hat\delta_T - \delta^0) \xrightarrow{d} N\bigl(0,\, C_{11}^{-1}\bigr),
\]
since, from the results above, $I_{11}(\phi^0) = C_{11}$. Comparing the two covariance matrices, it is easily established that
\[
C_{11} - C^{*} \geq 0, \quad\text{and hence that}\quad C^{*-1} - C_{11}^{-1} \geq 0,
\]
showing that the efficiency of the ML estimator of $\delta$ is improved if the covariance matrix $\Sigma_0$ is known. One may prove a similar result if there are valid restrictions on $\Sigma_0$ and they are imposed in the estimation process. The same, however, cannot be said, in either case, for the 3SLS estimator. We may remind the reader that the latter depends only on a prior consistent estimator of $\Sigma_0$; no matter what the nature of its elements, the consistent estimator will converge to $\Sigma_0$; thus, imposing any restrictions on the consistent estimator can at best have a small sample effect, not an asymptotic one.
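The two inequalities can be illustrated numerically: for any positive definite $C$ with $C_{12} \neq 0$, the Schur complement $C^{*} = C_{11} - C_{12}C_{22}^{-1}C_{21}$ satisfies both orderings. A sketch with an arbitrary positive definite $C$ (illustrative, not derived from the model):

```python
import numpy as np

rng = np.random.default_rng(4)

# Arbitrary SPD matrix C with a nonzero off-diagonal block C12.
A = rng.standard_normal((6, 6))
C = A @ A.T + 6 * np.eye(6)
C11, C12 = C[:4, :4], C[:4, 4:]
C21, C22 = C[4:, :4], C[4:, 4:]

# Schur complement: asymptotic precision of the structural estimator
# when the covariance parameters are estimated.
Cstar = C11 - C12 @ np.linalg.inv(C22) @ C21

# C11 - C* = C12 C22^{-1} C21 is positive semidefinite, so the estimator
# with the covariance matrix known (precision C11) is at least as efficient.
assert np.linalg.eigvalsh(C11 - Cstar).min() >= -1e-10
assert np.linalg.eigvalsh(np.linalg.inv(Cstar) - np.linalg.inv(C11)).min() >= -1e-10
```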
Remark 4. The preceding discussion has settled the formalities as to why, in the context of ML estimation, knowing something about the covariance matrix may improve the efficiency of other (structural) parameter estimators. However, it has not given us any intuitive principle by which we can judge when an improvement will or will not occur. This intuitive principle is easily supplied by the observation that in the (multivariate) normal the mean, $\mu$, and the covariance matrix, $\Sigma$, are independent parameters and their respective estimators are mutually independent as well.
Thus, in models where there is a rigid separation between mean and covariance parameters, no improvement will occur if we know (all or something of) the covariance matrix; in models where some parameters are both mean and covariance parameters, we would expect improvement. All the results that occasioned some "concern" can be "explained" by this principle, without having to calculate $I_{12}(\phi^0)$, which is often cumbersome to obtain. Thus, in the GLM, $\sigma^2$ and $\beta$ are distinctly covariance and mean parameters, respectively. In the SUR model, $\Sigma$ and $\beta$ are distinctly covariance and mean parameters as well. Similarly, in the discussion of the GNLM in Section 5.9, it is clear that what we termed there $\phi$ and $\Sigma$ are distinctly mean and covariance parameters, respectively. But what about the GLSEM? It would appear that there, too, $\Sigma$ and $\delta$ are distinctly covariance and mean parameters! Unfortunately, this is a false perception, since the probability characteristics of the GLSEM are uniquely determined by its likelihood function, and the latter is uniquely determined by the reduced form parameters, $\Pi = CB^{*-1}$ and $\Omega = B^{*\prime -1}\Sigma B^{*-1}$. Since the parameters of interest to us are $B^{*}$, $C$, $\Sigma$, we see that one of them, $B^{*}$, is both a mean and a covariance parameter. Thus, mean and covariance parameters are mixed up, and what we know about $\Sigma$ may well affect how we perceive $B^{*}$.
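The point about $B^{*}$ can be made concrete: under $\Pi = CB^{*-1}$ and $\Omega = B^{*\prime -1}\Sigma B^{*-1}$, perturbing $\Sigma$ moves only $\Omega$, while perturbing $B^{*}$ moves both $\Pi$ and $\Omega$. A toy sketch (arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
m, G = 3, 2

def reduced_form(B, C, Sigma):
    """Map GLSEM structural parameters to reduced form (Pi, Omega)."""
    Binv = np.linalg.inv(B)
    return C @ Binv, Binv.T @ Sigma @ Binv

B = rng.standard_normal((m, m)) + 4 * np.eye(m)   # B*, nonsingular
C = rng.standard_normal((G, m))
S = rng.standard_normal((m, m))
Sigma = S @ S.T + m * np.eye(m)

Pi0, Om0 = reduced_form(B, C, Sigma)

# Perturbing Sigma changes Omega only: Sigma is purely a covariance parameter.
Pi1, Om1 = reduced_form(B, C, Sigma + np.eye(m))
assert np.allclose(Pi1, Pi0) and not np.allclose(Om1, Om0)

# Perturbing B* changes both Pi and Omega: B* is both a mean
# and a covariance parameter.
Pi2, Om2 = reduced_form(B + 0.1 * np.eye(m), C, Sigma)
assert not np.allclose(Pi2, Pi0) and not np.allclose(Om2, Om0)
```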
The preceding also shows that ML is a far more "sophisticated" estimator than is 3SLS; this should be borne in mind when dealing with the GNLSEM with additive errors, in which mean and variance parameters are "mixed up", as in the case of the GLSEM. Before the reader dismisses 3SLS, we ought to point out that the ML estimator is vulnerable, since its "sophistication" depends on the truth of the assertion that the structural errors are jointly normal; 3SLS, on the other hand, does not depend on such specific distributional assumptions.