A Restricted Least Squares Interpretation

Part of the document Topics in Advanced Econometrics (pp. 62-80).

We begin our discussion by postulating the conditions of the GLSEM as given in Section 1.4.1, together with Convention 1; instead of Convention 2, however, we do not implement the restrictions in assumption (A.3) prior to estimation, but rather estimate the parameters subject to those restrictions.

Thus, in the framework of the CSF, the 2SLS procedure is derived in the context of the following problem:

$$\min_{a_{\cdot i}} S_{T1}(a_{\cdot i}) = \frac{1}{T}(w_{\cdot i} - Qa_{\cdot i})'(w_{\cdot i} - Qa_{\cdot i})$$

12. If $A$ is an $m \times n$ matrix, then its generalized inverse, or g-inverse, is a unique $n \times m$ matrix, say $A_g$, such that

i. $AA_gA = A$; ii. $A_gAA_g = A_g$; iii. $AA_g$ and $A_gA$ are symmetric matrices.

For further discussion see Dhrymes (1984), Ch. 3.
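The defining conditions above are easy to check numerically; the following sketch is illustrative only (numpy's `pinv` computes precisely this Moore-Penrose g-inverse), using an arbitrary rank-deficient matrix.

```python
import numpy as np

# Numerical check of the footnote's defining conditions for the
# g-inverse (numpy's pinv computes this Moore-Penrose inverse);
# the matrix A is an arbitrary rank-deficient example.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # 2 x 3, rank 1
Ag = np.linalg.pinv(A)                 # the 3 x 2 g-inverse

assert np.allclose(A @ Ag @ A, A)      # i.   A Ag A = A
assert np.allclose(Ag @ A @ Ag, Ag)    # ii.  Ag A Ag = Ag
assert np.allclose(A @ Ag, (A @ Ag).T) # iii. A Ag symmetric
assert np.allclose(Ag @ A, (Ag @ A).T) #      Ag A symmetric
```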


subject to

$$L_i^{*0\prime}a_{\cdot i} = 0, \qquad L_{1i}^{*0} = (e_{\cdot i}, L_{1i}),$$

where

$$L_i^{*0} = \begin{bmatrix} L_{1i}^{*0} & 0 \\ 0 & L_{2i} \end{bmatrix}, \qquad Q = R^{-1}X'Z, \qquad A = (B', C')', \tag{1.93}$$

$a_{\cdot i}$ is the $i$-th column of $A$, and $L_{1i}$ is as defined in Remark 5; thus, note that $L_{1i}^{*0}$ is of full rank and dimension $m \times (m - m_i)$.$^{13}$ The matrix $L_{2i}$ is as given in Definition 4 and is a permutation of $G_i^* = G - G_i$ columns of the matrix $I_G$. Notice, further, that in Eq. (1.93) we have defined $Q$ in a slightly different manner than we did in previous sections; precisely, earlier we had defined $Q_i = R^{-1}X'Z_i$ and $Q = \mathrm{diag}(Q_1, Q_2, \ldots, Q_m)$. In this context (of LM-derived 2SLS and 3SLS estimators) it is much more convenient to define $Q = R^{-1}X'Z$, so that the $Q_i$ of previous sections are simply the submatrices of $Q$ corresponding to the $i$-th equation of the system. For the notation to be totally consistent we should have defined, in the earlier sections, $Q^* = \mathrm{diag}(Q_1, Q_2, \ldots, Q_m)$; we did not do so in the interest of notational simplicity. The likelihood of confusion is eliminated if the reader remembers that the definition $Q = R^{-1}X'Z$ occurs only in the discussion of LM-derived 2SLS and 3SLS estimators and topics ancillary thereto.
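As an illustration of the structure of the constrained problem above, the following minimal Python sketch solves a generic equality-restricted least squares problem of the same form through the bordered Lagrangian system; `Q`, `w`, and `L` here are small made-up arrays, not the GLSEM quantities.

```python
import numpy as np

# Minimal sketch of the constrained problem: minimize
# (1/T)(w - Q a)'(w - Q a) subject to L' a = 0, via the bordered
# (Lagrangian) system
#   [ Q'Q/T  L ] [ a ]   [ Q'w/T ]
#   [ L'     0 ] [ l ] = [   0   ].
# Q, w, L are small illustrative arrays, not the GLSEM quantities.
rng = np.random.default_rng(0)
T, k = 50, 4
Q = rng.standard_normal((T, k))
w = rng.standard_normal(T)
L = np.array([[0.0], [0.0], [0.0], [1.0]])   # restriction: a_4 = 0

r = L.shape[1]
M = np.block([[Q.T @ Q / T, L], [L.T, np.zeros((r, r))]])
rhs = np.concatenate([Q.T @ w / T, np.zeros(r)])
sol = np.linalg.solve(M, rhs)
a_tilde, lam = sol[:k], sol[k:]

assert np.allclose(L.T @ a_tilde, 0.0)       # restriction holds exactly
# With a_4 = 0, the solution is OLS of w on the first three columns of Q.
a_ols = np.linalg.lstsq(Q[:, :3], w, rcond=None)[0]
assert np.allclose(a_tilde[:3], a_ols)
```

The final check makes the point of the section concrete: imposing the exclusion through a multiplier reproduces exactly the estimator obtained by implementing the restriction before estimation.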

Remark 21. We recall from Proposition 3 that the matrix $L_i$ is of dimension $(m+G) \times (m_i+G_i)$ and

$$\operatorname{rank}(L_i) = m_i + G_i.$$

It follows immediately that $L_i^{*0}$ is of dimension $(m+G) \times (m+G-m_i-G_i)$ and of full column rank, so that the matrix of restrictions in the constrained problem above is of full column rank. Letting $\lambda_{\cdot i}$ denote the vector of Lagrange multipliers, the first order conditions yield the set of equations

$$\begin{bmatrix} \dfrac{Q'Q}{T} & L_i^{*0} \\ L_i^{*0\prime} & 0 \end{bmatrix}\begin{bmatrix} \tilde a_{\cdot i} \\ \tilde\lambda_{\cdot i} \end{bmatrix} = \begin{bmatrix} \dfrac{Q'w_{\cdot i}}{T} \\ 0 \end{bmatrix}. \tag{1.94}$$

We observe that the matrix to be inverted in Eq. (1.94), for obtaining the estimators of the parameters and the Lagrange multipliers, is not of the

13. This notation is necessary since, in this context, the normalization convention is imposed ab initio. Hence, we need notation to indicate the current endogenous variables that are excluded from the right hand side of the $i$-th equation. In the maximum likelihood context, where no normalization convention is imposed until the very end, we need notation to indicate exclusion of current endogenous variables from the $i$-th equation. Hence the dual notation.

standard partitioned form, in that both diagonal blocks, $(Q'Q/T)$ and $0$, are singular. Due to a result in Dhrymes (1984),$^{14}$ the matrix above will be nonsingular if

$$\frac{Q'Q}{T} + L_i^{*0}L_i^{*0\prime} \tag{1.95}$$

is nonsingular. Since the matrix in question is sample dependent, we shall deal with a suitable probability limit, which depends only on the underlying structure. It is easy to demonstrate that

$$\operatorname*{plim}_{T\to\infty}\frac{Q'Q}{T} = (\Pi, I)'M_{xx}(\Pi, I) = K,$$

where $K$ is a matrix of dimension $(m+G) \times (m+G)$ and rank $G$. To show that the estimators defined by the solution to Eq. (1.94) exist, we need to show the validity of

Proposition 6. The matrix

$$K + L_i^{*0}L_i^{*0\prime}$$

is of full rank (nonsingular) if and only if the $i$-th equation is identified.

Proof: Suppose the matrix above is of full rank; then

$$L_i'(K + L_i^{*0}L_i^{*0\prime})L_i$$

is also nonsingular. But this matrix is precisely $S_i'M_{xx}S_i$, the nonsingularity of which$^{15}$ implies the identifiability of the $i$-th structural equation.

14. Specifically, see Proposition A.1, p. 142.

15. As we shall learn when we take up the discussion of maximum likelihood methods, the proper criterion for the identification of the $i$-th equation is that the matrix $L_i^{0\prime}KL_i^0$ is of rank $m_i+G_i$. Its dimension, however, is $m_i+G_i+1$, which affords us one degree of freedom. It is this feature that requires us to impose some normalization convention. Thus, in the context of the discussion of this section, where the normalization convention is part of the initial specification, we skip this slight complication and concentrate, instead, on the rank condition as we have stated it above. The reader would do well to bear in mind, however, that, in the absence of an initial normalizing convention, the proper condition for the identifiability of the $i$-th structural equation is that the $(m_i+G_i+1) \times (m_i+G_i+1)$ matrix

$$L_i^{0\prime}KL_i^0 = \begin{bmatrix} e_{\cdot i}'Ke_{\cdot i} & e_{\cdot i}'KL_i \\ L_i'Ke_{\cdot i} & L_i'KL_i \end{bmatrix}$$

is of rank $m_i+G_i$. If the conventional normalization can be imposed on the $i$-th equation, then we can state that the equation in question is identified if and only if $L_i'KL_i = S_i'M_{xx}S_i$ is of full rank, i.e., it is nonsingular.

Next suppose the $i$-th equation is identified; we show that the matrix $K + L_i^{*0}L_i^{*0\prime}$ is nonsingular. Consider the matrix

$$H_i^0 = (L_i, L_i^{*0}), \tag{1.96}$$

and note that it is a nonsingular (indeed, orthogonal) matrix of order $m+G$. If $K + L_i^{*0}L_i^{*0\prime}$ is singular, its column null space contains at least one nonnull $(m+G)$-element column vector, say $h$. We may thus write

$$h = H_i^0\xi, \qquad \xi = (\xi_{(1)}', \xi_{(2)}')', \tag{1.97}$$

where $\xi_{(1)}$ and $\xi_{(2)}$ are, respectively, $(m_i+G_i)$- and $(m+G-m_i-G_i)$-element column vectors. Thus, we have

$$0 = h'(K + L_i^{*0}L_i^{*0\prime})h = \xi'H_i^{0\prime}(K + L_i^{*0}L_i^{*0\prime})H_i^0\xi = \xi'H_i^{0\prime}KH_i^0\xi + \xi_{(2)}'\xi_{(2)}. \tag{1.98}$$

Since the first quadratic form in the rightmost member is positive semidefinite, we must have $\xi_{(2)} = 0$. This in turn implies that the first quadratic form obeys $\xi_{(1)}'L_i'KL_i\xi_{(1)} = 0$, or that $L_i'KL_i$ is singular unless $\xi_{(1)} = 0$. The former, however, is ruled out by the identification of the equation in question. Thus Eq. (1.98) can hold only for $h = 0$, and we conclude that $K + L_i^{*0}L_i^{*0\prime}$ is nonsingular.

q.e.d.

Thus, Eq. (1.94) may be solved uniquely, to obtain

$$\tilde a_{\cdot i} = E_{2i}\left(\frac{Q'w_{\cdot i}}{T}\right), \tag{1.99}$$

where

$$\hat K = \frac{Q'Q}{T}, \qquad \hat V_{2i} = (\hat K + L_i^{*0}L_i^{*0\prime})^{-1}, \qquad E_{2i} = \hat V_{2i} - \hat V_{2i}L_i^{*0}(L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}L_i^{*0\prime}\hat V_{2i}.$$
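The closed form above can be checked numerically against a direct solution of the bordered first-order conditions; in the sketch below all arrays are illustrative stand-ins for the GLSEM quantities.

```python
import numpy as np

# Sketch checking the closed form of Eq. (1.99): with K = Q'Q/T,
#   V = (K + L L')^{-1},  E = V - V L (L'V L)^{-1} L'V,
# the vector E (Q'w/T) solves the bordered first-order conditions.
# Q, w, L are illustrative arrays, not the GLSEM quantities.
rng = np.random.default_rng(1)
T, k = 60, 5
Q = rng.standard_normal((T, k))
w = rng.standard_normal(T)
L = np.zeros((k, 2))
L[3, 0] = 1.0
L[4, 1] = 1.0                     # restrictions: a_4 = a_5 = 0

K = Q.T @ Q / T
V = np.linalg.inv(K + L @ L.T)
E = V - V @ L @ np.linalg.inv(L.T @ V @ L) @ L.T @ V
a_closed = E @ (Q.T @ w / T)

# Direct solve of the bordered system, for comparison.
r = L.shape[1]
M = np.block([[K, L], [L.T, np.zeros((r, r))]])
a_direct = np.linalg.solve(M, np.concatenate([Q.T @ w / T, np.zeros(r)]))[:k]

assert np.allclose(a_closed, a_direct)
assert np.allclose(L.T @ a_closed, 0.0)
```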

Making the appropriate substitutions in Eq. (1.99), and noting that, since $Z'X(X'X)^{-1} = (\tilde\Pi, I)'$,

$$\hat V_{2i}\frac{Q'w_{\cdot i}}{T} = a_{\cdot i} - \hat V_{2i}L_i^{*0}L_i^{*0\prime}a_{\cdot i} + \hat V_{2i}(\tilde\Pi, I)'\frac{X'u_{\cdot i}}{T}, \tag{1.100}$$

we establish that

$$\tilde a_{\cdot i} = a_{\cdot i} + E_{2i}(\tilde\Pi, I)'\frac{X'u_{\cdot i}}{T}, \tag{1.101}$$

$$\tilde\lambda_{\cdot i} = -[I - (L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}]L_i^{*0\prime}a_{\cdot i} + (L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}L_i^{*0\prime}\hat V_{2i}(\tilde\Pi, I)'\frac{X'u_{\cdot i}}{T}. \tag{1.102}$$

Although it is intuitively clear that the estimator in Eq. (1.101) is the 2SLS estimator, a formal demonstration of this fact is in order. To do so, we first observe that

$$L_i'(\tilde a_{\cdot i} - a_{\cdot i}) = \tilde\delta_{\cdot i} - \delta_{\cdot i},$$

and moreover, according to the standard specification, $L_i^{*0\prime}a_{\cdot i} = 0$. Next, note that

$$E_{2i}L_i^{*0} = 0, \qquad H_i^0H_i^{0\prime} = L_iL_i' + L_i^{*0}L_i^{*0\prime} = I.$$

This is so since the matrix $H_i^0$ of Eq. (1.96) is orthogonal; thus, we conclude

$$L_i'(\tilde a_{\cdot i} - a_{\cdot i}) = L_i'[\hat V_{2i} - \hat V_{2i}L_i^{*0}(L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}L_i^{*0\prime}\hat V_{2i}]L_iL_i'(\tilde\Pi, I)'\frac{X'u_{\cdot i}}{T}.$$

To complete the demonstration we need only show that

$$L_i'[\hat V_{2i} - \hat V_{2i}L_i^{*0}(L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}L_i^{*0\prime}\hat V_{2i}]L_i = (\tilde S_i'\tilde M_{xx}\tilde S_i)^{-1}, \tag{1.103}$$

where

$$\tilde S_i = (\tilde\Pi_i, I), \qquad \tilde M_{xx} = \frac{X'X}{T}.$$

We note, however, that the matrix (in square brackets) in the left member of Eq. (1.103) is the inverse of the (1,1) block of $J$, where

$$J = H_i^{0\prime}\hat V_{2i}^{-1}H_i^0 = H_i^{0\prime}(\hat K + L_i^{*0}L_i^{*0\prime})H_i^0 = \begin{bmatrix} L_i'\hat KL_i & L_i'\hat KL_i^{*0} \\ L_i^{*0\prime}\hat KL_i & L_i^{*0\prime}\hat KL_i^{*0} + I \end{bmatrix}. \tag{1.104}$$

It follows, immediately, from Eq. (1.104) and the standard partitioned inverse results that

$$L_i'[\hat V_{2i} - \hat V_{2i}L_i^{*0}(L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}L_i^{*0\prime}\hat V_{2i}]L_i = (L_i'\hat KL_i)^{-1} = (\tilde S_i'\tilde M_{xx}\tilde S_i)^{-1}, \tag{1.105}$$

which completes the formal demonstration that the restricted least squares version of the 2SLS estimator is, indeed, what it was claimed to be.

We close this section by giving the generalization of this procedure to the systemwide 2SLS and 3SLS estimators. For the systemwide 2SLS estimator, we formulate the problem as

$$\min_a S_{T2}(a) = \frac{1}{T}[w - (I\otimes Q)a]'[w - (I\otimes Q)a] \tag{1.106}$$

subject to

$$L^{*0\prime}a = 0, \quad \text{where} \quad a = \operatorname{vec}(A),$$

and, evidently, $L^{*0} = \mathrm{diag}(L_1^{*0}, L_2^{*0}, \ldots, L_m^{*0})$.

For the 3SLS estimator, given a prior (consistent) estimate of the covariance matrix, say $\tilde\Phi = (\tilde\Sigma \otimes I)$, the problem may be posed as

$$\min_a S_{T3}(a) = \frac{1}{T}[w - (I\otimes Q)a]'\tilde\Phi^{-1}[w - (I\otimes Q)a] \tag{1.107}$$

subject to

$$L^{*0\prime}a = 0.$$

The solution to this problem renders the 3SLS as a restricted GLS, or a restricted feasible Aitken, estimator of the structural parameters of the system.
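The restricted Aitken formulation can be sketched numerically as follows; `Q`, `w`, `Sigma`, and the selection matrices below are illustrative stand-ins, not estimates from an actual GLSEM.

```python
import numpy as np

# Sketch of the systemwide problem (1.107) as a restricted (feasible
# Aitken) GLS: minimize (1/T)[w - (I kron Q)a]' Phi^{-1} [w - (I kron Q)a]
# subject to L*0' a = 0. All arrays are illustrative stand-ins; Sigma
# plays the role of a prior consistent estimate.
rng = np.random.default_rng(2)
T, k, m = 40, 3, 2                  # k columns of Q, m equations
Q = rng.standard_normal((T, k))
w = rng.standard_normal(m * T)      # stacked w = (w_1', w_2')'
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

# L*0 = diag(L_1, L_2): one zero restriction per equation (illustrative).
L1 = np.array([[0.0], [0.0], [1.0]])
L2 = np.array([[1.0], [0.0], [0.0]])
Lstar = np.block([[L1, np.zeros((k, 1))],
                  [np.zeros((k, 1)), L2]])

IQ = np.kron(np.eye(m), Q)
Phi_inv = np.kron(np.linalg.inv(Sigma), np.eye(T))
A11 = IQ.T @ Phi_inv @ IQ / T
rhs = IQ.T @ Phi_inv @ w / T

r = Lstar.shape[1]
M = np.block([[A11, Lstar], [Lstar.T, np.zeros((r, r))]])
sol = np.linalg.solve(M, np.concatenate([rhs, np.zeros(r)]))
a_bar = sol[:m * k]

assert np.allclose(Lstar.T @ a_bar, 0.0)   # prior restrictions hold exactly
```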

We now show that the formal aspects of these problems are identical to those encountered when we considered a single equation; thus, the objective function of the formulation in Eq. (1.106) is

$$\Lambda_2 = \frac{1}{T}[w - (I\otimes Q)a]'[w - (I\otimes Q)a] + 2\lambda'L^{*0\prime}a,$$

and the equations to be solved for the estimators and the Lagrange multipliers (first order conditions) are

$$\begin{bmatrix} A_{11} & L^{*0} \\ L^{*0\prime} & 0 \end{bmatrix}\begin{bmatrix} \tilde a \\ \tilde\lambda \end{bmatrix} = \begin{bmatrix} (I\otimes Q)'w/T \\ 0 \end{bmatrix},$$

where $A_{11} = (1/T)(I\otimes Q)'(I\otimes Q)$ and $\lambda = (\lambda_{\cdot 1}', \ldots, \lambda_{\cdot m}')'$ denotes the multipliers corresponding to the restrictions on the parameters (appearing in the right hand side) of the $m$ structural equations. The solution, then, gives the systemwide 2SLS estimator, including those parameters set to zero by prior information.

For the 3SLS case, i.e. the formulation given in Eq. (1.107), the objective function is

$$\Lambda_3 = \frac{1}{T}[w - (I\otimes Q)a]'\tilde\Phi^{-1}[w - (I\otimes Q)a] + 2\lambda'L^{*0\prime}a,$$

and the first order conditions are

$$\begin{bmatrix} A_{11} & L^{*0} \\ L^{*0\prime} & 0 \end{bmatrix}\begin{bmatrix} \bar a \\ \bar\lambda \end{bmatrix} = \begin{bmatrix} (\tilde\Sigma^{-1}\otimes Q)'w/T \\ 0 \end{bmatrix},$$

where $A_{11} = (1/T)(I\otimes Q)'\tilde\Phi^{-1}(I\otimes Q)$ and $\lambda = (\lambda_{\cdot 1}', \ldots, \lambda_{\cdot m}')'$ denotes the multipliers corresponding to the restrictions on the parameters (appearing in the right hand side) of the $m$ structural equations. The solution to that system gives the 3SLS estimator, including all the parameters set to zero by prior information.

It is quite apparent that the two procedures are formally identical with that employed in the case of a single equation. Thus, in the interest of brevity, we shall give the relevant representations only for the 3SLS. Solving the matrix equation above yields

$$\bar a = E_3\left[\frac{(\tilde\Sigma^{-1}\otimes Q)'w}{T}\right], \qquad \bar\lambda = (L^{*0\prime}V_3L^{*0})^{-1}L^{*0\prime}V_3\left[\frac{(\tilde\Sigma^{-1}\otimes Q)'w}{T}\right], \tag{1.108}$$

where

$$V_3 = [(\tilde\Sigma^{-1}\otimes\hat K) + L^{*0}L^{*0\prime}]^{-1}, \qquad \hat K = \frac{Q'Q}{T},$$

$$E_3 = V_3 - V_3L^{*0}(L^{*0\prime}V_3L^{*0})^{-1}L^{*0\prime}V_3. \tag{1.109}$$

Making the appropriate substitutions in Eq. (1.108), we establish that

$$\bar a = a + E_3[\tilde\Sigma^{-1}\otimes(\tilde\Pi, I)']\frac{(I\otimes X)'u}{T}, \tag{1.110}$$

$$\bar\lambda = -[I - (L^{*0\prime}V_3L^{*0})^{-1}]L^{*0\prime}a + (L^{*0\prime}V_3L^{*0})^{-1}L^{*0\prime}V_3[\tilde\Sigma^{-1}\otimes(\tilde\Pi, I)']\frac{(I\otimes X)'u}{T}. \tag{1.111}$$

Remark 22. The procedure outlined above enables us to carry out specification tests utilizing efficient estimators, without having to compute inefficient consistent estimators for comparison. Thus, if we choose a subset of the prior restrictions for testing, we can test for their validity using the limiting distribution of certain subsets of Lagrange multipliers; we do not need, as in the case of Hausman's test, to obtain the 2SLS estimators and base our test on the difference between them. The precise nature of the tests and their characteristics will be taken up in Chapter 2, after we have derived the limiting distribution of the underlying 2SLS and 3SLS estimators.

Remark 23. In this chapter we have given three independent derivations of the 2SLS and 3SLS estimators. First, the original derivation as given by Theil for 2SLS, and as given by Theil and Zellner (1962) for 3SLS; second, we have given a derivation based on the canonical structural form (CSF), first given in Dhrymes (1969). Both these procedures impose the prior restrictions before estimation, and obtain parameter estimates given these restrictions. The third procedure again operates with the CSF, but imposes the prior restrictions, other than normalization, by means of Lagrange multipliers and obtains simultaneously estimates of all the parameters as well as the Lagrange multipliers (treated as additional parameters). This procedure is to be distinguished from restricted 2SLS and 3SLS, where the a priori restrictions, including normalization conventions, are imposed before estimation and only additional restrictions (beyond the a priori restrictions) are imposed by means of Lagrange multipliers.

Each of the three derivations has its own attractions; the first is evidently of historical significance; the second facilitates the understanding of simultaneous equations inference in this context since, in the CSF, 2SLS and 3SLS can be rendered as OLS and GLS problems, respectively. Finally, the third derivation makes it particularly simple to test the validity of prior restrictions.

Questions and Problems

1. In connection with the discussion surrounding Eqs. (1.41) and (1.42), verify the probability limits asserted there, where

$$M_{x_ix_i} = \operatorname*{plim}_{T\to\infty}\frac{X_i'X_i}{T}.$$

2. In connection with the discussion surrounding Eq. (1.48), show that

$$(\Pi_i, J_i) = \begin{bmatrix} \Pi_{1i} & I_{G_i} \\ \Pi_{2i} & 0 \end{bmatrix}.$$

Hint: $X\Pi_i = X_i\Pi_{1i} + X_i^*\Pi_{2i}$.

3. In the discussion of the inconsistency of the OLS estimator of the parameters of a structural equation, show that if $S_i'M_{xx}S_i$ is invertible, so is $S_i'M_{xx}S_i + D_{ii}$. Hint: $D_{ii}$ is positive definite.

4. In the discussion following Eq. (1.52), show that if

$$Y_i = \hat Y_i + \hat V_i,$$

where $\hat V_i$ is the matrix of OLS residuals, then $\hat V_i'\hat Y_i = 0$.

Hint: $\hat V_i = [I - X(X'X)^{-1}X']Y_i$.

5. In connection with Eq. (1.63) show that Q is of full rank.

Hint: All equations of the GLSEM are assumed to be identified.

6. In connection with the proof of part ii. of Corollary 3, verify the reduction of Eq. (1.81) to Eq. (1.83), when the first system is just identified.

Hint: Use the partitioned inverse form for $\Sigma^{-1}$, viz.

$$\Sigma^{22} = (\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1}, \qquad \Sigma^{12} = -\Sigma_{11}^{-1}\Sigma_{12}\Sigma^{22}, \qquad \Sigma^{21} = -\Sigma^{22}\Sigma_{21}\Sigma_{11}^{-1}.$$

7. Justify the estimators in Eq. (1.89) by explicitly deriving the first order conditions.

8. Derive the first order conditions represented by Eq. (1.94). Also show that the matrix $H_i^0$ of Eq. (1.96) is orthogonal.

9. In connection with the estimators in Eq. (1.101), verify their derivation, using the following result for symmetric matrices:$^{16}$

$$\begin{bmatrix} A_{11} & L_i^{*0} \\ L_i^{*0\prime} & 0 \end{bmatrix}^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix},$$

where, with $\hat V_{2i} = (A_{11} + L_i^{*0}L_i^{*0\prime})^{-1}$ and $V_{22} = -(L_i^{*0\prime}\hat V_{2i}L_i^{*0})^{-1}$,

$$B_{11} = \hat V_{2i} + \hat V_{2i}L_i^{*0}V_{22}L_i^{*0\prime}\hat V_{2i}, \qquad B_{12} = -\hat V_{2i}L_i^{*0}V_{22},$$

$$B_{21} = -V_{22}L_i^{*0\prime}\hat V_{2i}, \qquad B_{22} = I + V_{22}.$$

Note: The condition that $I - L_i^{*0\prime}\hat V_{2i}L_i^{*0}$ be nonsingular in the reference above may be dispensed with, i.e., the proof can be carried out in the absence of this condition.

10. In Proposition 6, verify that $L_i'(K + L_i^{*0}L_i^{*0\prime})L_i = S_i'M_{xx}S_i$. Hint: $S_i = (\Pi_i, I)$.

11. In the proof of Proposition 6, verify that if $L_i'KL_i$ is singular, then the $i$-th equation is not identified.

16. Dhrymes (1984), Corollary A.1, p. 143.

Appendix to Chapter 1

Preliminaries to Hausman's Test

In the last sections of the preceding chapter we discussed how the validity of the prior restrictions may be tested, and we mentioned a test proposed by Hausman (1978). In view of the fact that there is a substantial body of empirical research that relies, at least partially, on that test, it is useful that the practicing econometrician have a working knowledge of it, its attractive features, as well as its limitations. On the other hand, the test per se is only of peripheral and minor interest in the context of our discussion, and for this reason its discussion is carried out in this appendix.

An important element in the derivation of the Hausman test and its distribution is to show that the difference between the two estimators compared, i.e. the consistent and the efficient estimator, is asymptotically independent of the efficient estimator. This feature, actually, has a much broader aspect and is a salient property of minimum variance estimators. We begin the analysis of this subject by exploring the characterization of the minimum variance estimator and certain other pertinent aspects of this topic.$^{17}$

Proposition A1. Let $(\Omega, \mathcal{A}, P)$ be a probability space and $\mathcal{P}_\theta = \{P_\theta : \theta \in \Theta\}$ be a collection of probability measures (defined on $\mathcal{A}$), where

17The ensuing discussion is a modification and extension of the treatment of this subject in Rao (1972).


$\Theta \subset R^k$, i.e., we are dealing with a $k$-dimensional parameter space. Let $U_g$ be the class of estimators, say $\hat\theta_1$, such that $E(\hat\theta_1) = g(\theta)$. Let $U_0$ be the class of estimators, say $\hat\theta_2$, such that $E(\hat\theta_2) = 0$. Suppose, further, that

$$\operatorname{Cov}(\hat\theta_1 \mid \theta = \theta_0) = C_{11} > 0, \qquad \operatorname{Cov}(\hat\theta_2 \mid \theta = \theta_0) = C_{22} > 0.$$

A necessary and sufficient condition that an estimator, $\hat\theta_1$, be a minimum variance estimator for $g(\theta)$, at the point $\theta = \theta_0$, is that

$$\operatorname{Cov}(\hat\theta_1, \hat\theta_2 \mid \theta = \theta_0) = 0, \quad \text{for every } \hat\theta_2 \in U_0.$$

Proof: (Necessity) Let $\hat\theta_1$ be the minimum variance estimator at the point $\theta_0$ and let $\hat\theta_2 \in U_0$. Let

$$A = C_{12}C_{22}^{-1}, \qquad t \in R \ (\text{i.e., } t \text{ is a scalar}),$$

and consider the estimator $\hat\theta_1 + tA\hat\theta_2 \in U_g$, whose covariance matrix is

$$\operatorname{Cov}(\hat\theta_1 + tA\hat\theta_2 \mid \theta = \theta_0) = C_{11} + (t^2 + 2t)C_{12}C_{22}^{-1}C_{21}.$$

If $C_{12} \ne 0$, we obtain a contradiction by taking $t = -1$. Hence, necessity is proved.

(Sufficiency) Let $\hat\theta_1, \hat\theta_3 \in U_g$ and suppose $\hat\theta_1$ satisfies the condition of the proposition. Then $\hat\theta = \hat\theta_1 - \hat\theta_3 \in U_0$. By the condition, $\operatorname{Cov}(\hat\theta, \hat\theta_1 \mid \theta = \theta_0) = 0$, which implies $C_{13} = C_{11}$. Consequently,

$$0 \le \operatorname{Cov}(\hat\theta_1 - \hat\theta_3 \mid \theta = \theta_0) = C_{11} - C_{13} - C_{31} + C_{33} = C_{33} - C_{11},$$

which proves sufficiency.

q.e.d.
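The proposition can be illustrated by simulation: the sample mean of a normal sample, the minimum variance unbiased estimator of the population mean, is uncorrelated with simple unbiased estimators of zero. The design below is purely illustrative.

```python
import numpy as np

# Sketch illustrating Proposition A1: the sample mean of an i.i.d.
# normal sample (the minimum variance unbiased estimator of the mean)
# is uncorrelated with unbiased estimators of zero, e.g. X_1 - X_2.
# The design (n = 5, mean 2) is chosen purely for illustration.
rng = np.random.default_rng(3)
reps, n = 200_000, 5
X = rng.standard_normal((reps, n)) + 2.0

theta1 = X.mean(axis=1)            # candidate minimum variance estimator
theta2 = X[:, 0] - X[:, 1]         # an unbiased estimator of zero

# Exact covariance is 1/n - 1/n = 0; the Monte Carlo estimate is near 0.
cov = np.mean((theta1 - theta1.mean()) * theta2)
assert abs(cov) < 0.05
```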

Definition A1. In the context of Proposition A1, let

$$X_i : \Omega \to R^n, \qquad i = 1, 2, \ldots, T,$$

be a sequence of ($n$-element) random vectors. A statistic, $\hat\theta$, is said to be sufficient for $\theta$ if and only if, for every set $A \in \mathcal{A}$, there exist $\mathcal{A}$-measurable functions, $h_{\hat\theta}$ and $r$, such that

$$P_\theta(A) = h_{\hat\theta}(A)r(A),$$

and $r$ does not contain $\theta$.

A statistic $\hat\theta_1$ is said to be complete if and only if, for every $\mathcal{A}$-measurable function, $g$,

$$\int g(\hat\theta_1)\,dP_\theta = 0$$

implies $g(\hat\theta_1) = 0$, a.c. (almost certainly, or with probability one), for every $P_\theta \in \mathcal{P}_\theta$.

Remark A1. The characterization of a sufficient statistic may also be rendered in the more familiar form: $\hat\theta(X_1, X_2, \ldots, X_T)$ is sufficient for $\theta$ if and only if the joint density of the observations, say $f$, obeys

$$f(X_1, X_2, \ldots, X_T) = p(\hat\theta; \theta)\,r(X_1, X_2, \ldots, X_T),$$

where $p$ is (proportional to) the density of $\hat\theta$ and $r$ does not depend on $\theta$.

We now prove the important

Proposition A2 (Rao-Blackwell Theorem). In the context of Definition A1, let $\hat\theta$ be a sufficient statistic for the parameter $\theta \in \Theta$, and let $\hat\theta_1$ be any other estimator. The following statements are true.

i. If $\hat\theta_1$ is unbiased for $h(\theta)$, then

$$E[s(\hat\theta)] = h(\theta),$$

where $s(\hat\theta) = E(\hat\theta_1 \mid \hat\theta)$, i.e., it is the conditional expectation of $\hat\theta_1$ given $\hat\theta$.

ii. For any measurable function, $h$, $\hat\theta_1$ is an inefficient estimator of $h(\theta)$, in the mean square error (MSE) sense, i.e.,

$$E[\hat\theta_1 - h(\theta)][\hat\theta_1 - h(\theta)]' \ge E[s(\hat\theta) - h(\theta)][s(\hat\theta) - h(\theta)]'.$$

iii. If $W$ is any convex loss function,

$$E[W(\hat\theta_1, \theta)] \ge E[W(s(\hat\theta), \theta)].$$

Proof: We note that, by the property of (iterated) conditional expectations,

$$E[s(\hat\theta)] = E[E(\hat\theta_1 \mid \hat\theta)] = E(\hat\theta_1) = h(\theta),$$

which proves part i of the proposition.

To prove part ii, introduce the decomposition

$$\hat\theta_1 - h(\theta) = [\hat\theta_1 - s(\hat\theta)] + [s(\hat\theta) - h(\theta)],$$

and note that

$$E[\hat\theta_1 - h(\theta)][\hat\theta_1 - h(\theta)]' = E[\hat\theta_1 - s(\hat\theta)][\hat\theta_1 - s(\hat\theta)]' + E[s(\hat\theta) - h(\theta)][s(\hat\theta) - h(\theta)]' + (\text{cross terms}).$$


Taking expectations conditional on $\hat\theta$, the cross terms vanish, since $E[\hat\theta_1 - s(\hat\theta) \mid \hat\theta] = 0$; taking expectations again, with respect to $\hat\theta$, we obtain

$$E[\hat\theta_1 - h(\theta)][\hat\theta_1 - h(\theta)]' \ge E[s(\hat\theta) - h(\theta)][s(\hat\theta) - h(\theta)]',$$

which proves part ii.

As for part iii, we note that, by the convexity of $W$ (see, for example, Proposition 14 in Dhrymes (1989, p. 107)),

$$W(s(\hat\theta), \theta) = W(E(\hat\theta_1 \mid \hat\theta), \theta) \le E[W(\hat\theta_1, \theta) \mid \hat\theta].$$

Consequently, taking expectations again with respect to $\hat\theta$, we find

$$E[W(\hat\theta_1, \theta)] \ge E[W(s(\hat\theta), \theta)].$$

q.e.d.

The result above is used to construct an efficient estimator within the class of unbiased estimators, when an unbiased estimator and a sufficient statistic exist for the parameter in question. Thus, suppose $\hat\theta_i$, $i = 1, 2$, are, respectively, an unbiased estimator and a sufficient statistic (not necessarily unbiased) for a given parameter $\theta$. Clearly, the conditional expectation $E(\hat\theta_1 \mid \hat\theta_2) = g(\hat\theta_2)$ is a function of the sufficient statistic; moreover,

$$E[g(\hat\theta_2)] = E[E(\hat\theta_1 \mid \hat\theta_2)] = E(\hat\theta_1) = \theta,$$

since $\hat\theta_1$ is an unbiased estimator. By the Rao-Blackwell theorem,

$$\operatorname{Cov}[g(\hat\theta_2)] \le \operatorname{Cov}(\hat\theta_1).$$

In fact, if a complete sufficient statistic is available for a given parameter, and if an unbiased estimator for the parameter in question exists, we can find a unique minimum variance estimator.

Remark A2. The test suggested by Hausman essentially employs the insights gained from Propositions A1 and A2 in order to test for misspecification in the case of the GLSEM and, indeed, in other circumstances as well. What is meant by misspecification is, generally, the condition that the explanatory variables in the GLM (the variables in the matrix $X$), or the predetermined variables in the GLSEM, are not uncorrelated with the error sequence. Its great advantage is that a test of this kind is almost always available, and in a great variety of circumstances. Its disadvantage, and severe limitation, is that since the alternative is not clearly stated, the interpretation of the results of such tests is highly problematic.

Examples

To illustrate the procedure and the problems involved, consider the GLM $y = X\beta + u$, under the standard assumptions; suppose we wish to test the hypothesis $\operatorname{Cov}(u) = \Sigma$, where $\Sigma$ is an appropriate covariance matrix for which a consistent estimator, say $\tilde\Sigma$, is available. Let $\tilde\beta_i$, $i = 1, 2$, be the OLS and GLS (feasible Aitken) estimators of $\beta$. Note that both estimators are consistent and, if the null is correct (i.e., if, indeed, $\operatorname{Cov}(u) = \Sigma$), the GLS estimator is asymptotically efficient relative to the OLS estimator.

From the discussion above we are led to consider the difference $\tilde\beta_1 - \tilde\beta_2$. When we deal with limiting distributions in Chapter 2, the development therein will imply that

$$\sqrt{T}(\tilde\beta_1 - \tilde\beta_2) \sim N(0,\, G_{OLS} - G_{GLS}),$$

where

$$G_{OLS} = \operatorname*{plim}_{T\to\infty}\left(\frac{X'X}{T}\right)^{-1}\left(\frac{X'\Sigma X}{T}\right)\left(\frac{X'X}{T}\right)^{-1}, \qquad G_{GLS} = \operatorname*{plim}_{T\to\infty}\left(\frac{X'\Sigma^{-1}X}{T}\right)^{-1}.$$

Thus, the difference $\sqrt{T}(\tilde\beta_1 - \tilde\beta_2)$ may be taken to be, asymptotically, an unbiased estimator of zero. Again, from Chapter 2, we may establish the limiting distribution

$$\sqrt{T}\begin{bmatrix} \tilde\beta_1 - \tilde\beta_2 \\ \tilde\beta_2 - \beta \end{bmatrix} \sim N(0, G), \qquad G = \begin{bmatrix} G_{OLS} - G_{GLS} & 0 \\ 0 & G_{GLS} \end{bmatrix}.$$

This shows that, asymptotically, $\tilde\beta_1 - \tilde\beta_2$ is uncorrelated with, and hence (given joint asymptotic normality) independent of, $\tilde\beta_2$. By Proposition A1,$^{18}$ we conclude that $\tilde\beta_2$ is the minimum variance estimator. It follows, then, that the specification test statistic suggested by Hausman is given, asymptotically, by

$$T(\tilde\beta_1 - \tilde\beta_2)'(G_{OLS} - G_{GLS})_g(\tilde\beta_1 - \tilde\beta_2).$$

18. Strictly speaking, in order to invoke Proposition A1 we must show that $\tilde\beta_2$ is, asymptotically, uncorrelated with every unbiased estimator of zero. On the other hand, we could show by more direct means that $\tilde\beta_2$ satisfies the Cramer-Rao bound.

It is not entirely clear just what rejection (or acceptance) would mean in this context. It could mean, for example, that the covariance matrix is not of the particular form specified; in such a case rejection would have nothing to do with the issue of whether $(X'u/T)$ converges to zero.
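The mechanics of this first example can be sketched numerically; the heteroskedastic variance pattern below is an assumption made purely for illustration, standing in for a known $\Sigma$.

```python
import numpy as np

# Sketch of a Hausman-type comparison in the GLM with a known
# heteroskedastic covariance structure (all data simulated; the
# variance pattern is an assumption made for illustration).
# Under the null both OLS and GLS are consistent, GLS is efficient, and
#   m = (b_ols - b_gls)' (V_ols - V_gls)_g (b_ols - b_gls)
# is chi-squared distributed.
rng = np.random.default_rng(6)
T, k = 500, 3
X = rng.standard_normal((T, k))
sig = np.exp(X[:, 0])                    # known std. dev. of each error
y = X @ np.array([1.0, -1.0, 0.5]) + sig * rng.standard_normal(T)

W = 1.0 / sig**2                         # GLS weights (Sigma^{-1} diagonal)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))

XtX_inv = np.linalg.inv(X.T @ X)
V_ols = XtX_inv @ (X.T @ (sig[:, None] ** 2 * X)) @ XtX_inv
V_gls = np.linalg.inv(X.T @ (W[:, None] * X))

d = b_ols - b_gls                        # asymptotically estimates zero
m = d @ np.linalg.pinv(V_ols - V_gls) @ d
```

As the text cautions, a large value of `m` signals only that *some* maintained assumption fails; it does not, by itself, identify which one.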

Perhaps another example will give a sharper illustration of the ambiguities entailed by such tests. Consider again the standard GLM above, and suppose it is desired to examine the restriction $R\beta = r$, where $R$ and $r$ are an appropriate and known matrix and vector, respectively. Let

$$\tilde\beta_2 = \tilde\beta_1 + (X'X)^{-1}R'C(r - R\tilde\beta_1), \qquad C = [R(X'X)^{-1}R']^{-1},$$

i.e., $\tilde\beta_2$ is the restricted least squares estimator. If the error sequence obeys the standard conditions, then the difference $\tilde\beta_1 - \tilde\beta_2$ has, under the null, a limiting normal distribution with mean zero. As in the standard Hausman context, when $R\beta = r$ is true, we have an efficient estimator, viz. $\tilde\beta_2$. The OLS estimator, $\tilde\beta_1$, is inefficient when the parameter constraint holds but is nonetheless consistent whether the condition $R\beta = r$ is or is not valid. Moreover, the restricted estimator will be inconsistent if the condition above fails to hold. From the distribution above we can now construct the Hausman statistic. Before we do, however, let us simplify the notation somewhat. Put

$$\hat\delta = \tilde\beta_1 - \tilde\beta_2, \qquad H = (X'X)^{-1}R'CR(X'X)^{-1},$$

and note that

$$\hat\delta = (X'X)^{-1}R'C(R\beta - r) + HX'u.$$

Thus, under the null, $R\beta = r$,

$$\hat\delta = HX'u,$$

and, again under the null, the misspecification test statistic is

$$\mu = \frac{\hat\delta' H_g\hat\delta}{\sigma^2},$$

where $H_g$ is the generalized inverse of $H$. After some manipulation (using $HH_gH = H$) we may write the equation above as

$$\mu = \frac{u'XHX'u}{\sigma^2}.$$

Since, for finite samples, $\sigma^2$ is not known, we may divide $\mu$ by $u'Nu/\sigma^2$, where $N = I - X(X'X)^{-1}X'$, thereby obtaining the statistic

$$\mu^* = \frac{u'XHX'u}{u'Nu}.$$
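A numerical sketch of the computation of $\mu^*$, with simulated data and a restriction assumed true in the simulation; it also makes explicit that $\mu^*$ is, up to degrees-of-freedom constants, the usual F statistic for $R\beta = r$.

```python
import numpy as np

# Sketch: computing mu* = u'X H X'u / u'N u for the restricted least
# squares example. Under the null R b = r this equals
# (R b - r)' [R(X'X)^{-1}R']^{-1} (R b - r) / (residual sum of squares),
# i.e. q/(T-k) times the usual F statistic for the restriction.
# The data and the restriction are simulated/assumed for illustration.
rng = np.random.default_rng(7)
T, k = 200, 3
X = rng.standard_normal((T, k))
beta = np.array([1.0, 2.0, 3.0])
y = X @ beta + rng.standard_normal(T)

R = np.array([[1.0, 1.0, 0.0]])
r = R @ beta                      # restriction holds in the simulation

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y
resid = y - X @ b_hat             # equals N u, since N X = 0
C = np.linalg.inv(R @ XtX_inv @ R.T)
num = (R @ b_hat - r) @ C @ (R @ b_hat - r)
mu_star = num / (resid @ resid)

q = R.shape[0]
F = (num / q) / ((resid @ resid) / (T - k))   # standard F statistic
assert np.isclose(mu_star, q * F / (T - k))
```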
