
Econometric Analysis of Cross Section and Panel Data, by Jeffrey M. Wooldridge. Chapter 14.

14 Generalized Method of Moments and Minimum Distance Estimation

In Chapter 8 we saw how the generalized method of moments (GMM) approach to estimation can be applied to multiple-equation linear models, including systems of equations with exogenous or endogenous explanatory variables, and to panel data models. In this chapter we extend GMM to nonlinear estimation problems. This setup allows us to treat various efficiency issues that we have glossed over until now. We also cover the related method of minimum distance estimation. Because the asymptotic analysis has many features in common with Chapters 8 and 12, the analysis is not quite as detailed here as in previous chapters. A good reference for this material, which fills in most of the gaps left here, is Newey and McFadden (1994).

14.1 Asymptotic Properties of GMM

Let $\{w_i \in \mathbb{R}^M : i = 1, 2, \ldots\}$ denote a set of independent, identically distributed random vectors, where some feature of the distribution of $w_i$ is indexed by the $P \times 1$ parameter vector $\theta$. The assumption of identical distribution is mostly for notational convenience; the following methods apply to independently pooled cross sections without modification.

We assume that for some function $g(w_i, \theta) \in \mathbb{R}^L$, the parameter $\theta_o \in \Theta \subset \mathbb{R}^P$ satisfies the moment assumptions

$$E[g(w_i, \theta_o)] = 0 \qquad (14.1)$$

As we saw in the linear case, where $g(w_i, \theta)$ was of the form $Z_i'(y_i - X_i\theta)$, a minimal requirement for these moment conditions to identify $\theta_o$ is $L \geq P$. If $L = P$, then the analogy principle suggests estimating $\theta_o$ by setting the sample counterpart, $N^{-1}\sum_{i=1}^N g(w_i, \theta)$, to zero. In the linear case, this step leads to the instrumental variables estimator [see equation (8.22)]. When $L > P$, we can choose $\hat{\theta}$ to make the sample average close to zero in an appropriate metric. A generalized method of moments (GMM) estimator, $\hat{\theta}$, minimizes a quadratic form in $\sum_{i=1}^N g(w_i, \theta)$:

$$\min_{\theta \in \Theta} \left[\sum_{i=1}^N g(w_i, \theta)\right]' \hat{\Xi} \left[\sum_{i=1}^N g(w_i, \theta)\right] \qquad (14.2)$$

where $\hat{\Xi}$ is an $L \times L$ symmetric, positive semidefinite weighting matrix.

Consistency of the GMM estimator follows along the lines of consistency of the M-estimator in Chapter 12. Under standard moment conditions, $N^{-1}\sum_{i=1}^N g(w_i, \theta)$ satisfies the uniform law of large numbers (see Theorem 12.1). If $\hat{\Xi} \overset{p}{\to} \Xi_o$, where $\Xi_o$ is an $L \times L$ positive definite matrix, then the random function

$$Q_N(\theta) \equiv \left[N^{-1}\sum_{i=1}^N g(w_i, \theta)\right]' \hat{\Xi} \left[N^{-1}\sum_{i=1}^N g(w_i, \theta)\right] \qquad (14.3)$$

converges uniformly in probability to

$$\{E[g(w_i, \theta)]\}' \Xi_o \{E[g(w_i, \theta)]\} \qquad (14.4)$$

Because $\Xi_o$ is positive definite, $\theta_o$ uniquely minimizes expression (14.4). For completeness, we summarize with a theorem containing regularity conditions:

THEOREM 14.1 (Consistency of GMM): Assume that (a) $\Theta$ is compact; (b) for each $\theta \in \Theta$, $g(\cdot, \theta)$ is Borel measurable on $\mathcal{W}$; (c) for each $w \in \mathcal{W}$, $g(w, \cdot)$ is continuous on $\Theta$; (d) $|g_j(w, \theta)| \leq b(w)$ for all $\theta \in \Theta$ and $j = 1, \ldots, L$, where $b(\cdot)$ is a nonnegative function on $\mathcal{W}$ such that $E[b(w)] < \infty$; (e) $\hat{\Xi} \overset{p}{\to} \Xi_o$, an $L \times L$ positive definite matrix; and (f) $\theta_o$ is the unique solution to equation (14.1). Then a random vector $\hat{\theta}$ exists that solves problem (14.2), and $\hat{\theta} \overset{p}{\to} \theta_o$.
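As a concrete illustration of problem (14.2), the following minimal sketch minimizes the sample quadratic form numerically. It is not part of the original text: the names (`gmm`, `gmm_objective`, `g_linear_iv`), the convention that the moment function returns an N x L array, and the use of a BFGS minimizer are all assumptions of the sketch. The objective is scaled by $N^{-2}$, which leaves the minimizer of (14.2) unchanged.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(theta, gfun, data, Xi):
    # Sample analogue of (14.2), scaled by N^{-2}: gbar' Xi gbar,
    # where gbar = N^{-1} sum_i g(w_i, theta) is the L-vector of sample moments.
    gbar = gfun(data, theta).mean(axis=0)
    return gbar @ Xi @ gbar

def gmm(gfun, data, theta0, Xi=None):
    # gfun(data, theta) must return an N x L array whose ith row is g(w_i, theta)'.
    L = gfun(data, theta0).shape[1]
    Xi = np.eye(L) if Xi is None else Xi
    res = minimize(gmm_objective, np.asarray(theta0, dtype=float),
                   args=(gfun, data, Xi), method="BFGS")
    return res.x

# The linear case as a check: g(w_i, theta) = Z_i'(y_i - x_i theta), L >= P.
def g_linear_iv(data, theta):
    y, X, Z = data                       # y: (N,), X: (N, P), Z: (N, L)
    return Z * (y - X @ theta)[:, None]  # N x L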
If we assume only that $\Xi_o$ is positive semidefinite, then we must directly assume that $\theta_o$ is the unique minimizer of expression (14.4). Occasionally this generality is useful, but we will not need it.

Under the assumption that $g(w, \cdot)$ is continuously differentiable on $\text{int}(\Theta)$, $\theta_o \in \text{int}(\Theta)$, and other standard regularity conditions, we can easily derive the limiting distribution of the GMM estimator. The first-order condition for $\hat{\theta}$ can be written as

$$\left[\sum_{i=1}^N \nabla_\theta g(w_i, \hat{\theta})\right]' \hat{\Xi} \left[\sum_{i=1}^N g(w_i, \hat{\theta})\right] = 0 \qquad (14.5)$$

Define the $L \times P$ matrix

$$G_o \equiv E[\nabla_\theta g(w_i, \theta_o)] \qquad (14.6)$$

which we assume to have full rank $P$. This assumption essentially means that the moment conditions (14.1) are nonredundant. Then, by the WLLN and CLT,

$$N^{-1}\sum_{i=1}^N \nabla_\theta g(w_i, \theta_o) \overset{p}{\to} G_o \quad\text{and}\quad N^{-1/2}\sum_{i=1}^N g(w_i, \theta_o) = O_p(1) \qquad (14.7)$$

respectively. Let $g_i(\theta) \equiv g(w_i, \theta)$. A mean value expansion of $\sum_{i=1}^N g(w_i, \hat{\theta})$ about $\theta_o$, appropriate standardizations by the sample size, and replacing random averages with their plims gives

$$0 = G_o' \Xi_o N^{-1/2}\sum_{i=1}^N g_i(\theta_o) + A_o \sqrt{N}(\hat{\theta} - \theta_o) + o_p(1) \qquad (14.8)$$

where

$$A_o \equiv G_o' \Xi_o G_o \qquad (14.9)$$

Since $A_o$ is positive definite under the given assumptions, we have

$$\sqrt{N}(\hat{\theta} - \theta_o) = -A_o^{-1} G_o' \Xi_o N^{-1/2}\sum_{i=1}^N g_i(\theta_o) + o_p(1) \overset{d}{\to} \text{Normal}(0, A_o^{-1} B_o A_o^{-1}) \qquad (14.10)$$

where

$$B_o \equiv G_o' \Xi_o \Lambda_o \Xi_o G_o \qquad (14.11)$$

and

$$\Lambda_o \equiv E[g_i(\theta_o) g_i(\theta_o)'] = \text{Var}[g_i(\theta_o)] \qquad (14.12)$$

Expression (14.10) gives the influence function representation for the GMM estimator, and it also gives its limiting distribution. We summarize with a theorem, which is essentially given by Newey and McFadden (1994, Theorem 3.4):

THEOREM 14.2 (Asymptotic Normality of GMM): In addition to the assumptions in Theorem 14.1, assume that (a) $\theta_o$ is in the interior of $\Theta$; (b) $g(w, \cdot)$ is continuously differentiable on the interior of $\Theta$ for all $w \in \mathcal{W}$; (c) each element of $g(w, \theta_o)$ has finite second moment; (d) each element of $\nabla_\theta g(w, \theta)$ is bounded in absolute value by a function $b(w)$, where $E[b(w)] < \infty$; and (e) $G_o$ in expression (14.6) has rank $P$. Then expression (14.10) holds, and so $\text{Avar}(\hat{\theta}) = A_o^{-1} B_o A_o^{-1}/N$.

Estimating the asymptotic variance of the GMM estimator is easy once $\hat{\theta}$ has been obtained. A consistent estimator of $\Lambda_o$ is given by

$$\hat{\Lambda} \equiv N^{-1}\sum_{i=1}^N g_i(\hat{\theta}) g_i(\hat{\theta})' \qquad (14.13)$$

and $\text{Avar}(\hat{\theta})$ is estimated as $\hat{A}^{-1}\hat{B}\hat{A}^{-1}/N$, where

$$\hat{A} \equiv \hat{G}'\hat{\Xi}\hat{G}, \qquad \hat{B} \equiv \hat{G}'\hat{\Xi}\hat{\Lambda}\hat{\Xi}\hat{G} \qquad (14.14)$$

and

$$\hat{G} \equiv N^{-1}\sum_{i=1}^N \nabla_\theta g_i(\hat{\theta}) \qquad (14.15)$$

As in the linear case in Section 8.3.3, an optimal weighting matrix exists for the given moment conditions: $\hat{\Xi}$ should be a consistent estimator of $\Lambda_o^{-1}$. When $\Xi_o = \Lambda_o^{-1}$, $B_o = A_o$ and $\text{Avar}\,\sqrt{N}(\hat{\theta} - \theta_o) = (G_o'\Lambda_o^{-1}G_o)^{-1}$. Thus the difference in asymptotic variances between the general GMM estimator and the estimator with $\text{plim}\,\hat{\Xi} = \Lambda_o^{-1}$ is

$$(G_o'\Xi_o G_o)^{-1}(G_o'\Xi_o\Lambda_o\Xi_o G_o)(G_o'\Xi_o G_o)^{-1} - (G_o'\Lambda_o^{-1}G_o)^{-1} \qquad (14.16)$$

This expression can be shown to be positive semidefinite using the same argument as in Chapter 8 (see Problem 8.5).

In order to obtain an asymptotically efficient GMM estimator we need a preliminary estimator of $\theta_o$ in order to obtain $\hat{\Lambda}$. Let $\hat{\hat{\theta}}$ be such an estimator, and define $\hat{\Lambda}$ as in expression (14.13) but with $\hat{\hat{\theta}}$ in place of $\hat{\theta}$. Then an efficient GMM estimator [given the function $g(w, \theta)$] solves

$$\min_{\theta \in \Theta} \left[\sum_{i=1}^N g(w_i, \theta)\right]' \hat{\Lambda}^{-1} \left[\sum_{i=1}^N g(w_i, \theta)\right] \qquad (14.17)$$

and its asymptotic variance is estimated as

$$\widehat{\text{Avar}}(\hat{\theta}) = (\hat{G}'\hat{\Lambda}^{-1}\hat{G})^{-1}/N \qquad (14.18)$$
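A hedged sketch of the two-step procedure just described: the first stage produces $\hat{\Lambda}$ from (14.13) at a preliminary estimate, the second stage re-minimizes with weight $\hat{\Lambda}^{-1}$ as in (14.17), and the variance estimate follows (14.18), with the average Jacobian $\hat{G}$ of (14.15) approximated by central finite differences. The `solver` argument is assumed to behave like the `gmm` routine sketched earlier; all names are illustrative.

```python
import numpy as np
from numpy.linalg import inv

def avg_jacobian(gfun, data, theta, h=1e-6):
    # Central finite-difference approximation to G_hat in (14.15), an L x P matrix.
    gbar = lambda th: gfun(data, th).mean(axis=0)
    P = theta.size
    cols = []
    for j in range(P):
        e = np.zeros(P); e[j] = h
        cols.append((gbar(theta + e) - gbar(theta - e)) / (2 * h))
    return np.column_stack(cols)

def efficient_gmm(gfun, data, theta_prelim, solver):
    g0 = gfun(data, theta_prelim)          # N x L, evaluated at the preliminary estimate
    N = g0.shape[0]
    Lam = g0.T @ g0 / N                    # Lambda_hat, equation (14.13)
    theta_hat = solver(gfun, data, theta_prelim, Xi=inv(Lam))   # problem (14.17)
    G = avg_jacobian(gfun, data, theta_hat)
    avar = inv(G.T @ inv(Lam) @ G) / N     # equation (14.18)
    return theta_hat, avar
```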
As in the linear case, an optimal GMM estimator is called the minimum chi-square estimator, because

$$\left[N^{-1/2}\sum_{i=1}^N g_i(\hat{\theta})\right]' \hat{\Lambda}^{-1} \left[N^{-1/2}\sum_{i=1}^N g_i(\hat{\theta})\right] \qquad (14.19)$$

has a limiting chi-square distribution with $L - P$ degrees of freedom under the conditions of Theorem 14.2. Therefore, the value of the objective function (properly standardized by the sample size) can be used as a test of any overidentifying restrictions in equation (14.1) when $L > P$. If statistic (14.19) exceeds the relevant critical value in a $\chi^2_{L-P}$ distribution, then equation (14.1) must be rejected: at least some of the moment conditions are not supported by the data. For the linear model, this is the same statistic given in equation (8.49).

As always, we can test hypotheses of the form $H_0: c(\theta_o) = 0$, where $c(\theta)$ is a $Q \times 1$ vector, $Q \leq P$, by using the Wald approach and the appropriate variance matrix estimator. A statistic based on the difference in objective functions is also available if the minimum chi-square estimator is used, so that $B_o = A_o$. Let $\tilde{\theta}$ denote the solution to problem (14.17) subject to the restrictions $c(\theta) = 0$, and let $\hat{\theta}$ denote the unrestricted estimator solving problem (14.17); importantly, both use the same weighting matrix $\hat{\Lambda}^{-1}$. Typically, $\hat{\Lambda}$ is obtained from a first-stage, unrestricted estimator. Assuming that the constraints can be written in implicit form and satisfy the conditions discussed in Section 12.6.2, the GMM distance statistic (or GMM criterion function statistic) has a limiting $\chi^2_Q$ distribution:

$$\left\{\left[\sum_{i=1}^N g_i(\tilde{\theta})\right]'\hat{\Lambda}^{-1}\left[\sum_{i=1}^N g_i(\tilde{\theta})\right] - \left[\sum_{i=1}^N g_i(\hat{\theta})\right]'\hat{\Lambda}^{-1}\left[\sum_{i=1}^N g_i(\hat{\theta})\right]\right\}\bigg/N \;\overset{d}{\to}\; \chi^2_Q \qquad (14.20)$$

When applied to linear GMM problems, we obtain the statistic in equation (8.45). One nice feature of expression (14.20) is that it is invariant to reparameterization of the null hypothesis, just as the quasi-LR statistic is invariant for M-estimation. Therefore, we might prefer statistic (14.20) over the Wald statistic (8.48) for testing nonlinear restrictions in linear models. Of course, the computation of expression (14.20) is more difficult because we would actually need to carry out estimation subject to nonlinear restrictions.
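The overidentification statistic (14.19) and the distance statistic (14.20) reduce to a few lines of code. A minimal sketch; the function names are illustrative, `Lam` is the $\hat{\Lambda}$ used in estimation, and the restricted and unrestricted estimates are assumed to come from problem (14.17) with the same $\hat{\Lambda}$, as the text requires.

```python
import numpy as np
from numpy.linalg import inv
from scipy.stats import chi2

def overid_test(gfun, data, theta_hat, Lam):
    # Statistic (14.19): N * gbar' Lam^{-1} gbar ~ chi2(L - P) under (14.1).
    g = gfun(data, theta_hat)
    N, L = g.shape
    gbar = g.mean(axis=0)
    stat = N * gbar @ inv(Lam) @ gbar
    return stat, chi2.sf(stat, df=L - theta_hat.size)

def distance_test(gfun, data, theta_restr, theta_unrestr, Lam, Q):
    # GMM distance statistic (14.20): restricted minus unrestricted criterion,
    # both with weighting matrix Lam^{-1}, divided by N; ~ chi2(Q) under H0.
    def crit(theta):
        s = gfun(data, theta).sum(axis=0)
        return s @ inv(Lam) @ s
    N = gfun(data, theta_restr).shape[0]
    stat = (crit(theta_restr) - crit(theta_unrestr)) / N
    return stat, chi2.sf(stat, df=Q)
```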
A nice application of the GMM methods discussed in this section is to two-step estimation procedures, which arose in Chapters 6, 12, and 13. Suppose that the estimator $\hat{\theta}$ (it could be an M-estimator or a GMM estimator) depends on a first-stage estimator, $\hat{\gamma}$. A unified approach to obtaining the asymptotic variance of $\hat{\theta}$ is to stack the first-order conditions for $\hat{\theta}$ and $\hat{\gamma}$ into the same function $g(\cdot)$. This is always possible for the estimators encountered in this book. For example, if $\hat{\gamma}$ is an M-estimator solving $\sum_{i=1}^N s(w_i, \hat{\gamma}) = 0$, and $\hat{\theta}$ is a two-step M-estimator solving

$$\sum_{i=1}^N h(w_i, \hat{\theta}, \hat{\gamma}) = 0 \qquad (14.21)$$

then we can obtain the asymptotic variance of $\hat{\theta}$ by defining

$$g(w, \theta, \gamma) \equiv \begin{bmatrix} h(w, \theta, \gamma) \\ s(w, \gamma) \end{bmatrix}$$

and applying the GMM formulas. The first-order condition for the full GMM problem reproduces the first-order conditions for each estimator separately.

In general, either $\hat{\gamma}$, $\hat{\theta}$, or both might themselves be GMM estimators. Then stacking the orthogonality conditions into one vector can simplify the derivation of the asymptotic variance of the second-step estimator $\hat{\theta}$, while also ensuring efficient estimation when the optimal weighting matrix is used.

Finally, sometimes we want to know whether adding additional moment conditions does not improve the efficiency of the minimum chi-square estimator. (Adding additional moment conditions can never reduce asymptotic efficiency, provided an efficient weighting matrix is used.) In other words, if we start with equation (14.1) but add new moments of the form $E[h(w, \theta_o)] = 0$, when does using the extra moment conditions yield the same asymptotic variance as the original moment conditions? Breusch, Qian, Schmidt, and Wyhowski (1999) prove some general redundancy results for the minimum chi-square estimator. Qian and Schmidt (1999) study the problem of adding moment conditions that do not depend on unknown parameters, and they characterize when such moment conditions improve efficiency.

14.2 Estimation under Orthogonality Conditions

In Chapter 8 we saw how linear systems of equations can be estimated by GMM under certain orthogonality conditions. In general applications, the moment conditions (14.1) almost always arise from assumptions that disturbances are uncorrelated with exogenous variables. For a $G \times 1$ vector $r(w_i, \theta)$ and a $G \times L$ matrix $Z_i$, assume that $\theta_o$ satisfies

$$E[Z_i' r(w_i, \theta_o)] = 0 \qquad (14.22)$$

The vector function $r(w_i, \theta)$ can be thought of as a generalized residual function, and the matrix $Z_i$ is usually called the matrix of instruments. Equation (14.22) is a special case of equation (14.1) with $g(w_i, \theta) \equiv Z_i' r(w_i, \theta)$. In what follows, write $r_i(\theta) \equiv r(w_i, \theta)$.

Identification requires that $\theta_o$ be the only $\theta \in \Theta$ such that equation (14.22) holds. Condition (e) of the asymptotic normality result, Theorem 14.2, requires that $\text{rank}\,E[Z_i' \nabla_\theta r_i(\theta_o)] = P$ (so $L \geq P$ is necessary). Thus, while $Z_i$ must be orthogonal to $r_i(\theta_o)$, $Z_i$ must be sufficiently correlated with the $G \times P$ Jacobian, $\nabla_\theta r_i(\theta_o)$. In the linear case where $r(w_i, \theta) = y_i - X_i\theta$, this requirement reduces to $E(Z_i' X_i)$ having full column rank, which is simply Assumption SIV.2 in Chapter 8.
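In code, passing from a residual function and instruments to the moment function of (14.22) is mechanical. A small sketch, assuming the residual function returns an N x G array and the instruments are stored as an N x G x L array (one G x L matrix $Z_i$ per observation); the names are illustrative, and the resulting function can be fed to the GMM routines sketched in Section 14.1.

```python
import numpy as np

def instrument_moments(rfun, Z):
    # Build g(w_i, theta) = Z_i' r(w_i, theta), the special case (14.22) of (14.1).
    # rfun(data, theta) -> N x G residuals; Z -> N x G x L instrument matrices.
    def gfun(data, theta):
        r = rfun(data, theta)
        return np.einsum("igl,ig->il", Z, r)   # row i is (Z_i' r_i)', an L-vector
    return gfun
```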
Given the instruments $Z_i$, the efficient estimator can be obtained as in Section 14.1. A preliminary estimator $\hat{\hat{\theta}}$ is usually obtained with the weighting matrix

$$\hat{\Xi} \equiv \left(N^{-1}\sum_{i=1}^N Z_i' Z_i\right)^{-1} \qquad (14.23)$$

so that $\hat{\hat{\theta}}$ solves

$$\min_{\theta \in \Theta} \left[\sum_{i=1}^N Z_i' r_i(\theta)\right]'\left(N^{-1}\sum_{i=1}^N Z_i' Z_i\right)^{-1}\left[\sum_{i=1}^N Z_i' r_i(\theta)\right] \qquad (14.24)$$

The solution to problem (14.24) is called the nonlinear system 2SLS estimator; it is an example of a nonlinear instrumental variables estimator.

From Section 14.1, we know that the nonlinear system 2SLS estimator is guaranteed to be the efficient GMM estimator if for some $\sigma_o^2 > 0$, $E[Z_i' r_i(\theta_o) r_i(\theta_o)' Z_i] = \sigma_o^2 E(Z_i' Z_i)$. Generally, this is a strong assumption. Instead, we can obtain the minimum chi-square estimator by obtaining

$$\hat{\Lambda} = N^{-1}\sum_{i=1}^N Z_i' r_i(\hat{\hat{\theta}})\, r_i(\hat{\hat{\theta}})' Z_i \qquad (14.25)$$

and using this in expression (14.17).

In some cases more structure is available that leads to a three-stage least squares estimator. In particular, suppose that

$$E[Z_i' r_i(\theta_o) r_i(\theta_o)' Z_i] = E(Z_i' \Omega_o Z_i) \qquad (14.26)$$

where $\Omega_o$ is the $G \times G$ matrix

$$\Omega_o = E[r_i(\theta_o) r_i(\theta_o)'] \qquad (14.27)$$

When $E[r_i(\theta_o)] = 0$, as is almost always the case under assumption (14.22), $\Omega_o$ is the variance matrix of $r_i(\theta_o)$. As in Chapter 8, assumption (14.26) is a kind of system homoskedasticity assumption. By iterated expectations, a sufficient condition for assumption (14.26) is

$$E[r_i(\theta_o) r_i(\theta_o)' \mid Z_i] = \Omega_o \qquad (14.28)$$

However, assumption (14.26) can hold in cases where assumption (14.28) does not.

If assumption (14.26) holds, then $\Lambda_o$ can be estimated as

$$\hat{\Lambda} = N^{-1}\sum_{i=1}^N Z_i' \hat{\Omega} Z_i \qquad (14.29)$$

where

$$\hat{\Omega} = N^{-1}\sum_{i=1}^N r_i(\hat{\hat{\theta}})\, r_i(\hat{\hat{\theta}})' \qquad (14.30)$$

and $\hat{\hat{\theta}}$ is a preliminary estimator. The resulting GMM estimator is usually called the nonlinear 3SLS (N3SLS) estimator. The name is a holdover from the traditional 3SLS estimator in linear systems of equations; there are not really three estimation steps. We should remember that nonlinear 3SLS is generally inefficient when assumption (14.26) fails.
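A hedged sketch of the nonlinear system 2SLS problem (14.24) and of the two weighting-matrix estimates just described. Array conventions follow the previous sketch (instruments as N x G x L, residuals as N x G), residuals for the weighting matrices are evaluated at a preliminary estimate, and the names are illustrative.

```python
import numpy as np
from numpy.linalg import inv
from scipy.optimize import minimize

def system_2sls(rfun, data, Z, theta0):
    # Nonlinear system 2SLS, problem (14.24), with weighting matrix (14.23).
    N = Z.shape[0]
    W = inv(np.einsum("igl,igm->lm", Z, Z) / N)        # (N^{-1} sum_i Z_i'Z_i)^{-1}
    def crit(theta):
        m = np.einsum("igl,ig->l", Z, rfun(data, theta))   # sum_i Z_i' r_i(theta)
        return m @ W @ m
    return minimize(crit, theta0, method="BFGS").x

def lambda_hat(rfun, data, Z, theta_prelim, homoskedastic=False):
    r = rfun(data, theta_prelim)
    N = Z.shape[0]
    if homoskedastic:                                  # N3SLS: (14.29)-(14.30)
        Omega = r.T @ r / N
        return np.einsum("igl,gh,ihm->lm", Z, Omega, Z) / N
    Zr = np.einsum("igl,ig->il", Z, r)                 # minimum chi-square: (14.25)
    return Zr.T @ Zr / N
```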
The Wald statistic and the QLR statistic can be computed as in Section 14.1. In addition, a score statistic is sometimes useful. Let $\tilde{\tilde{\theta}}$ be a preliminary inefficient estimator with the $Q$ restrictions imposed; $\tilde{\tilde{\theta}}$ would usually come from problem (14.24) subject to the restrictions $c(\theta) = 0$. Let $\tilde{\Lambda}$ be the estimated weighting matrix from equation (14.25) or (14.29), based on $\tilde{\tilde{\theta}}$, and let $\tilde{\theta}$ be the minimum chi-square estimator using weighting matrix $\tilde{\Lambda}^{-1}$. Then the score statistic is based on the limiting distribution of the score of the unrestricted objective function evaluated at the restricted estimates, properly standardized:

$$\left[N^{-1}\sum_{i=1}^N Z_i' \nabla_\theta r_i(\tilde{\theta})\right]' \tilde{\Lambda}^{-1} \left[N^{-1/2}\sum_{i=1}^N Z_i' r_i(\tilde{\theta})\right] \qquad (14.31)$$

Let $\tilde{s}_i \equiv \tilde{G}'\tilde{\Lambda}^{-1} Z_i' \tilde{r}_i$, where $\tilde{G}$ is the first matrix in expression (14.31), and let $s_i^o \equiv G_o'\Lambda_o^{-1} Z_i' r_i^o$, where $r_i^o \equiv r_i(\theta_o)$. Then, following the proof in Section 12.6.2, it can be shown that equation (12.67) holds with $A_o \equiv G_o'\Lambda_o^{-1}G_o$. Further, since $B_o = A_o$ for the minimum chi-square estimator, we obtain

$$LM = \left(\sum_{i=1}^N \tilde{s}_i\right)' \tilde{A}^{-1} \left(\sum_{i=1}^N \tilde{s}_i\right)\bigg/N \qquad (14.32)$$

where $\tilde{A} = \tilde{G}'\tilde{\Lambda}^{-1}\tilde{G}$. Under $H_0$ and the usual regularity conditions, $LM$ has a limiting $\chi^2_Q$ distribution.

14.3 Systems of Nonlinear Equations

A leading application of the results in Section 14.2 is to estimation of the parameters in an implicit set of nonlinear equations, such as a nonlinear simultaneous equations model. Partition $w_i$ as $y_i \in \mathbb{R}^J$, $x_i \in \mathbb{R}^K$ and, for $h = 1, \ldots, G$, suppose we have

$$q_1(y_i, x_i, \theta_{o1}) = u_{i1}, \quad \ldots, \quad q_G(y_i, x_i, \theta_{oG}) = u_{iG} \qquad (14.33)$$

where $\theta_{oh}$ is a $P_h \times 1$ vector of parameters. As an example, write a two-equation SEM in the population as

$$y_1 = x_1\delta_1 + \gamma_1 y_2^{\gamma_2} + u_1 \qquad (14.34)$$
$$y_2 = x_2\delta_2 + \gamma_3 y_1 + u_2 \qquad (14.35)$$

(where we drop "o" to index the true parameters). This model, unlike those covered in Section 9.5, is nonlinear in the parameters as well as the endogenous variables. Nevertheless, assuming that $E(u_g \mid x) = 0$, $g = 1, 2$, the parameters in the system can be estimated by GMM by defining $q_1(y, x, \theta_1) = y_1 - x_1\delta_1 - \gamma_1 y_2^{\gamma_2}$ and $q_2(y, x, \theta_2) = y_2 - x_2\delta_2 - \gamma_3 y_1$.

Generally, the equations (14.33) need not actually determine $y_i$ given the exogenous variables and disturbances; in fact, nothing requires $J = G$. Sometimes equations (14.33) represent a system of orthogonality conditions of the form $E[q_g(y, x, \theta_{og}) \mid x] = 0$, $g = 1, \ldots, G$. We will see an example later.

Denote the $P \times 1$ vector of all parameters by $\theta_o$, and the parameter space by $\Theta \subset \mathbb{R}^P$. To identify the parameters we need the errors $u_{ih}$ to satisfy some orthogonality conditions. A general assumption is, for some subvector $x_{ih}$ of $x_i$,

$$E(u_{ih} \mid x_{ih}) = 0, \qquad h = 1, 2, \ldots, G \qquad (14.36)$$

This allows elements of $x_i$ to be correlated with some errors, a situation that sometimes arises in practice (see, for example, Chapter 9 and Wooldridge, 1996). Under assumption (14.36), let $z_{ih} \equiv f_h(x_{ih})$ be a $1 \times L_h$ vector of possibly nonlinear functions of $x_i$. If there are no restrictions on the $\theta_{oh}$ across equations, we should have $L_h \geq P_h$ so that each $\theta_{oh}$ is identified. By iterated expectations, for all $h = 1, \ldots, G$,

$$E(z_{ih}' u_{ih}) = 0 \qquad (14.37)$$

provided appropriate moments exist. Therefore, we obtain a set of orthogonality conditions by defining the $G \times L$ matrix $Z_i$ as the block diagonal matrix with $z_{ig}$ in the $g$th block:

$$Z_i \equiv \begin{bmatrix} z_{i1} & 0 & \cdots & 0 \\ 0 & z_{i2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & z_{iG} \end{bmatrix} \qquad (14.38)$$

where $L \equiv L_1 + L_2 + \cdots + L_G$. Letting $r(w_i, \theta) \equiv q(y_i, x_i, \theta) \equiv [q_{i1}(\theta_1), \ldots, q_{iG}(\theta_G)]'$, equation (14.22) holds under assumption (14.36).

When there are no restrictions on the $\theta_g$ across equations and $Z_i$ is chosen as in matrix (14.38), the system 2SLS estimator reduces to the nonlinear 2SLS (N2SLS) estimator (Amemiya, 1974) equation by equation. That is, for each $h$, the N2SLS estimator solves

$$\min_{\theta_h} \left[\sum_{i=1}^N z_{ih}' q_{ih}(\theta_h)\right]'\left(\sum_{i=1}^N z_{ih}' z_{ih}\right)^{-1}\left[\sum_{i=1}^N z_{ih}' q_{ih}(\theta_h)\right] \qquad (14.39)$$

Given only the orthogonality conditions (14.37), the N2SLS estimator is the efficient estimator of $\theta_{oh}$ if

$$E(u_{ih}^2 z_{ih}' z_{ih}) = \sigma_{oh}^2 E(z_{ih}' z_{ih}) \qquad (14.40)$$

where $\sigma_{oh}^2 \equiv E(u_{ih}^2)$; sufficient for condition (14.40) is $E(u_{ih}^2 \mid x_{ih}) = \sigma_{oh}^2$. Let $\hat{\theta}_h$ denote the N2SLS estimator. Then a consistent estimator of $\sigma_{oh}^2$ is

$$\hat{\sigma}_h^2 \equiv N^{-1}\sum_{i=1}^N \hat{u}_{ih}^2 \qquad (14.41)$$

where $\hat{u}_{ih} \equiv q_h(y_i, x_i, \hat{\theta}_h)$ are the N2SLS residuals. Under assumptions (14.37) and (14.40), the asymptotic variance of $\hat{\theta}_h$ is estimated as

$$\hat{\sigma}_h^2 \left\{\left[\sum_{i=1}^N z_{ih}' \nabla_{\theta_h} q_{ih}(\hat{\theta}_h)\right]'\left(\sum_{i=1}^N z_{ih}' z_{ih}\right)^{-1}\left[\sum_{i=1}^N z_{ih}' \nabla_{\theta_h} q_{ih}(\hat{\theta}_h)\right]\right\}^{-1} \qquad (14.42)$$

where $\nabla_{\theta_h} q_{ih}(\hat{\theta}_h)$ is the $1 \times P_h$ gradient.

If assumption (14.37) holds but assumption (14.40) does not, the N2SLS estimator is still $\sqrt{N}$-consistent, but it is not the efficient estimator that uses the orthogonality conditions (14.37) whenever $L_h > P_h$ [and expression (14.42) is no longer valid]. A more efficient estimator is obtained by solving

$$\min_{\theta_h} \left[\sum_{i=1}^N z_{ih}' q_{ih}(\theta_h)\right]'\left(N^{-1}\sum_{i=1}^N \hat{u}_{ih}^2\, z_{ih}' z_{ih}\right)^{-1}\left[\sum_{i=1}^N z_{ih}' q_{ih}(\theta_h)\right]$$

with asymptotic variance estimated as

$$\left\{\left[\sum_{i=1}^N z_{ih}' \nabla_{\theta_h} q_{ih}(\hat{\theta}_h)\right]'\left(\sum_{i=1}^N \hat{u}_{ih}^2\, z_{ih}' z_{ih}\right)^{-1}\left[\sum_{i=1}^N z_{ih}' \nabla_{\theta_h} q_{ih}(\hat{\theta}_h)\right]\right\}^{-1}$$

This estimator is asymptotically equivalent to the N2SLS estimator if assumption (14.40) happens to hold.

Rather than focus on one equation at a time, we can increase efficiency if we estimate the equations simultaneously. One reason for doing so is to impose cross equation restrictions on the $\theta_{oh}$. The system 2SLS estimator can be used for these purposes.

[...]

$$E[(1 + a_{it})(c_{it}/c_{i,t-1})^{-\lambda_o} \mid I_{i,t-1}] = (1 + \delta)^{-1}\exp(x_{it}\beta_o) \qquad (14.50)$$

where $I_{it}$ is family $i$'s information set at time $t$ and $x_{it} \equiv h_{i,t-1} - h_{it}$; equation (14.50) assumes that $h_{it} - h_{i,t-1} \in I_{i,t-1}$, an assumption which is often reasonable. Given equation (14.50), we can define a residual function for each $t$:

$$r_{it}(\theta) \equiv (1 + a_{it})(c_{it}/c_{i,t-1})^{-\lambda} - \exp(x_{it}\beta) \qquad (14.51)$$

where $(1 + \delta)^{-1}$ is absorbed in an intercept in $x_{it}$. Let $w_{it}$ contain $c_{it}$, $c_{i,t-1}$, $a_{it}$, and $x_{it}$. Then condition (14.48) holds, and $\lambda_o$ and $\beta_o$ can be estimated by GMM.

Returning to condition (14.48), valid instruments at time $t$ are functions of information known at time $t - 1$:

$$z_t = f_t(w_{t-1}, \ldots, w_1) \qquad (14.52)$$

The $T \times 1$ residual vector is $r(w, \theta) = [r_1(w_1, \theta), \ldots, r_T(w_T, \theta)]'$, and the matrix of instruments has the same form as matrix (14.38) for each $i$ (with $G = T$). Then the minimum chi-square estimator can be obtained after using the system 2SLS estimator, although the choice of instruments is a nontrivial matter. A common choice is linear and quadratic functions of variables lagged one or two time periods.

Estimation of the optimal weighting matrix is somewhat simplified under the conditional moment restrictions (14.48). Recall from Section 14.2 that the optimal estimator uses the inverse of a consistent estimator of $\Lambda_o = E[Z_i' r_i(\theta_o) r_i(\theta_o)' Z_i]$. Under condition (14.48), this matrix is block diagonal. Dropping the $i$ subscript, the $(s, t)$ block is $E[r_s(\theta_o) r_t(\theta_o) z_s' z_t]$. For concreteness, assume that $s < t$. Then $z_t$, $z_s$, and $r_s(\theta_o)$ are all functions of $w_{t-1}, w_{t-2}, \ldots, w_1$. By iterated expectations it follows that

$$E[r_s(\theta_o) r_t(\theta_o) z_s' z_t] = E\{r_s(\theta_o) z_s' z_t\, E[r_t(\theta_o) \mid w_{t-1}, \ldots, w_1]\} = 0$$

and so we only need to estimate the diagonal blocks of $E[Z_i' r_i(\theta_o) r_i(\theta_o)' Z_i]$:

$$N^{-1}\sum_{i=1}^N \hat{r}_{it}^2\, z_{it}' z_{it} \qquad (14.53)$$

is a consistent estimator of the $t$th block, where the $\hat{r}_{it}$ are obtained from an inefficient GMM estimator.

In cases where the data frequency does not match the horizon relevant for decision making, the optimal matrix does not have the block diagonal form: some off-diagonal blocks will be nonzero. See Hansen (1982) for the pure time series case.

Ahn and Schmidt (1995) apply nonlinear GMM methods to estimate the linear, unobserved effects AR(1) model. Some of the orthogonality restrictions they use are nonlinear in the parameters of interest. In Part IV we will cover nonlinear panel data models with unobserved effects.
For the consumption example, we would like to allow for a family-specific rate of time preference, as well as unobserved family tastes. Orthogonality conditions can often be obtained in such cases, but they are not as straightforward to obtain as in the previous example.

14.5 Efficient Estimation

In Chapter 8 we obtained the efficient weighting matrix for GMM estimation of linear models, and we extended that result to nonlinear models in Section 14.1. In Chapter 13 we asserted that maximum likelihood estimation has some important efficiency properties. We are now in a position to study a framework that allows us to show the efficiency of an estimator within a particular class of estimators, and also to find efficient estimators within a stated class. Our approach is essentially that in Newey and McFadden (1994, Section 5.3), although we will not use the weakest possible assumptions. Bates and White (1993) proposed a very similar framework and also considered time series problems.

14.5.1 A General Efficiency Framework

Most estimators in econometrics (and all of the ones we have studied) are $\sqrt{N}$-asymptotically normal, with variance matrices of the form

$$V = A^{-1} E[s(w)s(w)'] (A')^{-1} \qquad (14.54)$$

where, in most cases, $s(w)$ is the score of an objective function (evaluated at $\theta_o$) and $A$ is the expected value of the Jacobian of the score, again evaluated at $\theta_o$. (We suppress an "o" subscript here, as the value of the true parameter is irrelevant.) All M-estimators with twice continuously differentiable objective functions (and even some without) have variance matrices of this form, as do GMM estimators. The following lemma is a useful sufficient condition for showing that one estimator is more efficient than another.

LEMMA 14.1 (Relative Efficiency): Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two $\sqrt{N}$-asymptotically normal estimators of the $P \times 1$ parameter vector $\theta_o$, with asymptotic variances of the form (14.54) (with appropriate subscripts on $A$, $s$, and $V$). If for some $\rho > 0$,

$$E[s_1(w)s_1(w)'] = \rho A_1 \qquad (14.55)$$
$$E[s_2(w)s_1(w)'] = \rho A_2 \qquad (14.56)$$

then $V_2 - V_1$ is positive semidefinite.

The proof of Lemma 14.1 is given in the chapter appendix.

Condition (14.55) is essentially the generalized information matrix equality (GIME) we introduced in Section 12.5.1 for the estimator $\hat{\theta}_1$. Notice that $A_1$ is necessarily symmetric and positive definite under condition (14.55). Condition (14.56) is new. In most cases, it says that the expected outer product of the scores $s_2$ and $s_1$ equals the expected Jacobian of $s_2$ (evaluated at $\theta_o$). In Section 12.5.1 we claimed that the GIME plays a role in efficiency, and Lemma 14.1 shows that it does.

Verifying the conditions of Lemma 14.1 is also very convenient for constructing simple forms of the Hausman (1978) statistic in a variety of contexts. Provided that the two estimators are jointly asymptotically normally distributed (something that is almost always true when each is $\sqrt{N}$-asymptotically normal, and that can be verified by stacking the first-order representations of the estimators), assumptions (14.55) and (14.56) imply that the asymptotic covariance between $\sqrt{N}(\hat{\theta}_2 - \theta_o)$ and $\sqrt{N}(\hat{\theta}_1 - \theta_o)$ is $A_2^{-1}E(s_2 s_1')A_1^{-1} = A_2^{-1}(\rho A_2)A_1^{-1} = \rho A_1^{-1} = \text{Avar}[\sqrt{N}(\hat{\theta}_1 - \theta_o)]$. In other words, the asymptotic covariance between the ($\sqrt{N}$-scaled) estimators is equal to the asymptotic variance of the efficient estimator. This equality implies that $\text{Avar}[\sqrt{N}(\hat{\theta}_2 - \hat{\theta}_1)] = V_2 + V_1 - C - C' = V_2 + V_1 - 2V_1 = V_2 - V_1$, where $C$ is the asymptotic covariance.
If $V_2 - V_1$ is actually positive definite (rather than just positive semidefinite), then

$$[\sqrt{N}(\hat{\theta}_2 - \hat{\theta}_1)]'(\hat{V}_2 - \hat{V}_1)^{-1}[\sqrt{N}(\hat{\theta}_2 - \hat{\theta}_1)] \overset{a}{\sim} \chi^2_P$$

under the assumptions of Lemma 14.1, where $\hat{V}_g$ is a consistent estimator of $V_g$, $g = 1, 2$. Statistically significant differences between $\hat{\theta}_2$ and $\hat{\theta}_1$ signal some sort of model misspecification. (See Section 6.2.1, where we discussed this form of the Hausman test for comparing 2SLS and OLS to test whether the explanatory variables are exogenous.) If assumptions (14.55) and (14.56) do not hold, this standard form of the Hausman statistic is invalid.

Given Lemma 14.1, we can state a condition that implies efficiency of an estimator in an entire class of estimators. It is useful to be somewhat formal in defining the relevant class of estimators. We do so by introducing an index, $\tau$. For each $\tau$ in an index set, say, $\mathcal{T}$, the estimator $\hat{\theta}_\tau$ has an associated $s_\tau$ and $A_\tau$ such that the asymptotic variance of $\sqrt{N}(\hat{\theta}_\tau - \theta_o)$ has the form (14.54). The index can be very abstract; it simply serves to distinguish different $\sqrt{N}$-asymptotically normal estimators of $\theta_o$. For example, in the class of M-estimators, the set $\mathcal{T}$ consists of objective functions $q(\cdot\,,\cdot)$ such that $\theta_o$ uniquely minimizes $E[q(w, \theta)]$ over $\Theta$, and $q$ satisfies the twice continuously differentiable and bounded moment assumptions imposed for asymptotic normality. For GMM with given moment conditions, $\mathcal{T}$ is the set of all $L \times L$ positive definite matrices. We will see another example in Section 14.5.3.

Lemma 14.1 immediately implies the following theorem.

THEOREM 14.3 (Efficiency in a Class of Estimators): Let $\{\hat{\theta}_\tau : \tau \in \mathcal{T}\}$ be a class of $\sqrt{N}$-asymptotically normal estimators with variance matrices of the form (14.54). If for some $\tau^* \in \mathcal{T}$ and $\rho > 0$,

$$E[s_{\tau^*}(w)s_\tau(w)'] = \rho A_\tau, \qquad \text{all } \tau \in \mathcal{T} \qquad (14.57)$$

then $\hat{\theta}_{\tau^*}$ is asymptotically relatively efficient in the class $\{\hat{\theta}_\tau : \tau \in \mathcal{T}\}$.

This theorem has many applications. If we specify a class of estimators by defining the index set $\mathcal{T}$, then the estimator $\hat{\theta}_{\tau^*}$ is more efficient than all other estimators in the class if we can show condition (14.57). [A partial converse to Theorem 14.3 also holds; see Newey and McFadden (1994, Section 5.3).] This is not to say that $\hat{\theta}_{\tau^*}$ is necessarily more efficient than all possible $\sqrt{N}$-asymptotically normal estimators. If there is an estimator that falls outside of the specified class, then Theorem 14.3 does not help us to compare it with $\hat{\theta}_{\tau^*}$. In this sense, Theorem 14.3 is a more general (and asymptotic) version of the Gauss-Markov theorem from linear regression analysis: while the Gauss-Markov theorem states that OLS has the smallest variance in the class of linear, unbiased estimators, it does not allow us to compare OLS to unbiased estimators that are not linear in the vector of observations on the dependent variable.
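Under the conditions of Lemma 14.1, the Hausman statistic described above takes a particularly simple form because the variance of the difference is the difference of the variances. A hedged sketch: the names are illustrative, `V_eff` and `V_other` are assumed to already estimate the Avar of each estimator, and in finite samples their difference need not be positive definite, in which case this simple form should not be used.

```python
import numpy as np
from numpy.linalg import inv
from scipy.stats import chi2

def hausman(theta_eff, V_eff, theta_other, V_other):
    # Simple-form Hausman statistic: d'(V2 - V1)^{-1} d ~ chi2(P) when
    # conditions (14.55)-(14.56) hold and V2 - V1 is positive definite.
    d = theta_other - theta_eff
    stat = d @ inv(V_other - V_eff) @ d
    return stat, chi2.sf(stat, df=d.size)
```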
14.5.2 Efficiency of MLE

Students of econometrics are often told that the maximum likelihood estimator is "efficient." Unfortunately, in the context of conditional MLE from Chapter 13, the statement of efficiency is usually ambiguous; Manski (1988, Chapter 8) is a notable exception. Theorem 14.3 allows us to state precisely the class of estimators in which the conditional MLE is relatively efficient. As in Chapter 13, we let $E_\theta(\cdot \mid x)$ denote the expectation with respect to the conditional density $f(y \mid x; \theta)$. Consider the class of estimators solving the first-order condition

$$N^{-1}\sum_{i=1}^N g(w_i, \hat{\theta}) = 0 \qquad (14.58)$$

where the $P \times 1$ function $g(w, \theta)$ is such that

$$E_\theta[g(w, \theta) \mid x] = 0, \qquad \text{all } x \in \mathcal{X}, \text{ all } \theta \in \Theta \qquad (14.59)$$

In other words, the class of estimators is indexed by functions $g$ satisfying a zero conditional moment restriction. We assume the standard regularity conditions from Chapter 12; in particular, $g(w, \cdot)$ is continuously differentiable on the interior of $\Theta$.

As we showed in Section 13.7, functions $g$ satisfying condition (14.59) generally have the property

$$E[\nabla_\theta g(w, \theta_o) \mid x] = -E[g(w, \theta_o)s(w, \theta_o)' \mid x]$$

where $s(w, \theta)$ is the score of $\log f(y \mid x; \theta)$ (as always, we must impose certain regularity conditions on $g$ and $\log f$). If we take the expectation of both sides with respect to $x$, we obtain condition (14.57) with $\rho = 1$, $A_\tau = E[\nabla_\theta g(w, \theta_o)]$, and $s_{\tau^*}(w) = -s(w, \theta_o)$. It follows from Theorem 14.3 that the conditional MLE is efficient in the class of estimators solving equation (14.58), where $g(\cdot)$ satisfies condition (14.59) and appropriate regularity conditions. Recall from Section 13.5.1 that the asymptotic variance of the (centered and standardized) CMLE is $\{E[s(w, \theta_o)s(w, \theta_o)']\}^{-1}$. This is an example of an efficiency bound because no estimator of the form (14.58) under condition (14.59) can have an asymptotic variance smaller than $\{E[s(w, \theta_o)s(w, \theta_o)']\}^{-1}$ (in the matrix sense). When an estimator from this class has the same asymptotic variance as the CMLE, we say it achieves the efficiency bound.

It is important to see that the efficiency of the conditional MLE in the class of estimators solving equation (14.58) under condition (14.59) does not require $x$ to be ancillary for $\theta_o$: except for regularity conditions, the distribution of $x$ is essentially unrestricted, and could depend on $\theta_o$. Conditional MLE simply ignores information on $\theta_o$ that might be contained in the distribution of $x$, but so do all other estimators that are based on condition (14.59).

By choosing $x$ to be empty, we conclude that the unconditional MLE is efficient in the class of estimators based on equation (14.58) with $E_\theta[g(w, \theta)] = 0$, all $\theta \in \Theta$. This is a very broad class of estimators, including all of the estimators requiring condition (14.59): if a function $g$ satisfies condition (14.59), it has zero unconditional mean, too.
Consequently, the unconditional MLE is generally more efficient than the conditional MLE. This efficiency comes at the price of having to model the joint density of $(y, x)$, rather than just the conditional density of $y$ given $x$. And, if our model for the density of $x$ is incorrect, the unconditional MLE generally would be inconsistent.

When is CMLE as efficient as unconditional MLE for estimating $\theta_o$? Assume that the model for the joint density of $(x, y)$ can be expressed as $f(y \mid x; \theta)h(x; \delta)$, where $\theta$ is the parameter vector of interest and $h(x; \delta)$ is the marginal density of $x$ for some vector $\delta$. Then, if $\delta$ does not depend on $\theta$ in the sense that $\nabla_\theta h(x; \delta) = 0$ for all $x$ and $\delta$, $x$ is ancillary for $\theta_o$. In fact, the CMLE is identical to the unconditional MLE. If $\delta$ depends on $\theta$, the term $\nabla_\theta \log[h(x; \delta)]$ generally contains information for estimating $\theta_o$, and unconditional MLE will be more efficient than CMLE.

14.5.3 Efficient Choice of Instruments under Conditional Moment Restrictions

We can also apply Theorem 14.3 to find the optimal set of instrumental variables under general conditional moment restrictions. For a $G \times 1$ vector $r(w_i, \theta)$, where $w_i \in \mathbb{R}^M$, $\theta_o$ is said to satisfy conditional moment restrictions if

$$E[r(w_i, \theta_o) \mid x_i] = 0 \qquad (14.60)$$

where $x_i \in \mathbb{R}^K$ is a subvector of $w_i$. Under assumption (14.60), the matrix $Z_i$ appearing in equation (14.22) can be any function of $x_i$. For a given matrix $Z_i$, we obtain the efficient GMM estimator by using the efficient weighting matrix. However, unless $Z_i$ is the optimal set of instruments, we can generally obtain a more efficient estimator by adding any nonlinear function of $x_i$ to $Z_i$. Because the list of potential IVs is endless, it is useful to characterize the optimal choice of $Z_i$.

The solution to this problem is now pretty well known, and it can be obtained by applying Theorem 14.3. Let

$$\Omega_o(x_i) \equiv \text{Var}[r(w_i, \theta_o) \mid x_i] \qquad (14.61)$$

be the $G \times G$ conditional variance of $r_i(\theta_o)$ given $x_i$, and define

$$R_o(x_i) \equiv E[\nabla_\theta r(w_i, \theta_o) \mid x_i] \qquad (14.62)$$

Problem 14.3 asks you to verify that the optimal choice of instruments is

$$Z^*(x_i) \equiv \Omega_o(x_i)^{-1} R_o(x_i) \qquad (14.63)$$

The optimal instrument matrix is always $G \times P$, and so the efficient method of moments estimator solves

$$\sum_{i=1}^N Z^*(x_i)' r_i(\hat{\theta}) = 0$$

There is no need to use a weighting matrix. Incidentally, by taking $g(w, \theta) \equiv Z^*(x)' r(w, \theta)$, we obtain a function $g$ satisfying condition (14.59). From our discussion in Section 14.5.2, it follows immediately that the conditional MLE is no less efficient than the optimal IV estimator.

In practice, $Z^*(x_i)$ is never a known function of $x_i$. In some cases the function $R_o(x_i)$ is a known function of $x_i$ and $\theta_o$ and can be easily estimated; this statement is true of linear SEMs under conditional mean assumptions (see Chapters 8 and 9) and of multivariate nonlinear regression, which we cover later in this subsection. Rarely do moment conditions imply a parametric form for $\Omega_o(x_i)$, but sometimes homoskedasticity is assumed:

$$E[r_i(\theta_o) r_i(\theta_o)' \mid x_i] = \Omega_o \qquad (14.64)$$

and $\Omega_o$ is easily estimated as in equation (14.30) given a preliminary estimate of $\theta_o$.

Since both $\Omega_o(x_i)$ and $R_o(x_i)$ must be estimated, we must know the asymptotic properties of GMM with generated instruments. Under conditional moment restrictions, generated instruments have no effect on the asymptotic variance of the GMM estimator.
Thus, if the matrix of instruments is $Z(x_i, \gamma_o)$ for some unknown parameter vector $\gamma_o$, and $\hat{\gamma}$ is an estimator such that $\sqrt{N}(\hat{\gamma} - \gamma_o) = O_p(1)$, then the GMM estimator using the generated instruments $\hat{Z}_i \equiv Z(x_i, \hat{\gamma})$ has the same limiting distribution as the GMM estimator using instruments $Z(x_i, \gamma_o)$ (using any weighting matrix). This result follows from a mean value expansion, using the fact that the derivative of each element of $Z(x_i, \gamma)$ with respect to $\gamma$ is orthogonal to $r_i(\theta_o)$ under condition (14.60):

$$N^{-1/2}\sum_{i=1}^N \hat{Z}_i' r_i(\hat{\theta}) = N^{-1/2}\sum_{i=1}^N Z_i(\gamma_o)' r_i(\theta_o) + E[Z_i(\gamma_o)' R_o(x_i)]\sqrt{N}(\hat{\theta} - \theta_o) + o_p(1) \qquad (14.65)$$

The right-hand side of equation (14.65) is identical to the expansion with $\hat{Z}_i$ replaced by $Z_i(\gamma_o)$. Assuming now that $Z_i(\gamma_o)$ is the matrix of efficient instruments, the asymptotic variance of the efficient estimator is

$$\text{Avar}\,\sqrt{N}(\hat{\theta} - \theta_o) = \{E[R_o(x_i)'\Omega_o(x_i)^{-1}R_o(x_i)]\}^{-1} \qquad (14.66)$$

as can be seen from Section 14.1 by noting that $G_o = E[R_o(x_i)'\Omega_o(x_i)^{-1}R_o(x_i)]$ and $\Lambda_o = G_o$ when the instruments are given by equation (14.63).

Equation (14.66) is another example of an efficiency bound, this time under the conditional moment restrictions (14.60). What we have shown is that any GMM estimator has a variance matrix that differs from equation (14.66) by a positive semidefinite matrix. Chamberlain (1987) has shown more: any estimator that uses only condition (14.60) and satisfies regularity conditions has variance matrix no smaller than equation (14.66).

Estimation of $R_o(x_i)$ generally requires nonparametric methods. Newey (1990) describes one approach. Essentially, regress the elements of $\nabla_\theta r_i(\hat{\theta})$ on polynomial functions of $x_i$ (or other functions with good approximating properties), where $\hat{\theta}$ is an initial estimate of $\theta_o$. The fitted values from these regressions can be used as the elements of $\hat{R}_i$. Other nonparametric approaches are available; see Newey (1990, 1993) for details. Unfortunately, we need a fairly large sample size in order to apply such methods effectively.

As an example of finding the optimal instruments, consider the problem of estimating a conditional mean for a vector $y_i$:

$$E(y_i \mid x_i) = m(x_i, \theta_o) \qquad (14.67)$$

Then the residual function is $r(w_i, \theta) \equiv y_i - m(x_i, \theta)$ and $\Omega_o(x_i) = \text{Var}(y_i \mid x_i)$; therefore, the optimal instruments are $Z_o(x_i) \equiv \Omega_o(x_i)^{-1}\nabla_\theta m(x_i, \theta_o)$. This is an important example where $R_o(x_i) = -\nabla_\theta m(x_i, \theta_o)$ is a known function of $x_i$ and $\theta_o$.

If the homoskedasticity assumption

$$\text{Var}(y_i \mid x_i) = \Omega_o \qquad (14.68)$$

holds, then the efficient estimator is easy to obtain. First, let $\check{\theta}$ be the multivariate nonlinear least squares (MNLS) estimator, which solves

$$\min_{\theta \in \Theta} \sum_{i=1}^N [y_i - m(x_i, \theta)]'[y_i - m(x_i, \theta)]$$

As discussed in Problem 12.11, the MNLS estimator is generally consistent and $\sqrt{N}$-asymptotically normal. Define the residuals $\check{u}_i \equiv y_i - m(x_i, \check{\theta})$, and define a consistent estimator of $\Omega_o$ by $\hat{\Omega} = N^{-1}\sum_{i=1}^N \check{u}_i \check{u}_i'$. An efficient estimator, $\hat{\theta}$, solves

$$\sum_{i=1}^N \nabla_\theta m(x_i, \hat{\theta})' \hat{\Omega}^{-1}[y_i - m(x_i, \hat{\theta})] = 0$$

and the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta_o)$ is $\{E[\nabla_\theta m_i(\theta_o)'\Omega_o^{-1}\nabla_\theta m_i(\theta_o)]\}^{-1}$. An asymptotically equivalent estimator is the nonlinear SUR estimator described in Problem 12.7. In either case, the estimator of $\text{Avar}(\hat{\theta})$ under assumption (14.68) is

$$\widehat{\text{Avar}}(\hat{\theta}) = \left[\sum_{i=1}^N \nabla_\theta m_i(\hat{\theta})' \hat{\Omega}^{-1} \nabla_\theta m_i(\hat{\theta})\right]^{-1}$$

Because the nonlinear SUR estimator is a two-step M-estimator and $B_o = A_o$ (in the notation of Chapter 12), the simplest forms of test statistics are valid. If assumption (14.68) fails, the nonlinear SUR estimator is consistent, but robust inference should be used because $A_o \neq B_o$. And, the estimator is no longer efficient.
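A hedged sketch of the two-step procedure just described, under the homoskedasticity assumption (14.68). The function names and the calling convention (`mfun(data, theta)` returns the N x G matrix of conditional means) are assumptions of the sketch; the second step minimizes the weighted criterion whose first-order condition is the efficient moment condition displayed above.

```python
import numpy as np
from numpy.linalg import inv
from scipy.optimize import minimize

def mnls(mfun, data, Y, theta0):
    # Multivariate NLS: minimize sum_i [y_i - m(x_i, theta)]'[y_i - m(x_i, theta)].
    crit = lambda th: np.sum((Y - mfun(data, th)) ** 2)
    return minimize(crit, theta0, method="BFGS").x

def nonlinear_sur(mfun, data, Y, theta0):
    theta1 = mnls(mfun, data, Y, theta0)        # first step: MNLS
    U = Y - mfun(data, theta1)                  # residuals u_check_i
    Om_inv = inv(U.T @ U / Y.shape[0])          # Omega_hat^{-1}
    # Second step: minimize sum_i u_i(theta)' Omega_hat^{-1} u_i(theta), whose
    # first-order condition is sum_i grad m_i' Omega_hat^{-1} [y_i - m_i] = 0.
    def crit(th):
        Uth = Y - mfun(data, th)
        return np.einsum("ig,gh,ih->", Uth, Om_inv, Uth)
    return minimize(crit, theta1, method="BFGS").x
```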
14.6 Classical Minimum Distance Estimation

We end this chapter with a brief treatment of classical minimum distance (CMD) estimation. This method has features in common with GMM, and often it is a convenient substitute for GMM. Suppose that the $P \times 1$ parameter vector of interest, $\theta_o$, which often consists of parameters from a structural model, is known to be related to an $S \times 1$ vector of reduced form parameters, $\pi_o$, where $S > P$. In particular, $\pi_o = h(\theta_o)$ for a known, continuously differentiable function $h: \mathbb{R}^P \to \mathbb{R}^S$, so that $h$ maps the structural parameters into the reduced form parameters.

CMD estimation of $\theta_o$ entails first estimating $\pi_o$ by $\hat{\pi}$, and then choosing an estimator $\hat{\theta}$ of $\theta_o$ by making the distance between $\hat{\pi}$ and $h(\hat{\theta})$ as small as possible. As with GMM estimation, we use a weighted Euclidean measure of distance. While a CMD estimator can be defined for any positive semidefinite weighting matrix, we consider only the efficient CMD estimator given our choice of $\hat{\pi}$. As with efficient GMM, the CMD estimator that uses the efficient weighting matrix is also called the minimum chi-square estimator. Assuming that for an $S \times S$ positive definite matrix $\Xi_o$,

$$\sqrt{N}(\hat{\pi} - \pi_o) \overset{a}{\sim} \text{Normal}(0, \Xi_o) \qquad (14.69)$$

it turns out that an efficient CMD estimator solves

$$\min_{\theta \in \Theta} \{\hat{\pi} - h(\theta)\}' \hat{\Xi}^{-1} \{\hat{\pi} - h(\theta)\} \qquad (14.70)$$

where $\text{plim}_{N \to \infty} \hat{\Xi} = \Xi_o$. In other words, an efficient weighting matrix is the inverse of any consistent estimator of $\text{Avar}\,\sqrt{N}(\hat{\pi} - \pi_o)$.

We can easily derive the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta_o)$. The first-order condition for $\hat{\theta}$ is

$$H(\hat{\theta})' \hat{\Xi}^{-1} \{\hat{\pi} - h(\hat{\theta})\} = 0 \qquad (14.71)$$

where $H(\theta) \equiv \nabla_\theta h(\theta)$ is the $S \times P$ Jacobian of $h(\theta)$. Since $h(\theta_o) = \pi_o$ and

$$\sqrt{N}\{h(\hat{\theta}) - h(\theta_o)\} = H(\theta_o)\sqrt{N}(\hat{\theta} - \theta_o) + o_p(1)$$

by a standard mean value expansion about $\theta_o$, we have

$$0 = H(\hat{\theta})' \hat{\Xi}^{-1} \{\sqrt{N}(\hat{\pi} - \pi_o) - H(\theta_o)\sqrt{N}(\hat{\theta} - \theta_o)\} + o_p(1) \qquad (14.72)$$

Because $H(\cdot)$ is continuous and $\hat{\theta} \overset{p}{\to} \theta_o$, $H(\hat{\theta}) = H(\theta_o) + o_p(1)$; by assumption, $\hat{\Xi} = \Xi_o + o_p(1)$. Therefore,

$$H(\theta_o)' \Xi_o^{-1} H(\theta_o)\sqrt{N}(\hat{\theta} - \theta_o) = H(\theta_o)' \Xi_o^{-1} \sqrt{N}(\hat{\pi} - \pi_o) + o_p(1)$$

By assumption (14.69) and the asymptotic equivalence lemma,

$$H(\theta_o)' \Xi_o^{-1} H(\theta_o)\sqrt{N}(\hat{\theta} - \theta_o) \overset{a}{\sim} \text{Normal}[0, H(\theta_o)' \Xi_o^{-1} H(\theta_o)]$$

and so

$$\sqrt{N}(\hat{\theta} - \theta_o) \overset{a}{\sim} \text{Normal}[0, (H_o' \Xi_o^{-1} H_o)^{-1}] \qquad (14.73)$$

provided that $H_o \equiv H(\theta_o)$ has full column rank $P$, as will generally be the case when $\theta_o$ is identified and $h(\cdot)$ contains no redundancies. The appropriate estimator of $\text{Avar}(\hat{\theta})$ is

$$\widehat{\text{Avar}}(\hat{\theta}) \equiv (\hat{H}' \hat{\Xi}^{-1} \hat{H})^{-1}/N = (\hat{H}'[\widehat{\text{Avar}}(\hat{\pi})]^{-1}\hat{H})^{-1} \qquad (14.74)$$

The proof that $\hat{\Xi}^{-1}$ is the optimal weighting matrix in expression (14.70) is very similar to the derivation of the optimal weighting matrix for GMM. (It can also be shown by applying Theorem 14.3.) We will simply call the efficient estimator the CMD estimator, where it is understood that we are using the efficient weighting matrix.
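A minimal numerical sketch of problem (14.70). It assumes `V_pi` estimates $\text{Avar}(\hat{\pi})$, that is, $\hat{\Xi}/N$, so that minimizing with its inverse is equivalent to (14.70) up to the factor $N$ and the same inverse delivers the variance estimate (14.74); `hfun` and `Hfun` (the S x P Jacobian of $h$) are user-supplied, and all names are illustrative.

```python
import numpy as np
from numpy.linalg import inv
from scipy.optimize import minimize

def cmd(pi_hat, V_pi, hfun, Hfun, theta0):
    W = inv(V_pi)
    crit = lambda th: (pi_hat - hfun(th)) @ W @ (pi_hat - hfun(th))
    theta_hat = minimize(crit, theta0, method="BFGS").x
    H = Hfun(theta_hat)                  # S x P Jacobian of h at theta_hat
    avar = inv(H.T @ W @ H)              # equation (14.74)
    overid = crit(theta_hat)             # equation (14.76), ~ chi2(S - P) under H0
    return theta_hat, avar, overid
```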
There is another efficiency issue that arises when more than one $\sqrt{N}$-asymptotically normal estimator for $\pi_o$ is available: which estimator of $\pi_o$ should be used? Let $\hat{\theta}$ be the estimator based on $\hat{\pi}$, and let $\tilde{\theta}$ be the estimator based on another estimator, $\tilde{\pi}$. You are asked to show in Problem 14.6 that $\text{Avar}\,\sqrt{N}(\tilde{\theta} - \theta_o) - \text{Avar}\,\sqrt{N}(\hat{\theta} - \theta_o)$ is p.s.d. whenever $\text{Avar}\,\sqrt{N}(\tilde{\pi} - \pi_o) - \text{Avar}\,\sqrt{N}(\hat{\pi} - \pi_o)$ is p.s.d. In other words, we should use the most efficient estimator of $\pi_o$ to obtain the most efficient estimator of $\theta_o$.

A test of overidentifying restrictions is immediately available after estimation, because, under the null hypothesis $\pi_o = h(\theta_o)$,

$$N[\hat{\pi} - h(\hat{\theta})]'\hat{\Xi}^{-1}[\hat{\pi} - h(\hat{\theta})] \overset{a}{\sim} \chi^2_{S-P} \qquad (14.75)$$

To show this result, we use

$$\sqrt{N}[\hat{\pi} - h(\hat{\theta})] = \sqrt{N}(\hat{\pi} - \pi_o) - H_o\sqrt{N}(\hat{\theta} - \theta_o) + o_p(1)$$
$$= \sqrt{N}(\hat{\pi} - \pi_o) - H_o(H_o'\Xi_o^{-1}H_o)^{-1}H_o'\Xi_o^{-1}\sqrt{N}(\hat{\pi} - \pi_o) + o_p(1)$$
$$= [I_S - H_o(H_o'\Xi_o^{-1}H_o)^{-1}H_o'\Xi_o^{-1}]\sqrt{N}(\hat{\pi} - \pi_o) + o_p(1)$$

Therefore, up to $o_p(1)$,

$$\Xi_o^{-1/2}\sqrt{N}\{\hat{\pi} - h(\hat{\theta})\} = [I_S - \Xi_o^{-1/2}H_o(H_o'\Xi_o^{-1}H_o)^{-1}H_o'\Xi_o^{-1/2}]Z \equiv M_o Z$$

where $Z \equiv \Xi_o^{-1/2}\sqrt{N}(\hat{\pi} - \pi_o) \overset{d}{\to} \text{Normal}(0, I_S)$. But $M_o$ is a symmetric idempotent matrix with rank $S - P$, so $\{\sqrt{N}[\hat{\pi} - h(\hat{\theta})]\}'\Xi_o^{-1}\{\sqrt{N}[\hat{\pi} - h(\hat{\theta})]\} \overset{a}{\sim} \chi^2_{S-P}$. Because $\hat{\Xi}$ is consistent for $\Xi_o$, expression (14.75) follows from the asymptotic equivalence lemma. The statistic can also be expressed as

$$[\hat{\pi} - h(\hat{\theta})]'[\widehat{\text{Avar}}(\hat{\pi})]^{-1}[\hat{\pi} - h(\hat{\theta})] \qquad (14.76)$$

Testing restrictions on $\theta_o$ is also straightforward, assuming that we can express the restrictions as $\theta_o = d(\alpha_o)$ for an $R \times 1$ vector $\alpha_o$, $R < P$. Under these restrictions, $\pi_o = h[d(\alpha_o)] \equiv g(\alpha_o)$. Thus, $\alpha_o$ can be estimated by minimum distance by solving problem (14.70) with $\alpha$ in place of $\theta$ and $g(\alpha)$ in place of $h(\theta)$. The same estimator $\hat{\Xi}$ should be used in both minimization problems. Then it can be shown (under interiority and differentiability) that

$$N[\hat{\pi} - g(\hat{\alpha})]'\hat{\Xi}^{-1}[\hat{\pi} - g(\hat{\alpha})] - N[\hat{\pi} - h(\hat{\theta})]'\hat{\Xi}^{-1}[\hat{\pi} - h(\hat{\theta})] \overset{a}{\sim} \chi^2_{P-R} \qquad (14.77)$$

when the restrictions on $\theta_o$ are true.

To illustrate the application of CMD estimation, we reconsider Chamberlain's (1982, 1984) approach to linear, unobserved effects panel data models. (See Section 11.3.2 for the GMM approach.) The key equations are

$$y_{it} = c + x_{i1}\lambda_1 + \cdots + x_{it}(\beta + \lambda_t) + \cdots + x_{iT}\lambda_T + v_{it} \qquad (14.78)$$

where

$$E(v_{it}) = 0, \quad E(x_i' v_{it}) = 0, \qquad t = 1, 2, \ldots, T \qquad (14.79)$$

(For notational simplicity we do not index the true parameters by "o".) Equation (14.78) embodies the restrictions on the "structural" parameters $\theta \equiv (c, \lambda_1', \ldots, \lambda_T', \beta')'$, a $(1 + TK + K) \times 1$ vector. To apply CMD, write

$$y_{it} = \pi_{t0} + x_i \pi_t + v_{it}, \qquad t = 1, \ldots, T$$

so that the vector $\pi$ is $T(1 + TK) \times 1$. When we impose the restrictions,

$$\pi_{t0} = c \quad\text{and}\quad \pi_t = [\lambda_1', \lambda_2', \ldots, (\beta + \lambda_t)', \ldots, \lambda_T']', \qquad t = 1, \ldots, T$$

Therefore, we can write $\pi = H\theta$ for a $T(1 + TK) \times (1 + TK + K)$ matrix $H$. When $T = 2$, $\pi$ can be written with the restrictions imposed as $\pi = [c, (\beta + \lambda_1)', \lambda_2', c, \lambda_1', (\beta + \lambda_2)']'$, and so

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & I_K & 0 & I_K \\ 0 & 0 & I_K & 0 \\ 1 & 0 & 0 & 0 \\ 0 & I_K & 0 & 0 \\ 0 & 0 & I_K & I_K \end{bmatrix}$$

The CMD estimator can be obtained in closed form, once we have $\hat{\pi}$; see Problem 14.7 for the general case.
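When the mapping is linear, as in Chamberlain's setup, no numerical optimization is needed: the CMD estimator has the closed form of equation (14.81) in Problem 14.7. A hedged sketch, again taking `V_pi` to be $\hat{\Xi}/N$, the estimated $\text{Avar}(\hat{\pi})$, which leaves the estimate unchanged and makes the inverted matrix below equal the estimated $\text{Avar}(\hat{\theta})$:

```python
import numpy as np
from numpy.linalg import inv

def linear_cmd(pi_hat, V_pi, H):
    # Closed form (14.81): theta_hat = (H' W H)^{-1} H' W pi_hat, with W = V_pi^{-1}.
    W = inv(V_pi)
    avar = inv(H.T @ W @ H)              # also the estimate of Avar(theta_hat)
    theta_hat = avar @ H.T @ W @ pi_hat
    resid = pi_hat - H @ theta_hat
    overid = resid @ W @ resid           # (14.75)-(14.76): ~ chi2(S - P) under H0
    return theta_hat, avar, overid
```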
How should we obtain $\hat{\pi}$, the vector of estimates without the restrictions imposed? There is really only one way, and that is OLS for each time period. Condition (14.79) ensures that OLS is consistent and $\sqrt{N}$-asymptotically normal. Why not use a system method, in particular, SUR? For one thing, we cannot generally assume that $v_i$ satisfies the requisite homoskedasticity assumption that ensures that SUR is more efficient than OLS equation by equation; see Section 11.3.2. Anyway, because the same regressors appear in each equation and no restrictions are imposed on the $\pi_t$, OLS and SUR are identical. Procedures that might use nonlinear functions of $x_i$ as instruments are not allowed under condition (14.79).

The estimator $\hat{\Xi}$ of $\text{Avar}\,\sqrt{N}(\hat{\pi} - \pi)$ is the robust asymptotic variance for system OLS from Chapter 7:

$$\hat{\Xi} \equiv \left(N^{-1}\sum_{i=1}^N X_i' X_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N X_i' \hat{v}_i \hat{v}_i' X_i\right)\left(N^{-1}\sum_{i=1}^N X_i' X_i\right)^{-1} \qquad (14.80)$$

where $X_i = I_T \otimes (1, x_i)$ is $T \times (T + T^2K)$ and $\hat{v}_i$ is the $T \times 1$ vector of OLS residuals; see also equation (7.26).

Given the linear model with an additive unobserved effect, the overidentification test statistic (14.75) in Chamberlain's setup is a test of the strict exogeneity assumption. Essentially, it is a test of whether the leads and lags of $x_t$ appearing in each time period are due to a time-constant unobserved effect $c_i$. The number of overidentifying restrictions is $T(1 + TK) - (1 + TK + K)$. Perhaps not surprisingly, the minimum distance approach to estimating $\theta$ is asymptotically equivalent to the GMM procedure we described in Section 11.3.2, as can be reasoned from the work of Angrist and Newey (1991).

One hypothesis of interest concerning $\theta$ is that $\lambda_t = 0$, $t = 1, \ldots, T$. Under this hypothesis, the random effects assumption that the unobserved effect $c_i$ is uncorrelated with $x_{it}$ for all $t$ holds. We discussed a test of this assumption in Chapter 10. A more general test is available in the minimum distance setting. First, estimate $\alpha \equiv (c, \beta')'$ by minimum distance, using $\hat{\pi}$ and the $\hat{\Xi}$ in equation (14.80). Second, compute the test statistic (14.77). Chamberlain (1984) gives an empirical example.

Minimum distance methods can be applied to more complicated panel data models, including some of the duration models that we cover in Chapter 20 (see Han and Hausman, 1990). Van der Klaauw (1996) uses minimum distance estimation in a complicated dynamic model of labor force participation and marital status.

Problems

14.1 Consider the system in equations (14.34) and (14.35).

a. How would you estimate equation (14.35) using single-equation methods? Give a few possibilities, ranging from simple to more complicated. State any additional assumptions relevant for estimating asymptotic variances or for efficiency of the various estimators.

b. Is equation (14.34) identified if $\gamma_1 = 0$?

c. Now suppose that $\gamma_3 = 0$, so that the parameters in equation (14.35) can be consistently estimated by OLS. Let $\hat{y}_2$ be the OLS fitted values. Explain why nonlinear least squares estimation of

$$y_1 = x_1\delta_1 + \gamma_1 \hat{y}_2^{\gamma_2} + \text{error}$$

does not consistently estimate $\delta_1$, $\gamma_1$, and $\gamma_2$ when $\gamma_1 \neq 0$ and $\gamma_2 \neq 1$.

14.2 Consider the following labor supply function, nonlinear in its parameters:

$$hours = z_1\delta_1 + \gamma_1 (wage^{\rho_1} - 1)/\rho_1 + u_1, \qquad E(u_1 \mid z) = 0$$

where $z_1$ contains unity and $z$ is the full set of exogenous variables.

a. Show that this model contains the level-level and level-log models as special cases. [Hint: For $w > 0$, $(w^\rho - 1)/\rho \to \log(w)$ as $\rho \to 0$.]

b. How would you test $H_0: \gamma_1 = 0$? (Be careful here; $\rho_1$ cannot be consistently estimated under $H_0$.)

c. Assuming that $\gamma_1 \neq 0$, how would you estimate this equation if $\text{Var}(u_1 \mid z) = \sigma_1^2$? What if $\text{Var}(u_1 \mid z)$ is not constant?
d. Find the gradient of the residual function with respect to $\delta_1$, $\gamma_1$, and $\rho_1$. [Hint: Recall that the derivative of $w^\rho$ with respect to $\rho$ is $w^\rho \log(w)$.]

e. Explain how to obtain the score test of $H_0: \rho_1 = 1$.

14.3 Use Theorem 14.3 to show that the optimal instrumental variables based on the conditional moment restrictions (14.60) are given by equation (14.63).

14.4
a. Show that, under Assumptions WNLS.1-WNLS.3 in Chapter 12, the weighted NLS estimator has asymptotic variance equal to that of the efficient IV estimator based on the orthogonality condition $E[(y_i - m(x_i, \beta_o)) \mid x_i] = 0$.

b. When does the nonlinear least squares estimator of $\beta_o$ achieve the efficiency bound derived in part a?

c. Suppose that, in addition to $E(y \mid x) = m(x, \beta_o)$, you use the restriction $\text{Var}(y \mid x) = \sigma_o^2$ for some $\sigma_o^2 > 0$. Write down the two conditional moment restrictions for estimating $\beta_o$ and $\sigma_o^2$. What are the efficient instrumental variables?

14.5 Write down $\theta$, $\pi$, and the matrix $H$ such that $\pi = H\theta$ in Chamberlain's approach to unobserved effects panel data models when $T = 3$.

14.6 Let $\hat{\pi}$ and $\tilde{\pi}$ be two consistent estimators of $\pi_o$, with $\text{Avar}\,\sqrt{N}(\hat{\pi} - \pi_o) = \Xi_o$ and $\text{Avar}\,\sqrt{N}(\tilde{\pi} - \pi_o) = \Lambda_o$. Let $\hat{\theta}$ be the CMD estimator based on $\hat{\pi}$, and let $\tilde{\theta}$ be the CMD estimator based on $\tilde{\pi}$, where $\pi_o = h(\theta_o)$. Show that, if $\Lambda_o - \Xi_o$ is positive semidefinite, then so is $\text{Avar}\,\sqrt{N}(\tilde{\theta} - \theta_o) - \text{Avar}\,\sqrt{N}(\hat{\theta} - \theta_o)$. (Hint: Twice use the fact that, for two positive definite matrices $A$ and $B$, $A - B$ is p.s.d. if and only if $B^{-1} - A^{-1}$ is p.s.d.)

14.7 Show that when the mapping from $\theta_o$ to $\pi_o$ is linear, $\pi_o = H\theta_o$ for a known $S \times P$ matrix $H$ with $\text{rank}(H) = P$, the CMD estimator $\hat{\theta}$ is

$$\hat{\theta} = (H'\hat{\Xi}^{-1}H)^{-1}H'\hat{\Xi}^{-1}\hat{\pi} \qquad (14.81)$$

Equation (14.81) looks like a generalized least squares (GLS) estimator of $\hat{\pi}$ on $H$ using variance matrix $\hat{\Xi}$, and this apparent similarity has prompted some to call the minimum chi-square estimator a "generalized least squares" (GLS) estimator. Unfortunately, the association between CMD and GLS is misleading because $\hat{\pi}$ and $H$ are not data vectors whose row dimension, $S$, grows with $N$. The asymptotic properties of the minimum chi-square estimator do not follow from those of GLS.

14.8 In Problem 13.9, suppose you model the unconditional distribution of $y_0$ as $f_0(y_0; \theta)$, which depends on at least some elements of $\theta$ appearing in $f_t(y_t \mid y_{t-1}; \theta)$. Discuss the pros and cons of using $f_0(y_0; \theta)$ in a maximum likelihood analysis along with $f_t(y_t \mid y_{t-1}; \theta)$, $t = 1, 2, \ldots, T$.

14.9 Verify that, for the linear unobserved effects model under Assumptions RE.1-RE.3, the conditions of Lemma 14.1 hold for the fixed effects ($\hat{\theta}_2$) and random effects ($\hat{\theta}_1$) estimators, with $\rho = \sigma_u^2$. [Hint: For clarity, it helps to introduce a cross section subscript, $i$. Then $A_1 = E(\check{X}_i'\check{X}_i)$, where $\check{X}_i = X_i - \lambda j_T \bar{x}_i$; $A_2 = E(\ddot{X}_i'\ddot{X}_i)$, where $\ddot{X}_i = X_i - j_T \bar{x}_i$; $s_{i1} = \check{X}_i'\bar{r}_i$, where $\bar{r}_i = v_i - \lambda j_T \bar{v}_i$; and $s_{i2} = \ddot{X}_i' u_i$; see Chapter 10 for further notation. You should show that $\ddot{X}_i' u_i = \ddot{X}_i' \bar{r}_i$ and then $\check{X}_i'\ddot{X}_i = \ddot{X}_i'\ddot{X}_i$.]

Appendix 14A
Proof of Lemma 14.1: Given condition (14.55), $A_1 = (1/\rho)E(s_1 s_1')$, a $P \times P$ symmetric matrix, and

$$V_1 = A_1^{-1}E(s_1 s_1')A_1^{-1} = \rho^2 [E(s_1 s_1')]^{-1}$$

where we drop the argument $w$ for notational simplicity. Next, under condition (14.56), $A_2 = (1/\rho)E(s_2 s_1')$, and so

$$V_2 = A_2^{-1}E(s_2 s_2')(A_2')^{-1} = \rho^2 [E(s_2 s_1')]^{-1}E(s_2 s_2')[E(s_1 s_2')]^{-1}$$

Now we use the standard result that $V_2 - V_1$ is positive semidefinite if and only if $V_1^{-1} - V_2^{-1}$ is p.s.d. But, dropping the term $\rho^{-2}$ (which is simply a positive constant), we have

$$V_1^{-1} - V_2^{-1} = E(s_1 s_1') - E(s_1 s_2')[E(s_2 s_2')]^{-1}E(s_2 s_1') = E(r_1 r_1')$$

where $r_1$ is the $P \times 1$ population residual from the population regression of $s_1$ on $s_2$. As $E(r_1 r_1')$ is necessarily p.s.d., this step completes the proof.
