QML Estimation of Dynamic Panel Data Models with Spatial Errors∗

Liangjun Su† and Zhenlin Yang‡

February, 2007

Abstract

We propose quasi maximum likelihood (QML) estimation of dynamic panel models with spatial errors when the cross-sectional dimension (n) is large and the time dimension (T) is fixed. We consider both the random effects and fixed effects models and derive the limiting distributions of the QML estimators under different assumptions on the individual effects and on the initial observations. Monte Carlo simulation shows that the estimators perform well in finite samples.

JEL classifications: C12, C14, C22, C5

Key Words: Dynamic Panel, Fixed Effects, Random Effects, Spatial Dependence, Quasi Maximum Likelihood

∗ Liangjun Su gratefully acknowledges the financial support from the NSFC (70501001). He also thanks the School of Economics and Social Sciences, Singapore Management University (SMU), for the hospitality during his two-month visit, and the Wharton-SMU research center, SMU, for supporting his visit. Zhenlin Yang gratefully acknowledges the research support from the Wharton-SMU research center, Singapore Management University.
† Guanghua School of Management, Peking University, Beijing 100871, China. Telephone: +86-10-6276-7444. Email address: lsu@gsm.pku.edu.cn
‡ School of Economics and Social Sciences, Singapore Management University, 90 Stamford Road, Singapore 178903. Telephone: +65-0828-0852. Fax: +65-6828-0853. Email address: zlyang@smu.edu.sg

1 Introduction

Recently there has been a growing interest in the estimation of panel data models with cross-sectional or spatial dependence; see Baltagi and Li (2004), Baltagi, Song and Koh (2003), Chen and Conley (2001), Elhorst (2003, 2005), Huang (2004), Kapoor, Kelejian and Prucha (2006), Pesaran (2003, 2004), Phillips and Sul (2003), and Yang, Li and Tse (2006), among others, for an overview. In this paper we focus on the quasi maximum likelihood estimation of dynamic panel data models with spatial errors.

The history of spatial econometrics can be traced back at least to Cliff and Ord (1973). Since then, various methods have been proposed to estimate spatial dependence models, including the method of maximum likelihood (ML) (Ord, 1975; Smirnov and Anselin, 2001), the method of moments (MM) (Kelejian and Prucha, 1998, 1999, 2006; Lin and Lee, 2006), and the method of quasi maximum likelihood (QML) (Lee, 2004). A common feature of these methods is that they are all developed for the estimation of a cross-sectional model with no time dimension. Recently, Elhorst (2003, 2005) studies the ML estimation of (dynamic) panel data models with certain spatial dependence structure, but the asymptotic properties of the estimators are not given. For over thirty years of spatial econometrics history, the asymptotic theory for the (Q)ML estimation of spatial models was taken for granted until the influential paper by Lee (2004), which systematically establishes the desirable consistency and asymptotic normality results for the Gaussian QML estimates of a spatial autoregressive model. He demonstrates that the rate of convergence of the QML estimates may depend on some general features of the spatial weights matrix. More recently, Yu, de Jong, and Lee (2006) extend the work of Lee (2004) to spatial dynamic panel data models with fixed effects, allowing both the time dimension (T) and the cross-sectional dimension (n) to be large. This paper concerns the more traditional panel data model where n is allowed to grow but T is held fixed (usually small).
As Binder, Hsiao and Pesaran (2005) remarked, this model remains the prevalent setting for the majority of empirical microeconometric research. Our work is distinct from that of Yu, de Jong, and Lee (2006) in several respects. First, unlike Yu, de Jong, and Lee (2006), who consider only the fixed effects model, we consider both random and fixed effects specifications of the individual effects and highlight the implications these differences have for estimation and inference. Second, since we keep T fixed, our estimation strategy is quite different from that in the large-n and large-T setting: in the case of the fixed effects model we have to difference out the fixed effects, whereas Yu, de Jong, and Lee (2006) need not do so. Third, spatial dependence is present only in the error term in our model, whereas Yu, de Jong, and Lee (2006) consider a spatial lag model. Consequently, the two approaches complement each other. We conjecture that we can extend our work to the general spatial autoregressive model with spatial autoregressive errors (SARAR).

The rest of the paper is organized as follows. In Section 2 we introduce our model specification. We propose the quasi maximum likelihood estimates in Section 3 and study their asymptotic properties in Section 4. In Section 5 we provide a small set of Monte Carlo experiments to evaluate the finite sample performance of our estimators. All proofs are relegated to the appendix.

To proceed, we introduce some notation and conventions. Let I_n denote an n × n identity matrix. Let ι_T denote a T × 1 vector of ones and J_T = ι_T ι_T', where a prime denotes transposition throughout this paper. ⊗ denotes the Kronecker product, and |·| denotes the absolute value of a scalar or the determinant of a matrix.

2 Model Specification

We consider the model of the form

y_{it} = ρ y_{i,t-1} + x_{it}' β + z_i' γ + u_{it},   (2.1)

for i = 1, ..., n, t = 1, ..., T, where the scalar parameter ρ with |ρ| < 1 characterizes the dynamic effect, x_{it} is a p × 1 vector of time-varying exogenous variables, z_i is a q × 1 vector of time-invariant exogenous variables such as the constant term or dummies representing individuals' gender or race, and the disturbance vector u_t = (u_{1t}, ..., u_{nt})' is assumed to exhibit both non-observable individual effects and a spatially autocorrelated structure, i.e.,

u_t = μ + ε_t,   (2.2)
ε_t = λ W_n ε_t + v_t,   (2.3)

where μ = (μ_1, ..., μ_n)', ε_t = (ε_{1t}, ..., ε_{nt})', and v_t = (v_{1t}, ..., v_{nt})', with μ representing the unobservable individual effects, which can be either random or fixed, ε_t representing the spatially correlated errors, and v_t representing the random innovations, which are assumed to be independent and identically distributed (i.i.d.) with zero mean and variance σ_v^2. In the case where μ is random, its elements are assumed to be i.i.d. (0, σ_μ^2) and to be independent of v_t. In the case where μ is fixed, the time-invariant regressors should be removed from the model due to multicollinearity between the observed and unobserved individual-specific effects. The parameter λ is the spatial autoregressive coefficient and W_n is a known n × n spatial weight matrix whose diagonal elements are zero. Following the literature in spatial econometrics, we assume that I_n − λ W_n is nonsingular. We will also assume that observations on (y_{it}, x_{it}', z_i') are available at the initial period t = 0.

Let B_n = B_n(λ) = I_n − λ W_n. Frequently, we will suppress the dependence of B_n and W_n on n and write B and W instead. We have ε_t = B^{-1} v_t. Let y_t = (y_{1t}, ..., y_{nt})' and x_t = (x_{1t}, ..., x_{nt})'. Define Y = (y_1', ..., y_T')', Y_{-1} = (y_0', ..., y_{T-1}')', X = (x_1', ..., x_T')', and Z = ι_T ⊗ z, where z = (z_1, ..., z_n)'. Using matrix notation, we can write the model specified by Eqs. (2.1)-(2.3) as

Y = ρ Y_{-1} + X β + Z γ + u,  with  u = (ι_T ⊗ I_n) μ + (I_T ⊗ B^{-1}) v.   (2.4)

It is worth mentioning that Eqs. (2.1)-(2.3) allow spatial dependence to be present in the random disturbance term ε_t but not in the individual effect μ; see Baltagi, Song and Koh (2003) and Baltagi and Li (2004) for applications of this type of model. Alternatively, we can allow both ε_t and μ to follow a spatial autoregressive model, as is done by Kapoor, Kelejian and Prucha (2006), who consider GMM estimation of a static spatial panel model with random effects. Our theory can readily be modified to take into account the latter case, and we conjecture that a specification test can also be developed to test for the two different specifications.
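To fix ideas, the following minimal Python sketch simulates data from (2.1)-(2.3) under a random effects specification. It assumes a simple row-normalized circular weight matrix, normal errors, and a crude start-up for y_0; the function and variable names (e.g., simulate_panel) are illustrative and are not taken from the paper.

import numpy as np

def simulate_panel(n=50, T=7, rho=0.5, lam=0.5, beta=1.0, sigma_mu=0.5, sigma_v=0.5, seed=0):
    """Simulate y_it = rho*y_{i,t-1} + x_it*beta + u_it with u_t = mu + eps_t and
    eps_t = lam*W*eps_t + v_t, following (2.1)-(2.3); the z_i'gamma term is omitted for brevity."""
    rng = np.random.default_rng(seed)
    # simple "one ahead, one behind" circular weight matrix, row-normalized, zero diagonal
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    B_inv = np.linalg.inv(np.eye(n) - lam * W)            # (I_n - lam*W)^{-1}
    mu = rng.normal(0.0, sigma_mu, n)                     # random individual effects
    x = rng.normal(size=(T + 1, n))                       # exogenous regressor, t = 0, ..., T
    y = np.zeros((T + 1, n))
    y[0] = x[0] * beta + mu + B_inv @ rng.normal(0.0, sigma_v, n)   # crude start-up for y_0
    for t in range(1, T + 1):
        eps_t = B_inv @ rng.normal(0.0, sigma_v, n)       # spatially correlated error
        y[t] = rho * y[t - 1] + x[t] * beta + mu + eps_t
    return y, x, W

y, x, W = simulate_panel()
print(y.shape)   # (T+1, n) = (8, 50)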
3 Quasi Maximum Likelihood Estimation

In this section we develop quasi maximum likelihood estimates (QMLEs) based on the Gaussian likelihood for the models specified above.

3.1 QMLE for the Random Effects Model

For the random effects model, the covariance matrix of u has the familiar form E(uu') = σ_v^2 Ω, with

Ω = Ω(φ_μ, λ) = φ_μ (J_T ⊗ I_n) + I_T ⊗ (B'B)^{-1},   (3.1)

where φ_μ = σ_μ^2 / σ_v^2, J_T = ι_T ι_T', and we suppress the dependence of Ω on n. We frequently suppress the arguments of Ω when no confusion can arise.

It is well known that the likelihood function for a dynamic panel model depends on the assumptions on the initial observations (Hsiao, 2003). If |ρ| ≥ 1 or the processes generating the x_{it} are not stationary, it does not make sense to assume that the process generating the y_{it} is the same prior to the period of observations as for t = 1, ..., T. For this reason, we consider two sets of assumptions about the initial observations {y_{i0}}.

Case I: y_{i0} is exogenous. If y_0 is taken as exogenous, this rules out the plausible assumption that it is generated by the same process as generates y_t, t = 1, ..., T. In this case, we can easily derive the likelihood function for the model (2.4) conditional on y_0. Let θ = (β', γ', ρ)', δ = (λ, φ_μ)', and ς = (θ', σ_v^2, δ')'. The log likelihood function of (2.4) is

L^r(ς) = -(nT/2) log(2π) - (nT/2) log(σ_v^2) - (1/2) log|Ω| - (1/(2σ_v^2)) u(θ)' Ω^{-1} u(θ),   (3.2)

where u(θ) = Y - ρ Y_{-1} - X β - Z γ. Maximizing (3.2) gives the QMLE of the model parameters based on the Gaussian likelihood. Computationally it is convenient to work with the concentrated log-likelihood obtained by concentrating out the parameters θ and σ_v^2. From (3.2), given δ, the QMLE of θ is

θ̂(δ) = [X̃' Ω^{-1} X̃]^{-1} X̃' Ω^{-1} Y,   (3.3)

and the QMLE of σ_v^2 is

σ̂_v^2(δ) = (1/(nT)) ũ(δ)' Ω^{-1} ũ(δ),   (3.4)

where X̃ = (X, Z, Y_{-1}) and ũ(δ) = Y - X̃ θ̂(δ).
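For a given δ = (λ, φ_μ)', the estimates (3.3) and (3.4) are simple GLS quantities. The Python sketch below is a minimal illustration of this step, with Y stacked by time as in (2.4) and X̃ = (X, Z, Y_{-1}); it is an assumed implementation for exposition, not the authors' code.

import numpy as np

def gls_step(Y, X_til, W, T, lam, phi_mu):
    """Given delta = (lambda, phi_mu), build Omega from (3.1) and return the GLS
    estimates (3.3)-(3.4). Y is the nT-vector of stacked y_t; X_til stacks (X, Z, Y_{-1})."""
    n = W.shape[0]
    B = np.eye(n) - lam * W
    BtB_inv = np.linalg.inv(B.T @ B)
    J_T = np.ones((T, T))
    Omega = phi_mu * np.kron(J_T, np.eye(n)) + np.kron(np.eye(T), BtB_inv)     # (3.1)
    Omega_inv = np.linalg.inv(Omega)
    theta_hat = np.linalg.solve(X_til.T @ Omega_inv @ X_til,
                                X_til.T @ Omega_inv @ Y)                        # (3.3)
    u_tilde = Y - X_til @ theta_hat
    sigma2_v = (u_tilde @ Omega_inv @ u_tilde) / (n * T)                        # (3.4)
    return theta_hat, sigma2_v, Omega

# The concentrated likelihood below, (3.5), can then be evaluated on a grid over
# (lambda, phi_mu) or passed to a numerical optimizer such as scipy.optimize.minimize.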
Substituting (3.3) and (3.4) back into (3.2) for θ and σ_v^2, we obtain the concentrated log-likelihood function of δ:

L_c^r(δ) = -(nT/2)(log(2π) + 1) - (nT/2) log[σ̂_v^2(δ)] - (1/2) log|Ω|.   (3.5)

The QMLE δ̂ = (λ̂, φ̂_μ)' of δ maximizes the concentrated log-likelihood (3.5). The QMLEs of θ and σ_v^2 are given by θ̂(δ̂) and σ̂_v^2(δ̂), respectively. Further, the QMLE of σ_μ^2 is given by σ̂_μ^2 = φ̂_μ σ̂_v^2.

Case II: y_{i0} is endogenous. If y_0 is taken as endogenous, there are several approaches to treating the initial observations. Assume that, possibly after some differencing, both y_{it} and x_{it} are stationary. In this case, the initial observations are determined by

y_0 = Σ_{j=0}^∞ ρ^j x_{-j} β + zγ/(1-ρ) + μ/(1-ρ) + Σ_{j=0}^∞ ρ^j B^{-1} v_{-j}.   (3.6)

Since x_{-j}, j = 1, 2, ..., are not observable, we cannot use x_{-j} in our estimation procedure. In this paper we follow Bhargava and Sargan (1983) (see also Hsiao, 2003, p.76) and assume that the initial observation y_0 can be approximated by

y_0 = π_0 ι_n + x π_1 + z π_2 + u_0 ≡ x̃ π + u_0,   (3.7)

where x = (x_0, x_1, ..., x_T), x̃ = (ι_n, x, z), π = (π_0, π_1', π_2')', E(u_0 | x, z) = 0, and the covariance structure of u_0 is affected by the spatial weight matrix W. Note that if z contains the constant term, then ι_n should vanish in (3.7). Under the stationarity assumption, (3.6) implies that y_0 = ỹ_0 + ζ_0, where ỹ_0 is the systematic or exogenous part of y_0 and ζ_0 is the endogenous part, namely,

ỹ_0 = Σ_{j=0}^∞ ρ^j x_{-j} β + zγ/(1-ρ)  and  ζ_0 = μ/(1-ρ) + Σ_{j=0}^∞ ρ^j B^{-1} v_{-j}.   (3.8)

(3.7) then follows by assuming that the optimal predictor of ỹ_0 conditional on the observables x and z is x̃ π: ỹ_0 = x̃ π + ζ_1, where ζ_1 = (ζ_{11}, ..., ζ_{1n})', and the ζ_{1i}'s are i.i.d. (0, σ_ζ^2) and independent of x_{it}, z_i, μ_i and ε_{it}. By construction, u_0 = ζ_0 + ζ_1, and E(u_{0i}) = 0. We can verify that under strict exogeneity of x_{it} and z_i,

E(u_0 u_0') = σ_ζ^2 I_n + σ_μ^2/(1-ρ)^2 I_n + σ_v^2/(1-ρ^2) (B'B)^{-1}   (3.9)

and

E(u_0 u') = σ_μ^2/(1-ρ) (ι_T' ⊗ I_n).   (3.10)

We require that n > p(T + 1) + q + 1 for the identification of the parameters in (3.7). This is impossible if T is relatively large and p ≠ 0. For this reason, the regressor in (3.7) is frequently replaced by x̄ ≡ (T + 1)^{-1} Σ_{t=0}^T x_t, so that we have

y_0 = π_0 ι_n + x̄ π_1 + z π_2 + u_0 ≡ x̃ π + u_0,   (3.11)

where now x̃ = (ι_n, x̄, z), π = (π_0, π_1', π_2')', and the variance-covariance structure of u_0 and u is the same as before except for the definition of σ_ζ^2. In contrast, Hsiao (2003, p.76, Case II) simply assumes that the y_{i0}'s are random with a common mean π_0 and writes y_0 as

y_0 = ι_n π_0 + u_0 ≡ x̃ π + u_0,   (3.12)

where now x̃ = ι_n, u_0 = (u_{01}, ..., u_{0n})', u_{0i} represents the deviation of the initial individual endowment from the mean, and the variance-covariance structure of u_0 and u is the same as before except for the definition of σ_ζ^2. In the following, x̃ can be any one of those defined in (3.7), (3.11) or (3.12). We simply refer to its dimension as n × k, where k varies from case to case.

Because the likelihood function (3.2) assumes that the initial observations are exogenously given, it generally produces biased estimators when this assumption is not satisfied (see Bhargava and Sargan, 1983). Under the assumption that the presample values are generated by the same process as the within-sample observations, we need to derive the joint distribution of y_T, ..., y_1, y_0 from (2.4) and (3.7), (3.11) or (3.12). Denoting by σ_v^2 Ω* the n(T+1) × n(T+1) symmetric variance-covariance matrix of u* = (u_0', u')', we see that Ω* has the form

σ_v^2 Ω* = [ σ_ζ^2 I_n + σ_μ^2/(1-ρ)^2 I_n + σ_v^2/(1-ρ^2) (B'B)^{-1} ,  σ_μ^2/(1-ρ) (ι_T' ⊗ I_n) ;
             σ_μ^2/(1-ρ) (ι_T ⊗ I_n) ,  σ_v^2 Ω ]
          = σ_v^2 [ φ_ζ I_n + φ_μ/(1-ρ)^2 I_n + 1/(1-ρ^2) (B'B)^{-1} ,  φ_μ/(1-ρ) (ι_T' ⊗ I_n) ;
                    φ_μ/(1-ρ) (ι_T ⊗ I_n) ,  Ω ]
          ≡ σ_v^2 [ ω_11 , ω_12 ; ω_21 , Ω ],

where φ_ζ = σ_ζ^2 / σ_v^2 and we frequently suppress the arguments of Ω* when no confusion can arise.
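The block structure of Ω* is straightforward to assemble numerically. The following Python sketch builds Ω* (with σ_v^2 factored out) from the display above; the function name, argument names and the example weight matrix are illustrative assumptions, not part of the paper.

import numpy as np

def omega_star(W, T, lam, phi_mu, phi_zeta, rho):
    """Assemble Omega* (up to sigma_v^2) for the random effects model with
    endogenous y_0, using the block form displayed above."""
    n = W.shape[0]
    B = np.eye(n) - lam * W
    BtB_inv = np.linalg.inv(B.T @ B)                      # (B'B)^{-1}
    iota_T = np.ones((T, 1))
    J_T = iota_T @ iota_T.T
    # lower-right block: Omega = phi_mu (J_T ⊗ I_n) + I_T ⊗ (B'B)^{-1}, cf. (3.1)
    Omega = phi_mu * np.kron(J_T, np.eye(n)) + np.kron(np.eye(T), BtB_inv)
    # upper-left block omega_11 and off-diagonal blocks omega_12 = omega_21'
    w11 = (phi_zeta + phi_mu / (1 - rho) ** 2) * np.eye(n) + BtB_inv / (1 - rho ** 2)
    w12 = (phi_mu / (1 - rho)) * np.kron(iota_T.T, np.eye(n))
    return np.block([[w11, w12], [w12.T, Omega]])         # n(T+1) x n(T+1)

# tiny usage check on a 4-unit circular weight matrix
n, T = 4, 3
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
Om_star = omega_star(W, T, lam=0.4, phi_mu=1.0, phi_zeta=1.0, rho=0.5)
print(Om_star.shape)   # (16, 16)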
Now let θ = (β', γ', π')', δ = (ρ, λ, φ_μ, φ_ζ)', and ς = (θ', σ_v^2, δ')'. Note that ς is a (p + q + k + 5) × 1 vector of unknown parameters. Based on (2.4) and (3.7), (3.11) or (3.12), and assuming the Gaussian likelihood, the random effects QML estimator of ς is derived by maximizing the following log-likelihood function:

L^{rr}(ς) = -(n(T+1)/2) log(2π) - (n(T+1)/2) log(σ_v^2) - (1/2) log|Ω*| - (1/(2σ_v^2)) u*(θ)' Ω*^{-1} u*(θ),   (3.13)

where u*(θ) = ((y_0 - x̃ π)', u(β, γ, ρ)')' and u(β, γ, ρ) = Y - ρ Y_{-1} - X β - Z γ.

Maximizing (3.13) gives the quasi maximum likelihood estimates (QMLEs) of the model parameters based on the Gaussian likelihood. We will work with the concentrated log-likelihood by concentrating out the parameters θ and σ_v^2. From (3.13), given δ = (ρ, λ, φ_μ, φ_ζ)', the QMLE of θ is

θ̂(δ) = [X*' Ω*^{-1} X*]^{-1} X*' Ω*^{-1} Y*,   (3.14)

and the QMLE of σ_v^2 is

σ̂_v^2(δ) = (1/(n(T+1))) ũ*(δ)' Ω*^{-1} ũ*(δ),   (3.15)

where

Y* = ( y_0' , (Y - ρ Y_{-1})' )',   X* = [ 0_{n×p}  0_{n×q}  x̃ ;  X  Z  0_{nT×k} ],
ũ*(δ) = ( (y_0 - x̃ π̂(δ))' , (Y - X β̂(δ) - Z γ̂(δ) - ρ Y_{-1})' )',

and θ̂(δ) = (β̂(δ)', γ̂(δ)', π̂(δ)')'. Substituting (3.14) and (3.15) back into (3.13) for θ and σ_v^2, we obtain the concentrated log-likelihood function of δ:

L_c^{rr}(δ) = -(n(T+1)/2)(log(2π) + 1) - (n(T+1)/2) log σ̂_v^2(δ) - (1/2) log|Ω*|.   (3.16)

The QMLE δ̂ = (ρ̂, λ̂, φ̂_μ, φ̂_ζ)' of δ maximizes the concentrated log-likelihood (3.16). The QMLEs of θ and σ_v^2 are given by θ̂(δ̂) and σ̂_v^2(δ̂), respectively. Further, the QMLEs of σ_μ^2 and σ_ζ^2 are given by σ̂_μ^2 = φ̂_μ σ̂_v^2 and σ̂_ζ^2 = φ̂_ζ σ̂_v^2, respectively.

3.2 QMLE for the Fixed Effects Model

In this section, we consider the dynamic panel data model with fixed effects. In this case, we write the model in vector notation as

y_t = ρ y_{t-1} + x_t β + z γ + μ + B^{-1} v_t,   (3.17)

where μ = (μ_1, ..., μ_n)' now denotes the fixed effects that may be correlated with the regressors x_t and z, and the specification of the other variables is the same as in the random effects case. Following standard practice, we eliminate μ by first-differencing (3.17), namely,

Δy_t = ρ Δy_{t-1} + Δx_t β + B^{-1} Δv_t.   (3.18)

(3.18) is well defined for t = 2, 3, ..., T but not for t = 1 because observations on y_{i,-1} are not available. By continuous substitution, we can write Δy_1 as

Δy_1 = ρ^m Δy_{-m+1} + Σ_{j=0}^{m-1} ρ^j Δx_{1-j} β + Σ_{j=0}^{m-1} ρ^j B^{-1} Δv_{1-j}.   (3.19)

Like Hsiao et al. (2002), since the observations Δx_{1-j}, j = 1, 2, ..., are not available, the conditional mean of Δy_1 given Δy_{-m+1} and Δx_{1-j}, j = 0, 1, 2, ..., defined by

η_1 = E(Δy_1 | Δy_{-m+1}, Δx_1, Δx_0, ...) = ρ^m Δy_{-m+1} + Σ_{j=0}^{m-1} ρ^j Δx_{1-j} β,   (3.20)

is unknown even if we assume that m is sufficiently large. Noting that η_1 is an n × 1 vector, we will confront the incidental parameters problem if we treat η_1 as a free parameter to be estimated. As Hsiao et al. (2002) remark, to get around this problem, the expected value of η_1, conditional on the observables, has to be a function of a finite number of parameters, and such a condition can hold provided that {x_{it}} are trend-stationary (with a common deterministic linear trend) or first-difference stationary processes. In this case, the expected value of Δx_{i,1-j}, conditional on the pT × 1 vector Δx_i = (Δx_{i1}', ..., Δx_{iT}')', is linear in Δx_i, i.e.,

E(Δx_{i,1-j} | Δx_i) = π_{0j} + π_{1j}' Δx_i,   (3.21)

where π_{0j} and π_{1j} do not depend on i. Denote Δx = (Δx_1, ..., Δx_n)', an n × pT matrix. Then under the assumption that E(Δy_{i,-m+1} | Δx_{i1}, Δx_{i2}, ..., Δx_{iT}) is the same across all individuals, we have

Δy_1 = π_0 ι_n + Δx π_1 + e ≡ Δx̃ π + e,   (3.22)

where e = (η_1 - E(η_1 | Δx)) + Σ_{j=0}^{m-1} ρ^j B^{-1} Δv_{1-j} is an n × 1 random vector with typical element e_i (i = 1, ..., n); π = (π_0, π_1')' is a (pT + 1) × 1 vector of parameters associated with the conditional mean of Δy_1; and Δx̃ = (ι_n, Δx). (3.21) and (3.22) are associated with Bhargava and Sargan's (1983) approximation for the dynamic random effects model with endogenous initial observations. See Ridder and Wansbeek (1990) and Blundell and Smith (1991) for a similar approach.

By construction, we can verify that under strict exogeneity of x_{it},

E(e_i | Δx_i) = 0,  E(ee') = σ_e^2 I_n + σ_v^2 c_m (B'B)^{-1} ≡ σ_v^2 B^{-1}(φ_e BB' + c_m I_n) B'^{-1},   (3.23)

and

E(e Δu_2') = -σ_v^2 (B'B)^{-1},  E(e Δu_t') = 0 for t = 3, ..., T,   (3.24)

where Δu_t = B^{-1} Δv_t, σ_e^2 = Var(η_{1i}) is identical across i, c_m = 2(1 + ρ^{2m-1})/(1 + ρ), and φ_e = σ_e^2 / σ_v^2. Clearly, when m → ∞, c_∞ = 2/(1 + ρ), which is not a free parameter. We assume that m is unknown, finite, and identical across individuals, so that we will estimate c_m below; see Hsiao et al. (2002, p.110). We require that n > pT + 1 for the identification of the parameters in (3.22). This becomes impossible if T is relatively large in applications and p ≥ 1. As in the random effects model, Δx̃ in (3.22) can be chosen to be other variables, so that we have

Δy_1 = π_0 ι_n + Δx̄ π_1 + e ≡ Δx̃ π + e,   (3.25)

or

Δy_1 = π_0 ι_n + e ≡ Δx̃ π + e,   (3.26)

where Δx̃ = (ι_n, Δx̄) with Δx̄ = T^{-1} Σ_{t=1}^T Δx_t in (3.25), Δx̃ = ι_n in (3.26), and in each case the variance-covariance structure of e and (Δv_2', ..., Δv_T')' is the same as above. In the following, we simply refer to the dimension of π as k.

Let E = φ_e BB' + c_m I_n. Then the covariance matrix of Δu ≡ (e', Δu_2', ..., Δu_T')' is given by

Var(Δu) = σ_v^2 (I_T ⊗ B^{-1}) H_E (I_T ⊗ B'^{-1}) ≡ σ_v^2 Ω†,   (3.27)

where the nT × nT matrix H_E is defined by

H_E = [  E    -I_n    0     ...    0      0   ;
        -I_n   2I_n  -I_n   ...    0      0   ;
         0    -I_n    2I_n  ...    0      0   ;
         ...                ...               ;
         0     0      0     ...   2I_n  -I_n  ;
         0     0      0     ...  -I_n    2I_n ].   (3.28)
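The matrix Ω† in (3.27) is easy to construct once H_E is formed. The Python sketch below is a minimal illustration of (3.27)-(3.28), with σ_v^2 factored out; the circular weight matrix and the function name are assumptions made for the example, not part of the paper.

import numpy as np

def omega_dagger(W, T, lam, c_m, phi_e):
    """Build H_E and Omega† = (I_T ⊗ B^{-1}) H_E (I_T ⊗ B'^{-1}) from (3.27)-(3.28)."""
    n = W.shape[0]
    B = np.eye(n) - lam * W
    B_inv = np.linalg.inv(B)
    E = phi_e * (B @ B.T) + c_m * np.eye(n)               # E = phi_e BB' + c_m I_n
    # tridiagonal H_E: E in the (1,1) block, 2*I_n elsewhere on the diagonal, -I_n off-diagonal
    H = np.kron(2 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1), np.eye(n))
    H[:n, :n] = E
    IB_inv = np.kron(np.eye(T), B_inv)                    # I_T ⊗ B^{-1}
    return IB_inv @ H @ IB_inv.T                          # Omega†, with sigma_v^2 factored out

# example: Omega† for T = 4 periods of first differences on a 4-unit circular weight matrix
n, T = 4, 4
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
Om_dag = omega_dagger(W, T, lam=0.4, c_m=2.0, phi_e=1.0)
print(Om_dag.shape)   # (16, 16)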
Now let θ = (β', ρ, π')', δ = (λ, c_m, φ_e)', and ς = (θ', σ_v^2, δ')'. Note that ς is a (p + k + 5) × 1 vector of unknown parameters. Based on (3.18) and (3.22) and using the Gaussian likelihood, the fixed effects QML estimator of ς is derived by maximizing the following log-likelihood function:

L^f(ς) = -(nT/2) log(2π) - (nT/2) log(σ_v^2) - (1/2) log|Ω†| - (1/(2σ_v^2)) Δu(θ)' Ω†^{-1} Δu(θ),   (3.29)

where

Δu(θ) = ( (Δy_1 - Δx̃ π)', (Δy_2 - ρ Δy_1 - Δx_2 β)', ..., (Δy_T - ρ Δy_{T-1} - Δx_T β)' )'.   (3.30)

Maximizing (3.29) gives the Gaussian QMLE of the model parameters. We will work with the concentrated log-likelihood by concentrating out the parameters θ and σ_v^2. From (3.29), given δ = (λ, c_m, φ_e)', the QMLE of θ is

θ̂(δ) = [ΔX' Ω†^{-1} ΔX]^{-1} ΔX' Ω†^{-1} ΔY.   (3.31)

Proof. Let P_j = ρ^j B^{-1}. Then V_t = Σ_{j=0}^∞ P_j v_{t-j}. Noting that E(v_t' D v_s) = σ_v^2 tr(D) if t = s and 0 otherwise, we have

E(V_t' R_ts V_s) = Σ_{i=0}^∞ Σ_{j=0}^∞ E(v_{t-i}' P_i' R_ts P_j v_{s-j}) = Σ_{i=max(0,t-s)}^∞ E(v_{t-i}' P_i' R_ts P_{s-t+i} v_{t-i})
                = σ_v^2 tr( Σ_{i=max(0,t-s)}^∞ P_i' R_ts P_{s-t+i} ) = σ_v^2 tr(B'^{-1} R_ts B^{-1}) Σ_{i=max(0,t-s)}^∞ ρ^{s-t+2i}.

Next, noting that X_t = Σ_{j=0}^∞ ρ^j x_{t-j}, we have

E(X_t' R_ts X_s) = Σ_{j=0}^∞ Σ_{k=0}^∞ ρ^{j+k} E(x_{t-k}' R_ts x_{s-j}) = tr( Σ_{j=0}^∞ Σ_{k=0}^∞ ρ^{j+k} R_ts E(x_{s-j} x_{t-k}') ).

Now,

E(X_t' R_ts V_s) = Σ_{j=0}^∞ Σ_{k=0}^∞ ρ^{j+k} E(x_{t-k}' R_ts B^{-1} v_{s-j}) = 0.

Lemma B.10. Suppose that the conditions in Theorem 4.4 are satisfied. Then
© Pn ¡ ¢ ¡ ¢ ¡ 1) Cov V0t Rts Vs , V0g Rgh Vh = ρtsgh,1 κv i=1 B 0−1 Rts B −1 ii B 0−1 Rgh B −1 ii ³ ³ ´´o +2σ 4v tr B 0−1 Rts B −1 B 0−1 Rgh B −1 + B 0−1 Rgh B −1 ³ ³ ´ ´ −1 −1 B −1 +ρtsgh,2 σ4v tr B 0−1 Rts (B B) Rgh B −1 + ρtsgh,3 σ 4v tr B 0−1 Rts (B B) Rgh ¢ ¡ 2) Cov X0t Rts Vs , X0g Rgh Vh hP P ³ ´i m m Pmin(m,m+s−h) i+k+h−s+2j −1 Rts (B B) Rgh E x0g−k xt−i = σ 2v tr i=0 k=0 j=max(0,s−h) ρ ¢ ¡ 3) Cov X0t Rts Xt , X0g Rgh Xh = O (n) , P∞ P P (s+g+h−3t+4j) g−t+2i , ρtsgh,2 = ∞ where ρtsgh,1 = ∞ j=max(0,t−s,t−g,t−h) ρ i=max(0,t−g) ρ j=max(0,s−h) P∞ P∞ h−s+2j h−t+2i g−s+2j ρ (j 6= i + s − t) , and ρtsgh,3 = i=max(0,t−h) ρ (j 6= i + s − t) j=max(0,s−g) ρ Proof Let R1 and R2 be arbitrary n × n nonstochastic matrices We can show that £ ¡ ÂÔ E (vt0 R1 vs ) vg0 R2 vh P ⎪ ⎪ κv ni=1 R1,ii R2,ii + σ 4v [tr (R1 ) tr (R2 ) +tr (R1 (R2 +R20 ))] ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ σ4 tr (R1 ) tr (R2 ) ⎪ ⎪ ⎨ v = σ4v tr (R1 R2 ) ⎪ ⎪ ⎪ ⎪ ⎪ σv tr (R1 R20 ) ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ 35 if t = s = g = h if t = s 6= g = h if t = g 6= s = h if t = h 6= s = g otherwise Consequently, Ê Ô E V0t Rts Vs V0g Rgh Vh ⎤ ⎡ ∞ X ∞ ∞ X ∞ X X 0 ρi+j+k+l vt−i B 0−1 Rts B −1 vs−j vg−k B 0−1 Rgh B −1 vh−l ⎦ = E⎣ i=0 j=0 k=0 l=0 ∞ X = ρ(s+g+h−3t+4j) j=max(0,t−s,t−g,t−h) +σ 4v +σ 4v ( κv n X ¡ 0−1 ¢ ¡ ¢ B Rts B −1 ii B 0−1 Rgh B −1 ii i=1 Ê Ă 01  à  à à ÂÂÔo B −1 tr B Rts B −1 tr B 0−1 Rgh B −1 + 2tr B 0−1 Rts B −1 B 0−1 Rgh B −1 + B 0−1 Rgh ∞ X i=max(0,t−s) ∞ X + ¡ ¢ ρs−t+2i tr B 0−1 Rts B −1 ρg−t+2i i=max(0,t−g) ∞ X + ∞ X j=max(0,s−h) ρh−t+2i i=max(0,t−h) ∞ X j=max(0,s−g) ∞ X j=max(0,g−h) ¡ ¢ ρh−g+2j tr B 0−1 Rgh B −1 (j 6= i + g − t) ³ ´ −1 ρh−s+2j σ 4v tr B 0−1 Rts (B B) Rgh B −1 (j 6= i + s − t) ³ ´ −1 ρg−s+2j σ 4v tr B 0−1 Rts (B B) Rgh B −1 (j 6= i + s − t) Then 1) follows by Lemma B.9 For 2), we have ´ ³ ¡ ¢ Cov X0t Rts Vs , X0g Rgh Vh = E X0t Rts Vs (X0t Rgh Vs ) ⎡ ⎤ ∞ X ∞ ∞ X ∞ X X ¡ ¢ = E⎣ ρi+j+k+l x0t−i Rts B −1 vs−j x0g−k Rgh B −1 vh−l ⎦ i=0 j=0 k=0 l=0 ⎡ ∞ X ∞ X = σ 2v tr ⎣ ∞ X i=0 k=0 j=max(0,s−h) ⎤ ¡ ¢ −1 ρi+k+h−s+2j Rts (B B) Rgh E x0g−k xt−i ⎦ ¢ ¡ The expression for Cov X0t Rts Xt , X0g Rgh Xh is quite complicated, but we can use the three evident facts to show it is of order O (n) , which suffices for our purpose i p h P −1 PT −1 −1 PT −1 PT −1 Lemma B.11 1) (nT )−1 Tt=0 s=0 Vt Rts Vs − E (nT ) t=0 s=0 Vt Rts Vs → 0, P −1 PT −1 p 2) (nT )−1 Tt=0 s=0 Xt Rts Vs → 0, h i p −1 PT −1 PT −1 −1 PT −1 PT −1 X R X − E (nT ) X R X 3) (nT ) t ts t t ts t → t=0 s=0 t=0 s=0 Proof By the three evident facts and Lemmas B.2, B.9, and B.10, we can show that E[(nT )−1 PT −1 PT −1 t=0 s=0 Vt Rts Vs ] = O (1) , and à −1 Var n −1 T −1 T X X t=0 s=0 V0t Rts Vs ! = n−2 −1 T −1 T −1 T −1 T X X X X t=0 s=0 g=0 h=0 36 ¡ ¢ ¡ ¢ Cov V0t Rts Vs , V0g Rgh Vh = O n−1 Then 1) follows by the Chebyshev’s inequality For 2), we have E{(nT )−1 0, and à −1 Var n = n−2 −1 T −1 T X X X0t Rts Vs t=0 s=0 T −1 T −1 T −1 T −1 X X X X t=0 s=0 g=0 h=0 = n−2 −1 T −1 T −1 T −1 T X X X X t=0 s=0 g=0 h=0 ¡ ¢ = O n−1 , PT −1 PT −1 t=0 s=0 X0t Rts Vs } = ! 
¡ ¢ Cov X0t Rts Vs , X0g Rts Vh ⎡ ∞ X ∞ X σ 2v tr ⎣ ∞ X i=0 k=0 j=max(0,s−h) ⎤ ¡ ¢ −1 ρi+k+h−s+2j Rts (B B) Rgh E xg−k x0t−i ⎦ where the last equality follows because (i) xit are i.i.d across i with second moments uniformly bounded in i, (ii) Rts (B B) −1 Rgh are uniformly bounded in both row and column sums by B.1, ¡ ¢ −1 E xg−k x0t−i are uniformly bounded by the third evident fact and (iii) elements of Rts (B B) Rgh Hence the conclusion follows by the Chebyshev inequality 3) follows by Lemma B.10 and the Chebyshev inequality Lemma B.12 For D1 , D2 = Ω∗−1 , Ω∗−1 (IT∗ ⊗ A) Ω∗−1 , Ω∗−1 (JT∗ ⊗ In ) Ω∗−1 or Ω∗−1 (KT In ) , Ô Ê 1) n1 u∗0 D1 Ω∗ D2 u∗ − σ 2v tr (D1 Ω∗ D2 Ω∗ ) = op (1) , 2) n−1 [X ∗0 D1 Ω∗ D2 X ∗ − E (X ∗0 D1 Ω∗ D2 X ∗ )] = op (1) Proof Let R = D1 Ω∗ D2 Note that R is uniformly bounded in both row and column sums ⎞ ⎛ R11 R12 ⎜ n×nT ⎟ To show 1), first note that E (u∗0 Ru∗ ) = σ 2v tr (RΩ∗ ) Now write R = ⎝ n×n ⎠ Let R21 R22 u0 = y0 − x eπ and u = Y − ρ0 Y−1 − Xβ − Zγ Then, nT ×n nT ×nT ) u) Var (u∗0 Ru∗ ) = Var (u00 R11 u0 + u0 R22 u + u00 (R12 + R21 ) u) + 2Cov (u00 R11 u0 , u0 R22 u) = Var (u00 R11 u0 ) + Var (u0 R22 u) + Var (u00 (R12 + R21 0 ) u) + 2Cov (u0 R22 u, u00 (R12 + R21 ) u) +2Cov (u00 R11 u0 , u00 (R12 + R21 By Lemmas B.2-B.4, we can show that each term on the right hand side is O (n) Consequently, 1) follows by the Chebyshev ⎛ inequality ⎞ e R22 X e X e R21 x X e ⎠ It is easy to show n−1 x e0 R11 x e converges in probability Next, X ∗0 RX ∗ = ⎝ e x e e0 R11 x x e0 R12 X e R22 X, e n−1 X e R21 x e converge in probability to to its expectation To show n−1 X e, and n−1 x e0 R12 X 37 e (recall X e = (X, Z, Y−1 )) their expectations, the major difficulty lies in the appearance of Y−1 in X ¡ ¢ p 0 R22 Y−1 − E Y−1 R22 Y−1 ) → and the other cases can be proved We show below that n−1 (Y−1 similarly To this goal, we need to use (B.2) (with Y0 = 0nT ×1 ) to obtain R22 Y−1 n−1 Y−1 ¡ ¢0 = n−1 X(−1) β + (lρ ⊗ In ) zγ + (lρ ⊗ In ) + V(1) R22 Ă Â ì X(1) + (lρ ⊗ In ) zγ + (lρ ⊗ In ) μ + V(−1) After expressing out the right hand side of the last expression, it has 16 terms, most of which can easily be shown to converge to their respective expectations The exceptions are terms involving X(−1) or V(−1) , namely: n−1 β 00 X0(−1) R22 X(−1) β , n−1 β 00 V0(−1) R22 V(−1) , n−1 β 00 X0(−1) R22 V(−1) , n−1 β 00 X0(−1) R22 (lρ ⊗ In ) zγ , n−1 β 00 X0(−1) R22 (lρ ⊗ In ) μ, n−1 V0(−1) R22 (lρ ⊗ In ) zγ , and n−1 V0(−1) R22 (lρ ⊗ In ) μ The first three terms converge in probability to their expectations by Lemma B.11 We can show the other terms converge in probability to their expectations by similar arguments to those used in proving Lemmas B.9-B.11 Lemma B.13 Let R be an n (T + 1) × n (T + 1) nonstochastic matrix that is uniformly bounded in both row and column sums, e.g., In(T +1) , Ω∗−1 (IT∗ ⊗ A) , Ω∗−1 (JT∗ ⊗ In ) , or Ω∗−1 (KT∗ ⊗ In ) Then ³ ³¡ ´´ ¢ p −1 /n, 01×k )0 , n−1 X ∗0 RΩ∗−1 u∗ → (00p×1 , 00q×1 , limn→∞ σ 2v tr RΩ∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) where ⎛ Jρ∗ = ⎝ 01×T ⎞ ⎛ ⎠ , lρ∗ = ⎝ 01×1 ⎞ ⎛ ⎠ , ι∗T = ⎝ 1/ (1 − ρ) ⎞ ⎠ ιT jρ Cρ lρ ¡ ¢ ¡ ¢ p and jρ0 = 1, ρ, , ρT −1 / − ρ2 In particular, when R = In(T +1) , we have: n−1 X ∗0 Ω∗−1 u∗ → 0(p+q+k+1)×1 and E(X ∗0 Ω∗−1 u∗ ) = Proof Recall ⎛ X∗ = ⎝ 0n×p 0n×q 0n×1 X Z Y−1 x e 0nT ×k ⎞ ⎛ ⎠, u e∗ = ⎝ y0 − x eπ Y − Xβ − Zγ − Y−1 ρ0 ⎞ ⎠ By the strict exogeneity of X and Z, we can readily show that [0p×n X ] RΩ∗−1 u∗ , [0q×n Z ] RΩ∗−1 u∗ p p and [e x0 0k×nT ] RΩ∗−1 u∗ have expectation zero, n−1 [0p×n X ] RΩ∗−1 u∗ → 0, n−1 [0q×n Z ] RΩ∗−1 u∗ → p x0 
0k×nT ] RΩ∗−1 u∗ → Let 0, and n−1 [e ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ 0 n×1 n×p n×1 ∗ ⎠ , X∗ = ⎝ ⎠ , and V∗ = ⎝ ⎠ Y−1 =⎝ Y−1 X(−1) V(−1) 38 Then we are left to show ³ ³¡ n ´´o p ¢ −1 ∗0 RΩ∗−1 u∗ − σ2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) → 0, and n−1 Y−1 ³ ³¡ ´´ ¢ ¢ ¡ ∗ −1 RΩ−1 u = σ 2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) E Y−1 Using (B.2), Y−1 = X(−1) β + (lρ ⊗ In ) zγ + (lρ ⊗ In ) μ + V(−1) , we have ∗0 RΩ∗−1 u∗ Y−1 ¡ ¢ = μ0 lρ∗0 ⊗ In RΩ∗−1 u∗ + V∗0 RΩ∗−1 u∗ X ¡ ¢ +β 00 X∗ RΩ∗−1 u∗ + γ 00 z lρ∗0 ⊗ In RΩ∗−1 u∗ ≡ Anj j=1 It is easy to show that E (Anj ) = and n−1 Anj = op (1) for j = 3, So we will show that ³ ³¡ ´´ ¢ −1 + op (n) Let An1 + An2 = σ2v tr R ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) ⎛ P ⎞ ∞ j −1 ρ B v −j j=0 ⎠ v∗ = ⎝ ¡ ¢ IT ⊗ B −1 v Noting that u∗ = (u00 , u0 )0 , where u0 = ξ + μ/ (1 − ρ) + ¢ ¡ IT ⊗ B −1 v, we have P∞ j=0 ρj B −1 v−j , and u = (ιT ⊗ In ) μ + Ê Ă Ô Ê Â ĂĂ Â ÂÔ E (An1 ) = E μ0 lρ∗0 ⊗ In RΩ∗−1 (ι∗T ⊗ In ) μ = φμ σ 2υ tr RΩ∗−1 ι∗T l0 In and h i Ê Ô , E (An2 ) = E V∗0 RΩ∗−1 v ∗ = σ2υ tr RΩ∗−1 Jρ∗ ⊗ (B B) where we have used (3.35) and the fact that E (v ∗ V∗0 ) = Jρ∗ ⊗ (B B) −1 Hence ³¡ ´´ ³ ¢ −1 E (An1 + An2 ) = σ 2v tr RΩ∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) ¡ ¢ We can show that E A2nj = O (n) for j = 1, The first conclusion then follows from the Chebyshev ³ ³¡ ´´ ¢ −1 inequality When R = In(T +1) , tedious calculation shows that tr Ω∗−1 ι∗T lρ∗0 ⊗ In + Jρ∗ ⊗ (B B) = and we can also verify that E (Ani ) = for i = 3, This completes the proof Lemma B.14 Suppose that the conditions in Theorem 4.4 are satisfied Then d X ∗0 Ω∗−1 u∗ → N (0, Γrr,1 ) , where Γrr,11 = p limn→∞ (nT )−1 X ∗0 Ω∗−1 X ∗ 1) √nT 2) √nT ∂Lrr (ς ) d → ∂ς N (0, Γrr ) 39 Proof For 1), by the Cramer-Wold device, it suffices to show that for any c = (c01 , c02 )0 ∈ d Rp+q+1 × Rk such that kck = 1, (nT )−1/2 c0 X ∗0 Ω∗−1 u∗ → N (0, c0 Γrr,11 c) Write −1/2 −1/2 c X ∗0 Ω∗−1 u∗ = (nT ) Tn ≡ (nT ) h i e ω 21 u0 + c01 X e ω 22 u + c02 x e0 ω 11 u0 + c02 x e0 ω 12 u c01 X Analogous to the proof of Lemma B.8, we can write Tn as the summation of six asymptotically P6 independent terms, namely, Tn = i=1 Tni , where Tni s are linear and quadratic functions of μ, linear and quadratic functions of v, linear function of μ and v, linear function of μ and e ξ, linear function of v and e ξ, and linear function of e ξ, respectively Further and E p d {Tni − E (Tni )} / Var (Tni ) → N (0, 1) , ³P ´ T s, we have = Now by the asymptotic independence of Tni ni i=1 X d √ Tni → N Tn = nT i=1 implying that (nT ) −1/2 à 0, lim (nT ) n→∞ −1 X ! 
Var (Tni ) , i=1 d −1 X ∗0 Ω∗−1 u∗ → N (0, Γrr,11 ) because we can show that (nT ) (X ∗0 Ω∗−1 X ∗ −Var(X ∗0 Ω∗−1 u∗ )) = op (1) The proof of 2) is similar and thus omitted The next three lemmas are used in the proof of Theorem 4.6 for the fixed effects model Lemma B.15 For D1 , D2 = Ω†−1 , Ω†−1 Ω†λ Ω†−1 , Ω†−1 Ω†cm Ω†−1 or Ω†−1 Ω†φe Ω†−1 , ¡ £ ÂÔ 1) n1 u0 D1 D2 u 2v tr D1 Ω† D2 Ω† = op (1) , Ă ÂÔ Ê 2) n1 X D1 D2 ∆X − E ∆X D1 Ω† D2 ∆X = op (1) Proof Let R = D1 Ω† D2 Note that R is uniformly bounded in both row and column sums Since ¡ ¢ ¢ ¡ E (∆u0 R∆u) = σ 2v tr RΩ† , by the Chebyshev inequality 1) follows provided Var n−1 ∆u0 R∆u = ´0 ³ Pm−1 0 , ∆v(1) Then o (1) Let ∆v(0) = Bζ + j=0 ρj ∆v1−j , ∆v(1) = (∆v20 , ∆vT0 ) , and ∆v = ∆v(0) ³ ³ ´ ¡ ´ ¡ ¢ ¢ 0 e e ≡ In ⊗ B −1 R In ⊗ B −1 where R ∆u0 R∆u = ∆v In ⊗ B −1 R In ⊗ B −1 ∆v = ∆v R∆v, ⎞ ⎛ R00 R01 ⎟ ⎜ n×n n×n(T −1) e similarly Let C be a (T − 1) × T ⎟ and partition R Now write R = ⎜ ⎠ ⎝ R10 R11 n(T −1)×n n(T −1)×n(T −1) matrix with Cij = −1 if i = j, Cij = if j = i + 1, and Cij = otherwise Then ∆v(1) = (C ⊗ In ) v, where v = (v10 , , vT0 )0 So e ∆v R∆v e e 0 (R01 + R10 ) ∆v(1) = ∆v(0) R00 ∆v(0) + ∆v(1) R11 ∆v(1) + ∆v(0) e e11 (C ⊗ In ) v + ∆v (R01 + R10 = ∆v(0) ) (C ⊗ In ) v R00 ∆v(0) + v (C ⊗ In ) R (0) 40 Then, Var (∆u0 R∆u) ´ ³ ´ ´ ³ ³ e e11 (C ⊗ In ) v + Var ∆v (R01 + R10 ) (C ⊗ I ) v R00 ∆v(0) + Var v (C ⊗ In ) R = Var ∆v(0) n (0) ´ ³ e e11 (C ⊗ In ) v R00 ∆v(0) , v (C ⊗ In ) R +2Cov ∆v(0) ´ ³ e 0 (R01 + R10 ) (C ⊗ In ) v R00 ∆v(0) , ∆v(0) +2Cov ∆v(0) ´ ³ e11 (C ⊗ In ) v, ∆v (R01 + R10 ) (C ⊗ In ) v +2Cov v (C ⊗ In ) R (0) By the Cauchy-Schwartz inequality it suffices to show that the first three terms on the right hand side ¢ ¡ ¢ ¡ are O (n) since then Var n−1 ∆u0 R∆u = O n−1 = o (1) Write ∆v(0) = Bζ + v1 − ρm−1 v−m+1 + Pm−2 j 0e j=0 ρ (ρ − 1) v−j Since B R00 B is uniformly bounded in both row and column sums, n h³ ´ ´ i2 ³ ³ ´ ´ ³ X e00 e00 Bζ = κζ e00 B e00 BB R e00 + R + σ 4ζ tr B R B0R B = O (n) Var ζ B R ii i=1 ´ ³¡ ³ ¢ ¡ ¢´ e00 v1 = O (n) , Var ρm−1 v−m+1 R e00 ρm−1 v−m+1 = O (n) , and Var(Pm−2 Similarly Var v10 R j=0 Pm−2 j j e ρ (ρ − 1) v−j R00 j=0 ρ (ρ − 1) v−j ) = O (n) It follows by the Cauchy-Schwartz inequality that ´ ´ ³ ³ e e11 (C ⊗ In ) v R00 ∆v(0) = O (n) By the same token, we can show that Var v (C ⊗ In ) R Var ∆v(0) ´ ³ 0 (R01 + R10 ) (C ⊗ In ) v = O (n) This completes the proof of 1) The proof = O (n) , and Var ∆v(0) of 2) is similar and thus omitted Lemma B.16 Let R be an nT × nT nonstochastic matrix that is uniformly bounded in both row p and column sums, e.g., In(T +1) , Ω†−1 , Ω†−1 Ω†λ , Ω†−1 Ω†cm or Ω†−1 Ω†φe Then n−1 ∆X RΩ†−1 ∆u → n h io −1 (00p×1 , 00q×1 , limn→∞ σ 2υ tr RΩ†−1 C1 ⊗ (B B) + C2 ⊗ In /n, 0)0 , where C1 is a T × T matrix defined by ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ C1 = ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 2ρ 1+ρ ³ 2ρ 1+ρ ´ −1 ··· ³ ´ ³ ´ 2ρ 2ρ ρT −4 ρ 1+ρ −1 ρT −3 ρ 1+ρ −1 ¡ ¢ ¡ ¢ ρT −5 2ρ − − ρ2 ρT −4 2ρ − − ρ2 ¡ ¢ ¡ ¢ ρT −6 2ρ − − ρ2 ρT −5 2ρ − − ρ2 1+ρ −1 2−ρ 2ρ − − ρ2 ··· −1 2−ρ ··· 0 0 ··· −1 2−ρ 0 0 ··· −1 −1 ρ ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ ¡ ¢ and C2 is a T ×T matrix whose first row is 0, φe , ρφe , , ρT −2 φe and all other row elements are zero p In particular, when R = InT , we have n−1 ∆X Ω†−1 ∆u → 0(p+k+1)×1 and E(∆X Ω†−1 ∆u) = 41 ³ ´0 ¡ ¢0 0 Proof Let ∆X ∗ = (0p×n , ∆x02 , , ∆x0T ) , ∆Y ∗ = 01×n , ∆y1 , , ∆yT0 −1 , ∆e x∗ = ∆e x , 0k×n(T −1) ¡ ¢ x∗ ) By the notation in the proof of Lemma B.15, ∆u = In ⊗ B −1 ∆v Then ∆X = (∆X ∗ , ∆Y ∗ , ∆e x∗ RΩ†−1 ∆u By the strict exogeneity of X and Z, we can readily show that ∆X ∗ RΩ†−1 ∆u, 
and ∆e p p x∗ RΩ†−1 ∆u → We are left to show have expectation zero, n−1 ∆X ∗ RΩ†−1 ∆u → 0, and n−1 ∆e n n h ioo p −1 → 0, n−1 ∆Y ∗0 RΩ†−1 ∆u − σ 2υ tr RΩ†−1 C1 ⊗ (B B) + C2 ⊗ In n h io ¢ ¡ −1 and E ∆Y ∗0 RΩ†−1 ∆u = σ 2υ tr RΩ†−1 C1 ⊗ (B B) + C2 ⊗ In ³ ´ ¡ ¢0 PT −3 0 Let kρ = 0, 1, ρ, , ρT −2 , X = 01×n , 01×n , (∆x2 β ) , , j=0 ρj (∆xT −1−j β ) , and V = ´ ³ P −3 j ρ (∆vT −1−j )0 Since ∆y1 = ∆e xπ + e and 01×n , 01×n , (∆v2 )0 , , Tj=0 ∆yt = ρt−1 ∆y1 + t−2 X ρj ∆xt−j β + j=0 t−2 X ρj B −1 ∆vt−j for t = 2, 3, , (B.6) j=0 we have ¡ ¢ ∆Y ∗ = kρ ⊗ ∆y1 + X + IT ⊗ B −1 V So ∗0 †−1 ∆Y RΩ †−1 ∆u = X RΩ X ¡ ¢ ¡ ¢ †−1 0−1 †−1 ∆u + kρ ⊗ ∆y1 RΩ ∆u + V IT ⊗ B Anj RΩ ∆u ≡ j=1 It is easy to show that E (An1 ) = and n−1 An1 = op (1) Now after some tedious calculations we can show that â ÊĂ Â Ă ÂÔê E (An2 + An3 ) = E RΩ†−1 ∆u kρ0 ⊗ ∆y10 + V IT ⊗ B 0−1 n h io −1 = σ 2υ tr RΩ†−1 C1 ⊗ (B B) + C2 ⊗ In ¢ ¡ and that E A2nj = O (n) for j = 2, The first conclusion then follows from the Chebyshev inequal- ity When R = InT , write ³ ³ ´´ ¡ ¢ −1 E (An2 + An3 ) = σ 2υ tr Ω†−1 C1 ⊗ (B B) + σ 2υ tr Ω†−1 (C2 ⊗ In ) ≡ An2 + An3 −1 Using (3.39) and the explicit expressions for h−1 and h1 , we can show that where c = PT −1 j=1 ¡ ¡ ¢ ¢ An2 = −cφe σ2υ tr E ∗−1 BB and An3 = −cφe σ 2υ tr E ∗−1 BB , (T − j) ρj−1 Consequently, E (An2 + An3 ) = This completes the proof 42 Lemma B.17 Suppose that the conditions in Theorem 4.6 are satisfied Then d ∆X Ω†−1 ∆u → N (0, Γf,11 ) , where Γf,11 = p limn→∞ (nT )−1 ∆X Ω†−1 ∆X 1) √nT 2) √nT ∂Lf (ς ) d → ∂ς N (0, Γf ) Proof For 1), by the Cramer-Wold device, it suffices to show that for any c = (c01 , c2 , c03 ) ∈ d −1/2 ∆X Ω†−1 ∆u → N (0, c0 Γf,11 c) As in the proof of Lemma ³ ´0 ¡ ¢0 0 x∗ = ∆e x , 0k×n(T −1) B.16, let ∆X ∗ = (0p×n , ∆x02 , , ∆x0T ) , ∆Y ∗ = 01×n , ∆y1 , , ∆yT0 −1 , ∆e Rp × R × Rk such that kck = 1, (nT ) Then Tn ≡ ∆X Ω†−1 ∆u = (c01 ∆X ∗ + c02 ∆Y ∗ + c03 ∆x∗ )Ω†−1 ∆u By the proof of Lemma B.16 ¡ ¢ ∆Y ∗ = X + (kρ ⊗ ∆y1 ) + IT ⊗ B −1 V ¡ ¢ P j −1 Let V ∗ = v−m+1 , v−m+2 , , vT0 We can write ∆y1 = ∆e xπ + ζ + m−1 ∆v1−j = ∆e xπ + ζ + j=0 ρ B C1 V ∗ , V =C2 V ∗ and ∆u = C4 ζ + C3 V ∗ for some matrices Ci , i = 1, 2, 3, Consequently, ∆Y ∗ = X † + kρ ⊗ ζ + C5 V ∗ , ¡ ¢ where X † = X +kρ ⊗ (∆e xπ) , and C5 = (kρ,1 C10 , , kρ,T C10 ) + IT ⊗ B −1 C2 As a result, we can write Tn ¡ ¢ = (c01 ∆X ∗ + c02 X † + kρ ⊗ ζ + C5 V ∗ + c03 ∆x∗ )Ω†−1 (C4 ζ + C3 V ∗ ) ¢ ¡ = a01 ζ + ζ A1 ζ + (a02 V ∗ + V ∗0 A2 V ∗ ) + ζ A3 V ∗ , ≡ Tn1 + Tn2 + Tn3 , where and Ai are vectors or matrices that involve ∆X ∗ , X † , ∆x∗ , and the nonstochastic matrices Cj (j = 3, 4, 5) By Lemma B.7, By the facts that E have ³P i=1 p d {Tni − E (Tni )} / Var (Tni ) → N (0, 1) Tni ´ = and that Tni , i = 1, 2, 3, are asymptotically independent, we X d √ Tn = √ Tni → N nT nT i=1 −1/2 implying that (nT ) à 0, lim (nT ) d n→∞ −1 X ! 
Var (Tni ) , i=1 −1 ∆X Ω†−1 ∆u → N (0, Γf,11 ) because we can show that (nT ) −Var(∆X Ω†−1 ∆u)) = op (1) The proof of 2) is similar and thus omitted 43 (∆X Ω†−1 ∆X The next three Lemmas are used in simplifying the covariance-covariance matrix of the score function Lemma B.18 Suppose that the conditions in Theorem 4.2 are satisfied Then ⎛ ¡ ¢ ⎞ ¡ ¢ E μ31 X gqn ,1 diag (Gpn ,1 ) + E v11 X gqn ,2 diag (Gpn ,2 ) ⎟ ´ ⎜ ³ ¡ ¢ ¡ ¢ ⎟ e qn uu0 pn u = ⎜ E X ⎜ E μ31 Z gqn ,1 diag (Gpn ,1 ) + E v11 Z gqn ,2 diag (Gpn ,2 ) ⎟ , ⎝ ⎠ ¢ ¡ qn uu0 pn u E Y−1 ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ qn uu0 pn u = σ 2v E Y−1 qn Ωpn u +E μ31 Q0n gqn ,1 diag (Gpn ,1 )+E v11 Qn gqn ,2 diag (Gpn ,2 ) where E Y−1 ¢ ¢ ¡ ¢¢ Pn ¡¡ PnT ¡ +κμ i=1 lρ ⊗ In qn (ιT ⊗ In ) ii Gpn ,1ii + κv i=1 Av qn IT ⊗ B −1 ii Gpn ,2ii , Qn = E[Ax X β ¡ ¢ + (lρ ⊗ In ) zγ + Y0 ], gqn ,1 = qn (ιT ⊗ In ) , gqn ,2 = qn IT ⊗ B −1 , and Ax , Av , Y0 and lρ are defined in (B.2) and (B.5) Proof The proof is similar to that of Lemma B.4 Lemma B.19 Let qn and pn be n (T + 1) × n (T + 1) symmetric nonstochastic matrix Suppose that the conditions in Theorem 4.4 are satisfied Then ¡ ¢ ¡ ¢ ¡ ¢ 1) E (X ∗0 qn u∗ u∗0 pn u∗ ) = E μ31 X ∗0 qn d1 diag (d01 pn d1 ) + E ζ 31 X ∗0 qn d2 diag (d02 pn d2 ) + E a32 ¡ ¢ ∗0 X qn d4 diag (d04 pn d4 ) , X ∗0 qn d3 diag (d03 pn d3 ) + E v11 P P 2) E (u∗0 qn u∗ u∗0 pn u∗ ) = σ 4v tr (Ω∗ qn ) tr (Ω∗ pn )+κμ ni=1 (d01 qn d1 )ii (d01 pn d1 )ii +κζ ni=1 (d02 qn d2 )ii P P × (d02 pn d2 )ii + κa2 ni=1 (d03 qn d3 )ii (d03 pn d3 )ii + κv ni=1 (d04 qn d4 )ii (d04 pn d4 )ii , where d0i s and a2 are defined in (B.8) Proof Write ⎛ ⎞ −1 + B a u∗ = ⎝ ¡ ¢ ⎠ = d1 μ + d2 ζ + d3 a2 + d4 v, (ιT ⊗ In ) μ + IT ⊗ B −1 v where d1 d4 ⎛ = ⎝ ⎛ = ⎝ ζ+ 1−ρ ιT 01×1 0T ×1 μ 1−ρ ⎛ ⎞ ⎠ ⊗ In , d2 = ⎝ 01×T IT ⎞ 0T ×1 ⎛ ⎞ ⎠ ⊗ In , d3 = ⎝ ⎠ ⊗ B −1 , and a2 = ∞ X j=0 ρj v−j 0T ×1 (B.7) ⎞ ⎠ ⊗ B −1 , (B.8) Using the fact that μ, ζ, a2 , and v are mutually independent, we can readily show 1) and 3) For 44 example, for 3), we can apply Lemma B.3 to each term in (B.7) to get E (u∗0 qn u∗ u∗0 pn u∗ ) = σ 4μ [tr (d01 qn d1 ) tr (d01 pn d1 ) + 2tr {d01 qn d1 d01 pn d1 }] + κμ n X i=1 n X +σ 4ζ [tr (d02 qn d2 ) tr (d02 pn d2 ) + 2tr {d02 qn d2 d02 pn d2 }] + κζ + σ 4v (1 − ρ2 )2 = σ 4v tr (Ω∗ qn ) tr (Ω∗ pn ) + κμ n X i=1 i=1 n X (d01 qn d1 )ii (d01 pn d1 )ii + κζ i=1 +κa2 (d02 qn d2 )ii (d02 pn d2 )ii [tr (d03 qn d3 ) tr (d03 pn d3 ) + 2tr {d03 qn d3 d03 pn d3 }] + κa2 +σ 4v [tr (d04 qn d4 ) tr (d04 pn d4 ) + 2tr {d04 qn d4 d04 pn d4 }] + κv n X (d01 qn d1 )ii (d01 pn d1 )ii (d03 qn d3 )ii (d03 pn d3 )ii + κv n X (d03 qn d3 )ii (d03 pn d3 )ii i=1 (d04 qn d4 )ii (d04 pn d4 )ii i=1 n X (d02 qn d2 )ii (d02 pn d2 )ii i=1 n X (d04 qn d4 )ii (d04 pn d4 )ii i=1 Lemma B.20 Let qn and pn be nT × nT symmetric nonstochastic matrix Suppose that the condix∗ ) as in the proof of Lemma B.16 tions in Theorem 4.6 are satisfied Write ∆X = (∆X ∗ , ∆Y ∗ , ∆e Then 1) E (∆X qn ∆u∆u0 pn ∆u) ⎛ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ (1) (2) E ζ 31 ∆X ∗0 qn diag (p11 ) +E ve13 ∆X ∗0 qn diag B 0−1 p11 B −1 +E v11 ∆X ∗0 qn d diag (d0 pn d) ⎜ ⎜ =⎜ E (∆Y ∗0 qn ∆u∆u0 pn ∆u) ⎝ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ (1) (2) x∗0 qn diag (p11 ) +E ve13 ∆e x∗0 qn diag B 0−1 p11 B −1 +E v11 ∆e x∗0 qn d diag (d0 pn d) E ζ 31 ∆e ¡ ¢ ¡ ¢ ¢ P P n n ¡ 2) E (∆u0 qn ∆u∆u0 pn ∆u) = σ 4v tr Ω† qn tr Ω† pn +κζ i=1 q11,ii p11,ii +κvh i=1 B 0−1 q11 B −1 ii ¡ 0−1 ¢ Pn B p11 B −1 ii + κv i=1 (d0 qn d)ii (d0 pn d)ii ¡ ¢ ¡ ¢ ¡ ¢ Qn gqn ,2 where E (∆Y ∗0 qn ∆u∆u0 pn ∆u) = σ 2v E ∆Y ∗0 qn Ω† pn ∆u +E μ31 Q0n gqn ,1 diag (Gpn ,1 )+E v11 ¢ ¢ ¡ ¢¢ Pn ¡¡ PnT ¡ −1 diag (Gpn ,2 ) +κμ i=1 lρ ⊗ 
In qn (ιT ⊗ In ) ii Gpn ,1ii +κv i=1 Av qn IT ⊗ B G , Qn = ii pn ,2ii E[Ax X β + (lρ ⊗ In ) zγ + Y0 ], and Ax , Av , Y0 and lρ are defined in (B.2) and (B.5) Pm (1) (2) , 0n×n(T −1) )0 , qn = ((q11 B −1 )0 , (q21 B −1 )0 )0 , q11 (p11 ) is the vei = j=0 ρj ∆vi,1−j , qn = (q11 upper left n × n submatrix of qn (pn ), q21 is the lower left n (T − 1) × n submatrix of qn , d = (0T ×1 , C )0 ⊗ B −1 , and C is defined in the proof of Lemma B.15 Proof The proof is similar to that of Lemma B.19 and thus omitted 45 ⎞ ⎟ ⎟ ⎟, ⎠ C Proofs of the Theorems Proof of Theorem 4.1 By Theorem 3.4 of White (1994), it suffices to show that p r (nT )−1 [Lr∗ c (δ) − Lc (δ)] → uniformly in δ ∈ Λ, (C.1) and (nT ) lim sup max c n→∞ ρ∈N (ρ0 ) −1 r∗ [Lr∗ c (δ) − Lc (δ )] < for any > 0, (C.2) where N c (δ ) is the complement of an open neighborhood of δ on Λ of diameter By (3.5) and r e2v (δ) − ln σ b2v (δ)) To show (C.1) , it is sufficient to show (4.3) , (nT )−1 [Lr∗ c (δ) − Lc (δ)] = − (ln σ By (3.4) , we can write where M = InT e2v (δ) = op (1) uniformly on Λ σ b2v (δ) − σ (C.3) 1 −1/2 e (δ) = M Ω−1/2 Y, u e (δ) Ω−1 u Y Ω nT nT ³ ´ e X e Ω−1 X e X e Ω−1/2 Noting that M Ω−1/2 X e = 0, we have σ − Ω−1/2 X b2v (δ) = σ b2v (δ) = (nT )−1 u0 Ω−1/2 M Ω−1/2 u Then by (4.2) , we have e2v (δ) = σ b2v (δ) − σ ³ ´i h −1/2 M Ω−1/2 u − σ2v0 tr Ω−1/2 M Ω−1/2 Ω0 uΩ nT ´ ¡ ¢i σ2 h ³ + v0 tr Ω−1/2 M Ω−1/2 Ω0 − tr Ω−1 Ω0 nT ≡ Tn1 + Tn2 We can readily show that both Tn1 and Tn2 are op (1) uniformly on Λ To show (C.2) , we follow Lee (2002) and define an auxiliary process: YnT = UnT , where UnT ∼ ¢ ¡ ¡ ¢ N 0, σ 2v Ω with Ω = Ω (δ) The true parameter value is given by σ 2v0 , δ Let Ω0 = Ω (δ ) The ¡ ¢ nT log likelihood function of the auxiliary process is Lra δ, σ 2v = − nT log(2π) − log(σ v ) − log |Ω| − ¡ ¢ −1 r UnT We can verify that Lr∗ c (δ) = maxσ 2v Ea La δ, σ v , where Ea is the expectation 2σ 2v UnT Ω ¡ ¢ r r∗ under the auxiliary process By the Jensen inequality, Lr∗ c (δ) ≤ Ea La δ , σ v0 = Lc (δ ) for all δ Suppose the identification uniqueness condition in (C.2) is not satisfied, then there would exist a −1 sequence δ n ∈ Λ such that δ n → δ ∗ 6= δ , and limn→∞ (nT ) would contradict to Assumption R(iv) This completes the proof of the theorem ¥ 46 ∗ r∗ [Lr∗ c (δ ) − Lc (δ )] = The latter Proof of Theorem 4.2 By the Taylor series expansion ∂Lr (bς ) ∂Lr (ς ) ∂ Lr (eς ) √ nT (b ς − ς 0) , 0= √ =√ + ∂ς nT ∂ς∂ς nT ∂ς nT where elements of e ς lie in the segment joining the corresponding elements of b ς and ς Thus ∙ ¸−1 √ ∂ Lr (eς ) ∂Lr (ς ) √ nT (b ς − ς 0) = − nT ∂ς∂ς ∂ς nT p p ς → ς , and it suffices to show that By Theorem 4.1, bς → ς Consequently, e ∂ Lr (ς ) ∂ Lr (e ς) − = op (1) , nT ∂ς∂ς nT ∂ς∂ς (C.4) ∂ Lr (ς ) p → Σr , nT ∂ς∂ς (C.5) and ∂Lr (ς ) d √ → N (0, Σr + Λr ) (C.6) ∂ς nT ³ ´ p p e ,λ e → e≡Ω φ Ω0 By Lemmas B.2, B.5 and B.6, and the fact that As e ς → ς , it follows that Ω μ ´ ³ ´ ³ ee e θ0 − e θ , (C.4) holds for each of its component, e.g., ∂ Lr (eς ) /∂θ∂θ0 The u e θ = Y −X θ = u+X result in (C.5) follows from Lemmas B.5 and B.6 (C.6) is proved in Lemma B.8 ¥ Proof of Theorem 4.3 The proof is almost identical to that of Theorem 4.1 and thus omitted ¥ Proof of Theorem 4.4 The proof is analogous to that of Theorem 4.2 and now follows mainly from Lemmas B.12-B.14.¥ Proof of Theorem 4.5 The proof is almost identical to that of Theorem 4.1 and thus omitted ¥ Proof of Theorem 4.6 The proof is analogous to that of Theorem 4.2 and now follows mainly from Lemmas B.15-B.17.¥ 47 ρ Mean Table Monte Carlo Mean and RMSE for 
the QMLEs Random Effects Model with Normal Errors, n = 50, T = λ = 25 λ = 50 λ = 0.75 Rmse Mean Rmse Mean Rmse Mean Rmse Mean Rmse Mean Initial observations are exogenous 25 5.000 0.147 4.989 0.148 1.000 0.024 1.000 0.026 0.998 0.166 0.995 0.166 0.250 0.016 0.252 0.016 0.224 0.130 0.151 0.157 0.484 0.144 0.287 0.275 0.492 0.036 0.608 0.114 50 4.999 0.144 4.982 0.147 1.000 0.026 0.999 0.030 1.003 0.170 1.000 0.170 0.500 0.014 0.503 0.014 0.230 0.125 0.132 0.177 0.485 0.147 0.174 0.354 0.493 0.037 0.657 0.163 75 4.992 0.142 4.997 0.143 0.999 0.024 0.999 0.028 1.007 0.167 1.008 0.169 0.750 0.013 0.749 0.012 0.213 0.123 0.101 0.197 0.491 0.158 0.069 0.436 0.491 0.038 0.686 0.192 5.005 1.001 1.000 0.250 0.470 0.497 0.494 5.000 1.000 1.000 0.499 0.466 0.494 0.492 5.007 1.000 0.999 0.750 0.472 0.496 0.492 0.171 0.024 0.176 0.020 0.103 0.146 0.036 0.171 0.024 0.168 0.018 0.114 0.149 0.037 0.162 0.023 0.160 0.018 0.108 0.157 0.038 4.989 1.001 0.997 0.254 0.351 0.309 0.621 4.978 1.000 0.995 0.504 0.307 0.181 0.668 5.018 1.000 1.002 0.747 0.276 0.071 0.704 0.173 0.026 0.177 0.021 0.183 0.266 0.127 0.175 0.029 0.168 0.019 0.229 0.350 0.174 0.161 0.029 0.163 0.017 0.254 0.434 0.210 4.998 1.000 1.004 0.250 0.723 0.504 0.494 4.993 1.000 1.004 0.500 0.722 0.491 0.495 4.997 1.000 1.008 0.750 0.723 0.498 0.493 0.224 0.024 0.173 0.025 0.075 0.160 0.038 0.229 0.023 0.167 0.024 0.078 0.159 0.039 0.232 0.023 0.173 0.024 0.077 0.160 0.038 4.974 1.000 0.999 0.256 0.605 0.317 0.635 4.965 0.999 0.998 0.507 0.563 0.189 0.687 5.016 1.000 1.013 0.746 0.523 0.072 0.730 Rmse 0.230 0.026 0.176 0.029 0.171 0.268 0.141 0.240 0.029 0.172 0.029 0.213 0.346 0.193 0.236 0.030 0.175 0.024 0.253 0.433 0.237 Initial observations are endogenous 25 4.922 0.220 4.945 0.211 4.932 0.213 4.948 0.207 4.944 0.246 4.950 0.244 1.004 0.024 1.009 0.025 1.002 0.022 1.007 0.023 1.002 0.023 1.006 0.023 0.976 0.166 0.986 0.165 0.983 0.178 0.998 0.178 0.972 0.173 0.992 0.171 0.262 0.027 0.258 0.025 0.259 0.024 0.256 0.023 0.259 0.023 0.256 0.022 0.239 0.125 0.223 0.128 0.481 0.101 0.468 0.104 0.731 0.070 0.720 0.074 0.464 0.157 0.546 0.182 0.480 0.152 0.562 0.183 0.473 0.157 0.553 0.185 0.495 0.038 0.570 0.082 0.493 0.038 0.567 0.079 0.495 0.039 0.570 0.083 50 4.774 0.349 4.910 0.275 4.804 0.324 4.922 0.266 4.830 0.339 4.925 0.296 1.006 0.024 1.009 0.025 1.006 0.024 1.009 0.024 1.006 0.023 1.008 0.024 0.937 0.184 0.982 0.177 0.933 0.180 0.983 0.170 0.944 0.183 0.999 0.177 0.523 0.033 0.509 0.025 0.520 0.030 0.507 0.023 0.518 0.028 0.506 0.022 0.237 0.122 0.227 0.125 0.479 0.102 0.471 0.103 0.728 0.071 0.721 0.072 0.433 0.162 0.540 0.181 0.431 0.159 0.535 0.177 0.448 0.168 0.550 0.197 0.497 0.038 0.569 0.081 0.498 0.038 0.570 0.082 0.499 0.040 0.571 0.084 75 4.147 0.936 4.869 0.455 4.183 0.900 4.884 0.435 4.254 0.858 4.900 0.457 1.011 0.025 1.010 0.025 1.012 0.025 1.009 0.024 1.011 0.025 1.007 0.023 0.765 0.297 0.976 0.209 0.770 0.289 0.985 0.199 0.806 0.272 1.018 0.209 0.793 0.047 0.756 0.022 0.792 0.046 0.756 0.021 0.788 0.042 0.754 0.020 0.229 0.123 0.223 0.126 0.467 0.107 0.466 0.107 0.719 0.073 0.718 0.072 0.409 0.194 0.526 0.212 0.414 0.185 0.540 0.205 0.419 0.189 0.534 0.211 0.510 0.040 0.573 0.085 0.506 0.040 0.568 0.080 0.509 0.041 0.572 0.085 Note: Under each λ value, first two columns correspond to estimates treating y0 as exogenous, whereas the last two columns correspond to estimates treating y0 as endogenous Under each ρ value, the seven rows correspond to, respectively, β0 (= 5), β1 (= 1), γ(= 1), ρ, λ, σµ (= 5) and σv (= 5) 48 
Table Monte Carlo Mean and RMSE for the QMLEs. Fixed Effects with Normal Errors, n = 50, T =

                    λ = .25            λ = .50            λ = .75
ρ                   Mean     Rmse      Mean     Rmse      Mean     Rmse

The fixed effects are randomly generated
.25   β1            0.980    0.031     0.982    0.031     0.983    0.030
      ρ             0.236    0.028     0.238    0.027     0.239    0.025
      λ             0.248    0.124     0.484    0.104     0.733    0.072
      σv            0.489    0.038     0.489    0.037     0.491    0.039
.50   β1            0.975    0.036     0.977    0.035     0.977    0.033
      ρ             0.481    0.033     0.484    0.030     0.485    0.029
      λ             0.246    0.123     0.488    0.105     0.729    0.075
      σv            0.490    0.037     0.491    0.038     0.494    0.037
.75   β1            0.964    0.044     0.967    0.043     0.968    0.041
      ρ             0.719    0.042     0.722    0.039     0.723    0.039
      λ             0.239    0.125     0.490    0.102     0.725    0.074
      σv            0.488    0.038     0.489    0.040     0.493    0.037

The fixed effects are average (over T) of the X values
.25   β1            0.976    0.037     0.976    0.037     0.979    0.035
      ρ             0.231    0.034     0.230    0.035     0.233    0.034
      λ             0.245    0.124     0.485    0.101     0.727    0.075
      σv            0.490    0.036     0.488    0.039     0.492    0.036
.50   β1            0.965    0.046     0.966    0.045     0.969    0.042
      ρ             0.469    0.044     0.470    0.045     0.471    0.042
      λ             0.245    0.121     0.485    0.104     0.729    0.073
      σv            0.488    0.040     0.488    0.039     0.491    0.038
.75   β1            0.949    0.060     0.949    0.060     0.953    0.055
      ρ             0.700    0.061     0.699    0.062     0.700    0.060
      λ             0.243    0.123     0.474    0.113     0.731    0.072
      σv            0.487    0.038     0.489    0.039     0.490    0.039

Note: β0 = 5, β1 = 1, γ = 1, σµ = .5 and σv = .5.

References

...

Baltagi, B. H., Song, S. H., and Koh, W. (2003). Testing panel data regression models with spatial error correlation. Journal of Econometrics 117, 123-150.

Bhargava, A., and Sargan, J. D. (1983). Estimating dynamic random effects models from panel data ...

Likelihood inference for dynamic panel models. In Essays in Panel Data Econometrics, M. Nerlove (Ed.), pp. 307-348. Cambridge University Press.

Ord, J. K. (1975). Estimation methods for models of spatial interaction,