Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Advances in Econometrics, Volume 17. Thomas B. Fomby and R. Carter Hill (eds.). Emerald Group Publishing Limited (2003).


CONTENTS

LIST OF CONTRIBUTORS

INTRODUCTION
Thomas B. Fomby and R. Carter Hill

A COMPARATIVE STUDY OF PURE AND PRETEST ESTIMATORS FOR A POSSIBLY MISSPECIFIED TWO-WAY ERROR COMPONENT MODEL
Badi H. Baltagi, Georges Bresson and Alain Pirotte

TESTS OF COMMON DETERMINISTIC TREND SLOPES APPLIED TO QUARTERLY GLOBAL TEMPERATURE DATA
Thomas B. Fomby and Timothy J. Vogelsang

THE SANDWICH ESTIMATE OF VARIANCE
James W. Hardin

TEST STATISTICS AND CRITICAL VALUES IN SELECTIVITY MODELS
R. Carter Hill, Lee C. Adkins and Keith A. Bender

ESTIMATION, INFERENCE, AND SPECIFICATION TESTING FOR POSSIBLY MISSPECIFIED QUANTILE REGRESSION
Tae-Hwan Kim and Halbert White

QUASI-MAXIMUM LIKELIHOOD ESTIMATION WITH BOUNDED SYMMETRIC ERRORS
Douglas Miller, James Eales and Paul Preckel

CONSISTENT QUASI-MAXIMUM LIKELIHOOD ESTIMATION WITH LIMITED INFORMATION
Douglas Miller and Sang-Hak Lee

AN EXAMINATION OF THE SIGN AND VOLATILITY SWITCHING ARCH MODELS UNDER ALTERNATIVE DISTRIBUTIONAL ASSUMPTIONS
Mohamed F. Omran and Florin Avram

ESTIMATING A LINEAR EXPONENTIAL DENSITY WHEN THE WEIGHTING MATRIX AND MEAN PARAMETER VECTOR ARE FUNCTIONALLY RELATED
Chor-yiu Sin

TESTING IN GMM MODELS WITHOUT TRUNCATION
Timothy J. Vogelsang

BAYESIAN ANALYSIS OF MISSPECIFIED MODELS WITH FIXED EFFECTS
Tiemen Woutersen

LIST OF CONTRIBUTORS

Lee C. Adkins, Oklahoma State University, Stillwater, USA
Florin Avram, Université de Pau, France
Badi H. Baltagi, Texas A&M University, College Station, USA
Keith A. Bender, University of Wisconsin-Milwaukee, Milwaukee, USA
Georges Bresson, Université Paris II, Paris, France
James Eales, Purdue University, West Lafayette, USA
Thomas B. Fomby, Southern Methodist University, Dallas, USA
James W. Hardin, University of South Carolina, Columbia, USA
R. Carter Hill, Louisiana State University, Baton Rouge, USA
Tae-Hwan Kim, University of Nottingham, Nottingham, UK
Sang-Hak Lee, Purdue University, West Lafayette, USA
Douglas Miller, Purdue University, West Lafayette, USA
Mohamed F. Omran, University of Sharjah, UAE
Alain Pirotte, Université de Valenciennes and Université Paris II, Paris, France
Paul Preckel, Purdue University, West Lafayette, USA
Chor-yiu Sin, Hong Kong Baptist University, Hong Kong
Timothy J. Vogelsang, Cornell University, Ithaca, USA
Halbert White, University of California, San Diego, USA
Tiemen Woutersen, University of Western Ontario, Ontario, Canada

INTRODUCTION

It is our pleasure to bring you a volume of papers which follow in the tradition of the seminal work of Halbert White, especially his work in Econometrica (1980, 1982) and his Econometric Society Monograph No. 22, Estimation, Inference and Specification Analysis (1994). Approximately 20 years have passed since White’s initial work on heteroskedasticity-consistent covariance matrix estimation and on maximum likelihood estimation in the presence of misspecified models, so-called quasi-maximum likelihood estimation (QMLE). Over this time, much has been written on these and related topics, many contributions being by Hal himself. For example, following Hal’s pure heteroskedasticity robust estimation work, Newey and West (1987) extended robust estimation to autocorrelated data. Extensions and refinements of these themes continue today. There is no econometric package that we know of today that does not have some provision for robust standard errors in most of the estimation methods offered. All of these innovations can be credited to the germinating ideas produced by Hal in his econometric research. Thus, we offer this volume in recognition of the pioneering work that he has done in the past and that has proved to be so wonderfully useful in empirical research. We look forward to seeing Hal’s work continuing well into the future, yielding, we are sure, many more useful econometric techniques that will be robust to misspecifications of the sort we often face in empirical problems in economics and elsewhere. Now let us turn to a brief review of the contents of this
volume.

In the spirit of White (1982), Baltagi, Bresson and Pirotte, in their paper entitled “A Comparative Study of Pure and Pretest Estimators for a Possibly Misspecified Two-way Error Component Model”, examine the consequences of model misspecification using a panel data regression model. Maximum likelihood, random and fixed effects estimators are compared using Monte Carlo experiments under normality of the disturbances but with a possibly misspecified variance-covariance matrix. In the presence of perfect foresight on the form of the variance-covariance matrix, GLS (maximum likelihood) is always the best in MSE terms. However, in the absence of perfect foresight (the more typical case), the authors show that a pre-test estimator is a viable alternative, given that its performance is a close second to correct GLS whether the true specification is a two-way error component model, a one-way error component model or a pooled regression model. The authors further show that incorrect GLS, maximum likelihood, or fixed effects estimators may lead to a big loss in mean square error.

In their paper “Tests of Common Deterministic Trend Slopes Applied to Quarterly Global Temperature Data” Fomby and Vogelsang apply the multivariate deterministic trend-testing framework of Franses and Vogelsang (2002) to compare global warming trends both within and across the hemispheres of the globe. They find that globally and within hemispheres the seasons appear not to be warming equally fast. In particular, winters appear to be warming faster than summers. Across hemispheres, it appears that the winters in the northern and southern hemispheres are warming equally fast, whereas the remaining seasons appear to have unequal warming rates.

In his paper “The Sandwich Estimate of Variance” Hardin examines the history, development, and application of the sandwich estimate of variance. In describing this estimator he pays attention to applications that have appeared in the literature and examines the nature of the problems for which
this estimator is used. He also describes various adjustments to the estimate for use with small samples and illustrates the estimator’s construction for a variety of models.

In their paper “Test Statistics and Critical Values in Selectivity Models” Hill, Adkins, and Bender examine the finite sample properties of alternative covariance matrix estimators of the Heckman (1979) two-step estimator (Heckit) for the selectivity model so widely used in economics and other social sciences. The authors find that how well the alternative versions of the asymptotic variance-covariance matrix used in selectivity models capture the finite sample variability of the Heckit two-step estimator depends on the degree of censoring and on whether the explanatory variables in the selection and regression equations differ. With severe censoring and identical explanatory variables in the two equations, none of the asymptotic standard error formulations is reliable in small samples. In larger samples the bootstrap does a good job of reflecting estimator variability, as does a version of the White heteroskedasticity-consistent estimator. With respect to finite sample inference, the bootstrap standard errors seem to match the nominal standard errors computed from asymptotic covariance matrices unless censoring is severe and there is not much difference in the explanatory variables in the selection and regression equations. Most importantly, the critical values of the pivotal bootstrap t-statistics lead to better test size than those based on the usual asymptotic theory.

To date the literature on quantile regression and least absolute deviation regression has assumed, either explicitly or implicitly, that the conditional quantile regression model is correctly specified. In their paper “Estimation, Inference, and Specification Testing for Possibly Misspecified Quantile Regression” Kim and White allow for possible misspecification of a linear conditional quantile
regression model. They obtain consistency of the quantile estimator for certain “pseudo-true” parameter values and asymptotic normality of the quantile estimator when the model is misspecified. In this case, the asymptotic covariance matrix has a novel form, not seen in earlier work, and they provide a consistent estimator of the asymptotic covariance matrix. They also propose a quick and simple test for conditional quantile misspecification based on the quantile residuals.

Miller, Eales, and Preckel propose in their paper “Quasi-Maximum Likelihood Estimation with Bounded Symmetric Errors” a QML estimator for the location parameters of a linear regression model with bounded and symmetrically distributed errors. The error outcomes are restated as a convex combination of the bounds, and the authors use the method of maximum entropy to derive the quasi-log likelihood function. Under the stated model assumptions, they show that the proposed estimator is unbiased, consistent, and asymptotically normal. Miller, Eales, and Preckel then conduct a series of Monte Carlo exercises designed to illustrate the sampling properties of the QMLE relative to the least squares estimator. Although the least squares estimator has smaller quadratic risk under normal and skewed error processes, the proposed QML estimator dominates least squares for the bounded and symmetric error distribution considered in their paper.

In their paper “Consistent Quasi-Maximum Likelihood Estimation with Limited Information” Miller and Lee use the minimum cross-entropy method to derive an approximate joint probability model for a multivariate economic process based on limited information about the marginal quasi-density functions and the joint moment conditions. The modeling approach is related to joint probability models derived from copula functions. They note, however, that the entropy approach has some practical advantages over copula-based models. Under suitable regularity conditions, the authors show that the quasi-maximum
likelihood estimator (QMLE) of the model parameters is consistent and asymptotically normal. They demonstrate the procedure with an application to the joint probability model of trading volume and price variability for the Chicago Board of Trade soybean futures contract.

There is growing evidence in the financial economics literature that the response of current volatility in financial data to past shocks is asymmetric, with negative shocks having more impact on current volatility than positive shocks. In their paper “An Examination of the Sign and Volatility Switching ARCH Models Under Alternative Distributional Assumptions” Omran and Avram investigate the asymmetric ARCH models of Glosten et al. (1993) and Fornari and Mele (1997) and the sensitivity of these models to the assumption of normality in the innovations. Omran and Avram hedge against the possibility of misspecification by basing the inferences on the robust variance-covariance matrix suggested by White (1982). Their results suggest that using more flexible distributional assumptions on financial data can have a significant impact on the inferences drawn from asymmetric ARCH models.

Gourieroux et al. (1984) investigate the consistency of the parameters in the conditional mean, ignoring or misspecifying other features of the true conditional density. They show that it suffices to have a QMLE of a density from the linear exponential family (LEF). Conversely, a necessary condition for a QMLE to be consistent for the parameters in the conditional mean is that the likelihood function belongs to the LEF. As a natural extension, White (1994) shows in a chapter of his book that the Gourieroux et al. (1984) results carry over to dynamic models with possibly serially correlated and/or heteroskedastic errors. In his paper “Estimating a Linear Exponential Density when the Weighting Matrix and Mean Parameter Vector are Functionally Related” Sin shows that the above results do not hold when the weighting matrix of the density and
the mean parameter vector are functionally related. A prominent example is an autoregressive moving-average (ARMA) model with generalized autoregressive conditional heteroscedasticity (GARCH) errors. However, correct specification of the conditional variance adds conditional moment conditions for estimating the parameters of the conditional mean. Based on the recent literature on efficient instrumental variables estimators (IVE) and the generalized method of moments (GMM), the author proposes an estimator that is based on the QMLE of a density from the quadratic exponential family (QEF). The asymptotic variance of this modified QMLE attains the lower bound for minimax risk. In this modeling approach the GARCH-M is also allowed.

In his paper “Testing in GMM Models Without Truncation” Vogelsang proposes a new approach to testing in the generalized method of moments (GMM) framework. The new tests are constructed using heteroskedasticity and autocorrelation (HAC) robust standard errors computed using nonparametric spectral density estimators without truncation. While such standard errors are not consistent, a new asymptotic theory shows that they lead to valid tests nonetheless. In an over-identified linear instrumental variables model, simulations suggest that the new tests and the associated limiting distribution theory provide a more accurate first order asymptotic null approximation than both standard nonparametric HAC robust tests and VAR-based parametric HAC robust tests. Finite sample power of the new tests is shown to be comparable to that of standard tests.

In applied work, economists analyze individuals or firms that differ in observed and unobserved ways. These unobserved differences are usually referred to as heterogeneity, and one can control for the heterogeneity in panel data by allowing for time-invariant, individual-specific parameters. This fixed effect approach introduces many parameters into the model, which causes the “incidental parameter problem”: the maximum likelihood estimator
is in general inconsistent. Woutersen (2001) shows how to approximately separate the parameters of interest from the fixed effects using a reparameterization. He then shows how a Bayesian method gives a general solution to the incidental parameter problem for correctly specified models. In his paper in this volume, “Bayesian Analysis of Misspecified Models with Fixed Effects”, Woutersen extends his 2001 work to misspecified models. Following White (1982), he assumes that the expectation of the score of the integrated likelihood is zero at the true values of the parameters. He then derives the conditions under which a Bayesian estimator converges at the rate of $\sqrt{N}$, where $N$ is the number of individuals. Under these conditions, Woutersen shows that the variance-covariance matrix of the Bayesian estimator has the form of White (1982). He goes on to illustrate the approach by analyzing the dynamic linear model with fixed effects and a duration model with fixed effects.

Thomas B. Fomby and R. Carter Hill
Co-editors

REFERENCES

Fornari, F., & Mele, A. (1997). Sign- and volatility-switching ARCH models: Theory and applications to international stock markets. Journal of Applied Econometrics, 12, 49–65.
Franses, P. H., & Vogelsang, T. J. (2002). Testing for common deterministic trend slopes. Center for Analytic Economics Working Paper 01–15, Cornell University.
Glosten, L. R., Jagannathan, R., & Runkle, D. (1993). On the relationship between expected value and the volatility of nominal excess returns on stocks. Journal of Finance, 48, 1779–1801.
Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo-maximum likelihood methods: Theory. Econometrica, 52, 681–700.
Heckman, J. J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153–161.
Newey, W., & West, K. (1987). A simple positive semi-definite heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703–708.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity. Econometrica, 48, 817–838.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
White, H. (1994). Estimation, inference and specification analysis. Econometric Society Monograph No. 22. Cambridge, UK: Cambridge University Press.
Woutersen, T. M. (2001). Robustness against incidental parameters and mixing distributions. Working Paper, Department of Economics, University of Western Ontario.

A COMPARATIVE STUDY OF PURE AND PRETEST ESTIMATORS FOR A POSSIBLY MISSPECIFIED TWO-WAY ERROR COMPONENT MODEL

Badi H. Baltagi, Georges Bresson and Alain Pirotte

ABSTRACT

In the spirit of White’s (1982) paper, this paper examines the consequences of model misspecification using a panel data regression model. Maximum likelihood, random and fixed effects estimators are compared using Monte Carlo experiments under normality of the disturbances but with a possibly misspecified variance-covariance matrix. We show that the correct GLS (ML) procedure is always the best according to MSE performance, but the researcher does not have perfect foresight on the true form of the variance-covariance matrix. In this case, we show that a pretest estimator is a viable alternative, given that its performance is a close second to correct GLS (ML) whether the true specification is a two-way error component model, a one-way error component model or a pooled regression model. Incorrect GLS, ML or fixed effects estimators may lead to a big loss in MSE.

A fundamental assumption underlying classical results on the properties of the maximum likelihood estimator is that the stochastic law which determines the behavior of the phenomena investigated (the “true” structure) is known to lie within a specified parametric family of probability distributions (the model). In other words, the probability model is

Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Advances in Econometrics, Volume 17, 1–27. Copyright © 2003 by Elsevier Ltd. All rights of reproduction in any form
reserved. ISSN: 0731-9053/doi:10.1016/S0731-9053(03)17001-6

BAYESIAN ANALYSIS OF MISSPECIFIED MODELS WITH FIXED EFFECTS

Tiemen Woutersen

ABSTRACT

One way to control for the heterogeneity in panel data is to allow for time-invariant, individual specific parameters. This fixed effect approach introduces many parameters into the model, which causes the “incidental parameter problem”: the maximum likelihood estimator is in general inconsistent. Woutersen (2001) shows how to approximately separate the parameters of interest from the fixed effects using a reparametrization. He then shows how a Bayesian method gives a general solution to the incidental parameter problem for correctly specified models. This paper extends Woutersen (2001) to misspecified models. Following White (1982), we assume that the expectation of the score of the integrated likelihood is zero at the true values of the parameters. We then derive the conditions under which a Bayesian estimator converges at rate $\sqrt{N}$, where $N$ is the number of individuals. Under these conditions, we show that the variance-covariance matrix of the Bayesian estimator has the form of White (1982). We illustrate our approach by the dynamic linear model with fixed effects and a duration model with fixed effects.

Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Advances in Econometrics, Volume 17, 235–249. © 2003 Published by Elsevier Ltd. ISSN: 0731-9053/doi:10.1016/S0731-9053(03)17011-9

1. INTRODUCTION

In applied work, economists rarely have data that can be viewed as being generated by a homogeneous group. That is, firms or individuals differ in observed and unobserved ways. These unobserved differences are usually referred to as heterogeneity, and one can control for the heterogeneity in panel data by allowing for time-invariant, individual specific parameters. Accounting for heterogeneity using such individual or fixed effects avoids distributional and independence assumptions (which are usually not
supported by economic theory), see Chamberlain (1984, 1985), Heckman et al. (1998) and Arellano and Honoré (2001). This fixed effect approach introduces many parameters into the model, which causes the “incidental parameter problem” of Neyman and Scott (1948): the maximum likelihood estimator is in general inconsistent. Chamberlain (1984), Trognon (2000) and Arellano and Honoré (2001) review panel data techniques that give good estimators for specific models. Woutersen (2001) derives a general solution that approximately separates the parameters of interest from the fixed effects using a reparametrization. After the reparametrization, the fixed effects are integrated out with respect to a flat prior. This yields a Bayesian estimator for the parameter of interest, $\hat\beta$, that has a low bias, $O(T^{-2})$, where $T$ is the number of observations per individual. Moreover, the asymptotic distribution of $\hat\beta$ has the following form,

$$\sqrt{NT}\,(\hat\beta - \beta_0) \xrightarrow{d} N(0, I(\beta)^{-1})$$

where $I(\beta)$ is the information matrix and $T \propto N^{\alpha}$ where $\alpha > 1/3$. Thus, the asymptotic variance of $\hat\beta$ is the same as the asymptotic variance of the infeasible maximum likelihood estimator that uses the true values of the fixed effects.

This paper extends the analysis of Woutersen (2001) by allowing for misspecification of the likelihood. Following White (1982), we assume that the expectation of the score is zero at the true values of the parameters. We then derive the primitive conditions under which the Bayesian estimator converges at rate $\sqrt{N}$. In particular, we assume the “score” of the integrated likelihood to be zero in expectation at the true value of the parameter of interest. Under these conditions, we show that the variance-covariance matrix of the Bayesian estimator has the form of White (1982). Lancaster (2000, 2002) does not derive asymptotic variances, and another new feature of this paper is that it derives the asymptotic variance of the integrated likelihood estimator in a fixed $T$, increasing $N$ asymptotics. We illustrate our approach by the dynamic
linear model with fixed effects and a duration model with fixed effects.

This paper is organized as follows. Section 2 reviews information-orthogonality as a way to separate the nuisance parameters from the parameter of interest. Section 3 discusses the integrated likelihood approach. Section 4 gives the conditions for consistency and derives the variance-covariance matrix under misspecification. Section 5 discusses the dynamic linear model and a duration model, and Section 6 concludes.

2. INFORMATION-ORTHOGONALITY

The presence of individual parameters in the likelihood can inhibit consistent estimation of the parameters of interest, as shown by Neyman and Scott (1948). For example, the dynamic linear model with fixed effects cannot be consistently estimated by maximum likelihood, as shown by Nickell (1981). Information-orthogonality reduces the dependence between the parameters of interest and the individual parameters. We introduce more notation so that we can be specific. Suppose we observe $N$ individuals for $T$ periods. Let the log likelihood contribution of the $t$th spell of individual $i$ be denoted by $L^{it}$. Summing over the contributions of individual $i$ yields the log likelihood contribution,

$$L^{i}(\beta, \lambda_i) = \sum_t L^{it}(\beta, \lambda_i),$$

where $\beta$ is the common parameter and $\lambda_i$ is the individual specific effect. Suppose that the parameter $\beta$ is of interest and that the fixed effect $\lambda_i$ is a nuisance parameter that controls for heterogeneity. We can approximately separate $\beta$ from $\lambda = \{\lambda_1, \ldots, \lambda_N\}$ by using an information-orthogonal parametrization of the quasi likelihood. In particular, information-orthogonality reduces the dependence between $\beta$ and $\lambda$ by having the cross derivatives of the quasi log-likelihood be zero in expectation. That is,

$$E L_{\beta\lambda}(\beta_0, \lambda_0) = 0, \quad \text{i.e.} \quad \int_{y_{\min}}^{y_{\max}} L_{\beta\lambda}(\beta_0, \lambda_0)\, e^{L(\beta_0, \lambda_0)}\, dy = 0,$$

where $y$ denotes the dependent variable, $y \in [y_{\min}, y_{\max}]$, and $\{\beta_0, \lambda_0\}$ denote the true values of the parameters. Cox and Reid (1987) and Jeffreys
(1961) use this concept and refer to it as “orthogonality.” Lancaster (2000, 2002) applies this orthogonality idea to panel data and Woutersen (2000) gives an overview of orthogonality concepts. Chamberlain (1984) and Arellano and Honoré (2001) review panel data econometrics in their handbook chapters. All but two of their models can be written in information-orthogonal form.

Suppose that a quasi-likelihood is not information-orthogonal. In that case we reparameterize the quasi-likelihood to make it information-orthogonal. Let the individual nuisance parameter that is not information-orthogonal be denoted by $f$. We can interpret $f$ as a function of $\beta$ and the information-orthogonal $\lambda$, $f(\beta, \lambda)$, and write the log likelihood as $L(\beta, f(\beta, \lambda))$. Differentiating $L(\beta, f(\beta, \lambda))$ with respect to $\beta$ and $\lambda$ yields

$$\frac{\partial L(\beta, f(\beta, \lambda))}{\partial \beta} = L_{\beta} + L_{f}\frac{\partial f}{\partial \beta}$$

$$\frac{\partial^2 L(\beta, f(\beta, \lambda))}{\partial \lambda\, \partial \beta} = L_{f\beta}\frac{\partial f}{\partial \lambda} + L_{ff}\frac{\partial f}{\partial \lambda}\frac{\partial f}{\partial \beta} + L_{f}\frac{\partial^2 f}{\partial \lambda\, \partial \beta}$$

where $L_f$ is a score and therefore $E L_f = 0$. Information-orthogonality requires the cross-derivative $\partial^2 L(\beta, f(\beta, \lambda))/\partial\lambda\,\partial\beta$ to be zero in expectation, i.e.

$$E L_{\beta\lambda} = E L_{f\beta}\frac{\partial f}{\partial \lambda} + E L_{ff}\frac{\partial f}{\partial \lambda}\frac{\partial f}{\partial \beta} = 0.$$

This implies the following differential equation:

$$E L_{f\beta} + E L_{ff}\frac{\partial f}{\partial \beta} = 0. \qquad (1)$$

If Eq. (1) has an analytical solution, then $f(\cdot)$ can be written as a function of $\{\beta, \lambda\}$. If Eq. (1) has an implicit solution, then the Jacobian $\partial\lambda/\partial f$ can be recovered from the implicit solution. The Jacobian $\partial\lambda/\partial f$ is all we need for a reparametrization in a Bayesian framework. The general nonlinear model and the single index model have an information-orthogonal parametrization that is implicit, as shown in Woutersen (2001). For the remainder of the paper, we assume information-orthogonality.

The “invariance result” of the maximum likelihood estimator implies that reparametrizations do not change the estimates. In particular, an information-orthogonal parametrization would yield the same estimates for $\beta$ as a parametrization that is not information-orthogonal. However, the integrating out method does not have
this invariance property, and this paper shows that information-orthogonality can yield moment functions that are robust against incidental parameters, even under misspecification.

3. THE INTEGRATED LIKELIHOOD

After ensuring information-orthogonality, we integrate out the fixed effects and use the mode of the integrated likelihood as an estimator. That is,

$$\hat\beta = \arg\max_{\beta} L^{I}(\beta) \quad \text{where} \quad L^{I}(\beta) = \sum_i \ln \int e^{L^{i}(\beta, \lambda)}\, d\lambda_i.$$

Misspecification has so far not been considered in combination with the integrated likelihood approach, as is apparent from the overviews of Gelman et al. (1995) and Berger et al. (1999). The point of this paper, however, is to consider misspecification. In particular, $L^{i}(\beta, \lambda)$ does not need to be a fully specified likelihood. It is sufficient that we specify, as an approximation, a density for $y_{it}$ that is conditional on $x_{it}$ and $\lambda_i$. The likelihood contribution $L^{it}(\beta, \lambda)$ is the logarithm of this conditional density and $L^{i}(\beta, \lambda) = \sum_t L^{it}(\beta, \lambda)$. In particular, the distribution of the fixed effects is left unrestricted.

Thus, in this set-up we can think of the data generating process as follows. First, the fixed effects, $f_1, \ldots, f_N$, are generated from an unknown and unrestricted distribution. As a second step, $x_{11}, \ldots, x_{N1}$ is generated from another unknown distribution that can depend on $f_1, \ldots, f_N$. Then $y_{11}, \ldots, y_{N1}$ is generated by a conditional distribution that is approximated by the econometrician. For period $t = 2$, the distribution of $x_{12}, \ldots, x_{N2}$ can depend on $f_1, \ldots, f_N, x_1, \ldots, x_N$. Alternatively, $x_{it}$ can be allowed to be endogenous, in which case the econometrician specifies a density for $y_{it}$ that is conditional on $x_{i,t-1}$ and $f_i$.

4. ASSUMPTIONS AND THEOREM

In this section, we consider estimation while allowing for misspecification of the model. The clearest approach seems to be to impose the assumptions directly on the integrated likelihood function. White (1982, 1993) assumes that the expectation of the
score is zero at the true value of the parameter. Similarly, we assume that the score of the integrated likelihood has expectation zero at the truth.

Assumption 1. (i) Let $\{x_i, y_i\}$ be i.i.d. and (ii) let $E L^{i,I}_{\beta} = 0$ for every $i$.

This assumption implies, by independence across individuals, that $E L^{i,I}_{\beta} L^{j,I}_{\beta} = 0$ for $i \neq j$. Note that the regressor $x_i = \{x_{i1}, x_{i2}, \ldots, x_{iT}\}$ and dependent variable $y_i = \{y_{i1}, y_{i2}, \ldots, y_{iT}\}$ are not required to be stationary and that $x_i$ is not required to be exogenous.

Assumption 2. (i) $\beta \in \Theta$ where $\Theta$ is compact, or (ii) $L^{I}_{\beta}(\beta)' L^{I}_{\beta}(\beta)$ is concave in $\beta$.

This is a regularity condition that is often assumed.

Assumption 3. (i) $E L^{I}_{\beta}(\beta) = 0$ is uniquely solved for $\beta = \beta_0$; (ii) $L^{I}_{\beta}(\beta)$ is continuous at each $\beta \in \Theta$ with probability one; and (iii) $E \sup_{\beta \in \Theta} \|L^{I}_{\beta}(\beta)\| < \infty$.

Information-orthogonality, $E L_{\beta\lambda}(\beta_0, \lambda_0) = 0$, does not imply $E L^{I}_{\beta}(\beta) = 0$, but the stronger condition $L_{\beta\lambda}(\beta, \lambda) = 0$ does. However, imposing this stronger condition excludes many interesting models. Thus, it could be that $E L_{\beta\lambda}(\beta_0, \lambda_0) = 0$ is not a necessary condition for $E L^{I}_{\beta}(\beta) = 0$, but we do not know examples for which $E L^{I}_{\beta}(\beta) = 0$ and $E L_{\beta\lambda}(\beta_0, \lambda_0) \neq 0$. We therefore recommend first reparameterizing the model so that $E L_{\beta\lambda}(\beta_0, \lambda_0) = 0$ and, as a second step, checking Assumptions 1–3.

Assumption 4. (i) $\beta_0 \in \text{interior}(\Theta)$; (ii) $L^{I}_{\beta}(\beta)$ is continuously differentiable in a neighborhood $\mathcal{N}$ of $\beta_0$; (iii) $E L^{I}_{\beta\beta}(\beta)$ is continuous at $\beta_0$ and $\sup_{\beta \in \mathcal{N}} \|E L^{I}_{\beta\beta}(\beta) - E L^{I}_{\beta\beta}(\beta)\| \xrightarrow{p} 0$; and (iv) $E L^{I}_{\beta\beta}(\beta_0)$ is nonsingular.

Theorem 1. Suppose $\hat\beta = \arg\min_{\beta} \{(L^{I}_{\beta}(\beta)/NT)'(L^{I}_{\beta}(\beta)/NT)\}$. Let Assumptions 1–4 hold. Let $N \to \infty$ while $T$ is fixed. Then

$$\sqrt{NT}\,(\hat\beta - \beta_0) \to N(0, \Psi)$$

where

$$\Psi = \left[\frac{1}{NT} E L^{I}_{\beta\beta}(\beta_0)\right]^{-1} \left[\frac{1}{NT} E\{(L^{I}_{\beta}(\beta_0))(L^{I}_{\beta}(\beta_0))'\}\right] \left[\frac{1}{NT} E L^{I}_{\beta\beta}(\beta_0)\right]^{-1}.$$

Proof: See Appendix A.

The theorem shows that the integrated likelihood is a convenient tool to derive moments that are robust against incidental parameters as well as robust against misspecification of the parametric error term.

5. EXAMPLES

In this section we discuss two examples that illustrate the integrated likelihood approach.

5.1 Dynamic Linear Model

Consider the dynamic linear model with fixed effects,

$$y_{it} = y_{i,t-1}\beta + f_i + \varepsilon_{it}$$

where $E\varepsilon_{it} = 0$, $E\varepsilon_{it}^2 < \infty$ and $E\varepsilon_{is}\varepsilon_{it} = 0$ for $s \neq t$ and $t = 1, \ldots, T$. This model is perhaps the simplest model that nests both state dependence and heterogeneity as alternative explanations for the variation in the values of $y_{it}$ across agents. As such, the dynamic linear model is popular in the development and growth literature. For a discussion and further motivation of this model, see Kiviet (1995), Hahn, Hausman and Kuersteiner (2001), Arellano and Honoré (2001) as well as the references therein. Lancaster (2002) suggests the following information-orthogonal parametrization,

$$f_i = y_{i0}(1 - \beta) + \lambda_i e^{-b(\beta)} \quad \text{where} \quad b(\beta) = \frac{1}{T}\sum_{t=1}^{T} \frac{T - t}{t}\beta^{t}.$$

However, Lancaster (2002) does not derive the asymptotic variance of the integrated likelihood estimator. Woutersen (2001) shows that, under normality of $\varepsilon_{it}$, the integrated likelihood estimator is adaptive for an asymptotic with $T \propto N^{\alpha}$ and $\alpha > 1/3$. That is, the asymptotic variance does not depend on knowledge of $\lambda$ in this asymptotic. We now consider the case where the normality of $\varepsilon_{it}$ fails to hold and only assume normality in order to derive the integrated likelihood estimator. Note that

$$E L^{i}_{\beta} = \frac{1}{\sigma^2} E \sum_t \varepsilon_t \{y_{t-1} - y_0 - b'(\beta)\lambda e^{-b(\beta)}\} = 0$$

$$E L^{i}_{\beta\lambda} = -\frac{b'(\beta) e^{-b(\beta)}}{\sigma^2} E \sum_t \varepsilon_t = 0.$$

The log likelihood contribution has the following form,

$$L^{i} = -\frac{1}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_t (\tilde y_t - \tilde y_{t-1}\beta - \lambda e^{-b(\beta)})^2$$

where $\tilde y_t = y_t - y_0$. Integrating with respect to $\lambda$ gives the integrated likelihood contribution,

$$e^{L^{i,I}} = \frac{1}{\sqrt{\sigma^2}} \int e^{-(1/2\sigma^2)\sum_t (\tilde y_t - \tilde y_{t-1}\beta - \lambda e^{-b(\beta)})^2}\, d\lambda$$

$$= \frac{1}{\sqrt{\sigma^2}}\, e^{b(\beta)} \int e^{-(1/2\sigma^2)\sum_t (y_t - y_{t-1}\beta - f)^2}\, df$$

$$= \frac{1}{\sqrt{\sigma^2}}\, e^{b(\beta) - (1/2\sigma^2)\sum_t (y_t - y_{t-1}\beta)^2} \int e^{-(T/2\sigma^2)\{f^2 - 2f\,\overline{(y_t - y_{t-1}\beta)}\}}\, df$$

$$\propto e^{b(\beta) - (1/2\sigma^2)\{\sum_t (y_t - y_{t-1}\beta)^2 - T\,\overline{(y_t - y_{t-1}\beta)}^{\,2}\}},$$

where we omit the subscript $i$, an overbar denotes the average over $t$, and $\partial\lambda/\partial f = e^{b(\beta)}$ does not depend on $f$. Taking logarithms and differentiating with respect to $\beta$ yields

$$L^{i,I}_{\beta} = b'(\beta) + \frac{1}{\sigma^2}\Big\{\sum_t (y_{it} - y_{i,t-1}\beta)\, y_{i,t-1} - T\,\overline{(y_{it} - y_{i,t-1}\beta)}\;\overline{y_{i,t-1}}\Big\}$$

$$L^{i,I}_{\beta\beta} = b''(\beta) - \frac{1}{\sigma^2}\sum_t y_{i,t-1}^2 + \frac{T}{\sigma^2}\,\overline{y_{i,t-1}}^{\,2},$$

where $b(\beta) = \frac{1}{T}\sum_{t=1}^{T}\frac{T - t}{t}\beta^{t}$, $b'(\beta) = \frac{1}{T}\sum_{t=1}^{T}(T - t)\beta^{t-1}$ and $b''(\beta) = \frac{1}{T}\sum_{t=1}^{T}(T - t)(t - 1)\beta^{t-2}$. Note that $E L^{i,I}_{\beta}/NT = 0$ for any $N$, $T$ and that the mode of $L^{I}(\beta)/NT$ is a consistent estimator for $\beta$ for $N$ increasing. Analogous to the quasi-maximum likelihood estimator of White (1982), the asymptotic variance has the form of Theorem 1,

$$\Psi = \left[\frac{1}{NT} E L^{I}_{\beta\beta}\right]^{-1} \left[\frac{1}{NT} E\{(L^{I}_{\beta})(L^{I}_{\beta})'\}\right] \left[\frac{1}{NT} E L^{I}_{\beta\beta}\right]^{-1}.$$

The author views the integrated likelihood as a convenient way to derive moments that can be robust against misspecification of the parametric error term. In particular, the parametric assumptions on the error term are irrelevant for the models with additive error terms that are discussed in Arellano and Honoré (2001).

5.2 Duration Model with Time-Varying Individual Effects

Consider a duration model in which the hazard depends on an individual effect $f_i$, a spell-specific effect $u_{is}$ and observable regressors $x_{is}$. In particular, consider the following hazard,

$$\theta_{is}(t) = e^{f_i + x_{is}\beta + u_{is}} \qquad (2)$$

where the subscript $i$ refers to an individual and the subscript $s$ refers to a spell of that individual. This hazard depends on two unobservable stochastic terms, $f_i$ and $u_{is}$. In particular, the individual specific effect $f_i$ can depend on the regressors $x_{is}$. We avoid distributional assumptions on the spell-specific effect $u_{is}$, but we assume that $u_{is}$ is independent of $x_{is}$ and $E e^{-u_{is}} < \infty$. Thus, the hazard of Eq. (2) is a generalization of the fixed effect hazard model with regressors, where the hazard is $e^{f_i + x_{is}\beta}$. Chamberlain (1984) developed an estimator for the last model and Van den Berg (2001) gives a current review of duration models. A common criticism of the model with hazard $e^{f_i + x_{is}\beta}$ is that it assumes that variations in the
hazard can all be explained by variations in the regressor $x_{is}$. In other words, the unobservable effect is constant over time; see Van den Berg (2001) for this argument. Equation (2) extends this model by allowing for a spell-specific effect $u_{is}$. As an approximation of the model of (2) we consider

$$\theta_{is} = e^{\lambda_i + x_{is}\beta} \quad \text{where} \quad \sum_s x_{is} = 0.$$

This hazard implies a log likelihood, and the normalization, $\sum_s x_{is} = 0$, ensures that the log likelihood is information-orthogonal. In particular,

$$L^i(\beta, \lambda_i) = T\lambda_i - e^{\lambda_i}\sum_s e^{x_{is}\beta} t_{is},$$

$$L^i_\beta(\beta, \lambda_i) = -e^{\lambda_i}\sum_s x_{is} e^{x_{is}\beta} t_{is},$$

and

$$L^i_{\beta\lambda_i}(\beta, \lambda_i) = -e^{\lambda_i}\sum_s x_{is} e^{x_{is}\beta} t_{is}.$$

Note that $e^{x_{is}\beta_0} t_{is}$ is exponentially distributed with mean $e^{-(\lambda_{0,i} + u_{is})}$. This implies

$$E L^i_\beta(\beta_0, \lambda_{0,i}) = -E\, e^{\lambda_{0,i}}\sum_s x_{is} e^{-(\lambda_{0,i} + u_{is})} = -E\sum_s x_{is} e^{-u_{is}} = 0$$

since $\sum_s x_{is} = 0$ and $u_{is}$ is independent of $x_{is}$. Similarly, $E L^i_{\beta\lambda_i}(\beta_0, \lambda_{0,i}) = 0$. Integrating the likelihood with respect to $\lambda_i$ gives

$$L^{i,I} = \ln\int e^{L^i}\, d\lambda_i = \ln\int e^{T\lambda_i}\, e^{-\sum_s e^{x_{is}\beta + \lambda_i} t_{is}}\, d\lambda_i = \ln\frac{\Gamma(T)}{\{\sum_s e^{x_{is}\beta} t_{is}\}^T};$$

see Appendix B for details. Thus,

$$\frac{L^{i,I}_\beta}{T} = -\frac{\sum_s x_{is} e^{x_{is}\beta} t_{is}}{\sum_s e^{x_{is}\beta} t_{is}}$$

and

$$\frac{L^I_\beta}{NT} = -\frac{1}{N}\sum_i \frac{\sum_s x_{is} e^{x_{is}\beta} t_{is}}{\sum_s e^{x_{is}\beta} t_{is}},$$

$$\frac{L^I_{\beta\beta}}{NT} = -\frac{1}{N}\sum_i \left[\frac{\sum_s x_{is}^2 e^{x_{is}\beta} t_{is}}{\sum_s e^{x_{is}\beta} t_{is}} - \frac{\{\sum_s x_{is} e^{x_{is}\beta} t_{is}\}^2}{\{\sum_s e^{x_{is}\beta} t_{is}\}^2}\right].$$

In Appendix C, it is shown that $(1/NT) E L^I_\beta = 0$ for any $N$ and any $T \geq 2$. Thus, the mode of $L^I(\beta)/NT$ is a consistent estimator for $\beta$ for $N$ increasing. Moreover, the asymptotic variance has the form of Theorem 1,

$$\Psi = \left[\frac{1}{NT} E L^I_{\beta\beta}\right]^{-1} \left[\frac{1}{NT} E\{(L^I_\beta)(L^I_\beta)'\}\right] \left[\frac{1}{NT} E L^I_{\beta\beta}\right]^{-1}.$$

5.2.1 Simulation

Let the data be generated by the following hazard model,

$$\theta_{is}(t) = e^{f_i + x_{is}\beta + u_{is}} \qquad (3)$$

This hazard implies that the expected duration, conditional on $f_i$, $x_{is}$, and $u_{is}$, equals$^4$ $1/e^{f_i + x_{is}\beta + u_{is}}$, i.e. $E(t_{is} \mid f_i, x_{is}, u_{is}) = 1/e^{f_i + x_{is}\beta + u_{is}}$. Let the exponent of the individual effect, $e^{f_i}$, have a unit exponential distribution and let the individual spell effect, $u_{is}$, be normally distributed with mean zero and variance $\sigma_u^2$. Suppose that we observe a group
of $N$ individuals and that we observe an unemployment spell before and after treatment, that is, $x_{i1}$ takes one value for all $i$ and $x_{i2}$ takes another value for all $i$. Heckman, Ichimura, Smith and Todd (1998) discuss the estimation of treatment effect models and conclude that the fixed effect model performs very well. This simulation study extends the fixed effect duration model by allowing for a spell-specific effect $u_{is}$, $i = 1, \ldots, N$ and $s = 1, 2$. In particular, the model of Eq. (3) also extends both Chamberlain (1985) and Ridder and Woutersen (2003) by allowing for both random and fixed effects. We first assume that the treatment has no effect on the hazard out of unemployment, that is, $\beta = 0$. We then assume that the hazard out of unemployment increases by a factor of 2.7, that is, $\beta = 1$ and $e^\beta = e \approx 2.7$. The estimator developed in this subsection is denoted by “integrated likelihood estimator.” A naive Bayes estimator that just integrates out the fixed effects and then uses the posterior mode is denoted by “naive Bayes estimator.” We use flat priors for all parameters and base inference on the posterior mode after integrating out the fixed effects $f_i$, $i = 1, \ldots, N$. The model is misspecified in the sense that the individual spell effect, $u_{is}$, is ignored.

                                   Bias ($\beta = 0$)   RMSE ($\beta = 0$)   Bias ($\beta = 1$)   RMSE ($\beta = 1$)
Integrated likelihood estimator
  $\sigma_u^2 = 1/2$                   -0.0008              0.1334               0.0039              0.1298
  $\sigma_u^2 =$                        0.0145              0.1451              -0.0010              0.1467
  $\sigma_u^2 =$                       -0.0008              0.1790              -0.0042              0.1826
Naive Bayes estimator
  $\sigma_u^2 = 1/2$                    1.1346              1.1424               1.1308              1.1394
  $\sigma_u^2 =$                        1.2197              1.2288               1.2188              1.2285
  $\sigma_u^2 =$                        1.3739              1.3856               1.3674              1.3795

Note that the two estimators use the same likelihood and priors. However, the “integrated likelihood estimator” separates the nuisance parameter from the parameter of interest before integrating out $f_i$, $i = 1, \ldots, N$. As a consequence, the bias is much lower, by about factor 8, for the “integrated likelihood estimator.” Note that, for both estimators, the Root Mean Squared Error (RMSE) is
increasing in $\sigma_u^2$ and that the bias of the “naive Bayes estimator” does not strongly depend on the value of $\beta$. We conclude that separating the nuisance parameter from the parameter of interest works well for this misspecified model.

6. CONCLUSION

This paper extends the integrated likelihood estimator to misspecified models. Using information-orthogonality, we approximately separate the nuisance parameter from the parameter of interest. We use Bayesian techniques since reparametrization of a nuisance parameter only requires an expression of the Jacobian in a Bayesian framework. Under the condition that the score of the integrated likelihood has expectation zero at the truth, we show that the variance-covariance matrix of the Bayesian estimator has the form of White (1982). Thus, information-orthogonality combined with the integrated likelihood is a promising approach which solves the incidental parameter problem of Neyman and Scott (1948) for a class of misspecified models. We illustrate our approach by two misspecified models with individual effects. In the dynamic linear model, we allow the error term to be non-normal, and in the hazard model we allow the individual effect to change over time.

NOTES

1. The dynamic linear model assumes that $y_{it} = y_{i,t-1}\beta + f_i + \varepsilon_{it}$ and we discuss this model in Section 5.1.
2. The transformation model of Abrevaya (1998) and one discrete choice model by Honoré and Kyriazidou (2000) are not information-orthogonal. Both models require infinite support for the regressor, can be estimated using a sign function and will be discussed in a separate paper that deals with “information-orthogonality” of sign functions.
3. That is, conditional on $f_1, \ldots, f_N$ and $x_1, \ldots, x_N$.
4. Note that $t_{is}$ is exponentially distributed if we condition on $f_i$, $x_{is}$, and $u_{is}$.

ACKNOWLEDGMENTS

I gratefully acknowledge stimulating suggestions from Tony Lancaster. The Social Science and Humanities Research Council of Canada provided financial support. All errors are mine.
REFERENCES

Abrevaya, J. (1998). Leapfrog estimation of a fixed-effects model with unknown transformation of the dependent variable. Unpublished manuscript, Graduate School of Business, University of Chicago.
Arellano, M., & Honoré, B. E. (2001). Panel data models: Some recent developments. In: J. Heckman & E. Leamer (Eds), Handbook of Econometrics (Vol. 5). Amsterdam: North-Holland.
Berger, J. O., Liseo, B., & Wolpert, R. L. (1999). Integrated likelihood methods for eliminating nuisance parameters. Statistical Science, 14, 1–28.
Chamberlain, G. (1984). Panel data. In: Z. Griliches & M. D. Intriligator (Eds), Handbook of Econometrics (Vol. 2). Amsterdam: North-Holland.
Chamberlain, G. (1985). Heterogeneity, omitted variable bias, and duration dependence. In: J. J. Heckman & B. Singer (Eds), Longitudinal Analysis of Labor Market Data. Cambridge: Cambridge University Press.
Cox, D. R., & Reid, N. (1987). Parameter orthogonality and approximate conditional inference (with discussion). Journal of the Royal Statistical Society, Series B, 49, 1–39.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1995). Bayesian data analysis. New York: Chapman & Hall.
Hahn, J., Hausman, J., & Kuersteiner, G. (2001). Bias corrected instrumental variables estimation for dynamic panel models with fixed effects. MIT Working Paper.
Heckman, J., Ichimura, H., Smith, J., & Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica, 66, 1017–1098.
Honoré, B. E., & Kyriazidou, E. (2000). Panel data discrete choice models with lagged dependent variables. Econometrica, 68, 839–874.
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford: Clarendon Press.
Kiviet, J. F. (1995). On bias, inconsistency and efficiency of various estimators in dynamic panel data models. Journal of Econometrics, 68, 53–78.
Lancaster, T. (2000). The incidental parameter problem since 1948. Journal of Econometrics, 95, 391–413.
Lancaster, T. (2002). Orthogonal parameters and panel data. Review of Economic Studies (forthcoming).
Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In: R. F. Engle & D. McFadden (Eds), Handbook of Econometrics (Vol. 4). Amsterdam: North-Holland.
Neyman, J., & Scott, E. L. (1948). Consistent estimation from partially consistent observations. Econometrica, 16, 1–32.
Nickell, S. (1981). Biases in dynamic models with fixed effects. Econometrica, 49, 1417–1426.
Ridder, G., & Woutersen, T. M. (2003). The singularity of the efficiency bound of the mixed proportional hazard model. Econometrica (forthcoming).
Trognon, A. (2000). Panel data econometrics: A successful past and a promising future. Working Paper, Genes (INSEE).
Van den Berg, G. J. (2001). Duration models: Specification, identification, and multiple duration. In: J. Heckman & E. Leamer (Eds), Handbook of Econometrics (Vol. 5). Amsterdam: North-Holland.
White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica, 50, 1–25.
White, H. (1993). Estimation, inference and specification analysis. New York: Cambridge University Press.
Woutersen, T. M. (2000). Consistent estimation and orthogonality. Working Paper, Department of Economics, University of Western Ontario.
Woutersen, T. M. (2001). Robustness against incidental parameters and mixing distributions. Working Paper, Department of Economics, University of Western Ontario.

APPENDIX A: THEOREM 1

To be shown: $\sqrt{NT}(\hat\beta - \beta_0) \to N(0, \Psi)$ where

$$\Psi = \left[\frac{1}{NT} E L^I_{\beta\beta}(\beta_0)\right]^{-1} \left[\frac{1}{NT} E\{(L^I_\beta(\beta_0))(L^I_\beta(\beta_0))'\}\right] \left[\frac{1}{NT} E L^I_{\beta\beta}(\beta_0)\right]^{-1}.$$

Proof: Let Assumptions 1, 2(i), and 3 hold. Then all the conditions of Newey and McFadden (1994, Theorem 2.6) are satisfied and consistency follows.
Assuming that, in addition, Assumption 4 holds, the assumptions of Newey and McFadden (1994, Theorem 3.2) are satisfied and asymptotic normality follows, where the identity matrix is used as the weighting matrix. Instead of assuming that the parameter space is compact as in 2(i), we can assume that $\beta_0$ is an element of the interior of a convex set and that $L^I_\beta(\beta)$ is concave for all $i$, as in Assumptions 2(ii) and 4(i). All the requirements of Newey and McFadden (1994, Theorem 2.7) are then satisfied and consistency of the integrated likelihood estimator follows. Asymptotic normality is again implied by Newey and McFadden (1994, Theorem 3.2).

APPENDIX B: DURATION EXAMPLE, INTEGRATED LIKELIHOOD

To be shown:

$$L^{i,I} = \ln\int e^{T\lambda_i}\, e^{-\sum_s e^{x_{is}\beta + \lambda_i} t_{is}}\, d\lambda_i = \ln\frac{\Gamma(T)}{\{\sum_s e^{x_{is}\beta} t_{is}\}^T}$$

where $\Gamma(\cdot)$ denotes the Gamma function.

Proof: Define $v_i = e^{\lambda_i}$. Then

$$L^{i,I} = \ln\int \frac{e^{L^i}}{v_i}\, dv_i = \ln\int v_i^{T-1} e^{-v_i\sum_s e^{x_{is}\beta} t_{is}}\, dv_i$$

$$= \ln\left[\frac{\Gamma(T)}{\{\sum_s e^{x_{is}\beta} t_{is}\}^T}\int \frac{\{\sum_s e^{x_{is}\beta} t_{is}\}^T\, v_i^{T-1} e^{-v_i\sum_s e^{x_{is}\beta} t_{is}}}{\Gamma(T)}\, dv_i\right].$$

Note that $\{\sum_s e^{x_{is}\beta} t_{is}\}^T v_i^{T-1} e^{-v_i\sum_s e^{x_{is}\beta} t_{is}}/\Gamma(T)$ is a gamma density with parameters $T$ and $\sum_s e^{x_{is}\beta} t_{is}$ and that this density integrates to one. The result follows. Q.E.D.

APPENDIX C: DURATION EXAMPLE, SCORE

To be shown: $E L^I_\beta/NT = 0$, where

$$\frac{L^I_\beta}{NT} = -\frac{1}{N}\sum_i \frac{\sum_s x_{is} e^{x_{is}\beta_0} t_{is}}{\sum_s e^{x_{is}\beta_0} t_{is}},$$

for any $N$ and any $T \geq 2$.

Proof:

$$\frac{E L^I_\beta}{NT} = -E\,\frac{1}{N}\sum_i \frac{\sum_s x_{is} e^{x_{is}\beta_0} t_{is}}{\sum_s e^{x_{is}\beta_0} t_{is}} = -E\,\frac{1}{N}\sum_i \frac{\sum_s x_{is} e^{x_{is}\beta_0 + \lambda_0} t_{is}}{\sum_s e^{x_{is}\beta_0 + \lambda_0} t_{is}}.$$

Note that $e^{x_{is}\beta_0 + \lambda_0} t_{is}$ is exponentially distributed with mean $e^{-u_{is}}$. Also note that the expectation of $e^{x_{is}\beta_0 + \lambda_0} t_{is}/\sum_s e^{x_{is}\beta_0 + \lambda_0} t_{is}$ does not depend on $x_{is}$. Thus, define $\mu_i = E\big(e^{x_{is}\beta_0 + \lambda_0} t_{is}/\sum_s e^{x_{is}\beta_0 + \lambda_0} t_{is}\big)$. This yields

$$\frac{E L^I_\beta}{NT} = -E\,\frac{1}{N}\sum_i \sum_s \mu_i x_{is} = -E\,\frac{1}{N}\sum_i \mu_i \sum_s x_{is} = 0$$

since $\sum_s x_{is} = 0$. Q.E.D.
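The moment condition of Appendix C and the simulation design of Section 5.2.1 can be tried out numerically with a rough sketch like the following. It is only an illustration under stated assumptions, not the author's code: the function names, the seed, the sample size, the bisection solver, and in particular the regressor values $x_{i1} = -1/2$, $x_{i2} = 1/2$ (chosen so that $\sum_s x_{is} = 0$) are all hypothetical. The data are drawn from the hazard of Eq. (3), and the estimate is the root of the integrated-likelihood score, taken here up to sign.

```python
import numpy as np


def simulate_spells(n, beta, sigma_u, x=(-0.5, 0.5), seed=0):
    """Draw durations t_is from the hazard exp(f_i + x_is*beta + u_is).

    Assumed design (hypothetical): exp(f_i) is unit exponential,
    u_is is N(0, sigma_u^2), and x sums to zero across spells.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    exp_f = rng.exponential(1.0, size=(n, 1))        # exp(f_i)
    u = rng.normal(0.0, sigma_u, size=(n, x.size))   # spell effects u_is
    hazard = exp_f * np.exp(x * beta + u)
    return rng.exponential(1.0 / hazard)             # mean duration 1/hazard


def score(beta, t, x):
    """Average over i of sum_s x_s w_is / sum_s w_is, w_is = exp(x_s*beta) t_is.

    This equals the integrated-likelihood score divided by NT, up to sign.
    """
    x = np.asarray(x, dtype=float)
    w = np.exp(x * beta) * t
    return np.mean((w @ x) / w.sum(axis=1))


def estimate_beta(t, x, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve score(beta) = 0 by bisection; the score is increasing in beta."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, t, x) > 0.0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For instance, `estimate_beta(simulate_spells(20000, 1.0, 1.0), (-0.5, 0.5))` should land near the true value of 1 even though the spell effect $u_{is}$ is ignored in estimation, which is the robustness property that Appendix C establishes.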

Date posted: 22/10/2022, 20:43
