Econometrics (hansen)

ECONOMETRICS

Bruce E. Hansen
© 2000, 2015
University of Wisconsin, Department of Economics
This Revision: January 16, 2015
Comments Welcome

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.

Contents

Preface

1 Introduction
1.1 What is Econometrics?
1.2 The Probability Approach to Econometrics
1.3 Econometric Terms and Notation
1.4 Observational Data
1.5 Standard Data Structures
1.6 Sources for Economic Data
1.7 Econometric Software
1.8 Reading the Manuscript
1.9 Common Symbols

2 Conditional Expectation and Projection
2.1 Introduction
2.2 The Distribution of Wages
2.3 Conditional Expectation
2.4 Log Differences*
2.5 Conditional Expectation Function
2.6 Continuous Variables
2.7 Law of Iterated Expectations
2.8 CEF Error
2.9 Intercept-Only Model
2.10 Regression Variance
2.11 Best Predictor
2.12 Conditional Variance
2.13 Homoskedasticity and Heteroskedasticity
2.14 Regression Derivative
2.15 Linear CEF
2.16 Linear CEF with Nonlinear Effects
2.17 Linear CEF with Dummy Variables
2.18 Best Linear Predictor
2.19 Linear Predictor Error Variance
2.20 Regression Coefficients
2.21 Regression Sub-Vectors
2.22 Coefficient Decomposition
2.23 Omitted Variable Bias
2.24 Best Linear Approximation
2.25 Normal Regression
2.26 Regression to the Mean
2.27 Reverse Regression
2.28 Limitations of the Best Linear Predictor
2.29 Random Coefficient Model
2.30 Causal Effects
2.31 Expectation: Mathematical Details*
2.32 Existence and Uniqueness of the Conditional Expectation*
2.33 Identification*
2.34 Technical Proofs*
Exercises

3 The Algebra of Least Squares
3.1 Introduction
3.2 Random Samples
3.3 Sample Means
3.4 Least Squares Estimator
3.5 Solving for Least Squares with One Regressor
3.6 Solving for Least Squares with Multiple Regressors
3.7 Illustration
3.8 Least Squares Residuals
3.9 Model in Matrix Notation
3.10 Projection Matrix
3.11 Orthogonal Projection
3.12 Estimation of Error Variance
3.13 Analysis of Variance
3.14 Regression Components
3.15 Residual Regression
3.16 Prediction Errors
3.17 Influential Observations
3.18 Normal Regression Model
3.19 CPS Data Set
3.20 Programming
3.21 Technical Proofs*
Exercises

4 Least Squares Regression
4.1 Introduction
4.2 Sample Mean
4.3 Linear Regression Model
4.4 Mean of Least-Squares Estimator
4.5 Variance of Least Squares Estimator
4.6 Gauss-Markov Theorem
4.7 Residuals
4.8 Estimation of Error Variance
4.9 Mean-Square Forecast Error
4.10 Covariance Matrix Estimation Under Homoskedasticity
4.11 Covariance Matrix Estimation Under Heteroskedasticity
4.12 Standard Errors
4.13 Computation
4.14 Measures of Fit
4.15 Empirical Example
4.16 Multicollinearity
4.17 Normal Regression Model
Exercises

5 An Introduction to Large Sample Asymptotics
5.1 Introduction
5.2 Asymptotic Limits
5.3 Convergence in Probability
5.4 Weak Law of Large Numbers
5.5 Almost Sure Convergence and the Strong Law*
5.6 Vector-Valued Moments
5.7 Convergence in Distribution
5.8 Higher Moments
5.9 Functions of Moments
5.10 Delta Method
5.11 Stochastic Order Symbols
5.12 Uniform Stochastic Bounds*
5.13 Semiparametric Efficiency
5.14 Technical Proofs*
Exercises

6 Asymptotic Theory for Least Squares
6.1 Introduction
6.2 Consistency of Least-Squares Estimator
6.3 Asymptotic Normality
6.4 Joint Distribution
6.5 Consistency of Error Variance Estimators
6.6 Homoskedastic Covariance Matrix Estimation
6.7 Heteroskedastic Covariance Matrix Estimation
6.8 Summary of Covariance Matrix Notation
6.9 Alternative Covariance Matrix Estimators*
6.10 Functions of Parameters
6.11 Asymptotic Standard Errors
6.12 t statistic
6.13 Confidence Intervals
6.14 Regression Intervals
6.15 Forecast Intervals
6.16 Wald Statistic
6.17 Homoskedastic Wald Statistic
6.18 Confidence Regions
6.19 Semiparametric Efficiency in the Projection Model
6.20 Semiparametric Efficiency in the Homoskedastic Regression Model*
6.21 Uniformly Consistent Residuals*
6.22 Asymptotic Leverage*
Exercises

7 Restricted Estimation
7.1 Introduction
7.2 Constrained Least Squares
7.3 Exclusion Restriction
7.4 Minimum Distance
7.5 Asymptotic Distribution
7.6 Efficient Minimum Distance Estimator
7.7 Exclusion Restriction Revisited
7.8 Variance and Standard Error Estimation
7.9 Misspecification
7.10 Nonlinear Constraints
7.11 Inequality Restrictions
7.12 Constrained MLE
7.13 Technical Proofs*
Exercises

8 Hypothesis Testing
8.1 Hypotheses
8.2 Acceptance and Rejection
8.3 Type I Error
8.4 t tests
8.5 Type II Error and Power
8.6 Statistical Significance
8.7 P-Values
8.8 t-ratios and the Abuse of Testing
8.9 Wald Tests
8.10 Homoskedastic Wald Tests
8.11 Criterion-Based Tests
8.12 Minimum Distance Tests
8.13 Minimum Distance Tests Under Homoskedasticity
8.14 F Tests
8.15 Likelihood Ratio Test
8.16 Problems with Tests of Nonlinear Hypotheses
8.17 Monte Carlo Simulation
8.18 Confidence Intervals by Test Inversion
8.19 Power and Test Consistency
8.20 Asymptotic Local Power
8.21 Asymptotic Local Power, Vector Case
8.22 Technical Proofs*
Exercises

9 Regression Extensions
9.1 Nonlinear Least Squares
9.2 Generalized Least Squares
9.3 Testing for Heteroskedasticity
9.4 Testing for Omitted Nonlinearity
9.5 Least Absolute Deviations
9.6 Quantile Regression
Exercises

10 The Bootstrap
10.1 Definition of the Bootstrap
10.2 The Empirical Distribution Function
10.3 Nonparametric Bootstrap
10.4 Bootstrap Estimation of Bias and Variance
10.5 Percentile Intervals
10.6 Percentile-t Equal-Tailed Interval
10.7 Symmetric Percentile-t Intervals
10.8 Asymptotic Expansions
10.9 One-Sided Tests
10.10 Symmetric Two-Sided Tests
10.11 Percentile Confidence Intervals
10.12 Bootstrap Methods for Regression Models
Exercises

11 Nonparametric Regression
11.1 Introduction
11.2 Binned Estimator
11.3 Kernel Regression
11.4 Local Linear Estimator
11.5 Nonparametric Residuals and Regression Fit
11.6 Cross-Validation Bandwidth Selection
11.7 Asymptotic Distribution
11.8 Conditional Variance Estimation
11.9 Standard Errors
11.10 Multiple Regressors

12 Series Estimation
12.1 Approximation by Series
12.2 Splines
12.3 Partially Linear Model
12.4 Additively Separable Models
12.5 Uniform Approximations
12.6 Runge's Phenomenon
12.7 Approximating Regression
12.8 Residuals and Regression Fit
12.9 Cross-Validation Model Selection
12.10 Convergence in Mean-Square
12.11 Uniform Convergence
12.12 Asymptotic Normality
12.13 Asymptotic Normality with Undersmoothing
12.14 Regression Estimation
12.15 Kernel Versus Series Regression
12.16 Technical Proofs

13 Generalized Method of Moments
13.1 Overidentified Linear Model
13.2 GMM Estimator
13.3 Distribution of GMM Estimator
13.4 Estimation of the Efficient Weight Matrix
13.5 GMM: The General Case
13.6 Over-Identification Test
13.7 Hypothesis Testing: The Distance Statistic
13.8 Conditional Moment Restrictions
13.9 Bootstrap GMM Inference
Exercises

14 Empirical Likelihood
14.1 Non-Parametric Likelihood
14.2 Asymptotic Distribution of EL Estimator
14.3 Overidentifying Restrictions
14.4 Testing
14.5 Numerical Computation

15 Endogeneity
15.1 Instrumental Variables
15.2 Reduced Form
15.3 Identification
15.4 Estimation
15.5 Special Cases: IV and 2SLS
15.6 Bekker Asymptotics
15.7 Identification Failure
Exercises

16 Univariate Time Series
16.1 Stationarity and Ergodicity
16.2 Autoregressions
16.3 Stationarity of AR(1) Process
16.4 Lag Operator
16.5 Stationarity of AR(k)
16.6 Estimation
16.7 Asymptotic Distribution
16.8 Bootstrap for Autoregressions
16.9 Trend Stationarity
16.10 Testing for Omitted Serial Correlation
16.11 Model Selection
16.12 Autoregressive Unit Roots

17 Multivariate Time Series
17.1 Vector Autoregressions (VARs)
17.2 Estimation
17.3 Restricted VARs
17.4 Single Equation from a VAR
17.5 Testing for Omitted Serial Correlation
17.6 Selection of Lag Length in a VAR
17.7 Granger Causality
17.8 Cointegration
17.9 Cointegrated VARs

18 Limited Dependent Variables
18.1 Binary Choice
18.2 Count Data
18.3 Censored Data
18.4 Sample Selection

19 Panel Data
19.1 Individual-Effects Model
19.2 Fixed Effects
19.3 Dynamic Panel Regression

20 Nonparametric Density Estimation
20.1 Kernel Density Estimation
20.2 Asymptotic MSE for Kernel Estimates

A Matrix Algebra
A.1 Notation
A.2 Matrix Addition
A.3 Matrix Multiplication
A.4 Trace
A.5 Rank and Inverse
A.6 Determinant
A.7 Eigenvalues
A.8 Positive Definiteness
A.9 Matrix Calculus
A.10 Kronecker Products and the Vec Operator
A.11 Vector and Matrix Norms
A.12 Matrix Inequalities

B Probability
B.1 Foundations
B.2 Random Variables
B.3 Expectation
B.4 Gamma Function
B.5 Common Distributions
B.6 Multivariate Random Variables
B.7 Conditional Distributions and Expectation
B.8 Transformations
B.9 Normal and Related Distributions
B.10 Inequalities
B.11 Maximum Likelihood

C Numerical Optimization
C.1 Grid Search
C.2 Gradient Methods
C.3 Derivative-Free Methods

Preface

This book is
intended to serve as the textbook for a first-year graduate course in econometrics. It can be used as a stand-alone text, or as a supplement to another text.

Students are assumed to have an understanding of multivariate calculus, probability theory, linear algebra, and mathematical statistics. A prior course in undergraduate econometrics would be helpful, but is not required. Two excellent undergraduate textbooks are Wooldridge (2009) and Stock and Watson (2010). For reference, some of the basic tools of matrix algebra, probability, and statistics are reviewed in the Appendix.

For students wishing to deepen their knowledge of matrix algebra in relation to their study of econometrics, I recommend Matrix Algebra by Abadir and Magnus (2005). An excellent introduction to probability and statistics is Statistical Inference by Casella and Berger (2002). For those wanting a deeper foundation in probability, I recommend Ash (1972) or Billingsley (1995). For more advanced statistical theory, I recommend Lehmann and Casella (1998), van der Vaart (1998), Shao (2003), and Lehmann and Romano (2005).

For further study in econometrics beyond this text, I recommend Davidson (1994) for asymptotic theory, Hamilton (1994) for time-series methods, Wooldridge (2002) for panel data and discrete response models, and Li and Racine (2007) for nonparametric and semiparametric econometrics. Beyond these texts, the Handbook of Econometrics series provides advanced summaries of contemporary econometric methods and theory.

The end-of-chapter exercises are important parts of the text and are meant to help teach students of econometrics. Answers are not provided, and this is intentional.

I would like to thank Ying-Ying Lee for providing research assistance in preparing some of the empirical examples presented in the text.

As this is a manuscript in progress, some parts are quite incomplete, and there are many topics which I plan to add. In general, the earlier chapters are the most complete while the later
chapters need significant work and revision.

Chapter 1

Introduction

1.1 What is Econometrics?

The term "econometrics" is believed to have been crafted by Ragnar Frisch (1895-1973) of Norway, one of the three principal founders of the Econometric Society, first editor of the journal Econometrica, and co-winner of the first Nobel Memorial Prize in Economic Sciences in 1969. It is therefore fitting that we turn to Frisch's own words in the introduction to the first issue of Econometrica to describe the discipline.

A word of explanation regarding the term econometrics may be in order. Its definition is implied in the statement of the scope of the [Econometric] Society, in Section I of the Constitution, which reads: "The Econometric Society is an international society for the advancement of economic theory in its relation to statistics and mathematics. Its main object shall be to promote studies that aim at a unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems." But there are several aspects of the quantitative approach to economics, and no single one of these aspects, taken by itself, should be confounded with econometrics. Thus, econometrics is by no means the same as economic statistics. Nor is it identical with what we call general economic theory, although a considerable portion of this theory has a definitely quantitative character. Nor should econometrics be taken as synonymous with the application of mathematics to economics. Experience has shown that each of these three view-points, that of statistics, economic theory, and mathematics, is a necessary, but not by itself a sufficient, condition for a real understanding of the quantitative relations in modern economic life. It is the unification of all three that is powerful. And it is this unification that constitutes econometrics.

Ragnar Frisch, Econometrica, (1933), 1, pp. 1-2.

This definition remains valid today, although some terms have evolved somewhat in their usage. Today,
we would say that econometrics is the unified study of economic models, mathematical statistics, and economic data.

Within the field of econometrics there are sub-divisions and specializations. Econometric theory concerns the development of tools and methods, and the study of the properties of econometric methods. Applied econometrics is a term describing the development of quantitative economic models and the application of econometric methods to these models using economic data.

1.2 The Probability Approach to Econometrics

The unifying methodology of modern econometrics was articulated by Trygve Haavelmo (1911-1999) of Norway, winner of the 1989 Nobel Memorial Prize in Economic Sciences, in his seminal

APPENDIX B: PROBABILITY

where the second equality picks $q$ to satisfy $1/p + 1/q = 1$, and the final equality uses this fact to make the substitution $q = p/(p-1)$ and then collects terms. Dividing both sides by $\left(\mathbb{E}\,\|X+Y\|^p\right)^{1/q}$, we obtain (B.19). $\blacksquare$

Proof of Markov's Inequality. Let $F$ denote the distribution function of $x$. Then
$$\alpha \Pr\left(g(x) > \alpha\right) = \int_{\{g(u) > \alpha\}} \alpha \, dF(u) \le \int_{\{g(u) > \alpha\}} g(u)\, dF(u) = \mathbb{E}\left(g(x)\,\mathbf{1}\left(g(x) > \alpha\right)\right),$$
the inequality using the region of integration $\{g(u) > \alpha\}$. This establishes the strong form (B.22). Since $\mathbf{1}(g(x) > \alpha) \le 1$, the final expression is less than $\mathbb{E}\left(g(x)\right)$, establishing the standard form (B.21). $\blacksquare$

Proof of Chebyshev's Inequality. Define $y = (x - \mathbb{E}x)^2$ and note that $\mathbb{E}y = \operatorname{var}(x)$. The events $\{|x - \mathbb{E}x| > \alpha\}$ and $\{y > \alpha^2\}$ are equal, so by an application of Markov's inequality we find
$$\Pr\left(|x - \mathbb{E}x| > \alpha\right) = \Pr\left(y > \alpha^2\right) \le \alpha^{-2}\,\mathbb{E}(y) = \alpha^{-2}\operatorname{var}(x)$$
as stated. $\blacksquare$

B.11 Maximum Likelihood

In this section we provide a brief review of the asymptotic theory of maximum likelihood estimation.

When the density of $y_i$ is $f(y \mid \theta)$ where $f$ is a known density function and $\theta \in \Theta$ is an unknown $m \times 1$ vector, we say that the distribution is parametric and that $\theta$ is the parameter of the distribution. The space $\Theta$ is the set of permissible values for $\theta$. In this setting the method of maximum likelihood is an appropriate technique for estimation and inference on $\theta$. We let $\theta$ denote a generic value of the parameter and let $\theta_0$
denote its true value.

The joint density of a random sample $(y_1, \ldots, y_n)$ is
$$f_n\left(y_1, \ldots, y_n \mid \theta\right) = \prod_{i=1}^{n} f\left(y_i \mid \theta\right).$$
The likelihood of the sample is this joint density evaluated at the observed sample values, viewed as a function of $\theta$. The log-likelihood function is its natural logarithm,
$$\log L(\theta) = \sum_{i=1}^{n} \log f\left(y_i \mid \theta\right).$$
The likelihood score is the derivative of the log-likelihood, evaluated at the true parameter value,
$$S_i = \frac{\partial}{\partial \theta} \log f\left(y_i \mid \theta_0\right).$$
We also define the Hessian
$$H = -\mathbb{E}\,\frac{\partial^2}{\partial\theta\,\partial\theta'} \log f\left(y_i \mid \theta_0\right) \tag{B.24}$$
and the outer product matrix
$$\Omega = \mathbb{E}\left(S_i S_i'\right). \tag{B.25}$$

We now present three important features of the likelihood.

Theorem B.11.1
$$\left.\frac{\partial}{\partial\theta}\,\mathbb{E}\log f\left(y_i \mid \theta\right)\right|_{\theta = \theta_0} = 0 \tag{B.26}$$
$$\mathbb{E}\,S_i = 0 \tag{B.27}$$
and
$$H = \Omega \equiv \mathcal{I}. \tag{B.28}$$

The matrix $\mathcal{I}$ is called the information, and the equality (B.28) is called the information matrix equality.

The maximum likelihood estimator (MLE) $\hat\theta$ is the parameter value which maximizes the likelihood (equivalently, which maximizes the log-likelihood). We can write this as
$$\hat\theta = \operatorname*{argmax}_{\theta \in \Theta} \log L(\theta). \tag{B.29}$$
In some simple cases, we can find an explicit expression for $\hat\theta$ as a function of the data, but these cases are rare. More typically, the MLE must be found by numerical methods.

To understand why the MLE $\hat\theta$ is a natural estimator for the parameter $\theta$, observe that the standardized log-likelihood is a sample average and an estimator of $\mathbb{E}\log f\left(y_i \mid \theta\right)$:
$$\frac{1}{n}\log L(\theta) = \frac{1}{n}\sum_{i=1}^{n} \log f\left(y_i \mid \theta\right) \overset{p}{\longrightarrow} \mathbb{E}\log f\left(y_i \mid \theta\right).$$
As the MLE $\hat\theta$ maximizes the left-hand side, we can see that it is an estimator of the maximizer of the right-hand side. The first-order condition for the latter problem is
$$0 = \frac{\partial}{\partial\theta}\,\mathbb{E}\log f\left(y_i \mid \theta\right),$$
which holds at $\theta = \theta_0$ by (B.26). This suggests that $\hat\theta$ is an estimator of $\theta_0$. In fact, under conventional regularity conditions, $\hat\theta$ is consistent: $\hat\theta \overset{p}{\longrightarrow} \theta_0$ as $n \to \infty$. Furthermore, we can derive its asymptotic distribution.

Theorem B.11.2 Under regularity conditions, $\sqrt{n}\left(\hat\theta - \theta_0\right) \overset{d}{\longrightarrow} \mathrm{N}\left(0, \mathcal{I}^{-1}\right)$.

We omit the regularity conditions for Theorem B.11.2, but the result holds quite broadly for models which are smooth functions of the parameters. Theorem B.11.2 gives the general form for the asymptotic distribution of the MLE. A famous result shows
that the asymptotic variance is the smallest possible.

Theorem B.11.3 (Cramér-Rao Lower Bound) If $\tilde\theta$ is an unbiased regular estimator of $\theta$, then $\operatorname{var}(\tilde\theta) \ge \left(n\mathcal{I}\right)^{-1}$.

The Cramér-Rao Theorem shows that the finite-sample variance of an unbiased estimator is bounded below by $\left(n\mathcal{I}\right)^{-1}$. This means that the asymptotic variance of the standardized estimator $\sqrt{n}\left(\tilde\theta - \theta_0\right)$ is bounded below by $\mathcal{I}^{-1}$. In other words, the best possible asymptotic variance among all (regular) estimators is $\mathcal{I}^{-1}$. An estimator is called asymptotically efficient if its asymptotic variance equals this lower bound. Theorem B.11.2 shows that the MLE has this asymptotic variance, and is thus asymptotically efficient.

Theorem B.11.4 The MLE is asymptotically efficient in the sense that its asymptotic variance equals the Cramér-Rao Lower Bound.

Theorem B.11.4 gives a strong endorsement for the MLE in parametric models.

Finally, consider functions of parameters. If $\psi = g(\theta)$ then the MLE of $\psi$ is $\hat\psi = g(\hat\theta)$. This is because maximization (e.g. (B.29)) is unaffected by parameterization and transformation. Applying the Delta Method to Theorem B.11.2 we conclude that
$$\sqrt{n}\left(\hat\psi - \psi\right) \simeq G'\sqrt{n}\left(\hat\theta - \theta_0\right) \overset{d}{\longrightarrow} \mathrm{N}\left(0, G'\mathcal{I}^{-1}G\right) \tag{B.30}$$
where $G = \frac{\partial}{\partial\theta} g(\theta_0)'$. By Theorem B.11.4, $\hat\psi$ is an asymptotically efficient estimator for $\psi$ since it is the MLE. The asymptotic variance $G'\mathcal{I}^{-1}G$ is the Cramér-Rao lower bound for estimation of $\psi$.

Theorem B.11.5 The Cramér-Rao lower bound for $\psi = g(\theta)$ is $G'\mathcal{I}^{-1}G$, and the MLE $\hat\psi = g(\hat\theta)$ is asymptotically efficient.

Proof of Theorem B.11.1. To see (B.26),
$$\left.\frac{\partial}{\partial\theta}\,\mathbb{E}\log f\left(y \mid \theta\right)\right|_{\theta=\theta_0}
= \left.\frac{\partial}{\partial\theta}\int \log f\left(y \mid \theta\right) f\left(y \mid \theta_0\right) dy\,\right|_{\theta=\theta_0}
= \left.\int \frac{\partial}{\partial\theta} f\left(y \mid \theta\right) \frac{f\left(y \mid \theta_0\right)}{f\left(y \mid \theta\right)}\, dy\,\right|_{\theta=\theta_0}
= \left.\frac{\partial}{\partial\theta}\int f\left(y \mid \theta\right) dy\,\right|_{\theta=\theta_0}
= \left.\frac{\partial}{\partial\theta} 1\right|_{\theta=\theta_0} = 0.$$

Equation (B.27) follows by exchanging integration and differentiation:
$$\mathbb{E}\,S_i = \mathbb{E}\left.\frac{\partial}{\partial\theta}\log f\left(y_i \mid \theta\right)\right|_{\theta=\theta_0} = \left.\frac{\partial}{\partial\theta}\,\mathbb{E}\log f\left(y_i \mid \theta\right)\right|_{\theta=\theta_0} = 0.$$

Similarly, we can show that
$$\mathbb{E}\left(\frac{\dfrac{\partial^2}{\partial\theta\,\partial\theta'} f\left(y \mid \theta_0\right)}{f\left(y \mid \theta_0\right)}\right)
= 0.$$

By direct computation,
$$\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y \mid \theta_0\right)
= \frac{\dfrac{\partial^2}{\partial\theta\,\partial\theta'} f\left(y \mid \theta_0\right)}{f\left(y \mid \theta_0\right)}
- \frac{\dfrac{\partial}{\partial\theta} f\left(y \mid \theta_0\right) \dfrac{\partial}{\partial\theta'} f\left(y \mid \theta_0\right)}{f\left(y \mid \theta_0\right)^2}
= \frac{\dfrac{\partial^2}{\partial\theta\,\partial\theta'} f\left(y \mid \theta_0\right)}{f\left(y \mid \theta_0\right)}
- \frac{\partial}{\partial\theta}\log f\left(y \mid \theta_0\right)\,\frac{\partial}{\partial\theta'}\log f\left(y \mid \theta_0\right).$$
Taking expectations yields (B.28). $\blacksquare$

Proof of Theorem B.11.2. Taking the first-order condition for maximization of $\log L(\theta)$, and making a first-order Taylor series expansion,
$$0 = \left.\frac{\partial}{\partial\theta}\log L(\theta)\right|_{\theta=\hat\theta}
= \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\log f\left(y_i \mid \hat\theta\right)
= \sum_{i=1}^{n} \frac{\partial}{\partial\theta}\log f\left(y_i \mid \theta_0\right) + \sum_{i=1}^{n} \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y_i \mid \theta_n\right)\left(\hat\theta - \theta_0\right),$$
where $\theta_n$ lies on a line segment joining $\hat\theta$ and $\theta_0$. (Technically, the specific value of $\theta_n$ varies by row in this expansion.) Rewriting this equation, we find
$$\left(\hat\theta - \theta_0\right) = \left(-\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y_i \mid \theta_n\right)\right)^{-1}\left(\sum_{i=1}^{n} S_i\right),$$
where $S_i$ are the likelihood scores. Since the score $S_i$ is mean-zero (B.27) with covariance matrix $\Omega$ (equation B.25), an application of the CLT yields
$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n} S_i \overset{d}{\longrightarrow} \mathrm{N}\left(0, \Omega\right).$$
The analysis of the sample Hessian is somewhat more complicated due to the presence of $\theta_n$. Let $H(\theta) = -\mathbb{E}\,\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y_i \mid \theta\right)$. If it is continuous in $\theta$, then since $\theta_n \overset{p}{\longrightarrow} \theta_0$ it follows that $H(\theta_n) \overset{p}{\longrightarrow} H$, and so
$$-\frac{1}{n}\sum_{i=1}^{n} \frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y_i \mid \theta_n\right)
= \frac{1}{n}\sum_{i=1}^{n} \left(-\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f\left(y_i \mid \theta_n\right) - H(\theta_n)\right) + H(\theta_n)
\overset{p}{\longrightarrow} H$$
by an application of a uniform WLLN. (By uniform, we mean that the WLLN holds uniformly over the parameter value. This requires the second derivative to be a smooth function of the parameter.)
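Before combining these pieces, it may help to see the score, Hessian, and Newton iterations computed for a concrete parametric model. The following Python sketch is not from the text; the exponential model $f(y \mid \theta) = \theta e^{-\theta y}$ and all function names are illustrative assumptions. For this model the score is $n/\theta - \sum y_i$, the MLE has the closed form $\hat\theta = 1/\bar{y}$, and the Newton iterations recover it:

```python
import math
import random

def exp_loglik(theta, ys):
    # Log-likelihood of an exponential sample: log f(y|theta) = log(theta) - theta*y
    return sum(math.log(theta) - theta * y for y in ys)

def mle_newton(ys, theta=1.0, steps=50):
    # Newton iterations on the score S(theta) = n/theta - sum(y),
    # using the analytic Hessian -n/theta**2 of the exponential log-likelihood.
    n = len(ys)
    total = sum(ys)
    for _ in range(steps):
        score = n / theta - total
        hess = -n / theta ** 2
        theta = theta - score / hess
    return theta

random.seed(0)
data = [random.expovariate(2.0) for _ in range(5000)]  # true theta_0 = 2

theta_hat = mle_newton(data)
closed_form = len(data) / sum(data)  # exponential MLE: 1 / sample mean
print(theta_hat, closed_form)
```

With 5000 observations the estimate sits close to the true value 2, illustrating the consistency claimed by the theory; the asymptotic variance here is $\mathcal{I}^{-1} = \theta_0^2$, so the sampling standard deviation is roughly $\theta_0/\sqrt{n} \approx 0.03$.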
Together,
$$\sqrt{n}\left(\hat\theta - \theta_0\right) \overset{d}{\longrightarrow} H^{-1}\,\mathrm{N}\left(0, \Omega\right) = \mathrm{N}\left(0, H^{-1}\Omega H^{-1}\right) = \mathrm{N}\left(0, \mathcal{I}^{-1}\right),$$
the final equality using Theorem B.11.1. $\blacksquare$

Proof of Theorem B.11.3. Let $Y = (y_1, \ldots, y_n)$ be the sample, and set
$$S = \frac{\partial}{\partial\theta}\log f_n\left(Y \mid \theta_0\right) = \sum_{i=1}^{n} S_i,$$
which by Theorem B.11.1 has mean zero and variance $n\mathcal{I}$. Write the estimator $\tilde\theta = \tilde\theta(Y)$ as a function of the data. Since $\tilde\theta$ is unbiased for any $\theta$,
$$\theta = \mathbb{E}\,\tilde\theta = \int \tilde\theta(Y)\, f_n\left(Y \mid \theta\right) dY.$$
Differentiating with respect to $\theta$ and evaluating at $\theta_0$ yields
$$I_m = \int \tilde\theta(Y)\,\frac{\partial}{\partial\theta'} f_n\left(Y \mid \theta_0\right) dY
= \int \tilde\theta(Y)\,\frac{\partial}{\partial\theta'}\log f_n\left(Y \mid \theta_0\right) f_n\left(Y \mid \theta_0\right) dY
= \mathbb{E}\left(\tilde\theta\, S'\right)
= \mathbb{E}\left(\left(\tilde\theta - \theta_0\right) S'\right),$$
where $I_m$ is the $m \times m$ identity matrix, and the final equality holds since $\mathbb{E}(S) = 0$. By the matrix Cauchy-Schwarz inequality (B.18), $\mathbb{E}\left(\left(\tilde\theta - \theta_0\right)S'\right) = I_m$, and $\operatorname{var}(S) = \mathbb{E}\left(SS'\right) = n\mathcal{I}$,
$$\operatorname{var}\left(\tilde\theta\right) = \mathbb{E}\left(\left(\tilde\theta - \theta_0\right)\left(\tilde\theta - \theta_0\right)'\right)
\ge \mathbb{E}\left(\left(\tilde\theta - \theta_0\right) S'\right)\left(\mathbb{E}\left(SS'\right)\right)^{-1}\mathbb{E}\left(S\left(\tilde\theta - \theta_0\right)'\right)
= \left(\mathbb{E}\left(SS'\right)\right)^{-1} = \left(n\mathcal{I}\right)^{-1}$$
as stated. $\blacksquare$

Appendix C

Numerical Optimization

Many econometric estimators are defined by an optimization problem of the form
$$\hat\theta = \operatorname*{argmin}_{\theta \in \Theta} Q(\theta) \tag{C.1}$$
where the parameter is $\theta \in \Theta \subset \mathbb{R}^m$ and the criterion function is $Q(\theta) : \Theta \to \mathbb{R}$. For example NLLS, GLS, MLE and GMM estimators take this form. In most cases, $Q(\theta)$ can be computed for given $\theta$, but $\hat\theta$ is not available in closed form. In this case, numerical methods are required to obtain $\hat\theta$.

C.1 Grid Search

Many optimization problems are either one dimensional ($m = 1$) or involve one-dimensional optimization as a sub-problem (for example, a line search). In this context grid search may be employed.

Grid Search. Let $\Theta = [a, b]$ be an interval. Pick some $\varepsilon > 0$ and set $G = (b - a)/\varepsilon$ to be the number of gridpoints. Construct an equally spaced grid on the region $[a, b]$ with $G$ gridpoints, which is $\{\theta(j) = a + j(b - a)/G : j = 0, \ldots, G\}$. At each point evaluate the criterion function and find the gridpoint which yields the smallest value of the criterion, which is $\theta(\hat\jmath)$ where $\hat\jmath = \operatorname{argmin}_{0 \le j \le G} Q(\theta(j))$. This value $\theta(\hat\jmath)$ is the gridpoint estimate of $\hat\theta$. If the grid is sufficiently fine to capture small oscillations in $Q(\theta)$, the approximation error is bounded by $\varepsilon$; that is, $|\theta(\hat\jmath) - \hat\theta| \le \varepsilon$. Plots of $Q(\theta(j))$ against $\theta(j)$ can help diagnose errors in grid selection. This method is quite robust but potentially costly.

Two-Step Grid Search. The grid search method can be refined by a two-step execution. For an error bound of $\varepsilon$, pick $G$ so that $\varepsilon = (b - a)/G^2$.
For the first step define an equally spaced grid on the region $[a, b]$ with $G$ gridpoints, which is $\{\theta(j) = a + j(b - a)/G : j = 0, \ldots, G\}$. At each point evaluate the criterion function and let $\hat\jmath = \operatorname{argmin}_{0 \le j \le G} Q(\theta(j))$. For the second step define an equally spaced grid on $[\theta(\hat\jmath - 1), \theta(\hat\jmath + 1)]$ with $2G$ gridpoints, which is $\{\theta'(k) = \theta(\hat\jmath - 1) + 2k(b - a)/G^2 : k = 0, \ldots, 2G\}$. Let $\hat k = \operatorname{argmin}_{0 \le k \le 2G} Q(\theta'(k))$. The estimate of $\hat\theta$ is $\theta'(\hat k)$.

The advantage of the two-step method over a one-step grid search is that the number of function evaluations has been reduced from $(b - a)/\varepsilon$ to $2\sqrt{(b - a)/\varepsilon}$, which can be substantial. The disadvantage is that if the function $Q(\theta)$ is irregular, the first-step grid may not bracket $\hat\theta$, which thus would be missed.

C.2 Gradient Methods

Gradient methods are iterative methods which produce a sequence $\theta_i : i = 1, 2, \ldots$ which are designed to converge to $\hat\theta$. All require the choice of a starting value $\theta_1$, and all require the computation of the gradient of $Q(\theta)$,
$$g(\theta) = \frac{\partial}{\partial\theta} Q(\theta),$$
and some require the Hessian
$$H(\theta) = \frac{\partial^2}{\partial\theta\,\partial\theta'} Q(\theta).$$
If the functions $g(\theta)$ and $H(\theta)$ are not analytically available, they can be calculated numerically. Take the $j$'th element of $g(\theta)$. Let $\delta_j$ be the $j$'th unit vector (zeros everywhere except for a one in the $j$'th row). Then for small $\varepsilon$,
$$g_j(\theta) \simeq \frac{Q\left(\theta + \delta_j \varepsilon\right) - Q(\theta)}{\varepsilon}.$$
Similarly,
$$H_{jk}(\theta) \simeq \frac{Q\left(\theta + \delta_j \varepsilon + \delta_k \varepsilon\right) - Q\left(\theta + \delta_j \varepsilon\right) - Q\left(\theta + \delta_k \varepsilon\right) + Q(\theta)}{\varepsilon^2}.$$
In many cases, numerical derivatives can work well but can be computationally costly relative to analytic derivatives. In some cases, however, numerical derivatives can be quite unstable.

Most gradient methods are a variant of Newton's method, which is based on a quadratic approximation. By a Taylor expansion for $\theta$ close to $\hat\theta$,
$$0 = g(\hat\theta) \simeq g(\theta) + H(\theta)\left(\hat\theta - \theta\right),$$
which implies
$$\hat\theta = \theta - H(\theta)^{-1} g(\theta).$$
This suggests the iteration rule
$$\theta_{i+1} = \theta_i - H(\theta_i)^{-1} g(\theta_i).$$
One problem with Newton's method is that it will send the iterations in the wrong direction if $H(\theta_i)$ is not positive definite. One modification to prevent this possibility is quadratic hill-climbing, which sets
$$\theta_{i+1} = \theta_i - \left(H(\theta_i) + \alpha_i I_m\right)^{-1} g(\theta_i),$$
where $\alpha_i$ is set just above the smallest eigenvalue of $H(\theta_i)$ if $H(\theta_i)$ is not positive definite.

Another productive modification is to add a scalar steplength $\lambda_i$. In
this case the iteration rule takes the form
$$\theta_{i+1} = \theta_i - \lambda_i D_i g_i \tag{C.2}$$
where $g_i = g(\theta_i)$ and $D_i = H(\theta_i)^{-1}$ for Newton's method and $D_i = \left(H(\theta_i) + \alpha_i I_m\right)^{-1}$ for quadratic hill-climbing.

Allowing the steplength $\lambda_i$ to be a free parameter allows for a line search, a one-dimensional optimization. To pick $\lambda_i$, write the criterion function as a function of $\lambda$,
$$Q(\lambda) = Q\left(\theta_i - \lambda D_i g_i\right),$$
a one-dimensional optimization problem. There are two common methods to perform a line search. A quadratic approximation evaluates the first and second derivatives of $Q(\lambda)$ with respect to $\lambda$, and picks $\lambda_i$ as the value minimizing this approximation. The half-step method considers the sequence $\lambda = 1, 1/2, 1/4, 1/8, \ldots$. Each value in the sequence is considered and the criterion $Q\left(\theta_i - \lambda D_i g_i\right)$ evaluated. If the criterion has improved over $Q(\theta_i)$, use this value; otherwise move to the next element in the sequence.

Newton's method does not perform well if $Q(\theta)$ is irregular, and it can be quite computationally costly if $H(\theta)$ is not analytically available. These problems have motivated alternative choices for the weight matrix $D_i$. These methods are called Quasi-Newton methods. Two popular methods are Davidson-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Let
$$\Delta g_i = g_i - g_{i-1}, \qquad \Delta\theta_i = \theta_i - \theta_{i-1}.$$
The DFP method sets (the signs below follow the standard inverse-Hessian update, reconstructed here since the original formulas did not survive extraction)
$$D_i = D_{i-1} + \frac{\Delta\theta_i\,\Delta\theta_i'}{\Delta\theta_i'\,\Delta g_i} - \frac{D_{i-1}\,\Delta g_i\,\Delta g_i'\,D_{i-1}}{\Delta g_i'\,D_{i-1}\,\Delta g_i}.$$
The BFGS method sets
$$D_i = D_{i-1} + \left(1 + \frac{\Delta g_i'\,D_{i-1}\,\Delta g_i}{\Delta\theta_i'\,\Delta g_i}\right)\frac{\Delta\theta_i\,\Delta\theta_i'}{\Delta\theta_i'\,\Delta g_i} - \frac{\Delta\theta_i\,\Delta g_i'\,D_{i-1} + D_{i-1}\,\Delta g_i\,\Delta\theta_i'}{\Delta\theta_i'\,\Delta g_i}.$$
For any of the gradient methods, the iterations continue until the sequence has converged in some sense. This can be defined by examining whether $|\theta_i - \theta_{i-1}|$, $|Q(\theta_i) - Q(\theta_{i-1})|$, or $|g(\theta_i)|$ has become small.

C.3 Derivative-Free Methods

All gradient methods can be quite poor in locating the global minimum when $Q(\theta)$ has several local minima. Furthermore, the methods are not well defined when $Q(\theta)$ is non-differentiable. In these cases, alternative optimization methods are required. One example is the simplex method of Nelder-Mead (1965).

A more recent innovation is the method of simulated annealing (SA). For a review see Goffe, Ferrier, and Rogers (1994). The SA method is a sophisticated
random search. Like the gradient methods, it relies on an iterative sequence. At each iteration, a random variable is drawn and added to the current value of the parameter. If the resulting criterion is decreased, this new value is accepted. If the criterion is increased, it may still be accepted depending on the extent of the increase and another randomization. The latter property is needed to keep the algorithm from selecting a local minimum. As the iterations continue, the variance of the random innovations is shrunk. The SA algorithm stops when a large number of iterations is unable to improve the criterion. The SA method has been found to be successful at locating global minima. The downside is that it can take considerable computer time to execute.

Bibliography

[1] Abadir, Karim M. and Jan R. Magnus (2005): Matrix Algebra, Cambridge University Press.
[2] Aitken, A.C. (1935): "On least squares and linear combinations of observations," Proceedings of the Royal Statistical Society, 55, 42-48.
[3] Akaike, H. (1973): "Information theory and an extension of the maximum likelihood principle."
In B.N. Petrov and F. Csáki, eds., Second International Symposium on Information Theory.
[4] Anderson, T.W. and H. Rubin (1949): "Estimation of the parameters of a single equation in a complete system of stochastic equations," The Annals of Mathematical Statistics, 20, 46-63.
[5] Andrews, Donald W.K. (1988): "Laws of large numbers for dependent non-identically distributed random variables," Econometric Theory, 4, 458-467.
[6] Andrews, Donald W.K. (1991): "Asymptotic normality of series estimators for nonparametric and semiparametric regression models," Econometrica, 59, 307-345.
[7] Andrews, Donald W.K. (1993): "Tests for parameter instability and structural change with unknown change point," Econometrica, 61, 821-856.
[8] Andrews, Donald W.K. and Moshe Buchinsky (2000): "A three-step method for choosing the number of bootstrap replications," Econometrica, 68, 23-51.
[9] Andrews, Donald W.K. and Werner Ploberger (1994): "Optimal tests when a nuisance parameter is present only under the alternative," Econometrica, 62, 1383-1414.
[10] Ash, Robert B. (1972): Real Analysis and Probability, Academic Press.
[11] Basmann, R.L. (1957): "A generalized classical method of linear estimation of coefficients in a structural equation," Econometrica, 25, 77-83.
[12] Bekker, P.A. (1994): "Alternative approximations to the distributions of instrumental variable estimators," Econometrica, 62, 657-681.
[13] Billingsley, Patrick (1968): Convergence of Probability Measures, New York: Wiley.
[14] Billingsley, Patrick (1995): Probability and Measure, 3rd Edition, New York: Wiley.
[15] Bose, A. (1988): "Edgeworth correction by bootstrap in autoregressions," Annals of Statistics, 16, 1709-1722.
[16] Box, George E.P. and Dennis R. Cox (1964): "An analysis of transformations," Journal of the Royal Statistical Society, Series B, 26, 211-252.
[17] Breusch, T.S. and A.R. Pagan (1979): "The Lagrange multiplier test and its application to model specification in econometrics," Review of Economic Studies, 47, 239-253.
[18] Brown, B.W.
and Whitney K. Newey (2002): "GMM, efficient bootstrapping, and improved inference," Journal of Business and Economic Statistics.
[19] Card, David (1995): "Using geographic variation in college proximity to estimate the return to schooling," in Aspects of Labor Market Behavior: Essays in Honour of John Vanderkamp, L.N. Christofides, E.K. Grant, and R. Swidinsky, editors. Toronto: University of Toronto Press.
[20] Carlstein, E. (1986): "The use of subseries methods for estimating the variance of a general statistic from a stationary time series," Annals of Statistics, 14, 1171-1179.
[21] Casella, George and Roger L. Berger (2002): Statistical Inference, 2nd Edition, Duxbury Press.
[22] Chamberlain, Gary (1987): "Asymptotic efficiency in estimation with conditional moment restrictions," Journal of Econometrics, 34, 305-334.
[23] Choi, In and Peter C.B. Phillips (1992): "Asymptotic and finite sample distribution theory for IV estimators and tests in partially identified structural equations," Journal of Econometrics, 51, 113-150.
[24] Chow, G.C. (1960): "Tests of equality between sets of coefficients in two linear regressions," Econometrica, 28, 591-603.
[25] Cragg, John (1992): "Quasi-Aitken estimation for heteroskedasticity of unknown form," Journal of Econometrics, 54, 179-201.
[26] Davidson, James (1994): Stochastic Limit Theory: An Introduction for Econometricians. Oxford: Oxford University Press.
[27] Davison, A.C. and D.V. Hinkley (1997): Bootstrap Methods and their Application. Cambridge University Press.
[28] Dickey, D.A. and W.A. Fuller (1979): "Distribution of the estimators for autoregressive time series with a unit root," Journal of the American Statistical Association, 74, 427-431.
[29] Donald, Stephen G. and Whitney K. Newey (2001): "Choosing the number of instruments," Econometrica, 69, 1161-1191.
[30] Dufour, J.M. (1997): "Some impossibility theorems in econometrics with applications to structural and dynamic models," Econometrica, 65, 1365-1387.
[31] Efron, Bradley (1979): "Bootstrap methods: Another look at
the jackknife, Annals of Statistics, 7, 1-26 [32] Efron, Bradley (1982): The Jackknife, the Bootstrap, and Other Resampling Plans Society for Industrial and Applied Mathematics [33] Efron, Bradley and R.J Tibshirani (1993): An Introduction to the Bootstrap, New York: Chapman-Hall [34] Eicker, F (1963): Asymptotic normality and consistency of the least squares estimators for families of linear regressions, Annals of Mathematical Statistics, 34, 447-456 [35] Engle, Robert F and Clive W J Granger (1987): Co-integration and error correction: Representation, estimation and testing, Econometrica, 55, 251-276 [36] Frisch, Ragnar (1933): Editorial, Econometrica, 1, 1-4 BIBLIOGRAPHY 374 [37] Frisch, Ragnar and F Waugh (1933): Partial time regressions as compared with individual trends, Econometrica, 1, 387-401 [38] Gallant, A Ronald and D.W Nychka (1987): Seminonparametric maximum likelihood estimation, Econometrica, 55, 363-390 [39] Gallant, A Ronald and Halbert White (1988): A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models New York: Basil Blackwell [40] Galton, Francis (1886): Regression Towards Mediocrity in Hereditary Stature, The Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246-263 [41] Goldberger, Arthur S (1964): Econometric Theory, Wiley [42] Goldberger, Arthur S (1968): Topics in Regression Analysis, Macmillan [43] Goldberger, Arthur S (1991): A Course in Econometrics Cambridge: Harvard University Press [44] Goe, W.L., G.D Ferrier and J Rogers (1994): Global optimization of statistical functions with simulated annealing, Journal of Econometrics, 60, 65-99 [45] Gosset, William S (a.k.a Student) (1908): The probable error of a mean, Biometrika, 6, 1-25 [46] Gauss, K.F (1809): Theoria motus corporum coelestium, in Werke, Vol VII, 240-254 [47] Granger, Clive W J (1969): Investigating causal relations by econometric models and cross-spectral methods, Econometrica, 37, 424-438 [48] Granger, Clive W J (1981): Some 
properties of time series data and their use in econometric specification, Journal of Econometrics, 16, 121-130 [49] Granger, Clive W J and Timo Terọsvirta (1993): Modelling Nonlinear Economic Relationships, Oxford University Press, Oxford [50] Gregory, A and M Veall (1985): On formulating Wald tests of nonlinear restrictions, Econometrica, 53, 1465-1468, [51] Haavelmo, T (1944): The probability approach in econometrics, Econometrica, supplement, 12 [52] Hall, A R (2000): Covariance matrix estimation and the power of the overidentifying restrictions test, Econometrica, 68, 1517-1527, [53] Hall, P (1992): The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag [54] Hall, P (1994): Methodology and theory for the bootstrap, Handbook of Econometrics, Vol IV, eds R.F Engle and D.L McFadden New York: Elsevier Science [55] Hall, P and J.L Horowitz (1996): Bootstrap critical values for tests based on GeneralizedMethod-of-Moments estimation, Econometrica, 64, 891-916 [56] Hahn, J (1996): A note on bootstrapping generalized method of moments estimators, Econometric Theory, 12, 187-197 [57] Hamilton, James D (1994) Time Series Analysis BIBLIOGRAPHY 375 [58] Hansen, Bruce E (1992): Ecient estimation and testing of cointegrating vectors in the presence of deterministic trends, Journal of Econometrics, 53, 87-121 [59] Hansen, Bruce E (1996): Inference when a nuisance parameter is not identified under the null hypothesis, Econometrica, 64, 413-430 [60] Hansen, Bruce E (2006): Edgeworth expansions for the Wald and GMM statistics for nonlinear restrictions, Econometric Theory and Practice: Frontiers of Analysis and Applied Research, edited by Dean Corbae, Steven N Durlauf and Bruce E Hansen Cambridge University Press [61] Hansen, Lars Peter (1982): Large sample properties of generalized method of moments estimators, Econometrica, 50, 1029-1054 [62] Hansen, Lars Peter, John Heaton, and A Yaron (1996): Finite sample properties of some alternative GMM estimators, Journal of 
Business and Economic Statistics, 14, 262-280 [63] Hausman, J.A (1978): Specification tests in econometrics, Econometrica, 46, 1251-1271 [64] Heckman, J (1979): Sample selection bias as a specification error, Econometrica, 47, 153161 [65] Horn, S.D., R.A Horn, and D.B Duncan (1975) Estimating heteroscedastic variances in linear model, Journal of the American Statistical Association, 70, 380-385 [66] Horowitz, Joel (2001): The Bootstrap, Handbook of Econometrics, Vol 5, J.J Heckman and E.E Leamer, eds., Elsevier Science, 3159-3228 [67] Imbens, G.W (1997): One step estimators for over-identified generalized method of moments models, Review of Economic Studies, 64, 359-383 [68] Imbens, G.W., R.H Spady and P Johnson (1998): Information theoretic approaches to inference in moment condition models, Econometrica, 66, 333-357 [69] Jarque, C.M and A.K Bera (1980): Ecient tests for normality, homoskedasticity and serial independence of regression residuals, Economic Letters, 6, 255-259 [70] Johansen, S (1988): Statistical analysis of cointegrating vectors, Journal of Economic Dynamics and Control, 12, 231-254 [71] Johansen, S (1991): Estimation and hypothesis testing of cointegration vectors in the presence of linear trend, Econometrica, 59, 1551-1580 [72] Johansen, S (1995): Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models, Oxford University Press [73] Johansen, S and K Juselius (1992): Testing structural hypotheses in a multivariate cointegration analysis of the PPP and the UIP for the UK, Journal of Econometrics, 53, 211-244 [74] Kitamura, Y (2001): Asymptotic optimality and empirical likelihood for testing moment restrictions, Econometrica, 69, 1661-1672 [75] Kitamura, Y and M Stutzer (1997): An information-theoretic alternative to generalized method of moments, Econometrica, 65, 861-874 [76] Koenker, Roger (2005): Quantile Regression Cambridge University Press [77] Kunsch, H.R (1989): The jackknife and the bootstrap for general stationary 
observations, Annals of Statistics, 17, 1217-1241 BIBLIOGRAPHY 376 [78] Kwiatkowski, D., P.C.B Phillips, P Schmidt, and Y Shin (1992): Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54, 159-178 [79] Lafontaine, F and K.J White (1986): Obtaining any Wald statistic you want, Economics Letters, 21, 35-40 [80] Lehmann, E.L and George Casella (1998): Theory of Point Estimation, 2nd Edition, Springer [81] Lehmann, E.L and Joseph P Romano (2005): Testing Statistical Hypotheses, 3rd Edition, Springer [82] Lindeberg, Jarl Waldemar, (1922): Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung, Mathematische Zeitschrift, 15, 211-225 [83] Li, Qi and Jerey Racine (2007) Nonparametric Econometrics [84] Lovell, M.C (1963): Seasonal adjustment of economic time series, Journal of the American Statistical Association, 58, 993-1010 [85] MacKinnon, James G (1990): Critical values for cointegration, in Engle, R.F and C.W Granger (eds.) Long-Run Economic Relationships: Readings in Cointegration, Oxford, Oxford University Press [86] MacKinnon, James G and Halbert White (1985): Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties, Journal of Econometrics, 29, 305-325 [87] Magnus, J R., and H Neudecker (1988): Matrix Dierential Calculus with Applications in Statistics and Econometrics, New York: John Wiley and Sons [88] Mann, H.B and A Wald (1943) On stochastic limit and order relationships, The Annals of Mathematical Statistics 14, 217226 [89] Muirhead, R.J (1982): Aspects of Multivariate Statistical Theory New York: Wiley [90] Nelder, J and R Mead (1965): A simplex method for function minimization, Computer Journal, 7, 308-313 [91] Nerlove, Marc (1963): Returns to Scale in Electricity Supply, Chapter of Measurement in Economics (C Christ, et al, eds.) 
Stanford: Stanford University Press, 167-198 [92] Newey, Whitney K (1990): Semiparametric eciency bounds, Journal of Applied Econometrics, 5, 99-135 [93] Newey, Whitney K (1997): Convergence rates and asymptotic normality for series estimators, Journal of Econometrics, 79, 147-168 [94] Newey, Whitney K and Daniel L McFadden (1994): Large Sample Estimation and Hypothesis Testing, in Robert Engle and Daniel McFadden, (eds.) Handbook of Econometrics, vol IV, 2111-2245, North Holland: Amsterdam [95] Newey, Whitney K and Kenneth D West (1987): Hypothesis testing with ecient method of moments estimation, International Economic Review, 28, 777-787 [96] Owen, Art B (1988): Empirical likelihood ratio confidence intervals for a single functional, Biometrika, 75, 237-249 BIBLIOGRAPHY 377 [97] Owen, Art B (2001): Empirical Likelihood New York: Chapman & Hall [98] Park, Joon Y and Peter C B Phillips (1988): On the formulation of Wald tests of nonlinear restrictions, Econometrica, 56, 1065-1083, [99] Phillips, Peter C.B (1989): Partially identified econometric models, Econometric Theory, 5, 181-240 [100] Phillips, Peter C.B and Sam Ouliaris (1990): Asymptotic properties of residual based tests for cointegration, Econometrica, 58, 165-193 [101] Politis, D.N and J.P Romano (1996): The stationary bootstrap, Journal of the American Statistical Association, 89, 1303-1313 [102] Potscher, B.M (1991): Eects of model selection on inference, Econometric Theory, 7, 163-185 [103] Qin, J and J Lawless (1994): Empirical likelihood and general estimating equations, The Annals of Statistics, 22, 300-325 [104] Ramsey, J B (1969): Tests for specification errors in classical linear least-squares regression analysis, Journal of the Royal Statistical Society, Series B, 31, 350-371 [105] Rudin, W (1987): Real and Complex Analysis, 3rd edition New York: McGraw-Hill [106] Runge, Carl (1901): ĩber empirische Funktionen und die Interpolation zwischen ọquidistanten Ordinaten, Zeitschrift fỹr Mathematik 
und Physik, 46, 224-243 [107] Said, S.E and D.A Dickey (1984): Testing for unit roots in autoregressive-moving average models of unknown order, Biometrika, 71, 599-608 [108] Secrist, Horace (1933): The Triumph of Mediocrity in Business Evanston: Northwestern University [109] Shao, J and D Tu (1995): The Jackknife and Bootstrap NY: Springer [110] Sargan, J.D (1958): The estimation of economic relationships using instrumental variables, Econometrica, 6, 393-415 [111] Shao, Jun (2003): Mathematical Statistics, 2nd edition, Springer [112] Sheather, S.J and M.C Jones (1991): A reliable data-based bandwidth selection method for kernel density estimation, Journal of the Royal Statistical Society, Series B, 53, 683-690 [113] Shin, Y (1994): A residual-based test of the null of cointegration against the alternative of no cointegration, Econometric Theory, 10, 91-115 [114] Silverman, B.W (1986): Density Estimation for Statistics and Data Analysis London: Chapman and Hall [115] Sims, C.A (1972): Money, income and causality, American Economic Review, 62, 540-552 [116] Sims, C.A (1980): Macroeconomics and reality, Econometrica, 48, 1-48 [117] Staiger, D and James H Stock (1997): Instrumental variables regression with weak instruments, Econometrica, 65, 557-586 [118] Stock, James H (1987): Asymptotic properties of least squares estimators of cointegrating vectors, Econometrica, 55, 1035-1056 BIBLIOGRAPHY 378 [119] Stock, James H (1991): Confidence intervals for the largest autoregressive root in U.S macroeconomic time series, Journal of Monetary Economics, 28, 435-460 [120] Stock, James H and Jonathan H Wright (2000): GMM with weak identification, Econometrica, 68, 1055-1096 [121] Stock, James H and Mark W Watson (2010): Introduction to Econometrics, 3rd edition, Addison-Wesley [122] Stone, Marshall H (1937): Applications of the Theory of Boolean Rings to General Topology, Transactions of the American Mathematical Society, 41, 375-481 [123] Stone, Marshall H (1948): The 
Generalized Weierstrass Approximation Theorem, Mathematics Magazine, 21, 167-184 [124] Theil, Henri (1953): Repeated least squares applied to complete equation systems, The Hague, Central Planning Bureau, mimeo [125] Theil, Henri (1961): Economic Forecasts and Policy Amsterdam: North Holland [126] Theil, Henri (1971): Principles of Econometrics, New York: Wiley [127] Tobin, James (1958): Estimation of relationships for limited dependent variables, Econometrica, 6, 24-36 [128] Tripathi, Gautam (1999): A matrix extension of the Cauchy-Schwarz inequality, Economics Letters, 63, 1-3 [129] van der Vaart, A.W (1998): Asymptotic Statistics, Cambridge University Press [130] Wald, A (1943): Tests of statistical hypotheses concerning several parameters when the number of observations is large, Transactions of the American Mathematical Society, 54, 426-482 [131] Wang, J and E Zivot (1998): Inference on structural parameters in instrumental variables regression with weak instruments, Econometrica, 66, 1389-1404 [132] Weierstrass, K (1885): ĩber die analytische Darstellbarkeit sogenannter willkỹrlicher Functionen einer reellen Verọnderlichen, Sitzungsberichte der Kửniglich Preuòischen Akademie der Wissenschaften zu Berlin, 1885 [133] White, Halbert (1980): A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity, Econometrica, 48, 817-838 [134] White, Halbert (1984): Asymptotic Theory for Econometricians, Academic Press [135] Wooldridge, Jerey M (2010) Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT Press [136] Wooldridge, Jerey M (2009) Introductory Econometrics: A Modern Approach, 4th edition, Southwestern [137] Zellner, Arnold (1962): An ecient method of estimating seemingly unrelated regressions, and tests for aggregation bias, Journal of the American Statistical Association, 57, 348-368 [138] Zhang, Fuzhen and Qingling Zhang (2006): Eigenvalue inequalities for matrix product, IEEE Transactions on Automatic 
Control, 51, 1506-1509.

[...]

paper "The probability approach in econometrics," Econometrica (1944). Haavelmo argued that quantitative economic models must necessarily be probability models (by which today we would mean stochastic). Deterministic models are blatantly... but some features are left unspecified. This approach typically leads to estimation methods such as least squares and the generalized method of moments. The semiparametric approach dominates contemporary econometrics and is the main focus of this textbook. Another branch of quantitative structural economics is the calibration approach. Similar to the quasi-structural approach, the calibration approach interprets... for a specific problem, and not based on a generalizable principle.

Economists typically denote variables by the italicized roman characters y, x, and/or z. The convention in econometrics is to use the character y to denote the variable to be explained, while the characters x and z are used to denote the conditioning (explaining) variables. Following mathematical convention,... computational speed, at the cost of increased time in programming and debugging. As these different packages have distinct advantages, many empirical economists end up using more than one package. As a student of econometrics, you will learn at least one of these packages, and probably more than one.

1.8 Reading the Manuscript

I have endeavored to use a unified notation and nomenclature. The development of the material... single summary measure, and thereby facilitate comparisons across groups. Because of this simplifying property, conditional means are the primary interest of regression analysis and are a major focus in econometrics. Table 2.1 allows us to easily calculate average wage differences between groups. For example, we can see that the wage gap between men and women continues after disaggregation by race, as the
between men and women regardless of educational attainment. In many cases it is convenient to simplify the notation by writing variables using single characters, typically y, x, and/or z. It is conventional in econometrics to denote the dependent variable (e.g. log(wage)) by the letter y, a conditioning variable (such as sex) by the letter x, and multiple conditioning variables (such as race, education and sex)... which is a function of the argument u. The expression E(y | x = u) is the conditional expectation of y, given that we know that the random variable x equals the specific value u. However, sometimes in econometrics we take a notational shortcut and use E(y | x) to refer to this function. Hopefully, the use of E(y | x) should be apparent from the context.

2.6 Continuous Variables

In the previous sections,
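When the conditioning variable is discrete, as with sex or race in Table 2.1, the conditional expectation E(y | x = u) is simply the average of y within the group x = u. The sketch below illustrates this with made-up wage observations (the numbers are hypothetical, not taken from the text's tables):

```python
# Conditional mean E(y | x = u) for a discrete conditioning variable:
# average y over the observations with x equal to u.
# The wage figures are invented, for illustration only.
wages = [
    ("men", 27.22), ("men", 31.73), ("men", 25.10),
    ("women", 22.71), ("women", 20.05), ("women", 24.82),
]

def conditional_mean(data, group):
    """Average of y over observations whose x equals `group`."""
    ys = [y for (x, y) in data if x == group]
    return sum(ys) / len(ys)

for u in ("men", "women"):
    print(u, round(conditional_mean(wages, u), 2))
# prints: men 28.02 / women 22.53
```

Viewed this way, m(u) = E(y | x = u) is just a lookup table of group averages; the regression methods of later chapters extend the same idea to continuous conditioning variables.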
