Econometric Analysis of Cross Section and Panel Data
Jeffrey M. Wooldridge
The MIT Press
Cambridge, Massachusetts
London, England

Contents

Preface
Acknowledgments

I INTRODUCTION AND BACKGROUND

1 Introduction
1.1 Causal Relationships and Ceteris Paribus Analysis
1.2 The Stochastic Setting and Asymptotic Analysis
1.2.1 Data Structures
1.2.2 Asymptotic Analysis
1.3 Some Examples
1.4 Why Not Fixed Explanatory Variables?

2 Conditional Expectations and Related Concepts in Econometrics
2.1 The Role of Conditional Expectations in Econometrics
2.2 Features of Conditional Expectations
2.2.1 Definition and Examples
2.2.2 Partial Effects, Elasticities, and Semielasticities
2.2.3 The Error Form of Models of Conditional Expectations
2.2.4 Some Properties of Conditional Expectations
2.2.5 Average Partial Effects
2.3 Linear Projections
Problems
Appendix 2A
2.A.1 Properties of Conditional Expectations
2.A.2 Properties of Conditional Variances
2.A.3 Properties of Linear Projections

3 Basic Asymptotic Theory
3.1 Convergence of Deterministic Sequences
3.2 Convergence in Probability and Bounded in Probability
3.3 Convergence in Distribution
3.4 Limit Theorems for Random Samples
3.5 Limiting Behavior of Estimators and Test Statistics
3.5.1 Asymptotic Properties of Estimators
3.5.2 Asymptotic Properties of Test Statistics
Problems

II LINEAR MODELS

4 The Single-Equation Linear Model and OLS Estimation
4.1 Overview of the Single-Equation Linear Model
4.2 Asymptotic Properties of OLS
4.2.1 Consistency
4.2.2 Asymptotic Inference Using OLS
4.2.3 Heteroskedasticity-Robust Inference
4.2.4 Lagrange Multiplier (Score) Tests
4.3 OLS Solutions to the Omitted Variables Problem
4.3.1 OLS Ignoring the Omitted Variables
4.3.2 The Proxy Variable–OLS Solution
4.3.3 Models with Interactions in Unobservables
4.4 Properties of OLS under Measurement Error
4.4.1 Measurement Error in the Dependent Variable
4.4.2 Measurement Error in an Explanatory Variable
Problems

5 Instrumental Variables Estimation of Single-Equation Linear Models
5.1 Instrumental Variables and Two-Stage Least Squares
5.1.1 Motivation for Instrumental Variables Estimation
5.1.2 Multiple Instruments: Two-Stage Least Squares
5.2 General Treatment of 2SLS
5.2.1 Consistency
5.2.2 Asymptotic Normality of 2SLS
5.2.3 Asymptotic Efficiency of 2SLS
5.2.4 Hypothesis Testing with 2SLS
5.2.5 Heteroskedasticity-Robust Inference for 2SLS
5.2.6 Potential Pitfalls with 2SLS
5.3 IV Solutions to the Omitted Variables and Measurement Error Problems
5.3.1 Leaving the Omitted Factors in the Error Term
5.3.2 Solutions Using Indicators of the Unobservables
Problems

6 Additional Single-Equation Topics
6.1 Estimation with Generated Regressors and Instruments
6.1.1 OLS with Generated Regressors
6.1.2 2SLS with Generated Instruments
6.1.3 Generated Instruments and Regressors
6.2 Some Specification Tests
6.2.1 Testing for Endogeneity
6.2.2 Testing Overidentifying Restrictions
6.2.3 Testing Functional Form
6.2.4 Testing for Heteroskedasticity
6.3 Single-Equation Methods under Other Sampling Schemes
6.3.1 Pooled Cross Sections over Time
6.3.2 Geographically Stratified Samples
6.3.3 Spatial Dependence
6.3.4 Cluster Samples
Problems
Appendix 6A

7 Estimating Systems of Equations by OLS and GLS
7.1 Introduction
7.2 Some Examples
7.3 System OLS Estimation of a Multivariate Linear System
7.3.1 Preliminaries
7.3.2 Asymptotic Properties of System OLS
7.3.3 Testing Multiple Hypotheses
7.4 Consistency and Asymptotic Normality of Generalized Least Squares
7.4.1 Consistency
7.4.2 Asymptotic Normality
7.5 Feasible GLS
7.5.1 Asymptotic Properties
7.5.2 Asymptotic Variance of FGLS under a Standard Assumption
7.6 Testing Using FGLS
7.7 Seemingly Unrelated Regressions, Revisited
7.7.1 Comparison between OLS and FGLS for SUR Systems
7.7.2 Systems with Cross Equation Restrictions
7.7.3 Singular Variance Matrices in SUR Systems
7.8 The Linear Panel Data Model, Revisited
7.8.1 Assumptions for Pooled OLS
7.8.2 Dynamic Completeness
7.8.3 A Note on Time Series Persistence
7.8.4 Robust Asymptotic Variance Matrix
7.8.5 Testing for Serial Correlation and Heteroskedasticity after Pooled OLS
7.8.6 Feasible GLS Estimation under Strict Exogeneity
Problems

8 System Estimation by Instrumental Variables
8.1 Introduction and Examples
8.2 A General Linear System of Equations
8.3 Generalized Method of Moments Estimation
8.3.1 A General Weighting Matrix
8.3.2 The System 2SLS Estimator
8.3.3 The Optimal Weighting Matrix
8.3.4 The Three-Stage Least Squares Estimator
8.3.5 Comparison between GMM 3SLS and Traditional 3SLS
8.4 Some Considerations When Choosing an Estimator
8.5 Testing Using GMM
8.5.1 Testing Classical Hypotheses
8.5.2 Testing Overidentification Restrictions
8.6 More Efficient Estimation and Optimal Instruments
Problems

9 Simultaneous Equations Models
9.1 The Scope of Simultaneous Equations Models
9.2 Identification in a Linear System
9.2.1 Exclusion Restrictions and Reduced Forms
9.2.2 General Linear Restrictions and Structural Equations
9.2.3 Unidentified, Just Identified, and Overidentified Equations
9.3 Estimation after Identification
9.3.1 The Robustness-Efficiency Trade-off
9.3.2 When Are 2SLS and 3SLS Equivalent?
9.3.3 Estimating the Reduced Form Parameters
9.4 Additional Topics in Linear SEMs
9.4.1 Using Cross Equation Restrictions to Achieve Identification
9.4.2 Using Covariance Restrictions to Achieve Identification
9.4.3 Subtleties Concerning Identification and Efficiency in Linear Systems
9.5 SEMs Nonlinear in Endogenous Variables
9.5.1 Identification
9.5.2 Estimation
9.6 Different Instruments for Different Equations
Problems

10 Basic Linear Unobserved Effects Panel Data Models
10.1 Motivation: The Omitted Variables Problem
10.2 Assumptions about the Unobserved Effects and Explanatory Variables
10.2.1 Random or Fixed Effects?
10.2.2 Strict Exogeneity Assumptions on the Explanatory Variables
10.2.3 Some Examples of Unobserved Effects Panel Data Models
10.3 Estimating Unobserved Effects Models by Pooled OLS
10.4 Random Effects Methods
10.4.1 Estimation and Inference under the Basic Random Effects Assumptions
10.4.2 Robust Variance Matrix Estimator
10.4.3 A General FGLS Analysis
10.4.4 Testing for the Presence of an Unobserved Effect
10.5 Fixed Effects Methods
10.5.1 Consistency of the Fixed Effects Estimator
10.5.2 Asymptotic Inference with Fixed Effects
10.5.3 The Dummy Variable Regression
10.5.4 Serial Correlation and the Robust Variance Matrix Estimator
10.5.5 Fixed Effects GLS
10.5.6 Using Fixed Effects Estimation for Policy Analysis
10.6 First Differencing Methods
10.6.1 Inference
10.6.2 Robust Variance Matrix
10.6.3 Testing for Serial Correlation
10.6.4 Policy Analysis Using First Differencing
10.7 Comparison of Estimators
10.7.1 Fixed Effects versus First Differencing
10.7.2 The Relationship between the Random Effects and Fixed Effects Estimators
10.7.3 The Hausman Test Comparing the RE and FE Estimators
Problems

11 More Topics in Linear Unobserved Effects Models
11.1 Unobserved Effects Models without the Strict Exogeneity Assumption
11.1.1 Models under Sequential Moment Restrictions
11.1.2 Models with Strictly and Sequentially Exogenous Explanatory Variables
11.1.3 Models with Contemporaneous Correlation between Some Explanatory Variables and the Idiosyncratic Error
11.1.4 Summary of Models without Strictly Exogenous Explanatory Variables
11.2 Models with Individual-Specific Slopes
11.2.1 A Random Trend Model
11.2.2 General Models with Individual-Specific Slopes
11.3 GMM Approaches to Linear Unobserved Effects Models
11.3.1 Equivalence between 3SLS and Standard Panel Data Estimators
11.3.2 Chamberlain's Approach to Unobserved Effects Models
11.4 Hausman and Taylor-Type Models
11.5 Applying Panel Data Methods to Matched Pairs and Cluster Samples
Problems

III GENERAL APPROACHES TO NONLINEAR ESTIMATION

12 M-Estimation
12.1 Introduction
12.2 Identification, Uniform Convergence, and Consistency
12.3 Asymptotic Normality
12.4 Two-Step M-Estimators
12.4.1 Consistency
12.4.2 Asymptotic Normality
12.5 Estimating the Asymptotic Variance
12.5.1 Estimation without Nuisance Parameters
12.5.2 Adjustments for Two-Step Estimation
12.6 Hypothesis Testing
12.6.1 Wald Tests
12.6.2 Score (or Lagrange Multiplier) Tests
12.6.3 Tests Based on the Change in the Objective Function
12.6.4 Behavior of the Statistics under Alternatives
12.7 Optimization Methods
12.7.1 The Newton-Raphson Method
12.7.2 The Berndt, Hall, Hall, and Hausman Algorithm
12.7.3 The Generalized Gauss-Newton Method
12.7.4 Concentrating Parameters out of the Objective Function
12.8 Simulation and Resampling Methods
12.8.1 Monte Carlo Simulation
12.8.2 Bootstrapping
Problems

13 Maximum Likelihood Methods
13.1 Introduction
13.2 Preliminaries and Examples
13.3 General Framework for Conditional MLE
13.4 Consistency of Conditional MLE
13.5 Asymptotic Normality and Asymptotic Variance Estimation
13.5.1 Asymptotic Normality
13.5.2 Estimating the Asymptotic Variance
13.6 Hypothesis Testing
13.7 Specification Testing
13.8 Partial Likelihood Methods for Panel Data and Cluster Samples
13.8.1 Setup for Panel Data
13.8.2 Asymptotic Inference
13.8.3 Inference with Dynamically Complete Models
13.8.4 Inference under Cluster Sampling
13.9 Panel Data Models with Unobserved Effects
13.9.1 Models with Strictly Exogenous Explanatory Variables
13.9.2 Models with Lagged Dependent Variables
13.10 Two-Step MLE
Problems
Appendix 13A

14 Generalized Method of Moments and Minimum Distance Estimation
14.1 Asymptotic Properties of GMM
14.2 Estimation under Orthogonality Conditions
14.3 Systems of Nonlinear Equations
14.4 Panel Data Applications
14.5 Efficient Estimation
14.5.1 A General Efficiency Framework
14.5.2 Efficiency of MLE
14.5.3 Efficient Choice of Instruments under Conditional Moment Restrictions
14.6 Classical Minimum Distance Estimation
Problems
Appendix 14A

IV NONLINEAR MODELS AND RELATED TOPICS

15 Discrete Response Models
15.1 Introduction
15.2 The Linear Probability Model for Binary Response
15.3 Index Models for Binary Response: Probit and Logit
15.4 Maximum Likelihood Estimation of Binary Response Index Models
15.5 Testing in Binary Response Index Models
15.5.1 Testing Multiple Exclusion Restrictions
15.5.2 Testing Nonlinear Hypotheses about β
15.5.3 Tests against More General Alternatives
15.6 Reporting the Results for Probit and Logit
15.7 Specification Issues in Binary Response Models
15.7.1 Neglected Heterogeneity
15.7.2 Continuous Endogenous Explanatory Variables
15.7.3 A Binary Endogenous Explanatory Variable
15.7.4 Heteroskedasticity and Nonnormality in the Latent Variable Model
15.7.5 Estimation under Weaker Assumptions
15.8 Binary Response Models for Panel Data and Cluster Samples
15.8.1 Pooled Probit and Logit
15.8.2 Unobserved Effects Probit Models under Strict Exogeneity
15.8.3 Unobserved Effects Logit Models under Strict Exogeneity
15.8.4 Dynamic Unobserved Effects Models
15.8.5 Semiparametric Approaches
15.8.6 Cluster Samples
15.9 Multinomial Response Models
15.9.1 Multinomial Logit
15.9.2 Probabilistic Choice Models
15.10 Ordered Response Models
15.10.1 Ordered Logit and Ordered Probit
15.10.2 Applying Ordered Probit to Interval-Coded Data
Problems

16 Corner Solution Outcomes and Censored Regression Models
16.1 Introduction and Motivation
16.2 Derivations of Expected Values
16.3 Inconsistency of OLS
16.4 Estimation and Inference with Censored Tobit
16.5 Reporting the Results
16.6 Specification Issues in Tobit Models
16.6.1 Neglected Heterogeneity
16.6.2 Endogenous Explanatory Variables
16.6.3 Heteroskedasticity and Nonnormality in the Latent Variable Model
16.6.4 Estimation under Conditional Median Restrictions
16.7 Some Alternatives to Censored Tobit for Corner Solution Outcomes
16.8 Applying Censored Regression to Panel Data and Cluster Samples
16.8.1 Pooled Tobit
16.8.2 Unobserved Effects Tobit Models under Strict Exogeneity
16.8.3 Dynamic Unobserved Effects Tobit Models
Problems

17 Sample Selection, Attrition, and Stratified Sampling
17.1 Introduction
17.2 When Can Sample Selection Be Ignored?
17.2.1 Linear Models: OLS and 2SLS
17.2.2 Nonlinear Models
17.3 Selection on the Basis of the Response Variable: Truncated Regression
17.4 A Probit Selection Equation
17.4.1 Exogenous Explanatory Variables
17.4.2 Endogenous Explanatory Variables
17.4.3 Binary Response Model with Sample Selection
17.5 A Tobit Selection Equation
17.5.1 Exogenous Explanatory Variables
17.5.2 Endogenous Explanatory Variables
17.6 Estimating Structural Tobit Equations with Sample Selection
17.7 Sample Selection and Attrition in Linear Panel Data Models
17.7.1 Fixed Effects Estimation with Unbalanced Panels
17.7.2 Testing and Correcting for Sample Selection Bias
17.7.3 Attrition
17.8 Stratified Sampling
17.8.1 Standard Stratified Sampling and Variable Probability Sampling
17.8.2 Weighted Estimators to Account for Stratification
17.8.3 Stratification Based on Exogenous Variables
Problems

18 Estimating Average Treatment Effects
18.1 Introduction
18.2 A Counterfactual Setting and the Self-Selection Problem
18.3 Methods Assuming Ignorability of Treatment
18.3.1 Regression Methods
18.3.2 Methods Based on the Propensity Score
18.4 Instrumental Variables Methods
18.4.1 Estimating the ATE Using IV
18.4.2 Estimating the Local Average Treatment Effect by IV
18.5 Further Issues
18.5.1 Special Considerations for Binary and Corner Solution Responses
18.5.2 Panel Data
18.5.3 Nonbinary Treatments
18.5.4 Multiple Treatments
Problems

19 Count Data and Related Models
19.1 Why Count Data Models?
19.2 Poisson Regression Models with Cross Section Data
19.2.1 Assumptions Used for Poisson Regression
19.2.2 Consistency of the Poisson QMLE
19.2.3 Asymptotic Normality of the Poisson QMLE
19.2.4 Hypothesis Testing
19.2.5 Specification Testing
19.3 Other Count Data Regression Models
19.3.1 Negative Binomial Regression Models
19.3.2 Binomial Regression Models
19.4 Other QMLEs in the Linear Exponential Family
19.4.1 Exponential Regression Models
19.4.2 Fractional Logit Regression
19.5 Endogeneity and Sample Selection with an Exponential Regression Function
19.5.1 Endogeneity
19.5.2 Sample Selection
19.6 Panel Data Methods
19.6.1 Pooled QMLE
19.6.2 Specifying Models of Conditional Expectations with Unobserved Effects
19.6.3 Random Effects Methods
19.6.4 Fixed Effects Poisson Estimation
19.6.5 Relaxing the Strict Exogeneity Assumption
Problems

20 Duration Analysis
20.1 Introduction
20.2 Hazard Functions
20.2.1 Hazard Functions without Covariates
20.2.2 Hazard Functions Conditional on Time-Invariant Covariates
20.2.3 Hazard Functions Conditional on Time-Varying Covariates
20.3 Analysis of Single-Spell Data with Time-Invariant Covariates
20.3.1 Flow Sampling
20.3.2 Maximum Likelihood Estimation with Censored Flow Data
20.3.3 Stock Sampling
20.3.4 Unobserved Heterogeneity
20.4 Analysis of Grouped Duration Data
20.4.1 Time-Invariant Covariates
20.4.2 Time-Varying Covariates
20.4.3 Unobserved Heterogeneity
20.5 Further Issues
20.5.1 Cox's Partial Likelihood Method for the Proportional Hazard Model
20.5.2 Multiple-Spell Data
20.5.3 Competing Risks Models
Problems

References
Index

Acknowledgments

My interest in panel data econometrics began in earnest when I was an assistant professor at MIT, after I attended a seminar by a graduate student, Leslie Papke, who would later become my wife. Her
empirical research using nonlinear panel data methods piqued my interest and eventually led to my research on estimating nonlinear panel data models without distributional assumptions. I dedicate this text to Leslie.

My former colleagues at MIT, particularly Jerry Hausman, Daniel McFadden, Whitney Newey, Danny Quah, and Thomas Stoker, played significant roles in encouraging my interest in cross section and panel data econometrics. I also have learned much about the modern approach to panel data econometrics from Gary Chamberlain of Harvard University.

I cannot discount the excellent training I received from Robert Engle, Clive Granger, and especially Halbert White at the University of California at San Diego. I hope they are not too disappointed that this book excludes time series econometrics.

I did not teach a course in cross section and panel data methods until I started teaching at Michigan State. Fortunately, my colleague Peter Schmidt encouraged me to teach the course at which this book is aimed. Peter also suggested that a text on panel data methods that uses "vertical bars" would be a worthwhile contribution.

Several classes of students at Michigan State were subjected to this book in manuscript form at various stages of development. I would like to thank these students for their perseverance, helpful comments, and numerous corrections. I want to specifically mention Scott Baier, Linda Bailey, Ali Berker, Yi-Yi Chen, William Horrace, Robin Poston, Kyosti Pietola, Hailong Qian, Wendy Stock, and Andrew Toole. Naturally, they are not responsible for any remaining errors.

I was fortunate to have several capable, conscientious reviewers for the manuscript. Jason Abrevaya (University of Chicago), Joshua Angrist (MIT), David Drukker (Stata Corporation), Brian McCall (University of Minnesota), James Ziliak (University of Oregon), and three anonymous reviewers provided excellent suggestions, many of which improved the book's organization and coverage.

The people at MIT Press
have been remarkably patient, and I have very much enjoyed working with them. I owe a special debt to Terry Vaughn (now at Princeton University Press) for initiating this project and then giving me the time to produce a manuscript with which I felt comfortable. I am grateful to Jane McDonald and Elizabeth Murry for reenergizing the project and for allowing me significant leeway in crafting the final manuscript. Finally, Peggy Gordon and her crew at P. M. Gordon Associates, Inc., did an expert job in editing the manuscript and in producing the final text.

Preface

This book is intended primarily for use in a second-semester course in graduate econometrics, after a first course at the level of Goldberger (1991) or Greene (1997). Parts of the book can be used for special-topics courses, and it should serve as a general reference.

My focus on cross section and panel data methods (in particular, what is often dubbed microeconometrics) is novel, and it recognizes that, after coverage of the basic linear model in a first-semester course, an increasingly popular approach is to treat advanced cross section and panel data methods in one semester and time series methods in a separate semester. This division reflects the current state of econometric practice.

Modern empirical research that can be fitted into the classical linear model paradigm is becoming increasingly rare. For instance, it is now widely recognized that a student doing research in applied time series analysis cannot get very far by ignoring recent advances in estimation and testing in models with trending and strongly dependent processes. This theory takes a very different direction from the classical linear model than does cross section or panel data analysis. Hamilton's (1994) time series text demonstrates this difference unequivocally.

Books intended to cover an econometric sequence of a year or more, beginning with the classical linear model, tend to treat advanced topics in cross section and panel data analysis as direct
applications or minor extensions of the classical linear model (if they are treated at all). Such treatment needlessly limits the scope of applications and can result in poor econometric practice. The focus in such books on the algebra and geometry of econometrics is appropriate for a first-semester course, but it results in oversimplification or sloppiness in stating assumptions. Approaches to estimation that are acceptable under the fixed regressor paradigm so prominent in the classical linear model can lead one badly astray under practically important departures from the fixed regressor assumption.

Books on "advanced" econometrics tend to be high-level treatments that focus on general approaches to estimation, thereby attempting to cover all data configurations (including cross section, panel data, and time series) in one framework, without giving special attention to any. A hallmark of such books is that detailed regularity conditions are treated on par with the practically more important assumptions that have economic content. This is a burden for students learning about cross section and panel data methods, especially those who are empirically oriented: definitions and limit theorems about dependent processes need to be included among the regularity conditions in order to cover time series applications.

In this book I have attempted to find a middle ground between more traditional approaches and the more recent, very unified approaches. I present each model and method with a careful discussion of assumptions of the underlying population model. These assumptions, couched in terms of correlations, conditional expectations, conditional variances and covariances, or conditional distributions, usually can be given behavioral content. Except for the three more technical chapters in Part III, regularity conditions (for example, the existence of moments needed to ensure that the central limit theorem holds) are not discussed explicitly, as these have little
bearing on applied work. This approach makes the assumptions relatively easy to understand, while at the same time emphasizing that assumptions concerning the underlying population and the method of sampling need to be carefully considered in applying any econometric method.

A unifying theme in this book is the analogy approach to estimation, as exposited by Goldberger (1991) and Manski (1988). [For nonlinear estimation methods with cross section data, Manski (1988) covers several of the topics included here in a more compact format.] Loosely, the analogy principle states that an estimator is chosen to solve the sample counterpart of a problem solved by the population parameter. The analogy approach is complemented nicely by asymptotic analysis, and that is the focus here.

By focusing on asymptotic properties I do not mean to imply that small-sample properties of estimators and test statistics are unimportant. However, one typically first applies the analogy principle to devise a sensible estimator and then derives its asymptotic properties. This approach serves as a relatively simple guide to doing inference, and it works well in large samples (and often in samples that are not so large). Small-sample adjustments may improve performance, but such considerations almost always come after a large-sample analysis and are often done on a case-by-case basis.

The book contains proofs or outlines the proofs of many assertions, focusing on the role played by the assumptions with economic content while downplaying or ignoring regularity conditions. The book is primarily written to give applied researchers a very firm understanding of why certain methods work and to give students the background for developing new methods. But many of the arguments used throughout the book are representative of those made in modern econometric research (sometimes without the technical details). Students interested in doing research in cross section or panel data methodology will find much here that is not
available in other graduate texts.

I have also included several empirical examples with included data sets. Most of the data sets come from published work or are intended to mimic data sets used in modern empirical analysis. To save space I illustrate only the most commonly used methods on the most common data structures. Not surprisingly, these overlap considerably with methods that are packaged in econometric software programs. Other examples are of models where, given access to the appropriate data set, one could undertake an empirical analysis.

The numerous end-of-chapter problems are an important component of the book. Some problems contain important points that are not fully described in the text; others cover new ideas that can be analyzed using the tools presented in the current and previous chapters. Several of the problems require using the data sets that are included with the book.

As with any book, the topics here are selective and reflect what I believe to be the methods needed most often by applied researchers. I also give coverage to topics that have recently become important but are not adequately treated in other texts. Part I of the book reviews some tools that are elusive in mainstream econometrics books, in particular the notion of conditional expectations, linear projections, and various convergence results. Part II begins by applying these tools to the analysis of single-equation linear models using cross section data. In principle, much of this material should be review for students having taken a first-semester course. But starting with single-equation linear models provides a bridge from the classical analysis of linear models to a more modern treatment, and it is the simplest vehicle to illustrate the application of the tools in Part I. In addition, several methods that are used often in applications (but rarely covered adequately in texts) can be covered in a single framework.

I approach estimation of linear systems of equations with
endogenous variables from a different perspective than traditional treatments. Rather than begin with simultaneous equations models, we study estimation of a general linear system by instrumental variables. This approach allows us to later apply these results to models with the same statistical structure as simultaneous equations models, including panel data models. Importantly, we can study the generalized method of moments estimator from the beginning and easily relate it to the more traditional three-stage least squares estimator.

The analysis of general estimation methods for nonlinear models in Part III begins with a general treatment of asymptotic theory of estimators obtained from nonlinear optimization problems. Maximum likelihood, partial maximum likelihood, and generalized method of moments estimation are shown to be generally applicable estimation approaches. The method of nonlinear least squares is also covered as a method for estimating models of conditional means.

Part IV covers several nonlinear models used by modern applied researchers. Chapters 15 and 16 treat limited dependent variable models, with attention given to handling certain endogeneity problems in such models. Panel data methods for binary response and censored variables, including some new estimation approaches, are also covered in these chapters. Chapter 17 contains a treatment of sample selection problems for both cross section and panel data, including some recent advances. The focus is on the case where the population model is linear, but some results are given for nonlinear models as well. Attrition in panel data models is also covered, as are methods for dealing with stratified samples. Recent approaches to estimating average treatment effects are treated in Chapter 18. Poisson and related regression models, both for cross section and panel data, are treated in Chapter 19. These rely heavily on the method of quasi-maximum likelihood estimation. A brief but modern treatment of duration
models is provided in Chapter 20.

I have given short shrift to some important, albeit more advanced, topics. The setting here is, at least in modern parlance, essentially parametric. I have not included detailed treatment of recent advances in semiparametric or nonparametric analysis. In many cases these topics are not conceptually difficult. In fact, many semiparametric methods focus primarily on estimating a finite dimensional parameter in the presence of an infinite dimensional nuisance parameter, a feature shared by traditional parametric methods, such as nonlinear least squares and partial maximum likelihood. It is estimating infinite dimensional parameters that is conceptually and technically challenging.

At the appropriate point, in lieu of treating semiparametric and nonparametric methods, I mention when such extensions are possible, and I provide references. A benefit of a modern approach to parametric models is that it provides a seamless transition to semiparametric and nonparametric methods. General surveys of semiparametric and nonparametric methods are available in Volume 4 of the Handbook of Econometrics (see Powell, 1994, and Härdle and Linton, 1994) as well as in Volume 11 of the Handbook of Statistics (see Horowitz, 1993, and Ullah and Vinod, 1993).

I only briefly treat simulation-based methods of estimation and inference. Computer simulations can be used to estimate complicated nonlinear models when traditional optimization methods are ineffective. The bootstrap method of inference and confidence interval construction can improve on asymptotic analysis. Volume 4 of the Handbook of Econometrics and Volume 11 of the Handbook of Statistics contain nice surveys of these topics (Hajivassiliou and Ruud, 1994; Hall, 1994; Hajivassiliou, 1993; and Keane, 1993).

On an organizational note, I refer to sections throughout the book first by chapter number followed by section number and, sometimes, subsection number. Therefore, Section 6.3 refers to Section 3 in Chapter 6,
and Section 13.8.3 refers to Subsection 3 of Section 8 in Chapter 13. By always including the chapter number, I hope to minimize confusion.

Possible Course Outlines

If all chapters in the book are covered in detail, there is enough material for two semesters. For a one-semester course, I use a lecture or two to review the most important concepts in Chapters 2 and 3, focusing on conditional expectations and basic limit theory. Much of the material in Part I can be referred to at the appropriate time. Then I cover the basics of ordinary least squares and two-stage least squares in Chapters 4, 5, and 6. Chapter 7 begins the topics that most students who have taken one semester of econometrics have not previously seen. I spend a fair amount of time on Chapters 10 and 11, which cover linear unobserved effects panel data models.

Part III is technically more difficult than the rest of the book. Nevertheless, it is fairly easy to provide an overview of the analogy approach to nonlinear estimation, along with computing asymptotic variances and test statistics, especially for maximum likelihood and partial maximum likelihood methods.

In Part IV, I focus on binary response and censored regression models. If time permits, I cover the rudiments of quasi-maximum likelihood in Chapter 19, especially for count data, and give an overview of some important issues in modern duration analysis (Chapter 20).

For topics courses that focus entirely on nonlinear econometric methods for cross section and panel data, Part III is a natural starting point. A full-semester course would carefully cover the material in Parts III and IV, probably supplementing the parametric approach used here with popular semiparametric methods, some of which are referred to in Part IV. Parts III and IV can also be used for a half-semester course on nonlinear econometrics, where Part III is not covered in detail if the course has an applied orientation.

A course in applied econometrics can select topics from all parts of the book,
emphasizing assumptions but downplaying derivations. The several empirical examples and data sets can be used to teach students how to use advanced econometric methods. The data sets can be accessed by visiting the website for the book at MIT Press: http://mitpress.mit.edu/Wooldridge-EconAnalysis