ECONOMETRICS

Bruce E. Hansen
© 2000, 2007
University of Wisconsin
www.ssc.wisc.edu/~bhansen

This Revision: January 18, 2007
Comments Welcome

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.

Contents

1 Introduction
  1.1 Economic Data
  1.2 Observational Data
  1.3 Economic Data

2 Regression and Projection
  2.1 Variables
  2.2 Conditional Density and Mean
  2.3 Regression Equation
  2.4 Conditional Variance
  2.5 Linear Regression
  2.6 Best Linear Predictor
  2.7 Technical Proofs
  2.8 Exercises

3 Least Squares Estimation
  3.1 Random Sample
  3.2 Estimation
  3.3 Least Squares
  3.4 Normal Regression Model
  3.5 Model in Matrix Notation
  3.6 Projection Matrices
  3.7 Residual Regression
  3.8 Bias and Variance
  3.9 Gauss-Markov Theorem
  3.10 Semiparametric Efficiency
  3.11 Multicollinearity
  3.12 Influential Observations
  3.13 Technical Proofs
  3.14 Exercises

4 Inference
  4.1 Sampling Distribution
  4.2 Consistency
  4.3 Asymptotic Normality
  4.4 Covariance Matrix Estimation
  4.5 Alternative Covariance Matrix Estimators
  4.6 Functions of Parameters
  4.7 t tests
  4.8 Confidence Intervals
  4.9 Wald Tests
  4.10 F Tests
  4.11 Normal Regression Model
  4.12 Problems with Tests of NonLinear Hypotheses
  4.13 Monte Carlo Simulation
  4.14 Estimating a Wage Equation
  4.15 Technical Proofs
  4.16 Exercises

5 Additional Regression Topics
  5.1 Generalized Least Squares
  5.2 Testing for Heteroskedasticity
  5.3 Forecast Intervals
  5.4 NonLinear Least Squares
  5.5 Least Absolute Deviations
  5.6 Quantile Regression
  5.7 Testing for Omitted NonLinearity
  5.8 Omitted Variables
  5.9 Irrelevant Variables
  5.10 Model Selection
  5.11 Technical Proofs
  5.12 Exercises

6 The Bootstrap
  6.1 Definition of the Bootstrap
  6.2 The Empirical Distribution Function
  6.3 Nonparametric Bootstrap
  6.4 Bootstrap Estimation of Bias and Variance
  6.5 Percentile Intervals
  6.6 Percentile-t Equal-Tailed Interval
  6.7 Symmetric Percentile-t Intervals
  6.8 Asymptotic Expansions
  6.9 One-Sided Tests
  6.10 Symmetric Two-Sided Tests
  6.11 Percentile Confidence Intervals
  6.12 Bootstrap Methods for Regression Models
  6.13 Exercises

7 Generalized Method of Moments
  7.1 Overidentified Linear Model
  7.2 GMM Estimator
  7.3 Distribution of GMM Estimator
  7.4 Estimation of the Efficient Weight Matrix
  7.5 GMM: The General Case
  7.6 Over-Identification Test
  7.7 Hypothesis Testing: The Distance Statistic
  7.8 Conditional Moment Restrictions
  7.9 Bootstrap GMM Inference
  7.10 Exercises

8 Empirical Likelihood
  8.1 Non-Parametric Likelihood
  8.2 Asymptotic Distribution of EL Estimator
  8.3 Overidentifying Restrictions
  8.4 Testing
  8.5 Numerical Computation
  8.6 Technical Proofs

9 Endogeneity
  9.1 Instrumental Variables
  9.2 Reduced Form
  9.3 Identification
  9.4 Estimation
  9.5 Special Cases: IV and 2SLS
  9.6 Bekker Asymptotics
  9.7 Identification Failure
  9.8 Exercises

10 Univariate Time Series
  10.1 Stationarity and Ergodicity
  10.2 Autoregressions
  10.3 Stationarity of AR(1) Process
  10.4 Lag Operator
  10.5 Stationarity of AR(k)
  10.6 Estimation
  10.7 Asymptotic Distribution
  10.8 Bootstrap for Autoregressions
  10.9 Trend Stationarity
  10.10 Testing for Omitted Serial Correlation
  10.11 Model Selection
  10.12 Autoregressive Unit Roots
  10.13 Technical Proofs

11 Multivariate Time Series
  11.1 Vector Autoregressions (VARs)
  11.2 Estimation
  11.3 Restricted VARs
  11.4 Single Equation from a VAR
  11.5 Testing for Omitted Serial Correlation
  11.6 Selection of Lag Length in a VAR
  11.7 Granger Causality
  11.8 Cointegration
  11.9 Cointegrated VARs

12 Limited Dependent Variables
  12.1 Binary Choice
  12.2 Count Data
  12.3 Censored Data
  12.4 Sample Selection

13 Panel Data
  13.1 Individual-Effects Model
  13.2 Fixed Effects
  13.3 Dynamic Panel Regression

14 Nonparametrics
  14.1 Kernel Density Estimation
  14.2 Asymptotic MSE for Kernel Estimates

A Matrix Algebra
  A.1 Notation
  A.2 Matrix Addition
  A.3 Matrix Multiplication
  A.4 Trace
  A.5 Rank and Inverse
  A.6 Determinant
  A.7 Eigenvalues
  A.8 Positive Definiteness
  A.9 Matrix Calculus
  A.10 Kronecker Products and the Vec Operator
  A.11 Vector and Matrix Norms

B Probability
  B.1 Foundations
  B.2 Random Variables
  B.3 Expectation
  B.4 Common Distributions
  B.5 Multivariate Random Variables
  B.6 Conditional Distributions and Expectation
  B.7 Transformations
  B.8 Normal and Related Distributions

C Asymptotic Theory
  C.1 Inequalities
  C.2 Weak Law of Large Numbers
  C.3 Convergence in Distribution
  C.4 Asymptotic Transformations

D Maximum Likelihood

E Numerical Optimization
  E.1 Grid Search
  E.2 Gradient Methods
  E.3 Derivative-Free Methods

Chapter 1

Introduction

Econometrics is the study of estimation and inference for economic models using economic data. Econometric theory concerns the study and development of tools and methods for applied econometric applications. Applied econometrics concerns the application of these tools to economic data.

1.1 Economic Data

An econometric study requires data for analysis. The quality of the study will be largely determined by the data available. There are three major types of economic data sets: cross-sectional, time-series, and panel. They are distinguished by the dependence structure across observations.

Cross-sectional data sets are characterized by mutually independent observations. Surveys are a typical source for cross-sectional data. The individuals surveyed may be persons, households, or corporations.

Time-series data is indexed by time.
Typical examples include macroeconomic aggregates, prices and interest rates. This type of data is characterized by serial dependence.

Panel data combines elements of cross-section and time-series. These data sets consist of surveys of a set of individuals, repeated over time. Each individual (person, household or corporation) is surveyed on multiple occasions.

1.2 Observational Data

A common econometric question is to quantify the impact of one set of variables on another variable. For example, a concern in labor economics is the returns to schooling: the change in earnings induced by increasing a worker's education, holding other variables constant. Another issue of interest is the earnings gap between men and women.

Ideally, we would use experimental data to answer these questions. To measure the returns to schooling, an experiment might randomly divide children into groups, mandate different levels of education to the different groups, and then follow the children's wage path as they mature and enter the labor force. The differences between the groups could be attributed to the different levels of education. However, experiments such as this are infeasible, even immoral!

Instead, most economic data is observational. To continue the above example, what we observe (through data collection) is the level of a person's education and their wage. We can measure the joint distribution of these variables, and assess the joint dependence. But we cannot infer causality, as we are not able to manipulate one variable to see the direct effect on the other.

For example, a person's level of education is (at least partially) determined by that person's choices and their achievement in education. These factors are likely to be affected by their personal abilities and attitudes towards work. The fact that a person is highly educated suggests a high level of ability. This is an alternative explanation for an observed positive correlation between educational levels and wages. High ability individuals do better in school, and therefore choose to attain higher levels of education, and their high ability is the fundamental reason for their high wages. The point is that multiple explanations are consistent with a positive correlation between schooling levels and wages. Knowledge of the joint distribution cannot distinguish between these explanations.

This discussion means that causality cannot be inferred from observational data alone. Causal inference requires identification, and this is based on strong assumptions. We will return to a discussion of some of these issues in Chapter 9.

1.3 Economic Data

Fortunately for economists, the development of the internet has provided a convenient forum for dissemination of economic data. Many large-scale economic datasets are available without charge from governmental agencies. An excellent starting point is the Resources for Economists Data Links, available at http://rfe.wustl.edu/Data/index.html. Some other excellent data sources are listed below.

Bureau of Labor Statistics: http://www.bls.gov/
Federal Reserve Bank of St. Louis: http://research.stlouisfed.org/fred2/
Board of Governors of the Federal Reserve System: http://www.federalreserve.gov/releases/
National Bureau of Economic Research: http://www.nber.org/
US Census: http://www.census.gov/econ/www/
Current Population Survey (CPS): http://www.bls.census.gov/cps/cpsmain.htm
Survey of Income and Program Participation (SIPP): http://www.sipp.census.gov/sipp/
Panel Study of Income Dynamics (PSID): http://psidonline.isr.umich.edu/
U.S. Bureau of Economic Analysis: http://www.bea.doc.gov/
CompuStat: http://www.compustat.com/www/
International Financial Statistics (IFS): http://ifs.apdi.net/imf/

Chapter 2

Regression and Projection

2.1 Variables

The most commonly applied econometric tool is regression. This is used when the goal is to quantify the impact of one set of variables (the regressors, conditioning variables, or covariates) on another variable (the dependent variable). We let $y$ denote the dependent variable and $(x_1, x_2, \ldots, x_k)$ denote the $k$ regressors. It is convenient to write the set of regressors as a vector in $\mathbb{R}^k$:

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix}. \qquad (2.1)$$

Following mathematical convention, real numbers (elements of the real line $\mathbb{R}$) are written using lower case italics such as $y$, and vectors (elements of $\mathbb{R}^k$) by lower case bold italics such as $x$. Upper case bold italics such as $X$ will be used for matrices.

The random variables $(y, x)$ have a distribution $F$ which we call the population. This "population" is infinitely large. This abstraction can be a source of confusion as it does not correspond to a physical population in the real world. The distribution $F$ is unknown, and the goal of statistical inference is to learn about features of $F$ from the sample.

At this point in our analysis it is unimportant whether the observations $y$ and $x$ come from continuous or discrete distributions. For example, many regressors in econometric practice are binary, taking on only the values 0 and 1, and are typically called dummy variables.

2.2 Conditional Density and Mean

To study how the distribution of $y$ varies with the variables $x$ in the population, we start with $f(y \mid x)$, the conditional density of $y$ given $x$.

To illustrate, Figure 2.1 displays the density¹ of hourly wages for men and women, from the population of white non-military wage earners with a college degree and 10-15 years of potential work experience. These are conditional density functions: the density of hourly wages conditional on race, gender, education and experience. The two density curves show the effect of gender on the distribution of wages, holding the other variables constant.

Figure 2.1: Wage Densities for White College Grads with 10-15 Years Work Experience

While it is easy to observe that the two densities are unequal, it is useful to have numerical measures of the difference. An important summary measure is the conditional mean

$$m(x) = E(y \mid x) = \int_{-\infty}^{\infty} y f(y \mid x) \, dy. \qquad (2.2)$$

In general, $m(x)$ can take any form, and exists so long as $E|y| < \infty$. In the example presented in Figure 2.1, the mean wage for men is $27.22, and that for women is $20.73. These are indicated in Figure 2.1 by the arrows drawn to the x-axis.

¹ These are nonparametric density estimates using a Gaussian kernel with the bandwidth selected by cross-validation. See Chapter 14. The data are from the 2004 Current Population Survey.
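To make the construction behind Figure 2.1 concrete, the following is a minimal sketch of how conditional density estimates and conditional means of this kind can be computed. It is not the author's code: the wage samples, lognormal parameters, and sample sizes are simulated placeholders, and the bandwidth uses scipy's default rule rather than the cross-validation method used for the figure.

```python
# Minimal sketch (simulated wages, not the CPS data used in Figure 2.1):
# kernel density estimates of f(y | gender) and the conditional means E(y | gender).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
wages_men = rng.lognormal(mean=3.2, sigma=0.5, size=1000)    # hypothetical sample
wages_women = rng.lognormal(mean=2.9, sigma=0.5, size=1000)  # hypothetical sample

# Gaussian-kernel density estimates of the two conditional densities
# (default bandwidth rule; the text's figure uses cross-validation instead)
f_men = gaussian_kde(wages_men)
f_women = gaussian_kde(wages_women)

grid = np.linspace(0.0, 80.0, 400)
density_men, density_women = f_men(grid), f_women(grid)

# Conditional means m(x) = E(y | x), the summary measures reported in the text
print(wages_men.mean(), wages_women.mean())
```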
Take a closer look at the density functions displayed in Figure 2.1. You can see that the right tail of the density is much thicker than the left tail. These are asymmetric (skewed) densities, which is a common feature of wage distributions. When a distribution is skewed, the mean is not necessarily a good summary of the central tendency.

In this context it is often convenient to transform the data by taking the (natural) logarithm. Figure 2.2 shows the density of log hourly wages for the same population, with mean log hourly wages drawn in with the arrows.

Figure 2.2: Log Wage Densities

The difference in the log mean wage between men and women is 0.30, which implies a 30% average wage difference for this population. This is a more robust measure of the typical wage gap between men and women than the difference in the untransformed wage means. For this reason, wage regressions typically use log wages as a dependent variable rather than the level of wages.

The comparison in Figure 2.1 is facilitated by the fact that the control variable (gender) is discrete. When the distribution of the control variable is continuous, then comparisons become more complicated. To illustrate, Figure 2.3 displays a scatter plot² of log wages against education levels. Assuming for simplicity that this is the true joint distribution, the solid line displays the conditional expectation of log wages varying with education. The conditional expectation function is close to linear; the dashed line is a linear projection approximation which will be discussed in Section 2.6. The main point to be learned from Figure 2.3 is that the conditional expectation describes the central tendency of the conditional distribution. Of particular interest to graduate students may be the observation that the difference between a B.A. and a Ph.D. degree in mean log hourly wages is 0.36, implying an average 36% difference in wage levels.

² White non-military male wage earners with 10-15 years of potential work experience.

2.3 Regression Equation

The regression error $e$ is defined to be the difference between $y$ and its conditional mean (2.2) evaluated at the observed value of $x$:

$$e = y - m(x).$$

By construction, this yields the formula

$$y = m(x) + e. \qquad (2.3)$$

Theorem 2.3.1 Properties of the regression error $e$:

1. $E(e \mid x) = 0$.
2. $E(e) = 0$.
3. $E(h(x) e) = 0$ for any function $h(\cdot)$.
4. $E(x e) = 0$.

To show the first statement, by the definition of $e$ and the linearity of conditional expectations,

$$E(e \mid x) = E((y - m(x)) \mid x) = E(y \mid x) - E(m(x) \mid x) = m(x) - m(x) = 0.$$

The remaining parts of the Theorem are left as an exercise.

The equations

$$y = m(x) + e, \qquad E(e \mid x) = 0$$

are often stated jointly as the regression framework. It is important to understand that this is a framework, not a model, because no restrictions have been placed on the joint distribution of the data. These equations hold true by definition. A regression model imposes further restrictions on the joint distribution; most typically, restrictions on the permissible class of regression functions $m(x)$.

The conditional mean also has the property of being the best predictor of $y$, in the sense of achieving the lowest mean squared error. To see this, let $g(x)$ be an arbitrary predictor of $y$ given $x$. The expected squared error using this prediction function is

$$E(y - g(x))^2 = E(e + m(x) - g(x))^2 = E e^2 + 2E\left(e\left(m(x) - g(x)\right)\right) + E(m(x) - g(x))^2 = E e^2 + E(m(x) - g(x))^2 \geq E e^2,$$

where the third equality uses Theorem 2.3.1.3. The right-hand side is minimized by setting $g(x) = m(x)$. Thus the mean squared error is minimized by the conditional mean.
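The best-predictor property can also be checked numerically. The sketch below uses a made-up conditional mean and simulated data (none of it from the text) to verify that predicting with $m(x)$ yields a smaller mean squared error than an arbitrary alternative $g(x)$.

```python
# Minimal simulation sketch (made-up m(x); not from the text): the conditional
# mean m(x) = E(y|x) attains a lower mean squared prediction error than an
# arbitrary alternative predictor g(x).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0.0, 2.0, size=n)
m = np.exp(0.5 * x)                  # a hypothetical conditional mean m(x)
y = m + rng.normal(size=n)           # y = m(x) + e with E(e | x) = 0

g = 1.0 + 0.8 * x                    # some other predictor of y given x

mse_m = np.mean((y - m) ** 2)        # approximately E e^2
mse_g = np.mean((y - g) ** 2)        # approximately E e^2 + E(m(x) - g(x))^2
print(mse_m, mse_g)                  # mse_m < mse_g
```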
[...] classic motivation for the estimator (3.4). Define the sum-of-squared errors (SSE) function

$$S_n(\beta) = \sum_{i=1}^{n} (y_i - x_i'\beta)^2 = \sum_{i=1}^{n} y_i^2 - 2\beta' \sum_{i=1}^{n} x_i y_i + \beta' \left( \sum_{i=1}^{n} x_i x_i' \right) \beta.$$

This is a quadratic function of $\beta$. To visualize this function, Figure 3.1 displays an example sum-of-squared errors function $S_n(\beta)$ for the case $k = 2$.

Figure 3.1: Sum-of-Squared Errors Function

The Ordinary Least Squares (OLS)

[...]

$Q$ plays an important role in least-squares theory so we will discuss some of its properties in detail. Observe that for any non-zero $\alpha \in \mathbb{R}^k$,

$$\alpha' Q \alpha = E(\alpha' x x' \alpha) = E(\alpha' x)^2 \geq 0,$$

so $Q$ is by construction positive semi-definite. It is invertible if and only if it is positive definite, which requires that for all non-zero $\alpha$, $E(\alpha' x)^2 > 0$. Equivalently, there cannot exist a non-zero vector $\alpha$ such that $\alpha' x = 0$ identically

[...]

smallest in the positive definite sense. The following result, known as the Gauss-Markov theorem, is a famous statement of the solution.

Theorem 3.9.1 Gauss-Markov. In the homoskedastic linear regression model, the best (minimum-variance) unbiased linear estimator is OLS.

The Gauss-Markov theorem is an efficiency justification for the least-squares estimator, but it is quite limited in scope. Not only has the class [...] particularly unsatisfactory, as the theorem leaves open the possibility that a non-linear or biased estimator could have lower mean squared error than the least-squares estimator.

3.10 Semiparametric Efficiency

In the previous section we presented the Gauss-Markov theorem as a limited efficiency justification for the least-squares estimator. A broader justification is provided in Chamberlain (1987), who established

[...]

individual estimates are reduced.

3.12 Influential Observations

The $i$'th observation is influential on the least-squares estimate if the deletion of the observation from the sample results in a meaningful change in $\hat{\beta}$. To investigate the possibility of influential observations, define the leave-one-out least-squares estimator of $\beta$, that is, the OLS estimator based on the sample excluding the $i$'th observation. This

[...]

of Caucasian non-military male wage earners with 12 years of education.

Figure 2.4: Hourly Wage as a Function of Experience

Another defect of linear projection is that it is sensitive to the marginal distribution of the regressors when the conditional mean is non-linear. We illustrate the issue in Figure 2.5 for a constructed joint distribution of $y$ and $x$. The solid line is the non-linear conditional

[...]

alternative estimator of $\rho^2$ proposed by Theil called "R-bar-squared" is

$$\bar{R}^2 = 1 - \frac{s^2}{\tilde{\sigma}_y^2}, \qquad \text{where} \qquad \tilde{\sigma}_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

Theil's estimator $\bar{R}^2$ is a ratio of adjusted variance estimators, and therefore is expected to be a better estimator of $\rho^2$ than the unadjusted estimator $R^2$.

3.4 Normal Regression Model

Another motivation for the least-squares estimator can be obtained from the normal regression

[...]

homoskedasticity, or in the projection model.

3.9 Gauss-Markov Theorem

In this section we restrict attention to the homoskedastic linear regression model, which is (2.8)-(2.9) plus $E(e_i^2 \mid x_i) = \sigma^2$. Now consider the class of estimators of $\beta$ which are linear functions of the vector $y$, and thus can be written as $\tilde{\beta} = A'y$ where $A$ is an $n \times k$ function of $X$. The least-squares estimator is the special case obtained by setting $A = X(X'X)^{-1}$.
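To connect these excerpts, here is a minimal numerical sketch (simulated data and hypothetical coefficient values, not from the text) of the least-squares estimator written both as the minimizer of the sum-of-squared-errors function $S_n(\beta)$ and as the linear estimator $A'y$ with $A = X(X'X)^{-1}$.

```python
# Minimal sketch (simulated data; not the author's code): OLS as the minimizer
# of S_n(beta) and as the linear estimator A'y with A = X (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # k = 2 regressors
beta_true = np.array([1.0, 0.5])                        # hypothetical coefficients
y = X @ beta_true + rng.normal(size=n)                  # homoskedastic errors

def sse(beta):
    """Sum-of-squared errors S_n(beta) = sum_i (y_i - x_i' beta)^2."""
    resid = y - X @ beta
    return resid @ resid

A = X @ np.linalg.inv(X.T @ X)        # A = X (X'X)^{-1}
beta_ols = A.T @ y                    # the OLS estimator (X'X)^{-1} X'y
print(beta_ols, sse(beta_ols))        # S_n is smallest at beta_ols
```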
[...]

the variance of the least-squares estimator is $(X'X)^{-1}\sigma^2$ and that of $A'y$ is $A'A\sigma^2$. It is sufficient to show that the difference $A'A - (X'X)^{-1}$ is positive semi-definite. Set $C = A - X(X'X)^{-1}$. Note that $X'C = 0$. Then we calculate that

$$A'A - (X'X)^{-1} = \left(C + X(X'X)^{-1}\right)'\left(C + X(X'X)^{-1}\right) - (X'X)^{-1} = C'C + C'X(X'X)^{-1} + (X'X)^{-1}X'C + (X'X)^{-1}X'X(X'X)^{-1} - (X'X)^{-1} = C'C.$$

The matrix $C'C$ is positive semi-definite (see Appendix A.7)

[...]

The log-likelihood function for the normal regression model is

$$\log L(\beta, \sigma^2) = \sum_{i=1}^{n} \log \left( \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left( -\frac{(y_i - x_i'\beta)^2}{2\sigma^2} \right) \right) = -\frac{n}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} S_n(\beta).$$

The MLE $(\hat{\beta}, \hat{\sigma}^2)$ maximize $\log L(\beta, \sigma^2)$. Since $\log L(\beta, \sigma^2)$ is a function of $\beta$ only through the sum of squared errors $S_n(\beta)$, maximizing the likelihood is identical to minimizing $S_n(\beta)$. Hence the MLE for $\beta$ equals the OLS estimator. Plugging $\hat{\beta}$ into the log-likelihood [...]
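The last excerpt can be illustrated numerically. The sketch below (again with simulated data and hypothetical parameter values, not from the text) checks that maximizing the normal-regression log-likelihood over $\beta$ reproduces the OLS estimate, since the likelihood depends on $\beta$ only through $S_n(\beta)$.

```python
# Minimal sketch (simulated data; illustrative only): the MLE of beta in the
# normal regression model coincides with the OLS estimator, because the
# log-likelihood depends on beta only through S_n(beta).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)       # hypothetical model
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

def neg_loglik(params):
    # params = (beta, log sigma^2); the log keeps the variance positive
    beta, sigma2 = params[:-1], np.exp(params[-1])
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma2) + (resid @ resid) / (2 * sigma2)

mle = minimize(neg_loglik, np.zeros(X.shape[1] + 1), method="BFGS").x
print(mle[:-1], beta_ols)             # the two beta estimates agree numerically
```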