Deriving the expected value and variance of the OLS estimator $\hat{\beta}$ is facilitated by matrix algebra, but we must show some care in stating the assumptions.
Assumption E.1 (Linear in Parameters)
The model can be written as in (E.3), where $y$ is an observed $n \times 1$ vector, $X$ is an $n \times (k+1)$ observed matrix, and $u$ is an $n \times 1$ vector of unobserved errors or disturbances.
Assumption E.2 (No Perfect Collinearity)
The matrix $X$ has rank $(k+1)$.
This is a careful statement of the assumption that rules out linear dependencies among the explanatory variables. Under Assumption E.2, $X'X$ is nonsingular, so $\hat{\beta}$ is unique and can be written as in (E.8).
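As a quick numerical check of Assumption E.2, the following minimal sketch (using simulated data, so the dimensions and variable names are purely illustrative and not part of the text) verifies that $X$ has full column rank, so that $X'X$ is nonsingular.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2                         # illustrative sample size and number of slopes

# Design matrix with an intercept column: X is n x (k+1)
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

print(np.linalg.matrix_rank(X))      # should equal k + 1 under Assumption E.2
print(np.linalg.cond(X.T @ X))       # finite condition number: X'X is nonsingular
```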
Assumption E.3 (Zero Conditional Mean)
Conditional on the entire matrix $X$, each error $u_t$ has zero mean: $\mathrm{E}(u_t \mid X) = 0$, $t = 1, 2, \ldots, n$.
In vector form,
$$\mathrm{E}(u \mid X) = 0. \qquad \text{(E.11)}$$
This assumption is implied by MLR.4 under the random sampling assumption, MLR.2.
In time series applications, Assumption E.3 imposes strict exogeneity on the explanatory variables, something discussed at length in Chapter 10. This rules out explanatory variables whose future values are correlated with $u_t$; in particular, it eliminates lagged dependent variables. Under Assumption E.3, we can condition on the $x_{tj}$ when we compute the expected value of $\hat{\beta}$.
Theorem E.1 (Unbiasedness of OLS)
Under Assumptions E.1, E.2, and E.3, the OLS estimator $\hat{\beta}$ is unbiased for $\beta$.
PROOF: Use Assumptions E.1 and E.2 and simple algebra to write
$$\hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u)$$
$$= (X'X)^{-1}(X'X)\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u, \qquad \text{(E.12)}$$
where we use the fact that $(X'X)^{-1}(X'X) = I_{k+1}$. Taking the expectation conditional on $X$ gives
$$\mathrm{E}(\hat{\beta} \mid X) = \beta + (X'X)^{-1}X'\mathrm{E}(u \mid X) = \beta + (X'X)^{-1}X' \cdot 0 = \beta,$$
because $\mathrm{E}(u \mid X) = 0$ under Assumption E.3. This argument clearly does not depend on the value of $\beta$, so we have shown that $\hat{\beta}$ is unbiased.
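The algebra in (E.12) translates directly into a few lines of linear algebra code. The sketch below simulates a model satisfying Assumptions E.1 through E.3 (the true coefficients and error distribution are invented for illustration) and computes $\hat{\beta} = (X'X)^{-1}X'y$; in any one sample the estimate is close to, but not exactly equal to, the true $\beta$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2
beta = np.array([1.0, 0.5, -2.0])          # illustrative true parameters (intercept first)

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
u = rng.normal(size=n)                     # errors with E(u|X) = 0
y = X @ beta + u

# OLS estimator beta_hat = (X'X)^{-1} X'y, as in (E.8) and (E.12)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # unbiased for beta; close to [1.0, 0.5, -2.0]
```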
To obtain the simplest form of the variance-covariance matrix of $\hat{\beta}$, we impose the assumptions of homoskedasticity and no serial correlation.
Assumption E.4 (Homoskedasticity and No Serial Correlation)
(i) $\mathrm{Var}(u_t \mid X) = \sigma^2$, $t = 1, 2, \ldots, n$. (ii) $\mathrm{Cov}(u_t, u_s \mid X) = 0$, for all $t \neq s$. In matrix form, we can write these two assumptions as
$$\mathrm{Var}(u \mid X) = \sigma^2 I_n, \qquad \text{(E.13)}$$
where $I_n$ is the $n \times n$ identity matrix.
Part (i) of Assumption E.4 is the homoskedasticity assumption: the variance of $u_t$ cannot depend on any element of $X$, and the variance must be constant across observations, $t$. Part (ii) is the no serial correlation assumption: the errors cannot be correlated across observations. Under random sampling, and in any other cross-sectional sampling schemes with independent observations, part (ii) of Assumption E.4 automatically holds. For time series applications, part (ii) rules out correlation in the errors over time (both conditional on $X$ and unconditionally).
Because of (E.13), we often say that $u$ has a scalar variance-covariance matrix when Assumption E.4 holds. We can now derive the variance-covariance matrix of the OLS estimator.
Theorem E.2 (Variance-Covariance Matrix of the OLS Estimator) Under Assumptions E.1 through E.4,
$$\mathrm{Var}(\hat{\beta} \mid X) = \sigma^2 (X'X)^{-1}. \qquad \text{(E.14)}$$
PROOF: From the last formula in equation (E.12), we have
$$\mathrm{Var}(\hat{\beta} \mid X) = \mathrm{Var}[(X'X)^{-1}X'u \mid X] = (X'X)^{-1}X'[\mathrm{Var}(u \mid X)]X(X'X)^{-1}.$$
Now, we use Assumption E.4 to get
$$\mathrm{Var}(\hat{\beta} \mid X) = (X'X)^{-1}X'(\sigma^2 I_n)X(X'X)^{-1}$$
$$= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}.$$
Formula (E.14) means that the variance of $\hat{\beta}_j$ (conditional on $X$) is obtained by multiplying $\sigma^2$ by the $j$th diagonal element of $(X'X)^{-1}$. For the slope coefficients, we gave an interpretable formula in equation (3.51). Equation (E.14) also tells us how to obtain the covariance between any two OLS estimates: multiply $\sigma^2$ by the appropriate off-diagonal element of $(X'X)^{-1}$. In Chapter 4, we showed how to avoid explicitly finding covariances for obtaining confidence intervals and hypothesis tests by appropriately rewriting the model.
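Continuing the simulated example (with an assumed value of $\sigma^2$ chosen purely for illustration), the variance-covariance matrix in (E.14) can be formed directly: the conditional variances of the $\hat{\beta}_j$ sit on the diagonal and the covariances off the diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
sigma2 = 1.5                               # assumed error variance for this illustration

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])

# Var(beta_hat | X) = sigma^2 (X'X)^{-1}, as in (E.14)
V = sigma2 * np.linalg.inv(X.T @ X)

var_bhat = np.diag(V)                      # Var(beta_hat_j | X): sigma^2 times j-th diagonal element
se_bhat = np.sqrt(var_bhat)                # conditional standard deviations
cov_b1_b2 = V[1, 2]                        # covariance between the two slope estimates
print(var_bhat, se_bhat, cov_b1_b2)
```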
We can now prove the Gauss-Markov Theorem in its full generality.
Theorem E.3 (Gauss-Markov Theorem)
Under Assumptions E.1 through E.4, $\hat{\beta}$ is the best linear unbiased estimator.
PROOF: Any other linear estimator of $\beta$ can be written as
$$\tilde{\beta} = A'y, \qquad \text{(E.15)}$$
where $A$ is an $n \times (k+1)$ matrix. In order for $\tilde{\beta}$ to be unbiased conditional on $X$, $A$ can consist of nonrandom numbers and functions of $X$. (For example, $A$ cannot be a function of $y$.) To see what further restrictions on $A$ are needed, write
$$\tilde{\beta} = A'(X\beta + u) = (A'X)\beta + A'u. \qquad \text{(E.16)}$$
Then,
$$\mathrm{E}(\tilde{\beta} \mid X) = A'X\beta + \mathrm{E}(A'u \mid X)$$
$$= A'X\beta + A'\mathrm{E}(u \mid X) \quad \text{because $A$ is a function of $X$}$$
$$= A'X\beta \quad \text{because } \mathrm{E}(u \mid X) = 0.$$
For $\tilde{\beta}$ to be an unbiased estimator of $\beta$, it must be true that $\mathrm{E}(\tilde{\beta} \mid X) = \beta$ for all $(k+1) \times 1$ vectors $\beta$, that is,
$$A'X\beta = \beta \quad \text{for all } (k+1) \times 1 \text{ vectors } \beta. \qquad \text{(E.17)}$$
Because $A'X$ is a $(k+1) \times (k+1)$ matrix, (E.17) holds if, and only if, $A'X = I_{k+1}$. Equations (E.15) and (E.17) characterize the class of linear, unbiased estimators for $\beta$.
Next, from (E.16), we have
$$\mathrm{Var}(\tilde{\beta} \mid X) = A'[\mathrm{Var}(u \mid X)]A = \sigma^2 A'A,$$
by Assumption E.4. Therefore,
$$\mathrm{Var}(\tilde{\beta} \mid X) - \mathrm{Var}(\hat{\beta} \mid X) = \sigma^2[A'A - (X'X)^{-1}]$$
$$= \sigma^2[A'A - A'X(X'X)^{-1}X'A] \quad \text{because } A'X = I_{k+1}$$
$$= \sigma^2 A'[I_n - X(X'X)^{-1}X']A$$
$$= \sigma^2 A'MA,$$
where $M \equiv I_n - X(X'X)^{-1}X'$. Because $M$ is symmetric and idempotent, $A'MA$ is positive semi-definite for any $n \times (k+1)$ matrix $A$. This establishes that the OLS estimator $\hat{\beta}$ is BLUE. Why is this important? Let $c$ be any $(k+1) \times 1$ vector and consider the linear combination $c'\beta = c_0\beta_0 + c_1\beta_1 + \ldots + c_k\beta_k$, which is a scalar. The unbiased estimators of $c'\beta$ are $c'\hat{\beta}$ and $c'\tilde{\beta}$. But
$$\mathrm{Var}(c'\tilde{\beta} \mid X) - \mathrm{Var}(c'\hat{\beta} \mid X) = c'[\mathrm{Var}(\tilde{\beta} \mid X) - \mathrm{Var}(\hat{\beta} \mid X)]c \geq 0,$$
because $[\mathrm{Var}(\tilde{\beta} \mid X) - \mathrm{Var}(\hat{\beta} \mid X)]$ is p.s.d. Therefore, when it is used for estimating any linear combination of $\beta$, OLS yields the smallest variance. In particular, $\mathrm{Var}(\hat{\beta}_j \mid X) \leq \mathrm{Var}(\tilde{\beta}_j \mid X)$ for any other linear, unbiased estimator of $\beta_j$.
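One way to see the Gauss-Markov result at work is a small Monte Carlo comparison. The sketch below (with invented parameters, not taken from the text) pits OLS against another linear unbiased estimator of the form $\tilde{\beta} = A'y$ with $A'X = I_{k+1}$, here a weighted estimator with arbitrary fixed positive weights. Under homoskedasticity, the simulated sampling variances of the OLS coefficients should come out no larger than those of the competitor, up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, reps = 100, 2, 5000
beta = np.array([1.0, 0.5, -2.0])           # illustrative true parameters

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # X held fixed: we condition on X
W = np.diag(rng.uniform(0.5, 2.0, size=n))  # arbitrary positive weights (not the true error variances)

ols_draws, wls_draws = [], []
for _ in range(reps):
    u = rng.normal(size=n)                  # homoskedastic, serially uncorrelated errors
    y = X @ beta + u
    ols_draws.append(np.linalg.solve(X.T @ X, X.T @ y))
    # Weighted estimator: linear in y and unbiased, since A'X = (X'WX)^{-1}X'WX = I
    wls_draws.append(np.linalg.solve(X.T @ W @ X, X.T @ W @ y))

ols_var = np.var(np.array(ols_draws), axis=0)
wls_var = np.var(np.array(wls_draws), axis=0)
print(ols_var)                              # elementwise no larger than wls_var
print(wls_var)
```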
The unbiased estimator of the error variance $\sigma^2$ can be written as $\hat{\sigma}^2 = \hat{u}'\hat{u}/(n - k - 1)$,
which is the same as equation (3.56).
Theorem E.4 (Unbiasedness of $\hat{\sigma}^2$)
Under Assumptions E.1 through E.4, $\hat{\sigma}^2$ is unbiased for $\sigma^2$: $\mathrm{E}(\hat{\sigma}^2 \mid X) = \sigma^2$ for all $\sigma^2 > 0$.
PROOF: Write $\hat{u} = y - X\hat{\beta} = y - X(X'X)^{-1}X'y = My = Mu$, where $M = I_n - X(X'X)^{-1}X'$, and the last equality follows because $MX = 0$. Because $M$ is symmetric and idempotent,
$$\hat{u}'\hat{u} = u'M'Mu = u'Mu.$$
Because $u'Mu$ is a scalar, it equals its trace. Therefore,
$$\mathrm{E}(u'Mu \mid X) = \mathrm{E}[\mathrm{tr}(u'Mu) \mid X] = \mathrm{E}[\mathrm{tr}(Muu') \mid X]$$
$$= \mathrm{tr}[\mathrm{E}(Muu' \mid X)] = \mathrm{tr}[M\,\mathrm{E}(uu' \mid X)]$$
$$= \mathrm{tr}(M\sigma^2 I_n) = \sigma^2\,\mathrm{tr}(M) = \sigma^2(n - k - 1).$$
The last equality follows from $\mathrm{tr}(M) = \mathrm{tr}(I_n) - \mathrm{tr}[X(X'X)^{-1}X'] = n - \mathrm{tr}[(X'X)^{-1}X'X] = n - \mathrm{tr}(I_{k+1}) = n - (k+1) = n - k - 1$. Therefore,
$$\mathrm{E}(\hat{\sigma}^2 \mid X) = \mathrm{E}(u'Mu \mid X)/(n - k - 1) = \sigma^2.$$
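The two key facts in this proof, $\hat{u} = Mu$ and $\mathrm{tr}(M) = n - k - 1$, are easy to confirm numerically. The sketch below (simulated data again, with an assumed $\sigma^2$) checks the trace and averages $\hat{\sigma}^2$ over many samples to illustrate its unbiasedness.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 100, 2, 5000
sigma2 = 1.5                                # assumed error variance for this illustration
beta = np.array([1.0, 0.5, -2.0])           # illustrative true parameters

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T    # residual-maker matrix M
print(np.trace(M))                          # equals n - k - 1 = 97

sig2_hats = []
for _ in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)
    y = X @ beta + u
    uhat = M @ y                            # uhat = My = Mu
    sig2_hats.append(uhat @ uhat / (n - k - 1))
print(np.mean(sig2_hats))                   # close to sigma2 = 1.5
```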