CHAPTER 27

Best Linear Prediction

Best Linear Prediction is the second basic building block for the linear model, in addition to the OLS model. Instead of estimating a nonrandom parameter $\beta$ about which no prior information is available, in the present situation one predicts a random variable $z$ whose mean and covariance matrix are known. Most models to be discussed below are somewhere between these two extremes.

Christensen's [Chr87] is one of the few textbooks which treat best linear prediction on the basis of known first and second moments in parallel with the regression model. The two models have indeed so much in common that they should be treated together.

27.1. Minimum Mean Squared Error, Unbiasedness Not Required

Assume the expected values of the random vectors $y$ and $z$ are known, and their joint covariance matrix is known up to an unknown scalar factor $\sigma^2 > 0$. We will write this as

(27.1.1)
$$\begin{bmatrix} y \\ z \end{bmatrix} \sim \left( \begin{bmatrix} \mu \\ \nu \end{bmatrix},\; \sigma^2 \begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix} \right), \qquad \sigma^2 > 0.$$

$y$ is observed but $z$ is not, and the goal is to predict $z$ on the basis of the observation of $y$.

There is a unique predictor of the form $z^* = B^* y + b^*$ (i.e., it is linear with a constant term; the technical term for this is "affine") with the following two properties: it is unbiased, and the prediction error is uncorrelated with $y$, i.e.,

(27.1.2)
$$C[z^* - z,\, y] = O.$$

The formulas for $B^*$ and $b^*$ are easily derived. Unbiasedness means $\nu = B^*\mu + b^*$; the predictor therefore has the form

(27.1.3)
$$z^* = \nu + B^*(y - \mu).$$

Since

(27.1.4)
$$z^* - z = B^*(y - \mu) - (z - \nu) = \begin{bmatrix} B^* & -I \end{bmatrix}\begin{bmatrix} y - \mu \\ z - \nu \end{bmatrix},$$

the zero correlation condition (27.1.2) translates into

(27.1.5)
$$B^*\Omega_{yy} = \Omega_{zy},$$

which, due to equation (A.5.13), holds for $B^* = \Omega_{zy}\Omega_{yy}^-$. Therefore the predictor

(27.1.6)
$$z^* = \nu + \Omega_{zy}\Omega_{yy}^-(y - \mu)$$

satisfies the two requirements.

Unbiasedness and condition (27.1.2) are sometimes interpreted to mean that $z^*$ is an optimal predictor. Unbiasedness is often naively (but erroneously) considered to be a necessary condition for good estimators. And if the prediction error were correlated with the observed variable, the argument goes, then it would be possible to improve the prediction. Theorem 27.1.1 shows that despite the flaws in the argument, the result which it purports to show is indeed valid: $z^*$ has the minimum MSE of all affine predictors, whether biased or not, of $z$ on the basis of $y$.

Theorem 27.1.1. In situation (27.1.1), the predictor (27.1.6) has, among all predictors of $z$ which are affine functions of $y$, the smallest MSE matrix. Its MSE matrix is

(27.1.7)
$$\mathrm{MSE}[z^*; z] = E[(z^* - z)(z^* - z)^\top] = \sigma^2(\Omega_{zz} - \Omega_{zy}\Omega_{yy}^-\Omega_{yz}) = \sigma^2\Omega_{zz.y}.$$

Proof. Look at any predictor of the form $\tilde z = \tilde B y + \tilde b$. Its bias is $\tilde d = E[\tilde z - z] = \tilde B\mu + \tilde b - \nu$, and by (23.1.2) one can write

(27.1.8)
$$E[(\tilde z - z)(\tilde z - z)^\top] = V[\tilde z - z] + \tilde d\tilde d^\top$$

(27.1.9)
$$= V\Bigl[\begin{bmatrix} \tilde B & -I \end{bmatrix}\begin{bmatrix} y \\ z \end{bmatrix}\Bigr] + \tilde d\tilde d^\top$$

(27.1.10)
$$= \sigma^2\begin{bmatrix} \tilde B & -I \end{bmatrix}\begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix}\begin{bmatrix} \tilde B^\top \\ -I \end{bmatrix} + \tilde d\tilde d^\top.$$

This MSE-matrix is minimized if and only if $d^* = o$ and $B^*$ satisfies (27.1.5). To see this, take any solution $B^*$ of (27.1.5), and write $\tilde B = B^* + \tilde D$. Since, due to theorem A.5.11, $\Omega_{zy} = \Omega_{zy}\Omega_{yy}^-\Omega_{yy}$, it follows $\Omega_{zy}B^{*\top} = \Omega_{zy}\Omega_{yy}^-\Omega_{yy}B^{*\top} = \Omega_{zy}\Omega_{yy}^-\Omega_{yz}$.
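Theorem 27.1.1 is easy to check numerically. The following sketch is not part of the original notes: all means, covariance blocks, and variable names (`mu`, `nu`, `Oyy`, `Oyz`, `Ozz`, `sigma2`) are invented illustration values standing in for $\mu$, $\nu$, $\Omega_{yy}$, $\Omega_{yz}$, $\Omega_{zz}$, $\sigma^2$ in (27.1.1). It forms the predictor (27.1.6) and verifies by simulation that the prediction error is (approximately) uncorrelated with $y$, cf. (27.1.2), and that the empirical MSE matrix is close to $\sigma^2\Omega_{zz.y}$ from (27.1.7).

```python
# Minimal numerical sketch (illustration only, not from the original text).
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0])          # E[y]
nu = np.array([0.5])               # E[z]
sigma2 = 2.0
Oyy = np.array([[2.0, 0.5],
                [0.5, 1.0]])
Oyz = np.array([[0.8],
                [0.3]])
Ozz = np.array([[1.5]])
Ozy = Oyz.T

# Best linear predictor (27.1.6): z* = nu + Ozy Oyy^- (y - mu)
B_star = Ozy @ np.linalg.pinv(Oyy)

# Theoretical MSE matrix (27.1.7): sigma^2 * Omega_zz.y
Ozz_y = Ozz - Ozy @ np.linalg.pinv(Oyy) @ Oyz
print("sigma^2 * Omega_zz.y =", sigma2 * Ozz_y)

# Monte Carlo check: draw (y, z) jointly normal, predict, inspect the error.
Omega = sigma2 * np.block([[Oyy, Oyz], [Ozy, Ozz]])
draws = rng.multivariate_normal(np.concatenate([mu, nu]), Omega, size=200_000)
y, z = draws[:, :2], draws[:, 2:]
z_star = nu + (y - mu) @ B_star.T

err = z_star - z
print("empirical MSE        =", (err.T @ err) / len(err))   # ~ sigma^2 * Ozz_y
print("cov(error, y)        =", np.cov(err.T, y.T)[0, 1:])  # ~ 0, cf. (27.1.2)
```

The pseudoinverse plays the role of the g-inverse $\Omega_{yy}^-$; for a positive definite $\Omega_{yy}$, as chosen here, it coincides with the ordinary inverse. Normality is used only to have something to simulate from; the theorem itself needs only the first and second moments.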
Therefore

$$\mathrm{MSE}[\tilde z; z] = \sigma^2\begin{bmatrix} B^* + \tilde D & -I \end{bmatrix}\begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix}\begin{bmatrix} B^{*\top} + \tilde D^\top \\ -I \end{bmatrix} + \tilde d\tilde d^\top$$

(27.1.11)
$$= \sigma^2\begin{bmatrix} B^* + \tilde D & -I \end{bmatrix}\begin{bmatrix} \Omega_{yy}\tilde D^\top \\ -\Omega_{zz.y} + \Omega_{zy}\tilde D^\top \end{bmatrix} + \tilde d\tilde d^\top$$

(27.1.12)
$$= \sigma^2(\Omega_{zz.y} + \tilde D\Omega_{yy}\tilde D^\top) + \tilde d\tilde d^\top.$$

The MSE matrix is therefore minimized (with minimum value $\sigma^2\Omega_{zz.y}$) if and only if $\tilde d = o$ and $\tilde D\Omega_{yy} = O$, which means that $\tilde B$, along with $B^*$, satisfies (27.1.5).

Problem 324. Show that the solution of this minimum MSE problem is unique in the following sense: if $B^*_1$ and $B^*_2$ are two different solutions of (27.1.5) and $y$ is any feasible observed value, plugged into equation (27.1.3) they will lead to the same predicted value $z^*$.

Answer. Comes from the fact that every feasible observed value of $y$ can be written in the form $y = \mu + \Omega_{yy}q$ for some $q$, therefore $B^*_i(y - \mu) = B^*_i\Omega_{yy}q = \Omega_{zy}q$.

The matrix $B^*$ is also called the regression matrix of $z$ on $y$, and the unscaled covariance matrix has the form

(27.1.13)
$$\Omega = \begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix} = \begin{bmatrix} \Omega_{yy} & \Omega_{yy}X^\top \\ X\Omega_{yy} & X\Omega_{yy}X^\top + \Omega_{zz.y} \end{bmatrix},$$

where we wrote here $B^* = X$ in order to make the analogy with regression clearer. A g-inverse is

(27.1.14)
$$\Omega^- = \begin{bmatrix} \Omega_{yy}^- + X^\top\Omega_{zz.y}^- X & -X^\top\Omega_{zz.y}^- \\ -\Omega_{zz.y}^- X & \Omega_{zz.y}^- \end{bmatrix},$$

and every g-inverse of the covariance matrix has a g-inverse of $\Omega_{zz.y}$ as its $zz$-partition. (Proof in Problem 592.)

If $\Omega = \begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix}$ is nonsingular, (27.1.5) is also solved by $B^* = -(\Omega^{zz})^-\Omega^{zy}$, where $\Omega^{zz}$ and $\Omega^{zy}$ are the corresponding partitions of the inverse $\Omega^{-1}$. See Problem 592 for a proof. Therefore instead of (27.1.6) the predictor can also be written

(27.1.15)
$$z^* = \nu - (\Omega^{zz})^{-1}\Omega^{zy}(y - \mu)$$

(note the minus sign) or

(27.1.16)
$$z^* = \nu - \Omega_{zz.y}\Omega^{zy}(y - \mu).$$

Problem 325. This problem utilizes the concept of a bounded risk estimator, which is not yet explained very well in these notes. Assume $y$, $z$, $\mu$, and $\nu$ are jointly distributed random vectors. First assume $\nu$ and $\mu$ are observed, but $y$ and $z$ are not. Assume we know that in this case, the best linear bounded MSE predictor of $y$ and $z$ is $\mu$ and $\nu$, with prediction errors distributed as follows:

(27.1.17)
$$\begin{bmatrix} y - \mu \\ z - \nu \end{bmatrix} \sim \left( \begin{bmatrix} o \\ o \end{bmatrix},\; \sigma^2\begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix} \right).$$

This is the initial information. Here it is unnecessary to specify the unconditional distributions of $\mu$ and $\nu$, i.e., $E[\mu]$ and $E[\nu]$ as well as the joint covariance matrix of $\mu$ and $\nu$ are not needed, even if they are known. Then in a second step assume that an observation of $y$ becomes available, i.e., now $y$, $\nu$, and $\mu$ are observed, but $z$ still isn't. Then the predictor

(27.1.18)
$$z^* = \nu + \Omega_{zy}\Omega_{yy}^-(y - \mu)$$

is the best linear bounded MSE predictor of $z$ based on $y$, $\mu$, and $\nu$.

• a. Give special cases of this specification in which $\mu$ and $\nu$ are constant and $y$ and $z$ random, one in which $\mu$, $\nu$, and $y$ are random and $z$ is constant, and one in which $\mu$ and $\nu$ are random and $y$ and $z$ are constant.

Answer. If $\mu$ and $\nu$ are constant, they are written $\mu$ and $\nu$. From this follows $\mu = E[y]$ and $\nu = E[z]$ and $\sigma^2\begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{zy} & \Omega_{zz} \end{bmatrix} = V\Bigl[\begin{smallmatrix} y \\ z \end{smallmatrix}\Bigr]$, and every linear predictor has bounded MSE. Then the proof is as given earlier in this chapter.
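As a sanity check on (27.1.14) and (27.1.15), one can verify numerically, for an invertible covariance matrix, that $-(\Omega^{zz})^{-1}\Omega^{zy}$ reproduces $B^* = \Omega_{zy}\Omega_{yy}^{-1}$ and that the block matrix in (27.1.14) is indeed a g-inverse of $\Omega$. The sketch below is illustrative only and not part of the original text; it reuses the made-up blocks from the previous sketch, and `Oinv_zz`, `Oinv_zy` denote the $zz$ and $zy$ partitions of $\Omega^{-1}$.

```python
# Numerical check of the partitioned-inverse identities around (27.1.13)-(27.1.15).
import numpy as np

Oyy = np.array([[2.0, 0.5],
                [0.5, 1.0]])
Oyz = np.array([[0.8],
                [0.3]])
Ozz = np.array([[1.5]])
Ozy = Oyz.T

Omega = np.block([[Oyy, Oyz], [Ozy, Ozz]])
Omega_inv = np.linalg.inv(Omega)

# Partition the inverse conformably with (y, z).
Oinv_zy = Omega_inv[2:, :2]
Oinv_zz = Omega_inv[2:, 2:]

B_star = Ozy @ np.linalg.inv(Oyy)                 # solves (27.1.5)
B_alt  = -np.linalg.inv(Oinv_zz) @ Oinv_zy        # the form used in (27.1.15)
print(np.allclose(B_star, B_alt))                 # True

# (27.1.14): a g-inverse of Omega built from Oyy^-, X = B* and Ozz.y.
Ozz_y = Ozz - Ozy @ np.linalg.inv(Oyy) @ Oyz
X = B_star
G = np.block([
    [np.linalg.inv(Oyy) + X.T @ np.linalg.inv(Ozz_y) @ X, -X.T @ np.linalg.inv(Ozz_y)],
    [-np.linalg.inv(Ozz_y) @ X,                            np.linalg.inv(Ozz_y)],
])
print(np.allclose(Omega @ G @ Omega, Omega))      # g-inverse property holds
```

In the nonsingular case the block matrix in (27.1.14) is the ordinary block inverse of $\Omega$, so the g-inverse property holds exactly; the g-inverse formulation only becomes essential when $\Omega_{yy}$ or $\Omega_{zz.y}$ is singular.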
But an example in which $\mu$ and $\nu$ are not known constants but are observed random variables, and $y$ is also a random variable but $z$ is constant, is (28.0.26). Another example, in which $y$ and $z$ both are constants and $\mu$ and $\nu$ random, is constrained least squares (29.4.3).

• b. Prove equation (27.1.18).

Answer. In this proof we allow all four of $\mu$, $\nu$, $y$, and $z$ to be random. A linear predictor based on $y$, $\mu$, and $\nu$ can be written as $\tilde z = By + C\mu + D\nu + d$, therefore $\tilde z - z = B(y - \mu) + (C + B)\mu + (D - I)\nu - (z - \nu) + d$ and $E[\tilde z - z] = o + (C + B)E[\mu] + (D - I)E[\nu] - o + d$. Assuming that $E[\mu]$ and $E[\nu]$ can be anything, the requirement of bounded MSE (or simply the requirement of unbiasedness, but this is not as elegant) gives $C = -B$ and $D = I$, therefore $\tilde z = \nu + B(y - \mu) + d$, and the estimation error is $\tilde z - z = B(y - \mu) - (z - \nu) + d$. Now continue as in the proof of theorem 27.1.1. I must still carry out this proof much more carefully!

Problem 326. 4 points. According to (27.1.2), the prediction error $z^* - z$ is uncorrelated with $y$. If the distribution is such that the prediction error is even independent of $y$ (as is the case if $y$ and $z$ are jointly normal), then $z^*$ as defined in (27.1.6) is the conditional mean $z^* = E[z|y]$, and its MSE-matrix as defined in (27.1.7) is the conditional variance $V[z|y]$.

Answer. From independence follows $E[z^* - z|y] = E[z^* - z]$, and by the law of iterated expectations $E[z^* - z] = o$. Rewrite this as $E[z|y] = E[z^*|y]$. But since $z^*$ is a function of $y$, $E[z^*|y] = z^*$. Now the proof that the conditional dispersion matrix is the MSE matrix:

(27.1.19)
$$V[z|y] = E[(z - E[z|y])(z - E[z|y])^\top|y] = E[(z - z^*)(z - z^*)^\top|y] = E[(z - z^*)(z - z^*)^\top] = \mathrm{MSE}[z^*; z].$$

Problem 327. Assume the expected values of $x$, $y$, and $z$ are known, and their joint covariance matrix is known up to an unknown scalar factor $\sigma^2 > 0$.

(27.1.20)
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \sim \left( \begin{bmatrix} \lambda \\ \mu \\ \nu \end{bmatrix},\; \sigma^2\begin{bmatrix} \Omega_{xx} & \Omega_{xy} & \Omega_{xz} \\ \Omega_{xy}^\top & \Omega_{yy} & \Omega_{yz} \\ \Omega_{xz}^\top & \Omega_{yz}^\top & \Omega_{zz} \end{bmatrix} \right).$$

$x$ is the original information, $y$ is additional information which becomes available, and $z$ is the variable which we want to predict on the basis of this information.

• a. 2 points. Show that $y^* = \mu + \Omega_{xy}^\top\Omega_{xx}^-(x - \lambda)$ is the best linear predictor of $y$ and $z^* = \nu + \Omega_{xz}^\top\Omega_{xx}^-(x - \lambda)$ the best linear predictor of $z$ on the basis of the observation of $x$, and that their joint MSE-matrix is

$$E\Bigl[\begin{bmatrix} y^* - y \\ z^* - z \end{bmatrix}\begin{bmatrix} (y^* - y)^\top & (z^* - z)^\top \end{bmatrix}\Bigr] = \sigma^2\begin{bmatrix} \Omega_{yy} - \Omega_{xy}^\top\Omega_{xx}^-\Omega_{xy} & \Omega_{yz} - \Omega_{xy}^\top\Omega_{xx}^-\Omega_{xz} \\ \Omega_{yz}^\top - \Omega_{xz}^\top\Omega_{xx}^-\Omega_{xy} & \Omega_{zz} - \Omega_{xz}^\top\Omega_{xx}^-\Omega_{xz} \end{bmatrix},$$

which can also be written

$$= \sigma^2\begin{bmatrix} \Omega_{yy.x} & \Omega_{yz.x} \\ \Omega_{yz.x}^\top & \Omega_{zz.x} \end{bmatrix}.$$

Answer. This part of the question is a simple application of the formulas derived earlier. For the MSE-matrix you first get

$$\sigma^2\left( \begin{bmatrix} \Omega_{yy} & \Omega_{yz} \\ \Omega_{yz}^\top & \Omega_{zz} \end{bmatrix} - \begin{bmatrix} \Omega_{xy}^\top \\ \Omega_{xz}^\top \end{bmatrix}\Omega_{xx}^-\begin{bmatrix} \Omega_{xy} & \Omega_{xz} \end{bmatrix} \right).$$

• b. 5 points. Show that the best linear predictor of $z$ on the basis of the observations of $x$ and $y$ has the form

(27.1.21)
$$z^{**} = z^* + \Omega_{yz.x}^\top\Omega_{yy.x}^-(y - y^*).$$

This is an important formula. All you need to compute $z^{**}$ is the best estimate $z^*$ before the new information $y$ became available, the best estimate $y^*$ of that new [...]
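Equation (27.1.21) says that the one-shot best linear predictor of $z$ from $(x, y)$ coincides with a two-step procedure: first predict from $x$ alone, then correct with the innovation $y - y^*$. The small numerical illustration below is not from the original notes: the $4\times 4$ covariance matrix is randomly generated (so positive definite with probability one), $x$ is taken to be two-dimensional, $y$ and $z$ scalar, and all means are set to zero for simplicity.

```python
# Check that the two-step update (27.1.21) equals one-step prediction from (x, y).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
Omega = A @ A.T                       # invented positive definite covariance
# partition: x = first two components, y = third, z = fourth
Oxx, Oxy, Oxz = Omega[:2, :2], Omega[:2, 2:3], Omega[:2, 3:4]
Oyy, Oyz, Ozz = Omega[2:3, 2:3], Omega[2:3, 3:4], Omega[3:4, 3:4]

lam, mu, nu = np.zeros(2), np.zeros(1), np.zeros(1)   # means, set to zero
x = rng.standard_normal(2)            # some observed values
y = rng.standard_normal(1)

# step 1: predict y and z from x alone (problem 327a)
y_star = mu + Oxy.T @ np.linalg.solve(Oxx, x - lam)
z_star = nu + Oxz.T @ np.linalg.solve(Oxx, x - lam)

# conditional (".x") blocks
Oyy_x = Oyy - Oxy.T @ np.linalg.solve(Oxx, Oxy)
Oyz_x = Oyz - Oxy.T @ np.linalg.solve(Oxx, Oxz)

# step 2: update with y, equation (27.1.21)
z_two_step = z_star + Oyz_x.T @ np.linalg.solve(Oyy_x, y - y_star)

# one-step prediction of z from the stacked vector (x, y)
Ovv = np.block([[Oxx, Oxy], [Oxy.T, Oyy]])
Ovz = np.vstack([Oxz, Oyz])
z_one_step = nu + Ovz.T @ np.linalg.solve(Ovv, np.concatenate([x, y]))

print(np.allclose(z_two_step, z_one_step))   # True
```

The agreement is exact, not approximate, since (27.1.21) is an algebraic identity in the covariance blocks; the random numbers only supply a generic test case.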
Problem 328. Assume $x$, $y$, and $z$ have a joint probability distribution, and the conditional expectation $E[z|x, y] = \alpha^* + A^*x + B^*y$ is linear in $x$ and $y$.

• a. 1 point. Show that $E[z|x] = \alpha^* + A^*x + B^*E[y|x]$. Hint: you may use the law of iterated expectations in the following form: $E[z|x] = E\bigl[E[z|x, y]\,\big|\,x\bigr]$.

Answer. With this hint it is trivial: $E[z|x] = E[\alpha^* + A^*x + B^*y\,|\,x] = \alpha^* + A^*x + B^*E[y|x]$.

• b. 1 point. The next three examples are from [CW99, pp. 264/5]: Assume $E[z|x, y] = 1 + 2x + 3y$, $x$ and $y$ are independent, and $E[y] = 2$. Compute $E[z|x]$.

Answer. According to the formula, $E[z|x] = 1 + 2x + 3E[y|x]$, but since $x$ and $y$ are independent, $E[y|x] = E[y] = 2$; therefore $E[z|x] = 7 + 2x$. I.e., the slope is the same, but the intercept changes.

• c. 1 point. Assume again $E[z|x, y] = 1 + 2x + 3y$, but now $x$ and $y$ are not independent but $E[y|x] = 2 - x$. Compute $E[z|x]$.

Answer. $E[z|x] = 1 + 2x + 3(2 - x) = 7 - x$. In this situation, both slope and intercept change, but it is still a linear relationship.

• d. 1 point. Again $E[z|x, y] = 1 + 2x + 3y$, and this time the relationship between $x$ and $y$ is nonlinear: $E[y|x] = 2 - e^x$. Compute $E[z|x]$.

Answer. $E[z|x] = 1 + 2x + 3(2 - e^x) = 7 + 2x - 3e^x$. This time the marginal relationship between $x$ and $z$ is no longer linear. This is so despite the fact that, if all the variables are included, i.e., if both $x$ and $y$ are included, then the relationship is linear.

• e. 1 point. Assume $E[f(z)|x, y] = 1 + 2x + 3y$, where $f$ is a nonlinear function, and $E[y|x] = 2 - x$. Compute $E[f(z)|x]$.

Answer. $E[f(z)|x] = 1 + 2x + 3(2 - x) = 7 - x$. [...]

[...] explanatory variables are linear.

Problem 329. In order to make relationship (27.1.22) more intuitive, assume $x$ and $\varepsilon$ are Normally distributed and independent of each other, and $E[\varepsilon] = 0$. Define $y = \alpha + \beta x + \varepsilon$.

• a. Show that $\alpha + \beta x$ is the best linear predictor of $y$ based on the observation of $x$.

Answer. Follows from the fact that the predictor is unbiased and the prediction error is uncorrelated with $x$.

• b. Express $\beta$ in terms of the variances and covariances of $x$ and $y$.

Answer. $\operatorname{cov}[x, y] = \beta\operatorname{var}[x]$, therefore $\beta = \operatorname{cov}[x, y]/\operatorname{var}[x]$.

• c. Since $x$ and $y$ are jointly normal, they can also be written $x = \gamma + \delta y + \omega$, where $\omega$ is independent of $y$. Express $\delta$ in terms of the variances and covariances of $x$ and $y$, and show that $\delta\operatorname{var}[y] = \beta\operatorname{var}[x]$.

Answer. $\delta = \operatorname{cov}[x, y]/\operatorname{var}[y]$, therefore $\delta\operatorname{var}[y] = \operatorname{cov}[x, y] = \beta\operatorname{var}[x]$.

• d. [...] assume $x_1$, $x_2$, and $\varepsilon$ are Normally distributed and independent of each other, and $E[\varepsilon] = 0$. Define $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$. Again express $\beta_1$ and $\beta_2$ in terms of variances and covariances of $x_1$, $x_2$, and $y$.

Answer. Since $x_1$ and $x_2$ are independent, one gets the same formulas as in the univariate case: from $\operatorname{cov}[x_1, y] = \beta_1\operatorname{var}[x_1]$ and $\operatorname{cov}[x_2, y] = \beta_2\operatorname{var}[x_2]$ follows $\beta_1 = \operatorname{cov}[x_1, y]/\operatorname{var}[x_1]$ and $\beta_2 = \operatorname{cov}[x_2, y]/\operatorname{var}[x_2]$.

• e. Since $x_1$ and $y$ are jointly normal, they can also be written $x_1 = \gamma_1 + \delta_1 y + \omega_1$, where $\omega_1$ is independent of $y$. Likewise, $x_2 = \gamma_2 + \delta_2 y + \omega_2$, where $\omega_2$ is independent of $y$. Express $\delta_1$ and $\delta_2$ in terms of the variances and covariances of $x_1$, $x_2$, and $y$, and show that

(27.1.23)
$$\begin{bmatrix} \delta_1 \\ \delta_2 \end{bmatrix}\operatorname{var}[y] = \begin{bmatrix} \operatorname{var}[x_1] & 0 \\ 0 & \operatorname{var}[x_2] \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.$$

This is (27.1.22) in the present situation.

Answer. $\delta_1 = \operatorname{cov}[x_1, y]/\operatorname{var}[y]$ and $\delta_2 = \operatorname{cov}[x_2, y]/\operatorname{var}[y]$. [...]
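Problem 329's identities are easy to see in a simulation. The sketch below is not part of the original notes; $\alpha$, $\beta$, and the variances are arbitrary illustration values. It draws $x$ and $\varepsilon$ independently, builds $y = \alpha + \beta x + \varepsilon$, recovers $\beta$ as $\operatorname{cov}[x,y]/\operatorname{var}[x]$ and the reverse-regression slope $\delta = \operatorname{cov}[x,y]/\operatorname{var}[y]$, and confirms $\delta\operatorname{var}[y] = \beta\operatorname{var}[x]$.

```python
# Simulation check of problem 329 (illustration only).
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
alpha, beta = 1.0, 0.7            # arbitrary illustration values
x = rng.normal(0.0, 2.0, n)       # var[x] = 4
eps = rng.normal(0.0, 1.0, n)     # independent of x, E[eps] = 0
y = alpha + beta * x + eps

cov_xy = np.cov(x, y)[0, 1]
beta_hat = cov_xy / np.var(x)     # ~ beta
delta_hat = cov_xy / np.var(y)    # reverse regression slope

print(beta_hat)                                      # ~ 0.7
print(delta_hat * np.var(y), beta_hat * np.var(x))   # ~ equal, both ~ cov[x,y]
```

Up to sampling noise the two printed products agree, since both are just estimates of $\operatorname{cov}[x,y]$; this is the scalar version of (27.1.23).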
[...] the linear model often deals with pairs of models which are nested in each other, one model either having more data or more stringent parameter restrictions than the other. We will discuss such nested models in three forms: in the remainder of the present chapter 28 we will see how estimates must be updated when more observations become available, in chapter 29 how the imposition of a linear constraint [...]

[...] (to standardize it one would have to divide it by its relative standard deviation). Compare this with (31.2.9).

Answer. (28.0.29) can either be derived from (28.0.25), or from the following alternative application of the updating principle: all the information which the old observations have for the estimate of $x_0^\top\beta$ is contained in $\hat y_0 = x_0^\top\hat\beta$. The information which the updated regression, which includes [...]