CHAPTER 11

The Regression Fallacy

Only for the sake of this exercise we will assume that "intelligence" is an innate property of individuals and can be represented by a real number z. If one picks at random a student entering the U of U, the intelligence of this student is a random variable which we assume to be normally distributed with mean µ and standard deviation τ. Also assume every student has to take two intelligence tests, the first at the beginning of his or her studies, the other half a year later. The outcomes of these tests are x and y. x and y measure the intelligence z (which is assumed to be the same in both tests) plus a random error ε or δ, i.e.,

    x = z + ε    (11.0.14)
    y = z + δ    (11.0.15)

Here z ∼ N(µ, τ²), ε ∼ N(0, σ²), and δ ∼ N(0, σ²) (i.e., we assume that both errors have the same variance). The three variables ε, δ, and z are independent of each other. Therefore x and y are jointly normal, with var[x] = τ² + σ², var[y] = τ² + σ², and cov[x, y] = cov[z + ε, z + δ] = τ² + 0 + 0 + 0 = τ². Therefore ρ = τ²/(τ² + σ²). The contour lines of the joint density are ellipses with center (µ, µ) whose main axes have the directions of the lines y = x and y = −x in the x,y-plane.

Now what is the conditional mean? Since var[x] = var[y], (10.3.17) gives the line E[y|x=x] = µ + ρ(x − µ), i.e., a line which goes through the center of the ellipses but which is flatter than the line y = x representing the real underlying relationship if there are no errors. Geometrically one can get it as the line which intersects each ellipse exactly where the ellipse is vertical.

Therefore the parameters of the best prediction of y on the basis of x are not the parameters of the underlying relationship. Why not? Because not only y but also x is subject to errors. Assume you pick an individual at random, and it turns out that his or her first test result is very much higher than the average. Then it is more likely that this is an individual who was lucky in the first exam, and whose true IQ is lower than the one measured, than that the individual is an Einstein who had a bad day. This is simply because z is normally distributed, i.e., among the students entering a given university, there are more individuals with lower IQs than Einsteins. In order to make a good prediction of the result of the second test one must make allowance for the fact that the individual's IQ is most likely lower than his first score indicated; therefore one will predict the second score to be lower than the first score. The converse is true for individuals who scored lower than average, i.e., in your prediction you will act as if a "regression towards the mean" had taken place.

The next important point to note here is: the "true regression line," i.e., the prediction line, is uniquely determined by the joint distribution of x and y. However, the line representing the underlying relationship can only be determined if one has information in addition to the joint density, i.e., in addition to the observations. E.g., assume the two tests have different standard deviations, which may be the case simply because the second test has more questions and is therefore more accurate. Then the underlying 45° line is no longer one of the main axes of the ellipse! To be more precise, the underlying line can only be identified if one knows the ratio of the variances, or if one knows one of the two variances.
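The two claims above, the flattened prediction line with slope ρ and the regression towards the mean, can be illustrated with a short simulation. This is a minimal sketch assuming numpy; the values of µ, τ, and σ are chosen for illustration and do not come from the text:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, tau, sigma = 100.0, 10.0, 5.0   # illustrative values only
    n = 100_000

    z = rng.normal(mu, tau, n)          # true intelligence
    x = z + rng.normal(0, sigma, n)     # first test score
    y = z + rng.normal(0, sigma, n)     # second test score

    # Theoretical slope of E[y|x]: rho = tau^2 / (tau^2 + sigma^2)
    rho = tau**2 / (tau**2 + sigma**2)

    # Empirical slope of the best linear predictor of y from x
    slope = np.cov(x, y)[0, 1] / np.var(x)
    print(rho, slope)                   # both about 0.8, flatter than the 45° line

    # Students scoring 15+ points above the mean on the first test score
    # lower, on average, on the second test: regression towards the mean.
    high = x > mu + 15
    print(x[high].mean(), y[high].mean())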
Without any knowledge of the variances, the only thing one can say about the underlying line is that it lies between the line predicting y on the basis of x and the line predicting x on the basis of y.

The name "regression" stems from a confusion between the prediction line and the real underlying relationship. Francis Galton, the cousin of the famous Darwin, measured the heights of fathers and sons, and concluded from his evidence that the heights of sons tended to be closer to the average height than the heights of their fathers, a purported law of "regression towards the mean." Problem 180 illustrates this:

Problem 180. The evaluation of two intelligence tests, one at the beginning of the semester, one at the end, gives the following disturbing outcome: While the underlying intelligence during the first test was z ∼ N(100, 20), it changed between the first and second test due to the learning experience at the university. If w is the intelligence of each student at the second test, it is connected to his intelligence z at the first test by the formula w = 0.5z + 50, i.e., those students with intelligence below 100 gained, but those students with intelligence above 100 lost. (The errors of both intelligence tests are normally distributed with expected value zero; the variance of the first intelligence test was 5, and that of the second test, which had more questions, was 4. As usual, the errors are independent of each other and of the actual intelligence.)

• a. 3 points If x and y are the outcomes of the first and second intelligence test, compute E[x], E[y], var[x], var[y], and the correlation coefficient ρ = corr[x, y]. Figure 1 shows an equi-density line of their joint distribution; 95% of the probability mass of the test results are inside this ellipse. Draw the line w = 0.5z + 50 into Figure 1.

Answer. We know z ∼ N(100, 20), w = 0.5z + 50, x = z + ε with ε ∼ N(0, 5), and y = w + δ with δ ∼ N(0, 4); therefore E[x] = 100; E[y] = 100; var[x] = 20 + 5 = 25; var[y] = 5 + 4 = 9; cov[x, y] = 10; corr[x, y] = 10/15 = 2/3. In matrix notation

    (x, y)′ ∼ N((100, 100)′, [25, 10; 10, 9])    (11.0.16)

The line y = 50 + 0.5x goes through the points (80, 90) and (120, 110).

• b. 4 points Compute E[y|x=x] and E[x|y=y]. The first is a linear function of x and the second a linear function of y. Draw the two lines representing these linear functions into Figure 1. Use (10.3.18) for this.

Answer.

    E[y|x=x] = 100 + (10/25)(x − 100) = 60 + (2/5)x    (11.0.17)
    E[x|y=y] = 100 + (10/9)(y − 100) = −100/9 + (10/9)y    (11.0.18)

The line y = E[y|x=x] goes through the points (80, 92) and (120, 108) at the edge of Figure 1; it intersects the ellipse where it is vertical. The line x = E[x|y=y] goes through the points (80, 82) and (120, 118), which are the corner points of Figure 1; it intersects the ellipse where it is horizontal. The two lines intersect in the center of the ellipse, i.e., at the point (100, 100).

• c. 2 points Another researcher says that w = (6/10)z + 40, z ∼ N(100, 100/6), ε ∼ N(0, 50/6), δ ∼ N(0, 3). Is this compatible with the data?

Answer. Yes, it is compatible: E[x] = E[z] + E[ε] = 100; E[y] = E[w] + E[δ] = (6/10) · 100 + 40 = 100; var[x] = 100/6 + 50/6 = 25; var[y] = (6/10)² var[z] + var[δ] = (36/100)(100/6) + 3 = 9; cov[x, y] = (6/10) var[z] = 10.
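The moment computations in parts a to c can be verified mechanically. A minimal sketch using exact rational arithmetic from Python's standard-library fractions module (all variable names are mine):

    from fractions import Fraction as F

    # Part a: z ~ N(100, 20), w = 0.5 z + 50, var[eps] = 5, var[delta] = 4
    var_z, var_eps, var_delta, b = F(20), F(5), F(4), F(1, 2)

    var_x  = var_z + var_eps            # 25
    var_y  = b**2 * var_z + var_delta   # 9
    cov_xy = b * var_z                  # 10
    corr   = cov_xy / F(5 * 3)          # sd[x] = 5, sd[y] = 3, so corr = 2/3
    print(var_x, var_y, cov_xy, corr)

    # Part b: the line E[y|x] = 100 + (cov/var_x)(x - 100)
    for x0 in (80, 120):
        print(x0, 100 + cov_xy / var_x * (x0 - 100))   # (80, 92), (120, 108)

    # Part c: the alternative parametrization reproduces the same moments
    var_z2, var_eps2, var_delta2, b2 = F(100, 6), F(50, 6), F(3), F(6, 10)
    print(var_z2 + var_eps2, b2**2 * var_z2 + var_delta2, b2 * var_z2)  # 25 9 10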
• d. 4 points A third researcher asserts that the IQ of the students really did not change. He says w = z, z ∼ N(100, 5), ε ∼ N(0, 20), δ ∼ N(0, 4). Is this compatible with the data? Is there unambiguous evidence in the data that the IQ declined?

Answer. This is not compatible. This scenario gets everything right except the covariance: E[x] = E[z] + E[ε] = 100; E[y] = E[z] + E[δ] = 100; var[x] = 5 + 20 = 25; var[y] = 5 + 4 = 9; but cov[x, y] = var[z] = 5 instead of the observed 10. No scenario in which both tests measure the same underlying intelligence can be found. Since the two conditional expectations are on the same side of the diagonal, the hypothesis that the intelligence did not change between the two tests is not consistent with the joint distribution of x and y. The diagonal goes through the points (82, 82) and (118, 118), i.e., it intersects the two horizontal boundaries of Figure 1.

We just showed that the parameters of the true underlying relationship cannot be inferred from the data alone if there are errors in both variables. We also showed that this lack of identification is not complete, because one can specify an interval which in the plim contains the true parameter value. Chapter 53 has a much more detailed discussion of all this. There we will see that this lack of identification can be removed if more information is available, e.g., if one knows that the two error variances are equal, or if one knows that the regression has zero intercept, etc. Problem 181 shows that in this latter case, the OLS estimate is not consistent, but other estimates exist that are consistent.

Problem 181. [Fri57, Chapter 3] According to Friedman's permanent income hypothesis, drawing at random families in a given country and asking them about their income y and consumption c can be modeled as the independent observations of two random variables which satisfy

    y = yᵖ + yᵗ    (11.0.19)
    c = cᵖ + cᵗ    (11.0.20)
    cᵖ = βyᵖ    (11.0.21)

Here yᵖ and cᵖ are the permanent and yᵗ and cᵗ the transitory components of income and consumption. These components are not observed separately; only their sums y and c are observed. We assume that the permanent income yᵖ is random, with E[yᵖ] = µ ≠ 0 and var[yᵖ] = τy². The transitory components yᵗ and cᵗ are assumed to be independent of each other and of yᵖ, and E[yᵗ] = 0, var[yᵗ] = σy², E[cᵗ] = 0, and var[cᵗ] = σc². Finally, it is assumed that all variables are normally distributed.

• a. 2 points Given the above information, write down the vector of expected values E[(y, c)′] and the covariance matrix V[(y, c)′] in terms of the five unknown parameters of the model µ, β, τy², σy², and σc².

Answer.

    E[(y, c)′] = (µ, βµ)′ and V[(y, c)′] = [τy² + σy², βτy²; βτy², β²τy² + σc²]    (11.0.22)

• b. 3 points Assume that you know the true parameter values and you observe a family's actual income y. Show that your best guess (minimum mean squared error) of this family's permanent income yᵖ is

    yᵖ* = (σy²/(τy² + σy²))µ + (τy²/(τy² + σy²))y.    (11.0.23)

Note: here we are guessing income, not yet consumption! Use (10.3.17) for this!

Answer. This answer also does the math for part c, using the parameter values given there. The best guess is the conditional mean

    E[yᵖ | y = 22,000] = E[yᵖ] + (cov[yᵖ, y]/var[y])(22,000 − E[y])
    = 12,000 + (16,000,000/20,000,000)(22,000 − 12,000) = 20,000

or equivalently

    E[yᵖ | y = 22,000] = µ + (τy²/(τy² + σy²))(22,000 − µ)
    = (σy²/(τy² + σy²))µ + (τy²/(τy² + σy²)) · 22,000
    = (0.2)(12,000) + (0.8)(22,000) = 20,000.
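A quick numerical check of (11.0.23), using the parameter values from part c below (β is not needed yet). This is a minimal sketch assuming numpy; the Monte Carlo part confirms that the formula is indeed the conditional mean:

    import numpy as np

    mu, tau_y, sigma_y = 12_000.0, 4_000.0, 2_000.0
    y_obs = 22_000.0

    w = tau_y**2 / (tau_y**2 + sigma_y**2)   # shrinkage weight on observed income
    y_perm = (1 - w) * mu + w * y_obs
    print(w, y_perm)                         # 0.8, 20000.0

    # Monte Carlo: average permanent income of families whose observed
    # income is close to 22,000 should also be about 20,000.
    rng = np.random.default_rng(1)
    yp = rng.normal(mu, tau_y, 1_000_000)
    y = yp + rng.normal(0, sigma_y, yp.size)
    band = np.abs(y - y_obs) < 100
    print(yp[band].mean())                   # close to 20,000

The weight w = 0.8 is the shrinkage factor: the observed income is pulled 20% of the way back towards the population mean.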
• c. 3 points To make things more concrete, assume the parameters are

    β = 0.7    (11.0.24)
    σy = 2,000    (11.0.25)
    σc = 1,000    (11.0.26)
    µ = 12,000    (11.0.27)
    τy = 4,000.    (11.0.28)

If a family's income is y = 22,000, what is your best guess of this family's permanent income yᵖ? Give an intuitive explanation why this best guess is smaller than 22,000.

Answer. Since the observed income of 22,000 is above the average of 12,000, chances are greater that it is someone with a positive transitory income than someone with a negative one.

• d. 2 points If a family's income is y, show that your best guess about this family's consumption is

    c* = β((σy²/(τy² + σy²))µ + (τy²/(τy² + σy²))y).    (11.0.29)

Instead of an exact mathematical proof you may also reason out how it can be obtained from (11.0.23). Give the numbers for a family whose actual income is 22,000.

Answer. This is 0.7 times the best guess about the family's permanent income, since the transitory consumption is uncorrelated with everything else and therefore must be predicted by 0. This is an acceptable answer, but one can also derive it from scratch:

    E[c | y = 22,000] = E[c] + (cov[c, y]/var[y])(22,000 − E[y])    (11.0.30)
    = βµ + (βτy²/(τy² + σy²))(22,000 − µ)
    = 8,400 + 0.7 · (16,000,000/20,000,000)(22,000 − 12,000) = 14,000    (11.0.31)

or

    = β((σy²/(τy² + σy²))µ + (τy²/(τy² + σy²)) · 22,000)    (11.0.32)
    = 0.7((0.2)(12,000) + (0.8)(22,000)) = (0.7)(20,000) = 14,000.    (11.0.33)

[...]

Assume y₁, y₂, and y₃ are independent N(µ, σ²). Define three new variables z₁, z₂, and z₃ as follows: z₁ is that multiple of ȳ which has variance σ². z₂ is that linear combination of z₁ and y₂ which has zero covariance with z₁ and has variance σ². z₃ is that linear combination of z₁, z₂, and y₃ which has zero covariance with both z₁ and z₂ and has again variance σ². These properties define z₁, z₂, and z₃ uniquely up to factors ±1, i.e., if z₁ satisfies the above conditions, then −z₁ does too, and these are the only two solutions.

• a. 2 points Write z₁ and z₂ (not yet z₃) as linear combinations of y₁, y₂, and y₃.

• b. 1 point To make the computation of z₃ less tedious, first show the following: if z₃ has zero covariance with z₁ and z₂, it also has zero covariance with y₂.

• c. 1 point Therefore z₃ is a linear combination of y₁ and y₃ only. Compute its coefficients.

• d. 1 point How does the joint distribution of z₁, z₂, and z₃ differ from that of y₁, y₂, and y₃? Since they are jointly normal, you merely have to look at the expected values, variances, and covariances.

• e. 2 points Show that z₁² + z₂² + z₃² = y₁² + y₂² + y₃². Is this a surprise?

• f. 1 point [...]

[...] the "Iron Law of Econometrics": they ignore that actual income is a measurement with error of the true underlying variable, permanent income.

Problem 182. This question follows the original article [SW76] much more closely than [HVdP02] does. Sargent and Wallace first reproduce the usual argument why "activist" policy rules, in which the Fed "looks at many things" and "leans against the wind," are superior [...] they think makes the best tradeoff between unemployment and inflation, by setting mₜ according to a rule with feedback:

    mₜ = g₀ + g₁yₜ₋₁ + εₜ    (11.0.39)

Show that the following values of g₀ and g₁

    g₀ = (y* − α)/β,  g₁ = −λ/β    (11.0.40)

represent an optimal monetary policy, since they bring the expected value of the steady state E[yₜ] to y* and minimize the steady state variance var[yₜ].

• c. 3 points [...]

[...] and

    D = [1/2, −1/2; −1/2, 1/2]

is symmetric and idempotent. D is singular because its determinant is zero.

• c. 1 point The joint distribution of y₁ and y₂ is bivariate normal; why did we then get a χ² with one, instead of two, degrees of freedom?

Answer. Because y₁ − ȳ and y₂ − ȳ are not independent; one is exactly the negative of the other; therefore summing their squares is really only the square [...]
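The claims about D in this excerpt can be checked in a few lines. A minimal numerical sketch assuming numpy; the test vector y is an arbitrary example of mine:

    import numpy as np

    # D maps (y1, y2) to the deviations from their mean, cf. (12.3.16)
    D = np.array([[0.5, -0.5],
                  [-0.5, 0.5]])

    print(np.allclose(D, D.T))         # True: symmetric
    print(np.allclose(D @ D, D))       # True: idempotent
    print(np.linalg.det(D))            # 0.0: singular
    print(np.linalg.matrix_rank(D))    # 1, hence one degree of freedom

    # (y1 - ybar)^2 + (y2 - ybar)^2 equals (y1 - y2)^2 / 2
    y = np.array([3.0, 7.0])
    d = D @ y                          # deviations (-2, 2)
    print(d @ d, (y[0] - y[1])**2 / 2) # 8.0 8.0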
[...] parameters into formula (11.0.29) for the best linear predictor. Show that this is equivalent to using the ordinary least squares predictor c* = α̂ + β̂yₙ₊₁, where α̂ and β̂ are intercept and slope in the simple regression of c on y, i.e.,

    β̂ = Σ(yᵢ − ȳ)(cᵢ − c̄) / Σ(yᵢ − ȳ)²    (11.0.36)
    α̂ = c̄ − β̂ȳ    (11.0.37)

Note that we are regressing c on y with an intercept, [...]

[...] Show that in this system, the parameters g₀ and g₁ have no influence on the time path of y.

• e. 4 points On the other hand, the econometric estimations which the policy makers are running seem to show that these coefficients have an impact. During a certain period during which a constant policy rule g₀, g₁ is followed, the econometricians regress yₜ on yₜ₋₁ and mₜ in order [...]

[...] = (y₁ − y₂)²/2, and since z = (y₁ − y₂)/√(2σ²) ∼ N(0, 1), its square is a χ₁².

• b. 4 points Write down the covariance matrix of the vector

    (y₁ − ȳ, y₂ − ȳ)′    (12.3.15)

and show that it is singular.

Answer. (12.3.11) and (12.3.12) give

    (y₁ − ȳ, y₂ − ȳ)′ = [1/2, −1/2; −1/2, 1/2](y₁, y₂)′ = Dy    (12.3.16)

and V[Dy] = D V[y]D = σ²D because V[y] = σ²I and D = [1/2, −1/2; −1/2, 1/2] is symmetric and idempotent.
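To tie the Friedman excerpt together, here is a short simulation, assuming numpy and the part-c parameter values, illustrating the point of this passage: the OLS regression of c on y reproduces the best predictor (11.0.29), even though its slope converges to βτy²/(τy² + σy²) rather than to β itself.

    import numpy as np

    rng = np.random.default_rng(2)
    beta, mu, tau_y, sigma_y, sigma_c = 0.7, 12_000, 4_000, 2_000, 1_000
    n = 200_000

    yp = rng.normal(mu, tau_y, n)            # permanent income
    y = yp + rng.normal(0, sigma_y, n)       # observed income
    c = beta * yp + rng.normal(0, sigma_c, n)  # observed consumption

    # OLS slope and intercept of the regression of c on y
    b_hat = np.cov(y, c)[0, 1] / np.var(y)
    a_hat = c.mean() - b_hat * y.mean()

    # Coefficients implied by the best predictor (11.0.29)
    w = tau_y**2 / (tau_y**2 + sigma_y**2)
    print(b_hat, beta * w)                   # both about 0.56, not 0.7
    print(a_hat, beta * (1 - w) * mu)        # both about 1,680
    print(a_hat + b_hat * 22_000)            # about 14,000, matching (11.0.31)

In other words, OLS is exactly the right tool for predicting consumption from observed income, and at the same time an inconsistent estimator of the underlying β of (11.0.21); this attenuation is what the excerpt's "Iron Law of Econometrics" refers to.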