CHAPTER 29

Constrained Least Squares

One of the assumptions for the linear model was that nothing is known about the true value of β. Any k-vector γ is a possible candidate for the value of β. We used this assumption, e.g., when we concluded that an unbiased estimator $\tilde{B}y$ of β must satisfy $\tilde{B}X = I$. Now we will modify this assumption and assume we know that the true value β satisfies the linear constraint $R\beta = u$. To fix notation, let y be an n × 1 vector, u an i × 1 vector, X an n × k matrix, and R an i × k matrix. In addition to our usual assumption that all columns of X are linearly independent (i.e., X has full column rank), we will also assume that all rows of R are linearly independent (i.e., R has full row rank). In other words, the matrix of constraints R does not include “redundant” constraints which are linear combinations of the other constraints.

29.1. Building the Constraint into the Model

Problem 337. Given a regression with a constant term and two explanatory variables which we will call x and z, i.e.,

$$y_t = \alpha + \beta x_t + \gamma z_t + \varepsilon_t \tag{29.1.1}$$

• a. 1 point How will you estimate β and γ if it is known that β = γ?

Answer. Write

$$y_t = \alpha + \beta(x_t + z_t) + \varepsilon_t \tag{29.1.2}$$

and regress $y_t$ on a constant and $x_t + z_t$; the estimated slope serves as the estimate of both β and γ.

• b. 1 point How will you estimate β and γ if it is known that β + γ = 1?

Answer. Setting γ = 1 − β gives the regression

$$y_t - z_t = \alpha + \beta(x_t - z_t) + \varepsilon_t \tag{29.1.3}$$

Regress $y_t - z_t$ on a constant and $x_t - z_t$; the estimate of γ is then one minus the estimate of β.

• c. 3 points Go back to a. If you add the original z as an additional regressor into the modified regression incorporating the constraint β = γ, then the coefficient of z is no longer an estimate of the original γ, but of a new parameter δ which is a linear combination of α, β, and γ. Compute this linear combination, i.e., express δ in terms of α, β, and γ. Remark (no proof required): this regression is equivalent to (29.1.1), and it allows you to test the constraint.

Answer.
If you add z as an additional regressor into (29.1.2), you get $y_t = \alpha + \beta(x_t + z_t) + \delta z_t + \varepsilon_t$. Now substitute the right-hand side of (29.1.1) for $y_t$ to get $\alpha + \beta x_t + \gamma z_t + \varepsilon_t = \alpha + \beta(x_t + z_t) + \delta z_t + \varepsilon_t$. Cancelling out gives $\gamma z_t = \beta z_t + \delta z_t$, in other words, γ = β + δ, i.e., δ = γ − β. In this regression, therefore, the coefficient of z is split into the sum of two terms: the first term is the value it should be if the constraint were satisfied, and the other term is the difference from that.

• d. 2 points Now do the same thing with the modified regression from part b which incorporates the constraint β + γ = 1: include the original z as an additional regressor and determine the meaning of the coefficient of z.

What Problem 337 suggests is true in general: every constrained Least Squares problem can be reduced to an equivalent unconstrained Least Squares problem with fewer explanatory variables. Indeed, one can consider every least squares problem to be “constrained” because the assumption E[y] = Xβ for some β is equivalent to a linear constraint on E[y]. The decision not to include certain explanatory variables in the regression can be considered the decision to set certain elements of β to zero, which is the imposition of a constraint. If one writes a certain regression model as a constrained version of some other regression model, this simply means that one is interested in the relationship between two nested regressions. Problem 273 is another example here.

29.2. Conversion of an Arbitrary Constraint into a Zero Constraint

This section, which is nothing but the matrix version of Problem 337, follows [DM93, pp. 16–19]. By reordering the elements of β one can write the constraint Rβ = u in the form

$$\begin{bmatrix} R_1 & R_2 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} \equiv R_1\beta_1 + R_2\beta_2 = u \tag{29.2.1}$$

where $R_1$ is a nonsingular i × i matrix. Why can that be done? The rank of R is i, i.e., all the rows are linearly independent.
Since row rank is equal to column rank, there are also i linearly independent columns. Use those for $R_1$. Using this same partition, the original regression can be written

$$y = X_1\beta_1 + X_2\beta_2 + \varepsilon \tag{29.2.2}$$

Now one can solve (29.2.1) for $\beta_1$ to get

$$\beta_1 = R_1^{-1}u - R_1^{-1}R_2\beta_2 \tag{29.2.3}$$

Plug (29.2.3) into (29.2.2) and rearrange to get a regression which is equivalent to the constrained regression:

$$y - X_1R_1^{-1}u = (X_2 - X_1R_1^{-1}R_2)\beta_2 + \varepsilon \tag{29.2.4}$$

or

$$y^* = Z_2\beta_2 + \varepsilon \tag{29.2.5}$$

where $y^* = y - X_1R_1^{-1}u$ and $Z_2 = X_2 - X_1R_1^{-1}R_2$.

One more thing is noteworthy here: if we add $X_1$ as additional regressors into (29.2.5), we get a regression that is equivalent to (29.2.2). To see this, define the difference between the left-hand side and the right-hand side of (29.2.3) as $\gamma_1 = \beta_1 - R_1^{-1}u + R_1^{-1}R_2\beta_2$; then the constraint (29.2.1) is equivalent to the “zero constraint” $\gamma_1 = o$, and the regression

$$y - X_1R_1^{-1}u = (X_2 - X_1R_1^{-1}R_2)\beta_2 + X_1(\beta_1 - R_1^{-1}u + R_1^{-1}R_2\beta_2) + \varepsilon \tag{29.2.6}$$

is equivalent to the original regression (29.2.2). (29.2.6) can also be written as

$$y^* = Z_2\beta_2 + X_1\gamma_1 + \varepsilon \tag{29.2.7}$$

The coefficient of $X_1$, if it is added back into (29.2.5), is therefore $\gamma_1$.

Problem 338. [DM93] assert on p. 17, middle, that

$$\mathcal{R}[X_1, Z_2] = \mathcal{R}[X_1, X_2] \tag{29.2.8}$$

(i.e., the two matrices have the same column space), where $Z_2 = X_2 - X_1R_1^{-1}R_2$. Give a proof.

Answer. We have to show

$$\{z : z = X_1\gamma + X_2\delta\} = \{z : z = X_1\alpha + Z_2\beta\} \tag{29.2.9}$$

First ⊂: given γ and δ we need an α and β with

$$X_1\gamma + X_2\delta = X_1\alpha + (X_2 - X_1R_1^{-1}R_2)\beta \tag{29.2.10}$$

This can be accomplished with β = δ and $\alpha = \gamma + R_1^{-1}R_2\delta$. The other side is even more trivial: given α and β, multiplying out the right side of (29.2.10) gives $X_1\alpha + X_2\beta - X_1R_1^{-1}R_2\beta$, i.e., δ = β and $\gamma = \alpha - R_1^{-1}R_2\beta$.

29.3. Lagrange Approach to Constrained Least Squares

The constrained least squares estimator is that k × 1 vector $\beta = \hat{\hat\beta}$ which minimizes $SSE = (y - X\beta)^\top(y - X\beta)$ subject to the linear constraint Rβ = u. Again, we assume that X has full column rank and R full row rank.

The Lagrange approach to constrained least squares, which we follow here, is given in [Gre97, Section 7.3 on pp. 341/2], also [DM93, pp. 90/1]: the Constrained Least Squares problem can be solved with the help of the “Lagrange function,” which is a function of the k × 1 vector β and an additional i × 1 vector λ of “Lagrange multipliers”:

$$L(\beta, \lambda) = (y - X\beta)^\top(y - X\beta) + (R\beta - u)^\top\lambda \tag{29.3.1}$$

λ can be considered a vector of “penalties” for violating the constraint. For every possible value of λ one computes that $\beta = \tilde\beta$ which minimizes L for that λ (this is an unconstrained minimization problem). It will turn out that for one of the values λ = λ*, the corresponding $\beta = \hat{\hat\beta}$ satisfies the constraint. This $\hat{\hat\beta}$ is the solution of the constrained minimization problem we are looking for.

Problem 339. 4 points Show the following: If $\beta = \hat{\hat\beta}$ is the unconstrained minimum argument of the Lagrange function

$$L(\beta, \lambda^*) = (y - X\beta)^\top(y - X\beta) + (R\beta - u)^\top\lambda^* \tag{29.3.2}$$

for some fixed value λ*, and if at the same time $\hat{\hat\beta}$ satisfies $R\hat{\hat\beta} = u$, then $\beta = \hat{\hat\beta}$ minimizes $(y - X\beta)^\top(y - X\beta)$ subject to the constraint Rβ = u.

Answer. Since $\hat{\hat\beta}$ minimizes the Lagrange function, we know that

$$(y - X\tilde\beta)^\top(y - X\tilde\beta) + (R\tilde\beta - u)^\top\lambda^* \ge (y - X\hat{\hat\beta})^\top(y - X\hat{\hat\beta}) + (R\hat{\hat\beta} - u)^\top\lambda^* \tag{29.3.3}$$

for all $\tilde\beta$. Since by assumption $\hat{\hat\beta}$ also satisfies the constraint, this simplifies to:

$$(y - X\tilde\beta)^\top(y - X\tilde\beta) + (R\tilde\beta - u)^\top\lambda^* \ge (y - X\hat{\hat\beta})^\top(y - X\hat{\hat\beta}) \tag{29.3.4}$$

This is still true for all $\tilde\beta$. If we only look at those $\tilde\beta$ which satisfy the constraint, we get

$$(y - X\tilde\beta)^\top(y - X\tilde\beta) \ge (y - X\hat{\hat\beta})^\top(y - X\hat{\hat\beta}) \tag{29.3.5}$$

This means that $\hat{\hat\beta}$ is the constrained minimum argument.

Instead of imposing the constraint itself, one imposes a penalty function which has such a form that the agents will “voluntarily” heed the constraint. This is a familiar principle in neoclassical economics: instead of restricting pollution to a certain level, tax the polluters so much that they will voluntarily stay within the desired level.

The proof which follows now not only derives the formula for $\hat{\hat\beta}$ but also shows that there is always a λ* for which $\hat{\hat\beta}$ satisfies $R\hat{\hat\beta} = u$.

Problem 340. 2 points Use the simple matrix differentiation rules $\partial(w^\top\beta)/\partial\beta^\top = w^\top$ and $\partial(\beta^\top M\beta)/\partial\beta^\top = 2\beta^\top M$ (for symmetric M) to compute $\partial L/\partial\beta^\top$ where

$$L(\beta) = (y - X\beta)^\top(y - X\beta) + (R\beta - u)^\top\lambda \tag{29.3.6}$$

Answer. Write the objective function as $y^\top y - 2y^\top X\beta + \beta^\top X^\top X\beta + \lambda^\top R\beta - \lambda^\top u$ to get (29.3.7).

Our goal is to find a $\hat{\hat\beta}$ and a λ* so that (a) $\beta = \hat{\hat\beta}$ minimizes $L(\beta, \lambda^*)$ and (b) $R\hat{\hat\beta} = u$. In other words, $\hat{\hat\beta}$ and λ* together satisfy the following two conditions: (a) they must satisfy the first order condition for the unconstrained minimization of L with respect to β, i.e., $\hat{\hat\beta}$ must annul

$$\partial L/\partial\beta^\top = -2y^\top X + 2\beta^\top X^\top X + \lambda^{*\top}R, \tag{29.3.7}$$

and (b) $\hat{\hat\beta}$ must satisfy the constraint (29.3.9).

(29.3.7) and (29.3.9) are two linear matrix equations which can indeed be solved for $\hat{\hat\beta}$ and λ*. I wrote (29.3.7) as a row vector, because the Jacobian of a scalar function is a row vector, but it is usually written as a column vector. Since this conventional notation is arithmetically a little simpler here, we will replace (29.3.7) with its transpose (29.3.8). Our starting point is therefore

$$2X^\top X\hat{\hat\beta} = 2X^\top y - R^\top\lambda^* \tag{29.3.8}$$

$$R\hat{\hat\beta} - u = o \tag{29.3.9}$$

Some textbook treatments have an extra factor 2 in front of λ*, which makes the math slightly smoother, but which has the disadvantage that the Lagrange multiplier can no longer be interpreted as the “shadow price” for violating the constraint.
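Equations (29.3.8) and (29.3.9) can be stacked into one square linear system in $\hat{\hat\beta}$ and λ*, and the result can be checked against the reparameterized regression (29.2.4) of Section 29.2. The following is a minimal NumPy sketch under our own assumptions: the function names, the simulated data, and the column ordering (x, z, constant) are illustrative choices that do not come from the text, and the reparameterization routine assumes that the leading i × i block of R already is the nonsingular $R_1$ instead of reordering columns. With the constraint β + γ = 1 on the coefficients of x and z, the reparameterized regression is y − x = γ(z − x) + α + ε, the mirror image of (29.1.3).

```python
import numpy as np


def constrained_ls_lagrange(X, y, R, u):
    """Solve (29.3.8) and (29.3.9) jointly as one bordered linear system.

    Stacked, the two conditions read
        [2 X'X   R'] [beta  ]   [2 X'y]
        [R       0 ] [lambda] = [u    ]
    which is nonsingular when X has full column rank and R full row rank.
    """
    k, i = X.shape[1], R.shape[0]
    A = np.block([[2.0 * X.T @ X, R.T],
                  [R, np.zeros((i, i))]])
    rhs = np.concatenate([2.0 * X.T @ y, u])
    sol = np.linalg.solve(A, rhs)
    return sol[:k], sol[k:]          # constrained estimate, Lagrange multiplier


def constrained_ls_reparam(X, y, R, u):
    """Constrained LS through the zero-constraint conversion (29.2.3)-(29.2.5).

    Assumption (hypothetical simplification): no reordering is done, so the
    first i columns of R must already form the nonsingular block R1.
    """
    i = R.shape[0]
    R1, R2 = R[:, :i], R[:, i:]
    X1, X2 = X[:, :i], X[:, i:]
    R1inv_u = np.linalg.solve(R1, u)
    R1inv_R2 = np.linalg.solve(R1, R2)
    y_star = y - X1 @ R1inv_u            # left-hand side of (29.2.4)
    Z2 = X2 - X1 @ R1inv_R2              # regressors of (29.2.5)
    beta2, *_ = np.linalg.lstsq(Z2, y_star, rcond=None)
    beta1 = R1inv_u - R1inv_R2 @ beta2   # recover beta1 from (29.2.3)
    return np.concatenate([beta1, beta2])


# Simulated data (our own choice): columns ordered x, z, constant, so that
# the constraint beta_x + beta_z = 1 has the nonsingular leading block R1 = [1].
rng = np.random.default_rng(0)
n = 50
x, z = rng.normal(size=(2, n))
X = np.column_stack([x, z, np.ones(n)])
y = 0.3 * x + 0.7 * z + 2.0 + rng.normal(scale=0.5, size=n)
R = np.array([[1.0, 1.0, 0.0]])
u = np.array([1.0])

b_lagrange, lam = constrained_ls_lagrange(X, y, R, u)
b_reparam = constrained_ls_reparam(X, y, R, u)
print(b_lagrange, b_reparam, R @ b_lagrange)
```

Both routes should agree up to rounding error, and `R @ b_lagrange` reproduces u. Solving the bordered system directly avoids forming any explicit matrix inverse, which is usually the numerically preferable choice.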
Solve (29.3.8) for $\hat{\hat\beta}$ to get that $\hat{\hat\beta}$ which minimizes L for any given λ*:

$$\hat{\hat\beta} = (X^\top X)^{-1}X^\top y - \tfrac{1}{2}(X^\top X)^{-1}R^\top\lambda^* = \hat\beta - \tfrac{1}{2}(X^\top X)^{-1}R^\top\lambda^* \tag{29.3.10}$$

Here $\hat\beta$ on the right-hand side is the unconstrained OLS estimate. Plug this formula for $\hat{\hat\beta}$ into (29.3.9) in order to determine that value of λ* for which the corresponding $\hat{\hat\beta}$ satisfies the constraint:

$$R\hat\beta - \tfrac{1}{2}R(X^\top X)^{-1}R^\top\lambda^* - u = o \tag{29.3.11}$$

Since R has full row rank and X full column rank, $R(X^\top X)^{-1}R^\top$ has an inverse (Problem 341). Therefore one can solve for λ*:

$$\lambda^* = 2\bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1}(R\hat\beta - u) \tag{29.3.12}$$

If one substitutes this λ* back into (29.3.10), one gets the formula for the constrained least squares estimator:

$$\hat{\hat\beta} = \hat\beta - (X^\top X)^{-1}R^\top\bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1}(R\hat\beta - u) \tag{29.3.13}$$

Problem 341. If R has full row rank and X full column rank, show that $R(X^\top X)^{-1}R^\top$ has an inverse.

Answer. Since it is nonnegative definite, we have to show that it is positive definite. $b^\top R(X^\top X)^{-1}R^\top b = 0$ implies $b^\top R = o^\top$ because $(X^\top X)^{-1}$ is positive definite, and this implies b = o because R has full row rank.

Problem 342. Assume $\varepsilon \sim (o, \sigma^2\Psi)$ with a nonsingular Ψ and show: If one minimizes $SSE = (y - X\beta)^\top\Psi^{-1}(y - X\beta)$ subject to the linear constraint Rβ = u, [...]

[...] obtained by simple regressions, c, p, q, r, and s, three lie on one straight line, and the other two on a different straight line, with the intersection of these straight lines being the corner point in the multiple regression of y on $x_1$ and $x_2$. Which three points are on the same line, and how can these two lines be characterized?

• j. 1 point Of the lines cp, pq, qr, and rs, two are parallel to $x_1$, and [...]
[...] reggeom simulation, y is the purple line; $X\hat{\hat\beta}$ is the red line starting at the origin, one could also call it $\hat{\hat y}$; $X(\hat\beta - \hat{\hat\beta}) = \hat y - \hat{\hat y}$ is the light blue line, and $\hat\varepsilon$ is the green line which does not start at the origin. In other words: if one projects y on a plane, and also on a line in that plane, and then connects the footpoints of these two projections, one obtains a zig-zag line with two right angles. [...]

[...] graphical intuition of the issues in sequential regression. Make sure the stand-alone program xgobi is installed on your computer (in Debian GNU-Linux do apt-get install xgobi), and the R interface xgobi is installed (the R command is simply install.packages("xgobi"), or, on a Debian system, the preferred argument is install.packages("xgobi", lib = "/usr/lib/R/library")). You have to give the commands library(xgobi) [...]

[...] animation in which the backfitting procedure is highlighted. The successive residuals which are used as regressors are drawn in dark blue, and the quickly improving approximations to the fitted value are connected by a red zig-zag line.

• h. 1 point The diagram contains the points for two more backfitting steps. Identify the endpoints of both residuals.

• i. 2 points Of the five cornerpoints [...]

[...] situation in which this formula is valid as a BLUE-formula, and compare the situation with the situation here.

Answer. Of course, constrained least squares. But in constrained least squares, β is nonrandom and $\hat\beta$ is random, while here it is the other way round. In the unconstrained OLS model, i.e., before the “observation” of u = Rβ, the best bounded MSE estimators of u and β are $R\hat\beta$ and $\hat\beta$, with the sampling [...]
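The reggeom description in these excerpts says that projecting y on a plane and on a line in that plane produces a zig-zag with two right angles. Here is a small NumPy check of that claim, assuming nothing beyond least-squares projections; the vectors are random illustrative data (not the reggeom coordinates), and the helper `project` is our own. The three pieces correspond to the red, light blue, and green lines: $\hat{\hat y}$ (the regression on $x_1$ alone, i.e., with the coefficient of $x_2$ constrained to zero), $\hat y - \hat{\hat y}$, and $\hat\varepsilon$.

```python
import numpy as np


def project(A, v):
    """Orthogonal projection of v onto the column space of A (least squares fit)."""
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return A @ coef


# Arbitrary illustrative vectors in R^6 (not the reggeom data).
rng = np.random.default_rng(1)
x1, x2, y = rng.normal(size=(3, 6))

y_hat = project(np.column_stack([x1, x2]), y)   # footpoint on the plane
y_hathat = project(x1[:, None], y)              # footpoint on the line through x1

red = y_hathat                   # from the origin to the footpoint on the line
light_blue = y_hat - y_hathat    # from the line footpoint to the plane footpoint
green = y - y_hat                # residual, from the plane footpoint to y

# Inner products at the two corners of the zig-zag; both vanish up to rounding.
print(red @ light_blue, light_blue @ green)
```

Because the three pieces add up to y and are mutually orthogonal, they form an orthogonal decomposition of y into three parts.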
Problem 352. 3 points In the same way, check that the decomposition

$$\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix}$$

is $y = \hat y + \hat\varepsilon$ in the regression of $y = \begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}$ on $x_1 = \begin{bmatrix} 5 \\ 0 \\ 0 \end{bmatrix}$ and $x_2 = \begin{bmatrix} -1 \\ 4 \\ 0 \end{bmatrix}$.

Answer. Besides the equation $y = \hat y + \hat\varepsilon$ we have to check two things: (1) $\hat y$ is a linear combination of all the explanatory variables. Since both $x_1$ and $x_2$ have zero as third coordinate, and they are linearly independent, they [...]

[...] one obtains an orthogonal decomposition into three parts: [...]

Problem 347. Assume $\hat{\hat\beta}$ is the constrained least squares estimator subject to the constraint Rβ = o, and $\hat\beta$ is the unconstrained least squares estimator.

• a. 1 point With the usual notation $\hat y = X\hat\beta$ and $\hat{\hat y} = X\hat{\hat\beta}$, show that

$$y = \hat{\hat y} + (\hat y - \hat{\hat y}) + \hat\varepsilon \tag{29.7.7}$$

Point out these vectors in the reggeom simulation.

Answer. In the [...]

[...] where h is a multiple of $x_2$ and k orthogonal to $x_2$. This is the next step in the backfitting algorithm. Draw this decomposition into the diagram. The points are already invisibly present. Therefore you should use the line editor to connect the points. You may want to increase the magnification scale of the figure for this. (In my version of XGobi, I often lose lines if I try to add more lines. This seems to be a [...]

[...] point Draw in the regression of y on $x_2$.

• l. 3 points Which two variables are plotted against each other in an added-variable plot for $x_2$?

Here are the coordinates of some of the points in this animation:

$$\begin{array}{ccccc} x_1 & x_2 & y & \hat y & \hat{\hat y} \\ \hline 5 & -1 & 3 & 3 & 3 \\ 0 & 4 & 3 & 3 & 0 \\ 0 & 0 & 4 & 0 & 0 \end{array}$$

In the dataset which R submits to XGobi, all coordinates are multiplied by 1156, which has the effect that all the points [...]

[...] Which label does the corner point of the decomposition have?
Make a geometric argument that the new residual k is no longer orthogonal to $x_2$.

• g. 1 point The next step in the backfitting procedure is to regress k on $x_1$. The corner point for this decomposition is again invisibly in the animation. Identify the two endpoints of the residual in this regression. Hint: the R command example(reggeom) produces [...]
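The backfitting procedure highlighted in the animation can be reproduced numerically with the coordinates from the table of points: $x_1 = (5, 0, 0)^\top$, $x_2 = (-1, 4, 0)^\top$, $y = (3, 3, 4)^\top$. What follows is a minimal NumPy sketch, assuming that backfitting means regressing the current residual on $x_1$ and $x_2$ alternately (without a constant) and accumulating the fitted pieces; the helper names `simple_fit` and `backfit` and the fixed number of sweeps are our own choices. The first step should reproduce the $\hat{\hat y}$ column of the table, the iterates should approach the multiple-regression fit $\hat y = (3, 3, 0)^\top$, and the final residual is the $(0, 0, 4)^\top$ of Problem 352.

```python
import numpy as np

# Coordinates taken from the table of points in the animation.
x1 = np.array([5.0, 0.0, 0.0])
x2 = np.array([-1.0, 4.0, 0.0])
y = np.array([3.0, 3.0, 4.0])


def simple_fit(r, x):
    """Fitted value of the regression of r on the single vector x (no constant)."""
    return (r @ x) / (x @ x) * x


def backfit(y, regressors, sweeps=10):
    """Regress the current residual on each regressor in turn, accumulating the fit.

    Returns every intermediate approximation to the fitted value.
    """
    fit = np.zeros_like(y)
    path = [fit]
    for _ in range(sweeps):
        for x in regressors:
            fit = fit + simple_fit(y - fit, x)   # regress residual y - fit on x
            path.append(fit)
    return path


path = backfit(y, [x1, x2])
X = np.column_stack([x1, x2])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]   # one-shot multiple regression

print(path[1])            # first step: regression of y on x1 alone
print(path[-1], y_hat)    # backfitting approaches the multiple-regression fit
print(y - y_hat)          # residual of the multiple regression
```

With these coordinates each full sweep shrinks the remaining gap to $\hat y$ by the factor 1/17, the squared cosine of the angle between $x_1$ and $x_2$, which is why the approximations improve so quickly; with nearly collinear regressors the same loop converges much more slowly.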