Chapter 5
Linear Quadratic Dynamic Programming

5.1 Introduction

This chapter describes the class of dynamic programming problems in which the return function is quadratic and the transition function is linear. This specification leads to the widely used optimal linear regulator problem, for which the Bellman equation can be solved quickly using linear algebra. We consider the special case in which the return function and transition function are both time invariant, though the mathematics is almost identical when they are permitted to be deterministic functions of time.

Linear quadratic dynamic programming has two uses for us. A first is to study optimum and equilibrium problems arising for linear rational expectations models. Here the dynamic decision problems naturally take the form of an optimal linear regulator. A second is to use a linear quadratic dynamic program to approximate one that is not linear quadratic.

Later in the chapter, we also describe a filtering problem of great interest to macroeconomists. Its mathematical structure is identical to that of the optimal linear regulator, and its solution is the Kalman filter, a recursive way of solving linear filtering and estimation problems. Suitably reinterpreted, formulas that solve the optimal linear regulator also describe the Kalman filter.

5.2 The optimal linear regulator problem

The undiscounted optimal linear regulator problem is to maximize over choice of $\{u_t\}_{t=0}^\infty$ the criterion

$$-\sum_{t=0}^\infty \{x_t' R x_t + u_t' Q u_t\}, \tag{5.2.1}$$

subject to $x_{t+1} = A x_t + B u_t$, $x_0$ given. Here $x_t$ is an $(n \times 1)$ vector of state variables, $u_t$ is a $(k \times 1)$ vector of controls, $R$ is a positive semidefinite symmetric matrix, $Q$ is a positive definite symmetric matrix, $A$ is an $(n \times n)$ matrix, and $B$ is an $(n \times k)$ matrix. We guess that the value function is quadratic, $V(x) = -x' P x$, where $P$ is a positive semidefinite symmetric matrix.

Using the transition law to eliminate next period’s state, the Bellman equation
becomes

$$-x' P x = \max_u \{-x' R x - u' Q u - (Ax + Bu)' P (Ax + Bu)\}. \tag{5.2.2}$$

The first-order necessary condition for the maximum problem on the right side of equation (5.2.2) is

$$(Q + B' P B)\, u = -B' P A x, \tag{5.2.3}$$

which implies the feedback rule for $u$:

$$u = -(Q + B' P B)^{-1} B' P A x \tag{5.2.4}$$

or $u = -F x$, where

$$F = (Q + B' P B)^{-1} B' P A. \tag{5.2.5}$$

Substituting the optimizer (5.2.4) into the right side of equation (5.2.2) and rearranging gives

$$P = R + A' P A - A' P B (Q + B' P B)^{-1} B' P A. \tag{5.2.6}$$

Equation (5.2.6) is called the algebraic matrix Riccati equation. It expresses the matrix $P$ as an implicit function of the matrices $R, Q, A, B$. Solving this equation for $P$ requires a computer whenever $P$ is larger than a $2 \times 2$ matrix.

Footnote: We use the following rules for differentiating quadratic and bilinear matrix forms: $\frac{\partial x' A x}{\partial x} = (A + A')x$; $\frac{\partial y' B z}{\partial y} = Bz$; $\frac{\partial y' B z}{\partial z} = B' y$.

In exercise 5.1, you are asked to derive the Riccati equation for the case where the return function is modified to

$$-(x_t' R x_t + u_t' Q u_t + 2 u_t' W x_t).$$

5.2.1 Value function iteration

Under particular conditions to be discussed in the section on stability, equation (5.2.6) has a unique positive semidefinite solution, which is approached in the limit as $j \to \infty$ by iterations on the matrix Riccati difference equation:

$$P_{j+1} = R + A' P_j A - A' P_j B (Q + B' P_j B)^{-1} B' P_j A, \tag{5.2.7a}$$

starting from $P_0 = 0$. The policy function associated with $P_j$ is

$$F_{j+1} = (Q + B' P_j B)^{-1} B' P_j A. \tag{5.2.7b}$$

Equation (5.2.7) is derived much like equation (5.2.6) except that one starts from the iterative version of the Bellman equation rather than from the asymptotic version.

5.2.2 Discounted linear regulator problem

The discounted optimal linear regulator problem is to maximize

$$-\sum_{t=0}^\infty \beta^t \{x_t' R x_t + u_t' Q u_t\}, \quad 0 < \beta < 1, \tag{5.2.8}$$

subject to $x_{t+1} = A x_t + B u_t$, $x_0$ given. This problem leads to the following matrix Riccati difference equation modified for discounting:

$$P_{j+1} = R + \beta A' P_j A - \beta^2 A' P_j B (Q + \beta B' P_j B)^{-1} B' P_j A. \tag{5.2.9}$$

Footnote (on convergence of the Riccati iterations): If the
eigenvalues of $A$ are bounded in modulus below unity, this result obtains, but much weaker conditions suffice. See Bertsekas (1976, chap. 4) and Sargent (1980).

The algebraic matrix Riccati equation is modified correspondingly. The value function for the infinite horizon problem is simply $V(x_0) = -x_0' P x_0$, where $P$ is the limiting value of $P_j$ resulting from iterations on equation (5.2.9) starting from $P_0 = 0$. The optimal policy is $u_t = -F x_t$, where $F = \beta (Q + \beta B' P B)^{-1} B' P A$.

The Matlab program olrp.m solves the discounted optimal linear regulator problem. Matlab has a variety of other programs that solve both discrete and continuous time versions of undiscounted optimal linear regulator problems. The program policyi.m solves the undiscounted optimal linear regulator problem using policy iteration, which we study next.

5.2.3 Policy improvement algorithm

The policy improvement algorithm can be applied to solve the discounted optimal linear regulator problem. Starting from an initial $F_0$ for which the eigenvalues of $A - B F_0$ are less than $1/\sqrt{\beta}$ in modulus, the algorithm iterates on the two equations

$$P_j = R + F_j' Q F_j + \beta (A - B F_j)' P_j (A - B F_j) \tag{5.2.10}$$

$$F_{j+1} = \beta (Q + \beta B' P_j B)^{-1} B' P_j A. \tag{5.2.11}$$

The first equation is an example of a discrete Lyapunov or Sylvester equation, which is to be solved for the matrix $P_j$ that determines the value $-x_t' P_j x_t$ that is associated with following policy $F_j$ forever. The solution of this equation can be represented in the form

$$P_j = \sum_{k=0}^\infty \beta^k \left[(A - B F_j)'\right]^k (R + F_j' Q F_j) (A - B F_j)^k.$$

If the eigenvalues of the matrix $A - B F_j$ are bounded in modulus by $1/\sqrt{\beta}$, then a solution of this equation exists. There are several methods available for solving this equation. This algorithm is typically much faster than the algorithm that iterates on the matrix Riccati equation. Later we shall present a third method for solving for $P$ that rests on the link between $P$ and shadow prices for the state vector.

Footnote: The Matlab programs dlyap.m and doublej.m solve discrete Lyapunov equations. See Anderson, Hansen, McGrattan, and Sargent (1996).

5.3 The stochastic optimal linear regulator problem

The stochastic discounted linear optimal regulator problem is to choose a decision rule for $u_t$ to maximize

$$-E_0 \sum_{t=0}^\infty \beta^t \{x_t' R x_t + u_t' Q u_t\}, \quad 0 < \beta < 1, \tag{5.3.1}$$

subject to $x_0$ given, and the law of motion

$$x_{t+1} = A x_t + B u_t + C \epsilon_{t+1}, \quad t \ge 0, \tag{5.3.2}$$

where $\epsilon_{t+1}$ is an $(n \times 1)$ vector of random variables that is independently and identically distributed according to the normal distribution with mean vector zero and covariance matrix

$$E \epsilon_t \epsilon_t' = I. \tag{5.3.3}$$

(See Kwakernaak and Sivan, 1972, for an extensive study of the continuous-time version of this problem; also see Chow, 1981.) The matrices $R$, $Q$, $A$, and $B$ obey the assumption that we have described.

The value function for this problem is

$$v(x) = -x' P x - d, \tag{5.3.4}$$

where $P$ is the unique positive semidefinite solution of the discounted algebraic matrix Riccati equation corresponding to equation (5.2.9). As before, it is the limit of iterations on equation (5.2.9) starting from $P_0 = 0$. The scalar $d$ is given by

$$d = \beta (1 - \beta)^{-1} \operatorname{tr} P C C', \tag{5.3.5}$$

where “tr” denotes the trace of a matrix. Furthermore, the optimal policy continues to be given by $u_t = -F x_t$, where

$$F = \beta (Q + \beta B' P B)^{-1} B' P A. \tag{5.3.6}$$

A notable feature of this solution is that the feedback rule (5.3.6) is identical with the rule for the corresponding nonstochastic linear optimal regulator problem. This outcome is the certainty equivalence principle.

Certainty Equivalence Principle: The decision rule that solves the stochastic optimal linear regulator problem is identical with the decision rule for the corresponding nonstochastic linear optimal regulator problem.

Proof: Substitute guess (5.3.4) into the Bellman equation to obtain

$$v(x) = \max_u \left\{-x' R x - u' Q u - \beta E (Ax + Bu + C\epsilon)' P (Ax + Bu + C\epsilon) - \beta d\right\},$$

where $\epsilon$ is the realization of $\epsilon_{t+1}$ when $x_t = x$ and where $E[\epsilon | x] = 0$. The preceding equation implies

$$v(x) = \max_u \{-x' R x - u' Q u - \beta E \{x' A' P A x + x' A' P B u + x' A' P C \epsilon + u' B' P A x + u' B' P B u + u' B' P C \epsilon + \epsilon' C' P A x + \epsilon' C' P B u + \epsilon' C' P C \epsilon\} - \beta d\}.$$

Evaluating the expectations inside the braces and using $E[\epsilon | x] = 0$ gives

$$v(x) = \max_u -\{x' R x + u' Q u + \beta x' A' P A x + 2 \beta x' A' P B u + \beta u' B' P B u + \beta E \epsilon' C' P C \epsilon\} - \beta d.$$

The first-order condition for $u$ is

$$(Q + \beta B' P B)\, u = -\beta B' P A x,$$

which implies equation (5.3.6). Using $E \epsilon' C' P C \epsilon = \operatorname{tr} P C C'$, substituting equation (5.3.6) into the preceding expression for $v(x)$, and using equation (5.3.4) gives

$$P = R + \beta A' P A - \beta^2 A' P B (Q + \beta B' P B)^{-1} B' P A,$$

and

$$d = \beta (1 - \beta)^{-1} \operatorname{tr} P C C'.$$

5.3.1 Discussion of certainty equivalence

The remarkable thing about this solution is that, although through $d$ the objective function (5.3.1) depends on $CC'$, the optimal decision rule $u_t = -F x_t$ is independent of $CC'$. This is the message of equation (5.3.6) and the discounted algebraic Riccati equation for $P$, which are identical with the formulas derived earlier under certainty. In other words, the optimal decision rule $u_t = h(x_t)$ is independent of the problem’s noise statistics. The certainty equivalence principle is a special property of the optimal linear regulator problem and comes from the quadratic objective function, the linear transition equation, and the property $E(\epsilon_{t+1} | x_t) = 0$. Certainty equivalence does not characterize stochastic control problems generally.

Footnote: Therefore, in linear quadratic versions of the optimum savings problem, there are no precautionary savings. See chapters 16 and 17.

For the remainder of this chapter, we return to the nonstochastic optimal linear regulator, remembering the stochastic counterpart.

5.4 Shadow prices in the linear regulator

For several purposes, it is helpful to interpret the gradient $-2 P x_t$ of the value function $-x_t' P x_t$ as a shadow price or Lagrange multiplier. Thus, associate with the Bellman equation the Lagrangian

$$-x_t' P x_t = V(x_t) = \min_{\{\mu_{t+1}\}} \max_{u_t, x_{t+1}} -\left\{x_t' R x_t + u_t' Q u_t + x_{t+1}' P x_{t+1} + 2 \mu_{t+1}' [A x_t + B u_t - x_{t+1}]\right\},$$

where $2\mu_{t+1}$ is a vector of Lagrange multipliers. The first-order necessary conditions for an optimum with respect to $u_t$ and $x_{t+1}$ are

$$2 Q u_t + 2 B' \mu_{t+1} = 0, \qquad 2 P x_{t+1} - 2 \mu_{t+1} = 0. \tag{5.4.1}$$

Footnote: The gradient of the value function has information from which prices can be coaxed where the value function is for a planner in a linear quadratic economy. See Hansen and Sargent (2000).

Using the transition law and rearranging gives the usual formula for the optimal decision rule, namely, $u_t = -(Q + B' P B)^{-1} B' P A x_t$. Notice that by (5.4.1), the shadow price vector satisfies $\mu_{t+1} = P x_{t+1}$.

Later in this chapter, we shall describe a computational strategy that solves for $P$ by directly finding the optimal multiplier process $\{\mu_t\}$ and representing it as $\mu_t = P x_t$. This strategy exploits the stability properties of optimal solutions of the linear regulator problem, which we now briefly take up.

5.4.1 Stability

Upon substituting the optimal control $u_t = -F x_t$ into the law of motion $x_{t+1} = A x_t + B u_t$, we obtain the optimal “closed-loop system” $x_{t+1} = (A - BF) x_t$. This difference equation governs the evolution of $x_t$ under the optimal control. The system is said to be stable if $\lim_{t \to \infty} x_t = 0$ starting from any initial $x_0 \in R^n$. Assume that the eigenvalues of $(A - BF)$ are distinct, and use the eigenvalue decomposition $(A - BF) = D \Lambda D^{-1}$, where the columns of $D$ are the eigenvectors of $(A - BF)$ and $\Lambda$ is a diagonal matrix of eigenvalues of $(A - BF)$. Write the “closed-loop” equation as $x_{t+1} = D \Lambda D^{-1} x_t$. The solution of this difference equation for $t > 0$ is readily verified by repeated substitution to be $x_t = D \Lambda^t D^{-1} x_0$. Evidently, the system is stable for all $x_0 \in R^n$ if and only if the eigenvalues of $(A - BF)$ are all strictly less than unity in absolute value. When this condition is met, $(A - BF)$ is said to be a “stable
matrix.”

A vast literature is devoted to characterizing the conditions on $A$, $B$, $R$, and $Q$ under which the optimal closed-loop system matrix $(A - BF)$ is stable. These results are surveyed by Anderson, Hansen, McGrattan, and Sargent (1996) and can be briefly described here for the undiscounted case $\beta = 1$. Roughly speaking, the conditions on $A$, $B$, $R$, and $Q$ that are required for stability are as follows: First, $A$ and $B$ must be such that it is possible to pick a control law $u_t = -F x_t$ that drives $x_t$ to zero eventually, starting from any $x_0 \in R^n$ [“the pair $(A, B)$ must be stabilizable”]. Second, the matrix $R$ must be such that the controller wants to drive $x_t$ to zero as $t \to \infty$.

Footnote: It is possible to amend the statements about stability in this section to permit $A - BF$ to have a single unit eigenvalue associated with a constant in the state vector. See chapter 2 for examples.

It would take us far afield to go deeply into this body of theory, but we can give a flavor for the results by considering some very special cases. The following assumptions and propositions are too strict for most economic applications, but similar results can obtain under weaker conditions relevant for economic problems.

Assumption A.1: The matrix $R$ is positive definite.

There immediately follows:

Proposition 1: Under Assumption A.1, if a solution to the undiscounted regulator exists, it satisfies $\lim_{t \to \infty} x_t = 0$.

Proof: If $x_t \not\to 0$, then $-\sum_{t=0}^\infty x_t' R x_t \to -\infty$.

Assumption A.2: The matrix $R$ is positive semidefinite.

Under Assumption A.2, $R$ is similar to a triangular matrix $R^*$:

$$R = T' \begin{pmatrix} R^*_{11} & 0 \\ 0 & 0 \end{pmatrix} T,$$

where $R^*_{11}$ is positive definite and $T$ is nonsingular. Notice that $x_t' R x_t = x^{*\prime}_{1t} R^*_{11} x^*_{1t}$, where

$$x^*_t = T x_t = \begin{pmatrix} T_1 \\ T_2 \end{pmatrix} x_t = \begin{pmatrix} x^*_{1t} \\ x^*_{2t} \end{pmatrix}.$$

Let $x^*_{1t} \equiv T_1 x_t$. These calculations support the proposition:

Proposition 2: Suppose that a solution to the optimal linear regulator exists under Assumption A.2. Then $\lim_{t \to \infty} x^*_{1t} = 0$.

The following definition is used in control theory:

Definition: The pair $(A, B)$ is said to
be stabilizable if there exists a matrix $F$ for which $(A - BF)$ is a stable matrix.

Footnote: See Kwakernaak and Sivan (1972) and Anderson, Hansen, McGrattan, and Sargent (1996).

The following is illustrative of a variety of stability theorems from control theory:

Theorem: If $(A, B)$ is stabilizable and $R$ is positive definite, then under the optimal rule $F$, $(A - BF)$ is a stable matrix.

In the next section, we assume that $A$, $B$, $Q$, $R$ satisfy conditions sufficient to invoke such a stability proposition, and we use that assumption to justify a solution method that solves the undiscounted linear regulator by searching among the many solutions of the Euler equations for a stable solution.

5.5 A Lagrangian formulation

This section describes a Lagrangian formulation of the optimal linear regulator (see footnote 10). Besides being useful computationally, this formulation carries insights about the connections between stability and optimality and also opens the way to constructing solutions of dynamic systems not coming directly from an intertemporal optimization problem (see footnote 11).

Footnote (on the stability conditions of section 5.4.1): These conditions are discussed under the subjects of controllability, stabilizability, reconstructability, and detectability in the literature on linear optimal control. (For continuous-time linear systems, these concepts are described by Kwakernaak and Sivan, 1972; for discrete-time systems, see Sargent, 1980.) These conditions subsume and generalize the transversality conditions used in the discrete-time calculus of variations (see Sargent, 1987a). That is, the case when $(A - BF)$ is stable corresponds to the situation in which it is optimal to solve “stable roots backward and unstable roots forward.” See Sargent (1987a, chap. 9). Hansen and Sargent (1981) describe the relationship between Euler equation methods and dynamic programming for a class of linear optimal control systems. Also see Chow (1981).

Footnote (on the stability theorem): The conditions under which $(A - BF)$ is stable are also the conditions under which $x_t$ converges
to a unique stationary distribution in the stochastic version of the linear regulator problem.

Footnote 10: Such formulations are recommended by Chow (1997) and Anderson, Hansen, McGrattan, and Sargent (1996).

Footnote 11: Blanchard and Kahn (1980), Whiteman (1983), Hansen, Epple, and Roberds (1985), and Anderson, Hansen, McGrattan, and Sargent (1996) use and extend such methods.

[...] Riccati equation. Indeed, with the judicious use of matrix transposition and reversal of time, the two systems of equations (5.6.3) and (5.2.7) can be made to match. In chapter B on dual filtering and control, we compare versions of these equations and describe the concept of duality that links them. Chapter B also contains a formal derivation of the Kalman filter. We now put the Kalman filter to work (see footnote 15).

5.6.1 Muth’s example

Phillip Cagan (1956) and Milton Friedman (1956) posited that when people wanted to form expectations of future values of a scalar $y_t$ they would use the following “adaptive expectations” scheme:

$$y^*_{t+1} = K \sum_{j=0}^\infty (1 - K)^j y_{t-j} \tag{5.6.6a}$$

or

$$y^*_{t+1} = (1 - K)\, y^*_t + K y_t, \tag{5.6.6b}$$

where $y^*_{t+1}$ is people’s expectation. Friedman used this scheme to describe people’s forecasts of future income. Cagan used it to model their forecasts of inflation during hyperinflations. Cagan and Friedman did not assert that the scheme is an optimal one, and so did not fully defend it. Muth (1960) wanted to understand the circumstances under which this forecasting scheme would be optimal. Therefore, he sought a stochastic process for $y_t$ such that equation (5.6.6) would be optimal. In effect, he posed and solved an “inverse optimal prediction” problem of the form “You give me the forecasting scheme; I have to find the stochastic process that makes the scheme optimal.” Muth solved the problem using classical (non-recursive) methods. The Kalman filter was first described in print in the same year as Muth’s solution of this problem (Kalman, 1960). The Kalman filter lets us present the
solution to Muth’s problem quickly.

Muth studied the model

$$x_{t+1} = x_t + w_{t+1} \tag{5.6.7a}$$

$$y_t = x_t + v_t, \tag{5.6.7b}$$

where $y_t$, $x_t$ are scalar random processes, and $w_{t+1}$, $v_t$ are mutually independent i.i.d. Gaussian random processes with means of zero and variances $E w_{t+1}^2 = Q$, $E v_t^2 = R$, and $E v_s w_{t+1} = 0$ for all $t, s$. The initial condition is that $x_0$ is Gaussian with mean $\hat x_0$ and variance $\Sigma_0$. Muth sought formulas for $\hat x_{t+1} = E[x_{t+1} | y^t]$, where $y^t = [y_t, \ldots, y_0]$.

Footnote 15: The Matlab program kfilter.m computes the Kalman filter. Matlab has several other programs that compute the Kalman filter for discrete and continuous time models.

Figure 5.6.1: Graph of $f(\Sigma) = \frac{\Sigma (R + Q) + QR}{\Sigma + R}$, $Q = R = 1$, against the 45-degree line. Iterations on the Riccati equation for $\Sigma_t$ converge to the fixed point.

For this problem, $A = 1$, $CC' = Q$, $G = 1$, causing the Kalman filtering equations to become

$$K_t = \frac{\Sigma_t}{\Sigma_t + R} \tag{5.6.8a}$$

$$\Sigma_{t+1} = \Sigma_t + Q - \frac{\Sigma_t^2}{\Sigma_t + R}. \tag{5.6.8b}$$

The second equation can be rewritten

$$\Sigma_{t+1} = \frac{\Sigma_t (R + Q) + QR}{\Sigma_t + R}. \tag{5.6.9}$$

For $Q = R = 1$, Figure 5.6.1 plots the function $f(\Sigma) = \frac{\Sigma (R + Q) + QR}{\Sigma + R}$ appearing on the right side of equation (5.6.9) for values $\Sigma \ge 0$ against the 45-degree line. Note that $f(0) = Q$. This graph identifies the fixed point of iterations on $f(\Sigma)$ as the intersection of $f(\cdot)$ and the 45-degree line. That the slope of $f(\cdot)$ is less than unity at the intersection assures us that the iterations on $f$ will converge as $t \to +\infty$ starting from any $\Sigma_0 \ge 0$.

Muth studied the solution of this problem as $t \to \infty$. Evidently, $\Sigma_t \to \Sigma_\infty \equiv \Sigma$ is the fixed point of a graph like Figure 5.6.1. Then $K_t \to K$ and the formula for $\hat x_{t+1}$ becomes

$$\hat x_{t+1} = (1 - K)\, \hat x_t + K y_t, \tag{5.6.10}$$

where $K = \frac{\Sigma}{\Sigma + R} \in (0, 1)$. This is a version of Cagan’s adaptive expectations formula. Iterating backward on equation (5.6.10) gives

$$\hat x_{t+1} = K \sum_{j=0}^{t} (1 - K)^j y_{t-j} + (1 - K)^{t+1} \hat x_0,$$

which is a version of Cagan and Friedman’s geometric distributed lag formula. Using equations
(5.6.7), we find that $E[y_{t+j} | y^t] = E[x_{t+j} | y^t] = \hat x_{t+1}$ for all $j \ge 1$. This result in conjunction with equation (5.6.10) establishes that the adaptive expectation formula (5.6.10) gives the optimal forecast of $y_{t+j}$ for all horizons $j \ge 1$. This finding itself is remarkable and special because for most processes the optimal forecast will depend on the horizon. That there is a single optimal forecast for all horizons in one sense justifies the term “permanent income” that Milton Friedman (1955) chose to describe the forecast.

The dependence of the forecast on horizon can be studied using the formulas

$$E[x_{t+j} | y^{t-1}] = A^j \hat x_t \tag{5.6.11a}$$

$$E[y_{t+j} | y^{t-1}] = G A^j \hat x_t. \tag{5.6.11b}$$

In the case of Muth’s example,

$$E[y_{t+j} | y^{t-1}] = \hat y_t = \hat x_t \quad \forall j \ge 0.$$

5.6.2 Jovanovic’s example

In chapter 6, we will describe a version of Jovanovic’s (1979) matching model, at the core of which is a “signal-extraction” problem that simplifies Muth’s problem. Let $x_t$, $y_t$ be scalars with $A = 1$, $C = 0$, $G = 1$, $R > 0$. Let $x_0$ be Gaussian with mean $\mu$ and variance $\Sigma_0$. Interpret $x_t$ (which is evidently constant with this specification) as the hidden value of $\theta$, a “match parameter.” Let $y^t$ denote the history of $y_s$ from $s = 0$ to $s = t$. Define $m_t \equiv \hat x_{t+1} \equiv E[\theta | y^t]$ and $\Sigma_{t+1} = E(\theta - m_t)^2$. Then in this particular case the Kalman filter becomes

$$m_t = (1 - K_t)\, m_{t-1} + K_t y_t \tag{5.6.12a}$$

$$K_t = \frac{\Sigma_t}{\Sigma_t + R} \tag{5.6.12b}$$

$$\Sigma_{t+1} = \frac{\Sigma_t R}{\Sigma_t + R}. \tag{5.6.12c}$$

The recursions are to be initiated from $(m_{-1}, \Sigma_0)$, a pair that embodies all “prior” knowledge about the position of the system. It is easy to see from Figure 5.6.1 that when $Q = 0$, $\Sigma = 0$ is the limit point of iterations on equation (5.6.12c) starting from any $\Sigma_0 \ge 0$. Thus, the value of the match parameter is eventually learned.

It is instructive to write equation (5.6.12c) as

$$\frac{1}{\Sigma_{t+1}} = \frac{1}{\Sigma_t} + \frac{1}{R}. \tag{5.6.13}$$

The reciprocal of the variance is often called the precision of the estimate. According to equation (5.6.13) the precision increases without bound as $t$ grows, and $\Sigma_{t+1} \to 0$ (see footnote 16).

We can
represent the Kalman filter in the form (5.6.4) as

$$m_{t+1} = m_t + K_{t+1} a_{t+1},$$

which implies that

$$E (m_{t+1} - m_t)^2 = K_{t+1}^2 \sigma^2_{a,t+1},$$

where $a_{t+1} = y_{t+1} - m_t$ and the variance of $a_{t+1}$ is equal to $\sigma^2_{a,t+1} = (\Sigma_{t+1} + R)$ from equation (5.6.5). This implies

$$E (m_{t+1} - m_t)^2 = \frac{\Sigma_{t+1}^2}{\Sigma_{t+1} + R}.$$

For the purposes of our discrete time counterpart of the Jovanovic model in chapter 6, it will be convenient to represent the motion of $m_{t+1}$ by means of the equation

$$m_{t+1} = m_t + g_{t+1} u_{t+1},$$

where $g_{t+1} \equiv \left(\frac{\Sigma_{t+1}^2}{\Sigma_{t+1} + R}\right)^{1/2}$ and $u_{t+1}$ is a standardized i.i.d. process with mean zero and variance 1, constructed to obey $g_{t+1} u_{t+1} \equiv K_{t+1} a_{t+1}$.

Footnote 16: As a further special case, consider when there is zero precision initially ($\Sigma_0 = +\infty$). Then solving the difference equation (5.6.13) gives $\frac{1}{\Sigma_t} = t/R$. Substituting this into equations (5.6.12) gives $K_t = (t + 1)^{-1}$, so that the Kalman filter becomes $m_0 = y_0$ and $m_t = [1 - (t + 1)^{-1}]\, m_{t-1} + (t + 1)^{-1} y_t$, which implies that $m_t = (t + 1)^{-1} \sum_{s=0}^{t} y_s$, the sample mean, and $\Sigma_t = R/t$.

5.7 Concluding remarks

In exchange for their restrictions, the linear quadratic dynamic optimization models of this chapter acquire tractability. The Bellman equation leads to Riccati difference equations that are so easy to solve numerically that the curse of dimensionality loses most of its force. It is easy to solve linear quadratic control or filtering with many state variables. That it is difficult to solve those problems otherwise is why linear quadratic approximations are used so widely. We describe those approximations in appendix B to this chapter.

In chapter 7, we go beyond the single-agent optimization problems of this chapter and the previous one to study systems with multiple agents simultaneously solving such problems. We introduce two equilibrium concepts for restricting how different agents’ decisions are reconciled. To facilitate the analysis, we describe and illustrate those equilibrium concepts in contexts where each agent solves an
optimal linear regulator problem.

A. Matrix formulas

Let $(z, x, a)$ each be $n \times 1$ vectors, $A$, $C$, $D$, and $V$ each be $(n \times n)$ matrices, $B$ an $(m \times n)$ matrix, and $y$ an $(m \times 1)$ vector. Then

$$\frac{\partial a' x}{\partial x} = a, \quad \frac{\partial x' A x}{\partial x} = (A + A')x, \quad \frac{\partial^2 (x' A x)}{\partial x \partial x'} = (A + A'), \quad \frac{\partial x' A x}{\partial A} = x x',$$

$$\frac{\partial y' B z}{\partial y} = Bz, \quad \frac{\partial y' B z}{\partial z} = B' y, \quad \frac{\partial y' B z}{\partial B} = y z'.$$

The equation

$$A' V A + C = V,$$

to be solved for $V$, is called a discrete Lyapunov equation; and its generalization

$$A' V D + C = V$$

is called the discrete Sylvester equation. The discrete Sylvester equation has a unique solution if and only if the eigenvalues $\{\lambda_i\}$ of $A$ and $\{\delta_j\}$ of $D$ satisfy the condition $\lambda_i \delta_j \ne 1 \ \forall\, i, j$.

B. Linear-quadratic approximations

This appendix describes an important use of the optimal linear regulator: to approximate the solution of more complicated dynamic programs (see footnote 17). Optimal linear regulator problems are often used to approximate problems of the following form: maximize over $\{u_t\}_{t=0}^\infty$

$$E_0 \sum_{t=0}^\infty \beta^t r(z_t) \tag{5.B.1}$$

subject to

$$x_{t+1} = A x_t + B u_t + C w_{t+1}, \tag{5.B.2}$$

where $\{w_{t+1}\}$ is a vector of i.i.d. random disturbances with mean zero and finite variance, and $r(z_t)$ is a concave and twice continuously differentiable function of $z_t \equiv \begin{pmatrix} x_t \\ u_t \end{pmatrix}$. All nonlinearities in the original problem are absorbed into the composite function $r(z_t)$.

Footnote 17: Kydland and Prescott (1982) used such a method, and so did many of their followers in the real business cycle literature. See King, Plosser, and Rebelo (1988) for related methods of real business cycle models.

5.B.1 An example: the stochastic growth model

Take a parametric version of Brock and Mirman’s stochastic growth model, whose social planner chooses a policy for $\{c_t, a_{t+1}\}_{t=0}^\infty$ to maximize

$$E_0 \sum_{t=0}^\infty \beta^t \ln c_t$$

where

$$c_t + i_t = A a_t^\alpha \theta_t$$

$$a_{t+1} = (1 - \delta)\, a_t + i_t$$

$$\ln \theta_{t+1} = \rho \ln \theta_t + w_{t+1},$$

where $\{w_{t+1}\}$ is an i.i.d. stochastic process with mean zero and finite variance, $\theta_t$ is a technology shock, and $\tilde\theta_t \equiv \ln \theta_t$. To get this problem into the form (5.B.1)–(5.B.2), take $x_t = \begin{pmatrix} a_t \\ \tilde\theta_t \end{pmatrix}$, $u_t = i_t$, and $r(z_t) = \ln(A a_t^\alpha \exp \tilde\theta_t - i_t)$, and we write the laws of motion as

$$\begin{pmatrix} 1 \\ a_{t+1} \\ \tilde\theta_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & (1 - \delta) & 0 \\ 0 & 0 & \rho \end{pmatrix} \begin{pmatrix} 1 \\ a_t \\ \tilde\theta_t \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} i_t + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} w_{t+1},$$

where it is convenient to add the constant 1 as the first component of the state vector.

5.B.2 Kydland and Prescott’s method

We want to replace $r(z_t)$ by a quadratic $z_t' M z_t$. We choose a point $\bar z$ and approximate with the first two terms of a Taylor series (see footnote 18):

$$\hat r(z) = r(\bar z) + (z - \bar z)' \frac{\partial r}{\partial z} + \frac{1}{2} (z - \bar z)' \frac{\partial^2 r}{\partial z \partial z'} (z - \bar z). \tag{5.B.3}$$

If the state $x_t$ is $n \times 1$ and the control $u_t$ is $k \times 1$, then the vector $z_t$ is $(n + k) \times 1$. Let $e$ be the $(n + k) \times 1$ vector with 0’s everywhere except for a 1 in the row corresponding to the location of the constant unity in the state vector, so that $1 \equiv e' z_t$ for all $t$.

Footnote 18: This setup is taken from McGrattan (1994) and Anderson, Hansen, McGrattan, and Sargent (1996).

Repeatedly using $z' e = e' z = 1$, we can express equation (5.B.3) as

$$\hat r(z) = z' M z,$$

where

$$M = e \left[ r(\bar z) - \left(\frac{\partial r}{\partial z}\right)' \bar z + \frac{1}{2} \bar z' \frac{\partial^2 r}{\partial z \partial z'} \bar z \right] e' + \frac{1}{2} \left( \frac{\partial r}{\partial z} e' - e \bar z' \frac{\partial^2 r}{\partial z \partial z'} - \frac{\partial^2 r}{\partial z \partial z'} \bar z e' + e \left(\frac{\partial r}{\partial z}\right)' \right) + \frac{1}{2} \frac{\partial^2 r}{\partial z \partial z'},$$

where the partial derivatives are evaluated at $\bar z$. Partition $M$, so that

$$z' M z \equiv \begin{pmatrix} x \\ u \end{pmatrix}' \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} x \\ u \end{pmatrix}' \begin{pmatrix} R & W \\ W' & Q \end{pmatrix} \begin{pmatrix} x \\ u \end{pmatrix}.$$

5.B.3 Determination of $\bar z$

Usually, the point $\bar z$ is chosen as the (optimal) stationary state of the nonstochastic version of the original nonlinear model:

$$\sum_{t=0}^\infty \beta^t r(z_t)$$

$$x_{t+1} = A x_t + B u_t.$$

This stationary point is obtained in these steps:

1. Find the Euler equations.
2. Substitute $z_{t+1} = z_t \equiv \bar z$ into the Euler equations and transition laws, and solve the resulting system of nonlinear equations for $\bar z$. This purpose can be accomplished, for example, by using the nonlinear equation solver fsolve.m in Matlab.

5.B.4 Log linear approximation

For some problems Christiano (1990) has advocated a quadratic approximation in logarithms. We illustrate his idea with the stochastic growth example. Define

$$\tilde a_t = \log a_t, \quad \tilde\theta_t = \log \theta_t.$$

Christiano’s strategy is to take $\tilde a_t$, $\tilde\theta_t$ as the components of
the state and write the law of motion as

$$\begin{pmatrix} 1 \\ \tilde a_{t+1} \\ \tilde\theta_{t+1} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & \rho \end{pmatrix} \begin{pmatrix} 1 \\ \tilde a_t \\ \tilde\theta_t \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} u_t + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} w_{t+1},$$

where the control $u_t$ is $\tilde a_{t+1}$. Express consumption as

$$c_t = A (\exp \tilde a_t)^\alpha (\exp \tilde\theta_t) + (1 - \delta) \exp \tilde a_t - \exp \tilde a_{t+1}.$$

Substitute this expression into $\ln c_t \equiv r(z_t)$, and proceed as before to obtain the second-order Taylor series approximation about $\bar z$.

5.B.5 Trend removal

It is conventional in the real business cycle literature to specify the law of motion for the technology shock $\theta_t$ by

$$\tilde\theta_t = \log\left(\frac{\theta_t}{\gamma^t}\right), \quad \gamma > 1$$

$$\tilde\theta_{t+1} = \rho \tilde\theta_t + w_{t+1}, \quad |\rho| < 1.$$

This inspires us to write the law of motion for capital as

$$\gamma \frac{a_{t+1}}{\gamma^{t+1}} = (1 - \delta) \frac{a_t}{\gamma^t} + \frac{i_t}{\gamma^t} \tag{5.B.4}$$

or

$$\gamma \exp \tilde a_{t+1} = (1 - \delta) \exp \tilde a_t + \exp \tilde i_t, \tag{5.B.5}$$

where $\tilde a_t \equiv \log\left(\frac{a_t}{\gamma^t}\right)$, $\tilde i_t = \log\left(\frac{i_t}{\gamma^t}\right)$. By studying the Euler equations for a model with a growing technology shock ($\gamma > 1$), we can show that there exists a steady state for $\tilde a_t$, but not for $a_t$. Researchers often construct linear-quadratic approximations around the nonstochastic steady state of $\tilde a$.

Exercises

Exercise 5.1

Consider the modified version of the optimal linear regulator problem where the objective is to maximize

$$-\sum_{t=0}^\infty \beta^t \{x_t' R x_t + u_t' Q u_t + 2 u_t' H x_t\}$$

subject to the law of motion:

$$x_{t+1} = A x_t + B u_t.$$

Here $x_t$ is an $n \times 1$ state vector, $u_t$ is a $k \times 1$ vector of controls, and $x_0$ is a given initial condition. The matrices $R$, $Q$ are positive definite and symmetric. The maximization is with respect to sequences $\{u_t, x_t\}_{t=0}^\infty$.

a. Show that the optimal policy has the form

$$u_t = -(Q + \beta B' P B)^{-1} (\beta B' P A + H)\, x_t,$$

where $P$ solves the algebraic matrix Riccati equation

$$P = R + \beta A' P A - (\beta A' P B + H') (Q + \beta B' P B)^{-1} (\beta B' P A + H). \tag{5.6}$$

b. Write a Matlab program to solve equation (5.6) by iterating on $P$ starting from $P$ being a matrix of zeros.

Exercise 5.2

Verify that equations (5.2.10) and (5.2.11) implement the policy improvement algorithm for the discounted linear regulator problem.

Exercise 5.3

A household seeks to maximize

$$-\sum_{t=1}^\infty \beta^t \left\{(c_t - b)^2 + \gamma i_t^2\right\}$$

subject to

$$c_t + i_t = r a_t + y_t \tag{5.7a}$$

$$a_{t+1} = a_t + i_t \tag{5.7b}$$

$$y_{t+1} = \rho_1 y_t + \rho_2 y_{t-1}. \tag{5.7c}$$

Here $c_t$, $i_t$, $a_t$, $y_t$ are the household’s consumption, investment, asset holdings, and exogenous labor income at $t$; while $b > 0$, $\gamma > 0$, $r > 0$, $\beta \in (0, 1)$, and $\rho_1$, $\rho_2$ are parameters, and $y_0$, $y_{-1}$ are initial conditions. Assume that $\rho_1$, $\rho_2$ are such that $(1 - \rho_1 z - \rho_2 z^2) = 0$ implies $|z| > 1$.

a. Map this problem into an optimal linear regulator problem.

b. For parameter values $[\beta, (1 + r), b, \gamma, \rho_1, \rho_2] = (.95, .95^{-1}, 30, 1, 1.2, -.3)$, compute the household’s optimal policy function using your Matlab program from exercise 5.1.

Exercise 5.4

Modify exercise 5.3 by assuming that the household seeks to maximize

$$-\sum_{t=1}^\infty \beta^t \left\{(s_t - b)^2 + \gamma i_t^2\right\}.$$

Here $s_t$ measures consumption services that are produced by durables or habits according to

$$s_t = \lambda h_t + \pi c_t \tag{5.8a}$$

$$h_{t+1} = \delta h_t + \theta c_t, \tag{5.8b}$$

where $h_t$ is the stock of the durable good or habit, $(\lambda, \pi, \delta, \theta)$ are parameters, and $h_0$ is an initial condition.

a. Map this problem into a linear regulator problem.

b. For the same parameter values as in exercise 5.3 and $(\lambda, \pi, \delta, \theta) = (1, .05, .95, 1)$, compute the optimal policy for the household.

c. For the same parameter values as in exercise 5.3 and $(\lambda, \pi, \delta, \theta) = (-1, 1, .95, 1)$, compute the optimal policy.

d. Interpret the parameter settings in part b as capturing a model of durable consumption goods, and the settings in part c as giving a model of habit persistence.

Exercise 5.5

A household’s labor income follows the stochastic process

$$y_{t+1} = \rho_1 y_t + \rho_2 y_{t-1} + w_{t+1} + \gamma w_t,$$

where $w_{t+1}$ is a Gaussian martingale difference sequence with unit variance. Calculate

$$E\left[\sum_{j=0}^\infty \beta^j y_{t+j} \,\Big|\, y^t, w^t\right], \tag{5.9}$$

where $y^t$, $w^t$ denotes the history of $y$, $w$ up to $t$.

a. Write a Matlab program to compute expression (5.9).

b. Use your program to evaluate expression (5.9) for the parameter values $(\beta, \rho_1, \rho_2, \gamma) = (.95, 1.2, -.4, .5)$.

Exercise 5.6 Dynamic Laffer curves

The demand for currency in a small country is
described by

$$
M_t / p_t = \gamma_1 - \gamma_2 \, p_{t+1} / p_t , \tag{1}
$$

where $\gamma_1 > \gamma_2 > 0$, $M_t$ is the stock of currency held by the public at the end of period $t$, and $p_t$ is the price level at time $t$. There is no randomness in the country, so that there is perfect foresight. Equation (1) is a Cagan-like demand function for currency, expressing real balances as an inverse function of the expected gross rate of inflation.

Speaking of Cagan, the government is running a permanent real deficit of $g$ per period, measured in goods, all of which it finances by currency creation. The government's budget constraint at $t$ is

$$
(M_t - M_{t-1}) / p_t = g , \tag{2}
$$

where the left side is the real value of the new currency printed at time $t$. The economy starts at time $t = 0$, with the initial level of nominal currency stock $M_{-1} = 100$ being given.

For this model, define an equilibrium as a pair of positive sequences $\{p_t > 0, M_t > 0\}_{t=0}^{\infty}$ that satisfy equations (1) and (2) (portfolio balance and the government budget constraint, respectively) for $t \geq 0$, and the initial condition assigned for $M_{-1}$.

a. Let $\gamma_1 = 100$, $\gamma_2 = 50$, $g = .05$. Write a computer program to compute equilibria for this economy. Describe your approach and display the program.

b. Argue that there exists a continuum of equilibria. Find the lowest value of the initial price level $p_0$ for which there exists an equilibrium. (Hint Number 1: Notice the positivity condition that is part of the definition of equilibrium. Hint Number 2: Try using the general approach to solving difference equations described in the section "A Lagrangian formulation.")

c. Show that for all of these equilibria except the one that is associated with the minimal $p_0$ that you calculated in part b, the gross inflation rate and the gross money creation rate both eventually converge to the same value. Compute this value.

d. Show that there is a unique equilibrium with a lower inflation rate than the one that you computed in part b. Compute this inflation rate.

e. Increase the level
of $g$ to $.075$. Compare the (eventual or asymptotic) inflation rate that you computed in part b and the inflation rate that you computed in part c. Are your results consistent with the view that "larger permanent deficits cause larger inflation rates"?

f. Discuss your results from the standpoint of the "Laffer curve."

Hint: A Matlab program dlqrmon.m performs the calculations. It is available from the web site for the book.

Exercise 5.7

A government faces an exogenous stream of government expenditures $\{g_t\}$ that it must finance. Total government expenditures at $t$ consist of two components:

$$
g_t = g_{Tt} + g_{Pt} , \tag{1}
$$

where $g_{Tt}$ is "transitory" expenditures and $g_{Pt}$ is "permanent" expenditures. At the beginning of period $t$, the government observes the history up to $t$ of both $g_{Tt}$ and $g_{Pt}$. Further, it knows the stochastic laws of motion of both, namely,

$$
\begin{aligned}
g_{Pt+1} &= g_{Pt} + c_1 \epsilon_{1,t+1} \\
g_{Tt+1} &= (1-\rho)\mu_T + \rho g_{Tt} + c_2 \epsilon_{2t+1}
\end{aligned} \tag{2}
$$

where $\epsilon_{t+1} = \begin{bmatrix} \epsilon_{1t+1} \\ \epsilon_{2t+1} \end{bmatrix}$ is an i.i.d. Gaussian vector process with mean zero and identity covariance matrix. The government finances its budget with distorting taxes. If it collects $T_t$ total revenues at $t$, it bears a dead weight loss of $W(T_t)$, where $W(T_t) = w_1 T_t + .5 w_2 T_t^2$ and $w_1, w_2 > 0$. The government's loss functional is

$$
E \sum_{t=0}^{\infty} \beta^t W(T_t), \quad \beta \in (0,1). \tag{3}
$$

The government can purchase or issue one-period risk-free loans at a constant price $q$. Therefore, it faces a sequence of budget constraints

$$
g_t + q b_{t+1} = T_t + b_t , \tag{4}
$$

where $q^{-1}$ is the gross rate of return on one-period risk-free government loans. Assume that $b_0 = 0$. The government also faces the terminal value condition

$$
\lim_{t \to +\infty} \beta^t W'(T_t) b_{t+1} = 0 ,
$$

which prevents it from running a Ponzi scheme. The government wants to design a tax collection strategy expressing $T_t$ as a function of the history of $g_{Tt}, g_{Pt}, b_t$ that minimizes (3) subject to (1), (2), and (4).

a. Formulate the government's problem as a dynamic programming problem. Please carefully define the state and control for this problem. Write the
Bellman equation in as much detail as you can. Tell a computational strategy for solving the Bellman equation. Tell the form of the optimal value function and the optimal decision rule.

b. Using objects that you computed in part a, please state the form of the law of motion for the joint process of $g_{Tt}, g_{Pt}, T_t, b_{t+1}$ under the optimal government policy.

Some background: Assume now that the optimal tax rule that you computed above has been in place for a very long time. A macroeconomist who is studying the economy observes time series on $g_t, T_t$, but not on $b_t$ or the breakdown of $g_t$ into its components $g_{Tt}, g_{Pt}$. The macroeconomist has a very long time series for $[g_t, T_t]$ and proceeds to compute a vector autoregression for this vector.

c. Define a population vector autoregression for the $[g_t, T_t]$ process. (Feel free to assume that lag lengths are infinite if this simplifies your answer.)

d. Please tell precisely how the vector autoregression for $[g_t, T_t]$ depends on the parameters $[\rho, \beta, \mu, q, w_1, w_2, c_1, c_2]$ that determine the joint $[g_t, T_t]$ process according to the economic theory you used in part a.

e. Now suppose that in addition to his observations on $[T_t, g_t]$, the economist gets an error-ridden time series on government debt $b_t$:

$$
\tilde b_t = b_t + c_3 w_{3t+1} ,
$$

where $w_{3t+1}$ is an i.i.d. scalar Gaussian process with mean zero and unit variance that is orthogonal to $w_{is+1}$ for $i = 1, 2$ for all $s$ and $t$. Please tell how the vector autoregression for $[g_t, T_t, \tilde b_t]$ is related to the parameters $[\rho, \beta, \mu, q, w_1, w_2, c_1, c_2, c_3]$. Is there any way to use the vector autoregression to make inferences about those parameters?

... be verified directly that $M$ in equation (5.5.3) is symplectic. It follows from equation (5.5.4) and $J^{-1} = J' = -J$ that for any symplectic matrix $M$,

$$
M' = J^{-1} M^{-1} J . \tag{5.5.5}
$$

Equation (5.5.5) states ...

Figure 5.6.1: Graph of $f(\Sigma) = \frac{\Sigma (R+Q) + QR}{\Sigma + R}$, $Q = R = 1$, against the 45-degree line. Iterations on the Riccati equation for $\Sigma_t$ converge to the fixed point. For ...

... from equation (5.5.5) that the eigenvalues of $M$ occur in reciprocal pairs: if $\lambda$ is an eigenvalue of $M$, so is $\lambda^{-1}$. Write equation (5.5.2) as

$$
y_{t+1} = M y_t , \tag{5.5.6}
$$

where $y_t = \begin{bmatrix} x_t \\ \mu_t \end{bmatrix}$. Consider the
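
The convergence claim in the caption of Figure 5.6.1 can be checked numerically. The sketch below iterates the scalar map $f(\Sigma) = (\Sigma(R+Q) + QR)/(\Sigma + R)$ from a starting value until successive iterates stop changing. The values $Q = R = 1$, the starting point $\Sigma_0 = 0$, the tolerance, and the function names `riccati_map` and `iterate_to_fixed_point` are assumptions made for this illustration only; they are not taken from the text.

```python
import math


def riccati_map(sigma, q=1.0, r=1.0):
    """Scalar Riccati map f(Sigma) = (Sigma*(R + Q) + Q*R) / (Sigma + R)."""
    return (sigma * (r + q) + q * r) / (sigma + r)


def iterate_to_fixed_point(sigma0=0.0, q=1.0, r=1.0, tol=1e-12, max_iter=1000):
    """Iterate Sigma_{t+1} = f(Sigma_t) until successive values differ by less than tol.

    Returns the approximate fixed point and the number of iterations used.
    sigma0, tol, and max_iter are illustrative choices, not values from the text.
    """
    sigma = sigma0
    for n in range(1, max_iter + 1):
        sigma_next = riccati_map(sigma, q, r)
        if abs(sigma_next - sigma) < tol:
            return sigma_next, n
        sigma = sigma_next
    raise RuntimeError("iteration did not converge within max_iter steps")


# Fixed point for the assumed values Q = R = 1, starting from Sigma_0 = 0.
sigma_star, n_iter = iterate_to_fixed_point()

# Positive root of Sigma^2 - Q*Sigma - Q*R = 0 with Q = R = 1, for comparison.
closed_form = (1.0 + math.sqrt(5.0)) / 2.0
```

Setting $f(\Sigma) = \Sigma$ and clearing the denominator gives $\Sigma^2 - Q\Sigma - QR = 0$, so under the assumed $Q = R = 1$ the iterates should approach the positive root $(1 + \sqrt{5})/2$. The slope of the map, $f'(\Sigma) = R^2/(\Sigma + R)^2$, is below one for every $\Sigma > 0$, which is why the iteration settles down quickly in this scalar case.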