A Feedback Stackelberg Stochastic Differential Game

13. Differential Games

Figure 13.1: A sample path of optimal market share trajectories

The preceding sections in this chapter dealt with differential games in which all players make their decisions simultaneously. We now discuss a differential game in which two players make their decisions in a hierarchical manner. The player having the right to move first is called the leader and the other player is called the follower. If there are two or more leaders, they play Nash, and the same goes for the followers.

In terms of solutions of Stackelberg differential games, we have open-loop and feedback solutions. An open-loop Stackelberg equilibrium specifies, at the initial time (say, $t = 0$), the decisions over the entire horizon. As in Sect. 13.1, there is a maximum principle for open-loop solutions. Typically, open-loop solutions are not time consistent in the sense that at any time $t > 0$, the remaining decisions may no longer be optimal; see Exercise 13.2. A feedback or Markovian Stackelberg equilibrium, on the other hand, consists of decisions expressed as functions of the current state and time. Such a solution is time consistent.

In this section, we will not develop the general theory, for which we refer the reader to Basar and Olsder (1999), Dockner et al. (2000), and Bensoussan et al. (2014, 2015a, 2018). Instead, we will formulate a Stackelberg differential game of cooperative advertising between a manufacturer as the leader and a retailer as the follower, and obtain a feedback Stackelberg solution. This formulation is due to He et al. (2009). A verification theorem that applies to this problem can be found in Bensoussan et al. (2018).

13.4. A Stackelberg Differential Game of Cooperative Advertising

The manufacturer sells a product to end users through the retailer. The product is in a mature category where sales, expressed as a fraction of the potential market, are influenced through advertising expenditures.

The manufacturer as the leader decides on an advertising support scheme via a subsidy rate, i.e., he will contribute a certain percentage of the advertising expenditure by the retailer. Specifically, the manufacturer decides on a subsidy rate $W_t$, $0 \le W_t \le 1$, and the retailer as the follower decides on the advertising effort level $U_t \ge 0$, $t \ge 0$.

As in Sect. 12.3, the cost of advertising is quadratic in the advertising effort $U_t$. Then, with the advertising effort $U_t$ and the subsidy rate $W_t$, the manufacturer's and the retailer's advertising expenditures are $W_t U_t^2$ and $(1 - W_t) U_t^2$, respectively. The market share dynamics is given by the Sethi model

$$dX_t = \left(r U_t \sqrt{1 - X_t} - \delta X_t\right) dt + \sigma(X_t)\, dZ_t, \quad X_0 = x_0. \tag{13.50}$$

The corresponding expected profits of the retailer and the manufacturer are, respectively, as follows:

$$J_R = E\left[\int_0^\infty e^{-\rho t}\left(\pi X_t - (1 - W_t) U_t^2\right) dt\right], \tag{13.51}$$

$$J_M = E\left[\int_0^\infty e^{-\rho t}\left(\pi_M X_t - W_t U_t^2\right) dt\right]. \tag{13.52}$$

A solution of this Stackelberg differential game depends on the available information structure. We shall assume that at each time $t$, both players know the current system state and the follower knows the action of the leader. The concept of equilibrium that applies in this case is that of feedback Stackelberg equilibrium. For this and other information structures and equilibrium concepts, see Bensoussan et al. (2015a).

Next we define the rules, governing the sequence of actions, by which this game will be played over time. To be specific, the sequence of plays at any time $t \ge 0$ is as follows. First, the manufacturer observes the market share $X_t$ at time $t$ and selects the subsidy rate $W_t$. Then, the retailer observes this action $W_t$ and, knowing also the market share $X_t$ at time $t$, sets the advertising effort rate $U_t$ as his response to $W_t$. The system evolves over time as this game is played in continuous time beginning at time $t = 0$. One could visualize this game as being played at times $0, \delta t, 2\delta t, \ldots$, and then let $\delta t \to 0$.
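The discrete-time picture of the game (plays at times $0, \delta t, 2\delta t, \ldots$) can be sketched with an Euler-Maruyama discretization of (13.50). This is only an illustration and not part of the original formulation: the function name `simulate_play`, the step size, the clipping of the state to $[0, 1]$, and the two constant policies in the example are assumptions made here. The parameter values and the form of $\sigma(x)$ are the ones quoted for the exercises of this section.

```python
import math
import random

def simulate_play(W_policy, U_policy, x0, r, delta, sigma,
                  dt=0.01, n_steps=1000, seed=0):
    """Euler-Maruyama sketch of the sequence of plays for dynamics (13.50).

    At each step the leader observes x and picks W = W_policy(x); the
    follower then observes (x, W) and picks U = U_policy(x, W).
    Clipping the state to [0, 1] is an assumption of this sketch.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        W = W_policy(x)        # leader moves first
        U = U_policy(x, W)     # follower responds to the observed W
        drift = r * U * math.sqrt(1.0 - x) - delta * x
        noise = sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x = min(1.0, max(0.0, x + drift * dt + noise))
        path.append(x)
    return path

# Hypothetical constant policies, for illustration only
# (these are not the equilibrium menus derived later).
path = simulate_play(
    W_policy=lambda x: 0.5,
    U_policy=lambda x, W: 0.3,
    x0=0.1, r=2.0, delta=1.0,
    sigma=lambda x: 0.25 * math.sqrt(x * (1.0 - x)),
)
print(len(path), min(path), max(path))
```

Passing the two policies as callables mirrors the information structure: the follower's rule takes the leader's current action as an argument, while the leader's rule sees only the state.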

Next, we will address the question of how players choose their actions at any given $t$. Specifically, we are interested in deriving an equilibrium menu $W(x)$ for the leader representing his decision when the state is $x$ at time $t$, and a menu $U(x, W)$ for the follower representing his decision when he observes the leader's decision to be $W$ in addition to the state $x$ at time $t$. For this, let us first define a feedback Stackelberg equilibrium, and then develop a procedure to obtain it.

We begin with specifying the admissible strategy spaces for the manufacturer and the retailer, respectively:

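$$\mathcal{W} = \{W \mid W : [0,1] \to [0,1] \text{ and } W(x) \text{ is Lipschitz continuous in } x\},$$

$$\mathcal{U} = \{U \mid U : [0,1] \times [0,1] \to [0,\infty) \text{ and } U(x, W) \text{ is Lipschitz continuous in } (x, W)\}.$$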

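For a pair of strategies $(W, U) \in \mathcal{W} \times \mathcal{U}$, let $Y_s$, $s \ge t$, denote the solution of the state equation

$$dY_s = \left(r U(Y_s, W(Y_s)) \sqrt{1 - Y_s} - \delta Y_s\right) ds + \sigma(Y_s)\, dZ_s, \quad Y_t = x. \tag{13.53}$$

We should note that $Y_s$ here stands for $Y_s(t, x; W, U)$, as the solution depends on the specified arguments. Then $J_M^{t,x}(W(\cdot), U(\cdot, W(\cdot)))$ and $J_R^{t,x}(W(\cdot), U(\cdot, W(\cdot)))$, representing the current-value profits of the manufacturer and the retailer at time $t$, are, respectively,

$$J_M^{t,x}(W(\cdot), U(\cdot, W(\cdot))) = E\left[\int_t^\infty e^{-\rho(s-t)}\left[\pi_M Y_s - W(Y_s)\{U(Y_s, W(Y_s))\}^2\right] ds\right], \tag{13.54}$$

$$J_R^{t,x}(W(\cdot), U(\cdot, W(\cdot))) = E\left[\int_t^\infty e^{-\rho(s-t)}\left[\pi Y_s - (1 - W(Y_s))\{U(Y_s, W(Y_s))\}^2\right] ds\right], \tag{13.55}$$

where we should stress that $W(\cdot), U(\cdot, W(\cdot))$ evaluated at any state $\zeta$ are $W(\zeta), U(\zeta, W(\zeta))$. We can now define our equilibrium concept.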

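A pair of strategies $(W^*, U^*) \in \mathcal{W} \times \mathcal{U}$ is called a feedback Stackelberg equilibrium if

$$J_M^{t,x}(W^*(\cdot), U^*(\cdot, W^*(\cdot))) \ge J_M^{t,x}(W(\cdot), U^*(\cdot, W(\cdot))), \quad W \in \mathcal{W},\ x \in [0,1],\ t \ge 0, \tag{13.56}$$

and

$$J_R^{t,x}(W^*(\cdot), U^*(\cdot, W^*(\cdot))) \ge J_R^{t,x}(W^*(\cdot), U(\cdot, W^*(\cdot))), \quad U \in \mathcal{U},\ x \in [0,1],\ t \ge 0. \tag{13.57}$$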

It has been shown in Bensoussan et al. (2014) that this equilibrium is obtained by solving a pair of Hamilton-Jacobi-Bellman equations where a static Stackelberg game is played at the Hamiltonian level at each $t$, and where

$$H^M(x, W, U, \lambda^M) = \pi_M x - W U^2 + \lambda^M\left(r U \sqrt{1-x} - \delta x\right), \tag{13.58}$$

$$H^R(x, W, U, \lambda^R) = \pi x - (1 - W) U^2 + \lambda^R\left(r U \sqrt{1-x} - \delta x\right) \tag{13.59}$$

are the Hamiltonians for the manufacturer and the retailer, respectively.

To solve this Hamiltonian-level game, we first maximize $H^R$ with respect to $U$ in terms of $x$ and $W$. The first-order condition gives

$$U^*(x, W) = \frac{\lambda^R r \sqrt{1-x}}{2(1-W)}, \tag{13.60}$$

as the optimal response of the follower for any decision $W$ by the leader. We then substitute this for $U$ in $H^M$ to obtain

$$H^M(x, W, U^*(x, W), \lambda^M) = \pi_M x - \frac{W(\lambda^R r)^2 (1-x)}{4(1-W)^2} + \lambda^M\left(\frac{\lambda^R r^2 (1-x)}{2(1-W)} - \delta x\right). \tag{13.61}$$

The first-order condition of maximizing $H^M$ with respect to $W$ gives us

$$W(x) = \frac{2\lambda^M - \lambda^R}{2\lambda^M + \lambda^R}. \tag{13.62}$$

Clearly $W(x) \ge 1$ makes no intuitive sense because it would induce the retailer to spend an infinite amount on advertising, and that would not be optimal for the leader. Moreover, $\lambda^M$ and $\lambda^R$, the marginal valuations of the market share of the leader and the follower, respectively, are expected to be positive, and therefore it follows from (13.62) that $W(x) < 1$. Thus, we set

$$W^*(x) = \max\left\{0, \frac{2\lambda^M - \lambda^R}{2\lambda^M + \lambda^R}\right\}. \tag{13.63}$$
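The Hamiltonian-level Stackelberg game can be checked numerically. The sketch below, which is not from the original text, plugs the follower's response (13.60) into $H^M$ from (13.58) and maximizes over $W$ by brute force on a grid, then compares the result with the closed form (13.63). The function names, the grid resolution, and the numerical values of $x$, $\lambda^M$, and $\lambda^R$ are assumptions chosen only for illustration.

```python
import math

def follower_response(x, W, lam_R, r):
    """Follower's best response (13.60) for a given subsidy rate W < 1."""
    return lam_R * r * math.sqrt(1.0 - x) / (2.0 * (1.0 - W))

def leader_hamiltonian(x, W, lam_M, lam_R, r, delta, pi_M):
    """H^M from (13.58), evaluated at the follower's best response."""
    U = follower_response(x, W, lam_R, r)
    return pi_M * x - W * U**2 + lam_M * (r * U * math.sqrt(1.0 - x) - delta * x)

def subsidy_closed_form(lam_M, lam_R):
    """Leader's subsidy rate from (13.63)."""
    return max(0.0, (2.0 * lam_M - lam_R) / (2.0 * lam_M + lam_R))

def subsidy_grid_search(x, lam_M, lam_R, r, delta, pi_M, n=10000):
    """Brute-force maximizer of H^M over the grid W = 0, 1/n, ..., (n-1)/n."""
    grid = [k / n for k in range(n)]
    return max(grid, key=lambda W: leader_hamiltonian(x, W, lam_M, lam_R,
                                                      r, delta, pi_M))

# Illustrative (assumed) values: interior case, then corner case W* = 0.
for lam_M, lam_R in [(0.4, 0.2), (0.05, 0.2)]:
    W_grid = subsidy_grid_search(0.3, lam_M, lam_R, r=2.0, delta=1.0, pi_M=0.5)
    print(lam_M, lam_R, W_grid, subsidy_closed_form(lam_M, lam_R))
```

The second pair has $2\lambda^M < \lambda^R$, so the unconstrained formula (13.62) would be negative and the maximum over $[0, 1)$ sits at the corner $W = 0$.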

We can now write the HJB equations as

$$\rho V^R = H^R(x, W^*(x), U^*(x, W^*(x)), V_x^R) + \frac{(\sigma(x))^2 V_{xx}^R}{2} = \pi x + \frac{(V_x^R r)^2 (1-x)}{4(1 - W^*(x))} - V_x^R \delta x + \frac{(\sigma(x))^2 V_{xx}^R}{2}, \tag{13.64}$$

$$\rho V^M = H^M(x, W^*(x), U^*(x, W^*(x)), V_x^M) + \frac{(\sigma(x))^2 V_{xx}^M}{2} = \pi_M x - \frac{(V_x^R r)^2 (1-x) W^*(x)}{4(1 - W^*(x))^2} + \frac{V_x^M V_x^R r^2 (1-x)}{2(1 - W^*(x))} - V_x^M \delta x + \frac{(\sigma(x))^2 V_{xx}^M}{2}. \tag{13.65}$$

The solution of these equations will yield the value functions $V^M(x)$ and $V^R(x)$. With these in hand, we can give the equilibrium menu of actions to the manufacturer and the retailer to guide their decisions at each $t$.

These menus are

$$W^*(x) = \max\left\{0, \frac{2V_x^M - V_x^R}{2V_x^M + V_x^R}\right\} \quad \text{and} \quad U^*(x, W) = \frac{V_x^R r \sqrt{1-x}}{2(1-W)}. \tag{13.66}$$

To solve for the value functions, we next investigate the two cases where the subsidy rate is (a) zero and (b) positive, and determine the condition required for no subsidy to be optimal.

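Case (a): No Co-op Advertising ($W^* = 0$). Inserting $W^*(x) = 0$ into (13.66) gives

$$U^*(x, 0) = \frac{r V_x^R \sqrt{1-x}}{2}. \tag{13.67}$$

Inserting $W^*(x) = 0$ into (13.65) and (13.64), we have

$$\rho V^M = \pi_M x + \frac{V_x^M V_x^R r^2 (1-x)}{2} - V_x^M \delta x + \frac{(\sigma(x))^2 V_{xx}^M}{2}, \tag{13.68}$$

$$\rho V^R = \pi x + \frac{(V_x^R)^2 r^2 (1-x)}{4} - V_x^R \delta x + \frac{(\sigma(x))^2 V_{xx}^R}{2}. \tag{13.69}$$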

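Let $V^M(x) = \alpha_M + \beta_M x$ and $V^R(x) = \alpha + \beta x$. Then, $V_x^M = \beta_M$ and $V_x^R = \beta$. Substituting these into (13.68) and (13.69) and equating like powers of $x$, we can express all of the unknowns in terms of $\beta$, which itself can be explicitly solved. That is, we obtain

$$\beta = \frac{2\pi}{\sqrt{(\rho+\delta)^2 + r^2\pi} + (\rho+\delta)}, \quad \beta_M = \frac{2\pi_M}{2(\rho+\delta) + \beta r^2}, \tag{13.70}$$

$$\alpha = \frac{\beta^2 r^2}{4\rho}, \quad \alpha_M = \frac{\beta \beta_M r^2}{2\rho}. \tag{13.71}$$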

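Using (13.71) in (13.67), we can write $U^*(x) = \sqrt{\rho\alpha(1-x)}$. Finally, we can derive the required condition from the right-hand side of $W^*(x)$ in (13.66), which is $2V_x^M \le V_x^R$, for no co-op advertising ($W^* = 0$) in the equilibrium. This is given by $2\beta_M \le \beta$, or

$$\frac{4\pi_M}{2(\rho+\delta) + \dfrac{2\pi r^2}{\sqrt{(\rho+\delta)^2 + r^2\pi} + (\rho+\delta)}} \le \frac{2\pi}{\sqrt{(\rho+\delta)^2 + r^2\pi} + (\rho+\delta)}. \tag{13.72}$$

After a few steps of algebra (note that $\beta r^2 = 2\left(\sqrt{(\rho+\delta)^2 + r^2\pi} - (\rho+\delta)\right)$, so the denominator on the left-hand side equals $2\sqrt{(\rho+\delta)^2 + r^2\pi}$), this yields the required condition

$$\theta := \frac{\pi_M}{\sqrt{(\rho+\delta)^2 + r^2\pi}} - \frac{\pi}{\sqrt{(\rho+\delta)^2 + r^2\pi} + (\rho+\delta)} \le 0. \tag{13.73}$$

Next, we obtain the solution when $\theta > 0$.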
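As a quick numerical illustration of (13.70), (13.71), and (13.73), the sketch below evaluates the no co-op coefficients and the indicator $\theta$ for the parameter values $\pi = 0.25$, $\pi_M = 0.5$, $r = 2$, $\rho = 0.05$, $\delta = 1$ quoted for the exercises of this section. The function name and the dictionary layout are assumptions made here. For this data $\beta = 0.2$ and $\theta > 0$, so the no co-op case is not the equilibrium for these parameters.

```python
import math

def no_coop_solution(pi, pi_M, r, rho, delta):
    """Coefficients (13.70)-(13.71) and the indicator theta from (13.73)."""
    k = rho + delta
    root = math.sqrt(k**2 + r**2 * pi)
    beta = 2.0 * pi / (root + k)
    beta_M = 2.0 * pi_M / (2.0 * k + beta * r**2)
    alpha = beta**2 * r**2 / (4.0 * rho)
    alpha_M = beta * beta_M * r**2 / (2.0 * rho)
    theta = pi_M / root - pi / (root + k)
    return {"beta": beta, "beta_M": beta_M,
            "alpha": alpha, "alpha_M": alpha_M, "theta": theta}

sol = no_coop_solution(pi=0.25, pi_M=0.5, r=2.0, rho=0.05, delta=1.0)
for name, value in sol.items():
    print(f"{name} = {value:.6f}")

# Residual of the x-coefficient equation implied by (13.69):
# (rho + delta) * beta = pi - beta^2 r^2 / 4.
print(abs(1.05 * sol["beta"] - (0.25 - sol["beta"]**2 * 4.0 / 4.0)))
```

Since $2(\rho+\delta) + \beta r^2 = 2\sqrt{(\rho+\delta)^2 + r^2\pi}$, the computed values also satisfy $\theta = \beta_M - \beta/2$, which is the condition $2\beta_M \le \beta$ in another form.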

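Case (b): Co-op Advertising ($W^* > 0$). Then, $W^*(x)$ in (13.66) reduces to

$$W^*(x) = \frac{2V_x^M - V_x^R}{2V_x^M + V_x^R}. \tag{13.74}$$

Inserting this for $W^*(x)$ into (13.65) and (13.64), we have

$$\rho V^M = \pi_M x - \frac{r^2 (1-x)\left[4(V_x^M)^2 - (V_x^R)^2\right]}{16} + \frac{V_x^M r^2 (1-x)\left[2V_x^M + V_x^R\right]}{4} - V_x^M \delta x + \frac{(\sigma(x))^2 V_{xx}^M}{2}, \tag{13.75}$$

$$\rho V^R = \pi x + \frac{(V_x^R)^2 r^2 (1-x)}{4}\left[\frac{2V_x^M + V_x^R}{2V_x^R}\right] - V_x^R \delta x + \frac{(\sigma(x))^2 V_{xx}^R}{2}. \tag{13.76}$$

Once again, $V^M(x) = \alpha_M + \beta_M x$, $V^R(x) = \alpha + \beta x$, $V_x^M = \beta_M$, $V_x^R = \beta$.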

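Substituting these into (13.75) and (13.76) and equating like powers of $x$, we have

$$\alpha = \frac{\beta(\beta + 2\beta_M) r^2}{8\rho}, \tag{13.77}$$

$$(\rho+\delta)\beta = \pi - \frac{\beta(\beta + 2\beta_M) r^2}{8}, \tag{13.78}$$

$$\alpha_M = \frac{(\beta + 2\beta_M)^2 r^2}{16\rho}, \tag{13.79}$$

$$(\rho+\delta)\beta_M = \pi_M - \frac{(\beta + 2\beta_M)^2 r^2}{16}. \tag{13.80}$$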

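Using (13.66), (13.74), and (13.79), we can write $U^*(x, W^*(x))$, with a slight abuse of notation, as

$$U^*(x) = \frac{r(V_x^R + 2V_x^M)\sqrt{1-x}}{4} = \sqrt{\rho\alpha_M(1-x)}. \tag{13.81}$$

The four equations (13.77)–(13.80) determine the solutions for the four unknowns, $\alpha$, $\beta$, $\alpha_M$, and $\beta_M$. From (13.78) and (13.80), we can obtain

$$\beta^3 + \frac{2\pi_M}{\rho+\delta}\beta^2 + \frac{8\pi}{r^2}\beta - \frac{8\pi^2}{(\rho+\delta) r^2} = 0. \tag{13.82}$$

If we denote

$$a_1 = \frac{2\pi_M}{\rho+\delta}, \quad a_2 = \frac{8\pi}{r^2}, \quad \text{and} \quad a_3 = \frac{-8\pi^2}{(\rho+\delta) r^2},$$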

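then $a_1 > 0$, $a_2 > 0$, and $a_3 < 0$. From Descartes' Rule of Signs, there exists a unique, positive real root. The two remaining roots may be both complex (non-real) or both real and negative. Since this is a cubic equation, a complete solution can be obtained. Using Mathematica or following Spiegel et al. (2008), we can write down the three roots as

$$\beta(1) = S + T - \tfrac{1}{3} a_1,$$

$$\beta(2) = -\tfrac{1}{2}(S + T) - \tfrac{1}{3} a_1 + \tfrac{\sqrt{3}}{2}\, i\, (S - T),$$

$$\beta(3) = -\tfrac{1}{2}(S + T) - \tfrac{1}{3} a_1 - \tfrac{\sqrt{3}}{2}\, i\, (S - T),$$

with

$$S = \sqrt[3]{R + \sqrt{Q^3 + R^2}}, \quad T = \sqrt[3]{R - \sqrt{Q^3 + R^2}}, \quad i = \sqrt{-1},$$

where

$$Q = \frac{3a_2 - a_1^2}{9}, \quad R = \frac{9 a_1 a_2 - 27 a_3 - 2 a_1^3}{54}.$$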

Next, we identify the positive root in each of the following three cases:

Case 1 ($Q > 0$): We have $S > 0 > T$ and $Q^3 + R^2 > 0$. There is one positive root and two complex conjugate roots. The positive root is $\beta = S + T - (1/3)a_1$.

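Table 13.1: Optimal feedback Stackelberg solution

| | (a) if $\theta \le 0$: No co-op equilibrium | (b) if $\theta > 0$: Co-op equilibrium |
|---|---|---|
| Retailer's profit $V^R$ | $V^R(x) = \alpha + \beta x$ | $V^R(x) = \alpha + \beta x$ |
| Manufacturer's profit $V^M$ | $V^M(x) = \alpha_M + \beta_M x$ | $V^M(x) = \alpha_M + \beta_M x$ |
| Coefficients of profit functions, $\alpha, \beta, \alpha_M, \beta_M$, obtained from: | $\beta = \dfrac{2\pi}{\sqrt{(\rho+\delta)^2 + r^2\pi} + (\rho+\delta)}$ | $\beta = \dfrac{\pi}{\rho+\delta} - \dfrac{\beta(\beta + 2\beta_M) r^2}{8(\rho+\delta)}$ |
| | $\beta_M = \dfrac{2\pi_M}{2(\rho+\delta) + \beta r^2}$ | $\beta_M = \dfrac{\pi_M}{\rho+\delta} - \dfrac{(\beta + 2\beta_M)^2 r^2}{16(\rho+\delta)}$ |
| | $\alpha = \dfrac{\beta^2 r^2}{4\rho}$ | $\alpha = \dfrac{\beta(\beta + 2\beta_M) r^2}{8\rho}$ |
| | $\alpha_M = \dfrac{\beta \beta_M r^2}{2\rho}$ | $\alpha_M = \dfrac{(\beta + 2\beta_M)^2 r^2}{16\rho}$ |
| Subsidy rate $W^*(x) =$ | $0$ | $\dfrac{2\beta_M - \beta}{2\beta_M + \beta} = 1 - \dfrac{\alpha}{\alpha_M}$ |
| Advertising effort $U^*(x) =$ | $\dfrac{r\beta\sqrt{1-x}}{2} = \sqrt{\rho\alpha(1-x)}$ | $\dfrac{r(\beta + 2\beta_M)\sqrt{1-x}}{4} = \sqrt{\rho\alpha_M(1-x)}$ |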

Case 2 ($Q < 0$ and $Q^3 + R^2 > 0$): Because $Q^3 + R^2 > 0$, there is again exactly one real root, together with two complex conjugate roots. The real root must be the positive one, which is $\beta = S + T - (1/3)a_1$.

Case 3 ($Q < 0$ and $Q^3 + R^2 < 0$): $S$ and $T$ are both complex. We have three real roots with one positive root. While subcases can be given to identify the positive root, for our purposes, it is enough to identify it numerically.

Finally, we can conclude that $2\beta_M - \beta > 0$ so that $W^* > 0$, since if this were not the case, then $W^*$ would be zero, and we would once again be in Case (a).
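Identifying the positive root numerically, as suggested for Case 3, works in all three cases. The sketch below is an illustration under stated assumptions rather than the procedure of the original text: it finds the positive root of (13.82) by bisection on $[0, \pi/(\rho+\delta)]$ (the cubic equals $a_3 < 0$ at $0$ and is positive at $\pi/(\rho+\delta)$), recovers $\beta_M$, $\alpha$, $\alpha_M$, and $W^*$ from (13.77)–(13.80), and cross-checks against $S + T - a_1/3$ when $Q^3 + R^2 > 0$. The function names are hypothetical; the parameter values are those quoted for the exercises of this section.

```python
import math

def real_cbrt(v):
    """Real cube root, valid for negative arguments as well."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)

def coop_solution(pi, pi_M, r, rho, delta, tol=1e-14):
    """Solve (13.77)-(13.80) via the positive root of the cubic (13.82)."""
    k = rho + delta
    a1 = 2.0 * pi_M / k
    a2 = 8.0 * pi / r**2
    a3 = -8.0 * pi**2 / (k * r**2)

    def cubic(b):
        return b**3 + a1 * b**2 + a2 * b + a3

    lo, hi = 0.0, pi / k            # cubic(lo) < 0 < cubic(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    s = 8.0 * (pi - k * beta) / (r**2 * beta)   # s = beta + 2*beta_M, by (13.78)
    beta_M = 0.5 * (s - beta)
    alpha = beta * s * r**2 / (8.0 * rho)       # (13.77)
    alpha_M = s**2 * r**2 / (16.0 * rho)        # (13.79)
    W_star = (2.0 * beta_M - beta) / (2.0 * beta_M + beta)

    # Closed-form cross-check, available when Q^3 + R^2 > 0 (Cases 1 and 2).
    Q = (3.0 * a2 - a1**2) / 9.0
    R = (9.0 * a1 * a2 - 27.0 * a3 - 2.0 * a1**3) / 54.0
    D = Q**3 + R**2
    beta_closed = None
    if D > 0.0:
        S = real_cbrt(R + math.sqrt(D))
        T = real_cbrt(R - math.sqrt(D))
        beta_closed = S + T - a1 / 3.0

    return {"beta": beta, "beta_M": beta_M, "alpha": alpha,
            "alpha_M": alpha_M, "W_star": W_star, "beta_closed": beta_closed}

res = coop_solution(pi=0.25, pi_M=0.5, r=2.0, rho=0.05, delta=1.0)
for name, value in res.items():
    print(name, value)
```

For this data the subsidy rate rounds to $0.58$, the value stated in Exercise 13.7. Bisection is used here only because it needs no case distinction; any bracketing root finder would serve equally well.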

We can now summarize the optimal feedback Stackelberg equilibrium in Table 13.1. In Exercises 13.7–13.10, you are asked to further explore the model of this section when the parameters are $\pi = 0.25$, $\pi_M = 0.5$, $r = 2$, $\rho = 0.05$, $\delta = 1$, and $\sigma(x) = 0.25\sqrt{x(1-x)}$. For this case, He et al. (2009) obtain the comparative statics as shown in Fig. 13.2.
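A rough way to explore such comparative statics is to tabulate $W^*$ from Table 13.1 while varying one margin and holding the other parameters at the values above. The sketch below does this; it is an illustration only and does not reproduce the published figure. The function name `subsidy_rate`, the use of bisection, and the particular grids of $\pi$ and $\pi_M$ values are assumptions made here.

```python
import math

def subsidy_rate(pi, pi_M, r=2.0, rho=0.05, delta=1.0):
    """Equilibrium subsidy rate W* from Table 13.1 (zero when theta <= 0)."""
    k = rho + delta
    root = math.sqrt(k**2 + r**2 * pi)
    theta = pi_M / root - pi / (root + k)          # (13.73)
    if theta <= 0.0:
        return 0.0
    a1 = 2.0 * pi_M / k
    a2 = 8.0 * pi / r**2
    a3 = -8.0 * pi**2 / (k * r**2)
    lo, hi = 0.0, pi / k       # the cubic (13.82) changes sign on this interval
    for _ in range(200):       # bisection; 200 halvings exhaust double precision
        mid = 0.5 * (lo + hi)
        if mid**3 + a1 * mid**2 + a2 * mid + a3 < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    s = 8.0 * (pi - k * beta) / (r**2 * beta)      # s = beta + 2*beta_M, by (13.78)
    return (s - 2.0 * beta) / s                    # (2*beta_M - beta)/(2*beta_M + beta)

print("W* against the retailer's margin pi (pi_M = 0.5):")
for pi in (0.15, 0.25, 0.35, 0.45, 0.55):
    print(f"  pi = {pi:.2f}  W* = {subsidy_rate(pi, 0.5):.3f}")

print("W* against the manufacturer's margin pi_M (pi = 0.25):")
for pi_M in (0.1, 0.3, 0.5, 0.7, 1.0):
    print(f"  pi_M = {pi_M:.2f}  W* = {subsidy_rate(0.25, pi_M):.3f}")
```

With $\pi = 0.25$ and $\pi_M = 0.1$ the indicator $\theta$ is negative, so the function returns the no co-op value $W^* = 0$.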

Figure 13.2: Optimal subsidy rate vs. (a) Retailer's margin and (b) Manufacturer's margin

There have been many applications of differential games in marketing in general and optimal advertising in particular. Some references are Bensoussan et al. (1978), Deal et al. (1979), Deal (1979), Jørgensen (1982a), Rao (1984, 1990), Dockner and Jørgensen (1986, 1992), Chintagunta and Vilcassim (1992), Chintagunta and Jain (1994, 1995), Fruchter (1999a), Jarrar et al. (2004), Martín-Herrán et al. (2005), Breton et al. (2006), Jørgensen and Zaccour (2007), He and Sethi (2008), Naik et al. (2008), Zaccour (2008a), Jørgensen et al. (2009), Prasad and Sethi (2009). The literature on advertising differential games is surveyed by Jørgensen (1982a) and the literature on management applications of Stackelberg differential games is reviewed by He et al. (2007). Monographs are written by Erickson (2003) and Jørgensen and Zaccour (2004). For applications of differential games to economics and management science in general, see the book by Dockner et al. (2000).

Exercises for Chapter 13

E 13.1 A Bilinear Quadratic Advertising Model (Deal et al. 1979). Let $x_i$ be the market share of firm $i$ and $u_i$ be its advertising rate, $i = 1, 2$. The state equations are

$$\dot{x}_1 = b_1 u_1 (1 - x_1 - x_2) + e_1 (u_1 - u_2)(x_1 + x_2) - a_1 x_1, \quad x_1(0) = x_{10},$$

$$\dot{x}_2 = b_2 u_2 (1 - x_1 - x_2) + e_2 (u_2 - u_1)(x_1 + x_2) - a_2 x_2, \quad x_2(0) = x_{20},$$

where $b_i$, $e_i$, and $a_i$ are given positive constants. Firm $i$ wants to maximize

$$J_i = w_i e^{-\rho T} x_i(T) + \int_0^T (c_i x_i - u_i^2) e^{-\rho t}\, dt,$$

where $w_i$, $c_i$, and $\rho$ are positive constants. Derive the necessary conditions for the open-loop Nash solution, and formulate the resulting boundary value problem. In a related paper, Deal (1979) provides a numerical solution to this problem with $e_1 = e_2 = 0$.

E 13.2 Let $x(t)$ denote the stock of pollution at time $t \in [0, T]$ that affects the welfare of two countries, one of which is the leader and the other the follower. The state dynamics is

$$\dot{x} = u + v, \quad x(0) = x_0,$$

where $u$ and $v$ are emission rates of the leader and the follower, respectively. Let their instantaneous utility functions be

$$u - (u^2 + x^2)/2 \quad \text{and} \quad v - (v^2 + x^2)/2,$$

respectively. Obtain the open-loop Stackelberg solution. By re-solving this problem at time $\tau$, $0 < \tau < T$, show that the first solution obtained is time inconsistent.

Hint: Apply first the maximum principle to the follower's problem for any given leader's decision $u$. Let $\lambda_F$ denote the adjoint variable associated with the state $x$. Clearly $\lambda_F(T) = 0$. Then apply the maximum principle to the leader's problem, treating the follower's adjoint equation as a "state" equation in addition to the state equation for $x$. Let the adjoint variables associated with $x$ and $\lambda_F$ be $\lambda_L$ and $\mu$, respectively. Clearly $\lambda_L(T) = 0$. However, the transversality condition for $\mu$ will be $\mu(0) = 0$ in view of Remark 3.9. See Basar and Olsder (1999) and Dockner et al. (2000) for further details.

E 13.3 Develop the nonlinear model for licensing of fishermen described toward the end of Sect. 13.2.3 by rewriting (13.19) and (13.22) for the model. Derive the adjoint equation for $\lambda_i$ for the $i$th producer, and show that the feedback Nash policy for producer $i$ is given by

$$f(v_i^*) = \frac{c_i}{(p_i - \lambda_i) x}.$$

E 13.4 Consider an $N$-firm oligopoly. Let $S_i(t)$ denote the cumulative sales by time $t$ of firm $i \in \{1, 2, \ldots, N\}$ and define $S(t) = \sum_{i=1}^{N} S_i(t)$. Let $A_i(t)$ denote firm $i$'s advertising rate. With positive constants $a$, $b$, and $d$, assume that the differential game has the diffusion dynamics

$$\dot{S}_i(t) = [a + b \log A_i(t) + d S(t)][M - S(t)], \quad S_i(0) = S_{i0} \ge 0,$$

which means that a firm can stimulate its sales through advertising (but subject to decreasing returns) and that demand learning effects (imitation) are industry-wide. (If these effects were firm-specific we would have $S_i$ instead of $S$ in the brackets on the right-hand side of the dynamics.) Payoffs are given by

$$J_i = \int_0^T [(p_i - c_i)\dot{S}_i(t) - A_i(t)]\, dt,$$

in which prices and unit costs are constant. Since $\dot{S}_i(t)$ in the expression for $J_i$ is stated in terms of the state variable $S(t)$ and the control variables $A_i(t)$, $i \in \{1, 2, \ldots, N\}$, formulate the differential game problem with $S(t)$ as the state variable. In the open-loop Nash equilibrium, show that the advertising rates are monotonically decreasing over time.

Hint: Assume $\partial^2 H_i / \partial S^2 \le 0$ so that $H_i$ is concave in $S$. Use this condition to prove the monotone property.

E 13.5 Solve (13.43) to obtain the solution for $\alpha$ and $\beta$ given in (13.44) and (13.45).

E 13.6 Use Mathematica or another suitable software program to solve the quartic equation (13.46). Show that for $\rho_1 = \rho_2 = 0.05$, $\pi_1 = \pi_2 = 1$, $\delta = 0.01$, $R_1 = 1$, $R_2 = 4$, the only positive solution for $\beta_1$ is 0.264545. Figure 13.1 gives a sample path of the optimal market shares of the two firms for this problem. Draw another sample path.

E 13.7 In the Stackelberg differential game of Sect. 13.4 let $\pi = 0.25$, $\pi_M = 0.5$, $r = 2$, $\rho = 0.05$, and $\delta = 1$. Obtain the coefficients $\alpha$, $\beta$, $\alpha_M$, $\beta_M$, and show that $W^* = 0.58$. Graph the value functions $V^M(x) = \alpha_M + \beta_M x$, $V(x) = \alpha + \beta x$, and their sum $V^M(x) + V(x)$, as functions of the market share $x$.

E 13.8 Suppose the manufacturer in Exercise 13.7 does not behave optimally and decides instead to offer no cooperative advertising. Obtain the value functions of the manufacturer and the retailer. Compare the manufacturer's value function in this case with $V^M(x)$ in Exercise 13.7. Furthermore, when $x_0 = 0.5$, obtain the manufacturer's loss in expected profit when compared to the optimal expected profit $V^M(x_0)$ in Exercise 13.7.

E 13.9 Suppose that the manufacturer and the retailer in the problem of Sect. 13.4 are integrated into a single firm. Then, formulate the stochastic optimal control problem of the integrated firm. Also, using the data in Exercise 13.7, obtain the value function $V^I(x) = \alpha_I + \beta_I x$ of the integrated firm, and compare it to $V^M(x) + V(x)$ obtained in Exercise 13.7.

E 13.10 Let $\sigma(x) = 0.25\sqrt{x(1-x)}$ and the initial market share $x_0 = 0.1$. Use the optimal feedback advertising effort $U^*(x)$ in (13.50) to determine the optimal market share $X_t^*$ over time. You may use MATLAB or another suitable software to graph a sample path of $X_t^*$, $t \ge 0$.

Appendix A

Solutions of Linear

Diﬀerential Equations
