5.3.1 Converting into a Non-constrained Optimization Problem
The problem of finding a throughput-maximizing adaptive scheduling/transmission policy in Section 5.2.3 consists of one objective function and N constraints. We study this problem using an approach similar to the one used in Chapter 3.
In particular, the constrained optimization problem is first reformulated as a non-constrained optimization problem whose objective is to minimize the weighted sum of the total packet loss rate and the total average transmit power, i.e.,
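\[
\min_{\psi \in \Psi} \; \limsup_{T \to \infty} \; \frac{1}{T} \sum_{i=0}^{T} \sum_{n=1}^{N} \Big[ \beta P\big(\phi_i(S_i, n), G_i^n\big) + L_o\big(B_i^n, \phi_i(S_i, n)\big) \Big], \tag{5.6}
\]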
where β is a weighting factor. Increasing β gives more priority to reducing transmit power, at the cost of more packet loss due to buffer overflow. On the other hand, reducing β puts more priority on reducing the packet loss rate, at the cost of more transmit power.
The problem in (5.6) can be regarded as an infinite horizon average cost Markov decision process (MDP). More importantly, as the weighted sum of total packet loss and total transmit power in each time slot is bounded and all the system states are connected, there exists a stationary policy $\psi^\beta \in \Psi_{st}$ which is a solution to (5.6). Furthermore, the following theorem states the relationship between the performance of $\psi^\beta$ and that of a policy solving the constrained throughput maximization problem in (5.4) and (5.5).
Theorem 5.3.1. Let $L_o^\beta$ and $P^\beta$ be the average total packet loss rate and the total transmit power corresponding to some stationary policy $\psi^\beta$ that solves (5.6). Then $L_o^\beta$ is also the minimum achievable total packet loss rate when each of the N users is subject to an average transmit power constraint of $P^\beta/N$.
Theorem 5.3.1 states that, by solving (5.6) for a particular value of β, we obtain a Pareto optimal point on the curve of packet loss rate versus power constraint. By varying β, we obtain different points on the optimal curve, which allows us to study the solution to the constrained throughput maximization problem in (5.4) and (5.5). Before proving Theorem 5.3.1, let us state the following lemma.
Lemma 5.3.2. For any stationary feasible adaptive scheduling/transmission policy $\phi \in \Psi_{st}$, let $L_o^\phi$ be the total packet loss rate of all users and $P_n^\phi$ be the average power consumed by user n when $\phi$ is employed. Then there exists a non-stationary policy $\psi \in \Psi$ such that
\[
L_o^\psi = L_o^\phi \quad \text{while} \quad P_m^\psi = \frac{1}{N} \sum_{n=1}^{N} P_n^\phi, \quad \forall m \in \mathcal{N}, \tag{5.7}
\]
where $L_o^\psi$ is the total packet loss rate and $P_m^\psi$ is the average power consumed by user m when policy $\psi$ is employed.
A proof for Lemma 5.3.2 is given in Appendix B. Using this lemma, we present a proof for Theorem 5.3.1 as follows.
Proof. From Lemma 5.3.2, given a stationary policy $\phi^\beta$ that solves (5.6) for a particular value of β, we can construct a non-stationary policy $\psi^\beta$ that achieves the total packet loss rate of $L_o^\beta$ while guaranteeing that the average power consumed by each user is $P^\beta/N$.
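Now, suppose there exists another policy $\psi \in \Psi$ that results in
\[
L_o^\psi < L_o^\beta, \quad \text{while} \quad P_n^\psi \le P^\beta/N \ \text{ for all } n \in \mathcal{N}. \tag{5.8}
\]
Summing the power constraints over the N users shows that the total average transmit power under $\psi$ is at most $P^\beta$. Hence, for β ≥ 0, the weighted cost of $\psi$ in (5.6) is strictly smaller than $\beta P^\beta + L_o^\beta$, the cost achieved by $\phi^\beta$. This contradicts the assumption that $\phi^\beta$ is the solution of (5.6). Therefore, $L_o^\beta$ is the minimum achievable packet loss rate when all users are subject to the average power constraint of $P^\beta/N$.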
5.3.2 Markov Decision Process
The minimization problem in (5.6) can be regarded as an infinite horizon average cost Markov decision process (MDP). This MDP is specified by the following components.
System states: The system state at time i is $S_i = (B_i^1, B_i^2, \ldots, B_i^N, G_i^1, G_i^2, \ldots, G_i^N)$ with $B_i^n \in \{0, 1, \ldots, B\}$ and $G_i^n \in \{0, 1, \ldots, K-1\}$. Note that the number of possible system states is finite.
Control actions: Given state $S_i$, the control action is specified by an N-element vector of integers $(U_i^1, U_i^2, \ldots, U_i^N)$, where $U_i^n$ is the transmission rate of user n in time slot i. Note that we must have
\[
U_i^n \in \{0, 1, \ldots, B_i^n\} \quad \text{and} \quad U_i^n U_i^m = 0, \quad \forall n, m \in \mathcal{N}, \ m \neq n. \tag{5.9}
\]
The second condition means that at most one user transmits in any time slot.
Immediate cost function: The immediate cost of choosing action $(U_i^1, U_i^2, \ldots, U_i^N)$ in state $S_i$ is the sum of the weighted total power consumed and the total packet loss due to buffer overflow, i.e.,
\[
C(S_i, U_i^1, U_i^2, \ldots, U_i^N) = \sum_{n=1}^{N} \Big[ \beta P(U_i^n, G_i^n) + L_o(B_i^n, U_i^n) \Big]. \tag{5.10}
\]
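The action set in (5.9) and the cost in (5.10) can be sketched in Python as follows. This is only an illustration under stated assumptions: `toy_power` and `toy_loss` are hypothetical stand-ins for P(·,·) and L_o(·,·), whose actual forms are defined earlier in the chapter and are not reproduced here, and the buffer size of 2 and the single arrival per slot are chosen purely for the example.

```python
def feasible_actions(buffers):
    """Enumerate rate vectors (U^1, ..., U^N) satisfying (5.9):
    0 <= U^n <= B^n and at most one user has a nonzero rate."""
    n_users = len(buffers)
    actions = [tuple([0] * n_users)]          # nobody transmits
    for n, b in enumerate(buffers):
        for u in range(1, b + 1):             # user n sends u packets
            action = [0] * n_users
            action[n] = u
            actions.append(tuple(action))
    return actions


def toy_power(u, g):
    """HYPOTHETICAL power model (not from the text): the cost grows
    exponentially with the rate u and shrinks as the channel index g grows."""
    return (2 ** u - 1) / (g + 1)


def toy_loss(b, u, buffer_size=2, arrivals=1):
    """HYPOTHETICAL overflow loss (not from the text): packets dropped when
    exactly `arrivals` packets arrive at a buffer of capacity `buffer_size`."""
    return max(b - u + arrivals - buffer_size, 0)


def immediate_cost(buffers, gains, action, beta, power=toy_power, loss=toy_loss):
    """Immediate cost (5.10): sum over users of beta * P(U, G) + L_o(B, U)."""
    return sum(beta * power(u, g) + loss(b, u)
               for b, g, u in zip(buffers, gains, action))


if __name__ == "__main__":
    buffers, gains = (2, 0, 1), (0, 1, 1)
    for action in feasible_actions(buffers):
        print(action, immediate_cost(buffers, gains, action, beta=0.5))
```

Under these toy models, letting the user with the full buffer transmit one packet trades a small power cost for avoiding an overflow loss, which is the trade-off that β controls.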
System dynamics: To fully specify the MDP, we also need to characterize the system dynamics, i.e., the transition probabilities of the system states given the control actions. The system state consists of the buffer occupancies and channel states of all N users. We note that the channel transition probabilities are independent of the control actions. For user n, $n \in \mathcal{N}$, as all packets arriving at the buffer during frame i, denoted $A_i^n$, are only added to the buffer at the end of this frame, we can write
\[
B_{i+1}^n = \min\{B_i^n - U_i^n + A_i^n, \; B\}. \tag{5.11}
\]
Knowing the channel state transition probabilities, as well as the distribution of packet arrival processes, the dynamics of the system state can be readily characterized.
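As a minimal sketch of how the buffer part of the dynamics follows from (5.11), the Python snippet below computes the distribution of the next buffer occupancy of a single user. The arrival distribution used in the example is a made-up assumption; the text does not specify one in this section.

```python
def next_buffer(b, u, a, buffer_size):
    """Buffer update (5.11): B_{i+1} = min(B_i - U_i + A_i, B)."""
    if not 0 <= u <= b:
        raise ValueError("transmission rate must satisfy 0 <= u <= b")
    return min(b - u + a, buffer_size)


def buffer_transition(b, u, arrival_pmf, buffer_size):
    """Distribution of the next buffer occupancy of one user.

    arrival_pmf maps a number of arrivals to its probability
    (a HYPOTHETICAL input; any arrival model could be plugged in).
    """
    dist = {}
    for a, p in arrival_pmf.items():
        nb = next_buffer(b, u, a, buffer_size)
        dist[nb] = dist.get(nb, 0.0) + p      # overflowing arrivals merge at B
    return dist


if __name__ == "__main__":
    pmf = {0: 0.5, 1: 0.25, 2: 0.25}          # assumed arrival distribution
    print(buffer_transition(b=3, u=1, arrival_pmf=pmf, buffer_size=3))
```

Because the channel transitions do not depend on the actions, the per-user channel transition probabilities can be combined with these buffer distributions to obtain the transition probabilities of the full system state.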
The infinite horizon, average cost MDP in (5.6) can be solved using dynamic programming techniques such as value iteration or policy iteration [KV86].
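For illustration, a relative value iteration routine for a generic finite average cost MDP might look like the sketch below. It is not the procedure used in this work: the two-state MDP in the example is invented, every action is assumed to be available in every state (state-dependent action sets such as (5.9) would need extra handling), and convergence is assumed under the usual unichain and aperiodicity conditions.

```python
def relative_value_iteration(costs, trans, tol=1e-9, max_iter=10_000):
    """Relative value iteration for a finite average cost MDP.

    costs[s][a]    : immediate cost of action a in state s
    trans[a][s][t] : probability of moving from s to t under action a
    Returns (gain, h, policy): the estimated minimum average cost, the
    relative value function (normalized so that h[0] = 0), and a greedy policy.
    """
    n_states, n_actions = len(costs), len(costs[0])
    h = [0.0] * n_states
    gain, q = 0.0, [list(row) for row in costs]
    for _ in range(max_iter):
        # One Bellman backup: q[s][a] = c(s, a) + sum_t p(t | s, a) * h[t]
        q = [[costs[s][a] + sum(trans[a][s][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        th = [min(row) for row in q]
        diff = [th[s] - h[s] for s in range(n_states)]
        gain = th[0]                           # state 0 is the reference state
        h = [v - gain for v in th]
        if max(diff) - min(diff) < tol:        # span stopping criterion
            break
    policy = [min(range(n_actions), key=lambda a: q[s][a])
              for s in range(n_states)]
    return gain, h, policy


if __name__ == "__main__":
    # Invented two-state, two-action example (not from the text).
    costs = [[1.0, 1.7], [3.0, 0.5]]
    trans = [[[0.9, 0.1], [0.5, 0.5]],         # action 0
             [[0.5, 0.5], [0.1, 0.9]]]         # action 1
    print(relative_value_iteration(costs, trans))
```

Each iteration touches every state, action, and successor state, so the cost of one sweep scales with the size of the state space.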
5.3.3 Complexity of Obtaining and Implementing Throughput Maximizing Policies
The difficulty in obtaining a solution to the optimization problem in (5.6) using dynamic programming techniques lies in the fact that the state space of the corresponding MDP can be very large. In particular, the total number of possible system states is $(B+1)^N K^N$. Solving the corresponding MDP therefore becomes increasingly complex as the number of users, their buffer lengths, and/or the number of channel states increase.
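To give a feel for this growth, the short Python sketch below evaluates the state count for a few system sizes. The values B = 10 and K = 4 are arbitrary choices made for illustration and are not taken from the text.

```python
def num_system_states(buffer_size, num_channel_states, num_users):
    """Number of system states: (B + 1)^N * K^N."""
    return (buffer_size + 1) ** num_users * num_channel_states ** num_users


if __name__ == "__main__":
    # Illustrative parameters only: B = 10 packets, K = 4 channel states.
    for n in (1, 2, 4, 8):
        print(n, num_system_states(10, 4, n))
```

With these example values, going from two users to four users raises the state count from 1,936 to 3,748,096.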
As the operation of optimal adaptive scheduling/transmission policies requires knowledge of the buffer occupancies and channel states of all N users, it is more reasonable to implement these adaptive policies at the base station, rather than doing so at each individual node. The optimal policies can be stored at the base station using a look-up table. At the beginning of each time slot, all users signal their buffer occupancies to the base station. As for the channel states, they can be estimated by the base station. The base station then outputs a scheduling/transmission decision based on the current system state. We note that when there are many users in the system and their buffer lengths are long, the amount of signaling required to transmit the buffer occupancies to the base station can be significant.
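The look-up table operation described above could be sketched as follows. The class name, the table layout, and the two-user table entries are hypothetical, and the signaling estimate assumes that each user reports its buffer occupancy with a fixed-length binary code, which the text does not specify.

```python
def signaling_bits_per_slot(num_users, buffer_size):
    """Bits per slot if every user reports an occupancy in {0, ..., B}
    using a fixed-length binary code (an ASSUMED encoding)."""
    bits_per_user = buffer_size.bit_length()   # enough bits for values 0..B
    return num_users * bits_per_user


class BaseStationScheduler:
    """HYPOTHETICAL look-up table scheduler held at the base station."""

    def __init__(self, policy_table):
        # policy_table maps (buffers, channels) -> rate vector (U^1, ..., U^N)
        self.policy_table = policy_table

    def decide(self, buffers, channels):
        """Return the stored decision for the current system state."""
        return self.policy_table[(tuple(buffers), tuple(channels))]


if __name__ == "__main__":
    # Made-up entries for a two-user system, for illustration only.
    table = {((1, 0), (1, 0)): (1, 0),
             ((0, 1), (0, 1)): (0, 1)}
    scheduler = BaseStationScheduler(table)
    print(scheduler.decide([1, 0], [1, 0]))
    print(signaling_bits_per_slot(num_users=8, buffer_size=10))
```

A full table would need one entry per system state, so its size grows in the same way as the state space of the MDP.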