In Section 4.3, we study the case in which, at each time slot $i$, the transmitter knows exactly what the channel states up to time slot $i-m-n$ are. As discussed, when a delayed but error-free channel state is available at the transmitter, the number of internal states of the POMDP is finite and an optimal control policy can be obtained. We now consider the general situation, described in Section 4.1.4, in which a delayed error-free channel estimate is not available for choosing the transmit power and rate. In particular, at time $i$, we assume that the transmitter only knows a sequence of imperfect channel estimates $\{G^{\text{est}}_0, \ldots, G^{\text{est}}_{i-m}\}$.
4.4.1 Optimal Policies Given Delayed Imperfect Channel Estimates with an i.i.d. Channel Model
In the special case when the channel state is independent and identically distributed over time, no extra information is gained by keeping estimates of past channel states. Suppose that during time slot $i$ the transmitter knows the estimate of the channel state $G_i$, i.e., $G^{\text{est}}_i$. An internal channel state can then be defined as

$$G^I_i = G^{\text{est}}_i. \tag{4.15}$$
The number of possible internal channel states is $K$, and therefore the problem of minimizing the weighted sum of packet loss rate and average transmit power can be formulated as a finite-state MDP. In particular, the dynamics of the internal channel state are easy to write down:
$$P_{G^I}(\hat{g}_1, \hat{g}_2) = \Pr\{G^{\text{est}}_{i+1} = \hat{g}_2 \mid G^{\text{est}}_i = \hat{g}_1\} = \sum_{g=0}^{K-1} P_{ce}(g, \hat{g}_2)\, p_G(g). \tag{4.16}$$

Also, during time slot $i$, given that the channel estimate is $G^I_i = \hat{g}$, we can derive the probability distribution of the current channel state as

$$\mathcal{G}(\hat{g}, g) = \Pr\{G_i = g \mid G^I_i = \hat{g}\} = \Pr\{G_i = g \mid G^{\text{est}}_i = \hat{g}\} = \frac{P_{ce}(g, \hat{g})\, p_G(g)}{\sum_{j=0}^{K-1} P_{ce}(j, \hat{g})\, p_G(j)}. \tag{4.17}$$
Given this distribution, all the cost functions can be derived.
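To make the construction concrete, the following Python sketch builds both quantities of (4.16) and (4.17) from the estimation error probabilities and the stationary channel distribution. The array names `P_ce` and `p_G` mirror $P_{ce}$ and $p_G$; the indexing convention (rows by true state, columns by estimate) is our assumption rather than something fixed by the text.

```python
import numpy as np

def iid_internal_model(P_ce, p_G):
    """Build the internal-state model of (4.16)-(4.17) for an i.i.d. channel.

    P_ce : (K, K) array, P_ce[g, g_hat] = Pr{estimate g_hat | true state g}
    p_G  : (K,)  array, stationary distribution of the true channel state
    """
    K = len(p_G)
    # (4.16): with an i.i.d. channel the next estimate is independent of the
    # current one, so every row of the transition matrix equals the marginal
    # distribution of the estimate, sum_g P_ce(g, g_hat) p_G(g).
    est_marginal = P_ce.T @ p_G
    P_GI = np.tile(est_marginal, (K, 1))
    # (4.17): Bayes' rule, conditioning the true state on the estimate.
    joint = P_ce * p_G[:, None]          # joint[g, g_hat] = P_ce(g, g_hat) p_G(g)
    G_cond = (joint / est_marginal).T    # G_cond[g_hat, g] = Pr{G_i = g | est = g_hat}
    return P_GI, G_cond
```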
4.4.2 Heuristic Policies Given Delayed Imperfect Channel Estimates
Now let us consider the case when the channel states are correlated over time. At time $i$, the transmitter only knows a sequence of delayed imperfect channel estimates $\{G^{\text{est}}_0, \ldots, G^{\text{est}}_{i-m}\}$. To simplify the derivations, we further assume that $m = 0$; when $m > 0$, the analysis is similar.
When the channel is correlated over time, the decision maker needs to keep track of the entire channel estimation history, i.e., $\{G^{\text{est}}_0, \ldots, G^{\text{est}}_i\}$, in order to select the optimal transmit power and rate. If we take the sequence $\{G^{\text{est}}_0, \ldots, G^{\text{est}}_i\}$ as the internal channel state at time $i$, then the total number of internal channel states is infinite. Another option for the internal system state, which is more efficient to maintain, is the so-called belief channel state. This is a $K$-element vector which specifies the probability distribution over the $K$ possible channel states at time $i$. In particular, let $\mathcal{G}_i$ be the belief channel state at time $i$; then
$$\mathcal{G}_i(g) = \Pr\{G_i = g \mid \mathcal{G}_0, G^{\text{est}}_0, \ldots, G^{\text{est}}_i\}, \tag{4.18}$$

where the initial probability distribution $\mathcal{G}_0$ is assumed known (in case $\mathcal{G}_0$ is not given, it can be set to $p_G$, i.e., the stationary distribution of the channel states). The advantage of keeping a belief state for every time slot is that it contains all the information relevant for choosing control actions. Furthermore, in the next time slot, given a new channel estimate $G^{\text{est}}_{i+1} = \hat{g}$, the new belief state can be readily derived from

$$\mathcal{G}_{i+1}(g) = \frac{P_{ce}(g, \hat{g}) \sum_{g'=0}^{K-1} \mathcal{G}_i(g')\, P_G(g', g)}{\sum_{g'=0}^{K-1} P_{ce}(g', \hat{g}) \sum_{g''=0}^{K-1} \mathcal{G}_i(g'')\, P_G(g'', g')}. \tag{4.19}$$

Unfortunately, maintaining a belief channel state for each time slot does not solve the problem of having an infinite number of possible system states. When the number of system states is infinite, it is extremely hard to obtain an optimal adaptive policy; doing so may require infinite time and memory. Therefore, instead of aiming for an optimal control policy, let us look at some approaches that can be used to approximate it. All of these approximations start with the assumption that we have already obtained the MDP policy $\pi^*$ in (3.43), i.e., an optimal policy for the case when the system state is fully observable.
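The recursion (4.19) is a standard predict-then-correct Bayesian update, which the following minimal Python sketch makes explicit; the array layouts are assumptions consistent with the earlier sketch.

```python
import numpy as np

def belief_update(belief, g_hat, P_G, P_ce):
    """One step of the belief recursion (4.19).

    belief : (K,) array, current belief G_i over the channel states
    g_hat  : int, new channel estimate G^est_{i+1}
    P_G    : (K, K) array, P_G[g1, g2] = Pr{G_{i+1} = g2 | G_i = g1}
    P_ce   : (K, K) array, P_ce[g, g_hat] = Pr{estimate g_hat | true state g}
    """
    predicted = belief @ P_G             # predict: sum_{g'} G_i(g') P_G(g', g)
    unnorm = P_ce[:, g_hat] * predicted  # correct: weight by Pr{estimate | state}
    return unnorm / unnorm.sum()         # normalize: denominator of (4.19)
```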
Employing the MDP Policy $\pi^*$
As discussed in Section 4.2.1, the most straightforward approach is to ignore the partial observability of the channel state and simply employ the policy $\pi^*$ that is optimal when the system state is fully observable. At time $i$, given the channel estimate $G^{\text{est}}_i$ and buffer occupancy $B_i$, the transmission parameters are set as

$$(U_i, P_i) = \pi^*(B_i, G^{\text{est}}_i). \tag{4.20}$$
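Assuming $\pi^*$ has been precomputed offline and stored as a lookup table indexed by buffer occupancy and channel state (a hypothetical representation, not specified in the text), this heuristic amounts to a single table lookup:

```python
def mdp_action(pi_star, b_i, g_est_i):
    """Heuristic (4.20): treat the (possibly wrong) estimate as the true state.

    pi_star is assumed to be a precomputed table (e.g., a nested list or dict)
    mapping (buffer occupancy, channel state) to a (u, P) pair.
    """
    return pi_star[b_i][g_est_i]
```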
The Most Likely State Heuristic
In this approach, we first determine the state that the channel is most likely to be in, i.e.,

$$G^{MLS}_i = \arg\max_{g \in \{0, \ldots, K-1\}} \{\mathcal{G}_i(g)\}, \tag{4.21}$$

where $\mathcal{G}_i$ is the belief channel state at time $i$, calculated using (4.19). The transmission parameters are then set as

$$(U_i, P_i) = \pi^*(B_i, G^{MLS}_i). \tag{4.22}$$

This approach, usually termed the MLS approach, was proposed in [NPB95].
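Under the same assumed table representation of $\pi^*$ and the belief vector maintained by (4.19), MLS differs from the previous heuristic only in which channel index is looked up:

```python
import numpy as np

def mls_action(pi_star, b_i, belief):
    """MLS heuristic, (4.21)-(4.22): act as if the most probable state were true."""
    g_mls = int(np.argmax(belief))   # (4.21): most likely state under G_i
    return pi_star[b_i][g_mls]       # (4.22): reuse the fully-observable policy
```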
The QMDP Heuristic
This approach is related to the discounted cost problem defined in Chapter 3 (equation (3.21)). In particular, let the $Q$ function be defined as

$$Q(b, g, u, P) = C_I(b, g, u, P) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J^*_\alpha(q(b-u, a), g'). \tag{4.23}$$

When the system state is fully observed, $Q(b, g, u, P)$ represents the cost of taking action $(u, P)$ in state $(b, g)$ and then acting optimally. The QMDP heuristic takes the belief state into account for one step and then assumes that the state is entirely known [LCK95]. In particular, the transmission rate and power for time $i$ are chosen by

$$(U_i, P_i) = \arg\min_{u \in \{0, \ldots, B_i\},\; P \in \mathcal{P}} \left\{ \sum_{g=0}^{K-1} \mathcal{G}_i(g)\, Q(B_i, g, u, P) \right\}. \tag{4.24}$$
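A minimal sketch of the action selection in (4.24), assuming the $Q$ function of (4.23) has been computed offline into an array whose last axis enumerates the joint action grid $\{(u, P)\}$; this encoding, and the masking of infeasible actions with `np.inf`, are our assumptions:

```python
import numpy as np

def qmdp_action(Q, b_i, belief):
    """QMDP heuristic, (4.24).

    Q : (B+1, K, A) array of precomputed Q-values from (4.23); the last axis
        enumerates the joint action grid {(u, P)}, with entries for infeasible
        actions (u > b) assumed masked with np.inf when Q is built.
    Returns the index of the action minimizing the belief-averaged Q-value.
    """
    expected_Q = belief @ Q[b_i]      # sum_g G_i(g) Q(B_i, g, u, P) per action
    return int(np.argmin(expected_Q))
```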
For more discussion on different approaches to approximate an optimal solution for a POMDP, please refer to [Lov91].
The Minimum Immediate Cost Heuristic
Finally, in order to assess the effectiveness of the MDP, MLS, and QMDP approaches, which are all MDP-based, we introduce a non-MDP heuristic called Minimum Immediate Cost (MIC). In MIC, at time slot $i$, given the belief state $\mathcal{G}_i$, the transmission parameters are selected so that the expected immediate cost is minimized, i.e.,

$$(U_i, P_i) = \arg\min_{u \in \{0, \ldots, B_i\},\; P \in \mathcal{P}} \left\{ \sum_{g=0}^{K-1} \mathcal{G}_i(g)\, C_I(B_i, g, u, P) \right\}. \tag{4.25}$$
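Under the same assumed action encoding as in the QMDP sketch, MIC simply swaps the precomputed $Q$-values for the immediate cost table, so it requires no value iteration at all:

```python
import numpy as np

def mic_action(C_I, b_i, belief):
    """MIC heuristic, (4.25): minimize the expected immediate cost only.

    C_I : (B+1, K, A) array of immediate costs, same action enumeration as in
          the QMDP sketch (infeasible actions assumed masked with np.inf).
    """
    expected_cost = belief @ C_I[b_i]   # sum_g G_i(g) C_I(B_i, g, u, P)
    return int(np.argmin(expected_cost))
```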