Hindawi Publishing Corporation
EURASIP Journal on Applied Signal Processing
Volume 2006, Article ID 83542, Pages 1–19
DOI 10.1155/ASP/2006/83542

Multiple Description Wavelet Coding of Layered Video Using Optimal Redundancy Allocation

Nikolaos V. Boulgouris,1 Konstantinos E. Zachariadis,2 Angelos Kanlis,3 and Michael G. Strintzis4,5

1 Department of Electronic Engineering, Division of Engineering, King’s College London, WC2R 2LS London, United Kingdom
2 Kellogg School of Management, Northwestern University, IL 60208, USA
3 The European Patent Office, Munich 80298, Germany
4 The Informatics and Telematics Institute, Thessaloniki GR-57001, Greece
5 The Electrical and Computer Engineering Department of the University of Thessaloniki, Thessaloniki GR-54124, Greece

Received March 2005; Revised 30 August 2005; Accepted September 2005

We present a wavelet-based framework for the encoding of video in multiple descriptions. Using the proposed methodology, the generation of multiple descriptions is performed so that drift is eliminated at the decoder regardless of the number of received descriptions. Moreover, the proposed framework is flexible in the sense that it allows the encoding of video into an arbitrary number of descriptions. We also present a thorough analysis of rate allocation issues and propose three algorithms for the optimal allocation of redundancy. Experimental results for the transmission of video using two descriptions demonstrate the efficiency of the proposed method.

Copyright © 2006 Hindawi Publishing Corporation. All rights reserved.

1. INTRODUCTION

Multiple description (MD) coding [1, 2] offers an attractive framework for the transmission of multimedia over heterogeneous networks. In MD coding, a source is encoded into multiple independently decodable bitstreams which are mutually refining and equally important. At the decoder side, the reconstruction quality depends on the number of descriptions that were received without error. Due to its flexibility, multiple description
coding is considered a very robust and reliable tool for information transmission.

Multiple description coding has been investigated for image [3–5] and video transmission [6–11]. In the particular case of video transmission, the study of MD systems becomes more complicated due to the uncertainty about the information that will be available at the decoder of an MD system. In [12], a methodology was presented for the design of two-channel orthonormal filter banks based on the Lagrangian optimization of the redundancy rate-distortion performance of MD subband coding. In [7], an MD predictive quantization system was introduced, appropriate for the encoding of correlated information sources such as video and speech. The proposed system was used to construct a balanced twin-description interframe MD video coder, and performance results were presented using two packetization strategies. A review on MD coding was recently presented in [13].

In [6], MD video coders were proposed which use motion-compensated prediction. These systems utilize MD transform coding, three separate prediction paths, and side information in order to accommodate all possible scenarios at the decoder. For this reason, three different algorithms for redundancy allocation were implemented, and experimental results were presented. An improved algorithm based on the same principles was presented in [10], where the encoding of the side information was modified in order to be useful even if no drift occurs. In [14], a novel scheme for double-description coding was proposed, which is built in the H.263 coder and replicates some selected DCT coefficients in both descriptions. The selection is based on a threshold determined using rate-distortion techniques. In [8], a novel way to deal with redundancy was devised. Temporal redundancy was used to control the tradeoff between drift and redundancy. However, this method does not inherently eliminate drift, that is, the cumulative distortion which occurs whenever the reference
frames used at the decoder are not identical to the ones used by the encoder. In [9], a drift-free wavelet-based MDC video coding scheme was proposed. However, the redundancy allocation algorithm did not take into consideration the impact of the temporal redundancy on the design of the system, thus resulting in suboptimal coding. The above problem was dealt with in [15], where an improved version of the method in [9] was presented. In [16], a multiple description coding method for video streaming was presented. The method in [16] was based on a 3D discrete wavelet transform. Redundancy was allocated by applying Lagrangian optimization techniques for the appropriate selection of subband quantizers. In [17], an MDC scheme for video coding was presented based on a spatiotemporal multiresolution analysis. Correlation between the two descriptions was introduced in the temporal domain by using an oversampled motion-compensated filter bank.

In the present paper, the intraframe and the motion-compensated prediction residual frames are wavelet-coded and divided into a redundant and an enhancement part, with the redundant part encoded in all descriptions and the enhancement part distributed in several descriptions. The “repeat or split” strategy was chosen over other proposed techniques, such as that presented in [2], since, in our case, drift-free reconstruction is straightforward. Using the above framework, we present and evaluate two techniques for the multiple description coding of video sequences.

(i) In the first technique, only the redundant part is used for the construction of reference frames and thus the resulting video coding scheme is able to perform drift-free reconstruction. Since the quality of the reference frame affects the coding efficiency of the system, an algorithm incorporating the impact of temporal correlation is also presented for the allocation of redundancy among multiple descriptions.

(ii) In the second technique, both the redundant and the nonredundant parts
of the stream are used for the creation of the reference frame. This technique uses high-quality reference frames but the reconstructed video suffers from drift in case of transmission over channels with severe loss.

Additionally, in the present paper the problem of optimal redundancy allocation, that is, the appropriate selection of the redundant and the enhancement parts for each frame, is investigated. Specifically, this problem is formulated as the maximization of the average video quality under the constraint of a target total rate. Three variations of an optimization algorithm are proposed and evaluated in terms of their complexity. It should be noted here that, in our system, the compression and the optimization steps are distinct. In this manner, our redundancy allocation algorithm is applied directly to compressed source layers, that is, the algorithm actually parses the compressed stream into multiple descriptions. This clearly differentiates our algorithm from the method in [16], in which the generation of descriptions is performed by application of appropriate quantizers to the transform coefficients.

The structure of the paper is as follows. In Section 2, the proposed framework for multiple description coding of video is presented. Section 3 describes the wavelet coding of intraframes and motion compensation residuals. In Section 4, the exploitation of temporal correlation during the optimization process is discussed. In Section 5, the redundancy allocation problem is formulated. The complexity of the redundancy allocation algorithm is studied in Section 6, and a faster algorithm is presented in Section 7 based on the Equivalent Continuous Problem. In Section 8, experimental results are presented and finally conclusions are drawn in Section 9.

2. PROPOSED FRAMEWORK FOR MULTIPLE DESCRIPTION GENERATION

The proposed system for the generation of multiple descriptions is depicted in Figures 1 and 2. Initially, the available bit budget is evenly
allocated to the frames in a group of pictures (GOP). The first frame in each GOP is intra-coded using block-based wavelet coding. The resulting coded stream is distributed over a number of descriptions. A portion of the bitstream is redundant in all descriptions. The correlation between consecutive frames is subsequently removed using overlapped block motion compensation (OBMC) [18]. The reference frames used to calculate motion vectors are the original frames in order to ensure good precision in the estimation of the motion vectors. Motion vectors are losslessly coded using the techniques in [19] and are included in all descriptions.

Using the previously estimated half-pixel accurate motion vectors, the procedure for the generation of multiple descriptions for the interframes continues as follows: initially, the first interframe is compensated. No intra-coding is used in interframes. We employ two different mechanisms for the derivation of reference frames that are used during motion compensation. In the first, a version of the I-frame, reconstructed using only the redundant part of the bitstream coded so far, is used as reference for the compensation process. In the second, both redundant and nonredundant parts are used for the derivation of reference frames in motion compensation. The prediction error is derived by subtracting the compensated prediction from the original interframe. The prediction error is wavelet transformed and coded into multiple descriptions. A version of the error frame is reconstructed using either the redundant part or both redundant and nonredundant information of the coded bitstream, depending on which of the two mechanisms described above is used. The reconstructed error frame is added to the compensated frame. The resulting interframe (instead of the original) will serve as the reference frame for the compensation of the next interframe. The same procedure is iterated until all frames in a GOP are treated.

Using the above methodology, the proposed
multiple description video coding scheme is able to produce an arbitrary number of descriptions at the cost of reduced compression efficiency whenever the number of descriptions is large. In each description, there is a redundant part, which is always used for the derivation of the reference frame in the motion compensation process, and a complementary refinement part, which is used to improve the quality of each description and may or may not be used for the derivation of the reference frame. When both redundant and nonredundant information is used, reference frames of high quality are available. When only the redundant part is used, the motion compensation process performed at the encoder can be identically replicated at the decoder even if only one description is received.

Figure 1: Block diagram of the coder.

Figure 2: Block diagram of the decoder.

This is a very important feature of our coder since, if the decoder is unable to use the same reference frames, errors will accumulate in the decoded video sequence, causing the aforementioned drift distortion [20]. With the proposed methodology, which relies only on the redundant part for motion compensation, the possibility of facing drift at the decoder is eliminated and thus a reconstructed sequence of high quality is obtained even if
only some (or even a single one) of the descriptions are received.

The determination of the portion of the bitstream that is redundant in all descriptions is performed after the wavelet coding of the intra and the residual error frames. The wavelet coefficients are coded using a simple bitplane encoder, based on the context models in [21]. Specifically, the decomposed frame is divided into blocks of equal dimensions. Each block may be included in some or all descriptions. Thus, some blocks may appear in all descriptions whereas some other blocks appear in only one of the descriptions. The inclusion of blocks in one or more descriptions is done so as to maximize the average quality at the decoder, subject to a total rate constraint, and attain fairly equal bitrate and fairly equal quality descriptions. Such an assignment is depicted in Figure 3(a). A representation of the redundant and nonredundant part of the coded bitstream for a two-description system is shown in Figure 3(b).

Figure 3: (a) Assignment of the blocks of a wavelet representation for the case of two descriptions. The bitstreams corresponding to the blocks may be included in one or more descriptions. (b) Representation of redundant and nonredundant part of the stream for the case of two descriptions.

The generation of descriptions can be achieved by including appropriate blocks of wavelet coefficients in one or both of the descriptions. In the case of two descriptions, this is achieved by using the checkerboard pattern which we originally proposed in [9]. This approach bears some resemblance to the flexible macroblock ordering (FMO) approach in H.264 (see, e.g., [22]). However, there are fundamental differences between FMO and our approach which arise from the fact that our method operates
in the wavelet domain whereas FMO is applied in the spatial domain. Since the FMO approach uses spatial blocks, the loss of a block would mean complete loss of information for that spatial region. This is why in FMO at least a coarsely quantized version of a chess-block needs to be included in each description. Clearly, this means that using FMO there is much less control over redundancy since information about all blocks needs to be encoded in both descriptions. Moreover, since redundancy is introduced by the use of different quantizers, and not by explicitly including the same portion of the bitstream in all descriptions, the elimination of drift is not a trivial task. Finally, in FMO there is a need for error concealment in case the reconstructed quality in a spatial region is not good.

Unlike the FMO approach, in our system, a loss of a wavelet block (due to the loss of the description in which the block is encoded) causes only the loss of some detail in the reconstructed frame. Moreover, in our method, most wavelet blocks are included in only one of the descriptions and only a few important blocks are included in both descriptions. This is possible since the wavelet transform compacts the important information in a few blocks (subbands) of transform coefficients. This strategy seems to be naturally more suitable for MD coding since it allows better manipulation of redundancy and generally achieves lower redundancy levels.

Throughout our manuscript we assume that no B-frames are encoded (see Figure 4). However, this assumption does not affect the significance of our work, which can also be applied when using B-frames. Suppose that we have an intra-coded frame, several (unidirectionally predicted) interframes, and some other frames that are to be bidirectionally predicted using the intra- and interframes. Clearly, our MD generation methodology is directly applicable to the sequence of intra- and interframes. In each description,
bidirectionally predicted frames could be encoded based on the reconstructions of intra- and interframes which are achieved using the bitstream in the same description. Note that, since B-frames do not propagate errors and do not cause drift, the reconstructed versions of intra- and interframes can be obtained using not only the redundant part of the description but also the nonredundant part. An interesting and desirable result of this strategy is that, as these reconstructions will be different in the two descriptions, the associated residuals of the bidirectionally predicted frames will be inherently different in the two descriptions. This is perfectly consistent with the MD coding principle of encoding different versions of the information in each description. In the ensuing section, the complete wavelet coding method, used for both intra- and interframes, is described.

Figure 4: Structure of a group of pictures (GOP, $M$ frames) in the proposed coding scheme, where $F_1$, $F_2$ are intra-coded frames and $P_1, P_2, \ldots, P_M$ are interframes.

3. BLOCK-BASED WAVELET CODING OF MOTION COMPENSATION RESIDUALS

The intra-frame and the motion-compensated residuals are decomposed using a wavelet transform based on the 9–7 biorthogonal filter bank [23]. The maximum absolute coefficient in each subband is placed in the image header. All subband maxima are arithmetically encoded. The transmission of information takes place in a bitplane-wise manner starting from the most significant bit (MSB) to the least significant bit (LSB). Within each bitplane, subbands are encoded in a predefined scanning order from the lowest to the highest resolution. Each subband is divided into a set of blocks. The default block size is $(W/2^{L+1}) \times (H/2^{L+1})$, where $W$, $H$ are the width and height of the frame, respectively, and $L$ is the maximum level of the wavelet decomposition.

For each block, first the coefficients whose most significant bit is on the bitplane currently coded are identified by comparison to a
threshold $T = 2^n$, where $n$ is the index of the bitplane that is being coded. If a coefficient becomes significant, that is, it is found to be greater than or equal to $T$ for the first time, then its sign is coded. This process is often called significance identification [24] and the compressed significance map for a block is termed significance layer. Similarly, the refinement layer is defined as the one containing the $n$th bitplane of coefficients (in a block) found significant in previous passes. In our coder, refinement layers for the $n$th bitplane are transmitted immediately after the transmission of significance layers for the same bitplane. Note that each layer contains significance or refinement information for a single block and that the eventual allocation of layers in descriptions is performed by taking into consideration the fact that the decoding of a layer is possible only when all its predecessor layers in the same block are also included in the description.

The $n$th bit in the binary representation of a coefficient $f$ in subband $B$ is coded if the maximum coefficient in the subband $B$ is greater than or equal to the current threshold:

$$ \max_{f \in B} |f| \ge 2^n. \tag{1} $$

The deployment of the above rule reduces drastically the number of coefficients whose significance is tested during the coding of a significance identification layer. For this reason, subband maxima are included in all descriptions. However, in order to further reduce the number of symbols that have to be coded during the layer coding stage, a single bit is initially coded to indicate whether all coefficients in a block are insignificant. A value of “1” of this bit indicates that the block contains no significant coefficients and no further information is coded for this block.

The symbol streams described above are coded using adaptive arithmetic codes [25]. The context modelling strategy in [21] is followed for the coding of significance identification layers. Refinement bits are entropy coded using a single
adaptive arithmetic model. The max frequency count of the arithmetic coder was set equal to 512 in order to allow fast adaptation of the coder to the statistics of the incoming symbol stream.

In order to apply an efficient redundancy allocation algorithm that takes into account the actual rate-distortion characteristics of the compressed stream, the distortion decrease achieved by the transmission of each bitplane should be calculated [21, 26] for each layer. The distortion decrease caused by the transmission of the $i$th layer is given by

$$ D_i = \sum_t \left[ \left(\hat{f}_t^{\,n+1} - f_t\right)^2 - \left(\hat{f}_t^{\,n} - f_t\right)^2 \right], \tag{2} $$

where $n$ is the index of the bitplane included in the layer, $t$ is the coefficient index, and $f$, $\hat{f}$ denote the original and the reconstructed wavelet coefficients, respectively (the superscript on $\hat{f}$ indicates the last bitplane decoded). Each layer corresponding to a specific block of wavelet coefficients causes a different reduction in the distortion. Analytical expressions for the distortion reduction caused by the transmission of layers can be found in [26]. Let $R_i$ be the number of bits required for the coding of the $i$th layer. When all pairs $(D_i, R_i)$ are determined, the redundancy allocation algorithm can be applied. This is examined in the following sections.

4. TEMPORAL CORRELATION COMPUTATION

An optimization algorithm should take into consideration the temporal correlation linking adjacent video frames. Modelling the dependency of adjacent frames in a video sequence is a nontrivial problem. In this paper, in order to deal with this issue, we introduce a temporal correlation coefficient $a_i$, $0 \le a_i < 1$, meant to incorporate the effect of temporal correlation of layer $i$ into the optimization algorithm. Specifically, we assume (a similar conclusion was drawn in [27]) that the distortion reduction in frame $m+1$ is $a_i D_i$, where $m$ is the frame index. In the same manner, the additional distortion reduction $a_i D_i$ in frame $m+1$ stimulates additional distortion reduction $a_j (a_i D_i)$ in frame $m+2$, $a_k (a_j (a_i D_i))$ in frame $m+3$, and so on, where $a_j, a_k, \ldots$ are the temporal correlation coefficients for frames $m+1, m+2, \ldots$, correspondingly. We further assume that $a_i$, $a_j$, $a_k$ are approximately equal for all frames in a GOP since the dependency between consecutive frames in the same GOP is not expected to exhibit significant variations. In general, the distortion reduction in frame $n$ caused by the transmission of the $i$th layer in frame $m$, $m < n$, is $a_i^{\,n-m} D_i$. Thus, as the temporal distance $n - m$ between $m$ and $n$ increases, the additional distortion reduction decreases exponentially. Assuming that the total number of frames in a GOP is $M$, the total distortion decrease is given by

$$ D_i + a_i D_i + a_i^2 D_i + \cdots + a_i^{M-m} D_i, \tag{3} $$

where $a_i D_i$ is the distortion reduction caused in the $m+1$ frame, $a_i^2 D_i$ is the distortion reduction in the $m+2$ frame, and so forth. The above quantity is equivalently written as the sum

$$ D_i + \left(a_i + a_i^2 + \cdots + a_i^{M-m}\right) D_i, \tag{4} $$

where the first term is the distortion reduction in the current frame and the second term denotes the distortion reduction in all subsequent frames. If

$$ C_i = a_i + a_i^2 + \cdots + a_i^{M-m} = \sum_{n=1}^{M-m} a_i^n = \frac{a_i \left(1 - a_i^{M-m}\right)}{1 - a_i}, \tag{5} $$

the total distortion reduction caused by the transmission of the $i$th layer in the $m$th frame can now be expressed as

$$ D_i + D_i C_i, \tag{6} $$

where $D_i C_i$ is the cumulative distortion reduction$^1$ that is caused in the subsequent frames due to the higher quality of the current (reference) frame $m$. Clearly, with this formulation, layers in frames lying in the beginning of a GOP are more important than layers of frames at the end of the GOP since the quality of the former affects the quality of the latter. The coefficients $a_i$, and hence $C_i$, which quantify the impact of the current frame on the quality of subsequent frames, were calculated using the methods in [27].

Footnote 1: Even though all coefficients $D_i$, $a_i$, and $C_i$ depend on the frame index $m$, this dependence will in the sequel be omitted for convenience.

5. FORMULATION OF THE REDUNDANCY ALLOCATION PROBLEM

In order to address the problem of optimal allocation in MD video coding, it is important to derive expressions for the average video quality at the decoder and the total rate used in terms of the assignment strategy. Although in the experimental results section we consider the average PSNR over the entire sequence, in this section we will attempt to maximize the distortion reduction incurred by each frame of the GOP separately. This simplification will not significantly affect the optimality of the strategy derived here, while it will serve in addressing the problem of optimal assignment in a more rigorous way and in providing useful insight into the optimization procedure.

Let us assume that each frame is coded into $L$ layers, each using $R_i$ bits and contributing a reduction of distortion equal to $D_i$ relative to the quality of the current frame and $C_i D_i$, $i = 1, \ldots, L$, to the quality of the next frames in the GOP,$^2$ when used for motion compensation for the next frames. We further assume that the curve appearing in Figure 5(a) is concave, namely,

$$ \frac{D_1}{R_1} \ge \frac{D_2}{R_2} \ge \cdots \ge \frac{D_L}{R_L}. \tag{7} $$

This assumption is generally valid for the case of our coder (a curve based on real data is shown in Figure 5(b)). We further note that lower-indexed layers correspond to coarse image information whereas high-indexed layers correspond to detail information. Between adjacent frames, coarse information is much more correlated than detail information. Thus, $a_i$ is fully expected to decrease with $i$. Since $C_i$ is obviously a monotone function of $a_i$, this implies that

$$ C_1 \ge C_2 \ge \cdots \ge C_L, \tag{8} $$

an observation which is also verified experimentally. This ensures that (7) will still hold if we replace the $D_i$’s with $D_i (1 + C_i)$, that is,

$$ \frac{D_1 (1 + C_1)}{R_1} \ge \frac{D_2 (1 + C_2)}{R_2} \ge \cdots \ge \frac{D_L (1 + C_L)}{R_L}. \tag{9} $$

We wish to encode the initial video sequence into $K$ descriptions, each of which will either provide a coarse reconstruction of the initial sequence by itself or improve a reconstruction based on one of the other descriptions. To this end, for every frame in the GOP we will
assign a number of layers to each description in a way so as to maximize the distortion reduction incurred under a limited-rate constraint. We will consider the case of double-description coding ($K = 2$). The general case is studied in Appendix B.

Footnote 2: For the last frame in the GOP, $C_i = 0$, $i = 1, \ldots, L$.

Figure 5: (a) Comprising layers and induced distortion reduction, (b) distortion reduction as a function of rate for a frame of “Akiyo” using the source coder of Section 3.

Let $I = \{1, \ldots, L\}$ denote the set of the possible values that the layer indices may assume. The problem of providing two descriptions for each frame in the GOP is equivalent to assigning a set of layer indices $I_1 \subset I$ to the first and a set $I_2 \subset I$ to the second description. Subsequently, the two descriptions will be transmitted over two communication links to the decoder. If $A_k$ represents the event that description $k$ reaches the decoder and $p$ denotes the probability that each stream is successfully delivered to the decoder (i.e., $p = \Pr\{A_k\}$, $k = 1, 2$), four events exist for each frame:

$B_1 \triangleq A_1 \setminus A_2$: only the first description is delivered,
$B_2 \triangleq A_2 \setminus A_1$: only the second description is delivered,
$B_{12} \triangleq A_1 \cap A_2$: both descriptions are delivered,
$B_0 \triangleq A_1^c \cap A_2^c$: no descriptions are delivered.

The probability of each of these events may be easily derived if we make the reasonable assumption that the events $A_1$ and $A_2$ are independent:

$$ \Pr\{B_1\} = p(1-p), \quad \Pr\{B_2\} = p(1-p), \quad \Pr\{B_{12}\} = p^2, \quad \Pr\{B_0\} = (1-p)^2. \tag{10} $$

Let $d(B_1)$, $d(B_2)$, $d(B_{12})$, $d(B_0)$ denote, respectively, the distortion reduction at the decoder for the current frame when each of the events $B_1$, $B_2$, $B_{12}$, and $B_0$ occurs. Their values may be calculated as

$$ d(B_1) = \sum_{i \in I_1} D_i, \quad d(B_2) = \sum_{i \in I_2} D_i, \quad d(B_{12}) = \sum_{i \in I_1 \cup I_2} D_i, \quad d(B_0) = 0. \tag{11} $$

Moreover, when at least one of the descriptions arrives at the decoder, the layers common to all descriptions will be used for the motion compensation of the next frame in the GOP, incurring an additional distortion reduction of $C_i D_i$ for each layer. Let $B_{1|2} \triangleq B_0^c$ denote the event that at least one description reaches the decoder and $I_\cap \triangleq I_1 \cap I_2$ denote the set of indices common to both descriptions. Then, $\Pr\{B_{1|2}\} = p(2-p)$ and the corresponding distortion reduction will be

$$ d(B_{1|2}) = \sum_{i \in I_\cap} C_i D_i. \tag{12} $$

Consequently, the expected distortion reduction, $D_e(I_1, I_2)$, incurred at the decoder, when the index-assignment policy $(I_1, I_2)$ is used, will be

$$ \begin{aligned} D_e(I_1, I_2) &= \Pr\{B_1\}\, d(B_1) + \Pr\{B_2\}\, d(B_2) + \Pr\{B_{12}\}\, d(B_{12}) + \Pr\{B_{1|2}\}\, d(B_{1|2}) \\ &= p(1-p) \sum_{i \in I_1} D_i + p(1-p) \sum_{i \in I_2} D_i + p^2 \sum_{i \in I_1 \cup I_2} D_i + p(2-p) \sum_{i \in I_\cap} C_i D_i, \end{aligned} \tag{13} $$

and after some simple manipulations we arrive at

$$ D_e(I_1, I_2) = p(2-p) \sum_{i \in I_\cap} D_i (1 + C_i) + p \sum_{i \in I_\oplus} D_i, \tag{14} $$

where $I_\oplus \triangleq (I_1 \cup I_2) \setminus I_\cap$ is the set of indices contained in exactly one of the descriptions. The total rate, $R(I_1, I_2)$, used by the two streams is

$$ R(I_1, I_2) = \sum_{i \in I_1} R_i + \sum_{i \in I_2} R_i, \tag{15} $$

and may also be expressed as

$$ R(I_1, I_2) = 2 \sum_{i \in I_\cap} R_i + \sum_{i \in I_\oplus} R_i. \tag{16} $$

Assuming that the total rate used may not exceed a predefined rate budget $R_B$, our purpose is to identify the index-assignment sets $I_1$ and $I_2$ which do not violate the rate constraint and maximize the expected distortion reduction at the decoder:

$$ \max_{I_1, I_2 \,:\, R(I_1, I_2) \le R_B} D_e(I_1, I_2). \tag{17} $$

It is clear from (14) and (16) that the expected distortion reduction and total rate depend upon the sets $I_\cap$ and $I_\oplus$. Furthermore, the factor $p$ in the expected distortion reduction (14) may be ignored for the optimization procedure for the sake of simplicity. Therefore, the maximization problem may be rephrased as follows.

Maximization problem. Find disjoint sets $I_\cap, I_\oplus \subset I$ maximizing

$$ D(I_\cap, I_\oplus) = (2-p) \sum_{i \in I_\cap} D_i (1 + C_i) + \sum_{i \in I_\oplus} D_i \tag{18} $$

subject to the constraint

$$ R(I_\cap, I_\oplus) = 2 \sum_{i \in I_\cap} R_i + \sum_{i \in I_\oplus} R_i \le R_B. \tag{19} $$

The solution of the above problem will yield the optimal sets $I_\cap$ and $I_\oplus$, where $I_\cap$ will contain the indices of the layers assigned to both streams and $I_\oplus$ will contain the indices assigned only to one of the streams. In order to obtain the optimal $I_1$, $I_2$, we need to further partition $I_\oplus$ into two disjoint index-assignment sets, one for each stream. It is clear from (14), however, that any such partition will yield sets $I_1$, $I_2$ inducing the same expected distortion reduction at the decoder; hence, the partition of $I_\oplus$ may be arbitrary (we may even assign the whole set $I_\oplus$ to only one of the streams). However, since balanced MD coding is sought, an acceptable partitioning should result in fairly equal total rates of $I_1$ and $I_2$. In order to achieve this, the indices in $I_\oplus$ may be ordered in terms of decreasing corresponding rates $R_i$ and be assigned alternately to each stream.

6. COMPLEXITY ANALYSIS

If we were to solve the maximization problem (17) by exhaustively examining all possible realizations of $I_1$ and $I_2$, this would involve $2^{2L}$ possibilities, since there are $2^L$ subsets of the index set $I$. Clearly, the optimal solution will be achieved by choosing any pair of sets $I_1$ and $I_2$ resulting in the same sets $I_\cap^*$ and $I_\oplus^*$, which solve the maximization problem described by (18) and (19). Hence, we only need to examine all possible realizations of disjoint sets $I_\cap, I_\oplus \subset I$. Note that since there are $2^L$ possible subsets of the index set $I$, any subset $A \subset I$ may be expressed as the binary representation of a number between $0$ and $2^L - 1$, with the $i$th bit being 1 if $i \in A$ and 0 otherwise. An exhaustive search algorithm which will determine the optimal solution $I_\cap^*$, $I_\oplus^*$ to the maximization problem is shown in Algorithm 1.

    max D = 0                                  (maximum distortion originally 0)
    I∩* = I⊕* = ∅                              (optimal sets originally empty)
    for I∩ = 0, ..., 2^L − 1                   (all possible realizations of I∩)
      for I⊕ = 0, ..., 2^L − 1                 (all possible realizations of I⊕)
        if I∩ AND I⊕ = 0                       (check if sets are disjoint)
          if (19) is satisfied                 (check rate constraint)
            Calculate expected distortion reduction D(I∩, I⊕) from (18)
            if D(I∩, I⊕) > max D, update max D, I∩* and I⊕*   (update optimal sets)
          endif
        endif
      endfor
    endfor
    Partition I⊕* into two fairly equal-rate subsets I⊕*(1) and I⊕*(2)
    The optimal index assignment is given by I1* = I∩* ∪ I⊕*(1), I2* = I∩* ∪ I⊕*(2)

Algorithm 1: Exhaustive search algorithm.

Although this algorithm will always produce an optimal solution, the number of possible realizations of $I_\cap$ and $I_\oplus$, over which the search will be performed, is $3^L$, still prohibitive even for moderate values of $L$. The NP-completeness of the maximization problem described by (18) and (19) can also be shown by formulating it as an integer (0–1) programming problem, as shown in Appendix A. In view of these remarks, it would be desirable to establish some optimality results that will narrow the number of possible candidate solutions or devise techniques that would search through a smaller set of possible near-optimal solutions. To this end, the following will prove helpful.

Lemma 1. If $I_\cap$ and $I_\oplus$ are fixed and $j \in I_\cap$ or $j \in I_\oplus$, replacing layer $j$ with layers of higher indices, such that their total rate does not exceed $R_j$, would result in smaller expected distortion reduction.

Proof. Assume that $j \in I_\cap$ (the proof for $j \in I_\oplus$ is similar) and $j_1, \ldots, j_k \notin I_\cap, I_\oplus$ with $j \le j_1 \le \cdots \le j_k$ and

$$ \sum_{i=1}^{k} R_{j_i} \le R_j. \tag{20} $$

If $I_\cap$ is replaced by the set $I_\cap' \triangleq (I_\cap \setminus \{j\}) \cup \{j_1, \ldots, j_k\}$, then the rate constraint (19) would still be satisfied and the expected distortion reduction (18) would decrease by

$$ D(I_\cap, I_\oplus) - D(I_\cap', I_\oplus) = (2-p) \left[ D_j (1 + C_j) - \sum_{i=1}^{k} D_{j_i} (1 + C_{j_i}) \right]. \tag{21} $$

Using (9) and (20) it is straightforward to show that the outcome of (21) is nonnegative; hence, this replacement would prove inefficient. The same also holds if we were to replace more than one lower-indexed layers with higher-indexed ones of smaller total rate.

In other words, Lemma 1 suggests that, if possible (i.e., if the rate constraint is not violated), we should replace higher-indexed layers with lower-indexed ones with appropriate total rate. However, Lemma 1 might mislead us to assume that the optimal solution would consist of sets $I_\cap^*$ and $I_\oplus^*$ comprising the
lower-indexed layers, that is,

\[ I_\cap^* = \{1, \ldots, L_\cap^*\}, \qquad I_\cup^* = \{L_\cap^* + 1, \ldots, L_\cup^*\}, \qquad L_\cap^* \le L_\cup^*. \tag{22} \]

This would not be true in case the rate margin $R_M \triangleq R_B - 2\sum_{i \in I_\cap} R_i - \sum_{i \in I_\cup} R_i$ can be filled by replacing one (or more) of the lower-indexed layers $j$ with one or more higher-indexed layers $j \le j_1 \le \cdots \le j_k$, such that $\sum_{i=1}^{k} R_{j_i} \le 2R_j + R_M$. It is possible that in this case the resulting expected distortion reduction is actually larger, as shown in the example below.

Counterexample. Let $R_B = 21.5$, $p = 0.8$, $C_i = 0$, $i = 1, \ldots, L$, and $R_i$, $D_i$ given by the following table:

  i      1      2      3      4      5
  R_i    1.5 (only surviving entry)
  D_i    0.9    0.7    0.4    0.25   0.18

It turns out that the optimal sets $I_\cap$, $I_\cup$ of the form (22) are $I_\cap = \{1\}$ and $I_\cup = \{2, 3, 4, 5\}$ ($L_\cap^* = 1$, $L_\cup^* = 5$), resulting in total rate 20.5 and expected distortion reduction $1.2 \cdot 0.9 + (0.7 + 0.4 + 0.25 + 0.18) = 2.61$. There is, however, a rate margin $R_M = R_B - 20.5 = 1$ that may be taken advantage of, if $I_\cap$ or $I_\cup$ is properly chosen. In fact, if the sets $I_\cap = \{2, 4\}$ and $I_\cup = \{1, 3, 5\}$ are used, the total rate matches the rate budget $R_B$ and the expected distortion reduction increases slightly to $1.2 \cdot (0.7 + 0.25) + (0.9 + 0.4 + 0.18) = 2.62$.

This counterexample verifies that the optimal solution will not always be of the form (22); however, extensive experimentation showed that in most cases the sets $I_\cap$ and $I_\cup$ given by (22) provide a near-optimal solution, as was indeed the case in the previous example. An improved exhaustive search algorithm, which stems from this remark, would consider only sets $I_\cap$, $I_\cup$ of the form (22). The number of possible candidates may be further reduced based on the following lemmas.

Lemma 2. $L_\cup^*$ cannot exceed the value beyond which the sum $\sum_{i=1}^{L_\cup^*} R_i$ exceeds the rate budget $R_B$.

Proof. This lemma is a direct consequence of the total rate constraint (19) for $L_\cap^* = 0$.

Lemma 3. $L_\cup^*$ cannot be smaller than any value for which the sum $\sum_{i=1}^{L_\cup^*} R_i$ does not exceed $R_B/2$.

Proof. If $\sum_{i=1}^{L_\cup^*} R_i \le R_B/2$, the best choice for $L_\cap^*$ is $L_\cap^* = L_\cup^*$, since the rate constraint will still be met. If there exists an $l > L_\cup^*$ with $\sum_{i=1}^{l} R_i \le R_B/2$, then setting $L_\cap^* = L_\cup^* = l$ improves $D(I_\cap, I_\cup)$.

Lemma 4. For a given $L_\cup^*$, the optimal value of $L_\cap^*$ is the largest integer $l \le L_\cup^*$ for which the total rate for $I_\cap$ does not exceed the remaining available rate,

\[ 2\sum_{i=1}^{l} R_i \le R_B - \sum_{i=l+1}^{L_\cup^*} R_i \iff \sum_{i=1}^{l} R_i \le R_B - \sum_{i=1}^{L_\cup^*} R_i. \]

Proof. It is straightforward to prove that the more layers $I_\cap$ comprises, the better the distortion reduction will be. Therefore, we should try to "fit" as many layers as possible in the remaining available rate.

Lemmas 2–4 may be used to narrow down the exhaustive search space. In particular, Lemmas 2 and 3 suggest that we should examine values of $L_\cup^*$ in a set $\{L_1, \ldots, L_2\}$, while Lemma 4 suggests that for each of these values of $L_\cup^*$ there is a unique optimal value of $L_\cap^*$; hence, it suffices to examine only $L_2 - L_1 + 1 < L$ cases. In view of these results, we can describe the improved exhaustive search procedure in Algorithm 2.

  max D = 0 (maximum distortion originally 0)
  $L_\cap^* = L_\cup^* = 0$ (optimal sets originally empty)
  $L_1 = \max\{l \in I : \sum_{i=1}^{l} R_i \le R_B/2\}$ (smallest value for $L_\cup^*$)
  $L_2 = \max\{l \in I : \sum_{i=1}^{l} R_i \le R_B\}$ (largest value for $L_\cup^*$)
  $L_\cap = L_1$ (initial value for $L_\cap$)
  for $L_\cup = L_1, \ldots, L_2$ (all possible values of $L_\cup$)
    while $\sum_{i=1}^{L_\cap} R_i > R_B - \sum_{i=1}^{L_\cup} R_i$
      decrease $L_\cap$
    endwhile
    $I_\cap = \{1, \ldots, L_\cap\}$, $I_\cup = \{L_\cap + 1, \ldots, L_\cup\}$ (corresponding index-assignment sets)
    Calculate expected distortion reduction $D(I_\cap, I_\cup)$ from (18)
    if $D(I_\cap, I_\cup) >$ max D
      update max D, $L_\cap^*$, and $L_\cup^*$ (update optimal values)
  endfor
  $I_\cap^* = \{1, \ldots, L_\cap^*\}$, $I_\cup^* = \{L_\cap^* + 1, \ldots, L_\cup^*\}$ (optimal index-assignment sets)

Algorithm 2: Improved exhaustive search algorithm.

The while loop in this algorithm searches for the maximum value of $L_\cap$ fitting in the rate margin, since, as can be easily verified, the corresponding value of $L_\cap$ for $L_\cup + 1$ will be smaller than that for $L_\cup$ (the previous value of $L_\cap$). Hence, the search is performed over $L_2 - L_1 + 1$ possible values of $L_\cup^*$ and $L_1$ possible values of $L_\cap^*$, and the complexity of the algorithm will be linear in $L$.

In general, the improved exhaustive search algorithm will result in sets $I_\cap^*$ and $I_\cup^*$ which do not
exactly meet the rate constraint. In this case, there will be a rate margin $R_M^* \triangleq R_B - 2\sum_{i \in I_\cap^*} R_i - \sum_{i \in I_\cup^*} R_i$, which can be "filled" with smaller segments outside $I_\cap^*$ or $I_\cup^*$. A further improvement would search for possible augmentations of $I_\cap^*$ or $I_\cup^*$, so that the total rate is closer to the rate budget $R_B$.

As already stated, this algorithm will, in general, yield suboptimal yet near-optimal solutions to the maximization problem. A further (and more important) disadvantage of this algorithm is that, when applied in the general case of $K > 2$ descriptions, its complexity will be even higher. If we are to construct a low-complexity algorithm for the general case, we may resort to heuristics emanating from a continuous-case consideration of the problem. This is explored in the next section.

EQUIVALENT CONTINUOUS PROBLEM

By examining closely the discrete maximization problem described by (18) and (19), we first note that the sums $\sum_{i \in I_\cap} D_i(1 + C_i)$, $\sum_{i \in I_\cap} R_i$ and $\sum_{i \in I_\cup} D_i$, $\sum_{i \in I_\cup} R_i$ are the distortion reduction and rate "measures" of $I_\cap$ and $I_\cup$, respectively. A further restriction arises from the requirement that $I_\cap$ and $I_\cup$ have to comprise intervals dictated by the available blocks and that partial blocks may not be used. If we relax this restriction, we may formulate a corresponding continuous maximization problem, which is easier to solve.

Assume that the curve appearing in Figure represents a continuous, differentiable, nondecreasing, and concave function $D(R)$ of the rate $R$. Then the derivative $D'(R)$ will be a well-defined, continuous, positive, and decreasing function of $R$, for every $R \in \mathbb{R}_+$. In a similar fashion, assume that the fraction of distortion reduction due to motion compensation is provided by a continuous decreasing function $c(R)$ and that the curve corresponding to the products $D_i C_i$ defines a function $C(R)$ with derivative $C'(R) = D'(R)c(R)$, which will have properties similar to those of $D'(R)$. (In other words, $D'(R)$ corresponds to the ratios $D_i/R_i$ and $c(R)$ to the coefficients $C_i$.) For any rate interval $[r_1, r_2]$, let $\mu_R$, $\mu_D$, $\mu_C$ denote the following quantities:

\[ \begin{aligned} \mu_R([r_1, r_2]) &= \int_{r_1}^{r_2} dr = r_2 - r_1, \\ \mu_D([r_1, r_2]) &= \int_{r_1}^{r_2} D'(r)\,dr = D(r_2) - D(r_1), \\ \mu_C([r_1, r_2]) &= \int_{r_1}^{r_2} c(r) D'(r)\,dr = C(r_2) - C(r_1). \end{aligned} \tag{23} \]

In practice, the number of intervals of the form $[r_1, r_2]$ is always finite (with an upper bound equal to the number of bits in the compressed bitstream). Obviously, the measure of a union of a finite number of disjoint intervals of the form $[r_1, r_2]$ would equal the sum of the measures of these intervals. Thus, a continuous version of the discrete maximization problem described by (18) and (19) may now correspondingly be formulated as follows.

Continuous maximization problem. Find disjoint sets $S_\cap, S_\cup \subset \mathbb{R}_+$ maximizing

\[ D(S_\cap, S_\cup) = (2 - p)\bigl[\mu_C(S_\cap) + \mu_D(S_\cap)\bigr] + \mu_D(S_\cup) \tag{24} \]

subject to the constraint

\[ R(S_\cap, S_\cup) = 2\mu_R(S_\cap) + \mu_R(S_\cup) \le R_B. \tag{25} \]

With the further reasonable assumption that $S_\cap$ and $S_\cup$ are unions of closed intervals, properties stronger than Lemma 1 may be established for the continuous problem, leading to optimal solutions.

Lemma 5. If $S_\cup$ is fixed, the optimal $S_\cap$ comprises the "smallest-rate region" of the remaining space $\mathbb{R}_+ \setminus S_\cup$, that is,

\[ S_\cap^* = [0, R_\cap] \cap (\mathbb{R}_+ \setminus S_\cup) \tag{26} \]

for some positive rate $R_\cap$.

Proof. We will outline the general concept behind (26). Assume that (26) does not hold. Then there exist $\delta > 0$ and $r_2 > r_1 \ge 0$ such that the interval $[r_1, r_1 + \delta]$ lies outside $S_\cap$ (i.e., $[r_1, r_1 + \delta] \cap S_\cap = \emptyset$) and the interval $[r_2, r_2 + \delta]$ is contained in $S_\cap$ (i.e., $[r_2, r_2 + \delta] \subset S_\cap$). If we replace $S_\cap$ with $S_\cap' \triangleq (S_\cap \setminus [r_2, r_2 + \delta]) \cup [r_1, r_1 + \delta]$ (remove the second interval and add the first), then the rate constraint will still be met and the increase in expected distortion reduction (24) will be

\[ \begin{aligned} D(S_\cap', S_\cup) - D(S_\cap, S_\cup) &= (2 - p)\bigl[\mu_C([r_1, r_1 + \delta]) + \mu_D([r_1, r_1 + \delta]) - \mu_C([r_2, r_2 + \delta]) - \mu_D([r_2, r_2 + \delta])\bigr] \\ &= (2 - p)\Bigl[\int_{r_1}^{r_1 + \delta} D'(r)\bigl(1 + c(r)\bigr)\,dr - \int_{r_2}^{r_2 + \delta} D'(r)\bigl(1 + c(r)\bigr)\,dr\Bigr] \\ &\overset{(\alpha)}{\ge} (2 - p)\Bigl[\int_{r_1}^{r_1 + \delta} D'(r + r_2 - r_1)\bigl(1 + c(r + r_2 - r_1)\bigr)\,dr - \int_{r_2}^{r_2 + \delta} D'(r)\bigl(1 + c(r)\bigr)\,dr\Bigr] \\ &\overset{(\beta)}{=} (2 - p)\Bigl[\int_{r_2}^{r_2 + \delta} D'(\rho)\bigl(1 + c(\rho)\bigr)\,d\rho - \int_{r_2}^{r_2 + \delta} D'(r)\bigl(1 + c(r)\bigr)\,dr\Bigr] = 0, \end{aligned} \tag{27} \]

where $(\alpha)$ results from $r_2 - r_1 > 0$ and the fact that $D'(\cdot)$ and $c(\cdot)$ are decreasing, and $(\beta)$ involves a simple change of integration variable. It follows, therefore, that $S_\cap$ will not be optimal (since it is outperformed by $S_\cap'$) unless it is given by (26) for some $R_\cap$.

In a similar manner, it is possible to establish an equivalent property for $S_\cup$.

Lemma 6. If $S_\cap$ is fixed, the optimal $S_\cup$ comprises the "smallest-rate region" of the remaining space $\mathbb{R}_+ \setminus S_\cap$, that is, $S_\cup^* = [0, R_\cup] \cap (\mathbb{R}_+ \setminus S_\cap)$ for some positive rate $R_\cup$.

Furthermore, the concavity of $D(\cdot)$ implies the following.

Lemma 7. If $r_1 < r_2$, $\delta > 0$, and $[r_1, r_1 + \delta] \subset S_\cup$, $[r_2, r_2 + \delta] \subset S_\cap$, then the sets $S_\cap' \triangleq (S_\cap \setminus [r_2, r_2 + \delta]) \cup [r_1, r_1 + \delta]$ and $S_\cup' \triangleq (S_\cup \setminus [r_1, r_1 + \delta]) \cup [r_2, r_2 + \delta]$ yield smaller expected distortion (that is, larger expected distortion reduction) than $S_\cap$ and $S_\cup$.

Proof. This is true because the contribution of $S_\cap$ in the expected distortion reduction (24) involves the factor $2 - p > 1$ and the function $C(R) \ge D(R)$, $R \in \mathbb{R}_+$. Hence, incorporating the smaller-rate interval $[r_1, r_1 + \delta]$ in $S_\cap$ and the higher-rate interval $[r_2, r_2 + \delta]$ in $S_\cup$ will yield smaller expected distortion, as is easily verified.

Lemmas 5, 6, and 7 suggest that the jointly optimal sets $S_\cap^*$, $S_\cup^*$ will be intervals of the form

\[ S_\cap^* = [0, R_\cap], \qquad S_\cup^* = [R_\cap, R_\cup], \tag{28} \]

for some $R_\cup \ge R_\cap \ge 0$. In terms of the original maximization problem, (28) would provide the optimal solution if the (0–1) constraint for $x$ is relaxed, namely, if assignment of partial blocks is allowed. In view of (28), the equivalent continuous problem may be restated as follows.

Continuous maximization problem. Find positive rates $R_\cup \ge R_\cap \ge 0$ maximizing

\[ D(R_\cap, R_\cup) = (2 - p)\bigl[C(R_\cap) + D(R_\cap)\bigr] + D(R_\cup) - D(R_\cap) \tag{29} \]

subject to the constraint

\[ R(R_\cap, R_\cup) = R_\cap + R_\cup \le R_B. \tag{30} \]

This is a simple Lagrangian maximization problem with optimal solution $R_\cap^*$, $R_\cup^*$ satisfying the constraint (30) at the boundary. Substituting $R_\cup = R_B - R_\cap$ in (29) and setting the derivative with respect to $R_\cap$ to zero, the optimal $R_\cap^*$ should satisfy

\[ (2 - p)\bigl[C'(R_\cap^*) + D'(R_\cap^*)\bigr] - D'(R_B - R_\cap^*) - D'(R_\cap^*) = 0, \tag{31} \]

which, after dividing by $D'(R_\cap^*)$ and using $C'(R) = D'(R)c(R)$, translates to the condition

\[ \varphi(R_\cap^*) \triangleq \frac{D'(R_B - R_\cap^*)}{D'(R_\cap^*)} - (2 - p)\,c(R_\cap^*) = 1 - p. \tag{32} \]

Observe that, since $D'(\cdot)$ and $c(\cdot)$ are decreasing, $\varphi(\cdot)$ will be continuous and increasing in the interval $[0, R_B/2]$ and the continuous maximization problem will not involve local maxima. Also, the smallest value of $\varphi(\cdot)$ will be $\varphi(0) = D'(R_B)/D'(0) - (2 - p)c(0)$ and the largest value will be $\varphi(R_B/2) = 1 - (2 - p)c(R_B/2)$. Therefore, if $(1 - p) \in [\varphi(0), \varphi(R_B/2)]$, the optimal value for $R_\cap^*$ will be $\varphi^{-1}(1 - p)$. Otherwise (32) does not have a solution and optimality is achieved either at $0$ or at $R_B/2$. In general, we can write

\[ R_\cap^* = \begin{cases} 0, & \text{if } 1 - p < \varphi(0), \\ \varphi^{-1}(1 - p), & \text{if } 1 - p \in [\varphi(0), \varphi(R_B/2)], \\ R_B/2, & \text{if } 1 - p > \varphi(R_B/2), \end{cases} \tag{33} \]

while $R_\cup^* = R_B - R_\cap^*$.

Returning to the discrete maximization problem, it is reasonable to assume that a near-optimal solution will resemble that of the equivalent continuous maximization problem, especially for large values of $L$. This means that a near-optimal choice for the index-assignment sets would be $I_\cap = \{1, \ldots, L_\cap^*\}$, $I_\cup = \{L_\cap^* + 1, \ldots, L_\cup^*\}$, where $L_\cap^*$ and $L_\cup^*$ would be such that

\[ \sum_{i=1}^{L_\cap^*} R_i \approx R_B - \sum_{i=1}^{L_\cup^*} R_i, \tag{34} \]

\[ \varphi(L_\cap^*, L_\cup^*) \triangleq \frac{D_{L_\cup^*}/R_{L_\cup^*}}{D_{L_\cap^*}/R_{L_\cap^*}} - (2 - p)\,C_{L_\cap^*} \approx 1 - p. \tag{35} \]

  $L_1 = \max\{l \in I : \sum_{i=1}^{l} R_i \le R_B/2\}$ (index corresponding to $R_B/2$)
  $L_2 = \max\{l \in I : \sum_{i=1}^{l} R_i \le R_B\}$ (index corresponding to $R_B$)
  if $\varphi(1, L_2) > 1 - p$, set $L_\cap^* = 0$, $L_\cup^* = L_2$ and exit (case $1 - p < \varphi(0)$ in (33))
  if $\varphi(L_1, L_1) < 1 - p$, set $L_\cap^* = L_1$, $L_\cup^* = L_1$ and exit (case $1 - p > \varphi(R_B/2)$ in (33))
  $L_\cap = 1$, $L_\cup = L_2$ (initial values)
  while $L_\cap \le L_\cup$
    while $\sum_{i=1}^{L_\cap} R_i > R_B - \sum_{i=1}^{L_\cup} R_i$
      decrease $L_\cup$ (find largest $L_\cup$ satisfying rate constraint)
    if $\varphi(L_\cap, L_\cup) > 1 - p$, set $L_\cap^* = L_\cap$, $L_\cup^* = L_\cup$ and exit (crossed the $1 - p$ line)
    increase $L_\cap$ (next value of $L_\cap$)
  endwhile
  $L_\cap^* = L_1$, $L_\cup^* = L_1$ (if this point is reached, $\varphi(L_\cap, L_\cup)$ never crossed the $1 - p$ line)

Algorithm 3: Fast search algorithm.

This consideration suggests Algorithm 3 above. The advantage of this algorithm lies in that it involves fewer calculations and
terminates sooner than the improved exhaustive search algorithm. It is clear, however, that the price paid for its reduced complexity, which is important in real-time applications, is its inferior performance compared to the exhaustive search algorithms. Let us also note that the implementation of the fast search algorithm involves a serial search through all values from 1 to the terminating, estimated optimal, value of $L_\cap^*$. A further improvement would involve a binary search modification of this algorithm, according to the actual values of $\varphi(L_\cap, L_\cup)$ at the boundaries of the binary-search interval.

EXPERIMENTAL RESULTS

The proposed multiple description video coding scheme was experimentally evaluated for the transmission of the Y component (15 frames/second) of the standard test sequence "Foreman" over two channels. Each frame was coded in two descriptions. Motion vector information was duplicated in both descriptions. The proposed redundancy allocation algorithm of the preceding section was applied for video transmission over two channels of total capacity 128 Kbps and for three different probabilities of description arrival, p = 0.8, 0.9, 0.95, or equivalently three probabilities of description loss equal to 20%, 10%, 5%. The number of frames in each GOP was chosen with respect to p as suggested in [28]. The target rate $R_B$ for each frame was determined by allocating to intra-frames a rate equal to four times the rate allocated to inter-frames. The resulting descriptions, as shown in Table 1 for the first five frames of the sequence, are remarkably "balanced," that is, they have approximately equal size and yield almost equal reconstruction qualities.

Table 1: Description size (bytes) ratio and PSNR ratio of the two descriptions, for several frames of the sequence "Foreman" (p = 0.9, $R_{total}$ = 128 Kbps).

  Frame (in order)   1st     2nd     3rd     4th     5th
  Bytes ratio        1.015   1.020   0.950   0.972   0.989
  PSNR ratio         1.005   1.006   0.985   0.993   1.003

In the present work, we assume that descriptions that arrive at the decoder do not contain bit errors. We examine two types of transmission scenarios. In the first scenario, we assume that the channels retain their status during the entire transmission. In this case, the parameter p serves as a means to control the redundancy and is not directly associated with the condition of the channel. In the second scenario, we assume that the channels go on and off during transmission. In the latter scenario, it is possible that both descriptions of a frame are lost. In such a case, the decoder uses the most recent reference frame that is available. For each frame, the peak signal-to-noise ratio is used as a measure of the reconstruction quality (in dB):

\[ \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}. \tag{36} \]

Following the approach adopted in [29, 30], the reported mean PSNR values are computed by averaging decoded MSE values and then converting the mean MSE to the corresponding PSNR value, rather than averaging the PSNR values directly.

In the first transmission scenario, the coding of the "Foreman" sequence into two descriptions is simulated under the respective assumption that the channels are available or unavailable during the entire transmission. As expected, the central distortion of the scheme that allows drift accumulation, which we will term multiple description wavelet video coder (MDWVC), is superior to that of the proposed drift-free system, termed DF-MDWVC. This was expected since, when both descriptions are available, drift is eliminated anyway. On the other hand, the side distortion appears to be lower in the drift-free system. The performance of MDWVC is shown in Figure 6. The redundancy rate-distortion performance of our coders is shown in Figure 7. As seen, DF-MDWVC and MDWVC reach similar performances for redundancy greater than 15%. For lower redundancies, the drift-free system performs worse due to the very low quality of the reference frames.

In the second simulation, in which the channels may go on and off from frame to frame, we tested our systems under identical description loss patterns. For each frame, one, two, or none of the descriptions was lost. As seen from Figure 8 and Tables 2 and 3, the drift-free system is much more reliable and demonstrates no abrupt changes in its performance, contrary to MDWVC, which demonstrates significant variations in the video quality it delivers. In addition, both schemes demonstrate significant gains over the single description scheme, which appears to collapse very frequently due to description losses. In Figure 8(d), we report the performance of a scheme that is based on H.264 and uses flexible macroblock ordering (FMO) for transmission of video over two channels. This scheme uses P-frames and two FMO slices. As seen, despite the fact that the H.264-based scheme uses advanced error concealment techniques at the decoder, the reconstruction quality it delivers exhibits significant variations in comparison to the quality achieved by our drift-free scheme.

Table 2: Performance comparison. PSNR is reported.

  Packet loss   Single   Drift   Drift-free
  20%           17.95    26.60   26.90
  10%           21.70    27.32   27.49
  5%            22.96    27.60   27.84

Table 3: Performance comparison. Standard deviation of reconstruction quality is reported.

  Packet loss   Single   Drift   Drift-free
  20%           7.07     2.58    1.94
  10%           4.67     2.32    1.46
  5%            5.05     2.00    1.34

Reconstructed frames obtained by simulating the transmission of 180 frames of the "Foreman" sequence at 15 frames/second over two channels of total capacity 128 Kbps and probability of description arrival equal to 0.9 using the above systems are displayed in Figure 9. The reconstruction displayed in Figure 9(c), achieved using the drift-free system, is qualitatively more pleasant than the reconstruction using MDWVC. This indicates that, in practical cases, the drift-free system can be a better choice even though MDWVC operates better at low error rates. The image reconstructed using the single description scheme exhibits the worst performance. In Figure 10, we present the reconstruction quality obtained using the drift-free system for the case of transmission over four channels of total capacity 128 Kbps and probabilities of description loss equal to 20%.

CONCLUSIONS

We presented a wavelet-based framework for the encoding of video in multiple descriptions. The generation of multiple descriptions was performed so that drift is eliminated at the decoder side. The proposed framework is flexible and allows the encoding of video into an arbitrary number of descriptions. The resulting framework is endowed with the capability for drift-free reconstruction regardless of the number of descriptions that arrived at the decoder. Three algorithms were also presented for the optimal allocation of redundancy. Experimental results for the transmission of video using two descriptions demonstrated the efficiency of the proposed method.

APPENDICES

A. INTEGER (0–1) PROGRAMMING FORMULATION

Let $x_i^\cap = 1_{I_\cap}(i)$, $x_i^\cup = 1_{I_\cup}(i)$, $i = 1, \ldots, L$, denote binary-valued variables, where $1_A$ is the indicator function of set $A$, that is, $1_A(x) = 1$ if $x \in A$ and $1_A(x) = 0$ if $x \notin A$. Then, the sets $I_\cap$ and $I_\cup$ are determined by the vectors $\mathbf{x}^\cap \triangleq [x_1^\cap, \ldots, x_L^\cap]^T$ and $\mathbf{x}^\cup \triangleq [x_1^\cup, \ldots, x_L^\cup]^T$, respectively, where $A^T$ denotes the transpose of matrix $A$. If we adopt this notation, (18) may be written as

\[ D(I_\cap, I_\cup) = (2 - p)\,\mathbf{c}^T \cdot \mathbf{x}^\cap + \mathbf{d}^T \cdot \mathbf{x}^\cup, \tag{A.1} \]

with $\mathbf{c} \triangleq [D_1(1 + C_1), \ldots, D_L(1 + C_L)]^T$, $\mathbf{d} \triangleq [D_1, \ldots, D_L]^T$, and constraint (19) as

\[ R(I_\cap, I_\cup) = 2\mathbf{r}^T \cdot \mathbf{x}^\cap + \mathbf{r}^T \cdot \mathbf{x}^\cup \le R_B, \tag{A.2} \]

with $\mathbf{r} \triangleq [R_1, \ldots, R_L]^T$. The property that $I_\cap$ and $I_\cup$ are disjoint may be written as

\[ \mathbf{x}^\cap + \mathbf{x}^\cup \le \mathbf{1}_L, \tag{A.3} \]

where $\mathbf{1}_L$ is the $L \times 1$ unity vector and inequalities involving vectors are meant in the per-component sense.
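The formulation in (A.1)–(A.3) can be checked numerically on a very small instance. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: it evaluates the objective (A.1) and the constraints (A.2) and (A.3) for given binary vectors and then enumerates all pairs of binary vectors. The function names are hypothetical. The distortion values, $p$, and $R_B$ are those of the counterexample in the complexity analysis, whereas the rate vector `R` is an assumption, chosen only so that the two total rates quoted there (20.5 and 21.5) are reproduced.

```python
from itertools import product


def expected_gain(x_cap, x_cup, D, C, p):
    """Objective (A.1): (2 - p) * c^T x_cap + d^T x_cup, with c_i = D_i * (1 + C_i)."""
    c = [d_i * (1.0 + c_i) for d_i, c_i in zip(D, C)]
    both = sum(c_i * x for c_i, x in zip(c, x_cap))
    single = sum(d_i * x for d_i, x in zip(D, x_cup))
    return (2.0 - p) * both + single


def total_rate(x_cap, x_cup, R):
    """Left-hand side of (A.2): 2 r^T x_cap + r^T x_cup."""
    return sum(r * (2 * a + b) for r, a, b in zip(R, x_cap, x_cup))


def is_feasible(x_cap, x_cup, R, RB):
    """Rate constraint (A.2) plus the per-component disjointness constraint (A.3)."""
    disjoint = all(a + b <= 1 for a, b in zip(x_cap, x_cup))
    return disjoint and total_rate(x_cap, x_cup, R) <= RB


def brute_force(D, C, R, p, RB):
    """Enumerate every pair of binary vectors; exponential in L, so tiny L only."""
    L = len(D)
    best_gain, best_cap, best_cup = float("-inf"), None, None
    for x_cap in product((0, 1), repeat=L):
        for x_cup in product((0, 1), repeat=L):
            if not is_feasible(x_cap, x_cup, R, RB):
                continue
            gain = expected_gain(x_cap, x_cup, D, C, p)
            if gain > best_gain:
                best_gain, best_cap, best_cup = gain, x_cap, x_cup
    return best_gain, best_cap, best_cup


# D, p, RB, and C_i = 0 follow the counterexample; R is an ASSUMED rate vector.
D = [0.9, 0.7, 0.4, 0.25, 0.18]
R = [5.0, 4.0, 3.0, 2.0, 1.5]  # hypothetical rates, not taken from the paper
C = [0.0] * 5
p, RB = 0.8, 21.5

# Sets of the prefix form (22): I_cap = {1}, I_cup = {2, 3, 4, 5}.
prefix_gain = expected_gain((1, 0, 0, 0, 0), (0, 1, 1, 1, 1), D, C, p)
best_gain, best_cap, best_cup = brute_force(D, C, R, p, RB)

print(round(prefix_gain, 2), round(best_gain, 2), best_cap, best_cup)
# prints: 2.61 2.62 (0, 1, 0, 1, 0) (1, 0, 1, 0, 1)
```

With these assumed rates, the enumeration returns $I_\cap = \{2, 4\}$ and $I_\cup = \{1, 3, 5\}$, which uses the whole budget of 21.5 and slightly outperforms the prefix-form sets. The double loop visits $4^L$ pairs, of which only $3^L$ satisfy (A.3), so the sketch is practical only for a handful of layers.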
In order to find the optimal solution, it suffices to find binary-valued vectors $\mathbf{x}^\cap$ and $\mathbf{x}^\cup$ maximizing (A.1) subject to the constraints (A.2) and (A.3). This is an integer (0–1) programming problem and can be formulated by defining

\[ \mathbf{x} \triangleq \begin{bmatrix} \mathbf{x}^\cap \\ \mathbf{x}^\cup \end{bmatrix}, \qquad \mathbf{C} \triangleq \begin{bmatrix} \mathbf{I}_L & \mathbf{I}_L \\ 2\mathbf{r}^T & \mathbf{r}^T \end{bmatrix}, \qquad \tilde{\mathbf{d}} \triangleq \begin{bmatrix} (2 - p)\mathbf{c} \\ \mathbf{d} \end{bmatrix}, \qquad \mathbf{b} \triangleq \begin{bmatrix} \mathbf{1}_L \\ R_B \end{bmatrix}, \tag{A.4} \]

where $\mathbf{I}_L$ is the $L \times L$ identity matrix, $\mathbf{x}$ and $\tilde{\mathbf{d}}$ are $2L \times 1$ vectors (the stacked vector is written $\tilde{\mathbf{d}}$ here to distinguish it from the $L \times 1$ vector $\mathbf{d}$), $\mathbf{C}$ is an $(L + 1) \times 2L$ matrix, and $\mathbf{b}$ is an $(L + 1) \times 1$ vector. In view of these definitions, the maximization problem may be expressed as an integer-programming problem.

Integer (0–1) programming problem. Find a (0–1)-valued vector $\mathbf{x}$ such that

\[ \max\ \tilde{\mathbf{d}}^T \cdot \mathbf{x}, \qquad \mathbf{C}\mathbf{x} \le \mathbf{b}. \tag{A.5} \]

Although several techniques exist for the solution of integer-programming problems, it is well known that integer-programming problems are, in general, NP-complete and, most of the time, exhaustive search over all possible realizations of the binary-valued vector $\mathbf{x}$ is the only procedure that guarantees an optimal solution. Even if a cutting-plane or branch-and-bound technique is used, it does not guarantee that the number of operations will be less than exponential in $L$.

Figure 6: Reconstruction quality for the "Foreman" sequence using the MDWVC coder: (a) 20% probability of loss, (b) 10% probability of channel loss, and (c) 5% probability of loss. (Axes: PSNR (dB) versus frame number; curves: central distortion, side distortion.)

Figure 7: Redundancy rate-distortion performance of the proposed schemes: (a) drift, (b) drift-free. (Axes: average one-description PSNR (dB) versus redundancy (%).)

B. THE GENERAL MULTIPLE DESCRIPTION PROBLEM

In the general case, the original frame comprises $L$ layers and we need to form $K \ge 2$ descriptions so that a rate constraint is met and the expected distortion reduction at the decoder is maximized. Conforming to the notation used for the double-description case, we define the index sets $I_k$, $k = 1, \ldots, K$, where each $I_k$ describes the assignment of layers to description $k$, and the events $A_k = \{\text{Description } k \text{ reaches the decoder}\}$, $k = 1, \ldots, K$. The index-assignment sets $I_k$, $k = 1, \ldots, K$, define $2^K$ disjoint subsets of the index set $I = \{1, \ldots, L\}$, which can be written as

\[ J_{\mathbf{x}} = \bigcap_{k=1}^{K} I_k^{x_k}, \qquad \mathbf{x} \in \{0, 1\}^K, \tag{B.1} \]

where the subscript $\mathbf{x} = [x_1, \ldots, x_K]^T$ is a $(K \times 1)$ binary-valued vector and

\[ I_k^{b} = \begin{cases} I_k, & b = 1, \\ I_k^{c}, & b = 0, \end{cases} \qquad k = 1, \ldots, K. \tag{B.2} \]

For every $\mathbf{x} \in \{0, 1\}^K$, the set $J_{\mathbf{x}}$ comprises the indices belonging to the sets $I_j$ with $x_j = 1$. The original index-assignment sets $I_k$, $k = 1, \ldots, K$, can then be expressed in terms of the collection $\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}$ as

\[ I_k = \bigcup_{\mathbf{x} \in \{0,1\}^K :\, x_k = 1} J_{\mathbf{x}}, \qquad k = 1, \ldots, K. \tag{B.3} \]

Let $w(\mathbf{x}) \triangleq \sum_{k=1}^{K} x_k$ denote the weight of the binary-valued vector $\mathbf{x}$ and for every index set $A \subset I$ define

\[ R(A) \triangleq \sum_{i \in A} R_i, \qquad D(A) \triangleq \sum_{i \in A} D_i, \tag{B.4} \]

representing the total rate and distortion reduction of the layers with indices in $A$. The total rate sent to the decoder can be expressed as

\[ R\bigl(\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}\bigr) = \sum_{k=1}^{K} \sum_{i \in I_k} R_i \overset{(\alpha)}{=} \sum_{k=1}^{K} \sum_{\mathbf{x} \in \{0,1\}^K :\, x_k = 1} \sum_{i \in J_{\mathbf{x}}} R_i \overset{(\beta)}{=} \sum_{\mathbf{x} \in \{0,1\}^K} w(\mathbf{x}) R(J_{\mathbf{x}}), \tag{B.5} \]

where $(\alpha)$ comes from (B.3) and the fact that the sets $J_{\mathbf{x}}$ are mutually disjoint, and we can derive $(\beta)$ by observing that each sum $\sum_{i \in J_{\mathbf{x}}} R_i$ appears exactly $w(\mathbf{x})$ times in the previous expression of the total rate.

For a given $\mathbf{x} \in \{0, 1\}^K$, assume that $x_{j_1} = \cdots = x_{j_{w(\mathbf{x})}} = 1$ and the rest are zero. In order to express the expected distortion reduction at the decoder in terms of the collection $\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}$, we observe that the distortion at the decoder will improve by $D(J_{\mathbf{x}})$ (layers with indices in $J_{\mathbf{x}}$ will be used) whenever the event $A_{\mathbf{x}} \triangleq \{\text{description } j_1 \text{ or description } j_2 \text{ or } \cdots \text{ or description } j_{w(\mathbf{x})} \text{ is delivered}\}$ occurs, that is,

\[ A_{\mathbf{x}} = \bigcup_{\ell=1}^{w(\mathbf{x})} A_{j_\ell}. \tag{B.6} \]

Assuming that the events $A_k$, $k = 1, \ldots, K$, are independent and $\Pr\{A_k\} = 1 - \Pr\{A_k^c\} = p$, we can calculate its probability

\[ \Pr\{A_{\mathbf{x}}\} = 1 - \Pr\{A_{\mathbf{x}}^c\} = 1 - \Pr\Bigl\{\bigcap_{\ell=1}^{w(\mathbf{x})} A_{j_\ell}^c\Bigr\} = 1 - (1 - p)^{w(\mathbf{x})}. \tag{B.7} \]

If we also define $C(A) \triangleq \sum_{i \in A} D_i C_i$ for $A \subset I$, the distortion reduction due to motion compensation based on the layers common to all descriptions will be $C(J_{\mathbf{1}_K})$, $\mathbf{1}_K$ being the $(K \times 1)$ unity vector. The distortion reduction due to motion compensation is conditional on the event $A_{\mathbf{1}_K}$ (at least one of the descriptions reaches the decoder), whose probability is $1 - (1 - p)^K$. Therefore, the overall expected distortion reduction at the decoder will be

\[ \begin{aligned} D\bigl(\{J_{\mathbf{x}}\}_{\mathbf{x} \in \{0,1\}^K}\bigr) &= \Pr\{A_{\mathbf{1}_K}\}\, C(J_{\mathbf{1}_K}) + \sum_{\mathbf{x} \in \{0,1\}^K} \Pr\{A_{\mathbf{x}}\}\, D(J_{\mathbf{x}}) \\ &= \bigl[1 - (1 - p)^K\bigr] C(J_{\mathbf{1}_K}) + \sum_{\mathbf{x} \in \{0,1\}^K} \bigl[1 - (1 - p)^{w(\mathbf{x})}\bigr] D(J_{\mathbf{x}}). \end{aligned} \tag{B.8} \]

At this juncture, observe that both the total rate (B.5) and the expected distortion reduction (B.8) can be expressed as linear functions of the $\{R(J_{\mathbf{x}})\}_{\mathbf{x} \in \{0,1\}^K}$ and $\{D(J_{\mathbf{x}})\}_{\mathbf{x} \in \{0,1\}^K}$, respectively, with coefficients depending only on the weight of the index vector $\mathbf{x}$. Therefore, we can group all sets $J_{\mathbf{x}}$ with the same weight and define the new (fewer) sets

\[ J_k = \bigcup_{\mathbf{x} \in \{0,1\}^K :\, w(\mathbf{x}) = k} J_{\mathbf{x}}, \qquad k = 0, \ldots, K, \tag{B.9} \]

each set $J_k$ containing the layer indices assigned to exactly $k$ descriptions. Also, observe that the set $J_0 = J_{\mathbf{0}_K}$ has a zero coefficient in both (B.5) and (B.8); hence, it does not contribute to the total rate or expected distortion reduction. By reformulating (B.5) and (B.8), the maximization problem for the general multiple description case may be stated as follows.

General maximization problem. Find disjoint sets $J_1, \ldots, J_K \subset I$ maximizing

\[ D(J_1, \ldots, J_K) = \bigl[1 - (1 - p)^K\bigr] C(J_K) + \sum_{k=1}^{K} \bigl[1 - (1 - p)^k\bigr] D(J_k) \tag{B.10} \]

subject to the constraint

\[ R(J_1, \ldots, J_K) = \sum_{k=1}^{K} k R(J_k) \le R_B. \tag{B.11} \]

The integer-programming formulation of the general maximization problem would involve $K$ binary-valued $L \times 1$ vectors $\mathbf{x}_k$, $k = 1, \ldots, K$, with

\[ x_{k,i} = \begin{cases} 1, & \text{if } i \in J_k, \\ 0, & \text{if } i \notin J_k, \end{cases} \qquad k = 1, \ldots, K, \quad i = 1, \ldots, L, \tag{B.12} \]

and the requirement that the $J_k$, $k = 1, \ldots, K$, be disjoint can be written as

\[ \mathbf{x}_1 + \cdots + \mathbf{x}_K \le \mathbf{1}_L. \tag{B.13} \]

Let us define

\[ \begin{aligned} \mathbf{x} &\triangleq \bigl[\mathbf{x}_1^T \cdots \mathbf{x}_k^T \cdots \mathbf{x}_K^T\bigr]^T, \\ \mathbf{d}_K &\triangleq \bigl[p\,\mathbf{d}^T \cdots \bigl(1 - (1 - p)^k\bigr)\mathbf{d}^T \cdots \bigl(1 - (1 - p)^K\bigr)\mathbf{c}^T\bigr]^T, \\ \mathbf{C}_K &\triangleq \begin{bmatrix} \mathbf{I}_L & \cdots & \mathbf{I}_L & \cdots & \mathbf{I}_L \\ \mathbf{r}^T & \cdots & k\mathbf{r}^T & \cdots & K\mathbf{r}^T \end{bmatrix}, \\ \mathbf{b}_K &\triangleq \bigl[\mathbf{1}_L^T \ \ R_B\bigr]^T, \end{aligned} \tag{B.14} \]

where $\mathbf{x}$ and $\mathbf{d}_K$ are $KL \times 1$ vectors, $\mathbf{C}_K$ is an $(L + 1) \times KL$ matrix, $\mathbf{b}_K$ is an $(L + 1) \times 1$ vector, and the $L \times 1$ vectors $\mathbf{r}$, $\mathbf{d}$, $\mathbf{c}$ are those defined in the double-description integer-programming formulation. Then, the integer-programming formulation of the general multiple description problem will be as follows.

General integer (0–1) programming problem. Find a (0–1)-valued vector $\mathbf{x}$ such that

\[ \max\ \mathbf{d}_K^T \cdot \mathbf{x}, \qquad \mathbf{C}_K \mathbf{x} \le \mathbf{b}_K. \tag{B.15} \]

As is clear from the integer-programming formulation, the complexity of the general maximization problem may be as high as $2^{KL}$. Heuristics similar to those proposed for the double-description case may be used for an estimate of the optimal index-assignment scheme, based on the general equivalent continuous problem, which can be easily formulated from (B.10) and (B.11). It is reasonable to conjecture that the heuristics stemming from the equivalent continuous general maximization problem will provide solutions deviating from the optimal one even more as $K$ increases.

Figure 8: Reconstruction quality for the "Foreman" sequence when the channels go on and off during transmission and a probability of error equal to (a) 5%, (b) 10%, (c) 20%, and (d) transmission based on H.264 using flexible macroblock ordering. (Axes: PSNR (dB) versus frame number; curves: multiple description (drift-free), multiple description (drift), single description.)

Figure 9: Reconstructed frame for the transmission of the "Foreman" sequence, p = 0.9, over two channels of total capacity 128 Kbps: (a) original "Foreman" frame, (b) reconstructed using the coder without drift control (25.84 dB), (c) reconstructed using the drift-free coder (28.81 dB), and (d) reconstructed using the single description coder (25.78 dB).

Figure 10: Reconstruction quality obtained using the drift-free system with four descriptions transmitted over channels with probability of loss equal to 20%. (Axes: PSNR (dB) versus frame number; curve: four descriptions (drift-free).)

ACKNOWLEDGMENTS

The authors would like to thank Savvas Argyropoulos for his help with the H.264 results and the anonymous reviewers for their constructive comments. This work was partly presented in the IEEE International Conference on Image Processing, 2004. This work was supported by the EU IST projects "3DTV," "BOEMIE," and "K-SPACE."

REFERENCES

[1] A. E. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Transactions on Information Theory, vol. 28, no. 6, pp. 851–857, 1982.
[2] V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 821–834, 1993.
[3] A. C. Miguel, A. E. Mohr, and E. A. Riskin, “SPIHT for generalized multiple description coding,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’99), vol. 3, pp. 842–846, Kobe, Japan, October 1999.
[4] W. Jiang and A. Ortega, “Multiple description coding via polyphase transform and selective quantization,” in Visual Communications and Image Processing (VCIP ’99), vol. 3653, part 1-2 of Proceedings of SPIE, pp. 998–1008, San Jose, Calif, USA, January 1999.
[5] A. E. Mohr, E. A. Riskin, and R. E. Ladner, “Generalized multiple description coding through unequal loss protection,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’99), vol. 1, pp. 411–415, Kobe, Japan, October 1999.
[6] A. R. Reibman, H. Jafarkhani, Y. Wang, M. T. Orchard, and R. Puri, “Multiple description coding for video using motion compensated prediction,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’99),
vol 3, pp 837–841, Kobe, Japan, October 1999 [7] V A Vaishampayan and S John, “Interframe balanced multiple description video compression,” in Proceedings of 9th Packet Video Workshop (PVW ’99), pp 812–816, New York, NY, USA, April 1999 [8] A Sehgal, A Jagmohan, and N Ahuja, “Wireless video conferencing using multiple description coding,” in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS ’01), vol 5, pp 303–306, Sydney, NSW, Australia, May 2001 [9] N V Boulgouris, K E Zachariadis, A N Leontaris, and M G Strintzis, “Drift-free multiple description coding of video,” in Proceedings of 4th IEEE Workshop on Multimedia Signal Processing (MMSP ’01), pp 105–110, Cannes, France, October 2001 [10] Y.-C Lee, Y Altunbasak, and R M Mersereau, “An enhanced two-stage multiple description video coder with drift reduction,” IEEE Transactions on Circuits and Systems for Video Technology, vol 14, no 1, pp 122–127, 2004 [11] N Franchi, M Fumagalli, R Lancini, and S Tubaro, “Multiple description video coding for scalable and robust transmission over IP,” IEEE Transactions on Circuits and Systems for Video Technology, vol 15, no 3, pp 321–334, 2005 18 [12] X Yang and K Ramchandran, “Optimal multiple description subband coding,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’98), vol 1, pp 654–658, Chicago, Ill, USA, October 1998 [13] Y Wang, A R Reibman, and S Lin, “Multiple description coding for video delivery,” Proceedings of the IEEE, vol 93, no 1, pp 57–70, 2005 [14] A R Reibman, H Jafarkhani, Y Wang, and M T Orchard, “Multiple description video using rate-distortion splitting,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’01), vol 1, pp 978–981, Thessaloniki, Greece, October 2001 [15] N V Boulgouris, K E Zachariadis, A Kanlis, and M G Strintzis, “Multiple description wavelet coding of layered video,” in Proceedings of IEEE International Conference on Image Processing (ICIP ’04), vol 4, pp 
Nikolaos V. Boulgouris received the Diploma and the Ph.D. degrees from the Electrical and Computer Engineering Department of the University of Thessaloniki, Greece, in 1997 and 2002, respectively. Since December 2004, he has been a Lecturer with the Department of Electronic Engineering, Division of Engineering, at King’s College London, United Kingdom. From September 2003 to November 2004, he was a Postdoctoral Fellow with the Department of Electrical and Computer Engineering, University of Toronto, Canada. Previously, he was affiliated with the Informatics and Telematics Institute in Greece. He has participated in several research projects in the areas of image/video communication, pattern recognition, multimedia security, and content-based indexing and retrieval. He is a Member of the IEEE and the British Machine Vision Association.

Konstantinos E. Zachariadis is currently a Ph.D. candidate in Managerial
Economics and Strategy at Northwestern University’s Kellogg School of Management. He received his M.S. in electrical and computer engineering from Northwestern University, for work on source fidelity over fading channels using scalable codes and erasure codes, and his Diploma in electrical and computer engineering from the Aristotle University of Thessaloniki in Greece, for work on multiple description coding of images and video. For his graduate studies, he has received the IEEE Life Member Graduate Study Fellowship and a Fulbright Fellowship. His current research interests are in dynamic contract theory, stochastic control, services operations, and resource allocation in wireless communications.

Angelos Kanlis obtained the Diploma in electrical engineering from the Aristotle University of Thessaloniki, Thessaloniki, Greece, in 1992, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, in 1994 and 1997, respectively. Since 2003, he has been a Patent Examiner at the European Patent Office, Munich, Germany. He was a Visiting Professor in the Department of Computer & Communication Engineering, Volos, Greece (2002–2003), and a Research Associate in the Informatics and Telematics Institute, Thessaloniki, Greece (2000–2002), and the Foundation of Research and Technology, Heraklion, Crete, Greece (1997–1999).

Michael G. Strintzis received the Diploma in electrical engineering from the National Technical University of Athens, Athens, Greece, in 1967 and the M.A. and Ph.D. degrees in electrical engineering from Princeton University, Princeton, NJ, in 1969 and 1970, respectively. He joined the Electrical Engineering Department, University of Pittsburgh, Pittsburgh, PA, where he served as an Assistant Professor from 1970 to 1976 and an Associate Professor from 1976 to 1980. During that time, he worked in the area of stability of multidimensional systems. Since 1980, he has been a Professor of electrical and
computer engineering at the Aristotle University of Thessaloniki, Thessaloniki, Greece. He has worked in the areas of multidimensional imaging and video coding. Over the past ten years, he has authored over 100 journal publications and over 200 conference presentations. In 1998, he founded the Informatics and Telematics Institute, currently part of the Centre for Research and Technology Hellas, Thessaloniki. He was awarded the Centennial Medal of the IEEE in 1984 and the Empirikeion Award for Research Excellence in Engineering in 1999.

[Figure: decoder block diagram. Recoverable labels: arithmetic decoding, wavelet decoding, I-frame, P-frame, Description 2, Description K, output video.]

[Figure: plot against frame number. Legend: multiple description (drift-free), multiple description (drift), single description.]

... generation of descriptions can be achieved by including appropriate blocks of wavelet coefficients in one or both of the descriptions. In the case of two descriptions, this is achieved by using the ...