"Wireless video communications encompass a broad range of issues and opportunities that serve as the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, Advanced Video Communications over Wireless Networks provides an in-depth look at the fundamentals, recent technical achievements, challenges, and emerging trends in mobile and wireless video communications. The editors have carefully selected a panel of researchers with expertise in diverse aspects of wireless video communication to cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, the transmission schemes tailored to mobile and wireless networks, quality metrics, the architectures of practical systems, as well as some novel directions. They address future directions, including Quality-of-Experience in wireless video communications, video communications over future networks, and 3D video communications. The book presents a collection of tutorials, surveys, and original contributions, providing an up-to-date, accessible reference for further development of research and applications in mobile and wireless video communication systems. The range of coverage and depth of expertise make this book the go-to resource for facing current and future challenges in this field."
1. Network-Aware Error-Resilient Video Coding
Luís Ducla Soares and Paulo Nunes
2. Distributed Video Coding: Principles and Challenges
Jürgen Slowack and Rik Van de Walle
3. Computer Vision–Aided Video Coding
Manoranjan Paul and Weisi Lin
8. Cooperative Video Provisioning in Mobile Wireless Environments
Paolo Bellavista, Antonio Corradi, and Carlo Giannelli
14. Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming
Araz Jahaniaval and Dalia Fayek
15. Video QoS Analysis over Wi-Fi Networks
Rashid Mehmood and Raad Alturki
Index
Preface
Video communication has evolved from a simple tool for visual communication to a key enabler for various video applications. A number of exciting video applications have been successfully deployed in recent years, with the goal of providing users with a more flexible, personalized, and content-rich viewing experience. Along with these ubiquitous video applications, we have also experienced a paradigm shift from passive, wired, and centralized video content access to interactive, wireless, and distributed content access. Undoubtedly, wireless video communications have paved the way for advanced applications. However, given the distributed, resource-constrained, and heterogeneous nature of wireless networks, supporting quality video communications over wireless networks is still challenging. Video coding is one of the indispensable components in various wireless video applications, whereas the wireless network condition always imposes more stringent requirements on coding technologies. To cope with the limited transmission bandwidth and to offer adaptivity to the harsh wireless channels, rate control, packet scheduling, as well as error control mechanisms are usually incorporated in the design of codecs to enable efficient and reliable video communications. At the same time, due to energy constraints in wireless systems, video coding algorithms should operate with the lowest possible power consumption. Therefore, video coding over wireless networks is inherently a complex optimization problem with a set of constraints. In addition, the high heterogeneity and user mobility associated with wireless networks are also key issues to be tackled for a seamless delivery of quality-of-experience supported video streams.
To sum up, wireless video communications encompass a broad range of challenges and opportunities that provide the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, we bring forth this book as a compilation of high-quality chapters. This book is intended to be an up-to-date reference book on wireless video communications, providing the fundamentals, recent technical achievements, challenges, and some emerging trends. We hope that the book will be accessible to various audiences, ranging from those in academia and industry to senior undergraduates and postgraduates. To achieve this goal, we have solicited chapters from a number of researchers who are experts in diverse aspects of wireless video communications. We received a good response and, finally, after peer review and revision, 15 chapters were selected. These chapters cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, transmission schemes tailored to mobile and wireless networks, quality metrics, architectures of practical systems, as well as some novel directions. In what follows, we present a summary of each chapter.
In Chapter 1, “Network-Aware Error-Resilient Video Coding,” a network-aware Intra coding refresh method is presented. This method increases the error robustness of H.264/AVC bitstreams, considering the network packet loss rate and the encoding bit rate, by efficiently taking into account the rate-distortion impact of Intra coding decisions while guaranteeing that errors do not propagate.
Chapter 2, “Distributed Video Coding: Principles and Challenges,” is a tutorial on distributed video coding (DVC). In contrast to conventional video compression schemes featuring an encoder that is significantly more complex than the decoder, in DVC the complexity distribution is the reverse. This chapter provides an overview of the basic principles, state of the art, current problems, and trends in DVC.
Chapter 3, “Computer Vision–Aided Video Coding,” studies video coding from the perspective of computer vision. Motivated by the fact that the human visual system (HVS) is the ultimate receiver of the majority of compressed videos and that there is scope to remove unimportant information through HVS, the chapter proposes a computer vision–aided video coding technique by exploiting the spatial and temporal redundancies with visually unimportant information.
In Chapter 4, “Macroblock Classification Method for Computation Control Video Coding and Other Video Applications Involving Motions,” a new macroblock (MB) classification method is proposed, which classifies MBs into different classes according to their temporal and spatial motion and texture information. Furthermore, the implementations of the proposed MB classification method into complexity-scalable video coding as well as other video applications are also discussed in detail in the chapter.
Chapter 5, “Transmission Rate Adaptation in Multimedia WLAN: A Dynamic Games Approach,” considers the scheduling, rate adaptation, and buffer management in a multiuser wireless local area network (WLAN), where each user transmits scalable video payload. Based on opportunistic scheduling, users access the available medium (channel) in a decentralized manner. The rate adaptation problem of the WLAN multimedia networks is then formulated as a general-sum switching control dynamic Markovian game.
In Chapter 6, “Energy and Bandwidth Optimization in Mobile Video Streaming Systems,” the authors consider the problem of multicasting multiple variable bit rate video streams from a wireless base station to many mobile receivers over a common wireless channel. This chapter presents a sequence of increasingly sophisticated streaming protocols for optimizing energy usage and utilization of the wireless bandwidth.
Chapter 7, “Resource Allocation for Scalable Videos over Cognitive Radio Networks,” investigates the challenging problem of video communication over cognitive radio (CR) networks. It first addresses the problem of scalable video over infrastructure-based CR networks and then considers the problem of scalable video over multihop CR networks.
Chapter 8, “Cooperative Video Provisioning in Mobile Wireless Environments,” focuses on the challenging scenario of cooperative video provisioning in mobile wireless environments. On one hand, it provides a general overview about the state-of-the-art literature on collaborative mobile networking. On the other hand, it provides technical details and reports about the RAMP middleware case study, practically showing that node cooperation can properly achieve streaming adaptation.
Chapter 9, “Multilayer Iterative FEC Decoding for Video Transmission over Wireless Networks,” develops a novel multilayer iterative decoding scheme using deterministic bits to lower the decoding threshold of low-density parity-check (LDPC) codes. These deterministic bits serve as known information in the LDPC decoding process to reduce redundancy during data transmission. Unlike the existing work, the proposed scheme addresses controllable deterministic bits, such as MPEG null packets, rather than widely investigated protocol headers.
Chapter 10, “Network-Adaptive Rate and Error Controls for WiFi Video Streaming,” investigates the fundamental issues for network-adaptive mobile video streaming over WiFi networks. Specifically, it highlights the practical aspects of network-adaptive rate and error control schemes to overcome the dynamic variations of underlying WiFi networks.
Chapter 11, “State of the Art and Challenges for 3D Video Delivery over Mobile Broadband Networks,” examines the technologies underlying the delivery of 3D video content to wireless subscribers over mobile broadband networks. The incorporated study covers key issues, such as the effective delivery of 3D video content in a system that has limited resources in comparison to wired networks, network design issues, as well as scalability and backward compatibility concepts.
In Chapter 12, “A New Hierarchical 16-QAM-Based UEP Scheme for 3-D Video with Depth Image–Based Rendering,” an unequal error protection (UEP) scheme based on hierarchical quadrature amplitude modulation (HQAM) for 3-D video transmission is proposed. The proposed scheme exploits the unique characteristics of the color plus depth map stereoscopic video where the color sequence has a significant impact on the reconstructed video quality.
Chapter 13, “2D-to-3D Video Conversion: Techniques and Applications in 3D Video Communications,” provides an overview of the main techniques for 2D-to-3D conversion, which includes different depth cues and state-of-the-art schemes. In the 3D video communications context, 2D-to-3D conversion has been used to improve the coding efficiency and the error resiliency and concealment for the 2D video plus depth format.
Chapter 14, “Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming,” presents the research involved in bridging the gap between the worlds of video compression/encoding and network traffic engineering by (i) using enriched video trace formats in scheduling and traffic control, (ii) using prioritized and error-resilience features in H.264, and (iii) optimizing the combination of the network performance indices with codec-specific distortion parameters for an increased quality of the received video.
In Chapter 15, “Video QoS Analysis over Wi-Fi Networks,” the authors present a detailed end-to-end QoS analysis for video applications over wireless networks, both infrastructure and ad hoc networks. Several networking scenarios are carefully configured with variations in network sizes, applications, codecs, and routing protocols to extensively analyze network performance.
MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA, 01760-2098 USA
Contributors
Omar Abdul-Hameed
Faculty of Engineering and Physical Sciences
I-Lab: Multimedia Communications Research
Department of Electronic Engineering
Centre for Vision, Speech and Signal Processing
University of Surrey
Surrey, United Kingdom
Khalid Mohamed Alajel
Faculty of Engineering and Surveying
University of Southern Queensland
Toowoomba, Queensland, Australia
Raad Alturki
Department of Computer Science
Al Imam Mohammad Ibn Saud Islamic University
Riyadh, Saudi Arabia
Paolo Bellavista
Department of Electronics, Computer Science, and Systems
University of Bologna
Bologna, Italy
Faculty of Engineering and Physical Sciences
I-Lab: Multimedia Communications Research
Department of Electronic Engineering
Centre for Vision, Speech and Signal Processing
Branch of Broadcast Technologies Research
Communications Research Centre Canada
Ottawa, Ontario, Canada
School of Computing Science
Simon Fraser University
Surrey, British Columbia, Canada
Cheng-Hsin Hsu
Department of Computer Science
National Tsing Hua University
Hsin Chu, Taiwan, Republic of China
Electrical Computer Engineering Department
University of British Columbia
Vancouver, British Columbia, Canada
Chinese Academy of Sciences
Haidian, Beijing, People’s Republic of China
JongWon Kim
School of Information and Communications
Gwangju Institute of Science and Technology (GIST)
Gwangju, South Korea
Ahmet Kondoz
Faculty of Engineering and Physical Sciences
I-Lab: Multimedia Communications Research
Department of Electronic Engineering
Centre for Vision, Speech and Signal Processing
University of Surrey
Surrey, United Kingdom
Vikram Krishnamurthy
Electrical Computer Engineering Department
University of British Columbia
Vancouver, British Columbia, Canada
Ghent, Belgium
and
Institute of Information Science
Beijing Jiaotong University
Haidian, Beijing, People’s Republic of China
Weisi Lin
School of Computer Engineering
Nanyang Technological University
Singapore, Singapore
Weiyao Lin
Department of Electronic Engineering
Shanghai Jiao Tong University
Xuhui, Shanghai, People’s Republic of China
Hassan Mansour
Electrical Computer Engineering Department
University of British Columbia
Vancouver, British Columbia, Canada
Communications R&D Center
Samsung Thales Co., Ltd
Seongnam-Si, South Korea
Manoranjan Paul
School of Computing and Mathematics
Charles Sturt University
Bathurst, New South Wales, Australia
Joseph Peters
School of Computing Science
Simon Fraser University
Surrey, British Columbia, Canada
Bo Rong
Branch of Broadcast Technologies Research
Communications Research Centre Canada
Ottawa, Ontario, Canada
Branch of Broadcast Technologies Research
Communications Research Centre Canada
Ottawa, Ontario, Canada
Wei Xiang
Faculty of Engineering and Surveying
University of Southern Queensland
Toowoomba, Queensland, Australia
Chongyang Zhang
Department of Electronic Engineering
Shanghai Jiao Tong University
Xuhui, Shanghai, People’s Republic of China
Network-Aware Error-Resilient Video Coding
Luís Ducla Soares and Paulo Nunes
CONTENTS
1.1 Introduction
1.2 Video Coding Framework
1.2.1 Rate-Distortion Optimization
1.2.2 Random Intra Refresh
1.3 Efficient Intracoding Refresh
1.3.1 Error-Resilient RDO-Driven Intra Refresh
1.3.1.1 RDO Intra and Inter Mode Decision
1.3.1.2 Error-Resilient Intra/Inter Mode Decision
1.3.2 Random Intra Refresh
1.4 Network-Aware Error-Resilient Video Coding Method
1.4.1 Intra/Inter Mode Decision with Constant αRD
1.4.2 Intra/Inter Mode Decision with Network-Aware αRD Selection
1.4.3 Model for the fNMD Mapping Function
1.4.4 Network-Aware Cyclic Intra Refresh
1.4.5 Intra Refresh with Network-Aware αRD and CIR Selection
1.1 Introduction
In order to extend the useful lifetime of a video coding standard, standardization bodies usually specify the minimum set of tools that are essential for guaranteeing interoperability between devices or applications of different manufacturers. With this strategy, the standard may evolve continuously through the development and improvement of its nonnormative parts. Error resilience is an example of a video coding tool that is not completely specified in a normative way in any of the currently available and emerging video coding standards. The reason for this is that it is simply not necessary for interoperability and, therefore, it is one of the main degrees of freedom to improve the performance of standard-based systems, even after the standard has been finalized. Nevertheless, recognizing the paramount importance of this type of tool, standardization initiatives always include a minimum set of error-resilient hooks (e.g., in the form of bitstream syntax elements) in order to facilitate the development of effective error resilience techniques, as needed for the particular application envisaged.
Error resilience techniques are usually seen as playing a role at the decoder side of the communication chain. However, by using preventive error resilience techniques at the encoder side, which involve the intelligent design of the encoder, it is also possible to make the task of the decoder much easier in terms of dealing with errors. In fact, the performance of the decoder can vary greatly depending on the amount of error resilience help provided in the bitstream generated by the encoder. This way, at the encoder, the challenge is to develop techniques that make video bitstreams more resilient to errors, in order to allow the decoder to better recover in case errors occur; these techniques may be called preventive error resilience techniques. At the decoder, the challenge is to develop techniques that make it possible for the decoder to take all the available received data (correct and, possibly, corrupted) and decode it with the best possible video quality, thus minimizing the negative subjective impact of the errors on the video quality offered to the user; these techniques may be called corrective error resilience techniques.
Video communication systems, in order to be globally more resilient to channel errors, typically include both preventive and corrective error-resilient techniques. An important class of preventive techniques is error-resilient source coding, which consists of providing redundancy at the source coding level in order to prevent error propagation and consequently reduce the distortion caused by data corruption/loss. Error-resilient source coding techniques include data partitioning, resynchronization and reversible variable length codes [1,2], redundant coding schemes, such as sending the same information predicted from different references [3], scalable video coding [4,5,6], or multiple description coding [7,8]. Besides source coding redundancy, channel coding redundancy can also be used, where a good example is the case of forward error correction [9]. In terms of corrective error-resilient techniques, error concealment techniques correspond to one of the most important classes, but other important techniques also exist, such as error detection and error localization techniques [10]. Error concealment techniques consist essentially of postprocessing methods aiming at recovering missing or corrupted data from neighboring data (either spatially or temporally) [11], but for these techniques to be truly effective, an error detection technique should first be used to detect if an error has indeed occurred, followed by an error localization technique to determine where the error occurred and which parts of the video content were affected [10]. For a good review of the many different preventive and corrective error-resilient video coding techniques that have been proposed in the literature, the reader can refer to Refs. [12,13].
This chapter addresses the problem of error-resilient encoding, in particular of how to efficiently improve the resilience of compressed video bitstreams, while adaptively considering the network characteristics in terms of information loss. Video coding systems that rely on predictive (inter) coding to remove temporal redundancy, such as those based on the H.264/AVC standard [14], are strongly affected by transmission errors/information loss due to the error propagation caused by the prediction mechanisms. Therefore, typical approaches to make bitstreams generated by the encoder more error-resilient rely on the adaptation of the video coding mode decisions, at various levels (e.g., picture, slice, or macroblock level), to the underlying network characteristics, trying to establish an adequate trade-off between predictive and nonpredictive encoding modes. This is done because nonpredictive modes are less efficient in terms of compression but can provide higher error resilience. In this context, controlling the amount of nonpredictive versus predictive encoded data is an efficient and highly scalable error resilience tool.
The intracoding refresh schemes available in the literature [2,15,16,17,18,19,20,21,22] are a typical example of efficient error resilience techniques to improve the video quality over error-prone environments without requiring changes to the bitstream syntax, thus allowing the performance of standard video codecs to be improved continuously without compromising interoperability. However, a permanently open issue related to these techniques is how to achieve the best trade-off between error resilience and coding efficiency.
Since these schemes work by selectively coding in intra mode different parts of the video content at different time instants, they are able to avoid long-term propagation of transmission or storage errors that could make the decoded quality decay very rapidly. This way, these intracoding refresh schemes are able to significantly improve the error resilience of the coded bitstreams and improve the overall subjective quality of the decoded video. While some schemes do not require any specific knowledge of what is being done at the decoder in terms of error concealment [16,17,18], other approaches try to estimate the distortion experienced at the decoder given a certain probability of data corruption/loss and the concealment techniques adopted [2,22].
The problem with most video coding mode decision approaches, including typical intracoding refresh schemes, is that they can significantly decrease the coding efficiency if they make their decisions without taking into account the rate-distortion (RD) cost of such decisions. This problem can be dealt with by combining the error-resilient coding mode decisions with the video encoder rate control module [23], where the usual coding mode decisions are taken [24,25]. This way, coding-efficient error robustness can be achieved. In the specific case of intracoding refresh schemes, a clever solution for this combination is to compare the RD cost of coding macroblocks (MBs) in intra and inter modes; if the cost of intracoding is only slightly larger than the cost of intercoding, then the coding mode could be changed to intra, providing error robustness almost for free. This strategy is able to reduce error propagation and, thus, to increase error robustness when transmission errors occur, at a very limited RD cost increase and without the huge complexity of estimating the expected distortion experienced at the decoder.
Nevertheless, in order for these error-resilient video coding mode decision schemes to be really useful in an adaptive way, the current error characteristics of the underlying network being used for transmission should be taken into account. For example, in the case of intracoding refresh schemes, this will allow the bit rate resources allocated to intracoding refresh to be adequately adapted to the error characteristics of the network [26]. After all, networks with small amounts of channel errors only need small amounts of intracoding refresh and vice versa. Thus, efficient bit rate allocation in an error-resilient way has to depend on the feedback received from the network about its current error characteristics, which define the error robustness needed.
Therefore, network awareness makes it possible to dynamically vary the amount of error resilience resources to better suit the current state of the network and, therefore, further improve the decoded video quality without reducing the error robustness [26,27]. This problem is nowadays more relevant than ever, since more and more audiovisual content is accessed over error-prone networks, such as mobile networks, and these networks can have extremely varying error characteristics (over time).
As an illustrative example, this chapter presents a fully automatic
network-aware MB intracoding refresh technique for error-resilient H.264/AVC
video coding, which also dynamically adjusts the amount of cyclically intra
refreshed MBs according to the network conditions, guaranteeing that endless
error propagation is avoided.
The rest of the chapter is organized as follows. Section 1.2 describes the general
video coding framework that was used for implementing the considered
error-resilient network-aware MB intracoding refresh scheme. Section 1.3 introduces
the concept of efficient intracoding refresh, which will later be needed in Section
1.4, where the considered network-aware intracoding refresh scheme itself is
described. Section 1.5 presents some relevant performance results for the
considered scheme in typical mobile network conditions and, finally, Section
1.6 concludes the chapter.
1.2 Video Coding Framework
The network-aware error-resilient scheme described in this chapter relies on the
rate control scheme proposed by Li et al. [24,28], as well as on the RD
optimization (RDO) framework and the random intra refresh technique included
in the H.264/AVC reference software [25]. Since the main contributions and
novelty of the network-aware error-resilient scheme described in this chapter concern
the latter two techniques, it is useful to first briefly review the RDO and the
random intra refresh techniques included in the H.264/AVC reference software
in order for the reader to better understand the described solutions.
1.2.1 Rate-Distortion Optimization
The H.264/AVC video coding standard owes its major performance gains,
relative to previous standards, essentially to the many different intra and inter
MB coding modes supported by the video coding syntax. Although not all modes
are allowed in every H.264/AVC profile [14], even for the simplest profiles, such
as the Baseline Profile, the encoder has a plethora of possibilities to encode
each MB, which makes it difficult to accomplish optimal MB coding mode
decisions with low (encoding) complexity. Besides the MB coding mode decision,
for motion-compensated inter coded MBs, finding the optimal motion vectors
and MB partitions is also not a straightforward task. In this context, RDO
becomes a powerful tool, allowing the encoder to select the best MB
coding modes and motion vectors (if applicable) [28,29].
In the H.264/AVC reference software [25], the best MB mode decision is
accomplished through the RDO technique, where the best MB mode is selected
by minimizing the following Lagrangian cost function:
$$J_{MODE} = D(MODE, QP) + \lambda_{MODE} \, R(MODE, QP) \tag{1.1}$$
where
MODE is one of the allowable MB coding modes (e.g., SKIP, INTER 16 × 16,
INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTRA 4 × 4, INTRA 16 × 16)
QP is the quantization parameter
D(MODE, QP) and R(MODE, QP) are, respectively, the distortion (between the
original and the reconstructed MB) and the number of bits that will be obtained
by applying the corresponding MODE and QP
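As a minimal sketch of the Lagrangian mode decision described above, the following Python snippet picks the mode with the lowest cost D + λ·R. The per-mode distortion and rate figures and the function names (`rd_cost`, `best_mode`) are hypothetical, chosen only for illustration; they are not taken from the reference software.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian RD cost: J = D + lambda * R."""
    return distortion + lam * rate_bits


def best_mode(candidates, lam):
    """Return (mode, cost) for the candidate with the lowest RD cost.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


# Hypothetical measurements for one macroblock: (distortion, bits).
candidates = {
    "SKIP": (900.0, 1),
    "INTER 16x16": (400.0, 40),
    "INTRA 4x4": (300.0, 120),
}

mode, cost = best_mode(candidates, lam=5.0)
```

With these illustrative numbers, a small multiplier rewards the low-distortion intra mode, while a large one pushes the decision toward the cheapest mode in bits, which is the trade-off the multiplier is meant to control.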
In Ref. [28], it is recommended that, for intra (I) and inter predicted (P) slices,
λMODE be computed as follows:
$$\lambda_{MODE} = 0.85 \times 2^{(QP - 12)/3} \tag{1.2}$$
Motion estimation can also be accomplished through the same framework. In
this case, the best motion vector and reference frame can be selected by
minimizing the following Lagrangian cost function:
$$J_{MOTION} = D(mv(REF)) + \lambda_{MOTION} \, R(mv(REF)) \tag{1.3}$$
where
mv(REF) is the motion vector for the reference frame REF
D(mv(REF)) is the residual error measure, such as the sum of absolute
differences (SAD) between the original and the reference
R(mv(REF)) is the number of bits necessary to encode the corresponding motion
vector (i.e., the motion vector difference between the selected motion vector
and its prediction) and to signal the selected reference frame
In a similar way, Ref. [28] also recommends that, for P-slices, λMOTION be computed
as
$$\lambda_{MOTION} = \sqrt{\lambda_{MODE}} \tag{1.4}$$
when the SAD measure is used.
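The dependence of both multipliers on QP, and their use in motion search, can be sketched as below. The sketch assumes the relations commonly cited for the H.264/AVC reference software, namely λMODE = 0.85 · 2^((QP − 12)/3) for I and P slices and λMOTION = √λMODE when SAD is the distortion measure. The block values, bit counts, and function names are hypothetical.

```python
import math


def lambda_mode(qp):
    """Assumed relation for I and P slices: 0.85 * 2 ** ((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def lambda_motion(qp):
    """Assumed relation when SAD is the distortion measure."""
    return math.sqrt(lambda_mode(qp))


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_motion_vector(original, candidates, qp):
    """Pick the candidate minimizing SAD + lambda_motion * bits.

    candidates is a list of (mv, reference_block, bits) tuples, where bits
    stands for the (hypothetical) cost of coding the motion vector
    difference and the reference frame index.
    """
    lam = lambda_motion(qp)
    return min(candidates, key=lambda c: sad(original, c[1]) + lam * c[2])[0]


# Hypothetical 2 x 2 block and two candidate predictions.
original = [[10, 12], [14, 16]]
candidates = [
    ((0, 0), [[10, 12], [14, 18]], 2),   # small residual, cheap vector
    ((1, 0), [[10, 12], [14, 16]], 6),   # perfect match, costlier vector
]

mv_high_qp = best_motion_vector(original, candidates, qp=12)
mv_low_qp = best_motion_vector(original, candidates, qp=0)
```

In this toy case the cheaper vector wins at the higher QP and the exact match wins at the lower QP, since a smaller multiplier puts less weight on the motion vector bits.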
Since the quantization parameter is required for computing the Lagrangian
multipliers λMODE and λMOTION, as well as for computing the number of bits to encode
the residue for a given MB, a rate control mechanism must be used that can
efficiently compute for each MB (or set of MBs, such as a slice) an adequate
quantization parameter in order to maximize the decoded video quality for a
given bit rate budget. In this case, the method proposed by Li et al. [24,28] has
been used since it is the one implemented in the H.264/AVC reference software [25].
1.2.2 Random Intra Refresh
As mentioned earlier, the H.264/AVC reference software [25] includes a (nonnormative) technique for intra refreshing MBs. Although this technique is called random intra refresh (RIR), it is not really a purely random refresh technique. This technique is basically a cyclic intra refresh (CIR) technique for which the refresh order is not simply the raster scan order. The refresh order is randomly defined once before encoding, but afterward intra refresh proceeds cyclically, following the determined order, with n MBs for each time instant. An example of a randomly determined intra refresh order, for QCIF spatial resolution, may be seen in Figure 1.1.
Figure 1.1 Example of random intra refresh order for QCIF spatial resolution. (From Nunes,
P. et al., Error resilient macroblock rate control for H.264/AVC video
coding, Proceedings of the IEEE International Conference on Image Processing,
San Diego, CA, p. 2133, October 2008. With permission. © 2008 IEEE.)
Since the RIR technique used in the H.264/AVC reference software and also considered here is basically a CIR technique, in the remainder of this chapter, the acronyms RIR and CIR will be used interchangeably.
One of the main advantages of this technique is that, being cyclic, it guarantees that all MBs will be refreshed at least once in each cycle, thus guaranteeing that there are no MBs where errors can propagate indefinitely. However, this technique also has disadvantages, one of which is the fact that all MBs are refreshed exactly the same number of times. This basically means that it is not possible to refresh more often the MBs that are more likely to be lost or are harder to conceal at the decoder if an error does occur.
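The cyclic behavior described above can be illustrated with a short sketch: the order is shuffled once, and each frame then refreshes the next n entries of that fixed order, wrapping around at the end. The function names, the seed, and the choice of n = 11 are hypothetical; the reference software may organize this differently.

```python
import random


def make_refresh_order(num_mbs, seed=0):
    """Shuffle the MB indices once, before encoding starts."""
    order = list(range(num_mbs))
    random.Random(seed).shuffle(order)
    return order


def mbs_to_refresh(order, frame_index, n):
    """Return the n MB indices that are intra refreshed in a given frame.

    The position in the fixed order advances by n per frame and wraps
    around, so the refresh proceeds cyclically.
    """
    start = (frame_index * n) % len(order)
    return [order[(start + k) % len(order)] for k in range(n)]


# QCIF (176 x 144 luma samples) holds 11 x 9 = 99 macroblocks of 16 x 16.
order = make_refresh_order(99)
n = 11  # hypothetical number of MBs refreshed per frame
frames_per_cycle = 99 // n
first_cycle = [mb
               for f in range(frames_per_cycle)
               for mb in mbs_to_refresh(order, f, n)]
```

Every MB index appears exactly once in `first_cycle`, which reflects both properties noted in the text: no MB is left unrefreshed within a cycle, and no MB can be refreshed more often than the others.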
Another important aspect of this technique is that MBs are refreshed according to the predetermined order, without taking into account the actual RD cost of intra refreshing a given MB, as opposed to letting the rate control module decide which encoding mode is best in terms of RD cost. This is exactly where there is room for improvement: intra refresh should be performed by taking into account the RD cost of a given MB.
1.3 Efficient Intracoding Refresh
When deciding the best MB coding mode, notably between inter- and intracoding modes, the RDO framework, as briefly described in Section 1.2.1, simply selects the mode that has lower RD cost, given by Equation 1.1. This RDO framework, as implemented in the H.264/AVC reference software, does not take into account other dimensions, besides rate and distortion optimization, such as the robustness of the bitstream in error-prone environments. Therefore, some MBs are simply inter coded because their best inter mode RD cost is slightly lower than the best intra mode RD cost. For these cases, selecting the intra mode, although not optimal in a strict RD sense, can prove to be a much better decision when the bitstream becomes corrupted by errors (e.g., due to packet losses in packet networks), and the intra coded MBs can be used to stop error propagation due to the (temporal) predictive coding modes. Moreover, if additional error robustness is introduced through an intra refresh technique, for example, as the one described in Section 1.2.2, some MBs can be highly penalized in an RD sense, since they can be blindly forced to be encoded in an intra mode, without taking into account the RD cost of that decision.
1.3.1 Error-Resilient RDO-Driven Intra Refresh
The main idea of a network-aware error-resilient scheme is to perform RDO in a resilient manner, using the relative RD cost of the best intra mode and the best inter mode for each MB. Therefore, whenever coding a given MB in intra mode does not cost significantly more than the best intercoding mode, the given MB is gracefully forced to be encoded in its best intra mode.
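The rule in this paragraph can be sketched as follows, assuming that "does not cost significantly more" is expressed as a multiplicative tolerance on the best inter cost. The parameter name `alpha_rd`, the comparison direction, and the cost values are assumptions made for illustration only.

```python
def choose_mode(j_intra, j_inter, alpha_rd=1.0):
    """Return 'intra' when the best intra RD cost is within the tolerated
    factor alpha_rd of the best inter RD cost, otherwise 'inter'."""
    return "intra" if j_intra <= alpha_rd * j_inter else "inter"


# Hypothetical (J_intra, J_inter) pairs for four macroblocks.
mb_costs = [
    (1050.0, 1000.0),  # intra only slightly more expensive
    (1800.0, 1000.0),  # intra much more expensive
    (900.0, 1000.0),   # intra already cheaper
    (1190.0, 1000.0),  # intra moderately more expensive
]

# Tolerance of 1.0 reproduces a plain RD comparison.
plain_rdo = [choose_mode(ji, jp, alpha_rd=1.0) for ji, jp in mb_costs]
# A tolerance above 1.0 forces MBs with similar costs to intra.
resilient = [choose_mode(ji, jp, alpha_rd=1.2) for ji, jp in mb_costs]
```

Only the MBs whose intra cost is close to their inter cost change mode when the tolerance is raised, so the extra robustness is bought where it costs the least in an RD sense.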
This error-resilient RDO provides an efficient intra refresh scheme, thusguaranteeing that the generated bitstream will be more robust to channelerrors, without having to spend a lot of bits on intra coded MBs, which typicallyreduces the decoded video quality when there are no errors in the channel Thisscheme can be described through the MB-level mode decision architecturedepicted in Figure 1.2
Architecture of the error-resilient MB intra/inter mode decision scheme FromNunes, P et al., Error resilient macroblock rate control for H.264/AVC video
Trang 18coding, Proceedings of the IEEE International Conference on Image Processing,
San Diego, CA, p 2134, October 2008 With permission © 2008 IEEE.)
1.3.1.1 RDO Intra and Inter Mode Decision
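Before deciding the best mode to encode a given MB, the best inter mode RD cost, J_INTER, is computed from the set of all possible inter modes, and the best intra mode RD cost, J_INTRA, is computed from the set of all possible intra modes through RDO, i.e., Equations 1.1 and 1.3, where
$$J_{\text{INTER}} = \min_{\text{mode} \in S_{\text{INTER}}} J_{\text{mode}} \qquad (1.5)$$
and
$$J_{\text{INTRA}} = \min_{\text{mode} \in S_{\text{INTRA}}} J_{\text{mode}} \qquad (1.6)$$
where
S_INTER is the set of allowed inter modes (i.e., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, and INTER 4 × 4)
S_INTRA is the set of allowed intra modes (i.e., INTRA 4 × 4, INTRA 16 × 16, PCM, or INTRA 8 × 8)
The best intra and inter modes are the ones with the lowest intra and inter RD costs, respectively.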
1.3.1.2 Error-Resilient Intra/Inter Mode Decision
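To control the number of MBs that will be gracefully forced to be encoded in intra mode, a control parameter, αRD (which basically specifies the tolerable RD cost increase for replacing an inter by an intra MB), is used in such a way that the MB is encoded in its best intra mode whenever $J_{\text{INTRA}} \le \alpha_{RD} \cdot J_{\text{INTER}}$, and in its best inter mode otherwise.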
Notice that, for αRD = 1, no particular mode is favored in an RD sense, while for αRD > 1, the intra modes are favored relative to the inter modes (see Figure 1.3). Therefore, the number of gracefully forced intra encoded MBs can be controlled by the αRD parameter. The MBs that end up being forced to intra mode are those for which the RD costs of the intra and inter modes are similar, which typically correspond to MBs that have a high inter RD cost and, therefore, would be difficult to conceal at the decoder if lost.
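The mode decision described in this section can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the RD cost values are hypothetical, and the comparison rule (intra is chosen when the intra/inter RD cost ratio does not exceed αRD) is an assumption inferred from the description of αRD and of Figure 1.3, not code taken from the H.264/AVC reference software.

```python
from typing import Dict, Tuple


def best_mode(costs: Dict[str, float]) -> Tuple[str, float]:
    """Return the (mode, RD cost) pair with the lowest RD cost."""
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


def choose_mb_mode(inter_costs: Dict[str, float],
                   intra_costs: Dict[str, float],
                   alpha_rd: float = 1.0) -> str:
    """Error-resilient intra/inter decision for one MB.

    Assumed rule: the best intra mode is selected when
    J_INTRA <= alpha_rd * J_INTER, otherwise the best inter mode is kept.
    """
    inter_mode, j_inter = best_mode(inter_costs)
    intra_mode, j_intra = best_mode(intra_costs)
    if j_intra <= alpha_rd * j_inter:
        return intra_mode
    return inter_mode


# Hypothetical RD costs for a single MB (illustration values only).
inter = {"INTER 16x16": 100.0, "INTER 8x8": 104.0}
intra = {"INTRA 4x4": 112.0, "INTRA 16x16": 118.0}

plain_rdo = choose_mb_mode(inter, intra, alpha_rd=1.0)   # plain RDO decision
resilient = choose_mb_mode(inter, intra, alpha_rd=1.2)   # tolerates 20% extra cost
```

With these illustration values, the intra cost is 12% above the inter cost, so the MB stays inter coded under plain RDO and is gracefully forced to intra once a 20% cost increase is tolerated.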
1.3.2 Random Intra Refresh
Notice that the previous scheme does not guarantee that all MBs are periodically refreshed, which, if not properly handled, could lead to an endless propagation of errors along time for some MBs in the video sequence. To handle this issue, an RIR can also be concurrently applied, but with a lower number of refreshed MBs per frame than when solely applying the RIR technique, in order not to dramatically compromise the RD efficiency.
FIGURE 1.3
MBs with an intra/inter RD cost ratio below the line will be gracefully forced to intra mode. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. © 2008 IEEE.)
1.4 Network-Aware Error-Resilient Video Coding Method
The main limitation of the MB coding mode decision method described in Section 1.3 is that the control parameter, αRD, is not dynamically adapted to the actual network error conditions. However, when feedback about the network error conditions is available, it is possible to use this information to adjust the αRD control parameter in order to maximize the decoded video quality while dynamically providing adequate error resilience.
1.4.1 Intra/Inter Mode Decision with Constant αRD
When a constant αRD value is used without considering the current network error conditions in terms of packet loss rate (PLR), the benefits of the technique described in Section 1.3 (and proposed in Ref. [23]) are not fully exploited. This is clear from Figure 1.4, where the Foreman sequence has been encoded with the Baseline Profile of H.264/AVC with different αRD values, including αRD = 1.
In Figure 1.4, as well as in the remainder of Section 1.4, CIR is not used in order to avoid biasing the behavior associated with the αRD parameter. Notice, however, that the use of CIR is typically recommended, as mentioned in Section 1.2.2. As can be seen, in these conditions, the optimal αRD (i.e., the one that leads to the highest PSNR) is highly dependent on the network PLR.
FIGURE 1.4
PSNR versus PLR for a constant αRD parameter for the Foreman sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
As expected, when there are no errors (PLR = 0%), the highest decoded quality is achieved when no intra MBs are forced (i.e., αRD = 1.0). However, for this αRD value, the decoded video quality decays very rapidly as the PLR increases. On the other hand, if only a small number of intra MBs are forced (i.e., αRD = 1.8), the decoded video quality is slightly improved for the higher PLR values, when compared to the case with no forced intra MBs, but is slightly penalized for error-free transmission. This effect is even more evident as the αRD value increases, which corresponds to the situation where more and more intra MBs are gracefully forced. For example, for αRD = 3.8 and a PLR of 10%, the decoded video quality is highly improved relative to the situation with no forced intra MBs (i.e., by 6.36 dB), because the error propagation is significantly reduced. However, for lower PLRs, the decoded video quality is penalized due to the excessive use of intracoding (i.e., by 7.19 dB for PLR = 0% and 1.50 dB for PLR = 1%), still for αRD = 3.8.
Therefore, from what has been presented earlier, it is possible to conclude that the optimal number of intra coded MBs is highly dependent on the error characteristics of the underlying network and, thus, the error resilience control parameter αRD should be dynamically adjusted to the channel error conditions to maximize the decoded quality.
FIGURE 1.5
PSNR versus αRD (alpha in the x-axis label) parameter for various PLRs for the Mother and Daughter sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
In order to illustrate the influence of the αRD parameter on the decoded PSNR, Figure 1.5 shows the decoded video quality, in terms of PSNR, versus the αRD parameter for several PLRs for the Mother and Daughter sequence (QCIF, 10 Hz) encoded at 64 kbit/s. Clearly, for each PLR condition, there is an αRD value that maximizes the decoded video quality. For example, for a PLR of 10%, the maximum PSNR value is achieved for αRD = 2.2. To further illustrate the importance of a proper selection of the αRD parameter and how it can significantly improve the overall decoded video quality under severe error conditions, it should be noted that, for a PLR of 10%, the PSNR difference between having αRD = 2.2 and αRD = 1.1 is 5.47 dB.
1.4.2 Intra/Inter Mode Decision with Network-Aware αRD Selection
A possible approach to address the problem of adapting the αRD parameter to the channel error conditions is to use the information in the receiver reports (RR) of the real-time transport protocol (RTP) control protocol (RTCP) [30] to provide the encoder with the actual error characteristics of the underlying network. This makes it possible to adaptively and efficiently select the number of intra coded MBs to be inserted in each frame by taking into account this feedback information about the rate of lost packets, as shown in Figure 1.6.
FIGURE 1.6
Network-aware video encoding architecture. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
In the method presented here, the intra/inter mode decision is still based on the αRD parameter, but this time αRD may depend on several aspects, such as the content type, the content spatial and temporal resolutions, the coding bit rate, and the PLR of the network.
This way, by considering a mapping function f_NMD, it will be possible to dynamically determine the αRD parameter from the following expression:
$$\alpha_{RD} = f_{NMD}(S, PLR) \qquad (1.9)$$
where
PLR is the packet loss rate
S can be an n-dimensional vector characterizing the encoding scenario, for example, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate
In this work, however, as will be shown later in Section 1.4.3, the encoding scenario can be characterized solely by the encoded bit rate with a good approximation. The f_NMD function basically maps the encoding scenario and the network PLR into a “good” αRD parameter that dynamically maximizes the average decoded video quality. Notice that, although it is not easy to obtain a general function, it can be defined for several classes of content and a discrete limited set of encoding parameters and PLRs. In this chapter, it will be shown that, by carefully designing the f_NMD function, significant gains can be obtained in terms of video quality relative to the reference method described in Section 1.4.4.
Therefore, the network-aware MB mode decision (NMD) method can be briefly described through the following steps in terms of encoder operation:
1. Obtain the packet loss rate through network feedback.
2. Compute the αRD parameter through the mapping function given by Equation 1.9 (and detailed in the following).
3. Perform intra/inter mode decision using the αRD parameter, computed in Step 2, for the next MB to be encoded, and encode the MB.
4. Check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 3.
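The four steps above can be sketched as a simple encoder loop. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, the per-MB RD costs are supplied directly instead of being computed by an encoder, the feedback channel is modeled as an iterator that yields either a new PLR value or None after each MB, and the intra/inter rule (intra when J_INTRA ≤ αRD · J_INTER) is the assumed reading of the αRD parameter.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def nmd_encode(mb_costs: Iterable[Tuple[float, float]],
               feedback: Iterator[Optional[float]],
               alpha_from_plr: Callable[[float], float],
               initial_plr: float = 0.0) -> List[str]:
    """Return the coding mode chosen for each MB.

    mb_costs       -- (J_INTRA, J_INTER) pairs, one per MB (hypothetical input)
    feedback       -- yields a new PLR when a report has arrived, else None
    alpha_from_plr -- stand-in for the mapping function of Equation 1.9
    """
    plr = initial_plr                      # Step 1: PLR from network feedback
    alpha_rd = alpha_from_plr(plr)         # Step 2: map PLR to alpha_RD
    modes: List[str] = []
    for j_intra, j_inter in mb_costs:
        # Step 3: intra/inter decision for the next MB (assumed rule).
        modes.append("intra" if j_intra <= alpha_rd * j_inter else "inter")
        # Step 4: check whether a new feedback report has arrived.
        report = next(feedback, None)
        if report is not None:
            plr = report                   # back to Step 1
            alpha_rd = alpha_from_plr(plr) # and Step 2
    return modes


# Toy run: a report with PLR = 10% arrives after the second MB.
toy_modes = nmd_encode(
    mb_costs=[(110.0, 100.0)] * 3,
    feedback=iter([None, 10.0, None]),
    alpha_from_plr=lambda plr: 1.0 if plr == 0.0 else 1.5,
)
```

In the toy run, the first two MBs are inter coded and the third one becomes intra coded once the feedback report raises αRD.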
Notice that it is out of the scope of this chapter to define when the network reports are issued, since this will depend on how the network protocols are configured and the varying characteristics of the network itself [30]. Nevertheless, in real application scenarios, it is important to design appropriate interfacing mechanisms between the codec and the underlying network, so that both encoder and decoder can adaptively adjust their operations according to the network conditions [12].
Through Equation 1.9, the encoder is able to adjust the amount of intra refresh according to the network error conditions and the available bit rate. This intra refresh method typically increases the intra refresh for the more complex MBs, which are those typically more difficult to conceal. The main problem of this approach is that it does not guarantee that all MBs in the scene are refreshed. This is clearly illustrated in Figure 1.7 for the Foreman sequence, where the right image represents the relative amount of MB intra refresh along the sequence (lighter blocks mean more intra refresh). As can be seen, with this intra refresh scheme some MBs are never refreshed, which can lead to errors propagating indefinitely along time in these MB positions (dark blocks in Figure 1.7).
1.4.3 Model for the f_NMD Mapping Function
In order to devise a model for the mapping function f_NMD defined in Equation 1.9, it is first important to see how the optimal αRD parameter varies with PLR. This is plotted in Figure 1.8 for three different sequences (i.e., Mother and Daughter, Foreman, and Mobile and Calendar) encoded at different bit rates and resolutions, for illustrative purposes. Each curve in Figure 1.8 corresponds to a different encoding scenario S, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate (see Equation 1.9). As shall be detailed later in Section 1.5, these three sequences have also been encoded at many other bit rates, and the kind of curves obtained was always similar.
FIGURE 1.7
Relative amount of intra refresh (b) for the MBs of the Foreman sequence (a) (QCIF, 15 Hz, 128 kbit/s, and αRD = 1.1). (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3074, November 2009. With permission. © 2009 IEEE.)
FIGURE 1.8
Example of optimal αRD versus PLR for various sequences and bit rates. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
As can be seen from the plots in Figure 1.8, the behavior of the optimal αRD parameter versus the PLR is similar to that of a charging capacitor [31] (but starting at αRD = 1.0). Therefore, for a given sequence and for a given bit rate (i.e., a given encoding scenario S), it should be possible to model the behavior of the αRD parameter with respect to the PLR with the following expression:
$$\alpha_{RD} = 1 + K_1 \left(1 - e^{-K_2 \cdot PLR}\right) \qquad (1.10)$$
where PLR represents the packet loss rate, while K1 and K2 represent constants that are specific to the considered encoding scenario, notably the sequence characteristics and bit rate. However, the main problem in using Equation 1.10 to compute αRD is that, for a given sequence, a different set of K1 and K2 values would be needed for each of the considered bit rates, which would be extremely impractical. In order to address this issue, it is important to understand how the optimal αRD parameter varies when both the PLR and the bit rate vary. This variation is illustrated in Figure 1.9 for the Mobile and Calendar sequence.
FIGURE 1.9
Optimal αRD versus PLR and bit rate for the Mobile and Calendar sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
After close inspection of Figure 1.9, it can be seen that the K1 value, which basically dictates the value of αRD toward which the curve asymptotically converges, depends linearly on the used bit rate and, therefore, it can be modeled by the following expression:
$$K_1 = a \cdot r_b + b \qquad (1.11)$$
where r_b is the bit rate, while a and b are the parameters that need to be estimated for a given sequence.
As for the K2 value, which dictates the growth rate of the considered exponential, it appears, after exhaustive testing, not to depend on the used bit rate. Therefore, as a first approach, it can be considered to be constant, as in
$$K_2 = c \qquad (1.12)$$
This behavior was observed for the three different video sequences mentioned earlier and, therefore, makes it possible to establish a final expression which allows the video encoder to automatically select, for a given sequence, an adequate αRD parameter when the PLR and the bit rate r_b are known:
$$\alpha_{RD} = f_{NMD}(r_b, PLR) = 1 + (a \cdot r_b + b)\left(1 - e^{-c \cdot PLR}\right) \qquad (1.13)$$
where a, b, and c are the model parameters that need to be estimated (see Ref. [26]). After extensive experimentation, it was found that the parameters a, b, and c can be considered more or less independent of the sequence, which means that a single set of parameters could be used for three different video sequences with a low fitting error. This basically means that the encoding scenario S, defined in Section 1.4.2, can be well represented only by the bit rate r_b.
As explained in Ref. [26], the parameters a, b, and c could be obtained by considering four packet loss rates and two different bit rates for three different sequences, corresponding to a total of 24 (r_b, PLR) pairs, with the iterative Levenberg–Marquardt method [32,33]. By following this approach, the estimated parameters are a = 0.83 × 10^−6, b = 0.97, and c = 0.90.
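To make the model concrete, the following sketch evaluates αRD for a given bit rate and PLR using the estimated parameters quoted above. Several points are assumptions rather than statements from the text: the charging-capacitor form αRD = 1 + K1 · (1 − e^(−K2 · PLR)) with K1 = a · r_b + b and K2 = c, the bit rate expressed in bit/s, and the PLR expressed in percent. The function name is hypothetical.

```python
import math

# Estimated model parameters reported in the text (from Ref. [26]).
A = 0.83e-6
B = 0.97
C = 0.90


def alpha_rd(bit_rate_bps: float, plr_percent: float,
             a: float = A, b: float = B, c: float = C) -> float:
    """Evaluate the assumed alpha_RD model.

    Assumptions (not confirmed by the text): bit rate in bit/s, PLR in
    percent, and the capacitor-like form 1 + K1 * (1 - exp(-K2 * PLR)).
    """
    k1 = a * bit_rate_bps + b       # asymptotic gain, linear in the bit rate
    k2 = c                          # growth rate, taken as constant
    return 1.0 + k1 * (1.0 - math.exp(-k2 * plr_percent))


# Example evaluation for a 64 kbit/s stream at the three PLRs used later
# in the performance evaluation.
alphas_64k = {plr: alpha_rd(64_000, plr) for plr in (1.0, 5.0, 10.0)}
```

Under these assumptions the model returns exactly 1.0 for an error-free channel and saturates at 1 + K1 for high PLRs, which mirrors the capacitor-like behavior described for Figure 1.8.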
1.4.4 Network-Aware Cyclic Intra Refresh
The approach presented in Section 1.4.2 can also be followed to simply adjust the number of cyclic intra refreshed MBs per frame, based on the feedback received about the network PLR, without any RD cost considerations. This is shown in Figure 1.10, where it is clear that for each PLR condition there is a number of cyclic intra refresh MBs that maximizes the decoded video quality. However, when comparing the best PSNR results of Figures 1.5 and 1.10 (both obtained for the Mother and Daughter sequence encoded with the same spatial and temporal resolutions and the same bit rate), for a given PLR, the PSNR values obtained by varying αRD are always higher. For example, for a PLR of 5%, a maximum average PSNR of 37.03 dB is achieved for αRD = 1.9 (see Figure 1.5), while a maximum PSNR of only 34.94 dB is achieved for 33 cyclically intra refreshed MBs in each frame (see Figure 1.10), a difference of approximately 2 dB. This shows that by adequately choosing the αRD parameter it should be possible to achieve a higher quality than when using the optimal number of CIR MBs. This is mainly due to the fact that, when simply cyclically intra refreshing some MBs in a given frame, the additional RD cost of that decision can be extremely high, penalizing the overall video quality, since the “cheap” intra MBs are not looked for as in the efficient intracoding refresh solution based on the αRD parameter.
FIGURE 1.10
PSNR versus number of CIR MBs for various PLRs for the Mother and Daughter sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)
1.4.5 Intra Refresh with Network-Aware αRD and CIR Selection
The main drawback of the scheme described in Section 1.4.3, namely that it cannot guarantee that all MBs are periodically refreshed, can be alleviated by introducing some additional CIR MBs per frame to guarantee that all MB positions are refreshed with a minimum periodicity. This requirement raises the question of how to adaptively select a number of CIR MBs that is sufficiently high to avoid long-term error propagation without penalizing the encoder RD performance too much.
A possible approach to tackle this problem is to decide the adequate αRD value and the number of CIR MBs per frame separately, using a different model for each of these two error resilience parameters. For the αRD selection, the model in Equation 1.9 is used. As for the selection of the number of CIR MBs, it was verified after exhaustive testing [27] that the optimal number of CIR MBs tends to increase linearly with the bit rate r_b, for a given PLR, but tends to increase exponentially with the PLR, for a given bit rate. Based on these observations, the following model was considered for the selection of the number of CIR MBs per frame:
$$f_{CIR}(r_b, PLR) = (a_1 \cdot r_b + b_1)\, e^{c_1 \cdot PLR} \qquad (1.14)$$
where a1, b1, and c1 are the model parameters that need to be estimated. In Ref. [27], these parameters have been determined by nonlinear curve fitting (the Levenberg–Marquardt method) of the optimal number of CIR MBs per frame, experimentally determined for a set of representative test sequences, encoding bit rate ranges, and packet loss rates. The estimated parameters were a1 = 12.97 × 10^−6, b1 = −0.13, and c1 = 0.24; these parameter values will also be considered here.
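As an illustration of how such a model could be evaluated in an encoder, the sketch below computes a number of CIR MBs per frame from the bit rate and the PLR with the estimated parameters quoted above. The functional form (linear in the bit rate, exponential in the PLR), the units (bit/s and percent), the clamping to the valid range, and the function name are all assumptions made for this example; only the parameter values and the rounding to the nearest integer come from the text.

```python
import math

# Estimated model parameters reported in the text (from Ref. [27]).
A1 = 12.97e-6
B1 = -0.13
C1 = 0.24


def cir_mbs_per_frame(bit_rate_bps: float, plr_percent: float,
                      total_mbs: int) -> int:
    """Assumed CIR model: (a1 * r_b + b1) * exp(c1 * PLR), rounded.

    Assumptions: bit rate in bit/s, PLR in percent. The result is clamped
    to [0, total_mbs], which is an addition of this sketch. Note that
    Python's round() rounds exact .5 ties to the nearest even integer.
    """
    raw = (A1 * bit_rate_bps + B1) * math.exp(C1 * plr_percent)
    return max(0, min(total_mbs, round(raw)))


# Example: a QCIF stream (99 MBs per frame) encoded at 64 kbit/s.
cir_at_5 = cir_mbs_per_frame(64_000, 5.0, total_mbs=99)
cir_at_10 = cir_mbs_per_frame(64_000, 10.0, total_mbs=99)
```

The clamp is only a safety net for parameter combinations outside the fitted range; within the tested bit rates and PLRs the model output is expected to be a small fraction of the frame.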
Figure 1.11 shows the proposed model as well as the experimental data for the Mobile and Calendar test sequence. As can be seen, a simple linear model would not have represented the experimental data well.
FIGURE 1.11
Optimal amount of CIR MBs per frame versus PLR and bit rate for the Mobile and Calendar sequence. (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3075, November 2009. With permission. © 2009 IEEE.)
The CIR order is randomly defined once before encoding, as described in Section 1.2.2 (and in Ref. [25]), to avoid the subjectively disturbing effect of performing sequential (e.g., raster scan) refresh. The determined order is then cyclically followed with the computed number of MBs being refreshed in each frame.
Therefore, the complete network-aware MB intracoding refresh (NIR) scheme (which was initially proposed in Ref. [27]) can be briefly described by the following steps in terms of encoder operation:
Step 1. Obtain the PLR value through network feedback.
Step 2. Compute the number of CIR MBs to be used per frame, by using the proposed f_CIR function defined by Equation 1.14 and rounding it to the nearest integer.
Step 3. Compute the αRD value by using the f_NMD function defined by Equation 1.9 in Section 1.4.2.
Step 4. For each MB in a frame, check if it should be forced to intra mode according to the CIR order and the determined number of CIR MBs per frame; if not, perform intra/inter mode decision using the αRD value computed in Step 3; encode the MB with the selected mode.
Step 5. At the end of the frame, check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 4.
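The per-frame part of these steps (Step 4) can be sketched as follows, together with the random CIR order that is fixed once before encoding. This is a hypothetical illustration, not the reference software: the function names are invented for this example, the per-MB RD costs are given as inputs, the number of CIR MBs and the αRD value are assumed to have been computed already in Steps 2 and 3, and the intra/inter rule (intra when J_INTRA ≤ αRD · J_INTER) is an assumed reading of the αRD parameter.

```python
import random
from typing import List, Sequence, Tuple


def make_cir_order(total_mbs: int, seed: int = 0) -> List[int]:
    """Random refresh order over all MB positions, defined once."""
    order = list(range(total_mbs))
    random.Random(seed).shuffle(order)
    return order


def encode_frame_modes(costs: Sequence[Tuple[float, float]],
                       cir_order: Sequence[int],
                       cir_pos: int,
                       n_cir: int,
                       alpha_rd: float) -> Tuple[List[str], int]:
    """Step 4 for one frame; returns (modes, next position in the CIR cycle).

    costs holds one (J_INTRA, J_INTER) pair per MB position.
    """
    total = len(costs)
    n_forced = min(n_cir, total)
    # MB positions forced to intra in this frame, following the cyclic order.
    forced = {cir_order[(cir_pos + k) % total] for k in range(n_forced)}
    modes: List[str] = []
    for mb, (j_intra, j_inter) in enumerate(costs):
        if mb in forced:
            modes.append("intra")                      # CIR-forced refresh
        elif j_intra <= alpha_rd * j_inter:
            modes.append("intra")                      # gracefully forced
        else:
            modes.append("inter")
    return modes, (cir_pos + n_forced) % total


# Toy run: 6 MB positions, 2 CIR MBs per frame, no alpha_RD-driven intra MBs.
toy_order = make_cir_order(6, seed=1)
toy_costs = [(200.0, 100.0)] * 6
position, refreshed = 0, set()
for _ in range(3):
    frame_modes, position = encode_frame_modes(toy_costs, toy_order,
                                               position, n_cir=2, alpha_rd=1.0)
    refreshed |= {i for i, m in enumerate(frame_modes) if m == "intra"}
```

In the toy run, every MB position is refreshed exactly once over three frames, which is the guarantee that the additional CIR MBs are meant to provide.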
The definition of when the network reports are issued depends on how the network protocols are configured and the varying characteristics of the network itself [34].
Notice that independently selecting the αRD value and the amount of CIR MBs, while they are likely interdependent, can lead to chosen values that do not correspond to the optimal (αRD, CIR) pair. However, it has been verified after extensive experimentation that the considered independent selection process is still robust, in the sense that the chosen values are typically close enough to the optimal pair and, therefore, the overall performance is not dramatically penalized.
1.5 Performance Evaluation
To evaluate the performance of the complete NIR scheme described in this chapter, it has been compared in similar conditions to a reference intra refresh scheme, which basically corresponds to the network-aware version of the cyclic intra refresh scheme of the H.264/AVC reference software [25], described in Section 1.4.4. This solution has been adopted because, at the time of writing, no other intra refresh techniques that adaptively take into account the current network conditions were known.
In the reference scheme, the optimal number of CIR MBs per frame is selected manually for the considered network conditions, while in the considered NIR solution, the selection of the amount of CIR MBs per frame and the αRD parameter is done fully automatically. For the complete NIR and reference schemes, the Mother and Daughter, the Foreman, and the Mobile and Calendar video sequences have been encoded using the H.264/AVC Baseline Profile [25]. The test conditions used, which are representative of those currently used for personal communications over mobile networks, are summarized in Table 1.1. For QCIF, each frame was divided into three slices, while for CIF each frame was divided into six slices. In both cases, each slice consists of three MB rows. After encoding, each slice was mapped to an RTP packet for network transmission [34].
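The slice layout described above follows directly from the picture dimensions. The short sketch below works it out, assuming the standard QCIF (176 × 144) and CIF (352 × 288) luminance resolutions and 16 × 16 MBs; these dimensions are well-known format definitions rather than values quoted in the text, and the names used are hypothetical.

```python
from typing import Dict

MB_SIZE = 16                       # luminance samples per MB side
ROWS_PER_SLICE = 3                 # each slice spans three MB rows
FORMATS = {"QCIF": (176, 144), "CIF": (352, 288)}   # standard resolutions


def slice_layout(fmt: str) -> Dict[str, int]:
    """Derive MB and slice counts per frame for a given picture format."""
    width, height = FORMATS[fmt]
    mb_cols = width // MB_SIZE
    mb_rows = height // MB_SIZE
    return {
        "mbs_per_frame": mb_cols * mb_rows,
        "slices_per_frame": mb_rows // ROWS_PER_SLICE,
        "mbs_per_slice": mb_cols * ROWS_PER_SLICE,
    }


qcif = slice_layout("QCIF")   # 11 x 9 MBs
cif = slice_layout("CIF")     # 22 x 18 MBs
```

Since each slice is carried in one RTP packet, a single lost packet removes 33 MBs of a QCIF frame or 66 MBs of a CIF frame, that is, one third or one sixth of the picture.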
TABLE 1.1
Test Conditions
Sequence             Mother and Daughter    Foreman    Mobile and Calendar
Bit rate (kbit/s)    24–64                  48–128     384–1152
Source: Nunes, P., Soares, L. D., and Pereira, F., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. Copyright 2008 IEEE.
For the reference scheme, the number of cyclically intra refreshed MBs per frame was chosen for each PLR and bit rate, such that the decoded video quality would be the best possible. This was done manually by performing an exhaustive set of tests using many different amounts of CIR MBs per frame and then choosing the one that leads to the highest decoded average PSNR value, obtained by averaging over 50 different error patterns. For the QCIF video sequences, the possible values for the number of cyclically intra refreshed MBs were chosen from the representative set {0, 5, 11, 22, 33, …, 99}, while for the CIF video sequences the representative set consisted of {0, 22, 44, 66, …, 396}.
To simulate the network conditions, three different PLRs were considered: 1%, 5%, and 10%. Since each slice is mapped to one RTP packet, each lost packet will correspond to a lost video slice. Packet losses are considered independent and identically distributed. For each one of the studied PLRs, each coded bitstream has been corrupted and then decoded 50 times (i.e., corresponding to 50 different error patterns or runs), while applying the default error concealment technique implemented in the H.264/AVC reference software [25,28]. The presented results correspond to PSNR averages of these 50 different runs for the luminance component (PSNR Y).
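The evaluation procedure described above (independent and identically distributed packet losses, 50 error patterns, averaged PSNR) can be sketched as a small simulation harness. This is an illustrative sketch only: the names are hypothetical, and the `decode_psnr` callback stands in for the real chain of bitstream corruption, decoding with error concealment, and PSNR Y measurement, none of which is implemented here.

```python
import random
from statistics import mean
from typing import Callable, List


def loss_pattern(n_packets: int, plr_percent: float,
                 rng: random.Random) -> List[bool]:
    """One i.i.d. error pattern: True marks a lost packet (slice)."""
    p = plr_percent / 100.0
    return [rng.random() < p for _ in range(n_packets)]


def average_psnr(decode_psnr: Callable[[List[bool]], float],
                 n_packets: int, plr_percent: float,
                 runs: int = 50, seed: int = 0) -> float:
    """Average the PSNR returned by decode_psnr over `runs` error patterns."""
    rng = random.Random(seed)      # fixed seed so the patterns are repeatable
    return mean(decode_psnr(loss_pattern(n_packets, plr_percent, rng))
                for _ in range(runs))


def toy_decoder(lost: List[bool]) -> float:
    """Placeholder quality model (NOT a real decoder): PSNR drops with loss."""
    return 40.0 - 10.0 * sum(lost) / len(lost)


# Toy comparison at the three PLRs considered in the text.
toy_results = {plr: average_psnr(toy_decoder, n_packets=300, plr_percent=plr)
               for plr in (1.0, 5.0, 10.0)}
```

Fixing the seed means that the same 50 error patterns can be regenerated in every experiment, although, as the text points out later, identical patterns still hit different data in bitstreams produced by different schemes.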
For the conditions mentioned earlier, PSNR Y results are shown in Tables 1.2 through 1.4 for the Mother and Daughter, Foreman, and Mobile and Calendar video sequences, respectively. In these tables, NIR refers to the complete network-aware intracoding refresh scheme described in this chapter, and JM refers to the reference technique (winning cases appear in bold). In addition, OPT corresponds to the manual selection of the best (αRD, CIR) pair.
TABLE 1.2
PSNR Results for the Mother and Daughter Sequence
Source: From Nunes, P., Soares, L. D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.
TABLE 1.3
PSNR Results for the Foreman Sequence
Source: From Nunes, P., Soares, L. D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.
TABLE 1.4
PSNR Results for the Mobile and Calendar Sequence
Source: From Nunes, P., Soares, L. D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.
No visual results are given here, because the direct comparison of peer frames (encoded with different coding mode selection schemes) is rather meaningless in this case; only the comparison of the total video quality for several error patterns makes sense. This is due to the fact that the generated streams for the proposed and the reference techniques are different and, even if the same error pattern is used to corrupt them, the errors will affect different parts of the data at a given time instant, causing very different artifacts.
To help the reader better appreciate the gains obtained with the proposed technique, the results obtained for the Mother and Daughter sequence are also shown in a plot in Figure 1.12, for both JM and NIR. For the Foreman and the Mobile and Calendar sequences, the trends are similar.
FIGURE 1.12
PSNR results for the Mother and Daughter sequence. (From Nunes, P., Soares, L. D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.)
The presented results show that, when the fully automatic NIR scheme is used, the decoded video quality is significantly improved for the vast majority of tested conditions when compared to the reference method with a manually selected amount of CIR MBs (JM). Improvements of the NIR method can be as high as 1.90 dB for the Mother and Daughter sequence encoded at 64 kbit/s and a PLR of 5%. The most significant exception is for the PLR of 10% and higher bit rates (see Tables 1.3 and 1.4). This exception is due to the fact that, for these PLR and bit rate values, the number of CIR MBs chosen with the proposed f_CIR is slightly different from the optimal values.
When comparing the NIR scheme to the one proposed in Ref. [26], which does not use CIR, the NIR PSNR Y values are most of the time higher than or equal to those achieved in Ref. [26]. The highest gains occur for the Foreman sequence encoded at 128 kbit/s and a PLR of 10% (0.90 dB), and for the Mobile and Calendar sequence encoded at 768 kbit/s and a PLR of 10% (0.60 dB). For the cases where the NIR leads to lower PSNR Y values, the losses are never more than 0.49 dB, which happens for the Mobile and Calendar sequence encoded at 896 kbit/s and a PLR of 5%.
Notice, however, that the scheme in Ref. [26] cannot guarantee that all MBs will eventually be refreshed, which is a major drawback for real usage in error-prone environments, such as mobile networks. On the other hand, the scheme described in this chapter not only overcomes this drawback, but does so fully automatically, without any user intervention.
1.6 Final Remarks
This chapter describes a method to efficiently and fully automatically perform intracoding refresh, while taking into account the PLR of the underlying network and the encoded bit rate. The described method can be used to efficiently generate error-resilient H.264/AVC bitstreams that are adapted to the channel error characteristics. This is extremely important because it means that error-resilient video transmission becomes possible, with improved quality, in environments with varying error characteristics, notably when compared to the case where the MB intracoding decisions are taken without considering the error characteristics of the network.
Acknowledgments
The authors would like to acknowledge that the work described in this chapter was developed at Instituto de Telecomunicações (Lisboa, Portugal) and was supported by FCT project PEst-OE/EEI/LA0008/2011.
References
1. A. H. Li, S. Kittitornkun, Y.-H. Hu, D.-S. Park, J. Villasenor, Data partitioning and reversible variable length codes for robust video communications, Proceedings of the IEEE Data Compression Conference, Snowbird, UT, pp. 460–469, March 2000.
2. G. Côté, S. Shirani, F. Kossentini, Optimal mode selection and synchronization for robust video communications over error-prone networks, IEEE Journal on Selected Areas in Communications, 18(6), 952–965, June 2000.
3. S. Wenger, G. D. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Transactions on Circuits and Systems for Video Technology, 8(7), 867–877, November 1998.
4. L. P. Kondi, F. Ishtiaq, A. K. Katsaggelos, Joint source-channel coding for motion-compensated DCT-based SNR scalable video, IEEE Transactions on Image Processing, 11(9), 1043–1052, September 2002.
5. H. M. Radha, M. van der Schaar, Y. Chen, The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP, IEEE Transactions on Multimedia, 3(1), 53–68, March 2001.
6. T. Schierl, T. Stockhammer, T. Wiegand, Mobile video transmission using scalable video coding, IEEE Transactions on Circuits and Systems for Video Technology, 17(9), 1204–1217, September 2007.
7. R. Puri, K. Ramchandran, Multiple description source coding through forward error correction codes, Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, vol. 1, pp. 342–346, October 1999.
8. V. K. Goyal, Multiple description coding: Compression meets the network, IEEE Signal Processing Magazine, 18(5), 74–93, September 2001.
9. K. Stuhlmüller, N. Färber, M. Link, B. Girod, Analysis of video transmission over lossy channels, IEEE Journal on Selected Areas in Communications, 18(6), 1012–1032, June 2000.
10. L. D. Soares, F. Pereira, Error resilience and concealment performance for MPEG-4 frame-based video coding, Signal Processing: Image Communication, 14(6–8), 447–472, May 1999.
Trang 3511 A K Katsaggelos, F Ishtiaq, L.P Kondi, M.-C Hong, M Banham, J Brailean,
Error resilience and concealment in video coding, Proceedings of the European
Signal Processing Conference, Rhodes, Greece, pp 221–228, September 1998.
12 Y Wang, S Wenger, J Wen, A Katsaggelos, Error resilient video coding
techniques, IEEE Signal Processing Magazine, 17(4), 61–82, July 2000.
13 F Zhai, A Katsaggelos, Joint Source-Channel Video Transmission, Morgan &
Claypool Publishers, San Rafael, CA, 2007
14 ISO/IEC 14496-10, Information Technology—Coding of Audio-Visual Objects
—Part 10: Advanced Video Coding, 2005
15 ISO/IEC 14496-2, Information Technology—Coding of Audio-Visual Objects—Part 2: Visual (2nd Edn.), 2001
16 P Haskell, D Messerschmitt, Resynchronization of motion compensated
video affected by ATM cell loss, Proceedings of the IEEE International
Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, vol.
3, pp 545–548, March 1992
17 G Côté, F Kossentini, Optimal intra coding of blocks for robust video
communication over the Internet, Signal Processing: Image Communication,
15(1–2), 25–34, September 1999
18 J Y Liao, J.D Villasenor, Adaptive intra block update for robust transmission
of H.263, IEEE Transactions on Circuits and Systems for Video Technology,
10(1), 30–35, February 2000
19 P Frossard, O Verscheure, AMISP: A complete content-based MPEG-2
error-resilient scheme, IEEE Transactions on Circuits and Systems for Video
Technology, 11(9), 989–998, September 2001.
20 Z He, J Cai, C Chen, Joint source channel rate-distortion analysis for
adaptive mode selection and rate control in wireless video coding, IEEE
Transactions on Circuits and Systems for Video Technology, 12(6), 511–523,
June 2002
21 H Shu, L Chau, Intra/Inter macroblock mode decision for error-resilient
transcoding, IEEE Transactions on Multimedia, 10(1), 97–104, January 2008.
22 H-J Ma, F Zhou, R.-X Jiang, Y.-W Chen, A network-aware error-resilient
method using prioritized intra refresh for wireless video communications, Journal
of Zhejiang University - Science A, 10(8), 1169–1176, August 2009.
23 P Nunes, L.D Soares, F Pereira, Error resilient macroblock rate control for
H.264/AVC video coding, Proceedings of the IEEE International Conference on
Image Processing, San Diego, CA, pp 2132–2135, October 2008.
24 Z Li, F Pan, K Lim, G Feng, X Lin, S Rahardja, Adaptive basic unit layer
rate control for JVT, Doc JVT-G012, 7th MPEG Meeting, Pattaya, Thailand, March
2003
Available: http://iphome.hhi.de/suehring/tml/download/
26 L.D Soares, P Nunes, F Pereira, Efficient network-aware macroblock mode
decision for error resilient H.264/AVC video coding, Proceedings of the SPIE
Conference on Applications of Digital Image Processing, vol 7073, San Diego,
CA, pp 1–12, August 2008
27 P Nunes, L.D Soares, F Pereira, Automatic and adaptive network-aware
macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings
of the IEEE International Conference on Image Processing, Cairo, Egypt, pp.
3073–3076, November 2009
28 K.-P Lim, G Sullivan, T Wiegand, Text description of joint model reference
encoding methods and decoding concealment methods, Doc JVT-X101, ITU-T
VCEG Meeting, Geneva, Switzerland, June 2007.
29 T Wiegand, H Schwarz, A Joch, F Kossentini, G Sullivan, Rate-constrained
coder control and comparison of video coding standards, IEEE Transactions on
Circuits and Systems for Video Technology, 13(7), 688–703, July 2003.
30 H Schulzrinne, S Casner, R Frederick, V Jacobson, RTP: A transport
protocol for real-time applications, Internet Engineering Task Force, RFC 1889,
January 1996
31 R C Dorf, J.A Svoboda, Introduction to Electric Circuits, 5th Edition, Wiley,
New York, 2001
32 K Levenberg, A method for the solution of certain non-linear problems in
least squares, Quarterly of Applied Mathematics, 2(2), 164–168, July 1944.
33 D Marquardt, An algorithm for the least-squares estimation of nonlinear
parameters, SIAM Journal of Applied Mathematics, 11(2), 431–441, June 1963.
34 S Wenger, H.264/AVC over IP, IEEE Transactions on Circuits and Systems
for Video Technology, 13(7), 645–656, July 2003.
2
Distributed Video Coding: Principles and Challenges
Jürgen Slowack and Rik Van de Walle
CONTENTS
2.1 Introduction
2.2 Theoretical Foundations
2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)
2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)
2.3 General Concept
2.4 Use-Case Scenarios in the Context of Wireless Networks
2.5 DVC Architectures and Components
2.5.1 Side Information Generation
2.5.1.1 Frame-Level Interpolation Strategies
2.5.1.2 Frame-Level Extrapolation Strategies
2.5.1.3 Encoder-Aided Techniques
2.5.1.4 Partitioning and Iterative Refinement
2.5.2 Correlation Noise Estimation
2.5.3 Channel Coding
2.5.4 Determining the WZ Rate
2.5.5 Transformation and Quantization
2.5.6 Mode Decision
2.6 Evaluation of DVC Compression Performance
2.7 Other DVC Architectures and Scenarios
2.8 Future Challenges and Research Directions
References
2.1 Introduction
A video compression system consists of an encoder that converts uncompressed video sequences into a compact format suitable for transmission or storage, and a decoder that performs the opposite operations to facilitate video display. Compression is typically achieved by exploiting similarities between frames (temporal direction), as well as similarities between pixels within the same frame (spatial direction). The conventional way is to exploit these similarities at the encoder. Using already-coded information, the encoder generates a prediction of the information still to be coded. Next, the difference between the information to be coded and the prediction is further processed and compressed through entropy coding.
The accuracy of the prediction determines the compression performance, in the sense that more accurate predictions will lead to smaller residuals and better compression. As a consequence, computationally complex algorithms have been developed to search for the best predictor. This has led to a complexity imbalance, in which the encoder is significantly more complex than the decoder.
A radically different approach to video coding, called distributed video coding (DVC), has emerged during the past decade. In DVC, the prediction is generated at the decoder instead of at the encoder. As this prediction, called side information, typically contains errors, additional information is sent from the encoder to the decoder to allow correcting the side information. Generating the prediction signal at the decoder shifts the computational burden from the encoder to the decoder side. This facilitates applications in which encoding devices are relatively cheap, small, and/or power-friendly. Some examples of these applications include wireless sensor networks, wireless video surveillance, and videoconferencing using mobile devices [44].
Many publications covering DVC have appeared (including a book on distributed source coding [DSC] [16]). The objective of this chapter is therefore to provide a comprehensive overview of the basic principles behind DVC and illustrate these principles with examples from the current state-of-the-art. Based on this description, the main future challenges will be identified and discussed.
2.2 Theoretical Foundations
Before describing the different DVC building blocks in detail, we start by highlighting some of the most important theoretical results. This includes a discussion on the Slepian–Wolf and Wyner–Ziv (WZ) theorems, which are generally regarded as providing a fundamental information-theoretical basis for DVC. It should be remarked that these results apply to DSC in general and that DVC is only a special case.
2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)
David Slepian and Jack K. Wolf considered the configuration depicted in Figure 2.1, in which two sources X and Y generate correlated sequences of information
symbols [51]. Each of these sequences is compressed by a separate encoder,
namely, one for X and one for Y. The encoder of each source is constrained to
operate without knowledge of the other source, explaining the term DSC. The decoder, on the other hand, receives both coded streams as input and should be
able to exploit the correlation between the sources X and Y for decoding the
information symbols.
FIGURE 2.1
Slepian and Wolf consider the setup in which two correlated sources X and Y are
coded independently, but decoded jointly.
Surprisingly, Slepian and Wolf proved that the compression bound for this
configuration is the same as in the case where the two encoders are allowed to
communicate. More precisely, they proved that the rates R_X and R_Y of the coded
streams satisfy the following set of inequalities:
$$R_X \geq H(X|Y), \qquad R_Y \geq H(Y|X), \qquad R_X + R_Y \geq H(X,Y) \tag{2.1}$$
where H(.) denotes the entropy. These conditions can be represented
graphically, as a so-called admissible or achievable rate region, as depicted
in Figure 2.2.
While any point on the line R_X + R_Y = H(X,Y) is equivalent from a compression point of
view, special attention goes to the corner points of the achievable rate region.
For example, the point (H(X|Y), H(Y)) corresponds to the special case of source
coding with side information available at the decoder, as depicted in Figure 2.3.
This case is of particular interest in the context of current DVC solutions, where
side information Y is generated at the decoder and used to decode X. According
to the Slepian–Wolf theorem, the minimal rate required in this case is the
conditional entropy H(X|Y).
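To make the rate region concrete, the following minimal Python sketch computes the three Slepian–Wolf bounds from a joint probability mass function and checks whether a rate pair is admissible. The function names and the example source (a uniform binary X, with Y equal to X flipped with probability 0.1) are illustrative assumptions and do not come from the chapter.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def slepian_wolf_bounds(joint):
    """Return (H(X|Y), H(Y|X), H(X,Y)) for a joint pmf given as joint[x][y]."""
    h_xy = entropy(p for row in joint for p in row)
    h_x = entropy(sum(row) for row in joint)        # marginal of X
    h_y = entropy(sum(col) for col in zip(*joint))  # marginal of Y
    return h_xy - h_y, h_xy - h_x, h_xy


def is_admissible(r_x, r_y, joint, tol=1e-9):
    """Check the three Slepian-Wolf inequalities for the rate pair (r_x, r_y)."""
    h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
    return (r_x >= h_x_given_y - tol
            and r_y >= h_y_given_x - tol
            and r_x + r_y >= h_xy - tol)


# Hypothetical example source: X is a uniform bit, Y is X flipped with probability 0.1.
p_flip = 0.1
joint = [[0.5 * (1 - p_flip), 0.5 * p_flip],
         [0.5 * p_flip, 0.5 * (1 - p_flip)]]

h_x_given_y, h_y_given_x, h_xy = slepian_wolf_bounds(joint)
# Corner point (H(X|Y), H(Y)): Y is sent at 1 bit/symbol, X at about 0.469 bit/symbol.
corner_ok = is_admissible(h_x_given_y, 1.0, joint)
```

In this example the corner point lies on the boundary of the region, whereas a pair such as (0.4, 1.0) falls outside it because 0.4 is below H(X|Y).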
2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)
The work of Slepian and Wolf relates to lossless compression. These results were
extended to lossy compression by Aaron D. Wyner and Jacob Ziv [65]. Although
introducing quality loss seems undesirable at first thought, it is often necessary
to allow some loss of quality at the output of the decoder in order to achieve
even higher compression ratios (i.e., lower bit rates).
FIGURE 2.2
Graphical representation of the achievable rate region.
FIGURE 2.3
(Lossless) source coding with side information available at the decoder.
Denote the acceptable distortion between the original signal X and the decoded signal X′ as D = E[d(X, X′)], where d is a specific distortion metric (such as the
mean-squared error). Two cases are considered for compression with side
information available at the decoder. In the first case, the side information Y is
not available at the encoder. The rate of the compressed stream for this case is denoted $R^{WZ}_{X|Y}(D)$. In the second case, Y is made available to the
encoder as well, resulting in a rate denoted $R_{X|Y}(D)$. With these notations, Wyner and Ziv proved that
$$R^{WZ}_{X|Y}(D) - R_{X|Y}(D) \geq 0 \tag{2.2}$$
In other words, not having the side information available at the encoder results
in a rate loss greater than or equal to zero, for a particular distortion D.
Interestingly, the rate loss has been proved to be zero in the case of Gaussian memoryless sources and a mean-squared error (MSE) distortion metric.
The results of Wyner and Ziv were further extended by other researchers, for
example, proving that the equality also holds in case X is equal to the sum of arbitrarily distributed Y and independent Gaussian noise [46]. In addition, Zamir showed that the rate loss for sources with general statistics is less than 0.5 bits per sample when using the MSE as a distortion metric [68].
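As a worked illustration of the zero-rate-loss case, consider X = Y + N with N zero-mean Gaussian noise of variance σ_N², independent of Y, and MSE distortion. Under these assumptions both rates reduce to the classical Gaussian rate-distortion function of the noise, max(0, ½ log2(σ_N²/D)) bits per sample. The sketch below only evaluates this closed form. The function name and parameter names are illustrative and do not come from the chapter.

```python
import math


def gaussian_wz_rate(noise_var, distortion):
    """Rate in bits/sample for X = Y + N, N ~ N(0, noise_var), under MSE distortion.

    Assumes the zero-rate-loss setting described in the text, so the same value
    holds whether or not the encoder has access to the side information Y.
    """
    if noise_var <= 0 or distortion <= 0:
        raise ValueError("noise_var and distortion must be positive")
    # No bits are needed once the target distortion exceeds the noise variance,
    # because the decoder can simply output Y.
    return max(0.0, 0.5 * math.log2(noise_var / distortion))


# Hypothetical numbers: unit-variance correlation noise, target MSE of 0.25.
rate = gaussian_wz_rate(1.0, 0.25)      # 0.5 * log2(4) = 1 bit per sample
no_rate = gaussian_wz_rate(1.0, 2.0)    # D above the noise variance: 0 bits
```

Each time the target distortion is divided by four, one additional bit per sample is required, which is the usual 6 dB-per-bit behavior of Gaussian sources.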
2.3 General Concept
The theorems of Slepian–Wolf and Wyner–Ziv apply to DSC, and therefore also
to the specific case of DVC. Basically, the theorems indicate that a DVC system should be able to achieve the same compression performance as a conventional video compression system. However, the proofs do not provide insights on how
to actually construct such a system. As a result, the first DVC systems appeared in the scientific literature only about 30 years later.
The common approach in the design of a DVC system is to consider Y as being a corrupted version of X. This way, the proposed setup becomes highly similar to
a channel-coding scenario. In the latter, a sequence of information
symbols X could be sent across an error-prone communication channel, so that Y has been received instead of X. To enable successful recovery of X at the
receiver’s end, the sender could include additional error-correcting information
calculated on X, such as turbo or low-density parity-check (LDPC) codes [33]. The difference between such a channel-coding scenario and the setup depicted
in Figure 2.3 is that in our case Y is already available at the decoder. In other
words, the encoder should only send the error-correcting information to allow
recovery of X (or X′ in the lossy case). Since Y is already available at the decoder instead of being communicated by the encoder, the errors in Y are said to be
induced by virtual noise (also called correlation noise) on a virtual communication channel.
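The idea of sending only error-correcting information can be illustrated with a toy example. The sketch below swaps the turbo and LDPC codes mentioned in the text for a much simpler (7,4) Hamming code, and assumes that the side information Y differs from the 7-bit word X in at most one position. The encoder transmits only the 3-bit syndrome of X, and the decoder combines it with Y to recover X. All function names are illustrative.

```python
# Parity-check matrix of the (7,4) Hamming code. Column j (1-based) is the binary
# representation of j, so a single-bit error at position j gives syndrome value j.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(bits):
    """Compute the 3-bit syndrome H * bits over GF(2)."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)


def encode(x):
    """Encoder: send only the syndrome of the 7-bit word x (3 bits instead of 7)."""
    return syndrome(x)


def decode(s_x, y):
    """Decoder: recover x from its syndrome and the side information y.

    Assumes x and y differ in at most one position (the 'virtual channel').
    """
    s_y = syndrome(y)
    diff = tuple(a ^ b for a, b in zip(s_x, s_y))  # syndrome of the pattern x XOR y
    pos = diff[0] * 4 + diff[1] * 2 + diff[2]      # 0 means x == y
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1                        # flip the mismatching bit
    return x_hat


# Hypothetical example: y is x with the fifth bit flipped by correlation noise.
x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 1, 1, 0, 1]
x_hat = decode(encode(x), y)
```

Under the stated assumption, the decoder always recovers X from 3 transmitted bits rather than 7. Practical DVC systems use far longer codes and soft decoding, so this is only a sketch of the principle.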
2.4 Use-Case Scenarios in the Context of Wireless Networks
By generating Y itself at the decoder side as a prediction of the original X at the
encoder, the complexity balance between the encoder and the decoder becomes totally different from a conventional video compression system such
as H.264/AVC [64]. While conventional systems feature an encoder that is significantly more complex than the decoder, in DVC the complexity balance is completely the opposite.
In the context of videoconferencing using mobile devices, DVC can be used in combination with conventional video coding techniques (such as H.264/AVC), which allows computationally less complex steps to be assigned to mobile devices, while computationally complex operations are performed in the network