Advanced Video Communications over Wireless Networks

"Wireless video communications encompass a broad range of issues and opportunities that serve as the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, Advanced Video Communications over Wireless Networks provides an in-depth look at the fundamentals, recent technical achievements, challenges, and emerging trends in mobile and wireless video communications. The editors have carefully selected a panel of researchers with expertise in diverse aspects of wireless video communication to cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, the transmission schemes tailored to mobile and wireless networks, quality metrics, the architectures of practical systems, as well as some novel directions. They address future directions, including Quality-of-Experience in wireless video communications, video communications over future networks, and 3D video communications. The book presents a collection of tutorials, surveys, and original contributions, providing an up-to-date, accessible reference for further development of research and applications in mobile and wireless video communication systems. The range of coverage and depth of expertise make this book the go-to resource for facing current and future challenges in this field."

1. Network-Aware Error-Resilient Video Coding

Luís Ducla Soares and Paulo Nunes

2. Distributed Video Coding: Principles and Challenges

Jürgen Slowack and Rik Van de Walle

3. Computer Vision–Aided Video Coding

Manoranjan Paul and Weisi Lin

8. Cooperative Video Provisioning in Mobile Wireless Environments

Paolo Bellavista, Antonio Corradi, and Carlo Giannelli

14. Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming

Araz Jahaniaval and Dalia Fayek

15. Video QoS Analysis over Wi-Fi Networks

Rashid Mehmood and Raad Alturki

Index

Preface

Video communication has evolved from a simple tool for visual communication to a key enabler for various video applications. A number of exciting video applications have been successfully deployed in recent years, with the goal of providing users with a more flexible, personalized, and content-rich viewing experience. Along with these ubiquitous video applications, we have also experienced a paradigm shift from passive, wired, and centralized video content access to interactive, wireless, and distributed content access. Undoubtedly, wireless video communications have paved the way for advanced applications. However, given the distributed, resource-constrained, and heterogeneous nature of wireless networks, the support of quality video communications over wireless networks is still challenging. Video coding is one of the indispensable components in various wireless video applications, whereas the wireless network condition always imposes more stringent requirements on coding technologies. To cope with the limited transmission bandwidth and to offer adaptivity to the harsh wireless channels, rate control, packet scheduling, as well as error control mechanisms are usually incorporated in the design of codecs to enable efficient and reliable video communications. At the same time, due to energy constraints in wireless systems, video coding algorithms should operate with the lowest possible power consumption. Therefore, video coding over wireless networks is inherently a complex optimization problem with a set of constraints. In addition, the high heterogeneity and user mobility associated with wireless networks are also key issues to be tackled for a seamless delivery of quality-of-experience supported video streams.

To sum up, wireless video communications encompass a broad range of challenges and opportunities that provide the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, we bring forth this book as a compilation of high-quality chapters. This book is intended to be an up-to-date reference book on wireless video communications, providing the fundamentals, recent technical achievements, challenges, and some emerging trends. We hope that the book will be accessible to various audiences, ranging from those in academia and industry to senior undergraduates and postgraduates. To achieve this goal, we have solicited chapters from a number of researchers who are experts in diverse aspects of wireless video communications. We received a good response and, finally, after peer review and revision, 15 chapters were selected. These chapters cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, transmission schemes tailored to mobile and wireless networks, quality metrics, architectures of practical systems, as well as some novel directions. In what follows, we present a summary of each chapter.

In Chapter 1, “Network-Aware Error-Resilient Video Coding,” a network-aware Intra coding refresh method is presented. This method increases the error robustness of H.264/AVC bitstreams, considering the network packet loss rate and the encoding bit rate, by efficiently taking into account the rate-distortion impact of Intra coding decisions while guaranteeing that errors do not propagate.

Chapter 2, “Distributed Video Coding: Principles and Challenges,” is a tutorial on distributed video coding (DVC). In contrast to conventional video compression schemes featuring an encoder that is significantly more complex than the decoder, in DVC the complexity distribution is the reverse. This chapter provides an overview of the basic principles, state of the art, current problems, and trends in DVC.

Chapter 3, “Computer Vision–Aided Video Coding,” studies video coding from the perspective of computer vision. Motivated by the fact that the human visual system (HVS) is the ultimate receiver of the majority of compressed videos and that there is a scope to remove unimportant information through HVS, the chapter proposes a computer vision–aided video coding technique by exploiting the spatial and temporal redundancies with visually unimportant information.

In Chapter 4, “Macroblock Classification Method for Computation Control Video Coding and Other Video Applications Involving Motions,” a new macroblock (MB) classification method is proposed, which classifies MBs into different classes according to their temporal and spatial motion and texture information. Furthermore, the implementations of the proposed MB classification method into complexity-scalable video coding as well as other video applications are also discussed in detail in the chapter.

Chapter 5, “Transmission Rate Adaptation in Multimedia WLAN: A Dynamic Games Approach,” considers the scheduling, rate adaptation, and buffer management in a multiuser wireless local area network (WLAN), where each user transmits scalable video payload. Based on opportunistic scheduling, users access the available medium (channel) in a decentralized manner. The rate adaptation problem of the WLAN multimedia networks is then formulated as a general-sum switching control dynamic Markovian game.

In Chapter 6, “Energy and Bandwidth Optimization in Mobile Video Streaming Systems,” the authors consider the problem of multicasting multiple variable bit rate video streams from a wireless base station to many mobile receivers over a common wireless channel. This chapter presents a sequence of increasingly sophisticated streaming protocols for optimizing energy usage and utilization of the wireless bandwidth.

Chapter 7, “Resource Allocation for Scalable Videos over Cognitive Radio Networks,” investigates the challenging problem of video communication over cognitive radio (CR) networks. It first addresses the problem of scalable video over infrastructure-based CR networks and then considers the problem of scalable video over multihop CR networks.

Chapter 8, “Cooperative Video Provisioning in Mobile Wireless Environments,” focuses on the challenging scenario of cooperative video provisioning in mobile wireless environments. On one hand, it provides a general overview about the state-of-the-art literature on collaborative mobile networking. On the other hand, it provides technical details and reports about the RAMP middleware case study, practically showing that node cooperation can properly achieve streaming adaptation.

Chapter 9, “Multilayer Iterative FEC Decoding for Video Transmission over Wireless Networks,” develops a novel multilayer iterative decoding scheme using deterministic bits to lower the decoding threshold of low-density parity-check (LDPC) codes. These deterministic bits serve as known information in the LDPC decoding process to reduce redundancy during data transmission. Unlike the existing work, the proposed scheme addresses controllable deterministic bits, such as MPEG null packets, rather than widely investigated protocol headers.

Chapter 10, “Network-Adaptive Rate and Error Controls for WiFi Video Streaming,” investigates the fundamental issues for network-adaptive mobile video streaming over WiFi networks. Specifically, it highlights the practical aspects of network-adaptive rate and error control schemes to overcome the dynamic variations of underlying WiFi networks.

Chapter 11, “State of the Art and Challenges for 3D Video Delivery over Mobile Broadband Networks,” examines the technologies underlying the delivery of 3D video content to wireless subscribers over mobile broadband networks. The incorporated study covers key issues, such as the effective delivery of 3D video content in a system that has limited resources in comparison to wired networks, network design issues, as well as scalability and backward compatibility concepts.

In Chapter 12, “A New Hierarchical 16-QAM-Based UEP Scheme for 3-D Video with Depth Image–Based Rendering,” an unequal error protection (UEP) scheme based on hierarchical quadrature amplitude modulation (HQAM) for 3-D video transmission is proposed. The proposed scheme exploits the unique characteristics of the color plus depth map stereoscopic video where the color sequence has a significant impact on the reconstructed video quality.

Chapter 13, “2D-to-3D Video Conversion: Techniques and Applications in 3D Video Communications,” provides an overview of the main techniques for 2D-to-3D conversion, which includes different depth cues and state-of-the-art schemes. In the 3D video communications context, 2D-to-3D conversion has been used to improve the coding efficiency and the error resiliency and concealment for the 2D video plus depth format.

Chapter 14, “Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming,” presents the research involved in bridging the gap between the worlds of video compression/encoding and network traffic engineering by (i) using enriched video trace formats in scheduling and traffic control, (ii) using prioritized and error-resilience features in H.264, and (iii) optimizing the combination of the network performance indices with codec-specific distortion parameters for an increased quality of the received video.

In Chapter 15, “Video QoS Analysis over Wi-Fi Networks,” the authors present a detailed end-to-end QoS analysis for video applications over wireless networks, both infrastructure and ad hoc networks. Several networking scenarios are carefully configured with variations in network sizes, applications, codecs, and routing protocols to extensively analyze network performance.

MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:

The MathWorks, Inc.

3 Apple Hill Drive

Natick, MA, 01760-2098 USA

Contributors

Omar Abdul-Hameed

Faculty of Engineering and Physical Sciences

I-Lab: Multimedia Communications Research

Department of Electronic Engineering

Centre for Vision, Speech and Signal Processing

University of Surrey

Surrey, United Kingdom

Khalid Mohamed Alajel

Faculty of Engineering and Surveying

University of Southern Queensland

Toowoomba, Queensland, Australia

Raad Alturki

Department of Computer Science

Al Imam Mohammad Ibn Saud Islamic University

Riyadh, Saudi Arabia

Paolo Bellavista

Department of Electronics, Computer Science, and Systems

University of Bologna

Bologna, Italy


Centre for Vision, Speech and Signal Processing

Branch of Broadcast Technologies Research

Communications Research Centre Canada

Ottawa, Ontario, Canada

School of Computing Science

Simon Fraser University

Surrey, British Columbia, Canada

Cheng-Hsin Hsu

Department of Computer Science

National Tsing Hua University

Hsin Chu, Taiwan, Republic of China


Electrical Computer Engineering Department

University of British Columbia

Vancouver, British Columbia, Canada

Chinese Academy of Sciences

Haidian, Beijing, People’s Republic of China

JongWon Kim

School of Information and Communications

Gwangju Institute of Science and Technology (GIST)

Gwangju, South Korea

Ahmet Kondoz

Centre for Vision, Speech and Signal Processing

University of Surrey

Surrey, United Kingdom

Vikram Krishnamurthy

Ghent, Belgium

and

Institute of Information Science

Beijing Jiaotong University


Haidian, Beijing, People’s Republic of China

Weisi Lin

School of Computer Engineering

Nanyang Technological University

Singapore, Singapore

Weiyao Lin

Shanghai Jiao Tong University

Xuhui, Shanghai, People’s Republic of China

Hassan Mansour

Communications R&D Center

Samsung Thales Co., Ltd.

Seongnam-Si, South Korea

Manoranjan Paul

School of Computing and Mathematics

Charles Sturt University

Bathurst, New South Wales, Australia

Joseph Peters

School of Computing Science

Simon Fraser University

Surrey, British Columbia, Canada

Bo Rong


Wei Xiang

Faculty of Engineering and Surveying

University of Southern Queensland

Toowoomba, Queensland, Australia

Chongyang Zhang

Shanghai Jiao Tong University

Xuhui, Shanghai, People’s Republic of China

Network-Aware Error-Resilient Video Coding

Luís Ducla Soares and Paulo Nunes

CONTENTS


1.1 Introduction

1.2 Video Coding Framework

1.2.1 Rate-Distortion Optimization

1.2.2 Random Intra Refresh

1.3 Efficient Intracoding Refresh

1.3.1 Error-Resilient RDO-Driven Intra Refresh

1.3.1.1 RDO Intra and Inter Mode Decision

1.3.1.2 Error-Resilient Intra/Inter Mode Decision

1.3.2 Random Intra Refresh

1.4 Network-Aware Error-Resilient Video Coding Method

1.4.1 Intra/Inter Mode Decision with Constant αRD

1.4.2 Intra/Inter Mode Decision with Network-Aware αRD Selection

1.4.3 Model for the fNMD Mapping Function

1.4.4 Network-Aware Cyclic Intra Refresh

1.4.5 Intra Refresh with Network-Aware αRD and CIR Selection

In order to extend the useful lifetime of a video coding standard, standardization bodies usually specify the minimum set of tools that are essential for guaranteeing interoperability between devices or applications of different manufacturers. With this strategy, the standard may evolve continuously through the development and improvement of its nonnormative parts. Error resilience is an example of a video coding tool that is not completely specified in a normative way, in any of the currently available and emerging video coding standards. The reason for this is that it is simply not necessary for interoperability and, therefore, it is one of the main degrees of freedom to improve the performance of standard-based systems, even after the standard has been finalized. Nevertheless, recognizing the paramount importance of this type of tool, standardization initiatives always include a minimum set of error-resilient hooks (e.g., in the form of bitstream syntax elements) in order to facilitate the development of effective error resilience techniques, as needed for the particular application envisaged.

Error-resilience techniques are usually seen as playing a role at the decoder side of the communication chain. However, by using preventive error resilience techniques at the encoder side, which involve the intelligent design of the encoder, it is also possible to make the task of the decoder much easier in terms of dealing with errors. In fact, the performance of the decoder can greatly vary depending on the amount of error resilience help provided in the bitstream generated by the encoder. This way, at the encoder, the challenge is to develop techniques that make video bitstreams more resilient to errors, in order to allow the decoder to better recover in case errors occur; these techniques may be called preventive error resilience techniques. At the decoder, the challenge is to develop techniques that make it possible for the decoder to take all the available received data (correct and, possibly, corrupted) and decode it with the best possible video quality, thus minimizing the negative subjective impact of the errors on the video quality offered to the user; these techniques may be called corrective error resilience techniques.

Video communication systems, in order to be globally more resilient to channel errors, typically include both preventive and corrective error-resilient techniques. An important class of preventive techniques is error-resilient source coding, which consists of providing redundancy at the source coding level in order to prevent error propagation and consequently reduce the distortion caused by data corruption/loss. Error-resilient source coding techniques include data partitioning, resynchronization and reversible variable length codes [1,2], redundant coding schemes, such as sending the same information predicted from different references [3], scalable video coding [4,5,6], or multiple description coding [7,8]. Besides source coding redundancy, channel coding redundancy can also be used, where a good example is the case of forward error correction [9]. In terms of corrective error-resilient techniques, error concealment techniques correspond to one of the most important classes, but other important techniques also exist, such as error detection and error localization techniques [10]. Error concealment techniques consist essentially of postprocessing methods aiming at recovering missing or corrupted data from neighboring data (either spatially or temporally) [11], but for these techniques to be truly effective, an error detection technique should be first used to detect if an error has indeed occurred, followed by an error localization technique to determine where the error occurred and which parts of the video content were affected [10]. For a good review of the many different preventive and corrective error-resilient video coding techniques that have been proposed in the literature, the reader can refer to Refs. [12,13].

This chapter addresses the problem of error-resilient encoding, in particular of how to efficiently improve the resilience of compressed video bitstreams, while adaptively considering the network characteristics in terms of information loss. Video coding systems that rely on predictive (inter) coding to remove temporal redundancy, such as those based on the H.264/AVC standard [14], are strongly affected by transmission errors/information loss due to the error propagation caused by the prediction mechanisms. Therefore, typical approaches to make bitstreams generated by the encoder more error-resilient rely on the adaptation of the video coding mode decisions, at various levels (e.g., picture, slice, or macroblock level), to the underlying network characteristics, trying to establish an adequate trade-off between predictive and nonpredictive encoding modes. This is done because nonpredictive modes are less efficient in terms of compression but can provide higher error resilience. In this context, controlling the amount of nonpredictive versus predictive encoded data is an efficient and highly scalable error resilience tool.

The intracoding refresh schemes available in the literature [2,15,16,17,18,19,20,21,22] are a typical example of efficient error resilience techniques to improve the video quality over error-prone environments without requiring changes to the bitstream syntax, thus allowing the performance of standard video codecs to be continuously improved without compromising interoperability. However, a permanently open issue related to these techniques is how to achieve the best trade-off between error resilience and coding efficiency.

Since these schemes work by selectively coding in intra mode different parts of the video content at different time instants, they are able to avoid long-term propagation of transmission or storage errors that could make the decoded quality decay very rapidly. This way, these intracoding refresh schemes are able to significantly improve the error resilience of the coded bitstreams and the overall subjective quality of the decoded video. While some schemes do not require any specific knowledge of what is being done at the decoder in terms of error concealment [16,17,18], other approaches try to estimate the distortion experienced at the decoder given a certain probability of data corruption/loss and the concealment techniques adopted [2,22].

The problem with most video coding mode decision approaches, including typical intracoding refresh schemes, is that they can significantly decrease the coding efficiency if they make their decisions without taking into account the rate-distortion (RD) cost of such decisions. This problem can be dealt with by combining the error-resilient coding mode decisions with the video encoder rate control module [23], where the usual coding mode decisions are taken [24,25]. This way, coding-efficient error robustness can be achieved. In the specific case of intracoding refresh schemes, a clever solution for this combination is to compare the RD cost of coding macroblocks (MBs) in intra and inter modes; if the cost of intracoding is only slightly larger than the cost of intercoding, then the coding mode could be changed to intra, providing error robustness almost for free. This strategy is able to reduce error propagation and, thus, to increase error robustness when transmission errors occur, at a very limited RD cost increase and without the huge complexity of estimating the expected distortion experienced at the decoder.

Nevertheless, in order for these error-resilient video coding mode decision schemes to be really useful in an adaptive way, the current error characteristics of the underlying network being used for transmission should be taken into account. For example, in the case of intracoding refresh schemes, this will allow the bit rate resources allocated to intracoding refresh to be adequately adapted to the error characteristics of the network [26]. After all, networks with small amounts of channel errors only need small amounts of intracoding refresh and vice versa. Thus, efficient bit rate allocation in an error-resilient way has to depend on the feedback received from the network about its current error characteristics, which define the error robustness needed.

Therefore, network awareness makes it possible to dynamically vary the amount of error resilience resources to better suit the current state of the network and, therefore, further improve the decoded video quality without reducing the error robustness [26,27]. This problem is nowadays more relevant than ever, since more and more audiovisual content is accessed over error-prone networks, such as mobile networks, and these networks can have extremely varying error characteristics (over time).

As an illustrative example, this chapter presents a fully automatic network-aware MB intracoding refresh technique for error-resilient H.264/AVC video coding, which also dynamically adjusts the amount of cyclically intra refreshed MBs according to the network conditions, guaranteeing that endless error propagation is avoided.

The rest of the chapter is organized as follows. Section 1.2 describes the general video coding framework that was used for implementing the considered error-resilient network-aware MB intracoding refresh scheme. Section 1.3 introduces the concept of efficient intracoding refresh, which will later be needed in Section 1.4, where the considered network-aware intracoding refresh scheme itself is described. Section 1.5 presents some relevant performance results for the considered scheme in typical mobile network conditions and, finally, Section 1.6 concludes the chapter.

1.2 Video Coding Framework

The network-aware error-resilient scheme described in this chapter relies on the rate control scheme proposed by Li et al. [24,28], as well as on the RD optimization (RDO) framework and the random intra refresh technique included in the H.264/AVC reference software [25]. Since the main contributions and novelty of the network-aware error-resilient scheme described in this chapter regard the latter two techniques, it is useful to first briefly review the RDO and the random intra refresh techniques included in the H.264/AVC reference software in order for the reader to better understand the described solutions.

1.2.1 Rate-Distortion Optimization

The H.264/AVC video coding standard owes its major performance gains, relative to previous standards, essentially to the many different intra and inter MB coding modes supported by the video coding syntax. Although not all modes are allowed in every H.264/AVC profile [14], even for the simplest profiles, such as the Baseline Profile, the encoder has a plethora of possibilities to encode each MB, which makes it difficult to accomplish optimal MB coding mode decisions with low (encoding) complexity. Besides the MB coding mode decision, for motion-compensated inter coded MBs, finding the optimal motion vectors and MB partitions is also not a straightforward task. In this context, RDO becomes a powerful tool, allowing the encoder to optimally select the best MB coding modes and motion vectors (if applicable) [28,29].

In the H.264/AVC reference software [25], the best MB mode decision is accomplished through the RDO technique, where the best MB mode is selected by minimizing the following Lagrangian cost function:

$$J_{\mathrm{MODE}} = D(\mathrm{MODE}, \mathrm{QP}) + \lambda_{\mathrm{MODE}} \, R(\mathrm{MODE}, \mathrm{QP}) \qquad (1.1)$$

where

MODE is one of the allowable MB coding modes (e.g., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTRA 4 × 4, INTRA 16 × 16)

QP is the quantization parameter

D(MODE, QP) and R(MODE, QP) are, respectively, the distortion (between the original and the reconstructed MB) and the number of bits that will be achieved by applying the corresponding MODE and QP
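The Lagrangian mode decision of Equation 1.1 can be sketched in a few lines. This is a minimal illustration, not the reference software's implementation: the function name `best_mode`, the mode labels, and the distortion and rate figures are all hypothetical, and the multiplier is simply passed in as a number.

```python
from typing import Dict, Tuple


def best_mode(costs: Dict[str, Tuple[float, float]], lam: float) -> Tuple[str, float]:
    """Return the (mode, J) pair that minimizes J = D + lam * R.

    `costs` maps a mode label to (distortion, rate_in_bits), both assumed
    to have been measured at the current QP.
    """
    if not costs:
        raise ValueError("at least one candidate mode is required")
    best_name, best_j = "", float("inf")
    for name, (distortion, rate) in costs.items():
        j = distortion + lam * rate  # Lagrangian RD cost of this mode
        if j < best_j:
            best_name, best_j = name, j
    return best_name, best_j


# Hypothetical per-mode measurements for one MB: (distortion, bits).
example = {
    "SKIP": (900.0, 1.0),
    "INTER_16x16": (400.0, 40.0),
    "INTRA_4x4": (250.0, 120.0),
}

mode_high_lambda = best_mode(example, lam=5.0)  # rate is expensive
mode_low_lambda = best_mode(example, lam=0.5)   # rate is cheap
```

With a large multiplier the cheaper-to-code inter mode wins, while a small multiplier lets the lower-distortion intra mode win, which is the trade-off the multiplier is meant to control.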

In Ref. [28], it is recommended that, for intra (I) and inter predicted (P) slices, λMODE be computed as follows:

(1.2)

Motion estimation can also be accomplished through the same framework. In this case, the best motion vector and reference frame can be selected by minimizing the following Lagrangian cost function:

$$J_{\mathrm{MOTION}} = D(\mathrm{mv}(\mathrm{REF})) + \lambda_{\mathrm{MOTION}} \, R(\mathrm{mv}(\mathrm{REF})) \qquad (1.3)$$

where

mv(REF) is the motion vector for the reference frame REF

D(mv(REF)) is the residual error measure, such as the sum of absolute differences (SAD) between the original and the reference

R(mv(REF)) is the number of bits necessary to encode the corresponding motion vector (i.e., the motion vector difference between the selected motion vector and its prediction) and to signal the selected reference frame

In a similar way, Ref. [28] also recommends that, for P-slices, λMOTION be computed as

(1.4)

when the SAD measure is used.
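As a rough illustration of how the two multipliers depend on QP, the sketch below uses the forms commonly quoted for the H.264/AVC reference software: λMODE = 0.85 · 2^((QP − 12)/3) and, with SAD, λMOTION = √λMODE. These expressions are an assumption here rather than something taken from the text above, and the function names are hypothetical.

```python
import math


def lambda_mode(qp: int) -> float:
    """Assumed form: lambda_MODE = 0.85 * 2 ** ((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def lambda_motion_sad(qp: int) -> float:
    """Assumed form for SAD-based motion search: sqrt(lambda_MODE)."""
    return math.sqrt(lambda_mode(qp))


def motion_cost(sad: float, mv_bits: float, qp: int) -> float:
    """Lagrangian motion cost: SAD plus weighted motion vector bits."""
    return sad + lambda_motion_sad(qp) * mv_bits


# Under the assumed form, raising QP by 3 doubles lambda_MODE, so coarser
# quantization makes every extra bit weigh more in the mode decision.
ratio = lambda_mode(33) / lambda_mode(30)
```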

Since the quantization parameter is required for computing the Lagrangian multipliers λMODE and λMOTION, as well as for computing the number of bits to encode the residue for a given MB, a rate control mechanism must be used that can efficiently compute for each MB (or set of MBs, such as a slice) an adequate quantization parameter in order to maximize the decoded video quality for a given bit rate budget. In this case, the method proposed by Li et al. [24,28] has been used since it is the one implemented in the H.264/AVC reference software [25].

1.2.2 Random Intra Refresh

As mentioned earlier, the H.264/AVC reference software [25] includes a (nonnormative) technique for intra refreshing MBs. Although this technique is called random intra refresh (RIR), it is not really a purely random refresh technique. This technique is basically a cyclic intra refresh (CIR) technique for which the refresh order is not simply the raster scan order. The refresh order is randomly defined once before encoding, but afterward intra refresh proceeds cyclically, following the determined order, with n MBs for each time instant. An example of a randomly determined intra refresh order, for QCIF spatial resolution, may be seen in Figure 1.1.

Figure 1.1 Example of random intra refresh order for QCIF spatial resolution. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing,

Since the RIR technique used in the H.264/AVC reference software and also considered here is basically a CIR technique, in the remainder of this chapter, the acronyms RIR and CIR will be used interchangeably.

One of the main advantages of this technique is that, being cyclic, it guarantees that all MBs will be refreshed at least once in each cycle, thus guaranteeing that there are no MBs where errors can propagate indefinitely. However, this technique also has disadvantages, one of which is the fact that all MBs are refreshed exactly the same number of times. This basically means that it is not possible to refresh more often the MBs that are more likely to be lost or are harder to conceal at the decoder if an error does occur.
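The refresh pattern just described, a random order fixed once followed by cyclic refresh of n MBs per time instant, can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not code from the reference software: the generator name, the seed, and the choice of n = 9 are hypothetical, and 99 is the number of 16 × 16 MBs in a 176 × 144 QCIF frame.

```python
import itertools
import random
from typing import Iterator, List


def random_cyclic_refresh(num_mbs: int, n_per_frame: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield, frame after frame, the MB indices to be intra refreshed."""
    if not 0 < n_per_frame <= num_mbs:
        raise ValueError("n_per_frame must be between 1 and num_mbs")
    order = list(range(num_mbs))
    random.Random(seed).shuffle(order)  # order is drawn once, before encoding
    pos = 0
    while True:
        # Take the next n entries of the fixed order, wrapping around cyclically.
        yield [order[(pos + k) % num_mbs] for k in range(n_per_frame)]
        pos = (pos + n_per_frame) % num_mbs


QCIF_MBS = (176 // 16) * (144 // 16)  # 11 x 9 = 99 macroblocks

# With 9 MBs per frame, one full cycle takes 11 frames.
one_cycle = list(itertools.islice(random_cyclic_refresh(QCIF_MBS, 9, seed=1), 11))
refreshed = sorted(mb for frame in one_cycle for mb in frame)
```

Within one cycle every MB index appears exactly once, which reflects both the guarantee (no MB is left unrefreshed) and the limitation (no MB can be refreshed more often than another).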

Another important aspect of this technique is that MBs are refreshed according to the predetermined order, without taking into account the RD cost of intra refreshing a given MB, as opposed to letting the rate control module decide which encoding mode is best in terms of RD cost. This is exactly where there is room for improvement: intra refresh should be performed by taking into account the RD cost of a given MB.


1.3 Efficient Intracoding Refresh

When deciding the best MB coding mode, notably between inter- and intracoding modes, the RDO framework, as briefly described in Section 1.2.1, simply selects the mode that has the lower RD cost, given by Equation 1.1. This RDO framework, as implemented in the H.264/AVC reference software, does not take into account other dimensions, besides rate and distortion optimization, such as the robustness of the bitstream in error-prone environments. Therefore, some MBs are simply inter coded because their best inter mode RD cost is slightly lower than the best intra mode RD cost. For these cases, selecting the intra mode, although not optimal in a strict RD sense, can prove to be a much better decision when the bitstream becomes corrupted by errors (e.g., due to packet losses in packet networks), and the intra coded MBs can be used to stop error propagation due to the (temporal) predictive coding modes. Moreover, if additional error robustness is introduced through an intra refresh technique, for example, as the one described in Section 1.2.2, some MBs can be highly penalized in an RD sense, since they can be blindly forced to be encoded in an intra mode, without taking into account the RD cost of that decision.

1.3.1 Error-Resilient RDO-Driven Intra Refresh

The main idea of a network-aware error-resilient scheme is to perform RDO in a resilient manner, using the relative RD cost of the best intra mode and the best inter mode for each MB. Therefore, whenever coding a given MB in intra mode does not cost significantly more than the best intercoding mode, the given MB is gracefully forced to be encoded in its best intra mode.

This error-resilient RDO provides an efficient intra refresh scheme, thus guaranteeing that the generated bitstream will be more robust to channel errors, without having to spend a lot of bits on intra coded MBs, which typically reduces the decoded video quality when there are no errors in the channel. This scheme can be described through the MB-level mode decision architecture depicted in Figure 1.2.

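FIGURE 1.2

Architecture of the error-resilient MB intra/inter mode decision scheme. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission © 2008 IEEE.)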


1.3.1.1 RDO Intra and Inter Mode Decision

Before deciding the best mode to encode a given MB, the best inter mode RD cost, J_INTER, is computed from the set of all possible inter modes, and the best intra mode RD cost, J_INTRA, is computed from the set of all possible intra modes through RDO, i.e., Equations 1.1 and 1.3, where

$$J_{\mathrm{INTER}} = \min_{m \in S_{\mathrm{INTER}}} J(m) \qquad (1.5)$$

and

$$J_{\mathrm{INTRA}} = \min_{m \in S_{\mathrm{INTRA}}} J(m) \qquad (1.6)$$

where

S_INTER is the set of inter modes (…, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, and INTER 4 × 4)

S_INTRA is the set of intra modes (…, PCM, or INTRA 8 × 8)

The best intra and inter modes are the ones with the lowest intra and inter RD costs, respectively.

1.3.1.2 Error-Resilient Intra/Inter Mode Decision

To control the number of MBs that will be gracefully forced to be encoded in intra mode, a control parameter, αRD (which basically specifies the tolerable RD cost increase for replacing an inter by an intra MB), is used in such a way that

$$\text{MB mode} = \begin{cases} \text{best intra mode}, & \text{if } J_{\mathrm{INTRA}} \le \alpha_{RD} \cdot J_{\mathrm{INTER}} \\ \text{best inter mode}, & \text{otherwise} \end{cases}$$

Notice that, for αRD = 1, no particular mode is favored in an RD sense, while for αRD > 1, the intra modes are favored relative to the inter modes (see Figure 1.3). Therefore, the number of gracefully forced intra encoded MBs can be controlled by the αRD parameter. The MBs that end up being forced to intra mode are the MBs for which the RD costs of intra and inter modes are similar, which typically correspond to MBs that have high inter RD cost and, therefore, would be difficult to conceal at the decoder if lost.
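The αRD-controlled decision can be illustrated with a minimal Python sketch. It assumes that an MB is forced to intra mode whenever its best intra RD cost does not exceed αRD times its best inter RD cost; the comparison operator, the function name `select_mb_mode`, and the cost values are illustrative assumptions, not taken from the reference software.

```python
from typing import List, Tuple


def select_mb_mode(j_intra: float, j_inter: float, alpha_rd: float = 1.0) -> str:
    """Choose between the best intra and the best inter mode of one MB.

    j_intra and j_inter are the lowest RD costs found among the intra and
    inter modes. alpha_rd is the tolerated RD cost increase: 1.0 gives the
    plain RDO decision, larger values force more MBs to intra mode.
    """
    if alpha_rd < 1.0:
        raise ValueError("alpha_rd is expected to be >= 1.0")
    # Assumed rule: intra is chosen when it costs at most alpha_rd times inter.
    return "intra" if j_intra <= alpha_rd * j_inter else "inter"


# Hypothetical (J_INTRA, J_INTER) pairs for three MBs, for illustration only.
costs: List[Tuple[float, float]] = [
    (900.0, 1000.0),
    (1150.0, 1000.0),
    (3000.0, 1000.0),
]
for alpha in (1.0, 1.2):
    print(alpha, [select_mb_mode(ji, jp, alpha) for ji, jp in costs])
```

With αRD = 1.2 only the second MB changes mode: its intra cost is 15% above its inter cost, which is within the tolerated increase, whereas the third MB remains inter coded because intracoding it would be far too expensive.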


1.3.2 Random Intra Refresh

Notice that the previous scheme does not guarantee that all MBs are periodically refreshed, which, if not properly handled, could lead to an endless propagation of errors over time for some MBs in the video sequence. To handle this issue, an RIR can also be concurrently applied, but with a lower number of refreshed MBs per frame than when the RIR technique is applied alone, so as not to compromise the RD efficiency dramatically.

FIGURE 1.3

MBs with an intra/inter RD cost ratio below the line will be gracefully forced to intra mode. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission © 2008 IEEE.)

1.4 Network-Aware Error-Resilient Video Coding Method

The main limitation of the MB coding mode decision method described in Section 1.3 is that the control parameter, αRD, is not dynamically adapted to the actual network error conditions. However, when feedback about the network error conditions is available, it is possible to use this information to adjust the αRD control parameter in order to maximize the decoded video quality while dynamically providing adequate error resilience.

1.4.1 Intra/Inter Mode Decision with Constant αRD

When a constant αRD value is used without considering the current network error conditions in terms of packet loss rate (PLR), the benefits of the technique described in Section 1.3 (and proposed in Ref. [23]) are not fully exploited. This is clear from Figure 1.4, where the Foreman sequence has been encoded with the Baseline Profile of H.264/AVC with different αRD values, including αRD = 1.

In Figure 1.4, as well as in the remainder of Section 1.4, CIR is not used in order to avoid biasing the behavior associated with the αRD parameter. Notice, however, that the use of CIR is typically recommended, as mentioned in Section 1.2.2. As can be seen, in these conditions, the optimal αRD (i.e., the one that leads to the highest PSNR) is highly dependent on the network PLR.

FIGURE 1.4

PSNR versus PLR for a constant αRD parameter for the Foreman sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

As expected, when there are no errors (PLR = 0%), the highest decoding quality is achieved when no intra MBs are forced (i.e., αRD = 1.0). However, for this αRD value, the decoded video quality decays very rapidly as the PLR increases. On the other hand, if only a small number of intra MBs is forced (i.e., αRD = 1.8), the decoded video quality is slightly improved for the higher PLR values, when compared to the case with no forced intra MBs, but is slightly penalized for error-free transmission. This effect becomes more evident as the αRD value increases, which corresponds to the situation where more and more intra MBs are gracefully forced. For example, for αRD = 3.8 and a PLR of 10%, the decoded video quality is highly improved relative to the situation with no forced intra MBs (i.e., by 6.36 dB), because the error propagation is significantly reduced. However, for lower PLRs, still with αRD = 3.8, the decoded video quality is penalized due to the excessive use of intracoding (i.e., by 7.19 dB for PLR = 0% and 1.50 dB for PLR = 1%).

Therefore, from what has been presented earlier, it is possible to conclude that the optimal amount of intra coded MBs is highly dependent on the error characteristics of the underlying network and, thus, the error resilience control parameter αRD should be dynamically adjusted to the channel error conditions to maximize the decoded quality.

FIGURE 1.5

PSNR versus the αRD parameter (alpha in the x-axis label) for various PLRs for the Mother and Daughter sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

In order to illustrate the influence of the αRD parameter on the decoded PSNR, Figure 1.5 shows the decoded video quality, in terms of PSNR, versus the αRD parameter for several PLRs for the Mother and Daughter sequence (QCIF, 10 Hz) encoded at 64 kbit/s. Clearly, for each PLR condition, there is an αRD value that maximizes the decoded video quality. For example, for a PLR of 10%, the maximum PSNR value is achieved for αRD = 2.2. To further illustrate the importance of a proper selection of the αRD parameter and how it can significantly improve the overall decoded video quality under severe error conditions, it should be noted that, for a PLR of 10%, the PSNR difference between having αRD = 2.2 and αRD = 1.1 is 5.47 dB.

1.4.2 Intra/Inter Mode Decision with Network-Aware αRD Selection

A possible approach to address the problem of adapting the αRD parameter to the channel error conditions is to use the information in the receiver reports (RR) of the real-time transport protocol (RTP) control protocol (RTCP) [30] to provide the encoder with the actual error characteristics of the underlying network. This makes it possible to adaptively and efficiently select the amount of intra coded MBs to be inserted in each frame by taking into account this feedback information about the rate of lost packets, as shown in Figure 1.6.
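As a rough illustration of how such feedback could be turned into a PLR value, the sketch below converts the 8-bit "fraction lost" field of an RTCP receiver report into a percentage. In RTP this field is a fixed-point number equal to the integer part of the loss fraction multiplied by 256. Whether the method described here relies on this particular field is an assumption, and the function names are hypothetical.

```python
def plr_percent_from_fraction_lost(fraction_lost: int) -> float:
    """Convert the 8-bit RTCP 'fraction lost' field to a loss rate in percent."""
    if not 0 <= fraction_lost <= 255:
        raise ValueError("fraction lost is an 8-bit field (0..255)")
    # The field stores floor(loss_fraction * 256).
    return 100.0 * fraction_lost / 256.0


def plr_percent_from_counts(lost: int, expected: int) -> float:
    """Loss rate in percent from packet counts over one reporting interval."""
    if expected <= 0:
        return 0.0
    # Duplicated packets can make 'lost' negative; treat that as no loss.
    return 100.0 * max(lost, 0) / expected


print(plr_percent_from_fraction_lost(26))   # a field value of 26 is roughly 10% loss
print(plr_percent_from_counts(5, 100))
```

Because of the 1/256 granularity, loss rates below about 0.4% are reported as zero by this field, so an encoder that needs finer resolution could average several reports or use the packet counts instead.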


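FIGURE 1.6

Network-aware video encoding architecture. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)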

In the method presented here, the intra/inter mode decision is still based on the αRD parameter, but this time αRD may depend on several aspects, such as the content type, the content spatial and temporal resolutions, the coding bit rate, and the PLR of the network.

This way, by considering a mapping function f_NMD, it will be possible to dynamically determine the αRD parameter from the following expression:

$$\alpha_{RD} = f_{\mathrm{NMD}}(S, PLR) \qquad (1.9)$$

where

PLR is the packet loss rate

S can be an n-dimensional vector characterizing the encoding scenario, for example, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate

In this work, however, as will be shown later in Section 1.4.3, the encoding scenario can be characterized solely by the encoded bit rate with a good approximation. The f_NMD function basically maps the encoding scenario and the network PLR into a "good" αRD parameter that dynamically maximizes the average decoded video quality. Notice that, although it is not easy to obtain a general function, it can be defined for several classes of content and a discrete limited set of encoding parameters and PLRs. In this chapter, it will be shown that, by carefully designing the f_NMD function, significant gains can be obtained in terms of video quality relative to the reference method described in Section 1.4.4.

Therefore, the network-aware MB mode decision (NMD) method can be briefly described through the following steps in terms of encoder operation:

1. Obtain the packet loss rate through network feedback.

2. Compute the αRD parameter through the mapping function given by Equation 1.9 (and detailed in the following).

3. Perform intra/inter mode decision using the αRD parameter, computed in Step 2, for the next MB to be encoded, and encode the MB.

4. Check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 3.


Notice that it is out of the scope of this chapter to define when the network reports are issued, since this will depend on how the network protocols are configured and the varying characteristics of the network itself [30]. Nevertheless, in real application scenarios, it is important to design appropriate interfacing mechanisms between the codec and the underlying network, so that both encoder and decoder can adaptively adjust their operations according to the network conditions [12].

Through Equation 1.9, the encoder is able to adjust the amount of intra refresh according to the network error conditions and the available bit rate. This intra refresh method typically increases the intra refresh for the more complex MBs, which are those typically more difficult to conceal. The main problem of this approach is that it does not guarantee that all MBs in the scene are refreshed. This is clearly illustrated in Figure 1.7 for the Foreman sequence, where the right image represents the relative amount of MB intra refresh along the sequence (lighter blocks mean more intra refresh). As can be seen, with this intra refresh scheme some MBs are never refreshed, which can lead to errors propagating indefinitely over time in these MB positions (dark blocks in Figure 1.7).

1.4.3 Model for the f_NMD Mapping Function

In order to devise a model for the mapping function f_NMD defined in Equation 1.9, it is first important to see how the optimal αRD parameter varies with PLR. This is plotted in Figure 1.8 for three different sequences (i.e., Mother and Daughter, Foreman, and Mobile and Calendar) encoded at different bit rates and resolutions, for illustrative purposes. Each curve in Figure 1.8 corresponds to a different encoding scenario S, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate (see Equation 1.9). As shall be detailed later in Section 1.5, these three sequences have also been encoded at many other bit rates, and the kind of curves obtained was always similar.

FIGURE 1.7

Relative amount of intra refresh (b) for the MBs of the Foreman sequence (a) (QCIF, 15 Hz, 128 kbit/s, and αRD = 1.1). (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission © 2009 IEEE.)

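FIGURE 1.8

Example of optimal αRD versus PLR for various sequences and bit rates. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)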

As can be seen from the plots in Figure 1.8, the behavior of the optimal αRD parameter versus the PLR is similar to that of a charging capacitor [31] (but starting at αRD = 1.0). Therefore, for a given sequence and for a given bit rate (i.e., a given encoding scenario S), it should be possible to model the behavior of the αRD parameter with respect to the PLR with the following expression:

$$\alpha_{RD} = 1 + K_1 \left(1 - e^{-K_2 \cdot PLR}\right) \qquad (1.10)$$

where PLR represents the packet loss rate, while K1 and K2 represent constants that are specific to the considered encoding scenario, notably the sequence characteristics and bit rate. However, the main problem in using Equation 1.10 to compute αRD is that, for a given sequence, a different set of K1 and K2 would be needed for each of the considered bit rates, which would be extremely impractical. In order to address this issue, it is important to understand how the optimal αRD parameter varies when both the PLR and the bit rate vary. This variation is illustrated in Figure 1.9 for the Mobile and Calendar sequence.


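FIGURE 1.9

Optimal αRD versus PLR and bit rate for the Mobile and Calendar sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)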

After close inspection of Figure 1.9, it can be seen that the K1 value, which basically dictates the value of αRD toward which the curve asymptotically converges, depends linearly on the used bit rate and, therefore, it can be modeled by the following expression:

$$K_1 = a \cdot r_b + b \qquad (1.11)$$

where r_b is the bit rate, while a and b are the parameters that need to be estimated for a given sequence.

As for the K2 value, which dictates the growth rate of the considered exponential, it appears, after exhaustive testing, not to depend on the used bit rate. Therefore, as a first approach, it can be considered to be constant, as in

$$K_2 = c \qquad (1.12)$$

This behavior was observed for the three different video sequences mentioned earlier and, therefore, makes it possible to establish a final expression which allows the video encoder to automatically select, for a given sequence, an adequate αRD parameter when the PLR and the bit rate r_b are known:

$$\alpha_{RD} = f_{\mathrm{NMD}}(r_b, PLR) = 1 + (a \cdot r_b + b)\left(1 - e^{-c \cdot PLR}\right) \qquad (1.13)$$

where a, b, and c are the model parameters that need to be estimated (see Ref. [26]). After extensive experimentation, it was found that the parameters a, b, and c can be considered more or less independent of the sequence, which means that a single set of parameters could be used for three different video sequences with a low fitting error. This basically means that the encoding scenario S, defined in Section 1.4.2, can be well represented only by the bit rate r_b.

As explained in Ref. [26], the parameters a, b, and c could be obtained by considering four packet loss rates and two different bit rates for three different sequences, corresponding to a total of 24 (r_b, PLR) pairs, with the iterative Levenberg–Marquardt method [32,33]. By following this approach, the estimated parameters are a = 0.83 × 10^−6, b = 0.97, and c = 0.90.
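The mapping can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "charging capacitor" form starting at αRD = 1.0 is taken to be 1 + K1(1 − exp(−K2 · PLR)), the bit rate is assumed to be expressed in bit/s and the PLR in percent, and the function name `f_nmd` is hypothetical. Only the values of a, b, and c come from the text.

```python
import math

# Parameter values reported in the text.
A, B, C = 0.83e-6, 0.97, 0.90


def f_nmd(bit_rate_bps: float, plr_percent: float,
          a: float = A, b: float = B, c: float = C) -> float:
    """Map (bit rate, PLR) to an alpha_RD value.

    Assumed units: bit rate in bit/s, PLR in percent.
    """
    k1 = a * bit_rate_bps + b   # linear dependence on the bit rate
    k2 = c                      # growth rate, assumed constant
    # Saturating exponential that starts at 1.0 for PLR = 0.
    return 1.0 + k1 * (1.0 - math.exp(-k2 * plr_percent))


for plr in (0.0, 1.0, 5.0, 10.0):
    print(f"PLR = {plr:4.1f}%  alpha_RD = {f_nmd(64_000, plr):.2f}")
```

Under these assumptions the model returns exactly 1.0 when there are no losses and saturates at about 2.0 for 64 kbit/s, which is in the same range as the optimal values reported for the Mother and Daughter sequence.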

1.4.4 Network-Aware Cyclic Intra Refresh

The approach presented in Section 1.4.2 can also be followed to simply adjust the number of cyclic intra refreshed MBs per frame, based on the feedback received about the network PLR, without any RD cost considerations. This is shown in Figure 1.10, where it is clear that for each PLR condition there is a number of cyclic intra refresh MBs that maximizes the decoded video quality. However, when comparing the best PSNR results of Figures 1.5 and 1.10 (both obtained for the Mother and Daughter sequence encoded with the same spatial and temporal resolutions and the same bit rate), for a given PLR, the PSNR values obtained by varying αRD are always higher. For example, for a PLR of 5%, a maximum average PSNR of 37.03 dB is achieved for αRD = 1.9 (see Figure 1.5), while a maximum PSNR of only 34.94 dB is achieved for 33 cyclically intra refreshed MBs in each frame (see Figure 1.10), a difference of approximately 2 dB. This shows that by adequately choosing the αRD parameter it should be possible to achieve a higher quality than when using the optimal number of CIR MBs. This is mainly due to the fact that, when simply cyclically intra refreshing some MBs in a given frame, the additional RD cost of that decision can be extremely high, penalizing the overall video quality, since the "cheap" intra MBs are not looked for as in the efficient intracoding refresh solution based on the αRD parameter.


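FIGURE 1.10

PSNR versus number of CIR MBs for various PLRs for the Mother and Daughter sequence. (From Soares, L.D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)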

1.4.5 Intra Refresh with Network-Aware αRD and CIR Selection

The main drawback of the scheme described in Section 1.4.3, that of not being able to guarantee that all MBs are periodically refreshed, can be alleviated by introducing some additional CIR MBs per frame to guarantee that all MB positions are refreshed with a minimum periodicity. This requirement raises the question of how to adaptively select an adequate amount of CIR MBs that is sufficiently high to avoid long-term error propagation without penalizing the encoder RD performance too much.

A possible approach to tackle this problem is to decide the adequate αRD value and the number of CIR MBs per frame separately, using a different model for each of these two error resilience parameters. For the αRD selection, the model in Equation 1.9 is used. As for the selection of the number of CIR MBs, it was verified after exhaustive testing [27] that the optimal amount of CIR MBs tends to increase linearly with the bit rate r_b, for a given PLR, but tends to increase exponentially with the PLR, for a given bit rate. Based on these observations, the following model was considered for the selection of the amount of CIR MBs per frame:

$$f_{\mathrm{CIR}}(r_b, PLR) = (a_1 \cdot r_b + b_1)\, e^{c_1 \cdot PLR} \qquad (1.14)$$

where a1, b1, and c1 are the model parameters that need to be estimated. In Ref. [27], these parameters have been determined by nonlinear curve fitting (the Levenberg–Marquardt method) of the optimal amount of CIR MBs per frame, experimentally determined for a set of representative test sequences, encoding bit rate ranges, and packet loss rates. The estimated parameters were a1 = 12.97 × 10^−6, b1 = −0.13, and c1 = 0.24; these parameter values will also be considered here.
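A small Python sketch of this selection follows. It assumes the model is linear in the bit rate and exponential in the PLR, i.e., (a1 · r_b + b1) · exp(c1 · PLR), with the bit rate in bit/s and the PLR in percent; the functional form, the units, the clamping to the valid range, and the name `f_cir` are all assumptions made for illustration. Only the parameter values are taken from the text.

```python
import math

# Parameter values reported in the text.
A1, B1, C1 = 12.97e-6, -0.13, 0.24


def f_cir(bit_rate_bps: float, plr_percent: float, mbs_per_frame: int,
          a1: float = A1, b1: float = B1, c1: float = C1) -> int:
    """Number of CIR MBs per frame for a given bit rate and PLR.

    Assumed units: bit rate in bit/s, PLR in percent.
    """
    raw = (a1 * bit_rate_bps + b1) * math.exp(c1 * plr_percent)
    rounded = math.floor(raw + 0.5)  # round to the nearest integer
    # Clamping is an added safeguard: the count cannot be negative or
    # exceed the number of MBs in a frame.
    return max(0, min(mbs_per_frame, rounded))


# QCIF frames have 99 MBs and CIF frames have 396 MBs.
print(f_cir(64_000, 10.0, 99))
print(f_cir(1_152_000, 10.0, 396))
```

The exponential term makes the refresh rate grow quickly with the loss rate, so the clamp matters mainly for loss rates well above those considered in the text.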

Figure 1.11 shows the proposed model as well as the experimental data for the Mobile and Calendar test sequence. As can be seen, a simple linear model would not have represented the experimental data well.

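FIGURE 1.11

Optimal amount of CIR MBs per frame versus PLR and bit rate for the Mobile and Calendar sequence. (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009.)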

The CIR order is randomly defined once before encoding, as described in Section 1.2.2 (and in Ref. [25]), to avoid the subjectively disturbing effect of performing sequential (e.g., raster scan) refresh. The determined order is then cyclically followed with the computed number of MBs being refreshed in each frame.
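The random-order cyclic refresh can be sketched as follows. This is a minimal illustration, not the reference software implementation: the function names, the seed, and the way the position in the cycle is carried from frame to frame are assumptions.

```python
import random
from typing import List, Tuple


def make_cir_order(num_mbs: int, seed: int = 0) -> List[int]:
    """Draw the refresh order once, before encoding starts."""
    order = list(range(num_mbs))
    random.Random(seed).shuffle(order)
    return order


def next_cir_mbs(order: List[int], start: int, count: int) -> Tuple[List[int], int]:
    """Return the MBs to refresh in this frame and the next start position."""
    if not order:
        return [], 0
    count = min(count, len(order))
    picked = [order[(start + i) % len(order)] for i in range(count)]
    return picked, (start + count) % len(order)


# QCIF example: 99 MBs per frame, 11 of them refreshed in each frame.
order = make_cir_order(99, seed=1)
position = 0
refreshed = set()
for _ in range(9):
    mbs, position = next_cir_mbs(order, position, 11)
    refreshed.update(mbs)
print(len(refreshed))  # every MB position has been refreshed after 9 frames
```

Because the position simply wraps around, the number of MBs refreshed per frame can change between frames (for instance after a new feedback report) without breaking the guarantee that every position is eventually visited.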

Therefore, the complete network-aware MB intracoding refresh (NIR) scheme (which was initially proposed in Ref. [27]) can be briefly described by the following steps in terms of encoder operation:

Step 1. Obtain the PLR value through network feedback.

Step 2. Compute the number of CIR MBs to be used per frame, by using the proposed f_CIR function defined by Equation 1.14 and rounding it to the nearest integer.

Step 3. Compute the αRD value by using the f_NMD function defined by Equation 1.9 in Section 1.4.2.

Step 4. For each MB in a frame, check if it should be forced to intra mode according to the CIR order and the determined number of CIR MBs per frame; if not, perform intra/inter mode decision using the αRD value computed in Step 3; encode the MB with the selected mode.

Step 5. At the end of the frame, check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 4.
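Step 4 can be sketched for a single frame as shown below. The sketch assumes that the set of CIR MB positions for the frame and the αRD value have already been obtained in Steps 1 to 3, and that an MB is forced to intra mode by the RD criterion when its intra cost does not exceed αRD times its inter cost. The function name, the mode labels, and the cost values are hypothetical.

```python
from typing import List, Set, Tuple


def decide_frame_modes(costs: List[Tuple[float, float]],
                       cir_mbs: Set[int],
                       alpha_rd: float) -> List[str]:
    """Decide the coding mode of every MB in one frame.

    costs[i] holds (J_INTRA, J_INTER), the best intra and inter RD costs
    of MB i; cir_mbs holds the MB positions scheduled for cyclic refresh.
    """
    modes = []
    for mb_index, (j_intra, j_inter) in enumerate(costs):
        if mb_index in cir_mbs:
            modes.append("intra (CIR)")   # forced by the cyclic refresh
        elif j_intra <= alpha_rd * j_inter:
            modes.append("intra (RD)")    # gracefully forced by alpha_RD
        else:
            modes.append("inter")
    return modes


# Hypothetical RD costs for a frame of four MBs, for illustration only.
costs = [(1500.0, 1000.0), (1100.0, 1000.0), (2500.0, 1000.0), (900.0, 1000.0)]
print(decide_frame_modes(costs, cir_mbs={2}, alpha_rd=1.2))
# ['inter', 'intra (RD)', 'intra (CIR)', 'intra (RD)']
```

The third MB shows why the two mechanisms complement each other: its intra cost is far above the αRD tolerance, so only the cyclic refresh ever forces it to intra mode.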

The definition of when the network reports are issued depends on how the network protocols are configured and the varying characteristics of the network itself [34].

Notice that independently selecting the αRD value and the amount of CIR MBs, while they are likely interdependent, can lead to chosen values that do not correspond to the optimal (αRD, CIR) pair. However, it has been verified after extensive experimentation that the considered independent selection process is still robust in the sense that the chosen values are typically close enough to the optimal pair and, therefore, the overall performance is not dramatically penalized.

1.5 Performance Evaluation

To evaluate the performance of the complete NIR scheme described in this chapter, it has been compared in similar conditions to a reference intra refresh scheme, which basically corresponds to the network-aware version of the cyclic intra refresh scheme of the H.264/AVC reference software [25] described in Section 1.4.4. This solution has been adopted because, at the time of writing, no other network-aware intra refresh techniques that adaptively take into account the current network conditions were known.

In the reference scheme, the optimal number of CIR MBs per frame is selected manually for the considered network conditions, while in the considered NIR solution, the selection of the amount of CIR MBs per frame and the αRD parameter is done fully automatically. For the complete NIR and reference schemes, the Mother and Daughter, the Foreman, and the Mobile and Calendar video sequences have been encoded using the H.264/AVC Baseline Profile [25]. The test conditions used, which are representative of those currently used for personal communications over mobile networks, are summarized in Table 1.1. For QCIF, each frame was divided into three slices, while for CIF each frame was divided into six slices. In both cases, each slice consists of three MB rows. After encoding, each slice was mapped to an RTP packet for network transmission [34].
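The slice layout can be checked with a short Python sketch. It relies on the standard frame sizes (QCIF is 176 × 144 and CIF is 352 × 288 luminance samples) and on 16 × 16 MBs; the function name `slice_layout` is hypothetical.

```python
from typing import List


def slice_layout(width: int, height: int,
                 rows_per_slice: int = 3, mb_size: int = 16) -> List[range]:
    """Return, for each slice, the range of MB indices (raster order) it covers."""
    mb_cols = width // mb_size
    mb_rows = height // mb_size
    slices = []
    for first_row in range(0, mb_rows, rows_per_slice):
        last_row = min(first_row + rows_per_slice, mb_rows)
        slices.append(range(first_row * mb_cols, last_row * mb_cols))
    return slices


for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
    layout = slice_layout(w, h)
    print(name, len(layout), "slices of", len(layout[0]), "MBs")
```

Since each slice travels in its own RTP packet, a single lost packet removes one third of a QCIF frame (33 MBs) or one sixth of a CIF frame (66 MBs).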

TABLE 1.1

Test Conditions

Sequence             Mother and Daughter    Foreman    Mobile and Calendar

Bit rate (kbit/s)    24–64                  48–128     384–1152

Source: Nunes, P., Soares, L.D., and Pereira, F., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission © 2008 IEEE.

For the reference scheme, the number of cyclically intra refreshed MBs per frame was chosen for each PLR and bit rate, such that the decoded video quality would be the best possible. This was done manually by performing an exhaustive set of tests using many different amounts of CIR MBs per frame and then choosing the one that leads to the highest decoded average PSNR value, obtained by averaging over 50 different error patterns. For the QCIF video sequences, the possible values for the number of cyclically intra refreshed MBs were chosen from the representative set {0, 5, 11, 22, 33, …, 99}, while for the CIF video sequences the representative set consisted of {0, 22, 44, 66, …, 396}.

To simulate the network conditions, three different PLRs were considered: 1%, 5%, and 10%. Since each slice is mapped to one RTP packet, each lost packet will correspond to a lost video slice. Packet losses are considered independent and identically distributed. For each one of the studied PLRs, each coded bitstream has been corrupted and then decoded 50 times (i.e., corresponding to 50 different error patterns or runs), while applying the default error concealment technique implemented in the H.264/AVC reference software [25,28]. The presented results correspond to PSNR averages of these 50 different runs for the luminance component (PSNR Y).
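The loss simulation can be sketched in Python as follows. The sketch only generates independent and identically distributed loss patterns and provides a luminance PSNR helper; decoding and concealment are outside its scope. The function names, the seeds, the number of packets per run, and the assumption of 8-bit samples (peak value 255) are illustrative choices, not details from the experiments.

```python
import math
import random
from typing import List


def loss_pattern(num_packets: int, plr_percent: float,
                 rng: random.Random) -> List[bool]:
    """Return one flag per packet; True means the packet (slice) is lost."""
    p = plr_percent / 100.0
    # Each packet is lost independently with the same probability p.
    return [rng.random() < p for _ in range(num_packets)]


def psnr_y(mse: float, peak: float = 255.0) -> float:
    """Luminance PSNR in dB from the mean squared error (8-bit samples assumed)."""
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)


NUM_RUNS = 50          # number of error patterns per test condition
NUM_PACKETS = 300      # hypothetical: 100 QCIF frames with three slices each
patterns = [loss_pattern(NUM_PACKETS, 5.0, random.Random(run))
            for run in range(NUM_RUNS)]
observed = sum(sum(p) / len(p) for p in patterns) / NUM_RUNS
print(f"average observed loss rate over {NUM_RUNS} runs: {100 * observed:.2f}%")
```

Averaging over many patterns matters because, with only a few hundred packets per run, the loss rate actually observed in a single run can deviate noticeably from the nominal PLR.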

For the conditions mentioned earlier, PSNR Y results are shown in Tables 1.2 through 1.4 for the Mother and Daughter, Foreman, and Mobile and Calendar video sequences, respectively. In these tables, NIR refers to the complete network-aware intracoding refresh scheme described in this chapter, and JM refers to the reference technique (winning cases appear in bold). In addition, OPT corresponds to the manual selection of the best (αRD, CIR) pair.

TABLE 1.2

PSNR Results for the Mother and Daughter Sequence


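Source: From Nunes, P., Soares, L.D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009.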

TABLE 1.3

PSNR Results for the Foreman Sequence

TABLE 1.4

PSNR Results for the Mobile and Calendar Sequence


No visual results are given here, because the direct comparison of peer frames (encoded with different coding mode selection schemes) is rather meaningless in this case; only the comparison of the total video quality for several error patterns makes sense. This is due to the fact that the generated streams for the proposed and the reference techniques are different and, even if the same error pattern is used to corrupt them, the errors will affect different parts of the data at a given time instant, causing very different artifacts.

To help the reader better appreciate the gains obtained with the proposed technique, the results obtained for the Mother and Daughter sequence are also plotted in Figure 1.12, for both JM and NIR. For the Foreman and the Mobile and Calendar sequences, the trends are similar.


FIGURE 1.12

PSNR results for the Mother and Daughter sequence. (From Nunes, P., Soares, L.D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009.)

The presented results show that, when the fully automatic NIR scheme is used, the decoded video quality is significantly improved for the vast majority of tested conditions when compared to the reference method with a manually selected amount of CIR MBs (JM). Improvements of the NIR method can be as high as 1.90 dB for the Mother and Daughter sequence encoded at 64 kbit/s and a PLR of 5%. The most significant exception is for the PLR of 10% and higher bit rates (see Tables 1.3 and 1.4). This exception is due to the fact that, for these PLR and bit rate values, the number of CIR MBs chosen with the proposed f_CIR is slightly different from the optimal values.

When comparing the NIR scheme to the one proposed in Ref. [26], which does not use CIR, the NIR PSNR Y values are most of the time higher than or equal to those achieved in Ref. [26]. The highest gains occur for the Foreman sequence encoded at 128 kbit/s and a PLR of 10% (0.90 dB), and for the Mobile and Calendar sequence encoded at 768 kbit/s and a PLR of 10% (0.60 dB). For the cases where the NIR leads to lower PSNR Y values, the losses are never more than 0.49 dB, which happens for the Mobile and Calendar sequence encoded at 896 kbit/s and a PLR of 5%.

Notice, however, that the scheme in Ref. [26] cannot guarantee that all MBs will eventually be refreshed, which is a major drawback for real usage in error-prone environments, such as mobile networks. On the other hand, the scheme described in this chapter not only overcomes this drawback but does so fully automatically, without any user intervention.


1.6 Final Remarks

This chapter describes a method to efficiently and fully automatically perform intracoding refresh, while taking into account the PLR of the underlying network and the encoded bit rate. The described method can be used to efficiently generate error-resilient H.264/AVC bitstreams that are perfectly adapted to the channel error characteristics. This is extremely important because it can mean that error-resilient video transmission will be possible in environments with varying error characteristics with an improved quality, notably, when compared to the case where the MB intracoding decisions are taken without considering the error characteristics of the network.

Acknowledgments

The authors would like to acknowledge that the work described in this chapter was developed at Instituto de Telecomunicações (Lisboa, Portugal) and was supported by FCT project PEst-OE/EEI/LA0008/2011.

References

1. A. H. Li, S. Kittitornkun, Y.-H. Hu, D.-S. Park, J. Villasenor, Data partitioning and reversible variable length codes for robust video communications, Proceedings of the IEEE Data Compression Conference, Snowbird, UT, pp. 460–469, March 2000.

2. G. Cote, S. Shirani, F. Kossentini, Optimal mode selection and synchronization for robust video communications over error-prone networks, IEEE Journal on Selected Areas in Communications, 18(6), 952–965, June 2000.

3. S. Wenger, G. D. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Transactions on Circuits and Systems for Video Technology, 8(7), 867–877, November 1998.

4. L. P. Kondi, F. Ishtiaq, A. K. Katsaggelos, Joint source-channel coding for motion-compensated DCT-based SNR scalable video, IEEE Transactions on Image Processing, 11(9), 1043–1052, September 2002.

5. H. M. Radha, M. van der Schaar, Y. Chen, The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP, IEEE Transactions on Multimedia, 3(1), 53–68, March 2001.

6. T. Schierl, T. Stockhammer, T. Wiegand, Mobile video transmission using scalable video coding, IEEE Transactions on Circuits and Systems for Video Technology, 17(9), 1204–1217, September 2007.

7. R. Puri, K. Ramchandran, Multiple description source coding through forward error correction codes, Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, vol. 1, pp. 342–346, October 1999.

8. V. K. Goyal, Multiple description coding: Compression meets the network, IEEE Signal Processing Magazine, 18(5), 74–93, September 2001.

9. K. Stuhlmüller, N. Färber, M. Link, B. Girod, Analysis of video transmission over lossy channels, IEEE Journal on Selected Areas in Communications, 18(6), 1012–1032, June 2000.

10. L. D. Soares, F. Pereira, Error resilience and concealment performance for MPEG-4 frame-based video coding, Signal Processing: Image Communication, 14(6–8), 447–472, May 1999.

11. A. K. Katsaggelos, F. Ishtiaq, L. P. Kondi, M.-C. Hong, M. Banham, J. Brailean, Error resilience and concealment in video coding, Proceedings of the European Signal Processing Conference, Rhodes, Greece, pp. 221–228, September 1998.

12. Y. Wang, S. Wenger, J. Wen, A. Katsaggelos, Error resilient video coding techniques, IEEE Signal Processing Magazine, 17(4), 61–82, July 2000.

13. F. Zhai, A. Katsaggelos, Joint Source-Channel Video Transmission, Morgan & Claypool Publishers, San Rafael, CA, 2007.

14. ISO/IEC 14496-10, Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding, 2005.

15. ISO/IEC 14496-2, Information Technology - Coding of Audio-Visual Objects - Part 2: Visual (2nd Edn.), 2001.

16. P. Haskell, D. Messerschmitt, Resynchronization of motion compensated video affected by ATM cell loss, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, vol. 3, pp. 545–548, March 1992.

17. G. Côté, F. Kossentini, Optimal intra coding of blocks for robust video communication over the Internet, Signal Processing: Image Communication, 15(1–2), 25–34, September 1999.

18. J. Y. Liao, J. D. Villasenor, Adaptive intra block update for robust transmission of H.263, IEEE Transactions on Circuits and Systems for Video Technology, 10(1), 30–35, February 2000.

19. P. Frossard, O. Verscheure, AMISP: A complete content-based MPEG-2 error-resilient scheme, IEEE Transactions on Circuits and Systems for Video Technology, 11(9), 989–998, September 2001.

20. Z. He, J. Cai, C. Chen, Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding, IEEE Transactions on Circuits and Systems for Video Technology, 12(6), 511–523, June 2002.

21. H. Shu, L. Chau, Intra/Inter macroblock mode decision for error-resilient transcoding, IEEE Transactions on Multimedia, 10(1), 97–104, January 2008.

22. H.-J. Ma, F. Zhou, R.-X. Jiang, Y.-W. Chen, A network-aware error-resilient method using prioritized intra refresh for wireless video communications, Journal of Zhejiang University - Science A, 10(8), 1169–1176, August 2009.

23. P. Nunes, L. D. Soares, F. Pereira, Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, pp. 2132–2135, October 2008.

24. Z. Li, F. Pan, K. Lim, G. Feng, X. Lin, S. Rahardja, Adaptive basic unit layer rate control for JVT, Doc. JVT-G012, 7th MPEG Meeting, Pattaya, Thailand, March 2003.

25. H.264/AVC reference software. Available: http://iphome.hhi.de/suehring/tml/download/

26. L. D. Soares, P. Nunes, F. Pereira, Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, pp. 1–12, August 2008.

27. P. Nunes, L. D. Soares, F. Pereira, Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, pp. 3073–3076, November 2009.

28. K.-P. Lim, G. Sullivan, T. Wiegand, Text description of joint model reference encoding methods and decoding concealment methods, Doc. JVT-X101, ITU-T VCEG Meeting, Geneva, Switzerland, June 2007.

29. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, G. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 688–703, July 2003.

30. H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, RTP: A transport protocol for real-time applications, Internet Engineering Task Force, RFC 1889, January 1996.

31. R. C. Dorf, J. A. Svoboda, Introduction to Electric Circuits, 5th Edition, Wiley, New York, 2001.

32. K. Levenberg, A method for the solution of certain non-linear problems in least squares, Quarterly of Applied Mathematics, 2(2), 164–168, July 1944.

33. D. Marquardt, An algorithm for the least-squares estimation of nonlinear parameters, SIAM Journal of Applied Mathematics, 11(2), 431–441, June 1963.

34. S. Wenger, H.264/AVC over IP, IEEE Transactions on Circuits and Systems for Video Technology, 13(7), 645–656, July 2003.

2

Distributed Video Coding: Principles and Challenges

Jürgen Slowack and Rik Van de Walle

CONTENTS

2.1 Introduction

2.2 Theoretical Foundations

2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)

2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)

2.3 General Concept

2.4 Use-Case Scenarios in the Context of Wireless Networks

2.5 DVC Architectures and Components

2.5.1 Side Information Generation

2.5.1.1 Frame-Level Interpolation Strategies

2.5.1.2 Frame-Level Extrapolation Strategies

2.5.1.3 Encoder-Aided Techniques

2.5.1.4 Partitioning and Iterative Refinement

2.5.2 Correlation Noise Estimation

2.5.3 Channel Coding

2.5.4 Determining the WZ Rate

2.5.5 Transformation and Quantization

2.5.6 Mode Decision

2.6 Evaluation of DVC Compression Performance

2.7 Other DVC Architectures and Scenarios

2.8 Future Challenges and Research Directions

References


2.1 Introduction

A video compression system consists of an encoder that converts uncompressed video sequences into a compact format suitable for transmission or storage, and a decoder that performs the opposite operations to facilitate video display. Compression is typically achieved by exploiting similarities between frames (temporal direction), as well as similarities between pixels within the same frame (spatial direction). The conventional way is to exploit these similarities at the encoder. Using already-coded information, the encoder generates a prediction of the information still to be coded. Next, the difference between the information to be coded and the prediction is further processed and compressed through entropy coding.

The accuracy of the prediction determines the compression performance, in the sense that more accurate predictions will lead to smaller residuals and better compression. As a consequence, computationally complex algorithms have been developed to search for the best predictor. This has led to a complexity imbalance, in which the encoder is significantly more complex than the decoder.

A radically different approach to video coding, called distributed video coding (DVC), has emerged during the past decade. In DVC, the prediction is generated at the decoder instead of at the encoder. As this prediction, called side information, typically contains errors, additional information is sent from the encoder to the decoder to allow correcting the side information. Generating the prediction signal at the decoder shifts the computational burden from the encoder to the decoder side. This facilitates applications in which encoding devices are relatively cheap, small, and/or power-friendly. Some examples of these applications include wireless sensor networks, wireless video surveillance, and videoconferencing using mobile devices [44].

Many publications covering DVC have appeared (including a book on distributed source coding [DSC] [16]). The objective of this chapter is therefore to provide a comprehensive overview of the basic principles behind DVC and illustrate these principles with examples from the current state-of-the-art. Based on this description, the main future challenges will be identified and discussed.

2.2 Theoretical Foundations

Before describing the different DVC building blocks in detail, we start by highlighting some of the most important theoretical results. This includes a discussion on the Slepian–Wolf and Wyner–Ziv (WZ) theorems, which are generally regarded as providing a fundamental information-theoretical basis for DVC. It should be remarked that these results apply to DSC in general and that DVC is only a special case.

2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)

David Slepian and Jack K. Wolf considered the configuration depicted in Figure 2.1, in which two sources X and Y generate correlated sequences of information symbols [51]. Each of these sequences is compressed by a separate encoder, namely, one for X and one for Y. The encoder of each source is constrained to operate without knowledge of the other source, explaining the term DSC. The decoder, on the other hand, receives both coded streams as input and should be able to exploit the correlation between the sources X and Y for decoding the information symbols.

FIGURE 2.1

Slepian and Wolf consider the setup in which two correlated sources X and Y are coded independently, but decoded jointly.

Surprisingly, Slepian and Wolf proved that the compression bound for this configuration is the same as in the case where the two encoders are allowed to communicate. More precisely, they proved that the rates R_X and R_Y of the coded streams satisfy the following set of inequalities:

$$R_X \geq H(X|Y), \qquad R_Y \geq H(Y|X), \qquad R_X + R_Y \geq H(X,Y) \qquad (2.1)$$

where H(.) denotes the entropy. These conditions can be represented graphically, as a so-called admissible or achievable rate region, as depicted in Figure 2.2.

While any point on the line R_X + R_Y = H(X,Y) is equivalent from a compression point of view, special attention goes to the corner points of the achievable rate region. For example, the point (H(X|Y), H(Y)) corresponds to the special case of source coding with side information available at the decoder, as depicted in Figure 2.3. This case is of particular interest in the context of current DVC solutions, where side information Y is generated at the decoder and used to decode X. According to the Slepian–Wolf theorem, the minimal rate required in this case is the conditional entropy H(X|Y).
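The bounds in Equation 2.1 can be checked numerically for a small example. The sketch below is only an illustration, not part of the chapter: the function names and the binary source model (Y equal to a uniform bit X flipped with probability 0.1) are assumptions chosen for simplicity.

```python
import math


def entropy(probs):
    """Shannon entropy in bits of a collection of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def slepian_wolf_bounds(joint):
    """Entropies for a joint distribution where joint[x][y] = P(X=x, Y=y)."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    h_xy = entropy([p for row in joint for p in row])
    h_x, h_y = entropy(px), entropy(py)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X,Y)": h_xy,
        "H(X|Y)": h_xy - h_y,   # chain rule: H(X,Y) = H(Y) + H(X|Y)
        "H(Y|X)": h_xy - h_x,
    }


def is_achievable(r_x, r_y, bounds, eps=1e-12):
    """True if the rate pair (r_x, r_y) satisfies all three conditions of (2.1)."""
    return (r_x >= bounds["H(X|Y)"] - eps
            and r_y >= bounds["H(Y|X)"] - eps
            and r_x + r_y >= bounds["H(X,Y)"] - eps)


# Hypothetical example: X is a uniform bit, Y is X flipped with probability 0.1.
flip = 0.1
joint = [[0.5 * (1 - flip), 0.5 * flip],
         [0.5 * flip, 0.5 * (1 - flip)]]
bounds = slepian_wolf_bounds(joint)

for name, value in bounds.items():
    print(f"{name} = {value:.4f} bits")
print("corner point (H(X|Y), H(Y)) achievable:",
      is_achievable(bounds["H(X|Y)"], bounds["H(Y)"], bounds))
```

In this example, coding X with Y known only at the decoder needs about 0.47 bits per symbol instead of the 1 bit required when the correlation is ignored, which is the corner point discussed above.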

2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)

The work of Slepian and Wolf relates to lossless compression. These results were extended to lossy compression by Aaron D. Wyner and Jacob Ziv [65]. Although introducing quality loss seems undesirable at first thought, it is often necessary to allow some loss of quality at the output of the decoder in order to achieve even higher compression ratios (i.e., lower bit rates).


FIGURE 2.2

Graphical representation of the achievable rate region.

FIGURE 2.3

(Lossless) source coding with side information available at the decoder.

Denote the acceptable distortion between the original signal X and the decoded signal X′ as D = E[d(X, X′)], where d is a specific distortion metric (such as the mean-squared error). Two cases are considered for compression with side information available at the decoder. In the first case, the side information Y is not available at the encoder. The rate of the compressed stream for this case is denoted R^{WZ}_{X|Y}(D). In the second case, Y is made available to the encoder as well, resulting in a rate denoted R_{X|Y}(D). With these notations, Wyner and Ziv proved that

$$R^{WZ}_{X|Y}(D) - R_{X|Y}(D) \geq 0 \qquad (2.2)$$

In other words, not having the side information available at the encoder results in a rate loss greater than or equal to zero, for a particular distortion D. Interestingly, the rate loss has been proved to be zero in the case of Gaussian memoryless sources and a mean-squared error (MSE) distortion metric.

The results of Wyner and Ziv were further extended by other researchers, for example, proving that the equality also holds in case X is equal to the sum of arbitrarily distributed Y and independent Gaussian noise [46]. In addition, Zamir showed that the rate loss for sources with general statistics is less than 0.5 bits per sample when using the MSE as a distortion metric [68].
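To make the zero-rate-loss case concrete, the sketch below evaluates the standard closed-form rate for jointly Gaussian X and Y under MSE distortion, R(D) = max(0, 0.5 log2(σ²_{X|Y} / D)) with σ²_{X|Y} = σ²_X (1 − ρ²). This formula is a well-known textbook result rather than something stated in the chapter, and the function names and numerical values are illustrative assumptions.

```python
import math


def conditional_variance(var_x, rho):
    """Variance of X given Y for jointly Gaussian X, Y with correlation rho."""
    return var_x * (1.0 - rho ** 2)


def gaussian_rate(variance, distortion):
    """Rate-distortion function (bits/sample) of a Gaussian source under MSE."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    return max(0.0, 0.5 * math.log2(variance / distortion))


def wyner_ziv_rate_gaussian(var_x, rho, distortion):
    """Rate with side information Y at the decoder (jointly Gaussian, MSE).

    In this special case the value is the same whether or not the
    encoder also has access to Y, i.e. the rate loss in (2.2) is zero.
    """
    return gaussian_rate(conditional_variance(var_x, rho), distortion)


# Hypothetical numbers: unit-variance source, correlation 0.9, target MSE 0.01.
var_x, rho, target_d = 1.0, 0.9, 0.01
rate_without_si = gaussian_rate(var_x, target_d)
rate_with_si = wyner_ziv_rate_gaussian(var_x, rho, target_d)
print(f"no side information:        {rate_without_si:.3f} bits/sample")
print(f"side information at decoder: {rate_with_si:.3f} bits/sample")
```

With these assumed numbers, the side information saves roughly 1.2 bits per sample; the saving grows as the correlation between X and Y increases.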

2.3 General Concept


The theorems of Slepian–Wolf and Wyner–Ziv apply to DSC, and therefore also to the specific case of DVC. Basically, the theorems indicate that a DVC system should be able to achieve the same compression performance as a conventional video compression system. However, the proofs do not provide insights on how to actually construct such a system. As a result, the first DVC systems appeared in the scientific literature only about 30 years later.

The common approach in the design of a DVC system is to consider Y as being a corrupted version of X. This way, the proposed setup becomes highly similar to a channel-coding scenario. In the latter, a sequence of information symbols X could be sent across an error-prone communication channel, so that Y has been received instead of X. To enable successful recovery of X at the receiver's end, the sender could include additional error-correcting information calculated on X, such as turbo or low-density parity-check (LDPC) codes [33]. The difference between such a channel-coding scenario and the setup depicted in Figure 2.3 is that in our case Y is already available at the decoder. In other words, the encoder should only send the error-correcting information to allow recovery of X (or X′ in the lossy case). Since Y is already available at the decoder instead of being communicated by the encoder, the errors in Y are said to be induced by virtual noise (also called correlation noise) on a virtual communication channel.
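The idea of sending only error-correcting information can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the chapter's method: it replaces the turbo or LDPC codes mentioned above with a (7,4) Hamming code, and it assumes that the side information Y differs from X in at most one bit per 7-bit block. The encoder sends only the 3-bit syndrome of X, and the decoder combines it with Y to recover X.

```python
def syndrome(bits):
    """3-bit syndrome of a 7-bit block under a Hamming(7,4) parity check.

    Column i (1-based) of the parity-check matrix is the binary
    representation of i, so the syndrome is the XOR of the positions
    of all bits that are set.
    """
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def encode(x_block):
    """Encoder: sends only the syndrome of X (3 bits instead of 7)."""
    return syndrome(x_block)


def decode(x_syndrome, y_block):
    """Decoder: corrects the side information Y using the syndrome of X.

    The XOR of both syndromes equals the syndrome of the error pattern
    X xor Y. For at most one differing bit it is the 1-based position
    of that bit (or 0 if Y already equals X).
    """
    error_position = x_syndrome ^ syndrome(y_block)
    x_hat = list(y_block)
    if error_position:
        x_hat[error_position - 1] ^= 1
    return x_hat


# Hypothetical block X and side information Y with one "virtual noise" error.
x = [1, 0, 1, 1, 0, 0, 1]
y = list(x)
y[4] ^= 1                      # correlation noise flips the fifth bit
recovered = decode(encode(x), y)
print("X recovered from Y and 3 syndrome bits:", recovered == x)
```

Practical DVC systems use much longer and stronger codes, because the virtual channel between X and Y is noisier and less predictable than the single-error assumption made here.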

2.4 Use-Case Scenarios in the Context of Wireless Networks

By generating Y itself at the decoder side as a prediction of the original X at the encoder, the complexity balance between the encoder and the decoder becomes totally different from a conventional video compression system such as H.264/AVC [64]. While conventional systems feature an encoder that is significantly more complex than the decoder, in DVC the complexity balance is completely the opposite.

In the context of videoconferencing using mobile devices, DVC can be used in combination with conventional video coding techniques (such as H.264/AVC), which makes it possible to assign computationally less complex steps to mobile devices, while performing computationally complex operations in the network.

Title	Advance video communication in wireless network
Authors	Luís Ducla Soares, Paulo Nunes, Jürgen Slowack, Rik Van de Walle, Manoranjan Paul, Weisi Lin, Weiyao Lin, Bing Zhou, Dong Jiang, Chongyang Zhang, Jane Wei Huang, Hassan Mansour, Vikram Krishnamurthy, Mohamed Hefeeda, Cheng-Hsin Hsu, Joseph Peters, Donglin Hu, Shiwen Mao, Paolo Bellavista, Antonio Corradi, Carlo Giannelli, Bo Rong, Yiyan Wu, Gilles Gagnon, JongWon Kim, Sang-Hoon Park, Omar Abdul-Hameed, Erhan Ekmekcioglu, Ahmet Kondoz, Khalid Mohamed Alajel, Wei Xiang, Chunyu Lin, Jan De Cock, Peter Lambert, Araz Jahaniaval, Dalia Fayek, Rashid Mehmood, Raad Alturki
Field	Wireless Video Communications
Type	Book

Format
Pages	518
File size	10.34 MB