
Advanced Video Communications over Wireless Networks


"Wireless video communications encompass a broad range of issues and opportunities that serve as the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, Advanced Video Communications over Wireless Networks provides an in-depth look at the fundamentals, recent technical achievements, challenges, and emerging trends in mobile and wireless video communications. The editors have carefully selected a panel of researchers with expertise in diverse aspects of wireless video communication to cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, the transmission schemes tailored to mobile and wireless networks, quality metrics, the architectures of practical systems, as well as some novel directions. They address future directions, including Quality-of-Experience in wireless video communications, video communications over future networks, and 3D video communications. The book presents a collection of tutorials, surveys, and original contributions, providing an up-to-date, accessible reference for further development of research and applications in mobile and wireless video communication systems. The range of coverage and depth of expertise make this book the go-to resource for facing current and future challenges in this field."


Network-Aware Error-Resilient Video Coding

Luís Ducla Soares and Paulo Nunes

Distributed Video Coding: Principles and Challenges

Jürgen Slowack and Rik Van de Walle

Computer Vision–Aided Video Coding

Manoranjan Paul and Weisi Lin

Cooperative Video Provisioning in Mobile Wireless Environments

Paolo Bellavista, Antonio Corradi, and Carlo Giannelli


Video QoS Analysis over Wi-Fi Networks

Rashid Mehmood and Raad Alturki

Video communication has evolved from a simple tool for visual communication to a key enabler for various video applications. A number of exciting video applications have been successfully deployed in recent years, with the goal of providing users with a more flexible, personalized, and content-rich viewing experience. Along with these ubiquitous video applications, we have also experienced a paradigm shift from passive, wired, and centralized video content access to interactive, wireless, and distributed content access. Undoubtedly, wireless video communications have paved the way for advanced applications. However, given the distributed, resource-constrained, and heterogeneous nature of wireless networks, supporting quality video communications over wireless networks is still challenging. Video coding is one of the indispensable components in various wireless video applications, whereas the wireless network condition always imposes more stringent requirements on coding technologies. To cope with the limited transmission bandwidth and to offer adaptivity to the harsh wireless channels, rate control, packet scheduling, as well as error control mechanisms are usually incorporated in the design of codecs to enable efficient and reliable video communications. At the same time, due to energy constraints in wireless systems, video coding algorithms should operate with the lowest possible power consumption. Therefore, video coding over wireless networks is inherently a complex optimization problem with a set of constraints. In addition, the high heterogeneity and user mobility associated with wireless networks are also key issues to be tackled for a seamless delivery of quality-of-experience supported video streams.

To sum up, wireless video communications encompass a broad range of challenges and opportunities that provide the catalyst for technical innovations. To disseminate the most recent advances in this challenging yet exciting field, we bring forth this book as a compilation of high-quality chapters. This book is intended to be an up-to-date reference book on wireless video communications, providing the fundamentals, recent technical achievements, challenges, and some emerging trends. We hope that the book will be accessible to various audiences, ranging from those in academia and industry to senior undergraduates and postgraduates. To achieve this goal, we have solicited chapters from a number of researchers who are experts in diverse aspects of wireless video communications. We received a good response and, finally, after peer review and revision, 15 chapters were selected. These chapters cover a wide spectrum of topics, including the underlying theoretical fundamentals associated with wireless video communications, transmission schemes tailored to mobile and wireless networks, quality metrics, architectures of practical systems, as well as some novel directions. In what follows, we present a summary of each chapter.

In Chapter 1, “Network-Aware Error-Resilient Video Coding,” a network-aware Intra coding refresh method is presented. This method increases the error robustness of H.264/AVC bitstreams, considering the network packet loss rate and the encoding bit rate, by efficiently taking into account the rate-distortion impact of Intra coding decisions while guaranteeing that errors do not propagate.

Chapter 2, “Distributed Video Coding: Principles and Challenges,” is a tutorial on distributed video coding (DVC). In contrast to conventional video compression schemes featuring an encoder that is significantly more complex than the decoder, in DVC the complexity distribution is the reverse. This chapter provides an overview of the basic principles, state of the art, current problems, and trends in DVC.

Chapter 3, “Computer Vision–Aided Video Coding,” studies video coding from the perspective of computer vision. Motivated by the fact that the human visual system (HVS) is the ultimate receiver of the majority of compressed videos and that there is scope to remove unimportant information through HVS, the chapter proposes a computer vision–aided video coding technique by exploiting the spatial and temporal redundancies with visually unimportant information.

In Chapter 4, “Macroblock Classification Method for Computation Control Video Coding and Other Video Applications Involving Motions,” a new macroblock (MB) classification method is proposed, which classifies MBs into different classes according to their temporal and spatial motion and texture information. Furthermore, the implementations of the proposed MB classification method into complexity-scalable video coding as well as other video applications are also discussed in detail in the chapter.

Chapter 5, “Transmission Rate Adaptation in Multimedia WLAN: A Dynamic Games Approach,” considers the scheduling, rate adaptation, and buffer management in a multiuser wireless local area network (WLAN), where each user transmits scalable video payload. Based on opportunistic scheduling, users access the available medium (channel) in a decentralized manner. The rate adaptation problem of the WLAN multimedia networks is then formulated as a general-sum switching control dynamic Markovian game.

In Chapter 6, “Energy and Bandwidth Optimization in Mobile Video Streaming Systems,” the authors consider the problem of multicasting multiple variable bit rate video streams from a wireless base station to many mobile receivers over a common wireless channel. This chapter presents a sequence of increasingly sophisticated streaming protocols for optimizing energy usage and utilization of the wireless bandwidth.

Chapter 7, “Resource Allocation for Scalable Videos over Cognitive Radio Networks,” investigates the challenging problem of video communication over cognitive radio (CR) networks. It first addresses the problem of scalable video over infrastructure-based CR networks and then considers the problem of scalable video over multihop CR networks.


Chapter 8, “Cooperative Video Provisioning in Mobile Wireless Environments,” focuses on the challenging scenario of cooperative video provisioning in mobile wireless environments. On one hand, it provides a general overview about the state-of-the-art literature on collaborative mobile networking. On the other hand, it provides technical details and reports about the RAMP middleware case study, practically showing that node cooperation can properly achieve streaming adaptation.

Chapter 9, “Multilayer Iterative FEC Decoding for Video Transmission over Wireless Networks,” develops a novel multilayer iterative decoding scheme using deterministic bits to lower the decoding threshold of low-density parity-check (LDPC) codes. These deterministic bits serve as known information in the LDPC decoding process to reduce redundancy during data transmission. Unlike the existing work, the proposed scheme addresses controllable deterministic bits, such as MPEG null packets, rather than widely investigated protocol headers.

Chapter 10, “Network-Adaptive Rate and Error Controls for WiFi Video Streaming,” investigates the fundamental issues for network-adaptive mobile video streaming over WiFi networks. Specifically, it highlights the practical aspects of network-adaptive rate and error control schemes to overcome the dynamic variations of underlying WiFi networks.

Chapter 11, “State of the Art and Challenges for 3D Video Delivery over Mobile Broadband Networks,” examines the technologies underlying the delivery of 3D video content to wireless subscribers over mobile broadband networks. The incorporated study covers key issues, such as the effective delivery of 3D video content in a system that has limited resources in comparison to wired networks, network design issues, as well as scalability and backward compatibility concepts.

In Chapter 12, “A New Hierarchical 16-QAM-Based UEP Scheme for 3-D Video with Depth Image–Based Rendering,” an unequal error protection (UEP) scheme based on hierarchical quadrature amplitude modulation (HQAM) for 3-D video transmission is proposed. The proposed scheme exploits the unique characteristics of the color plus depth map stereoscopic video where the color sequence has a significant impact on the reconstructed video quality.

Chapter 13, “2D-to-3D Video Conversion: Techniques and Applications in 3D Video Communications,” provides an overview of the main techniques for 2D-to-3D conversion, which includes different depth cues and state-of-the-art schemes. In the 3D video communications context, 2D-to-3D conversion has been used to improve the coding efficiency and the error resiliency and concealment for the 2D video plus depth format.

Chapter 14, “Combined CODEC and Network Parameters for an Enhanced Quality of Experience in Video Streaming,” presents the research involved in bridging the gap between the worlds of video compression/encoding and network traffic engineering by (i) using enriched video trace formats in scheduling and traffic control, (ii) using prioritized and error-resilience features in H.264, and (iii) optimizing the combination of the network performance indices with codec-specific distortion parameters for an increased quality of the received video.


In Chapter 15, “Video QoS Analysis over Wi-Fi Networks,” the authors present a detailed end-to-end QoS analysis for video applications over wireless networks, both infrastructure and ad hoc networks. Several networking scenarios are carefully configured with variations in network sizes, applications, codecs, and routing protocols to extensively analyze network performance.

MATLAB® is a registered trademark of The MathWorks, Inc. For product information, please contact:

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA, 01760-2098 USA
Tel: 508-647-7000

Omar Abdul-Hameed
Faculty of Engineering and Physical Sciences
I-Lab: Multimedia Communications Research
Department of Electronic Engineering
Centre for Vision, Speech and Signal Processing
University of Surrey
Surrey, United Kingdom

Khalid Mohamed Alajel
Faculty of Engineering and Surveying
University of Southern Queensland
Toowoomba, Queensland, Australia

Raad Alturki
Department of Computer Science
Al Imam Mohammad Ibn Saud Islamic University
Riyadh, Saudi Arabia

Paolo Bellavista
Department of Electronics, Computer Science, and Systems
University of Bologna
Bologna, Italy


Centre for Vision, Speech and Signal Processing
University of Surrey
Surrey, United Kingdom

Dalia Fayek
School of Engineering
University of Guelph
Guelph, Ontario, Canada

Gilles Gagnon
Branch of Broadcast Technologies Research
Communications Research Centre Canada
Ottawa, Ontario, Canada


Electrical Computer Engineering Department
University of British Columbia
Vancouver, British Columbia, Canada

Araz Jahaniaval
School of Engineering
University of Guelph
Guelph, Ontario, Canada

Dong Jiang
Institute of Microelectronics
Chinese Academy of Sciences
Haidian, Beijing, People’s Republic of China

JongWon Kim
School of Information and Communications
Gwangju Institute of Science and Technology (GIST)
Gwangju, South Korea

Ahmet Kondoz
Faculty of Engineering and Physical Sciences
I-Lab: Multimedia Communications Research
Department of Electronic Engineering
Centre for Vision, Speech and Signal Processing
University of Surrey
Surrey, United Kingdom

Ghent, Belgium
and
Institute of Information Science
Beijing Jiaotong University
Haidian, Beijing, People’s Republic of China

Weisi Lin
School of Computer Engineering
Nanyang Technological University
Singapore, Singapore

Instituto Universitário de Lisboa
Lisboa, Portugal

Sang-Hoon Park
Communications R&D Center
Samsung Thales Co., Ltd.
Seongnam-Si, South Korea


Branch of Broadcast Technologies Research
Communications Research Centre Canada
Ottawa, Ontario, Canada

Instituto Universitário de Lisboa (ISCTE-IUL)
Lisboa, Portugal

Rik Van de Walle

Wei Xiang
Faculty of Engineering and Surveying
University of Southern Queensland
Toowoomba, Queensland, Australia

Network-Aware Error-Resilient Video Coding

Luís Ducla Soares and Paulo Nunes

CONTENTS

1.1 Introduction
1.2 Video Coding Framework
1.2.1 Rate-Distortion Optimization
1.2.2 Random Intra Refresh
1.3 Efficient Intracoding Refresh
1.3.1 Error-Resilient RDO-Driven Intra Refresh
1.3.1.1 RDO Intra and Inter Mode Decision
1.3.1.2 Error-Resilient Intra/Inter Mode Decision
1.3.2 Random Intra Refresh
1.4 Network-Aware Error-Resilient Video Coding Method
1.4.1 Intra/Inter Mode Decision with Constant $\alpha_{RD}$
1.4.2 Intra/Inter Mode Decision with Network-Aware $\alpha_{RD}$ Selection
1.4.3 Model for the $f_{NMD}$ Mapping Function
1.4.4 Network-Aware Cyclic Intra Refresh
1.4.5 Intra Refresh with Network-Aware $\alpha_{RD}$ and CIR Selection
1.5 Performance Evaluation
1.6 Final Remarks
Acknowledgments
References

1.1 Introduction

With the growing demand for universal accessibility to video content, more and more different networks are being used to deploy video services. However, in order to make these video services efficiently available with an acceptable quality in error-prone environments, such as mobile networks, appropriate error resilience techniques are necessary. Since these error-prone environments can typically have very different characteristics, which can also vary over time, it is important that the considered error resilience techniques are network-aware and can adapt to the varying characteristics of the used networks.

In order to extend the useful lifetime of a video coding standard, standardization bodies usually specify the minimum set of tools that are essential for guaranteeing interoperability between devices or applications of different manufacturers. With this strategy, the standard may evolve continuously through the development and improvement of its nonnormative parts. Error resilience is an example of a video coding tool that is not completely specified in a normative way in any of the currently available and emerging video coding standards. The reason for this is that it is simply not necessary for interoperability and, therefore, it is one of the main degrees of freedom to improve the performance of standard-based systems, even after the standard has been finalized. Nevertheless, recognizing the paramount importance of this type of tool, standardization initiatives always include a minimum set of error-resilient hooks (e.g., in the form of bitstream syntax elements) in order to facilitate the development of effective error resilience techniques, as needed for the particular application envisaged.

Error-resilience techniques are usually seen as playing a role at the decoder side of the communication chain. However, by using preventive error resilience techniques at the encoder side, which involve the intelligent design of the encoder, it is also possible to make the task of the decoder much easier in terms of dealing with errors. In fact, the performance of the decoder can vary greatly depending on the amount of error resilience help provided in the bitstream generated by the encoder. This way, at the encoder, the challenge is to develop techniques that make video bitstreams more resilient to errors, in order to allow the decoder to better recover in case errors occur; these techniques may be called preventive error resilience techniques. At the decoder, the challenge is to develop techniques that make it possible for the decoder to take all the available received data (correct and possibly corrupted) and decode it with the best possible video quality, thus minimizing the negative subjective impact of the errors on the video quality offered to the user; these techniques may be called corrective error resilience techniques.

Video communication systems, in order to be globally more error-resilient to channel errors, typically include both preventive and corrective error-resilient techniques. An important class of preventive techniques is error-resilient source coding, which consists of providing redundancy at the source coding level in order to prevent error propagation and consequently reduce the distortion caused by data corruption/loss. Error-resilient source coding techniques include data partitioning, resynchronization and reversible variable length codes [1,2], redundant coding schemes, such as sending the same information predicted from different references [3], scalable video coding [4,5,6], or multiple description coding [7,8]. Besides source coding redundancy, channel coding redundancy can also be used, where a good example is the case of forward error correction [9]. In terms of corrective error-resilient techniques, error concealment techniques correspond to one of the most important classes, but other important techniques also exist, such as error detection and error localization techniques [10]. Error concealment techniques consist essentially of postprocessing methods aiming at recovering missing or corrupted data from neighboring data (either spatially or temporally) [11], but for these techniques to be truly effective, an error detection technique should be first used to detect if an error has indeed occurred, followed by an error localization technique to determine where the error occurred and which parts of the video content were affected [10]. For a good review of the many different preventive and corrective error-resilient video coding techniques that have been proposed in the literature, the reader can refer to Refs. [12,13].

This chapter addresses the problem of error-resilient encoding, in particular of how to efficiently improve the resilience of compressed video bitstreams, while adaptively considering the network characteristics in terms of information loss. Video coding systems that rely on predictive (inter) coding to remove temporal redundancy, such as those based on the H.264/AVC standard [14], are strongly affected by transmission errors/information loss due to the error propagation caused by the prediction mechanisms. Therefore, typical approaches to make bitstreams generated by the encoder more error-resilient rely on the adaptation of the video coding mode decisions, at various levels (e.g., picture, slice, or macroblock level), to the underlying network characteristics, trying to establish an adequate trade-off between predictive and nonpredictive encoding modes. This is done because nonpredictive modes are less efficient in terms of compression but can provide higher error resilience. In this context, controlling the amount of nonpredictive versus predictive encoded data is an efficient and highly scalable error resilience tool.

The intracoding refresh schemes available in the literature [2,15,16,17,18,19,20,21,22] are a typical example of efficient error resilience techniques to improve the video quality over error-prone environments without requiring changes to the bitstream syntax, thus allowing the performance of standard video codecs to be continuously improved without compromising interoperability. However, a permanently open issue related to these techniques is how to achieve the best trade-off between error resilience and coding efficiency.

Since these schemes work by selectively coding in intra mode different parts of the video content at different time instants, they are able to avoid long-term propagation of transmission or storage errors that could make the decoded quality decay very rapidly. This way, these intracoding refresh schemes are able to significantly improve the error resilience of the coded bitstreams and improve the overall subjective quality of the decoded video. While some schemes do not require any specific knowledge of what is being done at the decoder in terms of error concealment [16,17,18], other approaches try to estimate the distortion experienced at the decoder given a certain probability of data corruption/loss and the concealment techniques adopted [2,22].

The problem with most video coding mode decision approaches, including typical intracoding refresh schemes, is that they can significantly decrease the coding efficiency if they make their decisions without taking into account the rate-distortion (RD) cost of such decisions. This problem can be dealt with by combining the error-resilient coding mode decisions with the video encoder rate control module [23], where the usual coding mode decisions are taken [24,25]. This way, coding-efficient error robustness can be achieved. In the specific case of intracoding refresh schemes, a clever solution for this combination is to compare the RD cost of coding macroblocks (MBs) in intra and inter modes; if the cost of intracoding is only slightly larger than the cost of intercoding, then the coding mode could be changed to intra, providing error robustness almost for free. This strategy is able to reduce error propagation and, thus, to increase error robustness when transmission errors occur, at a very limited RD cost increase and without the huge complexity of estimating the expected distortion experienced at the decoder.

Nevertheless, in order for these error-resilient video coding mode decision schemes to be really useful in an adaptive way, the current error characteristics of the underlying network being used for transmission should be taken into account. For example, in the case of intracoding refresh schemes, this will allow the bit rate resources allocated to intracoding refresh to be adequately adapted to the error characteristics of the network [26]. After all, networks with small amounts of channel errors only need small amounts of intracoding refresh and vice versa. Thus, efficient bit rate allocation in an error-resilient way has to depend on the feedback received from the network about its current error characteristics, which define the error robustness needed.

Therefore, network awareness makes it possible to dynamically vary the amount of error resilience resources to better suit the current state of the network and, therefore, further improve the decoded video quality without reducing the error robustness [26,27]. This problem is nowadays more relevant than ever, since more and more audiovisual content is accessed over error-prone networks, such as mobile networks, and these networks can have extremely varying error characteristics (over time).

As an illustrative example, this chapter presents a fully automatic network-aware MB intracoding refresh technique for error-resilient H.264/AVC video coding, which also dynamically adjusts the number of cyclically intra refreshed MBs according to the network conditions, guaranteeing that endless error propagation is avoided.

The rest of the chapter is organized as follows. Section 1.2 describes the general video coding framework that was used for implementing the considered error-resilient network-aware MB intracoding refresh scheme. Section 1.3 introduces the concept of efficient intracoding refresh, which will later be needed in Section 1.4, where the considered network-aware intracoding refresh scheme itself is described. Section 1.5 presents some relevant performance results for the considered scheme in typical mobile network conditions and, finally, Section 1.6 concludes the chapter.

1.2 Video Coding Framework

The network-aware error-resilient scheme described in this chapter relies on the rate control scheme proposed by Li et al. [24,28], as well as on the RD optimization (RDO) framework and the random intra refresh technique included in the H.264/AVC reference software [25]. Since the main contributions and novelty of the network-aware error-resilient scheme described in this chapter regard the latter two techniques, it is useful to first briefly review the RDO and the random intra refresh techniques included in the H.264/AVC reference software in order for the reader to better understand the described solutions.

1.2.1 Rate-Distortion Optimization

The H.264/AVC video coding standard owes its major performance gains, relative to previous standards, essentially to the many different intra and inter MB coding modes supported by the video coding syntax. Although not all modes are allowed in every H.264/AVC profile [14], even for the simplest profiles, such as the Baseline Profile, the encoder has a plethora of possibilities to encode each MB, which makes it difficult to accomplish optimal MB coding mode decisions with low (encoding) complexity. Besides the MB coding mode decision, for motion-compensated inter coded MBs, finding the optimal motion vectors and MB partitions is also not a straightforward task. In this context, RDO becomes a powerful tool, allowing the encoder to optimally select the best MB coding modes and motion vectors (if applicable) [28,29].

In the H.264/AVC reference software [25], the best MB mode decision is accomplished through the RDO technique, where the best MB mode is selected by minimizing the following Lagrangian cost function:

$$J_{MODE} = D(MODE, QP) + \lambda_{MODE} \cdot R(MODE, QP) \tag{1.1}$$

where

MODE is one of the allowable MB coding modes (e.g., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTRA 4 × 4, INTRA 16 × 16)

QP is the quantization parameter

D(MODE, QP) and R(MODE, QP) are, respectively, the distortion (between the original and the reconstructed MB) and the number of bits that will be achieved by applying the corresponding MODE and QP

In Ref. [28], it is recommended that, for intra (I) and inter predicted (P) slices, $\lambda_{MODE}$ be computed as follows:

(1.2)

Motion estimation can also be accomplished through the same framework. In this case, the best motion vector and reference frame can be selected by minimizing the following Lagrangian cost function:

$$J_{MOTION} = D(mv(REF)) + \lambda_{MOTION} \cdot R(mv(REF)) \tag{1.3}$$

where

mv(REF) is the motion vector for the frame reference REF

D(mv(REF)) is the residual error measure, such as the sum of absolute differences (SAD) between the original and the reference

R(mv(REF)) is the number of bits necessary to encode the corresponding motion vector (i.e., the motion vector difference between the selected motion vector and its prediction) and to signal the selected reference frame

In a similar way, Ref. [28] also recommends that, for P-slices, $\lambda_{MOTION}$ be computed as

(1.4)

when the SAD measure is used.
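The Lagrangian mode selection described above can be illustrated with a minimal Python sketch. The multiplier formulas used here, $\lambda_{MODE} = 0.85 \cdot 2^{(QP-12)/3}$ and $\lambda_{MOTION} = \sqrt{\lambda_{MODE}}$ for SAD, are the forms widely reported for the H.264/AVC reference software; they are an assumption about what Equations 1.2 and 1.4 contain, since those equations are not reproduced in this text. The function names and the candidate distortion/rate figures are hypothetical.

```python
import math


def lambda_mode(qp: int) -> float:
    # Assumed form of Equation 1.2 (I and P slices): 0.85 * 2^((QP - 12) / 3).
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)


def lambda_motion(qp: int) -> float:
    # Assumed form of Equation 1.4 when SAD is the distortion measure.
    return math.sqrt(lambda_mode(qp))


def lagrangian_cost(distortion: float, rate_bits: float, lam: float) -> float:
    # Generic Lagrangian cost J = D + lambda * R (the shape of Equations 1.1 and 1.3).
    return distortion + lam * rate_bits


def best_mode(candidates: dict, qp: int) -> tuple:
    """Return (mode, cost) with the lowest cost.

    candidates maps a mode name to a (distortion, rate_bits) pair.
    """
    lam = lambda_mode(qp)
    costs = {m: lagrangian_cost(d, r, lam) for m, (d, r) in candidates.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]


if __name__ == "__main__":
    # Hypothetical distortion/rate measurements for one MB at QP = 28.
    candidates = {
        "SKIP": (5000.0, 1.0),
        "INTER 16x16": (400.0, 60.0),
        "INTRA 4x4": (300.0, 400.0),
    }
    print(best_mode(candidates, qp=28))
```

Because the multiplier grows with QP, the same candidate set can yield a different winner at a coarser quantization step: rate becomes relatively more expensive than distortion.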

Since the quantization parameter is required for computing the Lagrangian multipliers $\lambda_{MODE}$ and $\lambda_{MOTION}$, as well as for computing the number of bits to encode the residue for a given MB, a rate control mechanism must be used that can efficiently compute for each MB (or set of MBs, such as a slice) an adequate quantization parameter in order to maximize the decoded video quality for a given bit rate budget. In this case, the method proposed by Li et al. [24,28] has been used since it is the one implemented in the H.264/AVC reference software [25].

1.2.2 Random Intra Refresh

As mentioned earlier, the H.264/AVC reference software [25] includes a (nonnormative) technique for intra refreshing MBs. Although this technique is called random intra refresh (RIR), it is not really a purely random refresh technique. This technique is basically a cyclic intra refresh (CIR) technique for which the refresh order is not simply the raster scan order. The refresh order is randomly defined once before encoding, but afterward intra refresh proceeds cyclically, following the determined order, with n MBs for each time instant. An example of a randomly determined intra refresh order, for QCIF spatial resolution, may be seen in Figure 1.1.

Figure 1.1 Example of random intra refresh order for QCIF spatial resolution. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2133, October 2008. With permission. © 2008 IEEE.)

Since the RIR technique used in the H.264/AVC reference software and also considered here is basically a CIR technique, in the remainder of this chapter, the acronyms RIR and CIR will be used interchangeably.

One of the main advantages of this technique is that, being cyclic, it guarantees that all MBs will be refreshed at least once in each cycle, thus guaranteeing that there are no MBs where errors can propagate indefinitely. However, this technique also has disadvantages, one of which is the fact that all MBs are refreshed exactly the same number of times. This basically means that it is not possible to refresh more often those MBs that are more likely to be lost or are harder to conceal at the decoder if an error does occur.

Another important aspect of this technique is that MBs are refreshed according to the predetermined order, without taking into account the resulting RD cost of intra refreshing a given MB, as opposed to letting the rate control module decide which encoding mode is best in terms of RD cost. This is exactly where there is room for improvement: intra refresh should be performed by taking into account the RD cost of a given MB.
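The cyclic refresh with a randomly drawn order described in this section can be sketched as follows. This is a minimal illustration and not the reference software implementation: the function names and the seed are hypothetical, and QCIF (176 × 144 luma samples) is taken to contain 11 × 9 = 99 MBs of 16 × 16 samples.

```python
import random


def make_refresh_order(num_mbs: int, seed: int = 0) -> list:
    # The order is drawn once, before encoding starts (seed is a hypothetical knob).
    order = list(range(num_mbs))
    random.Random(seed).shuffle(order)
    return order


def mbs_to_refresh(order: list, frame_index: int, n: int) -> list:
    # Walk the fixed order cyclically, refreshing n MBs at each time instant.
    start = (frame_index * n) % len(order)
    return [order[(start + k) % len(order)] for k in range(n)]


if __name__ == "__main__":
    qcif_mbs = 11 * 9  # 99 MBs in a QCIF frame
    order = make_refresh_order(qcif_mbs, seed=1)
    for frame in range(3):
        print(frame, mbs_to_refresh(order, frame, n=9))
```

With n = 9 and 99 MBs, one full cycle takes 11 frames, and every MB is intra refreshed exactly once per cycle regardless of its RD cost, which is the limitation noted above.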


1.3 Efficient Intracoding Refresh

When deciding the best MB coding mode, notably between inter- and intracoding modes, the RDO framework, as briefly described in Section 1.2.1, simply selects the mode that has lower RD cost, given by Equation 1.1. This RDO framework, as implemented in the H.264/AVC reference software, does not take into account other dimensions, besides rate and distortion optimization, such as the robustness of the bitstream in error-prone environments. Therefore, some MBs are simply inter coded because their best inter mode RD cost is slightly lower than the best intra mode RD cost. For these cases, selecting the intra mode, although not optimal in a strict RD sense, can prove to be a much better decision when the bitstream becomes corrupted by errors (e.g., due to packet losses in packet networks), and the intra coded MBs can be used to stop error propagation due to the (temporal) predictive coding modes. Moreover, if additional error robustness is introduced through an intra refresh technique, for example, as the one described in Section 1.2.2, some MBs can be highly penalized in an RD sense, since they can be blindly forced to be encoded in an intra mode, without taking into account the RD cost of that decision.

1.3.1 Error-Resilient RDO-Driven Intra Refresh

The main idea of a network-aware error-resilient scheme is to perform RDO in a resilient manner, using the relative RD cost of the best intra mode and the best inter mode for each MB. Therefore, whenever coding a given MB in intra mode does not cost significantly more than the best intercoding mode, the given MB is gracefully forced to be encoded in its best intra mode.

This error-resilient RDO provides an efficient intra refresh scheme, thus guaranteeing that the generated bitstream will be more robust to channel errors, without having to spend a lot of bits on intra coded MBs, which typically reduces the decoded video quality when there are no errors in the channel. This scheme can be described through the MB-level mode decision architecture depicted in Figure 1.2.

FIGURE 1.2 Architecture of the error-resilient MB intra/inter mode decision scheme. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. © 2008 IEEE.)

1.3.1.1 RDO Intra and Inter Mode Decision

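Before deciding the best mode to encode a given MB, the best inter mode RD cost, JINTER, is computed from the set of all possible inter modes, and the best intra mode RD cost, JINTRA, is computed from the set of all possible intra modes through RDO (i.e., Equations 1.1 and 1.3), as follows:

$$J_{INTER} = \min_{m \in S_{INTER}} J(m)$$

where S_INTER is the set of allowed inter modes (i.e., SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, and INTER 4 × 4), and

$$J_{INTRA} = \min_{m \in S_{INTRA}} J(m)$$

where S_INTRA is the set of allowed intra modes (i.e., INTRA 4 × 4, INTRA 16 × 16, INTRA PCM, or INTRA 8 × 8).

The best intra and inter modes are the ones with the lowest intra and inter RD costs, respectively.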

1.3.1.2 Error-Resilient Intra/Inter Mode Decision

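To control the number of MBs that will be gracefully forced to be encoded in intra mode, a control parameter, αRD (which basically specifies the tolerable RD cost increase for replacing an inter MB by an intra MB), is used in such a way that an MB is encoded in its best intra mode whenever

$$\frac{J_{INTRA}}{J_{INTER}} < \alpha_{RD}$$

and in its best inter mode otherwise.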


1.3.2 Random Intra Refresh

Notice that the previous scheme does not guarantee that all MBs are periodically refreshed, which, if not properly handled, could lead to an endless propagation of errors over time for some MBs in the video sequence. To handle this issue, random intra refresh (RIR) can also be applied concurrently, but with a lower number of refreshed MBs per frame than when the RIR technique is applied alone, in order not to dramatically compromise the RD efficiency.

FIGURE 1.3 MBs with an intra/inter RD cost ratio below the line will be gracefully forced to intra mode. (From Nunes, P. et al., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. © 2008 IEEE.)
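As a rough illustration of the mode decision described in Section 1.3.1, the following Python sketch picks the cheapest inter and intra modes and then forces the intra mode whenever the intra cost is below αRD times the inter cost. The function name, the dictionary-based cost tables, and the example cost values are hypothetical, and the strict comparison is an assumption, chosen so that αRD = 1 reduces to plain RDO.

```python
def decide_mb_mode(inter_costs, intra_costs, alpha_rd):
    """Return (mode_name, rd_cost) for one MB.

    inter_costs / intra_costs map a mode name to its RD cost J
    (hypothetical inputs; a real encoder would compute them via RDO).
    """
    best_inter = min(inter_costs, key=inter_costs.get)
    best_intra = min(intra_costs, key=intra_costs.get)
    j_inter = inter_costs[best_inter]
    j_intra = intra_costs[best_intra]

    # Force intra when it costs less than alpha_rd times the best inter cost.
    # Written as a product so that a zero inter cost does not cause a division error.
    if j_intra < alpha_rd * j_inter:
        return best_intra, j_intra
    return best_inter, j_inter


# Illustrative costs only (not measured values).
inter = {"SKIP": 130.0, "INTER 16x16": 100.0, "INTER 8x8": 112.0}
intra = {"INTRA 4x4": 150.0, "INTRA 16x16": 140.0}

plain_rdo = decide_mb_mode(inter, intra, alpha_rd=1.0)   # stays inter
resilient = decide_mb_mode(inter, intra, alpha_rd=1.8)   # forced to intra
```

With these made-up costs the intra/inter ratio is 1.4, so the MB stays inter coded for αRD = 1.0 and is forced to intra for αRD = 1.8.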

1.4 Network-Aware Error-Resilient Video Coding Method

The main limitation of the MB coding mode decision method described in Section 1.3 is that the control parameter, αRD, is not dynamically adapted to the actual network error conditions. However, when feedback about the network error conditions is available, this information can be used to adjust the αRD control parameter in order to maximize the decoded video quality while dynamically providing adequate error resilience.

1.4.1 Intra/Inter Mode Decision with Constant αRD

When a constant αRD value is used without considering the current network error conditions in terms of packet loss rate (PLR), the benefits of the technique described in Section 1.3 (and proposed in Ref. [23]) are not fully exploited. This is clear from Figure 1.4, where the Foreman sequence has been encoded with the Baseline Profile of H.264/AVC with different αRD values, including αRD = 1.


In Figure 1.4, as well as in the remainder of Section 1.4, CIR is not used in order to avoid biasing the behavior associated with the αRD parameter. Notice, however, that the use of CIR is typically recommended, as mentioned in Section 1.2.2. As can be seen, in these conditions, the optimal αRD (i.e., the one that leads to the highest PSNR) is highly dependent on the network PLR.

FIGURE 1.4 PSNR versus PLR for a constant αRD parameter for the Foreman sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

As expected, when there are no errors (PLR = 0%), the highest decoding quality is achieved when no intra MBs are forced (i.e., αRD = 1.0). However, for this αRD value, the decoded video quality decays very rapidly as the PLR increases. On the other hand, if only a small number of intra MBs are forced (i.e., αRD = 1.8), the decoded video quality is slightly improved for the higher PLR values, when compared to the case with no forced intra MBs, but is slightly penalized for error-free transmission. This effect becomes more evident as the αRD value increases, which corresponds to the situation where more and more intra MBs are gracefully forced. For example, for αRD = 3.8 and a PLR of 10%, the decoded video quality is greatly improved relative to the situation with no forced intra MBs (by 6.36 dB), because the error propagation is significantly reduced. However, for lower PLRs and still for αRD = 3.8, the decoded video quality is penalized due to the excessive use of intracoding (by 7.19 dB for PLR = 0% and 1.50 dB for PLR = 1%).

Therefore, from what has been presented earlier, it is possible to conclude that the optimal amount of intra coded MBs is highly dependent on the error characteristics of the underlying network and, thus, the error resilience control parameter αRD should be dynamically adjusted to the channel error conditions to maximize the decoded quality.

FIGURE 1.5 PSNR versus αRD (alpha in the x-axis label) parameter for various PLRs for the Mother and Daughter sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

In order to illustrate the influence of the αRD parameter on the decoded PSNR, Figure 1.5 shows the decoded video quality, in terms of PSNR, versus the αRD parameter for several PLRs for the Mother and Daughter sequence (QCIF, 10 Hz) encoded at 64 kbit/s. Clearly, for each PLR condition, there is an αRD value that maximizes the decoded video quality. For example, for a PLR of 10%, the maximum PSNR value is achieved for αRD = 2.2. To further illustrate the importance of a proper selection of the αRD parameter and how it can significantly improve the overall decoded video quality under severe error conditions, it should be noted that, for a PLR of 10%, the PSNR difference between having αRD = 2.2 and αRD = 1.1 is 5.47 dB.

1.4.2 Intra/Inter Mode Decision with Network-Aware αRD Selection

A possible approach to address the problem of adapting the αRD parameter to the channel error conditions is to use the information in the receiver reports (RR) of the real-time transport protocol (RTP) control protocol (RTCP) [30] to provide the encoder with the actual error characteristics of the underlying network. This makes it possible to adaptively and efficiently select the amount of intra coded MBs to be inserted in each frame by taking into account this feedback information about the rate of lost packets, as shown in Figure 1.6.
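To make the feedback path more concrete: in RFC 3550, each RTCP receiver report block carries an 8-bit "fraction lost" field, which is the loss fraction since the previous report multiplied by 256, with only the integer part kept. The minimal sketch below converts that field into a PLR expressed in percent. The function name is hypothetical, and parsing of the RTCP packet itself is left out.

```python
def plr_percent_from_fraction_lost(fraction_lost):
    """Convert the 8-bit RTCP RR 'fraction lost' field to a PLR in percent."""
    if not 0 <= fraction_lost <= 255:
        raise ValueError("fraction lost must fit in 8 bits")
    return 100.0 * fraction_lost / 256.0


# A field value of 26 corresponds to roughly 10% of the packets being lost.
plr = plr_percent_from_fraction_lost(26)
```

Because the field is quantized to 1/256, the encoder only ever sees the PLR in steps of about 0.4%.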


FIGURE 1.6 Network-aware video encoding architecture. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

In the method presented here, the intra/inter mode decision is still based on the αRD parameter, but this time αRD may depend on several aspects, such as the content type, the content spatial and temporal resolutions, the coding bit rate, and the PLR of the network.

This way, by considering a mapping function fNMD, it will be possible to dynamically determine the αRD parameter from the following expression:

$$\alpha_{RD} = f_{NMD}(S, PLR) \qquad (1.9)$$

where
PLR is the packet loss rate
S can be an n-dimensional vector characterizing the encoding scenario, for example, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate

In this work, however, as will be shown later in Section 1.4.3, the encoding scenario can be characterized solely by the encoded bit rate with a good approximation. The fNMD function basically maps the encoding scenario and the network PLR into a “good” αRD parameter that dynamically maximizes the average decoding video quality. Notice that, although it is not easy to obtain a general function, it can be defined for several classes of content and a limited, discrete set of encoding parameters and PLRs. In this chapter, it will be shown that, by carefully designing the fNMD function, significant gains can be obtained in terms of video quality with respect to the reference method described in Section 1.4.4.

Therefore, the network-aware MB mode decision (NMD) method can be briefly described through the following steps in terms of encoder operation:

1. Obtain the packet loss rate through network feedback.

2. Compute the αRD parameter through the mapping function given by Equation 1.9 (and detailed in the following).

3. Perform intra/inter mode decision using the αRD parameter, computed in Step 2, for the next MB to be encoded, and encode the MB.

4. Check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 3.
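The four steps above can be sketched as a small control loop. This is only an illustration under stated assumptions: the function and argument names are hypothetical, network reports are emulated by a dictionary keyed by MB index, and `toy_f_nmd` is a placeholder mapping used for demonstration, not the model developed in this chapter.

```python
def nmd_alpha_schedule(mb_count, feedback_by_mb, f_nmd, scenario, initial_plr=0.0):
    """Return the alpha_RD value used for each MB, following Steps 1 to 4.

    feedback_by_mb maps an MB index to the PLR reported just before that MB
    is encoded (a stand-in for asynchronous network reports).
    """
    alphas = []
    alpha = f_nmd(scenario, initial_plr)      # Steps 1 and 2 with the initial PLR
    for mb in range(mb_count):
        if mb in feedback_by_mb:              # Step 4: a new report has arrived
            plr = feedback_by_mb[mb]          # Step 1: read the reported PLR
            alpha = f_nmd(scenario, plr)      # Step 2: recompute alpha_RD
        alphas.append(alpha)                  # Step 3 would encode the MB with alpha
    return alphas


# Placeholder mapping for demonstration only; not the model of this chapter.
def toy_f_nmd(scenario, plr):
    return 1.0 + 0.1 * plr


schedule = nmd_alpha_schedule(6, {2: 5.0, 4: 10.0}, toy_f_nmd, scenario=None)
```

The value of αRD only changes at the MBs where a report arrives, so the encoder keeps using the last computed value in between reports.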


Notice that it is out of the scope of this chapter to define when the network reports are issued, since this will depend on how the network protocols are configured and on the varying characteristics of the network itself [30]. Nevertheless, in real application scenarios, it is important to design appropriate interfacing mechanisms between the codec and the underlying network, so that both encoder and decoder can adaptively adjust their operations according to the network conditions [12].

Through Equation 1.9, the encoder is able to adjust the amount of intra refresh according to the network error conditions and the available bit rate. This intra refresh method typically increases the intra refresh for the more complex MBs, which are typically those that are more difficult to conceal. The main problem of this approach is that it does not guarantee that all MBs in the scene are refreshed. This is clearly illustrated in Figure 1.7 for the Foreman sequence, where the right image represents the relative amount of MB intra refresh along the sequence (lighter blocks mean more intra refresh). As can be seen, with this intra refresh scheme some MBs are never refreshed, which can lead to errors propagating indefinitely over time in these MB positions (dark blocks in Figure 1.7).

1.4.3 Model for the fNMD Mapping Function

In order to devise a model for the mapping function fNMD defined in Equation 1.9, it is first important to see how the optimal αRD parameter varies with PLR. This is plotted in Figure 1.8 for three different sequences (i.e., Mother and Daughter, Foreman, and Mobile and Calendar) encoded at different bit rates and resolutions, for illustrative purposes. Each curve in Figure 1.8 corresponds to a different encoding scenario S, in terms of the content motion activity and the texture coding complexity, the content spatial and temporal resolutions, and the coding bit rate (see Equation 1.9). As will be detailed later in Section 1.5, these three sequences have also been encoded at many other bit rates, and the kind of curves obtained was always similar.

FIGURE 1.7 Relative amount of intra refresh (b) for the MBs of the Foreman sequence (a) (QCIF, 15 Hz, 128 kbit/s, and α = 1.1). (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3074, November 2009. With permission. © 2009 IEEE.)

FIGURE 1.8 Example of optimal αRD versus PLR for various sequences and bit rates. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

As can be seen from the plots in Figure 1.8, the behavior of the optimal αRD parameter versus the PLR is similar to that of a charging capacitor [31] (but starting at αRD = 1.0). Therefore, for a given sequence and for a given bit rate (i.e., a given encoding scenario S), it should be possible to model the behavior of the αRD parameter with respect to the PLR with the following expression:

$$\alpha_{RD} = K_1 \left(1 - e^{-K_2 \cdot PLR}\right) + 1 \qquad (1.10)$$

where PLR represents the packet loss rate, while K1 and K2 represent constants that are specific to the considered encoding scenario, notably the sequence characteristics and bit rate. However, the main problem in using Equation 1.10 to compute αRD is that, for a given sequence, a different set of K1 and K2 values would be needed for each of the considered bit rates, which would be extremely impractical. In order to address this issue, it is important to understand how the optimal αRD parameter varies when both the PLR and the bit rate vary. This variation is illustrated in Figure 1.9 for the Mobile and Calendar sequence.


FIGURE 1.9 Optimal αRD versus PLR and bit rate for the Mobile and Calendar sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

After close inspection of Figure 1.9, it can be seen that the K1 value, which basically dictates the value of αRD toward which the curve asymptotically converges, depends linearly on the bit rate used and, therefore, it can be modeled by the following expression:

$$K_1 = a \cdot r_b + b \qquad (1.11)$$

where rb is the bit rate, while a and b are the parameters that need to be estimated for a given sequence.

As for the K2 value, which dictates the growth rate of the considered exponential, it appears, after exhaustive testing, not to depend on the bit rate used. Therefore, as a first approach, it can be considered to be constant, as in

$$K_2 = c \qquad (1.12)$$

This behavior was observed for the three different video sequences mentioned earlier and, therefore, makes it possible to establish a final expression which allows the video encoder to automatically select, for a given sequence, an adequate αRD parameter when the PLR and the bit rate rb are known:

$$\alpha_{RD} = (a \cdot r_b + b)\left(1 - e^{-c \cdot PLR}\right) + 1 \qquad (1.13)$$

where a, b, and c are the model parameters that need to be estimated (see Ref. [26]). After extensive experimentation, it was found that the parameters a, b, and c can be considered more or less independent of the sequence, which means that a single set of parameters could be used for three different video sequences with a low fitting error. This basically means that the encoding scenario S, defined in Section 1.4.2, can be well represented by the bit rate rb alone.

As explained in Ref. [26], the parameters a, b, and c could be obtained by considering four packet loss rates and two different bit rates for three different sequences, corresponding to a total of 24 (rb, PLR) pairs, with the iterative Levenberg–Marquardt method [32,33]. By following this approach, the estimated parameters are a = 0.83 × 10⁻⁶, b = 0.97, and c = 0.90.
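The model of this section can be evaluated with a few lines of Python. The sketch below is written under explicit assumptions: the functional form (a saturating exponential that starts at 1.0 and converges to 1 + K1, with K1 linear in the bit rate and K2 constant) is reconstructed from the description in the text, and the units (bit rate in bit/s, PLR in percent) are assumptions that are not stated in this section. Only the values of a, b, and c are taken from the text.

```python
import math

A, B, C = 0.83e-6, 0.97, 0.90   # fitted parameters quoted in the text


def alpha_rd(bit_rate_bps, plr_percent):
    """Charging-capacitor-like model: starts at 1.0 and saturates at 1 + K1.

    Assumed units: bit rate in bit/s, PLR in percent.
    """
    k1 = A * bit_rate_bps + B        # depends linearly on the bit rate
    k2 = C                           # constant growth rate
    return 1.0 + k1 * (1.0 - math.exp(-k2 * plr_percent))


# alpha_RD for a 64 kbit/s stream at increasing loss rates
values = [alpha_rd(64_000, plr) for plr in (0.0, 1.0, 5.0, 10.0)]
```

Under these assumptions, αRD equals exactly 1.0 for an error-free channel, which corresponds to plain RDO with no forced intra MBs, and it grows monotonically with the PLR.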

1.4.4 Network-Aware Cyclic Intra Refresh

The approach presented in Section 1.4.2 can also be followed to simply adjust the number of cyclic intra refreshed MBs per frame, based on the feedback received about the network PLR, without any RD cost considerations. This is shown in Figure 1.10, where it is clear that for each PLR condition there is a number of cyclic intra refresh MBs that maximizes the decoded video quality. However, when comparing the best PSNR results of Figures 1.5 and 1.10 (both obtained for the Mother and Daughter sequence encoded with the same spatial and temporal resolutions and the same bit rate), for a given PLR, the PSNR values obtained by varying αRD are always higher. For example, for a PLR of 5%, a maximum average PSNR of 37.03 dB is achieved for αRD = 1.9 (see Figure 1.5), while a maximum PSNR of only 34.94 dB is achieved for 33 cyclically intra refreshed MBs in each frame (see Figure 1.10), a difference of approximately 2 dB. This shows that by adequately choosing the αRD parameter it should be possible to achieve a higher quality than when using the optimal number of CIR MBs. This is mainly due to the fact that, when simply cyclically intra refreshing some MBs in a given frame, the additional RD cost of that decision can be extremely high, penalizing the overall video quality, since the “cheap” intra MBs are not sought out as they are in the efficient intracoding refresh solution based on the αRD parameter.


FIGURE 1.10 PSNR versus number of CIR MBs for various PLRs for the Mother and Daughter sequence. (From Soares, L. D. et al., Efficient network-aware macroblock mode decision for error resilient H.264/AVC video coding, Proceedings of the SPIE Conference on Applications of Digital Image Processing, vol. 7073, San Diego, CA, August 2008.)

1.4.5 Intra Refresh with Network-Aware αRD and CIR Selection

The main drawback of the scheme described in Section 1.4.3, namely that it cannot guarantee that all MBs are periodically refreshed, can be alleviated by introducing some additional CIR MBs per frame to guarantee that all MB positions are refreshed with a minimum periodicity. This requirement raises the question of how to adaptively select a number of CIR MBs that is sufficiently high to avoid long-term error propagation without penalizing the encoder RD performance too much.

A possible approach to tackle this problem is to decide the adequate αRD value and the number of CIR MBs per frame separately, using a different model for each of these two error resilience parameters. For the αRD selection, the model in Equation 1.9 is used. As for the selection of the number of CIR MBs, it was verified after exhaustive testing [27] that the optimal amount of CIR MBs tends to increase linearly with the bit rate rb, for a given PLR, but tends to increase exponentially with the PLR, for a given bit rate. Based on these observations, the following model was considered for the selection of the amount of CIR MBs per frame:

$$f_{CIR}(r_b, PLR) = (a_1 \cdot r_b + b_1)\, e^{c_1 \cdot PLR} \qquad (1.14)$$

where a1, b1, and c1 are the model parameters that need to be estimated. In Ref. [27], these parameters have been determined by nonlinear curve fitting (the Levenberg–Marquardt method) of the optimal amount of CIR MBs per frame, experimentally determined for a set of representative test sequences, encoding bit rate ranges, and packet loss rates. The estimated parameters were a1 = 12.97 × 10⁻⁶, b1 = −0.13, and c1 = 0.24; these parameter values will also be considered here.
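A small numerical sketch can help to get a feeling for the CIR model. It rests on several assumptions that should be read with care: the product form used below (linear in the bit rate, exponential in the PLR) is only one plausible expression consistent with the description in the text, the units (bit/s and percent) are assumed, and the clamping to the number of MBs in a frame is an addition made here for safety. Only the three parameter values are taken from the text.

```python
import math

A1, B1, C1 = 12.97e-6, -0.13, 0.24   # parameters quoted in the text


def cir_mbs_per_frame(bit_rate_bps, plr_percent, mbs_per_frame):
    """Assumed model: linear in bit rate, exponential in PLR.

    The result is rounded to the nearest integer and clamped to
    the range [0, mbs_per_frame] (the clamp is an assumption).
    """
    raw = (A1 * bit_rate_bps + B1) * math.exp(C1 * plr_percent)
    return max(0, min(mbs_per_frame, round(raw)))


# QCIF frame (99 MBs) at 64 kbit/s for three loss rates
counts = [cir_mbs_per_frame(64_000, plr, 99) for plr in (1.0, 5.0, 10.0)]
```

Under this assumed form, only a handful of MBs per frame are cyclically refreshed at low bit rates, which matches the intent of leaving most of the refresh to the αRD-based decision.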

Figure 1.11 shows the proposed model as well as the experimental data for the Mobile and Calendar test sequence. As can be seen, a simple linear model would not have represented the experimental data well.

FIGURE 1.11 Optimal amount of CIR MBs per frame versus PLR and bit rate for the Mobile and Calendar sequence. (From Nunes, P. et al., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3075, November 2009. With permission. © 2009 IEEE.)

The CIR order is randomly defined once before encoding, as described in Section 1.2.2 (and in Ref. [25]), to avoid the subjectively disturbing effect of performing sequential (e.g., raster scan) refresh. The determined order is then cyclically followed, with the computed number of MBs being refreshed in each frame.
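The random-once, cyclic-afterwards refresh order can be illustrated as follows. This is a minimal sketch with hypothetical function names. It assumes a fixed number of refreshed MBs per frame, whereas an encoder that changes this number between frames would keep a running position in the order instead.

```python
import random


def make_cir_order(mb_count, seed=0):
    """Shuffle the MB positions once, before encoding starts."""
    order = list(range(mb_count))
    random.Random(seed).shuffle(order)
    return order


def cir_mbs_for_frame(order, frame_index, mbs_per_frame):
    """MB positions forced to intra in a given frame, walking the order cyclically."""
    start = (frame_index * mbs_per_frame) % len(order)
    return [order[(start + i) % len(order)] for i in range(mbs_per_frame)]


order = make_cir_order(99)                  # QCIF frame: 99 MBs
refreshed = set()
for frame in range(9):                      # 9 frames x 11 MBs = 99 positions
    refreshed.update(cir_mbs_for_frame(order, frame, 11))
```

Since the order is a permutation of all MB positions, every position is refreshed exactly once per cycle, here after 9 frames.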

Therefore, the complete network-aware MB intracoding refresh (NIR) scheme (which was initially proposed in Ref. [27]) can be briefly described by the following steps in terms of encoder operation:

Step 1. Obtain the PLR value through network feedback.

Step 2. Compute the number of CIR MBs to be used per frame, by using the proposed fCIR function defined by Equation 1.14 and rounding it to the nearest integer.

Step 3. Compute the αRD value by using the fNMD function defined by Equation 1.9 in Section 1.4.2.

Step 4. For each MB in a frame, check if it should be forced to intra mode according to the CIR order and the determined number of CIR MBs per frame; if not, perform intra/inter mode decision using the αRD value computed in Step 3; encode the MB with the selected mode.

Step 5. At the end of the frame, check if a new network feedback report has arrived; if yes, go back to Step 1; if not, go back to Step 4.
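Step 4 combines the two refresh mechanisms for every MB of a frame. The following sketch shows one possible way to express it. The function name and the cost tuples are hypothetical, the cost values are made up for illustration, and the strict comparison against αRD is an assumption.

```python
def plan_frame_modes(mb_costs, cir_positions, alpha_rd):
    """Return 'INTRA' or 'INTER' for every MB of one frame (Step 4).

    mb_costs is a list of (j_inter, j_intra) best-mode RD costs per MB;
    cir_positions holds the MB indices scheduled for cyclic intra refresh.
    """
    forced = set(cir_positions)
    modes = []
    for index, (j_inter, j_intra) in enumerate(mb_costs):
        if index in forced:
            modes.append("INTRA")             # CIR takes precedence
        elif j_intra < alpha_rd * j_inter:
            modes.append("INTRA")             # cheap intra MB, gracefully forced
        else:
            modes.append("INTER")
    return modes


# Illustrative costs only (not measured values).
costs = [(100.0, 300.0), (100.0, 150.0), (100.0, 90.0), (100.0, 400.0)]
modes = plan_frame_modes(costs, cir_positions=[3], alpha_rd=1.8)
```

In this example the last MB is intra coded despite its high intra cost, because it is its turn in the CIR order, while the second and third MBs are intra coded because their intra cost is low enough.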

The definition of when the network reports are issued depends on how the network protocols are configured and on the varying characteristics of the network itself [34].

Notice that independently selecting the αRD value and the amount of CIR MBs, while they are likely interdependent, can lead to chosen values that do not correspond to the optimal (αRD, CIR) pair. However, it has been verified after extensive experimentation that the considered independent selection process is still robust, in the sense that the chosen values are typically close enough to the optimal pair and, therefore, the overall performance is not dramatically penalized.

1.5 Performance Evaluation

To evaluate the performance of the complete NIR scheme described in this chapter, it has been compared in similar conditions to a reference intra refresh scheme, which basically corresponds to the network-aware version of the cyclic intra refresh scheme of the H.264/AVC reference software [25], described in Section 1.4.4. This solution has been adopted because, at the time of writing, no other network-aware intra refresh techniques that adaptively take into account the current network conditions were known.

In the reference scheme, the optimal number of CIR MBs per frame is selected manually for the considered network conditions, while in the considered NIR solution, the selection of the amount of CIR MBs per frame and the αRD parameter is done fully automatically. For the complete NIR and reference schemes, the Mother and Daughter, the Foreman, and the Mobile and Calendar video sequences have been encoded using the H.264/AVC Baseline Profile [25]. The test conditions used, which are representative of those currently used for personal communications over mobile networks, are summarized in Table 1.1. For QCIF, each frame was divided into three slices, while for CIF each frame was divided into six slices. In both cases, each slice consists of three MB rows. After encoding, each slice was mapped to an RTP packet for network transmission [34].
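The slice counts follow directly from the frame dimensions. The short sketch below works through the arithmetic, using the standard QCIF (176 × 144) and CIF (352 × 288) luma resolutions and 16 × 16 MBs. The function name is hypothetical.

```python
MB_SIZE = 16            # luma samples per MB side
ROWS_PER_SLICE = 3      # as stated in the test conditions


def frame_layout(width, height):
    """Return (mbs_per_row, mb_rows, total_mbs, slices) for a frame size."""
    mbs_per_row = width // MB_SIZE
    mb_rows = height // MB_SIZE
    slices = mb_rows // ROWS_PER_SLICE
    return mbs_per_row, mb_rows, mbs_per_row * mb_rows, slices


qcif = frame_layout(176, 144)   # (11, 9, 99, 3)
cif = frame_layout(352, 288)    # (22, 18, 396, 6)
```

A QCIF frame therefore holds 99 MBs in three slices of 33 MBs, and a CIF frame holds 396 MBs in six slices of 66 MBs, so each lost RTP packet removes one third or one sixth of a frame, respectively.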

TABLE 1.1

Test Conditions

Sequence             Mother and Daughter    Foreman    Mobile and Calendar
Bit rate (kbit/s)    24–64                  48–128     384–1152

Source: Nunes, P., Soares, D., and Pereira, F., Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, p. 2134, October 2008. With permission. Copyright 2008 IEEE.

For the reference scheme, the number of cyclically intra refreshed MBs per frame was chosen for each PLR and bit rate, such that the decoded video quality would be the best possible. This was done manually by performing an exhaustive set of tests using many different amounts of CIR MBs per frame and then choosing the one that leads to the highest decoded average PSNR value, obtained by averaging over 50 different error patterns. For the QCIF video sequences, the possible values for the number of cyclically intra refreshed MBs were chosen from the representative set {0, 5, 11, 22, 33, …, 99}, while for the CIF video sequences the representative set consisted of {0, 22, 44, 66, …, 396}. To simulate the network conditions, three different PLRs were considered: 1%, 5%, and 10%. Since each slice is mapped to one RTP packet, each lost packet will correspond to a lost video slice. Packet losses are considered independent and identically distributed. For each one of the studied PLRs, each coded bitstream has been corrupted and then decoded 50 times (i.e., corresponding to 50 different error patterns or runs), while applying the default error concealment technique implemented in the H.264/AVC reference software [25,28]. The presented results correspond to PSNR averages of these 50 different runs for the luminance component (PSNR Y).
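The evaluation procedure described above (independent and identically distributed packet losses, 50 error patterns, averaged luminance PSNR) can be sketched in Python as follows. The function names are hypothetical, and `decode_psnr` is a placeholder callback standing in for the actual decoder with error concealment, which is not reproduced here. The PSNR helper uses the usual definition for 8-bit samples.

```python
import math
import random


def loss_pattern(packet_count, plr_percent, seed):
    """True marks a lost packet; losses are i.i.d. Bernoulli draws."""
    rng = random.Random(seed)
    p = plr_percent / 100.0
    return [rng.random() < p for _ in range(packet_count)]


def psnr_y(mse, peak=255.0):
    """Luminance PSNR in dB for 8-bit samples, given a non-zero mean squared error."""
    return 10.0 * math.log10(peak * peak / mse)


def average_over_runs(decode_psnr, packet_count, plr_percent, runs=50):
    """Average a per-run PSNR over `runs` independent error patterns.

    decode_psnr is a hypothetical callback that takes a loss pattern and
    returns the PSNR of the decoded (and concealed) sequence for that run.
    """
    total = 0.0
    for run in range(runs):
        total += decode_psnr(loss_pattern(packet_count, plr_percent, seed=run))
    return total / runs
```

Using the run index as the seed makes each of the 50 error patterns reproducible, so that different schemes can be evaluated against the same set of loss realizations.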

For the conditions mentioned earlier, PSNR Y results are shown in Tables 1.2 through 1.4 for the Mother and Daughter, Foreman, and Mobile and Calendar video sequences, respectively. In these tables, NIR refers to the complete network-aware intracoding refresh scheme described in this chapter, and JM refers to the reference technique (winning cases appear in bold). In addition, OPT corresponds to the manual selection of the best (αRD, CIR) pair.

TABLE 1.2

PSNR Results for the Mother and Daughter Sequence


Source: From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.

TABLE 1.3

PSNR Results for the Foreman Sequence

Source: From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.

TABLE 1.4

PSNR Results for the Mobile and Calendar Sequence


Source: From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.

No visual results are given here, because the direct comparison of peer frames (encoded with different coding mode selection schemes) is rather meaningless in this case; only the comparison of the total video quality for several error patterns makes sense. This is due to the fact that the generated streams for the proposed and the reference techniques are different and, even if the same error pattern is used to corrupt them, the errors will affect different parts of the data at a given time instant, causing very different artifacts.

To help the reader better appreciate the gains obtained with the proposed technique, the results obtained for the Mother and Daughter sequence are also plotted in Figure 1.12, for both JM and NIR. For the Foreman and the Mobile and Calendar sequences, the trends are similar.


FIGURE 1.12 PSNR results for the Mother and Daughter sequence. (From Nunes, P., Soares, D., and Pereira, F., Automatic and adaptive network-aware macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, Cairo, Egypt, p. 3076, November 2009. With permission. Copyright 2009 IEEE.)

The presented results show that, when the fully automatic NIR scheme is used, the decoded video quality is significantly improved for the vast majority of tested conditions when compared to the reference method with a manually selected amount of CIR MBs (JM). Improvements of the NIR method can be as high as 1.90 dB for the Mother and Daughter sequence encoded at 64 kbit/s and a PLR of 5%. The most significant exception is for the PLR of 10% and higher bit rates (see Tables 1.3 and 1.4). This exception is due to the fact that, for these PLR and bit rate values, the number of CIR MBs chosen with the proposed fCIR is slightly different from the optimal values.

When comparing the NIR scheme to the one proposed in Ref. [26], which does not use CIR, the NIR PSNR Y values are most of the time higher than or equal to those achieved in Ref. [26]. The highest gains occur for the Foreman sequence encoded at 128 kbit/s and a PLR of 10% (0.90 dB), and for the Mobile and Calendar sequence encoded at 768 kbit/s and a PLR of 10% (0.60 dB). For the cases where the NIR leads to lower PSNR Y values, the losses are never more than 0.49 dB, which happens for the Mobile and Calendar sequence encoded at 896 kbit/s and a PLR of 5%.

Notice, however, that the scheme in Ref. [26] cannot guarantee that all MBs will eventually be refreshed, which is a major drawback for real usage in error-prone environments, such as mobile networks. On the other hand, the scheme described in this chapter not only overcomes this drawback, but does so fully automatically, without any user intervention.


1.6 Final Remarks

This chapter describes a method to efficiently and fully automatically perform intracoding refresh, while taking into account the PLR of the underlying network and the encoded bit rate. The described method can be used to efficiently generate error-resilient H.264/AVC bitstreams that are adapted to the channel error characteristics. This is extremely important because it means that error-resilient video transmission becomes possible, with improved quality, in environments with varying error characteristics, notably when compared to the case where the MB intracoding decisions are taken without considering the error characteristics of the network.

The authors would like to acknowledge that the work described in this chapter was developed at Instituto de Telecomunicações (Lisboa, Portugal) and was supported by FCT project PEst-OE/EEI/LA0008/2011.

1. A. H. Li, S. Kittitornkun, Y.-H. Hu, D.-S. Park, J. Villasenor, Data partitioning and reversible variable length codes for robust video communications, Proceedings of the IEEE Data Compression Conference, Snowbird, UT, pp. 460–469, March 2000.

2. G. Cote, S. Shirani, F. Kossentini, Optimal mode selection and synchronization for robust video communications over error-prone networks, IEEE Journal on Selected Areas in Communications, 18(6), 952–965, June 2000.

3. S. Wenger, G. D. Knorr, J. Ott, F. Kossentini, Error resilience support in H.263+, IEEE Transactions on Circuits and Systems for Video Technology, 8(7), 867–877, November 1998.

4. L. P. Kondi, F. Ishtiaq, A. K. Katsaggelos, Joint source-channel coding for motion-compensated DCT-based SNR scalable video, IEEE Transactions on Image Processing, 11(9), 1043–1052, September 2002.

5. H. M. Radha, M. van der Schaar, Y. Chen, The MPEG-4 fine-grained scalable video coding method for multimedia streaming over IP, IEEE Transactions on Multimedia, 3(1), 53–68, March 2001.

6. T. Schierl, T. Stockhammer, T. Wiegand, Mobile video transmission using scalable video coding, IEEE Transactions on Circuits and Systems for Video Technology, 17(9), 1204–1217, September 2007.

7. R. Puri, K. Ramchandran, Multiple description source coding through forward error correction codes, Proceedings of the Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, vol. 1, pp. 342–346, October 1999.

8. V. K. Goyal, Multiple description coding: Compression meets the network, IEEE Signal Processing Magazine, 18(5), 74–93, September 2001.

9. K. Stuhlmüller, N. Färber, M. Link, B. Girod, Analysis of video transmission over lossy channels, IEEE Journal on Selected Areas in Communications, 18(6), 1012–1032, June 2000.

10. L. D. Soares, F. Pereira, Error resilience and concealment performance for MPEG-4 frame-based video coding, Signal Processing: Image Communication, 14(6–8), 447–472, May 1999.


11. A. K. Katsaggelos, F. Ishtiaq, L. P. Kondi, M.-C. Hong, M. Banham, J. Brailean, Error resilience and concealment in video coding, Proceedings of the European Signal Processing Conference, Rhodes, Greece, pp. 221–228, September 1998.

12. Y. Wang, S. Wenger, J. Wen, A. Katsaggelos, Error resilient video coding techniques, IEEE Signal Processing Magazine, 17(4), 61–82, July 2000.

13. F. Zhai, A. Katsaggelos, Joint Source-Channel Video Transmission, Morgan & Claypool Publishers, San Rafael, CA, 2007.

14 ISO/IEC 14496-10, Information Technology—Coding of Audio-Visual Objects—Part 10: Advanced Video Coding, 2005.

15 ISO/IEC 14496-2, Information Technology—Coding of Audio-Visual Objects—Part 2: Visual (2nd Edn.), 2001.

16. P. Haskell, D. Messerschmitt, Resynchronization of motion compensated video affected by ATM cell loss, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, San Francisco, CA, vol. 3, pp. 545–548, March 1992.

17. G. Côté, F. Kossentini, Optimal intra coding of blocks for robust video communication over the Internet, Signal Processing: Image Communication, 15(1–2), 25–34, September 1999.

18. J. Y. Liao, J. D. Villasenor, Adaptive intra block update for robust transmission of H.263, IEEE Transactions on Circuits and Systems for Video Technology, 10(1), 30–35, February 2000.

19. P. Frossard, O. Verscheure, AMISP: A complete content-based MPEG-2 error-resilient scheme, IEEE Transactions on Circuits and Systems for Video Technology, 11(9), 989–998, September 2001.

20. Z. He, J. Cai, C. Chen, Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding, IEEE Transactions on Circuits and Systems for Video Technology, 12(6), 511–523, June 2002.

21. H. Shu, L. Chau, Intra/Inter macroblock mode decision for error-resilient transcoding, IEEE Transactions on Multimedia, 10(1), 97–104, January 2008.

22. H.-J. Ma, F. Zhou, R.-X. Jiang, Y.-W. Chen, A network-aware error-resilient method using prioritized intra refresh for wireless video communications, Journal of Zhejiang University - Science A, 10(8), 1169–1176, August 2009.

23. P. Nunes, L. D. Soares, F. Pereira, Error resilient macroblock rate control for H.264/AVC video coding, Proceedings of the IEEE International Conference on Image Processing, San Diego, CA, pp. 2132–2135, October 2008.

24 Z Li, F Pan, K Lim, G Feng, X Lin, S Rahardaj, Adaptive basic unit layer

rate control for JVT, Doc JVT-G012, 7th MPEG Meeting, Pattaya, Thailand, March

Available: http://iphome.hhi.de/suehring/tml/download/

26 L.D Soares, P Nunes, F Pereira, Efficient network-aware macroblock mode

decision for error resilient H.264/AVC video coding, Proceedings of the SPIE

Conference on Applications of Digital Image Processing, vol 7073, San Diego,

CA, pp 1–12, August 2008.

27 P Nunes, L.D Soares, F Pereira, Automatic and adaptive network-aware

macroblock intra refresh for error-resilient H.264/AVC video coding, Proceedings

of the IEEE International Conference on Image Processing, Cairo, Egypt, pp.

3073–3076, November 2009.

28 K.-P Lim, G Sullivan, T Wiegand, Text description of joint model reference

encoding methods and decoding concealment methods, Doc JVT-X101, ITU-T

VCEG Meeting, Geneva, Switzerland, June 2007.

29 T Wiegand, H Schwarz, A Joch, F Kossentini, G Sullivan, Rate-constrained

coder control and comparison of video coding standards, IEEE Transactions on

Circuits and Systems for Video Technology, 13(7), 688–703, July 2003.

30 H Schulzrinne, S Casner, R Frederick, V Jacobson, RTP: A transport

protocol for real-time applications, Internet Engineering Task Force, RFC 1889,

January 1996.

31 R C Dorf, J.A Svoboda, Introduction to Electric Circuits, 5th Edition, Wiley,

New York, 2001.

32 K Levenberg, A method for the solution of certain non-linear problems in

least squares, Quarterly of Applied Mathematics, 2(2), 164–168, July 1944.

33 D Marquardt, An algorithm for the least-squares estimation of nonlinear

parameters, SIAM Journal of Applied Mathematics, 11(2), 431–441, June 1963.

34 S Wenger, H.264/AVC over IP, IEEE Transactions on Circuits and Systems

for Video Technology, 13(7), 645–656, July 2003.

Distributed Video Coding: Principles and Challenges

Jürgen Slowack and Rik Van de Walle

2.1 Introduction

2.2 Theoretical Foundations

2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)

2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)

2.3 General Concept

2.4 Use-Case Scenarios in the Context of Wireless Networks

2.5 DVC Architectures and Components

2.5.1 Side Information Generation

2.5.1.1 Frame-Level Interpolation Strategies

2.5.1.2 Frame-Level Extrapolation Strategies

2.5.1.3 Encoder-Aided Techniques

2.5.1.4 Partitioning and Iterative Refinement

2.5.2 Correlation Noise Estimation

2.5.3 Channel Coding

2.5.4 Determining the WZ Rate

2.5.5 Transformation and Quantization

2.5.6 Mode Decision

2.6 Evaluation of DVC Compression Performance

2.7 Other DVC Architectures and Scenarios

2.8 Future Challenges and Research Directions

References

2.1 Introduction

A video compression system consists of an encoder that converts uncompressed video sequences into a compact format suitable for transmission or storage, and a decoder that performs the opposite operations to facilitate video display.

Compression is typically achieved by exploiting similarities between frames (temporal direction), as well as similarities between pixels within the same frame (spatial direction). The conventional way is to exploit these similarities at the encoder. Using already-coded information, the encoder generates a prediction of the information still to be coded. Next, the difference between the information to be coded and the prediction is further processed and compressed through entropy coding.
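The prediction-plus-residual idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the "frames" are synthetic arrays, the predictor simply copies the previous frame, and the empirical entropy stands in for the cost of entropy coding. Real encoders use motion-compensated prediction, transforms, and quantization.

```python
import numpy as np

def empirical_entropy(values):
    """Empirical entropy (bits per sample) of an integer array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)  # fixed seed, illustration only

# Hypothetical "previous frame": 64x64 samples that are already coded.
previous = rng.integers(16, 240, size=(64, 64)).astype(np.int16)
# Hypothetical "current frame": the previous frame plus small changes in {-1, 0, 1}.
current = previous + rng.integers(-1, 2, size=previous.shape).astype(np.int16)

prediction = previous             # simplest temporal predictor: copy the previous frame
residual = current - prediction   # the part that still has to be coded

print("entropy of current frame:", empirical_entropy(current))
print("entropy of residual:     ", empirical_entropy(residual))
```

Because the residual takes only three distinct values here, its empirical entropy is far below that of the frame itself, which is the effect an accurate prediction is meant to produce.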

The accuracy of the prediction determines the compression performance, in the sense that more accurate predictions will lead to smaller residuals and better compression. As a consequence, computationally complex algorithms have been developed to search for the best predictor. This has led to a complexity imbalance, in which the encoder is significantly more complex than the decoder.

A radically different approach to video coding, called distributed video coding (DVC), has emerged during the past decade. In DVC, the prediction is generated at the decoder instead of at the encoder. As this prediction, called side information, typically contains errors, additional information is sent from the encoder to the decoder to allow correcting the side information. Generating the prediction signal at the decoder shifts the computational burden from the encoder to the decoder side. This facilitates applications in which encoding devices are relatively cheap, small, and/or power-friendly. Some examples of these applications include wireless sensor networks, wireless video surveillance, and videoconferencing using mobile devices [44].

Many publications covering DVC have appeared (including a book on distributed source coding [DSC] [16]). The objective of this chapter is therefore to provide a comprehensive overview of the basic principles behind DVC and illustrate these principles with examples from the current state-of-the-art. Based on this description, the main future challenges will be identified and discussed.

2.2 Theoretical Foundations

Before describing the different DVC building blocks in detail, we start by highlighting some of the most important theoretical results. This includes a discussion on the Slepian–Wolf and Wyner–Ziv (WZ) theorems, which are generally regarded as providing a fundamental information-theoretical basis for DVC. It should be remarked that these results apply to DSC in general and that DVC is only a special case.

2.2.1 Lossless Distributed Source Coding (Slepian–Wolf)

David Slepian and Jack K. Wolf considered the configuration depicted in Figure 2.1, in which two sources X and Y generate correlated sequences of information symbols [51]. Each of these sequences is compressed by a separate encoder, namely, one for X and one for Y. The encoder of each source is constrained to operate without knowledge of the other source, explaining the term DSC. The decoder, on the other hand, receives both coded streams as input and should be able to exploit the correlation between the sources X and Y for decoding the information symbols.

Figure 2.1: Slepian and Wolf consider the setup in which two correlated sources X and Y are coded independently, but decoded jointly.

Surprisingly, Slepian and Wolf proved that the compression bound for this configuration is the same as in the case where the two encoders are allowed to communicate. More precisely, they proved that the rates R_X and R_Y of the coded streams satisfy the following set of inequalities:

$$R_X \ge H(X|Y), \qquad R_Y \ge H(Y|X), \qquad R_X + R_Y \ge H(X,Y) \tag{2.1}$$

where H(.) denotes the entropy. These conditions can be represented graphically, as a so-called admissible or achievable rate region, as depicted in Figure 2.2.

While any point on the line R_X + R_Y = H(X,Y) is equivalent from a compression point of view, special attention goes to the corner points of the achievable rate region. For example, the point (H(X|Y), H(Y)) corresponds to the special case of source coding with side information available at the decoder, as depicted in Figure 2.3. This case is of particular interest in the context of current DVC solutions, where side information Y is generated at the decoder and used to decode X. According to the Slepian–Wolf theorem, the minimal rate required in this case is the conditional entropy H(X|Y).
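To make the rate region concrete, the sketch below evaluates the standard Slepian–Wolf conditions (R_X ≥ H(X|Y), R_Y ≥ H(Y|X), R_X + R_Y ≥ H(X,Y)) for a simple correlation model. The model is an assumption chosen for illustration, not one taken from the chapter: X is a uniform binary source and Y equals X with each bit flipped independently with probability `crossover`. The function names are hypothetical.

```python
import math

def h2(p):
    """Binary entropy function, in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def slepian_wolf_bounds(crossover):
    """Entropies for uniform binary X and Y = X xor N, with P(N = 1) = crossover."""
    h_x = 1.0                       # X is uniform
    h_y = 1.0                       # Y is then uniform as well
    h_x_given_y = h2(crossover)     # remaining uncertainty is the flip pattern
    h_y_given_x = h2(crossover)
    return {
        "H(X)": h_x,
        "H(Y)": h_y,
        "H(X|Y)": h_x_given_y,
        "H(Y|X)": h_y_given_x,
        "H(X,Y)": h_x + h_y_given_x,  # chain rule
    }

def in_rate_region(r_x, r_y, b, eps=1e-12):
    """Check the three Slepian-Wolf conditions for a rate pair (r_x, r_y)."""
    return (r_x >= b["H(X|Y)"] - eps
            and r_y >= b["H(Y|X)"] - eps
            and r_x + r_y >= b["H(X,Y)"] - eps)

bounds = slepian_wolf_bounds(0.1)
print(bounds)
# Corner point (H(X|Y), H(Y)): Y is sent at full rate, X at the conditional entropy.
print(in_rate_region(bounds["H(X|Y)"], bounds["H(Y)"], bounds))
```

Under this model with a 10% flip probability, the corner point needs roughly 0.47 bits per symbol for X instead of 1 bit, even though the encoder of X never sees Y.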

2.2.2 Lossy Compression with Receiver Side Information (Wyner–Ziv)

The work of Slepian and Wolf relates to lossless compression. These results were extended to lossy compression by Aaron D. Wyner and Jacob Ziv [65]. Although introducing quality loss seems undesirable at first thought, it is often necessary to allow some loss of quality at the output of the decoder in order to achieve even higher compression ratios (i.e., lower bit rates).

Figure 2.2: Graphical representation of the achievable rate region.

Figure 2.3: (Lossless) source coding with side information available at the decoder.

Denote the acceptable distortion between the original signal X and the decoded signal X′ as D = E[d(X, X′)], where d is a specific distortion metric (such as the mean-squared error). Two cases are considered for compression with side information available at the decoder. In the first case, the side information Y is not available at the encoder. The rate of the compressed stream for this case is denoted $R^{WZ}_{X|Y}(D)$. In the second case, Y is made available to the encoder as well, resulting in a rate denoted $R_{X|Y}(D)$. With these notations, Wyner and Ziv proved that

$$R^{WZ}_{X|Y}(D) - R_{X|Y}(D) \ge 0 \tag{2.2}$$

In other words, not having the side information available at the encoder results in a rate loss greater than or equal to zero, for a particular distortion D. Interestingly, the rate loss has been proved to be zero in the case of Gaussian memoryless sources and a mean-squared error (MSE) distortion metric.

The results of Wyner and Ziv were further extended by other researchers, for example, proving that the equality also holds in case X is equal to the sum of arbitrarily distributed Y and independent Gaussian noise [46]. In addition, Zamir showed that the rate loss for sources with general statistics is less than 0.5 bits per sample when using the MSE as a distortion metric [68].
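As a worked illustration of the zero-rate-loss case, consider X = Y + N with N zero-mean Gaussian noise of variance σ_N², independent of Y, and MSE distortion. The expression below is the standard Gaussian rate-distortion formula applied to the noise term. The numerical values, and the extra assumption that Y is Gaussian in the comparison without side information, are chosen here only for convenience.

```latex
% Rate with side information at the decoder only (equal to the rate with
% side information at both sides in this case):
R^{WZ}_{X|Y}(D) = R_{X|Y}(D)
  = \max\!\left(0,\ \tfrac{1}{2}\log_2\frac{\sigma_N^2}{D}\right)
  \quad \text{bits per sample.}

% Example with assumed values \sigma_N^2 = 4 and D = 1:
R^{WZ}_{X|Y}(1) = \tfrac{1}{2}\log_2\frac{4}{1} = 1 \ \text{bit per sample.}

% Comparison without any side information, assuming Y is Gaussian with
% \sigma_Y^2 = 12, so that X is Gaussian with \sigma_X^2 = 16:
R_X(1) = \tfrac{1}{2}\log_2\frac{16}{1} = 2 \ \text{bits per sample.}
```

In this example the side information halves the required rate, and it does so whether or not the encoder has access to Y.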

2.3 General Concept

The theorems of Slepian–Wolf and Wyner–Ziv apply to DSC, and therefore also to the specific case of DVC. Basically, the theorems indicate that a DVC system should be able to achieve the same compression performance as a conventional video compression system. However, the proofs do not provide insights on how to actually construct such a system. As a result, the first DVC systems appeared in the scientific literature only about 30 years later.

The common approach in the design of a DVC system is to consider Y as being a corrupted version of X. This way, the proposed setup becomes highly similar to a channel-coding scenario. In the latter, a sequence of information symbols X could be sent across an error-prone communication channel, so that Y has been received instead of X. To enable successful recovery of X at the receiver's end, the sender could include additional error-correcting information calculated on X, for example using turbo or low-density parity-check (LDPC) codes [33]. The difference between such a channel-coding scenario and the setup depicted in Figure 2.3 is that in our case Y is already available at the decoder. In other words, the encoder should only send the error-correcting information to allow recovery of X (or X′ in the lossy case). Since Y is already available at the decoder instead of being communicated by the encoder, the errors in Y are said to be induced by virtual noise (also called correlation noise) on a virtual communication channel.
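The idea of sending only error-correcting information can be sketched with a toy example. It uses a (7,4) Hamming code as a deliberately simple stand-in for the turbo and LDPC codes mentioned above, and it assumes that the side information Y differs from X in at most one of every seven bits. The encoder then sends only the 3-bit syndrome of X, and the decoder corrects Y into X. The function names are hypothetical.

```python
import numpy as np

# Parity-check matrix of the (7,4) Hamming code.
# Column j (0-based) is the binary representation of j + 1, top row = most significant bit.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]], dtype=np.uint8)

def encode_syndrome(x):
    """Encoder: compute the 3-bit syndrome of a 7-bit block without seeing Y."""
    return (H @ x) % 2

def decode_with_side_info(syndrome, y):
    """Decoder: correct the side information y using the received syndrome.

    Assumes y differs from x in at most one position (the 'virtual noise').
    """
    # Syndrome of the error pattern e = x xor y, since H e = H x xor H y.
    s = (syndrome + (H @ y)) % 2
    pos = int(s[0]) * 4 + int(s[1]) * 2 + int(s[2])  # 0 means no error, else 1-based position
    x_hat = y.copy()
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat

x = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
y = x.copy()
y[4] ^= 1                                  # side information with one bit in error
x_hat = decode_with_side_info(encode_syndrome(x), y)
print(np.array_equal(x_hat, x))
```

Only 3 bits are transmitted per 7-bit block, and X is recovered exactly as long as the single-error assumption on the virtual channel holds. When the correlation noise is stronger than assumed, decoding fails, which is why practical systems need stronger codes and a way to estimate the correlation.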

2.4 Use-Case Scenarios in the Context of Wireless Networks

By generating Y itself at the decoder side as a prediction of the original X at the encoder, the complexity balance between the encoder and the decoder becomes totally different from a conventional video compression system such as H.264/AVC [64]. While conventional systems feature an encoder that is significantly more complex than the decoder, in DVC the complexity balance is completely the opposite.

In the context of videoconferencing using mobile devices, DVC can be used in combination with conventional video coding techniques (such as H.264/AVC). This allows computationally less complex steps to be assigned to mobile devices, while computationally complex operations are performed in the network.

Date posted: 29/07/2024, 15:42
