Advanced Video Coding: Principles and Techniques

ADVANCES IN IMAGE COMMUNICATION
Series Editor: J. Biemond, Delft University of Technology, The Netherlands

Volume 1: Three-Dimensional Object Recognition Systems (edited by A.K. Jain and P.J. Flynn)
Volume 2: VLSI Implementations for Image Communications (edited by P. Pirsch)
Volume 3: Digital Moving Pictures - Coding and Transmission on ATM Networks (J.-P. Leduc)
Volume 4: Motion Analysis for Image Sequence Coding (G. Tziritas and C. Labit)
Volume 5: Wavelets in Image Communication (edited by M. Barlaud)
Volume 6: Subband Compression of Images: Principles and Examples (T.A. Ramstad, S.O. Aase and J.H. Husøy)
Volume 7: Advanced Video Coding: Principles and Techniques (K.N. Ngan, T. Meier and D. Chai)

King N. Ngan, Thomas Meier and Douglas Chai
University of Western Australia, Dept. of Electrical and Electronic Engineering, Visual Communications Research Group, Nedlands, Western Australia 6907

1999
Elsevier
Amsterdam - Lausanne - New York - Oxford - Shannon - Singapore - Tokyo

ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25, P.O. Box 211, 1000 AE Amsterdam, The Netherlands

© 1999 Elsevier Science B.V. All rights reserved.

This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science Rights & Permissions Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail:
permissions@elsevier.co.uk. You may also contact Rights & Permissions directly through Elsevier's home page (http://www.elsevier.nl), selecting first 'Customer Support', then 'General Information', then 'Permissions Query Form'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (978) 7508400, fax: (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 171 631 5555; fax: (+44) 171 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Rights & Permissions Department, at the mail, fax and e-mail addresses noted above.

Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

First edition 1999

Library of Congress Cataloging in
Publication Data
A catalog record from the Library of Congress has been applied for.

ISBN: 4 82667 X

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).

Printed in The Netherlands.

To Nerissa, Xixiang, Simin, Siqi
To Elena
To June

Preface

The rapid advancement in computer and telecommunication technologies is affecting every aspect of our daily lives. It is changing the way we interact with each other and the way we conduct business, and it has a profound impact on the environment in which we live. Increasingly, the boundaries between computer, telecommunication and entertainment are blurring as the three industries become more integrated with each other. Nowadays, one no longer uses the computer solely as a computing tool, but often as a console for video games and movies, and increasingly as a telecommunication terminal for fax, voice or videoconferencing. Similarly, the traditional telephone network now supports a diverse range of applications such as video-on-demand, videoconferencing, Internet, etc.

One of the main driving forces behind the explosion in information traffic across the globe is the ability to move large chunks of data over the existing telecommunication infrastructure. This is made possible largely by the tremendous progress achieved by researchers around the world in data compression technology, in particular for video data. This means that for the first time in human history, moving images can be transmitted over long distances in real time, i.e., at the same time as the event unfolds at the sender's end.

Since the invention of image and video compression using DPCM (differential pulse code modulation), followed by transform coding, vector quantization, subband/wavelet coding, fractal coding, object-oriented coding and model-based coding, the technology has matured to a stage where various coding standards have been promulgated to enable interoperability of different
equipment manufacturers implementing the standards. This promotes the adoption of the standards by the equipment manufacturers and popularizes the use of the standards in consumer products.

JPEG is an image coding standard for compressing still images according to a compression/quality trade-off. It is a popular standard for image exchange over the Internet. For video, MPEG-1 caters for storage media up to a bit rate of 1.5 Mbit/s; MPEG-2 is aimed at video transmission at typically 4-10 Mbit/s, but it can also go beyond that range to include HDTV (high-definition TV) images. At the lower end of the bit rate spectrum, there are H.261 for videoconferencing applications at p × 64 kbit/s, where p = 1, 2, ..., 30; and H.263, which can transmit at bit rates of less than 64 kbit/s, clearly aiming at the videophony market.

The standards above have a number of commonalities: firstly, they are based on a predictive/transform coder architecture, and secondly, they process video images as rectangular frames. These commonalities impose severe constraints as demand grows for greater variety of, and access to, video content. Multimedia, including sound, video, graphics, text and animation, is contained in much of the information content encountered in daily life. Standards have to evolve to integrate and code the multimedia content. The concept of video as a sequence of rectangular frames displayed in time is outdated, since video nowadays can be captured in different locations and composed as a composite scene. Furthermore, video can be mixed with graphics and animation to form a new video, and so on. The new paradigm is to view video content as audiovisual objects, each of which can be coded, manipulated and composed as an entity in whatever way an application requires.

MPEG-4 is the emerging standard for the coding of multimedia content. It defines a syntax for a set of content-based functionalities, namely, content-based interactivity, compression and universal access. However, it does not specify how the
video content is to be generated. The process of video generation is difficult and under active research. One simple way is to capture the visual objects separately, as is done in TV weather reports, where the weather reporter stands in front of a weather map that is captured separately and then composed together with the reporter. The problem is that this is not always possible, as in the case of outdoor live broadcasts. Therefore, automatic segmentation has to be employed to generate the visual content in real time for encoding. Visual content is segmented into semantically meaningful objects known as video object planes. A video object plane is then tracked, making use of the temporal correlation between frames, so that its location is known in subsequent frames. Encoding can then be carried out using MPEG-4.

This book addresses the more advanced topics in video coding not included in most of the video coding books on the market. The focus of the book is on the coding of arbitrarily shaped visual objects and its associated topics. It is organized into six chapters: Image and Video Segmentation (Chapter 1), Face Segmentation (Chapter 2), Foreground/Background Coding (Chapter 3), Model-based Coding (Chapter 4), Video Object Plane Extraction and Tracking (Chapter 5), and MPEG-4 Video Coding Standard (Chapter 6).

Chapter 1 deals with image and video segmentation. It begins with a review of Bayesian inference and Markov random fields, which are used in the various techniques discussed throughout the chapter. An important component of many segmentation algorithms is edge detection. Hence, an overview of some edge detection techniques is given. The next section deals with low-level image segmentation involving morphological operations and Bayesian approaches. Motion is one of the key parameters used in video segmentation, and its representation is introduced in Section 1.4. Motion estimation and some of its associated problems, such as occlusion, are dealt with in the following section. In the last
section, video segmentation based on motion information is discussed in detail.

Chapter 2 focuses on the specific problem of face segmentation and its applications in videoconferencing. The chapter begins by defining the face segmentation problem, followed by a discussion of the various approaches along with a literature review. The next section discusses a particular face segmentation algorithm based on a skin color map. Results show that this particular approach is capable of segmenting facial images regardless of the facial color, and that it presents a fast and reliable method for face segmentation suitable for real-time applications. The face segmentation information is exploited in a video coding scheme, described in the next chapter, where the facial region is coded with a higher image quality than the background region.

Chapter 3 describes the foreground/background (F/B) coding scheme, where the facial region (the foreground) is coded with more bits than the background region. The objective is to achieve an improvement in the perceptual quality of the region of interest, i.e., the face, in the encoded image. The F/B coding algorithm is integrated into the H.261 coder with full compatibility, and into the H.263 coder with slight modifications of its syntax. Rate control in the foreground and background regions is also investigated using the concept of joint bit assignment. Lastly, the MPEG-4 coding standard is studied in the context of the foreground/background coding scheme.

As mentioned above, multimedia content can contain synthetic objects or objects which can be represented by synthetic models. One such model is the 3-D wire-frame model (WFM) consisting of 500 triangles, commonly used to model the human head and body. Model-based coding is the technique used to code the synthetic wire-frame models. Chapter 4 describes the pro-

CHAPTER 6. MPEG-4 STANDARD

6.8.5 Error Resilience Encoding Tools

6.8.5.1 Resynchronization Markers

The resynchronization markers should be inserted by the encoder before the first macroblock after the number of bits output since the last resync_marker field exceeds a predetermined value. The value used for this spacing depends on the anticipated error conditions of the transmission channel and on the compressed data rate. Suggested values for this spacing are provided in Table 6.6. These values were obtained experimentally and provide good results for a variety of error conditions encountered in error-prone channels such as wireless fading channels. It is highly recommended that users of MPEG-4's error resilience tools adjust the spacing of the resynchronization markers to fit the error conditions of their particular channel.

Table 6.6: Suggested Resynchronization Marker Spacing (© ISO/IEC 1998)

Bit Rate (kbit/s)    Spacing (bits)
0-24                 480
25-48                736
49-128               1500
128-512              TBD
512-1000             TBD

As shown in Fig. 6.49, in addition to the resync_marker field, the encoder also inserts a field indicating the current macroblock address (MB address), a field indicating the current quantization parameter (QP) and a Header Extension Code (HEC). This additional information enables the decoder to determine which VOP a resync packet belongs to in case the VOP start_code is lost.

6.8.5.2 Data Partitioning

When data partitioning is used, then in addition to the resync_marker, MB address, QP and HEC fields, a motion_marker field is inserted after the motion data (before the beginning of the texture data). This motion_marker field is unique with respect to the motion data and enables the decoder to determine when all the motion information has been received correctly.

6.8.5.3 Reversible VLCs

The use of reversible VLCs enables the decoder to recover additional texture information in the presence of errors. This is accomplished by first detecting the error and searching forward to the next resync_marker; once this point is determined, the texture data can be read in the reverse direction until an error is
detected. When errors are detected in the texture data, the decoder can use the correctly decoded motion vector information to perform motion compensation and conceal these errors.

6.8.5.4 Decoder Operation

When an error is detected in the bitstream, the decoder should resynchronize at the next suitable resynchronization point. Where a VOP header is missed or received with obvious errors, this should be the next VOP start_code. Otherwise, the next resynchronization point in the bitstream should be used.

Under the following error conditions, the baseline decoder should resynchronize at the next resynchronization point in the bitstream:

- An illegal VLC is received.
- More than 64 DCT coefficients are decoded in a single block.
- The resynchronization header information is inconsistent (i.e., QP out of range, MBN(k) < MBN(k-1), etc.).
- The resynchronization marker is corrupted.

Under the following error condition, the decoder should resynchronize at the next VOP header:

- The VOP start code is corrupted.

For other resynchronization techniques, conditions for error detection and resynchronization should be as close as possible to those outlined above. Missing blocks should be replaced with the same block from the previous frame.
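The decoder rules of Section 6.8.5.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the normative MPEG-4 decoder: the enum and function names are hypothetical, the QP range of 1 to 31 is an assumed default, and frames are modelled as plain lists of pixel rows.

```python
from enum import Enum, auto


class BitstreamError(Enum):
    """Error conditions listed in Section 6.8.5.4 (names are illustrative)."""
    ILLEGAL_VLC = auto()
    TOO_MANY_DCT_COEFFS = auto()
    INCONSISTENT_RESYNC_HEADER = auto()
    CORRUPTED_RESYNC_MARKER = auto()
    CORRUPTED_VOP_START_CODE = auto()


class ResyncTarget(Enum):
    NEXT_RESYNC_POINT = auto()
    NEXT_VOP_HEADER = auto()


def resync_target(error):
    """Choose where the decoder resumes after the given error."""
    if error is BitstreamError.CORRUPTED_VOP_START_CODE:
        return ResyncTarget.NEXT_VOP_HEADER
    return ResyncTarget.NEXT_RESYNC_POINT


def header_is_consistent(qp, mb_number, prev_mb_number, qp_range=(1, 31)):
    """Check a resync header: QP in range and MBN(k) >= MBN(k-1).

    The default qp_range is an assumption made for this sketch.
    """
    low, high = qp_range
    return low <= qp <= high and mb_number >= prev_mb_number


def conceal_missing_blocks(current, previous, missing, block=8):
    """Replace each missing block with the co-located block of the previous frame.

    `missing` holds (block_row, block_col) pairs; frames are lists of rows.
    """
    for block_row, block_col in missing:
        x0, x1 = block_col * block, (block_col + 1) * block
        for y in range(block_row * block, (block_row + 1) * block):
            current[y][x0:x1] = previous[y][x0:x1]
    return current
```

A real decoder would also apply the motion-compensated concealment described for reversible VLCs; the copy-from-previous-frame rule shown here is only the fallback named at the end of the section.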
ADC-SA-DCT for Intra-Coded Macroblocks, 333 'Makefile' program, 204 2-D animated meshes, 369 mesh tracking, 370 texture mapping, 370 2-D dynamic mesh, 369 2-D mesh modeling, 370 content-based video indexing, 371 video object compression, 371 video object manipulation, 370 2-D model-based coding, 165-166 3-D displacement vector, 216 3-D feature-based coding, 166 3-D human facial modeling, 169 3-D model-based coding, 166-168 applications, 168 4:2:0 format, 307 AC coefficients, 336 AC prediction, 340 access units, 294 acoustic instrument, 304 action units (AUs), 218 brow lowerer (AU4), 218 inner brow raiser (AU 1), 218 outer brow raiser (AU2), 218 active contour, 183, 187 definition and properties, 187 numerical solution, 187 adaptive frame/field DCT, 329 adjusted 3-D WFM, 210 advanced mode, 314 advanced prediction mode, 324 atone mapping, 370 atone motion model, 38 affine transform, 171 AM modulation, 302 analysis of facial image sequences, 210 aperture problem, 35 arbitrarily shaped objects, 314 area of interest, 97, 98 arithmetic coder, 366 arithmetic mean value, 328 audio coding tools, 302 audiovisual objects (AVOs), 293 coded representation, 293 composition, 294 description, synchronization and delivery, 294 interaction with, 297 audiovisual scene, 293 AUs deformation rule, 221 automatic 3-D WFM adaptation, 203 eye model adjustment, 206 eyebrow model adjustment, 208 head model adjustment, 203 mouth model adjustment, 208 nose model adjustment, 210 B-frame, 344 B-picture, 344 background filter, 281-283 401 402 backward motion vectors, 345 backward prediction reference, 353 bandwidth scalability, 303 base layer, 349 Bayesian inference, 2-4 inversion formula, MAP estimation, 3-4 Bayesian segmentation, 28-32 multi-resolution segmentation, 32 Pappas' method, 29-32 bi-level quantization, 365 bidirectional motion compensation, 344 bilinear interpolation, 319 binary alpha block (BAB), 308, 332 binary arithmetic codeword (BAC), 309 binary format for scenes (BIFS), 367 bit 
rate scalability, 303 bitstream syntax, 297 block matching, 44-46, 314 hierarchical block matching, 45 mean absolute difference (MAD), 44 mean squared difference (MSD), 45 pixel difference classification (PDC), 45 body animation, 369 bottom field, 318 bounding box, 314 bounding rectangle, 306 buffer architecture, 301 buffer resources, 301 C++, 302 CAE algorithm, 308 call for proposals, 291 camera motion, 356 INDEX Canny operator, 17-19 central projection, 39 change detection mask (CDM), 233, 234, 278-280 chorus effects, 305 chrominance alpha pixel, 308 chrominance alpha plane, 307 clip-and-paste method, 218 clique, clock reference, 296 code excited linear predictive (CELP) Coding, 303 coding coding efficiency, 293 coding of audio objects, 302 coding of natural visual objects, 305 coding of synthetic objects, 367 object-oriented analysis-synthesis coding, 53 second generation techniques, 1, 20, 49 combinatorial optimization, committee draft, 292 compound AVOs, 293 compression, 305 connected operators, 22 flat zone, 23 morphological motion filters, 244 partition, 23 consistency checking, 300 constant alpha, 313 content-based approach, 292 content-based bit allocation, 107 joint bit assignment, 111-115 maximum bit transfer, 107-111 content-based coding, 305 content-based functionalities, 293 manipulation and bitstream edit- INDEX ing, 293 multimedia data access tools, 293 scalability, 293, 305 content-based rate control, 115-116 content-based scalability, 305 context computation, 309 context-based arithmetic encoding (CAE), 308 continuity constraints, 187 contour deformation process, 177 contour extraction, 178 contour-based coding, 165 control nodes, 203 control points, 174 data partitioning, 374 data recovery, 372 data retrieval, 300 Daubechies biorthogonal filter, 363 DC coefficient, 332 decoding device, 297 deformable templates, 193 delivery layer, 294 FlexMux layer, 294 TransMux layer, 296 delivery multimedia integration framework (DMIF), 298 depth information, 
226 deterministic algorithms, 11-15 differential coding of motion vectors, 321 digital audio broadcasting, 302 Dijkstra's shortest path algorithm, 271 dilation, 23 direct coding, 344 discrete cosine transform (DCT), 1,330 discrete wavelet transform (DWT), 362 403 coding of the lowest subband, 363 entropy coding, 366 quantization, 365 zerotree coding of the higher subbands, 364 discriminatory quantization process, 126 displaced block, 317 displaced frame difference (DFD), 46 distance transformation, 260-262 Chamfer 3-4, 261 Chamfer 5- 7-11, 261 DMIF, 298 DMIF architecture, 298 DSM-CC SRM functionality, 299 edge detection, 15-19 Canny operator, 17-19 Frei-Chen operator, 17 gradient operators, 16-17 non-maximum supression, 19 Prewitt operator, 17 Sobel operator, 16, 17 edge potentials, 196 edge sample, 323 edges, 180 eight-parameter model, 39 elementary streams, 294 ellipse fitting, 61 encoder/decoder complexity sealability, 304 end user, 297 end-to-end delay, 301 energy functional, 191 energy minimizing spline, 187 enhancement layer, 349 epochs eye template, 197 mouth template, 200 404 erosion, 23 error concealment, 372 error detection, 300 error measure, 317 error recovery, 300 error resilience, 371 error robustness, 293 error-prone environments, 293 expansion energy, 183 extended pixels, 317 extended SA-DCT, 332 exterior macroblocks, 314 external constraint force, 177, 187 eye extraction, 194-198 definitions and properties, 194 implementation, 196 eye template, 195 eye-to-eye axis, 201 eyebrow extraction, 190 f_code, 317 face animation table (FAT), 367 face detection, s e e face segmentation face extraction, s e e face segmentation face interpolation technique (FIT), 369 face location, s e e face segmentation face outline, 171 face profile contour, 204 face profile extraction, 191-193 face recognition, 177 face segmentation, 59 algorithm, 75 color segmentation, 77 contour extraction, 84 density regularization, 79 geometric correction, 83 INDEX luminance 
regularization, 81 applications, 64 coding, 64 content-based representation, 66 face classification, 66 face identification, 66 face recognition, 66 face tracking, 68 facial expression study, 68 image enhancement, 66 model fitting, 66 model-based coding, 66 MPEG-4, 66 multimedia databased indexing, 68 experimental results, 84 success rate, 93 various approaches, 60 color analysis, 63 motion analysis, 62 shape analysis, 61 statistical analysis, 62 facial action coding system (FACS), 218 facial animation, 367 facial animation parameter (FAP), 367 facial description parameter (FDP), 367 facial expression estimation, 213 facial expression parameters (FEP), 216 facial expressions, 203 facial feature contours extraction, 175 facial features components, 178 facial muscular actions, 221 facial structure deformation method, 218 INDEX FB, see foreground/background feathering algorithm, 312 feathering coding, 311 feathering filter, 312 feature points, 171 feature_distance, 312 features extraction, 183 features of MPEG-4, 292 compression, 293 content-based interactivity, 293 universal access, 293 field DCT, 329 field motion vectors, 318 field_prediction flag, 324 final committee draft, 292 final draft international standard (FDIS), 292 first texture basis, 225 flat zone, 23 foreground/background regions, 98, 106 video coding scheme, 98 benefits, 98 MPEG-4, 155 related works, 100 forward motion vectors, 345 forward quantization, 336 frame, 292 frame difference (FD), 54 framing, 301 full search, 318 furrow texture, 223 generalized scalable coding, 349 encoding of enhancement layer, 352 upsampling process, 351 generic 3-D face wire frame model, 171 geodesic dilation, 24 405 geodesic erosion, 24 Gibbs distribution, Gibbs sampler, 10-11 annealing schedule, 11 global motion estimation, 239-243 least median of squares (LMS) method, 242-243 least squares (LS) method, 241 global motion parameters, 356 Gram-Schmidt orthonormalization, 223 gray scale shape coding, 311 greedy algorithm, 
12, 189 H.261, 117 source coder, 117 syntax structure, 118 unspecified encoding procedures, 120 video data format, 117 H.261FB, 116 bit allocation, 124-125 experimental results, 129-149 implementation, 123 rate control, 126-128 H.263FB, 149 experimental results, 151-155 implementation, 149-151 Hammersley-Clifford theorem, Harmonic Vector eXcitation Coding (HVXC), 303 Hausdorff distance, 258-263 distance transformation, 260262 early scan termination, 262 generalized Hausdorff distance, 259 head motion estimation, 213 head motion parameter (HMP), 214 hierarchical object representation model, 49 406 higher-order statistics (HOS), 233 highest confidence first (HCF), 1315 confidence, 14 uncommitted state, 14 highpass analysis filter, 362 highpass synthetic filter, 362 horizontal upsampling, 352 Horn and Schunk method, 42 human skin color, 69 color analysis, 63 color segmentation, 77 limitations, 74-75 color space, 70-74 map, 75 results, 85 modeling, 69 hybrid natural and synthetic data coding, 293 I-VOP, 308, 358 image content, 292 image convolution, 180 image forces, 177 image intensity, 180 image morphological processing, 180 edge image, 183 erosion and dilation, 180 opening and closing, 181 peak and valley images, 181 smoothing, 183 image morphology, 180 image potentials, 196 image segmentation, 20-32 image simplification, 23-26 indexed probability, 309 inner products, 225 intellectual property, 297 intellectual property identification (IPI), 298 INDEX intellectual property rights (IPR), 298 intensity regions, 196 inter mode, 308 inter-picture interval, 302 interactive peer, 299 interlaced motion compensation, 326 interleaving data, 300 internal energy, 187 international standard (IS), 292 international standards organization (ISO), 291 intra macroblocks, 328 intra mode, 308 intra quantizer matrix, 336 inverse DCT, 331 iterated conditional modes (ICM), 12-13 joint motion estimation and segmentation, 56-58 K-means algorithm, 31 linear feathering, 313 low-low band, 363 
lower lip, 199 luminance alpha pixels, 308 luminance alpha plane, 307 macroblocks, 306 manipulation, 305 marker extraction, 26-27 Markov random field (MRF), 4-7 clique, Gibbs distribution, Hammersley-Clifford theorem, neighborhood system, 4, potential function, media objects, 296 INDEX Metropolis algorithm, 8-10 mismatch control, 338 morphological mask, 180 morphological motion filtering, 243254 filter criterion, 246 increasingness, 248 max-tree representation, 244 min-tree representation, 244 Viterbi algorithm, 249-252 morphological operations, 180 morphological segmentation, 2228 marker extraction, 26-27 simplification, 23-26 watershed algorithm, 27-28 morphology, 21 connected operators, 22 dilation, 23 erosion, 23 geodesic dilation, 24 geodesic erosion, 24 morphological closing, 24 morphological closing by reconstruction, 25 morphological gradient, 27 morphological opening, 24 morphological opening by reconstruction, 25 reconstruction by dilation, 25 reconstruction by erosion, 25 simplification filters, 23-26 watershed algorithm, 27-28 motion, 32-40 affine motion model, 38 aperture problem, 35 apparent motion, 33 background to be covered, 40 correspondence vector, 33 displaced frame difference (DFD), 407 46 eight-parameter model, 39 frame difference (FD), 54 non-parametric motion field representation, 35-36 occlusion problem, 40 optical flow, 33 optical flow constraint (OFC), 34 parametric motion field representation, 36-40 planar patch, 38 real motion, 33 twelve-parameter model, 39 uncovered background, 40 motion and texture coder, 307 motion compensation, 314 motion estimation, 41-48, 316 affine motion model, 38 aperture problem, 35 background to be covered, 40 Bayesian approaches, 47-48 block matching, 44-46 displaced frame difference (DFD), 46 eight-parameter model, 39 frame difference (FD), 54 global motion estimation, 239243 gradient-based methods, 4244 half sample search, 319 hierarchical block matching, 45 Horn and Schunck method, 42 integer pixel motion 
estimation, 317 occlusion problem, 40 pixel-recursive algorithms, 4647 polygon matching, 316 408 quarter sample search, 320 twelve-parameter model, 39 uncovered background, 40 motion estimation over VOP boundaries, 323 motion segmentation, 49-58 3-D segmentation, 50-52 joint motion estimation and segmentation, 56-58 object-oriented analysis-synthesis coding, 53 spatio-temporal segmentation, 54-56 motion trajectories, 358 motion vectors, 314 mouth extraction, 198-201 definition and properties, 198 implementation, 200 mouth template, 199 MPEG-2 AAC standard, 302 MPEG-4, 2, 49 development process, 291 N2 core experiment on automatic segmentation techniques, 230 technical description, 297 MPEG-4 data streams, 300 multi-resolution segmentation, 32 multilevel quantization, 365 multimedia environments, 305 multimedia streaming, 298 multiple concurrent data streams, 293 multiplex functionality, 300 multiplex layer, 300 multiplexer, 294 multiplexing tool, 296 nasal axis, 201 national bodies, 292 natural sounds, 302 INDEX audio coding, 303 audio scalability, 303 speech coding, 303 neutral face image, 223 numerical approximation, 7-15 annealing schedule, 11 deterministic algorithms, 1115 Gibbs sampler, 10-11 greedy algorithm, 12 highest confidence first (HCF), 13-15 iterated conditional modes (ICM), 12-13 Metropolis algorithm, 8-10 simulated annealing, 8-11 object descriptor, 294 object tracking, 257-268 background filter, 281-283 Hausdorff distance, 258-263 model update, 263-268, 281 object-based temporal scalability (OTS), 353 object-oriented analysis-synthesis coding, 53 occlusion problem, 40 off-line static sprites, 358 coding, 358 generation, 358 on-line dynamic sprites, 360 coding, 361 generation, 360 opaque alpha values, 312 opaque pixel, 328 opaque_value, 312 optical flow, 33 optical flow constraint (OFC), 34 orthographic projection, 38 orthonormal DCT basis functions, 332 409 INDEX overlapped motion compensation, 325 P-picture, 344 P-VOP, 308 padding, 314 padding 
process, 314 extended padding, 315 horizontal and vertical padding, 314 low pass extrapolation (LPE) padding, 328 padding for chrominance components, 316 padding for interlaced macroblocks, 316 Pappas' method, 29-32 parallel projection, 38 parameterized facial model, 170 perspective projection, 39 potential function, prediction and coding of B-VOPs, 344 backward coding, 347 bidirectional coding, 347 forward coding, 347 interlaced direct coding, 345 motion vector coding, 348 progressive direct coding, 344 prediction and coding of B-VOPs mode decisions, 347 prediction block, 345 prediction mode decision, 320 prediction motion vectors, 345 primitive AVO), 293 primitive semantics, 299 prior potentials, 196 probability table, 308 Q-step scaling, 340 quality of service (QoS), 294 quality scalability, 305 quantization, 335 H.263 method, 335 MPEG method, 336 random access, 305 real time operation, 301 real-time implementation of MBC system, 169 reconstruction by dilation, 25 reconstruction by erosion, 25 rectangular imagery, 305 rectangular VOP, 306 reference points, 358 reference VOP, 323 refinement stage, 204 region-based coding, 165 repetitive padding~ 314 resynchronization, 371 resynchronization markers, 371 video packets, 371 reverberation, 302 reversible VLCs~ 375 RM8, 121 rough contour estimation routine (RCER), 178 rough contour location finding, 178 saturated integer IDCT, 331 scalability postprocessor, 350 scalable encoder, 349 scene composition, 297 scene description, 297 scores, 304 search area range, 317 segmentation K-means algorithm, 31 3-D segmentation, 50-52 Bayesian segmentation, 28-32 double partition approach, 231 410 foreground-background separation, 232 high-level segmentation, 1, 2, 49 image segmentation, 20-32 joint motion estimation and segmentation, 56-58 layered representation, 230 low-level segmentation, 1, 2, 49 morphological segmentation, 2228 motion segmentation, 49-58 multi-resolution segmentation, 32 spatio-temporal segmentation, 54-56 
    video object plane extraction, 49, 229-289
semantic object, 306
semantic properties, 292
shape adaptive DCT (SA-DCT), 332
shape block, 314
shape boundary, 312
shape coding, 308
    binary alpha block coding, 308
    gray scale shape coding, 311
shape information, 306
simulated annealing, 8-11
skin color, see human skin color
SL-packet headers, 302
SL-packetized streams, 300
snake, 183
Sobel operator, 16, 17, 183
spatialization, 302
spatio-temporal gradient, 213
speech/text-driven facial animation, 169
sprite coding, 354
sprite points, 359
standardized core technologies, 305
standards
    MPEG-1, 291
    MPEG-2, 291
    MPEG-4, 291
still image texture coding, 362
storage media, 299
streaming data, 294
structured audio orchestra language (SAOL), 304
structured audio score language (SASL), 304
structured descriptions, 302
structuring element, 180
subjective viewing quality, 97
sum of absolute difference (SAD), 317
support function, 311
symmetric extensions, 362
syntactic description language, 302
syntactic representation of objects, 302
syntax description, 302
synthesis of facial image sequences, 217
    abstract level, 218
    muscle control level, 218
    node control level, 217
    shape control level, 217
    texture reproduction level, 217
synthesized electronic music, 304
synthesized sound, 304
    score driven synthesis, 304
    text-to-speech, 304
synthetic objects, 367
system decoder model, 300
    buffer management, 301
    demultiplexing, 300
    synchronization, 300
    time identification, 301
system layer model, 294
    delivery layer, 294
    DMIF network interface, 296
    synchronization layer, 294
temporal random access, 293
temporal reference, 344
temporal resolution, 349
temporal scalability, 350
    enhancement types, 354
    type I, 354
    type II, 354
texture basis, 223
texture coding, 362
texture description parameters (TDP), 227
texture update, 223
time base, 300
time stamps, 302
top field, 318
top-hat image, 181
translucency coding, 311
transmission bit rates, 226
transparent blocks, 328
transparent region, 314
transport network, 298
transport timestamps, 296
triangulated mesh, 171
twelve-parameter model, 39
Twin VQ, 303
unrestricted mode, 314
unrestricted motion estimation/compensation, 323
upper lip, 199
user-interactive program, 203
valley potentials, 196
variable length code (VLC), 342
verification models, 292
vertical displacements, 328
vertical upsampling, 351
video coding
    very low bit rate, 9?
video data, 60
video object plane (VOP), 2, 49, 305
    binary alpha plane, 307
    definition, 305
    formation, 306
    greyscale alpha plane, 308
video object plane extraction, 49, 229-289
    automatic segmentation, 234
    change detection mask (CDM), 233, 234, 278-280
    double partition approach, 231
    foreground objects, 232
    global motion estimation, 239-243
    layered representation, 230
    object tracking, 257-268
    semi-automatic segmentation, 235
video objects, 305
video segment, 354
virtual space teleconferencing, 168
virtual studio, 168
Viterbi algorithm, 249-252
    cost, 249-252
    trellis, 249
VLC encoding of quantized transform coefficients, 342
VOP boundary, 314
VOP encoder, 307
warping parameters, 358
watershed algorithm, 27-28
wavelet coefficients, 363
wavelet decomposition, 362
wavelet encoder, 362
wavelet tree, 364
wavetable bank format, 304
weighting values, 327
working draft, 292
Y, U, V components, 306
zerotree scanning, 364
zigzag scanning, 335