
John Watkinson - The MPEG Handbook
Table of Contents

Chapter 1: Introduction to compression
1.1 What is MPEG?
1.2 Why compression is necessary
1.3 MPEG-1, 2 and 4 contrasted
1.4 Some applications of compression
1.5 Lossless and perceptive coding
1.6 Compression principles
1.7 Video compression
1.7.1 Intra-coded compression
1.7.2 Inter-coded compression
1.7.3 Introduction to motion compensation
1.7.4 Film-originated video compression
1.8 Introduction to MPEG-1
1.9 MPEG-2: Profiles and Levels
1.10 Introduction to MPEG-4
1.11 Audio compression
1.11.1 Sub-band coding
1.11.2 Transform coding
1.11.3 Predictive coding
1.12 MPEG bitstreams
1.13 Drawbacks of compression
1.14 Compression pre-processing
1.15 Some guidelines

Chapter 2: Fundamentals
2.1 What is an audio signal?
2.2 What is a video signal?
2.3 Types of video
2.4 What is a digital signal?
2.5 Sampling
2.6 Reconstruction
2.7 Aperture effect
2.8 Choice of audio sampling rate
2.9 Video sampling structures
2.10 The phase-locked loop
2.11 Quantizing
2.12 Quantizing error
2.13 Dither
2.14 Introduction to digital processing
2.15 Logic elements
2.16 Storage elements
2.17 Binary coding
2.18 Gain control
2.19 Floating-point coding
2.20 Multiplexing principles
2.21 Packets
2.22 Statistical multiplexing
2.23 Timebase correction

Chapter 3: Processing for compression
3.1 Introduction
3.2 Transforms
3.3 Convolution
3.4 FIR and IIR filters
3.5 FIR filters
3.6 Interpolation
3.7 Downsampling filters
3.8 The quadrature mirror filter
3.9 Filtering for video noise reduction
3.10 Warping
3.11 Transforms and duality
3.12 The Fourier transform
3.13 The discrete cosine transform (DCT)
3.14 The wavelet transform
3.15 The importance of motion compensation
3.16 Motion-estimation techniques
3.16.1 Block matching
3.16.2 Gradient matching
3.16.3 Phase correlation
3.17 Motion-compensated displays
3.18 Camera-shake compensation
3.19 Motion-compensated de-interlacing
3.20 Compression and requantizing

Chapter 4: Audio compression
4.1 Introduction
4.2 The deciBel
4.3 Audio level metering
4.4 The ear
4.5 The cochlea
4.6 Level and loudness
4.7 Frequency discrimination
4.8 Critical bands
4.9 Beats
4.10 Codec level calibration
4.11 Quality measurement
4.12 The limits
4.13 Compression applications
4.14 Audio compression tools
4.15 Sub-band coding
4.17 MPEG audio compression
4.18 MPEG Layer I audio coding
4.19 MPEG Layer II audio coding
4.20 MPEG Layer III audio coding
4.21 MPEG-2 AAC – advanced audio coding
4.23 MPEG-4 Audio
4.24 MPEG-4 AAC
4.25 Compression in stereo and surround sound

Chapter 5: MPEG video compression
5.1 The eye
5.2 Dynamic resolution
5.3 Contrast
5.4 Colour vision
5.5 Colour difference signals
5.6 Progressive or interlaced scan?
5.7 Spatial and temporal redundancy in MPEG
5.8 I and P coding
5.9 Bidirectional coding
5.10 Coding applications
5.11 Intra-coding
5.12 Intra-coding in MPEG-1 and MPEG-2
5.13 A bidirectional coder
5.14 Slices
5.15 Handling interlaced pictures
5.16 MPEG-1 and MPEG-2 coders
5.17 The elementary stream
5.18 An MPEG-2 decoder
5.19 MPEG-4
5.20 Video objects
5.21 Texture coding
5.22 Shape coding
5.23 Padding
5.24 Video object coding
5.25 Two-dimensional mesh coding
5.26 Sprites
5.27 Wavelet-based compression
5.28 Three-dimensional mesh coding
5.29 Animation
5.30 Scaleability
5.31 Coding artifacts
5.32 MPEG and concatenation

Chapter 6: Program and transport streams
6.1 Introduction
6.2 Packets and time stamps
6.3 Transport streams
6.4 Clock references
6.5 Program Specific Information (PSI)
6.6 Multiplexing
6.7 Remultiplexing

Chapter 7: MPEG applications
7.1 Introduction
7.2 Video phones
7.3 Digital television broadcasting
7.4 The DVB receiver
7.5 CD-Video and DVD
7.6 Personal video recorders
7.7 Networks
7.8 FireWire
7.9 Broadband networks and ATM
7.10 ATM AALs

The MPEG Handbook: MPEG-1, MPEG-2, MPEG-4
John Watkinson

Focal Press: Oxford, Auckland, Boston, Johannesburg, Melbourne, New Delhi

Focal Press, an imprint of Butterworth-Heinemann, Linacre House, Jordan Hill, Oxford OX2 8DP; 225 Wildwood Avenue, Woburn, MA 01801-2041. A division of Reed Educational and Professional Publishing Ltd, a member of the Reed Elsevier plc group.

First published 2001
© John Watkinson 2001

All rights reserved.
No part of this publication may be reproduced in any material form (including photocopying or storing in any medium by electronic means and whether or not transiently or incidentally to some other use of this publication) without the written permission of the copyright holder except in accordance with the provisions of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London, England W1P 0LP. Applications for the copyright holder's written permission to reproduce any part of this publication should be addressed to the publishers.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloguing in Publication Data
A catalogue record for this book is available from the Library of Congress

For information on all Focal Press publications visit our website at www.focalpress.com

ISBN 0 240 51656 7

Composition by Genesis Typesetting, Rochester, Kent
Printed and bound in Great Britain

For Howard and Matthew

Acknowledgements

Information for this book has come from a number of sources to whom I am indebted. The publications of the ISO, AES and SMPTE provided essential reference material. Thanks also to the following for lengthy discussions and debates: Peter de With, Steve Lyman, Bruce Devlin, Mike Knee, Peter Kraniauskas and Tom MacMahon. The assistance of Microsoft Corp. and Tektronix Inc. is also appreciated. Special thanks to Mikael Reichel.

Preface

This book completely revises the earlier book entitled MPEG-2. It is an interesting insight into the rate at which this technology progresses that this book was in preparation only a year after MPEG-2 was first published. The impetus for the revision is, of course, MPEG-4, which is comprehensively covered here. The opportunity has also been taken to improve a number of explanations and to add a chapter on applications of MPEG.
The approach of the book has not changed in the slightest. Compression is a specialist subject with its own library of specialist terminology, which is generally accompanied by a substantial amount of mathematics. I have always argued that mathematics is only a form of shorthand, itself a compression technique! Mathematics describes but does not explain, whereas this book explains and then describes. A chapter of fundamentals is included to make the main chapters easier to follow. Also included are some guidelines which have been found practically useful in getting the best out of compression systems. The reader who has endured this book will be in a good position to tackle the MPEG standards documents themselves, although these are not for the faint-hearted, especially the MPEG-4 documents, which are huge and impenetrable. One wonders what they will come up with next!

Chapter 1: Introduction to compression

1.1 What is MPEG?

MPEG is actually an acronym for the Moving Pictures Experts Group, which was formed by the ISO (International Organization for Standardization) to set standards for audio and video compression and transmission.

Compression is summarized in Figure 1.1. It will be seen in (a) that the data rate is reduced at source by the compressor. The compressed data are then passed through a communication channel and returned to the original rate by the expander. The ratio between the source data rate and the channel data rate is called the compression factor. The term coding gain is also used. Sometimes a compressor and expander in series are referred to as a compander. The compressor may equally well be referred to as a coder and the expander a decoder, in which case the tandem pair may be called a codec.

Figure 1.1: In (a) a compression system consists of a compressor or coder, a transmission channel and a matching expander or decoder. The combination of coder and decoder is known as a codec.
(b) MPEG is asymmetrical since the encoder is much more complex than the decoder.

Where the encoder is more complex than the decoder, the system is said to be asymmetrical. Figure 1.1(b) shows that MPEG works in this way. The encoder needs to be algorithmic or adaptive, whereas the decoder is 'dumb' and carries out fixed actions. This is advantageous in applications such as broadcasting, where the number of expensive complex encoders is small but the number of simple inexpensive decoders is large. In point-to-point applications the advantage of asymmetrical coding is not so great.

The approach of the ISO to standardization in MPEG is novel because it is not the encoder which is standardized. Figure 1.2(a) shows that instead the way in which a decoder shall interpret the bitstream is defined. A decoder which can successfully interpret the bitstream is said to be compliant. Figure 1.2(b) shows that the advantage of standardizing the decoder is that over time encoding algorithms can improve, yet compliant decoders will continue to function with them. It should be noted that a compliant decoder must be able to interpret every allowable bitstream correctly, whereas an encoder which produces a restricted subset of the possible codes can still be compliant.

Figure 1.2: (a) MPEG defines the protocol of the bitstream between encoder and decoder. The decoder is defined by implication; the encoder is left very much to the designer. (b) This approach allows future encoders of better performance to remain compatible with existing decoders. (c) This approach also allows an encoder to produce a standard bitstream while its technical operation remains a commercial secret.

The MPEG standards give very little information regarding the structure and operation of the encoder. Provided the bitstream is compliant, any coder construction will meet the standard, although some designs will give better picture quality than others.
Encoder construction is not revealed in the bitstream, and manufacturers can supply encoders using algorithms which are proprietary and whose details do not need to be published. A useful result is that there can be competition between different encoder designs, which means that better designs can evolve. The user will have greater choice because different levels of cost and complexity can exist in a range of coders, yet a compliant decoder will operate with them all.

MPEG is, however, much more than a compression scheme, as it also standardizes the protocol and syntax under which it is possible to combine or multiplex audio data with video data to produce a digital equivalent of a television program. Many such programs can be combined in a single multiplex, and MPEG defines the way in which such multiplexes can be created and transported. The definitions include the metadata which decoders require to demultiplex correctly and which users will need to locate programs of interest. As with all video systems there is a requirement for synchronizing or genlocking, and this is particularly complex when a multiplex is assembled from many signals which are not necessarily synchronized to one another.

1.2 Why compression is necessary

Compression, bit rate reduction, data reduction and source coding are all terms which mean basically the same thing in this context. In essence the same (or nearly the same) information is carried using a smaller quantity or rate of data. It should be pointed out that in audio, compression traditionally means a process in which the dynamic range of the sound is reduced. In the context of MPEG the same word means that the bit rate is reduced, ideally leaving the dynamics of the signal unchanged. Provided the context is clear, the two meanings can co-exist without a great deal of confusion.

There are several reasons why compression techniques are popular:

(a) Compression extends the playing time of a given storage device.
(b) Compression allows miniaturization. With fewer data to store, the same playing time is obtained with smaller hardware. This is useful in ENG (electronic news gathering) and consumer devices.

(c) Tolerances can be relaxed. With fewer data to record, storage density can be reduced, making equipment which is more resistant to adverse environments and which requires less maintenance.

(d) In transmission systems, compression allows a reduction in bandwidth which will generally result in a reduction in cost. This may make possible a service which would be impracticable without it.

(e) If a given bandwidth is available to an uncompressed signal, compression allows faster-than-real-time transmission in the same bandwidth.

(f) If a given bandwidth is available, compression allows a better-quality signal in the same bandwidth.

1.3 MPEG-1, 2 and 4 contrasted

The first compression standard for audio and video was MPEG-1. Although many applications have been found, MPEG-1 was basically designed to allow moving pictures and sound to be encoded into the bit rate of an audio Compact Disc. The resultant Video-CD was quite successful but has now been superseded by DVD. In order to meet the low bit rate requirement, MPEG-1 downsampled the images heavily as well as using picture rates of only 24–30 Hz, and the resulting quality was moderate. [1][2]

The subsequent MPEG-2 standard was considerably broader in scope and of wider appeal. For example, MPEG-2 supports interlace and HD whereas MPEG-1 did not. MPEG-2 has become very important because it has been chosen as the compression scheme for both DVB (digital video broadcasting) and DVD (digital video disk). Developments in standardizing scaleable and multi-resolution compression which would have become MPEG-3 were ready by the time MPEG-2 was ready to be standardized, and so this work was incorporated into MPEG-2; as a result there is no MPEG-3 standard.
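The scale of the downsampling MPEG-1 needed can be seen with some simple arithmetic. The figures below are illustrative assumptions (CIF-sized 4:2:0 pictures at 25 Hz against a roughly 1.15 Mbit/s Video-CD video rate), not values quoted in the text:

```python
# Raw data rate of a CIF-sized 4:2:0 picture sequence versus an
# approximate Video-CD channel rate. Illustrative figures only.
width, height = 352, 288          # CIF luma resolution
samples_per_pixel = 1.5           # 4:2:0: one luma + two quarter-size chroma
bits_per_sample = 8
picture_rate = 25                 # pictures per second

raw_rate = width * height * samples_per_pixel * bits_per_sample * picture_rate
channel_rate = 1.15e6             # approx. Video-CD video bit rate, bit/s

print(raw_rate / 1e6)             # ~30.4 Mbit/s even after heavy downsampling
print(raw_rate / channel_rate)    # a compression factor of roughly 26 is still needed
```

Even with the picture heavily downsampled at source, a substantial compression factor remains for the coder itself to deliver, which is why the resulting quality was only moderate.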
[3]

MPEG-4 uses further coding tools with additional complexity to achieve higher compression factors than MPEG-2. In addition to more efficient coding of video, MPEG-4 moves closer to computer graphics applications. In the more complex Profiles, the MPEG-4 decoder effectively becomes a rendering processor and the compressed bitstream describes three-dimensional shapes and surface texture. It is to be expected that MPEG-4 will become as important to Internet and wireless delivery as MPEG-2 has become in DVD and DVB. [4]

[1] ISO/IEC JTC1/SC29/WG11 MPEG, International Standard ISO 11172, Coding of moving pictures and associated audio for digital storage media up to 1.5 Mbit/s (1992)
[2] Le Gall, D., MPEG: a video compression standard for multimedia applications. Communications of the ACM, 34, No. 4, 46–58 (1991)
[3] MPEG-2 Video Standard: ISO/IEC 13818-2, Information technology – generic coding of moving pictures and associated audio information: Video (1996) (also ITU-T Rec. H.262 (1996))
[4] MPEG-4 Standard: ISO/IEC 14496-2, Information technology – coding of audio-visual objects: Amd. 1 (2000)

1.4 Some applications of compression

The applications of audio and video compression are limitless, and the ISO has done well to provide standards which are appropriate to the wide range of possible compression products. MPEG coding embraces video pictures from the tiny screen of a videophone to the high-definition images needed for electronic cinema. Audio coding stretches from speech-grade mono to multichannel surround sound.

Figure 1.3 shows the use of a codec with a recorder. The playing time of the medium is extended in proportion to the compression factor. In the case of tapes, the access time is improved because the length of tape needed for a given recording is reduced and so it can be rewound more quickly. In the case of DVD (digital video disk, aka digital versatile disk) the challenge was to store an entire movie on one 12 cm disk.
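The size of that challenge can be sketched with some back-of-envelope arithmetic. The figures here are illustrative assumptions (a 4.7 GB single-layer disk, a 133-minute movie, and a 270 Mbit/s uncompressed studio-quality SD rate), not values from the text:

```python
# Rough DVD storage budget. All figures are illustrative assumptions.
capacity_bits = 4.7e9 * 8                 # single-layer DVD, approx.
movie_seconds = 133 * 60                  # a typical feature film

avg_rate = capacity_bits / movie_seconds  # total bit budget per second
uncompressed = 270e6                      # approx. serial digital SD rate, bit/s

print(avg_rate / 1e6)                     # ~4.7 Mbit/s available on average
print(uncompressed / avg_rate)            # a compression factor around 57 is required
```

The budget must also carry audio and subpictures, so the factor actually demanded of the video coder is even larger than this sketch suggests.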
The storage density available with today's optical disk technology is such that consumer recording of conventional uncompressed video would be out of the question.

In communications, the cost of data links is often roughly proportional to the data rate, and so there is simple economic pressure to use a high compression factor. However, it should be borne in mind that implementing the codec also has a cost which rises with compression factor, and so a degree of compromise will be inevitable.

Figure 1.3: Compression can be used around a recording medium. The storage capacity may be increased or the access time reduced according to the application.

In the case of video-on-demand, technology exists to convey full-bandwidth video to the home, but to do so for a single individual at the moment would be prohibitively expensive. Without compression, HDTV (high-definition television) requires too much bandwidth. With compression, HDTV can be transmitted to the home in a similar bandwidth to an existing analog SDTV channel. Compression does not make video-on-demand or HDTV possible; it makes them economically viable.

In workstations designed for the editing of audio and/or video, the source material is stored on hard disks for rapid access. Whilst top-grade systems may function without compression, many systems use compression to offset the high cost of disk storage. In some systems a compressed version of the top-grade material may also be stored for browsing purposes. When a workstation is used for off-line editing, a high compression factor can be used and artifacts will be visible in the picture. This is of no consequence as the picture is only seen by the editor, who uses it to make an EDL (edit decision list) which is no more than a list of actions and the timecodes at which they occur. The original uncompressed material is then conformed to the EDL to obtain a high-quality edited work.
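An EDL of the kind described is little more than a timecoded list of actions. A minimal sketch is shown below; the `Event` structure and its field names are hypothetical illustrations, not any standard EDL format:

```python
from dataclasses import dataclass

@dataclass
class Event:
    source_in: str    # timecode of the shot in the source material
    source_out: str
    record_in: str    # where the shot lands in the edited work
    action: str       # e.g. 'cut' or 'dissolve'

# The off-line editor works on a compressed proxy; only this list is kept,
# and the uncompressed originals are later conformed to it.
edl = [
    Event("01:00:10:00", "01:00:14:12", "00:00:00:00", "cut"),
    Event("02:03:00:05", "02:03:06:00", "00:00:04:12", "cut"),
]
print(len(edl))  # 2
```

Because the list carries no picture data at all, the visible artifacts in the compressed proxy never reach the finished work.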
When on-line editing is being performed, the output of the workstation is the finished product and clearly a lower compression factor will have to be used.

Perhaps it is in broadcasting where the use of compression will have its greatest impact. There is only one electromagnetic spectrum, and pressure from other services such as cellular telephones makes efficient use of bandwidth mandatory. Analog television broadcasting is an old technology and makes very inefficient use of bandwidth. Its replacement by a compressed digital transmission is inevitable for the practical reason that the bandwidth is needed elsewhere. Fortunately in broadcasting there is a mass market for decoders and these can be implemented as low-cost integrated circuits. Fewer encoders are needed and so it is less important if these are expensive. Whilst the cost of digital storage goes down year on year, the cost of the electromagnetic spectrum goes up. Consequently in the future the pressure to use compression in recording will ease, whereas the pressure to use it in radio communications will increase.

1.5 Lossless and perceptive coding

Although there are many different coding techniques, all of them fall into one or other of these categories. In lossless coding, the data from the expander are identical bit-for-bit with the original source data. The so-called 'stacker' programs which increase the apparent capacity of disk drives in personal computers use lossless codecs. Clearly with computer programs the corruption of a single bit can be catastrophic. Lossless coding is generally restricted to compression factors of around 2:1. It is important to appreciate that a lossless coder cannot guarantee a particular compression factor, and the communications link or recorder used with it must be able to function with the variable output data rate. Source data which result in poor compression factors on a given codec are described as difficult.
It should be pointed out that the difficulty is often a function of the codec. In other words, data which one codec finds difficult may not be found difficult by another. Lossless codecs can be included in bit-error-rate testing schemes. It is also possible to cascade or concatenate lossless codecs without any special precautions.

Higher compression factors are only possible with lossy coding, in which data from the expander are not identical bit-for-bit with the source data, and as a result comparing the input with the output is bound to reveal differences. Lossy codecs are not suitable for computer data, but are used in MPEG as they allow greater compression factors than lossless codecs. Successful lossy codecs are those in which the errors are arranged so that a human viewer or listener finds them subjectively difficult to detect. Thus lossy codecs must be based on an understanding of psycho-acoustic and psycho-visual perception and are often called perceptive coders. In perceptive coding, the greater the compression factor required, the more accurately must the human senses be modelled. Perceptive coders can be forced to operate at a fixed compression factor. This is convenient for practical [...]
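The behaviour of lossless coding described above (bit-for-bit identity, a compression factor that varies with the source, and the existence of 'difficult' inputs) can be demonstrated with any general-purpose lossless codec. In the sketch below, Python's zlib stands in for the 'stacker'-style codecs mentioned; the exact factors printed depend on the input data:

```python
import os
import zlib

def compression_factor(data: bytes) -> float:
    compressed = zlib.compress(data, level=9)
    # Lossless: expansion must return the source bit-for-bit.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)

easy = b"\x00" * 100_000      # highly redundant source: compresses very well
hard = os.urandom(100_000)    # random data: a 'difficult' source

print(compression_factor(easy))   # a large factor, in the hundreds
print(compression_factor(hard))   # near 1, or even slightly below
```

Note that the 'difficult' random input actually expands slightly because of the codec's framing overhead, which is exactly why a channel carrying lossless-coded data must tolerate a variable output rate.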
