19 ITU-T Video Coding Standards H.261 and H.263

© 2000 by CRC Press LLC

This chapter introduces the ITU-T video coding standards H.261 and H.263, which were established mainly for videotelephony and videoconferencing. The basic technical detail of H.261 is presented. The technical improvements with which H.263 achieves high coding efficiency are discussed. Features of H.263+, H.263++, and H.26L are presented.

19.1 INTRODUCTION

Very low bit rate video coding has found many industry applications such as wireless and network communications. The rapid convergence of digital video-coding standards reflects several factors: the maturity of the technologies in terms of algorithmic performance, hardware implementation with VLSI technology, and the market need for rapid advances in wireless and network communications. As stated in the previous chapters, these standards include JPEG for still image coding and MPEG-1/2 for CD-ROM storage and digital television applications. In parallel with the ISO/IEC development of the MPEG-1/2 standards, the ITU-T has developed H.261 (ITU-T, 1993) for videotelephony and videoconferencing applications in an ISDN environment.

19.2 H.261 VIDEO-CODING STANDARD

The H.261 video-coding standard was developed by ITU-T Study Group XV during 1988 to 1993. It was adopted in 1990 and the final revision was approved in 1993. It is also referred to as the P × 64 standard because it encodes digital video signals at bit rates of P × 64 Kbps, where P is an integer from 1 to 30, i.e., at bit rates from 64 Kbps to 1.92 Mbps.

19.2.1 OVERVIEW OF H.261 VIDEO-CODING STANDARD

The H.261 video-coding standard has many features in common with the MPEG-1 video-coding standard. However, since they target different applications, the two standards differ in many respects, such as data rates, picture quality, and end-to-end delay.
Before indicating the differences between the two coding standards, we describe the major similarities between H.261 and MPEG-1/2. First, both standards are used to code similar video formats. H.261 is mainly used to code video with the common intermediate format (CIF) or quarter-CIF (QCIF) spatial resolution for teleconferencing applications. MPEG-1 uses CIF, SIF, or higher spatial resolution for CD-ROM applications. The original motivation for developing the H.261 video-coding standard was to provide a standard that could be used for both PAL and NTSC television signals. Later, however, H.261 was mainly used for videoconferencing, while MPEG-1/2 was used for digital television (DTV), VCD (video CD), and DVD (digital video disk).

The two TV systems, PAL and NTSC, use different line and picture rates. NTSC, which is used in North America and Japan, uses 525 lines per interlaced picture at 30 frames/second. PAL is used in most other countries and uses 625 lines per interlaced picture at 25 frames/second. To serve both systems, CIF was adopted as the source video format for the H.261 video coder. The CIF format consists of 352 pixels/line, 288 lines/frame, and 30 frames/second. This format has half the active lines of the PAL signal and the same picture rate as the NTSC signal, so PAL systems need only perform a picture rate conversion and NTSC systems need only perform a line number conversion. Color pictures consist of one luminance and two color-difference components (referred to as the YCbCr format) as specified by the CCIR 601 standard. The Cb and Cr components are half the size of the luminance component in both the horizontal and vertical directions, with 176 pixels/line and 144 lines/frame. The other format, QCIF, is used for very low bit rate applications. QCIF has half the number of pixels per line and half the number of lines of the CIF format.

Second, the key coding algorithms of H.261 and MPEG-1 are very similar.
Both H.261 and MPEG-1 use DCT-based coding to remove intraframe redundancy and motion compensation to remove interframe redundancy.

Now let us describe the main differences between the two coding standards with respect to coding algorithms. The main differences include:

• H.261 uses only I- and P-macroblocks but no B-macroblocks, while MPEG-1 uses three macroblock types, I-, P-, and B-macroblocks (an I-macroblock is an intraframe-coded macroblock, a P-macroblock is a predictive-coded macroblock, and a B-macroblock is a bidirectionally coded macroblock), as well as three picture types, I-, P-, and B-pictures, as defined in Chapter 16 for the MPEG-1 standard.
• H.261 imposes the constraint that for every 132 interframe-coded macroblocks, which corresponds to 4 GOBs (groups of blocks) or one-third of a CIF picture, at least one intraframe-coded macroblock is required. To obtain better coding performance in low-bit-rate applications, most H.261 encoding schemes prefer not to use intraframe coding on all the macroblocks of a picture, but only on a few macroblocks in every picture with a rotational scheme. MPEG-1 uses the GOP (group of pictures) structure, where the size of the GOP (the distance between two I-pictures) is not specified.
• The end-to-end delay is not a critical issue for MPEG-1, but is critical for H.261. The video encoder and video decoder delays of H.261 need to be known so that audio compensation delays can be fixed when H.261 is used in interactive applications. This allows lip synchronization to be maintained.
• The accuracy of motion compensation in MPEG-1 is up to a half-pixel, but is only a full pixel in H.261. However, H.261 uses a loop filter to smooth the previous frame. This filter attempts to minimize the prediction error.
• In H.261, a fixed picture aspect ratio of 4:3 is used. In MPEG-1, several picture aspect ratios can be used and the picture aspect ratio is defined in the picture header.
• Finally, in H.261, the encoded picture rate is restricted to allow up to three skipped frames. This gives the control mechanism in the encoder some flexibility to control the encoded picture quality and satisfy the buffer regulation. Although MPEG-1 has no restriction on skipped frames, the encoder usually does not perform frame skipping. Rather, the syntax for B-frames is exploited, as B-frames require far fewer bits than P-pictures.

19.2.2 TECHNICAL DETAIL OF H.261

The key technologies used in the H.261 video-coding standard are the DCT and motion compensation. The main components in the encoder include DCT, prediction, quantization (Q), inverse DCT (IDCT), inverse quantization (IQ), loop filter, frame memory, variable-length coding, and coding control unit. A typical encoder structure is shown in Figure 19.1. The input video source is first converted to the CIF frame and then stored in the frame memory. The CIF frame is then partitioned into GOBs. A GOB contains 33 macroblocks, which is 1/12 of a CIF picture or 1/3 of a QCIF picture. Each macroblock consists of six 8 × 8 blocks, of which four are luminance (Y) blocks and two are chrominance blocks (one Cb and one Cr).

For the intraframe mode, each 8 × 8 block is first transformed with the DCT and then quantized. Variable-length coding (VLC) is applied to the quantized DCT coefficients in a zigzag scanning order, as in MPEG-1. The resulting bits are sent to the encoder buffer to form a bitstream.

For the interframe-coding mode, frame prediction is performed with motion estimation in a manner similar to that in MPEG-1, but only P-macroblocks and P-pictures are used; there are no B-macroblocks or B-pictures. Each 8 × 8 block of differences, or prediction residues, is coded by the same DCT coding path as for intraframe coding.
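The GOB and macroblock counts quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `layout` and the constants are hypothetical, and it assumes the GOB dimensions given in Section 19.2.3.2 (176 pixels by 48 lines of luminance) together with 16 × 16 macroblocks.

```python
# Illustrative arithmetic only; names are hypothetical, not from the standard.
MB_SIZE = 16                      # a macroblock covers 16 x 16 luminance samples
GOB_WIDTH, GOB_HEIGHT = 176, 48   # luminance area of one GOB (Section 19.2.3.2)


def layout(width, height):
    """Count macroblocks and GOBs in a luminance picture of the given size."""
    macroblocks = (width // MB_SIZE) * (height // MB_SIZE)
    gobs = (width // GOB_WIDTH) * (height // GOB_HEIGHT)
    return {
        "macroblocks": macroblocks,
        "gobs": gobs,
        "mbs_per_gob": macroblocks // gobs,
    }


if __name__ == "__main__":
    for name, (w, h) in {"CIF": (352, 288), "QCIF": (176, 144)}.items():
        print(name, layout(w, h))
```

Under these assumptions a CIF picture holds 12 GOBs and a QCIF picture holds 3, each with 33 macroblocks, which agrees with the fractions 1/12 and 1/3 in the text and with the figure of 132 macroblocks for 4 GOBs.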
In motion-compensated predictive coding, the encoder should perform the motion estimation with the reconstructed pictures instead of the original video data, as will be done in the decoder. Therefore, the IQ and IDCT blocks are included in the motion compensation loop to reduce the error propagation drift. Since the VLC operation is lossless, there is no need to include the VLC block in the motion compensation loop.

The role of the spatial filter is to minimize the prediction error by smoothing the previous frame that is used for motion compensation. The loop filter is a separable 2-D spatial filter that operates on an 8 × 8 block. The corresponding 1-D filters are nonrecursive with coefficients 1/4, 1/2, 1/4. At block boundaries, the coefficients are 0, 1, 0 to avoid the taps falling outside the block. It should be noted that MPEG-1 uses subpixel-accurate motion vectors instead of a loop filter to smooth the anchor frame. A performance comparison of the two methods would be interesting.

The role of coding control includes rate control, buffer control, quantization control, and frame rate control. These parameters are intimately related. The coding control is not part of the standard; however, it is an important part of the encoding process. For a given target bit rate, the encoder has to control several parameters to reach the rate target and at the same time provide reasonable coded picture quality.

Since H.261 is a predictive coder and VLCs are used everywhere, such as in coding the quantized DCT coefficients and motion vectors, a single transmission error may cause a loss of synchronization and consequently cause problems for the reconstruction. To enhance the performance of the H.261 video coder in noisy environments, the transmitted bitstream of H.261 can optionally contain a BCH (Bose, Chaudhuri, and Hocquenghem) (511,493) forward error-correction code.

The H.261 video decoder performs the inverse operations of the encoder. After optional error correction decoding, the compressed bitstream enters the decoder buffer and is then parsed by the variable-length decoder (VLD). The output of the VLD is applied to the IQ and IDCT, where the data are converted to values in the spatial domain. For the interframe-coding mode, motion compensation is performed and the data from the macroblocks in the anchor frame are added to the current data to form the reconstructed data.

FIGURE 19.1 Block diagram of a typical H.261 video encoder. (From ITU-T Recommendation H.261, March 1993. With permission.)

19.2.3 SYNTAX DESCRIPTION

The syntax of H.261 video coding has a hierarchical layered structure. From top to bottom, the layers are the picture layer, GOB layer, macroblock layer, and block layer.

19.2.3.1 Picture Layer

The picture layer begins with a 20-bit picture start code (PSC). Following the PSC, there are the temporal reference (5 bits), picture type information (PTYPE, 6 bits), extra insertion information (PEI, 1 bit), and spare information (PSPARE). The data for the GOBs then follow.

19.2.3.2 GOB Layer

A GOB corresponds to 176 pixels by 48 lines of Y and 88 pixels by 24 lines each of Cb and Cr. The GOB layer contains the following data in order: 16-bit GOB start code (GBSC), 4-bit group number (GN), 5-bit quantization information (GQUANT), 1-bit extra insertion information (GEI), and spare information (GSPARE). The number of bits for GSPARE is variable, depending on the GEI bits. If GEI is set to "1," then 9 bits follow, consisting of 8 bits of data and another GEI bit to indicate whether a further 9 bits follow, and so on. The GOB header is followed by the data for the macroblocks.

19.2.3.3 Macroblock Layer

Each GOB contains 33 macroblocks, which are arranged as in Figure 19.2. A macroblock consists of 16 pixels by 16 lines of Y that spatially correspond to 8 pixels by 8 lines each of Cb and Cr.
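The GOB header layout of Section 19.2.3.2, in particular the chained GEI/GSPARE mechanism, can be sketched as a small parser. This is a minimal illustration, not a conforming decoder: the `BitReader` class and `parse_gob_header` function are hypothetical names, the bitstream is modeled as a string of '0' and '1' characters for readability, and the start-code value is the 16-bit pattern listed for "Start code" in Table 19.1.

```python
class BitReader:
    """Hypothetical helper: reads fixed-width fields from a '0'/'1' string."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read(self, n):
        if self.pos + n > len(self.bits):
            raise ValueError("ran out of bits")
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value


GBSC = 0b0000000000000001  # 16-bit GOB start code


def parse_gob_header(reader):
    """Parse GBSC, GN, GQUANT and the GEI/GSPARE chain of one GOB header."""
    if reader.read(16) != GBSC:
        raise ValueError("GOB start code not found")
    gn = reader.read(4)        # group number
    gquant = reader.read(5)    # quantization information
    gspare = []
    # Each GEI bit equal to 1 announces 8 bits of spare data followed by
    # another GEI bit; a GEI bit equal to 0 ends the header.
    while reader.read(1) == 1:
        gspare.append(reader.read(8))
    return {"GN": gn, "GQUANT": gquant, "GSPARE": gspare}


if __name__ == "__main__":
    header = "0000000000000001" + "0011" + "01010" + "1" + "10101010" + "0"
    print(parse_gob_header(BitReader(header)))
```

The loop mirrors the description in the text: the header length is not known in advance, so the parser keeps consuming 9-bit groups until it meets a GEI bit of 0. Macroblock data would begin at the reader position left by the function.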
Data in the bitstream for a macroblock consist of a macroblock header followed by data for blocks. The macroblock header may include the macroblock address (MBA) (variable length), type information (MTYPE) (variable length), quantizer (MQUANT) (5 bits), motion vector data (MVD) (variable length), and coded block pattern (CBP) (variable length). The MBA information is always present and is coded by VLC. The VLC table for macroblock addressing is shown in Table 19.1. The presence of the other items depends on the macroblock type information, which is shown in the VLC table of Table 19.2.

19.2.3.4 Block Layer

Data in the block layer consist of the transform coefficients followed by an end of block (EOB) marker (the 2-bit code "10"). The transform coefficient data (TCOEFF) are first converted to pairs of RUN and LEVEL according to the zigzag scanning order. The RUN represents the number of successive zeros and the LEVEL represents the value of the following nonzero coefficient. The pairs of RUN and LEVEL are then encoded with VLCs. The DC coefficient of an intrablock is coded by a fixed-length code with 8 bits. All VLC tables can be found in the standard document (ITU-T, 1993).

FIGURE 19.2 Arrangement of macroblocks in a GOB. (From ITU-T Recommendation H.261, March 1993. With permission.)

19.3 H.263 VIDEO-CODING STANDARD

The H.263 video-coding standard (ITU-T, 1996) is specifically designed for very low bit rate applications such as practical video telecommunication. Its technical content was completed in late 1995 and the standard was approved in early 1996.

19.3.1 OVERVIEW OF H.263 VIDEO CODING

The basic configuration of the video source coding algorithm of H.263 is based on H.261. Several important features that differ from H.261 are provided as new options: unrestricted motion vectors, syntax-based arithmetic coding, advanced prediction, and PB-frames. All these features can be used together or separately to improve the coding efficiency.
TABLE 19.1 VLC Table for Macroblock Addressing

MBA   Code            MBA   Code             MBA            Code
1     1               13    0000 1000        25             0000 0100 000
2     011             14    0000 0111        26             0000 0011 111
3     010             15    0000 0110        27             0000 0011 110
4     0011            16    0000 0101 11     28             0000 0011 101
5     0010            17    0000 0101 10     29             0000 0011 100
6     0001 1          18    0000 0101 01     30             0000 0011 011
7     0001 0          19    0000 0101 00     31             0000 0011 010
8     0000 111        20    0000 0100 11     32             0000 0011 001
9     0000 110        21    0000 0100 10     33             0000 0011 000
10    0000 1011       22    0000 0100 011    MBA stuffing   0000 0001 111
11    0000 1010       23    0000 0100 010    Start code     0000 0000 0000 0001
12    0000 1001       24    0000 0100 001

TABLE 19.2 VLC Table for Macroblock Type

Prediction      MQUANT   MVD   CBP   TCOEFF   VLC
Intra                                x        0001
Intra           x                    x        0000 001
Inter                          x     x        1
Inter           x              x     x        0000 1
Inter+MC                 x                    0000 0000 1
Inter+MC                 x     x     x        0000 0001
Inter+MC        x        x     x     x        0000 0000 01
Inter+MC+FIL             x                    001
Inter+MC+FIL             x     x     x        01
Inter+MC+FIL    x        x     x     x        0000 01

Notes: 1. "x" means that the item is present in the macroblock. 2. It is possible to apply the filter in a non-motion-compensated macroblock by declaring it as MC+FIL but with a zero vector.

The H.263 video standard can be used for both 625-line and 525-line television standards. The source coder operates on noninterlaced pictures at a picture rate of about 30 pictures/second. The pictures are coded as luminance and two color-difference components (Y, Cb, and Cr). The source coder is based on CIF. In fact, there are five standardized formats: sub-QCIF, QCIF, CIF, 4CIF, and 16CIF. The details of the formats are shown in Table 19.3. Note that for each format the chrominance picture is a quarter the size of the luminance picture, i.e., the chrominance pictures are half the size of the luminance picture in both the horizontal and vertical directions. This is defined by the ITU-R 601 format.
For the CIF format, the number of pixels per line is compatible with sampling the active portion of the luminance and color-difference signals from a 525- or 625-line source at 6.75 and 3.375 MHz, respectively. These frequencies have a simple relationship to those defined by the ITU-R 601 format.

19.3.2 TECHNICAL FEATURES OF H.263

The H.263 encoder structure is similar to the H.261 encoder, except that there is no loop filter in the H.263 encoder. The main components of the encoder include block transform, motion-compensated prediction, block quantization, and VLC. Each picture is partitioned into groups of blocks, which are referred to as GOBs. A GOB contains a multiple of 16 lines, k × 16 lines, depending on the picture format (k = 1 for sub-QCIF, QCIF, and CIF; k = 2 for 4CIF; k = 4 for 16CIF). Each GOB is divided into macroblocks that are the same as in H.261; each macroblock consists of four 8 × 8 luminance blocks and two 8 × 8 chrominance blocks.

Compared with H.261, H.263 has several new technical features that enhance the coding efficiency for very low bit rate applications. These new features include picture-extrapolating motion vectors (or unrestricted motion vector mode), motion compensation with half-pixel accuracy, advanced prediction (which includes variable-block-size motion compensation and overlapped block motion compensation), syntax-based arithmetic coding, and PB-frame mode.

19.3.2.1 Half-Pixel Accuracy

In H.263 video coding, half-pixel accuracy motion compensation is used. The half-pixel values are found using bilinear interpolation as shown in Figure 19.3. Note that H.263 uses subpixel accuracy for motion compensation instead of using a loop filter to smooth the anchor frames as in H.261. This is also done in other coding standards, such as MPEG-1 and MPEG-2, which also use half-pixel accuracy for motion compensation. In MPEG-4 video, quarter-pixel accuracy for motion compensation has been adopted as a tool for version 2.
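Bilinear half-pixel interpolation as described above can be sketched as follows. Figure 19.3 is not reproduced in this text, so the rounding offsets used here (add 1 before halving, add 2 before dividing by 4, with truncating integer division) are an assumption based on common practice; the function name `half_pel_sample` and the coordinate convention (positions given in half-pixel units, nonnegative) are hypothetical.

```python
def half_pel_sample(ref, x2, y2):
    """Predicted sample at position (x2 / 2, y2 / 2) of the reference picture.

    `ref` is a list of rows of integer samples; x2 and y2 are nonnegative
    coordinates in half-pixel units. Rounding offsets are an assumption.
    """
    x, y = x2 // 2, y2 // 2          # integer-pixel part
    frac_x, frac_y = x2 % 2, y2 % 2  # 1 when the position lies between pixels
    a = ref[y][x]
    if not frac_x and not frac_y:
        return a                                   # integer position
    if frac_x and not frac_y:
        return (a + ref[y][x + 1] + 1) // 2        # between two columns
    if frac_y and not frac_x:
        return (a + ref[y + 1][x] + 1) // 2        # between two rows
    # Center of four pixels.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4


if __name__ == "__main__":
    picture = [[10, 20], [30, 40]]
    for pos in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        print(pos, half_pel_sample(picture, *pos))
```

Because each interpolated sample averages two or four neighbors, the interpolation itself acts as a mild low-pass filter, which is one way to see why the explicit loop filter of H.261 is dropped here.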
19.3.2.2 Unrestricted Motion Vector Mode

Usually motion vectors are limited to the coded picture area of the anchor frames. In the unrestricted motion vector mode, the motion vectors are allowed to point outside the pictures. When a motion vector points beyond the boundary of the anchor frame in this mode, the picture-extrapolating method is used: reference pixels outside the picture boundary take the values of the nearest boundary pixels. An extension of the motion vector range also applies to the unrestricted motion vector mode. In the default prediction mode, the motion vectors are restricted to the range [-16, 15.5]. In the unrestricted mode, the maximum range for motion vectors is extended to [-31.5, 31.5] under certain conditions.

TABLE 19.3 Number of Pixels per Line and the Number of Lines for Each Picture Format

Picture    Pixels for        Lines for         Pixels for            Lines for
Format     Luminance (dx)    Luminance (dy)    Chrominance (dx/2)    Chrominance (dy/2)
Sub-QCIF   128               96                64                    48
QCIF       176               144               88                    72
CIF        352               288               176                   144
4CIF       704               576               352                   288
16CIF      1408              1152              704                   576

19.3.2.3 Advanced Prediction Mode

Generally, the decoder accepts no more than one motion vector per macroblock for the baseline algorithm of the H.263 video-coding standard. However, in the advanced prediction mode, the syntax allows up to four motion vectors to be used per macroblock. The decision to use one or four vectors is indicated by the macroblock type and coded block pattern for chrominance (MCBPC) codeword for each macroblock. How to make this decision is the task of the encoding process.

The following example gives the steps of motion estimation and coding mode selection for the advanced prediction mode in the encoder.

Step 1. Integer pixel motion estimation:

$$SAD_N(x, y) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| \text{original} - \text{previous} \right| \qquad (19.1)$$

where SAD is the sum of absolute differences, the values of (x, y) are within the search range, N is equal to 16 for a 16 × 16 block, and N is equal to 8 for an 8 × 8 block.

$$SAD_{4 \times 8} = \sum SAD_8(x, y) \qquad (19.2)$$

where the sum is taken over the four 8 × 8 luminance blocks of the macroblock, and

$$SAD_{inter} = \min\left( SAD_{16}(x, y),\; SAD_{4 \times 8} \right) \qquad (19.3)$$

Step 2. Intra/inter mode decision: If A < (SAD_inter - 500), this macroblock is coded as an intra-MB; otherwise, it is coded as an inter-MB, where SAD_inter is determined in step 1, and

$$A = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| \text{original} - MB_{mean} \right|, \qquad MB_{mean} = \frac{1}{256} \sum_{i=0}^{15} \sum_{j=0}^{15} \text{original} \qquad (19.4)$$

If this macroblock is determined to be coded as an inter-MB, go to step 3.

Step 3. Half-pixel search: In this step, a half-pixel search is performed for both 16 × 16 blocks and 8 × 8 blocks as shown in Figure 19.3.

FIGURE 19.3 Half-pixel prediction by bilinear interpolation.

Step 4. Decision on 16 × 16 or four 8 × 8 (one motion vector or four motion vectors per macroblock): If SAD_{4×8} < SAD_16 - 100, four motion vectors per macroblock will be used, with each motion vector used for all pixels in one of the four luminance blocks of the macroblock; otherwise, one motion vector will be used for all pixels in the macroblock.

Step 5. Differential coding of the motion vectors for each 8 × 8 luminance block is performed as in Figure 19.4.

FIGURE 19.4 Differential coding of motion vectors.

$$MVD_x = MV_x - P_x, \qquad MVD_y = MV_y - P_y$$
$$P_x = \mathrm{Median}(MV1_x, MV2_x, MV3_x), \qquad P_y = \mathrm{Median}(MV1_y, MV2_y, MV3_y)$$
$$P_x = P_y = 0 \quad \text{if the MB is intracoded or the block is outside the picture boundary}$$

When it has been decided to use four motion vectors, the motion vector MVD_CHR for both chrominance blocks is derived by calculating the sum of the four luminance vectors and dividing by 8. The component values of the resulting 1/16 pixel resolution vectors are modified toward the positions indicated in Table 19.4.

TABLE 19.4 Modification of 1/16 Pixel Resolution Chrominance Vector Components

1/16 pixel position   0   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   /16
Resulting position    0   0   0   1   1   1   1   1   1   1   1    1    1    1    2    2    /2

Another advanced prediction mode is overlapped motion compensation for luminance. This idea is also used by MPEG-4, which has been described in Chapter 18. In the overlapped motion compensation mode, each pixel in an 8 × 8 luminance block is a weighted sum of three values divided by 8 with rounding. The three values are obtained by motion compensation with three motion vectors: the motion vector of the current luminance block and two of four "remote" vectors. These remote vectors include the motion vector of the block to the left or right of the current block and the motion vector of the block above or below the current block. The remote motion vectors from other GOBs are used in the same way as remote motion vectors inside the current GOB.

For each pixel to be coded in the current block, the remote motion vectors of the blocks at the two nearest block borders are used, i.e., for the upper half of the block the motion vector corresponding to the block above the current block is used, while for the lower half of the block the motion vector corresponding to the block below the current block is used. Similarly, the left half of the block uses the motion vector of the block at the left side of the current block and the right half uses the one at the right side of the current block.

To make this clearer, let (MV_x^0, MV_y^0) be the motion vector for the current block, (MV_x^1, MV_y^1) be the motion vector for the block either above or below, and (MV_x^2, MV_y^2) be the motion vector of the block either to the left or right of the current block. Then the value of each pixel p(x, y) in the current 8 × 8 luminance block is given by

$$p(x, y) = \left( q(x, y) \cdot H_0(x, y) + r(x, y) \cdot H_1(x, y) + s(x, y) \cdot H_2(x, y) + 4 \right) / 8 \qquad (19.5)$$

where

$$q(x, y) = p(x + MV_x^0,\; y + MV_y^0), \quad r(x, y) = p(x + MV_x^1,\; y + MV_y^1), \quad s(x, y) = p(x + MV_x^2,\; y + MV_y^2) \qquad (19.6)$$

and, on the right-hand side of Equation 19.6, p denotes a pixel of the reference picture. H_0 is the weighting matrix for prediction with the current block motion vector, H_1 is the weighting matrix for prediction with the top or bottom block motion vector, and H_2 is the weighting matrix for prediction with the left or right block motion vector. This applies to the luminance block only. The values of H_0, H_1, and H_2 are shown in Figure 19.5.

FIGURE 19.5 Weighting matrices for overlapped motion compensation.

It should be noted that the above coding scheme is not optimized in its mode decision, since the decision depends only on the values of the prediction residues. Optimized mode decision techniques that include the above possibilities for prediction have been considered by Wiegand (1996).

19.3.2.4 Syntax-Based Arithmetic Coding

As in other video-coding standards, H.263 uses variable-length coding and variable-length decoding (VLC/VLD) to remove the redundancy in the video data. The basic principle of VLC is to encode a symbol with a specific table based on the syntax of the coder. The symbol is mapped to an entry of the table in a table lookup operation, and the binary codeword specified by the entry is then sent to a bitstream buffer for transmission to the decoder. In the decoder, the inverse operation, VLD, reconstructs the symbol by a table lookup based on the same syntax. The table in the decoder must be the same as the one used in the encoder for encoding the current symbol. To obtain better performance, the tables are generated in a statistically optimized way (such as with a Huffman coder) from a large number of training sequences. This VLC/VLD process implies that each symbol is encoded into an integral number of bits. An optional feature of H.263 is to use arithmetic coding to remove the restriction to an integral number of bits per symbol. This syntax-based arithmetic coding mode may result in bit rate reductions.
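Returning to the encoder example of Section 19.3.2.3, the decisions in steps 1, 2, 4, and 5 and the chrominance vector derivation can be sketched in a few functions. This is a rough illustration under stated assumptions, not the procedure of any particular encoder: all function names are hypothetical, pictures are plain lists of rows, `choose_mode` applies steps 2 and 4 to a single pair of SAD values (the text refines the vectors by a half-pixel search in between), the median predictor omits the special case in which the predictor is set to zero, luminance vector components are assumed to be integers in half-pixel units, and the sign-symmetric treatment of negative chrominance components is an assumption.

```python
from statistics import median

# Table 19.4: 1/16-pixel fraction -> resulting position in half-pixel units.
CHROMA_ROUND = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]


def sad(cur, ref, x0, y0, dx, dy, n):
    """Eq. 19.1: SAD between the n x n block of `cur` at (x0, y0) and the
    block of `ref` displaced by the integer vector (dx, dy)."""
    return sum(
        abs(cur[y0 + j][x0 + i] - ref[y0 + dy + j][x0 + dx + i])
        for j in range(n)
        for i in range(n)
    )


def intra_activity(cur, x0, y0):
    """Eq. 19.4: deviation A of a 16 x 16 macroblock from its own mean."""
    pixels = [cur[y0 + j][x0 + i] for j in range(16) for i in range(16)]
    mb_mean = sum(pixels) / 256
    return sum(abs(p - mb_mean) for p in pixels)


def choose_mode(a, sad16, sad4x8):
    """Steps 2 and 4 applied to one pair of SAD values (a simplification)."""
    sad_inter = min(sad16, sad4x8)            # Eq. 19.3
    if a < sad_inter - 500:                   # step 2
        return "intra"
    return "inter-4v" if sad4x8 < sad16 - 100 else "inter-1v"   # step 4


def mv_difference(mv, mv1, mv2, mv3):
    """Step 5: difference between `mv` and the component-wise median of the
    three candidate predictors (zero-predictor special case not handled)."""
    px = median([mv1[0], mv2[0], mv3[0]])
    py = median([mv1[1], mv2[1], mv3[1]])
    return (mv[0] - px, mv[1] - py)


def chroma_component(luma_components):
    """One chrominance vector component, in half-pixel units, from the four
    luminance components (half-pixel units). Their sum equals the chrominance
    displacement in 1/16-pixel units, which Table 19.4 then rounds."""
    total = sum(luma_components)
    magnitude = abs(total)
    half_pels = 2 * (magnitude // 16) + CHROMA_ROUND[magnitude % 16]
    return half_pels if total >= 0 else -half_pels
```

For instance, four luminance components of 1.5 pixels each (3 in half-pixel units) sum to 12, i.e., 12/16 pixel for chrominance, which Table 19.4 maps to one half-pixel. The thresholds 500 and 100 bias the encoder toward inter coding and toward a single vector, respectively, since intra coding and four vectors both cost additional bits.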
19.3.2.5 PB-Frames

The PB-frame is a new feature of H.263 video coding. A PB-frame consists of two pictures, one P-picture and one B-picture, coded as one unit, as shown in Figure 19.6. Since H.261 does not have B-pictures, the concept of a B-picture comes from the MPEG video-coding standards. In a PB-frame, the P-picture is predicted from the previous decoded I- or P-picture, and the B-picture is bidirectionally predicted both from the previous decoded I- or P-picture and from the P-picture in the PB-frame unit that is currently being decoded.

Several detailed issues have to be addressed at the macroblock level in PB-frame mode:

• If a macroblock in the PB-frame is intracoded, the P-macroblock in the PB-unit is intracoded and the B-macroblock in the PB-unit is intercoded. The motion vector of the intercoded PB-macroblock is used for the B-macroblock only.
• A macroblock in a PB-frame contains 12 blocks for the 4:2:0 format, six (four luminance blocks and two chrominance blocks) from the P-frame and six from the B-frame. The data for the six P-blocks are transmitted first, followed by the data for the six B-blocks.
• Different parts of a B-block in a PB-frame can be predicted with different modes. For pixels where the backward vector points inside the coded P-macroblock, bidirectional prediction is used. For all other pixels, forward prediction is used.

FIGURE 19.6 Prediction in PB-frames mode. (From ITU-T Recommendation H.263, May 1996. With permission.)

[...]

... communications, video storage and retrieval service, multipoint communication, and other visual communication systems. H.26L is currently scheduled for approval in the year 2002.

19.6 SUMMARY

In this chapter, the video-coding standards for low-bit-rate applications are introduced. These standards include H.261, H.263, H.263 version 2, and the versions under development, H.263++ and H.26L. H.261 and H.263 are extensively ...
... one is defined as follows. If two pixels, A and B, are neighboring pixels, with A in block 1 and B in block 2, then the filter is designed as

A1 = (3 * A + B + 2) / 4    (19.9a)
B1 = (A + 3 * B + 2) / 4    (19.9b)

where A1 and B1 are the pixels after filtering and "/" is division with truncation.

19.4.2.10 Alternative Inter-VLC and Modified Quantization

The alternative inter-VLC ...

... VIDEO CODING AND H.26L

H.263++ is the next version of H.263. It considers adding more optional enhancements to H.263 and is the extension of H.263 version 2. It is currently scheduled to be completed late in the year 2000. H.26L, where the L stands for long term, is a project to seek more-efficient video-coding algorithms that will be much better than the current H.261 and H.263 standards ...

19.4 H.263 VIDEO CODING STANDARD VERSION 2

19.4.1 OVERVIEW OF H.263 VERSION 2

The H.263 version 2 (ITU-T, 1998) video-coding standard, also known as H.263+, was approved in January 1998 by the ITU-T. H.263 version 2 includes a number of new optional features based on the H.263 video-coding standard. These new optional features are added to broaden the application range of H.263 and to improve its coding ...

... video-coding standard. The B-pictures are predicted from either or both a previous and subsequent decoded picture in the base layer. In SNR scalability (Figure 19.8), the pictures are first encoded with coarse quantization in the base layer. The differences, or coding error pictures, between a reconstructed picture and its original in the base layer encoder are then encoded in the enhancement layer and sent to ...
... obtain better performance of motion estimation and compensation. The warping is defined by four motion vectors for the corners of the reference picture as shown in Figure 19.13. For the current picture with horizontal size H and vertical size V, four conceptual motion vectors, MVOO, MVOV, MVHO, and MVHV, are defined for the upper-left, lower-left, upper-right, and lower-right corners of the picture, respectively ...

... structured mode, reference picture selection mode, and independent segment decoding mode, are used to meet the needs of mobile video applications. The others provide the functionality of scalability, such as spatial, temporal, and SNR scalability. H.26L is a future standard to meet the requirements of very low bit rate, real-time, low end-to-end delay, and other advanced performance needs.

19.7 EXERCISES

... tool of H.263.
19-2 Compared with MPEG-1 and MPEG-2, which features of H.261 and H.263 are used to improve coding performance at low bit rates? Explain the reasons.
19-3 What is the difference between spatial scalability and reduced resolution update mode in H.263 video coding?
19-4 Conduct a project to compare the results of using deblocking filters in the coding loop and out of the coding loop. Which method ...

... error pictures between up-sampled decoded base layer pictures and their original pictures are encoded in the enhancement layer and sent to the decoder, providing the spatial enhancement pictures. As in MPEG-2, spatial interpolation filters are used for the spatial scalability. There are also two types of pictures in the enhancement layer: EI and EP. If a decoder is able to perform spatial scalability, it ...
... filter operations are performed across 8 × 8 block edges using a set of four pixels in both the horizontal and vertical directions at the block boundaries, as shown in Figure 19.11. In the figure, the filtering process is applied to the edges. The edge pixels, A, B, C, and D, are replaced by A1, B1, C1, and D1 by the following operations:

FIGURE 19.11 Positions of filtered ... (... permission.)