Advanced video coding for reducing complexity in H.264

CHAPTER 1
INTRODUCTION

1.1 Background

Recently, a new international video coding standard known as H.264 [1] has been proposed by the Joint Video Team (JVT), which comprises experts from ISO/IEC’s Moving Picture Experts Group (MPEG) and ITU-T’s Video Coding Experts Group (VCEG). Compared to previous video coding standards, H.264/AVC (Advanced Video Coding) achieves a significantly better peak signal-to-noise ratio (PSNR) and visual quality at the same bit rate [2,3]. This is because a number of new techniques have been adopted. They include directional prediction for intra coded blocks, variable block sizes for inter coded blocks, multi-reference frame motion estimation, an integer transform (an approximation to the DCT), an in-loop filter and advanced entropy coding methods such as context-based adaptive variable length coding (CAVLC) and context-based adaptive binary arithmetic coding (CABAC).

To achieve the highest coding efficiency, H.264/AVC uses the rate distortion optimization (RDO) technique to get the best coding result in terms of maximizing coding quality and minimizing the resulting data bits. The idea of RDO can be briefly explained as follows: the encoder examines all possible modes of a block, such as the different directions in intra spatial prediction, and the different block sizes and multiple reference frames in inter prediction, and chooses the mode with the least RDO cost. This brute-force effort achieves much better performance, but at the expense of very high computational complexity. Even with state-of-the-art hardware technology, real-time video coding using H.264/AVC is still a prohibitive task. Therefore, algorithms that reduce the time complexity of H.264/AVC, while maintaining the coded bit rate and reconstructed video quality, are indispensable for real-time implementation of H.264/AVC.
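The exhaustive mode search that RDO performs can be sketched as follows. This is only an illustration: the text does not spell out the cost function, so the usual Lagrangian form J = D + λR is assumed here, and the class name, mode labels and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateMode:
    """One coding mode tried for a block (all fields are illustrative)."""
    name: str          # hypothetical label, e.g. "intra16x16" or "inter8x8"
    distortion: float  # distortion of the reconstructed block, e.g. SSD
    rate_bits: float   # bits for the residual plus side information


def rd_cost(mode, lagrange_multiplier):
    """Assumed Lagrangian cost J = D + lambda * R."""
    return mode.distortion + lagrange_multiplier * mode.rate_bits


def choose_best_mode(candidates, lagrange_multiplier):
    """Brute force: evaluate every candidate and keep the cheapest one."""
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    return min(candidates, key=lambda mode: rd_cost(mode, lagrange_multiplier))


# Made-up numbers, only to show how the multiplier shifts the decision.
candidates = [
    CandidateMode("intra16x16", distortion=1000.0, rate_bits=50.0),
    CandidateMode("inter16x16", distortion=1200.0, rate_bits=20.0),
    CandidateMode("inter8x8", distortion=900.0, rate_bits=120.0),
]
best_low_lambda = choose_best_mode(candidates, lagrange_multiplier=5.0)
best_high_lambda = choose_best_mode(candidates, lagrange_multiplier=10.0)
```

Every candidate must be fully encoded to obtain its distortion and rate, which is why the cost of this search grows with the number of modes and why pruning the candidate list is attractive.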
Since the early 1990s, video coding technology has evolved continuously, generating international video coding standards such as H.261, H.263, H.263++, MPEG-1, MPEG-2, and MPEG-4 Visual. They have contributed tremendously to the successful commercialization of digital video coding. Like these previous standards, H.264 will continue to provide technical solutions in many targeted application fields such as mobile video communication, digital media production and telemedicine.

1.2 Objectives

Because of the computational complexity of H.264, the thesis aims at developing fast algorithms that can improve the encoding speed of H.264 without much loss of visual quality. In detail, the objectives are:
(1) Develop new fast and efficient intra mode coding methods for H.264: these methods should adaptively select the most probable candidates for directional prediction. The approaches should lower the complexity of existing H.264 coding methods without much sacrifice in visual quality.
(2) Explore a new scheme for inter mode coding in H.264: the scheme should effectively reduce the time spent on selecting among different block sizes in inter coding with minimum sacrifice in visual quality.
(3) Explore new interpolation approaches for H.264: the approaches should be lossless and greatly reduce the time spent on the interpolation process in the encoder.

1.3 Thesis Contributions

This thesis has proposed algorithms that are able to achieve the set objectives. They are contributions not only to academia but to industry as well. To be specific, the contributions are:
(1) A fast mode decision algorithm is presented for intra prediction in H.264 video coding. By making use of the edge direction histogram, the number of mode combinations for luminance and chrominance blocks in a macroblock (MB) that take part in RDO calculation has been reduced significantly from 592 to as low as 132.
This results in a great reduction in the complexity and computation load of the encoder. Experimental results show that the fast algorithm has a negligible loss of PSNR compared to the original scheme.
(2) A fast inter mode decision algorithm is proposed to decide the best mode in inter coding of H.264. It makes use of the spatial homogeneity and the temporal stationarity characteristics of the textures of video objects. Specifically, the homogeneity decision of a block is based on edge information inside the block, and the difference between co-located MBs is used to decide whether the MB is temporally stationary. Based on the homogeneity and stationarity of the video objects, only a small number of inter modes are used in RDO. The experimental results show that the fast algorithm is able to reduce encoding time by 30% on average, with negligible PSNR loss.
(3) Two fast intra 4x4 mode elimination approaches are put forward for H.264. The lossless approach checks the cost after each 4x4 block intra mode decision, and terminates if the cost is higher than the minimum cost of inter mode coding. The lossy approach, which uses some low cost preprocessing to make a prediction, terminates if the cost is higher than some fraction of the minimum cost of inter mode. Experimental results show that the lossless approach can reduce the encoding time without any sacrifice of visual quality. The lossy approach can further reduce encoding time with negligible PSNR loss or bit rate increment.
(4) Two adaptive interpolation methods are also presented that significantly reduce the interpolation operations required in H.264 video coding. By making use of a flag matrix data structure and interpolation on demand, the proposed methods are able to increase encoder speed greatly without any PSNR loss or bit rate increase.

1.4 Organization of the Thesis

The rest of this thesis is structured as follows. In Chapter 2, a brief introduction to H.264 is given and some existing H.264 fast encoding methods are reviewed.
In Chapter 3, a fast intra mode decision method is proposed. A fast inter mode decision approach is given in Chapter 4. Intra 4x4 mode elimination approaches are presented in Chapter 5. Adaptive interpolation methods are described in Chapter 6. In Chapter 7, the contributions of this thesis are summarized and future work is outlined.

CHAPTER 2
H.264 AND LITERATURE SURVEY

In this chapter, an overview of the H.264 standard is presented. Furthermore, some important aspects of the standard are briefly introduced. They include the architectures of the encoder and the decoder, inter mode decision, intra mode decision and motion estimation. A literature survey is presented in the later part of the chapter.

2.1 H.264

The well-known international standards on video coding such as H.261, H.263, MPEG-1, MPEG-2 and MPEG-4 Visual have been developed over the past two decades. A few years ago, the ITU-T Video Coding Experts Group (VCEG) began a long-term effort to develop a new standard for low bit rate video coding and communication applications. This long-term effort resulted in the standardization of H.26L, which demonstrates much higher compression efficiency than existing ITU-T standards such as H.261 and H.263. In 2001, ISO/MPEG joined ITU-T/VCEG for further development of H.26L. A joint team called the Joint Video Team (JVT) was formed, whose main goal was to improve the draft H.26L model into a final, complete international standard. This new standard is known as Advanced Video Coding (AVC), which is also called MPEG-4 Part 10. It also has an ITU-T document number, H.264. In 2003, H.264 was ratified as a recommendation by ITU-T and in the same year, AVC was accepted as an international standard by ISO. H.264 and AVC are used interchangeably in this thesis. H.264 was proposed under the MPEG requirement for advanced video coding tools.
Compared with MPEG-4 Visual, it has a narrower scope and mainly targets more efficient and robust coding and transmission of video frames rather than segmenting different objects inside the frames. Its original aim was to provide functionality similar to that of existing video coding standards such as H.263 and the Simple Profile of MPEG-4 Visual, but with significantly better compression efficiency and more robust and reliable transport over transmission channels. It targets applications including duplex video communication (also known as video conferencing or video telephony), digital television broadcasting, digital video streaming, telemedicine, digital video storage and digital cinema. Support for robust transmission over various network architectures is built in. In addition, the standard is designed to facilitate implementations on a wide range of processor platforms such as Intel, AMD and Sun Solaris. One aspect in which the standard differs from other existing video coding standards is its attempt to make interoperation among different developers easy and to avoid misinterpretation [4, 5].

The elements common to all video coding standards are present in the current H.264 recommendation. Specifically, macroblocks are 16x16 in size. Luminance is represented with higher resolution than chrominance with 4:2:0 sub-sampling. Motion compensation and block transforms are followed by scalar quantization and entropy coding. Motion vectors are predicted from the median of the motion vectors of neighboring blocks. Bi-directional B-pictures are supported that may be motion compensated from both temporally previous and subsequent pictures. A direct mode exists for B-pictures in which both forward and backward motion vectors are derived from the motion vector of a co-located macroblock in a reference picture.
In addition, H.264 has many advantages that distinguish it from existing video coding standards, while sharing common features with them. Some of the key advantages of H.264 are:
(1) Up to 50% in bit rate savings compared to MPEG-4 Visual;
(2) Much better visual quality and PSNR value;
(3) Better error resilience technology;
(4) Network adaptation friendliness.

Experimental results [6, 7, 8, 9] have demonstrated that H.264 achieves substantially better video quality than H.261, H.263, MPEG-2, and MPEG-4 Visual. The JVT reference model software is able to achieve up to 50% in bit rate saving compared with the existing H.263 or MPEG-4 Visual codec. In other words, H.264 provides significantly better visual quality at the same bit rate. In addition to new coding features such as inter and intra prediction, H.264 utilizes some error resilience techniques to cope with different channel environments. These characteristics make H.264 an ideal codec for applications with very limited channel capacity, limited storage and extremely error prone channels, such as mobile communication and video telephony. Because of its high compression ratio, the H.264 codec can be used to generate high quality video at lower bit rates. It is expected that H.264 will gradually replace previous video coding standards such as H.263 and MPEG-4 Visual. Therefore, there is no doubt that H.264 will be a strong competitor in the deployment of next generation multimedia applications. Besides, one important feature of H.264 is that it is an open standard. This will bring the codec price down, making the technology affordable to everyone. Furthermore, the bit stream format of H.264 is non-proprietary, which is crucial in today’s multimedia application environment.
2.2 H.264 Encoder

[Figure 2.1 H.264 encoder architecture. Forward path: Fn (current), prediction P from predictive coding (intra / inter) using F'n-1 (reference), residual Dn, integer transform, quantization, X, reorder, entropy encode, NAL. Reconstruction path: inverse quantization, inverse transform, D'n, added to P to give uF'n, in-loop filter, F'n (reconstructed).]

Figure 2.1 illustrates the architecture of the H.264 encoder. The current frame, denoted by Fn, is one frame of the original video sequence input to the encoder. The frame is divided into macroblocks, each 16 by 16 pixels in size. These macroblocks are encoded one by one in raster scan order until the whole frame has been processed. Predictive coding is applied to each macroblock, in the sense that it is encoded in either intra mode or inter mode. A prediction block P of the input source macroblock is generated and subtracted from the input macroblock to form the residue, Dn. An integer transform (an approximation to the Discrete Cosine Transform) is performed on Dn and the transform coefficients are quantized. It should be noted that the transform itself does not compress information, whereas quantization compresses data in a lossy manner by discarding irrelevant information. The quantized transform coefficients are re-ordered and entropy coded. Entropy encoding compresses information losslessly by exploiting information redundancy. The entropy-coded coefficients, together with the side information required for decoding the macroblock (such as the quantizer step size, the prediction direction in intra mode or the block size decision in inter mode, motion vectors and the reference frame number), form the compressed bit stream. The bit stream is passed to a Network Abstraction Layer (NAL) for transmission or storage. In addition, there exists a reconstruction path inside the encoder, which essentially serves as the decoder.
The purpose is to reconstruct the encoded block for future predictive coding of other macroblocks that will reference it. In detail, the quantized coefficients are inversely quantized and inversely transformed, resulting in a decoded prediction residual, D'n. D'n is added to the prediction, and the resultant frame is fed into an in-loop deblocking filter to generate the final reconstructed frame.

2.3 H.264 Decoder

[Figure 2.2 H.264 decoder architecture. Path: NAL, entropy decode, reorder, X, inverse quantization, inverse transform, D'n, added to the prediction P obtained by predictive coding from F'n-1 (reference) to give uF'n, filter, F'n (reconstructed).]

for (B1 to B4) {
    if (corresponding element in IS is 0) {   // has not been interpolated
        interpolate;
        set element to 1;
    } else {                                  // this macroblock has been interpolated before
        continue;                             // do nothing
    }
}

[Figure 6.6 Interpolated frame memory reorganization. Panels: (a) Integer pixel and its corresponding half and quarter pixels; (b) Integer pixel and its corresponding half pixels; (c) Integer pixel and its corresponding half pixels included in different planes.]

6.4.2 Approach Two

The first approach proposed improves the interpolation method of the H.264 reference software. A flag matrix for the macroblocks in the reference frame is initialized with 0. Once a macroblock has been referenced, interpolation is done and the values are stored before the flag corresponding to this macroblock is set to 1. Later interpolation operations on this macroblock are not necessary since the values have all been computed and stored. The scheme is for both quarter pixels and half pixels.
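The flag-matrix idea of the first approach can be sketched as below. This is a minimal illustration rather than the thesis implementation: `OnDemandInterpolator`, `covered_macroblocks` and the `interpolate_mb` callback are hypothetical names, and the reference block is assumed to lie fully inside the (padded) frame with non-negative coordinates.

```python
MB_SIZE = 16  # macroblock width and height in pixels


def covered_macroblocks(x, y, width, height):
    """Return (row, col) indices of every macroblock a block overlaps."""
    first_col, last_col = x // MB_SIZE, (x + width - 1) // MB_SIZE
    first_row, last_row = y // MB_SIZE, (y + height - 1) // MB_SIZE
    return [(row, col)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


class OnDemandInterpolator:
    """Interpolates each reference macroblock at most once."""

    def __init__(self, mb_rows, mb_cols, interpolate_mb):
        # Flag matrix: 0 = not yet interpolated, 1 = values already stored.
        self.flags = [[0] * mb_cols for _ in range(mb_rows)]
        self.interpolate_mb = interpolate_mb  # hypothetical callback
        self.stored = {}                      # (row, col) -> stored values

    def prepare(self, x, y, width, height):
        """Interpolate only unflagged macroblocks under a reference block.

        Returns how many macroblocks were newly interpolated.
        """
        newly_done = 0
        for row, col in covered_macroblocks(x, y, width, height):
            if self.flags[row][col] == 0:
                self.stored[(row, col)] = self.interpolate_mb(row, col)
                self.flags[row][col] = 1
                newly_done += 1
        return newly_done
```

A 16x16 reference block at pixel position (8, 8) overlaps four macroblocks, so the first call interpolates four of them and a repeated call interpolates none. Macroblocks that are never referenced are never interpolated.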
However, analysis shows that, compared with half pixel interpolation, which is a six-tap filtering process, quarter pixel interpolation is just linear interpolation and thus requires considerably less computation. This prompts the idea of doing quarter pixel interpolation on the fly. In addition, this change saves a lot of memory space, which greatly increases the hit rate of the cache located inside the processor unit. The modified approach also has several other important changes, which are described later.

[Figure 6.7 Flag matrix for a Quarter CIF frame. Axes: MBx and MBy; a reference block may overlap the UL, UR, DL and DR macroblocks.]

1) Generate the loop-filtered integer pixels of the previous frame and store them in field 0.
2) Form the flag matrix as N MBs of size 16x16 from left to right and top to bottom of the frame, as MBi, i=0,…,N-1. Initialize each element to ‘0’.
3) Perform integer ME on field 0 of the reference frame.
4) For the current block (whose size may be 16x16, 16x8, 8x16, 8x8, 8x4, 4x8 or 4x4) with the motion vector available, find its reference block in the reference frame. The reference block might fall into one MB, or it may overlap with two or four MBs.
5) Interpolate half pixels:
   // check the up-left, up-right, down-left and down-right MBs
   for (MB_UL to MB_DR) {
       if (corresponding flag is 0) {
           interpolate and store the values in the fields;
           set flag to 1;
       } else {
           continue;   // nothing to do since already interpolated
       }
   }
6) Interpolate quarter pixels on the fly.

Figure 6.8 Flow chart of the second approach

The whole approach is shown as a flow chart in Figure 6.8. Note that the blocks surrounded by dotted lines are parts of the new or modified approach, whereas the rest are the same as those in the previous approach. Rather than allocating 16 times the memory for each integer pixel and its associated to-be-interpolated sub pixels as shown in Figure 6.6(a), only four times the memory is allocated for each pixel and its three associated to-be-interpolated half pixels. The four pixels are put into four planes of one frame, as shown in Figure 6.6(c).
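The difference in cost between the two interpolation types can be made concrete with a small sketch. It assumes the standard H.264 luma definitions, which the text refers to but does not write out: a half pixel is a six-tap filter with taps (1, -5, 20, 20, -5, 1) followed by rounding and clipping, and a quarter pixel is the rounded average of its two nearest integer or half pixel samples. The function names are illustrative.

```python
def clip_pixel(value):
    """Clamp a filtered value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pixel(e, f, g, h, i, j):
    """Six-tap filter over six neighbouring samples in one row or column.

    The half pixel lies between g and h; 16 is added for rounding before
    the division by 32 (right shift by 5).
    """
    return clip_pixel((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pixel(sample_a, sample_b):
    """Linear interpolation: rounded average of the two nearest samples."""
    return (sample_a + sample_b + 1) >> 1


flat_region = half_pixel(100, 100, 100, 100, 100, 100)
overshoot = half_pixel(0, 0, 255, 255, 0, 0)
between = quarter_pixel(100, 103)
```

The quarter pixel needs one addition and one shift per sample, which is why recomputing it on the fly is cheap, while the six-tap result is the one worth storing.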
Thus, the first step in the flow chart is to generate the filtered integer pixels and store them into Plane 0 of the frame memory. The second step remains unchanged: a flag matrix for the macroblocks in the reference frame is initialized. Integer motion estimation is done in the third step. Since the integer pixels are adjacent to each other in field 0, the approach facilitates fast calculation of the Sum of Absolute Differences (SAD). The fourth step is the same as before and a detailed illustration is shown in Figure 6.7, where one block inside one MB may fall into four MBs (UL, UR, DL, DR) of the reference frame or just two MBs (UR, DR) of the reference frame. It should be noted that, due to padding of one MB at the borders of the QCIF reference frame, the total number of MBs is calculated as (11+2)*(9+2), that is, 143 macroblocks instead of 99. Step 5 does half pixel interpolation and puts the values into Field 1, Field 2 and Field 3 of the reference frame. In addition, for half pixels located at diagonal positions between integer pixels, horizontal operations are chosen since this also increases the hit rate in the processor cache. For example, in Figure 6.1, half sample j is calculated as the weighted sum of half pixel samples cc, dd, h, m, ee and ff rather than that of aa, bb, b, s, gg and hh. The last step performs quarter pixel interpolation on the fly, unlike the previous approach.

6.5 Experimental Results

The two proposed approaches are compared with the JM software [53]. The test conditions are as follows:
1) MV search range is ±32 pixels;
2) Number of reference frames is 1;
3) MV resolution is ¼ pixel;
4) GOP structure is IPPP;
5) Number of frames in a sequence is 150.

A group of experiments was carried out on the test sequences with the quantization parameters QP = 16 and 32. They are all CIF sequences, ranging from low-motion sequences such as ‘Akiyo’ to sequences with much higher motion such as ‘Mobile’ and ‘Stefan’.
Extensive experimental results on these typical video sequences show that, compared to the JM software, the proposed methods are able to increase encoder speed by 10% to 94%. Results are given in Tables 6.1 and 6.2, in which A. I represents the first approach and A. II represents the second approach. Furthermore, the experimental results show that adoption of the new algorithms does not degrade image quality (PSNR or bit rate) at all.

6.6 Summary

In this chapter, two advanced and adaptive interpolation techniques are presented that significantly reduce the interpolation operations required in video coding. The proposed methods are able to increase encoder speed by 10% to 94% compared with the previous work without any PSNR loss or bit rate increase.

Table 6.1 Speed increase at QP = 16

Seq.            JM      A. I    A. II   A. II (on JM)   A. II (on A. I)
                fps     fps     fps     %               %
Akiyo           31.33   49.61   60.96   94.57           22.88
Bus             11.68   11.92   13.39   14.64           12.33
Foreman         13.37   14.11   15.94   19.22           12.97
Mobile          10.25   10.30   11.43   11.51           10.97
Mom& daughter   24.06   29.73   36.51   51.75           22.81
Paris           17.58   19.96   23.44   33.33           17.43
Stefan          11.28   11.63   12.88   14.18           10.75
Tempete         10.57   10.66   11.96   13.15           12.20
Waterfall       11.93   12.00   13.24   10.98           10.33

Table 6.2 Speed increase at QP = 32

Seq.            JM      A. I    A. II   A. II (on JM)   A. II (on A. I)
                fps     fps     fps     %               %
Akiyo           16.35   18.39   20.34   24.40           10.60
Bus             11.24   11.39   13.03   15.93           14.40
Foreman         11.68   13.60   15.70   34.42           15.44
Mobile          11.91   12.02   13.72   15.20           14.14
Mom& daughter   13.78   14.22   16.54   20.03           16.32
Paris           12.66   12.92   14.82   17.06           14.71
Stefan          11.99   12.06   14.06   17.26           16.58
Tempete         11.65   11.72   13.37   14.76           14.08
Waterfall       11.93   12.52   14.53   21.79           16.05

CHAPTER 7
CONCLUSIONS AND FUTURE WORK

7.1 Conclusions

The objective of the research in this thesis is to propose and investigate new methods that can reduce H.264 encoding complexity with little or no quality degradation. The brute force computational approach adopted in the H.264 standard has been addressed.
New approaches addressing these problems have been proposed. They are summarized as follows.

A fast intra mode decision algorithm for H.264 is proposed. It makes use of the edge direction histogram. The number of mode combinations for luminance and chrominance blocks in a MB that take part in RDO calculation has been reduced significantly. This results in a great reduction in the computation load of the encoder. Experimental results show that the fast algorithm has a negligible loss of PSNR compared to the original scheme.

A fast inter mode decision algorithm is put forward to decide the best mode in inter coding of H.264. It makes use of the spatial homogeneity and the temporal stationarity characteristics of the texture of video objects. Specifically, the homogeneity decision of a block is based on edge information inside the block, and the difference between co-located MBs is used to decide whether the MB is temporally stationary. Based on the homogeneity and stationarity of the video objects, only a small number of inter modes are used in RDO. The experimental results show that the fast algorithm is able to reduce encoding time greatly, with only negligible PSNR loss.

Two fast intra 4x4 mode elimination approaches are also proposed for H.264. The lossless approach checks the cost after each 4x4 block intra mode decision, and terminates if the cost is higher than the cost of inter mode. The lossy approach, which uses some low cost preprocessing to make a prediction, terminates if the cost is higher than some fraction of the minimum cost of inter mode. Experimental results show that the lossless approach can reduce the encoding time without any sacrifice of visual quality. The lossy approach can further reduce encoding time with negligible PSNR loss or bit rate increment.

Two adaptive interpolation techniques are also presented that significantly reduce the interpolation operations required in video coding.
By making use of a flag matrix data structure and an interpolation on-demand philosophy, the proposed methods are able to increase encoder speed greatly without any PSNR loss or bit rate increase.

7.2 Future Work

Although the proposed fast encoding algorithms are effective in reducing the H.264 encoder complexity, some areas may be further explored and addressed to improve the performance of the encoder. Two areas deserving exploration are listed as follows.

7.2.1 Fast SAD Method

Although the exhaustive full search method can obtain the minimum SAD value during the motion estimation process, the time complexity of the approach is formidable. Thus, some suboptimal methods have been proposed in the literature, such as fast motion estimation (FME) algorithms [10, 11, 56]. Instead of evaluating every search point in the search window, they perform motion estimation using only a subset of the candidate search points. Bailo et al. [14] proposed a method to determine the search window size based on the result of motion detection pre-processing. Tham et al. [56] proposed a novel fast diamond search configuration. Recently, Ates et al. [15] proposed to reuse a limited set of SAD values to approximate the SAD values of different block sizes. An inevitable consequence of this approximation is a quality drop in terms of PSNR value and/or a bit rate increase. For instance, the bit rate increment can be as high as 3.9% for the “Mobile” sequence. Thus, an efficient SAD map approach can be proposed to record the computed SAD values dynamically so that they can be reused for different block sizes. It will be a lossless approach since the SAD values of the different block sizes are either accurately calculated or fetched from pre-stored SAD maps. Due to its generic nature, this method can be applied in any fast motion estimation method, although it is especially effective in the full search motion estimation method. Experiments need to be done to explore the effectiveness of the method.
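One possible realization of the SAD map idea is sketched below. The thesis only proposes the idea, so the details are assumptions: the map stores the SAD of each 4x4 sub-block of a 16x16 macroblock for one candidate displacement, and the SAD of any larger partition at that displacement is obtained by summing map entries. Because SAD is a plain sum over pixels, the reused values are exact. All names are hypothetical.

```python
# Partition sizes (width, height) used in H.264 inter prediction.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def build_sad_map(current, reference):
    """Return a 4x4 grid holding the SAD of each 4x4 sub-block.

    `current` and `reference` are 16x16 lists of pixel rows, the reference
    block already displaced by the candidate motion vector.
    """
    sad_map = [[0] * 4 for _ in range(4)]
    for y in range(16):
        for x in range(16):
            sad_map[y // 4][x // 4] += abs(current[y][x] - reference[y][x])
    return sad_map


def partition_sad(sad_map, x, y, width, height):
    """SAD of a partition, fetched by summing stored 4x4 SAD values.

    x, y, width and height are in pixels and must be multiples of 4.
    """
    return sum(sad_map[row][col]
               for row in range(y // 4, (y + height) // 4)
               for col in range(x // 4, (x + width) // 4))


def direct_sad(current, reference, x, y, width, height):
    """Reference computation straight from the pixels, for comparison."""
    return sum(abs(current[row][col] - reference[row][col])
               for row in range(y, y + height)
               for col in range(x, x + width))
```

With the map in place, the pixel differences of a macroblock are computed once per displacement instead of once per block size. Whether this pays off in a real encoder depends on memory traffic and on how often different block sizes test the same displacement, which is what the proposed experiments would have to establish.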
7.2.2 Reordering Motion Estimation Steps for Different Block Sizes

Currently, the H.264 encoder does motion estimation for different block sizes ranging from 16x16 to 4x4 in such a way that 16x16 ME is performed first, in the following steps:
1) 16x16 block full pixel ME is performed;
2) 16x16 block half pixel ME is performed;
3) 16x16 block quarter pixel ME is performed.
Then 16x8 ME is done in the following steps:
1) 16x8 block full pixel ME is performed;
2) 16x8 block half pixel ME is performed;
3) 16x8 block quarter pixel ME is performed.
The 8x16, 8x8 … 8x4 block sizes are handled in a similar manner. Repetition is observed inside this scheme, and it is proposed that the integer (full) pixel ME be performed first for all the block sizes. From the various block sizes, the one with the minimum cost is chosen, and half pixel and quarter pixel ME are performed only on this block size. Thus the intensive computation of half and quarter pixel ME for the other block sizes is avoided. However, experiments need to be done to evaluate the performance of this approach.

REFERENCES

[1] ISO/IEC JTC1, Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding, ISO/IEC FDIS 14496-10, 2003.
[2] T. Wiegand, G.J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
[3] T. Oelbaum, V. Baroncini, T. K. Tan, and C. Fenimore, “Subjective quality assessment of the emerging AVC/H.264 video coding standard,” International Broadcasting Conference (IBC), September, 2004.
[4] I.E.G. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia, Chichester: John Wiley & Sons Ltd, 2003.
[5] I.E.G. Richardson, “H.264 / MPEG-4 Part 10 White Paper,” http://www.vcodex.com, 2002.
[6] T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G.J. Sullivan, “Rate-Constrained Coder Control and Comparison of Video Coding Standards,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 688-703, July 2003.
[7] S. Saponara, C. Blanch, K. Denolf, and J. Bormans, “The JVT Advanced Video Coding Standard: Complexity and Performance Analysis on a Tool-by-Tool Basis,” Proceedings of the 13th International Packetvideo Workshop, Nantes, France, Apr. 2003.
[8] A. Luthra and P. Topiwala, “Overview of the H.264/AVC Video Coding Standard,” Motorola Internal Report, 2003.
[9] A. Joch, F. Kossentini, and P. Nasiopoulos, “A Performance Analysis of the ITU-T Draft H.26L Video Coding Standard,” Proceedings of the 12th International Packetvideo Workshop, Pittsburgh, USA, Apr. 2002.
[10] X. Li, G. Wu, “Fast Integer Pixel Motion Estimation,” JVT-F011, 6th Meeting, Awaji Island, Japan, December 5-13, 2002.
[11] Z. Chen, P. Zhou and Y. He, “Fast Integer Pel and Fractional Pel Motion Estimation for JVT,” JVT-F017, 6th JVT Meeting, Awaji Island, Japan, December 5-13, 2002.
[12] H. Y. Cheong, A. M. Tourapis, and P. Topiwala, “Fast Motion Estimation within the JVT codec,” JVT-E023, 5th JVT Meeting, Geneva, Switzerland, October 9-17, 2002.
[13] C. W. Ting, L. M. Po and C. H. Cheung, “Center-Biased Frame Selection Algorithms For Fast Multi-Frame Motion Estimation In H.264,” IEEE International Conference on Neural Networks & Signal Processing, Nanjing, China, December 14-17, 2003, pp. 1258-1261.
[14] G. Bailo, M. Bariani, I. Barbieri and M. Raggio, “Search Window Size Decision for Motion Estimation Algorithm in H.264 Video Coder,” 2004 International Conference on Image Processing, October 24-27, 2004, Vol. 3, pp. 1453-1456.
[15] H. Ates, Y. Altunbasak, “SAD Reuse In Hierarchical Motion Estimation For the H.264 Encoder,” 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, USA, March 18-23, 2005.
[16] G. Y. Kim, Y. H. Moon and J. H.
Kim, “An Early Detection of All-Zero DCT Blocks in H.264,” 2004 International Conference on Image Processing, 24-27 October, Vol. 1, pp. 453-456. [17] Y. H. Kim, J. W. Yoo, S. W. Lee, J. Shin, J. Paik, and H. K. Jung, “Adaptive Mode Decision for H.264 Encoder,” Electronics Letters, Vol. 40, No. 19, pp. 1172-1173, September, 2004. [18] Y. H. Kim, J. W. Yoo, S. W. Lee, J. Paik, and B. Choi, “Optimization of H.264 Encoder Using Adaptive Mode Decision and SIMD Instructions,” Proceedings of 2005 IEEE International Conference on Consumer Electronics, Las Vegas, USA, January, 2005, pp. 289-290. [19] I. Choi, J. Lee, W. I. Choi, and B. Jeon, “Performance Evaluation of Fast Mode Decision Algorithms for H.264,” JVT-M013, 13th JVT Meeting, Palma de Mallorca, Spain, October, 2004. [20] I. Choi, J. Lee, and B. Jeon, “Efficient Coding Mode Decision in MPEG-4 Part10 AVC/H.264 Main Profile,” Proceedings of 2004 IEEE International Conference on Image Processing, Singapore, October, 2004, pp. 1141-1144. [21] I. Choi, W. I. Choi, J. Lee, and B. Jeon, “The Fast Mode Decision for High Profile,” JVT-N012, 14th JVT Meeting, Hong Kong, China, January, 2005. [22] I. Choi, W. I. Choi, J. Lee, and B. Jeon, “The Fast Mode Decision with Fast Mode Estimation,” JVT-N013, 14th JVT Meeting, Hong Kong, China, January, 2005. [23] P. Yin, H.Y. Cheong, A. M. Tourapis and J. Boyce, “Fast Mode Decision and Motion Estimation for JVT/H.264,” Proceedings of the IEEE Conference on Image Processing (ICIP’03), 14-17 September, 2003, Vol. 3, pp. 853-856. [24] A. Ahmad, N. Khan, S. Masud, and M. A. Maud, “Selection of Variable Block Sizes in H.264,” Proceedings of 2004 IEEE International Conference on 98 Acoustics, Speech and Signal Processing, Montreal, Canada, May 2004, Vol. III, pp.173-176. [25] E. Arsura, L. Del Vecchio, R. Lancini, and L. 
Nisti, “Fast Macroblock Intra and Inter Modes Selection for H.264/AVC,” Proceedings of 2005 IEEE International Conference on Multimedia and Expo, Amsterdam, Netherlands, July 2005. [26] B. Jeon and J. Lee, “Fast Mode Decision for H.264,” JVT-J033, 10th JVT Meeting, Hawaii, United States, December 2003. [27] J. Lee and B. Jeon, “Fast Mode Decision for H.264 with Variable Motion Block Sizes,” Proceedings of International Symposium on Computer and Information Sciences (ISCIS) 2003, 3-5 November, 2003, pp. 723-730. [28] Q. H. Dai, D. D. Zhu, and R. Ding, “Fast Mode Decision for Inter Prediction in H.264,” Proceedings of 2004 IEEE International Conference on Image Processing, Singapore, October, 2004, pp. 119-122. [29] X. Jing and L. P. Chau, “An Efficient Inter Mode Decision Approach for H.264 Video Coding,” Proceedings of 2004 IEEE International Conference on Multimedia and Expo, Taipei, China, June 2004. [30] X. A. Lu, A. M. Tourapis, P. Yin, and J. Boyce, “Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG-2/H.264 Transcoding,” Proceedings of 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005, pp. 1246-1249. [31] Y. L. Liu, K. Tang, and H. J. Cui, “Efficient Probability Based Macroblock Mode Selection in H.264/AVC,” Proceedings of SPIE Visual Communications and Image Processing 2005, Beijing, China, July 2005, Vol. 5960, pp. 10801088. [32] C. Grecos and M. Y. Yang, “Fast Inter Mode Prediction for P Slices in the H.264 Video Coding Standard,” IEEE Transactions on Broadcasting, Vol. 51, No. 2, pp. 256-263, June 2005. [33] J. Lee and B. Jeon, “Fast Mode Decision for H.264,” Proceedings of 2004 IEEE International Conference on Multimedia and Expo, Taipei, China, June 2004. [34] J. Lee and B. Jeon, “Pruned Mode Decision Based on Variable Block Sizes Motion Compensation for H.264,” Proceedings of the First International Workshop on Multimedia Interactive Protocols and Systems, Napoli, Italy, November, 2003, pp. 410-418. 
[35] A. Chang, P. H. W. Wong, Y. M. Yeung, and O. C. Au, “Fast Multi-Block Selection for H.264 Video Coding,” Proceedings of 2004 IEEE International Symposium on Circuits and Systems, Vancouver, Canada, May 2004, pp. 817-820.
[36] C. C. Cheng and T. S. Chang, “Fast Three Step Intra Prediction Algorithm for 4x4 Blocks in H.264,” Proceedings of 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005, pp. 1509-1512.
[37] W. I. Choi, J. Lee, S. Yang, and B. Jeon, “Fast Motion Estimation and Mode Decision with Variable Motion Block Sizes,” Proceedings of 2003 SPIE Conference on Visual Communication and Image Processing, Lugano, Switzerland, July 2003.
[38] Y. L. Lai, Y. Y. Tseng, C. W. Lin, Z. Zhou, and M. T. Sun, “H.264 Encoder Speed-up via Joint Algorithm/Code-Level Optimization,” Proceedings of SPIE Visual Communications and Image Processing 2005, Beijing, China, July 2005, Vol. 5960, pp. 1089-1100.
[39] H. J. Kim and Y. Altunbasak, “Low-Complexity Macroblock Mode Selection for H.264/AVC Encoders,” Proceedings of 2004 IEEE International Conference on Image Processing, Singapore, October 2004, pp. 765-768.
[40] R. Garg, M. Jindal, and M. Chauhan, “Statistics Based Fast Intra Mode Detection,” Proceedings of SPIE Visual Communications and Image Processing 2005, Beijing, China, July 2005, Vol. 5960, pp. 2085-2091.
[41] K. H. Han and Y. L. Lee, “Fast Macroblock Mode Decision in H.264,” Proceedings of 2004 IEEE Region 10 Conference, Chiang Mai, Thailand, November 2004, Vol. A, pp. 347-350.
[42] C. S. Kim, H. H. Shih, and C. C. J. Kuo, “Feature-Based Intra-Prediction Mode Decision for H.264,” Proceedings of 2004 IEEE International Conference on Image Processing, Singapore, October 2004, pp. 769-772.
[43] C. S. Kim, Q. Li, and C. C. J. Kuo, “Fast Intra-Prediction Model Selection for H.264 Codec,” Proceedings of the SPIE, November 2003, Vol. 5241, pp. 99-110.
[44] C. S. Kim, H. H. Shih, and C. C. J.
Kuo, “Fast H.264 Intra-Prediction Mode Selection Using Joint Spatial and Transform Domain Features,” Journal of Visual Communication & Image Representation, Vol. 17, pp. 291-310, 2006.
[45] C. S. Kim and C. C. J. Kuo, “A Feature-Based Approach to Fast H.264 Intra/Inter Mode Decision,” Proceedings of 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005, pp. 308-311.
[46] A. M. Bazen and S. H. Gerez, “Systematic Methods for the Computation of the Directional Fields and Singular Points of Fingerprints,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, July 2002.
[47] R. C. Gonzalez and R. E. Woods, “Digital Image Processing,” Prentice Hall, 2002.
[48] G. Sullivan, “Recommended Simulation Common Conditions for H.26L Coding Efficiency Experiments on Low Resolution Progressive Scan Source Material,” VCEG-N81, 14th Meeting, Santa Barbara, CA, USA, September 24-27, 2001.
[49] JVT Test Model Ad Hoc Group, “Evaluation Sheet for Motion Estimation,” Draft version 4, February 19, 2003.
[50] G. Bjontegaard, “Calculation of Average PSNR Differences between RD-curves,” VCEG-M33, 13th Meeting, Austin, Texas, USA, April 2-4, 2001.
[51] T. Uchiyama, N. Mukawa, and H. Kaneko, “Estimation of Homogeneous Regions for Segmentation of Textured Images,” IEEE Proceedings in Pattern Recognition, 2000, pp. 1072-1075.
[52] X. W. Liu, D. L. Liang, and A. Srivastava, “Image Segmentation Using Local Spectral Histograms,” IEEE International Conference on Image Processing 2001, pp. 70-73.
[53] JVT reference software for H.264, ftp://ftp.imtc-files.org.
[54] P. I. Hosur and K. K. Ma, “Motion Vector Field Adaptive Fast Motion Estimation,” Second International Conference on Information, Communications and Signal Processing (ICICS ’99), Singapore, December 1999.
[55] F. Kelly and A. Kokaram, “Fast Image Interpolation for Motion Estimation Using Graphics Hardware,” Real Time Imaging VIII, SPIE Vol. 5297, January 2004.
[56] J. Y. Tham, S. Ranganath, M.
Ranganath, and A. K. Kassim, “A Novel Unrestricted Center-Biased Diamond Search for Block Motion Estimation,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No. 4, pp. 369-377, August 1998.

... techniques. One is intra coding, which makes predictions using information inside the same frame. The other is inter coding, which predicts using information from other frames. Both approaches attempt to find the best prediction for each input macroblock, thus leading to the best coding gain. The predictions are generally of high complexity. The details of the operations of these two modes ...

... shown in the figure, there is a 9th mode, i.e., the DC prediction mode, or Mode 2 in H.264. H.264 video coding is based on the concept of rate distortion optimization, which means that the encoder has to encode the intra block using all the mode combinations and choose the one that gives the best RDO performance. According to the structure of intra prediction in H.264, the number of mode combinations for ...

... decoder, inter and intra mode decision scheme, motion estimation. The relevant research efforts in the literature on reducing encoder complexity have been summarized and evaluated. Unlike these, this thesis provides novel directions and approaches for reducing the H.264 encoding complexity, and they successfully achieve the set targets. The approaches will be described and analyzed in detail in the subsequent ...

... in Figures 2.3 and 2.4, each prediction block is obtained by extrapolating or interpolating the pixels in various specific patterns. The prediction mode that minimizes the residue error between the original input macroblock and its prediction block will be chosen as the final coding mode by the encoder. In general, intra coding is necessary for intra-coded frames (I frames) since the blocks inside ...

... shapes as the prediction for the following block shapes within a macroblock. In the final search stage, seven adaptive preferential search ranges will be used for the seven block shapes. These algorithms achieve significant time savings with negligible loss of coding efficiency. In the proposal [11], a hierarchical FME algorithm consisting of four main steps is proposed. Firstly, the prediction of initial ...

... cost) for intra mode is omitted if the minimum Rdcost at one inter mode is below a threshold. Due to the rough relationship among Rdcosts under high complexity mode, the results show a relatively high bit rate increase. The method proposed in [38] only needs to analyze a subset of the seven modes by using spatiotemporal predictions from neighboring blocks. The coding modes of five neighboring blocks: the ...

... matching within some search window will be conducted in order to find the best prediction for the current input source block. The search will be done with reference to the reconstructed picture, which is commonly defined as the reference frame. For each block, the search will generally result in a motion vector pointing to the location where the best prediction block would be obtained in the corresponding ...

... subsequent chapters.
CHAPTER 3 FAST INTRA MODE DECISION FOR H.264
In this chapter, a fast mode decision algorithm is presented for H.264 intra prediction based on local edge information to reduce the amount of calculation in intra prediction. This method is based on the observation that the pixels along the direction of a local edge are normally of similar values (this is true for both luminance and chrominance ...

... entropy coding. During this procedure, distortion can be obtained after reconstructing the macroblock. The calculation of rate and distortion contributes to the overall RDO in achieving much better coding efficiency, but at the expense of very high computational complexity. It makes real-time video coding using H.264/AVC a difficult problem. Therefore, algorithms which reduce the computational complexity of H.264/AVC ...

... adjacent to the minimum one and skips other unlikely directions. Instead of 9 modes, 6 modes are required to determine the prediction mode in the full search method. Because of the inaccurate information it used, the performance in time saving is not very high. Choi et al. [37] propose a fast mode decision scheme in which early decision is possible for inter macroblock mode and the routine of computing RD cost ...

... computational complexity. Even with state-of-the-art hardware technology, real-time video coding using H.264/AVC is still a prohibitive task. Therefore, algorithms for reducing the time complexity ...

... Predictive Coding
Predictive coding in H.264 consists of two categories of prediction techniques. One is intra coding, which makes predictions using information inside the same frame. The other is inter coding, which predicts using information from other frames. Both approaches attempt to find the best prediction for each input macroblock, thus leading to ...
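
The excerpts above describe rate-distortion optimized mode decision only in words: the encoder evaluates each candidate mode and keeps the one with the least RD cost. The following is a minimal Python sketch of that selection step, assuming the commonly used Lagrangian cost J = D + lambda * R. The names Candidate, rd_cost and best_mode, the mode labels, and all distortion, rate and lambda values are hypothetical illustrations, not figures or code from the thesis or the JVT reference software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate coding mode with its measured cost components (hypothetical)."""
    name: str
    distortion: float  # e.g. sum of squared differences after reconstruction
    rate_bits: float   # bits spent on mode, motion data and residual


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate_bits


def best_mode(candidates: list[Candidate], lam: float) -> Candidate:
    """Exhaustively evaluate all candidates and return the one with the least cost."""
    if not candidates:
        raise ValueError("at least one candidate mode is required")
    return min(candidates, key=lambda c: rd_cost(c, lam))


# Hypothetical measurements for a single macroblock.
CANDIDATES = [
    Candidate("Inter16x16", distortion=1200.0, rate_bits=40.0),
    Candidate("Inter8x8", distortion=900.0, rate_bits=95.0),
    Candidate("Intra4x4", distortion=700.0, rate_bits=180.0),
]

if __name__ == "__main__":
    for lam in (1.0, 5.0, 20.0):
        chosen = best_mode(CANDIDATES, lam)
        print(f"lambda={lam}: {chosen.name} (J={rd_cost(chosen, lam)})")
```

A larger lambda penalizes rate more heavily, so the selection shifts toward modes that cost fewer bits. The expense of the exhaustive approach comes from obtaining the distortion and rate of every candidate, which is why the fast algorithms surveyed in the excerpts try to shrink the candidate set before this step.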
