Ramstad, T.A. “Still Image Compression,” Digital Signal Processing Handbook, Ed. Vijay K. Madisetti and Douglas B. Williams. Boca Raton: CRC Press LLC, 1999.
© 1999 by CRC Press LLC

52 Still Image Compression¹

Tor A. Ramstad
Norwegian University of Science and Technology (NTNU)

52.1 Introduction
Signal Chain • Compressibility of Images • The Ideal Coding System • Coding with Reduced Complexity
52.2 Signal Decomposition
Decomposition by Transforms • Decomposition by Filter Banks • Optimal Transforms/Filter Banks • Decomposition by Differential Coding
52.3 Quantization and Coding Strategies
Scalar Quantization • Vector Quantization • Efficient Use of Bit-Resources
52.4 Frequency Domain Coders
The JPEG Standard • Improved Coders: State-of-the-Art
52.5 Fractal Coding
Mathematical Background • Mean-Gain-Shape Attractor Coding • Discussion
52.6 Color Coding
References

52.1 Introduction

Digital representation of images is important for digital transmission and storage on different media such as magnetic or laser disks. However, pictorial material requires vast amounts of bits if represented through direct quantization. As an example, an SVGA color image requires 3 × 600 × 800 bytes = 1.44 Mbytes when each color component is quantized using 1 byte per pixel, which is the amount of data that can be stored on one standard 3.5-inch diskette. It is therefore evident that compression (often called coding) is necessary for reducing the amount of data [33].

In this chapter we address three fundamental questions concerning image compression:
• Why is image compression possible?
• What are the theoretical coding limits?
• Which practical compression methods can be devised?
The first two questions concern statistical and structural properties of the image material and human visual perception.
Even if we were able to answer these questions accurately, the methodology for image compression (third question) does not follow thereof. That is, the practical coding algorithms must be found otherwise. The bulk of the chapter will review image coding principles and present some of the best proposed still image coding methods.

¹ Parts of this manuscript are based on Ramstad, T.A., Aase, S.O., and Husøy, J.H., Subband Compression of Images: Principles and Examples, Elsevier Science Publishers BV, North Holland, 1995. Permission to use the material is given by ELSEVIER Science Publishers BV.

The prevailing technique for image coding is transform coding. This is part of the JPEG (Joint Photographic Experts Group) standard [14] as well as a part of all the existing video coding standards (H.261, H.263, MPEG-1, MPEG-2) [15, 16, 17, 18]. Another closely related technique, subband coding, is in some respects better, but has not yet been recognized by the standardization bodies. A third technique, differential coding, has not been successful for still image coding, but is often used to code the lowpass-lowpass band in subband coders, and is an integral part of hybrid video coders for removal of temporal redundancy. Vector quantization (VQ) would be the ultimate technique if there were no complexity constraints. Because all practical systems must have limited complexity, VQ is usually used as a component in a multi-component coding scheme. Finally, fractal or attractor coding is based on an idea far from other methods, but it is, nevertheless, strongly related to vector quantization.

For natural images, no exact digital representation exists because the quantization, which is an integral part of digital representations, is a lossy technique. Lossy techniques will always add noise, but the noise level and its characteristics can be controlled and depend on the number of bits per pixel as well as the performance of the method employed.
Lossless techniques will be discussed as a component in other coding methods.

52.1.1 Signal Chain

We assume a model where the input signal is properly bandlimited and digitized by an appropriate analog-to-digital converter. All subsequent processing in the encoder will be digital. The decoder is also digital up to the digital-to-analog converter, which is followed by a lowpass reconstruction filter.

Under idealized conditions, the interconnection of the signal chain excluding the compression unit will be assumed to be noise-free. (In reality, the analog-to-digital conversion will render a noise power which can be approximated by $\Delta^2/12$, where $\Delta$ is the quantizer interval. This interval depends on the number of bits, and we assume that the number of bits is so high that the contribution to the overall noise from this process is negligible.) The performance of the coding chain can then be assessed from the difference between the input and output of the digital compression unit, disregarding the analog part.

Still images must be sampled on some two-dimensional grid. Several schemes are viable choices, and there are good reasons for selecting nonrectangular grids. However, to simplify, only rectangular sampling will be considered, and all filtering will be based on separable operations, first performed on the rows and subsequently on the columns of the image. The theory is therefore presented for one-dimensional models only.

52.1.2 Compressibility of Images

There are two reasons why images can be compressed:
• All meaningful images exhibit some form of internal structure, often expressed through statistical dependencies between pixels. We call this property signal redundancy.
• The human visual system is not perfect. This means that certain degradations cannot be perceived by human observers. The degree of allowable noise is called irrelevancy or visual redundancy. If we furthermore accept visual degradation, we can exploit what might be termed tolerance.
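
The quantization-noise approximation mentioned in Section 52.1.1 (noise power of about Δ²/12 for a quantizer interval Δ) can be checked numerically. The following is only a minimal sketch under assumptions that are not in the text: the input is uniformly distributed on [-1, 1], the quantizer is a uniform rounding quantizer, and the function name `quantization_noise_power` is hypothetical.

```python
import random


def quantization_noise_power(step, n_samples=200_000, seed=0):
    """Estimate the mean square error of a uniform quantizer with interval `step`.

    Assumption for illustration: the input is uniform on [-1, 1].
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        xq = step * round(x / step)  # nearest representation level
        total += (x - xq) ** 2
    return total / n_samples


step = 2.0 / 256                     # 8-bit quantizer spanning [-1, 1]
measured = quantization_noise_power(step)
predicted = step ** 2 / 12           # the approximation quoted in the text
print(f"measured noise power : {measured:.3e}")
print(f"predicted step^2 / 12: {predicted:.3e}")
```

With a fine quantizer the two numbers should agree closely; the approximation becomes less reliable when the interval is coarse relative to the signal range.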
In this section we make some speculations about the compression potential resulting from redundancy and irrelevancy.

The two fundamental concepts in evaluating a coding scheme are distortion, which measures quality in the compressed signal, and rate, which measures how costly it is to transmit or store a signal.

Distortion is a measure of the deviation between the encoded/decoded signal and the original signal. Usually, distortion is measured by a single number for a given coder and bit rate. There are numerous ways of mapping an error signal onto a single number. Moreover, it is hard to conceive that a single number could mimic the quality assessment performed by a human observer. An easy-to-use and well-known error measure is the mean square error (mse). The visual correctness of this measure is poor. The human visual system is sensitive to errors in shapes and deterministic patterns, but not so much in stochastic textures. The mse defined over the entire image can, therefore, be entirely erroneous in the visual sense. Still, mse is the prevailing error measure, and it can be argued that it reflects small changes due to optimization within a given coder structure well, but is poor for comparing different models that create different noise characteristics.

Rate is defined as bits per pixel and is connected to the information content in a signal, which can be measured by entropy.

A Lower Bound for Lossless Coding

To define image entropy, we introduce the set $S$ containing all possible images of a certain size and call the number of images in the set $N_S$. To exemplify, assume the image set under consideration has dimension 512 × 512 pixels and each pixel is represented by 8 bits. The number of different images that exist in this set is $2^{512 \times 512 \times 8}$, an overwhelming number!
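
To get a feel for the size of this set, the short sketch below counts the decimal digits of the number of 512 × 512, 8-bit images without ever forming the number itself. The function name `image_set_size_log10` is hypothetical. As a side note (a standard fact, not a claim from the text), if all images in the set were equally probable, the entropy would equal log2 of the set size, i.e. the full 8 bits per pixel.

```python
import math


def image_set_size_log10(width, height, bits_per_pixel):
    """Return (total bits per image, log10 of the number of distinct images)."""
    total_bits = width * height * bits_per_pixel
    # The set contains 2**total_bits images; work in the log domain instead.
    return total_bits, total_bits * math.log10(2)


total_bits, log10_count = image_set_size_log10(512, 512, 8)
decimal_digits = math.floor(log10_count) + 1

print(f"bits per image          : {total_bits}")
print(f"decimal digits in count : {decimal_digits}")
print(f"bits per pixel if all images were equiprobable: {total_bits / (512 * 512)}")
```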
Given the probability $P_i$ of each image in the set $S$, where $i \in N_S$ is the index pointing to the different images, the source entropy is given by

$$H = -\sum_{i \in N_S} P_i \log_2 P_i . \qquad (52.1)$$

The entropy is a lower bound for the rate in lossless coding of the digital images.

A Lower Bound for Visually Lossless Coding

In order to incorporate perceptual redundancies, it is observed that all the images in the given set cannot be distinguished visually. We therefore introduce visual entropy as an abstract measure which incorporates distortion.

We now partition the image set into disjoint subsets, $S_i$, in which all the different images have similar appearance. One image from each subset is chosen as the representation image. The collection of these $N_R$ representation images constitutes a subset $R$, that is, a set spanning all distinguishable images in the original set.

Assume that image $i \in R$ appears with probability $\hat{P}_i$. Then the visual entropy is defined by

$$H_V = -\sum_{i \in N_R} \hat{P}_i \log_2 \hat{P}_i . \qquad (52.2)$$

The minimum attainable bit rate is lower bounded by this number for image coders without visual degradation.

52.1.3 The Ideal Coding System

Theoretically, we can approach the visual entropy limit using an unrealistic vector quantizer (VQ), in conjunction with an ideal entropy coder. The principle of such an optimal coding scheme is described next.

The set of representation images is stored in what is usually called a codebook. The encoder and decoder have similar copies of this codebook. In the encoding process, the image to be coded is compared to all the vectors in the codebook applying the visually correct distortion measure. The codebook member with the closest resemblance to the sample image is used as the coding approximation. The corresponding codebook index (address) is entropy coded and transmitted to the decoder. The decoder looks up the image located at the address given by the transmitted index.

Obviously, the above method is unrealistic.
The complexity is beyond any practical limit both in terms of storage and computational requirement. Also, the correct visual distortion measure is not presently known. We should therefore only view the indicated coding strategy as the limit for any coding scheme.

52.1.4 Coding with Reduced Complexity

In practical coding methods, there are basically two ways of avoiding the extreme complexity of ideal VQ. In the first method, the encoder operates on small image blocks rather than on the complete image. This is obviously suboptimal because the method cannot profit from the redundancy offered by large structures in an image. But the larger the blocks, the better the method.

The second strategy is very different and applies some preprocessing on the image prior to quantization. The aim is to remove statistical dependencies among the image pixels, thus avoiding representation of the same information more than once. Both techniques are exploited in practical coders, either separately or in combination.

A typical image encoder incorporating preprocessing is shown in Fig. 52.1.

FIGURE 52.1: Generic encoder structure block diagram. D = decomposition unit, Q = quantizer, B = coder for minimum bit-representation.

The first block (D) decomposes the signal into a set of coefficients. The coefficients are subsequently quantized (in Q), and are finally coded to a minimum bit representation (in B). This model is correct for frequency domain coders, but in closed loop differential coders (DPCM), the decomposition and quantization are performed in the same block, as will be demonstrated later. Usually the decomposition is exact. In fractal coding, the decomposition is replaced by approximate modeling.

Let us consider the decoder and introduce a series expansion as a unifying description of the different image representation methods:

$$\hat{x}(l) = \sum_{k} \hat{a}_k \phi_k(l) . \qquad (52.3)$$

The formula represents the recombination of signal components.
Here $\{\hat{a}_k\}$ are the coefficients (the parameters in the representation), and $\{\phi_k(l)\}$ are the basis functions. A major distinction between coding methods is their set of basis functions, as will be demonstrated in the next section.

The complete decoder consists of three major parts as shown in Fig. 52.2. The first block (I) receives the bit representation, which it partitions into entities representing the different coder parameters and decodes them. The second block ($Q^{-1}$) is a dequantizer which maps the code to the parametric approximation. The third block (R) reconstructs the signal from the parameters using the series representation.

FIGURE 52.2: Block diagram of generic decoder structure. I = bit-representation decoder, $Q^{-1}$ = inverse quantizer, R = signal reconstruction unit.

The second important distinction between compression structures is the coding of the series expansion coefficients in terms of bits. This is dealt with in section 52.3.

52.2 Signal Decomposition

As introduced in the previous section, series expansion can be viewed as a common tool to describe signal decomposition. The choice of basis functions will distinguish different coders and influence such features as coding gain and the types of distortions present in the decoded image for low bit rate coding. Possible classes of basis functions are:

1. Block-oriented basis functions.
• The basis functions can cover the whole signal length $L$. $L$ linearly independent basis functions will make a complete representation.
• Blocks of size $N \le L$ can be decomposed individually. Transform coders operate in this way. If the blocks are small, the decomposition can catch fast transients. On the other hand, regions with constant features, such as smooth areas or textures, require long basis functions to fully exploit the correlation.

2. Overlapping basis functions: The length of the basis functions and the degree of overlap are important parameters.
The issue of reversibility of the system becomes nontrivial.
• In differential coding, one basis function is used over and over again, shifted by one sample relative to the previous function. In this case, the basis function usually varies slowly according to some adaptation criterion with respect to the local signal statistics.
• In subband coding using a uniform filter bank, $N$ distinct basis functions are used. These are repeated over and over with a shift between each group by $N$ samples. The length of the basis functions is usually several times larger than the shifts, accommodating both fast transients and long-term correlations if the basis functions taper off at both ends.
• The basis functions may be finite (FIR filters) or semi-infinite (IIR filters).

Both time domain and frequency domain properties of the basis functions are indicators of the coder performance. It can be argued that decomposition, whether it is performed by a transform or a filter bank, represents a spectral decomposition. Coding gain is obtained if the different output channels are decorrelated. It is therefore desirable that the frequency responses of the different basis functions are localized and separate in frequency. At the same time, they must cover the whole frequency band in order to make a complete representation.

The desire to have highly localized basis functions to handle transients, with localized Fourier transforms to obtain good coding gain, are contradictory requirements due to the Heisenberg uncertainty relation [33] between a function and its Fourier transform. The selection of the basis functions must be a compromise between these conflicting requirements.

52.2.1 Decomposition by Transforms

When nonoverlapping block transforms are used, the Karhunen-Loève transform (KLT) decorrelates, in a statistical sense, the signal within each block completely. It is composed of the eigenvectors of the correlation matrix of the signal.
This means that one either has to know the signal statistics in advance or estimate the correlation matrix from the image itself.

Mathematically the eigenvalue equation is given by

$$R_{xx} h_n = \lambda_n h_n . \qquad (52.4)$$

If the eigenvectors are column vectors, the KLT matrix is composed of the eigenvectors $h_n$, $n = 0, 1, \ldots, N-1$, as its rows:

$$K = \begin{bmatrix} h_0 & h_1 & \cdots & h_{N-1} \end{bmatrix}^T . \qquad (52.5)$$

The decomposition is performed as

$$y = Kx . \qquad (52.6)$$

The eigenvalues are equal to the power of each transform coefficient.

In practice, the so-called Cosine Transform (of type II) is usually used because it is a fixed transform and it is close to the KLT when the signal can be described as a first-order autoregressive process with correlation coefficient close to 1.

The cosine transform of length $N$ in one dimension is given by:

$$y(k) = \sqrt{\frac{2}{N}}\, \alpha(k) \sum_{n=0}^{N-1} x(n) \cos\frac{(2n+1)k\pi}{2N}, \quad k = 0, 1, \ldots, N-1 , \qquad (52.7)$$

where

$$\alpha(0) = \frac{1}{\sqrt{2}} \quad \text{and} \quad \alpha(k) = 1 \ \text{for} \ k \neq 0 . \qquad (52.8)$$

The inverse transform is similar except that the scaling factor $\alpha(k)$ is inside the summation.

Many other transforms have been suggested in the literature (DFT, Hadamard Transform, Sine Transform, etc.), but none of these seem to have any significance today.

52.2.2 Decomposition by Filter Banks

Uniform analysis and synthesis filter banks are shown in Fig. 52.3. In the analysis filter bank the input signal is split into contiguous and slightly overlapping frequency bands denoted subbands. An ideal frequency partitioning is shown in Fig. 52.4.

If the analysis filter bank were able to decorrelate the signal completely, the output signal would be white. For all practical signals, complete decorrelation requires an infinite number of channels.

In the encoder the symbol $\downarrow N$ indicates decimation by a factor of $N$. By performing this decimation in each of the $N$ channels, the total number of samples is conserved from the system input to decimator outputs. With the channel arrangement in Fig. 52.4, the decimation also serves as a demodulator.
All channels will have a baseband representation in the frequency range $[0, \pi/N]$ after decimation.

FIGURE 52.3: Subband coder system.

FIGURE 52.4: Ideal frequency partitioning in the analysis channel filters in a subband coder.

The synthesis filter bank, as shown in Fig. 52.3, consists of $N$ branches with interpolators indicated by $\uparrow N$ and bandpass filters arranged as the filters in Fig. 52.4.

The reconstruction formula constitutes the following series expansion of the output signal:

$$\hat{x}(l) = \sum_{n=0}^{N-1} \sum_{k=-\infty}^{\infty} e_n(k)\, g_n(l - kN) , \qquad (52.9)$$

where $\{e_n(k),\ n = 0, 1, \ldots, N-1,\ k = -\infty, \ldots, -1, 0, 1, \ldots, \infty\}$ are the expansion coefficients representing the quantized subband signals and $\{g_n(k),\ n = 0, 1, \ldots, N-1\}$ are the basis functions, which are implemented as unit sample responses of bandpass filters.

Filter Bank Structures

Through the last two decades, an extensive literature on filter banks and filter bank structures has evolved. Perfect reconstruction (PR) is often considered desirable in subband coding systems. It is not a trivial task to design such systems due to the downsampling required to maintain a minimum sampling rate. PR filter banks are often called identity systems. Certain filter bank structures inherently guarantee PR.

It is beyond the scope of this chapter to give a comprehensive treatment of filter banks. We shall only present different alternative solutions at an overview level, and discuss in detail an important two-channel system with inherent perfect reconstruction properties.

We can distinguish between different filter banks based on several properties. In the following, five classifications are discussed.

1. FIR vs. IIR filters: Although IIR filters have an attractive complexity, their inherent long unit sample response and nonlinear phase are obstacles in image coding. The unit sample response length influences the ringing problem, which is a main source of
objectionable distortion in subband coders. The nonlinear phase makes the edge mirroring technique [30] for efficient coding of images near their borders impossible.

2. Uniform vs. nonuniform filter banks: This issue concerns the spectrum partitioning in frequency subbands. Currently the general view is that nonuniform filter banks perform better than uniform filter banks. There are two reasons for that. The first reason is that our visual system also performs a nonuniform partitioning, and the coder should mimic the type of receptor for which it is designed. The second reason is that the filter bank should be able to cope with slowly varying signals (correlation over a large region) as well as transients that are short and represent high frequency signals. Ideally, the filter banks should be adaptive (and good examples of adaptive filter banks have been demonstrated in the literature [2, 11]), but without adaptivity one filter bank has to be a good compromise between the two extreme cases cited above. Nonuniform filter banks can give the best tradeoff in terms of space-frequency resolution.

3. Parallel vs. tree-structured filter banks: The parallel filter banks are the most general, but tree-structured filter banks enjoy a large popularity, especially for octave band (dyadic frequency partitioning) filter banks, as they are easily constructed and implemented. The popular subclass of filter banks denoted wavelet filter banks or wavelet transforms belongs to this class. For octave band partitioning, the tree-structured filter banks are as general as the parallel filter banks when perfect reconstruction is required [4].

4. Linear phase vs. nonlinear phase filters: There is no general consensus about the optimality of linear phase. In fact, the traditional wavelet transforms cannot be made linear phase. There are, however, three indications that linear phase should be chosen. (1) The noise in the reconstructed image will be antisymmetrical around edges with nonlinear phase filters.
This does not appear to be visually pleasing. (2) The mirror extension technique [30] cannot be used for nonlinear phase filters. (3) Practical coding gain optimizations have given better results for linear than nonlinear phase filters.

5. Unitary vs. nonunitary systems: A unitary filter bank has the same analysis and synthesis filters (except for a reversal of the unit sample responses in the synthesis filters with respect to the analysis filters to make the overall phase linear). Because the analysis and synthesis filters play different roles, it seems plausible that they, in fact, should not be equal. Also, the gain can be larger, as demonstrated in section 52.2.3, for nonunitary filter banks as long as straightforward scalar quantization is performed on the subbands.

Several other issues could be taken into consideration when optimizing a filter bank. These are, among others, the actual frequency partitioning including the number of bands, the length of the individual filters, and design criteria other than coding gain to alleviate coding artifacts, especially at low rates. As an example of the last requirement, it is important that the different phases in the reconstruction process generate the same noise; in other words, the noise should be stationary rather than cyclo-stationary. This may be guaranteed through requirements on the norms of the unit sample responses of the polyphase components [4].

The Two-Channel Lattice Structure

A versatile perfect reconstruction system can be built from two-channel substructures based on lattice filters [36]. The analysis filter bank is shown in Fig. 52.5. It consists of delay-free blocks given in matrix form as

$$\eta = \begin{bmatrix} a & b \\ c & d \end{bmatrix} , \qquad (52.10)$$

and single delays in the lower branch between each block. At the input, the signal is multiplexed into the two branches, which also constitutes the decimation in the analysis system.

FIGURE 52.5: Multistage two-channel analysis lattice filter bank.
FIGURE 52.6: Multistage two-channel polyphase synthesis lattice filter bank.

A similar synthesis filter structure is shown in Fig. 52.6. In this case, the lattices are given by the inverse of the matrix in Eq. 52.10:

$$\eta^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} , \qquad (52.11)$$

and the delays are in the upper branches. It is not hard to realize that the two systems are inverse systems provided $ad - bc \neq 0$, except for a system delay.

As the structure can be extended as much as wanted, the flexibility is good. The filters can be made unitary or they can have a linear phase. In the unitary case, the coefficients are related through $a = d = \cos\phi$ and $b = -c = \sin\phi$, whereas in the linear phase case, the coefficients are $a = d = 1$ and $b = c$. In the linear phase case, the last block ($\eta_L$) must be a Hadamard transform.

Tree Structured Filter Banks

In tree-structured filter banks, the signal is first split into two channels. The resulting outputs are input to a second stage with further separation. This process can go on as indicated in Fig. 52.7 for a system where at every stage the outputs are split further until the required resolution has been obtained.

Tree-structured systems have a rather high flexibility. Nonuniform filter banks are obtained by splitting only some of the outputs at each stage. To guarantee perfect reconstruction, each stage in the synthesis filter bank (Fig. 52.7) must reconstruct the input signal to the corresponding analysis filter.

52.2.3 Optimal Transforms/Filter Banks

The gain in subband and transform coders depends on the detailed construction of the filter bank as well as the quantization scheme.

Assume that the analysis filter bank unit sample responses are given by $\{h_n(k),\ n = 0, 1, \ldots, N-1\}$. The corresponding unit sample responses of the synthesis filters are required to have unit norm:

$$\sum_{k=0}^{L-1} g_n^2(k) = 1 .$$

[...]
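
The perfect reconstruction property of the two-channel lattice structure described earlier (Eqs. 52.10 and 52.11) can be illustrated with a small simulation. This is a minimal sketch, not the chapter's implementation: it assumes the unitary case (a = d = cos φ, b = -c = sin φ), an even-length input, zero initial state in the delays, and hypothetical function names `lattice_analysis` and `lattice_synthesis`. Each delay lengthens the branch signals by one sample so that nothing is truncated.

```python
import math


def lattice_analysis(x, phis):
    """Unitary two-channel lattice analysis: blocks with lower-branch delays between them."""
    # Multiplexing into two branches also performs the decimation by 2.
    upper, lower = list(x[0::2]), list(x[1::2])
    for i, phi in enumerate(phis):
        if i > 0:
            lower = [0.0] + lower      # single delay in the lower branch
            upper = upper + [0.0]      # pad so both branches keep equal length
        a, b = math.cos(phi), math.sin(phi)   # eta = [[a, b], [-b, a]]
        upper, lower = (
            [a * u + b * v for u, v in zip(upper, lower)],
            [-b * u + a * v for u, v in zip(upper, lower)],
        )
    return upper, lower


def lattice_synthesis(upper, lower, phis):
    """Inverse blocks in reverse order, with the delays in the upper branch."""
    for i, phi in enumerate(reversed(phis)):
        if i > 0:
            upper = [0.0] + upper      # single delay in the upper branch
            lower = lower + [0.0]
        a, b = math.cos(phi), math.sin(phi)   # eta^-1 = [[a, -b], [b, a]], ad - bc = 1
        upper, lower = (
            [a * u - b * v for u, v in zip(upper, lower)],
            [b * u + a * v for u, v in zip(upper, lower)],
        )
    out = []
    for u, v in zip(upper, lower):     # demultiplex back to one signal
        out.extend([u, v])
    return out


x = [float((7 * n) % 11) for n in range(16)]   # arbitrary even-length test signal
phis = [0.3, -1.1, 0.7]                        # illustrative rotation angles
y = lattice_synthesis(*lattice_analysis(x, phis), phis)
delay = 2 * (len(phis) - 1)                    # overall system delay in samples
max_error = max(abs(y[delay + n] - x[n]) for n in range(len(x)))
print("system delay:", delay, "max reconstruction error:", max_error)
```

In this sketch the output equals the input delayed by two samples per inter-block delay, which matches the statement that the analysis and synthesis systems are inverses except for a system delay.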
and (b), respectively In the encoder, the input signal x is represented by the bit-stream b Q is the quantizer and Q−1 the dequantizer, but QQ−1 = 1, except for the case of infinite resolution in the quantizer The signal d, which is quantized and transmitted by some binary code, is the difference between the input signal and a predicted value of the input signal based on previous outputs and a prediction... Assume we have a signal that can be split in classes with different statistics As an example, after applying signal decomposition, the different transform coefficients typically have different variances Assume also that we have a pool of bits to be used for representing a collection of signal vectors from the different classes, or we try to minimize the number of bits to be used after all signals have been... Acoustics, Speech, and Signal Proc (ICASSP), Minneapolis, MN, 3, 233–236, April 1993 [3] Balasingham, I., Fuldseth, A and Ramstad, T A., On optimal tiling of the spectrum in subband image compression, in Proc Int Conf on Image Processing (ICIP), 1997 [4] Balasingham, I and Ramstad, T.A., On the optimality of tree-structured filter banks in subband image compression, IEEE Trans Signal Processing, 1997, (submitted)... a scalar quantizer where the 1st order signal pdf is exploited to increase the quantizer performance It is therefore often referred to as a pdf-optimized quantizer Each signal sample is quantized using the same number of bits The optimization is done by minimizing the total distortion of a quantizer with a given number L of representation levels For an input signal X with pdf pX (x), the average mean... 
coders Rate Allocation Assume we have the same signal collection as above This time we want to minimize the number of bits to be used after the signal components have been quantized The first order entropy of the decomposed source will be selected as the measure for the obtainable minimum bit-rate when scalar representation is specified To simplify, assume all signal components are Gaussian The entropy of... the rate gain in Eq 52.32 are equivalent for Gaussian sources In order to exploit this result in conjunction with signal decomposition, we can view each output component as a stationary source, each with different signal statistics The variances will depend on the spectrum of the input signal From Eq 52.32 and Eq 52.33 we see that the rate difference is larger the more different the channel variances... encoder structure showing the successive approximation of the signal vector The first block in the encoder makes a rough approximation to the input vector by selecting the codebook vector which, upon scaling by e1 , is closest in some distortion measure Then this approximation is subtracted from the input signal In the second stage, the difference signal is approximated by a vector from the second codebook... orthogonal bases and fast tiling transforms, IEEE Trans Signal Processing, 41(12),3341–3359, Dec 1993 [12] Huffman, D.A., A method for the construction of minimum redundancy codes, Proc IRE, 40(9),1098–1101, Sept 1952 [13] Hung, A.C., PVRG-JPEG Codec 1.2.1, Portable Video Research Group, Stanford University, Boston, MA, 1993 [14] ISO/IEC IS 10918-1, Digital Compression and Coding of Continuous-Tone Still... 
image coding using zerotrees of wavelets coefficients, IEEE Trans Signal Processing, 41,3445–3462, Dec 1993 [36] Vaidyanathan, P.P., Multirate Systems and Filter Banks, Prentice-Hall, Englewood Cliffs, NJ, 1993 [37] Wallace, G.K., Overview of the JPEG (ISO/CCITT) still image compression standard, in Proc SPIE’s Visual Communications and Image Processing, 1989 [38] Antonini, M., Barland, M., Mathieu, P.,... function will vary slowly, depending on some spectral modification derived from the incoming samples 52.3 Quantization and Coding Strategies Quantization is the means of providing approximations to signals and signal parameters by a finite number of representation levels This process is nonreversible and thus always introduces noise The representation levels constitute a finite alphabet which is usually ...