CHAPTER 18 Wavelet Image Compression

FIGURE 18.14 Tiling representations of several expansions for 1D signals: (a) STFT-like decomposition; (b) wavelet decomposition; (c) wavelet packet decomposition; and (d) "anti-wavelet" packet decomposition.

Fig. 18.14(d) highlights a wavelet packet expansion whose time-frequency attributes are exactly the reverse of the wavelet case: the expansion has good frequency resolution at higher frequencies and good time localization at lower frequencies; we might call this the "anti-wavelet" packet. There is a plethora of other options for the time-frequency resolution tradeoff, and these all correspond to admissible wavelet packet choices. The extra adaptivity of the wavelet packet framework is obtained at the price of added computation in searching for the best wavelet packet basis, so an efficient fast search algorithm is key in applications involving wavelet packets. The problem of searching for the best basis in the wavelet packet library for the compression problem, using a rate-distortion (RD) optimization framework and a fast tree-pruning algorithm, was described in [22].
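To convey the flavor of such a search, here is a minimal single-pass pruning sketch; it is not the algorithm of [22] itself. `split` and `cost` are hypothetical stand-ins for a filter-bank subband split and an additive RD cost (e.g., distortion plus λ times rate).

```python
# Minimal sketch of best-basis selection by bottom-up tree pruning.
# `split(band)` and `cost(band)` are hypothetical stand-ins for a
# filter-bank split and an additive rate-distortion cost measure.

def best_basis(band, depth, max_depth, split, cost):
    """Return (best_cost, leaf_bands) for the subtree rooted at `band`."""
    own_cost = cost(band)                  # cost of keeping this node whole
    if depth == max_depth:
        return own_cost, [band]
    children = split(band)                 # e.g., the four 2D subbands
    results = [best_basis(c, depth + 1, max_depth, split, cost)
               for c in children]
    split_cost = sum(r[0] for r in results)
    if split_cost < own_cost:              # keep the split: children win
        return split_cost, [leaf for _, leaves in results for leaf in leaves]
    return own_cost, [band]                # prune: the parent node wins
```

Because the cost is additive over subbands, each subtree can be decided independently, so one bottom-up pass over the decomposition tree suffices to search the entire wavelet packet library.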
The 1D wavelet packet bases can be easily extended to 2D by writing a 2D basis function as the product of two 1D basis functions. In other words, we can treat the rows and columns of an image separately as 1D signals. The performance gains associated with wavelet packets are obviously image-dependent. For difficult images such as Barbara in Fig. 18.12, the wavelet packet decomposition shown in Fig. 18.15(a) gives much better coding performance than the wavelet decomposition. The wavelet packet decoded Barbara image at 0.1825 b/p is shown in Fig. 18.15(b), whose visual quality (or PSNR) is the same as that of the wavelet SPIHT decoded Barbara image at 0.25 b/p in Fig. 18.12. The bit rate saving achieved by using a wavelet packet basis instead of the wavelet basis in this case is 27% (1 − 0.1825/0.25 ≈ 0.27) at the same visual quality.

An important practical application of wavelet packet expansions is the FBI wavelet scalar quantization (WSQ) standard for fingerprint image compression [23]. Because of the complexity associated with adaptive wavelet packet transforms, the FBI WSQ standard uses a fixed wavelet packet decomposition in the transform stage. The transform structure specified by the FBI WSQ standard is shown in Fig. 18.16. It was designed for 500 dots per inch fingerprint images by spectral analysis and trial and error. A total of 64 subbands are generated with a five-level wavelet packet decomposition. Trials by the FBI have shown that the WSQ standard benefited from having fine frequency partitions in the middle frequency region containing the fingerprint ridge patterns.

FIGURE 18.15 (a) A wavelet packet decomposition for the Barbara image. White lines represent frequency boundaries. Highpass bands are processed for display; (b) wavelet packet decoded Barbara at 0.1825 b/p. PSNR = 27.6 dB.

FIGURE 18.16 The wavelet packet transform structure given in the FBI WSQ specification. The number sequence (subbands 0 through 63) shows the labeling of the different subbands; the frequency axes run from 0 to π in each dimension.

FIGURE 18.17 Space-frequency segmentation and tiling for the Building image. The image to the left shows that spatial segmentation separates the sky in the background from the building and the pond in the foreground. The image to the right gives the best wavelet packet decomposition of each spatial segment. Dark lines represent spatial segments; white lines represent subband boundaries of wavelet packet decompositions. Note that the upper-left corners are the lowpass bands of wavelet packet decompositions.

As an extension of adaptive wavelet packet transforms, one can introduce time-variation by segmenting the signal in time and allowing the wavelet packet bases to evolve with the signal. The result is a time-varying transform coding scheme that can adapt to signal nonstationarities. Computationally fast algorithms are again very important for finding the optimal signal expansions in such a time-varying system. For 2D images, the simplest of these algorithms performs adaptive frequency segmentations over regions of the image selected through a quadtree decomposition. More complicated algorithms provide combinations of frequency decomposition and spatial segmentation. These jointly adaptive algorithms work particularly well for highly nonstationary images. Figure 18.17 shows the space-frequency tree segmentation and tiling for the Building image [24]. The image to the left shows the spatial segmentation result that separates the sky in the background from the building and the pond in the foreground. The image to the right gives the best wavelet packet decomposition for each spatial segment.

18.7 JPEG2000 AND RELATED DEVELOPMENTS

JPEG2000 by default employs the dyadic wavelet transform for natural images in many standard applications. It also allows the choice of the more general wavelet packet transforms for certain types of imagery (e.g., fingerprints and radar images). Instead of using the zerotree-based SPIHT algorithm, JPEG2000 relies on embedded block coding with optimized truncation (EBCOT) [25] to provide a rich set of features such as quality scalability, resolution scalability, spatial random access, and region-of-interest coding. Besides robustness to image type changes in terms of compression performance, the main advantage of the block-based EBCOT algorithm is that it provides easier random access to local image components. On the other hand, both encoding and decoding in SPIHT require nonlocal memory access to the whole tree of wavelet coefficients, causing a reduction in throughput when coding large images. A thorough description of the JPEG2000 standard is given in [1]. Other JPEG2000-related references are Chapter 17 and [26, 27].

Although this chapter is about wavelet coding of 2D images, the wavelet coding framework and its extension to wavelet packets apply to 3D video as well. Recent research (see [28] and references therein) on 3D scalable wavelet video coders based on the framework of motion-compensated temporal filtering (MCTF) [29] has shown performance competitive with or better than the best MC-DCT-based standard video coders (e.g., H.264/AVC [30]). This work has stirred considerable excitement in the video coding community and stimulated research efforts toward subband/wavelet interframe video coding, especially in the area of scalable motion coding [31] within the context of MCTF. MCTF can be conceptually viewed as the extension of wavelet-based coding in JPEG2000 from 2D images to 3D video. It nicely combines the scalability features of wavelet-based coding with motion compensation, which has proven to be very efficient and necessary in MC-DCT-based standard video coders. We refer the reader to a recent special issue [32] for the latest results and to Chapter 11 in [9] for an exposition of 3D subband/wavelet video coding.
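As a concrete illustration of temporal filtering, the sketch below applies one level of Haar lifting along the time axis to a pair of frames. It is a simplification: motion compensation is omitted (a real MCTF coder aligns the predict and update steps along motion trajectories), and the function names are our own.

```python
import numpy as np

def temporal_haar_lift(frame_a, frame_b):
    """One level of temporal Haar lifting on two frames (a sketch).

    Motion compensation is omitted, so pixels are filtered straight
    down the time axis. Returns (lowpass, highpass) temporal subband
    frames; the transform is exactly invertible.
    """
    a = frame_a.astype(np.float64)
    b = frame_b.astype(np.float64)
    high = b - a              # predict step: temporal detail (residual)
    low = a + 0.5 * high      # update step: temporal average, (a + b) / 2
    return low, high

def temporal_haar_inverse(low, high):
    """Invert the lifting steps in reverse order."""
    a = low - 0.5 * high
    b = high + a
    return a, b
```

Recursing on the lowpass frames builds the temporal decomposition tree; each resulting temporal subband frame is then coded with a 2D wavelet coder, and discarding highpass temporal subbands yields frame-rate scalability.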
18.8 CONCLUSION

Since the introduction of wavelets as a signal processing tool in the late 1980s, a variety of wavelet-based coding algorithms have advanced the limits of compression performance well beyond those of the current commercial JPEG image coding standard. In this chapter, we have provided very simple high-level insights, based on the intuitive concept of time-frequency representations, into why wavelets are good for image coding. After introducing the salient aspects of the compression problem in general and the transform coding problem in particular, we have highlighted the key differences between the early class of subband coders and the more advanced class of modern-day wavelet image coders. Selecting the EZW coding structure embodied in the celebrated SPIHT algorithm as a representative of this latter class, we have detailed its operation by using a simple illustrative example. We have also described the role of wavelet packets as a simple but powerful generalization of the wavelet decomposition that offers a more robust and adaptive transform image coding framework. JPEG2000 is the result of the rapid progress made in wavelet image coding research in the 1990s. The triumph of the wavelet transform in the evolution of the JPEG2000 standard underlines the importance of the fundamental insights provided in this chapter into why wavelets are so attractive for image compression.

REFERENCES

[1] D. Taubman and M. Marcellin. JPEG2000: Image Compression Fundamentals, Standards, and Practice. Kluwer, New York, 2001.
[2] G. Strang and T. Nguyen. Wavelets and Filter Banks. Wellesley-Cambridge Press, New York, 1996.
[3] M. Vetterli and J. Kovačević. Wavelets and Subband Coding. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[4] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley & Sons, Inc., New York, 1991.
[5] C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J., 27:379–423, 623–656, 1948.
[6] C. E. Shannon. Coding theorems for a discrete source with a fidelity criterion. IRE Natl. Conv. Rec., 4:142–163, 1959.
[7] A. Gersho and R. M. Gray. Vector Quantization and Signal Compression. Kluwer Academic, Boston, MA, 1992.
[8] N. S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, Englewood Cliffs, NJ, 1984.
[9] A. Bovik, editor. The Video Processing Companion. Elsevier, Burlington, MA, 2008.
[10] S. P. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theory, IT-28:127–135, 1982.
[11] H. Gish and J. N. Pierce. Asymptotically efficient quantizing. IEEE Trans. Inf. Theory, IT-14(5):676–683, 1968.
[12] M. W. Marcellin and T. R. Fischer. Trellis coded quantization of memoryless and Gauss-Markov sources. IEEE Trans. Commun., 38(1):82–93, 1990.
[13] T. Berger. Rate Distortion Theory. Prentice-Hall, Englewood Cliffs, NJ, 1971.
[14] N. Farvardin and J. W. Modestino. Optimum quantizer performance for a class of non-Gaussian memoryless sources. IEEE Trans. Inf. Theory, 30:485–497, 1984.
[15] D. A. Huffman. A method for the construction of minimum redundancy codes. Proc. IRE, 40:1098–1101, 1952.
[16] T. C. Bell, J. G. Cleary, and I. H. Witten. Text Compression. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[17] J. W. Woods, editor. Subband Image Coding. Kluwer Academic, Boston, MA, 1991.
[18] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform. IEEE Trans. Image Process., 1(2):205–220, 1992.
[19] J. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal Process., 41(12):3445–3462, 1993.
[20] A. Said and W. A. Pearlman. A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans. Circuits Syst. Video Technol., 6(3):243–250, 1996.
[21] R. R. Coifman and M. V. Wickerhauser. Entropy-based algorithms for best basis selection. IEEE Trans. Inf. Theory, 32:712–718, 1992.
[22] K. Ramchandran and M. Vetterli. Best wavelet packet bases in a rate-distortion sense. IEEE Trans. Image Process., 2(2):160–175, 1992.
[23] Criminal Justice Information Services. WSQ Gray-Scale Fingerprint Image Compression Specification (Ver. 2.0). Federal Bureau of Investigation, 1993.
[24] K. Ramchandran, Z. Xiong, K. Asai, and M. Vetterli. Adaptive transforms for image coding using spatially-varying wavelet packets. IEEE Trans. Image Process., 5:1197–1204, 1996.
[25] D. Taubman. High performance scalable image compression with EBCOT. IEEE Trans. Image Process., 9(7):1151–1170, 2000.
[26] Special Issue on JPEG2000. Signal Process. Image Commun., 17(1), 2002.
[27] D. Taubman and M. Marcellin. JPEG2000: standard for interactive imaging. Proc. IEEE, 90(8):1336–1357, 2002.
[28] J. Ohm, M. van der Schaar, and J. Woods. Interframe wavelet coding – motion picture representation for universal scalability. Signal Process. Image Commun., 19(9):877–908, 2004.
[29] S.-T. Hsiang and J. Woods. Embedded video coding using invertible motion compensated 3D subband/wavelet filter bank. Signal Process. Image Commun., 16(8):705–724, 2001.
[30] T. Wiegand, G. Sullivan, G. Bjøntegaard, and A. Luthra. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol., 13:560–576, 2003.
[31] A. Secker and D. Taubman. Highly scalable video compression with scalable motion coding. IEEE Trans. Image Process., 13(8):1029–1041, 2004.
[32] Special issue on subband/wavelet interframe video coding. Signal Process. Image Commun., 19, 2004.
CHAPTER 19 Gradient and Laplacian Edge Detection
Phillip A. Mlsna (Northern Arizona University) and Jeffrey J. Rodríguez (University of Arizona)

19.1 INTRODUCTION

One of the most fundamental image analysis operations is edge detection. Edges are often vital clues toward the analysis and interpretation of image information, both in biological vision and in computer image analysis. Some sort of edge detection capability is present in the visual systems of a wide variety of creatures, so it is obviously useful in their abilities to perceive their surroundings.

For this discussion, it is important to define what is and is not meant by the term "edge." The everyday notion of an edge is usually a physical one, caused either by the shapes of physical objects in three dimensions or by their inherent material properties. Described in geometric terms, there are two types of physical edges: (1) the set of points along which there is an abrupt change in the local orientation of a physical surface, and (2) the set of points describing the boundary between two or more materially distinct regions of a physical surface.
Most of our perceptual senses, including vision, operate at a distance and gather information using receptors that work in, at most, two dimensions. Only the sense of touch, which requires direct contact to stimulate the skin's pressure sensors, is capable of direct perception of objects in three-dimensional (3D) space. However, some physical edges of the second type may not be perceptible by touch because material differences (for instance, different colors of paint) do not always produce distinct tactile sensations. Everyone first develops a working understanding of physical edges in early childhood by touching and handling every object within reach.

The imaging process inherently performs a projection from a 3D scene to a two-dimensional (2D) representation of that scene, according to the viewpoint of the imaging device. Because of this projection process, edges in images have a somewhat different meaning than physical edges. Although the precise definition depends on the application context, an edge can generally be defined as a boundary or contour that separates adjacent image regions having relatively distinct characteristics according to some feature of interest. Most often this feature is gray level or luminance, but others, such as reflectance, color, or texture, are sometimes used. In the most common situation, where luminance is of primary interest, edge pixels are those at locations of abrupt gray level change. To eliminate single-point impulses from consideration as edge pixels, one usually requires that edges be sustained along a contour; i.e., an edge point must be part of an edge structure having some minimum extent appropriate for the scale of interest. Edge detection is the process of determining which pixels are the edge pixels. The result of the edge detection process is typically an edge map, a new image that describes each original pixel's edge classification and perhaps additional edge attributes, such as magnitude and orientation.
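As a small illustration of this output format, an edge map can bundle the per-pixel classification with the magnitude and orientation attributes. The array names and the simple threshold rule below are invented for the example; they are not a specific detector from this chapter.

```python
import numpy as np

def make_edge_map(grad_x, grad_y, threshold):
    """Bundle per-pixel edge classification with edge attributes (a sketch).

    `grad_x` and `grad_y` are precomputed gradient component images;
    the threshold-based classification stands in for a real detector.
    """
    magnitude = np.hypot(grad_x, grad_y)        # edge strength
    orientation = np.arctan2(grad_y, grad_x)    # gradient direction, radians
    return {
        "edge": magnitude > threshold,          # boolean classification image
        "magnitude": magnitude,
        "orientation": orientation,
    }
```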
There is usually a strong correspondence between the physical edges of a set of objects and the edges in images containing views of those objects. Infants and young children learn this as they develop hand-eye coordination, gradually associating visual patterns with touch sensations as they feel and handle items in their vicinity. There are many situations, however, in which edges in an image do not correspond to physical edges. Illumination differences are usually responsible for this effect: consider, for example, the boundary of a shadow cast across an otherwise uniform surface. Conversely, physical edges do not always give rise to edges in images. This can also be caused by certain cases of lighting and surface properties. Consider what happens when one wishes to photograph a scene rich with physical edges, for example, a craggy mountain face consisting of a single type of rock. When this scene is imaged while the sun is directly behind the camera, no shadows are visible in the scene and hence shadow-dependent edges are nonexistent in the photo. The only edges in such a photo are produced by the differences in material reflectance, texture, or color. Since our rocky subject material has little variation of these types, the result is a rather dull photograph because of the lack of apparent depth caused by the missing edges. Thus images can exhibit edges having no physical counterpart, and they can also fail to capture edges that have one. Although edge information can be very useful in the initial stages of such image processing and analysis tasks as segmentation, registration, and object recognition, edges are therefore not completely reliable for these purposes.

If one defines an edge as an abrupt gray level change, then the derivative, or gradient, is a natural basis for an edge detector. Figure 19.1 illustrates the idea with a continuous, one-dimensional (1D) example of a bright central region against a dark background. The left-hand portion of the gray level function $f_c(x)$ shows a smooth transition from dark to bright as $x$ increases. There must be a point $x_0$ that marks the transition from the low-amplitude region on the left to the adjacent high-amplitude region in the center. The gradient approach to detecting this edge is to locate $x_0$ where $|f_c'(x)|$ reaches a local maximum or, equivalently, where $f_c'(x)$ reaches a local extremum, as shown in the second plot of Fig. 19.1. The second derivative, or Laplacian, approach locates $x_0$ where a zero-crossing of $f_c''(x)$ occurs, as in the third plot of Fig. 19.1. The right-hand side of Fig. 19.1 illustrates the case of a falling edge located at $x_1$. To use the gradient or the Laplacian approach as the basis for a practical image edge detector, one must extend the process to two dimensions, adapt to the discrete case, and somehow deal with the difficulties presented by real images.

FIGURE 19.1 Edge detection in the 1D continuous case; changes in $f_c(x)$ indicate edges, and $x_0$ and $x_1$ are the edge locations found by local extrema of $f_c'(x)$ or by zero-crossings of $f_c''(x)$.
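Both detection rules are easy to check numerically. The sketch below builds a smoothed version of the bright-central-region signal of Fig. 19.1 (the exact signal shape and edge positions are our own choices) and locates the edges both ways.

```python
import numpy as np

# A smoothed bright central region on a dark background, as in Fig. 19.1:
# a rising edge near x = 30 and a falling edge near x = 70.
x = np.arange(100, dtype=float)
f = 1.0 / (1.0 + np.exp(-(x - 30.0))) - 1.0 / (1.0 + np.exp(-(x - 70.0)))

d1 = np.gradient(f)    # estimate of f'
d2 = np.gradient(d1)   # estimate of f''

# Gradient rule: edges where |f'| peaks, i.e., at extrema of f'.
x0 = np.argmax(d1)     # rising edge -> maximum of f'
x1 = np.argmin(d1)     # falling edge -> minimum of f'

# Laplacian rule: edges where f'' changes sign (zero-crossings).
zc = np.where(np.sign(d2[:-1]) * np.sign(d2[1:]) < 0)[0]

print(x0, x1)   # 30 70
print(zc)       # sign changes adjacent to the same two locations
```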
Relative to the 1D edges shown in Fig. 19.1, edges in 2D images have the additional quality of direction. One usually wishes to find edges regardless of direction, but a directionally sensitive edge detector can be useful at times. Also, the discrete nature of digital images requires the use of an approximation to the derivative. Finally, there are a number of problems that can confound the edge detection process in real images. These include noise, crosstalk or interference between nearby edges, and inaccuracies resulting from the use of a discrete grid. False edges, missing edges, and errors in edge location and orientation are often the result.

Because the derivative operator acts as a highpass filter, edge detectors based on it are sensitive to noise. It is easy for noise inherent in an image to corrupt the real edges by shifting their apparent locations and by adding many false edge pixels. Unless care is taken, seemingly moderate amounts of noise are capable of overwhelming the edge detection process, rendering the results virtually useless. The wide variety of edge detection algorithms developed over the past three decades exists, in large part, because of the many ways proposed for dealing with noise and its effects. Most algorithms employ noise-suppression filtering of some kind before applying the edge detector itself. Some decompose the image into a set of lowpass or bandpass versions, apply the edge detector to each, and merge the results. Still others use adaptive methods, modifying the edge detector's parameters and behavior according to the noise characteristics of the image data. Some recent work by Mathieu et al. [20] on fractional derivative operators shows some promise for enriching the gradient and Laplacian possibilities for edge detection. Fractional derivatives may allow better control of noise sensitivity, edge localization, and error rate under various conditions.

An important tradeoff exists between correct detection of the actual edges and precise location of their positions. Edge detection errors can occur in two forms: false positives, in which nonedge pixels are misclassified as edge pixels, and false negatives, which are the reverse. Detection errors of both types tend to increase with noise, making good noise suppression very important in achieving high detection accuracy. In general, the potential for noise suppression improves with the spatial extent of the edge detection filter. Hence, the goal of maximum detection accuracy calls for a large-sized filter. Errors in edge localization also increase with noise. To achieve good localization, however, the filter should generally be of small spatial extent. The goals of detection accuracy and location accuracy are thus put into direct conflict, creating a kind of uncertainty principle for edge detection [28].

In this chapter, we cover the basics of gradient and Laplacian edge detection methods in some detail. Following each, we also describe several of the more important and useful edge detection algorithms based on that approach. While the primary focus is on gray level edge detectors, some discussion of edge detection in color and multispectral images is included.

19.2 GRADIENT-BASED METHODS

19.2.1 Continuous Gradient

The core of gradient edge detection is, of course, the gradient operator, $\nabla$. In continuous form, applied to a continuous-space image $f_c(x, y)$, the gradient is defined as

$$\nabla f_c(x,y) = \frac{\partial f_c(x,y)}{\partial x}\,\mathbf{i}_x + \frac{\partial f_c(x,y)}{\partial y}\,\mathbf{i}_y, \qquad (19.1)$$

where $\mathbf{i}_x$ and $\mathbf{i}_y$ are the unit vectors in the $x$ and $y$ directions. Notice that the gradient is a vector, having both magnitude and direction. Its magnitude, $|\nabla f_c(x_0, y_0)|$, measures the maximum rate of change in the intensity at the location $(x_0, y_0)$. Its direction is that of the greatest increase in intensity; i.e., it points "uphill."

To produce an edge detector, one may simply extend the 1D case described earlier. Consider the effect of finding the local extrema of $\nabla f_c(x, y)$ or the local maxima of

$$|\nabla f_c(x,y)| = \sqrt{\left(\frac{\partial f_c(x,y)}{\partial x}\right)^2 + \left(\frac{\partial f_c(x,y)}{\partial y}\right)^2}. \qquad (19.2)$$

The precise meaning of "local" is very important here. If the maxima of Eq. (19.2) are found over a 2D neighborhood, the result is a set of isolated points rather than the desired edge contours. The problem stems from the fact that the gradient magnitude is seldom constant along a given edge, so finding the 2D local maxima yields only the locally strongest of the edge contour points. To fully construct edge contours, it is better to apply Eq. (19.2) to a 1D local neighborhood, namely a line segment, whose direction is chosen to cross the edge. The situation is then similar to that of Fig. 19.1, where the point of locally maximum gradient magnitude is the edge point. Now the issue becomes how to select the best direction for the line [...]

[...] reducing the amount of edge fragmentation. The edge maps in Fig. 19.3, computed from the original image in Fig. 19.2, illustrate the effect of the thresholding and subsequent thinning steps. The selection of the threshold value $T$ is a tradeoff between the wish to fully capture the actual edges in the image and the desire to reject noise. Increasing $T$ decreases sensitivity to noise at the cost of rejecting the [...]
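A discrete version of Eqs. (19.1) and (19.2), followed by the threshold test just described, can be sketched in a few lines. The central-difference estimator and the threshold value are illustrative assumptions here, not the chapter's prescribed operators.

```python
import numpy as np

def gradient_edge_map(image, T):
    """Threshold the discrete gradient magnitude of Eq. (19.2) (a sketch).

    Central differences approximate the partial derivatives of Eq. (19.1);
    np.gradient applies them along each axis (one-sided at the borders).
    """
    img = image.astype(np.float64)
    dfdy, dfdx = np.gradient(img)       # row (y) axis first, then column (x)
    magnitude = np.hypot(dfdx, dfdy)    # Eq. (19.2)
    return magnitude > T                # candidate edge pixels

# Raising T rejects more noise but also discards the weakest true edges;
# the surviving pixels form strips that still need thinning to contours.
```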
[...] (19.20), after choosing a value of σ, and then convolving with the image. If the filter extent is not small, it is usually more efficient to work in the frequency domain by multiplying the discrete Fourier transforms of the filter and the image, then inverse transforming the result. The fast Fourier transform is the method of choice for computing these transforms. Although the discrete form of Eq. (19.20) is a [...]

[...] and the nearest filter is 15°. As discussed previously, edges can be detected from either the maxima of the gradient magnitude or the zero crossings of the second derivative. Another way to realize the essence of Canny's method is to look for zero crossings of the second directional derivative taken along the gradient direction. Let us examine the mathematical basis for this. If $\mathbf{n}$ is a unit vector in the [...]

[...] then chosen. When applied to the Gaussian-smoothed image, this filter produces an estimate of gradient magnitude at that pixel. One may instead apply simpler methods, such as the central difference operator, to estimate the partial derivatives. The initial Gaussian smoothing step makes additional smoothing along the edge, as with the Prewitt or Sobel operators, completely unnecessary. Next, the goal is to [...]

[...] and the second term perpendicular to it. The first term then generates no response at all because it acts only along the edge. The second term produces a zero crossing at the edge position along its edge-crossing profile. An edge detector based solely on the zero crossings of the continuous Laplacian produces closed edge contours if the image, $f(x, y)$, meets certain smoothness constraints [28]. The contours [...]

[...] because the central difference operator inherently constrains its zero crossing to an exact pixel location. The other difficulty caused by the first difference is its noise sensitivity. In fact, both the first- and central-difference derivative estimators are quite sensitive to noise. The noise problem can be reduced somewhat by incorporating smoothing into each filter in the direction normal to that of the difference [...]

[...] of the actual edge. The effect of the derivative of Gaussian is to prevent multiple responses by smoothing the truncated signum in order to permit only one response peak in the edge neighborhood. The choice of variance for the Gaussian kernel controls the filter width and the amount of smoothing. This defines the width of the neighborhood in which only a single peak is to be allowed. The variance selected [...]

[...] edge points tends to form strips, which have positive width. Since the desire is usually for zero-width boundary segments or contours to describe the edges, a subsequent processing stage is needed to thin the strips to the final edge contours. Edge contours derived from continuous-space images should have zero width because any local maxima of $|\nabla f_c(x, y)|$ along a line segment that crosses the edge cannot [...]

[...] photoreceptor cell in the mammalian retina has a roughly Gaussian shape. The photoreceptor output feeds into horizontal cells in the adjacent layer of neurons. Each horizontal cell averages the responses of the receptors in its immediate neighborhood, producing a Gaussian-shaped impulse response with a higher σ than that of a single photoreceptor.
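The frequency-domain filtering route in the first fragment above can be made concrete. The sketch below samples a Laplacian-of-Gaussian kernel (our stand-in for Eq. (19.20), whose exact form is not shown in this excerpt), applies it by multiplying DFTs, and marks the zero crossings; the kernel size, σ, and test image are invented for the example.

```python
import numpy as np

def log_kernel(size, sigma):
    """Sampled Laplacian-of-Gaussian filter (standard form, up to scale)."""
    r = np.arange(size) - size // 2
    xx, yy = np.meshgrid(r, r)
    s2 = sigma ** 2
    g = np.exp(-(xx**2 + yy**2) / (2.0 * s2))
    return (xx**2 + yy**2 - 2.0 * s2) / (s2 ** 2) * g

def filter_via_fft(image, kernel):
    """Convolve by pointwise multiplication of DFTs (circular convolution).

    The uncentered kernel shifts the output by size // 2 pixels; that is
    acceptable for this sketch but would need correction in practice.
    """
    H = np.fft.fft2(kernel, s=image.shape)   # zero-pad kernel to image size
    F = np.fft.fft2(image)
    return np.real(np.fft.ifft2(F * H))

def zero_crossings(response):
    """Mark pixels where the filtered image changes sign along rows/columns."""
    zc = np.zeros(response.shape, dtype=bool)
    zc[:, :-1] |= response[:, :-1] * response[:, 1:] < 0
    zc[:-1, :] |= response[:-1, :] * response[1:, :] < 0
    return zc

# Toy usage: a bright square on a dark background.
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0
edges = zero_crossings(filter_via_fft(img, log_kernel(17, sigma=2.0)))
```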