Document: Image Databases P13 pptx

Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright © 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)

13 Shape Representation for Image Retrieval

BENJAMIN B. KIMIA
Brown University, Providence, Rhode Island

13.1 INTRODUCTION

The human–machine interface is evolving at an incredible pace, surpassing the traditional text-based boundaries. A driving force motivating this development is the need to endow computers with capabilities that parallel our perceptual abilities. Vision is arguably our most significant sense, giving rise to efforts to empower computers to represent, process, understand, and act on visual imagery. As a result, images are being generated at a mind-boggling pace from a variety of sources. Terabytes of data are being generated in the form of aerial imagery, surveillance images, mug shots, fingerprints, trademarks and logos, graphic illustrations, engineering line drawings, documents, manuals, medical images, images from sports events, documentation of environmental resources in the form of images, and entertainment industry photos and videos [1–7].

Clearly, the management of such databases must rely on the perceptual and cognitive dimensions of the visual space, namely, color, texture, shape, and so on. The basic premise is that there exist qualitative aspects of images that can be used to retrieve images without fully specifying them.

The use of shape as a cue is less developed than the use of color or texture, mainly because of the inherent complexity of representing it. Yet, retrieval-by-shape has the potential of being the most effective search technique in many application fields. This chapter reviews and discusses the representation of shape as a cue for indexing image databases.
The central question is how complete or partial information regarding a shape in an image can be represented so that it can be easily extracted, matched, and retrieved. Specifically, five key items must be addressed:

Image and Query Preparation. How are shapes extracted from images? The segregation of figure from ground is rather straightforward in images that are binary or have a bi-level histogram, but usually difficult otherwise. As a consequence, a wide spectrum of shape-extraction techniques has been developed, ranging from segmenting the image to extracting related lower-level features, such as edges, that yield a partial representation of shape. Query formulation and shape extraction are therefore inherently related. The query-specification mechanism provided by the user interface (sketch drawing, query-by-example, query-by-keyword, spatial-layout specification, and so on) must closely match the shape extraction process, and, in particular, emphasize the specific representation of shape used during the search.

Shape Representation. How is shape represented? Is there “invariance” to a class of transformations? Is the representation contour-based or region-based? Is it based on local features or global attributes? Do parts play a role? Is the spatial relationship among parts or features represented explicitly? Is the representation multiscale?

Shape Similarity and Matching. How are the query and database items matched? Is the matching based on geometric hashing, graph matching, energy minimization, probabilistic formulation, and so on? How is the similarity between two objects represented?

Indexing and Retrieval. How is the database organized? Are prototypes or categories used? Do models guide the retrieval process?

Validation. How well does each approach perform in terms of accuracy and precision? How efficient is the retrieval?
This chapter focuses on the second item, namely, the issue of shape representation, although this necessarily requires a discussion of the remaining items. We will begin by citing some examples in which indexing by shape content is used, followed by a discussion on how the database of shapes and the related image queries are prepared. Next, we will discuss the main issues pertaining to shape representation. This is followed by a brief discussion of matching and shape similarity as it pertains to the nature of the underlying representation.

13.2 WHERE IS INDEXING BY SHAPE RELEVANT?

Although it is inherently difficult to characterize and manipulate, shape is a significant cue for describing objects. Despite the difficulty of capturing a computational notion of shape, an increasing number of applications have used it as a primary cue for indexing, a few of which (illustrated in Fig. 13.1) are now briefly reviewed.

Trademarks and logos are often distinguished by their specific shapes. Patent application offices must avoid duplication partly by checking the similarity in shape with previously used forms. ARTISAN is an example of a system that uses shape to retrieve trademarks [8–10]. Numerous shape-representation techniques (described in Section 13.4) have been applied to trademark and logo retrieval, including geometric invariant signatures [11], string matching of the contour chain code [12], and combinations of moment invariants and Fourier descriptors [13,14].

Figure 13.1. Examples of shapes for indexing into a database: trademarks and logos, medical structures, drawings, fingerprints, face profiles, and signatures.

In the medical domain, shape is used as a cue to describe the similarity of medical scans.
Applications include detecting emphysema in high-resolution CT scans of the lung [15,16], classifying deformations arising from pathological changes as evident in dental radiographs (e.g., for periapical disease), and retrieving tumors [17]. Several image query systems supporting retrieval-by-shape have been developed [18,19].

Shape also plays a key role in the management of document databases. Sample applications include the retrieval of architectural drawings, computer-generated technical drawings [20], character bitmaps (e.g., Chinese characters) [21], technical drawings of machine parts (e.g., aircraft parts), clipart, and graphics.

Law enforcement and security form another application area for retrieval of images by shape. Fingerprint matching [22] is used in automatic personal identification for criminal identification by law-enforcement agencies, access control to restricted facilities, credit card user identification, and other applications. A fingerprint database is often very large, on the order of hundreds of millions of fingerprint codes, and requires indexing into terabytes of data.

Earth Science applications of retrieval-by-shape include indexing databases of auroras [23].

Other applications include art and art history [24], electronic shopping, multimedia systems for museums and archaeology, defense, entertainment, and so on.

13.3 IMAGE PREPARATION AND QUERY FORMULATION

The questions of how images must be prepared prior to storage in a database and how queries can be formulated are both intimately connected with how shape is used as an indexing mechanism. In principle, a complete representation of a two-dimensional shape is provided by its contour. The contour is a continuous curve in the plane, and can be specified by a large number of points.
Clearly, such a voluminous representation of shape cannot be effectively used for similarity retrieval, and partial representations capturing its salient aspects are used in practice. These partial representations range from very simple (for example, a shape can be approximated by an ellipse and represented just by its elongation) to very complex (for example, the contour could be approximated by a piecewise polynomial representation). The specific application imposes requirements on the richness of the representation.

When a complete description of shape is used in the indexing scheme, the image must be segmented and entire shapes must be stored. This process is quite straightforward when images contain binary or nearly binary shapes, such as trademarks, logos, bitmaps of characters, signatures, clip art, designs, drawings, graphics, and so on. In general, however, the task of figure-ground segregation is formidable, as is evident from the relatively large “segmentation” literature in computer vision and image processing. Nevertheless, in certain domains automatic segmentation has been used. For example, Gunsel and Tekalp [25] address the segmentation, or figure-background separation problem, by a combination of methods. A color histogram intersection method [26] is used to eliminate database objects with significantly different color from the query object. Boundaries are estimated using either the Canny edge detector [27] or the graduated nonconvexity (GNC) algorithm [28,29].

As a result of the difficulty of figure-ground segregation, partial representations are often used when application requirements permit. The most common methods rely on edge content, which is indicative of shape boundary. A brief historical sequence that samples these methods is presented here. Hirata and Kato [30,31] performed a pixel-by-pixel edge-content comparison of a query and shifted image blocks and used the resulting “edge similarity score” to find the best match.
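A pixel-by-pixel comparison against shifted blocks can be sketched roughly as follows. This is only an illustration of the general idea, not the scoring of Refs. [30,31]: the function name `block_shift_score`, the block size, the shift range, and the choice of "count of coinciding edge pixels" as the score are all assumptions made for the example.

```python
def block_shift_score(query, image, block=4, max_shift=1):
    """Sum, over blocks of the query edge map, of the best overlap with the
    image edge map under small horizontal and vertical shifts.

    query, image: equally sized lists of rows holding 0/1 edge flags.
    block, max_shift: illustrative parameters (assumed, not from the source).
    """
    rows, cols = len(query), len(query[0])
    total = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            best = 0
            # Try every shift of this block and keep the best overlap.
            for dr in range(-max_shift, max_shift + 1):
                for dc in range(-max_shift, max_shift + 1):
                    overlap = 0
                    for r in range(r0, min(r0 + block, rows)):
                        for c in range(c0, min(c0 + block, cols)):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < rows and 0 <= cc < cols:
                                overlap += query[r][c] & image[rr][cc]
                    best = max(best, overlap)
            total += best
    return total


# A vertical stroke in the query and the same stroke one pixel to the right.
query = [[1 if c == 3 else 0 for c in range(8)] for _ in range(8)]
image = [[1 if c == 4 else 0 for c in range(8)] for _ in range(8)]
print(block_shift_score(query, image, max_shift=0))  # no tolerance to shifts
print(block_shift_score(query, image, max_shift=1))  # one-pixel tolerance
```

Because each block picks its own best shift independently, the score tolerates small local displacements, but it still counts individual coinciding pixels, which is the property the critique that follows is aimed at.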
Gray [32] evaluated this approach and concluded that its fundamental weakness is the “pixel-by-pixel” nature of the comparison, which produces multiple false matches. DelBimbo [33] introduced the notion of flexible matching for indexing, which allows for significant deviations of the sketch from the edge map. Rectangular regions of interest are identified for images containing well-delineated objects, and a gradient-descent method detects object boundaries from edge maps. Chan and coworkers [34] extend the pixel-by-pixel approach to correlation of “curvelets” by grouping edge pixels into edge elements using the Hough transform, by modeling grouped edges as curvelets using implicit polynomial (IP) models [35], and by computing the similarity between a pair of IP curvelets. Other approaches augment the edge content by making the relative spatial arrangement of edges explicit. This evolution from local models of edge content to those that incorporate more of the global geometry, namely, deformable templates, curves, and the inclusion of relative spatial arrangement, indicates a move toward more complete descriptors of shape.

Query formulation must closely match the underlying shape representation: a query shape specified by a user is necessarily an approximation of the shape the user is trying to communicate. Thus, a neighborhood of shapes is implicitly being presented to the system for a match, as determined by the underlying representation of the query. The requirement of indexing robustness with variations in the underlying representation of shape motivates the use of identical representations for the query and for the indexing mechanism.

13.4 REPRESENTATION OF SHAPE

As mentioned in the previous section, only approximate representations of shape are practically usable for image retrieval. There is clearly a trade-off between the complexity of the representation and its ability to capture different aspects of shape.
However, the elusive nature of shape makes it almost impossible to formally analyze this trade-off. As a consequence, shape has been represented using a variety of descriptors such as moments, Fourier descriptors (FD), geometric and algebraic invariants, polygons, polynomials, splines, strings, deformable templates, skeletons, and so on, for both object recognition and for indexing of image databases. Each of these representations aims at capturing specific perceptually salient dimensions of the qualitative aspects of shape. Because of the heterogeneous nature of the aspects captured, it is not possible to compare different descriptors outside the context of very specific applications.

Shape comparison is also a very difficult problem. It is well established that neither mathematical descriptions based on differential geometry [36], mathematical morphology [37], or statistics [38], nor formal metrics for shape comparison [39,40], fully capture the salient aspects of shape.

The key observation is that shape, a construct of the projected object that is a perceptual invariant of the object, is multifaceted. Existing approaches can be organized according to the particular facets that have been targeted in the representation. We specifically analyze several dimensions: we distinguish first between methods that describe the boundary and methods that describe the interior; we then contrast global and local representations; we differentiate between composition-based and deformation-based approaches; we discuss representations of shape at multiple scales; we categorize shape representations by their completeness; and finally, we distinguish between the descriptions of isolated shapes and of shape arrangements.

13.4.1 Boundary Versus Interior

Two large categories of shape descriptors can be identified: those capturing the boundary (or contour appearance) and those characterizing the interior region.
Boundary representations emphasize the closed curve that surrounds the shape. This curve has been described by numerous models, including chain codes [41], polygons [42–46], circular arcs [9], splines [47–49], explicit and implicit polynomials [35,50], and boundary Fourier descriptors. Alternatively, a boundary can be described by its features, for example, curvature extrema and inflection points [51,52].

Interior descriptions of shape, on the other hand, emphasize the “material” within the closed boundary. The interior has been modeled in a variety of ways, including collections of primitives [53] (rectangles, disks, superquadrics, etc.), deformable templates [54–56], modes of resonance, skeletal models, or simply as a set of points (as in mathematical morphology).

Each description, whether boundary-based or region-based, is intuitively appealing and corresponds to a perceptually meaningful dimension. Clearly, each representation is complete, and can be used as a basis to compute the other, that is, by filling in the interior region or by tracing the boundary. Although the two representations are interchangeable in the sense of information content, the issue of which aspects of shape have been made explicit matters to the subsequent phases of the computation. For example, in boundary-based models, features such as curvature and arc length are immediately available; in region-based methods, the explicit features are quite different and include spatial relationships among shape features (for example, the shortest regional distance used in determining a neck). Shape features that are represented explicitly will generally permit more efficient retrieval when these particular features are queried.

Because both contours and interiors correspond to meaningful perceptual dimensions, an ideal representation would include both, enabling a full range of queries. We now consider examples utilizing either contours, interiors, or both, in their representation of shape.
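To make the point about explicit features concrete, the following minimal sketch starts from a boundary given as a polygon, reads off arc length and a discrete stand-in for curvature directly, and then recovers an interior description by filling. The polygonal input, the use of vertex turning angles in place of curvature, the even-odd fill at pixel centers, and all function names are assumptions made for this illustration.

```python
import math


def arc_length(poly):
    """Perimeter of a closed polygon given as a list of (x, y) vertices."""
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))


def turning_angles(poly):
    """Signed exterior angle at each vertex (a discrete curvature proxy)."""
    n = len(poly)
    angles = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = poly[i - 1], poly[i], poly[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0          # incoming edge
        bx, by = x2 - x1, y2 - y1          # outgoing edge
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles


def fill_interior(poly, width, height):
    """Boundary -> region: even-odd point-in-polygon test at pixel centers."""
    n = len(poly)
    grid = [[0] * width for _ in range(height)]
    for r in range(height):
        y = r + 0.5
        for c in range(width):
            x = c + 0.5
            inside = False
            for i in range(n):
                (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
                if (y1 > y) != (y2 > y):   # edge crosses the scan line
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            grid[r][c] = int(inside)
    return grid


rectangle = [(1, 1), (5, 1), (5, 4), (1, 4)]   # 4 x 3, counter-clockwise
print(arc_length(rectangle))
print(turning_angles(rectangle))
print(sum(map(sum, fill_interior(rectangle, 6, 5))))  # filled pixel count
```

The boundary functions never touch the interior and the fill never needs curvature, which mirrors the observation that the two views carry the same information but make different features cheap to query.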
13.4.1.1 Boundary Representations of Shape. Grosky and Mehrotra [6,57] represent shape as an ordered set of boundary features, encoded as a polygonal approximation. Shape similarity is the distance between two boundary feature vectors. Eakins and coworkers [8–10] represent boundaries with circular polyarcs and discontinuities.

In the query-by-visual-example (QVE) system [30], a boundary-based approach is followed: edges are extracted, thinned, binarized, and stored in a 64 × 64 binary edge map. A user query, which is formulated as a sketch, is similarly represented but viewed as a collection of 64 blocks (8 × 8). The sketch is correlated with the edge map in each block, allowing for one- to four-pixel horizontal and vertical shifts, thus effectively building some tolerance against deformation and warping.

The approach in DelBimbo and coworkers [48] is one of matching user sketches, which represent the boundaries of the object of interest. They argue that straightforward correlation measures, such as those used in QVE [30], produce good matches only when sketches are drawn exactly. In QVE, the lack of an exact match between a sketch and a set of image edges is tolerated only to some extent by allowing for limited horizontal and vertical shifts. In Ref. [48], the approach relies on a different measure of similarity in which the sketches are allowed to elastically deform. The sketch is deformed to adjust to the shape of target models; the extent of the final match and the elastic deformation energy are used as a measure of shape similarity. Specifically, the one-dimensional sketched template is modeled by a second-order spline and parameterized by arc length. The sketch is then allowed to act as an active contour (or snake) [58], namely, it is allowed to deform to maximize the edge strength integral along the curve, at the same time minimizing the strain and bending energies.
These energies are typically modeled by integrals of the first and second derivatives of the deformation along the curve. Shape similarity is then measured as a combination of strain and bending energy, edge strength function along the curve, curve complexity, and correlation between certain functions classified by a back-propagation neural network subject to appropriate training (Fig. 13.2). This approach is translation-invariant, but requires template scaling and rotation.

Figure 13.2. This figure from Ref. [59] illustrates the use of deformable models in matching user-drawn sketches to shapes in images.

Kliot and Rivlin [11] represent a binary shape via the local multivalued invariant signatures of its boundary. First, edge contours are traced and described as a set of geometric entities, such as circles, ellipses, and straight lines. Then, the relative position of these geometric entities is described via a containment tree in which each directed edge points to a curve contained in the current curve. Finally, each curve is represented by an invariant signature, which is essentially the derivative of the curve in a transform-invariant parameterization [60,61].

The shape representation by Gunsel and Tekalp [25] uses edge features obtained by either the Canny edge detector [27] or the graduated nonconvexity (GNC) algorithm [28]. If boundaries are closed, the method organizes the edges as B-splines [49,62]; otherwise, it represents them as a set of feature points. The advantages of the B-spline representation are the reduction of data volume to a small number of control points, affine invariance, and robustness to noise because of inherent smoothing.

Jain and Vailaya [63] represent edge directions in a histogram, which is used as a shape feature. An alternate representation of shape boundary is a series of 2D strings, as presented in Refs. [64–66].

13.4.1.2 Interior Representations of Shape.
Jagadish [67] represents a shape by a fixed number of largest rectangles covering the shape. This allows a shape to be represented by a finite number of numeric parameters, which are mapped to a point in a multidimensional space, and indexed by a point-access method (PAM, Chapter 14).

Pentland and Sclaroff propose a physically motivated modal representation in which the low-order vibration modes of a shape are used as its representation [68–70]. For a related approach, see Ref. [71].

A class of rather intuitive representations of shape relies on the axis of symmetry between a pair of boundaries. The earliest use of this representation is by Blum [72], who defined the medial axis as a locus of inscribed circles that are maximal in size. The trace of this representation, typically known as a skeleton, is usually represented by a graph and used in Refs. [73,74].

The symmetry set is the locus of bitangent circles; its definition is identical to that of the medial axis minus the maximality condition. Thus, the medial axis is a subset of the symmetry set. However, although it appears that the symmetry set contains more information than the medial axis, the additional branches of the symmetry set are in fact redundant. Furthermore, their presence creates numerous difficulties for indexing when shapes undergo slight perturbations.

The shock set is another variant of the medial axis and is based on the notion of propagation from boundaries, much like a “grassfire” initiated from the boundaries of a field. Shocks are singularities that form from the collision of fronts. These shocks flow along with the wave-front itself [39,75–77]. This addition of a sense of flow or dynamics to each point of the medial axis and grouping of monotonically flowing shocks into branches leads to a shock graph, which is analogous to a skeletal graph, but is a finer partition of the medial axis. The shock graph has been used for indexing and recognition of shapes [74,78–84].
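The medial-axis definition above (centers of maximal inscribed circles) has a simple discrete analogue that may help fix ideas. The sketch below is a pixel-grid approximation and not any of the cited methods: it assumes a binary grid, replaces Euclidean circles with city-block "discs", treats everything outside the grid as background, and uses hypothetical function names. It first propagates distance inward from the boundary in two raster passes, loosely in the spirit of the grassfire picture, and then keeps the pixels whose distance is not exceeded by any 4-neighbor.

```python
def cityblock_distance(grid):
    """Two-pass city-block distance from each foreground pixel to the
    nearest background pixel; cells outside the grid count as background."""
    rows, cols = len(grid), len(grid[0])
    inf = rows + cols                      # safe upper bound on any distance
    d = [[inf if grid[r][c] else 0 for c in range(cols)] for r in range(rows)]
    for r in range(rows):                  # forward pass: from top-left
        for c in range(cols):
            if d[r][c]:
                up = d[r - 1][c] if r > 0 else 0
                left = d[r][c - 1] if c > 0 else 0
                d[r][c] = min(d[r][c], up + 1, left + 1)
    for r in reversed(range(rows)):        # backward pass: from bottom-right
        for c in reversed(range(cols)):
            if d[r][c]:
                down = d[r + 1][c] if r < rows - 1 else 0
                right = d[r][c + 1] if c < cols - 1 else 0
                d[r][c] = min(d[r][c], down + 1, right + 1)
    return d


def maximal_disc_centers(grid):
    """Foreground pixels whose distance value is a 4-neighborhood maximum,
    used here as a discrete stand-in for medial-axis points."""
    d = cityblock_distance(grid)
    rows, cols = len(grid), len(grid[0])
    centers = set()
    for r in range(rows):
        for c in range(cols):
            if d[r][c] == 0:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if all(not (0 <= rr < rows and 0 <= cc < cols)
                   or d[rr][cc] <= d[r][c] for rr, cc in neighbors):
                centers.add((r, c))
    return centers


bar = [[1] * 7 for _ in range(3)]          # a 3 x 7 solid rectangle
print(sorted(maximal_disc_centers(bar)))
```

For the solid bar, the selected pixels form the central row plus the four corner pixels, a coarse counterpart of the spine and corner branches of a rectangle's skeleton. The result is a set of pixels rather than a graph; linking them into branches is a separate step that this sketch does not attempt.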
13.4.2 Local Versus Global

Shape can also be viewed either from a local or from a global perspective. Many early models in indexing by shape content used features such as moments, eccentricity, area, and so on, which are typically based on the entire shape and are thus global. Similarly, Fourier descriptors of two-dimensional shape are global descriptors. On the other hand, local representations restrict computations to small neighborhoods of the shape. For example, a representation based on curvature extrema and inflection points of the boundary is local. Purely global representations are affected by variations, such as partial occlusion and articulation, whereas purely local representations are sensitive to noise. Ideally, our ability to focus on either facet implies that both must be emphasized in the representation for successful and intuitive indexing by shape.

The binary edge map used in the query by image content system (QBIC) [4,85] is an example of global shape representation. Here, edges are extracted (either manually or automatically) and represented as a binary edge map from which twenty-two global features are extracted (area, circularity, eccentricity, the major axis, and a set of associated algebraic moments up to degree 8). A Karhunen-Loeve (KL) transform reduces the dimensionality of the feature space.

Transform-based methods are also typically global: Fourier descriptors [86,87], frequency subband decomposition, coefficients of the 2D Discrete Fourier Transform (DFT) [88], Wavelet Transform [89], Karhunen-Loeve Transform [19], and others all encode global measures.

Orientation radiograms [90] project an image onto an axis by integrating image intensities along lines orthogonal to that axis. This results in a histogram for each of the four or eight orientations of the axis used. The representation is global because local variations are not explicitly captured in the projected profile.
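As one concrete instance of the global descriptors listed above, boundary Fourier descriptors can be sketched in a few lines. This is a common textbook normalization rather than the specific schemes of Refs. [86,87]: it assumes the contour is a closed list of roughly equally spaced points traversed counter-clockwise, and the function name and the number of retained coefficients are illustrative choices.

```python
import cmath
import math


def fourier_descriptor(contour, n_coeffs=4):
    """Magnitudes of low-order DFT coefficients of a closed contour,
    normalized against translation, scale, rotation, and starting point.

    contour: list of (x, y) samples, counter-clockwise (assumed).
    """
    n = len(contour)
    if n < n_coeffs + 2:
        raise ValueError("contour has too few samples")
    z = [complex(x, y) for x, y in contour]
    coeffs = [
        sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(n)
    ]
    scale = abs(coeffs[1])                 # first harmonic sets the size
    if scale == 0:
        raise ValueError("degenerate contour (zero first harmonic)")
    # Skipping coeffs[0] removes translation; magnitudes remove rotation and
    # the choice of starting point; dividing by |coeffs[1]| removes scale.
    return [abs(coeffs[k]) / scale for k in range(2, 2 + n_coeffs)]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
a = cmath.rect(3.0, 0.7)                   # scale by 3, rotate by 0.7 rad
moved = [((a * complex(x, y)).real + 5.0, (a * complex(x, y)).imag - 2.0)
         for x, y in square]
moved = moved[3:] + moved[:3]              # start at a different point
print(fourier_descriptor(square))
print(fourier_descriptor(moved))           # agrees up to rounding error
```

Every coefficient is a sum over the whole contour, so a change anywhere on the boundary (for instance, a partial occlusion) perturbs all entries of the descriptor, which is the sense in which such features are global.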
Grosky and Mehrotra represent boundary features by a property vector, which is matched using a string edit-distance similarity measure [6]. They use an m-way search tree-based index structure to organize boundary features.

A few approaches cannot be easily characterized as either global or local. These include local differential invariants [91] and semidifferential invariants [61,92,93]. Shyu and coworkers [94] discuss and compare the utility of local and global features in the context of a medical application [15].

Wang and coworkers [95] note the limited discrimination capability of global features, on the one hand, and the noise sensitivity of local features, on the other. They propose combining both: two global features (shape elongation and compactness) serve as a filter to eliminate the images most dissimilar to the query template, and local features then refine the search. Recall that the elongation of a shape is the ratio of the eigenvalues of the covariance matrix of the contour-point coordinates, and compactness is the ratio of perimeter squared to area. Both measures are invariant under Euclidean (i.e., rotation plus translation) and scaling transformations. Wang and coworkers define a set of local features, referred to as interest points, which are a small subset of the contour points derived by a pairwise growing algorithm. First, a pair of contour points with maximal distance from each other is selected. Then a second pair farthest from the line connecting the first pair is chosen. The latter part of this process is repeated for each adjacent pair of points until a sufficient number of interest points have been obtained. Finally, the coordinates of the interest points are converted through a normalized affine-invariant transformation [96].

13.4.3 Composition of Parts Versus Deformation

Shape can also be viewed either as the composition of simpler, elementary parts, or as the deformation of simpler shapes.
In the “part-based view,” shapes are composed of simple components; for example, a tennis racket is easily described as an elliptical head attached to a rectangular handle, and a hand is seen as four fingers and one thumb attached to a palm. Superquadrics [53] represent a rich space of shape primitives from which to choose [97].

The partitioning can be based on either global fit or local evidence. An example of global fit is the minimum description length (MDL) approach. Here, a shape is represented as a combination of primitives selected from a collection; for each combination, two quantities are computed: the fitting error and the encoding cost. The encoding cost (expressed in “bits”) is called the description length, and measures the complexity of describing the combination. The overall energy is defined as an increasing function of both the fit error and the description length (e.g., a weighted average). The representation with the lowest energy is selected. Representations with few simple parts have a short description length but can also have a poor fit; complex representations better approximate the shape but have a long description length. The method therefore selects the one that optimizes a linear combination of fit and description length [98].

Shape can also be decomposed into parts based on “local” evidence. Properties of the boundary belong to this category. For example, the boundary can be decomposed into codons along negative minima of its curvature [51,99–101] or by taking into account regional properties, such as good continuation of tangents [102]. The latter approach has been shown to produce parts that are perceptually meaningful [103].

The “part-based” methodology is not universally applicable. Biological shapes, such as the corpus callosum boundary in the brain, leaves, animal limbs, and so on, are often best described as the deformation of a simpler shape.
This morph-based view has given rise to deformable templates [55,104–106], modal representation [69], and so on.

Deformable templates are representations in which shape variability is captured by allowable transformations of a template. Generally, two forms of deformable shape models have been proposed, which differ in whether the model itself or the deformation of the model is parameterized.

Parameterized (geometric) models use an underlying representation that has a few variable parameters. For example, Yuille and coworkers [73] use conic curve segments as templates for the eyes and the mouth in face recognition. The parameters of the conic allow for shape variations. As another example, Staib and Duncan [107] use elliptical Fourier descriptors to represent boundary templates. Superquadrics provide yet another example of parameterized shape models [97].

Parametric-deformation approaches represent the object by fitting it to a fixed template, using a set of allowable parametric deformations. For example, Jain and coworkers [108] represent the template shape via a bitmap and impose a probability distribution (a Bayesian prior) on the admissible mappings. Matching then reduces to selecting the transformation that minimizes a Bayesian objective function. This class of methods also contains approaches based on skeletons [21], deformable templates [47,48,108], the methods by Grenander and

Posted: 26/01/2014, 15:20
