
Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright © 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)

10 Introduction to Content-Based Image Retrieval — Overview of Key Techniques

YING LI and C.-C. JAY KUO
University of Southern California, Los Angeles, California

X. WAN
U.S. Research Lab, San Jose, California

10.1 INTRODUCTION

Advances in modern computer and telecommunication technologies have led to huge archives of multimedia data in diverse application areas such as medicine, remote sensing, entertainment, education, and on-line information services. This is similar to the rapid increase in the amount of alphanumeric data during the early days of computing, which led to the development of database management systems (DBMS). Traditional DBMSs were designed to organize alphanumeric data into interrelated collections so that information retrieval and storage could be done conveniently and efficiently. However, this technology is not well suited to the management of multimedia information. The diversity of data types and formats, the large size of media objects, and the difficulties in automatically extracting semantic meanings from data are entirely foreign to traditional database management techniques. To use this widely available multimedia information effectively, efficient methods for storage, browsing, indexing, and retrieval [1,2] must be developed. Different multimedia data types may require specific indexing and retrieval tools and methodologies. In this chapter, we present an overview of indexing and retrieval methods for image data.

Since the 1970s, image retrieval has been a very active research area within two major research communities: database management and computer vision. These communities study image retrieval from two different angles. The first is primarily text-based, whereas the second relies on visual properties of the data [3].

Text-based image retrieval can be traced back to the late 1970s. At that time, images were annotated with key words, which were stored as retrieval keys in traditional databases. Some relevant research in this field can be found in Refs. [4,5]. Two problems render manual annotation ineffective when the size of image databases becomes very large. The first is the prohibitive amount of labor involved in image annotation. The other, probably more essential, results from the difficulty of capturing the rich content of images using a small number of key words, a difficulty which is compounded by the subjectivity of human perception.

In the early 1990s, because of the emergence of large-scale image collections, content-based image retrieval (CBIR) was proposed as a way to overcome these difficulties. In CBIR, images are automatically indexed by summarizing their visual contents through automatically extracted quantities or features such as color, texture, or shape. Thus, low-level numerical features, extracted by a computer, are substituted for higher-level, text-based manual annotations or key words. Since the inception of CBIR, many techniques have been developed along this direction, and many retrieval systems, both research and commercial, have been built [3]. Note that ideally CBIR systems should automatically extract (and index) the semantic content of images to meet the requirements of specific application areas.
Although it seems effortless for a human being to pick out photos of horses from a collection of pictures, automatic object recognition and classification are still among the most difficult problems in image understanding and computer vision. This is the main reason why low-level features such as colors [6–8], textures [9–11], and shapes of objects [12,13] are widely used for content-based image retrieval. However, in specific applications, such as medical or petroleum imaging, low-level features play a substantial role in defining the content of the data.

A typical content-based image retrieval system is depicted in Figure 10.1 [3]. The image collection database contains raw images for the purpose of visual display. The visual feature repository stores the visual features extracted from the images to support content-based image retrieval. The text annotation repository contains key words and free-text descriptions of images. Multidimensional indexing is used to achieve fast retrieval and to make the system scalable to large image collections.

The retrieval engine includes a query interface and a query-processing unit. The query interface, typically employing graphical displays and direct manipulation techniques, collects information from users and displays retrieval results. The query-processing unit is used to translate user queries into an internal form, which is then submitted to the DBMS. Moreover, in order to bridge the gap between low-level visual features and high-level semantic meanings, users are usually allowed to communicate with the search engine in an interactive way. We will address each part of this structure in more detail in later sections.

Figure 10.1. An image retrieval system architecture (feature extraction, image collection, visual features, text annotation, multidimensional indexing, and a retrieval engine consisting of a query interface and a query-processing unit serving the user).

This chapter is organized as follows. In Section 10.2, the extraction and integration of some commonly used features, such as color, texture, shape, object spatial relationship, and so on, are briefly discussed. Some feature indexing techniques are reviewed in Section 10.4. Section 10.5 provides key concepts of interactive content-based image retrieval, and several main components of a CBIR system are also discussed briefly. Section 10.6 introduces a new work item of the ISO/MPEG family, called the "Multimedia Content Description Interface," or MPEG-7 for short, which defines a standard for describing multimedia content features and descriptors. Finally, concluding remarks are drawn in Section 10.7.

10.2 FEATURE EXTRACTION AND INTEGRATION

Feature extraction is the basis of CBIR. Features can be categorized as general or domain-specific. General features typically include color, texture, shape, sketch, spatial relationships, and deformation, whereas domain-specific features are applicable in specialized domains such as human face recognition or fingerprint recognition. Each feature may have several representations. For example, the color histogram and color moments are both representations of the image color feature. Moreover, numerous variations of the color histogram itself have been proposed, each of which differs in the selected color-quantization scheme.
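As a minimal illustration of this last point, the sketch below (not part of the original chapter; plain Python with NumPy) computes two different representations of the same color feature for one image: a coarsely quantized RGB histogram and the first three color moments of each channel. The input array and the choice of 4 bins per channel are assumptions made only for this example.

import numpy as np

def color_histogram(img, bins_per_channel=4):
    """Coarse RGB histogram: uniformly quantize each channel and count pixels per color cell."""
    h, w, _ = img.shape
    q = (img // (256 // bins_per_channel)).reshape(-1, 3)   # quantized color index per pixel
    codes = q[:, 0] * bins_per_channel**2 + q[:, 1] * bins_per_channel + q[:, 2]
    hist = np.bincount(codes, minlength=bins_per_channel**3)
    return hist / (h * w)                                    # normalize so histograms are comparable

def color_moments(img):
    """First three moments (mean, standard deviation, skewness) of each RGB channel."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])                 # 9-dimensional descriptor

# Hypothetical usage on a random "image"
img = np.random.randint(0, 256, size=(128, 128, 3), dtype=np.uint8)
print(color_histogram(img).shape)   # (64,)  -> 4 x 4 x 4 color cells
print(color_moments(img).shape)     # (9,)

Both functions describe the same underlying color content, but the histogram grows with the chosen quantization while the moment descriptor stays fixed at nine numbers, which is why the two representations trade off discriminating power against compactness.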
10.2.1 Feature Extraction

10.2.1.1 Color. Color is one of the most recognizable elements of image content [14] and is widely used in image retrieval because of its invariance with respect to image scaling, translation, and rotation. The key issues in color feature extraction include the color space, color quantization, and the choice of similarity function.

Color Spaces. The commonly used color spaces include the RGB, YCbCr, HSV, CIELAB, CIEL*u*v*, and Munsell spaces. The CIELAB and CIEL*u*v* color spaces usually give better performance because of their improved perceptual uniformity with respect to RGB [15]. MPEG-7 XM V2 supports the RGB, YCbCr, and HSV color spaces, as well as some linear transformation matrices with reference to RGB [16].

Color Quantization. Color quantization is used to reduce the color resolution of an image. Using a quantized color map can considerably decrease the computational complexity during image retrieval. The commonly used color-quantization schemes include uniform quantization, vector quantization, tree-structured vector quantization, and product quantization [17–19]. In MPEG-7 XM V2 [16], three quantization types are supported: linear, nonlinear, and lookup table.

10.3 SIMILARITY FUNCTIONS

A similarity function maps a pair of feature vectors to a positive real-valued number chosen to be representative of the visual similarity between two images. Let us take the color histogram as an example. There are two main approaches to histogram formation. The first is based on the global color distribution across the entire image, whereas the second consists of computing the local color distribution for a certain partition of the image. These two techniques are suitable for different types of queries. If users are concerned only with the overall colors and their amounts, regardless of their spatial arrangement in the image, then indexing using the global color distribution is useful. However, if users also want to take into consideration the positional arrangement of colors, the local color histogram will be a better choice.

A global color histogram represents an image I by an N-dimensional vector, H(I) = [H(I, j), j = 1, 2, ..., N], where N is the number of quantization colors and H(I, j) is the number of pixels having color j. The similarity of two images can easily be computed on the basis of this representation. The four common types of similarity measurements are the L1 norm [20], the L2 norm [21], the color histogram intersection [7], and the weighted distance metric [22]. The L1 norm has the lowest computational complexity. However, it was shown in Ref. [23] that it could produce false negatives (not all similar images are retrieved). The L2 norm (i.e., the Euclidean distance) is probably the most widely used metric. However, it can result in false positives (dissimilar images are retrieved). The color histogram intersection proposed by Swain and Ballard [7] has been adopted by many researchers because of its simplicity and effectiveness. The weighted distance metric proposed by Hafner and coworkers [22] takes into account the perceptual similarity between two colors, thus making retrieval results consistent with human visual perception. Other weighting matrices for similarity measures can be found in Ref. [24]. See Chapter 14 for a more detailed description of these metrics.
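To make these four measures concrete, the following sketch (not taken from the chapter) compares two normalized global color histograms under the L1 norm, the L2 norm, histogram intersection, and a Hafner-style quadratic-form weighted distance. The 8-bin histograms and the color-similarity matrix A are assumptions chosen only for illustration.

import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def l2_distance(h1, h2):
    return np.sqrt(((h1 - h2) ** 2).sum())

def histogram_intersection(h1, h2):
    # Similarity in [0, 1] for normalized histograms; larger means more similar.
    return np.minimum(h1, h2).sum()

def quadratic_form_distance(h1, h2, A):
    # A[i, j] encodes the perceptual similarity of colors i and j (1 on the diagonal).
    d = h1 - h2
    return float(d @ A @ d)

# Hypothetical 8-bin histograms of two images
h1 = np.array([0.30, 0.20, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05])
h2 = np.array([0.25, 0.25, 0.10, 0.10, 0.10, 0.05, 0.10, 0.05])
A = np.eye(8)   # identity matrix: the quadratic form then reduces to the squared L2 distance
print(l1_distance(h1, h2), l2_distance(h1, h2),
      histogram_intersection(h1, h2), quadratic_form_distance(h1, h2, A))

With A equal to the identity the quadratic form is just the squared Euclidean distance; filling in off-diagonal entries biases the measure toward pairs of perceptually similar colors, which is the point of the weighted metric.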
Local color histograms are used to retrieve images on the basis of their color similarity in local spatial regions. One natural approach is to partition the whole image into several regions and then extract color features from each of them [25,26]. In this case, the similarity of two images is determined by the similarity of the corresponding regions. Of course, the two images should have the same number of partitions of the same size; if they happen to have different aspect ratios, normalization is required.

10.3.1 Some Color Descriptors

A compact color descriptor, called a binary representation of the image histogram, was proposed in Ref. [27]. With this approach, each region is represented by a binary signature, a binary sequence generated by a two-level quantization of the wavelet coefficients obtained by applying the two-dimensional (2D) Haar transform to the 2D color histogram. In Ref. [28], a scalable blob histogram was proposed, where the term blob denotes a group of pixels with homogeneous color. One advantage of this descriptor is that images containing objects of different sizes and shapes can easily be distinguished without color segmentation. A region-based image retrieval approach was presented in Ref. [29]. The main idea of this work is to adaptively segment the whole image into sets of regions according to the local color distribution [30] and then compute the similarity on the basis of each region's dominant colors, which are extracted by applying color quantization.

Some other commonly used color feature representations in image retrieval include color moments and color sets. For example, in Ref. [31], Stricker and Dimai extracted the first three color moments from five partially overlapping fuzzy regions. In Ref. [32], Stricker and Orengo proposed using color moments to overcome undesirable quantization effects. To speed up the retrieval process in a very large image database, Smith and Chang approximated the color histogram with a selection of colors (color sets) from a prequantized color space [33,34].

10.3.2 Texture

Texture refers to visual patterns with properties of homogeneity that do not result from the presence of only a single color or intensity [35]. Tree bark, clouds, water, bricks, and fabrics are examples of texture. Typical textural features include contrast, uniformity, coarseness, roughness, frequency, density, and directionality. Texture features usually contain important information about the structural arrangement of surfaces and their relationship to the surrounding environment [36]. To date, a large amount of research in texture analysis has been done as a result of the usefulness and effectiveness of this feature in application areas such as pattern recognition, computer vision, and image retrieval.

There are two basic classes of texture descriptors: statistical model-based and transform-based. The first approach explores the gray-level spatial dependence of textures and then extracts meaningful statistics as the texture representation. In Ref. [36], Haralick and coworkers proposed the co-occurrence matrix representation of texture features, in which they explored the gray-level spatial dependence of texture. They also studied line-angle-ratio statistics by analyzing the spatial relationships of lines and the properties of their surroundings.
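As a minimal sketch of the co-occurrence idea (not reproduced from Ref. [36]; the horizontal one-pixel displacement and the 8-level gray quantization are assumptions made for this example), the code below builds a gray-level co-occurrence matrix and derives two common statistics, contrast and energy, from it.

import numpy as np

def cooccurrence_matrix(gray, levels=8):
    """Co-occurrence counts for horizontally adjacent pixels, quantized to `levels` gray levels."""
    q = (gray.astype(np.int64) * levels) // 256      # map 0..255 to 0..levels-1
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    C = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(C, (left, right), 1.0)                 # accumulate pair counts
    return C / C.sum()                               # normalize to joint probabilities

def glcm_statistics(C):
    i, j = np.indices(C.shape)
    contrast = ((i - j) ** 2 * C).sum()              # large for high-contrast, coarse textures
    energy = (C ** 2).sum()                          # large for uniform, regular textures
    return contrast, energy

# Hypothetical usage on a random grayscale patch
gray = np.random.randint(0, 256, size=(64, 64))
print(glcm_statistics(cooccurrence_matrix(gray)))

In practice several displacements and orientations are accumulated, and a handful of such statistics per matrix forms the texture feature vector.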
Interestingly, Tamura and coworkers addressed this topic from a totally different viewpoint [37]. They showed, on the basis of psychological measurements, that six basic textural features are coarseness, contrast, directionality, line-likeness, regularity, and roughness. This approach selects numerical features that correspond to characteristics of the human visual system rather than statistical measures of the data and, therefore, seems well suited to the retrieval of natural images. Two well-known CBIR systems (the QBIC system [38] and the MARS system [39,40]) adopted Tamura's texture representation and made some further improvements. Liu and Picard [10] and Niblack and coworkers [11,41] used a subset of these six features, namely the contrast, coarseness, and directionality models, to achieve texture classification and recognition. A human texture perception study conducted by Rao and Lohse [42] indicated that the three most important orthogonal dimensions are "repetitiveness," "directionality," and "granularity and complexity."

Some commonly used transforms for transform-based texture extraction are the discrete cosine transform (DCT), the Fourier-Mellin transform, the polar Fourier transform, and the Gabor and wavelet transforms. Alata and coworkers [43] proposed classifying rotated and scaled textures by using the combination of a Fourier-Mellin transform and a parametric 2D spectrum estimation method called harmonic mean horizontal vertical (HMHV). Wan and Kuo [44] extracted texture features in the Joint Photographic Experts Group (JPEG) compressed domain by analyzing the AC coefficients of the DCT. The Gabor filters proposed by Manjunath and Ma [45] offer texture descriptors with a set of "optimum joint bandwidth." A tree-structured wavelet transform presented by Chang and Kuo [46] provides a natural and effective way to describe textures that have dominant middle- or high-frequency subbands. In Ref. [47], Nevel developed a texture feature-extraction method by matching the first- and second-order statistics of wavelet subbands.

10.3.3 Shape

Two major steps are involved in shape feature extraction: object segmentation and shape representation.

10.3.3.1 Object Segmentation. Image retrieval based on object shape is considered to be one of the most difficult aspects of content-based image retrieval because of the difficulty of low-level image segmentation and the variety of ways in which a given three-dimensional (3D) object can be projected into 2D shapes. Several segmentation techniques have been proposed so far, including the global threshold-based technique [21], the region-growing technique [48], the split-and-merge technique [49], the edge-detection-based technique [41,50], the texture-based technique [51], the color-based technique [52], and the model-based technique [53]. Generally speaking, precise segmentation is difficult owing to the complexity of individual object shapes, the existence of shadows, noise, and so on.
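As a minimal sketch of the simplest of these techniques, global thresholding, the code below separates foreground objects from the background and labels the connected components. The use of Otsu's criterion to pick the threshold and the synthetic test image are assumptions made for this example, not something prescribed by the chapter.

import numpy as np
from scipy import ndimage

def otsu_threshold(gray):
    """Pick the gray level that maximizes the between-class variance of the histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    omega = np.cumsum(p)                        # probability of the "dark" class
    mu = np.cumsum(p * np.arange(256))          # cumulative mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0
    return int(np.argmax(sigma_b))

def segment(gray):
    """Global threshold-based segmentation followed by connected-component labeling."""
    mask = gray > otsu_threshold(gray)          # binary foreground mask
    labels, num_objects = ndimage.label(mask)   # one integer label per connected region
    return labels, num_objects

# Hypothetical usage: two bright rectangles on a dark background
gray = np.zeros((100, 100), dtype=np.uint8)
gray[10:30, 10:30] = 200
gray[60:90, 50:80] = 180
labels, n = segment(gray)
print(n)   # 2

Real scenes rarely separate this cleanly, which is exactly why the more elaborate region-growing, split-and-merge, edge-based, and model-based techniques listed above were developed.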
10.3.3.2 Shape Representation. Once objects are segmented, their shape features can be represented and indexed. In general, shape representations can be classified into three categories [54]:

• Boundary-Based Representations (Based on the Outer Boundary of the Shape). The commonly used descriptors of this class include the chain code [55], the Fourier descriptor [55], and the UNL descriptor [56].
• Region-Based Representations (Based on the Entire Shape Region). Descriptors of this class include moment invariants [57], Zernike moments [55], the morphological descriptor [58], and pseudo-Zernike moments [56].
• Combined Representations. We may consider the integration of several basic representations, such as moment invariants with the Fourier descriptor or moment invariants with the UNL descriptor.

The Fourier descriptor is extracted by applying the Fourier transform to the parameterized 1D boundary. Because digitization noise can significantly affect this technique, robust approaches have been developed, such as the one described in Ref. [54], which is also invariant to geometric transformations. Region-based moments are invariant with respect to affine transformations of images. Details can be found in Refs. [57,59,60]. Recent work in shape representation includes the finite element method (FEM) [61], the turning function developed by Arkin and coworkers [62], and the wavelet descriptor developed by Chuang and Kuo [63]. Chamfer matching is the most popular shape-matching technique. It was first proposed by Barrow and coworkers [64] for comparing two collections of shape fragments and was then further improved by Borgefors in Ref. [65].

Besides the aforementioned work in 2D shape representation, some research has focused on 3D shape representations. For example, Borgefors and coworkers [66] used binary pyramids in 3D space to improve shape and topology preservation in lower-resolution representations. Wallace and Mitchell [67] presented a hybrid structural/statistical local shape analysis algorithm for 3D shape representation.

10.3.4 Spatial Relationships

There are two classes of spatial relationships. The first class, containing topological relationships, captures the relations between element boundaries. The second class, containing orientation or directional relationships, captures the relative positions of elements with respect to each other. Examples of topological relationships are "near to," "within," or "adjacent to." Examples of directional relationships are "in front of," "on the left of," and "on top of." A well-known method to describe spatial relationships is the attributed-relational graph (ARG) [68], in which objects are represented by nodes and an arc between two nodes represents a relationship between them.

So far, spatial modeling has been widely addressed, mostly in the literature on spatial reasoning, for application areas such as geographic information systems [69,70]. We can distinguish two main categories, called qualitative and quantitative spatial modeling, respectively. A typical application of the qualitative spatial model to image databases, based on symbolic projection theory, was proposed by Chang [71]; it allows a bidimensional arrangement of a set of objects to be encoded into a sequential structure called a 2D string. Because the 2D string structure reduces the matching complexity from a quadratic function to a linear one, the approach has been adopted in several other works [72,73]. Compared to qualitative modeling, quantitative spatial modeling can provide a more continuous relationship between perceived spatial arrangements and their representations by using numeric quantities as classification thresholds [74,75].
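Before turning to the quantitative approach, the sketch below makes the qualitative 2D-string idea above concrete. It is a simplified rendering for illustration only: objects are reduced to labeled centroid points, and ties in coordinates (which the full 2D-string notation handles with an "=" symbol) are ignored.

from dataclasses import dataclass

@dataclass
class SceneObject:
    label: str
    x: float   # centroid column
    y: float   # centroid row

def two_d_string(objects):
    """Encode a spatial arrangement as two 1D strings: labels ordered along x and along y."""
    u = "<".join(o.label for o in sorted(objects, key=lambda o: o.x))
    v = "<".join(o.label for o in sorted(objects, key=lambda o: o.y))
    return u, v

# Hypothetical scene: a sun above a tree, with the tree to the left of a house
scene = [SceneObject("sun", 40, 10), SceneObject("tree", 30, 60), SceneObject("house", 80, 70)]
print(two_d_string(scene))   # ('tree<sun<house', 'sun<tree<house')

Retrieval by spatial arrangement then reduces to subsequence matching on these one-dimensional strings rather than pairwise geometric comparison of all objects.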
Lee and Hsu [74] proposed a quantitative modeling technique that enables the comparison of the mutual position of a pair of extended regions. In this approach, the spatial relationship between an observer and an observed object is represented by a finite set of equivalence classes based on the dense sets of possible paths leading from any pixel of one object to any pixel of the other.

10.3.5 Features of Nonphotographic Images

The discussion in the previous sections focused on features for indexing and retrieving natural images. Nonphotographic images, such as medical and satellite images, can be retrieved more effectively using special-purpose features, owing to their special content and their complex and variable characteristics.

10.3.5.1 Medical Images. Medical images include diagnostic X-ray images, ultrasound images, computer-aided tomographic images, magnetic resonance images, and nuclear medicine images. Typical medical images contain many complex, irregular objects. These exhibit a great deal of variability owing to differences in modality, equipment, procedure, and patient [76]. This variability poses a big challenge to efficient image indexing and retrieval. Features suitable for medical images can be categorized into two basic classes: text-based and content-based.

Text-Based Features. Because of the uniqueness of each medical image (for example, the unique relationship between a patient and an X-ray image of his or her lungs at a particular time), text-based features are widely used in some medical image retrieval systems. This information usually includes the institution name, the institution patient identifier, the patient's name and birth date, patient study identifiers, modality, date, and time [76]. Usually, these features are incorporated into labels, which are digitally or physically affixed to the images and then used as the primary indexing key in medical imaging libraries.

Content-Based Features. Two commonly used content-based features are shape and object spatial relationship, which are very useful in helping physicians locate images containing the objects of interest. In Ref. [76], Cabral and coworkers proposed a new feature called anatomic labels. This descriptor is associated with the anatomy and pathology present in the image and provides a means for assigning Unified Medical Language System (UMLS) labels to images or to specific locations within images.

10.3.5.2 Satellite Images. Recent advances in sensor and communication technologies have made it practical to launch an increasing number of space platforms for a variety of Earth science studies. The large volume of data generated by the instruments on these platforms poses significant challenges for data transmission, storage, retrieval, and dissemination. Efficient image storage, indexing, and retrieval systems are required to make this vast quantity of data useful. The research community has devoted a significant amount of effort to this area [77–80].

In CBIR systems for satellite imagery, different image features are extracted, depending on the type of satellite image and the research purpose. For example, in a system used for analyzing aurora image data [79], the authors extract two types of features: global features, which include the aurora area, the magnetic flux, the total intensity, and the variation of intensity; and radial features along a radial line from geomagnetic north, such as the average width and the variation of width. In Ref. [77], shape and spatial relationship features are extracted from a National Oceanographic and Atmospheric Administration (NOAA) satellite image database.
In a database system for Earth-observing satellite images [80], Li and Chen proposed an algorithm to progressively extract and compare different texture features, such as the fractal dimension, coarseness, entropy, circular Moran autocorrelation functions, and spatial gray-level difference (SGLD) statistics, between an image and a target template. In Ref. [78], Barros and coworkers explored techniques for the exploitation of spectral distribution information in a satellite image database.

10.3.6 Some Additional Features

Some additional features that have been used in the image retrieval process are discussed below.

10.3.6.1 Angular Spectrum. Visual properties of an image are mainly related to the largest objects it contains. In describing an object, shape, texture, and orientation play a major role. In many cases, because shape can also be defined in terms of the presence and distribution of oriented subcomponents, the orientation of objects within an image becomes a key attribute in the definition of similarity to other images. On the basis of this assumption, Lecce and Celentano [81] defined a metric for image classification in the 2D space that is quantified by signatures composed of the angular spectra of image components. In Ref. [82], an image's Fourier transform was analyzed to find the directional distribution of lines.

10.3.6.2 Edge Directionality. Edge directionality is another commonly used feature. In Ref. [82], Lecce and Celentano detected edges within an image by using the Canny algorithm [83] and then applied the Hough transform [84], which transforms a line in Cartesian coordinate space to a point in polar coordinate space, to each edge point. The results were then analyzed to detect the main directions of edges in each image.

10.3.7 Feature Integration

Experience shows that the use of a single class of descriptors to index an image database does not generally produce results that are adequate for real applications, and that retrieval results are often unsatisfactory even for a research prototype. A strategy to potentially improve image retrieval, both in terms of speed and quality of results, is to combine multiple heterogeneous features. We can categorize feature integration as either sequential or parallel. Sequential feature integration, also called feature filtering, is a multistage process in which different features are sequentially used to prune a candidate image set. In the parallel feature-integration approach, several features are used concurrently in the retrieval process. In the latter case, different weights need to be assigned appropriately to the different features, because different features have different discriminating powers, depending on the application and specific task. The feature-integration approach appears to be superior to using individual features and, as a consequence, is implemented in most current CBIR systems.

The original Query by Image Content (QBIC) system [85] allowed the user to select the relative importance of color, texture, and shape. Smith and Chang [86] proposed a spatial and feature (SaFe) system to integrate content-based features with spatial query methods, thus allowing users to specify a query in terms of a set of regions with desired characteristics and simple spatial relations.
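As a minimal sketch of the parallel, weighted integration strategy described above (the feature names, weights, and linear combination rule are assumptions for illustration, not the formulas used by QBIC or SaFe), the code below ranks database images by a weighted sum of normalized per-feature distances.

import numpy as np

def combined_distance(query_feats, db_feats, weights):
    """Weighted sum of per-feature L2 distances; each feature's distances are min-max
    normalized across the database so that no single feature dominates because of scale."""
    total = np.zeros(len(next(iter(db_feats.values()))))
    for name, weight in weights.items():
        d = np.linalg.norm(db_feats[name] - query_feats[name], axis=1)
        d = (d - d.min()) / (d.max() - d.min() + 1e-12)      # normalize to [0, 1]
        total += weight * d
    return total

# Hypothetical database of 5 images with an 8-D color feature and a 4-D texture feature
rng = np.random.default_rng(0)
db = {"color": rng.random((5, 8)), "texture": rng.random((5, 4))}
query = {"color": rng.random(8), "texture": rng.random(4)}
weights = {"color": 0.7, "texture": 0.3}                     # user-chosen relative importance
ranking = np.argsort(combined_distance(query, db, weights))  # best matches first
print(ranking)

The sequential (filtering) alternative would instead apply the cheapest feature first to prune the candidate set and pass only the survivors to the more expensive features.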
Srihari [20] developed a system for identifying human faces in newspaper photographs by integrating visual features extracted from the images with text obtained from the associated descriptive captions. A similar system based on textual and image content information was also described in Ref. [87]. Extensive experiments show that the use of only one kind of information cannot produce satisfactory results. In the newest version of the QBIC system [85], text-based key word search is integrated with content-based similarity search, which leads
