Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)
12 Texture Features for Image Retrieval
B.S. MANJUNATH
University of California at Santa Barbara, Santa Barbara, California
WEI-YING MA
Microsoft Research China, Beijing, China
12.1 INTRODUCTION
Pictures of water, grass, a bed of flowers, or a pattern on a fabric contain strong
examples of image texture. Many natural and man-made objects are distinguished
by their texture. Brodatz [1], in his introduction to Textures: A photographic
album, states “The age of photography is likely to be an age of texture.” His
texture photographs, which range from man-made textures (woven aluminum
wire, brick walls, handwoven rugs, etc.), to natural objects (water, clouds, sand,
grass, lizard skin, etc.) are being used as a standard data set for image-texture
analysis. Such textured objects are difficult to describe in qualitative terms,
let alone creating quantitative descriptions required for machine analysis. The
observed texture often depends on the lighting conditions, viewing angle, and
distance, and may change over time, as in pictures of landscapes.
Texture is a property of image regions, as is evident from the examples. Texture
has no universally accepted formal definition, although it is easy to visualize what
one means by texture. One can think of a texture as consisting of some basic
primitives (texels or Julesz’s textons [2,3], also referred to as the micropatterns),
whose spatial distribution in the image creates the appearance of a texture. Most
man-made objects have such easily identifiable texels. The spatial distribution
of texels could be regular (or periodic) or random. In Figure 12.1a, “brick” is a
micropattern whose particular distribution in the “brick-wall” image constitutes
a structured pattern. The individual primitives need not be of the same size and
shape, as illustrated by the bricks and pebbles textures (Fig. 12.1b). Well-defined
micropatterns may not exist in many cases, such as pictures of sand on the beach,
water, and clouds. Some examples of textured images are shown in Figure 12.1.
Detection of the micropatterns, if they exist, and their spatial arrangement offers
important depth cues to the human visual system (see Fig. 12.2.)
(a) Brick wall (b) Stones and pebbles
(c) Sand (d) Water
(e) Tree bark (f) Grass
Figure 12.1. Examples of some textured images.
Image-texture analysis during the past three decades has primarily focused
on texture classification, texture segmentation, and texture synthesis. In texture
classification the objective is to assign a unique label to each homogeneous
region. For example, regions in a satellite picture may be classified into ice,
water, forest, agricultural areas, and so on. In medical image analysis, texture
is used to classify magnetic resonance (MR) images of the brain into gray and
white matter or to detect cysts in X-ray computed tomography (CT) images
of the kidneys. If the images are preprocessed to extract homogeneous-textured
regions, then the pixel data within these regions can be used for classifying the
regions. In doing so, we associate each pixel in the image with a corresponding
class label, the label of the region to which that particular pixel belongs. An
excellent overview of some of the early methods for texture classification can be
found in an overview paper by Haralick [4].
Figure 12.2. Texture is useful in depth perception and image segmentation. Picture of
(a) a building, (b) a ceiling, and (c) a scene consisting of multiple textures.
Texture, together with color and shape, helps distinguish objects in a scene.
Figure 12.2c shows a scene consisting of multiple textures. Texture segmentation
refers to computing a partitioning of the image, each of the partitions being
homogeneous in some sense. Note that homogeneity in color and texture may
not ensure segmenting the image into semantically meaningful objects. Typically,
segmentation results in an overpartitioning of the objects of interest. Segmentation
and classification often go together — classifying the individual pixels in the
image produces a segmentation. However, to obtain a good classification, one
needs homogeneous-textured regions, that is, one must segment the image first.
Texture adds realism to synthesized images. The objective of texture synthesis
is to generate texture that is perceptually indistinguishable from that of a provided
example. Such synthesized textures can then be used in applications such as
texture mapping. In computer graphics, texture mapping is used to generate
surface details of synthesized objects. Texture mapping refers to mapping an
image, usually a digitized image, onto a surface [5]. Generative models that can
synthesize textures under varying imaging conditions would aid texture mapping
and facilitate the creation of realistic scenes.
In addition, texture is considered an important visual cue in the emerging
application area of content-based access to multimedia data. One particular aspect
that has received much attention in recent years is query by example. Given a
query image, one is interested in finding visually similar images in the database.
As a basic image feature, texture is very useful in similarity search. This is
conceptually similar to the texture-classification problem in that we are interested
in computing texture descriptions that allow us to make comparisons between
different textured images in the database. Recall that in texture classification
we compute a label for a given textured image. This label may have semantics
associated with it, for example, water texture or cloud texture. If the textures
in the database are similarly classified, their labels can then be used to retrieve
other images containing the water or cloud texture. The requirements on similarity
retrieval, however, are somewhat different. First, it may not be feasible to create
an exhaustive class-label dictionary. Second, even if such class-label information
is available, one is interested in finding the top N matches within that class
that are visually similar to the given pattern. The database should store detailed
texture descriptions to allow search and retrieval of similar texture patterns. The
focus of this chapter is on the use of texture features for similarity search.
12.1.1 Organization of the Chapter
Our main focus will be on descriptors that are useful for texture representation
for similarity search. We begin with an overview of image texture, emphasizing
characteristics and properties that are useful for indexing and retrieving images
using texture. In typical applications, a number of top matches with rank-ordered
similarities to the query pattern will be retrieved. We can only sample the rich
and diverse work in this area, and we strongly encourage the reader to follow
up on the numerous references provided.
An overview of texture features is given in the next section. For convenience,
the existing texture descriptors are classified into three categories: features that are
computed in the spatial domain (Section 12.3), features that are computed using
random field models (Section 12.4), and features that are computed in a transform
domain (Section 12.5). Section 12.6 contains a comparison of different texture
descriptors in terms of image-retrieval performance. Section 12.7 describes the
use of texture features in image segmentation and in constructing a texture
thesaurus for browsing and searching an aerial image database. Ongoing work
related to texture in the MPEG-7 standardization effort within the Moving Picture
Experts Group (MPEG) subcommittee of the International Organization for
Standardization (ISO) is also described briefly.
12.2 TEXTURE FEATURES
A feature is defined as a distinctive characteristic of an image and a descriptor is
a representation of a feature [6]. A descriptor defines the syntax and the semantics
of the feature representation. Thus, a texture feature captures one specific attribute
of an image, such as coarseness, and a coarseness descriptor is used to represent
that feature. In the image processing and computer-vision literature, however,
the terms feature and descriptor (of a feature) are often used synonymously. We
also drop this distinction and use these terms interchangeably in the following
discussion.
Initial work on texture discrimination used various image texture statistics.
For example, one can consider the gray level histogram as representing the
first-order distribution of pixel intensities, and the mean and the standard
deviation computed from these histograms can be used as texture descriptors for
discriminating different textures. First-order statistics treat pixel-intensity values
as independent random variables; hence, they ignore the dependencies between
neighboring-pixel intensities and do not capture most textural properties well.
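To make this concrete, here is a short sketch in plain NumPy (the two example textures are our own illustrative choices) that computes the histogram mean and standard deviation and shows that two visually distinct patterns can be indistinguishable by first-order statistics alone:

```python
import numpy as np

def first_order_features(image, bins=256):
    """Mean and standard deviation of the gray-level histogram.

    `image` is a 2D array of integer gray levels in [0, bins). These
    first-order statistics ignore spatial arrangement entirely, which is
    why they discriminate textures poorly.
    """
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist / hist.sum()              # empirical probability of each level
    levels = np.arange(bins)
    mean = (levels * p).sum()
    std = np.sqrt(((levels - mean) ** 2 * p).sum())
    return mean, std

# Two different textures with identical histograms get identical features:
stripes = np.tile([0, 255], (8, 4))                    # vertical stripes
checker = np.indices((8, 8)).sum(axis=0) % 2 * 255     # checkerboard
print(first_order_features(stripes))   # same values for both patterns
print(first_order_features(checker))
```

Both patterns contain equal numbers of black and white pixels, so their histograms, and hence these descriptors, coincide even though the textures differ.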
One can use second-order or higher-order statistics to develop more effective
descriptors. Consider the pixel value I(s) at position s. Then, the joint
distribution is specified by P(l, m, r) = Prob(I(s) = l and I(s + r) = m), where
s and r denote 2D pixel coordinates. One of the popular second-order statis-
tical features is the gray-level co-occurrence matrix, which is generated from the
empirical version of P (l, m, r) (obtained by counting how many pixels have
value l and the pixel displaced by r has the value m). Many statistical features
computed from co-occurrence matrices have been used in texture discrimination
(for a detailed discussion refer to Ref. [7], Chapter 9). The popularity of this
descriptor is due to Julesz, who first proposed the use of co-occurrence matrices
for texture discrimination [8]. He was motivated by his conjecture that humans
are not able to discriminate textures that have identical second-order statistics
(this conjecture has since been proven false).
During the 1970s the research mostly focused on statistical texture features for
discrimination, and in the 1980s, there was considerable excitement and interest in
generative models of textures. These models were used for both texture synthesis
and texture classification. Numerous random field models for texture representa-
tion [9–12] were developed in this spirit and a review of some of the recent work
can be found in Ref. [13]. Once the appropriate model features are computed,
the problem of texture classification can be addressed using techniques from
traditional pattern classification [14].
Multiresolution analysis and filtering has influenced many areas of image
analysis, including texture, during the 1990s. We refer to these as spatial
filtering–based methods in the following section. Some of these methods
are motivated by seeking models that capture human texture discrimination.
In particular, preattentive texture discrimination — the ability of humans
to distinguish between textures in an image without any detailed scene
analysis — has been extensively studied. Some of the early work in this field can
be attributed to Julesz [2,3] for his theory of textons as basic textural elements.
Spatial filtering approaches have been used by many researchers for detecting
texture boundaries [15,16]. In these studies, texture discrimination is generally
modeled as a sequence of filtering operations without any prior assumptions about
the texture-generation process. Some of the recent work involves multiresolution
filtering for both classification and segmentation [17,18].
12.2.1 Human Texture Perception
Texture, as one of the basic visual features, has been studied extensively by
psychophysicists for over three decades. Texture helps in the studying and under-
standing of early visual mechanisms in human vision. In particular, Julesz and his
colleagues [2,3,8,19] have studied texture in the context of preattentive vision.
Julesz defines a “preattentive visual system” as one that “cannot process complex
forms, yet can, almost instantaneously, without effort or scrutiny, detect differ-
ences in a few local conspicuous features, regardless of where they occur” (quoted
from Ref. [3]). Julesz coined the word textons to describe such features that
include elongated blobs (together with their color, orientation, length, and width),
line terminations, and crossings of line segments. Only differences in textons or in
their density can be preattentively discriminated. The observations in Ref. [3] are
mostly limited to line drawing patterns and do not include gray scale textures.
Julesz’s work focused on low-level texture characterization using textons,
whereas Rao and Lohse [20] addressed issues related to high-level features for
texture perception. In contrast with preattentive perception, high-level features
are concerned with attentive analysis. There are many applications, including
some in image retrieval, that require such analysis. Examples include medical-
image analysis (detection of skin cancer, analysis of mammograms, analysis of
brain MR images for tissue classification and segmentation, to mention a few) and
many process control applications. Rao and Lohse identify three features as being
important in human texture perception: repetition, orientation, and complexity.
Repetition refers to periodic patterns and is often associated with regularity. A
brick wall is a repetitive pattern, whereas a picture of ocean water is nonrepet-
itive (and has no structure). Orientation refers to the presence or absence of
directional textures. Directional textures have a flowlike pattern as in a picture
of wood grain or waves [21]. Complexity refers to the descriptional complexity
of the textures and, as the authors state in Ref. [20], “if one had to describe
the texture symbolically, it (complexity) indicates how complex the resulting
description would be.” Complexity is related to Tamura’s coarseness feature
(see Section 12.3.2).
12.3 TEXTURE FEATURES BASED ON SPATIAL-DOMAIN ANALYSIS
12.3.1 Co-occurrence Matrices
Texture manifests itself as variations of the image intensity within a given region.
Following the early work on textons by Julesz [19] and his conjecture that human
texture discrimination is based on the second-order statistics of image intensities,
much attention was given to characterizing the spatial intensity distribution of
textures. A popular descriptor that emerged is the co-occurrence matrix. Co-
occurrence matrices [19, 22–26] are based on second-order statistics of pairs of
intensity values of pixels in an image. A co-occurrence matrix counts how often
pairs of grey levels of pixels, separated by a certain distance and lying along
certain direction, occur in an image. Let I(x, y) ∈ {1, . . . , N} be the intensity
value of an image pixel at (x, y). Let d = [(x1 − x2)^2 + (y1 − y2)^2]^(1/2) be the
distance that separates two pixels at locations (x1, y1) and (x2, y2), with
intensities i and j, respectively. The co-occurrence matrices for a given d are
defined as follows:

C^(d) = [c(i, j)],   i, j ∈ {1, . . . , N}     (12.1)

where c(i, j) is the cardinality of the set of pixel pairs that satisfy
I(x1, y1) = i and I(x2, y2) = j and are separated by a distance d. Note that the
direction between the pixel pairs can be used to further distinguish co-occurrence
matrices for a given distance d. Haralick and coworkers [25] describe 14 texture
features based on various statistical and information theoretic properties of the
co-occurrence matrices. Some of them can be associated with texture properties
such as homogeneity, coarseness, and periodicity. Despite the significant amount
of work on this feature descriptor, it now appears that this characterization of
texture is not very effective for classification and retrieval. In addition, these
features are expensive to compute; hence, co-occurrence matrices are rarely used
in image database applications.
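The construction of Eq. (12.1) can nonetheless be sketched in a few lines. This toy version uses zero-based gray levels and a single displacement vector (dr, dc), which fixes both the distance and the direction; the small test image and the two Haralick-style statistics shown (energy and contrast) are our own illustrative choices:

```python
import numpy as np

def cooccurrence(image, dr, dc, levels):
    """Direction-specific co-occurrence matrix.

    Counts pixel pairs with values (i, j) where the second pixel is
    displaced from the first by (dr, dc).
    """
    C = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                C[image[r, c], image[r2, c2]] += 1
    return C

def haralick_stats(C):
    """Two of Haralick's 14 features, from the normalized matrix."""
    P = C / C.sum()
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()              # angular second moment
    contrast = ((i - j) ** 2 * P).sum()  # weights pairs by gray-level gap
    return energy, contrast

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
C = cooccurrence(img, 0, 1, levels=4)   # horizontal neighbors, d = 1
print(C)
print(haralick_stats(C))
```

In practice one accumulates matrices over several displacements and averages the derived statistics, which is part of what makes the descriptor expensive.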
12.3.2 Tamura’s Features
One of the influential works on texture features that correspond to human texture
perception is the paper by Tamura, Mori, and Yamawaki [27]. They characterized
image texture along the dimensions of coarseness, contrast, directionality, line-
likeness, regularity, and roughness.
12.3.2.1 Coarseness. Coarseness corresponds to the “scale” or image resolu-
tion. Consider two aerial pictures of Manhattan taken from two different heights:
the one which is taken from a larger distance is said to be less coarse than the
one taken from a shorter distance wherein the blocky appearance of the buildings
is more evident. In this sense, coarseness also refers to the size of the underlying
elements forming the texture. Note that an image with finer resolution will have
a coarser texture. An estimator of this parameter would then be the best scale
or resolution that captures the image texture. Many computational approaches to
measure this texture property have been described in the literature. In general,
these approaches try to measure the level of spatial rate of change in image inten-
sity and therefore indicate the level of coarseness of the texture. The particular
procedure proposed in Ref. [27] can be summarized as follows:
1. Compute moving averages in windows of size 2^k × 2^k at each pixel (x, y),
   where k = 0, 1, . . . , 5.
2. At each pixel, compute the difference E_k(x, y) between pairs of nonoverlapping
   moving averages in the horizontal and vertical directions.
3. At each pixel, the value of k that maximizes E_k(x, y) in either direction is
   used to set the best size: S_best(x, y) = 2^k.
4. The coarseness measure F_crs is then computed by averaging S_best(x, y) over
   the entire image.
Instead of taking the average of S_best, an improved version of the coarseness
feature can be obtained by using a histogram to characterize the distribution of
S_best. This modified feature can be used to deal with a texture that has multiple
coarseness properties.
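The four-step procedure can be sketched directly in NumPy. This is an unoptimized illustration: the window range (k up to 3 rather than 5), the skipping of border pixels whose windows would leave the image, and the example textures are our own simplifications of Ref. [27]:

```python
import numpy as np

def coarseness(image, kmax=3):
    """Tamura's coarseness F_crs, following the four steps listed above."""
    img = image.astype(np.float64)
    rows, cols = img.shape
    # Step 1: moving averages over 2^k x 2^k windows.
    A = np.zeros((kmax + 1, rows, cols))
    for k in range(1, kmax + 1):
        w = 2 ** (k - 1)                         # half the window size
        for x in range(w, rows - w):
            for y in range(w, cols - w):
                A[k, x, y] = img[x - w:x + w, y - w:y + w].mean()
    # Steps 2-3: per pixel, pick the k whose nonoverlapping neighboring
    # window averages differ the most, horizontally or vertically.
    Sbest = np.ones((rows, cols))
    Emax = np.zeros((rows, cols))
    for k in range(1, kmax + 1):
        w = 2 ** (k - 1)
        for x in range(2 * w, rows - 2 * w):
            for y in range(2 * w, cols - 2 * w):
                Eh = abs(A[k, x + w, y] - A[k, x - w, y])
                Ev = abs(A[k, x, y + w] - A[k, x, y - w])
                if max(Eh, Ev) > Emax[x, y]:
                    Emax[x, y] = max(Eh, Ev)
                    Sbest[x, y] = 2 ** k
    # Step 4: average the best window size over the image.
    return Sbest.mean()

fine = np.indices((16, 16)).sum(axis=0) % 2 * 255          # 1-pixel checkerboard
blocky = (np.add.outer(np.arange(16) // 4,
                       np.arange(16) // 4) % 2) * 255      # 4x4 blocks
print(coarseness(fine), coarseness(blocky))
```

As expected, the texture built from 4 × 4 blocks scores higher (larger S_best on average) than the pixel-level checkerboard.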
12.3.2.2 Contrast. Contrast measures the amount of local intensity variation
present in an image. Contrast also refers to the overall picture quality — a high-
contrast picture is often considered to be of better quality than a low-contrast
version. Dynamic range of the intensity values and sharpness of the edges in the
image are two indicators of picture contrast. In Ref. [27], contrast is defined as
F_con = σ/(α_4)^n     (12.2)

where n is a positive number, σ is the standard deviation of the gray-level
probability distribution, and α_4 is the kurtosis, a measure of the polarization
between black and white regions in the image. The kurtosis is defined as

α_4 = µ_4/σ^4     (12.3)

where µ_4 is the fourth central moment of the gray-level probability distribution.
In the experiments in Ref. [27], n = 1/4 resulted in the best texture-discrimination
performance.
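Equations (12.2) and (12.3) translate directly into code. The sketch below treats the image's gray levels as an empirical distribution; the example textures are our own illustrative assumptions:

```python
import numpy as np

def tamura_contrast(image, n=0.25):
    """F_con = sigma / (alpha_4)^n, with alpha_4 = mu_4 / sigma^4."""
    img = image.astype(np.float64)
    mu = img.mean()
    sigma2 = ((img - mu) ** 2).mean()            # variance
    if sigma2 == 0:
        return 0.0                               # flat image: no contrast
    mu4 = ((img - mu) ** 4).mean()               # fourth central moment
    alpha4 = mu4 / sigma2 ** 2                   # kurtosis, Eq. (12.3)
    return np.sqrt(sigma2) / alpha4 ** n         # Eq. (12.2), n = 1/4

# A hard black-and-white pattern scores higher than a washed-out version:
binary = np.indices((8, 8)).sum(axis=0) % 2 * 255
washed = binary * 0.2 + 100                      # compressed dynamic range
print(tamura_contrast(binary), tamura_contrast(washed))
```

Both test images are perfectly bimodal (α_4 = 1), so the score reduces to the standard deviation and directly reflects the dynamic range.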
12.3.2.3 Directionality. Directionality is a global texture property. Direction-
ality (or lack of it) is due to both the basic shape of the texture element and the
placement rule used in creating the texture. Patterns can be highly directional
(e.g., a brick wall) or may be nondirectional, as in the case of a picture of a
cloud. The degree of directionality, measured on a scale of 0 to 1, can be used as
a descriptor (for example, see Ref. [27]). Thus, two patterns, which differ only in
their orientation, are considered to have the same degree of directionality. These
descriptions can be computed either in the spatial domain or in the frequency
domain. In [27], the oriented edge histogram (number of pixels in which edge
strength in a certain direction exceeds a given threshold) is used to measure the
degree of directionality. Edge strength and direction are computed using the Sobel
edge detector [28]. A histogram H(φ) of direction values φ is then constructed
by quantizing φ and counting the pixels with magnitude larger than a predefined
threshold. This histogram exhibits strong peaks for highly directional images and
is relatively flat for images without strong orientation. A quantitative measure of
directionality can be computed from the sharpness of the peaks as follows:
F_dir = 1 − r · n_p · Σ_p Σ_{φ ∈ w_p} (φ − φ_p)^2 · H(φ)     (12.4)

where n_p is the number of peaks and φ_p is the pth peak position of H. For each
peak p, w_p is the set of histogram bins distributed over it, and r is a
normalizing factor related to the quantization levels of φ.
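A simplified version of this procedure might look as follows. For brevity the sketch assumes a single peak (n_p = 1) and folds the normalizing factor into the sum, so it is not a faithful implementation of Eq. (12.4); the magnitude threshold and bin count are arbitrary choices of ours:

```python
import numpy as np

def directionality(image, nbins=16, threshold=12):
    """Oriented-edge histogram H(phi) and a one-peak directionality score."""
    img = image.astype(np.float64)
    # Sobel kernels, applied by brute-force valid convolution.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    ky = kx.T
    rows, cols = img.shape
    gx = np.zeros((rows - 2, cols - 2))
    gy = np.zeros_like(gx)
    for r in range(rows - 2):
        for c in range(cols - 2):
            patch = img[r:r + 3, c:c + 3]
            gx[r, c] = (patch * kx).sum()
            gy[r, c] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    phi = np.arctan2(gy, gx) % np.pi           # edge direction in [0, pi)
    # Histogram over directions, counting only strong edges.
    hist, edges = np.histogram(phi[mag > threshold],
                               bins=nbins, range=(0, np.pi))
    H = hist / max(hist.sum(), 1)
    centers = (edges[:-1] + edges[1:]) / 2
    peak = centers[H.argmax()]
    # Sharp, concentrated peaks give a score close to 1.
    return 1 - ((centers - peak) ** 2 * H).sum()

stripes = np.tile([0, 0, 255, 255], (12, 3))   # strong vertical edges
print(directionality(stripes))
```

For the striped test pattern every strong edge votes into the same direction bin, so the histogram is maximally peaked and the score is 1.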
In addition to the three components discussed earlier, Tamura and coworkers
[27] also consider three other features, which they term line-likeness, regularity,
and roughness. There appears to be a significant correlation between these three
features and coarseness, contrast, and directionality. It is not clear that adding
these additional dimensions enhances the effectiveness of the description. These
additional dimensions will not be used in the comparison experiments described
in Section 12.6.
Tamura’s features capture the high-level perceptual attributes of a texture well
and are useful for image browsing. However, they are not very effective for finer
texture discrimination.
12.4 AUTOREGRESSIVE AND RANDOM FIELD TEXTURE MODELS
One can think of a textured image as a two-dimensional (2D) array of random
numbers. Then, the pixel intensity at each location is a random variable. One can
model the image as a function f (r, ω), where r is the position vector representing
the pixel location in the 2D space and ω is a random parameter. For a given value
of r, f (r, ω) is a random variable (because ω is a random variable). Once we
select a specific texture ω, f (r, ω) is an image, namely, a function over the
two-dimensional grid indexed by r. f (r, ω) is called a random field [29]. Thus,
one can think of a texture-intensity distribution as a realization of a random
field. Random field models (also referred to as spatial-interaction models) impose
assumptions on the intensity distribution. One of the initial motivations for such
model-based analysis of texture is that these models can be used for texture
synthesis. There is a rich literature on random field models for texture analysis
dating back to the early 1970s, and these models have found applications
not only in texture synthesis but also in texture classification and segmentation
[9,11,13,30–34].
A typical random field model is characterized by a set of neighbors (typically, a
symmetric neighborhood around the pixel), a set of model coefficients, and a noise
sequence with certain specified characteristics. Given an array of observations
{y(s)} of pixel-intensity values, it is natural to expect that the pixel values are
locally correlated. This leads to the well known Markov model
P[y(s) | all y(r), r ≠ s] = P[y(s) | all y(s + r), r ∈ N],     (12.5)
where N is a symmetric neighborhood set. For example, if the neighborhood
is the four immediate neighbors of a pixel on a rectangular grid, then N =
{(0, 1), (1, 0), (−1, 0), (0, −1)}.
We refer to Besag [35,36] for the constraints on the conditional probability
density for the resulting random field to be Markov. If, in addition to being
Markov, {y(s)} is also Gaussian, then, a pixel value at s, y(s), can be written
as a linear combination of the pixel values y(s + r), r ∈ N, and an additive
correlated noise (see Ref. [34]).
A special case of the Markov random field (MRF) that has received much
attention in the image retrieval community is the simultaneous autoregressive
model (SAR), given by
y(s) = Σ_{r ∈ N} θ(r) y(s + r) + √β w(s),     (12.6)
where {y(s)} are the observed pixel intensities, s is the indexing of spatial loca-
tions, N is a symmetric neighborhood set, and w(s) is white noise with zero mean
and unit variance. The parameters ({θ(r)}, β) characterize the texture
observations {y(s)} and can be estimated from those observations. The SAR and MRF
models are related to each other in that, for every SAR there exists an equivalent
MRF with second-order statistics that are identical to the SAR model. However,
the converse is not true: given an MRF, there may not be an equivalent SAR.
The model parameters ({θ(r)}, β) form the texture feature vector that can be
used for classification and similarity retrieval. The second-order neighborhood,
consisting of the 8-neighborhood of a pixel, N = {(0, 1), (1, 0), (0, −1), (−1, 0),
(1, 1), (1, −1), (−1, −1), (−1, 1)}, has been widely used. For a symmetric model,
θ(r) = θ(−r); hence, five parameters are needed to specify a symmetric
second-order SAR model.
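One simple way to estimate SAR parameters from an observed texture is ordinary least squares over all interior pixels. This is a sketch of one common estimator, not necessarily the one used in the cited work; for brevity it uses the first-order 4-neighborhood, and the smoothed synthetic texture used to exercise it is our own construction:

```python
import numpy as np

def fit_sar(y, N=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Least-squares fit of the SAR model of Eq. (12.6).

    Builds one linear equation y(s) = sum_r theta(r) y(s + r) + noise per
    interior pixel, solves for theta, and estimates beta as the residual
    variance.
    """
    y = y.astype(np.float64)
    rows, cols = y.shape
    targets, predictors = [], []
    for x in range(1, rows - 1):
        for c in range(1, cols - 1):
            targets.append(y[x, c])
            predictors.append([y[x + dx, c + dc] for dx, dc in N])
    A = np.array(predictors)
    b = np.array(targets)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    beta = (b - A @ theta).var()
    return theta, beta

# Synthesize a locally correlated field: repeatedly pull each pixel toward
# the mean of its 4 neighbors, injecting a little fresh noise each pass.
rng = np.random.default_rng(0)
tex = rng.normal(size=(64, 64))
for _ in range(50):
    tex[1:-1, 1:-1] = 0.25 * (tex[:-2, 1:-1] + tex[2:, 1:-1]
                              + tex[1:-1, :-2] + tex[1:-1, 2:]) \
                      + 0.05 * rng.normal(size=(62, 62))
theta, beta = fit_sar(tex)
print(theta.round(2), beta)
```

Because the field is strongly locally correlated, the fitted model explains most of the variance, leaving a residual noise variance β much smaller than the raw image variance.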
In order to define an appropriate SAR model, one has to determine the size
of the neighborhood. This is a nontrivial problem, and often, a fixed-size neigh-
borhood does not represent all texture variations very well. In order to address
this issue, the multiresolution simultaneous autoregressive (MRSAR) model has
been proposed [37,38]. The MRSAR model tries to account for the variability
of texture primitives by defining the SAR model at different resolutions of a
Gaussian pyramid. Thus, three levels of the Gaussian pyramid, together with
a second-order symmetric model, require 15 (3 × 5) parameters to specify the
texture.
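A minimal Gaussian-pyramid construction for this multiresolution setting might look as follows; the separable (1, 2, 1)/4 binomial kernel and reflect padding are our own implementation choices, and fitting a SAR model at each of the three levels then yields the concatenated MRSAR feature vector described above:

```python
import numpy as np

def smooth(img):
    """Separable (1, 2, 1)/4 binomial low-pass filter, reflect padding."""
    padded = np.pad(img, 1, mode='reflect')
    h = (padded[:, :-2] + 2 * padded[:, 1:-1] + padded[:, 2:]) / 4.0
    return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0

def gaussian_pyramid(image, levels=3):
    """Low-pass filter and subsample by 2 to produce `levels` resolutions."""
    pyr = [image.astype(np.float64)]
    for _ in range(levels - 1):
        pyr.append(smooth(pyr[-1])[::2, ::2])
    return pyr

pyr = gaussian_pyramid(np.random.default_rng(1).normal(size=(64, 64)))
print([p.shape for p in pyr])
```

Each coarser level sees texture primitives at twice the spatial extent of the previous one, which is how the MRSAR model sidesteps the fixed-neighborhood-size problem.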
12.4.1 Wold Model
Liu and Picard propose the Wold model for image retrieval applications [39].
It is based on the Wold decomposition of stationary stochastic processes. In
the Wold model, a 2D homogeneous random field is decomposed into three
mutually orthogonal components, which approximately correspond to the three
dimensions (periodicity, directionality, and complexity or randomness) identi-
fied by Rao and Lohse [20]. The construction of the Wold model proceeds as
follows. First, the periodicity of the texture pattern is analyzed by considering
the autocorrelation function of the image. Note that for periodic patterns, the
autocorrelation function is also periodic. The corresponding Wold feature set
consists of the frequencies and the magnitudes of the harmonic spectral peaks.
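This spectral-peak analysis can be sketched with a 2D FFT: by the Wiener-Khinchin relation, the power spectrum is the Fourier transform of the autocorrelation, so periodicity shows up as concentrated spectral peaks. The sinusoidal test grating and the top-k peak picking are our own illustrative assumptions:

```python
import numpy as np

def harmonic_peaks(image, topk=2):
    """Frequencies and magnitudes of the dominant spectral peaks."""
    img = image - image.mean()                  # drop the DC component
    spectrum = np.abs(np.fft.fft2(img)) ** 2    # power spectrum
    flat = spectrum.ravel()
    order = np.argsort(flat)[::-1][:topk]       # indices of largest peaks
    freqs = [np.unravel_index(i, spectrum.shape) for i in order]
    mags = flat[order]
    return freqs, mags

# A vertical grating with 4 cycles across 32 columns concentrates its
# energy at horizontal frequency index 4 (and its conjugate, 28):
cols = np.arange(32)
grating = np.tile(np.cos(2 * np.pi * 4 * cols / 32), (32, 1))
freqs, mags = harmonic_peaks(grating)
print(freqs)
```

For a genuinely periodic texture the recovered (frequency, magnitude) pairs are exactly the kind of harmonic Wold features described above; for random textures the spectrum is spread out and no dominant peaks emerge.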