Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)
12 Texture Features for Image Retrieval
B.S. MANJUNATH
University of California at Santa Barbara, Santa Barbara, California
WEI-YING MA
Microsoft Research China, Beijing, China
12.1 INTRODUCTION
Pictures of water, grass, a bed of flowers, or a pattern on a fabric contain strong
examples of image texture. Many natural and man-made objects are distinguished
by their texture. Brodatz [1], in his introduction to Textures: A photographic
album, states “The age of photography is likely to be an age of texture.” His
texture photographs, which range from man-made textures (woven aluminum
wire, brick walls, handwoven rugs, etc.), to natural objects (water, clouds, sand,
grass, lizard skin, etc.) are being used as a standard data set for image-texture
analysis. Such textured objects are difficult to describe in qualitative terms,
let alone creating quantitative descriptions required for machine analysis. The
observed texture often depends on the lighting conditions, viewing angle, and
distance, and may change over time, as in pictures of landscapes.
Texture is a property of image regions, as is evident from the examples. Texture
has no universally accepted formal definition, although it is easy to visualize what
one means by texture. One can think of a texture as consisting of some basic
primitives (texels or Julesz’s textons [2,3], also referred to as the micropatterns),
whose spatial distribution in the image creates the appearance of a texture. Most
man-made objects have such easily identifiable texels. The spatial distribution
of texels could be regular (or periodic) or random. In Figure 12.1a, “brick” is a
micropattern whose particular distribution in the “brick-wall” image constitutes
a structured pattern. The individual primitives need not be of the same size and
shape, as illustrated by the bricks and pebbles textures (Fig. 12.1b). Well-defined
micropatterns may not exist in many cases, such as pictures of sand on the beach,
water, and clouds. Some examples of textured images are shown in Figure 12.1.
Detection of the micropatterns, if they exist, and their spatial arrangement offers
important depth cues to the human visual system (see Fig. 12.2.)
(a) Brick wall (b) Stones and pebbles
(c) Sand (d) Water
(e) Tree bark (f) Grass
Figure 12.1. Examples of some textured images.
Image-texture analysis during the past three decades has primarily focused
on texture classification, texture segmentation, and texture synthesis. In texture
classification the objective is to assign a unique label to each homogeneous
region. For example, regions in a satellite picture may be classified into ice,
water, forest, agricultural areas, and so on. In medical image analysis, texture
is used to classify magnetic resonance (MR) images of the brain into gray and
white matter or to detect cysts in X-ray computed tomography (CT) images
of the kidneys. If the images are preprocessed to extract homogeneous-textured
regions, then the pixel data within these regions can be used for classifying the
regions. In doing so, we associate each pixel in the image with a corresponding
class label, the label of the region to which that particular pixel belongs. An
excellent overview of some of the early methods for texture classification can be
found in an overview paper by Haralick [4].
Figure 12.2. Texture is useful in depth perception and image segmentation. Picture of
(a) a building, (b) a ceiling, and (c) a scene consisting of multiple textures.
Texture, together with color and shape, helps distinguish objects in a scene.
Figure 12.2c shows a scene consisting of multiple textures. Texture segmentation
refers to computing a partitioning of the image, each of the partitions being
homogeneous in some sense. Note that homogeneity in color and texture may
not ensure segmenting the image into semantically meaningful objects. Typically,
segmentation results in an overpartitioning of the objects of interest. Segmentation
and classification often go together — classifying the individual pixels in the
image produces a segmentation. However, to obtain a good classification, one
needs homogeneous-textured regions, that is, one must segment the image first.
Texture adds realism to synthesized images. The objective of texture synthesis
is to generate texture that is perceptually indistinguishable from that of a provided
example. Such synthesized textures can then be used in applications such as
texture mapping. In computer graphics, texture mapping is used to generate
surface details of synthesized objects. Texture mapping refers to mapping an
image, usually a digitized image, onto a surface [5]. Generative models that can
synthesize textures under varying imaging conditions would aid texture mapping
and facilitate the creation of realistic scenes.
In addition, texture is considered an important visual cue in the emerging
application area of content-based access to multimedia data. One particular aspect
that has received much attention in recent years is query by example. Given a
query image, one is interested in finding visually similar images in the database.
As a basic image feature, texture is very useful in similarity search. This is
conceptually similar to the texture-classification problem in that we are interested
in computing texture descriptions that allow us to make comparisons between
different textured images in the database. Recall that in texture classification
we compute a label for a given textured image. This label may have semantics
associated with it, for example, water texture or cloud texture. If the textures
in the database are similarly classified, their labels can then be used to retrieve
other images containing the water or cloud texture. The requirements on similarity
retrieval, however, are somewhat different. First, it may not be feasible to create
an exhaustive class-label dictionary. Second, even if such class-label information
is available, one is interested in finding the top N matches within that class
that are visually similar to the given pattern. The database should store detailed
texture descriptions to allow search and retrieval of similar texture patterns. The
focus of this chapter is on the use of texture features for similarity search.
12.1.1 Organization of the Chapter
Our main focus will be on descriptors that are useful for texture representation
for similarity search. We begin with an overview of image texture, emphasizing
characteristics and properties that are useful for indexing and retrieving images
using texture. In typical applications, a number of top matches with rank-ordered
similarities to the query pattern will be retrieved. We can only sample the rich
and diverse work in this area, and we strongly encourage the reader to follow
up on the numerous references provided.
An overview of texture features is given in the next section. For convenience,
the existing texture descriptors are classified into three categories: features that are
computed in the spatial domain (Section 12.3), features that are computed using
random field models (Section 12.4), and features that are computed in a transform
domain (Section 12.5). Section 12.6 contains a comparison of different texture
descriptors in terms of image-retrieval performance. Section 12.7 describes the
use of texture features in image segmentation and in constructing a texture
thesaurus for browsing and searching an aerial image database. Ongoing work
related to texture in the MPEG-7 standardization effort within the Moving Picture
Experts Group (MPEG) subcommittee of the International Organization for
Standardization (ISO) is also described briefly.
12.2 TEXTURE FEATURES
A feature is defined as a distinctive characteristic of an image and a descriptor is
a representation of a feature [6]. A descriptor defines the syntax and the semantics
of the feature representation. Thus, a texture feature captures one specific attribute
of an image, such as coarseness, and a coarseness descriptor is used to represent
that feature. In the image processing and computer-vision literature, however,
the terms feature and descriptor (of a feature) are often used synonymously. We
also drop this distinction and use these terms interchangeably in the following
discussion.
Initial work on texture discrimination used various image texture statistics.
For example, one can consider the gray level histogram as representing the
first-order distribution of pixel intensities, and the mean and the standard
deviation computed from these histograms can be used as texture descriptors for
discriminating different textures. First-order statistics treat pixel-intensity values
as independent random variables; hence, they ignore the dependencies between
neighboring-pixel intensities and do not capture most textural properties well.
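To make this concrete, here is a short sketch in plain NumPy (the two example textures are our own illustrative choices) that computes the histogram mean and standard deviation and shows that two visually distinct patterns can be indistinguishable by first-order statistics alone:

```python
import numpy as np

def first_order_features(image, bins=256):
    """Mean and standard deviation of the gray-level histogram.

    `image` is a 2D array of integer gray levels in [0, bins). These
    first-order statistics ignore spatial arrangement entirely, which is
    why they discriminate textures poorly.
    """
    hist, _ = np.histogram(image, bins=bins, range=(0, bins))
    p = hist / hist.sum()              # empirical probability of each level
    levels = np.arange(bins)
    mean = (levels * p).sum()
    std = np.sqrt(((levels - mean) ** 2 * p).sum())
    return mean, std

# Two different textures with identical histograms get identical features:
stripes = np.tile([0, 255], (8, 4))                    # vertical stripes
checker = np.indices((8, 8)).sum(axis=0) % 2 * 255     # checkerboard
print(first_order_features(stripes))   # same values for both patterns
print(first_order_features(checker))
```

Both patterns contain equal numbers of black and white pixels, so their histograms, and hence these descriptors, coincide even though the textures differ.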
One can use second-order or higher-order statistics to develop more effective
descriptors. Consider the pixel value I(s) at position s. Then, the joint
distribution is specified by P(l, m, r) = Prob(I(s) = l and I(s + r) = m), where
s and r denote 2D pixel coordinates. One of the popular second-order statis-
tical features is the gray-level co-occurrence matrix, which is generated from the
empirical version of P (l, m, r) (obtained by counting how many pixels have
value l and the pixel displaced by r has the value m). Many statistical features
computed from co-occurrence matrices have been used in texture discrimination
(for a detailed discussion refer to Ref. [7], Chapter 9). The popularity of this
descriptor is due to Julesz, who first proposed the use of co-occurrence matrices
for texture discrimination [8]. He was motivated by his conjecture that humans
are not able to discriminate textures that have identical second-order statistics
(this conjecture has since been proven false).
During the 1970s the research mostly focused on statistical texture features for
discrimination, and in the 1980s, there was considerable excitement and interest in
generative models of textures. These models were used for both texture synthesis
and texture classification. Numerous random field models for texture representa-
tion [9–12] were developed in this spirit and a review of some of the recent work
can be found in Ref. [13]. Once the appropriate model features are computed,
the problem of texture classification can be addressed using techniques from
traditional pattern classification [14].
Multiresolution analysis and filtering has influenced many areas of image
analysis, including texture, during the 1990s. We refer to these as spatial
filtering–based methods in the following section. Some of these methods
are motivated by seeking models that capture human texture discrimination.
In particular, preattentive texture discrimination — the ability of humans
to distinguish between textures in an image without any detailed scene
analysis — has been extensively studied. Some of the early work in this field can
be attributed to Julesz [2,3] for his theory of textons as basic textural elements.
Spatial filtering approaches have been used by many researchers for detecting
texture boundaries [15,16]. In these studies, texture discrimination is generally
modeled as a sequence of filtering operations without any prior assumptions about
the texture-generation process. Some of the recent work involves multiresolution
filtering for both classification and segmentation [17,18].
12.2.1 Human Texture Perception
Texture, as one of the basic visual features, has been studied extensively by
psychophysicists for over three decades. Texture helps in the studying and under-
standing of early visual mechanisms in human vision. In particular, Julesz and his
colleagues [2,3,8,19] have studied texture in the context of preattentive vision.
Julesz defines a “preattentive visual system” as one that “cannot process complex
forms, yet can, almost instantaneously, without effort or scrutiny, detect differ-
ences in a few local conspicuous features, regardless of where they occur” (quoted
from Ref. [3]). Julesz coined the word textons to describe such features that
include elongated blobs (together with their color, orientation, length, and width),
line terminations, and crossings of line segments. Only differences in textons or in
their density can be preattentively discriminated. The observations in Ref. [3] are
mostly limited to line drawing patterns and do not include gray scale textures.
Julesz’s work focused on low-level texture characterization using textons,
whereas Rao and Lohse [20] addressed issues related to high-level features for
texture perception. In contrast with preattentive perception, high-level features
are concerned with attentive analysis. There are many applications, including
some in image retrieval, that require such analysis. Examples include medical-
image analysis (detection of skin cancer, analysis of mammograms, analysis of
brain MR images for tissue classification and segmentation, to mention a few) and
many process control applications. Rao and Lohse identify three features as being
important in human texture perception: repetition, orientation, and complexity.
Repetition refers to periodic patterns and is often associated with regularity. A
brick wall is a repetitive pattern, whereas a picture of ocean water is nonrepet-
itive (and has no structure). Orientation refers to the presence or absence of
directional textures. Directional textures have a flowlike pattern as in a picture
of wood grain or waves [21]. Complexity refers to the descriptional complexity
of the textures and, as the authors state in Ref. [20], “if one had to describe
the texture symbolically, it (complexity) indicates how complex the resulting
description would be.” Complexity is related to Tamura’s coarseness feature
(see Section 12.3.2).
12.3 TEXTURE FEATURES BASED ON SPATIAL-DOMAIN ANALYSIS
12.3.1 Co-occurrence Matrices
Texture manifests itself as variations of the image intensity within a given region.
Following the early work on textons by Julesz [19] and his conjecture that human
texture discrimination is based on the second-order statistics of image intensities,
much attention was given to characterizing the spatial intensity distribution of
textures. A popular descriptor that emerged is the co-occurrence matrix. Co-
occurrence matrices [19, 22–26] are based on second-order statistics of pairs of
intensity values of pixels in an image. A co-occurrence matrix counts how often
pairs of grey levels of pixels, separated by a certain distance and lying along
certain direction, occur in an image. Let I(x, y) ∈ {1, . . . , N} be the intensity
value of an image pixel at (x, y). Let d = [(x1 − x2)^2 + (y1 − y2)^2]^(1/2) be the
distance that separates two pixels at locations (x1, y1) and (x2, y2), with
intensities i and j, respectively. The co-occurrence matrices for a given d are
defined as follows:

C^(d) = [c(i, j)],   i, j ∈ {1, . . . , N}     (12.1)

where c(i, j) is the cardinality of the set of pixel pairs that satisfy
I(x1, y1) = i and I(x2, y2) = j and are separated by a distance d. Note that the
direction between the pixel pairs can be used to further distinguish co-occurrence
matrices for a given distance d. Haralick and coworkers [25] describe 14 texture
features based on various statistical and information theoretic properties of the
co-occurrence matrices. Some of them can be associated with texture properties
such as homogeneity, coarseness, and periodicity. Despite the significant amount
of work on this feature descriptor, it now appears that this characterization of
texture is not very effective for classification and retrieval. In addition, these
features are expensive to compute; hence, co-occurrence matrices are rarely used
in image database applications.
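The construction of Eq. (12.1) can nonetheless be sketched in a few lines. This toy version uses zero-based gray levels and a single displacement vector (dr, dc), which fixes both the distance and the direction; the small test image and the two Haralick-style statistics shown (energy and contrast) are our own illustrative choices:

```python
import numpy as np

def cooccurrence(image, dr, dc, levels):
    """Direction-specific co-occurrence matrix.

    Counts pixel pairs with values (i, j) where the second pixel is
    displaced from the first by (dr, dc).
    """
    C = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                C[image[r, c], image[r2, c2]] += 1
    return C

def haralick_stats(C):
    """Two of Haralick's 14 features, from the normalized matrix."""
    P = C / C.sum()
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()              # angular second moment
    contrast = ((i - j) ** 2 * P).sum()  # weights pairs by gray-level gap
    return energy, contrast

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
C = cooccurrence(img, 0, 1, levels=4)   # horizontal neighbors, d = 1
print(C)
print(haralick_stats(C))
```

In practice one accumulates matrices over several displacements and averages the derived statistics, which is part of what makes the descriptor expensive.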
12.3.2 Tamura’s Features
One of the influential works on texture features that correspond to human texture
perception is the paper by Tamura, Mori, and Yamawaki [27]. They characterized
image texture along the dimensions of coarseness, contrast, directionality, line-
likeness, regularity, and roughness.
12.3.2.1 Coarseness. Coarseness corresponds to the “scale” or image resolu-
tion. Consider two aerial pictures of Manhattan taken from two different heights:
the one which is taken from a larger distance is said to be less coarse than the
one taken from a shorter distance wherein the blocky appearance of the buildings
is more evident. In this sense, coarseness also refers to the size of the underlying
elements forming the texture. Note that an image with finer resolution will have
a coarser texture. An estimator of this parameter would then be the best scale
or resolution that captures the image texture. Many computational approaches to
measure this texture property have been described in the literature. In general,
these approaches try to measure the level of spatial rate of change in image inten-
sity and therefore indicate the level of coarseness of the texture. The particular
procedure proposed in Ref. [27] can be summarized as follows:
1. Compute moving averages in windows of size 2^k × 2^k at each pixel (x, y),
   where k = 0, 1, . . . , 5.
2. At each pixel, compute the difference E_k(x, y) between pairs of nonoverlapping
   moving averages in the horizontal and vertical directions.
3. At each pixel, the value of k that maximizes E_k(x, y) in either direction is
   used to set the best size: S_best(x, y) = 2^k.
4. The coarseness measure F_crs is then computed by averaging S_best(x, y) over
   the entire image.
Instead of taking the average of S_best, an improved version of the coarseness
feature can be obtained by using a histogram to characterize the distribution of
S_best. This modified feature can be used to deal with a texture that has multiple
coarseness properties.
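The four-step procedure can be sketched directly in NumPy. This is an unoptimized illustration: the window range (k up to 3 rather than 5), the skipping of border pixels whose windows would leave the image, and the example textures are our own simplifications of Ref. [27]:

```python
import numpy as np

def coarseness(image, kmax=3):
    """Tamura's coarseness F_crs, following the four steps listed above."""
    img = image.astype(np.float64)
    rows, cols = img.shape
    # Step 1: moving averages over 2^k x 2^k windows.
    A = np.zeros((kmax + 1, rows, cols))
    for k in range(1, kmax + 1):
        w = 2 ** (k - 1)                         # half the window size
        for x in range(w, rows - w):
            for y in range(w, cols - w):
                A[k, x, y] = img[x - w:x + w, y - w:y + w].mean()
    # Steps 2-3: per pixel, pick the k whose nonoverlapping neighboring
    # window averages differ the most, horizontally or vertically.
    Sbest = np.ones((rows, cols))
    Emax = np.zeros((rows, cols))
    for k in range(1, kmax + 1):
        w = 2 ** (k - 1)
        for x in range(2 * w, rows - 2 * w):
            for y in range(2 * w, cols - 2 * w):
                Eh = abs(A[k, x + w, y] - A[k, x - w, y])
                Ev = abs(A[k, x, y + w] - A[k, x, y - w])
                if max(Eh, Ev) > Emax[x, y]:
                    Emax[x, y] = max(Eh, Ev)
                    Sbest[x, y] = 2 ** k
    # Step 4: average the best window size over the image.
    return Sbest.mean()

fine = np.indices((16, 16)).sum(axis=0) % 2 * 255          # 1-pixel checkerboard
blocky = (np.add.outer(np.arange(16) // 4,
                       np.arange(16) // 4) % 2) * 255      # 4x4 blocks
print(coarseness(fine), coarseness(blocky))
```

As expected, the texture built from 4 × 4 blocks scores higher (larger S_best on average) than the pixel-level checkerboard.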
12.3.2.2 Contrast. Contrast measures the amount of local intensity variation
present in an image. Contrast also refers to the overall picture quality — a high-
contrast picture is often considered to be of better quality than a low-contrast
version. Dynamic range of the intensity values and sharpness of the edges in the
image are two indicators of picture contrast. In Ref. [27], contrast is defined as
F_con = σ/(α_4)^n     (12.2)

where n is a positive number, σ is the standard deviation of the gray-level
probability distribution, and α_4 is the kurtosis, a measure of the polarization
between black and white regions in the image. The kurtosis is defined as

α_4 = µ_4/σ^4     (12.3)

where µ_4 is the fourth central moment of the gray-level probability distribution.
In the experiments in Ref. [27], n = 1/4 resulted in the best texture-discrimination
performance.
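Equations (12.2) and (12.3) translate directly into code. The sketch below treats the image's gray levels as an empirical distribution; the example textures are our own illustrative assumptions:

```python
import numpy as np

def tamura_contrast(image, n=0.25):
    """F_con = sigma / (alpha_4)^n, with alpha_4 = mu_4 / sigma^4."""
    img = image.astype(np.float64)
    mu = img.mean()
    sigma2 = ((img - mu) ** 2).mean()            # variance
    if sigma2 == 0:
        return 0.0                               # flat image: no contrast
    mu4 = ((img - mu) ** 4).mean()               # fourth central moment
    alpha4 = mu4 / sigma2 ** 2                   # kurtosis, Eq. (12.3)
    return np.sqrt(sigma2) / alpha4 ** n         # Eq. (12.2), n = 1/4

# A hard black-and-white pattern scores higher than a washed-out version:
binary = np.indices((8, 8)).sum(axis=0) % 2 * 255
washed = binary * 0.2 + 100                      # compressed dynamic range
print(tamura_contrast(binary), tamura_contrast(washed))
```

Both test images are perfectly bimodal (α_4 = 1), so the score reduces to the standard deviation and directly reflects the dynamic range.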
12.3.2.3 Directionality. Directionality is a global texture property. Direction-
ality (or lack of it) is due to both the basic shape of the texture element and the
placement rule used in creating the texture. Patterns can be highly directional
(e.g., a brick wall) or may be nondirectional, as in the case of a picture of a
cloud. The degree of directionality, measured on a scale of 0 to 1, can be used as
a descriptor (for example, see Ref. [27]). Thus, two patterns, which differ only in
their orientation, are considered to have the same degree of directionality. These
descriptions can be computed either in the spatial domain or in the frequency
domain. In [27], the oriented edge histogram (number of pixels in which edge
strength in a certain direction exceeds a given threshold) is used to measure the
degree of directionality. Edge strength and direction are computed using the Sobel
edge detector [28]. A histogram H(φ) of direction values φ is then constructed
by quantizing φ and counting the pixels with magnitude larger than a predefined
threshold. This histogram exhibits strong peaks for highly directional images and
is relatively flat for images without strong orientation. A quantitative measure of
directionality can be computed from the sharpness of the peaks as follows:
F_dir = 1 − r · n_p · Σ_p Σ_{φ ∈ w_p} (φ − φ_p)^2 · H(φ)     (12.4)

where n_p is the number of peaks and φ_p is the pth peak position of H. For each
peak p, w_p is the set of histogram bins distributed over it, and r is a
normalizing factor related to the quantization levels of φ.
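A simplified version of this procedure might look as follows. For brevity the sketch assumes a single peak (n_p = 1) and folds the normalizing factor into the sum, so it is not a faithful implementation of Eq. (12.4); the magnitude threshold and bin count are arbitrary choices of ours:

```python
import numpy as np

def directionality(image, nbins=16, threshold=12):
    """Oriented-edge histogram H(phi) and a one-peak directionality score."""
    img = image.astype(np.float64)
    # Sobel kernels, applied by brute-force valid convolution.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    ky = kx.T
    rows, cols = img.shape
    gx = np.zeros((rows - 2, cols - 2))
    gy = np.zeros_like(gx)
    for r in range(rows - 2):
        for c in range(cols - 2):
            patch = img[r:r + 3, c:c + 3]
            gx[r, c] = (patch * kx).sum()
            gy[r, c] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    phi = np.arctan2(gy, gx) % np.pi           # edge direction in [0, pi)
    # Histogram over directions, counting only strong edges.
    hist, edges = np.histogram(phi[mag > threshold],
                               bins=nbins, range=(0, np.pi))
    H = hist / max(hist.sum(), 1)
    centers = (edges[:-1] + edges[1:]) / 2
    peak = centers[H.argmax()]
    # Sharp, concentrated peaks give a score close to 1.
    return 1 - ((centers - peak) ** 2 * H).sum()

stripes = np.tile([0, 0, 255, 255], (12, 3))   # strong vertical edges
print(directionality(stripes))
```

For the striped test pattern every strong edge votes into the same direction bin, so the histogram is maximally peaked and the score is 1.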
In addition to the three components discussed earlier, Tamura and coworkers
[27] also consider three other features, which they term line-likeness, regularity,
and roughness. There appears to be a significant correlation between these three
features and coarseness, contrast, and directionality. It is not clear that adding
these additional dimensions enhances the effectiveness of the description. These
additional dimensions will not be used in the comparison experiments described
in Section 12.6.
Tamura’s features capture the high-level perceptual attributes of a texture well
and are useful for image browsing. However, they are not very effective for finer
texture discrimination.
12.4 AUTOREGRESSIVE AND RANDOM FIELD TEXTURE MODELS
One can think of a textured image as a two-dimensional (2D) array of random
numbers. Then, the pixel intensity at each location is a random variable. One can
model the image as a function f (r, ω), where r is the position vector representing
the pixel location in the 2D space and ω is a random parameter. For a given value
of r, f (r, ω) is a random variable (because ω is a random variable). Once we
select a specific texture ω, f (r, ω) is an image, namely, a function over the
two-dimensional grid indexed by r. f (r, ω) is called a random field [29]. Thus,
one can think of a texture-intensity distribution as a realization of a random
field. Random field models (also referred to as spatial-interaction models) impose
assumptions on the intensity distribution. One of the initial motivations for such
model-based analysis of texture is that these models can be used for texture
synthesis. There is a rich literature on random field models for texture analysis
dating back to the early 1970s, and these models have found applications
not only in texture synthesis but also in texture classification and segmentation
[9,11,13,30–34].
A typical random field model is characterized by a set of neighbors (typically, a
symmetric neighborhood around the pixel), a set of model coefficients, and a noise
sequence with certain specified characteristics. Given an array of observations
{y(s)} of pixel-intensity values, it is natural to expect that the pixel values are
locally correlated. This leads to the well known Markov model
P[y(s) | all y(r), r ≠ s] = P[y(s) | all y(s + r), r ∈ N],     (12.5)
where N is a symmetric neighborhood set. For example, if the neighborhood
is the four immediate neighbors of a pixel on a rectangular grid, then N =
{(0, 1), (1, 0), (−1, 0), (0, −1)}.
We refer to Besag [35,36] for the constraints on the conditional probability
density for the resulting random field to be Markov. If, in addition to being
Markov, {y(s)} is also Gaussian, then, a pixel value at s, y(s), can be written
as a linear combination of the pixel values y(s + r), r ∈ N, and an additive
correlated noise (see Ref. [34]).
A special case of the Markov random field (MRF) that has received much
attention in the image retrieval community is the simultaneous autoregressive
model (SAR), given by
y(s) = Σ_{r ∈ N} θ(r) y(s + r) + √β w(s),     (12.6)
where {y(s)} are the observed pixel intensities, s is the indexing of spatial loca-
tions, N is a symmetric neighborhood set, and w(s) is white noise with zero mean
and unit variance. The parameters ({θ(r)}, β) characterize the texture
observations {y(s)} and can be estimated from those observations. The SAR and MRF
models are related to each other in that, for every SAR there exists an equivalent
MRF with second-order statistics that are identical to the SAR model. However,
the converse is not true: given an MRF, there may not be an equivalent SAR.
The model parameters ({θ(r)}, β) form the texture feature vector that can be
used for classification and similarity retrieval. The second-order neighborhood,
consisting of the 8-neighborhood of a pixel, N = {(0, 1), (1, 0), (0, −1), (−1, 0),
(1, 1), (1, −1), (−1, −1), (−1, 1)}, has been widely used. For a symmetric model,
θ(r) = θ(−r); hence, five parameters are needed to specify a symmetric
second-order SAR model.
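One simple way to estimate SAR parameters from an observed texture is ordinary least squares over all interior pixels. This is a sketch of one common estimator, not necessarily the one used in the cited work; for brevity it uses the first-order 4-neighborhood, and the smoothed synthetic texture used to exercise it is our own construction:

```python
import numpy as np

def fit_sar(y, N=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """Least-squares fit of the SAR model of Eq. (12.6).

    Builds one linear equation y(s) = sum_r theta(r) y(s + r) + noise per
    interior pixel, solves for theta, and estimates beta as the residual
    variance.
    """
    y = y.astype(np.float64)
    rows, cols = y.shape
    targets, predictors = [], []
    for x in range(1, rows - 1):
        for c in range(1, cols - 1):
            targets.append(y[x, c])
            predictors.append([y[x + dx, c + dc] for dx, dc in N])
    A = np.array(predictors)
    b = np.array(targets)
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    beta = (b - A @ theta).var()
    return theta, beta

# Synthesize a locally correlated field: repeatedly pull each pixel toward
# the mean of its 4 neighbors, injecting a little fresh noise each pass.
rng = np.random.default_rng(0)
tex = rng.normal(size=(64, 64))
for _ in range(50):
    tex[1:-1, 1:-1] = 0.25 * (tex[:-2, 1:-1] + tex[2:, 1:-1]
                              + tex[1:-1, :-2] + tex[1:-1, 2:]) \
                      + 0.05 * rng.normal(size=(62, 62))
theta, beta = fit_sar(tex)
print(theta.round(2), beta)
```

Because the field is strongly locally correlated, the fitted model explains most of the variance, leaving a residual noise variance β much smaller than the raw image variance.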
In order to define an appropriate SAR model, one has to determine the size
of the neighborhood. This is a nontrivial problem, and often, a fixed-size neigh-
borhood does not represent all texture variations very well. In order to address
this issue, the multiresolution simultaneous autoregressive (MRSAR) model has
been proposed [37,38]. The MRSAR model tries to account for the variability
of texture primitives by defining the SAR model at different resolutions of a
Gaussian pyramid. Thus, three levels of the Gaussian pyramid, together with
a second-order symmetric model, require 15 (3 × 5) parameters to specify the
texture.
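A minimal Gaussian-pyramid construction for this multiresolution setting might look as follows; the separable (1, 2, 1)/4 binomial kernel and reflect padding are our own implementation choices, and fitting a SAR model at each of the three levels then yields the concatenated MRSAR feature vector described above:

```python
import numpy as np

def smooth(img):
    """Separable (1, 2, 1)/4 binomial low-pass filter, reflect padding."""
    padded = np.pad(img, 1, mode='reflect')
    h = (padded[:, :-2] + 2 * padded[:, 1:-1] + padded[:, 2:]) / 4.0
    return (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4.0

def gaussian_pyramid(image, levels=3):
    """Low-pass filter and subsample by 2 to produce `levels` resolutions."""
    pyr = [image.astype(np.float64)]
    for _ in range(levels - 1):
        pyr.append(smooth(pyr[-1])[::2, ::2])
    return pyr

pyr = gaussian_pyramid(np.random.default_rng(1).normal(size=(64, 64)))
print([p.shape for p in pyr])
```

Each coarser level sees texture primitives at twice the spatial extent of the previous one, which is how the MRSAR model sidesteps the fixed-neighborhood-size problem.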
12.4.1 Wold Model
Liu and Picard propose the Wold model for image retrieval applications [39].
It is based on the Wold decomposition of stationary stochastic processes. In
the Wold model, a 2D homogeneous random field is decomposed into three
mutually orthogonal components, which approximately correspond to the three
dimensions (periodicity, directionality, and complexity or randomness) identi-
fied by Rao and Lohse [20]. The construction of the Wold model proceeds as
follows. First, the periodicity of the texture pattern is analyzed by considering
the autocorrelation function of the image. Note that for periodic patterns, the
autocorrelation function is also periodic. The corresponding Wold feature set
consists of the frequencies and the magnitudes of the harmonic spectral peaks.
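This spectral-peak analysis can be sketched with a 2D FFT: by the Wiener-Khinchin relation, the power spectrum is the Fourier transform of the autocorrelation, so periodicity shows up as concentrated spectral peaks. The sinusoidal test grating and the top-k peak picking are our own illustrative assumptions:

```python
import numpy as np

def harmonic_peaks(image, topk=2):
    """Frequencies and magnitudes of the dominant spectral peaks."""
    img = image - image.mean()                  # drop the DC component
    spectrum = np.abs(np.fft.fft2(img)) ** 2    # power spectrum
    flat = spectrum.ravel()
    order = np.argsort(flat)[::-1][:topk]       # indices of largest peaks
    freqs = [np.unravel_index(i, spectrum.shape) for i in order]
    mags = flat[order]
    return freqs, mags

# A vertical grating with 4 cycles across 32 columns concentrates its
# energy at horizontal frequency index 4 (and its conjugate, 28):
cols = np.arange(32)
grating = np.tile(np.cos(2 * np.pi * 4 * cols / 32), (32, 1))
freqs, mags = harmonic_peaks(grating)
print(freqs)
```

For a genuinely periodic texture the recovered (frequency, magnitude) pairs are exactly the kind of harmonic Wold features described above; for random textures the spectrum is spread out and no dominant peaks emerge.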