Image Databases: Search and Retrieval of Digital Imagery
Edited by Vittorio Castelli, Lawrence D. Bergman
Copyright © 2002 John Wiley & Sons, Inc.
ISBNs: 0-471-32116-8 (Hardback); 0-471-22463-4 (Electronic)
11 Color for Image Retrieval
JOHN R. SMITH
IBM T.J. Watson Research Center, Hawthorne, New York
11.1 INTRODUCTION
Recent progress in multimedia database systems has resulted in solutions for integrating
and managing a variety of multimedia formats that include images, video,
audio, and text [1]. Advances in automatic feature extraction and image-content
analysis have enabled the development of new functionalities for searching,
filtering, and accessing images based on perceptual features such as color [2,3],
texture [4,5], shape [6], and spatial composition [7]. The content-based query
paradigm, which allows similarity searching based on visual features, addresses
the obstacles to accessing color image databases that result from the inability
of keyword or text-based annotations to completely, consistently, and objectively
describe the content of images. Although perceptual features such as color
distributions and color layout often provide a poor characterization of the actual
semantic content of the images, content-based query appears to be effective for
indexing and rapidly accessing images based on the similarity of visual features.
11.1.1 Content-Based Query Systems
The seminal work on content-based query of image databases was carried out
in the IBM query by image content (QBIC) project [2,8]. The QBIC project
explored methods for searching for images based on the similarity of global
image features of color, texture, and shape. The QBIC project developed a novel
method of prefiltering of queries that greatly reduces the number of target images
searched in similarity queries [9]. The MIT Photobook project extended some of
the early methods of content-based query by developing descriptors that provide
effective matching as well as the ability to reconstruct the images and their
features from the descriptors [5]. Smith and Chang developed a fully automated
content-based query system called VisualSEEk, which further extended content-
based querying of image databases by extracting regions and allowing searching
based on their spatial layout [10]. Other content-based image database systems
such as WebSEEk [11] and ImageRover [12] have focused on indexing and
searching of images on the World Wide Web. More recently, the MPEG-7 "Multimedia
Content Description Interface" standard provides standardized descriptors
for color, texture, shape, motion, and other features of audiovisual data to enable
fast and effective content-based searching [13].
11.1.2 Content-Based Query-by-Color
The objective of content-based query-by-color is to return the images whose color
features are most similar to the color features of a query image. Swain and
Ballard investigated the use of color histogram descriptors for searching of color
objects contained within the target images [3]. Stricker and Orengo developed
color moment descriptors for fast similarity searching of large image databases
[14]. Later, Stricker and Dimai developed a system for indexing of color images
based on the color moments of different regions [15]. In the spatial and feature
(SaFe) project, Smith and Chang designed a 166-bin color descriptor in HSV
color space and developed methods for graphically constructing content-based
queries that depict spatial layout of color regions [7]. Each of these approaches for
content-based query-by-color involves the design of color descriptors, including
the selection of the color feature space and a distance metric for measuring the
similarity of the color features.
11.1.3 Outline
This chapter investigates methods for content-based query of image databases
based on color features of images. In particular, the chapter focuses on the design
and extraction of color descriptors and the methods for matching. The chapter is
organized as follows. Section 11.2 analyzes the three main aspects of color feature
extraction, namely, the choice of a color space, the selection of a quantizer, and
the computation of color descriptors. Section 11.3 defines and discusses several
similarity measures and Section 11.4 evaluates their usefulness in content-based
image-query tasks. Concluding remarks and comments for future directions are
given in Section 11.5.
11.2 COLOR DESCRIPTOR EXTRACTION
Color is an important dimension of human visual perception that allows dis-
crimination and recognition of visual information. Correspondingly, color features
have been found to be effective for indexing and searching of color images in
image databases. Generally, color descriptors are relatively easily extracted and
matched and are therefore well-suited for content-based query. Typically, the
specification of a color descriptor¹ requires fixing a color space and determining
its partitioning.

¹ In this chapter we use the term "feature" to mean a perceptual characteristic of images that signifies
something to human observers, whereas "descriptor" means a numeric quantity that describes a
feature.
Images can be indexed by mapping their pixels into the quantized color space
and computing a color descriptor. Color descriptors such as color histograms can
be extracted from images in different ways. For example, in some cases, it is
important to capture the global color distribution of an image. In other cases,
it is important to capture the spatially localized apportionment of the colors to
different regions. In either case, because the descriptors are ultimately represented
as points in a multidimensional space, it is necessary to carefully define the
metrics for determining descriptor similarity.
The design space for color descriptors, which involves specification of the
color space, its partitioning, and the similarity metric, is therefore quite large.
There are a few evaluation points that can be used to guide the design. The
determination of the color space and partitioning can be done using color experiments
that perceptually gauge intra- and interpartition distributions of colors. The
determination of the color descriptors can be made using retrieval-effectiveness
experiments in which the content-based query-by-color results are compared to
known ground truth results for benchmark queries. The image database system
can be designed to allow the user to select from different descriptors based on
the query at hand. Alternatively, the image database system can use relevance
feedback to automatically weight the descriptors or select metrics based on user
feedback [16].
11.2.1 Color Space
A color space is the multidimensional space in which the different dimensions
represent the different components of color. Color or colored light, denoted by
function F(λ), is perceived as electromagnetic radiation in the range of visible
light (λ ∈ [380 nm, 780 nm]). It has been verified experimentally that color
is perceived through three independent color receptors that have peak responses
at approximately the red (r), green (g), and blue (b) wavelengths: λ_r = 700 nm,
λ_g = 546.1 nm, and λ_b = 435.8 nm, respectively. By assigning to each primary color
receptor a response function c_k(λ), where k ∈ {r, g, b}, the linear superposition
of the c_k(λ)'s represents visible light F(λ) of any color or wavelength λ [17].
By normalizing the c_k(λ)'s to reference white light W(λ) such that

    W(λ) = c_r(λ) + c_g(λ) + c_b(λ),                                      (11.1)
the colored light F(λ) produces the tristimulus responses (R, G, B) such that

    F(λ) = R·c_r(λ) + G·c_g(λ) + B·c_b(λ).                                (11.2)
As such, any color can be represented by a linear combination of the three primary
colors (R, G, B). The space spanned by the R, G, and B values completely
describes the visible colors, which are represented as vectors in the 3D RGB color
space. As a result, the RGB color space provides a useful starting point for
representing color features of images. However, the RGB color space is not
perceptually uniform. More specifically, equal distances in different areas and
along different dimensions of the 3D RGB color space do not correspond to equal
perception of color dissimilarity. The lack of perceptual uniformity results in the
need to develop more complex vector quantization to satisfactorily partition the
RGB color space to form the color descriptors. Alternative color spaces can be
generated by transforming the RGB color space. However, as yet, no consensus
has been reached regarding the optimality of different color spaces for content-
based query-by-color. The problem originates from the lack of any known single
perceptually uniform color space [18]. As a result, a large number of color spaces
have been used in practice for content-based query-by-color.
In general, the RGB colors, represented by vectors v_c, can be mapped to
different color spaces by means of a color transformation T_c. The notation w_c
indicates the transformed colors. The simplest color transformations are linear.
For example, linear transformations of the RGB color space produce a number
of important color spaces that include YIQ (NTSC composite color TV standard),
YUV (PAL and SECAM color television standards), YCrCb (JPEG digital image
coding standard and MPEG digital video coding standard), and the opponent color
space OPP [19]. Equation (11.3) gives the matrices that transform an RGB
vector into each of these color spaces. The YIQ, YUV, and YCrCb linear
color transforms have been adopted in color picture coding systems. These linear
transforms, each of which generates one luminance channel and two chrominance
channels, were designed specifically to accommodate targeted display devices:
YIQ for NTSC color television, YUV for PAL and SECAM color television, and
YCrCb for color computer displays. Because none of these color spaces is uniform,
color distance does not correspond well to perceptual color dissimilarity.
The opponent color space (OPP) was developed based on evidence that
human color vision uses an opponent-color model by which the responses of the
R, G,andB cones are combined into two opponent color pathways [20]. One
benefit of the OPP color space is that it is obtained easily by linear transform.
The disadvantages are that it is neither uniform nor natural. The color distance
in OPP color space does not provide a robust measure of color dissimilarity.
One component of OPP, the luminance channel, indicates brightness. The two
chrominance channels correspond to blue versus yellow and red versus green.
    T_c^YIQ   = [  0.299    0.587    0.114
                   0.596   −0.274   −0.322
                   0.211   −0.523    0.312 ]

    T_c^YUV   = [  0.299    0.587    0.114
                  −0.147   −0.289    0.436
                   0.615   −0.515   −0.100 ]

    T_c^YCrCb = [  0.2990   0.5870   0.1140
                   0.5000  −0.4187  −0.0813
                  −0.1687  −0.3313   0.5000 ]

    T_c^OPP   = [  0.333    0.333    0.333
                  −0.500   −0.500    1.000
                   0.500   −1.000    0.500 ]                              (11.3)
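As a concrete illustration, each of these transforms amounts to one 3 × 3 matrix multiplication per pixel. The following sketch (using NumPy, and assuming RGB components scaled to [0, 1]) applies the YIQ matrix of Eq. (11.3):

```python
import numpy as np

# YIQ transform matrix from Eq. (11.3).
T_YIQ = np.array([[0.299,  0.587,  0.114],
                  [0.596, -0.274, -0.322],
                  [0.211, -0.523,  0.312]])

def transform_pixels(pixels, T):
    """Apply a 3x3 linear color transform to an (N, 3) array of RGB pixels."""
    return pixels @ T.T

# A pure-red pixel and a mid-gray pixel (RGB in [0, 1] is an assumption).
pixels = np.array([[1.0, 0.0, 0.0],
                   [0.5, 0.5, 0.5]])
yiq = transform_pixels(pixels, T_YIQ)
print(yiq[0, 0])                       # 0.299, the luminance of pure red
# Gray has equal R, G, B, so both chrominance channels vanish.
print(np.allclose(yiq[1, 1:], 0.0))    # True
```

Note how the structure of the matrices is visible in the output: the first row sums to 1 (so gray maps to luminance 0.5), while the chrominance rows sum to 0.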
Although these linear color transforms are the simplest, they do not generate
natural or uniform color spaces. The Munsell color order system was designed to
be natural, compact, and complete. It organizes the colors according to natural
attributes [21]. Munsell's Book of Color
[22] contains 1,200 samples of color chips, each with a value of hue, saturation,
and chroma. The chips are spatially arranged (in three dimensions) so that steps
between neighboring chips are perceptually equal.
The advantage of the Munsell color order system results from its ordering of a
finite set of colors by perceptual similarities over an intuitive three-dimensional
space. The disadvantage is that the color order system does not indicate how to
transform or partition the RGB color space to produce the set of color chips.
Although one transformation, named the mathematical transform to Munsell
(MTM), from RGB to Munsell HVC was investigated for image data by Miyahara
[23], there does not exist a simple mapping from color points in RGB color
space to Munsell color chips. Although the Munsell space was designed to be
compact and complete, it does not satisfy the property of uniformity. The color
order system does not provide for the assessment of the similarity of color chips
that are not neighbors.
Other color spaces such as HSV, CIE 1976 (L*a*b*), and CIE 1976 (L*u*v*)
are generated by nonlinear transformation of the RGB space. With the goal of
deriving uniform color spaces, the CIE² in 1976 defined the CIE 1976 (L*u*v*)
and CIE 1976 (L*a*b*) color spaces [24]. These are generated by a linear
transformation from the RGB to the XYZ color space, followed by a different
nonlinear transformation. The CIE color spaces represent, with equal emphasis,
the three characteristics that best characterize color perceptually: hue, lightness,
and saturation. However, the CIE color spaces are inconvenient because of
the necessary nonlinearity of the transformations to and from the RGB color
space.
Although the determination of the optimum color space is an open problem,
certain color spaces have been found to be well-suited for content-based query-
by-color. In Ref. [25], Smith investigated one form of the hue, lightness, and
saturation transform from RGB to HSV, given in Ref. [26], for content-based
query-by-color. The transform to HSV is nonlinear, but it is easily invertible. The
HSV color space is natural and approximately perceptually uniform. Therefore,
the quantization of HSV can produce a collection of colors that is also compact
and complete. Recognizing the effectiveness of the HSV color space for content-
based query-by-color, MPEG-7 has adopted HSV as one of the color spaces
for defining color descriptors [27].
² Commission Internationale de l'Eclairage.
11.2.2 Color Quantization
By far, the most common category of color descriptors is the color histogram. Color
histograms capture the distribution of colors within an image or an image region.
When dealing with observations from distributions that are continuous or that can
take a large number of possible values, a histogram is constructed by associating
each bin to a set of observation values. Each bin of the histogram contains the
number of observations (i.e., the number of image pixels) that belong to the asso-
ciated set. Color belongs to this category of random variables: for example, the
color space of 24-bit images contains 2²⁴ distinct colors. Therefore, the partitioning
of the color space is an important step in constructing color histogram descriptors.
As color spaces are multidimensional, they can be partitioned by
multidimensional scalar quantization (i.e., by quantizing each dimension separately) or
by vector quantization methods. By definition, a vector quantizer Q_c of dimension
k and size M is a mapping from a vector in k-dimensional space into a finite set C
that contains M outputs [28]. Thus, a vector quantizer is defined as the mapping
Q_c : R^k → C, where C = (y_0, y_1, ..., y_{M−1}) and each y_m is a vector in the
k-dimensional Euclidean space R^k. The set C is customarily called a codebook,
and its elements are called code words. In the case of vector quantization of the
color space, k = 3 and each code word y_m is an actual color point. Therefore,
the codebook C represents a gamut or collection of colors.
The quantizer partitions the color space R^k into M disjoint sets R_m, one per
code word, that completely cover it:

    ∪_{m=0}^{M−1} R_m = R^k   and   R_m ∩ R_n = ∅   ∀ m ≠ n.              (11.4)
All the transformed color points w_c belonging to the same partition R_m are
quantized to (i.e., represented by) the same code word y_m:

    R_m = {w_c ∈ R^k : Q_c(w_c) = y_m}.                                   (11.5)
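A minimal sketch of such a quantizer, assuming nearest-neighbor partitioning of the color space and a toy four-color codebook (which is purely illustrative, not a codebook from the chapter), might look like this:

```python
import numpy as np

def quantize(points, codebook):
    """Map each k-dimensional point to the index of its nearest code word.

    This realizes Eq. (11.5) for a nearest-neighbor partition: every point
    falling in partition R_m is represented by code word y_m.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((points[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy RGB codebook: black, red, green, blue (illustrative only).
codebook = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
pixels = np.array([[0.9, 0.1, 0.0],    # near red
                   [0.1, 0.1, 0.1]])   # near black
print(quantize(pixels, codebook))      # [1 0]
```

The partitions R_m are then the Voronoi cells of the code words, so each cell's colors are approximated by a perceptually nearby representative.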
A good color space quantizer defines partitions that contain perceptually similar
colors and code words that well approximate the colors in their partition. The
quantization Q_c^166 of the HSV color space developed by Smith in Ref. [25]
partitions the HSV color space into 166 colors. As shown in Figure 11.1, the HSV
color space is cylindrical. The cylinder axis represents the value, which ranges
from blackness to whiteness. The distance from the axis represents the saturation,
which indicates the amount of presence of a color. The angle around the axis
is the hue, indicating tint or tone. As the hue represents the most perceptually
significant characteristic of color, it requires the finest quantization. As shown
in Figure 11.1, the primaries, red, green, and blue, are separated by 120 degrees
in the hue circle. A circular quantization at 20-degree steps separates the hues
so that the three primaries and yellow, magenta, and cyan are each represented
with three subdivisions. The other color dimensions are quantized more coarsely
Figure 11.1. The transformation T_c^HSV from RGB to HSV and the quantization Q_c^166
give 166 HSV colors = 18 hues × 3 saturations × 3 values + 4 grays. A color version of this
figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.
because the human visual system responds to them with less discrimination; we
use three levels each for value and saturation. This quantization, Q_c^166, provides
M = 166 distinct colors in HSV color space, derived from 18 hues (H) × 3
saturations (S) × 3 values (V) + 4 grays [29].
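A sketch of a quantizer in this style is shown below. The bin layout (18 × 3 × 3 chromatic bins plus 4 grays) follows the description above, but the saturation and value thresholds are illustrative assumptions, not the exact boundaries from Ref. [25]:

```python
import colorsys

def quantize_hsv166(r, g, b):
    """Map an RGB pixel (components in [0, 1]) to one of 166 HSV bins.

    Sketch of a Smith-style quantizer: 18 hues x 3 saturations x 3 values
    + 4 grays. Uniform thresholds below are assumptions.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    if s < 0.1:                        # achromatic: 4 gray bins by value
        return 162 + min(int(v * 4), 3)
    hue_bin = int(h * 18) % 18         # 18 hue sectors of 20 degrees each
    sat_bin = min(int(s * 3), 2)       # 3 saturation levels (assumed uniform)
    val_bin = min(int(v * 3), 2)       # 3 value levels (assumed uniform)
    return hue_bin * 9 + sat_bin * 3 + val_bin   # 162 chromatic bins: 0..161

print(quantize_hsv166(1.0, 0.0, 0.0))   # 8   (a chromatic bin: hue 0, max s, v)
print(quantize_hsv166(0.5, 0.5, 0.5))   # 164 (a gray bin, >= 162)
```

The finest partitioning goes to hue, matching the observation that hue is the perceptually most significant dimension.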
11.2.3 Color Descriptors
A color descriptor is a numeric quantity that describes a color feature of an
image. As with texture and shape, it is possible to extract color descriptors from
the image as a whole, producing a global characterization; or separately from
different regions, producing a local characterization. Global descriptors capture
the color content of the entire image but carry no information on the spatial
layout, whereas local descriptors can be used in conjunction with the position
and size of the corresponding regions to describe the spatial structure of the
image color.
11.2.3.1 Color Histograms. The vast majority of color descriptors are color
histograms or derived quantities. As previously mentioned, mapping the image
to an appropriate color space, quantizing the mapped image, and counting how
many times each quantized color occurs produce a color histogram. Formally, if
I denotes an image of size W × H, I_q(i, j) is the color of the quantized pixel
at position (i, j), and y_m is the mth code word of the vector quantizer, the color
histogram h_c has entries defined by

    h_c[m] = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} δ(I_q(i, j), y_m),  (m = 1, ..., M),  (11.6)
where the Kronecker delta function, δ(·, ·), is equal to 1 if its two arguments are
equal, and zero otherwise.
The histogram computed using Eq. 11.6 does not define a distribution because
the sum of the entries is not equal to 1 but is the total number of pixels of the
image. This definition is not conducive to comparing color histograms of images
of different sizes. To allow matching, the following class of normalizations
can be used:
    h_r = h / ( Σ_{m=0}^{M−1} |h[m]|^r )^{1/r},   (r = 1, 2).             (11.7)
Histograms normalized with r = 1 are empirical distributions, and they can be
compared with different metrics and dissimilarity indices. Histograms normalized
with r = 2 are unit vectors in the M-dimensional Euclidean space, namely, they
lie on the surface of the unit sphere. The similarity between two such histograms
can be represented, for example, by the angle between the corresponding vectors,
captured by their inner product.
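Both the counting of Eq. (11.6) and the normalizations of Eq. (11.7) are straightforward to sketch; the toy image below is assumed to be already quantized into M = 4 colors:

```python
import numpy as np

def color_histogram(quantized, M):
    """Color histogram of Eq. (11.6): count pixels per quantized color."""
    return np.bincount(quantized.ravel(), minlength=M).astype(float)

def normalize(h, r):
    """Normalization of Eq. (11.7): r=1 yields an empirical distribution,
    r=2 a unit vector on the M-dimensional sphere."""
    return h / (np.abs(h) ** r).sum() ** (1.0 / r)

# A toy 2x3 "image" already quantized into M = 4 colors.
img_q = np.array([[0, 0, 1],
                  [2, 1, 0]])
h = color_histogram(img_q, M=4)
print(h)                               # [3. 2. 1. 0.]
h1 = normalize(h, r=1)
print(round(h1.sum(), 6))              # 1.0  (entries sum to one)
h2 = normalize(h, r=2)
print(round(np.linalg.norm(h2), 6))    # 1.0  (unit Euclidean length)
```

With the r = 2 normalization, the inner product h2_q · h2_t of two histograms is the cosine of the angle between them, giving a natural similarity score.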
11.2.3.2 Region Color. One of the drawbacks of extracting color histograms
globally is that it does not take into account the spatial distribution of color
across different areas of the image. A number of methods have been developed
for integrating color and spatial information for content-based query. Stricker and
Dimai developed a method for partitioning each image into five nonoverlapping
spatial regions [15]. By extracting color descriptors from each of the regions, the
matching can optionally emphasize some regions or can accommodate matching
of rotated or flipped images. Similarly, Hsu and coworkers developed a method
for extracting color descriptors from local regions by imposing a spatial grid on
images [30]. Jacobs and coworkers developed a method for extracting color
descriptors from wavelet-transformed images, which allows fast matching of the
images based on location of color [31]. Figure 11.2 illustrates examples of
extracting localized color descriptors in ways similar to those explored in [15] and
[30], respectively. The basic approach involves partitioning the image into
multiple regions and extracting a color descriptor for each region. Corresponding
region-based color descriptors are compared in order to assess the similarity of
two images.
Figure 11.2a shows a partitioning of the image into five regions, r_0–r_4, in
which a single center region, r_0, captures the color features of any center object.
Figure 11.2b shows a partitioning of the image into sixteen uniformly spaced
regions, g_0–g_15. The dissimilarity of images based on the color spatial descriptors
can be measured by computing the weighted sum of individual region dissimilarities
as follows:

    d_{q,t} = Σ_{m=0}^{M−1} w_m · d(r_m^q, r_m^t),                        (11.8)

where r_m^q is the color descriptor of region m of the query image, r_m^t is the color
descriptor of region m of the target image, and w_m is the weight of the mth
distance and satisfies Σ_m w_m = 1.
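A grid-based variant of this scheme (in the spirit of Figure 11.2b) can be sketched as follows; the uniform region weights, the 4 × 4 grid, and the per-region L1 distance are assumptions chosen for illustration:

```python
import numpy as np

def grid_histograms(img_q, M, grid=(4, 4)):
    """One normalized color histogram per grid region (as in Fig. 11.2b)."""
    H, W = img_q.shape
    gh, gw = grid
    hists = []
    for i in range(gh):
        for j in range(gw):
            block = img_q[i * H // gh:(i + 1) * H // gh,
                          j * W // gw:(j + 1) * W // gw]
            h = np.bincount(block.ravel(), minlength=M).astype(float)
            hists.append(h / h.sum())
    return np.array(hists)             # shape (gh * gw, M)

def region_distance(hq, ht, weights=None):
    """Weighted sum of per-region L1 distances, in the spirit of Eq. (11.8).
    Uniform weights summing to 1 are an assumed default."""
    n = len(hq)
    w = np.full(n, 1.0 / n) if weights is None else weights
    return float(sum(w[m] * np.abs(hq[m] - ht[m]).sum() for m in range(n)))

# Two toy 8x8 quantized images over M = 4 colors: b differs in its top half.
a = np.zeros((8, 8), dtype=int)
b = np.zeros((8, 8), dtype=int)
b[:4, :] = 1
print(region_distance(grid_histograms(a, 4), grid_histograms(a, 4)))  # 0.0
print(region_distance(grid_histograms(a, 4), grid_histograms(b, 4)))  # 1.0
```

Raising the weight of the center regions would emphasize a centered object, while permuting region indices before matching can accommodate rotated or flipped images, as noted above.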
Alternatively, Smith and Chang developed a method that matches images based
on the extraction of prominent single regions, as shown in Figure 11.3 [32]. The
Figure 11.2. Representation of spatially localized color using region-based color
descriptors: (a) five regions r_0–r_4; (b) sixteen grid regions g_0–g_15. A color version
of this figure can be downloaded from ftp://wiley.com/public/sci_tech_med/image_databases.
Figure 11.3. The integrated spatial and color feature query approach matches the images
by comparing the spatial arrangements of regions: region extraction and spatial composition
produce region sets Q = {q_k} for the query image and T = {t_j} for the target image,
which are compared by D({q_k}, {t_j}).
VisualSEEk content-based query system allows the images to be matched by
matching the color regions based on color, size, and absolute and relative spatial
location [10]. In [7], it was reported that for some queries the integrated spatial
and color feature query approach improves retrieval effectiveness substantially
over content-based query-by-color using global color histograms.
11.3 COLOR DESCRIPTOR METRICS
A color descriptor metric indicates the similarity, or equivalently, the dissimilarity
of the color features of images by measuring the distance between color
descriptors in the multidimensional feature space. Color histogram metrics can
be evaluated according to their retrieval effectiveness and their computational
complexity. Retrieval effectiveness indicates how well the color histogram
metric captures the subjective, perceptual image dissimilarity by measuring the
effectiveness in retrieving images that are perceptually similar to query images.
Table 11.1 summarizes eight different metrics for measuring the dissimilarity of
color histogram descriptors.
11.3.1 Minkowski-Form Metrics
The first category of metrics for color histogram descriptors is based on the
Minkowski-form metric. Let h_q and h_t be the query and target color histograms,
respectively. Then

    d_{q,t}^r = Σ_{m=0}^{M−1} |h_q(m) − h_t(m)|^r.                        (11.9)
As illustrated in Figure 11.4, the computation of Minkowski distances between
color histograms accounts only for differences between corresponding color bins.
A Minkowski metric compares the proportion of a specific color within image q
to the proportion of the same color within image t, but not to the proportions of
Table 11.1. Summary of the Eight Color Histogram Descriptor Metrics (D1–D8)

    Metric   Description                        Category
    D1       Histogram L1 distance              Minkowski-form (r = 1)
    D2       Histogram L2 distance              Minkowski-form (r = 2)
    D3       Binary set Hamming distance        Binary Minkowski-form (r = 1)
    D4       Histogram quadratic distance       Quadratic-form
    D5       Binary set quadratic distance      Binary quadratic-form
    D6       Histogram Mahalanobis distance     Quadratic-form
    D7       Histogram mean distance            First moment
    D8       Histogram moment distance          Higher moments
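The Minkowski-form metrics D1 and D2 of Eq. (11.9) compare histograms strictly bin by bin, which the following sketch makes explicit (note that, as written in Eq. (11.9), the rth root is not taken):

```python
import numpy as np

def minkowski_distance(hq, ht, r=1):
    """Minkowski-form histogram distance of Eq. (11.9).

    Compares only corresponding bins: r=1 gives the L1 metric (D1),
    r=2 the squared-L2 form (D2).
    """
    return float((np.abs(hq - ht) ** r).sum())

# Two 4-bin normalized histograms sharing one color in common.
hq = np.array([0.5, 0.5, 0.0, 0.0])
ht = np.array([0.0, 0.5, 0.5, 0.0])
print(minkowski_distance(hq, ht, r=1))   # 1.0
print(minkowski_distance(hq, ht, r=2))   # 0.5
```

Because only corresponding bins interact, two perceptually similar but slightly shifted colors (e.g., adjacent hue bins) contribute a large distance, a weakness that the quadratic-form metrics (D4–D6) address by weighting cross-bin similarities.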
[...] follows: a benchmark query is issued to the system, the system retrieves the
images in rank order, and then, for each cutoff value k, the following values are
computed, where V_n ∈ {0, 1} is the relevance of the document with rank n, and
n, k ∈ [1, ..., N] range over the N images:

    • A_k = Σ_{n=1}^{k} V_n, the number of relevant results returned among the top k,
    • Σ_{n=1}^{k} (1 − V_n), the number of irrelevant results returned among the top k,
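These cutoff counts are simple to compute from a ranked list of binary relevance labels; the helper name below is an illustrative assumption, not notation from the chapter:

```python
def relevance_counts(V, k):
    """Counts at cutoff k from binary relevance labels V[0..N-1],
    where V[n] = 1 if the result at rank n+1 is relevant."""
    relevant = sum(V[:k])          # A_k: relevant results among the top k
    irrelevant = k - relevant      # irrelevant results among the top k
    return relevant, irrelevant

# Ranked results: relevant, irrelevant, relevant, relevant, irrelevant.
V = [1, 0, 1, 1, 0]
print(relevance_counts(V, 3))   # (2, 1)
print(relevance_counts(V, 5))   # (3, 2)
```

From these counts, standard retrieval-effectiveness figures such as precision at k (relevant/k) and recall at k (relevant divided by the total number of relevant images) follow directly.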