Object Recognition using Sparse Features of Color Images
T.T.-Quyen Bui¹, Keum-Shik Hong², Dang-Chung Nguyen¹, Anh-Tuan Do¹, Thanh-Long Nguyen¹, Ngoc-Minh Pham¹, Quang-Vinh Thai¹
¹Department of Automation Technology, Institute of Information Technology, Hanoi, Vietnam
²Department of Cogno-Mechatronics Engineering and School of Mechanical Engineering, Pusan National University, Busan, Korea
Email: quyenbt@ioit.ac.vn
Abstract—In this paper, we propose a new framework for the extraction of sparse features of color images. The framework is based on the structure of the standard model of the visual cortex; however, in place of symmetric Gabor filters, Gabor energy filters are utilized. Color information is taken into account in calculating sparse features of objects. The learning stage yields a set of prototype patches of color components that are simultaneously chosen over spatial position, spatial size, and multiple scales. A set of sparse features is obtained by means of the localized pooling method, after which it is passed to a classifier for object recognition and classification. The experimental results confirm the significant contribution of our framework to object recognition.
Keywords—object recognition; Gabor energy filter; sparse features; color-based features
I. INTRODUCTION
Object recognition is one of the most difficult challenges in the field of computer vision. Among the vast variety of existing approaches to object recognition, methods using a deep, biologically inspired architecture have proven remarkably successful. A computational model of object recognition in the visual cortex was introduced in [1]. The model consists of five levels, starting with a grey-scale image layer I and proceeding, in the higher levels, to alternating simple S and complex C units. The S units mix their inputs according to a bell-shaped tuning function to increase selectivity, while the C units pool their inputs by a maximum operation in order to increase invariance. A cortex-like mechanism that uses a symmetric Gabor filter with input grey images for recognition of complex visual scenes is presented in detail in [2]. The use of sparse features with limited receptive fields for multiclass object recognition was introduced in [3].
In Mutch and Lowe's approach [4], symmetric Gabor filters are applied at all grey-image positions and scales; by means of alternating template matching and max pooling operations, feature complexity and position/scale invariance are developed. Sparsity is increased by constraining the number of feature inputs, lateral inhibition, and feature selection. The result is that images are reduced to feature vectors that are computed by using a localized pooling method and classified by a support vector machine (SVM). Recently, Zhang et al. [5] proposed a framework in which two functional classes of color-sensitive neurons [6], single-opponent and double-opponent neurons, are used as the inputs of the model of Serre et al. [2]. Thériault et al. [7] improved the architecture of the visual cortex model by allowing filters (prototype patches) extracted from level C1 to combine multiple scales inside the same spatial neighborhood. This provides a better match to the local scale of image structure. Multi-resolution spatial pooling at level C2 is also performed, as a result of which both local and global spatial information are encoded to produce discriminative image signatures. Thériault et al. [7] demonstrated that their model outperforms several previous models, for example, those based on biologically inspired architectures [2][4][8] or bag-of-words (BoW) architectures [9-10].
Gabor filters and color information have been usefully exploited in several object recognition studies, owing to the effectiveness of Gabor filters as (localization) detectors of lines and edges [11] and the high discriminative power of color. The discerning ability of color information is important in many pattern recognition and image processing applications [12-13]. Certainly, the additional information provided by color often yields better results than methods that use only grey-scale information [14-16]. Bui and Hong [15] examined the use of color cues in the CIELAB color space in a color-based active basis model incorporating a template-based approach and the local power spectrums (LPSs) of color components combined with gradient-based features for object recognition. Martin et al. [17] explored the utility of local brightness, color, and texture cues in detecting natural image boundaries.
Based on the structure of the model of the visual cortex, the characteristics of Gabor filters, and the color cues of LAB color images, we propose a new framework in which color information, Gabor energy filters, and the architecture of the visual cortex model are utilized for object recognition. Color information, unlike the grey images used as the input of the visual cortex model, is taken into account in calculating the sparse features of the objects. The CIELAB color space was chosen, as it is designed to approximate human vision. We endeavored to maintain the structure of the generic framework of alternating convolution and max pooling operations, which enhances selectivity and invariance. An image represented in RGB color space is converted to LAB color space, and in place of symmetric Gabor filters, a set of spatial Gabor energy filters is applied to each color component. In the learning stage, we adapt the learning strategy presented in [7]. A set of discriminative prototype patches of color components is selected randomly over spatial position, spatial size, and several scales simultaneously and is extracted by the local maximum over scales and orientations. After alternating layers of feature mapping (convolution) and feature pooling, a set of sparse features is computed by a localized pooling method and is exploited by an SVM classifier [18] for object recognition.
The research contributions presented in this paper can be summarized as follows: as substitutes for the symmetric Gabor filters employed in previous models, Gabor energy filters are utilized in our framework; color information is taken into account in calculating sparse features of objects based on the structure of the model of the visual cortex; and experimental validations of our proposed framework with respect to previous models are provided. For the purposes of experiments, and in order to compare our model with previous ones, we use the leaves dataset [19], along with the airplane, motorcycle, and rear-car datasets [20], as well as subsets of CalTech101 [21].
This paper is organized as follows: A standard model of the visual cortex for object recognition is overviewed briefly in Section 2. The methodology of our proposed framework for the extraction of sparse features of color images is presented in Section 3. The results of experiments conducted on several image datasets for verification of our object recognition approach are reviewed in Section 4. Finally, concluding remarks are drawn in Section 5.
II. THE STANDARD VISUAL CORTEX MODEL
A standard object recognition model based on a theory of the ventral visual stream of the visual cortex was introduced in [1]. The structure of the model is retained in modified versions [2][4][7]. Serre et al. [2] used this model structure, with Gabor filter sets at eight bands with 16 scales, as an alternative to the original Gaussian filter bank; they added a learning step and investigated the contributions of the C1 and C2 features to object recognition. The basic network calculation in [2] is summarized as follows.
Layer S1: The two-dimensional Gabor function $g_{\lambda,\theta,\phi,\sigma}(x,y)$, wherein $(x,y) \in \mathbb{R}^2$, is centered at the origin and is given by [22]

$$g_{\lambda,\theta,\phi,\sigma}(x,y) = \exp\!\left(-\frac{\tilde{x}^2 + \gamma^2\tilde{y}^2}{2\sigma^2}\right)\cos\!\left(2\pi\frac{\tilde{x}}{\lambda} + \phi\right), \qquad (1)$$

where $\tilde{x} = x\cos\theta + y\sin\theta$, $\tilde{y} = -x\sin\theta + y\cos\theta$, and $\gamma$ is the spatial aspect ratio that determines the ellipticity of the support of the Gabor function. Further, $\sigma$ is the standard deviation of the Gaussian envelope and determines the size of the receptive field; $\lambda$ is the wavelength of the cosine factor, where $f = 1/\lambda$ is the spatial frequency; $\theta$, where $\theta \in [0,\pi)$, represents the orientation of the normal to the parallel stripes of the Gabor function; and finally, $\phi$, where $\phi \in (-\pi,\pi]$, is the phase offset that determines the symmetry of $g_{\lambda,\theta,\phi,\sigma}(x,y)$ with respect to the origin.
Layer S1 is the response of the convolution of the input image $I(x,y)$ with a set of spatial symmetric Gabor filters $g_{\lambda,\theta,0,\sigma}(x,y)$ with orientations $\theta \in \{\theta_1,\ldots,\theta_{N_\theta}\}$ and scales $\sigma \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$, where $N_\theta$ and $N_\sigma$ are the numbers of orientations and scales, respectively:

$$S1_{\sigma,\theta} = I(x,y) * g_{\lambda,\theta,0,\sigma}(x,y).$$

The S1 stage resembles an edge detector, since symmetric Gabor filters are active only near image edges. The S1 unit is a four-dimensional matrix (x/y/θ/σ).
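A minimal NumPy sketch of the filter in equation (1) is given below for concreteness; the aspect ratio γ = 0.3 and the zero-mean adjustment are assumptions borrowed from common HMAX implementations, not values stated in this section.

```python
import numpy as np

def gabor_kernel(size, sigma, lam, theta, phi=0.0, gamma=0.3):
    """Sample g_{lambda,theta,phi,sigma}(x, y) of equation (1) on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinate x~
    y_t = -x * np.sin(theta) + y * np.cos(theta)   # rotated coordinate y~
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_t / lam + phi)
    g = envelope * carrier
    return g - g.mean()                            # remove the DC component (assumption)

# Example: scale 1 of Table I (7 x 7 filter, sigma = 2.8, lambda = 3.5)
kernel = gabor_kernel(7, sigma=2.8, lam=3.5, theta=np.pi / 4)
```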
Layer C1: Layer C1 is the result of the selection of the maxima over the local spatial neighborhood and the down-sampling of the result. This pooling increases the tolerance to two-dimensional transformations from layer S1 to C1, thereby providing robustness to scale and translation. Similarly, the C1 unit is a four-dimensional matrix (x/y/θ/σ).
Layer S2: At every position and scale in the C1 layer, layer S2 is generated by template matching between the patch of C1 units centered at that position/scale and each of the $N_p$ prototype patches. The S2 unit is calculated as

$$S2_\sigma = \exp\!\left(-\beta\,\|X - P\|^2\right),$$

where $\beta$ is a tunable parameter, $X$ is an image patch from the C1 layer at scale $\sigma$, and $P$ is one of the $N_p$ features. The maps are calculated across all positions for each scale $\sigma_i$, $i = 1,\ldots,N_\sigma$. Here, the $N_p$ prototype patches (small image patches having dimensionality $n \times n \times N_\theta$) are randomly sampled from the C1 layers of training images.
Layer C2: A set of C2 features that is shift- and scale-invariant is generated by applying the global maximum over all scales and positions from the S2 map. A vector of $N_p$ C2 values is obtained for each image.
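The following sketch shows how one S2 response and its C2 feature could be computed under these definitions; the β value and the way patches are iterated are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def s2_response(X, P, beta=1.0):
    """S2 = exp(-beta * ||X - P||^2) for one C1 patch X and one prototype P."""
    return np.exp(-beta * np.sum((X - P) ** 2))

def c2_feature(c1_patches, P, beta=1.0):
    """C2: global max of the S2 map over all positions and scales."""
    return max(s2_response(X, P, beta) for X in c1_patches)

# c1_patches: iterable of n x n x N_theta patches from the C1 layers; repeating
# c2_feature for each of the N_p prototypes yields the N_p-dimensional C2 vector.
```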
III. SPARSE FEATURES OF COLOR IMAGES
In the visual cortex models in which symmetric Gabor filters are used [2][4-5][7], sparse features are actually values related to the edges of objects after a certain number of mappings, since the Gabor filters in the first layer resemble edge-detecting filters and are active only near image edges. According to the phase offset value of the Gabor function in equation (1), there are two filter types: symmetric and anti-symmetric. A filter that deploys an anti-symmetric Gabor function (phase offset 90 or -90 degrees) yields a maximum response exactly at an edge; however, due to the ripples of the Gabor function, there are flanking responses. A filter that deploys a symmetric Gabor function (phase offset 0 or 180 degrees) yields a maximum that is shifted from the edge. There are actually two maxima: one to the left and the other to the right of the edge. This can cause problems in selecting maxima over local neighborhoods for layer C1, since the symmetric Gabor filter has been used in layer S1 in previous research [2][4-5][7]. Moreover, it is detrimental to the calculation of a set of sparse features in layer C2, which uses the localized pooling method, whereby the maximum output for each test image is taken in the neighborhood of the training position of the prototype patch extracted from layer C1 [4][7].
When a Gabor energy filter is used, the responses to all lines and edges in the input image are equally strong. This filter provides a smooth response to an edge or a line of appropriate width, with a local maximum exactly at the edge or in the center of the line. If we apply thinning to this response, we obtain one thin line that follows the corresponding edge or line [23]. By contrast, a simple linear Gabor filter (of anti-symmetric or symmetric type) provides flanking responses owing to the ripples of the Gabor function. Therefore, the use of Gabor energy filters improves the precision of the calculation of the local maximum and localized pooling operations.
We propose a new framework for object recognition in which color information is taken into account in the calculation of sparse features of objects, as shown in Fig. 1. Layers in the framework alternate between "sensitivity" (to detecting features) and "invariance" (to position, scale, and orientation). An input natural image represented in RGB color space is converted to LAB color space. In addition, rather than symmetric Gabor filters, we use Gabor energy filters in the calculation of the first layer L1. Each color component is convolved with the Gabor energy filter at scales $\sigma \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$ and orientations $\theta \in \{\theta_1,\ldots,\theta_{N_\theta}\}$. At scale $\sigma$ and orientation $\theta$, the Gabor energy model $E_{\sigma,\theta}(x,y,c)$ of a color component is computed from the superposition of phases, as follows:

$$E_{\sigma,\theta}(x,y,c) = \sqrt{\left(I(x,y,c) * g_{\lambda,\theta,0,\sigma}(x,y)\right)^2 + \left(I(x,y,c) * g_{\lambda,\theta,-\pi/2,\sigma}(x,y)\right)^2},$$

where $c$ indicates an index of color components, $c \in \{c_1,\ldots,c_{N_C}\}$, and $N_C$ is the number of color components. We use eight scales, $N_\sigma = 8$, and the parameters of the Gabor energy filter are listed in Table 1.
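In effect, the energy model combines a quadrature pair of symmetric (φ = 0) and anti-symmetric (φ = -π/2) filters. A minimal sketch follows, reusing the gabor_kernel helper sketched in Section II; scipy.ndimage is an assumed tool choice, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_energy(channel, size, sigma, lam, theta):
    """E_{sigma,theta}(x, y, c) for one LAB color component `channel`."""
    even = convolve(channel, gabor_kernel(size, sigma, lam, theta, phi=0.0))
    odd = convolve(channel, gabor_kernel(size, sigma, lam, theta, phi=-np.pi / 2))
    return np.sqrt(even**2 + odd**2)  # local maximum falls exactly on the edge/line
```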
The superposition of the Gabor energy filter outputs over the color components is computed as

$$E_{Sum,\sigma,\theta_i}(x,y) = \sum_{c=1}^{N_C} E_{\sigma,\theta_i}(x,y,c).$$

The normalized response $\bar{E}_{Sum,\sigma,\theta_i}(x,y)$ of the output superposition is calculated by using a divisive normalization nonlinearity, in which each non-normalized response (cell) is suppressed by the pooled activity of a large number of non-normalized responses [24], as follows:

$$\bar{E}_{Sum,\sigma,\theta_i}(x,y) = \frac{K\,E_{Sum,\sigma,\theta_i}(x,y)}{\rho^2 + \sum_j E_{Sum,\sigma,\theta_j}(x,y)},$$

where $K$ is a constant scale factor and $\rho^2$ is the semi-saturation constant. The summation $\sum_j E_{Sum,\sigma,\theta_j}(x,y)$ is taken over a large number of non-normalized responses with different tunings, and contains $E_{Sum,\sigma,\theta_i}(x,y)$, which appears in the numerator.
Figure 1. The proposed scheme of object recognition.
TABLE I. THE PARAMETERS OF THE GABOR FILTERS. THESE PARAMETERS ARE USED IN [2].
Scale   Filter size   Gabor σ   Wavelength λ
1       7 × 7         2.8       3.5
2       11 × 11       4.5       5.6
3       15 × 15       6.7       7.9
4       19 × 19       8.2       10.3
5       23 × 23       10.2      12.7
6       27 × 27       12.3      15.5
7       31 × 31       14.6      18.2
8       35 × 35       17.0      21.2
Since the value ρ is non-zero, the normalized response has a value from 0 to K, saturating for high contrasts. This normalization step is necessary, since the model preserves the essential features of linearity in the face of apparently contradictory behavior [24]. Here, the normalized responses are pooled over all orientations. Parameters K and ρ were 1 and 0.225, respectively, in our experiments.
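A sketch of this normalization is given below, assuming the pooled activity runs over the orientation axis as described above, with the experimental values K = 1 and ρ = 0.225.

```python
import numpy as np

def normalize_energy(E_sum, K=1.0, rho=0.225):
    """Divisive normalization of summed energy maps of shape (N_theta, H, W)."""
    pool = E_sum.sum(axis=0, keepdims=True)   # pooled activity over all orientations
    return K * E_sum / (rho**2 + pool)        # response saturates toward K
```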
+ Layer L2: Layer L2 is the reduced version of L1, obtained by taking the maxima in the local spatial neighborhood and over two adjacent scales. This pooling increases the tolerance to position and scale, providing robustness to translation and scale. We use spatial pooling sizes that are proportional to the scale, as presented by Serre et al. [2]. The sizes are g × g, where g ∈ {8, 10, 12, 14, 16, 18, 22, 24}.
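One plausible realization of this pooling is sketched below; the stride-g down-sampling and the pairing of adjacent scales are assumptions about details not fully specified here.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def pool_l2(L1, g):
    """L1: array (N_sigma, N_theta, H, W). Returns the pooled, down-sampled L2."""
    local = maximum_filter(L1, size=(1, 1, g, g))[:, :, ::g, ::g]  # local spatial max
    return np.maximum(local[:-1], local[1:])   # max over two adjacent scales
```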
+ Layer L3: The layer L3 unit is obtained by the convolution product of layer L2 with a prototype patch P centered at scale σ. Both components of the convolution operation are normalized to unit length. The purpose of this step is to maintain the features' geometrical similarities under variations of the light intensity. Before calculating layer L3 of an image, we need to conduct the learning stage to extract a set of prototype patches collected from all training images.
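A minimal sketch of one L3 unit, assuming the unit-length normalization is applied to the flattened L2 patch and prototype before their inner product:

```python
import numpy as np

def l3_response(patch, prototype, eps=1e-9):
    """Normalized inner product of an L2 patch with a prototype patch."""
    x = patch.ravel()
    p = prototype.ravel()
    x = x / (np.linalg.norm(x) + eps)   # unit length: invariant to intensity scaling
    p = p / (np.linalg.norm(p) + eps)
    return float(x @ p)
```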
+ Layer L4: For each image, the number of sparse features obtained is N_p if the number of prototype patches in the learning stage is N_p. We use the localized pooling method for calculating layer L4. The training position of each prototype patch is recorded. Instead of pooling over the entire image, the L4 unit is the response of taking the maximum output for a test image in the neighborhood of the training position. This approach allows some geometric information to be retained above the L4 level, and gains global invariance. Thériault et al. [7] considered both cases, localized pooling [4] and multi-resolution pooling, and their results show that multi-resolution pooling at the L4 stage yields better performance (an additional 2% increase when testing on CalTech101 for 30 training examples with 4,080 prototype patches). Here, we used the multi-resolution pooling presented in [7]. The local pooling regions are circles centered at the training position of each prototype patch.
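The localized max over one circular pooling region can be sketched as follows; the radius value and this single-resolution form are simplifications of the multi-resolution variant of [7].

```python
import numpy as np

def l4_localized_max(response_map, center, radius):
    """Max of an (H, W) L3 response map inside a circle at the training position."""
    H, W = response_map.shape
    rows, cols = np.ogrid[:H, :W]
    mask = (rows - center[0])**2 + (cols - center[1])**2 <= radius**2
    return response_map[mask].max()
```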
+ Learning stage: We adapted the training process presented in [7], in which the local scale and orientation learned fit the local image structure. Training the prototype patches with a lower fitness error increases the network's invariance to basic local geometrical transformations, and these prototype patches are less sensitive to local perturbations around the axes of relevant image structures. Additionally, local scaling in the network sketches out the necessary balance between discriminative power and invariance for classification. Unlike the learning in [2][4-5], instead of selecting prototype patches at a single scale, the learning selects a large pool of prototypes of different sizes n × n, where n ∈ {4, 8, 12, 16}, at random positions and multiple scales simultaneously from the L2 layer of random training images. In training, the coefficients that correspond to "weak" scales and orientations are set to zero. Let $N_P$ be the number of prototype patches extracted by learning. Let $S = \{\sigma_1, \sigma_3, \sigma_5, \sigma_7\}$ and $\Theta = \{k\pi/N_\theta,\ k = 0,\ldots,N_\theta - 1\}$. The rule for selecting prototype patches is

$$P_i(x_i,y_i,\sigma^*,\theta^*) = \begin{cases} B_i(x_i,y_i,\sigma^*,\theta^*) & \text{if } B_i(x_i,y_i,\sigma^*,\theta^*) = \max_{\sigma,\theta} B_i(x_i,y_i,\sigma,\theta), \\ 0 & \text{otherwise}, \end{cases}$$

where $B_i(x_i,y_i,\sigma,\theta)$ is an image patch of size n × n, with n ∈ {4, 8, 12, 16}, at a random position $p_i(x_i,y_i,\sigma_s)$, $\sigma_s \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$, on the layer L2 of a random training image, with $\sigma \in S$ and $\theta \in \Theta$. Patch $B_i(x_i,y_i,\sigma^*,\theta^*)$ is obtained from the local maximum over orientations and scales. This rule makes the set of prototype patches more discriminative, whereby weaker scales and orientations are ignored during the testing process. In the learning stage, the image from each class is selected randomly, but we pick out prototype patches from the training dataset of each class equally. The learning procedure is carried out as Algorithm 1.
+ Classification stage: Two sets of sparse features of images in the training and testing datasets are calculated from our framework and passed to the classifier LibSVM [18] for training and classification. For multi-class classification, we used the one-against-all (OAA) SVM approach first introduced by Vapnik [25].
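A sketch of this stage is shown below; scikit-learn's SVC (which wraps libsvm) inside a one-vs-rest wrapper stands in for the direct LibSVM calls used in the paper, so the binding and its default parameters are assumptions.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_and_predict(train_features, train_labels, test_features):
    """Features are the N_p-dimensional sparse-feature vectors from layer L4."""
    clf = OneVsRestClassifier(SVC(kernel="linear"))   # one-against-all SVMs
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)
```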
Performance measures: In order to obtain performance measures, decision functions are applied to predict the labels (target values) of the testing data. The prediction result is evaluated as follows:

$$\text{Accuracy} = \frac{\#\,\text{correctly predicted data}}{\#\,\text{total testing data}}.$$
IV. EXPERIMENTAL RESULTS
Here, we illustrate the use of our proposed framework in object recognition. All images were rescaled to 140 pixels on the shortest side, and the other side was rescaled automatically to preserve the image aspect ratio. The two color-based sparse-feature sets of the training and testing datasets obtained at the L4 stage were converted to the LibSVM data format and then passed to LibSVM for recognition and classification. Our object recognition and classification process was executed on static images and not in real time. The results reported are the mean and standard deviation of the performance measures after five runs with random splits. We re-implemented the model in [7], because it is the most closely related to our model. We did not focus on the improvement of SVM classifiers. In order to make fair comparisons between the methods, we used the same values of the parameters of the algorithms for learning and recognizing objects (e.g., the number of prototype patches, fixed splits on datasets, etc.), as well as the same SVM parameters.
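A small sketch of the preprocessing and scoring just described; Pillow is an assumed tool choice.

```python
from PIL import Image

def rescale_shortest_side(img, target=140):
    """Rescale so the shortest side is `target` pixels, preserving aspect ratio."""
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)))

def accuracy(predicted, truth):
    """# correctly predicted data / # total testing data."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)
```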
TABLE II. A COMPARISON OF RECOGNITION PERFORMANCE MEASURES OBTAINED FOR THE DATASETS. THE NUMBER OF PROTOTYPE PATCHES IN THE LEARNING STAGE IS 1,000; THE NUMBER OF NEGATIVE IMAGES IS 200. THE RESULTS IN THE "BENCHMARK" AND "SERRE ET AL. [2]" COLUMNS ARE THOSE ACHIEVED BY THEIR OWN IMPLEMENTATIONS.
Datasets (unit: %)   Benchmark   Serre et al. [2]   Thériault et al. [7]   Our model
Leaves               84.0        95.9               97.43 ± 0.10           98.85 ± 0.10
Motorcycles          95.0        97.4               98.46 ± 0.09           99.60 ± 0.06
Airplanes            94.0        94.9               96.50 ± 0.15           98.36 ± 0.12
Figure 2. Sample images from the leaves, car-rear, motorcycle, and airplane datasets [19-20].
Algorithm 1. Selection of prototype patches
For i = 1 : N_P
  + Select one random training image.
  + Convert the image to LAB color space.
  + Calculate layer L2 of the image.
  + Select a random position p_i(x_i, y_i, σ_s), σ_s ∈ {σ_1, ..., σ_{N_σ}}, on layer L2.
  + Extract a random patch B_i(x_i, y_i, σ_s, θ) of size n × n at position p_i.
  + If B_i(x_i, y_i, σ*, θ*) = max_{σ,θ} B_i(x_i, y_i, σ, θ)
      then P_i(x_i, y_i, σ*, θ*) = B_i(x_i, y_i, σ*, θ*)
      else P_i(x_i, y_i, σ*, θ*) = 0.
  + Record prototype patch P_i(x_i, y_i, σ*, θ*) and position p_i(x_i, y_i, σ_s).
End for
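A Python rendering of Algorithm 1 is sketched below under stated assumptions: compute_l2 is a hypothetical helper returning the L2 layer of an LAB image as an (N_sigma, N_theta, H, W) array, and the per-patch energy used to pick (σ*, θ*) is one reading of the local maximum over orientations and scales.

```python
import numpy as np

def select_prototypes(images, N_P, compute_l2, rng, sizes=(4, 8, 12, 16)):
    """Return N_P (prototype patch, training position) pairs."""
    prototypes = []
    for _ in range(N_P):
        L2 = compute_l2(images[rng.integers(len(images))])  # random training image
        n = int(rng.choice(sizes))
        _, _, H, W = L2.shape
        x, y = int(rng.integers(H - n)), int(rng.integers(W - n))
        B = L2[:, :, x:x + n, y:y + n]          # patch at all (sigma, theta) tunings
        s, t = np.unravel_index(B.sum(axis=(2, 3)).argmax(), B.shape[:2])
        P = np.zeros_like(B)
        P[s, t] = B[s, t]                       # keep (sigma*, theta*); zero the rest
        prototypes.append((P, (x, y)))          # record patch and training position
    return prototypes
```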
Figure 3. Sample images of objects from the CalTech101 dataset (sunflowers, starfish, crabs, cougar faces, dragonflies, crayfish).
TABLE III. RECOGNITION PERFORMANCE OBTAINED ON A SUBSET OF 94 CATEGORIES (7,687 IMAGES) FROM THE CALTECH101 DATASET. THE USE OF 15 AND 30 TRAINING IMAGES PER CLASS CORRESPONDS TO COLUMNS 2 AND 3, RESPECTIVELY. THE NUMBER OF PROTOTYPE PATCHES IS 4,000.

Method                 15 images/cat.   30 images/cat.
Thériault et al. [7]   60.80 ± 0.45     69.63 ± 0.38
A. Single-class object recognition
Here, the use of our framework for single-class object recognition is demonstrated, and the experimental results are compared with those of previous works, such as the benchmark systems involved (the constellation models [20]) and the grey-based sparse features in [2][7]. Each object class is recognized independently, and a set of sparse features of each object is extracted from each positive training image dataset. We considered datasets of the following objects: leaves, car rears, airplanes, and motorcycles from [19-20]. Fig. 2 displays sample images from the leaves, car-rear, motorcycle, and airplane datasets [19-20].
In the testing on the car-rear, leaves, airplane, and motorcycle datasets, we used the same fixed splits as in [20] in all of the experiments: each dataset was randomly split into two separate datasets of equal size. One dataset was used for the learning stage, and the second one for testing. Table 2 provides a summary of the results achieved with the methods. The values in the Benchmark and Serre et al. [2] columns are results published from the benchmark systems [19-20] and by Serre et al. [2], respectively. Both the models of Serre et al. [2] and Thériault et al. [7] use grey images as the input of the deep, biologically inspired architectures, but the performance obtained from the model of Thériault et al. [7] is better, owing to its incorporation of a learning stage in which prototype patches are extracted over spatial position, spatial orientations, and multiple scales. In addition, in place of pooling over an entire image as in [2], Thériault et al. [7] used the localized pooling method for the calculation of sparse feature sets. Nevertheless, our model yielded the highest performance because, along with the learning strategy presented in [7], Gabor energy filters are utilized to increase the precision of the local maximum and localized pooling operations; further, color information is taken into account in calculating the sparse features of objects. The experimental results of single-class object recognition showed that the use of color information in our model imparts significant improvements to object recognition.
B. Multi-class object recognition
Here, we illustrate the use of our framework for multi-class object recognition. Unlike the case of single-class object recognition, universal sparse features are extracted from random training image datasets and shared by several object classes. We used subsets from the CalTech101 dataset, in which most objects of interest are central and dominant. CalTech101, in which objects are set against either a background or a plain natural scene, is composed of both grey- and color-image types. CalTech101 comprises 101 object classes plus a background class, totaling 9,144 images. However, because our framework works on color images, we collected subsets from those datasets for our experiments. Fig. 3 displays sample images of objects from the CalTech101 dataset.
We employed fixed splits as follows: either 15 or 30 images were randomly selected from each object dataset for a training set. A testing dataset was collected from the images remaining in the datasets of objects. Table 3 displays the recognition performance achieved on a subset of 94 categories (7,687 images) from the CalTech101 dataset, corresponding to the cases of 15 and 30 training images per class, when the number of prototype patches in the learning stage was 4,000. The recognition performance for 30 training images per class was better than that for 15 training images per class, even though the number of prototype patches was the same. Moreover, our model yielded better results in both cases (15 and 30 training images per class), specifically an improvement in classification score of around 6%. These results confirm that the use of Gabor energy filters and color information in our deep, biologically inspired architecture yields significant improvements in object recognition and classification.
V. CONCLUSIONS
In this paper, we presented a new framework in which a combination of Gabor energy filters, the structure of the visual cortex model, and color information is used for the extraction of sparse features of color images in the process of object recognition. In the learning stage, a set of prototype patches of color components is selected over spatial position, spatial size, and multiple scales simultaneously and is extracted by the local maximum over scales and orientations. A set of sparse features of objects is computed by the localized pooling method, after which it is exploited by an SVM classifier for object recognition and classification. The utility of our framework in recognizing objects was illustrated on various datasets. The experimental results show that our framework effects significant improvement in object recognition.
REFERENCES
[1] M. Riesenhuber & T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, 2(11), 1019–1025, 1999.
[2] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, & T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426, 2007.
[3] J. Mutch & D.G. Lowe, "Multiclass object recognition with sparse localized features," In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York: CVPR, 2006, pp. 11–18.
[4] J. Mutch & D.G. Lowe, "Object class recognition and localization using sparse features with limited receptive fields," International Journal of Computer Vision, 80(1), 45–57, 2008.
[5] J. Zhang, Y. Barhomi, & T. Serre, "A new biologically inspired color image descriptor," In Computer Vision – ECCV 2012, Proceedings of the 12th European Conference on Computer Vision, Part V, Heidelberg: Springer-Verlag Berlin, 2012, pp. 312–324.
[6] R. Shapley & M. Hawken, "Color in the cortex: Single- and double-opponent cells," Vision Research, 51(7), 701–717, 2011.
[7] C. Thériault, N. Thome, & M. Cord, "Extended coding and pooling in the HMAX model," IEEE Transactions on Image Processing, 22(2), 764–777, 2013.
[8] Y. Huang, K. Huang, D. Tao, T. Tan, & X. Li, "Enhanced biologically inspired model for object recognition," IEEE Transactions on Systems, Man, and Cybernetics, Part B, 41(6), 1668–1680, 2011.
[9] S. Lazebnik, C. Schmid, & J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York: CVPR, 2006, pp. 2169–2178.
[10] J. Yang, K. Yu, Y. Gong, & T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami: CVPR, 2009, pp. 1794–1801.
[11] N. Petkov & P. Kruizinga, "Computational models of visual neurons specialized in the detection of periodic and aperiodic oriented visual stimuli: Bar and grating cells," Biological Cybernetics, 76(2), 83–96, 1997.
[12] T.T.Q. Bui & K.S. Hong, "Supervised learning of a color-based active basis model for object recognition," In Proceedings of the 2nd International Conference on Knowledge and Systems Engineering, 2010, pp. 69–74.
[13] Y.S. Heo, K.M. Lee, & S.U. Lee, "Joint depth map and color consistency estimation for stereo images with different illuminations and cameras," IEEE Transactions on Pattern Analysis and Machine Intelligence, http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.167
[14] K.E. van de Sande, T. Gevers, & C.G.M. Snoek, "Evaluating color descriptors for object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596, 2010.
[15] T.T.Q. Bui & K.S. Hong, "Evaluating a color-based active basis model for object recognition," Computer Vision and Image Understanding, 116(11), 1111–1120, 2012.
[16] F.S. Khan, J. van de Weijer, & M. Vanrell, "Modulating shape features by color attention for object recognition," International Journal of Computer Vision, 98(1), 49–64, 2012.
[17] D.R. Martin, C.C. Fowlkes, & J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5), 530–549, 2004.
[18] C.C. Chang & C.J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, 2(27), 1–27, 2011.
[19] M. Weber, M. Welling, & P. Perona, "Unsupervised learning of models for recognition," In Computer Vision – ECCV 2000, Proceedings of the 6th European Conference on Computer Vision, Part I, Heidelberg: Springer-Verlag Berlin, 2000, pp. 18–32.
[20] R. Fergus, P. Perona, & A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison: IEEE Computer Society, 2003, pp. 264–271.
[21] L. Fei-Fei, R. Fergus, & P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Washington: IEEE Computer Society, 2004, pp. 178–187.
[22] J.G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2(7), 1160–1169, 1985.
[23] N. Petkov, "Biologically motivated computationally intensive approaches to image pattern recognition," Future Generation Computer Systems, 11(4-5), 451–465, 1995.
[24] D.J. Heeger, "Modeling simple-cell direction selectivity with normalized, half-squared, linear operators," Journal of Neurophysiology, 70(5), 1885–1898, 1993.
[25] V. Vapnik, The Nature of Statistical Learning Theory, London: Springer-Verlag, 1995.