
Object Recognition using Sparse Features of Color Images

T.T.-Quyen Bui¹, Keum-Shik Hong², Dang-Chung Nguyen¹, Anh-Tuan Do¹, Thanh-Long Nguyen¹, Ngoc-Minh Pham¹, Quang-Vinh Thai¹

¹ Department of Automation Technology, Institute of Information Technology, Hanoi, Vietnam
² Department of Cogno-Mechatronics Engineering and School of Mechanical Engineering, Pusan National University, Busan, Korea
Email: quyenbt@ioit.ac.vn

Abstract—In this paper, we propose a new framework for extraction of sparse features of color images. The framework is based on the structure of the standard model of the visual cortex; however, in place of symmetric Gabor filters, Gabor energy filters are utilized. Color information is taken into account in calculating sparse features of objects. The learning stage yields a set of prototype patches of color components that are simultaneously chosen over spatial position, spatial size, and multiple scales. A set of sparse features is obtained by means of the localized pooling method, after which it is passed to a classifier for object recognition and classification. The experimental results confirm the significant contribution of our framework to object recognition.

Keywords-object recognition; Gabor energy filter; sparse features; color-based features

I. INTRODUCTION

Object recognition is one of the most difficult challenges in the field of computer vision. Among the vast variety of existing approaches to object recognition, methods using a deep, biologically inspired architecture have proven remarkably successful. A computational model of object recognition in the visual cortex is introduced in [1]. The model consists of five levels, starting with a grey-scale image layer I and proceeding, in the higher levels, to alternating simple S and complex C units. The S units mix their inputs according to a bell-shaped tuning function to increase selectivity, while the C units pool their inputs by a maximum operation in order to increase invariance. A cortex-like mechanism that uses a symmetric Gabor filter with input grey images for recognition of complex visual scenes is presented in detail in [2]. The use of sparse features with limited receptive fields for multiclass object recognition is introduced in [3]. In Mutch and Lowe's approach [4], symmetric Gabor filters are applied at all grey-image positions and scales; by means of alternating template matching and max pooling operations, feature complexity and position/scale invariance are developed. Sparsity is increased by constraining the number of feature inputs, lateral inhibition, and feature selection. The result is that images are reduced to feature vectors that are computed by using a localized pooling method and classified by a support vector machine (SVM). Recently, Zhang et al. [5] proposed a framework in which two functional classes of color-sensitive neurons [6], single-opponent and double-opponent neurons, are used as the inputs of the model of Serre et al. [2]. Thériault et al. [7] improved the architecture of the visual cortex model by allowing filters (prototype patches) extracted from level C1 to combine multiple scales inside the same spatial neighborhood. This provides a better match to the local scale of image structure. Multi-resolution spatial pooling, at level C2, is also performed, as a result of which both local and global spatial information are encoded to produce discriminative image signatures. Thériault et al. [7] demonstrated that their model outperforms several previous models, for example, those based on biologically inspired architecture [2][4][8] or bag-of-words (BoW) architecture [9-10].

Gabor filters and color information have been usefully exploited in several object recognition studies, owing to the effectiveness of Gabor filters as (localization) detectors of lines and edges [11] and the high discriminative power of color. The discerning ability of color information is important in many pattern recognition and image processing applications [12-13]. Certainly, the additional information provided by color often yields better results than methods that use only grey-scale information [14-16]. Bui and Hong [15] examined the use of color cues in the CIELAB color space in a color-based active basis model incorporating a template-based approach, and the local power spectrums (LPSs) of color components combined with gradient-based features for object recognition. Martin et al. [17] explored the utility of local brightness, color, and texture cues in detecting natural image boundaries.

Based on the structure of the model of the visual cortex, the characteristics of Gabor filters, and the color cues of LAB color images, we propose a new framework in which color information, Gabor energy filters, and the architecture of the visual cortex model are utilized for object recognition. Color information, unlike the grey images used as the input of the visual cortex model, is taken into account in calculating the sparse features of the objects. The CIELAB color space was chosen, as it is designed to approximate human vision. We endeavored to maintain the structure of the generic framework of alternating convolution and max pooling operations, which enhances selectivity and invariance. An image represented in RGB color space is converted to LAB color space, and in place of symmetric Gabor filters, a set of spatial Gabor energy filters is applied to each color component. In the learning stage, we adapt the learning strategy presented in [7]. A set of discriminative prototype patches of color components is selected randomly over spatial position, spatial size, and several scales simultaneously, and is extracted by the local maximum over scales and orientations. After alternating layers of feature mapping (convolution) and feature pooling, a set of sparse features is computed by a localized pooling method and is exploited by an SVM classifier [18] for object recognition.

The research contributions presented in this paper can be summarized as follows: as substitutes for the symmetric Gabor filters employed in previous models, Gabor energy filters are utilized in our framework; color information is taken into account in calculating sparse features of objects based on the structure of the model of the visual cortex; and experimental validations of our proposed framework with respect to previous models are provided. For the purposes of experiments, and in order to compare our model with previous ones, we use the leaves dataset [19], along with the airplane, motorcycle, and rear-car datasets [20], as well as subsets of CalTech101 [21].

This paper is organized as follows: a standard model of the visual cortex for object recognition is overviewed briefly in Section 2. The methodology of our proposed framework for extraction of sparse features of color images is presented in Section 3. The results of experiments conducted on several image datasets for verification of our object recognition approach are reviewed in Section 4. Finally, concluding remarks are drawn in Section 5.

II. THE STANDARD VISUAL CORTEX MODEL

A standard object recognition model based on a theory of the ventral visual stream of the visual cortex was introduced in [1]. The structure of the model is kept in modified versions [2][4][7]. Serre et al. [2] used this model structure, with Gabor filter sets at eight bands with 16 scales, as an alternative to the original Gaussian filter bank; they added a learning step, and investigated the contributions of the C1 and C2 features to object recognition. The basic network calculation in [2] is summarized as follows.

Layer S1: The two-dimensional Gabor function $g_{\sigma,\theta,\lambda,\varphi}(x,y)$, wherein $(x,y)\in\mathbb{R}^2$, is centered at the origin and is given by [22]

$$g_{\sigma,\theta,\lambda,\varphi}(x,y) = \exp\!\left(-\frac{\tilde{x}^2 + \gamma^2\tilde{y}^2}{2\sigma^2}\right)\cos\!\left(\frac{2\pi\tilde{x}}{\lambda} + \varphi\right), \quad (1)$$

where $\tilde{x} = x\cos\theta + y\sin\theta$, $\tilde{y} = -x\sin\theta + y\cos\theta$, and $\gamma$ is the spatial aspect ratio that determines the ellipticity of the support of the Gabor function. Further, $\sigma$ is the standard deviation of the Gaussian envelope and determines the size of the receptive field; $\lambda$ is the wavelength of the cosine factor, where $f = 1/\lambda$ is the spatial frequency; $\theta$, where $\theta \in [0,\pi)$, represents the orientation of the normal to the parallel stripes of the Gabor function; and finally, $\varphi$, where $\varphi \in (-\pi,\pi]$, is the phase offset that determines the symmetry of $g_{\sigma,\theta,\lambda,\varphi}(x,y)$ with respect to the origin.

Layer S1 is the response of the convolution of the input image $I(x,y)$ and a set of spatial symmetric Gabor filters $g_{\sigma,\theta,\lambda,0}(x,y)$ with orientations $\theta \in \{\theta_1,\ldots,\theta_{N_\theta}\}$ and scales $\sigma \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$, where $N_\theta$ and $N_\sigma$ are the numbers of orientations and scales, respectively:

$$S_{1,\sigma,\theta} = \left| I(x,y) * g_{\sigma,\theta,\lambda,0}(x,y) \right|. \quad (2)$$

The S1 stage resembles an edge detector, since symmetric Gabor filters are active only near image edges. The S1 unit is a four-dimensional matrix (x/y/θ/σ).
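For concreteness, here is a minimal NumPy sketch of the S1 stage: it samples the Gabor function of Eq. (1) on a discrete grid and convolves a grey image with the resulting filter bank. The helper names, the aspect ratio γ = 0.5, and the use of SciPy are illustrative choices, not part of the original model.

```python
# A sketch of layer S1: sample g_{sigma,theta,lambda,phi} of Eq. (1) and take
# the absolute convolution response |I * g| per scale and orientation.
import numpy as np
from scipy.signal import convolve2d

def gabor(size, sigma, theta, lam, phi=0.0, gamma=0.5):
    """2-D Gabor function of Eq. (1), sampled on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xt = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yt = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xt**2 + (gamma * yt)**2) / (2 * sigma**2)) \
           * np.cos(2 * np.pi * xt / lam + phi)

def s1_layer(image, sizes, sigmas, lams, n_orient=4):
    """S1 responses as a (n_scales, n_orient, H, W) array: |I * g|."""
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    return np.array([[np.abs(convolve2d(image, gabor(sz, sg, th, lm),
                                        mode='same'))
                      for th in thetas]
                     for sz, sg, lm in zip(sizes, sigmas, lams)])

# Example with the first two scales of Table I:
img = np.random.rand(140, 180)                      # stand-in grey image
S1 = s1_layer(img, sizes=[7, 11], sigmas=[2.8, 4.5], lams=[3.5, 5.6])
```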

Layer C1: Layer C1 is the result of the selection of the maxima over the local spatial neighborhood and the down-sampling of the result. This pooling increases the tolerance to two-dimensional transformation from layer S1 to C1, thereby providing robustness to scale and translation. Similarly, the C1 unit is a four-dimensional matrix (x/y/θ/σ).

Layer S2: At every position and scale in the C1 layer, layer S2 is generated by template matching between the patch of C1 units centered at that position/scale and each of the $N_p$ prototype patches. The S2 unit is calculated as

$$S_{2,\sigma} = \exp\!\left(-\beta \lVert X - P \rVert^2\right), \quad (3)$$

where $\beta$ is a tunable parameter, $X$ is an image patch from the C1 layer at scale $\sigma$, and $P$ is one of the $N_p$ features. Both maps are calculated across all positions for each scale $\sigma_i$, $i = 1,\ldots,N_\sigma$. Here, the $N_p$ prototype patches (small image patches having dimensionality $n \times n \times N_\theta$) are randomly sampled from the C1 layers of training images.

Layer C2: A set of C2 features that is shift- and scale-invariant is generated by applying the global maximum over all scales and positions from the S2 map. A vector of $N_p$ C2 values is obtained for each image.
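As an illustration of this tuning operation, the sketch below slides one prototype over a C1 map and evaluates exp(−β‖X − P‖²) of Eq. (3) at every position; the value of β, the array layout, and the brute-force loop are assumptions made for readability, not the authors' implementation.

```python
# A sketch of the S2 unit S2_sigma = exp(-beta * ||X - P||^2) at one scale.
import numpy as np

def s2_response(c1_map, prototype, beta=1.0):
    """c1_map: (H, W, n_orient) C1 units; prototype: (n, n, n_orient)."""
    n = prototype.shape[0]
    H, W = c1_map.shape[:2]
    out = np.zeros((H - n + 1, W - n + 1))
    for i in range(H - n + 1):
        for j in range(W - n + 1):
            patch = c1_map[i:i + n, j:j + n, :]
            out[i, j] = np.exp(-beta * np.sum((patch - prototype) ** 2))
    return out
```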

III. SPARSE FEATURES OF COLOR IMAGES

In the visual cortex model in which symmetric Gabor filters are used [2][4-5][7], sparse features are actually values related to the edges of objects after a certain number of mappings, since the Gabor filters in the first layer resemble edge-detecting filters and are active only near image edges. According to the phase offset value of the Gabor function in equation (1), there are two filter types: symmetric and anti-symmetric. A filter that deploys an anti-symmetric Gabor function (phase offset 90 or −90 degrees) yields a maximum response exactly at an edge; however, due to the ripples of the Gabor function, there are flanking responses. A filter that deploys a symmetric Gabor function (phase offset 0 or 180 degrees) yields a maximum that is shifted from the edge. There are actually two maxima: one to the left and the other to the right of the edge. This can cause problems in selecting maxima over local neighborhoods for layer C1, since the symmetric Gabor filter has been used in layer S1 in previous research [2][4-5][7]. Moreover, it adversely affects the calculation of a set of sparse features in layer C2, which uses the localized pooling method, whereby the maximum output for each test image is achieved in the neighborhood of the training position of the prototype patch extracted from layer C1 [4][7].

Trang 3

When a Gabor energy filter is used, the responses to all lines and edges in the input image are equally strong. This filter provides a smooth response to an edge or a line of appropriate width, with a local maximum exactly at the edge or in the center of the line. If we apply thinning to this response, we obtain one thin line that follows the corresponding edge or line [23]. By contrast, a simple linear Gabor filter (of anti-symmetric or symmetric type) provides flanking responses owing to the ripples of the Gabor function. Therefore, the use of Gabor energy filters will improve the precision of the calculation of the local maximum and localized pooling operations.

We propose a new framework for object recognition in which color information is taken into account in the calculation of sparse features of objects, as shown in Fig. 1. Layers in the framework alternate between "sensitivity" (to detecting features) and "invariance" (to position, scale, and orientation). An input natural image represented in RGB color space is converted to LAB color space. In addition, rather than symmetric Gabor filters, we use Gabor energy filters in the calculation of the first layer L1. Each color component is convoluted with the Gabor energy filters at scales $\sigma \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$ and orientations $\theta \in \{\theta_1,\ldots,\theta_{N_\theta}\}$. At scale $\sigma$ and orientation $\theta$, the Gabor energy model $E_{\sigma,\theta}(x,y,c)$ of a color component is computed from the superposition of phases, as follows:

$$E_{\sigma,\theta}(x,y,c) = \sqrt{\big(g_{\sigma,\theta,\lambda,0} * I\big)^2(x,y,c) + \big(g_{\sigma,\theta,\lambda,\pi/2} * I\big)^2(x,y,c)}, \quad (4)$$

where $c$ indicates an index of color components, $c \in \{c_1,\ldots,c_{N_C}\}$, and $N_C$ is the number of color components. We use eight scales, $N_\sigma = 8$, and the parameters of the Gabor energy filters are listed in Table I.
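A possible NumPy rendering of Eq. (4) for one color component is sketched below. It reuses the gabor() helper from the S1 sketch above and forms the quadrature pair with phase offsets 0 and π/2 (the energy is identical for ±π/2, since the odd response is merely negated).

```python
# A sketch of the Gabor energy E_{sigma,theta}(x, y, c) for one color channel:
# combine the symmetric (phase 0) and anti-symmetric (phase pi/2) responses.
import numpy as np
from scipy.signal import convolve2d

def gabor_energy(channel, size, sigma, theta, lam):
    """Phase-insensitive energy: sqrt((I*g_0)^2 + (I*g_{pi/2})^2)."""
    even = convolve2d(channel, gabor(size, sigma, theta, lam, phi=0.0),
                      mode='same')
    odd = convolve2d(channel, gabor(size, sigma, theta, lam, phi=np.pi / 2),
                     mode='same')
    return np.sqrt(even**2 + odd**2)
```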

The superposition of the Gabor energy filter outputs over the color components is computed as

$$E_{\sigma,\theta,\mathrm{Sum}}(x,y) = \sum_{i=1}^{N_C} E_{\sigma,\theta}(x,y,c_i). \quad (5)$$

The normalized response $\bar{E}_{\sigma_i,\mathrm{Sum},\theta}(x,y)$ of the output superposition is calculated by using a divisive normalization nonlinearity, in which each non-normalized response (cell) is suppressed by the pooled activity of a large number of non-normalized responses [24], as follows:

$$\bar{E}_{\sigma_i,\mathrm{Sum},\theta}(x,y) = \frac{K\,E^2_{\sigma_i,\mathrm{Sum},\theta}(x,y)}{\rho^2 + \sum_j E^2_{\sigma_i,\mathrm{Sum},\theta_j}(x,y)}, \quad (6)$$

where $K$ is a constant scale factor and $\rho^2$ is the semi-saturation constant. The summation $\sum_j E^2_{\sigma_i,\mathrm{Sum},\theta_j}(x,y)$ is taken over a large number of non-normalized responses with different tunings, and contains $E_{\sigma_i,\mathrm{Sum},\theta}(x,y)$, which appears in the numerator. Since the value $\rho$ is non-zero, the normalized response has a value from 0 to $K$, saturating for high contrasts. This normalization step is necessary, since the new model preserves the essential features of linearity in the face of apparently contradictory behavior [24]. Here, the normalized responses are pooled over all orientations. Parameters $K$ and $\rho$ were 1 and 0.225, respectively, in our experiments.

Figure 1. The proposed scheme of object recognition.

TABLE I. THE PARAMETERS OF THE GABOR FILTERS. THESE PARAMETERS ARE USED IN [2].

Scale | Filter size | Gabor σ | Wavelength λ
1 | 7 × 7   | 2.8  | 3.5
2 | 11 × 11 | 4.5  | 5.6
3 | 15 × 15 | 6.7  | 7.9
4 | 19 × 19 | 8.2  | 10.3
5 | 23 × 23 | 10.2 | 12.7
6 | 27 × 27 | 12.3 | 15.5
7 | 31 × 31 | 14.6 | 18.2
8 | 35 × 35 | 17.0 | 21.2
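The normalization of Eq. (6) takes only a few lines; in the sketch below the (n_orient, H, W) array layout is an assumption, while K = 1 and ρ = 0.225 are the values quoted above.

```python
# A sketch of divisive normalization: each squared response is divided by the
# pooled squared responses over all orientations at the same scale.
import numpy as np

def normalize_responses(E, K=1.0, rho=0.225):
    """E: (n_orient, H, W) summed energy responses at one scale."""
    pooled = np.sum(E ** 2, axis=0, keepdims=True)  # pool over orientations
    return K * E ** 2 / (rho ** 2 + pooled)         # bounded in [0, K)
```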

+ Layer L2: Layer L2 is the reduced version of L1, and is obtained by means of the maxima in the local spatial neighborhood and over two adjacent scales. This pooling increases the tolerance to position and scale, providing robustness to translation and scale. We use spatial pooling sizes that are proportional to the scale, as presented in [2]. The sizes are g × g, where g ∈ {8, 10, 12, 14, 16, 18, 22, 24}.
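A rough sketch of this pooling step follows; the stride-g down-sampling and the cropping used to align maps of two adjacent scales are our assumptions about details the text leaves open.

```python
# A sketch of layer L2: local spatial max pooling (g x g, per scale) followed
# by an element-wise maximum over two adjacent scales.
import numpy as np
from scipy.ndimage import maximum_filter

def l2_pool(L1, g):
    """L1: (n_scales, n_orient, H, W); g: per-scale pool sizes."""
    spatial = [maximum_filter(L1[s], size=(1, g[s], g[s]))[:, ::g[s], ::g[s]]
               for s in range(L1.shape[0])]
    out = []
    for s in range(len(spatial) - 1):
        a, b = spatial[s], spatial[s + 1]
        # crop to the smaller map before taking the max over adjacent scales
        h = min(a.shape[1], b.shape[1])
        w = min(a.shape[2], b.shape[2])
        out.append(np.maximum(a[:, :h, :w], b[:, :h, :w]))
    return out
```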

+ Layer L3: The layer L3 unit is obtained by the convolution product of layer L2 with a prototype patch P centered at scale σ. Both components of the convolution operation are normalized to unit length. The purpose of this step is to maintain the features' geometrical similarities under variations of the light intensity. Before calculating layer L3 of an image, we need to conduct the learning stage to extract a set of prototype patches collected from all the training images.
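The normalized matching can be sketched as a normalized cross-correlation between an L2 patch and a unit-length prototype, as below; the brute-force loop and the small eps guard against division by zero are illustrative choices.

```python
# A sketch of the L3 unit: dot products between unit-normalized L2 patches
# and a unit-normalized prototype, giving light-intensity robustness.
import numpy as np

def l3_response(l2_map, prototype, eps=1e-8):
    """l2_map: (H, W, n_orient); prototype: (n, n, n_orient)."""
    n = prototype.shape[0]
    p = prototype / (np.linalg.norm(prototype) + eps)
    H, W = l2_map.shape[:2]
    out = np.zeros((H - n + 1, W - n + 1))
    for i in range(H - n + 1):
        for j in range(W - n + 1):
            x = l2_map[i:i + n, j:j + n, :]
            out[i, j] = np.sum(x * p) / (np.linalg.norm(x) + eps)
    return out
```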

+ Layer L4: For each image, the number of sparse features obtained is $N_p$ if the number of prototype patches in the learning stage is $N_p$. We use the localized pooling method for calculating layer L4. The training position of each prototype patch is recorded. Instead of pooling over the entire image, the L4 unit is the response of taking the maximum output for a test image in the neighborhood of the training position. This approach allows some geometric information to be retained above the L4 level, and gains global invariance. Thériault et al. [7] considered both cases, localized pooling [4] and multi-resolution pooling, and their results show that multi-resolution pooling at the L4 stage yields better performance (an additional 2% increase when testing on CalTech101 for 30 training examples with 4,080 prototype patches). Here, we used the multi-resolution pooling presented in [7]. The local pooling regions are circles centered at the training position of each prototype patch.
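A minimal sketch of the localized pooling variant described here: take the maximum of an L3 response map inside a circular region around the recorded training position. The radius is an assumed parameter; the multi-resolution variant actually used in the paper would repeat this over several region sizes.

```python
# A sketch of localized pooling at L4: maximum of the L3 map inside a circle
# centered at the prototype's recorded training position.
import numpy as np

def l4_localized_pool(l3_map, train_pos, radius=12):
    """l3_map: (H, W); train_pos: (row, col) recorded during learning."""
    H, W = l3_map.shape
    rr, cc = np.ogrid[:H, :W]
    mask = (rr - train_pos[0])**2 + (cc - train_pos[1])**2 <= radius**2
    return l3_map[mask].max()
```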

+ Learning stage: We adapted the training process presented in [7], in which the local scale and orientation learned fit the local image structure. Training the prototype patches with a lower fitness error increases the network invariance level to basic local geometrical transformations, and these prototype patches are less sensitive to local perturbations around the axes of relevant image structures. Additionally, local scaling in the network sketches out the necessary balance between discriminative power and invariance for classification. Unlike the learning in [2][4-5], instead of selecting prototype patches at a single scale, the learning selects a large pool of prototypes of different sizes n × n, where n ∈ {4, 8, 12, 16}, at random positions and multiple scales simultaneously from the L2 layer of random training images. In training, the coefficients that correspond to "weak" scales and orientations are set to zero. Let $N_P$ be the number of prototype patches extracted by learning. Let $S = \{\sigma_1, \sigma_3, \sigma_5, \sigma_7\}$ and $\Theta = \{k\pi/N_\theta,\; k = 0,\ldots,N_\theta - 1\}$. The rule for selecting prototype patches is

$$P_i(x_i,y_i,*,*) = \begin{cases} B_i(x_i,y_i,\theta^*,\sigma^*) & \text{if } B_i(x_i,y_i,\theta^*,\sigma^*) = \max_{\theta,\sigma} B_i(x_i,y_i,\theta,\sigma), \\ 0 & \text{otherwise}, \end{cases} \quad (7)$$

where $B_i(x_i,y_i,\theta,\sigma)$ is an image patch of size n × n, n ∈ {4, 8, 12, 16}, at a random position $p_i(x_i,y_i,\sigma_s)$, $\sigma_s \in \{\sigma_1,\ldots,\sigma_{N_\sigma}\}$, on the layer L2 of a random training image, with $\sigma^* \in S$ and $\theta^* \in \Theta$. Patch $B_i(x_i,y_i,*,*)$ is obtained from the local maximum over orientations and scales. This rule makes the set of prototype patches more discriminative, whereby weaker scales and orientations are ignored during the testing process. In the learning stage, the image from each class is selected randomly, but we pick out prototype patches from the training dataset of each class equally. The learning procedure is carried out as Algorithm 1.

Algorithm 1. Selection of prototype patches.
For i = 1 : N_P
  + Select one random training image.
  + Convert the image to LAB color space.
  + Calculate layer L2 of the image.
  + Select a random position p_i(x_i, y_i, σ_s), σ_s ∈ {σ_1, …, σ_{N_σ}}, on layer L2.
  + Extract a random patch B_i(x_i, y_i, σ_s, θ) of size n × n at that position.
  + If B_i(x_i, y_i, θ*, σ*) = max over (θ, σ) of B_i(x_i, y_i, θ, σ), then P_i(x_i, y_i, *, *) = B_i(x_i, y_i, θ*, σ*); else P_i(x_i, y_i, *, *) = 0.
  + Record prototype patch P_i(x_i, y_i, *, *) and position p_i(x_i, y_i, σ_s).
End for
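Algorithm 1 might be realized along the following lines. Here compute_l2 is a hypothetical helper standing in for layers L1–L2, and the sketch keeps the arg-max patch over (θ, σ), which by construction satisfies the selection rule of Eq. (7).

```python
# A sketch of Algorithm 1: sample N_P patches at random positions, sizes, and
# scales from L2 maps of random training images, keeping for each sample the
# patch at the (theta*, sigma*) that maximizes the response.
import random
import numpy as np

def select_prototypes(training_images, compute_l2, n_patches,
                      sizes=(4, 8, 12, 16)):
    prototypes = []
    for _ in range(n_patches):
        img = random.choice(training_images)    # one random training image
        L2 = compute_l2(img)                    # (n_scales, n_orient, H, W)
        n = random.choice(sizes)
        s = random.randrange(L2.shape[0])       # random scale index
        H, W = L2.shape[2], L2.shape[3]
        y, x = random.randrange(H - n), random.randrange(W - n)
        block = L2[:, :, y:y + n, x:x + n]      # all scales/orientations
        peak = block.max(axis=(2, 3))           # peak per (scale, orientation)
        s_star, t_star = np.unravel_index(peak.argmax(), peak.shape)
        patch = block[s_star, t_star]           # local max over (theta, sigma)
        prototypes.append((patch, (y, x, s)))   # record patch and position
    return prototypes
```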

+ Classification stage: Two sets of sparse features of images in the training and testing datasets are calculated from our framework and passed to the classifier LibSVM [18] for training and classification. For multi-class classification, we used the one-against-all (OAA) SVMs approach first introduced by Vapnik in [25].

Performance measures: In order to obtain performance measures, decision functions are applied to predict the labels (target values) of testing data. The prediction result is evaluated as follows:

accuracy = (# correctly predicted data) / (# total testing data).
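As one concrete (non-authoritative) way to run this stage, the sketch below substitutes scikit-learn's one-vs-rest wrapper for the LibSVM command-line tools and computes the accuracy measure defined above.

```python
# A sketch of the classification stage: one-against-all linear SVMs over the
# L4 sparse-feature vectors, evaluated as #correct / #total testing data.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def train_and_evaluate(X_train, y_train, X_test, y_test):
    clf = OneVsRestClassifier(LinearSVC())      # OAA SVMs
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    return np.mean(pred == y_test)              # accuracy
```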

IV. EXPERIMENTAL RESULTS

Here, we illustrate the use of our proposed framework in object recognition. All images were rescaled to 140 pixels on the shortest side, and the other side was rescaled automatically to preserve the image aspect ratio. Two color-based sparse-feature datasets, for training and testing, obtained in the L4 stage were converted to the LibSVM data format and then passed to LibSVM for recognition and classification. Our object recognition and classification process was executed on static images and not in real time. The results reported are the mean and standard deviation of the performance measures after five runs with random splits. We re-implemented the model in [7], because it is the most closely related to our model. We did not focus on the improvement of SVM classifiers. In order to make fair comparisons between the methods, we used the same values of the parameters of the algorithms for learning and recognizing objects (e.g., the number of prototype patches, fixed splits on datasets, etc.), as well as the same SVM parameters.
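The preprocessing just described could be implemented as follows, with scikit-image as an assumed library choice for the resizing and the RGB-to-CIELAB conversion.

```python
# A sketch of the preprocessing: rescale so the shortest side is 140 pixels
# (preserving the aspect ratio), then convert RGB to CIELAB.
from skimage.color import rgb2lab
from skimage.transform import resize

def preprocess(rgb, short_side=140):
    h, w = rgb.shape[:2]
    scale = short_side / min(h, w)
    resized = resize(rgb, (round(h * scale), round(w * scale)))
    return rgb2lab(resized)                     # channels: L, a, b
```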

TABLE II. A COMPARISON OF RECOGNITION PERFORMANCE MEASURES OBTAINED FOR THE DATASETS. THE NUMBER OF PROTOTYPE PATCHES IN THE LEARNING STAGE IS 1,000; THE NUMBER OF NEGATIVE IMAGES IS 200. THE RESULTS IN THE "BENCHMARK" AND "SERRE ET AL. [2]" COLUMNS ARE ACHIEVED BY THEIR OWN IMPLEMENTATIONS.

Datasets (Unit: %) | Benchmark | Serre et al. [2] | Thériault et al. [7] | Our model
Leaves      | 84.0 | 95.9 | 97.43 ± 0.10 | 98.85 ± 0.10
Motorcycles | 95.0 | 97.4 | 98.46 ± 0.09 | 99.60 ± 0.06
Airplanes   | 94.0 | 94.9 | 96.50 ± 0.15 | 98.36 ± 0.12

Figure 2. Sample images from the leaves, rear-car, motorcycle, and airplane datasets [19-20].



Figure 3. Sample images of objects (sunflowers, starfishes, crabs, cougar faces, dragonflies, and crayfishes) from the CalTech101 dataset.

TABLE III. RECOGNITION PERFORMANCE OBTAINED ON A SUBSET OF 94 CATEGORIES (7,687 IMAGES) FROM THE CALTECH101 DATASET. THE USE OF 15 AND 30 TRAINING IMAGES PER CLASS CORRESPONDS TO COLUMNS 2 AND 3, RESPECTIVELY. THE NUMBER OF PROTOTYPE PATCHES IS 4,000.

Model | 15 images/cat. | 30 images/cat.
Thériault et al. [7] | 60.80 ± 0.45 | 69.63 ± 0.38

A. Single-class object recognition

Here, the use of our framework for single-class object recognition is demonstrated, and the experimental results are compared with those of previous works, namely the benchmark systems (the constellation models [20]) and the grey-based sparse features in [2][7]. Each object class is recognized independently, and a set of sparse features of each object is extracted from each positive training image dataset. We considered datasets of the following objects: leaves, car rears, airplanes, and motorcycles from [19-20]. Fig. 2 displays sample images from the leaves, rear-car, motorcycle, and airplane datasets [19-20].

In the testing on the rear-car, leaves, airplane, and motorcycle datasets, we used the same fixed splits as in [20] in all of the experiments: each dataset was randomly split into two separate datasets of equal size. One dataset was used for the learning stage, and the second one for testing. Table II provides a summary of the results achieved with the methods. The values in the Benchmark and Serre et al. [2] columns are the results published for the benchmark systems [19-20] and by Serre et al. [2], respectively. Both the models of Serre et al. [2] and Thériault et al. [7] use grey images as the input of the deep, biologically inspired architectures, but the performance obtained with the model of Thériault et al. [7] is better, owing to its incorporation of a learning stage in which prototype patches are extracted over spatial position, spatial orientations, and multiple scales. In addition, in place of pooling over an entire image as in [2], Thériault et al. [7] used the localized pooling method for the calculation of sparse feature sets. Nevertheless, our model yielded the highest performance because, along with the learning strategy presented in [7], Gabor energy filters are utilized to increase the precision of the local maximum and localized pooling operations; further, color information is taken into account in calculating sparse features of objects. The experimental results of single-class object recognition showed that the use of color information in our model imparts significant improvements to object recognition.

B. Multi-class object recognition

Here, we illustrate the use of our framework for multi-class object recognition. Unlike the case of single-class object recognition, universal sparse features are extracted from random training image datasets and shared by several object classes. We used subsets from the CalTech101 dataset, in which most objects of interest are central and dominant. CalTech101, in which objects are set against either a background or a plain natural scene, is composed of both grey- and color-image types. CalTech101 comprises 101 object classes plus a background class, totaling 9,144 images. However, because our framework works on color images, we collected subsets from those datasets for our experiments. Fig. 3 displays sample images of objects from the CalTech101 dataset.

We employed fixed splits as follows: either 15 or 30 images were randomly selected from each object dataset for a training set. A testing dataset was collected from the images remaining in the datasets of objects. Table III displays the recognition performance achieved on a subset of 94 categories (7,687 images) from the CalTech101 dataset, corresponding to the cases of 15 and 30 training images per class when the number of prototype patches in the learning stage was 4,000. The recognition performance for 30 training images per class was better than that for 15 training images per class, even though the number of prototype patches was the same. Moreover, our model yielded better results in both cases (15 and 30 training images per class), specifically an improvement in classification score of around 6%. These results confirm that the use of Gabor energy filters and color information in our deep, biologically inspired architecture yields significant improvements in object recognition and classification.

V. CONCLUSIONS

In this paper, we presented a new framework in which a combination of Gabor energy filters, the structure of the visual cortex model, and color information is used for extraction of sparse features of color images in the process of object recognition. In the learning stage, a set of prototype patches of color components is selected over spatial position, spatial size, and multiple scales simultaneously, and is extracted by the local maximum over scales and orientations. A set of sparse features of objects is computed by the localized pooling method, after which it is exploited by the SVM classifier for object recognition and classification. The utility of our framework in recognizing objects was illustrated on various datasets. The experimental results show that our framework effects significant improvement in object recognition.

REFERENCES

[1] M. Riesenhuber & T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neuroscience, 2(11), 1019–1025, 1999.
[2] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, & T. Poggio, "Robust object recognition with cortex-like mechanisms," IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426, 2007.
[3] J. Mutch & D.G. Lowe, "Multiclass object recognition with sparse localized features," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York: CVPR, 2006, pp. 11–18.
[4] J. Mutch & D.G. Lowe, "Object class recognition and localization using sparse features with limited receptive fields," International Journal of Computer Vision, 80(1), 45–57, 2008.
[5] J. Zhang, Y. Barhomi, & T. Serre, "A new biologically inspired color image descriptor," in Computer Vision – ECCV 2012, Proceedings of the 12th European Conference on Computer Vision, Part V, Heidelberg: Springer-Verlag Berlin, 2012, pp. 312–324.
[6] R. Shapley & M. Hawken, "Color in the cortex: Single- and double-opponent cells," Vision Research, 51(7), 701–717, 2011.
[7] C. Thériault, N. Thome, & M. Cord, "Extended coding and pooling in the HMAX model," IEEE Transactions on Image Processing, 22(2), 764–777, 2013.
[8] Y. Huang, K. Huang, D. Tao, T. Tan, & X. Li, "Enhanced biologically inspired model for object recognition," IEEE Transactions on Systems, Man, and Cybernetics, Part B, 41(6), 1668–1680, 2011.
[9] S. Lazebnik, C. Schmid, & J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York: CVPR, 2006, pp. 2169–2178.
[10] J. Yang, K. Yu, Y. Gong, & T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami: CVPR, 2009, pp. 1794–1801.
[11] N. Petkov & P. Kruizinga, "Computational models of visual neurons specialized in the detection of periodic and aperiodic oriented visual stimuli: Bar and grating cells," Biological Cybernetics, 76(2), 83–96, 1997.
[12] T.T.Q. Bui & K.S. Hong, "Supervised learning of a color-based active basis model for object recognition," in Proceedings of the 2nd International Conference on Knowledge and Systems Engineering, 2010, pp. 69–74.
[13] Y.S. Heo, K.M. Lee, & S.U. Lee, "Joint depth map and color consistency estimation for stereo images with different illuminations and cameras," IEEE Transactions on Pattern Analysis and Machine Intelligence, http://doi.ieeecomputersociety.org/10.1109/TPAMI.2012.167
[14] K.E. van de Sande, T. Gevers, & C.G.M. Snoek, "Evaluating color descriptors for object and scene recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1582–1596, 2010.
[15] T.T.Q. Bui & K.S. Hong, "Evaluating a color-based active basis model for object recognition," Computer Vision and Image Understanding, 116(11), 1111–1120, 2012.
[16] F.S. Khan, J. van de Weijer, & M. Vanrell, "Modulating shape features by color attention for object recognition," International Journal of Computer Vision, 98(1), 49–64, 2012.
[17] D.R. Martin, C.C. Fowlkes, & J. Malik, "Learning to detect natural image boundaries using local brightness, color, and texture cues," IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(5), 530–549, 2004.
[18] C.C. Chang & C.J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology, 2(27), 1–27, 2011.
[19] M. Weber, M. Welling, & P. Perona, "Unsupervised learning of models for recognition," in Computer Vision – ECCV 2000, Proceedings of the 6th European Conference on Computer Vision, Part I, Heidelberg: Springer-Verlag Berlin, 2000, pp. 18–32.
[20] R. Fergus, P. Perona, & A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison: IEEE Computer Society, 2003, pp. 264–271.
[21] L. Fei-Fei, R. Fergus, & P. Perona, "Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Washington: IEEE Computer Society, 2004, pp. 178–187.
[22] J.G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A: Optics, Image Science, and Vision, 2(7), 1160–1169, 1985.
[23] N. Petkov, "Biologically motivated computationally intensive approaches to image pattern recognition," Future Generation Computer Systems, 11(4-5), 451–465, 1995.
[24] D.J. Heeger, "Modeling simple-cell direction selectivity with normalized, half-squared, linear operators," Journal of Neurophysiology, 70(5), 1885–1898, 1993.
[25] V. Vapnik, The Nature of Statistical Learning Theory, London: Springer-Verlag, 1995.
