J Ambient Intell Human Comput (2012) 3:281–292
DOI 10.1007/s12652-012-0141-z

ORIGINAL RESEARCH

Cluster-based relevance feedback for CBIR: a combination of query point movement and query expansion

Nhu-Van Nguyen · Alain Boucher · Jean-Marc Ogier · Salvatore Tabbone

Received: 30 June 2011 / Accepted: June 2012 / Published online: 21 June 2012
© Springer-Verlag 2012

N.-V. Nguyen (✉) · J.-M. Ogier
L3i-University of La Rochelle, La Rochelle, France
e-mail: Nhu-Van.Nguyen@univ-lr.fr
J.-M. Ogier
e-mail: Jean-Marc.Ogier@univ-lr.fr

N.-V. Nguyen · A. Boucher
IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, Hanoi, Vietnam
e-mail: alain.boucher@auf.org

N.-V. Nguyen · S. Tabbone
QGAR-LORIA, University of Lorraine, Nancy, France
e-mail: tabbone@loria.fr

Abstract This paper presents a cluster-based relevance feedback method, which combines two popular techniques of relevance feedback: query point movement and query expansion. Inspired by text retrieval, these two techniques give good results for image retrieval. However, query point movement is limited by a constraint of unimodality when taking the user feedbacks into account. Query expansion gives better results than query point movement, but it cannot take into account irrelevant images from the user feedbacks. We combine the two techniques to profit from their advantages and to cope with their limitations. From a single point initial query, query expansion provides a multiple point query, which is then enhanced using query point movement. To learn the multiple point queries, the irrelevant feedback images are classified into query points which are clustered from relevant images using the query expansion technique. The experiments show that our method gives better results than either of the two relevance feedback techniques taken individually.

Keywords Image retrieval · Relevance feedback · Query point movement · Query expansion

1 Introduction

There are two reasons for the limited performance of all Content-Based Image
Retrieval (CBIR) systems. The first one is that it is impossible to fully express the user's intent in a simple query for retrieval. The second is the semantic gap, which can be defined as the difference between the user's interpretation and the computer's description of an image. In order to resolve these problems, several researchers (Zhou and Huang 2003; Nguyen et al. 2009; Apostol et al. 2005; Kim et al. 2005; Ritendra et al. 2008; Ortega and Mehrotra 2004; Yoshiharu et al. 1998) have applied relevance feedback (RF) techniques to CBIR over the last decade. Significant improvements in performance have been witnessed in the application of RF techniques in the traditional text retrieval domain. Nowadays, RF has become an essential component of a CBIR system.

RF is an interactive strategy which is effective at improving the accuracy of information retrieval systems. The basic idea of RF is that the user is involved in the retrieval process so that the final result set is improved. In particular, the user gives feedback on the relevance of documents in an initial set of results. RF thus adapts the retrieval process to a specific user and a specific query. The user first submits a query (an example image in our case), then receives some results. After that, the user interacts with the system by labeling some images as relevant or irrelevant to the given query. The system, in turn, computes a better, revised set of retrieval results based on the user feedback. RF has a short-term memory, which means that the system can remember the results during the interaction process for the given query. Once the interaction is finished, the system clears its memory and the next user starts from scratch.

Various relevance feedback techniques have been proposed to improve the retrieval performance: weight features learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004), classifier learning (Tao et al. 2006). Among them, query representation modification is the most popular
technique and is widely used in both image retrieval and text retrieval. Query modification includes two different techniques: query point movement and query expansion.

The first technique, query point movement (Ortega and Mehrotra 2004; Yoshiharu et al. 1998), refers to retrieval by a single point query (as represented in the feature space) which is modified via relevant and irrelevant images, representing positive and negative feedbacks from the user. It works under the assumption of the unimodality of relevant images (Yimin and Aidong 2004). Unimodality means that all relevant images are similar to each other and that they form a cluster distinct from the other images in the feature space. Query point movement tries to obtain the ideal query point by moving it towards relevant images and away from irrelevant ones.

The second technique, query expansion (Ortega and Mehrotra 2004; Kim et al. 2005), refers to retrieval by multiple point queries. Instead of assuming a unimodal distribution, query expansion assumes many smaller unimodal distributions to construct multiple point queries from relevant images. Query expansion is arguably one of the most effective approaches to relevance feedback.

In this paper, a novel method for combining these two techniques is proposed for query by example in CBIR. Query expansion is used to construct multiple point queries by clustering the relevant images. Query point movement is used to improve the representation of the multiple point queries by applying the Rocchio technique (Salton 1971) to the relevant and the irrelevant images. Our contribution is a cluster-based relevance feedback technique which uses the query point movement technique and the irrelevant examples to enhance the efficiency of query expansion.

This paper is divided into six sections. In Sect. 2 the related work is described and the remaining problems are discussed in Sect. 3. Section 4 presents our method. Section 5 discusses the evaluation and presents experimental results
on a large dataset with 30K images. Section 6 concludes the paper and gives some future directions for work.

2 Related work

Because of the problem of fully expressing the user's intent using a simple query and the problem of the semantic gap, there have been many works focusing on relevance feedback. Various relevance feedback techniques have been proposed: weight features learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004), classifier learning (Tao et al. 2006). Weight features learning improves the distance function, query modification looks for the ideal query point and classifier learning uses the relevant/irrelevant images as training data to construct a probabilistic classifier.

Among the techniques for relevance feedback, query modification is based on the text retrieval approach and is often considered the best approach to relevance feedback in image retrieval systems. This traditional type of approach is still very efficient compared to all other techniques in the two fields of text retrieval and image retrieval. In the general context of the image retrieval process and the development of relevance feedback techniques, a recognized problem is the small number of available examples. We state the hypothesis that a user can label up to 20 images only, whereas most learning techniques require many more. If we compare the Rocchio algorithm for query modification with learning algorithms (metric or classifier optimization), such as neural networks for example, it can be understood that the popularity of query modification is related to the fact that it requires very few examples for learning.

To detail these two techniques for query modification, we must first define the concept of unimodality of an image group. Unimodality is a concept used by some authors in the field of relevance feedback (Karthik and Jawahar 2006; Yimin and Aidong 2004) to characterize the fact that the closest images of a query in the feature space are
not all relevant to the query. However, there is no clear definition of this concept, so we define it as follows.

Definition The concept of unimodality of an image group means that all images in this group are similar and that they form a group distinct from the other images in the feature space.

In relevance feedback, images in a group are similar in the sense of their relevance to the given query. The relevance can be estimated using an arbitrary threshold or function or, as in our work, indicated by the user, who labels some images in the retrieval results as relevant or irrelevant. Relevance is then a subjective notion, meaning that an image satisfies the query as judged by the human user. An image group is defined as centered on the query in the feature space or, in other words, as the closest retrieval results for the given query. For example, in Fig. 1, the left group is unimodal while the right group is not unimodal.

Fig. 1 Unimodality of an image group based on the user feedbacks: relevant (+) and irrelevant (-) result images compared with the given query. A non-unimodal image group (a group that includes irrelevant images as judged by the user given the query) could contain some unimodal subgroups, as in the right group, where we can identify unimodal subgroups (not centered on the query). In our work, we try to identify these unimodal subgroups from a non-unimodal image group

The query modification technique, which we focus on in this paper, can be achieved using either of two approaches: query point movement and query expansion. In both approaches, the input is a single point query (or a vector in the feature space). Query point movement aims at moving the single point query in the feature space (adjusting the feature vector of the query point, Fig. 2). Query expansion aims at replacing the single point query by a multiple point query (replacing a feature vector by multiple feature vectors, Fig. 3). Each technique uses the incremental information from interactions with the user or, in other terms, the relevant/irrelevant images returned (labeled) by the user.

Fig. 2 Query point movement. a The initial query and the user feedbacks (relevant "+" and irrelevant "-" result images). b The query moves toward the relevant images. c The query moves toward the relevant images until it is positioned at the center of the relevant images

Fig. 3 Query expansion: a a single point query is replaced by b a multiple point query, using the user feedbacks, relevant (+) example images only

2.1 Query point movement

In the query point movement approach (Ortega and Mehrotra 2004; Yoshiharu et al. 1998) for query by example in CBIR, a query is represented by a single point in the feature space and the refinement process attempts to reformulate the query vector to move it closer to the area containing relevant images (see Fig. 2). Under the assumption of the unimodality of relevant images, the optimal query maximizes the similarity to relevant images and minimizes the similarity to irrelevant ones (Kim et al. 2005). The Rocchio technique (Salton 1971) is often used to compute the optimal query:

$$\vec{q}_{i+1} = \alpha \vec{q}_i + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d} - \frac{\gamma}{|D_n|} \sum_{\vec{d} \in D_n} \vec{d} \qquad (1)$$

where $\vec{q}_i$ is the query at iteration i of the relevance feedback process, $D_r$ is the relevant set, $D_n$ is the irrelevant set, and α, β and γ give the relative weights of q, $D_r$ and $D_n$. In experiments, the set of parameters α = β = γ = 1 is widely used for image retrieval.

2.2 Query expansion

In the query expansion approach (Kim et al. 2005; Ortega and Mehrotra 2004), the query is modified by selectively adding new relevant points to the query representation. A single point query is replaced by a multiple point query (see Fig. 3). Instead of assuming a unimodal distribution as in query
point movement, query expansion assumes many smaller unimodal distributions to construct multiple local clusters from the relevant images. The representatives of the local clusters are used to perform multiple point querying. The clustering of relevant images is repeated at each relevance feedback iteration. Querying by multiple points is investigated in (Xiangyu and James 2003; Natsev and Smith 2003; Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005; Danzhou et al. 2009), which focus on the similarity function and the fusion of multiple single point queries. Experimental evaluation in (Ortega and Mehrotra 2004) shows that query expansion outperforms query point movement in retrieval effectiveness.

Recently, new approaches have aimed to improve the query modification technique. The QCluster system (Kim et al. 2005) uses a new adaptive classification and cluster-merging method to find multiple regions. The clustering step is not repeated as in query expansion. QCluster classifies relevant examples into the previous clusters or creates a new cluster. The number of clusters is limited to a fixed number by using a cluster-merging method. However, this complex approach is unable to make effective use of irrelevant examples. All the above methods still have drawbacks such as local maximum traps and slow convergence. In (Danzhou et al. 2009), the authors propose a fast query point movement technique to get rid of these drawbacks. However, their work addresses target search using relevance feedback, which differs somewhat from the category search done in classical CBIR. Target search in CBIR systems refers to finding a specific (target) image such as a particular registered logo or a specific historical photograph.

2.3 Multiple point query

Query expansion requires support for multiple point querying. Querying by multiple points is investigated in (Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005), which are
concerned with the similarity function and the fusion of multiple single point queries. The similarity of images for each single point query is determined independently. The result for a single point query is an ordered list. Lists from all single point queries must be combined to determine the final ranking of the multiple point query. A combining function is therefore required to reduce multiple similarity values to a single value. When this reduction has been performed for all images in the collection, the user is presented with a list of the images in decreasing order of similarity. All combining functions can be summarized into three types: MINIMUM, MAXIMUM and SUM. These types determine the distance of an image from the specified multiple point query to be, respectively, the minimum, the maximum and the (weighted) sum of its distances to each single point query. In our experiment, the MINIMUM function is found to be the best combining function in terms of robustness. This is also confirmed by Tahaghoghi et al. (2002).

3 Remaining problems

The main disadvantage of query point movement is the constraint of unimodality (see the definition in Sect. 2) on relevant examples. The main problem for query expansion is its difficulty in using irrelevant images effectively.

In query point movement, the query point is moved closer to the relevant examples and away from the irrelevant ones in the feature space. When the relevant images are grouped in distinct subsets in the feature space (that is to say, the distribution of the relevant examples is not unimodal), the problem arises from the need to cover multiple clusters with a single query. In these cases, the ideal query point includes irrelevant examples. Figure 4 shows the ellipse representing the line equidistant from a new query. We can see some irrelevant examples included in the relevant ellipses.

Fig. 4 Remaining problems with query point movement and query expansion. a In query point movement the ideal query point can include some irrelevant examples (-) due to the non-unimodality of the relevant examples. b In query expansion, ideal query points slowly converge when irrelevant examples (-) are not used. Both techniques can result in a local maximum trap

Query expansion and its best improved version, QCluster (Kim et al. 2005), only use relevant examples to form multiple point queries. The technique of query expansion does not use irrelevant examples because we cannot perform clustering using relevant and irrelevant examples together, which would give false groups. Our analysis on the subject suggests that, without irrelevant examples, convergence towards the ideal query point can potentially be very slow, and that the risk of falling into a local minimum is not insignificant. Indeed, a false ideal query point can be reached when the local group is close to some relevant examples but is also located near many irrelevant examples (see Fig. 4). We can see from this figure that irrelevant examples may be included in local groups, because these are constructed based only on relevant examples, regardless of the presence or absence of irrelevant ones.

In general, relevance feedback techniques often use relevant feedback examples. The management of irrelevant feedback examples remains a major factor for improvement, and thus a very open scientific question (Xuanhui et al. 2008).

4 Cluster-based relevance feedback for CBIR

In this section, we present our approach, which attempts to provide precise answers to the questions previously identified. This approach exploits irrelevant examples and combines query point movement and query expansion.

A combination of query point movement and query expansion is proposed to overcome the problems related to query expansion and query point movement. The main drawback of query point movement is the constraint of unimodality on relevant examples, which cannot always be verified. We solve this problem by using a clustering
technique to build multiple local clusters that provide local unimodality using relevant examples. The main drawback of query expansion is the inability to make effective use of irrelevant examples. In our approach, we propose a sequential combination of the two techniques: first query expansion (Fig. 5b), then query point movement (Fig. 5c). We take advantage of irrelevant examples by using the technique of query point movement on the multiple local clusters created using query expansion. We believe this sequential combination is the best among all possible combinations because it ensures the unimodality constraint and makes use of irrelevant examples (Fig. 5c) to effectively achieve the ideal query. The opposite combination (first query point movement, then query expansion) is not as good, because query expansion cannot profit from the irrelevant examples which were used in query point movement.

Fig. 5 Combination of query point movement and query expansion, where ideal query points are achieved more efficiently and quickly and irrelevant examples are not present in local clusters. a The initial single point query and the feedbacks (relevant "+" and irrelevant "-") given by the user. b The multiple point query obtained by query expansion. c The multiple point query is moved towards relevant feedbacks and away from irrelevant ones using query point movement

The purpose of this technique is to reach the ideal query through interaction with the user and to overcome the identified problems of both query point movement and query expansion. The first relevance feedback interaction loop is shown in Fig. 6.

Initially, a single point query is formalized by using the feature vector of an image query q: $Q = (f_1, f_2, \ldots, f_n)$, an n-dimensional vector in the feature space. Then images are retrieved and the first N images are shown to the user (who has a limited view due to screen interface constraints). The user identifies and labels relevant/irrelevant images in an interaction process of RF, with the assumption
that relevant examples in the result do not ensure the unimodality (Fig. 6, steps 1 and 2).

Fig. 6 Main steps for the cluster-based relevance feedback

Based (only) on the relevant/irrelevant images returned by the user, the technique replaces and improves the single point query q by a multiple point query $q_i$, $i = 1, \ldots, c$ (a query with multiple feature vectors) using the two main processes: query expansion and query point movement.

First, the single point query q is expanded into a multiple point query to ensure the unimodality (of each subquery), which is the problem of query point movement (Fig. 6, step 3): the relevant examples are clustered into c groups $C_1, C_2, \ldots, C_c$. The number of clusters c is selected automatically using an adaptive clustering technique and is limited to a maximum value. In this step, we try to obtain the largest clusters/groups that are still unimodal. The two clustering algorithms used in our system are presented at the end of this section.

Second, in order to find the ideal points of the c relevant groups, the query point movement technique is used: irrelevant examples are classified into these c groups (Fig. 6, step 4) to identify the irrelevant examples present in each local group (in contrast with query expansion, where only relevant examples are used). Relevant and irrelevant examples in each group are then used to build the multiple point query by Eq. 3 (Fig. 6, step 5), in which we try to move the query points closer to the relevant images and away from the irrelevant images.

The k Nearest Neighbors (k-NN) classifier is used in step 4 for the classification of irrelevant examples because of its efficiency and simplicity. The parameter k of the classifier is selected as follows:

$$k = \min_{i = 1, \ldots, c} |C_i| \qquad (2)$$

and the query point $\vec{q}_i$ of cluster i is calculated using Rocchio's formula (Salton 1971):

$$\vec{q}_i = \frac{\sum_{j=1}^{m} \vec{R}_j}{m} - \frac{\sum_{j=1}^{n} \vec{I}_j}{n} \qquad (3)$$

where $I_1, I_2, \ldots, I_n$ are the n irrelevant examples and $R_1, R_2, \ldots, R_m$ are the m relevant examples of the local cluster $C_i$. These c query points form the final multiple point query.

As discussed above, in the first interaction loop, the initial query (one sole point) is replaced by a multiple point query by building local groups (clustering step). For the following interaction loops, there are two choices to improve the multiple point query. The first choice does not rely on the first multiple point query (clustering step of the first iteration), but re-clusters the relevant examples at each iteration. This method attempts to add relevant query points and to remove irrelevant points in this same query, based on all relevant/irrelevant examples from each interaction loop. Clustering and classification are repeated at each iteration for this method. The second choice is to move the points of the first query to ideal points based on new relevant/irrelevant examples from the following interactions. This method assumes that one can reach the ideal query points from the first constructed query points. Since we do not rebuild local groups, the clustering step is performed once at the beginning (during the first interaction loop); in the following interactions the query is built based on the multiple point query from the first iteration.

We can observe that the first choice is more influenced by query point movement than query expansion, because it attempts to move the multiple point query to the ideal query. In contrast, the second choice is more influenced by query expansion because it tries to create the ideal query points based on the clustering. We call these two methods Clustering-Repeat (CR) and Clustering-No-Repeat (CNR). The two corresponding algorithms are described below.

Clustering-Repeat (CR)

In this approach, the clustering step of relevant examples, the classification step of irrelevant examples and the multiple point query construction step are repeated for each
iteration of relevance feedback. Thus, the system performs the same process for all iterations. The query of the previous iteration does not directly affect the new query for the current iteration. Examples from the previous iteration are also included in the current iteration. Implicitly, relevant points are added and irrelevant ones get dropped as we move from one iteration to the next.

Clustering-No-Repeat (CNR)

In this approach, the previous query directly affects the new query. The clustering step of relevant examples is performed once at the beginning (first iteration). Then, during subsequent iterations, instead of making a new clustering as in the case of the CR method, both relevant and irrelevant examples are classified into the points of the previous query, thus taking advantage of the previous query. New query points are refined from the relevant/irrelevant examples using the query point movement technique.

In these two algorithms, we can observe that the difference is in steps 3, 4 and 5. In the case of the CNR method, step 3 is performed only once (at the first iteration) while it is repeated at all iterations for the CR method. In step 4, only the irrelevant set is classified into clusters for the CR algorithm, while both sets (relevant and irrelevant) are classified into the clusters for the CNR algorithm. In the CR algorithm, the relevant set is instead used to rebuild the local groups (the clustering step is repeated). Finally, the formula used to construct the multiple point query is different for the two algorithms.

Discussion

In this section, we have presented our approach to relevance feedback with its two variants. Our approach combines two techniques of query modification, query point movement and query expansion, to take advantage of irrelevant examples, to address the problem of unimodality and to try to eliminate all irrelevant examples from the result. Both variants of our approach (the Clustering-Repeat and Clustering-No-Repeat methods) aim at finding the ideal query
points when we move from one interaction loop to another. The first method (Clustering-Repeat) aims to replace irrelevant query points by relevant query points. The second method (Clustering-No-Repeat) aims to move query points to ideal points. The first method (CR) is more dependent on the performance of the clustering method used than the second one, because in the CR method the clustering is repeated at all iterations. The second method (CNR) is more dependent on the construction of the initial points. For example, if all the possible relevant examples can be represented in n distinct groups but the relevant examples labeled by the user and used to construct the initial points belong to c < n distinct groups, this can produce a loss in the result.

The computational complexity of the two algorithms is the sum of the complexities of the clustering and the classification methods used. In our case, Competitive Agglomeration is a fuzzy clustering method with a computational complexity of O(CDN), C being the number of prototypes, D the dimension of the data points and N the number of data points to cluster. The kNN classification method has a complexity of O(DN), where D is the dimension of the data points and N is the total number of points. The total complexity is O(CDN) + O(DN), which is suitable for retrieval in large image datasets, remembering that, under our assumption, the number of samples processed at each interaction (relevant/irrelevant examples) is very small, estimated at 20 maximum (limited by the quantity of images that the user can label).

4.1 Selection of clustering method

In our approach to relevance feedback, an important step concerns the clustering of user feedbacks. Clustering is used to separate the relevant images into groups. In our system, the number of groups is unknown. We are therefore interested in clustering methods able to determine the optimal number of groups automatically. We have experimented with two methods: Adaptive K-Means (Kothari and Pitts 1999) and Competitive Agglomeration (Frigui and Krishnapuram 1997). These two methods are chosen for their ability to automatically determine the number of groups, and are representative of two known types of clustering methods in the literature: hierarchical methods and partitional methods.

Adaptive K-means

The best known algorithm for clustering is the k-means method. For p models:

$$\{x^l : l = 1, 2, \ldots, p\}, \quad x^l \in \mathbb{R}^n \qquad (4)$$

the k-means method obtains the positions of the k cluster centers $y^m$ by minimizing the cost function given by:

$$J = \sum_{l=1}^{p} \sum_{m=1}^{k} I(y^m | x^l) \, \|x^l - y^m\|^2 \qquad (5)$$

where $\|\cdot\|$ denotes a distance metric and $I(y^m | x^l)$ is an indicator function which equals 1 if $m = \arg\min_j \|x^l - y^j\|^2$ and 0 otherwise. In the Adaptive K-Means method (Kothari and Pitts 1999), the proposed cost function is:

$$J = \sum_{l=1}^{p} \sum_{m=1}^{k} I(y^m | x^l) \, \|x^l - y^m\|^2 + \text{extra term} \qquad (6)$$

$$\text{extra term} = \sum_{l=1}^{p} \sum_{m=1}^{k} \lambda \, \tilde{I}(y^m | x^l) \, \|y^m - y^{\omega}\|^2 \qquad (7)$$

where $\tilde{I}(y^m | x^l)$ is an indicator function which equals 1 if $y^m \in N_{y^{\omega}}$, with $\omega = \arg\min_j \|x^l - y^j\|$, and $N_{y^{\omega}}$ is the neighborhood of the cluster center $y^{\omega}$.

There are two terms in the cost function: the first is similar to the k-means method, the second is an extra term. This extra term tries to spread the cluster centers to minimize the sums of squares of the distances of a cluster center to the cluster centers nearby. Smaller values for the neighborhood encourage the formation of several centers in separate clusters, while large values for the neighborhood encourage the formation of fewer distinct cluster centers. The Adaptive K-Means method identifies the neighborhood as a scale parameter and provides the number of cluster centers at different values of the scale parameter. The number of cluster centers in the data is then obtained based on the stability of the clusters when varying the scale parameter.

Competitive agglomeration

This second clustering method by
(Frigui and Krishnapuram 1997) minimizes an objective function that integrates the advantages of hierarchical and partitional clustering techniques. The Competitive Agglomeration algorithm produces a sequence of partitions with a decreasing number of groups. Competitive Agglomeration begins by partitioning the data into a specified number of groups, and finally provides the "best" number of groups. During the clustering phase, adjacent groups compete against each other to capture the data points, and groups that gradually lose the competition run out and disappear, until only groups with large cardinality survive. The algorithm can incorporate different distance measures in the objective function to find groups of various shapes.

Discussion on clustering methods

In our experiments, different clustering methods were studied to calculate the local groups. Taking advantage of the benefits of both hierarchical and partitional clustering, Competitive Agglomeration (Frigui and Krishnapuram 1997) seems to produce the best performance in our extensive testing. Another advantage of this clustering method is the automatic selection of the number of groups. Our experiments have shown that the choice of the clustering and the classification methods does not much influence the final result, because the total number of samples (relevant/irrelevant) is very small. Let us recall here that the user marks only a few examples as relevant or irrelevant during the relevance feedback process. We present the experiment comparing these clustering methods in the results section of this paper.

5 Evaluation

We have presented our contribution to relevance feedback for content-based image retrieval with two methods. These methods are based on a combination of two popular techniques: query point movement and query expansion. The main idea of our approach is to avoid the problems associated with query point movement and query expansion in order to enhance search results. This approach
provides a good tool to improve the performance of image retrieval. In this section we present our experiments to evaluate our methods for relevance feedback.

5.1 Experimental protocol

For our experiment, we use different databases: the Corel 30K image database (Gustavo et al. 2007), the Caltech256 database (Griffin et al. 2007) and the Pascal VOC2011 database (Everingham et al. 2007). User interactions are simulated using external knowledge corresponding to the manual annotations in these databases. Three methods of relevance feedback are evaluated in this experiment: query point movement, query expansion and our proposed method with its two variants, Clustering-Repeat (CR) and Clustering-No-Repeat (CNR).

The content-based image retrieval system used in the experiments is based on the state-of-the-art Bag of Words model (Sivic and Zisserman 2008). Visual words are built using the SIFT feature, computed as in (Sivic and Zisserman 2008).

All the results presented in this section evaluate the improvement between the initial response from the system (after the initial query) and the one obtained after relevance feedback (in percent of improvement for the precision and recall measures).

5.1.1 Experimental database

The Corel 30K image database contains 30,000 images divided into different categories by experts, and there are 100 images in each class. The Caltech256 database contains about 30,000 images divided into 256 different categories by experts, and there are about 100 images in each class. The Pascal VOC2011 database contains about 15,000 images, each image belonging to one or several of the 23 different categories (multiple class images).

We rely on a simulation of human interaction, using data already in Corel30K, Caltech256 and PascalVoc2011, playing a role somewhat similar to that of a human. A technique of pseudo-relevance feedback is used to automatically simulate human interactions in relevance feedback. Our approach relies on the use of textual annotations given for
the images in these databases, for which there are various possibilities for specifying a ground truth for validation.

5.1.2 Discussion on the protocols used for other systems

For the MARS system (Ortega and Mehrotra 2004), images relevant to a query image are selected as follows. A query image Q is selected at random from the database and the first 50 images are retrieved for it. This set of 50 images is referred to as the set relevant(Q). New queries are then constructed by moving around Q (these queries are close to Q in the feature space), and Q is considered as the ideal query. Queries are chosen from around Q in the hope that they will converge to the ideal query Q (using relevance feedback). Then the first 100 images are retrieved, which become the set retrieved(Q). In MARS, precision and recall are calculated from the relevant(Q) and retrieved(Q) sets using the classical formulas below:

$$\mathrm{precision} = \frac{|\mathrm{relevant}(Q) \cap \mathrm{retrieved}(Q)|}{|\mathrm{retrieved}(Q)|} \qquad (8)$$

$$\mathrm{recall} = \frac{|\mathrm{relevant}(Q) \cap \mathrm{retrieved}(Q)|}{|\mathrm{relevant}(Q)|} \qquad (9)$$

For the MARS system (Ortega and Mehrotra 2004), the relevant set is selected so as to ensure unimodality, since all its images are visually similar to a query image. The authors assume that all the relevant images form a unimodal distribution, an assumption which is not entirely realistic and which creates an implicit limitation of the approach. In addition, that work averages all measures over only about 100 queries, which is very small compared to the number of images in the database.

As another example, in the QCluster system (Kim et al 2005), the ground truth is relatively simple because high-level category information from the Corel database is used as ground truth for simulating the relevance feedback. The images of the same class are considered as the most relevant images, and images of related categories (such as flowers and plants) are considered relevant. This assumption creates an easy condition for the relevance feedback, because the number of relevant images is
then higher compared with other approaches [e.g. MARS (Ortega and Mehrotra 2004)], which explains the good results reported for the QCluster system.

5.1.3 Our experimental protocol

For our experiments, we take as ground truth the class of the images in Corel30K, Caltech256 and PascalVoc2011, which yields a wide variety of classes but seems representative of real-life conditions. We measure the retrieval performance with the classical criteria of recall/precision by retrieving the first 100 responses (we assume that the user can see only 100 results on the screen interface). Most studies (Huiskes and Lew 2008; Yimin and Aidong 2004; Faria et al 2010) on relevance feedback use only a sub-database (10, 20 or 50 categories) for experiments on Corel30K and Caltech256, because the number of images in these databases is large (30,000) while the number of images in each category is small (100). This is done to stress the effect of relevance feedback in the validation process. Following a similar protocol, we divide the whole database into five different experiment sets to ensure that there are relevant images in the first 100 images retrieved. The PascalVoc2011 database has 14,961 images, with 275 to 1,366 images in each class (except for one class which has 7,419 images), so there is no need to divide this database. For the experimentation, we use about 5,000 queries for each experiment set.

One parameter for relevance feedback is the number of feedbacks given by the user at each iteration. This number of training examples is usually small. In our experiments, we rely on the assumption that a maximum of 20 images can be selected by the user. These images are chosen as the first P relevant examples and the first N irrelevant examples in the first 100 responses, where P + N ≤ 20. These examples are automatically returned by the system using the ground truth, as we use a technique of pseudo-relevance feedback to automatically simulate human interaction. We propose two strategies for the number of examples:

Ten relevant examples and 10 irrelevant examples in the case of query point movement, CR and CNR, and 20 relevant examples in the case of query expansion. Recall that query expansion does not use irrelevant examples, because this technique attempts to combine the relevant examples to form the multiple point query.

Five relevant examples and five irrelevant examples in the case of query point movement, CR and CNR, and 10 relevant examples in the case of query expansion.

5.2 Results and discussion

5.2.1 Retrieval performance over image databases

In this section, the relevance feedback techniques are compared according to the protocol described above. As mentioned above, we compute the classical criteria of recall/precision by retrieving the first 100 responses. As the number of images of each class in the Corel30K and Caltech256 databases is about 100 (thus, the number of relevant images is equal to the number of images retrieved), the recall for the first 100 retrieved images is equal to the precision.

Fig 7 Corel30K: Average accuracy for the first 100 retrieved images for the four techniques of relevance feedback with 10 feedback examples for each iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. Both CR and CNR methods show very good performance compared to existing query modification techniques

For the Corel30K database, in the case of experiments based on 10 sample images (Fig 7), our methods are better than query expansion and query point movement. The CNR method is slightly better than the CR method. After two iterations of relevance feedback, query point movement has the worst performance; the other three methods have equivalent performance. During subsequent iterations, both methods CR and
CNR become better than traditional techniques. The average precision of traditional techniques is approximately 0.244 after five iterations, while the CNR method has an average accuracy of 0.288 and the CR method has an average accuracy of 0.279. The improvement in accuracy of our methods over traditional techniques is thus up to 18 % (CNR: 0.288 versus 0.244).

In the case of experiments with 20 feedback images (Fig 8), the CNR method outperforms all other methods. Our methods have better performance for the early iterations, but the accuracy of the CR method is not better than that of query point movement for the following iterations. In this case, query expansion gives the worst performance; query point movement and the CR method have the same performance, with an average accuracy of about 0.305, and the CNR method has the best average accuracy, 0.39. The improvement in accuracy for the CNR method compared with traditional techniques is 28 % in this experiment.

Fig 8 Corel30K: Average accuracy for the first 100 images from the four techniques with 20 examples of relevance feedback for one iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result

Our methods give better results than the query modification techniques used in MARS (Ortega and Mehrotra 2004). Both also provide a significant improvement in average accuracy compared to QCluster (Kim et al 2005). They show improvements of 18 and 28 % (respectively for 10 and 20 examples of relevance feedback in the first 100 retrieved images) as compared with traditional techniques. QCluster has an improvement of 20 % compared with traditional techniques, but for this approach, the number of examples is the maximum number of relevant images in the first 100 result images. This number is greater than the number of examples in our proposed methods (20 maximum). In reality, the approach proposed by QCluster seems unrealistic in terms of usage,
because it is difficult to ask the user for too many interactions. A system asking the user for 20 interactions seems more realistic than one asking for 100. In addition, QCluster and MARS are evaluated on only 100 queries, and their ground truths are selected solely for their own methods. Our method is evaluated on 5,000 queries, which makes its evaluation much more generic than those of QCluster and MARS.

For the Caltech256 database based on 20 sample images (Fig 9), query expansion is the worst, and query point movement and the CR method perform about the same. In the first iteration, all methods have the same performance; in the following two iterations, CR is better than query point movement, but in the 5th iteration, query point movement is better than CR. Only the CNR method is always better than the other methods. The average precision of the best traditional technique is 0.308 after five iterations, while the CNR method has an average accuracy of 0.368 and the CR method has an average accuracy of 0.296. The improvement in accuracy of the CNR method over traditional techniques is about 20 %.

Fig 9 Caltech256: average accuracy for the first 100 images from the four techniques with 20 examples of relevance feedback for one iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result

For the PascalVOC2011 database based on 20 sample images (Fig 10), query expansion is also the worst and query point movement is better than the CR method. For the first iteration, the two traditional techniques have better performance than our methods. During the later iterations, query point movement is better than the CR method, but the CNR method always outperforms all other methods. The average precision of the best traditional technique is about 0.393 after five iterations, while the CNR method has an average accuracy of 0.464 and the CR method has an average accuracy of 0.370. The improvement in accuracy of the CNR method over traditional techniques is about 18 % in this experiment.

Fig 10 PascalVOC2011: average accuracy for the first 100 images from the four techniques with 20 examples of relevance feedback for one iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result

5.2.2 Comparison of clustering methods

Our algorithms are mainly based on the clustering of sample images. We have presented our selection of clustering approaches in Sect 4: adaptive K-means and competitive agglomeration. In this section, these two methods are compared based on the performance of image retrieval. Figure 11 illustrates the average accuracy for the first 100 retrieved images. In both cases, CR and CNR, Competitive Agglomeration is slightly better than Adaptive K-means. We can see that the choice of clustering method does not much influence the results, because the total number of samples (relevant/irrelevant examples) is very low. Note that in our system, the user labels few examples (20 maximum) as relevant or irrelevant during an interaction.

Fig 11 Average accuracy for the first 100 retrieved images for our two techniques of RF with 20 feedback examples for each iteration with different clustering methods: adaptive K-means and competitive agglomeration. CR, Clustering-Repeat; CNR, Clustering-No-Repeat. In both cases, CR and CNR, competitive agglomeration is slightly better than adaptive K-means, the difference being relatively small

Conclusion

In this article, we propose a new method for relevance feedback called cluster-based relevance feedback. It is inspired by two existing relevance feedback techniques: query point movement and query expansion. By exploiting irrelevant images and the advantages of both traditional techniques, our method gives better results. The cluster-based relevance feedback is proposed with two different variants: CR and CNR. By combining both techniques of query modification, namely
query point movement and query expansion, these two approaches can benefit from irrelevant examples. In all cases, CNR gives the best result. Clustering-Repeat gives good results when the number of feedback examples is low. Our method does not require complex computations, but offers significant improvements in accuracy compared to traditional techniques (about 18 to 28 % for CNR in our experiments).

As the relevance feedback methods presented here are valid for both text and image retrieval, we plan, in the near future, to extend our cluster-based relevance feedback by combining text-based and content-based image retrieval. To achieve this, a text/image learning model is needed and can be built onto the same relevance feedback model. This learning model would be considered as long-term memory relevance feedback, because knowledge would be learnt and stored in the system for long-term use, as opposed to the short-term memory relevance feedback presented in this article.

Acknowledgments This project is supported in part by the ICT-Asia IDEA project from the French Ministry of Foreign Affairs (MAE), the DRI INRIA and DRI CNRS.

References

Apostol N, Milind N, Jelena T (2005) Learning the semantics of multimedia queries and concepts from a small number of examples. In: MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia, ACM, New York, NY, USA, pp 598–607
Danzhou L, Hua A, Vu K, Yu N (2009) Fast query point movement techniques for large cbir systems. IEEE Trans Knowl and Data Eng 21(5):729–743
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html
Faria FF, Veloso A, Almeida HM, Valle E, Torres RdS, Gonçalves MA, Meira W Jr (2010) Learning to rank for content-based image retrieval. In: Proceedings of the international conference on Multimedia information retrieval, MIR '10, ACM, New York, NY, USA, pp 285–294
Frigui H, Krishnapuram R (1997) Clustering by competitive agglomeration. Pattern Recognition 30(7):1109–1119
Griffin G, Holub A, Perona P (2007) Caltech-256 object category dataset. Tech Rep 7694, California Institute of Technology, http://authors.library.caltech.edu/7694
Gustavo C, Chan B, Moreno J, Vasconcelos N (2007) Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell 29(3):394–410
Huiskes J, Lew S (2008) Performance evaluation of relevance feedback methods. In: CIVR '08: Proceedings of the 2008 international conference on Content-based image and video retrieval, ACM, New York, NY, USA, pp 239–248
Karthik PS, Jawahar CV (2006) Analysis of relevance feedback in content based image retrieval. In: Ninth international conference on control automation robotics and vision, 2006, pp 1–6
Kim D, Chung C, Barnard K (2005) Relevance feedback using adaptive clustering for image similarity retrieval. J Syst Softw 78(1):9–23
Kothari R, Pitts D (1999) On finding the number of clusters. Pattern Recogn Lett 20(4):405–416
Natsev A, Smith J (2003) Active selection for multi-example querying by content. In: ICME '03: proceedings of the 2003 international conference on multimedia and expo, IEEE Computer Society, Washington, DC, USA, pp 445–448
Nguyen NV, Ogier JM, Tabbone S, Boucher A (2009) Text retrieval relevance feedback techniques for bag of words model in cbir. In: International conference on machine learning and pattern recognition (ICMLPR), Paris, France, pp 541–546
Ortega M, Mehrotra S (2004) Relevance feedback techniques in the mars image retrieval system. Multimed Syst 9:535–547
Ritendra D, Dhiraj J, Jia L, James ZW (2008) Image retrieval: Ideas, influences, and trends of the new age. ACM Comput Surv 40(2):1–60
Salton G (ed) (1971) The SMART retrieval system - experiments in automatic document processing. Prentice Hall, Englewood Cliffs
Sivic J, Zisserman A (2008) Efficient visual search for objects
in videos. Proc IEEE 96(4):548–566
Tahaghoghi M, Thom A, Williams E (2002) Multiple example queries in content-based image retrieval. In: SPIRE 2002: proceedings of the ninth international symposium on string processing and information retrieval, Springer-Verlag, London, pp 227–240
Tao D, Tang X, Li X, Wu X (2006) Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE Trans Pattern Anal Mach Intell 28(7):1088–1099
Thijs W, de P Vries Arjen (2004) Multimedia retrieval using multiple examples. In: Image and video retrieval, lecture notes in computer science, vol 3115, Springer, Berlin, pp 2048–2049
Xiangyu J, James CF (2003) Improving image retrieval effectiveness via multiple queries. In: MMDB '03: Proceedings of the 1st ACM international workshop on Multimedia databases, ACM, New York, NY, USA, pp 86–93
Xuanhui W, Hui F, ChengXiang Z (2008) A study of methods for negative relevance feedback. In: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, ACM, New York, NY, USA, SIGIR '08, pp 219–226
Yimin W, Aidong Z (2004) Interactive pattern analysis for relevance feedback in multimedia information retrieval. Multimedia Syst 10:41–55
Yoshiharu I, Ravishankar S, Christos F (1998) Mindreader: Querying databases through multiple examples. In: VLDB '98: Proceedings of the 24th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 218–227
Zhou S, Huang S (2003) Relevance feedback in image retrieval: a comprehensive review. Multimedia Syst 8(6):536–544
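Illustrative sketch of the evaluation protocol. The simulated feedback of Sect 5.1.3 (the first P relevant and first N irrelevant examples among the first 100 responses, with P + N at most 20) and the precision/recall measures over the first 100 responses could be written roughly as follows. This is a minimal sketch, not the code used for the experiments: the function names `select_feedback`, `precision_recall` and `relative_improvement` are hypothetical, each image is assumed to carry a single class label (which does not cover the multiple class images of Pascal VOC2011), and the percentage improvements are assumed to be relative to the baseline accuracy.

```python
from typing import Hashable, List, Sequence, Tuple


def select_feedback(
    ranked_labels: Sequence[Hashable],
    query_label: Hashable,
    max_relevant: int = 10,
    max_irrelevant: int = 10,
    window: int = 100,
) -> Tuple[List[int], List[int]]:
    """Simulate the user with the ground truth (pseudo-relevance feedback).

    Returns the 0-based rank positions of the first `max_relevant` relevant
    and the first `max_irrelevant` irrelevant results among the first
    `window` responses. An image counts as relevant when its class label
    equals the class label of the query (assumption of this sketch).
    """
    relevant: List[int] = []
    irrelevant: List[int] = []
    for position, label in enumerate(ranked_labels[:window]):
        if label == query_label:
            if len(relevant) < max_relevant:
                relevant.append(position)
        elif len(irrelevant) < max_irrelevant:
            irrelevant.append(position)
    return relevant, irrelevant


def precision_recall(
    ranked_labels: Sequence[Hashable],
    query_label: Hashable,
    class_size: int,
    window: int = 100,
) -> Tuple[float, float]:
    """Precision and recall over the first `window` responses.

    `class_size` is the number of database images in the query's class,
    i.e. the size of the relevant set.
    """
    retrieved = list(ranked_labels[:window])
    hits = sum(1 for label in retrieved if label == query_label)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / class_size if class_size else 0.0
    return precision, recall


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Toy ranking of 100 results for a query of class "cat".
ranking = ["cat", "dog", "cat", "bird", "cat"] + ["dog"] * 95
print(select_feedback(ranking, "cat", max_relevant=2, max_irrelevant=2))
print(precision_recall(ranking, "cat", class_size=100))
# Accuracies reported for Corel30K with 10 feedback examples.
print(round(relative_improvement(0.244, 0.288)))  # prints 18
```

With a class of 100 images and a window of 100 responses, the two values returned by `precision_recall` coincide, which matches the remark in Sect 5.2.1 that recall equals precision for Corel30K and Caltech256. Under the relative-improvement assumption, the reported accuracies 0.305 and 0.39 likewise give about 28 %.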