Improving digital image retrieval towards image understanding and organization

IMPROVING DIGITAL IMAGE RETRIEVAL TOWARDS IMAGE UNDERSTANDING AND ORGANIZATION

CHEN QI
(B.E., Harbin Institute of Technology, 2008)

A DISSERTATION SUBMITTED FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF COMPUTER SCIENCE
NATIONAL UNIVERSITY OF SINGAPORE
2013

Declaration

I hereby declare that this thesis is my original work and it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.

Signature:
Date:

© 2013, CHEN Qi

To my parents and elder brother.

Acknowledgements

I am deeply grateful to my supervisor, Prof. Chew Lim Tan, who has provided patient guidance during my PhD career, constant encouragement when I lost confidence in the future, and generous support both technically and financially. He has been so nice to me and done so many wonderful things for me. I am and will always be thankful for that.

I would like to express my appreciation to Dr. Gang Wang. I have enjoyed working with him on several projects, including my two papers, and he has provided sound advice on many important decisions in my research work. Without his valuable advice and enthusiastic guidance, my research could not have been completed.

I would also like to thank my co-authors Prof. Andy Yip, Dr. Linlin Li, Dr. Tianxia Gong and Dr. Boon Chuan Pang. They have offered key insights into my work and suggestions that led to improvements.

Sincere thanks are also extended to my dear colleagues in the Artificial Intelligence Lab: Sun Jun, Su Bolan, Mitra Mohtarami, Situ Liangji and Zhang Xi. They have created a friendly working environment and I really enjoyed the fruitful discussions with these brilliant people.
I also owe much to my lovely friends in Singapore: Hao Jia, Lu Meiyu, Zhang Meihui, Wang Xiaoli, Ma He, etc. Their warm friendship made life here much easier and more joyful. Special thanks to Hao Jia for her constant kindness and for being like a sister to me.

I would also like to thank my boyfriend, Deng Fanbo, who has been taking care of me, sharing his life with me and loving me all these years.

Lastly, I would like to thank my parents for their unfailing love and unselfish support over the last 25 years of my life. I want to perpetuate the memory of my elder brother, who protected and loved me and deserves eternal happiness.

Contents

Summary
List of Tables
List of Figures

1 Introduction
    1.1 Motivation
    1.2 Problems to Be Solved
    1.3 Contributions
    1.4 Outline

2 Literature Review
    2.1 Image Annotation
    2.2 Fashion Image Understanding
    2.3 Image Search Result Organization

3 Generic Image Annotation
    3.1 Introduction
    3.2 Approach
        3.2.1 Word Embedding Model
        3.2.2 Neighborhood Selection
        3.2.3 Model Learning
        3.2.4 Image Annotation
    3.3 Data Sets and Experimental Settings
        3.3.1 Data Sets
        3.3.2 Features
        3.3.3 Evaluation Baselines and Criteria
    3.4 Experimental Results
        3.4.1 Results on the Corel 5K Data Set
        3.4.2 Results on the IAPR TC12 Data Set
        3.4.3 Results on the NUS-WIDE-LITE Data Set
        3.4.4 Visualization of Word Vectors
    3.5 Summary

4 Fashion Image Understanding
    4.1 Introduction
    4.2 Related Work
    4.3 Dataset
    4.4 Approach
        4.4.1 Basic Visual Pattern Discovery
        4.4.2 Visual Pattern based Image Representation
        4.4.3 Discriminative Latent Models
    4.5 Experiments
        4.5.1 Classification Performance
        4.5.2 Qualitative Results of Discovered Fashionable Visual Patterns
        4.5.3 Fashionable Visual Pattern Centric Dress Retrieval
    4.6 Summary

5 Image Organization through Clustering
    5.1 Introduction
    5.2 Related Work
    5.3 Approach
        5.3.1 The Multi-Class Clustering Phase
        5.3.2 The Cluster-Specific Refinement Phase
        5.3.3 New Clusters Discovery
    5.4 Extension to Object Discovery
    5.5 Experiments
        5.5.1 Features
        5.5.2 NUS-WIDE Clustering
        5.5.3 Google Image Clustering
        5.5.4 MSRC Object Discovery
    5.6 Summary

6 Conclusion
    6.1 Assessment
    6.2 Limitations and Future Work

Bibliography

Summary

Image retrieval is the task of browsing, searching and retrieving images from a large digital database. There are two branches of image retrieval systems. Traditional concept-based image retrieval usually attaches metadata to images, such as text extracted from relevant HTML pages or tags assigned by humans. Such systems often return irrelevant images because the attached metadata can be noisy. Manually assigned tags are more reliable, but labelling all images by hand is time consuming and costly. The other branch is content-based image retrieval, which relies purely on the visual content of images. For both branches, understanding the content of images effectively and efficiently is essential, and it is therefore one of the research topics of this dissertation.
Another research problem investigated in this dissertation is image search result organization. Current image retrieval systems often display search results in a flat structure, which is far from satisfactory compared with cluster-based image organization.

In terms of image content understanding, we go one step further and automatically associate images with semantically related keywords, a task called automatic image annotation. In Chapter 3, we consider image annotation as a generic problem and propose a discriminative word embedding learning model. We define a new low-dimensional embedding space and project both images and keywords into this space through neighborhood propagation. The proposed embedding model achieves significant improvements in annotation accuracy. In Chapter 4, we consider image annotation in a specific domain. We investigate how to understand fashion, which has become a very large industrial sector around the world. In this work, we model the fashionability

Chapter 5. Image Organization through Clustering

Figure 5.9: Segments discovered in the MSRC dataset. Two topics including “bicycle” and “tree” are shown.

Figure 5.10: Segments discovered in the MSRC dataset. Two topics including “building” and “cow” are shown.

Chapter 6. Conclusion

We have presented three works: generic image annotation, fashion image understanding and image organization through clustering. In this chapter, we give a summary of this dissertation. We first give an assessment of these works, followed by their limitations and possible future research directions.

6.1 Assessment

In this dissertation, we focus on facilitating image retrieval from the aspects of image content understanding and organization. For the topic of image content understanding, we propose a work on automatic image annotation for general images and another work for a specific domain: fashion image understanding.
For image organization, we have proposed an active clustering framework with humans in the loop. These three works are summarized in turn below.

We present an automatic image annotation work in Chapter 3. The aim of image annotation is to assign keywords or concepts to digital images based on their semantic meanings. We propose a discriminative embedding learning method to model the semantic space of the keywords. Unlike some previous embedding-learning-based methods, we explicitly exploit the visual similarity between images in order to propagate label information effectively among neighbors. To reduce the time cost of neighborhood selection, we adopt locality-sensitive hashing to compute approximate neighbors, which accelerates the search compared with exact neighborhood computation. Furthermore, we learn the model in a stochastic manner, which further speeds up training. The proposed approach is compared with a series of nearest-neighbor-based methods and one embedding-learning-based method that obtains the current state-of-the-art annotation precision. We perform the evaluation on public data sets. Our model achieves significant improvement over these methods, especially on the precision score, which shows a clear advantage of the proposed model in the task of image annotation. This work has been published in ICTAI’2012 [12].

In Chapter 4, we consider modeling fashion using computer vision techniques, and specifically we target dress fashion. The goal is to study the elements that make a dress fashionable or unfashionable and to train a discriminative classifier to distinguish fashionable dresses from unfashionable ones. This is a novel and interesting research problem with a large research gap in the topic of fashion understanding.
To achieve our goal, we first discover a set of common visual patterns that appear in the dress images, without label information, by adopting a discriminative clustering technique. After that, we propose a discriminative model to train a fashion classifier and identify fashionable visual patterns simultaneously, under the assumption that these two tasks are complementary to each other. The experimental results show that the proposed joint model can reasonably find fashionable visual patterns and achieve promising accuracy on the fashion classification task. We also conduct an image retrieval experiment based on the identified fashionable visual patterns. It achieves better results than traditional image retrieval and shows great potential for online shopping applications. A part of this work has been published in ICME’2013 [11].

Our third work, introduced in Chapter 5, is to organize images by clustering them into coherent groups. Unsupervised image clustering has always been difficult due to the complicated visual patterns of images. While it is hard for a computer to interpret visual information, humans can easily understand the semantic meaning of an image. Hence in this work we iteratively outsource a small fraction of image labelling tasks to Amazon Mechanical Turk. The obtained label information is then utilized in an active metric learning and discriminative clustering procedure. We demonstrate the proposed active clustering framework on images from multiple sources, such as Google and Flickr images, and achieve high-quality image clusters at a low cost. We further extend it to the object discovery task, the aim of which is to partition a set of disordered image segments into different groups or categories. We compare the proposed framework with one popular object discovery model, and our method obtains better purity and mean average precision scores with the same number of clusters generated.
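As an illustration of the purity score mentioned above, the following minimal sketch computes cluster purity under its common definition, where each cluster is credited with the count of its most frequent ground-truth label. The function name `cluster_purity` and the toy labels are assumptions made for illustration only; they are not taken from the evaluation code used in this dissertation.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Return the fraction of items that carry the majority label of their cluster."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have the same length")
    if len(cluster_ids) == 0:
        raise ValueError("at least one item is required")

    # Group the ground-truth labels by predicted cluster.
    members = {}
    for cid, label in zip(cluster_ids, true_labels):
        members.setdefault(cid, []).append(label)

    # Each cluster contributes the count of its most frequent label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Hypothetical toy data: six image segments assigned to three clusters.
clusters = [0, 0, 0, 1, 1, 2]
labels = ["cow", "cow", "tree", "tree", "tree", "bicycle"]
print(cluster_purity(clusters, labels))
```

In this toy example, five of the six segments carry the majority label of their cluster, so the purity is 5/6. Purity grows trivially as the number of clusters grows, which is why the comparison above fixes the number of generated clusters.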
This work has been published in ICTAI’2012 [10].

6.2 Limitations and Future Work

In this section we discuss the limitations and possible future research directions for the three proposed works.

Generic image annotation

As mentioned, computing nearest neighbors can be very inefficient. Though we have adopted the locality-sensitive hashing (LSH) algorithm to obtain approximate neighborhoods, efficiency is still an important issue. Moreover, the number of hash functions and hash tables must be fixed before neighborhood selection, and choosing their values involves a trade-off between the efficiency and the effectiveness of the resulting hashing method. In the future we could investigate possible strategies for determining these LSH parameters. We might also compare LSH with other fast neighborhood methods, such as space partitioning, in order to improve the search speed. Another future direction might be to improve the visual similarity measurement. Metric learning techniques could be integrated into the proposed model.

Fashion image understanding

In this work, a dress image is roughly partitioned into parts before we perform the common visual pattern discovery, and this partition could affect the quality of the discovered patterns. The current partition method is very simple and heuristic, as shown in Figure 4.2. Applying part detection techniques in this step should therefore produce more accurate partitions. Furthermore, we are interested in investigating how to perform relevance feedback for retrieval based on fashionable visual patterns, and in applying the proposed framework to other products such as handbags and shoes.

Image organization through clustering

During our crowdsourcing experiments on Amazon Mechanical Turk (MTurk), we found that labelling accuracy varies across workers, and even for the same worker over time.
Though we have utilized a soft margin strategy in both distance metric learning and SVM-based clustering, low-quality inputs could still affect the quality of the clustering results. Therefore we could consider modelling the quality of different workers to improve the overall labelling accuracy. As one labelling task is usually assigned to multiple workers, we can model the inputs of multiple workers for a set of labelling tasks as a matrix, and obtain better labelling results using matrix recovery techniques. We can also consider the quality control methods developed in [20, 84] to filter out low-quality inputs and maintain a high-quality workforce.

Bibliography

[1] A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In Foundations of Computer Science, pages 459–468. IEEE, 2006. Cited in 3.2.2.
[2] D. Bespalov, A. Dahl, B. Bai, and A. Shokoufandeh. On inferring image label information using rank minimization for supervised concept embedding. Image Analysis, pages 103–113, 2011. Cited in 1.2, 2.1, 3.1.
[3] A. Bosch, A. Zisserman, and X. Munoz. Representing shape with a spatial pyramid kernel. In Proceedings of ACM Conference on Content-based Image and Video Retrieval, pages 401–408, 2007. Cited in 5.5.1.
[4] L. Bottou. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes, 91, 1991. Cited in 3.2.3, 5.3.1.
[5] S. Branson, P. Perona, and S. Belongie. Strong supervision from weak annotation: Interactive training of deformable part models. In Proceedings of IEEE International Conference on Computer Vision, pages 1832–1839, 2011. Cited in 2.1, 4.4.3, 5.2.
[6] S. Branson, C. Wah, F. Schroff, B. Babenko, P. Welinder, P. Perona, and S. Belongie. Visual recognition with humans in the loop. In Proceedings of European Conference on Computer Vision, pages 438–451. Springer, 2010. Cited in 2.1.
[7] D. Cai, X. He, Z. Li, W.Y. Ma, and J.R. Wen.
Hierarchical clustering of WWW image search results using visual, textual and link information. In Proceedings of ACM International Conference on Multimedia, pages 952–959, 2004. Cited in 1.2, 2.3.
[8] G. Carneiro, A.B. Chan, P.J. Moreno, and N. Vasconcelos. Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):394–410, 2007. Cited in 2.1.
[9] H. Chen, A. Gallagher, and B. Girod. Describing clothing by semantic attributes. In Computer Vision–ECCV 2012, pages 609–623. Springer, 2012. Cited in 2.2.
[10] Q. Chen, G. Wang, and C.L. Tan. Web image organization and object discovery by actively creating visual clusters through crowdsourcing. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence, 2012. Cited in 1.3, 6.1.
[11] Q. Chen, G. Wang, and C.L. Tan. Modeling fashion. In Proceedings of IEEE International Conference on Multimedia and Expo, 2013. Cited in 1.3, 6.1.
[12] Q. Chen, A. Yip, and C.L. Tan. Automatic image annotation using word embedding learning. In Proceedings of IEEE International Conference on Tools with Artificial Intelligence, 2012. Cited in 1.3, 6.1.
[13] X. Chen, Y. Mu, S. Yan, and T.S. Chua. Efficient large-scale image annotation by probabilistic collaborative multi-label propagation. In Proceedings of ACM International Conference on Multimedia, pages 35–44, 2010. Cited in 1.2, 2.1, 3.2.2.
[14] C.C. Chiang, M.W. Hung, Y.P. Hung, and W.K. Leow. Image annotation with relevance feedback using a semi-supervised and hierarchical approach. In Proceedings of International Conference on Computer Vision Theory Application, pages 173–178, 2008. Cited in 2.1.
[15] T.S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y. Zheng. NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of ACM Conference on Content-based Image and Video Retrieval, page 48, 2009. Cited in 5.5.
[16] T.S. Chua, J. Tang, R. Hong, H. Li, Z. Luo, and Y.T. Zheng.
NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of ACM Conference on Content-based Image and Video Retrieval, Santorini, Greece, 2009. Cited in 3.3.1.
[17] C. Cusano, G. Ciocca, and R. Schettini. Image annotation using SVM. In Proceedings of Internet Imaging IV, Vol. SPIE, volume 5304, pages 330–338. Citeseer, 2004. Cited in 1.2, 2.1.
[18] H. Ding, J. Liu, and H. Lu. Hierarchical clustering-based navigation of image search results. In Proceedings of ACM International Conference on Multimedia, pages 741–744, 2008. Cited in 2.3.
[19] C. Doersch, S. Singh, A. Gupta, J. Sivic, and A.A. Efros. What makes Paris look like Paris? ACM Transactions on Graphics, 31(4):101, 2012. Cited in 4.2.
[20] P. Donmez, J.G. Carbonell, and J. Schneider. A probabilistic framework to learn from multiple annotators with time-varying accuracy. In Proceedings of SIAM International Conference on Data Mining, pages 826–837, 2010. Cited in 6.2.
[21] P. Duygulu, K. Barnard, J. De Freitas, and D. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. Proceedings of European Conference on Computer Vision, pages 349–354, 2006. Cited in 1.2, 2.1, 3.3.1.
[22] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1778–1785, 2009. Cited in 4.4.1.
[23] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627–1645, 2010. Cited in 4.2.
[24] S.L. Feng, R. Manmatha, and V. Lavrenko. Multiple Bernoulli relevance models for image and video annotation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages II–1002, 2004. Cited in 2.1.
[25] B.J. Frey and D. Dueck. Clustering by passing messages between data points. Science, 315(5814):972–976, 2007.
Cited in 4.4.1, 5.3, 5.3.1.
[26] A.C. Gallagher and T. Chen. Clothing cosegmentation for recognizing people. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008. Cited in 2.2.
[27] C. Galleguillos, B. McFee, S. Belongie, and G. Lanckriet. From region similarity to category discovery. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2665–2672, 2011. Cited in 5.1, 5.2, 5.4.
[28] B. Gao, T.Y. Liu, T. Qin, X. Zheng, Q.S. Cheng, and W.Y. Ma. Web image clustering by consistent utilization of visual features and surrounding texts. In Proceedings of ACM International Conference on Multimedia, pages 112–121, 2005. Cited in 1.2, 2.3.
[29] A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Data Bases, pages 518–529, 1999. Cited in 2.2, 3.1, 3.2.2.
[30] R. Gomes, P. Welinder, A. Krause, and P. Perona. Crowdclustering. In Advances in Neural Information Processing Systems, 2011. Cited in 2.3.
[31] D. Grangier and S. Bengio. A discriminative kernel-based approach to rank images from text queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(8):1371–1384, 2008. Cited in 1.2, 2.1.
[32] K. Grauman and T. Darrell. Unsupervised learning of categories from sets of partially matching image features. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 19–25, 2006. Cited in 5.2.
[33] M. Grubinger. Analysis and evaluation of visual information systems performance. PhD thesis, Victoria University, 2007. Cited in 3.3.1.
[34] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation. In Proceedings of IEEE International Conference on Computer Vision, pages 309–316, 2009. Cited in 1.2, 2.1, 3.3.3.
[35] T. Hertz, A. Bar-Hillel, and D. Weinshall. Learning distance functions for image retrieval.
In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages II–570, 2004. Cited in 1.2, 2.1.
[36] W. Hu, T. Tan, L. Wang, and S. Maybank. A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 34(3):334–352, 2004. Cited in 4.1.
[37] J. Jeon, V. Lavrenko, and R. Manmatha. Automatic image annotation and retrieval using cross-media relevance models. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 119–126, 2003. Cited in 2.1.
[38] F. Jing, C. Wang, Y. Yao, K. Deng, L. Zhang, and W.Y. Ma. IGroup: web image search results clustering. In Proceedings of ACM International Conference on Multimedia, pages 377–384, 2006. Cited in 1.2, 2.3.
[39] Y. Kalantidis, L. Kennedy, and L.J. Li. Getting the look: clothing recognition and segmentation for automatic product suggestions in everyday photos. In Proceedings of the 3rd ACM International Conference on Multimedia Retrieval, pages 105–112. ACM, 2013. Cited in 2.2.
[40] G. Kim, C. Faloutsos, and M. Hebert. Unsupervised modeling of object categories using link analysis techniques. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, 2008. Cited in 4.2, 5.2.
[41] A. Kovashka, S. Vijayanarasimhan, and K. Grauman. Actively selecting annotations among objects and attributes. In Proceedings of IEEE International Conference on Computer Vision, pages 1403–1410, 2011. Cited in 2.1.
[42] J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7(1):48–50, 1956. Cited in 4.4.3.
[43] V. Lavrenko, R. Manmatha, and J. Jeon. A model for learning the semantics of pictures. In Advances in Neural Information Processing Systems, 2003. Cited in 2.1.
[44] Y.J. Lee and K. Grauman. Foreground focus: Unsupervised learning from partially matching images.
International Journal of Computer Vision, 85(2):143–166, 2009. Cited in 5.2.
[45] Y.J. Lee and K. Grauman. Object-graphs for context-aware category discovery. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, 2010. Cited in 4.2, 5.1, 5.2, 5.4, 5.3, 5.5.4.
[46] T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. International Journal of Computer Vision, 43(1):29–44, 2001. Cited in 3.3.2.
[47] D.D. Lewis and W.A. Gale. A sequential algorithm for training text classifiers. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3–12, 1994. Cited in 5.3.1.
[48] D. Liu and T. Chen. Unsupervised image categorization and object localization using topic models and correspondences between images. In Proceedings of IEEE International Conference on Computer Vision, pages 1–7, 2007. Cited in 5.2.
[49] H. Liu, X. Xie, X. Tang, Z.W. Li, and W.Y. Ma. Effective browsing of web image search results. In Proceedings of ACM SIGMM International Workshop on Multimedia Information Retrieval, pages 84–90, 2004. Cited in 1.2, 2.3, 5.1.
[50] J. Liu, M. Li, Q. Liu, H. Lu, and S. Ma. Image annotation via graph learning. Pattern Recognition, 42(2):218–228, 2009. Cited in 1.2, 2.1.
[51] S. Liu, J. Feng, Z. Song, T. Zhang, H. Lu, C. Xu, and S. Yan. Hi, magic closet, tell me what to wear! In Proceedings of ACM International Conference on Multimedia, pages 619–628, 2012. Cited in 1.2, 2.2.
[52] S. Liu, Z. Song, G. Liu, C. Xu, H. Lu, and S. Yan. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3330–3337, 2012. Cited in 1.2, 2.2.
[53] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004. Cited in 3.3.2, 5.5.1.
[54] B. Luo, X. Wang, and X. Tang.
World wide web based image search engine using text and image content features. In Electronic Imaging, pages 123–130. International Society for Optics and Photonics, 2003. Cited in 2.3.
[55] A. Makadia, V. Pavlovic, and S. Kumar. A new baseline for image annotation. In Proceedings of European Conference on Computer Vision, volume 8, pages 316–329, 2008. Cited in 1.2, 2.1, 3.3.1, 3.3.3, 5.5.1.
[56] T. Mensink, J. Verbeek, and G. Csurka. Learning structured prediction models for interactive image labeling. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 833–840, 2011. Cited in 2.1.
[57] T. Mensink, J. Verbeek, and G. Csurka. Tree-structured CRF models for interactive image labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2):476–489, 2013. Cited in 2.1.
[58] F. Monay and D. Gatica-Perez. PLSA-based image auto-annotation: constraining the latent space. In Proceedings of ACM International Conference on Multimedia, pages 348–351, 2004. Cited in 1.2, 2.1.
[59] D.M. Mount and S. Arya. ANN: A library for approximate nearest neighbor searching. In CGC Annual Fall Workshop on Computational Geometry, 1997. Cited in 2.1.
[60] Y. Mu, J. Shen, and S. Yan. Weakly-supervised hashing in kernel space. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3344–3351, 2010. Cited in 3.2.2.
[61] J.Y. Pan, H.J. Yang, C. Faloutsos, and P. Duygulu. Automatic multimedia cross-modal correlation discovery. In Proceedings of ACM SIGKDD Conference on Knowledge, Discovery, and Data Mining, pages 653–658, 2004. Cited in 1.2, 2.1.
[62] D. Parikh and K. Grauman. Interactively building a discriminative vocabulary of nameable attributes. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1681–1688, 2011. Cited in 4.2.
[63] R.C. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36(6):1389–1401, 1957. Cited in 4.4.3.
[64] B.C. Russell, W.T. Freeman, A.A. Efros, J. Sivic, and A.
Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 1605–1614, 2006. Cited in 4.2, 5.1, 5.2, 5.4.
[65] F. Schroff, A. Criminisi, and A. Zisserman. Harvesting image databases from the web. In Proceedings of IEEE International Conference on Computer Vision, pages 1–8, 2007. Cited in 5.3.2.
[66] M. Schultz and T. Joachims. Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems, page 41, 2004. Cited in 5.3.1.
[67] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000. Cited in 5.5.4.
[68] J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake. Real-time human pose recognition in parts from single depth images. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1297–1304, 2011. Cited in 4.1.
[69] S. Singh, A. Gupta, and A.A. Efros. Unsupervised discovery of mid-level discriminative patches. Proceedings of European Conference on Computer Vision, 2012. Cited in 4.2, 4.4.1.
[70] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1380, 2000. Cited in 1.2.
[71] N. Snavely, S.M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3D. In ACM Transactions on Graphics, pages 835–846, 2006. Cited in 4.1.
[72] Z. Song, M. Wang, X.S. Hua, and S. Yan. Predicting occupation via human clothing and contexts. In Proceedings of IEEE International Conference on Computer Vision, pages 1084–1091, 2011. Cited in 2.2.
[73] J. Tang, R. Hong, S. Yan, T.S. Chua, G.J. Qi, and R. Jain. Image annotation by kNN-sparse graph-based label propagation over noisily tagged web images.
ACM Transactions on Intelligent Systems and Technology, 2(2):14, 2011. Cited in 1.2, 2.1.
[74] Y. Tian, W. Liu, R. Xiao, F. Wen, and X. Tang. A face annotation framework with partial clustering and interactive labeling. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007. Cited in 5.2.
[75] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. The Journal of Machine Learning Research, 2:45–66, 2002. Cited in 5.3.2.
[76] D. Tsai, Y. Jing, Y. Liu, H.A. Rowley, S. Ioffe, and J.M. Rehg. Large-scale image annotation using visual synset. In Proceedings of IEEE International Conference on Computer Vision, pages 611–618, 2011. Cited in 1.2, 2.1, 3.3.3, 3.4.3.
[77] J.J. Tsay, C.H. Lin, C.H. Tseng, and K.C. Chang. On visual clothing search. In Proceedings of IEEE International Conference on Technologies and Applications of Artificial Intelligence, pages 206–211, 2011. Cited in 1.2.
[78] R.H. Van Leuken, L. Garcia, X. Olivares, and R. van Zwol. Visual diversification of image search results. In Proceedings of International Conference on World Wide Web, pages 341–350. ACM, 2009. Cited in 1.2, 2.3.
[79] S. Vijayanarasimhan and K. Grauman. Multi-level active prediction of useful image annotations for recognition. Computer Science Department, University of Texas at Austin, 2008. Cited in 5.2.
[80] S. Vijayanarasimhan and K. Grauman. What’s it going to cost you?: Predicting effort vs. informativeness for multi-label image annotations. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2262–2269, 2009. Cited in 5.2.
[81] S. Vijayanarasimhan and K. Grauman. Large-scale live active learning: training object detectors with crawled data and crowds. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1449–1456, 2011. Cited in 5.2.
[82] L. Von Ahn and L. Dabbish. Labeling images with a computer game.
In Proceedings of SIGCHI Conference on Human Factors in Computing Systems, pages 319–326. ACM, 2004. 5.1 [83] C. Wah, S. Branson, P. Perona, and S. Belongie. Multiclass recognition and part localization with humans in the loop. In Proceedings of IEEE International Conference on Computer Vision, pages 2524–2531, 2011. 2.1 [84] P. Wais, S. Lingamneni, D. Cook, J. Fennell, B. Goldenberg, D. Lubarov, D. Marin, and H. Simons. Towards building a high-quality workforce with mechanical turk. Proceedings of Computational Social Science and the Wisdom of Crowds, pages 1–5, 2010. 6.2 [85] H. WANG, J. DU, and Q. GUO. The application of content based image retrieval technology in clothing retrieval system. Computing Technology and Automation, 2:022, 2009. [86] J. Wang, L.Y. Jia, and X.S. Hua. Interactive browsing via diversified visual summarization for image search results. Multimedia systems, 17(5):379–391, 2011. 1.2, 2.3 80 BIBLIOGRAPHY [87] Nan Wang and Haizhou Ai. Who blocks who: Simultaneous clothing segmentation for grouping images. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1535–1542. IEEE, 2011. 2.2 [88] X.J. Wang, W.Y. Ma, Q.C. He, and X. Li. Grouping web image search result. In Proceedings of ACM International Conference on Multimedia, pages 436–439, 2004. 1.2, 2.3, 5.1 [89] X.J. Wang, W.Y. Ma, L. Zhang, and X. Li. Iteratively clustering web images based on link and attribute reinforcements. In Proceedings of ACM International Conference on Multimedia, pages 122–131, 2005. 1.2, 2.3, 5.5 [90] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. Proceedings of European Conference on Computer Vision, pages 155– 168, 2010. 4.2, 4.4.3 [91] P. Welinder, S. Branson, S. Belongie, and P. Perona. The multidimensional wisdom of crowds. Advances in Neural Information Processing Systems, 23:2424– 2432, 2010. 2.1 [92] J. Weston, S. Bengio, and N. Usunier. 
Large scale image annotation: learning to rank with joint word-image embeddings. Machine learning, 81(1):21–35, 2010. 1.2, 2.1, 3.1, 3.2.1, 3.3.3 [93] J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of International Joint Conference on Artificial Intelligence, 2011. 1.2, 2.1, 3.1, 3.2.1, 3.2.3, 3.3.3 [94] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan. Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. Advances in Neural Information Processing Systems, 22(2035-2043):7– 13, 2009. 2.1 [95] O. Yakhnenko and V. Honavar. Annotating images and image objects using a hierarchical dirichlet process model. In Proceedings of International Workshop on Multimedia Data Mining: held in conjunction with the ACM SIGKDD, pages 1–7, 2008. 1.2, 2.1 [96] K. Yamaguchi, M.H. Kiapour, L.E. Ortiz, and T.L. Berg. Parsing clothing in fashion photographs. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3570–3577, 2012. 1.2, 2.2 [97] L. Yang, R. Jin, and R. Sukthankar. Bayesian active distance metric learning. In Proceedings of Conference on Uncertainty in Artificial Intelligence, 2007. 5.3.1 81 BIBLIOGRAPHY [98] J. Yi, R. Jin, A. Jain, S. Jain, and T. Yang. Semi-crowdsourced clustering: Generalizing crowd labeling by robust distance metric learning. In Advances in Neural Information Processing Systems, pages 1781–1789, 2012. 2.3 [99] J. Yi, R. Jin, A. K. Jain, and S. Jain. Crowdclustering with sparse pairwise labels: A matrix completion approach. In AAAI Workshop on Human Computation, 2012. 2.3 [100] C.N.J. Yu and T. Joachims. Learning structural svms with latent variables. In Proceedings of ACM International Conference on Machine Learning, pages 1169–1176, 2009. 4.2, 4.4.3 [101] J. Yuan and Y. Wu. Spatial random partition for common visual pattern discovery. 
In Proceedings of IEEE International Conference on Computer Vision, pages 1–8, 2007. 4.2 [102] H. Zhang, A.C. Berg, M. Maire, and J. Malik. Svm-knn: Discriminative nearest neighbor classification for visual category recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 2126–2136, 2006. 1.2, 2.1 [103] S. Zhang, J. Huang, Y. Huang, Y. Yu, H. Li, and D.N. Metaxas. Automatic image annotation using group sparsity. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3312–3319, 2010. 1.2, 2.1, 3.3.3 [104] L. Zhu, Y. Chen, A. Yuille, and W. Freeman. Latent hierarchical structural learning for object detection. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 1062–1069, 2010. 4.2, 4.4.3 82 [...]... topics: (1) generic image annotation (2) fashion image understanding and (3) image organization through clustering For better understanding the semantic content of images, we first investigate automatic image annotation as a general problem in topic (1) In topic (2), we address the content understanding problem for a specific task: fashion understanding The fashionability of dress images is modeled which... choose to browse the image clusters they are interested in and simply ignore the others Besides improving result visualization, clustering based image organization techniques could also speed up the retrieval procedure and make the storage more efficient In this dissertation, we aim to supply better image retrieval experiences in the aspects of image content understanding and image organization Specifically,... 
Confronted with this huge number of images, the need for effective image retrieval becomes more and more urgent. In general terms, an image retrieval system is a computer system designed for browsing, searching and retrieving images from a large digital image set. In a traditional image retrieval system, images are indexed with their metadata such as captions, keywords and natural language text ...
... professionally and with clean background. Therefore, there are within-scenario retrieval and cross-scenario retrieval. Within-scenario means that the query image and the retrieved images come from the same source, and cross-scenario means that the query image and the images in the retrieval pool come from different sources. In [52], a practical problem of cross-scenario clothing retrieval is addressed via parts alignment and auxiliary ...
... facilitate both concept-based and content-based image retrieval. Two of them focus on understanding the semantic meaning of images within a general area or a specific task/domain. The third contribution targets better image organization through image clustering, which could greatly benefit image searching and browsing. Generic image annotation: We propose an automatic image annotation framework ...
... done manually in concept-based image retrieval, which is less efficient than doing it automatically. If the resulting automated mapping between images and words is trustworthy, it could be very useful for both concept-based and content-based image retrieval. Another important research problem arising from image retrieval is image search result organization. Current image search engines usually display ...
... review some recent efforts on the following three tasks: image annotation, fashion image understanding and image search result organization.
2.1 Image Annotation
Image annotation is a typical multi-label classification problem, since one image can be related to multiple words. A significant amount of work has been devoted to the task of automatic image annotation. We can roughly categorize these existing ...
... pattern based image retrieval, which is very interesting and promising. On the topic of image search result organization, we aim to utilize clustering techniques to facilitate image searching and browsing, as described in Chapter 5. Traditional unsupervised clustering methods usually cannot produce image clusters with high precision. Therefore, in this work we propose to actively cluster images and largely ...
... framework based on a discriminative embedding learning model. Chapter 4 covers the fashion image understanding work, which falls within the scope of domain/task-specific image understanding. Chapter 5 describes image organization through active clustering with a human in the loop. Finally, Chapter 6 concludes this dissertation and provides a short discussion of possible future research directions.
... that, the image retrieval system will return a list of images, with the ranking of each image reflecting the similarity of the image's metadata to the textual query. Concept-based image retrieval usually suffers from irrelevant images. For example, text extracted from HTML pages contains much noise, while manually entered tags may not capture every keyword that describes the image.

Format
Pages	97
File size	5.76 MB