VISUAL INSTANCE MINING IN IMAGE DATABASES

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY

NGUYỄN THỊ BẢO NGỌC

VISUAL INSTANCE MINING IN IMAGE DATABASES

MASTER'S THESIS IN COMPUTER SCIENCE
Code: 60.48.01.01

SCIENTIFIC SUPERVISOR: Dr. LÊ ĐÌNH DUY

HO CHI MINH CITY, 2017

DECLARATION

I hereby declare that this is my own research work. The data and results presented in this thesis are truthful and have not been published in any other work, except for the materials cited in the references.

Thesis author
Nguyễn Thị Bảo Ngọc

ACKNOWLEDGEMENTS

First, I sincerely thank my teachers Lê Đình Duy and Ngô Đức Thành for their dedicated guidance and help throughout the time I worked on this thesis. I also sincerely thank Professor Shin’ichi Satoh (National Institute of Informatics (NII), Japan) for sponsoring and guiding me in the early days of this project. I thank the senior colleagues and fellow students of the Multimedia Communications Laboratory, University of Information Technology, who supported me a great deal while I carried out this work. My sincere thanks.

Ho Chi Minh City, 2017
Nguyễn Thị Bảo Ngọc

Table of contents

Declaration
Acknowledgements
Table of contents
List of figures
List of tables
List of abbreviations
Abstract
Contributions
1 Visual instance mining in image databases
  1.1 Problem, applications and challenges
  1.2 Content and contributions of the thesis
    1.2.1 Integrating geometric constraints into feature matching
    1.2.2 Nearest candidate selection in clustering
  1.3 Structure of the thesis
2 Survey of related works and the matching-graph-based approach
  2.1 Related works
    2.1.1 Feature extraction
    2.1.2 Matching
    2.1.3 Clustering
  2.2 The matching-graph-based approach
    2.2.1 Feature extraction
    2.2.2 Building the matching graph
    2.2.3 Clustering
      2.2.3.1 The ForceAtlas2 algorithm
      2.2.3.2 The Greedy Breadth-First Search (GBFS) algorithm
3 The VIM system integrating the proposed solutions
  3.1 Ideas
    3.1.1 Improving the quality of the matching graph
    3.1.2 Improving the performance of the clustering method
  3.2 The VIM system integrating the proposed solutions
    3.2.1 Feature extraction
    3.2.2 Integrating geometric constraints into matching graph construction
    3.2.3 Clustering based on nearest candidate selection
4 Experiments and analysis
  4.1 Datasets
  4.2 Evaluation methods
  4.3 Compared methods
  4.4 Experimental results and remarks
5 Conclusion and future work
  5.1 Results achieved
  5.2 Future work
Published work
References
Appendix: Published work
Appendix: Thesis proposal submitted for approval

List of figures

1.1 The output of a visual instance mining (VIM) system is a set of image groups, each containing one specific, distinct instance; from top to bottom: a Mercedes-Benz Dealership White car, a Toyota C-HR car, the “Bruce Lee statue” in Hong Kong, the “Coca cola” logo, and the “Speedy 30 Damier Azur Canvas handbag”.
1.2 Matching method. The input is image pairs. Candidate matches are created by pairing visual words in the two images that have the same visual word ID. These matches are then pruned using HE-code [1], burstiness removal (using multiple match removal, MMR [2]) and a geometric consistency check [3].
1.3 Assume a matching graph exists in which images I1 and I2 contain the yellow triangle instance and images I1, I3 and I4 contain the round face instance. Comparison of the results of the traditional (layer-by-layer) clustering method and the proposed (nearest-candidate) clustering method.
2.1 A visual instance mining system usually consists of three main steps: image feature extraction, matching, and clustering.
2.2 Example of a matching graph.
2.3 Using the ForceAtlas2 algorithm to visualize a graph.
3.1 Comparison of feature matching results. (a) Ordinary matching. (b) With integrated constraints. The results show that the proposed solution removes false matches effectively.
3.2 Previous clustering methods cluster level by level. Legend: red vertices are points already in the cluster; yellow points are candidates being considered for the cluster; blue points are the remaining vertices of the matching graph. The conditions are set so that vertices added to the cluster tend to resemble the entry point.
The cluster is expanded by breadth-first traversal, selecting at each step the candidates likely to resemble the entry point. The number of traversal levels is chosen by the user.
3.3 The proposed method prioritizes the links between the candidates and the images already in the cluster. Legend: red vertices are points already in the cluster; yellow points are candidates being considered for the cluster; blue points are the remaining vertices of the matching graph. Candidates with the strongest links to the images in the cluster are considered for inclusion first.
3.4 An example of multiple matches between images; that is, a feature point that is connected to several other feature points.
4.1 Some instances in (a) the MQA dataset (statue-of-liberty, Cocacola, Bruce-Lee-Statue, crabapple, panda) and (b) the PartialDup dataset (logo-converse, book, truck, gandi, beatles, gothic).
4.2 Comparison of Fpair on the MQA and PartialDup datasets for different values of ht and thr.
4.3 Performance on the MQA dataset, compared on three measures: (a) Pair F-Measure, (b) Pair Precision, and (c) Pair Recall.
4.4 Performance on the PartialDup dataset.
4.5 Some examples of instance groups found in the two datasets using the proposed method. Note that the number of instances in a group is larger than the number shown in the image.

List of tables

4.1 Comparison of the best results of the proposed method with state-of-the-art methods. The best values on each dataset for each measure are in bold.

Appendix: Published work

Fwd: KSE-2017 notification for paper 77
Nguyễn Thị Bảo, Ngọc
To: ngocntb.10@grad.uit.edu.vn
15:11, August 2017

---------- Forwarded message ----------
From: KSE2017
Date: Tue, Aug 1, 2017 at 11:37 AM
Subject: KSE-2017 notification for paper 77
To: Ngoc-Bao Nguyen

Dear Ngoc-Bao Nguyen,

On behalf of The Program Committee, we are pleased to inform you that your submission, referenced below, has been accepted as a full paper at KSE-2017, and will be published in the conference proceedings indexed by IEEE Xplore Digital Library and DBLP. Congratulations!
Paper 77: Graph-based visual instance mining with geometric matching and nearest candidates selection

The reviews of your paper are enclosed with this letter. Please take into account any comments and suggestions in the reviews when you prepare the revised version.

CAMERA-READY: The deadline for the camera-ready version of your paper is August 25, 2017. You will be contacted shortly with information about preparing the camera-ready version for the conference proceedings.

REGISTRATION: Please remember that at least one of the authors of your paper is obligated to register for the conference and to present the paper at the conference. Early-bird registration closes on August 30, 2017; late registration closes on September 10, 2017. Please see the conference website (http://kse2017.dhsphue.edu.vn/Registration.aspx) for more information.

TRAVEL: Please note that foreign authors of accepted papers are responsible for obtaining appropriate travel documents (e.g., passports and visas) well in advance of their planned trip to Vietnam for KSE-2017. Authors can contact to request invitation letter or visa support documents at http://kse2017.dhsphue.edu.vn/Registration.aspx

Again, congratulations on the acceptance of your paper. Please visit the KSE-2017 website (http://kse2017.dhsphue.edu.vn/) regularly for up-to-date conference information. And we look forward to seeing you in Hue city, Vietnam in October.

Best regards,
KSE-2017 Program Committee Co-Chairs
Ireneusz Czarnowski, Le-Minh Nguyen, and Xuan-Hieu Phan

Graph-based visual instance mining with geometric matching and nearest candidates selection

Ngoc-Bao Nguyen, Multimedia Communications Laboratory, University of Information Technology, VNU-HCM, Ho Chi Minh, Vietnam, ngocntb@uit.edu.vn
Khang M. T. T. Nguyen, University of Information Technology, VNU-HCM, Ho Chi Minh, Vietnam, khangnttm@uit.edu.vn
Cuong Mai Van, University of Information Technology, VNU-HCM, Ho Chi Minh, Vietnam, cuongmv@uit.edu.vn
Duy-Dinh Le, University of Information Technology, VNU-HCM, Ho Chi Minh, Vietnam, duyld@uit.edu.vn

Abstract: The goal of this work is to cluster frequently appearing visual instances from a collection of images, a task called visual instance mining. State-of-the-art approaches employ clustering on the matching graph. Although these graph-based approaches are quite effective, their performance is still modest. In this paper, we propose two methods to further improve the accuracy of graph-based approaches. First, we improve graph construction by integrating geometric constraints. In particular, we exploit Hamming Embedding code, burstiness removal, and geometric consistency in our framework to remove false connections. Second, unlike most existing methods, which cluster images similar to an entry point, we cluster them by prioritizing the correlation between the candidates and the members of the cluster. We enlarge the cluster by adding the nearest candidate at each re-expansion. Our method outperforms the state-of-the-art methods on benchmark datasets. In particular, our pair F-measure is 21.2 and 5.5 percentage points higher than the latest results on the MQA dataset and the PartialDup dataset, respectively.

Index Terms: Instance Mining, Graph Clustering, Geometric Matching, Hamming Embedding, Burstiness

I. INTRODUCTION

Visual instance mining aims to automatically link all images that contain the same instance into the same cluster, given a collection of images. For example, a cluster may correspond to a specific logo, such as “Coca cola”, or a specific handbag, such as “Speedy 30 Damier Azur Canvas handbag” (see Figure 1). Instances mined from a large collection serve many applications, e.g., image annotation, instance search, multimedia summarization, and trend prediction.

Instance mining remains a challenging problem. The first challenge is variation. There is considerable diversity in the number of visual instances existing in a collection of images. Besides, different types of intra-class variation exist due to differences in scale, viewpoint, illumination, and background. The second challenge is scalability. When the mining dataset is expanded, the noise grows rapidly and mining performance drops quickly. Additionally, a parallelizable and uncomplicated method is necessary to deal with a large collection of images.

Fig. 1: Visual instance mining system. The output of the system is clusters of images containing the same instances, such as the “Coca cola” logo, the “Speedy 30 Damier Azur Canvas handbag”, and the “Bruce Lee statue” in Hong Kong.

Many approaches have been proposed for instance mining. State-of-the-art approaches are graph-based [1], [2] and involve two main components: graph construction and clustering. In the first step, images are matched and their pairwise similarity is computed. Then a graph is constructed whose nodes are images and whose edges indicate the similarity between two nodes. Based on the graph, a clustering method is applied to group images that contain the same instance. This approach tackles both the variation issue and the scalability issue quite effectively and therefore obtains better performance than previous works. However, there is still room for improvement. For instance, the performance of state-of-the-art approaches is still below par on the MQA dataset [3], which has only a few hundred images.

In this work, we propose two new techniques to further improve the accuracy of graph-based instance mining systems. First, a desired instance graph is a graph that contains only true connections, meaning that there is a connection between two nodes only if they contain the same instance. In practice, the existing graph construction methods produce many false connections. In this paper, we therefore improve graph construction by integrating geometric constraints to make connections more reliable.
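
To make the instance graph described above concrete, the following is a minimal Python sketch, not the implementation used in this paper. It assumes the pairwise match counts are already available as a dictionary; the names `pair_matches` and `build_instance_graph` are hypothetical. It applies the rule given in the Matching subsection of the proposed method: two images are connected when they share at least two accepted matches, and the number of matches is the edge weight.

```python
from collections import defaultdict


def build_instance_graph(pair_matches, min_matches=2):
    """Build an undirected weighted graph: image -> {neighbor: weight}.

    pair_matches maps an (image_a, image_b) pair to the number of accepted
    feature matches (hypothetical input format, for illustration only).
    """
    graph = defaultdict(dict)
    for (a, b), count in pair_matches.items():
        if a == b or count < min_matches:
            continue  # too few matches: treat the pair as unconnected
        graph[a][b] = count
        graph[b][a] = count
    return dict(graph)


pair_matches = {("I1", "I2"): 14, ("I1", "I3"): 1, ("I3", "I4"): 5}
graph = build_instance_graph(pair_matches)
print(graph["I1"])  # {'I2': 14}
```

A false connection in this representation is simply an edge between two images that do not share an instance, which is why pruning mismatches before counting matters.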
In particular, we exploit Hamming Embedding code [4], burstiness removal [5], and geometric consistency in our framework. These algorithms are used to refine matches and, as a result, remove false connections between node pairs. Second, unlike most existing methods, which cluster images similar to an entry point, we cluster them by prioritizing the correlation between the candidate and the members of the cluster. We enlarge the cluster by adding the nearest candidate at each re-expansion. By doing this, the new member will be close to the rest of the cluster and the output cluster will tend to contain the same instance.

In sum, our main contributions are two-fold:
1) Geometric Matching in Graph Construction. We integrate geometric constraints in our framework to minimize mismatches. Hence, the connections in our graph are more reliable.
2) Nearest Candidates Selection in Clustering. Our approach expands clusters by adding the nearest candidate at each step. Therefore, the final cluster contains images that are closely connected to each other, which increases the likelihood that they share the same instance.

Our approach, evaluated on benchmark datasets, achieves state-of-the-art results. In particular, our pair F-measure is 21.2 and 5.5 percentage points higher than the state-of-the-art performance on the MQA dataset [3] and the PartialDup dataset [6], respectively.

This paper is organized as follows: an overview of related works is given in Section II. Section III presents our proposed method. Experimental results are discussed in Section IV. Finally, Section V draws the conclusion of the paper.

II. RELATED WORKS

A typical instance mining system has three main steps: feature extraction, matching, and clustering. First, features of each image in the collection are extracted. Images are then matched to find content shared between images. Finally, clustering methods are applied to produce clusters of images that contain the same instance. In this section, we present an overview of related works according to
those main steps.

Feature extraction. In instance mining, bag of visual words is one of the most widely used features for image representation. Recently, geometric information has been exploited to enhance the quality of image representation by combining a visual word with its neighborhood [7], [1], [2].

Matching. Matching is one of the most costly parts of instance mining systems. Works proposed so far usually use fast techniques for matching images in a collection. Min-Hash is widely used [8], [9], [10], [11], [7] due to its speed. Geometric min-Hash (GmH) [12], an improvement of min-Hash, exploits geometric information among visual words. GmH generates a number of sketch collisions, which improves the performance of small object discovery. In 2008, Jégou et al. [4] proposed the Hamming Embedding (HE) code, extracted from feature signatures, to enhance both performance and speed for the instance search task. This method is currently employed by many approaches in instance mining [7], [1], [2] to reduce matching errors. To reduce matching errors further, Jaccard similarity is computed between the neighborhoods of central visual words [7], [1], [2]. Recently, weak geometric consistency (WGC) [4] has been exploited in the work of Li et al. [2].

Clustering. Quack et al. [13] aim to detect frequently occurring objects automatically by employing a frequent itemset mining algorithm. Each visual word is treated as an item and each local image patch is encoded as a “transaction”. This work only deals with a small number of images. Later, Pineda et al. [11] obtained object models by using Min-Hashing to cluster visual word sets that tend to appear in the same image. This method only works for small datasets: if the number of images increases, the performance drops quickly since the noise increases. Zhang et al. [7] mined visual instances by exploiting Threads of Features (ToF). Local features between the images are threaded, and clusters are generated by voting on thread clusters. Clustering based on graphs
is also an important approach. Philbin and Zisserman [14] matched all images to construct a matching graph. They use standard clustering techniques for mining image groups. However, the matching is computed using the whole image, which is only suitable for mining large instances. In [1], after constructing the graph, the images are clustered by using a greedy breadth-first search (GBFS) algorithm, which finds all images similar to the entry point. An improvement of this method was recently introduced in [2], where layout embedding (LE) [15], a new approach for graph visualization, is used to tighten the candidate pairs. Li et al. [16] proposed Mid-level Deep Pattern Mining (MDPM). They apply a CNN to find clusters of image patches. This work, however, requires training data to learn association rule mining.

III. PROPOSED METHOD

In this paper, we propose two techniques to improve both the matching process for graph construction and the candidate selection process for clustering. We describe each part below.

A. Feature Extraction

For feature extraction, given a pre-trained codebook, local SIFT features are extracted and quantized to visual words. Then a 32-bit Hamming Embedding code [4] is extracted for each visual word from its SIFT features.

B. Matching

Fig. 2: Our matching approach. The input is image pairs. Candidate matches are produced by matching points with the same visual word ID. These matches are then pruned by using Hamming Embedding (HE) code, multiple match removal (MMR), and geometric consistency.

In this phase, we match local features of images and output the number of matches for each image pair. Unlike existing methods, candidate matches are created by pairing local features from two different images that share the same visual word ID. Then, we remove inappropriate matches by considering their Hamming Embedding codes and removing burstiness. Finally, these matches are checked for geometric consistency and aggregated to construct the connection between
images. See Figure 2 for the summary.

Hamming Embedding. Hamming Embedding is a method to refine matching based on visual words. First, we compute the Hamming distance between HE codes:

\[ \mathrm{HamDist}(f_i, f_j) = \sum_{k=1}^{l} \left[ c_{i,k} \neq c_{j,k} \right] \tag{1} \]

where c_i is the Hamming Embedding code of feature f_i, c_{i,k} is its k-th bit, and l is the length of the HE code. As in other methods, we set l = 32 in our experiments. A match is removed if the Hamming distance between its binary signatures is above a threshold ht. Note that the Hamming distance can be computed quickly with the XOR operator. Hence, this step removes mismatches quickly and effectively.

Removing multiple matches. To further reduce inappropriate matches, we remove multiple matches by using the multiple match removal (MMR) method [5]. The strategy is to keep only the best matches, i.e., those with the smallest Hamming distance between binary signatures. See Algorithm 1 for more details.

Geometric consistency. After removing multiple matches, we merge all best matches of each image pair to compute the connection of the pair. A geometric transformation is used to filter out matches that are not geometrically consistent. Two images have a connection if they share at least two matches. The weight of the connection is the number of accepted matches. At the end of this step, we obtain the instance graph of the whole collection. Each node is an image in the collection, and the connection between two nodes (if any) indicates their similarity.

Algorithm 1: Removing multiple matches
Input: Matches_i, the matches of visual word ID i between images Img1 and Img2
Sort Matches_i in ascending order of Hamming distance;
Initialize BestMatches_i = {Matches_i(1)};
for j = 2 to size(Matches_i)
    MultipleMatches = matches in BestMatches_i that share a feature with Matches_i(j);
    if isempty(MultipleMatches) then
        Add Matches_i(j) to BestMatches_i;
    end
end
Output: Best matches BestMatches_i

C. Clustering

Existing graph-based clustering methods cluster images similar to an
entry point. To do this, they use greedy breadth-first search (GBFS) and layout embedding (LE) [15]. Starting at an entry point, which is a vertex of the graph, neighbor nodes are scanned before moving to the next search depth level. A node (an image) is considered for addition to the cluster if the average weight between the scanned node and each image in the cluster is higher than a threshold and the distance between the node and the entry point is less than a threshold extracted by the layout embedding method. Note that the thresholds are increased when moving to the next search depth level. This is unfair because the average weight of each candidate is computed in the same way. In addition, this strategy is more suitable for searching than for clustering, since the candidate closest to the cluster is not scanned first. Furthermore, each candidate is visited only once per search level, so true positive candidates can be skipped. Finally, the depth level must be set to control the visited vertices, which makes parameter selection more complex.

Fig. 3: (a) Existing clustering methods cluster images in a greedy, layer-by-layer way; images in a cluster tend to be similar to the entry point. (b) Our proposed method prioritizes the correlation between the candidates and the members of the cluster; we enlarge the cluster by adding the nearest candidate at each re-expansion.

In this paper, we propose a new graph-based clustering method. Compared to existing graph-based methods, our method is simpler but more effective (see Figure 3). The main idea of the algorithm is to prioritize the correlation between the candidate and the members of the cluster. The nearest candidate of the cluster is added if its connections to the members of the cluster are strong enough. Then, its neighbors are considered as new candidates and are added to the queue. The loop continues until no suitable candidate remains. By doing this, the final cluster
will tend to contain the same instance.

Let finalCluster be the output clusters and C the set of temporary clusters. For each cluster C(i), Cdd(i) is the set of images considered for addition to C(i). Each candidate in Cdd(i) has a correlation weight, computed as the sum of the weights of all connections between it and the images in C(i). This score indicates how close a candidate is to C(i): the higher the score, the greater the chance that the candidate shares an instance with the images in the cluster. The set of those scores is stored in wC(i).

To begin, the i-th image in the collection and its nearest neighbor are added to a temporary cluster C(i). Then, images connected to the cluster are added to Cdd(i). The best candidate nCdd(i), the one with the highest wC(i), is added to the cluster if its weight passes a threshold. To account for the growth of a cluster, the threshold increases linearly with the cluster's size. Next, we add the connections of nCdd(i) to Cdd(i) and update wC(i). The loop stops when no image in Cdd(i) has a weight wC(i) that passes the threshold. The details of the algorithm are given below.

Algorithm 2: Instance mining
Input: graph G, collection of images CI, threshold thr
Initialize clusters C: find the nearest neighbor of each image in CI; if their correlation weight is bigger than thr, add a cluster seed consisting of the image and its nearest neighbor to C;
RemoveDuplicatedClusters(C);
Find the neighbor images Cdd and their correlation weights wC for each cluster in C;
finalCluster = empty;
while ∼isempty(C)
    flag = false(size(C));
    for i = 1 to |C|
        newThr = thr × size(C(i));
        Find the nearest candidate nCdd and its weight w for the i-th cluster;
        if w > newThr then
            Add nCdd to cluster C(i);
            Add the images connected to nCdd to Cdd(i);
            Update wC(i) for each image in Cdd(i);
        else
            flag(i) = true;
        end
    end
    finalCluster ← finalCluster ∪ C(flag);
    C ← RemoveDuplicatedClusters(C(∼flag), Cdd(∼flag), wC(∼flag));
end
Output: clusters finalCluster

IV. EXPERIMENTS

A. Datasets

To evaluate our
proposed method, we conduct experiments on two benchmark datasets: the MQA dataset [3] and the PartialDup dataset [6]. MQA contains 52 different instances in 438 images in total. Despite its small size, the performance of existing approaches on it is still below par due to its diverse intra-class variation. PartialDup has 782 images with 20 instances and focuses on partial-duplicate images. Instances differ in scale, viewpoint, background, etc.

B. Evaluation Methods

To compare with other methods, we use three standard metrics: Pair Precision (Ppair), Pair Recall (Rpair), and F-measure (Fpair). Let CGT be the set of ground-truth clusters and CDR the set of discovered clusters. Pair Precision (Ppair) is the number of image pairs in both CGT and CDR divided by the number of image pairs in CDR. Pair Recall (Rpair) is the ratio of the number of image pairs in both CGT and CDR to the number of image pairs in CGT. F-measure (Fpair) is the harmonic mean of Ppair and Rpair.

Fig. 4: Comparison of Fpair for different ht and thr thresholds. (a) MQA dataset. (b) PartialDup dataset.

C. Results and Analysis

First, we present the results of using different cluster thresholds thr and Hamming thresholds ht, where thr ∈ {3, 4, ..., 10} and ht ∈ {1, 2, ..., 15}, in Figure 4. We notice that with a small ht, a small thr produces better performance. When ht increases, the value of thr should also increase to get the best performance. However, the peak values become stable as ht and thr increase (see the curves for thr > 6). This means that we can obtain the best performance while choosing small ht and thr, which saves computation time. In the following comparisons, we fix thr separately for the MQA dataset and the PartialDup dataset.

Then, we plot the performance comparisons in Figure 6 and Figure 7 with varying ht. On both datasets, we achieve state-of-the-art results. With a small Hamming distance threshold ht, all produced matches and connections are true. As a result, our
approach achieves very high precision (Ppair = 1). On the other hand, a small ht also produces a small number of matches, which leads to low recall. As a result, the F-measure is low. In contrast, a high ht allows more matches to be considered, hence both true and false matches are produced. This leads to a decrease in precision and an increase in recall. When they balance, we obtain the best F-measure.

Finally, Table I shows the results of other methods and our best results, obtained with ht = 12 for both datasets. Our results are much higher than the others. For the MQA dataset, our recall is roughly 1.7 times the best competing recall (0.490 versus 0.289). Specifically, our Fpair on the MQA dataset is 21.2 percentage points higher than the best previous result. For the PartialDup dataset, both recall and precision are also improved. Some instances mined from the two datasets are shown in Figure 5. Our method can effectively mine visual instances with different sizes and different types of variation.

Fig. 5: Example instances mined from the two datasets using our method. (a) MQA dataset. (b) PartialDup dataset.

TABLE I: Comparison of different methods with our best results

Dataset      Method             Fpair    Ppair    Rpair
MQA          ToF [7]            0.373    0.523    0.289
MQA          IG [1]             0.301    0.618    0.193
MQA          IG-Full [2]        0.416    0.751    0.287
MQA          Proposed method    0.628    0.873    0.4903
PartialDup   ToF [7]            0.839    0.932    0.763
PartialDup   IG [1]             0.757    0.835    0.693
PartialDup   IG-Full [2]        0.933    0.984    0.888
PartialDup   Proposed method    0.988    0.994    0.982

V. CONCLUSION

In this paper, we propose two techniques to improve instance mining performance. Our method enhances both matching quality for graph construction and effectiveness for clustering. Experimental results on benchmark datasets show significant improvement over the state of the art.

Fig. 6: Performance on the MQA dataset in terms of (a) Pair F-Measure, (b) Pair Precision, and (c) Pair Recall.

Fig. 7: Performance on the PartialDup dataset: (a) Pair F-Measure, (b) Pair Precision, (c) Pair Recall.

ACKNOWLEDGEMENT

This
research is funded by University of Information Technology, Vietnam National University Ho Chi Minh City, under grant number D1-2017-02.

REFERENCES

[1] W. Li, C. Wang, L. Zhang, Y. Rui, and B. Zhang, “Scalable visual instance mining with instance graph,” in Proceedings of the British Machine Vision Conference (BMVC). BMVA Press, September 2015, pp. 98.1–98.11.
[2] W. Li, J. Li, C. Wang, L. Zhang, and B. Zhang, “Visual instance mining from the graph perspective,” Multimedia Systems, pp. 1–16, 2017.
[3] W. Zhang, L. Pang, and C.-W. Ngo, “Snap-and-ask: answering multimodal question by naming visual instance,” in Proceedings of the 20th ACM International Conference on Multimedia. ACM, 2012, pp. 609–618.
[4] H. Jégou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in European Conference on Computer Vision. Springer, 2008, pp. 304–317.
[5] H. Jégou, M. Douze, and C. Schmid, “On the burstiness of visual elements,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2009, pp. 1169–1176.
[6] Z. Wu, Q. Ke, M. Isard, and J. Sun, “Bundling features for large scale partial-duplicate web image search,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2009, pp. 25–32.
[7] W. Zhang, H. Li, C.-W. Ngo, and S.-F. Chang, “Scalable visual instance mining with threads of features,” in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 297–306.
[8] D. C. Lee, Q. Ke, and M. Isard, “Partition min-hash for partial duplicate image discovery,” in European Conference on Computer Vision. Springer, 2010, pp. 648–662.
[9] O. Chum et al., “Large-scale discovery of spatially related images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 371–377, 2010.
[10] O. Chum and J. Matas, “Fast computation of min-hash signatures for image collections,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 3077–3084.
[11] G. F. Pineda and T.
WATANABE, “Scalable object discovery: A hashbased approach to clustering co-occurring visual words,” IEICE TRANSACTIONS on Information and Systems, vol 94, no 10, pp 2024–2035, 2011 [12] O Chum, M Perd’och, and J Matas, “Geometric min-hashing: Finding a (thick) needle in a haystack,” in Computer Vision and Pattern Recognition, 2009 CVPR 2009 IEEE Conference on IEEE, 2009, pp 17–24 [13] T Quack, V Ferrari, and L Van Gool, “Video mining with frequent itemset configurations,” in International Conference on Image and Video Retrieval Springer, 2006, pp 360–369 [14] J Philbin and A Zisserman, “Object mining using a matching graph on very large image collections,” in Computer Vision, Graphics & Image Processing, 2008 ICVGIP’08 Sixth Indian Conference on IEEE, 2008, pp 738–745 [15] M Jacomy, T Venturini, S Heymann, and M Bastian, “Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software,” PloS one, vol 9, no 6, p e98679, 2014 [16] Y Li, L Liu, C Shen, and A van den Hengel, “Mid-level deep pattern mining,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp 971–980 Phụ lục đề cương xét duyệt 60 ĐẠI HỌC QUỐC GIA TP HCM TRƯỜNG ĐẠI HỌC CƠNG NGHỆ THƠNG TIN CỘNG HỊA XÃ HỘI CHỦ NGHĨA VIỆT NAM Độc lập - Tự - Hạnh phúc ĐỀ CƯƠNG ĐỀ TÀI LUẬN VĂN THẠC SĨ Định hướng ứng dụng (10TC) Định hướng nghiên cứu (15TC)  Tên đề tài hướng NC: Tiếng Việt: PHÁT HIỆN CÁC NHÓM ĐỐI TƯỢNG TRONG CƠ SƠ DỮ LIỆU ẢNH Tiếng Anh: VISUAL INSTANCE MINING Ngành mã ngành đào tạo: Khoa học máy tính – 60.48.01.01 Họ tên học viên thực đề tài, khóa- đợt học: Nguyễn Thị Bảo Ngọc, Khóa 10, đợt MSHV: CH1501012 Địa email, điện thoại liên lạc học viên: 724/24 Điện Biên Phủ, phường 10, quận 10, TP HCM Người hướng dẫn: TS Lê Đình Duy Địa email: duyld@uit.edu.vn Tổng quan tình hình NC 4.1 Giới thiệu chung Phát nhóm đối tượng (visual instance mining) tốn tự động tìm đối tượng cụ thể phổ biến sở liệu (csdl) ảnh cho trước[1] Thuật 
ngữ “đối tượng” (instance) thực thể thị giác cụ thể như: máy bay Air Force One, túi Speedy 30 Damier Azur Canvas, logo Adidas (xem ví dụ minh họa Hình 1) Phát nhóm đối tượng csdl ảnh đóng vai trò quan trọng nhiều tốn khác phân tích liệu đa phương tiện tóm tắt liệu đa phương tiện (multimedia summarization), tự động gán nhãn cho ảnh (image annotation), khuyến nghị sản phẩm (product recommendation), tìm kiếm thị giác (instance search) Phát nhóm đối tượng csdl ảnh tốn có nhiều thách thức, có hai thách thức sau: Hình 1: Ví dụ minh họa tốn phát nhóm đối tượng sở liệu ảnh Một số nhóm đối tượng phát như: máy bay Air Force One, logo Adidas, túi Speedy 30 Damier Azur Canvas (1) Chi phí tính tốn Trong sở liệu lớn, số lượng đối tượng lớn Do chi phí để phát nhóm đối tượng lớn Bên cạnh đó, để đưa phương pháp phát nhóm đối tượng vào ứng dụng thực tế, phương pháp đề xuất cần phải quan tâm đến tốc độ xử lý thuật toán (2) Sự đa dạng biểu diễn đối tượng Hình ảnh đối tượng khác lớn khác tỉ lệ, góc nhìn, che khuất, độ chiếu sáng Điều làm ảnh hưởng đến khả gom cụm hình ảnh đối tượng Hình 2: Hệ thống phát nhóm đối tượng sở liệu ảnh thường bao gồm ba bước chính, bao gồm: rút trích đặc trưng ảnh, so khớp gom cụm Hiện có nhiều phương pháp để phát nhóm đối tượng csdl ảnh Một phương pháp phổ biến nay, nhiều cơng trình sử dụng [1][2][3], gồm ba bước (xem Hình 2): (1) Rút trích đặc trưng ảnh Các ảnh liệu rút trích đặc trưng Một số đặc trưng rút trích local feature, local patch Kết giai đoạn thường điểm đặc trưng đại diện cho đối tượng xuất ảnh (2) So khớp Mục đích giai đoạn tìm điểm tương tự hình ảnh từ tìm đối tượng giống ảnh Đây giai đoạn chiếm nhiều thời gian hệ thống phát nhóm đối tượng sở liệu ảnh độ phức tạp lên đến O(n2) với n số lượng ảnh liệu (3) Gom cụm (gom nhóm đối tượng) Các hình ảnh gom cụm cho ảnh chứa đối tượng gom chung cụm Kết trả danh sách nhóm ảnh, nhóm bao gồm ảnh chứa đối tượng Trong đề tài này, học viên tập trung vào giai đoạn gom cụm (bước (3)) hệ thống phát nhóm đối 
Clustering methods for this task differ from ordinary image clustering because they must discover instances that are small relative to the whole image. This means that even two images with low similarity may still contain the same instance. In addition, an image may contain several instances, so it may belong to several different clusters.

In summary, in this proposal the student aims to use state-of-the-art methods for feature extraction and matching, and to propose a clustering method that improves the accuracy of visual instance mining in image databases.

4.2 State of research
4.2.1 International research
The following surveys three representative works on visual instance mining in image databases:
• Co-occurring Visual Words [2]. The idea of this method is to find visual words that occur together (co-occurring) and represent the same object (see Figure 3). First, the authors extract image features as visual words and build an inverted index that manages the list of images containing each visual word. To discover instance groups, the authors search the inverted index and extract the sets of visual words that appear in similar images (co-occurring word sets). Finally, to obtain object models, the authors cluster the co-occurring word sets. The method is effective at mining both large and small objects in a dataset. However, its performance drops quickly as the number of images and the amount of noise increase.
Figure 3: Overview of the Co-VW method [2].
• Threads of Features [3]. This method is based on the idea that the same object in different images shares a thread of similar features (see Figure 4). The authors first find potential feature threads that are likely to belong to the same object. These threads are then clustered and voted into instance groups.
Figure 4: Overview of the ToF method [3].
• Instance Graph [1]. The authors build a graph relating the images, in which each node is an image and the length of an edge represents the similarity between the two images at its endpoints (see Figure 5). To cluster, each image in the dataset takes a turn as the entry point, and neighboring images whose similarity exceeds a threshold are searched for. The method is effective at mining large objects but not small ones, because the similarity between two images that contain a small object is low. The clustering method also requires an initialization point in order to cluster.
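The entry-point idea described for the Instance Graph approach can be illustrated with a short Python sketch. This is a deliberately simplified illustration and not the algorithm of [1]: the function name `cluster_from_entry_points`, the dictionary-based similarity graph, the single-hop neighbor search, and the toy similarity values are all assumptions made for this example.

```python
def cluster_from_entry_points(n_images, similarity, threshold):
    """Let each image act as an entry point and group it with every
    neighbor whose similarity to it exceeds the threshold."""
    clusters = set()
    for entry in range(n_images):
        members = {entry}
        for other in range(n_images):
            if other == entry:
                continue
            key = (min(entry, other), max(entry, other))
            if similarity.get(key, 0.0) > threshold:
                members.add(other)
        if len(members) > 1:
            # frozenset removes duplicate clusters found from different entry points
            clusters.add(frozenset(members))
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarities: images 0-2 share a large object,
# images 3 and 4 share only a small object (low similarity).
toy_similarity = {(0, 1): 0.8, (1, 2): 0.7, (3, 4): 0.05}

strict = cluster_from_entry_points(5, toy_similarity, threshold=0.5)
loose = cluster_from_entry_points(5, toy_similarity, threshold=0.01)
print(strict)
print(loose)
```

With the stricter threshold the pair (3, 4) is never grouped, which mirrors the weakness noted above for small objects. Image 1 appears in more than one group, reflecting the fact that an image may belong to several clusters.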
Figure 5: Overview of the Instance Graph method [1].

Table 1: Results of three state-of-the-art methods, Instance Graph [1], ToF [3], and Co-VW [2], on three datasets: MQA [7], Oxford5k [3], and PartialDup [8].

Dataset                                 Metric      Co-VW [2]   ToF [3]   Instance Graph [1]
MQA (438 images, 52 instances)          Precision   ≈0.250      0.639     0.613
                                        Recall      ≈0.250      0.183     0.200
                                        F-measure   ≈0.250      0.284     0.301
PartialDup (782 images, 20 instances)   Precision   No data     0.887     0.835
                                        Recall      No data     0.643     0.693
                                        F-measure   No data     0.745     0.757
Oxford5k (5063 images, 364 instances)   Precision   ≈0.125      ≈0.800    No data
                                        Recall      ≈0.325      ≈0.400    No data
                                        F-measure   ≈0.180      ≈0.533    No data

4.2.2 Domestic research
According to the student's survey of research in Vietnam, there is a research group, that of Dr. Lê Đình Duy and colleagues at the Multimedia Communications Laboratory, University of Information Technology, working on visual instance mining in image databases.

5. Scientific merit and novelty
5.1 Scientific merit
The project “Visual instance mining in image databases” focuses on the following:
• Studying methods for visual instance mining in image databases.
• Proposing a clustering method that improves the accuracy of visual instance mining in image databases.
5.2 Novelty
• Visual instance mining in image databases has many practical applications.
• Existing methods do not yet meet practical needs. Even on a small dataset such as MQA [7] (438 images), accuracy (F-measure) reaches only about 30% (see the results summarized in Table 1).
• In the domestic research community, visual instance mining in image databases has not yet received much study.

6. Objectives, subject, and scope
6.1 Objectives
The project “Visual instance mining in image databases” has the following objectives:
(1) Survey methods for visual instance mining in image databases.
(2) Implement and test a method for visual instance mining in image databases, and evaluate the results on benchmark datasets.
(3) Propose an improved method based on the preceding analysis.
(4) Conclude the project.
6.2 Research scope
Experiments and evaluation are carried out on benchmark datasets such as MQA [7], Oxford5k [3], and PartialDup [8].
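The F-measure figures in Table 1 can be cross-checked against their precision and recall values. The sketch below assumes the standard definition of F-measure as the harmonic mean of precision and recall; the proposal does not state the formula, so this is an assumption, and `f_measure` and `reported_rows` are hypothetical names. Only the rows with exact (not ≈) values are used, and the dataset labels follow the table as read here.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (label, precision, recall, reported F-measure), copied from Table 1.
reported_rows = [
    ("ToF [3], MQA", 0.639, 0.183, 0.284),
    ("ToF [3], PartialDup", 0.887, 0.643, 0.745),
    ("Instance Graph [1], MQA", 0.613, 0.200, 0.301),
    ("Instance Graph [1], PartialDup", 0.835, 0.693, 0.757),
]

for label, p, r, reported in reported_rows:
    computed = f_measure(p, r)
    # Small differences are expected because the inputs are already rounded.
    print(f"{label}: computed {computed:.3f}, reported {reported:.3f}")
```

The computed values agree with the reported ones to within about 0.001, which is consistent with rounding of the published precision and recall.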
7. Content, methods, and schedule

Content 1 (11/2016 – 12/2016)
  Work:
  • Study and survey state-of-the-art techniques and methods for visual instance mining in image databases.
  • Survey the benchmark datasets used for visual instance mining in image databases, such as MQA, Oxford5k, and PartialDup.
  Results:
  • A document that synthesizes and analyzes current state-of-the-art methods for visual instance mining in image databases.
  • Benchmark image datasets for the visual instance mining problem.

Content 2 (12/2016 – 1/2017)
  Work:
  • Implement and test a method for visual instance mining in image databases.
  • Evaluate the results.
  Results:
  • A method for visual instance mining in image databases.

Content 3 (1/2017 – 5/2017)
  Work:
  • Propose a clustering method that improves the accuracy of visual instance mining in image databases.
  • Write a paper and submit it to a conference.
  Results:
  • A new method for visual instance mining in image databases.
  • One paper at a domestic or international conference.

Content 4 (6/2017)
  Work: Conclude the project.
  Results: Thesis report.

8. References
[1] W. Li, C. Wang, L. Zhang, Y. Rui, and B. Zhang, “Scalable Visual Instance Mining with Instance Graph,” in Proceedings of the British Machine Vision Conference (BMVC), 2015, pp. 98.1–98.11.
[2] G. F. Pineda and T. Watanabe, “Scalable object discovery: A hash-based approach to clustering co-occurring visual words,” IEICE Trans. Inf. Syst., vol. 94, no. 10, pp. 2024–2035, 2011.
[3] W. Zhang, H. Li, C.-W. Ngo, and S.-F. Chang, “Scalable visual instance mining with threads of features,” in Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 297–306.
[4] E. Cohen et al., “Finding interesting associations without support pruning,” IEEE Trans. Knowl. Data Eng., vol. 13, no. 1, pp. 64–78, 2001.
[5] A. Z. Broder, “On the resemblance and containment of documents,” in Proceedings of Compression and Complexity of Sequences 1997, 1997, pp. 21–29.
[6] H. Jegou, M. Douze, and C. Schmid, “Hamming embedding and weak geometric consistency for large scale image search,” in European Conference on Computer Vision, 2008, pp. 304–317.
[7] W. Zhang, L. Pang, and C.-W. Ngo, “Snap-and-ask: answering multimodal question by naming visual instance,” in Proceedings of the 20th ACM International Conference on Multimedia, 2012, pp. 609–618.
[8] Z. Wu, Q. Ke, M. Isard, and J. Sun, “Bundling features for large scale partial-duplicate web image search,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 25–32.

Ho Chi Minh City, 8 March 2017        Ho Chi Minh City, 8 March 2017
ADVISOR                               STUDENT
                                      (signature)
Lê Đình Duy                           Nguyễn Thị Bảo Ngọc
