Summary: Combining the R-Tree structure with a knowledge graph for an image retrieval model.

DOCUMENT INFORMATION

Basic information

Format
Pages	48
File size	2.54 MB

Content

Developing an image retrieval model based on the R-Tree approach.

HUE UNIVERSITY
UNIVERSITY OF SCIENCES

LÊ THỊ VĨNH THANH

COMBINING THE R-TREE STRUCTURE WITH A KNOWLEDGE GRAPH FOR AN IMAGE RETRIEVAL MODEL

Major: Computer Science
Code: 48 01 01

DOCTORAL THESIS IN COMPUTER SCIENCE

Scientific supervisors: Assoc. Prof. Dr. Lê Mạnh Thạnh, Dr. Văn Thế Thành

HUẾ, 2023

The work was completed at: Faculty of Information Technology, University of Sciences, Hue University
Scientific supervisor: Assoc. Prof. Dr. Lê Mạnh Thạnh
Reviewer 1:
Reviewer 2:
Reviewer 3:
The thesis will be defended before the Hue University thesis examination council, meeting at: ………
At: … hour, day …, month …, year …
The thesis can be consulted at the library: Information and Library Center, University of Sciences, Hue University

INTRODUCTION

Reasons for choosing the topic

Image retrieval systems have been developed and deployed in many applications, such as face recognition and retrieval [1], product image retrieval [2], medical image retrieval [3], and satellite image retrieval [4]. Two image retrieval approaches are in common use: text-based image retrieval (TBIR) and content-based image retrieval (CBIR). TBIR retrieves similar images based on indexing, descriptions, and annotations supplied by users [5, 6]. CBIR has since been developed; it focuses on extracting and comparing low-level image features such as color, texture, shape, location, and other features [7-9]. The results of many studies over the past decade show the effectiveness and accuracy of CBIR-based techniques, which have been applied in many image retrieval systems [10]. A CBIR system helps users find a set of images with similar content based on low-level features, but the results may differ in semantics [11]. This is the gap between the high-level semantics and the low-level visual features of images, and narrowing it is a major challenge for content-based image retrieval systems [12, 13]. The problem of analyzing and retrieving images with a semantic approach has therefore attracted the attention of researchers in computer vision [14-16]. With the growth of multimedia data (including images, audio, video, and text), processing systems
need to store large volumes of data [17]. Creating multidimensional storage structures for multimedia data is therefore necessary to make retrieval fast and efficient. On this basis, the thesis proposes and carries out the topic "Combining the R-Tree structure with a knowledge graph for an image retrieval model".

Overview of related research

In recent years, image retrieval systems have been built with many different data clustering methods and have achieved good results. Among these, the R-Tree is a structure for clustering and storing multidimensional data that partitions the data by spatial regions, and it has been applied effectively to image retrieval [16, 18]. Many works have applied R-Tree structures to similar image retrieval in order to improve accuracy and reduce retrieval time. Haldurai et al. (2015) proposed a content-based similar image retrieval system using the R-Tree structure [22]. Vanitha et al. (2017) proposed the SR-Tree storage structure for a content-based similar image retrieval system [24]. Shama et al. (2015) proposed a similar image retrieval system using the R*-Tree structure for a set of plant images [23]. Alfarrarjeh et al. (2020) proposed an image retrieval model based on the R*-Tree structure for similar image retrieval on street image data [21].

Content-based retrieval systems built on low-level features have achieved promising results and have been applied in practice. Their limitation, however, is the discrepancy between low-level features and the high-level semantics of images (the semantic gap) [11]. Closing the semantic gap is a challenging problem for content-based image retrieval systems [12]. Recent works apply knowledge graphs and scene graphs to semantic retrieval systems to reduce the semantic gap between low-level features and high-level image semantics [44-46]. Specifically, Justin Johnson et al. proposed a semantic image retrieval framework based on the concept of scene graphs [27]. Wang, S. et al. introduced an image retrieval model using scene graphs, comprising a visual scene graph and a textual scene graph [28]. Yoon, S. et al. introduced an approach to image retrieval based on scene graph similarity using graph neural networks [29]. Qi, M. et al.
proposed a framework for online multimodal scene retrieval based on binary representations and semantic graphs [30]. Quinn, M. H. et al. described a semantic image retrieval architecture based on visual situations in images [31]. Content- and semantics-based image retrieval that combines low-level features with high-level semantics through a knowledge graph for large image sets is therefore a suitable and pressing research direction, with the potential for effective application in similar image retrieval systems.

Objectives of the thesis

The objective of the thesis is to develop a similar image retrieval model based on the R-Tree structure and to propose a model that combines this structure with a representation of the semantic relationships between objects in images, in order to improve retrieval accuracy. The specific objectives are: (1) to study multidimensional data storage on the R-Tree structure and to combine a neighbor graph with the improved RS-Tree structure to improve storage and retrieval performance; (2) to propose the use of a knowledge graph to represent semantic information and the semantic relationships between objects in images; (3) to study image retrieval models based on the RS-Tree structure, on the combination of a neighbor graph with the RS-Tree, and on the combination of the RS-Tree with a knowledge graph.

Research objects and scope

Research objects: (1) data clustering structures and the creation of multidimensional feature vectors for image retrieval; (2) algorithms for building data storage structures and image retrieval algorithms; (3) knowledge graphs; (4) popular image datasets.

Research scope: (1) creating and improving multidimensional data storage structures based on the R-Tree; (2) tree construction algorithms and tree search algorithms; (3) methods for improving clustering with a graph of neighboring clusters; (4) knowledge graphs, the OWL language, and SPARQL queries; (5) the image datasets COREL, Oxford Flowers 17, Oxford Flowers 102, CUB-200-2011, Visual Genome, and MS-COCO.

Research methods

Theoretical method: synthesizing and analyzing publications on image retrieval based on the R-Tree structure and on semantic image retrieval; assessing the strengths and weaknesses of these works in order to propose suitable improvements.

Experimental method: building experiments for feature
extraction, data structure improvement, model proposal, and algorithm implementation on popular image datasets. The datasets used for implementation and experiments are COREL, Oxford Flowers 17, Oxford Flowers 102, CUB-200-2011, Visual Genome, and MS-COCO.

Thesis layout

The thesis comprises 137 pages: introduction (09 pages), conclusions and future work (02 pages), list of the author's publications related to the thesis (01 page), and references (09 pages); the body is divided into three chapters. Chapter 1 (31 pages) presents the theoretical basis of image retrieval and the R-Tree structure. Chapter 2 (39 pages) presents the RS-Tree data clustering structure and the content-based image retrieval model built on it. Chapter 3 (46 pages) proposes improving the RS-Tree by combining it with a neighbor graph and a knowledge graph to improve retrieval accuracy.

Contributions of the thesis

(1) Proposing an improvement of the R-Tree by designing the RS-Tree with data spheres and combining it with a neighbor graph to build the GraphNB-RST structure, which improves retrieval accuracy; also proposing algorithms and a content-based image retrieval model based on the constructed structure. (2) Building a knowledge graph from the Visual Genome image dataset and the RS-Tree to store and describe the semantic information of images and the semantic relationships between objects in images; from this, building a semantic image retrieval model that combines the RS-Tree with the knowledge graph to improve retrieval accuracy.

Chapter 1. OVERVIEW OF IMAGE RETRIEVAL, THE R-TREE STRUCTURE, AND KNOWLEDGE GRAPHS

1.1 Introduction

Content-based image retrieval is applied in many areas of life. Two important issues in image retrieval are (1) describing the visual content of images by low-level features and (2) creating a data structure that stores this visual content [16, 36]. In addition, many methods have been applied to semantic image retrieval to reduce the semantic gap between low-level features and high-level human semantics [48]. Combining different methods is therefore needed to improve retrieval accuracy.

1.2 Content-based image retrieval

Content-based image retrieval is a method that retrieves a set of similar images based on
the automatic extraction of low-level image features such as color, texture, shape, location, spatial layout, and other features. The system stores the low-level features of the image data as multidimensional feature vectors and compares these vectors using a similarity measure [42]. In this thesis, the following low-level feature extraction methods are combined: color features based on the MPEG-7 color system (25 features); location features from the Shi-Tomasi algorithm (25 features); shape features based on max pooling and Sobel edge detection (48 features); and texture features based on HOG and Sobel filtering (144 features). The extracted feature vector of each image therefore has 242 dimensions.

In addition, several semantic image retrieval methods have been proposed to reduce the semantic gap between the low-level content of images and high-level human semantics: (1) machine learning techniques that associate low-level features with image semantics; (2) image retrieval based on a knowledge graph that describes image semantics and the semantic relationships between objects in images.

1.3 The R-Tree structure for image retrieval

A survey of the R-Tree and its variants shows that these structures are used to store multidimensional data and are applied to image retrieval to improve search efficiency and speed. On this basis, the thesis proposes an image retrieval model based on the RS-Tree structure, described in Section 1.5. The RS-Tree is an improvement of the original R-Tree and its variants and is presented in Chapter 2.

1.4 Overview of knowledge graphs

A knowledge graph (KG) is a knowledge representation structure that underlies a range of applications [76]. Knowledge graphs are attracting growing interest because, as an abstract structure, they make it easier to manage data and concepts efficiently. A knowledge graph encodes the semantics of data as a graph and comprises: (1) knowledge (semantics): concepts and the relationships between concepts are the key elements, and they encode the knowledge that describes a real-world data domain; (2) graph (data): a data structure based on nodes and edges that allows data from heterogeneous sources, from unstructured to structured, to be integrated.

1.5 Scene graphs

The scene graph, proposed by Johnson in 2015 [28], is a data structure that represents
the semantic content of an image. A scene graph contains structured knowledge about the visual scene, including objects, object attributes, and the relationships between objects. As useful knowledge describing the detailed semantics of images and annotations, scene graphs are used in many tasks, including image captioning [82], image retrieval [83], visual question answering (VQA) [84], and image generation [85].

1.6 Image retrieval model

The thesis proposes a content- and semantics-based image retrieval model that combines the RS-Tree structure with a knowledge graph (Figure 1.5). The model has two phases: a preprocessing phase and a retrieval phase. The preprocessing phase (1) builds the data structure that stores and clusters the image data and (2) builds the knowledge graph that stores and describes the semantic relationships between objects in images. The retrieval phase performs content- and semantics-based image retrieval by combining the RS-Tree with the knowledge graph.

Figure 1.5 Image retrieval model combining the RS-Tree with a knowledge graph

1.7 Experimental organization and evaluation methods

To assess the effectiveness of the proposed models, the experimental organization and evaluation in the thesis cover the experimental environment, the image sets, and the performance evaluation measures.

1.8 Chapter summary

This chapter presented an overview of content- and semantics-based image retrieval based on the R-Tree and knowledge graphs. Image retrieval models following the R-Tree approach were proposed. In addition, the experimental methodology was presented, covering the experimental environment, the experimental datasets, and the evaluation measures.

Chapter 2. IMAGE RETRIEVAL BASED ON RS-TREE

2.1 Introduction

Building on the R-Tree and its variants as applied to image retrieval, the RS-Tree data clustering structure is proposed to store the low-level feature vectors of images. The RS-Tree is a balanced multi-branch tree whose nodes are clustered by a similarity measure using partitioning and hierarchical methods, which ensures a large storage capacity.

2.2 RS-Tree structure

The RS-Tree is a balanced multi-branch tree used for similar image retrieval. It partitions the data space and consists of a root node, a set of internal nodes, and a set of leaf nodes. Let image $I$ have the feature vector $\vec{f}_I = (v_{I1}, v_{I2}, v_{I3}, \ldots, v_{Id})$, where $v_{Ii}$ is a low-level feature of image $I$ with $i = 1, \ldots, d$ and
$v_{Ii} \in [0, 1]$. The sphere (MBS) of an entity $spED$ is the sphere that contains the object $\vec{f}_I$, with center $\vec{c}_{sp}$ and radius $r_{sp}$ defined as follows:

1) The center of the entity sphere:

$\vec{c}_{sp} = (c_{I1}, c_{I2}, c_{I3}, \ldots, c_{Id})$ (2.1)

with $c_{Ij} = \max(0, v_{Ij} - a_{Ij})$, $j = 1, \ldots, d$, where $a_{I1} = v_{I1}/k$, $a_{I2} = v_{I2}/k$, …, $a_{Id} = v_{Id}/k$, and $k$ is a parameter with $k \geq$ …

2) The radius of the entity sphere:

$r_{sp} = \sqrt{\frac{1}{d} \sum_{j=1}^{d} (c_{Ij} - v_{Ij})^2}$ (2.2)

The sphere (MBS) of a leaf node $S_L$ is the minimal sphere that covers all entity spheres stored in the leaf, with center $\vec{c}_L$ and radius $r_L$ defined as follows:

1) The center of the sphere of the leaf node $S_L$:

$\vec{c}_L = \frac{1}{k} \sum_{i=1}^{k} sp_i.\vec{c}_i$ (2.3)

where $sp_1, sp_2, sp_3, \ldots, sp_k$ are the entity spheres inside the leaf $S_L$ and $sp_i.\vec{c}_i$ is the center of the sphere $sp_i$, with $1 \leq i \leq k$.

2) The radius of the sphere of
the leaf $S_L$:

$r_L = \max_{i=1,\ldots,k} \{ d_{Eu}(\vec{c}_L, sp_i.\vec{c}_i) + sp_i.r_i \}$ (2.4)

where $d_{Eu}(\vec{c}_L, sp_i.\vec{c}_i)$ is the Euclidean distance from the center of the leaf $S_L$ to the center of the $i$-th entity sphere, and $sp_i.r_i$ is the radius of the $i$-th entity sphere.

The sphere (MBS) of an internal node $S_N$ is the minimal sphere that covers all spheres of the nodes in its subtree, with center $\vec{c}_n = (c_1, c_2, c_3, \ldots, c_d)$ and radius $r_n$ defined as follows:

1) The center of the sphere of the node $S_N$:

$c_i = \frac{\sum_{j=1}^{k} S_j.\vec{c}_j.x_i \times S_j.w}{\sum_{j=1}^{k} S_j.w}$, $i = 1, \ldots, d$ (2.5)

where $S_1, S_2, \ldots, S_k$ are the child nodes of the node $S_N$; $d$ is the dimensionality of the feature vectors; $S_j.\vec{c}_j.x_i$ is the $i$-th component of the center vector $\vec{c}_j$; and $S_j.w$ is the number of elements of the node $S_j$.

2) The radius of the sphere of the node $S_N$:

$r_n = \max_{j=1,\ldots,k} \{ d_{Eu}(\vec{c}_n, S_j.\vec{c}_j) + S_j.r_j \}$ (2.6)

2.3 The principles of the operations on RS-Tree

To store image data that grows over time and to improve the performance of image retrieval systems, the RS-Tree structure must satisfy two requirements: (1) it can grow in height; (2) it can partition and cluster similar data. Accordingly, adding an element $spED$ to the RS-Tree starts from the root node and follows these principles:

Principle 1: If $root = Null$. Create the root as a leaf $S_{Lr}$, put the sphere $spED$ into $S_{Lr}$, and update the center and radius of the sphere.

Principle 2: If $root \neq Null$ and $root$ is a leaf. Compute $dist = d_{Eu}(spED.\vec{c}_{sp}, S_{Lr}.\vec{c}_{rl}) + spED.r_{sp}$.
• If $dist \leq \theta$:
+ If $root.count < M$: put $spED$ into $S_{Lr}$ and update the center and radius of the sphere $S_{Lr}$.
+ If $root.count = M$: split the node.
• If $dist > \theta$: create a new leaf $S_{Lnew}$ to store $spED$, and create a new root $S_{Nr}$ linked to $S_{Lr}$ and $S_{Lnew}$. Update the centers and radii of the spheres from the current leaf up to the root.

Principle 3: If $root \neq Null$ and $root$ is not a leaf. Follow the path from the current node to the nearest
nodes along the appropriate branch until reaching the current leaf, denoted $S_{Lcrt}$. Compute $dist = d_{Eu}(spED.\vec{c}_{sp}, S_{Lcrt}.\vec{c}_{lcrt}) + spED.r_{sp}$.
• If $dist \leq \theta$:
+ If $S_{Lcrt}.count < M$: put $spED$ into the current leaf node, update the center and radius of the current leaf node, and then propagate the update back to the root node.
+ If $S_{Lcrt}.count = M$: split the node.
• If $dist > \theta$: create a new leaf $S_{Lnew}$ to store the sphere $spED$, and update the centers and radii of the spheres from the current leaf up to the root.

2.4 Image retrieval model based on RS-Tree

The content-based image retrieval model based on the RS-Tree is illustrated in Figure 2.11.

Figure 2.11 The image retrieval model CBIR-RST based on RS-Tree

The image retrieval process is performed in two phases: the first phase clusters the image data and stores it in the RS-Tree; the second phase retrieves the set of similar images.

2.5 Experimenting the image retrieval system CBIR-RST

Table 2.7 Comparison of the precision of methods on COREL set
Method                       MAP (%)
Bibi, R., 2020 [94]          65.29
Ahmed, K. T., 2019 [21]      72.10
Chhabra, P., 2020 [95]       77.11
N.T.U. Nhi [96]              67.76
CBIR-RST                     79.45

Table 2.8 Comparison of the precision of methods on OF17 set
Method                       MAP (%)
Ahmed, 2019 [21]             77.10
S. Gao, 2014 [97]            73.43
CBIR-RST                     78.69

Table 2.9 Comparison of the precision of methods on OF102 set
Method                       MAP (%)
Yang, J., 2017 [98]          73.20
Unar, S., 2019 [99]          71.00
Ahmed, K. T., 2019 [20]      71.40
CBIR-RST                     73.16

Table 2.10 Comparison of the precision of methods on CUB set
Method                       MAP (%)
Wei, X. S., 2016 [100]       65.80
Wang, Z., 2018 [101]         66.57
Zeng, H., 2019 [26]          70.10
CBIR-RST                     68.17

Figure 2.20 Precision-Recall and ROC of OF102 set (1-51)
Figure 2.21 Precision-Recall and ROC of OF102 set (52-102)
Figure 2.22 Precision-Recall and ROC of CUB-200-2011 set (1-100)
Figure 2.23 Precision-Recall and ROC of CUB-200-2011 set (101-200)

2.6 Chapter summary

In Chapter 2, the RS-Tree structure is improved and applied to content-based similar image retrieval. Experimental results
in this chapter, obtained on several image datasets, have demonstrated the effectiveness of the RS-Tree. However, node splitting occurs frequently during tree construction, so some similar elements end up in different leaves, which reduces retrieval performance. Chapter 3 therefore improves the RS-Tree to raise the accuracy of the retrieval system.

Chapter 3. COMBINING RS-TREE WITH KNOWLEDGE GRAPH IN IMAGE RETRIEVAL

3.1 Introduction

This chapter proposes image retrieval methods on the RS-Tree that improve retrieval accuracy: (1) combining the RS-Tree with a neighbor cluster graph to improve content-based image search; (2) combining the RS-Tree with a knowledge graph to improve semantic image retrieval.

3.2 Combining RS-Tree with Neighbor graph

The RS-Tree structure is built through node splitting during tree creation. When a leaf overflows, the system splits it, and the splitting propagates from leaf to root. If the root node overflows, splitting it increases the height of the tree by one level. Frequent node splitting reduces the quality of clustering because some elements may be assigned to the wrong leaf cluster. To overcome this shortcoming, a structure that combines the RS-Tree with a neighbor graph is created to improve retrieval accuracy.

Given two leaf spheres $S_{L1}(\vec{c}_1, r_1)$ and $S_{L2}(\vec{c}_2, r_2)$ with centers $\vec{c}_1$, $\vec{c}_2$ and radii $r_1$, $r_2$ respectively, $S_{L1}$ and $S_{L2}$ are said to overlap spatially, denoted $Overlap(S_{L1}, S_{L2})$, when:

$d_{Eu}(S_{L1}.\vec{c}_1, S_{L2}.\vec{c}_2) < S_{L1}.r_1 + S_{L2}.r_2$ (3.1)

where $d_{Eu}$ is the Euclidean distance. If $S_{L1}$ and $S_{L2}$ do not overlap spatially, the distance between the two spheres is $dist_{eps} = d_{Eu}(S_{L1}.\vec{c}_1, S_{L2}.\vec{c}_2) - (S_{L1}.r_1 + S_{L2}.r_2)$.

Given any leaf $S_{Lk}$, let $O = \{spED_i,\ i = 1, \ldots, M\}$ (3.2) be the data elements stored in the
$S_{Lk}$, where $M$ is the number of elements of $S_{Lk}$. Suppose $label_\alpha$ is any class label in the set of class labels of the experimental dataset. The class of $S_{Lk}$ is defined as follows:

$class(S_{Lk}) = \langle label_\alpha \mid \max\{count(spED_i)\},\ spED_i.label = label_\alpha \rangle$ (3.3)

Let $C_{SL}$ be the set of leaves of the tree, $S_{Lk}$ an arbitrary leaf with $S_{Lk} \in C_{SL}$, $\varepsilon \in (0, 1)$ a given threshold, $d_{Eu}$ the Euclidean distance function, and $class(S_{Lk})$ the classification function of the leaf $S_{Lk}$.

Definition 3.1 (Types of neighbors)
(1) The overlap neighbors of leaf $S_{Lk}$ are the set:
$N_o(S_{Lk}) = \{S_L \in C_{SL} \setminus \{S_{Lk}\} \mid Overlap(S_{Lk}, S_L)\}$ (3.4)
(2) The epsilon neighbors of leaf $S_{Lk}$ are the set:
$N_e(S_{Lk}) = \{S_L \in C_{SL} \setminus \{S_{Lk}\} \mid dist_{eps}(S_{Lk}, S_L) \leq \varepsilon\}$ (3.5)
(3) The class neighbors of leaf $S_{Lk}$ are the set:
$N_c(S_{Lk}) = \{S_L \in C_{SL} \mid class(S_{Lk}) = class(S_L)\}$ (3.6)
(4) The neighbors of leaf $S_{Lk}$, denoted $N_n(S_{Lk})$, are the union of the overlap, epsilon, and class neighbors:
$N_n(S_{Lk}) = N_o(S_{Lk}) \cup N_e(S_{Lk}) \cup N_c(S_{Lk})$ (3.7)

On that basis, a neighbor graph of each leaf is created during tree building to improve retrieval precision. It is described as follows:

Definition 3.2 (Neighbor graph) The neighbor graph of a leaf $S_{Lk}$ is the graph $NBGraph(S_{Lk}) = (V, E)$, where
$V = \{S_{Lk}\} \cup N_n(S_{Lk})$, $E = \{(v, v_i) \mid v = S_{Lk},\ v_i \in V \setminus \{S_{Lk}\}\}$ (3.8)

3.3 Knowledge graph on Visual Genome

This section presents the construction of a knowledge graph that describes the semantics of images. The knowledge graph is built from the Visual Genome dataset, which focuses on objects, attributes, and the relationships between them. Detecting relationships and attributes is a major part of fully understanding the visual context of an image, and this dataset is one of the first to provide detailed labels for object relationships and attributes. The dataset includes the following components: region descriptions, objects, attributes, relationships, region graphs, and scene graphs. The most frequent classes, objects,
attributes, and relationships between them are extracted. From these components, a knowledge graph is generated using OWL. Figure 3.8 illustrates the process of building the knowledge graph.

Figure 3.8 The process of building knowledge graph

A knowledge graph is a graph $G = \langle V, A, E \rangle$, where $V = \{V_1, V_2, \ldots, V_n\}$ is the set of vertices of the graph, each $V_i$ being a label, a concept, or an instance; $A = \{a_1, a_2, \ldots, a_n\}$ is the set of attributes, each $a_i$ being an attribute of a concept or instance; and $E = \{E_1, E_2, \ldots, E_n\}$ is the set of edges of the graph, each $E_i$ being a relationship between concepts and instances, between concepts and attributes, or between instances and attributes.

Figure 3.12 Model of knowledge graph

The components of the knowledge graph include:
(1) node types: class (Classes), individual (inClass, OBJ, IMG);
(2) relationships: object properties (opOBJinv, opIMGinv, opIMGobj), relationships between objects (ON, IN, OF, WEAR, RIDE, …);
(3) data properties: inClass (dpCLAName, dpCLAWordNet), OBJ (dpOBJObjectID, dpOBJWordNet, dpOBJAtribute, dpOBJWidth, dpOBJHeight, dpOBJSynsetID, dpOBJName), IMG (dpIMGImageID, dpIMGURL, dpIMGLocation, dpIMGDataset, dpIMGWidth, dpIMGHeight);
(4) annotations: relationships (anoRELSynsetID, anoRELPredicate, anoRELRelationID, anoRELWordNet, anoRELDescription).

3.4 Image retrieval model on RS-Tree and knowledge graph

Figure 3.22 Semantic-based image retrieval model SBIR-RSTKG

Figure 3.22 describes a semantic-based image retrieval model that uses NBGraphRST and a knowledge graph, named SBIR-RSTKG. The model consists of two phases: (1) a knowledge graph describing image semantics is built from the VG dataset, and the feature vectors of the image dataset are extracted and stored on NBGraphRST; (2) the low-level features of the query image are extracted, and a set of content-based similar images is retrieved on the NBGraphRST structure. Then, images
with the highest similarity are selected from this set. Finally, scene graphs of the images are extracted to create SPARQL queries that retrieve similar images and extract semantics from the knowledge graph.

3.5 Experimenting the image retrieval system SBIR-RSTKG

Figure 3.33 Precision-Recall and ROC of COREL set
Figure 3.34 Precision-Recall and ROC of OF17 set
Figure 3.35 Precision-Recall and ROC of OF102 set
Figure 3.36 Precision-Recall and ROC of CUB-200-2011 set
Figure 3.37 Precision-Recall and ROC of MS-COCO set
Figure 3.38 Precision-Recall and ROC of Dataset 1-VG
Figure 3.39 Precision-Recall and ROC of Dataset 2-VG

Experimental results on the knowledge graph are presented in Figures 3.40-3.42.

Figure 3.40 Precision-Recall and ROC of MS-COCO set
Figure 3.41 Precision-Recall and ROC of Dataset -VG
Figure 3.42 Precision-Recall and ROC of Dataset -VG

Table 3.11 Comparison of the precision of methods on COREL set
Method                            MAP (%)
Ahmed, K. T., 2019 [21]           72.10
Chhabra, P., 2020 [95]            77.11
N.T.U. Nhi, 2021 [96]             67.76
CBIR-RST                          79.45
CBIR_NBGraphRST                   88.98

Table 3.12 Comparison of the precision of methods on OF17 set
Method                            MAP (%)
Ahmed, 2019 [21]                  77.10
S. Gao, 2014 [97]                 73.43
Gonçalves, F.M.F., 2018 [14]      85.39
CBIR-RST                          78.69
CBIR_NBGraphRST                   89.76

Table 3.13 Comparison of the precision of methods on OF102 set
Method                            MAP (%)
Gonçalves, F.M.F., 2018 [14]      73.31
Ahmed, K. T., 2019 [21]           71.40
Lin, H., 2021 [111]               77.93
CBIR-RST                          73.16
CBIR_NBGraphRST                   80.52

Table 3.14 Comparison of the precision of methods on CUB-200 set
Method                            MAP (%)
Wei, X. S., 2016 [100]            65.80
Wang, Z., 2018 [101]              66.57
Zeng, H., 2019 [26]               70.10
CBIR-RST                          68.17
CBIR_NBGraphRST                   79.86

Table 3.15 Comparison of the precision of methods on MS-COCO set
Method                            MAP (%)
Cao, Y., 2018 [112]               70.13
Cao, Z., 2017 [113]               73.62
Yan, C., 2020 [114]               80.62
CBIR_NBGraphRST                   75.49
SBIR-RSTKG                        81.19

3.6 Chapter summary

In this chapter, a content- and semantics-based image retrieval model is proposed using the RS-Tree structure combined with the neighbor
graph and the knowledge graph. The chapter makes two main contributions: (1) building a content- and semantics-based image retrieval model; (2) constructing a knowledge graph that stores and describes the semantics of multi-object images using the Visual Genome dataset. Experimental results show the effectiveness of the proposed method.

CONCLUSIONS AND DEVELOPMENT ORIENTATION

The thesis addresses image retrieval models based on the RS-Tree structure. The main contributions are: (1) improving tree construction based on the R-Tree, and further improving the RS-Tree structure by combining it with neighbor graphs to raise the accuracy of image retrieval; (2) proposing a knowledge graph model that describes the semantics between objects in an image, applied to semantic retrieval to improve the accuracy of image search. Experimental results show that the proposed methods improve the accuracy of image retrieval.

First, we build the RS-Tree structure for storing and clustering image data for similar image retrieval. The improvements in the RS-Tree structure are: (1) the tree is built from feature vectors transformed into spherical form, which simplifies storage and reduces computation time; (2) a threshold θ acts as a filter when clustering similar images, which improves clustering and reduces spatial overlap; (3) the node splitting algorithm is improved based on the bias-variance trade-off to make node splitting more effective. On this basis, we constructed a content-based image retrieval model on the RS-Tree. Experimental results demonstrate the effectiveness of the RS-Tree for image retrieval.

Second, we propose a model that combines the RS-Tree with the neighbor graph to improve the accuracy of image retrieval. In this model, a neighbor graph is created at the leaf level during the construction of the RS-Tree. The image retrieval process is
performed as follows: given the input image, the system finds the appropriate leaf in the RS-Tree; the search then continues on the neighbor graph to find the set of leaves belonging to the same neighbor cluster. Performing these two search steps improved the accuracy of image search in the experiments.

Finally, we propose a semantic-based image retrieval model that combines the RS-Tree with a knowledge graph. In this model, image search proceeds in two stages: the system first performs content-based image retrieval using the RS-Tree combined with the neighbor graph; then, from the set of similar images, it searches by semantics on the knowledge graph to improve the accuracy of image retrieval.

The thesis conducted experiments and evaluations on single-object and multi-object image datasets. The single-object datasets are COREL, Oxford Flowers 17, Oxford Flowers 102, and CUB-200-2011; the multi-object datasets are Visual Genome and MS-COCO. The experimental results show that the proposed improvement of the RS-Tree increases accuracy. Image search based on the combination of the RS-Tree and the knowledge graph yielded better results and also extracted the semantic content of input images to enhance search quality. The results of the image search models were also compared with recent works on each image dataset, showing that the proposed methods are sound and improve retrieval accuracy, which meets the objectives of the thesis.

Based on these theoretical and experimental foundations, future research plans include: (1) investigating methods for automatically generating scene graphs for input images using R-CNN, GCN, and knowledge graphs; (2) enriching the knowledge graph through vertex label prediction and the prediction of relationships between vertices using GCN and inference methods on the knowledge graph; (3) implementing image captioning based on
weighted knowledge graphs and optimal path-finding algorithms on the knowledge graph; (4) developing programs for practical applications in fields such as diagnosing plant diseases from images, diagnosing diseases from medical images, and providing health care advice based on images of daily food intake.

REFERENCES

A1. Lê Thị Vĩnh Thanh, Phan Thị Ngọc Mai, Văn Thế Thành, Lê Mạnh Thạnh (2020), “Tìm kiếm ảnh theo ngữ nghĩa dựa phương pháp gom cụm ontology”, Kỷ yếu Hội thảo Quốc gia Nghiên cứu ứng dụng CNTT (FAIR), ĐH Nha Trang, Nhà xuất Khoa học Tự nhiên Công nghệ, ISBN: 978-604-9985-77-5, pp. 612-622.
A2. Lê Thị Vĩnh Thanh, Văn Thế Thành, Lê Mạnh Thạnh (2021), “Một phương pháp tìm kiếm ảnh hiệu dựa cấu trúc R-Tree”, Kỷ yếu Hội thảo Quốc gia Công nghệ thông tin ứng dụng lĩnh vực (CITA2021), Đại học Đà Nẵng, Nhà xuất Đà Nẵng, ISBN: 978-60484-5998-7, pp. 259-271.
A3. Lê, M. T., Lê, T. V. T., Lương, T. T. X., Nguyen, T. D., & Văn, T. T. (2022), “Một mô hình tìm kiếm ảnh dựa cấu trúc R-Tree kết hợp KDTree Random Forest”, Các công trình nghiên cứu, phát triển ứng dụng Công nghệ Thông tin Truyền thông, ISSN: 1859-3526, pp. 29-41.
A4. Le Thi Vinh Thanh, Van The Thanh, Le Manh Thanh (2022), “An improvement of R-Tree for content-based image retrieval”, Annales Univ. Sci. Budapest., Sect. Comp., Vol. 53, pp. 29-55.
A5. Thanh, L. T. V., & Thanh, L. M. (2022), “Semantic-Based Image Retrieval using RS-Tree and Neighbor Graph”, in World Conference on Information Systems and Technologies, Springer, Cham, pp. 165-176.
A6. Thanh, L.T.V., Van, T.T., Le, T.M. (2022), “Semantic-Based Image Retrieval Using RS-Tree and Knowledge Graph”, in: Nguyen, N.T., Tran, T.K., Tukayev, U., Hong, T.P., Trawiński, B., Szczerbicki, E. (eds), Intelligent Information and Database Systems, ACIIDS 2022, Lecture Notes in Computer Science, vol. 13757, Springer, Cham, pp. 481-495.
A7. Lê Thị Vĩnh Thanh, Văn Thế Thành, Lê Mạnh Thạnh (2022), “Tìm kiếm ảnh theo ngữ nghĩa dựa cấu trúc iRS-Tree ontology”, Hue University Journal of Science: Techniques and Technology, T …, S …
A8. Lê Thị Vĩnh Thanh, Văn Thế Thành, Lê Mạnh Thạnh (2022), “Một khảo sát cấu trúc R-tree cho toán tìm kiếm ảnh”, Tạp chí khoa học công nghệ, Trường Đại học Khoa học, ĐH Huế, Tập …, Số …
A9. Lê Thị Vĩnh Thanh, Văn Thế Thành (2022), “Tìm kiếm ảnh dựa đồ thị láng giềng đồ thị ngữ nghĩa”, Kỷ yếu Hội thảo Quốc gia Nghiên cứu ứng dụng CNTT (FAIR), Học Viện Kỹ Thuật Mật Mã, Nhà xuất Khoa học Tự nhiên Công nghệ, ISBN: 978-604-357-119-6, pp. 400-412.
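
The sphere definitions of Section 2.2, the insertion test of Section 2.3, and the overlap test of Section 3.2 can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names, the tuple representation `(center, radius)`, and the default `k = 2.0` are assumptions made for illustration only. The entity radius is taken as the root of the mean squared difference between center and feature vector, which is one reading of Equation (2.2).

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def entity_sphere(v, k=2.0):
    """Entity sphere for a feature vector v with components in [0, 1].

    Center: c_j = max(0, v_j - v_j / k), as in Eq. (2.1).
    Radius: sqrt(mean((c_j - v_j)^2)), an assumed reading of Eq. (2.2).
    The default k is a hypothetical value, not taken from the thesis.
    """
    d = len(v)
    center = [max(0.0, x - x / k) for x in v]
    radius = math.sqrt(sum((c - x) ** 2 for c, x in zip(center, v)) / d)
    return center, radius


def leaf_sphere(spheres):
    """Leaf sphere covering a list of (center, radius) entity spheres.

    Center: mean of the entity centers, Eq. (2.3).
    Radius: max over entities of distance to the leaf center plus the
    entity radius, Eq. (2.4).
    """
    k = len(spheres)
    d = len(spheres[0][0])
    center = [sum(c[j] for c, _ in spheres) / k for j in range(d)]
    radius = max(euclid(center, c) + r for c, r in spheres)
    return center, radius


def fits_leaf(entity, leaf, theta):
    """Insertion test of Principles 2 and 3: dist <= theta, where
    dist = d(entity center, leaf center) + entity radius."""
    return euclid(entity[0], leaf[0]) + entity[1] <= theta


def overlap(a, b):
    """Spatial overlap of two spheres, Eq. (3.1) (strict inequality)."""
    return euclid(a[0], b[0]) < a[1] + b[1]


def dist_eps(a, b):
    """Gap between two non-overlapping spheres, used for epsilon neighbors."""
    return euclid(a[0], b[0]) - (a[1] + b[1])


if __name__ == "__main__":
    e1 = entity_sphere([0.2, 0.4])
    e2 = entity_sphere([0.3, 0.5])
    leaf = leaf_sphere([e1, e2])
    print("leaf center:", leaf[0], "leaf radius:", leaf[1])
    print("e1 fits leaf with theta=0.5:", fits_leaf(e1, leaf, theta=0.5))
```

Representing every node as a `(center, radius)` pair keeps the leaf radius, the insertion test, and the overlap test down to one distance computation each, which is the property the spherical form is meant to provide.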

Date posted: 17/05/2023, 12:14