Doctoral dissertation: Some entity search techniques based on implicit semantic relations and context-aware query suggestion

MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY

Trần Lâm Quân

SOME ENTITY SEARCH TECHNIQUES BASED ON IMPLICIT SEMANTIC RELATIONS AND CONTEXT-AWARE QUERY SUGGESTION

DOCTORAL DISSERTATION IN MATHEMATICS
Major: Mathematical Foundations for Informatics
Code: 9.46.01.10
SCIENTIFIC SUPERVISOR: Dr. Vũ Tất Thắng
Hanoi, 2020

DECLARATION

I declare that this is my own research work, completed under the supervision of Dr. Vũ Tất Thắng. The results presented in the dissertation are truthful and have not been published in any other work. I take full responsibility for this declaration.

Hanoi, December 2020
Author: Trần Lâm Quân

ACKNOWLEDGMENTS

This dissertation was completed through the author's continuous effort and with the help of the supervisor, family, friends, and colleagues.

First, the author expresses deep gratitude to Dr. Vũ Tất Thắng, who devotedly guided the author to complete this dissertation and who was especially patient in orienting the doctoral candidate throughout the research.

The author thanks the teachers and staff of the Institute of Information Technology and the Graduate University of Science and Technology (Vietnam Academy of Science and Technology) for their enthusiastic help, for creating a good research environment in which to complete this work, and for the precise comments that led to the publications the author has today.

The author thanks the Board of Directors of Vietnam Airlines Corporation, the Research and Application Center, and colleagues at the author's workplace for supporting the completion of the dissertation.

Finally, I thank all the members of my family and my friends, who always supported, shared with, and encouraged me in my study and research.

Hanoi, December 2020
Trần Lâm Quân

TABLE OF CONTENTS

Title page
Declaration
Acknowledgments
Table of contents
List of symbols and abbreviations
List of tables
List of figures and charts
INTRODUCTION
CHAPTER 1: OVERVIEW
1.1 The problem of entity search based on implicit semantic relations
1.2 Related work on entity search based on implicit semantics
1.2.1 Structure Mapping Theory (SMT)
1.2.2 Vector Space Model (VSM)
1.2.3 Latent Relational Analysis (LRA)
1.2.4 Latent Relation Mapping Engine (LRME)
1.2.5 Latent Semantic Relation (LSR)
1.2.6 WordNet-based relational similarity
1.2.7 The Word2Vec model for learning vector representations of words
1.3 The entity search method based on implicit semantic relations in relation to related work
1.4 The context-aware query suggestion problem
1.5 Related work on query suggestion
1.5.1 Session-based query suggestion techniques
1.5.2 Cluster-based query suggestion techniques
1.6 The context-aware query suggestion method in relation to related work
1.7 Results achieved in the dissertation
CHAPTER 2: ENTITY SEARCH BASED ON IMPLICIT SEMANTIC RELATIONS
2.1 The problem
2.2 The entity search method based on implicit semantic relations
2.2.1 Architecture and model
2.2.2 Semantic relation extraction component
2.2.3 Semantic relation clustering component
2.2.4 Component for computing relational similarity between entity pairs
2.3 Experimental results and evaluation
2.3.1 Dataset
2.3.2 Testing and parameter tuning
2.3.3 Evaluation with the MRR measure
2.3.4 Experimental system
2.4 Chapter conclusion
CHAPTER 3: CONTEXT-AWARE QUERY SUGGESTION
3.1 The problem
3.2 The context-aware method
3.2.1 Definitions and terminology
3.2.2 Motivation and illustrative examples
3.2.3 Architecture and model
3.2.4 Offline phase
3.2.5 Online phase: the query suggestion algorithm
3.2.6 Analysis of strengths and weaknesses
3.2.7 Proposed technical improvements
3.2.8 Classifying search results with concept lattices
3.3 Experimental results and evaluation
3.3.1 Dataset
3.3.2 Evaluation and comparison
3.3.3 Experimental system
3.4 Chapter conclusion
CHAPTER 4: CONCLUSIONS AND RECOMMENDATIONS
4.1 Conclusions
4.2 Recommendations
LIST OF THE AUTHOR'S PUBLICATIONS
REFERENCES

LIST OF SYMBOLS AND ABBREVIATIONS

(Abbreviation: English name; meaning)
CBOW: Continuous Bag-Of-Words; continuous bag-of-words model used in Word2Vec
C: Cluster, Clustering; cluster, clustering
CL: Concept Lattice; concept lattice
Dataset: Data-set; sample data set
FCA: Formal Concept Analysis; formal concept analysis
Fe: Filter-Entities; function that filters candidate entity pairs
FC: Formal Context; formal context
IRES: Implicit Relational Entity Search; entity search based on implicit relations
IR: Information Retrieval; information retrieval
IRS: Implicit Relational Search; search based on implicit relations
LM: Language Model; language model
LRME: Latent Relation Mapping Engine; latent relation mapping
LRA: Latent Relational Analysis; latent relational analysis
LSR: Latent Semantic Relation; latent semantic relation
MRR: Mean Reciprocal Rank; mean of the RR (Reciprocal Rank) over the query set
NE: Named Entity; named entity
PMI: Pointwise Mutual Information; pointwise mutual information measure
q: Query; query
QLogs: Query Log; set of past queries
Q-suggest: Query suggestion; query suggestion, query recommendation
Re: Rank-Entities; function that ranks the entities in the candidate set
RelSim: Relational Similarity; relational similarity
RR: Reciprocal Rank; reciprocal of the rank at which the relevant item appears (for a query)
SE: Search Engine; search engine
SL: Semantic relation; semantic relation
Session: Session; search session
SR: Similarity relation; similarity relation
SMT: Structure Mapping Theory; structure mapping theory
term: Term(s); word, phrase, technical term
mining: Text mining; text data mining
VS: Voice search; voice search
VSM: Vector Space Model; vector space model
Word2Vec: Word to Vector; model that learns vector representations of words

LIST OF TABLES

Table 1.1: Finding the correlation between terms in the lists
Table 1.2: Results of the term correlation between the lists
Table 2.1: Results of the semantic relation extraction algorithm
Table 2.2: NER classes (Location, Organization, Personal, Time)
Table 2.3: Examples of experimental results with input q = {A, B, C} and
output D
Table 3.1: Reduced structure of a search session
Table 3.2: Context table
Table 3.3: Comparison of context-aware search and Lucene-Nutch

LIST OF FIGURES AND CHARTS

Figure 1.1: Result list returned by a keyword search engine (Keyword-SE) for the query: "Việt Nam", "Hà Nội", "Pháp" (France)
Figure 1.2: Result lists returned by Keyword-SE for q1, q2
Figure 1.3: Input: "Cuba", "José Marti", "Ấn Độ" (India) (implicit semantics: "national hero")
Figure 1.4: Semantic-relation-based search with an input query made up of entities
Figure 1.5: SMT structure mapping
Figure 1.6: Relation between the target word and its context in the Word2Vec model
Figure 1.7: Word2Vec "learns" the "implicit" relation between the target word and its context words
Figure 1.8: QFG using weights
Figure 1.9: Clustering methods
Figure 1.10: Core, Border, and Noise objects in DBSCAN clustering
Figure 1.11: Directly density-reachable and density-reachable
Figure 1.12: Query suggestion with traditional techniques
Figure 1.13: Query context
Figure 1.14: Illustration for the query "tiger"
Figure 2.1: Semantic-relation-based search with an input made up of entities
Figure 2.2: General architecture of the entity search model based on implicit semantic relations
Figure 2.3: F-score values for varying values of α, θ1
Figure 2.4: Comparison of PMI with f: frequency (number of co-occurrences), based on MRR
Figure 2.5: IRS experiment with the entity label B-PER
Figure 2.6: IRS experiment with time-type entities
Figure 3.1: Query context
Figure 3.2: Illustrative example with the query "gladiator"
Figure 3.3: Illustrative example with the query "tiger"
Figure 3.4: Approach model for context-aware query suggestion
Figure 3.5: Bipartite graph (vertex set Q, vertex set U)
Figure 3.6: Using an array data structure for clustering
Figure 3.7: Visual simulation of the suffix tree construction process
Figure 3.8: Online phase: the query suggestion process

The general-purpose search engines mentioned above collect data from the Internet, a huge repository of multilingual, multi-domain, multi-structure, multi-format data. Classifying results after the search is an online technique. Because of the time constraint (results must be returned to the user instantly), classifying documents is nearly infeasible in a general-purpose search engine. Another considerable obstacle that online classification must overcome is labeling, that is, giving a title to each topic. The title must describe the semantics sufficiently and be easy to understand so that the user can choose.

The search engine in this dissertation performs specialized search on a specific data domain (aviation operational data, which is "absent" from the Internet) where the number of documents is known in advance. The lattice-building algorithm is run offline and combined with a manual review of the label set (adding a human processing step), which suits the technique of classifying results before the search.

3.4 Chapter conclusion

From a theoretical point of view, Chapter 3 explicitly presented the context-aware method (ideas, principles, model, formulas, algorithms) and put forward proposals for technical improvement. From an experimental point of view, the implementation (variables, data structures, algorithms, real-time response of query suggestion) proved entirely feasible. The experimental results deliver three forms of suggestion: query suggestion, document suggestion, and topic suggestion.

The main contributions of the chapter are:
1) Applying the context-aware technique to build a specialized search engine over a private knowledge-base domain (aviation data).
2) Proposing combined similarity measures for the context-aware query suggestion problem in order to improve suggestion quality.

In addition, the chapter makes supplementary experimental contributions:
i) Integrating Vietnamese speech recognition and synthesis into the search engine as an option, forming a search system with voice interaction.
ii) Applying the concept lattice structure to classify the returned result set.

Context-aware query suggestion is only one branch of the search engine problem, yet it is a practical issue that attracts research interest and is clearly a hard problem. Mastering its principles and implementing the context-aware method efficiently is a good way to support users in the information search process. A Vietnamese search engine that applies the context-aware method promises breakthrough, interesting, and effective results in the field of query suggestion.

Knowledge discovery continues to raise many issues, because query logs still contain much latent knowledge. For example, {IP, query} data reflects the user's history and can be mined for personalized search or personalized query suggestion. The {URL, title} pairs can be mined to find related results. The bipartite graph can be mined to find document-query relationships even when the document set (vertex set U) and the query set (vertex set Q) share no terms: if a document set D' is frequently clicked and read for a query set Q', then the terms in Q' are strongly related to the terms in D'. Likewise, query suggestion and result-set classification are essentially two separate processes, so parallel computation should be studied and applied.

CHAPTER 4: CONCLUSIONS AND RECOMMENDATIONS

In this concluding part, the author summarizes the main results and contributions of the dissertation. The author also presents some limitations of the dissertation and discusses directions for future research.

4.1 Conclusions

The work applied Formal Concept Analysis (FCA) and the concept lattice structure to mine and search text data. The lattice is a mathematically elegant structure that suits mining, analyzing, and clustering data, but it is not entirely suited to the search domain. The dissertation therefore focuses on two main research directions: i) entity search based on semantic relations, which simulates the ability to infer unknown information or knowledge by analogical reasoning, a "natural" human ability; ii) context-aware query suggestion, which considers a continuous chain of queries in order to capture the search intent and then offers the trends that the knowledge of the crowd most often asks about after the current query.

The contributions of the dissertation are as follows.

With the method of entity search based on implicit semantic relations, addressing the first problem:
- The dissertation studies and builds an entity search technique based on implicit semantic relations that uses clustering to improve search efficiency.

With the method of context-aware query suggestion, addressing the second problem:
- Applying the context-aware technique to build a specialized search engine over a private knowledge-base domain (aviation data).
- Proposing combined similarity measures for the context-aware query suggestion problem in order to improve suggestion quality.

4.2 Recommendations

For the research direction on entity search based on implicit semantic relations, the search model is rigidly tied to its input entities, which is a weakness. To overcome it, on the one hand, further types of relation mappings and a time factor should be considered so that the search results are up to date and accurate. On the other hand, entity search can be extended to input queries that contain only one entity, for example: "The longest river in China?". The entity search model based on implicit semantics gives the exact answer "Trường Giang" even though the corpus only contains the source sentence "Trường Giang is the largest river in China".

For the research direction on query suggestion based on the context-aware technique, on the one hand, the research still has some shortcomings and even defects, such as filtering noise from the audio input to improve recognition quality and applying machine learning to optimize the parameters α, β, γ in the combined similarity computation of the context-aware search method. On the other hand, future work should study variants of relational similarity RelSim (Relational Similarity) [100] and methods that combine Word2Vec, Doc2Vec, word embeddings [101] ... for search engines.

As for further development, the dissertation will focus on studying and applying adaptive algorithms and statistical models, which are core components of natural language processing systems.

LIST OF THE AUTHOR'S PUBLICATIONS

Trần Lâm Quân, Vũ Tất Thắng. "Entity search based on implicit semantic relations" (in Vietnamese). 21st National Conference: Selected Issues of Information and Communication Technology (27-28/07/2018).
Trần Lâm Quân, Vũ Tất Thắng. "Search for entities based on the Implicit Semantic Relations". Journal of Computer Science and Cybernetics (Tạp chí Tin học và Điều khiển học), 2019 (Volume 35, Number 2019).
Trần Lâm Quân, Đỗ Quốc Trường, Phan Đăng Hưng, Đinh Anh Tuấn, Phi Tùng Lâm, Vũ Tất Thắng, Lương Chi Mai. "A study of applying Vietnamese voice interaction for a context-based Aviation search engine". The IEEE RIVF 2013 International Conference on Computing and Communication Technologies, 10-13.11.2013.
Trần Lâm Quân, Vũ Tất Thắng. "Context-aware and voice interactive search" (the SoCPaR 2013 special issue). Journal of Network and Innovative Computing, ISSN 2160-2174, Volume 2, pages 233-239, 2014.
Trần Lâm Quân, Phan Đăng Hưng, Vũ Tất Thắng. "Voice search with the context-aware technique" (in Vietnamese). Journal of Science and Technology, Vietnam Academy of Science and Technology, ISSN: 0886 768X, No. 52 (1B), 29.06.2014.
Trần Lâm Quân, Lê Đức Hiếu, Lê Ngọc Thế, Vũ Tất Thắng. "An approach using the concept lattice structure for mining and searching text data" (in Vietnamese). 17th National Conference: Selected Issues of Information and Communication Technology, 30-31.10.2014.

REFERENCES

[1] Christoph Kofler, Martha Larson, Alan Hanjalic, User Intent in Multimedia Search: A Survey of the State of the Art and Future Challenges. ACM Journals Computing Surveys, Vol 49, No 2, August 2016
[2] R Song, Z Luo, J.-Y Nie, Y Yu and H.-W Hon, Identification of ambiguous queries in web search. Information Processing & Management, 45(2), pages 216–229, 2009
[3] W Song, Y Liu, L Liu et al., Semantic composition of distributed representations for query subtopic mining. Frontiers Inf Technol Electronic Eng 19, 2018
[4] J Xu, F Ye, Query Recommendation Using Hybrid Query Relevance. Future Internet, 2018
[5] S Gaou, A Bekkari, The Optimization of Search Engines to Improve the Ranking to Detect User's Intent. In Advanced Information Technology, Services and Systems (AIT2S) 2017
[6] Dirk Lewandowski, Jessica Drechsler, Sonja von Mach, Deriving query intents from web search engine queries. Journal of the American Society for Information Science and Technology, September 2012
[7] Imrattanatrai, Wiradee & Kato, Makoto & Tanaka, Katsumi & Yoshikawa, Masatoshi, Entity Ranking for Queries with Modifiers Based on Knowledge Bases and Web Search Results. In IEICE Transactions on Information and Systems, 2018
[8] Li, Jing & Sun, Aixin & Han, Ray & Li, Chenliang, A Survey on Deep Learning for Named Entity Recognition. In IEEE Transactions on Knowledge and Data Engineering, 2020
[9] H Cao, D Jiang, J Pei, Q He, Z Liao, E Chen and H Li, Towards context-aware search by learning a very large variable length hidden markov model from search logs. In Proceedings of the 18th international conference on World wide web, pages 191–200, April 2009
[10] H
Cao, D Jiang, J Pei, Q He, Z Liao, E Chen and H Li, Context-aware query suggestion by mining click-through and session data. In Proceedings of KDD, pages 875-883, 2008
[11] Peter D Turney, The latent relation mapping engine: Algorithm and experiments. Journal of Artificial Intelligence Research (JAIR), 33, pages 615-655, 2008
[12] Dedre Gentner, Structure-mapping: A Theoretical Framework for Analogy. Elsevier Cognitive Science, Volume 7, Issue 2, pages 155-170, April–June 1983
[13] Peter D Turney, M.L Littman, Corpus-based Learning of Analogies and Semantic Relations. Machine Learning, 60(1–3), pages 251–278, 2005
[14] Peter D Turney, Distributional semantics beyond words: Supervised learning of analogy and paraphrase. Transactions of the Association for Computational Linguistics (TACL), 1, pages 353-366, 2013
[15] Peter D Turney and P Pantel, From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research (JAIR), 37, pages 141-188, 2010
[16] Peter D Turney, Similarity of semantic relations. Computational Linguistics, 32(3), 2006
[17] Bollegala, Danushka & Matsuo, Yutaka & Ishizuka, Mitsuru, Measuring the Similarity between Implicit Semantic Relations from the Web. Proceedings of WWW, pages 651-660, 2009
[18] Duc, N., Bollegala et al., Cross-Language Latent Relational Search: Mapping Knowledge across Languages. In Association for the Advancement of AI, 2011
[19] Kato et al., Query by analogical example: relational search using web search engine indices. In Proceedings of the 18th ACM conference on Information and knowledge management, ACM, 2009
[20] Y.J Cao et al., Relational Similarity Measure: An Approach Combining Wikipedia and WordNet. Journal of Applied Mechanics and Materials, 2011
[21] E Agirre, E Alfonseca, K Hall, J Kravalova, M Pasca and A Soroa, A study on similarity and relatedness using distributional and wordnet-based approaches. In NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of
the North American Chapter of the Association for Computational Linguistics, pages 19–27, 2009
[22] Mikolov et al., Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26 (NIPS), 2013
[23] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov, Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, Vol 5, pages 135-146, 2017
[24] Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov, Learning Word Vectors for 157 Languages. In LREC (Language Resources and Evaluation), Feb 19, 2018
[25] Tomas Mikolov et al., Efficient Estimation of Word Representations in Vector Space. In ICLR (Workshop Poster), 2013
[26] Kata Gábor, Haïfa Zargayouna, Isabelle Tellier, Davide Buscaldi, Thierry Charnois, Exploring Vector Spaces for Semantic Relations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1814–1823, 2017
[27] Hugo Caselles-Dupré, Florian Lesaint, Jimena Royo-Letelier, Word2vec applied to recommendation: hyperparameters matter. In Proceedings of the 12th ACM Conference on Recommender Systems, pages 352–356, September 2018
[28] S Yilmaz, S Toklu, A deep learning analysis on question classification task using Word2vec representations. Neural Comput & Applic 32, pages 2909–2928, 2020
[29] Prajakta Shinde, Pranjali Joshi, Survey of various query suggestion system. International Journal of Engineering And Computer Science ISSN:2319-7242; Volume Issue 12, pages 9576-9580, December 2014
[30] Susan Dumais, Personalized search: potential and pitfalls. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 2016
[31] Jinyoung Kim, Jaime Teevan, Nick Craswell, Explicit In Situ User Feedback for Web Search Results. SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information
Retrieval, pages 829–832, July 2016
[32] Sørig, Esben; Collignon, Fiebrink and Kando, Evaluation of Rich and Explicit Feedback for Exploratory Search. In Second Workshop on Evaluation of Personalisation in Information Retrieval (WEPIR), March, 2019
[33] Thorsten Joachims et al., Accurately Interpreting Clickthrough Data as Implicit Feedback. SIGIR, Volume 51, Issue 1, June 2017
[34] Edward Rolando Núñez-Valdéz et al., Implicit feedback techniques on recommender systems applied to electronic books. Computers in Human Behavior, Volume 28, Issue 4, ScienceDirect, 2012
[35] Gai Li and Qiang Che, Exploiting Explicit and Implicit Feedback for Personalized Ranking. Hindawi Publishing Corporation - Mathematical Problems in Engineering, Article ID 2535329, 11 pages, 2016
[36] Keping Bi, Choon Hui Teo, Yesh Dattatreya, Vijai Mohan, W Bruce Croft, Leverage Implicit Feedback for Context-aware Product Search. In SIGIR 2019 eCom, Paris, France, July 2019
[37] W Chen, F Cai, H Chen, M De Rijke, Personalized query suggestion diversification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 817–820, 2017
[38] W Chen, F Cai, H Chen et al., Personalized query suggestion diversification in information retrieval. Springer Link, Front Comput Sci 14, 143602, 19 December 2019
[39] C Bouhini, M Géry and C Largeron, Personalized information retrieval models integrating the user's profile. IEEE Tenth International Conference on Research Challenges in Information Science (RCIS), Grenoble, pages 1-9, 2016
[40] Hiteshwar Kumar Azad, Akshay Deepak, A new approach for query expansion using Wikipedia and WordNet. Elsevier, Information Sciences, Volume 492, pages 147-163, August 2019
[41] Hiteshwar Kumar Azad, Akshay Deepak, Query expansion techniques for information retrieval: A survey. Elsevier Information Processing & Management, Volume 56, Issue 5, pages 1698-1735, September 2019
[42] Claveau, Vincent, Kijak, Ewa, Distributional
thesauri for information retrieval and vice versa. In Language and Resource Conference, LREC, 2016
[43] Q Chen, L Yao and J Yang, Short text classification based on LDA topic model. International Conference on Audio, Language and Image Processing (ICALIP), Shanghai, pages 749-753, 2016
[44] J Xu, F Ye, Query Recommendation Using Hybrid Query Relevance. Future Internet Journals, Volume 10, Issue 11, 2018
[45] Wanyu Chen, Fei Cai, Honghui Chen, Maarten de Rijke, Personalized Query Suggestion Diversification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 817–820, August 2017
[46] Choudhary, Durga & Chandra, Subhash, Adaptive Query Recommendation Techniques for Log Files Mining to Analysis User's Session Pattern. In International Journal of Computer Applications, 2016
[47] J Guo, X Zhu, Y Lan et al., Modeling users' search sessions for high utility query recommendation. Information Retrieval Journal 20, 2017
[48] Lingling Meng, A Survey on Query Suggestion. International Journal of Hybrid Information Technology, Vol 7, No 6, 2014
[49] Bai, Lu, Jiafeng Guo, Xueqi Cheng, Xiubo Geng and Pan Du, Exploring the Query-Flow Graph with a Mixture Model for Query Recommendation. SIGIR Workshop on Query Representation and Understanding, July 2011
[50] P Boldi, F Bonchi, C Castillo, D Donato, A Gionis and S Vigna, The query-flow graph: model and applications. In Proceeding of the 17th ACM conference on Information and knowledge management (CIKM'08), pages 609–618, 2008
[51] P Boldi, F Bonchi, C Castillo, D Donato, A Gionis and S Vigna, Query Suggestions Using Query-Flow Graphs. In Proceedings of the 2009 workshop on Web Search Click Data (WSCD '09), pages 56–63, Feb 9, 2009
[52] Xinbao Shao, Qingshan Li, Yishuai Lin, Boyu Zhou, A meta-search group recommendation mechanism based on user intent identification. In Proceedings of the 6th International Conference on Software and Computer Applications (ICSCA '17), pages
102–106, February 2017
[53] E Sadikov, J Madhavan, L Wang and A Halevy, Clustering query refinements by user intent. In Proceedings of the International World Wide Web Conference (WWW'10), pages 841–850, 2010
[54] Saxena et al., A Review of Clustering Techniques and Developments. Article in Neurocomputing, July 2017
[55] T Sajana, C M Sheela Rani and K V Narayana, A Survey on Clustering Techniques for Big Data Mining. Indian Journal of Science and Technology, Vol 9(3), January 2016
[56] Parth Ritin Saraiya et al., Study of Clustering Techniques in the Data Mining Domain. In International Journal of Computer Science and Mobile Computing, Vol.7 Issue.11, pages 31-37, November 2018
[57] K Sathiyakumari, G Manimekalai, V Preamsudha and M P Scholar, A survey on various approaches in document clustering. Int J Comput Technol, pages 1534–1539, 2011
[58] Manpreet Kaur, Usvir Kaur, A Survey on Clustering Principles with K-means Clustering Algorithm Using Different Methods in Detail. IJCSMC, Vol 2, Issue 5, pages 327–331, May 2013
[59] Gursharan Saini, Harpreet Kaur, K-Mean Clustering and PSO: A Review. International Journal of Engineering and Advanced Technology (IJEAT), ISSN: 2249-8958, Volume-3, Issue-5, June 2014
[60] Hamada M Zahera, Gamal F El Hady, F Waiel, Abd El-Wahed, Query Recommendation for Improving Search Engine Results. In Proceedings of the World Congress on Engineering and Computer Science (WCECS), October 20-22, 2010
[61] Naeem, Arshia; Rehman, Mariam; Anjum, Maria; Asif, Muhammad, Development of an efficient hierarchical clustering analysis using an agglomerative clustering algorithm. Current Science (00113891), Vol 117 Issue 6, pages 1045-1053, 9/25/2019
[62] Dhiliphanrajkumar Thambidurai, Suruliandi Aandavar and Selvaperumal Prakasam, Query Recommendation by Coupling Personalization with Clustering for Search Engine. I.J Information Technology and Computer Science, pages 82-91, 11/2016
[63] W Wu, H Li, and J Xu, Learning query and document similarities from
clickthrough bipartite graph with metadata. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 2013
[64] L Noce, I Gallo and A Zamberletti, Query and Product Suggestion for Price Comparison Search Engines based on Query-product Click-through Bipartite Graphs. In Proceedings of the 12th International Conference on Web Information Systems and Technologies (WEBIST 2016) - Volume 1, pages 17-24, 2016
[65] Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain, Semantic Similarity from Natural Language and Ontology Analysis. Synthesis Lectures on Human Language Technologies, Vol 8, No (Arxiv, 167 pages), May 2015
[66] Slimani, Thabet, Description and Evaluation of Semantic Similarity Measures Approaches. International Journal of Computer Applications, Vol 80, 25-33, 10.5120/13897-1851, 2013
[67] Christoph Lofi, Measuring Semantic Similarity and Relatedness with Distributional and Knowledge-based Approaches. Information and Media Technologies, Volume 10, Issue Online ISSN 1881-0896, pages 493-501, September 15, 2015
[68] N Craswell, Mean Reciprocal Rank. In Encyclopedia of Database Systems, Springer, Boston, MA, 2009
[69] Yao, Yuan et al., DocRED: A Large-Scale Document-Level Relation Extraction Dataset. ACL (Association for Computational Linguistics), 2019
[70] Michele Banko and Oren Etzioni, The Tradeoffs Between Open and Traditional Relation Extraction. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, ACL 2008, Columbus, Ohio, USA, pages 28-36, 2008
[71] Yun Liu, Mingxin Li, Hui Liu, Junjun Cheng, Yanping Fu, Research of Unsupervised Entity Relation Extraction. Journal of Computers, Vol 30 No 1, pages 31-41, 2019
[72] Bollegala et al., Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pages 151-160, Raleigh, North Carolina, USA, 2010
[73] Parapar, Javier & Losada,
David & Presedo-Quindimil, Manuel & Barreiro, Alvaro, Using score distributions to compare statistical significance tests for information retrieval evaluation. Journal of the Association for Information Science and Technology 71 (10.1002/asi.24203), 2019
[74] Christopher D Manning, Prabhakar Raghavan, and Hinrich Schutze, Introduction to Information Retrieval. Cambridge University Press, 2008
[75] W Chen, F Cai, H Chen et al., Personalized query suggestion diversification in information retrieval. Front Comput Sci 14, 143602, 2020
[76] Trần Lâm Quân, Vũ Tất Thắng, Context-aware query suggestion technique in the search problem (in Vietnamese). 15th National Conference: Selected Issues of Information and Communication Technology, 03-04.12.2012
[77] Z Liao, D Jiang, E Chen, P Pei, H Cao, H Li, Mining Concept Sequences from Large-Scale Search Logs for Context-Aware Query Suggestion. ACM Trans Intell Syst Technol 9, 4, Article 87, 40 pages, 2011
[78] T Ruotsalo, G Jacucci & S Kaski, Interactive faceted query suggestion for exploratory search: Whole-session effectiveness and interaction engagement. Journal of the Association for Information Science and Technology, 2019
[79] Souvick Ghosh, Chirag Shah, Session-based Search Behavior in Naturalistic Settings for Learning-related Tasks. In CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pages 2449–2452, November 2019
[80] Sowmya Yalamanchili (IBM), Log mining in Query recommendation. International Journal of Information Technology & Systems, Vol 4; No 1: ISSN: 2277-9825, 2015
[81] X Fei, S Zheng, L Yan and C Fan, A improved sequential pattern mining algorithm based on PrefixSpan. World Automation Congress (WAC), Rio Grande, 2016
[82] Zhengshen Jiang, Hongzhi Liu, Bin Fu, Zhonghai Wu, Tao Zhang, Recommendation in Heterogeneous Information Networks based on Generalized Random Walk Model and Bayesian Personalized Ranking. In Proceedings of the Eleventh ACM International Conference on Web Search and Data
Mining (WSDM '18), pages 288–296, February 2018
[83] Gao, J., et al., Smoothing clickthrough data for web search ranking. SIGIR'09, pages 355-362, 2009
[84] N Craswell and M Szummer, Random walks on the click graph. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR'07), pages 239-246, 2007
[85] M Shajalal, M Z Ullah, A N Chy and M Aono, Query subtopic diversification based on cluster ranking and semantic features. International Conference On Advanced Informatics: Concepts, Theory And Application (ICAICTA), George Town, pages 1-6, 2016
[86] Xiaofei, Zhu., et al., A unified framework for recommending diverse and relevant queries. In Proceedings of the 20th international conference on World wide web (WWW '11), pages 37–46, March 2011
[87] Wanyu Chen, Fei Cai, Honghui Chen, Maarten de Rijke, Personalized Query Suggestion Diversification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '17), pages 817–820, August 2017
[88] Claudio Carpineto, Sergei O Kuznetsov, Amedeo Napoli, Formal Concept Analysis Meets Information Retrieval. Workshop co-located with the 35th European Conference on Information Retrieval (ECIR), 2013
[89] Frano Škopljanac-Mačina, Bruno Blašković, Formal Concept Analysis – Overview and Applications. ScienceDirect, 24th DAAAM International Symposium on Intelligent Manufacturing and Automation, 2013
[90] Larry González, Aidan Hogan, Modelling Dynamics in Semantic Web Knowledge Graphs with Formal Concept Analysis. In Proceedings of the 2018 World Wide Web Conference (WWW '18), pages 1175–1184, April 2018
[91] A Abid, M Rouached & N Messai, Semantic web service composition using semantic similarity measures and formal concept analysis. Multimed Tools Appl 79, 6569–6597, Dec 2019
[92] Claudio Carpineto and Giovanni Romano, Using Concept Lattices for Text Retrieval and Mining. In Formal Concept Analysis, pages
161-179, 2005
[93] Singh, Prem & Cherukuri, Aswani Kumar, Concept lattice reduction using different subset of attributes as information granules. In Granular Computing, Springer International Publishing Switzerland, 2016
[94] Bernhard Ganter, Sebastian Rudolph, Gerd Stumme, Explaining Data with Formal Concept Analysis. Springer International Publishing, 2019
[95] Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli, and Malika Smail-Tabbone, BR-Explorer: An FCA-based algorithm for Information Retrieval. Fourth International Conference, CLA, 2006
[96] Ganter, Wille, Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin Heidelberg New York, 1999
[97] D.G Kourie, S Obiedkov, B.W Watson, D Van der Merwe, An incremental algorithm to construct a lattice of set intersections. Sci Comput Programm 74, pages 128–142, 2009
[98] Trần Lâm Quân, New-generation search: hybrid intelligent search (in Vietnamese). Special journal of the Vietnam aviation sector, 2011
[99] Trần Lâm Quân, Vũ Tất Thắng, An Approach Using Concept Lattice Structure for Data Mining and Information Retrieval. Journal of Science and Technology: Issue on Information and Communication Technology, Vol 1, No.1, August 2015. http://ict.jst.udn.vn/index.php/jst/article/view/4
[100] Wang, Chenguang et al., RelSim: Relation Similarity Search in Schema-Rich Heterogeneous Information Networks. In Proceedings of the 2016 SIAM International Conference on Data Mining (SDM), 2016
[101] Kata Gábor, Haïfa Zargayouna, Isabelle Tellier, Davide Buscaldi, Thierry Charnois, Exploring Vector Spaces for Semantic Relations. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1814–1823, September 7–11, 2017

... entity search based on semantic relations, in the Vietnamese language domain. 1.4 The context-aware query suggestion problem. In this problem, the research motivation is the context-aware query suggestion technique. For the search mechanism ...
analysis and evaluation of the results after the experiments. Comparison with other techniques. Contributions of the dissertation: the dissertation studies and addresses the problems of entity search based on semantic relations and context-aware query suggestion. The contributions of the dissertation ... Text Corpus. Figure 2.1: Semantic-relation-based search with an input made up of entities. This form of entity search based on implicit semantic relations must determine the semantic relation between the entities and, at the same time, compute
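
The outline above evaluates entity search with the MRR measure (Section 2.3.3), which the abbreviation list defines as the mean of the reciprocal rank over the query set. A minimal sketch of that measure follows, assuming each query comes with a ranked list of candidate answers and a set of correct answers. The function names and the sample data are illustrative assumptions and are not taken from the dissertation's system.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is returned."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over all (ranked list, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0  # convention chosen for this sketch: empty query set scores 0
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


if __name__ == "__main__":
    # Hypothetical ranked outputs for three queries (illustrative data only).
    sample_runs = [
        (["Trường Giang", "Hoàng Hà"], {"Trường Giang"}),  # correct answer at rank 1
        (["a", "b", "c"], {"c"}),                          # correct answer at rank 3
        (["x"], {"y"}),                                    # correct answer missing
    ]
    # (1 + 1/3 + 0) / 3
    print(mean_reciprocal_rank(sample_runs))
```

A higher value means that correct entities tend to appear nearer the top of the returned list; a query whose correct answer is never returned contributes 0 to the mean.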
