Thesis: Some Methods for Ranking Web Pages in Cross-Language Search

DECLARATION

I hereby declare that this thesis is my own original research, carried out at the University of Science and Technology, The University of Danang (Trường Đại học Bách khoa, Đại học Đà Nẵng), under the scientific supervision of Assoc. Prof. Dr. Võ Trung Hùng and Assoc. Prof. Dr. Huỳnh Công Pháp. The data and research conclusions presented in the thesis are truthful and have not been published in any work by other authors. I take full responsibility for this declaration.

Author: Lâm Tùng Giang

TABLE OF CONTENTS

INTRODUCTION
  1 PROBLEM STATEMENT
  2 OBJECTIVES, SUBJECTS AND SCOPE OF THE RESEARCH
    2.1 Objectives
    2.2 Subjects
    2.3 Scope
  3 CONTRIBUTIONS OF THE THESIS
  4 STRUCTURE OF THE THESIS
CHAPTER 1: OVERVIEW AND RESEARCH PROPOSAL
  1.1 INFORMATION RETRIEVAL
    1.1.1 Concepts
    1.1.2 Formal definition
    1.1.3 Processing flow of an information retrieval system
    1.1.4 Traditional information retrieval models
    1.1.5 Exploiting relations between terms in documents
  1.2 EVALUATION OF INFORMATION RETRIEVAL SYSTEMS
    1.2.1 Concepts
    1.2.2 Measures
    1.2.3 Experimental environment
  1.3 CROSS-LANGUAGE INFORMATION RETRIEVAL
    1.3.1 Concepts
    1.3.2 Approaches
    1.3.3 Automatic translation techniques
  1.4 RE-RANKING TECHNIQUES
    1.4.1 Ranking and re-ranking
    1.4.2 Exploiting information from existing search engines
    1.4.3 Learning to rank
    1.4.4 Exploiting user information
  1.5 WEB PAGE RANKING
    1.5.1 Characteristics of Web search
    1.5.2 Web page ranking methods
    1.5.3 Web page ranking in cross-language search
  1.6 LIMITATIONS AND RESEARCH PROPOSAL
    1.6.1 Limitations
    1.6.2 Research proposal
  1.7 CHAPTER SUMMARY
CHAPTER 2: AUTOMATIC TRANSLATION FOR CROSS-LANGUAGE RETRIEVAL
  2.1 AUTOMATIC TRANSLATION METHODS
    2.1.1 Using machine translation
    2.1.2 Using corpora
    2.1.3 Using dictionaries
    2.1.4 Using a pivot language
    2.1.5 Using semantic spaces
    2.1.6 General assessment
  2.2 DISAMBIGUATION
  2.3 A MODEL USING A MACHINE-READABLE DICTIONARY
    2.3.1 Building the dictionary data
    2.3.2 Disambiguation based on a measure of relatedness between word pairs
    2.3.3 Variants of the MI formula
    2.3.4 Algorithm for selecting the best translation
    2.3.5 Constructing the query
  2.4 EXPERIMENTS APPLYING THE SMI FORMULA
    2.4.1 Experimental environment
    2.4.2 Experimental results
  2.5 EXPERIMENTS ON GENERATING STRUCTURED QUERY TRANSLATIONS
    2.5.1 Experimental environment
    2.5.2 Experimental configuration
    2.5.3 Experimental results
  2.6 CHAPTER SUMMARY
CHAPTER 3: SUPPORTING QUERY TRANSLATION
  3.1 TECHNIQUES SUPPORTING QUERY TRANSLATION
    3.1.1 Query segmentation in the source language
    3.1.2 Query expansion
    3.1.3 Query reduction
    3.1.4 Handling dictionary terms
  3.2 QUERY SEGMENTATION
    3.2.1 Using the vnTagger tool
    3.2.2 The WLQS algorithm
    3.2.3 Combining WLQS with the vnTagger tool
  3.3 QUERY ADJUSTMENT IN THE TARGET LANGUAGE
    3.3.1 Pseudo-relevance feedback
    3.3.2 Pseudo-relevance feedback in cross-language retrieval
    3.3.3 Adjusting structured queries in the target language
  3.4 EXPERIMENTS
    3.4.1 Experimental configuration
    3.4.2 Results
  3.5 CHAPTER SUMMARY
CHAPTER 4: RE-RANKING
  4.1 LEARNING TO RANK BASED ON GENETIC PROGRAMMING
    4.1.1 Model applying genetic programming
    4.1.2 Tool implementation and experimental results
    4.1.3 Evaluation
  4.2 PROPOSED PROXIMITY MODELS
    4.2.1 The CL-Büttcher model
    4.2.2 The CL-Rasolofo ranking model
    4.2.3 The CL-HighDensity ranking model
    4.2.4 Experiments applying cross-language proximity models
  4.3 LEARNING TO RANK WEB PAGES
    4.3.1 Learning-to-rank models
    4.3.2 Experimental environment
    4.3.3 Experimental configuration
    4.3.4 Experimental results
  4.4 CHAPTER SUMMARY
CHAPTER 5: A VIETNAMESE-ENGLISH CROSS-LANGUAGE WEB SEARCH SYSTEM
  5.1 SYSTEM DESIGN
    5.1.1 System components and algorithm flowchart
    5.1.2 Dictionary data
    5.1.3 Index data
  5.2 EXPERIMENTAL METHOD
  5.3 EXPERIMENTS ON QUERY TRANSLATION SOLUTIONS
    5.3.1 Experimental configuration
    5.3.2 Experimental results
    5.3.3 Evaluation
  5.4 EXPERIMENTS ON QUERY ADJUSTMENT
    5.4.1 Experimental configuration
    5.4.2 Experimental results
    5.4.3 Evaluation
  5.5 EXPERIMENTS ON RE-RANKING
    5.5.1 Experimental configuration
    5.5.2 Experimental results
    5.5.3 Evaluation
  5.6 EVALUATING THE EFFECTIVENESS OF COMBINING THE TECHNIQUES
  5.7 CHAPTER SUMMARY
CONCLUSION AND FUTURE WORK
  1 CONCLUSION
    1.1 Summary of the thesis
    1.2 Results achieved
  2 FUTURE WORK
REFERENCES

LIST OF FIGURES

Figure 1.1: Processing flow of an information retrieval system
Figure 1.2: 11-point average chart
Figure 1.3: Ranking model for multilingual Web search
Figure 1.4: Processing flow of the query stage
Figure 3.1: Classification of query expansion methods
Figure 3.2: User feedback
Figure 3.3: Pseudo-relevance feedback based on the relevance of the initial search results
Figure 3.4: 11-point average graph
Figure 4.1: Vietnamese-English multilingual Web search system
Figure 5.1: Components of the Vietnamese-English Web search system
Figure 5.2: System algorithm flowchart
Figure 5.3: Comparison of configurations using translations
Figure 5.4: Comparison of configurations using translations
Figure 5.5: Results of the training runs for each method
Figure 5.6: MAP scores using the Top_three_all translation option
Figure 5.7: MAP scores using the Top_three_weight translation option

LIST OF TABLES

Table 1.1: Information used and characteristics of ranking models
Table 2.1: Experimental configuration
Table 2.2: Experimental results
Table 2.3: Comparison of P@k and MAP across configurations
Table 3.1: MAP scores
Table 3.2: Number of relevant documents retrieved
Table 4.1: Example features of the OHSUMED collection
Table 4.2: Comparison of MAP values
Table 4.3: Comparison of NDCG@k values
Table 4.4: Comparison of P@k values
Table 4.5: MAP scores of the experimental configurations
Table 4.6: Performance gain from applying proximity models
Table 4.7: Options for the distance function
Table 4.8: Experimental results
Table 5.1: Configurations for evaluating query translation solutions
Table 5.2: Comparison of query translation solutions
Table 5.3: Configurations for evaluating query adjustment results
Table 5.4: Comparison of query adjustment solutions
Table 5.5: Experimental configurations for learning to rank
Table 5.6: Experimental results of the learning-to-rank methods
Table 5.7: Evaluation of applying the proposed techniques

LIST OF ABBREVIATIONS

AP Average Precision
CLEF Cross Language Evaluation Forum
CLIR Cross Language Information Retrieval
DF Document Frequency
FIRE Forum for Information Retrieval Evaluation
GP Genetic Programming
HITS Hypertext Induced Topic Search
HTML Hyper Text Markup Language
IDF Inverse Document Frequency
IR Information Retrieval
LETOR LEarning TO Rank
LMIR Language Models in Information Retrieval
LSI Latent Semantic Indexing
MAP Mean Average Precision
MI Mutual Information
MRD Machine Readable Dictionary
NDCG Normalized Discounted Cumulative Gain
PRF Pseudo-Relevance Feedback
SMI Summary Mutual Information
SVD Singular-Value Decomposition
TF Term Frequency
TREC Text REtrieval Conference
UNL Universal Networking Language
VSM Vector Space Model
WLQS Word-Length-based Query Segmentation
WWW World Wide Web

GLOSSARY OF TERMS (English term: Vietnamese equivalent)

Anchor: Mốc, neo
Authority: Độ tin cậy
Average Precision: Độ chính xác trung bình
Bag of Words: Túi từ
Bilingual Machine Readable Dictionary: Từ điển máy song ngữ
Binary Independence Retrieval – BIR: Mô hình truy vấn nhị phân độc lập
Boolean model: Mô hình Boolean
Cohesion Score: Điểm liên kết
Cross Language Information Retrieval - CLIR: Truy vấn thông tin xuyên ngữ
Cross-language Web Search: Tìm kiếm web xuyên ngữ
Data sparsity: Tính thưa thớt dữ liệu
Degree of similarity: Mức độ tương tự
Discounted Cumulative Gain: Độ lợi tích lũy giảm dần
Fuzzy-Logic model: Mô hình lô-gic mờ
Gain Function: Hàm lợi ích
Hub: Trung tâm
Hyper Text Markup Language - HTML: Ngôn ngữ siêu văn bản
Hyperlink: Siêu liên kết
Information Retrieval – IR: Truy vấn thông tin
Inverse document frequency – IDF: Tần suất tài liệu nghịch đảo
IR model: Mô hình truy vấn thông tin
Language Model – LMIR: Mô hình ngôn ngữ
Latent Semantic Indexing - LSI: Mô hình chỉ mục ngữ nghĩa ngầm
Learning to Rank: Học xếp hạng
Loss Function: Hàm tổn thất
Machine Learning - ML: Học máy

... experiments, the proposed system is more effective (measured by MAP) than applying manual translation [86]. An important result of the thesis is that, when all the components are applied together, the ranking quality of Web pages in cross-language search improves and exceeds the ranking obtained with manual translation in the experiments conducted.

FUTURE WORK

Beyond the results achieved, the author identifies the following issues as directions for extending the thesis:

- The query processing algorithms presented in the thesis are sensitive to the language, content and length of the query. Given the limited time, the author concentrated on a search model with Vietnamese queries and English target documents. The queries used in the experiments were of medium length; short and long queries were not considered. Further research should extend and complete the experimental evaluation with other language pairs and other query lengths.
- Optimizing the query pre-processing and disambiguation algorithms. The running time of the query processing and disambiguation algorithms needs to be improved.
- Studying other machine learning techniques and building combinations of other base ranking functions. A limitation of machine learning based on genetic programming is its high time cost. In addition, the thesis considered only a limited list of base functions. Further research should consider applying other machine learning algorithms with an extended list of base functions.

REFERENCES

[1] Adriani Mirna (2000), "Using Statistical Term Similarity for Sense Disambiguation in Cross-Language Information Retrieval", Information Retrieval, vol. 2, no. 1, pp. 69–80.
[2] Al-dallal Ammar, Abdul-wahab Rasha Shaker (2009), "Genetic Algorithm Based Mining for HTML Document", In: Second International Conference on Developments in eSystems Engineering (DESE), pp. 343–348.
[3] Angeline Peter J (1994), "Genetic programming: On the programming of computers by means of natural selection", Biosystems, MIT Press Cambridge.
[4] Baeza-Yates Ricardo, Ribeiro-Neto Berthier (1999), "Modern Information Retrieval" [Internet], 2nd ed., Baeza-Yates RA, Ribeiro-Neto B, editors, New York, Addison Wesley, 513 p.
[5] Balasubramanian Niranjan, Drive Governors (2010), "Exploring Reductions for Long Web Queries", In:
SIGIR’10, pp. 571–578.
[6] Baliński Jaroslaw, Daniłowicz Czeslaw (2005), "Re-ranking method based on inter-document distances", Information Processing & Management, vol. 41, no. 4, pp. 759–775.
[7] Ballesteros Lisa, Croft W Bruce (1997), "Phrasal translation and query expansion techniques for cross-language information retrieval", In: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 84–91.
[8] Ballesteros Lisa, Croft W Bruce (1998), "Statistical methods for cross language information retrieval", In: Statistical methods for cross language information retrieval, Kluwer Academic Publisher, pp. 23–40.
[9] Ballesteros Lisa, Croft W Bruce (1998), "Resolving ambiguity for cross-language retrieval", Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’98, ACM Press, pp. 64–71.
[10] Bendersky Michael, Croft W Bruce (2009), "Analysis of long queries in a large scale search log", In: Proceedings of the 2009 workshop on Web Search Click Data - WSCD ’09, ACM Press, pp. 8–14.
[11] Berger Adam, Lafferty John (1999), "Information retrieval as statistical translation", In: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’99, pp. 222–229.
[12] Borlund Pia (2003), "The concept of relevance in information retrieval", Journal of the American Society for Information Science and Technology, vol. 54, no. 10, pp. 913–925.
[13] Braschler Martin, Mateev Bojidar, Mittendorf Elke, Schauble Peter, Wechsler Martin (1999), "SPIDER Retrieval System at TREC7", In: NIST Special Publication, pp. 509–518.
[14] Breese John S., Heckerman D., Kadie Carl (1998), "Empirical analysis of predictive algorithms for collaborative filtering", Proceedings of the 14th conference on Uncertainty in Artificial Intelligence, vol. 461, no. 8, pp. 43–52.
[15] Breiman Leo (2001), "Random Forests", Machine Learning, vol. 45, no. 1, pp. 5–32.
[16] Brin Sergey, Page
Lawrence (1998), "The anatomy of a large-scale hypertextual Web search engine", Computer Networks and ISDN Systems, vol. 30, no. 1–7, pp. 107–117.
[17] Brown Peter F., Della Pietra Vincent J., Della Pietra Stephen A., Mercer Robert L (1993), "The mathematics of statistical machine translation: Parameter estimation", Computational linguistics, vol. 19, pp. 262–311.
[18] Bui Thanh Hung, Nguyen Le Minh, Shimazu Akira (2012), "Sentence splitting for Vietnamese-English machine translation", In: Proceedings - 4th International Conference on Knowledge and Systems Engineering, KSE 2012, IEEE, pp. 156–160.
[19] Büttcher Stefan, Clarke Charles L.A., Lushman Brad (2006), "Term proximity scoring for ad-hoc retrieval on very large text collections", In: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’06, pp. 621–622.
[20] Callan Jamie (2000), "Distributed Information Retrieval", In: Advances in Information Retrieval, pp. 127–150.
[21] Callan J.P., Croft W.B., Harding S.M (1992), "The INQUERY retrieval system", In: Proceedings of the third international conference on database and expert systems applications, pp. 78–83.
[22] Cao Zhe, Qin Tao, Liu Tie-Yan, Tsai Ming-Feng, Li Hang (2007), "Learning to Rank: From Pairwise Approach to Listwise Approach", Proceedings of the 24th international conference on Machine learning, pp. 129–136.
[23] Chen Jiangping, Bao Yu (2009), "Information access across languages on the web: From search engines to digital libraries", In: Proceedings of the American Society for Information Science and Technology, pp. 1–14.
[24] Chidlovskii Boris, Glance Natalie S., Grasso M Antonietta (2000), "Collaborative Re-Ranking of Search Results", In: Proceedings of the National Conference on Artificial Intelligence 2000 Workshop on AI for Web Search, pp. 18–23.
[25] Chirita Paul Alexandru, Kohlsch Christian (2005), "Using ODP Metadata to Personalize Search Categories and Subject Descriptors", Proceedings of the 28th
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 178–185.
[26] Cleverdon Cyril W., Keen Michael (1966), "ASLIB Cranfield Research Project: Factors determining the performance of indexing systems", College of Aeronautics, Cranfield.
[27] Clinchant Stéphane, Gaussier Eric (2013), "A Theoretical Analysis of Pseudo-Relevance Feedback Models", Proceedings of the 2013 Conference on the Theory of Information Retrieval - ICTIR ’13, pp. 6–13.
[28] Clir Indian-language English (2012), "Handling OOV Words in Indian-language–English CLIR", In: Advances in Information Retrieval, Springer Berlin Heidelberg, pp. 476–479.
[29] Clough Paul, Sanderson Mark (2013), "Evaluating the performance of information retrieval systems using test collections", Information Research, vol. 18, no. 2, pp. 1–10.
[30] Crammer Koby, Singer Yoram (2002), "Pranking with Ranking", Advances in Neural Information Processing Systems 14, vol. 14, pp. 641–647.
[31] Crestani Fabio, Du Heather (2006), "Written versus spoken queries: A qualitative and quantitative comparative analysis", Journal of the American Society for Information Science and Technology, vol. 57, no. 7, pp. 881–890.
[32] Croft Bruce, Turtle Howard, Lewis David (1991), "The use of phrases and structured queries in information retrieval", In: SIGIR ’91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 32–45.
[33] Cutler M., Shi Y., Meng W (1997), "Using the Structure of HTML Documents to Improve Retrieval", In: Proceedings of the USENIX Symposium on Internet Technologies and Systems: December 11, 1997, Monterey, California, pp. 241–252.
[34] Dang Van Bac, Ho Bao Quoc (2007), "Automatic construction of English-Vietnamese parallel corpus through web mining", In: 2007 IEEE International Conference on Research, Innovation and Vision for the Future, RIVF 2007, IEEE, pp. 261–266.
[35] Deerwester Scott, Furnas George W., Landauer Thomas K., Harshman
Richard (1990), "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, vol. 41, no. 6, pp. 391–407.
[36] Devi Pooja, Gupta Ashlesha, Dixit Ashutosh (2014), "Comparative Study of HITS and PageRank Link based Ranking Algorithms", International Journal of Advanced Research in Computer and Communication Engineering, vol. 3, no. 2, pp. 5749–5754.
[37] Dinh Quang Thang, Le Hong Phuong, Nguyen Thi Minh Huyen, Nguyen Cam Tu, Rossignol Mathias, Vu Xuan Luong (2008), "Word segmentation of Vietnamese texts: a comparison of approaches", In: 6th international conference on Language Resources and Evaluation - LREC, pp. 1933–1936.
[38] Dou Zhicheng, Song Ruihua, Wen Ji-Rong (2007), "A large-scale evaluation and analysis of personalized search strategies", Proceedings of the 16th international conference on World Wide Web - WWW ’07, pp. 581.
[39] Fagan Joel L (1987), "Experiments in Automatic Phrase Indexing For Document Retrieval: A Comparison of Syntactic and Non-Syntactic Methods", In: Proc tenth Ann Intl ACM SIGIR Conf on Research and Development in Information Retrieval, pp. 91–101.
[40] Fan Weiguo, Fox Edward A., Pathak Praveen, Wu Harris (2004), "The effects of fitness functions on genetic programming-based ranking discovery for web search", Journal of the American Society for Information Science and Technology, vol. 55, no. 7, pp. 628–636.
[41] Fan Weiguo, Gordon Michael D., Pathak Praveen (2004), "A generic ranking function discovery framework by genetic programming for information retrieval", Information Processing and Management, vol. 40, pp. 587–602.
[42] Ferro Nicola, Peters Carol (2009), "CLEF 2009 Ad Hoc Track Overview: TEL & Persian Tasks", In: Proceedings of the 10th Cross-language Evaluation Forum Conference on Multilingual Information Access Evaluation: Text Retrieval Experiments (CLEF’09), pp. 13–35.
[43] Frakes William B., Baeza-Yates Ricardo (1992), "Information Retrieval:
Data Structures & Algorithms", 1st ed., Prentice Hall.
[44] Freund Yoav, Iyer Raj, Schapire Robert E., Singer Yoram (2003), "An Efficient Boosting Algorithm for Combining Preferences", The Journal of Machine Learning Research, vol. 4, pp. 933–969.
[45] Friedman J.H (2001), "Greedy function approximation: A gradient boosting machine", Annals of Statistics, vol. 29, no. 5, pp. 1189–1232.
[46] Gaillard Benoit, Bouraoui Jean-Leon, Guimier de Neef Emilie, Boualem Malek (2010), "Query expansion for Cross Language Information Retrieval Improvement", 2010 Fourth International Conference on Research Challenges in Information Science (RCIS), IEEE, pp. 337–342.
[47] Gao Jianfeng, Nie Jian-Yun (2006), "A Study of Statistical Models for Query Translation: Finding a Good Unit of Translation", In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 194–201.
[48] Gao Jianfeng, Nie Jian-yun, He Hongzhao, Chen Weijun, Zhou Ming (2002), "Resolving Query Translation Ambiguity using a Decaying Co-occurrence Model and Syntactic Dependence Relations", In: 25th ACM SIGIR conference on Research and development in information retrieval, pp. 183–190.
[49] Gao Jianfeng, Nie Jian-Yun, Xun Endong, Zhang Jian, Zhou Ming, Huang Changning (2001), "Improving query translation for cross-language information retrieval using statistical models", In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’01, ACM Press, pp. 96–104.
[50] Gey Fredric (2009), "Romanization – An Untapped Resource for Out-of-Vocabulary Machine Translation for CLIR", In: SIGIR Workshop on Information Access in a Multilingual World.
[51] Gey Fredric, Aitao Chen (1998), "Phrase discovery for English and cross-language retrieval at TREC 6", In: Proceedings of the sixth text retrieval conference (TREC-6), pp. 637–648.
[52] Ghorab M Rami, Zhou Dong, Lawless Seamus, Wade Vincent (2012), "Multilingual user modeling for personalized re-ranking of
multilingual web search results", CEUR Workshop Proceedings, vol. 872, pp. 1–4.
[53] Grinstead Charles Miller, Snell James Laurie (2007), "Introduction to Probability" [Internet], Swarthmore College, American Mathematical Society, 520 p.
[54] Hadjouni Myriam, Haddad Mohamed Ramzi, Baazaoui Hajer (2010), "Personalized Information Retrieval Approach", Information Retrieval.
[55] Hawking David, Thistlewaite Paul (1995), "Proximity Operators - So Near And Yet So Far", In: Proceedings of TREC-4, pp. 295–304.
[56] He Daqing, Ahn Jae-wook (2006), "Pitt at CLEF05: Data Fusion for Spoken Document Retrieval", Workshop of the Cross-Language Evaluation Forum, CLEF 2005, vol. 4022, pp. 773–782.
[57] He Daqing, Wu Dan (2008), "Translation enhancement: a new relevance feedback method for cross-language information retrieval", In: Proceedings of the 17th ACM conference on Information and knowledge management, pp. 729–738.
[58] Helou Mamoun Abu, Palmonari Matteo, Jarrar Mustafa (2016), "Effectiveness of automatic translations for cross-lingual ontology mapping", Journal of Artificial Intelligence Research, vol. 55, pp. 165–208.
[59] Herbert Benjamin, Szarvas Gyorgy, Gurevych Iryna (2011), "Combining query translation techniques to improve cross-language information retrieval", In: ECIR 2011, pp. 712–715.
[60] Herbrich Ralf, Graepel Thore, Obermayer Klaus (2000), "Large Margin Rank Boundaries for Ordinal Regression", In: Advances in Large Margin Classifiers, MIT Press, pp. 115–132.
[61] Hiemstra Djoerd, Kraaij Wessel, Pohlmann Renée, Westerveld Thijs (2000), "Translation Resources, Merging Strategies, and Relevance Feedback for Cross-Language", Cross-Language Information Retrieval and Evaluation Workshop of Cross-Language Evaluation Forum CLEF 2000, pp. 102–115.
[62] Hiemstra Djoerd, Mihajlovic Vojkan (2010), "A database approach to information retrieval: The remarkable relationship between language models and region models" [Internet], Advances in Information Retrieval Theory.
[63] Ho Bao Quoc, Dang Van Bac,
Luong Minh Vy, Dong Thi Bich Thuy (2008), "English-Vietnamese Cross-Language Information Retrieval: An experimental study", In: 2008 IEEE International Conference on Research, Innovation and Vision for the Future in Computing and Communication Technologies, pp. 107–113.
[64] Hoang Huu Hanh, Tjoa A Min (2006), "The State of the Art of Ontology-based Query Systems: A Comparison of Existing Approaches", In: International Conference on Computing and Informatics ICOCI.
[65] Huynh Cong Phap (2011), "New approach for collecting high quality parallel corpora from multilingual websites", In: Proceedings of the 13th International Conference on Information Integration and Web-based Applications and Services, ACM Press, pp. 341–344.
[66] Javed A Aslam, Montague Mark (2001), "Models for metasearch", In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 276–284.
[67] Jeh Glen, Widom Jennifer (2002), "SimRank: A Measure of Structural-Context Similarity", Proceedings of the eighth ACM SIGKDD international …, pp. 538–543.
[68] Joachims Thorsten (2002), "Optimizing search engines using clickthrough data", In: KDD ’02, pp. 133–142.
[69] Kanoulas Evangelos (2009), "Building Reliable Test and Training Collections in Information Retrieval", College of Computer and Information Science, Northeastern University, Boston, Massachusetts.
[70] Kent Chow Kok, Salim Naomie (2010), "Web Based Cross Language Plagiarism Detection", 2010 Second International Conference on Computational Intelligence, Modelling and Simulation, vol. 1, no. 1, pp. 199–204.
[71] Kim S.U.N., Zhang Byoung-tak (2003), "Genetic Mining of HTML Structures for Effective Web-Document Retrieval", Applied Intelligence, vol. 18, pp. 243–256.
[72] Kishida Kazuaki (2005), "Technical issues of cross-language information retrieval: a review", Information Processing & Management, vol. 41, no. 3, pp. 433–455.
[73] Kleinberg Jon M (1999), "Authoritative sources in a hyperlinked environment",
Journal of the ACM, vol. 46, no. 5, pp. 604–632.
[74] Klementiev Alexandre, Roth Dan, Small Kevin (2007), "An Unsupervised Learning Algorithm for Rank Aggregation", Proceedings of European Conference on Machine Learning.
[75] Kraaij Wessel, Nie Jian-yun, Simard Michel (2003), "Embedding Web-Based Statistical Translation Models in Cross-Language", Comput Linguis, vol. 29, no. 3, pp. 381–419.
[76] Kraft Donald H., Buell Duncan A (1983), "Fuzzy sets and generalized Boolean retrieval systems", International Journal of Man-Machine Studies, vol. 19, no. 1, pp. 45–56.
[77] Kumaran Giridhar, Carvalho Vitor R (2009), "Reducing long queries using query quality predictors", In: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, ACM Press, pp. 564–571.
[78] Lafferty John, Zhai Chengxiang (2001), "Document language models, query models, and risk minimization for information retrieval", In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 111–119.
[79] Lâm Tùng Giang, Võ Trung Hùng (2013), "Đánh giá thực nghiệm mô hình truy vấn thông tin đa ngữ" [Experimental evaluation of multilingual information retrieval models; in Vietnamese], In: Hội nghị quốc gia lần thứ VI Nghiên cứu ứng dụng Công nghệ thông tin, pp. 103–107.
[80] Lâm Tùng Giang, Võ Trung Hùng (2013), "Ứng dụng lập trình di truyền học xếp hạng" [Applying genetic programming to learning to rank; in Vietnamese], Tạp chí Khoa học Công nghệ trường Đại học Kỹ thuật, vol. 92, pp. 58–63.
[81] Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2013), "Experiments with query translation and re-ranking methods in Vietnamese-English bilingual information retrieval", In: Proceedings of the Fourth Symposium on Information and Communication Technology - SoICT ’13, ACM Press, pp. 118–122.
[82] Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2013), "Building Evaluation Dataset in Vietnamese Information Retrieval", Journal of Science and Technology Danang University, vol. 12, no. 1, pp. 37–41.
[83] Lam Tung Giang, Vo Trung
Hung, Huynh Cong Phap (2015), "Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems", International Journal of Engineering Research & Technology (IJERT), vol. 4, no. 4, pp. 146–151.
[84] Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2015), "Improve Cross Language Information Retrieval with Pseudo-Relevance Feedback", In: FAIR 2015, pp. 315–320.
[85] Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2015), "Building proximity models for Cross Language Information Retrieval", Issue on Information and Communication Technology - University of Danang, vol. 1, no. 1, pp. 8–12.
[86] Lâm Tùng Giang, Võ Trung Hùng, Huỳnh Công Pháp (2016), "Áp dụng học máy dựa lập trình di truyền tìm kiếm Web xuyên ngữ" [Applying machine learning based on genetic programming to cross-language Web search; in Vietnamese], Tạp chí Khoa học Công nghệ, Đại học Đà Nẵng, vol. 1, no. 98, pp. 93–97.
[87] Lavrenko Victor, Choquette Martin, Croft W Bruce (2002), "Cross-lingual relevance models", In: SIGIR-2002, pp. 175–182.
[88] Lavrenko Victor, Croft W Bruce (2001), "Relevance-Based Language Models", In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 120–127.
[89] Le Hong Phuong, Roussanaly Azim, Nguyen Thi Minh Huyen, Rossignol Mathias (2010), "An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts", In: Traitement Automatique des Langues Naturelles - TALN 2010, pp. 19–23.
[90] Lee Joon Ho (1995), "Combining multiple evidence from different properties of weighting schemes", In: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’95, pp. 180–188.
[91] Lee Chia-Jung, Croft W Bruce (2014), "Cross-Language Pseudo-Relevance Feedback Techniques for Informal Text", In: Advances in Information Retrieval, SE - 22, 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014, Proceedings, pp. 260–272.
[92] Lee Kyung Soon, Park Young Chan, Choi Key Sun (2001), "Re-ranking model based on document clusters",
Information Processing and Management, vol. 37, no. 1, pp. 1–14.
[93] Lehtokangas Raija, Keskustalo Heikki, Järvelin Kalervo (2008), "Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments", Journal of the American Society for Information Science and Technology, vol. 59, no. 3, pp. 476–488.
[94] Levow Gina-Anne, Oard Douglas W., Resnik Philip (2005), "Dictionary-based techniques for cross-language information retrieval", Information Processing & Management, vol. 41, no. 3, pp. 523–547.
[95] Lewandowski Dirk (2012), "New perspectives on web search engine research", Web Search Engine Research, vol. 12, pp. 1–17.
[96] Li Hang (2011), "Learning to Rank for Information Retrieval and Natural Language Processing" [Internet], Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 1-113 p.
[97] Liu T.Y (2011), "Learning to rank for information retrieval" [Internet], Springer.
[98] Liu Yu-Ting, Liu Tie-Yan, Qin Tao, Ma Zhi-Ming, Li Hang (2007), "Supervised rank aggregation", In: Proceedings of the 16th international conference on World Wide Web - WWW ’07, pp. 481–490.
[99] Maeda Akira, Sadat Fatiha, Yoshikawa Masatoshi, Uemura Shunsuke (2000), "Query term disambiguation for Web cross-language information retrieval using a search engine", In: IRAL ’00, ACM Press, pp. 25–32.
[100] Manning Christopher D., Raghavan Prabhakar, Schutze Hinrich (2008), "Introduction to Information Retrieval" [Internet], Cambridge University Press.
[101] Manoj M., Jacob Elizabeth (2008), "Information retrieval on Internet using meta-search engines: A review", Journal of Scientific & Industrial research, vol. 67, pp. 739–746.
[102] Metzler D., Croft W.B. (2007), "Linear feature-based models for information retrieval", Information Retrieval, vol. 10, no. 3, pp. 257–274.
[103] Mirna Adriani Ihsan Wahyu (2006), "The Performance of a Machine Translation-Based English-Indonesian CLIR System", Accessing Multilingual Information Repositories, vol. 4022, pp.
151–154.
[104] Mizzaro Stefano (1979), "Information retrieval: theory and practice", In: Proceedings of the Joint IBM/University of Newcastle upon Tyne Seminar on Data Base Systems, pp. 1–14.
[105] Mizzaro Stefano (1998), "How many relevances in information retrieval?", Interacting with Computers, vol. 10, no. 3, pp. 303–320.
[106] Moghadasi Shiva Imani, Ravana Sri Devi, Raman Sudharshan N (2013), "Low-cost evaluation techniques for information retrieval systems: A review", Journal of Informetrics, Elsevier Ltd, vol. 7, no. 2, pp. 301–312.
[107] Mukerjee Amitabha, Raina Achla M., Kapil Kumar, Goyal Pankaj, Shukla Pushpraj (2003), "Universal Networking Language: A Tool for Language-Independent Semantics", In: Indo UK Workshop on Language Engineering for South Asian Languages.
[108] Ngo Quoc Hung, Winiwarter Werner, Wloka Bartholomaus (2013), "EVBCorpus - A Multi-Layer English-Vietnamese Bilingual Corpus for Studying Tasks in Comparative Linguistics", International Joint Conference on Natural Language Processing, no. October, pp. 1–9.
[109] Nguyen Han Doan (2007), "Vietnamese-English Cross-language information retrieval (CLIR) using bilingual dictionary", In: International Workshop on Advanced Computing and Applications, Ho Chi Minh City.
[110] Nguyen Dong (2008), "Query Translation for Cross-lingual Information Retrieval using Wikipedia", In: 9th Twente Student Conference on IT.
[111] Nguyen Dong, Overwijk Arnold, Hauff Claudia, Trieschnigg Dolf R.B., Hiemstra Djoerd, De Jong Franciska (2009), "WikiTranslate: query translation for cross-lingual information retrieval using only Wikipedia", Evaluating Systems for Multilingual and Multimodal Information Access, vol. 5706, pp. 58–65.
[112] Nguyen Van Be Hai, Wilkinson Ross, Zobel Justin (1997), "Cross-language Retrieval In English and Vietnamese", AAAI Technical Report, pp. 143–145.
[113] Nie Jian-Yun (2010), "Cross-Language Information Retrieval", Morgan & Claypool Publishers.
[114] Nie Jian-Yun, Simard Michel, Isabelle Pierre, Durand Richard
(1999), "Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web" Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’99, ACM Press, pp 74–81 [115] Oard Douglas W., Wang Jianqiang (2001), "Comparing Pirkola’s Structured Queries and Balanced Translation" In: Proceedings of the 2nd NTCIR Workshop on Research in Chinese & Japanese, Text Retrieval and Text Summarization [116] Page Lawrence, Brin Sergey, Motwani Rajeev, Winograd Terry (1998), "The PageRank Citation Ranking: Bringing Order to the Web" [Internet] [117] Pham Dang Duc, Tran Giang Binh, Pham Son Bao (2009), "A Hybrid Approach to Vietnamese Word Segmentation using Part of Speech tags difficulties and challenges segmentation :" In: The 1st International Conference on Knowledge and Systems Engineering, pp 154–161 [118] Pirkola Ari (1998), "The effects of query structure and dictionary setups in dictionary-based cross-language information retrieval" In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp 55–63 [119] Pirkola Ari, Hedlund Turid, Keskustalo Heikki, Järvelin Kalervo (2001), "Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings" Information Retrieval, vol 4, no 3, pp 209–230 [120] Ponte Jay, Croft Bruce (1998), "A Language Modeling Approach To Information Retrieval" Proceedings of the 21st annual international ACM SIGIR Conference on Research and Development in Information Retrieval, pp 275–281 [121] Pretschner Alexander, Universit Technische, Gauch Susan, Hall Snow (1999), "Ontology Based Personalized Search The University of Kansas" , no 97, pp 391–398 [122] Qin Tao, Liu Tie-Yan, Xu Jun, Li Hang (2010), "LETOR: A benchmark collection for research on learning to rank for information retrieval" Information Retrieval, vol 13, no 4, pp 346–374 [123] Qin Tao, 
Liu Tie Yan, Xu Jun, Li Hang (2010), "LETOR: A benchmark collection for research on learning to rank for information retrieval" Information Retrieval, vol 13, no 4, pp 346–374 - 141 - [124] Rahman Shihab, Chapa Dolon, Kabir Shaily (2014), "A New Weighted Keyword Based Similarity Measure for Clustering Webpages" International Journal of Computer and Information Technology, vol 3, no 5, pp 929–933 [125] Rasolofo Yves, Savoy Jacques (2003), "Term Proximity Scoring for Keyword-Based Retrieval Systems" Lecture Notes in Computer Science, Springer, pp 207–218 [126] Renda M Elena, Straccia Umberto (2003), "Web Metasearch: Rank vs Score Based Rank Aggregation Methods" Proceedings of the 2003 ACM symposium on Applied computing - SAC ’03, pp 841–846 [127] Resnik Philip, Smith Noah A (2003), "The Web as a Parallel Corpus" Computational Linguistics, vol 29, pp 349–380 [128] Rieh Hae-young, Rieh Soo Young (2005), "Web searching across languages: Preference and behavior of bilingual academic users in Korea" Library & Information Science Research, vol 27, no 2, pp 249–263 [129] Robertson S (2008), "On the history of evaluation in IR" Journal of Information Science, vol 34, no 4, pp 439–456 [130] Robertson Stephen E., Jones Karen Sparck (1988), "Relevance weighting of search terms" Document retrieval systems, pp 143–160 [131] Robertson Stephen E., Walker Stephen, Hancock-Beaulieu Micheline, Gull Aarron, Lau Marianna (1994), "Okapi at TREC-3" In: Proceedings of 3rd Text REtrieval Conference, pp 109–126 [132] Rocchio J.J (1971), "Relevance Feedback in Information Retrieval" In: SMART Retrieval System Experimens in Automatic Document Processing, pp 313–323 [133] Sadat Fatiha (2010), "Research on Query Disambiguation and Expansion for Cross-Language Information Retrieval" Communications of the IBIMA [134] Salton Gerard (1970), "Automatic processing of foreign language documents" Journal of the American Society for Information Science, vol 21, no 3, pp 187–194 [135] Salton Gerard, 
Buckley Chris (1990), "Improving retrieval performance by relevance feedback" Journal of the American Society for Information Science, vol 41, no 4, pp 288–297 [136] Salton G., Wong A., Yang C.S (1975), "A vector space model for automatic indexing" Communications of the ACM., pp 613–620 [137] Sanderson M., Clough P (2004), "Measuring pseudo relevance feedback & CLIR" In: 27th ACM-SIGIR, pp 484–485 [138] Sarmah Jumi, Kumar Shikhar (2016), "Survey on Word Sense Disambiguation : An Initiative towards an Indo-Aryan Language" International Journal of Engineering and Manufacturing, vol 3, pp 37–52 [139] Savoy Jacques, Le Calvé Anne, Vrajitoru Dana (1997), "Report on the TREC-S Experiment: Data Fusion and Collection Fusion" In: Proceedings of the TREC’5, pp 489–502 [140] Schenkel Ralf, Broschart Andreas, Hwang Seungwon, Theobald Martin, Weikum Gerhard (2007), "Efficient Text Proximity Search" String Processing and Information Retrieval, pp 287–299 - 142 - [141] Shao Yingxia, Cui Bin, Chen Lei, Liu Mingming, Xie Xing (2015), "An efficient similarity search framework for SimRank over large dynamic graphs" Proceedings of the VLDB Endowment, vol 8, no 8, pp 838–849 [142] Sharma Vijay Kumar, Mittal Namita (2016), "Cross Lingual Information Retrieval (CLIR): Review of Tools, Challenges and Translation Approaches" Advances in Intelligent Systems and Computing, vol 433, pp 699–708 [143] Sharma Vijay Kumar, Mittal Namita (2016), "Exploiting Parallel Sentences and Cosine Similarity for Identifying Target Language Translation" Procedia Computer Science, The Author(s), vol 89, pp 428–433 [144] Shaw Joseph A., Fox Edward A., Tech Virginia (1994), "Combination of Multiple Searches" In: The Second Text REtrieval Conference (TREC-2), pp 243–252 [145] Singh Manjit, Singh Dheerendra, Singh Surender (2015), "Use of HTML Tags in Web Search" IJITKM, vol 8, no 2, pp 8–14 [146] Smiley David, Pugh Eric (2009), "Solr 1.4 Enterprise Search Server" Search., 336 p [147] Spark Jones, Rijsbergen C.J 
Van (1976), "Information retrieval test collections" Journal of Documentation, vol 32, no 1, pp 59–75 [148] Spink Amanda, Zimmer Michael (2008), "Web Search : Multidisciplinary perspectives" Journal of Chemical Information and Modeling., Springer, 160 p [149] Sun Jt, Zeng Hj, Liu Huan, Lu Yuchang (2005), "CubeSVD: a novel approach to personalized Web search" Proceedings of the 14th international conference on World Wide Web, pp 382–390 [150] Svore K.M., Kanani P.H., Khan N (2010), "How Good is a Span of Terms? Exploiting Proximity to Improve Web Retrieval" In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, pp 154–161 [151] Taghizadeh Nasrin (2016), "Automatic Wordnet Development for LowResource Languages using Cross-Lingual WSD" Journal of Artificial Intelligence Research, vol 56, pp 61–87 [152] Tan Bin, Shen Xuehua, Zhai Chengxiang (2006), "Mining Long-Term Search History to Improve Search Accuracy" Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 718–723 [153] Tao Wenbo, Li Guoliang (2014), "Efficient top-K SimRank-based similarity join" Proceedings of the 2014 ACM SIGMOD international conference on Management of data - SIGMOD ’14, pp 1603–1604 [154] Tao Tao, Zhai ChengXiang (2007), "An Exploration of Proximity Measures in Information Retrieval" In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR 07, pp 295–302 [155] Teufel Simone (2007), "An overview of evaluation methods in TREC ad hoc information retrieval and TREC question answering" In: Evaluation of Text and Speech systems, pp 163–186 [156] Tsai Ming-Feng, Liu Tie-Yan, Qin Tao, Chen Hsin-Hsi, Ma Wei-Ying - 143 - (2007), "FRank: A Ranking Method with Fidelity Loss" In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’07, pp 383 [157] 
Ture Ferhan, Lin Jimmy, Oard Douglas W (2012), "Combining Statistical Translation Techniques for Cross-Language Information Retrieval" Coling2012, vol 3, pp 2685–2702 [158] Wang Jue, Li Z., Yao Jinyi, Sun Zengqi, Li Mingjing, Ma Wei-ying (2006), "Adaptive user profile model and collaborative filtering for personalized news" Frontiers of WWW Research and Development-APWeb 2006, pp 474–485 [159] Wu Shengli, Bi Yaxin, Zeng Xiaoqin (2011), "The linear combination data fusion method in information retrieval" In: 22nd International Conference Database and Expert Systems Applications, pp 219–233 [160] Xu-wen Wang, Xiao-jie Wang, Jun-lian L.I (2015), "Cross-lingual Pseudo Relevance Feedback Based on Weak Relevant Topic Alignment" In: 29th Pacific Asia Conference on Language, Information and Computation, pp 529–534 [161] Xu Jinxi, Croft W Bruce (1996), "Query expansion using local and global document analysis" Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval SIGIR ’96, ACM Press, pp 4–11 [162] Xu Jun, Li Hang (2007), "AdaRank: a boosting algorithm for information retrieval" In: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp 391–398 [163] Xu Jinxi, Weischedel Ralph (2005), "Empirical studies on the impact of lexical resources on CLIR performance" Information Processing and Management, vol 41, no 3, pp 475–487 [164] Yahya Zulaini, Abdullah Muhamad Taufik, Azman Azreen, Kadir Rabiah Abdul (2013), "Query Translation Using Concepts Similarity Based on Quran Ontology for Cross-Language Information Retrieval" Journal of Computer Science, vol 9, no 7, pp 889–897 [165] Ye Zheng, He Ben, Huang Xiangji, Lin Hongfei (2010), "Revisiting Rocchio’s relevance feedback algorithm for probabilistic models" Lecture Notes in Computer Science, vol 6458 LNCS, pp 151–161 [166] Yeh Jen-yuan, Lin Jung-yi, Ke Hao-Ren, Yang Wei-Pang (2007), "Learning to Rank 
for Information Retrieval Using Genetic Programming" In: SIGIR 2007 workshop: Learning to Rank for Information Retrieval [167] Yu Weiren, Lin Xuemin, Zhang Wenjie, Chang Lijun, Pei Jian (2013), "More is Simpler: Effectively and Efficiently Assessing Node-Pair Similarities Based on Hyperlinks" Proceedings of the VLDB …, vol 7, no 1, pp 13–24 [168] Zhai Chengxiang, Lafferty John (2001), "A study of smoothing methods for language models applied to Ad Hoc information retrieval" In: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’01, pp 334–342 [169] Zhai ChengXiang, Lafferty John D (2001), "Model-based Feedback In The Language Modeling Approach To Information Retrieval" Cikm, pp 403–410 - 144 - [170] Zhang Ying, Huang Fei, Vogel Stephan (2005), "Mining translations of OOV terms from the web through cross-lingual query expansion" In: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, ACM Press, pp 669–670 [171] Zhou Dong, Truran Mark, Brailsford Tim, Ashman Helen (2008), "A Hybrid Technique for English-Chinese Cross Language Information Retrieval" ACM Trans Asian Lang Info Process, vol 7, no 2, pp 1–35 [172] Zhou Dong, Truran Mark, Brailsford Tim, Wade Vincent, Ashman Helen (2012), "Translation techniques in cross-language information retrieval" ACM Computing Surveys, vol 45, pp 1–44 [173] Zukerman Ingrid, Road Blackburn (2003), "Query Expansion and Query Reduction in Document Retrieval" In: Tools with Artificial Intelligence, 2003 Proceedings 15th IEEE International Conference, pp 552–559 - 145 - DANH MỤC CÁC CƠNG TRÌNH KHOA HỌC ĐÃ CÔNG BỐ [1] [2] [3] [4] [5] [6] [7] [8] [9] Giang L.T., Hùng V.T (2012), "Các phương pháp xếp hạng lại trộn kết tìm kiếm" Tạp chí Khoa học Cơng nghệ trường Đại học Kỹ thuật, vol 91, pp 59–64 Lâm Tùng Giang, Võ Trung Hùng (2013), "Đánh giá thực nghiệm mô hình truy vấn thơng tin đa ngữ" In: Hội 
nghị quốc gia lần thứ VI Nghiên cứu ứng dụng Công nghệ thông tin, pp 103–107 Lâm Tùng Giang, Võ Trung Hùng (2013), "Ứng dụng lập trình di truyền học xếp hạng" Tạp chí Khoa học Công nghệ trường Đại học Kỹ thuật, vol 92, pp 58–63 Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2013), "Building Evaluation Dataset in Vietnamese Information Retrieval" Journal of Science and Technology Danang University, vol 12, no 1, pp 37–41 Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2013), "Experiments with query translation and re-ranking methods in Vietnamese-English bilingual information retrieval" In: Proceedings of the Fourth Symposium on Information and Communication Technology - SoICT ’13, ACM Press, pp 118–122 Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2015), "Building Structured Query in Target Language for Vietnamese – English Cross Language Information Retrieval Systems" International Journal of Engineering Research & Technology (IJERT), vol 4, no 04, pp 146–151 Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2015), "Improve Cross Language Information Retrieval with Pseudo-Relevance Feedback" In: FAIR 2015, pp 315–320 Lam Tung Giang, Vo Trung Hung, Huynh Cong Phap (2015), "Building proximity models for Cross Language Information Retrieval" Issue on Information and Communication Technology- University of Danang, vol 1, no 1, pp 8–12 Lâm Tùng Giang, Võ Trung Hùng, Huỳnh Công Pháp (2016), "Áp dụng học máy dựa lập trình di truyền tìm kiếm Web xun ngữ" Tạp chí Khoa học Cơng nghệ, Đại học Đà Nẵng, vol 1, no 98, pp 93–97 - 146 - ... XẾP HẠNG TRANG WEB 31 1.5.1 Đặc thù tìm kiếm web 31 1.5.2 Các phương pháp xếp hạng trang Web 32 1.5.3 Xếp hạng trang Web tìm kiếm xuyên ngữ 36 1.6 CÁC HẠN CHẾ VÀ... ngơn ngữ đích;  Đề xuất phương pháp xếp hạng lại danh sách kết tìm kiếm truy vấn xuyên ngữ, trọng việc xếp hạng trang Web  Kết hợp áp dụng giải pháp đề xuất mô hình tìm kiếm web xuyên ngữ nhằm... 
