Doctoral thesis: Mining weighted sequential patterns in sequence databases

MINISTRY OF EDUCATION AND TRAINING
VIETNAM ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY

Trần Huy Dương

MINING WEIGHTED SEQUENTIAL PATTERNS IN SEQUENCE DATABASES

DOCTORAL THESIS IN COMPUTER SCIENCE

Specialization: Information Systems
Code: 48 01 04
Scientific supervisors: Dr. Nguyễn Trường Thắng, Prof. Dr. Vũ Đức Thi
Hanoi, 2021

DECLARATION

I declare that this is my own research work. The results presented in the thesis are new and truthful, and have not been published in anyone else's work. Results written jointly with my supervisors and other authors are included in the thesis with their consent. All source materials and papers consulted are cited and listed in the references in accordance with regulations.

Thesis author
PhD candidate Trần Huy Dương

ACKNOWLEDGEMENTS

First of all, I would like to express my deep gratitude to Dr. Nguyễn Trường Thắng and Prof. Dr. Vũ Đức Thi, who guided and helped me wholeheartedly throughout my research and the completion of this thesis.

I sincerely thank the leadership of the Institute of Information Technology, Vietnam Academy of Science and Technology, and the leadership of the Graduate University of Science and Technology for creating favorable conditions for my research. I also thank the staff of the Management Software Technology Department for their support at work, which gave me the time to concentrate on my research and complete the thesis.

Finally, I thank my family, friends and colleagues, who were a constant source of encouragement and support and gave me the motivation to complete the thesis.

Trần Huy Dương

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
INTRODUCTION
CHAPTER 1. OVERVIEW OF MINING WEIGHTED SEQUENTIAL PATTERNS IN SEQUENCE DATABASES
1.1 Overview of the state of research
1.2 Mining weighted sequential patterns in sequence databases
1.3 Mining weighted sequential patterns in sequence databases with time intervals
1.4 Mining high-utility sequential patterns in quantitative databases with time intervals
Conclusion of Chapter 1
CHAPTER 2. MINING WEIGHTED SEQUENTIAL PATTERNS IN SEQUENCE DATABASES WITH TIME INTERVALS
2.1 Introduction
2.2 Algorithm for mining top-k weighted frequent sequential patterns with time intervals (TopKWFP)
2.2.1 Problem statement
2.2.2 Idea of the algorithm
2.2.3 The TopKWFP algorithm
2.2.4 Analysis of the TopKWFP algorithm
2.2.5 Experimental evaluation
Conclusion of Chapter 2
CHAPTER 3. MINING HIGH-UTILITY SEQUENTIAL PATTERNS IN SEQUENCE DATABASES WITH TIME INTERVALS
3.1 Introduction
3.2 Algorithm for mining high-utility sequential patterns with time intervals (UIPrefixSpan)
3.2.1 Problem statement
3.2.2 Idea of the algorithm
3.2.3 The UIPrefixSpan algorithm
3.2.4 Analysis of the UIPrefixSpan algorithm
3.2.5 Experimental evaluation
3.3 Algorithm for mining high-utility sequential patterns with time intervals in [...] phase (HUISP)
3.3.1 Problem statement
3.3.2 Idea of the algorithm
3.3.3 The HUISP algorithm
3.3.4 Analysis of the HUISP algorithm
3.3.5 Experimental evaluation
Conclusion of Chapter 3
CONCLUSIONS AND RECOMMENDATIONS
LIST OF PUBLISHED WORKS
REFERENCES

LIST OF FIGURES

Figure 1.1 Research problems addressed in the thesis
Figure 2.1 Effect of the parameter k
Figure 2.2 Effect of the optimization strategies on running time
Figure 2.3 Effect of the optimization strategies on the number of candidates generated
Figure 2.4 Comparison of the WIPrefixSpan and TopKWFP algorithms
Figure 3.1 Distribution of the profit values of 1000 items (UIPrefixSpan)
Figure 3.2 Running time of UIPrefixSpan
Figure 3.3 Memory usage of UIPrefixSpan
Figure 3.4 Number of high-utility sequential patterns found by UIPrefixSpan
Figure 3.5 Distribution of the profit values of 1000 items (HUISP)
Figure 3.6 Running time of HUISP
Figure 3.7 Memory usage of HUISP
Figure 3.8 Effect of the number of sequential patterns on running time and memory

LIST OF TABLES

Table 1.1 Selected works related to the thesis
Table 1.2 The sequence database SDB
Table 1.3 Weights of the items in SDB
Table 1.4 The sequence database iSDB with time intervals
Table 1.5 Weights of the items in iSDB
Table 1.6 The sequence database QiSDB with time intervals
Table 1.7 Weights of the items in QiSDB
Table 1.8 Utility table of QiSDB
Table 1.9 Index table
Table 2.1 The sequence database iSDB with time intervals
Table 2.2 Weights of the items in iSDB
Table 2.3 Projected sequence database
Table 2.4 Experimental datasets
Table 2.5 Detailed statistics of the number of candidate sequential patterns generated
Table 3.1 Conditional database with prefix [...]
Table 3.2 Conditional database with prefix [...]
Table 3.3 Conditional database with prefix [...]
Table 3.4 Conditional database with prefix [...]
Table 3.5 Candidate sequential patterns for prefix [...]
Table 3.6 Statistics of mining high-utility sequential patterns with time intervals in QiSDB
Table 3.7 Utility of sequential patterns with [...] element
Table 3.8 Utility of the input sequences
Table 3.9 Utility table of sequential patterns with [...] element
Table 3.10 Index table of QiSDB
Table 3.11 Projected database QiSDB|
Table 3.12 Utility table of candidate patterns of length [...] with prefix [...]
Table 3.13 Projected database QiSDB|
Table 3.14 Utility table of candidate items of length [...] with prefix [...]
Table 3.15 Projected database QiSDB|
Table 3.16 Utility table of candidate items of length [...] with prefix [...]
Table 3.17 High-utility sequential patterns found with prefix [...]
Table 3.18 High-utility sequential patterns with time intervals in QiSDB
Table 3.19 Number of candidate sequential patterns and number of high-utility sequential patterns for UIPrefixSpan and HUISP

LIST OF ABBREVIATIONS

(Abbreviation: English term; meaning)
CSDL: Database; Vietnamese abbreviation of "cơ sở dữ liệu"
UL: Utility Level; algorithm for mining high-utility sequential patterns using the Apriori approach
US: Utility Span; algorithm for mining high-utility sequential patterns using the PrefixSpan approach
PrefixSpan: Prefix-Projected Sequential Patterns Mining Algorithm; algorithm for mining frequent sequential patterns using the pattern-growth approach
TopKWFP: Top-k weighted sequential pattern mining with item interval Algorithm; algorithm for mining top-k weighted sequential patterns with time intervals
WIPrefixSpan: Weighted sequential pattern mining with item interval Algorithm; algorithm for mining weighted sequential patterns with time intervals
UIPrefixSpan: High Utility Sequential Patterns with Time Interval Algorithm; two-phase algorithm for mining high-utility sequential patterns with time intervals
HUISP: High Utility Item Interval Sequential Pattern Algorithm; algorithm for mining high-utility sequential patterns with time intervals using a utility table
GSP: Generalized Sequential Pattern; algorithm for mining generalized sequential patterns
SDB: Sequence Database; sequence database
iSDB: Sequence Database with item interval; sequence database with time intervals
QiSDB: Quantitative Sequence Database with item interval; quantitative sequence database with time intervals
UCI: UC Irvine Machine Learning Repository; the UCI benchmark data repository

CONCLUSION OF CHAPTER 3

Chapter 3 presented algorithms for mining high-utility sequential patterns in quantitative sequence databases with time intervals. The thesis proposed two algorithms for this problem: UIPrefixSpan, and HUISP, which improves on it by using techniques such as a utility table and an index table. In these algorithms, every data item has an internal utility and an external utility, and the data sequences in the database carry time intervals. Both algorithms follow the pattern-growth approach of the original PrefixSpan algorithm and of WIPrefixSpan. UIPrefixSpan and HUISP are correct and complete, and the computational complexity of both algorithms was analyzed. The thesis ran experiments with both algorithms on real datasets, and the results support the correctness and feasibility of the proposed algorithms.

The paper on UIPrefixSpan was published in the proceedings of the workshop "Một số vấn đề chọn lọc của Công nghệ thông tin và Truyền thông" (Selected Problems of Information and Communication Technology) [CT2] and in the journal Cybernetics and Information Technologies [CT3]. The paper on HUISP was published in the proceedings of the same workshop [CT4] and in the Journal of Computer Science and Cybernetics [CT5].

CONCLUSIONS AND RECOMMENDATIONS

Main results of the thesis:

The author has met the objectives of the thesis. The thesis develops the theory and applications of weighted sequential pattern mining in sequence databases, focusing in particular on mining weighted sequential patterns in sequence databases with time intervals and in quantitative sequence databases with time intervals. Building on a study and analysis of earlier research on related problems, the thesis obtained the following results.

Contribution to the problem of mining weighted sequential patterns with time intervals in sequence databases with time intervals. Specifically:
- The thesis extends the problem of mining weighted sequential patterns with time intervals, which WIPrefixSpan solves for a given support threshold wminsup. It proposes the TopKWFP algorithm, which mines the top-k weighted sequential patterns with time intervals by gradually raising an initial support threshold wminsup. TopKWFP follows the pattern-growth approach of PrefixSpan, which builds only candidate sequences that occur in the database, unlike the Apriori approach, which also generates candidates that do not occur in the database. The author's contribution is a method for building candidate sequential patterns that guarantees the anti-monotone property and therefore allows recursive mining, combined with two strategies that follow the idea of the TKS algorithm and are applied here to sequence databases with time intervals: gradually raising wminsup, and generating the most promising candidate patterns first. As a result, wminsup rises quickly and the search space shrinks considerably. TopKWFP is an efficient algorithm for mining top-k weighted sequential patterns with time intervals in sequence databases with time intervals. The patterns it discovers are of practical value because the algorithm takes into account both the weights of the data items and the time intervals between sequences in the database.

Contribution to the problem of mining high-utility sequential patterns with time intervals in quantitative sequence databases with time intervals. Specifically:
- The thesis extends the problem of mining weighted sequential patterns with time intervals (WIPrefixSpan) and the problem of mining high-utility sequential patterns (US). It proposes the UIPrefixSpan algorithm, which mines high-utility sequential patterns with time intervals in quantitative sequence databases with time intervals. UIPrefixSpan follows the pattern-growth approach of PrefixSpan. The author's contribution is a method for building high-utility candidate patterns that guarantees the anti-monotone property and allows recursive mining, combined with a two-phase strategy: phase 1 generates the candidate patterns, and phase 2 recomputes the actual utility of each candidate to find the high-utility sequential patterns with time intervals. UIPrefixSpan is an efficient algorithm for this problem. The patterns it discovers are of practical value because the algorithm takes into account the internal and external utility of the data items as well as the time intervals between sequences in the database.
- To further improve on UIPrefixSpan, and drawing on the high-utility sequential pattern mining algorithm PHUS, the thesis proposes the HUISP algorithm for mining high-utility sequential patterns with time intervals in quantitative sequence databases with time intervals. HUISP follows the pattern-growth approach of PrefixSpan and uses the index table and utility table of PHUS. The author's contribution is a method for building high-utility candidate patterns that guarantees the anti-monotone property and allows recursive mining, a utility table structure that maintains a bound on the actual utility of the patterns while they are generated, and an index table structure that speeds up searching and the construction of projected databases. HUISP is an efficient algorithm for this problem. The patterns it discovers are of practical value because the algorithm takes into account the internal and external utility of the data items as well as the time intervals between sequences in the database.

One limitation of the applied part of the thesis is that the experiments rely only on datasets from the UCI Machine Learning Repository. The algorithms have not yet been tested or deployed on real-world databases such as stock market, financial or price data …

Future research directions:

The algorithms produced by the thesis, for mining weighted sequential patterns in sequence databases with time intervals and in quantitative sequence databases with time intervals, are built on the pattern-growth idea of PrefixSpan. One direction for research after the thesis is to develop other methods and improvements that mine more efficiently.

In practice, the data sequences in sequence databases and quantitative sequence databases with time intervals are continuously added to and updated.
Another research direction is therefore to mine weighted sequential patterns in growing sequence databases with time intervals and in growing quantitative sequence databases with time intervals, using the sliding-window concept so that mining does not have to rescan the entire sequence database after it grows.

A further direction is to develop rule mining for weighted sequence databases with time intervals and for quantitative sequence databases with time intervals.

LIST OF PUBLISHED WORKS

[CT1] Tran Huy Duong, Nguyen Truong Thang, Vu Duc Thi, Tran The Anh, "Mining top-K frequent sequential pattern in item interval extended sequence database", Journal of Computer Science and Cybernetics, Ha Noi, Vol. 34, No. 2, 2018, pp. 249-263.
[CT2] Trần Huy Dương, Trần Thế Anh, "Một thuật toán khai phá mẫu dãy lợi ích cao với khoảng cách thời gian", Kỷ yếu hội thảo Một số vấn đề chọn lọc của Công nghệ thông tin và Truyền thông, Quy Nhơn, 2017, pp. 225-241.
[CT3] Tran Huy Duong, Janos Demetrovics, Vu Duc Thi, Nguyen Truong Thang, Tran The Anh, "An algorithm for Mining High Utility Sequential Patterns with Time Interval", Cybernetics and Information Technologies (CIT), Bulgarian Academy of Sciences, Sofia, Vol. 19, No. 4, 2019, pp. 3-16 (Scopus).
[CT4] Trần Huy Dương, Trần Thế Anh, Nguyễn Tiến Thụy, Đặng Thị Oanh, "HUISP-Thuật toán khai phá mẫu dãy lợi ích cao với khoảng cách thời gian", Kỷ yếu hội thảo Một số vấn đề chọn lọc của Công nghệ thông tin và Truyền thông, Thái Bình, 2019, pp. 114-119.
[CT5] Tran Huy Duong, Nguyen Truong Thang, Vu Duc Thi, Tran The Anh, "High utility item Interval sequential pattern mining algorithm", Journal of Computer Science and Cybernetics, Ha Noi, Vol. 36, No. 1, 2020, pp. 1-15.
phương pháp tăng trưởng mẫu dãy Thuật toán khai phá top-k mẫu dãy trọng số có khoảng cách thời gian Thuật tốn khai phá mẫu dãy trọng số có khoảng cách thời gian Thuật toán khai phá mẫu dãy lợi

Title	Khai Phá Mẫu Dãy Có Trọng Số Trong Cơ Sở Dữ Liệu Dãy (Mining Weighted Sequential Patterns in Sequence Databases)
Author	Trần Huy Dương
Supervisors	Dr. Nguyễn Trường Thắng, Prof. Dr. Vũ Đức Thi
Institution	Graduate University of Science and Technology (Học viện Khoa học và Công nghệ)
Specialization	Information Systems
Document type	PhD thesis
Year of publication	2021
City	Hà Nội

Format
Pages	151
File size	3.31 MB