Research and development of models and algorithms for mining weighted and high-utility itemsets

MINISTRY OF EDUCATION AND TRAINING, MINISTRY OF NATIONAL DEFENCE
MILITARY TECHNICAL ACADEMY
ĐẬU HẢI PHONG
RESEARCH AND DEVELOPMENT OF MODELS AND ALGORITHMS FOR MINING WEIGHTED AND HIGH-UTILITY ITEMSETS
Major: Mathematical Foundations for Informatics
Code: 62.46.01.10
DOCTORAL THESIS IN MATHEMATICAL FOUNDATIONS FOR INFORMATICS
SCIENTIFIC SUPERVISORS: Dr. NGUYỄN MẠNH HÙNG, Assoc. Prof. Dr. ĐOÀN VĂN BAN
HANOI - 2018

DECLARATION
I declare that this thesis is a research work carried out by the author under the guidance of the supervisory team. The thesis uses information cited from various reference sources, and all cited information is clearly attributed. The experimental data and research results presented in the thesis are entirely truthful and have not been published by any other author or in any other work.

ACKNOWLEDGEMENTS
This thesis was carried out and completed at the Faculty of Information Technology, Military Technical Academy. These results would not have been possible without the direction and support of my supervisors. I am always grateful to everyone who helped me during this research and afterwards. I am deeply indebted to my two supervisors, great teachers who wholeheartedly guided and helped me in my research.
I sincerely thank the leadership and the lecturers of the Faculty of Information Technology and the Postgraduate Office of the Military Technical Academy for creating favourable conditions and helping me throughout my study and research.
I thank the Board of Rectors, the lecturers, and my colleagues at Thăng Long University for giving me the opportunity to concentrate on research.
I dedicate all my love and gratitude to my family, my parents, my wife and children, my siblings, and my relatives, whose strong encouragement helped me complete this thesis.
Thank you sincerely!
Thesis author: Đậu Hải Phong

TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES AND GRAPHS
INTRODUCTION
CHAPTER 1. OVERVIEW OF FREQUENT ITEMSET MINING
1.1 General introduction
1.2 Frequent itemsets
1.2.1 Basic concepts
1.2.2 Some methods for mining frequent itemsets
1.3 Weighted frequent itemsets
1.3.1 Basic concepts
1.3.2 Some methods for mining weighted frequent itemsets
1.3.3 A vertical algorithm for mining weighted frequent itemsets
1.4 High-utility itemsets
1.4.1 Basic concepts
1.4.2 Some methods for mining high-utility itemsets
1.5 Chapter conclusion
CHAPTER 2. HIGH-UTILITY ITEMSET MINING ALGORITHMS BASED ON THE CWU MODEL
2.1 General introduction
2.2 An efficient model for mining high-utility itemsets
2.2.1 Problem statement
2.2.2 The proposed CWU model
2.3 The HP algorithm for mining high-utility itemsets based on projection indexes and the CWU model
2.3.1 Description of the HP algorithm
2.3.2 Worked example of the HP algorithm
2.3.3 Computational complexity of the HP algorithm
2.3.4 Experimental results
2.4 The parallel PPB algorithm for mining high-utility itemsets based on projection indexes and utility lists
2.4.1 Structures used in the PPB algorithm
2.4.2 Description of the parallel PPB algorithm
2.4.3 Worked example of the PPB algorithm
2.4.4 Computational complexity of the PPB algorithm
2.4.5 Experimental results
2.5 The CTU-PRO+ algorithm
2.5.1 Some structures
2.5.2 Computational complexity of the CTU-PRO+ algorithm
2.5.3 Experimental results
2.6 Chapter conclusion
CHAPTER 3. HIGH-UTILITY ITEMSET MINING ALGORITHMS ON THE UTILITY-LIST TREE AND THE RTWU STRUCTURE
3.1 An efficient data structure for mining high-utility itemsets
3.1.1 Description of the CUP structure
3.1.2 Worked example of the CUP structure
3.2 The HUI-Growth algorithm
3.2.1 Worked example of the HUI-Growth algorithm
3.2.2 Complexity of the HUI-Growth algorithm
3.2.3 Experimental results
3.3 The RTWU structure for pruning candidate sets
3.4 The EAHUI-Miner algorithm based on the RTWU structure
3.4.1 Building extended utility lists
3.4.2 The EAHUI-Miner algorithm
3.4.3 Computational complexity of the EAHUI-Miner algorithm
3.4.4 The parallel PEAHUI-Miner algorithm
3.4.5 Experimental results
3.5 Chapter conclusion
CONCLUSIONS AND RECOMMENDATIONS
Results achieved
Future work
LIST OF THE AUTHOR'S SCIENTIFIC PUBLICATIONS RELATED TO THE THESIS
REFERENCES

LIST OF SYMBOLS AND ABBREVIATIONS
No. Abbreviation: English term
1. AU: Actual Utility
2. CFP: Compact Frequent Pattern
3. CSDL: Database (Vietnamese abbreviation of "cơ sở dữ liệu")
4. CUP: Compressed Utility Pattern
5. CWU: Candidate Weighted Utility
6. FI: Frequent Itemsets
7. FP: Frequent Pattern
8. IT: Index Table
9. HCWU: High Candidate Weighted Utility
10. LCWU: Low Candidate Weighted Utility
11. RTWU: Remaining Transaction Weighted Utilization (remaining transaction utility)
12. TC: Table Candidate (candidate table)
13. TWU: Transaction Weighted Utility
14. UL: Utility List
15. UT: Utility Table (transaction utility table)
16. VMUDG: Vertical Mining using Diffset Groups
17. VMWFP: Vertical Mining of Weighted Frequent Patterns

LIST OF TABLES
Table 1.1 Example database
Table 1.2 Diffset table of the items
Table 1.3 Weight table of the items
Table 1.4 Example transaction database
Table 1.5 Utility of the items
Table 2.1 Example transaction database
Table 2.2 Utility of the items
Table 2.3 TC1 table with 1-itemsets
Table 2.4 Index table ITA of the set {A}
Table 2.5 UTA table of item A
Table 2.6 TC1 table with 1-itemsets
Table 2.7 Transaction database after sorting and removing D
Table 2.8 TC1 table after updating CWU
Table 2.9 UTC table of item C
Table 2.10 Index table ITC of item C
Table 2.11 TC2 table with prefix C for a transaction
Table 2.12 TC2 table with prefix C for a transaction
Table 2.13 TC2 table with prefix C over the database
Table 2.14 TC1 table after updating CWU
Table 2.15 Index table IT{CB} of the set {CB}
Table 2.16 Comparison of CWU and TWU values
Table 2.17 Example transaction database
Table 2.18 External utility table of the items
Table 2.19 Utility table of the items in the transactions
Table 2.20 TC1 table with 1-itemsets
Table 2.21 Index table ITC of the set {C}
Table 2.22 Global TC1 table with 1-itemsets
Table 2.23 Index table ITC of item C
Table 2.24 TC2 table with prefix C for a transaction
Table 2.25 TC2 table with prefix C for a transaction
Table 2.26 TC2 table with prefix C
Table 2.27 Index table IT{CB} of the set {CB}
Table 2.28 Comparison of the number of candidates between utility lists and TWU
Table 2.29 Example transaction database
Table 2.30 Utility table of the items
Table 3.1 Transaction database
Table 3.2 Utility table of the items
Table 3.3 Transaction database
Table 3.4 Transaction database
Table 3.5 Utility of the items
Table 3.6 Extended utility list of the set {bc}
Table 3.7 Properties of the databases
Table 3.8 Comparison of the number of candidate sets

Figure 3.45 Running time of the EAHUI-Miner and PEAHUI-Miner algorithms on the T10I4D200K dataset

3.5 Chapter conclusion
In this chapter, the thesis proposes the compressed utility pattern (CUP) tree structure combined with utility lists, together with the HUI-Growth algorithm for mining high-utility itemsets [IV]. The CUP structure builds on the strengths of utility lists for pruning candidate sets and on tree-based data compression. The experimental results in Figures 3.6 and 3.7 show that HUI-Growth, using the CUP structure combined with utility lists, runs faster than UP-Growth [62] and HUI-Miner [38] on the Mushroom and T40I4D100K datasets.
Next, the thesis proposes the RTWU structure [VI], which is based on the remaining transaction utility combined with extended utility lists of item pairs and reduces the number of candidate sets. The correctness of the structure is established by Theorems 3.1 and 3.2. Building on these two theorems, the thesis proposes two algorithms for mining high-utility itemsets that use the RTWU structure: EAHUI-Miner [VI] and its parallel counterpart PEAHUI-Miner [VI]. The experimental results in Table 3.8 show that EAHUI-Miner generates fewer candidate sets than FHM [26]. The results in Figures 3.11-3.14 show that EAHUI-Miner runs faster than FHM [26] and EFIM [77] on sparse databases with many transactions, such as Foodmart, T10I4D100K, and T10I4D200K, but runs slower on dense databases such as Mushroom.

CONCLUSIONS AND RECOMMENDATIONS
Results of the thesis:
With the goal of building models, data structures, and algorithms that improve the efficiency of mining weighted frequent itemsets and high-utility itemsets, the thesis achieved the following main results:
- Proposed the Candidate Weighted Utility (CWU) model [II]. The proposal rests on an analysis showing that the TWU model, which many algorithms use to prune candidates, is ineffective because its estimated bound is much higher than the actual utility. From the CWU model, the thesis proposes two algorithms for mining high-utility itemsets: HP [II], which uses projection indexes, and CTU-PRO+ [III], which uses a tree structure. Both generate fewer candidates and run faster than several other algorithms.
- Proposed the RTWU (Remaining Transaction-Weighted Utility) structure, based on the remaining transaction utility combined with extended utility lists of item pairs, for pruning candidate sets. Based on the RTWU structure, the thesis proposes the EAHUI-Miner algorithm [VI] and the parallel algorithm PEAHUI-Miner [VI] for mining high-utility itemsets. In the experiments they generate fewer candidate sets and run faster on sparse databases with large numbers of transactions.
- Proposed the parallel algorithm PPB for mining high-utility itemsets. It combines projection indexes, utility lists, and a method for storing the utility of each item in each transaction so that the iutil and rutil values in the utility lists can be computed quickly.
- Proposed the compressed utility pattern (CUP) tree structure combined with utility lists [IV]. Each node of the CUP tree stores an itemset and its utility list. Items are sorted in descending order of frequency so that the tree has fewer nodes. To mine high-utility itemsets on the CUP tree, the thesis proposes the HUI-Growth algorithm [IV].
- Proposed the VMWFP algorithm [I] for mining weighted frequent itemsets based on the diffset structure. The VMWFP algorithm shows that the groups, and the classes within each group, can be processed independently. The thesis therefore proposes the parallel algorithm PVMWFP [I] on a shared-memory model.

Future work:
The thesis focuses on the important step in association rule mining, namely mining weighted frequent itemsets and high-utility itemsets. Specifically, it proposes models, structures, and sequential and parallel algorithms for mining weighted frequent itemsets and high-utility itemsets on transaction databases. However, data keeps growing larger and more complex and calls for suitable structures and algorithms. The directions for further development are therefore:
- Study more efficient models, structures, and algorithms for mining weighted frequent itemsets and high-utility itemsets.
- Incorporate fuzzy data mining techniques into the proposed algorithms.
- Implement and test the algorithms on the Hadoop platform with the MapReduce model for big-data problems.

LIST OF THE AUTHOR'S SCIENTIFIC PUBLICATIONS RELATED TO THE THESIS
[I] Đậu Hải Phong, Nguyễn Mạnh Hùng, Đoàn Văn Ban, "Parallel algorithm for vertical mining of weighted frequent patterns", 15th National Workshop: Selected Issues in Information and Communication Technology, pp. 453-456, 2012 (in Vietnamese).
[II] Đậu Hải Phong, Nguyễn Mạnh Hùng, "An efficient model for mining high-utility itemsets", Journal of Information Technology and Communications: Research, Development and Application of ICT, Vol. V-1, No. 13 (33), pp. 26-37, 6-2015 (in Vietnamese).
[III] Đậu Hải Phong, Đoàn Văn Ban, Đỗ Mai Hường, "A compressed model for mining high-utility itemsets", 8th National Scientific Conference on Research and Application of Information Technology, Institute of Information Technology, Vietnam National University, Hanoi, pp. 359-370, 2015 (in Vietnamese).
[IV] Đậu Hải Phong, Nguyễn Mạnh Hùng, "An efficient data structure for mining high-utility itemsets", 19th National Workshop: Selected Issues in Information and Communication Technology, pp. 60-67, 2016 (in Vietnamese).
[V] Đậu Hải Phong, Nguyễn Mạnh Hùng, "A parallel method for mining high-utility itemsets based on projection indexes", Journal of Information Technology and Communications: Research, Development and Application of ICT, Vol. V-1, No. 17 (37), pp. 31-40, 6-2017 (in Vietnamese).
[VI] Nguyen Manh Hung, Dau Hai Phong, "Parallel mining for high utility itemsets mining by efficient data structure", Information and Communication Technology, Research and Development on Information and Communications Technology, Volume E-3, No. 14, 9-2017.

REFERENCES
In Vietnamese:
1. Nguyễn Duy Hàm (2016), Development of efficient algorithms for mining itemsets in quantitative databases with item hierarchies, PhD thesis, University of Science, Vietnam National University, Hanoi.
2. Nguyễn Huy Đức (2010), Mining high-share and high-utility itemsets in databases, PhD thesis, Institute of Information Technology.
In English:
3. Agarwal R.C., Aggarwal C.C., and Prasad V.V.V. (2001). A Tree Projection Algorithm for Generation of Frequent Item Sets. J Parallel Distrib Comput, 61(3), 350–371.
4. Aggarwal C.C., Bhuiyan M.A., and Hasan M.A. (2014). Frequent Pattern Mining Algorithms: A Survey. Frequent Pattern Mining. Springer, Cham, 19–64.
5. Agrawal R., Imielinski T., and Swami A. (1993). Mining Association Rules between Sets of Items in Large Databases.
6. Agrawal R. and Srikant R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. Morgan Kaufmann Publishers Inc., 487–499.
7. Ahmed C.F., Tanbeer S.K., Jeong B.-S. et al. (2009). An Efficient Candidate Pruning Technique for High Utility Pattern Mining. Advances in Knowledge Discovery and Data Mining, Springer, Berlin, Heidelberg, 749–756.
8. Anastasiu D.C., Iverson J., Smith S. et al. (2014). Big Data Frequent Pattern Mining. Frequent Pattern Mining. Springer International Publishing, 225–259.
9. Baralis E., Cerquitelli T., Chiusano S. et al. (2013). P-Mine: Parallel itemset mining on large datasets. 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), 266–271.
10. Bhanderi S.D. and Garg S. (2012). Parallel frequent set mining using inverted matrix approach. 2012 Nirma University International Conference on Engineering (NUiCONE), 1–4.
11. Cai C.H., Fu A.W.C., Cheng C.H. et al. (1998). Mining Association Rules with Weighted Items. Proceedings of the 1998 International Symposium on Database Engineering & Applications, Washington, DC, USA, IEEE Computer Society, 68–.
12. C.F A and S.K T. Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases. 2009.
13. Chan R., Yang Q., and Shen Y.-D. (2003). Mining High Utility Itemsets. IEEE Computer Society, 19.
14. Chen Y. and An A. (2016). Approximate Parallel High Utility Itemset Mining. Big Data Res, 6,
26–42.
15. Cheng Lan G., Pei Hong T., and Tseng V.S. (2013). An efficient projection-based indexing approach for mining high utility itemsets.
16. Chung S.M. and Luo C. (2003). Parallel mining of maximal frequent itemsets from databases. Proceedings 15th IEEE International Conference on Tools with Artificial Intelligence, 134–139.
17. Dawar S. and Goyal V. (2014). UP-Hist Tree: An Efficient Data Structure for Mining High Utility Patterns from Transaction Databases. Proceedings of the 19th International Database Engineering & Applications Symposium, New York, NY, USA, ACM, 56–61.
18. Dlala I.O., Jabbour S., Sais L. et al. (2015). Parallel SAT based closed frequent itemsets enumeration. 2015 IEEE/ACS 12th International Conference of Computer Systems and Applications (AICCSA), 1–8.
19. El-Hajj M. and Zaïane O.R. (2003). Non-recursive Generation of Frequent K-itemsets from Frequent Pattern Tree Representations. Data Warehousing and Knowledge Discovery, Springer, Berlin, Heidelberg, 371–380.
20. El-hajj M. and Zaïane O.R. (2003). COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation. Workshop on Frequent Itemset Mining Implementations (FIMI’03) in conjunction with IEEE-ICDM.
21. El-Hajj M. and Zaïane O.R. (2003). Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM, 109–118.
22. El-Megid L.A.A., El-Sharkawi M.E., El-Fangary L.M. et al. (2009). Vertical Mining of Frequent Patterns Using Diffset Groups. 2009 Ninth International Conference on Intelligent Systems Design and Applications, 1196–1201.
23. Erwin A., Gopalan R.P., and Achuthan N.R. (2007). A Bottom-up Projection Based Algorithm for Mining High Utility Itemsets. Proceedings of the 2nd International Workshop on Integrating Artificial Intelligence and Data Mining - Volume 84, Darlinghurst, Australia, Australian Computer Society, Inc., 3–11.
24. Erwin A., Gopalan R.P., and Achuthan N.R. (2007). CTU-Mine: An Efficient High Utility Itemset Mining Algorithm Using the Pattern Growth Approach. 7th IEEE International Conference on Computer and Information Technology (CIT 2007), 71–76.
25. Erwin A., Gopalan R.P., and Achuthan N.R. (2008). Efficient Mining of High Utility Itemsets from Large Datasets. Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 554–561.
26. Fournier-Viger P., Wu C.-W., Zida S. et al. (2014). FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning. Foundations of Intelligent Systems, Springer International Publishing, 83–92.
27. Fournier-Viger P. and Zida S. (2015). FOSHU: Faster On-shelf High Utility Itemset Mining – with or Without Negative Unit Profit. Proceedings of the 30th Annual ACM Symposium on Applied Computing, New York, NY, USA, ACM, 857–864.
28. Fumarola F. and Malerba D. (2014). A parallel algorithm for approximate frequent itemset mining using MapReduce. 2014 International Conference on High Performance Computing Simulation (HPCS), 335–342.
29. Grahne G. and Zhu J. (2005). Fast algorithms for frequent itemset mining using FP-trees. IEEE Trans Knowl Data Eng, 17(10), 1347–1362.
30. Han J., Pei J., Yin Y. et al. (2000). Mining frequent patterns without candidate generation. ACM SIGMOD Rec, 29(2), 1–12.
31. Holsheimer M., Kersten M.L., Mannila H. et al. (1995), A Perspective on Databases and Data Mining, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands.
32. Huai Z. g. and Huang M. h. (2011). A weighted frequent itemsets Incremental Updating Algorithm base on hash table. 2011 IEEE 3rd International Conference on Communication Software and Networks, 201–204.
33. Kumar P. and S A.V. (2009). Parallel Method for Discovering Frequent Itemsets Using Weighted Tree Approach. 2009 International Conference on Computer Engineering and Technology, 124–128.
34. Lan Y.-J. and Qiu Y. (2005). Parallel frequent itemsets mining algorithm without intermediate result. 2005 International Conference on Machine Learning and Cybernetics, 2102–2107 Vol.
35. Lin C.-W., Hong T.-P., and Lu W.-H. (2010). Efficiently Mining High Average Utility Itemsets with a Tree Structure. Intelligent Information and Database Systems, Springer, Berlin, Heidelberg, 131–139.
36. Lin Y.C., Wu C.-W., and Tseng V.S. (2015). Mining High Utility Itemsets in Big Data. Advances in Knowledge Discovery and Data Mining. Springer International Publishing, Cham, 649–661.
37. Liu J., Wang K., and Fung B.C.M. (2012). Direct Discovery of High Utility Itemsets without Candidate Generation. 2012 IEEE 12th International Conference on Data Mining, 984–989.
38. Liu M. and Qu J. (2012). Mining High Utility Itemsets Without Candidate Generation. Proceedings of the 21st ACM International Conference on Information and Knowledge Management, New York, NY, USA, ACM, 55–64.
39. Liu Y., Liao W., and Choudhary A. (2005). A Two-phase Algorithm for Fast Discovery of High Utility Itemsets. Proceedings of the 9th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Berlin, Heidelberg, Springer-Verlag, 689–695.
40. Liu Y., Liao W., and Choudhary A. (2005). A Fast High Utility Itemsets Mining Algorithm. Proceedings of the 1st International Workshop on Utility-based Data Mining, New York, NY, USA, ACM, 90–99.
41. Lucchese C., Orlando S., and Perego R. (2007). Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures. Seventh IEEE International Conference on Data Mining (ICDM 2007), 242–251.
42. Manike C. and Om H. (2015). Time-Efficient Tree-Based Algorithm for Mining High Utility Patterns. Advances in Intelligent Informatics. Springer, Cham, 409–418.
43. Padhy N., Mishra D., Panigrahi R. et al. (2012). The survey of data mining applications and feature scope. arXiv preprint arXiv:1211.5723.
44. Padhy N., Mishra D.P., and Panigrahi R. (2012). The Survey of Data Mining Applications And Feature Scope. Int J Comput Sci Eng Inf Technol, 2(3), 43–58.
45. Park J.S., Chen M.-S., and Yu P.S. (1995). An Effective Hash-based Algorithm for Mining Association Rules. Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, ACM, 175–186.
46. Pillai J., Vyas O.P., and Muyeba M. (2013). HURI – A Novel Algorithm for Mining High Utility Rare Itemsets. Advances in Computing and Information Technology. Springer, Berlin, Heidelberg, 531–540.
47. Qiu H., Gu R., Yuan C. et al. (2014). YAFIM: A Parallel Frequent Itemset Mining Algorithm with Spark. 2014 IEEE International Parallel Distributed Processing Symposium Workshops, 1664–1671.
48. Ramkumar G.D., Sanjay R., and Tsur S. (1998). Weighted Association Rules: Model and Algorithm. Proc Fourth ACM Int’l Conf Knowledge Discovery and Data Mining.
49. Rathi S. and Dhote C.A. (2014). Using parallel approach in pre-processing to improve frequent pattern growth algorithm. 2014 International Conference on Information Systems and Computer Networks (ISCON), 72–76.
50. Ruan Y.-L., Liu G., and Li Q.-H. (2005). Parallel algorithm for mining frequent itemsets. 2005 International Conference on Machine Learning and Cybernetics, 2118–2121 Vol.
51. Savasere A., Omiecinski E., and Navathe S.B. (1995). An Efficient Algorithm for Mining Association Rules in Large Databases. Proceedings of the 21th International Conference on Very Large Data Bases, San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., 432–444.
52. Shankar S., Babu N., Purusothaman T. et al. (2009). A Fast Algorithm for Mining High Utility Itemsets. 2009 IEEE International Advance Computing Conference, 1459–1464.
53. Shenoy P., Haritsa J.R., Sudarshan S. et al. (2000). Turbo-charging Vertical Mining of Large Databases. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, ACM, 22–33.
54. Solanki S.K. and Patel J.T. (2015). A Survey on Association Rule Mining. 2015 Fifth International Conference on Advanced Computing Communication Technologies, 212–216.
55. Song W., Liu Y., and Li J. (2012). Vertical mining for high utility itemsets. 2012 IEEE International Conference on Granular Computing, 429–434.
56. Subramanian K., Kandhasamy P., and Subramanian S. (2013). A Novel Approach to Extract High Utility Itemsets from Distributed Databases. Comput Inform, 31(6+), 1597–1615.
57. Sucahyo Y.G. and Gopalan R.P. (2004). CT-PRO: A Bottom-Up Non Recursive Frequent Itemset Mining Algorithm Using Compressed FP-Tree Data Structure. FIMI, 212–223.
58. Tao F., Murtagh F., and Farid M. (2003). Weighted Association Rule Mining Using Weighted Support and Significance Framework. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM, 661–666.
59. Teodoro G., Mariano N., Jr W.M. et al. (2010). Tree Projection-Based Frequent Itemset Mining on Multicore CPUs and GPUs. 2010 22nd International Symposium on Computer Architecture and High Performance Computing, 47–54.
60. Tseng V.S., Shie B.-E., Wu C.-W. et al. (2013). Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases. IEEE Trans Knowl Data Eng, 25(8), 1772–1786.
61. Tseng V.S., Wu C.W., Fournier-Viger P. et al. (2016). Efficient Algorithms for Mining Top-K High Utility Itemsets. IEEE Trans Knowl Data Eng, 28(1), 54–67.
62. Tseng V.S., Wu C.-W., Shie B.-E. et al. (2010). UP-Growth: An Efficient Algorithm for High Utility Itemset Mining. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM, 253–262.
63. Veloso A., Otey M.E., Parthasarathy S. et al. (2003). Parallel and Distributed Frequent Itemset Mining on Dynamic Datasets. High Performance Computing - HiPC 2003, Springer, Berlin, Heidelberg, 184–193.
64. Vo B., Coenen F., and Le B. (2013). A New Method for Mining Frequent Weighted Itemsets Based on WIT-trees. Expert Syst Appl, 40(4), 1256–1264.
65. Vo B., Nguyen H., Ho T.B. et al. (2009). Parallel Method for Mining High Utility Itemsets from Vertically Partitioned Distributed Databases. Proceedings of the 13th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems: Part I, Berlin, Heidelberg, Springer-Verlag, 251–260.
66. Wang W., Yang J., and Yu P. (2004). WAR: Weighted Association Rules for Item Intensities. Knowl Inf Syst, 6(2), 203–229.
67. Xun Y., Zhang J., and Qin X. (2016). FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce. IEEE Trans Syst Man Cybern Syst, 46(3), 313–325.
68. Ye Y. and Chiang C.-C. (2006). A Parallel Apriori Algorithm for Frequent Itemsets Mining. Fourth International Conference on Software Engineering Research, Management and Applications (SERA’06), 87–94.
69. Yin J., Zheng Z., Cao L. et al. (2013). Efficiently Mining Top-K High Utility Sequential Patterns. 2013 IEEE 13th International Conference on Data Mining, 1259–1264.
70. Yoshikawa M. and Terai H. (2006). Apriori, Association Rules, Data Mining, Frequent Itemsets Mining (FIM), Parallel Computing. Fourth International Conference on Software Engineering Research, Management and Applications (SERA’06), 95–100.
71. Yu K.M. and Wu S.H. (2011). An Efficient Load Balancing Multi-core Frequent Patterns Mining Algorithm. 2011 IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications, 1408–1412.
72. Yun U. and Leggett J. (2005). WFIM: Weighted Frequent Itemset Mining with a weight range and a minimum weight. Proceedings of the 2005 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics, 636–640.
73. Yun U. and Ryu K.H. (2011). Approximate weighted frequent pattern mining with/without noisy environments. Knowl-Based Syst, 24(1), 73–82.
74. Zaki M.J. (2000). Scalable Algorithms for Association Mining. IEEE Trans Knowl Data Eng, 12(3), 372–390.
75. Zaki M.J. and Gouda K. (2003). Fast Vertical Mining Using Diffsets. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM, 326–335.
76. Zhang Z., Ji G., and Tang M. (2013). MREclat: An Algorithm for Parallel Mining Frequent Itemsets. 2013 International Conference on Advanced Cloud and Big Data, 177–180.
77. Zida S., Fournier-Viger P., Lin J.C.-W. et al. (2015). EFIM: a highly efficient algorithm for high-utility itemset mining. Mexican International Conference on Artificial Intelligence, Springer, 530–546.
78. Zong-Yu Z. and Ya-Ping Z. (2012). A Parallel Algorithm of Frequent Itemsets Mining Based on Bit Matrix. 2012 International Conference on Industrial Control and Electronics Engineering, 1210–1213.
79. SPMF: A Java Open-Source Data Mining Library, accessed: 04/26/2017.

... high-utility itemsets - Algorithms for mining weighted frequent itemsets and high-utility itemsets. Scope of research - A survey of mining frequent itemsets, weighted frequent itemsets, and high-utility itemsets - Research ...
... two different weights, called internal utility and external utility. The internal utility is the quantity of an item in a transaction; the external utility is the profit or price of the item. The utility of an item is the product of its internal utility and its external utility. An itemset is called ...
... The number of weighted frequent items is adjusted through different ranges. The weight of an itemset is the average of the weights of the items in the itemset. The weight W of each item satisfies: Wmin ≤ W ≤ Wmax (1.6)
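
The utility definitions quoted above, and the remark in the conclusions that TWU overestimates the actual utility, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the toy transactions, the profit values, and all function names are hypothetical and are not taken from the thesis tables or algorithms. TWU is computed here in its commonly used form, as the total utility of every transaction that contains the itemset.

```python
# Toy data invented for illustration; these values are not from the thesis tables.
EXTERNAL_UTILITY = {"A": 3, "B": 10, "C": 1}   # assumed profit per unit of each item
TRANSACTIONS = [                               # item -> quantity (internal utility)
    {"A": 1, "B": 2},
    {"A": 2, "C": 5},
    {"B": 1, "C": 3},
]


def item_utility(item, transaction):
    """Utility of one item in one transaction: internal times external utility."""
    return transaction[item] * EXTERNAL_UTILITY[item]


def transaction_utility(transaction):
    """Sum of the utilities of all items in the transaction."""
    return sum(item_utility(item, transaction) for item in transaction)


def contains(transaction, itemset):
    """True when every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)


def itemset_utility(itemset, transactions):
    """Actual utility: utilities of the itemset's items, summed over the
    transactions that contain the whole itemset."""
    return sum(
        sum(item_utility(item, t) for item in itemset)
        for t in transactions
        if contains(t, itemset)
    )


def twu(itemset, transactions):
    """Transaction-weighted utility: total utility of the containing transactions."""
    return sum(transaction_utility(t) for t in transactions if contains(t, itemset))


if __name__ == "__main__":
    for itemset in ({"A"}, {"B"}, {"A", "B"}):
        print(sorted(itemset),
              itemset_utility(itemset, TRANSACTIONS),
              twu(itemset, TRANSACTIONS))
```

On this toy data the itemset {A} has an actual utility of 9 but a TWU of 34, because TWU also counts the utility of the other items in the transactions that contain A. Gaps of this kind are what the conclusions refer to when they describe TWU-based pruning as keeping too many candidates.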
