Doctoral thesis in engineering: distributed database design using a data mining approach

MINISTRY OF EDUCATION AND TRAINING
THE UNIVERSITY OF DANANG

LƯƠNG VĂN NGHĨA

DISTRIBUTED DATABASE DESIGN USING A DATA MINING APPROACH

Specialization: COMPUTER SCIENCE
Code:

DOCTORAL THESIS IN ENGINEERING

Scientific supervisors:
Assoc. Prof. Dr. Lê Văn Sơn
Assoc. Prof. Dr. Đoàn Văn Ban

ĐÀ NẴNG, 2019

DECLARATION

I hereby declare that the thesis "Distributed Database Design Using a Data Mining Approach" is my own research, carried out under the supervision of Assoc. Prof. Dr. Lê Văn Sơn and Assoc. Prof. Dr. Đoàn Văn Ban.

I declare that the research results presented in this thesis are truthful and are not copied from any other thesis. Some of the results are joint work, and the co-authors have agreed to their use here. All citations state their sources clearly and completely.

Author
Lương Văn Nghĩa

TABLE OF CONTENTS

DECLARATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
ENGLISH-VIETNAMESE GLOSSARY
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
Chapter 1 DISTRIBUTED DATABASE DESIGN
1.1 OVERVIEW OF DISTRIBUTED DATABASE SYSTEMS
    1.1.1 Characteristics of distributed database systems
    1.1.2 Goals of distributed database systems
    1.1.3 Architecture of distributed database systems
    1.1.4 Models of distributed database systems
1.2 DISTRIBUTED DATABASE DESIGN
    1.2.1 Design strategies
    1.2.2 Issues in distributed database design
    1.2.3 Distributed database design techniques
    1.2.4 Rules for correct fragmentation
    1.2.5 Discussion of distributed database design
1.3 DISTRIBUTED DATABASE DESIGN USING FRAGMENTATION TECHNIQUES
    1.3.1 Horizontal fragmentation
    1.3.2 Vertical fragmentation
    1.3.3 The FC fragmentation algorithm
    1.3.4 Hybrid fragmentation
    1.3.5 Discussion of fragmentation techniques
1.4 CHAPTER SUMMARY
Chapter 2 DATA CLUSTERING IN DISTRIBUTED DATABASE DESIGN
2.1 THE DATA MINING APPROACH
    2.1.1 Knowledge discovery and data mining
    2.1.2 Challenges in data mining
    2.1.3 Data mining problems
2.2 CLUSTERING TECHNIQUES IN DATA MINING
    2.2.1 Clustering techniques
    2.2.2 Data types and measures in clustering
    2.2.3 Some data clustering methods
    2.2.4 Discussion of clustering techniques
2.3 DATA FRAGMENTATION BASED ON CLUSTERING TECHNIQUES
    2.3.1 Proposed improvement of the vertical fragmentation algorithm VFC
    2.3.2 Proposed improvement of the horizontal fragmentation algorithm HFC
    2.3.3 Evaluation of experimental results
2.4 CHAPTER SUMMARY
Chapter 3 DISTRIBUTED DATABASE DESIGN USING ROUGH CLUSTERING AND ANT COLONY OPTIMIZATION
3.1 DISTRIBUTED DATABASE DESIGN USING THE ROUGH SET APPROACH
    3.1.1 Data discretization and attribute selection using the rough set approach
    3.1.2 Information systems
    3.1.3 Indiscernibility relations in information systems
    3.1.4 Attributes and reference feature vectors
3.2 DISTRIBUTED DATA CLUSTERING USING THE ROUGH SET APPROACH
    3.2.1 The rough clustering algorithm KR (K-Means Rough)
    3.2.2 Experimental results of the KR rough clustering algorithm
3.3 DISTRIBUTED DATABASE DESIGN USING ANT COLONY OPTIMIZATION
    3.3.1 The ant colony optimization method
    3.3.2 From natural ant colonies to artificial ant colonies
    3.3.3 The general ACO algorithm
    3.3.4 The Ant System (AS) algorithm
    3.3.5 Data organization and measure concepts
3.4 DISTRIBUTED DATA CLUSTERING USING THE ANT COLONY OPTIMIZATION APPROACH
    3.4.1 Distributed data clustering using the ACO approach
    3.4.2 Proposed vertical fragmentation algorithm based on ant colony clustering
    3.4.3 Experimental results of the proposed algorithm VFAC
3.5 CHAPTER SUMMARY
CONCLUSION
LIST OF THE AUTHOR'S PUBLICATIONS
REFERENCES

LIST OF ABBREVIATIONS

No. Abbreviation: English term (meaning)
1. ACO: Ant Colony Optimization
2. AS: Ant System
3. BA: Bottom Attributes
4. BEA: Bond Energy Algorithm
5. CA: Clustered Affinity (clustered affinity of attributes)
6. CFN: Current Fragmentation Number (current number of fragments)
7. FAC: Fragmentation Ants Cluster (ant clustering for fragmentation)
8. FC: Fragmentation Cluster (clustering for fragmentation)
9. HFC: Horizontal Fragmentation Cluster (clustering for horizontal fragmentation)
10. KDD: Knowledge Discovery in Database
11. KO: Knowledge-Oriented
12. KPDL: Khai phá dữ liệu (Data Mining)
13. OCM: Object-Condition Matrix
14. OFN: Optimization Fragmentation Number (optimal number of fragments)
15. RST: Rough Set Theory
16. TA: Top Attributes
17. TSP: Travelling Salesman Problem
18. VFAC: Vertical Fragmentation Ants Cluster (ant clustering for vertical fragmentation)
19. VFC: Vertical Fragmentation Cluster (clustering for vertical fragmentation)

ENGLISH-VIETNAMESE GLOSSARY

No. English term: Vietnamese term
1. Access frequency: Tần số truy xuất
2. Affinity: Ái lực quan hệ
3. Allocation: Cấp phát
4. Analysis & decision support: Phân tích và hỗ trợ quyết định
5. Association rules: Luật kết hợp
6. Attribute affinity: Ái lực thuộc tính
7. Attribute affinity matrix: Ma trận ái lực thuộc tính
8. Big data: Dữ liệu lớn
9. Border object: Đối tượng biên
10. Bottom-up approach: Tiếp cận từ dưới lên
11. Cardinality: Lực lượng
12. Classification & prediction: Phân lớp và dự đoán
13. Cluster Affinity Matrix (CA): Ma trận ái lực cụm thuộc tính
14. Clustering: Phân cụm
15. Concept description: Mô tả khái niệm
16. Conceptual design: Thiết kế khái niệm
17. Contribution: Đóng góp
18. Core object: Đối tượng lõi
19. Database machine: Máy CSDL
20. Dense region: Vùng dày đặc
21. Density based cluster: Cụm dựa trên mật độ
22. Distributed processing: Xử lý phân tán
23. Distribution transparency: Trong suốt phân tán
24. Equi-join: Nối bằng
25. Fragmentation: Phân mảnh
26. Fragmentation transparency: Trong suốt phân mảnh
27. Global affinity measure (AM): Số đo ái lực chung
28. Heterogeneous DDBS: Hệ CSDL phân tán không thuần nhất
29. Homogeneous DDBS: Hệ CSDL phân tán thuần nhất
30. Horizontal fragmentation: Phân mảnh ngang
31. Hybrid fragmentation: Phân mảnh hỗn hợp
32. Minterm fragment: Mảnh hội sơ cấp
33. Minterm predicate: Vị từ hội sơ cấp
34. Minterm selectivity: Độ tuyển hội sơ cấp
35. Net contribution: Đóng góp thực
36. Noise object: Đối tượng nhiễu
37. Outlier: Phần tử ngoại lệ
38. Relevant: Liên đới
39. Replication transparency: Trong suốt nhân bản
40. Semi-join: Nửa kết nối
41. Simple predicate: Vị từ đơn
42. Top-down approach: Tiếp cận từ trên xuống
43. Vertical fragmentation: Phân mảnh dọc
44. View design: Thiết kế khung nhìn

LIST OF TABLES

Table 1.1 Attribute usage value matrix
Table 1.2 Attribute affinity matrix AA
Table 2.1 Contingency table for binary variables [I]
Table 2.2 Distance matrix between objects
Table 2.3 Distance matrix between clusters after a merging step
Table 2.4 Distances between clusters after a merging step
Table 2.5 Distances between clusters after repeated merging
Table 2.6 Vectorization of records
Table 2.7 OCM matrix
Table 2.8 Representation of the objects (p1, p2, p6)
Table 2.9 Euclidean distances between objects
Table 2.10 Set D of 20 objects to be clustered
Table 2.11 Comparison of results between k-Means clustering and VFC
Table 2.12 Results of the improved horizontal fragmentation HFC
Table 2.13 Results of horizontal fragmentation using k-Medoids
Table 3.1 Set D of 20 objects to be clustered
Table 3.2 Comparison of results between KR rough clustering and k-Means
Table 3.3 Parameter table
Table 3.4 Data set D of 20 transactions
Table 3.5 Comparison of results between k-Means clustering and VFAC

Table 3.5 Comparison of results between k-Means clustering and VFAC

Algorithm          Clusters k   Total time (ms)   Mean error cost (min)   Max memory (MB)   Iterations
Original k-Means   k = 3        16                4728.4549               1.2986
Original k-Means   k = 7        16                2582.9550               1.2987
Original k-Means   k = 10       16                5437.3716               1.3000
Original k-Means   k = 15       16                1455.4550               1.2987
Original k-Means   k = 19       16                5437.3716               1.3000
VFAC               k = 3        16                2633.4550               1.2800            10
VFAC               k = 7        16                2633.7012               1.2800            10
VFAC               k = 10       15                2632.7272               1.2801            10
VFAC               k = 15       15                2632.0085               1.2801
VFAC               k = 19       15                2632.0016               1.2802

3.4.3.3 Experimental evaluation of the VFAC algorithm (error cost compared with k-Means)

Figure 3.8 Comparison of the mean error cost of k-Means and VFAC (vertical axis: mean error cost in min; horizontal axis: number of clusters, k = 3, 7, 10, 15, 19)

Figure 3.9 Evaluation of stability by number of clusters for k-Means and VFAC

General evaluation of the proposed VFAC algorithm:

- Considering the VFAC algorithm with k = 3, as in the results of Figures 3.6 and 3.7, together with the evaluation charts in Figures 3.8 and 3.9, it is easy to see that as the number of clusters k increases, the total time, the mean error cost and the number of iterations decrease. In Table 3.5, the total time drops from 16 ms to 15 ms and the mean error cost from 2633.4550 at k = 3 to 2632.0016 at k = 19. This suggests that VFAC has an advantage when processing large data sets with many objects to cluster.

- With k = 10, VFAC outperforms k-Means on several measures: total time, error cost and memory cost. As the number of clusters k increases, VFAC reduces the total computation time and the number of iterations compared with k-Means.

- The comparison charts in Figures 3.8 and 3.9 show how stable the mean error cost of the proposed VFAC algorithm is relative to k-Means as the number of clusters k used for vertical clustering increases. In Table 3.5, the VFAC cost stays between 2632.0016 and 2633.7012, while the k-Means cost ranges from 1455.4550 to 5437.3716.

3.5 CHAPTER SUMMARY

This chapter presented the study of the vertical fragmentation problem in distributed database design, combining clustering based on rough set theory with FAC ant clustering based on ant colony optimization (ACO).

Based on clustering with the rough set approach, the thesis builds a similarity matrix for clustering. For each pair of objects in the matrix, objects are assigned to clusters according to the lower and upper approximations. Improving the original k-Means clustering algorithm into the KR rough clustering algorithm, by combining distance and similarity with the lower and upper approximations, shows an advantage in improving the measures of clustering quality.

Based on clustering with the ACO method, the thesis proposes a similarity measure for the FAC ant clustering algorithm and for the problem of vertical data fragmentation by ant clustering, VFAC. The comparison and experimental evaluation results indicate the improved computational complexity of the proposed integrated clustering algorithms.

Analysis, comparison and evaluation of the experimental results show that integrated clustering techniques such as rough clustering and FAC ant colony clustering have some notable advantages when applied to large data sets. In addition, memory cost increases only slightly while the number of iterations decreases considerably, which reduces the computational cost when the number of clusters k is large.

CONCLUSION

With the stated goal of studying several fragmentation methods for distributed database design under the data mining approach, using original clustering techniques integrated with rough clustering and ant colony clustering, the thesis achieved the following research results:

1. It presents two approaches, rough set theory and ant colony optimization (ACO), and applies them to the proposed vertical fragmentation problem in distributed database design under the data mining approach.

2. It studies and improves the horizontal fragmentation algorithm HFC and the vertical fragmentation algorithm VFC using integrated clustering techniques, building on similarity measures developed from the original clustering algorithms.

3. It proposes the KR rough clustering algorithm, following the rough set approach, for the vertical fragmentation problem on a distributed database of raw data.

4. It proposes the vertical fragmentation algorithm VFAC, based on FAC ant colony clustering under the ACO approach, which narrows the search space step by step and improves the efficiency of data clustering algorithms in distributed database design.

Future work will continue to extend and refine the distributed database design problem along the following directions:

- Continue to study and propose vertical fragmentation algorithms for distributed data using fuzzy clustering, and horizontal fragmentation algorithms for distributed data using rough clustering and ant colony clustering.

- Test the proposed clustering algorithms, such as KR and VFAC, on many large data sets in heterogeneous database systems.

- For clustering problems with n objects and a large number of clusters k, the computational complexity is high. Improving the algorithms with respect to processing and computation cost is therefore a necessary direction for development after this thesis.

LIST OF THE AUTHOR'S PUBLICATIONS

[I] Lương Văn Nghĩa (2013), "Vertical and horizontal fragmentation in distributed database design based on clustering techniques" (in Vietnamese), Journal of Science and Technology (Tạp chí KH&CN), ISSN 1859-1531, The University of Danang, no. 03(64), pp. 159-162.

[II] Lương Văn Nghĩa, Lê Văn Sơn (2014), "Data fragmentation in distributed database design based on knowledge-oriented clustering techniques" (in Vietnamese), Journal of Science and Technology (Tạp chí KH&CN), ISSN 1859-1531, The University of Danang, no. 1(74), Vol. II, pp. 59-63.

[III] Van Nghia Luong, Ha Huy Cuong Nguyen, Van Son Le (2015), "An Improvement on Fragmentation in Distribution Database Design Based on Knowledge-Oriented Clustering Techniques", International Journal of Computer Science & Information Security (IJCSIS), USA, ©IJCSIS publication 2015 Pennsylvania, Vol. 13, No. 5, pp. 13-17.

[IV] Van Nghia Luong, Vijender Kumar Solanki, Ha Huy Cuong Nguyen (2017), "Fragmentation in Distributed Database Design Based on ACO Clustering Technique", IGI Global eEditorial Discovery®, International Journal of Information Retrieval Research (IJIRR), IGI Global Publishing, Hershey, USA, Vol. 9, Issue 2, Article

[V] Van Nghia Luong, Van Son Le, and Van Ban Doan (2017), "Fragmentation in Distributed Database Design Based on KR Rough Clustering Technique", 6th International Conference, ICCASA 2017 and 3rd International Conference, ICTCC 2017, Tam Ky, Vietnam, November 23-24, 2017, pp. 166-172.
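The stability claim made in the evaluation of Table 3.5 can be checked with a few lines of arithmetic. The sketch below copies the mean error cost column of Table 3.5 for both algorithms and computes its range and population standard deviation across the five values of k. Using these two statistics as a proxy for "stability" is an assumption made here for illustration, not the measure defined in the thesis, and the names `spread` and `stability_report` are hypothetical.

```python
from statistics import pstdev

# Mean error cost for k = 3, 7, 10, 15, 19, copied from Table 3.5.
ERROR_COST = {
    "k-Means": [4728.4549, 2582.9550, 5437.3716, 1455.4550, 5437.3716],
    "VFAC": [2633.4550, 2633.7012, 2632.7272, 2632.0085, 2632.0016],
}


def spread(values):
    """Return (range, population standard deviation) of a list of costs."""
    return max(values) - min(values), pstdev(values)


def stability_report(costs=ERROR_COST):
    """Map each algorithm name to the spread of its mean error cost.

    A smaller range and standard deviation are read here as "more stable";
    this reading is an illustrative assumption, not the thesis's definition.
    """
    return {name: spread(values) for name, values in costs.items()}


if __name__ == "__main__":
    for name, (cost_range, std_dev) in stability_report().items():
        print(f"{name}: range = {cost_range:.4f}, std dev = {std_dev:.4f}")
```

On the Table 3.5 figures, the VFAC error cost varies by less than 2 units across all k, while the k-Means cost varies by several thousand, which is consistent with the reading of Figures 3.8 and 3.9 given in the evaluation.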
