Future directions of the thesis

Mining frequent itemsets is an important step in association rule mining. Index-BitTable is an efficient algorithm for mining frequent itemsets based on the BitTable structure. The algorithm uses the BitTable both horizontally and vertically to compute the index array and the corresponding supports, by finding all items that co-occur with a representative item. Frequent itemsets of this kind are therefore generated quickly by combining the representative item with every non-empty subset of its subsume. In addition, it has been proved that the representative item and the frequent itemsets it represents all have the same support. The cost of processing this type of itemset is therefore low, which makes the algorithm more efficient. The experimental results show that this technique is particularly effective for …
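The idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (build_bittable, subsume, itemsets_represented_by) and the toy transactions are assumptions made for illustration, each BitTable row is stored as a plain Python integer, and the sketch ignores minimum-support filtering and any item ordering the full algorithm may use to avoid generating the same itemset twice.

```python
from itertools import combinations


def build_bittable(transactions, items):
    """Vertical BitTable: one integer per item; bit t is set when
    transaction number t contains that item."""
    table = {item: 0 for item in items}
    for t, transaction in enumerate(transactions):
        for item in transaction:
            table[item] |= 1 << t
    return table


def support(bits):
    """Support = number of transactions, i.e. number of set bits."""
    return bin(bits).count("1")


def subsume(table, item):
    """Items that occur in every transaction containing `item`
    (their bit vector covers the bit vector of `item`)."""
    bits = table[item]
    return sorted(
        other
        for other, other_bits in table.items()
        if other != item and (bits & other_bits) == bits
    )


def itemsets_represented_by(table, item):
    """Combine `item` with every non-empty subset of subsume(item).
    Each resulting itemset has the same support as `item` itself,
    so no further counting is needed."""
    co_items = subsume(table, item)
    item_support = support(table[item])
    result = []
    for size in range(1, len(co_items) + 1):
        for subset in combinations(co_items, size):
            result.append(((item,) + subset, item_support))
    return result


# Hypothetical toy database with four transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"b", "c"},
    {"b", "d"},
]
table = build_bittable(transactions, items=["a", "b", "c", "d"])

for itemset, sup in itemsets_represented_by(table, "a"):
    print(itemset, sup)
```

In this toy database, item a occurs only in transactions that also contain b and c, so every itemset built from a and a non-empty subset of {b, c} inherits the support of a without another pass over the data.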

An algorithm based on the BitTable structure cannot perform well unless there is enough memory to hold the BitTable. In practice, however, datasets are very large and complex. Designing an efficient algorithm for mining frequent itemsets under these conditions is therefore a goal to work toward. Moreover, the impact of current frequent itemset mining methods is very large. It also remains unclear which kind of itemset satisfies both criteria, conciseness and representativeness, for each specific application. Building a concise yet high-quality itemset that is useful for applications …

