Conclusion of Chapter 4

Chapter 4 presented a method for constructing the extended discernibility function and the extended discernibility matrix on set-valued decision tables. These two new data structures are the tools used to build reduct-finding algorithms on set-valued decision tables. The thesis also proves that finding a reduct on the representative set-valued decision table is equivalent to finding one on the original set-valued decision table.
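To make the role of these structures concrete, the following minimal Python sketch builds a discernibility matrix for a tiny set-valued decision table and then greedily picks attributes that cover every non-empty entry. It rests on assumptions that are not taken from the thesis: an attribute is taken to discern two objects when their value sets are disjoint (the usual tolerance-relation reading), and the greedy cover is a generic heuristic, not Algorithm RGDSDT. The names `discernibility_matrix` and `greedy_cover` and the toy data are hypothetical.

```python
from itertools import combinations


def discernibility_matrix(objects, decisions):
    """Map each object pair with different decisions to its discerning attributes.

    Assumption (not from the thesis): attribute a discerns x_i and x_j
    when their value sets under a are disjoint.
    """
    entries = {}
    for i, j in combinations(range(len(objects)), 2):
        if decisions[i] == decisions[j]:
            continue  # only pairs with different decisions must be discerned
        attrs = {a for a in objects[i] if not (objects[i][a] & objects[j][a])}
        if attrs:
            entries[(i, j)] = attrs
    return entries


def greedy_cover(entries):
    """Pick attributes until every matrix entry contains a chosen attribute.

    Generic greedy heuristic: the result preserves discernibility but may
    contain redundant attributes, so it is only a reduct candidate.
    """
    remaining = list(entries.values())
    chosen = set()
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        # most frequent attribute; ties broken alphabetically
        best = max(sorted(counts), key=counts.get)
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    return chosen


# Hypothetical toy table: each attribute value is a set.
table = [
    {"lang": {"en", "fr"}, "skill": {"py"}},
    {"lang": {"en"}, "skill": {"java"}},
    {"lang": {"de"}, "skill": {"py", "java"}},
]
labels = ["yes", "no", "no"]

matrix = discernibility_matrix(table, labels)
cover = greedy_cover(matrix)
```

In this toy table both attributes are needed: `skill` is the only attribute separating objects 0 and 1, and `lang` is the only one separating objects 0 and 2.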

Using the extended discernibility function, Chapter 4 presented a new method for attribute reduction in set-valued decision tables. Supported by theoretical analysis and experiments, Chapter 4 proposed five algorithms on set-valued decision tables.

1) Algorithm 4.1 selects representative objects in a set-valued information system; its time complexity is exponential.

2) Algorithm 4.2 selects representative objects in a set-valued decision table. This algorithm is particularly useful as a data preprocessing step before subsequent data mining tasks.

3) Algorithm RGDSDT finds a reduct of a set-valued decision table using the extended discernibility function.

4) Algorithm RSDTAAS finds a reduct of a set-valued decision table using the extended discernibility function when a set of attributes is added.

5) Algorithm RSDTDAS finds a reduct of a set-valued decision table using the extended discernibility function when a set of attributes is removed.

Chapter 4 also evaluated Algorithm RGDSDT experimentally and compared the reducts it finds with those found by Algorithm GMDSDT; in these experiments the two algorithms returned the same result.
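The idea of selecting representative objects before reduct computation can be illustrated with a small sketch. It is not Algorithm 4.1 or Algorithm 4.2: it assumes, purely for illustration, that two objects are interchangeable when they have identical value sets on every condition attribute and the same decision, and it keeps the first object of each such group. The function name `select_representatives` and the data are hypothetical.

```python
def select_representatives(objects, decisions):
    """Return indices of one representative per group of identical objects.

    Assumption (not from the thesis): objects are grouped when all their
    attribute value sets and their decisions coincide.
    """
    first_seen = {}
    for idx, (obj, dec) in enumerate(zip(objects, decisions)):
        # hashable signature: attribute names in sorted order, values as frozensets
        key = (tuple((a, frozenset(obj[a])) for a in sorted(obj)), dec)
        first_seen.setdefault(key, idx)
    return sorted(first_seen.values())


# Hypothetical toy table: objects 0 and 1 are identical, object 2 differs.
rows = [
    {"a": {1, 2}, "b": {3}},
    {"a": {2, 1}, "b": {3}},
    {"a": {1}, "b": {3}},
]
row_decisions = ["y", "y", "n"]

representatives = select_representatives(rows, row_decisions)
```

Dropping exact duplicates in this way cannot change which attributes are needed to discern objects with different decisions, which is the kind of invariance the equivalence result in this chapter formalizes for the thesis's own selection procedure.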

CONCLUSIONS AND RECOMMENDATIONS

I. Main results of the thesis

1. Building on the contingency table technique [56], the thesis proposes two new data structures for set-valued information systems: the generalized contingency table and lattices of attribute values. These data structures extend the contingency table technique so that a corresponding reduct-finding algorithm can be formulated for set-valued decision tables. As a result, the thesis proposes a new attribute reduction algorithm for set-valued decision tables, Algorithm GMDSVDT (Generalized Maximal Discernibility heuristic for Set valued Decision Tables, Algorithm 3.1). Algorithm GMDSVDT was published by the author in work no. 1 in the list of the author's publications.

2. Furthermore, the thesis shows that the generalized contingency table and the lattices of attribute values are also effective tools for designing algorithms that compute rough approximations in set-valued information systems. The thesis proposes Algorithm VASDT (Verifying upper and lower Approximation for Set valued Decision Tables, Algorithm 3.2) for computing approximation sets in set-valued information systems. Besides these two data structures, Algorithm VASDT also exploits the notions of discernibility function and inclusion measure in set-valued information systems. The thesis proves theoretically, in terms of computational complexity, that Algorithm VASDT is more efficient than the approximation algorithm of [74]. Algorithm VASDT was published by the author in work no. 1.

3. Based on the idea in [27] of reducing the size of the original data set, the thesis proposes algorithms for selecting a representative object set (Select a representative object set, Algorithm 4.1 and Algorithm 4.2) from the original object set, in set-valued information systems and in set-valued decision tables, for the reduct-finding problem. The thesis proves, via a bijection, that the reducts on the original object set and the reducts on the representative object set coincide, for both information systems and decision tables, which establishes the correctness of the method. The representative object selection algorithms play an important role in preprocessing decision table data before data mining tasks are carried out. These results were published by the author in work no. 3.

4. In classical rough set theory, Skowron and Rauszer [52] introduced the discernibility matrix and the discernibility function for finding reducts. Following this approach, the thesis proposes two new data structures for set-valued decision tables: the extended discernibility function and the extended discernibility matrix. These two structures are the tools used to build reduct-finding algorithms on set-valued decision tables. As a result, the thesis proposes three new attribute reduction algorithms for set-valued decision tables: Algorithm RGDSDT, which finds a reduct of a set-valued decision table, and Algorithms RSDTAAS and RSDTDAS, which find a reduct when a set of attributes is added or removed, respectively. The thesis also analyzes the complexity of each algorithm. These results were published by the author in work no. 2.
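As an illustration of the rough approximations referred to in result 2, the sketch below computes lower and upper approximations directly from their definitions under a tolerance relation. It is not Algorithm VASDT, which relies on generalized contingency tables and lattices of attribute values. The tolerance relation used here (two objects are related when their value sets intersect on every selected attribute) is an assumption based on the common definition for set-valued information systems, and the names `tolerance_class`, `approximations`, and the data are hypothetical.

```python
def tolerance_class(objects, i, attrs):
    """Indices of objects whose value sets intersect those of object i on all attrs."""
    return {
        j
        for j, other in enumerate(objects)
        if all(objects[i][a] & other[a] for a in attrs)
    }


def approximations(objects, target, attrs):
    """Return (lower, upper) approximations of the index set `target`.

    lower: objects whose tolerance class lies entirely inside target
    upper: objects whose tolerance class meets target
    """
    lower, upper = set(), set()
    for i in range(len(objects)):
        cls = tolerance_class(objects, i, attrs)
        if cls <= target:
            lower.add(i)
        if cls & target:
            upper.add(i)
    return lower, upper


# Hypothetical set-valued information system with one attribute.
universe = [
    {"lang": {"en", "fr"}},
    {"lang": {"en"}},
    {"lang": {"de"}},
    {"lang": {"fr", "de"}},
]

lower, upper = approximations(universe, {0, 1}, ["lang"])
```

Here object 0 is in the target set but not in its lower approximation, because its tolerance class also contains object 3, which lies outside the target.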

II. Directions for future work

1. Continue studying attribute reduction on set-valued decision tables for the case where a set of objects is added.

2. Continue studying other discernibility functions on set-valued information systems and, on that basis, propose new methods that are more efficient than existing ones.

LIST OF PUBLISHED WORKS

1. Sinh Hoa Nguyen and Thi Thu Hien Phung (2013). Efficient Algorithms for Attribute Reduction on Set-valued Decision Systems, The 14th International Conference of Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC 2013), Halifax, NS, Canada, October 11-14, 2013, Lecture Notes in Computer Science, SpringerLink, Volume 8170 (2013), pp. 87-98.

2. Thi Thu Hien Phung (2013). Generalized Discernibility Function based Attribute Reduction in Set-valued Decision Systems, The Third World Congress on Information and Communication Technologies (WICT 2013), published by IEEE, pp. 225-230.

3. Thi Thu Hien Phung (2013). On Selection of Representative Object Set for Attribute Reduction in Set-valued Information Systems, The Third World Congress on Information and Communication Technologies (WICT 2013), published by IEEE, pp. 269-274.

4. Phùng Thị Thu Hiền, Lê Quang Hào, Nguyễn Bá Tường (2011). Những vấn đề của trích chọn đặc trưng trong hệ tin, Tạp chí nghiên cứu Khoa học kỹ thuật và công nghệ quân sự, pp. 60-63.

5. Phùng Thị Thu Hiền, Lê Quang Hào, Nguyễn Quang Khanh, Nguyễn Bá Tường (2010). Định nghĩa tập thô theo hàm thuộc thô, Tạp chí nghiên cứu Khoa học kỹ thuật và công nghệ quân sự, pp. 50-54.

REFERENCES

References in Vietnamese

[1] Nguyễn Long Giang (2012). Nghiên cứu một số phương pháp khai phá dữ liệu theo tiếp cận lý thuyết tập thô, PhD Thesis, Viện Công Nghệ Thông Tin.

[2] Hoàng Thị Lan Giao (2007). Khía cạnh đại số và lôgic phát hiện luật theo tiếp cận tập thô, PhD Thesis, Viện Công Nghệ Thông Tin.

[3] Phùng Thị Thu Hiền, Lê Quang Hào, Nguyễn Quang Khanh, Nguyễn Bá Tường (2010). Định nghĩa tập thô theo hàm thuộc thô, Tạp chí nghiên cứu Khoa học kỹ thuật và công nghệ quân sự, pp. 50-54.

[4] Phùng Thị Thu Hiền, Lê Quang Hào, Nguyễn Bá Tường (2011). Những vấn đề của trích chọn đặc trưng trong hệ tin, Tạp chí nghiên cứu Khoa học kỹ thuật và công nghệ quân sự, pp. 60-63.

[5] Nguyễn Đức Thuần (2010). Phủ tập thô và độ đo đánh giá hiệu năng tập luật quyết định, PhD Thesis, Viện Công Nghệ Thông Tin.

References in English

[6] Beaubouef T., Petry F.E. and Arora G. (1998). Information-theoretic measures of uncertainty for rough sets and rough relational databases, Information Sciences, 109, pp. 535-563.

[7] Jerzy W. Grzymala-Busse (2011). Mining Incomplete Data - A Rough Set Approach, RSKT 2011, pp. 1-7.

[8] Chen Z. C., Shi P., Liu P. G., Pei Z. (2013). Criteria Reduction of Set-Valued Ordered Decision System Based on Approximation Quality, International Journal of Innovative Computing, Information and Control, Vol. 9, No. 6, pp. 2393-2404.

[9] D.G. Chen, S.Y. Zhao, L. Zhang, Y.P. Yang, X. Zhang (2012). Sample pair selection for attribute reduction with rough set, IEEE Transactions on Knowledge and Data Engineering, 24(1), pp. 2080-2093.

[10] Ivo Düntsch, Wendy MacCaull, Ewa Orlowska (2000). Structures with Many-Valued Information and Their Relational Proof Theory. ISMVL 2000, pp. 293-304.

[11] I. Duntsch, G. Gediga, and E. Orlowska (2001). Relational attribute systems. International Journal of Human-Computer Studies, 55(3):293–309.

[12] Qinrong Feng and Rui Li (2013). Discernibility Matrix Based Attribute Reduction in Intuitionistic Fuzzy Decision Systems, RSFDGrC 2013, Halifax, NS, Canada, October 11-14, 2013, Lecture Notes in Computer Science, Volume 8170 (2013), pp. 147-156.

[13] Ge H., Li L.S and Yang C.J. (2009). Improvement to Quick Attribution Reduction Algorithm, Journal of Computers, Vol.30, No.2, pp. 308-312.

[14] Long Giang Nguyen and Hung Son Nguyen (2013). Metric Based Attribute Reduction in Incomplete Decision Tables, RSFDGrC 2013, Halifax, NS, Canada, October 11-14, 2013, Lecture Notes in Computer Science, Volume 8170 (2013), pp. 99-110.

[15] Y. Y. Guan, H. K. Wang (2006). Set-valued information systems, Information Sciences 176(17), pp. 2507-2525.

[16] Thi Thu Hien Phung (2013). Generalized Discernibility Function based Attribute Reduction in Set-valued Decision Systems, The Third World Congress on Information and Communication Technologies (WICT 2013), published by IEEE, pp. 225-230.

[17] Thi Thu Hien Phung (2013). On Selection of Representative Object Set for Attribute Reduction in Set-valued Information Systems, The Third World Congress on Information and Communication Technologies (WICT 2013), published by IEEE, pp. 269-274.

[18] Sinh Hoa Nguyen and Thi Thu Hien Phung (2013). Efficient Algorithms for Attribute Reduction on Set-valued Decision Systems, The 14th International Conference of Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing (RSFDGrC 2013), Halifax, NS, Canada, October 11-14, 2013, Lecture Notes in Computer Science, SpringerLink, Volume 8170 (2013), pp. 87-98.

[19] Nguyen Sinh Hoa, Nguyen H. Son (1996). Some Efficient Algorithms for Rough Set Methods, The Sixth International Conference on Information Processing Management of Uncertainty in Knowledge-Based Systems, pp. 1451-1456.

[20] Hu X.H. and Cercone N. (1995). Learning in relational databases: a rough set approach, International Journal of Computational Intelligence, pp. 323-338.

[21] Hu X.H., Lin T.Y. and Han J.C. (2004). A new rough sets model based on database systems, Fundamenta Informaticae, 59(1), pp. 135-152.

[22] Richard Jensen (2005). Combining rough and fuzzy sets for feature selection, PhD Thesis, University of Edinburgh.

[23] B. Kolman, R.C. Busby, S.C. Ross (2003). Discrete Mathematical Structures (fifth ed.), Prentice-Hall, Inc.

[24] Jan Komorowski, Zdzislaw Pawlak, Lech Polkowski and Andrzej Skowron (1999). Rough Sets: A Tutorial. Rough Fuzzy Hybridization – A New Trend in Decision Making (S.K. Pal, A. Skowron, Eds.), pp. 3-98, Springer.

[25] M. Kryszkiewicz (1998). Rough set approach to incomplete information systems, Information Sciences 112, pp. 39-49.

[26] Kryszkiewicz, M. (1999). Rules in incomplete information systems, Information Sciences, 113, pp. 271-292.

[27] Lang G. M., Li Q. G. (2012). Data compression of dynamic set-valued information systems, CoRR abs/1209.6509.

[28] Li X.H. and Shi K.Q. (2006). A knowledge granulation-based algorithm for attribute reduction under incomplete information systems, Computer Science, Vol. 33, pp. 169-171.

[29] Jiye Liang (2011). Uncertainty and feature selection in rough set theory, Proceedings of the 6th international conference on Rough sets and knowledge technology (RSKT'11), pp. 8-15.

[30] G. Liu (2006). The axiomatization of the rough set upper approximation operations, Fundamenta Informaticae 69 (3):331–342

[31] Liu Y., Xiong R. and Chu J. (2009). Quick Attribute Reduction Algorithm with Hash, Chinese Journal of Computers, Vol. 32, No. 8, pp. 1493-1499.

[32] Dale E. Nelson (2001). High Range Resolution Radar Target Classification: A Rough Set Approach, PhD Thesis, Ohio University.

[33] Senan Norhalina (2013). Feature selection for traditional Malay musical instrument sound classification using rough set, PhD Thesis, Universiti Tun Hussein Onn Malaysia.

[34] Neil S. Mac Parthalain (2009). Rough Set Extensions for Feature Selection, PhD Thesis, Aberystwyth University.

[35] Zdzislaw Pawlak (1981). Information Systems Theoretical Foundations, Information Systems, 6 (3), pp. 205-218.

[36] Zdzislaw Pawlak (1982). Rough sets, International Journal of Computer and Information Sciences, 11 (5), pp. 341-356.

[37] Zdzislaw Pawlak (1985). Rough sets and fuzzy sets, Fuzzy Sets and Systems, 17, pp. 99-102.

[38] Z. Pawlak (1991). Rough Sets: Theoretical Aspects of Reasoning about Data, System Theory, Kluwer Academic Publishers, Dordrecht.

[39] Pawlak Z. (1998). Rough set theory and its applications in data analysis, Cybernetics and Systems 29, pp. 661-688.

[40] Zdzisław Pawlak (2002). Rough set theory and its applications, Journal of Telecommunications and Information Technology (3/2002), pp. 7-10.

[41] Z. Pawlak (2002). Rough sets and intelligent data analysis, Information Sciences, 147, pp. 1-12.

[42] Zdzislaw Pawlak, Andrzej Skowron (2007). Rudiments of rough sets, Information Sciences 177 (1), pp. 3-27.

[43] Zdzislaw Pawlak, Andrzej Skowron (2007). Rough sets: some extensions, Information Sciences 177 (1), pp. 28-40.

[44] Y. Qian, C. Dang, J. Liang, D. Tang (2009). Set-valued ordered information systems, Information Sciences 179 (16), pp. 2809–2832.

[45] Qian Y. H., Liang J. Y. (2010). On Dominance Relations in Disjunctive Set-Valued Ordered Information Systems, International Journal of Information Technology & Decision Making, Vol. 9, No. 1, pp. 9–33.

[46] Yuhua Qian, Jiye Liang, Witold Pedrycz, Chuangyin Dang (2010). Positive approximation: An accelerator for attribute reduction in rough set theory, Artificial Intelligence 174 (2010), pp. 597–618.

[47] J. Qian, D. Q. Miao, Z. H. Zhang, W. Li (2011). Hybrid approaches to attribute reduction based on indiscernibility and discernibility relation. Int. J. Approx. Reasoning 52(2), pp. 212-230.

[48] Sanchita Mal-Sarkar (2009). Uncertainty Management of Intelligent Feature Selection in Wireless Sensor Networks, PhD Thesis, Cleveland State University.

[49] Shifei D., Hao D. (2010). Research and Development of Attribute Reduction Algorithm Based on Rough Set, IEEECCDC2010, pp. 648-653.

[50] Andrzej Skowron and Cecylia Rauszer (1992). The discernibility matrices and functions in information systems. Theory and Decision Library, Volume 11, pp. 331-362.

[51] Skowron, A., Stepaniuk, J. (1996). Tolerance approximation spaces. Fundamenta Informaticae 27, pp. 245-253.

[52] Skowron, A., Rauszer, C. (1992). The discernibility matrices and functions in information systems. In: Slowinski (ed.) Intelligent Decision Support, vol. 11, pp. 331–362. Springer, Netherlands.

[53] Andrzej Skowron, Jaroslaw Stepaniuk, Roman W. Swiniarski (2012). Modeling rough granular computing based on approximation spaces. Inf. Sci. 184(1), pp. 20-43.

[54] Andrzej Skowron, Andrzej Jankowski, and Roman Swiniarski (2013). 30 Years of Rough Sets and Future Perspectives, RSFDGrC 2013, Halifax, NS, Canada, October 11-14, 2013, Lecture Notes in Computer Science, Volume 8170 (2013), pp. 1–10.

[55] Hung Son Nguyen (2006). Approximate Boolean Reasoning: Foundations and Applications in Data Mining, Transactions on Rough Sets V (J.F. Peters and A. Skowron, Eds.), LNCS 4100, pp. 334–506.

[56] W. Swieboda, H. S. Nguyen (2012). Rough Set Methods for Large and Sparse Data in EAV Format. 2012 IEEE RIVF International Conference on Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), pp. 1-6.

[57] Roman W. Swiniarski, Andrzej Skowron (2003). Rough set methods in feature selection and recognition. Pattern Recognition Letters 24(6), pp. 833-849.

[58] Marcin Szczuka (2011). The use of Rough Set methods in KDD (A tutorial), RSFDGrC-2011.

[59] Wang G.Y. (2003). Rough reduction in algebra view and information view, International Journal of Intelligent System 18, pp. 679-688.

[60] Wang G.Y., Yu H., Yang D.C. and Wu Z.F. (2001). Knowledge Reduction Based on Rough Set and Information Entropy, The World Multi-conference on Systemics, Cybernetics and Informatics, Orlando, Florida, pp. 555-560.

[61] Wang G.Y., Yu H. and Yang D.C. (2002). Decision table reduction based on conditional information entropy, Journal of Computers, Vol. 25, No. 7, pp. 759-766.

[62] Wang C. Z., Wu C. X., Chen D. G., Du W. J. (2008). Some properties of relation information systems under homomorphisms, Applied Mathematics Letters 21, pp. 940–945.

[63] Wang X.B., Zheng X.F. and Xu Z.Y (2008). Comparative Research of Attribute Reductions based on the System Entropy and on the Database System, International Symposium on Computation Intelligence and Design, pp. 150-153.

[64] Wang C. Z., Chen D. G., Wu C., Hu Q. H. (2011). Data compression with homomorphism in covering information systems, International Journal of Approximate Reasoning 52, pp. 519–525.

[65] [...] dynamic data sets. Appl. Soft Comput. 13(1), pp. 676-689.

[66] Feng Wang, Jiye Liang, Yuhua Qian (2013). Attribute reduction: A dimension incremental strategy, Knowl.-Based Syst. 39, pp. 95-108.

[67] Wei Wei, Jiye Liang, Yuhua Qian, Feng Wang, and Chuangyin Dang (2010). Comparative study of decision performance of decision tables induced by attribute reductions, International Journal of General Systems, 39 (8), pp. 813-838.

[68] Xu Z.Y., Yang B.R. and Song W. (2006). Complete attribute reduction algorithm based on Simplified discernibility matrix, Computer Engineering and Applications, Vol. 42, No. 26, pp. 167-169.

[69] Yao Y.Y., Zhao Y., Wang J. (2006). On reduct construction algorithms, Proceedings of International Conference on Rough Sets and Knowledge Technology, pp. 297-304.

[70] Yiyu Yao, Yan Zhao (2008). Attribute reduction in decision-theoretic rough set models, Inf. Sci. 178(17), pp. 3356-3373.

[71] Ye D.Y. and Chen Z.J. (2002). A new discernibility matrix and computation of a core, Acta Electronica Sinica, Vol. 30, No. 7, pp. 1086-1088.

[72] Zadeh L.A. (1965). Fuzzy sets, Information and Control, 8, pp. 338-353, Academic Press, New York.

[73] W. Zhang, J. Mi (2004). Incomplete information system and its optimal selections, Computers & Mathematics with Applications 48 (5–6), pp. 691–698.

[74] Junbo Zhang, Tianrui Li, Da Ruan, Dun Liu (2012). Rough sets based matrix approaches with dynamic attribute variation in set-valued information systems, International Journal of Approximate Reasoning, Volume 53, Issue 4, pp. 620-635.

[75] Zhao M., Luo K. and Qin Z. (2008). Algorithm for attribute reduction based on granular computing, Computer Engineering and Applications, Vol. 44, No. 30, pp. 157-159.

[76] The UCI machine learning repository, http://archive.ics.uci.edu/ml/datasets.html

[77] Li J.H. (2004). An Absolute Information Quantity-based Algorithm for Reduction of Knowledge in Information Systems, Computer Engineering and Applications, 39 (28), pp. 52-53.

[78] Miao D.Q. and Hu G.R. (1999). A heuristic algorithm for knowledge reduction, Computer Research and Development, Vol. 36, No. 6, pp. 681-684.
