Conclusion of Chapter 3

Chapter 3 presented HMiT, an effective and recently proposed method for handling missing values. The method combines the technique for mining association rules in incomplete databases, proposed by Ragel and Crémilleux, with the k-nearest-neighbor method. To evaluate the effectiveness of this new technique, we ran computational experiments on real-world databases. The results show that HMiT not only achieves higher accuracy but also needs less time to handle missing values than the k-nearest-neighbor method.
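As a rough illustration of the k-nearest-neighbor baseline referred to above, the following Python sketch fills each missing numeric value with the mean of that attribute over the k closest records. It is not the implementation used in the experiments and it does not reproduce HMiT. The names `knn_impute`, `distance`, and `MISSING`, the normalized Euclidean distance over shared attributes, and the use of the mean are all assumptions made for illustration.

```python
import math

MISSING = None  # marker for a missing value (an assumption of this sketch)


def distance(a, b):
    """Normalized Euclidean distance over attributes observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return math.inf  # no common attributes: treat as infinitely far apart
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(records, k=3):
    """Return a copy of records in which each missing numeric value is replaced
    by the mean of that attribute over the k nearest records that observe it."""
    completed = []
    for i, rec in enumerate(records):
        filled = list(rec)
        for j, value in enumerate(rec):
            if value is not MISSING:
                continue
            # Candidate donors: other records with attribute j observed.
            donors = [
                (distance(rec, other), other[j])
                for m, other in enumerate(records)
                if m != i and other[j] is not MISSING
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the value missing if no donor exists
                filled[j] = sum(nearest) / len(nearest)
        completed.append(filled)
    return completed
```

For example, calling `knn_impute` with `k=2` on the records `[1.0, 2.0, None]`, `[1.0, 2.0, 10.0]`, `[1.1, 2.1, 12.0]`, and `[9.0, 9.0, 100.0]` fills the missing value from the two closest records. Every imputation scans all other records, which is the kind of cost that makes processing time a relevant point of comparison.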

Digitized by the Learning Resource Center, Thai Nguyen University http://www.lrc-tnu.edu.vn

REFERENCES

[1] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, A.I. Verkamo, Fast discovery of association rules, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, 1996, pp. 307–328.

[2] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, USA, 1993, pp. 207–216.

[3] G. Batista and M. Monard, “K-Nearest Neighbour as Imputation Method: Experimental Results”, Tech. Report 186, ICMC-USP, 2002.

[4] P. Ciaccia, M. Patella, and P. Zezula. M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. In VLDB’97, pages 426–435, 1997.

[5] J. W. Grzymala-Busse and M. Hu, A comparison of several approaches to missing attribute values in data mining. Proceedings of the Second International Conference on Rough Sets and Current Trends in Computing RSCTC'2000, October 16–19, 2000, Canada, 340–347.

[6] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.

[7] H. Hu and J. Li, Using Association Rules to Make Rule-based Classifiers Robust, Proc. of 16th Australasian Database Conference (ADC), 2005.

[8] R. Kohavi, D. Sommerfield, and J. Dougherty. Data Mining using

Intelligence, pages 234–245, 1996.

[9] K. Lakshminarayan, S.A. Harp, R. Goldman, T. Samad, Imputation of missing data using machine learning techniques, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, USA, MIT Press, Cambridge, MA, 1996, pp. 140–146.

[10] K. Lakshminarayan, S.A. Harp, and T. Samad. Imputation of Missing Data in Industrial Databases. Applied Intelligence, 11:259–275, 1999.

[11] R.J. Little and D.B. Rubin. Statistical Analysis with Missing Data. John Wiley and Sons, New York, 1987.

[12] W.Z. Liu, A.P. White, S.G. Thompson, M.A. Bramer, Techniques for dealing with missing values in classification, in: Second International Symposium on Intelligent Data Analysis, London, UK, 1997.

[13] R.J. Little and D.B. Rubin. Statistical Analysis with Missing Data. John Wiley and Sons, New York, 1987.

[15] K.C. Lee, J.S. Park, Y.S. Kim, and Y.T. Byun, Missing Value Estimation Based on Dynamic Attribute Selection. In Proceedings of the PAKDD 2000, pp. 134–137, 2000.

[16] C.J. Merz and P.M. Murphy. UCI Repository of Machine Learning Datasets, 1998. http://www.ics.uci.edu/~mlearn/MLRepository.html.

[17] H. Mannila, Data mining: machine learning, statistics, and databases. Eighth International Conference on Scientific and Statistical Database Management, Stockholm, June 18–20, 1996, pp. 1–8.

Decision Tree Induction. Machine Learning, Vol.6, 81-92, 1991.

[19] D. Pyle, Data Preparation for Data Mining. Morgan Kaufmann Publishers, Inc., 1999.

[20] J.R. Quinlan, Induction of decision trees. Machine Learning, 1, 81–106, 1986.

[21] A. Ragel, B. Crémilleux, Treatment of missing values for association rules, Proceedings of the Second Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-98), Melbourne, Australia, Lecture Notes in Artificial Intelligence 1394, Springer, Berlin, 1998, pp. 258–270.

[22] A.P. White, Probabilistic induction by dynamic path generation in virtual trees. In Research and Development in Expert Systems III, edited by M.A. Bramer, pp. 35–46. Cambridge: Cambridge University Press, 1987.