Future Work

Expanding the corpus:

Because the training corpus is still small for some classes (for example, class I), classification results are unreliable in a small number of cases. One direction for future work is therefore to expand the corpus.

Completing the rule set:

Because of time constraints, this thesis proposes only a few rules to test the effectiveness of adding rule-based features. The rule set is therefore incomplete and does not maximize the system's performance. We expect performance to improve once the rule set is more complete.

Expanding the set of behaviors:

The system currently handles two types of behavior: self-protection and self-diagnosis. In the future, these groups of self-protection and self-diagnosis behaviors can be extended to cover other diseases.

Expanding the monitoring of anomalies:

Beyond monitoring behaviors, the system aims to detect anomalies in disease symptoms appearing in the community through the statements of Twitter users.

Appendix A

Citation of the published paper

Part of the results of this thesis was published in [31], presented at the Fourth International Symposium on Semantic Mining in Biomedicine (SMBM), 25-26 October 2010, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK.

References

[1] J.H. Jones and M. Salathe, "Early Assessment of Anxiety and Behavioral Response to Novel Swine-Origin Influenza A (H1N1)," 2009.

[2] J.P. Woodall, "Global surveillance of emerging diseases: the ProMED-mail perspective," Cad. saúde pública, vol. 17, p. 147, 2001.

[3] L.C. Madoff and J.P. Woodall, "The internet and the global monitoring of emerging diseases: lessons from the first 10 years of ProMED-mail," Archives of medical research, vol. 36, pp. 724–730, 2005.

[4] L.C. Madoff, "ProMED-mail: an early warning system for emerging diseases," Clinical infectious diseases, vol. 39, pp. 227–232, 2004.

[5] M. Hugh-Jones, "Global awareness of disease outbreaks: the experience of ProMED-mail," Public Health Reports, vol. 116, p. 27, 2001.

[6] N. Collier, S. Doan, A. Kawazoe, R.M. Goodwin, M. Conway, Y. Tateno, Q.H. Ngo, D. Dien, A. Kawtrakul, K. Takeuchi, and others, "BioCaster: detecting public health rumors with a Web-based text mining system," Bioinformatics, vol. 24, p. 2940, 2008.

[7] M. Conway, S. Doan, A. Kawazoe, and N. Collier, "Classifying Disease Outbreak Reports Using N-grams and Semantic Features," 2009.

[8] A. Kawazoe, H. Chanlekha, M. Shigematsu, and N. Collier, "Structuring an event ontology for disease outbreak detection," BMC Bioinformatics, vol. 9.

[9] J.P. Linge, R. Steinberger, F. Fuart, S. Bucci, J. Belyaeva, M. Gemo, D. Al-Khudhairy, R. Yangarber, and E. van der Goot, "MedISys: Medical Information System".

[10] A. Rortais, J. Belyaeva, M. Gemo, E. Goot, and J.P. Linge, "MedISys: an early warning system for the detection of (re-) emerging food-and feed-borne hazards," Food Research International, 2010.

[11] F. Sebastiani, "Machine learning in automated text categorization," ACM computing surveys (CSUR), vol. 34, pp. 1–47, 2002.

[12] M. Ikonomakis, S. Kotsiantis, and V. Tampakas, "Text classification using machine learning techniques," WSEAS Transactions on Computers, vol. 4, pp. 966–974, 2005.

[13] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information processing & management, vol. 24, pp. 513–523, 1988.

[14] N. Govert, M. Lalmas, and N. Fuhr, "A probabilistic description-oriented approach for categorizing web documents," Proceedings of the eighth international conference on Information and knowledge management, pp.

[15] L.S. Larkey and W.B. Croft, "Combining classifiers in text categorization," Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 289–297, 1996.

[16] W. Dai, G.R. Xue, Q. Yang, and Y. Yu, "Transferring naive bayes classifiers for text classification," Proceedings of the National Conference on Artificial Intelligence, p. 540, 2007.

[17] S.B. Kim, H.C. Rim, D.S. Yook, and H.S. Lim, "Effective methods for improving Naive Bayes text classifiers," PRICAI 2002: Trends in Artificial Intelligence, pp. 479–484, 2002.

[18] C.C. Chang and C.J. Lin, "LibSVM: a library for Support Vector Machine," 2001.

[19] C.W. Hsu, C.C. Chang, C.J. Lin, and others, "A practical guide to support vector classification," Citeseer, 2003.

[20] P. Wang, J. Hu, H. Zeng, and Z. Chen, "Using Wikipedia knowledge to improve text classification," Knowledge and Information Systems, vol. 19, pp. 265–281, 2008.

[21] E. Gabrilovich and S. Markovitch, "Feature generation for text categorization using world knowledge," International Joint Conference on Artificial Intelligence, p. 1048, 2005.

[22] G.A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K.J. Miller, "Introduction to wordnet: An on-line lexical database," International Journal of Lexicography, vol. 3, p. 235, 1990.

[23] (2010, April) Just the Facts: Statistics from Twitter Chirp. [Online]. http://www.readwriteweb.com/archives/just_the_facts_statistics_from_twitter_chirp.php?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+readwriteweb+%28ReadWriteWeb%29&utm_content=Google+Reader

[24] Collier, "What’s unusual in online disease outbreak news?," 2010.

[25] (2010) Twitter API Wiki / Twitter API Documentation. [Online]. http://apiwiki.twitter.com/Twitter-API-Documentation

[26] "Guide line for building training corpus - DIZZIE project," 2010.

[27] John McCrae and Nigel Collier. (2010, Aug.) SRL Project. [Online]. http://code.google.com/p/srl-editor/

[28] I.H. Witten, E. Frank, L. Trigg, M. Hall, G. Holmes, and S.J. Cunningham, "Weka: Practical machine learning tools and techniques with Java implementations," ICONIP/ANZIIS/ANNES, pp. 192–196, 1999.

[29] Weka 3 - Data Mining with Open Source Machine Learning Software in Java. [Online]. http://www.cs.waikato.ac.nz/ml/weka/

[30] Hutwagner L, Thompson W, Seeman MG, Treadwell T, "The Bioterrorism Preparedness and Response Early Aberration Reporting System (EARS)," i89-i96, 2003.

[31] Nigel Collier, Truong-Son Nguyen, and Ngoc-Mai Nguyen, "OMG U got flu? Analysis of shared health messages for bio-surveillance," Fourth International Symposium on Semantic Mining in Biomedicine (SMBM), 2010.

[32] M. Conway, S. Doan, A. Kawazoe, and N. Collier, "Classifying disease outbreak reports using n-grams and semantic features," International Journal of Medical Informatics, 2009.

[33] S. Dumais, J. Platt, D. Heckerman, and M. Sahami, "Inductive learning algorithms and representations for text categorization," Proceedings of the seventh international conference on Information and knowledge management, pp. 148–155, 1998.

[34] Nigel Collier, Truong-Son Nguyen, Ngoc-Mai Nguyen, and Son Doan, "DIZZIE Project," National Institute of Informatics, 2010.

[35] N. Collier, A. Kawazoe, L. Jin, M. Shigematsu, D. Dien, R.A. Barrero, K. Takeuchi, and A. Kawtrakul, "A multilingual ontology for infectious disease surveillance: rationale, design and challenges," Language Resources and Evaluation, vol. 40, no. 405, 2007.

[36] C. Apté, F. Damerau, and S.M. Weiss, "Automated learning of decision rules for text categorization," ACM Transactions on Information Systems (TOIS), vol. 12, pp. 233–251, 1994.
