Multi-class Text Classification using Support Vector Machines (SVMs)

The st UTS-VNU Research School
Advanced Technologies for IoT Applications

Author Names and Affiliations
NGHIA NGUYEN HOANG
University of Information Technology, Ho Chi Minh City

Contributions
• Hai NT, Nghia NH, Le TD, Nguyen VT. A Hybrid Feature Selection Method for Vietnamese Text Classification. In: Knowledge and Systems Engineering (KSE), 2015 Seventh International Conference on, Oct 2015 (pp. 91-96). IEEE.
• Nguyen VT, Hai NT, Nghia NH, Le TD. A Term Weighting Scheme Approach for Vietnamese Text Classification. In: International Conference on Future Data and Security Engineering, 23 Nov 2015 (pp. 46-53). Springer International Publishing.

Problem Statement
Data is being produced in ever larger volumes, which makes it difficult to gain insight from it. This creates a strong demand for automatically classifying large amounts of text. Support vector machines (SVMs) have proved to be an effective learning method, especially for classification.

Abstract
My project reports experimental conclusions on how well support vector machines suit multi-class text classification across different datasets, and discusses the text classification process with a series of novel multi-class SVM methods. It addresses the following points:
• How should text documents be represented as feature vectors, and how does the representation affect the classification result?
• How can binary classifiers be used for a multi-class problem?
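
The two questions above can be made concrete with a small sketch. It is illustrative only and is not the setup used in the experiments: it assumes TF-IDF as the text representation, one-vs-one voting as the way of combining binary classifiers, and a simple Pegasos-style subgradient trainer in place of a full SVM solver such as LIBSVM. The toy documents, labels, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; not one of the datasets used in the experiments.
DOCS = [
    "football match goal team",
    "team wins football cup",
    "new phone chip released",
    "software update phone bug",
    "election vote parliament debate",
    "president election campaign vote",
]
LABELS = ["sports", "sports", "tech", "tech", "politics", "politics"]


def fit_idf(docs):
    """Inverse document frequency, log(N / df) + 1, for each training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc.split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=20):
    """Linear SVM without bias, trained by Pegasos-style steps on the hinge loss."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            violated = y * dot(w, x) < 1.0   # margin check uses the current w
            shrink = 1.0 - 1.0 / t           # equals 1 - eta * lam
            w = {k: shrink * v for k, v in w.items()}
            if violated:
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def train_one_vs_one(docs, labels):
    """Train one binary SVM for every unordered pair of classes."""
    idf = fit_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if y == a else -1)
                for x, y in zip(vecs, labels) if y in (a, b)]
        xs, ys = zip(*pair)
        models[(a, b)] = train_binary_svm(xs, ys)
    return idf, models


def predict(doc, idf, models):
    """Each pairwise classifier casts one vote; the class with most votes wins."""
    x = tfidf(doc, idf)
    votes = Counter(a if dot(w, x) > 0 else b for (a, b), w in models.items())
    return votes.most_common(1)[0][0]


IDF, MODELS = train_one_vs_one(DOCS, LABELS)
for query in ["football goal team", "phone chip software", "election vote campaign"]:
    print(query, "->", predict(query, IDF, MODELS))
```

With k classes, one-vs-one trains k(k-1)/2 binary classifiers, each on the documents of only two classes. Tree-based schemes such as DAGSVM reuse the same pairwise classifiers but evaluate only k-1 of them per prediction.
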
The project also includes an empirical evaluation on different datasets, in both English and Vietnamese.

Conclusion
Experiments with many different SVM-based approaches show that SVMs are well suited to multi-class text classification. Most of these approaches combine binary classifiers and look for a more effective way to use them. In my empirical evaluation each method has its advantages: an approach may be slower to train and less accurate, yet simple in idea and construction. The novel approaches, such as tree-based SVMs (for example DAGSVM) and membership-function SVMs (for example fuzzy SVMs), performed better than the others. However, the gains amount to only a few percentage points in accuracy and less than a minute in training time, which is not enough to count as a significant difference.

Future work
In recent years there have been various interesting approaches to exploiting unlabeled data, such as self-training and co-training, generative probabilistic models, semi-supervised support vector machines, and graph-based semi-supervised learning. In the short term, I would like to study related work to find out how other researchers have addressed this problem, and to carry out a careful empirical evaluation of the most common and effective approaches.

References
Joachims, Thorsten. "Text categorization with support vector machines: Learning with many relevant features." European Conference on Machine Learning. Springer Berlin Heidelberg, 1998.
Abe, Shigeo. Support vector machines for pattern classification. Vol. London: Springer, 2005.
Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
