Transductive support vector machines for cross lingual sentiment classification

Transductive Support Vector Machines for Cross-lingual Sentiment Classification

Nguyen Thi Thuy Linh
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Professor Ha Quang Thuy

A thesis submitted in fulfillment of the requirements for the degree of Master of Computer Science

December, 2009

Table of Contents

Introduction
  1.1 Introduction
  1.2 What might be involved?
  1.3 Our approach
  1.4 Related works
    1.4.1 Sentiment classification
      1.4.1.1 Sentiment classification tasks
      1.4.1.2 Sentiment classification features
      1.4.1.3 Sentiment classification techniques
      1.4.1.4 Sentiment classification domains
    1.4.2 Cross-domain text classification
Background
  2.1 Sentiment Analysis
    2.1.1 Applications
  2.2 Support Vector Machines
  2.3 Semi-supervised techniques
    2.3.1 Generate maximum-likelihood models
    2.3.2 Co-training and bootstrapping
    2.3.3 Transductive SVM
The semi-supervised model for cross-lingual approach
  3.1 The semi-supervised model
  3.2 Review Translation
  3.3 Features
    3.3.1 Words Segmentation
    3.3.2 Part of Speech Tagging
    3.3.3 N-gram model
Experiments
  4.1 Experimental set up
  4.2 Data sets
  4.3 Evaluation metric
  4.4 Features
  4.5 Results
    4.5.1 Effect of cross-lingual corpus
    4.5.2 Effect of extraction features
      4.5.2.1 Using stopword list
      4.5.2.2 Segmentation and Part of speech tagging
      4.5.2.3 Bigram
    4.5.3 Effect of features size
Conclusion and Future Works
A
B

Abstract

Sentiment classification has received much attention and has many useful applications in business and intelligence. This thesis investigates the sentiment classification problem using machine learning techniques. Vietnamese sentiment corpora are limited, while many English sentiment corpora are available on the Web. We therefore combine English corpora as training data with a number of unlabeled Vietnamese data in a semi-supervised model. Machine learning eliminates the language gap between the training set and the test set in our model. Moreover, we also examine types of features to obtain the best performance. The results show that the semi-supervised classifier is quite good at leveraging the cross-lingual corpus compared with the classifier without the cross-lingual corpus. In terms of features, we find that using only the unigram model turns out to outperform ...
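The idea in the abstract of combining labeled training reviews with unlabeled target reviews over unigram features can be illustrated with a small sketch. This is not the thesis's transductive SVM: it swaps in plain self-training around a hinge-loss linear classifier, which is only a rough stand-in for the transductive setting. The toy reviews, the function names (`unigrams`, `train_linear`, `self_train`) and all hyperparameters are assumptions made for illustration, and the toy text is assumed to be already mapped into a single language.

```python
from collections import Counter


def unigrams(text):
    """Unigram (bag-of-words) counts for a whitespace-tokenized review."""
    return Counter(text.lower().split())


def score(weights, feats):
    """Linear score: sum of weight * count over the review's unigrams."""
    return sum(weights.get(tok, 0.0) * cnt for tok, cnt in feats.items())


def train_linear(examples, epochs=20, lr=0.1, reg=0.01):
    """Hinge-loss linear classifier trained by subgradient descent.

    examples: list of (Counter, label) pairs with label in {+1, -1}.
    """
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            # L2 regularization: shrink every existing weight a little.
            for tok in weights:
                weights[tok] *= 1.0 - lr * reg
            # Update only when the example violates the margin.
            if label * score(weights, feats) < 1.0:
                for tok, cnt in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * cnt
    return weights


def self_train(labeled, unlabeled, rounds=3):
    """Self-training: pseudo-label the unlabeled reviews with the current
    model, then retrain from scratch on labeled + pseudo-labeled data."""
    labeled_feats = [(unigrams(text), y) for text, y in labeled]
    unlabeled_feats = [unigrams(text) for text in unlabeled]
    weights = train_linear(labeled_feats)
    for _ in range(rounds):
        pseudo = [(f, 1 if score(weights, f) >= 0 else -1)
                  for f in unlabeled_feats]
        weights = train_linear(labeled_feats + pseudo)
    return weights


# Hypothetical toy data: labeled training reviews and unlabeled target reviews.
labeled = [
    ("good great film", 1),
    ("bad boring film", -1),
    ("great acting", 1),
    ("boring plot", -1),
]
unlabeled = ["great superb", "boring dull"]

baseline = train_linear([(unigrams(t), y) for t, y in labeled])
weights = self_train(labeled, unlabeled)
```

In this toy setup the words "superb" and "dull" never occur in the labeled reviews, so the baseline model has no weight for them; after self-training they pick up a polarity from the known words they co-occur with in the unlabeled reviews. A real transductive SVM instead optimizes the margin over labeled and unlabeled points jointly.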

Date posted: 16/03/2021, 12:31
