VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

VU TIEN THANH

A FEATURE-BASED OPINION MINING MODEL ON PRODUCT REVIEWS IN VIETNAMESE

Major: Computer Science
Code: 60 48 01

MASTER THESIS OF INFORMATION TECHNOLOGY

Supervisor: Assoc. Prof. Ha Quang Thuy

Hanoi – 2012

ORIGINALITY STATEMENT

'I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.'

Hanoi, November 25th, 2012
Signed

ABSTRACT

Feature-based opinion mining and summarizing (FOMS) of reviews is an interesting and attractive problem in the opinion mining field. With the development of e-commerce in Vietnam, there are more and more commercial sites and technical forums where people can review or express their opinions on products they have used. As a result, the number of reviews for a popular product has grown rapidly in recent years, to hundreds or even thousands. This makes it difficult for customers to read the reviews and decide whether to buy a product, and equally hard for producers to process customer opinions in order to improve their products.

In this thesis, we describe a feature-based opinion mining and summarizing model for Vietnamese product reviews. Our model performs the following four steps:
(1) preprocessing the input customer reviews by standardizing the reviews, segmenting tokens, and POS tagging;
(2) extracting explicit product features and opinion words by using Vietnamese syntax rules, identifying implicit product features by using their relationships with opinion words, and automatically grouping synonymous product features by combining the HAC clustering method with the semi-supervised SVM-kNN classification method;
(3) identifying opinion sentences in each review and deciding whether each opinion sentence is positive, negative, or neutral by using a VietSentiWordNet extended from an initial SentiWordNet 3.0;
(4) summarizing the results, which differs from traditional text summarization because we focus only on the product features that customers reviewed and on whether the opinions are positive, negative, or neutral.
Experimental results on Vietnamese reviews in the mobile phone product domain demonstrate the effectiveness of the model.

Publications:
- Huyen-Trang Pham, Tien-Thanh Vu, Mai-Vu Tran and Quang-Thuy Ha. A Solution for Grouping Vietnamese Synonym Feature Words in Product Reviews. In Proceedings of the 6th International Conference on Asia-Pacific Services Computing (APSCC 2011).
- Quang-Thuy Ha, Tien-Thanh Vu, Huyen-Trang Pham and Cong-To Luu. An Upgrading Feature-based Opinion Mining Model on Vietnamese Product Reviews. In Proceedings of the 7th International Conference on Active Media Technology (AMT 2011), pp. 173-185.
- Tien-Thanh Vu, Huyen-Trang Pham, Cong-To Luu and Quang-Thuy Ha. A Feature-Based Opinion Mining Model on Product Reviews in Vietnamese. In Semantic Methods for Knowledge Management and Communication (SCI 381), pp. 23-33.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Assoc. Prof. Ha Quang Thuy, for his patient guidance and continuous support throughout the years. He was always available when I needed help, and he responded to my queries helpfully and promptly.

I would like to express my sincere appreciation to my colleagues at the Knowledge and Technology laboratory for their great support. I would also like to thank my friend, Nguyen Quoc Dat, for his kind help.

I sincerely acknowledge the Vietnam National University, Hanoi, NAFOSTED Vietnam and, especially, the QG.10.38 and KC.01.TN04/11-15 projects for financially supporting my master's study.

Finally, this thesis would not have been possible without the support and love of my parents and my wife. Thank you!

To my family ♥

Table of Contents

1 Introduction
2 Literature review
  2.1 Opinion Mining
    2.1.1 The demand of opinion mining
    2.1.2 The basic concepts in the opinion mining field
    2.1.3 Opinion mining problems
  2.2 Feature-based Opinion Mining
    2.2.1 Problem Definition
    2.2.2 Features Extraction
    2.2.3 Opinion Orientation Identification
    2.2.4 Feature-based Opinion Mining System on Vietnamese Product Reviews
3 Our Feature-based Opinion Mining Model
  3.1 Introduction
  3.2 Phase 1: Pre-processing
    3.2.1 Data Standardizing
    3.2.2 Token Segmenting and POS Tagging
  3.3 Phase 2: Product Features and Opinion Words Extraction
    3.3.1 Explicit Product Features Extraction
    3.3.2 Opinion word Extraction
    3.3.3 Implicit Features identification
    3.3.4 Grouping Synonym Features
    3.3.5 Frequent Features Identification
  3.4 Phase 3: Determining the opinion orientation
  3.5 Phase 4: Summarization
4 Evaluation
  4.1 Environment and Experimental Data
    4.1.1 Environment
    4.1.2 Experimental Data
  4.2 Product Features Extraction Evaluation
  4.3 Opinion Words Extraction Evaluation
  4.4 The Whole System Evaluation
5 Conclusion

List of Figures

1.1 An example summarization of Samsung Galaxy Tab
2.1 OM documents on Google Scholar (in title)
2.2 OM documents on Google Scholar (anywhere)
2.3 The tree of Nokia N72 object
2.4 A customer review
3.1 Model for Feature-based Opinion Mining and Summarizing in Vietnamese Product Reviews
3.2 A summarization output
4.1 (Precision values (%)) A comparison between our method in (Vu et al., 2011) and in this thesis
4.2 (Recall values (%)) A comparison between our method in (Vu et al., 2011) and in this thesis
4.3 (F1 values (%)) A comparison between our method in (Vu et al., 2011) and in this thesis
4.4 A summarization of Nokia C5-03
4.5 A summarization of LG Wink Touch T300

... A V O prep

Opinion Mining
Feature-based Opinion Mining
Feature-based Opinion Mining and Summarizing
Natural Language Processing
Pointwise Mutual Information
Support Vector Machine
Hierarchical...

... we introduce related works on extracting opinion words and aggregating opinions. Finally, some FOM systems in Vietnamese are introduced in 2.2.4.

2.1 Opinion Mining
2.1.1 The demand of opinion mining...

... we introduce the demand of opinion mining in 2.1.1. Secondly, the basic concepts in the opinion mining field, such as Object, Opinion passage on a feature, etc., are described in 2.1.2. Finally, opinion