A feature-based opinion mining model for product reviews in Vietnamese


Author: Vũ Tiến Thành
Institution: Trường Đại học Công nghệ (University of Engineering and Technology)
Major: Computer Science; Code: 60 48 01
Supervisor: Assoc. Prof. Dr. Hà Quang Thụy
Year of defense: 2012

Abstract: This thesis presents an approach to building a system that mines opinions in customer reviews according to product features, based on Vietnamese syntax rules and the VietSentiWordNet dictionary. The system works in four phases: (1) pre-processing; (2) extracting explicit and implicit product features and opinion words, and grouping synonymous product features; (3) identifying the orientation of each opinion; and (4) summarizing the results. The thesis makes three main contributions, of which this abstract describes the following. First, in phase 1, we build a Vietnamese accent restoration system that combines an N-gram statistical model with a Hidden Markov Model (HMM) to convert a sentence without accents into an accented Vietnamese sentence. Second, in phase 2, we construct a mapping dictionary that identifies implicit features by mapping them to their corresponding opinion words; we propose using SVM-kNN semi-supervised learning, with the HAC (hierarchical agglomerative clustering) method generating the training set for SVM-kNN, to group synonymous features; co-reference is then resolved using a set of Vietnamese rules.

Keywords: Computer science; Data mining; Data models

Table of Contents

1 Introduction
2 Literature review
  2.1 Opinion Mining
    2.1.1 The demand of opinion mining
    2.1.2 The basic concepts in the opinion mining field
    2.1.3 Opinion mining problems
  2.2 Feature-based Opinion Mining
    2.2.1 Problem Definition
    2.2.2 Features Extraction
    2.2.3 Opinion Orientation Identification
    2.2.4 Feature-based Opinion Mining System on Vietnamese Product Reviews
3 Our Feature-based Opinion Mining Model
  3.1 Introduction
  3.2 Phase 1: Pre-processing
    3.2.1 Data Standardizing
    3.2.2 Token Segmenting and POS Tagging
  3.3 Phase 2: Product Features and Opinion Words Extraction
    3.3.1 Explicit Product Features Extraction
    3.3.2 Opinion Word Extraction
    3.3.3 Implicit Features Identification
    3.3.4 Grouping Synonym Features
    3.3.5 Frequent Features Identification
  3.4 Phase 3: Determining the opinion orientation
  3.5 Phase 4: Summarization
4 Evaluation
  4.1 Environment and Experimental Data
    4.1.1 Environment
    4.1.2 Experimental Data
  4.2 Product Features Extraction Evaluation
  4.3 Opinion Words Extraction Evaluation
  4.4 The Whole System Evaluation
5 Conclusion
Bibliography
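
The abstract describes identifying implicit features through a mapping dictionary, then determining opinion orientation and summarizing per feature. The sketch below illustrates that general idea only; it is not the thesis's implementation. The dictionary entries, the polarity values (standing in for VietSentiWordNet scores), and the function names resolve_feature and summarize are all hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical mapping dictionary linking an opinion word to the
# product feature it implies when no feature is stated explicitly.
IMPLICIT_FEATURE_MAP = {
    "đắt": "giá",            # "expensive" implies "price"
    "rẻ": "giá",             # "cheap" implies "price"
    "nặng": "trọng lượng",   # "heavy" implies "weight"
}

# Hypothetical polarity lexicon; a real system would look these
# values up in a sentiment resource such as VietSentiWordNet.
POLARITY = {"đắt": -1, "rẻ": 1, "nặng": -1, "đẹp": 1}


def resolve_feature(explicit_feature, opinion_word):
    """Return the explicit feature if given, else the implied one (or None)."""
    if explicit_feature is not None:
        return explicit_feature
    return IMPLICIT_FEATURE_MAP.get(opinion_word)


def summarize(pairs):
    """Count positive and negative opinions per feature.

    pairs is an iterable of (explicit_feature_or_None, opinion_word).
    Pairs with an unknown feature or a neutral/unknown word are skipped.
    """
    summary = {}
    for explicit_feature, opinion_word in pairs:
        feature = resolve_feature(explicit_feature, opinion_word)
        score = POLARITY.get(opinion_word, 0)
        if feature is None or score == 0:
            continue
        counts = summary.setdefault(feature, Counter())
        counts["positive" if score > 0 else "negative"] += 1
    return summary
```

For example, calling summarize on the pairs (None, "đắt") and ("màn hình", "đẹp") attributes the first opinion to the implied feature "giá" and the second to the explicit feature "màn hình". A plain dictionary lookup keeps the sketch easy to follow; how the thesis actually builds and applies its mapping dictionary is not specified in the abstract.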

Posted: 25/08/2015, 16:23
