Sentiment Recognition from Comments under Semi-Supervised Learning Conditions

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY OPEN UNIVERSITY

HỒ HƯỚNG THIÊN

SENTIMENT RECOGNITION FROM COMMENTS UNDER SEMI-SUPERVISED LEARNING CONDITIONS

Major: Computer Science
Major code: 60 48 01 01

MASTER'S THESIS IN COMPUTER SCIENCE

Supervisor: Dr. TRƯƠNG HOÀNG VINH

HO CHI MINH CITY, 2020

HO CHI MINH CITY OPEN UNIVERSITY, GRADUATE SCHOOL
SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness

CONFIRMATION

My name is: Hồ Hướng Thiên
Date of birth: 25/12/1982
Place of birth: Khánh Hoà
Major: Computer Science
Student ID: 1884801010008

I agree to provide the full text of my valid graduation thesis, together with the rights to it, to the Library of Ho Chi Minh City Open University. The Library will link the full text of the thesis to the scientific information system of the Ho Chi Minh City Department of Science and Technology.

Signed: Hồ Hướng Thiên

DECLARATION

My name is Hồ Hướng Thiên, a master's student in class MCOM018A, 2018 - 2020 cohort. I declare that the thesis "Sentiment recognition from comments under semi-supervised learning conditions" is my own research, carried out under the supervision of Dr. Trương Hoàng Vinh. The results and contributions of the thesis come from my own study, research and experiments. The thesis also draws on a number of documents and earlier research works, all of which are cited with clearly stated sources, in accordance with regulations. The results of this thesis have not been submitted for a degree at any other university or training institution. I take full responsibility for its content.

Ho Chi Minh City, 2020
Hồ Hướng Thiên

ACKNOWLEDGEMENTS

During my studies and the research for this thesis, I received dedicated guidance and help from the lecturers of the Faculty of Information Technology and the Graduate School of Ho Chi Minh City Open University.

I sincerely thank Dr. Trương Hoàng Vinh, who enthusiastically supported and guided me throughout the research and writing of this thesis. He also inspired and passed on the spirit of scientific research and publication through a paper at an international conference. I was fortunate to collaborate and work with him. Once again, I offer him my deep gratitude.

I also thank the lecturers who taught me during my studies at the university. Beyond conveying knowledge, they guided me and offered advice and experience on choosing a research topic and writing the thesis.

Finally, I thank my beloved family and close friends, who stood by me, supported and encouraged me, and created favourable conditions for me to complete this thesis and the master's programme.

TÓM TẮT (SUMMARY)

Emotions are psychological expressions of a person, including happiness, sadness, joy, and resentment. They are shown through speech, wording, gestures, or facial expressions. Emotions appear to strongly influence people's actions and decisions in daily life. Recognizing people's emotions not only helps us communicate successfully, it also helps businesses understand their customers and grasp their tastes and wishes, and so serve them better. Because of these benefits, emotion recognition attracts great interest, not only from the research community but also from many businesses.

Today, with the boom in information technology and applications of artificial intelligence, emotion recognition is in high demand and is applied in fields such as securities, hotel systems, tourism, and marketing. Along with that boom, many jobs use digital media as their main communication channel. People communicate and exchange information through messages, comments, emails, and text. Using various methods, we can recognize people's emotions from short comments and posts.

In recent years, many studies have approached the emotion recognition problem with machine learning and achieved promising results. This thesis takes the same approach. Three classifiers, Naive Bayes, Random Forest, and Support Vector Machine, are chosen and applied in the model.

For machine learning methods, building a trained model requires a sufficiently large amount of data. The amount of training data partly determines the performance and accuracy of the model. However, in many fields and in many cases there is not enough labeled data to train a model. This thesis therefore studies and proposes several techniques for increasing the training data. The proposed techniques bring a notable improvement in model accuracy.

Besides studying and proposing techniques for augmenting text data, the thesis also studies how to build a trained model when little training data is available. In the experimental model, the thesis sets itself a hard challenge: building the model with training data of at most ten comments. The experiments show the effectiveness of several of the proposed methods. In particular, the thesis contributes four additional techniques for increasing training data.

ABSTRACT

Emotions are human psychological expressions, including happiness, sadness, and anger. They can be expressed through words, sentences, gestures, or facial expressions. Sentiment can influence people's actions and decisions in daily life. Recognizing human emotions not only helps us communicate successfully but also helps businesses understand their customers and capture their desires in order to provide better customer care. Because of these benefits, sentiment analysis has received much attention in the last decade from both the scientific research community and businesses. Today, with the explosion of information technology and applications of artificial intelligence, sentiment analysis is in great demand and is widely applied in areas such as securities, hotel booking systems, tourism, and marketing. Many jobs use digital media as the main communication channel. People communicate via messages, comments, emails, or texts. Using different methods, we can identify human emotions through these short comments or reviews. In
recent years, machine learning algorithms have been widely applied to the sentiment analysis problem because they have achieved promising results. This thesis investigates sentiment analysis further. Three classifiers, Naive Bayes, Random Forest, and Support Vector Machine, are used to recognize the sentiment of Vietnamese comments and product reviews.

In machine learning, building a trained model requires a sufficiently large amount of data. The amount of training data also affects the performance and accuracy of the model. However, in many areas we do not have enough labeled data to train the model. This thesis focuses on increasing the training data for Vietnamese short text reviews. We frame the problem as a challenge by building the model with very limited training data. The experiments show the effectiveness of several of the proposed methods. In particular, the thesis contributes four additional techniques for increasing training data.

APPENDIX

Thien Ho Huong and Vinh Truong Hoang, "A data augmentation technique based on text for Vietnamese sentiment analysis," in Proceedings of the 11th International Conference on Advances in Information Technology, ser. IAIT2020. New York, NY, USA: Association for Computing Machinery, 2020.

A data augmentation technique based on text for Vietnamese sentiment analysis

Thien Ho Huong
Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
thienhh.188i@ou.edu.vn

ABSTRACT

Online opinions are a data source that contains relevant information about customer sentiment toward a product or service. This information can support specific decisions by customers and management. Building good models for sentiment analysis requires a large amount of human-labeled data, which is costly to obtain. This paper proposes a text data augmentation approach for product reviews in the Vietnamese language.
Several basic techniques are applied to generate more comments through random insertions and substitutions. The experimental results demonstrate the efficiency of the proposed approach.

KEYWORDS

sentiment analysis, product reviews, text augmentation, Vietnamese language, natural language processing, text mining

ACM Reference format:
Thien Ho Huong and Vinh Truong Hoang. 2020. A data augmentation technique based on text for Vietnamese sentiment analysis. In IAIT '20: International Conference on Advances in Information Technology, July 01–03, 2020, Bangkok, Thailand. ACM, New York, NY, USA, pages. https://doi.org/10.1145/XXXXXX XXXXXX

1 INTRODUCTION

In the era of digitization, a growing number of people express their opinions on the Web, for example through public reviews. These comments matter to businesses and services because they provide recommendations for improvement, and customers' decisions rely heavily on reviews [1]. However, analyzing those comments manually is time-consuming, and generalizing the results is even more challenging. Sentiment Analysis (SA) is a research topic in machine learning that aims to recognize and analyze the components of a person's opinion. In the last decade, SA has received much attention and has been widely applied in areas such as market analysis [2], analysis of product reviews [3, 4], political communication [5], and social media [6, 7].

∗ Both authors contributed equally to this research.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from permissions@acm.org.
IAIT '20, July 01–03, 2020, Bangkok, Thailand
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-XXXXXX $15.00
https://doi.org/10.1145/XXXXXX.XXXXXX

Vinh Truong Hoang
Ho Chi Minh City Open University, Ho Chi Minh City, Vietnam
vinh.th@ou.edu.vn

Sentiment analysis can be considered a text classification problem grounded in computational linguistics and Natural Language Processing (NLP). Mining text is easier in longer documents than in short texts, because longer documents provide more context for semantic understanding. The sentiment toward an aspect can be divided into categories depending on the purpose of the classification, such as positive, negative, and neutral classes. It is easy to collect large quantities of unlabeled data from online social networks but very costly to fully label texts into categories. However, classification results mainly rely on supervisory information and require large-scale labeled data to train the model. Data augmentation is one of the most widely used techniques for tackling this issue by generating more data. It is widely applied in computer vision [8], where simple operations such as flipping, rotating, cropping, and scaling, or color texture features [9], transform the original images. Because of word meaning, grammatical diversity, and context, data augmentation in NLP remains a challenging problem.

Several approaches that augment training data with semi-supervised learning have been proposed in the literature to deal with limited training data. Lu et al. [10] apply a propagation method to generate unlabeled data via a weighted undirected graph. Lee et al. [11] combined supervised and unsupervised learning to handle a small number of labeled sentiments. Shakeel et al. [12] proposed a data augmentation strategy and a multi-cascaded model for improved paraphrase detection in short texts. This approach is based on
binary relations over the set of texts, applying graph-theoretic concepts to generate paraphrase and non-paraphrase pairs. Wei and Zou [13] present a basic set of techniques for text data augmentation, Easy Data Augmentation (EDA), comprising synonym replacement, random insertion, random swap, and random deletion. Additionally, a similar-word replacement method based on the word embedding vector space and a random noise method are investigated in this study. The work in [14] applied synonym replacement but restricted it to nouns, adjectives, and adverbs. A score is computed for each synonym, and the best-scoring synonym is chosen for each selected word. The experimental results showed that replacing verbs or prepositions with synonyms can in some cases produce ungrammatical sentences with the wrong contextual meaning, whereas this does not occur when nouns, adjectives, and adverbs are replaced.

Sentiment analysis for the Vietnamese language has received significantly less attention in the literature. The authors of [15] applied basic data augmentation, using synonym replacement and random swap, in a semi-supervised learning context. In this paper, we investigate the problem further, using several methods for replacing words with synonyms or similar words in Vietnamese text. Word similarity is measured by the cosine distance between word embeddings [16].

The rest of this paper is organized as follows. Section 2 illustrates our proposed method, Section 3 describes the experimental results, and Section 4 presents the conclusion.

2 TEXT DATA AUGMENTATION APPROACH

The proposed text data augmentation approach is outlined in Figure 1. Online comments on products are used as a data source for various management decisions. These reviews are often short sentences written in many different styles. Text pre-processing is therefore a
main step to reduce noise from those comments. Comments contain many characters that carry no meaning, and these are removed from the training dataset. The basic pre-processing steps are tokenization; removal of URLs, hashtags, email addresses, @user mentions, numbers, duplicate letters, and punctuation; emoticon handling; and lowercasing. In this study, we focus on word-level pre-processing, which includes segmentation, stopword removal, and negation handling.

Figure 1: Overview of the proposed text data augmentation approach.

Segmentation is an essential task for Vietnamese because of its grammatical complexity. In Vietnamese, a word can have a completely different meaning when it stands alone than when it combines with another word. For example, "đất" (land) and "nước" (water) combine into "đất nước" (country). A sufficiently strong segmentation tool is therefore needed. In this study, the pyvi library is used for Vietnamese word segmentation. Stopword removal is then applied to drop words that carry little meaning for sentiment analysis. Examples of Vietnamese stopwords are "thì" (to be), "nhưng" (but), "là" (to be), and "vì" (because). The stopword list is built manually, based on term frequency in the data as determined by TF-IDF scores.

For negation handling, we built a list of negation words based on the study [17]. Some negation words are không (not), chẳng (not), chưa (not yet), chả (not), đâu (not), đâu có (not at all), (any), có (any), khỏi (without), and ứ (not). We first identify the negation words in each sentence and then combine them with the sentiment words in [18]. The symbol NOT_ is added to tokens found in the positive or negative lexicons.

All comments are tokenized into individual words and represented as feature vectors. Two feature extraction methods are considered, BOW (Bag of Words) and TF-IDF (Term Frequency-Inverse Document Frequency), since
they are simple and efficient for representing textual data [19].

Term Frequency (TF) is the number of times a word appears in a document divided by the total number of words in that document:

$$TF(t) = \frac{f(t, d)}{\sum_{t' \in d} f(t', d)} \tag{1}$$

where t is a word in document d, f(t, d) is the number of occurrences of t in d, and the denominator is the total number of words in d.

Some words appear in most documents but carry no meaning for sentiment recognition, for example "thì" (to be), "mà" (yet), and "nhưng" (but). The Inverse Document Frequency (IDF) is computed as the logarithm of the number of documents in the corpus divided by the number of documents in which the term appears:

$$IDF(t, D) = \log \frac{N}{|\{d \in D : t \in d\}|} \tag{2}$$

where N is the total number of documents and the denominator is the number of documents containing the word t. If the word appears in no document, the denominator is zero and the equation is undefined, so the denominator must be adjusted in that case (a common choice is $1 + |\{d \in D : t \in d\}|$). TF-IDF is finally calculated as:

$$TF\text{-}IDF(t, d, D) = TF(t) \times IDF(t, D) \tag{3}$$

Text data augmentation applies four techniques [13] to generate more comments:

(1) Words Replacement: many works use WordNet to replace words with synonyms but, to the best of our knowledge, WordNet does not perform well for Vietnamese text. Instead, similar words found by the cosine distance between Word2vec word embedding vectors [16] are used as replacements. The pre-trained model [20] is applied in our experiments.
(2) Words Insertion: a synonym or similar word is found and inserted at the end of each sentence.
(3) Words Swapping: this technique is applied n times, where n is obtained by subtraction from the number of tokens in each sentence.
(4) Words Deletion: a new sentence is created by deleting some words, including verbs, adverbs, and prepositions.

Table 1
illustrates an example of a comment, "Nhân viên phục vụ nhiệt tình lịch sự" (Enthusiastic and polite service staff), under the four data augmentation techniques. When these techniques are applied, the grammar and meaning of a sentence may change completely, but its overall sentiment does not. Our scope is the sentiment of sentences; grammatical structure and global context are not considered.

Table 1: Sentences generated for the comment "Nhân viên phục vụ nhiệt tình lịch sự" (Enthusiastic and polite service staff) by the four data augmentation techniques

Technique            Sentence
None                 Nhân_viên phục_vụ nhiệt_tình lịch_sự
Words Replacement    Nhân_viên cao_cấp hoan_nghênh lịch_sự
Words Insertion      Nhân_viên phục_vụ nhiệt_tình lịch_sự chuyên_viên cao_cấp
Words Deletion       Nhân_viên nhiệt_tình lịch_sự
Words Swapping       nhiệt_tình Nhân_viên phục_vụ lịch_sự

3 EXPERIMENTS AND RESULTS

We use the datasets provided in [15] to evaluate our proposed approach. Dataset 1 and Dataset 2 are collections of Vietnamese comments on food crawled from streetcodevn.com. Dataset 3 is collected from an AI contest for Vietnamese sentiment analysis. The characteristics of these datasets are given in Table 2. All comments are divided by emotional polarity into a two-class classification problem (positive and negative). Many state-of-the-art classifiers exist in natural language processing. The studies [21], [22] present comparative studies of classifiers; to evaluate the effectiveness of the proposed text data augmentation, we run experiments with three well-known classification algorithms: Naive Bayes, Random Forest, and Support Vector Machine. All experiments were implemented with Python libraries on a PC with an Intel Core i7 processor and […] GB of memory. After the pre-processing step, data augmentation is applied to generate more comments. Table 3 presents the number of comments and words corresponding to each
dataset before and after the augmentation step. The number of comments increases fourfold. The number of words also grows substantially, for example from 2,962,235 to 10,438,550 for Dataset 1. Three common classifiers for text classification [21], Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), are used to recognize sentiment. Table 4 presents the classification performance in two scenarios, with and without text augmentation. The average accuracy of the NB classifier is 84% both before and after augmentation. A similar observation holds for the SVM classifier. The RF classifier handles high-dimensional feature vectors efficiently when data augmentation is applied. The best average accuracy, 95%, is therefore given by the RF classifier, an improvement of nearly 10%.

Table 2: Characteristics of the datasets used in the experiments

Name         Emotional polarity   Number of comments   Labels
Dataset 1    Positive             15,000               Categorical
             Negative             15,000
Dataset 2    Positive             5,000                Categorical
             Negative             5,000
Dataset 3    Positive             7,383                Binary
             Negative             8,690

Table 3: Number of comments and words before and after the data augmentation step

Data source   Comments (before)   Comments (after)   Words (before)   Words (after)
Dataset 1     30,000              120,000            2,962,235        10,438,550
Dataset 2     10,000              40,000             1,003,237        3,534,413
Dataset 3     16,073              64,292             347,733          1,287,089

Table 4: Classification performance (%) in two scenarios: before and after text augmentation

             Before augmentation    After augmentation
Name         NB    RF    SVM        NB    RF    SVM
Dataset 1    82    84    86         83    97    86
Dataset 2    82    83    85         84    97    89
Dataset 3    88    91    91         85    92    88
Average      84    86    87         84    95    87

4 CONCLUSION

We presented a method based on data augmentation techniques for sentiment analysis. The experimental results on three datasets show the efficiency of the proposed approach. Accuracy on product comments in the Vietnamese language improves by about 10% using simple techniques for replacing
words and random insertions, combined with several common classifiers. As an extension of this work, we are building a sentiment lexicon for the Vietnamese language to enhance the performance of the augmentation method.

REFERENCES

[1] Pang, B., and Lee, L. Opinion mining and sentiment analysis. Foundations and Trends® in Information Retrieval 2, 1–2 (2008), 1–135.
[2] Gandhmal, D. P., and Kumar, K. Systematic analysis, and review of stock market prediction techniques. Computer Science Review 34 (Nov. 2019), 100190.
[3] Haque, T. U., Saber, N. N., and Shah, F. M. Sentiment analysis on large scale Amazon product reviews. In 2018 IEEE International Conference on Innovative Research and Development (ICIRD) (Bangkok, May 2018), IEEE, pp. 1–6.
[4] Valdivia, A., Hrabova, E., Chaturvedi, I., Luzón, M. V., Troiano, L., Cambria, E., and Herrera, F. Inconsistencies on TripAdvisor reviews: A unified index between users and Sentiment Analysis Methods. Neurocomputing 353 (Aug. 2019), 3–16.
[5] Haselmayer, M., and Jenny, M. Sentiment analysis of political communication: combining a dictionary approach with crowdcoding. Quality & Quantity 51, (Nov. 2017), 2623–2646.
[6] Hassonah, M. A., Al-Sayyed, R., Rodan, A., Al-Zoubi, A. M., Aljarah, I., and Faris, H. An efficient hybrid filter and evolutionary wrapper approach for sentiment analysis of various topics on Twitter. Knowledge-Based Systems 192 (Mar. 2020), 105353.
[7] Drus, Z., and Khalid, H. Sentiment Analysis in Social Media and Its Application: Systematic Literature Review. Procedia Computer Science 161 (2019), 707–714.
[8] Perez, L., and Wang, J. The Effectiveness of Data Augmentation in Image Classification using Deep Learning. arXiv:1712.04621 [cs] (Dec. 2017).
[9] Duong, H., and Hoang, V. T. Data Augmentation Based on Color Features for Limited Training Texture Classification. In 2019 4th International Conference on Information Technology (InCIT) (2019), pp. 208–211.
[10] Lu, X., Zheng, B., Velivelli, A., and Zhai, C. Enhancing Text
Categorization with Semantic-enriched Representation and Training Data Augmentation. Journal of the American Medical Informatics Association 13, (Sept. 2006), 526–535.
[11] Shan Lee, V. L., Gan, K. H., Tan, T. P., and Abdullah, R. Semi-supervised Learning for Sentiment Classification using Small Number of Labeled Data. Procedia Computer Science 161 (2019), 577–584.
[12] Shakeel, M. H., Karim, A., and Khan, I. A multi-cascaded model with data augmentation for enhanced paraphrase detection in short texts. Information Processing & Management 57, (May 2020), 102204.
[13] Wei, J., and Zou, K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (Hong Kong, China, 2019), Association for Computational Linguistics, pp. 6383–6389.
[14] Giridhara, P., Mishra, C., Venkataramana, R., Bukhari, S., and Dengel, A. A Study of Various Text Augmentation Techniques for Relation Classification in Free Text. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (Prague, Czech Republic, 2019), SCITEPRESS - Science and Technology Publications, pp. 360–367.
[15] Nguyen-Nhat, D.-K., and Duong, H.-T. One-Document Training for Vietnamese Sentiment Analysis. In Computational Data and Social Networks, A. Tagarelli and H. Tong, Eds., vol. 11917. Springer International Publishing, Cham, 2019, pp. 189–200.
[16] Mikolov, T., Chen, K., Corrado, G., and Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv:1301.3781 [cs] (Sept. 2013).
[17] Hoa, B. T. Nhóm hư từ mang ý nghĩa phủ định tiếng Việt. Tạp Chí Ngôn Ngữ (2014).
[18] Vu, X.-S., and Park, S.-B. Construction of Vietnamese SentiWordNet by using Vietnamese Dictionary.
[19] Ahuja, R., Chug, A., Kohli, S.,
Gupta, S., and Ahuja, P. The Impact of Features Extraction on the Sentiment Analysis. Procedia Computer Science 152 (2019), 341–348.
[20] Vu, X.-S. Pre-trained word2vec models for Vietnamese, 2016.
[21] Duong, H.-T., and Truong Hoang, V. A Survey on the Multiple Classifier for New Benchmark Dataset of Vietnamese News Classification. In 2019 11th International Conference on Knowledge and Smart Technology (KST) (Phuket, Thailand, Jan. 2019), IEEE, pp. 23–28.
[22] Al Amrani, Y., Lazaar, M., and El Kadiri, K. E. Random Forest and Support Vector Machine based Hybrid Approach to Sentiment Analysis. Procedia Computer Science 127 (2018), 511–520.
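As an illustration of the four augmentation techniques listed in the text data augmentation section (replacement, insertion, swapping, deletion), the following is a minimal sketch and not the authors' implementation. The `SIMILAR` table is hypothetical: its pairs are borrowed from the Table 1 example and stand in for the cosine-similarity neighbours that the paper obtains from pre-trained Word2vec vectors. The function names, the `deletable` set, and the choice of a single swap are likewise assumptions made for the example.

```python
import random

# Hypothetical similar-word table (pairs borrowed from Table 1). In the paper
# such neighbours come from cosine similarity over pre-trained Word2vec vectors.
SIMILAR = {
    "phục_vụ": ["cao_cấp"],
    "nhiệt_tình": ["hoan_nghênh"],
}


def replace_words(tokens, similar, rng):
    """Words Replacement: swap each token that has a known similar word."""
    return [rng.choice(similar[t]) if t in similar else t for t in tokens]


def insert_at_end(tokens, similar, rng):
    """Words Insertion: append a similar word of one token to the sentence end."""
    candidates = [t for t in tokens if t in similar]
    if not candidates:
        return list(tokens)
    return list(tokens) + [rng.choice(similar[rng.choice(candidates)])]


def swap_words(tokens, n, rng):
    """Words Swapping: exchange two randomly chosen positions, n times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def delete_words(tokens, deletable):
    """Words Deletion: drop tokens flagged as deletable (assumed given)."""
    kept = [t for t in tokens if t not in deletable]
    return kept or list(tokens)  # never return an empty sentence


def augment(tokens, similar, deletable, seed=0):
    """Produce one variant per technique from a segmented sentence."""
    rng = random.Random(seed)
    return {
        "replacement": replace_words(tokens, similar, rng),
        "insertion": insert_at_end(tokens, similar, rng),
        "swapping": swap_words(tokens, 1, rng),
        "deletion": delete_words(tokens, deletable),
    }


sentence = ["Nhân_viên", "phục_vụ", "nhiệt_tình", "lịch_sự"]
variants = augment(sentence, SIMILAR, deletable={"phục_vụ"})
for name, tokens in variants.items():
    print(name, " ".join(tokens))
```

Each original comment yields one variant per technique here; how many variants per comment the experiments actually generate is not something this sketch tries to reproduce.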

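The TF, IDF, and TF-IDF definitions in Equations (1) to (3) of the appended paper can be worked through on a toy corpus. This is a sketch under stated assumptions: the natural logarithm is used, the zero-denominator case is handled by adding 1 to the document count (the paper's exact adjustment is not legible in the source), and the four-comment corpus is invented for illustration.

```python
import math
from collections import Counter


def tf(term, doc):
    """Eq. (1): occurrences of term divided by the total words in doc."""
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Eq. (2), with an assumed smoothed denominator of 1 + document count."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))


def tf_idf(term, doc, corpus):
    """Eq. (3): product of term frequency and inverse document frequency."""
    return tf(term, doc) * idf(term, corpus)


# Invented toy corpus of segmented comments; "thì" occurs in every document.
corpus = [
    ["nhân_viên", "nhiệt_tình", "thì", "tốt"],
    ["món_ăn", "thì", "dở"],
    ["giá", "thì", "cao"],
    ["phục_vụ", "thì", "chậm"],
]

for term in ("thì", "nhiệt_tình"):
    print(term, round(tf_idf(term, corpus[0], corpus), 4))
```

A word such as "thì" that occurs in every comment receives a lower weight than a sentiment-bearing word such as "nhiệt_tình", which is the behaviour the stopword discussion relies on.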