Analyzing online customer experience in the hotel sector: a topic modeling approach

MINISTRY OF EDUCATION AND TRAINING
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

NGUYỄN VĂN HỒ

ANALYZING ONLINE CUSTOMER EXPERIENCE IN THE HOTEL SECTOR: A TOPIC MODELING APPROACH

Major: Management Information Systems (Information and communication design technology)
Training orientation: Applied
Code: 8340405

MASTER'S THESIS IN ECONOMICS

SCIENTIFIC SUPERVISOR: Dr. HỒ TRUNG THÀNH

Ho Chi Minh City, 2020

DECLARATION

I declare that the thesis "Analyzing online customer experience in the hotel sector: a topic modeling approach" is my own research, carried out under the supervision of Dr. Hồ Trung Thành. Except for results taken from other works, which are clearly cited in the thesis, this research has not been published before. All data were collected, and all content and results were analyzed, by the author. The work arises from the practical needs of the topic and from my own wish to research and learn.

Ho Chi Minh City, 8 August 2020
Author
Nguyễn Văn Hồ

TABLE OF CONTENTS

TITLE PAGE
DECLARATION
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT (VIETNAMESE)
ABSTRACT (ENGLISH)
CHAPTER 1: OVERVIEW OF THE TOPIC
1.1 Background of the topic
1.2 Research objectives
1.3 Research subject and scope
1.4 Research methods
1.5 Research process
1.6 Contributions of the research
1.7 Structure of the thesis
CHAPTER 2: LITERATURE REVIEW
2.1 Customer experience analysis
2.2 Text data analysis
2.3 Natural language processing
2.4 Topic models in the analysis of online customer opinions
2.5 The NPS index model in customer experience management
2.6 Research gaps
CHAPTER 3: THEORETICAL BACKGROUND
3.1 Topic models
3.1.1 Latent Semantic Analysis (LSA)
3.1.2 Probabilistic Latent Semantic Analysis (pLSA)
3.1.3 Latent Dirichlet Allocation (LDA)
3.2 Topic models over time
3.3 The Net Promoter Score index model
CHAPTER 4: BUILDING THE EXPERIMENTAL MODEL
4.1 Overall research model
4.2 Experimental design for the topic model
4.2.1 Data collection
4.2.2 Data preprocessing
4.2.3 Applying the LDA model to the corpus
4.2.4 Advantages and disadvantages of the LDA model
4.3 Topic labeling
4.4 Experiment with the topic model combined with the time factor
4.5 Experiment with the Net Promoter Score index model
CHAPTER 5: RESULTS AND DISCUSSION
5.1 Data collection and preprocessing results
5.2 Training and evaluation results of the topic model
5.2.1 Extracting the set of topics
5.2.2 Inferring topic labels
5.2.3 Visualizing the results with the pyLDAvis package
5.3 Results of the topic model combined with the time factor
5.3.1 Extracting the set of topics over time
5.3.2 Topics and keyword sets over time
5.3.3 Meaning of the topic model, topic set, and keywords
5.4 Visualizing the NPS index model
CHAPTER 6: CONCLUSION AND FUTURE WORK
6.1 Summary of the research
6.2 Limitations and future directions
LIST OF THE AUTHOR'S PUBLICATIONS
REFERENCES
APPENDICES

LIST OF ABBREVIATIONS

Abbreviation    Full term
BoW             Bag of words
CEM             Customer Experience Management
CS              Coherence Score
DTM             Dynamic Topic Modeling
EM              Expectation Maximization
K               Number of topics in the LDA model
LDA             Latent Dirichlet Allocation
LSA             Latent Semantic Analysis
NLP             Natural Language Processing
NPS             Net Promoter Score
pLSA            probabilistic Latent Semantic Analysis
TCM             Traffic Management Center
TF-IDF          Term Frequency - Inverse Document Frequency
UNWTO           World Tourism Organization

LIST OF TABLES

Table 3.1: Definitions used in the LDA model
Table 5.1: Data collection results
Table 5.2: Topics 0, 2, 14 with the fifteen highest-probability words
Table 5.3: Prominent topics
Table 5.4: Topic labeling results

LIST OF FIGURES

Figure 1.1: Customer experience management strategy
Figure 1.2: Research process diagram
Figure 3.1: Probabilistic generative graph of the pLSA model (Hofmann, 1999)
Figure 3.2: Generative process of the pLSA model
Figure 3.3: The LDA model (Blei et al., 2003)
Figure 3.4: Probabilistic generative process of DTM (Blei and Lafferty, 2006)
Figure 3.5: Formula for computing the NPS
Figure 4.1: Overall research model
Figure 4.2: Data preprocessing process
Figure 4.3: Relationship between Coherence Score and number of topics (K)
Figure 5.1: Distribution of the number of reviews by year
Figure 5.2: Topics and keywords shown as word clouds
Figure 5.3: Topics and keywords by decreasing frequency
Figure 5.4: Topics and keywords visualized with pyLDAvis
Figure 5.5: Topics over time (2009-2020)
Figure 5.6: Topic and keywords over time
Figure 5.7: Topic and keywords over time
Figure 5.8: Topic 10 and keywords over time
Figure 5.9: Topic 12 and keywords over time
Figure 5.10: NPS visualization

ABSTRACT

Managing and analyzing customer experience is an important issue for most companies, especially businesses in service sectors such as tourism, hotels, and restaurants. In particular, collecting and analyzing customer comments has attracted research interest. Today, customer reviews and feedback on e-commerce sites, forums, and social networks are stored as text and form a huge source of data for analyzing customers' experience of a company's products and services. In other words, collecting, analyzing, and understanding the information hidden in this data is a way to understand the customer (customer insights). In this study, we first collected a corpus of 127,896 customer comments written in English about hotels. We then experimented on this corpus and chose the best parameter K using Perplexity together with the Coherence Score (CS), and used it as the input parameter of the model. Finally, we applied the Latent Dirichlet Allocation (LDA) topic model with that K to discover latent topics in the corpus. The research model was then extended with Dynamic Topic Modeling (DTM) to analyze the topics together with the time factor. The results show the hidden topics and the highest-probability keywords that customers care about. In addition, the study also
calculated and analyzed the Net Promoter Score (NPS) from customer rating scores and visualized the results on overview dashboards. Applying the empirical results of the model supports decision making: it helps businesses improve products and services, increase the satisfaction of existing customers, and attract new ones, and it can be applied in the management and business development of companies in the hotel sector.

Keywords: hotel services, LDA, online customer experience, text mining, topic modeling

Program for collecting the list of all hotels

Step 1: Send a request to the server to list the hotel IDs for a search keyword. The figure shows the request sent to the server to return the list of hotels for the keyword Chau Doc, and the JSON result returned. The returned data contains the attribute ObjectID, which is the hotel ID. This parameter is used in later requests to the server to return related information such as reviews and ratings.

Step 2: With the hotel ID (ObjectID) from step 1 as input, send a request to the server for the review data and other information related to that hotel. As the figure below shows, the returned information includes the rating score, the review text, the review time, and so on. From the JSON file we extract the attribute values and save them as CSV.

1.4 Data collection results

After the program finishes, the data returned from the server is stored in the folder result > json, as in the figure. The next figure shows the dataset extracted from Agoda, with fields such as hotel_id, hotel_name, review_comments, and rating_text.

Appendix 2: Data preprocessing

2.1 Combining the datasets into a corpus

Union the CSV files into one combined file before starting preprocessing:

    # Combine all result files in the list
    import os
    import glob
    import pandas as pd

    os.chdir(r"C:\Users\honguyen\Desktop\agoda_results\results")
    extension = 'csv'
    all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
    # Combine all files in the list
    combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
    # Export to CSV (written into the same folder, so a second run would also pick up this file)
    combined_csv.to_csv("agoda_results_combined_csv.csv", index=False, encoding='utf-8-sig')

Reload the data file and keep only the English reviews:

    import pandas as pd

    df = pd.read_csv(r"agoda_results_combined_csv.csv", engine='python')
    # Keep rows with a non-null comment, then only comments written in English
    data1 = df.loc[pd.notnull(df.review_comments)]
    # .copy() so that new columns can be added later without a SettingWithCopyWarning
    data = data1.loc[data1.comment_language == 'en'].copy()
    data.count()

Result:

    city_id             127896
    city_name           127896
    hotel_id            127896
    hotel_name          127866
    hotel_review_id     127896
    reviewer_name       127126
    reviewer_country    127896
    rating              127896
    travel_type_name    127896
    room_type_name      127806
    stay_length         127896
    check_in_date       127896
    review_date         127896
    rating_text         127896
    review_comments     127896
    comment_language    127896
    year                127896
    dtype: int64

2.2 Tokenization

Import the libraries, remove special characters, and convert the text to lowercase:

    import re
    import numpy as np
    import pandas as pd
    from pprint import pprint
    # Gensim
    import gensim
    import gensim.corpora as corpora
    from gensim.utils import simple_preprocess
    from gensim.models import CoherenceModel

    # Remove punctuation
    data['paper_text_processed'] = data['review_comments'].map(lambda x: re.sub(r'[,\.!?]', '', x))
    # Convert the comments to lowercase.
    # Note: this starts again from review_comments, so the punctuation removal above
    # is overwritten (the output below still contains commas); simple_preprocess
    # drops punctuation in the tokenization step that follows.
    data['paper_text_processed'] = data['review_comments'].map(lambda x: x.lower())
    # Print out the first rows
    data['paper_text_processed'].head()

Result (first five rows; of the index labels only 46, 48 and 50 survive in the source):

    new hotel, good location, friendly staff
    spacious and clean good place for family with
    it is located near the train station and most
    good sevice, clean and tidy room the building
    its a good hotel in a good location look for
    Name: paper_text_processed, dtype: object

Tokenize:

    import gensim
    from gensim.utils import simple_preprocess

    def sent_to_words(sentences):
        for sentence in sentences:
            # simple_preprocess lowercases, tokenizes and drops punctuation;
            # deacc=True additionally strips accent marks
            yield gensim.utils.simple_preprocess(str(sentence), deacc=True)

    data2 = data.paper_text_processed.values.tolist()
    data_words = list(sent_to_words(data2))
    print(data_words[:1][0][:30])

Result:

    ['new', 'hotel', 'good', 'location', 'friendly', 'staff']

2.3 Building bigrams and trigrams for the model

Build the bigram and trigram models with the Phrases class of gensim:

    # Build the bigram and trigram models
    bigram = gensim.models.Phrases(data_words, min_count=5, threshold=100)  # higher threshold, fewer phrases
    trigram = gensim.models.Phrases(bigram[data_words], threshold=100)
    # Faster way to get a sentence clubbed as a trigram/bigram
    bigram_mod = gensim.models.phrases.Phraser(bigram)
    trigram_mod = gensim.models.phrases.Phraser(trigram)

2.4 Removing stopwords

Call the helper functions to remove stopwords, form bigrams, and lemmatize:

    import spacy

    # The helpers remove_stopwords, make_bigrams and lemmatization are called here;
    # their definitions are not included in this appendix.
    # Remove stop words
    data_words_nostops = remove_stopwords(data_words)
    # Form bigrams
    data_words_bigrams = make_bigrams(data_words_nostops)
    # Initialize the spaCy English model, keeping only the tagger component (for efficiency)
    nlp = spacy.load("en_core_web_sm", disable=['parser', 'ner'])
    # Lemmatize, keeping only nouns, adjectives, verbs and adverbs
    data_lemmatized = lemmatization(data_words_bigrams, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV'])
    print(data_lemmatized[:1][0][:30])

2.5 Creating the dictionary and the corpus

Use the gensim package to create the dictionary and the corpus as follows:

    import gensim.corpora as corpora

    # Create the dictionary
    id2word = corpora.Dictionary(data_lemmatized)
    # Create the corpus
    texts = data_lemmatized

Appendix 3: Building the LDA topic model with Gensim

3.1 Vectorizing words

Encode each document as the dictionary IDs of its words together with how often each word occurs in the document:

    # Term document frequency
    corpus = [id2word.doc2bow(text) for text in texts]
    # View
    print(corpus[:1][0][:30])

Result:

    [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]

3.2 Training the LDA model

The parameters passed to the LDA model include the number of topics, the number of documents per training chunk, and the number of training passes:

    # Build the LDA model
    lda_model = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                                id2word=id2word,
                                                num_topics=15,
                                                random_state=100,
                                                update_every=1,
                                                chunksize=100,
                                                passes=5,
                                                alpha='auto',
                                                per_word_topics=True)

After training, the model is saved to a folder so that it can be reused later:

    lda_model.save("your_folder/model_lda_20200808.model")

3.3 Topics obtained after training

Once the LDA model is trained, we can find the distribution of documents over topics and represent each topic by its distribution over words. We can list the 10 most important words of each topic together with the weights
of their distribution:

    from pprint import pprint

    # Print the keywords of the 15 topics
    pprint(lda_model.print_topics())
    doc_lda = lda_model[corpus]

Result:

    [(0,
      '0.243*"good" + 0.085*"hotel" + 0.072*"location" + 0.060*"value" + '
      '0.057*"service" + 0.051*"money" + 0.050*"great" + 0.045*"staff" + '
      '0.042*"excellent" + 0.039*"breakfast"'),
     (1,
      '0.080*"stay" + 0.050*"hotel" + 0.045*"time" + 0.037*"back" + 0.035*"staff" '
      '+ 0.032*"make" + 0.031*"come" + 0.027*"go" + 0.023*"trip" + 0.021*"thank"'),
     (2,
      '0.148*"room" + 0.045*"bed" + 0.044*"bathroom" + 0.036*"water" + '
      '0.031*"shower" + 0.030*"clean" + 0.025*"hot" + 0.023*"small" + 0.019*"big" '
      '+ 0.018*"good"'),
     (3,
      '0.146*"perfect" + 0.142*"amazing" + 0.084*"kind" + 0.047*"family" + '
      '0.044*"spa" + 0.044*"resort" + 0.035*"massage" + 0.035*"access" + '
      '0.024*"kid" + 0.022*"child"'),
     (4,
      '0.043*"room" + 0.033*"night" + 0.030*"door" + 0.025*"next" + 0.024*"floor" '
      '+ 0.020*"morning" + 0.017*"luggage" + 0.016*"day" + 0.016*"move" + '
      '0.016*"noise"'),
     (5,
      '0.095*"breakfast" + 0.056*"quarter" + 0.048*"food" + 0.036*"good" + '
      '0.030*"fruit" + 0.025*"restaurant" + 0.024*"choice" + 0.022*"coffee" + '
      '0.022*"buffet" + 0.021*"local"'),
     (6,
      '0.138*"stay" + 0.136*"recommend" + 0.088*"would" + 0.054*"highly" + '
      '0.050*"hotel" + 0.040*"place" + 0.038*"great" + 0.036*"definitely" + '
      '0.026*"night" + 0.020*"enjoy"'),
     (7,
      '0.089*"pool" + 0.080*"view" + 0.074*"great" + 0.066*"lovely" + '
      '0.057*"beautiful" + 0.055*"room" + 0.051*"fantastic" + 0.037*"upgrade" + '
      '0.030*"nice" + 0.021*"swimming"'),
     (8,
      '0.149*"staff" + 0.102*"friendly" + 0.090*"helpful" + 0.076*"room" + '
      '0.058*"clean" + 0.048*"good" + 0.042*"breakfast" + 0.041*"nice" + '
      '0.040*"great" + 0.038*"hotel"'),
     (9,
      '0.085*"service" + 0.062*"front" + 0.048*"customer" + 0.043*"desk" + '
      '0.043*"train" + 0.032*"staff" + 0.029*"need" + 0.029*"manager" + 0.025*"do" '
      '+ 0.022*"booking"'),
     (10,
      '0.100*"walk" + 0.093*"restaurant" + 0.082*"hotel" + 0.043*"distance" + '
      '0.039*"old" + 0.036*"location" + 0.035*"close" + 0.034*"locate" + '
      '0.034*"shop" + 0.033*"market"'),
     (11,
      '0.066*"hotel" + 0.025*"well" + 0.022*"room" + 0.020*"staff" + 0.017*"stay" '
      '+ 0.013*"make" + 0.010*"price" + 0.009*"look" + 0.009*"would" + '
      '0.009*"service"'),
     (12,
      '0.044*"room" + 0.038*"good" + 0.034*"nice" + 0.033*"old" + 0.032*"hotel" + '
      '0.028*"little" + 0.028*"place" + 0.027*"location" + 0.024*"clean" + '
      '0.023*"bit"'),
     (13,
      '0.042*"room" + 0.038*"check" + 0.036*"hotel" + 0.030*"book" + 0.028*"staff" '
      '+ 0.025*"arrive" + 0.023*"ask" + 0.020*"give" + 0.019*"night" + '
      '0.018*"even"'),
     (14,
      '0.073*"tour" + 0.033*"taxi" + 0.028*"help" + 0.024*"also" + 0.024*"arrange" '
      '+ 0.024*"free" + 0.022*"bus" + 0.022*"tourist" + 0.021*"airport" + '
      '0.021*"take"')]

3.4 Computing the perplexity and coherence score of the model

Perplexity and the coherence score indicate how good or bad the model is. They are also used to search for a number of topics that suits the data. Perplexity is derived from the log-likelihood of the data under the model, so the lower the perplexity, the better the model. Conversely, the higher the coherence score, the better the model.

    from gensim.models.coherencemodel import CoherenceModel

    # Compute perplexity: a measure of how good the model is, lower is better.
    # log_perplexity returns the per-word likelihood bound; perplexity = 2 ** (-bound).
    print('\nPerplexity: ', lda_model.log_perplexity(corpus))
    # Compute the coherence score
    coherence_model_lda = CoherenceModel(model=lda_model, texts=data_lemmatized,
                                         dictionary=id2word, coherence='c_v')
    coherence_lda = coherence_model_lda.get_coherence()
    print('\nCoherence Score: ', coherence_lda)

Result:

    Perplexity:  -8.151460416028938
    Coherence Score:  0.6553682213438064

3.5 Visualization

Use the pyLDAvis package to visualize the results:

    !pip install pyLDAvis
    # Plotting tools
    import pyLDAvis
    import pyLDAvis.gensim  # don't skip this (renamed pyLDAvis.gensim_models in newer pyLDAvis releases)
    import matplotlib.pyplot as plt
    %matplotlib inline
    # Visualize the topics
    pyLDAvis.enable_notebook()
    vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)

Appendix 4: Building the DTM model with Gensim

4.1 Data preprocessing

Follow the same preprocessing steps
as for the LDA model, then load the data into a corpus:

    from gensim import corpora

    class DTMcorpus(corpora.textcorpus.TextCorpus):
        def get_texts(self):
            return self.input

        def __len__(self):
            return len(self.input)

    corpus = DTMcorpus(data_lemmatized)
    corpus

Result:

    <__main__.DTMcorpus at 0x1885d183cf8>

4.2 Computing the number of documents in each time slice

The DTM model requires the number of documents in each time slice to be specified before training. The following code computes them:

    # Set time_slices: number of documents per year, 2009 to 2020.
    # data4 is the preprocessed review table; its construction is not shown in this appendix.
    years = range(2009, 2021)
    time_slices = [data4.loc[data4["year"].dt.year == y].shape[0] for y in years]
    print(*time_slices)

Result:

    10264 9893 8263 9840 11882 8594 11043 12800 11510 10258 12993 10601

4.3 Training the model

Load the dtm-win64.exe binary:

    import datetime
    # The gensim.models.wrappers package exists in gensim 3.x; it was removed in gensim 4.0
    from gensim.models.wrappers.dtmmodel import DtmModel

    print("Dynamic Topic Modeling started.")
    start = datetime.datetime.now()
    dtm_model = DtmModel('dtm-win64.exe', corpus=corpus, time_slices=time_slices,
                         num_topics=5, id2word=id2word, initialize_lda=True)
    finish = datetime.datetime.now()
    print(f"\nComplete! Elapsed Time: {(finish-start).total_seconds()} seconds\n")

Train the model:

    # Run dynamic topic modeling
    from gensim.models.wrappers.dtmmodel import DtmModel

    model = DtmModel(r"C:\Users\honguyen\Downloads\dtm-master\bin\dtm-win64.exe",
                     corpus, time_slices, num_topics=15, id2word=corpus.dictionary)
    # Topic evolution
    num_topics = 15
    for topic_no in range(num_topics):
        print("\nTopic", str(topic_no))
        for time in range(len(time_slices)):
            print("Time slice", str(time))
            print(model.show_topic(topic_no, time, topn=10))

4.4 Results

Display the results:

    model.show_topics(num_topics=-1, times=5, num_words=10, log=False, formatted=True)

Result:

    ['0.044*place + 0.029*family + 0.025*rent + 0.023*really + 0.022*bike + 0.019*stay + 0.017*great + 0.017*bed + 0.015*recommend + 0.015*owner',
     '0.033*tour + 0.022*boat + 0.019*stay + 0.016*take + 0.014*trip + 0.012*book + 0.012*day + 0.011*go + 0.011*help + 0.011*great',
     '0.372*bus + 0.110*walk + 0.104*minute + 0.103*station + 0.027*ferry + 0.027*free + 0.022*drop + 0.020*catch + 0.017*away + 0.014*get',
     '0.050*night + 0.048*room + 0.029*sleep + 0.026*noisy + 0.025*noise + 0.021*morning + 0.020*next + 0.018*door + 0.017*street + 0.015*hear',
     '0.049*breakfast + 0.041*pool + 0.020*food + 0.018*beach + 0.018*restaurant + 0.012*coffee + 0.010*buffet + 0.010*nice + 0.009*staff + 0.009*choice',
     '0.137*good + 0.058*hotel + 0.053*price + 0.052*room + 0.039*clean + 0.025*location + 0.021*breakfast + 0.017*stay + 0.015*nice + 0.013*service',
     '0.075*bad + 0.072*room + 0.060*old + 0.047*dirty + 0.046*smell + 0.037*staff + 0.029*facility + 0.027*con + 0.025*pro + 0.022*poor',
     '0.064*room + 0.039*bed + 0.039*water + 0.029*hot + 0.027*shower + 0.025*clean + 0.024*bathroom + 0.014*small + 0.014*good + 0.011*comfortable',
     …
     '0.104*great + 0.102*view + 0.096*room + 0.051*nice + 0.034*location + 0.027*clean + 0.018*beach + 0.017*sea + 0.016*breakfast + 0.015*river',
     '0.106*stay + 0.057*would + 0.053*nice + 0.044*place + 0.032*hotel + 0.028*really + 0.025*day + 0.024*night + 0.017*well + 0.016*little']

Save the model:

    # Save to folder. Note: this writes a pickled gensim model, not an Excel workbook,
    # despite the .xlsx extension in the file name.
    model.save(r"C:\Users\honguyen\Desktop\dynamic_topic_modeling_20200823.xlsx",
               separately=None, sep_limit=10485760, ignore=frozenset({}), pickle_protocol=2)

Visualization:

    # Topic evolution visualization
    from gensim.models.wrappers import DtmModel

    # dtm_vis is a method of the trained model; time is the index of the time slice to show
    vis_data = model.dtm_vis(corpus, time)
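The per-year document counts of section 4.2 can also be derived in one pass. The following is a minimal sketch under stated assumptions, not the author's code: the helper name build_time_slices and the toy data are hypothetical, and the review_date column is assumed to hold dates that pandas can parse. The sketch also sorts the reviews chronologically, because the DTM wrapper assigns documents to time slices by their position in the corpus, so the corpus order has to match the order of the counts.

```python
import pandas as pd


def build_time_slices(frame, date_col="review_date"):
    """Return (frame sorted by year, list of per-year document counts).

    Hypothetical helper: years with no reviews simply produce no slice.
    """
    years = pd.to_datetime(frame[date_col]).dt.year
    # A stable sort keeps the original order of reviews within each year.
    ordered = frame.assign(_year=years).sort_values("_year", kind="stable")
    # Count documents per year, in ascending year order.
    counts = ordered["_year"].value_counts().sort_index()
    return ordered.drop(columns="_year"), counts.tolist()


# Toy data standing in for the review table (hypothetical values).
reviews = pd.DataFrame({
    "review_date": ["2010-05-01", "2009-01-15", "2010-07-20", "2011-03-02"],
    "review_comments": ["b", "a", "c", "d"],
})

ordered, time_slices = build_time_slices(reviews)
print(time_slices)  # [1, 2, 1]
```

A useful sanity check before training is that the counts add up to the number of documents in the corpus, that is, sum(time_slices) == len(ordered).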

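The abstract mentions computing the Net Promoter Score from customer rating scores. The thesis's own formula (Figure 3.5) is not reproduced in this excerpt, so the sketch below is only an illustration based on the conventional NPS definition on a 0 to 10 scale: promoters score 9 or 10, detractors score 0 to 6, and the NPS is the percentage of promoters minus the percentage of detractors. The function name, the use of these thresholds for hotel review ratings, and the sample ratings are all assumptions.

```python
def net_promoter_score(ratings, promoter_min=9, detractor_max=6):
    """Conventional NPS: % promoters minus % detractors, in the range -100..100.

    The thresholds follow the standard 0-10 NPS convention; applying them
    to hotel review ratings is an assumption of this sketch.
    """
    ratings = list(ratings)
    if not ratings:
        raise ValueError("ratings must not be empty")
    promoters = sum(1 for r in ratings if r >= promoter_min)
    detractors = sum(1 for r in ratings if r <= detractor_max)
    return 100.0 * (promoters - detractors) / len(ratings)


# Hypothetical ratings: 4 promoters, 2 passives, 2 detractors.
sample = [10, 9, 8, 7, 6, 3, 10, 9.2]
print(net_promoter_score(sample))  # 25.0
```

Passive ratings (here 7 and 8) count toward the total number of responses but toward neither group, which is why they pull the score toward zero.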