Doctoral thesis: Text stream mining with clustering techniques

MINISTRY OF EDUCATION AND TRAINING
LAC HONG UNIVERSITY
VÕ THỊ HỒNG THẮM
TEXT STREAM MINING WITH CLUSTERING TECHNIQUES
DOCTORAL THESIS IN COMPUTER SCIENCE
Major: Computer Science
Code: 9480101
Scientific supervisor: Assoc. Prof. Dr. Đỗ Phúc
Đồng Nai, 2021

ACKNOWLEDGMENTS
I sincerely thank Assoc. Prof. Dr. Đỗ Phúc for his dedicated guidance, which enabled me to complete this doctoral thesis. I sincerely thank the lecturers of the Faculty of Postgraduate Studies, Lac Hong University, for creating favorable conditions and supporting me in completing the thesis. I respectfully thank Thu Dau Mot University for supporting my doctoral studies at Lac Hong University. I sincerely thank my friends and colleagues for their help and support throughout the completion of this thesis.
PhD candidate, Võ Thị Hồng Thắm

DECLARATION
I hereby declare that this thesis is my own research, carried out under the supervision of Assoc. Prof. Dr. Đỗ Phúc. The data and materials used in this research are truthful and have not been published in any other work. All referenced and reused material is fully cited.
Đồng Nai, … 2021
PhD candidate, Võ Thị Hồng Thắm

TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.1 Overview of the thesis topic
1.1.1 The research problem and its sig…
1.1.2 Challenges of the clus…
1.1.3 Research issues
1.1.4 Research problems
1.2 Contributions of the thesis and published works
1.3 Research objectives, scope and methods
1.3.1 Research objectives
1.3.2 Research scope
1.3.3 Research methods
1.4 Structure of the thesis
1.5 Chapter conclusion
CHAPTER 2: RELATED WORK
2.1 Comparison of several approaches related to text stream clustering
2.1.1 Approaches based on …
2.1.2 Approaches based on …
2.1.3 Approaches based on …
2.1.4 Topic modeling (Topic …
2.1.5 Mixture models based on …
2.1.6 Frequent graphs
2.1.7 Burst modeling on …
2.2 Chapter conclusion
CHAPTER 3: SEMANTIC TEXT STREAM CLUSTERING BASED ON GRAPHS OF WORDS
3.1 Method
3.1.1 Text feature representation
3.1.2 Representing text as graph…
3.1.3 Clustering text streams d…
3.2 Experiments and discussion
3.3 Chapter conclusion
CHAPTER 4: DETECTING TRENDY PHRASES IN TEXT STREAMS
4.1 Method
4.2 Experiments and discussion
4.3 Chapter conclusion
CHAPTER 5: CONCLUSION AND FUTURE WORK
5.1 Achieved results, limitations and future directions
5.2 Academic and practical significance of the thesis

ENGLISH-VIETNAMESE GLOSSARY (English terms)
Latent Dirichlet Allocation; Bag of Words; Benchmark; Cluster validation; Common sub-GOWs; Concept/topic drift; Corpus; Density-based; Dirichlet Process; Dirichlet-Hawkes Topic Model; Document batch; Dynamic Clustering Topic; Dynamic Topic Model; Features of meaning; Filtering; Frequent sub-graph; Graph of Words; Microblogs; Model's hyper-parameter sensitivity; MStream; Noise; Outlier; Politeness; Preprocess; Proximity measure; Sequential Monte Carlo; Sparse nature; Sparsity of text; Stemming and Lemmatization; Stop word; Streaming LDA; Survey; Temporal Dynamic Process Model; Temporal model-LDA; Temporal Text Mining; Term Frequency; Term Frequency-Inverse Document Frequency; Text corpus; Text similarity; Text to Graph; Trendy Keyword Extraction System; Tokenization; Topic tracking model; Vector Space model; Visualize; Word relatedness; Word segmentation; Word similarity; Word vector

LIST OF TABLES
Table 1.1: Analysis of the strengths and remaining limitations of the models
Table 3.1: Text representation with traditional BOW
Table 3.2: Text representation with TF-IDF-weighted BOW
Table 3.3: Text representation with GOW
Table 3.4: Text representation combining BOW and GOW
Table 3.5: Topic vector representation in the GOW-Stream model
Table 3.6: Details of the experimental datasets
Table 3.7: Configuration details of the text stream clustering models
Table 3.8: Average output of the text clustering task for different models, measured by NMI
Table 3.9: Experimental output of the text clustering task for different models, measured by F1
Table 4.1: Node attributes and relationships
Table 4.2: An example of computing word ranking scores
Table 4.3: An example of computing the total keyword weight per news category
Table 4.4: Example of the burst storage structure
Table 4.5: Bursts of the keyword "Facebook"
Table 4.6: Identifying the list of trendy words shared with the keyword "Facebook"
Table 4.7: Execution time of information crawling
Table 4.8: Execution time of inserting data into the graph database
Table 4.9: Processing run time
Table 4.10: Processing time for different numbers of articles of different lengths
Table 4.11: Similarity rate of the data produced by the TF-IDF algorithm implemented in different programming languages
Table 4.12: Keyword frequencies
Table 4.13: Some parameters used with word2vec
Table 4.14: Words related to the keyword "Ứng dụng" ("Application")
Table 4.15: Comparison of similarity scores under different distance and similarity measures
Table 4.16: Model training time
Table 4.17: Processing time to find the 10 related words
Table 4.18: Processing time of burst detection on news articles over 19 days

… experimental results of the research, including computing the processing time and comparing the processing time of the solutions on different datasets, and collecting datasets from the sources and exporting the results as datasets for related research.
The proposed future directions are as follows: study and restructure the datasets into a common standard format for publication; complete the system so that it meets user requirements on more platforms, such as smartphones and the Web, for practical deployment; and use the results of the trendy-phrase detection research to improve the GOW-Stream model, so that it captures more of the trends in the text arriving from the stream while clustering.

5.2 Academic and practical significance of the thesis
Academically, the thesis proposes the GOW-Stream model, which demonstrates its advantages when compared with recent state-of-the-art algorithms. The TKES system contributes a proposed algorithm for detecting trendy phrases, which could potentially be applied to optimizing the proposed GOW-Stream model. The research of the thesis has produced 04 papers at international conferences (Springer/ACM) and 02 papers in international journals (01 indexed in Scopus Q3 and 01 in SCIE Q3).
Practically, the proposed models and algorithms can be applied in many fields. The systems that were built have high practical value and serve the information-mining needs of a wide range of users in the era of the Fourth Industrial Revolution (Industry 4.0).

LIST OF PUBLISHED PAPERS
Four conference papers have been published:
[CT1] Hong, T. V. T., & Do, P. (2018, February). Developing a graph-based system
for storing, exploiting and visualizing text stream. In Proceedings of the 2nd International Conference on Machine Learning and Soft Computing (pp. 82-86). https://dl.acm.org/doi/abs/10.1145/3184066.3184084
[CT2] Hong, T. V. T., & Do, P. (2018, October). SAR: A Graph-Based System with Text Stream Burst Detection and Visualization. In International Conference on Intelligent Computing & Optimization (pp. 35-45). Springer, Cham. https://link.springer.com/chapter/10.1007/978-3-030-00979-3_4
[CT3] Hong, T. V. T., & Do, P. (2019, October). A Novel System for Related Keyword Extraction over a Text Stream of Articles. In International Conference on Intelligent Computing & Optimization (pp. 409-419). Springer, Cham. https://link.springer.com/chapter/10.1007/978-3-030-33585-4_41
[CT4] Hong, T. V. T., & Do, P. (2019, October). Comparing Two Models of Document Similarity Search over a Text Stream of Articles from Online News Sites. In International Conference on Intelligent Computing & Optimization (pp. 379-388). Springer, Cham. https://link.springer.com/chapter/10.1007/978-3-030-33585-4_38
Two journal papers (indexed in Scopus/SCIE) have been accepted for publication:
[CT5] Hong, T. V. T., & Do, P. TKES: A Novel System for Extracting Trendy Keywords from Online News Sites. Journal of the Operations Research Society of China (ISSN: 2194-6698), Scopus Q3. https://www.springer.com/journal/40305, http://link.springer.com/article/10.1007/s40305-020-00327-4
[CT6] Hong, T. V. T., & Do, P. GOW-Stream: a novel approach of graph-of-words based mixture model for semantic-enhanced text stream clustering. Intelligent Data Analysis (ISSN: 1571-4128), SCIE Q3, accepted for publication (September 2020). https://www.iospress.nl/journal/intelligent-data-analysis

… the issue of clustering short-text streams; the issue of clustering text streams whose topics are not fixed; the issue of taking word co-occurrence relations into account in text stream clustering; the issue of detecting trendy phrases and capturing the semantics of trends in the text arriving from the stream; …
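One of the research issues listed above is how to take word co-occurrence into account when clustering a text stream, which the thesis addresses with graph-of-words (GOW) representations. The minimal Python sketch below shows one common way to build a GOW under the following assumptions: documents are already tokenized, and two distinct words are linked by a count-weighted edge whenever they fall inside a sliding window (window=2 links only adjacent tokens). The function names `build_graph_of_words` and `edge_jaccard`, the window size and the Jaccard edge-overlap similarity are illustrative assumptions. They are not the construction or similarity measure used in GOW-Stream.

```python
from collections import Counter


def build_graph_of_words(tokens, window=2):
    """Build an undirected, count-weighted graph of words (illustrative sketch).

    Edges are keyed by a sorted word pair; two distinct words are linked when
    they occur within `window` consecutive tokens (window=2: adjacent only).
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Look ahead at most window - 1 tokens from position i.
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:  # ignore self-loops
                edges[tuple(sorted((word, other)))] += 1
    return edges


def edge_jaccard(g1, g2):
    """Hypothetical similarity: Jaccard overlap of the two graphs' edge sets."""
    e1, e2 = set(g1), set(g2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


doc_a = "text stream clustering groups text stream documents".split()
doc_b = "text stream mining".split()
gow_a = build_graph_of_words(doc_a)
gow_b = build_graph_of_words(doc_b)
print(gow_a[("stream", "text")])             # → 2
print(round(edge_jaccard(gow_a, gow_b), 3))  # → 0.167
```

Sorted edge keys make "text stream" and "stream text" count as the same relation. A larger window links words that are further apart, which yields denser graphs.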
… contributions of the thesis: a comparison of several approaches related to text stream clustering, and of approaches to event detection and burst detection on text streams. 2.1 Comparison of several approaches related to text stream clustering. Recent studies on clust… … relations between words in text stream clustering; the issue of clustering Vietnamese text streams; the issue of preprocessing text content before clustering and of applying a keyword-extraction mechanism during text preprocessing; the issue of detecting trend…
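The excerpts above also mention event and burst detection on text streams, which underpins the SAR and TKES systems. The sketch below is a hedged illustration only. It flags a "burst" for one keyword when the keyword's count in a time slot exceeds the mean plus `k` standard deviations of the preceding `history` slots. This moving-threshold rule, the function name `detect_bursts`, the parameter values and the sample counts are all assumptions chosen for clarity. The rule is not Kleinberg's two-state automaton, and it is not necessarily the burst-detection algorithm used in the thesis.

```python
from statistics import mean, pstdev


def detect_bursts(counts, history=3, k=2.0):
    """Return indices of time slots whose count exceeds mean + k * std
    of the `history` slots immediately before them (illustrative rule)."""
    bursts = []
    for t in range(history, len(counts)):
        past = counts[t - history:t]               # trailing window of slots
        threshold = mean(past) + k * pstdev(past)  # population standard deviation
        if counts[t] > threshold:
            bursts.append(t)
    return bursts


# Hypothetical daily frequencies of a single keyword (not data from the thesis).
daily = [3, 4, 3, 4, 15, 5, 4]
print(detect_bursts(daily))  # → [4]
```

In this example, the spike on day 4 is reported only once. It also inflates the baseline for the following slots, so the ordinary counts on days 5 and 6 stay below the threshold.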

Format
Pages	151
File size	2.57 MB