
Topic Modeling and Its Applications


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF TECHNOLOGY

THÂN QUANG KHOÁT

TOPIC MODELING AND ITS APPLICATIONS

MAJOR: INFORMATION TECHNOLOGY
THESIS FOR THE DEGREE OF MASTER OF SCIENCE

SUPERVISOR: Prof. HỒ TÚ BẢO

HANOI, 2009

PLEDGE

I promise that the content of this thesis was written solely by me. All of the content is based on reliable references, such as papers published in distinguished international conferences and journals, and books published by widely known publishers. Many parts and discussions of the thesis are new and have not previously been published by any other author.

ACKNOWLEDGEMENT

First and foremost, I would like to express my gratitude to my supervisor, Professor Ho Tu Bao, for introducing me to this attractive research area, for his willingness to promptly support me in completing the thesis, and for much invaluable advice from the very start of my thesis.

I would like to sincerely thank Nguyen Phuong Thai and Nguyen Cam Tu for sharing some data sets and for pointing me to sources on the network where I could find implementations of some topic models. Thanks also go to Phung Trung Nghia for spending his valuable days helping me load the data for my experiments.

Finally, I would like to thank David Blei and Thomas Griffiths for their insightful discussions on Topic Modeling and for providing the C implementation of one of their topic models.

TABLE OF CONTENTS

List of Phrases
List of Tables
List of Figures
Chapter 1 INTRODUCTION
Chapter 2 MODERN PROGRESS IN TOPIC MODELING
  2.1 Linear algebra based models
  2.2 Statistical topic models
  2.3 Discussion and notes
Chapter 3 LINEAR ALGEBRA BASED TOPIC MODELS
  3.1 An overview
  3.2 Latent Semantic Analysis
  3.3 QR factorization
  3.4 Discussion
Chapter 4 PROBABILISTIC TOPIC MODELS
  4.1 An overview
  4.2 Probabilistic Latent Semantic Analysis
  4.3 Latent Dirichlet Allocation
  4.4 Hierarchical Latent Dirichlet Allocation
  4.5 Bigram Topic Model
Chapter 5 SOME APPLICATIONS OF TOPIC MODELS
  5.1 Classification
  5.2 Analyzing research trends over time
  5.3 Semantic representation
  5.4 Information retrieval
  5.5 More applications
  5.6 Experimenting with some topic models
CONCLUSION
REFERENCES

LIST OF PHRASES

Abbreviation    Full name
AI              Artificial Intelligence
ART             Author-Recipient-Topic Model
AT              Author-Topic Model
BTM             Bigram Topic Model
cDTM            Continuous Dynamic Topic Model
CTM             Correlated Topic Model
dDTM            Discrete Dynamic Topic Model
DELSA           Dirichlet Enhanced LSA
DiscLDA         Discriminative LDA
EM              Expectation Maximization
HDP             Hierarchical Dirichlet Processes
HDP-RE          Hierarchical Dirichlet Processes with random effects
hLDA            Hierarchical Latent Dirichlet Allocation
HMM-LDA         Hidden Markov Model LDA
HTMM            Hidden Topic Markov Model
IG-LDA          Incremental Gibbs LDA
IR              Information Retrieval
LDA             Latent Dirichlet Allocation
LSA             Latent Semantic Analysis
MBTM            Memory Bounded Topic Model
MCMC            Markov Chain Monte Carlo
nCRP            Nested Chinese restaurant process
NetSTM          Network Regularized Statistical Topic Model
PF-LDA          Particle Filter LDA
pLSA            Probabilistic Latent Semantic Analysis
PLSV            Probabilistic Latent Semantic Visualization
sLDA            Supervised Latent Dirichlet Allocation
Spatial LDA     Spatial Latent Dirichlet Allocation
STM             Syntactic Topic Model
SVD             Singular Value Decomposition
TEM             Tempered EM algorithm

LIST OF TABLES

Table 2.1 Some selected Probabilistic topic models
Table 5.1 DiscLDA for Classification
Table 5.2 Comparison of query likelihood retrieval (QL), cluster-based retrieval (CBDM) and retrieval with the LDA-based document models (LBDM)
Table 5.3 The most probable topics from NIPS and VnExpress collections
Table 5.4 Finding the topics of a document
Table 5.5 Finding topics of a report
Table 5.6 Selected topics found by HMM-LDA
Table 5.7 Classes of function words found by HMM-LDA

LIST OF FIGURES

Figure 1.1 Some approaches to representing knowledge
Figure 2.1 A general view on Topic Modeling
Figure 2.2 Probabilistic topic models in view of the bag-of-words assumption
Figure 2.3 Viewing generative models in terms of Topics
Figure 2.4 A parametric view on generative models
Figure 3.1 A corpus consisting of documents
Figure 3.2 An illustration of finding topics by LSA using cosine
Figure 3.3 A geometric illustration of representing items in 2-dimensional space
Figure 3.4 Finding relevant documents using QR-based method
Figure 4.1 Graphical model representation of pLSA
Figure 4.2 A geometric interpretation of pLSA
Figure 4.3 Graphical model representation of LDA
Figure 4.4 A geometric interpretation of LDA
Figure 4.5 A variational inference algorithm for LDA
Figure 4.6 A geometric illustration of document generation process
Figure 4.7 An example of hierarchy of topics [8]
Figure 4.8 A graphical model representation of BTM
Figure 5.1 LDA for Classification
Figure 5.2 The dynamics of the three hottest and three coldest topics
Figure 5.3 Evolution of topics through decades

Chapter 1 INTRODUCTION

Information Retrieval (IR) has long been a very active research area. Its development is closely tied to increasingly large corpora, such as collections of Web pages or collections of scientific papers accumulated over the years. It therefore poses many hard questions that have received much attention from researchers. One of the best-known questions, which seems never to be fully settled, is how to automatically index the documents of a given corpus or database. Another substantial question is how to find, in a semantic manner, the documents from the Internet or a given corpus that are most relevant to a user's query.

Finding and ranking are usually important tasks in IR. Many tools supporting these tasks are now available, for example Google and Yahoo. However, most of these tools can only search for documents by word matching rather than semantic matching. Semantics is well known to be complicated, so finding and ranking documents in the presence of semantics is extremely hard. Despite this, these tasks potentially have many important applications which, in my opinion, are future web service technologies: for instance, semantic searching, semantic advertising, academic recommending, and intelligent controlling.

Semantics is a hot topic not only in the IR community but also in the Artificial Intelligence (AI) community. In particular, in the field of knowledge representation it is crucial to know how to effectively represent natural knowledge gathered from the surrounding environment so that reusing it or integrating new knowledge is easy and efficient. To obtain a good knowledge database, semantics cannot be absent, since any word has its own meanings and has semantic relations to some other words. As we know, a word may have multiple senses and play different roles in

Posted: 22/01/2024, 17:06
