Graduation Thesis (Computer Science): Finding the Semantic Similarity in Vietnamese

DOCUMENT INFORMATION

Pages: 67
Size: 2.77 MB

Content

VIETNAM NATIONAL UNIVERSITY, HA NOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Nguyen Tien Dat

FINDING THE SEMANTIC SIMILARITY IN VIETNAMESE

GRADUATION THESIS
Major Field: Computer Science
Supervisor: Dr. Phạm Bảo Sơn

Ha Noi – 2010


Abstract

This thesis examines the quality of semantic vector representations built with random projection and the Hyperspace Analogue to Language (HAL) model when applied to Vietnamese. The main goal is to find semantic similarity, and in particular synonyms, in Vietnamese. We are also interested in the stability of our approach, which uses Random Indexing and HAL to represent the semantics of words and documents. We build a system for finding synonyms in Vietnamese, called the Semantic Similarity Finding System, and evaluate the synonyms it returns.

Keywords: Semantic vector, Word space model, Random projection, Apache Lucene


Acknowledgments

First of all, I wish to express my respect and my deepest thanks to my advisor, Pham Bao Son, University of Engineering and Technology, Vietnam National University, Ha Noi, for his enthusiastic guidance, warm encouragement and useful research experience. I would like to gratefully thank all the teachers of the University of Engineering and Technology, VNU, for the invaluable knowledge they provided me during the past four academic years. I would also like to send my special thanks to my friends in the K51CA class and the HMI Lab. Last, but not least, my family is really my biggest motivation. My parents and my brother always encourage me whenever I face stress and difficulty. I would like to send them great love and gratefulness.

Ha Noi, May 19, 2010
Nguyen Tien Dat


Contents

Abstract
Acknowledgments
Contents
Figure List
Table List
Chapter 1. Introduction
Chapter 2. Background Knowledge
  2.1 Lexical relations
    2.1.1 Synonymy and Hyponymy
    2.1.2 Antonymy and Opposites
  2.2 Word-space model
    2.2.1 Definition
    2.2.2 Semantic similarity
    2.2.3 Document-term matrix
    2.2.4 Example: tf-idf weights
    2.2.5 Applications
  2.3 Word space model algorithms
    2.3.1 Context vector
    2.3.2 Word co-occurrence matrices
    2.3.3 Similarity measure
  2.4 Implementation of the word space model
    2.4.1 Problems: high-dimensional matrices, data sparseness, dimension reduction
    2.4.2 Latent Semantic Indexing
    2.4.3 Hyperspace Analogue to Language
    2.4.4 Random Indexing
Chapter 3. Semantic Similarity Finding System
  3.1 System Description
  3.2 System Process Flow
  3.3 Lucene Indexing
  3.4 Semantic Vector Package
  3.5 System Output
Chapter 4. Experimental Setup and Evaluations
  4.1 Data setup
  4.2 Experimental measure
    4.2.1 Test Corpus
    4.2.2 Experimental Metric
  4.3 Experiment 1: Two kinds of context vector
  4.4 Experiment 2: Context-size evaluation
  4.5 Experiment 3: Performance of the system
  4.6 Discussion
Chapter 5. Conclusion and Future Work
References
Appendix
Figure List

Figure 2.1: Word geometric representation
Figure 2.2: Cosine distance
Figure 2.3: The process of Random Indexing
Figure 3.1: The processes of the Semantic Similarity Finding System
Figure 3.2: Lucene Index Toolbox – Luke
Figure 4.1: Context size
Figure 4.2: P1 when context-size changes
Figure 4.3: Pn, n = 1..19, for each kind of word
Figure 4.4: Average synonyms for the Test Corpus

Table List

Table 2.1: An example of a document-term matrix
Table 2.2: Word co-occurrence table
Table 2.3: Co-occurrence matrix
Table 4.1: All words in the Test Corpus – target words
Table 4.2: Results of Modes 1 and 2 on the Test Corpus
Table 4.3: Results of the context-size experiment
Table 4.4: P1(Test Corpus) for each context size
Table 4.5: The best synonyms of all target words returned by our system
Table 4.6: Result output for nouns returned by the system
Table 4.7: Result output for pronouns returned by the system
Table 4.8: Result output for verbs returned by the system
Table 4.9: Result output for adjectives returned by the system
Table 4.10: Pn, n = 1..19, for each kind of word and the Test Corpus
Table 4.11: Some interesting results


Chapter 1. Introduction

Finding semantic similarity is an interesting problem in Natural Language Processing (NLP). Determining the semantic similarity of a pair of words is important in many NLP applications, such as web mining [18] (search and recommendation systems), targeted advertising and other domains that need semantic content matching, word sense disambiguation, and text categorization [28][30]. Not much research has been done on semantic similarity for Vietnamese, even though semantic similarity plays a crucial role in human categorization [11] and reasoning, and computational similarity measures have been applied to many fields such as semantics-based information retrieval [4][29], information filtering [9] and ontology engineering [19].

Nowadays, the word space model is widely used in research on semantic similarity. In particular, there are several well-known approaches for representing the context vector of a word, such as Latent Semantic Indexing (LSI) [17], Hyperspace Analogue to Language (HAL) [21] and Random Indexing (RI) [26]. These approaches have proven useful in implementing the word space model [27]. In this thesis, we study the word space model and its implementation for computing semantic similarity. We have studied each method and investigated its advantages and disadvantages in order to select a suitable technique for Vietnamese text data. We then built a complete system for finding synonyms in Vietnamese, called the Semantic Similarity Finding System. Our system is a


Chapter 4. Experimental Setup and Evaluations

Table 4.8: Result output for verbs returned by the system (x marks indicate at which of the ranks 1–19 correct synonyms were returned)

Target verbs: bố trí, bàn luận, bồi thường, buộc tội, chữa trị, dẫn, công bố, cân nhắc, đàm phán, trì, giảng dạy, giúp đỡ, hét, nhận xét, sử dụng, suy thoái, thiết kế, thực thi, xét duyệt, yểm trợ

Table 4.9: Result output for adjectives returned by the system (x marks indicate at which of the ranks 1–19 correct synonyms were returned)

Target adjectives: an toàn, dứt khoát, đau đớn, giá lạnh, kỹ, mù mịt, mơn mởn, mạnh mẽ, nhanh chóng, lãng mạn, phong phú, sáng tạo, thích hợp, xung khắc, xinh xắn, yếu, yên bình, ý nhị
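Table 4.10 below reports Pn scores, which we read as the percentage of target words whose reference synonym appears among the system's top n outputs. A minimal sketch of such a precision-at-n computation, using a small hypothetical result set rather than the actual Test Corpus:

```python
# Minimal sketch of a precision-at-n (Pn) style score: the share of target
# words whose reference synonym set is hit within the system's top-n results.
# The example data below are illustrative, not taken from the thesis corpus.

def precision_at_n(results, gold, n):
    """results: target word -> ranked list of candidate words.
    gold: target word -> set of reference synonyms.
    Returns the percentage of targets with at least one gold synonym in the top n."""
    hits = sum(1 for w, ranked in results.items()
               if set(ranked[:n]) & gold.get(w, set()))
    return 100.0 * hits / len(results)

if __name__ == "__main__":
    results = {"bồi thường": ["đền bù", "bồi hoàn", "hoàn trả"],
               "cửa hiệu": ["gian hàng", "cửa hàng", "đại lý"]}
    gold = {"bồi thường": {"đền bù", "bồi hoàn"},
            "cửa hiệu": {"cửa hàng"}}
    for n in (1, 3):
        print(f"P{n} = {precision_at_n(results, gold, n):.2f}%")
```

Applied to the system's ranked outputs and the reference synonyms in the Appendix, a function of this shape would produce per-category curves like those summarised below.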
Table 4.10: Pn (%), n = 1..19, for each kind of word and for the whole Test Corpus

noun:      65, 80, 85, 85, 90, 95, 95, 95, 95, 95, 95, 100, 100, 100, 100, 100, 100, 100, 100
pronoun:   35, 50, 55, 55, 65, 65, 65, 65, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70, 70
verb:      75, 90, 95, 95, 95, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100
adjective: 45, 55, 60, 60, 60, 65, 70, 70, 70, 75, 80, 80, 85, 85, 90, 90, 95, 95, 100
all:       55, 68.75, 73.75, 73.75, 77.5, 81.25, 82.5, 82.5, 83.75, 85, 86.25, 87.5, 88.75, 88.75, 90, 90, 91.25, 91.25, 92.5

Figure 4.3: Pn, n = 1..19, for each kind of word

We compare the synonyms among the four kinds of target words according to Figure 4.3. Pronouns have the smallest number of synonyms, because there are not many synonymic pronouns in Vietnamese. Verbs and nouns have various synonyms: our system found all synonyms for verbs within the top 6 outputs, and for nouns within the top 12. The accuracy of finding a synonym at the first rank is 75% for verbs and 65% for nouns. The synonyms of nouns are also more accurate than those of adjectives: for all target words in the Test Corpus, all synonymic nouns are found within the top 12 outputs, while all synonymic adjectives are only found within the top 19 outputs.

Figure 4.4: Average synonyms for the Test Corpus

Figure 4.4 shows the average over all target words in the Test Corpus. On average the system already gives promising results for finding synonyms in the top outputs; at n = 6 its correctness is 81.25%. At this level, the synonyms of verbs and nouns have essentially all been found, and the adjectives show quite good results, while the pronouns do not. On the other hand, across the various outputs we have found further interesting outcomes, shown in the following section.

4.6 Discussion

After three experiments we have collected many promising results, and this section reviews the performance of our Semantic Similarity Finding System.

In Experiment 1 we show the effect of granularity on the quality of the context vector. Both whole documents and sliding windows are used to compute context vectors. In the document case, the output contains words whose meanings are related to the target word, but they are not synonyms. When the context vector is instead computed from the words that immediately surround the target word, the outputs are truly synonyms. An example for the target word cơchế, with related-meaning words from Mode 1 and synonyms from Mode 2:

Mode 1 (document contexts):
0| 1.0000001: cơchế
1| 0.3430684: sảnxuất
2| 0.32518548: đổimới
3| 0.32417223: quantrọng
4| 0.31907144: chínhsách
5| 0.3120406: đồngtình
6| 0.30772704: nhànước
7| 0.2999777: kháiquát
8| 0.29790342: marineinntd
9| 0.29790342: innovativtd
10| 0.29790342: acrylic
11| 0.2899913: minhbạch
12| 0.2868013: giảipháp
13| 0.285642: họcphí
14| 0.28482476: lêthanhquân
15| 0.28482473: phướchảo
16| 0.28482473: đrangphốk
17| 0.28482473: nhânan
18| 0.28347075: hiệnđạihóa
19| 0.28016242: bộmáy

Mode 2 (sliding-window contexts):
0| 1.0: cơchế
1| 0.8407506: chínhsách
2| 0.83535266: cụthể
3| 0.8268509: chếđộ
4| 0.8252427: việc
5| 0.82509196: chủtrương
6| 0.8230261: đềxuất
7| 0.81738627: phươngán
8| 0.81170243: địnhhướng
9| 0.81158334: việclàm
10| 0.80798894: giảipháp
11| 0.80683917: quychế
12| 0.804382: cáchthức
13| 0.80153966: kếhoạch
14| 0.8009234: tiêuchí
15| 0.8007951: môhình
16| 0.80066156: vấnđề
17| 0.7983741: hệthống
18| 0.7970572: nguyêntắc
19| 0.79101145: khảnăng
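The contrast above can be made concrete with a small sketch of the two context granularities: Mode 1 treats each whole document as the context of a word, while Mode 2 uses only a small sliding window around it. The toy corpus, underscore-joined tokens and window size are illustrative assumptions, not the thesis's actual data or code:

```python
# Minimal sketch of the two context granularities compared in Experiment 1:
# Mode 1 builds a word-by-document vector, Mode 2 a word-by-word vector from a
# small sliding window. Corpus and window size are assumptions for illustration.
from collections import Counter

corpus = [["cơ_chế", "chính_sách", "quan_trọng", "nhà_nước"],
          ["đề_xuất", "cơ_chế", "chính_sách", "giải_pháp"]]

def document_vector(word, docs):
    # Mode 1: one dimension per document, counting occurrences of the word.
    return [doc.count(word) for doc in docs]

def window_vector(word, docs, window=3):
    # Mode 2: co-occurrence counts with words within +/- `window` positions.
    ctx = Counter()
    for doc in docs:
        for i, w in enumerate(doc):
            if w == word:
                for neighbor in doc[max(0, i - window):i] + doc[i + 1:i + 1 + window]:
                    ctx[neighbor] += 1
    return ctx

print(document_vector("cơ_chế", corpus))   # word-by-document counts, e.g. [1, 1]
print(window_vector("cơ_chế", corpus))     # co-occurring neighbours and counts
```

In the document case the vector only records which documents mention the word, so topically related terms end up close; the window case records the word's immediate neighbours, which is what pushes true synonyms to the top of the ranking.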
In Experiment 2 we determine a suitable context size for computing the context vector of a word in Vietnamese. The results show that the smallest window size (three words) provides the most accurate results on this test.

In Experiment 3 we evaluate the performance of our system and collect many promising results. The correctness of finding a synonym for a noun or a verb is approximately 65%–75%. Some verbs and nouns returned many synonyms, which demonstrates the effectiveness of our system:

Table 4.11: Some interesting results

Verbs and their synonyms:
bồi thường: 0.8837459: đềnbù | 0.79945433: bồihoàn | 0.76926476: hoàntrả
đàm phán: 0.8709249: thươnglượng | 0.8193797: hộiđàm | 0.8001806: thảoluận | 0.77296454: thươngthảo
nhận xét: 0.9136752: nhậnđịnh | 0.86294633: giảithích | 0.8611396: bìnhluận | 0.8136452: phântích
ngăn chặn: 0.8109792: hạnchế | 0.77753866: ngănngừa

Nouns and their synonyms:
cửa hiệu: 0.8465601: cửahàng | 0.820595: gianhàng | 0.81420326: đạilý
cơ chế: 0.8407506: chínhsách | 0.8268509: chếđộ | 0.82509196: chủtrương | 0.80683917: quychế
hội: 0.7895256: hiệphội | 0.76933795: tổchức | 0.75964046: câulạcbộ

However, we still need to improve the accuracy of our system for adjectives and pronouns, and we suggest some future work for this (see Chapter 5). In addition, we obtained some other very interesting results:

Antonym words: target words that describe the status of a person, such as adjectives or social relationships (father, mother, you, friends, ...), return many antonyms in our experiments. We found that some of the closest words turned out to be antonyms. A few examples are shown below:

1.0000002: contrai – 0.89182067: congái
0.9999999: sungsướng – 0.8050421: đauđớn
1.0000006: yêu – 0.8493682: ghét

Clustering words: for many words it is very difficult to find a word with the closest meaning, because they simply have no near-synonyms. In some cases, however, they appear in parallel with other words in the same surrounding contexts, so the target word and the outputs from our system together form a cluster or word category. Target words describing vehicles (car, boat, bicycle, ...) return many such clustering results in our experiments, and the same holds for some adjectives and verbs. For example:

1.0000002: quyếnrũ, 0.8099806: lãngmạn, 0.80539143: cátính, 0.79983556: đẹp, 0.7683023: dịudàng, 0.75999635: mạnhmẽ, 0.75611085: ấmáp, 0.7535376: hấpdẫn, 0.7514998: lạnhlùng, 0.75031424: sắcsảo, 0.7454683: thôngminh, 0.74529815: bảnlĩnh, 0.7452806: khátvọng, 0.7451791: cả, 0.7425014: nhẹnhàng, 0.7408441: trẻtrung, 0.7363997: nhiệthuyết, 0.73068196: cảmxúc, 0.7290287: năngđộng, 0.72863: tựnhiên, 0.72862893: yêuthương

1.0: xemáy, 0.8531696: xegắnmáy, 0.8483326: xeđạp, 0.84614867: xe, 0.79172635: xetải, 0.730841: học, 0.7302092: xebuýt, 0.72180295: máy, 0.7201042: bấtngờ, 0.71242905: thuyền, 0.71030396: xehơi, 0.7085694: liêntục, 0.7065483: tay, 0.7031712: đi, 0.7022549: đã, 0.69586486: có, 0.6956005: để, 0.69538856: chở, 0.693656: phongbì, 0.69343746: điệnthoạidiđộng
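Chapter 5 below notes that the system computes its context vectors with Random Indexing, indexed through Lucene and the Semantic Vectors package. The sketch here illustrates only the core idea of Random Indexing — sparse random index vectors accumulated over a sliding window and compared by cosine — with hypothetical dimensionality, seed counts and corpus, not the actual SemanticVectors pipeline:

```python
# Minimal sketch of Random Indexing: each word gets a sparse random index
# vector, and a word's context vector accumulates the index vectors of its
# window neighbours. DIM, NONZERO, WINDOW and the toy corpus are assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM, NONZERO, WINDOW = 512, 8, 3

def index_vector():
    v = np.zeros(DIM)
    pos = rng.choice(DIM, size=NONZERO, replace=False)
    v[pos] = rng.choice([-1, 1], size=NONZERO)   # a few random +1/-1 entries
    return v

corpus = [["xe_máy", "xe_đạp", "chở", "hàng"],
          ["xe_đạp", "xe_máy", "đi", "học"]]
vocab = {w for doc in corpus for w in doc}
index = {w: index_vector() for w in vocab}
context = {w: np.zeros(DIM) for w in vocab}

for doc in corpus:
    for i, w in enumerate(doc):
        for neighbor in doc[max(0, i - WINDOW):i] + doc[i + 1:i + 1 + WINDOW]:
            context[w] += index[neighbor]        # accumulate neighbour index vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

target = "xe_máy"
print(sorted(((cosine(context[target], context[w]), w) for w in vocab if w != target),
             reverse=True)[:3])                  # nearest neighbours of the target
```

Because the index vectors are near-orthogonal, the accumulated context vectors approximate the full co-occurrence matrix at a fixed, much smaller dimensionality, which is what makes the approach tractable on a corpus of thousands of documents.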
Chapter 5. Conclusion and Future Work

In this thesis we have done research on various techniques and investigated their advantages as well as their disadvantages in order to select suitable techniques for the task of finding semantic similarity in Vietnamese. We built a complete system for finding synonyms in Vietnamese. Our system applies the Random Indexing approach to compute the context vector of a word, and our free-text corpus comprises 15,169 documents taken from news and articles. As a result, we obtained many promising results about semantic similarity in Vietnamese. We showed the effect of granularity on the quality of the context vector, and we conclude that the model is best constructed from the co-occurrences of each word with only the words immediately before and after it.

Regarding the computation of synonyms in Vietnamese, the performance of the system is good for nouns and verbs, and the results for adjectives and pronouns are promising. In our experiments the accuracy of our system on verbs and nouns is higher than on adjectives and pronouns: 75% for verbs and 65% for nouns.

We suggest two directions of future work to improve the synonyms found by the system: a directional word-by-word co-occurrence matrix, and probabilistic approaches.

For the first direction: our Semantic Similarity Finding System currently uses a geometric (symmetric) word-by-word co-occurrence matrix to compute context vectors. We count co-occurrences symmetrically in both directions within the window (three words on the left and three words on the right of the target word); this is only one way of computing the co-occurrence matrix. As a consequence, for some given words it can be difficult to find synonyms, especially for pronouns. As seen in our experiments, the pronouns do not produce good output; the reason may be that pronouns are often located at the first or last position of a sentence. In addition, adjectives in Vietnamese are always placed after nouns in sentences, which is different from English. Therefore, to obtain more accurate synonyms for adjectives, we need to build a left-directional word-by-word co-occurrence matrix. We will add a new mode to our system for computing a co-occurrence matrix in which rows contain left-context co-occurrences and columns contain right-context co-occurrences. In this way we expect to collect more exact synonyms for Vietnamese vocabulary.

For the second direction, to improve the synonym output we need techniques to filter out unexpected results. The three main methods, LSI, RI and HAL, try to overcome the problems of lexical matching by using statistically derived conceptual indices instead of individual words for retrieval [4]. Beginning with a rigorous probabilistic model of the corpus, we would build a model of topics as probability distributions over terms; a document is then a probability distribution that combines a small number of topics, and a corpus is a set of documents obtained by repeatedly sampling such text. In addition, we would calculate the probability of co-occurrence between a given word and its surrounding words, which reduces the sparseness of the co-occurrence matrix. As a result, the output synonyms would become more accurate for all kinds of lexicons.
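A minimal sketch of the first direction — a directional word-by-word matrix in which rows collect left-context counts and columns collect right-context counts, in the spirit of HAL — under an assumed window size and toy data:

```python
# Minimal sketch of a directional word-by-word co-occurrence matrix: left[w][c]
# counts c appearing before w within the window, right[w][c] counts c appearing
# after w. The corpus and window size are illustrative assumptions.
from collections import defaultdict

WINDOW = 3
corpus = [["đẹp", "xinh_xắn", "và", "lãng_mạn"],
          ["cô_ấy", "xinh_xắn", "dịu_dàng"]]

left = defaultdict(lambda: defaultdict(int))    # left-context co-occurrences
right = defaultdict(lambda: defaultdict(int))   # right-context co-occurrences

for doc in corpus:
    for i, w in enumerate(doc):
        for c in doc[max(0, i - WINDOW):i]:
            left[w][c] += 1
        for c in doc[i + 1:i + 1 + WINDOW]:
            right[w][c] += 1

def directional_profile(w):
    # A word's vector would concatenate its left and right profiles, so the
    # position of each neighbour relative to the target word is preserved.
    return dict(left[w]), dict(right[w])

print(directional_profile("xinh_xắn"))
```

Keeping the left and right profiles separate preserves word order around the target, which is the property expected to help with Vietnamese adjectives that follow their nouns.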
References

[1] D. Appelt. 1999. An Introduction to Information Extraction. Artificial Intelligence Communications, 12, 1999.
[2] David M. Blei, Andrew Y. Ng, Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3 (2003), 993–1022.
[3] Thorsten Brants, Francine Chen, and Ioannis Tsochantaridis. 2002. Topic-based document segmentation with probabilistic latent semantic analysis. In Conference on Information and Knowledge Management (CIKM), pages 211–218.
[4] M.W. Berry, S.T. Dumais and G.W. O'Brien. 1994. Using linear algebra for intelligent information retrieval. Computer Science Department.
[5] Cowie and W. Lehnert. 1996. Information Extraction. Communications of the ACM, 39, 1996.
[6] H. Cunningham. 1999. Information Extraction: a User Guide (revised version). Research Memorandum CS-99-07, Department of Computer Science, University of Sheffield, May 1999.
[7] Deerwester, S., Dumais, S., Furnas, G., Landauer, T., and Harshman, R. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.
[8] Dang Duc Pham, Giang Binh Tran, Son Pham Bao. 2009. A hybrid approach to Vietnamese Word Segmentation using Part of Speech tags. International Conference on Knowledge and Systems Engineering.
[9] Mohammad Emtiyaz Khan. Matrix Inversion Lemma and Information Filter. Honeywell Technology Solutions Lab, Bangalore, India.
[10] Edel Garcia. 2006. Singular Value Decomposition (SVD) – A Fast Track Tutorial. First published September 11, 2006; last updated September 12, 2006.
[11] Katherine Heller, Adam Sanborn, Nick Chater. Hierarchical Learning of Dimensional Biases in Human Categorization. Department of Engineering, University of Cambridge, Cambridge CB2 1PZ.
[12] http://dictionary.reference.com/browse/antonym
[13] http://en.wikipedia.org/wiki/SMART_Information_Retrieval_System
[14] http://dictionary.reference.com/browse/hyponym
[15] http://dictionary.reference.com/browse/synonym
[16] Khoo, C., and Na, J.C. 2006. Semantic Relations in Information Science. Annual Review of Information Science and Technology, 40, 157–228.
[17] Thomas K. Landauer. 1998. An Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284.
[18] Raymond Kosala, Hendrik Blockeel. 2001. Web Mining Research: A Survey. Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Heverlee, Belgium.
[19] Sergei Nirenburg, Victor Raskin and Svetlana Sheremetyeva. Lexical Acquisition. Computing Research Laboratory, New Mexico State University.
[20] Claes Neuefeind, Fabian Steeg. 2009. Information-Retrieval: Vektorraum-Modell. Text-Engineering I – Information-Retrieval, Wintersemester 2009/2010, Informationsverarbeitung, Universität zu Köln.
[21] Ulrik Petersen. 2009. Emdros HAL example (Hyperspace Analogue to Language).
[22] Lund, Kevin and Curt Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments and Computers, 28(2), 203–208.
[23] Robertson, S., and Spärck Jones, K. 1997. Simple, proven approaches to text retrieval (Technical Report No. 356). Computer Laboratory, University of Cambridge.
[24] James Richard Curran. 2004. From Distributional to Semantic Similarity. PhD thesis, Institute for Communicating and Collaborative Systems.
[25] Robertson, S., and Spärck Jones, K. 1997. Simple, proven approaches to text retrieval (Technical Report No. 356). Computer Laboratory, University of Cambridge.
[26] Magnus Sahlgren. 2005. An Introduction to Random Indexing. SICS, Swedish Institute of Computer Science.
[27] Magnus Sahlgren. 2006. Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Universitetsservice US-AB, Sweden.
[28] Schütze, H. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.
[29] Salton, G., and McGill, M. J. 1983. Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York, NY.
[30] Fabrizio Sebastiani. Text Categorization. Dipartimento di Matematica Pura e Applicata, Università di Padova, 35131 Padova, Italy.
[31] Salton, G., Wong, A., and Yang, C. 1975. A vector space model for automatic indexing. Communications of the ACM, 18:613–620. Association for Computing Machinery, Inc.
[32] Dominic Widdows, Kathleen Ferraro. 2008. Semantic Vectors: A Scalable Open Source Package and Online Technology Management Application. MAYA Design, University of Pittsburgh.
[33] Xing Wei and W. Bruce Croft. 2003. Modeling Term Associations for Ad-hoc Retrieval Performance within the Language Modeling Framework. Center for Intelligent Information Retrieval, University of Massachusetts Amherst.
[34] Zipf, G. 1949. Human behavior and the principle of least effort. Cambridge, MA: Addison-Wesley.
[35] http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsALexicalRelation.htm
[36] http://en.wikipedia.org/wiki/Tf%E2%80%93idf
Appendix

Synonym Test Corpus: the 80 target words of the Test Corpus together with their reference synonyms.

Target words (in order): lĩnh; cửa hiệu; chế; công trình; công việc; diện mạo; độc giả; giải pháp; giáo viên; tượng; hội; học sinh; người bệnh; thực; tập đoàn; thời kỳ; thương hiệu; tình cảnh; tài liệu; yếu tố; nhiêu; thảy; chung quanh; chúng; chúng tôi; mày; đằng; hôm; ta; trước hết; bố trí; bàn luận; bồi thường; buộc tội; chữa trị; dẫn; công bố; cân nhắc; đàm phán; trì; giảng dạy; giúp đỡ; hét; nhận xét; sử dụng; suy thoái; thiết kế; thực thi; xét duyệt; yểm trợ; an toàn; dứt khoát; đau đớn; giá lạnh; kỹ; mù mịt; mơn mởn; mạnh mẽ; nhanh chóng; lãng mạn; phong phú; sáng tạo; thích hợp; xung khắc; xinh xắn; yếu; yên bình; ý nhị.

Reference synonyms (in the same order): ý chí; cửa hàng, gian hàng, đại lý; sách, chế độ, nguyên tắc; dự án, việc; mặt; bạn đọc; phương án, biện pháp; giảng viên; động thái; hiệp hội, tổ chức, câu lạc bộ; sinh viên; bệnh nhân; thật; nhà sản xuất, doanh nghiệp; giai đoạn; tên tuổi; hoàn cảnh, tình trạng; liệu; nhân tố; bao năm, lâu; tất thảy, tất cả, tất tần tật; xung quanh; nó, họ; mày; mình; đâu, đó, bạn; lúc; hôm trước; thảy; xưa, trước; tôi; thảy, tất; trước tiên; trang bị, thiết kế, xếp; thảo luận, đàm phán, hội thảo, thương thảo; đền bù, bồi hoàn, chi trả, hoàn trả; kết tội, cáo buộc; điều trị, chữa chạy; hướng dẫn, đạo; thông báo, công khai; xem xét; hội đàm, đàm thoại, thương lượng; tiếp tục, giữ; đào tạo, dạy; ủng hộ, trợ giúp; thốt, gào, kêu, vang; thẩm định, bình luận, nhìn nhận, phân tích; dùng, áp dụng; suy giảm; bố trí, trang bị; thực hiện; thẩm định, xem xét; hỗ trợ, giúp đỡ; an ninh, đảm bảo, bảo đảm; kiên quyết; đau khổ; lạnh, lạnh giá; kỹ lưỡng; mịt mù; xum xuê; quyết liệt; lập tức, nhanh, tức thì; lập tức, nhanh chóng; đa dạng; thông minh; phù hợp, hợp lý; kỳ thị; xinh đẹp; yếu, kém, thiếu sót; bình yên, yên ả; nhạy cảm.
