Scientific report: "truecasing" doc
... broadcast news documents and the translated Xinhua News Agency (XINHUA) documents in the ACE corpus do not contain any case information, while human transcribed broadcast news documents contain ... extraction of mentions of entities and relations between them from textual data. The textual documents are from newswire, broadcast news with text derived from automatic speech recognition (A...
Uploaded: 17/03/2014, 06:20
... Luck! Presenting a scientific report - Presentation - Part 3/3 - Thesis defense (2 votes) 12/05/2007 Table of contents Presenting a scientific report - Presentation ... we use Presenting a scientific report - Presentation - Part 1/3 (8 votes) 02/05/2007 Presenting a scientific report...
Uploaded: 19/01/2014, 21:20
... obfuscated document. This is important since there is no clear consensus as to which features should be used for authorship attribution. 2 Document Obfuscation Our approach to document obfuscation ... deletions to be equally weighted. However, while deletion sites in the document are easy to identify, [table: Document, Changes, Doc Size, Changes/1000; 49, 42, 3849, 10.9; 50, 46, 2364, 19.5...]
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Ensemble Document Clustering Using Weighted Hypergraph Generated by NMF" docx
... feedback in IR, where retrieved documents are clustered, is actively researched (Hearst and Pedersen, 1996) (Kummamuru et al., 2004). In document clustering, the document is represented as a ... hypergraph in the integration phase. Document clustering is the task of dividing a document’s data set into groups based on document similarity. This is the basic intelligent procedure, and is...
Uploaded: 20/02/2014, 12:20
Scientific report: "Labeling Documents with Timestamps: Learning from their Time Expressions" pot
... task in the NLP community: automatic document dating. Given a document with unknown origins, what characteristics of its text indicate the year in which the document was written? This paper proposes ... dating documents has come from the IR and knowledge management communities interested in dating documents with unknown origins. de Jong et al. (2005) was among the first to automatically...
Uploaded: 07/03/2014, 18:20
Scientific report: "Probabilistic Document Modeling for Syntax Removal in Text Summarization" ppt
... method. There are D document sets, M documents in each set, N_M words in document M, and C syntax classes. ... approach would be to model the syntax and semantic words used in a document collection ... assuming that each document is generated solely from one topic distribution that is shared throughout each document set. This results in a smoothed language model for each document set’s content ...
Uploaded: 07/03/2014, 22:20
Scientific report: "Learning Document-Level Semantic Properties from Free-text Annotations" pot
... that both the document text and the selection of keyphrases are governed by the underlying hidden properties of the document. Each property indexes a language model, thus allowing documents that ... keyphrase similarity values; h – document keyphrases; η – document keyphrase topics; λ – probability of selecting η instead of φ; c – selects between η and φ for word topics; φ – document topic model; z...
Uploaded: 23/03/2014, 17:20