Document, scientific report: "Mining Wiki Resources for Multilingual Named Entity Recognition" (pdf)
... pages 1–9, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Mining Wiki Resources for Multilingual Named Entity Recognition Alexander E. Richman Patrick Schone ... is available for download (download.wikimedia.org) in a text format suitable for inclusion in a database. For the remainder of this paper, we refer to this format. 1 W...
Uploaded: 20/02/2014, 09:20
... therefore cannot cover fresh words or new usages of existing words. Secondly, their search functions [footnote 1: http://www.engkoo.com] are often limited, making it hard for users to effectively find information ... built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an applicati...
Uploaded: 20/02/2014, 05:20
... Volume), pages 93–96, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Using Structural Information for Identifying Similar Chinese Characters Chao-Lin Liu Jen-Hsiang ... pronunciations or in their internal structures are useful for computer-assisted language learning and for psycholinguistic studies. Although it is possible for us to emplo...
Uploaded: 20/02/2014, 09:20
Document, scientific report: "Kinds of Features for Chinese Opinionated Information Retrieval" (pdf)
... set of features for text classification (indexing) for an OIR query of the first level (finds opinionated information) and for an OIR query of the second level (finds opinionated information with sentiment ... politics. We therefore believe that a system capable of providing access to opinionated information in other languages (especially in Chinese) might be of great use for indivi...
Uploaded: 20/02/2014, 12:20
Document, scientific report: "Incorporating Context Information for the Extraction of Terms" (pdf)
... 1996), incorporating information gained from the textual context of the candidate term. 2 Context information for terms The idea of incorporating context information for term extraction came ... product. Since context carries information about terms it should be involved in the procedure for their extraction. We incorporate context information in the form of weights construc...
Uploaded: 22/02/2014, 03:20
Scientific report: "Clique-Based Clustering for improving Named Entity Recognition systems" (pot)
... processing. For instance the NE Oxford illustrates the different ambiguity types that are interesting to address: • intra-annotation ambiguity: Wikipedia lists more than 25 cities named Oxford in ... Introduction In Information Extraction domain, named entities (NEs) are one of the most important textual units as they express an important part of the meaning of a document. Named...
Uploaded: 31/03/2014, 20:20
Document, scientific report: "A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining" (pptx)
... labelled information for training. Our system extracts transliteration pairs in an unsupervised fashion. It is also able to utilize labelled information if available, obtaining improved performance. We ... of the Association for Computational Linguistics, pages 469–477, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics A Statistical Model for...
Uploaded: 19/02/2014, 19:20
Document, scientific report: "Mining Wikipedia Revision Histories for Improving Sentence Compression" (docx)
... bountiful resource for such training data, which we obtain by mining the revision history of Wikipedia for sentence compressions and expansions. Using only a fraction of the available Wikipedia data, ... pages 137–140, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Mining Wikipedia Revision Histories for Improving Sentence Compression Elif Yamangil...
Uploaded: 20/02/2014, 09:20
Document, scientific report: "Mining User Reviews: from Specification to Summarization" (Xinfan Meng, Key Laboratory of Computational Linguistics) (doc)
... structure information and unit of measurement information are mined from the specification to improve the accuracy of feature extraction. At summary generation stage, hierarchy information in ... to users. For example, for feature “size”, descriptions like “small” and “thin” are more readable than “positive”. Usually, the words used to describe a product feature are short. For each p...
Uploaded: 20/02/2014, 09:20
Document, scientific report: "Mining metalinguistic activity in corpora to create lexical resources using Information Extraction techniques: the MOP system" (doc)
... information about sublanguage usage is being put forward. But the usefulness of robust NLP applications for special-domain text goes beyond glossary updates. The kind of categorization information ... informational segments are not meant to be read by laymen, but used by domain lexicographers reviewing existing glossaries for neological change, or, for example, in machine-read...
Uploaded: 20/02/2014, 15:20