Scientific report: "Prototyping virtual instructors from human-human corpora" pdf
... rapidly prototyping virtual instructors from human-human corpora without manual annotation. Automatically prototyping full-fledged dialogue systems from corpora is far from being a reality ... a novel algorithm for generating virtual instructors from automatically annotated human-human corpora. Our algorithm, when given a task-based corpus situated in a virtual worl...
Uploaded: 07/03/2014, 22:20
... terminology acquisition and disambiguation from comparable corpora (Sadat et al., 2003) is described as follows: - Bilingual terminology acquisition from source language to target language to ... Japan Abstract The present paper will seek to present an approach to bilingual lexicon extraction from non-aligned comparable corpora, phrasal translation as well as evaluations on Cr...
Uploaded: 20/02/2014, 16:20
... both precision and recall. We cast semantic category acquisition from search logs as the task of learning labeled instances from few labeled seeds. To our knowledge this is the first study that ... different from ours. Another line of new research is to combine various resources such as web documents with search query logs (Paşca and Durme, 2008; Talukdar et al., 2008). We differ...
Uploaded: 08/03/2014, 01:20
Scientific report: "Building Emotion Lexicon from Weblog Corpora" potx
... Blog from January to July, 2006, spanning a period of 212 days. In total, 336,161 bloggers’ articles were collected. Each blogger posts 16 articles on average. We used the articles from ... and emotions using weblog corpora. A collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level is experimented by using the mined...
Uploaded: 17/03/2014, 04:20
Scientific report: "Constructing Transliteration Lexicons from Web Corpora" docx
... importance of term transliteration can be realized from our analysis of the terms used in 200 qualifying sentences that were randomly selected from English-Chinese mixed news pages. Each qualifying ... 15,822,984 pages, which was collected from the Internet using a web spider and was converted to plain text, was used as a training set. This corpus is called SET1. From SET1, 80...
Uploaded: 17/03/2014, 06:20
Scientific report: "Identifying Word Translations from Comparable Corpora Using Latent Topic Models" potx
... use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space. The best results, obtained by combining knowledge from word-topic ... in other cross-lingual topics. In other words, a word w_2 from a target language is a potential translation candidate for a word w_1 from a source language, if the distribution of w...
Uploaded: 23/03/2014, 16:20
Scientific report: "Learning Bilingual Lexicons from Monolingual Corpora" pot
... Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Learning Bilingual Lexicons from Monolingual Corpora Aria Haghighi, Percy Liang, Taylor Berg-Kirkpatrick and Dan Klein Computer ... aria42,pliang,tberg,klein }@cs.berkeley.edu Abstract We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characte...
Uploaded: 31/03/2014, 00:20
Scientific report: "Learning Tense Translation from Bilingual Corpora" docx
... Learning Tense Translation from Bilingual Corpora Michael Schiehlen* Institute for Computational Linguistics, University ... Fortunately, the task can be (partly) automated if the tables associating words with biases are learned from a corpus. Statistical approaches also support empirical evaluation of different disambiguation ... (analytic tenses). 2 Words Are Not Enough Often, s...
Uploaded: 31/03/2014, 04:20
Document: Scientific report: "Finding Parts in Very Large Corpora" pdf
... Providence, RI 02912 Abstract We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus our method finds part words ... lexicon. 1 Introduction We present a method of extracting parts of objects from wholes (e.g. "speedometer" from "car"). To be more precise, given a single...
Uploaded: 20/02/2014, 19:20
Scientific report: "System for Querying Syntactically Annotated Corpora" pdf
... special file called PML Schema and referring to this schema file from individual data files. It is relatively easy to convert data from other formats to PML without loss of information. In fact, ... consists of a part that selects nodes in the treebank, and an optional part that generates a report from the selected occurrences. The selective part of the query specifies conditions that a g...
Uploaded: 17/03/2014, 02:20