Scientific report: "Learning Bilingual Lexicons from Monolingual Corpora"
... }@cs.berkeley.edu Abstract: We present a method for learning bilingual translation lexicons from monolingual corpora. Word types in each language are characterized by purely monolingual features, such as context ... 771–779, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics. Learning Bilingual Lexicons from Monolingual Corpora. Aria Haghighi,...
Uploaded: 31/03/2014, 00:20
... transliteration pairs (EX) from corpora. The TM approach models phoneme-based or grapheme-based mapping rules using a generative model that is trained from a large bilingual lexicon, with the ... from corpora. The EX approach aims to construct a large and up-to-date transliteration lexicon from live corpora. Towards this objective, some have proposed extracting translation p...
Uploaded: 31/03/2014, 01:20
... languages from non-parallel multilingual corpora in an unsupervised fashion. For this purpose, we assume a generative model for multilingual corpora, where each sentence is generated from a language ... borrowing from nearby languages, and 3) the innate abilities of humans (Chomsky, 1965). We assume hidden commonalities in syntax across languages, and try to extract a common grammar fr...
Uploaded: 07/03/2014, 22:20
Scientific report: "Constructing Transliteration Lexicons from Web Corpora"
... Constructing Transliteration Lexicons from Web Corpora. Jin-Shea Kuo (1, 2), Ying-Kuei Yang (2); (1) Chung-Hwa Telecommunication Laboratories, ... importance of term transliteration can be realized from our analysis of the terms used in 200 qualifying sentences that were randomly selected from English-Chinese mixed news pages. Each qualifying ... 15,822,984 pages, which was collec...
Uploaded: 17/03/2014, 06:20
Scientific report: "Building Emotion Lexicon from Weblog Corpora"
... collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level is experimented by using the mined lexicons to demonstrate their usefulness. ... Blog from January to July, 2006, spanning a period of 212 days. In total, 336,161 bloggers’ articles were collected. Each blogger posts 16 articles on average. We used the articl...
Uploaded: 17/03/2014, 04:20
Scientific report: "Learning Tense Translation from Bilingual Corpora"
... Learning Tense Translation from Bilingual Corpora. Michael Schiehlen, Institute for Computational Linguistics, University of ... disambiguation strategies for the translation of tense between German and English, using a bilingual corpus of appointment scheduling dialogues. It describes a scheme to detect complex ... context relevant for disambiguation must be identified (disam...
Uploaded: 31/03/2014, 04:20
Document, scientific report: "Learning Event Durations from Event Descriptions"
... approach human performance. This research is potentially very important in applications in which the time course of events is to be extracted from news. For example, whether two events overlap ... instances), from the TimeBank corpus annotated in TimeML (Pustejovsky et al., 2003). The non-WSJ articles (mainly political and disaster news) include both print and broadcast news that...
Uploaded: 20/02/2014, 12:20
Scientific report: "Learning Semantic Links from a Corpus of Parallel Temporal and Causal Relations"
... null label is NO-REL. train/test split from Table 1 and the feature sets: Syntactic: the syntactic features from Section 4; Semantic: the semantic features from Section 4; All: both syntactic and ... relations and 77.8% on causal relations. We trained machine learning models using features derived from WordNet and the Google N-gram corpus, and they outperformed a variety of baselin...
Uploaded: 08/03/2014, 01:20
Scientific report: "Learning Semantic Categories from Clickthrough Logs"
... both precision and recall. We cast semantic category acquisition from search logs as the task of learning labeled instances from few labeled seeds. To our knowledge, this is the first study that ... different from ours. Another line of new research is to combine various resources such as web documents with search query logs (Paşca and Durme, 2008; Talukdar et al., 2008). We differ...
Uploaded: 08/03/2014, 01:20
Scientific report: "Acquiring a Lexicon from Unsegmented Speech"
... phones and semantic symbols with a sequence of words from the dictionary, each word offset a certain distance into the phone sequence, with words potentially overlapping. • It then creates new ... have an empty sememe set. Indeed, such a word is properly hypothesized, but a special mechanism prevents semantically empty words from being added to the dictionary. Without this mec...
Uploaded: 08/03/2014, 07:20