Scientific report: "Learning Transliteration Lexicons from the Web"
... labeling. At the same time, we select samples of high confidence score from the rest and consider them correct E-C pairs. We then merge the labeled set with the high-confidence set in the PSM ... pairs from the Web at the same time. Conceptually, the adaptive learning is carried out as follows. We obtain bilingual snippets from the Web by iteratively submitting qu...
Uploaded: 31/03/2014, 01:20
... collected manually. One of the most important attributes of these term pairs is that the numbers of syllables in the source-language term and the target-language term are equal. The syllables of both ... are produced or until other criteria are met. The conversions used in the last round of the training phase are then used to extract large-scale transliterated-term pair...
Uploaded: 17/03/2014, 06:20
... than any other word. Evaluating against the union of these lexicons yielded 98.0 p0.33, a significant improvement over the 92.3 using only the Wiktionary lexicon. Of the true errors, the most ... corpora are from different domains. Nonetheless, even in the more difficult cases, a sizable set of high-precision translations can be extracted. As an example of the perfor...
Uploaded: 31/03/2014, 00:20
Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment"
... pairs, where the translation of the in-parenthesis terms is a suffix of the pre-parenthesis text. The lengths and frequency counts of the suffixes have been used to determine what is the translation ... our modified version of the competitive linking algorithm, the link score of a pair of words is the sum of the φ² scores of the words themselves, their prefixes...
Uploaded: 17/03/2014, 02:20
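The φ² score named in the excerpt above is commonly computed from a 2x2 co-occurrence table. The following is a minimal sketch of that standard definition only; the function name `phi_squared` and the cell names `a`, `b`, `c`, `d` are illustrative assumptions, and the paper's modified competitive linking step (which sums scores over words and their prefixes) is not reproduced here.

```python
def phi_squared(a: int, b: int, c: int, d: int) -> float:
    """Standard phi^2 association score from a 2x2 co-occurrence table.

    a: aligned units that contain both words
    b: units that contain only the first word
    c: units that contain only the second word
    d: units that contain neither word
    """
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): treating this as
        # "no association" is an assumption of this sketch.
        return 0.0
    return (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    print(phi_squared(10, 0, 0, 10))  # words that always co-occur
    print(phi_squared(5, 5, 5, 5))    # statistically independent words
```

The score lies between 0 and 1, so scores for different word pairs can be compared or summed directly.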
Scientific report: "Extracting Hypernym Pairs from the Web"
... hypernym relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web enables obtaining ... A, B and C are siblings of each other. Here, sibling refers to the relative position of the words in the hypernymy tree. Two words are siblings of each other...
Uploaded: 17/03/2014, 04:20
Document: Scientific report: "Learning Event Durations from Event Descriptions"
... and the head of its object are extracted from the parse trees generated by the CONTEX parser. Similarly to the local context features, for both the subject head and the object head, their ... or object of the verb is plural. In “Iraq has destroyed its long-range missiles”, there is the time it takes to destroy one missile and the duration of the interval in which a...
Uploaded: 20/02/2014, 12:20
Scientific report: "Learning Common Grammar from Multilingual Corpus"
... α φ A represent the parameters of a common grammar. We use the Dirichlet prior because it is the conjugate prior for the multinomial distribution. In summary, the proposed model assumes the following ... each language is generated from a general model that is common across languages, and each sentence in multilingual corpora is generated from the language dependent PC...
Uploaded: 07/03/2014, 22:20
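The conjugacy mentioned in the excerpt above can be written out as a short worked equation. This is the generic textbook statement, not the paper's own derivation: the symbols θ (multinomial parameters), α (Dirichlet hyperparameters) and n (observed counts) are assumptions chosen for illustration and need not match the paper's notation.

```latex
\begin{align*}
\theta &\sim \mathrm{Dirichlet}(\alpha_1, \dots, \alpha_K),
  & p(\theta) &\propto \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}, \\
n \mid \theta &\sim \mathrm{Multinomial}(N, \theta),
  & p(n \mid \theta) &\propto \prod_{k=1}^{K} \theta_k^{n_k}, \\
\theta \mid n &\sim \mathrm{Dirichlet}(\alpha_1 + n_1, \dots, \alpha_K + n_K),
  & p(\theta \mid n) &\propto \prod_{k=1}^{K} \theta_k^{\alpha_k + n_k - 1}.
\end{align*}
```

Because the posterior stays in the Dirichlet family, updating on observed counts amounts to adding them to the hyperparameters.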
Scientific report: "Learning Semantic Links from a Corpus of Parallel Temporal and Causal Relations"
... is the number of times the word appeared in the keyword’s pattern, and N(w) is the number of times the word was in the corpus. The following features were derived from these scores: • Whether the ... 2008). Annotators used the labels: BEFORE: The first event fully precedes the second. AFTER: The second event fully precedes the first. NO-REL: Neither event clearly precedes...
Uploaded: 08/03/2014, 01:20
Scientific report: "Learning Semantic Categories from Clickthrough Logs"
... “singapore,” and thus the label of “singapore” will be propagated to the pattern. On the other hand, the pattern “♯ map” is a neutral pattern which co-occurs with terms other than the Travel domain ... are patterns. The strength of lines indicates relatedness between each node. The darker a node, the more likely it belongs to the Travel domain. Starting from “singapo...
Uploaded: 08/03/2014, 01:20
Scientific report: "Learning Tense Translation from Bilingual Corpora"
... state in VC-mode records the verb form expected by Vii n (n + 1), the infinite verb form of the last verb encountered (m), and the verb form expected by the VC verb, if the VC consists of only ... + 1). So there are m · (n + 1)² states. As soon as a non-verb is encountered in VC-mode or the verb form of the previous verb does not fit the subcategorization requiremen...
Uploaded: 31/03/2014, 04:20