Scientific report: "Reduced n-gram models for English and Chinese corpora"
... Association for Computational Linguistics. Reduced n-gram models for English and Chinese corpora. Le Q Ha, P Hanna, D W Stewart and F J Smith, School of Electronics, Electrical Engineering and Computer ... traditional models, which include all n-grams. In our experiments, the reduced n-gram Zipf curves are first presented and compared with previously obtained conv...
Uploaded: 23/03/2014, 18:20
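The snippet compares Zipf curves, i.e. n-gram frequency plotted against frequency rank (usually on log-log axes). As a rough sketch, assuming whitespace tokenization and conventional (not reduced) n-grams, the points of such a curve can be computed as follows; the paper's reduced n-gram selection is not reproduced, and the helper names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def zipf_curve(tokens, n):
    """Return (rank, frequency) points for the n-grams, most frequent first.

    Plotting log(rank) against log(frequency) gives the usual Zipf curve.
    Only conventional n-grams are counted; no reduction step is applied.
    """
    freqs = sorted(Counter(ngrams(tokens, n)).values(), reverse=True)
    return list(enumerate(freqs, start=1))


text = "the cat sat on the mat the cat ran".split()
print(zipf_curve(text, 2))  # the bigram ("the", "cat") is rank 1 with frequency 2
```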
... contextual features for all the senses of the target. For example, among the top 20 features for coach, we get match and team (for the “trainer” sense) as well as driver and car (for the “bus” sense). ... Distributional models have been used in many NLP analysis tasks (Salton et al., 1975; McCarthy and Carroll, 2003), as well as for cognitive modeling...
Uploaded: 30/03/2014, 21:20
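The coach example shows that features gathered for a word form mix contexts from all of its senses. A minimal sketch, assuming plain co-occurrence counts (whole sentence or a fixed window) and a small hypothetical stop list, reproduces that effect on two invented sentences; the paper's actual feature definition and weighting are not given in the snippet.

```python
from collections import Counter, defaultdict
from typing import Optional

STOPWORDS = {"the", "a", "of", "for"}  # hypothetical minimal stop list


def context_features(sentences, window: Optional[int] = None):
    """Map each target word to a Counter of its co-occurring context words.

    window=None uses the whole sentence as context; an int restricts it to
    +/- `window` tokens around the target. Stop words are skipped as features.
    """
    features = defaultdict(Counter)
    for sent in sentences:
        for i, target in enumerate(sent):
            if window is None:
                lo, hi = 0, len(sent)
            else:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i and sent[j] not in STOPWORDS:
                    features[target][sent[j]] += 1
    return features


corpus = [
    "the coach trained the team for the match".split(),  # "trainer" sense
    "the coach driver parked the car".split(),           # "bus" sense
]
feats = context_features(corpus)
print(sorted(feats["coach"]))  # features from both senses end up together
```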
... utterances and automatically annotated with part-of-speech tag and supertag information and named entities. They were annotated by hand for dialog acts and tasks/subtasks. The dialog act and task/subtask ... types of information provide rich clues for building dialog models (Grosz and Sidner, 1986). Dialog models can be built offline (for dialog mining and summari...
Uploaded: 22/02/2014, 02:20
Scientific report: "Employing Topic Models for Pattern-based Semantic Class Discovery"
... “documents”, “words”, and “topics”. To further improve efficiency, we also perform preprocessing (refer to Section 3.4 for details) before building topic models for C_R(q), where some ... modeling provides a formal and convenient way of grouping documents and words to topics. In order to apply topic models to our problem, we map RASCs to documents, items to words, a...
Uploaded: 08/03/2014, 00:20
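The snippet maps RASCs to documents and items to words so that a standard topic model can be applied. As a hedged illustration with made-up RASCs (each represented here simply as a list of item strings, which is an assumption), the mapping amounts to building a bag-of-words document-term matrix; the topic model itself and the Section 3.4 preprocessing are not shown, and the function names are hypothetical.

```python
from collections import Counter


def rascs_to_corpus(rascs):
    """Treat each RASC (a list of items) as a document and each item as a word.

    Returns the sorted vocabulary and one bag-of-words Counter per document.
    """
    vocab = sorted({item for rasc in rascs for item in rasc})
    docs = [Counter(rasc) for rasc in rascs]
    return vocab, docs


def doc_term_matrix(vocab, docs):
    """Dense document-term count matrix, the usual input to LDA/pLSI-style models."""
    index = {word: k for k, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for d, bag in enumerate(docs):
        for word, count in bag.items():
            matrix[d][index[word]] = count
    return matrix


# Invented RASCs for illustration only.
rascs = [
    ["apple", "banana", "orange"],
    ["apple", "microsoft", "google"],
    ["banana", "orange", "grape"],
]
vocab, docs = rascs_to_corpus(rascs)
matrix = doc_term_matrix(vocab, docs)
print(vocab)
print(matrix)
```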
Scientific report: "Learning Expressive Models for Word Sense Disambiguation"
... expressive representation formalism, a range of (shallow and deep) knowledge sources and ILP as learning technique, it is possible to generate models that, when compared to models produced by machine ... tagged corpus and sense repositories provided for verbs in Senseval-3. There are 32 verbs with between 40 and 398 examples each. The number of senses varies between 3...
Uploaded: 08/03/2014, 02:21
Scientific report: "Re-Ranking Models For Spoken Language Understanding" (Marco Dinarelli, University of Trento, Italy)
... (Shawe-Taylor and Cristianini, 2004) and tree kernels (Raymond and Riccardi, 2007; Moschitti and Bejan, 2004; Moschitti, 2006) to implicitly encode n-grams and other structural information in ... baselines FST and SVMs, and the re-ranking models (RR) applied to FST. A, B and C refer to the three approaches for generating training instances described above. As already men...
Uploaded: 08/03/2014, 21:20
Scientific report: "Rule Markov Models for Fast Tree-to-String Translation"
... Here, n_1 and n_2 are the total number of n-grams with exactly one and two counts, respectively. For our corpus, D_1 = 0.871 and D_2 = 0.902. Additionally, we experiment with 0.4 and 0.5 for D_n. Pruning ... outperforms minimal rules, and performs at the same level as composed and vertically composed rules, but is smaller and faster. The number of parameters is shown...
Uploaded: 17/03/2014, 00:20
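The formula defining D_n is cut off in the snippet. Assuming it is the standard absolute-discounting estimate D = n_1 / (n_1 + 2 n_2), computed separately for each n-gram order (a common choice in Kneser-Ney-style smoothing, not confirmed by the snippet), it follows directly from counts of counts; the function names and toy corpus are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def discount(counts):
    """Estimate D = n1 / (n1 + 2 * n2) from a Counter of n-gram counts.

    n1 and n2 are the numbers of distinct n-grams seen exactly once and
    exactly twice (assumed estimator; see the lead-in).
    """
    count_of_counts = Counter(counts.values())
    n1, n2 = count_of_counts[1], count_of_counts[2]
    if n1 + n2 == 0:
        raise ValueError("no n-grams with count 1 or 2; discount undefined")
    return n1 / (n1 + 2 * n2)


tokens = "a b a b c d c d e".split()
print(discount(ngram_counts(tokens, 2)))  # bigrams: n1 = 4, n2 = 2 -> 0.5
```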
Scientific report: "Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora"
... of associating English and French words, one way to find the preceding alignment is to search for the most ... [Footnote 1: All the examples consider English and French as the source and target languages.] ... probabilities between English and French words, and in which we impose that any English, respectively French, word has to be aligned with one and only one French, resp....
Uploaded: 17/03/2014, 07:20
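The snippet requires every English word to align with exactly one French word and vice versa. Under that constraint, picking the alignment with the highest total association score is an assignment problem. The sketch below solves tiny instances by brute force over permutations; it is an illustrative stand-in, not the paper's flow-network formulation, and the association scores are invented. For realistic sentence lengths a polynomial method (min-cost flow, or the Hungarian algorithm as in SciPy's `scipy.optimize.linear_sum_assignment`) would be the usual choice.

```python
from itertools import permutations


def best_one_to_one_alignment(assoc):
    """Exhaustively find the one-to-one alignment with maximal total association.

    assoc[i][j] scores English word i against French word j (square matrix).
    Each English word gets exactly one French word and vice versa.
    Runtime is factorial in sentence length: for illustration only.
    """
    n = len(assoc)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(assoc[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return list(enumerate(best_perm)), best_score


english = ["the", "white", "house"]
french = ["la", "maison", "blanche"]
# Hypothetical association scores (rows: English, columns: French).
assoc = [
    [5, 1, 0],
    [0, 2, 6],
    [1, 7, 3],
]
links, score = best_one_to_one_alignment(assoc)
print([(english[i], french[j]) for i, j in links], score)
```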
Scientific report: "Latent Variable Models for Semantic Orientations of Phrases"
... the models, where one random variable corresponds to nouns and another random variable corresponds to adjectives. The words that are similar in terms of semantic orientations, such as “risk” and ... + f_c (Eq. 17), where |N| and |A| are the numbers of words for n and a, respectively. Thus, we have four different models: naive Bayes (baseline), 3-PLSI, triangle, and U-shape...
Uploaded: 17/03/2014, 22:20
Scientific report: "Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the 0"
... F1) when possible. For Arabic-English and Chinese-English, we used 346 and 184 hand-aligned sentences from LDC2006E86 and LDC2006E93. Similarly, for Czech-English, 515 hand-aligned sentences ... and to the anonymous reviewers for their valuable comments. We thank Jason Riesa for providing the Arabic-English and Chinese-English hand-aligned data and the al...
Uploaded: 30/03/2014, 17:20
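The snippet evaluates alignments against hand-aligned sentences using F1. A minimal sketch of set-based link precision, recall and F1 follows, assuming links are plain (source index, target index) pairs with no sure/possible distinction; the paper's exact scoring details are not visible in the snippet, and the example links are invented.

```python
def alignment_prf(predicted, gold):
    """Precision, recall and F1 of predicted alignment links against gold links.

    Links are (source_index, target_index) pairs; duplicates are ignored.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = [(0, 0), (1, 2), (2, 2)]
gold = [(0, 0), (1, 2), (2, 1)]
p, r, f = alignment_prf(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # two of three links match
```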