Scientific report document: "Using Confidence Bands for Parallel Texts Alignment" pptx
... Using Confidence Bands for Parallel Texts Alignment António RIBEIRO Departamento de Informática Faculdade de Ciências e Tecnologia Universidade ... independent method for alignment of parallel texts that makes use of homograph tokens for each pair of languages. In order to filter out tokens that may cause misalignment, we use confidence bands of linear ... have been two...
Uploaded: 20/02/2014, 18:20
... Volume), pages 93–96, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Using Structural Information for Identifying Similar Chinese Characters Chao-Lin Liu Jen-Hsiang ... pronunciations or in their internal structures are useful for computer-assisted language learning and for psycholinguistic studies. Although it is possible for us to emplo...
Uploaded: 20/02/2014, 09:20
... translation task (Fujii et al., 2008). We use the first 100k sentences of the parallel corpus for the TM, and the whole parallel corpus for the LM. Details of both corpora can be found in Table 1. Corpora ... approaches. 2 A Probabilistic Model for Phrase Table Extraction The problem of SMT can be defined as finding the most probable target sentence e for the source sentence f...
Uploaded: 20/02/2014, 04:20
Scientific report document: "A Declarative Language for Implementing Dynamic Programs" pptx
... acquire more over time: we intend for it to generalize and encapsulate best practices, and serve as a testbed for new practices. Dyna is now being used for parsing, machine translation, morphological analysis, ... probabilities, which permits heuristic early stopping before the agenda is empty. With Viterbi values, it amounts to uniform-cost search for the best parse, and an item’...
Uploaded: 20/02/2014, 16:20
Scientific report document: "Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing" docx
... of pool-based active learning. Various methods for selecting informative examples can be combined with this framework. 2.2 Selection Algorithm for Large Margin Classifiers One of the most accurate ... set. 2. While resources for labeling examples are available (a) Apply the current classifier to each unlabeled example (b) Find the m examples which are most informative for the clas...
Uploaded: 20/02/2014, 04:20
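The pool-based loop quoted in the active-learning excerpt above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the excerpt is cut off after step (b), so the labeling and retraining steps, the use of the smallest absolute margin as the informativeness measure, and all names (`select_most_informative`, `active_learning`, `train`, `oracle`, `rounds`) are assumptions made here.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, ...]  # hypothetical feature-vector representation
Scorer = Callable[[Example], float]


def select_most_informative(margins: Sequence[float], m: int) -> List[int]:
    """Return the indices of the m examples with the smallest absolute margin.

    Assumption: for a large-margin classifier, a score close to zero means
    the example lies near the decision boundary and is therefore informative.
    """
    ranked = sorted(range(len(margins)), key=lambda i: abs(margins[i]))
    return ranked[:m]


def active_learning(
    labeled: List[Tuple[Example, int]],
    pool: List[Example],
    train: Callable[[List[Tuple[Example, int]]], Scorer],
    oracle: Callable[[Example], int],
    m: int,
    rounds: int,
) -> Scorer:
    """Pool-based active learning; `labeled` and `pool` are modified in place."""
    classifier = train(labeled)
    for _ in range(rounds):  # stands in for "while labeling resources remain"
        if not pool:
            break
        # (a) apply the current classifier to each unlabeled example
        margins = [classifier(x) for x in pool]
        # (b) find the m most informative examples
        chosen = set(select_most_informative(margins, m))
        # Assumed continuation: ask the annotator for labels, then retrain.
        labeled.extend((pool[i], oracle(pool[i])) for i in sorted(chosen))
        pool[:] = [x for i, x in enumerate(pool) if i not in chosen]
        classifier = train(labeled)
    return classifier
```

Any training routine that returns a signed score function can be passed as `train`; the loop itself does not depend on the classifier type.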
Scientific report document: "Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification" doc
... negative sentiment) given a small set of labeled data for the source domain, and unlabeled data for both source and target domains. In particular, no labeled data is provided for the target domain. In this paper, ... delicious in book reviews. Therefore, a model that is trained only using book reviews might not have any weights learnt for delicious or rust, which would make it diffic...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Using Cross-Entity Inference to Improve Event Extraction" docx
... “peripheral vision”. Gupta and Ji (2009) used cross-event information within ACE extraction, but only for recovering implicit time information for events. Liao and Grishman (2010) propose document ... important local information, actually contain sufficient clues for event detection. It is only based on the premise that we know the backgrounds of the entities beforehand. For...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Using adaptor grammars to identify synergies in the unsupervised acquisition of linguistic structure" docx
... Adaptor grammars are a framework for specifying a wide range of such models for grammatical inference. They can be viewed as a nonparametric extension of PCFGs. Informally, there seem to be at ... contextual information is less important for their acquisition than, say, syntax. 2 From PCFGs to Adaptor Grammars This section introduces adaptor grammars as an extension of PCFGs; f...
Uploaded: 20/02/2014, 09:20
Scientific report document: "Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora" ppt
... length[Corpus]
    sent1 ← Corpus[i]
    contexts ← UPDATE(contexts, Corpus, i)
    for full in sent1
        if full in Full-list
            for sent2 in contexts
                for abbr in sent2
                    if RL(full, abbr) = TRUE
                        ...
... approach is to transform the abbreviation into its full-form for which the current SMT system knows how to translate. For example, if the baseline system knows that the translation...
Uploaded: 20/02/2014, 09:20
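The visible part of the pseudocode in the abbreviation excerpt above can be sketched in Python as below. The excerpt does not show what `UPDATE` or `RL` do, nor what happens once `RL` returns TRUE, so this sketch fills those gaps with labeled assumptions: the contexts are taken to be a window of neighboring sentences, `RL` is replaced by a placeholder in-order subsequence check, and a matching pair is simply recorded. The names `is_subsequence`, `collect_pairs`, and `window` are hypothetical, and the Latin-script example is for illustration only (the paper concerns Chinese).

```python
from typing import Callable, List, Set, Tuple

Sentence = List[str]  # a sentence as a list of tokens


def is_subsequence(full: str, abbr: str) -> bool:
    """Placeholder for RL (assumption): abbr is shorter than full and its
    characters occur in full in the same order."""
    if not abbr or len(abbr) >= len(full):
        return False
    remaining = iter(full)
    # `ch in remaining` consumes the iterator up to the first match,
    # which enforces the left-to-right ordering.
    return all(ch in remaining for ch in abbr)


def collect_pairs(
    corpus: List[Sentence],
    full_list: Set[str],
    window: int = 1,
    related: Callable[[str, str], bool] = is_subsequence,
) -> Set[Tuple[str, str]]:
    """Collect (full form, abbreviation) candidates from nearby sentences."""
    pairs: Set[Tuple[str, str]] = set()
    for i, sent1 in enumerate(corpus):
        # Assumed meaning of UPDATE: sentences within `window` of sentence i.
        contexts = corpus[max(0, i - window): i + window + 1]
        for full in sent1:
            if full in full_list:
                for sent2 in contexts:
                    for abbr in sent2:
                        if related(full, abbr):
                            # Assumed action for the elided step.
                            pairs.add((full, abbr))
    return pairs
```

Passing a different `related` function swaps in another relation test without changing the loop structure.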
Scientific report document: "Using Automatically Transcribed Dialogs to Learn User Models in a Spoken Dialog System" doc
... assume that $\Pr(\tilde{A}_t \mid A_t)$ is relatively straightforward to estimate: for example, ASR models that rely on a simple confusion rate and uniform substitutions (which can be estimated from small ... algorithm that treats this information as unobserved data. Although this approach does not directly employ manually transcribed dialogs, it does require a confusion model for the ASR engine,...
Uploaded: 20/02/2014, 09:20
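One way to read "a simple confusion rate and uniform substitutions" in the excerpt above is the following sketch: the recognizer reports the true action with probability one minus the confusion rate, and otherwise substitutes any other action with equal probability. This is an interpretation under stated assumptions, not the paper's model; the function name `confusion_prob`, the finite `vocabulary` of actions, and the parameter `confusion_rate` are all hypothetical.

```python
from typing import Set


def confusion_prob(
    observed: str, true: str, vocabulary: Set[str], confusion_rate: float
) -> float:
    """Pr(observed | true) under a uniform-substitution confusion model.

    Assumptions: `vocabulary` holds every possible action, and an error
    replaces the true action with each of the other actions equally often.
    """
    if observed not in vocabulary or true not in vocabulary:
        raise ValueError("both actions must belong to the vocabulary")
    if len(vocabulary) < 2:
        raise ValueError("a substitution needs at least two actions")
    if not 0.0 <= confusion_rate <= 1.0:
        raise ValueError("confusion_rate must lie in [0, 1]")
    if observed == true:
        return 1.0 - confusion_rate
    return confusion_rate / (len(vocabulary) - 1)
```

For a fixed true action, the values over all observed actions sum to one, so the function defines a proper conditional distribution governed by a single parameter.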