Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

... 897–904, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging Wenbin Jiang † Liang Huang ‡ Qun ... segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on...

Uploaded: 08/03/2014, 01:20

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging" docx

... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. ... ACL and the 4th IJCNLP of the AFNLP, pages 513–521, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP An Error-Driven Word-Character Hybrid Model for...

Uploaded: 17/03/2014, 01:20

Document: Scientific report, "A Hybrid Hierarchical Model for Multi-Document Summarization" ppt

... paper, we formulate extractive summarization as a two step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences ... hierarchical model and regression model to score sentences in new documents, eliminating the need for building a generative model for new document clusters. 3...

Uploaded: 20/02/2014, 04:20

Document: Scientific report, "A Unified Graph Model for Sentence-based Opinion Retrieval" pdf

... The Lexicon of Chinese Positive Words, which consists of 5,054 positive words and the Lexicon of Chinese Negative Words, which consists of 3,493 negative words; (2) The opinion word lexicon ... notion of topic-sentiment word pair, which consists of a topic term and a sentiment word. A word pair maintains the associative information between the two words, and enables s...

Uploaded: 20/02/2014, 04:20

Document: Scientific report, "A probabilistic generative model for an intermediate constituency-dependency representation" pptx

... re-ranking model performs rather well for a limited number of candidate structures, and outperforms Charniak's model when k = 5. In this case we observe a small boost in performance for the detection ... structure. It models the event of filling B with a content word (cw), given the content word of the governing block, the categories (cats) and functional words (fw) of B,...

Uploaded: 20/02/2014, 04:20

Document: Scientific report, "A Phonotactic Language Model for Spoken Language Identification" pptx

... another. Therefore, one can easily draw the analogy between an acoustic token in bag-of-sounds and a word in bag-of-words. Unlike words in a text document, the phonotactic information that ... n-character slice for text categorization by language (Cavnar and Trenkle, 1994) and Phone Recognition followed by n-gram Language Modeling, or PRLM (Zissman, 1996). Orthographi...

Uploaded: 20/02/2014, 15:20

Document: Scientific report, "A Localized Prediction Model for Statistical Machine Translation" ppt

... length. Single source and target words are denoted by and respectively, where and . We will also use a special single-word block set which contains only blocks for which . For the experiments in ... phrase-based model for SMT similar to the models presented in (Koehn et al., 2003; Och et al., 1999; Tillmann and Xia, 2003). In our paper, phrase pairs are named blocks and...

Uploaded: 20/02/2014, 15:20

Document: Scientific report, "A SPEECH-FIRST MODEL FOR REPAIR DETECTION AND CORRECTION" docx

... statistical analysis does not sup- 6We performed the same analysis for the last and first syllables in the reparandum and repair, respectively, and for normalized f0 and energy; results did not substantially ... Length of Reparandum Offset Word Fragments (N=288) bution of initial phonemes for all words in the corpus of 6,414 ATIS sentences, and for all fragments, s...

Uploaded: 20/02/2014, 21:20

Scientific report: "A Unified Statistical Model for the Identification of English BaseNP" pptx

... describe the two-pass statistical model, parameters training and Viterbi algorithm for the search of the best sequences of POS tagging and baseNP identification. Before describing our algorithm, ... iii nnnP and (4) ),|( iii bmtwP . The first and the third parameters are trigrams of T and B respectively. The second and the fourth are lexical generation probabilities. Probabilit...

Uploaded: 08/03/2014, 05:20

Scientific report: "A Noisy-Channel Model for Document Compression" pptx

... result in incoherence and information loss. The deletion of certain words and phrases may also lead to ungrammaticality and information loss. The mayor is now looking for re-election. John ... inserted. Knight and Marcu (2000) describe in detail a noisy-channel model that explains how short sentences can be expanded into longer ones by inserting and expanding syntactic con...

Uploaded: 08/03/2014, 07:20
