a joint statistical model

Document - Scientific report: "A Joint Statistical Model for Simultaneous Word Spacing and Spelling Error Correction for Korean" (pdf)

Uploaded: 20/02/2014, 12:20
... be divided into statistical algorithms and rule-based algorithms. Statistical algorithms generally use character n-gram (Eojeol or Eumjeol n-gram in Korean) (Kang and Woo, 2001; Kwon, ... it needs a word dictionary and takes long time for searching many character combinations. 4.2 Experiment Results and Analyses We used two separate Eumjeol n-grams as language models for ... single Jaso transition case (나와욧 → 나와요... [Footnote 3: Jaso is a Korean character. Footnote 4: ‘Transition’ means the correct character is changed to other character due to some causes, such as typographical errors.]
Scientific report: "A Unified Statistical Model for the Identification of English BaseNP" (pptx)

Uploaded: 08/03/2014, 05:20
... important subtask for many natural language processing applications, such as partial parsing, information retrieval and machine translation. A baseNP is a simple noun phrase that does not contain other ... pp. 218-224. COLING-ACL’98 Lance A. Ramshaw and Michael P. Marcus (In Press). Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora. Kluwer. Originally appeared in ... Treebank II, and the definition of baseNP is the same as Ramshaw’s. Table 1 summarizes the average performance on both baseNP tagging and POS tagging, each section of the whole Penn Treebank was...
Document - Scientific report: "A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining" (pptx)

Uploaded: 19/02/2014, 19:20
... system learns this as a non-transliteration but it is wrongly annotated as a transliteration in the gold standard. Arabic nouns have an article “al” attached to them which is translated in English as ... uses Hidden Markov Models (Nabende, 2010; Darwish, 2010; Jiampojamarn et al., 2010), Finite State Automata (Noeman and Madkour, 2010) and Bayesian learning (Kahki et al., 2011) to learn transliteration pairs ... International Language Resources and Evaluation (LREC’10), Valletta, Malta. Sittichai Jiampojamarn, Kenneth Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, and Grzegorz Kondrak....
Document - Scientific report: "A Phrase-based Statistical Model for SMS Text Normalization" (ppt)

Uploaded: 20/02/2014, 12:20
... a consensus translation technique to bootstrap parallel data using off-the-shelf translation systems for training a hierarchical statistical translation model for general domain instant ... SMS normalization. 2.3 SMS Normalization versus Text Paraphrasing Problem Others may regard SMS normalization as a paraphrasing problem. Broadly speaking, paraphrases capture core aspects ... minimal adaptation. One advantage of this pre-translation normalization is that the diversity in different user groups and domains can be modeled separately without accessing and adapting...
Document - Scientific report: "A Localized Prediction Model for Statistical Machine Translation" (ppt)

Uploaded: 20/02/2014, 15:20
... Boston, MA, May. Christoph Tillmann and Fei Xia. 2003. A Phrase-based Unigram Model for Statistical Machine Translation. In Companion Vol. of the Joint HLT and NAACL Conference (HLT 03), pages ... paper, we present a block-based model for statistical machine translation. A block is a pair of phrases which are translations of each other. For example, Fig. 1 shows an Arabic-English translation ... set of candidates. This computational advantage is the main reason that we adopt the local model in this paper. 3.3 Global versus Local Models Both the global and the localized log-linear models...
Document - Scientific report: "Modelling Early Language Acquisition Skills: Towards a General Statistical Learning Mechanism" (potx)

Uploaded: 22/02/2014, 02:20
... for cepstral mean normalisation (CMN) and cepstral mean and variance normalisation (CMVN). Stage 2: A local-match distance matrix is then calculated by measuring the cosine distance be- Speech ... language and building a lexicon, whereas grammar is learnt via non-statistical methods later on. Seidenberg et al. (2002) believe that learning grammar begins when statistical learning ... the computational and mathematical methods to those that model behavioural data of human speech perception and production within five main areas: Front-end Processing: Research and development...
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" (pdf)

Uploaded: 08/03/2014, 01:20
... propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron as the core, combined with real-valued features such as language models, ... approach of discriminative models treats segmentation as a labelling problem by assigning each character a boundary tag (Xue and Shen, 2003), Joint S&T can be conducted in a labelling fashion ... features as well as the character-based ones. We noticed that the templates related to word unigrams and bigrams bring to the feature space an enlargement much rapider than the character-base...
Scientific report: "A Statistical Model for Lost Language Decipherment" (pptx)

Uploaded: 17/03/2014, 00:20
... co-occurrence analysis operate over large corpora, which are typically unavailable for a lost language. Finally, Knight and Yamada (1999) and Knight et al. (2006) describe a computational HMM-based ... structural sparsity constraints on character-level mappings. We assume that an accurate alphabetic mapping between related languages will be sparse in the following way: each letter will map to a ... Publishers. Benjamin Snyder and Regina Barzilay. 2008. Cross-lingual propagation for morphological analysis. In Proceedings of the AAAI, pages 848–854. Wilfred Watson and Nicolas Wyatt, editors. 1999. Handbook...
Scientific report: "A Joint Rule Selection Model for Hierarchical Phrase-based Translation" (pptx)

Uploaded: 23/03/2014, 16:20
... for Phrase-Based Translation. In Proc. ACL, pages 315-323. Kenji Yamada and Kevin Knight. 2001. A Syntax-based Statistical Translation Model. In Proc. ACL, pages 523-530. Le Zhang. 2006. Maximum ... Entropy Based Phrase Reordering Model for Statistical Machine Translation. In Proc. ACL-Coling, pages 521-528. Deyi Xiong, Min Zhang, Aiti Aw, and Haizhou Li. 2009. A Syntax-Driven Bracketing Model ... translation in traditional phrase-based models (Koehn et al., 2003; Xiong et al., 2006), but also characterize the complicated long distance reordering similar to syntactic based statistical...
Scientific report: "Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features" (pdf)

Uploaded: 23/03/2014, 17:20
... Lexical Chain Features 4.1 Chain starts and ends We follow (Chan et al. 2007) to model the lexical chain starts and ends at a story boundary with a statistical distribution. We apply a window ... consideration and statistically modeled. 2 Experimental Setup Experiments are conducted using data from the TDT-2 Voice of America Mandarin broadcast. In particular, we only use the data from ... terms and locating instances of time where the count of chain starts and ends (boundary strength) achieves local maxima. Chan et al. (2007) enhanced this approach through statistical modeling...
Scientific report: "A Part of Speech Estimation Method for Japanese Unknown Words using a Statistical Model of Morphology and Context" (pptx)

Uploaded: 23/03/2014, 19:20
... Table 3: Examples of common character bigrams for each part of speech in the infrequent words character type sequence kanji katakana katakana-kanji kanji-hiragana hiragana kanji-katakana ... kanji-hiragana hiragana kanji-katakana katakana-symbol-katakana number kanji-hiragana-kanji alphabet kanji-hiragana-kanji-hiragana hiragana-kanji percent 45.1% 11.4% 6.5% 5.6% ... different types of characters other than punctuation marks: kanji, hiragana, katakana, Roman alphabet, and Arabic numeral. Kanji which means 'Chinese character' is used for both...