Automated authorship attribution with character-level language models

Scientific report: "Reading Level Assessment Using Support Vector Machines and Statistical Language Models" (PDF)

... statistical language models. In this paper, we also use support vector machines to combine features from traditional reading level measures, statistical language models, and other language processing ... of 33-43%, with only one over 40%. The curves for bigram and unigram models have similar shapes, but the trigram models outperform the lower-order models. Error rates for the bigram models range ... of language competency. However, finding topical texts at an appropriate reading level for foreign and second language learners is a challenge for teachers. This task can be addressed with natural...

Uploaded: 20/02/2014, 15:20

Scientific report: "Local Histograms of Character N-grams for Authorship Attribution" (PPT)

... 13(6):1208–1215. F. Peng, D. Schuurmans, V. Keselj, and S. Wang. 2003. Language independent authorship attribution using character level language models. In Proceedings of the 10th conference of the European ... characters at the n-gram level (Plakias and Stamatatos, 2008a). Acceptable performance in AA has been reported with character n-gram representations. However, as with word-based features, character ... information would be obtained with n ∈ {1 . . . D} where D is the maximum number of words in a document). With respect to character-based features, n-grams at the character level have been widely used...
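The character-level approach cited in this excerpt (Peng et al., 2003) attributes a document to the author whose character n-gram model assigns it the highest likelihood. A minimal sketch of that idea follows; it uses add-one smoothing over a fixed byte alphabet as a simplification of the paper's smoothing, and all names are illustrative:

```python
import math
from collections import defaultdict

ALPHABET = 256  # assumed byte-sized character alphabet for smoothing


class CharNgramLM:
    """Character-level n-gram model with add-one smoothing
    (a simplification of the smoothing used in the literature)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def _events(self, text):
        # pad the start so every character has a full-length context
        padded = "~" * (self.n - 1) + text
        for i in range(len(text)):
            yield padded[i:i + self.n - 1], padded[i + self.n - 1]

    def train(self, text):
        for ctx, ch in self._events(text):
            self.counts[ctx][ch] += 1
            self.totals[ctx] += 1

    def log_prob(self, text):
        return sum(
            math.log((self.counts[ctx][ch] + 1) / (self.totals[ctx] + ALPHABET))
            for ctx, ch in self._events(text))


def attribute(doc, author_models):
    """Pick the author whose model gives the highest per-character score."""
    return max(author_models,
               key=lambda a: author_models[a].log_prob(doc) / len(doc))
```

Training one model per candidate author on texts of known authorship and calling `attribute` on the disputed document gives the language-independent attribution scheme the excerpt describes.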

Uploaded: 07/03/2014, 22:20

Scientific report: "Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers" (PPT)

... explore a dependency language model to improve translation quality. To some extent, these syntactically-informed language models are consistent with syntax-based translation models in capturing ... or even trillions of English words, huge language models are built in a distributed manner (Zhang et al., 2006; Brants et al., 2007). Such language models yield better translation results but at ... Computational Linguistics Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers Deyi Xiong, Min Zhang, Haizhou Li Human Language Technology Institute...

Uploaded: 07/03/2014, 22:20

Scientific report: "Incremental Syntactic Language Models for Phrase-based Translation" (PPTX)

... to incorporate large-scale n-gram language models in conjunction with incremental syntactic language models. The added decoding time cost of our syntactic language model is very high. By increasing ... translation has effectively used n-gram word sequence models as language models. Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical ... use supertag n-gram LMs. Syntactic language models have also been explored with tree-based translation models. Charniak et al. (2003) use syntactic language models to rescore the output of a...

Uploaded: 20/02/2014, 04:20

Scientific report: "The impact of language models and loss functions on repair disfluency detection" (PPTX)

... language models trained from text or speech corpora of various genres and sizes. The largest available language models are based on written text: we investigate the effect of written text language models ... among the different language models when extended features are present are relatively small. We assume that much of the information expressed in the language models overlaps with the lexical features. We ... when used with simple language models like a bigram language model. In this paper we first identify the 25 most likely analyses of each sentence using the TAG channel model together with a bigram...
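The last sentence of this excerpt describes a standard noisy-channel setup: generate the most likely analyses with the channel model, then rescore them with a language model. A generic sketch of such n-best rescoring follows; the weights and the dictionary-backed bigram scorer are illustrative stand-ins, not the paper's TAG channel model:

```python
def rescore(nbest, bigram_logprob, channel_weight=1.0, lm_weight=1.0):
    """Pick the candidate maximizing a weighted sum of the channel-model
    score and a bigram language-model score.

    nbest: list of (word_list, channel_log_score) pairs.
    bigram_logprob: function (prev_word, word) -> log probability.
    """
    def total(candidate):
        words, channel_score = candidate
        lm_score = sum(bigram_logprob(a, b) for a, b in zip(words, words[1:]))
        return channel_weight * channel_score + lm_weight * lm_score

    return max(nbest, key=total)
```

In practice the two weights would be tuned on held-out data; here they simply default to an unweighted sum.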

Uploaded: 20/02/2014, 04:20

Scientific report: "An Empirical Investigation of Discounting in Cross-Domain Language Models" (PPT)

... accurate than those with few samples. We used five history count buckets so that JMLM would have the same number of parameters as GDLM. All five models are trigram models with type counts at ... exhibit linear growth in the count of the n-gram, with the amount of growth being closely correlated with the corpus divergence. Finally, we build a language model exploiting a parametric form of ... of English Bigrams. Computer Speech & Language, 5(1):19–54. Joshua Goodman. 2001. A Bit of Progress in Language Modeling. Computer Speech & Language, 15(4):403–434. Bo-June (Paul) Hsu...

Uploaded: 20/02/2014, 04:20

Scientific report: "Improved Smoothing for N-gram Language Models Based on Ordinary Counts" (DOC)

... we have tested, with the new method eliminating most of the gap between Kneser-Ney and those methods. 1 Introduction Statistical language models are potentially useful for any language technology ... with (4) one empirically optimized discount, (5) modified interpolated KN with Chen-Goodman formula discounts, and (6) interpolated KN with one empirically optimized discount. We built models ... optimization. We built models based on six previous approaches: (1) Katz backoff, (2) interpolated absolute discounting with Ney et al. formula discounts, backoff absolute discounting with (3) Ney et...
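Baseline (2) in this excerpt, interpolated absolute discounting with the Ney et al. formula discount, can be sketched for the bigram case as follows. The discount is estimated as D = n1 / (n1 + 2*n2), where n_k is the number of bigram types occurring exactly k times; the function names are illustrative, and this is a minimal sketch rather than the paper's implementation:

```python
from collections import Counter


def ney_discount(bigram_counts):
    """Ney et al. formula discount D = n1 / (n1 + 2*n2), where n_k is the
    number of bigram types seen exactly k times."""
    freqs = Counter(bigram_counts.values())
    n1, n2 = freqs[1], freqs[2]
    return n1 / (n1 + 2 * n2) if (n1 + 2 * n2) > 0 else 0.5


def interpolated_abs_discount(tokens):
    """Return P(w | prev) under interpolated absolute discounting."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    D = ney_discount(bigrams)

    def prob(w, prev):
        c_bi = bigrams[(prev, w)]
        c_prev = unigrams[prev]
        p_uni = unigrams[w] / total
        if c_prev == 0:
            return p_uni  # unseen history: fall back to the unigram
        # interpolation weight: mass reserved by discounting each of the
        # distinct successors of `prev` by D
        distinct = sum(1 for (a, _) in bigrams if a == prev)
        lam = D * distinct / c_prev
        return max(c_bi - D, 0) / c_prev + lam * p_uni

    return prob
```

Because the discounted bigram mass and the interpolation weight are complementary, the returned probabilities sum to one over the vocabulary for any seen history.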

Uploaded: 20/02/2014, 09:20

Scientific report: "Generating statistical language models from interpretation grammars in dialogue systems" (POTX)

... with a comparison of in-grammar recognition performance. 3 Language modelling To generate the different trigram language models we used the SRI language modelling toolkit (Stolcke, 2002) with ... the results with the results of the Nuance grammar showing that the differences of WER of the models in comparison with the baseline are all significant on the p=0.05 significance level. However, the ... decades of statistical language modeling: Where do we go from here? In Proceedings of IEEE:88(8). Rosenfeld R. 2000. Incorporating Linguistic Structure into Statistical Language Models. In Philosophical Transactions...

Uploaded: 22/02/2014, 02:20

Scientific report: "Web augmentation of language models for continuous speech recognition of SMS text messages" (DOCX)

... 8 billion. 3 Speech Recognition Experiments We have trained language models on the in-domain data together with web data, and these models have been used in speech recognition experiments. ... 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning ... Kneser-Ney smoothed n-gram models. IEEE Transactions on Audio, Speech and Language Processing, 15(5):1617–1624. A. Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. DARPA...

Uploaded: 22/02/2014, 02:20

Scientific report: "Automatic Authorship Attribution" (PPT)

... Holmes, D. 1994, Authorship Attribution, Computers and the Humanities, 28: 87-106. Holmes, D. and R. Forsyth 1995, The Federalist Revisited: New Directions in Authorship Attribution, Literary ... stamatatos@wcl.ee.upatras.gr Abstract In this paper we present an approach to automatic authorship attribution dealing with real-world (or unrestricted) text. Our method is based on the computational ... trainable and fully automated, requiring no manual text preprocessing or sampling. 1 Introduction The vast majority of the attempts at computer-assisted authorship attribution has been...

Uploaded: 22/02/2014, 03:20

Scientific report: "The use of formal language models in the typology of the morphology of Amerindian languages" (POTX)

... context-free linear grammar with high effectivity than another using regular languages with low effectivity. To model the behavior of causative agglutination and the interaction with person prefixes ... PC-KIMMO (Antworth, 1990) 3 The Toba morphology The Toba language belongs, with the languages Pilaga, Mocovi and Kaduveo, to the Guaycuru language family (Messineo, 2003; Klein, 1978). The Toba ... grammars for modeling agglutination in this language, but first we will present the former class of languages and its acceptor automata. 3.1 Linear context free languages and two-taped nondeterministic...

Uploaded: 07/03/2014, 22:20

Scientific report: "Faster and Smaller N-Gram Language Models" (PPTX)

... performed experiments with two different language models. Our first language model, WMT2010, was a 5-gram Kneser-Ney language model which stores probability/back-off pairs as values. We trained this language ... novel language model caching technique that improves the query speed of our language models (and SRILM) by up to 300%. 1 Introduction For modern statistical machine translation systems, language models ... and Jianfeng Gao. 2007. Compressing trigram language models with golomb coding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Marcello Federico and Mauro...
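A model that "stores probability/back-off pairs as values", as in this excerpt, is queried with the standard back-off recursion: use the longest n-gram present in the tables, adding the back-off weight of each context that has to be shortened. A toy sketch of that lookup follows; the log10 weights are made up for illustration and have nothing to do with the paper's compressed data structures:

```python
import math

# Toy ARPA-style tables: n-gram -> (log10 probability, log10 back-off weight).
# The weights below are illustrative, not estimated from any corpus.
UNIGRAMS = {"the": (-1.0, -0.3), "cat": (-1.5, -0.2), "sat": (-1.7, -0.1)}
BIGRAMS = {("the", "cat"): (-0.5, -0.2), ("cat", "sat"): (-0.6, -0.1)}
TRIGRAMS = {("the", "cat", "sat"): (-0.3, 0.0)}


def trigram_logprob(w1, w2, w3):
    """Standard back-off query: use the longest matching n-gram, adding the
    back-off weight of every context that had to be shortened."""
    if (w1, w2, w3) in TRIGRAMS:
        return TRIGRAMS[(w1, w2, w3)][0]
    # back off from the trigram: apply bow(w1, w2), default 0 if absent
    backoff = BIGRAMS.get((w1, w2), (None, 0.0))[1]
    if (w2, w3) in BIGRAMS:
        return backoff + BIGRAMS[(w2, w3)][0]
    # back off again: apply bow(w2) and fall through to the unigram
    backoff += UNIGRAMS.get(w2, (None, 0.0))[1]
    return backoff + UNIGRAMS.get(w3, (math.log10(1e-8), 0.0))[0]
```

The caching technique the excerpt mentions would sit in front of this lookup, memoizing recent queries; the recursion itself is what every table layout ultimately has to answer.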

Uploaded: 07/03/2014, 22:20

Scientific report: "Lost in Translation: Authorship Attribution using Frame Semantics" (DOC)

... 23-24. Patrick Juola. 2006. Authorship attribution. Found. Trends Inf. Retr., 1(3):233–334. Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2008. Computational methods for authorship attribution. ... Technology, 60(1):9–25. Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2010. Authorship attribution in the wild. Language Resources and Evaluation, pages 1–12. 10.1007/s10579-009-9111-2. ... corpus. Table 1 (accuracy of a random weighted attribution): Corpus I: 57.6; IIa: 28.7; IIb: 33.9; IIc: 26.5. FWaF performed better than FW for attribution of author...

Uploaded: 07/03/2014, 22:20

Scientific report: "Randomized Language Models via Perfect Hash Functions" (PPTX)

... 2007. Compressing trigram language models with golomb coding. In Proceedings of EMNLP-CoNLL 2007, Prague, Czech Republic, June. P. Clarkson and R. Rosenfeld. 1997. Statistical language modeling using ... BLEU scores for a lossless (non-randomized) language model with parameter values quantized into 5 to 8 bits. We use MT04 data for system development, with MT05 data and MT06 (“NIST” subset) data ... (lossless) language models and our randomized language model. Note that the standard practice of measuring perplexity is not meaningful here since (1) for efficient computation, the language model...

Uploaded: 08/03/2014, 01:20

Scientific report: "Generalized Algorithms for Constructing Statistical Language Models" (PDF)

... tasks with a vocabulary size of about 500,000 words and for . Class-based models. In many applications, it is natural and convenient to construct class-based language models, that is models ... experiment. 4 Representation of n-gram Language Models with WFAs Standard smoothed n-gram models, including backoff (Katz, 1987) and interpolated (Jelinek and Mercer, 1980) models, admit a natural representation ... by composing with is given by figure 7. 6 Conclusion We presented several new and efficient algorithms to deal with more general problems related to the construction of language models found in new language...

Uploaded: 08/03/2014, 04:22
