Discriminative n-gram language modeling

Scientific report document: "A Succinct N-gram Language Model" ppt

Scientific report

... N-gram compression tasks achieved a significant compression rate without any loss. 1 Introduction There has been an increase in available N-gram data and a large amount of web-scaled N-gram data ... Compressing trigram language models with Golomb coding. In Proc. of EMNLP-CoNLL 2007. O. Delpratt, N. Rahman, and R. Raman. 2006. Engineering the LOUDS succinct tree representation. In Proc. ... N-gram counts. By using 8-bit floating point quantization, N-gram language models are compressed into 10 GB, which is comparable to a lossy representation (Talbot and Brants, 2008). 2 N-gram...

Scientific report document: "Improved Smoothing for N-gram Language Models Based on Ordinary Counts" doc

... this makes Kneser-Ney smoothing inappropriate or inconvenient. In this paper, we introduce a new smoothing method based on ordinary counts that outperforms all of the previous ordinary-count methods ... new method eliminating most of the gap between Kneser-Ney and those methods. 1 Introduction Statistical language models are potentially useful for any language technology task that produces natural-language ... currently be the best approach when language models based on ordinary counts are desired. References Chen, Stanley F., and Joshua Goodman. 1998. An empirical study of smoothing techniques for language...

Scientific report: "Faster and Smaller N-Gram Language Models" pptx

... Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection. Tries ensure that each n-gram prefix is represented only once, and are very efficient when n-grams ... number of keys and values needed for n-gram language modeling, generic implementations do not work efficiently “out of the box.” In this section, we will review existing techniques for encoding ... scalable decoder for parsing-based machine translation with equivalent language model state maintenance. In Proceedings of the Second Workshop on Syntax and Structure in Statistical Translation. Zhifei Li,...
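
The trie idea in this excerpt (one node per word, one path per n-gram, shared prefixes stored once) can be illustrated with a minimal Python sketch. The class names `TrieNode` and `NgramTrie`, the dictionary-based children, and the toy counts are assumptions made for illustration; they are not the compact encoding the paper itself develops.

```python
from typing import Dict, Optional, Sequence


class TrieNode:
    """One node per word; `count` is set only where a stored n-gram ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.count: Optional[int] = None


class NgramTrie:
    """Hypothetical dictionary-based trie over word sequences."""

    def __init__(self) -> None:
        self.root = TrieNode()
        self.num_nodes = 0  # number of word nodes, excluding the root

    def insert(self, ngram: Sequence[str], count: int) -> None:
        node = self.root
        for word in ngram:
            child = node.children.get(word)
            if child is None:
                # A new node is created only when the prefix is not yet stored.
                child = TrieNode()
                node.children[word] = child
                self.num_nodes += 1
            node = child
        node.count = count

    def lookup(self, ngram: Sequence[str]) -> Optional[int]:
        node = self.root
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.count


trie = NgramTrie()
trie.insert(("the", "cat"), 12)
trie.insert(("the", "cat", "sat"), 5)
trie.insert(("the", "dog"), 7)
```

The three toy n-grams contain seven words in total, but the trie needs only four nodes because the prefixes "the" and "the cat" are shared.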

Scientific report document: "Discriminative Lexicon Adaptation for Improved Character Accuracy – A New Direction in Chinese Language Modeling" pptx

... statistical language modeling for Chinese. ACM Transaction on Asian Language Information Processing, 1(1):3–33. Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang. 2004. Chinese word segmentation: A ... characters in the lexicon and using the training data to alter the current lexicon in each iteration. This is also an interesting direction. References Maximilian Bisani and Hermann Ney. 2005. Open vocabulary ... 16th International Conference on Computational Linguistic, pages 200–203. Kae-Cherng Yang, Tai-Hsuan Ho, Lee-Feng Chien, and Lin-Shan Lee. 1998. Statistics-based segment pattern lexicon: A new...

Scientific report document: "Discriminative Syntactic Language Modeling for Speech Recognition" pdf

... linguistically motivated language model in conversational speech recognition. In Proc. ICASSP. Wen Wang. 2003. Statistical parsing and language modeling based on constraint dependency grammar. ... fields. In Proceedings of the Human Language Technology Conference and Meeting of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Edmonton, Canada. Andreas ... 2001. Whole-sentence exponential language models: a vehicle for linguistic-statistical integration. In Computer Speech and Language. Fei Sha and Fernando Pereira. 2003. Shallow parsing with conditional random...

Scientific report document: "Large-Scale Syntactic Language Modeling with Treelets" docx

... positive data alone. We also show fluency improvements in a preliminary machine translation reranking experiment. 2 Treelet Language Modeling The common denominator of most n-gram language models ... despite training on positive data alone. We also show fluency improvements in a preliminary machine translation experiment. 1 Introduction N-gram language models are a central component of all ... Google Inc. 2007. Large language models in machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Eugene Charniak and Mark Johnson. 2005....

Scientific report document: "Creating Robust Supervised Classifiers via Web-Scale N-gram Data" pdf

... Workshop on Natural Language Generation. Natalia N. Modjeska, Katja Markert, and Malvina Nissim. 2003. Using the Web in machine learning for other-anaphora resolution. In EMNLP. Preslav Nakov and ... Search engine statistics beyond the n-gram: Application to noun compound bracketing. In CoNLL. Preslav Ivanov Nakov. 2007. Using the Web as an Implicit Training Set: Application to Noun Compound Syntax ... ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful,...

Scientific report document: "An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling" ppt

... Transactions on Rehabilitation Engineering, 8(2):216–219. B. Roark, J. de Villiers, C. Gibbons, and M. Fried-Oken. 2010. Scanning methods and language modeling for binary switch typing. In Proceedings ... brain-computer interface. Neural Systems and Rehabilitation Engineering, IEEE Transactions on, 13(1):89–98. M.S. Treder and B. Blankertz. 2010. (C)overt attention and visual speller design in an ERP-based ... methods integrating language modeling into grid scanning. 2 RSVP based BCI and ERP Classification RSVP is an experimental psychophysics technique in which visual stimulus sequences are displayed on a...

Scientific report document: "A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics" pptx

... $P(n) = \frac{\sum_{C \in \{Candidates\}} \sum_{ngram \in S(C,n)} Count_{clip}(ngram)}{\sum_{C \in \{Candidates\}} \sum_{ngram \in S(C,n)} Count(ngram)}$ where Count(ngram) is the number of n-gram counts, and Count_clip(ngram) is the maximum number of co-occurrences of ngram ... $R(n)$ where, as before, Count(ngram) is the number of n-gram counts, and Count_clip(ngram) is the maximum number of co-occurrences of ngram in the reference answer and its corresponding ... using ST and eliminating the unigrams found in SW. We therefore define a recall score as: $R(n) = \frac{\sum_{R \in \{References\}} \sum_{ngram \in S(R,n)} Count_{clip}(ngram)}{\sum_{R \in \{References\}} \sum_{ngram \in S(R,n)} Count(ngram)}$...
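
The clipped n-gram precision and recall described in this excerpt can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: each candidate is paired with exactly one reference, the clipped count is taken as the smaller of the candidate and reference counts, and the function names `ngram_counts`, `clipped_precision`, and `clipped_recall` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every n-gram (as a tuple of tokens) in one token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_ratio(scored: List[List[str]], other: List[List[str]], n: int) -> float:
    """Sum of clipped counts over sum of raw counts, taken over `scored`."""
    clipped = 0
    total = 0
    for a, b in zip(scored, other):
        counts_a = ngram_counts(a, n)
        counts_b = ngram_counts(b, n)
        # Assumption: clipping caps each count at the count in the paired text.
        clipped += sum(min(c, counts_b[g]) for g, c in counts_a.items())
        total += sum(counts_a.values())
    return clipped / total if total else 0.0


def clipped_precision(candidates, references, n):
    # Denominator counts candidate n-grams, as in P(n).
    return clipped_ratio(candidates, references, n)


def clipped_recall(candidates, references, n):
    # Denominator counts reference n-grams, as in R(n).
    return clipped_ratio(references, candidates, n)


candidates = ["the the the cat".split()]
references = ["the cat sat".split()]
p1 = clipped_precision(candidates, references, 1)
r1 = clipped_recall(candidates, references, 1)
```

In the toy example the candidate repeats "the" three times, but only one occurrence is credited because the reference contains it once, which is the effect clipping is meant to have.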

Scientific report: "A Scalable Probabilistic Classifier for Language Modeling" pdf

... Proceedings of the 5th International Conference on Spoken Language Processing, pages 1694–1698, Sydney, Australia. R. Kneser and H. Ney. 1995. Improved Backing-off for M-Gram Language Modeling. In ... Categorization Research. Journal of Machine Learning Research, 5:361–397. A. Mnih and G. Hinton. 2008. A Scalable Hierarchical Distributed Language Model. In Advances in Neural Information Processing Systems ... 21. H. Ney, U. Essen, and R. Kneser. 1994. On Structuring Probabilistic Dependences in Stochastic Language Modeling. Computer, Speech and Language, 8:1–38. B. Roark, M. Saraclar, and M. Collins....

Scientific report: "An Efficient Indexer for Large N-Gram Corpora" docx

... Melbourne, Australia. R. Kneser and H. Ney. 1995. Improved backing-off for n-gram language modeling. In Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on, ... definition, each internal node except the root can have any number of keys in the range [v, 2v], and the root must have at least one key. Finally, an internal node with k keys has k + 1 children. 4.2 ... Foundation CAREER award #0747340 and IIS awards #0917170 and #1018613. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not...
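
The node constraints quoted in this excerpt (between v and 2v keys for non-root nodes, at least one key in the root, and k + 1 children for an internal node with k keys) can be written as a small checker. This is a sketch only: the `Node` class and `satisfies_invariants` function are hypothetical names, and applying the [v, 2v] bound to leaves as well as internal nodes is an assumption, since the excerpt mentions internal nodes only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: sorted keys plus child pointers (empty for a leaf)."""
    keys: List[int]
    children: List["Node"] = field(default_factory=list)


def satisfies_invariants(node: Node, v: int, is_root: bool = True) -> bool:
    """Check the key-count and child-count rules for the subtree at `node`."""
    k = len(node.keys)
    if is_root:
        # The root must have at least one key.
        if k < 1:
            return False
    elif not (v <= k <= 2 * v):
        # Assumption: every non-root node, leaf or internal, holds v..2v keys.
        return False
    # An internal node with k keys has exactly k + 1 children.
    if node.children and len(node.children) != k + 1:
        return False
    return all(satisfies_invariants(child, v, is_root=False) for child in node.children)


good = Node([10], [Node([1, 2]), Node([11, 12, 13])])
underfull = Node([10], [Node([1]), Node([11, 12])])
wrong_fanout = Node([10], [Node([1, 2])])
```

With v = 2, `good` passes, `underfull` fails because one child holds a single key, and `wrong_fanout` fails because a root with one key must have two children.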

Scientific report: "Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining" ppt

... integrated syntactic language modeling. Ph.D. thesis, Brown University. L. Huang and K. Sagae. 2010. Dynamic Programming for Linear-Time Incremental Parsing. In Proceedings of ACL. Zhongqiang ... Discriminative syntactic language modeling for speech recognition. In ACL. Denis Filimonov and Mary Harper. 2009. A joint language model with fine-grain syntactic tags. In EMNLP. Yoav Goldberg and Michael ... tuning dev04f BN data 2.5 hrs Supervised training: dep. parser, POS tagger Ontonotes BN treebank + WSJ Penn treebank 1.3m words, 59k sent. Supervised training: constituent parser Ontonotes BN...

Scientific report: "Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation" pdf

... alignment in statistical translation. In Proceedings of COLING, pages 836–841. Ying Zhang and Stephan Vogel. 2004. Measuring confidence intervals for the machine translation evaluation metrics. ... translation performance significantly on a large-scale Arabic-to-English MT task. 1 Introduction Significant progress has been made in statistical machine translation (SMT) in recent years. Among ... metrics. In Proceedings of The 10th International Conference on Theoretical and Methodological Issues in Machine Translation. Bing Zhao and Shengyuan Chen. 2009. A simplex armijo downhill algorithm...

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... token representing a sentence boundary in language model- Mark Johnson and Sharon Goldwater. 2009. Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation ... arbitrary language, without any “word” indications. 1 Introduction “Word” is no trivial concept in many languages. Asian languages such as Chinese and Japanese have no explicit word boundaries, ... leverages dynamic programming for inference. In Section 5 we describe experiments on the standard datasets in Chinese and Japanese in addition to English phonetic transcripts, and semi-supervised...

Scientific report: "Grounded Language Modeling for Automatic Speech Recognition of Sports Video" doc

... grounded language modeling, an extension of traditional language modeling in which the probability of a word is conditioned not only on the previous word(s) but also on the non-linguistic context ... International Conference on Knowledge Discovery and Data Mining. Seattle, Washington. Stolcke, A., (2002). SRILM - An Extensible Language Modeling Toolkit, in Proc. Intl. Conf. Spoken Language ... baseline comparisons) are generated with the SRI language modeling toolkit (Stolcke, 2002) using Chen and Goodman's modified Kneser-Ney discounting and interpolation (Chen and Goodman,...