... A Statistical Parser for Czech* Michael Collins AT&T Labs-Research, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932 mcollins@research.att.com Jan Hajič Institute ... Tillmann Lehrstuhl für Informatik VI, RWTH Aachen D-52056 Aachen, Germany tillmann@informatik.rwth-aachen.de Abstract This paper considers statistical parsing of Czech, which differs radically ... of a morphological analysis program, and also with the single one of those tags that a statistical POS tagging program had predicted to be the correct tag (Hajič and Hladká, 1998). Table...
... system learns this as a non-transliteration but it is wrongly annotated as a transliteration in the gold standard. Arabic nouns have an article “al” attached to them which is translated in English as ... International Language Resources and Evaluation (LREC’10), Valletta, Malta. Sittichai Jiampojamarn, Kenneth Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, and Grzegorz Kondrak. ... non-transliterations by N. 3.2 Implementation Details We use the Forward-Backward algorithm to estimate the counts of multigrams. The algorithm has a forward variable α and a backward variable...
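The Forward-Backward estimation of multigram counts mentioned in the excerpt above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simplified single-string (monolingual) unigram multigram model, whereas the excerpt concerns transliteration pairs. The function name `expected_multigram_counts`, the `prob` table, and the `max_len` limit are all hypothetical names introduced for illustration.

```python
from collections import defaultdict


def expected_multigram_counts(s, prob, max_len=2):
    """Expected counts of multigrams in s under a unigram multigram model.

    Assumptions (not from the source): s is a single string, prob maps
    each multigram (substring) to its probability, and multigrams are at
    most max_len characters long.
    """
    n = len(s)

    # Forward variable: alpha[i] is the total probability of all
    # segmentations of the prefix s[:i].
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for i in range(1, n + 1):
        for k in range(1, min(max_len, i) + 1):
            alpha[i] += alpha[i - k] * prob.get(s[i - k:i], 0.0)

    # Backward variable: beta[i] is the total probability of all
    # segmentations of the suffix s[i:].
    beta = [0.0] * (n + 1)
    beta[n] = 1.0
    for i in range(n - 1, -1, -1):
        for k in range(1, min(max_len, n - i) + 1):
            beta[i] += prob.get(s[i:i + k], 0.0) * beta[i + k]

    counts = defaultdict(float)
    total = alpha[n]  # probability of the whole string
    if total == 0.0:
        return counts  # no segmentation has non-zero probability

    # Posterior probability that the multigram s[i:i+k] is used at
    # position i, summed over all positions.
    for i in range(n):
        for k in range(1, min(max_len, n - i) + 1):
            m = s[i:i + k]
            counts[m] += alpha[i] * prob.get(m, 0.0) * beta[i + k] / total
    return counts


if __name__ == "__main__":
    toy_prob = {"a": 0.5, "b": 0.5, "ab": 0.25}  # hypothetical toy model
    print(dict(expected_multigram_counts("ab", toy_prob)))
```

In an EM setting, counts of this kind would be accumulated over all training items and renormalised to give the updated multigram probabilities; that outer loop is omitted here.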
... GRAMMARS The grammars which are supported by the parser are a subset of those for Structure Unification Grammar. These grammars are for the most part lexicalized. Each lexicalized grammar ... Categorial Grammar, Lexical Functional Grammar, and Head-driven Phrase Structure Grammar. An SUG grammar is a set of partial descriptions of phrase structure trees. Each SUG grammar ... In Lawrence Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 11, pages 141-154. Morgan Kaufmann Publishers, Los Altos, CA. Shastri, Lokendra and Ajjanagadde, Venkat (1990)....
... Cunchillos, Juan-Pablo Vita, and José-Ángel Zamora. 2002. Ugaritic data bank. CD-ROM. Gregorio del Olmo Lete and Joaquín Sanmartín. 2004. A Dictionary of the Ugaritic Language in the Alphabetic ... morphological segmentation was carried out with the guidance of a standard Ugaritic grammar (Schniedewind and Hunt, 2007). Although Ugaritic is an inflectional rather than agglutinative language, in ... this research has similar goals, it typically builds on information or resources unavailable for ancient texts, such as comparable corpora, a seed lexicon, and cognate information (Fung and McKeown,...
... 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing, 3(2):159–168. Mary Hearne and Andy Way. 2004. Data-oriented parsing ... times faster than Levy and Manning’s parser, and 270 times faster than Bikel’s parser. Another advantage of our parser is that it does not take as much memory as these other parsers do. In fact, ... ’02. Michael John Collins. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania. Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and Antal van...
... a consensus translation technique to bootstrap parallel data using off-the-shelf translation systems for training a hierarchical statistical translation model for general domain instant ... normalization as a translation problem from the SMS language to the English language¹ and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation ... SMS normalization. 2.3 SMS Normalization versus Text Paraphrasing Problem Others may regard SMS normalization as a paraphrasing problem. Broadly speaking, paraphrases capture core aspects...
... needs a word dictionary and takes a long time for searching many character combinations. 4.2 Experiment Results and Analyses We used two separate Eumjeol n-grams as language models for experiments. ... be divided into statistical algorithms and rule-based algorithms. Statistical algorithms generally use character n-grams (Eojeol¹ or Eumjeol² n-grams in Korean) (Kang and Woo, 2001; Kwon, ... single Jaso transition case (나와욧→나와요... (Footnote 3: Jaso is a Korean character. Footnote 4: ‘Transition’ means that the correct character is changed to another character due to some cause, such as a typographical error.)
... the main spine. The automata for both the primary and secondary anchors associated with a lexical item could then be merged, minimized and used for parsing as above. Using automata for parsing ... merging all the final states into one. The parser maintains a dynamic record of which trees are valid for states (in particular final states) in the parse table. This means that we can minimise ... in their use of automata as part of the grammar formalism itself. Here, automata are used purely as a stepping-stone to parser optimisation: we make no linguistic claims about them. Indeed...
... Number 10, Stanford, CA. Carl Pollard. 1984. Generalized Phrase Structure Grammars, Head Grammars and Natural Language. Ph.D. thesis, Stanford University. Michael Reape. 1991. Parsing bounded ... information that a semantic head provides. For example, a head usually provides information about the remaining daughters that the parser must find, and (since the head daughter in a construction ... ConTroll also allows a parsing strategy to be specified within the same formalism as the grammar.³ Our implementation of the head-corner parser adapts van Noord's (1997) parser to...
... Johnson and Paul M. Postal. Arc Pair Grammar. Princeton University Press, 1980. [4] Ronald Kaplan and Joan Bresnan. Lexical-functional grammar, a formal system for grammatical representation. ... suitable for building computational relational grammars. A lexicalized SFG is simply a collection of stratified feature graphs (S-graphs), each of which is anchored to a lexical item, analogous ... in a totally closed label. Additionally, we assume that all labels in well-formed lexicalized graphs (the input graphs to the parsing algorithm) are at least partially closed. This leaves...
... expand out each of the three cases: (13a) homorganic-nasal-cluster → labial-nasal labial-obstruent (13b) homorganic-nasal-cluster → coronal-nasal coronal-obstruent (13c) homorganic-nasal-cluster ... constraint. Nasal-cluster and place-assimilation are defined as: (17a) (setq nasal-cluster-lattice (M. nasal-lattice obstruent-lattice)) (17b) (setq place-assimilation-lattice (M + (M** labial-lattice) ... 1979. 16. Kaplan, R. and Bresnan, J., Lexical-Functional Grammar: A Formal System for Grammatical Representation, in Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press....
... data and tools. In Proceedings of the NEMLAR International Conference on Arabic Language Resources and Tools, pages 110–117. Jan Hajič, Jarmila Panevová, Eva Hajičová, Jarmila Panevová, ... fact that very little effort has been spent on optimizing the training oracle and feature model for the 2-planar parser so far. It is worth mentioning that the 2-planar parser has two advantages ... 2006). For our 2-planar parser, we use the same kernelized SVM classifiers as MaltParser, using the LIBSVM package (Chang and Lin, 2001), with feature models that are similar to MaltParser but...
... important subtask for many natural language processing applications, such as partial parsing, information retrieval and machine translation. A baseNP is a simple noun phrase that does not contain other ... pp. 218-224. COLING-ACL’98. Lance A. Ramshaw and Michael P. Marcus (In Press). Text chunking using transformation-based learning. In Natural Language Processing Using Very Large Corpora. Kluwer. Originally appeared in ... Treebank II, and the definition of baseNP is the same as Ramshaw’s. Table 1 summarizes the average performance on both baseNP tagging and POS tagging; each section of the whole Penn Treebank was...
... LR Parsing of Natural Language (Corpora) with Unification-Based Grammars. Computational Linguistics, 19(1):25-60. K. Church. 1988. A Stochastic Parts Program and Noun Phrase Parser for ... Grammatical Trigrams: A Probabilistic Model of Link Grammar. Proceedings of the 1992 AAAI Fall Symposium on Probabilistic Approaches to Natural Language. D. Magerman. 1995. Statistical Decision-Tree ... relationship between 'sales' and the three tokens of 'of': Example 2 Shaw, based in Dalton, Ga., has annual sales of about $1.18 billion, and has economies of scale and...
... poems as outliers). 4 Selection of lexical and syntactic variables Any text classification task requires an object (here a text) to be parameterised into variables, whether qualitative or quantitative. ... of NAACL HLT, pages 460–467. M. Heilman, K. Collins-Thompson, and M. Eskenazi. 2008. An analysis of statistical models and features for reading difficulty prediction. Association for Computational ... Belgium thomas.francois@uclouvain.be Abstract Reading is known to be an essential task in language learning, but finding the appropriate text for every learner is far from easy. In this context, automatic...