... AUTOMATIC ACQUISITION OF SUBCATEGORIZATION FRAMES FROM UNTAGGED TEXT Michael R. Brent MIT AI Lab 545 Technology Square Cambridge, ... open-class dictionary) and generates a partial list of verbs occurring in the text and the subcategorization frames (SFs) in which they occur. Verbs are detected by a novel technique based on ... corpora. 1 INTRODUCTION This paper describes an implemented program that takes an untagged text corpus and generates a partial list of verbs occurring in it and the subcategorization frames...
... describes a novel system for acquiring adjectival subcategorization frames (SCFs) and associated frequency information from English corpus data. The system incorporates a decision-tree classifier ... subcategorization frames from untagged text. In Meeting of the Association for Computational Linguistics, pages 209–214. E. J. Briscoe and J. Carroll. 1997. Automatic Extraction of Subcategorization from Corpora. ... first systems capable of automatically learning a small number of verbal subcategorization frames (SCFs) from English corpora emerged over a decade ago (Brent, 1991; Manning, 1993). Subsequent...
... values varied from frame to frame but not from verb to verb and were determined by taking into account for each frame its overall frame frequency which was estimated from the COMLEX subcategorization ... corpus idiosyncrasies can affect subcategorization frequencies (cf. Roland and Jurafsky (1998) for an extensive discussion). This suggests that different corpora may give different results ... shallow syntactic processing. Alternating verbs were acquired from the BNC by using Gsearch as a chunk parser. Erroneous frames were discarded by applying linguistic heuristics, statistical...
... (Cucerzan and Yarowsky, 1999) and (Collins and Singer, 1999) present algorithms to obtain NEs from untagged corpora. However, they focus on the classification stage of already segmented entities, and ... feature vector from this example in the following manner: First, we split both words into all possible substrings of up to size two: We build a feature vector by coupling substrings from the two ... Computational Linguistics Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora Alexandre Klementiev Dan Roth Dept. of Computer Science University of Illinois Urbana,...
... Japanese-English language pair, especially if involving the comparable corpora. Re-scoring through the Comparable Corpora Comparable corpora could be considered for the disambiguation of translation ... comparable corpora-based techniques, respectively compared to the hybrid two-stages comparable corpora and linguistics-based pruning. The proposed approach based on bi-directional comparable corpora ... TR2-007. P. Fung. 2000. A Statistical View of Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora. In Jean Veronis, Ed. Parallel Text Processing. G. Grefenstette. 1999....
... ( (from SF0) (to San Francisco))))).) GR (Tell ((me (((about the) public) transportation)) ( (from SF0) ((to San) (Francisco .))))) GB ((Tell (me (about (((the public) transportation) ( (from ... corpus, the inside probabilities of longer spans of c are computed from INSIDE-OUTSIDE REESTIMATION FROM PARTIALLY BRACKETED CORPORA Fernando Pereira 2D-447, AT&T Bell Laboratories PO Box ... inferred from raw text. In addition, the number of iterations needed to reach a good grammar can be reduced; in extreme cases, a good solution is found from parsed text but not from raw text....
... nouns or proper nouns is converted from their positions in the text into a vector. 3. Match pairs of positional difference vectors, giving scores. All vectors from English and Chinese are matched ... dim(V2) 240 A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora Pascale Fung Computer Science Department Columbia University New York, NY ... in the texts. For every word pair from this lexicon, we had obtained a DTW score and a DTW path. If we plot the points on the DTW paths of all word pairs from the lexicon, we get a graph...
... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis ... textual corpora from the World Trade Organisation (WTO), which consist in parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from ... returns chunks of partial analyses. If 132 Creating a Multilingual Collocation Dictionary from Large Text Corpora Luka Nerima, Violeta Seretan, Eric Wehrli Language Technology Laboratory (LATL),...
... translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally ... parallel/comparative corpora. However, the sizes as well as the domain of existing parallel/comparative corpora are limited, while it is very expensive to manually collect parallel/comparative corpora. ... translation knowledge acquisition from parallel/comparative corpora, various kinds of translation knowledge are acquired. Within this framework of translation knowledge acquisition from WWW news sites, this...
... this paper we presented a novel algorithm for rapidly prototyping virtual instructors from human-human corpora without manual annotation. Using our algorithm and the GIVE corpus we have generated ... sum, this paper presents a novel way of automatically prototyping task-oriented virtual agents from corpora who are able to effectively and naturally help a user complete a task in a virtual ... world. References Sudeep Gandhe and David Traum. 2007. Creating spoken dialogue characters from corpora without annotations. In Proceedings of Interspeech, Belgium. Andrew Gargett, Konstantina...
... engineering is desired. Paraphrases can be extracted from non-parallel corpora using contextual similarity (Lin, 1998). They can also be obtained from parallel corpora if such data is available (Barzilay ... Ibrahim et al., 2003). Recently, there are also a number of studies that extract paraphrases from multilingual corpora (Bannard and Callison-Burch, 2005; Zhao et al., 2008). The approach in (Barzilay ... Singapore, 4 August 2009. ©2009 ACL and AFNLP Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora Xiaoyin Wang1,2, David Lo1, Jing Jiang1, Lu Zhang2, Hong Mei2 1School...
... field. Comparable corpora exhibit various degrees of parallelism. Fung and Cheung (2004a) describe corpora ranging from noisy parallel, to comparable, and finally to very non-parallel. Corpora from the ... comparable corpora from the Romanian translations of the European Union’s acquis communautaire which we mined from the Web, and has about 10M English words. We downloaded comparable data from three ... lexicon extraction from comparable corpora. In ACL 2004, pages 527–534. Philipp Koehn and Kevin Knight. 2000. Estimating word translation probabilities from unrelated monolingual corpora using...
... corpora, but - as empirically shown by Rapp - it also holds for non-parallel corpora. It can be expected that this clue will work best with parallel corpora, second-best with comparable corpora, ... translations from non-parallel corpora. Proceedings of the 5th Annual Workshop on Very Large Corpora, Hong Kong, 192-202. Fung, P.; Yee, L. Y. (1998). An IR approach for translating new words from ... word associations based on the co-occurrences of words in large corpora. In: Proceedings of the 1st Workshop on Very Large Corpora: Columbus, Ohio, 84-93. 526 German test word Baby...
... demonstrated that for Australian corporations, the correlation between corporate performance and executive salary was negative, that is, the highest paid executives controlled corporations with the ... distinguish the contribution of the executive from the fortunes of the corporation as a whole. Attempts to compare performance against similar corporations might allow comparative evaluation ... ciency may have been due to corporate leadership, such as through restructuring of corporations. This is plausible but difficult to prove. It cannot be isolated from other potential causes...
... (paragraph-level) structure of documents is examined, possibly using mark-up from text encoding. 133 Creating a Multilingual Collocation Dictionary from Large Text Corpora Luka Nerima, Violeta Seretan, Eric Wehrli Language ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis ... textual corpora from the World Trade Organisation (WTO), which consist in parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from...