Scientific report: "Icelandic Data Driven Part of Speech Tagging" pot
... Proceedings of ACL-08: HLT, Short Papers (Companion Volume), pages 33–36, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Icelandic Data Driven Part of Speech Tagging Mark ... Tagging Mark Dredze Department of Computer and Information Science University of Pennsylvania Philadelphia, PA 19104 mdredze@cis.upenn.edu Joel Wallenberg Department of...
Upload date: 23/03/2014, 17:20
... and Robust Part-of-Speech Tagging Using Dynamic Model Selection Jinho D. Choi Department of Computer Science University of Colorado Boulder choijd@colorado.edu Martha Palmer Department of Linguistics University ... “heterogeneous data as a mixture of data collected from several different sources. and testing data gets larger. Thus, to ensure robustness, a tagger needs to...
Upload date: 19/02/2014, 19:20
... Dept. of Computer Science Daegu, Korea Heidelberg, Germany University of Illinois at Chicago {hjsong,jwson,tgnoh}@sejong.knu.ac.kr sbpark@uic.edu sjlee@knu.ac.kr Abstract All types of part-of-speech ... a Maximum Entropy Part-of-Speech Tagger. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 63–70. Ioannis Tsochantaridis, Thomas H...
Upload date: 07/03/2014, 18:20
Scientific report: "Simultaneous Tokenization and Part-of-Speech Tagging for Arabic without a Morphological Analyzer" doc
... 106; REL PRON 5647; VOC PART 74; DEM 3673; VERB 62; OBJ PRON 2812; JUS PART 56; NEG PART 2649; FOREIGN 46; PSEUDO VERB 1505; DIALECT 41; FUT PART 1099; INTERJ 37; ADV 1058; EMPHATIC PART 19; VERB PART 824; CVSUFF ... is created. NOA 173938; PART 288; PREP 49894; RESTRIC PART 237; PUNC 41398; DET 215; NOUN PROP 29423; RC PART 192; CONJ 28257; FOCUS PART 191; PV 16669; TYPO 188; IV 15361; INTERROG PART...
Upload date: 30/03/2014, 21:20
Scientific report document: "Detecting Errors in Part-of-Speech Annotation" docx
... Detecting Errors in Part-of-Speech Annotation Markus Dickinson W. Detmar Meurers Department of Linguistics Department of Linguistics The Ohio State University The ... patterns, are discussed. The success of the three approaches is illustrated for the Wall Street Journal corpus as part of the Penn Treebank. 1 Introduction Part-of-speech (pos) annotated referen...
Upload date: 22/02/2014, 02:20
Scientific report: "Improving data-driven dependency parsing using large-scale LFG grammars" pptx
... Spreyer Department of Linguistics University of Potsdam {lilja,kuhn,spreyer}@ling.uni-potsdam.de Abstract This paper presents experiments which combine a grammar-driven and a data-driven parser. ... example of a feature model. 2 For the training of baseline parsers we employ feature models which make use of the word form (FORM), part-of-speech (POS) and the dependency re...
Upload date: 17/03/2014, 02:20
Scientific report: "Improving Data Driven Wordclass Tagging by System Combination" pptx
... by taking the first eight utterances of every ten. This part is used to train the individual taggers. The second part, Tune, consists of 10% of the data (every ninth utterance, 114479 tokens) ... merely by the potential of the learning method used. Other limiting factors are the power of the hard- and software used to implement the learning method and the availabil...
Upload date: 17/03/2014, 07:20
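The data split in the "Improving Data Driven Wordclass Tagging by System Combination" snippet (the first eight utterances of every ten for training, every ninth utterance for Tune) can be sketched as a cyclic split over utterance positions. This is a minimal illustration, not the authors' code: the function name `split_by_position` is hypothetical, and sending the tenth utterance of each block to a third held-out part is an assumption, since the snippet is cut off before it describes the remaining data.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_by_position(utterances: Sequence[T]) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical cyclic split of utterances in blocks of ten.

    Positions 1-8 of each block go to Train, position 9 goes to Tune,
    and position 10 goes to a held-out part (ASSUMPTION: the snippet
    does not say what the tenth utterance is used for).
    """
    train: List[T] = []
    tune: List[T] = []
    held_out: List[T] = []
    for i, utt in enumerate(utterances):
        pos = i % 10  # 0-based position inside the current block of ten
        if pos < 8:
            train.append(utt)      # first eight of every ten
        elif pos == 8:
            tune.append(utt)       # every ninth utterance
        else:
            held_out.append(utt)   # assumed held-out tenth utterance
    return train, tune, held_out


if __name__ == "__main__":
    utts = [f"u{i}" for i in range(20)]
    train, tune, held_out = split_by_position(utts)
    print(len(train), tune, held_out)
```

Splitting by position rather than by contiguous chunks spreads every part across the whole corpus, so Train and Tune see the same mix of text types; that is one plausible reason for this interleaved design, not a claim about the authors' motivation.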
Scientific report document: Endovanilloids Putative endogenous ligands of transient receptor potential vanilloid 1 channels docx
... stimulus-induced formation of any of the three classes of the endogenous ligands can lead to the identification of TRPV1-mediated physiological processes. By looking at the biosynthetic routes of the putative endovanilloids, ... transient receptor potential vanilloid type 1 protein. Note added during revision: During the revision process of this article, NADA was reported to exert...
Upload date: 19/02/2014, 12:20
Scientific report: "Company-Oriented Extractive Summarization of Financial News" pot
... information of interest. Our system performs well in terms of the ROUGE score (Lin & Hovy, 2003) compared with a competitive baseline (Section 6). 2 Data The data we work with is a collection of ... identified by means of simple heuristics. The text is tokenized according to Penn TreeBank style and each token lemmatized using Wordnet’s morphological functions. Part of...
Upload date: 08/03/2014, 21:20
Scientific report: "Using Search-Logs to Improve Query Tagging" potx
... run-time. Compared to a state-of-the-art approach, we achieve more than 20% relative error reduction. Additionally, we annotate a corpus of search queries with part-of-speech tags, providing a ... token repeated with different parts-of-speech such as in “tie a tie.” To make a more precise matching we try a sequence of matching rules: First, exact match of the query n-...
Upload date: 16/03/2014, 20:20