named entities from large corpora

Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

Uploaded: 22/02/2014, 02:20
... Cooccurrence Extraction with Fips. Collocations are extracted from syntactically analysed corpora. The analysis is performed by Fips, a large-scale parser based on an adaptation of Chomsky's ... returns chunks of partial analyses. If ... Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli. Language Technology Laboratory (LATL), ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis...
Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

Uploaded: 08/03/2014, 21:20
... (paragraph-level) structure of documents is examined, possibly using mark-up from text encoding. ... Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli. Language ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis ... textual corpora from the World Trade Organisation (WTO), which consist in parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from...
Scientific report: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval"

Uploaded: 20/02/2014, 16:20
... Japanese-English language pair, especially if involving the comparable corpora. Re-scoring through the Comparable Corpora. Comparable corpora could be considered for the disambiguation of translation ... comparable corpora-based techniques, respectively compared to the hybrid two-stages comparable corpora and linguistics-based pruning. The proposed approach based on bi-directional comparable corpora ... TR2-007. P. Fung. 2000. A Statistical View of Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora. In Jean Veronis, Ed. Parallel Text Processing. G. Grefenstette. 1999....
Scientific report: "Finding Parts in Very Large Corpora"

Uploaded: 20/02/2014, 19:20
... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results. ... Lexicography 3 (1990), 235-245. [2] Marti Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proceedings of the Fourteenth International Conference on Computational ... Abstract: We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus our method finds part words with 55% accuracy...
Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

Uploaded: 22/02/2014, 02:20
... translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally ... parallel/comparative corpora. However, the sizes as well as the domain of existing parallel/comparative corpora are limited, while it is very expensive to manually collect parallel/comparative corpora. ... approach of acquiring translation knowledge of domain specific named entities, event expressions, and collocational expressions from the collection of bilingual news articles on WWW news sites...
Scientific report: "Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning"

Uploaded: 07/03/2014, 22:20
... utilize a large amount of unsupervised data to supplement supervised data. Specifically, an approach that involves incorporating ‘clustering-based word representations (CWR)’ induced from unsupervised ... Limited Memory BFGS Method for Large Scale Optimization. Math. Programming, Ser. B, 45(3):503–528. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1994. Building a Large Annotated Corpus ... 2011. © 2011 Association for Computational Linguistics. Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning. Jun Suzuki, Hideki Isozaki, and Masaaki...
Scientific report: "Annotating and Recognising Named Entities in Clinical Notes"

Uploaded: 08/03/2014, 01:20
... 15000 clinical named entities in 11 entity types. This paper reports on the challenges involved in creating the annotation schema, and recognising and annotating clinical named entities. ... step to the extraction of structured information from these clinical notes is to achieve accurate identification of clinical concepts or named entities. An entity may refer to a concrete object ... 3 named entities - CT, pituitary macroadenoma and suprasellar cisterns in the sentence: CT revealed pituitary macroadenoma in suprasellar cisterns. In recent years, the recognition of named...
Scientific report: "Translating Named Entities Using Monolingual and Bilingual Resources"

Uploaded: 08/03/2014, 07:20
... IdentiFinder named entity identifier (Bikel et al., 1999) to identify all named entities in the top retrieved documents for each sub-phrase. All named entities of the type of the named entity ... articles and hence the named entities will most likely be reported in many languages including the target language. Instead of having to come up with translations for the named entities often with ... While the identification of named entities in text has received significant attention (e.g., Mikheev et al. (1999) and Bikel et al. (1999)), translation of named entities has not. This translation...
Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

Uploaded: 08/03/2014, 21:20
... of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are topically aligned ... translation candidates from multilingual comparable corpora. By employing the algorithm we have improved precision scores of the methods relying on per-topic word distributions from a cross-language ... efficiently bridge the gap between languages. That seed lexicon is usually crawled from the Web or obtained from parallel corpora. Recently, Li et al. (2011) have proposed an approach that improves...
Scientific report: "CSNIPER Annotation-by-query for non-canonical constructions in large corpora"

Uploaded: 16/03/2014, 20:20
... annotation tasks that require manual analysis over large corpora. The approach is generalizable to any kind of linguistic phenomena that can be located in corpora on the basis of queries and require manual ... suitable software. Their empirical distribution in corpora is thus largely unknown. A major task in recognizing NCCs is distinguishing them from structurally similar construc- ... Figure 3: KWIC ... investigation requires the analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions....
Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"

Uploaded: 16/03/2014, 20:20
... sentence pairs are extracted from the aligned comparable corpora (section 2.2). The workflow for named entity (NE) and terminology extraction and mapping from comparable corpora extracts data in ... and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... translation; named entity dictionaries. The demonstration showcases two general use case scenarios defined in the toolkit: “parallel data mining from comparable corpora and named entity/terminology...
Scientific report: "Recognizing Named Entities in Tweets"

Uploaded: 17/03/2014, 00:20
... semi-supervised learning. 1 Introduction. Named Entities Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as ... Extracting personal names from email: applying named entity recognition to informal text. In HLT, pages 443–450. David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and ... challenges and misconceptions in named entity recognition. In CoNLL, pages 147–155. Sameer Singh, Dustin Hillard, and Chris Leggetter. 2010. Minimally-supervised extraction of entities from text advertisements....
Scientific report: "Building Emotion Lexicon from Weblog Corpora"

Uploaded: 17/03/2014, 04:20
... 133–136, Prague, June 2007. © 2007 Association for Computational Linguistics. Building Emotion Lexicon from Weblog Corpora. Changhua Yang, Kevin Hsin-Yih Lin, Hsin-Hsi Chen. Department of Computer Science and ... mine the relationships between words and emotions using weblog corpora. A collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level ... Blog from January to July, 2006, spanning a period of 212 days. In total, 336,161 bloggers’ articles were collected. Each blogger posts 16 articles on average. We used the articles from...
Scientific report: "Detecting Semantic Relations between Named Entities in Text Using Contextual Features"

Uploaded: 17/03/2014, 04:20
... solves problems, which result from when a parallel sentence arises from predication ellipsis. However, there are several types of parallel sentence that differ from the one we explained. (For ... are sorted in order of likelihood of being the antecedent. The sorting algorithm has two steps. First, from the beginning of the text until the pronoun appears, noun ... anaphora resolutions here. Applied centering theory to relation detection is as follows. First, from the beginning of the text until the following NE appears, noun phrases are stacked depending...
Scientific report: "Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results"

Uploaded: 17/03/2014, 04:20
... (henceforth TCs) clustered in three categories distinguishing 1stOrderEntities, 2ndOrderEntities and 3rdOrderEntities. Their subclasses, hierarchically ordered by means of a subsumption ... ontology of semantic types. 2 Corpora e Lessici dell'Italiano Parlato e Scritto. ... The IWN Top Ontology (TO) (Roventini et al., 2003), which slightly differs from the EWN TO, consists ... 161–164, Prague, June 2007. © 2007 Association for Computational Linguistics. Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results. Adriana Roventini, Nilda...
