Named entities from large corpora

Scientific report: "Discovering Relations among Named Entities from Large Corpora"

Uploaded: 17/03/2014, 06:20

Scientific report: "A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora"

Uploaded: 24/03/2014, 03:20

Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

... Cooccurrence Extraction with Fips. Collocations are extracted from syntactically analysed corpora. The analysis is performed by Fips, a large-scale parser based on an adaptation of Chomsky's ... returns chunks of partial analyses. If ... Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli, Language Technology Laboratory (LATL), ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis...

Uploaded: 22/02/2014, 02:20

Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"

... (paragraph-level) structure of documents is examined, possibly using mark-up from text encoding. Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli, Language ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis ... textual corpora from the World Trade Organisation (WTO), which consist of parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from...

Uploaded: 08/03/2014, 21:20

Scientific report: "Learning Translations of Named-Entity Phrases from Parallel Corpora"

Uploaded: 17/03/2014, 22:20

Scientific report: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval"

... Japanese-English language pair, especially when comparable corpora are involved. Re-scoring through the Comparable Corpora: comparable corpora could be considered for the disambiguation of translation ... comparable-corpora-based techniques, respectively, compared to the hybrid two-stage comparable corpora and linguistics-based pruning. The proposed approach based on bi-directional comparable corpora ... TR2-007. P. Fung. 2000. A Statistical View of Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora. In Jean Veronis, Ed. Parallel Text Processing. G. Grefenstette. 1999....

Uploaded: 20/02/2014, 16:20

Scientific report: "Finding Parts in Very Large Corpora"

... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results. ... Lexicography 3 (1990), 235–245. [2] Marti Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proceedings of the Fourteenth International Conference on Computational ... Abstract: We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus, our method finds part words with 55% accuracy...

Uploaded: 20/02/2014, 19:20

Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

... translation knowledge acquisition from WWW news sites, this paper studies the effect of cross-language retrieval of relevant texts on bilingual lexicon acquisition from comparable corpora. We experimentally ... parallel/comparative corpora. However, the sizes as well as the domains of existing parallel/comparative corpora are limited, while manually collecting parallel/comparative corpora is very expensive. ... approach of acquiring translation knowledge of domain-specific named entities, event expressions, and collocational expressions from the collection of bilingual news articles on WWW news sites...

Uploaded: 22/02/2014, 02:20

Scientific report: "Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning"

... utilize a large amount of unsupervised data to supplement supervised data. Specifically, an approach that involves incorporating ‘clustering-based word representations (CWR)’ induced from unsupervised ... Limited Memory BFGS Method for Large Scale Optimization. Math. Programming, Ser. B, 45(3):503–528. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1994. Building a Large Annotated Corpus ... 2011. ©2011 Association for Computational Linguistics. Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning. Jun Suzuki, Hideki Isozaki, and Masaaki...

Uploaded: 07/03/2014, 22:20

Scientific report: "Annotating and Recognising Named Entities in Clinical Notes"

... 15,000 clinical named entities in 11 entity types. This paper reports on the challenges involved in creating the annotation schema, and recognising and annotating clinical named entities. ... step to the extraction of structured information from these clinical notes is to achieve accurate identification of clinical concepts or named entities. An entity may refer to a concrete object ... 3 named entities (CT, pituitary macroadenoma and suprasellar cisterns) in the sentence "CT revealed pituitary macroadenoma in suprasellar cisterns." In recent years, the recognition of named...

Uploaded: 08/03/2014, 01:20

Scientific report: "Translating Named Entities Using Monolingual and Bilingual Resources"

... IdentiFinder named entity identifier (Bikel et al., 1999) to identify all named entities in the top retrieved documents for each sub-phrase. All named entities of the type of the named entity ... articles and hence the named entities will most likely be reported in many languages including the target language. Instead of having to come up with translations for the named entities often with ... While the identification of named entities in text has received significant attention (e.g., Mikheev et al. (1999) and Bikel et al. (1999)), translation of named entities has not. This translation...

Uploaded: 08/03/2014, 07:20

Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

... of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are topically aligned ... translation candidates from multilingual comparable corpora. By employing the algorithm we have improved precision scores of the methods relying on per-topic word distributions from a cross-language ... efficiently bridge the gap between languages. That seed lexicon is usually crawled from the Web or obtained from parallel corpora. Recently, Li et al. (2011) have proposed an approach that improves...

Uploaded: 08/03/2014, 21:20

Scientific report: "CSNIPER: Annotation-by-query for non-canonical constructions in large corpora"

... annotation tasks that require manual analysis over large corpora. The approach is generalizable to any kind of linguistic phenomena that can be located in corpora on the basis of queries and require manual ... suitable software. Their empirical distribution in corpora is thus largely unknown. A major task in recognizing NCCs is distinguishing them from structurally similar constructions ... Figure 3: KWIC ... investigation requires the analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions....

Uploaded: 16/03/2014, 20:20

Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"

... sentence pairs are extracted from the aligned comparable corpora (section 2.2). The workflow for named entity (NE) and terminology extraction and mapping from comparable corpora extracts data in ... and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... translation; named entity dictionaries. The demonstration showcases two general use case scenarios defined in the toolkit: “parallel data mining from comparable corpora and named entity/terminology...

Uploaded: 16/03/2014, 20:20

Scientific report: "Recognizing Named Entities in Tweets"

... semi-supervised learning. 1 Introduction. Named Entity Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as ... Extracting personal names from email: applying named entity recognition to informal text. In HLT, pages 443–450. David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and ... challenges and misconceptions in named entity recognition. In CoNLL, pages 147–155. Sameer Singh, Dustin Hillard, and Chris Leggetter. 2010. Minimally-supervised extraction of entities from text advertisements....

Uploaded: 17/03/2014, 00:20

Scientific report: "Building Emotion Lexicon from Weblog Corpora"

... 133–136, Prague, June 2007. ©2007 Association for Computational Linguistics. Building Emotion Lexicon from Weblog Corpora. Changhua Yang, Kevin Hsin-Yih Lin, Hsin-Hsi Chen, Department of Computer Science and ... mine the relationships between words and emotions using weblog corpora. A collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level ... Blog from January to July, 2006, spanning a period of 212 days. In total, 336,161 bloggers' articles were collected. Each blogger posts 16 articles on average. We used the articles from...

Uploaded: 17/03/2014, 04:20

Scientific report: "Detecting Semantic Relations between Named Entities in Text Using Contextual Features"

... solves problems that result when a parallel sentence arises from predication ellipsis. However, there are several types of parallel sentence that differ from the one we explained. (For ... are sorted in order of likelihood of being the antecedent. The sorting algorithm has two steps. First, from the beginning of the text until the pronoun appears, noun ... anaphora resolutions here. Centering theory is applied to relation detection as follows. First, from the beginning of the text until the following NE appears, noun phrases are stacked depending...

Uploaded: 17/03/2014, 04:20

Scientific report: "Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results"

... (henceforth TCs) clustered in three categories distinguishing 1st Order Entities, 2nd Order Entities and 3rd Order Entities. Their subclasses, hierarchically ordered by means of a subsumption ... ontology of semantic types. (Footnote 2: Corpora e Lessici dell'Italiano Parlato e Scritto.) The IWN Top Ontology (TO) (Roventini et al., 2003), which slightly differs from the EWN TO, consists ... 161–164, Prague, June 2007. ©2007 Association for Computational Linguistics. Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results. Adriana Roventini, Nilda...

Uploaded: 17/03/2014, 04:20

Scientific report: "Constructing Transliteration Lexicons from Web Corpora"

Uploaded: 17/03/2014, 06:20

Scientific report: "Constructing Semantic Space Models from Parsed Corpora"

Uploaded: 17/03/2014, 06:20
