scalable mining of named entity transliterations

Scientific report document: "Robust Extraction of Named Entity Including Unfamiliar Word"

... Extraction of Japanese Named Entity 2.1 Task of the IREX Workshop The task of NE extraction of the IREX workshop (Sekine and Eriguchi, 2000) is to recognize eight NE types in Table 1. The organizer of ... experiments of extracting Japanese named entities from IREX corpus and NHK corpus show the effectiveness of the proposed method. 1 Introduction It is widely agreed that extraction of named entity (henceforth, ... Chunking of Named Entities It is quite common that the task of extracting Japanese NEs from a sentence is formalized as a chunking problem against a sequence of mor- 1 The organizer of the IREX...

Scientific report: "Joint Inference of Named Entity Recognition and Normalization for Tweets"

... which named entities occur frequently with rich variations. We study the problem of named entity normalization (NEN) for tweets. Two main challenges are the errors propagated from named entity ... nature of tweets, there are rich variations of named entities in them. According to our investigation on the data set provided by Liu et al. (2011), every named entity in tweets has an average of ... an overview of our method, then detail its model and features. 4.1 Overview Given a set of tweets as input, our method recognizes predefined types of named entities and for each entity outputs...

Scientific report: "Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web"

... Part-Of-Speech Tagging of Korean. Computational Linguistics, 28(1):53–70. Manabu Sassano and Takehito Utsuro. 2000. Named Entity Chunking Techniques in Supervised Learning for Japanese Named Entity ... Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web Joohui An Dept. of CSE POSTECH Pohang, Korea 790-784 Seungwoo Lee Dept. of CSE POSTECH Pohang, Korea ... Journal of Korean Information Science Society, 24(8):900–909. GuoDong Zhou and Jian Su. 2002. Named Entity Recognition using an HMM-based Chunk Tagger. In Proceedings of the 40th Annual Meeting of...

Scientific report document: "Mining Wiki Resources for Multilingual Named Entity Recognition"

... knowledge for named entity disambiguation. In Proceedings of EACL, 9-16. Cucerzan, S. 2007. Large-scale named entity disambiguation based on Wikipedia data. In Proceedings of EMNLP/CoNLL, ... trained on up to 40,000 words of human-annotated newswire. 1 Introduction Named Entity Recognition (NER) has long been a major task of natural language processing. Most of the research in the field ... Computational Linguistics Mining Wiki Resources for Multilingual Named Entity Recognition Alexander E. Richman Patrick Schone Department of Defense Department of Defense Washington, DC...

Scientific report document: "Inducing Gazetteers for Named Entity Recognition by Large-scale Clustering of Dependency Relations"

... English. 408 Proceedings of ACL-08: HLT, pages 407–415, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Inducing Gazetteers for Named Entity Recognition by Large-scale Clustering of ... clustering of dependency relations between verbs and multi-word nouns (MNs) to construct a gazetteer for named entity recognition (NER). Since dependency relations capture the semantics of MNs well, ... for storing only a part of classes C_l, i.e., 1/|P| of the parameter matrix, where P is the number of cluster nodes. This data splitting enables linear scalability of memory sizes. However,...

Scientific report document: "Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition"

... label of a named entity is “O”, which indicates a non-named entity. For 98.0% of the named entities in the training data of the shared task in the 2004 JNLPBA, the label of the preceding entity ... End- Word” capture the tendency of the length of a named entity. “Count feature” captures the tendency for named entities to appear repeatedly in the same sentence. “Preceding Entity and Prev Word” are ... N is the length of sentence and K is the size of label set. And that of training in first order semi-CRFs is O(K^2 LN). The increase of the cost is used to transfer non-adjacent entity information. To...

Scientific report: "Incorporating speech recognition confidence into discriminative named entity recognition of speech data"

... distinguish words of a class from words of other classes. For NER, we used an SVM-based chunk annotator YamCha 2 0.33 with a quadratic kernel (1 + x · y)^2 and a soft margin parameter of SVMs C=0.1 ... tokenized using ChaSen. The vocabulary size of the word 3-gram model was 426,023. The test-set perplexity over the text corpus was 76.928. The number of out-of-vocabulary words was 1,551 (0.587%). ... Comparisons of NE surfaces did not include differences in word segmentation because of the segmentation ambiguity in Japanese. Note that NER recall with ASR results could not exceed the rate of the...

Scientific report document: "Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora"

... language. Identification of the entity's equivalence class of transliterations is important for obtaining its accurate time sequence. In order to keep to our objective of requiring as little language ... was initialized with a set of 20 pairs of English NEs and their Russian transliterations. Negative examples here and during the rest of the training were pairs of randomly selected non-NE ... random 727 of the total of 978 NEs were matched to correct transliterations by a language expert (partly due to the fact that some of the English NEs were not mentioned in the Russian side of the...

Scientific report document: "Syntax-based Semi-Supervised Named Entity Tagging Behrang Mohit"

... subject or the object of a sentence has a high probability of being a particular type of named entity. Thus, we expanded our syntactic analysis of the data into dependency parse of the text and ... section 5 covers the results of the evaluation of our system. Figure 1: System's architecture 3 Named Entity Recognition In this level, the system used a group of syntax-based rules to ... trained classifier generalizes well. 1 Introduction Named entity (NE) tagging is the task of recognizing and classifying phrases into one of many semantic classes such as persons, organizations...

Scientific report document: "A Bootstrapping Approach to Named Entity Classification Using Successive Learners"

... 1998. Description of the MENE Named Entity System. Proceedings of MUC-7. Collins, M. and Y. Singer. 1999. Unsupervised Models for Named Entity Classification. Proceedings of the 1999 Joint ... performance of the HMM on the PRO tag. Table 4. Performance of PRODUCT NE TYPE PRECISION RECALL F-MEASURE PRODUCT 67.3% 72.5% 69.8% Similar to the case of ORG NEs, the number of concept-based ... Discovery Engine Supported by New Levels of Information Extraction. Proceedings of HLT-NAACL 2003 Workshop on Software Engineering and Architecture of Language Technology Systems, Edmonton,...

Scientific report document: "The Multilingual Named Entity Recognition Framework"

... entropy approach for named entity recognition. PhD Thesis, New York University. Collins M. and Singer Y. (1999) Unsupervised models for named entity classification. In Proceedings of EMNLP/WVLC, 1999, ... language technology is not much developed for most of them. This has a big consequence for named entity recognition: for certain languages like most of the European languages, we benefit from already ... is simply, most of the time, not realistic to tag large amount of corpus (Appelt and Israel, 1999). Moreover, tagging great amounts of data can be compared to the elaboration of dictionaries 2 . • Grammar....

Scientific report: "Japanese Named Entity Recognition based on a Simple Rule Generator and Decision Tree Learning"

... Unsupervised models for named entity classification. In Proceedings of EMNLP/VLC. Jim Cowie. 1995. CRL/NMSU description of the CRL/NMSU system used for MUC-6. In Proceedings of the Sixth Message ... memt, January. Manabu Sassano and Takehito Utsuro. 2000. Named entity chunking techniques in supervised learning for Japanese named entity recognition. In Proceedings of the International Conference on Computational ... 705–711. Satoshi Sekine and Yoshio Eriguchi. 2000. Japanese named entity extraction evaluation — analysis of results —. In Proceedings of 18th International Conference on Computational Linguistics,...

Scientific report: "The Multilingual Named Entity Recognition Framework"

... resources and tools for named entity recognition. A team of computational linguist students develops this The members of the INaLCO Named Entity Group are: A. Acoulon, C. Avaux, L. Beroff-Beneat-, A. ... different approaches to named entity recognition. We then examine previous experiments to compare systems and techniques. Sekine and Eriguchi (2000) present an interesting classification of named entity recognition ... entropy approach for named entity recognition. PhD Thesis, New York University. Collins M. and Singer Y. (1999) Unsupervised models for named entity classification. In Proceedings of EMNLP/WVLC, 1999,...

Scientific report: "A Framework for Unifying Named Entity Recognition and Disambiguation Extraction Tools"

... extract the list of Named Entity, their classification and the URIs that disambiguate these entities. The main purpose of this interface is to enable a human user to assess the quality of the extraction ... the comparison of the performance of these services as well as their possible combination. We address this problem by proposing NERD, a framework which unifies 10 popular named entity extractors available ... 09/10/2011 to 12/10/2011. the number n_d of evaluated documents, the number n_w of words, the total number n_e of entities, the total number n_c of categories and n_u URIs. Moreover, we compute...

Scientific report: "Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation"

... to find whether a N-gram is a named entity, we match it to the named entity list extracted using the OpenCalais API, which contains more than 30 types of named entities, such as Person, ... Totally, the traditional named entity disambiguation methods can be classified into two categories: the shallow methods and the knowledge-based methods. Most of previous named entity disambiguation ... knowledge captured in the structural semantic relatedness measure for named entity disambiguation. Because the key problem of named entity disambiguation is to measure the similarity between name...

