... the antecedents of (zero) pronouns. When there is a (zero) pronoun in the text, noun phrases that are in the previous context of the pronoun are sorted in order of likelihood of being the antecedent. The ... property-sharing constraint in centering. Annual Meeting of Association of Computational Linguistics, pages 200–206. T. Kudo and Y. Matsumoto. 2004. A boosting algorithm for classification of semi-structured ... relations out of 236,142 pairs in the annotated text. We conducted ten-fold cross-validation over 236,142 pairs of NEs so that sets of pairs from a single text were not divided into the training and...
Uploaded: 17/03/2014, 04:20
... Computational Linguistics. Kevin Gimpel and Noah A. Smith. 2010a. Softmax-margin CRFs: Training log-linear models with loss functions. In Proceedings of the Human Language Technologies Conference of the ... with self-training on unlabeled target-domain data; enforcing the same recall-oriented bias in the self-training stage yields marginal gains. 1 Introduction This paper considers named entity ... challenging problem of identifying named entities in Arabic Wikipedia text. 2 Arabic Wikipedia NE Annotation Most of the effort in NER has been focused around a small set of domains and general-purpose...
Uploaded: 24/03/2014, 03:20
Document, Scientific report: "Robust Extraction of Named Entity Including Unfamiliar Word" doc
... compare performances of proposed methods and baseline methods. 3 Robust Extraction of Named Entities Including Unfamiliar Words The proposed method of extracting NEs consists of two steps. Its first ... Example of Training Instance for Proposed Method −→ Parsing Direction −→ Feature set: F_{i−2}, F_{i−1}, F_i, F_{i+1}, F_{i+2}; Chunk label: c_{i−2}, c_{i−1}, c_i. Figure 1 shows an example of training instance of the ... know. 1 2.2 Chunking of Named Entities It is quite common that the task of extracting Japanese NEs from a sentence is formalized as a chunking problem against a sequence of mor- 1 The organizer of the...
Uploaded: 20/02/2014, 09:20
Scientific report: "Annotating and Recognising Named Entities in Clinical Notes" pot
... baseline system was built using only bag-of-word features from the training corpus. A context-window size of 2 and tag prediction of previous token were used in all experiments. Without using ... acronyms in the notes. This also suggests that this kind of clinical notes are very noisy, and require a considerable amount of effort in pre-processing. Allowing partial matching increased ... overall increase of 2.47 F-score. Partial matching discovered a larger number of matching candidates using a looser matching criteria, therefore decreased in precision with compensation of an increase...
Uploaded: 08/03/2014, 01:20
Scientific report: "Recognizing Named Entities in Tweets" docx
... Linguistics Recognizing Named Entities in Tweets Xiaohua Liu ‡†, Shaodian Zhang ∗§, Furu Wei †, Ming Zhou † ‡ School of Computer Science and Technology Harbin Institute of Technology, Harbin, 150001, China § Department ... mingzhou}@microsoft.com § zhangsd.sjtu@gmail.com Abstract The challenges of Named Entities Recognition (NER) for tweets lie in the insufficient information in a tweet and the unavailability of training data. We propose to combine a K-Nearest Neighbors ... 2010. Annotating named entities in twitter data with crowdsourcing. In CSLDAMT, pages 80–88. Jenny Rose Finkel and Christopher D. Manning. 2009. Nested named entity recognition. In EMNLP, pages 141–150. Jenny...
Uploaded: 17/03/2014, 00:20
encyclopedic dictionary of named processes in chemical technology
Uploaded: 01/04/2014, 10:14
kurti - strategic applications of named reactions in organic synthesis (elsevier, 2005)
Uploaded: 03/04/2014, 12:15
TimeML: Robust Specification of Event and Temporal Expressions in Text doc
... holding during the duration of the other: 10. One being the beginning of the other: John has lived in Boston since 1998. 11. One being begun by the other: (cf. 10) 12. One being the ending of the ... it contains incomplete information regarding the domain over which the expression is to be interpreted. We introduce the attribute CARDINALITY in the MAKEINSTANCE tag to allow for this interpretation. ... relType="IS_INCLUDED"/> 3.2 SLINK SLINK or Subordination Link is used for contexts introducing relations between two events, or an event and a signal, of the following sort: 1. Modal: Relation introduced mostly...
Uploaded: 30/03/2014, 16:20
Contrastive analysis of cohesive devices in English texts and those in Vietnamese ones
Uploaded: 12/12/2013, 00:03
Document, Scientific report: "The Effect of Corpus Size in Combining Supervised and Unsupervised Training for Disambiguation" pdf
... structures giving rise to the same set of dependencies (a piece of a tile of a roof of a house vs. a piece of a roof of a tile of a house) cannot be distinguished. We believe that an inverted index ... performance of 87.6% for a training set of about 85% of WSJ. That number is not that far from the 82.8% achieved by Collins’ parser in our experiments when trained on 50% of WSJ. Some of the supervised ... co-training for statistical parsers. In Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, ICML. Mark Johnson and Stefan Riezler. 2000. Exploiting...
Uploaded: 20/02/2014, 12:20
Document, Scientific report: "SenseLearner: Word Sense Disambiguation for All Words in Unrestricted Text" doc
... reported during the recent SENSEVAL evaluations. 1 Introduction The task of word sense disambiguation consists of assigning the most appropriate meaning to a polysemous word within a given context. ... advantage of providing larger coverage. In this paper, we present a method for solving the semantic ambiguity of all content words in a text. The algorithm can be thought of as a minimally supervised word ... back-off method using the most frequent sense in WordNet when no training examples were found in SEMCOR. This resulted in significantly higher complexity, with a very large number of models...
Uploaded: 20/02/2014, 15:20
Scientific report: "A Method for Word Sense Disambiguation of Unrestricted Text" potx
... be able to distinguish later the correct sense association from such a small pool. 3 Contextual ranking of word senses Since the Internet contains the largest collection of texts electronically ... results in a value indicating the frequency of occurrences for W1 and the sense of W2. In our experiments we used (Altavista, 1996) since it is one of the most powerful search engines currently ... SemCor was done of course within a larger context, the context of sentence and discourse. By working only with a pair of words we do not take advantage of such a broader context. For example,...
Uploaded: 08/03/2014, 06:20
Scientific report: "Semantic Retrieval for the Accurate Identification of Relational Concepts in Massive Textbases" ppt
... in- Query No. / User input: 1. something inhibit ERK2; 2. something trigger diabetes; 3. adiponectin increase something; 4. TNF activate IL6; 5. dystrophin cause disease; 6. macrophage induce something; 7 ... for semantic annotations 3.2 On-line processing The off-line processing described above results in much simpler on-line processing. User input is converted into queries of the extended region algebra, ... is significantly improved. 1 Introduction Rapid expansion of text information has motivated the development of efficient methods of accessing information in huge texts. Furthermore, user demand...
Uploaded: 17/03/2014, 04:20
Scientific report: "Good Spelling of Vietnamese Texts, one aspect of computational linguistics in Vietnam" ppt
... Designing Vietools The error detecting program reads one syllable at a time from the text. The syllable is divided into an initial consonant and a rhyme pattern, paying attention to solving initial ... the correction of a syllable based on the column position of initial consonants and the row position of rhyme patterns, for example, the syllable lamf (work) in the TELEX form, is composed of the initial ... there are 27 initial consonants, and 1160 rhyme patterns (including 6 tones). Based on Vietnamese syllable structure, the spelling database is built in a tabular form. Each element of the table...
Uploaded: 17/03/2014, 07:20
Scientific report: "Centrality Measures in Text Mining: Prediction of Noun " docx
... to represent texts and hence analyze mutual relevance of two texts. The values of the elements in a vector are determined by the betweenness centrality of the NPs in a text being analyzed. ... to represent the prominence of a NP in the text, not only does the kind of the centrality matter, but also the way of forming the NP network. Overall, the heuristic of using centrality itself ... NJ. M. Kubat and S. Matwin. 1997. Addressing the curse of imbalanced data sets: one-sided sampling. In Proc. of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann,...
Uploaded: 23/03/2014, 19:20
Scientific report: "Recognizing Expressions of Commonsense Psychology in English Text" potx
... combined into a single finite state machine (one for each conceptual area). By examining the number of states and transitions in the compiled finite state graphs, some indication of their relative ... high levels of accuracy in identifying these concepts in natural language text. The remainder of this paper describes our efforts in authoring and evaluating such a resource. 3 Authoring recognition ... 6-11. Silberztein, M. (1999) Text Indexing with INTEX. Computers and the Humanities 33(3). Traum, D. (1993) Mental state in the TRAINS-92 dialogue manager. In Working Notes of the AAAI Spring Symposium...
Uploaded: 23/03/2014, 19:20
Scientific report: "Tree Representations in Probabilistic Models for Extended Named Entities Detection" ppt
... level of named entities. Using two different sets of morpho-syntactic features results in more effective models, as they create a kind of agreement for a given word in case of match. Concerning the ... corresponding named entity is shown in figure 4. As decided in the annotation guidelines, fillers can be part of a named entity. This can happen for complex named entities involving several ... entities. Fillers of named entities should be, in principle, distinguished from any other filler, since they may be informative to discriminate entities. Following this intuition, we designed...
Uploaded: 24/03/2014, 03:20
Scientific report: "Automatic Processing of Proper Names in Texts" pptx
... words). In addition, the number of words used in constructing proper names is potentially infinite. The first step of the processing is segmentation, i.e. accurate cutting-up of proper names in ... France [4] JACOBS P., RAU L. 1993 Innovations in text interpretation, Artificial Intelligence 63 [5] HAYES PH. 1994 NameFinder : Software that find names in Text, RIAO '94 New York [6] ... company) is often used later in the text instead of the full name. The company Kyocera Corp, for example, may be designated by the single word Kyocera in the remainder of the text. Consequently,...
Uploaded: 24/03/2014, 05:21
Scientific report: "Time Period Identification of Events in Text" pptx
Uploaded: 31/03/2014, 01:20