
Scientific report document: "Mining Wikipedia Revision Histories for Improving Sentence Compression" docx

... 137–140, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics. Mining Wikipedia Revision Histories for Improving Sentence Compression. Elif Yamangil, Rani Nelken, School of ... importance. 2 Data: Wikipedia revision histories as a source of sentence compressions. Many researchers are increasingly turning to Wikipedia as a large-scale data source...

Uploaded: 20/02/2014, 09:20


Scientific report document: "Generating Impact-Based Summaries for Scientific Literature" docx

... papers. Each citation context contains 5 sentences, with 2 sentences before and after the citing sentence. Since a low-impact paper would not be useful for evaluating impact summarization, we took ... citation context and then for each extracted sentence find a similar one in the original paper. Unfortunately, we did not have time to test this approach before the deadline for the...

Uploaded: 20/02/2014, 09:20


Scientific report document: "A Finite-State Model of Human Sentence Processing" docx

... design. In this study, a total of 76 sentences were tested: 10 for lexical category ambiguity, 12 for RR ambiguity, 20 for PP ambiguity, 16 for DO/SC ambiguity, and 18 for clausal boundary ambiguity. This ... 1981). For this test, materials from Traxler et al. (2002) (96 sentences) are used. 4 Results. 4.1 The Probability Decrease per Word. Unambiguous sentences are usually l...

Uploaded: 20/02/2014, 11:21


Scientific report document: "A DP based Search Algorithm for Statistical Machine Translation" docx

... additional parameter into the recursion formula for DP. In the following, we will explain this method in detail. 2.3 Recursion Formula for DP. In the DP formalism, the search process is described ... A. For our first experiments, we set A to 0.5. The test corpus consisted of 150 sentences, for which sample translations exist. The labels were translated separately: First, the...

Uploaded: 20/02/2014, 18:20


Scientific report document: "AN INTEGRATED HEURISTIC SCHEME FOR PARTIAL PARSE EVALUATION" docx

... original input sentence. The features are designed to be general and, for the most part, grammar and domain independent. For each parse, the heuristic computes a penalty score for each of the ... Parser [Tomita, 1986], that can parse almost any input sentence by ignoring unrecognizable parts of the sentence. On a given input sentence, the parser returns a collection of pa...

Uploaded: 20/02/2014, 21:20


Scientific report document: "Mining the Web for Language Learning" pdf

... title/non-title classifiers, are applied to each term/sentence pair. The readability evaluator assigns a score to each term/sentence pair according to Formula 1: 206.835 − 1.015 × (#words / #sentences) − 84.6 × (#syllables / #words) (1). Two ... therefore cannot cover fresh words or new usages of existing words. Secondly, their search functions are often limited, making it ha... (Footnote 1: http://www.engkoo.com.)

Uploaded: 20/02/2014, 05:20


Scientific report document: "Mining Wiki Resources for Multilingual Named Entity Recognition" pdf

... is available for download (download.wikimedia.org) in a text format suitable for inclusion in a database. For the remainder of this paper, we refer to this format. Within Wikipedia, we ... language article, if available, for additional information. • A second pass checks for multi-word phrases that exist as titles of Wikipedia articles. • We look for certain types...

Uploaded: 20/02/2014, 09:20


Scientific report document: "Mining User Reviews: from Specification to Summarization" (Xinfan Meng, Key Laboratory of Computational Linguistics) doc

... structure information and unit of measurement information are mined from the specification to improve the accuracy of feature extraction. At summary generation stage, hierarchy information in ... to users. For example, for feature “size”, descriptions like “small” and “thin” are more readable than “positive”. Usually, the words used to describe a product feature are short. For each p...

Uploaded: 20/02/2014, 09:20


Scientific report document: "Mining metalinguistic activity in corpora to create lexical resources using Information Extraction techniques: the MOP system" doc

... accounting for a (basically) correct selection of superficial sentence segments. 9 For sentence (8) the system would retrieve a previous sentence: (“A few have positive enthalpies of formation”). ... these sentences will have at least two indicators present, for example a verb and a descriptor, or quotation marks, or even have preceding sentences that announce...

Uploaded: 20/02/2014, 15:20


Scientific report document: "Analysing Wikipedia and Gold-Standard Corpora for NER Training" ppt

... our discretion: sections 03–21 for TRAIN, 00–02 for DEV and 22–24 for TEST. Corpus sizes are compared in Table 1. 2.2 Evaluating NER performance. One challenge for NER research is establishing ... use stylistic forms which are rare in Wikipedia. For instance, the Wall Street Journal (BBN) uses US state abbreviations, while Wikipedia nearly always refers to states in full. We b...

Uploaded: 22/02/2014, 02:20
