Scientific report document: "Automatic Construction of Polarity-tagged Corpus from HTML Documents" docx
... of reviews are not available. In addition, the corpus created from reviews is often noisy, as we discuss in Section 2. This paper proposes a novel method of building a polarity-tagged corpus from HTML documents. The characteristic of this method is that it is fully automatic and can be applied to arb...
Uploaded: 20/02/2014, 12:20
... application of the method is automatic or semi-automatic compilation of a glossary or technical-term dictionary for a certain domain. Recursive application of the method makes it possible to collect a list of ... consists of three steps: compiling corpus, automatic term recognition (ATR), and filtering. This system is implemented for the Japanese language. 2.1 Compiling corpus The first st...
Uploaded: 20/02/2014, 16:20
... on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False positive ... is evaluated in terms of efficiency and accuracy. The most useful estimate of efficiency is simply the density of observations in the corpus, shown in the first column of Tabl...
Uploaded: 20/02/2014, 21:20
Scientific report document: "Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes" pdf
... conditions. The importance of the task of negation and speculation (a.k.a. hedge) detection is attested by a number of research initiatives. The creation of the BioScope corpus (Vincze et al., ... Statistics of the BioScope corpus. The 2nd and 3rd columns show the total number of cues within the datasets; the 4th and 5th columns show the percentage of negated and specu...
Uploaded: 20/02/2014, 04:20
Scientific report document: "Automatic learning of textual entailments with cross-pair similarities" ppt
... examples of the previous section. From the point of view of bag-of-word methods, the pairs (T1, H1) and (T1, H2) both have the same intra-pair similarity since the sentences of T1 and ... rules that describe a nontrivial set of entailment cases. The experiments with the data sets of the RTE 2005 challenge show an improvement of 4.4% over the state-of-the-art...
Uploaded: 20/02/2014, 12:20
Scientific report document: "Automatic Identification of Pro and Con Reasons in Online Reviews" ppt
... specific and tangible features. Also, there is a somewhat fixed set of features for a specific type of product, for example, ease of use, durability, battery life, photo quality, and shutter lag ... examples of sentences that our system identified as reasons for complaints. (1) Unfortunately, I find that I am no longer comfortable in your establishment because of the...
Uploaded: 20/02/2014, 12:20
Scientific report document: "Automatic Evaluation of Sentence-Level Fluency" (Andrew Mutton) pdf
... the discriminability of the data before giving them to human judges. Our approach to generating ‘sentences’ of a fixed length is to take word sequences of different lengths from a corpus and glue ... training data the 1000 instances of sentences of sequence length 24 (i.e. sentences extracted from the corpus) and as negative training data the 1000 sentences of sequence l...
Uploaded: 20/02/2014, 12:20
Scientific report document: "Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics" doc
... during construction of N-best translation lexicons from parallel text. Melamed (1995) used the ratio (LCSR) between the length of the LCS of two words and the length of the longer word of the ... WLCS score ending at word x_i of X and y_j of Y, w is the table storing the length of consecutive matches ended at c table position i and j, and f is a function of cons...
Uploaded: 20/02/2014, 16:20
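The LCSR mentioned in the snippet above (the length of the LCS of two words divided by the length of the longer word) can be sketched as follows. This is a minimal illustration and not code from the paper: the function names `lcs_length` and `lcsr` are hypothetical, and returning 0.0 for two empty words is an assumed convention.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] holds the LCS length of the prefix of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence that ended before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result from dropping one item.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(word_a, word_b):
    """LCS length divided by the length of the longer word."""
    longer = max(len(word_a), len(word_b))
    if longer == 0:
        return 0.0  # assumed convention for two empty words
    return lcs_length(word_a, word_b) / longer
```

For example, `lcsr("government", "gouvernement")` divides an LCS of length 10 by the longer word's length, 12.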
Scientific report document: "Automatic clustering of collocation for detecting practical sense boundary" ppt
... means a set of vocabulary, N is the integer size of the contextual window, and C means a set of corpus. In this paper, vocabulary refers to all content words in the corpus. Function ... vocabularies are selected from a given corpus and 2P C/VP is all sets of C/V. In equation (1), the frequency of x is m in c. We can also express m=|c/x|. The window size...
Uploaded: 20/02/2014, 16:20
Scientific report document: "Automatic Detection of Nonreferential It in Spoken Multi-Party Dialog" doc
... a minority of all instances of it. Evans (2001) reports that his corpus of approx. 370,000 words from the SUSANNE corpus and the BNC contains 3,171 examples of it, approx. 29% of which are ... words) contains 425 instances of it, 16.5% of which are nonreferential. Boyd et al. (2005) use a 350,000-word corpus from a variety of genres. They count 2,337 instances of...
Uploaded: 22/02/2014, 02:20