Document - Scientific report: "Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure" pdf
... to the problem of segmenting parallel parts of documents. The task of aligning each sentence of an abstract to one or more sentences of the body has been studied in the context of summarization ... that it neglects the discourse structure and the lexical cohesion phenomenon. 3 Model In this section we describe our model for discourse segmentation of documents with...
Uploaded: 20/02/2014, 04:20
... cases manual annotation of objects with numerical properties is possible, it is a hard and labor intensive task, and is impractical for dealing with the vast amount of objects of interest. Hence, there ... of the methods are suitable for retrieval of numerical attributes. However, most of them do not exploit the numerical nature of the attribute data. Our research is relat...
Uploaded: 20/02/2014, 04:20
... location of the exclaiming avatar to determine if the exclamation was a result of their location (in the zone with the dead body) or because of something said or done by another player. Location of ... events involving multiple avatars over a span of time and space. While the design of the RAT tool will support annotation of any event of interest with only slight m...
Uploaded: 20/02/2014, 04:20
Document - Scientific report: "Co-training for Predicting Emotions with Spoken Dialogue Data" pdf
... with spoken dialogue data. Although a large set of dialogues have been collected, only 8% of them have been annotated (10 dialogues with a total of 350 utterances), due to the laborious annotation ... data consists of the student turns in a set of 10 spoken dialogues randomly selected from a corpus of 128 qualitative physics tutoring dialogues between a human tutor and...
Uploaded: 20/02/2014, 16:20
Document - Scientific report: "Predicting the fluency of text with shallow structural features: case studies of machine translation and human-written text" doc
... distinguished from machine translations with high fluency with accuracy of 61%. In pairwise comparison of sentences with different fluency, accuracy of predicting which of the two is better is 90%. Results ... number of words comprising a given type of phrase, divided by the number of phrases of this type. It was computed for PP, NP, VP, ADJP, ADVP. Two versions of t...
Uploaded: 22/02/2014, 02:20
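The phrase-length feature mentioned in the snippet above (words in phrases of a given type, divided by the number of phrases of that type) can be illustrated with a minimal sketch. The input format, a list of (label, words) pairs already extracted from a parse, and the function name `average_phrase_length` are assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Phrase types named in the snippet.
PHRASE_TYPES = ("PP", "NP", "VP", "ADJP", "ADVP")


def average_phrase_length(phrases):
    """Return the mean number of words per phrase for each phrase type.

    `phrases` is an iterable of (label, words) pairs, where `words` is the
    list of tokens covered by the phrase. This input format is hypothetical.
    """
    word_totals = defaultdict(int)
    phrase_counts = defaultdict(int)
    for label, words in phrases:
        if label in PHRASE_TYPES:  # ignore labels outside the five types
            word_totals[label] += len(words)
            phrase_counts[label] += 1
    # Only types that actually occur are reported, avoiding division by zero.
    return {
        label: word_totals[label] / phrase_counts[label]
        for label in phrase_counts
    }


example = [
    ("NP", ["the", "cat"]),
    ("NP", ["a", "very", "old", "dog"]),
    ("PP", ["on", "the", "mat"]),
    ("S", ["ignored"]),
]
print(average_phrase_length(example))
```

Types that do not occur in the input are left out of the result rather than being assigned a default value; how the paper handles that case is not stated in the snippet.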
Document - Scientific report: "Unsupervised Topic Modelling for Multi-Party Spoken Discourse" ppt
... interested. Of course, this requires both identification of the topics discussed, and segmentation into the periods of topically related discussion. Work on automatic topic segmentation of text and ... two levels, with each segment being produced from a linear combination of the distributions associated with each topic. Consequently, our model can often capture the cont...
Uploaded: 20/02/2014, 11:21
Document - Scientific report: "Unsupervised Semantic Role Induction with Global Role Ordering" doc
... relation of an argument to its head in the dependency parse tree, (ii) head: head word of the argument, and (iii) pos-head: Part-of-Speech tag of head. Algorithm 1 describes the generative story of ... most of the intervals tend to have only a few types of SRs and a given SR tends to occur only in a few types of intervals. The concept of intervals is also related to the...
Uploaded: 19/02/2014, 19:20
Document - Scientific report: "Unsupervised Search for The Optimal Segmentation for Statistical Machine Translation" doc
... unaddressed problem of unsupervised determination of the optimal morphological segmentation for statistical machine translation (SMT) and propose a segmentation metric that takes into account both sides of the ... one but both sides of the parallel corpus. A possible choice is the post-segmentation alignment accuracy. However, Elming et al. (2009) show that optimizing segm...
Uploaded: 20/02/2014, 04:20
Document - Scientific report: "Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora" ppt
... example, Bill Gates tends to appear together with Microsoft. The co-occurrence may imply a relationship (e.g., Bill Gates is the founder of Microsoft). By inspection of the Chinese text, we found that ... 50M). Measure / Value: number of English entities, 5M; number of Chinese entities, 4.7M; number of full-abbreviation relations, 51K; number of translation entries added, 210K; total num...
Uploaded: 20/02/2014, 09:20
Document - Scientific report: "Analyzing the Errors of Unsupervised Learning" docx
... a sequence of words and the output y is the corresponding sequence of part-of-speech tags. In the PCFG, the input x is a sequence of POS tags and the output y is a binary parse tree with yield x. ... of EM contain valuable information about the incorrect biases of these models. However, EM is changing hundreds of thousands of parameters at once in a non-trivial way, so w...
Uploaded: 20/02/2014, 09:20