Scientific report: "A Unified Tagging Approach to Text Normalization" pptx
... and tagging. In preprocessing, (A) we separate the text into paragraphs (i.e., sequences of tokens), (B) we determine tokens in the paragraphs, and (C) we assign possible tags to each token. ... separated into different tokens if they are joined together. Natural spaces and line breaks are also regarded as tokens. (C). We assign tags to each token based on the type of the token....
Uploaded: 31/03/2014, 01:20
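The preprocessing steps (A) to (C) quoted in the snippet for "A Unified Tagging Approach to Text Normalization" can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' implementation: the blank-line paragraph boundary, the token pattern, and the type names WORD, PUNCT, SPACE and LINEBREAK are all assumptions made for the example, and the snippet does not give the actual tag set.

```python
import re

# Assumed token pattern: a word, a single line break, a run of other
# whitespace, or any single non-word, non-space character.
TOKEN_RE = re.compile(r"\w+|\n|[^\S\n]+|[^\w\s]")


def split_paragraphs(text):
    """(A) Split text into paragraphs; blank lines are an assumed boundary."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(paragraph):
    """(B) Split a paragraph into tokens, keeping spaces and line breaks."""
    return TOKEN_RE.findall(paragraph)


def token_type(token):
    """(C) Map a token to a hypothetical type used to pick candidate tags."""
    if token == "\n":
        return "LINEBREAK"
    if token.isspace():
        return "SPACE"
    if re.fullmatch(r"\w+", token):
        return "WORD"
    return "PUNCT"


def preprocess(text):
    """Run (A)-(C) and return one list of (token, type) pairs per paragraph."""
    return [[(tok, token_type(tok)) for tok in tokenize(p)]
            for p in split_paragraphs(text)]
```

Calling `preprocess("Hi,there\nyou")` separates the joined tokens "Hi", "," and "there" and keeps the line break as a token of its own, which mirrors the behaviour the snippet describes.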
... Approach to Unsupervised Part-of-Speech Tagging Sharon Goldwater Department of Linguistics Stanford University sgwater@stanford.edu Thomas L. Griffiths Department of Psychology UC Berkeley tom_griffiths@berkeley.edu Abstract Unsupervised ... differences hold to a lesser degree when a partial dictionary is provided. With MLHMM, different tokens of the same word type are usually assign...
Uploaded: 20/02/2014, 12:20
... converted to dependency trees using Stanford Parser (Marneffe et al., 2006). We convert the tokens in training data to lower case, and re-tokenize the sentences using the same tokenizer from ... sensitive to parser errors; on the other hand, integrated model is forced to use a longer distortion limit which leads to more search errors during decoding time. It is possible to 9...
Uploaded: 19/02/2014, 19:20
Document: Scientific report: "A Feature Based Approach to Leveraging Context for Classifying Newsgroup Style Discussion Segments" pptx
... automaton is set to initial state (q0) at the top of a message. It makes a transition to state (q1) when it encounters a quoted span of text. Once in state (q1), the automaton remains in ... a span of text is to the spans of text in the parent message. This is computed using the minimum of all cosine distance measures between the vector representation of the span of...
Uploaded: 20/02/2014, 12:20
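The feature mentioned in the newsgroup-discussion snippet, the minimum of all cosine distances between a span of text and the spans of the parent message, might be computed along these lines. The snippet is truncated and does not say how spans are vectorized, so the lowercase bag-of-words vectors here are an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Assumed vector representation: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two count vectors (1.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def min_distance_to_parent(span, parent_spans):
    """Smallest cosine distance between `span` and any span of the parent message."""
    vec = bow(span)
    # default=1.0 treats a message with no parent spans as maximally distant.
    return min((cosine_distance(vec, bow(p)) for p in parent_spans), default=1.0)
```

A span that repeats one of the parent's spans yields a distance close to 0, while a span sharing no words with the parent yields 1.0.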
Scientific report: "A Nonparametric Bayesian Approach to Acoustic Model Discovery" docx
... the future, we plan to explore phonological context and use more flexible topological structures to model acoustic units within our framework. Acknowledgements The authors would like to thank Hung-an ... R^39 to denote the t-th feature frame of the i-th utterance. Fig. 1 illustrates how the speech signal of a single word utterance banana is converted to a sequence of feature vectors...
Uploaded: 07/03/2014, 18:20
Scientific report: "A Two-step Approach to Sentence Compression of Spoken Utterances" pdf
... first step, 8 annotators were asked to select words to be removed to compress the sentences. In the second step, 6 annotators (different from the first step) were asked to pick the best one ... in order to remove redundant or unnecessary words while trying to preserve the information in the original sentence. Sentence compression has been studied from formal text domain...
Uploaded: 07/03/2014, 18:20
Scientific report: "A Syntax-Free Approach to Japanese Sentence Compression" potx
... Syntax-Free Approach to Japanese Sentence Compression Tsutomu HIRAO, Jun SUZUKI and Hideki ISOZAKI NTT Communication Science Laboratories, NTT Corp. 2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237 ... alternative to these tree trimming approaches, sequence-oriented approaches have been proposed (McDonald, 2006; Nomoto, 2007; Hori and Furui, 2003; Clarke and Lapata, 2006). Nomoto (2...
Uploaded: 08/03/2014, 00:20
Scientific report: "A Noisy-Channel Approach to Question Answering" docx
... “legal” is related to “rule”, which in turn is related to “mandatory”; that “age” is related to “aged”; and that “Argentine” is related to “Argentina”. It is not difficult to see by now that ... S that is likely to be an answer to Q and assigns a score to it. Once one has these two modules, one has a QA system because finding the answer to a question Q amounts to se...
Uploaded: 08/03/2014, 04:22
Scientific report: "A multi-staged approach to identifying complex events in textual data" ppt
... or in what context they appear. We attempt to extract this important contextual information using text classification methods. We also use text classification methods to help users to more quickly ... problem), or in the context of a one-time event, such as a merger or layoff. A second concern is thus to enable end users to interpret facts and events through automated context...
Uploaded: 08/03/2014, 21:20
Scientific report: "A Memory-Based Approach to the Treatment of Serial Verb Construction in Combinatory Categorial Grammar" pdf
... ‘Kla goes out to seek Laay in the cane field and he finds that it is about to walk away.’ The sentence in (17) is split into two SVCs: the series of V1 to V3 and the series of V4 to V5, because ... generative power for a particular language by annotating modalities to the slashes to allow or ban specific combinatory operations. Due to the page limitation, the multimodal...
Uploaded: 08/03/2014, 21:20