Scientific report: "Lexicalized phonotactic word segmentation" pptx
... number of words (small, medium, large subsets), average phones per word, average words per phrase, and percent of word types that occur only once (hapax). Phones/word is replaced by characters/word ... partial words helps the segmenter handle long, infrequent words. Long words are typically created by productive morphology and, thus, often start and end just like other words. Only 32...
Uploaded: 17/03/2014, 02:20
... segmented words that are either lexical words or OOV words with certain types (e.g. person name, morphological words, new words) we then have a system that can perform word segmentation and OOV word ... models, the procedure of word segmentation in our system is as follows: First, all word candidates (lexical words and OOV words of certain types) are generated, each with its w...
Uploaded: 23/03/2014, 19:20
... co-occurrence. Word based model. In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word ... components of words, instead, they are contextual background providing information about the likelihood of whether each CB is also a wordbreak (WB). In other words, we model Chi...
Uploaded: 20/02/2014, 12:20
Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc
... actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation is an important research ... word boundary between two neighboring words, they can leverage only up to bigram word dependencies. In this paper, we extend this work to propose a more efficient and accura...
Uploaded: 17/03/2014, 01:20
Scientific report: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers" pptx
... indicates the word spacing error rate of the user input in terms of the character-unit precision, and the y-axis shows the word-unit precision of the output. Each graph depicts the word-unit ... applications work under the assumption that a user input is error-free; thus, word segmentation (WS) for written languages that use word boundary markers (WBMs), such as spaces, has...
Uploaded: 17/03/2014, 02:20
Scientific report: "Fully Unsupervised Word Segmentation with BVE and MDL" pdf
... obtained using the first 100,000 words of the Chinese Gigaword corpus (Huang, 2007), written in Chinese characters. The word boundaries specified in the Chinese Gigaword Corpus were used as a gold ... lexicon, or set of words. More formally, the segmented corpus $S$ is a list of words $s_1 s_2 \ldots s_N$. $L(S)$, the lexicon implicitly defined by $S$, is simply the set of unique words in $S$. The d...
Uploaded: 30/03/2014, 21:20
Document: Scientific report: "Fixed Length Word Suffix for Factored Statistical Machine Translation" pdf
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Learning Sub-Word Units for Open Vocabulary Speech Recognition" doc
... coherence. Hybrid word/sub-word recognizers can produce a sequence of sub-word units in place of OOV words. Ideally, the recognizer outputs a complete word for in-vocabulary (IV) utterances, and sub-word ... recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Yet Another Word Alignment Tool" docx
... with Yawat. As the mouse is moved over a word, the word and all words linked with it are highlighted. The highlighting is removed when the mouse leaves the word in question. This allows the annotator ... associated words are shown only for one word at a time, as determined by the location of the mouse pointer. When the mouse is moved over a word in the text, the word and all the...
Uploaded: 20/02/2014, 09:20
Document: Scientific report: "Guiding Statistical Word Alignment Models With Prior Knowledge" pdf
... $a_1^m$ specifies the indices of source words that target words are aligned to. In an HMM-based word alignment model, source words are treated as Markov states while target words are observations that are ... as 1. In building word alignment models, a special “NULL” word is usually introduced to address target words that align to no source words. Since this physically non-existing word...
Uploaded: 20/02/2014, 12:20