Identifying coordinated compound words for Vietnamese word segmentation
... amount of coordinated compound words. The purpose of building a coordinated compound word is to increase the accuracy of Vietnamese word segmentation when detecting coordinated compound words. There ... all reverse word of coordinated compound words then check. 3.4 Review and estimate the accuracy of the dictionary The new coordinated compound words (about 30...
Uploaded: 12/04/2014, 15:43
... thesis for further details. 4.2 Generation of Words with Internal Structures Words with rich internal structures can be described using a context-free grammar formalism as word → root (3) word ... structure of words that were not seen during training. For this, we sampled 100 such words including those with prefixes or suffixes and personal names. We found that for 82 o...
Uploaded: 17/03/2014, 00:20
... model for Chinese word segmentation. It differentiates from the previous pruning approaches in two respects. First, the pruning criterion is based on performance variation of word segmentation. ... model for Chinese word segmentation was proposed. Gao et al. (2005) further developed it to a linear mixture model. In these statistical models, language models are essential...
Uploaded: 17/03/2014, 04:20
Scientific report: "An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation" pptx
... described in the paragraph above. 4 Japanese Word Segmentation 4.1 Word Segmentation as a Classification Task Many tasks in natural language processing can be formulated as a classification task (van ... sentence to words. For example, kanji is mainly used to represent nouns or stems of verbs and adjectives. It is never used for particles, which are always written in hiragana....
Uploaded: 17/03/2014, 08:20
Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection" docx
... performance of Chinese word segmentation. We consider here new word detection as an integral part of segmentation, aiming to improve both segmentation and new word detection: detected new words ... detected words are reincorporated into word segmentation for improving segmentation accuracies. 3.2 New Features Here, we will describe high dimensional new features for...
Uploaded: 23/03/2014, 14:20
Scientific report: "Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation" doc
... et al., 2009) A sequence of words or utterance is generated by making independent draws from a discrete distribution over words, G. As neither the actual “true” words nor their number is known ... directions for future research. 2 Model description The Unigram model assumes that words in a sequence are generated independently whereas the Bigram model models dependencies betwe...
Uploaded: 30/03/2014, 17:20
Scientific report: "Improved Source-Channel Models for Chinese Word Segmentation" pdf
... words depending on how the words are used in real applications. In our system, a lexicon (containing 98,668 lexicon words and 59,285 morphologically derived words) has been constructed for ... Slashes indicate word boundaries. (b) An output of our word segmentation system. Square brackets indicate word boundaries. + indicates a morpheme boundary. • For lexicon words...
Uploaded: 31/03/2014, 03:20
Document: Word Segmentation for Vietnamese Text Categorization: An online corpus approach pptx
... acceptable Vietnamese word segmentation. Why is identifying word boundary in Vietnamese vital for Vietnamese text categorization? According to [18] and our survey, most of top-performing text ... of Vietnamese word segmentation is very problematic, especially without a manual segmentation test corpus. Therefore, we perform two experiments, one is done by human judg...
Uploaded: 12/12/2013, 11:15
Document: Scientific report: "Unsupervized Word Segmentation: the case for Mandarin Chinese" doc
... density distributions for words vs. non-words, we observed that the VBE at both boundaries were the most discriminative value. Therefore, we decided to take into account the VBE only at the word-candidate ... Association for Computational Linguistics, pages 383–387, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics Unsupervized Word Segme...
Uploaded: 19/02/2014, 19:20
Document: Scientific report: "Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation" pptx
... to be words. Therefore, for languages where word boundaries are not orthographically marked, tools which segment a sentence into words are required. However, this segmentation is normally performed as ... vocabulary (Voc), number of character vocabulary (Char.voc) in Voc, and the running words (Run.words) when different word segmentations were used. From Table 7, we can see th...
Uploaded: 22/02/2014, 02:20