Scientific report: "Adaptive Chinese Word Segmentation" pptx

... segmented words that are either lexical words or OOV words with certain types (e.g. person name, morphological words, new words) we then have a system that can perform word segmentation and OOV word ... models, the procedure of word segmentation in our system is as follows: First, all word candidates (lexical words and OOV words of certain types) are generated, each with its w...

Uploaded: 23/03/2014, 19:20

Document: Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

... co-occurrence. Word based model. In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word ... introduce is that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) and non-word-boundaries. I...

Uploaded: 20/02/2014, 12:20

Scientific report: "Lexicalized phonotactic word segmentation" pptx

... number of words (small, medium, large subsets), average phones per word, average words per phrase, and percent of word types that occur only once (hapax). Phones/word is replaced by characters/word ... partial words helps the segmenter handle long, infrequent words. Long words are typically created by productive morphology and, thus, often start and end just like other words. Only 32...

Uploaded: 17/03/2014, 02:20

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... languages such as Chinese and Japanese have no explicit word boundaries, thus word segmentation is a crucial first step when processing them. Even in western languages, valid “words” are often ... actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation is...

Uploaded: 17/03/2014, 01:20

Scientific report: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers" pptx

... indicates the word spacing error rate of the user input in terms of the character-unit precision, and the y-axis shows the word-unit precision of the output. Each graph depicts the word-unit ... applications work under the assumption that a user input is error-free; thus, word segmentation (WS) for written languages that use word boundary markers (WBMs), such as spaces, has...

Uploaded: 17/03/2014, 02:20

Scientific report: "Fully Unsupervised Word Segmentation with BVE and MDL" pdf

... obtained using the first 100,000 words of the Chinese Gigaword corpus (Huang, 2007), written in Chinese characters. The word boundaries specified in the Chinese Gigaword Corpus were used as a gold ... lexicon, or set of words. More formally, the segmented corpus S is a list of words s_1 s_2 ... s_N. L(S), the lexicon implicitly defined by S, is simply the set of unique words in S....

Uploaded: 30/03/2014, 21:20

Scientific report: "Adaptive Language Modeling for Word Prediction" potx

... number of words in the prediction window. We focus on 5-word prediction windows. Many commercial devices provide optimized input for the most common words (called core vocabulary) and offer word ... determine the weight of each word in the current document using frequency, recency, and topical salience. The recency of use of a word contributes to the relevance of the word. If a...

Uploaded: 31/03/2014, 00:20

Document: Scientific report: "Improving Chinese Semantic Role Labeling with Rich Syntactic Features" ppt

... as either their first or last word. Head position suggests that boundary words are good approximation of head word features. If head words have good approximation word features, then it is not ... design two kinds of traces (htr-p, htr-w): one uses POS of the head word; the other uses the head word itself. E.g., the head word of 事故原因 is “原因” therefore these feature of t...

Uploaded: 20/02/2014, 04:20

Document: Scientific report: "Learning Sub-Word Units for Open Vocabulary Speech Recognition" doc

... coherence. Hybrid word/sub-word recognizers can produce a sequence of sub-word units in place of OOV words. Ideally, the recognizer outputs a complete word for in-vocabulary (IV) utterances, and sub-word ... recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by...

Uploaded: 20/02/2014, 04:20

Keywords: