Confidence-dependent Chinese word segmentation

Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" (pdf)

... International Chinese Word Segmentation Bakeoff. Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan, July 2003. Xue, N. 2003. Chinese Word Segmentation ... co-occurrence. Word-based model. In this model, statistical data about word-boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word ... introduce is that Chinese word segmentation is the classification of a string of character boundaries (CBs) into either word boundaries (WBs) or non-word boundaries. In Chinese, CBs are delimited...

Uploaded: 20/02/2014, 12:20

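The excerpt frames segmentation as classifying each character boundary (CB) as a word boundary (WB) or a non-word boundary. A minimal Python sketch of that representation converts between a word list and one boolean label per internal boundary; the function names are illustrative rather than taken from the paper, and the classifier that would predict the labels is left out.

```python
from typing import List


def boundaries_from_words(words: List[str]) -> List[bool]:
    """Label each internal character boundary: True = word boundary (WB),
    False = non-word boundary."""
    labels: List[bool] = []
    for i, word in enumerate(words):
        # Boundaries inside a word are non-word boundaries.
        labels.extend([False] * (len(word) - 1))
        # The boundary after every word except the last is a word boundary.
        if i < len(words) - 1:
            labels.append(True)
    return labels


def words_from_boundaries(text: str, labels: List[bool]) -> List[str]:
    """Rebuild the word sequence from the characters and their CB labels."""
    if not text:
        return []
    if len(labels) != len(text) - 1:
        raise ValueError("need exactly one label per internal character boundary")
    words: List[str] = []
    start = 0
    for i, is_wb in enumerate(labels):
        if is_wb:
            # Boundary i sits between characters i and i + 1.
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words
```

For example, `boundaries_from_words(["我", "喜欢", "北京"])` gives `[True, False, True, False]`, and passing those labels back with `"我喜欢北京"` recovers the three words.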
Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" (pdf)

... as that in English. Chinese word segmentation is therefore the first step for any Chinese information processing system [1]. Almost all methods for Chinese word segmentation developed so far, ... Automatic Word Segmentation System for Written Chinese Texts", Journal of Chinese Information Processing, Vol. 1, No. 2, 1987 (in Chinese). [2] Fan C.K., Tsai W.H., "Automatic Word Identification ... of Hong Kong, Hong Kong. Abstract: Chinese word segmentation is the first step in any Chinese NLP system. This paper presents a new algorithm for segmenting Chinese texts without making use...

Uploaded: 20/02/2014, 18:20

Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" (ppt)

... decoding. 3 Chinese Word Segmentation (CWS). 3.1 Word segmentation as character tagging. Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and ... Character- and word-based features. As studied in previous work, word-based feature templates usually include the word itself, sub-words contained in the word, contextual characters/words and so ... are incorporated into word-based CWS models, some word-based features are no longer of interest, such as the starting character of a word, sub-words contained in the word, contextual characters...

Uploaded: 07/03/2014, 18:20

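Treating segmentation as character tagging means each character receives a label for its position within a word. The sketch below assumes the common four-tag B/M/E/S scheme (the excerpt does not name a tag set) and adds a few hypothetical character-window feature templates of the kind such taggers use; it is illustrative, not the paper's feature set.

```python
from typing import List, Tuple


def tag_characters(words: List[str]) -> List[Tuple[str, str]]:
    """Map a segmented sentence to (character, tag) pairs using B/M/E/S:
    B/M/E = begin/middle/end of a multi-character word, S = single-character word."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


def char_features(text: str, i: int) -> List[str]:
    """Hypothetical character-window templates for position i of text."""
    def at(j: int) -> str:
        # Pad positions outside the sentence with boundary symbols.
        if j < 0:
            return "<s>"
        if j >= len(text):
            return "</s>"
        return text[j]

    return [
        f"C-1={at(i - 1)}",
        f"C0={at(i)}",
        f"C+1={at(i + 1)}",
        f"C-1C0={at(i - 1)}{at(i)}",
        f"C0C+1={at(i)}{at(i + 1)}",
    ]
```

`tag_characters(["我", "喜欢"])` yields `[("我", "S"), ("喜", "B"), ("欢", "E")]`; a tagger would learn to predict these tags from features such as `char_features("我喜欢", 1)`.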
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" (pdf)

... obtain accuracy improvements on both segmentation and Joint S&T. 2 Segmentation and POS Tagging. Given a Chinese character sequence $C_{1:n} = C_1 C_2 \ldots C_n$, the segmentation result can be depicted ... end of the word; • s: a single-character word. We can extract the segmentation result by splitting the labelled result into subsequences of pattern s or bm*e, which denote a single-character word and ... 3-gram word language model measuring the fluency of the segmentation result, a 4-gram POS language model functioning as the product of state-transition probabilities in HMM, and a word-POS co-occurrence...

Uploaded: 08/03/2014, 01:20

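The extraction rule in the excerpt, splitting the label sequence into runs that match s or bm*e, maps directly onto a regular expression. A minimal Python sketch, assuming one lowercase label per character as in the excerpt; `extract_words` is a hypothetical name.

```python
import re
from typing import List

# One word = a lone "s", or a "b" followed by any number of "m" and then an "e".
WORD_PATTERN = re.compile(r"s|bm*e")
VALID_SEQUENCE = re.compile(r"(?:s|bm*e)*")


def extract_words(chars: str, labels: str) -> List[str]:
    """Split characters into words using their b/m/e/s labels.
    Raises ValueError if the labels are not a valid sequence of s | bm*e."""
    if len(chars) != len(labels):
        raise ValueError("one label per character is required")
    if not VALID_SEQUENCE.fullmatch(labels):
        raise ValueError(f"ill-formed label sequence: {labels!r}")
    # Each regex match over the labels marks the span of one word.
    return [chars[m.start():m.end()] for m in WORD_PATTERN.finditer(labels)]
```

Validating the whole label string first guarantees that the matches tile it without gaps, so no character is silently dropped.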
Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" (potx)

... stacked sub-word model. Given multiple word segmentations of one sentence, we formally define a sub-word structure that maximizes the agreement of non-word-break positions. Based on the sub-word structure, ... predicted words and their POS information as clues to find a new word. After one word is found and classified, solvers move on and search for the next possible word. This word-by-word method ... data for sub-word tagging. 3 Method. 3.1 Architecture. In our stacked sub-word model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation...

Uploaded: 17/03/2014, 00:20

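One plausible reading of the sub-word structure, assumed here rather than taken from the paper's formal definition: cut the sentence wherever any input segmentation places a word break, so every position that all inputs treat as a non-break stays inside a single sub-word. The sketch below implements that reading; the example segmentations are hypothetical.

```python
from typing import List, Set


def break_positions(words: List[str]) -> Set[int]:
    """Character offsets strictly inside the sentence where a word ends."""
    positions: Set[int] = set()
    offset = 0
    for word in words[:-1]:
        offset += len(word)
        positions.add(offset)
    return positions


def sub_words(segmentations: List[List[str]]) -> List[str]:
    """Cut wherever ANY segmentation breaks; positions that every segmentation
    treats as non-breaks stay inside one sub-word."""
    if not segmentations:
        return []
    text = "".join(segmentations[0])
    if any("".join(seg) != text for seg in segmentations):
        raise ValueError("all segmentations must cover the same sentence")
    cuts: Set[int] = set()
    for seg in segmentations:
        cuts |= break_positions(seg)  # union of all proposed breaks
    bounds = [0] + sorted(cuts) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]
```

Given the conflicting segmentations `["北京大学", "生"]` and `["北京", "大学生"]`, the sub-words are `["北京", "大学", "生"]`, a structure on which both inputs can be expressed.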
Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" (doc)

... Generation of Words with Internal Structures. Words with rich internal structures can be described using a context-free grammar formalism as: word → root (3); word → word suffix (4); word → prefix word (5). Here ... trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way. 1 Why Parse Word Structures? Research in Chinese word segmentation has progressed ... 2003. Chinese word segmentation as character tagging. Computational Linguistics and Chinese Language Processing, 8(1):29–48. Yue Zhang and Stephen Clark. 2007. Chinese segmentation with a word-based...

Uploaded: 17/03/2014, 00:20

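Rules (3) to (5) can be tried directly with a small memoized recognizer over character spans. The lexicons `ROOTS`, `PREFIXES`, and `SUFFIXES` below are hypothetical toy data, and the function returns only the first derivation it finds; the paper's own parser is trained on the Penn Chinese Treebank and is not reproduced here.

```python
from functools import lru_cache
from typing import Optional

# Hypothetical toy lexicons for illustration only.
ROOTS = {"科学"}
PREFIXES = {"非"}
SUFFIXES = {"家"}


def parse_word(s: str) -> Optional[str]:
    """Return one bracketed derivation of s under the rules
    word -> root | word suffix | prefix word, or None if s is not derivable."""

    @lru_cache(maxsize=None)
    def parse(i: int, j: int) -> Optional[str]:
        span = s[i:j]
        if span in ROOTS:  # word -> root
            return f"(word (root {span}))"
        for k in range(i + 1, j):
            # word -> word suffix: s[k:j] is a suffix, s[i:k] must be a word
            if s[k:j] in SUFFIXES:
                left = parse(i, k)
                if left is not None:
                    return f"(word {left} (suffix {s[k:j]}))"
            # word -> prefix word: s[i:k] is a prefix, s[k:j] must be a word
            if s[i:k] in PREFIXES:
                right = parse(k, j)
                if right is not None:
                    return f"(word (prefix {s[i:k]}) {right})"
        return None

    return parse(0, len(s)) if s else None
```

`parse_word("科学家")` returns `(word (word (root 科学)) (suffix 家))`. For a string such as 非科学家 both rule (4) and rule (5) apply; this sketch stops at the first derivation, whereas a trained parser would score the alternatives.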
Scientific report: "Unsupervized Word Segmentation: the case for Mandarin Chinese" (doc)

... $\mathrm{len}(w_i)$, where $W$ is the segmentation corresponding to the sequence of words $w_0 w_1 \ldots w_m$, and $\mathrm{len}(w_i)$ is the length of a word $w_i$, used here to be able to compare segmentations resulting ... redefine the sentence segmentation problem as the maximization of the autonomy measure of its words. For a character sequence $s$, if we call $\mathrm{Seg}(s)$ the set of all the possible segmentations, then ... against the corpora from the Second International Chinese Word Segmentation Bakeoff (Emerson, 2005). These corpora cover 4 different segmentation guidelines from various origins: Academia Sinica...

Uploaded: 19/02/2014, 19:20

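The excerpt casts segmentation as choosing, from Seg(s), the segmentation whose words maximize a length-weighted autonomy score. Assuming the objective is the sum of autonomy(w) * len(w) over the words (the exact formula is truncated in the excerpt) and that words are at most `max_len` characters long, a Viterbi-style dynamic program finds the best segmentation without enumerating Seg(s). The `autonomy` function is a caller-supplied stand-in, not the paper's measure.

```python
from typing import Callable, List, Tuple


def best_segmentation(s: str, autonomy: Callable[[str], float],
                      max_len: int = 4) -> Tuple[float, List[str]]:
    """Maximize sum(autonomy(w) * len(w)) over the words w of a segmentation
    of s, considering words of at most max_len characters."""
    n = len(s)
    best = [float("-inf")] * (n + 1)  # best[j]: best score for prefix s[:j]
    back = [0] * (n + 1)              # back[j]: start index of the last word in s[:j]
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            w = s[i:j]
            score = best[i] + autonomy(w) * len(w)
            if score > best[j]:
                best[j], back[j] = score, i
    # Follow back-pointers from the end to recover the words.
    words: List[str] = []
    j = n
    while j > 0:
        words.append(s[back[j]:j])
        j = back[j]
    return best[n], words[::-1]
```

With a toy scorer returning 2.0 for 北京 and 大学, 0.5 for 北京大学, and 0.0 otherwise, `best_segmentation("北京大学", scorer)` returns `(8.0, ["北京", "大学"])`. The length weighting is what lets segmentations with different numbers of words be compared on one scale.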
Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" (potx)

... 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22. Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation and part-of-speech ... improve the segmentation of out-of-vocabulary (OOV) words. Unlike languages such as Japanese that use a distinct character set (i.e. katakana) for foreign words, the transliterated words in Chinese, ... POS tags. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than 1% in Chinese (Zhang and Clark, 2008)...

Uploaded: 07/03/2014, 18:20

Document: "Word Segmentation for Vietnamese Text Categorization: An online corpus approach" (pptx)

... Vietnamese word segmentation is very problematic, especially without a manual segmentation test corpus. Therefore, we perform two experiments, one of which is done by human judgment of word segmentation ... ways of segmentation, i.e. the important words are segmented correctly while less important words may be segmented incorrectly. Table 6 presents the human judgment of our word segmentation ... inhomogeneous phenomenon in the judgment of word segmentation. However, the acceptable segmentation percentage is satisfactory. Nearly eighty percent of the word segmentation outcome does not make the...

Uploaded: 12/12/2013, 11:15

Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron" (docx)

... specific to Chinese, are shown in Table 2. The word segmentation features are extracted from word bigrams, capturing word, word length and character information in the context. The word length ... last word can be a complete word or a partial word. A problem arises in whether to give POS tags to incomplete words. If partial words are given POS tags, it is likely that some partial words are ... pattern “number word” + “number word” can help to prevent segmenting a long number word into two words. In order to avoid error propagation and make use of POS information for word segmentation, ...

Uploaded: 20/02/2014, 09:20

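The excerpt extracts segmentation features from word bigrams (word, word length, and character context) and scores joint segmentation-and-tagging candidates with a single perceptron. A minimal sketch of the feature extraction and the standard perceptron update follows; the feature templates, the `#` start padding, and all function names are hypothetical, and the decoder that would produce the `predicted` candidate is omitted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]  # [(word, POS), ...]; words assumed non-empty
START = ("#", "#")              # hypothetical sentence-start padding


def bigram_features(prev: Tuple[str, str], cur: Tuple[str, str]) -> List[str]:
    """Hypothetical templates over a (word, POS) bigram."""
    (pw, pt), (w, t) = prev, cur
    return [
        f"w={w}",
        f"len={len(w)}",
        f"pw_w={pw}_{w}",
        f"lastc(pw)_firstc(w)={pw[-1]}_{w[0]}",
        f"firstc(w)_len={w[0]}_{len(w)}",
        f"w_t={w}_{t}",
        f"pt_t={pt}_{t}",
    ]


def sentence_features(tagged: Tagged) -> Dict[str, int]:
    """Count features over all adjacent (word, POS) pairs of a candidate."""
    counts: Dict[str, int] = defaultdict(int)
    for prev, cur in zip([START] + tagged, tagged):
        for f in bigram_features(prev, cur):
            counts[f] += 1
    return counts


def score(tagged: Tagged, weights: Dict[str, float]) -> float:
    """Linear score of one joint segmentation + POS candidate."""
    return sum(weights.get(f, 0.0) * c for f, c in sentence_features(tagged).items())


def perceptron_update(weights: Dict[str, float], gold: Tagged, predicted: Tagged) -> None:
    """Standard perceptron step: reward gold features, penalize predicted ones."""
    if gold == predicted:
        return
    for f, c in sentence_features(gold).items():
        weights[f] = weights.get(f, 0.0) + c
    for f, c in sentence_features(predicted).items():
        weights[f] = weights.get(f, 0.0) - c
```

Because segmentation and tags share one feature vector, a single update can correct a wrong word boundary and a wrong tag at the same time.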
Scientific report: "An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation" (ppt)

... monosemous word is usually synonymous with some polysemous words. For example, the words "信守, 严守, 恪守, 遵照, 遵从, 遵循, 遵守" have a meaning similar to one of the senses of the ambiguous word ... in Chinese, which can be used as a knowledge source for WSD. 3.1 Definition of Equivalent Pseudoword. If the ambiguous words in the corpus are replaced with their synonymous monosemous words, ... ambiguous word need to simulate the function of the real ambiguous word, and to acquire semantic knowledge as the real ambiguous word does. Thus, we call it an equivalent pseudoword (EP)...

Uploaded: 20/02/2014, 12:20

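The equivalent-pseudoword idea can be sketched as harvesting training examples: each occurrence of a monosemous synonym in an unannotated corpus is treated as an example of the sense it is synonymous with. The sense labels, the `<EP>` placeholder token, and the context window are assumptions for illustration; the excerpt does not specify how contexts are represented.

```python
from typing import Dict, List, Tuple


def pseudo_training_data(corpus: List[List[str]],
                         sense_synonyms: Dict[str, List[str]],
                         window: int = 2) -> List[Tuple[List[str], str]]:
    """Collect sense-labelled context windows: every occurrence of a monosemous
    synonym becomes an example of the sense it stands for."""
    # Invert sense -> synonyms into synonym -> sense (later senses win on clashes).
    lookup = {syn: sense for sense, syns in sense_synonyms.items() for syn in syns}
    examples: List[Tuple[List[str], str]] = []
    for sent in corpus:
        for i, tok in enumerate(sent):
            sense = lookup.get(tok)
            if sense is None:
                continue
            left = sent[max(0, i - window):i]
            right = sent[i + 1:i + 1 + window]
            # The synonym itself is masked by the hypothetical pseudoword token.
            examples.append((left + ["<EP>"] + right, sense))
    return examples
```

Masking the synonym keeps a classifier from simply memorising which surface word signalled the sense, so it must rely on the context, as it would for the real ambiguous word.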
Scientific report: "Bilingually Motivated Domain-Adapted Word Segmentation for Statistical Machine Translation" (pptx)

... of the Chinese side of the training data, including the total vocabulary (Voc), the number of character vocabulary entries (Char.voc) in Voc, and the running words (Run.words) when different word segmentations ... iterations). 4 Word Lattice Decoding. 4.1 Word Lattices. In the decoding stage, the various segmentation alternatives can be encoded into a compact representation of word lattices. A word lattice ... Given a Chinese sentence $c_1^J$ consisting of $J$ characters $\{c_1, \ldots, c_J\}$ and an English sentence $e_1^I$ consisting of $I$ words $\{e_1, \ldots, e_I\}$, $A_{C \to E}$ will denote a Chinese-to-English word...

Uploaded: 22/02/2014, 02:20

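A word lattice of the kind described can be built by merging alternative segmentations into one graph whose nodes are character offsets and whose edges carry words. A minimal Python sketch, assuming the segmentations are word lists over the same sentence; it only builds and enumerates the lattice and does not perform the translation decoding the paper uses it for.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Lattice = Dict[int, Set[Tuple[int, str]]]  # start offset -> {(end offset, word)}


def build_lattice(segmentations: List[List[str]]) -> Lattice:
    """Merge alternative segmentations: nodes are character offsets 0..J,
    and an edge start -> end carries the word spanning those characters."""
    lattice: Dict[int, Set[Tuple[int, str]]] = defaultdict(set)
    for seg in segmentations:
        pos = 0
        for word in seg:
            lattice[pos].add((pos + len(word), word))  # shared edges are stored once
            pos += len(word)
    return dict(lattice)


def lattice_paths(lattice: Lattice, start: int, end: int) -> List[List[str]]:
    """Enumerate every word sequence from start to end (fine for small lattices)."""
    if start == end:
        return [[]]
    paths: List[List[str]] = []
    for nxt, word in sorted(lattice.get(start, ())):
        for rest in lattice_paths(lattice, nxt, end):
            paths.append([word] + rest)
    return paths
```

Merging `["北京大学", "生"]`, `["北京", "大学生"]`, and `["北京", "大学", "生"]` stores the shared words 北京 and 生 only once, while all three alternatives remain recoverable as paths from offset 0 to 5.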