automatic adaptation of annotation standards chinese word segmentation and pos tagging

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study"

Uploaded: 17/03/2014, 01:20
... efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. ... of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 522–530, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP Automatic Adaptation of Annotation Standards: Chinese ... method we choose Chinese word segmentation and part-of-speech tagging, where the problem of incompatible annotation standards is one of the most evident: so far no segmentation standard is widely...
Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging"

Uploaded: 17/03/2014, 01:20
... Joint Chinese Word Segmentation and POS Tagging Canasai Kruengkrai †‡ and Kiyotaka Uchimoto ‡ and Jun’ichi Kazama ‡ Yiou Wang ‡ and Kentaro Torisawa ‡ and Hitoshi Isahara †‡ † Graduate School of ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. ... Liu, and Yajuan Lü. 2008a. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL. Wenbin Jiang, Haitao Mi, and Qun Liu. 2008b. Word lattice...
Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron"

Uploaded: 20/02/2014, 09:20
... word w; (11) the starting characters c1 and c2 of two consecutive words; (12) the ending characters c1 and c2 of two consecutive words; (13) a word of length l with previous word w; (14) a word of ... UK {yue.zhang,stephen.clark}@comlab.ox.ac.uk Abstract For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be ... proposed a hybrid model for word segmentation and POS tagging using an HMM-based approach. Word information is used to process known words, and character information is used for unknown words...
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

Uploaded: 08/03/2014, 01:20
... −l), and select for position i an N-best list of candidate results from all these candidates. When we derive a candidate result from a word-POS pair p and a candidate q at prior position of p, ... and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech tagging ... that segmentation and POS tagging task is to divide a character sequence into several subsequences and label each of them with a POS tag. It is a better idea to perform segmentation and POS tagging...
Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

Uploaded: 17/03/2014, 00:20
... sequence of characters $c = (c_1, \ldots, c_{\#c})$, the task of word segmentation and POS tagging is to predict a sequence of word and POS tag pairs $y = (w_1, p_1, \ldots, w_{\#y}, p_{\#y})$, where $w_i$ is a word, ... model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in ... decoding for POS tagging over sub-words is efficient. Finally, the Chinese language is characterized by the lack of morphology that often provides important clues for POS tagging, and the POS tags...
Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation"

Uploaded: 17/03/2014, 00:20
... Liang Huang, and Qun Liu. 2009. Automatic adaptation of annotation standards: Chinese word segmentation and POS tagging – a case study. In Proceedings of the Joint Conference of the 47th Annual ... and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of ... model for word structure parsing is integrated with constituent parsing. There have been many efforts to integrate Chinese word segmentation, part-of-speech tagging and parsing (Wu and Zixin,...
Scientific report: "Discriminative Pruning of Language Models for Chinese Word Segmentation"

Uploaded: 17/03/2014, 04:20
... Comparison of Combined Model and KLD Model 5 Conclusions and Future Work A discriminative pruning criterion of n-gram language model for Chinese word segmentation was proposed in this paper, and ... and word segmentation performance is also discussed. 1 Introduction Chinese word segmentation is the initial stage of many Chinese language processing tasks, and has received a lot of attention ... of the 41st Annual Meeting of Association for Computational Linguistics (ACL-2003), pages 272-279. Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang. 2005. Chinese Word Segmentation and...
Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification"

Uploaded: 20/02/2014, 12:20
... J. and A. Wu and Mu Li and C.-N. Huang and H. Li and X. Xia and H. Qin. 2004. Adaptive Chinese Word Segmentation. In Proceedings of ACL-2004. Meng, H. and C. W. Ip. 1999. An Analytical Study of Transformational ... N. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing. 8(1): 29-48. Redington, M. and N. Chater and C. Huang and L. Chang and K. Chen. ... that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) or non-word-boundaries. In Chinese, CB’s are delimited and...
Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data"

Uploaded: 20/02/2014, 18:20
... "[…]" should be separated and "[…]" be combined ... dis(locmax, y:z) = dts(x:y) - dts(y:z) Definition 7 Suppose 'vxyzw' is a Chinese ... Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data Sun Maosong, Shen Dayang*, ... Any Chinese word is composed of either single or multiple characters. Chinese texts are explicitly concatenations of characters; words are not delimited by spaces as in English. Chinese...
Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation"

Uploaded: 07/03/2014, 18:20
... beginning of a word and I all other positions; and 2) BMES: where B, M and E represent the beginning, middle and end of a multi-character word respectively, and S tags a single-character word. ... decoding. 3 Chinese Word Segmentation (CWS) 3.1 Word segmentation as character tagging Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and the ... $(c_{n-l_k+1} \ldots c_n)$ represents a segmentation of k words and the lengths of the first and last word are $l_1$ and $l_k$ respectively. In early work, rule-based models find words one by one based on heuristics...
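
The BMES scheme described in this snippet can be illustrated with a minimal sketch. The function names `words_to_bmes` and `bmes_to_words` are hypothetical and are not taken from the paper; the code only shows the mapping between a segmented sentence and per-character tags, not the paper's constrained decoding.

```python
def words_to_bmes(words):
    """Map a list of segmented words to one BMES tag per character.

    B/M/E mark the beginning, middle and end of a multi-character word;
    S marks a single-character word. Empty strings are skipped.
    """
    tags = []
    for word in words:
        if not word:
            continue
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


segmented = ["这些", "雨伞", "多少", "钱"]
tags = words_to_bmes(segmented)
restored = bmes_to_words("".join(segmented), tags)
```

Under this encoding, segmentation becomes a sequence labelling task over characters, and the original words are recoverable from the tags alone.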
Scientific report: "Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level"

Uploaded: 17/03/2014, 00:20
... ICTCLAS2009. NUS Chinese word segmenter (NUS): The NUS Chinese word segmenter uses a maximum entropy approach to Chinese word segmentation, which achieved the highest F-measure on three of the four ... into words: Translation: 多少_钱_的_伞_吗_? Reference: 这些_雨伞_多少_钱_? The word “伞” is a synonym for the word “雨伞”, and both words are translations of the English word “umbrella”. If a word-level ... the four corpora in the open track of the Second International Chinese Word Segmentation Bakeoff (Ng and Low, 2004; Low et al., 2005). The segmentation standard adopted in this paper is CTB...
Scientific report: "Unsupervized Word Segmentation: the case for Mandarin Chinese"

Uploaded: 19/02/2014, 19:20
... segmented Chinese text, most of the tokens are uni- and bigrams but most of the types are bi- and trigrams (as unigrams are often high frequency grammatical words and trigrams the result of more ... unambiguous cases of numbers and dates in Chinese script. From $h_{\rightarrow}(x_{0..n})$ and $h_{\rightarrow}(x_{0..n-1})$ on the one hand, and from $h_{\leftarrow}(x_{0..n})$ and $h_{\leftarrow}(x_{1..n})$ we estimate the Variation of Branching Entropy ... redefine the sentence segmentation problem as the maximization of the autonomy measure of its words. For a character sequence s, if we call Seg(s) the set of all the possible segmentations, then...
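
The idea of searching Seg(s) for the segmentation that maximizes an autonomy score can be sketched as follows. The snippet is truncated before the objective is stated, so everything specific here is an assumption: `autonomy` is a caller-supplied scoring function, the per-word scores are simply summed, and the toy scores are invented for illustration. Exhaustive enumeration is exponential in the length of s and is used only to keep the sketch short.

```python
from itertools import combinations


def all_segmentations(s):
    """Yield every segmentation of a non-empty string s, i.e. Seg(s).

    A segmentation is chosen by picking any subset of the len(s) - 1
    internal boundaries, so there are 2 ** (len(s) - 1) of them.
    """
    n = len(s)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [s[i:j] for i, j in zip(bounds, bounds[1:])]


def best_segmentation(s, autonomy):
    """Return the segmentation with the highest total autonomy.

    Assumption: the total is the plain sum of per-word scores.
    """
    return max(all_segmentations(s), key=lambda seg: sum(autonomy(w) for w in seg))


# Hypothetical autonomy scores, invented for this example only.
toy_scores = {"ab": 2.0, "c": 0.5}
result = best_segmentation("abc", lambda w: toy_scores.get(w, 0.0))
```

A practical implementation would replace the enumeration with dynamic programming over boundary positions and would derive the scores from branching entropy statistics rather than a lookup table.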
Scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity"

Uploaded: 20/02/2014, 12:20
... synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents ... context and one using translational context based on word alignment and the combination of both. For both approaches, we used a cutoff n for each row in our word-by-context matrix. A word is ... word; P(W) is the probability of seeing the word, P(f) is the probability of seeing the feature, and P(W,f) is the probability of seeing the word and the feature together. 3.3 Word Alignment The multilingual...
