Fully unsupervised word segmentation with BVE and MDL

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling"

... with respect to” actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation ... this paper, we proposed a much more efficient and accurate model for fully unsupervised word segmentation. With a combination of dynamic programming and an accurate spelling model from a Bayesian ... Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 100–108, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP. Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor...

Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese"

... q−1 and q−2 respectively denote the last-shifted word and the word shifted before q−1. q.w and q.t respectively denote the (root) word form and POS tag of a subtree (word) q, and q.b and q.e ... on CTB-6 and CTB-7 accuracies of POS tagging and dependency parsing were remarkably improved by 0.6% and 2.4%, respectively corresponding to 8.3% and 10.2% error reduction. For word segmentation, ... Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22. Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation...

Scientific report: "Unsupervised Word Alignment with Arbitrary Features"

... Bouchard-Côté, J. DeNero, and D. Klein. 2010. Painless unsupervised learning with features. In Proc. of NAACL. P. Blunsom and T. Cohn. 2006. Discriminative word alignment with conditional random fields. In ... Collins, and T. Darrell. 2004. Conditional random fields for object recognition. In NIPS 17. H. Setiawan, C. Dyer, and P. Resnik. 2010. Discriminative word alignment with a function word reordering model. ... between pairs of source and target word types across sentence pairs (Dice, 1945), IBM Model 1 forward and reverse probabilities, and the geometric mean of the Model 1 forward and reverse probabilities....

Scientific report (document): "Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure"

... Linguistics. Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure. Minwoo Jeong and Ivan Titov, Saarland University, Saarbrücken, Germany. {m.jeong|titov}@mmci.uni-saarland.de. Abstract: Documents ... story. Our model: We evaluate our joint model of segmentation and alignment both with and without the split/merge moves. For the model without these moves, we set the desired number of segments ... user interfaces and improve the performance of summarization and information retrieval systems. Discourse segmentation of the documents composed of parallel parts is a novel and challenging...

Scientific report (document): "Joint Word Segmentation and POS Tagging using a Single Perceptron"

... tag t with word w; 2: tag bigram t1t2; 3: tag trigram t1t2t3; 4: tag t followed by word w; 5: word w followed by tag t; 6: word w with tag t and previous character c; 7: word w with tag t and next ... a word starting with char c0 and containing char c; 13: tag t on a word ending with char c0 and containing char c; 14: tag t on a word containing repeated char cc; 15: tag t on a word starting with ... sentence, and T is the size of the tagset (T = 1 for pure word segmentation). It worked well for word segmentation alone (Zhang and Clark, 2007), even with an agenda size as small as 8, and a simple...

Scientific report (document): "Learning Word Senses With Feature Selection and Order Identification Capabilities"

... (Pantel and Lin, 2002; Schütze, 1998), there are other related efforts on word sense discrimination (Dorow and Widdows, 2003; Fukumoto and Suzuki, 1999; Pedersen and Bruce, 1997). In (Pedersen and ... about derivation of feature vectors. A feature for target word here consists of a contextual content word and its grammatical relationship with target word. Acquisition of grammatical relationship depends ... case characters, ignoring all words that contain digits or non alpha-numeric characters, removing words from a stop word list, and filtering out low frequency words which appeared only once...

Scientific report (document): "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data"

... Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data. Sun Maosong, Shen Dayang*, Benjamin K Tsou**. State Key Laboratory of Intelligent Technology and Systems, ... Chinese word segmentation developed so far, both statistical and rule-based, exploited two kinds of important resources, i.e., lexicon and hand-crafted linguistic resources (manually segmented and ... between mi and dts in depth; and (3) integrating it as a module with the existing Chinese segmenters so as to improve their performance (especially in ability to cope with unknown words and ability...

Scientific report (document): "POS Disambiguation and Unknown Word Guessing with Decision Trees"

... of the tagger and the order of processing: [figure: processing diagram; legible labels: Raw Text, words with one tag, Disambiguator & Guesser] ... words, which examines contextual features along with the word ending and capitalization and returns an open-class POS. 3 Training Sets. For the study and resolution of lexical ambiguity in M. ... Proceedings of EACL '99. POS Disambiguation and Unknown Word Guessing with Decision Trees. Giorgos S. Orphanos, Computer Engineering & Informatics Dept. and Computer Technology Institute, University...

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... p (position i − l), and select for position i an N-best list of candidate results from all these candidates. When we derive a candidate result from a word-POS pair p and a candidate q at prior ... sources effectively and obtain accuracy improvements on both segmentation and Joint S&T. 2 Segmentation and POS Tagging. Given a Chinese character sequence C1:n = C1 C2 … Cn, the segmentation ... segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech...

Scientific report: "Chinese Segmentation with a Word-Based Perceptron Algorithm"

... experiments without such optimization. 1: word w; 2: word bigram w1w2; 3: single-character word w; 4: a word starting with character c and having length l; 5: a word ending with character c and having length ... characters c1 and c2 of two consecutive words; 12: the ending characters c1 and c2 of two consecutive words; 13: a word of length l and the previous word w; 14: a word of length l and the next word w. Table ... characters c1 and c2; 7: character bigram c1c2 in any word; 8: the first and last characters c1 and c2 of any word; 9: word w immediately before character c; 10: character c immediately before word w; 11: the...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... token and which class it belongs to. Solvers may use previously predicted words and their POS information as clues to find a new word. After one word is found and classified, solvers move on and ... 2010. Word-based and character-based word segmentation models: Comparison and combination. In Coling 2010: Posters, pages 1211–1219, Beijing, China, August. Coling 2010 Organizing Committee. André ... sub-word model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in...

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging"

... dealing with this issue. With this search space representation, we can consistently handle unknown words with character-level nodes. In other words, we use word-level nodes to identify known words and character-level ... ACL and AFNLP. An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging. Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡, Yiou Wang‡ and ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words....

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study"

... the word containing these characters. In addition, Ng and Low (2004) find that, compared with POS tagging after word segmentation, Joint S&T can achieve higher accuracy on both segmentation and ... representation of Ng and Low (2004). For word segmentation only, there are four boundary tags: b: the beginning of the word; m: the middle of the word; e: the end of the word; s: a single-character word; while ... the ACL and the 4th IJCNLP of the AFNLP, pages 522–530, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP. Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS...