... with respect to” actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation ... this paper, we proposed a much more efficient and accurate model for fully unsupervised word segmentation. With a combination of dynamic programming and an accurate spelling model from a Bayesian ... Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 100–108, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP. Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor...
... q−1 and q−2 respectively denote the last-shifted word and the word shifted before q−1. q.w and q.t respectively denote the (root) word form and POS tag of a subtree (word) q, and q.b and q.e ... on CTB-6 and CTB-7 accuracies of POS tagging and dependency parsing were remarkably improved by 0.6% and 2.4%, respectively corresponding to 8.3% and 10.2% error reduction. For word segmentation, ... Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22. Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation...
... Bouchard-Côté, J. DeNero, and D. Klein. 2010. Painless unsupervised learning with features. In Proc. of NAACL. P. Blunsom and T. Cohn. 2006. Discriminative word alignment with conditional random fields. In ... Collins, and T. Darrell. 2004. Conditional random fields for object recognition. In NIPS 17. H. Setiawan, C. Dyer, and P. Resnik. 2010. Discriminative word alignment with a function word reordering model. ... between pairs of source and target word types across sentence pairs (Dice, 1945), IBM Model 1 forward and reverse probabilities, and the geometric mean of the Model 1 forward and reverse probabilities....
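The excerpt above names two association features: the Dice coefficient between source and target word types across sentence pairs, and the geometric mean of forward and reverse Model 1 probabilities. A minimal sketch of both follows. The function names, the toy sentence pairs, and the choice to count each word type once per sentence pair are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import product
from math import sqrt


def dice_scores(sentence_pairs):
    """Dice coefficient for every co-occurring (source, target) word-type pair.

    Assumption: a type is counted once per sentence pair it appears in.
    Dice(s, t) = 2 * C(s, t) / (C(s) + C(t)).
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        src_types, tgt_types = set(src_tokens), set(tgt_tokens)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        joint_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint_count.items()
    }


def geometric_mean(p_forward, p_reverse):
    """Geometric mean of a forward and a reverse translation probability."""
    return sqrt(p_forward * p_reverse)


# Hypothetical toy corpus of (source tokens, target tokens) pairs.
pairs = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]
scores = dice_scores(pairs)
```

In this toy corpus, "das" and "the" always co-occur, so their score is at the maximum, while "das" and "house" share only one of the two sentence pairs containing "das" and score lower.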
... Linguistics. Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure. Minwoo Jeong and Ivan Titov, Saarland University, Saarbrücken, Germany. {m.jeong|titov}@mmci.uni-saarland.de Abstract: Documents ... story. Our model: We evaluate our joint model of segmentation and alignment both with and without the split/merge moves. For the model without these moves, we set the desired number of segments ... user interfaces and improve the performance of summarization and information retrieval systems. Discourse segmentation of the documents composed of parallel parts is a novel and challenging...
... tag t with word w; (2) tag bigram t1t2; (3) tag trigram t1t2t3; (4) tag t followed by word w; (5) word w followed by tag t; (6) word w with tag t and previous character c; (7) word w with tag t and next ... a word starting with char c0 and containing char c; (13) tag t on a word ending with char c0 and containing char c; (14) tag t on a word containing repeated char cc; (15) tag t on a word starting with ... sentence, and T is the size of the tagset (T = 1 for pure word segmentation). It worked well for word segmentation alone (Zhang and Clark, 2007), even with an agenda size as small as 8, and a simple...
... (Pantel and Lin, 2002; Schütze, 1998), there are other related efforts on word sense discrimination (Dorow and Widdows, 2003; Fukumoto and Suzuki, 1999; Pedersen and Bruce, 1997). In (Pedersen and ... about derivation of feature vectors. A feature for target word here consists of a contextual content word and its grammatical relationship with target word. Acquisition of grammatical relationship depends ... case characters, ignoring all words that contain digits or non alpha-numeric characters, removing words from a stop word list, and filtering out low frequency words which appeared only once...
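The filtering steps listed at the end of the excerpt above can be sketched as a small pipeline. The first step is cut off in the excerpt, so lowercasing is an assumption here. The stop word list, the function name, and computing frequencies over the single token list passed in (rather than over a full corpus) are also hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical stop word list; the excerpt does not say which list was used.
STOP_WORDS = frozenset({"the", "a", "an", "of", "and", "has"})


def preprocess(tokens, stop_words=STOP_WORDS):
    """Filter context words following the steps named in the excerpt."""
    # Assumed first step (truncated in the excerpt): lowercase everything.
    lowered = [tok.lower() for tok in tokens]
    # Drop words containing digits or non-alphabetic characters, then stop words.
    kept = [tok for tok in lowered if tok.isalpha() and tok not in stop_words]
    # Drop words that appear only once in the input.
    freq = Counter(kept)
    return [tok for tok in kept if freq[tok] > 1]


sample = "The bank of the river bank has 2 banks near bank-side River".split()
filtered = preprocess(sample)
```

Only "bank" and "river" survive in this sample: "2" and "bank-side" fail the character check, the stop words are removed, and "banks" and "near" occur only once.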
... Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data. Sun Maosong, Shen Dayang*, Benjamin K Tsou**. State Key Laboratory of Intelligent Technology and Systems, ... Chinese word segmentation developed so far, both statistical and rule-based, exploited two kinds of important resources, i.e., lexicon and hand-crafted linguistic resources (manually segmented and ... between mi and dts in depth; and (3) integrating it as a module with the existing Chinese segmenters so as to improve their performance (especially in ability to cope with unknown words and ability...
... of the tagger and the order of processing: [Figure: tagger flowchart; legible labels include Raw Text, Disambiguator & Guesser, and words with one tag] ... words, which examines contextual features along with the word ending and capitalization and returns an open-class POS. 3 Training Sets. For the study and resolution of lexical ambiguity in M. ... Proceedings of EACL '99. POS Disambiguation and Unknown Word Guessing with Decision Trees. Giorgos S. Orphanos, Computer Engineering & Informatics Dept. and Computer Technology Institute, University...
... p(position i − l), and select for position i an N-best list of candidate results from all these candidates. When we derive a candidate result from a word-POS pair p and a candidate q at prior ... sources effectively and obtain accuracy improvements on both segmentation and Joint S&T. 2 Segmentation and POS Tagging. Given a Chinese character sequence: $C_{1:n} = C_1 C_2 \ldots C_n$, the segmentation ... segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech...
... experiments without such optimization. (1) word w; (2) word bigram w1w2; (3) single-character word w; (4) a word starting with character c and having length l; (5) a word ending with character c and having length ... characters c1 and c2 of two consecutive words; (12) the ending characters c1 and c2 of two consecutive words; (13) a word of length l and the previous word w; (14) a word of length l and the next word w. Table ... characters c1 and c2; (7) character bigram c1c2 in any word; (8) the first and last characters c1 and c2 of any word; (9) word w immediately before character c; (10) character c immediately before word w; (11) the...
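A few of the word-based feature templates listed above (templates 1, 2, 3, 4 and 13) can be instantiated for one word of a segmented sentence as in the following sketch. The string encoding of each feature, the `<s>` sentence-start marker, and the function name are assumptions for illustration; the excerpt gives only the template descriptions.

```python
def word_features(words, i):
    """Instantiate templates 1, 2, 3, 4 and 13 for the word at index i."""
    w = words[i]
    # Hypothetical sentence-start marker used when there is no previous word.
    prev = words[i - 1] if i > 0 else "<s>"
    feats = [
        f"1:w={w}",                  # template 1: word w
        f"2:w1w2={prev}_{w}",        # template 2: word bigram w1w2
    ]
    if len(w) == 1:
        feats.append(f"3:single={w}")            # template 3: single-character word
    feats.append(f"4:start={w[0]},len={len(w)}")     # template 4: first char and length
    feats.append(f"13:len={len(w)},prev={prev}")     # template 13: length and previous word
    return feats


sentence = ["我", "喜欢", "北京"]
features = word_features(sentence, 1)
```

In a linear model each such string would typically index one weight, so the score of a candidate segmentation is the sum of the weights of the features it fires.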
... token and which class it belongs to. Solvers may use previously predicted words and their POS information as clues to find a new word. After one word is found and classified, solvers move on and ... 2010. Word-based and character-based word segmentation models: Comparison and combination. In Coling 2010: Posters, pages 1211–1219, Beijing, China, August. Coling 2010 Organizing Committee. André ... sub-word model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in...
... dealing with this issue. With this search space representation, we can consistently handle unknown words with character-level nodes. In other words, we use word-level nodes to identify known words and character-level ... ACL and AFNLP. An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging. Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡, Yiou Wang‡ and ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words....
... the word containing these characters. In addition, Ng and Low (2004) find that, compared with POS tagging after word segmentation, Joint S&T can achieve higher accuracy on both segmentation and ... representation of Ng and Low (2004). For word segmentation only, there are four boundary tags: • b: the beginning of the word • m: the middle of the word • e: the end of the word • s: a single-character word while ... the ACL and the 4th IJCNLP of the AFNLP, pages 522–530, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP. Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS...
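The four boundary tags described in the excerpt above (b, m, e, s) turn segmentation into per-character labeling. The following sketch converts a segmented sentence into tags and back. The function names and the example sentence are hypothetical, and the handling of a trailing unfinished word in `tags_to_words` is an assumption for robustness rather than something the excerpt specifies.

```python
def words_to_tags(words):
    """Map each word to boundary tags: b (begin), m (middle), e (end), s (single)."""
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings defensively
        if len(word) == 1:
            tags.append("s")
        else:
            tags.extend(["b"] + ["m"] * (len(word) - 2) + ["e"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their boundary tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("e", "s"):  # these two tags close a word
            words.append(current)
            current = ""
    if current:  # assumption: keep a trailing word that was never closed
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Because every character receives exactly one tag, a sequence labeler over characters is enough to recover the segmentation; for joint segmentation and POS tagging the tag set is typically extended with the POS label.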