Scientific report: "Fully Unsupervised Word Segmentation with BVE and MDL" pdf

... set of candidate segmentations. In this work, we compare a variety of unsupervised word segmentation algorithms operating in conjunction with MDL for fully unsupervised segmentation, and find ... pages 540–545, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Fully Unsupervised Word Segmentation with BVE and MDL Daniel Hewlett an...
Uploaded: 30/03/2014, 21:20
  • 6
  • 373
  • 0
Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... with respect to” actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation ... embedded in the word model. We confirmed that it significantly outperforms previous reported results in both phonetic transcripts and standard datasets for Chinese and J...
Uploaded: 17/03/2014, 01:20
  • 9
  • 238
  • 0
Document: Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

... also a wordbreak (WB). In other words, we model Chinese word segmentation as wordbreak (WB) identification which takes all CB’s as candidates and returns a subset which also serves as wordbreaks. More ... that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) and non-word-boundaries. In Chinese, CB...
Uploaded: 20/02/2014, 12:20
  • 4
  • 301
  • 0
Scientific report: "Lexicalized phonotactic word segmentation" pptx

... most words are classified as unknown. To classify a word, we compare its frequency w as a word in the segmentation to the frequencies p and s with which it occurs as a prefix and suffix of words in ... complex, messy inputs. (Cf. Ando and Lee’s (2000) kanji segmenter.) On the other hand, modelling only partial words helps the segmenter handle long, infrequent words. Long words are t...
Uploaded: 17/03/2014, 02:20
  • 9
  • 173
  • 0
Scientific report: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers" pptx

... ACL and AFNLP A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers Han-Cheol Cho†, Do-Gil Lee§, Jung-Tae Lee§, Pontus Stenetorp†, Jun’ichi Tsujii† and ... a baseline WS model, confidence and threshold estimation, and output optimization. The following sections will explain the steps in detail. 2.1 Baseline Word Segmentation Model We...
Uploaded: 17/03/2014, 02:20
  • 4
  • 268
  • 0
Scientific report: "Fully Unsupervised Discovery of Concept-Specific Relationships by Web Mining" pdf

... 1998; Widdows and Dorow, 2002; Davidov and Rappoport, 2006) and meronymy (Berland and Charniak, 1999). In addition to these basic types, several studies deal with the discovery and labeling ... 2: For each concept word, collect instances of contexts in which the word appears together with one other content word. Call this other word a target word for that concept...
Uploaded: 23/03/2014, 18:20
  • 8
  • 330
  • 0
Scientific report: "Adaptive Chinese Word Segmentation" pptx

... segmented words that are either lexical words or OOV words with certain types (e.g. person name, morphological words, new words) we then have a system that can perform word segmentation and OOV word ... efficiency and the best results achieved in our experiments. Given the linear models, the procedure of word segmentation in our system is as follows: First, all word c...
Uploaded: 23/03/2014, 19:20
  • 8
  • 336
  • 0
Scientific report: "Fully Unsupervised Core-Adjunct Argument Classification" pot

... belong to a closed class cluster as a head word (an argument can have several head words). A closed class is a class of function words with relatively few word types, each of which is very frequent. Typical ... the argument using the unsupervised tools above. Each word in the argument is now represented by its word form (without lemmatization), its unsupervised POS tag a...
Uploaded: 30/03/2014, 21:20
  • 11
  • 356
  • 0
Scientific report: "Boosting Statistical Word Alignment Using Labeled and Unlabeled Data" ppt

... distortion probabilities for head words and non-head words are estimated in (6) and (7) with the labeled data, respectively. Where and are the trained supervised model and unsupervised model, respectively. ... occurring frequency of word aligned to i e i φ target words in the labeled data. Word Alignment Model 0 p and describe the fertility probabilities for ....
Uploaded: 08/03/2014, 02:21
  • 8
  • 451
  • 1
Scientific report: "Effects of Word Confusion Networks on Voice Search" pdf

... query is “night club for 18 and up”. We know “night club” is the main subject. And “18 and up” is a constraint. Without matching “night club”, any match with “18 and up” is meaningless. The ... search, which searches words in q i j within a specific distance. For instance, “burlington west virginia” ∼ 5 will find entries that include these three words within 5 words of each other....
Uploaded: 08/03/2014, 21:20
  • 8
  • 389
  • 0
