Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... accurate word n-gram language model directly from characters of arbitrary language, without any word indications. 1 Introduction Word is no trivial concept in many languages. Asian languages ... unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model...

Uploaded: 17/03/2014, 01:20

Scientific report: "Fully Unsupervised Word Segmentation with BVE and MDL" pdf

... Introduction The goal of unsupervised word segmentation is to discover correct word boundaries in natural language corpora where explicit boundaries are absent. Often, unsupervised word segmentation algorithms rely ... the correct segmentation for a given language. The goal of fully unsupervised word segmentation, then, is to recover the correct boundaries for arbitr...

Uploaded: 30/03/2014, 21:20

Scientific report document: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

... suffer from the same dilemma. Word segmentation is supposed to identify word boundaries in a running text, and words defined by these boundaries are then compared with the mental/electronic lexicon ... modeling segmentation as character classification (Xue, 2003; Gao et al., 2004). This approach observes that by classifying characters as word-initial, word-final, penultimate, e...

Uploaded: 20/02/2014, 12:20

Scientific report: "Lexicalized phonotactic word segmentation" pptx

... most words are classified as unknown. To classify a word, we compare its frequency w as a word in the segmentation to the frequencies p and s with which it occurs as a prefix and suffix of words in ... free-standing. Languages with many affixes have longer words, e.g. my Arabic data averages 5.6 phones per word. Pauses are vital for deciding what is an affix. Attempts to segment trans...

Uploaded: 17/03/2014, 02:20

Scientific report: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers" pptx

... 29–32, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers Han-Cheol Cho†, Do-Gil Lee§, Jung-Tae Lee§, ... module. 1 Introduction Word segmentation (WS) has been a fundamental research issue for languages that do not have word boundary markers (WBMs); on the contrary, other language...

Uploaded: 17/03/2014, 02:20

Scientific report: "Adaptive Chinese Word Segmentation" pptx

... segmented words that are either lexical words or OOV words with certain types (e.g. person name, morphological words, new words) we then have a system that can perform word segmentation and OOV word ... Chinese word segmentation, as described in 3.2. We also perform NWI on Bakeoff AWS w/o NW AWS w/ NW (post-processor) AWS w/ NW (unified approach) word segmentation word...

Uploaded: 23/03/2014, 19:20

Scientific report document: "Learning Sub-Word Units for Open Vocabulary Speech Recognition" doc

... coherence. Hybrid word/sub-word recognizers can produce a sequence of sub-word units in place of OOV words. Ideally, the recognizer outputs a complete word for in-vocabulary (IV) utterances, and sub-word ... words independently, keeping fixed all other segmentations. Still, even sampling a single word's segmentation requires enumerating probabilities for all possible segmentatio...

Uploaded: 20/02/2014, 04:20

Scientific report: "The order of prenominal adjectives in natural language generation" doc

... feature f. In other words, knowing the value of a feature with a higher G gets us closer on average to knowing the class of an instance than knowing the value of a feature with a lower G does. The ... English. 2 Word bigram model The problem of generating ordered sequences of adjectives is an instance of the more general problem of selecting among a number of possible outputs from...

Uploaded: 08/03/2014, 05:20

Scientific report: "Acquisition of Conceptual Data Models from Natural Language Descriptions" doc

... representation language from the analyst. It can alternatively help the analyst learn the specification language by displaying the graphical representation of a given description. With a natural language ... representation languages, in place of traditional approaches which rely heavily on natural language narrative to specify processing requirements. A typical representati...

Uploaded: 09/03/2014, 01:20

Scientific report: "MENTING A DATABASE KNOWLEDGE REPRESENTATION FOR NATURAL LANGUAGE GENERATION" docx

... compare the values of the DDAs of one sub-class with the values of the same attributes within a sibling sub-class. The values of relational attributes within a sub-class are also recorded by ENHANCE. ... Calculating the potential-DDAs requires comparing the values of the attributes within the sub-class with the values within each other sub-class in turn. This calculation yields tw...

Uploaded: 17/03/2014, 19:21
