Chinese word segmentation with a maximum entropy approach

... another character; another tag for a character that occurs in the middle of a word; another tag for a character that ends a word; and another tag for a character that occurs as a single-character ... incorporate additional dictionary features based on an external word list, and to use extra training data annotated in other word segmentation standards. Corpora of different...

Uploaded: 03/10/2015, 20:33

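The first excerpt casts segmentation as tagging each character with one of four tags (a character that begins a word, one in the middle of a word, one that ends a word, and a single-character word), which a maximum entropy classifier then predicts from context. The minimal Python sketch below covers only the conversion between a segmented sentence and such a tag sequence. The tag letters `B`, `M`, `E`, `S` and the function names are hypothetical labels chosen for illustration, and the classifier itself is not shown.

```python
def words_to_tags(words):
    """Tag each character (hypothetical labels): B = begins a multi-character word,
    M = middle of a word, E = ends a multi-character word, S = single-character word."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))  # zero or more middle characters
            tags.append("E")
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their (e.g. predicted) tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes after an E or S tag
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed sequence that never closes the last word
        words.append(current)
    return words


if __name__ == "__main__":
    words = ["我", "喜欢", "北京大学"]
    tags = words_to_tags(words)
    print(tags)  # ['S', 'B', 'E', 'B', 'M', 'M', 'E']
    print(tags_to_words("".join(words), tags))  # ['我', '喜欢', '北京大学']
```

Because decoding only needs to know where words close, a tagger's output can be turned back into words without any lexicon lookup.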
Scientific report: "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach"

... References: Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz J. Och, Noah A. Smith, and David Yarowsky. 1999. Statistical machine translation, final report, ... to adaptive statistical language modeling. Computer Speech and Language, 10:187–228. Christoph Tillmann and Hermann Ney. 2000. Word re-ordering and DP-based search in statistical...

Uploaded: 20/02/2014, 18:20

Scientific report: "Chinese Segmentation with a Word-Based Perceptron Algorithm"

... bigram w1 w2; single-character word w; a word starting with character c and having length l; a word ending with character c and having length l; space-separated characters c1 and c2; character bigram ... each candidate in the source agenda and puts the generated candidates onto the target agenda. After each character is processed, the items in the target agenda are copied to the source agen...

Uploaded: 08/03/2014, 02:21

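The agenda mechanism in the word-based perceptron excerpt can be read as a beam search: for each character, every candidate on the source agenda is extended either by starting a new word or by appending the character to the last word, and the best candidates on the target agenda become the next source agenda. This is a minimal Python sketch under stated assumptions: the scoring function `toy_score` and its tiny `LEXICON` are hypothetical stand-ins for the perceptron's learned feature weights, and `beam_size=4` is an arbitrary choice.

```python
import heapq

# Hypothetical toy lexicon; a real system scores candidates with learned perceptron weights.
LEXICON = {"我", "喜欢", "北京", "大学", "北京大学"}


def toy_score(words):
    """Hypothetical score: reward lexicon words (longer ones more), penalize anything else."""
    return sum(len(w) ** 2 if w in LEXICON else -1 for w in words)


def beam_segment(sentence, score, beam_size=4):
    source = [[]]  # source agenda: partial segmentations, each a list of words
    for ch in sentence:
        target = []  # target agenda for this character
        for cand in source:
            target.append(cand + [ch])  # option 1: ch starts a new word
            if cand:
                target.append(cand[:-1] + [cand[-1] + ch])  # option 2: append ch to the last word
        # Keep the best candidates and copy them back to the source agenda
        source = heapq.nlargest(beam_size, target, key=score)
    return max(source, key=score)


if __name__ == "__main__":
    print(beam_segment("我喜欢北京大学", toy_score))  # ['我', '喜欢', '北京大学']
```

With a trained model, `score` would instead sum the weights of word-level features such as those listed in the excerpt (word bigrams, single-character words, word start and end characters with length), and the beam keeps the search linear in sentence length.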
Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation"

... In AAAI, pages 412–418. C. Kruengkrai, K. Uchimoto, J. Kazama, Y. Wang, K. Torisawa, and H. Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging ... on POS tagging. The proposed constrained taggers as described above can achieve near state-of-the-art POS tagging accuracy in a much more efficient manner. 5.4 Chinese word segmenta...

Uploaded: 07/03/2014, 18:20

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... tagging (Collins, 2002), Chinese word segmentation (Ng and Low, 2004; Zhang and Clark, 2007) and so on. We trained a character-based perceptron for Chinese Joint S&T, and found that the perceptron ... the POS information and reported the F-measure on segmentation only, while the second performed Joint S&T using POS information and reported the F-measure both on seg...

Uploaded: 08/03/2014, 01:20

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... F-score performance on the test data. Conclusion and Future Work. This paper has described a stacked sub-word model for joint Chinese word segmentation and POS tagging. We defined a sub-word structure ... 2005). In this work, stacked learning is used to acquire extended training data for sub-word tagging. 3.1 Method Architecture. In our stacked sub-word model...

Uploaded: 17/03/2014, 00:20

Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation"

... (Section 5). The output of our parser incorporates word structures naturally. Evaluation shows that the model can learn much of the regularity of word structures, and also achieves reasonable accuracy ... treebank and check each of them manually. Words with non-trivial structures are thus annotated. Finally, we install these small trees of words into the original treebank. Wh...

Uploaded: 17/03/2014, 00:20

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study"

... two annotation standards are naturally denoted as source standard and target standard, while the classifiers following the two annotation standards are respectively named source classifier and ... for Segmentation and Tagging. Table also lists the results of annotation adaptation experiments. For word segmentation, the model after annotation adaptation (row in upper...

Uploaded: 17/03/2014, 01:20

Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection"

... purpose fast online training method, ADF. The proposed training method requires only a few passes to complete the training. We propose a joint model for Chinese word segmentation and new word detection ... features + New word detection + ADF training (replacing SGD training). The results are shown in Table 259. As we can see, the new features improved performance...

Uploaded: 23/03/2014, 14:20

CUSTOMER SATISFACTION MEASUREMENT MODELS: GENERALISED MAXIMUM ENTROPY APPROACH

... estimation approach in solving the customer satisfaction models. The proposed method can be used to compute the CSI based on statistical information about the customer satisfaction measurement model. CUSTOMER SATISFACTION ... The European customer satisfaction index model, which is an economic indicator, is represented in Figure 2 (diagram labels: Perceived Quality, Customer Complaints, Perceived Value, Custom...)

Uploaded: 19/10/2013, 07:15

Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification"

... co-occurrence. Word-based model: In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word boundaries ... contextual background providing information about the likelihood of whether each CB is also a wordbreak (WB). In other words, we model Chinese word segmentation as...

Uploaded: 20/02/2014, 12:20

Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data"

... co-occurrence probability of x and y, and $p(x)$, $p(y)$ are the independent probabilities of x and y respectively. As claimed by Church (1991), the larger the mutual information between x and y, the higher the ... v, x, y and w: (1) $ts_{v,y}(x) > 0$ and $ts_{x,w}(y) < 0$ (x tends to combine with y, and y tends to combine with x) $\Rightarrow dts(x{:}y) > 0$. In this case, x and y attract each other. The locatio...

Uploaded: 20/02/2014, 18:20

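The mutual information mentioned in the excerpt is commonly computed between adjacent characters as pointwise mutual information, $MI(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}$. A minimal Python sketch, assuming maximum-likelihood estimates from raw character and adjacent-pair counts; the function name `adjacent_pmi` is hypothetical, and the excerpt's t-score ($ts$) and $dts$ measures are not implemented here.

```python
import math
from collections import Counter


def adjacent_pmi(corpus):
    """Pointwise mutual information log2(p(x, y) / (p(x) p(y))) for adjacent character pairs."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        unigrams.update(sentence)                    # count single characters
        bigrams.update(zip(sentence, sentence[1:]))  # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi        # co-occurrence probability of x followed by y
        p_x = unigrams[x] / n_uni  # independent probability of x
        p_y = unigrams[y] / n_uni  # independent probability of y
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


if __name__ == "__main__":
    scores = adjacent_pmi(["北京", "北京", "北大"])
    print(scores[("北", "京")])  # close to 2.0: the pair occurs 4x more often than chance
```

A large positive value suggests the two characters attract each other and likely belong to one word. Estimates from small counts tend to overrate rare pairs, so a frequency threshold is often applied in practice.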
Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling"

... so that word length will have a Poisson distribution whose parameter can now be estimated for a given language and word type. We describe this in detail in Section 4.3. Nested Pitman-Yor Language ... probabilities over words? If a lexicon is finite, we can use a uniform prior $G_0(w) = 1/|V|$ for every word $w$ in lexicon $V$. However, with word segmentation every substring could b...

Uploaded: 17/03/2014, 01:20

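The excerpt notes that a uniform prior $G_0(w) = 1/|V|$ only works for a finite lexicon, whereas in segmentation any substring could be a word. The Python sketch below contrasts that prior with a simple base measure over all nonempty strings, built from a word-length distribution and per-character probabilities. This is an illustrative assumption, not the paper's model: the paper uses a character-level Pitman-Yor n-gram with a Poisson length correction, while here characters are uniform over an alphabet of size `alphabet_size`, and length follows a Poisson distribution shifted by one (a hypothetical choice) so that no mass goes to the empty string. Both function names are hypothetical.

```python
import math
from itertools import product


def uniform_g0(word, lexicon):
    """Uniform base measure over a finite lexicon V: 1/|V| for words in V, 0 otherwise."""
    return 1.0 / len(lexicon) if word in lexicon else 0.0


def poisson_length_g0(word, alphabet_size, lam):
    """Hypothetical base measure over every nonempty string:
    P(length = k) = Poisson(k - 1; lam), each character uniform over the alphabet."""
    k = len(word)
    if k == 0:
        return 0.0
    p_len = math.exp(-lam) * lam ** (k - 1) / math.factorial(k - 1)
    return p_len * (1.0 / alphabet_size) ** k


if __name__ == "__main__":
    lexicon = {"北京", "大学"}
    print(uniform_g0("北京", lexicon))  # 0.5
    print(uniform_g0("京大", lexicon))  # 0.0: a substring outside V gets no mass
    # Mass over all strings of length 1..14 on a 2-letter alphabet sums to about 1
    total = sum(
        poisson_length_g0("".join(chars), 2, 1.5)
        for k in range(1, 15)
        for chars in product("ab", repeat=k)
    )
    print(round(total, 6))
```

Unlike the uniform prior, the length-based measure gives every candidate substring a nonzero probability that decays with length, which is the property an unsupervised segmenter needs.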
Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging"

... word-character hybrid model for joint Chinese word segmentation and POS tagging. Our approach has two important advantages. The first is robust search space representation based on a hybrid model in which word-level ... levels of information about words and POS tags. Let us introduce some notation. We write $w_{-1}$ and $w_0$ for the surface forms of words, where subscripts...

Uploaded: 17/03/2014, 01:20

Scientific report: "Discriminative Pruning of Language Models for Chinese Word Segmentation"

... Stochastic Finite-State Word-Segmentation Algorithm for Chinese. Computational Linguistics, 22(3): 377-404. Andreas Stolcke. 1998. Entropy-based Pruning of Backoff Language Models. In Proc. of DARPA News Transcription ... appears in Figure (Performance Comparison of Combined Model and KLD Model). Conclusions and Future Work. A discriminative pruning criterion of n-gram language...

Uploaded: 17/03/2014, 04:20
