... + I_{i,r} (j<i : r_i = r_j) θ_{x_i,x_j} / (w) θ_{w,x_i} (2) ³ While the number of rhyme schemes of length n is technically the number of partitions of an n-element set (the Bell number), only a subset of these are typically used. P ... extremely useful for large-scale statistical analyses of poetic texts. • Historical Linguistics/Study of Dialects: Rhymes of a word in poetry...
Uploaded: 07/03/2014, 22:20
... Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 428–435, Sydney, July 2006. © 2006 ... Computational Linguistics [Figure: plot with axes labeled "entropy" and "offset"]
Uploaded: 20/02/2014, 12:20
Scientific report: "Unsupervised Learning of Acoustic Sub-word Units" pot
... France emmanuel.dupoux@gmail.com Abstract Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model ... improvement in the efficacy of the SSS algorithm as described in Section 2. It is based on observing that the improvement in the goodness...
Uploaded: 08/03/2014, 01:20
Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus" pot
... indicates an improvement of 22-38% in average precision over unstemmed text, and 96% of the performance of the proprietary stemmer above. 1 Introduction Stemming is the process of normalizing word ... two examples use the joint probability of the prefix and suffix, with a smoothing back-off (the product of the individual probabilities). Scoring models of this form proved to...
Uploaded: 08/03/2014, 04:22
Scientific report: "Unsupervised Decomposition of a Document into Authorial Components" pdf
... One of the advantages of using biblical literature is the availability of a great deal of manual annotation. In particular, we are able to identify synsets by exploiting the availability of ... that’s the nature of the clustering algorithm, but in fact are not part of what we might think of as the core of either cluster. Informally, we say that a unit is in the core...
Uploaded: 17/03/2014, 00:20
Scientific report: "Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering" ppt
... All of them employ a syntactic version of Harris’ distributional hypothesis: Words of similar parts of speech can be observed in the same syntactic contexts. Contexts in that sense are often ... state-of-the-art approaches, the kind and number of different tags is generated by the method itself. We compute and merge two partitionings of word graphs: one based on context...
Uploaded: 17/03/2014, 04:20
Scientific report: "Automatic Discovery of Named Entity Variants – Grammar-driven Approaches to Non-alphabetical Transliterations" pptx
... proposal has great potential of increasing robustness of future NER work by enabling discovery of new and unknown transliterated NE’s. Our study shows that resolution of transliterated NE variations ... Taiwan shukai@gmail.com Abstract Identification of transliterated names is a particularly difficult task of Named Entity Recognition (NER), especially in the Chinese context....
Uploaded: 17/03/2014, 04:20
Scientific report: "Unsupervised Learning of Dependency Structure for Language Modeling" potx
... Learning of Dependency Structure for Language Modeling Jianfeng Gao Microsoft Research, Asia 49 Zhichun Road, Haidian District Beijing 100080 China jfgao@microsoft.com Hisami Suzuki Microsoft ... particular, the probability of Equation (11) backs off to the estimate of P(w_j | R), which is computed as: P(w_j | R) = C(w_j, R) / N, (14) where N is the total number of dependencie...
Uploaded: 17/03/2014, 06:20
Scientific report: "Towards resolution of bridging descriptions" docx
... Background As part of our research on definite description (DD) interpretation, we asked 3 subjects to classify the uses of DDs in a corpus using a taxonomy related to the proposals of (Hawkins, ... DDs) found a total of 240 relations, distributed over 107 cases of DDs. There were 54 correct resolutions (distributed over 34 DDs) and 186 false positives. Types of bridg...
Uploaded: 08/03/2014, 21:20
Scientific report: "Semantic Transliteration of Personal Names" docx
... decision of gender had led to deterioration in MRR performance of the male names compared to the case where no prior information was assumed. Soft decision of gender yielded further gains of 17.1% ... the C-C corpus, out of the total of 4,507 characters, only 776 of them are for surnames. It is interesting to find that female given names are represented by a smaller set...
Uploaded: 17/03/2014, 04:20