... $+\; I_{i,r} \sum_{j<i:\, r_i = r_j} \theta_{x_i, x_j} \Big/ \sum_{w} \theta_{w, x_i}$ (2). ³ While the number of rhyme schemes of length n is technically the number of partitions of an n-element set (the Bell number), only a subset of these are typically used. ... is an edited version of the public-domain portion of the corpus used by Sonderegger (2011), and consists of just under 12000 stanzas spannin...
Uploaded: 07/03/2014, 22:20
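The footnote's point, that the possible rhyme schemes for an n-line stanza are the set partitions of its n lines and so number B_n (the Bell number), can be checked with a short sketch. It assumes schemes are written as letter strings such as `abab`, where the first line is `a` and each new rhyme group takes the next unused letter; this "restricted growth string" encoding gives exactly one string per partition. The function names are illustrative, not taken from the paper, and the letter alphabet limits the sketch to n ≤ 26.

```python
from string import ascii_lowercase


def rhyme_schemes(n: int) -> list[str]:
    """Enumerate every rhyme scheme of length n (one string per set partition)."""
    schemes: list[str] = []

    def extend(prefix: str, used: int) -> None:
        # `used` is the number of rhyme groups opened so far.
        if len(prefix) == n:
            schemes.append(prefix)
            return
        # The next line either rhymes with an existing group or opens a new one.
        for k in range(used + 1):
            extend(prefix + ascii_lowercase[k], max(used, k + 1))

    extend("", 0)
    return schemes


def bell_number(n: int) -> int:
    """Compute the Bell number B_n with the Bell triangle."""
    row = [1]
    for _ in range(n):
        # Each row starts with the last entry of the previous row.
        next_row = [row[-1]]
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return row[0]
```

For a quatrain, `len(rhyme_schemes(4))` equals `bell_number(4)`, i.e. 15 candidate schemes, including `abab`, `aabb`, and `abba`. Since, per the footnote, only a subset is typically used, a model would prune this list further; which subset is not specified in the snippet.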
... is the left part of the word, RP is the right part, Len(P) is the length of part P (number of characters), freq(P) is the frequency of part P in the corpus, WN is the number of words (corpus ... length of the corpus. Given a probabilistic model of the corpus, the description length is the sum of the most compact statement of the model expressible in some universal la...
Uploaded: 17/03/2014, 22:20
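The snippet breaks off before its own formula, so the following is only a generic two-part description-length (MDL) sketch, not the paper's cost function: total length = bits to state the model + bits to encode the corpus given the model, with the data term computed as −Σ log2 P(part) from relative frequencies. Here `len(part)` and the part counts loosely play the roles of Len(P) and freq(P). The model-cost encoding (a fixed `bits_per_char` per character of each distinct part) and the name `description_length` are assumptions for illustration.

```python
import math
from collections import Counter


def description_length(segmented_corpus: list[list[str]], bits_per_char: float = 5.0) -> float:
    """Illustrative two-part MDL cost of a segmented corpus, in bits.

    segmented_corpus: each word given as its list of parts, e.g. ["walk", "ed"].
    bits_per_char: assumed cost of spelling one character in the model's lexicon.
    """
    counts = Counter(part for word in segmented_corpus for part in word)
    total = sum(counts.values())
    # Model cost: spell out each distinct part once (a deliberately simple encoding).
    model_bits = sum(len(part) * bits_per_char for part in counts)
    # Data cost: each part token costs -log2 of its relative frequency.
    data_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return model_bits + data_bits
```

On the toy corpus walked/walking/talked/talking, keeping whole words costs 138 bits under these assumptions, while splitting into walk/talk + ed/ing costs 81 bits, so the shorter description favours the segmentation that reuses parts.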
Scientific report: "Unsupervised Discovery of Domain-Specific Knowledge from Text"
... Computational Linguistics. Unsupervised Discovery of Domain-Specific Knowledge from Text. Dirk Hovy, Chunliang Zhang, Eduard Hovy, Information Sciences Institute, University of Southern California, 4676 Admiralty ... Research with a Series of Reading Tasks. In Proceedings of LREC 2010. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: a core of semantic knowled...
Uploaded: 23/03/2014, 16:20
Scientific report: "Unsupervised Discovery of Generic Relationships Using Pattern Clusters and its Evaluation by Automatically Generated SAT Analogy Questions"
... sets of pairs into several clusters, where each cluster corresponds to one of a known set of relationship types. Their classification setting is thus very different from our unsupervised discovery ... appears in instances of this pattern extracted from the original corpus or retrieved from the web during evaluation (see Section 5.2). Thus if some pair appears in most of pat...
Uploaded: 23/03/2014, 17:20
Scientific report: Kinetic characterization of methionine γ-lyases from the enteric protozoan parasite Entamoeba histolytica against physiological substrates and trifluoromethionine, a promising lead compound against amoebiasis
... Nozaki¹ ¹Department of Parasitology, Gunma University Graduate School of Medicine, Japan; ²Department of Applied Biology, Graduate School of Science and Technology, Kyoto Institute of Technology, ... development of new chemotherapeutics against amoebiasis. For the further development of antiamoebic agents based on TFM, elucidation of the underlying reaction mechanisms of M...
Uploaded: 07/03/2014, 05:20
Scientific report: "Unsupervised Segmentation of Chinese Text by Use of Branching Entropy"
... Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 428–435, Sydney, July 2006. © 2006 ... Computational Linguistics. [Figure: entropy plotted against offset]
Uploaded: 20/02/2014, 12:20
Scientific report: "Unsupervised Learning of Acoustic Sub-word Units"
... France, emmanuel.dupoux@gmail.com. Abstract: Accurate unsupervised learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model ... conditional models of the acoustics given a phoneme-sequence. The phonemic pronunciation of words and the phonemes of the languag...
Uploaded: 08/03/2014, 01:20
Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus"
... joint probability of the prefix and suffix, with a smoothing back-off (the product of the individual probabilities). Scoring models of this form proved to be poor performers from the beginning, ... drawbacks that prevent us from using it on a corpus other than the training corpus. Both of the drawbacks below are brought about by the small size of the parallel corpus: • Out-of...
Uploaded: 08/03/2014, 04:22
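The scoring model described in the Arabic stemming snippet, a joint probability of prefix and suffix that backs off to the product of the individual probabilities, can be sketched as follows. The snippet does not say how the two estimates are combined, so this sketch assumes the simplest reading: use the relative-frequency joint estimate when the (prefix, suffix) pair was observed, and P(prefix) · P(suffix) otherwise. The function names and the romanized training triples are hypothetical.

```python
from collections import Counter


def train_affix_model(triples: list[tuple[str, str, str]]) -> tuple[Counter, Counter, Counter, int]:
    """Count (prefix, suffix) pairs and each affix from (prefix, stem, suffix) triples."""
    pairs = Counter((prefix, suffix) for prefix, _, suffix in triples)
    prefixes = Counter(prefix for prefix, _, _ in triples)
    suffixes = Counter(suffix for _, _, suffix in triples)
    return pairs, prefixes, suffixes, len(triples)


def affix_score(prefix: str, suffix: str, model: tuple[Counter, Counter, Counter, int]) -> float:
    """Joint P(prefix, suffix), backing off to P(prefix) * P(suffix) for unseen pairs."""
    pairs, prefixes, suffixes, n = model
    if pairs[(prefix, suffix)] > 0:
        # Seen pair: relative-frequency estimate of the joint probability.
        return pairs[(prefix, suffix)] / n
    # Unseen pair: back off to the product of the individual probabilities.
    return (prefixes[prefix] / n) * (suffixes[suffix] / n)
```

Trained on the four hypothetical triples `("al", "kitab", "")`, `("al", "madras", "at")`, `("wa", "kitab", "")`, `("", "kitab", "")`, the seen pair `("al", "")` scores 0.25 and the unseen pair `("wa", "at")` backs off to 0.25 × 0.25 = 0.0625. Because nothing is discounted, the scores do not form a normalized distribution; they only rank candidate splits.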
Scientific report: "Unsupervised Decomposition of a Document into Authorial Components"
... One of the advantages of using biblical literature is the availability of a great deal of manual annotation. In particular, we are able to identify synsets by exploiting the availability of ... are not part of what we might think of as the core of either cluster. Informally, we say that a unit is in the core of its cluster if it is sufficiently similar to the centro...
Uploaded: 17/03/2014, 00:20
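The informal definition in the authorial-decomposition snippet, that a unit is in the core of its cluster if it is sufficiently similar to the centroid, can be made concrete as below. The snippet is cut off before the paper specifies its representation or similarity measure, so cosine similarity, the default threshold of 0.5, and feature vectors given as plain lists of floats are all assumptions of this sketch.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def core_units(cluster: list[list[float]], threshold: float = 0.5) -> list[int]:
    """Indices of units whose similarity to the cluster centroid meets the threshold."""
    dim = len(cluster[0])
    # Centroid: component-wise mean of the cluster's vectors.
    centroid = [sum(vec[i] for vec in cluster) / len(cluster) for i in range(dim)]
    return [idx for idx, vec in enumerate(cluster) if cosine(vec, centroid) >= threshold]
```

For the cluster `[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]` the centroid is (2/3, 1/3); the first two units have similarity about 0.89 and fall in the core, while the third (about 0.45) does not. Raising the threshold shrinks the core toward the most typical units.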
Scientific report: "Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering"
... All of them employ a syntactic version of Harris’ distributional hypothesis: Words of similar parts of speech can be observed in the same syntactic contexts. Contexts in that sense are often ... state-of-the-art approaches, the kind and number of different tags is generated by the method itself. We compute and merge two partitionings of word graphs: one based on context...
Uploaded: 17/03/2014, 04:20
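Harris' distributional hypothesis, as quoted in the part-of-speech snippet, says that words of the same part of speech occur in the same syntactic contexts. A minimal sketch of a context-based word graph: represent each word by counts of its immediate left and right neighbours, link words whose context vectors are similar, and read clusters off the graph. The one-word neighbour window, the 0.5 similarity threshold, and the connected-components step are assumptions; connected components is a deliberately simple stand-in, not the paper's graph clustering algorithm.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences: list[list[str]]) -> dict[str, Counter]:
    """Map each word to counts of its left ("L:") and right ("R:") neighbours."""
    vectors: dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vectors[padded[i]]["L:" + padded[i - 1]] += 1
            vectors[padded[i]]["R:" + padded[i + 1]] += 1
    return dict(vectors)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def word_clusters(sentences: list[list[str]], threshold: float = 0.5) -> list[set[str]]:
    """Cluster words as connected components of a context-similarity graph."""
    vectors = context_vectors(sentences)
    words = sorted(vectors)
    # Word graph: an edge joins two words whose contexts are similar enough.
    graph: dict[str, set[str]] = {w: set() for w in words}
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if cosine(vectors[u], vectors[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    # Stand-in clustering: depth-first search for connected components.
    clusters: list[set[str]] = []
    seen: set[str] = set()
    for w in words:
        if w in seen:
            continue
        stack, component = [w], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters
```

On the toy sentences "the cat sleeps", "the dog sleeps", "a cat runs", "a dog runs", this groups {a, the}, {cat, dog}, and {runs, sleeps}, i.e. determiners, nouns, and verbs, without any tag set given in advance. On real text, connected components tend to chain unrelated words together, which is why a dedicated graph clustering method is needed.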