Scientific report: "Combining Indicators of Allophony" doc

... extensive evaluation of individual indicators that rely on distributional or lexical information. Then, we present a first evaluation of the combination of indicators of different types, considering ... question of whether or not these indicators capture different aspects of allophony and, if so, which combination scheme yields better results. We present an extensive eval...

Uploaded: 30/03/2014, 21:20

6 299 0

Scientific report: "Unsupervised Decomposition of a Document into Authorial Components" pdf

... Linguistics Unsupervised Decomposition of a Document into Authorial Components Moshe Koppel Navot Akiva Idan Dershowitz Nachum Dershowitz Dept. of Computer Science Dept. of Bible School of Computer Science ... example is that of documents of historical significance that appear to be composites of multiple earlier texts. The challenge for literary scholars is to tease a...

Uploaded: 17/03/2014, 00:20

9 359 0

Scientific report: "Mechanical Translation of French" docx

... speaking, it is the longest part of a word common to all forms (inflections) of that word. The stem of donner, for example, is donn-; what remains of each inflection of donner after this has been ... a dictionary of a hundred, only fourteen will be necessary to cope with a dictionary of ten thousand or twenty-eight to cope with one of a hundred million. The capabilit...

Uploaded: 23/03/2014, 13:20

10 259 0

Scientific report: "AUTOMATED DETERMINATION OF SUBLANGUAGE" doc

... syntactic usage from a sample of text in the sublanguage. We describe the results of applying this procedure to three text samples: two sets of medical documents and a set of equipment failure messages. ... Growth in the size of the grammar as a function of the size of the text sample. X = the number of sentences (and sentence fragments) in the text sample; ~" = th...

Uploaded: 24/03/2014, 01:21

5 294 0

Scientific report: "Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL" potx

... complexity of a text’s syntactic structures through the traditional factor of the “mean number of words per sentence”, we decided to also take into account the difficulty of the conjugation of the ... field of readability has since produced many formulae based on simple lexical and syntactic measures such as the average number of syllables per word, the average length of sen...

Uploaded: 08/03/2014, 21:20

9 514 0

Scientific report: "The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification" potx

... tion requirement. Unfortunately one of the current trends in IE is the progressive reduction of the size of training corpora: e.g., from the 1,000 texts of the MUC-5 (MUC-5, 1993) to the 100 ... set of field labels. Field labels are indicators, generally used in dictionaries, which provide information about the use of the word in a semantic field. Semantic fields a...

Uploaded: 08/03/2014, 21:20

4 436 0

Scientific report: "Combining data and mathematical models of language change" ppt

... course of the N/V stress shift, and can thus be seen as possible sources of the diachronic dynamics of N/V pairs. This type of assumption is necessary for any hypothesis about the sources of ... analyze the dynamics of 5 dynamical systems models of linguistic populations, each derived from a model of learning by individuals. We compare each model’s dynamics to a set o...

Uploaded: 17/03/2014, 00:20

11 407 0

Scientific report: "Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD" pot

... in the annotations of nouns in SemCor or the very fine granularity of the nouns in WN. We know that 72% of the nouns, 74% of the verbs, 68.9% of the adjectives, and 81.9% of the adverbs directly ... example. Other types of errors resulted from the lack of a way to explicitly identify multiwords. Looking at the performance of TransCont we note that much of the loss is a re...

Uploaded: 17/03/2014, 00:20

10 443 0

Scientific report: "Automatic Detection of Syllable Boundaries Combining the Advantages of Treebank and Bracketed Corpora Training" docx

... describes that words are composed of syllables which consist of a string of phonemes or a single phoneme. The following table shows the frequencies of some of the rules of the analysis grammar that ... training corpus. The “slow” decrease of the number of unknown words of the test corpus is due to both the high amount of test data (242047 items) and the “slightly” growing...

Uploaded: 23/03/2014, 19:20

8 455 0

Scientific report: "Combining Source and Target Language Information for Name Tagging of Machine Translation Output" ppt

... result of the current token + that of the previous token F11: ET result of the current token + ET result of the previous token. F12: English NER result of the current token + that of the ... performance of current translation systems is not very good, and so the output is quite different from Standard English text. The fluency of the translated text will be poor, and t...

Uploaded: 31/03/2014, 00:20

6 288 0