... 9.71M 0.50M 9.45M 1.19M 4.1 Using a morphological tagger and disambiguator The split version of the corpus contains words that are split into their stem and suffix forms by using a previously developed ... 2 gives the total log-probability (using log_2) for the split and unsplit datasets using n-gram models of different order. We compute the perplexity of the two datasets using a common denominator: ... that using differing dependency offsets for stems and suffixes can improve the perplexity. 345 which corresponds to a 27% improvement. 4.2 Separation of stem and suffix models Only 45% of the words...
Uploaded: 20/02/2014, 09:20
... to combine sentences. Sentences include transition phrases and conjunctive adverbs. Transition phrases are phrases that act as linking words. Transition phrases help establish clear ... II: A study on using transition signals in writing a paragraph; II.1. The function of transition signals in writing; II.2. The position and punctuation of transition signals ... II.3. Using transition signals in writing an English paragraph; II.3.1. Transition signals in clauses and sentences; II.3.1.1 Transition signals in clauses...
Uploaded: 19/03/2014, 17:11
Kalman Filtering: Theory and Practice Using MATLAB - Grewal and Andrews
Uploaded: 08/04/2014, 10:14
Document - Scientific report: "Extracting Semantic Orientations of Words using Spin Model" pdf
... (Schmid, 1994). 35 stopwords (quite frequent words such as “be” and “have”) are removed from the lexical network. Negation words include 33 words. In addition to usual negation words such as “not” ... extracted the words tagged with “Positiv” or “Negativ”, and reduced multiple-entry words to single entries. As a result, we obtained 3596 words (1616 positive words and 1980 negative words). ... computation converged. The words with high final average values are classified as positive words. The words with low final average values are classified as negative words. 4.3 Hyper-parameter Prediction The...
Uploaded: 20/02/2014, 15:20
Document - Scientific report: "An Evaluation Method of Words Tendency using Decision " docx
... classes. The words belonging to each class are called increasing-words, relatively constant-words, and decreasing-words, respectively. Table 1 shows a sample of some classified words according ... the words in each group. Table 1 Sample of Classified Words Stability Class Example of words in each class Increasing Words Sammy-Sosa, McGwire, Carlos-Delgado Relatively constant words ... of word frequency with time-series variation included in both periods. The data of extracted words is shown in Table 2. In order to get the accuracy of the correct words that are words...
Uploaded: 20/02/2014, 16:20
Scientific report: "Using WordNet to Automatically Deduce Relations between Words in Noun-Noun Compounds" docx
... the two words in that compound. Sets of compounds from other sources would not have such associated definitions. Second, by using compounds from WordNet, we could guarantee that all constituent words ... that the correct relation between two words in a compound can be deduced by finding other compounds containing words from the same semantic categories as the words in the compound to be disambiguated: ... obtained for that relation from any other sense-pair, using the first term of the score tuple as the main key for comparison (lines 14 and 15), and using the second term as a tie-breaker (lines 16...
Uploaded: 08/03/2014, 02:21
Scientific report: "An Unsupervised Approach to Prepositional Phrase Attachment using Contextually Similar Words" potx
... Similar Words The contextually similar words of a word w are words similar to the intended meaning of w in its context. Below, we describe an algorithm for constructing contextually similar words ... parsed corpus. Attachment decisions are made using a linear combination of features and low frequency events are approximated using contextually similar words. Introduction Prepositional phrase attachment ... the contextually similar words of w. We retrieve from the collocation database the words that occurred in the same dependency relationship as w. We refer to this set of words as the cohort of w...
Uploaded: 08/03/2014, 05:20
Scientific report: "USING AN ONLINE DICTIONARY TO FIND RHYMING WORDS AND PRONUNCIATIONS FOR UNKNOWN WORDS " doc
... rhyming words, in WordSmith's rhyming dimension, for an unknown word. 2. Rhyme The WordSmith rhyme dimension is based on two files. The first is a main file keyed on the spelling of words ... words. They also show how answers to these psycholinguistic questions can, in turn, contribute to ... USING AN ON-LINE DICTIONARY TO FIND RHYMING WORDS AND PRONUNCIATIONS FOR UNKNOWN WORDS ... in the pronunciation of known words (Rosson, 1985). Until recently, it was generally assumed that novel words or pseudowords (letter strings which are not real words of English but which conform...
Uploaded: 08/03/2014, 18:20
Scientific report: "Efficient Unsupervised Discovery of Word Categories Using Symmetric Patterns and High Frequency Words" ppt
... number of words present in both C and WN divided by N; (2) Precision*: the number of correct words divided by N. Correct words are either words that appear in the WN subtree, or words whose ... manner, using meta-patterns comprised of high frequency words and content words. 2. Identification of pattern candidates that give rise to symmetric lexical relationships. This is done using simple ... of words present in both C and WN divided by the number of (single) words in WN; (4) The number of correctly discovered words (New) that are not in WN. The Table also shows the number of WN words...
Uploaded: 23/03/2014, 18:20
Scientific report: "Guessing Parts-of-Speech of Unknown Words Using Global Information" ppt
... POS tags of unknown words. We propose a probabilistic model for POS guessing of unknown words using global information as well as local information, and estimate its parameters using Gibbs sampling. ... estimated using all the training data (Figure 2, *2). Local ... Footnote 3: A major method for generating such pseudo unknown words is to collect the words that appear only once in a corpus (Nagata, 1999). These words ... unknown words whose lexical forms appear only once in the training or test data, so we process only non-unique unknown words (unknown words whose lexical forms appear more than once) using the proposed...
Uploaded: 23/03/2014, 18:20
Scientific report: "Using bilingual dependencies to align words in English/French parallel corpora" ppt
... align words using various syntactic relations in both languages, even though the category of the words under consideration is different. 5.4 Comparative evaluation The results achieved using ... of dependency relations to align words (Debili & Zribi, 1996). The reasoning is as follows (Figure 1): if there is a pair of anchor words, i.e. if two words w1 i (community in the example) ... anchor pair consisting of two words that are translations of one another within aligned sentences, the alignment link is propagated to syntactically connected words. 1 Introduction It is...
Uploaded: 23/03/2014, 19:20
Scientific report: "Aligning words using matrix factorisation" pptx
... produced: If the MAP assigns f-words but no e-words to a cept (because e-words have more probable cepts), we may produce “orphan” cepts, which are aligned to words only on one side. One way ... statistical model (e.g., IBM models), or it may be introduced to accommodate words which have a low association measure with all other words. Using PLSA, we can deal with null alignments in a principled way ... general M-N alignments. We formalise this using the notion of cepts: a cept is a central pivot through which a subset of e-words is aligned to a subset of f-words. General M-N alignments then correspond...
Uploaded: 23/03/2014, 19:20
Scientific report: "A Part of Speech Estimation Method for Japanese Unknown Words using a Statistical Model of Morphology and Context" pptx
... origin words. The Roman alphabet is also used for Western origin words and acronyms. Arabic numerals are used for numbers. Most Japanese words are written in kanji, while more recent loan words ... statistical model of Japanese unknown words using word morphology and word context. We find that Japanese words are better modeled by classifying words based on the character sets (kanji, ... unknown words in the system output to all unknown words in the test sentences. Precision is the percentage of correctly segmented unknown words in the system's output to all words...
Uploaded: 23/03/2014, 19:20
Scientific report: "Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes" docx
... distribution of OOV words (cf. Table 2). The c 1 model with θ = 1 is specialized for predicting words after unknown nouns and cardinal numbers and two thirds of the unknown words are of exactly that ... The word transition probability of such a model is given by equation 1, where c_i denotes the cluster of the word w_i. The class transition probability P(c_3 | c_1 c_2) is estimated using the ... The model could be further improved by using contextual information for the word clustering and training a classifier based on morphological features to assign OOV words to these clusters. Acknowledgments....
Uploaded: 30/03/2014, 21:20