... the head ‘rescue’, and lemma:failing arg:ARG1 var:bank which indicates that the argument of ‘failing’ is ‘bank’. Note that any tree can be transformed into a feature for a particular lexical item ... (The pattern-based approach uses a set of manually-constructed patterns applied to a web search.) In the same vein, Geffet and Dagan (2005) filter the result of a pattern-based system...
Uploaded: 08/03/2014, 21:20
... boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and easily generalises ... 2006). They used a natural language tagger which was trained on the output of ParaMor and Morfessor. The goal was to mimic each algorithm since ParaMor is rule-based and there is no acc...
Uploaded: 20/02/2014, 04:20
Document - Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx
... character)
12: tag t on a word starting with char c_0 and containing char c
13: tag t on a word ending with char c_0 and containing char c
14: tag t on a word containing repeated char cc
15: tag ...
... w
4: a word of length l with starting character c
5: a word of length l with ending character c
6: space-separated characters c_1 and c_2
7: character bigram c_1 c_2 in any word
8: th...
Uploaded: 20/02/2014, 09:20
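A few of the feature templates quoted in the snippet above (4 to 7 and 12 to 14) can be sketched in code. This is only an illustrative reading of the truncated excerpt, not the paper's implementation: the function names and the feature-string format are hypothetical, and "containing char c" is assumed to mean any other character of the same word.

```python
from typing import List


def word_features(word: str) -> List[str]:
    """Templates 4-5: word length paired with its first / last character."""
    n = len(word)
    return [f"len_start={n}:{word[0]}", f"len_end={n}:{word[-1]}"]


def boundary_feature(prev_word: str, next_word: str) -> str:
    """Template 6: the two characters on either side of a word boundary."""
    return f"sep={prev_word[-1]}|{next_word[0]}"


def char_bigram_features(word: str) -> List[str]:
    """Template 7: every adjacent character pair inside a single word."""
    return [f"bigram={a}{b}" for a, b in zip(word, word[1:])]


def tag_char_features(word: str, tag: str) -> List[str]:
    """Templates 12-14 (assumed reading): tag combined with word characters."""
    feats = []
    # 12: tag + first character + each later character of the word
    for c in word[1:]:
        feats.append(f"tag_start_contain={tag}:{word[0]}:{c}")
    # 13: tag + last character + each earlier character of the word
    for c in word[:-1]:
        feats.append(f"tag_end_contain={tag}:{word[-1]}:{c}")
    # 14: tag + any immediately repeated character
    for a, b in zip(word, word[1:]):
        if a == b:
            feats.append(f"tag_repeat={tag}:{a}{b}")
    return feats
```

In a perceptron-style model, strings like these would typically be used as keys into a sparse weight table; that usage is likewise an assumption here, since the excerpt only lists the templates.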
Document - Scientific report: "Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy" pptx
... Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy Kim Gerdes Lattice, Université Paris 7 75251 Paris Cedex 05 France kim@linguist.jussieu.fr Sylvain Kahane Lattice, ... occupied by a non-verbal phrase or by a verb creating an embedded domain. 3 Formalization A grammar in the formalism we introduce in the following will be called a Topological...
Uploaded: 20/02/2014, 18:20
Document - Scientific report: "A Method for Correcting Errors in Speech Recognition Using the Statistical Features of Character Co-occurrence" pptx
... method also obtains reliably recognized partial segments of an utterance by cooperatively using both grammatical and n-gram based statistical language constraints, and uses a robust parsing technique ... "Hai arigatou gozaimasu Kyoto Kanko Hoteru yoyaku gakari de gozaimasu", ('Thank y...
Uploaded: 20/02/2014, 18:20
Document - Scientific report: "Choosing the Word Most Typical in Context Using a Lexical Co-occurrence Network" ppt
... 'synonyms' is the most appropriate for achieving the desired pragmatic goals; but this is necessary for high-quality machine translation and natural language generation. Knowledge-based approaches ... always typical. A particular word might occur in a 'pattern' in which another synonym was seen more often, making it the typical choice. Thus, we cannot exp...
Uploaded: 22/02/2014, 03:20
Scientific report: "Constituent-Based Morphological Parsing: A New Approach to the Problem of Word-Recognition" pdf
... morphological and syntactic parsers on a more complicated example: Ngarrka-ngku.ka marlu marna-kurra luwa.rnu ngarni.nja-kurra (man-ergative-aux kangaroo grass-obj shoot-past eat-infinitive-obj) 'The ... Figure 2a is the phonological representation for the sentence: ngarrka.ngku.ka marlu marna.kurra luwa.rnu ngarni.nja...
Uploaded: 08/03/2014, 18:20
Scientific report: "Determining Word Sense Dominance Using a Thesaurus" potx
... and citations therein); (ii) computational ease—with just around a thousand categories, the word category matrix has a manageable size; (iii) widespread availability—thesauri are available ... vocabulary into around a thousand categories. Each category has a list of semantically related words, which we will call category terms or c-terms for short. Words with multiple meanings...
Uploaded: 08/03/2014, 21:20
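The word-category matrix mentioned in the snippet above can be illustrated with a small sketch. The excerpt does not say how the matrix is filled, so the windowed co-occurrence counting below is an assumption, and the function name, the `window` parameter, and the toy thesaurus are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_word_category_counts(
    tokens: List[str],
    thesaurus: Dict[str, Set[str]],  # category name -> its c-terms
    window: int = 2,
) -> Dict[str, Dict[str, int]]:
    """Count, for each word, how often c-terms of each category occur nearby."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue  # a word does not co-occur with itself
            for category, c_terms in thesaurus.items():
                if tokens[j] in c_terms:
                    counts[word][category] += 1
    return counts


# Toy usage: two categories and a three-word "corpus".
toy_thesaurus = {"FINANCE": {"money", "bank"}, "NATURE": {"river"}}
toy_counts = build_word_category_counts(
    ["bank", "river", "money"], toy_thesaurus, window=1
)
```

Because the number of columns is bounded by the number of thesaurus categories (around a thousand, per the snippet), such a matrix stays far smaller than a word-by-word co-occurrence matrix.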
Scientific report: "A Syllable Based Word Recognition Model for Korean Noun Extraction" potx
... it requires training data. Because the existing Korean POS tagged corpora are annotated at the morpheme level, we cannot use them as training data without converting the data into a form suitable for the word recognition model. ... pure Korean syllables are possible 3 Actually, of syllables are used in the training data, including Korean characters and non-Korean characters (e.g. alphabet...
Uploaded: 17/03/2014, 06:20