Scientific report: "Unsupervised Induction of Modern Standard Arabic Verb Classes Using Syntactic Frames and LSA"
... inclusion of syntactic frames, LSA vectors, morphological pattern, and subject animacy. The best set of parameters yields an F(β=1) score of 0.456, compared to a random baseline F(β=1) score of 0.205. 1 ... Introduction The creation of the Arabic Treebank (ATB) and Arabic Gigaword (AG) facilitates corpus-based studies of many interesting linguistic phenomena in...
Uploaded: 31/03/2014, 01:20
... Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 428–435, Sydney, July 2006. © 2006 ... Computational Linguistics [Figure: plot with axes labeled "entropy" and "offset"]
Uploaded: 20/02/2014, 12:20
... and French. The English data is an edited version of the public-domain portion of the corpus used by Sonderegger (2011), and consists of just under 12000 stanzas spanning a range of poets and ... the most common scheme of the appropriate length from the gold standard of the given sub-corpus. Sub-corpus Sub-corpus overview Accuracy (%) F-Score (time- # of Total # # of...
Uploaded: 07/03/2014, 22:20
Scientific report: "Unsupervised Learning of Acoustic Sub-word Units"
... learning of phonemes of a language directly from speech is demonstrated via an algorithm for joint unsupervised learning of the topology and parameters of a hidden Markov model (HMM); states and short ... minutes of speech is adequate for learning the acoustic units. 2 An Improved and Fast SSS Algorithm The improvement of the SSS algorithm of Takami and Sagayama (1992)...
Uploaded: 08/03/2014, 01:20
Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus"
... by GOLD). The latter is a state-of-the-art Arabic stemmer, and was built using rules, suffix and prefix lists, and human-annotated text. GOLD is an earlier version of the stemmer described in (Lee ... training and testing set came from the same collection. 3.2 Task-Based Evaluation: Arabic Information Retrieval Task Description: Given a set of Arabic documents and a...
Uploaded: 08/03/2014, 04:22
Scientific report: "Automatic Induction of Finite State Transducers for Simple Phonological Rules"
... state with all of the incoming and outgoing transitions of s and f. The result of the first merging operation on the transducer of Figure 2 is shown in Figure 3, and the end result of the OSTIA ... subset of the phonetic features. Thus if we think of the transducer as a set of rewrite rules, we can now express the context of each rule as a regular expression...
Uploaded: 08/03/2014, 07:20
Scientific report: "Crosslingual Induction of Semantic Roles"
... induction of syntactic structures (Kuhn, 2004; Snyder et al., 2009) or morphologic analysis (Snyder and Barzilay, 2008) and we are not aware of any previous work on induction of semantic ... (Swier and Stevenson, 2004), where the VerbNet verb lexicon was used to guide unsupervised learning, and a generative model of Grenager and Manning (2006) which exploits ling...
Uploaded: 16/03/2014, 19:20
Scientific report: "Unsupervised Decomposition of a Document into Authorial Components"
... sub-genre (prophetic works), and each is widely thought to consist primarily of the work of a single distinct author. Jeremiah consists of 52 chapters and Ezekiel consists of 48 chapters. For our ... One of the advantages of using biblical literature is the availability of a great deal of manual annotation. In particular, we are able to identify synsets by expl...
Uploaded: 17/03/2014, 00:20
Scientific report: "Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering"
... approaches to derive syntactic categories. All of them employ a syntactic version of Harris’ distributional hypothesis: Words of similar parts of speech can be observed in the same syntactic contexts. ... state-of-the-art approaches, the kind and number of different tags is generated by the method itself. We compute and merge two partitionings of word graphs: one...
Uploaded: 17/03/2014, 04:20
Scientific report: "Automatic Induction of a CCG Grammar for Turkish"
... heuristics consist of morphological information like existence of a “PRESPART” morpheme in (8), and part-of-speech of the word. However, there is still a problem in cases like (9a) and (9b). Since ... (CG) of Ajdukiewicz (1935) and Bar-Hillel (1953). CG, and extensions to it, are lexicalist approaches which deny the need for movement or deletion rules in syntax. Transpar...
Uploaded: 17/03/2014, 06:20