Scientific report: "Experiments on Candidate Data for Collocation Extraction"
... standard for our evaluation, since the purely syntactic relationships that have to be annotated are less ambiguous than the distinction between collocations and non-collocational candidates. ... Stuttgart {evert,kermes}@ims.uni-stuttgart.de Abstract The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our r...
Uploaded: 08/03/2014, 21:20
... and semantics. Our experiments on question/answer classification show that the above models highly improve on bag-of-words on a TREC dataset. 1 Introduction Question Answering (QA) is an IR task ... Papers (Companion Volume), pages 113–116, Columbus, Ohio, USA, June 2008. © 2008 Association for Computational Linguistics Kernels on Linguistic Structures for Answer Extraction Alessa...
Uploaded: 20/02/2014, 09:20
... action A may have preconditions B_1, B_2, ..., B_n; - enablement - action A enables action B; - decomposition - action A is performed when subactions B_1, B_2, ..., B_n are performed; generation ... weather forecast - with explicitly fully expressed information. The system creates a formal representation of the text that is equivalent to related database entries. Another Information Extractio...
Uploaded: 31/03/2014, 20:20
Scientific report: "Experiments on the Choice of Features for Learning Verb Classes"
... prepositional phrase information and selectional preferences. In contrast to previous approaches concentrating on the sparse data problem, we present evidence for a linguistically defined limit on ... provides selectional preference information on a fine-grained level: it specifies argument realisations for a specific verb-frame-slot combination in form of lexical heads. Fo...
Uploaded: 17/03/2014, 22:20
Document: Scientific report: "Collecting Highly Parallel Data for Paraphrase Evaluation"
... retain workers who performed well. Since the scope of this data collection effort extended beyond gathering English data alone, we 3 Everyone who submitted descriptions in a foreign language was ... metric. We designed our data collection framework for use on crowdsourcing platforms such as Amazon’s Mechanical Turk. Crowdsourcing can allow inexpensive and rapid data collection...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques"
... data and title words? Maybe unlabeled data don’t have any information for building a text classifier because they do not contain the most important information, their category. Thus we must assign ... categorization (Slonim et al., 2002). Nigam studied an Expectation-Maximization (EM) technique for combining labeled and unlabeled data for text categorization in his dissertation...
Uploaded: 20/02/2014, 16:20
Scientific report: "CONSTRAINTS ON THE GENERATION OF ADJUNCT CLAUSES"
... from similar constructions. The next consideration, and the subject of the present section, is how the construction should be situated within the generation process: what decisions, made at ... the content of the matrix. 3.1 Decision Making in Generation In generation, unlike comprehension, the speaker's appreciation of his situation, his goals, and the information that ...
Uploaded: 24/03/2014, 02:20
Scientific report: "Using Noisy Bilingual Data for Statistical Machine Translation"
... corpora. Translation results for a Chinese to English translation task are given. 1 Introduction Statistical machine translation systems typically use a translation model trained on bilingual data and ... language model for the target language, trained on perhaps some larger monolingual data. Often the amount of clean parallel data is limited. This leads to the question of whe...
Uploaded: 24/03/2014, 03:20
Scientific report: "Semi-supervised Convex Training for Dependency Parsing"
... hat-loss), which is non-convex. Therefore the objective as a whole is non-convex, making the search for global optimal difficult. Note that the root of the optimization difficulty for S3VMs is the non-convex ... outperforms the supervised one, without much additional computational cost. There remain many directions for future work. One obvious direction is to use the whole Penn Tree-...
Uploaded: 08/03/2014, 01:20
Scientific report: "An Extensive Empirical Study of Collocation Extraction Methods"
... amounts of data and limited scalability of some methods to high order n-grams. The experiments are performed on Czech data. 2 Collocation extraction Most methods for collocation extraction are ... us- [Figure 2 plot. Axis labels: "Cosine context similarity in boolean vector space", "Pointwise mutual information". Legend: collocations, non-collocations, linear discriminant.] Figure 2: Data visua...
Uploaded: 08/03/2014, 04:22