Collecting a Why-question corpus

Document (scientific report): "Collecting a Why-question corpus for development and evaluation of an automatic QA-system" pdf

... each paid reward. • Qualifications: To improve the data quality, a HIT can also be attached to certain tests, “qualifications” that are either system-provided or created by the requester. An example ... the assignments have been completed. • Rewards: At upload time, each HIT has to be assigned a fixed reward that cannot be changed later. The minimum reward is $0.01. Amazon.com collects a 10% (or a ... excess of information. FAQ pages also tend to answer questions that were not asked, and they contain practical examples. Human-powered answers often contain unrelated information and discourse-like...

Uploaded: 20/02/2014, 09:20

Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus" pot

Uploaded: 08/03/2014, 04:22

Scientific report: "The Human Language Project: Building a Universal Corpus of the World’s Languages" pptx

Uploaded: 16/03/2014, 23:20

Scientific report: "Extracting Paraphrases from a Parallel Corpus" pdf

Uploaded: 23/03/2014, 19:20

Scientific report: "Encoding a Parallel Corpus for Automatic Terminology" pot

Uploaded: 24/03/2014, 03:20

Document: Cyber Forensics: A Field Manual for Collecting, Examining, and Preserving Evidence of Computer Crimes ppt

... project manager, and auditor. Dedication Erienne, Kristina, and Andy Michael Jordan said it best, thus, what more can I say… I approached practices the same way I approached games. You can't ... participates in the St. Louis InfraGard chapter. John W. Rado is a geospatial analyst at National Imagery and Mapping Agency (NIMA) in St. Louis, Missouri. John has worked for NIMA since January ... contains inappropriate material. Also assume there are several people involved, that one individual initially sent the inappropriate e-mail and several additional individuals passed (e-mailed/forwarded) it around....

Uploaded: 18/01/2014, 06:20

Document (scientific report): "Creating a manually error-tagged and shallow-parsed learner corpus" pptx

... Computational Linguistics Creating a manually error-tagged and shallow-parsed learner corpus Ryo Nagata Konan University 8-9-1 Okamoto, Kobe 658-0072 Japan rnagata@konan-u.ac.jp Edward Whittaker ... 44th Annual Meeting of ACL, pages 241–248. Katsuaki Okihara. 1985. English writing (in Japanese). Taishukan, Tokyo. Alla Rozovskaya and Dan Roth. 2010a. Annotating ESL errors: Challenges and rewards. ... Vera Sheinman The Japan Institute for Educational Measurement Inc. 3-2-4 Kita-Aoyama, Tokyo, 107-0061 Japan whittaker,sheinman@jiem.co.jp Abstract The availability of learner corpora, especially those...

Uploaded: 20/02/2014, 04:20

Document (scientific report): "Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates" pdf

... sentiments for a variety of topics and corresponding targets are potentially involved (Riloff and Wiebe, 2003; Sarmento et al., 2009). Alternative approaches to automatic and manual construction ... Natural Language Processing and Computational Natural Language Learning, Prague. Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology, 2nd Edition. Sage Publications, ... 564–568, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates Paula Carvalho...

Uploaded: 20/02/2014, 05:20

Document (scientific report): "ModelTalker Voice Recorder – An Interface System for Recording a Corpus of Speech for Synthesis" ppt

... pitch, amplitude and pronunciation and users are given immediate feedback on the acceptability of each recording. Users can then rerecord an unacceptable utterance. Recordings are automatically ... utterance. This alignment is retained so that each utterance is automatically labeled. Once the entire corpus has been recorded, alignments are automatically refined based on specific individual ... naturalness and individuality one associates with one’s own voice. Individuals with difficulty speaking can be any age, gender, and from any part of the country, with regional dialects and...

Uploaded: 20/02/2014, 09:20

Document (scientific report): "WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses" docx

... hand-crafted sense-annotated corpora have been available (Agirre et al., 2007; Erk and Strapparava, 2012; Mihalcea et al., 2004), while WSD research for languages that lack these corpora has lagged behind ... the 3rd International Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands, pp. 609–612 Santamaría, C., Gonzalo, J., Verdejo, F. 2003. Automatic Association of Web Directories ... representative examples in Yarowsky’s approach is performed completely manually and is therefore limited to the amount of data that can reasonably be annotated by hand. Leacock et al. (1998), Agirre...

Uploaded: 22/02/2014, 03:20

Document (scientific report): "Using an Annotated Corpus as a Stochastic Grammar" ppt

... ~ A may be 40 M. Marcus, 1991. "Very Large Annotated Database of American English". DARPA Speech and Natural Language Workshop, Pacific Grove, Morgan Kaufmann. F. Pereira and Y. Schabes, ... the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy. 1 Motivation As soon as a formal grammar characterizes a non- ... trivial part of a natural language, almost every input string of reasonable length gets an unmanageably large number of different analyses. Since most of these analyses are not perceived as...

Uploaded: 22/02/2014, 10:20

PRACTICAL TAXIDERMY A MANUAL OF INSTRUCTION TO THE AMATEUR IN COLLECTING, PRESERVING, AND SETTING UP NATURAL HISTORY SPECIMENS OF ALL KINDS doc

... which date naturalists appear to have had some idea of the proper preservation and mounting of natural history specimens; but Réaumur, more than a century and a quarter ago, published a treatise ... leather and of furs; but of the actual setting up of animals as specimens I can find no trace. I doubt, however, if we can carry taxidermy proper farther back than to about 150 years ago, at ... sum and substance of my interview is as follows: The nets, which are of two pieces, are each about twelve yards long by two-and-a-half yards wide, and are made with a three-quarter mesh of what...

Uploaded: 06/03/2014, 13:20

Scientific report: " a Movie Dialogue Corpus for Research and Development" potx

... Several factors, such as the availability of more powerful computers, an almost unlimited storage capacity, the availability of large volumes of data in digital format, as well as the ... dialogue management and natural language generation. Springer. Stallard D (2000) Talk’n’travel: a conversational system for air travel planning. In Proceedings of the 6th Conference on Applied ... hand, contain all additional information/texts appearing in the scripts, which are typically of narrative nature and explain what is happening in the scene. Figure 1 depicts a browser snapshot...

Uploaded: 07/03/2014, 18:20

Scientific report: "A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality" docx

... Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus. In NEMLAR Conference on Arabic Language Resources and Tools, pages 102–109, Cairo, Egypt. Yuval Marton, Nizar Habash, and ... Functional Approach. In Proceedings of the seventh International Conference on Language Resources and Evaluation (LREC), Valletta, Malta. Mohammed Attia. 2008. Handling Arabic Morphological and ... Society for Information Science and Technology, 55(3):189–213. Mohamed Altantawy, Nizar Habash, Owen Rambow, and Ibrahim Saleh. 2010. Morphological Analysis and Generation of Arabic Nouns: A Morphemic...

Uploaded: 07/03/2014, 22:20

Scientific report: "Solving Relational Similarity Problems Using the Web as a Corpus" potx

... relations like equative, e.g., finding player and coach on the Web suggests an equative relation for player coach (and for coach player). As Table 3 shows, this is different for SAT verbal analogy, ... helpful. References Hiyan Alshawi and David Carter. 1994. Training and scaling preference functions for disambiguation. Computational Linguistics, 20(4):635–648. Ken Barker and Stan Szpakowicz. 1998. Semi-automatic recognition ... Science and Engineering. Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press. Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. 2005. On the semantics...

Uploaded: 08/03/2014, 01:20

Scientific report: "Learning Semantic Links from a Corpus of Parallel Temporal and Causal Relations" doc

... e.g. took and began meet only at their roots, so the LCA senses are act#0 and be#0. We also extracted temporal and causal word associations from the Google N-gram corpus (Brants and Franz, 2006), ... achieving an F-measure of 49.0 for temporals and 52.4 for causals. Analysis of these models suggests that additional data will improve performance, and that temporal information is crucial to causal ... existing corpora are missing some crucial pieces for studying temporal-causal interactions. Our research aims to fill these gaps by building a corpus of parallel temporal and causal relations and exploring...

Uploaded: 08/03/2014, 01:20

Scientific report: "Creating a Corpus of Parse-Annotated Questions" docx

... data repeat: Parse a new section of raw data; Manually correct errors in the parser output; Add the corrected data to the training set; Extract a new grammar for the parser; until All the data has been processed. Algorithm ... of Pennsylvania, Philadelphia, PA. Daniel Gildea. 2001. Corpus variation and parser performance. In Lillian Lee and Donna Harman, editors, Proceedings of EMNLP, pages 167–202, Pittsburgh, PA. Charles ... can be rapidly induced from appropriate treebank material. However, treebank- and machine learning-based grammatical resources reflect the characteristics of the training data. They generally...

Uploaded: 08/03/2014, 02:21
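The repeat/until loop quoted in the snippet for "Creating a Corpus of Parse-Annotated Questions" can be sketched roughly as follows. This is only an illustration of the loop structure, not the authors' implementation: the function name `bootstrap_treebank` and the three callables `train`, `parse`, and `correct` are hypothetical stand-ins for grammar extraction, the parser, and the manual correction step.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

Sentence = TypeVar("Sentence")
Tree = TypeVar("Tree")
Grammar = TypeVar("Grammar")


def bootstrap_treebank(
    raw_sections: Iterable[Sequence[Sentence]],
    seed_training: Sequence[Tree],
    train: Callable[[List[Tree]], Grammar],       # hypothetical: extract a grammar
    parse: Callable[[Grammar, Sentence], Tree],   # hypothetical: run the parser
    correct: Callable[[Sentence, Tree], Tree],    # hypothetical: manual correction
) -> Tuple[List[Tree], Grammar]:
    """Grow a training set section by section, retraining after each one."""
    training = list(seed_training)
    grammar = train(training)
    for section in raw_sections:
        # Parse a new section of raw data with the current grammar.
        parsed = [parse(grammar, sentence) for sentence in section]
        # Manually correct errors in the parser output.
        corrected = [correct(s, t) for s, t in zip(section, parsed)]
        # Add the corrected data to the training set.
        training.extend(corrected)
        # Extract a new grammar for the parser.
        grammar = train(training)
    return training, grammar


# Toy stand-ins so the loop can be run end to end (assumptions, not real tools):
# the "grammar" is just the training-set size, and correction changes nothing.
toy_training, toy_grammar = bootstrap_treebank(
    raw_sections=[["a", "b"], ["c"]],
    seed_training=["seed"],
    train=lambda data: len(data),
    parse=lambda grammar, sentence: f"{sentence}|g{grammar}",
    correct=lambda sentence, tree: tree,
)
```

In the toy run, each later section is parsed with a grammar trained on everything corrected so far, which is the point of the loop: correction should get cheaper as the parser adapts to the new data.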

Scientific report: "Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus" potx

... strict and lenient metrics are also applied in annotations of relevance. 4.2 High agreement To see how the generated gold standards agree with the annotations of all annotators, we analyze ... gold standard; for the lenient metric, sentences with annotations agreed by at least two annotators are selected as the testing collection and the majority of annotations are treated as the ... of annotations are listed and two methods are introduced to evaluate the quality of the human-tagged opinion corpora. 3.1 Combinations of annotations Three major properties are annotated for...

Uploaded: 08/03/2014, 02:21
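The lenient metric quoted in the snippet for the multiply-annotated opinion corpus (keep a sentence when at least two annotators agree, and take the majority annotation) might look roughly like this. It is a minimal sketch under assumptions: the function name `lenient_gold`, the dictionary layout, and the label strings are all hypothetical, and the snippet does not say how the paper resolves ties.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def lenient_gold(
    annotations: Dict[str, Sequence[Hashable]],
    min_agree: int = 2,
) -> Dict[str, Hashable]:
    """Select sentences on which at least `min_agree` annotators agree.

    `annotations` maps a sentence id to the labels given by each annotator.
    The most frequent label becomes the gold label for a selected sentence.
    Ties are not specified in the source; here Counter.most_common simply
    returns the tied label that was seen first (an assumption).
    """
    gold = {}
    for sentence_id, labels in annotations.items():
        if not labels:
            continue  # nothing to vote on
        label, count = Counter(labels).most_common(1)[0]
        if count >= min_agree:
            gold[sentence_id] = label
    return gold


# Hypothetical example with three annotators per sentence.
example = {
    "s1": ["positive", "positive", "negative"],  # two agree: kept
    "s2": ["positive", "negative", "neutral"],   # no agreement: dropped
    "s3": ["negative", "negative", "negative"],  # all agree: kept
}
selected = lenient_gold(example)
```

Raising `min_agree` to the number of annotators keeps only unanimously labeled sentences, which is one plausible reading of a stricter setting, though the snippet elides how the paper defines its strict metric.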

Scientific report: "Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus" pptx

... metadata and annotations. The annotation files are converted to a tabular format using an easily adaptable XSLT-based mechanism, and their consistency is verified in the process. Metadata files are ... order to generate tabular files (TSV) and a table-creation script. 4. Create and populate metadata tables within database. 5. Adapt the XSLT stylesheet as needed for various table formats. 5 Results: ... names or analyse folders. Moreover, the advantage of creating IMDI files is that the metadata is compliant with a widely used standard accompanied by freely available tools such as the metadata browser....

Uploaded: 08/03/2014, 02:21

Scientific report: "Generation of VP Ellipsis: A Corpus-Based Approach" ppt

Uploaded: 08/03/2014, 05:20