a multilingual parallel corpus

Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus" pot

... 1999). Usually, entire documents are translated by humans, and the sentence pairs are subsequently aligned by automatic means. A small parallel corpus can be available when native speakers and translators ... is a language-independent algorithm. English Phrase: the advisory committee. Arabic Phrase: Alljnp AlAst$Aryp. Task: stem AlAst$Aryp. Choices and scores: AlAst$Aryp 0.2, AlAst$Aryp 0.7, AlAst$Aryp ... 2 Approach. Figure 1: Approach Overview. Our approach is based on the availability of the following three resources: • a small parallel corpus • an English stemmer • an optional unannotated Arabic...

Uploaded: 08/03/2014, 04:22

Document: Scientific report: "Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates" pdf

... sentiments for a variety of topics and corresponding targets are potentially involved (Riloff and Wiebe, 2003; Sarmento et al., 2009). Alternative approaches to automatic and manual construction ... Natural Language Processing and Computational Natural Language Learning, Prague. Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology, 2nd Edition. Sage Publications, ... 564–568, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics. Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates. Paula Carvalho...

Uploaded: 20/02/2014, 05:20

Document: Scientific report: "Collecting a Why-question corpus for development and evaluation of an automatic QA-system" pdf

... each paid reward. • Qualifications: To improve the data quality, a HIT can also be attached to certain tests, “qualifications” that are either system-provided or created by the requester. An example ... the assignments have been completed. • Rewards: At upload time, each HIT has to be assigned a fixed reward that cannot be changed later. Minimum reward is $0.01. Amazon.com collects a 10% (or a ... excess of information. FAQ-pages tend to also answer questions which are not asked, and also contain practical examples. Human-powered answers often contain unrelated information and discourse-like...

Uploaded: 20/02/2014, 09:20

Document: Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" docx

... translations for creating a tri-lingual collocation dictionary, with samples of actual use in language. Using past translations as reference for the translator's further work was an ... unable to create a complete analysis of a sentence, the Fips parser returns chunks of partial analyses. If 132 Creating a Multilingual Collocation Dictionary from Large Text Corpora Luka Nerima, ... V-Prep-N. Another argument in favour of a full syntactical analysis is that it solves the problem of all cases of extraposed elements, such as passives, topicalisation, and dislocation. To illustrate...

Uploaded: 22/02/2014, 02:20

Document: Scientific report: "WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses" docx

... hand-crafted sense-annotated corpora have been available (Agirre et al., 2007; Erk and Strapparava, 2012; Mihalcea et al., 2004), while WSD research for languages that lack these corpora has lagged behind ... the 3rd International Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands, pp. 609–612. Santamaría, C., Gonzalo, J., Verdejo, F. 2003. Automatic Association of Web Directories ... representative examples in Yarowsky’s approach is performed completely manually and is therefore limited to the amount of data that can reasonably be annotated by hand. Leacock et al. (1998), Agirre...

Uploaded: 22/02/2014, 03:20

Scientific report: "a Movie Dialogue Corpus for Research and Development" potx

... Several factors, such as the availability of more powerful computers, an almost unlimited storage capacity, the availability of large volumes of data in digital format, as well as the ... dialogue management and natural language generation. Springer. Stallard D (2000) Talk’n’travel: a conversational system for air travel planning. In Proceedings of the 6th Conference on Applied ... hand, contain all additional information/texts appearing in the scripts, which are typically of narrative nature and explain what is happening in the scene. Figure 1 depicts a browser snapshot...

Uploaded: 07/03/2014, 18:20

Scientific report: "Personalized Normalization for a Multilingual Chat System" doc

... short-forms are very irregular, and their standard forms are hard to predict using morphological and phonetic similarity. It is also hard to train a statistical model if training data is not available. ... Koehn & al. Moses: Open Source Toolkit for Statistical Machine Translation, ACL 2007, demonstration session. Koehn, P. (2005). Europarl: A Parallel Corpus for Statistical Machine Translation. ... flexibility and interactivity to include and manage their own vocabularies during chat. 2 ASIASPIK System Overview. AsiaSpik is a web-based multilingual instant messaging system that enables online...

Uploaded: 07/03/2014, 18:20

Scientific report: "Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus" pptx

... metadata and annotations. The annotation files are converted to a tabular format using an easily adaptable XSLT-based mechanism, and their consistency is verified in the process. Metadata files are ... order to generate tabular files (TSV) and a table-creation script. 4. Create and populate metadata tables within database. 5. Adapt the XSLT stylesheet as needed for various table formats. 5 Results: ... names or analyse folders. Moreover, the advantage of creating IMDI files is that the metadata is compliant with a widely used standard accompanied by freely available tools such as the metadata browser....

Uploaded: 08/03/2014, 02:21

Scientific report: "TRANSFER IN A MULTILINGUAL MT SYSTEM" pdf

... project in the world that applies (iii) not only as a matter of principle but as actual practice. We will regard a natural language as a set of texts. A translation pair is a pair of texts (T~, ... simple transfer can now be formulated as follows: If A translates-as A', then we will call A' a TN of A. We now call an element s,t of the set defined by translates-as a simple ... We can then introduce a new relation, called translates-as. This is a binary relation, probably many-to-many; its left-hand term is a subtree of R, and its right-hand term is a tree. Clearly,...

Uploaded: 08/03/2014, 18:20

Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" ppt

... syntactical relation). When parallel corpora are available, also the translation equivalents of the collocation context are displayed, thus allowing the user to see how a given collocation was translated ... is length-based and integrates a shallow content analysis. It begins by individuating a paragraph in the target text which is a first candidate as target paragraph, and which we call "pivot". ... translations for creating a tri-lingual collocation dictionary, with samples of actual use in language. Using past translations as reference for the translator's further work was an...

Uploaded: 08/03/2014, 21:20

Scientific report: "Accurate Collocation Extraction Using a Multilingual Parser" docx

... sections 4 and 5 a comparative evaluation experiment proving that a hybrid approach leads to more accurate results than a classical approach in which syntactic information is not taken into account. 2 ... helping developing nations? 1.c) make mistake: We could look back and probably see a lot of mistakes that all parties including Canada perhaps may have made. 3 Multilingual Extraction Results. In ... 2006. © 2006 Association for Computational Linguistics. Accurate Collocation Extraction Using a Multilingual Parser. Violeta Seretan, Language Technology Laboratory, University of Geneva, 2, rue de Candolle,...

Uploaded: 17/03/2014, 04:20

Scientific report: "Analysis of Selective Strategies to Build a Dependency-Analyzed Corpus" pptx

... Makoto Nagao. 1994a. KN Parser: Japanese dependency/case structure analyzer. In Proceedings of Workshop on Sharable Natural Language Resources, pages 48–55. Sadao Kurohashi and Makoto Nagao. ... human to annotate. Under this framework, the system has access to a large pool of unlabeled data, and it has to predict how much it can learn from each candidate in the pool if that candidate is labeled. Most ... E-mail Magazine is bilingual. The articles of this magazine were analyzed by the dependency analyzer CaboCha, and we manually corrected the errors. K-mag includes a wide variety of articles, and...

Uploaded: 17/03/2014, 04:20

Scientific report: "Evaluating Centering-based metrics of coherence for text structuring using a reliably annotated corpus" doc

... International Workshop on NLG, pages 98–107, Niagara-on-the-Lake, Ontario, Canada. Eleni Miltsakaki. 2002. Towards an aposynthesis of topic continuity and intrasentential anaphora. Computational ... in Italian. In Walker et al. (Walker et al., 1998b), pages 115–137. Aggeliki Dimitromanolaki and Ion Androutsopoulos. 2003. Learning to order facts for discourse planning in natural language ... into a format appropriate for seec. The first author was able to engage in this research thanks to a scholarship from the Greek State Scholarships Foundation (IKY). References: Regina Barzilay,...

Uploaded: 17/03/2014, 06:20

Scientific report: "Towards a Resource for Lexical Semantics: A Large German Corpus with Extensive Semantic Annotation" pot

... Semantic Annotation. Katrin Erk and Andrea Kowalski and Sebastian Padó and Manfred Pinkal, Department of Computational Linguistics, Saarland University, Saarbrücken, Germany {erk, kowalski, pado, ... desirable. FrameNet as a resource for semantic role annotation. Above, we have asked about the suitability of FrameNet for semantic role annotation, and our data allow a first, though tentative, ... suitable resource as annotation basis. FrameNet roles, which are local to particular frames (abstract situations), may be better suited for the annotation task than the “classical” thematic...

Uploaded: 17/03/2014, 06:20

Scientific report: "JaBot: a multilingual Java-based intelligent agent for Web sites" pdf

... XXI. Revista de la UNED. Read T., Bárcena E. and Faber P. (1997) Java and its role in Natural Language Processing and Machine Translation. In Proceedings of the Machine Translation Summit ... their careers. As can be seen in the diagram below, JaBot has three modules: a natural language interface, a search engine and an interactive list of references to the Web pages on the site at ... its architecture and associated data sources. Subsequently, an illustrative example of its functionality has been presented, which demonstrated that JaBot is more flexible than a traditional...

Uploaded: 17/03/2014, 07:20
