metrics for MT evaluation, evaluating reordering

Scientific report: "Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation"

Uploaded: 07/03/2014, 18:20
... 4a and 4b, evaluation metrics always correlate better on the initial task than on the update task. This suggests that there is much room for improvement for readability metrics, and metrics need ... DICOMER – a DIscourse COherence Model for Evaluating Readability. LIN outperforms all metrics on all correlations on both tasks. On the initial task, it outperforms the best scores by 3.62%, 16.20%, ... Explicit/Non-Explicit information, and demonstrate that they improve the original model. There are parallels between evaluations of machine translation (MT) and summarization with respect to textual content. For...
Scientific report: "A Graphical Interface for MT Evaluation and Error Analysis"

Uploaded: 16/03/2014, 20:20
... offering a rich set of metrics and meta-metrics for assessing MT quality (Giménez and Màrquez, 2010a). Although automatic MT evaluation is still far from manual evaluation, it is indeed ... Association for Computational Linguistics, pages 139–144, Jeju, Republic of Korea, 8-14 July 2012. © 2012 Association for Computational Linguistics A Graphical Interface for MT Evaluation and ... existing evaluation measures and to support the development of further improvements or even totally new evaluation metrics. This information can be gathered both from the experi- ... Figure 1: MT...
Document, scientific report: "a Precision-Order-Recall MT Evaluation Metric for Tuning"

Uploaded: 19/02/2014, 19:20
... word alignment information. 3 Experiments 3.1 PORT as an Evaluation Metric We studied PORT as an evaluation metric on WMT data; test sets include WMT 2008, WMT 2009, and WMT 2010 all-to-English, ... Birch and M. Osborne. 2011. Reordering Metrics for MT. In Proceedings of ACL. C. Callison-Burch, C. Fordyce, P. Koehn, C. Monz and J. Schroeder. 2008. Further Meta-Evaluation of Machine Translation. ... and 22.0% ties). 1 Introduction Automatic evaluation metrics for machine translation (MT) quality are a key part of building statistical MT (SMT) systems. They play two 1 PORT: Precision-Order-Recall...
Scientific report: "A Re-examination of Machine Learning Approaches for Sentence-Level MT Evaluation"

Uploaded: 08/03/2014, 02:21
... human assessment are higher than standard automatic evaluation metrics. 2 MT Evaluation Recent automatic evaluation metrics typically frame the evaluation problem as a comparison task: how similar ... invaluable resource for measuring the reliability of automatic evaluation metrics. In this paper, we show that they are also informative in developing better metrics. 3 MT Evaluation with Machine ... Meeting of the Association for Computational Linguistics, July. Chin-Yew Lin and Franz Josef Och. 2004b. Orange: a method for evaluating automatic evaluation metrics for machine translation....
Document, scientific report: "Collecting Highly Parallel Data for Paraphrase Evaluation"

Uploaded: 20/02/2014, 04:20
... these metrics correlate highly with human judgments. 1 Introduction Machine paraphrasing has many applications for natural language processing tasks, including machine translation (MT), MT evaluation, ... Paraphrase Evaluation Metrics One of the limitations to the development of machine paraphrasing is the lack of standard metrics like BLEU, which has played a crucial role in driving progress in MT. ... for what constitutes a high-quality paraphrase. In addition to the lack of standard datasets for training and testing, there are also no standard metrics like BLEU (Papineni et al., 2002) for...
Document, scientific report: "MT Evaluation: Human-like vs. Human Acceptable"

Uploaded: 20/02/2014, 12:20
... Similarity Metrics We begin by defining a set of 22 similarity metrics taken from the list of standard evaluation metrics in Subsection 2.1. Evaluation metrics can be tuned into similarity metrics ... families of similarity metrics form a set of 104 metrics. Our goal is to obtain the subset of metrics with highest descriptive power; for this, we rely on the KING probability. A brute force exploration ... references: ORANGE was introduced by Lin and Och (2004b) 6 for the meta-evaluation of MT evaluation metrics. The measure provides information about the average behavior of automatic and manual...
Document, scientific report: "A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics"

Uploaded: 20/02/2014, 16:20
... R² for the family of metrics AEv(α,N), for correctness scores, second QA evaluation A Unified Framework for Automatic Evaluation using N-gram Co-Occurrence Statistics Radu SORICUT Information ... penalized). Another evaluation we consider in this paper, the DUC 2001 evaluation for Automatic Summarization (also performed by NIST), had specific guidelines for coverage evaluation, which ... Unified Framework for Automatic Evaluation In this section we propose a family of evaluation metrics based on N-gram co-occurrence statistics. Such a family of evaluation metrics provides...
Document, scientific report: "Extending the BLEU MT Evaluation Method with Frequency Weightings"

Uploaded: 20/02/2014, 16:20
... used in the vector-space model for Information Retrieval (Salton and Leck, 1968) and the S-score proposed for evaluating MT output corpora for the purposes of Information Extraction (Babych ... scores for both runs were compared using a standard deviation measure. 3. The results of the MT evaluation with frequency weights With respect to evaluating MT systems, the correlation for ... for translation: MT systems that have no means for prioritising this information often introduce excessive information noise into the target text by literally translating structural information,...
Report: "A web-based decision support system for the evaluation and strategic planning using ISO 9000 factors in higher education"

Uploaded: 05/03/2014, 14:20
... 9000 factors for an evaluation and a strategic university planning. For the implementation, a Web-based DSS is based on ISO 9000 factors for the evaluation and strategic planning for a case study ... alternatives for an evaluation model / a strategic university planning. 3. DSS model application for an evaluation and a strategy planning 3.1. Application model using ISO 9000 factors for a strategic ... The fourth step is to analyze the hierarchy model using ISO 9000 factors for an evaluation and a strategic planning. The final step is to build a Web-based DSS application based on AHP model for...
The ‘global health’ education framework: a conceptual guide for monitoring, evaluation and practice

Uploaded: 05/03/2014, 22:21
... on overall driving forces for education reforms be considered (Figure 5). Indicators Finally, we deduce ten core indicators from the above framework for the purpose of monitoring and evaluation via ... higher policy and decision-making fora, but equally - and potentially more important - they can be bottom-up, that is promoted and enforced by the health workforce, for instance by means of addressing ... the evaluation of educational interventions or the monitoring of curriculum development during education reforms. It further suggests comprehensive consideration of the driving forces for education...
Scientific report: "Incremental HMM Alignment for MT System Combination"

Uploaded: 17/03/2014, 01:20
... tabular form CN, and E_i(k) to denote the cell at the k-th row and the i-th column. W(k) is the weight for E(k), and W_i(k) = W(k) is the weight for E_i(k). p_i(k) is the normalized weight for ... newsgroup sections of MT06, whereas the test set is the entire MT08. The 10-best translations for every source sentence in the dev and test sets are collected from eight MT systems. Case-insensitive ... Open MT evaluation. 1 Introduction Word-level combination using confusion network (Matusov et al. (2006) and Rosti et al. (2007)) is a widely adopted approach for combining Machine Translation (MT) ...
Scientific report: "An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method"

Uploaded: 17/03/2014, 04:20
... 2006. © 2006 Association for Computational Linguistics An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method Hidetsugu Nanba Faculty of Information Sciences, ... section, are necessary for a more accurate summary evaluation. 3 Investigation of an Automatic Method using Multiple Manual Evaluation Results 3.1 Overview of Our Evaluation Method and ... Consortium. 2 http://www.nist.gov/speech/tests/mt/mt2001/resource/ 604 tested ROUGE and cosine distance, both of which have been used for summary evaluation. If a score by Yasuda’s method exceeds...
Scientific report: "QARLA: A Framework for the Evaluation of Text Summarization Systems"

Uploaded: 17/03/2014, 05:20
... is, therefore, how to find informative metrics, and then how to combine them into an optimal single quality estimation for automatic summaries. The most immediate way of combining metrics is ... and (iii) test whether evaluating with that test-bed is reliable (JACK measure). 2 Formal constraints on any evaluation framework based on similarity metrics We are looking for a framework to evaluate ... Lin. 2004. Orange: a Method for Evaluating Automatic Metrics for Machine Translation. In Proceedings of the 36th Annual Conference on Computational Linguisticsion for Computational Linguistics...
Scientific report: "A Figure of Merit for the Evaluation of Web-Corpus Randomness"

Uploaded: 17/03/2014, 22:20
... whole corpus (BNC). C is the total number of categories. W stands for Written, S for Spoken. C1, C2, DE, UN are demographic classes for the spontaneous conversations, no cat is the BNC undefined category. ples ... to investigate how the choice of the biased sampling method affects the performance of our procedure and its relations to uniform sampling. 3.1 Corpora as unigram distributions A compact way of representing ... collections of documents is closely related to the similarity of the 218 A Figure of Merit for the Evaluation of Web-Corpus Randomness Massimiliano Ciaramita Institute of Cognitive Science and...