Machine Translation, Volume 23, Issue 2–3, pp 117–127

TER-Plus: paraphrase, semantic, and alignment enhancements to Translation Edit Rate

  • Matthew G. Snover
  • Nitin Madnani
  • Bonnie Dorr
  • Richard Schwartz
Article

Abstract

This paper describes a new evaluation metric, TER-Plus (TERp), for automatic evaluation of machine translation (MT). TERp is an extension of Translation Edit Rate (TER). It builds on the success of TER as an evaluation metric and alignment tool and addresses several of its weaknesses through the use of paraphrases, stemming, and synonyms, as well as edit costs that can be automatically optimized to correlate better with various types of human judgments. We present a correlation study comparing TERp to BLEU, METEOR and TER, and illustrate that TERp can better evaluate translation adequacy.
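To make the idea of tunable edit costs concrete, the following is a minimal sketch of a TER-style score: a weighted word-level edit distance divided by the reference length. It is not the authors' implementation. It leaves out the block shifts of TER and the synonym and paraphrase matching of TERp. The names `EditCosts`, `toy_stem`, and `weighted_edit_rate`, the cost values, and the crude suffix stripper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditCosts:
    # Hypothetical cost values; TERp optimizes its costs against human judgments.
    insert: float = 1.0
    delete: float = 1.0
    substitute: float = 1.0
    stem_match: float = 0.5  # assumed discount when two words share a stem


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (illustration only)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def weighted_edit_rate(hypothesis: str, reference: str,
                       costs: EditCosts = EditCosts()) -> float:
    """Weighted word-level edit cost divided by the reference length."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()

    # dp[i][j] = minimum cost of turning hyp[:i] into ref[:j]
    dp = [[0.0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        dp[i][0] = dp[i - 1][0] + costs.delete
    for j in range(1, len(ref) + 1):
        dp[0][j] = dp[0][j - 1] + costs.insert

    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            if hyp[i - 1] == ref[j - 1]:
                pair_cost = 0.0  # exact match is free
            elif toy_stem(hyp[i - 1]) == toy_stem(ref[j - 1]):
                pair_cost = costs.stem_match  # discounted stem match
            else:
                pair_cost = costs.substitute
            dp[i][j] = min(
                dp[i - 1][j] + costs.delete,      # drop a hypothesis word
                dp[i][j - 1] + costs.insert,      # add a reference word
                dp[i - 1][j - 1] + pair_cost,     # match or substitute
            )

    return dp[-1][-1] / max(len(ref), 1)


if __name__ == "__main__":
    print(weighted_edit_rate("the cats sat", "the cat sat"))
    print(weighted_edit_rate("the cat sat", "the cat sits"))
```

In this sketch a stem match costs less than a full substitution, so "cats" against "cat" is penalized less than "sat" against "sits" (which the toy stemmer does not relate). Changing the values in `EditCosts` changes how the score weighs each edit type, which is the property that cost optimization would exploit.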

Keywords

Machine translation evaluation · Paraphrasing · Alignment

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Matthew G. Snover (1)
  • Nitin Madnani (1)
  • Bonnie Dorr (1)
  • Richard Schwartz (2)
  1. Laboratory for Computational Linguistics and Information Processing, Institute for Advanced Computer Studies, University of Maryland, College Park, USA
  2. BBN Technologies, Cambridge, USA
