Machine Translation, Volume 23, Issue 2–3, pp 129–140

Edit distances with block movements and error rate confidence estimates

  • Gregor Leusch
  • Hermann Ney


We present two evaluation measures for Machine Translation (MT), defined as error rates extended by block moves. Unlike TER, these measures are constrained so that they can be computed exactly in polynomial time. We then investigate three methods for estimating the standard error of error rates and compare them with bootstrap estimates. Finally, we assess how well the proposed measures correlate with human judgment, using data from the National Institute of Standards and Technology (NIST) 2008 MetricsMATR workshop.


Machine translation, Evaluation, Bootstrap, Confidence intervals





Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  1. Computer Science Department, RWTH Aachen University, Aachen, Germany
