Edit distances with block movements and error rate confidence estimates
We present two evaluation measures for machine translation (MT) that are defined as error rates extended by block moves. In contrast to TER, these measures are constrained in a way that allows them to be computed exactly in polynomial time. We then investigate three methods for estimating the standard error of error rates and compare them with bootstrap estimates. Finally, we assess the correlation of the proposed measures with human judgment using data from the National Institute of Standards and Technology (NIST) 2008 MetricsMATR workshop.
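The abstract's second contribution concerns the standard error of an error rate measured over a test corpus. As a rough illustration only, the Python sketch below treats the corpus error rate as total edit operations divided by total reference words, then estimates its standard error in two ways: a sentence-level bootstrap and a first-order (delta-method) approximation for a ratio estimator. Assumptions: all function names are hypothetical; per-sentence error counts come from plain word-level Levenshtein distance, not from the block-move measures the paper proposes; and the delta-method estimate is a standard textbook approximation that may or may not coincide with any of the three methods the paper investigates.

```python
import math
import random
from typing import List, Sequence


def levenshtein(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level edit distance with substitutions, insertions and deletions only.

    Assumption: no block moves, so reordering is charged as ordinary edits.
    """
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # hypothesis word h left unmatched
                cur[j - 1] + 1,          # reference word r left unmatched
                prev[j - 1] + (h != r),  # match (cost 0) or substitution (cost 1)
            )
        prev = cur
    return prev[-1]


def corpus_error_rate(errors: List[int], lengths: List[int]) -> float:
    """Total edit operations divided by total reference words."""
    return sum(errors) / sum(lengths)


def delta_method_se(errors: List[int], lengths: List[int]) -> float:
    """First-order (ratio-estimator) approximation of the standard error; needs >= 2 sentences."""
    m = len(errors)
    total = sum(lengths)
    rate = sum(errors) / total
    ss = sum((e - rate * n) ** 2 for e, n in zip(errors, lengths))
    return math.sqrt(m / (m - 1) * ss) / total


def bootstrap_se(errors: List[int], lengths: List[int],
                 n_boot: int = 1000, seed: int = 0) -> float:
    """Standard deviation of the corpus error rate over sentence-level resamples."""
    rng = random.Random(seed)
    m = len(errors)
    rates = []
    for _ in range(n_boot):
        idx = [rng.randrange(m) for _ in range(m)]  # draw m sentences with replacement
        rates.append(sum(errors[k] for k in idx) / sum(lengths[k] for k in idx))
    mean = sum(rates) / n_boot
    return math.sqrt(sum((x - mean) ** 2 for x in rates) / (n_boot - 1))


if __name__ == "__main__":
    # Toy data for illustration only.
    refs = ["the cat sat on the mat", "a dog barked", "it rained all day"]
    hyps = ["the cat sat on mat", "a dog barked loudly", "all day it rained"]
    errors = [levenshtein(h.split(), r.split()) for h, r in zip(hyps, refs)]
    lengths = [len(r.split()) for r in refs]
    print("error rate:", corpus_error_rate(errors, lengths))
    print("delta-method SE:", delta_method_se(errors, lengths))
    print("bootstrap SE:", bootstrap_se(errors, lengths))
```

Resampling whole sentences rather than individual words keeps the errors within a sentence together, since they are not independent of one another. With only three sentences, as in the toy data, both estimates are very rough; on larger test sets the two estimates typically agree closely.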
Keywords: Machine translation · Evaluation · Bootstrap · Confidence intervals