Machine Translation, Volume 24, Issue 1, pp 27–38

Metric and reference factors in minimum error rate training

Authors

  • Yifan He
    • CNGL, School of Computing, Dublin City University
  • Andy Way
    • CNGL, School of Computing, Dublin City University

DOI: 10.1007/s10590-010-9072-7

Cite this article as:
He, Y. & Way, A. Machine Translation (2010) 24: 27. doi:10.1007/s10590-010-9072-7

Abstract

In Minimum Error Rate Training (MERT), Bleu is often used as the error function, despite the fact that it has been shown to correlate less well with human judgment than other metrics such as Meteor and Ter. In this paper, we present empirical results showing that parameters tuned on Bleu can lead to sub-optimal Bleu scores under certain data conditions. Such scores can be improved significantly by tuning on an entirely different metric, e.g. Meteor, by 0.0082 Bleu (a 3.38% relative improvement) on the WMT08 English–French data. We analyze the influence of the number of references and the choice of metric on the result of MERT, and experiment on different data sets. We show the problems of tuning on a metric that is not designed for the single-reference scenario and point out some possible solutions.
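To illustrate the core idea the abstract relies on, here is a minimal sketch (not the paper's implementation, which uses full MERT line search over log-linear feature weights): given toy n-best lists, we grid-search a single weight so as to minimize corpus-level error under a pluggable metric. All function names and the toy `unigram_precision` metric (a stand-in for Bleu or Meteor) are illustrative assumptions.

```python
# Sketch of metric-driven weight tuning, the setting MERT operates in.
# Each n-best entry is (hypothesis, feature1, feature2); the model score
# under weight w is feature1 + w * feature2.

def select_hypothesis(nbest, w):
    # Pick the highest-scoring hypothesis under the current weight.
    return max(nbest, key=lambda h: h[1] + w * h[2])[0]

def corpus_error(nbest_lists, references, w, metric):
    # Error = 1 - average metric score of the hypotheses selected under w.
    scores = [metric(select_hypothesis(nb, w), ref)
              for nb, ref in zip(nbest_lists, references)]
    return 1.0 - sum(scores) / len(scores)

def tune_weight(nbest_lists, references, metric, grid):
    # Return the weight on the grid with the lowest corpus error under
    # `metric`; swapping the metric can change which weight is chosen,
    # which is the effect the paper studies.
    return min(grid, key=lambda w: corpus_error(nbest_lists, references,
                                                w, metric))

def unigram_precision(hyp, ref):
    # Toy sentence-level metric: fraction of hypothesis tokens found in
    # the reference (a crude stand-in for Bleu/Meteor/Ter).
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    return sum(t in ref_toks for t in hyp_toks) / len(hyp_toks)
```

For example, with an n-best list where the second feature favors a worse hypothesis, `tune_weight` selects a small weight under this metric; a metric that rewards different properties could prefer a different weight, mirroring the Bleu-vs-Meteor tuning contrast analyzed in the paper.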

Keywords

Minimum Error Rate Training · Machine translation evaluation · Log-linear phrase-based statistical machine translation · BLEU · METEOR · TER · Chunk penalty

Copyright information

© Springer Science+Business Media B.V. 2010