Neutralizing the Effect of Translation Shifts on Automatic Machine Translation Evaluation

  • Marina Fomicheva
  • Núria Bel
  • Iria da Cunha
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)


State-of-the-art automatic Machine Translation (MT) evaluation is based on the idea that the closer MT output is to Human Translation (HT), the higher its quality. Automatic evaluation is therefore typically approached by measuring some form of similarity between machine and human translations. The most widely used evaluation systems calculate similarity at the surface level, for example by counting shared word n-grams. The correlation between automatic and manual evaluation scores at the sentence level is still not satisfactory. One of the main reasons is that metrics assign unduly low scores to acceptable candidate translations because they cannot handle lexical and syntactic variation between possible translation options. Acceptable differences between candidate and reference translations are frequently due to optional translation shifts: it is common practice in HT to paraphrase what could be viewed as a close version of the source text in order to adapt it to target language use. When a reference translation contains such changes, using it as the only point of comparison is less informative, because the differences are not indicative of MT errors. To alleviate this problem, we design a paraphrase generation system based on a set of rules that model prototypical optional shifts that human translators may have applied. By applying the rules to the available human reference, the system generates additional references in a principled and controlled way. We show how using linguistic rules to generate additional references neutralizes the negative effect of optional translation shifts on n-gram-based MT evaluation.
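
The effect described above can be illustrated with a minimal sketch of clipped n-gram precision, the matching step at the core of BLEU-style metrics, computed against one or several references. This is not the authors' system: the function names, the English example sentences, and the passive-to-active rewrite standing in for a rule-generated reference are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the word n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Fraction of candidate n-grams found in at least one reference.

    Each candidate n-gram count is clipped to the maximum count of that
    n-gram in any single reference, as in BLEU's modified precision.
    """
    cand_counts = ngrams(candidate.split(), n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


# Hypothetical example: the human reference uses a passive construction
# (an optional shift), while the MT output keeps the active voice.
candidate = "the committee approved the proposal"
original_ref = "the proposal was approved by the committee"
# Hypothetical additional reference, as a shift-reversing rule might produce it.
generated_ref = "the committee approved the proposal"

single = clipped_precision(candidate, [original_ref], 2)
multi = clipped_precision(candidate, [original_ref, generated_ref], 2)
print(single, multi)
```

With the original reference alone, only two of the four candidate bigrams match, although the candidate is an acceptable translation; adding the generated reference removes that penalty without changing the metric itself.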


Translation shifts, Machine Translation Evaluation, Paraphrase Generation





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute for Applied Linguistics, Universitat Pompeu Fabra, Barcelona, Spain
