
Machine Translation, Volume 27, Issue 3–4, pp 239–256

Sentence-level ranking with quality estimation

  • Eleftherios Avramidis
Article

Abstract

Starting from human annotations, we provide a machine-learning strategy that performs sentence-level preference ranking of alternative machine translations of the same source. Rankings are decomposed into pairwise comparisons so that they can be learned by binary classifiers, using black-box features derived from linguistic analysis. To recompose a ranking from the classifier's pairwise decisions, each decision is weighted by its classification probability, which increases the correlation coefficient by 80%. We also present several successful configurations of automatic ranking models. The best configurations achieve a correlation with human judgments of 0.27, measured by Kendall’s tau. Although the method does not use reference translations, this correlation is comparable to that achieved by state-of-the-art reference-aware automatic evaluation metrics such as smoothed BLEU, METEOR and Levenshtein distance.
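To make the pairwise decomposition and the probability-weighted recomposition more concrete, here is a minimal Python sketch. It is not the author's implementation. It assumes that a lower rank number means a better translation, that tied pairs are skipped during decomposition, and that recomposition adds up, for each output, the classifier's probability of that output winning each of its comparisons (a soft version of counting pairwise wins). The probabilities passed to the hypothetical `recompose` function stand in for a trained binary classifier such as logistic regression, and all function names are illustrative. Kendall's tau is computed in its simple form, (concordant - discordant) / (concordant + discordant), ignoring tied pairs; the paper's exact handling of ties may differ.

```python
from itertools import combinations


def to_pairwise(ranks):
    """Decompose a ranking (lower rank = better) into binary pairwise labels.

    Returns a list of ((i, j), label) with i < j, where label is 1 if
    output i is ranked better than output j and 0 otherwise.
    Tied pairs carry no preference and are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(ranks)), 2):
        if ranks[i] == ranks[j]:
            continue
        pairs.append(((i, j), 1 if ranks[i] < ranks[j] else 0))
    return pairs


def recompose(n, prob_better):
    """Rebuild a ranking of n outputs from pairwise probabilities.

    prob_better[(i, j)] is the (hypothetical) classifier probability that
    output i is better than output j. Each output accumulates the
    probability of winning each comparison instead of a hard 0/1 vote.
    """
    scores = [0.0] * n
    for (i, j), p in prob_better.items():
        scores[i] += p
        scores[j] += 1.0 - p
    # Highest accumulated score gets rank 1; equal scores keep index order.
    order = sorted(range(n), key=lambda k: -scores[k])
    ranks = [0] * n
    for position, k in enumerate(order, start=1):
        ranks[k] = position
    return ranks


def kendall_tau(pred, gold):
    """Kendall's tau between two rankings, ignoring pairs tied in either one."""
    concordant = discordant = 0
    for i, j in combinations(range(len(gold)), 2):
        a = pred[i] - pred[j]
        b = gold[i] - gold[j]
        if a == 0 or b == 0:
            continue
        if (a > 0) == (b > 0):
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


if __name__ == "__main__":
    human = [2, 1, 3]  # hypothetical human ranks for three outputs
    print(to_pairwise(human))  # [((0, 1), 0), ((0, 2), 1), ((1, 2), 1)]
    predicted = recompose(3, {(0, 1): 0.4, (0, 2): 0.7, (1, 2): 0.9})
    print(predicted, kendall_tau(predicted, human))  # [2, 1, 3] 1.0
```

Weighting by probability helps when hard votes are inconsistent. With `{(0, 1): 0.9, (0, 2): 0.45, (1, 2): 0.55}`, hard 0/1 votes form a cycle (0 beats 1, 1 beats 2, 2 beats 0) and every output gets exactly one win. The weighted scores are 1.35, 0.65 and 1.0, which yields the ranking [1, 3, 2].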

Keywords

Quality estimation, Ranking, Logistic regression, Linguistic features, Sentence selection

Notes

Acknowledgments

This work was developed within the TaraXŰ project, financed by TSB Technologiestiftung Berlin (Zukunftsfonds Berlin) and co-financed by the European Union (European Fund for Regional Development). Many thanks to Prof. Hans Uszkoreit for his supervision, to Dr. Aljoscha Burchardt, Dr. Maja Popović and Dr. David Vilar for their useful feedback, to Prof. Melanie Siegel for her support with the language checking tool and to Lukas Poustka for his technical help with feature acquisition.

References

  1. Avramidis E (2011) DFKI system combination with sentence ranking at ML4HMT-2011. In: Proceedings of the international workshop on using linguistic information for hybrid machine translation and of the shared task on applying machine learning techniques to optimising the division of labour in hybrid machine translation, Barcelona, Spain, pp 99–103
  2. Avramidis E, Popović M, Vilar D, Burchardt A (2011) Evaluate with confidence estimation: machine ranking of translation outputs using grammatical features. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, UK, pp 65–70
  3. Blatz J, Fitzgerald E, Foster G, Gandrabur S, Goutte C, Kulesza A, Sanchis A, Ueffing N (2004) Confidence estimation for machine translation. In: Proceedings of the 20th international conference on computational linguistics, Stroudsburg, PA, USA
  4. Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2007) (Meta-) Evaluation of machine translation. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 136–158
  5. Callison-Burch C, Fordyce C, Koehn P, Monz C, Schroeder J (2008) Further meta-evaluation of machine translation. In: Proceedings of the third workshop on statistical machine translation, Columbus, Ohio, pp 70–106
  6. Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the fourth workshop on statistical machine translation, Athens, Greece, pp 1–28
  7. Callison-Burch C, Koehn P, Monz C, Peterson K, Przybocki M, Zaidan O (2010) Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In: Proceedings of the joint fifth workshop on statistical machine translation and MetricsMATR, Uppsala, Sweden, pp 17–53
  8. Callison-Burch C, Koehn P, Monz C, Zaidan O (2011) Findings of the 2011 workshop on statistical machine translation. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, UK, pp 22–64
  9. Callison-Burch C, Koehn P, Monz C, Post M, Soricut R, Specia L (2012) Findings of the 2012 workshop on statistical machine translation. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 10–51
  10. Cameron A (1998) Regression analysis of count data. Cambridge University Press, Cambridge
  11. Cleveland WS (1979) Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 74(368):829–836
  12. Coomans D, Massart D (1982) Alternative k-nearest neighbour rules in supervised pattern recognition. Anal Chim Acta 138:15–27
  13. Demšar J, Zupan B, Leban G, Curk T (2004) Orange: from experimental machine learning to interactive data mining. In: Principles of data mining and knowledge discovery, pp 537–539
  14. Duh K (2008) Ranking vs. regression in machine translation evaluation. In: Proceedings of the third workshop on statistical machine translation, Columbus, Ohio, pp 191–194
  15. Federmann C, Avramidis E, Costa-jussà MR, van Genabith J, Melero M, Pecina P (2012) The ML4HMT workshop on optimising the division of labour in hybrid machine translation. In: Proceedings of the 8th ELRA conference on language resources and evaluation, Istanbul, Turkey
  16. Goodstadt L (2010) Ruffus: a lightweight Python library for computational pipelines. Bioinformatics 26(21):2778–2779
  17. He Y, Ma Y, van Genabith J, Way A (2010) Bridging SMT and TM with translation recommendation. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics, Uppsala, Sweden, pp 622–630
  18. Herbrich R, Graepel T, Obermayer K (1999) Support vector learning for ordinal regression. In: International conference on artificial neural networks, pp 97–102
  19. Hopkins M, May J (2011) Tuning as ranking. In: Proceedings of the conference on empirical methods in natural language processing, Edinburgh, UK, pp 1352–1362
  20. Hosmer D (1989) Applied logistic regression, 8th edn. Wiley, New York
  21. Hüllermeier E, Fürnkranz J, Cheng W, Brinker K (2008) Label ranking by learning pairwise preferences. Artif Intell 172(16–17):1897–1916
  22. Kendall MG (1938) A new measure of rank correlation. Biometrika 30(1–2):81–93
  23. Khedr AM (2008) Learning k-nearest neighbors classifier from distributed data. Comput Inform 27(3):355–376
  24. Knight WR (1966) A computer method for calculating Kendall’s tau with ungrouped data. J Am Stat Assoc 61(314):436–439
  25. Koehn P (2005) Europarl: a parallel corpus for statistical machine translation. In: Conference proceedings: the tenth machine translation summit, AAMT, Phuket, Thailand, pp 79–86
  26. Lavie A, Agarwal A (2007) METEOR: an automatic metric for MT evaluation with high levels of correlation with human judgments. In: Proceedings of the second workshop on statistical machine translation, Prague, Czech Republic, pp 228–231
  27. Levenshtein V (1966) Binary codes capable of correcting deletions, insertions and reversals. Sov Phys Doklady 10(8):707–710
  28. Lopez A (2012) Putting human assessments of machine translation systems in order. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 1–9
  29. Miller A (2002) Subset selection in regression, 2nd edn. Chapman & Hall, London
  30. Papineni K, Roukos S, Ward T, Zhu WJ (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, pp 311–318
  31. Parton K, Tetreault J, Madnani N, Chodorow M (2011) E-rating machine translation. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, UK, pp 108–115
  32. Petrov S, Klein D (2007) Improved inference for unlexicalized parsing. In: Proceedings of the conference of the North American chapter of the Association for Computational Linguistics, Rochester, NY, pp 404–411
  33. Petrov S, Barrett L, Thibaux R, Klein D (2006) Learning accurate, compact, and interpretable tree annotation. In: Proceedings of the 21st international conference on computational linguistics and 44th annual meeting of the Association for Computational Linguistics, Sydney, Australia, pp 433–440
  34. Raybaud S, Lavecchia C, Langlois D, Kamel S (2009a) Word- and sentence-level confidence measures for machine translation. In: 13th annual meeting of the European Association for Machine Translation, Barcelona, Spain
  35. Raybaud S, Lavecchia C, Langlois D, Kamel S (2009b) New confidence measures for statistical machine translation. In: Proceedings of the international conference on agents, pp 394–401
  36. Rosti AV, Ayan NF, Xiang B, Matsoukas S, Schwartz R, Dorr BJ (2007) Combining outputs from multiple machine translation systems. In: Proceedings of the North American chapter of the Association for Computational Linguistics Human Language Technologies, Rochester, NY, pp 228–235
  37. Sánchez-Martínez F (2011) Choosing the best machine translation system to translate a sentence by using only source-language information. In: Proceedings of the 15th annual conference of the European Association for Machine Translation, Leuven, Belgium, pp 97–104
  38. Siegel M (2011) Autorenunterstützung für die Maschinelle Übersetzung. In: Multilingual resources and multilingual applications: proceedings of the conference of the German Society for computational linguistics and language technology (GSCL), Hamburg
  39. Soricut R, Narsale S (2012) Combining quality prediction and system selection for improved automatic translation output. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 163–170
  40. Soricut R, Wang Z, Bach N (2012) The SDL language weaver systems in the WMT12 quality estimation shared task. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 145–151
  41. Specia L, Turchi M, Cancedda N, Dymetman M, Cristianini N (2009) Estimating the sentence-level quality of machine translation systems. In: 13th annual meeting of the European Association for Machine Translation, Barcelona, Spain, pp 28–35
  42. Specia L, Raj D, Turchi M (2010) Machine translation evaluation versus quality estimation. Mach Transl 24(1):39–50
  43. Specia L, Felice M (2012) Linguistic features for quality estimation. In: Proceedings of the seventh workshop on statistical machine translation, Montréal, Canada, pp 96–103
  44. Stolcke A (2002) SRILM: an extensible language modeling toolkit. In: Proceedings of the seventh international conference on spoken language processing, pp 901–904
  45. Ueffing N, Ney H (2005) Word-level confidence estimation for machine translation using phrase-based translation models. Comput Linguist, pp 763–770
  46. Vilar D, Avramidis E, Popović M, Hunsicker S (2011) DFKI’s SC and MT submissions to IWSLT 2011. In: Proceedings of the international workshop on spoken language translation 2011, San Francisco, CA, USA, pp 98–105
  47. Wagner J, Foster J (2009) The effect of correcting grammatical errors on parse probabilities. In: Proceedings of the 11th international conference on parsing technologies, Stroudsburg, PA, USA, pp 176–179
  48. Ye Y, Zhou M, Lin CY (2007) Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU. In: Proceedings of the second workshop on statistical machine translation, Association for Computational Linguistics, Prague, Czech Republic, pp 240–247

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

  1. Language Technology Lab, German Research Center for Artificial Intelligence (DFKI GmbH), Berlin, Germany
