Machine Translation, Volume 28, Issue 3–4, pp 281–308

Data-driven annotation of binary MT quality estimation corpora based on human post-editions

Abstract

Advanced computer-assisted translation (CAT) tools include automatic quality estimation (QE) mechanisms to support post-editors in identifying and selecting useful suggestions. Based on supervised learning techniques, QE relies on high-quality data annotations obtained through expensive manual procedures. However, because the notion of MT quality is inherently subjective, such procedures may yield unreliable or uninformative annotations. To overcome these issues, we propose an automatic method to obtain binary annotated data that explicitly discriminates between useful (suitable for post-editing) and useless suggestions. Our approach is fully data-driven and bypasses the need for explicit human labelling. Experiments with different language pairs and domains demonstrate that it yields better models than those based on the adaptation of the available QE corpora into binary datasets. Furthermore, our analysis suggests that the learned thresholds separating useful from useless translations are significantly lower than those suggested in the existing guidelines for human annotators. Finally, a verification experiment with several translators operating with a CAT tool confirms our empirical findings.
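The core idea can be sketched as follows: given pairs of MT suggestions and their human post-editions, measure the editing effort and apply a threshold to derive binary useful/useless labels without explicit human annotation. This is a minimal, hypothetical illustration: the paper's actual effort measure (HTER) also accounts for word shifts, and the 0.4 threshold below is illustrative, not the learned value reported in the article.

```python
# Sketch of data-driven binary QE annotation from post-editions.
# Simplification: true HTER uses TER (which includes shift operations);
# here plain word-level edit distance normalised by post-edition length
# serves as a proxy.

def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    n = len(b)
    prev = list(range(n + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[n]

def hter(mt, post_edit):
    """Approximate HTER: edits needed to turn the MT output into its
    post-edition, normalised by post-edition length."""
    mt_toks, pe_toks = mt.split(), post_edit.split()
    return edit_distance(mt_toks, pe_toks) / max(len(pe_toks), 1)

def label(mt, post_edit, threshold=0.4):
    """Binary annotation: 1 = useful (worth post-editing), 0 = useless
    (better translated from scratch). Threshold is illustrative only."""
    return 1 if hter(mt, post_edit) <= threshold else 0
```

Labels obtained this way can then feed a standard binary classifier (the paper uses supervised learning, e.g. SVMs) to predict usefulness for unseen suggestions.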

Keywords

Statistical MT · Quality estimation · Productivity · Use of post-editing data


Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

Fondazione Bruno Kessler, Povo, Italy