What Level of Quality Can Neural Machine Translation Attain on Literary Text?

  • Antonio Toral
  • Andy Way
Chapter
Part of the Machine Translation: Technologies and Applications book series (MATRA, volume 1)

Abstract

Given the rise of the new neural approach to machine translation (NMT) and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system from the previously dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of 12 widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT yields an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that between 17% and 34% of the translations produced by NMT, depending on the book (versus between 8% and 20% with PBSMT), are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.
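The relationship between the relative and absolute BLEU gains quoted above can be sketched as simple arithmetic. The snippet below is only an illustration: the function names are hypothetical, and the baseline it derives is a back-of-the-envelope estimate from the rounded figures in the abstract (3 points, 11%), not a score reported by the authors.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a percentage."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (improved - baseline) / baseline


def implied_baseline(absolute_gain: float, relative_gain_pct: float) -> float:
    """Baseline score implied by an absolute gain and a relative gain (%)."""
    if relative_gain_pct <= 0:
        raise ValueError("relative gain must be positive")
    return 100.0 * absolute_gain / relative_gain_pct


# Rounded figures from the abstract: 3 BLEU points absolute, 11% relative.
# The values below are estimates implied by that rounding, not reported scores.
approx_pbsmt = implied_baseline(3.0, 11.0)
approx_nmt = approx_pbsmt + 3.0

if __name__ == "__main__":
    print(f"implied PBSMT BLEU: {approx_pbsmt:.1f}")
    print(f"implied NMT BLEU:   {approx_nmt:.1f}")
    print(f"relative gain:      {relative_improvement(approx_pbsmt, approx_nmt):.1f}%")
```

Under these assumptions the quoted figures are consistent with an overall PBSMT score of roughly 27 BLEU and an NMT score of roughly 30 BLEU; because both inputs are rounded, the true values may differ somewhat.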

Keywords

Translation quality assessment; Principles to practice; Literature translation; Neural machine translation; Pairwise ranking; Phrase-based statistical machine translation

Notes

Acknowledgements

Carme Armentano and Álvaro Bellón ranked the translations used for the human evaluation. The research leading to these results has received funding from the European Association for Machine Translation through its 2015 sponsorship of activities programme (project PiPeNovel). The second author is supported by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106). We would like to thank the Center for Information Technology of the University of Groningen and the Irish Centre for High-End Computing (http://www.ichec.ie) for providing computational infrastructure.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Arts, Center for Language and Cognition, University of Groningen, Groningen, The Netherlands
  2. ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
