What Level of Quality Can Neural Machine Translation Attain on Literary Text?

Part of the book series: Machine Translation: Technologies and Applications (MATRA, volume 1)

Abstract

Given the rise of the new neural approach to machine translation (NMT) and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of 12 widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that between 17% and 34% of the translations produced by NMT, depending on the book (versus between 8% and 20% with PBSMT), are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.
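The significance claim and the relative-versus-absolute figures above can be illustrated with a minimal sketch. It assumes the test is a paired bootstrap resampling test in the style of Koehn (2004), and it simplifies by averaging hypothetical per-sentence scores, whereas BLEU is a corpus-level metric that a real tool would recompute on every resampled test set. The function names and the toy values are illustrative assumptions, not the chapter's code or data.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Fraction of resampled test sets on which system A does NOT beat system B.

    A small return value suggests that A's advantage is unlikely to be an
    artefact of the particular sentences in the test set.
    Simplifying assumption: the inputs are per-sentence scores that are
    averaged, not corpus-level BLEU recomputed per sample.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    rng = random.Random(seed)
    n = len(scores_a)
    losses = 0
    for _ in range(n_samples):
        # Draw sentence indices with replacement; the same indices are used
        # for both systems, which is what makes the test paired.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a <= mean_b:
            losses += 1
    return losses / n_samples


def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline` as a fraction of the baseline."""
    return (improved - baseline) / baseline


# Hypothetical toy scores, not taken from the chapter.
nmt_scores = [0.31, 0.28, 0.40, 0.35, 0.22]
pbsmt_scores = [0.27, 0.25, 0.33, 0.30, 0.20]
p_value = paired_bootstrap(nmt_scores, pbsmt_scores)
gain = relative_improvement(20.0, 25.0)  # 5 points absolute on a baseline of 20
```

An absolute gain is a difference in metric points, while a relative gain divides that difference by the baseline score, so the same absolute gain is a larger relative gain on a weaker baseline.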

Notes

  1. For example, in the US the market share of e-books surpassed that of printed books for fiction in 2014, http://www.ingenta.com/blog-article/adding-up-the-invisible-ebook-market-analysis-of-author-earnings-january-2015-2/

  2. Working models of NMT have only recently been introduced, but from a theoretical perspective, very similar models can be traced back two decades (Forcada and Ñeco 1997).

  3. http://events.technologyreview.com/video/watch/alan-packer-understanding-language/

  4. With this term we refer to European languages with around 5–10 million speakers, as is the case of many other languages in Europe, such as Danish, Serbian, Czech, etc.

  5. While our experiments are for the English-to-Catalan language pair, we also use English monolingual data to generate synthetic data for our NMT system (see Sect. 4.2).

  6. http://opus.lingfil.uu.se/OpenSubtitles.php

  7. In order to build the test sets we sentence-align the source and target versions of the books. We keep the subset of sentence pairs whose alignment score is above a certain threshold. See Sect. 3.3.1 for further details.

  8. https://calibre-ebook.com/

  9. http://sourceforge.net/projects/apertium/files/apertium-en-ca/0.9.3/

  10. https://github.com/rsennrich/nematus

  11. http://www.cs.cmu.edu/~ark/MT/paired_bootstrap_v13a.tar.gz

  12. https://github.com/cfedermann/Appraise

  13. Training is performed on an NVIDIA Tesla K20X GPU.

  14. e.g. http://www.statmt.org/wmt17/translation-task.html

  15. As mentioned in Sect. 3.3.1, the source novels and their human translations were sentence-aligned automatically. The empirically set confidence threshold results in most alignments being correct, but some are erroneous.

  16. While the majority of HT < MT cases are unjustified, not all of them are. By removing these rankings, the results are slightly biased in favour of HT and thus overly conservative with respect to the potential of MT.

  17. https://github.com/mjpost/wmt15

References

  • Bentivogli L, Bisazza A, Cettolo M, Federico M (2016) Neural versus phrase-based machine translation quality: a case study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, pp 257–267

  • Besacier L (2014) Traduction automatisée d’une oeuvre littéraire: une étude pilote. In: Traitement automatique du langage naturel (TALN), Marseille, http://hal.inria.fr/hal-01003944

  • Bird S (2006) NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL on interactive presentation sessions, Sydney, pp 69–72

  • Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K, Zampieri M (2016) Findings of the 2016 conference on machine translation. In: Proceedings of the first conference on machine translation, Berlin, pp 131–198

  • Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Montréal, pp 427–436

  • Durrani N, Schmid H, Fraser A (2011) A joint sequence translation model with integrated reordering. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, vol 1, Portland, pp 1045–1054

  • Federmann C (2012) Appraise: an open-source toolkit for manual evaluation of machine translation output. Prague Bull Math Linguist 98:25–35

  • Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5):378–382

  • Forcada ML, Ñeco RP (1997) Recursive hetero-associative memories for translation. In: Proceedings of the biological and artificial computation: from neuroscience to technology: international work-conference on artificial and natural neural networks, IWANN’97 Lanzarote, 4–6 June 1997. Springer, Berlin/Heidelberg, pp 453–462

  • Genzel D, Uszkoreit J, Och F (2010) “Poetic” statistical machine translation: rhyme and meter. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Cambridge, pp 158–166

  • Greene E, Bodrumlu T, Knight K (2010) Automatic analysis of rhythmic poetry with applications to generation and translation. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Cambridge, pp 524–533

  • Hardmeier C (2014) Discourse in statistical machine translation. PhD thesis, University of Uppsala

  • Heafield K (2011) Kenlm: faster and smaller language model queries. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, pp 187–197

  • Jones R, Irvine A (2013) The (un)faithful machine translator. In: Proceedings of the 7th workshop on language technology for cultural heritage, social sciences, and humanities, Sofia, pp 96–101

  • Junczys-Dowmunt M, Dwojak T, Hoang H (2016) Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv preprint arXiv:1610.01108

  • Klubička F, Toral A, Sánchez-Cartagena VM (2017) Fine-grained human evaluation of neural versus phrase-based machine translation. Prague Bull Math Linguist 108:121–132

  • Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the conference on empirical methods in natural language processing, Barcelona, vol 4, pp 388–395

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, Prague, pp 177–180

  • Landis RJ, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174

  • Li L, Sporleder C (2010) Using Gaussian mixture models to detect figurative language in context. In: Human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, Los Angeles, pp 297–300

  • Ljubešić N, Toral A (2014) caWaC – a Web corpus of Catalan and its application to language modeling and machine translation. In: Proceedings of the ninth international conference on language resources and evaluation (LREC’14), Reykjavik, pp 1728–1732

  • Luong MT, Manning CD (2015) Stanford neural machine translation systems for spoken language domain. In: Proceedings of the international workshop on spoken language translation, Da Nang, pp 76–79

  • Padró L, Stanilovsky E (2012) Freeling 3.0: towards wider multilinguality. In: Proceedings of the eighth international conference on language resources and evaluation (LREC-2012), Istanbul, pp 2473–2479

  • Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of Association for Computational Linguistics, Philadelphia, pp 311–318

  • Pecina P, Toral A, Papavassiliou V, Prokopidis P, Tamchyna A, Way A, van Genabith J (2014) Domain adaptation of statistical machine translation with domain-focused web crawling. Lang Resour Eval 49(1):147–193

  • Reyes A (2013) Linguistic-based patterns for figurative language processing: the case of humor recognition and irony detection. Procesamiento del Lenguaje Natural 50:107–109

  • Sakaguchi K, Post M, Van Durme B (2014) Efficient elicitation of annotations for human evaluation of machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, pp 1–11

  • Sánchez-Cartagena VM, Toral A (2016) Abu-Matran at WMT 2016 translation task: deep learning, morphological segmentation and tuning on character sequences. In: Proceedings of the first conference on machine translation, Berlin, pp 362–370

  • Sennrich R (2012) Perplexity minimization for translation model domain adaptation in statistical machine translation. In: Proceedings of the 13th conference of the European chapter of the Association for Computational Linguistics, Avignon, pp 539–549

  • Sennrich R, Haddow B, Birch A (2015) Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709

  • Sennrich R, Haddow B, Birch A (2016a) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation, Berlin, pp 371–376

  • Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, Berlin. Long papers, vol 1, pp 1715–1725

  • Sennrich R, Firat O, Cho K, Birch A, Haddow B, Hitschler J, Junczys-Dowmunt M, Läubli S, Miceli Barone AV, Mokry J, Nadejde M (2017) Nematus: a toolkit for neural machine translation. In: Proceedings of the software demonstrations of the 15th conference of the European chapter of the Association for Computational Linguistics, Valencia, pp 65–68

  • Shutova E, Teufel S, Korhonen A (2013) Statistical metaphor processing. Comput Linguist 39(2):301–353

  • Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: proceedings of the 7th conference of the association for machine translation in the Americas, “Visions for the Future of Machine Translation”, Cambridge, pp 223–231

  • Stolcke A (2002) SRILM-an extensible language modeling toolkit. In: Proceedings of the 7th international conference on spoken language processing (ICSLP 2002), Denver, pp 901–904

  • Toral A, Sánchez-Cartagena VM (2017) A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: volume 1, Long Papers, Valencia, pp 1063–1073

  • Toral A, Way A (2015a) Machine-assisted translation of literary text: a case study. Translat Spaces 4:241–268

  • Toral A, Way A (2015b) Translating literary text between related languages using SMT. In: Proceedings of the fourth workshop on computational linguistics for literature, Denver, pp 123–132

  • Varga D, Halácsy P, Kornai A, Nagy V, Németh L, Trón V (2005) Parallel corpora for medium density languages. In: International conference RANLP-2005, recent advances in natural language processing, proceedings, Borovets, pp 590–596

  • Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, pp 1387–1392

  • Voigt R, Jurafsky D (2012) Towards a literary machine translation: the role of referential cohesion. In: Proceedings of the NAACL-HLT 2012 workshop on computational linguistics for literature, Montréal, pp 18–25

Acknowledgements

Carme Armentano and Álvaro Bellón ranked the translations used for the human evaluation. The research leading to these results has received funding from the European Association for Machine Translation through its 2015 sponsorship of activities programme (project PiPeNovel). The second author is supported by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106). We would like to thank the Center for Information Technology of the University of Groningen and the Irish Centre for High-End Computing (http://www.ichec.ie) for providing computational infrastructure.

Author information

Corresponding author

Correspondence to Antonio Toral.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Toral, A., Way, A. (2018). What Level of Quality Can Neural Machine Translation Attain on Literary Text?. In: Moorkens, J., Castilho, S., Gaspari, F., Doherty, S. (eds) Translation Quality Assessment. Machine Translation: Technologies and Applications, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-91241-7_12

  • DOI: https://doi.org/10.1007/978-3-319-91241-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-91240-0

  • Online ISBN: 978-3-319-91241-7

  • eBook Packages: Computer Science, Computer Science (R0)
