What Level of Quality Can Neural Machine Translation Attain on Literary Text?

Toral, Antonio; Way, Andy

doi:10.1007/978-3-319-91241-7_12

Part of the book series: Machine Translation: Technologies and Applications (MATRA, volume 1)

Abstract

Given the rise of the new neural approach to machine translation (NMT) and its promising performance on different text types, we assess the translation quality it can attain on what is perceived to be the greatest challenge for MT: literary text. Specifically, we target novels, arguably the most popular type of literary text. We build a literary-adapted NMT system for the English-to-Catalan translation direction and evaluate it against a system pertaining to the previous dominant paradigm in MT: statistical phrase-based MT (PBSMT). To this end, for the first time we train MT systems, both NMT and PBSMT, on large amounts of literary text (over 100 million words) and evaluate them on a set of 12 widely known novels spanning from the 1920s to the present day. According to the BLEU automatic evaluation metric, NMT is significantly better than PBSMT (p < 0.01) on all the novels considered. Overall, NMT results in an 11% relative improvement (3 points absolute) over PBSMT. A complementary human evaluation on three of the books shows that between 17% and 34% of the translations, depending on the book, produced by NMT (versus between 8% and 20% with PBSMT) are perceived by native speakers of the target language to be of equivalent quality to translations produced by a professional human translator.
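The abstract reports the overall BLEU gain in two forms, relative (11%) and absolute (3 points). As a rough illustration of how the two relate, the sketch below back-computes the baseline score they jointly imply. The function names are hypothetical, and the resulting scores are back-of-the-envelope values derived from two rounded figures, not results reported in the abstract.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


def implied_baseline(absolute_gain: float, relative_gain: float) -> float:
    """Baseline score implied by an absolute gain and the matching relative gain."""
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    return absolute_gain / relative_gain


# Figures taken from the abstract; both are rounded, so the output is approximate.
ABSOLUTE_GAIN = 3.0   # BLEU points
RELATIVE_GAIN = 0.11  # 11%

# Assumption: both figures describe the same overall PBSMT-to-NMT comparison.
pbsmt_bleu = implied_baseline(ABSOLUTE_GAIN, RELATIVE_GAIN)
nmt_bleu = pbsmt_bleu + ABSOLUTE_GAIN

print(f"implied PBSMT BLEU: about {pbsmt_bleu:.1f}")
print(f"implied NMT BLEU:   about {nmt_bleu:.1f}")
print(f"relative gain check: {relative_improvement(pbsmt_bleu, nmt_bleu):.2%}")
```

Under these assumptions the two figures are consistent with an overall PBSMT score in the high 20s, which puts the size of the gain in context: the same 3 points would be a much larger relative gain on a weaker baseline.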

Notes

1. For example, in the US the market share of e-books surpassed that of printed books for fiction in 2014, http://www.ingenta.com/blog-article/adding-up-the-invisible-ebook-market-analysis-of-author-earnings-january-2015-2/
2. Working models of NMT have only recently been introduced, but from a theoretical perspective, very similar models can be traced back two decades (Forcada and Ñeco 1997).
3. http://events.technologyreview.com/video/watch/alan-packer-understanding-language/
4. With this term we refer to European languages with around 5–10 million speakers, as is the case of many other languages in Europe, such as Danish, Serbian, Czech, etc.
5. While our experiments are for the English-to-Catalan language pair, we also use English monolingual data to generate synthetic data for our NMT system (see Sect. 4.2).
6. http://opus.lingfil.uu.se/OpenSubtitles.php
7. In order to build the test sets we sentence-align the source and target versions of the books. We keep the subset of sentence pairs whose alignment score is above a certain threshold. See Sect. 3.3.1 for further details.
8. https://calibre-ebook.com/
9. http://sourceforge.net/projects/apertium/files/apertium-en-ca/0.9.3/
10. https://github.com/rsennrich/nematus
11. http://www.cs.cmu.edu/~ark/MT/paired_bootstrap_v13a.tar.gz
12. https://github.com/cfedermann/Appraise
13. Training is performed on an NVIDIA Tesla K20X GPU.
14. e.g. http://www.statmt.org/wmt17/translation-task.html
15. As mentioned in Sect. 3.3.1, the source novels and their human translations were sentence-aligned automatically. The empirically set confidence threshold results in most alignments being correct, but some are erroneous.
16. While the majority of HT < MT cases are unjustified, not all of them are. By removing these rankings, the results are slightly biased in favour of HT and thus overly conservative with respect to the potential of MT.
17. https://github.com/mjpost/wmt15

References

Bentivogli L, Bisazza A, Cettolo M, Federico M (2016) Neural versus phrase-based machine translation quality: a case study. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, pp 257–267
Besacier L (2014) Traduction automatisée d’une oeuvre littéraire: une étude pilote. In: Traitement automatique du langage naturel (TALN), Marseille, http://hal.inria.fr/hal-01003944
Bird S (2006) NLTK: the natural language toolkit. In: Proceedings of the COLING/ACL on interactive presentation sessions, Sydney, pp 69–72
Bojar O, Chatterjee R, Federmann C, Graham Y, Haddow B, Huck M, Jimeno Yepes A, Koehn P, Logacheva V, Monz C, Negri M, Neveol A, Neves M, Popel M, Post M, Rubino R, Scarton C, Specia L, Turchi M, Verspoor K, Zampieri M (2016) Findings of the 2016 conference on machine translation. In: Proceedings of the first conference on machine translation, Berlin, pp 131–198
Cherry C, Foster G (2012) Batch tuning strategies for statistical machine translation. In: Proceedings of the 2012 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, Montréal, pp 427–436
Durrani N, Schmid H, Fraser A (2011) A joint sequence translation model with integrated reordering. In: Proceedings of the 49th annual meeting of the Association for Computational Linguistics: human language technologies, vol 1, Portland, pp 1045–1054
Federmann C (2012) Appraise: an open-source toolkit for manual evaluation of machine translation output. Prague Bull Math Linguist 98:25–35
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychol Bull 76(5): 378–382
Forcada ML, Ñeco RP (1997) Recursive hetero-associative memories for translation. In: Proceedings of the biological and artificial computation: from neuroscience to technology: international work-conference on artificial and natural neural networks, IWANN’97 Lanzarote, 4–6 June 1997. Springer, Berlin/Heidelberg, pp 453–462
Genzel D, Uszkoreit J, Och F (2010) “Poetic” statistical machine translation: rhyme and meter. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Cambridge, pp 158–166
Greene E, Bodrumlu T, Knight K (2010) Automatic analysis of rhythmic poetry with applications to generation and translation. In: Proceedings of the 2010 conference on empirical methods in natural language processing, Cambridge, pp 524–533
Hardmeier C (2014) Discourse in statistical machine translation. PhD thesis, University of Uppsala
Heafield K (2011) KenLM: faster and smaller language model queries. In: Proceedings of the sixth workshop on statistical machine translation, Edinburgh, pp 187–197
Jones R, Irvine A (2013) The (un)faithful machine translator. In: Proceedings of the 7th workshop on language technology for cultural heritage, social sciences, and humanities, Sofia, pp 96–101
Junczys-Dowmunt M, Dwojak T, Hoang H (2016) Is neural machine translation ready for deployment? A case study on 30 translation directions. arXiv preprint arXiv:1610.01108
Klubička F, Toral A, Sánchez-Cartagena VM (2017) Fine-grained human evaluation of neural versus phrase-based machine translation. Prague Bull Math Linguist 108:121–132
Koehn P (2004) Statistical significance tests for machine translation evaluation. In: Proceedings of the conference on empirical methods in natural language processing, Barcelona, vol 4, pp 388–395
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the ACL on interactive poster and demonstration sessions, Prague, pp 177–180
Landis RJ, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Li L, Sporleder C (2010) Using Gaussian mixture models to detect figurative language in context. In: Human language technologies: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics, Los Angeles, pp 297–300
Ljubešić N, Toral A (2014) caWaC – a Web corpus of Catalan and its application to language modeling and machine translation. In: Proceedings of the ninth international conference on language resources and evaluation (LREC’14), Reykjavik, pp 1728–1732
Luong MT, Manning CD (2015) Stanford neural machine translation systems for spoken language domain. In: Proceedings of the international workshop on spoken language translation, Da Nang, pp 76–79
Padró L, Stanilovsky E (2012) Freeling 3.0: towards wider multilinguality. In: Proceedings of the eighth international conference on language resources and evaluation (LREC-2012), Istanbul, pp 2473–2479
Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of Association for Computational Linguistics, Philadelphia, pp 311–318
Pecina P, Toral A, Papavassiliou V, Prokopidis P, Tamchyna A, Way A, van Genabith J (2014) Domain adaptation of statistical machine translation with domain-focused web crawling. Lang Resour Eval 49(1):147–193
Reyes A (2013) Linguistic-based patterns for figurative language processing: the case of humor recognition and irony detection. Procesamiento del Lenguaje Natural 50:107–109
Sakaguchi K, Post M, Van Durme B (2014) Efficient elicitation of annotations for human evaluation of machine translation. In: Proceedings of the ninth workshop on statistical machine translation, Baltimore, pp 1–11
Sánchez-Cartagena VM, Toral A (2016) Abu-Matran at WMT 2016 translation task: deep learning, morphological segmentation and tuning on character sequences. In: Proceedings of the first conference on machine translation, Berlin, pp 362–370
Sennrich R (2012) Perplexity minimization for translation model domain adaptation in statistical machine translation. In: Proceedings of the 13th conference of the European chapter of the Association for Computational Linguistics, Avignon, pp 539–549
Sennrich R, Haddow B, Birch A (2015) Improving neural machine translation models with monolingual data. arXiv preprint arXiv:1511.06709
Sennrich R, Haddow B, Birch A (2016a) Edinburgh neural machine translation systems for WMT 16. In: Proceedings of the first conference on machine translation, Berlin, pp 371–376
Sennrich R, Haddow B, Birch A (2016b) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the Association for Computational Linguistics, Berlin. Long papers, vol 1, pp 1715–1725
Sennrich R, Firat O, Cho K, Birch A, Haddow B, Hitschler J, Junczys-Dowmunt M, Läubli S, Miceli Barone AV, Mokry J, Nadejde M (2017) Nematus: a toolkit for neural machine translation. In: Proceedings of the software demonstrations of the 15th conference of the European chapter of the Association for Computational Linguistics, Valencia, pp 65–68
Shutova E, Teufel S, Korhonen A (2013) Statistical metaphor processing. Comput Linguist 39(2):301–353
Snover M, Dorr B, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: AMTA 2006: proceedings of the 7th conference of the association for machine translation in the Americas, “Visions for the Future of Machine Translation”, Cambridge, pp 223–231
Stolcke A (2002) SRILM – an extensible language modeling toolkit. In: Proceedings of the 7th international conference on spoken language processing (ICSLP 2002), Denver, pp 901–904
Toral A, Sánchez-Cartagena VM (2017) A multifaceted evaluation of neural versus phrase-based machine translation for 9 language directions. In: Proceedings of the 15th conference of the European chapter of the Association for Computational Linguistics: volume 1, Long Papers, Valencia, pp 1063–1073
Toral A, Way A (2015a) Machine-assisted translation of literary text: a case study. Translat Spaces 4:241–268
Toral A, Way A (2015b) Translating literary text between related languages using SMT. In: Proceedings of the fourth workshop on computational linguistics for literature, Denver, pp 123–132
Varga D, Halácsy P, Kornai A, Nagy V, Németh L, Trón V (2005) Parallel corpora for medium density languages. In: International conference RANLP-2005, recent advances in natural language processing, proceedings, Borovets, pp 590–596
Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, Seattle, pp 1387–1392
Voigt R, Jurafsky D (2012) Towards a literary machine translation: the role of referential cohesion. In: Proceedings of the NAACL-HLT 2012 workshop on computational linguistics for literature, Montréal, pp 18–25

Acknowledgements

Carme Armentano and Álvaro Bellón ranked the translations used for the human evaluation. The research leading to these results has received funding from the European Association for Machine Translation through its 2015 sponsorship of activities programme (project PiPeNovel). The second author is supported by the ADAPT Centre for Digital Content Technology, funded under the SFI Research Centres Programme (Grant 13/RC/2106). We would like to thank the Center for Information Technology of the University of Groningen and the Irish Centre for High-End Computing (http://www.ichec.ie) for providing computational infrastructure.

Author information

Authors and Affiliations

Faculty of Arts, Center for Language and Cognition, University of Groningen, Groningen, The Netherlands
Antonio Toral
ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
Andy Way

Corresponding author

Correspondence to Antonio Toral.

Editor information

Editors and Affiliations

ADAPT Centre/School of Applied Language and Intercultural Studies, Dublin City University, Dublin, Ireland
Joss Moorkens
ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
Sheila Castilho
ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
Federico Gaspari
School of Humanities and Languages, The University of New South Wales, Sydney, Australia
Stephen Doherty

About this chapter

Cite this chapter

Toral, A., Way, A. (2018). What Level of Quality Can Neural Machine Translation Attain on Literary Text? In: Moorkens, J., Castilho, S., Gaspari, F., Doherty, S. (eds) Translation Quality Assessment. Machine Translation: Technologies and Applications, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-319-91241-7_12

DOI: https://doi.org/10.1007/978-3-319-91241-7_12
Published: 14 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91240-0
Online ISBN: 978-3-319-91241-7
eBook Packages: Computer Science, Computer Science (R0)