Abstract
Back-translation of target-side monolingual data is an effective method of generating large numbers of parallel sentences for training improved neural machine translation (NMT) systems. The standard back-translation method, however, has been shown to be unable to utilize huge amounts of existing monolingual data efficiently, because translation models cannot differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and authentic data, improving standard back-translation and also enabling the use of iterative back-translation on language pairs that underperformed with standard back-translation. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In our approach, tag-less back-translation, the synthetic and authentic parallel data are treated as out-of-domain and in-domain data, respectively, and through pre-training and fine-tuning the translation model is shown to learn from them more efficiently during training. Experimental results show that the approach outperforms the standard and tagged back-translation approaches on low-resource English-Vietnamese and English-German NMT.
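The two-stage schedule the abstract describes, pre-training on synthetic (out-of-domain) data and then fine-tuning on authentic (in-domain) data, can be sketched as plain data-flow logic. This is a minimal illustrative sketch, not the authors' implementation: `tagless_bt_schedule` and the `(source, target)` pair format are hypothetical names introduced here, and a real system would plug each phase into an NMT toolkit's training loop.

```python
def tagless_bt_schedule(synthetic_pairs, authentic_pairs):
    """Return the ordered training phases of tag-less back-translation.

    Phase 1 pre-trains on synthetic parallel data built by back-translating
    target-side monolingual text (treated as out-of-domain); phase 2
    fine-tunes the resulting model on authentic parallel data (in-domain).
    Each element of the inputs is a (source_sentence, target_sentence) pair.
    """
    schedule = []
    # Phase 1: pre-train on the synthetic (back-translated) corpus.
    schedule.append(("pretrain", list(synthetic_pairs)))
    # Phase 2: fine-tune the pre-trained model on the authentic corpus.
    schedule.append(("finetune", list(authentic_pairs)))
    return schedule


if __name__ == "__main__":
    # Toy corpora: one back-translated pair, two authentic pairs.
    synthetic = [("back-translated source", "monolingual target")]
    authentic = [("src 1", "tgt 1"), ("src 2", "tgt 2")]
    for phase, data in tagless_bt_schedule(synthetic, authentic):
        print(phase, len(data))
```

Because no tags are added to the synthetic sentences, the distinction between the two data types lives entirely in the training schedule rather than in the data itself, which is the core idea of the tag-less approach.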
Acknowledgements
This work is supported by the National Information Technology Development Agency under the National Information Technology Development Fund PhD Scholarship Scheme 2018.
Cite this article
Abdulmumin, I., Galadanci, B.S. & Aliyu, G. Tag-less back-translation. Machine Translation 35, 519–549 (2021). https://doi.org/10.1007/s10590-021-09284-y