
Tag-less back-translation

Machine Translation

Abstract

Back-translating target-side monolingual data is an effective method of generating large numbers of parallel sentences for training improved neural machine translation (NMT) systems. The standard back-translation method, however, cannot efficiently exploit very large amounts of existing monolingual data because translation models are unable to differentiate between authentic and synthetic parallel data during training. Tagging, or using gates, enables translation models to distinguish synthetic from authentic data, improving standard back-translation and also making iterative back-translation viable on language pairs where standard back-translation underperformed. In this work, we approach back-translation as a domain adaptation problem, eliminating the need for explicit tagging. In our approach, tag-less back-translation, the synthetic and authentic parallel data are treated as out-of-domain and in-domain data, respectively, and through pre-training on the former and fine-tuning on the latter, the translation model learns from both more efficiently during training. Experimental results show that the approach outperforms the standard and tagged back-translation approaches on low-resource English-Vietnamese and English-German NMT.
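The two-stage schedule the abstract describes (pre-train on synthetic, out-of-domain pairs; fine-tune on authentic, in-domain pairs) can be sketched as follows. This is a toy illustration only: `back_translate`, `train`, and the dictionary-based "model" are hypothetical stand-ins for a real reverse NMT model and training loop, not the authors' implementation.

```python
def back_translate(reverse_model, target_mono):
    """Generate synthetic (source, target) pairs from target-side
    monolingual data using a target-to-source model."""
    return [(reverse_model(t), t) for t in target_mono]

def train(model, data, epochs):
    """Stand-in for an NMT training loop: here it just records
    which sentence pairs the model was trained on."""
    for _ in range(epochs):
        model["seen"].extend(data)
    return model

def tagless_back_translation(authentic, target_mono, reverse_model,
                             pretrain_epochs=3, finetune_epochs=2):
    """Tag-less back-translation as pre-training + fine-tuning."""
    synthetic = back_translate(reverse_model, target_mono)
    model = {"seen": []}
    # Stage 1: pre-train on the synthetic (out-of-domain) pairs only.
    model = train(model, synthetic, pretrain_epochs)
    # Stage 2: fine-tune on the authentic (in-domain) pairs only,
    # so no tag is needed to separate the two data types.
    model = train(model, authentic, finetune_epochs)
    return model, synthetic

if __name__ == "__main__":
    toy_reverse = lambda t: f"src({t})"   # mock target-to-source model
    authentic = [("hello", "hallo")]
    mono = ["welt", "tag"]
    model, synthetic = tagless_back_translation(authentic, mono, toy_reverse)
    print(len(synthetic), len(model["seen"]))
```

The key design point is that the synthetic and authentic data are never mixed in a single training stage, which is what removes the need for the explicit tags used by tagged back-translation.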



Figs. 1–6: see the full article.



Acknowledgements

This work is supported by the National Information Technology Development Agency under the National Information Technology Development Fund PhD Scholarship Scheme 2018.

Author information


Corresponding author

Correspondence to Idris Abdulmumin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


In this appendix, we provide the complete evaluation scores on the test set for all trained models (Tables 7, 8, 9, 10, 11 and 12). We also provide the statistical significance scores for the improvements in performance of the various translation approaches (Table 13).

Table 7 Performance of tag-less back-translation compared to the baseline and standard back-translation models
Table 8 Performance of tag-less back-translation compared to the baseline and standard back-translation models
Table 9 Tagged vs tag-less back-translation
Table 10 Pre-training on the authentic data and fine-tuning on the synthetic data for Vietnamese-English NMT
Table 11 Using different ratios of authentic to synthetic parallel data and its effect on the performance of Vietnamese-English NMT
Table 12 Fine-tuning the tagged and standard back-translations
Table 13 How often a conclusion with 95% statistical significance is reached when comparing the various approaches
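The 95% significance comparisons summarized in Table 13 are the kind typically computed with paired bootstrap resampling. The sketch below is a simplified illustration under the assumption that per-sentence quality scores are available and compared by their sum (real MT comparisons usually resample corpus-level BLEU); `paired_bootstrap` is a hypothetical helper, not code from the paper.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Paired bootstrap resampling over per-sentence scores.

    Both score lists must be aligned to the same test sentences.
    Returns the fraction of resampled test sets on which system A
    outscores system B; a value >= 0.95 is read as A being better
    than B at the 95% significance level.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(n_samples):
        # Draw a resampled test set (with replacement) and score
        # both systems on the *same* sampled sentence indices.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / n_samples
```

Because both systems are scored on identical resampled indices, sentence-level difficulty is controlled for, which is what makes the test "paired".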


Cite this article

Abdulmumin, I., Galadanci, B.S. & Aliyu, G. Tag-less back-translation. Machine Translation 35, 519–549 (2021). https://doi.org/10.1007/s10590-021-09284-y
