
Transformer fast gradient method with relative positional embedding: a mutual translation model between English and Chinese

Soft Computing

Abstract

Machine translation uses computers to transform one natural language into another. Neural machine translation models often fail to fully capture the sequence order of a text or the long-range dependencies between words, and they suffer from over-translation and mistranslation. To improve the naturalness, fluency, and accuracy of translation, this study proposes a new training strategy, the transformer fast gradient method with relative positional embedding (TF-RPE), which combines the fast gradient method (FGM) of adversarial training with relative positional embedding. The model is built on the Transformer: after the word embedding layer maps each input token to a word vector, relative positional embedding injects positional information into that vector, helping it better preserve the word's linguistic information (meaning and semantics). Adding FGM adversarial training to the multi-head attention encoder strengthens the training of the word vectors and reduces omissions and mistranslations, significantly improving the overall computational efficiency and accuracy of the model. TF-RPE also delivers satisfactory, high-quality translations on low-resource corpora. Extensive ablation studies and comparative analyses validate the effectiveness of the scheme, and TF-RPE achieves an average improvement of more than 3 BLEU (bilingual evaluation understudy) points over state-of-the-art methods.
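
For concreteness, the sketch below illustrates how FGM adversarial training is typically applied to the word-embedding weights of a neural machine translation model in PyTorch: the embedding matrix is perturbed by r = epsilon * g / ||g||_2, computed from the gradient of the clean loss, an adversarial loss is back-propagated on the perturbed embeddings, and the original weights are then restored. This is a minimal sketch of the generic FGM recipe under stated assumptions, not the authors' released implementation; the FGM helper class, the parameter-name filter "embed", and the surrounding training-loop names are illustrative placeholders.

    import torch
    import torch.nn as nn

    class FGM:
        """Fast Gradient Method perturbation of the embedding weights.

        Minimal sketch of the standard recipe (not the paper's code):
        add r = epsilon * grad / ||grad||_2 to the embedding matrix,
        back-propagate an adversarial loss, then restore the weights.
        """

        def __init__(self, model: nn.Module, epsilon: float = 1.0, emb_name: str = "embed"):
            self.model = model
            self.epsilon = epsilon
            self.emb_name = emb_name  # substring identifying the embedding parameter (assumption)
            self.backup = {}

        def attack(self):
            for name, param in self.model.named_parameters():
                if param.requires_grad and self.emb_name in name and param.grad is not None:
                    self.backup[name] = param.data.clone()
                    norm = torch.norm(param.grad)
                    if norm != 0 and not torch.isnan(norm):
                        param.data.add_(self.epsilon * param.grad / norm)

        def restore(self):
            for name, param in self.model.named_parameters():
                if name in self.backup:
                    param.data = self.backup[name]
            self.backup = {}

    # Hypothetical training step (model, criterion, src, tgt, labels, optimizer are placeholders):
    # loss = criterion(model(src, tgt), labels)
    # loss.backward()                                  # gradients of the clean loss
    # fgm = FGM(model)
    # fgm.attack()                                     # perturb embeddings with r = eps * g / ||g||
    # criterion(model(src, tgt), labels).backward()    # accumulate adversarial gradient
    # fgm.restore()                                    # undo the perturbation
    # optimizer.step(); optimizer.zero_grad()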


Data availability

Not applicable.


Funding

This work is funded by the National Natural Science Foundation of China (No. 62076045) and the High-Level Talent Innovation Support Program (Young Science and Technology Star) of Dalian (No. 2021RQ066).

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by YL, YS and ZL. The first draft of the manuscript was written by YL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhaoqian Zhong.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Oscar Castillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Shan, Y., Liu, Z. et al. Transformer fast gradient method with relative positional embedding: a mutual translation model between English and Chinese. Soft Comput 27, 13435–13443 (2023). https://doi.org/10.1007/s00500-022-07678-5

