Abstract
Machine translation uses computers to transform text in one natural language into another. Neural machine translation models often fail to fully capture the sequential order of a text or the long-range dependencies between words, and they suffer from over-translation and mistranslation. To improve the naturalness, fluency, and accuracy of translation, this study proposes a new training strategy, the transformer fast gradient method with relative positional embedding (TF-RPE), which combines the fast gradient method (FGM) of adversarial training with relative positional embedding. Built on the transformer model, the input sequence is converted into word vectors by the word embedding layer, and positional information is then injected through relative positional embedding, helping each word vector better preserve the linguistic information of the word (meaning and semantics). Adding FGM adversarial training to the multi-head attention encoder strengthens the training of the word vectors and reduces omissions and mistranslations, significantly improving the overall computational efficiency and accuracy of the model. TF-RPE also produces satisfactory high-quality translations for low-resource corpora. Extensive ablation studies and comparative analyses validate the effectiveness of the scheme: TF-RPE improves on state-of-the-art methods by more than 3 BLEU (bilingual evaluation understudy) points on average.
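The two ingredients named above can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation: the function names, the clipping distance `max_dist`, and the perturbation radius `epsilon` are assumptions; in actual training, the FGM perturbation is applied to the embedding layer inside the backward pass of a transformer.

```python
import numpy as np

def relative_position_index(seq_len, max_dist=4):
    # Clipped relative offsets in the style of Shaw et al. (2018):
    # entry (i, j) indexes a table of 2*max_dist + 1 learnable
    # relative-position embeddings shared across the sequence.
    offsets = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    return np.clip(offsets, -max_dist, max_dist) + max_dist  # shift to >= 0

def fgm_perturb(embedding, grad, epsilon=1.0):
    # FGM in the style of Miyato et al. (2016): add an adversarial
    # perturbation of L2 norm epsilon along the loss gradient taken
    # with respect to the word embedding.
    norm = np.linalg.norm(grad)
    if norm == 0.0:  # no gradient signal, nothing to perturb
        return embedding
    return embedding + epsilon * grad / norm

# Toy example: a 4-dimensional word vector and its loss gradient.
emb = np.array([0.5, -0.2, 0.1, 0.3])
grad = np.array([0.3, 0.0, -0.4, 0.0])
adv = fgm_perturb(emb, grad, epsilon=1.0)  # ||adv - emb|| == epsilon
```

Training then computes the loss a second time on the perturbed embeddings and backpropagates both losses, which is what makes the learned word vectors robust to the small input distortions the abstract associates with omission and mistranslation.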
Data availability
Not applicable.
References
Abdulmumin I, Galadanci BS, Ahmad IS, Abdullahi RI (2021) Data selection as an alternative to quality estimation in self-learning for low resource neural machine translation. In: International conference on computational science and its applications, pp 311–326. Springer
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
Chen K, Wang R, Utiyama M, Sumita E (2019) Neural machine translation with reordering embeddings. In: Proceedings of the 57th annual meeting of the association for computational linguistics, pp 1787–1799
Chiang H-S, Chen M-Y, Huang Y-J (2019) Wavelet-based EEG processing for epilepsy detection using fuzzy entropy and associative Petri net. IEEE Access 7:103255–103262
Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014a) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259
Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014b) Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. In: International conference on machine learning, pp 1243–1252. PMLR
Goodfellow IJ, Shlens J, Szegedy C (2015) Explaining and harnessing adversarial examples. In: International conference on learning representations 2015
Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Junczys-Dowmunt M, Dwojak T, Hoang H (2016) Is neural machine translation ready for deployment? In: The 13th international conference on spoken language translation
Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1700–1709
Li B, Wang Z, Liu H, Du Q, Xiao T, Zhang C, Zhu J (2020a) Learning light-weight translation models from deep transformer. arXiv preprint arXiv:2012.13866
Li B, Wang Z, Liu H, Jiang Y, Du Q, Xiao T, Wang H, Zhu J (2020b) Shallow-to-deep training for neural machine translation. arXiv preprint arXiv:2010.03737
Liao B, Khadivi S, Hewavitharana S (2021) Back-translation for large-scale multilingual machine translation. arXiv preprint arXiv:2109.08712
López-González A, Meda-Campaña JA, Hernández-Martínez EG, Paniagua-Contro P (2020) Multi robot distance based formation using parallel genetic algorithm. Appl Soft Comput 86:105929
Meng F, Zhang J (2019) DTMT: a novel deep transition architecture for neural machine translation. Proc AAAI Conf Artif Intell 33:224–231
Miyato T, Dai AM, Goodfellow I (2016) Adversarial training methods for semi-supervised text classification. arXiv preprint arXiv:1605.07725
Mújica-Vargas D (2021) Superpixels extraction by an intuitionistic fuzzy clustering algorithm. J Appl Res Technol 19(2):140–152
Papineni K, Roukos S, Ward T, Zhu W-J (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the association for computational linguistics, pp 311–318
de Jesús Rubio J (2021) Stability analysis of the modified Levenberg-Marquardt algorithm for the artificial neural network training. IEEE Trans Neural Netw Learn Syst 32(8):3510–3524
Rubio J, Lughofer E, Pieper J, Cruz P, Martinez DI, Ochoa G, Islas MA, Enrique G (2021) Adapting H-infinity controller for the desired reference tracking of the sphere position in the Maglev process. Inf Sci 569:669–686
Rubio J, Islas MA, Ochoa G, Cruz DR, García E, Pacheco J (2022) Convergent Newton method and neural network for the electric energy usage prediction. Inf Sci 585:89–112
Shaw P, Uszkoreit J, Vaswani A (2018) Self-attention with relative position representations. arXiv preprint arXiv:1803.02155
Shi Y, Wang Y, Wu C, Yeh C-F, Chan J, Zhang F, Le D, Seltzer M (2021) Emformer: efficient memory transformer based acoustic model for low latency streaming speech recognition. In: ICASSP 2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6783–6787. IEEE
So D, Le Q, Liang C (2019) The evolved transformer. In: International conference on machine learning, pp 5877–5886. PMLR
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. Adv Neural Inf Process Syst 27
Tu Z, Lu Z, Liu Y, Liu X, Li H (2016) Modeling coverage for neural machine translation. In: Proceedings of the 54th annual meeting of the association for computational linguistics, pp 76–85
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30
Wilks Y (1993) Corpora and machine translation. In: Proceedings of machine translation summit IV, pp 137–146
Wu F, Fan A, Baevski A, Dauphin YN, Auli M (2019) Pay less attention with lightweight and dynamic convolutions. arXiv preprint arXiv:1901.10430
Zhu J, Xia Y, Wu L, He D, Qin T, Zhou W, Li H, Liu TY (2020) Incorporating BERT into neural machine translation. arXiv preprint arXiv:2002.06823
Ziemski M, Junczys-Dowmunt M, Pouliquen B (2016) The United Nations parallel corpus v1.0. In: Proceedings of the tenth international conference on language resources and evaluation (LREC'16), pp 3530–3534
Funding
This work is funded by the National Natural Science Foundation of China (No. 62076045) and the High-Level Talent Innovation Support Program (Young Science and Technology Star) of Dalian (No. 2021RQ066).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by YL, YS and ZL. The first draft of the manuscript was written by YL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Oscar Castillo.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Shan, Y., Liu, Z. et al. Transformer fast gradient method with relative positional embedding: a mutual translation model between English and Chinese. Soft Comput 27, 13435–13443 (2023). https://doi.org/10.1007/s00500-022-07678-5