
Transformer fast gradient method with relative positional embedding: a mutual translation model between English and Chinese

Soft Computing

Abstract

Machine translation uses computers to transform one natural language into another. Neural machine translation models often fail to fully capture the sequence order of a text or the long-range dependencies between words, and they suffer from over-translation and mistranslation. To improve the naturalness, fluency, and accuracy of translation, this study proposes a new training strategy, the transformer fast gradient method with relative positional embedding (TF-RPE), which combines the fast gradient method (FGM) of adversarial training with relative positional embedding. The model is built on the Transformer: after the word embedding layer maps each input token to a word vector, relative positional embedding injects positional information into that vector, helping it better preserve the word's linguistic information (meaning and semantics). Adding FGM adversarial training to the multi-head attention encoder strengthens the training of the word vectors and reduces omissions and mistranslations, significantly improving the overall computational efficiency and accuracy of the model. TF-RPE also delivers satisfactory, high-quality translations on low-resource corpora. Extensive ablation studies and comparative analyses validate the effectiveness of the scheme, and TF-RPE achieves an average improvement of more than 3 BLEU (bilingual evaluation understudy) points over state-of-the-art methods.
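
For concreteness, the sketch below illustrates how FGM adversarial training is typically applied to the word-embedding weights of a neural machine translation model in PyTorch: the embedding matrix is perturbed by r = epsilon * g / ||g||_2, computed from the gradient of the clean loss, an adversarial loss is back-propagated on the perturbed embeddings, and the original weights are then restored. This is a minimal sketch of the generic FGM recipe under stated assumptions, not the authors' released implementation; the FGM helper class, the parameter-name filter "embed", and the surrounding training-loop names are illustrative placeholders.

    import torch
    import torch.nn as nn

    class FGM:
        """Fast Gradient Method perturbation of the embedding weights.

        Minimal sketch of the standard recipe (not the paper's code):
        add r = epsilon * grad / ||grad||_2 to the embedding matrix,
        back-propagate an adversarial loss, then restore the weights.
        """

        def __init__(self, model: nn.Module, epsilon: float = 1.0, emb_name: str = "embed"):
            self.model = model
            self.epsilon = epsilon
            self.emb_name = emb_name  # substring identifying the embedding parameter (assumption)
            self.backup = {}

        def attack(self):
            for name, param in self.model.named_parameters():
                if param.requires_grad and self.emb_name in name and param.grad is not None:
                    self.backup[name] = param.data.clone()
                    norm = torch.norm(param.grad)
                    if norm != 0 and not torch.isnan(norm):
                        param.data.add_(self.epsilon * param.grad / norm)

        def restore(self):
            for name, param in self.model.named_parameters():
                if name in self.backup:
                    param.data = self.backup[name]
            self.backup = {}

    # Hypothetical training step (model, criterion, src, tgt, labels, optimizer are placeholders):
    # loss = criterion(model(src, tgt), labels)
    # loss.backward()                                  # gradients of the clean loss
    # fgm = FGM(model)
    # fgm.attack()                                     # perturb embeddings with r = eps * g / ||g||
    # criterion(model(src, tgt), labels).backward()    # accumulate adversarial gradient
    # fgm.restore()                                    # undo the perturbation
    # optimizer.step(); optimizer.zero_grad()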


Data availability

Not applicable.


Funding

This work is funded by the National Natural Science Foundation of China (No. 62076045) and the High-Level Talent Innovation Support Program (Young Science and Technology Star) of Dalian (No. 2021RQ066).

Author information


Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by YL, YS and ZL. The first draft of the manuscript was written by YL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhaoqian Zhong.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by Oscar Castillo.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Shan, Y., Liu, Z. et al. Transformer fast gradient method with relative positional embedding: a mutual translation model between English and Chinese. Soft Comput 27, 13435–13443 (2023). https://doi.org/10.1007/s00500-022-07678-5

