A fast adaptive algorithm for training deep neural networks

Abstract

Among adaptive algorithms, Adam is the most widely used, especially for training deep neural networks. However, recent studies have shown that it generalizes poorly and can even fail to converge in extreme cases. AdaX (2020) is a variant of Adam that modifies Adam's second moment, giving the algorithm good generalization ability comparable to that of SGD. This work aims to improve the AdaX algorithm with faster convergence and higher training accuracy. The first moment of AdaX is essentially a classical momentum term, whereas Nesterov's accelerated gradient (NAG) is theoretically and experimentally superior to classical momentum. We therefore replace the classical momentum term in the first moment of AdaX with NAG and name the resulting algorithm Nesterov's accelerated AdaX (Nadax). Extensive experiments on deep learning tasks show that training models with the proposed Nadax brings favorable benefits.
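
The sketch below is a minimal NumPy illustration of one Nadax-style parameter update, reconstructed only from the description above: an AdaX-style second moment with exponential long-term memory combined with a NAdam-like Nesterov lookahead in place of the classical first-moment term. The function name nadax_step, the hyperparameter defaults, and the exact bias correction are illustrative assumptions; the paper's pseudocode gives the precise recursions.

```python
import numpy as np

def nadax_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=1e-4, eps=1e-8):
    """One illustrative Nadax-style update (a sketch, not the paper's exact rule).

    theta : parameter array        grad : gradient at theta
    m, v  : first/second moment buffers   t : step count, starting at 1
    """
    # First moment: exponential moving average of gradients (as in Adam/AdaX).
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment with AdaX-style exponential long-term memory:
    # past squared gradients are accumulated rather than discounted away.
    v = (1.0 + beta2) * v + beta2 * grad ** 2
    v_hat = v / ((1.0 + beta2) ** t - 1.0)  # AdaX-style normalization
    # Nesterov-style lookahead on the first moment (NAdam-like substitution
    # for the classical momentum term).
    m_nes = beta1 * m + (1.0 - beta1) * grad
    theta = theta - lr * m_nes / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = 0.5 * ||x||^2, whose gradient at x is x.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 201):
    theta, m, v = nadax_step(theta, theta.copy(), m, v, t)
```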

References

  1. Li W, Zhang Z, Wang X, Luo P (2020) AdaX: adaptive gradient descent with exponential long term memory. arXiv:2004.09740

  2. Sharma N, Jain V, Mishra A (2018) An analysis of convolutional neural networks for image classification. Procedia Comput Sci 132:377–384

  3. Zhao W, Lou M, Qi Y, Wang Y, Xu C, Deng X, Ma Y (2021) Adaptive channel and multiscale spatial context network for breast mass segmentation in full-field mammograms. Appl Intell 51(12):8810–8827

  4. Tian P, Mo H, Jiang L (2021) Scene graph generation by multi-level semantic tasks. Appl Intell 51(11):7781–7793

  5. Gupta AK, Gupta P, Rahtu E (2021) FATALRead: fooling visual speech recognition models

  6. Robbins H, Monro S (1951) A stochastic approximation method. Ann Math Stat 22(3):400–407

  7. Nesterov Y (1983) A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN USSR 269:543–547

  8. Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: International conference on machine learning, pp 1139–1147. PMLR

  9. Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 12:2121–2159

  10. Zeiler MD (2012) ADADELTA: an adaptive learning rate method. arXiv:1212.5701

  11. Tieleman T, Hinton G (2012) Lecture 6.5 - RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2):26–31

  12. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  13. Dozat T (2016) Incorporating Nesterov momentum into Adam. In: ICLR workshop

  14. Reddi SJ, Kale S, Kumar S (2019) On the convergence of Adam and beyond. arXiv:1904.09237

  15. Wilson AC, Roelofs R, Stern M, Srebro N, Recht B (2017) The marginal value of adaptive gradient methods in machine learning. Adv Neural Inf Process Syst 30

  16. Luo L, Xiong Y, Liu Y, Sun X (2019) Adaptive gradient methods with dynamic bound of learning rate. arXiv:1902.09843

  17. Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5):1–17

  18. Zhuang J, Tang T, Ding Y, Tatikonda SC, Dvornek N, Papademetris X, Duncan J (2020) AdaBelief optimizer: adapting stepsizes by the belief in observed gradients. Adv Neural Inf Process Syst 33:18795–18806

  19. Hazan E (2019) Introduction to online convex optimization. arXiv:1909.05207

  20. Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. In: Proceedings of the 20th international conference on machine learning (ICML-03), pp 928–936

  21. LeCun Y (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

  22. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv:1711.05101

  23. Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747

  24. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  25. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto

  26. Everingham M, Eslami SMA, Van Gool L, Williams CKI, Winn J, Zisserman A (2015) The PASCAL visual object classes challenge: a retrospective. Int J Comput Vis 111(1):98–136

  27. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  28. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  29. Li H, Xu Z, Taylor G, Studer C, Goldstein T (2018) Visualizing the loss landscape of neural nets. Adv Neural Inf Process Syst 31

Acknowledgements

This work is supported in part by the Natural Science Foundation of China under Grant No. 61472003, by the Academic and Technical Leaders and Backup Candidates of Anhui Province under Grant No. 2019h211, and by the Innovation Team of the '50 Star of Science and Technology' of Huainan, Anhui Province.

Author information

Corresponding author

Correspondence to Dequan Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Gui, Y., Li, D. & Fang, R. A fast adaptive algorithm for training deep neural networks. Appl Intell 53, 4099–4108 (2023). https://doi.org/10.1007/s10489-022-03629-7
