Abstract
In this study, we compare and contrast seven of the most widely used first-order gradient-based optimization algorithms for machine learning problems. These methods are Stochastic Gradient Descent with momentum (SGD), Adaptive Gradient (AdaGrad), Adaptive Delta (AdaDelta), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Nesterov-accelerated Adaptive Moment Estimation (Nadam) and Adamax (Adam based on the infinity norm). For model creation and comparison, three test problems are addressed: regression, binary classification and multi-class classification. Using three randomly selected datasets, we trained models and evaluated each optimization strategy in terms of accuracy and loss. The overall experimental results demonstrate that Nadam outperformed the other optimization approaches across these datasets in terms of accuracy, but not in terms of training time. When both time and accuracy are considered, the Adam optimizer performed best.
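To make the comparison concrete, the following is a minimal sketch of the Adam update rule applied to a toy one-dimensional objective. The hyperparameter values (learning rate, beta coefficients, epsilon) are the common defaults from Kingma and Ba's paper, not values taken from this study's experiments, and the quadratic objective is purely illustrative.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.001, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=5000):
    """Minimize a scalar function via Adam, given its gradient grad_fn."""
    x = x0
    m = 0.0  # first-moment (mean) estimate of the gradient
    v = 0.0  # second-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # update biased first moment
        v = beta2 * v + (1 - beta2) * g * g    # update biased second moment
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Illustrative objective: f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, lr=0.05)
```

Nadam modifies the same update by applying the Nesterov look-ahead to the bias-corrected first moment; the rest of the loop is unchanged.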
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Maurya, M., Yadav, N. (2023). A Comparative Analysis of Gradient-Based Optimization Methods for Machine Learning Problems. In: Yadav, A., Gupta, G., Rana, P., Kim, J.H. (eds) Proceedings on International Conference on Data Analytics and Computing. ICDAC 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 175. Springer, Singapore. https://doi.org/10.1007/978-981-99-3432-4_7
DOI: https://doi.org/10.1007/978-981-99-3432-4_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-3431-7
Online ISBN: 978-981-99-3432-4
eBook Packages: Intelligent Technologies and Robotics (R0)