
A Comparative Analysis of Gradient-Based Optimization Methods for Machine Learning Problems

  • Conference paper
Proceedings on International Conference on Data Analytics and Computing (ICDAC 2022)

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 175)

Abstract

In this study, we compare seven of the most widely used first-order gradient-based optimization algorithms for machine learning problems: Stochastic Gradient Descent with momentum (SGD), Adaptive Gradient (AdaGrad), Adaptive Delta (AdaDelta), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Nesterov-accelerated Adaptive Moment Estimation (Nadam), and Maximum Adaptive Moment Estimation (Adamax). For model creation and comparison, three test problems are addressed: regression, binary classification, and multi-class classification. Using three randomly selected datasets, we train a model for each problem and evaluate each optimization strategy in terms of accuracy and loss. The overall experimental results demonstrate that Nadam outperforms the other optimization approaches across these datasets in terms of accuracy, but not in terms of training time; when both time and accuracy are considered, the Adam optimizer performs best.
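
As a rough illustration of the comparison described above, the following is a minimal sketch (not taken from the paper) of how the seven optimizers could be evaluated on a single multi-class classification task using Keras. The framework, network architecture, hyperparameters, and the use of MNIST are assumptions made purely for illustration.

    import tensorflow as tf

    # Candidate optimizers (default hyperparameters, plus momentum for SGD).
    optimizers = {
        "SGD+momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        "AdaGrad": tf.keras.optimizers.Adagrad(),
        "AdaDelta": tf.keras.optimizers.Adadelta(),
        "RMSProp": tf.keras.optimizers.RMSprop(),
        "Adam": tf.keras.optimizers.Adam(),
        "Nadam": tf.keras.optimizers.Nadam(),
        "Adamax": tf.keras.optimizers.Adamax(),
    }

    # MNIST stands in for the multi-class classification problem (an assumption).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    for name, opt in optimizers.items():
        # Build a fresh, identical model for every optimizer so the comparison is fair.
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(28, 28)),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer=opt,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=5, verbose=0)
        loss, acc = model.evaluate(x_test, y_test, verbose=0)
        print(f"{name}: test accuracy = {acc:.4f}, test loss = {loss:.4f}")

The same loop can be timed (for example with time.perf_counter) to compare training time, and repeated with an appropriate loss function for the regression and binary classification problems.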

Author information

Corresponding author

Correspondence to Neha Yadav.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Maurya, M., Yadav, N. (2023). A Comparative Analysis of Gradient-Based Optimization Methods for Machine Learning Problems. In: Yadav, A., Gupta, G., Rana, P., Kim, J.H. (eds) Proceedings on International Conference on Data Analytics and Computing. ICDAC 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 175. Springer, Singapore. https://doi.org/10.1007/978-981-99-3432-4_7
