A fast non-monotone line search for stochastic gradient descent

  • Research Article
  • Published in: Optimization and Engineering

Abstract

We present an improved non-monotone line search algorithm for stochastic gradient descent (SGD) on functions that satisfy interpolation conditions. We establish theoretical convergence guarantees for the algorithm on non-convex functions and conduct a detailed empirical evaluation that validates the theoretical results.
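To make the setting concrete, the following is a minimal sketch of one SGD step with a non-monotone, Armijo-type backtracking line search. It is not the article's Algorithm 1: the acceptance rule (comparing the mini-batch loss against the maximum of the last M accepted values, in the style of Grippo-type non-monotone searches), the constants eta_max, c, beta, M, and the helper callables stoch_loss and stoch_grad are all illustrative assumptions.

    import numpy as np

    def nonmonotone_sgd_step(w, stoch_loss, stoch_grad, history,
                             eta_max=1.0, c=0.1, beta=0.7, M=10):
        """One SGD step with a non-monotone Armijo backtracking line search.

        stoch_loss(w) and stoch_grad(w) evaluate the loss and gradient on the
        current mini-batch; history collects accepted mini-batch losses so the
        Armijo test compares against the maximum of the last M of them rather
        than the current loss alone.
        """
        f_k = stoch_loss(w)
        g_k = stoch_grad(w)
        g_norm_sq = float(np.dot(g_k, g_k))

        history.append(f_k)
        reference = max(history[-M:])   # non-monotone reference value

        eta = eta_max
        # Backtrack until the stochastic Armijo condition holds w.r.t. reference.
        while stoch_loss(w - eta * g_k) > reference - c * eta * g_norm_sq:
            eta *= beta
            if eta < 1e-8:              # safeguard against stalling
                break
        return w - eta * g_k

Called once per mini-batch with a shared history list, a rule of this kind tolerates occasional increases in the mini-batch loss as long as the step improves on the worst recent accepted value, which is what allows larger step sizes than a strictly monotone Armijo search.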


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
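For reference, the files on that page are in the plain-text LIBSVM sparse format and can be read with scikit-learn. This loading sketch is not taken from the article, and the file name is a placeholder for any binary-classification dataset downloaded from the page.

    from sklearn.datasets import load_svmlight_file

    # Load a LIBSVM-format binary dataset; X is a SciPy sparse matrix,
    # y the vector of labels.
    X, y = load_svmlight_file("mushrooms.txt")
    print(X.shape, y.shape)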


Author information

Corresponding author

Correspondence to Sajad Fathi Hafshejani.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Fathi Hafshejani, S., Gaur, D., Hossain, S. et al. A fast non-monotone line search for stochastic gradient descent. Optim Eng (2023). https://doi.org/10.1007/s11081-023-09836-6

