Abstract
We give an improved non-monotone line search algorithm for stochastic gradient descent (SGD) for functions that satisfy interpolation conditions. We establish theoretical convergence guarantees for the algorithm for non-convex functions. We conduct a detailed empirical evaluation to validate the theoretical results.
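The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a non-monotone backtracking line search inside SGD, not the authors' method. It assumes the classical max-type non-monotone Armijo condition, where a step is accepted if the mini-batch loss falls sufficiently below the largest of the last few recorded losses. The function name `nonmonotone_sgd`, the callbacks `loss_fn` and `grad_fn`, and all parameter values (`eta_max`, `c`, `beta`, `memory`, `max_backtracks`) are hypothetical choices for illustration.

```python
from collections import deque


def nonmonotone_sgd(loss_fn, grad_fn, w0, batches, eta_max=1.0, c=0.1,
                    beta=0.5, memory=5, max_backtracks=30):
    """Illustrative SGD with a max-type non-monotone Armijo backtracking search.

    loss_fn(w, batch) -> float and grad_fn(w, batch) -> list of floats are
    assumed to evaluate the loss and its gradient on one mini-batch.
    """
    w = list(w0)
    # Sliding window of recent mini-batch losses. Mixing losses from
    # different batches is a heuristic assumption of this sketch.
    recent = deque(maxlen=memory)
    for batch in batches:
        f = loss_fn(w, batch)
        g = grad_fn(w, batch)
        recent.append(f)
        ref = max(recent)  # non-monotone reference value, always >= f
        g2 = sum(gi * gi for gi in g)  # squared gradient norm
        eta = eta_max
        for _ in range(max_backtracks):
            trial = [wi - eta * gi for wi, gi in zip(w, g)]
            # Accept if the trial loss is sufficiently below the reference.
            if loss_fn(trial, batch) <= ref - c * eta * g2:
                break
            eta *= beta  # otherwise shrink the step and try again
        # If the cap is hit, a very small step is taken anyway.
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w
```

Because the reference value is the maximum over a window rather than the current loss alone, the search can accept larger steps than a monotone Armijo rule would, at the cost of allowing the mini-batch loss to rise temporarily.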
About this article
Cite this article
Fathi Hafshejani, S., Gaur, D., Hossain, S. et al. A fast non-monotone line search for stochastic gradient descent. Optim Eng (2023). https://doi.org/10.1007/s11081-023-09836-6
DOI: https://doi.org/10.1007/s11081-023-09836-6