Abstract
The steplength selection is a crucial issue for the effectiveness of stochastic gradient methods applied to the large-scale optimization problems arising in machine learning. In a recent paper, Bollapragada et al. (SIAM J Optim 28(4):3312–3343, 2018) propose to include an adaptive subsampling strategy in a stochastic gradient scheme, with the aim of ensuring that the stochastic gradient directions are descent directions in expectation. In this approach, theoretical convergence properties are preserved under the assumption that, at every iteration, the positive steplength satisfies a suitable bound depending on the inverse of the Lipschitz constant of the objective function gradient. In this paper, we propose to tailor to the stochastic gradient scheme the steplength selection adopted in the full-gradient method known as the limited memory steepest descent method. This strategy, based on the Ritz-like values of a suitable matrix, provides a local estimate of the inverse of the local Lipschitz parameter without introducing line search techniques, while a possible increase in the size of the subsample used to compute the stochastic gradient enables the variance of this direction to be controlled. An extensive numerical experimentation highlights that the new rule makes the tuning of the parameters less expensive than the trial-and-error procedure required to efficiently select a constant steplength in standard and mini-batch stochastic gradient methods.
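As a rough illustration of the mechanism the abstract describes, the following sketch runs a mini-batch stochastic gradient loop in which (i) the steplength is estimated adaptively from curvature information rather than tuned by hand, and (ii) the subsample grows geometrically to control the variance of the stochastic direction. For simplicity it uses a Barzilai–Borwein (BB1) steplength, a stand-in for the Ritz-like rule of the paper, and all problem data, the growth factor, and the clipping bounds are illustrative assumptions, not the authors' actual algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: f(w) = (1/2n) ||A w - b||^2
n, d = 1000, 10
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true + 0.01 * rng.standard_normal(n)

def grad_batch(w, idx):
    """Stochastic gradient of f computed on the subsample idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ w - b[idx]) / len(idx)

w = np.zeros(d)
batch = 20                      # initial subsample size
alpha = 0.1                     # initial steplength
w_prev = g_prev = None

for k in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    g = grad_batch(w, idx)
    if g_prev is not None:
        s, y = w - w_prev, g - g_prev
        sy = s @ y
        if sy > 1e-12:
            # BB1 steplength (s.s / s.y): a local estimate of the inverse
            # Lipschitz constant, clipped for robustness to sampling noise
            alpha = float(np.clip((s @ s) / sy, 1e-3, 2.0))
    w_prev, g_prev = w.copy(), g
    w = w - alpha * g
    # grow the subsample geometrically to reduce the variance
    batch = min(n, int(batch * 1.05))
```

The BB1 rule is the simplest member of this family: Fletcher's limited memory steepest descent method recovers the BB steplength when the memory is one sweep of a single gradient, while larger memories yield the Ritz-like values the paper exploits.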
References
Bellavia S, Gurioli G, Morini B, Toint PL (2019) Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM J Optim 29(4):2881–2915
Bollapragada R, Byrd R, Nocedal J (2018) Adaptive sampling strategies for stochastic optimization. SIAM J Optim 28(4):3312–3343
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311
Byrd RH, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math Program 134(1):127–155
Cartis C, Scheinberg K (2015) Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math Program 1:1–39
Curtis FE, Guo W (2016) Handling nonpositive curvature in a limited memory steepest descent method. IMA J Numer Anal 36(2):717–742. https://doi.org/10.1093/imanum/drv034
Dai YH, Yuan Y (2003) Alternate minimization gradient method. IMA J Numer Anal 23:377–393
Defazio A, Bach FR, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in neural information processing systems 27 (NIPS 2014)
di Serafino D, Ruggiero V, Toraldo G, Zanni L (2018) On the steplength selection in gradient methods for unconstrained optimization. Appl Math Comput 318:176–195
Fletcher R (2012) A limited memory steepest descent method. Math Program Ser A 135:413–436
Franchini G, Ruggiero V, Zanni L (2020) On the steplength selection in Stochastic Gradient Methods. In: Sergeyev YD, Kvasov DE (eds) Numerical computations: theory and algorithms (NUMTA, 2019). Lecture notes in computer science, vol 11973. Springer, Berlin
Frassoldati G, Zanghirati G, Zanni L (2008) New adaptive stepsize selections in gradient methods. J Ind Manag Optim 4(2):299–312
Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J Sci Comput 34(3):A1380–A1405
Hashemi F, Ghosh S, Pasupathy R (2014) On adaptive sampling rules for stochastic recursions. In: Proceedings of the Winter Simulation Conference (WSC 2014), pp 3959–3970
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26. Curran Associates Inc, Red Hook, pp 315–323
Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Machine learning and knowledge discovery in databases: ECML PKDD 2016. Lecture notes in computer science, vol 9851. Springer, Berlin
Tan C, Ma S, Dai Y, Qian Y (2016) Barzilai–Borwein step size for stochastic gradient descent. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29 (NIPS 2016). Curran Associates Inc, Red Hook
Tropp JA (2015) An introduction to matrix concentration inequalities. Found Trends Mach Learn 8(1–2):1–230. https://doi.org/10.1561/2200000048
Yang Z, Wang C, Zang Y, Li J (2018) Mini-batch algorithms with Barzilai–Borwein update step. Neurocomputing 314:177–185
Zhou B, Gao L, Dai YH (2006) Gradient methods with adaptive step-sizes. Comput Optim Appl 35(1):69–86
Acknowledgements
We thank the anonymous reviewers for their careful reading of our manuscript and their useful comments and suggestions. This work has been partially supported by the INdAM research group GNCS, of which the authors are members.
Ethics declarations
Conflict of interest
Giorgia Franchini, Valeria Ruggiero and Luca Zanni declare that they have no conflict of interest.
Human and animal rights statement
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Yaroslav D. Sergeyev.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Franchini, G., Ruggiero, V. & Zanni, L. Ritz-like values in steplength selections for stochastic gradient methods. Soft Comput 24, 17573–17588 (2020). https://doi.org/10.1007/s00500-020-05219-6