Abstract
The steplength selection is a crucial issue for the effectiveness of stochastic gradient methods applied to the large-scale optimization problems arising in machine learning. In a recent paper, Bollapragada et al. (SIAM J Optim 28(4):3312–3343, 2018) propose to include an adaptive subsampling strategy in a stochastic gradient scheme, with the aim of ensuring that the stochastic gradient directions are descent directions in expectation. In this approach, theoretical convergence properties are preserved under the assumption that, at every iteration, the positive steplength satisfies a suitable bound depending on the inverse of the Lipschitz constant of the objective function gradient. In this paper, we propose to tailor to the stochastic gradient scheme the steplength selection adopted in the full-gradient method known as the limited memory steepest descent method. This strategy, based on the Ritz-like values of a suitable matrix, provides a local estimate of the inverse of the local Lipschitz parameter without introducing line search techniques, while a possible increase in the size of the subsample used to compute the stochastic gradient enables the variance of this direction to be controlled. An extensive numerical experimentation highlights that the new rule makes the tuning of the parameters less expensive than the trial-and-error procedure required to efficiently select a constant steplength in standard and mini-batch stochastic gradient methods.
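As a rough illustration of the mechanism the abstract describes, the following sketch runs a mini-batch stochastic gradient loop in which (i) the steplength is estimated adaptively from curvature information rather than tuned by hand, and (ii) the subsample grows geometrically to control the variance of the stochastic direction. For simplicity it uses a Barzilai–Borwein (BB1) steplength, a stand-in for the Ritz-like rule of the paper, and all problem data, the growth factor, and the clipping bounds are illustrative assumptions, not the authors' actual algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: f(w) = (1/2n) ||A w - b||^2
n, d = 1000, 10
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
b = A @ w_true + 0.01 * rng.standard_normal(n)

def grad_batch(w, idx):
    """Stochastic gradient of f computed on the subsample idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ w - b[idx]) / len(idx)

w = np.zeros(d)
batch = 20                      # initial subsample size
alpha = 0.1                     # initial steplength
w_prev = g_prev = None

for k in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    g = grad_batch(w, idx)
    if g_prev is not None:
        s, y = w - w_prev, g - g_prev
        sy = s @ y
        if sy > 1e-12:
            # BB1 steplength (s.s / s.y): a local estimate of the inverse
            # Lipschitz constant, clipped for robustness to sampling noise
            alpha = float(np.clip((s @ s) / sy, 1e-3, 2.0))
    w_prev, g_prev = w.copy(), g
    w = w - alpha * g
    # grow the subsample geometrically to reduce the variance
    batch = min(n, int(batch * 1.05))
```

The BB1 rule is the simplest member of this family: Fletcher's limited memory steepest descent method recovers the BB steplength when the memory is one sweep of a single gradient, while larger memories yield the Ritz-like values the paper exploits.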
References
Bellavia S, Gurioli G, Morini B, Toint PL (2019) Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM J Optim 29(4):2881–2915
Bollapragada R, Byrd R, Nocedal J (2018) Adaptive sampling strategies for stochastic optimization. SIAM J Optim 28(4):3312–3343
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev 60(2):223–311
Byrd RH, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math Program 134(1):127–155
Cartis C, Scheinberg K (2015) Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Math Program 1:1–39
Curtis FE, Guo W (2016) Handling nonpositive curvature in a limited memory steepest descent method. IMA J Numer Anal 36(2):717–742. https://doi.org/10.1093/imanum/drv034
Dai YH, Yuan Y (2003) Alternate minimization gradient method. IMA J Numer Anal 23:377–393
Defazio A, Bach FR, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in neural information processing systems 27 (NIPS 2014)
di Serafino D, Ruggiero V, Toraldo G, Zanni L (2018) On the steplength selection in gradient methods for unconstrained optimization. Appl Math Comput 318:176–195
Fletcher R (2012) A limited memory steepest descent method. Math Program Ser A 135:413–436
Franchini G, Ruggiero V, Zanni L (2020) On the steplength selection in Stochastic Gradient Methods. In: Sergeyev YD, Kvasov DE (eds) Numerical computations: theory and algorithms (NUMTA, 2019). Lecture notes in computer science, vol 11973. Springer, Berlin
Frassoldati G, Zanghirati G, Zanni L (2008) New adaptive stepsize selections in gradient methods. J Ind Manag Optim 4(2):299–312
Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J Sci Comput 34(3):A1380–A1405
Hashemi F, Ghosh S, Pasupathy R (2014) On adaptive sampling rules for stochastic recursions. In: Proceedings of the Winter Simulation Conference (WSC 2014), pp 3959–3970
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in neural information processing systems, vol 26. Curran Associates Inc, Red Hook, pp 315–323
Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Łojasiewicz condition. In: Frasconi P, Landwehr N, Manco G, Vreeken J (eds) Machine learning and knowledge discovery in databases: ECML PKDD 2016. Lecture notes in computer science, vol 9851. Springer, Berlin
Tan C, Ma S, Dai Y, Qian Y (2016) Barzilai–Borwein step size for stochastic gradient descent. In: Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R (eds) Advances in neural information processing systems 29 (NIPS 2016). Curran Associates Inc, Red Hook
Tropp JA (2015) An introduction to matrix concentration inequalities. Found Trends Mach Learn 8(1–2):1–230. https://doi.org/10.1561/2200000048
Yang Z, Wang C, Zang Y, Li J (2018) Mini-batch algorithms with Barzilai–Borwein update step. Neurocomputing 314:177–185
Zhou B, Gao L, Dai YH (2006) Gradient methods with adaptive step-sizes. Comput Optim Appl 35(1):69–86
Acknowledgements
We thank the anonymous reviewers for their careful reading of our manuscript and their useful comments and suggestions. This work has been partially supported by the INdAM research group GNCS, of which the authors are members.
Ethics declarations
Conflict of interest
Giorgia Franchini, Valeria Ruggiero and Luca Zanni declare that they have no conflict of interest.
Human and animal rights statement
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by Yaroslav D. Sergeyev.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Franchini, G., Ruggiero, V. & Zanni, L. Ritz-like values in steplength selections for stochastic gradient methods. Soft Comput 24, 17573–17588 (2020). https://doi.org/10.1007/s00500-020-05219-6