
On the Steplength Selection in Stochastic Gradient Methods

  • Giorgia Franchini
  • Valeria Ruggiero
  • Luca Zanni
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11973)

Abstract

This paper deals with steplength selection in stochastic gradient methods for large-scale optimization problems arising in machine learning. We introduce an adaptive steplength selection obtained by tailoring a limited memory steplength rule, recently developed in the deterministic context, to the stochastic gradient approach. The proposed rule provides values within an interval whose bounds have to be prefixed by the user. A suitable choice of the interval bounds allows the method to perform similarly to the standard stochastic gradient method equipped with the best-tuned steplength. Since the setting of the bounds only slightly affects the performance, the new rule makes the tuning of the parameters less expensive than the choice of the optimal prefixed steplength in the standard stochastic gradient method. We evaluate the behaviour of the proposed steplength selection by training binary classifiers on well-known data sets with different loss functions.
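
The sketch below only illustrates the idea stated in the abstract of keeping an adaptive steplength inside a user-prefixed interval within a plain mini-batch stochastic gradient loop. All names and parameters are hypothetical, and the actual rule of the paper (Ritz-like values computed from a limited memory scheme) is not reproduced here; it is represented by a pluggable callable.

```python
import numpy as np

def sgd_with_bounded_steplength(grad_fn, w0, data, n_epochs=10, batch_size=32,
                                alpha_min=1e-3, alpha_max=1e-1,
                                adaptive_steplength=None, rng=None):
    """Minimal SGD sketch: the steplength proposed by an adaptive rule is
    projected onto the prefixed interval [alpha_min, alpha_max].
    `adaptive_steplength` stands in for the paper's Ritz-like rule."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(w0, dtype=float).copy()
    n = len(data)
    for _ in range(n_epochs):
        # sample mini-batches by shuffling the index set once per epoch
        for idx in np.array_split(rng.permutation(n), max(1, n // batch_size)):
            g = grad_fn(w, [data[i] for i in idx])          # mini-batch stochastic gradient
            alpha = alpha_max if adaptive_steplength is None \
                else adaptive_steplength(w, g)               # tentative steplength from the rule
            alpha = min(max(alpha, alpha_min), alpha_max)    # keep it inside the prefixed bounds
            w = w - alpha * g                                # stochastic gradient step
    return w
```

With `adaptive_steplength=None` this reduces to standard SGD with a fixed steplength `alpha_max`, which is the baseline the abstract compares against.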

Keywords

Stochastic gradient methods · Steplength selection rule · Ritz-like values · Machine learning


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
  2. Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
  3. INdAM Research Group GNCS, Rome, Italy
