Abstract
Stochastic gradient algorithms are of considerable interest for nonlinear optimization, especially in high dimensions, where the choice of the step size largely determines the convergence rate. In this paper, we propose two new stochastic gradient algorithms that use an improved Barzilai–Borwein step size formula. Convergence analysis shows that these algorithms attain linear convergence in probability for strongly convex objective functions. Our computational experiments confirm that the proposed algorithms outperform both two-point gradient algorithms and well-known stochastic gradient methods.
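To illustrate the general idea, the sketch below runs stochastic gradient descent on a strongly convex least-squares problem and recomputes the step size once per epoch with the classical Barzilai–Borwein (BB1) formula, in the spirit of SGD-BB (Tan et al., 2016). The paper's own improved BB formula is not stated in this abstract, so the update rule, the 1/n scaling, and the stability cap used here are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Synthetic strongly convex least-squares problem:
#   f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def grad_i(x, i):
    # Stochastic gradient of the single-sample loss f_i(x) = 0.5*(a_i^T x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    # Full gradient, used only to form the BB step once per epoch
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta = 0.01                       # hand-picked step size for the first epoch
x_prev, g_prev = x.copy(), full_grad(x)

for epoch in range(30):
    for i in rng.permutation(n):
        x -= eta * grad_i(x, i)  # plain SGD inner loop with a fixed step
    # BB1 step from successive points and gradients:
    #   eta = ||s||^2 / (s^T y),  s = x_k - x_{k-1},  y = g_k - g_{k-1},
    # scaled by 1/n for the stochastic setting and capped for stability
    # (both choices are illustrative assumptions).
    g = full_grad(x)
    s, y = x - x_prev, g - g_prev
    if abs(s @ y) > 1e-12:
        eta = min(abs((s @ s) / (s @ y)) / n, 0.02)
    x_prev, g_prev = x.copy(), g

print(np.linalg.norm(x - x_true))  # small residual: close to the true solution
```

The BB formula adapts the step to the local curvature seen between consecutive epochs, which is what removes the need for a hand-tuned decreasing schedule; variance-reduced variants (e.g. SVRG-BB) replace the full gradient above with a cheaper estimate.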
Funding
This study was supported in part by the Nanjing University of Aeronautics and Astronautics (project no. NG2019004), National Natural Science Foundation of China (project no. 11971231), and Russian Foundation for Basic Research (project no. 19-01-00625).
Author information
Authors and Affiliations
Corresponding authors
Additional information
Translated by Yu. Kornienko
Cite this article
Wang, L., Wu, H. & Matveev, I.A. Stochastic Gradient Method with Barzilai–Borwein Step for Unconstrained Nonlinear Optimization. J. Comput. Syst. Sci. Int. 60, 75–86 (2021). https://doi.org/10.1134/S106423072101010X