Abstract
The inexact SARAH (iSARAH) algorithm, a variant of the SARAH algorithm for variance reduction, has recently surged into prominence for solving large-scale optimization problems in machine learning. The performance of iSARAH depends significantly on the choice of step-size sequence. In this paper, we develop a new algorithm called iSARAH-BB, which employs the Barzilai–Borwein (BB) method to compute the step size automatically within the SARAH framework. By introducing this adaptive step size into the design of the new algorithm, iSARAH-BB combines the advantages of both the iSARAH and BB methods. We analyze the convergence rate and the complexity of the new algorithm under the usual assumptions. Numerical experiments on standard datasets indicate that the proposed iSARAH-BB algorithm is robust to the selection of the initial step size and is more competitive than existing algorithms.
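To make the idea concrete, the following is a minimal sketch of a SARAH-style outer loop whose step size is recomputed each epoch by the BB1 rule, scaled by the inner-loop length m as in SVRG-BB-type methods. This is an illustrative reconstruction under stated assumptions, not the authors' exact iSARAH-BB; the function names `grad_full` and `grad_i` are hypothetical placeholders for the full gradient and the i-th component gradient.

```python
import numpy as np

def sarah_bb(grad_full, grad_i, x0, n, m=100, epochs=30, eta0=0.02, seed=0):
    """SARAH outer loop with a Barzilai-Borwein (BB1) step size.

    grad_full(x): full gradient of the objective.
    grad_i(x, i): gradient of the i-th component function.
    A sketch only; the published iSARAH-BB may differ in details
    (sampling, inner-iterate choice, safeguards on the BB step).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    x_prev, g_prev = None, None
    eta = eta0                                  # initial step size, used in epoch 0 only
    for _ in range(epochs):
        g = grad_full(x)                        # full gradient at the snapshot
        if x_prev is not None:
            s = x - x_prev                      # iterate difference
            y = g - g_prev                      # gradient difference
            denom = s @ y
            if denom > 1e-12:                   # BB1 step, scaled by 1/m
                eta = (s @ s) / (m * denom)
        x_prev, g_prev = x.copy(), g.copy()
        v = g                                   # SARAH gradient estimator
        w_old = x.copy()
        x = x - eta * v
        for _ in range(m - 1):                  # SARAH recursive inner updates
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(w_old, i) + v
            w_old = x.copy()
            x = x - eta * v
        # keep the last inner iterate as the next snapshot (one common choice)
    return x
```

On a well-conditioned quadratic, the BB numerator/denominator ratio approximates the inverse curvature along the last outer step, so the method self-tunes without a hand-picked step-size schedule after the first epoch.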
Data Availability
All data generated or analyzed during this study are included in this published article.
Notes
The datasets heart, splice, mushrooms, ijcnn1, a9a, and w8a can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Acknowledgements
The authors are indebted to the editors and anonymous referees for their many helpful comments and suggestions, which improved the quality of this manuscript.
Funding
This work was supported by the Research Project Foundation of Shanxi Scholarship Council of China (No. 2017-104) and the Basic Research Program of Shanxi Province (Free Exploration) Project (Nos. 202103021224303, 20210302124688).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, Ym., Wang, Fs., Li, Jx. et al. A new inexact stochastic recursive gradient descent algorithm with Barzilai–Borwein step size in machine learning. Nonlinear Dyn 111, 3575–3586 (2023). https://doi.org/10.1007/s11071-022-07987-2