
A Stochastic Modified Limited Memory BFGS for Training Deep Neural Networks

  • Conference paper

Intelligent Computing (SAI 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 507)


Abstract

In this work, we study stochastic quasi-Newton methods for solving the non-linear, non-convex optimization problems arising in the training of deep neural networks. We consider the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update within a trust-region framework. We give a broad overview of recent improvements in quasi-Newton-based training algorithms, such as careful selection of the initial Hessian approximation, efficient high-accuracy solution of the trust-region subproblem with a direct method, and an overlap sampling strategy that ensures stable quasi-Newton updating by computing gradient differences on the overlap between consecutive batches. We compare the standard L-BFGS method with a variant based on a modified secant condition, which is theoretically shown to approximate the curvature of the Hessian with an increased order of accuracy. In our experiments, both quasi-Newton updates exhibit comparable performance. Our results show that, within a fixed computational time budget, the proposed quasi-Newton methods achieve comparable or better testing accuracy than the state-of-the-art first-order Adam optimizer.
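
To make the ingredients above concrete, the following is a minimal Python sketch written under our own assumptions rather than the authors' implementation: an overlap-based curvature pair in which both gradients are evaluated on the samples shared by two consecutive mini-batches (so that y reflects curvature rather than sampling noise), a modified secant correction of y using function values (one common variant from the literature; the exact modification used in the paper may differ), and the standard L-BFGS two-loop recursion with the usual scaled-identity initial Hessian. The helper grad_fn and the index set overlap_idx are hypothetical, and the paper itself employs the Hessian approximation inside a trust-region subproblem rather than the line-search direction computed here.

    import numpy as np

    def overlap_curvature_pair(grad_fn, w_prev, w_new, overlap_idx):
        # Multi-batch-style curvature pair: both gradients are evaluated on the
        # SAME overlap samples shared by two consecutive mini-batches, so that
        # y measures curvature rather than sampling noise.  grad_fn(w, idx) is a
        # hypothetical helper returning the mini-batch gradient of the training
        # loss at the weights w over the samples indexed by idx.
        s = w_new - w_prev
        y = grad_fn(w_new, overlap_idx) - grad_fn(w_prev, overlap_idx)
        return s, y

    def modified_y(s, y, f_prev, f_new, g_prev, g_new):
        # One modified secant condition proposed in the literature: y is
        # corrected with function-value information to capture curvature more
        # accurately.  Illustrative only; the exact form used in the paper may differ.
        theta = 2.0 * (f_prev - f_new) + np.dot(g_prev + g_new, s)
        return y + (theta / np.dot(s, s)) * s

    def lbfgs_direction(g, S, Y):
        # Standard L-BFGS two-loop recursion returning -H_k g, with the common
        # initialization H_0 = (s^T y / y^T y) I built from the most recent pair.
        # S and Y hold the stored curvature pairs, oldest first.  (The paper
        # works with the Hessian approximation B_k inside a trust region
        # instead of the line-search direction computed here.)
        if not S:
            return -g
        q = g.copy()
        rhos = [1.0 / np.dot(y, s) for s, y in zip(S, Y)]
        alphas = []
        for s, y, rho in zip(reversed(S), reversed(Y), reversed(rhos)):
            a = rho * np.dot(s, q)
            alphas.append(a)
            q = q - a * y
        gamma = np.dot(S[-1], Y[-1]) / np.dot(Y[-1], Y[-1])  # initial scaling
        r = gamma * q
        for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
            b = rho * np.dot(y, r)
            r = r + (a - b) * s
        return -r

In a full training loop one would keep only a bounded history of (s, y) pairs, discard pairs violating the curvature condition s^T y > 0, and embed the resulting quasi-Newton model in the trust-region acceptance test.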


Notes

  1. Z-score normalization produces a dataset whose mean and standard deviation are zero and one, respectively.
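
     As a minimal illustration (our own sketch, not code from the paper), per-feature z-score normalization of a data matrix X with samples in rows can be written as:

         import numpy as np

         def z_score(X, eps=1e-12):
             # Normalize each feature (column) of X to zero mean and unit standard
             # deviation; eps guards against division by zero for constant features.
             mu = X.mean(axis=0)
             sigma = X.std(axis=0)
             return (X - mu) / (sigma + eps)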


Author information


Corresponding author

Correspondence to Ángeles Martínez Calomardo.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Yousefi, M., Martínez Calomardo, Á. (2022). A Stochastic Modified Limited Memory BFGS for Training Deep Neural Networks. In: Arai, K. (eds) Intelligent Computing. SAI 2022. Lecture Notes in Networks and Systems, vol 507. Springer, Cham. https://doi.org/10.1007/978-3-031-10464-0_2
