Abstract
Incorporating second-order curvature information into gradient-based methods has been shown to improve convergence drastically, despite the added computational cost. In this paper, we propose a stochastic (online) quasi-Newton method with Nesterov's accelerated gradient, in both its full and limited-memory forms, for solving large-scale non-convex optimization problems in neural networks. The performance of the proposed algorithm is evaluated in Tensorflow on benchmark classification and regression problems. The results show improved performance compared to the classical second-order oBFGS and oLBFGS methods and to popular first-order stochastic methods such as SGD and Adam. The effect of different momentum rates and batch sizes is also illustrated.
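For illustration, below is a minimal NumPy sketch of one step of a stochastic limited-memory quasi-Newton update that evaluates the minibatch gradient at a Nesterov lookahead point, in the spirit of the method the abstract describes. It is not the paper's implementation: the function names (onaq_step, two_loop_recursion), the hyperparameter defaults, and the curvature-pair skip rule are assumptions made for this sketch.

```python
import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns an approximation of
    H^{-1} @ grad built from the stored curvature pairs (s, y)."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest pair first
        alpha = (s @ q) / (y @ s)
        q -= alpha * y
        alphas.append(alpha)
    if s_list:  # initial Hessian scaling from the most recent pair
        gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
        q *= gamma
    for (s, y), alpha in zip(zip(s_list, y_list), reversed(alphas)):  # oldest first
        beta = (y @ q) / (y @ s)
        q += (alpha - beta) * s
    return q

def onaq_step(w, v, grad_fn, s_list, y_list, mu=0.9, lr=0.1, m=8, eps=1e-10):
    """One stochastic Nesterov-accelerated L-BFGS step (illustrative only).
    grad_fn evaluates the minibatch gradient at a given parameter vector."""
    g_ahead = grad_fn(w + mu * v)                       # gradient at lookahead point
    d = -two_loop_recursion(g_ahead, s_list, y_list)    # quasi-Newton direction
    v_new = mu * v + lr * d                             # accelerated velocity update
    w_new = w + v_new
    # Curvature pair measured between the new point and the lookahead point.
    s = w_new - (w + mu * v)
    y = grad_fn(w_new) - g_ahead
    if s @ y > eps * (y @ y):      # assumed skip rule: keep only safely positive pairs
        s_list.append(s)
        y_list.append(y)
        if len(s_list) > m:        # limited memory: discard the oldest pair
            s_list.pop(0)
            y_list.pop(0)
    return w_new, v_new
```

In stochastic settings, both gradients of the curvature pair are typically computed on the same minibatch (as in the oBFGS scheme of Schraudolph et al.) so that y reflects curvature rather than sampling noise; the sketch above assumes grad_fn does this for one call pair.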