A Stochastic Quasi-Newton Method with Nesterov’s Accelerated Gradient

  • Conference paper
  • First Online:
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

Incorporating second-order curvature information into gradient-based methods has been shown to improve convergence drastically, despite its computational intensity. In this paper, we propose a stochastic (online) quasi-Newton method with Nesterov’s accelerated gradient, in both its full and limited-memory forms, for solving large-scale non-convex optimization problems in neural networks. The performance of the proposed algorithm is evaluated in TensorFlow on benchmark classification and regression problems. The results show improved performance compared to the classical second-order oBFGS and oLBFGS methods and to popular first-order stochastic methods such as SGD and Adam. The performance with different momentum rates and batch sizes is also illustrated.
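The abstract describes coupling a stochastic (limited-memory) quasi-Newton update with Nesterov’s accelerated gradient. The snippet below is a minimal illustrative sketch of that combination, not the algorithm proposed in the paper: it applies a standard L-BFGS two-loop recursion to a Nesterov look-ahead minibatch gradient on a toy least-squares problem. The toy data, the curvature-pair construction, and all hyperparameter values (momentum rate, step size, memory size, batch size) are assumptions chosen purely for demonstration.

```python
# Illustrative sketch only: stochastic limited-memory quasi-Newton step combined
# with Nesterov's accelerated gradient. Problem and hyperparameters are hypothetical.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

# Toy regression data: y = A @ x_true + noise
n_samples, dim = 1000, 20
A = rng.normal(size=(n_samples, dim))
x_true = rng.normal(size=dim)
y = A @ x_true + 0.01 * rng.normal(size=n_samples)

def minibatch_grad(w, idx):
    """Stochastic gradient of 0.5*||A w - y||^2 averaged over a minibatch."""
    Ab, yb = A[idx], y[idx]
    return Ab.T @ (Ab @ w - yb) / len(idx)

def two_loop(grad, mem):
    """Standard L-BFGS two-loop recursion: approximates H^{-1} @ grad."""
    q = grad.copy()
    alphas = []
    for s, t, rho in reversed(mem):          # newest curvature pair first
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * t
    if mem:                                  # scale by the most recent pair
        s, t, _ = mem[-1]
        q *= (s @ t) / (t @ t)
    for (s, t, rho), a in zip(mem, reversed(alphas)):   # oldest pair first
        b = rho * (t @ q)
        q += (a - b) * s
    return q

# Illustrative hyperparameters (not taken from the paper)
mu, lr, memory, batch, epochs = 0.8, 0.2, 10, 32, 20

w = np.zeros(dim)            # parameters
v = np.zeros(dim)            # momentum (velocity) term
mem = deque(maxlen=memory)   # limited memory of (s, t, rho) curvature pairs

for epoch in range(epochs):
    for _ in range(n_samples // batch):
        idx = rng.choice(n_samples, size=batch, replace=False)
        # Nesterov's accelerated gradient: evaluate the minibatch gradient at the
        # look-ahead point w + mu*v rather than at w itself.
        g = minibatch_grad(w + mu * v, idx)
        d = -two_loop(g, list(mem))          # quasi-Newton search direction
        v_new = mu * v + lr * d
        w_new = w + v_new
        # Curvature pair between the look-ahead point and the new iterate,
        # computed on the same minibatch (as in oBFGS-style online methods).
        s = w_new - (w + mu * v)
        t = minibatch_grad(w_new, idx) - g
        if s @ t > 1e-10:                    # keep only positive-curvature pairs
            mem.append((s, t, 1.0 / (s @ t)))
        w, v = w_new, v_new
    print(f"epoch {epoch:2d}  ||w - x_true|| = {np.linalg.norm(w - x_true):.4f}")
```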



Author information


Corresponding author

Correspondence to Hideki Asai.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Indrapriyadarsini, S., Mahboubi, S., Ninomiya, H., Asai, H. (2020). A Stochastic Quasi-Newton Method with Nesterov’s Accelerated Gradient. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science (LNAI), vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_43

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_43

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
