A Formalization of the Natural Gradient Method for General Similarity Measures

  • Conference paper
  • Conference: Geometric Science of Information (GSI 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11712)

Abstract

In optimization, the natural gradient method is well-known for likelihood maximization. The method uses the Kullback–Leibler (KL) divergence, corresponding infinitesimally to the Fisher–Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher–Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. The concept of natural gradient has in most discussions been restricted to the KL-divergence/Fisher–Rao case, although in information geometry the local \(C^2\) structure of a general divergence has been used for deriving a closely related Riemannian metric analogous to the KL-divergence case. In this work, we wish to cast natural gradients into this more general context and provide example computations, notably in the case of a Finsler metric and the p-Wasserstein metric. We additionally discuss connections between the natural gradient method and multiple other optimization techniques in the literature.
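
To make this concrete, here is a minimal sketch based on standard information-geometry conventions (the loss \(L\), step size \(\eta\), and metric tensor \(G\) are illustrative notation, not the paper's): the local \(C^2\) structure of a divergence \(D\) induces a Riemannian metric on the parameter space, and the natural gradient step preconditions the Euclidean gradient by the inverse of that metric tensor,

\[
G_{ij}(\theta) = \left.\frac{\partial^2}{\partial \theta'_i \, \partial \theta'_j} D\big(p_{\theta} \,\|\, p_{\theta'}\big)\right|_{\theta'=\theta},
\qquad
\theta_{t+1} = \theta_t - \eta \, G(\theta_t)^{-1} \nabla_\theta L(\theta_t).
\]

When \(D\) is the KL divergence, \(G(\theta)\) is the Fisher information matrix and the update reduces to the classical natural gradient.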



Acknowledgements

The authors were supported by the Centre for Stochastic Geometry and Advanced Bioimaging and by a block stipendium, both funded by a grant from the Villum Foundation. We furthermore wish to thank the anonymous reviewers for their very useful comments.

Author information

Correspondence to Anton Mallasto.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Mallasto, A., Haije, T.D., Feragen, A. (2019). A Formalization of the Natural Gradient Method for General Similarity Measures. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2019. Lecture Notes in Computer Science, vol 11712. Springer, Cham. https://doi.org/10.1007/978-3-030-26980-7_62

  • DOI: https://doi.org/10.1007/978-3-030-26980-7_62

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26979-1

  • Online ISBN: 978-3-030-26980-7

  • eBook Packages: Computer Science, Computer Science (R0)
