
The Information Geometry of Mirror Descent

  • Conference paper
Geometric Science of Information (GSI 2015)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9389)


Abstract

We prove the equivalence of two online learning algorithms: mirror descent and natural gradient descent. Both are generalizations of online gradient descent for the setting where the parameter of interest lies on a non-Euclidean manifold. Natural gradient descent selects the steepest descent direction along a Riemannian manifold by multiplying the standard gradient by the inverse of the metric tensor. Mirror descent induces non-Euclidean structure by solving iterative optimization problems with different proximity functions. In this paper, we prove that mirror descent induced by a Bregman divergence proximity function is equivalent to natural gradient descent on the Riemannian manifold in the dual coordinate system. We use techniques from convex analysis and connections between Riemannian manifolds, Bregman divergences and convexity to prove this result. This equivalence between natural gradient descent and mirror descent implies that (1) mirror descent is the steepest descent direction along the Riemannian manifold corresponding to the choice of Bregman divergence and (2) mirror descent with log-likelihood loss, applied to parameter estimation in exponential families, asymptotically achieves the classical Cramér-Rao lower bound.
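A brief sketch of the correspondence stated above, written in standard notation for mirror descent and natural gradient descent (which need not match the paper's): let G be a strictly convex function with convex conjugate H = G^*, write \mu = \nabla G(\theta) for the dual coordinates (so \theta = \nabla H(\mu)), and let f denote the loss. Mirror descent with the Bregman divergence of G and step size \alpha_t performs the update

    \nabla G(\theta_{t+1}) = \nabla G(\theta_t) - \alpha_t \nabla_\theta f(\theta_t),
    \qquad \text{i.e.} \qquad
    \mu_{t+1} = \mu_t - \alpha_t \nabla_\theta f(\theta_t).

Writing the loss in dual coordinates as \tilde{f}(\mu) = f(\nabla H(\mu)), the chain rule gives \nabla_\mu \tilde{f}(\mu) = \nabla^2 H(\mu)\, \nabla_\theta f(\theta), so the same update reads

    \mu_{t+1} = \mu_t - \alpha_t\, [\nabla^2 H(\mu_t)]^{-1} \nabla_\mu \tilde{f}(\mu_t),

which is exactly the natural gradient descent update in the dual coordinate system with metric tensor \nabla^2 H(\mu), i.e. the equivalence stated in the abstract.

The short numerical check below is a minimal sketch of this correspondence for one concrete choice: G is the negative entropy on the positive orthant, so the mirror map is \nabla G(\theta) = \log\theta and the dual metric is \nabla^2 H(\mu) = \mathrm{diag}(e^{\mu}). The quadratic toy loss and all function names are our own illustration, not the paper's.

    import numpy as np

    # Numerical check: one step of mirror descent with the negative-entropy
    # Bregman divergence equals one step of natural gradient descent in the
    # dual coordinates mu = log(theta), where the metric is diag(exp(mu)).
    # The loss f(theta) = 0.5 * ||theta - c||^2 is a toy example.

    def f_grad(theta, c):
        """Gradient of f(theta) = 0.5 * ||theta - c||^2."""
        return theta - c

    def mirror_descent_step(theta, alpha, c):
        """Mirror map grad G(theta) = log(theta); update in the primal."""
        return np.exp(np.log(theta) - alpha * f_grad(theta, c))

    def natural_gradient_step(mu, alpha, c):
        """Natural gradient step on f(exp(mu)) with metric diag(exp(mu))."""
        theta = np.exp(mu)                        # theta = grad H(mu)
        grad_mu = np.exp(mu) * f_grad(theta, c)   # chain rule in dual coordinates
        return mu - alpha * (1.0 / np.exp(mu)) * grad_mu

    rng = np.random.default_rng(0)
    theta0 = rng.uniform(0.5, 2.0, size=5)        # a point in the positive orthant
    c = rng.uniform(0.5, 2.0, size=5)
    alpha = 0.1

    theta_md = mirror_descent_step(theta0, alpha, c)
    theta_ng = np.exp(natural_gradient_step(np.log(theta0), alpha, c))
    print(np.allclose(theta_md, theta_ng))        # True: the two updates coincide

Both one-step updates produce the same iterate, as the final check confirms; this is the exponentiated-gradient special case of the general equivalence.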



Acknowledgements

GR was partially supported by the NSF under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. SM was supported by grants: NIH (Systems Biology): 5P50-GM081883, AFOSR: FA9550-10-1-0436, and NSF CCF-1049290.

Author information


Corresponding author

Correspondence to Garvesh Raskutti.


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Raskutti, G., Mukherjee, S. (2015). The Information Geometry of Mirror Descent. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science, vol. 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_39


  • DOI: https://doi.org/10.1007/978-3-319-25040-3_39


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25039-7

  • Online ISBN: 978-3-319-25040-3

  • eBook Packages: Computer Science (R0)
