Abstract
We prove the equivalence of two online learning algorithms, mirror descent and natural gradient descent. Both mirror descent and natural gradient descent are generalizations of online gradient descent when the parameter of interest lies on a non-Euclidean manifold. Natural gradient descent selects the steepest descent direction along a Riemannian manifold by multiplying the standard gradient by the inverse of the metric tensor. Mirror descent induces non-Euclidean structure by solving iterative optimization problems using different proximity functions. In this paper, we prove that mirror descent induced by a Bregman divergence proximity function is equivalent to the natural gradient descent algorithm on the Riemannian manifold in the dual coordinate system. We use techniques from convex analysis and connections between Riemannian manifolds, Bregman divergences and convexity to prove this result. This equivalence between natural gradient descent and mirror descent implies that (1) mirror descent is the steepest descent direction along the Riemannian manifold corresponding to the choice of Bregman divergence and (2) mirror descent with log-likelihood loss applied to parameter estimation in exponential families asymptotically achieves the classical Cramér-Rao lower bound.
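The equivalence the abstract describes can be checked numerically in a simple special case. The sketch below (an illustration, not code from the paper) uses the negative-entropy potential ψ(θ) = Σᵢ θᵢ log θᵢ on the positive orthant: one mirror descent step with the induced Bregman divergence is compared against one natural gradient step in the dual coordinates μ = ∇ψ(θ), where the metric tensor is the Hessian ∇²ψ*(μ). All function and variable names here are illustrative choices.

```python
import numpy as np

# Hedged sketch of the claimed equivalence for psi(theta) = sum_i theta_i log theta_i.
# Dual map: mu = grad psi(theta); inverse map: theta = grad psi*(mu).

def grad_psi(theta):
    # primal -> dual coordinates: mu_i = log(theta_i) + 1
    return np.log(theta) + 1.0

def grad_psi_star(mu):
    # dual -> primal coordinates: theta_i = exp(mu_i - 1)
    return np.exp(mu - 1.0)

def mirror_descent_step(theta, grad_f, eta):
    # theta_{t+1} = grad psi*( grad psi(theta_t) - eta * grad f(theta_t) )
    return grad_psi_star(grad_psi(theta) - eta * grad_f)

def natural_gradient_step(theta, grad_f, eta):
    # Work in dual coordinates mu. The Riemannian metric there is
    # G(mu) = Hessian of psi*(mu) = diag(theta). By the chain rule,
    # grad_mu f = G(mu) @ grad_theta f, so the natural gradient
    # G(mu)^{-1} grad_mu f reduces to the primal gradient grad_theta f.
    mu = grad_psi(theta)
    G = np.diag(grad_psi_star(mu))      # metric tensor in dual coordinates
    grad_mu = G @ grad_f                # gradient transformed to dual coordinates
    mu_new = mu - eta * np.linalg.solve(G, grad_mu)
    return grad_psi_star(mu_new)

theta = np.array([0.2, 0.5, 0.3])
g = np.array([1.0, -0.5, 0.25])         # gradient of some loss at theta
a = mirror_descent_step(theta, g, eta=0.1)
b = natural_gradient_step(theta, g, eta=0.1)
print(np.allclose(a, b))                # the two updates coincide
```

Both updates reduce to the multiplicative rule θ_{t+1,i} = θ_{t,i} exp(−η gᵢ), which is the sense in which mirror descent is steepest descent with respect to the metric induced by the Bregman potential.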
Acknowledgements
GR was partially supported by the NSF under Grant DMS-1127914 to the Statistical and Applied Mathematical Sciences Institute. SM was supported by grants: NIH (Systems Biology): 5P50-GM081883, AFOSR: FA9550-10-1-0436, and NSF CCF-1049290.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Raskutti, G., Mukherjee, S. (2015). The Information Geometry of Mirror Descent. In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2015. Lecture Notes in Computer Science(), vol 9389. Springer, Cham. https://doi.org/10.1007/978-3-319-25040-3_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25039-7
Online ISBN: 978-3-319-25040-3