Ricci curvature for parametric statistics via optimal transport

  • Research Paper
  • Journal: Information Geometry

Abstract

We define the notion of a Ricci curvature lower bound for parametrized statistical models. Following the seminal ideas of Lott–Sturm–Villani, we base this notion on the geodesic convexity of the Kullback–Leibler divergence in a Wasserstein statistical manifold, that is, a manifold of probability distributions endowed with a Wasserstein metric tensor structure. Within these definitions, which are based on the Fisher information matrix and Wasserstein Christoffel symbols, the Ricci curvature is related to both information geometry and Wasserstein geometry. These definitions allow us to formulate bounds on the convergence rate of Wasserstein gradient flows and information functional inequalities in parameter space. We discuss examples of Ricci curvature lower bounds and convergence rates in exponential family models.
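
As a rough illustration of two ingredients named in the abstract, the following sketch builds a one-parameter exponential family on three states and evaluates the Kullback–Leibler divergence to the reference measure together with the Fisher information. The sufficient statistic `T`, the uniform reference measure `q`, and all function names are assumptions chosen for illustration; they are not taken from the paper. The second derivative computed here is taken in the flat parameter coordinate, so it is not the Wasserstein Hessian that the paper's curvature bound uses.

```python
import numpy as np

def exp_family(theta, T, q):
    """p_theta(x) proportional to q(x) * exp(theta * T(x)) on a finite state space."""
    w = q * np.exp(theta * T)
    return w / w.sum()

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p and q."""
    return float(np.sum(p * np.log(p / q)))

def fisher_information(theta, T, q):
    """For a one-parameter exponential family this equals Var_theta(T)."""
    p = exp_family(theta, T, q)
    mean = np.sum(p * T)
    return float(np.sum(p * (T - mean) ** 2))

# Hypothetical model: three states, uniform reference measure (theta = 0).
T = np.array([-1.0, 0.0, 1.0])
q = np.full(3, 1.0 / 3.0)

def kl_second_derivative(theta, h=1e-3):
    """Central finite difference of theta -> KL(p_theta || q) in the flat coordinate."""
    f = lambda t: kl(exp_family(t, T, q), q)
    return (f(theta + h) - 2.0 * f(theta) + f(theta - h)) / h**2

fisher_at_0 = fisher_information(0.0, T, q)
kl_curvature_at_0 = kl_second_derivative(0.0)
print(fisher_at_0, kl_curvature_at_0)
```

At the reference parameter the two printed values should agree up to finite-difference error, since the derivative of the divergence in this coordinate is θ·Var_θ(T).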

Notes

  1. Geodesic convexity is a synthetic definition. If a function f on a manifold \((M, g)\) is twice differentiable, then f is \(\lambda \)-geodesically convex whenever \(\hbox {Hess}_M f\succeq \lambda g\).
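
The Hessian condition in this note can be checked numerically in a simple setting. The sketch below is a Euclidean illustration only, not the Wasserstein statistical manifold of the paper: it computes the largest \(\lambda \) with \(\hbox {Hess} f\succeq \lambda g\) for a quadratic function, then verifies the standard decay estimate \(f(x(t)) \le e^{-2\lambda t} f(x(0))\) along the gradient flow (the minimum value of f is zero here). The matrix `A`, the metric `g`, and the function names are assumptions made for the example.

```python
import numpy as np

def hessian_lower_bound(hess, g):
    """Largest lam such that hess - lam * g is positive semidefinite.

    g must be symmetric positive definite. The generalized eigenvalues of
    (hess, g) are obtained by whitening with the Cholesky factor g = C C^T.
    """
    C = np.linalg.cholesky(g)
    C_inv = np.linalg.inv(C)
    return float(np.linalg.eigvalsh(C_inv @ hess @ C_inv.T).min())

def f(x, A):
    """Quadratic test function f(x) = 0.5 * x^T A x, whose Hessian is A."""
    return 0.5 * x @ A @ x

def gradient_flow(A, x0, t_end, steps=20000):
    """Explicit Euler scheme for dx/dt = -A x (gradient flow of f with g = identity)."""
    x = x0.copy()
    dt = t_end / steps
    for _ in range(steps):
        x = x - dt * (A @ x)
    return x

A = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical Hessian
g = np.eye(2)                            # Euclidean metric
lam = hessian_lower_bound(A, g)

x0 = np.array([1.0, -2.0])
t_end = 2.0
x_t = gradient_flow(A, x0, t_end)
bound = np.exp(-2.0 * lam * t_end) * f(x0, A)
print(lam, f(x_t, A), bound)
```

The printed value of `f(x_t, A)` should not exceed `bound`, which is the Euclidean analogue of the convergence-rate statements derived from a curvature lower bound.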

References

  1. Amari, S.: Natural gradient works efficiently in learning. Neural Comput. 10(2), 251–276 (1998)

  2. Amari, S.: Information Geometry and Its Applications. Number 194 in Applied Mathematical Sciences. Springer, Japan (2016)

  3. Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.J.: Information Geometry. Springer, Cham (2017)

  4. Bakry, D., Émery, M.: Diffusions hypercontractives. Séminaire de probabilités de Strasbourg 19, 177–206 (1985)

  5. Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)

  6. Carlen, E.A., Gangbo, W.: Constrained steepest descent in the 2-Wasserstein metric. Ann. Math. 157(3), 807–846 (2003)

  7. Chen, Y., Li, W.: Natural gradient in Wasserstein statistical manifold. (2018)

  8. Chow, S.-N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008 (2012)

  9. Chow, S.-N., Li, W., Lu, J., Zhou, H.: Population games and Discrete optimal transport. arXiv:1704.00855 [math] (2017)

  10. Chow, S.-N., Li, W., Zhou, H.: A discrete Schrödinger equation via optimal transport on graphs. arXiv:1705.07583 [math] (2017)

  11. Csiszár, I., Shields, P.C.: Information theory and statistics: a tutorial. Commun. Inf. Theory 1(4), 417–528 (2004)

  12. Darroch, J.N., Ratcliff, D.: Generalized iterative scaling for log-linear models. Ann. Math. Stat. 43(5), 1470–1480 (1972)

  13. Erbar, M., Fathi, M.: Poincaré, modified logarithmic Sobolev and isoperimetric inequalities for Markov chains with non-negative Ricci curvature. J. Funct. Anal. 274(11), 3056–3089 (2018)

  14. Erbar, M., Henderson, C., Menz, G., Tetali, P.: Ricci curvature bounds for weakly interacting Markov chains. Electron. J. Probab. 22, 23 (2017)

  15. Erbar, M., Kopfer, E.: Super Ricci flows for weighted graphs. arXiv:1805.06703 [math] (2018)

  16. Erbar, M., Maas, J.: Ricci Curvature of finite Markov Chains via convexity of the entropy. Arch. Ration. Mech. Anal. 206(3), 997–1038 (2012)

  17. Erbar, M., Maas, J., Tetali, P.: Discrete Ricci curvature bounds for Bernoulli–Laplace and Random transposition models. Annales de la faculté des sciences de Toulouse Mathématiques 24(4), 781–800 (2015)

  18. Fathi, M., Maas, J.: Entropic Ricci curvature bounds for discrete interacting systems. Ann. Appl. Probab. 26(3), 1774–1806 (2016)

  19. Hua, B., Jost, J., Liu, S.: Geometric analysis aspects of infinite semiplanar graphs with non-negative curvature. Journal für die reine und angewandte Mathematik (Crelles Journal) 2015(700), 1–36 (2015)

  20. Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)

  21. Jost, J., Liu, S.: Ollivier’s Ricci Curvature, local clustering and curvature-dimension inequalities on graphs. Discr. Comput. Geom. 51(2), 300–322 (2014)

  22. Lafferty, J.D.: The density manifold and configuration space quantization. Trans. Am. Math. Soc. 305(2), 699–741 (1988)

  23. Li, W.: Geometry of probability simplex via optimal transport. arXiv:1803.06360 [math] (2018)

  24. Li, W., Montúfar, G.: Natural gradient via optimal transport. Inf. Geom. 1(2), 181–214 (2018)

  25. Lin, Y., Lu, L., Yau, S.-T.: Ricci curvature of graphs. Tohoku Math. J. 63(4), 605–627 (2011)

  26. Lin, Y., Yau, S.-T.: Ricci curvature and eigenvalue estimate on locally finite graphs. Math. Res. Lett. 17(2), 343–356 (2010)

  27. Longford, N.T.: A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika 74(4), 817–827 (1987)

  28. Lott, J., Villani, C.: Ricci Curvature for metric-measure spaces via optimal transport. Ann. Math. 169(3), 903–991 (2009)

  29. Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)

  30. Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Part. Differ. Equ. 48(1), 1–31 (2013)

  31. Ollivier, Y.: Ricci curvature of Markov chains on metric spaces. J. Funct. Anal. 256(3), 810–864 (2009)

  32. Ollivier, Y., Villani, C.: A curved Brunn–Minkowski inequality on the discrete hypercube, or: what is the Ricci curvature of the discrete hypercube? SIAM J. Discr. Math. 26(3), 983–996 (2012)

  33. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Partial Differ. Equ. 26(1–2), 101–174 (2001)

  34. Otto, F., Villani, C.: Generalization of an Inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 173(2), 361–400 (2000)

  35. Eberle, S., Niethammer, B., Schlichting, A.: Gradient flow formulation and longtime behaviour of a constrained Fokker–Planck equation. Nonlinear Anal. 158, 142–167 (2017)

  36. Solomon, J., Rustamov, R.M., Guibas, L.J., Butscher, A.: Continuous-flow graph transportation distances. arXiv:1603.06927 (2016)

  37. Sturm, K.-T.: On the geometry of metric measure spaces. Acta Math. 196(1), 65–131 (2006)

  38. Villani, C.: Optimal Transport: Old and New. Number 338 in Grundlehren der mathematischen Wissenschaften. Springer, Berlin (2009)

Acknowledgements

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement no 757983). W.L. is supported by AFOSR MURI FA9550-18-1-0502.

Author information

Corresponding author

Correspondence to Wuchen Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A. Additional figures to Example 1

Fig. 6

Similar to Fig. 4 but with \(\Theta = [-1/2,1/2]\). On this small parameter domain around \(\theta =0\) (the value of the reference measure), the Ricci curvature lower bound gives a very close lower bound on the minimum rate of convergence for each of the models. The middle panel shows the direct comparison of the two values across the 30 exponential families. The minimum rate of convergence is shown in blue, and the Hessian in red.

Fig. 7

Similar to Fig. 4, but with a larger parameter domain \(\Theta = [-4,4]\). On this relatively large parameter domain, the models contain points close to the boundary of the simplex, where the Hessian (and the Ricci curvature) can oscillate strongly. As a result, we observe larger gaps to the minimum rate of convergence than in Fig. 6.

Fig. 8

Convergence rates and minimum Hessian eigenvalue at individual parameter choices. Here we fixed the ground metric \(\omega =(\omega _{12},\omega _{23},\omega _{13})=(1/2,1/2,0)\). Each subplot corresponds to one exponential family, with the sufficient statistic indicated at the top. Within a region around \(\theta =0\) (the value of the reference measure), the minimum Hessian eigenvalue is closer to the convergence rates. In fact, the Hessian eigenvalue intersects the rate of convergence at \(\theta =0\): the Hessian at \(\theta =0\) is the asymptotic rate of convergence. The lower row zooms in on the y-axis of the upper row. For these exponential families, the convergence rates do not vary much across choices of the initial parameter value.

About this article

Cite this article

Li, W., Montúfar, G. Ricci curvature for parametric statistics via optimal transport. Info. Geo. 3, 89–117 (2020). https://doi.org/10.1007/s41884-020-00026-2

  • DOI: https://doi.org/10.1007/s41884-020-00026-2
