Ricci curvature for parametric statistics via optimal transport

  • Research Paper
We define the notion of a Ricci curvature lower bound for parametrized statistical models. Following the seminal ideas of Lott–Sturm–Villani, we define this notion based on the geodesic convexity of the Kullback–Leibler divergence in a Wasserstein statistical manifold, that is, a manifold of probability distributions endowed with a Wasserstein metric tensor structure. Within these definitions, which are based on the Fisher information matrix and Wasserstein Christoffel symbols, the Ricci curvature is related to both information geometry and Wasserstein geometry. These definitions allow us to formulate bounds on the convergence rate of Wasserstein gradient flows and information functional inequalities in parameter space. We discuss examples of Ricci curvature lower bounds and convergence rates in exponential family models.

  1. Geodesic convexity is a synthetic definition. If a function \(f\) on a manifold \((M, g)\) is twice differentiable, then \(f\) is \(\lambda \)-geodesically convex whenever \(\hbox {Hess}_M f\succeq \lambda g\).
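
As a minimal numerical sketch of the criterion in the note above: in local coordinates at a single point, the condition \(\hbox {Hess}_M f\succeq \lambda g\) holds exactly when \(\lambda \) is at most the smallest generalized eigenvalue of the pair (Hessian matrix, metric matrix). The helper name `convexity_constant` and the example matrices are illustrative assumptions and do not come from the paper. The Hessian passed in is assumed to be the Riemannian Hessian already written in coordinates, with any Christoffel terms included.

```python
import numpy as np


def convexity_constant(hess, metric):
    """Return the largest lam such that hess - lam * metric is positive semidefinite.

    hess   : symmetric matrix, the Hessian of f at one point (assumed given).
    metric : symmetric positive definite matrix, the metric tensor g at that point.
    """
    hess = np.asarray(hess, dtype=float)
    metric = np.asarray(metric, dtype=float)
    # Factor the metric as chol @ chol.T and change coordinates so that
    # the metric becomes the identity.
    chol = np.linalg.cholesky(metric)
    chol_inv = np.linalg.inv(chol)
    reduced = chol_inv @ hess @ chol_inv.T
    # Symmetrize to suppress rounding noise before the symmetric eigen-solver.
    reduced = 0.5 * (reduced + reduced.T)
    # eigvalsh returns eigenvalues in ascending order; the first is the minimum.
    return float(np.linalg.eigvalsh(reduced)[0])


# Hypothetical example: the same Hessian measured against two different metrics.
hess = [[2.0, 0.0], [0.0, 3.0]]
lam_euclidean = convexity_constant(hess, np.eye(2))
lam_rescaled = convexity_constant(hess, np.diag([2.0, 1.0]))
```

The two calls show that the convexity constant depends on the metric as well as on the Hessian: rescaling the metric in one direction changes the bound even though the Hessian matrix is unchanged.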


This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement no 757983). W.L. is supported by AFOSR MURI FA9550-18-1-0502.

Author information

Correspondence to Wuchen Li.

Appendix A. Additional figures to Example 1

Fig. 6

Similar to Fig. 4, but with \(\Theta = [-1/2,1/2]\). Note that on this tight parameter domain around \(\theta =0\) (the value of the reference measure), the Ricci curvature lower bound gives a very close lower bound on the minimum rate of convergence for each of the models. The middle panel shows the direct comparison of the two values across the 30 exponential families. The minimum rate of convergence is shown in blue, and the Hessian in red.

Fig. 7

Similar to Fig. 4, but with a larger parameter domain \(\Theta = [-4,4]\). On this relatively large parameter domain, the models contain points close to the boundary of the simplex, where the Hessian (and the Ricci curvature) can have large oscillations. In turn, we observe larger gaps to the minimum rate of convergence than in Fig. 6.

Fig. 8

Convergence rates and minimum Hessian eigenvalue at individual parameter choices. Here we fixed the ground metric \(\omega =(\omega _{12},\omega _{23},\omega _{13})=(1/2,1/2,0)\). Each subplot corresponds to one exponential family, with the sufficient statistic indicated at the top. Within a region around \(\theta =0\) (the value of the reference measure), the minimum of the Hessian is closer to the convergence rates. In fact, the Hessian eigenvalue meets the rate of convergence at \(\theta =0\): the Hessian at \(\theta =0\) is the asymptotic rate of convergence. The lower row zooms in on the y-axis of the upper row. For these exponential families, the convergence rates do not vary much across choices of the initial parameter value.

Cite this article

Li, W., Montúfar, G. Ricci curvature for parametric statistics via optimal transport. Info. Geo. 3, 89–117 (2020).

