Hessian transport gradient flows

  • Wuchen Li
  • Lexing Ying


We derive new gradient flows of divergence functions in the probability space endowed with a class of Riemannian metrics. The Riemannian metric tensor is built from the transported Hessian operator of an entropy function. The new gradient flow is a generalized Fokker–Planck equation and is associated with a stochastic differential equation that depends on the reference measure. Several examples of Hessian transport gradient flows and the associated stochastic differential equations are presented, including those for the reverse Kullback–Leibler divergence, \(\alpha\)-divergence, Hellinger distance, Pearson divergence, and Jensen–Shannon divergence.
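
The divergences listed above all belong to the f-divergence family, \(D_f(p\,\|\,q) = \sum_i q_i\, f(p_i/q_i)\) with \(f\) convex and \(f(1) = 0\). The following is a minimal sketch of that family, assuming discrete, strictly positive distributions and one common choice of generator per divergence. The function names, the example vectors, and the normalizations are illustrative assumptions and may differ from the conventions used in the paper.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(p || q) = sum_i q_i * f(p_i / q_i), for strictly positive p and q."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

def alpha_generator(alpha):
    """One common alpha-divergence generator (assumed convention), alpha not in {0, 1}."""
    def f(u):
        return (u**alpha - alpha * u + alpha - 1.0) / (alpha * (alpha - 1.0))
    return f

def jensen_shannon_generator(u):
    """Generator whose f-divergence is 0.5*KL(p||m) + 0.5*KL(q||m), m = (p+q)/2."""
    m = 0.5 * (u + 1.0)
    return 0.5 * (u * np.log(u / m) - np.log(m))

# Illustrative generators; each satisfies f(1) = 0.
GENERATORS = {
    "reverse_kl": lambda u: -np.log(u),                 # gives sum q*log(q/p)
    "alpha_0.5": alpha_generator(0.5),
    "hellinger_sq": lambda u: (np.sqrt(u) - 1.0) ** 2,  # sum (sqrt(p)-sqrt(q))^2
    "pearson": lambda u: (u - 1.0) ** 2,                # sum (p-q)^2 / q
    "jensen_shannon": jensen_shannon_generator,
}

# Hypothetical example distributions on three points.
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

values = {name: f_divergence(p, q, f) for name, f in GENERATORS.items()}
for name, value in values.items():
    print(f"{name}: {value:.6f}")
```

With this choice of generators, the \(\alpha = 1/2\) case equals twice the unnormalized squared Hellinger distance, which illustrates how the named divergences relate to one another inside the same family.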


Optimal transport; Information/Hessian geometry; Hessian transport; Hessian transport stochastic differential equations; Generalized de Bruijn identity




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Mathematics, University of California, Los Angeles, USA
  2. Department of Mathematics, Stanford University and Facebook AI Research, Stanford, USA
