Optimal transport natural gradient for statistical manifolds with continuous sample space

  • Research Paper
  • Information Geometry

Abstract

We study the Wasserstein natural gradient in parametric statistical models with continuous sample spaces. Our approach is to pull back the \(L^2\)-Wasserstein metric tensor from the probability density space to the parameter space, equipping the latter with a positive definite metric tensor under which it becomes a Riemannian manifold, named the Wasserstein statistical manifold. In general, it is not a totally geodesic sub-manifold of the density space, so its geodesics differ from the Wasserstein geodesics; the well-known Gaussian case is an exception, a fact that can also be verified within our framework. We use the sub-manifold geometry to derive a gradient flow and a natural gradient descent method in the parameter space. When the sample space is \(\mathbb{R}\), the induced metric tensor admits an explicit formula. In optimization problems, we observe that natural gradient descent outperforms standard gradient descent when the Wasserstein distance is the objective function. In that case, we prove that the resulting algorithm behaves similarly to the Newton method in the asymptotic regime. The proof computes the exact Hessian of the Wasserstein distance, which in turn motivates another preconditioner for the optimization process. Finally, we present examples that illustrate the effectiveness of the natural gradient in several parametric statistical models, including the Gaussian measure, Gaussian mixture, Gamma distribution, and Laplace distribution.
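
The pull-back construction and the natural gradient update described in the abstract can be sketched for the simplest case. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses the one-dimensional Gaussian family, for which the squared \(L^2\)-Wasserstein distance between \(N(\mu_1, \sigma_1^2)\) and \(N(\mu_2, \sigma_2^2)\) has the closed form \((\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2\), so the metric tensor is the identity in \((\mu, \sigma)\) coordinates. The log-scale parametrization \(\theta = (\mu, s)\) with \(\sigma = e^s\), the step size, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def w2_squared(theta, target):
    """Squared L2-Wasserstein distance between N(mu, sigma^2) and the target.

    theta = (mu, s) with sigma = exp(s) (assumed parametrization);
    target = (mu_t, sigma_t).
    """
    mu, s = theta
    mu_t, sigma_t = target
    return (mu - mu_t) ** 2 + (np.exp(s) - sigma_t) ** 2


def euclidean_grad(theta, target):
    """Ordinary gradient of w2_squared with respect to theta = (mu, s)."""
    mu, s = theta
    mu_t, sigma_t = target
    sigma = np.exp(s)
    return np.array([2.0 * (mu - mu_t), 2.0 * (sigma - sigma_t) * sigma])


def wasserstein_metric(theta):
    """Pull back the identity metric in (mu, sigma) to theta = (mu, s).

    With J the Jacobian of the map (mu, s) -> (mu, exp(s)), G = J^T J.
    """
    _, s = theta
    jac = np.array([[1.0, 0.0], [0.0, np.exp(s)]])
    return jac.T @ jac


def natural_gradient_descent(theta0, target, step=0.25, iters=50):
    """Iterate theta <- theta - step * G(theta)^{-1} grad f(theta)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        g = euclidean_grad(theta, target)
        G = wasserstein_metric(theta)
        theta = theta - step * np.linalg.solve(G, g)
    return theta


if __name__ == "__main__":
    # Start at N(0, 1), i.e. theta = (0, 0); fit the target N(3, 2^2).
    theta = natural_gradient_descent((0.0, 0.0), (3.0, 2.0))
    print("mu =", theta[0], "sigma =", np.exp(theta[1]))
```

Preconditioning with \(G(\theta)^{-1}\) makes the update independent of how the scale parameter is encoded to first order in the step size, which is the usual motivation for a natural gradient; for other families the metric tensor would have to be computed from the model rather than from a closed-form distance.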

Acknowledgements

This research is partially supported by AFOSR MURI proposal number 18RT0073. The research of Yifan Chen is partly supported by the Tsinghua undergraduate Xuetang Mathematics Program and the Caltech graduate Kortchak Scholarship. The authors thank Prof. Shui-Nee Chow for his far-sighted views on related topics, and acknowledge many fruitful discussions with Prof. Wilfrid Gangbo and Prof. Wotao Yin. We are grateful to Prof. Guido Montúfar for many valuable comments on the experimental design in an earlier version of this manuscript. This article was funded by the California Institute of Technology (Grant no. Caltech Kortchak Scholarship) and the Multidisciplinary University Research Initiative (Grant no. FA9550-18-1-0502).

Author information

Corresponding author

Correspondence to Yifan Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Li, W. Optimal transport natural gradient for statistical manifolds with continuous sample space. Info. Geo. 3, 1–32 (2020). https://doi.org/10.1007/s41884-020-00028-0

  • DOI: https://doi.org/10.1007/s41884-020-00028-0
