Optimal transport natural gradient for statistical manifolds with continuous sample space

Chen, Yifan; Li, Wuchen

doi:10.1007/s41884-020-00028-0

Research Paper
Published: 11 May 2020

Volume 3, pages 1–32, (2020)
Information Geometry

Abstract

We study the Wasserstein natural gradient in parametric statistical models with continuous sample spaces. Our approach is to pull back the \(L^2\)-Wasserstein metric tensor from the probability density space to the parameter space, equipping the latter with a positive definite metric tensor under which it becomes a Riemannian manifold, named the Wasserstein statistical manifold. In general, it is not a totally geodesic sub-manifold of the density space, so its geodesics differ from the Wasserstein geodesics, except in the well-known Gaussian case, a fact that can also be verified within our framework. We use the sub-manifold geometry to derive a gradient flow and a natural gradient descent method in the parameter space. When the sample space is \(\mathbb{R}\), the induced metric tensor admits an explicit formula. In optimization problems, we observe that natural gradient descent outperforms standard gradient descent when the Wasserstein distance is the objective function. In this case, we prove that the resulting algorithm behaves similarly to the Newton method in the asymptotic regime. The proof computes the exact Hessian formula for the Wasserstein distance, which further motivates another preconditioner for the optimization process. Finally, we present examples to illustrate the effectiveness of the natural gradient in several parametric statistical models, including the Gaussian measure, Gaussian mixture, Gamma distribution, and Laplace distribution.
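
The one-dimensional case mentioned above can be illustrated numerically. The sketch below is not the authors' code. It assumes the standard one-dimensional computation in which the pulled-back metric takes the form \(G_W(\theta) = \int \frac{1}{\rho(x,\theta)} \nabla_\theta F(x,\theta)\, \nabla_\theta F(x,\theta)^{T}\, dx\), where \(F\) is the cumulative distribution function of the density \(\rho\). The function names, the grid, and the finite-difference step `eps` are hypothetical choices made for illustration only.

```python
import numpy as np


def gaussian_density(x, theta):
    """Density of N(mu, sigma^2) on the grid x, with theta = (mu, sigma)."""
    mu, sigma = theta
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def cdf_on_grid(density, x, theta):
    """Approximate the CDF on a uniform grid by the cumulative trapezoid rule.

    Mass to the left of the grid is ignored, so the grid must cover
    essentially all of the probability mass.
    """
    rho = density(x, theta)
    dx = x[1] - x[0]
    increments = 0.5 * (rho[1:] + rho[:-1]) * dx
    return np.concatenate(([0.0], np.cumsum(increments)))


def wasserstein_metric_1d(density, x, theta, eps=1e-5):
    """Approximate G_W(theta) = int (grad F)(grad F)^T / rho dx on the grid x.

    The parameter gradient of the CDF is taken by central finite differences
    (an illustrative choice, not taken from the paper).
    """
    theta = np.asarray(theta, dtype=float)
    rho = density(x, theta)
    dx = x[1] - x[0]
    grads = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        d_cdf = (cdf_on_grid(density, x, theta + step)
                 - cdf_on_grid(density, x, theta - step)) / (2.0 * eps)
        grads.append(d_cdf)
    grads = np.stack(grads)  # shape (d, n)
    integrand = grads[:, None, :] * grads[None, :, :] / rho  # shape (d, d, n)
    # Trapezoid rule along the grid axis.
    return 0.5 * (integrand[..., 1:] + integrand[..., :-1]).sum(axis=-1) * dx


def natural_gradient_step(theta, euclidean_grad, metric, step_size):
    """One natural gradient descent step: theta - step_size * G^{-1} grad."""
    return theta - step_size * np.linalg.solve(metric, euclidean_grad)


if __name__ == "__main__":
    theta = np.array([0.5, 1.3])  # (mu, sigma), arbitrary illustrative values
    grid = np.linspace(theta[0] - 6.0 * theta[1], theta[0] + 6.0 * theta[1], 4001)
    print(wasserstein_metric_1d(gaussian_density, grid, theta))
```

For the Gaussian family parametrized by \((\mu, \sigma)\), the computed matrix should be close to the identity, which agrees with the closed form \(W_2^2 = (\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2\) between one-dimensional Gaussians. Other parametrizations, or other families such as the Laplace or Gamma distributions, generally give a non-trivial metric, and only then does the preconditioning in `natural_gradient_step` differ from a plain gradient step.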

Acknowledgements

This research is partially supported by AFOSR MURI proposal number 18RT0073. The research of Yifan Chen is partly supported by the Tsinghua undergraduate Xuetang Mathematics Program and Caltech graduate Kortchak Scholarship. The authors thank Prof. Shui-Nee Chow for his far-seeing viewpoints on the related topics, and acknowledge many fruitful discussions with Prof. Wilfrid Gangbo and Prof. Wotao Yin. We also thank Prof. Guido Montúfar for many valuable comments on the experimental design in an earlier version of this manuscript. This article was funded by California Institute of Technology (Grant no. Caltech Kortchak Scholarship) and Multidisciplinary University Research Initiative (Grant no. FA9550-18-1-0502).

Author information

Authors and Affiliations

Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, 91106, USA
Yifan Chen
Department of Mathematics, UCLA, Los Angeles, CA, 90095, USA
Wuchen Li

Corresponding author

Correspondence to Yifan Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, Y., Li, W. Optimal transport natural gradient for statistical manifolds with continuous sample space. Info. Geo. 3, 1–32 (2020). https://doi.org/10.1007/s41884-020-00028-0

Received: 21 May 2018
Revised: 22 March 2020
Published: 11 May 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s41884-020-00028-0
