Abstract
We study the Wasserstein natural gradient in parametric statistical models with continuous sample spaces. We pull back the \(L^2\)-Wasserstein metric tensor from the probability density space to the parameter space, equipping the latter with a positive definite metric tensor under which it becomes a Riemannian manifold, which we call the Wasserstein statistical manifold. In general, it is not a totally geodesic submanifold of the density space, so its geodesics differ from the Wasserstein geodesics. The well-known exception is the Gaussian case, which can also be verified within our framework. We use the submanifold geometry to derive a gradient flow and a natural gradient descent method in the parameter space. When the parametrized densities are defined on \(\mathbb {R}\), the induced metric tensor admits an explicit formula. In optimization problems, we observe that natural gradient descent outperforms standard gradient descent when the Wasserstein distance is the objective function. In this case, we prove that the resulting algorithm behaves like the Newton method in the asymptotic regime. The proof computes the exact Hessian of the Wasserstein distance, which in turn motivates another preconditioner for the optimization process. Finally, we present examples that illustrate the effectiveness of the natural gradient in several parametric statistical models, including the Gaussian measure, Gaussian mixture, Gamma distribution, and Laplace distribution.
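As a rough illustration of the one-dimensional case, the following Python sketch assumes that the pulled-back metric can be written as \(G_W(\theta)_{ij} = \int \partial_{\theta_i} F(x,\theta)\, \partial_{\theta_j} F(x,\theta) / \rho(x,\theta)\, dx\), where \(F\) is the cumulative distribution function of \(\rho\). It uses a Gaussian family in the hypothetical coordinates \(\theta = (\mu, s)\) with \(s = \log \sigma\), and takes half the squared Wasserstein distance to a target Gaussian as the objective, using the standard closed form \((\mu - \mu_*)^2 + (\sigma - \sigma_*)^2\). All function names, the step size, and the grid settings are illustrative choices, not taken from the paper.

```python
import math
import numpy as np

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def wasserstein_metric(theta, half_width=8.0, n=4001):
    """Approximate G_ij = integral of dF/dtheta_i * dF/dtheta_j / rho on a grid.

    Assumed parametrization: theta = (mu, s), sigma = exp(s).
    """
    mu, s = theta
    sigma = math.exp(s)
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n)
    rho = gaussian_pdf(x, mu, sigma)
    # Derivatives of the Gaussian CDF F(x) = Phi((x - mu) / sigma):
    #   dF/dmu = -rho,   dF/ds = sigma * dF/dsigma = -(x - mu) * rho
    dF = np.stack([-rho, -(x - mu) * rho])
    dx = x[1] - x[0]
    # Riemann sum of the outer product of the CDF gradients, weighted by 1/rho.
    return (dF[:, None, :] * dF[None, :, :] / rho).sum(axis=-1) * dx

def euclidean_grad(theta, target):
    """Gradient in (mu, s) of 0.5 * ((mu - m*)^2 + (sigma - sigma*)^2)."""
    mu, s = theta
    sigma = math.exp(s)
    m_star, sigma_star = target
    return np.array([mu - m_star, (sigma - sigma_star) * sigma])

def natural_gradient_descent(theta0, target, step=0.5, iters=60):
    theta = np.array(theta0, dtype=float)
    for _ in range(iters):
        G = wasserstein_metric(theta)
        # Precondition the Euclidean gradient with the inverse metric tensor.
        theta = theta - step * np.linalg.solve(G, euclidean_grad(theta, target))
    return theta

# Start at N(2, 3^2) and move toward the target N(0, 1).
theta_hat = natural_gradient_descent([2.0, math.log(3.0)], target=(0.0, 1.0))
# In these coordinates the metric should be close to diag(1, sigma^2).
G_check = wasserstein_metric(np.array([0.5, math.log(2.0)]))
```

In the \((\mu, \sigma)\) coordinates the same integral gives the identity matrix, so the natural and Euclidean gradients coincide there. The \(\log \sigma\) coordinates are used here only so that the preconditioning has a visible effect.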
Acknowledgements
This research is partially supported by AFOSR MURI proposal number 18RT0073. The research of Yifan Chen is partly supported by the Tsinghua undergraduate Xuetang Mathematics Program and the Caltech graduate Kortchak Scholarship. The authors thank Prof. Shui-Nee Chow for his far-sighted views on the related topics, and acknowledge many fruitful discussions with Prof. Wilfrid Gangbo and Prof. Wotao Yin. We also thank Prof. Guido Montúfar for many valuable comments on the experimental design in an earlier version of this manuscript. This article was funded by the California Institute of Technology (Grant no. Caltech Kortchak Scholarship) and the Multidisciplinary University Research Initiative (Grant no. FA9550-18-1-0502).
About this article
Cite this article
Chen, Y., Li, W. Optimal transport natural gradient for statistical manifolds with continuous sample space. Info. Geo. 3, 1–32 (2020). https://doi.org/10.1007/s41884-020-00028-0
DOI: https://doi.org/10.1007/s41884-020-00028-0