Abstract
Wasserstein barycenters define averages of probability measures in a geometrically meaningful way. Their use is increasingly popular in applied fields, such as image, geometry or language processing. In these fields, however, the probability measures of interest are often not accessible in their entirety, and the practitioner may have to deal with statistical or computational approximations instead. In this article, we quantify the effect of such approximations on the corresponding barycenters. We show that Wasserstein barycenters depend in a Hölder-continuous way on their marginals under relatively mild assumptions. Our proof relies on recent estimates that quantify the strong convexity of the barycenter functional. We explore consequences for the statistical estimation of Wasserstein barycenters and for the convergence of regularized Wasserstein barycenters towards their non-regularized counterparts.
References
Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)
Ahidar-Coutrix, A., Le Gouic, T., Paris, Q.: Convergence rates for empirical barycenters in metric spaces: curvature, convexity and extendable geodesics. Probab. Theory Relat. Fields 177(1), 323–368 (2020)
Altschuler, J.M., Boix-Adsera, E.: Wasserstein barycenters can be computed in polynomial time in fixed dimension. J. Mach. Learn. Res. 22(44), 1–19 (2021)
Benamou, J.-D., Carlier, G., Cuturi, M., Nenna, L., Peyré, G.: Iterative Bregman projections for regularized transportation problems. SIAM J. Sci. Comput. 37(2), A1111–A1138 (2015)
Bigot, J., Cazelles, E., Papadakis, N.: Penalization of barycenters in the Wasserstein space. SIAM J. Math. Anal. 51(3), 2261–2285 (2019)
Bigot, J., Gouet, R., Klein, T., López, A.: Upper and lower risk bounds for estimating the Wasserstein barycenter of random measures on the real line. Electron. J. Stat. 12(2), 2253–2289 (2018)
Bigot, J., Klein, T.: Characterization of barycenters in the Wasserstein space by averaging optimal transport maps. ESAIM PS 22, 35–57 (2018)
Boissard, E., Le Gouic, T., Loubes, J.-M.: Distribution’s template estimate with Wasserstein metrics. Bernoulli 21(2), 740–759 (2015)
Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22(4), 366–389 (1976)
Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)
Brezis, H.: Functional Analysis, Sobolev Spaces and Partial Differential Equations. Universitext, Springer, New York (2010)
Carlier, G., Eichinger, K., Kroshnin, A.: Entropic-Wasserstein barycenters: PDE characterization, regularity, and CLT. SIAM J. Math. Anal. 53(5), 5880–5914 (2021)
Carlier, G., Oberman, A., Oudet, E.: Numerical methods for matching for teams and Wasserstein barycenters. ESAIM M2AN 49(6), 1621–1642 (2015)
Chewi, S., Maunu, T., Rigollet, P., Stromme, A.J.: Gradient descent algorithms for Bures–Wasserstein barycenters. In: Abernethy, J., Agarwal, S. (eds.) Proceedings of Thirty Third Conference on Learning Theory. Proceedings of Machine Learning Research, vol. 125, pp. 1276–1304. PMLR, Berlin (2020)
Colombo, P., Staerman, G., Piantanida, P., Clavel, C.: Automatic text evaluation through the lens of Wasserstein barycenters. In: EMNLP 2021, Punta Cana, Dominican Republic (2021)
Cuturi, M., Doucet, A.: Fast computation of Wasserstein barycenters. In: Xing, E.P., Jebara, T. (eds.) Proceedings of the 31st International Conference on Machine Learning, volume 32(2) of Proceedings of Machine Learning Research, pp. 685–693. PMLR, Beijing, China (2014)
Delalande, A.: Nearly tight convergence bounds for semi-discrete entropic optimal transport. In: Camps-Valls, G., Ruiz, F.J.R., Valera, I. (eds.) Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, volume 151 of Proceedings of Machine Learning Research, pp. 1619–1642. PMLR (2022)
Delalande, A.: Quantitative Stability in Quadratic Optimal Transport. Université Paris-Saclay, Theses (2022)
Delalande, A., Mérigot, Q.: Quantitative stability of optimal transport maps under variations of the target measure. Duke Math. J. (2022)
Dognin, P., Melnyk, I., Mroueh, Y., Ross, J., Dos Santos, C., Sercu, T.: Wasserstein barycenter model ensembling. In: International Conference on Learning Representations (2019)
Ekeland, I., Témam, R.: Convex Analysis and Variational Problems. Society for Industrial and Applied Mathematics, Philadelphia (1999)
Fournier, N., Guillin, A.: On the rate of convergence in Wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 162(3), 707–738 (2015)
Ho, N., Nguyen, X., Yurochkin, M., Bui, H.H., Huynh, V., Phung, D.: Multilevel clustering via Wasserstein means. In: Precup, D., Teh, Y.W. (eds.) Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1501–1509. PMLR (2017)
Kim, Y.-H., Pass, B.: Wasserstein barycenters over Riemannian manifolds. Adv. Math. 307, 640–683 (2017)
Kitagawa, J., Mérigot, Q., Thibert, B.: Convergence of a Newton algorithm for semi-discrete optimal transport. J. Eur. Math. Soc. 21(9), 2603–2651 (2019)
Le Gouic, T., Loubes, J.-M.: Existence and consistency of Wasserstein barycenters. Probab. Theory Relat. Fields 168(3), 901–917 (2017)
Le Gouic, T., Paris, Q., Rigollet, P., Stromme, A.: Fast convergence of empirical barycenters in Alexandrov spaces and the Wasserstein space. J. Eur. Math. Soc. 25, 2229–2250 (2022)
Lian, X., Jain, K., Truszkowski, J., Poupart, P., Yu, Y.: Unsupervised multilingual alignment using Wasserstein barycenter. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI-20. International Joint Conferences on Artificial Intelligence Organization, pp. 3702–3708. Main track (2020)
McCann, R.J.: A convexity principle for interacting gases. Adv. Math. 128(1), 153–179 (1997)
Panaretos, V.M., Zemel, Y.: An Invitation to Statistics in Wasserstein Space. SpringerBriefs in Probability and Mathematical Statistics, Springer, Cham (2020)
Pass, B.: Optimal transportation with infinitely many marginals. J. Funct. Anal. 264(4), 947–963 (2013)
Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)
Rabin, J., Peyré, G., Delon, J., Bernot, M.: Wasserstein barycenter and its application to texture mixing. In: SSVM’11, pp. 435–446. Springer, Israel (2011)
Santambrogio, F.: Optimal Transport for Applied Mathematicians, vol. 55, pp. 58–63. Birkhäuser, New York (2015)
Santambrogio, F., Wang, X.-J.: Convexity of the support of the displacement interpolation: counterexamples. Appl. Math. Lett. 58, 152–158 (2016)
Shalev-Shwartz, S., Shamir, O., Srebro, N., Sridharan, K.: Learnability, stability and uniform convergence. J. Mach. Learn. Res. 11(90), 2635–2670 (2010)
Solomon, J., de Goes, F., Peyré, G., Cuturi, M., Butscher, A., Nguyen, A., Du, T., Guibas, L.: Convolutional Wasserstein distances: efficient optimal transportation on geometric domains. ACM Trans. Graph. 34(4), 1–11 (2015)
Srivastava, S., Li, C., Dunson, D.B.: Scalable Bayes via barycenter in Wasserstein space. J. Mach. Learn. Res. 19(8), 1–35 (2018)
Sturm, K.-T.: Probability measures on metric spaces of nonpositive curvature. Contemp. Math. 338, 01 (2003)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics, Springer, Berlin (1996)
Vapnik, V.: Principles of risk minimization for learning theory. In: Moody, J., Hanson, S., Lippmann, R.P. (eds.) Advances in Neural Information Processing Systems, vol. 4. Morgan-Kaufmann, Cambridge (1991)
Varadarajan, V.S.: On the convergence of sample probability distributions. Sankhyā Indian J. Stat. 19(1/2), 23–26 (1958)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin (2008)
Weed, J., Bach, F.R.: Sharp asymptotic and finite-sample rates of convergence of empirical measures in Wasserstein distance. Bernoulli (2019)
Acknowledgements
The authors acknowledge the support of the Lagrange Mathematics and Computing Research Center and of the ANR (MAGA, ANR-16-CE40-0014). We thank Blanche Buet for interesting discussions related to this work.
Contributions
All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Appendices
Appendix A. Dual formulation for the Wasserstein barycenter problem
Proof of Proposition 1.1
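Instead of proving the formulation of Proposition 1.1 directly, we show that
\[ \min_{\mu \in \mathcal{P}(\Omega)} F_\mathbb{P}(\mu) = \max \left\{ \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \;:\; (\phi_\rho)_\rho \in \textrm{L}^\infty(\mathbb{P}; W^{1,\infty}(\Omega)), \ \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0 \right\}, \]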
where for any \(\rho \in \mathcal {P}(\Omega )\), \(\phi _\rho ^c\) denotes the following c-transform of \(\phi _\rho \): \(\phi _\rho ^c(\cdot ) = \inf _{y \in \Omega } \frac{1}{2} \left\| \cdot -y\right\| ^2 - \phi _\rho (y)\). Such a formulation entails the result of Proposition 1.1 by the change of variable \((\psi _\rho )_\rho = \frac{\left\| \cdot \right\| ^2}{2} - (\phi _\rho )_\rho \in \textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\).
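Duality. Let us first show that the value of \(\min _{\mu \in \mathcal {P}(\Omega )} F_\mathbb {P}(\mu )\) is equal to the value of the following supremum:
\[ \mathrm{(D)_\mathbb{P}}' := \sup \left\{ \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \;:\; (\phi_\rho)_\rho \in \textrm{L}^1(\mathbb{P}; \mathcal{C}(\Omega)), \ \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0 \right\}, \]
where \(\textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\) denotes the set of \(\mathbb {P}\)-measurable and Bochner integrable mappings from \(\mathcal {P}(\Omega )\) to the space \((\mathcal {C}(\Omega ), \left\| \cdot \right\| _\infty )\) of continuous functions from \(\Omega \) to \(\mathbb {R}\) equipped with the supremum norm. Introduce the functional \(H: \mathcal {C}(\Omega ) \rightarrow \mathbb {R}\) defined for all \(\varphi \in \mathcal {C}(\Omega )\) by
\[ H(\varphi) = \inf \left\{ - \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \;:\; (\phi_\rho)_\rho \in \textrm{L}^1(\mathbb{P}; \mathcal{C}(\Omega)), \ \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = \varphi(\cdot) \right\}. \]
Notice then that \(\mathrm {(D)_\mathbb {P}}' = -H(0)\). On the other hand, notice that H has the following convex conjugate: for \(\mu \in \mathcal {P}(\Omega )\),
\[ H^*(\mu) = \sup_{\varphi \in \mathcal{C}(\Omega)} \left\{ \int_\Omega \varphi \,\textrm{d}\mu - H(\varphi) \right\} = \sup_{(\phi_\rho)_\rho \in \textrm{L}^1(\mathbb{P}; \mathcal{C}(\Omega))} \int_{\mathcal{P}(\Omega)} \left( \int_\Omega \phi_\rho \,\textrm{d}\mu + \int_\Omega \phi_\rho^c \,\textrm{d}\rho \right) \textrm{d}\mathbb{P}(\rho) = \int_{\mathcal{P}(\Omega)} \frac{1}{2} \textrm{W}_2^2(\rho, \mu) \,\textrm{d}\mathbb{P}(\rho) = F_\mathbb{P}(\mu), \]
where we used the Kantorovich duality formula (see for instance [43]) to get to the last line. We thus have (\(H^*\) takes the value \(+\infty \) outside \(\mathcal {P}(\Omega )\))
\[ H^{**}(0) = \sup_{\mu \in \mathcal{P}(\Omega)} \left\{ - H^*(\mu) \right\} = - \min_{\mu \in \mathcal{P}(\Omega)} F_\mathbb{P}(\mu). \]
Therefore, showing that \(\mathrm {(D)_\mathbb {P}}' = \min _{\mu \in \mathcal {P}(\Omega )} F_\mathbb {P}(\mu )\) amounts to showing that \(H(0) = H^{**}(0)\). Since H is convex (by concavity of the c-transform operation), this will follow from the continuity of H at 0 for the supremum norm over \(\mathcal {C}(\Omega )\) (Proposition 4.1 of [21]). For this, we first notice that H never takes the value \(-\infty \): for any \(\varphi \in \mathcal {C}(\Omega )\), any \((\phi _\rho )_\rho \in \textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\) such that \(\int _{\mathcal {P}(\Omega )} \phi _\rho (\cdot ) \textrm{d}\mathbb {P}(\rho ) = \varphi (\cdot )\) and any fixed \(y_0 \in \Omega \), the inequality \(\phi _\rho ^c(x) \le \frac{1}{2} \left\| x - y_0\right\| ^2 - \phi _\rho (y_0)\) gives
\[ - \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \varphi(y_0) - \frac{1}{2} \textrm{diam}(\Omega)^2 \ge - \left\| \varphi \right\|_\infty - \frac{1}{2} \textrm{diam}(\Omega)^2. \]
It follows that
\[ H(\varphi) \ge - \left\| \varphi \right\|_\infty - \frac{1}{2} \textrm{diam}(\Omega)^2 > -\infty. \]
On the other hand, notice that H is bounded from above in a neighborhood of 0 in \(\mathcal {C}(\Omega )\): for any \(\varphi \in \mathcal {C}(\Omega )\) such that \(\left\| \varphi \right\| _\infty \le 1\), one has \(-\varphi ^c(x) \le 1\) for any \(x \in \mathbb {R}^d\) so that, choosing \(\phi _\rho = \varphi \) for all \(\rho \),
\[ H(\varphi) \le - \int_{\mathcal{P}(\Omega)} \int_\Omega \varphi^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \le 1. \]
A standard convex analysis result (Proposition 2.5 in [21]) then ensures that H is continuous at 0, so that \(H(0) = H^{**}(0)\) and \(\mathrm {(D)_\mathbb {P}}' = \min _{\mu \in \mathcal {P}(\Omega )} F_\mathbb {P}(\mu )\).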
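The Kantorovich duality used in this step can be checked numerically in a toy setting. The sketch below is only an illustration under simplifying assumptions that are not part of the article: both measures are uniform on 50 points of the real line, the potentials are drawn at random, and the helper names (`c_transform`, `half_w2_squared`, `dual_value`) are hypothetical. It verifies the weak-duality inequality \(\int \phi \,\textrm{d}\mu + \int \phi ^c \,\textrm{d}\rho \le \frac{1}{2} \textrm{W}_2^2(\rho , \mu )\).

```python
import random

def c_transform(phi, ys, x):
    """phi^c(x) = min over y in ys of 0.5 * (x - y)^2 - phi(y)."""
    return min(0.5 * (x - y) ** 2 - p for y, p in zip(ys, phi))

def half_w2_squared(xs, ys):
    """0.5 * W_2^2 between uniform measures on two 1D point clouds of equal size.

    In one dimension the optimal quadratic-cost coupling matches sorted points.
    """
    pairs = zip(sorted(xs), sorted(ys))
    return sum(0.5 * (x - y) ** 2 for x, y in pairs) / len(xs)

def dual_value(phi, xs, ys):
    """Kantorovich dual objective: mean of phi over ys plus mean of phi^c over xs."""
    n = len(xs)
    return sum(phi) / n + sum(c_transform(phi, ys, x) for x in xs) / n

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # support of rho (assumed toy data)
ys = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # support of mu (assumed toy data)
primal = half_w2_squared(xs, ys)

# Random potentials never exceed the primal value (weak duality).
best_dual = max(
    dual_value([rng.uniform(-1.0, 1.0) for _ in ys], xs, ys) for _ in range(200)
)
print(best_dual <= primal + 1e-12)
```

Random potentials are generally far from optimal, so the dual values stay strictly below the primal value here; equality in the duality formula is attained only at an optimal potential.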
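Restriction to \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\). We show here that we can run the supremum \(\mathrm {(D)_\mathbb {P}}'\) only over \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) instead of \(\textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\), that is
\[ \mathrm{(D)_\mathbb{P}}' = \sup \left\{ \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \;:\; (\phi_\rho)_\rho \in \textrm{L}^\infty(\mathbb{P}; W^{1,\infty}(\Omega)), \ \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0 \right\}. \]
Let \((\phi _\rho )_\rho \in \textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\) be admissible for \(\mathrm {(D)_\mathbb {P}}'\), i.e. \((\phi _\rho )_\rho \) satisfies
\[ \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0. \tag{13} \]
Then we can build from \((\phi _\rho )_\rho \) another admissible family \((\tilde{\phi }_\rho )_\rho \) that belongs to \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) and that performs at least as well in \(\mathrm {(D)_\mathbb {P}}'\), i.e. that verifies
\[ \int_{\mathcal{P}(\Omega)} \int_\Omega \tilde{\phi}_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \int_{\mathcal{P}(\Omega)} \int_\Omega \phi_\rho^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho). \tag{14} \]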
Indeed, introduce \((\hat{\phi }_\rho )_\rho := (\phi ^{cc}_\rho )_\rho \). Then for all \(\rho \in \mathcal {P}(\Omega )\), \(\hat{\phi }_\rho = \phi ^{cc}_\rho \) is 2R-Lipschitz (as a c-transform) and satisfies \(\hat{\phi }_\rho ^c = \phi _\rho ^c\) and \(\hat{\phi }_\rho \ge \phi _\rho \) (as a double c-transform). Using then (13), one has that
\[ \alpha := \int_{\mathcal{P}(\Omega)} \hat{\phi}_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) \ge \int_{\mathcal{P}(\Omega)} \phi_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0, \]
where \(\alpha \) is also 2R-Lipschitz. Now denoting \(\tilde{\phi }_\rho = \hat{\phi }_\rho - \alpha \) for all \(\rho \in \mathcal {P}(\Omega )\), the mapping \((\tilde{\phi }_\rho )_\rho \in \textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\) is admissible for \(\mathrm {(D)_\mathbb {P}}'\) by construction and satisfies \(\tilde{\phi }_\rho \le \hat{\phi }_\rho \) for all \(\rho \in \mathcal {P}(\Omega )\), so that \(\tilde{\phi }^c_\rho \ge \hat{\phi }^c_\rho = \phi ^c_\rho \) (using that the c-transform is order-reversing). For each \(\rho \in \mathcal {P}(\Omega )\), up to subtracting \(\tilde{\phi }_\rho (0)\) from \(\tilde{\phi }_\rho \) (this operation leaves \((\tilde{\phi }_\rho )_\rho \) admissible for \(\mathrm {(D)_\mathbb {P}}'\) and does not change its value), one can assume that \(\tilde{\phi }_\rho (0) = 0\). Noticing that \(\tilde{\phi }_\rho \) is 4R-Lipschitz by construction, we have the bound \(\left\| \tilde{\phi }_\rho \right\| _{W^{1, \infty }(\Omega )} \le 4R(1+R)\). We have thus built an admissible \((\tilde{\phi }_\rho )_\rho \in \textrm{L}^\infty (\mathbb {P}; W^{1,\infty }(\Omega ))\) from an admissible \((\phi _\rho )_\rho \in \textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\) that satisfies (14), which shows that we can run the supremum \(\mathrm {(D)_\mathbb {P}}'\) only over \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) instead of \(\textrm{L}^1(\mathbb {P}; \mathcal {C}(\Omega ))\).
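The three c-transform facts used in this step (Lipschitz bound, \(\phi ^{cc} \ge \phi \), and \(\phi ^{ccc} = \phi ^c\)) can be observed on a discretized example. The following is a minimal sketch under assumptions of our own: \(\Omega = [-R, R]\) with \(R = 1\) is replaced by a uniform grid, the infimum is taken over grid points only, and the starting potential is random noise. The names `GRID`, `c_transform` and `lipschitz_constant` are hypothetical.

```python
import random

R = 1.0
GRID = [-R + 2.0 * R * i / 200 for i in range(201)]  # discretization of Omega = [-R, R]

def c_transform(phi):
    """Return phi^c on GRID, with phi^c(x) = min_y 0.5 * (x - y)^2 - phi(y)."""
    return [min(0.5 * (x - y) ** 2 - p for y, p in zip(GRID, phi)) for x in GRID]

def lipschitz_constant(values):
    """Largest slope between neighbouring grid points."""
    h = GRID[1] - GRID[0]
    return max(abs(b - a) for a, b in zip(values, values[1:])) / h

rng = random.Random(0)
phi = [rng.uniform(-1.0, 1.0) for _ in GRID]  # rough potential, far from Lipschitz
phi_c = c_transform(phi)
phi_cc = c_transform(phi_c)
phi_ccc = c_transform(phi_cc)

print(lipschitz_constant(phi), lipschitz_constant(phi_c))   # the second is at most 2R
print(min(b - a for a, b in zip(phi, phi_cc)) >= -1e-12)    # phi^cc >= phi
print(max(abs(a - b) for a, b in zip(phi_c, phi_ccc)) < 1e-12)  # phi^ccc = phi^c
```

The random potential has a very large discrete slope, while its c-transform is 2R-Lipschitz, which is the regularization effect exploited when replacing \(\phi _\rho \) by \(\phi ^{cc}_\rho \).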
Existence of a maximizer. It remains to show that the supremum in \(\mathrm {(D)_\mathbb {P}}'\) is attained. Let \(\left( (\phi _\rho ^n)_\rho \right) _{n\ge 0}\) be a maximizing sequence for \(\mathrm {(D)_\mathbb {P}}'\), and assume, in view of the previous step, that this sequence belongs to \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) and satisfies for all \(n\ge 0\) and \(\rho \in \mathcal {P}(\Omega )\), \(\left\| \phi ^n_\rho \right\| _{W^{1, \infty }(\Omega )} \le 4R(1+R)\). Further assume that this sequence verifies for all \(n \ge 1\),
\[ \int_{\mathcal{P}(\Omega)} \int_\Omega (\phi_\rho^n)^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \mathrm{(D)_\mathbb{P}}' - \frac{1}{n}. \tag{15} \]
The mappings \((\rho , x) \mapsto \phi ^n_\rho (x)\), \(n \ge 0\), are uniformly bounded in \(\textrm{L}^2(\mathbb {P}\otimes \lambda )\), where \(\lambda \) denotes the Lebesgue measure over \(\Omega \). Therefore, by the Banach–Alaoglu theorem, the sequence \(\left( (\phi _\rho ^n)_\rho \right) _{n\ge 0}\) (seen as a sequence in \(\textrm{L}^2(\mathbb {P}\otimes \lambda )\)) admits a weakly converging subsequence in \(\textrm{L}^2(\mathbb {P}\otimes \lambda )\), which we do not relabel and whose weak limit we denote \((\phi ^\infty _\rho )_\rho \). By Mazur’s lemma [11, Corollary 3.8], there exist a sequence of integers \((N_n)_{n \ge 0}\) and nonnegative coefficients \((\lambda _{n,k})_{n \le k \le N_n}\), \(n \ge 0\), satisfying \(\sum _{k=n}^{N_n} \lambda _{n, k} = 1\) for all \(n \ge 0\), such that the sequence \(\left( (\bar{\phi }_\rho ^n)_\rho \right) _{n\ge 0}\) defined for all \(n \ge 0\) and \(\rho \in \mathcal {P}(\Omega )\) by \(\bar{\phi }_\rho ^n:= \sum _{k=n}^{N_n} \lambda _{n,k} \phi _\rho ^k\) converges strongly to \((\phi ^\infty _\rho )_\rho \) in \(\textrm{L}^2(\mathbb {P}\otimes \lambda )\). By concavity of the c-transform operation and equation (15), we then have, for all \(n \ge 1\), the bound
\[ \int_{\mathcal{P}(\Omega)} \int_\Omega (\bar{\phi}_\rho^n)^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \sum_{k=n}^{N_n} \lambda_{n,k} \int_{\mathcal{P}(\Omega)} \int_\Omega (\phi_\rho^k)^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \mathrm{(D)_\mathbb{P}}' - \frac{1}{n}. \tag{16} \]
The sequence \(\left( (\bar{\phi }_\rho ^n)_\rho \right) _{n\ge 0}\) is admissible as a convex combination of admissible families. It is therefore also a maximizing sequence for \(\mathrm {(D)_\mathbb {P}}'\), and it also satisfies for any \(n \ge 0\) and \(\rho \in \mathcal {P}(\Omega )\) the bound
\[ \left\| \bar{\phi}^n_\rho \right\|_{W^{1,\infty}(\Omega)} \le 4R(1+R). \tag{17} \]
Since the sequence \(\left( (\bar{\phi }_\rho ^n)_\rho \right) _{n\ge 0}\) converges strongly to \((\phi ^\infty _\rho )_\rho \) in \(\textrm{L}^2(\mathbb {P}\otimes \lambda )\), one can extract a subsequence (that we do not relabel) such that for \(\mathbb {P}\)-almost-every \(\rho \in \mathcal {P}(\Omega )\), the sequence \((\bar{\phi }^n_\rho )_{n \ge 0}\) converges to \(\phi ^\infty _\rho \) in \(\textrm{L}^2(\lambda )\). Using (17) and the Arzelà–Ascoli theorem, we deduce that for \(\mathbb {P}\)-almost-every \(\rho \in \mathcal {P}(\Omega )\), the sequence \((\bar{\phi }^n_\rho )_{n \ge 0}\) converges uniformly to \(\phi ^\infty _\rho \) in \(\mathcal {C}(\Omega )\) and that
\[ \left\| \phi^\infty_\rho \right\|_{W^{1,\infty}(\Omega)} \le 4R(1+R). \]
In particular, \((\phi ^\infty _\rho )_\rho \) belongs to \(\textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) and we have the limit
\[ \int_{\mathcal{P}(\Omega)} \phi^\infty_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = \lim_{n \rightarrow \infty} \int_{\mathcal{P}(\Omega)} \bar{\phi}^n_\rho(\cdot) \,\textrm{d}\mathbb{P}(\rho) = 0, \]
so that \((\phi ^\infty _\rho )_\rho \) is admissible for \(\mathrm {(D)_\mathbb {P}}'\). Finally, for \(\mathbb {P}\)-almost-every \(\rho \in \mathcal {P}(\Omega )\), we have the limit
\[ \lim_{n \rightarrow \infty} \int_\Omega (\bar{\phi}^n_\rho)^c \,\textrm{d}\rho = \int_\Omega (\phi^\infty_\rho)^c \,\textrm{d}\rho, \]
so that by Lebesgue’s dominated convergence theorem and the bound (16),
\[ \int_{\mathcal{P}(\Omega)} \int_\Omega (\phi^\infty_\rho)^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) = \lim_{n \rightarrow \infty} \int_{\mathcal{P}(\Omega)} \int_\Omega (\bar{\phi}^n_\rho)^c \,\textrm{d}\rho \,\textrm{d}\mathbb{P}(\rho) \ge \mathrm{(D)_\mathbb{P}}', \]
which proves that \((\phi ^\infty _\rho )_\rho \in \textrm{L}^\infty (\mathbb {P}; W^{1, \infty }(\Omega ))\) is a maximizer for \(\mathrm {(D)_\mathbb {P}}'\).\(\square \)
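The concavity of the c-transform, which lets the Mazur convex combinations inherit the lower bound on the dual objective, is a pointwise inequality: \(\left( \sum _k \lambda _k \phi _k\right) ^c \ge \sum _k \lambda _k \phi _k^c\). The sketch below checks it on a grid. It is an illustration only, under assumed toy data (four random potentials on a grid of \([-1, 1]\), random convex weights), and all names in it are hypothetical.

```python
import random

GRID = [-1.0 + 2.0 * i / 100 for i in range(101)]  # discretization of [-1, 1]

def c_transform(phi):
    """Return phi^c on GRID, with phi^c(x) = min_y 0.5 * (x - y)^2 - phi(y)."""
    return [min(0.5 * (x - y) ** 2 - p for y, p in zip(GRID, phi)) for x in GRID]

def convex_combination(functions, weights):
    """Pointwise combination sum_k weights[k] * functions[k] on GRID."""
    return [
        sum(w * f[i] for w, f in zip(weights, functions)) for i in range(len(GRID))
    ]

rng = random.Random(1)
potentials = [[rng.uniform(-1.0, 1.0) for _ in GRID] for _ in range(4)]
raw = [rng.random() for _ in potentials]
weights = [r / sum(raw) for r in raw]  # nonnegative weights summing to one

# c-transform of the combination versus combination of the c-transforms.
lhs = c_transform(convex_combination(potentials, weights))
rhs = convex_combination([c_transform(phi) for phi in potentials], weights)
gap = min(l - r for l, r in zip(lhs, rhs))
print(gap >= -1e-12)
```

The inequality holds because a minimum of a sum is never smaller than the sum of the minima; integrating it against \(\rho \) and \(\mathbb {P}\) gives the bound used for the averaged sequence.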
Appendix B. Strong-convexity of \(\mathcal {K}_\rho \) for measures with non-convex support
This section gathers examples of measures \(\rho \) for which the strong convexity estimate (4) of Assumption 1.3 holds.
B.1 Measures with convex support
This result is mostly extracted from [19].
Proposition B.1
Let \(\rho \in \mathcal {P}_{a.c.}(\Omega )\). Assume that \(\textrm{spt}(\rho )\) is convex and that there exist \(m_\rho , M_\rho \in (0, +\infty )\) such that \(m_\rho \le \rho \le M_\rho \) on \(\textrm{spt}(\rho )\). Let \(\psi , \tilde{\psi } \in \mathcal {C}(\Omega )\). Then
where \(C_{d,R, m_\rho , M_\rho } = \left( e(d+1)2^{d+1} R \textrm{diam}(\textrm{spt}(\rho )) \left( \frac{M_\rho }{m_\rho } \right) ^2 \right) ^{-1}\).
Proof
We only present here a formal sketch of the proof, which relies heavily on computations done in Section 2 of [19]. Assuming that \(\psi \) and \(\tilde{\psi }\) are smooth enough (see Proposition 2.4 of [19]) and introducing for \(t \in [0,1], \psi ^t = (1-t) \psi + t \tilde{\psi }\), Proposition 2.2 of [19] allows us to differentiate \(\mathcal {K}_\rho (\psi ^t)\) with respect to t and to obtain:
where \(v = \tilde{\psi } - \psi \). Reasoning as in the proof of Proposition 2.4 of [19], the Brascamp–Lieb concentration inequality [9] and the log-concavity of the determinant, seen as a map on the set of symmetric positive-definite matrices, ensure the following bound:
where \(C_{R, m_\rho , M_\rho } = \left( e R \textrm{diam}(\textrm{spt}(\rho )) \left( \frac{M_\rho }{m_\rho } \right) ^2 \right) ^{-1}\), \(\mu = (\nabla \psi ^*)_\# \rho \) and \(\tilde{\mu } = (\nabla \tilde{\psi }^*)_\# \rho \). Back to (19), this leads to
where \(C_{d,R, m_\rho , M_\rho } = \left( e(d+1)2^{d+1} R \textrm{diam}(\textrm{spt}(\rho )) \left( \frac{M_\rho }{m_\rho } \right) ^2 \right) ^{-1}\). We conclude using the convex analysis argument of Proposition 3.1 from [19], which directly ensures
We get the general case (without the smoothness assumptions on \(\psi \) and \(\tilde{\psi }\)) using the approximation arguments presented in Propositions 2.5 and 2.7 of [19].
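The log-concavity of the determinant on symmetric positive-definite matrices, invoked in the sketch above, states that \(\log \det ((1-t)A + tB) \ge (1-t) \log \det A + t \log \det B\) for \(t \in [0,1]\). The snippet below is a small numerical check of this inequality. It is not taken from [19]: it restricts to random \(2 \times 2\) matrices for simplicity, and the helper names are hypothetical.

```python
import math
import random

def random_spd(rng):
    """Random 2x2 symmetric positive-definite matrix, stored as (a, b, c) for [[a, b], [b, c]]."""
    g11, g12, g21, g22 = (rng.uniform(-1.0, 1.0) for _ in range(4))
    # G G^T is positive semi-definite; adding 0.1 * I makes it positive definite.
    return (
        g11 * g11 + g12 * g12 + 0.1,
        g11 * g21 + g12 * g22,
        g21 * g21 + g22 * g22 + 0.1,
    )

def log_det(m):
    """Logarithm of the determinant a * c - b^2 of the 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = m
    return math.log(a * c - b * b)

def interpolate(m0, m1, t):
    """Entrywise convex combination (1 - t) * m0 + t * m1."""
    return tuple((1.0 - t) * x + t * y for x, y in zip(m0, m1))

def concavity_gap(m0, m1, t):
    """log det((1-t) A + t B) - [(1-t) log det A + t log det B]; nonnegative for SPD A, B."""
    return log_det(interpolate(m0, m1, t)) - (
        (1.0 - t) * log_det(m0) + t * log_det(m1)
    )

rng = random.Random(0)
worst_gap = min(
    concavity_gap(random_spd(rng), random_spd(rng), rng.random())
    for _ in range(1000)
)
print(worst_gap >= -1e-12)
```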
B.2 Measures with connected union of convex sets as support
We extend Proposition B.1 to the case of a source measure \(\rho \) with a possibly non-convex support. We will assume that \(\textrm{spt}(\rho )\) can be written as a connected finite union of convex sets.
Proposition B.2
Let \(\rho \in \mathcal {P}_{a.c.}(\Omega )\) be such that there exist \(m_\rho , M_\rho \in (0, +\infty )\) verifying \(m_\rho \le \rho \le M_\rho \) on \(\textrm{spt}(\rho )\). Assume that \(\textrm{spt}(\rho )\) is connected and that there exist \(N\ge 1\) convex sets \((C_i)_{1 \le i \le N}\) in \(\Omega \) such that \(\textrm{spt}(\rho ) = \bigcup _{i=1}^N C_i\). Also assume that for any \(i \ne j\) such that \(C_i \cap C_j \ne \emptyset \), one has \(\rho (C_i \cap C_j) > 0\). Then there exists a constant \(c_\rho \) depending on \(\rho \) such that for any \(\psi , \tilde{\psi } \in \mathcal {C}(\Omega )\),
Remark B.1
(Constant \(c_\rho \) and Poincaré–Wirtinger constant of \(\rho \)) The constant \(c_\rho \) of Proposition B.2 is not made precise in the statement. The proof of this proposition only allows us to bound \(c_\rho \) in terms of the second smallest eigenvalue \(\lambda _2(L)\) of a weighted graph Laplacian L, which is built from the graph whose vertices are the convex sets \(C_i\) and whose edge weights are the masses \(\rho (C_i \cap C_j)\) that \(\rho \) assigns to the intersections of the convex sets \(C_i\) and \(C_j\). The constant \(c_\rho \) then reads:
\[ c_\rho = \left( e(d+1)2^{d+1} R^2 \left( \frac{M_\rho}{m_\rho} \right)^2 \left( N^2 + \frac{2 N^3}{\lambda_2(L)} \right) \right)^{-1}. \]
The quantity \(\lambda _2(L)\) is not explicit, but it can be linked to the weighted Cheeger constant of \(\rho \), defined by
where \(\left| \partial A\right| _\rho = \int _{\partial A \cap \textrm{int}(\textrm{spt}(\rho ))} \rho (x) \textrm{d}\mathcal {H}^{d-1}(x)\) and where the infimum is taken over Lipschitz domains \(A \subset \textrm{int}(\textrm{spt}(\rho ))\) with boundary of finite \(\mathcal {H}^{d-1}\)-measure. By Lemma 5.3 of [25], this constant can in turn be linked to the \(\textrm{L}^1\) Poincaré–Wirtinger constant \(C_{PW}(\rho )\) of \(\rho \). Indeed, \(h(\rho )\) is positive whenever \(\rho \) satisfies an \(\textrm{L}^1\) Poincaré–Wirtinger inequality, i.e. whenever there exists a finite \(C_{PW}(\rho ) > 0\) such that for every smooth function f on \(\Omega \),
The Poincaré–Wirtinger constant \(C_{PW}(\rho )\) and the Cheeger constant \(h(\rho )\) are then related by the inequality
Using ideas similar to the ones found in Section 5.2 of [25], the eigenvalue \(\lambda _2(L)\) can be bounded in terms of the Cheeger constant of \(\rho \), and thus in terms of \(C_{PW}(\rho )\). We do not detail this comparison here but only report that \(c_\rho \) may be written
where \(s_{d-1}\) denotes the surface area of the unit sphere in \(\mathbb {R}^d\) and
Proof of Proposition B.2
Let us denote for now \(f = \tilde{\psi }^* - \psi ^*\). We will first exploit a discrete Laplacian over \(\mathcal {X}= \textrm{spt}(\rho )\) in order to upper bound \(\mathbb {V}\textrm{ar}_\rho (f)\) by a sum of variances of f w.r.t. probability measures supported on the convex sets \((C_i)_i\). We will then use Proposition B.1 to conclude.
For any \(i \in \{1,\dots ,N\}\), we denote \(\rho _i = \frac{1}{\rho (C_i)} \rho _{\vert C_i}\) and \(m_i = \int _{C_i} f \textrm{d}\rho _i\). Then one has the following bound:
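We now consider the graph \(G = (\{C_i\}_{1 \le i \le N}, \{w_{ij}\}_{1 \le i,j \le N})\) with vertices \(\{C_i\}_{1 \le i \le N}\) and weighted edges \(\{w_{ij}\}_{1 \le i,j \le N}\) defined by
\[ w_{ij} = \begin{cases} \rho(C_i \cap C_j) & \text{if } i \ne j, \\ 0 & \text{if } i = j. \end{cases} \]
By construction, this graph has a single connected component. We introduce the weighted Laplacian matrix \(L \in \mathbb {R}^{N\times N}\) of G as follows:
\[ L_{ij} = \begin{cases} \sum_{k \ne i} w_{ik} & \text{if } i = j, \\ - w_{ij} & \text{if } i \ne j. \end{cases} \]
Then L is a symmetric and positive semi-definite matrix. Its null space is made of constant vectors and we denote \(\lambda _2(L)\) its second smallest eigenvalue, which is non-zero. Denoting \(m = (m_i)_{1\le i \le N} \in \mathbb {R}^N\), we introduce \(\bar{m} = \left( \frac{1}{N} \sum _i m_i\right) \mathbbm {1}_N \in \mathbb {R}^N\) the constant vector whose coordinates equal the mean of m (we use \(\mathbbm {1}_N = (1)_{1\le i \le N} \in \mathbb {R}^N\)). Notice that \(m - \bar{m}\) is orthogonal to the null space of L, ensuring the following bound:
\[ \lambda_2(L) \left\| m - \bar{m} \right\|^2 \le \langle m - \bar{m} \vert L (m - \bar{m}) \rangle = \frac{1}{2} \sum_{1 \le i, j \le N} w_{ij} (m_i - m_j)^2. \]
But for any i, j such that \(w_{ij}>0\), denoting \(m_{i \cap j} = \frac{1}{\rho (C_i \cap C_j)} \int _{C_i \cap C_j} f \textrm{d}\rho \), one has
\[ (m_i - m_j)^2 \le 2 (m_i - m_{i \cap j})^2 + 2 (m_{i \cap j} - m_j)^2. \]
And for such i, j,
\[ (m_{i \cap j} - m_i)^2 = \left( \frac{1}{\rho(C_i \cap C_j)} \int_{C_i \cap C_j} (f - m_i) \,\textrm{d}\rho \right)^2 \le \frac{1}{\rho(C_i \cap C_j)} \int_{C_i \cap C_j} (f - m_i)^2 \,\textrm{d}\rho \le \frac{\rho(C_i)}{\rho(C_i \cap C_j)} \mathbb{V}\textrm{ar}_{\rho_i}(f), \]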
where we used Jensen’s inequality and the fact that \(C_i \cap C_j \subset C_i\). A similar bound can be shown for \((m_{i \cap j} - m_j)^2\), and plugging these into (21) yields
Injecting this into (20) yields
Now recalling that \(f = \tilde{\psi }^* - \psi ^*\), we have by Proposition B.1 for any \(i\in \{1, \dots , N\}\) that
where \(C_{d,R, m_\rho , M_\rho }= \left( e(d+1)2^{d+1} R^2 \left( \frac{M_\rho }{m_\rho } \right) ^2 \right) ^{-1} \). Weighting this last inequality with \(\rho (C_i)\) and summing over \(i \in \{1, \dots , N\}\), this yields
Using (22) eventually gives
where \(c_\rho = \left( e(d+1)2^{d+1} R^2 \left( \frac{M_\rho }{m_\rho } \right) ^2 \left( N^2 + \frac{ 2 N^3}{\lambda _2(L)} \right) \right) ^{-1}\).
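The spectral-gap step of this proof can be illustrated numerically: for a connected weighted graph with Laplacian L, any vector m satisfies \(\lambda _2(L) \left\| m - \bar{m}\right\| ^2 \le \frac{1}{2} \sum _{i,j} w_{ij} (m_i - m_j)^2\). The sketch below is a toy instance under assumptions of our own: three convex pieces arranged in a chain, made-up overlap masses and local means, and a plain cyclic Jacobi iteration (a standard eigenvalue method, not something used in the article) to obtain the spectrum without external libraries. All names are hypothetical.

```python
import math

def laplacian(weights):
    """Weighted graph Laplacian: L_ii = sum_{k != i} w_ik and L_ij = -w_ij for i != j."""
    n = len(weights)
    return [
        [
            sum(weights[i][k] for k in range(n) if k != i) if i == j else -weights[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]

def jacobi_eigenvalues(matrix, sweeps=50):
    """Sorted eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that annihilates the (p, q) entry.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

# Assumed overlap masses rho(C_i ∩ C_j) for a chain C_1 - C_2 - C_3 (C_1 and C_3 disjoint).
w = [[0.0, 0.1, 0.0],
     [0.1, 0.0, 0.2],
     [0.0, 0.2, 0.0]]
m = [0.3, -1.2, 0.7]  # assumed local means m_i of f on each C_i

lam2 = jacobi_eigenvalues(laplacian(w))[1]          # second smallest eigenvalue
mean = sum(m) / len(m)
deviation = sum((mi - mean) ** 2 for mi in m)        # ||m - m_bar||^2
energy = 0.5 * sum(
    w[i][j] * (m[i] - m[j]) ** 2 for i in range(3) for j in range(3)
)
print(lam2 > 0.0, lam2 * deviation <= energy + 1e-12)
```

The positivity of `lam2` reflects the connectedness of the graph; if two pieces did not communicate through any chain of overlaps, \(\lambda _2(L)\) would vanish and the bound on \(c_\rho \) would degenerate.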
About this article
Cite this article
Carlier, G., Delalande, A. & Mérigot, Q. Quantitative stability of barycenters in the Wasserstein space. Probab. Theory Relat. Fields 188, 1257–1286 (2024). https://doi.org/10.1007/s00440-023-01241-5
DOI: https://doi.org/10.1007/s00440-023-01241-5