
An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein barycenters

Original Paper · Published in Operational Research

Abstract

Discrete Wasserstein barycenters correspond to optimal solutions of transportation problems for a set of probability measures with finite support. Discrete barycenters are measures with finite support themselves and exhibit two favorable properties: there always exists one with a provably sparse support, and any optimal transport to the input measures is non-mass splitting. It is open whether a discrete barycenter can be computed in polynomial time. It is possible to find an exact barycenter through linear programming, but these programs may scale exponentially. In this paper, we prove that there is a strongly-polynomial 2-approximation algorithm based on linear programming. First, we show that an exact computation over the union of supports of the input measures gives a tight 2-approximation. This computation can be done through a linear program with setup and solution in strongly-polynomial time. The resulting measure is sparse, but an optimal transport may split mass. We then devise a second, strongly-polynomial algorithm to improve this measure to one with a non-mass splitting transport of lower cost. The key step is an update of the possible support set to resolve mass splits. Finally, we devise an iterative scheme that alternates between these two algorithms. The algorithm terminates with a 2-approximation that has both a sparse support and an associated non-mass splitting optimal transport. We conclude with some sample computations and an analysis of the scaling of our algorithms, exhibiting vast improvements in running time over exact LP-based computations and low practical errors.
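The first stage described in the abstract solves an exact transportation LP over a fixed candidate support set. For a fixed finite set \(S\) (here the union of the input supports) and input measures \(P_1,\dots,P_N\) with weights \(\lambda_i\), a standard formulation of such a fixed-support barycenter LP reads as follows; the notation here is ours, a sketch rather than the paper's exact program:

```latex
\begin{align*}
\min_{z,\,y}\quad
  & \sum_{i=1}^{N} \lambda_i \sum_{s \in S} \sum_{x \in \mathrm{supp}(P_i)}
    \|s - x\|^2 \, y^i_{sx} \\
\text{s.t.}\quad
  & \sum_{x \in \mathrm{supp}(P_i)} y^i_{sx} = z_s
    && \forall\, s \in S,\ i \le N \\
  & \sum_{s \in S} y^i_{sx} = P_i(x)
    && \forall\, x \in \mathrm{supp}(P_i),\ i \le N \\
  & y^i_{sx} \ge 0,
\end{align*}
```

where \(z_s\) is the barycenter mass placed on \(s\) and \(y^i_{sx}\) is the mass transported from \(s\) to \(x\); the marginal constraints force \(z \ge 0\) and \(\sum_{s \in S} z_s = 1\) automatically.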



Acknowledgements

The author would like to thank Ethan Anderes for the support with implementations in the Julia language, and Jacob Miller for the helpful discussions. The author gratefully acknowledges support through the Collaboration Grant for Mathematicians Polyhedral Theory in Data Analytics of the Simons Foundation.

Author information

Correspondence to Steffen Borgwardt.


Appendices

Appendix 1: Proofs of Theorems 3 and 4

We begin by proving Theorem 3.

Theorem 3

Algorithm 2 returns a measure \({\bar{P}}'\) supported on a subset of S with \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\) and there is a non-mass splitting transport realizing this bound. Further \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\).

Proof

First, note that the \(P_i^l\) constructed in Step 1 satisfy \(\text {supp}(P_i^l)\subset \text {supp}(P_i)\). Thus \(\text {supp}({\bar{P}}^l)\subset S\), and consequently \(\text {supp}({\bar{P}}')\subset S\). Further, \({\bar{P}}'= \sum _{l=1}^r {\bar{P}}^l\) is a measure. This holds because \(\sum _{l=1}^{r} d_l= \sum _{l=1}^{r} z_{t_l} = 1\), because Step 2 does not affect this sum, and because the total mass in \({\bar{P}}^l\) equals \(d_l\) by construction. Thus, \({\bar{P}}'\) is a measure supported in S.

Second, we prove correctness of Step 2. We show that a greedily lexicographically maximal \((d_1,\dots ,d_r)\) is created while retaining an approximate barycenter supported in \(\text {supp}({\bar{P}}_{\text {org}})\). In particular, we have to show that the objective function value \(\phi ({\bar{P}}_{\text {org}})\) does not change during the shift of mass. To simplify wording, let \({\bar{P}}_\text {lex}\) be the measure corresponding to \((d_1,\dots ,d_r)\) after Step 2. We prove \(\phi ({\bar{P}}_{\text {org}})=\phi ({\bar{P}}_{\text {lex}})\).

Let \(x_{iq_i}^l\in P_i^l\) for \(i\le N\) and \(c=\sum _{i=1}^N \lambda _i x_{iq_i}^l\), as in Step 2a). Then \(\Vert c-s_l\Vert \le \Vert c-s_j\Vert\) for all \(j\ne l\). To see this, recall

$$\begin{aligned} \sum \limits _{i=1}^N \lambda _i \Vert s-x^l_{iq_i}\Vert ^2 = \sum \limits _{i=1}^N\lambda _i (\Vert s-c\Vert ^2+\Vert c-x^l_{iq_i}\Vert ^2), \end{aligned}$$

as demonstrated in the proof of Theorem 1. If \(\Vert c-s_l\Vert > \Vert c-s_j\Vert\) for some \(j\ne l\), \({\bar{P}}_\text {org}\) would not have been optimal.
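The displayed identity is the standard variance-style decomposition around the weighted centroid: the cross terms vanish because \(c\) is the \(\lambda\)-weighted centroid and \(\sum_i \lambda_i = 1\). A minimal numerical sanity check with made-up points and weights (not data from the paper):

```python
import random

random.seed(0)
N, d = 4, 3
lam = [random.random() for _ in range(N)]
total = sum(lam)
lam = [w / total for w in lam]          # weights lambda_i, normalized to sum to 1
xs = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
s = [random.uniform(-1, 1) for _ in range(d)]

# weighted centroid c = sum_i lambda_i * x_i
c = [sum(lam[i] * xs[i][k] for i in range(N)) for k in range(d)]

def sq(u, v):
    # squared Euclidean distance ||u - v||^2
    return sum((a - b) ** 2 for a, b in zip(u, v))

lhs = sum(lam[i] * sq(s, xs[i]) for i in range(N))
rhs = sq(s, c) + sum(lam[i] * sq(c, xs[i]) for i in range(N))
assert abs(lhs - rhs) < 1e-12           # the decomposition identity holds
```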

By \(q_i= \text {arg}\max _{q\le |P_i^l|} (s_j-s_l)^Tx^l_{iq}\) in Step 2a), we pick the \(x^l_{iq_i}\) such that their weighted centroid \(c=\sum _{i=1}^N \lambda _i x^l_{iq_i}\) maximizes the difference \(\Vert c-s_l\Vert ^2 - \Vert c-s_j\Vert ^2 \le 0\). Mass is shifted from \(s_l\) to \(s_j\) only if \(\Vert c-s_l\Vert ^2= \Vert c-s_j\Vert ^2\). In that case the approximation error does not change, because

$$\begin{aligned} \sum _{i=1}^N \lambda _i \Vert s_j-x^l_{iq_i}\Vert ^2 = \sum \limits _{i=1}^N\lambda _i (\Vert s_j-c\Vert ^2+\Vert c-x^l_{iq_i}\Vert ^2) = \sum \limits _{i=1}^N \lambda _i \Vert s_l-x^l_{iq_i}\Vert ^2 . \end{aligned}$$

Thus, the objective function value does not change during Step 2; we have \(\phi ({\bar{P}}_{\text {org}})=\phi ({\bar{P}}_{\text {lex}})\).

By definition of the running indices l and j, mass can only be moved from support points of higher index l to support points of lower index j. For each pair of l and j, we repeat this shift of mass until no weighted centroid with \(\Vert c-s_l\Vert = \Vert c-s_j\Vert\) remains. Due to the decreasing l in the outer loop and the increasing j in the inner loop, \((d_1,\dots ,d_r)\) is transformed to be greedily lexicographically maximal, and the corresponding measure remains an approximate barycenter.

Next, we prove correctness of Steps 3 and 4. We show that \(\phi ({\bar{P}}_{\text {org}})\ge \phi ({\bar{P}}')\). Further, we show that for each constructed partial measure \({\bar{P}}^l\) there is a non-mass splitting transport to the \(P_i^l\), and that they combine to a \(\bar{P}'\) that allows for a non-mass splitting transport that is at least as good as an optimal transport for \({\bar{P}}_{\text {org}}\). Finally, we show \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\).

Recall that in Step 3, the mass of each \(s_l\) is spread out to a set of weighted centroids to obtain \({\bar{P}}^l\). Independently of how the \(x^l_{iq_i}\) are picked from the \(P^l_i\) for all \(i\le N\), their weighted centroid \(c=\sum _{i=1}^N \lambda _i x^l_{iq_i}\) satisfies \(\sum _{i=1}^N \lambda _i \Vert c-x_{iq_i}^l\Vert ^2 \le \sum _{i=1}^N \lambda _i \Vert s_l-x_{iq_i}^l\Vert ^2\). By construction of \({\bar{P}}'\) from the \({\bar{P}}^l\) (Step 4), this already implies \(\phi ({\bar{P}}') \le \phi ({\bar{P}}_{\text {org}})\). The algorithm started with a 2-approximation, and thus it is guaranteed to return a \({\bar{P}}'\) with \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\).

The existence of a non-mass splitting transport from \({\bar{P}}'\) to \(P_1,\dots ,P_N\), and the fact that this transport realizes the above bound, follows from two observations. First, each \({\bar{P}}^l\) itself allows for a non-mass splitting transport to the \(P_i^l\) by the lexicographically maximal choice of the \(x_{iq_i}^l\) in Step 3a): due to this choice, the first constructed weighted centroid c is lexicographically maximal among all weighted centroids that can be constructed from any \(x_{iq}^l\) in the \(P^l_i\). Further, by reducing the mass at each used support point by \(d_{\text {min}}\) in Step 3b), at least one of the \(d^l_{iq_i}\) becomes 0. The corresponding support point is removed from \(P^l_i\) (followed by some reindexing) and thus cannot be used for the construction of a weighted centroid in further iterations. Hence the second centroid constructed in the inner loop is lexicographically strictly smaller than the first one, and the same holds for all subsequent ones.

Second, any two partial measures \({\bar{P}}^{l_1}\), \({\bar{P}}^{l_2}\) from Step 3 satisfy \(\text {supp}({\bar{P}}^{l_1}) \cap \text {supp}({\bar{P}}^{l_2}) = \emptyset\) for \(l_1\ne l_2\), because of the earlier preprocessing in Step 2: weighted centroids that would be equally distant from both \(s_{l_1}\) and \(s_{l_2}\) cannot exist, because this would have caused a shift of mass to the lower index in Step 2 to create a lexicographically larger \((d_1,\dots ,d_r)\). Summing up, \({\bar{P}}'\) consists of a set of distinct support points, for which it is trivial to give a non-mass splitting transport to the \(P_i\) that is at least as good as an optimal transport for \(\bar{P}_{\text {org}}\): this transport just sends the whole mass of each support point in \({\bar{P}}'\) to the support points in the \(P_i\) that were used for its construction.

The removal of at least one support point from a \(P^l_i\) in Step 3b) implies that there are at most \(\sum _{i=1}^N |P^l_i| - N + 1\) runs of 3a) and 3b) to construct a \({\bar{P}}^l\): the 'go back to a)' statement is applied while \(d_l>0\), which is the case while there still is a support point in a \(P^l_i\) with mass on it. In the final run of Steps 3a) and 3b) for each \({\bar{P}}^l\), all the \(P^l_i\) have precisely one support point with the same mass left. This gives the claimed bound, and in particular \(|{\bar{P}}^l|\le \sum _{i=1}^N |P^l_i| - N + 1\).

Due to \(|P^l_i|\le |P_i|\) and \(|{\bar{P}}_\text {org}| \le \sum _{i=1}^N |P_i| -N + 1\), we obtain

$$\begin{aligned} |{\bar{P}}'| = \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} |{\bar{P}}^l| \le \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} \Big (\sum \limits _{i=1}^N |P^l_i| -N + 1\Big ) \le \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} \Big (\sum \limits _{i=1}^N |P_i| -N + 1\Big ) \le \Big (\sum _{i=1}^N |P_i| - N + 1\Big )^2. \end{aligned}$$

Thus \({\bar{P}}'\) satisfies all claimed properties. \(\square\)
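The spread-out of Step 3 analyzed above (pick one support point per measure, move the minimal available mass onto their weighted centroid, remove exhausted points) can be sketched in a few lines of Python. The data below are a hypothetical toy instance for a single \(s_l\), and `Fraction` stands in for the exact rational arithmetic of the proof:

```python
from fractions import Fraction as F

lam = [F(1, 2), F(1, 2)]                        # assumed weights, sum to 1
# Hypothetical partial measures P_i^l for one s_l: lists of (point, mass)
# pairs, all with the same total mass d_l.
P = [
    [((0, 0), F(1, 2)), ((1, 0), F(1, 2))],
    [((0, 1), F(1, 4)), ((1, 1), F(3, 4))],
]
d_l = sum(m for _, m in P[0])
bound = sum(len(Pi) for Pi in P) - len(P) + 1   # sum |P_i^l| - N + 1

centroids = []                                  # the partial measure \bar{P}^l
while all(P):
    # Step 3a): lexicographically maximal remaining point of each P_i^l
    picks = [max(range(len(Pi)), key=lambda q, Pi=Pi: Pi[q][0]) for Pi in P]
    pts = [P[i][picks[i]][0] for i in range(len(P))]
    c = tuple(sum(lam[i] * pts[i][k] for i in range(len(P)))
              for k in range(len(pts[0])))
    # Step 3b): move the minimal available mass d_min onto the centroid c
    d_min = min(P[i][picks[i]][1] for i in range(len(P)))
    centroids.append((c, d_min))
    for i in range(len(P)):
        pt, m = P[i][picks[i]]
        if m == d_min:
            P[i].pop(picks[i])                  # at least one point is removed
        else:
            P[i][picks[i]] = (pt, m - d_min)

assert sum(m for _, m in centroids) == d_l      # total mass d_l is preserved
assert len(centroids) <= bound                  # at most sum |P_i^l| - N + 1 runs
```

Each recorded pair \((c, d_{\min})\) receives its whole mass from exactly one support point in each \(P_i^l\), so the resulting transport is non-mass splitting by construction.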

Next, we prove that Algorithm 2 runs in strongly-polynomial time.

Theorem 4

For all rational input, a measure can be computed in strongly-polynomial time that is a 2-approximation of a barycenter and for which there is a non-mass splitting transport realizing this bound.

Proof

We consider the running time of each part of the algorithm. For readability, we say ‘polynomial’ in this proof in place of ‘strongly-polynomial’. We use ‘linear’ and ‘quadratic’ with respect to the bit size \(|{\mathcal {I}}|\) of the input \({\mathcal {I}}\). Note that N, the \(|P_i|\), and the dimension d are all bounded above by \(|{\mathcal {I}}|\).

In Step 1, the input for the subsequent steps is created. By sparsity of \({\bar{P}}_\text {org}\), \(r \le \sum _{i=1}^N |P_i| - N + 1\). For each of the r support points \(s_l\), N images \(P^l_i\) with \(|P^l_i|\le |P_i|\) are created. In the application of the stated rule, each \(y_{it_lk}\) has to be processed (at most) once. For each \(y_{it_lk}\), a single comparison and a fixed number of elementary operations suffices to update the support point and mass in \(P_i^l\). In total, data structures of polynomial size are created in polynomial time.

Step 2 is the preprocessing of \((d_1,\dots ,d_r)\) to be greedily lexicographically maximal. For each pair of support points \(s_l, s_j\) with \(j < l\), we perform the inner part of the loop. Finding \(q_i\) in 2a) can be done by considering each \(x^l_{iq} \in P_i^l\) exactly once and comparing the inner products \((s_j-s_l)^Tx^l_{iq}\); this is possible in linear time. The centroid c is then created through the scaling and summation of N rational d-dimensional vectors.

Step 2b) begins with the computation of \(c-s_j\) and \(c-s_l\), then computes \(\Vert c-s_j\Vert ^2=(c-s_j)^T(c-s_j)\) and \(\Vert c-s_l\Vert ^2=(c-s_l)^T(c-s_l)\), and then compares the two values. This is possible in quadratic time. Picking the minimal mass among the \(x_{iq_i}^l\) is possible in linear time, and so is updating the masses, performing the set operations on \(P_i^l\) and \(P_i^j\), and reindexing. In this update, \(|P^l_i|\) is reduced by at least one, so the 'go back to a)' statement is followed at most \(|P^l_i|\) times. Summing up, Step 2 runs in polynomial time.

Step 3 performs the spread-out of the r support points. Picking a lexicographically maximal support point \(x_{iq_i}^l\) in 3a) can be done by considering all support points in \(P_i^l\) once, keeping the current best and comparing each further support point to it with respect to lexicographic order. Identifying the lexicographic order of a pair of d-dimensional support points requires comparing at most all d of their coefficients. This is possible in linear time. Again, the centroid c is created through the scaling and summation of N rational d-dimensional vectors.

In 3b), we pick the minimal mass among the \(x_{iq_i}^l\) used for the construction of c, which can be done in linear time. The same holds for the update of masses, the set operations on \(P_i^l\), and the reindexing. By this update, one of the \(|P^l_i|\) is reduced by at least one, so the 'go back to a)' statement is followed not more than \(\sum _{i=1}^N |P^l_i|\) times; more precisely, there are at most \(\sum _{i=1}^N |P^l_i| - N + 1\) runs of 3a) and 3b) for each l. Summing up, the construction of each \({\bar{P}}^l\) runs in polynomial time, and so does the construction of all the \({\bar{P}}^l\).

In Step 4, the partial measures \({\bar{P}}^l\) are combined to obtain \({\bar{P}}'\). This is the construction of a measure with the appropriate mass put on at most \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\) support points. Each of these support points is just a copy of a support point in one of the \({\bar{P}}^l\). Thus, all steps run in polynomial time, which proves the claim. \(\square\)

Appendix 2: Proof of Theorem 5

Theorem 5

Algorithm 3 returns an approximate barycenter \({\bar{P}}'\) supported on a subset of S for which \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\), where \({\bar{P}}\) is a barycenter, and there is a non-mass splitting optimal transport realizing this bound. Further \(|{\bar{P}}'|\le \sum _{i=1}^N |P_i| - N + 1\).

Proof

First, recall that the output \({\bar{P}}'\) of Algorithm 2 (Step 2) always satisfies \(\text {supp}({\bar{P}}') \subset S\). Further, Algorithm 2 always returns a measure that has a corresponding non-mass splitting transport. As \({\bar{P}}_\text {org}\) from Algorithm 1 (Step 1) is not changed in the final run of Algorithm 2, the returned non-mass splitting transport is optimal. Further, recall that all approximate barycenters \({\bar{P}}_\text {org}\) computed in Step 1 have a support that satisfies \(|{\bar{P}}_\text {org}|\le \sum _{i=1}^N |P_i| - N + 1\). This transfers to the sparsity of \({\bar{P}}'\) returned by Algorithm 3.

It remains to prove termination of Algorithm 3 and the error bound. We will do so by showing that \(\phi ({\bar{P}}') < \phi ({\bar{P}}_\text {org})\) if \({\bar{P}}' \ne {\bar{P}}_\text {org}\) for \({\bar{P}}_\text {org},{\bar{P}}'\) from the same iteration. This leads to a strictly decreasing sequence of values \(\phi ({\bar{P}}')\) as long as the algorithm keeps running. The first approximate barycenter in this sequence already is a 2-approximation and it can only become better. This immediately gives \(\phi ({\bar{P}}') \le 2 \cdot \phi (\bar{P})\). At the end of each Step 2, we update \(S_\text {org}=\text {supp}({\bar{P}}')\subset S\) before going back to Step 1, where an exact optimum over this new support, a subset of S, is computed. Because of this, and the fact that there are only finitely many subsets of S, the sequence of values \(\phi ({\bar{P}}')\) is finite.

Now, it only remains to prove that \(\phi ({\bar{P}}') < \phi (\bar{P}_\text {org})\) if \({\bar{P}}' \ne {\bar{P}}_\text {org}\). We begin by considering Step 3 of Algorithm 2. Assume \(P^l_i\) consists of a single support point \(x^l_{i1}\) for all \(i\le N\). Then the unique barycenter \({\bar{P}}^l\) of the \(P^l_i\) is the weighted centroid \(c = \sum _{i=1}^N \lambda _i x^l_{i1}\) and the cost of transport from \({\bar{P}}^l\) to all the \(P^l_i\) is \(\phi ({\bar{P}}^l) = d_l\cdot \sum _{i=1}^N\lambda _i \Vert c-x^l_{i1}\Vert ^2\). For all \(s\ne c\), in particular for \(s=s_l\), we get

$$\begin{aligned} \phi ({\bar{P}}^l) = d_l\cdot \sum \limits _{i=1}^N\lambda _i \Vert c-x^l_{i1}\Vert ^2 < d_l\cdot \sum \limits _{i=1}^N\lambda _i \Vert s-x^l_{i1}\Vert ^2. \end{aligned}$$
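The strict inequality for \(s\ne c\) is again the centroid decomposition: the cost at \(s\) exceeds the cost at \(c\) by exactly \(\Vert s-c\Vert ^2\). A quick numerical sanity check on arbitrary made-up data:

```python
import random

random.seed(1)
N, d = 3, 2
lam = [1 / N] * N                              # uniform weights, sum to 1
xs = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
# weighted centroid of the single support points x_{i1}^l
c = [sum(lam[i] * xs[i][k] for i in range(N)) for k in range(d)]

def cost(s):
    # transport cost per unit mass from s to the x_{i1}^l
    return sum(lam[i] * sum((s[k] - xs[i][k]) ** 2 for k in range(d))
               for i in range(N))

s_l = [c[0] + 0.3, c[1] - 0.2]                 # any s_l != c
assert cost(c) < cost(s_l)                     # the centroid is strictly better
gap = sum((s_l[k] - c[k]) ** 2 for k in range(d))
assert abs(cost(s_l) - cost(c) - gap) < 1e-12  # gap equals ||s_l - c||^2
```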

If some of the \(P^l_i\) consist of more than one support point, Step 3 selects exactly one support point \(x_{iq}^l\) from each measure \(P^l_i\), forms their weighted centroid c with corresponding mass \(d_c=d_{\text {min}}\), and adds it to \(\text {supp}({\bar{P}}^l)\). This scheme is then repeated for the remaining support points and remaining mass. Thus \({\bar{P}}^l\) is constructed as a set of weighted centroids c of the support points \(x^l_{iq}\) to which these centroids transport. Each of them satisfies \(d_c\cdot \sum _{i=1}^N\lambda _i \Vert c-x^l_{iq}\Vert ^2 \le d_c\cdot \sum _{i=1}^N\lambda _i \Vert s_l-x^l_{iq}\Vert ^2\). By summing over all c that are constructed, one obtains

$$\begin{aligned} \phi ({\bar{P}}^l)\le \sum _{i=1}^N\lambda _i \sum _{q=1}^{|P_i^l|} d^l_{iq} \cdot \Vert s_l-x_{iq}^l\Vert ^2. \end{aligned}$$

Informally, it is at least as costly to transport to the measures \(P^l_i\) from the support point \(s_l\) as from the set of weighted centroids (with appropriate masses) constituting \({\bar{P}}^l\). Equality in the above can only hold if the single support point \(s_l\) itself already is the weighted centroid of single-support point measures \(P^l_1,\dots ,P^l_N\). But this means that Step 3 of Algorithm 2 just copies \(s_l\) with mass \(d_l\) to \({\bar{P}}^l\). The algorithm stops when \({\bar{P}}'={\bar{P}}_\text {org}\). By \(\phi (\bar{P}')= \sum _{l=1}^r \phi ({\bar{P}}^l)\), this means all \(s_l\) have to satisfy \(\phi ({\bar{P}}^l)= \sum _{i=1}^N\lambda _i \sum _{q=1}^{|P_i^l|} d^l_{iq} \cdot \Vert s_l-x_{iq}^l\Vert ^2.\) So all \(s_l\) are already the weighted centroids of their single-support measures \(P^l_i\).

Further, note that when a shift of mass from \(s_l\) to \(s_j\) with \(j<l\) happens in Step 2 of Algorithm 2, then Step 3 is guaranteed to find a strictly better transport than before: there exists a set of support points that, before the shift, receive transport from \(s_l\), but have a weighted centroid \(c\ne s_l\). Such a set of support points would be moved from \(P^l_i\) to \(P^j_i\) (and at least one of the support points was not associated to \(s_j\) before). Then \(s_j\) is guaranteed to split mass and, in the following Step 3, the cost of transport is strictly improved; see above.

Thus \(\phi ({\bar{P}}') < \phi ({\bar{P}}_\text {org})\) if \({\bar{P}}' \ne \bar{P}_\text {org}\) and Algorithm 3 terminates with \({\bar{P}}' = {\bar{P}}_\text {org}\) in the final iteration. \(\square\)


Cite this article

Borgwardt, S. An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein barycenters. Oper Res Int J 22, 1511–1551 (2022). https://doi.org/10.1007/s12351-020-00589-z
