
An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein barycenters

Original Paper · Published in Operational Research

Abstract

Discrete Wasserstein barycenters correspond to optimal solutions of transportation problems for a set of probability measures with finite support. Discrete barycenters are measures with finite support themselves and exhibit two favorable properties: there always exists one with a provably sparse support, and any optimal transport to the input measures is non-mass splitting. It is open whether a discrete barycenter can be computed in polynomial time. It is possible to find an exact barycenter through linear programming, but these programs may scale exponentially. In this paper, we prove that there is a strongly-polynomial 2-approximation algorithm based on linear programming. First, we show that an exact computation over the union of supports of the input measures gives a tight 2-approximation. This computation can be done through a linear program with setup and solution in strongly-polynomial time. The resulting measure is sparse, but an optimal transport may split mass. We then devise a second, strongly-polynomial algorithm to improve this measure to one with a non-mass splitting transport of lower cost. The key step is an update of the possible support set to resolve mass splits. Finally, we devise an iterative scheme that alternates between these two algorithms. The algorithm terminates with a 2-approximation that has both a sparse support and an associated non-mass splitting optimal transport. We conclude with some sample computations and an analysis of the scaling of our algorithms, exhibiting vast improvements in running time over exact LP-based computations and low practical errors.
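The first stage described in the abstract solves an exact transportation LP over a fixed candidate support set. For a fixed finite set \(S\) (here the union of the input supports) and input measures \(P_1,\dots,P_N\) with weights \(\lambda_i\), a standard formulation of such a fixed-support barycenter LP reads as follows; the notation here is ours, a sketch rather than the paper's exact program:

```latex
\begin{align*}
\min_{z,\,y}\quad
  & \sum_{i=1}^{N} \lambda_i \sum_{s \in S} \sum_{x \in \mathrm{supp}(P_i)}
    \|s - x\|^2 \, y^i_{sx} \\
\text{s.t.}\quad
  & \sum_{x \in \mathrm{supp}(P_i)} y^i_{sx} = z_s
    && \forall\, s \in S,\ i \le N \\
  & \sum_{s \in S} y^i_{sx} = P_i(x)
    && \forall\, x \in \mathrm{supp}(P_i),\ i \le N \\
  & y^i_{sx} \ge 0,
\end{align*}
```

where \(z_s\) is the barycenter mass placed on \(s\) and \(y^i_{sx}\) is the mass transported from \(s\) to \(x\); the marginal constraints force \(z \ge 0\) and \(\sum_{s \in S} z_s = 1\) automatically.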



Acknowledgements

The author would like to thank Ethan Anderes for the support with implementations in the Julia language, and Jacob Miller for the helpful discussions. The author gratefully acknowledges support through the Collaboration Grant for Mathematicians Polyhedral Theory in Data Analytics of the Simons Foundation.

Author information

Correspondence to Steffen Borgwardt.


Appendices

Appendix 1: Proofs of Theorems 3 and 4

We begin by proving Theorem 3.

Theorem 3

Algorithm 2 returns a measure \({\bar{P}}'\) supported on a subset of S with \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\) and there is a non-mass splitting transport realizing this bound. Further \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\).

Proof

First, note that the \(P_i^l\) constructed in Step 1 satisfy \(\text {supp}(P_i^l)\subset \text {supp}(P_i)\). Thus \(\text {supp}({\bar{P}}^l)\subset S\), and consequently \(\text {supp}({\bar{P}}')\subset S\). Further, \({\bar{P}}'= \sum _{l=1}^r {\bar{P}}^l\) is a measure. This holds because \(\sum _{l=1}^{r} d_l= \sum _{l=1}^{r} z_{t_l} = 1\), because Step 2 does not affect this sum, and because the total mass in \({\bar{P}}^l\) equals \(d_l\) by construction. Thus, \({\bar{P}}'\) is a measure supported in S.

Second, we prove correctness of Step 2. We show that a greedily lexicographically maximal \((d_1,\dots ,d_r)\) is created while retaining an approximate barycenter supported in \(\text {supp}({\bar{P}}_{\text {org}})\). In particular, we have to show that the objective function value \(\phi ({\bar{P}}_{\text {org}})\) does not change during the shift of mass. To simplify wording, let \({\bar{P}}_\text {lex}\) be the measure corresponding to \((d_1,\dots ,d_r)\) after Step 2. We prove \(\phi ({\bar{P}}_{\text {org}})=\phi ({\bar{P}}_{\text {lex}})\).

Let \(x_{iq_i}^l\in P_i^l\) for \(i\le N\) and \(c=\sum _{i=1}^N \lambda _i x_{iq_i}^l\), as in Step 2a). Then \(\Vert c-s_l\Vert \le \Vert c-s_j\Vert\) for all \(j\ne l\). To see this, recall

$$\begin{aligned} \sum \limits _{i=1}^N \lambda _i \Vert s-x^l_{iq_i}\Vert ^2 = \sum \limits _{i=1}^N\lambda _i (\Vert s-c\Vert ^2+\Vert c-x^l_{iq_i}\Vert ^2), \end{aligned}$$

as demonstrated in the proof of Theorem 1. If \(\Vert c-s_l\Vert > \Vert c-s_j\Vert\) for some \(j\ne l\), \({\bar{P}}_\text {org}\) would not have been optimal.
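The displayed identity is the standard variance-style decomposition around the weighted centroid: the cross terms vanish because \(c\) is the \(\lambda\)-weighted centroid and \(\sum_i \lambda_i = 1\). A minimal numerical sanity check with made-up points and weights (not data from the paper):

```python
import random

random.seed(0)
N, d = 4, 3
lam = [random.random() for _ in range(N)]
total = sum(lam)
lam = [w / total for w in lam]          # weights lambda_i, normalized to sum to 1
xs = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
s = [random.uniform(-1, 1) for _ in range(d)]

# weighted centroid c = sum_i lambda_i * x_i
c = [sum(lam[i] * xs[i][k] for i in range(N)) for k in range(d)]

def sq(u, v):
    # squared Euclidean distance ||u - v||^2
    return sum((a - b) ** 2 for a, b in zip(u, v))

lhs = sum(lam[i] * sq(s, xs[i]) for i in range(N))
rhs = sq(s, c) + sum(lam[i] * sq(c, xs[i]) for i in range(N))
assert abs(lhs - rhs) < 1e-12           # the decomposition identity holds
```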

By \(q_i= \text {arg}\max _{q\le |P_i^l|} (s_j-s_l)^Tx^l_{iq}\) in Step 2a), we pick the \(x^l_{iq_i}\) such that their weighted centroid \(c=\sum _{i=1}^N \lambda _i x^l_{iq_i}\) maximizes the difference \(\Vert c-s_l\Vert ^2 - \Vert c-s_j\Vert ^2 \le 0\). Mass is shifted from \(s_l\) to \(s_j\) only if \(\Vert c-s_l\Vert ^2= \Vert c-s_j\Vert ^2\). In that case the approximation error does not change, because

$$\begin{aligned} \sum _{i=1}^N \lambda _i \Vert s_j-x^l_{iq_i}\Vert ^2 = \sum \limits _{i=1}^N\lambda _i (\Vert s_j-c\Vert ^2+\Vert c-x^l_{iq_i}\Vert ^2) = \sum \limits _{i=1}^N \lambda _i \Vert s_l-x^l_{iq_i}\Vert ^2 . \end{aligned}$$

Thus, the objective function value does not change during Step 2; we have \(\phi ({\bar{P}}_{\text {org}})=\phi ({\bar{P}}_{\text {lex}})\).

By definition of the running indices l and j, mass can only be moved from support points of higher index l to support points of lower index j. For each pair of l and j, we repeat this shift of mass until no weighted centroid with \(\Vert c-s_l\Vert = \Vert c-s_j\Vert\) remains. Due to the decreasing l in the outer loop and the increasing j in the inner loop, \((d_1,\dots ,d_r)\) is transformed to be greedily lexicographically maximal, and the corresponding measure remains an approximate barycenter.

Next, we prove correctness of Steps 3 and 4. We show that \(\phi ({\bar{P}}_{\text {org}})\ge \phi ({\bar{P}}')\). Further, we show that for each constructed partial measure \({\bar{P}}^l\) there is a non-mass splitting transport to the \(P_i^l\), and that they combine to a \(\bar{P}'\) that allows for a non-mass splitting transport that is at least as good as an optimal transport for \({\bar{P}}_{\text {org}}\). Finally, we show \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\).

Recall that in Step 3, the mass of each \(s_l\) is spread out to a set of weighted centroids to obtain \({\bar{P}}^l\). Independently of how the \(x^l_{iq_i}\) are picked from the \(P^l_i\) for all \(i\le N\), their weighted centroid \(c=\sum _{i=1}^N \lambda _i x^l_{iq_i}\) satisfies \(\sum _{i=1}^N \lambda _i \Vert c-x_{iq_i}^l\Vert ^2 \le \sum _{i=1}^N \lambda _i \Vert s_l-x_{iq_i}^l\Vert ^2\). By construction of \({\bar{P}}'\) from the \({\bar{P}}^l\) (Step 4), this already implies \(\phi ({\bar{P}}') \le \phi ({\bar{P}}_{\text {org}})\). The algorithm started with a 2-approximation, and thus it is guaranteed to return a \({\bar{P}}'\) with \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\).

The existence of a non-mass splitting transport from \({\bar{P}}'\) to \(P_1,\dots ,P_N\), and the fact that this transport realizes the above bound, follows from two observations. First, each \({\bar{P}}^l\) itself allows for a non-mass splitting transport to the \(P_i^l\) by the lexicographically maximal choice of the \(x_{iq_i}^l\) in Step 3a): due to this choice, the first constructed weighted centroid c is lexicographically maximal among all weighted centroids that can be constructed from any \(x_{iq}^l\) in the \(P^l_i\). Further, by reducing the mass at each used support point by \(d_{\text {min}}\) in Step 3b), at least one of the \(d^l_{iq_i}\) becomes 0. The corresponding support point is removed from \(P^l_i\) (followed by some reindexing) and thus cannot be used for the construction of a weighted centroid in further iterations. Hence the second centroid constructed in the inner loop is lexicographically strictly smaller than the first one, and the same holds for all subsequent ones.

Second, any two partial measures \({\bar{P}}^{l_1}\), \({\bar{P}}^{l_2}\) from Step 3 satisfy \(\text {supp}({\bar{P}}^{l_1}) \cap \text {supp}({\bar{P}}^{l_2}) = \emptyset\) for \(l_1\ne l_2\), because of the earlier preprocessing in Step 2: weighted centroids that would be equally distant from both \(s_{l_1}\) and \(s_{l_2}\) cannot exist, because this would have caused a shift of mass to the lower index in Step 2 to create a lexicographically larger \((d_1,\dots ,d_r)\). Summing up, \({\bar{P}}'\) consists of a set of distinct support points, for which it is trivial to give a non-mass splitting transport to the \(P_i\) that is at least as good as an optimal transport for \(\bar{P}_{\text {org}}\): this transport just sends the whole mass of each support point in \({\bar{P}}'\) to the support points in the \(P_i\) that were used for its construction.

The removal of at least one support point from a \(P^l_i\) in Step 3b) implies that there are at most \(\sum _{i=1}^N |P^l_i| - N + 1\) runs of 3a) and 3b) to construct a \({\bar{P}}^l\): the 'go back to a)' statement is applied while \(d_l>0\), which is the case while there still is a support point in a \(P^l_i\) with mass on it. In the final run of Steps 3a) and 3b) for each \({\bar{P}}^l\), all the \(P^l_i\) have precisely one support point with the same mass left. This gives the claimed bound, and in particular \(|{\bar{P}}^l|\le \sum _{i=1}^N |P^l_i| - N + 1\).

Due to \(|P^l_i|\le |P_i|\) and \(|{\bar{P}}_\text {org}| \le \sum _{i=1}^N |P_i| -N + 1\), we obtain

$$\begin{aligned} |{\bar{P}}'| = \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} |{\bar{P}}^l| \le \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} \Big (\sum \limits _{i=1}^N |P^l_i| -N + 1\Big ) \le \sum \limits _{l=1}^{|{\bar{P}}_\text {org}|} \Big (\sum \limits _{i=1}^N |P_i| -N + 1\Big ) \le \Big (\sum _{i=1}^N |P_i| - N + 1\Big )^2. \end{aligned}$$

Thus \({\bar{P}}'\) satisfies all claimed properties. \(\square\)
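The spread-out of Step 3 analyzed above (pick one support point per measure, move the minimal available mass onto their weighted centroid, remove exhausted points) can be sketched in a few lines of Python. The data below are a hypothetical toy instance for a single \(s_l\), and `Fraction` stands in for the exact rational arithmetic of the proof:

```python
from fractions import Fraction as F

lam = [F(1, 2), F(1, 2)]                        # assumed weights, sum to 1
# Hypothetical partial measures P_i^l for one s_l: lists of (point, mass)
# pairs, all with the same total mass d_l.
P = [
    [((0, 0), F(1, 2)), ((1, 0), F(1, 2))],
    [((0, 1), F(1, 4)), ((1, 1), F(3, 4))],
]
d_l = sum(m for _, m in P[0])
bound = sum(len(Pi) for Pi in P) - len(P) + 1   # sum |P_i^l| - N + 1

centroids = []                                  # the partial measure \bar{P}^l
while all(P):
    # Step 3a): lexicographically maximal remaining point of each P_i^l
    picks = [max(range(len(Pi)), key=lambda q, Pi=Pi: Pi[q][0]) for Pi in P]
    pts = [P[i][picks[i]][0] for i in range(len(P))]
    c = tuple(sum(lam[i] * pts[i][k] for i in range(len(P)))
              for k in range(len(pts[0])))
    # Step 3b): move the minimal available mass d_min onto the centroid c
    d_min = min(P[i][picks[i]][1] for i in range(len(P)))
    centroids.append((c, d_min))
    for i in range(len(P)):
        pt, m = P[i][picks[i]]
        if m == d_min:
            P[i].pop(picks[i])                  # at least one point is removed
        else:
            P[i][picks[i]] = (pt, m - d_min)

assert sum(m for _, m in centroids) == d_l      # total mass d_l is preserved
assert len(centroids) <= bound                  # at most sum |P_i^l| - N + 1 runs
```

Each recorded pair \((c, d_{\min})\) receives its whole mass from exactly one support point in each \(P_i^l\), so the resulting transport is non-mass splitting by construction.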

Next, we prove that Algorithm 2 runs in strongly-polynomial time.

Theorem 4

For all rational input, a measure can be computed in strongly-polynomial time that is a 2-approximation of a barycenter and for which there is a non-mass splitting transport realizing this bound.

Proof

We consider the running time of each part of the algorithm. For readability, we say ‘polynomial’ in this proof in place of ‘strongly-polynomial’. We use ‘linear’ and ‘quadratic’ with respect to the bit size \(|{\mathcal {I}}|\) of the input \({\mathcal {I}}\). Note that N, the \(|P_i|\), and the dimension d are all bounded above by \(|{\mathcal {I}}|\).

In Step 1, the input for the subsequent steps is created. By sparsity of \({\bar{P}}_\text {org}\), \(r \le \sum _{i=1}^N |P_i| - N + 1\). For each of the r support points \(s_l\), N images \(P^l_i\) with \(|P^l_i|\le |P_i|\) are created. In the application of the stated rule, each \(y_{it_lk}\) has to be processed (at most) once. For each \(y_{it_lk}\), a single comparison and a fixed number of elementary operations suffices to update the support point and mass in \(P_i^l\). In total, data structures of polynomial size are created in polynomial time.

Step 2 is the preprocessing of \((d_1,\dots ,d_r)\) to be greedily lexicographically maximal. For each pair of support points \(s_l, s_j\) with \(j < l\), we perform the inner part of the loop. Finding \(q_i\) in 2a) can be done by considering each \(x^l_{iq} \in P_i^l\) exactly once and comparing the inner products \((s_j-s_l)^Tx^l_{iq}\); this is possible in linear time. The centroid c is then created through the scaling and summation of N rational d-dimensional vectors.

Step 2b) begins with the computation of \(c-s_j\) and \(c-s_l\), then computes \(\Vert c-s_j\Vert ^2=(c-s_j)^T(c-s_j)\) and \(\Vert c-s_l\Vert ^2=(c-s_l)^T(c-s_l)\), and then compares the two values. This is possible in quadratic time. Picking the minimal mass among the \(x_{iq_i}^l\) is possible in linear time, and so is updating the masses, performing the set operations on \(P_i^l\) and \(P_i^j\), and reindexing. In this update, \(|P^l_i|\) is reduced by at least one, so the 'go back to a)' statement is followed at most \(|P^l_i|\) times. Summing up, Step 2 runs in polynomial time.

Step 3 performs the spread-out of the r support points. Picking a lexicographically maximal support point \(x_{iq_i}^l\) in 3a) can be done by considering all support points in \(P_i^l\) once, keeping the current best and comparing each further support point to it with respect to lexicographic order. Identifying the lexicographic order of a pair of d-dimensional support points requires comparing at most all d of their coefficients. This is possible in linear time. Again, the centroid c is created through the scaling and summation of N rational d-dimensional vectors.

In 3b), we pick the minimal mass among the \(x_{iq_i}^l\) used for the construction of c, which can be done in linear time. The same holds for the update of masses, the set operations on \(P_i^l\), and the reindexing. By this update, one of the \(|P^l_i|\) is reduced by at least one, so the 'go back to a)' statement is followed not more than \(\sum _{i=1}^N |P^l_i|\) times; more precisely, there are at most \(\sum _{i=1}^N |P^l_i| - N + 1\) runs of 3a) and 3b) for each l. Summing up, the construction of each \({\bar{P}}^l\) runs in polynomial time, and so does the construction of all the \({\bar{P}}^l\).

In Step 4, the partial measures \({\bar{P}}^l\) are combined to obtain \({\bar{P}}'\). This is the construction of a measure with the appropriate mass put on at most \(|{\bar{P}}'|\le (\sum _{i=1}^N |P_i| - N + 1)^2\) support points. Each of these support points is just a copy of a support point in one of the \({\bar{P}}^l\). Thus, all steps run in polynomial time, which proves the claim. \(\square\)

Appendix 2: Proof of Theorem 5

Theorem 5

Algorithm 3 returns an approximate barycenter \({\bar{P}}'\) supported on a subset of S for which \(\phi ({\bar{P}}') \le 2 \cdot \phi ({\bar{P}})\), where \({\bar{P}}\) is a barycenter, and there is a non-mass splitting optimal transport realizing this bound. Further \(|{\bar{P}}'|\le \sum _{i=1}^N |P_i| - N + 1\).

Proof

First, recall that the output \({\bar{P}}'\) of Algorithm 2 (Step 2) always satisfies \(\text {supp}({\bar{P}}') \subset S\). Further, Algorithm 2 always returns a measure that has a corresponding non-mass splitting transport. As \({\bar{P}}_\text {org}\) from Algorithm 1 (Step 1) is not changed in the final run of Algorithm 2, the returned non-mass splitting transport is optimal. Further, recall that all approximate barycenters \({\bar{P}}_\text {org}\) computed in Step 1 have a support that satisfies \(|{\bar{P}}_\text {org}|\le \sum _{i=1}^N |P_i| - N + 1\). This transfers to the sparsity of \({\bar{P}}'\) returned by Algorithm 3.

It remains to prove termination of Algorithm 3 and the error bound. We will do so by showing that \(\phi ({\bar{P}}') < \phi ({\bar{P}}_\text {org})\) if \({\bar{P}}' \ne {\bar{P}}_\text {org}\) for \({\bar{P}}_\text {org},{\bar{P}}'\) from the same iteration. This leads to a strictly decreasing sequence of values \(\phi ({\bar{P}}')\) as long as the algorithm keeps running. The first approximate barycenter in this sequence already is a 2-approximation and it can only become better. This immediately gives \(\phi ({\bar{P}}') \le 2 \cdot \phi (\bar{P})\). At the end of each Step 2, we update \(S_\text {org}=\text {supp}({\bar{P}}')\subset S\) before going back to Step 1, where an exact optimum over this new support, a subset of S, is computed. Because of this, and the fact that there are only finitely many subsets of S, the sequence of values \(\phi ({\bar{P}}')\) is finite.

Now, it only remains to prove that \(\phi ({\bar{P}}') < \phi (\bar{P}_\text {org})\) if \({\bar{P}}' \ne {\bar{P}}_\text {org}\). We begin by considering Step 3 of Algorithm 2. Assume \(P^l_i\) consists of a single support point \(x^l_{i1}\) for all \(i\le N\). Then the unique barycenter \({\bar{P}}^l\) of the \(P^l_i\) is the weighted centroid \(c = \sum _{i=1}^N \lambda _i x^l_{i1}\) and the cost of transport from \({\bar{P}}^l\) to all the \(P^l_i\) is \(\phi ({\bar{P}}^l) = d_l\cdot \sum _{i=1}^N\lambda _i \Vert c-x^l_{i1}\Vert ^2\). For all \(s\ne c\), in particular for \(s=s_l\), we get

$$\begin{aligned} \phi ({\bar{P}}^l) = d_l\cdot \sum \limits _{i=1}^N\lambda _i \Vert c-x^l_{i1}\Vert ^2 < d_l\cdot \sum \limits _{i=1}^N\lambda _i \Vert s-x^l_{i1}\Vert ^2. \end{aligned}$$
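The strict inequality for \(s\ne c\) is again the centroid decomposition: the cost at \(s\) exceeds the cost at \(c\) by exactly \(\Vert s-c\Vert ^2\). A quick numerical sanity check on arbitrary made-up data:

```python
import random

random.seed(1)
N, d = 3, 2
lam = [1 / N] * N                              # uniform weights, sum to 1
xs = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
# weighted centroid of the single support points x_{i1}^l
c = [sum(lam[i] * xs[i][k] for i in range(N)) for k in range(d)]

def cost(s):
    # transport cost per unit mass from s to the x_{i1}^l
    return sum(lam[i] * sum((s[k] - xs[i][k]) ** 2 for k in range(d))
               for i in range(N))

s_l = [c[0] + 0.3, c[1] - 0.2]                 # any s_l != c
assert cost(c) < cost(s_l)                     # the centroid is strictly better
gap = sum((s_l[k] - c[k]) ** 2 for k in range(d))
assert abs(cost(s_l) - cost(c) - gap) < 1e-12  # gap equals ||s_l - c||^2
```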

If some of the \(P^l_i\) consist of more than one support point, Step 3 selects exactly one support point \(x_{iq}^l\) from each measure \(P^l_i\), forms their weighted centroid c with corresponding mass \(d_c=d_{\text {min}}\), and adds it to \(\text {supp}({\bar{P}}^l)\). This scheme is then repeated for the remaining support points and remaining mass. Thus \({\bar{P}}^l\) is constructed as a set of weighted centroids c of the support points \(x^l_{iq}\) to which these centroids transport. Each of them satisfies \(d_c\cdot \sum _{i=1}^N\lambda _i \Vert c-x^l_{iq}\Vert ^2 \le d_c\cdot \sum _{i=1}^N\lambda _i \Vert s_l-x^l_{iq}\Vert ^2\). By summing over all c that are constructed, one obtains

$$\begin{aligned} \phi ({\bar{P}}^l)\le \sum _{i=1}^N\lambda _i \sum _{q=1}^{|P_i^l|} d^l_{iq} \cdot \Vert s_l-x_{iq}^l\Vert ^2. \end{aligned}$$

Informally, it is at least as costly to transport to the measures \(P^l_i\) from the support point \(s_l\) as from the set of weighted centroids (with appropriate masses) constituting \({\bar{P}}^l\). Equality in the above can only hold if the single support point \(s_l\) itself already is the weighted centroid of single-support point measures \(P^l_1,\dots ,P^l_N\). But this means that Step 3 of Algorithm 2 just copies \(s_l\) with mass \(d_l\) to \({\bar{P}}^l\). The algorithm stops when \({\bar{P}}'={\bar{P}}_\text {org}\). By \(\phi (\bar{P}')= \sum _{l=1}^r \phi ({\bar{P}}^l)\), this means all \(s_l\) have to satisfy \(\phi ({\bar{P}}^l)= \sum _{i=1}^N\lambda _i \sum _{q=1}^{|P_i^l|} d^l_{iq} \cdot \Vert s_l-x_{iq}^l\Vert ^2.\) So all \(s_l\) are already the weighted centroids of their single-support measures \(P^l_i\).

Further, note that when a shift of mass from \(s_l\) to \(s_j\) with \(j<l\) happens in Step 2 of Algorithm 2, then Step 3 is guaranteed to find a strictly better transport than before: there exists a set of support points that, before the shift, receive transport from \(s_l\), but have a weighted centroid \(c\ne s_l\). Such a set of support points would be moved from \(P^l_i\) to \(P^j_i\) (and at least one of the support points was not associated to \(s_j\) before). Then \(s_j\) is guaranteed to split mass and, in the following Step 3, the cost of transport is strictly improved; see above.

Thus \(\phi ({\bar{P}}') < \phi ({\bar{P}}_\text {org})\) if \({\bar{P}}' \ne \bar{P}_\text {org}\) and Algorithm 3 terminates with \({\bar{P}}' = {\bar{P}}_\text {org}\) in the final iteration. \(\square\)


Cite this article

Borgwardt, S. An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein barycenters. Oper Res Int J 22, 1511–1551 (2022). https://doi.org/10.1007/s12351-020-00589-z
