Abstract
Most common optimal transport (OT) solvers are currently based on an approximation of underlying measures by discrete measures. However, it is sometimes relevant to work only with moments of measures instead of the measure itself, and many common OT problems can be formulated as moment problems (the most relevant examples being \(L^p\)-Wasserstein distances, barycenters, and Gromov–Wasserstein discrepancies on Euclidean spaces). We leverage this fact to develop a generalized moment formulation that covers these classes of OT problems. The transport plan is represented through its moments on a given basis, and the marginal constraints are expressed in terms of moment constraints. A practical computation then consists in considering a truncation of the involved moment sequences up to a certain order, and using the polynomial sums-of-squares hierarchy for measures supported on semi-algebraic sets. We prove that the strategy converges to the solution of the OT problem as the order increases. We also show how to approximate linear quantities of interest, and how to estimate the support of the optimal transport map from the computed moments using Christoffel–Darboux kernels. Numerical experiments illustrate the good behavior of the approach.
1 Introduction
Optimal transport provides a principled and versatile approach to work with probability distributions. In recent years, an increasing number of theoretical results have been leveraged to build numerical solvers, which by now play a fundamental role in numerous applications ranging from economics [1, 2] and quantum chemistry [3, 4] to gradient flow modeling [5] and machine learning [6].
The prototypical example is the two-marginal Monge–Kantorovich problem: given two Borel sets \(\mathcal {X}_1\subset \mathbb {R}^{n_1}\) and \(\mathcal {X}_2\subset \mathbb {R}^{n_2}\) and two probability measures \(\mu \in \mathcal {P}(\mathcal {X}_1)\) and \(\nu \in \mathcal {P}(\mathcal {X}_2)\), solve
$$\begin{aligned} \inf _{\pi \in \Pi (\mathcal {X}_1 \times \mathcal {X}_2; \mu , \nu )} \int _{\mathcal {X}_1\times \mathcal {X}_2} c(\textbf{x}_1,\textbf{x}_2) \,\mathrm {d}\pi (\textbf{x}_1,\textbf{x}_2), \end{aligned}$$
(1.1)
where \(c:\mathcal {X}_1\times \mathcal {X}_2\rightarrow \mathbb {R}^+\) is a lower semi-continuous cost function. The infimum runs over the set \( \Pi (\mathcal {X}_1 \times \mathcal {X}_2; \mu , \nu ) \) of coupling measures (usually called transport plans) on \(\mathcal {X}_1\times \mathcal {X}_2\) with marginal distributions equal to \(\mu \) and \(\nu \) respectively. More generally, the problem can be posed on Polish spaces. It can also be multi-marginal, as we introduce later on in Sect. 3.
Numerous methods have been introduced for solving such problems in practice. Many algorithms rely on an approximation of measures by discrete measures (by sampling or discretization on discrete grids), whose numerical cost quickly becomes prohibitive as the number of discretization points increases. Among the existing strategies to mitigate this effect, probably the most popular one is based on adding an entropic regularization to the loss function, the regularized problem being solved with the Sinkhorn algorithm [6,7,8]. The algorithm is however still posed on a grid. Other approaches involving discrete grids are the auction algorithm [9], numerical methods based on Laguerre cells [10], multiscale algorithms [11, 12] and methods based on dynamic formulations [13]. Recently, an approach that dynamically discovers sampling points where the support of the solution measure lies has been introduced in [14]. It shows promising results for addressing the curse of dimensionality when the optimal transport plans have sparse support.
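As an aside, the entropic regularization strategy mentioned above can be illustrated with a minimal Sinkhorn iteration on a one-dimensional grid. This is a toy sketch with measures, grid, and regularization parameter of our own choosing, not the moment-based method developed in this paper:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=5000):
    """Entropically regularized OT between discrete marginals a and b.

    a, b: marginal weight vectors; C: cost matrix; eps: regularization.
    Returns the transport plan diag(u) K diag(v).
    """
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # scale to match second marginal
        u = a / (K @ v)                   # scale to match first marginal
    return u[:, None] * K * v[None, :]

# Two discretized Gaussian-like measures on a grid of [0, 1]
x = np.linspace(0.0, 1.0, 50)
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost

P = sinkhorn(a, b, C)
print(float(np.abs(P.sum(axis=0) - b).max()))  # residual on 2nd marginal (small)
```

Note that the first marginal is matched exactly after the final scaling step, while the second is matched up to the (fast-decaying) Sinkhorn residual.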
The present paper considers an entirely different avenue, in which the spatial discretization is replaced by a spectral one: the transport plan is represented through its moments on a given basis, and the marginal constraints are expressed in terms of moment constraints. A practical computation then consists in considering a truncation of the involved moment sequences up to a certain order. The procedure needs to be well posed in the sense that one needs to guarantee convergence to the original problem as the truncation order increases.
Among the applications in which working with moments of the measures is particularly relevant, we can first mention uncertainty quantification and sensitivity analysis [15, 16], where first moments of probability distributions are of central interest. First moments of measures can also be used for optimal design of experiments for polynomial regression [17]. Relevant geometrical and topological information on the support of a measure can also be efficiently captured by its first moments, which opens the way to many applications in data analysis [18, 19]. We may also mention problems involving partial differential equations (PDEs): equations from quantum chemistry, Fokker–Planck and kinetic equations involve probability distributions as their unknowns, and to reduce computational costs one often restricts oneself to characterizing some of their moments (among the many works following this approach, we may mention [20,21,22]). Solving hyperbolic PDEs with moment approaches is also currently an active field of research (see [23, 24]). The field of stochastic homogenization also involves estimating moments of the PDE solution instead of the solution itself (see [25, 26]). For all these applications, one may need to compare or combine the underlying distributions using only information on the moments, and hence to solve optimal transport problems in terms of moments.
The moment approach to solve optimal transport problems is not entirely novel, but it has only been explored in a few prior works. The idea was originally mentioned in [27] for the case of a polynomial basis and polynomial cost function. The moment approach was also recently explored in [28] for applications related to image processing involving trigonometric polynomial bases. It has also been used for solving the Monge problem [29], where an approximation of the transport map is constructed by solving a moment matrix completion problem and an approximation method based on the Christoffel–Darboux kernel [30]. A relatively different contribution can be found in [31], where the authors relax marginal constraints into a set of moment constraints, but there the moments do not come from a prescribed basis. Instead they are selected from a given dictionary involving potentially general test functions. Last but not least, the idea of leveraging moment formulations and sums-of-squares (SoS) has also been used for the dual formulation of problem (1.1) in order to derive statistical estimation bounds for high dimensional OT problems (see [32, 33]).
In view of the present state of the art, the main contribution of this paper is to provide a general moment problem formulation of most common OT problems with polynomial or piecewise polynomial costs. In particular, the problem of estimating \(L^p\)-Wasserstein distances for \(p\ge 1\), barycenters, and Gromov–Wasserstein discrepancies on Euclidean spaces will be covered by our framework. We prove that the resulting sequence of optimal solutions converges to the whole moment sequence of the original OT measure as the polynomial order increases. The case of piecewise polynomial costs is addressed by a reformulation in terms of conditional measures.
For practical computations, we can directly apply the moment-SoS hierarchy as in [27, 28], which eventually boils down to solving semidefinite programming problems. It is worth emphasizing that by switching the point of view to a moment problem, we do not recover the measure itself. Instead, the resulting outputs are moments of the optimal transport plan. Depending on the application, this may of course be a limitation. However, we show that it is possible to recover linear quantities of interest, and also the support of the measure, by a post-processing algorithm based on Christoffel–Darboux kernels. Our numerical examples show that even the support of concentrated measures can be efficiently estimated with relatively low polynomial order. This feature seems particularly appealing. It could for instance be leveraged to recover optimal transport plans from high-dimensional OT problems: the support estimation could provide well-chosen sampling points to grid-based approaches. A full development of these ideas will be presented in a forthcoming work.
The paper is organized as follows. After introducing some basic notation in Sect. 2, we define optimal transport problems in Sect. 3. We prove that when the involved cost function is a polynomial or a piecewise polynomial, the problem can be interpreted as a generalized moment problem. This section also introduces the basic principles of our approach to OT problems and important results from real algebraic geometry. In Sects. 4 and 5, we consider optimal transport problems that play a crucial role in numerous application areas, and prove that they can be expressed as generalized moment problems. More precisely, we consider in Sect. 4 the problems of computing \(L^p\)-Wasserstein distances and barycenters for \(p\ge 1\), and in Sect. 5 the problems of computing Gromov–Wasserstein discrepancies and corresponding barycenters. In Sect. 6, we formulate a generalized moment problem that includes all previous OT problems. We derive a solution strategy based on the moment-SoS (or Lasserre’s) hierarchy, and we prove its convergence to the solution of the OT problem. Section 7 explains how to post-process the moments to estimate linear quantities of interest and the support of the measure. Section 8 illustrates the potential of the approach by giving numerical results on estimating the \(L^1\) and \(L^2\) Wasserstein distances, barycenters, and the \(L^2\) Gromov–Wasserstein discrepancy.
2 Some elements of notation
In the following, \(\mathbb {N}\) should be understood as the set of non-negative integers (including zero). Vectors \(\textbf{x}\) from the Euclidean space \(\mathbb {R}^n\) will be denoted with bold notation. The coordinates \(\textbf{x}=(x_1,\dots , x_n)^T\) will be written with plain text. The canonical vectors will be denoted as \(\textbf{e}_i = (0,\dots , 0, 1,0,\dots , 0)^T\) for \(i\in \{1,\dots ,n\}\). For any \(p\in \mathbb {N}^*:=\mathbb {N}{\setminus }\{0\}\), \( \Vert \textbf{x}\Vert _p :=\left( \sum _{i=1}^n \vert x_i\vert ^p \right) ^{1/p} \) denotes the \(\ell ^p(\mathbb {R}^n)\) norm of \(\textbf{x}\). We let \(\mathbb {R}[\textbf{x}]\) be the space of real polynomials over \(\mathbb {R}^n\). For any multi-index \(\varvec{\alpha }=(\alpha _1,\dots , \alpha _n)^T\in \mathbb {N}^n\) with length \(\vert {\varvec{\alpha }}\vert = \sum _{i=1}^n \alpha _i\), we define the associated monomial \( \textbf{x}^{{\varvec{\alpha }}} = \prod _{i=1}^n x_i^{\alpha _i} \) of degree \(\vert {\varvec{\alpha }}\vert \). We let \(\mathbb {N}_{r}^n = \{{\varvec{\alpha }}\in \mathbb {N}^n: \vert {\varvec{\alpha }}\vert \le r\},\) and \(\mathbb {R}[\textbf{x}]_r\) be the space of real polynomials of degree at most r that can be written \(\sum _{{\varvec{\alpha }}\in \mathbb {N}_r^n} c_{\varvec{\alpha }}\textbf{x}^{\varvec{\alpha }}\) for some real coefficients \(c_{\varvec{\alpha }}\).
For any Borel set \(\mathcal {X}\) in \(\mathbb {R}^n\), we denote \(\mathcal {M}(\mathcal {X})\) the space of finite signed Borel measures supported on \(\mathcal {X}\),
$$\begin{aligned} \mathcal {M}(\mathcal {X})_+ = \{\mu \in \mathcal {M}(\mathcal {X}): \mu \ge 0\} \end{aligned}$$
its positive cone of finite Borel measures supported on \(\mathcal {X}\), and
$$\begin{aligned} \mathcal {P}(\mathcal {X}) = \{\mu \in \mathcal {M}(\mathcal {X})_+: \mu (\mathcal {X}) = 1\} \end{aligned}$$
the set of probability measures supported on \(\mathcal {X}\). The indicator function of a subset \(A\subset \mathbb {R}^n\) is denoted as \(\mathbbm {1}_A\). For any Borel set \(\mathcal {X}\) in \(\mathbb {R}^n\) and any measure \(\mu \in \mathcal {M}(\mathcal {X})\),
$$\begin{aligned} m_{{\varvec{\alpha }}}(\mu ) = \int _{\mathcal {X}} \textbf{x}^{{\varvec{\alpha }}} \,\mathrm {d}\mu (\textbf{x}) \end{aligned}$$
is the moment of \(\mu \) associated to the multi-index \({\varvec{\alpha }}\in \mathbb {N}^n\), and
$$\begin{aligned} m(\mu ) = (m_{{\varvec{\alpha }}}(\mu ))_{{\varvec{\alpha }}\in \mathbb {N}^n} \end{aligned}$$
is the sequence of moments of \(\mu \). The mass of \(\mu \) is denoted \(\textrm{mass}(\mu ) = m_0(\mu ).\) A measure \(\mu \) is said to be determinate if it is uniquely determined by its moment sequence \(m(\mu )\).
Finally, for a given sequence \(y = (y_{\varvec{\alpha }})_{{\varvec{\alpha }}\in \mathbb {N}^n}\), we introduce the Riesz functional \(\ell _y: \mathbb {R}[\textbf{x}] \rightarrow \mathbb {R}\) which associates to a real polynomial \(g(\textbf{x}) = \sum _{{\varvec{\alpha }}\in \mathbb {N}^n} a_{\varvec{\alpha }}\textbf{x}^{\varvec{\alpha }}\) the value \(\ell _y(g) = \sum _{{\varvec{\alpha }}\in \mathbb {N}^n} a_{\varvec{\alpha }}y_{\varvec{\alpha }}\). For any measure \(\mu \), we thus have
$$\begin{aligned} \ell _{m(\mu )}(g) = \int _{\mathcal {X}} g(\textbf{x}) \,\mathrm {d}\mu (\textbf{x}), \quad \forall g \in \mathbb {R}[\textbf{x}]. \end{aligned}$$
(2.1)
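To make the Riesz functional concrete, here is a small toy check of our own in dimension \(n=1\): for the Lebesgue measure on [0, 1], whose moments are \(1/(k+1)\), applying \(\ell _y\) to a polynomial recovers its integral, as in (2.1):

```python
# Toy illustration of the Riesz functional in dimension n = 1:
# for the Lebesgue measure mu on [0, 1], the moments are
# m_k(mu) = int_0^1 x^k dx = 1 / (k + 1), and the Riesz functional
# applied to a polynomial g recovers the integral of g against mu.
def riesz(y, coeffs):
    """l_y(g) for g(x) = sum_k coeffs[k] * x^k."""
    return sum(c * y[k] for k, c in enumerate(coeffs))

y = [1.0 / (k + 1) for k in range(10)]   # moment sequence of Lebesgue on [0,1]
g = [1.0, 2.0, 1.0]                      # g(x) = (x + 1)^2 = 1 + 2x + x^2

# l_y(g) should equal int_0^1 (x + 1)^2 dx = (2^3 - 1)/3 = 7/3
val = riesz(y, g)
print(val)                                # 2.333...
```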
3 Optimal transport problems with polynomial costs
3.1 Formulation
To guide the subsequent discussion, we consider the multi-marginal version of problem (1.1) as a prototypical example of an OT problem. This problem consists in considering K probability measures \(\mu _i\in \mathcal {P}(\mathcal {X}_i)\) defined on Borel sets \(\mathcal {X}_i\subset \mathbb {R}^{n_i}\) for all \(i\in \{1, \dots , K\}\), and solving
$$\begin{aligned} \rho = \inf _{\pi \in \Pi } \mathcal {L}(\pi ). \end{aligned}$$
(3.1)
The loss function is of the form
$$\begin{aligned} \mathcal {L}(\pi ) = \int _{\mathcal {X}} c(\textbf{x}) \,\mathrm {d}\pi (\textbf{x}), \end{aligned}$$
(3.2)
and the set \(\mathcal {X}\) is defined as the product set
$$\begin{aligned} \mathcal {X}= \mathcal {X}_1 \times \dots \times \mathcal {X}_K. \end{aligned}$$
Note that \(\mathcal {X}\) can be identified with a subset of \(\mathbb {R}^n\), with
$$\begin{aligned} n = \sum _{i=1}^K n_i. \end{aligned}$$
The function \(c: \mathcal {X}\rightarrow \mathbb {R}\) is a given cost function, and the constraint \(\Pi \) is a shorthand notation for the set of coupling measures having \(\mu _i\) as marginals, namely
$$\begin{aligned} \Pi = \{\pi \in \mathcal {P}(\mathcal {X}): \textrm{proj}_i \# \pi = \mu _i, \ \forall i\in \{1,\dots ,K\}\}, \end{aligned}$$
where \(\textrm{proj}_i: \mathcal {X}\rightarrow \mathcal {X}_i\) denotes the canonical projection, and the push-forward measure \(\textrm{proj}_i \# \pi \) is the i-th marginal of \(\pi \).
The existence of a minimizer for (3.1) is standard in OT theory. Indeed, \(\Pi \) is trivially nonempty since the coupling \(\otimes _{i=1}^K \mu _i\) belongs to this set. The set \(\Pi \) is convex and compact for the weak-\(*\) topology thanks to the imposed marginals, and if the cost function c is lower semi-continuous (l.s.c.), then the loss function \(\mathcal {L}:\pi \mapsto \int c \, \mathrm d\pi \) is l.s.c. with respect to the weak-\(*\) topology. Hence the existence of a minimizer is guaranteed under a very weak hypothesis on the cost function c, such as lower semi-continuity.
We next show that when the loss function is of polynomial nature, problem (3.1) is equivalent to a moment problem under some conditions, as initially observed in [27] (see also [34, Section 7.3]). To see this, note first of all that c is a polynomial from \(\mathbb {R}[\textbf{x}]\). It therefore follows from (2.1) that the loss function (3.2) satisfies
$$\begin{aligned} \mathcal {L}(\pi ) = \ell _{m(\pi )}(c) =: L(m(\pi )). \end{aligned}$$
(3.3)
In addition, the marginal constraints on \(\pi \in \Pi \) in problem (3.1) imply constraints on the moments of \(\pi \),
$$\begin{aligned} m_{(0,\dots ,0,{\varvec{\beta }},0,\dots ,0)}(\pi ) = m_{{\varvec{\beta }}}(\mu _i),\quad \forall {\varvec{\beta }}\in \mathbb {N}^{n_i}, \ \forall i\in \{1,\dots ,K\}, \end{aligned}$$
(3.4)
where the block \({\varvec{\beta }}\) occupies the i-th group of \(n_i\) indices.
As a result of (3.3) and (3.4), instead of considering the OT problem (3.1) where we search for an unknown measure \(\pi \in \Pi \), one can alternatively consider the moment problem of searching for the optimal sequence \(y = (y_{\varvec{\alpha }})_{{\varvec{\alpha }}\in \mathbb {N}^n}\) solving
$$\begin{aligned} \rho _{mom} = \inf _{y \in \Pi _{mom}} L(y), \end{aligned}$$
(3.5)
where \(\Pi _{mom}:= \Pi _{mom}(\mathcal {X}; m(\mu _1), \ldots , m(\mu _K))\) is the set of sequences in \(\mathbb {R}^{\mathbb {N}^{n}}\) satisfying the following constraints:
(i) Marginal conditions: the sequence \(y\) should satisfy
$$\begin{aligned} y_{(0,\dots , 0, {\varvec{\beta }}, 0,\dots ,0)} = m_{{\varvec{\beta }}}(\mu _i),\quad \forall {\varvec{\beta }}\in \mathbb {N}^{n_i}, \text { and } \forall i\in \{1,\dots , K\} \end{aligned}$$
(ii)
Moment sequence condition: the sequence y must have a representing measure supported on \(\mathcal {X}\), that is, there must exist a measure \(\pi \in \mathcal {M}(\mathcal {X})_+\) such that \(y = m(\pi )\). We write this condition as \(y\in \mathrm {MS(\mathcal {X})}.\)
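The marginal conditions (i) can be checked numerically on a toy example. The sketch below (our own data, with \(K=2\) and one-dimensional marginals) builds the product coupling of two discrete measures and verifies that the moments of the coupling with one index block set to zero reproduce the marginal moments:

```python
import numpy as np

# Numerical check of the marginal conditions (i): for a coupling pi of
# mu and nu (here, discrete measures on the real line, so K = 2 and
# n_1 = n_2 = 1), the moments y_{(b,0)} of pi equal m_b(mu), and the
# moments y_{(0,b)} equal m_b(nu).  Toy data, for illustration only.
x = np.array([0.1, 0.5, 0.9])            # support of mu
w = np.array([0.2, 0.5, 0.3])            # weights of mu
z = np.array([0.2, 0.8])                 # support of nu
v = np.array([0.6, 0.4])                 # weights of nu

P = np.outer(w, v)                       # product coupling of mu and nu

def moment_pi(a, b):                     # y_{(a,b)} = int x^a z^b dpi
    return float((P * np.outer(x ** a, z ** b)).sum())

for b in range(5):
    m_mu = float((w * x ** b).sum())     # m_b(mu)
    m_nu = float((v * z ** b).sum())     # m_b(nu)
    assert abs(moment_pi(b, 0) - m_mu) < 1e-12
    assert abs(moment_pi(0, b) - m_nu) < 1e-12
print("marginal moment conditions hold")
```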
The equivalence between problems (3.1) and (3.5) is closely related to the determinacy of measures \(\mu _i\), for which a sufficient condition is that the sets \(\mathcal {X}_i\) are compact. We summarize these facts in the theorem below.
Theorem 3.1
(Polynomial cost) If the sets \(\{\mathcal {X}_i\}_{i=1}^K\) are compact, then the OT problem (3.1) with polynomial cost is equivalent to the generalized moment problem (3.5):
- A minimizer \(\pi ^*\) of problem (3.1) is such that \(m(\pi ^*)\) is a minimizer of problem (3.5).
- A minimizer \(y^*\) of problem (3.5) has a representing measure which is a solution of (3.1).
In addition, if the solution \(\pi ^*\) of the OT problem (3.1) is unique, then the solution \(y^*\) of (3.5) is unique and \(y^* = m(\pi ^*).\)
Proof
Suppose \(\pi ^*\) is a solution of problem (3.1). Since it is a Borel measure supported on the compact set \(\mathcal {X}\), it is determinate, so it is uniquely characterized by its moment sequence \(y = m(\pi ^*)\). This sequence y is clearly in the set \( \Pi _{mom}\). Therefore, as a feasible solution for (3.5), it satisfies \( \rho _{mom} \le L(y) = \mathcal {L}(\pi ^*) = \rho \). Conversely, let \(y^*\) be a solution to problem (3.5). Since \(y^*\in \text {MS}(\mathcal {X})\), there is a representing measure \(\pi \) such that \(y^* = m(\pi )\). For \(\pi \) to belong to the feasible set \(\Pi \) of problem (3.1), the marginal conditions on \(m(\pi )\) should imply that the marginals of \(\pi \) are the \(\mu _i\). This is satisfied given that the marginal measures \(\mu _i\) are determinate because \(\mathcal {X}_i\) is compact. Therefore \(\pi \in \Pi \) and \( \rho \le \mathcal {L}(\pi ) = L(m(\pi )) = L(y^*) = \rho _{mom}\). This proves that \(\rho = \rho _{mom}\), and that \(m(\pi ^*)\) is a solution of problem (3.5) if and only if \(\pi ^*\) is a solution of (3.1). \(\square \)
Remark 3.2
Since the \(\mu _i\) are probability measures, \(y \in \Pi _{mom}\) is such that \(y_{(0,\ldots ,0)} = 1\) and a representing measure \(\pi \in \mathcal {M}(\mathcal {X})_+\) such that \(y = m(\pi )\) has mass 1, i.e. \(\pi \in \mathcal {P}(\mathcal {X}).\)
3.2 The moment sequence condition
In this section, we discuss how the moment sequence condition \(y \in MS(\mathcal {X})\) is translated into mathematical terms. This question is in fact directly related to the so-called moment problem, which studies the following question: Given a Borel subset \(\mathcal {X}\subseteq \mathbb {R}^n\) and a sequence of real numbers \(y=(y_{{\varvec{\alpha }}})_{{\varvec{\alpha }}\in \mathbb {N}^n}\), what are the conditions on y under which we can guarantee that \(y = m(\pi )\) for some positive measure \(\pi \in \mathcal {M}(\mathcal {X})_+\)? For the one-dimensional case (\(n=1\)), this classical problem is well understood and dates back to contributions by Markov, Stieltjes, Hausdorff, and Hamburger. Explicit conditions on y exist, and they are all stated in terms of positive semi-definiteness of certain Hankel matrices. Much less is known for the multidimensional case (\(n>1\)). A general result is given by the Riesz–Haviland theorem, which states that a sequence y has an associated Borel measure \(\pi \) such that \(y=m(\pi )\) if and only if \(\ell _y(f)\ge 0\) for all polynomials \(f \in \mathbb {R}[\textbf{x}] \) nonnegative on \(\mathcal {X}\). This theorem is not really useful if we do not have an explicit characterization of polynomials that are nonnegative on \(\mathcal {X}\) (a so-called Positivstellensatz). Such a characterization has been provided by Schmüdgen in [35] when the ambient space \(\mathcal {X}\) is a compact basic semi-algebraic set of the form
$$\begin{aligned} \mathcal {X}= \{\textbf{x}\in \mathbb {R}^n: g_j(\textbf{x}) \ge 0, \ j=1,\dots ,J\} \end{aligned}$$
(3.6)
for some polynomials \(g_j \in \mathbb {R}[\textbf{x}]\). In reference [35], it is proven that a sequence y has a representing Borel measure supported on \(\mathcal {X}\) (i.e. satisfies the moment sequence condition) if and only if it satisfies
$$\begin{aligned} \ell _y(f^2 g_I) \ge 0, \quad \forall f \in \mathbb {R}[\textbf{x}], \ \forall I \subseteq \{1,\dots ,J\}, \end{aligned}$$
(3.7)
where \(g_I = \prod _{j\in I} g_j\) and where we have used the convention \(g_\emptyset =1\).
For a polynomial \(g(\textbf{x}) = \sum _{{\varvec{\gamma }}\in \mathbb {N}^n} c_{\varvec{\gamma }}\textbf{x}^{\varvec{\gamma }}\in \mathbb {R}[\textbf{x}]\) and \(r\in \mathbb {N}\), we let \(\textbf{M}_r(g y) \in \mathbb {R}^{\mathbb {N}^n_r\times \mathbb {N}^n_r}\) be the matrix with entries
$$\begin{aligned} (\textbf{M}_r(g y))_{{\varvec{\alpha }},{\varvec{\beta }}} = \ell _y(g\, \textbf{x}^{{\varvec{\alpha }}+{\varvec{\beta }}}) = \sum _{{\varvec{\gamma }}\in \mathbb {N}^n} c_{\varvec{\gamma }}\, y_{{\varvec{\gamma }}+{\varvec{\alpha }}+{\varvec{\beta }}}, \quad {\varvec{\alpha }},{\varvec{\beta }}\in \mathbb {N}^n_r, \end{aligned}$$
which is such that for any polynomial \(f \in \mathbb {R}[\textbf{x}]_r\) of degree at most r with coefficient vector \(\textbf{f} = (f_{\varvec{\alpha }})_{{\varvec{\alpha }}\in \mathbb {N}^n_r}\), we have
$$\begin{aligned} \textbf{f}^T \textbf{M}_r(g y)\, \textbf{f} = \ell _y(g f^2). \end{aligned}$$
Therefore, the moment sequence condition (3.7) is equivalent to
$$\begin{aligned} \textbf{M}_r(g_I\, y) \succcurlyeq 0, \quad \forall I \subseteq \{1,\dots ,J\}, \ \forall r \in \mathbb {N}, \end{aligned}$$
(3.8)
where for a symmetric matrix \(\textbf{M}\), \(\textbf{M}\succcurlyeq 0\) means that \(\textbf{M}\) is positive semi-definite. A simpler characterization has been given by Putinar in [36] under the following additional assumption.
Assumption 3.3
There exists a polynomial u of the form \(u = u_0 + \sum _{j=1}^J u_j g_j\), where the \(u_j\) are sums of squares (SoS) polynomials, and such that \(\{\textbf{x}\in \mathbb {R}^n: u(\textbf{x}) \ge 0\}\) is compact.
Under Assumption 3.3, it is proven in [36] that y has a representing Borel measure supported on \(\mathcal {X}\) if and only if
$$\begin{aligned} \textbf{M}_r(g_j\, y) \succcurlyeq 0, \quad \forall j \in \{0,1,\dots ,J\}, \ \forall r \in \mathbb {N}, \end{aligned}$$
(3.9)
where we have used the convention \(g_0 = 1\).
The linear positive semidefinite constraints (3.8) or (3.9) are exactly the moment sequence condition. We summarize the above results in the following theorem.
Theorem 3.4
(Th. 3.8 in [34]) Let \(\mathcal {X}\) be a basic semi-algebraic set as in (3.6). A sequence \(y \in \mathbb {R}^{\mathbb {N}^n}\) satisfies \(y\in MS(\mathcal {X})\) (i.e. satisfies the moment sequence condition on \(\mathcal {X}\)) if and only if it satisfies the positive semidefinite constraints (3.8), or the positive semidefinite constraints (3.9) under the additional Assumption 3.3.
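A small numerical illustration of Theorem 3.4 in dimension one, under our own toy choices (\(\mathcal {X}= [0,1]\) described by \(g_1(x) = x(1-x)\), and the Lebesgue moment sequence): the moment and localizing Hankel matrices are positive semidefinite, as the theorem requires:

```python
import numpy as np

# Illustration of Theorem 3.4 for n = 1 (our own sketch): take
# X = [0, 1] = {x : g_1(x) >= 0} with g_1(x) = x(1 - x), and the moment
# sequence y of the Lebesgue measure on [0, 1], y_k = 1/(k + 1).  Both
# the moment matrix M_r(y) and the localizing matrix M_r(g_1 y) must be
# positive semidefinite.
r = 4
y = np.array([1.0 / (k + 1) for k in range(2 * r + 3)])

# M_r(y)[i, j] = y_{i + j}  (Hankel structure; here a Hilbert matrix)
M = np.array([[y[i + j] for j in range(r + 1)] for i in range(r + 1)])

# g_1(x) = x - x^2, so M_r(g_1 y)[i, j] = y_{i+j+1} - y_{i+j+2}
L = np.array([[y[i + j + 1] - y[i + j + 2] for j in range(r + 1)]
              for i in range(r + 1)])

print(float(np.linalg.eigvalsh(M).min()))   # nonnegative up to rounding
print(float(np.linalg.eigvalsh(L).min()))   # nonnegative up to rounding
```

Conversely, perturbing the sequence (e.g. setting a moment to a value incompatible with any positive measure) would make one of these matrices indefinite.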
In our context, since \(\mathcal {X}\) is the product set \(\mathcal {X}_1\times \dots \times \mathcal {X}_K\), we assume that each \(\mathcal {X}_i\) is a compact basic semi-algebraic set defined as
$$\begin{aligned} \mathcal {X}_i = \{\textbf{x}_i \in \mathbb {R}^{n_i}: g^{(i)}_j(\textbf{x}_i) \ge 0, \ j=1,\dots ,J_i\}. \end{aligned}$$
Then \(\mathcal {X}\) is a basic semi-algebraic set defined as in (3.6) with \(J=\sum _{i=1}^K J_i\) and functions \(\{g_j\}_{j=1}^J\) obtained by collecting the functions \(\textbf{x}\mapsto g^{(i)}_j(\textbf{x}_i)\) for \(1\le j\le J_i\) and \(1\le i\le K\).
Remark 3.5
(About Assumption 3.3) Assumption 3.3 is trivially satisfied if \(g_1(\textbf{x}) = R - \Vert \textbf{x} \Vert _2^2 \) for some positive R. Since for any compact semi-algebraic set \(\mathcal {X}\), there exists a sufficiently large R such that \(\mathcal {X}\subset \{\textbf{x}: \Vert \textbf{x}\Vert _2 < R\}\), the condition \(R - \Vert \textbf{x}\Vert _2^2 \ge 0\) is redundant and can be systematically added to the definition of \(\mathcal {X}\). A sufficient (but stronger) condition for Assumption 3.3 to hold for a product set \(\mathcal {X}\) is that the description of each set \(\mathcal {X}_i\) contains a function \(g^{(i)}_1(\textbf{x}_i) = R_i - \Vert \textbf{x}_i \Vert _2^2\). If this is not the case, one may prefer to add a single function \(g_1(\textbf{x}) = R - \Vert \textbf{x}\Vert _2^2 \) to the description of \(\mathcal {X}\) in order to limit the number of positive semidefinite constraints.
3.3 Piecewise polynomial costs
Some OT problems (such as, e.g., the \(L^1\)-Wasserstein distance), involve a continuous or l.s.c. piecewise polynomial cost. These problems can also be formulated as generalized moment problems, up to the introduction of new unknown measures.
Piecewise polynomial costs. Let us assume that \(\mathcal {X}= \mathcal {A}_1 \cup \cdots \cup \mathcal {A}_m\), where the \(\mathcal {A}_i\) are pairwise disjoint Borel sets and
$$\begin{aligned} c_{\mid \mathcal {A}_i} =: c_i \in \mathbb {R}[\textbf{x}], \quad 1\le i\le m. \end{aligned}$$
For a measure \(\pi \in \mathcal {P}(\mathcal {X})\), we introduce the measures
$$\begin{aligned} \pi _i = \mathbbm {1}_{\mathcal {A}_i}\, \pi \in \mathcal {M}({\bar{\mathcal {A}}}_i)_+, \quad 1\le i\le m, \end{aligned}$$
where \({\bar{\mathcal {A}}}_i\) is the closure of \(\mathcal {A}_i\). Since \(\mathbbm {1}_{\mathcal {A}_1} + \cdots + \mathbbm {1}_{\mathcal {A}_m} = \mathbbm {1}_\mathcal {X}\), we have \(\pi = \pi _1 + \cdots + \pi _m,\) and
$$\begin{aligned} \mathcal {L}(\pi ) = \sum _{i=1}^m \int c_i \,\mathrm {d}\pi _i. \end{aligned}$$
We claim that the OT problem (3.1) is equivalent to
$$\begin{aligned} {\tilde{\rho }} = \inf \Big \{ {\tilde{\mathcal {L}}}(\pi _1,\dots ,\pi _m) := \sum _{i=1}^m \int c_i \,\mathrm {d}\pi _i \, : \, \pi _i \in \mathcal {M}({\bar{\mathcal {A}}}_i)_+, \ \pi _1 + \cdots + \pi _m \in \Pi \Big \}. \end{aligned}$$
(3.10)
Indeed, if \(\pi \) is a solution of problem (3.1), then the measures \(\pi _i = \mathbbm {1}_{\mathcal {A}_i} \pi \), \(1\le i\le m\), satisfy the constraints of (3.10) and \({\tilde{\rho }} \le {\tilde{\mathcal {L}}}(\pi _1,\ldots ,\pi _m) = \mathcal {L}(\pi ) = \rho \). Conversely, the set of measures \((\pi _1,\ldots ,\pi _m)\) satisfying the constraints of problem (3.10) is compact in the weak-\(*\) topology (since \(\pi _i \in \mathcal {M}({\bar{\mathcal {A}}}_i)_+\) and \(\textrm{mass}(\pi _i)\le 1\)), which implies the existence of solutions. Moreover, if \((\pi _1,\ldots ,\pi _m)\) is a solution of (3.10), then \(\pi = \pi _1 + \cdots + \pi _m \in \Pi \) and, using that \(c \le c_i\) on \({\bar{\mathcal {A}}}_i\) by lower semi-continuity of c, \({\tilde{\rho }} = {\tilde{\mathcal {L}}}(\pi _1, \ldots , \pi _m) = \sum _{i=1}^m \int c_i \,\mathrm {d}\pi _i \ge \int _\mathcal {X}c(\textbf{x}) \,(\mathrm d\pi _1 + \cdots + \mathrm d\pi _m) = \mathcal {L}(\pi ) \ge \rho \). Finally, denoting \({\tilde{\mathcal {L}}}(\pi _1, \ldots , \pi _m) = {\tilde{L}}(m(\pi _1), \ldots ,m(\pi _m))\), we have that the initial OT problem (3.1) is equivalent to the optimization problem
$$\begin{aligned} {\tilde{\rho }}_{mom} = \inf \ {\tilde{L}}(y_1,\ldots ,y_m) \end{aligned}$$
(3.11)
over m sequences \((y_i)_{1\le i\le m}\) that satisfy moment sequence conditions and whose sum satisfies marginal conditions. We summarize these facts in the theorem below.
Theorem 3.6
(Piecewise polynomial cost) If \( \mathcal {X}_1\times \dots \times \mathcal {X}_K\) is compact, then the OT problem (3.1) with l.s.c. piecewise polynomial cost over a partition \((\mathcal {A}_i)_{1\le i \le m}\) of \(\mathcal {X}\) is equivalent to the generalized moment problem (3.11): a minimizer \(\pi ^*\) of problem (3.1) is such that \((m(\pi _i^*))_{1\le i\le m}\), with \(\pi ^*_i = \mathbbm {1}_{\mathcal {A}_i} \pi ^*\), is a minimizer of problem (3.11), and conversely, a minimizer \((y^*_i)_{1\le i \le m}\) of problem (3.11) is such that each \(y^*_i\) has a representing measure \(\pi _i\) supported on \({\bar{\mathcal {A}}}_i\), and the sum \(\pi = \pi _1 + \cdots + \pi _m\) is a solution of (3.1). In addition, if the solution \(\pi ^*\) of the OT problem (3.1) is unique, then even though (3.11) may have infinitely many solutions \((y^*_1,\ldots ,y_m^*)\), the sum \(y^*_1 + \cdots + y_m^*:= y^*\) is unique and such that \(y^* = m(\pi ^*).\)
Remark 3.7
To obtain a practical characterization of the set \(MS({\bar{\mathcal {A}}}_i)\) of sequences that satisfy the moment sequence condition on \({\bar{\mathcal {A}}}_i\), the partition should be such that the \({\bar{\mathcal {A}}}_i\) are compact basic semi-algebraic sets. If \(\mathcal {X}\) is a compact basic semi-algebraic set, this means that each \(\mathcal {A}_i\) should be defined as the set of points in \(\mathcal {X}\) satisfying a finite set of additional polynomial inequalities.
Sum of piecewise polynomial costs. In the case where
$$\begin{aligned} c = \sum _{k=1}^s c_k, \end{aligned}$$
(3.12)
where each \(c_k\) is a l.s.c. piecewise polynomial associated with a particular partition \((\mathcal {A}_{k,i})_{1\le i \le m_k }\), i.e. \(c_{k \mid \mathcal {A}_{k,i}}:= c_{k,i} \in \mathbb {R}[\textbf{x}]\), we could introduce a finer partition of \(\mathcal {X}\) composed of the sets \(\mathcal {A}_{1,i_1}\cap \cdots \cap \mathcal {A}_{s,i_s}= \mathcal {A}_{\textbf{i}} \), with \(1 \le i_k \le m_k\). The function c being polynomial on each set \( \mathcal {A}_{\textbf{i}}\), the problem can be reformulated as a generalized moment problem involving \(m_1 \cdots m_s\) measures \(\pi _{\textbf{i}}\) supported on the sets \({\bar{\mathcal {A}}}_{\textbf{i}}\), for \(\textbf{i} \in \{1,\ldots ,m_1\} \times \cdots \times \{1,\ldots ,m_s\}\). However, the resulting number of unknown measures is exponential in s.
An alternative approach, that will be used later in this paper, is to introduce for each \(1 \le k \le s\) a collection of measures \((\pi _{k,i})_{1\le i \le m_k}\) and consider the problem
$$\begin{aligned} \inf \ \sum _{k=1}^s \sum _{i=1}^{m_k} \int c_{k,i} \,\mathrm {d}\pi _{k,i} \end{aligned}$$
(3.13)
over measures \(\pi \in \Pi \) and \(\pi _{k,i} \in \mathcal {M}({\bar{\mathcal {A}}}_{k,i})_+\), \(1\le i \le m_k, 1\le k\le s\), satisfying
$$\begin{aligned} \pi _{k,1} + \cdots + \pi _{k,m_k} = \pi , \quad 1\le k\le s. \end{aligned}$$
This results in a problem with \(m_1 + \cdots + m_s +1\) unknown measures, that can be equivalently written as the problem
$$\begin{aligned} \inf \ \sum _{k=1}^s \sum _{i=1}^{m_k} \ell _{y_{k,i}}(c_{k,i}) \end{aligned}$$
(3.14)
with sequences \(y\in \Pi _{mom}\) and \(y_{k,i} \in MS({\bar{\mathcal {A}}}_{k,i})\), \(1\le i \le m_k, 1\le k\le s\), satisfying the additional constraints
$$\begin{aligned} y_{k,1} + \cdots + y_{k,m_k} = y, \quad 1\le k\le s. \end{aligned}$$
Note that the measure \(\pi \) (resp. the sequence y) can be eliminated from problem (3.13) (resp. (3.14)). We summarize the above results in the next theorem.
Theorem 3.8
(Sum of piecewise polynomial costs) Assume \( \mathcal {X}_1\times \dots \times \mathcal {X}_K\) is compact, and consider a l.s.c. piecewise polynomial cost of the form (3.12), where each \(c_k\) is a l.s.c. piecewise polynomial over a partition \((\mathcal {A}_{k,i})_{1\le i \le m_k}\) of \(\mathcal {X}\), \(1\le k\le s\). Then the OT problem (3.1) is equivalent to the problem (3.14): a minimizer \(\pi ^*\) of problem (3.1) is such that \((m(\pi _{k,i}^*))\), with \(\pi ^*_{k,i} = \mathbbm {1}_{\mathcal {A}_{k,i}} \pi ^*\), is a minimizer of problem (3.14), and conversely, a minimizer \((y^*_{k,i})\) of problem (3.14) is such that each \(y^*_{k,i}\) has a representing measure \(\pi _{k,i}\) supported on \({\bar{\mathcal {A}}}_{k,i}\), and the sum \( \pi _{k,1} + \cdots + \pi _{k,m_k}\) is the same measure \(\pi \) for each k, which is a solution of (3.1). In addition, if the solution \(\pi ^*\) of the OT problem (3.1) is unique, then (3.14) may have infinitely many solutions, but the sum \(y^*_{k,1} + \cdots + y_{k,m_k}^*:= y^*\) is unique and such that \(y^* = m(\pi ^*).\)
4 Wasserstein distances and barycenters
In this section, we consider the problems of computing distances and barycenters in Wasserstein spaces and show that they can be expressed as generalized moment problems. Throughout the section, \(\mathcal {X}\) denotes a compact basic semi-algebraic set in the normed vector space \((\mathbb {R}^d,\Vert \cdot \Vert _p)\), with \(p \in \mathbb {N}^*\).
4.1 Wasserstein distances
The Wasserstein space \(\mathcal {P}_p(\mathcal {X})\) is defined as the set of probability measures \(\mu \in \mathcal {P}(\mathcal {X})\) with finite moments up to order p, namely
$$\begin{aligned} \mathcal {P}_p(\mathcal {X}) = \Big \{ \mu \in \mathcal {P}(\mathcal {X}): \int _{\mathcal {X}} \Vert \textbf{x}\Vert _p^p \,\mathrm {d}\mu (\textbf{x}) < \infty \Big \}. \end{aligned}$$
Let \(\mu \) and \(\nu \) be two probability measures in \(\mathcal {P}_p(\mathcal {X})\). For any \(p\in \mathbb {N}^*\), the \(L^p\)-Wasserstein distance \(W_p(\mu ,\nu )\) between \(\mu \) and \(\nu \) is defined by
$$\begin{aligned} W_p(\mu ,\nu ) = \Big ( \min _{\pi \in \Pi (\mathcal {X}\times \mathcal {X}; \mu ,\nu )} \int _{\mathcal {X}\times \mathcal {X}} \Vert \textbf{x}- \textbf{y}\Vert _p^p \,\mathrm {d}\pi (\textbf{x},\textbf{y}) \Big )^{1/p}. \end{aligned}$$
(4.1)
The space \(\mathcal {P}_p(\mathcal {X})\) endowed with the distance \(W_p\) is a metric space, usually called \(L^p\)-Wasserstein space (see [37] for more details). The \(W_p\) distance defined through problem (4.1) is an optimal transport problem of the form (1.1) with \(K=2\) marginals, \(\mathcal {X}_1 = \mathcal {X}_2 = \mathcal {X}\) and a continuous cost function
$$\begin{aligned} c(\textbf{x},\textbf{y}) = \Vert \textbf{x}- \textbf{y}\Vert _p^p = \sum _{i=1}^d \vert x_i - y_i\vert ^p. \end{aligned}$$
We claim that for any \(p\in \mathbb {N}^*\), this problem can be seen as a generalized moment problem. We distinguish the cases where p is even and odd.
Case p even. When p is an even number, the cost c is a polynomial and we simply use the binomial theorem to derive that the loss function in (4.1) can be expressed as
$$\begin{aligned} \mathcal {L}(\pi ) = \sum _{i=1}^d \sum _{k=0}^p \left( {\begin{array}{c}p\\ k\end{array}}\right) (-1)^k \int _{\mathcal {X}\times \mathcal {X}} x_i^{p-k} y_i^{k} \,\mathrm {d}\pi (\textbf{x},\textbf{y}), \end{aligned}$$
or in terms of the moments \(m(\pi )\) of \(\pi \),
$$\begin{aligned} L(m(\pi )) = \sum _{i=1}^d \sum _{k=0}^p \left( {\begin{array}{c}p\\ k\end{array}}\right) (-1)^k \, m_{((p-k)\textbf{e}_i,\, k\textbf{e}_i)}(\pi ), \end{aligned}$$
where we recall that \(\textbf{e}_i\) is the i-th canonical vector in \(\mathbb {N}^d\). The marginal constraints \(\pi \in \Pi (\mathcal {X}\times \mathcal {X}; \mu , \nu )\) of problem (4.1) can also be expressed in terms of moments. We derived their general form in equation (3.4). In the present context, they read
$$\begin{aligned} m_{({\varvec{\beta }},\textbf{0})}(\pi ) = m_{{\varvec{\beta }}}(\mu ) \quad \text {and} \quad m_{(\textbf{0},{\varvec{\beta }})}(\pi ) = m_{{\varvec{\beta }}}(\nu ), \quad \forall {\varvec{\beta }}\in \mathbb {N}^{d}. \end{aligned}$$
The problem (4.1) can then be expressed as the generalized moment problem
$$\begin{aligned} W_p(\mu ,\nu )^p = \min _{y \in \Pi _{mom}} L(y), \end{aligned}$$
(4.2)
where \(\Pi _{mom}:= \Pi _{mom}(\mathcal {X}\times \mathcal {X};m(\mu ), m(\nu ))\) is the set of sequences \(y \in \mathbb {R}^{\mathbb {N}^{2d}}\) that satisfy the moment sequence condition and the marginal constraints
$$\begin{aligned} y_{({\varvec{\beta }},\textbf{0})} = m_{{\varvec{\beta }}}(\mu ) \quad \text {and} \quad y_{(\textbf{0},{\varvec{\beta }})} = m_{{\varvec{\beta }}}(\nu ), \quad \forall {\varvec{\beta }}\in \mathbb {N}^{d}. \end{aligned}$$
Here, Theorem 3.1 applies and proves the equivalence between problems (4.2) and (4.1).
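As a sanity check of the moment expression of the loss for \(p=2\) and \(d=1\), the following toy sketch (our own discrete example) compares the direct transport cost of a coupling with its evaluation through moments:

```python
import numpy as np

# Check of the moment expression of the loss for p = 2 and d = 1
# (our own toy example): for any coupling pi,
#   int (x - y)^2 dpi = m_{(2,0)}(pi) - 2 m_{(1,1)}(pi) + m_{(0,2)}(pi).
x = np.array([0.0, 0.5, 1.0]);  w = np.array([0.3, 0.4, 0.3])   # mu
z = np.array([0.25, 0.75]);     v = np.array([0.5, 0.5])        # nu
P = np.outer(w, v)                        # product coupling of mu and nu

def m(a, b):                              # moment m_{(a,b)}(pi)
    return float((P * np.outer(x ** a, z ** b)).sum())

direct = float((P * (x[:, None] - z[None, :]) ** 2).sum())
via_moments = m(2, 0) - 2.0 * m(1, 1) + m(0, 2)
print(direct, via_moments)                # equal up to rounding
```

The same identity holds for any coupling, not only the product one; it is exactly the statement that the loss is linear in the moments.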
Case p odd. When p is odd, the presence of the absolute value in the cost function c prevents it from being a polynomial. We can nevertheless derive a moment formulation by exploiting the fact that the cost is piecewise polynomial on \(\mathcal {X}\times \mathcal {X}\). We first introduce for all \(i\in \{1,\dots ,d\}\) the subsets
$$\begin{aligned} \mathcal {A}_i^+ = \{(\textbf{x},\textbf{y}) \in \mathcal {X}\times \mathcal {X}: x_i \ge y_i\} \quad \text {and} \quad \mathcal {A}_i^- = \{(\textbf{x},\textbf{y}) \in \mathcal {X}\times \mathcal {X}: x_i < y_i\} \end{aligned}$$
that form a partition of \(\mathcal {X}\times \mathcal {X}\), i.e.
$$\begin{aligned} \mathcal {A}_i^+ \cup \mathcal {A}_i^- = \mathcal {X}\times \mathcal {X}\quad \text {and} \quad \mathcal {A}_i^+ \cap \mathcal {A}_i^- = \emptyset . \end{aligned}$$
If \(\mathcal {X}\) is compact semi-algebraic, then \(\mathcal {A}_i^+\) and \( \overline{ \mathcal {A}_i ^-}\) are also compact semi-algebraic. For any \(\pi \in \mathcal {P}(\mathcal {X}\times \mathcal {X})\), we can define measures \(\pi ^+_i, \pi ^-_i\) by
$$\begin{aligned} \pi _i^+ = \mathbbm {1}_{\mathcal {A}_i^+}\, \pi \quad \text {and} \quad \pi _i^- = \mathbbm {1}_{\mathcal {A}_i^-}\, \pi , \end{aligned}$$
which are such that
$$\begin{aligned} \pi _i^+ + \pi _i^- = \pi , \quad 1\le i\le d. \end{aligned}$$
When p is odd, since \(\mathbbm {1}_{\mathcal {A}_i^- } + \mathbbm {1}_{\mathcal {A}_i^+} = \mathbbm {1}_{\mathcal {X}\times \mathcal {X}}\), we can write the Wasserstein loss function as
$$\begin{aligned} \mathcal {L}(\pi ) = \sum _{i=1}^d \left( \int (x_i - y_i)^p \,\mathrm {d}\pi _i^+(\textbf{x},\textbf{y}) + \int (y_i - x_i)^p \,\mathrm {d}\pi _i^-(\textbf{x},\textbf{y}) \right) . \end{aligned}$$
From Theorem 3.8, we know that problem (4.1) is equivalent to the following problem with \(2d+1\) measures,
$$\begin{aligned} W_p(\mu ,\nu )^p = \min \ \sum _{i=1}^d \left( \int (x_i - y_i)^p \,\mathrm {d}\pi _i^+ + \int (y_i - x_i)^p \,\mathrm {d}\pi _i^- \right) \end{aligned}$$
over measures \(\pi \in \Pi (\mathcal {X}\times \mathcal {X}; \mu ,\nu )\) and \(\pi _i^+ \in \mathcal {M}(\mathcal {A}_i^+)_+\), \(\pi _i^- \in \mathcal {M}(\overline{\mathcal {A}_i^-})_+\) such that \(\pi _i^+ + \pi _i^- = \pi \) for all \(1\le i \le d\),
which can be equivalently reformulated as a generalized moment problem
over a set of \(2d+1\) sequences satisfying moment sequence conditions \(y_ i^+ \in MS(\mathcal {A}_i^+)\) and \(y_ i^- \in MS(\overline{\mathcal {A}_i^-})\), \(1\le i \le d\), and the constraints \(y\in \Pi _{mom}(\mathcal {X}\times \mathcal {X}; m(\mu ),m(\nu ))\) and
Note that the variable y can be eliminated.
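For instance, specializing the construction above to \(d=1\) and \(p=1\), with \(\mathcal {A}^+ = \{(x,y): x\ge y\}\) and \(\mathcal {A}^- = \{(x,y): x<y\}\), the loss splits into two polynomial pieces:

```latex
% d = 1, p = 1:  A^+ = {(x,y) : x >= y},  A^- = {(x,y) : x < y}
\int_{\mathcal{X}\times\mathcal{X}} \vert x-y\vert \, d\pi(x,y)
  = \int (x-y) \, d\pi^+(x,y) + \int (y-x) \, d\pi^-(x,y)
  = m_{(1,0)}(\pi^+) - m_{(0,1)}(\pi^+) + m_{(0,1)}(\pi^-) - m_{(1,0)}(\pi^-),
```

so that the \(W_1\) loss is linear in the first-order moments of \(\pi ^+\) and \(\pi ^-\).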
4.2 Wasserstein barycenters
A notion that is widely used to approximate measures in Wasserstein spaces is that of barycenters. To define it, let \(N\in \mathbb {N}^*\) and let
be the simplex in \(\mathbb {R}^N\). We say that \(\textrm{Bar}(\textrm{Y}_N, \Lambda _N) \in \mathcal {P}_p(\mathcal {X})\) is a barycenter associated to a given set \(\textrm{Y}_N = (\mu _i)_{1\le i\le N}\) of N probability measures from \(\mathcal {P}_p(\mathcal {X})\) and to a given set of weights \(\Lambda _N = (\lambda _i)_{1\le i\le N} \in \Sigma _N\), if and only if \(\textrm{Bar}(\textrm{Y}_N, \Lambda _N)\) is a solution to
Existence and uniqueness of minimizers of (4.3) have been studied in depth in [38] for the case \(p=2\). It is shown, in particular, that if one of the \(\mu _i\) has a density, the barycenter is unique. In the following we assume existence of minimizers. Problem (4.3) can be written as an optimization problem
over measures \(\nu \in \mathcal {P}_p(\mathcal {X})\), and \(\pi _i \in \mathcal {M}(\mathcal {X}\times \mathcal {X})_+\), \(1\le i \le N,\) satisfying the constraints \(\pi _i\in \Pi (\mathcal {X}\times \mathcal {X}; \nu ,\mu _i)\).
When p is even, this can be equivalently written as a generalized moment problem
over sequences that satisfy the constraints \(y_i \in \Pi _{mom}(\mathcal {X}\times \mathcal {X}; y, m(\mu _i))\), \(1\le i \le N.\) Note that the unknown y can be eliminated by imposing that all \(y_i\) have the same left marginal sequence.
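Concretely, eliminating y amounts to a direct restatement of the marginal constraints: the constraints involving y are replaced by the equality of the left marginal moment sequences,

```latex
(y_i)_{({\varvec{\alpha}},\textbf{0})} = (y_j)_{({\varvec{\alpha}},\textbf{0})}
\qquad \text{for all } {\varvec{\alpha}} \in \mathbb{N}^{d}
\text{ and } 1 \le i < j \le N,
```

and the moment sequence of the barycenter is recovered a posteriori as \(y_{\varvec{\alpha }} = (y_1)_{({\varvec{\alpha }},\textbf{0})}\).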
When p is odd, the problem (4.3) is equivalent to a generalized moment problem
over sequences satisfying moment sequence conditions \(y \in MS(\mathcal {X})\) and \(y_{i,j}^\pm \in MS(\mathcal {X}\times \mathcal {X})\), \(1\le i \le N, 1\le j \le d\), and the additional constraints \(y_{i,j}^+ + y_{i,j}^- = y_{i,k}^+ + y_{i,k}^-\) for all \(i \in \{1,\ldots ,N\}\) and \( 1\le j<k\le d\), and \(y_{i,j}^+ + y_{i,j}^- \in \Pi _{mom}(\mathcal {X}\times \mathcal {X}; y, m(\mu _i))\) for all \(i \in \{1,\ldots ,N\}\) and \( 1\le j\le d\). Again, the unknown y could be eliminated by imposing that all the sums \(y_{i,j}^+ + y_{i,j}^-\) have the same left marginal sequence.
5 Gromov–Wasserstein discrepancies and barycenters
For some applications such as shape matching or word embedding, an important limitation of classical Wasserstein metrics is that they are not invariant to rotations, translations, and more generally isometries. They are moreover only defined for measures living on the same ambient space \(\mathcal {X}\). To overcome these limitations, several extensions have been proposed (see, e.g., [39]). We focus here on the so-called Gromov–Wasserstein discrepancies on Euclidean spaces, originally introduced in [40], which have recently attracted a lot of attention from practitioners.
5.1 Gromov–Wasserstein discrepancies
Given two compact semi-algebraic Borel sets \(\mathcal {X}\subset \mathbb {R}^{d_\mathcal {X}}\) and \(\mathcal {Y}\subset \mathbb {R}^{d_\mathcal {Y}}\), two probability measures \(\mu \in \mathcal {P}(\mathcal {X})\) and \(\nu \in \mathcal {P}(\mathcal {Y})\), and two cost functions \(c_\mathcal {X}: \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) and \(c_\mathcal {Y}: \mathcal {Y}\times \mathcal {Y}\rightarrow \mathbb {R}\), we define for \(p\in \mathbb {N}^*\) a Gromov–Wasserstein discrepancy \(GW_{p}\) between measures \(\mu \) and \(\nu \) as
where the loss function \(\mathcal {L}^{GW_p}: \mathcal {M}(\mathcal {X}\times \mathcal {Y})_+ \rightarrow \mathbb {R}\) is such that
Note that this problem is quadratic in \(\pi \), and existence of minimizers of (5.1) is guaranteed under mild assumptions, using weak lower semi-continuity and compactness arguments similar to those for the classical Wasserstein problem (see [41, Prop. 3.1]). It can alternatively be expressed as a linear problem with a rank-one tensor constraint in the augmented space
which can be identified with a basic semi-algebraic set of \(\mathbb {R}^{2n}\) with \(n= d_\mathcal {X}+ d_\mathcal {Y}\). Using the space \(\mathcal {Z}\), we can write
with
and we have
In the particular case where the cost functions \(c_\mathcal {X}= \Vert \cdot - \cdot \Vert _q^q\) and \(c_\mathcal {Y}= \Vert \cdot - \cdot \Vert _q^q\) are associated with \(\ell ^q\) norms, for some \(q \in \mathbb {N}^*\), we denote by \(GW_{p,q}(\mu ,\nu )\) the corresponding Gromov–Wasserstein discrepancy and by \(\mathcal {L}^{GW_{p,q}}\) the corresponding loss. Note that the case \( GW_{2,2}\) is of particular practical interest. We now distinguish different cases depending on whether the costs \(c_\mathcal {X}\) and \(c_{\mathcal {Y}}\) are polynomials or not.
5.1.1 Polynomial costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\)
Here we consider polynomial costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\), and again distinguish two cases.
Case p even. When p is even, the cost \( \vert c_\mathcal {X}( \textbf{x}, \textbf{x}' ) -c_\mathcal {Y}( \textbf{y}, \textbf{y}' ) \vert ^p \) is a polynomial on \(\mathcal {Z}\). Given polynomial expansions of \(c_\mathcal {X}\) and \(c_\mathcal {Y}\), we can deduce a polynomial expansion of their difference
with \({\varvec{\gamma }}_i \in \mathbb {N}^{2n}\) and \(c_i\in \mathbb {R}\). Using the multinomial theorem,
with \({\varvec{\gamma }}_{\textbf{k}}= \sum _{i=1}^N k_i {\varvec{\gamma }}_i \in \mathbb {N}^{2n}\) and \(a_{\textbf{k}}= {p \atopwithdelims (){\textbf{k}}} \prod _{i=1}^N c_i^{k_i} \). For \({\varvec{\gamma }}\in \mathbb {N}^{2n}\), we denote by \({\varvec{\gamma }}^L,{\varvec{\gamma }}^R \in \mathbb {N}^{n}\) the multi-indices such that \({\varvec{\gamma }}= ({\varvec{\gamma }}^L, {\varvec{\gamma }}^R)\). This yields the following expression of the Gromov–Wasserstein loss function in terms of moments
with \(L^{GW_p}_{aug}:\mathbb {R}^{\mathbb {N}^{2n}} \rightarrow \mathbb {R}\) a linear functional, or
with \(L^{GW_p}:\mathbb {R}^{\mathbb {N}^{n}} \rightarrow \mathbb {R}\) a quadratic functional.
When p is even and the costs are polynomials, the Gromov–Wasserstein problem (5.1) can therefore be expressed as a generalized moment problem with quadratic objective function
with \(\Pi _{mom}=\Pi _{mom}( \mathcal {X}\times \mathcal {Y}; m(\mu ),m(\nu ))\) the set of sequences satisfying the moment sequence condition \(y\in MS(\mathcal {X}\times \mathcal {Y})\) and the marginal conditions; the equivalence between the two problems then follows along the lines of the proof of Theorem 3.1.
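The quadratic structure can be checked on a toy example. The following sketch (our own illustration; the variable ordering \(\textbf{z}= (x,y,x',y')\), one-dimensional marginal spaces, and \(q=2\) are simplifying assumptions) expands \(\vert c_\mathcal {X}- c_\mathcal {Y}\vert ^2\) as a polynomial on the augmented space and evaluates it through products of moments of a discrete plan \(\pi \), using the fact that the moments of \(\pi \otimes \pi \) factorize:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
K = 40
x = rng.normal(size=K)
y = rng.normal(size=K)
w = rng.random(K); w /= w.sum()       # discrete plan pi on X x Y (1D marginals)

# polynomials over z = (x, y, x', y'), stored as {exponent tuple: coefficient}
def pmul(p, q):
    out = {}
    for (ea, ca), (eb, cb) in product(p.items(), q.items()):
        e = tuple(i + j for i, j in zip(ea, eb))
        out[e] = out.get(e, 0.0) + ca * cb
    return out

def psub(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0.0) - c
    return out

# g(z) = c_X(x, x') - c_Y(y, y') with c_X = c_Y = squared difference (q = 2)
dx = psub({(1, 0, 0, 0): 1.0}, {(0, 0, 1, 0): 1.0})   # x - x'
dy = psub({(0, 1, 0, 0): 1.0}, {(0, 0, 0, 1): 1.0})   # y - y'
g = psub(pmul(dx, dx), pmul(dy, dy))
g2 = pmul(g, g)                       # |c_X - c_Y|^2, a polynomial on Z

def m(a, b):
    """m_{(a,b)}(pi): moments of pi; moments of pi (x) pi are products."""
    return float(np.sum(w * x**a * y**b))

loss_from_moments = sum(c * m(e[0], e[1]) * m(e[2], e[3]) for e, c in g2.items())

# direct quadratic evaluation of the same loss
cX = (x[:, None] - x[None, :])**2
cY = (y[:, None] - y[None, :])**2
loss_direct = float(w @ ((cX - cY)**2) @ w)
assert abs(loss_from_moments - loss_direct) < 1e-8
```

The loss is thus a quadratic function of finitely many moments of \(\pi \), as stated above.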
Case p odd. When p is odd, we can use a similar strategy as for Wasserstein distances. We introduce two subsets \(\mathcal {A}^+\) and \(\mathcal {A}^-\) of \(\mathcal {Z}\) defined by
with \(g(\textbf{z}) = c_\mathcal {X}( \textbf{x}, \textbf{x}' ) -c_\mathcal {Y}( \textbf{y}, \textbf{y}' )\) for \(\textbf{z}= (\textbf{x},\textbf{y},\textbf{x}',\textbf{y}').\) The sets are such that \((\mathcal {A}^+, \mathcal {A}^-)\) form a partition of \(\mathcal {Z}\). If \(\mathcal {X}\) and \(\mathcal {Y}\) are basic semi-algebraic sets, then the sets \(\mathcal {A}^+\), \(\overline{\mathcal {A}^-}\) are also basic semi-algebraic sets. For any \(\pi \in \mathcal {P}(\mathcal {X}\times \mathcal {Y})\), we define two measures \(\gamma ^+ = \mathbbm {1}_{\mathcal {A}^+}\pi \otimes \pi \) and \( \gamma ^- = \mathbbm {1}_{\mathcal {A}^- } \pi \otimes \pi ,\) which are such that
Since \(\mathbbm {1}_{\mathcal {A}^-} + \mathbbm {1}_{\mathcal {A}^+} = \mathbbm {1}_\mathcal {Z}\), we can write the Gromov–Wasserstein loss function as
Therefore, from (3.6), we know that the problem (5.1) is equivalent to the following problem
over three measures satisfying the constraint (5.3), or equivalently
with \(L^{GW_p}_{aug}\) defined by (5.2), and where \(y \in \Pi _{mom}(m(\mu ),m(\nu ))\) satisfies marginal conditions and the moment sequence condition on \(\mathcal {X}\times \mathcal {Y}\), the sequences \(y^+ \in MS(\mathcal {A}^+ )\) and \( y^- \in MS(\overline{\mathcal {A}^-} ) \) satisfy the moment sequence condition on \(\mathcal {A}^+\) and \(\overline{\mathcal {A}^-}\) respectively, and the three sequences satisfy the additional quadratic constraint \(y^+ + y^- = y \otimes y \), or equivalently
5.1.2 Piecewise polynomial costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\)
The case where \(c_\mathcal {X}\) and \(c_\mathcal {Y}\) are piecewise polynomial functions can be treated by following the general strategy presented in Sect. 3.3. Let us briefly discuss the case of \(GW_{p,q}\) with q odd, where the cost is
For p even and q odd, a first strategy is to introduce a partition \(\{\mathcal {A}_{\varvec{\alpha } }: \varvec{\alpha } \in \{-1,1\}^{2d}\}\) with \(2^{2d}\) elements, where
On each element \(\mathcal {A}_{\varvec{\alpha }}\), the cost \( g(\textbf{z}) ^p\) is a polynomial. Therefore, the problem on a single measure \(\pi \) can be reformulated as a problem on \(4^d\) measures \(\pi _{\varvec{\alpha }} = \mathbbm {1}_{\mathcal {A}_{\varvec{\alpha }}} \pi \). For p odd and q odd, we can introduce a partition \(\{\mathcal {A}_{\varvec{\alpha }}^\pm : \varvec{\alpha } \in \{-1,1\}^{2d}\}\) with \(2^{2d+1}\) elements, where \( \mathcal {A}_{\varvec{\alpha }}^+ = \mathcal {A}_{\varvec{\alpha }} \cap \mathcal {B}_{\varvec{\alpha }}^{+} \) and \( \mathcal {A}_{\varvec{\alpha }}^- = \mathcal {A}_{\varvec{\alpha }} \cap \mathcal {B}_{\varvec{\alpha }}^{-} \), with
The initial problem on a measure \(\pi \) is then reformulated as a problem on \(2^{2d+1}\) measures \(\pi _{\varvec{\alpha }}^\pm \), \(\varvec{\alpha } \in \{-1,1\}^{2d}\).
With the approach above, the number of measures is exponential in d. For p even, in order to reduce the number of measures, an alternative approach is to write the cost as
and for each \(\textbf{k} \in \mathbb {N}^{2d} \), with \(\vert \textbf{k}\vert = p\), introduce a partition adapted to the piecewise polynomial \(p_{ \textbf{k}}(\textbf{z}):= \prod _{i=1}^d \vert x_i - x_i' \vert ^{qk_i} \prod _{i=1}^{d} \vert y_i - y_i' \vert ^{qk_{i+d}}\), together with as many measures as the number of elements in the partition. To each function \(p_{ \textbf{k}}\) is associated a partition composed of at most \(2^{m_{\textbf{k}}}\) elements, with \(m_{\textbf{k}} \le p\) the number of odd entries in \(\textbf{k}\). This yields a reformulation with a number of measures bounded by \(2^p {2d + p \atopwithdelims (){2d}} = O(d^p)\). As an example, for \(p=2\),
which can be reduced to a sum of \(2 d^2 + d\) piecewise polynomials, each of them being associated with a partition composed of 2 or 4 elements. This yields a reformulation in \(O(d^2)\) measures.
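The count \(2d^2+d\) can be checked mechanically: squaring the signed sum of the 2d pieces \(\vert x_i - x_i'\vert ^q\) and \(\vert y_i - y_i'\vert ^q\) yields one piecewise polynomial per unordered pair of pieces (with repetition), i.e. \({2d+1 \atopwithdelims ()2} = 2d^2+d\) terms. A quick sanity check of this combinatorial reading (our own illustration):

```python
from itertools import combinations_with_replacement

def n_terms(d):
    # the 2d elementary pieces: |x_i - x_i'|^q for i = 1..d and
    # |y_i - y_i'|^q for i = 1..d
    pieces = [("x", i) for i in range(d)] + [("y", i) for i in range(d)]
    # squaring the signed sum of the pieces yields one (piecewise
    # polynomial) product per unordered pair of pieces, with repetition
    return sum(1 for _ in combinations_with_replacement(pieces, 2))

for d in range(1, 10):
    assert n_terms(d) == 2*d*d + d
```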
5.2 Gromov–Wasserstein barycenters
Using the same notations as in Sect. 4.2, we say that \(\textrm{Bar}(\textrm{Y}_N, \Lambda _N) \in \mathcal {P}(\mathcal {X})\) is a Gromov–Wasserstein barycenter associated to a given set \(\textrm{Y}_N = (\mu _i)_{1\le i\le N}\) of N probability measures in \(\mathcal {P}(\mathcal {Y})\) and to a given set of weights \(\Lambda _N = (\lambda _i)_{1\le i\le N} \) in the simplex \(\Sigma _N\), if and only if \(\textrm{Bar}(\textrm{Y}_N, \Lambda _N)\) is a solution to
Existence of minimizers has been established in [41, Thm. 5.1, 5.2], but uniqueness is not guaranteed. In practice this is not a real issue: when several global minimizers exist, computing any one of them is sufficient. The fixed-point algorithm provided there seems to converge to a local minimum, but a proof of convergence remains an open problem. We refer to [42, 43] for further references with theoretical background on Gromov–Wasserstein barycenters.
Problem (5.4) can be written as a quadratic optimization problem
over measures \(\nu \in \mathcal {P}(\mathcal {X})\) and \(\pi _i \in \mathcal {M}(\mathcal {X}\times \mathcal {Y})_+\), \(1\le i \le N,\) satisfying the constraints \(\pi _i\in \Pi (\mathcal {X}\times \mathcal {Y}; \nu ,\mu _i)\).
When p is even and the costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\) are polynomials, this can be equivalently written as a generalized moment problem
over sequences that satisfy moment sequence conditions \(y\in MS(\mathcal {X})\) and \(y_i \in MS(\mathcal {X}\times \mathcal {Y})\), \(1\le i \le N,\) and the additional constraints \(y_i \in \Pi _{mom}(y, m(\mu _i))\) for \(1\le i \le N\).
When p is odd and the costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\) are polynomials, using the notations of Sect. 5.1.1, we can introduce additional measures \(\gamma _i^+\) and \(\gamma _i^-\) supported on \(\mathcal {A}^+\) and \(\overline{\mathcal {A}^-}\) respectively, and the problem is reformulated as
with the same constraints as before for \(y, y_1,\ldots ,y_N\) and the additional constraints \(y_i^+ \in MS(\mathcal {A}^+)\), \(y_i^- \in MS(\overline{\mathcal {A}^-})\) and \(y_i^+ + y_i^- = y_i \otimes y_i\), \(1\le i\le N\), where \(y_i^\pm \) denote the moment sequences of \(\gamma _i^\pm \).
When the costs \(c_\mathcal {X}\) and \(c_\mathcal {Y}\) are piecewise polynomials, e.g. for \(GW_{p,q}\) with odd q, the problem can still be reformulated as a generalized moment problem at the price of introducing new measures, following Sect. 5.1.2. The derivation is rather technical but straightforward.
6 The moment-SoS hierarchy
All OT problems considered in this paper are of the form
under additional constraints
where \(\mathcal {G}\) and \(\mathcal {H}_j\), \(j\in \Gamma ,\) are linear or quadratic functions of a finite set of moments of the measures \(\pi _1, \ldots , \pi _M\), and \(\Gamma \) is a countable set. The constraints include that \(\textrm{mass}({\pi _i}) = m_{0}(\pi _i) \le 1.\) Problem (6.1) can be equivalently formulated as a generalized moment problem
where K is the set of sequences \(y_1 \in MS(\mathcal {X}_1), \ldots , y_M \in MS(\mathcal {X}_M)\) that satisfy the constraints
and where the functions \(G: \mathbb {R}^{\mathbb {N}^{n_1}} \times \cdots \times \mathbb {R}^{\mathbb {N}^{n_M}} \rightarrow \mathbb {R}\) and \(H_j: \mathbb {R}^{\mathbb {N}^{n_1}} \times \cdots \times \mathbb {R}^{\mathbb {N}^{n_M}} \rightarrow \mathbb {R}\) are linear or quadratic functions involving only finitely many entries of the input sequences \(y_1, \ldots ,y_M\). The constraints include the conditions \((y_i)_{0} \le 1\) for all \(1\le i \le M\).
The \(\mathcal {X}_i\) are assumed to be compact semi-algebraic sets defined by
for some polynomials \(g_{i,j} \) over \(\mathbb {R}^{n_i}\), with \(g_{i,0}(\textbf{x}_i) = 1\) and \(g_{i,1}(\textbf{x}_i) = R^2 - \Vert \textbf{x}_i\Vert _2^2 \) for \(\textbf{x}_i\in \mathbb {R}^{n_i}\), \(1\le i\le M\), for some \(R>0.\) From Theorem 3.4, the moment sequence condition \(y_i \in MS(\mathcal {X}_i)\) is equivalent to the following set of positive semidefinite constraints
The matrix \(\textbf{M}_{r}(g_{i,j} y_i) \) depends linearly on the entries \((y_i)_{{\varvec{\alpha }}}\) of order \(\vert {\varvec{\alpha }}\vert \le r_{i,j} + 2r\) with \(r_{i,j} = \lceil \deg (g_{i,j})/2 \rceil \). We assume that G only involves moments of order up to \(r_G\), and that the function \(H_j\) only involves moments of order up to \(r_{H_j}\).
Lasserre's (or moment-SoS) approach for solving (6.1) consists in considering a hierarchy of problems
where \(K_r\) is the set of sequences \(y_1 \in MS_r(\mathcal {X}_1), \ldots , y_M \in MS_r(\mathcal {X}_M)\) that satisfy the constraints
with \(\Gamma _r = \{j\in \Gamma : r_{H_j }\le 2 r\}\), and where \(MS_r(\mathcal {X}_i)\) is the set of sequences \(y_i\) that satisfy
Problem (6.3) is called a relaxation of order r of problem (6.2). These problems are considered for \(r \ge r^*:= \max \{ \lceil r_G/2\rceil ,\max _{i,j} r_{i,j} \}\). They only involve the entries of \(y_1, \ldots ,y_M\) of order at most 2r, and can be formulated over M finite dimensional vectors \(y_i^r\) in \(\mathbb {R}^{\mathbb {N}^{n_i}_{2r}}\), \(1\le i\le M\). Each \(y_i^r\) can then be considered again as an infinite sequence indexed by \(\mathbb {N}^{n_i}\) by completion with zeros.
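For illustration, the following sketch (our own, detached from any SDP solver) instantiates the conditions defining \(MS_r(\mathcal {X}_i)\) in dimension one: for the exact moments of the uniform probability measure on \([-1,1]\), the truncated moment matrix and the localizing matrix associated with \(g(x) = R^2 - x^2\) (with \(R=1\)) are Hankel matrices, and both are indeed positive semidefinite:

```python
import numpy as np

# moments of the uniform probability measure on [-1, 1]:
# y_k = 1/(k+1) for even k, 0 for odd k
r = 4
y = np.array([1.0/(k+1) if k % 2 == 0 else 0.0 for k in range(2*r + 1)])

# truncated moment matrix M_r(y)[i, j] = y_{i+j}  (Hankel in dimension 1)
M = np.array([[y[i+j] for j in range(r+1)] for i in range(r+1)])

# localizing matrix for g(x) = R^2 - x^2 with R = 1:
# M_{r-1}(g y)[i, j] = y_{i+j} - y_{i+j+2}
Mg = np.array([[y[i+j] - y[i+j+2] for j in range(r)] for i in range(r)])

# membership in MS_r([-1, 1]) shows up as positive semidefiniteness
assert np.linalg.eigvalsh(M).min() >= -1e-12
assert np.linalg.eigvalsh(Mg).min() >= -1e-12
```

A semidefinite programming solver then optimizes over the entries of such truncated sequences subject to exactly these constraints.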
Theorem 6.1
Problem (6.3) admits a solution for all \(r \ge r^*\). The sequence \((\rho _r)_{r\ge r^*}\) is increasing and \(\rho _r \rightarrow \rho \) as \(r\rightarrow \infty .\) Moreover, from a sequence of solutions \((y_1^r,\ldots ,y_M^r)\) of problems (6.3), we can extract a subsequence \((y_1^{r_k},\ldots ,y_M^{r_k})\) such that for each \(1\le i\le M\) and \({\varvec{\alpha }}\in \mathbb {N}^{n_i}\),
where the set of sequences \((y_1, \ldots ,y_M)\) is a solution of problem (6.2) and admits a representing measure \((\pi _1,\ldots ,\pi _M)\) solution of (6.1). If (6.2) (or equivalently (6.1)) admits a unique solution, then we have the convergence of the whole sequence \((y_{i}^{r})_{{\varvec{\alpha }}}\) to \((y_i)_{{\varvec{\alpha }}}\) as \(r\rightarrow \infty \), for all \({\varvec{\alpha }}\in \mathbb {N}^{n_i},\) where \((y_1,\ldots ,y_M)\) is the solution of (6.2).
Lemma 6.2
Let \(r\in \mathbb {N}\) and consider a sequence \(y \in \mathbb {R}^{\mathbb {N}^{n}}\). If \(\textbf{M}_r(y) \succcurlyeq 0\) and \(\textbf{M}_{r-1}(g y) \succcurlyeq 0\) with \(g(\textbf{x}) = R^2 - \Vert \textbf{x}\Vert _2^2\), then for all \(0\le k \le r\),
Proof
Since \(\textbf{M}_r(y) \succcurlyeq 0\) implies \(\textbf{M}_k(y) \succcurlyeq 0\) for all \(0\le k \le r\), we deduce from [34, Prop. 3.6] that
for all \(0 \le k\le r.\) Moreover \(\textbf{M}_{r-1}( (R^2 - \Vert \textbf{x}\Vert _2^2)y) \succcurlyeq 0\) is equivalent to \(\ell _y(R^2 f^2) - \sum _{i=1}^n \ell _y(x_i^2 f^2) \ge 0\) for all \(f \in \mathbb {R}[\textbf{x}]_{r-1}\). Taking \(f=1,\) we obtain \(\ell _y(x_i^2) \le y_{0} R^2\). Then taking \(f(\textbf{x}) = x_i^{k-1}\) with \(1\le k \le r\), we obtain \(\ell _y(R^2 x_i^{2k-2}) - \sum _{j=1}^n \ell _y(x_j^2 x_i^{2k-2}) \ge 0\), which implies \(\ell _y(x_i^{2k}) \le R^2 \ell _y(x_i^{2k-2}) \le R^{2 k} y_0\). Then \(\max _{1\le i \le n} \ell _y(x_i^{2k}) \le y_0 R^{2k}\) and we conclude by using (6.5). \(\square \)
Proof of Theorem 6.1
The proof is adapted from the proof of [34, Theorem 4.3]. We detail it for the sake of completeness. Clearly, \(K_r \supset K_{r+1} \supset \cdots \supset K\) for all \(r\ge r^*\), so that \(\rho _r\) is increasing with r and \(\rho _r \le \rho \). For all \(1\le i\le M\), we have \((y_i^r)_0 \le 1\) and \(g_{i,0}=1\) and \(g_{i,1}=R^2 - \Vert \cdot \Vert ^2_2\). Then from the constraints (6.4) and Lemma 6.2, we deduce
with \(\tau _k = \max \{1,R^{2k}\}.\) We deduce that \(K_r\) is a compact set of a finite dimensional space, and from the continuity of G and \(H_j\), \(j\in \Gamma _r\), we deduce that (6.3) admits a solution \(y^r = (y^r_1,\ldots ,y^r_M)\). Now we identify each \(y^r_i\) with a sequence in \(\mathbb {R}^{\mathbb {N}^{n_i}}\) with components \((y^r_i)_{{\varvec{\alpha }}} = 0\) for \(\vert {\varvec{\alpha }}\vert >2r.\) We introduce sequences \({\hat{y}}^r_i \in \mathbb {R}^{\mathbb {N}^{n_i}}\) defined by
which are such that \(\Vert {\hat{y}}^r_i \Vert _{\ell ^\infty } \le 1\), where \(\ell ^\infty \) denotes the space of bounded sequences indexed by \(\mathbb {N}^{n_i}\). Since \(\ell ^\infty \) is the topological dual of the separable space \(\ell ^1\), the Banach–Alaoglu theorem implies that the unit ball \(B_1(\ell ^\infty )\) is sequentially compact in the weak-\(*\) topology \(\sigma (\ell ^\infty ,\ell ^1)\). Therefore, we can extract a subsequence \(({\hat{y}}_i^{r_k})_{k\ge 1}\) of \(({\hat{y}}_i^{r})_{r\ge r^*}\) which converges to some \({\hat{y}}_i \in B_1(\ell ^\infty )\) in the weak-\(*\) topology. In particular, this implies that for all fixed \({\varvec{\alpha }}\in \mathbb {N}^{n_i}\), \(({\hat{y}}^{r_k}_i)_{{\varvec{\alpha }}} \rightarrow ({\hat{y}}_i)_{\varvec{\alpha }}\) as \(k \rightarrow \infty \) and therefore \((y^{r_k}_i)_{{\varvec{\alpha }}} \rightarrow (y_i)_{{\varvec{\alpha }}}\) as \(k\rightarrow \infty \), where the sequence \(y_i \in \mathbb {R}^{\mathbb {N}^{n_i}}\) is defined by \((y_i)_{{\varvec{\alpha }}} = ({\hat{y}}_i)_{\varvec{\alpha }}\tau _{\omega ({\varvec{\alpha }})}\).
Since the function \(G(y_1,\ldots ,y_M)\) depends continuously only on the finite set of variables \(\{ (y_i)_{{\varvec{\alpha }}}: \vert {\varvec{\alpha }}\vert \le r_G, 1\le i \le M \}\), we deduce that \(\rho _{r_k} = G(y^{r_k}_1,\ldots ,y^{r_k}_M) \rightarrow G(y_1,\ldots ,y_M) \) as \(k\rightarrow \infty \). Also, for a fixed \(j \in \Gamma \), since \(H_j\) depends continuously only on the finite set of variables \(\{ (y_i)_{{\varvec{\alpha }}}: \vert {\varvec{\alpha }}\vert \le r_{H_j}, 1\le i \le M \}\), we have that \(H_j(y_1, \ldots , y_M) = \lim _{k \rightarrow \infty } H_j(y_1^{r_k}, \ldots , y_M^{r_k}) = b_j. \) Also, for any \(m \in \mathbb {N}\), since \( \textbf{M}_{m}(g_{i,j} y_i)\) depends continuously only on the finite set of variables \(\{ (y_i)_{{\varvec{\alpha }}}: \vert {\varvec{\alpha }}\vert \le r_{i,j} + 2m, 1\le i \le M \}\), and from the closedness of the cone of symmetric positive semidefinite matrices, we deduce that \( \textbf{M}_{m}(g_{i,j} y_i) = \lim _{k\rightarrow \infty } \textbf{M}_m(g_{i,j} y_i^{r_k}) \succcurlyeq 0\). Hence \((y_1,\ldots ,y_M) \in K\) and
which proves that \((y_1,\ldots , y_M)\) is a solution of (6.2). Since \(\rho _r\) is increasing, this implies that the whole sequence \(\rho _r \) converges to \(\rho \) as \(r\rightarrow \infty \). If the solution of (6.2) is unique, then from every subsequence of \(((y_i^r)_{{\varvec{\alpha }}})_{r\ge r^*}\), we can extract a further subsequence that converges to the same limit \((y_i)_{{\varvec{\alpha }}}\), which implies the convergence of the whole sequence. \(\square \)
7 Post-processing
Here we consider the post-processing of the solution of the moment-SoS approach. From the solution of the problem (6.3) of order r, we obtain an approximation \(y^r\) of the moments \(y = m(\mu )\) (up to order 2r) of some probability measure of interest \(\mu \) over a basic semi-algebraic set \(\mathcal {X}\subset \mathbb {R}^n\), which is the target solution of the initial OT problem. We here assume that \(\mu \) is the unique solution of the initial OT problem. By Theorem 6.1, we have that \(y^r_{\varvec{\alpha }}\) converges to \(m_{\varvec{\alpha }}(\mu )\) as \(r\rightarrow \infty \), for each \({\varvec{\alpha }}\in \mathbb {N}^n\).
7.1 Approximation of linear quantities of interest
From approximate moments, we directly obtain an estimation of the first statistics of \(\mu \) and its marginals (mean, variance, covariance...) or more generally of any quantity of interest
For a polynomial \(g = \sum _{\vert {\varvec{\alpha }}\vert \le p} c_{\varvec{\alpha }}\textbf{x}^{\varvec{\alpha }}\in \mathbb {R}[\textbf{x}]_p\), \(I(g) = \ell _{m(\mu )}(g)\) is estimated by
and we have that \(I_r \rightarrow I(g)\) as \(r \rightarrow \infty \). For a function g which is not a polynomial, the quantity I can be approximated by \(I_{r,p} = \ell _{y^r}(g_{p})\) where \(g_p = \sum _{\vert {\varvec{\alpha }}\vert \le p} c_{\varvec{\alpha }}\textbf{x}^{\varvec{\alpha }}\in \mathbb {R}[\textbf{x}]_p\) is a polynomial approximation of g, and
with \(\Vert g - g_p \Vert _{L^\infty (\mathcal {X})}\) the error of approximation of g by \(g_p\). We have that \(I_{r,p}\) converges to I as \(r,p \rightarrow \infty \). Studying the rate of convergence of \(I_{r,p}\) to I requires some additional information on the convergence of \(g_p\) and the convergence of the approximate moments.
7.2 Approximation of the support of \(\mu \)
Here, we show how to estimate the support \(S(\mu )\) of \(\mu \) from an approximation of its moments, using the Christoffel function. Note that \(S(\mu )\) is contained in the basic semi-algebraic set \(\mathcal {X}\). This methodology was originally proposed in [30]. It is presented and analysed in [44, 45] in a statistical setting.
For \(r\in \mathbb {N}\), we denote \(\Pi _r^n = \mathbb {R}[\textbf{x}]_r\) the space of polynomials over \(\mathbb {R}^n\) with degree less than r. We let \(\varvec{\phi }_r(\textbf{x}) = (\textbf{x}^{\varvec{\alpha }})_{{\varvec{\alpha }}\in \mathbb {N}^n_r} \in \mathbb {R}^{s(r)} \) be the vector of monomials of degree less than r, with \(s(r):= {n + r \atopwithdelims ()r} = \# \mathbb {N}^n_r = \dim \Pi _r^n.\) For any \(r \in \mathbb {N}\), the moment matrix \(\textbf{M}_r(\mu ) \in \mathbb {R}^{s(r) \times s(r)}\) of \(\mu \), with moments up to order 2r, is given by
which is the Gram matrix in \(L^2_\mu (\mathcal {X})\) of the canonical basis of \(\Pi _r^n.\) For two polynomials \(g(\textbf{x}) = \varvec{\phi }_r(\textbf{x})^T \textbf{a} \) and \(h(\textbf{x}) = \varvec{\phi }_r(\textbf{x})^T \textbf{b} \) in \( \mathbb {R}[\textbf{x}]_r\) with coefficients \(\textbf{a}, \textbf{b} \in \mathbb {R}^{s(r)}\), we have \(\textbf{a}^T \textbf{M}_r(\mu ) \textbf{b} = \int _\mathcal {X}h(\textbf{x}) g(\textbf{x}) d\mu (\textbf{x}),\) which is the inner product of g and h in \(L^2_\mu (\mathcal {X})\). In practice, an approximation of this moment matrix can be obtained from the solution \(y^r\) of a relaxation of order r, or from a solution \(y^{{\tilde{r}}}\) of higher order \({\tilde{r}} \ge r\) in order to get a better estimation.
Non-degenerate case: Let us first consider the case where \(S(\mu )\) is not contained in a proper real algebraic subset of \(\mathcal {X}\). In other words, for any polynomial \(p\in \mathbb {R}[\textbf{x}]\),
This is the case when \(S(\mu )\) has nonzero Lebesgue measure. Hence, \(\textbf{M}_r(\mu )\) is invertible and the finite-dimensional space \(\Pi ^n_r\) of polynomials of degree less than r is a reproducing kernel Hilbert space in \(L^2_\mu \), whose kernel, called the Christoffel–Darboux kernel, is given for \(\textbf{x},\textbf{y}\in \mathbb {R}^n\) by (see [46])
where \((\varphi _{1}, \ldots , \varphi _{s(r)})\) is some orthonormal basis of \(\Pi ^n_r\). It can also be written
The Christoffel function \(\Lambda _{\mu ,r}\) is defined for \(\textbf{y}\in \mathbb {R}^n\) by
In the present regular case, we have for all \(\textbf{x}\),
The support is then approximated by the set
for some suitably chosen \(\gamma _r\). Since \(\Lambda _{\mu ,r}(\textbf{x})\ge \gamma _r\) is equivalent to the polynomial inequality \(\kappa _{\mu ,r}(\textbf{x},\textbf{x})\le \gamma _r^{-1}\), \(S_r\) is a polynomial sublevel set in \(\mathcal {X}\).
From the Markov inequality, we have that
Therefore, by choosing \(\gamma _r = \eta /s(r)\), we guarantee that \(\mu (S_r(\mu )) \ge 1-\eta \), that is \(S_r(\mu )\) contains a fraction \(1-\eta \) of the mass of \(\mu .\)
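This guarantee is easy to check numerically in dimension one. In the following sketch (our own illustration), \(\mu \) is the uniform probability measure on \([0,1/2]\), whose exact moments are available in closed form; we verify both the trace identity \(\int \kappa _{\mu ,r}(\textbf{x},\textbf{x})d\mu = s(r)\) and the mass bound \(\mu (S_r(\mu ))\ge 1-\eta \) with \(\gamma _r = \eta /s(r)\):

```python
import numpy as np

r = 3                                    # polynomial degree
s = r + 1                                # s(r) = dim of 1D polynomials of degree <= r
# exact moments of mu = uniform probability measure on [0, 1/2]
y = np.array([(0.5**k) / (k + 1) for k in range(2*r + 1)])
M = np.array([[y[i+j] for j in range(s)] for i in range(s)])  # moment matrix

Minv = np.linalg.inv(M)
def kappa(t):
    """Christoffel-Darboux kernel on the diagonal, kappa(t, t)."""
    phi = t[:, None] ** np.arange(s)     # monomial vector phi_r(t)
    return np.einsum('ki,ij,kj->k', phi, Minv, phi)

# Markov guarantee: with gamma_r = eta / s(r), the sublevel set
# S_r = {x : kappa(x, x) <= 1/gamma_r} carries at least mass 1 - eta
eta = 0.1
gamma = eta / s
t = np.linspace(0.0, 0.5, 200001)        # fine grid on the support of mu
inside = kappa(t) <= 1.0 / gamma
mass = inside.mean()                     # mu is uniform on [0, 1/2]
assert mass >= 1 - eta
# the trace identity  int kappa(x, x) dmu = s(r)  also holds
assert abs(np.mean(kappa(t)) - s) < 1e-3
```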
When the measure is absolutely continuous with respect to the Lebesgue measure \(\lambda \), it is proven in [47] that \(S_r(\mu )\), with a suitable choice of the sequence \(\gamma _r\), converges to \(S(\mu )\) in the Hausdorff distance. Also, for a point \(\textbf{x}\notin S(\mu )\), \(\Lambda _{\mu ,r}(\textbf{x})^{-1}\) grows exponentially with r, while for a point \(\textbf{x}\in S(\mu )\), it only grows polynomially. A heuristic approach then consists in estimating the growth rate from several values of r in order to decide whether \(\textbf{x}\) is in the support or not.
Singular case: We now consider the case where the support of \(\mu \) is contained in a proper algebraic set, which results in a singular moment matrix \(\textbf{M}_r(\mu )\), and we follow [44] for the definition of an approximate support.
We let V be the Zariski closure of \(S(\mu )\), which is the smallest algebraic set containing \(S(\mu ).\) We denote by \(\mathcal {I}_r\) the ideal of polynomials in \(\Pi _r^n\) that vanish on V, which is the set of polynomials \(p \in \Pi ^n_r\) satisfying \(\int p(\textbf{x})^2 d\mu = 0\). The quotient space \(\Pi ^n_r / \mathcal {I}_r\) is a reproducing kernel Hilbert space in \(L^2_\mu (\mathcal {X})\) with dimension \(r' = \textrm{rank}(\textbf{M}_r(\mu )) \), with kernel
where \(\varphi _1,\ldots ,\varphi _{r'}\) is an orthonormal basis of \(\Pi ^n_r / \mathcal {I}_r\) in \(L^2_\mu \). This kernel can be obtained by
with \(\textbf{M}_r(\mu )^{\dagger } \) the Moore–Penrose pseudo-inverse of \(\textbf{M}_r(\mu )\) (of rank \(r'\)), which can be expressed as \(\textbf{M}_r(\mu )^{\dagger } = \sum _{i=1}^{r'} \lambda _i^{-1} \textbf{v}_i \textbf{v}_i^T\) given a spectral decomposition \(\textbf{M}_r(\mu ) = \sum _{i=1}^{r'} \lambda _i \textbf{v}_i \textbf{v}_i^T\) with orthonormal eigenvectors \(\textbf{v}_i\) and corresponding nonzero eigenvalues \(\lambda _i\) of \(\textbf{M}_r(\mu )\).
A Christoffel function \(\Lambda _{\mu ,r}\) can still be defined through a variational formulation
We still have \(\Lambda _{\mu ,r}(\textbf{x}) = \kappa _{\mu ,r}(\textbf{x},\textbf{x})^{-1}\) for all \(\textbf{x}\in V\), but for \(\textbf{x}\notin V\), the functions \(\Lambda _{\mu ,r}(\textbf{x})\) and \(\kappa _{\mu ,r}(\textbf{x},\textbf{x})^{-1}\) differ, which yields two possible definitions of an approximate support \(S_r(\mu )\), using either \(\Lambda _{\mu ,r}(\textbf{x})\) or \(\kappa _{\mu ,r}(\textbf{x},\textbf{x})^{-1}\), that is either (7.1) or
Practical aspects: The functions \(\kappa _{\mu ,r}\) and \(\Lambda _{\mu ,r}\) are functions of the moment matrix \(\textbf{M}_r(\mu )\) of the true measure \(\mu \). In practice, the measure \(\mu \) is replaced by the approximation \(\mu _{ r}\) obtained from a relaxation of order r, which yields approximate functions \(\kappa _{\mu _r,r}\) and \(\Lambda _{\mu _r,r}\) and corresponding approximate supports \(S_r:= S_r(\mu _r).\) Note that for fixed r, \(\mu \) could also be replaced by the solution \(\mu _{{\tilde{r}}}\) of a relaxation of higher order \({\tilde{r}}\). A quantitative analysis of the resulting approximation is still missing.
7.3 Approximation of the density
If the measure \(\mu \) admits a density f with respect to a known measure \(\nu \) on \(\mathcal {X}\), i.e., \(d\mu (\textbf{x}) = f(\textbf{x}) d\nu (\textbf{x})\), then the Christoffel function can also be used to estimate the density on the support \(S(\mu )\) (or on its estimation), as suggested in [47].
Also, the values \(y^r_{\varvec{\alpha }}\) provide approximations of the moments
that is the inner product of \(f(\textbf{x})\) and \(\textbf{x}^{\varvec{\alpha }}\) in \(L^2_\nu (\mathcal {X})\). Different types of approximations of f can be obtained from this information. In particular, a polynomial approximation \(f_{r,p} = \sum _{\vert {\varvec{\beta }}\vert \le p} a_{\varvec{\beta }}\textbf{x}^{\varvec{\beta }}\) of f, \(p\le 2r\), can then be obtained by solving a weighted least-squares problem
where \(G_\nu \) is the Gram matrix in \(L^2_\nu (\mathcal {X})\) with entries \((G_\nu )_{{\varvec{\alpha }},{\varvec{\beta }}} = \ell _{m(\nu )}(\textbf{x}^{\varvec{\alpha }}\textbf{x}^{\varvec{\beta }}) = \int _\mathcal {X}\textbf{x}^{\varvec{\alpha }}\textbf{x}^{\varvec{\beta }}d\nu (\textbf{x}) \). From a computational point of view, the use of the canonical polynomial basis may lead to numerical instabilities and large round-off errors; other (e.g., orthogonal) polynomial bases should therefore be preferred.
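As an elementary illustration (our own sketch, with exact moments in place of \(y^r\), and \(\nu \) the Lebesgue measure on [0, 1]), the normal equations of the weighted least-squares problem read \(G_\nu \textbf{a} = (\ell _y(\textbf{x}^{\varvec{\alpha }}))_{\vert {\varvec{\alpha }}\vert \le p}\), and a polynomial density is recovered exactly:

```python
import numpy as np

# reference measure nu = Lebesgue on [0, 1]; target density f(x) = 2x,
# so the moments of mu are y_a = int x^a f(x) dx = 2/(a + 2)
p = 3                                    # degree of the polynomial ansatz
y = np.array([2.0/(a + 2) for a in range(p + 1)])

# Gram matrix of monomials in L^2_nu: (G)_{a,b} = int x^{a+b} dx = 1/(a+b+1)
G = np.array([[1.0/(a + b + 1) for b in range(p + 1)] for a in range(p + 1)])

# normal equations of the weighted least-squares problem: G a = y
a = np.linalg.solve(G, y)

# the exact density is recovered: f_{r,p}(x) = 2x
assert np.allclose(a, [0.0, 2.0, 0.0, 0.0], atol=1e-8)
```

For larger p, \(G_\nu \) is a Hilbert matrix and becomes severely ill-conditioned, which is precisely why an orthogonal basis should be preferred in practice.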
Some reformulations of the initial OT problem yield approximations of the moments of the measures \(\mathbbm {1}_{A_k} \mu \), where the \(A_k\) form a partition of \(\mathcal {X}\). In this case, a local polynomial approximation of the density on each \(A_k\) can be computed, which results in a global piecewise polynomial approximation of f over \(\mathcal {X}\).
8 Numerical illustrations
The aim of this section is to illustrate how the method behaves for the computation of Wasserstein distances, barycenters, and Gromov–Wasserstein discrepancies. We also discuss some choices we make in our implementation.
8.1 Wasserstein distances and barycenters
The code used to generate the examples shown here is available at
https://gitlab.tue.nl/data-driven/sos-ot
For our numerical tests, we consider cartoon images as displayed in Fig. 1. The images are \(400\times 400\) pixels, but we view each of them as a uniform measure on a subset \(S\subset [0, 1]^2\) defined as
The shape and location of the support S varies for each image. We consider three different types of shapes: smileys, stars, and pacmen.
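The construction of such a measure from an image can be sketched as follows, with a hypothetical \(4\times 4\) binary mask standing in for the \(400\times 400\) images: pixel centers are mapped to \([0,1]^2\) and the uniform measure on the support is represented by its truncated moments.

```python
import numpy as np

# Sketch: truncated moments of the uniform probability measure on the support
# S of a binary image, with pixel centers mapped to the unit square. The 4x4
# mask below is a toy stand-in for the paper's images.
img = np.zeros((4, 4)); img[1:3, 1:3] = 1.0        # toy "shape" mask
rows, cols = np.nonzero(img)
pts = np.stack([(cols + 0.5) / img.shape[1],       # pixel centers in [0,1]^2,
                1.0 - (rows + 0.5) / img.shape[0]],  # rows flipped to Cartesian y
               axis=1)
r = 2
# All moments y_alpha = ∫ x1^i x2^j dmu with |alpha| <= 2r, as empirical means
moments = {(i, j): (pts[:, 0]**i * pts[:, 1]**j).mean()
           for i in range(2 * r + 1) for j in range(2 * r + 1) if i + j <= 2 * r}
# moments[(0, 0)] = 1 (probability measure); by symmetry of this centered mask,
# moments[(1, 0)] = moments[(0, 1)] = 0.5
```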
8.1.1 \(W_1\) and \(W_2\) distances
We start by estimating the \(W_1\) and \(W_2\) distances between two translated smileys \(\mu _1,\,\mu _2\), for which we know the exact translation vector \(T=(t_1, t_2)\in \mathbb {R}^2\) (see Fig. 2a). In this simple case, we know that the exact distance is given by
so we can validate the accuracy of our moment approach. Figure 2b shows the relative errors of the estimates as a function of the relaxation order r. We obtain extremely high accuracy for all relaxation orders; the high accuracy obtained already at the first order is particularly remarkable. Figure 2c shows an exponential increase of the runtime as a function of r, which is to be expected given that the number of unknown moments to estimate grows rapidly with r. Repeating the same experiment with translated stars and translated pacmen yields similar results, with very high accuracy already at the first relaxation order \(r=1\).
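The validation identity used above, namely that a pure translation by T yields a distance \(\Vert T\Vert \), can be checked on a discrete surrogate by solving an assignment problem exactly (an independent sanity check, not the moment-SoS solver):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# For mu_2 the translate of mu_1 by a vector T, W_2(mu_1, mu_2) = |T| for the
# Euclidean cost. We verify this with an exact discrete OT solve on sampled
# points (the cloud below is a hypothetical stand-in for the smiley measures).
rng = np.random.default_rng(0)
X = rng.random((200, 2))              # samples of mu_1
T = np.array([0.3, 0.1])              # known translation vector
Y = X + T                             # samples of mu_2

C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
rows, cols = linear_sum_assignment(C)                # exact optimal matching
W2 = np.sqrt(C[rows, cols].mean())
# W2 equals |T| = sqrt(0.3^2 + 0.1^2) up to machine precision, since the
# translation map is the optimal plan for the quadratic cost
```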
To confirm that distances are well estimated with low relaxation orders in more general cases, we consider a more complicated example involving the classical Lena image (\(512\times 512\) pixels) and an image of Portland (Footnote 2) (\(3456\times 5184\) pixels). Both images are depicted in Fig. 3a. Even though the images have different resolutions and aspect ratios, we view both as piecewise constant functions on \([0,1]^2\). We estimate their \(W_2\) distance with our moment-SoS approach and compare the values to the distances estimated by the geomloss library (see [48]), which estimates \(W_2\) by entropic regularization and the Sinkhorn algorithm. To run the algorithm, we approximate the 2d images with a sum of \(10^{10}\) Dirac masses (taken on a cartesian grid of \([0, 1]^2\)). We report the results obtained with a regularization parameter of \(10^{-3}\). As Fig. 3b illustrates, both approaches give values in very good agreement; their relative error is of order \(10^{-4}\), as Fig. 3c shows. Interestingly, the moment-SoS approach gives a slightly lower distance value than Sinkhorn for \(r=1\), and a slightly larger one for higher relaxation orders. This illustrates the ability of the method to estimate \(W_2\) with the same quality as state-of-the-art algorithms.
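For reference, the entropic approach used for comparison can be sketched by a log-domain Sinkhorn iteration. This is a minimal illustration of the regularized problem, not the geomloss implementation; the regularization parameter eps plays the role of the \(10^{-3}\) parameter above (larger values converge faster but bias the value more).

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_w2(x, y, a, b, eps=0.01, iters=300):
    """Entropy-regularized W2 between weighted point clouds (log-domain Sinkhorn).

    x, y: point clouds (n, d) and (m, d); a, b: positive weights summing to 1.
    Returns the regularized W2 estimate and the transport plan.
    """
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean cost
    loga, logb = np.log(a), np.log(b)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(iters):
        # log-sum-exp updates keep the iteration stable for small eps
        f = -eps * logsumexp((g[None, :] - C) / eps + logb[None, :], axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + loga[:, None], axis=0)
    # Transport plan P_ij = a_i b_j exp((f_i + g_j - C_ij) / eps)
    P = np.exp((f[:, None] + g[None, :] - C) / eps + loga[:, None] + logb[None, :])
    return np.sqrt((P * C).sum()), P
```

Ending the loop with the g-update enforces the column marginals exactly; the row marginals converge as the iterations proceed.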
8.1.2 \(W_2\)-Wasserstein barycenters
We now turn to the computation of barycenters. We present the following test for validation purposes: consider the four translated smileys of Fig. 4a. We know that the \(W_2\) barycenter of these measures with uniform weights (0.25, 0.25, 0.25, 0.25) is equal to the smiley which is located at the center of the other images. Our goal is to study how accurately we can recover that target barycenter with our moment approach.
Since we know the exact barycenter, we also know the exact moments, so we start by examining how accurately they are estimated. Figure 4b shows the relative error in the computation of the first moments as a function of r. Figure 4c reports the maximum absolute error in the moment estimation for each order r. We observe that the absolute errors decay relatively quickly (we gain about half an order of magnitude per relaxation order). Similar observations hold for the relative errors. It would be interesting to examine the trend for larger orders, but this has not been possible with the current implementation due to conditioning issues, and also due to the use of Mosek as a black-box optimization solver (which prevented us from sparsifying certain variables and operations, something critical to prevent memory overflows as the complexity grows). We leave this implementation point for a future contribution, in which we will also explore strategies to solve optimal transport problems in high dimension.
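The growth of the problem size with r can be quantified with the standard Lasserre-hierarchy counts; for instance, for a two-marginal plan on \([0,1]^2\times [0,1]^2\) (hence \(n=4\) variables, an illustrative choice), the number of moments up to degree 2r and the side length of the moment matrix are:

```python
from math import comb

# Standard Lasserre-hierarchy counts (illustration): for a measure on R^n,
# the number of moments y_alpha with |alpha| <= 2r is C(n + 2r, n) and the
# moment matrix M_r(y) has side length C(n + r, n). Here n = 4, as for a
# two-marginal transport plan on [0,1]^2 x [0,1]^2.
n = 4
for r in range(1, 6):
    print(r, comb(n + 2 * r, n), comb(n + r, n))
# r=1: 15 moments, 5x5 moment matrix; r=5: 1001 moments, 126x126 matrix
```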
From the moments that our approach provides, we can reconstruct the support of the barycenter by computing the Christoffel function and applying the thresholding techniques discussed in Sect. 7.2. Figure 5a shows the Christoffel function for increasing relaxation orders r, together with the support obtained by thresholding this function with parameter \(\gamma _r=0.3\) for all relaxation orders r. Note how the estimate of the support improves as r grows: for \(r=4\) and \(r=5\), it is possible to “discover” that the measure has several connected components, such as the mouth and the eyes.
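The support-recovery step can be sketched in one dimension (a hypothetical uniform measure on [0.2, 0.8], exact moments, monomial basis): the Christoffel function \(\Lambda _r(x) = (v_r(x)^\top M_r^{-1} v_r(x))^{-1}\) is computed from the Hankel moment matrix and thresholded.

```python
import numpy as np

# 1D sketch (hypothetical example): exact moments of the uniform measure on
# S = [0.2, 0.8], assembled into the Hankel moment matrix M_r, from which
# Lambda_r(x) = 1 / (v_r(x)^T M_r^{-1} v_r(x)) is computed in the monomial
# basis v_r(x) = (1, x, ..., x^r) and thresholded at a fraction of its maximum.
r = 6
a_, b_ = 0.2, 0.8                     # the "unknown" support to recover
mom = np.array([(b_**(k + 1) - a_**(k + 1)) / ((k + 1) * (b_ - a_))
                for k in range(2 * r + 1)])
M = np.array([[mom[i + j] for j in range(r + 1)] for i in range(r + 1)])
Minv = np.linalg.inv(M)

def christoffel(x):
    v = x ** np.arange(r + 1)
    return 1.0 / (v @ Minv @ v)

xs = np.linspace(0.0, 1.0, 501)
vals = np.array([christoffel(x) for x in xs])
# Keep the points where Lambda_r exceeds a fraction 0.3 of its maximum
support = xs[vals > 0.3 * vals.max()]
# At this low order the estimate slightly undershoots S near its endpoints
```

Increasing r sharpens the decay of \(\Lambda _r\) outside the support, consistent with the improvement observed in Fig. 5.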
Repeating the same experiment with stars or pacmen instead of smileys gives results very similar to those of Fig. 4, so we do not include them for the sake of brevity. We do however plot the Christoffel function and the support obtained after thresholding (see Fig. 5b, c). We observe that order \(r=3\) is already enough to discover that the star has five corners. In the case of the pacman, the method recovers a very fair estimate of the support for \(r=4\) and \(r=5\), but only a coarse approximation of the mouth (approximating this part better would require higher relaxation orders).
8.2 Gromov–Wasserstein discrepancy and barycenters
Here we illustrate the computation of Gromov–Wasserstein discrepancies and barycenters. For our numerical tests, we consider empirical measures \(\mu _1\) to \(\mu _4\) associated with happy and sad smileys; see Fig. 6. Each measure corresponds to 1000 independent samples from a mixture of three uniform measures with equal weights 1/3, the first two components being supported on the eyes and the third having the mouth as its support. The mouth is here an algebraic set with zero Lebesgue measure. Measure \(\mu _2\) (resp. \(\mu _4\)) is the push-forward of \(\mu _1\) (resp. \(\mu _3\)) by an isometry, so that \(GW_{2,2}^2(\mu _1,\mu _2) = GW_{2,2}^2(\mu _3,\mu _4) = 0\). In this section, for the formulation of moment problems, we rely on the Matlab libraries tensap [49] and GloptiPoly [50].
8.2.1 Gromov–Wasserstein discrepancy \(GW_{2,2}\)
Here we illustrate the estimation of the discrepancies \(GW_{2,2}(\mu _i,\mu _j)\). For a given relaxation order r, we initialize the truncated moment sequence \(y^{(0)}\) with the truncated moments \(m(\mu _i \otimes \mu _j)\) of the product measure \(\mu _i \otimes \mu _j\). We then construct a sequence of truncated moments \(y^{(k)}\), \(k\ge 1\), by a fixed point algorithm, where \(y^{(k)}\) minimizes \( y\mapsto L_{aug}^{GW_{2,2}}(y \otimes y^{(k-1)})\) over the truncated moment sequences y satisfying the moment sequence condition and marginal constraints. As shown in Fig. 7, for a given relaxation order the fixed point algorithm converges rapidly, after roughly 4 to 5 iterations. Note that for (i, j) equal to (1, 2) and (3, 4), the objective function converges to a plateau of order \(10^{-13}\), very close to zero in double precision.
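The structure of this fixed point can be illustrated on discrete measures, where each step linearizes the quartic \(GW_{2,2}\) objective around the previous plan and solves a linear assignment problem. This is an illustration of the alternating strategy in a discrete setting, not the moment-SoS solver, and it may stop at local minima.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gw22_sq(P, Dx, Dy):
    # GW_{2,2}^2(P) = sum_{ijkl} (Dx_ik - Dy_jl)^2 P_ij P_kl, expanded in
    # matrix form (valid for any coupling P with marginals a = P 1, b = P^T 1),
    # with Dx, Dy the matrices of squared Euclidean distances.
    a, b = P.sum(1), P.sum(0)
    return a @ (Dx**2) @ a + b @ (Dy**2) @ b - 2 * (Dx * (P @ Dy @ P.T)).sum()

def gw22_fixed_point(x, y, iters=5):
    n = len(x)
    Dx = ((x[:, None] - x[None, :]) ** 2).sum(-1)
    Dy = ((y[:, None] - y[None, :]) ** 2).sum(-1)
    P = np.full((n, n), 1.0 / n**2)                 # product-coupling start
    for _ in range(iters):
        a, b = P.sum(1), P.sum(0)
        # Linearized cost L(P)_ij = (Dx^2 a)_i + (Dy^2 b)_j - 2 (Dx P Dy)_ij
        C = (Dx**2 @ a)[:, None] + (Dy**2 @ b)[None, :] - 2 * Dx @ P @ Dy
        r, c = linear_sum_assignment(C)
        P = np.zeros((n, n)); P[r, c] = 1.0 / n     # permutation coupling
    return gw22_sq(P, Dx, Dy), P
```

For clouds related by an isometry, the identity coupling already makes the discrete \(GW_{2,2}\) objective vanish, mirroring \(GW_{2,2}^2(\mu _1,\mu _2)=0\) above.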
Next, we provide estimates of the discrepancies \(GW_{2,2}(\mu _i,\mu _j)\) obtained at convergence of the fixed point algorithm. Table 1 shows the estimates of \(GW_{2,2}(\mu _i,\mu _j)\) obtained for different relaxation orders. The estimates of \(GW_{2,2}(\mu _1,\mu _2)\) and \(GW_{2,2}(\mu _3,\mu _4)\) converge slowly with the relaxation order but are already very small for very small relaxation orders. The estimate of \(GW_{2,2}(\mu _1,\mu _4)\) converges rapidly with the relaxation order.
8.2.2 Gromov–Wasserstein barycenters \(GW_{2,2}\)
We now turn to the computation of Gromov–Wasserstein barycenters, using the discrepancy \(GW_{2,2}\). We consider the computation of the barycenters of the empirical measures \(\mu _1\) and \(\mu _2\) illustrated in Fig. 6. The experiments here are for illustrative purposes.
For a given relaxation order r, we have to solve the optimization problem (5.5) over truncated sequences y, \(y_1\) and \(y_2\), where \(y_1\) has as marginals y and the truncated moments of \(\mu _1\), and \(y_2\) has as marginals y and the truncated moments of \(\mu _2\). The objective functional can be rewritten as \(\lambda L_{aug}^{GW_{2,2}}(y_1 \otimes y_1) + (1-\lambda ) L_{aug}^{GW_{2,2}}(y_2 \otimes y_2)\), with \(\lambda \in [0,1]\). To solve the optimization problem, we rely on a fixed point algorithm which constructs sequences of truncated moments \(y^{(k)}\), \(y^{(k)}_1\) and \(y^{(k)}_2\), \(k\ge 1\), such that \((y^{(k)},y^{(k)}_1,y^{(k)}_2)\) minimizes \((y,y_1,y_2) \mapsto \lambda L_{aug}^{GW_{2,2}}(y_1 \otimes y_1^{(k-1)}) + (1-\lambda ) L_{aug}^{GW_{2,2}}(y_2 \otimes y_2^{(k-1)})\) over truncated sequences satisfying the marginal constraints and moment sequence conditions. For the initialization, we take for \(y^{(0)}\) the truncated moments of either \(\mu _1\) or \(\mu _2\) (depending on the value of \(\lambda \)), and for \(y^{(0)}_1\) (resp. \(y^{(0)}_2\)) the tensor product of \(y^{(0)}\) with the truncated moments of \(\mu _1\) (resp. \(\mu _2\)). This algorithm converges rather slowly and should clearly be improved; it nevertheless allows us to illustrate the potential of the proposed approach. The results are reported at iteration 100.
Figures 8 and 9 illustrate respectively the estimated supports and Christoffel functions of the barycenters \(\textrm{Bar}((\mu _1,{\mu _4}),\lambda )\) for different values of \(\lambda \) and relaxation orders r. We observe a rather fast convergence with r, at least for \(\lambda \notin \{0,1\}\). In Fig. 9, we may note that the interpolations of the smileys appear with different rotations as \(\lambda \) varies.
9 Conclusions
We have presented the theoretical foundations for solving the most common optimal transport problems with a moment-SoS approach. The numerical results show that the method estimates the values of the loss functions with very good accuracy at very low relaxation orders. The support of concentrated measures can be efficiently estimated with relatively low polynomial orders. This feature seems particularly appealing because it could be leveraged to cleverly allocate degrees of freedom in optimal transport solvers that approximate the optimal transport plan.
Notes
This results from the density of the space of polynomials in the space \(C(\mathcal {X})\) of continuous functions over a compact set \(\mathcal {X}\) in \(\mathbb {R}^{n}\) (Weierstrass approximation theorem), and the fact that the topological dual of \(C(\mathcal {X})\) is equal to \(\mathcal {M}(\mathcal {X})\) (Riesz representation theorem).
The image of Portland was taken by one of the authors and is used with their consent.
References
Carlier, G.: Optimal transportation and economic applications. Lecture Notes 18 (2012)
Galichon, A.: A survey of some recent applications of optimal transport methods to econometrics. Econom. J. 20(2), 1–11 (2017)
Cotar, C., Friesecke, G., Pass, B.: Infinite-body optimal transport with coulomb cost. Calc. Var. Partial Differ. Equ. 54(1), 717–742 (2015)
Benamou, J.-D., Carlier, G., Nenna, L.: A numerical method to solve multi-marginal optimal transport problems with coulomb cost. In: Splitting Methods in Communication. Imaging, Science, and Engineering, pp. 577–601. Springer, Cham (2017)
Carlier, G., Duval, V., Peyré, G., Schmitzer, B.: Convergence of entropic schemes for optimal transport and gradient flows. SIAM J. Math. Anal. 49(2), 1385–1418 (2017)
Peyré, G., Cuturi, M.: Computational optimal transport: with applications to data science. Found. Trends® Mach. Learn. 11(5–6), 355–607 (2019)
Cuturi, M.: Sinkhorn distances: lightspeed computation of optimal transport. In: Advances in Neural Information Processing Systems, pp. 2292–2300 (2013)
Chizat, L., Peyré, G., Schmitzer, B., Vialard, F.-X.: Scaling algorithms for unbalanced optimal transport problems. Math. Comput. 87(314), 2563–2609 (2018)
Bertsekas, D.P., Castanon, D.A.: The auction algorithm for the transportation problem. Ann. Oper. Res. 20(1), 67–96 (1989)
Gallouët, T.O., Mérigot, Q.: A Lagrangian scheme à la Brenier for the incompressible Euler equations. Found. Comput. Math. 18(4), 835–865 (2018)
Mérigot, Q.: A multiscale approach to optimal transport. Comput. Graph. Forum 30(5), 1583–1592 (2011)
Schmitzer, B.: A sparse multiscale algorithm for dense optimal transport. J. Math. Imaging Vis. 56(2), 238–259 (2016)
Benamou, J.D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393 (2000)
Friesecke, G., Schulz, A.S., Vögler, D.: Genetic column generation: fast computation of high-dimensional multimarginal optimal transport problems. SIAM J. Sci. Comput. 44(3), 1632–1654 (2022)
Sobol, I.M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Math. Comput. Simul. 55(1), 271–280 (2001)
Owen, A.B., Dick, J., Chen, S.: Higher order Sobol’ indices. Inf. Inference 3(1), 59–81 (2014)
De Castro, Y., Gamboa, F., Henrion, D., Hess, R., Lasserre, J.-B.: Approximate optimal designs for multivariate polynomial regression. Ann. Stat. 47(1), 127–155 (2019)
Lasserre, J.B., Pauwels, E., Putinar, M.: The Christoffel–Darboux Kernel for Data Analysis. Cambridge University Press, Cambridge (2022)
Roos Hoefgeest, P., Slot, L.: The Christoffel–Darboux kernel for topological data analysis. In: 39th International Symposium on Computational Geometry (SoCG 2023) (2023). Schloss Dagstuhl-Leibniz-Zentrum für Informatik
Frank, M., Dubroca, B., Klar, A.: Partial moment entropy approximation to radiative heat transfer. J. Comput. Phys. 218(1), 1–18 (2006)
Dubroca, B., Feugeas, J.-L., Frank, M.: Angular moment model for the Fokker–Planck equation. Eur. Phys. J. D 60, 301–307 (2010)
Alldredge, G.W., Frank, M., Giesselmann, J.: On the convergence of the regularized entropy-based moment method for kinetic equations. SMAI J. Comput. Math. 9, 1–29 (2023)
Marx, S., Weisser, T., Henrion, D., Lasserre, J.-B.: A moment approach for entropy solutions to nonlinear hyperbolic PDEs. MCRF 10(1), 113–140 (2020)
Cardoen, C., Marx, S., Nouy, A., Seguin, N.: A moment approach for entropy solutions of parameter-dependent hyperbolic conservation laws. arXiv preprint arXiv:2307.10043 (2023)
Gloria, A., Otto, F.: An optimal variance estimate in stochastic homogenization of discrete elliptic equations. Ann. Probab. 39(3), 779–856 (2011)
Kaminski, M.: The Stochastic Perturbation Method for Computational Mechanics. Wiley, Hoboken (2013)
Lasserre, J.B.: A semidefinite programming approach to the generalized problem of moments. Math. Program. 112(1), 65–92 (2008)
Catala, P.: Positive semidefinite relaxations for imaging science. PhD thesis, PSL University (2020)
Henrion, D., Lasserre, J.B.: Graph recovery from incomplete moment information. Constr. Approx. 1–23 (2022)
Marx, S., Pauwels, E., Weisser, T., Henrion, D., Lasserre, J.B.: Semi-algebraic approximation using Christoffel–Darboux kernel. Constr. Approx. 54(3), 391–429 (2021)
Alfonsi, A., Coyaud, R., Ehrlacher, V., Lombardi, D.: Approximation of optimal transport problems with marginal moments constraints. Math. Comput. 90(328), 689–737 (2021)
Vacher, A., Muzellec, B., Rudi, A., Bach, F., Vialard, F.-X.: A dimension-free computational upper-bound for smooth optimal transport estimation. In: Conference on Learning Theory, pp. 4143–4173 (2021). PMLR
Muzellec, B., Vacher, A., Bach, F., Vialard, F.-X., Rudi, A.: Near-optimal estimation of smooth transport maps with kernel sums-of-squares. arXiv preprint arXiv:2112.01907 (2021)
Lasserre, J.B.: Moments, Positive Polynomials and Their Applications, vol. 1. World Scientific Publishing Company, London (2009)
Schmüdgen, K.: The moment problem on compact semi-algebraic sets. In: The Moment Problem, pp. 283–313. Springer, Cham (2017)
Putinar, M.: Positive polynomials on compact semi-algebraic sets. Indiana Univ. Math. J. 42(3), 969–984 (1993)
Villani, C.: Topics in Optimal Transportation, vol. 58. American Mathematical Soc, Providence (2003)
Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011)
Alvarez-Melis, D., Jegelka, S., Jaakkola, T.S.: Towards optimal transport with global invariances. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1870–1879 (2019). PMLR
Mémoli, F.: Gromov–Wasserstein distances and the metric approach to object matching. Found. Comput. Math. 11(4), 417–487 (2011)
Beier, F., Beinert, R., Steidl, G.: Multi-marginal Gromov–Wasserstein transport and barycenters. Inf. Inference J. IMA 12(4), 2753–2781 (2023)
Peyré, G., Cuturi, M., Solomon, J.: Gromov–Wasserstein averaging of kernel and distance matrices. In: International Conference on Machine Learning, pp. 2664–2672 (2016). PMLR
Dumont, T., Lacombe, T., Vialard, F.-X.: On the existence of Monge maps for the Gromov–Wasserstein problem (2022)
Pauwels, E., Putinar, M., Lasserre, J.B.: Data analysis from empirical moments and the Christoffel function. Found. Comput. Math. 21(1), 243–273 (2021)
Vu, M.T., Bachoc, F., Pauwels, E.: Rate of convergence for geometric inference based on the empirical Christoffel function. ESAIM Probab. Stat. 26, 171–207 (2022)
Dunkl, C.F., Xu, Y.: Orthogonal Polynomials of Several Variables. Cambridge University Press, Cambridge (2014)
Lasserre, J.B., Pauwels, E.: The empirical Christoffel function with applications in data analysis. Adv. Comput. Math. 45(3), 1439–1468 (2019)
Feydy, J., Séjourné, T., Vialard, F.-X., Amari, S.-I., Trouvé, A., Peyré, G.: Interpolating between optimal transport and MMD using Sinkhorn divergences. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2681–2690 (2019). PMLR
Nouy, A., Grelier, E., Giraldi, L.: ApproximationToolbox. https://doi.org/10.5281/zenodo.3653971
Henrion, D., Lasserre, J.-B., Löfberg, J.: Gloptipoly 3: moments, optimization and semidefinite programming. Optim. Methods Softw. 24(4–5), 761–779 (2009)
Mula, O., Nouy, A. Moment-SoS methods for optimal transport problems. Numer. Math. 156, 1541–1578 (2024). https://doi.org/10.1007/s00211-024-01422-x