1 Introduction

The Optimal Transport (OT) problem is a classical minimization problem dating back to the work of Monge [24] and Kantorovich [20, 21]. In this problem, we are given two probability measures, namely \(\mu \) and \(\nu \), and we search for the cheapest way to reshape \(\mu \) into \(\nu \). The effort needed to perform this transformation depends on a cost function, which describes the underlying geometry of the product of the supports of the two measures. Under suitable assumptions on the cost, this effort induces a distance between probability measures.

During the last century, the OT problem has been fruitfully used in many applied fields, such as the study of systems of particles by Dobrushin [13], the Boltzmann equation by Tanaka [17,18,19], and fluid dynamics by Yann Brenier [9]. All these results pointed out that a qualitative description of optimal transport could provide insightful information on many open problems. For this reason, the Optimal Transport problem has become a topic of major interest for analysts, probabilists, and statisticians [3, 29, 31]. In particular, many results concerning the uniqueness [10, 14, 16], the structure [1, 2, 28], and the regularity [8, 23] of the optimal transportation plan in the continuous framework have been proved.

In recent years, optimal transport has also become a crucial sub-problem in several applications in Computer Vision [7, 25,26,27], Computational Statistics [22], Probability [5, 6], and Machine Learning [4, 11, 15, 30]. However, in these fields, the measures \(\mu \) and \(\nu \) are typically discrete, and therefore the optimal transportation plans lack most of the good properties enjoyed by their continuous counterparts.

In this paper, we study the structure of optimal transportation plans between discrete probability measures. After introducing the notion of trim plan between the measures \(\mu \) and \(\nu \), we prove that such plans are the sum of two deterministic plans, i.e., plans that are induced by the action of two suitable push-forward maps. The first map acts on a portion \(\mu ^{(d)}\) of \(\mu \), while the other one acts on a portion \(\nu ^{(d)}\) of \(\nu \) (Theorem 3). Thanks to this formula, we recover an extension of the estimate given in [8]. Namely, we estimate the infinity-Wasserstein distance between a pair of discrete measures \((\mu ,\nu )\) (see Definition 4 below) by the c-Wasserstein distance between \(\mu \) and \(\nu \), times a quantity that only depends on \(\mu \) and \(\nu \) (Theorem 7).

2 Basic Notions on Optimal Transport

In this section, following [31], we recall the main definitions regarding optimal transportation and we review the continuous counterpart [8] of our \(W^{(\infty )}\) estimate.

Given a Polish space (X, d), we denote by \(\mathcal {B}(X)\) the set of Borel sets of X and by \(\mathcal {P}(X)\) the set of Borel probability measures on X. Given a Borel measurable function \(T:X\rightarrow Y\), we denote by \(T_\#:\mathcal {P}(X)\rightarrow \mathcal {P}(Y)\) the push-forward operator induced by T, defined by \((T_\#\mu )[A]=\mu [T^{-1}(A)]\) for every \(A\in \mathcal {B}(Y)\). The projection maps are \(\mathfrak {p}_X:X\times Y \rightarrow X\), \(\mathfrak {p}_X(x,y)=x\), and \(\mathfrak {p}_Y:X\times Y \rightarrow Y\), \(\mathfrak {p}_Y(x,y)=y\).

Definition 1

Let \(\mu \) and \(\nu \) be two probability measures over two Polish spaces X and Y. A probability measure \(\pi \in \mathcal {P}(X\times Y)\) is a transportation plan between \(\mu \) and \(\nu \) if

$$\begin{aligned} (\mathfrak {p}_{X})_\#\pi =\mu \quad \quad \text {and} \quad \quad (\mathfrak {p}_{Y})_\#\pi =\nu . \end{aligned}$$

We denote with \(\Pi (\mu ,\nu )\) the set of all the transportation plans between \(\mu \) and \(\nu .\)

Given \(A\in \mathcal {B}(X)\) and \(B\in \mathcal {B}(Y)\), the quantity \(\pi (A\times B)\) is the amount of mass that travels from the set A to the set B. By assigning a cost function c on \(X\times Y\) we specify a way to measure the cost of every transportation plan.

Definition 2

Let \(\mu \in \mathcal {P}(X)\), \(\nu \in \mathcal {P}(Y)\), and let \(c:X\times Y\rightarrow [0,+\infty )\) be a lower semicontinuous (l.s.c.) symmetric cost function. The transportation functional \(\mathbb {T}_c:\Pi (\mu ,\nu )\rightarrow [0,+\infty ]\) is defined as

$$\begin{aligned} \mathbb {T}_c(\pi ):=\int _{X\times Y}c\ \mathrm{d}\pi . \end{aligned}$$
(1)

Given two measures \(\mu \in \mathcal {P}(X)\), \(\nu \in \mathcal {P}(Y)\), and a cost function c, the optimal transportation problem consists of finding the infimum of \(\mathbb {T}_c\) over \(\Pi (\mu ,\nu )\), i.e.,

$$\begin{aligned} \inf _{\pi \in \Pi (\mu ,\nu )} \mathbb {T}_c(\pi ). \end{aligned}$$
(2)
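For finitely supported measures, Definitions 1 and 2 reduce to finite sums. The following Python sketch is only an illustration (the dictionary encoding `{(x, y): mass}` and the function names are our assumptions, not taken from the source): it checks the marginal constraints of Definition 1 and evaluates the functional (1) for two admissible plans between the measures of Example 1 below, with \(\epsilon =1/10\) and cost \(|x-y|^2\). Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def marginals(plan):
    """Return the two marginals of a discrete plan {(x, y): mass}."""
    first, second = {}, {}
    for (x, y), mass in plan.items():
        first[x] = first.get(x, 0) + mass
        second[y] = second.get(y, 0) + mass
    return first, second


def transport_cost(plan, cost):
    """Discrete version of (1): sum of cost(x, y) * pi_{x,y} over the support."""
    return sum(cost(x, y) * mass for (x, y), mass in plan.items())


def c2(x, y):
    return (x - y) ** 2


# Measures of Example 1 with eps = 1/10 (value chosen for illustration).
eps = Fraction(1, 10)
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {0: (1 - eps) / 2, 1: (1 + eps) / 2}

# Two admissible plans: the product coupling and a plan moving only eps/2 units.
product_plan = {(x, y): mu[x] * nu[y] for x in mu for y in nu}
monotone_plan = {(0, 0): (1 - eps) / 2, (0, 1): eps / 2, (1, 1): Fraction(1, 2)}

for plan in (product_plan, monotone_plan):
    assert marginals(plan) == (mu, nu)  # Definition 1
    print(transport_cost(plan, c2))
```

The product plan costs 1/2, whereas the second plan costs \(\epsilon /2=1/20\); by Example 1 the latter value equals \(W_2^2(\mu ,\nu _\epsilon )\), so that plan is optimal.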

Under further assumptions on c, the infimum in (2) is actually a minimum. In particular, when the cost function is lower semicontinuous and bounded from below (e.g., nonnegative, as in Definition 2), an optimal transportation plan exists. For a complete discussion on the existence of optimal plans, we refer to [31, Chapter 4].

We can use the optimal transportation problem to define a distance on the space \(\mathcal {P}(X)\). In particular, since X is a Polish space, we can lift the distance d from X to \(\mathcal {P}(X)\) by choosing a power \(d^p\) of the distance as the cost function in (1).

Definition 3

Let (X, d) be a Polish space and \(p\in [1,\infty )\). The Wasserstein distance of order p between the probability measures \(\mu \) and \(\nu \) on X is defined as

$$\begin{aligned} W_p(\mu ,\nu ):=\Big (\inf _{\pi \in \Pi (\mu ,\nu )} \mathbb {T}_{d^p}(\pi )\Big )^{\frac{1}{p}} =\Big (\inf _{\pi \in \Pi (\mu ,\nu )} \int _{X\times X}d^p(x,y) \mathrm{d}\pi (x,y)\Big )^{\frac{1}{p}}. \end{aligned}$$
(3)

When \(p=1\), the 1-Wasserstein distance is also known as the Kantorovich-Rubinstein distance.

For a general cost function c, we denote the value of the infimum in (2) by \(W_c(\mu ,\nu )\).

Remark 1

The infimum in (3) may be \(+\infty \); for this reason, it is customary to restrict \(W_p\) to the space of probability measures with finite moment of order p.

Definition 4

Given a cost function c, the \(W^{(\infty )}_c\) distance between two measures \(\mu \) and \(\nu \) is defined as

$$\begin{aligned} W^{(\infty )}_c(\mu ,\nu )=\inf _{\pi \in \Pi (\mu ,\nu )}||c ||_{L^{\infty }_\pi } \end{aligned}$$

where \(||\,\cdot \,||_{L^{\infty }_\pi }\) is the \(L^{\infty }\) norm with respect to the measure \(\pi \). When c is the Euclidean distance, we simply write \(W^{(\infty )}\).

Let \(\mu \) and \(\nu \) be two probability measures on a Lipschitz regular and bounded subset \(\Omega \subset \mathbb {R}^n\). We define the cost function

$$\begin{aligned} c_p({\textbf {x}},{\textbf {y}}):=\left( \sqrt{\sum _{i=1}^n|x_i-y_i|^2}\right) ^{p}, \quad \quad p>1. \end{aligned}$$

When \(\mu \) is absolutely continuous with respect to the Lebesgue measure, it is well known (Theorem 6.3 and Theorem 6.4, [16]) that the optimal transportation plan \(\pi \) between \(\mu \) and \(\nu \) is unique and it is induced by a transportation map \(T_p\), i.e.

$$\begin{aligned} \pi =(Id,T_p)_\#\mu . \end{aligned}$$

In [8], Bouchitté et al. established an \(L^{\infty }_\mu \)-bound on the displacement map \(Id-T_p\), which only depends on the shape of \(\Omega \), on p, and on the density of \(\mu \). This estimate allowed the authors to give the following upper bound on the \(W^{(\infty )}\) distance between \(\mu \) and \(\nu \).

Theorem 1

(Theorem 1.2, [8]) Let \(\Omega \) be a bounded connected open subset of \(\mathbb {R}^n\) with Lipschitz boundary and denote by \(\mathcal {P}(\overline{\Omega })\) (resp. \(\mathcal {P}_{ac}(\Omega )\)) the set of Borel (resp. absolutely continuous) probability measures on \(\overline{\Omega }\). Then, for every \(p>1\) and every pair \((\mu ,\nu )\in \mathcal {P}_{ac}(\Omega )\times \mathcal {P}(\overline{\Omega })\) there holds

$$\begin{aligned} (W^{(\infty )}(\mu ,\nu ))^{p+n}\le C_{p,n}(\Omega )||f^{-1}||_{L^{\infty }(\Omega )}W^p_p(\mu ,\nu ), \end{aligned}$$
(4)

where f is the density of \(\mu \) with respect to the Lebesgue measure and \(C_{p,n}(\Omega )\) is a positive constant depending only on p, n, and \(\Omega \).

The proof of this result relies heavily on the regularity of \(\mu \); hence, when \(\mu \) and \(\nu \) are both discrete, it does not apply. In particular, we can no longer find a constant depending only on \(\mu \) and on the geometry of the support of \(\mu \), as the following example shows.

Example 1

Let \(\mu ,\nu _\epsilon \in \mathcal {P}(\mathbb {R})\) be defined as

$$\begin{aligned} \mu =\dfrac{1}{2}\delta _{0}+\dfrac{1}{2}\delta _{1},\quad \quad \quad \nu _\epsilon =\dfrac{1-\epsilon }{2}\delta _{0}+\dfrac{1+\epsilon }{2}\delta _{1}, \end{aligned}$$

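for \(\epsilon \in (0,1)\), and let \(c_2(x,y)=|x-y|^2\). Every transportation plan between \(\mu \) and \(\nu _\epsilon \) must move at least a mass \(\epsilon /2\) from 0 to 1, and the plan moving exactly this mass keeps the rest in place; a direct computation then gives

$$\begin{aligned} W^{(\infty )}(\mu ,\nu _\epsilon )=1,\quad \quad \quad W^2_2(\mu ,\nu _\epsilon )=\dfrac{\epsilon }{2}. \end{aligned}$$

Hence, no estimate of the form (4) can hold for discrete measures: with \(p=2\) and \(n=1\), for every constant \(C(p,n,\Omega ,\mu )>0\) (possibly depending on \(p,n,\Omega ,\mu \)) and every \(\epsilon \in (0,1)\) with \(\epsilon <2/C(p,n,\Omega ,\mu )\), we have

$$\begin{aligned} (W^{(\infty )}(\mu ,\nu _\epsilon ))^{2+1}=1 > \dfrac{\epsilon }{2}\,C(p,n,\Omega ,\mu )=C(p,n,\Omega ,\mu )\, W^2_2(\mu ,\nu _\epsilon ). \end{aligned}$$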

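The quantities in Example 1 can be checked numerically. On the real line, the monotone (quantile) coupling, which matches the cumulative distribution functions of the two measures from left to right, is a standard optimal plan for every cost \(|x-y|^p\) with \(p\ge 1\), and it also realizes \(W^{(\infty )}\). The Python sketch below relies on that fact; it is our own illustration for measures on \(\mathbb {R}\) only, and the function names are assumptions.

```python
from fractions import Fraction


def monotone_coupling(xs, ws, ys, vs):
    """Monotone (quantile) coupling of sum_i ws[i] delta_{xs[i]} and sum_j vs[j] delta_{ys[j]}.

    Returns triples (mass, x, y): `mass` units are sent from x to y.
    """
    mu, nu = sorted(zip(xs, ws)), sorted(zip(ys, vs))
    i = j = 0
    rem_mu, rem_nu = mu[0][1], nu[0][1]
    pieces = []
    while i < len(mu) and j < len(nu):
        mass = min(rem_mu, rem_nu)  # fill the current atoms from left to right
        if mass > 0:
            pieces.append((mass, mu[i][0], nu[j][0]))
        rem_mu -= mass
        rem_nu -= mass
        if rem_mu == 0:
            i += 1
            rem_mu = mu[i][1] if i < len(mu) else 0
        if rem_nu == 0:
            j += 1
            rem_nu = nu[j][1] if j < len(nu) else 0
    return pieces


def wp_power(pieces, p):
    """W_p^p computed from the monotone coupling."""
    return sum(mass * abs(x - y) ** p for mass, x, y in pieces)


def w_inf(pieces):
    """W^(infinity): largest displacement carrying positive mass."""
    return max(abs(x - y) for _, x, y in pieces)


# Example 1: mu = (delta_0 + delta_1)/2, nu_eps = ((1-eps) delta_0 + (1+eps) delta_1)/2.
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    pieces = monotone_coupling([0, 1], [Fraction(1, 2)] * 2,
                               [0, 1], [(1 - eps) / 2, (1 + eps) / 2])
    print(eps, w_inf(pieces), wp_power(pieces, 2), w_inf(pieces) ** 3 / wp_power(pieces, 2))
```

For every \(\epsilon \) the printed ratio \((W^{(\infty )})^3/W_2^2\) equals \(2/\epsilon \), so it is unbounded as \(\epsilon \rightarrow 0\), in agreement with the example.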
3 Structure of Discrete Optimal Transportation Plans

In what follows, we prove the existence of an optimal transportation plan between two discrete measures that is induced by the action of two push-forward functions, one going from X to Y and one going from Y to X. This allows us to establish a bound on \(W^{(\infty )}(\mu ,\nu )\) similar to the one proved in [8]. We always assume \(\# X=\# Y =n \in \mathbb {N}\). In this case, we can identify both X and Y with \(\{1,\dots , n\}\); without loss of generality, we therefore assume \(X=Y\). In this setting, a measure \(\mu \in \mathcal {P}(X)\) has the form \(\sum _{x\in X} \mu _x\delta _x\); we thus use \(\mu _x\) to denote the coefficient of \(\mu \) at x and, likewise, \(c_{x,y}\) (resp. \(\pi _{x,y}\)) stands for the value of \(c:X\times Y \rightarrow \mathbb {R}\) (resp. the coefficient of \(\pi \in \mathcal {P}(X\times Y)\)) at the point \((x,y)\in X\times Y\).

Definition 5

Let \(\mu ,\nu \in \mathcal {P}(X)\) be two measures on a discrete set X and let \(c:X\times X\rightarrow \mathbb {R}\) be a cost function. An optimal solution \(\pi ^*\) of the transportation problem is said to be trim if

$$\begin{aligned} \# \mathrm{spt}(\pi ^*)\le \# \mathrm{spt} (\pi ) \end{aligned}$$

for each optimal solution \(\pi \).

Lemma 2

Let \(\pi \in \Pi (\mu ,\nu )\) be a trim solution. Then every restriction of \(\pi \) to a subset of \(X\times X\) is a trim solution between its marginals. In particular, if \(\pi ^{(1)}\) and \(\pi ^{(2)}\) are such that

$$\begin{aligned} \pi =\pi ^{(1)}+\pi ^{(2)} \end{aligned}$$

and \(\mathrm{spt}(\pi ^{(1)})\cap \mathrm{spt}(\pi ^{(2)})=\emptyset \), then \(\pi ^{(1)}\) and \(\pi ^{(2)}\) are trim solutions for their marginals.

Proof

Let \(\pi ^{*}\) be a restriction of \(\pi \). By Theorem 4.6 (Chapter 4, [31]), we know that \(\pi ^*\) is optimal between its marginals, hence we only need to prove that its support has minimal cardinality.

Arguing by contradiction, let us assume that \(\pi ^*\) is not trim, hence there exists another optimal plan \(\eta \) between the marginals of \(\pi ^*\) such that

$$\begin{aligned} \#\mathrm{spt}(\eta )<\#\mathrm{spt}(\pi ^*). \end{aligned}$$

We can define the measure \(\hat{\pi }\) as

$$\begin{aligned} \hat{\pi }=\pi -\pi ^* +\eta , \end{aligned}$$

since \(\pi \ge \pi ^*\) and \(\eta \ge 0\), we have \(\hat{\pi }\ge 0\). Moreover, since \(\pi ^*\) and \(\eta \) have the same marginals, \(\hat{\pi }\) has the same marginals as \(\pi \); therefore \(\hat{\pi }\in \Pi (\mu ,\nu )\). Finally, since \(\pi ^*\) and \(\eta \) are both optimal between the same marginals, we have

$$\begin{aligned} \sum _{(x,y)\in X\times X}c_{x,y}\pi ^*_{x,y}=\sum _{(x,y)\in X\times X}c_{x,y}\eta _{x,y}, \end{aligned}$$

thus

$$\begin{aligned} \sum _{(x,y)\in X\times X}c_{x,y}\hat{\pi }_{x,y}= & {} \sum _{(x,y)\in X\times X}c_{x,y}\pi _{x,y}-\sum _{(x,y)\in X\times X}c_{x,y}\pi ^*_{x,y}\\&\quad&+\sum _{(x,y)\in X\times X}c_{x,y}\eta _{x,y}\\= & {} \sum _{(x,y)\in X\times X}c_{x,y}\pi _{x,y}. \end{aligned}$$

In particular, \(\pi \) and \(\hat{\pi }\) have the same cost, therefore \(\hat{\pi }\) is an optimal transportation plan between \(\mu \) and \(\nu \).

To conclude, we notice that, since \(\pi ^*\) is a restriction of \(\pi \), we have

$$\begin{aligned} \#\mathrm{spt}(\pi )=\#\mathrm{spt}(\pi -\pi ^*)+\#\mathrm{spt}(\pi ^*)>\#\mathrm{spt}(\pi -\pi ^*)+\#\mathrm{spt}(\eta )\ge \#\mathrm{spt}(\hat{\pi }), \end{aligned}$$

so \(\hat{\pi }\) is an optimal plan whose support is strictly smaller than that of \(\pi \), which contradicts the assumption that \(\pi \) is trim. \(\square \)

Theorem 6.3 in [16] states that, whenever \(\mu \) is an absolutely continuous measure supported on a compact set \(\Omega \subset \mathbb {R}^n\) and the cost function c is a strictly convex function of the Euclidean distance, the optimal transportation plan is induced by a transportation map, regardless of the regularity of \(\nu \). When \(\mu \) and \(\nu \) are both discrete, this result is generally false. However, in Theorem 3 below, we show that there exists at least one optimal transportation plan between two discrete measures that can be obtained from the action of two functions, one acting from a subset \(\tilde{X}\subset \mathrm{spt}(\mu )\) to \( \mathrm{spt}(\nu )\) and one acting from a subset \(\tilde{Y}\subset \mathrm{spt}(\nu )\) to \(\mathrm{spt}(\mu )\).

Theorem 3

Let X be a finite set, endowed with the discrete topology, and let \(\mu \) and \(\nu \) be two positive measures over X such that

$$\begin{aligned} \mu _{a}>0 \quad \quad \quad \forall a \in X, \\ \nu _b>0 \quad \quad \quad \forall b \in X, \end{aligned}$$

and

$$\begin{aligned} \sum _{a\in X }\mu _a=\sum _{b \in X}\nu _b. \end{aligned}$$

Given a cost function \(c:X\times X \rightarrow \mathbb {R}\), let \(\pi \) be a trim solution of the transportation problem. Then there exist two pairs of nonnegative measures \((\mu ^{(d)},\mu ^{(c)})\) and \((\nu ^{(d)},\nu ^{(c)})\) and two functions \(h^{(1)},h^{(2)}:X\rightarrow X\) such that

$$\begin{aligned} \mu&=\mu ^{(d)}+\mu ^{(c)}\quad \text {and}\quad \nu =\nu ^{(d)}+\nu ^{(c)}, \end{aligned}$$
(5)
$$\begin{aligned} \pi&=(Id,h^{(1)})_\#\mu ^{(d)}+(h^{(2)},Id)_\#\nu ^{(d)}. \end{aligned}$$
(6)

We say that the decomposition provided by Theorem 3 is a diffusive model associated with the given (trim) solution \(\pi \). We call \(\mu ^{(d)}\) and \(\nu ^{(d)}\) the diffusive parts of \(\mu \) and \(\nu \), respectively, and \(\mu ^{(c)}\) and \(\nu ^{(c)}\) the concentrating parts of \(\mu \) and \(\nu \), respectively. Finally, we call \(h^{(1)}\) the diffusive scheme of \(\mu \) and \(h^{(2)}\) the diffusive scheme of \(\nu \).

Proof

We proceed by induction on the cardinality of X. If \(\#X=1\), then \(X=\{a\}\) and \(\pi =\mu _a\delta _{(a,a)}\), so it suffices to take \(\mu ^{(d)}=\mu \), \(\mu ^{(c)}=0\), \(\nu ^{(d)}=0\), \(\nu ^{(c)}=\nu \), and \(h^{(1)}(a)=a\).

Let us now assume that the statement holds for every pair of measures whose supports have cardinality \(n-1\), and let \(\mu \) and \(\nu \) be two measures supported on a set \(X_n\) with cardinality n. Given a trim solution \(\pi \), it is well known (Chapter 7, [12]) that

$$\begin{aligned} \# \mathrm{spt} (\pi )\le 2n-1. \end{aligned}$$

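Since \(\mu _a>0\) for every \(a\in X\), each of the n points of X sends mass to at least one point; if each of them sent mass to at least two points, we would have \(\#\mathrm{spt}(\pi )\ge 2n\). Hence, we can find \(\bar{a}\in X\) such that there exists a unique \(\bar{b}\in \mathrm{spt}(\nu )\) for which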

$$\begin{aligned} \pi _{\bar{a},\bar{b}}>0, \end{aligned}$$

hence \(\mu _{\bar{a}}=\pi _{\bar{a},\bar{b}}\le \nu _{\bar{b}}\). Similarly, we can find \(\underline{b}\in X\) such that there exists a unique \(\underline{a}\in \mathrm{spt}(\mu )\) for which

$$\begin{aligned} \pi _{\underline{a},\underline{b}}>0, \end{aligned}$$

so that \(\nu _{\underline{b}}=\pi _{\underline{a},\underline{b}}\le \mu _{\underline{a}}\).

If \(\mu _{\bar{a}}=\pi _{\bar{a},\bar{b}} = \nu _{\bar{b}}\), we can restrict the plan \(\pi \) to the set \(\mathrm{spt}(\pi )\backslash \{(\bar{a},\bar{b})\}\). We denote this restriction with \(\pi _*\). By definition, the marginals of \(\pi _*\) are

$$\begin{aligned} \mu _*=\mu -\mu _{\bar{a}}\delta _{\bar{a}} \end{aligned}$$

and

$$\begin{aligned} \nu _*=\nu -\nu _{\bar{b}}\delta _{\bar{b}}. \end{aligned}$$

In particular, the supports of \(\mu _*\) and \(\nu _*\) contain \(n-1\) points each and, by Lemma 2, \(\pi _*\) is a trim solution between them. By the inductive hypothesis, we can find \((\mu ^{(d)}_*,\mu ^{(c)}_*)\), \((\nu ^{(d)}_*,\nu ^{(c)}_*)\), and \((h^{(1)}_*,h^{(2)}_*)\) such that

$$\begin{aligned} \mu _*=\mu ^{(d)}_*+\mu ^{(c)}_*, \\ \nu _*=\nu ^{(d)}_*+\nu ^{(c)}_*, \end{aligned}$$

and

$$\begin{aligned} \pi _*=(Id,h^{(1)}_*)_\#\mu ^{(d)}_* +(h^{(2)}_*,Id)_\#\nu ^{(d)}_*. \end{aligned}$$

We can then define

$$\begin{aligned} \mu ^{(d)}=\mu ^{(d)}_*+\mu _{\bar{a}}\delta _{\bar{a}}, \quad \quad \quad \mu ^{(c)}=\mu _*^{(c)}, \\ \nu ^{(d)}=\nu ^{(d)}_*, \quad \quad \quad \nu ^{(c)}=\nu _*^{(c)}+\nu _{\bar{b}}\delta _{\bar{b}}, \end{aligned}$$

and

$$\begin{aligned} h^{(1)}(a)= {\left\{ \begin{array}{ll} h^{(1)}_*(a)\quad \quad \quad \text {if }a\ne \bar{a},\\ \bar{b} \quad \quad \quad \quad \quad \; \text {otherwise,} \end{array}\right. },\quad \quad \quad \quad \quad h^{(2)}(b)=h^{(2)}_*(b). \end{aligned}$$

It is easy to see that

$$\begin{aligned} \mu =\mu ^{(d)}+\mu ^{(c)}, \quad \quad \quad \nu =\nu ^{(d)}+\nu ^{(c)} \end{aligned}$$

and, since \(h^{(1)}_\#\delta _{\bar{a}}=\delta _{\bar{b}}\), we have

$$\begin{aligned} \pi =(Id,h^{(1)})_\#\mu ^{(d)}+(h^{(2)},Id)_\#\nu ^{(d)}, \end{aligned}$$
(7)

which concludes the proof in the case \(\mu _{\bar{a}}=\pi _{\bar{a},\bar{b}}=\nu _{\bar{b}}\). We proceed similarly if \(\nu _{\underline{b}}=\pi _{\underline{a},\underline{b}}= \mu _{\underline{a}}\). See Fig. 1 for a visual representation of this process in the case \(n=3\).

Fig. 1

(Example 2) Visual comparison between the optimal plan \(\pi \), which is not trim (right), and a trim plan (left). The support of \(\mu \) is indicated by light gray dots and the support of \(\nu \) by dark gray dots; points i and j are connected by a solid edge if (i, j) belongs to the support of the plan.

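To conclude, consider the case in which \(\mu _{\bar{a}}=\pi _{\bar{a},\bar{b}} < \nu _{\bar{b}}\) and \(\nu _{\underline{b}}=\pi _{\underline{a},\underline{b}}< \mu _{\underline{a}}\). Note that \(\bar{a}\ne \underline{a}\) and \(\bar{b}\ne \underline{b}\): indeed, \(\bar{a}=\underline{a}\) would force \(\underline{b}=\bar{b}\), and thus \(\mu _{\bar{a}}=\pi _{\bar{a},\bar{b}}=\nu _{\bar{b}}\). In this case, we restrict \(\pi \) to the set \(\mathrm{spt}(\pi )\backslash \{(\bar{a},\bar{b}),(\underline{a},\underline{b})\}\). Let us denote again by \(\pi _*\) the restriction and by \(\mu _*=\mu -\mu _{\bar{a}}\delta _{\bar{a}}-\nu _{\underline{b}}\delta _{\underline{a}}\) and \(\nu _{*}=\nu -\nu _{\underline{b}}\delta _{\underline{b}}-\mu _{\bar{a}}\delta _{\bar{b}}\) its marginals. The point \(\bar{a}\) leaves the support of \(\mu _*\) and \(\underline{b}\) leaves the support of \(\nu _*\), while \(\underline{a}\) and \(\bar{b}\) keep a positive mass; hence both \(\mu _*\) and \(\nu _*\) have \(n-1\) points in their supports and, by Lemma 2, \(\pi _*\) is trim between them. By the inductive hypothesis, we can again decompose them as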

$$\begin{aligned} \mu _*=\mu ^{(d)}_*+\mu ^{(c)}_*, \quad \quad \quad \nu _*=\nu ^{(d)}_*+\nu ^{(c)}_* \end{aligned}$$

and find a couple of functions \(h^{(1)}_*,h^{(2)}_*\) for which

$$\begin{aligned} \pi _*=(Id,h^{(1)}_*)_\#\mu ^{(d)}_*+(h^{(2)}_*,Id)_\#\nu ^{(d)}_*. \end{aligned}$$

We can then define

$$\begin{aligned} \mu ^{(d)}=\mu ^{(d)}_*+\mu _{\bar{a}}\delta _{\bar{a}}, \quad \quad \quad \mu ^{(c)}=\mu _*^{(c)}+\nu _{\underline{b}}\delta _{\underline{a}}, \\ \nu ^{(d)}=\nu ^{(d)}_*+\nu _{\underline{b}}\delta _{\underline{b}}, \quad \quad \quad \nu ^{(c)}=\nu _*^{(c)}+\mu _{\bar{a}}\delta _{\bar{b}}, \end{aligned}$$

and

$$\begin{aligned} h^{(1)}(a)= {\left\{ \begin{array}{ll} h^{(1)}_*(a)\quad \quad \text {if }a\ne \bar{a},\\ \bar{b} \quad \quad \quad \quad \; \text {otherwise,} \end{array}\right. }\quad \quad h^{(2)}(b)={\left\{ \begin{array}{ll} h^{(2)}_*(b)\quad \quad \text {if }b\ne \underline{b},\\ \underline{a} \quad \quad \quad \quad \; \text {otherwise,} \end{array}\right. } \end{aligned}$$

which concludes the thesis. \(\square \)
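The induction in the proof above can be read as a peeling procedure. The Python sketch below is our illustration of it, not code from the source; the name `diffusive_model` and the encoding of plans as dictionaries `{(x, y): mass}` are assumptions. At each step it picks a point \(\bar{a}\) that sends mass to a single point \(\bar{b}\) and a point \(\underline{b}\) that receives mass from a single point \(\underline{a}\), and then applies the three cases of the proof. It assumes the input is a trim plan, so that such points exist at every step (by the bound \(\#\mathrm{spt}(\pi )\le 2n-1\) and Lemma 2); otherwise it raises an error.

```python
from fractions import Fraction


def diffusive_model(plan):
    """Peel a trim plan {(x, y): mass} into (mu_d, mu_c, nu_d, nu_c, h1, h2).

    Mirrors the induction in the proof of Theorem 3; h1 and h2 are returned
    as dicts defined on the supports of mu_d and nu_d.
    """
    plan = {k: Fraction(v) for k, v in plan.items() if v > 0}
    mu_d, mu_c, nu_d, nu_c, h1, h2 = {}, {}, {}, {}, {}, {}

    def add(measure, point, mass):
        measure[point] = measure.get(point, 0) + mass

    while plan:
        rows, cols = {}, {}
        for x, y in plan:
            rows.setdefault(x, []).append(y)
            cols.setdefault(y, []).append(x)
        # a_bar sends mass to a single point; b_low receives mass from a single point
        a_bar = next((x for x, ys in rows.items() if len(ys) == 1), None)
        b_low = next((y for y, xs in cols.items() if len(xs) == 1), None)
        if a_bar is None or b_low is None:
            raise ValueError("no leaf found: the plan is not trim")
        b_bar, a_low = rows[a_bar][0], cols[b_low][0]
        m_bar, m_low = plan[a_bar, b_bar], plan[a_low, b_low]
        nu_b_bar = sum(plan[x, b_bar] for x in cols[b_bar])    # current nu at b_bar
        mu_a_low = sum(plan[a_low, y] for y in rows[a_low])    # current mu at a_low
        # equality cases peel a single pair; the strict case peels both pairs
        peel_bar = m_bar == nu_b_bar or m_low != mu_a_low
        peel_low = m_bar != nu_b_bar
        if peel_bar:
            add(mu_d, a_bar, m_bar)
            add(nu_c, b_bar, m_bar)
            h1[a_bar] = b_bar
            del plan[a_bar, b_bar]
        if peel_low:
            add(nu_d, b_low, m_low)
            add(mu_c, a_low, m_low)
            h2[b_low] = a_low
            del plan[a_low, b_low]
    return mu_d, mu_c, nu_d, nu_c, h1, h2


# Trim optimal plan of Example 1 (cost |x - y|^2) with eps = 1/10.
eps = Fraction(1, 10)
example1 = {(0, 0): (1 - eps) / 2, (0, 1): eps / 2, (1, 1): Fraction(1, 2)}
print(diffusive_model(example1))
```

On this plan the sketch returns \(\mu ^{(d)}=\frac{1}{20}\delta _0+\frac{1}{2}\delta _1\) and \(\nu ^{(d)}=\frac{9}{20}\delta _0\). Applied to the plan of Remark 3 below (with its atoms entered in the order written there), it returns the second decomposition listed in that remark, with \(\nu ^{(d)}=\frac{1}{4}\delta _{(-1,1)}\).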

Remark 2

Given two measures as in the hypothesis of Theorem 3, let \(\mu ^{(d)}\) and \(\nu ^{(d)}\) be their diffusive parts. Since \(\mathrm{spt}(\mu ^{(d)})\subset \mathrm{spt}(\mu )\) and \(\mathrm{spt}(\nu ^{(d)})\subset \mathrm{spt}(\nu )\), the transportation plan defined by formula (7) has at most 2n points in its support. Hence, an optimal plan whose support contains more than 2n points cannot be written in the form (7), so the trim assumption in Theorem 3 cannot be dropped, as the next example shows.

Fig. 2

Visual description of the decomposition process used in the proof of Theorem 3. Here, the measures \(\mu \) and \(\nu \) are one-dimensional and have 3 points each in their supports

Example 2

Let us take

$$\begin{aligned} \mu =\dfrac{1}{4}\bigg (\delta _{(0,0,0)}+\delta _{(1,1,0)}+\delta _{(1,0,1)}+\delta _{(0,1,1)}\bigg ) \end{aligned}$$

and

$$\begin{aligned} \nu =\dfrac{1}{4}\bigg (\delta _{(1,1,1)}+\delta _{(0,0,1)}+\delta _{(0,1,0)}+\delta _{(1,0,0)}\bigg ), \end{aligned}$$

(see Fig. 2) and, as a cost function, we choose the Euclidean distance in \(\mathbb {R}^3\), i.e.

$$\begin{aligned} |{\textbf {x}}-{\textbf {y}}|:=\sqrt{\sum _{i=1}^3(x_i-y_i)^2}. \end{aligned}$$

Every point of \(\mathrm{spt}(\mu )\) is at distance at least 1 from every point of \(\mathrm{spt}(\nu )\), so every transportation plan between \(\mu \) and \(\nu \) has cost at least 1. The plan

$$\begin{aligned} \pi :=\;&\dfrac{1}{12}\delta _{(0,0,0)}\otimes \bigg (\delta _{(1,0,0)}+\delta _{(0,1,0)}+\delta _{(0,0,1)}\bigg )\\&+\dfrac{1}{12}\delta _{(1,1,0)}\otimes \bigg (\delta _{(0,1,0)}+\delta _{(1,0,0)}+\delta _{(1,1,1)}\bigg )\\&+\dfrac{1}{12}\delta _{(1,0,1)}\otimes \bigg (\delta _{(1,0,0)}+\delta _{(0,0,1)}+\delta _{(1,1,1)}\bigg )\\&+\dfrac{1}{12}\delta _{(0,1,1)}\otimes \bigg (\delta _{(0,1,0)}+\delta _{(0,0,1)}+\delta _{(1,1,1)}\bigg ) \end{aligned}$$

moves mass only between points at distance 1; hence its cost equals 1 and it is optimal. However, according to Remark 2, it cannot be decomposed as in formula (7), since

$$\begin{aligned} \# \mathrm{spt}(\pi )=12>2\# \mathrm{spt}(\mu )=8. \end{aligned}$$
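The claims in Example 2 can be verified with exact arithmetic. The Python sketch below is our illustration: it uses the sufficient optimality criterion that a plan is optimal whenever it only moves mass between points at the minimal distance 1. The plan `matching`, which pairs each point of \(\mathrm{spt}(\mu )\) with one neighbour, is a hypothetical addition not taken from the text; since every plan needs at least 4 support points (one per atom of \(\mu \)), `matching` is trim, which confirms that \(\pi \) is not.

```python
import math
from fractions import Fraction
from itertools import product

mu_pts = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
nu_pts = [(1, 1, 1), (0, 0, 1), (0, 1, 0), (1, 0, 0)]
cost = math.dist  # Euclidean distance in R^3

# Every pair is at distance 1 or sqrt(3), so every plan costs at least 1.
min_cost = min(cost(x, y) for x, y in product(mu_pts, nu_pts))

# The plan pi of Example 2: mass 1/12 along each of the 12 pairs at distance 1.
pi = {(x, y): Fraction(1, 12) for x, y in product(mu_pts, nu_pts) if cost(x, y) == min_cost}

# Hypothetical alternative: a perfect matching between neighbouring vertices.
matching = {pair: Fraction(1, 4) for pair in [((0, 0, 0), (1, 0, 0)), ((1, 1, 0), (0, 1, 0)),
                                             ((1, 0, 1), (0, 0, 1)), ((0, 1, 1), (1, 1, 1))]}


def is_optimal(plan):
    """Both marginals equal 1/4 on every point and only pairs at distance min_cost are used."""
    ok_mu = all(sum(m for (x, _), m in plan.items() if x == p) == Fraction(1, 4) for p in mu_pts)
    ok_nu = all(sum(m for (_, y), m in plan.items() if y == q) == Fraction(1, 4) for q in nu_pts)
    return ok_mu and ok_nu and all(cost(x, y) == min_cost for x, y in plan)


print(is_optimal(pi), len(pi), is_optimal(matching), len(matching))
```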

Remark 3

Given a trim solution, there might be more than one diffusive model associated with it. For example, let

$$\begin{aligned} \mu =\dfrac{1}{2}\delta _{(0,0)}+\dfrac{1}{2}\delta _{(1,1)}\quad \text {and}\quad \nu =\dfrac{1}{4}\delta _{(-1,1)}+\dfrac{3}{4}\delta _{(1,0)} \end{aligned}$$

be two discrete measures over \(\mathbb {R}^2\). As a cost function, we choose the Euclidean distance

$$\begin{aligned} c({\textbf {x}},{\textbf {y}}):=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}. \end{aligned}$$

Then, the probability measure

$$\begin{aligned} \pi =\dfrac{1}{4}\delta _{(0,0)}\otimes \delta _{(-1,1)}+\dfrac{1}{4}\delta _{(0,0)}\otimes \delta _{(1,0)}+\dfrac{1}{2}\delta _{(1,1)}\otimes \delta _{(1,0)} \end{aligned}$$

is a trim plan between \(\mu \) and \(\nu \). It is easy to check that

$$\begin{aligned} \mu ^{(d)}=\dfrac{1}{4}\delta _{(0,0)}+\dfrac{1}{2}\delta _{(1,1)},\quad&\quad&\quad \mu ^{(c)}=\dfrac{1}{4}\delta _{(0,0)},\\ \nu ^{(c)}=\dfrac{1}{4}\delta _{(-1,1)}+\dfrac{1}{2}\delta _{(1,0)},\quad&\quad&\quad \nu ^{(d)}=\dfrac{1}{4}\delta _{(1,0)}, \end{aligned}$$

and

$$\begin{aligned} h^{(1)}({\textbf {x}}):={\left\{ \begin{array}{ll} (-1,1) & \text {if } {\textbf {x}}=(0,0),\\ (1,0) & \text {if } {\textbf {x}}=(1,1),\\ (0,0) & \text {otherwise}, \end{array}\right. }\quad \quad h^{(2)}({\textbf {y}})=(0,0)\quad \forall {\textbf {y}}\in \mathbb {R}^2, \end{aligned}$$

is a diffusive model for \(\pi \). However, keeping \(\mu ^{(d)}\) and \(\mu ^{(c)}\) as above, we can also decompose \(\nu \) as

$$\begin{aligned} \tilde{\nu }^{(d)}=\dfrac{1}{4}\delta _{(-1,1)},\quad \quad \quad \tilde{\nu }^{(c)}=\dfrac{3}{4}\delta _{(1,0)}, \end{aligned}$$

define the functions as

$$\begin{aligned} h^{(1)}({\textbf {x}})=(1,0)\quad \forall {\textbf {x}}\in \mathbb {R}^2, \quad \quad \quad h^{(2)}({\textbf {y}})=(0,0) \quad \forall {\textbf {y}}\in \mathbb {R}^2, \end{aligned}$$

and still obtain an admissible decomposition of \(\pi \).
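The two decompositions in Remark 3 can be checked by assembling \((Id,h^{(1)})_\#\mu ^{(d)}+(h^{(2)},Id)_\#\nu ^{(d)}\) explicitly. The Python sketch below is our illustration; the helper names `plan_from_model` and `constant` are hypothetical.

```python
from fractions import Fraction


def plan_from_model(mu_d, h1, nu_d, h2):
    """Assemble (Id, h1)_# mu_d + (h2, Id)_# nu_d as a dict {(x, y): mass}."""
    plan = {}
    for x, m in mu_d.items():
        plan[(x, h1(x))] = plan.get((x, h1(x)), 0) + m
    for y, m in nu_d.items():
        plan[(h2(y), y)] = plan.get((h2(y), y), 0) + m
    return plan


quarter, half = Fraction(1, 4), Fraction(1, 2)
pi = {((0, 0), (-1, 1)): quarter, ((0, 0), (1, 0)): quarter, ((1, 1), (1, 0)): half}
mu_d = {(0, 0): quarter, (1, 1): half}  # the same mu^(d) in both models


def h1_first(x):
    # first model: (0,0) -> (-1,1), (1,1) -> (1,0), every other point -> (0,0)
    return {(0, 0): (-1, 1), (1, 1): (1, 0)}.get(x, (0, 0))


def constant(point):
    # hypothetical helper building the constant map x -> point
    return lambda _: point


first = plan_from_model(mu_d, h1_first, {(1, 0): quarter}, constant((0, 0)))
second = plan_from_model(mu_d, constant((1, 0)), {(-1, 1): quarter}, constant((0, 0)))
print(first == pi, second == pi)
```

Both comparisons evaluate to `True`: two different diffusive models reproduce the same trim plan.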

4 An Upper Bound for the Infinity Wasserstein Distance in the Discrete Setting

As an immediate consequence of the diffusive model decomposition (5)–(6) given in Theorem 3, we can decompose the Wasserstein distance associated with a cost function c and use it to estimate the infinity-Wasserstein distance.

Corollary 4

Let \(\mu ,\nu \in \mathcal {P}(X)\) be two discrete measures, \(c:X\times X \rightarrow [0,+\infty )\) be a cost function, and \(\pi \) be a trim solution of the transportation problem. Given a diffusive model for \(\pi \), we have

$$\begin{aligned} W_c(\mu ,\nu )=\sum _{x\in X}c(x,h^{(1)}(x))\mu ^{(d)}_x+\sum _{y\in X}c(h^{(2)}(y),y)\nu ^{(d)}_y \end{aligned}$$

and

$$\begin{aligned} \mathbb {T}_c^{(\infty )}(\pi )=\max \bigg \{||c(x,h^{(1)}(x))||_{L_{\mu ^{(d)}}^{\infty }},||c(h^{(2)}(y),y)||_{L^{\infty }_{\nu ^{(d)}}}\bigg \}. \end{aligned}$$

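where \(\mathbb {T}_c^{(\infty )}(\pi ):=||c ||_{L^{\infty }_\pi }\). In particular, since \(c\ge 0\), every weight \(\mu ^{(d)}_x\) and \(\nu ^{(d)}_y\) appearing in the sums above is at least \(\alpha \), and \(W^{(\infty )}_c(\mu ,\nu )\le \mathbb {T}_c^{(\infty )}(\pi )\), we have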

$$\begin{aligned} W_c(\mu ,\nu )\ge \alpha W^{(\infty )}_c(\mu ,\nu ), \end{aligned}$$
(8)

where

$$\begin{aligned} \alpha =\min _{a\in \mathrm{spt}(\mu ^{(d)}),b\in \mathrm{spt}(\nu ^{(d)})}\{\nu ^{(d)}_b,\mu ^{(d)}_a\}. \end{aligned}$$
(9)
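Given a diffusive model, the quantities in Corollary 4 and the constant \(\alpha \) of (9) are finite sums, maxima, and minima over the atoms of \(\mu ^{(d)}\) and \(\nu ^{(d)}\). The Python sketch below (our illustration; the function name and the dictionary encoding are assumptions) evaluates them for the first diffusive model of Remark 3 with the Euclidean cost.

```python
import math


def corollary4_quantities(mu_d, h1, nu_d, h2, cost):
    """From a diffusive model, return (W_c, T_c^(inf)(pi), alpha) as in Corollary 4 and (9)."""
    # each atom of mu^(d) (resp. nu^(d)) moves along one pair (x, h1[x]) (resp. (h2[y], y))
    pairs = [(cost(x, h1[x]), m) for x, m in mu_d.items()]
    pairs += [(cost(h2[y], y), m) for y, m in nu_d.items()]
    w_c = sum(c * m for c, m in pairs)
    t_inf = max(c for c, _ in pairs)
    alpha = min(m for _, m in pairs)
    return w_c, t_inf, alpha


# First diffusive model of Remark 3, Euclidean cost in R^2.
mu_d = {(0, 0): 0.25, (1, 1): 0.5}
h1 = {(0, 0): (-1, 1), (1, 1): (1, 0)}
nu_d = {(1, 0): 0.25}
h2 = {(1, 0): (0, 0)}

w_c, t_inf, alpha = corollary4_quantities(mu_d, h1, nu_d, h2, math.dist)
print(w_c, t_inf, alpha, w_c >= alpha * t_inf)
```

Here \(W_c(\mu ,\nu )=\sqrt{2}/4+3/4\approx 1.104\), \(\mathbb {T}^{(\infty )}_c(\pi )=\sqrt{2}\), and \(\alpha =1/4\), consistent with (8).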

The value \(\alpha \) defined in (9) depends on the particular diffusive model we choose. However, since \(W_c(\mu ,\nu )\) and \(W^{(\infty )}_c(\mu ,\nu )\) do not depend on this choice, any lower bound on \(\alpha \) that holds for some diffusive model can replace \(\alpha \) in the estimate (8).

Corollary 5

Let \(\mu ,\nu \in \mathcal {P}(X)\) be two discrete measures and \(c:X \times X\rightarrow \mathbb {R}_+\) be a cost function. For any trim plan \(\pi \), there exists a diffusive model for which

$$\begin{aligned} \alpha \ge \min _{(A,B)\in K(\mu ,\nu )} \bigg \{ \bigg |\sum _{x\in A} \mu _x-\sum _{y\in B}\nu _y \bigg |\bigg \}, \end{aligned}$$
(10)

where \(\alpha \) is defined in relation (9) and

$$\begin{aligned} K(\mu ,\nu ):=\bigg \{(A,B)\,:\, A,B\subset X\quad \text {and}\quad \bigg |\sum _{x\in A}\mu _x-\sum _{y\in B}\nu _y\bigg |>0\bigg \}. \end{aligned}$$

Proof

Let n be the cardinality of X. Since \(\pi \) is trim between \(\mu \) and \(\nu \), we have \(\#\mathrm{spt}(\pi )\le 2n-1\), hence we can find \(\bar{x}_1\) such that

$$\begin{aligned} \exists ! \; \;\bar{y}_1 \quad \text {s.t.} \quad \pi _{\bar{x}_1,\bar{y}_1}\ne 0 \end{aligned}$$

and \(\underline{y}_1\) such that

$$\begin{aligned} \exists !\; \; \underline{x}_1 \quad \text {s.t.} \quad \pi _{\underline{x}_1,\underline{y}_1}\ne 0. \end{aligned}$$

If \(\underline{x}_1=\bar{x}_1\) (and hence \(\underline{y}_1=\bar{y}_1\)), we have \(\mu _{\bar{x}_1}=\nu _{\bar{y}_1}\) and we define

$$\begin{aligned} \mu ^{(d)}_{\bar{x}_1}=\mu _{\bar{x}_1}, \quad \quad \quad \nu ^{(c)}_{\bar{y}_1}=\mu _{\bar{x}_1}, \end{aligned}$$

and

$$\begin{aligned} \mu ^{(1)}:=\mu -\mu _{\bar{x}_1}\delta _{\bar{x}_1},\quad \nu ^{(1)}:=\nu -\nu _{\bar{y}_1}\delta _{\bar{y}_1}, \quad \pi ^{(1)}=\pi -\pi _{\bar{x}_1,\bar{y}_1}\delta _{\bar{x}_1,\bar{y}_1}. \end{aligned}$$

Otherwise, if \(\underline{x}_1\ne \bar{x}_1\) (and hence \(\underline{y}_1\ne \bar{y}_1\)), we set

$$\begin{aligned} \mu ^{(d)}_{\bar{x}_1}=\mu _{\bar{x}_1},\quad&\quad&\quad \mu ^{(c)}_{\underline{x}_1}=\nu _{\underline{y}_1},\\ \nu ^{(d)}_{\underline{y}_1}=\nu _{\underline{y}_1},\quad&\quad&\quad \nu ^{(c)}_{\bar{y}_1}=\mu _{\bar{x}_1}, \end{aligned}$$

and

$$\begin{aligned} \mu ^{(1)}= & {} \mu -\mu _{\bar{x}_1}\delta _{\bar{x}_1}-\nu _{\underline{y}_1}\delta _{\underline{x}_1},\\ \nu ^{(1)}= & {} \nu -\nu _{\underline{y}_1}\delta _{\underline{y}_1}-\mu _{\bar{x}_1}\delta _{\bar{y}_1}\\ \pi ^{(1)}= & {} \pi -\pi _{\bar{x}_1,\bar{y}_1}\delta _{\bar{x}_1,\bar{y}_1}-\pi _{\underline{x}_1,\underline{y}_1}\delta _{\underline{x}_1,\underline{y}_1}. \end{aligned}$$

In both cases we find two measures, \(\mu ^{(1)}\) and \(\nu ^{(1)}\), whose support has at most \(n-1\) points. Since \(\pi ^{(1)}\) is a restriction of a trim plan, by Lemma 2, also \(\pi ^{(1)}\) is trim between its marginals \(\mu ^{(1)}\) and \(\nu ^{(1)}\). Therefore, we can repeat the process, finding two points \(\bar{x}_2\) and \(\underline{y}_2\) for which

$$\begin{aligned} \exists ! \;\; \bar{y}_2 \quad \text {s.t.} \quad \pi _{\bar{x}_2,\bar{y}_2}\ne 0 \end{aligned}$$

and

$$\begin{aligned} \exists ! \;\; \underline{x}_2\quad \text {s.t.} \quad \pi _{\underline{x}_2,\underline{y}_2}\ne 0. \end{aligned}$$

We can then extend the definition of the measures \(\mu ^{(d)},\mu ^{(c)},\nu ^{(d)}\), and \(\nu ^{(c)}\), define the measures \(\mu ^{(2)}\), \(\nu ^{(2)}\), and \(\pi ^{(2)}\) and start all over again.

At each step, we define two measures \(\mu ^{(i)}\) and \(\nu ^{(i)}\) and increase the cardinality of the supports of \(\mu ^{(d)},\mu ^{(c)},\nu ^{(d)}\), and \(\nu ^{(c)}\). Given any \(x \in \mathrm{spt}(\mu ^{(d)})\), we can then find \(i\in \{0,1,\dots ,n-1\}\) such that

$$\begin{aligned} \mu ^{(d)}_x=\mu ^{(i)}_x, \end{aligned}$$
(11)

and, similarly, for any \(y\in \mathrm{spt}(\nu ^{(d)})\), we can find a \(j\in \{0,1,\dots ,n-1\}\) such that

$$\begin{aligned} \nu ^{(d)}_y=\nu ^{(j)}_y, \end{aligned}$$

with the convention \(\mu ^{(0)}=\mu \) and \(\nu ^{(0)}=\nu \). The relation between \(\mu ^{(i)}\) and \(\mu ^{(i+1)}\) is either

$$\begin{aligned} \mu ^{(i+1)}=\mu ^{(i)}-\mu ^{(i)}_{\bar{x}_{i+1}}\delta _{\bar{x}_{i+1}} \end{aligned}$$

or

$$\begin{aligned} \mu ^{(i+1)}=\mu ^{(i)}-\mu ^{(i)}_{\bar{x}_{i+1}}\delta _{\bar{x}_{i+1}}-\nu ^{(i)}_{\underline{y}_{i+1}}\delta _{\underline{x}_{i+1}}. \end{aligned}$$

Similarly, we have

$$\begin{aligned} \nu ^{(i+1)}=\nu ^{(i)}-\nu ^{(i)}_{\underline{y}_{i+1}}\delta _{\underline{y}_{i+1}} \end{aligned}$$

or

$$\begin{aligned} \nu ^{(i+1)}=\nu ^{(i)}-\nu ^{(i)}_{\underline{y}_{i+1}}\delta _{\underline{y}_{i+1}}-\mu ^{(i)}_{\bar{x}_{i+1}}\delta _{\bar{y}_{i+1}}. \end{aligned}$$

In the same way, we can write \(\mu ^{(i)}\) and \(\nu ^{(i)}\) as functions of \(\mu ^{(i-1)}\) and \(\nu ^{(i-1)}\), and then express \(\mu ^{(i+1)}\) through \(\mu ^{(i-1)}\) and \(\nu ^{(i-1)}\) as

$$\begin{aligned} \mu ^{(i+1)}_x=\sum _{a\in \tilde{A}_2}\mu ^{(i-1)}_a-\sum _{b\in \tilde{B}_2}\nu ^{(i-1)}_b, \end{aligned}$$
(12)

where \(\tilde{A}_2\) and \(\tilde{B}_2\) are two subsets of X whose cardinality is at most two. By iterating this process, we are able to find

$$\begin{aligned} \mu ^{(i+1)}_x=\sum _{a\in \tilde{A}_{n-(i+1)}}\mu _a-\sum _{b\in \tilde{B}_{n-(i+1)}}\nu _b, \end{aligned}$$
(13)

where \(\tilde{A}_{n-(i+1)}\) and \(\tilde{B}_{n-(i+1)}\) are subsets of X whose cardinality is \(n-(i+1)\). Since the left-hand side of (13) is positive, we can rewrite (13) as

$$\begin{aligned} \mu ^{(i+1)}_x=\bigg |\sum _{a\in \tilde{A}_{n-(i+1)}}\mu _a-\sum _{b\in \tilde{B}_{n-(i+1)}}\nu _b\bigg |. \end{aligned}$$
(14)

In particular, the pair \((\tilde{A}_{n-(i+1)},\tilde{B}_{n-(i+1)})\) belongs to \(K(\mu ,\nu )\), so the right-hand side of (14) is bounded from below by the minimum over \(K(\mu ,\nu )\). Hence

$$\begin{aligned} \mu ^{(i)}_x\ge \min _{(A,B)\in K(\mu ,\nu )}\bigg \{\bigg |\sum _{a\in A}\mu _a-\sum _{b\in B}\nu _b\bigg |\bigg \}, \end{aligned}$$

for every \(i \in \{0,1,\dots ,n-1\}\) and every \(x\in \mathrm{spt}(\mu ^{(i)})\). Therefore, by relation (11), we get

$$\begin{aligned} \mu ^{(d)}_x\ge \min _{(A,B)\in K(\mu ,\nu )}\bigg \{\bigg |\sum _{a\in A}\mu _a-\sum _{b\in B}\nu _b\bigg |\bigg \}\qquad \forall x\in \mathrm{spt}(\mu ^{(d)}). \end{aligned}$$

Similarly, one can prove

$$\begin{aligned} \nu _y^{(d)}\ge \min _{(A,B)\in K(\mu ,\nu )}\bigg \{\bigg |\sum _{a\in A}\mu _a-\sum _{b\in B}\nu _b\bigg |\bigg \}, \end{aligned}$$

for each \(y \in \mathrm{spt}(\nu ^{(d)})\), hence relation (10) is proven. \(\square \)
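For small supports, the right-hand side of (10) can be computed by brute force over all pairs of subsets (A, B), which requires on the order of \(4^n\) comparisons for \(n=\#X\). The Python sketch below is our illustration of this direct enumeration, not an algorithm from the source; measures are passed as lists of atom weights.

```python
from fractions import Fraction
from itertools import combinations


def subset_sums(weights):
    """All values sum(weights[i] for i in A) over subsets A of the index set."""
    sums = set()
    for r in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), r):
            sums.add(sum((weights[i] for i in subset), Fraction(0)))
    return sums


def corollary5_bound(mu, nu):
    """Right-hand side of (10): smallest positive |mu(A) - nu(B)| over subsets A, B of X."""
    gaps = {abs(a - b) for a in subset_sums(mu) for b in subset_sums(nu)}
    return min(g for g in gaps if g > 0)


# Measures of Remark 3 and of Example 1 (eps = 1/10), listed by atom weights.
print(corollary5_bound([Fraction(1, 2)] * 2, [Fraction(1, 4), Fraction(3, 4)]))
eps = Fraction(1, 10)
print(corollary5_bound([Fraction(1, 2)] * 2, [(1 - eps) / 2, (1 + eps) / 2]))
```

For the measures of Remark 3 the sketch returns 1/4, which coincides with \(\alpha \) for both diffusive models listed there; for Example 1 with \(\epsilon =1/10\) it returns \(\epsilon /2=1/20\).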

In Corollary 4, we bound \(W_c^{(\infty )}\) from above in terms of \(W_c\). Moreover, since \(W^{(\infty )}_c\) is defined through an \(L^{\infty }\) norm, it behaves well under powers of the cost, which allows us to relate it to the Wasserstein cost induced by any p-th power of the same cost function.

Lemma 6

Let \(\mu ,\nu \in \mathcal {P}(X)\) and let \(c:X\times X \rightarrow \mathbb {R}_+\) be a cost function. Then, for every \(p>0\),

$$\begin{aligned} W^{(\infty )}_{c^p}(\mu ,\nu )=\big (W^{(\infty )}_c(\mu ,\nu )\big )^p. \end{aligned}$$

Proof

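Since X is finite, \(||c ||_{L^{\infty }_\pi }=\max \{c_{x,y}:\pi _{x,y}>0\}\) takes only finitely many values as \(\pi \) ranges over \(\Pi (\mu ,\nu )\); hence there exists \(\pi \in \Pi (\mu ,\nu )\) such that

$$\begin{aligned} ||c ||_{L^{\infty }_\pi }=W_c^{(\infty )}(\mu ,\nu ). \end{aligned}$$

Since \(t\mapsto t^p\) is increasing on \([0,\infty )\), we have \(||c^p ||_{L^{\infty }_\pi }=||c ||^p_{L^{\infty }_\pi }\), and thus

$$\begin{aligned} W^{(\infty )}_{c^p}(\mu ,\nu )\le ||c^p ||_{L^{\infty }_\pi }=||c ||^p_{L^{\infty }_\pi }=\big (W_c^{(\infty )}(\mu ,\nu )\big )^p. \end{aligned}$$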

Similarly, one can prove \(\big (W_c^{(\infty )}(\mu ,\nu )\big )^p\le W^{(\infty )}_{c^p}(\mu ,\nu )\) and conclude the thesis. \(\square \)
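For finite measures, \(||c ||_{L^{\infty }_\pi }\) only depends on which pairs carry mass, so \(W^{(\infty )}_c(\mu ,\nu )\) is the smallest value t of the cost such that \(\mu \) can be transported onto \(\nu \) using only pairs (x, y) with \(c_{x,y}\le t\); this feasibility question is a maximum-flow problem. The Python sketch below is our illustration of this standard threshold approach (it is not taken from the source, and all names are assumptions). Since the scan only uses the ordering of the cost values, which \(t\mapsto t^p\) preserves, it also illustrates Lemma 6.

```python
from collections import deque
from fractions import Fraction


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow on a dense capacity matrix."""
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    value = 0
    while True:
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] is None:  # BFS in the residual graph
            u = queue.popleft()
            for v in range(n):
                if parent[v] is None and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] is None:
            return value
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += push
            flow[v][u] -= push
        value += push


def bottleneck(mu, nu, cost):
    """W^(inf)_c(mu, nu) for weight lists of equal total mass and a cost matrix:
    the smallest cost value t admitting a plan supported on {cost <= t}."""
    n, m = len(mu), len(nu)
    s, t = n + m, n + m + 1
    total = sum(mu)
    for thr in sorted({c for row in cost for c in row}):
        cap = [[0] * (n + m + 2) for _ in range(n + m + 2)]
        for i in range(n):
            cap[s][i] = mu[i]
            for j in range(m):
                if cost[i][j] <= thr:
                    cap[i][n + j] = total  # large enough to never be binding
        for j in range(m):
            cap[n + j][t] = nu[j]
        if max_flow(cap, s, t) == total:
            return thr
    raise ValueError("mu and nu must have the same total mass")


# Lemma 6 check: atoms of mass 1/3 at 0, 1, 3 and at 1, 2, 5, cost |x - y|.
third = Fraction(1, 3)
xs, ys = [0, 1, 3], [1, 2, 5]
c = [[abs(x - y) for y in ys] for x in xs]
for p in (1, 2, 3):
    c_p = [[v ** p for v in row] for row in c]
    print(p, bottleneck([third] * 3, [third] * 3, c_p), bottleneck([third] * 3, [third] * 3, c) ** p)
```

For this example \(W^{(\infty )}_c=2\), and the two printed columns agree for p = 1, 2, 3, as predicted by Lemma 6. Exact `Fraction` weights avoid spurious infeasibility caused by rounding.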

Thanks to Lemma 6, we are able to prove the following result.

Theorem 7

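Given a cost function \(c:X\times X\rightarrow [0,\infty )\), let \(\mu ,\nu \in \mathcal {P}(X)\) be two discrete measures. For any \(p\ge 1\), set \(c_p:=c^p\) and, in analogy with (3), \(W_{c_p}(\mu ,\nu ):=\big (\min _{\pi \in \Pi (\mu ,\nu )}\mathbb {T}_{c_p}(\pi )\big )^{\frac{1}{p}}\). Then

$$\begin{aligned} W^{(\infty )}_{c}(\mu ,\nu )\le \frac{W_{c_p}(\mu ,\nu )}{(\alpha _p)^{\frac{1}{p}}}, \end{aligned}$$
(15)

where \(\alpha _p\) is the constant defined in (9) for a diffusive model of a trim optimal plan with respect to the cost \(c_p\).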

Proof

Given \(p\ge 1\), let \(\pi ^{(p)}\) be a trim optimal transportation plan between \(\mu \) and \(\nu \) for the cost function \(c_p\) and, given a diffusive model for \(\pi ^{(p)}\), let \(\alpha _p\) be the constant defined in (9). From Lemma 6 we have

$$\begin{aligned} W^{(\infty )}_{c_p}(\mu ,\nu )=(W^{(\infty )}_c(\mu ,\nu ))^p, \end{aligned}$$

hence, applying (8) to the cost \(c_p\), for any p we have

$$\begin{aligned} (W^{(\infty )}_c(\mu ,\nu ))^p=W^{(\infty )}_{c_p}(\mu ,\nu )\le \frac{W^p_{c_p}(\mu ,\nu )}{\alpha _p}, \end{aligned}$$

i.e.,

$$\begin{aligned} W^{(\infty )}_c(\mu ,\nu )\le \frac{W_{c_p}(\mu ,\nu )}{(\alpha _p)^{\frac{1}{p}}}. \end{aligned}$$

\(\square \)

In particular, let \(\alpha \) denote the right-hand side of (10). By Corollary 5, for every p the diffusive model can be chosen so that \(\alpha _p\ge \alpha \); since \(\alpha \) does not depend on the cost function but only on the measures \(\mu \) and \(\nu \), we have

$$\begin{aligned} W^{(\infty )}_c(\mu ,\nu )\le \frac{W_{c_p}(\mu ,\nu )}{(\alpha )^{\frac{1}{p}}} \end{aligned}$$

for any \(p\ge 1\). In particular, if we take

$$\begin{aligned} c({\textbf {x}},{\textbf {y}}):=\sqrt{\sum _{i=1}^n|x_i-y_i |^2}, \end{aligned}$$

we recover the bound proposed in Theorem 1 for discrete measures.
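As a worked illustration (our own computation, under the choices stated here), consider again the measures of Example 1 with \(\epsilon \in (0,1/2]\), the cost \(c(x,y)=|x-y|\) on \(\mathbb {R}\), and p = 2, so that the quantity \(W_{c_p}(\mu ,\nu _\epsilon )\) in (15) is \(W_2(\mu ,\nu _\epsilon )=(\epsilon /2)^{\frac{1}{2}}\). The plan \(\pi =\frac{1-\epsilon }{2}\delta _{(0,0)}+\frac{\epsilon }{2}\delta _{(0,1)}+\frac{1}{2}\delta _{(1,1)}\) has cost \(\epsilon /2=W_2^2(\mu ,\nu _\epsilon )\), and it is trim because a plan supported on only 2 points would have to pair each atom of \(\mu \) with an atom of \(\nu _\epsilon \) of equal mass, which is impossible since \((1\pm \epsilon )/2\ne 1/2\). Following the proof of Theorem 3, one diffusive model is \(\mu ^{(d)}=\frac{\epsilon }{2}\delta _0+\frac{1}{2}\delta _1\), \(\nu ^{(d)}=\frac{1-\epsilon }{2}\delta _0\), with \(h^{(1)}\equiv 1\) and \(h^{(2)}\equiv 0\); hence \(\alpha _2=\min \{\epsilon /2,1/2,(1-\epsilon )/2\}=\epsilon /2\), which also coincides with the right-hand side of (10). Therefore (15) is attained:

$$\begin{aligned} W^{(\infty )}(\mu ,\nu _\epsilon )=1=\frac{(\epsilon /2)^{\frac{1}{2}}}{(\epsilon /2)^{\frac{1}{2}}}=\frac{W_2(\mu ,\nu _\epsilon )}{(\alpha _2)^{\frac{1}{2}}}. \end{aligned}$$

In contrast with (4), the factor \(1/\alpha _2=2/\epsilon \) depends on both measures, which is exactly what Example 1 shows to be necessary.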

Remark 4

The estimate in (15) is sharp. To prove it, let us take

$$\begin{aligned} \mu =\delta _a \quad \quad \text {and}\quad \quad \nu =\delta _b \end{aligned}$$

where \(a,b\in \mathbb {R}^n\). By definition (9), we have \(\alpha =1\). Moreover, it is easy to see that

$$\begin{aligned} W^{(\infty )}(\mu ,\nu )=|a-b|\quad \quad \text {and}\quad \quad W_p(\mu ,\nu )=|a-b|, \end{aligned}$$

which shows that equality holds in (15) for every \(p\ge 1\), as well as in (8); hence both estimates are sharp.