1 Introduction

The usual Monge–Kantorovich optimal transport problem consists, given a transportation cost and distributions of sources and targets, in finding a transport plan that minimizes the average transport cost. It has attracted a considerable amount of attention in the last three decades, as can be seen from the textbooks of Villani [17, 18] and Santambrogio [15]. In optimal transport problems with a supremal cost (also called \(L^\infty \) optimal transport), one rather looks for transport plans which minimize the essential supremum of the cost. Whereas the usual Monge–Kantorovich problem is a linear programming problem, \(L^\infty \) optimal transport leads to non-convex optimization (even though the supremal cost has convex sublevel sets), which, to a large extent, explains why there are fewer theoretical results and numerical methods (with the notable exception of the recent combinatorial approach of Bansil and Kitagawa [1]) to address it. As in the Calculus of Variations with a supremal functional, \(L^\infty \) optimal transport may admit many minimizers, and selecting special ones which satisfy tractable optimality conditions is an important issue, first studied by Champion, De Pascale and Juutinen in [6]. In contrast with the classical Monge–Kantorovich problem, where restrictions of optimal plans remain optimal between their marginals, this may fail for \(L^\infty \) optimal transport. This is why the authors of [6] introduced the notion of restrictable optimal plans and showed that such restrictable solutions are characterized by a remarkable property of \(\infty \)-cyclical monotonicity of their support. This was the starting point for the existence of optimal maps for \(L^\infty \) optimal transport under various conditions on the cost and the marginals, see [3, 6, 10].

Among numerical methods for optimal transport (see Peyré and Cuturi [14], Benamou [2], Mérigot and Thibert [12]), the entropic penalization approach and the Sinkhorn algorithm have gained a lot of popularity since Cuturi’s paper [7]. Entropic optimal transport, which has connections with large deviations and the so-called Schrödinger bridge problem (see Léonard [11]), has also stimulated an intensive stream of recent theoretical research, see the lecture notes of Nutz [13] and the references therein. A recent breakthrough in the field is the work of Bernton, Ghosal and Nutz [8], where a large deviations principle, related to cyclical monotonicity, is established for entropic optimal plans.

The goal of the present paper is to investigate, theoretically and numerically, whether the entropic approximation strategy can be used for \(L^\infty \) optimal transport as well. We will in particular see how the results of [8] can be used to show that this approximation selects at the limit the distinguished restrictable \(\infty \)-cyclically monotone minimizers introduced in [6].

The article is organized as follows. The setting is introduced in Sect. 2. Section 3 is devoted to \(\Gamma \)-convergence towards the supremal cost functional. In Sect. 4, we study how the entropic penalization selects \(\infty \)-cyclically monotone plans in the limit. In Sect. 5, we give some quantitative convergence estimates and a large deviations upper bound in the spirit of [8]. Finally, we present some numerical illustrations in Sect. 6.

2 Assumptions and Notations

In the sequel, we will always assume that the transportation cost is continuous and nonnegative, \(c\in C(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}_+)\), and that the fixed marginals of the problem, \(\mu ,\nu \), are two Borel probability measures on \(\mathbb {R}^d\) with compact support, \(\mu ,\nu \in \mathcal {P}(\mathbb {R}^d)\). Let \(\Pi (\mu , \nu )\) be the set of transport plans between \(\mu \) and \(\nu \), i.e. the set of Borel probability measures on \(\mathbb {R}^d\times \mathbb {R}^d\) having \(\mu \) and \(\nu \) as marginals. More precisely, a Borel probability measure \(\gamma \) on \(\mathbb {R}^d \times \mathbb {R}^d\) belongs to \(\Pi (\mu , \nu )\) when

$$\begin{aligned} \gamma (A\times \mathbb {R}^d)=\mu (A) \text{ and } \gamma (\mathbb {R}^d\times A)=\nu (A), \end{aligned}$$

for every Borel subset A of \(\mathbb {R}^d\). Note that every \(\gamma \) in \(\Pi (\mu , \nu )\) has its support in \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\) and that c is uniformly continuous on \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\). We are interested in the following \(\infty \)-optimal transport problem (see [6]):

$$\begin{aligned} \inf _{\gamma \in \Pi (\mu , \nu )}\, \gamma -\hbox {ess sup}\,c \end{aligned}$$
(\(\infty \)-OT)

In contrast with classical optimal transport where one minimizes an integral cost,

$$\begin{aligned} \inf _{\gamma \in \Pi (\mu , \nu )} \int _{\mathbb {R}^d\times \mathbb {R}^d}c(x,y)d\gamma , \end{aligned}$$
(OT)

(\(\infty \)-OT) is a non-convex and presumably harder problem.

Due to the success of entropic approximation of optimal transport with regularization parameter \(\varepsilon >0\)

$$\begin{aligned} \inf _{\gamma \in \Pi (\mu , \nu )} \int _{\mathbb {R}^d\times \mathbb {R}^d}c(x,y)\,d\gamma +\varepsilon H(\gamma \vert \mu \otimes \nu ) \end{aligned}$$
(\(\varepsilon \)-EOT)

recalled in the introduction, it seems natural to introduce, for \(\varepsilon >0\) and exponent \(p\ge 1\) the following functional \(J_{p,\varepsilon }:\mathcal {P}(\mathbb {R}^d\times \mathbb {R}^d)\rightarrow \mathbb {R}\cup \{+\infty \}\)

$$\begin{aligned} J_{p,\varepsilon }(\gamma ):={\left\{ \begin{array}{ll} \left( \int _{\mathbb {R}^d\times \mathbb {R}^d} c(x,y)^pd\gamma (x,y)+\varepsilon H(\gamma |\mu \otimes \nu ) \right) ^{\frac{1}{p}} \quad &{}\text {if} \ \gamma \in \Pi (\mu ,\nu ),\\ +\infty \quad &{}\text {otherwise,} \end{array}\right. } \end{aligned}$$

where H stands for relative entropy:

$$\begin{aligned} H(\gamma |\mu \otimes \nu )={\left\{ \begin{array}{ll} \int _{\mathbb {R}^d\times \mathbb {R}^d} \log \Big ( \frac{ \text {d} \gamma }{\text {d} \mu \otimes \nu }\Big ) \text{ d } \gamma &{} \text{ if } \gamma \ll \mu \otimes \nu , \\ + \infty \quad &{} \text{ otherwise. } \end{array}\right. } \end{aligned}$$

Note that due to strict convexity of the entropy, for every \(\varepsilon >0\) and \(p\ge 1\), \(J_{p, \varepsilon }\) admits a unique minimizer. We now denote by \(J_{\infty }:\mathcal {P}(\mathbb {R}^d\times \mathbb {R}^d)\rightarrow \mathbb {R}\cup \{+\infty \}\), the supremal functional

$$\begin{aligned} J_\infty (\gamma ):={\left\{ \begin{array}{ll} \gamma -\hbox {ess sup}\,c \quad &{}\text {if} \ \gamma \in \Pi (\mu ,\nu ),\\ +\infty \quad &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Finally, let us set

$$\begin{aligned} J_{p}:=J_{p,1}. \end{aligned}$$

Since \(H(\gamma \vert \mu \otimes \nu )\ge 0\), with equality exactly when \(\gamma =\mu \otimes \nu \), we have \(J_{p, \varepsilon }(\gamma ) \ge \Vert c\Vert _{L^p(\gamma )}\), but also \(\Vert c\Vert _{L^p(\gamma )} \le J_{\infty }(\gamma )\). Roughly speaking, the two approximations thus act in opposite directions: adding the entropic term is an approximation from above, whereas approximating \(\Vert c\Vert _{L^{\infty }(\gamma )}\) by \(\Vert c\Vert _{L^p(\gamma )}\) is an approximation from below.
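These two inequalities can be checked numerically on a small discrete example; the sketch below uses our own toy marginals, cost and plan (all names are illustrative, not part of the analysis above).

```python
import numpy as np

def relative_entropy(gamma, mu, nu):
    """H(gamma | mu x nu) for a discrete plan (finite here: all reference weights are positive)."""
    prod = np.outer(mu, nu)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / prod[mask])))

def J_p_eps(gamma, c, mu, nu, p, eps):
    """(int c^p dgamma + eps * H(gamma | mu x nu))^(1/p), assuming gamma in Pi(mu, nu)."""
    return (np.sum(c**p * gamma) + eps * relative_entropy(gamma, mu, nu)) ** (1.0 / p)

def J_inf(gamma, c):
    """gamma-ess sup of c: the maximum of c over the support of gamma."""
    return float(c[gamma > 0].max())

rng = np.random.default_rng(0)
n = 4
mu = np.full(n, 1 / n)
nu = np.full(n, 1 / n)
c = 1.0 + rng.random((n, n))                   # a cost sampled on a grid, bounded below by 1

# a transport plan: mixture of the independent coupling and the identity matching
gamma = 0.5 * np.outer(mu, nu) + 0.5 * np.diag(mu)

p, eps = 8, 1.0
lp_norm = np.sum(c**p * gamma) ** (1.0 / p)    # ||c||_{L^p(gamma)}
# the entropic term approximates from above, the L^p norm from below:
assert lp_norm <= J_p_eps(gamma, c, mu, nu, p, eps) + 1e-12
assert lp_norm <= J_inf(gamma, c) + 1e-12
```

Both marginals of `gamma` equal the uniform weights by construction, so the plan is admissible and the sandwich above applies to it.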

We also observe that letting \(p\rightarrow \infty \) and \(\varepsilon \rightarrow 0\) is not enough to ensure that minimizers of \(J_{p,\varepsilon }\) converge to a minimizer of \(J_{\infty }\) (i.e. a solution of (\(\infty \)-OT)). Indeed, if \(\Vert c\Vert _{\infty } \le \frac{1}{2}\) and \(\varepsilon =\frac{1}{p}\), the minimizer \(\gamma _p\) of \(J_{p, \frac{1}{p}}\) satisfies

$$\begin{aligned} H(\gamma _p \vert \mu \otimes \nu ) \le p 2^{-p} \end{aligned}$$

hence \(\gamma _p\) converges (in fact strongly, by Pinsker’s inequality, see e.g. Lemma 2.5 in [16]) to \(\mu \otimes \nu \), which in general is not a minimizer of \(J_{\infty }\). On the one hand, this suggests that \(\Gamma \)-convergence of the regularizations above to \(J_{\infty }\) requires conditions relating \(\varepsilon \) to p. On the other hand, the previous example shows that the range of \(c^p\) compared to the size of the entropic penalization \(\varepsilon \) is crucial. But the solutions of the \(\infty \)-optimal transport problem are invariant when one replaces c by an increasing function of c; in particular, one can replace c by \(c+2\) (say), so that \(c^p\) will typically dominate the entropic term and one can expect \(\Gamma \)-convergence as \(p\rightarrow \infty \) for a fixed (or even large) value of \(\varepsilon \) (see the next section for more details).
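The entropy bound used in this example follows from a two-line comparison with the independent coupling (a short verification, spelled out for completeness):

```latex
% mu x nu is admissible and H(mu x nu | mu x nu) = 0, so comparing values of J_{p,1/p}^p:
\begin{aligned}
\int c^p \, d\gamma_p + \tfrac{1}{p}\, H(\gamma_p \vert \mu \otimes \nu)
  \;\le\; \int c^p \, d(\mu \otimes \nu)
  \;\le\; \Vert c \Vert_\infty^p
  \;\le\; 2^{-p},
\end{aligned}
```

and dropping the nonnegative term \(\int c^p\,d\gamma _p\) gives exactly \(H(\gamma _p\vert \mu \otimes \nu )\le p\,2^{-p}\).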

3 \(\Gamma \)-Convergence

Our first result concerns the \(\Gamma \)-convergence of \(J_{p, \varepsilon }\) to \(J_\infty \):

Theorem 3.1

Under the general assumptions of Sect. 2 we have:

  1. 1.

    \(J_{p, \varepsilon _p}\) \(\Gamma \)-converges (for the weak star topology of \(\mathcal {P}({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))\)) to \(J_{\infty }\) as \(p\rightarrow \infty \), provided \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) as \(p\rightarrow \infty \),

  2. 2.

    if, in addition, \(c\ge 1+\lambda \) with \(\lambda \ge 0\), then \(J_{p, \varepsilon _p}\) \(\Gamma \)-converges to \(J_{\infty }\) as \(p\rightarrow \infty \) provided

    $$\begin{aligned} \lim _{p \rightarrow \infty } \frac{1}{p} \log \Big (1+\varepsilon _p \frac{\log (p)}{(1+\lambda )^p}\Big )=0. \end{aligned}$$
    (3.1)

    In particular, in this case, \(J_{p,1}\) and \(J_{p,p}\) \(\Gamma \)-converge to \(J_{\infty }\) as \(p\rightarrow \infty \).

Proof

1. Let \(\gamma _p\in \Pi (\mu , \nu )\) converge weakly star to \(\gamma \). By nonnegativity of \(H(\gamma _p \vert \mu \otimes \nu )\), we have

$$\begin{aligned} \liminf _p J_{p, \varepsilon _p} (\gamma _p)\ge \liminf _p \Vert c\Vert _{L^p(\gamma _p)}. \end{aligned}$$

Hence, for fixed q, since \(\Vert c\Vert _{L^p(\gamma _p)} \ge \Vert c\Vert _{L^q(\gamma _p)}\) for \(p\ge q\), we have

$$\begin{aligned} \liminf _p J_{p, \varepsilon _p} (\gamma _p)\ge \liminf _p \Vert c\Vert _{L^q(\gamma _p)} =\Vert c\Vert _{L^q(\gamma )} \end{aligned}$$

Taking the supremum with respect to q thus yields the desired \(\Gamma \)-liminf inequality

$$\begin{aligned} \liminf _p J_{p, \varepsilon _p} (\gamma _p)\ge \Vert c\Vert _{L^{\infty }(\gamma )}=J_{\infty }(\gamma ). \end{aligned}$$

Let us now prove the \(\Gamma \)-limsup inequality. For any \(\gamma \in \Pi (\mu , \nu )\) we consider \(\gamma ^\delta \), the block approximation of \(\gamma \) at scale \(\delta \in (0,1)\) defined by (3.3) below, whose convergence to \(\gamma \) is guaranteed by the first inequality in (3.4). By concavity, we first have for \(p\ge 1\),

$$\begin{aligned} J_{p, \varepsilon _p}(\gamma ^\delta )&\le \Vert c \Vert _{L^p(\gamma ^\delta )}+ \varepsilon _p^{\frac{1}{p}} H(\gamma ^\delta \vert \mu \otimes \nu )^{\frac{1}{p}}\\&\le \Vert c \Vert _{L^{\infty }(\gamma ^\delta )}+ \varepsilon _p^{\frac{1}{p}} H(\gamma ^\delta \vert \mu \otimes \nu )^{\frac{1}{p}}. \end{aligned}$$

Denoting by \(\omega \) a modulus of continuity of c on \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\), thanks to the first inequality in (3.4), we have

$$\begin{aligned} \Vert c \Vert _{L^{\infty }(\gamma ^\delta )} \le \Vert c \Vert _{L^{\infty }(\gamma )}+ \omega (\sqrt{2d}\delta ), \end{aligned}$$

where \(\sqrt{2d}\delta \) is the diameter of the cubes \(Q_k^\delta \times Q_l^\delta \) of the approximation. Moreover, by the second inequality in (3.4), we have

$$\begin{aligned} H(\gamma ^\delta \vert \mu \otimes \nu )^{\frac{1}{p}} \le d^{\frac{1}{p}} \log (L/\delta )^{\frac{1}{p}} \end{aligned}$$

so if we define \(\gamma _p\) as the block approximation of \(\gamma \) at scale \(\delta =\frac{1}{p}\) (say), we obtain

$$\begin{aligned} \limsup _p J_{p, \varepsilon _p} (\gamma _p) \le J_{\infty }(\gamma )+ \limsup _p \Big ( \omega \Big (\frac{\sqrt{2d}}{p}\Big )+ d^{\frac{1}{p}}\varepsilon _p^{\frac{1}{p}} \log (Lp)^{\frac{1}{p}} \Big )=J_\infty (\gamma ), \end{aligned}$$

since we have assumed that \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) as \(p\rightarrow +\infty \).

2. Let us now assume that \(c\ge 1+\lambda \); the proof of the \(\Gamma \)-liminf inequality for \(J_{p, \varepsilon _p}\) is exactly as above. For \(\gamma \in \Pi (\mu , \nu )\) and \(\gamma _p\) the block approximation of \(\gamma \) at scale \(\frac{1}{p}\), we have

$$\begin{aligned} J_{p, \varepsilon _p} (\gamma _p)&\le \Vert c \Vert _{L^{\infty }(\gamma _p)} \Big (1+ \frac{d \varepsilon _p\log (L p)}{(1+\lambda )^p} \Big )^{\frac{1}{p}}\nonumber \\&\le \Big (J_{\infty }(\gamma )+\omega \Big (\frac{\sqrt{2d}}{p}\Big )\Big ) \Big (1+ \frac{d \varepsilon _p\log (L p)}{(1+\lambda )^p } \Big )^{\frac{1}{p}} \end{aligned}$$
(3.2)

so that, as soon as (3.1) holds, one has

$$\begin{aligned} \limsup _p J_{p, \varepsilon _p} (\gamma _p) \le J_{\infty }(\gamma ). \end{aligned}$$

\(\square \)

Remark 3.2

Notice that in the case \(c\ge 1+\lambda \) for some \(\lambda >0\), \(\Gamma \)-convergence of \(J_{p,\varepsilon _p}\) to \(J_\infty \) is guaranteed even for rapidly increasing \(\varepsilon _p\), such as \(\varepsilon _p=p^m(1+\lambda )^p\) with \(m\ge 0\). On the contrary, in the general case, the condition \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) requires choosing values of \(\varepsilon \) far too small to be usable in practice for numerical computations. This suggests rescaling the cost in practice so that it is bounded from below by 1.

Remark 3.3

We observe that in (3.2) it is sufficient that \(||c||_{L^\infty (\gamma _p)}\ge 1+\lambda \), therefore the conclusion of case 2. in Theorem 3.1 remains valid under the weaker assumption that \(v_\infty =\min _{\Pi (\mu , \nu )} J_\infty \ge 1+\lambda \).

Remark 3.4

The conditions on \(\varepsilon _p\) in Theorem 3.1 turn out to be sharp for the \(\Gamma \)-convergence of \(J_{p, \varepsilon _p}\) to \(J_\infty \). To see this, let us first observe that \(\mu \otimes \nu \) is a maximizer of \(J_{\infty }\) on \(\Pi (\mu , \nu )\) since its support contains the support of any other transport plan between \(\mu \) and \(\nu \). Therefore, unless \(J_{\infty }\) is constant on \(\Pi (\mu , \nu )\), every minimizer \(\gamma _\infty \) of \(J_\infty \) satisfies \(H(\gamma _\infty \vert \mu \otimes \nu ) \ge M\) for some \(M\in (0,+\infty ]\). Let us then assume that \(J_{\infty }\) is not constant on \(\Pi (\mu , \nu )\), fix \(\gamma _\infty \) a minimizer of \(J_\infty \) and let \(\gamma _p\in \Pi (\mu , \nu )\) converge weakly star to \(\gamma _\infty \). Since obviously

$$\begin{aligned} J_{p, \varepsilon _p} (\gamma _p) \ge \varepsilon _p^{\frac{1}{p}} H(\gamma _p \vert \mu \otimes \nu )^{\frac{1}{p}} \end{aligned}$$

and since \(\liminf _p H(\gamma _p \vert \mu \otimes \nu ) \ge M>0\), if for some \(a>0\) one has \(\varepsilon _p \ge a^p\) for large p, then

$$\begin{aligned} \liminf _p J_{p, \varepsilon _p} (\gamma _p) \ge a. \end{aligned}$$

This rules out the \(\Gamma \)-limsup inequality as soon as \(J_{\infty }(\gamma _{\infty }) <a\) and shows that the condition \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) in the first statement of the Theorem is sharp. Now, if \(c\ge 1+\lambda \) for some \(\lambda \ge 0\), the same computation as before yields that, if for some \(a>0\) and p large

$$\begin{aligned} \log \Big (1+\varepsilon _p \frac{\log (p)}{(1+\lambda )^p}\Big ) \ge a p, \end{aligned}$$

then

$$\begin{aligned} \liminf _p J_{p, \varepsilon _p} (\gamma _p) \ge \liminf _p \varepsilon _p^{\frac{1}{p}} \ge (1+ \lambda ) e^a \end{aligned}$$

which rules out the \(\Gamma \)-limsup inequality as soon as \(J_{\infty }(\gamma _{\infty }) <(1+\lambda )e^a\), showing sharpness of (3.1).

For the \(\Gamma \)-limsup inequality, we have used the block approximation introduced in [4], which is defined as follows:

Definition 3.5

Let \(\gamma \in \Pi (\mu , \nu )\). For \(\delta >0\) and \(k\in \mathbb {Z}^d\), we denote by \(Q_k^{\delta }\) the cube \(\delta (k+[0,1)^d)\). The block approximation of \(\gamma \) at scale \(\delta \in (0,1)\) is then defined by

$$\begin{aligned} \gamma ^{\delta } := \sum _{k, l \in \mathbb {Z}^d \; : \; \mu (Q_k^\delta )>0, \, \nu (Q_l^\delta )>0} {\gamma (Q_k^\delta \times Q_l^\delta )} \mu _k^\delta \otimes \nu _l^\delta \end{aligned}$$
(3.3)

where \(\mu _k^\delta \) and \(\nu _l^\delta \) are defined by

$$\begin{aligned} \mu _k^\delta (A)=\frac{\mu (Q_k^\delta \cap A)}{\mu (Q_k^\delta )}, \; \nu _l^\delta (A)=\frac{\nu (Q_l^\delta \cap A)}{\nu (Q_l^\delta )} \end{aligned}$$

for every Borel subset A of \(\mathbb {R}^d\).
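For discrete (empirical) measures, formula (3.3) can be implemented directly; the sketch below is our own illustration (function and variable names are hypothetical): it bins the support points into \(\delta \)-cubes and verifies that the marginals of \(\gamma ^\delta \) coincide with those of \(\gamma \).

```python
import numpy as np
from collections import defaultdict

def block_approximation(X, Y, mu, nu, gamma, delta):
    """Discrete counterpart of (3.3): on each product of delta-cubes, replace gamma
    by the product of the normalized restrictions of mu and nu, weighted by the mass
    gamma(Q_k x Q_l)."""
    kx = [tuple(np.floor(x / delta).astype(int)) for x in X]   # cube index k of each x_i
    ky = [tuple(np.floor(y / delta).astype(int)) for y in Y]   # cube index l of each y_j
    mu_cube, nu_cube, g_cube = defaultdict(float), defaultdict(float), defaultdict(float)
    for i, k in enumerate(kx):
        mu_cube[k] += mu[i]
    for j, l in enumerate(ky):
        nu_cube[l] += nu[j]
    for i in range(len(mu)):
        for j in range(len(nu)):
            g_cube[(kx[i], ky[j])] += gamma[i, j]               # gamma(Q_k x Q_l)
    out = np.zeros_like(gamma)
    for i in range(len(mu)):
        for j in range(len(nu)):
            out[i, j] = (g_cube[(kx[i], ky[j])] * mu[i] * nu[j]
                         / (mu_cube[kx[i]] * nu_cube[ky[j]]))
    return out

rng = np.random.default_rng(0)
n = 6
X, Y = rng.random((n, 2)), rng.random((n, 2))       # support points in [0,1]^2
mu = np.full(n, 1 / n); nu = np.full(n, 1 / n)
gamma = np.diag(mu)                                  # a permutation-type plan
gd = block_approximation(X, Y, mu, nu, gamma, delta=0.25)
assert np.allclose(gd.sum(axis=1), mu) and np.allclose(gd.sum(axis=0), nu)
```

The marginal check reflects the first assertion of Lemma 3.6: averaging within cubes redistributes mass but never changes the marginals.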

For the sake of completeness, we give a short proof of the properties of the block approximation that we have used in the proof of Theorem 3.1 (see [4] and [5] for related results):

Lemma 3.6

Let \(\gamma \in \Pi (\mu , \nu )\) and \(\gamma ^{\delta }\) be the block approximation of \(\gamma \) at scale \(\delta \in (0,1)\), then \(\gamma ^\delta \in \Pi (\mu , \nu )\) and

$$\begin{aligned} W_{\infty }(\gamma ^\delta , \gamma ) \le \sqrt{2d} \delta , \; H(\gamma ^{\delta } \vert \mu \otimes \nu ) \le d \log \Big (\frac{L}{\delta }\Big ), \end{aligned}$$
(3.4)

where L is a constant depending only on \({{\,\textrm{spt}\,}}(\mu )\) (actually on its diameter).

Proof

The fact that \(\gamma ^\delta \in \Pi (\mu , \nu )\) is easy to check by construction (see  [4]). Now observe that by (3.3) the density of \(\gamma ^{\delta }\) with respect to \(\mu \otimes \nu \) is

$$\begin{aligned} \frac{d\gamma ^\delta }{d\mu \otimes \nu }(x,y)={\left\{ \begin{array}{ll} \frac{\gamma (Q_k^\delta \times Q_l^\delta )}{\mu (Q_k^\delta ) \nu (Q_l^\delta )} \quad &{}\text {if }(x,y)\in Q_k^\delta \times Q_l^\delta , \ \text {and }\mu (Q_k^\delta ), \, \nu (Q_l^\delta )>0,\\ 0 \quad &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Therefore

$$\begin{aligned} H(\gamma ^\delta \vert \mu \otimes \nu )&= \sum _{k, l \in \mathbb {Z}^d \; : \; \mu (Q_k^\delta )>0, \, \nu (Q_l^\delta )>0}\int _{Q_k^\delta \times Q_l^\delta }\log \left( \frac{\gamma (Q_k^\delta \times Q_l^\delta )}{\mu (Q_k^\delta ) \nu (Q_l^\delta )} \right) d\gamma ^\delta \\&\le \sum _{k, l \in \mathbb {Z}^d \; : \; \mu (Q_k^\delta )>0, \, \nu (Q_l^\delta )>0}\int _{Q_k^\delta \times Q_l^\delta }\log \left( \frac{1}{\mu (Q_k^\delta )} \right) d\gamma ^\delta \\&=\sum _{k \in \mathbb {Z}^d \; : \; \mu (Q_k^\delta )>0}\mu (Q_k^\delta )\log \left( \frac{1}{\mu (Q_k^\delta )}\right) , \end{aligned}$$

where the inequality is due to the fact that \(\frac{\gamma (Q_k^\delta \times Q_l^\delta )}{\nu (Q_l^\delta )}\le 1\), while the last equality is obtained by summing over l. If \(L\ge 1\) is such that \({{\,\textrm{spt}\,}}\mu \) is contained in a cube of side \(L-1\), the number of cubes \(Q_k^\delta \) with positive \(\mu \)-measure is not greater than \(N_\delta := \left( \frac{L}{\delta }\right) ^d\). Therefore, applying Jensen’s inequality to the concave function \(f(z)=z\log (\frac{1}{z})\), we have

$$\begin{aligned} H(\gamma ^\delta \vert \mu \otimes \nu )&\le \sum _{k=1}^{N_\delta }\mu (Q_k^\delta )\log \left( \frac{1}{\mu (Q_k^\delta )}\right) \\&\le N_\delta \left( \frac{1}{N_\delta }\sum _{k=1}^{N_\delta }\mu (Q_k^\delta )\log \left( \frac{1}{\sum _{k=1}^{N_\delta }\frac{1}{N_\delta }\mu (Q_k^\delta )} \right) \right) \\&=\log (N_\delta )=d\log (L)-d\log (\delta ), \end{aligned}$$

which proves the second inequality in (3.4).

By construction, \(\gamma (Q_k^\delta \times Q_l^\delta )=\gamma ^\delta (Q_k^\delta \times Q_l^\delta )\) for every k, l. Let J be the set of pairs of indices (k, l) such that \(\gamma ^\delta (Q_k^\delta \times Q_l^\delta )>0\) and set \(\overline{Q}_{j}=Q_k^\delta \times Q_l^\delta \) for every \(j=(k,l)\in J\). We define

$$\begin{aligned} \eta ^\delta :=\sum _{j \,: \, \gamma (\overline{Q}_{j})>0}\gamma ({\overline{Q}}_j) \gamma _j\otimes \gamma _j^\delta , \end{aligned}$$

where \(\gamma _j(A):=\frac{\gamma (A\cap \overline{Q}_{j})}{\gamma ({\overline{Q}}_j)}\) and \(\gamma ^\delta _j(A):=\frac{\gamma ^\delta (A\cap \overline{Q}_{j})}{\gamma ^\delta ({\overline{Q}}_j)}\). By construction \(\eta ^\delta \in \Pi (\gamma ,\gamma ^\delta )\), thus

$$\begin{aligned} W_{\infty }(\gamma ,\gamma ^\delta )\le \Vert x-y \Vert _{L^{\infty }(\eta ^\delta )}\le \max _{j\in J}{{\,\textrm{diam}\,}}({\overline{Q}}_j)=\sqrt{2d}\delta . \end{aligned}$$

\(\square \)

4 Selection of Plans with \(\infty \)-Cyclically Monotone Support

As shown in [6] and [10], restrictable minimizers of \(J_\infty \) are supported on \(\infty \)-cyclically monotone sets, which are defined as follows:

Definition 4.1

A set \(\Gamma \subset \mathbb {R}^d\times \mathbb {R}^d\) is said to be \(\infty \)-cyclically monotone if we have that

$$\begin{aligned} \max _{i=1,\dots ,k}\left\{ c(x_{i},y_{i})\right\} \le \max _{i=1,\dots ,k}\left\{ c(x_{i},y_{i+1})\right\} , \end{aligned}$$

for all \(k\in \mathbb {N}^*\) and \(\left\{ (x_i,y_i) \right\} _{i=1}^{k}\subset \Gamma \), where \(y_{k+1}=y_1\). A transport plan \(\gamma \) is said to be \(\infty \)-cyclically monotone if \({{\,\textrm{spt}\,}}\gamma \) is an \(\infty \)-cyclically monotone set.

Since every permutation can be obtained as a composition of cycles on disjoint sets and trivial cycles on fixed points, one can see that \(\infty \)-cyclical monotonicity of a set \(\Gamma \subset \mathbb {R}^d\times \mathbb {R}^d\) is equivalent to the fact that for every \(k\in \mathbb {N}^*\), every \(\left\{ (x_i,y_i) \right\} _{i=1}^{k}\subset \Gamma \) and every \(\sigma \in \Sigma (k)\) (where \(\Sigma (k)\) is the permutation group of \(\{1, \ldots , k\}\)), one has

$$\begin{aligned} \max _{i=1,\dots ,k}\left\{ c(x_{i},y_{i})\right\} \le \max _{i=1,\dots ,k}\left\{ c(x_{i},y_{\sigma (i)})\right\} . \end{aligned}$$

In the literature, the previous definition is usually called \(\infty \)-c-cyclical monotonicity; to keep notation simple, we have omitted the dependence on the cost c. Let us remark that \(\infty \)-cyclical monotonicity is invariant under replacing c by a strictly increasing transformation of c (such as \(c^p\) with \(p>0\)), contrary to the usual notion of c-cyclical monotonicity. We recall that a nonempty subset \(\Gamma \) of \(\mathbb {R}^d \times \mathbb {R}^d\) is called c-cyclically monotone when for every \(k\in \mathbb {N}^*\), every \((x_i,y_i)_{i=1}^k\subset \Gamma \) and every permutation \(\sigma \in \Sigma (k)\), one has

$$\begin{aligned} \sum _{i=1}^k c(x_i, y_i) \le \sum _{i=1}^k c(x_i, y_{\sigma (i)}). \end{aligned}$$
(4.1)
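On a finite set of pairs, Definition 4.1 can be checked by brute force over short cycles. The sketch below is our own illustration with the cost \(c(x,y)=|x-y|\) on the line; it contrasts a monotone matching (whose support is \(\infty \)-cyclically monotone) with a crossing one.

```python
import itertools

def is_inf_cyclically_monotone(pairs, c, kmax=4):
    """Brute-force check of Definition 4.1 on a finite set of pairs:
    for every tuple of length k <= kmax (repetitions allowed),
    max_i c(x_i, y_i) <= max_i c(x_i, y_{i+1}) with y_{k+1} = y_1."""
    for k in range(2, kmax + 1):
        for tup in itertools.product(range(len(pairs)), repeat=k):
            diag = max(c(pairs[i][0], pairs[i][1]) for i in tup)
            shifted = max(c(pairs[tup[a]][0], pairs[tup[(a + 1) % k]][1])
                          for a in range(k))
            if diag > shifted + 1e-12:
                return False
    return True

c = lambda x, y: abs(x - y)
# monotone (sorted) matching of four well-separated points on the line
good = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9), (3.0, 3.3)]
# crossing matching: swapping the two targets creates a violating 2-cycle
bad = [(0.0, 3.3), (3.0, 0.1)]
assert is_inf_cyclically_monotone(good, c)
assert not is_inf_cyclically_monotone(bad, c)
```

The check is exponential in `kmax`, so it is only meant as a diagnostic on very small supports; it directly mirrors the cyclic-shift formulation of Definition 4.1 rather than the permutation formulation (4.1).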

Our goal in this section is to investigate the convergence of the entropic approximation to \(\infty \)-cyclically monotone plans. We shall make use of the analysis of the recent landmark article [8]. Let us first recall the notion of \((c,\varepsilon )\)-cyclical invariance introduced in [8]:

Definition 4.2

Let \(c:\mathbb {R}^d\times \mathbb {R}^d\rightarrow (0,\infty )\) be a measurable function. A coupling \(\gamma \in \Pi (\mu ,\nu )\) is called \((c,\varepsilon )\)-cyclically invariant if \(\gamma \ll \mu \otimes \nu \) and its density admits a representative \(\frac{ \text {d} \gamma }{\text {d} \mu \otimes \nu }:\mathbb {R}^d\times \mathbb {R}^d\rightarrow (0,\infty )\) such that

$$\begin{aligned} \prod _{i=1}^{k}\frac{ \text {d} \gamma }{\text {d} \mu \otimes \nu }(x_i,y_i) =\exp \left( -\frac{1}{\varepsilon }\left[ \sum _{i=1}^{k}(c(x_i,y_i)-c(x_i,y_{i+1}))\right] \right) \prod _{i=1}^{k}\frac{ \text {d} \gamma }{\text {d} \mu \otimes \nu }(x_i,y_{i+1}), \end{aligned}$$

for all \(k\in \mathbb {N}^*\) and \( \left\{ (x_i,y_i) \right\} _{i=1}^{k}\subset \mathbb {R}^d\times \mathbb {R}^d\), where \(y_{k+1}=y_1\).

In [8] (Proposition 2.2), it is shown that whenever (\(\varepsilon \)-EOT) is finite, the (unique) solution \(\gamma _{\varepsilon }\) of (\(\varepsilon \)-EOT) is characterized by being \((c,\varepsilon )\)-cyclically invariant. The next lemma, which is part of Lemma 3.1 in [8], provides an estimate for \((c,\varepsilon )\)-cyclically invariant couplings which will be useful for our purposes. For the reader’s convenience, we also provide the proof here.
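The invariance of Definition 4.2 can be observed directly on the output of the Sinkhorn algorithm. Indeed, any coupling of the scaling form \(\mathrm {diag}(u)\,e^{-c/\varepsilon }\,\mathrm {diag}(v)\) satisfies the identity exactly, because the scaling factors cancel along cycles. A minimal sketch on our own toy discrete problem (names and sizes are illustrative):

```python
import numpy as np

def sinkhorn(c, mu, nu, eps, iters=2000):
    """Plain Sinkhorn iterations for the discrete entropic problem (eps-EOT)."""
    K = np.exp(-c / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]        # the entropic optimal plan

rng = np.random.default_rng(1)
n = 5
mu = np.full(n, 1 / n); nu = np.full(n, 1 / n)
c = rng.random((n, n))
eps = 0.1
gamma = sinkhorn(c, mu, nu, eps)
density = gamma / np.outer(mu, nu)            # dgamma / d(mu x nu)

# check (c, eps)-cyclical invariance on a random cycle of length k
k = 4
idx = rng.integers(0, n, size=(k, 2))         # pairs (x_i, y_i) among grid points
i, j = idx[:, 0], idx[:, 1]
j_next = np.roll(j, -1)                       # y_{i+1}, with y_{k+1} = y_1
lhs = np.prod(density[i, j])
rhs = np.exp(-(c[i, j].sum() - c[i, j_next].sum()) / eps) * np.prod(density[i, j_next])
assert abs(lhs - rhs) < 1e-8 * max(1.0, abs(rhs))
```

Note that the identity holds for any matrix of scaling form, converged or not; running Sinkhorn simply produces the particular scaling with the prescribed marginals.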

Lemma 4.3

Let \(\varepsilon >0\) and let \(\gamma _{\varepsilon }\in \Pi (\mu ,\nu )\) be \((c,\varepsilon )\)-cyclically invariant. For every fixed integer \(k\ge 2\) and every \(\delta \ge 0\), let \(A_{k,c}(\delta )\) be the set defined by

$$\begin{aligned} A_{k, c} (\delta ):=\left\{ \left( x_i,y_i\right) _{i=1}^k\in \left( \mathbb {R}^d\times \mathbb {R}^d\right) ^k \, : \, \sum _{i=1}^{k}c(x_i,y_i)-\sum _{i=1}^{k}c(x_i,y_{i+1})\ge \delta \right\} \nonumber \\ \end{aligned}$$
(4.2)

where \(y_{k+1}=y_1\). Let \(A\subset A_{k, c}(\delta )\) be Borel. Then \(\gamma _\varepsilon ^k:=\prod _{i=1}^k(\gamma _\varepsilon )(\text {d}x_i,\text {d}y_i)\) satisfies

$$\begin{aligned} \gamma _\varepsilon ^k(A)\le e^{\frac{-\delta }{\varepsilon }}. \end{aligned}$$

Proof

By Definition 4.2 of \((c,\varepsilon )\)-cyclical invariance, for \(\gamma _\varepsilon ^k\) a.e. \((x_i,y_i)_{i=1}^k\in A\) we have that

$$\begin{aligned} \prod _{i=1}^{k}\frac{d\gamma _\varepsilon }{d\mu \otimes \nu }(x_i,y_i)\le e^{-\frac{\delta }{\varepsilon }}\prod _{i=1}^{k}\frac{d\gamma _\varepsilon }{d\mu \otimes \nu }(x_i,y_{i+1}). \end{aligned}$$

If one defines the set \({\overline{A}}:=\{(x_i,y_{i+1})_{i=1}^k \,: \, (x_i,y_{i})_{i=1}^k\in A \}\), then, integrating the previous inequality over A with respect to \(\gamma _\varepsilon ^k\), we obtain

$$\begin{aligned} \gamma _\varepsilon ^k(A)\le e^{-\frac{\delta }{\varepsilon }}\gamma _\varepsilon ^k({\overline{A}})\le e^{-\frac{\delta }{\varepsilon }}. \end{aligned}$$

\(\square \)

The fact that the entropic approximation procedure selects \(\infty \)-cyclically monotone plans is then ensured by the following:

Theorem 4.4

Under the general assumptions of Sect. 2, further assume that \(c>0\) everywhere, and let \(\gamma _{p, \varepsilon _p}\) be the minimizer of \(J_{p, \varepsilon _p}\). Then, any weak star cluster point \(\gamma _{\infty } \) as \(p\rightarrow \infty \) of the family \(\{\gamma _{p, \varepsilon _p}\}_{p\ge 1} \) is \(\infty \)-cyclically monotone, provided

  1. 1.

    \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) as \(p\rightarrow \infty \),

  2. 2.

    \(\varepsilon _p=o(p(1+\lambda )^p)\) if, in addition, \(c\ge 1+ \lambda \) with \(\lambda \ge 0\).

Proof

Up to extracting a subsequence, let us assume that \(\gamma _{p, \varepsilon _p}\) weakly star converges to \(\gamma _\infty \). We proceed by contradiction assuming that there exist \(\delta >0\) and a finite sequence of points \(\left( x_i,y_i\right) _{i=1}^k\) contained in \({{\,\textrm{spt}\,}}\gamma _\infty \), such that

$$\begin{aligned} \max _{i=1,\dots ,k}\left\{ c(x_i,y_i)\right\} >\max _{i=1,\dots ,k}\left\{ c(x_i,y_{i+1})\right\} +\delta . \end{aligned}$$

By the continuity of the cost function c and the uniform convergence, as \(p\rightarrow +\infty \), of \(\left( \sum _{i=1}^{k}c(x'_i,y'_i)^p\right) ^{\frac{1}{p}}\) to \(\max _{i=1,\dots ,k}\{c(x'_i,y'_i)\}\), we deduce that there exist open neighborhoods \(U_i\) of \((x_i,y_i)\), \(i=1,\dots ,k\), and \(p(\delta )>0\) such that

$$\begin{aligned} \left( \sum _{i=1}^{k}c(x'_i,y'_i)^p\right) ^{\frac{1}{p}}>\left( \sum _{i=1}^{k}c(x'_i,y'_{i+1})^p\right) ^{\frac{1}{p}}+\delta , \end{aligned}$$

for every \((x'_i,y'_i)\in U_i\) (again with the convention that \(y'_{k+1}=y'_1\)) and \(p\ge p(\delta )\). We now observe that

$$\begin{aligned}&\sum _{i=1}^{k}c(x'_i,y'_i)^p>\left( \left( \sum _{i=1}^{k}c(x'_i,y'_{i+1})^p\right) ^{\frac{1}{p}}+\delta \right) ^p\nonumber \\&\quad \ge \sum _{i=1}^{k}c(x'_i,y'_{i+1})^p+p\left( \sum _{i=1}^{k}c(x'_i,y'_{i+1})^p\right) ^\frac{p-1}{p}\delta , \end{aligned}$$
(4.3)

where the last inequality follows from the convexity of \(t\mapsto t^p\), with \(p>1\). Since \(c>0\) there exists some \(b>0\) such that \(c\ge b\) on each \(U_i\), \(i=1, \dots , k\), hence, for every \((x'_i,y'_i)\in U_i\) and \(p\ge p(\delta )\)

$$\begin{aligned} \sum _{i=1}^{k}c(x'_i,y'_i)^p>\sum _{i=1}^{k}c(x'_i,y'_{i+1})^p+p\delta b^{p-1}. \end{aligned}$$
(4.4)

We thus have \(U_1\times \dots \times U_k\subset A_{k, c^p}(p\delta b^{p-1})\), where \(A_{k, c^p}(p\delta b^{p-1})\) is defined as in (4.2) with c replaced by \(c^p\). Applying Lemma 4.3, we thus get:

$$\begin{aligned}&\gamma _{\infty }^k(U_1\times \dots \times U_k):=\prod _{i=1}^k \gamma _{\infty } (U_i) \le \liminf _p \prod _{i=1}^k \gamma _{p,\varepsilon _p} (U_i) \nonumber \\&\quad =\liminf _p\gamma ^k_{p,\varepsilon _p}(U_1\times \dots \times U_k) \le \liminf _p e^{-\frac{p\delta b^{p-1}}{\varepsilon _p}} \end{aligned}$$
(4.5)

so that if \(\varepsilon _p^{\frac{1}{p}} \rightarrow 0\) as \(p\rightarrow \infty \), for large enough p one has \(\varepsilon _p \le b^p\), which yields

$$\begin{aligned} \liminf _p e^{-\frac{p\delta b^{p-1}}{\varepsilon _p}}=0. \end{aligned}$$

On the other hand, since the points \((x_i,y_i)\) belong to \({{\,\textrm{spt}\,}}\gamma _\infty \), we have that \(\gamma _{\infty }^k(U_1\times \dots \times U_k)>0\), which yields the desired contradiction. This shows the first assertion. Now, if \(c\ge (1+\lambda )\) with \(\lambda \ge 0\), we can replace b by \((1+\lambda )\) in (4.5) and the same conclusion will be reached as soon as \(\varepsilon _p=o(p(1+\lambda )^p)\), proving the second assertion. \(\square \)

Remark 4.5

Despite what we observed in Remark 3.3 regarding Theorem 3.1, in the proof of the second assertion of Theorem 4.4, it does not seem that the condition \(c(x,y)\ge 1\) for every (xy) can be weakened to \(J_\infty \ge 1\). Note also that the condition \(\varepsilon _p=o(p(1+\lambda )^p)\) is stronger than condition (3.1) that guarantees \(\Gamma \)-convergence when \(c\ge 1+\lambda \).

5 Some Estimates on the Speed of Convergence

Our aim in this section is to give some error estimates for \(v_p-v_\infty \), where

$$\begin{aligned} v_p:=\min _{\gamma \in \Pi (\mu ,\nu )}J_p(\gamma ) \quad \text {and} \quad v_{\infty }:=\min _{\gamma \in \Pi (\mu ,\nu )}J_{\infty }(\gamma ), \end{aligned}$$
(5.1)

where \(J_p:=J_{p,1}\) (i.e. for the sake of simplicity we take \(\varepsilon _p=1\) as entropic penalization parameter).
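Before turning to the estimates, here is a small numerical sanity check of the convergence \(v_p\rightarrow v_\infty \) (our own toy experiment, not taken from the references): \(v_p\) is approximated by a log-domain Sinkhorn scheme applied to the cost \(c^p\) with \(\varepsilon =1\), while for uniform discrete marginals \(v_\infty \) reduces to a bottleneck assignment problem (by the Birkhoff decomposition, the support of any plan contains a permutation). The final assertion checks two elementary bounds, which are our own remarks: \(v_\infty \, n^{-1/p} \le v_p \le (v_\infty ^p + \log n)^{1/p}\).

```python
import numpy as np
from itertools import permutations

def v_p_entropic(c, mu, nu, p, iters=5000):
    """Approximate v_p = min J_{p,1} by log-domain Sinkhorn applied to the cost c^p."""
    C = c ** p
    logmu, lognu = np.log(mu), np.log(nu)
    f, g = np.zeros(len(mu)), np.zeros(len(nu))
    for _ in range(iters):
        # stable dual updates via log-sum-exp
        f = -np.logaddexp.reduce(lognu[None, :] + g[None, :] - C, axis=1)
        g = -np.logaddexp.reduce(logmu[:, None] + f[:, None] - C, axis=0)
    gamma = np.exp(logmu[:, None] + lognu[None, :] + f[:, None] + g[None, :] - C)
    entropy = np.sum(gamma * (f[:, None] + g[None, :] - C))   # H(gamma | mu x nu)
    return (np.sum(C * gamma) + entropy) ** (1.0 / p)

rng = np.random.default_rng(2)
n = 4
mu = np.full(n, 1 / n); nu = np.full(n, 1 / n)
c = 1.0 + 0.25 * rng.random((n, n))     # cost bounded below by 1, as Remark 3.2 suggests

# with uniform marginals, v_inf is the bottleneck assignment value
v_inf = min(max(c[i, s[i]] for i in range(n)) for s in permutations(range(n)))

v16 = v_p_entropic(c, mu, nu, p=16)
assert v_inf * n ** (-1 / 16) - 1e-6 <= v16 <= (v_inf ** 16 + np.log(n)) ** (1 / 16) + 1e-6
```

The squeeze shows that \(v_{16}\) already lies within a few percent of \(v_\infty \) on this example; for larger p the kernel \(e^{-c^p}\) becomes increasingly stiff, which is why the log-domain formulation is used.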

5.1 Upper Bounds

Proposition 5.1

(Upper bounds on the speed of convergence). Let \(c\in C^{0,\alpha }(\mathbb {R}^d\times \mathbb {R}^d)\), with \(\alpha \in (0,1]\) and let us assume that \(v_\infty \ge 1+\lambda \) for some \(\lambda \ge 0\). Then we have

$$\begin{aligned} v_{p}-v_{\infty }\le {\left\{ \begin{array}{ll} O(e^{-\beta p}), \text{ with } \beta =\min \{\alpha , \log (1+\lambda )\} &{} \text{ if } \lambda >0,\\ O \Big ( \frac{ \log (\log p)}{p} \Big ) &{} \text{ if } \lambda =0. \end{array}\right. } \end{aligned}$$

Proof

Let \(\gamma _{\infty }\) be a minimizer of \(J_{\infty }\) and \(\gamma ^{\delta }\) be the block approximation of \(\gamma _{\infty }\) at scale \(\delta \in (0,1)\), as defined in (3.3). We observe that, by construction and by the Hölder condition on c, denoting by A the \(C^{0, \alpha }\) semi-norm of c, we first have

$$\begin{aligned} ||c||_{L^{\infty }(\gamma ^{\delta })}\le ||c||_{L^{\infty }(\gamma _{\infty })} + A \delta ^{\alpha }. \end{aligned}$$

Then

$$\begin{aligned}&v_p\le \left( \int c^pd\gamma ^{\delta }+ H(\gamma ^{\delta }\vert \mu \otimes \nu ) \right) ^{\frac{1}{p}}\le \left( ||c||^p_{L^{\infty }(\gamma ^{\delta })}+ H(\gamma ^{\delta }\vert \mu \otimes \nu ) \right) ^{\frac{1}{p}}\nonumber \\&\quad \le \left( ||c||_{L^{\infty }(\gamma _{\infty })} + A \delta ^{\alpha }\right) \left( 1+\frac{H(\gamma ^{\delta }\vert \mu \otimes \nu )}{\left( 1+\lambda \right) ^{p}} \right) ^{\frac{1}{p}}\nonumber \\&\quad \le \left( v_{\infty } + A \delta ^{\alpha }\right) \left( 1+\frac{d\log (L/\delta )}{\left( 1+\lambda \right) ^{p}} \right) ^{\frac{1}{p}}, \end{aligned}$$
(5.2)

where the last inequality follows from Lemma 3.6. For \(\lambda >0\), choosing \(\delta :=e^{-p}\), (5.2) becomes (setting \(C=d \log (L)\))

$$\begin{aligned} v_p\le \left( v_{\infty }+A e^{-\alpha p}\right) \left( 1+ \frac{C +dp}{(1+\lambda )^p} \right) ^{\frac{1}{p}}, \end{aligned}$$

then, we observe that for large p, one has

$$\begin{aligned} \left( 1+ \frac{C+dp}{(1+\lambda )^p} \right) ^{\frac{1}{p}} =1+ \frac{d}{(1+\lambda )^p} +o\Big (\frac{1}{(1+\lambda )^p}\Big ). \end{aligned}$$

Therefore, for p large enough,

$$\begin{aligned} v_p\le v_{\infty }+Be^{-\beta p}, \end{aligned}$$

for some \(B>0\) and \(\beta =\min \{\alpha , \log (1+\lambda )\}\).

Now if \(\lambda =0\), we choose \(\delta =p^{-1/\alpha }\) in (5.2) which gives

$$\begin{aligned} v_p&\le \Big (v_\infty + \frac{A}{p}\Big ) \exp \Big (\frac{1}{p} \log (1+ d \log (Lp^{1/\alpha }))\Big )\\&=v_{\infty } + \frac{v_\infty }{p} \log (\log (p)) +o\Big ( \frac{ \log (\log (p))}{p}\Big ) \end{aligned}$$

which ends the proof. \(\square \)

5.2 Upper and Lower Bounds in the Discrete Case

Let us now consider the discrete case where there exist points \(x_1,\dots , x_N\) and \(y_1,\dots ,y_M\) in \(\mathbb {R}^d\) such that

$$\begin{aligned} \mu = \sum _{i=1}^N\mu _{i}\delta _{x_i} \quad \text {and} \quad \nu =\sum _{j=1}^M\nu _j\delta _{y_j} \end{aligned}$$
(5.3)

with (strictly, without loss of generality) positive weights \(\mu _i\) and \(\nu _j\) summing to 1. To shorten notations let us set \(c_{ij}=c(x_i, y_j)\ge 0\). In this setting, transport plans \(\gamma \) will simply be denoted as \(N\times M\) matrices with entries \(\gamma ^{ij}\). We also recall that in the discrete setting \(\Pi (\mu ,\nu )\) is a convex polytope and the constraint \(\gamma \in \Pi (\mu ,\nu )\) is equivalent to

$$\begin{aligned} \gamma \mathbb {1}_M= \left( \sum _{j=1}^M\gamma ^{ij}\right) _i=(\mu _i)_i \ \text {and} \ \gamma ^\intercal \mathbb {1}_N=\left( \sum _{i=1}^N\gamma ^{ij}\right) _j=(\nu _j)_j. \end{aligned}$$
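For a small uniform example, one can check the marginal constraints above and evaluate the relative entropy \(H(\gamma \vert \mu \otimes \nu )\) directly. The following is only an illustrative sketch; the helper names `in_Pi` and `rel_entropy` are ours.

```python
import numpy as np

def in_Pi(gamma, mu, nu, tol=1e-12):
    """Check the marginal constraints gamma 1_M = mu and gamma^T 1_N = nu."""
    return (np.abs(gamma.sum(axis=1) - mu).max() < tol
            and np.abs(gamma.sum(axis=0) - nu).max() < tol)

def rel_entropy(gamma, mu, nu):
    """H(gamma | mu x nu) = sum_ij gamma_ij log(gamma_ij / (mu_i nu_j)),
    with the convention 0 log 0 = 0."""
    prod = np.outer(mu, nu)
    mask = gamma > 0
    return float(np.sum(gamma[mask] * np.log(gamma[mask] / prod[mask])))

N = 4
mu = nu = np.full(N, 1.0 / N)

gamma_ind = np.outer(mu, nu)   # independent coupling: relative entropy 0
gamma_perm = np.eye(N) / N     # permutation plan: relative entropy log N

assert in_Pi(gamma_ind, mu, nu) and in_Pi(gamma_perm, mu, nu)
print(rel_entropy(gamma_ind, mu, nu), rel_entropy(gamma_perm, mu, nu))
```

For uniform weights both values stay below \(2\log N\), consistent with the crude entropy bound used in this section.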

In the discrete setting transport plans have a finite entropy with respect to \(\mu \otimes \nu \), with the (crude) bound

$$\begin{aligned} H(\gamma \vert \mu \otimes \nu ) \le M:=-\sum _{i=1}^N \mu _i \log (\mu _i)-\sum _{j=1}^M \nu _j \log (\nu _j) \end{aligned}$$

for every \(\gamma \in \Pi (\mu , \nu )\). So if \(v_\infty \ge 1+\lambda \) with \(\lambda \ge 0\), taking \(\gamma _\infty \) a minimizer of \(J_\infty \), we obtain

$$\begin{aligned} v_p&\le J_p(\gamma _\infty )\le v_{\infty } \Big (1+ \frac{M}{(1+\lambda )^p} \Big )^{\frac{1}{p}}\\&\le v_\infty \Big (1+ \frac{M}{p(1+\lambda )^p} +o\Big ( \frac{M}{p(1+\lambda )^p} \Big ) \Big ) \end{aligned}$$

which gives (in a straightforward way, i.e. without using block approximation) an exponentially decaying upper bound for \(v_p-v_\infty \) for \(\lambda >0\) and an algebraic upper bound \(v_p-v_\infty \le O(1/p)\) if \(\lambda =0\). The fact that \(v_\infty \ge 1\) therefore ensures that \(p(v_p-v_\infty )\) is bounded from above. It turns out that, in the discrete setting, this condition also guarantees an algebraically decaying lower bound for the error. To see this, we first need the following:

Lemma 5.2

Let \(\mu \) and \(\nu \) be discrete measures, i.e. of the form (5.3), and define

$$\begin{aligned} F_\infty :=\{\gamma \in \Pi (\mu , \nu ) \;: \; J_{\infty }(\gamma )=v_{\infty }\} \end{aligned}$$

and for every \(\gamma \in F_{\infty }\),

$$\begin{aligned} m(\gamma ):=\max \{ \gamma ^{ij} \;: \; \gamma ^{ij}>0, \; c_{ij}= v_{\infty }\} \end{aligned}$$

Then there is some \(\theta >0\) such that \(m(\gamma )\ge \theta \) for every \(\gamma \in F_{\infty }\).

Proof

Since \(v_{\infty }\) is the minimum of \(J_{\infty }\) over \(\Pi (\mu , \nu )\), one can write \(F_{\infty }\) as the set of transport plans for which

$$\begin{aligned} \gamma ^{ij}>0 \Rightarrow c_{ij}-v_\infty \le 0 \end{aligned}$$

or equivalently

$$\begin{aligned} l(\gamma ):= \sum _{ij} \gamma ^{ij} (c_{ij}-v_{\infty })_+ =0. \end{aligned}$$

In other words, \(F_\infty \) is the face of \(\Pi (\mu , \nu )\) on which the linear form l (which is nonnegative on \(\Pi (\mu , \nu )\)) achieves its minimum; it is therefore a convex polytope whose extreme points belong to the (finite) set of extreme points of \(\Pi (\mu , \nu )\). Let us then denote by \(\{\gamma _a, \, a \in A\}\), with A a finite index set, the set of extreme points of \(F_{\infty }\). Thanks to Minkowski’s theorem, we can write any \(\gamma \in F_{\infty }\) as

$$\begin{aligned} \gamma :=\sum _{a\in A} \alpha _a \gamma _a, \end{aligned}$$

for some weights \(\alpha _a\ge 0\) summing to 1. In particular we may pick \(a_0 \in A\) with \(\alpha _{a_0} \ge \frac{1}{ \vert A\vert }\) (with \(\vert A\vert \) denoting the cardinality of A). Then we have

$$\begin{aligned} m(\gamma ) \ge \frac{m(\gamma _{a_0})}{\vert A\vert } \ge \theta := \min _{a \in A} \frac{m(\gamma _a)}{\vert A\vert }>0, \end{aligned}$$

where the strict positivity of \(\theta \) then follows from the fact that A is finite and \(m(\gamma _a )>0\) for every \(a\in A\). \(\square \)

We are now ready to prove the announced lower bound.

Proposition 5.3

(Lower bound on the speed of convergence, discrete case). Assume that \(\mu \) and \(\nu \) are discrete measures, i.e. of the form (5.3), and that \(v_\infty \ge 1\). Then \(p(v_p-v_\infty )\) is bounded from below; hence

$$\begin{aligned} v_p-v_\infty =O\Big (\frac{1}{p}\Big ). \end{aligned}$$

Proof

Let us argue by contradiction and assume that \(p(v_p-v_\infty )\) is unbounded from below; then there is a sequence \(p_n \rightarrow \infty \) as \(n \rightarrow \infty \) such that

$$\begin{aligned} \lim _n p_n(v_{p_n}-v_\infty )=-\infty . \end{aligned}$$
(5.4)

Letting \(\gamma _n\) be the minimizer of \(J_{p_n}\), passing to a subsequence if necessary, we may assume that \(\gamma _n\) converges to some \(\gamma _\infty \) which belongs to \(F_\infty \) (as defined in Lemma 5.2) since \(v_\infty \ge 1\). In particular, there exists \(i_{0}, j_{0}\) such that

$$\begin{aligned} c_{i_0 j_0}=v_{\infty } \text{ and } \gamma _{\infty }^{i_0 j_0} \ge \theta >0, \end{aligned}$$

where \(\theta \) is the lower bound from Lemma 5.2. Since \(\gamma _n^{i_0 j_0 }\) converges to \(\gamma _{\infty }^{i_0 j_0}\) we have, for large enough n, \(\gamma _n^{i_0 j_0 } \ge \frac{\theta }{2}\), hence, using the fact that \(c_{i_0 j_0}=v_{\infty }\) and again the nonnegativity of the entropy

$$\begin{aligned} v_{p_n}&\ge v_{\infty } \Big ( \frac{\theta }{2} \Big )^{\frac{1}{p_n}}=v_{\infty } \exp \Big ( \frac{1}{p_n} \log \frac{\theta }{2} \Big )\\&\ge v_{\infty } \Big (1+ \frac{1}{p_n} \log \frac{\theta }{2} \Big ) \end{aligned}$$

which is the desired contradiction to (5.4). \(\square \)

5.3 A Large Deviations Upper Bound

In this (somewhat independent) paragraph, our goal is to discuss a (partial) extension of the large deviations results of [8] to the \(L^\infty \)-optimal transport framework. Considering the Monge–Kantorovich problem (OT), it is well-known (see [9, 15]) that the optimality for (OT) of a plan \(\gamma \in \Pi (\mu , \nu )\) is characterized by a property of c-cyclical monotonicity of its support \(\Gamma :={{\,\textrm{spt}\,}}(\gamma )\), where c-cyclical monotonicity is defined by (4.1). To analyze fine convergence properties of the entropic approximation of (OT), defined by (\(\varepsilon \)-EOT), the authors of [8] assumed convergence (taking a subsequence if necessary), as \(\varepsilon \rightarrow 0^+\), of the minimizer \(\gamma _\varepsilon \) of (\(\varepsilon \)-EOT) to some \(\gamma \) and, denoting by \(\Gamma \) the c-cyclically monotone set \({{\,\textrm{spt}\,}}(\gamma )\), introduced

$$\begin{aligned} I(x,y):=\sup _{k\ge 2}\sup _{(x_i,y_i)_{i=2}^k\subset \Gamma }\sup _{\sigma \in \Sigma (k)} \Big \{\sum _{i=1}^{k}c(x_i,y_i)-\sum _{i=1}^kc(x_i,y_{\sigma (i)}) \Big \}, \; (x,y)\in \mathbb {R}^d\times \mathbb {R}^d \end{aligned}$$

with \((x_1, y_1)=(x,y)\). They proved that I is a good rate function for the family of optimal entropic plans, \(\{\gamma _\varepsilon \}_{\varepsilon >0}\) in the sense that it obeys, under very general conditions, the large deviations principle

$$\begin{aligned}&\limsup _{\varepsilon \rightarrow 0}\varepsilon \log (\gamma _\varepsilon (C))\le -\inf _{(x,y)\in C} I(x,y) \quad \text {and}\\&\liminf _{\varepsilon \rightarrow 0}\varepsilon \log (\gamma _\varepsilon (U))\ge -\inf _{(x,y)\in U}I(x,y), \end{aligned}$$

for every compact C and every open U included in \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\). Denoting by \(\gamma _{p, \varepsilon }\) the minimizer of \(J_{p, \varepsilon }\), the results of [8] (using \(c^p\) instead of c) of course apply to the convergence of \(\gamma _{p, \varepsilon }\) as \(\varepsilon \rightarrow 0^+\) for a fixed exponent p. For \(L^\infty \) optimal transport, it makes more sense to consider instead the situation where \(\varepsilon >0\) is fixed and p tends to \(\infty \). More precisely, we know from Theorem 4.4 that if \(c\ge 1\) and \(\varepsilon >0\) is fixed, the family \(\{\gamma _{p, \varepsilon }\}_{p\ge 1}\) weakly star converges (again possibly after an extraction) to some \(\gamma _{\infty }\) as \(p\rightarrow \infty \), and that \(\Gamma _\infty :={{\,\textrm{spt}\,}}( \gamma _{\infty })\) is \(\infty \)-cyclically monotone. In addition to the general assumptions of Sect. 2, we shall further assume throughout this paragraph that

  • \(c\ge 1\),

  • \(\varepsilon >0\) being fixed, the sequence of minimizers \(\{\gamma _{p, \varepsilon }\}_{p\ge 1}\) weakly star converges as \(p\rightarrow \infty \) to some \(\gamma _{\infty }\), with (\(\infty \)-cyclically monotone) support \(\Gamma _\infty \).

Let us define for every \((x,y)\in \mathbb {R}^d\times \mathbb {R}^d\)

$$\begin{aligned} I_{\infty }(x,y):=\sup _{k\ge 2}\sup _{(x_i,y_i)_{i=2}^k \subset \Gamma _\infty }\sup _{\sigma \in \Sigma (k)} \Big \{\ \max _{1\le i\le k}\{c(x_i,y_i)\}-\max _{1\le i\le k}\{c(x_i,y_{\sigma (i)})\} \Big \}, \end{aligned}$$

where \((x_1,y_1)=(x,y)\). Also define

$$\begin{aligned} \widetilde{I}_{\infty }(x,y):=\sup _{k\ge 2}\sup _{(x_i,y_i)_{i=2}^k \subset \Gamma _\infty } \Big \{\ \max _{1\le i\le k}\{c(x_i,y_i)\}-\max _{1\le i\le k}\{c(x_i,y_{i+1})\} \Big \}, \end{aligned}$$

where \((x_1,y_1)=(x,y)\) and \(y_{k+1}=y_1\). In our supremal optimal transport setting, we cannot really expect \(I_{\infty }\) to be a good rate function for \(\{\gamma _{p, \varepsilon }\}_{p\ge 1}\); indeed, \({{\,\textrm{argmin}\,}}_{\Pi (\mu ,\nu )}J_{\infty }\) is unchanged when replacing c with a strictly increasing function of c, while the same does not hold for \(I_{\infty }\). However, it can be interesting to have a better understanding of the function \(I_\infty \), which still provides an upper bound for the family \(\{\gamma _{p,\varepsilon }\}\) (see Proposition 5.6).
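To make the definition of \(\widetilde{I}_{\infty }\) concrete, here is a small brute-force sketch that evaluates it for a finite \(\Gamma _\infty \); the helper name `I_tilde_inf` and the truncation of the supremum at chains of length `k_max` are ours, and the code is only meant to illustrate the formula, not to be efficient.

```python
from itertools import product

def I_tilde_inf(xy, Gamma, c, k_max=3):
    """Brute-force evaluation of the supremum defining I~_inf(x, y),
    truncated at chains (x_2,y_2),...,(x_k,y_k) in Gamma with k <= k_max.
    Points are scalars here for simplicity; c is the cost function."""
    best = float("-inf")
    for k in range(2, k_max + 1):
        for chain in product(Gamma, repeat=k - 1):
            pts = [xy] + list(chain)                 # (x_1, y_1) := (x, y)
            # cyclic shift of the second coordinates, with y_{k+1} = y_1
            shifted = [pts[(i + 1) % k][1] for i in range(k)]
            val = max(c(x, y) for x, y in pts) - \
                  max(c(pts[i][0], shifted[i]) for i in range(k))
            best = max(best, val)
    return best

# Example with c >= 1: the identity coupling on the line is
# infinity-cyclically monotone for |x - y|, hence also for c below.
c = lambda x, y: abs(x - y) + 1.0
Gamma = [(0.0, 0.0), (1.0, 1.0)]

print(I_tilde_inf((0.0, 0.0), Gamma, c))  # 0.0 on Gamma
print(I_tilde_inf((0.0, 2.0), Gamma, c))  # 1.0 off Gamma
```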

Lemma 5.4

Let \(I_\infty \) and \(\widetilde{I}_\infty \) be defined as above, then

  • \(I_\infty \) and \(\widetilde{I}_{\infty }\) are related by \(I_{\infty }=\max (0, \widetilde{I}_{\infty })\),

  • \(I_\infty \) and \(\widetilde{I}_{\infty }\) are lower semicontinuous, \(I_\infty \ge 0\), \(I_{\infty }=0\) on \(\Gamma _\infty \),

  • \(I_\infty \) and \(\widetilde{I}_{\infty }\) coincide on \(({{\,\textrm{spt}\,}}(\mu )\times \mathbb {R}^d) \cup (\mathbb {R}^d \times {{\,\textrm{spt}\,}}(\nu ))\).

Proof

The fact that \(I_\infty \ge \max (0, \widetilde{I}_{\infty })\) is obvious, as is the fact that \(\widetilde{I}_{\infty }= 0\) on \(\Gamma _\infty \).

We now prove the converse inequality. Fix \((x,y)=(x_1,y_1) \in \mathbb {R}^d \times \mathbb {R}^d\), \(k\ge 2\), \((x_2, y_2), \ldots , (x_k, y_k)\) in \(\Gamma _{\infty }\) and \(\sigma \in \Sigma (k)\). We can then partition \(\{1, \ldots , k\}\) into \(I_0\), the (possibly empty) set of fixed points of \(\sigma \), and disjoint (empty if \(\sigma \) is the identity) orbits \(I_1, \ldots , I_l\), on each of which \(\sigma \) acts as a cycle. This means that for \(j=1, \ldots , l\), we may denote \((x_i, y_i)_{i\in I_j}\) by \((\widetilde{x}^j_r,\widetilde{y}^j_r)_{r=1, \ldots , \vert I_j\vert }\) and \((x_i, y_{\sigma (i)})_{i\in I_j}\) by \((\widetilde{x}^j_r,\widetilde{y}^j_{r+1})_{r=1, \ldots , \vert I_j\vert }\), with the convention \(\widetilde{y}^j_{\vert I_j \vert +1}=\widetilde{y}^j_{1}\). We now observe that

$$\begin{aligned} \max _{1\le i\le k}\{c(x_i,y_i)\}-\max _{1\le i\le k}\{c(x_i,y_{\sigma (i)})\} \le \max _j \Big \{\max _{i\in I_j} c(x_i, y_i)- \max _{i\in I_j} c(x_i, y_{\sigma (i)}) \Big \}, \end{aligned}$$

where the max with respect to j is taken on indices for which \(I_j\) is nonempty. To shorten notations, for such a j let us set

$$\begin{aligned} \beta _j:=\max _{i\in I_j} c(x_i, y_i)- \max _{i\in I_j} c(x_i, y_{\sigma (i)}). \end{aligned}$$

Of course, if \(I_0\) is nonempty then \(\beta _0=0\). Now if \(j\ge 1\) and \(I_j\) is nonempty,

$$\begin{aligned} \beta _j =\max _{r=1, \ldots , \vert I_j\vert } c(\widetilde{x}^j_r, \widetilde{y}^j_r)- \max _{r=1, \ldots , \vert I_j\vert } c(\widetilde{x}^j_r, \widetilde{y}^j_{r+1})\le \widetilde{I}_{\infty }(\widetilde{x}_1^j, \widetilde{y}_1^j). \end{aligned}$$

So, if \((\widetilde{x}_1^j, \widetilde{y}_1^j)=(x_1, y_1)\), then \(\beta _j \le \widetilde{I}_{\infty }(x,y)\); and if \((\widetilde{x}_1^j, \widetilde{y}_1^j)\ne (x_1, y_1)\), then \((\widetilde{x}_1^j, \widetilde{y}_1^j)\in \Gamma _{\infty }\), hence \(\widetilde{I}_{\infty }(\widetilde{x}_1^j, \widetilde{y}_1^j)= 0\) by the definition of \(\widetilde{I}_\infty \) and the fact that \(\Gamma _\infty \) is \(\infty \)-cyclically monotone. In other words, we can bound from above each \(\beta _j\) by \(\max (0,\widetilde{I}_\infty (x,y))\). Taking suprema with respect to k, \((x_2, y_2), \ldots , (x_k, y_k)\) in \(\Gamma _{\infty }\) and \(\sigma \in \Sigma (k)\), we thus get \(I_{\infty }\le \max (0, \widetilde{I}_{\infty })\). Moreover, since \(\widetilde{I}_\infty \le 0\) on \(\Gamma _\infty \), \(I_{\infty }= \max (0, \widetilde{I}_{\infty })=0\) on \(\Gamma _{\infty }\).

Lower semicontinuity of \(I_\infty \) and \(\widetilde{I}_{\infty }\) follows from the continuity of c. Finally, assume that \(x\in {{\,\textrm{spt}\,}}(\mu )\) and \(y\in \mathbb {R}^d\). Since \(\Gamma _\infty ={{\,\textrm{spt}\,}}(\gamma _\infty )\) is compact and \(\gamma _\infty \in \Pi (\mu , \nu )\), there exists \(y'\in \mathbb {R}^d\) such that \((x,y')\in \Gamma _\infty \). Taking \((x_1,y_1)=(x,y)\), \((x_2, y_2)=(x,y')\) as a competitor in the definition of \(\widetilde{I}_{\infty }(x,y)\), we see that \(\widetilde{I}_{\infty }(x,y)\ge 0\), hence \(I_{\infty }(x,y)=\widetilde{I}_{\infty }(x,y)\). The same argument shows that \(I_\infty \) and \(\widetilde{I}_\infty \) coincide on \(\mathbb {R}^d\times {{\,\textrm{spt}\,}}(\nu )\). \(\square \)

Lemma 5.5

Let us fix \((x,y)\in \mathbb {R}^d\times \mathbb {R}^d\). Suppose that for some \(\delta \in \mathbb {R}\), \(k\in \mathbb {N}\), \(k\ge 2\) and \((x_i,y_i)_{i=2}^k\subset {{\,\textrm{spt}\,}}\gamma _{\infty }\), we have

$$\begin{aligned} \max _{1\le i\le k}\{c(x_i,y_i)\}-\max _{1\le i\le k}\{c(x_i,y_{i+1})\}>\delta , \ \text {where }(x_1,y_1):=(x,y) \text{ and } y_{k+1}:=y_1. \end{aligned}$$

Then there exist \(\alpha >0\), \(r>0\) and \(p_0\ge 1\) such that

$$\begin{aligned} \gamma _{p,\varepsilon }(B_r(x,y))\le \alpha e^{\frac{-p\delta }{\varepsilon }}, \ \forall p\ge p_0, \end{aligned}$$

where \(\gamma _{p,\varepsilon }\) is the minimizer of \(J_{p, \varepsilon }\).

Proof

Of course if \(\delta \le 0\), one can just take \(\alpha =1\) so we may assume that \(\delta >0\). Reasoning as in the proof of Theorem 4.4 (recall that we have assumed \(c\ge 1\)), we know that there exist \(p_0\) and \(r>0\) such that

$$\begin{aligned} \sum _{i=1}^{k}c^p(x'_i,y'_i)-\sum _{i=1}^kc^p(x'_i,y'_{i+1})> p\delta , \end{aligned}$$

for every \(p\ge p_0\) and \((x'_i,y'_i)_{i=1}^k\in B_r(x_1,y_1)\times \cdots \times B_r(x_k,y_k)\). Then \(B_r(x_1,y_1)\times \cdots \times B_r(x_k,y_k)\subset A_{k, c^p}(p\delta )\) so, thanks to Lemma 4.3,

$$\begin{aligned} \gamma ^k_{p,\varepsilon }(B_r(x_1,y_1)\times \cdots \times B_r(x_k,y_k))\le e^{-\frac{p\delta }{\varepsilon }}. \end{aligned}$$

Moreover, since \((x_i,y_i)_{i=2}^k\subset {{\,\textrm{spt}\,}}\gamma _{\infty }\), there is some \(\beta >0\) such that \(\liminf _{p\rightarrow \infty }\gamma _{p,\varepsilon }(B_r(x_i,y_i))\ge \gamma _{\infty }(B_r(x_i,y_i))>\beta \) for all \(2\le i\le k\); then

$$\begin{aligned} \gamma _{p,\varepsilon }(B_r(x,y))\le \left( \frac{\beta }{2}\right) ^{1-k}e^{-\frac{p\delta }{\varepsilon }}, \end{aligned}$$

for all \(p \ge p_0\) (possibly replacing \(p_0\) with a larger one). \(\square \)

Proposition 5.6

Under the assumptions of this paragraph, for any compact set \(C\subset \mathbb {R}^d\times \mathbb {R}^d\), one has

$$\begin{aligned} \limsup _{p\rightarrow \infty }\frac{\varepsilon }{p}\log \gamma _{p,\varepsilon }(C)\le -\inf _{ C \cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))} \widetilde{I}_\infty \le -\inf _{ C}I_{\infty }. \end{aligned}$$

Proof

First note that since \(\gamma _{p, \varepsilon }\) is supported on \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\),

$$\begin{aligned} \gamma _{p,\varepsilon }(C)=\gamma _{p,\varepsilon }(C \cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))) \end{aligned}$$

and there is nothing to prove if C is disjoint from \({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )\). Therefore we can assume that \(C\cap ( {{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )) \ne \emptyset \). It then follows from Lemma 5.4 that

$$\begin{aligned} \inf _{ C \cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))} \widetilde{I}_\infty = \inf _{ C \cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))} I_\infty \ge \inf _{ C} I_\infty . \end{aligned}$$

Now let \(\eta >0\) and \((x,y)\in C\cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu ))\). By definition of \(\widetilde{I}_\infty (x,y)\) there exist \(k\ge 2\) and \((x_i,y_i)_{i=2}^k\subset \Gamma _\infty \), such that (setting as usual \((x_1,y_1)=(x,y)\) and \(y_{k+1}=y\))

$$\begin{aligned} \max _{1\le i\le k}\{c(x_i,y_i)\}-\max _{1\le i\le k}\{c(x_i,y_{i+1})\}> \min ( \eta ^{-1}, \widetilde{I}_{\infty }(x,y))-\eta . \end{aligned}$$

Note that the truncation is used to handle the case where \(\widetilde{I}_\infty (x,y)=+\infty \). By Lemma 5.5 we know that there exist \(\alpha ,r>0\) such that

$$\begin{aligned} \gamma _{p,\varepsilon }(B_r(x,y))\le \alpha \exp \left( \frac{-p( \min ( \eta ^{-1}, \widetilde{I}_{\infty }(x,y))-\eta )}{\varepsilon }\right) . \end{aligned}$$

Then

$$\begin{aligned} \limsup _{p\rightarrow \infty }\frac{\varepsilon }{p}\log \gamma _{p,\varepsilon }(B_r(x,y))\le - \min ( \eta ^{-1}, \widetilde{I}_{\infty }(x,y))+\eta \end{aligned}$$

and, by compactness of C,

$$\begin{aligned} \limsup _{p\rightarrow \infty }\frac{\varepsilon }{p}\log \gamma _{p,\varepsilon }(C)\le -\inf _{C \cap ({{\,\textrm{spt}\,}}(\mu )\times {{\,\textrm{spt}\,}}(\nu )) } \min (\eta ^{-1}, \widetilde{I}_{\infty })+\eta \end{aligned}$$

which, letting \(\eta \rightarrow 0^+\), yields the desired upper bound. \(\square \)

6 Numerical Results

In this section, we present several numerical examples, with the aim of illustrating the discussions and theoretical analysis of the previous sections. We shall consider discrete marginals: let \(N,M\in \mathbb {N}\); with a slight abuse of notation, we will denote by \(\mu \) and \(\nu \) both the measures and the vectors of weights \((\mu _i)_{i=1}^N\) and \((\nu _j)_{j=1}^M\), and \(\gamma \) will denote both the transport plan and the \(N\times M\) matrix \((\gamma ^{ij})\). For fixed \(p,\varepsilon >0\), in this discrete setting, the minimization of \(J_{p,\varepsilon }\) reads

$$\begin{aligned} \min _{\Pi (\mu ,\nu )} \left( \sum _{i,j}\gamma ^{ij} c^p_{ij}+\varepsilon \sum _{ij} \gamma ^{ij} \log \Big ( \frac{\gamma ^{ij}}{\mu _i \nu _j }\Big ) \right) ^{\frac{1}{p}}. \end{aligned}$$
(6.1)

Raising the above cost to the power p, which does not change the minimizer, leads to a standard entropic optimal transport problem. For such problems, in all our examples we used Sinkhorn’s algorithm (see for instance Chapter 4 in [14]) to find a good approximation (with error smaller than \(10^{-5}\)) of the solution.
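The following is a minimal sketch of this procedure (not the implementation used for the figures): the function name `sinkhorn`, the stopping rule and the toy data are our own choices.

```python
import numpy as np

def sinkhorn(C, mu, nu, p=5.0, eps=1.0, tol=1e-5, max_iter=10_000):
    """Sinkhorn iterations for min_gamma <gamma, C^p> + eps * H(gamma | mu x nu).

    Rescaling the Gibbs kernel by mu x nu only changes the scalings u, v,
    not the optimal plan, so we work with K = exp(-C^p / eps) directly.
    """
    K = np.exp(-C**p / eps)            # Gibbs kernel; may underflow for large p
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(max_iter):
        u = mu / (K @ v)               # enforce gamma @ 1 = mu
        v = nu / (K.T @ u)             # enforce gamma.T @ 1 = nu
        gamma = u[:, None] * K * v[None, :]
        err = np.abs(gamma.sum(axis=1) - mu).max() \
            + np.abs(gamma.sum(axis=0) - nu).max()
        if err < tol:
            break
    return gamma

# Tiny example: two points each, cost |x - y|
x = np.array([0.0, 1.0]); y = np.array([0.25, 0.75])
C = np.abs(x[:, None] - y[None, :])
gamma = sinkhorn(C, np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```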

Fig. 1
figure 1

Example of convergence of the plan to the \(\infty \)-cm plan: \(c(x,y)=|x-y|^p\), \(p\in \{2,3,4,5\}\), \(\varepsilon =1\) and \(\mu \) and \(\nu \) having orthogonal supports

If \(v_\infty \ge 1\), in light of Theorem 3.1, we expect the output \(\gamma \) of the Sinkhorn algorithm to be, for suitable p and \(\varepsilon \), also a good approximation of an optimal plan for the discretized \(L^\infty \) optimal transport problem

$$\begin{aligned} v_{\infty }:=\min _{\gamma \in \Pi (\mu ,\nu )}\max _{i,j}\left\{ c_{i,j}\,: \, \gamma ^{ij}\ne 0 \right\} . \end{aligned}$$

Furthermore, if \(c\ge 1\), thanks to Theorem 4.4, we expect to find a plan close to an \(\infty \)-cyclically monotone one.

Remark 6.1

As the set of transport plans \(\Pi (\mu , \nu )\) is a convex polytope, for any \(\gamma \in \Pi (\mu ,\nu )\) there exists a finite set of indices S such that \(\gamma =\sum _{s\in S} a_s\gamma _s\), with \(a_s > 0\), \(\sum a_s=1\) and each \(\gamma _s\) an extreme point of \(\Pi (\mu , \nu )\). If \(N=M\) and \(\mu _i=\nu _j=\frac{1}{N}\), the set \(\Pi (\mu ,\nu )\) is (up to the factor \(\frac{1}{N}\)) the set of so-called bistochastic matrices, whose extreme points, by Birkhoff’s theorem, are the permutation matrices. We observe that, by definition of \(\gamma -\text {ess}\sup \), \(J_\infty (\gamma )= \max _{s\in S} J_{\infty }(\gamma _s)\) and thus the minimum of \(J_\infty \) is attained at some permutation matrix (rescaled by \(\frac{1}{N}\)). Therefore, if \(N=M\) and \(\mu _i=\nu _j=\frac{1}{N}\),

$$\begin{aligned} v_\infty =\min _{\sigma \in \Sigma (N)}\max _{i}c_{i,\sigma (i)}. \end{aligned}$$

This can in principle be used to compute \(v_\infty \) exactly. However, this is not particularly useful in practice; considering for instance the example at the bottom of Fig. 4, even if the size of \(\mu \) and \(\nu \) is the same, in order to calculate the exact value of \(v_\infty \) we would have to examine all 100! permutations, which is infeasible in practice!
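For small N, the formula of Remark 6.1 can nevertheless be evaluated literally. A sketch (the helper name `v_inf_exact` is ours; uniform weights, \(N=M\)):

```python
import itertools
import numpy as np

def v_inf_exact(C):
    """min over permutations sigma of max_i C[i, sigma(i)] (bottleneck value).

    Enumerates all N! permutations, so this is only feasible for small N,
    as noted in Remark 6.1.
    """
    N = C.shape[0]
    best = np.inf
    for sigma in itertools.permutations(range(N)):
        best = min(best, max(C[i, sigma[i]] for i in range(N)))
    return best

C = np.array([[1.0, 2.0],
              [2.0, 3.0]])
print(v_inf_exact(C))  # 2.0: the transposition beats the identity (max 3.0)
```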

All the examples in this section will be in dimension \(d=2\); \(\mu \) will be represented by blue points, \(\nu \) by red points, and the plan will be represented by arrows: the black ones indicate that a blue point is sent to a red point with high probability, while the gray ones indicate that a blue point is sent to a red point with lower (but still non-negligible) probability.

In the first example, as shown by Fig. 1, we consider \(c^p=|x-y|^p\) for \(p\in \{2,3,4,5\}\), with \(\mu \) uniformly concentrated on the blue points

$$\begin{aligned} \{(-2,0), (-1.5,0), (-1,0), (-0.5,0),(0.5,0), (1,0), (1.5,0), (2,0)\} \end{aligned}$$

and \(\nu \) on the red points

$$\begin{aligned} \{(0,-1.367), (0,-0.867), (0,0.867), (0,1.367)\}. \end{aligned}$$

Note that with this choice of \({{\,\textrm{spt}\,}}\mu \) and \({{\,\textrm{spt}\,}}\nu \), \(c\ge 1\) everywhere and therefore, thanks to Theorems 3.1 and 4.4, \(\Gamma \)-convergence and convergence of the outputs towards \(\infty \)-cm plans still hold choosing \(\varepsilon =1\). We observe that for \(p=2\), every transport plan \(\gamma \) is optimal. Indeed, by the orthogonality of the two supports, any plan is concentrated on a cyclically monotone set (see (4.1)) and, as recalled in Sect. 5.3 (see for instance [9, 15]), this is a sufficient optimality condition. Here, since we look for a plan which minimizes the regularized problem involving the entropy, the Sinkhorn algorithm selects the most diffuse one, as evidenced by the picture on the upper left of Fig. 1. The other three pictures in Fig. 1 show that convergence towards an \(\infty \)-cm plan is really fast: it already occurs for \(p=5\).

Fig. 2
figure 2

Error on the marginals: the first image shows the error \(|\gamma \mathbb {1}_4-\mu |\) of the output \(\gamma \) on the first marginal and the second one the error \(|\gamma ^\intercal \mathbb {1}_8-\nu |\) on the second marginal

Regarding the accuracy, Fig. 2 shows that for \(p=5\) and \(\varepsilon =1\) the distance \(|\gamma \mathbb {1}_4-\mu |\) between the first marginal of the output \(\gamma \) and \(\mu \), as well as the distance \(|\gamma ^\intercal \mathbb {1}_8-\nu |\) between the second marginal of \(\gamma \) and \(\nu \), is of the order of \(10^{-5}\) after only 350 iterations.

Fig. 3
figure 3

Example of convergence of the plan to the \(\infty \)-cm plan for \(c(x,y)=\left( \max \{|x_1-y_1|,|x_2-y_2|\}\right) ^p\), for \(p\in \{2,3,4,5\}\), \(\varepsilon =1\) and \(\mu \) and \(\nu \) having orthogonal supports

We have also considered the same example (see Fig. 3) with the cost function \(c^p(x,y):=\left( \max \{|x_1-y_1|,|x_2-y_2|\}\right) ^p\). In this case the convergence is still fast and the error is small after a few iterations (of order \(10^{-5}\) after about 180 iterations).

Fig. 4
figure 4

Comparison among three different examples: \(\varepsilon =1\), \(c(x,y)=|x-y|^p\), on the left \(p=2\), on the right \(p=15\). On top: \(\mu \) a uniform discretization of the unit square and \(\nu \) uniformly concentrated on the points (1, 2) and (2, 1). In the middle: \(\mu \) the same discretization of the unit square, \(\nu =0.1\delta _{(1,2)}+0.9\delta _{(2,1)}\) (the point (2, 1) is represented by a bigger dot). On bottom: \(\mu \) a uniform discretization of the square \([-0.25,0.25]\times [-0.25,0.25]\) and \(\nu \) of the rectangle \([1.25,1.5]\times [-0.5,0.5]\) (Color figure online)

Fig. 5
figure 5

Comparison among the speed of convergence of \(v_p-v_\infty \), \(Be^{-\beta p}\) and \(-\frac{A}{p}\) for \(p\in [10,206]\), with \(\mu \) and \(\nu \) as on top of Fig. 4. On top: \(v_p\) in blue and \(v_\infty \) in orange. On bottom: \(Be^{-\beta p}\) in green, \(-\frac{A}{p}\) in orange and \(v_p-v_\infty \) in blue. Here A, B are obtained by linear regression (least squares) and \(\beta =\log (v_\infty )\) (the same \(\beta \) as in Proposition 5.1) (Color figure online)

Fig. 6
figure 6

\(\mu \) and \(\nu \) uniformly distributed both concentrated on 8 points. The value of \(v_\infty \) is about 1.38647347 and it is obtained transporting mass between the two points connected by the magenta segment (Color figure online)

Remark 6.2

When \(c>1\), on the one hand, we don’t need \(\varepsilon \) to be small and we can even take it large as p grows (by case 2. in Theorem 3.1 we can even choose for instance \(\varepsilon _p=(1+\lambda )^p\)). On the other hand, we can encounter some difficulties when computing the Gibbs kernel \(K_{ij}=e^{-\frac{c^p_{ij}}{\varepsilon }}\): if p is large it can happen that, for some ij, \(K_{ij}=0\), making it impossible to perform the division in the iterations of the primal version of the Sinkhorn algorithm. Fortunately, this problem can be overcome using the log-domain version (see for instance Sect. 4.4 in [14]), as we did in the following example, represented by Fig. 4.
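A minimal sketch of this log-domain variant (in our own notation, not the implementation used for the figures): the dual potentials f, g replace the scalings u, v, and a stabilized log-sum-exp replaces the kernel products, assuming the marginals have positive weights.

```python
import numpy as np

def _lse(a, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sinkhorn_log(Cp, mu, nu, eps=1.0, n_iter=500):
    """Log-domain Sinkhorn for min <gamma, Cp> + eps * H(gamma | mu x nu).

    The Gibbs kernel exp(-Cp/eps) is never formed explicitly, so entries
    that would underflow to 0 for large p cause no division problems.
    """
    log_mu, log_nu = np.log(mu), np.log(nu)
    f = np.zeros_like(mu)
    g = np.zeros_like(nu)
    for _ in range(n_iter):
        # alternately enforce the row and column marginal constraints
        f = -eps * _lse((g[None, :] - Cp) / eps + log_nu[None, :], axis=1)
        g = -eps * _lse((f[:, None] - Cp) / eps + log_mu[:, None], axis=0)
    log_gamma = (f[:, None] + g[None, :] - Cp) / eps \
        + log_mu[:, None] + log_nu[None, :]
    return np.exp(log_gamma)

# Cost already raised to a large power p: the plain kernel would underflow
C = np.array([[1.0, 2.0], [2.0, 1.0]])
gamma = sinkhorn_log(C**30, np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

Here the plan concentrates, as expected, on the diagonal pairs of minimal cost.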

Figure 4 shows a comparison among three different examples, considered for \(p=2\) on the left and for \(p=15\) on the right, with \(\varepsilon =1\). The two pictures on top of Fig. 4 show the representation by arrows of the output when \(\mu \) is uniformly concentrated on 400 points which discretize the unit square and \(\nu \) is uniformly concentrated on the points (1, 2) and (2, 1). This is a discretization of the case \(\mu \) uniform on the square \([0,1]^2\), where (see also Example 2.2 in [6]) every \(\gamma \in \Pi (\mu ,\nu )\) is optimal for the problem

$$\begin{aligned} \inf _{\gamma \in \Pi (\mu , \nu )} \gamma -\hbox {ess sup}\,c=\inf _{\gamma \in \Pi (\mu , \nu )}\Vert c \Vert _{L^{\infty }(\gamma )}. \end{aligned}$$

Indeed

$$\begin{aligned} \Vert c \Vert _{L^{\infty }(\gamma )}&=\sup \{c(x,y) \, : \, (x,y)\in {{\,\textrm{spt}\,}}\gamma \}\\&=|(0,0)-(1,2)|=|(0,0)-(2,1)|=\sqrt{5}, \end{aligned}$$

for every \(\gamma \in \Pi (\mu ,\nu ).\) Since every plan is optimal, when p is smaller, as shown in the picture on the left, the role of the entropy is more important and the algorithm selects the most diffuse plan, while, as p increases, the entropy becomes more and more negligible and the output becomes sparser: already for \(p=15\) (on the right) the output is a good approximation of the \(\infty \)-cyclically monotone plan, which in this case is unique (see Theorem 5.6 in [6]). A small variation, represented by the two figures in the middle, is to consider \(\nu \) not uniformly concentrated on the points (1, 2) and (2, 1); here we have taken \(\nu =0.1\delta _{(1,2)}+0.9\delta _{(2,1)}\). Finally, on the bottom, we have implemented the case in which \(\nu \) is also the discretization of an absolutely continuous measure. Here \(\mu \) approximates the square \([-0.25,0.25]\times [-0.25,0.25]\) and \(\nu \) the rectangle \([1.25,1.5]\times [-0.5,0.5]\), both measures being supported on 100 points. As previously, one can notice that for \(p=2\) the entropy plays an important role and the algorithm selects the most diffuse plan, while already for \(p=15\) the plan is considerably sparser.

Fig. 7
figure 7

Comparison among the speed of convergence of \(v_p-v_\infty \), \(Be^{-\beta p}\) and \(-\frac{A}{p}\) for \(p\in [10,172]\) and \(\varepsilon =500^2\), with \(\mu \) and \(\nu \) as in Fig. 6. On top: \(v_p\) in blue and \(v_\infty \) in orange. On bottom: \(Be^{-\beta p}\) in green, \(-\frac{A}{p}\) in orange and \(v_p-v_\infty \) in blue. Here A, B are obtained by linear regression (least squares) and \(\beta =\log (v_\infty )\) (the same \(\beta \) as in Proposition 5.1) (Color figure online)

We are now interested in the asymptotic behavior of \(v_p:=\min _{\Pi (\mu , \nu )}J_p\) and we want to numerically represent the upper and lower bounds on the speed of convergence of \(v_p\) towards \(v_\infty :=\min _{\Pi (\mu , \nu )}J_\infty \) proved in Propositions 5.1 and 5.3. In order to apply Propositions 5.1 and 5.3 it is enough to assume a lower bound on \(v_\infty \) and not a pointwise one on c.

Figure 5 provides an example of the asymptotic behavior of \(v_p\) and of the speed of convergence in the case of \(\mu \) and \(\nu \) as represented in the two pictures on top of Fig. 4. In light of what we have just remarked, we have re-scaled the cost c in order to have \(v_\infty \simeq 1.08166\). For \(p\in [10,206]\), the image on top of Fig. 5 shows in blue how \(v_p\) changes as p varies, while \(v_\infty \) is constant and is represented by the orange line. At the bottom of Fig. 5 we have represented in blue \(v_p-v_\infty \), in green the upper bound \(Be^{-\beta p}\) and in orange the lower bound \(-\frac{A}{p}\), where \(\beta =\log (v_\infty )\) by Proposition 5.1 (indeed in this case c is Lipschitz so \(\alpha =1>\log (v_\infty )\)) and A, B have been estimated by linear regression (least squares).

Finally, an example in which it is possible (even if it is really slow!) to compute \(v_\infty \) exactly (see Remark 6.1) is represented in Fig. 6. Here \(\mu \) is concentrated on 8 points, given by

$$\begin{aligned} \left\{ (x_1,x_2) \,: \, x_1=-0.25+ 0.125\cdot i,\, i=1,\dots , 4, \ x_2\in \{-0.1,0.1\} \right\} \end{aligned}$$

and \(\nu \) is concentrated on 8 equidistant points of the segment of the line \(y_2=-2y_1+2.5\) going from the point (0.625, 1.25) to the point (1.25, 0). We have computed \(v_\infty \) for the cost \(c(x,y)=|x-y|\) by applying Remark 6.1, obtaining \(v_\infty \simeq 1.38647347\); the points which are at the minimal-maximal distance are \(x_*=(-0.25, -0.1)\) and \(y_*=(0.98214286, 0.53571429)\), connected by the magenta segment in the picture. Regarding the speed of convergence, we rescaled the cost in order to further decrease \(v_\infty \), to \(v_\infty \simeq 1.052460609\). As shown in Fig. 7, \(v_p\) is calculated for p varying in the interval [10, 172], with \(\varepsilon =500^2\). We observe that in this case, as shown in the picture on top, \(v_p\) is initially smaller than \(v_\infty \), then it increases above it, and finally it decreases, converging to \(v_\infty \).