1 Introduction

In a recent paper by Buttazzo et al. [3], a mathematical model for the strong interaction limit in density functional theory is studied (see also [6, 14]). In particular, they observe that the minimal interaction of n electrons in \(\mathbb{R }^d\) whose density is \(\mu \) can be represented as the minimum of a multimarginal Monge cyclical problem (the cyclicity is due to the symmetries of the problem).

The problem can be described as follows. We consider the cost function \(c:(\mathbb{R }^d)^n\rightarrow \mathbb{R }\) given by

$$\begin{aligned} c\left( x_1,\ldots , x_n\right) = \sum _{1 \le i<j \le n} \frac{1}{|x_i-x_j|} \qquad \forall \left( x_1,\ldots , x_n\right) \in (\mathbb{R }^d)^n \end{aligned}$$
(1.1)

and the cyclical multimarginal problem (of Monge type)

$$\begin{aligned} ({M}) \!=\!\inf \Big \{ \int \limits _{\mathbb{R }^d} c\left( x,T(x),\ldots , T^{(n-1)}(x)\right) \hbox {d} \mu (x): T:\mathbb{R }^d \!\rightarrow \! \mathbb{R }^d\,\text {Borel}, \, T_\sharp \mu \!=\!\mu , \, T^{(n)}\!=\!\hbox {Id} \Big \},\!\!\!\nonumber \\ \end{aligned}$$
(1.2)

where \(\mu \in {\fancyscript{P}}(\mathbb{R }^d)\) is a given probability measure on \(\mathbb{R }^d,\, T^{(i)}\) stands for the \(i\)th composition of T with itself, and \(T_\sharp \mu \) represents the pushforward measure of the measure \(\mu \) through the Borel map T.
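To fix ideas, the following minimal sketch (in Python, with hypothetical discrete data; it is only an illustration and the chosen map is not claimed to be optimal) evaluates the cost appearing in (1.2) for the uniform measure on \(N\) points of the real line and for \(T\) equal to the cyclic shift of the support by \(N/n\) positions, so that \(T_\sharp \mu =\mu \) and \(T^{(n)}=\hbox {Id}\).

```python
# A minimal sketch (hypothetical data): evaluate the cost in (1.2) for a candidate
# cyclic map T. mu is uniform on N points of R, T shifts the ordered support by N/n
# positions, so T_# mu = mu and T^(n) = Id; T is not claimed to be optimal.
import numpy as np

N, n = 12, 3
x = np.sort(np.random.default_rng(4).random(N))     # support of mu, each point has mass 1/N

def T(i):                                            # T as a map on support indices
    return (i + N // n) % N

cost = 0.0
for i in range(N):
    orbit = [i]                                      # (x, T(x), ..., T^(n-1)(x))
    for _ in range(n - 1):
        orbit.append(T(orbit[-1]))
    cost += sum(1.0 / abs(x[orbit[a]] - x[orbit[b]])
                for a in range(n) for b in range(a + 1, n)) / N

assert all(T(T(T(i))) == i for i in range(N))        # T^(n) = Id (here n = 3)
print(cost)                                          # cost of the plan (Id, T, T^(2))_# mu
```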

A natural question, arising both from the modeling and from the mathematical point of view, is whether there exists a map T attaining the minimum in (1.2) under suitable assumptions on \(\mu \). This question is rather difficult, and up to now, a positive answer is known only in dimension 1, where an explicit map T can be provided (see [5]). For some particular costs generated by vector fields, the existence of an optimal map has been proved by Ghoussoub and Moameni [8] and by Ghoussoub and Maurey [9]. In general, the problem of finding multimarginal optimal transport maps has been solved only under special assumptions on the local behavior of the cost (see [7, 11, 13]).

However, following the theory of optimal transportation (see [13, 15]), one can introduce the cost of a plan \(\pi \in {\fancyscript{P}}((\mathbb{R }^d)^n)\)

$$\begin{aligned} {\fancyscript{C}}(\pi ) = \int \limits _{(\mathbb{R }^d)^n} c\left( x_1,\ldots , x_n\right) \, \hbox {d} \pi \left( x_1,\ldots , x_n\right) \end{aligned}$$

and a problem of Kantorovich type

$$\begin{aligned} (K): = \inf \Big \{ {\fancyscript{C}}(\pi ): \pi \in {\fancyscript{P}}((\mathbb{R }^d)^n), \, (e_i)_\sharp \pi =\mu \;\forall i\in \{1,\ldots , n\} \Big \}, \end{aligned}$$
(1.3)

where \(e_i:(\mathbb{R }^d)^n \rightarrow \mathbb{R }^d\) denotes the projection onto the \(i\)th component. Roughly speaking, through the problem \((K)\), we allow the splitting of mass. It can be easily seen that the problem \((K)\) is linear with respect to plans and that the admissible plans form a bounded, weakly\(^*\) compact subset of the set of measures on \((\mathbb{R }^d)^n\); hence, the existence of a minimum in \((K)\) follows easily from the lower semicontinuity of the cost c (and therefore of \({\fancyscript{C}}\)).

Since every transport map T induces a transport plan \((\hbox {Id}, T,\ldots , T^{(n-1)})_\sharp \mu \), we clearly have \((K)\le (M)\). A natural question regarding the relation between \((K)\) and \((M)\) is whether the weaker problem \((K)\) is consistent with \((M)\), namely whether \((K)= (M)\) when the probability \(\mu \) is non-atomic. In the following, we give a positive answer, proving that the cost of any transport plan can be approximated arbitrarily well through maps with the same marginals.

Theorem 1.1

Let \(\mu \in {\fancyscript{P}}(\mathbb{R }^d)\) be a non-atomic probability measure. Let \(c\) be the cost function (1.1). Then

$$\begin{aligned} (K) = (M). \end{aligned}$$
(1.4)

Remark 1.2

Since the cost c is symmetric, the Kantorovich problem can be formulated considering only symmetric plans \(\pi \). In other words,

$$\begin{aligned} (K) = \inf \Big \{{\fancyscript{C}}(\pi ): \pi \in {\fancyscript{P}}((\mathbb{R }^d)^n), \, (e_i)_\sharp \pi =\mu \;\forall i\in \{1,\ldots , n\}, \, \sigma _\sharp \pi = \pi \; \forall \sigma \in {\fancyscript{O}}_n\Big \},\nonumber \\ \end{aligned}$$
(1.5)

where \({\fancyscript{O}}_n\) is the set of maps which permute the coordinates of \((\mathbb{R }^d)^n\), formally defined below.

Indeed, given a transport plan \(\pi \), by the linearity of the transport problem the plan

$$\begin{aligned} \frac{1}{n!}\sum _{\sigma \in {\fancyscript{O}}_n} \sigma _{\sharp } \pi \end{aligned}$$

has the same cost and the same marginals as \(\pi \) and it is symmetric.
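As a concrete illustration of this averaging, the following sketch (with hypothetical discrete data, not taken from the paper) symmetrizes a finite three-marginal plan and checks that the marginals and the cost are unchanged.

```python
# A minimal numerical sketch with hypothetical discrete data (not from the paper):
# averaging the permuted copies of a finite multimarginal plan, as in Remark 1.2,
# keeps its marginals and its cost while producing a symmetric plan.
import math
from itertools import permutations, product
import numpy as np

m, n = 5, 3                                        # 5 support points, 3 marginals
rng = np.random.default_rng(0)
pts = rng.standard_normal((m, 2))                  # assumed support points in R^2

def coulomb(idx):
    # pairwise Coulomb cost sum_{i<j} 1/|x_i - x_j| of one support tuple
    return sum(1.0 / np.linalg.norm(pts[idx[i]] - pts[idx[j]])
               for i in range(n) for j in range(i + 1, n))

def cost(plan):
    return sum(plan[idx] * coulomb(idx)
               for idx in product(range(m), repeat=n) if plan[idx] > 0)

# a non-symmetric plan whose three marginals all equal the uniform measure:
# mass 1/(2m) on the tuples (i, i+1, i+2) and (i, i+2, i+4), indices mod m
pi = np.zeros((m,) * n)
for i in range(m):
    pi[i, (i + 1) % m, (i + 2) % m] += 1.0
    pi[i, (i + 2) % m, (i + 4) % m] += 1.0
pi /= pi.sum()

# symmetrization (1/n!) * sum over sigma of sigma_# pi, realized by axis permutations
sym = sum(np.transpose(pi, perm) for perm in permutations(range(n))) / math.factorial(n)

for axis in range(n):
    other = tuple(a for a in range(n) if a != axis)
    assert np.allclose(pi.sum(axis=other), sym.sum(axis=other))    # marginals kept
assert np.isclose(cost(pi), cost(sym))                             # cost kept
assert all(np.allclose(sym, np.transpose(sym, p)) for p in permutations(range(n)))
```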

In a work of Pratelli [12], it is shown that in every Polish space, the Monge problem and the Kantorovich problem with two marginals are equivalent, in the sharp case of a continuous cost, possibly unbounded and infinite, provided the first marginal has no atoms. Theorem 1.1, which holds true as well in metric spaces, provides a version of this result for multimarginal problems with cyclical costs and maps. We remark that the generalization of [12] to several marginals without requiring any cyclical structure on the maps would be straightforward, as pointed out below.

Remark 1.3

For any n, it follows easily from the results in [12] that the cost of any plan can be approximated through \(n-1\) maps; in particular,

$$\begin{aligned} (K) =&\inf \Big \{ \int \limits _{\mathbb{R }^d} c(x,T_2(x),\ldots , T_{n}(x)) \, \hbox {d} \mu (x): T_i:\mathbb{R }^d\rightarrow \mathbb{R }^d\,\text {Borel},\nonumber \\&\quad T_{i\sharp } \mu =\mu \; \forall i\in \{2,\ldots, n\} \Big \}. \end{aligned}$$
(1.6)

Indeed, one can consider the latter problem as a two-marginal problem between \(\mathbb{R }^d\) and \((\mathbb{R }^d)^{n-1}\). The main purpose of Theorem 1.1 is to show that we can approximate the cost of each transport plan with special maps of the form \((\hbox {Id}, T, T^{(2)},\ldots , T^{(n-1)})\), where \(T^{(n)}=\hbox {Id}\).

Finally, we point out that the extension of Theorem 1.1 to metric spaces is almost straightforward. More precisely, we prove that the infimum of the Monge multimarginal problem with cyclical maps coincides with the minimum of the Kantorovich problem under the sharp assumption that the cost c is cyclical and continuous. Given a permutation \(\tau : \{1,\ldots , n\} \rightarrow \{1,\ldots , n\}\), we define \(\sigma _\tau : (\mathbb{R }^d)^n \rightarrow (\mathbb{R }^d)^n\) as

$$\begin{aligned} \sigma _\tau \left( x_1,\ldots, x_n\right) = \left( x_{\tau (1)},\ldots , x_{\tau (n)}\right) \qquad \forall \left( x_1,\ldots, x_n\right) \in (\mathbb{R }^d)^n. \end{aligned}$$
(1.7)

We call \({\fancyscript{O}}_n\) the collection of these functions. We say that a possibly infinite-valued cost function c on \((\mathbb{R }^d)^n\) is symmetric if \(c\circ \sigma = c\) for every \(\sigma \in {\fancyscript{O}}_n\). We say that a cost function is cyclical if \(c\circ {\overline{\sigma }} = c\) for

$$\begin{aligned} {\overline{\sigma }}\left( x_1,\ldots ,x_n\right) = \left( x_2,\ldots , x_{n}, x_1\right) \qquad \forall (x_1,\ldots, x_n) \in (\mathbb{R }^d)^n. \end{aligned}$$
(1.8)

We notice that if \(n=2\), a cost is cyclical if and only if it is symmetric.

Theorem 1.4

Let \((X,\mathrm{d })\) be a Polish space and \(\mu \in {\fancyscript{P}}(X)\) a non-atomic probability measure on \(X\). Let \(c:X^n\rightarrow [0,\infty ]\) be a cyclical cost which is continuous on its finiteness domain. Then

$$\begin{aligned}&\min \Big \{ \int \limits _{X^n} c\left( x_1,\ldots , x_n\right) \, \mathrm{d} \pi (x_1,\ldots , x_n): \pi \in {\fancyscript{P}}(X^n), \, \, (e_i)_\sharp \pi =\mu \;\forall i\in \{1,\ldots , n\} \Big \}\nonumber \\&\!=\! \inf \Big \{ \int \limits _{X} c(x,T(x),\!\ldots \!, T^{(n-1)}(x)) \, \mathrm{d} \mu (x): T:X\rightarrow X\,\mathrm{Borel map},\,T_\sharp \mu \!=\!\mu , \, T^{(n)}\!=\!\mathrm{Id} \Big \}.\nonumber \\ \end{aligned}$$
(1.9)

Remark 1.5

As in Remark 1.2, since the cost c is cyclical, the Kantorovich problem can be formulated considering only cyclical plans, namely transport plans \(\pi \) such that \({\overline{\sigma }}_\sharp \pi = \pi \), where \({\overline{\sigma }}\) is defined as in (1.8). In other words, we have that

$$\begin{aligned} (K) = \inf \Big \{{\fancyscript{C}}(\pi ): \pi \in {\fancyscript{P}}(X^n), \, (e_i)_\sharp \pi =\mu \;\forall i\in \{1,\ldots , n\}, \, {\overline{\sigma }}_\sharp \pi = \pi \Big \}. \end{aligned}$$

To make the key ideas of the proof clearer while avoiding technical details, we present in Sect. 2 a toy situation, namely the case of a bounded, uniformly continuous, symmetric cost on \(\mathbb{R }^d\) with two marginals. Even in this case, the statement does not follow directly from more classical results, or from [12], because we require the almost-optimal maps to be involutions. Section 3 is devoted to the proof of Theorem 1.1. Finally, in Sect. 4, we generalize Theorem 1.1 to multimarginal problems with a cyclical continuous cost on a metric space.

2 A uniformly continuous cost with 2 marginals: a toy case

In this section, we prove the main result with a simplified cost function and with 2 marginals. We remark that under standard assumptions on the cost implying that a unique optimal transport map exists between \(\mu \ll {\fancyscript{L}}^d\) and itself (namely the twist condition, see for example [15]), the optimal map is an involution; hence, the minimum in the symmetric Monge problem is achieved. Indeed, since T and \(T^{-1}\) are both optimal transport maps between \(\mu \) and itself, and since the optimal transport map is unique under these assumptions, we obtain that \(T= T^{-1}\) almost everywhere, which proves that T is an involution.

Theorem 2.1

Let \(\mu \in {\fancyscript{P}}(\mathbb{R }^d)\) be a non-atomic probability measure. Let \(c:\mathbb{R }^d \times \mathbb{R }^d\rightarrow [0,\infty )\) be a uniformly continuous, symmetric cost. Then \((K) = (M)\).

An analogous result without assuming any symmetry of the cost function and of the constructed map has been proved in [1].

Before proving the result, we present two lemmas. In the first one, we show the existence of a transport map which is essentially invertible (namely, invertible outside a set of measure zero) between any pair of non-atomic measures with the same mass in \(\mathbb{R }^d\). We remark that if we assume that the two measures are absolutely continuous with respect to the Lebesgue measure, then the next lemma can be proved simply by considering an optimal transport map with respect to the quadratic cost \(|x-y|^2\). Indeed, this map is invertible almost everywhere, and its inverse is the optimal transport map in the opposite direction (see [15]).

Lemma 2.2

Let \(\mu \) be a non-atomic measure on \(\mathbb{R }^d\) and let \(A, \,B\) be Borel sets such that \(\mu (A)= \mu (B)<\infty \). Then there exist two Borel maps \(T:A\rightarrow B\) and \(S:B \rightarrow A\) with the following properties:

  • \(T_\sharp (1_{A}\mu ) =1_B \mu \) and \(S_\sharp (1_{B}\mu ) =1_A \mu \);

  • the maps \(T\) and \(S\) are \(\mu \)-almost inverse in the sense that \(T \circ S (x) =x\) for \(\mu \)-almost every \(x \in B\) and \(S\circ T (x)=x\) for \(\mu \)-almost every \(x \in A\).

Proof

From the Isomorphism Theorem of measure rings (see for instance [12, Theorem 1.4] for a precise statement and the references quoted therein), it follows that given two Polish spaces X and \(Y, \,\mu \in {\fancyscript{P}}(X)\) and \(\nu \in {\fancyscript{P}}(Y)\), there exist a Borel subset \(\tilde{X} \subseteq X\) of full \(\mu \)-measure, a Borel subset \(\tilde{Y} \subseteq Y\) of full \(\nu \)-measure, and a bijective Borel function \(\phi :\tilde{X} \rightarrow \tilde{Y}\) such that \(\phi _\sharp \mu = \nu \) and \((\phi ^{-1})_\sharp \nu = \mu \). Applying this result with \(X=Y=\mathbb{R }^d\) to the two measures \(1_A \mu \) and \(1_B\mu \) (normalized so that they become probability measures), we obtain a transport map \(\phi \) between two sets of full measure \(\tilde{A} \subseteq \mathbb{R }^d\) and \({\tilde{B}} \subseteq \mathbb{R }^d\) which is also a Borel isomorphism; let now \({\hat{A}}=\phi ^{-1}(B)\) and \({\hat{B}}=\phi (A)\). Let us note that since \(\phi \) is a transport between \(1_A \mu \) and \(1_B\mu \), we have \(\mu ({\hat{A}} \cap A ) = \mu (B)=\mu (A)\), and so \({\hat{A}} \cap A\) has full measure; similarly, \({\hat{B}} \cap B\) has full measure too. Given \(a_0 \in A\) and \(b_0 \in B\), we define

$$\begin{aligned} T (x)= \left\{ \begin{array}{ll} \phi (x) &{} \,\,\text {if}\,x \in {\hat{A}} \cap A \\ b_0 &{} \text { if }\,x \in A\setminus {\hat{A}} \end{array}\right. \qquad S(x)= \left\{ \begin{array}{ll} \phi ^{-1}(x) &{} \quad \text {if}\quad x \in {\hat{B}}\cap B \\ a_0 &{}\quad \text {if}\quad x \in B\setminus {\hat{B}} \end{array}\right. \end{aligned}$$

and we claim that these functions satisfy the requested properties.

Because of the definitions of \({\hat{A}}\) and \({\hat{B}}\), the image of T is contained in B and the image of \(S\) is contained in \(A\). Moreover, we have that \(T \circ S (x)=x\) on \({\hat{B}} \cap B\) and \(S \circ T (x)=x\) on \({\hat{A}} \cap A\), and so \(\mu \)-almost everywhere on \(B\) and \(A\), respectively. Since we modified \(\phi \) only on sets of null measure, \(T\) and \(S\) are still transport maps between \(1_A \mu \) and \(1_B\mu \). \(\square \)

In the following lemma, we estimate the difference of the costs of two plans in terms of the oscillation of the cost on a set on which both are concentrated.

Lemma 2.3

Let \(Q\subseteq (\mathbb{R }^d)^n\) be a measurable set and let \(\pi \) and \(\gamma \) be transport plans concentrated on \(Q\) and such that \(\pi (Q)= \gamma (Q)\).

Then we have that

$$\begin{aligned} {\fancyscript{C}}(\pi )- {\fancyscript{C}}(\gamma ) \le \pi (Q) \sup _{\mathbf{{x}},\mathbf{{y}}\in Q}|c(\mathbf{{x}})-c(\mathbf{{y}})|. \end{aligned}$$

Proof

We have that

$$\begin{aligned} {\fancyscript{C}}(\pi )- {\fancyscript{C}}(\gamma )&= \int \limits _{(\mathbb{R }^d)^n} c(\mathbf{{x}}) \, \hbox {d}\pi (\mathbf{{x}}) - \int \limits _{(\mathbb{R }^d)^n} c(\mathbf{{y}}) \, \hbox {d}\gamma (\mathbf{{y}}) \\&= \frac{1}{\gamma (Q)} \int \limits _{(\mathbb{R }^d)^n} \int \limits _{(\mathbb{R }^d)^n} \big (c(\mathbf{{x}}) - c(\mathbf{{y}}) \big ) \, \hbox {d}\pi (\mathbf{{x}}) \, \hbox {d}\gamma (\mathbf{{y}}) \le \pi (Q) \sup _{\mathbf{{x}},\mathbf{{y}}\in Q}|c(\mathbf{{x}})-c(\mathbf{{y}})|. \end{aligned}$$

\(\square \)
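The following sketch (hypothetical discrete measures, used only as an illustration) checks the inequality of Lemma 2.3 numerically: the two measures have equal mass and are supported in a common product box away from the diagonal; the oscillation is computed over the sampled points only, which makes the tested bound even tighter than the one in the lemma.

```python
# A numerical check of Lemma 2.3 on hypothetical discrete data: two measures of
# equal mass supported in the same product box Q of (R^d)^n; the oscillation of c
# is taken over the sampled points, a lower bound for its oscillation on Q.
import numpy as np

rng = np.random.default_rng(1)
d, n, N = 2, 3, 40                       # dimension, number of marginals, sample size
lo, side = np.array([1.0, 1.0]), 0.5     # each factor of Q is a translate of [1,1.5]^2

def coulomb(x):                          # x has shape (n, d): one point of (R^d)^n
    return sum(1.0 / np.linalg.norm(x[i] - x[j])
               for i in range(n) for j in range(i + 1, n))

def sample():
    pts = lo + side * rng.random((N, n, d))
    pts += 2.0 * np.arange(n)[None, :, None]   # separate the n factors, so c is finite on Q
    return pts

X, Y = sample(), sample()                # supports of pi and gamma inside the same box Q
w_pi = rng.random(N); w_pi *= 0.7 / w_pi.sum()   # equal total mass 0.7
w_ga = rng.random(N); w_ga *= 0.7 / w_ga.sum()

C_pi = np.dot(w_pi, [coulomb(x) for x in X])
C_ga = np.dot(w_ga, [coulomb(y) for y in Y])
vals = np.array([coulomb(z) for z in np.concatenate([X, Y])])
osc = vals.max() - vals.min()            # oscillation of c over the sampled points

assert C_pi - C_ga <= w_pi.sum() * osc + 1e-12   # the bound of Lemma 2.3
```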

The proof of Theorem 2.1 is based on a decomposition of \(\mathbb{R }^d \times \mathbb{R }^d\) into squares where the cost has a small oscillation. Before proving the theorem, we introduce our simple partitions. For every \(k\in \mathbb{N }\), we partition \(\mathbb{R }^d\) into cubic cells of side length \(1/2^k\)

$$\begin{aligned} {\fancyscript{S}}^k(\mathbb{R }^d) = \Big \{ \left[ \frac{i_1}{2^k},\frac{i_1+1}{2^k} \right) \times \ldots \times \left[ \frac{i_d}{2^k},\frac{i_d+1}{2^k} \right) : \; i_1,\ldots , i_d \in \mathbb{Z }\Big \}. \end{aligned}$$
(2.1)

We consider a partition of \(\mathbb{R }^d \times \mathbb{R }^d\) made of squares which are products of cubic cells with side \(2^{-k}\), namely

$$\begin{aligned} {\fancyscript{Q}}^{k} (\mathbb{R }^d \times \mathbb{R }^d)= \left\{ C \times C^{\prime } : C, C^{\prime } \in {\fancyscript{S}}^k (\mathbb{R }^d)\right\} . \end{aligned}$$

Proof of Theorem 2.1

We prove that, given a plan \(\pi \) (namely, \(\pi \in {\fancyscript{P}}((\mathbb{R }^d)^2)\) such that \(e_{1\sharp } \pi = \mu \) and \(e_{2\sharp } \pi = \mu \)) and an \(\varepsilon >0\), there exists a Borel map \(T: \mathbb{R }^d \rightarrow \mathbb{R }^d\) such that

$$\begin{aligned}&T_{\sharp } \mu = \mu ,\end{aligned}$$
(2.2)
$$\begin{aligned}&T ( T(x))=x\,\text {for}\,\mu -\hbox {a.e.} \,x \in \mathbb{R }^d, \end{aligned}$$
(2.3)

and

$$\begin{aligned} {\fancyscript{C}}\left( (\hbox {Id}, T)_\sharp \mu \right) \le {\fancyscript{C}}( \pi ) + \varepsilon \Vert \pi \Vert . \end{aligned}$$
(2.4)

We notice that we can assume \(\pi \) to be symmetric (namely \(\pi (A\times B)=\pi (B\times A)\) for every \(A,B\subseteq \mathbb{R }^d\) measurable sets) as observed in Remark 1.2.

Due to the uniform continuity of the cost, there exists a \(k\) sufficiently large such that in every square of \({\fancyscript{Q}}^{k} (\mathbb{R }^d \times \mathbb{R }^d)\), the cost \(c\) oscillates less than \(\varepsilon \).

Since \({\fancyscript{S}}^k\) is countable, we enumerate its elements \({\fancyscript{S}}^k= \{C_i\}_{i\in \mathbb{N }}\). For every cell \(C_i\), we consider the squares in \({\fancyscript{Q}}^{k} (\mathbb{R }^d \times \mathbb{R }^d)\) with projection \(C_i\) on the first \(\mathbb{R }^d\), namely \(\{C_i \times C_j : j\in \mathbb{N }\}\). Since \(e_{1\sharp } \pi = \mu \), their masses \(\{\pi (C_i \times C_j) : j\in \mathbb{N }\}\) satisfy

$$\begin{aligned} \sum _{j=1}^\infty \pi (C_i \times C_j) =\mu (C_i) \qquad \forall i\in \mathbb{N }. \end{aligned}$$

We partition \(C_i\) into disjoint sets \(E_{i,j}\) such that \(\mu ( E_{i,j} ) =\pi (C_i \times C_j)\); this is possible since \(\mu \) has no atoms. We consider the set \(E_{i,j} \times E_{j,i}\); due to the symmetry of the plan, we have that \(\pi ( C_i \times C_j ) = \pi ( C_j \times C_i )\), and so \(\mu (E_{i,j} )= \mu (E_{j,i} )\). If \(\mu (E_{i,j}) \ne 0\) and \(i\le j\), let \(T_{i,j}\) and \(T_{j,i}\) be the maps given by Lemma 2.2 applied to the sets \(E_{i,j}\) and \(E_{j,i}\) (we consider the identity map if \(i=j\)). For every \(i,j\in \mathbb{N }\), the graph of the map \(T_{i,j}\) lies in \(E_{i,j} \times E_{j,i}\), and we have that \(T_{j,i} \circ T_{i,j} = \hbox {Id}\) almost everywhere on \(E_{i,j}\). Since the sets \(E_{i,j}\) are disjoint and their union covers \(\mu \)-almost all of \(\mathbb{R }^d\), gluing all these maps together we can define a map \(T:\mathbb{R }^d\rightarrow \mathbb{R }^d\) by

$$\begin{aligned} T(x)= \left\{ \begin{array}{ll} T_{i,j} (x) &{} \text { if }x \in E_{i,j} \\ 0 &{} \text {otherwise.} \end{array}\right. \end{aligned}$$

We prove that (2.2), (2.3), and (2.4) are satisfied. Since the sets \(E_{i,j}\) are disjoint and since by definition \(T_{i,j\sharp }(\mu \chi _{E_{i,j}}) = \mu \chi _{E_{j,i}}\) for every couple \(i,j\in \mathbb{N }\), we have that

$$\begin{aligned} T_\sharp (\mu ) = T_\sharp \Big (\mu \sum _{i,j=1}^\infty \chi _{E_{i,j}} \Big ) = \sum _{i,j=1}^\infty T_{i,j\sharp }(\mu \chi _{E_{i,j}}) = \sum _{i,j=1}^\infty \mu \chi _{E_{j,i}} = \mu , \end{aligned}$$

which proves (2.2). To prove (2.3), it is sufficient to note that on \(E_{i,j}\) the map \(T\) is \(T_{i,j}\) and maps \(E_{i,j}\) onto \(E_{j,i}\); thus, it is clear that \(T \circ T = T_{j,i} \circ T_{i,j}\) on \(E_{i,j}\). By construction, we have that \(T_{j,i} \circ T_{i,j} =\hbox {Id}\, \mu \)-a.e. in \(E_{i,j}\). Finally, we apply Lemma 2.3 to the measures \(\pi 1_{Q_{i,j}}\) and \((\hbox {Id},T_{i,j})_{\sharp } (\mu 1_{E_{i,j}})\), where \(Q_{i,j}:=C_i \times C_j\); both measures are concentrated on \(Q_{i,j}\), have the same mass, and on \(Q_{i,j}\) the cost oscillates less than \(\varepsilon \); we get

$$\begin{aligned} {\fancyscript{C}}\big ((\hbox {Id},T_{i,j})_{\sharp } (\mu 1_{E_{i,j}}) \big ) \le {{\fancyscript{C}}} ( \pi 1_{Q_{i,j}} ) + \varepsilon \Vert \pi 1_{Q_{i,j}} \Vert . \end{aligned}$$

Adding these inequalities over \(i,j\in \mathbb{N }\), we obtain (2.4). \(\square \)
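The gluing just performed has a transparent discrete analogue, sketched below with hypothetical integer data (chosen so that the mass can be split exactly): \(\mu \) is uniform on finitely many points grouped into cells, a symmetric integer matrix plays the role of \(\pi (C_i\times C_j)\), each cell is split into blocks \(E_{i,j}\), and the blocks \(E_{i,j}\), \(E_{j,i}\) are matched by mutually inverse bijections; the resulting map is an involution preserving \(\mu \) and moving exactly the prescribed mass between cells.

```python
# A discrete analogue of the gluing in the proof (hypothetical data, for illustration):
# mu is uniform on 12 points grouped into 3 cells; a symmetric integer plan M with
# row sums equal to the cell sizes prescribes how much of C_i goes to C_j. Splitting
# each cell into blocks E[i][j] of size M[i][j] and matching E[i][j] with E[j][i] by
# mutually inverse bijections yields an involution T that preserves mu.
import numpy as np

sizes = [4, 4, 4]
cells, start = [], 0                          # cells[i] = list of point labels in C_i
for s in sizes:
    cells.append(list(range(start, start + s)))
    start += s

M = np.array([[2, 1, 1],                      # symmetric plan: M[i][j] = pi(C_i x C_j)
              [1, 2, 1],
              [1, 1, 2]])
assert (M == M.T).all() and (M.sum(axis=1) == sizes).all()

# split each cell C_i into consecutive blocks E[i][j] with |E[i][j]| = M[i][j]
E = [[None] * 3 for _ in range(3)]
for i in range(3):
    pos = 0
    for j in range(3):
        E[i][j] = cells[i][pos:pos + M[i][j]]
        pos += M[i][j]

T = {}
for i in range(3):
    for j in range(3):
        if i == j:
            for x in E[i][i]:
                T[x] = x                      # identity on the diagonal blocks
        elif i < j:
            for x, y in zip(E[i][j], E[j][i]):
                T[x], T[y] = y, x             # mutually inverse bijections

points = range(sum(sizes))
assert sorted(T[x] for x in points) == list(points)                 # T_# mu = mu
assert all(T[T[x]] == x for x in points)                            # T is an involution
for i in range(3):
    for j in range(3):
        assert sum(T[x] in cells[j] for x in cells[i]) == M[i][j]   # matches the plan
```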

3 The multimarginal problem with a Coulomb cost

The proof of Theorem 2.1 was based on a decomposition of \(\mathbb{R }^d\times \mathbb{R }^d\) into cubes of the same size where the cost has a small oscillation. Since the Coulomb cost (1.1) is not uniformly continuous, in the following lemma we construct a partition of \((\mathbb{R }^d)^n\) into cubes adapted to the cost.

Before stating the lemma, we introduce a family of subsets of \((\mathbb{R }^d)^n\) from which we will choose the partition. Given the family of cells in \(\mathbb{R }^d\) defined in (2.1), we consider the family of cubes in \((\mathbb{R }^d)^n\) such that each projection on \(\mathbb{R }^d\) is a cubic cell of side \(2^{-k}\) for some \(k\)

$$\begin{aligned} {\fancyscript{Q}}^k( (\mathbb{R }^d)^n) = \left\{ C_1 \times \ldots \times C_n : C_1,\ldots , C_n \in {\fancyscript{S}}^k(\mathbb{R }^d)\right\} , \qquad \forall k\in \mathbb{N }\end{aligned}$$

and we set

$$\begin{aligned} {\fancyscript{Q}}( (\mathbb{R }^d)^n) = \bigcup _{k=1}^{\infty } {\fancyscript{Q}}^k( (\mathbb{R }^d)^n). \end{aligned}$$

Lemma 3.1

(Partition of \(\{c<\infty \}\)) Let \(\varepsilon >0\) and let \(c\) be as in (1.1). There exists a subset \({\fancyscript{F}}\) of \({\fancyscript{Q}}( (\mathbb{R }^d)^n)\) such that the following properties hold true.

  1.

    \({\fancyscript{F}}\) is a disjoint covering of \(\{c<\infty \}\)

    $$\begin{aligned} \bigcup _{Q\in {\fancyscript{F}}} Q= \left\{ c<\infty \right\} . \end{aligned}$$
    (3.1)
  2.

    The cost \(c\) oscillates at most \(\varepsilon \) inside a cube, namely for every \(Q\in {\fancyscript{F}}\)

    $$\begin{aligned} |c(x)-c(y)| \le \varepsilon \qquad \forall x,y\in Q. \end{aligned}$$
    (3.2)
  3.

    The family is cyclical, namely for every \(Q\in {\fancyscript{F}}\) we have

    $$\begin{aligned} {\overline{\sigma }} (Q) \in {\fancyscript{F}}, \end{aligned}$$
    (3.3)

    where \({\overline{\sigma }}\) is defined as in (1.8).

Proof

For every \(\mathbf{{x}}=(x_1,x_2, \ldots , x_n) \in (\mathbb{R }^d)^n \), we consider all the cubes \(Q\in {\fancyscript{Q}}( (\mathbb{R }^d)^n)\) to which \(\mathbf{{x}}\) belongs: there is exactly one cube in every \({\fancyscript{Q}}^k( (\mathbb{R }^d)^n)\), and those cubes form a chain by inclusion. We associate with every \(\mathbf{x }\in \{ c<\infty \}\) the biggest cube \(Q_{\mathbf{{x}}}\) in \({\fancyscript{Q}}( (\mathbb{R }^d)^n)\) containing \(\mathbf{x }\) where the oscillation of the cost is at most \(\varepsilon \), namely where (3.2) holds true; such a cube exists because, by the continuity of \(c\) on \(\{c<\infty \}\), the cubes of the chain satisfy (3.2) for every \(k\) large enough, and they have side at most \(1/2\). Let \({\fancyscript{F}}=\{ Q_{\mathbf{{x}}} : { c( \mathbf{{x}} ) < \infty } \}\). By construction, it enjoys (3.2); moreover, it is made of disjoint cubes. Indeed, the elements of \({\fancyscript{Q}}( (\mathbb{R }^d)^n)\) (and hence of \({\fancyscript{F}}\)) are either disjoint or one contained in the other; the second case cannot happen in \({\fancyscript{F}}\) because each \(Q_{\mathbf{{x}}}\) is maximal among the cubes containing \(\mathbf{x }\) that satisfy (3.2). Furthermore, because of the bounded oscillation property (3.2), for each \(Q \in {\fancyscript{F}}\), we have that \(Q \subseteq \{ c < \infty \}\), and so we get that

$$\begin{aligned} \left\{ c < \infty \right\} \subseteq \bigcup _{ \left\{ c( \mathbf{{x}} ) < \infty \right\} } Q_{\mathbf{{x}}} = \bigcup _{ Q \in {\fancyscript{F}}} Q \subseteq \{c < \infty \} \end{aligned}$$

which proves (3.1). Given \({\overline{\sigma }}\) defined as in (1.8), since \({\fancyscript{Q}}( (\mathbb{R }^d)^n)\) is \({\overline{\sigma }}\)-invariant and \(\mathrm{osc}_{Q} ( c) = \mathrm{osc}_{ {\overline{\sigma }}(Q)} (c)\) (because the cost is cyclical), we have that \(Q_{{\overline{\sigma }}(\mathbf{{x}})}={\overline{\sigma }} ( Q_{\mathbf{{x}}})\) for every \(\mathbf{x }\in \{ c<\infty \}\). Hence, \({\fancyscript{F}}\) is cyclical. \(\square \)
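To visualize the choice of \(Q_{\mathbf{x}}\), the following sketch treats the simplest case \(n=2,\, d=1\), where \(c(x,y)=1/|x-y|\) and the oscillation of \(c\) on a product of two disjoint dyadic intervals can be computed exactly from their endpoints (illustrative helper names, not from the paper); the maximal admissible square is found by increasing \(k\) until the oscillation first drops to \(\varepsilon \) or below.

```python
# A sketch of the choice of Q_x in Lemma 3.1 for the case n = 2, d = 1 and
# c(x, y) = 1/|x - y| (illustrative helpers, not from the paper). For two disjoint
# dyadic intervals the oscillation of c on their product is exactly computable,
# so the maximal admissible square is found by increasing k until it is <= eps.
import math

def interval(t, k):
    """Endpoints of the cell of S^k(R) containing t."""
    i = math.floor(t * 2 ** k)
    return i / 2 ** k, (i + 1) / 2 ** k

def oscillation(I, J):
    """Oscillation of 1/|x - y| on I x J for intervals I, J (infinite if they touch)."""
    gap = max(J[0] - I[1], I[0] - J[1])          # smallest distance between I and J
    span = max(J[1] - I[0], I[1] - J[0])         # largest distance between I and J
    return (1.0 / gap if gap > 0 else math.inf) - 1.0 / span

def maximal_square(x, y, eps):
    """Largest dyadic square (smallest k >= 1) containing (x, y) with oscillation <= eps."""
    assert x != y
    k = 1
    while oscillation(interval(x, k), interval(y, k)) > eps:
        k += 1                                   # shrink until the oscillation is small
    return k, interval(x, k), interval(y, k)

print(maximal_square(0.10, 0.90, eps=0.5))       # well-separated points stop at a coarse k
print(maximal_square(0.49, 0.51, eps=0.5))       # points near the diagonal force a fine k
```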

In the following lemma, we construct a partition of \(\mathbb{R }^d\) adapted to a transport plan starting from the partition of \(\{c<\infty \}\) constructed in Lemma 3.1.

Lemma 3.2

Let \(\varepsilon >0, \,c\) be as in (1.1), and \({\fancyscript{F}}\) be the partition introduced in Lemma 3.1. Let \(\mu \in {\fancyscript{P}}(\mathbb{R }^d)\) be a non-atomic probability measure and let \(\pi \in {\fancyscript{P}}((\mathbb{R }^d)^n)\) be a cyclical transport plan, namely a cyclical probability measure such that \(e_{i\sharp } \pi = \mu \) for every \(i=1,\ldots ,n\).

Then for every \(Q\in {\fancyscript{F}}\) there exists a Borel set \(A_Q\subseteq \mathbb{R }^d\) such that the following properties hold true:

  1.

    \(A_Q\) is contained in the first projection of \(Q\) and has measure \(\pi (Q)\),

    $$\begin{aligned} A_Q \subseteq e_1(Q), \qquad \pi (Q) = \mu (A_Q) \qquad \forall Q\in {\fancyscript{F}}; \end{aligned}$$
    (3.4)
  2.

    Sets associated with different cubes are essentially disjoint

    $$\begin{aligned} \mu (A_Q \cap A_{Q^{\prime }}) = 0 \qquad \forall Q, Q^{\prime }\in {\fancyscript{F}}, \; Q\ne Q^{\prime }. \end{aligned}$$
    (3.5)

To prove the lemma, we define a (total) order of construction on \({\fancyscript{F}}\) and a geometric order.

First order (\(\prec \)) Let \({\fancyscript{F}}^k = {\fancyscript{F}}\cap {\fancyscript{Q}}^k( (\mathbb{R }^d)^n)\) be the cubes in the family \({\fancyscript{F}}\) that have side equal to \(1/2^k\), named in the following the \(k\)th generation. Since every \({\fancyscript{F}}^k\) is at most countable, we enumerate its elements

$$\begin{aligned} {\fancyscript{F}}^k= \left\{ Q_{k,1} , Q_{k,2}, \ldots \right\} . \end{aligned}$$

We say that \(Q \succ Q^{\prime }\, (Q\) is older than \(Q^{\prime }\)) if \(Q\) belongs to a generation strictly smaller than \(Q^{\prime }\), or if it has a greater index in the same generation \({\fancyscript{F}}^k\). In other words we have that \(Q_{i,j} \succ Q_{i^{\prime },j^{\prime }}\) if \(i < i^{\prime }\) or if \(i=i^{\prime }\) and \(j > j^{\prime }\). We notice that \(\prec \) is a well-order if restricted to \(\cup _{i=1}^k {\fancyscript{F}}^i\) for some \(k\), so that we can use induction on it.

Second order (\(\triangleleft \)) We say that \(Q^{\prime } \triangleleft Q\) if \(Q^{\prime } \prec Q\) and \(e_1 (Q^{\prime }) \subseteq e_1(Q)\); this is not a total order. By construction, we note that if \(Q\) is older than \(Q^{\prime }\), then either \(e_1(Q^{\prime }) \cap e_1(Q) = \emptyset \) or \(Q^{\prime } \triangleleft Q\).

Heuristics for Lemma 3.2 We notice that once we fix \(A_Q\) for some \(Q\), then, among all \(Q^{\prime } \prec Q\), there might be one such that \(e_1(Q^{\prime }) \setminus A_Q\) is too small, so that we do not have enough “space” to find a set \(A_{Q^{\prime }} \subseteq e_1(Q^{\prime })\) disjoint from \(A_Q\) and with the right measure.

Our strategy will be to choose \(A_Q\) first for the smaller \(Q\) and then inductively for bigger and bigger cubes. In this way, we will always have “space” to do so. However, since there is no smallest \(Q\), we have to perform an approximate construction. For every \(k\), we build inductively sets \(A_Q^k\) that satisfy our conditions (namely, they are disjoint and have the right measure) for every \(Q \in \bigcup _{ j \le k} {\fancyscript{F}}^j\), from the smallest to the largest; we make the construction coherent, i.e., such that \(A_Q^k\) converges to some set \(A_Q\) in measure as \(k \rightarrow \infty \). We conclude by passing the properties of \(A_Q^k\) to the limit.

Before proving the result, we establish a simple measure-theoretic tool. In the following, given a Borel set \(A\subseteq \mathbb{R }^d\), we denote by \({\fancyscript{B}}(A)\) the family of Borel sets in \(\mathbb{R }^d\) which are contained in \(A\).

Lemma 3.3

Let \(\mu \in {\fancyscript{P}}(\mathbb{R }^d)\) be a non-atomic probability measure. Then for every \(A,B \in {\fancyscript{B}}(\mathbb{R }^d)\) such that \(B\subseteq A\) there exists a function \(\mathbb{B }(A,B, \cdot ):[0,\mu (A\setminus B)]\rightarrow {\fancyscript{B}}(A)\) such that

$$\begin{aligned} \mu (\mathbb{B }(A,B, t) \setminus B) = t \qquad \forall t \in [0,\mu (A\setminus B)] \end{aligned}$$
(3.6)

and

$$\begin{aligned} \mu \left( \mathbb{B }\left( A,B^{\prime }, t\right) \setminus \mathbb{B }(A,B, t)\right) =0 \qquad \forall B'\subseteq B. \end{aligned}$$
(3.7)

Proof

We first deal with the case \(B=\emptyset \). We can assume that \(\mu (A)=1\). Using the Lyapunov convexity theorem for non-atomic measures [10], we construct a Borel set \(A_{1/2} \subseteq A\) such that \(\mu (A_{1/2}) = 1/2 \); using again this theorem, we inductively construct sets \(A_{i/2^n}\) for every \(i=0,\ldots , 2^n\), with \(A_0=\emptyset \) and \(A_1=A\), such that \(A_{i/2^n} \subseteq A_{j/2^n}\) if \(i\le j\) and such that \(\mu (A_{i/2^n}) = i/2^n\). In this way, we define the function \(\mathbb{B }(A, \emptyset , \cdot )\) on the dyadic numbers. We recall that \({\fancyscript{B}}(A)\) is a complete metric space with respect to the \(L^1\)-distance, namely the distance given by the measure of the symmetric difference \(d(E,F)=\mu (E\varDelta F)\) for every \(E,F \in {\fancyscript{B}}(A)\). By construction, the function \(\mathbb{B }(A, \emptyset , \cdot )\) defined on the dyadic numbers in \([0,1]\) is 1-Lipschitz with values in the complete metric space \({\fancyscript{B}}(A)\), and therefore, it can be extended to a continuous function on \([0,1]\); this function satisfies (3.6) and

$$\begin{aligned} \mu \left( \mathbb{B }\left( A, \emptyset , t\right) \setminus \mathbb{B }(A,\emptyset , t^{\prime })\right) =0 \qquad \forall \, 0\le t\le t^{\prime } \le 1 \end{aligned}$$
(3.8)

For every \(B \in {\fancyscript{B}}(\mathbb{R }^d)\) and \(t\in [0, \mu (A\setminus B)]\), we define \(s\) to be a solution to

$$\begin{aligned} \mu \left( \mathbb{B }(A,\emptyset , s)\setminus B\right) = t \end{aligned}$$
(3.9)

(note that the left-hand side is continuous and non-decreasing as a function of \(s\); it equals 0 at \(s=0\) and \(\mu (A\setminus B)\) at \(s=1\)), and we set

$$\begin{aligned} \mathbb{B }(A,B, t)= \mathbb{B }\left( A,\emptyset , s\right) . \end{aligned}$$

Property (3.6) is true by definition. (3.7) follows from (3.9) and (3.8). \(\square \)
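In the model case \(A=[0,1]\) with \(\mu \) the Lebesgue measure, the nested family can simply be taken to be \(\mathbb{B }(A,\emptyset ,s)=[0,s]\), and \(\mathbb{B }(A,B,t)\) is obtained by solving \(\mu ([0,s]\setminus B)=t\); the sketch below (illustrative, with \(B\) a finite union of intervals) does this by bisection and exhibits the monotonicity in \(B\) used via (3.7) in the proof of Lemma 3.2.

```python
# A sketch of Lemma 3.3 in the model case A = [0,1] with Lebesgue measure
# (illustrative helpers, not from the paper): the nested family is B(A, empty, s) = [0, s],
# and B(A, B, t) = [0, s] where s solves |[0, s] \ B| = t, found here by bisection.

def measure_outside(s, B):
    """Lebesgue measure of [0, s] minus a finite union of disjoint intervals B."""
    return s - sum(max(0.0, min(s, b) - a) for a, b in B)

def nested_set(B, t, tol=1e-12):
    """Return s such that the measure of [0, s] outside B equals t."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if measure_outside(mid, B) < t:      # the map s -> |[0,s] \ B| is nondecreasing
            lo = mid
        else:
            hi = mid
    return hi

B  = [(0.2, 0.3), (0.6, 0.8)]                # an excluded set of measure 0.3
Bp = [(0.6, 0.8)]                            # a smaller excluded set B' contained in B
t  = 0.4
s_B, s_Bp = nested_set(B, t), nested_set(Bp, t)
print(s_B, s_Bp)                             # here 0.5 and 0.4
assert s_Bp <= s_B + 1e-9                    # B(A, B', t) is contained in B(A, B, t)
```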

Proof of Lemma 3.2

Description of the \(k\)th construction We define \(A^k_Q = \emptyset \) for every \(Q\in {\fancyscript{F}}\setminus \bigcup _{ j \le k} {\fancyscript{F}}^j\). Then, we construct \(A^k_Q\) by induction on the set \( \bigcup _{ j \le k} {\fancyscript{F}}^j\), which is well ordered with respect to \(\prec \). Given \(Q \in \bigcup _{ j \le k} {\fancyscript{F}}^j\), let \(B^k_Q \subseteq \mathbb{R }^d\) be the (possibly empty) set

$$\begin{aligned} B^k_Q = \bigcup ^{(k)}_{ Q^{\prime } \triangleleft Q } A_{Q^{\prime }}^k \end{aligned}$$

(where in the following the notation \( \bigcup ^{(k)}\) means that we are considering only the cubes up to the \(k\)th generation, namely in \(\bigcup _{ j \le k} {\fancyscript{F}}^j\)). Since \(Q^{\prime } \triangleleft Q\) implies that \(e_1(Q^{\prime }) \subseteq e_1(Q)\), hence \(Q^{\prime } \subseteq e_1^{-1}(e_1(Q))\), and since all the cubes in \({\fancyscript{F}}\) are disjoint, we have

$$\begin{aligned} \pi (Q) \le \pi \left( e_1^{-1}(e_1(Q))\right) - \pi \bigl (\bigcup ^{(k)}_{ Q' \triangleleft Q } Q^{\prime } \bigr )= \mu (e_1(Q))- \mu (B^k_Q) = \mu \left( e_1(Q)\setminus B^k_Q\right) , \end{aligned}$$

where we used that \((e_1)_{\sharp } \pi = \mu \). Now, thanks to Lemma 3.3, we can define

$$\begin{aligned} A^k_Q = \mathbb{B }(e_1(Q),B^k_Q, \pi (Q))\setminus B^k_Q, \end{aligned}$$

which is disjoint from \(B^k_Q\).

Existence of the limit We show by induction on the order \(\prec \) that

$$\begin{aligned} A_{Q}^k \subseteq A_{Q}^{k+1} \cup B^{k+1}_{Q}, \qquad B^k_Q \subseteq B^{k+1}_{Q}, \end{aligned}$$
(3.10)

where the inclusions are understood up to sets of \(\mu \)-measure 0. Indeed, by the inductive assumption, the same relation holds true for every \(Q^{\prime }\prec Q\), and hence

$$\begin{aligned} B^k_Q= \bigcup ^{(k)}_{Q^{\prime } \triangleleft Q} A_{Q^{\prime }}^k \subseteq \bigcup ^{(k)}_{Q^{\prime } \triangleleft Q} \left( A_{Q^{\prime }}^{k+1} \cup B^{k+1}_{Q'} \right) \subseteq B^{k+1}_{Q}. \end{aligned}$$
(3.11)

Thanks to (3.7) and (3.11), we therefore have

$$\begin{aligned} A^k_Q = \mathbb{B }\left( e_1(Q),B^k_Q, \pi (Q)\right) \setminus B^k_Q \subseteq \mathbb{B }\left( e_1(Q),B^{k+1}_Q, \pi (Q)\right) \subseteq A_{Q}^{k+1} \cup B^{k+1}_{Q}, \end{aligned}$$

which proves (3.10).

Since \(\mu (A_Q^k)=\mu (A_Q^{k+1})\) by (3.6), we obtain that

$$\begin{aligned} \mu \left( A_Q^k \varDelta A_Q^{k+1}\right) = 2 \mu \left( A_Q^k \setminus A_Q^{k+1}\right) . \end{aligned}$$

From (3.10) and since \(A_Q^k \cap B_Q^k =\emptyset \), we obtain that \(A_Q^k \setminus A_Q^{k+1} \subseteq B^{k+1}_Q \setminus B_Q^k\) up to sets of \(\mu \)-measure \(0\) and therefore

$$\begin{aligned} \mu (A_Q^k \setminus A_Q^{k+1} ) \le \mu \left( B^{k+1}_Q \setminus B^k_Q\right) \le \sum ^{(k+1)}_{Q^{\prime }\triangleleft Q} \pi ( Q^{\prime }) - \sum ^{(k)}_{Q^{\prime } \triangleleft Q } \pi ( Q^{\prime }) = \sum _{\begin{array}{c} Q^{\prime } \triangleleft Q \\ Q^{\prime } \in {\fancyscript{F}}^{k+1} \end{array}} \pi (Q^{\prime }). \end{aligned}$$

Since the right-hand side is summable over \(k\) (indeed, \(\sum _{Q^{\prime } \triangleleft Q} \pi (Q^{\prime }) \le \mu (e_1(Q))<\infty \)), for every \(Q \in {\fancyscript{F}}\) the sequence \(A_Q^k\) converges in measure (i.e., the characteristic functions converge in \(L^1\)) to some set \(A_Q\) with the same mass \(\pi (Q)\); furthermore, since \(A_Q^k\) and \(A_{Q^{\prime }}^k\) are disjoint, we deduce that \(A_Q\) and \(A_{Q^{\prime }}\) are essentially disjoint, i.e., their intersection is \(\mu \)-negligible. \(\square \)

3.1 Proof of Theorem 1.1

In the following, we prove that given a transport plan \(\pi \), there exists a map whose cost exceeds the cost of \(\pi \) by at most \(\varepsilon \). The construction of the map is modeled on the following property of graphs of functions. Given disjoint subsets \(A_{Q,1}, A_{Q,2},\ldots ,A_{Q,n}\) of \(\mathbb{R }^d\) and a map \((T_2(x),\ldots , T_n(x)): A_{Q,1} \rightarrow A_{Q,2} \times \cdots \times A_{Q,n}\) such that \(T_i\) is bijective between \(A_{Q,1}\) and \(A_{Q,i}\) for every \(i=2,\ldots ,n\), we denote by \(G\) the graph of this map, namely

$$\begin{aligned} G= \left\{ \left( x, T_2(x), \ldots , T_n(x)\right) : x\in A_{Q,1}\right\} . \end{aligned}$$

Then, we have that given \({\overline{\sigma }}\) defined as in (1.8)

$$\begin{aligned} {\overline{\sigma }}^{(i-1)}(G)&= \left\{ \left( T_i(x),\ldots , T_n(x), x, T_2(x),\ldots, T_{i-1}(x)\right) : x\in A_{Q,1}\right\} \nonumber \\&= \left\{ \left( x, T_{i+1} \circ T^{-1}_i(x),\ldots , T_n \circ T^{-1}_{i}(x), T^{-1}_i(x), T_2 \circ T^{-1}_i(x),\ldots, T_{i-1} \circ T^{-1}_i(x)\right) : x\in A_{Q,i}\right\} .\nonumber \\ \end{aligned}$$
(3.12)
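A finite check of the identity (3.12), with \(n=3\) and hypothetical finite sets standing for \(A_{Q,1}, A_{Q,2}, A_{Q,3}\):

```python
# A finite check of the rotation identity (3.12) (hypothetical data): rotating the
# graph of (Id, T_2, T_3) by sigma-bar gives the graph of the maps composed with T_2^{-1}.
A1, A2, A3 = [0, 1, 2], [3, 4, 5], [6, 7, 8]
T2 = dict(zip(A1, [4, 5, 3]))                 # a bijection A1 -> A2
T3 = dict(zip(A1, [8, 6, 7]))                 # a bijection A1 -> A3
T2inv = {v: k for k, v in T2.items()}

G = {(x, T2[x], T3[x]) for x in A1}
rotated = {(b, c, a) for (a, b, c) in G}      # sigma-bar applied to each point of G

# right-hand side of (3.12) with i = 2: points (y, T_3(T_2^{-1}(y)), T_2^{-1}(y)), y in A2
rhs = {(y, T3[T2inv[y]], T2inv[y]) for y in A2}
assert rotated == rhs
```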

 Proof of Theorem 1.1

Let \(\varepsilon >0\) and let \(\pi \) be an admissible plan for the Kantorovich problem (1.3); thanks to Remark 1.2, we can assume \(\pi \) to be symmetric (namely, \(\sigma _\sharp \pi = \pi \) for every \(\sigma \in {\fancyscript{O}}_n\)) and in particular cyclical [namely, \({\overline{\sigma }}_\sharp \pi = \pi \) for \({\overline{\sigma }}\) defined in (1.8)]. Moreover, we can assume that \({\fancyscript{C}}(\pi )<\infty \) (otherwise there is nothing to prove), so that \(\pi \) is concentrated on \(\{c<\infty \}\). We prove that there exists a map \(T_2:\mathbb{R }^d \rightarrow \mathbb{R }^d\) such that \(T_{2 \sharp } \mu = \mu , \,T^{(n)}_2(x) = x\) for \(\mu \)-a.e. \(x\in \mathbb{R }^d\), and

$$\begin{aligned} {\fancyscript{C}}\left( \left( \hbox {Id}, T_2, T^{(2)}_2,\ldots ,T^{(n-1)}_2\right) _\sharp \mu \right) \le {\fancyscript{C}}\left( \pi \right) + \varepsilon \Vert \pi \Vert . \end{aligned}$$
(3.13)

We consider a partition \({\fancyscript{F}}\) as in Lemma 3.1. For every \(Q\in {\fancyscript{F}}\), we consider \(A_{Q,1}\) constructed in Lemma 3.2. Then, we consider \({\overline{\sigma }}\) defined as in (1.8), and we define for every \(i=2,\ldots ,n\)

$$\begin{aligned} A_{Q,i} = A_{{\overline{\sigma }}^{(i-1)}(Q),1}. \end{aligned}$$
(3.14)

Now, clearly \({\fancyscript{S}}^k\) is countable for every \(k\) and so we can fix a total order \(\le \) on it. For every \(i= 1,\ldots ,n\) let \({\fancyscript{F}}_i \subseteq {\fancyscript{F}}\) be the set

$$\begin{aligned} {\fancyscript{F}}_i = \left\{ Q\in {\fancyscript{F}}: e_i(Q) \le e_j(Q) \quad \forall j\ne i\right\} . \end{aligned}$$

It can be easily seen that

$$\begin{aligned} {\fancyscript{F}}_i = \left\{ {\overline{\sigma }}^{(i)}(Q): Q\in {\fancyscript{F}}_1\right\} . \end{aligned}$$

Since every \(Q\in {\fancyscript{F}}\) is contained in \(\{c<\infty \}\) and thanks to the structure of this set (in particular we remark that \(c(x_1,\ldots x_n)=\infty \) whenever \(x_i=x_j\) for some \(i\ne j\)) we have that

$$\begin{aligned} \bigcup _{i =1}^n {\fancyscript{F}}_i ={\fancyscript{F}}. \end{aligned}$$
(3.15)

Moreover, thanks to the properties of our cost function \({\overline{\sigma }}^{(i)}(Q) \cap Q= \emptyset \) for every \(i=1,\ldots ,n-1\); using also (3.5), we have that

$$\begin{aligned} \mu (A_{Q,i} \cap A_{Q^{\prime },j}) = 0 \qquad \forall Q,Q^{\prime }\in {\fancyscript{F}}_1, \; i,j\in \left\{ 1,\ldots ,n\right\} , \text{ such } \text{ that } ({Q,i}) \ne ({Q^{\prime },j}).\nonumber \\ \end{aligned}$$
(3.16)

Thanks to (3.16), (3.14), (3.15), and (3.4), we have that

$$\begin{aligned} \mu \left( \bigcup _{Q\in {\fancyscript{F}}_1} \bigcup _{i=1}^n A_{Q,i} \right)&= \sum _{Q\in {\fancyscript{F}}_1} \sum _{i=1}^n \mu (A_{Q,i}) = \sum _{i=1}^n \sum _{Q\in {\fancyscript{F}}_1} \mu \left( A_{{\overline{\sigma }}^{(i)}(Q),1}\right) \nonumber \\&= \sum _{i=1}^n \sum _{Q\in {\fancyscript{F}}_i} \mu (A_{Q,1}) = \sum _{Q\in {\fancyscript{F}}} \mu (A_{Q,1}) = \sum _{Q\in {\fancyscript{F}}} \pi (Q). \end{aligned}$$
(3.17)

We consider \(Q\in {\fancyscript{F}}_1\), and we want to define maps \(T_1 ,\ldots, T_n\) on \(A_{Q,1} \cup \ldots \cup A_{Q,n}\). First, we define them on \(A_{Q,1}\), and then on \(A_{Q,2} \cup \ldots \cup A_{Q,n}\), taking (3.12) into account. For every \(x\in A_{Q,1}\), we consider \( (T_1(x), \ldots, T_n(x)): A_{Q,1} \rightarrow A_{Q,1} \times \cdots \times A_{Q,n}\) so that

$$\begin{aligned} \left\{ \begin{array}{l} T_1(x) = x\\ T_{i\sharp } (1_{A_{Q,1}} \mu ) = 1_{A_{Q,i}} \mu , T_i\,\text{ is }\, \mu -\,\text{ a.e. } \text{ invertible } \text{ on }\,A_{Q,1}\,\hbox {for every}\,i=2,\ldots ,n; \end{array}\right. \end{aligned}$$
(3.18)

these maps exist thanks to Lemma 2.2. With an abuse of notation, we denote by \(T_i^{-1}\) the almost everywhere inverse of \(T_i\) given by Lemma 2.2.

For every \(i=1,\ldots ,n\), we define \( (T_1(x), T_2(x),\ldots, T_n(x)): A_{Q,i} \rightarrow A_{Q,i} \times \cdots \times A_{Q,n}\times A_{Q,1} \times \cdots \times A_{Q,i-1}\) so that \(T_1(x) = x\) and \(T_j(x) = T_{i+j-1} \circ T^{-1}_i (x)\), where the indices are to be understood modulo \(n\) (it is easy to check that these functions are well defined on \(A_{Q,i}\)).

We repeat the construction for every \(Q \in {\fancyscript{F}}_1\). Thanks to (3.16) and (3.17), and since \(\sum _{Q\in {\fancyscript{F}}} \pi (Q)=\pi (\{c<\infty \})=1\), the sets \(A_{Q,i}\) are essentially disjoint and cover \(\mu \)-almost all of \(\mathbb{R }^d\); hence, the maps \(T_1,\ldots, T_n\) are well defined at \(\mu \)-almost every point. It can be easily checked by induction that \(T_j (x)= T_2^{(j-1)}(x)\) and that \(x= T_2^{(n)}(x)\) for almost every \(x\in \mathbb{R }^d\). Indeed, for \(j=1\) the statement is trivial. We assume the statement for \(j-1\), and we prove it for \(j\). For every \(x\in A_{Q,i}\) (we remark that all the indices have to be understood modulo \(n\)) we have that \(T_2(x) \in A_{Q, i+1}\); hence,

$$\begin{aligned} T^{(j-1)}_2(x) = T^{(j-2)}_2(T_2(x)) = T_{j-1}(T_2(x)) = T_{i+j-1} \circ T^{-1}_{i+1} \circ T_{i+1} \circ T^{-1}_{i} (x) = T_j (x). \end{aligned}$$

We prove that \(T_2\) is a transport map between \(\mu \) and \(\mu \), i.e., \(T_{2\sharp } \mu = \mu \). By definition,

$$\begin{aligned} T_{2\sharp } \left( 1_{A_{Q,i}} \mu \right) = T_{i+1\sharp }\Big [(T^{-1}_{i})_{\sharp } (1_{A_{Q,i}} \mu )\Big ] = 1_{A_{Q, i+1}} \mu . \end{aligned}$$

Summing up over \(i=1,\ldots , n\) and \(Q\in {\fancyscript{F}}_1\), since the sets \(A_{Q,i}\) are essentially disjoint and cover \(\mu \)-almost all of \(\mathbb{R }^d\), we obtain that (the indices are to be understood modulo \(n\))

$$\begin{aligned} T_{2\sharp } \mu = T_{2\sharp } \Big (\sum _{Q\in {\fancyscript{F}}_1} \sum _{i=1}^n 1_{A_{Q,i}} \mu \Big ) =\sum _{Q\in {\fancyscript{F}}_1} \sum _{i=1}^n 1_{A_{Q, i+1}} \mu = \mu . \end{aligned}$$

We are left to prove (3.13). We apply Lemma 2.3 to \((T_1 \times T_2 \times \ldots \times T_n)_\sharp (1_{A_{Q,i}}\mu )\) and \(1_{{\overline{\sigma }}^{(i-1)}(Q)} \pi \); both measures have mass \(\pi ({\overline{\sigma }}^{(i-1)}(Q))\) by (3.4) and (3.14). The first plan is concentrated on \(A_{Q,i} \times \cdots \times A_{Q,n}\times A_{Q,1} \times \cdots \times A_{Q,i-1}\), which is contained in \({\overline{\sigma }}^{(i-1)}(Q)\) by (3.4), and the second is concentrated on \({\overline{\sigma }}^{(i-1)}(Q)\). Thanks to (3.2), we have that

$$\begin{aligned}&{\fancyscript{C}}\left( \left( \hbox {Id}, T_2, T^{(2)}_2,\ldots ,T^{(n-1)}_2\right) _\sharp \mu \right) - {\fancyscript{C}}(\pi )= {\fancyscript{C}}\left( (T_1 \times T_2 \times \ldots \times T_n)_\sharp \mu \right) - {\fancyscript{C}}(\pi )\nonumber \\&\quad = \sum _{Q\in {\fancyscript{F}}_1} \sum _{i=1}^n \int \limits _{A_{Q,i}} c(x, T_2(x),\ldots, T_n(x))\, \mathrm{d}\mu (x)\nonumber \\&\qquad ~ - \sum _{Q\in {\fancyscript{F}}_1} \sum \limits _{i=1}^n \int \limits _{{\overline{\sigma }}^{(i-1)}({Q})} c(x_1, x_2,\ldots, x_n)\, \mathrm{d}\pi (x_1,\ldots ,x_n) \nonumber \\&\quad \le \sum _{Q\in {\fancyscript{F}}_1} \sum \limits _{i=1}^n \pi ({\overline{\sigma }}^{(i-1)}(Q)) \, \varepsilon = \sum _{Q\in {\fancyscript{F}}} \pi (Q)\, \varepsilon = \varepsilon , \end{aligned}$$
(3.19)

which proves (3.13). \(\square \)

4 A generalization to the multimarginal problem with unbounded continuous costs on metric spaces

To perform a partition of \(X^n\) and of \(X\) as in Lemmas 3.1 and 3.2, we need a nested structure on \(X\), playing the same role as the decomposition of \(\mathbb{R }^d\) introduced before. Following the same ideas as in [2] and [4], we perform a dyadic-type decomposition.

Lemma 4.1

Let \(B \subset X\) be a Borel subset of the Polish space \((X, {\mathsf{d}})\). Given \(\delta >0\), there exists a countable partition \({\fancyscript{A}}_B^{\delta }\) of \(B\) made of disjoint Borel sets of diameter at most \(\delta \), with

$$\begin{aligned} \bigcup _{C \in {\fancyscript{A}}_B^{\delta } } C =B \end{aligned}$$

and for every \(C \in {\fancyscript{A}}_B^{\delta }\)

$$\begin{aligned} {\mathsf{d}}(x,y) \le \delta \qquad \forall x,y \in C. \end{aligned}$$

Proof

Let us fix a countable dense set \(D=\{x_i \}_{i \in \mathbb{N }} \subseteq X\). We define, inductively, \(B_0=B\) and for every \(n \ge 1\)

$$\begin{aligned} C_n&= \left\{ x \in B_{n-1} : {\mathsf{d}}( x_{n} , x) \le \delta /2\right\} ,\\ B_n&= B_{n-1} \setminus C_n. \end{aligned}$$

If for some \(k\) we have \(B_k= \emptyset \) we end the construction.

We claim that the family \(\{C_i\}_{i \in \mathbb{N }}\) satisfies the required properties: it is clear that, by construction, for every natural number \(n\), the set \(C_n\) is contained in \(B\), and furthermore, for every \(x,y \in C_n\) we have

$$\begin{aligned} {\mathsf{d}}( x, y ) \le {\mathsf{d}}( x, x_n) + {\mathsf{d}}(y,x_n) \le \frac{\delta }{2} + \frac{\delta }{2} = \delta . \end{aligned}$$

It is also clear that the \(C_i\) are disjoint. We show that their union is the whole \(B\). In fact, taking any point \(x \in B\), we know that, by density, there exists a point \(x_i\) in \(D\) such that \({\mathsf{d}}(x_i, x) \le \delta /2 \); now, if \(x \in B_{i-1}\), then \(x\) belongs to \(C_i\); otherwise, if \(x \notin B_{i-1}\), then \(x\) belongs to some \(C_j\) with \(j < i\). \(\square \)
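On a finite set of points, the construction above is a simple greedy sweep; the following sketch (hypothetical data) uses the points themselves as the dense set \(D\) and assigns every point to the cell of the first center within distance \(\delta /2\).

```python
# A sketch of the greedy construction in Lemma 4.1 on a finite point cloud
# (hypothetical data): the points themselves play the role of the dense set D,
# and each point joins the cell of the first center within distance delta/2.
import numpy as np

rng = np.random.default_rng(2)
pts = rng.random((200, 2))                    # B: a finite subset of the unit square
delta = 0.3

remaining = list(range(len(pts)))             # B_{n-1}: points not yet covered
cells = []
for center in range(len(pts)):                # sweep the "dense set" x_1, x_2, ...
    C = [i for i in remaining
         if np.linalg.norm(pts[i] - pts[center]) <= delta / 2]
    if C:
        cells.append(C)
        remaining = [i for i in remaining if i not in C]
    if not remaining:                         # B_k empty: stop the construction
        break

assert sum(len(C) for C in cells) == len(pts)              # the cells cover B
assert all(np.linalg.norm(pts[i] - pts[j]) <= delta        # each cell has diameter <= delta
           for C in cells for i in C for j in C)
```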

In analogy with the previous situation, we define the basic cells of our decomposition. Let \(0<\eta <1\) be a fixed parameter (in Sect. 3 we took \(\eta =1/2\)). We define \({\fancyscript{S}}^1(X) = {\fancyscript{A}}_X^{\eta }\), and then, for every \(k\ge 2\), we inductively divide the space into cells whose diameter is at most \(\eta ^k\)

$$\begin{aligned} {\fancyscript{S}}^{k} (X) = \bigcup _{C \in {\fancyscript{S}}^{k-1} } {\fancyscript{A}}_C^{\eta ^{k}}. \end{aligned}$$

Finally, we define

$$\begin{aligned} {\fancyscript{Q}}^k ( X^n )&= \left\{ C_1 \times \cdots \times C_n : C_1, \ldots C_n \in {\fancyscript{S}}^k \right\} ,\\ {\fancyscript{Q}}(X^n)&= \bigcup _{k=1}^{\infty } {\fancyscript{Q}}^k (X^n). \end{aligned}$$

So \({\fancyscript{Q}}(X^n) \) will be our family of cubes and \({\fancyscript{Q}}^k(X^n)\) will be the \(k\)th generation.

Therefore, Lemmas 3.1 and 3.2 can be repeated verbatim in this context.

The proof of Theorem 1.4 follows the same lines as the one of Theorem 1.1; the only thing which should be taken into account is that in this case, we can no longer say that each cube has all different faces (which was obtained from the fact that \(c(x_1,\ldots, x_n)=\infty \) whenever \(x_i=x_j\) for some \(i\ne j\)); when two faces coincide, we can define the map between these faces as the identity map. More precisely, in the definition of the maps \(T_1,\ldots ,T_n\) on \(A_{Q,1}\) given in (3.18), we put \(T_i(x)=T_j(x)\) if \(A_{Q,i}=A_{Q,j}\) and \(T_i(x)=x\) if \(A_{Q,1}=A_{Q,i}\).