1 Introduction

In this note we develop a theory of optimal transport of vector measures. Let us first briefly describe the topic of classical optimal transport.

1.1 Optimal transport

In 1781 Gaspard Monge (see [19]) asked the following question: given two probability distributions \(\mu ,\nu \) on a metric space \((X,d)\), how can one transfer one distribution onto the other in an optimal way? The criterion of optimality was to minimise the average transported distance. Since then the topic has been developed extensively, and much of this development is recent. We refer the reader to the books of Villani (see [25] and [26]) and to the lecture notes of Ambrosio (see [1]) for a thorough discussion, history and applications of the optimal transport problem.

The modern mathematical treatment of the problem was initiated in 1942 by Kantorovich [14, 15]. He proposed to consider a relaxed problem of minimising

$$\begin{aligned} \int _{X\times X}d(x,y)d\pi (x,y) \end{aligned}$$

among all transference plans \(\pi \) between \(\mu \) and \(\nu \), i.e., the set \(\mathrm {\varPi }(\mu ,\nu )\) of Borel probability measures on \(X\times X\) with respective marginal distributions equal to \(\mu \) and to \(\nu \). The existence of an optimal transference plan is a straightforward consequence of Prokhorov’s theorem, provided that X is separable.

The main question that has attracted a lot of attention is whether there exists an optimal transport plan, i.e., a Borel map \(T:X\rightarrow X\) such that \(T_{\#}\mu =\nu \) and the integral

$$\begin{aligned} \int _Xd(x,T(x))d\mu (x) \end{aligned}$$

is minimal. If we knew that an optimal transference plan is concentrated on the graph of a Borel measurable function, then we could infer the existence of an optimal transport plan. The first complete answer on Euclidean space, under regularity assumptions on the considered measures, was presented in the seminal paper [13] of Evans and Gangbo. However, before that, Sudakov in [23] presented a solution of the problem that contained a flaw. The flaw was remedied by Ambrosio in [1] and later by Trudinger and Wang in [24] for the Euclidean distance and by Caffarelli, Feldman and McCann in [5] for distances induced by norms that satisfy certain smoothness and convexity assumptions. In [6] Caravenna carried out the original strategy of Sudakov for general strictly convex norms and eventually Bianchini and Daneri in [4] completed Sudakov's plan of proof for general norms on finite-dimensional normed spaces.

Let us describe briefly the strategy of Sudakov in the context of Euclidean spaces. We assume that the two Borel probability measures \(\mu ,\nu \) on \(\mathbb {R}^n\) are absolutely continuous with respect to the Lebesgue measure.

Let us recall that the paramount Kantorovich–Rubinstein duality formula states that

$$\begin{aligned} \sup \left\{ \int _{\mathbb {R}^n}u d(\mu -\nu )\mid u\text { is }1\text {-Lipschitz}\right\} \end{aligned}$$
(1)

is equal to

$$\begin{aligned} \inf \left\{ \int _{\mathbb {R}^n\times \mathbb {R}^n}\Vert x-y\Vert d\pi (x,y)\mid \pi \in \mathrm {\varPi }(\mu ,\nu )\right\} . \end{aligned}$$
(2)

Let us take an optimal u and an optimal \(\pi \) in the two above optimisation problems. We may infer that

$$\begin{aligned} u(x)-u(y)=\Vert x-y\Vert \text { for }\pi \text {-almost every }(x,y)\in \mathbb {R}^n\times \mathbb {R}^n. \end{aligned}$$
(3)

Consider the maximal sets on which u is an isometry, called the transport rays. We see that all transport has to occur on these sets. A careful analysis of the Lipschitz function u shows that the transport rays form a foliation of the underlying space \(\mathbb {R}^n\) into line segments, up to Lebesgue measure zero. Moreover, the so-called mass balance condition holds true. This is to say, for any Borel set A that is a union of some collection of transport rays we have \(\mu (A)=\nu (A)\); see e.g. [13]. Using the mass balance condition, we may construct an optimal transport by gluing together optimal maps for each of the transport rays; see e.g. [1].
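The following elementary example, which is not taken from the cited works and serves merely as an illustration, shows these objects in the simplest case. Let \(n=1\), let \(\mu \) be the uniform distribution on [0, 1] and let \(\nu \) be the uniform distribution on [1, 2]. The function \(u(x)=-x\) is 1-Lipschitz and the map \(T(x)=x+1\) satisfies \(T_{\#}\mu =\nu \). We compute

$$\begin{aligned} \int _{\mathbb {R}}u d(\mu -\nu )=-\int _0^1x dx+\int _1^2x dx=-\frac{1}{2}+\frac{3}{2}=1=\int _0^1\vert x-T(x)\vert dx. \end{aligned}$$

As the value of (1) never exceeds the value of (2), both u and the transference plan induced by T are optimal, and (3) holds, since \(u(x)-u(T(x))=1=\vert x-T(x)\vert \). Here u is an isometry on the whole real line, so there is a single transport ray and the mass balance condition reduces to \(\mu (\mathbb {R})=\nu (\mathbb {R})\).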

This is one of the important observations employed in the localisation technique, which allows one to reduce the dimension of a considered problem; see a paper of Klartag [17] for an application of the technique to weighted Riemannian manifolds satisfying the curvature-dimension condition in the sense of Bakry and Émery [2, 3] and papers of Cavalletti and Mondino [7, 8] for an application in the setting of metric measure spaces. The localisation technique stems from convex geometry, but its generalisations have been employed to prove many novel results concerning functional inequalities, e.g. the isoperimetric inequality in metric measure spaces satisfying the synthetic curvature-dimension condition (see [7, 8]). The latter notion was introduced in the foundational papers by Sturm [21, 22] and by Lott and Villani [18] and allowed for the development of a far-reaching theory of metric measure spaces. We refer the reader to [12] and references therein for a broader description of the localisation technique and its history.

1.2 Optimal transport of vector measures

The purpose of this article is to investigate a multi-dimensional generalisation of the optimal transport problem and its connections with the localisation technique, as proposed by Klartag in [17, Chapter 6].

We shall consider finite-dimensional linear spaces equipped with the Euclidean norm and 1-Lipschitz maps \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\). A leaf \(\mathcal {S}\) of a 1-Lipschitz map \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is a maximal set, with respect to the order induced by inclusion, such that the restriction \(u|_{\mathcal {S}}\) is an isometry. This is to say, \(\mathcal {S}\) is a leaf, whenever for any \(x,y\in \mathcal {S}\) we have

$$\begin{aligned} \Vert u(x)-u(y)\Vert =\Vert x-y\Vert \end{aligned}$$

and for any \(z\notin \mathcal {S}\) there exists \(x\in \mathcal {S}\) such that

$$\begin{aligned} \Vert u(x)-u(z)\Vert <\Vert x-z\Vert . \end{aligned}$$

The notion of leaves is a multi-dimensional generalisation of the notion of transport rays, see Sect. 1.1, of the one-dimensional optimal transport theory. We refer the reader to [12] for a thorough study of such leaves. Let us mention that such leaves form a convex partition of \(\mathbb {R}^n\), up to Lebesgue measure zero. Moreover, any two such leaves may intersect only by their relative boundaries; see [12] for the proofs.
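Two elementary examples, included here only as an illustration, may help to fix the notion. For \(m=1\) and \(u(x)=\Vert x\Vert \), the equality \(\vert \Vert x\Vert -\Vert y\Vert \vert =\Vert x-y\Vert \) holds if and only if one of the points x, y is a non-negative multiple of the other. Hence the leaves of u are the closed half-lines emanating from the origin, and any two distinct leaves intersect only at the origin, which is their common relative boundary point. For \(n=3\), \(m=2\) and the orthogonal projection \(u(x_1,x_2,x_3)=(x_1,x_2)\) we have

$$\begin{aligned} \Vert x-y\Vert ^2-\Vert u(x)-u(y)\Vert ^2=(x_3-y_3)^2, \end{aligned}$$

so that the leaves of u are the pairwise disjoint two-dimensional planes \(\{x\in \mathbb {R}^3\mid x_3=c\}\), \(c\in \mathbb {R}\).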

Suppose now that we are given a Borel probability measure \(\mu \) on \(\mathbb {R}^n\), absolutely continuous with respect to the Lebesgue measure, that satisfies m linear constraints. This is to say,

$$\begin{aligned} \int _{\mathbb {R}^n}fd\mu =0 \end{aligned}$$
(4)

for some integrable function \(f:\mathbb {R}^n\rightarrow \mathbb {R}^m\) with finite first moments, i.e.,

$$\begin{aligned} \int _{\mathbb {R}^n}\Vert f(x)\Vert \Vert x\Vert d\mu (x)<\infty . \end{aligned}$$

Let \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) be a 1-Lipschitz map such that

$$\begin{aligned} \int _{\mathbb {R}^n}\langle u,f\rangle d\mu =\sup \Big \{\int _{\mathbb {R}^n}\langle v,f\rangle d\mu \mid v:\mathbb {R}^n\rightarrow \mathbb {R}^m \text { is }1\text {-Lipschitz}\Big \}. \end{aligned}$$
(5)

Existence of u follows by the Arzelà–Ascoli theorem.

A Borel subset A of \(\mathbb {R}^n\) shall be called a transport set associated to u, whenever for any \(x\in A\) that belongs to a unique leaf of u and any \(y\in \mathbb {R}^n\) such that

$$\begin{aligned} \Vert x-y\Vert =\Vert u(x)-u(y)\Vert , \end{aligned}$$

we have \(y\in A\). In other words, a transport set is a Borel union of a collection of leaves of u. In [17, Chapter 6] it is conjectured that for any transport set A of u

$$\begin{aligned} \int _A fd\mu =0. \end{aligned}$$
(6)

This is a generalisation of the mass balance condition, mentioned in Sect. 1.1. The affirmative answer to the conjecture would imply that one may decompose any Borel probability measure \(\mu \), satisfying m linear constraints of the form (4), into a mixture of measures, concentrated on pairwise disjoint convex subsets of \(\mathbb {R}^n\) of dimension at most m, satisfying the same linear constraints; see also [12] for a discussion of the decomposition.

If \(m=1\) then (5) is precisely the dual problem to the optimal transport problem for measures \(\rho _1, \rho _2\) given by formulae \(d\rho _1=f_+d\mu \) and \(d\rho _2=f_-d\mu \). As we see in (1), the dual problem depends merely on the difference of measures, and therefore it makes sense to consider the optimal transport for signed measures with total mass zero.
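For convenience, let us spell out this reduction; the computation below is a routine sketch. Write \(f=f_+-f_-\) with \(f_+,f_-\ge 0\). For any 1-Lipschitz function \(v:\mathbb {R}^n\rightarrow \mathbb {R}\) we have

$$\begin{aligned} \int _{\mathbb {R}^n}vfd\mu =\int _{\mathbb {R}^n}vd\rho _1-\int _{\mathbb {R}^n}vd\rho _2=\int _{\mathbb {R}^n}vd(\rho _1-\rho _2), \end{aligned}$$

where the integrals are finite thanks to the assumption of finite first moments, while (4) reads \(\rho _1(\mathbb {R}^n)=\rho _2(\mathbb {R}^n)\). Thus \(\rho _1\) and \(\rho _2\) are non-negative measures of equal mass, which become probability measures after division by this common mass whenever it is positive, and the supremum in (5) has the form (1) with \(\mu -\nu \) replaced by \(\rho _1-\rho _2\).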

Inspired by this observation, in Sect. 2 we develop a theory of optimal transport with metric cost of vector measures of total mass zero and study its basic properties. The rôle of a vector measure in the problem considered above is played by the measure with density f with respect to the measure \(\mu \).

The precise formulation of the optimal transport problem for an \(\mathbb {R}^m\)-valued measure \(\eta \) on a metric space \((X,d)\) that we deal with is as follows:

$$\begin{aligned} \inf \left\{ \int _{X\times X} d(x,y) d\Vert \pi \Vert (x,y)\mid \mathrm {P}_1\pi -\mathrm {P}_2\pi =\eta \right\} . \end{aligned}$$
(7)

Here \(\mathrm {P}_1\pi \) and \(\mathrm {P}_2\pi \) stand for the first and the second marginal of the \(\mathbb {R}^m\)-valued measure \(\pi \) respectively. The assumption on \(\eta \) is that

$$\begin{aligned} \int _Xd(x,x_0)d\Vert \eta \Vert (x)<\infty \text { for some }x_0\in X\text { and }\eta (X)=0. \end{aligned}$$

The above problem for \(m=1\) simplifies to the original optimal transport problem, as follows readily by the Kantorovich–Rubinstein formula. We prove that for \(m>1\) an analogue of this formula holds with (1) replaced by

$$\begin{aligned} \sup \left\{ \int _{X}\langle u,d\eta \rangle \mid u:X\rightarrow \mathbb {R}^m\text { is }1\text {-Lipschitz}\right\} \end{aligned}$$
(8)

and with (2) replaced by (7). This is the content of Theorem 2. We also develop a theory of the Wasserstein space \(\mathcal {W}(X,\mathbb {R}^m)\) of vector-valued measures. We identify its dual space as the space of vector-valued Lipschitz maps; see Theorem 1. Theorem 3 provides an analogue of (3) in the new setting.

The conjecture of Klartag (see [17, Chapter 6]) in the language of our theory of optimal transport of vector measures may be restated as follows. Suppose that we are given a vector measure \(\mu \) on \(\mathbb {R}^n\), with \(\mu (\mathbb {R}^n)=0\), which is absolutely continuous with respect to Lebesgue measure. Let \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) be a 1-Lipschitz map, with respect to Euclidean norms, that attains the supremum

$$\begin{aligned} \sup \left\{ \int _{\mathbb {R}^n}\langle v,d\mu \rangle \mid v:\mathbb {R}^n\rightarrow \mathbb {R}^m\text { is }1\text {-Lipschitz}\right\} . \end{aligned}$$
(9)

It is claimed in [17] that the following mass balance condition holds true

$$\begin{aligned} \mu (A)=0\text { for any Borel set }A\text { that is a union of a family of leaves of }u. \end{aligned}$$
(10)

Using the developed theory, in Sect. 3 we resolve the conjecture in the affirmative, provided that there exists an optimal transport with marginals of its total variation that are absolutely continuous with respect to the Lebesgue measure; see Theorem 4. Note that in the one-dimensional setting, the existence of such optimal transport is clear; see (1) and (2).

We provide a counterexample to the conjecture, for the case \(m>1\); see Theorem 5. It shows that, in general, the mass balance condition (10) fails to be true. It follows that an optimal transport with absolutely continuous marginals may fail to exist, unlike in the one-dimensional case.

More generally, let \(\mathcal {F}\) be any subset of 1-Lipschitz maps that is locally uniformly closed. We prove that the mass balance condition (10) fails to be true, even when the variational problem (9) is replaced by

$$\begin{aligned} \sup \left\{ \int _{\mathbb {R}^n}\langle v,d\mu \rangle \mid v\in \mathcal {F}\right\} , \end{aligned}$$
(11)

unless \(\mathcal {F}\) is trivial, i.e. consists merely of affine maps. This is also shown for any norm on \(\mathbb {R}^n\) and any strictly convex norm on \(\mathbb {R}^m\); see Theorem 6.

Note that the outline of a proof of the conjecture suggested in [17] has a gap, as follows by the results of [11].

Let us mention here that in [12] the generalisation of the localisation technique to multiple constraints is studied. There, a partition associated to any 1-Lipschitz map \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\), \(m\le n\), is studied thoroughly. It is established that any log-concave measure on \(\mathbb {R}^n\) may be disintegrated with respect to this partition and that the resulting conditional measures, associated to leaves of maximal dimension, are again log-concave. This result is also presented in the context of spaces satisfying the curvature-dimension condition \(CD(\kappa ,N)\), thus partially confirming another conjecture of Klartag [17, Chapter 6].

Let us also mention the existence of another approach to optimal transport of vector measures, different from ours, developed by Chen, Georgiou, Tannenbaum, Ryu, Li, Osher, Haber and Yamamoto (see [9, 10] and [20]).

1.3 Outline of the paper

Section 2 is devoted to the development and study of the optimal transport theory of vector measures. We define a Wasserstein space and in Theorem 1 we identify its dual. Theorem 2 provides an analogue of the Kantorovich–Rubinstein duality formula.

In Sect. 3 we study the mass balance condition for vector measures. In Theorem 4 we answer in the affirmative the conjecture of Klartag, provided there exists an optimal transport with absolutely continuous marginals of its total variation. In Theorem 5 we provide a counterexample to the conjecture, in the Euclidean setting. In Theorem 6 we resolve the conjecture in the negative in the general setting.

2 Optimal transport of vector measures

In this section we develop the theory of optimal transport of vector measures.

Let X be a metric space with metric d. Let \(\mu \) be an \(\mathbb {R}^m\)-valued Borel measure on X. If \(\pi \) is an \(\mathbb {R}^m\)-valued Borel measure on \(X\times X\), we write \(\mathrm {P}_1\pi \) for the first marginal of \(\pi \), i.e. the measure given by

$$\begin{aligned} \mathrm {P}_1\pi (A)=\pi (A\times X), \end{aligned}$$

for all Borel \(A\subset X\), and \(\mathrm {P}_2\pi \) for the second marginal of \(\pi \),

$$\begin{aligned} \mathrm {P}_2\pi (B)=\pi (X\times B), \end{aligned}$$

for all Borel \(B\subset X\). We shall consider a variational problem

$$\begin{aligned} \mathcal {I}(\mu )=\inf \left\{ {\int _{X\times X}}d(x,y) d\Vert \pi \Vert (x,y)\mid \pi \in \varGamma (\mu ) \right\} . \end{aligned}$$
(12)

Here \(\varGamma (\mu )\) is the set of all \(\mathbb {R}^m\)-valued Borel measures \(\pi \) on \(X\times X\) such that

$$\begin{aligned} \mu =\mathrm {P}_1\pi -\mathrm {P}_2\pi . \end{aligned}$$

To check whether (12) defines a meaningful quantity, we have to check whether \(\varGamma (\mu )\) is non-empty.

We shall need the following definition.

Definition 1

Let \(\sigma \) be an \(\mathbb {R}^m\)-valued Borel measure on X and let \(\theta \) be a Borel signed measure on X. A unique Borel \(\mathbb {R}^m\)-valued measure \(\sigma \otimes \theta \) such that

$$\begin{aligned} \langle \sigma \otimes \theta ,v\rangle = \langle \sigma , v\rangle \otimes \theta \end{aligned}$$

for all \(v\in \mathbb {R}^m\) we shall call a product measure. Here \(\langle \sigma ,v \rangle \otimes \theta \) is the usual product measure of \(\mathbb {R}\)-valued measures.

Remark 1

It is clear that the product measure exists. Analogously we define the product measure \(\theta \otimes \sigma \) for a Borel signed measure \(\theta \) and a Borel \(\mathbb {R}^m\)-valued measure \(\sigma \).

Proposition 1

\(\varGamma (\mu )\) is non-empty if and only if

$$\begin{aligned} \mu (X)=0. \end{aligned}$$
(13)

Proof

Clearly, if there exists \(\pi \in \varGamma (\mu )\), then

$$\begin{aligned} \mu (X)=\mathrm {P}_1\pi (X)-\mathrm {P}_2\pi (X)=\pi (X\times X)-\pi (X\times X)=0, \end{aligned}$$

so the condition (13) is satisfied. Conversely, assume that (13) holds true. Let \(\nu \) be any Borel probability measure on X. Set

$$\begin{aligned} \pi =\mu \otimes \nu . \end{aligned}$$

Here \(\mu \otimes \nu \) is the product measure; see Definition 1. Then for any Borel set \(A\subset X\) and any vector \(v\in \mathbb {R}^m\), we have

$$\begin{aligned} \langle \pi (A\times X)-\pi (X\times A),v\rangle =\langle \mu (A),v\rangle -\langle \mu (X),v\rangle \nu (A)=\langle \mu (A),v\rangle . \end{aligned}$$

This is to say, \(\mathrm {P}_1\pi -\mathrm {P}_2\pi =\mu \). \(\square \)

The quantity defined by (12) we shall call the Kantorovich–Rubinstein norm of \(\mu \).

Proposition 2

Assume that \(\mu (X)=0\). Then \(\mathcal {I}(\mu )<\infty \) provided that

$$\begin{aligned} \int _{X}d\left( x,x_0\right) d\Vert \mu \Vert (x)<\infty \end{aligned}$$
(14)

for some (equivalently: any) \(x_0\in X\).

Proof

Define

$$\begin{aligned} \pi =\mu \otimes \delta _{x_0}. \end{aligned}$$

Here \(\delta _{x_0}\) is a probability measure such that \(\delta _{x_0}(\{x_0\})=1\). Then \(\pi \in \varGamma (\mu )\) and

$$\begin{aligned} \int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y)\le \int _{X}d\left( x,x_0\right) d\Vert \mu \Vert (x). \end{aligned}$$
(15)

This shows that \(\mathcal {I}(\mu )<\infty \), provided that (14) is satisfied. The finiteness of

$$\begin{aligned} \int _{X}d(x,y)d\Vert \mu \Vert (x) \end{aligned}$$

for some \(y\in X\) is equivalent to its finiteness for every \(y\in X\), as follows by the triangle inequality and the finiteness of \(\Vert \mu \Vert (X)\). \(\square \)
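As a simple illustration of Propositions 1 and 2, consider the following example, in which the points x, y and the vector w are chosen by us. Let \(\mu =(\delta _x-\delta _y)w\) with \(x,y\in X\), \(x\ne y\), and \(w\in \mathbb {R}^m\). The measure \(\pi '=(\delta _x\otimes \delta _y)w\) belongs to \(\varGamma (\mu )\), since \(\mathrm {P}_1\pi '=\delta _xw\) and \(\mathrm {P}_2\pi '=\delta _yw\), and its total variation is \(\Vert w\Vert \delta _{(x,y)}\). Hence

$$\begin{aligned} \mathcal {I}(\mu )\le \int _{X\times X}d(s,t)d\Vert \pi '\Vert (s,t)=d(x,y)\Vert w\Vert \le \left( d(x,x_0)+d(y,x_0)\right) \Vert w\Vert . \end{aligned}$$

The last quantity is the right-hand side of (15) for this \(\mu \), as \(\Vert \mu \Vert =\Vert w\Vert (\delta _x+\delta _y)\). Thus the measure \(\mu \otimes \delta _{x_0}\) used in the proof of Proposition 2 need not be the cheapest element of \(\varGamma (\mu )\).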

Definition 2

We define the Wasserstein space \(\mathcal {W}(X,\mathbb {R}^m)\) as the space of all Borel measures \(\mu \) on X with values in \(\mathbb {R}^m\) such that

$$\begin{aligned} \mu (X)=0 \text { and } \int _{X}d(x,x_0)d\Vert \mu \Vert (x)<\infty \end{aligned}$$

for some \(x_0\in X\). We endow it with a norm \(\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=\mathcal {I}(\mu )\).

Before we proceed let us recall the following definition.

We say that a non-negative Borel measure \(\mu \) on X is inner regular if for any Borel set \(B\subset X\) we have

$$\begin{aligned} \mu (B)=\sup \{\mu (K)\mid K\subset B, K \text { is a compact set}\}. \end{aligned}$$

Let us note that Ulam’s lemma states that any finite Borel measure on a Polish space is inner regular.

Lemma 1

Suppose that X is a Polish space. Let \(\mu \) be an \(\mathbb {R}^m\)-valued Borel measure in \(\mathcal {W}(X,\mathbb {R}^m)\). Suppose that for any Lipschitz function \(u:X\rightarrow \mathbb {R}^m\)

$$\begin{aligned} \int _{X}\langle u, d\mu \rangle =0. \end{aligned}$$

Then \(\mu =0\).

Proof

We may assume that \(m=1\), by considering each coordinate of \(\mu \) separately. Let \(\mu =\mu _+-\mu _-\) be the Hahn–Jordan decomposition of \(\mu \). There exist two disjoint Borel sets \(A,B\subset X\) with \(\mu _+(A^c)=0\) and \(\mu _-(B^c)=0\). Choose any Borel set \(E\subset A\). As any finite measure on X is inner regular, for any \(\epsilon >0\) there exists a compact set \(K\subset E\) such that

$$\begin{aligned} \mu _+(E)\le \mu _+(K)+\epsilon . \end{aligned}$$

Define a function \(u_{\epsilon }\) by the formula

$$\begin{aligned} u_{\epsilon }(x)=(1-\frac{1}{\epsilon }\mathrm {dist}(x,K))\vee 0. \end{aligned}$$

Then \(u_{\epsilon }\) is Lipschitz, equal to one on K and equal to zero on the complement of

$$\begin{aligned} K_{\epsilon }=\{x\in X\mid \mathrm {dist}(x,K)\le \epsilon \}. \end{aligned}$$

Thus

$$\begin{aligned} 0=\int _{X}u_{\epsilon }d\mu =\mu _+(K)+\int _{K_{\epsilon }\setminus K}u_{\epsilon }d\mu . \end{aligned}$$

Therefore, by the above,

$$\begin{aligned} \mu _+(E)\le \epsilon +\mu _+(K)\le \epsilon +\mu _-\left( K_{\epsilon }\setminus K\right) . \end{aligned}$$

Letting \(\epsilon \) tend to zero, we get \(\mu _+(E)=0\). It follows that \(\mu _+=0\). Analogously, \(\mu _-=0\). This is to say, \(\mu =0\). \(\square \)

Remark 2

In what follows, we shall always assume that the underlying space X is a Polish space.

Proposition 3

The function \(\mathcal {W}(X,\mathbb {R}^m)\ni \mu \mapsto \Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}\in \mathbb {R}\) is a norm.

Proof

Let us first check that

$$\begin{aligned} \Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=0 \text { if and only if } \mu =0. \end{aligned}$$
(16)

If \(\mu =0\), then \(\pi =0\) belongs to \(\varGamma (\mu )\), so \(\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=0\). Conversely, assume that \(\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=0\). Choose any L-Lipschitz function

$$\begin{aligned} u:X\rightarrow \mathbb {R}^m. \end{aligned}$$

Then for any \(\pi \in \varGamma (\mu )\) we have

$$\begin{aligned} \Big |\int _{X}\langle u, d\mu \rangle \Big |= \Big |\int _{X\times X}\langle u(x)-u(y), d\pi (x,y)\rangle \Big |\le L \int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y). \end{aligned}$$

Therefore if \(\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=0\), then

$$\begin{aligned} \int _{X}\langle u, d\mu \rangle =0. \end{aligned}$$

It follows by Lemma 1 that \(\mu =0\). Homogeneity of \(\Vert \cdot \Vert _{\mathcal {W}(X,\mathbb {R}^m)}\) is clear. Let us show that the triangle inequality holds. For this choose measures \(\mu ,\nu \in \mathcal {W}(X,\mathbb {R}^m)\) and any measures \(\pi \in \varGamma (\mu )\) and \(\rho \in \varGamma (\nu )\). Then

$$\begin{aligned} \mu +\nu =\mathrm {P}_1(\pi +\rho )-\mathrm {P}_2(\pi +\rho ), \end{aligned}$$

so that \(\pi +\rho \in \varGamma (\mu +\nu )\). It follows that

$$\begin{aligned} \begin{aligned} \Vert \mu +\nu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}&\le \int _{X\times X}d(x,y)d\Vert \pi +\rho \Vert (x,y)\le \\&\le \int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y)+ \int _{X\times X}d(x,y)d\Vert \rho \Vert (x,y). \end{aligned} \end{aligned}$$

Taking the infimum over all \(\pi ,\rho \) we see that the triangle inequality holds true. \(\square \)

Proposition 4

The linear space \(\mathcal {U}\) of measures of the form

$$\begin{aligned} \sum _{i=1}^n \delta _{x_i} v_i \end{aligned}$$

for \(x_i\in X\) and \(v_i\in \mathbb {R}^m\), \(i=1,\dotsc ,n\), such that \(\sum _{i=1}^n v_i=0\), is dense in \(\mathcal {W}(X,\mathbb {R}^m)\).

Proof

Choose any measure \(\mu \in \mathcal {W}(X,\mathbb {R}^m)\). Choose any \(\epsilon >0\). Choose any point \(x_0\in X\) and a compact set K such that

$$\begin{aligned} \int _{K^c}d\left( x,x_0\right) d\Vert \mu \Vert (x)\le \epsilon . \end{aligned}$$

Choose pairwise disjoint Borel sets \(A_1,A_2,\dotsc ,A_k \subset K\) such that the diameter of each is at most \(\epsilon \) and

$$\begin{aligned} K=\bigcup _{i=1}^kA_i. \end{aligned}$$

Consider the restrictions \(\mu _i=\mu |_{A_i}\) of the measure \(\mu \) to the sets \(A_i\), \(i=1,2,\dotsc ,k\). Choose any points \(x_i\in A_i\). Then, as

$$\begin{aligned} \pi _i=\mu _i\otimes \delta _{x_i}\in \varGamma \left( \mu _i-\mu _i(X)\delta _{x_i}\right) , \end{aligned}$$

we have

$$\begin{aligned} \Vert \mu _i- \mu _i(X) \delta _{x_i}\Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le \int _{X}d\left( y,x_i\right) d\Vert \mu _i\Vert (y)\le \epsilon \Vert \mu \Vert (A_i). \end{aligned}$$

Let \(A_0=K^c\) and let \(\mu _0=\mu |_{A_0}\). Then

$$\begin{aligned} \pi _0=\mu _0\otimes \delta _{x_0}\in \varGamma \left( \mu _0-\mu _0(X)\delta _{x_0}\right) , \end{aligned}$$

so

$$\begin{aligned} \Vert \mu _0-\mu _0(X)\delta _{x_0}\Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le \int _{X}d\left( x,x_0\right) d\Vert \mu _0\Vert (x)\le \epsilon . \end{aligned}$$

Set

$$\begin{aligned} \nu =\sum _{i=0}^k\mu \left( A_i\right) \delta _{x_i}. \end{aligned}$$

Then \(\nu \in \mathcal {U}\). By the triangle inequality

$$\begin{aligned} \begin{aligned}&\Vert \mu -\nu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}\le \sum _{i=0}^k \Vert \mu _i-\mu _i(X)\delta _{x_i}\Vert _{\mathcal {W}(X,\mathbb {R}^m)}\le \\&\quad \le \epsilon \sum _{i=1}^k\Vert \mu \Vert (A_i)+\epsilon \le \epsilon \left( \Vert \mu \Vert (X)+1\right) . \end{aligned} \end{aligned}$$

This concludes the proof. \(\square \)

Corollary 1

If X is separable, then so is the Wasserstein space \(\mathcal {W}(X,\mathbb {R}^m)\).

Proof

Fix \(n\in \mathbb {N}\). Choose a countable dense subset \(A\subset X\) and a set

$$\begin{aligned} B\subset \left\{ \left( w_1,\dotsc ,w_n\right) \in \mathbb {R}^m\times \dotsc \times \mathbb {R}^m\mid \sum _{i=1}^nw_i=0\right\} \end{aligned}$$
(17)

which is countable and dense in the set on the right-hand side of (17). Consider a measure \(\mu \) given by

$$\begin{aligned} \mu =\sum _{i=1}^n \delta _{x_i} v_i \end{aligned}$$

for \(x_i\in X\) and \(v_i\in \mathbb {R}^m\), \(i=1,\dotsc ,n\), such that \(\sum _{i=1}^n v_i=0\). Choose \(\epsilon >0\) and \(\tilde{x}_i\in A\), \(i=1,\dotsc ,n\), and \((\tilde{v}_i)_{i=1}^n\in B\), such that for \(i=1,\dotsc ,n\)

$$\begin{aligned} d\left( x_i,\tilde{x}_i\right)<\epsilon \text { and }\Vert v_i-\tilde{v}_i\Vert <\epsilon \text { and } \sum _{i=1}^n\tilde{v}_i=0. \end{aligned}$$

Set

$$\begin{aligned} \tilde{\mu }=\sum _{i=1}^n \delta _{\tilde{x}_i} \tilde{v}_i. \end{aligned}$$

Then

$$\begin{aligned} \Vert \mu -\tilde{\mu }\Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le \Big \Vert \sum _{i=1}^n \delta _{x_i}\left( v_i-\tilde{v}_i\right) \Big \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }+\Big \Vert \sum _{i=1}^n \left( \delta _{x_i}-\delta _{\tilde{x}_i}\right) \tilde{v}_i\Big \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }. \end{aligned}$$

Choose any \(x_0\in X\). Taking

$$\begin{aligned} \pi =\sum _{i=1}^n \delta _{x_i} \otimes \delta _{x_0}\left( v_i-\tilde{v}_i\right) \text { and } \rho =\sum _{i=1}^n \left( \delta _{x_i}\otimes \delta _{\tilde{x}_i}\right) \tilde{v}_i \end{aligned}$$

we see that

$$\begin{aligned} \Big \Vert \sum _{i=1}^n \delta _{x_i}\left( v_i-\tilde{v}_i\right) \Big \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le \epsilon \sum _{i=1}^nd\left( x_i,x_0\right) \end{aligned}$$

and

$$\begin{aligned} \Big \Vert \sum _{i=1}^n \left( \delta _{x_i}-\delta _{\tilde{x}_i}\right) \tilde{v}_i\Big \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le \epsilon \sum _{i=1}^n\Vert \tilde{v}_i\Vert \le \epsilon \sum _{i=1}^n\left( \Vert v_i\Vert +\epsilon \right) . \end{aligned}$$

The conclusion follows now from Proposition 4. \(\square \)

Definition 3

Choose any \(x_0\in X\). Define

$$\begin{aligned} \mathcal {L}\left( X,\mathbb {R}^m\right) =\left\{ u:X\rightarrow \mathbb {R}^m\mid u \text { is Lipschitz and } u(x_0)=0\right\} , \end{aligned}$$

i.e. the Banach space of \(\mathbb {R}^m\)-valued Lipschitz functions on X taking value zero at \(x_0\), with norm

$$\begin{aligned} \Vert u\Vert _{\mathcal {L}\left( X,\mathbb {R}^m\right) }=\sup \left\{ \frac{\Vert u(x)-u(y)\Vert }{d(x,y)}\mid x,y\in X, x\ne y\right\} . \end{aligned}$$

Theorem 1

Define

$$\begin{aligned} T:\mathcal {L}\left( X,\mathbb {R}^m\right) \rightarrow \mathcal {W}\left( X,\mathbb {R}^m\right) ^* \end{aligned}$$

and

$$\begin{aligned} S:\mathcal {W}\left( X,\mathbb {R}^m\right) ^*\rightarrow \mathcal {L}\left( X,\mathbb {R}^m\right) \end{aligned}$$

by

$$\begin{aligned} T(u)(\mu )=\int _{X}\langle u,d\mu \rangle \end{aligned}$$
(18)

and

$$\begin{aligned} \langle S(\lambda )(x),w\rangle =\lambda \left( \left( \delta _x-\delta _{x_0}\right) w\right) , \end{aligned}$$
(19)

for any \(w\in \mathbb {R}^m\). Then S and T are mutual inverses and establish an isometric isomorphism of \(\mathcal {L}(X,\mathbb {R}^m)\) and \(\mathcal {W}(X,\mathbb {R}^m)^*\).

Proof

Choose any \(\pi \in \varGamma (\mu )\). Then \(\mathrm {P}_1\pi -\mathrm {P}_2\pi =\mu \). Thus, if u is a Lipschitz map, then

$$\begin{aligned} \bigg |\int _X\langle u, d\mu \rangle \bigg |=\bigg |\int _{X\times X}\langle u(x)-u(y), d\pi (x,y) \rangle \bigg |\le \Vert u\Vert _{\mathcal {L}\left( X,\mathbb {R}^m\right) }\int _{X\times X} d(x,y) d\Vert \pi \Vert (x,y). \end{aligned}$$

Taking the infimum over all \(\pi \in \varGamma (\mu )\), we see that

$$\begin{aligned} \bigg |\int _X\langle u, d\mu \rangle \bigg |\le \Vert u\Vert _{\mathcal {L}\left( X,\mathbb {R}^m\right) } \Vert \mu \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }. \end{aligned}$$

The above calculation shows that the formula (18) defines a continuous functional of norm at most \(\Vert u\Vert _{\mathcal {L}(X,\mathbb {R}^m)}\). If \(w\in \mathbb {R}^m\) is of norm one and \(x,y\in X\), \(x\ne y\), then for

$$\begin{aligned} \mu _{x,y,w}=\frac{\delta _x-\delta _y}{d(x,y)}w \end{aligned}$$
(20)

we have \(\Vert \mu _{x,y,w}\Vert _{\mathcal {W}(X,\mathbb {R}^m)}\le 1\) and for any \(u\in \mathcal {L}(X,\mathbb {R}^m)\)

$$\begin{aligned} \int _{X}\langle u,d\mu _{x,y,w}\rangle =\frac{\langle w, u(x)-u(y)\rangle }{d(x,y)}. \end{aligned}$$

Thus

$$\begin{aligned} \Vert u\Vert _{\mathcal {L}\left( X,\mathbb {R}^m\right) }=\Vert T(u)\Vert . \end{aligned}$$

We shall now show that \(T\circ S=\mathrm {Id}\). Take any functional \(\lambda \in \mathcal {W}(X,\mathbb {R}^m)^*\). Set

$$\begin{aligned} \sigma _{x,w}=\left( \delta _x-\delta _{x_0}\right) w. \end{aligned}$$

Then \(S(\lambda ):X\rightarrow \mathbb {R}^m\) is defined by the formula

$$\begin{aligned} \langle S(\lambda )(x),w\rangle =\lambda \left( \sigma _{x,w}\right) . \end{aligned}$$

It is clear that the above formula defines \(S(\lambda )\) uniquely. We claim that the map \(v=S(\lambda )\) is \(\Vert \lambda \Vert \)-Lipschitz. Indeed

$$\begin{aligned} \Vert v(x)-v(y)\Vert =\sup \left\{ \langle v(x)-v(y),w\rangle \mid w\in \mathbb {R}^m, \Vert w\Vert =1\right\} , \end{aligned}$$

and as

$$\begin{aligned} \langle v(x)-v(y),w\rangle = \lambda \left( \sigma _{x,w}-\sigma _{y,w}\right) \le \Vert \lambda \Vert \Vert \sigma _{x,w}-\sigma _{y,w}\Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) } \end{aligned}$$

we see that

$$\begin{aligned} \Vert v(x)-v(y)\Vert \le \Vert \lambda \Vert d(x,y), \text { since } \Vert \sigma _{x,w}-\sigma _{y,w}\Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }\le d(x,y). \end{aligned}$$

Suppose that \(\nu =(\delta _x-\delta _y)z\). We compute

$$\begin{aligned} T(v)(\nu )=\int _X\langle v,d\nu \rangle =\int _X\langle v,z\rangle d\left( \delta _x-\delta _y\right) = \lambda \left( \sigma _{x,z}-\sigma _{y,z}\right) =\lambda (\nu ). \end{aligned}$$

We see that \(T(S(\lambda ))\) and \(\lambda \) are equal on the set spanned by \((\delta _x-\delta _y)z\), where \(x,y\in X\), \(z\in \mathbb {R}^m\). By Proposition 4, we see that \(T(S(\lambda ))\) and \(\lambda \) are equal on \(\mathcal {W}(X,\mathbb {R}^m)\).

Let us show also that \(S\circ T=\mathrm {Id}\). Choose any \(w\in \mathbb {R}^m\) and any map \(u\in \mathcal {L}(X,\mathbb {R}^m)\). Then

$$\begin{aligned} \langle S(T(u))(x),w\rangle = T(u)\left( \left( \delta _x-\delta _{x_0}\right) w\right) =\int _X\langle u, d\left( \delta _x-\delta _{x_0}\right) w\rangle =\langle u(x),w\rangle , \end{aligned}$$

as \(u(x_0)=0\). Therefore \(S(T(u))=u\). \(\square \)

Theorem 2

For any \(\mu \in \mathcal {W}(X,\mathbb {R}^m)\)

$$\begin{aligned} \sup \left\{ \int _X\langle u, d\mu \rangle \mid u:X\rightarrow \mathbb {R}^m \text { is } 1\text {-Lipschitz}\right\} =\Vert \mu \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }. \end{aligned}$$
(21)

Moreover, there exists a 1-Lipschitz function \(u_0\) such that

$$\begin{aligned} \sup \left\{ \int _X\langle u, d\mu \rangle \mid u:X\rightarrow \mathbb {R}^m \text { is } 1\text {-Lipschitz}\right\} =\int _X\langle u_0, d\mu \rangle . \end{aligned}$$
(22)

Proof

Notice first that the left-hand side of (21) is clearly at most the right-hand side of (21). Take any \(\mu \in \mathcal {W}(X,\mathbb {R}^m)\). Then by the Hahn–Banach theorem there exists a continuous linear functional \(\lambda \) of norm one such that

$$\begin{aligned} \lambda (\mu )=\Vert \mu \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }. \end{aligned}$$

By Theorem 1, we know that \(\lambda \) is of the form

$$\begin{aligned} \lambda (\mu )=\int _X\langle u_0,d\mu \rangle \end{aligned}$$

for some Lipschitz map \(u_0\). The Lipschitz constant of \(u_0\) is equal to one, as

$$\begin{aligned} \Vert u_0\Vert _{\mathcal {L}\left( X,\mathbb {R}^m\right) }=\Vert \lambda \Vert =1. \end{aligned}$$

This completes the proof. \(\square \)

Definition 4

Any 1-Lipschitz function \(u:X\rightarrow \mathbb {R}^m\) for which (22) holds with \(u_0=u\) we shall call an optimal potential of the measure \(\mu \).

Definition 5

A measure \(\pi \in \varGamma (\mu )\) such that

$$\begin{aligned} \Vert \mu \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) }=\int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y) \end{aligned}$$

we shall call an optimal transport for \(\mu \).

Theorem 3

Let \(\mu \in \mathcal {W}(X,\mathbb {R}^m)\). Let \(u\in \mathcal {L}(X,\mathbb {R}^m) \) be a 1-Lipschitz map. Let \(\pi \in \varGamma (\mu )\). The following conditions are equivalent:

  1. (i)
    $$\begin{aligned} \int _X \langle u,d\mu \rangle =\int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y)=\Vert \mu \Vert _{\mathcal {W}\left( X,\mathbb {R}^m\right) } , \end{aligned}$$
  2. (ii)
    $$\begin{aligned} \int _A \langle u(x)-u(y),d\pi (x,y) \rangle =\int _A d(x,y)d\Vert \pi \Vert (x,y) \end{aligned}$$

    for any Borel set \(A\subset X\times X\),

  3. (iii)
    $$\begin{aligned} \int _X \langle u,d\mu \rangle =\int _{X\times X} d(x,y)d\Vert \pi \Vert (x,y), \end{aligned}$$
  4. (iv)

    u is an optimal potential for \(\mu \) and \(\pi \) is an optimal transport for \(\mu \).

Moreover, if the above conditions hold, then

$$\begin{aligned} \Vert u(x)-u(y)\Vert =d(x,y) \end{aligned}$$

\(\Vert \pi \Vert \)-almost everywhere.

Proof

Assume that (iii) holds. Observe that

$$\begin{aligned} \int _X\langle u, d\mu \rangle =\int _{X\times X} \langle u(x)-u(y),d\pi (x,y) \rangle . \end{aligned}$$

As

$$\begin{aligned} \int _X\langle u, d\mu \rangle \le \Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}\le \int _{X\times X}d(x,y)d\Vert \pi \Vert (x,y), \end{aligned}$$

then by (iii) we see that in the above inequalities we have equalities. This is to say, (i) holds true.

Suppose now that (i) holds. Clearly

$$\begin{aligned} \int _A \langle u(x)-u(y),d\pi (x,y) \rangle \le \int _A d(x,y)d\Vert \pi \Vert (x,y). \end{aligned}$$

If we had strict inequality in (ii) for some Borel set \(A\subset X\times X\), then the above computations show that we would get strict inequality in (i). Hence (i) implies (ii). Conversely, (ii) with \(A=X\times X\) yields (iii). Condition (iv) is a reformulation of (i). The last part of the theorem follows readily from (ii). \(\square \)
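As a minimal illustration of Theorem 3, consider \(\mu =(\delta _x-\delta _y)z\) with \(x\ne y\) and \(z\ne 0\). The sketch below rests on assumptions that are ours: \(\mathbb {R}^m\) carries the Euclidean norm, \(\varGamma (\mu )\) consists of the measures \(\pi \) with \(\mathrm {P}_1\pi -\mathrm {P}_2\pi =\mu \), and \(x_0\) is the base point at which maps in \(\mathcal {L}(X,\mathbb {R}^m)\) vanish. The particular \(u\) and \(\pi \) are chosen only for the sake of the example:

$$\begin{aligned} \begin{aligned}&u(p)=\big (d(p,y)-d(x_0,y)\big )\frac{z}{\Vert z\Vert },\qquad \pi =z\,\delta _{(x,y)},\\&\int _X\langle u,d\mu \rangle =\langle u(x)-u(y),z\rangle =d(x,y)\Vert z\Vert =\int _{X\times X}d(x',y')d\Vert \pi \Vert (x',y'). \end{aligned} \end{aligned}$$

The map \(u\) is 1-Lipschitz by the triangle inequality, and \(\mathrm {P}_1\pi -\mathrm {P}_2\pi =z\delta _x-z\delta _y=\mu \). Thus condition (iii) holds, so \(u\) is an optimal potential, \(\pi \) is an optimal transport, and \(\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}=\Vert z\Vert d(x,y)\).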

We say that a measure \(\mu \in \mathcal {M}(Z,\mathbb {R}^m)\) is concentrated on a subset \(X\subset Z\) if \(\Vert \mu \Vert (Z\setminus X)=0\).

Proposition 5

Assume that \(\mathbb {R}^n,\mathbb {R}^m\) are equipped with Euclidean norms. Let \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\) be concentrated on a set \(X\subset \mathbb {R}^n\). Then

$$\begin{aligned} \Vert \mu \Vert _{\mathcal {W}\left( \mathbb {R}^n,\mathbb {R}^m\right) } =\Vert \mu \Vert _{\mathcal {W}(X,\mathbb {R}^m)}. \end{aligned}$$

Proof

The assertion is that

$$\begin{aligned} \sup \left\{ \int _{\mathbb {R}^n}\langle u,d\mu \rangle \mid u:\mathbb {R}^n\rightarrow \mathbb {R}^m\text { is }1\text {-Lipschitz}\right\} \end{aligned}$$

is equal to

$$\begin{aligned} \sup \left\{ \int _X\langle u,d\mu \rangle \mid u:X\rightarrow \mathbb {R}^m\text { is }1\text {-Lipschitz}\right\} . \end{aligned}$$

By the Kirszbraun theorem (see e.g. [16]) any 1-Lipschitz function \(u:X\rightarrow \mathbb {R}^m\) extends to a 1-Lipschitz function \(\tilde{u}:\mathbb {R}^n\rightarrow \mathbb {R}^m\). Clearly, for any such extension

$$\begin{aligned} \int _{\mathbb {R}^n}\langle \tilde{u},d\mu \rangle =\int _X\langle u,d\mu \rangle . \end{aligned}$$

The assertion follows. \(\square \)

3 Mass balance condition

Let us first provide an affirmative answer to the conjecture of Klartag, under the provision of the existence of optimal transport with absolutely continuous marginals of its total variation.

Definition 6

A leaf \(\mathcal {S}\) of a 1-Lipschitz map \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is a maximal set, with respect to the order induced by inclusion, such that the restriction \(u|_{\mathcal {S}}\) is an isometry. This is to say, \(\mathcal {S}\) is a leaf whenever for any \(x,y\in \mathcal {S}\) we have

$$\begin{aligned} \Vert u(x)-u(y)\Vert =\Vert x-y\Vert \end{aligned}$$

and for any \(z\notin \mathcal {S}\) there exists \(x\in \mathcal {S}\) such that

$$\begin{aligned} \Vert u(x)-u(z)\Vert <\Vert x-z\Vert . \end{aligned}$$

It is proven in [12] that leaves of a map u that is 1-Lipschitz with respect to Euclidean norms are closed and convex sets. Two distinct leaves can intersect only along their relative boundaries.

Definition 7

Let \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) be a 1-Lipschitz map of Euclidean spaces. We say that a Borel set \(A\subset \mathbb {R}^n\) is a transport set associated with u if it enjoys the following property: if \(x\in A\) is contained in a unique leaf of u and \(y\in \mathbb {R}^n\) is such that

$$\begin{aligned} \Vert u(x)-u(y)\Vert =\Vert x-y\Vert , \end{aligned}$$

then \(y\in A\).

Let us remark that a Borel set \(A\subset \mathbb {R}^n\) that is a union of leaves of u is a transport set.

We shall denote by B(u) the set of all points \(x\in \mathbb {R}^n\) such that there exist at least two distinct leaves \(\mathcal {S}_1,\mathcal {S}_2\) of u such that \(x\in \mathcal {S}_1\cap \mathcal {S}_2\). In [12, Corollary 2.15] it is proven that B(u) is of Lebesgue measure zero.
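To illustrate Definitions 6 and 7, consider, for \(m\le n\) and Euclidean norms, the coordinate projection \(u(x',x'')=x'\), where \(x=(x',x'')\in \mathbb {R}^m\times \mathbb {R}^{n-m}\). This map is our own example and is not taken from [12]. For it,

$$\begin{aligned} \Vert x-y\Vert ^2=\Vert u(x)-u(y)\Vert ^2+\Vert x''-y''\Vert ^2, \end{aligned}$$

so \(\Vert u(x)-u(y)\Vert =\Vert x-y\Vert \) exactly when \(x''=y''\). The leaves are therefore the affine subspaces \(\mathbb {R}^m\times \{c\}\), \(c\in \mathbb {R}^{n-m}\). They are closed, convex and pairwise disjoint, so \(B(u)=\emptyset \), and a Borel set is a transport set associated with \(u\) precisely when it is of the form \(\mathbb {R}^m\times E\) with \(E\subset \mathbb {R}^{n-m}\) Borel.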

Suppose that \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\). The following theorem shows that if there exists an optimal transport for \(\mu \) such that its total variation has absolutely continuous marginals, then the conjecture of Klartag holds true. Note that such existence is clear for \(m=1\), whenever \(\mu \) is absolutely continuous with respect to the Lebesgue measure \(\lambda \).

Theorem 4

Assume that \(\mathbb {R}^n,\mathbb {R}^m\) are equipped with Euclidean norms. Suppose that \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\). Let u be an optimal potential for \(\mu \). Suppose that there exists an optimal transport \(\pi \) of \(\mu \) such that

$$\begin{aligned} \mathrm {P}_1\Vert \pi \Vert \ll \lambda \text { and }\mathrm {P}_2\Vert \pi \Vert \ll \lambda . \end{aligned}$$
(23)

Then for any transport set A associated with u:

  1. (i)

    \(\mu (A)=0\),

  2. (ii)

    \(\pi |_{A\times A}\in \varGamma (\mu |_A)\) is an optimal transport of \(\mu |_A\),

  3. (iii)

    u is an optimal potential of \(\mu |_A\).

Proof

By [12, Corollary 2.15] it follows that

$$\begin{aligned} \lambda (B(u))=0. \end{aligned}$$

Suppose that (23) holds true. Then

$$\begin{aligned} \Vert \pi \Vert \left( B(u)\times \mathbb {R}^n \right) =0\text { and }\Vert \pi \Vert \left( \mathbb {R}^n\times B(u) \right) =0. \end{aligned}$$

Let

$$\begin{aligned} I=\left\{ (x,y)\in \mathbb {R}^n\times \mathbb {R}^n\mid \Vert u(x)-u(y)\Vert =\Vert x-y\Vert \right\} . \end{aligned}$$

By Theorem 3, \(\Vert \pi \Vert (I^c)=0\). Thus \(\pi \) is concentrated on the set

$$\begin{aligned} C=I\cap \left( B(u)^c\times B(u)^c\right) . \end{aligned}$$

Suppose that \((x,y)\in C\). Then, as A is a transport set, by the definition of B(u),

$$\begin{aligned} x\in A \text { if and only if }y\in A. \end{aligned}$$
(24)

Let \(\eta =\pi |_{A\times A}\). To prove (ii), it is enough to show that \(\eta \) is an optimal transport and that

$$\begin{aligned} \eta \in \varGamma \left( \mu |_{A}\right) . \end{aligned}$$

For this, let \(D\subset \mathbb {R}^n\) be any Borel set. Using the fact that \(\pi \in \varGamma (\mu )\) and the fact that \(\Vert \pi \Vert (C^c)=0\) and (24), we have

$$\begin{aligned} \begin{aligned} \mu (A\cap D)&=\int _{\mathbb {R}^n\times \mathbb {R}^n}\left( \mathbf {1}_{A\cap D}(x)-\mathbf {1}_{A\cap D}(y)\right) d\pi (x,y)\\&=\int _{\mathbb {R}^n\times \mathbb {R}^n}\mathbf {1}_{A\times A}(x,y)\left( \mathbf {1}_D(x)-\mathbf {1}_D(y)\right) d\pi (x,y)\\&=\int _{\mathbb {R}^n\times \mathbb {R}^n}\left( \mathbf {1}_D(x)-\mathbf {1}_D(y)\right) d\eta (x,y)=\mathrm {P}_1\eta (D)-\mathrm {P}_2\eta (D). \end{aligned} \end{aligned}$$

It follows that \(\pi |_{A\times A}\in \varGamma (\mu |_A)\). Then

$$\begin{aligned} \int _{A}\langle u,d\mu \rangle =\int _{\mathbb {R}^n\times \mathbb {R}^n} \mathbf {1}_C(x,y)\Big \langle \mathbf {1}_A(x) u(x)-\mathbf {1}_A(y)u(y),d\pi (x,y)\Big \rangle . \end{aligned}$$
(25)

Therefore, by (24),

$$\begin{aligned} \int _{A}\langle u,d\mu \rangle =\int _{\mathbb {R}^n\times \mathbb {R}^n}\mathbf {1}_{A\times A}(x,y)\Big \langle u(x)-u(y),d\pi (x,y)\Big \rangle . \end{aligned}$$

By condition (ii) of Theorem 3 we see that

$$\begin{aligned} \int _{A}\langle u,d\mu \rangle =\int _{A\times A}\Vert x-y\Vert d\Vert \pi \Vert (x,y). \end{aligned}$$

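Theorem 3, condition (iii), tells us that \(\pi |_{A\times A}\) is an optimal transport of \(\mu |_A\) and u is an optimal potential of \(\mu |_A\). Also \(\mu (A)=0\), as \(\pi |_{A\times A}\in \varGamma (\mu |_A)\): taking \(D=\mathbb {R}^n\) in the computation above gives \(\mu (A)=\mathrm {P}_1\eta (\mathbb {R}^n)-\mathrm {P}_2\eta (\mathbb {R}^n)=\eta (\mathbb {R}^n\times \mathbb {R}^n)-\eta (\mathbb {R}^n\times \mathbb {R}^n)=0\). This completes the proof. \(\square \)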

We shall now provide necessary tools for the aforementioned counterexample to the conjecture of Klartag.

In fact we shall provide a more general theorem, for which we shall consider locally uniformly closed subsets \(\mathcal {F}\) of 1-Lipschitz maps of \(\mathbb {R}^n\) to \(\mathbb {R}^m\) endowed with norms which are not necessarily Euclidean. Suppose that a measure \(\mu \) belongs to \(\mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\). We consider the supremum of the integrals

$$\begin{aligned} \int _{\mathbb {R}^n}\langle u,d\mu \rangle \end{aligned}$$
(26)

taken over all \(u\in \mathcal {F}\). An optimal \(u_0\in \mathcal {F}\), i.e. a map that satisfies

$$\begin{aligned} \int _{\mathbb {R}^n}\langle u_0,d\mu \rangle =\sup \left\{ \int _{\mathbb {R}^n}\langle u,d\mu \rangle \mid u\in \mathcal {F}\right\} , \end{aligned}$$

we shall call an \(\mathcal {F}\)-optimal potential of \(\mu \).

Lemma 2

Let \(X\subset \mathbb {R}^n\) be a compact set. Suppose that \((\mu _k)_{k=1}^{\infty }\subset \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\) are all supported on X and converge weakly* to \(\mu _0\in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\), i.e. for any continuous and bounded function \(g:\mathbb {R}^n\rightarrow \mathbb {R}^m\) we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\int _{\mathbb {R}^n}\langle g,d\mu _k\rangle =\int _{\mathbb {R}^n}\langle g,d\mu _0\rangle . \end{aligned}$$

Suppose that for \(k=1,2,\dotsc ,\) \(u_k\in \mathcal {F}\) is an \(\mathcal {F}\)-optimal potential of \(\mu _k\) and that \(u_k\) converge locally uniformly to \(u_0:\mathbb {R}^n\rightarrow \mathbb {R}^m\). Then \(u_0\) is an \(\mathcal {F}\)-optimal potential of \(\mu _0\).

Proof

By the assumption, for any continuous and bounded map \(g:\mathbb {R}^n\rightarrow \mathbb {R}^m\), we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\int _{\mathbb {R}^n}\langle g,d(\mu _k-\mu _0)\rangle =0. \end{aligned}$$

In particular, as \(\mu _k\) are all supported on X, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\int _{\mathbb {R}^n}\langle u_0,d\left( \mu _k-\mu _0\right) \rangle =0. \end{aligned}$$

By the Banach–Steinhaus theorem, the sequence \((\mu _k)_{k=1}^{\infty }\) is bounded in the total variation norm. Hence, by uniform convergence on X,

$$\begin{aligned} \lim _{k\rightarrow \infty }\int _{\mathbb {R}^n}\langle u_k-u_0,d\mu _k\rangle =0. \end{aligned}$$
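The limit above rests on a routine estimate, sketched here under the simplifying assumption that \(\mathbb {R}^m\) carries the Euclidean norm (for another norm the same bound holds up to a constant factor). The symbol \(M\) is our notation for the bound supplied by the Banach–Steinhaus theorem:

$$\begin{aligned} \left| \int _{\mathbb {R}^n}\langle u_k-u_0,d\mu _k\rangle \right| \le \sup _{x\in X}\Vert u_k(x)-u_0(x)\Vert \,\Vert \mu _k\Vert (X)\le M\sup _{x\in X}\Vert u_k(x)-u_0(x)\Vert ,\qquad M=\sup _{k}\Vert \mu _k\Vert (\mathbb {R}^n)<\infty . \end{aligned}$$

The right-hand side tends to zero because \(X\) is compact and \(u_k\) converge to \(u_0\) locally uniformly.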

It follows that

$$\begin{aligned} \int _{\mathbb {R}^n}\langle u_k,d\mu _k\rangle =\int _{\mathbb {R}^n}\langle u_0,d\mu _k\rangle +\int _{\mathbb {R}^n}\langle u_k-u_0,d\mu _k\rangle \end{aligned}$$

converges to \(\int _{\mathbb {R}^n}\langle u_0,d\mu _0\rangle \). As for any 1-Lipschitz map \(h\in \mathcal {F}\) we have

$$\begin{aligned} \int _{\mathbb {R}^n}\langle h,d\mu _k\rangle \le \int _{\mathbb {R}^n}\langle u_k,d\mu _k\rangle , \end{aligned}$$

we also have

$$\begin{aligned} \int _{\mathbb {R}^n}\langle h,d\mu _0\rangle \le \int _{\mathbb {R}^n}\langle u_0,d\mu _0\rangle . \end{aligned}$$

The proof is complete. \(\square \)

Below we shall denote by \(B(x,\epsilon )\) an open ball of radius \(\epsilon >0\) centred at \(x\in \mathbb {R}^n\).

Lemma 3

Let \(m\le n\). Let \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\) and let u be an optimal potential of \(\mu \). Let A be the union of all leaves of dimension at least one. Then A is Borel measurable. Suppose that there exists an optimal transport \(\pi \) for \(\mu \) or that any transport set of u is of \(\mu \)-measure zero. Then

$$\begin{aligned} \Vert \mu \Vert (A^c)=0. \end{aligned}$$

Proof

Observe that

$$\begin{aligned} A=\bigcup _{j=1}^{\infty }\left\{ x\in \mathbb {R}^n\mid \sup \left\{ \frac{\Vert u(x)-u(y)\Vert }{\Vert x-y\Vert }\mid y\in \mathrm {cl}B(x,j)\setminus B(x,1/j)\right\} =1\right\} . \end{aligned}$$

For each \(j\), the function

$$\begin{aligned} \mathbb {R}^n\ni x\mapsto \sup \left\{ \frac{\Vert u(x)-u(y)\Vert }{\Vert x-y\Vert }\mid y\in \mathrm {cl}B(x,j)\setminus B(x,1/j)\right\} \in \mathbb {R} \end{aligned}$$

is lower semi-continuous, hence Borel measurable. Thus, A is Borel measurable. Suppose that there exists an optimal transport \(\pi \) for \(\mu \). By Theorem 3, \(\pi \) is supported on the set

$$\begin{aligned} I=\left\{ (x,y)\in \mathbb {R}^n\times \mathbb {R}^n \mid \Vert u(x)-u(y)\Vert =\Vert x-y\Vert \right\} . \end{aligned}$$

As \(\mu =\mathrm {P}_1\pi -\mathrm {P}_2\pi \), for any Borel set \(B\subset A^c\), we have

$$\begin{aligned} \mu (B)=\pi \left( B\times \mathbb {R}^n\right) -\pi \left( \mathbb {R}^n\times B\right) =0, \end{aligned}$$

for if \(B\subset A^c\), then

$$\begin{aligned} \left( B\times \mathbb {R}^n\right) \cap I\subset \left\{ (x,x)\mid x\in \mathbb {R}^n\right\} \text { and }\left( \mathbb {R}^n\times B\right) \cap I\subset \left\{ (x,x)\mid x\in \mathbb {R}^n\right\} . \end{aligned}$$

Suppose now that any transport set for u is of \(\mu \)-measure zero. Observe that any Borel set \(B\subset A^c\) is a transport set. The conclusion follows. \(\square \)

In the theorem below we shall provide a counterexample to the conjecture of Klartag.

Theorem 5

Assume that \(m>1\). There exists an absolutely continuous measure \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\) for which there exists a transport set, associated with an optimal potential of \(\mu \), of non-zero \(\mu \)-measure.

In particular, there is no optimal transport \(\pi \) for \(\mu \) such that

$$\begin{aligned} \mathrm {P}_1\Vert \pi \Vert \ll \lambda \text { and }\mathrm {P}_2\Vert \pi \Vert \ll \lambda . \end{aligned}$$

Proof

Choose any \(v_1,\dotsc ,v_{m+1}\in \mathbb {R}^m\) such that

$$\begin{aligned} \sum _{i=1}^{m+1}v_i=0 \end{aligned}$$

and that are affinely independent. For \(\epsilon >0\) set

$$\begin{aligned} \mu _{\epsilon }=\frac{1}{\lambda (B(0,\epsilon ))}\sum _{i=1}^{m+1} \lambda |_{B\left( x_i,\epsilon \right) }v_i, \end{aligned}$$

where \(x_1,\dotsc ,x_{m+1}\in \mathbb {R}^n\) are pairwise distinct points to be specified later. Here \(\lambda \) denotes the Lebesgue measure on \(\mathbb {R}^n\). Then \(\mu _{\epsilon }\in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\). Suppose that for some sequence \((\epsilon _k)_{k=1}^{\infty }\) converging to zero there is

$$\begin{aligned} \mu _{\epsilon _k}\left( C_k\right) =0 \end{aligned}$$

for any transport set \(C_k\) of \(u_k\), where \(u_k:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is an optimal potential of \(\mu _{\epsilon _k}\). For \(k\in \mathbb {N}\) and \(i=1,\dotsc ,m+1\) consider the union \(N_{ik}\) of all non-trivial leaves of \(u_k\) – i.e. of dimension at least one – that intersect \(\mathrm {cl}B(x_i,\epsilon _k)\). Then \(N_{ik}\) is a transport set. Its Borel measurability follows by Lemma 3. Indeed, denote \(B=\mathrm {cl}B(x_i,\epsilon _k)\); then the function

$$\begin{aligned} \mathbb {R}^n\setminus B\ni x\mapsto \sup \left\{ \frac{\Vert u_k(x)-u_k(y)\Vert }{\Vert x-y\Vert }\mid y\in B\right\} \in \mathbb {R} \end{aligned}$$

is lower semi-continuous and therefore

$$\begin{aligned} N_{ik}=\left\{ x\in \mathbb {R}^n\setminus B\mid \sup \left\{ \frac{\Vert u_k(x)-u_k(y)\Vert }{\Vert x-y\Vert }\mid y\in B\right\} =1\right\} \cup \left( B\cap A_k\right) \end{aligned}$$

is a Borel set. Here \(A_k\) is the union of all leaves of \(u_k\) of dimension at least one, cf. Lemma 3. Thus \(\mu _{\epsilon _k}(N_{ik})=0\). Hence,

$$\begin{aligned} \sum _{j=1}^{m+1}v_j\lambda \left( B\left( x_j,\epsilon _k\right) \cap N_{ik}\right) =0. \end{aligned}$$
(27)

As \(\mu _{\epsilon _k}\), by Lemma 3, is concentrated on non-trivial leaves of \(u_k\), we have

$$\begin{aligned} \frac{\lambda \left( B\left( x_i,\epsilon _k\right) \cap N_{ik}\right) }{\lambda \left( B\left( 0,\epsilon _k\right) \right) }v_i =\mu _{\epsilon _k}\left( B\left( x_i,\epsilon _k\right) \cap N_{ik}\right) =\mu _{\epsilon _k}\left( B\left( x_i,\epsilon _k\right) \right) =v_i. \end{aligned}$$

By (27) and the assumption on the vectors \(v_1,\dotsc ,v_{m+1}\),

$$\begin{aligned} \lambda \left( B\left( x_j,\epsilon _k\right) \cap N_{ik}\right) =\lambda \left( B\left( 0,\epsilon _k\right) \right) \text { for all }j=1,\dotsc ,m+1. \end{aligned}$$
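The passage from (27) to the last display is elementary linear algebra. A sketch follows, in which \(c_j\) is our shorthand for \(\lambda (B(x_j,\epsilon _k)\cap N_{ik})\). Subtracting \(c_{m+1}\sum _{j=1}^{m+1}v_j=0\) from (27) gives

$$\begin{aligned} 0=\sum _{j=1}^{m+1}c_jv_j-c_{m+1}\sum _{j=1}^{m+1}v_j=\sum _{j=1}^{m}\left( c_j-c_{m+1}\right) v_j. \end{aligned}$$

Affine independence of \(v_1,\dotsc ,v_{m+1}\) means that the \(m\) differences \(v_j-v_{m+1}\) span \(\mathbb {R}^m\). Since \(v_{m+1}=-\sum _{j=1}^{m}v_j\), the vectors \(v_1,\dotsc ,v_m\) span \(\mathbb {R}^m\) as well and are therefore linearly independent. Hence \(c_j=c_{m+1}\) for every \(j\), and the common value equals \(c_i=\lambda (B(0,\epsilon _k))\) by the preceding display.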

Thus we infer that for any \(k\in \mathbb {N}\) and for all \(r,s=1,\dotsc ,m+1\), \(r\ne s\), there exist points

$$\begin{aligned} \left( x_{rs}^k,x_{sr}^k\right) \in B\left( x_r,\epsilon _k\right) \times B\left( x_s,\epsilon _k\right) \end{aligned}$$

such that

$$\begin{aligned} \Vert u_k\left( x_{rs}^k\right) -u_k\left( x_{sr}^k\right) \Vert =\Vert x_{rs}^k-x_{sr}^k\Vert . \end{aligned}$$

Using the Arzelà–Ascoli theorem and passing to a subsequence we may assume that the maps \(u_k\) converge locally uniformly to some 1-Lipschitz map \(u_0\). Observe now that

$$\begin{aligned} x_{rs}^k\text { converges to }x_r\text { for all }r,s=1,\dotsc ,m+1. \end{aligned}$$

Thus, by the locally uniform convergence, \(u_0\) is an isometry on \(\{x_1,\dotsc ,x_{m+1}\}\). Observe that

$$\begin{aligned} \mu _{\epsilon _k}\text { converges weakly* to }\mu _0=\sum _{i=1}^{m+1}\delta _{x_i}v_i. \end{aligned}$$

Now, Lemma 2 tells us that \(u_0\) is an optimal potential of \(\mu _0\).

Suppose that points \(x_1,\dotsc ,x_{m+1}\) are such that for \(i\ne j\), \(i,j=1,\dotsc ,m\),

$$\begin{aligned} \Big \langle \frac{x_i-x_{m+1}}{\Vert x_i-x_{m+1}\Vert } , \frac{x_j-x_{m+1}}{\Vert x_j-x_{m+1}\Vert }\Big \rangle < \Big \langle \frac{v_i}{\Vert v_i\Vert },\frac{v_j}{\Vert v_j\Vert }\Big \rangle . \end{aligned}$$
(28)
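Condition (28) can be read through the law of cosines. The following sketch assumes Euclidean norms and uses our shorthand \(a_i=\Vert x_i-x_{m+1}\Vert \) and \(p_i=a_iv_i/\Vert v_i\Vert \). For \(i\ne j\), \(i,j=1,\dotsc ,m\),

$$\begin{aligned} \begin{aligned} \Vert p_i-p_j\Vert ^2&=a_i^2+a_j^2-2a_ia_j\Big \langle \frac{v_i}{\Vert v_i\Vert },\frac{v_j}{\Vert v_j\Vert }\Big \rangle \\&<a_i^2+a_j^2-2a_ia_j\Big \langle \frac{x_i-x_{m+1}}{a_i},\frac{x_j-x_{m+1}}{a_j}\Big \rangle =\Vert x_i-x_j\Vert ^2, \end{aligned} \end{aligned}$$

where the strict inequality uses (28) and \(a_ia_j>0\). Thus placing points at distances \(a_i\) from the origin along the directions \(v_i\) strictly decreases all mutual distances, while the distances \(\Vert p_i\Vert =a_i\) to the origin are preserved.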

Then if we define \(h:\{x_1,\dotsc ,x_{m+1}\}\rightarrow \mathbb {R}^m\) by

$$\begin{aligned} h\left( x_{m+1}\right) =0\text {, }h(x_i)=\Vert x_i-x_{m+1}\Vert \frac{v_i}{\Vert v_i\Vert }\text { for }i=1,\dotsc ,m, \end{aligned}$$

then h is 1-Lipschitz. By the Kirszbraun theorem we may assume that h is defined on the entire space. Moreover for

$$\begin{aligned} \pi =\sum _{i=1}^{m}v_i\delta _{\left( x_i,x_{m+1}\right) } \end{aligned}$$

we have

$$\begin{aligned} \mathrm {P}_1\pi -\mathrm {P}_2\pi =\mu _0 \end{aligned}$$

and

$$\begin{aligned} \pi =\sum _{i=1}^{m} \frac{h\left( x_i\right) -h\left( x_{m+1}\right) }{\Vert x_i-x_{m+1}\Vert } \Vert v_i\Vert \delta _{\left( x_i,x_{m+1}\right) }. \end{aligned}$$

Theorem 3 yields that h is an optimal potential and \(\pi \) is an optimal transport. It follows that

$$\begin{aligned} \Vert \mu _0\Vert _{\mathcal {W}\left( \mathbb {R}^n,\mathbb {R}^m\right) } =\sum _{i=1}^m\Vert v_i\Vert \Vert x_i-x_{m+1}\Vert . \end{aligned}$$
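The value above can be checked directly against condition (iii) of Theorem 3. This is a sketch assuming the Euclidean norm on \(\mathbb {R}^m\), with \(h\), \(\pi \) and \(\mu _0\) as above:

$$\begin{aligned} \int _{\mathbb {R}^n}\langle h,d\mu _0\rangle =\sum _{i=1}^{m+1}\langle h(x_i),v_i\rangle =\sum _{i=1}^{m}\Vert x_i-x_{m+1}\Vert \Big \langle \frac{v_i}{\Vert v_i\Vert },v_i\Big \rangle =\sum _{i=1}^{m}\Vert v_i\Vert \Vert x_i-x_{m+1}\Vert =\int _{\mathbb {R}^n\times \mathbb {R}^n}\Vert x-y\Vert d\Vert \pi \Vert (x,y). \end{aligned}$$

Here the term with \(i=m+1\) vanishes because \(h(x_{m+1})=0\), and only the atoms of \(\pi \) at \((x_i,x_{m+1})\), \(i=1,\dotsc ,m\), carry positive cost.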

Theorem 3 tells us that also

$$\begin{aligned} \pi =\sum _{i=1}^{m}\frac{u_0\left( x_i\right) -u_0\left( x_{m+1}\right) }{\Vert x_i-x_{m+1}\Vert }\Vert v_i\Vert \delta _{\left( x_i,x_{m+1}\right) }. \end{aligned}$$

As \(u_0\) is an isometry on \(\{x_1,\dotsc ,x_{m+1}\}\), it follows that for \(i,j=1,\dotsc ,m\)

$$\begin{aligned} \Vert h\left( x_i\right) -h\left( x_j\right) \Vert =\Vert x_i-x_j\Vert \end{aligned}$$

which is not true, as the inequalities in (28) are strict. The obtained contradiction shows that there is no such sequence \((\epsilon _k)_{k=1}^{\infty }\), i.e. there exists \(\epsilon _0>0\) such that for all \(\epsilon \in (0,\epsilon _0)\) every optimal potential of \(\mu _{\epsilon }\) admits a transport set of non-zero \(\mu _{\epsilon }\)-measure.

By Theorem 4 it follows that for such \(\epsilon \) there is no optimal transport with absolutely continuous marginals for \(\mu _{\epsilon }\). \(\square \)

The proof of the following theorem is based on the same idea as the proof of Theorem 5. Note that we do not require below that the norms on \(\mathbb {R}^n\) and on \(\mathbb {R}^m\) are Euclidean. For a 1-Lipschitz map \(u:\mathbb {R}^n\rightarrow \mathbb {R}^m\) a leaf of u is a maximal, with respect to the order induced by inclusion, set \(\mathcal {S}\) such that \(u|_{\mathcal {S}}\) is an isometry. A transport set is defined as a set A that enjoys the property that if \(x\in A\) belongs to a unique leaf of u, then for any \(y\in \mathbb {R}^n\) such that \(\Vert u(y)-u(x)\Vert =\Vert y-x\Vert \) there is \(y\in A\). This is to say, the leaves and transport sets are defined as in the Euclidean case.

Theorem 6

Let \(m\le n\). Suppose that the norm on \(\mathbb {R}^m\) is strictly convex. Suppose that \(\mathcal {F}\) is a locally uniformly closed subset of 1-Lipschitz maps of \(\mathbb {R}^n\) to \(\mathbb {R}^m\). Suppose that \(\mathcal {F}\) has the property that for any absolutely continuous measure \(\mu \in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\) and any \(\mathcal {F}\)-optimal potential \(u_0\) of \(\mu \) we have \(\mu (A)=0\) for any transport set A of \(u_0\). Then either \(m=1\) or \(m>1\) and

  1. (i)

    \(m=n\), any \(u\in \mathcal {F}\) is affine, and there exists \(u\in \mathcal {F}\) that is an isometry of \(\mathbb {R}^n\) and of \(\mathbb {R}^m\),

  2. (ii)

    for any absolutely continuous \(\mu \), any \(\mathcal {F}\)-optimal potential of \(\mu \) is an isometry on a maximal subspace \(V\subset \mathbb {R}^n\), so that

    $$\begin{aligned} \mu \left( \left\{ x\in \mathbb {R}^n\mid Px\in A\right\} \right) =0\text { for any Borel set }A\subset W; \end{aligned}$$
    (29)

    here P denotes a projection onto a complement W of V.

Suppose that the norms are Euclidean. Then, if any \(\mathcal {F}\)-optimal potential is affine and is an isometry on a maximal subspace \(V\subset \mathbb {R}^n\) such that (29) holds true, then \(\mu (A)=0\) for any transport set of its \(\mathcal {F}\)-optimal potential.

Proof

Suppose that \(m>1\). Choose any pairwise distinct points \(x_1,x_2,x_3\in \mathbb {R}^n\) and any affinely independent \(v_1,v_2,v_3\in \mathbb {R}^m\) such that \(\sum _{i=1}^3v_i=0\). Let

$$\begin{aligned} \nu _0=\sum _{i=1}^3v_i\delta _{x_i}. \end{aligned}$$

Then \(\nu _0\in \mathcal {W}(\mathbb {R}^n,\mathbb {R}^m)\). For \(\epsilon >0\) let

$$\begin{aligned} \nu _{\epsilon }=\frac{1}{\lambda (B(0,\epsilon ))} \sum _{i=1}^3v_i\lambda |_{B\left( x_i,\epsilon \right) }. \end{aligned}$$

Choose respective \(\mathcal {F}\)-optimal potentials \(u_{\epsilon }\) for \(\nu _{\epsilon }\). These exist as \(\mathcal {F}\) is locally uniformly closed. Observe that, by the assumption, \(\nu _{\epsilon }(B_{\epsilon })=0\) for any Borel set \(B_{\epsilon }\) consisting of trivial leaves of \(u_{\epsilon }\). Whence, \(\nu _{\epsilon }\) is concentrated on non-trivial leaves of \(u_{\epsilon }\). Let \(N_{i\epsilon }\) denote the union of all non-trivial leaves that intersect \(B_{i\epsilon }=\mathrm {cl}B(x_i,\epsilon )\) for \(i=1,2,3\) and \(\epsilon >0\). The map

$$\begin{aligned} \mathbb {R}^n\setminus B_{i\epsilon }\ni x\mapsto \sup \left\{ \frac{\Vert u_{\epsilon }(x)-u_{\epsilon }(y)\Vert }{\Vert x-y\Vert }\mid y\in B_{i\epsilon }\right\} \in \mathbb {R} \end{aligned}$$

is lower semi-continuous. Note that

$$\begin{aligned} N_{i\epsilon }=\left\{ x\in \mathbb {R}^n\setminus B_{i\epsilon }\mid \sup \left\{ \frac{\Vert u_{\epsilon }(x)-u_{\epsilon }(y)\Vert }{\Vert x-y\Vert }\mid y\in B_{i\epsilon }\right\} =1\right\} \cup \left( B_{i\epsilon }\cap A_{\epsilon }\right) , \end{aligned}$$

where \(A_{\epsilon }\) denotes the union of all non-trivial leaves of \(u_{\epsilon }\) and is Borel measurable by the argument of Lemma 3. Hence \(N_{i\epsilon }\) is Borel measurable.

By the assumption,

$$\begin{aligned} \nu _{\epsilon }\left( N_{i\epsilon }\right) =0, \end{aligned}$$

which implies, as in the proof of Theorem 5, that

$$\begin{aligned} \Vert u_{\epsilon }\left( x^{\epsilon }_{rs}\right) -u_{\epsilon }\left( x^{\epsilon }_{sr}\right) \Vert =\Vert x^{\epsilon }_{rs}-x^{\epsilon }_{sr}\Vert \end{aligned}$$

for some points

$$\begin{aligned} \left( x^{\epsilon }_{rs},x^{\epsilon }_{sr}\right) \in B\left( x_r,\epsilon \right) \times B\left( x_s,\epsilon \right) , r,s=1,\dotsc ,3, r\ne s. \end{aligned}$$

By the Arzelà–Ascoli theorem and passing to a subsequence we may assume that \(u_{\epsilon }\) converges locally uniformly to some \(u_0\in \mathcal {F}\), which is an \(\mathcal {F}\)-optimal potential of \(\nu _0\) by Lemma 2. By the uniform convergence we infer that \(u_0\) is isometric on \(\{x_1,x_2,x_3\}\). Let now \(x_2=tx_1+(1-t)x_3\) for some \(t\in (0,1)\). Then any map \(f:\{x_1,x_2,x_3\}\rightarrow \mathbb {R}^m\) that is isometric satisfies

$$\begin{aligned} f\left( tx_1+(1-t)x_3\right) =tf\left( x_1\right) +(1-t)f\left( x_3\right) . \end{aligned}$$
(30)

Indeed, as f is isometric,

$$\begin{aligned} \Vert f\left( x_2\right) -f\left( x_1\right) \Vert = (1-t)\Vert x_3-x_1\Vert \text { and }\Vert f\left( x_3\right) -f\left( x_2\right) \Vert = t\Vert x_3-x_1\Vert . \end{aligned}$$

As \(\Vert f(x_3)-f(x_1)\Vert =\Vert x_3-x_1\Vert \) it follows that we have equality in the triangle inequality

$$\begin{aligned} \Vert f\left( x_3\right) -f\left( x_1\right) \Vert \le \Vert f\left( x_2\right) -f\left( x_1\right) \Vert +\Vert f\left( x_3\right) -f\left( x_2\right) \Vert . \end{aligned}$$

By the strict convexity it follows that there is \(\lambda >0 \) such that

$$\begin{aligned} f\left( x_2\right) -f\left( x_1\right) =\lambda \left( f\left( x_3\right) -f\left( x_1\right) \right) . \end{aligned}$$

Taking the norms we arrive at (30). A function f that satisfies (30) may be extended to \(\mathbb {R}^n\) to an affine map that has derivative of operator norm at most m. This follows by the Hahn–Banach theorem. As \(u_0\) is isometric on \(\{x_1,x_2,x_3\}\), we infer that

$$\begin{aligned} \sum _{i=1}^3\langle u_0\left( x_i\right) ,v_i\rangle \le \sup \left\{ \sum _{i=1}^3\langle f\left( x_i\right) ,v_i\rangle \mid f:\mathbb {R}^n\rightarrow \mathbb {R}^m \text { is linear and }\Vert f\Vert \le m\right\} . \end{aligned}$$

Note now that the set of vectors \(v_1,v_2,v_3\) that sum up to zero and are affinely independent is dense in the set of vectors \(v'_1,v'_2,v'_3\) that sum up to zero. Moreover, \(u_0\) is an \(\mathcal {F}\)-optimal potential for \(\nu _0\). We conclude that for any \(u\in \mathcal {F}\) and any vectors \(v_1,v_2,v_3\) that sum up to zero there is

$$\begin{aligned} \sum _{i=1}^3\langle u(x_i),v_i\rangle \le \sup \left\{ \sum _{i=1}^3\langle f(x_i),v_i\rangle \mid f:\mathbb {R}^n\rightarrow \mathbb {R}^m\text { is linear and }\Vert f\Vert \le m\right\} . \end{aligned}$$

Take now \(v_2=v\), \(v_1=-tv\) and \(v_3=-(1-t)v\) with \(t\in (0,1)\) as above and any \(v\in \mathbb {R}^m\). We infer that

$$\begin{aligned} \langle u(x_2)-tu(x_1)-(1-t)u(x_3) ,v\rangle \le 0. \end{aligned}$$
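The bound \(0\) in the last display comes from evaluating the supremum for this choice of vectors. A short check, with \(f\) any linear map as in the supremum:

$$\begin{aligned} \sum _{i=1}^3\langle f(x_i),v_i\rangle =\langle f(x_2)-tf(x_1)-(1-t)f(x_3),v\rangle =\langle f\left( x_2-tx_1-(1-t)x_3\right) ,v\rangle =0, \end{aligned}$$

since \(x_2=tx_1+(1-t)x_3\). Applying the inequality to both \(v\) and \(-v\) then gives \(u(x_2)=tu(x_1)+(1-t)u(x_3)\).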

As this holds for any v we infer that u is affine. This proves part of (i).

If u is affine then there exists a subspace \(V\subset \mathbb {R}^n\), possibly trivial, i.e. \(V=\{0\}\), such that any set of the form

$$\begin{aligned} \{x\in \mathbb {R}^n\mid Px\in A\} \end{aligned}$$
(31)

for a Borel measurable set \(A\subset W\) is a transport set of u. Here P denotes a projection onto a complement W of V. Indeed, let \(V\subset \mathbb {R}^n\) be a maximal subspace such that \(u|_V\) is an isometry. Suppose that V is not a leaf of u. Then there exists \(y\notin V\) such that for all \(x\in V\)

$$\begin{aligned} \Vert u(y)-u(x)\Vert =\Vert y-x\Vert . \end{aligned}$$

It follows that for all non-zero \(\lambda \in \mathbb {R}\)

$$\begin{aligned} \Big \Vert u(y)-u\left( \frac{x}{\lambda }\right) \Big \Vert =\Big \Vert y-\frac{x}{\lambda }\Big \Vert \end{aligned}$$

for all \(x\in V\). Hence for all \(\lambda \in \mathbb {R}\) we have \(\Vert u(\lambda y)-u(x)\Vert =\Vert \lambda y-x\Vert \). As u is affine, it is also an isometry on \(V+\mathbb {R}y\). This contradiction shows that V is a leaf of u. Thus (ii) is proven.

We shall now provide an example of a vector measure \(\mu \) such that for any proper subspace V and any \(x_0\) there is \(c>0\) such that

$$\begin{aligned} \mu \left( \left\{ x\in \mathbb {R}^n\mid \Vert P\left( x-x_0\right) \Vert \le c\right\} \right) \ne 0. \end{aligned}$$
(32)

Choose any \(x_1,\dotsc ,x_{m+1}\in \mathbb {R}^n\) affinely independent. Let \(\epsilon >0\) be a number such that any set \(\{y_1,\dotsc ,y_{m+1}\}\), with \(y_i\in B(x_i,\epsilon )\), \(i=1,\dotsc ,m+1\), is affinely independent. Choose vectors \(v_1,\dotsc ,v_{m+1}\in \mathbb {R}^m\) that add up to zero and are affinely independent. Let

$$\begin{aligned} \mu =\sum _{i=1}^{m+1}v_i\lambda |_{B\left( x_i,\epsilon \right) }, \end{aligned}$$

where \(\lambda \) denotes the Lebesgue measure. Choose any proper affine subspace \(V\subset \mathbb {R}^n\). Then V intersects at most m of the balls \(B(x_i,\epsilon )\), \(i=1,\dotsc ,m+1\). So does the set

$$\begin{aligned} \left\{ x\in \mathbb {R}^n\mid \Vert P\left( x-x_0\right) \Vert \le c\right\} \end{aligned}$$

provided that \(c>0\) is sufficiently small. Thus (32) follows. It implies, by (ii), that \(V=\mathbb {R}^n\). We have shown that any \(\mathcal {F}\)-optimal potential of \(\mu \) has to be an isometry. Hence \(m= n\) and the proof of (i) is complete.

To prove the last part of the theorem, it is enough to prove that the translates of V are the only leaves of an affine map. This holds true, since any point in \(\mathbb {R}^n\) is covered by a translate of V and the leaves of a map foliate \(\mathbb {R}^n\), up to Lebesgue measure zero, if the considered norms are Euclidean, cf. [12]. \(\square \)