Optimal transport maps on Alexandrov spaces revisited

We give an alternative proof for the fact that in $n$-dimensional Alexandrov spaces with curvature bounded below there exists a unique optimal transport plan from any purely $(n-1)$-unrectifiable starting measure, and that this plan is induced by an optimal map.


Introduction
The problem of optimal mass transportation has a long history, starting from the work of Monge [27] in the late 18th century. In the original formulation of the problem, nowadays called the Monge formulation, the problem is to find a transport map $T$ minimizing the transportation cost
$$\int_{\mathbb{R}^n} c(x, T(x))\, d\mu_0(x) \tag{1.1}$$
among all Borel maps $T\colon \mathbb{R}^n \to \mathbb{R}^n$ transporting a given probability measure $\mu_0$ to another given probability measure $\mu_1$, that is, $T_\sharp \mu_0 = \mu_1$. In the original problem of Monge, the cost function $c(x, y)$ was the Euclidean distance. Later, other cost functions have been considered. In particular, much of the study has involved the distance squared cost $c(x, y) = |x - y|^2$, which is also the cost studied in this paper. In the Monge formulation (1.1), the class of admissible maps $T$ that send $\mu_0$ to $\mu_1$ is in most cases not closed in any suitable topology. To overcome this problem, Kantorovich [20,19] considered a larger class of transports, namely measures $\pi$ on $\mathbb{R}^n \times \mathbb{R}^n$ whose first marginal is $\mu_0$ and whose second marginal is $\mu_1$. Such measures $\pi$ are called transport plans. Kantorovich's relaxation leads to the so-called Kantorovich formulation of the problem,
$$\inf_{\pi} \int_{\mathbb{R}^n \times \mathbb{R}^n} c(x, y)\, d\pi(x, y). \tag{1.2}$$
Because the set of admissible transport plans is closed and the cost is lower semicontinuous, minimizers exist in the Kantorovich formulation under very mild assumptions on the underlying space and the cost $c$.
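A standard toy example makes the need for the relaxation concrete. The Python sketch below works on the real line with finitely supported measures; this setting and the helper names are illustrative assumptions. For $\mu_0 = \delta_0$ and $\mu_1 = \frac12\delta_{-1} + \frac12\delta_{1}$, no map satisfies $T_\sharp\mu_0 = \mu_1$, because any push-forward of a Dirac mass is again a Dirac mass. The product plan, however, is admissible in (1.2).

```python
from fractions import Fraction
from itertools import product


def pushforward(mu, T):
    """Push a finitely supported measure {point: mass} forward under a map T (a dict)."""
    out = {}
    for x, m in mu.items():
        out[T[x]] = out.get(T[x], 0) + m
    return out


def admissible_maps(mu0, mu1):
    """All maps from spt(mu0) to spt(mu1) satisfying the Monge constraint T#mu0 == mu1."""
    sources, targets = list(mu0), list(mu1)
    maps = []
    for images in product(targets, repeat=len(sources)):
        T = dict(zip(sources, images))
        if pushforward(mu0, T) == mu1:
            maps.append(T)
    return maps


def plan_cost(plan, c):
    """Kantorovich cost of a finitely supported plan {(x, y): mass}."""
    return sum(m * c(x, y) for (x, y), m in plan.items())


def quadratic(x, y):
    return (x - y) ** 2


half = Fraction(1, 2)
mu0 = {0: Fraction(1)}
mu1 = {-1: half, 1: half}

print(admissible_maps(mu0, mu1))  # prints [] : no Monge map exists
product_plan = {(x, y): mu0[x] * mu1[y] for x in mu0 for y in mu1}
print(plan_cost(product_plan, quadratic))  # prints 1
```

Exact `Fraction` masses are used so that the marginal comparison in `admissible_maps` is not affected by rounding.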
For the quadratic cost in Euclidean space, Brenier [8] and Smith and Knott [35] showed independently that if $\mu_0$ is absolutely continuous with respect to the Lebesgue measure, then the optimal transport plan (the minimizer of (1.2)) is unique and induced by a transport map. Moreover, the optimal transport map is the gradient of a convex function.
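In one dimension, the statement can be checked by hand. A gradient of a convex function on $\mathbb{R}$ is a nondecreasing function, and for two uniform empirical measures with the same number of distinct atoms, the quadratic-cost optimum matches the $k$-th smallest source point with the $k$-th smallest target point. The sketch below checks this against brute force for a small, arbitrarily chosen example. The uniform weights and the helper names are assumptions made for illustration.

```python
from itertools import permutations


def cost(xs, ys, perm):
    """Quadratic cost of matching xs[i] with ys[perm[i]] (uniform weights)."""
    return sum((x - ys[j]) ** 2 for x, j in zip(xs, perm))


def monotone_matching(xs, ys):
    """Match the k-th smallest source point with the k-th smallest target point."""
    src_order = sorted(range(len(xs)), key=lambda i: xs[i])
    tgt_order = sorted(range(len(ys)), key=lambda j: ys[j])
    perm = [0] * len(xs)
    for i, j in zip(src_order, tgt_order):
        perm[i] = j
    return perm


def brute_force_optimum(xs, ys):
    """Minimal quadratic cost over all permutations (feasible only for small N)."""
    return min(cost(xs, ys, p) for p in permutations(range(len(ys))))


xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.1, -2.0, 0.0, 3.3, 0.5]
perm = monotone_matching(xs, ys)

# The induced map T(xs[i]) = ys[perm[i]] is nondecreasing in x (1D analogue of Brenier).
pairs = sorted((x, ys[j]) for x, j in zip(xs, perm))
print(all(a[1] <= b[1] for a, b in zip(pairs, pairs[1:])))  # prints True
print(abs(cost(xs, ys, perm) - brute_force_optimum(xs, ys)) < 1e-12)  # prints True
```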
The results of Brenier and of Smith and Knott have been generalized in many ways. The most important directions of generalization have been:
- replacing the underlying space $\mathbb{R}^n$ by other metric spaces;
- considering other cost functions;
- relaxing the assumption that the starting measure is absolutely continuous with respect to the reference measure (here the Lebesgue measure).

In this paper, we relax the absolute continuity assumption in a more general metric setting, namely in Alexandrov spaces. We note that our proof should generalize to other costs, such as the distance to a power $p \in (1, \infty)$. To keep the presentation simple, we concentrate here on the distance squared cost.
Bertrand [6] proved the existence of optimal transport maps in Alexandrov spaces with curvature bounded below for starting measures that are absolutely continuous with respect to the reference Hausdorff measure. Bertrand later improved this result [7] by relaxing the assumption on the starting measure: it suffices that the measure gives zero mass to $c-c$ hypersurfaces. Here we give an alternative proof of Bertrand's result under the slightly stronger assumption that the starting measure is purely $(n-1)$-unrectifiable (see Definition 2.1).
Theorem 1.1. Let $(X, d)$ be an $n$-dimensional Alexandrov space with curvature bounded below. Then for any pair of measures $\mu_0, \mu_1 \in \mathcal{P}_2(X)$ such that $\mu_0$ is purely $(n-1)$-unrectifiable, there exists a unique optimal transport plan from $\mu_0$ to $\mu_1$, and this transport plan is induced by a map.
The contribution of this paper is an approach to the existence and uniqueness of optimal transport maps that differs from the one Bertrand used in [6,7].

In [6], Bertrand used local $(1+\varepsilon)$-biLipschitz maps to $\mathbb{R}^n$ on the regular set of $X$, together with the general existence of Kantorovich potentials and their Lipschitz continuity. The singular set of $X$ is at most $(n-1)$-dimensional, and Rademacher's theorem on $\mathbb{R}^n$ can be restated in $X$ via the biLipschitz maps. From this, Bertrand concluded that the optimal transport is concentrated on a graph, given by applying the exponential map to the gradient of the Kantorovich potential.

In [7], Bertrand considered the problem in Alexandrov spaces without boundary. He used Perelman's DC calculus to reduce the problem to the differentiability of convex functions on Euclidean spaces. The result then follows from Zajíček's characterization [39] of the nondifferentiability points of convex functions.
In this paper, we transfer a contradiction argument (Lemma 2.11) from Euclidean space to the space $X$ via $(1+\varepsilon)$-biLipschitz charts. This argument uses only cyclical monotonicity in certain geometric configurations. To apply it, we need all the distances involved to be comparable. For this, we use the fact that directions of geodesics are well defined in the biLipschitz charts (Theorem 2.7). Hence we can contract along the geodesics without changing the geometric configuration too much. Finally, pure $(n-1)$-unrectifiability (Lemma 2.2) supplies the geometric configurations that contradict cyclical monotonicity.
Let us comment on the history of sufficient assumptions on $\mu_0$. McCann [26] showed that pure $(n-1)$-unrectifiability is sufficient for the existence of optimal maps on Riemannian manifolds.

A sharper condition is based on Zajíček's characterization [39] of the set of nondifferentiability points of convex functions. Gangbo and McCann [15] first used it in the Euclidean setting, showing that an initial measure giving zero mass to $c-c$ hypersurfaces suffices for the existence of optimal maps. Gigli [16] then showed that this condition is sharp even on Riemannian manifolds: for optimal maps to exist for every target measure, the starting measure must give zero measure to $c-c$ hypersurfaces.

Whether zero measure of $c-c$ hypersurfaces also gives a full characterization in Alexandrov spaces remains open. Bertrand [7] obtained one of the directions, namely sufficiency.
The existence of optimal maps has been studied in wider classes of metric measure spaces that satisfy some form of Ricci curvature lower bounds or weak versions of measure contraction property. These classes include CD(K, N)-spaces that were introduced by Lott and Villani [25], and by Sturm [37,36], MCP (K, N)-spaces (see Ohta [28]), and RCD(K, N) spaces that were first introduced by Ambrosio, Gigli and Savaré [3] (see also the improvements and later work by Ambrosio, Gigli, Mondino and Rajala [1], Erbar, Kuwada and Sturm [13] and Ambrosio, Mondino and Savaré [4]). All of these classes contain Alexandrov spaces with curvature lower bounds, see Petrunin [30].
Gigli [17] first showed that in nonbranching CD(K, N)-spaces, optimal maps exist provided that the starting measure is absolutely continuous with respect to the reference measure. All subsequent work has made the same assumption on the starting measure. It would be interesting to see whether this assumption can also be relaxed in the more general context of metric measure spaces with Ricci curvature lower bounds.
Ambrosio, Gigli and Savaré [2] also studied a metric version of Brenier's theorem. They did not obtain the existence of optimal maps, but showed that at least the transportation distance is given by the Kantorovich potential. Later, Ambrosio and Rajala [5] showed that sufficiently strong nonbranching assumptions yield the existence of optimal maps.

Rajala and Sturm [32] noticed that strong CD(K, ∞) spaces, and hence RCD(K, ∞) spaces, are at least essentially nonbranching, and that this weaker form of nonbranching suffices for carrying out Gigli's proof. Gigli, Rajala and Sturm [18] later improved this result. Essential nonbranching was then studied together with the measure contraction property MCP(K, N) by Cavalletti and Huesmann [11] and by Cavalletti and Mondino [12].

Finally, Kell [22] showed that under a weak form of the measure contraction property, essential nonbranching characterizes the uniqueness of optimal transports. He also showed that, for absolutely continuous starting measures, the unique optimal transport is given by a map.
The existence of optimal transport maps in CD(K, N) spaces without any additional nonbranching assumption is still an open problem. Schultz [33] studied an intermediate notion between CD(K, N) and essentially nonbranching CD(K, N), called very strict CD(K, N). He showed that these spaces still admit optimal transport maps, even though the space may be highly branching and optimal plans may be non-unique. It is also worth noting that if one drops the assumption of essential nonbranching for MCP(K, N), optimal transport maps need not exist, as the examples of Ketterer and Rajala [23] show.
The paper is organized as follows. In Section 2 we recall basic facts about rectifiability, Alexandrov spaces and optimal mass transportation. Along the way, we present a few facts that follow easily from well-known results:
- purely $(n-1)$-unrectifiable measures have mass in all directions (Lemma 2.2);
- the singular set of an Alexandrov space is $(n-1)$-rectifiable (Theorem 2.5);
- directions of geodesics exist in charts of Alexandrov spaces (Theorem 2.7);
- the failure of cyclical monotonicity persists under small perturbations (Lemma 2.11).

In Section 3 we put these together and prove Theorem 1.1.

Preliminaries
In this paper, $(X, d)$ always denotes a complete and locally compact length space. By a length space we mean a metric space in which the distance between any two points $x$ and $y$ equals the infimum of the lengths of curves connecting $x$ and $y$. By the Hopf-Rinow-Cohn-Vossen Theorem, our spaces $(X, d)$ are then geodesic, proper and, in particular, separable.

A space is called geodesic if any two of its points can be connected by a geodesic. By a geodesic we mean a constant speed length-minimizing curve $\gamma\colon [0, 1] \to X$; note that we parametrize all geodesics by the unit interval. We denote the space of geodesics of $X$ by $\mathrm{Geo}(X)$ and equip it with the supremum distance. By a (geodesic) triangle $\Delta(x, y, z)$ we mean points $x, y, z \in X$ together with any choice of geodesics $[x, y]$, $[y, z]$ and $[x, z]$ pairwise connecting them.
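2.1. Rectifiability.
For Theorem 1.1, the starting measure $\mu_0$ is diffuse enough if it is purely $(n-1)$-unrectifiable. Let us recall this notion.
Definition 2.1. Let $Y$ be a metric space; below, $Y = \mathbb{R}^n$ or $Y = X$. A set $E \subset Y$ is called $(n-1)$-rectifiable if it is contained in a countable union of images $f_i(A_i)$ of Lipschitz maps $f_i\colon A_i \to Y$ defined on sets $A_i \subset \mathbb{R}^{n-1}$. A measure $\mu$ on $Y$ is called purely $(n-1)$-unrectifiable if $\mu(E) = 0$ for every $(n-1)$-rectifiable set $E \subset Y$.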
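The property of purely unrectifiable measures that we use is that they have mass in all directions. This is made precise using (one-sided) cones, defined as follows. Given $x \in \mathbb{R}^n$, $\theta \in \mathbb{S}^{n-1}$ and $\alpha > 0$, we denote the open cone at $x$ in direction $\theta$ with opening angle $\alpha$ by
$$C(x, \theta, \alpha) := \{ y \in \mathbb{R}^n \setminus \{x\} : \angle(y - x, \theta) < \alpha \}.$$
Lemma 2.2. Let $\mu$ be a purely $(n-1)$-unrectifiable measure on $\mathbb{R}^n$ and let $E \subset \mathbb{R}^n$ with $\mu(E) > 0$. Then at $\mu$-almost every $x \in E$ we have $C(x, \theta, \alpha) \cap B(x, r) \cap E \neq \emptyset$ for all $\theta \in \mathbb{S}^{n-1}$, $\alpha > 0$ and $r > 0$.

Proof. Suppose the claim fails. Since $C(x, \theta, \alpha) \cap B(x, r) \subset C(x, \theta, \alpha') \cap B(x, r')$ if $\alpha' \ge \alpha$ and $r' \ge r$, there exist $r > 0$ and $\alpha > 0$ such that the subset
$$\{x \in E : C(x, \theta, \alpha) \cap B(x, r) \cap E = \emptyset \text{ for some } \theta \in \mathbb{S}^{n-1}\}$$
has positive $\mu$-measure. By considering a countable dense set of directions $\{\theta_i\}_{i \in \mathbb{N}}$, we find one fixed direction $\theta_i$ such that the set
$$E_1 := \{x \in E : C(x, \theta_i, \alpha/2) \cap B(x, r) \cap E = \emptyset\}$$
has positive $\mu$-measure.

Now take any $x \in \mathbb{R}^n$ and distinct points $y, z \in E_1 \cap B(x, r/2)$. Then $z \notin C(y, \theta_i, \alpha/2)$ and $y \notin C(z, \theta_i, \alpha/2)$. Hence the angle between $z - y$ and each of $\pm\theta_i$ is at least $\alpha/2$, so $E_1 \cap B(x, r/2)$ is contained in a Lipschitz graph over the hyperplane $\theta_i^{\perp}$. Covering $\mathbb{R}^n$ by countably many balls of radius $r/2$ shows that $E_1$ is an $(n-1)$-rectifiable set, contradicting the pure $(n-1)$-unrectifiability of $\mu$.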
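The proof of Lemma 2.2 rests on a simple geometric fact: on a Lipschitz graph, a sufficiently narrow two-sided cone around the "vertical" direction sees no other points of the set. The Python sketch below illustrates this on finite samples in the plane. It assumes the cone definition $C(x,\theta,\alpha) = \{y \neq x : \angle(y - x, \theta) < \alpha\}$ with $\theta$ a unit vector. The graph of $f(t) = |t|/2$ (Lipschitz constant $1/2$) is an arbitrary example: its upward and downward cones of opening $\alpha = 1 < \arctan 2$ are empty. A diffuse grid, by contrast, has points in a narrow cone. The function names are hypothetical.

```python
import math


def in_cone_ball(x, theta, alpha, r, y):
    """True if y lies in C(x, theta, alpha) ∩ B(x, r), assuming
    C(x, theta, alpha) = {y != x : angle(y - x, theta) < alpha} and |theta| = 1."""
    v = [yi - xi for xi, yi in zip(x, y)]
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm == 0.0 or norm >= r:
        return False
    cos_angle = sum(vi * ti for vi, ti in zip(v, theta)) / norm
    return cos_angle > math.cos(alpha)  # angle < alpha  <=>  cos(angle) > cos(alpha)


# Samples of the Lipschitz graph t -> |t| / 2 on [-1, 1].
graph = [(t, 0.5 * abs(t)) for t in (k / 50 - 1 for k in range(101))]
up, down = (0.0, 1.0), (0.0, -1.0)
alpha = 1.0  # smaller than arctan(1 / 0.5) ≈ 1.107, so both vertical cones miss the graph
hits = [(x, y) for x in graph for y in graph for theta in (up, down)
        if in_cone_ball(x, theta, alpha, 0.5, y)]
print(len(hits))  # prints 0

# A diffuse grid in the unit square has points even in a narrow upward cone.
grid = [(i / 20, j / 20) for i in range(21) for j in range(21)]
x0 = (0.5, 0.5)
print(any(in_cone_ball(x0, up, 0.3, 0.3, y) for y in grid))  # prints True
```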

Alexandrov spaces.
Let us recall some basics about Alexandrov spaces. Unless we provide another source, all the following definitions and results can be found in [9].
Alexandrov spaces generalize lower bounds on sectional curvature by comparison with constant curvature model spaces. They can be defined, for instance, by comparing geodesic triangles of a metric space with the corresponding triangles in a model space. Let us next give the precise definitions.
For each $k \in \mathbb{R}$, let $M_k$ be a complete simply connected surface with constant sectional curvature equal to $k$. That is, for negative $k$, $M_k$ is a scaled hyperbolic plane; for $k = 0$, $M_k$ is the Euclidean plane; and for positive $k$, $M_k$ is a round sphere of radius $1/\sqrt{k}$. We denote the distance between two points $x, y \in M_k$ by $|x - y|$.
Let $k \in \mathbb{R}$. For a triplet $x, y, z \in X$, let $\tilde x, \tilde y, \tilde z \in M_k$ be points such that the triangles $\Delta(x, y, z)$ and $\Delta(\tilde x, \tilde y, \tilde z)$ have the same side lengths, that is,
$$d(x, y) = |\tilde x - \tilde y|, \quad d(y, z) = |\tilde y - \tilde z|, \quad d(x, z) = |\tilde x - \tilde z|.$$
We call the triangle $\Delta(\tilde x, \tilde y, \tilde z)$ a comparison triangle for $\Delta(x, y, z)$. For a triangle $\Delta(x, y, z)$ in $X$, we denote by $\tilde\measuredangle_k(y, x, z)$ the comparison angle at $\tilde x$ in the comparison triangle $\Delta(\tilde x, \tilde y, \tilde z)$ in $M_k$.
Definition 2.3 (Alexandrov space). We say that $(X, d)$ is an Alexandrov space (with curvature bounded below by $k$) if there exists $k \in \mathbb{R}$ such that every point $p \in X$ has a neighbourhood $U$ with the following property. If $\Delta(x, y, z) \subset U$, $\Delta(\tilde x, \tilde y, \tilde z)$ is its comparison triangle in $M_k$, and $w \in [x, y]$, $\tilde w \in [\tilde x, \tilde y]$ with $d(x, w) = |\tilde x - \tilde w|$, then $d(w, z) \ge |\tilde w - \tilde z|$.
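The right-hand side $|\tilde w - \tilde z|$ in Definition 2.3 depends only on the three side lengths, on $d(x, w)$ and on $k$. The Python sketch below computes it with the law of cosines in $M_k$. The Euclidean, spherical and hyperbolic formulas are standard; the function names are hypothetical. For $k > 0$, the sketch assumes that the comparison triangle exists, which requires perimeter less than $2\pi/\sqrt{k}$. The condition of the definition then reads `d_wz >= comparison_distance(k, d_xy, d_xz, d_yz, d_xw)`.

```python
import math


def model_side(k, b, c, gamma):
    """Side opposite the angle gamma in M_k, given the adjacent sides b and c."""
    if k == 0:
        return math.sqrt(max(b * b + c * c - 2 * b * c * math.cos(gamma), 0.0))
    if k > 0:  # spherical law of cosines on the sphere of radius 1/sqrt(k)
        s = math.sqrt(k)
        val = (math.cos(s * b) * math.cos(s * c)
               + math.sin(s * b) * math.sin(s * c) * math.cos(gamma))
        return math.acos(max(-1.0, min(1.0, val))) / s
    s = math.sqrt(-k)  # hyperbolic law of cosines
    val = (math.cosh(s * b) * math.cosh(s * c)
           - math.sinh(s * b) * math.sinh(s * c) * math.cos(gamma))
    return math.acosh(max(1.0, val)) / s


def comparison_angle(k, d_xy, d_xz, d_yz):
    """Comparison angle at x~ in M_k for a triangle with the given side lengths."""
    if k == 0:
        val = (d_xy ** 2 + d_xz ** 2 - d_yz ** 2) / (2 * d_xy * d_xz)
    elif k > 0:
        s = math.sqrt(k)
        val = ((math.cos(s * d_yz) - math.cos(s * d_xy) * math.cos(s * d_xz))
               / (math.sin(s * d_xy) * math.sin(s * d_xz)))
    else:
        s = math.sqrt(-k)
        val = ((math.cosh(s * d_xy) * math.cosh(s * d_xz) - math.cosh(s * d_yz))
               / (math.sinh(s * d_xy) * math.sinh(s * d_xz)))
    return math.acos(max(-1.0, min(1.0, val)))


def comparison_distance(k, d_xy, d_xz, d_yz, d_xw):
    """|w~ - z~| for the point w~ on [x~, y~] with |x~ - w~| = d_xw."""
    return model_side(k, d_xw, d_xz, comparison_angle(k, d_xy, d_xz, d_yz))


print(comparison_angle(0, 1.0, 1.0, 1.0))  # approximately pi/3 (equilateral triangle)
print(comparison_angle(1, math.pi / 2, math.pi / 2, math.pi / 2))  # approximately pi/2
```

In the Euclidean plane itself, the inequality of Definition 2.3 holds with equality. For example, with $x = (0,0)$, $y = (4,0)$, $z = (1,3)$ and $w = (2,0)$, the value of `comparison_distance(0, 4, sqrt(10), sqrt(18), 2)` equals $|w - z| = \sqrt{10}$.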
An Alexandrov space may have infinite (Hausdorff) dimension. In this paper we study only finite-dimensional Alexandrov spaces. Recall that in an Alexandrov space every nonempty open set has the same dimension, so the dimension of an Alexandrov space is well defined; moreover, it is either an integer or infinite. From now on, $(X, d)$ is assumed to be an $n$-dimensional Alexandrov space with curvature bounded below by $k \in \mathbb{R}$, where $n \in \mathbb{N}$.
We will use the fact that our purely $(n-1)$-unrectifiable starting measures $\mu_0$ live on the regular set of the space, which admits nice charts. Let us recall the notions of regular and singular points.
Definition 2.4. A point $p \in X$ is called regular if the space of directions $\Sigma_p$ at $p$ is isometric to the standard sphere $\mathbb{S}^{n-1}$, or equivalently, if the Gromov-Hausdorff tangent at $p$ is the Euclidean space $\mathbb{R}^n$. A point $p \in X$ that is not regular is called singular. The set of regular points of $X$ is denoted by $\mathrm{Reg}(X)$ and the set of singular points by $\mathrm{Sing}(X)$.
The following result is from [29] (see also [10]). It implies that our starting measures $\mu_0$ give zero measure to the singular set.
Theorem 2.5. The singular set $\mathrm{Sing}(X)$ is $(n-1)$-rectifiable.
Proof. Notice that [29, Theorem A] states that $\mathrm{Sing}(X)$ has Hausdorff dimension at most $n-1$. However, the proof easily gives the stronger conclusion of $(n-1)$-rectifiability. Namely, in the proof of [29, Theorem A], Otsu and Shioya show that $\mathrm{Sing}(X)$ is contained in Lipschitz images of subsets of the spaces of directions $\Sigma_p$ for countably many points $p \in X$. The points $p$ are only needed to form, locally, a maximal $\varepsilon$-discrete net in $X$, so they can be chosen to be regular points of $X$. Thus $\mathrm{Sing}(X)$ is contained in countably many Lipschitz images of subsets of $\mathbb{S}^{n-1}$ and is therefore $(n-1)$-rectifiable.
Our aim is to arrive at a contradiction with cyclical monotonicity at a small scale near a regular point. We transfer the Euclidean argument to the Alexandrov space $X$ using the following standard charts $\varphi$. Since we need the existence of directions of geodesics in these charts, we state it explicitly in the following theorem.

Optimal mass transportation.
In this section we recall a few basic facts about optimal mass transportation. The Monge-Kantorovich formulation of the optimal mass transportation problem (with quadratic cost) asks, for two Borel probability measures $\mu_0$ and $\mu_1$, for the infimum
$$\inf_{\pi} \int_{X \times X} d^2(x, y)\, d\pi(x, y),$$
where the infimum is taken over all Borel probability measures $\pi \in \mathcal{P}(X \times X)$ that have $\mu_0$ and $\mu_1$ as marginals. That is, $\pi(A \times X) = \mu_0(A)$ and $\pi(X \times A) = \mu_1(A)$ for all Borel sets $A \in \mathcal{B}(X)$. To guarantee that the infimum is finite, it is standard to assume that $\mu_0$ and $\mu_1$ have finite second moments. The set of all Borel probability measures on $X$ with finite second moments is denoted by $\mathcal{P}_2(X)$.

An admissible measure that attains the infimum is called an optimal (transport) plan. The set of optimal plans between $\mu_0$ and $\mu_1$ is denoted by $\mathrm{Opt}(\mu_0, \mu_1)$. We say that an optimal plan $\pi$ is induced by a map if there exists a Borel measurable map $T\colon X \to X$ such that $\pi = (\mathrm{id} \times T)_{\#}\mu_0$. Such a map is called an optimal (transport) map. Optimal plans exist under fairly general assumptions [38], but optimal maps do not exist in general.
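For finitely supported plans, being induced by a map is easy to test. The identity $\pi = (\mathrm{id} \times T)_{\#}\mu_0$ holds exactly when every source point carrying mass is sent to a single target. The Python sketch below checks this and also computes marginals. Plans are represented as dicts `{(x, y): mass}`; this encoding and the helper names are hypothetical.

```python
from fractions import Fraction


def marginals(plan):
    """First and second marginals of a finitely supported plan {(x, y): mass}."""
    first, second = {}, {}
    for (x, y), m in plan.items():
        if m > 0:
            first[x] = first.get(x, 0) + m
            second[y] = second.get(y, 0) + m
    return first, second


def induced_map(plan):
    """Return T with plan == (id x T)#mu0 if the plan is induced by a map, else None.
    A finitely supported plan is induced by a map iff each source point has one target."""
    T = {}
    for (x, y), m in plan.items():
        if m > 0:
            if x in T and T[x] != y:
                return None  # mass at x is split between two targets
            T[x] = y
    return T


h = Fraction(1, 2)
map_plan = {(0, 1): h, (1, 3): h}
split_plan = {(0, 1): h / 2, (0, 3): h / 2, (1, 3): h}
print(induced_map(map_plan))    # prints {0: 1, 1: 3}
print(induced_map(split_plan))  # prints None
```

Both plans have the same first marginal, so only the splitting of the mass at the point $0$ prevents `split_plan` from being induced by a map.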
Optimality of a given transport plan depends only on the c-cyclical monotonicity of the support of the plan. Let us recall this notion.

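Definition 2.8 (cyclical monotonicity). A set $\Gamma \subset X \times X$ is called $c$-cyclically monotone if for all finite sets of points $\{(x_i, y_i)\}_{i=1}^{N} \subset \Gamma$ the inequality
$$\sum_{i=1}^{N} c(x_i, y_i) \le \sum_{i=1}^{N} c(x_i, y_{\sigma(i)})$$
holds for all permutations $\sigma \in S_N$ of $\{1, \dots, N\}$.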
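The simplest instance of the definition is worth spelling out. Take two pairs and the quadratic cost $c(x, y) = |x - y|^2$ on $\mathbb{R}^n$, the setting of the Euclidean argument. The only nontrivial permutation is then the transposition, and expanding the squares turns the condition into a monotonicity inequality:
$$
\begin{aligned}
&|x_1 - y_1|^2 + |x_2 - y_2|^2 \le |x_1 - y_2|^2 + |x_2 - y_1|^2 \\
\iff\ & -2\langle x_1, y_1\rangle - 2\langle x_2, y_2\rangle \le -2\langle x_1, y_2\rangle - 2\langle x_2, y_1\rangle \\
\iff\ & \langle x_1 - x_2,\, y_1 - y_2\rangle \ge 0.
\end{aligned}
$$
Thus two pairs with $\langle x_1 - x_2, y_1 - y_2\rangle < 0$ cannot both belong to a $c$-cyclically monotone set.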
The following characterization of optimality via $c$-cyclical monotonicity of the support suffices for us. It is proven in [31] and holds for continuous cost functions.
Theorem 2.9 ([31]). Let $X$ be a Polish space and $\mu_0, \mu_1 \in \mathcal{P}_2(X)$. Then a transport plan $\pi$ between $\mu_0$ and $\mu_1$ is optimal if and only if its support is a $c$-cyclically monotone set.
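A discrete analogue of Theorem 2.9 can be checked by brute force. For finitely many atoms with equal masses in the plane, consider plans given by permutations. Such a plan has minimal quadratic cost among these plans exactly when its support is $c$-cyclically monotone in the sense of Definition 2.8. The sketch below implements the definition directly, with exponential running time, and verifies the equivalence for every permutation of four arbitrarily chosen points. The sample points and helper names are hypothetical.

```python
import itertools


def sq_dist(p, q):
    """Quadratic cost c(p, q) = |p - q|^2."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_cyclically_monotone(pairs, tol=1e-12):
    """Brute-force check of Definition 2.8 for a finite set of pairs (x_i, y_i)."""
    n = len(pairs)
    for size in range(2, n + 1):
        for subset in itertools.combinations(pairs, size):
            base = sum(sq_dist(x, y) for x, y in subset)
            for sigma in itertools.permutations(range(size)):
                permuted = sum(sq_dist(subset[i][0], subset[sigma[i]][1]) for i in range(size))
                if base > permuted + tol:
                    return False
    return True


xs = [(0.1, -0.3), (1.0, 0.2), (0.3, 1.1), (1.4, 1.3)]
ys = [(2.0, 0.5), (0.1, 2.2), (1.7, 2.0), (2.5, 1.6)]
N = len(xs)


def plan_cost(perm):
    """Cost of the plan sending xs[i] to ys[perm[i]], each with mass 1/N (factor omitted)."""
    return sum(sq_dist(xs[i], ys[perm[i]]) for i in range(N))


best = min(plan_cost(p) for p in itertools.permutations(range(N)))
agree = all(
    (plan_cost(p) <= best + 1e-12)
    == is_cyclically_monotone([(xs[i], ys[p[i]]) for i in range(N)])
    for p in itertools.permutations(range(N))
)
print(agree)  # prints True
```

With coordinates given to one decimal place, distinct plan costs differ by at least $0.01$, so the tolerance `1e-12` only absorbs floating-point rounding.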
In the following lemma we recall a well-known fact which allows us to localize the problem. One way to prove this is to use the result of Lisini in [24] about Wasserstein geodesics and their lifts to the space of probability measures on geodesics of X, see [14] for the proof.
Lemma 2.10. Let $(X, d)$ be a complete and separable geodesic metric space, and let $\Gamma \subset X \times X$ be a $c$-cyclically monotone set. Then the set

To arrive at a contradiction with cyclical monotonicity, we will use the following lemma.
(2.1)
The quantitative claim then follows by compactness of $K$. First, notice that $K \subset B(0, 2 + C)$, so $K$ is bounded. The set $K$ is also closed and hence compact. The function
$$(y_1, y_2) \mapsto \frac{|y_1 + y_2|^2}{|y_1|^2 + |y_2|^2}$$
is continuous as a function $K \to \mathbb{R}$. Therefore, its maximum is attained in $K$. By (2.1), this maximum is strictly less than two, and hence there exists $\delta > 0$ as in the claim.
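The bound "strictly less than two" rests on the parallelogram law $|y_1 + y_2|^2 + |y_1 - y_2|^2 = 2|y_1|^2 + 2|y_2|^2$. For $(y_1, y_2) \neq (0, 0)$, this gives
$$\frac{|y_1 + y_2|^2}{|y_1|^2 + |y_2|^2} = \frac{2(|y_1|^2 + |y_2|^2) - |y_1 - y_2|^2}{|y_1|^2 + |y_2|^2} = 2 - \frac{|y_1 - y_2|^2}{|y_1|^2 + |y_2|^2}.$$
Hence the ratio never exceeds two, and it equals two exactly when $y_1 = y_2$. On a compact set where the denominator does not vanish, the maximum is therefore strictly below two precisely when the set contains no pair with $y_1 = y_2$.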

Proof of Theorem 1.1
To prove the uniqueness of optimal transport plans, it suffices to show that every optimal transport plan is induced by a map. Indeed, suppose there were two different optimal plans $\pi_1$ and $\pi_2$. Then their convex combination $\frac12(\pi_1 + \pi_2)$ would also be optimal, but it would not be induced by a map. We prove Theorem 1.1 as follows: we assume that some optimal plan is not induced by a map, localize to a chart, and use a Euclidean argument to reach a contradiction.
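The two facts used in the uniqueness reduction can be written out in a few lines. Write $C(\pi) := \int_{X \times X} d^2(x, y)\, d\pi(x, y)$. The marginal constraints are preserved under convex combinations and $C$ is linear, so
$$C\big(\tfrac12(\pi_1 + \pi_2)\big) = \tfrac12 C(\pi_1) + \tfrac12 C(\pi_2) = \min_{\pi} C(\pi).$$
Now suppose $\pi := \frac12(\pi_1 + \pi_2) = (\mathrm{id} \times T)_{\#}\mu_0$ for a Borel map $T$. Then $\pi_i \le 2\pi$, so each $\pi_i$ is concentrated on $\mathrm{Graph}(T)$. Since its first marginal is $\mu_0$, this forces $\pi_i = (\mathrm{id} \times T)_{\#}\mu_0$ for $i = 1, 2$, contradicting $\pi_1 \neq \pi_2$.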
Since $A$ is a projection of a Borel set, it is a Souslin set and thus $\mu_0$-measurable. (In fact, as a projection of a $\sigma$-compact set, $A$ is Borel.) We will show that $A$ has positive $\mu_0$-measure.
2}. We need to choose the geodesics $\gamma^1_x$ and $\gamma^2_x$ in a measurable way. We also make the selection so that
$$d(x, \gamma^1_x(1)) \le d(x, \gamma^2_x(1)) \neq 0. \tag{3.1}$$
By now, we have a Borel selection $T$ of $\mathrm{spt}(\pi)$. Since $p_1(\mathrm{spt}(\pi))$ is a Borel set, we can extend $T$ to a Borel map $T\colon X \to X$. Consider now the set $\mathrm{spt}(\pi) \setminus \mathrm{Graph}(T)$. Since $T$ is a Borel map, its graph is a Borel set, and thus $\mathrm{spt}(\pi) \setminus \mathrm{Graph}(T)$ is a Borel set.

The set $X \setminus \{T(x)\}$ is $\sigma$-compact by the properness and separability of $X$. Hence the section $(\mathrm{spt}(\pi) \setminus \mathrm{Graph}(T))_x$ is $\sigma$-compact as a relatively closed subset of $X \setminus \{T(x)\}$. Thus, again by the Arsenin-Kunugui Theorem, there exists a Borel selection $S\colon p_1(\mathrm{spt}(\pi) \setminus \mathrm{Graph}(T)) \to X$. We further extend it to a Borel map $S\colon X \to X$ such that $T(x) = S(x)$ for $x \notin A$ and $T(x) \neq S(x)$ for $x \in A$.

To obtain (3.1), we define two auxiliary maps $\tilde T_1, \tilde T_2\colon X \to X \times X$ as

where $h(x) := d(x, T(x)) - d(x, S(x))$, and similarly

The maps $\tilde T_1$ and $\tilde T_2$ are Borel maps since $T$, $S$ and $h$ are Borel maps. It remains to select the geodesics between the points $x$ and $T_i(x)$. For that, we consider the set

The set $G$ is Borel as the preimage of zero under the Borel map $(x, y, \gamma) \mapsto \sup\{d(x, \gamma(0)), d(y, \gamma(1))\}$.
From now on, we denote $\gamma^1_x = T_1(x)$ and $\gamma^2_x = T_2(x)$ for all $x \in A$. Notice that $\gamma^1_x$ and $\gamma^2_x$ satisfy (3.1). By Lemma 2.6, we have for all $x \in A$ that

Thus, we may write $A$ as a countable union of sets

and therefore there exists $k \in \mathbb{N}$ such that $\mu_0(A_k) > 0$. Notice that the sets $A_i$ are measurable, since we can write $A_i$ as the intersection of

We now consider $k \in \mathbb{N}$ fixed so that $\mu_0(A_k) > 0$.
Step 2: localization to a chart
Now we are ready to localize the problem, so that we may use properties of Euclidean space to arrive at a contradiction. We need to choose $\varepsilon > 0$ small enough to reach a contradiction with $c$-cyclical monotonicity in a $(1+\varepsilon)$-chart given by Theorem 2.7. We define $\varepsilon := \delta/100$, where $\delta = \delta(2k) \in (0, 1/2)$ is the constant given by Lemma 2.11 for the $k$ fixed above. Since $\mu_0$ is purely $(n-1)$-unrectifiable and $\mathrm{Sing}(X)$ is $(n-1)$-rectifiable by Theorem 2.5, we have $\mu_0(A_k \cap \mathrm{Reg}(X)) = \mu_0(A_k)$.

By Theorem 2.7, we can cover the set $\mathrm{Reg}(X)$ with open sets $U$ whose associated maps $\varphi\colon U \to \mathbb{R}^n$ are $(1+\varepsilon)$-biLipschitz and for which the limit
$$\lim_{t \searrow 0} \frac{\varphi(\gamma(t)) - \varphi(\gamma(0))}{d(\gamma(t), \gamma(0))}$$
exists for all geodesics $\gamma \subset U$. Since $X$ is a proper metric space, it is in particular hereditarily Lindelöf. Therefore, there exists a countable subcover $\mathcal{F}$ consisting of such open sets $U$. Hence, there exists $U \in \mathcal{F}$ for which $\mu_0(U \cap A_k) > 0$. Let $\varphi\colon U \to \mathbb{R}^n$ be as in Theorem 2.7.
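The limit above can be seen concretely in the simplest case. Take $X$ to be the round unit sphere $\mathbb{S}^2$, which has curvature bounded below by $1$ and only regular points. The Python sketch below uses the orthogonal projection onto the plane tangent at the north pole as a chart. This chart is an illustrative assumption, not the chart of Theorem 2.7: it is biLipschitz with constant close to $1$ on small neighbourhoods of the pole. Along a great circle leaving the pole, the difference quotient equals $\frac{\sin(tL)}{tL}(\cos a, \sin a)$ and converges to the unit direction. All function names are hypothetical.

```python
import math


def geodesic_from_pole(a, length, t):
    """gamma(t) on the unit sphere: great circle leaving (0, 0, 1) in direction angle a,
    parametrized on [0, 1] with total length `length` (assumed <= pi)."""
    s = t * length
    return (math.sin(s) * math.cos(a), math.sin(s) * math.sin(a), math.cos(s))


def chart(p):
    """Hypothetical chart near the north pole: orthogonal projection to the (x, y)-plane."""
    return (p[0], p[1])


def difference_quotient(a, length, t):
    """(phi(gamma(t)) - phi(gamma(0))) / d(gamma(t), gamma(0))."""
    p0 = chart(geodesic_from_pole(a, length, 0.0))
    pt = chart(geodesic_from_pole(a, length, t))
    dist = t * length  # intrinsic distance along the minimizing great-circle arc
    return tuple((u - v) / dist for u, v in zip(pt, p0))


a, length = 0.7, 1.2
for t in (1e-1, 1e-3, 1e-6):
    print(t, difference_quotient(a, length, t))
print("limit direction:", (math.cos(a), math.sin(a)))
```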
Step 3: discretization and choice of points for the contradiction
Next, we take a subset of $A_k \cap U$ on which the directions of the two selected geodesics are independent of the point, up to a small error $\tilde\varepsilon := \frac{\varepsilon}{80k^4} > 0$.
(3.2)
This is done by covering $\mathbb{R}^n$ by the balls $\{B(y_i, \tilde\varepsilon)\}_{i \in \mathbb{N}}$. Then there exist $i$, $j$ and $t_0 > 0$ such that the set

has positive $\mu_0$-measure. The set $B$ is measurable by an argument similar to the one for $A_i$. By relabeling, we may assume that $i = 1$ and $j = 2$.
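The discretization in Step 3 is a pigeonhole argument. Countably many sets cover $\mathbb{R}^n$, so a set of positive measure must give positive measure to the points whose two direction vectors fall into one fixed pair of cells. The Python sketch below is a finite-sample analogue. It assumes that the two chart directions at each sample point are already available as vectors, replaces the covering balls by a grid of cubes of side $h$ (a simplification), and picks the most populated pair of cells. Two vectors in the same cube differ by at most $h\sqrt{n}$. The sample data and function names are hypothetical.

```python
import math
from collections import Counter


def cell(v, h):
    """Index of the grid cube of side h containing the vector v."""
    return tuple(math.floor(c / h) for c in v)


def most_populated_pair(directions, h):
    """directions: list of (v1, v2), the two direction vectors at each sample point.
    Returns ((cell of v1, cell of v2), count) for the most frequent pair of cells."""
    counts = Counter((cell(v1, h), cell(v2, h)) for v1, v2 in directions)
    return counts.most_common(1)[0]


dirs = [
    ((1.0, 0.02), (0.0, 1.0)),
    ((1.05, 0.03), (0.02, 1.1)),
    ((0.0, 1.0), (1.0, 0.0)),
    ((1.1, 0.1), (0.1, 1.2)),
]
print(most_populated_pair(dirs, 0.25))  # prints (((4, 0), (0, 4)), 3)
```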
Step 4: verifying the bounds for Lemma 2.11
In the remainder of the proof, we verify that the four selected points $x_2$, $x_1$, $\gamma^2_{x_1}(t)$ and $\gamma^1_{x_2}(t)$ give a contradiction with $c$-cyclical monotonicity. To this end, we first check that we may apply Lemma 2.11 with the selected $\delta$.