Optimal transportation, topology and uniqueness

The Monge-Kantorovich transportation problem involves optimizing with respect to a given cost function. Uniqueness is a fundamental open question about which little is known when the cost function is smooth and the landscapes containing the goods to be transported possess (non-trivial) topology. This question turns out to be closely linked to a delicate problem (#111) of Birkhoff [14]: give a necessary and sufficient condition on the support of a joint probability to guarantee extremality among all measures which share its marginals. Fifty years of progress on Birkhoff's question culminate in Hestir and Williams' necessary condition, which is nearly sufficient for extremality; we slightly relax the subtle measurability hypotheses separating necessity from sufficiency, yet demonstrate by example that sufficiency certainly requires some measurability. Their condition amounts to the vanishing of the measure γ outside a countable alternating sequence of graphs and antigraphs in which no two graphs (or two antigraphs) have domains that overlap, and where the domain of each graph / antigraph in the sequence contains the range of the succeeding antigraph (respectively, graph). Such sequences are called numbered limb systems. We then explain how this characterization can be used to resolve the uniqueness of Kantorovich solutions for optimal transportation on a manifold with the topology of the sphere.

is Birkhoff's 1948 problem [14] of characterizing extremality among doubly stochastic measures on the unit square.
In this framework, existence of solutions became straightforward for any continuous cost c ∈ C(X × Y). Still, fifty more years would elapse before the optimal volume-preserving map between two arbitrary domains sought by Monge was constructed for the Euclidean distance c(x, y) = |x − y| in [4] [22] and [99]. Evans and Gangbo had already solved the analogous problem with the domains replaced by disjoint Lipschitz continuous probability densities [42], while Sudakov's earlier construction [96] required a claim which turned out to be true only in two dimensions [4] [12]; see [23] [26] [12] for simplifications and [44] [6] [9] [47] for extensions. Uniqueness fails in this context [54] [45]. In the meantime both Monge and Kantorovich problems were found to enjoy unique solutions for strictly convex costs such as c(x, y) = |x − y|^p / p, with p = 2 [17] [18] [31] [32] and p > 1 [19] [53] [54] [89] [90]. A general criterion for existence and uniqueness of optimal maps was identified by Gangbo [52] and Levin [66], building on the works of those cited above. For any pair of distinct destinations y_1 ≠ y_2 in Y, it prohibits the function x ∈ X ↦ c(x, y_1) − c(x, y_2) from having critical points on X. Strictly convex functions c(x, y) = h(x − y) on X = Y = R^n [19] [53] satisfy this condition, called the twist criterion in [104], but no differentiable cost c ∈ C^1(X × Y) satisfies it on any compact manifold X without boundary. Although terrestrial transportation takes place on the sphere, there are few theorems set in topologies other than the ball, not to speak of the more exotic landscapes which arise naturally in some applications. Spherical examples typically show that uniqueness of Kantorovich solutions holds even though Monge solutions fail to exist [55].
Building on these developments, one of the goals of this article is to expose a criterion for uniqueness of Kantorovich solutions which works equally well on the sphere and the ball [27]. Called

Extremal doubly stochastic measures
An n × n doubly stochastic matrix refers to a matrix of non-negative entries whose columns and rows each sum to 1. The doubly stochastic matrices form a convex subset of all n × n matrices, in fact a convex polytope, whose extreme points are in bijective correspondence with the n! permutations of n letters, according to Birkhoff [13] and von Neumann [105]. For example, the 3 × 3 doubly stochastic matrices,
A doubly stochastic measure on the square refers to a non-negative Borel probability measure on [0, 1]^2 whose horizontal and vertical marginals both coincide with Lebesgue measure λ on [0, 1]. The set of doubly stochastic measures forms a convex set we denote by Γ(λ, λ) (which is weak-* compact in the Banach space dual to continuous functions C([0, 1]^2) normed by their suprema ‖·‖_∞). A measure γ is said to be extremal in Γ(λ, λ) if it cannot be decomposed as a convex combination γ = (1 − t)γ_0 + tγ_1 with 0 < t < 1 and γ_0 ≠ γ_1 ∈ Γ(λ, λ). Since the Krein-Milman theorem asserts that convex combinations of extreme points are dense (in any compact convex subset of a topological vector space, Figure 1), it is natural to want to characterize the extreme points of Γ(λ, λ). Another motivation for such a characterization is that every continuous linear functional on Γ(λ, λ) is minimized at an extreme point. Whether or not this extremum is uniquely attained can be an interesting question, as in the optimal transportation context: in Figure 1 the horizontal coordinate is minimized at a single point but maximized at two extreme points (and along the segment joining them).
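The finite-dimensional statement that opens this section can be checked by hand for n = 3. The following is a minimal sketch, not taken from the references: the helper names `permutation_matrix`, `is_doubly_stochastic` and `convex_combination` are hypothetical, and exact rational arithmetic is used so that the row and column sums can be compared with 1 without rounding.

```python
from fractions import Fraction
from itertools import permutations


def permutation_matrix(perm):
    """Return the n x n 0/1 matrix with a 1 in column perm[i] of row i."""
    n = len(perm)
    return [[Fraction(int(perm[i] == j)) for j in range(n)] for i in range(n)]


def is_doubly_stochastic(a):
    """Check non-negativity and that every row and column sums to 1."""
    n = len(a)
    if any(len(row) != n for row in a):
        return False
    if any(entry < 0 for row in a for entry in row):
        return False
    rows_ok = all(sum(row) == 1 for row in a)
    cols_ok = all(sum(a[i][j] for i in range(n)) == 1 for j in range(n))
    return rows_ok and cols_ok


def convex_combination(weights, matrices):
    """Entrywise sum of weights[k] * matrices[k]."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


# The 3! = 6 permutation matrices are the candidate extreme points.
vertices = [permutation_matrix(p) for p in permutations(range(3))]

# Their barycenter is doubly stochastic but not a permutation matrix,
# so it is a non-extremal point of the polytope.
weights = [Fraction(1, 6)] * len(vertices)
barycenter = convex_combination(weights, vertices)
```

Here `barycenter` is the matrix with every entry equal to 1/3, which illustrates how a non-extremal doubly stochastic matrix spreads its mass over a support containing cycles.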
Motivated by the optimization problems already mentioned, we prefer to formulate the question in slightly greater generality, by replacing the two copies of ([0, 1], λ) with probability spaces (X, µ) and (Y, ν), where X and Y are each subsets of a complete separable metric space, and µ and ν are Borel probability measures on X and Y respectively. This widens applicability of the answer to this question without increasing its difficulty. Letting Γ(µ, ν) denote the Borel probability measures on X × Y having µ and ν for marginals, we wish to characterize the extreme points of the convex set Γ(µ, ν). Ideally, as in the finite-dimensional case, this characterization would be given in terms of some geometrical property of the support of the measure γ in X × Y.
Figure 1: Krein-Milman asserts a compact convex set K can be reconstructed from its extreme points (denoted here by solid circles • and solid lines −).
Indeed, if µ = Σ_{i=1}^m m_i δ_{x_i} and ν = Σ_{j=1}^n n_j δ_{y_j} are finite, our problem reduces to characterizing the extreme points of the convex set A of m × n matrices (a_ij) with non-negative entries and prescribed row and column sums Σ_j a_ij = m_i and Σ_i a_ij = n_j. A matrix (a_ij) is well-known to be extremal in A if and only if it is acyclic, meaning for every sequence a_{i_1 j_1}, ..., a_{i_k j_k} of non-zero entries occupying k ≥ 2 distinct columns and k distinct rows, the product a_{i_1 j_2} ... a_{i_{k−1} j_k} a_{i_k j_1} must vanish; see Figure 2 or Denny [37], where the terminology aperiodic is used. Similarly, a set S ⊂ X × Y is acyclic if for every k ≥ 2 distinct points {x_1, ..., x_k} ⊂ X and {y_1, ..., y_k} ⊂ Y, at least one of the pairs (x_1, y_1), (x_1, y_2), (x_2, y_2), ..., (x_{k−1}, y_k), (x_k, y_k), (x_k, y_1) lies outside of S.
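For finite supports the acyclicity condition can be tested mechanically. The sketch below is an illustration under one assumption stated plainly: it views rows and columns as the two vertex classes of a bipartite graph with one edge per non-zero entry, so that the cycles excluded in the definition above are exactly the cycles of this graph. The function names `is_acyclic` and `support_of` are hypothetical, and union-find is simply one convenient way to detect a cycle.

```python
def is_acyclic(support):
    """Return True if the set of (row, column) pairs contains no cycle.

    Each pair is an edge of a bipartite graph on rows and columns; the set
    is acyclic exactly when this graph is a forest, which union-find
    detects one edge at a time.
    """
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for i, j in set(support):
        root_row, root_col = find(("row", i)), find(("col", j))
        if root_row == root_col:
            return False  # this edge closes a cycle
        parent[root_row] = root_col
    return True


def support_of(matrix):
    """Positions of the non-zero entries of a matrix given as nested lists."""
    return {(i, j) for i, row in enumerate(matrix)
            for j, entry in enumerate(row) if entry != 0}


identity = [[1, 0], [0, 1]]
uniform = [[0.5, 0.5], [0.5, 0.5]]
staircase = [[0.3, 0.2, 0.0], [0.0, 0.1, 0.4]]
```

With these inputs, `identity` and `staircase` have acyclic supports, while `uniform` contains the four-cycle through all of its entries and is the midpoint of the two 2 × 2 permutation matrices.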
A functional analytic characterization of extremality was supplied by Douglas [39] and by Lindenstrauss [67]: it asserts that γ is extremal in Γ(µ, ν) if and only if L^1(X, dµ) ⊕ L^1(Y, dν) is dense in L^1(X × Y, dγ). Although this result is a wonderful starting point, it is not quite the characterization we desire for applications, since it is not easily expressed in terms of the geometry of the support of γ. Significant further progress was made by Beneš and Štěpán, who showed every extremal doubly stochastic measure vanishes outside some acyclic subset S ⊂ X × Y [8]. Hestir and Williams refined this condition, showing that it becomes sufficient under an additional Borel measurability hypothesis which, unfortunately, is not always satisfied [58]. Some of the subtleties of the problem were indicated already by Losert's counterexamples [69]. The difficulty of the problem resides partly in the fact that any geometrical characterization of optimality must be invariant under arbitrary measure-preserving transformations applied independently to the horizontal (abscissa) and vertical (ordinate) variables.
In the next two sections we review this line of research, clarifying the nature of the gap separating necessity from sufficiency and pointing out that it can be narrowed slightly by replacing the Borel σ-algebra with suitably adapted measure-completions. We give a self-contained proof of that part of the theory which is needed to resolve the uniqueness of optimal transportation with respect to a smooth cost on the sphere. This application was first developed in an economic context by Chiappori, McCann, and Nesheim [27], and forms the subject of the final section of the present manuscript.

Measures on graphs are push-forwards
Before recalling the characterization of interest, let us develop a bit of notation in a simpler setting, and a key argument that we shall require. Impatient or knowledgeable readers can proceed directly to the final sections below, referring back to the present section only as needed.
Let X and Y be subsets of complete separable metric spaces, and fix a non-negative Borel measure µ on X. Suppose f : X → Y is µ-measurable, meaning f^{-1}(B) is in the σ-algebra completion of the Borel subsets of X with respect to the measure µ, whenever B is relatively Borel in Y. Then a Borel measure on Y is induced, denoted f_# µ and called the push-forward of µ through f, and given by (f_# µ)[B] := µ[f^{-1}(B)] for each Borel B ⊂ Y. Defining the projections π^X(x, y) = x and π^Y(x, y) = y on X × Y, this notation permits the horizontal and vertical marginals of a measure γ ≥ 0 on X × Y to be expressed as π^X_# γ and π^Y_# γ respectively. The next lemma shows that any measure supported on a graph can be deduced from its horizontal marginal. It improves on Lemma 2.4 of [55] and various other antecedents, by using an argument from Villani's Theorem 5.28 [104] to extract µ-measurability of f as a conclusion rather than a hypothesis.
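For measures with finitely many atoms the push-forward and the two marginals can be computed directly. The following sketch is only an illustration of the notation above: the dictionary representation of a measure, the helper name `push_forward`, and the particular map `f` are all assumptions made for the example.

```python
from fractions import Fraction


def push_forward(mu, f):
    """Push a discrete measure {point: mass} forward through the map f."""
    image = {}
    for x, mass in mu.items():
        image[f(x)] = image.get(f(x), 0) + mass
    return image


# A hypothetical three-point source measure and a map f : X -> Y.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
f = lambda x: x % 2

# gamma = (id x f)_# mu lives on the graph of f.
gamma = push_forward(mu, lambda x: (x, f(x)))

# Marginals: pi^X_# gamma and pi^Y_# gamma.
horizontal = push_forward(gamma, lambda xy: xy[0])
vertical = push_forward(gamma, lambda xy: xy[1])
```

In this toy case the horizontal marginal of `gamma` returns `mu`, and the vertical marginal coincides with `push_forward(mu, f)`, which is the discrete shadow of the statement that a measure on a graph is determined by its horizontal marginal.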
As work of, e.g., Hestir and Williams [58] implies, although measures on graphs are extremal in Γ(µ, ν), the converse is far from being true; this peculiarity is an inevitable consequence of the infinite divisibility of (X, µ).
Proof. Since outer-measure is subadditive, it costs no generality to assume the subsets X and Y are in fact complete and separable, by extending γ in the obvious (minimal) way. Any σ-finite Borel measure γ is regular and σ-compact on a complete separable metric space; e.g. p. 255 of [40] or Theorem I-55 of [103]. Since γ vanishes outside Graph(f) := {(x, f(x)) | x ∈ X}, there is an increasing sequence of compact sets K_i ⊂ K_{i+1} ⊂ Graph(f) whose union K_∞ := ∪_i K_i contains the full mass of γ. Compactness of K_i ⊂ Graph(f) implies continuity of f on the compact projection X_i := π^X(K_i). Thus the restriction f_∞ of f to X_∞ := π^X(K_∞) is a Borel map whose graph is a σ-compact set of full measure for γ. We now verify that γ and (id_{X_∞} × f_∞)_# µ assign the same mass to each Borel rectangle U × V ⊂ X × Y, where µ := π^X_# γ denotes the horizontal marginal of γ. Since γ vanishes outside Graph(f_∞), and a point (x, f_∞(x)) of this graph lies in U × V precisely when x ∈ U ∩ f_∞^{-1}(V), we find γ[U × V] = γ[(U ∩ f_∞^{-1}(V)) × Y] = µ[U ∩ f_∞^{-1}(V)] = ((id_{X_∞} × f_∞)_# µ)[U × V].
The preceding lemma shows that any measure concentrated on a graph is uniquely determined by its marginals; γ is therefore extremal in Γ(π^X_# γ, π^Y_# γ). As the results of the next section show, the converse is far from being true.
and the graph of its (multivalued) inverse by , More typically, we will be interested in the Antigraph(g) ⊂ X × Y of a map
with Dom(f_k) ∪ Ran(f_{k+1}) ⊂ I_k for each k ≥ 0. The system has (at most) Notice the map f_0 is irrelevant to this definition though I_0 is not; we may always take Dom(f_0) = ∅, but require Ran(f_1) ⊂ I_0.
The point is the following theorem and its corollary, which extends and relaxes the result measures on X × Y having µ = π^X_# γ and ν = π^Y_# γ for marginals. As in the preceding lemma, we say γ vanishes outside of S ⊂ X × Y if γ assigns zero outer measure to the complement of S in X × Y.

Theorem 4.2 (Numbered limb systems yield unique correlations)
Let X and Y be subsets of complete separable metric spaces, equipped with σ-finite Borel measures µ on X and ν on Y. Suppose there is a numbered limb system S = ∪_{i=1}^∞ Graph(f_{2i−1}) ∪ Antigraph(f_{2i}) with the property that Graph(f_{2i−1}) and Antigraph(f_{2i}) are γ-measurable subsets of X × Y for each i ≥ 1 and for every γ ∈ Γ(µ, ν) vanishing outside of S. If the system has finitely many limbs or µ[X] < ∞, then at most one γ ∈ Γ(µ, ν) vanishes outside of S. If such a measure exists, it is given by γ = Σ_{k=1}^∞ γ_k where
γ_k = (id_X × f_k)_# η_k if k is odd, and γ_k = (f_k × id_Y)_# η_k if k is even, (3)
η_k = (µ − π^X_# γ_{k+1})|_{Dom f_k} if k is odd, and η_k = (ν − π^Y_# γ_{k+1})|_{Dom f_k} if k is even. (4)
Here f_k is measurable with respect to the η_k completion of the Borel σ-algebra.
If the system has N < ∞ limbs, γ_k = 0 for k > N, and η_k and γ_k can be computed recursively from the formulae above starting from k = N.
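The recursion from the last limb down to the first can be sketched for finitely many atoms. The code below is a hedged illustration rather than the authors' construction: it assumes the recursion reads "restrict the relevant marginal, minus the projection of the next limb, to Dom f_k, then push it forward along f_k", which is how the proof below derives the restricted marginals. The function name `solve_limb_system`, the dictionary encoding of the maps f_k, and the sample data are all hypothetical. The function does not check that the input really is a numbered limb system.

```python
from fractions import Fraction


def solve_limb_system(mu, nu, maps):
    """Recover gamma from its marginals on a finite numbered limb system.

    maps[k] is a dict for the k-th limb (k = 1..N): for odd k it sends
    points of X to Y (a graph), for even k points of Y to X (an antigraph).
    Returns {(x, y): mass}, built from the last limb down to the first.
    """
    gamma = {}
    next_x, next_y = {}, {}          # marginals of gamma_{k+1}; zero at k = N
    for k in sorted(maps, reverse=True):
        f = maps[k]
        if k % 2:                    # odd k: restrict mu minus next limb's X-marginal
            eta = {x: mu.get(x, 0) - next_x.get(x, 0) for x in f}
            limb = {(x, f[x]): m for x, m in eta.items()}
        else:                        # even k: restrict nu minus next limb's Y-marginal
            eta = {y: nu.get(y, 0) - next_y.get(y, 0) for y in f}
            limb = {(f[y], y): m for y, m in eta.items()}
        if any(m < 0 for m in eta.values()):
            raise ValueError("no non-negative measure with these marginals on S")
        next_x, next_y = {}, {}
        for (x, y), m in limb.items():
            next_x[x] = next_x.get(x, 0) + m
            next_y[y] = next_y.get(y, 0) + m
        gamma.update(limb)           # limbs are disjoint, so keys never collide
    return gamma


# Two limbs: the graph of f_1 : X -> I_0 = {"p"} and the antigraph of
# f_2 : I_2 = {"q"} -> X.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
nu = {"p": Fraction(7, 10), "q": Fraction(3, 10)}
maps = {1: {"a": "p", "b": "p"}, 2: {"q": "a"}}
gamma = solve_limb_system(mu, nu, maps)
```

In this example the antigraph limb first absorbs the mass 3/10 that ν places at "q", and the graph limb then carries what remains of µ, giving masses 3/10 at ("a", "q"), 1/5 at ("a", "p") and 1/2 at ("b", "p").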
Proof. Let S = ∪_{i=1}^∞ Graph(f_{2i−1}) ∪ Antigraph(f_{2i}) be a numbered limb system whose complement has zero outer measure for some σ-finite measure 0 ≤ γ ∈ Γ(µ, ν). This means that I_k ⊃ Dom f_k gives a disjoint decomposition of X = ∪_{i=0}^∞ I_{2i+1} and of Y = ∪_{i=0}^∞ I_{2i}, and that Ran(f_k) ⊂ I_{k−1} for each k ≥ 1. Assume moreover that Graph(f_{2i−1}) and Antigraph(f_{2i}) are γ-measurable for each i ≥ 1. We wish to show γ is uniquely determined by µ, ν and S.
The graphs Graph(f_{2i−1}) are disjoint since their domains I_{2i−1} are disjoint, and the antigraphs Antigraph(f_{2i}) are disjoint since their domains I_{2i} are. Moreover, Graph(f_{2i−1}) is disjoint from Antigraph(f_{2j}) for all i, j ≥ 1: a common point (x, y) would satisfy x ∈ I_{2i−1} ∩ Ran(f_{2j}) ⊂ I_{2i−1} ∩ I_{2j−1}, forcing i = j, and y ∈ Ran(f_{2i−1}) ∩ I_{2j} ⊂ I_{2i−2} ∩ I_{2j}, forcing i = j + 1, a contradiction. Let γ_k denote the restriction of γ to Antigraph(f_k) for k even and to Graph(f_k) for k odd. Then γ = Σ_k γ_k by our measurability hypothesis, and γ_k restricts to a Borel measure on X × Dom f_k if k is even, and on Dom f_k × Y if k is odd. Defining the marginal projections µ_k = π^X_# γ_k and ν_k = π^Y_# γ_k, setting η_k = ν_k if k is even and η_k = µ_k if k is odd yields (3) and the η_k-measurability of f_k immediately from Lemma 3.1. Since ν_{2i} vanishes outside Dom f_{2i}, from ν = Σ_{k=1}^∞ ν_k we derive ν_{2i} = (ν − Σ_{k≠2i} ν_k)|_{Dom f_{2i}}. For k even, ν_k vanishes outside Dom f_k ⊂ I_k, while for k odd, ν_k vanishes outside Ran f_k ⊂ I_{k−1}, which is disjoint from Dom f_{2i} unless k = 2i + 1. Thus ν_{2i} = (ν − ν_{2i+1})|_{Dom f_{2i}}, and the same argument applied to µ gives µ_{2i−1} = (µ − µ_{2i})|_{Dom f_{2i−1}}.
It remains to show the representation (3)-(4) specifies (γ_k, η_k) uniquely for all k ≥ 1, and hence determines γ = Σ_k γ_k uniquely. If the system has N < ∞ limbs, I_k = ∅ for k > N and hence γ_k = 0. We can compute η_k and γ_k starting with k = N, and then recursively from the formulae above for k = N − 1, N − 2, ..., 1, so the formulae represent γ uniquely. If instead S has countably many limbs, suppose there are two finite Borel measures γ and γ̄ vanishing outside of S and having the same marginals µ and ν. For each k ≥ 1, recall that the k-th limb K_k, namely Graph(f_k) for k odd and Antigraph(f_k) for k even, is measurable with respect to both γ and γ̄. Given ε > 0, take N large enough so that both γ and γ̄ assign mass less than ε to ∪_{k=N}^∞ K_k. Set γ_k = γ|_{K_k} and γ̄_k = γ̄|_{K_k} and denote their marginals by (µ_k, ν_k) = (π^X_# γ_k, π^Y_# γ_k) and (µ̄_k, ν̄_k) = (π^X_# γ̄_k, π^Y_# γ̄_k). Observe that both γ^ε := Σ_{k=1}^N γ_k and γ̄^ε := Σ_{k=1}^N γ̄_k are concentrated on the same numbered limb system; it has finitely many limbs, and the differences δµ = Σ_{k=1}^N (µ̄_k − µ_k) and δν = Σ_{k=1}^N (ν̄_k − ν_k) between the marginals of γ^ε and γ̄^ε have total variation at most 2ε. Since the δµ_{2i−1} = µ̄_{2i−1} − µ_{2i−1} are mutually singular, as are the δν_{2i} = ν̄_{2i} − ν_{2i}, we find the sum of the total variations of and summing on k yields ‖γ̄^ε − γ^ε‖_{TV(X×Y)} < 4ε. Since γ^ε → γ and γ̄^ε → γ̄ as ε → 0, we conclude γ̄ = γ to complete the uniqueness proof.
As in Hestir and Williams [58], the uniqueness theorem above implies extremality as an immediate consequence.
The following example confirms that a measurability gap still remains between the necessary and sufficient conditions for extremality. It is a close variation on the standard example of a non-Lebesgue measurable set from real analysis. Together with the lemma and theorem preceding, this example makes clear that measurability is required only to allow the graphs to be separated from each other and from the antigraphs in an additive way.
is an acyclic set, hence can be expressed as a numbered limb system according to Hestir and Williams [58].
On the other hand, there are doubly stochastic measures such as γ := (γ_0 + γ_1)/2 which vanish outside of S but which are manifestly not extremal.

Uniqueness of optimal transportation
In this section we apply the foregoing results to the uniqueness question for optimal transportation on manifolds, which arises when one wants to use a continuum of sources to supply a continuum of sinks (modeled by µ and ν respectively) as efficiently as possible.
Given subsets X and Y of complete separable metric spaces equipped with Borel probability measures, representing the distributions µ of production on X and ν of consumption on Y, the Kantorovich-Koopmans [60] [65] transportation problem is to find γ ∈ Γ(µ, ν) correlating production with consumption so as to minimize the expected transportation cost
inf_{γ ∈ Γ(µ,ν)} ∫_{X×Y} c(x, y) dγ(x, y) (5)
against some continuous function c ∈ C(X × Y). Hereafter we shall be solely concerned with the case in which X is a differentiable manifold, µ is absolutely continuous with respect to coordinates on X, and the cost function is differentiable with local control on the magnitude of its x-derivative d_x c(x, y) uniformly in y; for convenience we also suppose Y to be a differentiable manifold and c to be bounded, though this is not really necessary: substantially weaker assumptions also suffice [27]; cf. [54] [56] [48].
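When µ and ν each consist of n atoms of mass 1/n, the extreme points of Γ(µ, ν) correspond to permutations, so the minimization can be explored by brute force for tiny n. The sketch below is a toy illustration under that assumption; the function name `optimal_permutations` and the sample points are hypothetical, and enumeration of all n! permutations is only sensible for very small n.

```python
from itertools import permutations


def optimal_permutations(cost):
    """Return (minimal average cost, list of all minimizing permutations).

    cost[i][j] is the cost of sending source i to sink j; sources and
    sinks each carry mass 1/n, so only permutations need to be compared.
    """
    n = len(cost)
    best, argmins = None, []
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best:
            best, argmins = total, [perm]
        elif total == best:
            argmins.append(perm)
    return best / n, argmins


# Two sources and two sinks on the real line.
sources, sinks = [0, 1], [1, 2]
distance = [[abs(x - y) for y in sinks] for x in sources]
squared = [[(x - y) ** 2 for y in sinks] for x in sources]
```

For the distance cost both permutations attain the minimum, so every convex combination of them is also optimal and uniqueness fails; for the squared distance only the order-preserving assignment is optimal. This mirrors, in the smallest possible case, the contrast between c(x, y) = |x − y| and strictly convex costs.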
In this setting one immediately asks whether the infimum (5) is uniquely attained. Since attainment is evident, the question here is uniqueness. If c satisfies a twist condition, meaning x ∈ X ↦ c(x, y_1) − c(x, y_2) has no critical points for y_1 ≠ y_2 ∈ Y, then we shall see that not only is the minimizing γ unique, but its mass concentrates entirely on the graph of a single map. This was proved in comparable generality by Gangbo [52] and Levin [66] (see also Ma, Trudinger and Wang [72]), building on the more specific examples of strictly convex cost functions c(x, y) = h(x − y) in X = Y = R^n analyzed by Caffarelli [19], Gangbo [55], and Wang's for reflector antenna design [106], which involves given below should prove more interesting and accessible to a mathematical readership.

Theorem 5.1 (Uniqueness of optimal transport on manifolds)
Let X and Y be complete separable manifolds equipped with Borel probability measures µ on X and ν on Y. Let c ∈ C^1(X × Y) be a bounded cost function such that for each y_1 ≠ y_2 ∈ Y, the map
x ∈ X ↦ c(x, y_1) − c(x, y_2) (6)
has no critical points, save at most one global minimum and at most one global maximum. Assume d_x c(x, y) is locally bounded in x, uniformly in y ∈ Y.
If µ is absolutely continuous in each coordinate chart on X, then the minimum (5) is uniquely attained; moreover, the minimizer γ ∈ Γ(µ, ν) vanishes outside a numbered limb system having at most two limbs.
Proof. We first prove that there is a numbered limb system having at most two limbs, outside of which the mass of all minimizers γ vanishes. A detailed argument confirming the plausible fact that the graphs of these limbs are Borel subsets of X × Y will be given later. Uniqueness of γ then follows from Theorem 4.2.
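By linear programming duality (due to Kantorovich and Koopmans in this context), it is well-known [104] that there exist upper semi-continuous potentials q ∈ L^1(X, dµ) and r ∈ L^1(Y, dν) with
q(x) = inf_{y ∈ Y} [c(x, y) − r(y)] (7)
such that
min_{γ ∈ Γ(µ,ν)} ∫_{X×Y} c dγ = ∫_X q dµ + ∫_Y r dν. (8)
From (7) we see
c(x, y) − q(x) − r(y) ≥ 0 (9)
and let
Z := {(x, y) ∈ X × Y | c(x, y) − q(x) − r(y) = 0}
denote the set where the non-negative function c(x, y) − q(x) − r(y) vanishes.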
Lower semi-continuity of this function implies Z is a closed subset of X × Y .
Notice that (8) implies any minimizer γ ∈ Γ(µ, ν) vanishes outside the zero set Z ⊂ X × Y of the non-negative function appearing in (9). It remains to show this set Z is contained in a numbered limb system consisting of at most two limbs (apart from a µ ⊗ ν negligible set).
From (7), q is locally Lipschitz, since d_x c(x, y) is controlled locally in x, independently of y ∈ Y. Rademacher's theorem therefore combines with absolute continuity of µ to imply q is differentiable µ-almost everywhere; we can safely ignore any points in X where differentiability of q fails, since they form a µ-negligible set.
Notice uniqueness follows from Lemma 3.1 without further measurability assumptions.
In the present setting, however, we only know that x_0 must be a global minimum or global maximum of the function (6). Exchanging y_1 with y_2 if necessary yields for all x ∈ X, the second inequality being strict unless x = x_0, in which case both inequalities are saturated. Strictness of inequality (11) implies (x, y_2) ∉ Z unless x = x_0. In other words, (x, y_2) ∈ Z lies on the antigraph of a function f_2(y_2) = x_0 well-defined at y_2. There may or may not be a point y_0 ∈ Y different from y_1 such that for all x ∈ X. If such a point y_0 exists, then (x_0, y_1) ∈ Antigraph(f_2) as above. If no such y_0 exists, setting Since the range of f_1 is disjoint from the domain of f_2, this completes the proof that, up to γ-negligible sets, Z lies in a numbered limb system with at most two limbs, as desired.
Let us now prove Borel measurability of these limbs. To do this, we define the cross-difference ∆(x, y; x′, y′) := c(x, y) + c(x′, y′) − c(x, y′) − c(x′, y) as in McCann [76], which is a continuous function on (X × Y)^2, and notice that ∆ ≤ 0 on Z^2, i.e., any two (x, y) and (x′, y′) in Z satisfy c(x, y) + c(x′, y′) ≤ c(x, y′) + c(x′, y). This well-known fact [94] can be deduced by summing the inequalities q(x) + r(y′) ≤ c(x, y′) and q(x′) + r(y) ≤ c(x′, y) from (9), then using c(x, y) = q(x) + r(y) and c(x′, y′) = q(x′) + r(y′) on Z. Closedness of Z and σ-compactness of B = X × Y imply is Borel on X × Y, according to Lemma 5.2 below. Taking y_2 = y_1 implies for all x ∈ X and (x_1, y_2) ∈ Z. This definition is equivalent to saying that there is no y_0 satisfying (12) [76], and Plakhov [85].
Imagine the periodic interval X = Y = R/2πZ = [0, 2π[ to parameterize a town built on the boundary of a circular lake, and let probability measures µ and ν represent the distribution of students and available places in schools, respectively. Suppose the distribution of students is smooth and non-vanishing but peaks sharply at the northern end of the lake, and the distribution of schools is smooth, non-vanishing and sharply peaked at the southern end of the lake. If the cost of transporting a student residing at location θ ∈ [0, 2π[ to school at location φ ∈ [0, 2π[ is presumed to be given in terms of the angle commuted by c(θ, φ) = 1 − cos(θ − φ), the most effective pairing of students with places in schools is given by the measure in Γ(µ, ν) which attains the minimum:
min_{γ ∈ Γ(µ,ν)} ∫_{X×Y} [1 − cos(θ − φ)] dγ(θ, φ). (14)
According to results of Gangbo and McCann [55] which are generalized in Theorem 5.1, this minimizer is unique, and its support is contained in the union of the graphs of two maps t^± : X → Y. A schematic illustration is given in Figure 4, where the restriction of the support to the subsets marked by ± on the flat torus X × Y represents graph(t^+) and graph(t^−) respectively.
The dotted lines mark φ − θ = ±π/2, ±3π/2. The necessary positivity of γ[J X × J Y 1 ] > 0 in this picture may be explained by observing that although it is cost-effective for all students to attend a school where they live, this is incompatible with the concentration of students at the north end of the lake, and of schools at the south end. Once this imbalance is corrected by sending a sufficient number of northern students to southern schools by the map t^−, the remaining students can be assigned to a school near their home using the map t^+. Continuity of both of these maps is established in [55] and further quantified by McCann and Sosio [78], and McCann, Pass and Warren [80].
Periodicity of graphs on the flat torus can be used to represent the support as a numbered limb system in more than one way; see Figure 5, which exploits the fact that the support of γ in Figure 4 intersects X × J Y 2 in a graph and X × (Y − J Y 1 ) in an anti-graph.
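For the cost c(θ, φ) = 1 − cos(θ − φ) of this example, the critical-point count required by Theorem 5.1 can be verified by a short computation. The following worked derivation is an added illustration; it uses only the sum-to-product identity for cosines, and the symbols φ_1, φ_2 stand for two distinct school locations.

```latex
\begin{align*}
c(\theta,\varphi_1) - c(\theta,\varphi_2)
  &= \cos(\theta-\varphi_2) - \cos(\theta-\varphi_1)
   = -2\,\sin\!\Big(\theta - \tfrac{\varphi_1+\varphi_2}{2}\Big)\,
        \sin\!\Big(\tfrac{\varphi_1-\varphi_2}{2}\Big), \\
\frac{\partial}{\partial\theta}\big[c(\theta,\varphi_1) - c(\theta,\varphi_2)\big]
  &= -2\,\cos\!\Big(\theta - \tfrac{\varphi_1+\varphi_2}{2}\Big)\,
        \sin\!\Big(\tfrac{\varphi_1-\varphi_2}{2}\Big).
\end{align*}
```

When φ_1 ≠ φ_2 modulo 2π the factor sin((φ_1 − φ_2)/2) is non-zero, so the derivative vanishes exactly at the two points θ = (φ_1 + φ_2)/2 ± π/2 of the circle. At these points the difference takes the values ∓2 sin((φ_1 − φ_2)/2), which are its global minimum and global maximum, so there are no other critical points.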
Chiappori, McCann and Nesheim [27] called the uniqueness hypothesis limiting the number of critical points to at most one maximum and at most one minimum in (6) the subtwist condition. Although it is satisfied in the example above, it is an unfortunate fact that the subtwist condition cannot be satisfied by any smooth function c(θ, φ) on a product of manifolds X × Y with more complicated Morse structures than the sphere. It is an interesting open problem to find a criterion on a smooth cost c(θ, φ) on X = Y = R^2/Z^2 which guarantees uniqueness of the minimum (14) for all smooth densities µ and ν on the torus. Although we expect such costs to be generic, not a single example of such a cost is known to us. Hestir and Williams' criterion for extremality seems likely to remain relevant to such questions, and it is natural

Epilog
The connections of optimal transportation to geometry and curvature, sectional [ [104], and mean [63], have become abundantly clear in recent years. Connections to differential topology remained largely unsuspected. The results reviewed above highlight the delicacy of identifying the extremality of a doubly stochastic measure from its support, and the role played by critical points of the transportation cost (6) in guaranteeing the uniqueness of the extremal measure γ ∈ Γ(µ, ν) which solves a Kantorovich transportation problem (14) set on the ball or sphere X. When the sources µ are continuously distributed, the topology of the landscape X limits the support of γ to lie on a graph in the case of a ball, and a numbered limb system with two limbs in the case of a sphere.
This characterization is dimension independent. For landscapes with more complicated topology, not a single example of a cost function c ∈ C^1(X × Y) is known to guarantee uniqueness of the optimal measure for all continuous densities µ and ν, nor is anything known about the support of γ beyond its numbered limb system structure and the local rectifiability determined by the rank of the cost [80] [84].