On the geometry of geodesics in discrete optimal transport

We consider the space of probability measures on a discrete set X, endowed with a dynamical optimal transport metric. Given two probability measures supported in a subset Y ⊆ X, it is natural to ask whether they can be connected by a constant speed geodesic with support in Y at all times. Our main result answers this question affirmatively, under a suitable geometric condition on Y introduced in this paper. The proof relies on an extension result for subsolutions to discrete Hamilton–Jacobi equations, which is of independent interest.


Introduction
Optimal transport continues to be a very active field of research, both in mathematics and in applications. One of the central objects is the L^p-Kantorovich metric W_p, defined by

$$W_p(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu, \nu)} \Big( \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^p \, \mathrm{d}\gamma(x, y) \Big)^{1/p},$$

where μ and ν are Borel probability measures on the metric space (X , d), and Γ(μ, ν) is the set of all couplings of μ and ν. The metric W_2 plays a special role in the theory, as it is the crucial object in the gradient flow formulation of dissipative PDE (starting from [11,20]) and in the synthetic theory of Ricci curvature [14,22], which builds on McCann's discovery that several important functionals enjoy convexity properties along W_2-geodesics [16].
In spite of the robustness of the optimal transport theory, it is well known that if the underlying space is discrete, W 2 has several undesirable properties that hamper its usefulness. In particular, if X is discrete, the metric space (P(X ), W 2 ) does not contain any non-trivial geodesics.
To circumvent this problem, several authors introduced discrete dynamical transport metrics W, based on discrete versions of the Benamou-Brenier formulation of optimal transport [2,15,17]. These metrics have been intensively studied in recent years; in particular, gradient flow formulations have been obtained for nonlinear evolution equations [6,19], and a discrete theory of Ricci curvature has been developed based on geodesic convexity of entropy functionals along discrete optimal transport [5,18]. Such Ricci curvature bounds have subsequently been obtained in various discrete probabilistic models [4,7,8].
In spite of the relevance of the notion of geodesic convexity, geometric properties of W-geodesics are currently poorly understood. The aim of this paper is to obtain results on such geometric properties. We focus on the issue of locality of geodesics in the space of probability measures.
More precisely, let (X , d) be a metric space, and consider a geodesic metric D on (a subset of) the space of Borel probability measures P(X ). We say that a subset Y ⊆ X has the weak locality property if any pair of probability measures μ 0 , μ 1 ∈ P(X ) supported in Y can be connected by a geodesic that is supported in Y at all times. The notion of strong locality is defined by requiring this property to hold for any geodesic connecting μ 0 and μ 1 . If any pair of measures can be connected by a unique geodesic, the notions of weak and strong locality coincide, but this property is currently unknown for discrete dynamical transport metrics.
If (X , d) is a geodesic metric space, and D is the Kantorovich metric W p for some 1 ≤ p < ∞, it is well known that a subset Y has the weak (resp. strong) locality property if and only if it is weakly (resp. strongly) geodesically convex. This follows from the fact that geodesics in (P p (X ), W p ) are supported on geodesics in (X , d); cf. [12] for a precise formulation of this result in a general setting.
Interestingly, the issue of locality in the discrete setting [with a discrete dynamical transport metric W on P(X ) instead of W p ] turns out to be much more delicate. For example, if one considers the complete graph on a three-point set K 3 , then any geodesic connecting two Dirac masses transports a nontrivial part of the mass via the third point. Hence, two-point subsets of K 3 do not have the locality property. This is shown in Sect. 6 of this paper.
Based on this observation one may conjecture that any nontrivial W-geodesic has support on the whole graph. However, we show that this is not the case. In fact, the main contribution of this paper is the introduction of a geometric condition for subsets Y ⊆ X (the retraction property), that is shown to be sufficient for locality; see Theorem 4.11. The retraction property is easy to check in concrete examples, as is shown in Sect. 4.
As an application of our main result, we show that if X is any subset of the grid Z^d with the usual graph structure, and Y ⊆ X is a hyperrectangle, then any pair of measures supported in Y can be connected by a geodesic supported in Y. In particular, this property holds for measures supported on subsets of lines, or of k-dimensional hyperplanes with k < d. Let us also mention that discrete Ricci curvature bounds in the sense of [5,18] are inherited by subsets with the retraction property; see Corollary 4.12.
A key ingredient in the proof of our main result is a duality result for the discrete transport metric W, which was recently obtained by Gangbo, Li, and Mou (under slightly more restrictive conditions on the transition rates) [9]. We interpret this result (Theorem 3.4 below) in terms of subsolutions of a discrete Hamilton-Jacobi equation and present a different proof based on Fenchel-Rockafellar duality. We then show that subsolutions of the Hamilton-Jacobi equation on a subset Y ⊆ X can be extended to the full space X , provided that Y has the retraction property; cf. Theorem 4.10. Our main theorem is then a straightforward consequence of this result.
Structure of the paper
In Sect. 2 we collect the necessary preliminaries on discrete transport metrics. Section 3 contains the dual formulation of the transport problem in terms of Hamilton-Jacobi subsolutions. In Sect. 4 we introduce the retraction property, we show the extension result for subsolutions to the Hamilton-Jacobi equation (Theorem 4.10), and we prove the main result on weak locality of subsets with the retraction property (Theorem 4.11). In Sect. 5 we show that the strong locality property holds for Markov chains with "dead ends". Finally, it is shown in Sect. 6 that geodesics between Dirac measures on the triangle have full support.

The discrete transport distance
In this section we briefly recall the definition and basic properties of the discrete transport distance constructed in [2,15,17].
Let X be a finite set, and let Q : X × X → R_+ denote the transition rates for a Markov chain on X . Without loss of generality, we use the convention that Q(x, x) = 0 for all x ∈ X . The corresponding generator L acts on functions φ : X → R by

$$(L\varphi)(x) = \sum_{y \in \mathcal{X}} Q(x, y)\big(\varphi(y) - \varphi(x)\big).$$

We assume that Q is irreducible, i.e., each pair (x, y) ∈ X × X can be connected, for some n ∈ N, by a path {x_i}_{i=0}^n satisfying x_0 = x, x_n = y, and Q(x_{i−1}, x_i) > 0 for i = 1, . . . , n. This assumption implies the existence of a unique stationary probability measure π on X . Moreover, π is strictly positive. We will furthermore assume that Q is reversible with respect to π, i.e., the detailed balance condition holds:

$$\pi(x)\, Q(x, y) = \pi(y)\, Q(y, x) \quad \text{for all } x, y \in \mathcal{X}.$$

The triple (X , Q, π) will be referred to as a Markov triple.
A Markov chain induces a graph on the vertex set X , whose edge set is given by E = {(x, y) ∈ X × X : Q(x, y) > 0}. We write x ∼ y iff Q(x, y) > 0. The assumption that Q is irreducible corresponds to the graph (X , E) being connected. The detailed balance condition implies that the graph is undirected.
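To make the notion of a Markov triple concrete, the following is a minimal sketch in plain Python, under the assumption that the rates are stored as a dictionary of dictionaries. The function names (`stationary_from_detailed_balance`, `is_reversible`, `generator`) and the three-state example chain are illustrative choices and do not appear in the paper. The stationary measure is obtained by propagating the detailed balance relation along a breadth-first search, which only yields a valid answer when the chain is in fact reversible; this is why reversibility is checked separately afterwards.

```python
from collections import deque


def stationary_from_detailed_balance(Q):
    """Return pi with pi(x) Q(x,y) = pi(y) Q(y,x), assuming Q is reversible.

    Q is a dict of dicts: Q[x][y] is the rate from x to y (absent means 0).
    Raises ValueError if the induced graph is not connected or not undirected.
    """
    states = list(Q)
    start = states[0]
    weight = {start: 1.0}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        for y, rate in Q[x].items():
            if rate > 0 and y not in weight:
                back = Q[y].get(x, 0.0)
                if back <= 0:
                    raise ValueError("edge without reverse edge: not reversible")
                # Detailed balance along the edge (x, y).
                weight[y] = weight[x] * rate / back
                queue.append(y)
    if len(weight) != len(states):
        raise ValueError("Q is not irreducible")
    total = sum(weight.values())
    return {x: w / total for x, w in weight.items()}


def is_reversible(Q, pi, tol=1e-12):
    """Check the detailed balance condition on every ordered pair."""
    return all(
        abs(pi[x] * Q[x].get(y, 0.0) - pi[y] * Q[y].get(x, 0.0)) <= tol
        for x in Q
        for y in Q
    )


def generator(Q, phi):
    """Apply (L phi)(x) = sum_y Q(x,y) (phi(y) - phi(x))."""
    return {x: sum(r * (phi[y] - phi[x]) for y, r in Q[x].items()) for x in Q}


# Hypothetical birth-death chain on three states (illustration only).
Q_example = {0: {1: 2.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 3.0}}
pi_example = stationary_from_detailed_balance(Q_example)
```

For this example chain the edge set is {(0, 1), (1, 0), (1, 2), (2, 1)}, so the induced graph is a path on three vertices. Stationarity can be checked through the identity Σ_x π(x)(Lφ)(x) = 0 for an arbitrary φ.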
In order to define the discrete transport distance on the set P(X ) of probability measures on X , we introduce the following objects.
Of particular interest to us is the logarithmic mean given by

$$\theta_{\log}(s, t) = \frac{s - t}{\log s - \log t},$$

since it arises in the entropic gradient flow structure for the master equation ∂_t μ = L^* μ.
Other relevant examples of admissible means are the harmonic mean θ_har(s, t) = 2st/(s + t), the geometric mean θ_geo(s, t) = √(st), and the arithmetic mean θ_ari(s, t) = (s + t)/2. Some of these means arise in gradient structures for porous medium equations; cf. [6]. From now on, we will fix an admissible mean θ.
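The four means listed above can be compared numerically. The sketch below is an illustration only: the function names are hypothetical, and the boundary conventions (value 0 when an argument vanishes for the logarithmic and harmonic means) are assumptions made here by taking limits, not statements from the paper. It checks two properties that the later proofs invoke, homogeneity and monotonicity, together with the classical ordering harmonic ≤ geometric ≤ logarithmic ≤ arithmetic.

```python
import math


def theta_log(s, t):
    """Logarithmic mean (s - t) / (log s - log t), extended by its limits."""
    if s <= 0 or t <= 0:
        return 0.0  # assumed boundary value (limit as one argument tends to 0)
    if s == t:
        return s  # limit on the diagonal
    return (s - t) / (math.log(s) - math.log(t))


def theta_har(s, t):
    """Harmonic mean 2st / (s + t)."""
    return 0.0 if s + t == 0 else 2.0 * s * t / (s + t)


def theta_geo(s, t):
    """Geometric mean sqrt(st)."""
    return math.sqrt(s * t)


def theta_ari(s, t):
    """Arithmetic mean (s + t) / 2."""
    return (s + t) / 2.0


MEANS = (theta_har, theta_geo, theta_log, theta_ari)


def is_homogeneous(theta, s, t, lam):
    """Check theta(lam s, lam t) = lam theta(s, t) at one sample point."""
    return math.isclose(theta(lam * s, lam * t), lam * theta(s, t), rel_tol=1e-12)


def ordered_values(s, t):
    """Values of the four means at (s, t), in the order of MEANS."""
    return [theta(s, t) for theta in MEANS]
```

A call such as `ordered_values(1.0, 4.0)` returns a nondecreasing list, which reflects the ordering of the means at that sample point.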
The action functional for the discrete transport distance is defined using the convex and lower semicontinuous function A : R 3 → [0, ∞] given by For μ ∈ P(X ) and V : X × X → R we define the action by For brevity we sometimes write

Remark 2.4
Without loss of generality we may assume in the minimisation (2.3) that V is anti-symmetric, i.e., V t (x, y) = −V t (y, x). In fact, for each U ∈ R, the quantity |V (x, y) Finally, let us introduce some convenient notation to be used in the sequel. We denote the Euclidean inner products on R X and R X ×X by The discrete gradient of a function φ ∈ R X will be denoted by ∇φ(x, y) = φ(y) − φ(x), and the discrete divergence of ∈ R X ×X is given by Furthermore, for μ ∈ P(X ) and ∈ R X ×X we write where the multiplication of ·μ is understood componentwise. For all , V ∈ R X ×X and μ ∈ P(X ), Young's inequality yields

Duality for discrete optimal transport
We present a dual formulation for the discrete transport distance which can be seen as a discrete analogue of the Kantorovich duality. This result has recently been proved in [9] using different methods; cf. Proposition 3.10 and Theorems 5.10 and 7.4 in that paper. Note that the result in [9] is stated under slightly stronger assumptions on the transition rates. In our notation, it is assumed there that Q(x, y) = Q(y, x) and π is constant. The slightly greater generality here does not cause additional difficulties.
The collection of all Hamilton-Jacobi subsolutions is denoted by HJ^T_X .

Remark 3.3
Informally, (3.1) may be seen as a one-sided discrete version of the Hamilton-Jacobi equation ∂_t φ + (1/2)|∇φ|^2 = 0. Note however that the dependence on μ in (3.1) is nonlinear, which prevents us from formulating the inequality pointwise in terms of φ only. This is a crucial difference between the discrete and the continuous setting, and a source of several difficulties.
In the continuous setting, a full treatment of Hamilton-Jacobi equations relies on the theory of viscosity solutions [3], but this concept will not play any role in our discrete setting. Let us also mention that Hamilton-Jacobi equations have been studied in the setting of metric length spaces [10,13] as well as on graphs [21]. Our discrete notion of Hamilton-Jacobi subsolution is different from the one studied in [21].

This representation remains true if the supremum is restricted to the class of functions
Let us first give a heuristic argument for the duality result above. We start by introducing a Lagrange multiplier for the continuity equation constraint and write where the supremum is taken over all (sufficiently smooth) functions φ : [0, 1] → R X and the infimum is taken over all (sufficiently smooth) curves μ : [0, 1] → R + connecting μ 0 and μ 1 , and over all V : [0, 1] → R X ×X . Here we do not require that (μ, V ) satisfies the continuity equation, but the inner supremum takes the value +∞ if (μ, V ) does not belong to CE 1 (μ 0 , μ 1 ). We also do not require that μ takes values in P(X ), but this is automatically enforced by the continuity equation.
Integrating by parts and using the min-max principle we obtain As the quantity to be minimised is positively A simple localisation argument in t shows that φ ∈ H iff for all t ∈ [0, 1] and (μ, We may write Minimising over V we conclude that φ ∈ H iff the inequality holds for all μ ∈ R X + and t ∈ [0, 1], which means that φ ∈ HJ X . We present a proof of Theorem 3.4 using the Fenchel-Rockafellar duality theorem; see, e.g., [23,Theorem 1.9]. Recall that given a normed vector space E with topological dual space E * and a proper convex function F :

Proof of Theorem 3.4
Let us first note that, by the convexity of the constraint (3.1), any φ ∈ HJ 1 X can be approximated uniformly by C 1 functions satisfying (3.1) by convolution (after scaling the function to a slightly larger interval [− δ, 1 + δ] via Remark 3.2). Therefore, the final part of the theorem follows.
To show the dual representation with C 1 functions, we will apply Theorem 3.5 in the following situation. Let E be the Banach space where the duality pairing between (φ, ) ∈ E and (b, σ, V ) ∈ E * is given by keeping in mind that σ is a vector-valued measure.
Define the functionals F, G : Here we say that a pair (φ, ) ∈ E belongs to D if for all continuous curves t → η t ∈ R X + we have It is readily checked that F and G are convex. Moreover, settingφ(t) = t(− 1, . . . , − 1) and¯ ≡ 0, both F and G are finite at (φ,¯ ) and G is continuous at (φ,¯ ). Note that for φ ∈ C 1 [0, 1], R X we have (φ, ∇φ) ∈ D if and only if φ ∈ HJ 1 X which follows from a simple localisation argument in t. Hence, the supremum in the left-hand side of (3.4) coincides with the supremum in the right-hand side of (3.2).
We will calculate the Legendre-Fenchel transforms of F and G. For F we obtain Thus, by homogeneity of the last expression in φ, one has F * (b, σ, ν) = +∞ unless (σ, V ) satisfies the continuity equation ∂ t σ + ∇ · V = 0 with boundary values − (μ 0 − b) and − μ 1 , in the sense that In particular, the distributional derivative of σ belongs to L 2 ([0, 1]; R X ). Since the antiderivative of a distribution is unique up to a constant, the fundamental theorem of Lebesgue calculus implies that σ has the form dσ (t) = σ t dt for some curve (σ t ) t ∈ H 1 ([0, 1]; R X ). Moreover, (3.5) implies σ 0 = −(μ 0 −b) and σ 1 = −μ 1 . Thus, we obtain where CE is defined by dropping the positivity and normalisation condition (i) in the definition of CE, and we have identified the measure σ with the H 1 -map σ t .
Let us assume that b = 0 and 1 0 A(σ t , V t ) dt < ∞. Then we obtain where the first inequality follows from the definition of D and the second from (2.4). It remains to show that we have in fact equality. First we consider a convolution in time yielding smooth pairs σ ε t , V ε t converging to σ t , V t as ε → 0. Then we set for δ > 0, σ δ,ε t = σ ε t + δπ. By convexity of the action and monotonicity of the mean we have The convexity and lower semicontinuity of A further implies the lower semicontinuity of the action; see [ where σ δ,ε = ρ δ,ε π. We claim that (φ δ,ε , δ,ε ) ∈ D. To see this, we use the inequality which is an identity for s = v, t = u, see [5,Lemma 2.2]. From this we infer that for any μ =ρπ ∈ P(X ) we have which proves the claim. Note that forρ = ρ δ,ε t we obtain equality. Next we claim that To prove this, we compare the left-hand side and the second line in (3.10). The limit ε → 0 is justified by dominated convergence, since (3.10) yields the majorant where C depends on Q and π. The right-hand side converges as ε → 0 by (3.9). The limit δ → 0 is justified by monotone convergence. Similarly, we have Here, we can use the estimate |ab| ≤ 1 2 a 2 + 1 2 b 2 to obtain a majorant that converges by (3.9) as before. Thus the expression in the first braced bracket of (3.8) converges to the right-hand side of (3.8) with this choice of (φ δ,ε , δ,ε ) as δ, ε → 0. A similar argument Combining (3.6), (3.7) and the fact that A(σ, V ) = +∞ unless σ ∈ R X + , we obtain otherwise.
Thus the infimum in the right-hand side of (3.4) coincides with 1 2 W(μ 0 , μ 1 ) 2 . An application of Theorem 3.5 concludes the proof.

Locality of optimal curves
In this section we investigate locality properties for discrete transport geodesics. More precisely, we study the following question: Given two probability measures supported in a subset Y of a state space X , is there an optimal curve connecting them that is supported in Y? The crucial tool to analyse this question is the dual characterisation of the transport problem given in the previous section. We prove two types of results.
Firstly, we show that the question can be answered affirmatively, under a simple condition (the retraction property of the subgraph Y), which will be introduced below. This property ensures that any competitor in the dual problem on the subgraph can be extended to a competitor on the full graph. We present several examples where this property is satisfied. Later, in Sect. 6, we will show that locality may fail if the retraction property is not satisfied.
We start by introducing the retraction property and we give several examples. To increase readability, we often write subscripts instead of parentheses, e.g., Q xy = Q(x, y).
A subset Y ⊆ X is said to be connected if any two distinct points y, y′ ∈ Y can be connected by a path {y_i}_{i=0}^n ⊆ Y satisfying y_0 = y, y_n = y′, and Q(y_{i−1}, y_i) > 0 for i = 1, . . . , n.

Remark 4.2
If the Markov triple (X , Q, π) corresponds to a simple random walk (i.e., Q(x, y) ∈ {0, 1} for all x, y ∈ X ), the retraction property can be rephrased in graph theoretical terms. Indeed, it is readily verified that the retraction property holds if and only if there exists a map T : X → Y with the following properties:

Definition 4.3 (Restriction) The restriction of a Markov triple
Connectedness of Y implies that the Markov triple (Y, Q|_Y , π|_Y ) is irreducible, and the detailed balance relation is obviously inherited. The following result implies that if Y has the retraction property as a subset of X , it also has this property as a subset of any set X′ with Y ⊆ X′ ⊆ X .

Lemma 4.4
Let (X , Q, π) be a Markov triple and Y ⊆ X′ ⊆ X . If T : X → Y is a retraction, then its restriction T|_{X′} : X′ → Y is a retraction as well.
Proof This follows immediately from the definition.
We present some examples of sets with the retraction property.

Example 4.5 (Cycle)
For n ≥ 2, let X = Z/nZ, and set Q j, j+1 = Q j+1, j = 1 and Q i j = 0 otherwise. All computations are to be understood modulo n. We claim that the subset {1, . . . , k} of X has the retraction property if and only if 2k ≤ n. In this case, a retraction is given as follows (cf. Fig. 1): Indeed, to check sufficiency, note that (R1 ) is trivial, (R2 ) holds since n ≥ 2k, and (R3 ) is readily checked as well. Necessity follows from a simple argument.
is a hyperrectangle, and let X be a connected subgraph of Z d containing Y. We claim that Y has the retraction property. Indeed, it is readily checked that a retraction from X to Y can be obtained by mapping x ∈ X to the point in Y that is closest to x with respect to the Euclidean distance.
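The closest-point map onto a hyperrectangle is simply a coordinatewise clamp, which the following sketch implements for the special case where X is a full cubic grid {0, ..., side−1}^d. That special case is an assumption made here for illustration. Since conditions (R1)–(R3) are not restated in this passage, the check below only verifies two consequences one would expect of a retraction: the map fixes every point of Y, and it sends neighbouring grid points either to the same point or to neighbouring points. The names `clamp_to_box`, `grid_neighbours` and `check_box_retraction` are hypothetical.

```python
from itertools import product


def clamp_to_box(x, lower, upper):
    """Euclidean closest point to x in the box prod_i [lower[i], upper[i]]."""
    return tuple(min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper))


def grid_neighbours(x, side):
    """Nearest neighbours of x in the grid {0, ..., side-1}^d."""
    for i in range(len(x)):
        for step in (-1, 1):
            yi = x[i] + step
            if 0 <= yi < side:
                yield x[:i] + (yi,) + x[i + 1:]


def check_box_retraction(side, lower, upper):
    """Sanity checks for the clamp map (not the full retraction property)."""
    dim = len(lower)
    for x in product(range(side), repeat=dim):
        tx = clamp_to_box(x, lower, upper)
        inside = all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))
        if inside and tx != x:
            return False  # the map must be the identity on the box
        for y in grid_neighbours(x, side):
            ty = clamp_to_box(y, lower, upper)
            # Images of adjacent points are equal or adjacent.
            if sum(abs(a - b) for a, b in zip(tx, ty)) > 1:
                return False
    return True
```

For instance, with the box [1, 3] × [2, 4] inside a 6 × 6 grid, the corner point (5, 0) is sent to (3, 2).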

Example 4.7 (2-Point space)
Assume that Q takes values in {0, 1} and let x, y ∈ X with Q xy = 1. A disjoint decomposition X = A x ∪ A y with x ∈ A x and y ∈ A y is called an x-y cut. An edge (u, v) ∈ E is a cross if u ∈ A x and v ∈ A y . The subset {x, y} has the retraction property if and only if there exists an x-y cut such that no distinct crosses share a point. The correspondence between x-y cuts with this property and retractions is given by (Fig. 2).
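The cut criterion of this example can be tested by brute force on small graphs. The sketch below enumerates all x-y cuts and looks for one in which no two distinct crosses share an endpoint; it assumes Q takes values in {0, 1}, so that the chain is described by an undirected edge list, and the names `crosses` and `find_good_cut` are illustrative. The search is exponential in the number of vertices and is meant only for toy examples.

```python
from itertools import combinations


def crosses(edges, a_x, a_y):
    """Crossing edges, each oriented from A_x to A_y."""
    found = set()
    for u, v in edges:
        if u in a_x and v in a_y:
            found.add((u, v))
        elif v in a_x and u in a_y:
            found.add((v, u))
    return sorted(found)


def find_good_cut(vertices, edges, x, y):
    """Return an x-y cut whose crosses are pairwise disjoint, or None."""
    others = [v for v in vertices if v not in (x, y)]
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            a_x = {x, *extra}
            a_y = set(vertices) - a_x
            cr = crosses(edges, a_x, a_y)
            tails = [u for u, _ in cr]
            heads = [v for _, v in cr]
            # Distinct crosses share no point iff tails and heads are distinct.
            if len(set(tails)) == len(tails) and len(set(heads)) == len(heads):
                return a_x, a_y
    return None


triangle = [(0, 1), (1, 2), (2, 0)]
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
```

On the triangle no admissible cut exists for the pair {0, 1}, in line with the nonlocality result of Sect. 6, whereas on the 4-cycle the cut {0, 3} ∪ {1, 2} works, in line with the condition 2k ≤ n of Example 4.5.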

Example 4.8 (Honeycomb lattice)
Let (X , E) be a connected subgraph of the honeycomb lattice and define transition rates by setting Q_xy = 1 if (x, y) ∈ E and zero otherwise. Then each fundamental cell Y = {y_1, . . . , y_6} (see Fig. 3) has the retraction property. Indeed, to obtain a retraction of X onto Y, we partition the plane into 6 sectors separated by rays that originate at the centre of Y and intersect the midpoints of the sides of Y orthogonally. A retraction is then obtained by mapping each x ∈ X to the unique y ∈ Y that belongs to the same sector (cf. Fig. 3).

Example 4.9 (Trees)
Assume that the graph (X , E) is a tree, i.e., it does not contain a cycle. Every subtree Y of X has the retraction property, and a retraction can be constructed as follows: Fix a vertex y ∈ Y. Since X is a tree, for every x ∈ X there is a unique path γ without self-intersections connecting x and y. The map assigning to x the first point where the path γ meets Y is a retraction of X onto Y.
Note that the retraction property depends only on the graph (X , E) and not on the choice of the transition rates Q (as long as they give rise to the same graph).

Theorem 4.10
Let (X , Q, π) be a Markov triple, and let Y be a connected subset of X . If Y has the retraction property, then every Hamilton-Jacobi subsolution on Y can be extended to a Hamilton-Jacobi subsolution on X .
Proof Let φ be a Hamilton-Jacobi subsolution on Y, and let T be a retraction of X onto Y. Defineφ : X → R byφ := φ • T , so thatφ| Y = φ by (R1). We will show that for anȳ ν ∈ P(X ), there exists ν ∈ P(Y) such that for a.e. t. To improve readability, we omit the subscript t. As φ ∈ HJ Y , the right-hand side of (4.1) is nonpositive, so this suffices to prove the theorem. Forν ∈ P(X ) define ν ∈ P(Y) by ν := T #ν . Clearly, It thus remains to show that ∇φ t ν ≤ ∇φ t ν . Splitting the sum we obtain The concavity and homogeneity of imply Given x ∈ X and y, y ∈ Y with y = y and T (x) = y, the retraction property (R2) implies that x ∈T −1 (y ) Q x x ≤ Q yy (and the same holds with primed and unprimed variables interchanged). Hence the monotonicity of yields x , Q y y x ∈T −1 (y )ν x = (ν y Q yy , ν y Q y y ).
Combining these inequalities, we infer that which completes the proof.
The following result shows that any pair of measures supported in a set Y with the retraction property can be connected by a geodesic supported in Y.

Theorem 4.11 (Weak locality under the retraction property)
Let (X , Q, π) be a Markov triple, and let Y be a subset of X with the retraction property. For all μ_0, μ_1 ∈ P(X ) with support in Y there exists a minimising W-geodesic (μ_t)_{t∈[0,1]} ⊆ P(X ) connecting μ_0 and μ_1 such that μ_t has support in Y for all t ∈ [0, 1].
In fact, we will show that any W Y -geodesic (μ t ) t ⊆ P(Y) is also a W X -geodesic when regarded as a curve in P(X ).
Proof Let (μ t ) t be a minimising geodesic in P(Y) satisfying the continuity Eq. (2.1) with momentum vector field (V t ) t . Consider the extension to X defined byμ Theorem 3.4 (applied in P(Y)) implies that there exists a Hamilton-Jacobi subsolution φ ∈ HJ Y such that By Theorem 4.10, φ can be extended to a Hamilton-Jacobi subsolutionφ ∈ HJ X . In particular, using Theorem 3.4 once more (this time in P(X )), Since ε > 0 is arbitrary, it follows that , which yields the result.
It follows from the previous result that Ricci curvature bounds in the sense of [5,18] are inherited by subsets with the retraction property. We recall that a Markov triple (X , Q, π) is said to have Ricci curvature bounded from below by κ ∈ R if for any μ_0, μ_1 ∈ P(X ), and for some (equivalently, for any) W-geodesic (μ_t) connecting μ_0 and μ_1, the relative entropy μ ↦ Ent_π(μ) := Σ_{x∈X} μ(x) log(μ(x)/π(x)) satisfies the following κ-convexity inequality, for any 0 ≤ t ≤ 1:

$$\mathrm{Ent}_\pi(\mu_t) \le (1 - t)\, \mathrm{Ent}_\pi(\mu_0) + t\, \mathrm{Ent}_\pi(\mu_1) - \frac{\kappa}{2}\, t (1 - t)\, \mathcal{W}(\mu_0, \mu_1)^2.$$

In this case we write Ric(X , Q, π) ≥ κ; cf. [5,18] for further details.
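As a small illustration of the relative entropy functional appearing in this definition, the following sketch evaluates Ent_π(μ) for measures stored as dictionaries, using the usual convention 0 log 0 = 0. The function name `relative_entropy` and the uniform three-point example are assumptions made for illustration. The sketch does not compute W-geodesics and therefore does not verify any curvature bound.

```python
import math


def relative_entropy(mu, pi):
    """Ent_pi(mu) = sum_x mu(x) log(mu(x) / pi(x)), with 0 log 0 = 0."""
    total = 0.0
    for x, mass in mu.items():
        if mass > 0:
            total += mass * math.log(mass / pi[x])
    return total


# Hypothetical example: uniform reference measure on three points.
pi_uniform = {"a": 1.0 / 3.0, "b": 1.0 / 3.0, "c": 1.0 / 3.0}
dirac_a = {"a": 1.0, "b": 0.0, "c": 0.0}
```

The entropy vanishes at μ = π and equals log 3 for a Dirac mass in this example, which is the largest value it attains on the three-point space with uniform π.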

Corollary 4.12
Let (X , Q, π) be a Markov triple, and let Y be a subset of X with the retraction property. If Ric(X , Q, π) ≥ κ for some κ ∈ R, then Ric(Y, Q| Y , π| Y ) ≥ κ as well.

Optimal transport avoids dead ends
In this section we prove the intuitively natural statement that optimal curves do not transport mass into "dead ends". We formalise this concept by considering the gluing of two Markov triples along a vertex.
Definition 5.1 (Gluing of Markov triples) Let (X_1, Q_1, π_1) and (X_2, Q_2, π_2) be Markov triples, and fix x_1 ∈ X_1, x_2 ∈ X_2. The gluing of the two triples at x_1, x_2 is the Markov triple (X , Q, π) defined by setting For brevity, let us write X′_i := X_i \{x_i}. We have canonical injections X 1 → X , X 2 → X , and we identify elements of (X 1 X 2 )\{x 1 , x 2 } with their respective images. We define transition rates Q : X × X → R by It is easy to see that Q is irreducible and reversible, and the unique invariant probability measure is given by

Definition 5.2 (Dead end) Let (X , Q, π) be a Markov triple, and let X_1, X_2 ⊆ X . We say that X_2 is a dead end for X_1 (and vice versa) if the intersection of X_1 and X_2 contains exactly one point (denoted " * "), and moreover, Q(x, y) = Q(y, x) = 0 whenever x ∈ X′_1 and y ∈ X′_2. Here, we write X′_i = X_i \{ * }.

Remark 5.3
The notions of dead end and gluing of Markov triples are compatible in the following sense: Let (X , Q, π) be a Markov triple, and suppose that X 2 ⊆ X is a dead end for X 1 ⊆ X with intersection point * . Then one recovers (X , Q, π) by gluing together the restrictions of X to X 1 and X 2 at * .

Proposition 5.4
Let (X 1 , Q 1 , π 1 ) and (X 2 , Q 2 , π 2 ) be Markov triples, and let (X , Q, π) be the Markov triple obtained by gluing the triples at x 1 ∈ X 1 and x 2 ∈ X 2 . Then X 1 and X 2 have the retraction property as subsets of X .
Proof Define T : X → X 1 by T (x) = x for x ∈ X 1 and T (x) = * for x ∈ X 2 . One verifies that T indeed defines a retraction by distinguishing cases.
In view of Theorem 4.11, the previous result implies that any two measures μ 0 , μ 1 supported in (the image of) X 1 can be connected by a geodesic that is supported in X 1 for all times; i.e., weak locality holds. We will now show that in fact strong locality holds: any geodesic connecting μ 0 and μ 1 has to be supported in X 1 .
This strict inequality contradicts the fact that (μ t ) t∈[0,1] is a geodesic.

Nonlocality of optimal transport on the triangle
Consider a Markov triple (X , Q, π) and a connected subset Y ⊆ X . In this section we show that locality of geodesics in P(Y) may fail if Y does not have the retraction property. We consider the simplest possible setting, where (X , Q, π) corresponds to simple random walk on a triangle, and Y ⊆ X is a two-point set. We show that the canonical lift of a geodesic between Dirac measures on the two-point space is not an optimal curve in P(X ), by constructing a competitor that transports mass along all edges. Throughout this section we make the following additional assumption on the mean θ. If θ(0, t) > 0 for t > 0, then (6.1) also holds for s = 0.
Clearly, this assumption is satisfied for the arithmetic, geometric, and logarithmic means, but not for the harmonic mean.