Abstract
We consider the space of probability measures on a discrete set \(\mathcal {X}\), endowed with a dynamical optimal transport metric. Given two probability measures supported in a subset \(\mathcal {Y}\subseteq \mathcal {X}\), it is natural to ask whether they can be connected by a constant speed geodesic with support in \(\mathcal {Y}\) at all times. Our main result answers this question affirmatively, under a suitable geometric condition on \(\mathcal {Y}\) introduced in this paper. The proof relies on an extension result for subsolutions to discrete Hamilton–Jacobi equations, which is of independent interest.
1 Introduction
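Optimal transport continues to be a very active field of research, both in mathematics and in applications. One of the central objects is the \(L^p\)-Kantorovich metric \(W_p\), defined by
$$\begin{aligned} W_p(\mu ,\nu ) = \inf _{\gamma \in \Pi (\mu ,\nu )} \left( \int _{\mathcal {X}\times \mathcal {X}} d(x,y)^p \; \mathrm {d}\gamma (x,y) \right) ^{1/p}, \end{aligned}$$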
where \(\mu \) and \(\nu \) are Borel probability measures on the metric space \((\mathcal {X},d)\), and \(\Pi (\mu ,\nu )\) is the set of all couplings of \(\mu \) and \(\nu \).
The metric \(W_2\) plays a special role in the theory, as it is the crucial object in the gradient flow formulation of dissipative PDE (starting from [11, 20]) and in the synthetic theory of Ricci curvature [14, 22], which builds on McCann’s discovery that several important functionals enjoy convexity properties along \(W_2\)-geodesics [16].
In spite of the robustness of the optimal transport theory, it is well known that if the underlying space is discrete, \(W_2\) has several undesirable properties that hamper its usefulness. In particular, if \(\mathcal {X}\) is discrete, the metric space \((\mathcal {P}(\mathcal {X}), W_2)\) does not contain any nontrivial geodesics.
To circumvent this problem, several authors introduced discrete dynamical transport metrics \(\mathcal {W}\), based on discrete versions of the Benamou–Brenier formulation of optimal transport [2, 15, 17]. These metrics have been intensively studied in recent years; in particular, gradient flow formulations have been obtained for nonlinear evolution equations [6, 19], and a discrete theory of Ricci curvature has been developed based on geodesic convexity of entropy functionals along discrete optimal transport [5, 18]. Such Ricci curvature bounds have subsequently been obtained in various discrete probabilistic models [4, 7, 8].
In spite of the relevance of the notion of geodesic convexity, geometric properties of \(\mathcal {W}\)-geodesics are currently poorly understood. The aim of this paper is to obtain results of this type. We focus on the issue of locality of geodesics in the space of probability measures.
More precisely, let \((\mathcal {X}, \mathsf {d})\) be a metric space, and consider a geodesic metric \(\mathsf {D}\) on (a subset of) the space of Borel probability measures \(\mathcal {P}(\mathcal {X})\). We say that a subset \(\mathcal {Y}\subseteq \mathcal {X}\) has the weak locality property if any pair of probability measures \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {X})\) supported in \(\mathcal {Y}\) can be connected by a geodesic that is supported in \(\mathcal {Y}\) at all times. The notion of strong locality is defined by requiring this property to hold for any geodesic connecting \(\mu _0\) and \(\mu _1\). If any pair of measures can be connected by a unique geodesic, the notions of weak and strong locality coincide, but this property is currently unknown for discrete dynamical transport metrics.
If \((\mathcal {X}, \mathsf {d})\) is a geodesic metric space, and \(\mathsf {D}\) is the Kantorovich metric \(W_p\) for some \(1 \le p < \infty \), it is well known that a subset \(\mathcal {Y}\) has the weak (resp. strong) locality property if and only if it is weakly (resp. strongly) geodesically convex. This follows from the fact that geodesics in \((\mathcal {P}_p(\mathcal {X}), W_p)\) are supported on geodesics in \((\mathcal {X}, \mathsf {d})\); cf. [12] for a precise formulation of this result in a general setting.
Interestingly, the issue of locality in the discrete setting [with a discrete dynamical transport metric \(\mathcal {W}\) on \(\mathcal {P}(\mathcal {X})\) instead of \(W_p\)] turns out to be much more delicate. For example, if one considers the complete graph on a three-point set \(K_3\), then any geodesic connecting two Dirac masses transports a nontrivial part of the mass via the third point. Hence, two-point subsets of \(K_3\) do not have the locality property. This is shown in Sect. 6 of this paper.
Based on this observation one may conjecture that any nontrivial \(\mathcal {W}\)-geodesic has support on the whole graph. However, we show that this is not the case. In fact, the main contribution of this paper is the introduction of a geometric condition for subsets \(\mathcal {Y}\subseteq \mathcal {X}\) (the retraction property), that is shown to be sufficient for locality; see Theorem 4.11. The retraction property is easy to check in concrete examples, as is shown in Sect. 4.
As an application of our main result, we show that if \(\mathcal {X}\) is any connected subset of the grid \(\mathbb {Z}^d\) with the usual graph structure, and \(\mathcal {Y}\subseteq \mathcal {X}\) is a hyperrectangle, then any pair of measures supported in \(\mathcal {Y}\) can be connected by a geodesic supported in \(\mathcal {Y}\). In particular, this property holds for measures supported on subsets of lines, or on \(k\)-dimensional hyperplanes of dimension \(k < d\). Let us also mention that discrete Ricci curvature bounds in the sense of [5, 18] are inherited by subsets with the retraction property; see Corollary 4.12.
A key ingredient in the proof of our main result is a duality result for the discrete transport metric \(\mathcal {W}\), which was recently obtained by Gangbo, Li, and Mou (under slightly more restrictive conditions on the transition rates) [9]. We interpret this result (Theorem 3.4 below) in terms of subsolutions of a discrete Hamilton–Jacobi equation and present a different proof based on Fenchel–Rockafellar duality. We then show that subsolutions of the Hamilton–Jacobi equation on a subset \(\mathcal {Y}\subseteq \mathcal {X}\) can be extended to the full space \(\mathcal {X}\), provided that \(\mathcal {Y}\) has the retraction property; cf. Theorem 4.10. Our main theorem is then a straightforward consequence of this result.
Structure of the paper. In Sect. 2 we collect the necessary preliminaries on discrete transport metrics. Section 3 contains the dual formulation of the transport problem in terms of Hamilton–Jacobi subsolutions. In Sect. 4 we introduce the retraction property, we show the extension result for subsolutions to the Hamilton–Jacobi equation (Theorem 4.10), and we prove the main result on weak locality of subsets with the retraction property (Theorem 4.11). In Sect. 5 we show that the strong locality property holds for Markov chains with “dead ends”. Finally, it is shown in Sect. 6 that geodesics between Dirac measures on the triangle have full support.
2 The discrete transport distance
In this section we briefly recall the definition and basic properties of the discrete transport distance constructed in [2, 15, 17].
Let \(\mathcal {X}\) be a finite set, and let \(Q: \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}_+\) denote the transition rates for a Markov chain on \(\mathcal {X}\). Without loss of generality, we use the convention that \(Q(x,x) = 0\) for all \(x \in \mathcal {X}\). The corresponding generator \(\mathcal {L}\) acts on functions \(\phi : \mathcal {X}\rightarrow \mathbb {R}\) by
$$\begin{aligned} (\mathcal {L}\phi )(x) = \sum _{y\in \mathcal {X}} Q(x,y)\big ( \phi (y) - \phi (x) \big ). \end{aligned}$$
We assume that Q is irreducible, i.e., each pair \((x, y)\in \mathcal {X}\times \mathcal {X}\) can be connected, for some \(n \in \mathbb {N}\), by a path \(\{x_i\}_{i=0}^n\) satisfying \(x_0 = x\), \(x_n = y\), and \(Q(x_{i-1}, x_i) > 0\) for \(i = 1, \ldots , n\). This assumption implies the existence of a unique stationary probability measure \(\pi \) on \(\mathcal {X}\). Moreover, \(\pi \) is strictly positive. We will furthermore assume that Q is reversible with respect to \(\pi \), i.e., the detailed balance condition holds:
$$\begin{aligned} \pi (x) Q(x,y) = \pi (y) Q(y,x) \quad \text { for all } x,y\in \mathcal {X}. \end{aligned}$$
The triple \((\mathcal {X},Q,\pi )\) will be referred to as a Markov triple.
A Markov chain induces a graph on the vertex set \(\mathcal {X}\), whose edge set is given by \(\mathcal {E}= \{ (x,y) \in \mathcal {X}\times \mathcal {X}: Q(x,y) > 0 \}\). We write \(x \sim y\) iff \(Q(x,y) > 0\). The assumption that Q is irreducible corresponds to the graph \((\mathcal {X},\mathcal {E})\) being connected. The detailed balance condition implies that the graph is undirected.
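As a concrete illustration of these definitions, the following Python sketch builds a small reversible chain and checks detailed balance and stationarity of \(\pi \) in exact arithmetic. The three-state path, the weights `pi`, and the conductances `c` are assumptions chosen purely for illustration; the generator is implemented under the usual convention \((\mathcal {L}\phi )(x) = \sum _y Q(x,y)(\phi (y)-\phi (x))\).

```python
from fractions import Fraction as Fr

# Hypothetical example: a path 0 - 1 - 2 with a non-uniform stationary law.
states = [0, 1, 2]
pi = {0: Fr(1, 2), 1: Fr(1, 3), 2: Fr(1, 6)}
# Symmetric edge weights c(x, y) = c(y, x); setting Q(x, y) = c(x, y) / pi(x)
# makes detailed balance hold by construction.
c = {(0, 1): Fr(1, 4), (1, 2): Fr(1, 12)}

def Q(x, y):
    """Transition rate from x to y, with Q(x, x) = 0."""
    if x == y:
        return Fr(0)
    w = c.get((x, y), c.get((y, x), Fr(0)))
    return w / pi[x]

def generator(phi):
    """(L phi)(x) = sum_y Q(x, y) (phi(y) - phi(x))."""
    return {x: sum(Q(x, y) * (phi[y] - phi[x]) for y in states) for x in states}

def detailed_balance():
    """pi(x) Q(x, y) == pi(y) Q(y, x) for all pairs."""
    return all(pi[x] * Q(x, y) == pi[y] * Q(y, x) for x in states for y in states)

def is_stationary():
    """pi is stationary iff sum_x pi(x) (L phi)(x) = 0 for all phi;
    by linearity it suffices to test indicator functions."""
    for z in states:
        phi = {x: Fr(int(x == z)) for x in states}
        L_phi = generator(phi)
        if sum(pi[x] * L_phi[x] for x in states) != 0:
            return False
    return True

# Directed edge set E = {(x, y) : Q(x, y) > 0}; it is symmetric here.
edges = sorted((x, y) for x in states for y in states if Q(x, y) > 0)
```

Constructing the rates from symmetric conductances is only one convenient way to obtain a reversible chain; any other rates satisfying the detailed balance condition would serve equally well.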
In order to define the discrete transport distance on the set \(\mathcal {P}(\mathcal {X})\) of probability measures on \(\mathcal {X}\), we introduce the following objects.
Definition 2.1
(Continuity equation) A pair \((\mu ,V)\) is said to satisfy the continuity equation if

(i)
\(\mu :[0,T]\rightarrow \mathbb {R}^\mathcal {X}\) is continuous;

(ii)
\(V:[0,T]\rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) is locally integrable;

(iii)
\(\mu _t\in \mathcal {P}(\mathcal {X})\) for all \(t\in [0,T]\);

(iv)
the continuity equation holds in the sense of distributions:
$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(x) + \frac{1}{2}\sum _{y\in \mathcal {X}}\left( V_t(x,y) - V_t(y,x) \right) = 0 \quad \text { for all } x\in \mathcal {X}. \end{aligned}$$(2.1)
In this case, we write \((\mu ,V) \in \mathsf {CE}_T\). Furthermore, \(\mathsf {CE}_T(\mu ^0,\mu ^1)\) denotes the collection of pairs \((\mu ,V)\in \mathsf {CE}_T\) satisfying \(\mu |_{t=0}=\mu ^0\) and \(\mu |_{t=T}=\mu ^1\).
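As a quick consistency check, summing the continuity equation over \(x\) shows that the total mass is conserved, since the antisymmetrised flux cancels after exchanging the roles of \(x\) and \(y\):
$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\sum _{x\in \mathcal {X}}\mu _t(x) = -\frac{1}{2}\sum _{x,y\in \mathcal {X}}\left( V_t(x,y) - V_t(y,x) \right) = 0. \end{aligned}$$
Hence \(\sum _{x}\mu _t(x) = \sum _{x}\mu _0(x)\) for all \(t\in [0,T]\).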
Definition 2.2
(Admissible mean) An admissible mean is a continuous function \(\Lambda : \mathbb {R}_+ \times \mathbb {R}_+ \rightarrow \mathbb {R}_+\) that is \(C^\infty \) on \((0,\infty ) \times (0,\infty )\), symmetric, positively 1-homogeneous, nondecreasing in each of its variables, jointly concave, and normalised, i.e., \(\Lambda (1,1) = 1\).
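Of particular interest to us is the logarithmic mean given by
$$\begin{aligned} \Lambda _{\log }(s,t) = \int _0^1 s^{1-p}t^p \; \mathrm {d}p = \frac{s-t}{\log s - \log t}, \end{aligned}$$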
since it arises in the entropic gradient flow structure for the master equation \(\partial _t \mu = \mathcal {L}^* \mu \). Other relevant examples of admissible means are the harmonic mean \(\Lambda _{\text {har}}(s,t) = \frac{2st}{s+t}\), the geometric mean \(\Lambda _{\text {geo}}(s,t) = \sqrt{st}\), and the arithmetic mean \(\Lambda _{\text {ari}}(s,t) = \frac{s+t}{2}\). Some of these means arise in gradient structures for porous medium equations; cf. [6]. From now on, we will fix an admissible mean \(\Lambda \).
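The four means mentioned above are easy to experiment with numerically. The sketch below implements them in Python and spot-checks symmetry, 1-homogeneity and normalisation at a single point; the function names and the test point are assumptions made for illustration, and a spot check of this kind is of course no substitute for a proof of admissibility.

```python
import math

def mean_log(s, t):
    """Logarithmic mean (s - t) / (log s - log t), extended continuously."""
    if s <= 0.0 or t <= 0.0:
        return 0.0          # limit value on the boundary of the quadrant
    if s == t:
        return s            # limit value on the diagonal
    return (s - t) / (math.log(s) - math.log(t))

def mean_har(s, t):
    """Harmonic mean 2st / (s + t), set to 0 at the origin."""
    return 2.0 * s * t / (s + t) if s + t > 0.0 else 0.0

def mean_geo(s, t):
    """Geometric mean sqrt(st)."""
    return math.sqrt(s * t)

def mean_ari(s, t):
    """Arithmetic mean (s + t) / 2."""
    return (s + t) / 2.0

MEANS = [mean_har, mean_geo, mean_log, mean_ari]

def check_admissible(mean, s, t, lam, tol=1e-12):
    """Spot-check symmetry, 1-homogeneity and normalisation at one point."""
    symmetric = abs(mean(s, t) - mean(t, s)) <= tol
    homogeneous = abs(mean(lam * s, lam * t) - lam * mean(s, t)) <= tol * max(1.0, lam)
    normalised = abs(mean(1.0, 1.0) - 1.0) <= tol
    return symmetric and homogeneous and normalised
```

At any point with \(s \ne t\) the values are ordered as harmonic, geometric, logarithmic, arithmetic, which is the classical chain of inequalities between these means.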
The action functional for the discrete transport distance is defined using the convex and lower semicontinuous function \(A: \mathbb {R}^3 \rightarrow [0,\infty ]\) given by
For \(\mu \in \mathcal {P}(\mathcal {X})\) and \(V: \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) we define the action by
For brevity we sometimes write
Definition 2.3
(Discrete transport distance) For a Markov triple \((\mathcal {X}, Q, \pi )\) and an admissible mean \(\Lambda \), the discrete transport distance \(\mathcal {W}\) is defined for \(\mu _0,\mu _1\in \mathcal {P}(\mathcal {X})\) by
It has been shown in [5] that minimisers exist in the minimisation problem above. Any minimal curve \((\mu _t)_{t\in [0,1]}\) is a constant speed geodesic, i.e., it satisfies \(\mathcal {W}(\mu _s,\mu _t)=|t-s|\,\mathcal {W}(\mu _0,\mu _1)\) for all \(s,t\in [0,1]\).
Remark 2.4
Without loss of generality we may assume in the minimisation (2.3) that V is antisymmetric, i.e., \(V_t(x,y)= -V_t(y,x)\). In fact, for each \(U \in \mathbb {R}\), the quantity \(V(x,y)^2+V(y,x)^2\) is minimised among all choices of V(x, y), V(y, x) such that \(V(x,y) - V(y,x) = U\) by choosing \(-V(y,x) = V(x,y) = U/2\).
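The minimisation in this remark can be verified by completing the square. Writing \(a = V(x,y)\) and \(b = V(y,x)\) (notation introduced here only for brevity) with \(a - b = U\), we have
$$\begin{aligned} a^2 + b^2 = \frac{1}{2}(a-b)^2 + \frac{1}{2}(a+b)^2 = \frac{1}{2}U^2 + \frac{1}{2}(a+b)^2 \ge \frac{1}{2}U^2, \end{aligned}$$
with equality if and only if \(a + b = 0\), i.e., \(a = U/2\) and \(b = -U/2\).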
Finally, let us introduce some convenient notation to be used in the sequel. We denote the Euclidean inner products on \(\mathbb {R}^\mathcal {X}\) and \(\mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) by
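The discrete gradient of a function \(\phi \in \mathbb {R}^\mathcal {X}\) will be denoted by \(\nabla \phi (x,y) = \phi (y) - \phi (x)\), and the discrete divergence of \(\Phi \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) is given by
$$\begin{aligned} (\nabla \cdot \Phi )(x) = \frac{1}{2}\sum _{y\in \mathcal {X}}\big ( \Phi (x,y) - \Phi (y,x) \big ), \end{aligned}$$
so that the continuity equation (2.1) reads \(\frac{\mathrm {d}}{\mathrm {d}t}\mu _t + \nabla \cdot V_t = 0\).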
Furthermore, for \(\mu \in \mathcal {P}(\mathcal {X})\) and \(\Phi \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) we write
where the multiplication of \(\Phi \cdot \hat{\mu }\) is understood componentwise. For all \(\Phi ,V\in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) and \(\mu \in \mathcal {P}(\mathcal {X})\), Young’s inequality yields
3 Duality for discrete optimal transport
We present a dual formulation for the discrete transport distance which can be seen as a discrete analogue of the Kantorovich duality. This result has recently been proved in [9] using different methods; cf. Proposition 3.10 and Theorems 5.10 and 7.4 in that paper. Note that the result in [9] is stated under slightly stronger assumptions on the transition rates. In our notation, it is assumed there that \(Q(x,y) = Q(y,x)\) and \(\pi \) is constant. The slightly greater generality here does not cause additional difficulties.
Definition 3.1
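(Hamilton–Jacobi subsolution) A function \(\phi \in H^1\big ((0,T);\mathbb {R}^\mathcal {X}\big )\) is said to be a Hamilton–Jacobi subsolution if for a.e. t in (0, T), we have
$$\begin{aligned} \langle \dot{\phi }_t,\mu \rangle + \frac{1}{2}\Vert \nabla \phi _t\Vert _{\mu }^2 \le 0 \quad \text { for all } \mu \in \mathcal {P}(\mathcal {X}). \end{aligned}$$(3.1)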
The collection of all Hamilton–Jacobi subsolutions is denoted \(\mathsf {HJ}^T_\mathcal {X}\).
Remark 3.2
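Hamilton–Jacobi subsolutions obey a simple scaling relation: given \(\phi \in \mathsf {HJ}_\mathcal {X}^T\) and \(\lambda >0\), set \(\phi ^\lambda _t:=\lambda \phi _{\lambda t}\) for \(t \in (0,T/\lambda )\). Since \(\dot{\phi }^\lambda _t = \lambda ^2 \dot{\phi }_{\lambda t}\) and \(\nabla \phi ^\lambda _t = \lambda \nabla \phi _{\lambda t}\), the left-hand side of (3.1) is multiplied by \(\lambda ^2\) under this rescaling, and it is immediate to check that \(\phi ^\lambda \in \mathsf {HJ}_\mathcal {X}^{T/\lambda }\).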
Remark 3.3
Informally, (3.1) may be seen as a one-sided discrete version of the Hamilton–Jacobi equation \(\partial _t \phi + \frac{1}{2}|\nabla \phi |^2 = 0\). Note however that the dependence on \(\mu \) in (3.1) is nonlinear, which prevents us from formulating the inequality pointwise in terms of \(\phi \) only. This is a crucial difference between the discrete and the continuous setting, and a source of several difficulties.
In the continuous setting, a full treatment of Hamilton–Jacobi equations relies on the theory of viscosity solutions [3], but this concept will not play any role in our discrete setting. Let us also mention that Hamilton–Jacobi equations have been studied in the setting of metric length spaces [10, 13] as well as on graphs [21]. Our discrete notion of Hamilton–Jacobi subsolution is different from the one studied in [21].
Theorem 3.4
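(Duality formula) For \(\mu _0,\mu _1\in \mathcal {P}(\mathcal {X})\) we have
$$\begin{aligned} \frac{1}{2}\mathcal {W}(\mu _0,\mu _1)^2 = \sup \left\{ \langle \phi _1,\mu _1\rangle - \langle \phi _0,\mu _0\rangle \ : \ \phi \in \mathsf {HJ}^1_\mathcal {X}\right\} . \end{aligned}$$(3.2)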
This representation remains true if the supremum is restricted to the class of functions \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\) satisfying (3.1).
Let us first give a heuristic argument for the duality result above. We start by introducing a Lagrange multiplier for the continuity equation constraint and write
where the supremum is taken over all (sufficiently smooth) functions \(\phi :[0,1]\rightarrow \mathbb {R}^\mathcal {X}\) and the infimum is taken over all (sufficiently smooth) curves \(\mu : [0,1] \rightarrow \mathbb {R}_+^\mathcal {X}\) connecting \(\mu _0\) and \(\mu _1\), and over all \(V : [0,1] \rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\). Here we do not require that \((\mu ,V)\) satisfies the continuity equation, but the inner supremum takes the value \(+\infty \) if \((\mu ,V)\) does not belong to \(\mathsf {CE}_1(\mu _0,\mu _1)\). We also do not require that \(\mu \) takes values in \(\mathcal {P}(\mathcal {X})\), but this is automatically enforced by the continuity equation.
Integrating by parts and using the min–max principle we obtain
As the quantity to be minimised is positively 1-homogeneous in \((\mu , V)\), the infimum takes the value \(-\infty \) if \(\phi \) does not belong to \(\mathcal {H}\), the set of \(C^1\) functions \(\phi :[0,1]\rightarrow \mathbb {R}^{\mathcal {X}}\) satisfying
for all \(\mu : [0,1] \rightarrow \mathbb {R}_+^\mathcal {X}\) and all \(V : [0,1] \rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\). Consequently,
A simple localisation argument in t shows that \(\phi \in \mathcal {H}\) iff for all \(t\in [0,1]\) and \((\mu ,V)\in \mathbb {R}_+^\mathcal {X}\times \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\):
We may write
Minimising over V we conclude that \(\phi \in \mathcal {H}\) iff the inequality
holds for all \(\mu \in \mathbb {R}^\mathcal {X}_+\) and \(t \in [0,1]\), which means that \(\phi \in \mathsf {HJ}_\mathcal {X}\).
We present a proof of Theorem 3.4 using the Fenchel–Rockafellar duality theorem; see, e.g., [23, Theorem 1.9]. Recall that given a normed vector space E with topological dual space \(E^*\) and a proper convex function \(F: E \rightarrow \mathbb {R}\cup \{+\,\infty \}\), its Legendre–Fenchel transform \(F^* : E^* \rightarrow \mathbb {R}\cup \{+\,\infty \}\) is defined by
$$\begin{aligned} F^*(z^*) = \sup _{z\in E}\big \{ \langle z^*,z\rangle - F(z)\big \} . \end{aligned}$$
Theorem 3.5
(Fenchel–Rockafellar duality) Let E be a normed vector space and \(E^*\) its topological dual. Let \(F,G:E\rightarrow \mathbb {R}\cup \{+\infty \}\) be proper convex functions and denote by \(F^*,G^*:E^*\rightarrow \mathbb {R}\cup \{+\infty \}\) their Legendre–Fenchel transforms. Assume that there is \(z_0\in E\) such that G is continuous at \(z_0\) and \(F(z_0),G(z_0)<\infty \). Then we have:
$$\begin{aligned} \sup _{z\in E}\big \{ -F(z)-G(z)\big \} = \min _{z^*\in E^*}\big \{ F^*(-z^*)+G^*(z^*)\big \} . \end{aligned}$$(3.4)
Proof of Theorem 3.4
Let us first note that, by the convexity of the constraint (3.1), any \(\phi \in \mathsf {HJ}^1_\mathcal {X}\) can be approximated uniformly by \(C^1\) functions satisfying (3.1) by convolution (after scaling the function to a slightly larger interval \([-\delta ,1+\delta ]\) via Remark 3.2). Therefore, the final part of the theorem follows.
To show the dual representation with \(C^1\) functions, we will apply Theorem 3.5 in the following situation. Let E be the Banach space
Since we can identify \(C^1\big ([0,1],\mathbb {R}^{\mathcal {X}}\big )\) with \(\mathbb {R}^\mathcal {X}\times C^0\big ([0,1],\mathbb {R}^{\mathcal {X}}\big )\) via the map \(I : \phi \mapsto (\phi _0, \dot{\phi })\), the dual space \(E^*\) can be identified with
where the duality pairing between \((\phi , \Phi )\in E\) and \((b,\sigma ,V)\in E^*\) is given by
keeping in mind that \(\sigma \) is a vector-valued measure.
Define the functionals \(F,G: E \rightarrow \mathbb {R}\cup \{+\,\infty \}\) by
Here we say that a pair \((\phi ,\Phi )\in E\) belongs to \(\mathcal {D}\) if for all continuous curves \(t\mapsto \eta _t\in \mathbb {R}_+^\mathcal {X}\) we have \(\int _0^1\langle {\dot{\phi }_t,\eta _t}\rangle +\frac{1}{2} \Vert \Phi _t\Vert _{\eta _t}^2\; \mathrm {d}t \le 0\). It is readily checked that F and G are convex. Moreover, setting \(\bar{\phi }(t) = t(-1,\ldots ,-1)\) and \(\bar{\Phi } \equiv 0\), both F and G are finite at \((\bar{\phi },\bar{\Phi })\) and G is continuous at \((\bar{\phi },\bar{\Phi })\). Note that for \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\) we have \((\phi ,\nabla \phi )\in \mathcal {D}\) if and only if \(\phi \in \mathsf {HJ}_\mathcal {X}^1\), which follows from a simple localisation argument in t. Hence, the supremum in the left-hand side of (3.4) coincides with the supremum in the right-hand side of (3.2).
We will calculate the Legendre–Fenchel transforms of F and G. For F we obtain
Thus, by homogeneity of the last expression in \(\phi \), one has \(F^*(b,\sigma ,V)= + \infty \) unless \((\sigma , V)\) satisfies the continuity equation \(\partial _t \sigma + \nabla \cdot V = 0\) with boundary values \(-(\mu _0-b)\) and \(-\mu _1\), in the sense that
for all \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\). In particular, the distributional derivative of \(\sigma \) belongs to \(L^2([0,1];\mathbb {R}^\mathcal {X})\). Since the antiderivative of a distribution is unique up to a constant, the fundamental theorem of Lebesgue calculus implies that \(\sigma \) has the form \(\mathrm {d}\sigma (t)=\sigma _t\; \mathrm {d}t\) for some curve \((\sigma _t)_t\in H^1([0,1];\mathbb {R}^\mathcal {X})\). Moreover, (3.5) implies \(\sigma _0=-(\mu _0-b)\) and \(\sigma _1=-\mu _1\). Thus, we obtain
where \(\mathsf {CE}'\) is defined by dropping the positivity and normalisation condition (iii) in the definition of \(\mathsf {CE}\), and we have identified the measure \(\sigma \) with the \(H^1\)-map \(\sigma _t\).
As it suffices to calculate the transform of G at points \((b,\sigma ,V)\) where \(F^*(b,\sigma ,V)\) is finite, we can assume that \(\mathrm {d}\sigma (t)= \sigma _t\; \mathrm {d}t\) with \((\sigma _t)_{t}\in H^1([0,1];\mathbb {R}^\mathcal {X})\). We claim that:
Indeed, it follows that
Since \((\phi , \Phi ) \in \mathcal {D}\) implies \((\phi + c, \Phi ) \in \mathcal {D}\) for all \(c \in \mathbb {R}^\mathcal {X}\), we have \(G^*(b,\sigma ,V)=+\infty \) unless \(b=0\). Moreover, from the definition of \(\mathcal {D}\) we infer that \(G^*(b,\sigma ,V)=+\infty \) unless \(\sigma _t\in \mathbb {R}_{+}^\mathcal {X}\) for a.e. t.
Let us assume that \(b=0\) and \(\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t<\infty \). Then we obtain
where the first inequality follows from the definition of \(\mathcal {D}\) and the second from (2.4).
It remains to show that we have in fact equality. First we consider a convolution in time yielding smooth pairs \(\sigma ^\varepsilon _t\), \(V^\varepsilon _t\) converging to \(\sigma _t\), \(V_t\) as \(\varepsilon \rightarrow 0\). Then we set for \(\delta > 0\), \(\sigma ^{\delta ,\varepsilon }_t = \sigma _t^\varepsilon + \delta \pi \). By convexity of the action and monotonicity of the mean \(\Lambda \) we have
The convexity and lower semicontinuity of A further implies the lower semicontinuity of the action; see [1, Theorem 3.4.3] for a general result on lower semicontinuity of integral functionals and the proof of [5, Theorem 3.2] for the application to the action functional \(\mathcal {A}\). Consequently,
Now, we can choose in particular \((\phi ^{\delta ,\varepsilon },\Phi ^{\delta ,\varepsilon })\) such that
where \(\sigma ^{\delta ,\varepsilon }=\rho ^{\delta ,\varepsilon }\pi \).
We claim that \((\phi ^{\delta ,\varepsilon },\Phi ^{\delta ,\varepsilon })\in \mathcal {D}\). To see this, we use the inequality
which is an identity for \(s=v, t=u\), see [5, Lemma 2.2]. From this we infer that for any \(\mu =\tilde{\rho }\pi \in \mathcal {P}(\mathcal {X})\) we have
which proves the claim. Note that for \(\tilde{\rho }=\rho ^{\delta ,\varepsilon }_t\) we obtain equality.
Next we claim that
To prove this, we compare the left-hand side and the second line in (3.10). The limit \(\varepsilon \rightarrow 0\) is justified by dominated convergence, since (3.10) yields the majorant
where C depends on Q and \(\pi \). The right-hand side converges as \(\varepsilon \rightarrow 0\) by (3.9). The limit \(\delta \rightarrow 0\) is justified by monotone convergence. Similarly, we have
Here, we can use the estimate \(ab\le \frac{1}{2}a^2+\frac{1}{2}b^2\) to obtain a majorant that converges by (3.9) as before. Thus the expression in the first braced bracket of (3.8) converges to the right-hand side of (3.8) with this choice of \((\phi ^{\delta ,\varepsilon },\Phi ^{\delta ,\varepsilon })\) as \(\delta ,\varepsilon \rightarrow 0\). A similar argument yields \(G^*(0,\sigma ,V)=\infty \) if \(\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t=\infty \). Combining (3.6), (3.7) and the fact that \(\mathcal {A}(\sigma ,V)=+\infty \) unless \(\sigma \in \mathbb {R}^\mathcal {X}_+\), we obtain
Thus the infimum in the right-hand side of (3.4) coincides with \(\frac{1}{2}\mathcal {W}(\mu _0,\mu _1)^2\). An application of Theorem 3.5 concludes the proof. \(\square \)
4 Locality of optimal curves
In this section we investigate locality properties for discrete transport geodesics. More precisely, we study the following question: Given two probability measures supported in a subset \(\mathcal {Y}\) of a state space \(\mathcal {X}\), is there an optimal curve connecting them that is supported in \(\mathcal {Y}\)? The crucial tool to analyse this question is the dual characterisation of the transport problem given in the previous section. We prove two types of results.
Firstly, we show that the question can be answered affirmatively, under a simple condition (the retraction property of the subgraph \(\mathcal {Y}\)), which will be introduced below. This property ensures that any competitor in the dual problem on the subgraph can be extended to a competitor on the full graph. We present several examples where this property is satisfied. Later, in Sect. 6, we will show that locality may fail if the retraction property is not satisfied.
We start by introducing the retraction property and we give several examples. To increase readability, we often write subscripts instead of parentheses, e.g., \(Q_{xy} = Q(x,y)\).
A subset \(\mathcal {Y}\subseteq \mathcal {X}\) is said to be connected if any two distinct points \(y, y' \in \mathcal {Y}\) can be connected by a path \(\{y_i\}_{i=0}^n \subseteq \mathcal {Y}\) satisfying \(y_0 = y\), \(y_n = y'\), and \(Q(y_{i-1}, y_i) > 0\) for \(i=1,\ldots ,n\).
Definition 4.1
(Retraction property) A connected subset \(\mathcal {Y}\subseteq \mathcal {X}\) has the retraction property if there exists a map \(T:\mathcal {X}\rightarrow \mathcal {Y}\) such that

(R1)
\(T(y)=y\) for all \(y\in \mathcal {Y}\);

(R2)
For all \(y,y' \in \mathcal {Y}\) with \(y\ne y'\), and all \(x \in T^{-1}(y)\), we have
$$\begin{aligned} \sum _{x'\in T^{-1}(y')} Q(x,x') \le Q(y,y'). \end{aligned}$$
The map T is called a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\).
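Conditions (R1) and (R2) are straightforward to test by machine for a given candidate map. The following Python sketch does this for finite examples; the function names, the dictionary encoding of Q and T, and the two toy graphs are assumptions introduced here for illustration, and connectedness of \(\mathcal {Y}\) is not checked.

```python
def is_retraction(X, Y, Q, T):
    """Check (R1) and (R2) of the retraction property for a map T: X -> Y.

    X, Y are lists of states, Q is a dict {(x, x'): rate} (missing pairs
    have rate 0), and T is a dict mapping every state of X to a state of Y.
    """
    def rate(a, b):
        return Q.get((a, b), 0)

    # (R1): T fixes every point of Y.
    if any(T[y] != y for y in Y):
        return False
    # (R2): the total rate from x into the fibre over y' is at most Q(y, y').
    for x in X:
        y = T[x]
        for y2 in Y:
            if y2 == y:
                continue
            fibre = [x2 for x2 in X if T[x2] == y2]
            if sum(rate(x, x2) for x2 in fibre) > rate(y, y2):
                return False
    return True

def symmetric_rates(edges):
    """Unit rates on an undirected edge list."""
    return {(a, b): 1 for (u, v) in edges for (a, b) in ((u, v), (v, u))}

# Path 0 - 1 - 2 with Y = {0, 1}: folding the end point 2 onto 1 works.
path_Q = symmetric_rates([(0, 1), (1, 2)])
path_ok = is_retraction([0, 1, 2], [0, 1], path_Q, {0: 0, 1: 1, 2: 1})

# Triangle K_3 with Y = {0, 1}: neither choice of T(2) satisfies (R2).
k3_Q = symmetric_rates([(0, 1), (1, 2), (0, 2)])
k3_ok = [is_retraction([0, 1, 2], [0, 1], k3_Q, {0: 0, 1: 1, 2: t}) for t in (0, 1)]
```

In the triangle, the point of \(\mathcal {Y}\) that does not receive the third vertex has two neighbours in the other fibre, so the sum in (R2) equals 2 while the rate between the two points of \(\mathcal {Y}\) is 1.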
Remark 4.2
If the Markov triple \((\mathcal {X},Q,\pi )\) corresponds to a simple random walk (i.e., \(Q(x, y) \in \{0,1\}\) for all \(x, y \in \mathcal {X}\)), the retraction property can be rephrased in graph theoretical terms. Indeed, it is readily verified that the retraction property holds if and only if there exists a map \(T : \mathcal {X}\rightarrow \mathcal {Y}\) with the following properties:
\((R1')\)
\(T(y) = y\) for all \(y \in \mathcal {Y}\);

\((R2')\)
If \(x \sim x'\), then \(T(x) = T(x')\) or \(T(x) \sim T(x')\);

\((R3')\)
If \(x_1' \sim x\), \(x_2' \sim x\), and \(T(x_1') = T(x_2')\) for some \(x_1' \ne x_2'\), then \(T(x) = T(x_1')\).
Definition 4.3
(Restriction) The restriction of a Markov triple \((\mathcal {X},Q,\pi )\) to a connected subset \(\mathcal {Y}\subseteq \mathcal {X}\) is the Markov triple \((\mathcal {Y},Q_{\mathcal {Y}},\pi _{\mathcal {Y}})\), where \(Q_{\mathcal {Y}}\) is the restriction of Q to \(\mathcal {Y}\times \mathcal {Y}\), and \(\pi _{\mathcal {Y}}\) is the normalised restriction of \(\pi \) to \(\mathcal {Y}\).
Connectedness of \(\mathcal {Y}\) implies that the Markov triple \((\mathcal {Y},Q_{\mathcal {Y}},\pi _{\mathcal {Y}})\) is irreducible, and the detailed balance relation is obviously inherited. The following result implies that if \(\mathcal {Y}\) has the retraction property as a subset of \(\mathcal {X}\), it also has this property as a subset of any set \(\mathcal {X}'\) with \(\mathcal {Y}\subseteq \mathcal {X}' \subseteq \mathcal {X}\).
Lemma 4.4
Let \((\mathcal {X}, Q, \pi )\) be a Markov triple and \(\mathcal {Y}\subseteq \mathcal {X}' \subseteq \mathcal {X}\). If \(T : \mathcal {X}\rightarrow \mathcal {Y}\) is a retraction, then its restriction \(T|_{\mathcal {X}'} : \mathcal {X}' \rightarrow \mathcal {Y}\) is a retraction as well.
Proof
This follows immediately from the definition. \(\square \)
We present some examples of sets with the retraction property.
Example 4.5
(Cycle) For \(n \ge 2\), let \(\mathcal {X}=\mathbb {Z}/n\mathbb {Z}\), and set \(Q_{j,j+1}=Q_{j+1,j}=1\) and \(Q_{ij}=0\) otherwise. All computations are to be understood modulo n. We claim that the subset \(\{1,\dots ,k\}\) of \(\mathcal {X}\) has the retraction property if and only if \(2k \le n\). In this case, a retraction is given as follows (cf. Fig. 1):
Indeed, to check sufficiency, note that \((R1')\) is trivial, \((R2')\) holds since \(n \ge 2k\), and \((R3')\) is readily checked as well. Necessity follows from a simple argument.
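For small cycles the claim can be confirmed by exhaustive search. The Python sketch below enumerates all maps \(T : \mathbb {Z}/n\mathbb {Z}\rightarrow \{1,\dots ,k\}\) that fix \(\{1,\dots ,k\}\) and tests condition (R2) from Definition 4.1; the function names and the range \(3 \le n \le 7\), \(1 \le k < n\) are choices made here for illustration, and the search is only a sanity check on these cases, not a proof.

```python
from itertools import product

def cycle_rates(n):
    """Unit rates on the cycle Z/nZ: Q(j, j+1) = Q(j+1, j) = 1."""
    Q = {}
    for j in range(n):
        Q[(j, (j + 1) % n)] = 1
        Q[((j + 1) % n, j)] = 1
    return Q

def satisfies_R2(X, Y, Q, T):
    """Condition (R2): rate from x into the fibre over y' is at most Q(T(x), y')."""
    for x in X:
        for y2 in Y:
            if y2 == T[x]:
                continue
            into_fibre = sum(Q.get((x, x2), 0) for x2 in X if T[x2] == y2)
            if into_fibre > Q.get((T[x], y2), 0):
                return False
    return True

def has_retraction(n, k):
    """Search all maps T: Z/nZ -> {1, ..., k} that fix {1, ..., k}."""
    X = list(range(n))            # the residue 0 represents n
    Y = list(range(1, k + 1))
    Q = cycle_rates(n)
    outside = [x for x in X if x not in Y]
    for images in product(Y, repeat=len(outside)):
        T = {y: y for y in Y}     # (R1) holds by construction
        T.update(zip(outside, images))
        if satisfies_R2(X, Y, Q, T):
            return True
    return False

# Proper subsets only (k < n); for k = n the identity map is trivially a retraction.
results = {(n, k): has_retraction(n, k) for n in range(3, 8) for k in range(1, n)}
```

On this range a retraction is found exactly when \(2k \le n\), in agreement with the statement of the example.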
Example 4.6
(Grid) Consider \(\mathbb {Z}^d\) with the usual graph structure given by \(Q_{xy}=1\) if \(|x-y|=1\) and \(Q_{xy}=0\) otherwise. Let \(\mathcal {Y}\subseteq \mathbb {Z}^d\) be a nonempty subset of the form \(\mathcal {Y}= \mathcal {R}\cap \mathbb {Z}^d\), where \(\mathcal {R}= \prod _{j=1}^d [a_j,b_j]\) is a hyperrectangle, and let \(\mathcal {X}\) be a connected subgraph of \(\mathbb {Z}^d\) containing \(\mathcal {Y}\). We claim that \(\mathcal {Y}\) has the retraction property. Indeed, it is readily checked that a retraction from \(\mathcal {X}\) to \(\mathcal {Y}\) can be obtained by mapping \(x \in \mathcal {X}\) to the point in \(\mathcal {Y}\) that is closest to x with respect to the Euclidean distance.
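The nearest-point map onto a hyperrectangle with integer endpoints is simply coordinatewise clamping, which makes the claim easy to test on finite pieces of the grid. The following Python sketch checks (R1) and (R2) for this map; the function names, the 6 × 5 block, the removed points and the boxes are all assumptions chosen for illustration.

```python
from itertools import product

def clamp_to_box(x, box):
    """Nearest point of the box prod_j [a_j, b_j] to x, coordinate by coordinate."""
    return tuple(min(max(xj, a), b) for xj, (a, b) in zip(x, box))

def neighbours(x):
    """Nearest neighbours of x in Z^d."""
    for j in range(len(x)):
        for step in (-1, 1):
            yield x[:j] + (x[j] + step,) + x[j + 1:]

def clamp_is_retraction(X, box):
    """Check (R1) and (R2) for the clamping map on a finite subset X of Z^d."""
    X = set(X)
    T = {x: clamp_to_box(x, box) for x in X}
    Y = {x for x in X if T[x] == x}        # fixed points, so (R1) holds on Y
    if any(T[x] not in Y for x in X):      # T must map into Y, which lies in X
        return False
    for x in X:
        # Total unit rate from x into each fibre other than its own.
        into_fibre = {}
        for x2 in neighbours(x):
            if x2 in X and T[x2] != T[x]:
                into_fibre[T[x2]] = into_fibre.get(T[x2], 0) + 1
        for y2, total in into_fibre.items():
            q = 1 if sum(abs(a - b) for a, b in zip(T[x], y2)) == 1 else 0
            if total > q:                  # (R2) would be violated
                return False
    return True

X_full = list(product(range(6), range(5)))            # a 6 x 5 block of Z^2
box = [(1, 3), (2, 3)]                                # Y = {1,2,3} x {2,3}
line = [(2, 2), (0, 4)]                               # Y = {2} x {0,...,4}
X_holes = [x for x in X_full if x not in {(0, 0), (5, 4), (4, 1)}]
```

The degenerate box `line` corresponds to a segment of a line in the grid, and removing points outside the box (as in `X_holes`) does not affect the check, in line with Lemma 4.4.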
Example 4.7
(2-point space) Assume that Q takes values in \(\{0, 1\}\) and let \(x, y \in \mathcal {X}\) with \(Q_{xy} = 1\). A disjoint decomposition \(\mathcal {X}=A_x\cup A_y\) with \(x\in A_x\) and \(y\in A_y\) is called an xy-cut. An edge \((u,v) \in \mathcal {E}\) is a cross if \(u \in A_x\) and \(v\in A_y\). The subset \(\{x,y\}\) has the retraction property if and only if there exists an xy-cut such that no two distinct crosses share a point. The correspondence between xy-cuts with this property and retractions is given by \(T^{-1}(x)=A_x\), \(T^{-1}(y)=A_y\) (Fig. 2).
Example 4.8
(Honeycomb lattice) Let \((\mathcal {X},\mathcal {E})\) be a connected subgraph of the honeycomb lattice and define transition rates by setting \(Q_{xy} = 1\) if \((x,y)\in \mathcal {E}\) and zero otherwise. Then each fundamental cell \(\mathcal {Y}=\{y_1,\dots ,y_6\}\) (see Fig. 3) has the retraction property. Indeed, to obtain a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\), we partition the plane into 6 sectors separated by rays that originate at the centre of \(\mathcal {Y}\) and intersect the midpoints of the sides of \(\mathcal {Y}\) orthogonally. A retraction is then obtained by mapping each \(x \in \mathcal {X}\) to the unique \(y \in \mathcal {Y}\) that belongs to the same sector (cf. Fig. 3).
Example 4.9
(Trees) Assume that the graph \((\mathcal {X}, \mathcal {E})\) is a tree, i.e., it does not contain a cycle. Every subtree \(\mathcal {Y}\) of \(\mathcal {X}\) has the retraction property, and a retraction can be constructed as follows: Fix a vertex \(y \in \mathcal {Y}\). Since \(\mathcal {X}\) is a tree, for every \(x\in \mathcal {X}\) there is a unique path \(\gamma \) without self-intersections connecting x and y. The map assigning to x the first point where the path \(\gamma \) meets \(\mathcal {Y}\) is a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\). Note that the retraction property depends only on the graph \((\mathcal {X}, \mathcal {E})\) and not on the choice of the transition rates Q (as long as they give rise to the same graph).
Theorem 4.10
(Extension of Hamilton–Jacobi subsolutions) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a connected subset of \(\mathcal {X}\). If \(\mathcal {Y}\) has the retraction property, then every Hamilton–Jacobi subsolution on \(\mathcal {Y}\) can be extended to a Hamilton–Jacobi subsolution on \(\mathcal {X}\).
Proof
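Let \(\phi \) be a Hamilton–Jacobi subsolution on \(\mathcal {Y}\), and let T be a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\). Define \(\bar{\phi } : \mathcal {X}\rightarrow \mathbb {R}\) by \(\bar{\phi } := \phi \circ T\), so that \(\bar{\phi }|_\mathcal {Y}= \phi \) by (R1). We will show that for any \(\bar{\nu } \in \mathcal {P}(\mathcal {X})\), there exists \(\nu \in \mathcal {P}(\mathcal {Y})\) such that
$$\begin{aligned} \langle \dot{\bar{\phi }}_t,\bar{\nu }\rangle + \frac{1}{2}\Vert \nabla \bar{\phi }_t\Vert _{\bar{\nu }}^2 \le \langle \dot{\phi }_t,\nu \rangle + \frac{1}{2}\Vert \nabla \phi _t\Vert _{\nu }^2 \end{aligned}$$(4.1)
for a.e. t. To improve readability, we omit the subscript t. As \(\phi \in \mathsf {HJ}_\mathcal {Y}\), the right-hand side of (4.1) is nonpositive, so this suffices to prove the theorem.
For \(\bar{\nu }\in \mathcal {P}(\mathcal {X})\) define \(\nu \in \mathcal {P}(\mathcal {Y})\) by \(\nu := T_\# \bar{\nu }\). Clearly,
$$\begin{aligned} \langle \dot{\bar{\phi }},\bar{\nu }\rangle = \sum _{x\in \mathcal {X}}\dot{\phi }(T(x))\,\bar{\nu }(x) = \sum _{y\in \mathcal {Y}}\dot{\phi }(y)\,\nu (y) = \langle \dot{\phi },\nu \rangle . \end{aligned}$$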
It thus remains to show that \(\Vert \nabla \bar{\phi }_t\Vert _{\bar{\nu }} \le \Vert \nabla \phi _t\Vert _{\nu }\).
Splitting the sum we obtain
The concavity and homogeneity of \(\Lambda \) imply
Given \(x \in \mathcal {X}\) and \(y,y' \in \mathcal {Y}\) with \(y\ne y'\) and \(T(x) = y\), the retraction property (R2) implies that \(\sum _{x'\in T^{-1}(y')} Q_{xx'} \le Q_{yy'}\) (and the same holds with primed and unprimed variables interchanged). Hence the monotonicity of \(\Lambda \) yields
Combining these inequalities, we infer that
which completes the proof. \(\square \)
The following result shows that any pair of measures supported in a set \(\mathcal {Y}\) with the retraction property can be connected by a geodesic supported in \(\mathcal {Y}\).
Theorem 4.11
(Weak locality under the retraction property) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a subset of \(\mathcal {X}\) with the retraction property. For all \(\mu ^0,\mu ^1 \in \mathcal {P}(\mathcal {X})\) with support in \(\mathcal {Y}\) there exists a minimising \(\mathcal {W}\)-geodesic \((\mu _t)_{t \in [0,1]} \subseteq \mathcal {P}(\mathcal {X})\) connecting \(\mu ^0\) and \(\mu ^1\) such that \(\mu _t\) has support in \(\mathcal {Y}\) for all \(t\in [0,1]\).
In fact, we will show that any \(\mathcal {W}_\mathcal {Y}\)-geodesic \((\mu _t)_t \subseteq \mathcal {P}(\mathcal {Y})\) is also a \(\mathcal {W}_\mathcal {X}\)-geodesic when regarded as a curve in \(\mathcal {P}(\mathcal {X})\).
Proof
Let \((\mu _t)_t\) be a minimising geodesic in \(\mathcal {P}(\mathcal {Y})\) satisfying the continuity equation (2.1) with momentum vector field \((V_t)_t\). Consider the extension to \(\mathcal {X}\) defined by \(\bar{\mu }_t(x)=0\) if \(x\notin \mathcal {Y}\) and \(\bar{V}_t(x,x')=0\) if \(x\notin \mathcal {Y}\) or \(x'\notin \mathcal {Y}\). Clearly, \((\bar{\mu }_t,\bar{V}_t)_t\) has the same action as \((\mu _t,V_t)_t\).
Let \(\varepsilon > 0\). Since \((\mu _t,V_t)_t\) is a geodesic in \(\mathcal {P}(\mathcal {Y})\), Theorem 3.4 (applied in \(\mathcal {P}(\mathcal {Y})\)) implies that there exists a Hamilton–Jacobi subsolution \(\phi \in \mathsf {HJ}_\mathcal {Y}\) such that
By Theorem 4.10, \(\phi \) can be extended to a Hamilton–Jacobi subsolution \(\bar{\phi } \in \mathsf {HJ}_\mathcal {X}\). In particular, using Theorem 3.4 once more (this time in \(\mathcal {P}(\mathcal {X})\)),
Since \(\varepsilon > 0\) is arbitrary, it follows that \(\int _0^1\mathcal {A}(\mu _t,V_t) \; \mathrm {d}t \le \mathcal {W}_\mathcal {X}^2(\mu ^0,\mu ^1)\), which yields the result. \(\square \)
It follows from the previous result that Ricci curvature bounds in the sense of [5, 18] are inherited by subsets with the retraction property. We recall that a Markov triple \((\mathcal {X}, Q,\pi )\) is said to have Ricci curvature bounded from below by \(\kappa \in \mathbb {R}\) if for any \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {X})\), and for some (equivalently, for any) \(\mathcal {W}\)-geodesic \((\mu _t)\) connecting \(\mu _0\) and \(\mu _1\), the relative entropy \(\mu \mapsto {{\,\mathrm{Ent}\,}}_\pi (\mu ) := \sum _{x \in \mathcal {X}}\mu (x) \log \big (\frac{\mu (x)}{\pi (x)}\big )\) satisfies the following \(\kappa \)-convexity inequality, for any \(0 \le t \le 1\):
In this case we write \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \); cf. [5, 18] for further details.
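For orientation, the \(\kappa \)-convexity inequality is commonly stated in the following form in the discrete setting of [5]. This is a sketch based on the standard definition, and the normalisation of \(\mathcal {W}\) is assumed to agree with the one used in this paper:

\[ {{\,\mathrm{Ent}\,}}_\pi (\mu _t) \le (1-t)\, {{\,\mathrm{Ent}\,}}_\pi (\mu _0) + t\, {{\,\mathrm{Ent}\,}}_\pi (\mu _1) - \frac{\kappa }{2}\, t(1-t)\, \mathcal {W}^2(\mu _0,\mu _1). \]

For \(\kappa = 0\) this reduces to ordinary convexity of the entropy along the geodesic.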
Corollary 4.12
Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a subset of \(\mathcal {X}\) with the retraction property. If \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \) for some \(\kappa \in \mathbb {R}\), then \({{\,\mathrm{Ric}\,}}(\mathcal {Y}, Q_\mathcal {Y}, \pi _\mathcal {Y}) \ge \kappa \) as well.
Proof
Take \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {Y})\), and let \((\mu _t)_t\) be a \(\mathcal {W}_\mathcal {Y}\)-geodesic connecting them. By Theorem 4.11, \((\mu _t)_t\) is also a geodesic in \(\mathcal {P}(\mathcal {X})\). Since \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \), it follows that \(t \mapsto {{\,\mathrm{Ent}\,}}_\pi (\mu _t)\) is \(\kappa \)-convex. As \({{\,\mathrm{Ent}\,}}_{\pi _\mathcal {Y}}(\mu _t) = {{\,\mathrm{Ent}\,}}_{\pi }(\mu _t) + \log (\pi (\mathcal {Y}))\) and \(\mathcal {W}_\mathcal {Y}(\mu _0, \mu _1) = \mathcal {W}_\mathcal {X}(\mu _0, \mu _1)\), we infer that \(t \mapsto {{\,\mathrm{Ent}\,}}_{\pi _\mathcal {Y}}(\mu _t)\) is \(\kappa \)-convex as well, which yields the result. \(\square \)
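The entropy identity used in the proof can be checked directly. The computation below is a sketch that assumes \(\pi _\mathcal {Y}\) denotes the normalised restriction, \(\pi _\mathcal {Y}(x) = \pi (x)/\pi (\mathcal {Y})\) for \(x \in \mathcal {Y}\), and uses that \(\mu _t\) is a probability measure supported in \(\mathcal {Y}\):

\[ {{\,\mathrm{Ent}\,}}_{\pi _\mathcal {Y}}(\mu _t) = \sum _{x \in \mathcal {Y}} \mu _t(x) \log \Big (\frac{\mu _t(x)\, \pi (\mathcal {Y})}{\pi (x)}\Big ) = \sum _{x \in \mathcal {Y}} \mu _t(x) \log \Big (\frac{\mu _t(x)}{\pi (x)}\Big ) + \log (\pi (\mathcal {Y})) \sum _{x \in \mathcal {Y}} \mu _t(x) = {{\,\mathrm{Ent}\,}}_{\pi }(\mu _t) + \log (\pi (\mathcal {Y})). \]

The two entropies thus differ by a constant that does not depend on t, and adding such a constant does not affect \(\kappa \)-convexity.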
5 Optimal transport avoids dead ends
In this section we prove the intuitively natural statement that optimal curves do not transport mass into “dead ends”. We formalise this concept by considering the gluing of two Markov triples along a vertex.
Definition 5.1
(Gluing of Markov triples) Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and fix \(x_1\in \mathcal {X}_1\), \(x_2\in \mathcal {X}_2\). The gluing of the two triples at \(x_1, x_2\) is the Markov triple \((\mathcal {X},Q,\pi )\) defined by setting
and \(*= [x_1] = [x_2]\). For brevity, let us write \(\mathcal {X}_i' := \mathcal {X}_i \backslash \{x_i\}\). We have canonical injections \(\mathcal {X}_1' \rightarrow \mathcal {X}\), \(\mathcal {X}_2' \rightarrow \mathcal {X}\), and we identify elements of \((\mathcal {X}_1\sqcup \mathcal {X}_2) \backslash \{x_1,x_2\}\) with their respective images. We define transition rates \(Q : \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) by
It is easy to see that Q is irreducible and reversible, and the unique invariant probability measure is given by
Definition 5.2
(Dead end) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {X}_1, \mathcal {X}_2 \subseteq \mathcal {X}\). We say that \(\mathcal {X}_2\) is a dead end for \(\mathcal {X}_1\) (and vice versa) if the intersection of \(\mathcal {X}_1\) and \(\mathcal {X}_2\) contains exactly one point (denoted “\(*\)”), and moreover, \(Q(x,y) = Q(y,x) = 0\) whenever \(x \in \mathcal {X}_1'\) and \(y \in \mathcal {X}_2'\). Here, we write \(\mathcal {X}_i' = \mathcal {X}_i \backslash \{ *\}\).
Remark 5.3
The notions of dead end and gluing of Markov triples are compatible in the following sense: Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and suppose that \(\mathcal {X}_2 \subseteq \mathcal {X}\) is a dead end for \(\mathcal {X}_1 \subseteq \mathcal {X}\) with intersection point \(*\). Then one recovers \((\mathcal {X}, Q, \pi )\) by gluing together the restrictions of \(\mathcal {X}\) to \(\mathcal {X}_1\) and \(\mathcal {X}_2\) at \(*\).
Proposition 5.4
Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and let \((\mathcal {X}, Q, \pi )\) be the Markov triple obtained by gluing the triples at \(x_1 \in \mathcal {X}_1\) and \(x_2 \in \mathcal {X}_2\). Then \(\mathcal {X}_1\) and \(\mathcal {X}_2\) have the retraction property as subsets of \(\mathcal {X}\).
Proof
Define \(T : \mathcal {X}\rightarrow \mathcal {X}_1\) by \(T(x) = x\) for \(x \in \mathcal {X}_1\) and \(T(x) = *\) for \(x \in \mathcal {X}_2'\). One verifies that T indeed defines a retraction by distinguishing cases. \(\square \)
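The case distinction can also be checked mechanically on a small instance. The following Python sketch is illustrative only: the graph, its vertex labels and its symmetric rates are hypothetical, and the function `check_rate_inequality` tests the rate inequality quoted in the proof of Theorem 4.10, which is used here as a stand-in for condition (R2); the precise condition is the one stated with the definition of a retraction.

```python
def check_rate_inequality(Q, T, Y, tol=1e-12):
    """Check sum_{x' in T^{-1}(y')} Q[x, x'] <= Q[y, y'] whenever T(x) = y != y'."""
    # Group the vertices by their image under T.
    fibres = {y: [x for x in T if T[x] == y] for y in Y}
    for x, y in T.items():
        for y2 in Y:
            if y2 == y:
                continue
            lhs = sum(Q.get((x, x2), 0.0) for x2 in fibres[y2])
            if lhs > Q.get((y, y2), 0.0) + tol:
                return False
    return True

# Hypothetical example: X1 = {a, b, *} is a triangle, X2 = {*, c, d} is a path,
# and the two pieces share only the vertex "*" (so X2 is a dead end for X1).
edges = {("a", "b"): 1.0, ("b", "*"): 2.0, ("a", "*"): 0.5,
         ("*", "c"): 3.0, ("c", "d"): 1.5}
Q = {}
for (u, v), rate in edges.items():
    Q[(u, v)] = rate  # symmetric rates, chosen only for illustration
    Q[(v, u)] = rate

X1 = {"a", "b", "*"}
X = X1 | {"c", "d"}
# The map from the proof: identity on X1, everything else is sent to "*".
T = {x: (x if x in X1 else "*") for x in X}
holds = all(T[y] == y for y in X1) and check_rate_inequality(Q, T, X1)
```

Here `holds` evaluates to `True`: rates between a vertex of \(\mathcal {X}_1'\) and a vertex of \(\mathcal {X}_2'\) vanish, so collapsing \(\mathcal {X}_2'\) onto \(*\) does not increase any of the sums on the left-hand side.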
In view of Theorem 4.11, the previous result implies that any two measures \(\mu _0, \mu _1\) supported in (the image of) \(\mathcal {X}_1\) can be connected by a geodesic that is supported in \(\mathcal {X}_1\) for all times; i.e., weak locality holds. We will now show that in fact strong locality holds: any geodesic connecting \(\mu _0\) and \(\mu _1\) has to be supported in \(\mathcal {X}_1\).
Theorem 5.5
Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and let \((\mathcal {X}, Q, \pi )\) be the Markov triple obtained by gluing the triples at \(x_1 \in \mathcal {X}_1\) and \(x_2 \in \mathcal {X}_2\). If \((\mu _t)_{t\in [0,1]}\) is a geodesic in \((\mathcal {P}(\mathcal {X}),\mathcal {W})\) with \({\text {supp}}\mu _0, {\text {supp}}\mu _1\subseteq \mathcal {X}_1\), then \({\text {supp}}\mu _t\subseteq \mathcal {X}_1\) for all \(t\in [0,1]\).
Proof
Let \(t \mapsto V_t \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) be an antisymmetric momentum vector field such that \((\mu , V)\) is a solution to the continuity equation with \(\int _0^1 \mathcal {A}(\mu _t, V_t) \; \mathrm {d}t = \mathcal {W}^2(\mu _0, \mu _1)\). We define a new curve \(t \mapsto \bar{\mu }_t \in \mathcal {P}(\mathcal {X})\) by
and a new antisymmetric momentum vector field \(t \mapsto \bar{V}_t \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) by
We claim that \((\bar{\mu }, \bar{V})\) solves the continuity equation (2.1) as well.
Indeed, this statement trivially holds for any \(x \in \mathcal {X}\backslash \{ *\}\). To prove the claim at \(*\), we note that for any \(y \in \mathcal {X}_2'\),
Therefore, using the antisymmetry of \(V_t\),
Furthermore,
hence by another application of the antisymmetry,
which proves the claim.
For all \(t \in (0,1)\) and \(x, y \in \mathcal {X}\), we clearly have
Moreover, if \(\mu _t(\mathcal {X}_2') > 0\) for some \(t \in (0,1)\), then there exists \(z \in \mathcal {X}_2'\) such that \(V_t(*, z) > 0\) and \(\Lambda (\mu _t(*) Q(*,z), \mu _t(z) Q(z,*)) > 0\) for all t on a set of positive measure in (0, 1). Therefore,
This strict inequality contradicts the fact that \((\mu _t)_{t\in [0,1]}\) is a geodesic. \(\square \)
6 Nonlocality of optimal transport on the triangle
Consider a Markov triple \((\mathcal {X}, Q, \pi )\) and a connected subset \(\mathcal {Y}\subseteq \mathcal {X}\). In this section we show that locality of geodesics in \(\mathcal {P}(\mathcal {Y})\) may fail if \(\mathcal {Y}\) does not have the retraction property. We consider the simplest possible setting, where \((\mathcal {X}, Q, \pi )\) corresponds to simple random walk on a triangle, and \(\mathcal {Y}\subseteq \mathcal {X}\) is a two-point set. We show that the canonical lift of a geodesic between Dirac measures on the two-point space is not an optimal curve in \(\mathcal {P}(\mathcal {X})\), by constructing a competitor that transports mass along all edges.
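To see concretely why the two-point subset of the triangle is problematic, one can enumerate all candidate maps by brute force. The Python sketch below is a hedged illustration: the function name and the unit rates are hypothetical, and the condition tested is the rate inequality quoted in the proof of Theorem 4.10, used as a stand-in for condition (R2) of the retraction property.

```python
from itertools import product

def maps_satisfying_rate_inequality(X, Y, Q, tol=1e-12):
    """Return all maps T: X -> Y with T(y) = y on Y such that
    sum_{x' in T^{-1}(y')} Q[x, x'] <= Q[y, y'] whenever T(x) = y != y'."""
    outside = [x for x in X if x not in Y]
    good = []
    # Try every assignment of the vertices outside Y to vertices of Y.
    for images in product(sorted(Y), repeat=len(outside)):
        T = {y: y for y in Y}
        T.update(zip(outside, images))
        ok = True
        for x in X:
            for y2 in Y:
                if y2 == T[x]:
                    continue
                lhs = sum(Q.get((x, x2), 0.0) for x2 in X if T[x2] == y2)
                if lhs > Q.get((T[x], y2), 0.0) + tol:
                    ok = False
        if ok:
            good.append(T)
    return good

X = [1, 2, 3]
# Triangle with unit rates on all three edges (illustrative choice).
triangle_Q = {(x, y): 1.0 for x in X for y in X if x != y}
triangle_maps = maps_satisfying_rate_inequality(X, {1, 2}, triangle_Q)

# For comparison: the path 1-2-3, where the edge between 1 and 3 is absent.
path_Q = {(1, 2): 1.0, (2, 1): 1.0, (2, 3): 1.0, (3, 2): 1.0}
path_maps = maps_satisfying_rate_inequality(X, {1, 2}, path_Q)
```

On the triangle no admissible map is found, whereas on the path the map sending 3 to 2 passes the test. The failure on the triangle does not rely on the unit rates: whichever point 3 is mapped to, the other point of \(\mathcal {Y}\) has a positive rate towards 3 that is added to the left-hand side of the inequality.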
Throughout this section we make the following additional assumption on the mean \(\Lambda \).
Assumption 6.1
For any \(s>0\) we have
If \(\Lambda (0,t)>0\) for \(t>0\), then (6.1) also holds for \(s=0\).
Clearly, this assumption is satisfied for the arithmetic, geometric, and logarithmic means, but not for the harmonic mean.
The main result of this section relies on the following lemma concerning the variation of the action functional on cycles of arbitrary length.
Lemma 6.2
For \(n \ge 3\), let \(\mathcal {X}= \mathbb {Z}/n\mathbb {Z}\) be equipped with transition rates \(Q_{ij}\) such that \(Q_{i,i+1},Q_{i+1,i}>0\) for all \(i\in \mathcal {X}\) and \(Q_{ij}=0\) otherwise. Let \(\mu , \nu \in \mathcal {P}(\mathcal {X})\), and let \(V,U \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) be antisymmetric, and such that both \(\mathcal {A}(\mu ,V)\) and \(\mathcal {A}(\nu ,U)\) are finite. Assume that \(\mu _1,\mu _2>0\) and \(\mu _i = 0\) for all \(i\ne 1,2\), \(V_{12} \ne 0\), and \(V_{ij} = 0\) for all \(\{i,j\} \ne \{1,2\}\) and that \(U_{12}=0\). For \(\alpha \in [0,1]\) we define \(\mu ^\alpha =(1-\alpha )\mu +\alpha \nu \) and \(V^\alpha =(1-\alpha )V+\alpha U\). Then we have:
Proof
First note that
Using the 1-homogeneity of \(\Lambda \) we observe that
Since \(\mu _1,\mu _2>0\), the first term in (6.2) is well-defined and easily seen to be the limit of \((T_1-\mathcal {A}(\mu ,V))/\alpha \). Obviously, \(T_2/\alpha \) converges to the second term in (6.2). Finally, \(T_3\) vanishes unless \(U_{23}\ne 0\). But in this case, since \((1-\alpha )/\alpha \rightarrow \infty \) as \(\alpha \rightarrow 0\), we see that \(T_3/\alpha \) converges to zero as \(\alpha \rightarrow 0\) by Assumption 6.1. A similar argument applies to \(T_4\). \(\square \)
Now we can prove the nonlocality result.
Theorem 6.3
Let \((\mathcal {X},Q,\pi )\) be a Markov triple with \(\mathcal {X}=\{1,2,3\}\) and such that \(Q(x,y) > 0\) for all \(x\ne y\). Let \((\mu _t)_{t \in [0,1]}\) be a \(\mathcal {W}\)-geodesic connecting \(\mu _0 = \delta _1\) to \(\mu _1 = \delta _2\). Then, \(\mu _t(3)>0\) for some \(0<t<1\).
As \(\mu _t(3)>0\) for some \(0<t<1\), the result implies that mass is transported along the edges (1, 3) and (3, 2).
Proof
Suppose that the geodesic \((\mu _t,V_t)_{t\in [0,1]}\) transports only along the edge (1, 2), i.e., \(V_t(2,3) = V_t(3,1) = 0\) for a.e. \(t \in (0, 1)\). Then \((\mu _t,V_t)\) must be given by the corresponding geodesic on the two-point space \(\{1,2\}\). Obviously, we have \(\mu _t(1),\mu _t(2)>0\) for all \(t\in (0,1)\). Let \((\nu ,U) \in \mathsf {CE}_1(\delta _1,\delta _2)\) be a curve of finite action such that \(U_t(1,2)=0\) for a.e. t and \(\nu _t(1),\nu _t(2),\nu _t(3)>0\) for all \(0<t<1\). Define \((\mu ^\alpha ,V^\alpha )\in \mathsf {CE}_1(\delta _1,\delta _2)\) by \(\mu ^\alpha =(1-\alpha )\mu +\alpha \nu \) and \(V^\alpha =(1-\alpha )V+\alpha U\) for \(\alpha \in [0,1]\). Then Lemma 6.2 yields for a.e. t:
Consequently, there exists \(\alpha >0\) such that
contradicting the optimality of \((\mu ,V)\). \(\square \)
References
Buttazzo, G.: Semicontinuity, Relaxation and Integral Representation in the Calculus of Variations. Pitman Research Notes in Mathematics Series, vol. 207. Longman Scientific & Technical, Harlow (1989)
Chow, S.N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008 (2012)
Crandall, M.G., Lions, P.L.: Viscosity solutions of Hamilton–Jacobi equations. Trans. Am. Math. Soc. 277(1), 1–42 (1983)
Erbar, M., Henderson, C., Menz, G., Tetali, P.: Ricci curvature bounds for weakly interacting Markov chains. Electron. J. Probab. 22, Paper No. 40, 23 (2017)
Erbar, M., Maas, J.: Ricci curvature of finite Markov chains via convexity of the entropy. Arch. Ration. Mech. Anal. 206(3), 997–1038 (2012)
Erbar, M., Maas, J.: Gradient flow structures for discrete porous medium equations. Discrete Contin. Dyn. Syst. 34, 1355–1374 (2014)
Erbar, M., Maas, J., Tetali, P.: Discrete Ricci curvature bounds for Bernoulli–Laplace and random transposition models. Ann. Fac. Sci. Toulouse Math. (6) 24(4), 781–800 (2015)
Fathi, M., Maas, J.: Entropic Ricci curvature bounds for discrete interacting systems. Ann. Appl. Probab. 26(3), 1774–1806 (2016)
Gangbo, W., Li, W., Mou, C.: Geodesic of minimal length in the set of probability measures on graphs. ArXiv e-prints, December (2017)
Gozlan, N., Roberto, C., Samson, P.M.: Hamilton Jacobi equations on metric spaces and transport entropy inequalities. Rev. Mat. Iberoam. 30(1), 133–163 (2014)
Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17 (1998)
Lisini, S.: Characterization of absolutely continuous curves in Wasserstein spaces. Calc. Var. Partial Differ. Equ. 28(1), 85–120 (2007)
Lott, J., Villani, C.: Hamilton–Jacobi semigroup on length spaces and applications. J. Math. Pures Appl. (9) 88(3), 219–229 (2007)
Lott, J., Villani, C.: Ricci curvature for metric-measure spaces via optimal transport. Ann. Math. (2) 169(3), 903–991 (2009)
Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)
McCann, R.J.: A convexity principle for interacting gases. Adv. Math. 128(1), 153–179 (1997)
Mielke, A.: A gradient structure for reaction-diffusion systems and for energy-drift-diffusion systems. Nonlinearity 24(4), 1329–1346 (2011)
Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Partial Differ. Equ. 48(1–2), 1–31 (2013)
Maas, J., Matthes, D.: Long-time behavior of a finite volume discretization for a fourth order diffusion equation. Nonlinearity 29(7), 1992–2023 (2016)
Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Partial Differ. Equ. 26(1–2), 101–174 (2001)
Shu, Y.: Hamilton–Jacobi equations on graph and applications. Potential Anal. 48(2), 125–157 (2018)
Sturm, K.-Th.: On the geometry of metric measure spaces. I and II. Acta Math. 196(1), 65–177 (2006)
Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society, Providence (2003)
Acknowledgements
Open access funding provided by Institute of Science and Technology (IST Austria). Matthias Erbar gratefully acknowledges support by the German Research Foundation through the Hausdorff Centre for Mathematics and the Collaborative Research Centre 1060, The Mathematics of Emergent Effects. Jan Maas gratefully acknowledges support by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 716117), and by the Austrian Science Fund (FWF), Project SFB F65. Melchior Wirth gratefully acknowledges financial support by the German Academic Scholarship Foundation (Studienstiftung des deutschen Volkes) and by the German Research Foundation (DFG) via RTG 1523. We thank an anonymous referee for suggesting to include the statement of Corollary 4.12 in the paper.
Additional information
Communicated by L. Ambrosio.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.