Heat Flow with Dirichlet Boundary Conditions via Optimal Transport and Gluing of Metric Measure Spaces

We introduce the transportation-annihilation distance $W_p^\sharp$ between subprobabilities and derive contraction estimates with respect to this distance for the heat flow with homogeneous Dirichlet boundary conditions on an open set in a metric measure space. We also deduce the Bochner inequality for the Dirichlet Laplacian as well as gradient estimates for the associated Dirichlet heat flow. For the Dirichlet heat flow, moreover, we establish a gradient flow interpretation within a suitable space of charged probabilities. In order to prove this, we will work with the \emph{doubling} of the open set, the space obtained by gluing together two copies of it along the boundary.


Introduction and Statement of Main Results
We present an approach to heat flow with homogeneous Dirichlet boundary conditions via optimal transport (indeed, the very first one), based on a novel particle interpretation for this evolution. The classical particle interpretation for the heat flow in an open set $Y$ with Dirichlet boundary condition is based on particles which move around in $Y$ and are killed (or lose their mass) as soon as they hit the boundary $\partial Y$. Our new interpretation is based on particles moving around in $Y$ which are reflected if they hit the boundary and which thereby randomly change their "charge": half of them change into "antiparticles", half of them continue to be normal particles. Effectively, particles and antiparticles annihilate each other, but the total number of charged particles remains constant.
This leads us to regard the initial probability distribution as a distribution $\sigma_0^+$ of normal particles, with no antiparticles being around at time 0, i.e. $\sigma_0^- = 0$. In the course of time, $\sigma_t^+$ and $\sigma_t^-$ will evolve as subprobability measures on $Y$, and so does the "effective distribution" $\sigma_t^0 := \sigma_t^+ - \sigma_t^-$, whereas the "total distribution" $\overline{\sigma}_t := \sigma_t^+ + \sigma_t^-$ continues to be a probability measure. The latter will evolve as heat flow with Neumann boundary conditions whereas the former will evolve as heat flow with Dirichlet boundary conditions. The evolution of the charged particle distribution $\sigma_t = (\sigma_t^+, \sigma_t^-)$ will be characterized as an EVI-gradient flow for the Boltzmann entropy. New transportation distances for subprobability measures will yield contraction estimates for the effective flow.
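The particle picture can be checked on a toy discretisation. The sketch below is only an illustration under assumptions that are not taken from the paper: $Y = (0,1)$ is cut into $N$ cells, the doubling is modelled as a cycle of $2N$ cells, and the heat flow is replaced by a nearest-neighbour walk with jump probability `a`. All function names are hypothetical. In this model the difference of particles and antiparticles follows the Dirichlet scheme (odd reflection at the boundary), while their sum follows the Neumann scheme (even reflection).

```python
# Toy model (an assumption for illustration): Y = (0, 1) is cut into N cells.
# The doubling of Y is the cycle of 2N cells obtained by gluing two copies of
# the interval at both endpoints; cell k of the upper sheet has the mirror
# cell 2N - 1 - k on the lower sheet.

def cycle_step(s, a):
    """One step of the symmetric nearest-neighbour walk on the cycle."""
    n = len(s)
    return [(1 - 2 * a) * s[k] + a * (s[k - 1] + s[(k + 1) % n]) for k in range(n)]

def interval_step(u, a, ghost_sign):
    """One step on the interval with a ghost cell across each endpoint.
    ghost_sign = +1: even reflection (Neumann); -1: odd reflection (Dirichlet)."""
    n = len(u)
    out = []
    for k in range(n):
        left = u[k - 1] if k > 0 else ghost_sign * u[0]
        right = u[k + 1] if k < n - 1 else ghost_sign * u[n - 1]
        out.append((1 - 2 * a) * u[k] + a * (left + right))
    return out

def simulate(N=8, a=0.25, steps=40, start=2):
    mu0 = [0.0] * N
    mu0[start] = 1.0
    sigma = mu0 + [0.0] * N              # only normal particles at time 0
    dirichlet, neumann = mu0[:], mu0[:]
    for _ in range(steps):
        sigma = cycle_step(sigma, a)
        dirichlet = interval_step(dirichlet, a, -1)
        neumann = interval_step(neumann, a, +1)
    plus = sigma[:N]                                     # normal particles
    minus = [sigma[2 * N - 1 - k] for k in range(N)]     # antiparticles, mirrored back
    effective = [p - q for p, q in zip(plus, minus)]
    total = [p + q for p, q in zip(plus, minus)]
    return effective, total, dirichlet, neumann

effective, total, dirichlet, neumann = simulate()
print(max(abs(x - y) for x, y in zip(effective, dirichlet)))  # numerically zero
print(max(abs(x - y) for x, y in zip(total, neumann)))        # numerically zero
print(sum(total), sum(effective))  # total mass is conserved, effective mass decays
```

The agreement holds because the walk on the cycle commutes with the mirror map, so the symmetric and antisymmetric parts evolve independently.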
Technically, we will interpret the pairs of subprobability measures (σ + , σ − ) as a probability measure on the doubling of Y in X, i.e. a space obtained by gluing together two copies of X along the "boundary" X \ Y . Both settings are equivalent. Under a curvature condition for the doubling, we get Wasserstein contraction results and gradient estimates for the heat flow with Dirichlet boundary values.
In particular, we also obtain the very first version of a Bochner inequality for the Dirichlet Laplacian on a convex subset of a Riemannian manifold, which surprisingly involves both the Dirichlet Laplacian and the Neumann Laplacian.
To overcome the lack of a triangle inequality for $W_p^0$, we now strive for a related length metric. In a first step, we define a (pseudo-)metric $W_p^\flat$, and out of this the induced length (pseudo-)metric $W_p^\sharp$.
Remark 1.5. Both $W_p^\flat$ and $W_p^\sharp$ are a priori only pseudo-metrics; the former is the biggest one below $W_p^0$, the latter the smallest intrinsic one above $W_p^\flat$. In what follows it will turn out, however, that both are indeed metrics and that they coincide for $p = 1$.
Definition 1.6.
i) $W'_p$ will denote the $L^p$-Kantorovich-Wasserstein distance on $\mathcal{P}_p(Y')$ induced by the distance $d'$.
ii) Extending each subprobability measure $\mu \in \mathcal{P}^{sub}(Y)$ to a probability measure $\mu' \in \mathcal{P}(Y')$ by $\mu' := \mu + (1 - \mu(Y))\,\delta_\partial$ induces a bijective embedding of $\mathcal{P}^{sub}(Y)$ into $\mathcal{P}(Y')$. The induced distance on $\mathcal{P}^{sub}(Y)$ will again be denoted by $W'_p$.
iii) For subprobability measures $\mu, \nu$ of equal mass we will also make use of the transportation cost induced by $d^\dagger$.
iv) Finally, for subprobabilities of equal mass define the $L^p$-transportation distance with respect to the meta-metric $d^*$ and let $W_p^*(\mu) := \frac{1}{2} W_p^*(\mu, \mu)$, which will be called the annihilation cost of the subprobability $\mu$.
Remark 1.7. Obviously, $W_p^*$ is symmetric in its arguments and satisfies the triangle inequality, but typically $W_p^*(\mu, \mu) \neq 0$. [FG]. From now on until the end of this subsection, assume that $(X, d)$ is a length space.
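A small numerical sketch may make the meta-metric behaviour concrete. It rests on assumptions that are not stated in the text above: $Y = (0,1) \subset \mathbb{R}$, the meta-metric is taken to be $d^*(x,y) = \inf_{z \in \partial Y} [d(x,z) + d(z,y)]$, the measures are uniform on finitely many atoms with total mass one, and $p = 1$. The helper names are hypothetical. For uniform weights an optimal coupling can be chosen as a matching, so brute force over permutations suffices for a handful of atoms.

```python
from itertools import permutations

def d_star(x, y):
    """Assumed meta-metric on Y = (0, 1): shortest path from x to y through the boundary {0, 1}."""
    return min(x + y, (1 - x) + (1 - y))

def w_star_1(xs, ys):
    """L^1 transport cost w.r.t. d_star between the uniform measures on the atoms xs and ys,
    computed by brute force over all matchings."""
    n = len(xs)
    assert n == len(ys)
    best = min(sum(d_star(x, ys[j]) for x, j in zip(xs, perm))
               for perm in permutations(range(n)))
    return best / n

def annihilation_cost(xs):
    """Half of the cost of transporting the measure onto itself through the boundary."""
    return 0.5 * w_star_1(xs, xs)

atoms = [0.2, 0.9]
print(w_star_1(atoms, atoms))      # strictly positive although both arguments agree
print(annihilation_cost(atoms))    # here: the mean distance of the atoms to the boundary
```

In this example each atom is matched with itself, so the annihilation cost is the average of the distances $0.2$ and $0.1$ to the boundary.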
Quite intuitive characterizations of $W_p^0(\mu, \nu)$, $W_p^\sharp(\mu, \nu)$, and $W'_p(\mu, \nu)$ are possible in terms of $L^p$-transportation costs and $L^p$-annihilation costs.
ii) More generally, for all $p \ge 1$ and $\mu, \nu \in \mathcal{P}_p^{sub}(Y)$ with $0$ denoting the subprobability measure with vanishing total mass.
In the case $p = 1$, contributions from the term $W_p^\dagger(\mu_2, \nu_2)^p$ can be avoided; in other words, one can always choose $\mu_2 = \nu_2 = 0$.
ii) More generally, for all $p \ge 1$ and all $\mu, \nu \in \mathcal{P}_p^{sub}(Y)$ Taking for $p > 1$, $n \ge 1$. In particular, the lower estimate for $W_p^\flat$ in assertion ii) of the previous Theorem is sharp.
A useful feature of $W_p^\sharp$ is that it metrizes vague convergence of subprobability measures.
Proposition 1.15. Assume that $X$ is a compact geodesic space. Then for every $p \in [1, \infty)$, $W_p^\sharp$ is a complete, separable, geodesic metric on $\mathcal{P}_p^{sub}(Y)$, and for $\mu_n, \mu \in \mathcal{P}_p^{sub}(Y)$ the following are equivalent:
(i) $\mu_n \to \mu$ vaguely on $Y$.
(ii) $W_p^\sharp(\mu_n, \mu) \to 0$ as $n \to \infty$.
Remark 1.16. In particular, this implies that $\mu_n \to \mu$ weakly on $Y$ if and only if $W_p^\sharp(\mu_n, \mu) \to 0$ and $\mu_n(Y) \to \mu(Y)$. A similar result for $W_p^0$ can be deduced even without requiring that $X$ is geodesic, see Lemma 4.4.
The implication "(ii)⇒(i)" holds true for all length spaces $X$ without requiring their compactness. For the converse, one has to add a condition on convergence of moments; see the remark following Lemma 4.4.
1.2. Gradient flow perspective and transportation estimates. From now on, let us be more specific. We assume that $(X, d, m)$ is a metric measure space which satisfies an RCD$(K, \infty)$-condition for some number $K \in \mathbb{R}$ and that $Y \subset X$ is a dense open subset with $m(\partial Y) = 0$. The RCD$(K, \infty)$-condition means that the metric measure space $(X, d, m)$ is infinitesimally Hilbertian with Ricci curvature bounded from below by $K$ in the sense of [S4], [LV]. The latter is formulated as $K$-convexity of the Boltzmann entropy $\mathrm{Ent}_m$ in $(\mathcal{P}_2(X), W_2)$. We will additionally request that this property extends to the space of charged probability measures induced by $Y$, that is, we will request that $(X, Y, d, m)$ satisfies the following:
Assumption 1.17 ("Charged Lower Ricci Bound $K$"). The Boltzmann entropy $\mathrm{Ent}_m$ is $K$-convex in $(\tilde{\mathcal{P}}_2(Y|X), \tilde{W}_2)$.
Remark 1.18. a) Note that, due to the isometric embedding of $\mathcal{P}_2(X)$ into $\tilde{\mathcal{P}}_2(Y|X)$, this assumption will imply the $K$-convexity of $\mathrm{Ent}_m$ in $(\mathcal{P}_2(X), W_2)$ and thus the CD$(K, \infty)$-condition for the metric measure space $(X, d, m)$. b) If $(X, d, m)$ is infinitesimally Hilbertian and if $m$ has full topological support, then Assumption 1.17 implies that $\overline{Y} = X$. Indeed, the argument from [RS] carries over to this framework and yields essential non-branching, which in turn implies the density of $Y$ in $X$.
The proofs of the following results will be given in Section 5. They will be based on concepts and results for gluing of metric measure spaces which will be presented in Section 3. For the various kinds of heat flows appearing from this section on, see Subsection 2.2.
Proposition 1.20. i) For each $\sigma_0 \in \tilde{\mathcal{P}}_2(Y|X)$, there exists a unique EVI$_K$-gradient flow $(\sigma_t)_{t>0}$ for the Boltzmann entropy $\mathrm{Ent}_m$ in $(\tilde{\mathcal{P}}_2(Y|X), \tilde{W}_2)$. ii) For each $\mu_0 \in \mathcal{P}_2^{sub}(Y)$, the heat flow $(\mu_t)_{t>0}$ on $Y$ with Dirichlet boundary conditions is obtained as the effective flow will denote the heat flow on $X$ starting in $\nu_0 = \sigma_0^+ + \sigma_0^-$ and $(\mu_t)_{t>0}$ will denote the heat flow on $Y$ with Dirichlet boundary conditions starting in $\mu_0 = \sigma_0^+ - \sigma_0^-$.
Remark 1.21. a) As in [S1, after Cor. 4.3, Thm. 4.4] (based on [AGS2, Prop. 3.2, Thm. 3.5]) one can extend the flow to measures without finite second moment. b) In the situation of Theorem 1.19, the "heat flow on $X$" will be the heat flow on $Y \subset M$ with Neumann boundary conditions at $\partial Y$.
Proposition 1.22. The EVI$_K$-flows $(\sigma_t)_{t>0}$ and $(\tau_t)_{t>0}$ as above are $K$-contractive in all $L^p$-transportation distances: W and $(\nu_t)_{t>0}$ denote the heat flows on $Y$ with Dirichlet boundary conditions starting in $\mu_0$ and $\nu_0$, resp.
Proof. Given $\mu_0, \nu_0 \in \mathcal{P}_p^{sub}(Y)$ and $\varepsilon > 0$, we may choose Thus, by the very definition of $W_p^0$ and by the previous proposition, Since $\varepsilon > 0$ was arbitrary, this proves the claim.

Gradient estimates and Bochner's inequality.
Let us continue to assume that $(X, d, m)$ is a metric measure space which satisfies an RCD$(K, \infty)$-condition and that $Y \subset X$ is a dense open subset with $m(\partial Y) = 0$. Assumption 1.17 yields a gradient estimate which involves both semigroups, $P_t$ (with Neumann boundary condition) and $P_t^0$ (with Dirichlet boundary condition). Before proving this estimate, we will see that it is equivalent to a Bochner inequality which involves the corresponding Laplace operators. To state directly the $p$-versions, let us introduce the appropriate function spaces. For $p \in [1, \infty)$ we set and similarly for $\mathcal{E}^0$ and $\Delta^0$, which are the Dirichlet form and generator associated to the heat flow $P_t^0$.
Proposition 1.25. Assume that $m(X) < \infty$. For each $p \in [1, 2]$, the following properties are equivalent to each other: Note that different semigroups appear on the left and right hand side.
The proof is an adaptation of the one of [H, Thm. 3.6].
Theorem 1.26. i) Assumption 1.17 implies that both properties (i) and (ii) of Proposition 1.25 are satisfied, even for all $p \in [1, \infty)$ and without the assumption that $m(X) < \infty$.
ii) Moreover, it implies that the flows from Proposition 1.20 and the heat semigroups are related to each other by $\nu_t = (P_t v)\,m$, $\mu_t = (P_t^0 w)\,m$ for $\nu_0 = v\,m \in \mathcal{P}_2(X)$ and $\mu_0 = w\,m \in \mathcal{P}_2^{sub}(X)$.
Corollary 1.27. Assume 1.17. Then for all $u : X \to \mathbb{R}$ and all $t > 0$
Proof. The $\mathrm{Lip}_d$-estimate follows from the previous gradient estimates (1.12) by taking the supremum norm. The $\mathrm{Lip}_{d'}$-estimate, on the other hand, follows via Kuwada duality from the transport estimate in Corollary 1.24 with $p = 1$.
Let us finally give a geometric characterization of Assumption 1.17. Given a metric measure
Remark 1.29. To our knowledge, the heat flow with Dirichlet boundary values has so far been investigated from an optimal transport perspective only in [FG], where the authors define a transportation distance between measures which allows mass to be created or destroyed at the boundary. This metric is a modification of our transportation metric $W'_2$ based on the shortcut metric $d'$, see Remark 1.9. This leads to a gradient flow description of the heat equation with strictly positive, constant Dirichlet boundary conditions. However, it does not apply to the study of the heat flow with vanishing Dirichlet boundary conditions. Further approaches to metrics on the space of finite Radon measures are given in [LMS, PR, KMV].
Structure of the paper: In Section 1 we introduced the setting of particles and antiparticles, giving definitions, stating the main results and giving proofs of those results which do not need the doubling. Section 2 deals with the heat flow on metric measure spaces. In particular, the heat flow with Dirichlet boundary values is discussed. In Section 3, gluing of metric measure spaces is introduced and the space of charged probability measures is identified with the space of probability measures on the doubled space. Section 4 is devoted to the detailed study of various (generalized) metrics on the space of probability measures. Finally, in Section 5, we present the remaining proofs of the results of Subsections 1.2 & 1.3.
In the sequel, the notion of a metric on a space $X$ will be crucial: it is a real-valued, symmetric function on $X \times X$ which satisfies the triangle inequality, vanishes on the diagonal and is positive otherwise. We will also use several extensions which satisfy all but one of the above properties:
• extended metric: the value $+\infty$ is also admitted
• pseudo-metric: may also vanish outside the diagonal
• meta-metric: not necessarily vanishing on the diagonal
• semi-metric: the triangle inequality is not required.
As we will encounter as many as nine generalized "$W$-metrics", let us give a short overview of where to find the definitions:
- $W'_p$, based on shortcut metric $d'$, (1.6)
- $W_p^\dagger$: transportation cost "over the boundary" on measures on $Y$ of the same mass, (1.7)
- $W_p^*$: annihilation cost; meta-metric on measures on $X$ of the same mass, (1.8)
- $\hat{W}_p$: Kantorovich-Wasserstein metric on $\mathcal{P}_p(\hat{X})$, Lemma 3.11

2. Metric measure spaces and heat flows
2.1. Gradients and Dirichlet forms. In this subsection we will introduce some notation and collect some results for Dirichlet forms on the original space $X$.
Let $(X, d)$ be a complete, separable, length metric space, and let $m$ be a Borel measure with full support $\operatorname{supp} m = X$, satisfying the exponential integrability condition The Cheeger energy of a function $f \in L^2(X, m)$ is defined as Here denotes the local Lipschitz constant of the function $f$. Functions $f \in \mathcal{F}$ have a weak gradient, i.e. a function $|\nabla f| \in L^2(X, m)$ such that $\mathrm{Ch}(f) = \frac{1}{2} \int_X |\nabla f|^2 \, dm$. In what follows, we always assume that $X$ is infinitesimally Hilbertian, meaning that $\mathrm{Ch}$ is a quadratic form. By polarisation of $\mathcal{E}(f) := 2\,\mathrm{Ch}(f)$ we get a strongly local Dirichlet form Thanks to the exponential integrability (2.1), the Cheeger energy is quasi-regular, cf. [S1, Thm. 4.1].
Given an open subset $Y \subset X$ with $m(\partial Y) = 0$, restricting to functions which vanish on $Z := X \setminus Y$ quasi-everywhere, we get another Dirichlet form, corresponding to homogeneous Dirichlet "boundary values" on $Z$: where $\tilde{f}$ is the quasi-continuous representative of $f$. By general Dirichlet form theory, a symmetric, strongly continuous contraction semigroup on $L^2(X, m)$ is associated with each Dirichlet form. Thus we have a semigroup $(P_t)_{t>0}$ associated with $(\mathcal{E}, D(\mathcal{E}))$ and another one $(P_t^0)_{t>0}$ associated with $(\mathcal{E}^0, D(\mathcal{E}^0))$. They are related to the Dirichlet forms in the following way: For functions $f, g \in L^2(X, m)$ define the approximated Then we can recover the corresponding Dirichlet form in the following way (see [FOT, Lemma 1.3.4]): Further, for $f \in L^2(X, m)$ the map $t \mapsto \mathcal{E}_t(f, f)$ is non-increasing and non-negative. The same is true for $P_t^0$ and $(\mathcal{E}^0, D(\mathcal{E}^0))$.

2.2. Heat flows. Let us clarify the different heat flows. We have the "usual" heat flow and the one with Dirichlet boundary values, and to each a corresponding "dual" flow for measures.
Heat flow $P_t$ for functions on $X$. The heat flow $(t, u_0) \mapsto u_t = P_t u_0$ is defined by means of the semigroup in $L^2(X, m)$ corresponding to the Dirichlet form $(\mathcal{E}, D(\mathcal{E}))$.
Heat flow $\mathscr{P}_t$ for probability measures on $X$. From now on we additionally assume that $(X, d, m)$ is an RCD$^*(K, \infty)$ space. In this case, there is a Brownian motion $(B_t, \mathbb{P}_x)$ on $X$ and corresponding to this a Markov kernel $p_t(x, A) = \mathbb{P}_x(B_t \in A)$ (and even a heat kernel), all corresponding to the Dirichlet form $\mathcal{E}$, see [AGMR, Sections 7.1, 7.2]. We use it to define the heat flow for probability measures: for $\mu \in \mathcal{P}(X)$ let This coincides with the EVI$_K$-flow of the entropy in $(\mathcal{P}_2(X), W_2)$. Since the Brownian motion is connected to the Dirichlet form $\mathcal{E}$ uniquely, we get the following formula for the heat flow on functions through the Markov kernel The heat semigroups $\mathscr{P}_t$ and $P_t$ are dual in the following sense: for $f : X \to \mathbb{R}$ bounded Borel and $\mu \in \mathcal{P}(X)$ we have $\int f \, d(\mathscr{P}_t \mu) = \int P_t f \, d\mu$ (2.4). The same applies to the heat flows $\hat{\mathscr{P}}_t$ and $\hat{P}_t$ on $\hat{X}$ (to be discussed in detail in the next section) and the equivalent flow $\tilde{\mathscr{P}}_t$ on $\tilde{\mathcal{P}}(Y|X)$, defined by means of the isometry introduced in Lemma 3.11.
Heat flow with Dirichlet boundary values on $Y$. Let $Y \subset X$ be open with $m(\partial Y) = 0$. Let us define a stopping time where as before $Z := X \setminus Y$. Then we can define a Markov kernel . Note that we use Fukushima's convention that a Markov kernel is a subprobability on $X$; in particular, $p_t^0(x, A) \le p_t(x, A)$. This Markov kernel is associated to the Dirichlet form $(\mathcal{E}^0, D(\mathcal{E}^0))$ given by (2.2), see [FOT, Thm. 4.4.2]. With this we can define the heat flows for bounded Borel functions $f : X \to \mathbb{R}$ and measures $\mu \in \mathcal{P}^{sub}(X)$ as and They also satisfy the duality relation (2.4).
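The domination of the killed kernel by the full kernel and the duality between the flows on functions and on measures can be observed in a discrete toy model. The following is a minimal sketch under assumptions not made in the paper: $Y = (0,1)$ is cut into equal cells with uniform reference measure, the Neumann flow is a reflected nearest-neighbour walk, and the Dirichlet flow is its odd-reflection counterpart. Because both step operators are symmetric in this model, the same routine is applied to functions and to measures. All names are hypothetical.

```python
def interval_step(u, a, ghost_sign):
    """One step on the interval with a ghost cell across each endpoint.
    ghost_sign = +1: even reflection (Neumann); -1: odd reflection (Dirichlet)."""
    n = len(u)
    out = []
    for k in range(n):
        left = u[k - 1] if k > 0 else ghost_sign * u[0]
        right = u[k + 1] if k < n - 1 else ghost_sign * u[n - 1]
        out.append((1 - 2 * a) * u[k] + a * (left + right))
    return out

def evolve(u, a, steps, ghost_sign):
    for _ in range(steps):
        u = interval_step(u, a, ghost_sign)
    return u

def pair(g, nu):
    """Integral of the function g against the measure nu."""
    return sum(x * y for x, y in zip(g, nu))

a, steps = 0.25, 15
f = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]        # bounded non-negative test function
mu = [0.0, 0.5, 0.0, 0.0, 0.25, 0.0]      # subprobability of total mass 0.75

Pf, P0f = evolve(f, a, steps, +1), evolve(f, a, steps, -1)
Pmu, P0mu = evolve(mu, a, steps, +1), evolve(mu, a, steps, -1)

print(all(x <= y + 1e-15 for x, y in zip(P0f, Pf)))   # Dirichlet flow is dominated
print(pair(f, Pmu) - pair(Pf, mu))                    # duality, Neumann case
print(pair(f, P0mu) - pair(P0f, mu))                  # duality, Dirichlet case
print(sum(Pmu), sum(P0mu))                            # mass conserved vs. mass lost
```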
Remark 2.1. With the help of the Markov kernels, all of these heat flows of measures can be extended to signed, finite Borel measures.

Gluing
In this section we glue together a finite number of copies of an open subset in a metric measure space "along the boundary". We will identify the Cheeger energy and the heat semigroup of the glued space in terms of the original objects.
Beginning with Alexandrov in the 40s, gluing has been studied in connection with curvature bounds a number of times, but mostly in Alexandrov spaces, see [A,"Verheftungssatz" Kap. IX,§3], [P4,Chapter I,§11], [P2, §5], [P3,Theorem 2.1], [K,Theorem 1.1]. More recently, Schlichting [S3, S2] applied the method of [K] to show preservation of various curvature bounds (among them Ricci curvature) on manifolds in an approximate sense which we will use later to give the Riemannian case as an example. In [P1], metric measure spaces supporting Dirichlet forms are glued together. There is also a very recent preprint by Rizzi which shows that gluing does not preserve the dimension in the measure-contraction property [R]. Apart from curvature bounds, the doubling of manifolds with boundary has also been applied by other communities to produce a related manifold without boundary, see for instance [AB].
3.1. Gluing of metric measure spaces. Take an open subset $Y \subset X$ and denote $Z := X \setminus Y$. Fix a number $k \in \mathbb{N}$. We now consider $k$ copies of $X$, denoted by $X_1, \dots, X_k$. We will identify these spaces with the original one via maps $\iota_i : X \to X_i$, $i = 1, \dots, k$, which send points $x \in X$ to the corresponding points in $X_i$. Each $X_i$ is equipped with the metric $d_i := d \circ (\iota_i^{-1}, \iota_i^{-1})$ and the measure $m_i := (\iota_i)_\# m$, but in this section we usually suppress the indices and write $d$ and $m$ on every We define an equivalence relation by identifying the points in the $Z_i$'s: The $k$-gluing of $X$ along $Z$ is now obtained as the quotient of the disjoint union of the $X_i$ under this equivalence relation, $\hat{X} := \big(\bigsqcup_{i=1}^k X_i\big) / \sim$. We can view $X_i$ as a subset of $\hat{X}$, since the canonical map $\bigsqcup_i X_i \to \hat{X}$ restricted to $X_i$ is injective. In the following, we will also make use of the partition As a measure we use $\hat{m} := \frac{1}{k} \sum_{i=1}^k m_i$, meaning that for a Borel set $A \subset \hat{X}$, we consider the restrictions to the copies and set $\hat{m}$ This turns $\hat{X}$ into a metric measure space. For the special case of gluing together only two copies, we call the resulting space the doubling of $Y$ in $X$, and as indices we will use $i \in \{+, -\}$.
Proposition 3.1. The space $(\hat{X}, \hat{d})$ is a complete and separable length space, and the measure $\hat{m}$ is Borel.
If additionally $X$ is geodesic and $Z$ is proper (i.e. all closed balls are compact), then $\hat{X}$ is geodesic.
The metric properties directly transfer to the Wasserstein space, see for instance [V].
Corollary 3.2. For $p \in [1, \infty)$, the Kantorovich-Wasserstein metric $\hat{W}_p$ obtained from $\hat{d}$ is a complete, separable length metric on $\mathcal{P}_p(\hat{X})$.
Now we introduce some notation for dealing with functions on $\hat{X}$. For us it will be useful to consider the functions $u_i : X_i \to \mathbb{R}$ given by $u_i := u|_{X_i}$. We consider the mean value $\bar{u} : X \to \mathbb{R}$, $\bar{u} := \frac{1}{k} \sum_{i=1}^k u_i \circ \iota_i$, and the "mean free" functions $\mathring{u}_i := u_i \circ \iota_i - \bar{u}$. Observe that since the $u_i$ all coincide on $Z$, the $\mathring{u}_i$ are zero everywhere on $Z$. Also, we have
Notation: During the proof of Lemma 3.7 we will start to simplify notation by mostly omitting the identification maps $\iota_i$. Whenever a function $u_i$ now gets an argument from $X$, it is understood as $u_i \circ \iota_i$, and similar for $u$,
Proof. Being in $D(\hat{\mathcal{E}})$ means $\widehat{\mathrm{Ch}}(u) < \infty$. By the previous lemma, this implies Since each term is non-negative, $\mathrm{Ch}(u_i \circ \iota_i) < \infty$ for every $i = 1, \dots, k$. Thus $u_i \circ \iota_i \in D(\mathcal{E})$, and also the linear combination $\bar{u} \in D(\mathcal{E})$. The other assertion follows from the fact that all the $u_i$'s coincide on $Z$.
Now we are going to define a semigroup on $\hat{X}$ and we will show that it actually is the one corresponding to $\hat{\mathcal{E}}$.
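Definition 3.5. The glued semigroup $P_t^{GL} : L^2(\hat{X}, \hat{m}) \to L^2(\hat{X}, \hat{m})$ is defined by $P_t^{GL} u := P_t \bar{u} + P_t^0 \mathring{u}_i$ on $X_i$, $i = 1, \dots, k$. Also, define the approximated glued Dirichlet form $\mathcal{E}_t^{GL} : L^2(\hat{X}, \hat{m}) \times L^2(\hat{X}, \hat{m}) \to \mathbb{R}$,
Remark 3.6. Observe that $P_t^{GL}$ is well-defined, since $u_i = u_j$ on $Z$ for every $i, j = 1, \dots, k$.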
Lemma 3.7. $(P_t^{GL})_{t>0}$ is a symmetric, strongly continuous contraction semigroup on $L^2(\hat{X}, \hat{m})$. In particular, there is a corresponding Dirichlet form $(\mathcal{E}^{GL}, D(\mathcal{E}^{GL}))$ connected to $P^{GL}$
Proof. Symmetry: We use that $P_t$ and $P_t^0$ are symmetric with respect to $m$: From now on we will apply the abuse of notation introduced before, in order to improve readability.
Semigroup property: First observe that on $X_i$ we have $P_0^{GL} u = P_0 \bar{u} + P_0$
Contraction: To show the contraction property in $L^2(\hat{X}, \hat{m})$, we first show that $P_t^{GL}$ is Markovian (i.e. positivity preserving and $L^\infty$-contractive in $L^2 \cap L^\infty$). By symmetry of $P_t^{GL}$, we also get $L^1$-contractivity. Using the Riesz-Thorin interpolation theorem, we finally get contractivity in $L^2$. Let $u \in L^2 \cap L^\infty(\hat{X}, \hat{m})$ with $0 \le u \le 1$. Then also $0 \le u_i, \bar{u} \le 1$. Then, on $X_i$, For the other side, we have to show $P_t^{GL} u \ge 0$, which is equivalent to $P_t^0 \bar{u} \le P_t \bar{u} + P_t^0 u_i$. But this holds true because $P_t^0 f \le P_t f$ for every non-negative $f \in L^2$, and $P_t^0 u_i \ge 0$. Now we use that $L^1$ is a subspace of the dual of $L^\infty$. For $u \in L^1 \cap L^2(\hat{X}, \hat{m})$, consider the bounded, linear functional $\ell : L^\infty(\hat{X}, \hat{m}) \to \mathbb{R}$, $\ell(v) := \int_{\hat{X}} v \, P_t^{GL} u \, d\hat{m}$. The dual space norm of $\ell$ coincides with the $L^1$-norm of $P_t^{GL} u$, thus Here we used the symmetry of $P_t^{GL}$ and the $L^\infty$-contractivity. Hence $P_t^{GL}$ is a contraction in $L^1 \cap L^2$ and also in $L^\infty \cap L^2$. By the Riesz-Thorin interpolation theorem, it is then also a contraction in $L^2$.
Strong continuity: This follows directly from the strong continuity of $P_t$ and $P_t^0$:
Lemma 3.8. For every $u, v \in L^2(\hat{X}, \hat{m})$:
Proof. We just compute
Lemma 3.9. If $u \in D(\mathcal{E}^{GL})$, then $\bar{u} \in D(\mathcal{E})$ and $\mathring{u}_i \in D(\mathcal{E}^0)$, $i = 1, \dots, k$.

Proof. By definition and (3.3),
Since the sum converges and every term is non-negative and non-decreasing as $t \to 0$, the terms converge and we can interchange sum and limit to get
Now we come to the main theorem of this section, which identifies the semigroup $P_t^{GL}$ with the heat semigroup $\hat{P}_t$ associated to $\hat{\mathcal{E}}$.
Theorem 3.10. The semigroups $P_t^{GL}$ and $\hat{P}_t$ coincide on $L^2(\hat{X}, \hat{m})$.
Proof. We will prove that the Dirichlet forms $(\mathcal{E}^{GL}, D(\mathcal{E}^{GL}))$ and $(\hat{\mathcal{E}}, D(\hat{\mathcal{E}}))$ coincide. Let $u, v \in D(\hat{\mathcal{E}})$. By Lemma 3.8, By Lemma 3.4, $\bar{u}, \bar{v} \in D(\mathcal{E})$ and , so that we can take the limit $t \to 0$. This yields where we used that $\mathcal{E}$ is an extension of $\mathcal{E}^0$. This also shows that $D(\hat{\mathcal{E}}) \subset D(\mathcal{E}^{GL})$. The other direction works with the same argument but using Lemma 3.9 instead.
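The identification of the glued semigroup with the heat semigroup of the glued space can be tested in a discrete toy model for $k = 2$. The sketch below is an illustration under assumptions that are not part of the paper: $Y = (0,1)$ is cut into $N$ cells, the doubling is the cycle of $2N$ cells, $P_t$ and $P_t^0$ are replaced by one step of the Neumann and Dirichlet (odd-reflection) schemes, and the formula "Neumann step on the mean plus Dirichlet step on the mean free part" is used for the glued step. Since $Z$ carries no cells here, the compatibility condition on $Z$ is void. All names are hypothetical.

```python
def cycle_step(s, a):
    """One step of the symmetric nearest-neighbour walk on the cycle (the doubled space)."""
    n = len(s)
    return [(1 - 2 * a) * s[k] + a * (s[k - 1] + s[(k + 1) % n]) for k in range(n)]

def interval_step(u, a, ghost_sign):
    """ghost_sign = +1: Neumann step; ghost_sign = -1: Dirichlet (odd reflection) step."""
    n = len(u)
    out = []
    for k in range(n):
        left = u[k - 1] if k > 0 else ghost_sign * u[0]
        right = u[k + 1] if k < n - 1 else ghost_sign * u[n - 1]
        out.append((1 - 2 * a) * u[k] + a * (left + right))
    return out

def glued_step(sheets, a):
    """Glued step: Neumann step on the mean plus Dirichlet step on each mean free part."""
    k, n = len(sheets), len(sheets[0])
    mean = [sum(u[j] for u in sheets) / k for j in range(n)]
    mean_new = interval_step(mean, a, +1)
    out = []
    for u in sheets:
        free_new = interval_step([u[j] - mean[j] for j in range(n)], a, -1)
        out.append([x + y for x, y in zip(mean_new, free_new)])
    return out

def sq_norm(sheets):
    """Squared L^2 norm with respect to the averaged measure on the two sheets."""
    return sum(x * x for u in sheets for x in u) / (len(sheets) * len(sheets[0]))

a, steps = 0.2, 7
u_plus, u_minus = [1.0, 4.0, 2.0, 0.0, 3.0], [2.0, -1.0, 0.0, 5.0, 1.0]
N = len(u_plus)

sheets = [u_plus, u_minus]
on_cycle = u_plus + u_minus[::-1]       # lower sheet stored in mirrored order
norms = [sq_norm(sheets)]
for _ in range(steps):
    sheets = glued_step(sheets, a)
    on_cycle = cycle_step(on_cycle, a)
    norms.append(sq_norm(sheets))

from_cycle = [on_cycle[:N], on_cycle[N:][::-1]]
gap = max(abs(x - y) for s, c in zip(sheets, from_cycle) for x, y in zip(s, c))
print(gap)      # numerically zero: both evolutions agree
print(norms)    # non-increasing, reflecting the L^2 contraction
```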
3.2. Identification of $\tilde{\mathcal{P}}(Y|X)$ and $\mathcal{P}(\hat{X})$. We will show how the space of charged measures $\tilde{\mathcal{P}}(Y|X)$ can be identified with the space of probability measures on the glued space, $\mathcal{P}(\hat{X})$. Since we only look at two copies of $Y \subset X$, we index the different copies by $Y^+$ and $Y^-$ instead of the numerical indices in the previous subsection. Still, $Z := X \setminus Y$ and $\hat{X} = (X^+ \sqcup X^-) / \sim$. As we are now dealing with measures which are not equal on the different copies of $X$, in this section we do keep track of the identification maps $\iota_i$, $i \in \{+, -\}$. Every subset used in this section is assumed to be a Borel-measurable set in the space it is taken from.
The proof is straightforward and left to the reader. The isometry allows one to deduce a representation of the heat flow of charged measures in terms of the heat flows of their effective and total measures.
Lemma 3.12. Let $\sigma \in \tilde{\mathcal{P}}(Y|X)$. Then
Proof. We do the calculation in the equivalent setting of the doubled space $\hat{X}$. Let $\hat{\sigma} \in \mathcal{P}(\hat{X})$. Then We relied heavily on the fact that we glue together copies of the same space, making it possible to "switch" indices when necessary.
Lemma 3.13. Assumption 1.17 in $\tilde{\mathcal{P}}_2(Y|X)$ is satisfied if and only if the entropy $\widehat{\mathrm{Ent}}$ is $K$-convex in $\mathcal{P}_2(\hat{X})$ (i.e. $\hat{X}$ is an RCD$^*(K, \infty)$ space).
Proof. Let $\hat{\sigma} \in \mathcal{P}_2(\hat{X})$ with $\hat{\sigma} = \xi \hat{m}$. We will show that the entropy of $\hat{\sigma}$ in $\mathcal{P}_2(\hat{X})$ equals that of $\Psi(\hat{\sigma})$ in $\tilde{\mathcal{P}}_2(Y|X)$ up to an additive constant; the result then follows by Lemma 3.11 and the fact that $K$-convexity is preserved when a constant is added to the functional. We have On the other hand, to compute $\mathrm{Ent}(\Psi(\hat{\sigma}))$, let us first identify the density of $\Psi(\hat{\sigma})^i$ with respect to $m$: For a Borel-measurable set $A \subset X$ $= \widehat{\mathrm{Ent}}(\hat{\sigma}) + \log \frac{1}{2}$.
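The additive constant $\log \frac{1}{2}$ can be checked numerically on a finite model. The sketch below relies on assumptions that are not spelled out in the surviving text: the reference measure on the doubled space is $\hat{m} = \frac{1}{2}(m_+ + m_-)$, the set $Z$ carries no mass, the two components of the charged measure have densities $\xi_\pm / 2$ with respect to $m$, and the entropy of a charged measure is the sum of the entropies of its two components. The variable names are hypothetical.

```python
import math

def entropy(density, weights):
    """Sum of rho * log(rho) * weight, with the convention 0 * log(0) = 0."""
    return sum(r * math.log(r) * w for r, w in zip(density, weights) if r > 0)

m = [0.1, 0.2, 0.3, 0.4]                 # reference measure on four cells of Y
raw_plus = [1.0, 3.0, 0.5, 2.0]          # unnormalised density on the upper sheet
raw_minus = [2.0, 0.0, 1.0, 1.5]         # unnormalised density on the lower sheet

# Normalise so that xi is a probability density w.r.t. m_hat = (m_+ + m_-) / 2.
mass = sum((p + q) * w / 2 for p, q, w in zip(raw_plus, raw_minus, m))
xi_plus = [p / mass for p in raw_plus]
xi_minus = [q / mass for q in raw_minus]

# Entropy on the doubled space: integrate xi * log(xi) against m_hat.
ent_glued = 0.5 * (entropy(xi_plus, m) + entropy(xi_minus, m))

# Entropy of the charged measure: each component has density xi / 2 w.r.t. m.
ent_charged = entropy([x / 2 for x in xi_plus], m) + entropy([x / 2 for x in xi_minus], m)

print(ent_charged - ent_glued, math.log(0.5))   # both values agree
```

Since the two functionals differ only by a constant, convexity properties of one transfer to the other, which is the step used in the proof.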
Proof of Lemma 1.1. This is an immediate consequence of the isometry between $\tilde{\mathcal{P}}_p(Y|X)$ and $\mathcal{P}_p(\hat{X})$, together with Lemma 3.1.
Every coupling of the charged probability measures (µ + ρ, ρ) and (ν + η, η) induces a decomposition of each of the involved measures into three parts. This leads to another, more detailed description of the transportation costs from above.
The decompositions implicitly require the coupled measures to have the same mass, so for instance µ 1 (X) = ν 1 (X) etc.
The proof consists in using again the isometry between $\tilde{\mathcal{P}}_p(Y|X)$ and $\mathcal{P}_p(\hat{X})$ and disintegrating the appearing measures. In the case $p = 1$, a more explicit description is possible.
Proof. The "≤"-direction follows from the previous Lemma by choosing the decomposition $\rho$ For the second inequality, we used in the case $p = 1$ simply the fact that $\rho_1^+ = \rho_1^-$, $\eta_1^+ = \eta_1^-$ and $\inf_{\eta_1^+,\ \mu_2 + \mu_3 = \mu_0}$ The case $p > 1$ requires a more sophisticated argument using optimal transport in the glued space $\hat{X} = (X \setminus Y) \cup Y^+ \cup Y^-$. We freely switch between equivalent representations in $(\tilde{\mathcal{P}}_p(Y|X), \tilde{W}_p)$ and in $(\mathcal{P}_p(\hat{X}), \hat{W}_p)$. Assume for simplicity that $(X, d)$ is geodesic. (For general length spaces, one has to use approximation arguments based on almost geodesics.) Given a $\tilde{W}_p$-geodesic $(\sigma_t)_{t \in [0,1]}$ connecting $\sigma_0 := (\mu_0, 0)$ and $\sigma_1 := (0, \mu_0)$, we decompose it into two
To prove the "≥"-inequality, we assume for simplicity that minimizers in the definition of $W_1^0$ exist. This is for instance the case when $X$ is compact. For the general case one has to work with almost-minimizers. Let subprobabilities $\mu$ and $\nu$ be given as well as $\rho$ and $\eta$ with $(\mu + 2\rho)(X) = 1$, $(\nu + 2\eta)(X) = 1$ such that where for the last identity we switched to the picture of the glued space $\hat{X} = (X \setminus Y) \cup Y^+ \cup Y^-$ with subprobabilities $\mu, \nu, \rho, \eta$ on the "upper" sheet $(X \setminus Y) \cup Y^+$ and their copies $\rho', \eta'$ on the "lower" sheet $(X \setminus Y) \cup Y^-$. We further assume for the moment that all masses are rational numbers. Given $\varepsilon > 0$, choose $n, n_1, n_2 \in \mathbb{N}$ and $x_i, y_i, u_i, v_i \in X^+$ for $i = 1, \dots, n$ such that and similarly for $v_i'$. (To avoid ambiguity, we may assume that the sets $\{x_i\}$ and $\{y_i\}$ are disjoint from each other.) In particular we have $\frac{n_1}{n} = \rho(X)$ and so on. Now fix a $\hat{W}_1$-optimal coupling $q_n$ of $\mu_n + \rho_n + \rho_n'$ and $\nu_n + \eta_n + \eta_n'$ on $\hat{X}$. Without restriction, we can choose this coupling $q_n$ as a matching (i.e. it does not split mass), that is, Now consider chains of (pairwise disjoint) pairs in $Q_n$ with either initial points or endpoints of subsequent pairs being conjugate to each other.
These chains of maximal length will be of the form
Case 4: $(z_1, w_1), (z_1', w_2'), (z_2, w_2), \dots,$
Case 5: $(z_1, w_1), (z_2', w_1'), (z_2, w_2), \dots, (z_1', w_{k-1}')$
Case 6: $(z_1, w_1), (z_1', w_2'), (z_2, w_2), \dots, (z_{k-1}', w_1')$
with $z_i, z_i' \in Z$, $w_i, w_i' \in W$ and $z \mapsto z'$ denoting the "conjugation map" which switches between upper and lower sheet. In particular, $(z')' = z$. Now let us have a closer look at the previous six cases of chains of maximal length.
Case 1: Maximality implies $z_1 \in \{x_i\}$ and $w_k \in \{y_i\}$ whereas all the other points in between The transportation cost associated with this chain is at least $d(z_1, w_1) + \hat{d}(w_1', z_2') + \dots + \hat{d}(z_k, w_k) \ge \hat{d}(z_1, w_k) = d(z_1, w_k)$ and thus is bounded from below by the cost of the direct transport between the endpoints.
Denote by $X_1 \subset \{x_i\}$ the set of $z_1$ in case 1 and by $Y_1 \subset \{y_i\}$ the set of $w_k$. Let Then the transport costs arising from all pairs contained in any chain of case 1 are bounded from below by $W_1(\mu_n^1, \nu_n^1)$.
Case 2: This is just a relabeling of case 1 with indices running in reverse order. No additional costs arise.
Case 3: Here, maximality implies $z_1 \in \{x_i\}$ and also $z_k' \in \{x_i\}$. Thus at least one of the pairs in the chain consists of points from two different sheets. With the triangle inequality on $\hat{X}$, we conclude that the cost of this chain is at least $d^*(z_1, z_k')$. Denote by $X_0 \subset \{x_i\}$ the set of $z_1$ in case 3. Note that this set coincides with the set of $z_k'$ (just by reverting the chain), but for calculating the cost induced by the coupling $q_n$, only one of the pairs $(z_1, z_k')$ and $(z_k', z_1)$ has to be taken into account. Let Then the transport costs arising from all pairs contained in any chain of case 3 are bounded from below by $\frac{1}{2} W_1^*(\mu_n^0, \mu_n^0)$.
Case 4: Similarly, here we conclude $w_1 \in \{y_i\}$ as well as $w_k' \in \{y_i\}$ and that the cost of the chain is at least $d^*(w_1, w_k')$. Denote by $Y_0 \subset \{y_i\}$ the set of $w_1$ in case 4 and set Then the transport costs arising from all pairs contained in any chain of case 4 are bounded from below by $\frac{1}{2} W_1^*(\nu_n^0, \nu_n^0)$.
Case 5: The cyclic chains in this case will produce superfluous costs which will vanish for optimal choices of measures $\rho_n, \eta_n$. That is, 0 is the best lower estimate for the transport costs arising from all pairs contained in any chain of case 5. This infimum will be attained by chains of length $k = 2$ of the form $(z_1, w_1), (z_1', w_1')$ with $z_1 = w_1$.
Case 6: This is a cyclic permutation of case 5. No additional costs arise.
Summarizing, we obtain Now for given $\varepsilon$ and $n$, the decomposition $\mu_n = \mu_n^1 + \mu_n^0$ induces via the optimal coupling of $\mu_n$ and $\mu$ a decomposition $\mu = \mu^1 + \mu^0$ such that Similarly, for $\nu_n = \nu_n^1 + \nu_n^0$ and $\nu = \nu^1 + \nu^0$. Thus we finally obtain $W_1^0(\mu, \nu) = \hat{W}_1(\mu + \rho + \rho', \nu + \eta + \eta') \ge \hat{W}_1(\mu_n + \rho_n + \rho_n', \nu_n + \eta_n + \eta_n') - 6\varepsilon$ Since $\varepsilon > 0$ was arbitrary, this proves the claim.
For the general case of real masses, one can approximate Borel measures by sums of Dirac measures (with rational masses) in the weak topology. By continuity of $\hat{W}_1$, $W_1$ and $W_1^*$ with respect to weak convergence, one can apply the rational case and go to the limit in (4.2).
Proof of Lemma 1.10. Assertions (i) and (ii) are the content of the previous Lemma. The proof for the decomposition in assertion (iv) is straightforward. For the vanishing of the W † p -term in the case p = 1 note that in this case Assertion (iii) will follow from combining assertion (iv), Lemma 1.11 and Theorem 1.13(i).
Proof of Lemma 1.11. We switch to the picture of two glued copies. Given $\mu, \nu \in \mathcal{P}(Y)$, consider them as $\mu \in \mathcal{P}(Y^+)$ and $\nu \in \mathcal{P}(Y^-)$ and fix a $\hat{W}_1$-optimal coupling $q$ of them.
To simplify the presentation, let us first discuss the argument if $\hat{X}$ is a geodesic space. Choose a measurable selection of connecting $\hat{d}$-geodesics $\Gamma : \hat{X} \times \hat{X} \to \mathrm{Geo}(\hat{X})$. For a geodesic $\gamma$ in $\hat{X}$ Define a probability measure $\xi = Z_\# q$ via push forward of the optimal coupling. Then this is a $\hat{W}_1$-intermediate point of $\mu$ and $\nu$. Indeed, for the transport from $\mu$ to $\nu$, the pair $x \in Y^+$, $y \in Y^-$ contributes the cost $d^*(x, y)$. The fraction $\alpha(x, y) \cdot d^*(x, y)$ contributes to the cost of the transport from $\mu$ to $\xi$, and the fraction $(1 - \alpha(x, y)) \cdot d^*(x, y)$ contributes to the cost of the transport from $\xi$ to $\nu$. Now let us discuss the general case of a length space $X$. Instead of geodesics, we now choose approximate $\hat{d}$-geodesics. With the same construction, $\xi$ will then be an approximate $\hat{W}_1$-intermediate point. This proves the claim in the case $p = 1$.
To prove the claim for $p > 1$, for simplicity we assume that $X$ is compact. (This will guarantee the existence of the map $\Phi$ to be introduced below. Otherwise, one has to use approximation arguments.) For each $\xi \in \mathcal{P}(\partial Y)$ and each $W_p$-optimal coupling $q$ of $\mu$ and $\xi$ To deduce the converse inequality, choose a measurable $\Phi : Y \to \partial Y$ such that for each $x \in Y$ the point $\Phi(x)$ is a minimizer of $z \mapsto d(x, z)$ on $\partial Y$. Define a probability measure $\xi = \Phi_\# \mu$. Then This proves that $W'_p(\mu, 0) = \inf\{W_p(\mu, \xi) : \xi \in \mathcal{P}(\partial Y)\}$. Moreover, the triangle inequality for $d^*$ implies that $W_p(\mu, \xi) + W_p(\xi, \mu) \ge W_p^*(\mu, \mu)$ for all $\xi \in \mathcal{P}(\partial Y)$. Thus $W'_p(\mu, 0) \ge W_p^*(\mu)$. An estimate in the other direction is obtained as follows where $q$ denotes any $W_p^*$-optimal coupling of $\mu$ and $\mu$.
Proof of Theorem 1.13. (i) For simplicity of the presentation we assume that length minimizing geodesics exist. This is for instance the case when $Y'$ is geodesic. In this case there exist $W'_1$-geodesics which are supported on $d'$-geodesics. For the general case one has to work with almost-geodesics.
Hence ρ * = η * and in particularμ * = µ * . This way we see that every subsequence of µ (n) has a further subsequence which converges to µ * , so that also the whole sequence converges to µ * .

Proofs for Subsections 1.2 & 1.3
Proof of Proposition 1.20. This will follow from the identification with the glued space and the properties shown in Subsection 3.1, in particular Theorem 3.10. Let us provide the details. i) Given $\sigma_0 \in \tilde{\mathcal{P}}(Y|X)$, consider $\hat{\sigma} := \Phi(\sigma_0) \in \mathcal{P}(\hat{X})$, with the isometry $\Phi$ given in Lemma 3.11. Since $\hat{X}$ is an RCD$^*(K, \infty)$ space by Assumption 1.17 and Lemma 3.13, the EVI$_K$-gradient flow $\hat{\sigma}_t \in \mathcal{P}(\hat{X})$ starting in $\hat{\sigma}$ exists. Again by the identification of the entropies in Lemma 3.13, the flow $\sigma_t := \Psi(\hat{\sigma}_t)$ is the EVI$_K$-gradient flow of $\mathrm{Ent}$ in $\tilde{\mathcal{P}}(Y|X)$.
Proof of Theorem 1.26. i) Under Assumption 1.17, $\hat{X}$ is an RCD$^*(K, \infty)$ space and hence satisfies a gradient estimate with $p = 2$. By [S1, Cor. 4.3] we have the improved gradient estimate for $p \in [1, 2]$, and by Jensen's inequality one easily obtains the gradient estimate for $p > 2$ from that. Now we take a function $f \in D(\mathcal{E}^0)$ and define $u := f$ on $X^+$ and $u := -f$ on $X^-$.
Lemma 5.2. Let $(g_\varepsilon)_{\varepsilon > 0}$ be a sequence of smooth Riemannian metrics and $g$ a continuous Riemannian metric on a compact, smooth manifold $M$. If $g_\varepsilon \to g$ uniformly as $\varepsilon \to 0$, then $(M, d_\varepsilon, m_\varepsilon) \to (M, d, m)$ in the measured Gromov-Hausdorff sense, where $d_\varepsilon, m_\varepsilon$ and $d, m$ are the distance functions and volume measures obtained by $g_\varepsilon$ and $g$, respectively.
This seems to be well-known. We leave its straightforward proof to the reader.
Proof of Theorem 1.19. As a Riemannian manifold with lower Ricci curvature bound K, M is an RCD * (K, ∞) space. As a convex subset, also Y with the restricted distance and measure is an RCD * (K, ∞) space. Now Assumption 1.17 is satisfied by identification of the entropies in Lemma 3.13, since the doubling of the manifold is an RCD * (K, ∞) space by Theorem 5.1.