Heat flow with Dirichlet boundary conditions via optimal transport and gluing of metric measure spaces

We introduce the transportation-annihilation distance $W_p^\sharp$ between subprobabilities and derive contraction estimates with respect to this distance for the heat flow with homogeneous Dirichlet boundary conditions on an open set in a metric measure space. We also deduce the Bochner inequality for the Dirichlet Laplacian as well as gradient estimates for the associated Dirichlet heat flow. For the Dirichlet heat flow, moreover, we establish a gradient flow interpretation within a suitable space of charged probabilities. In order to prove this, we will work with the doubling of the open set, the space obtained by gluing together two copies of it along the boundary.


Introduction and statement of main results
We present an approach to heat flow with homogeneous Dirichlet boundary conditions via optimal transport (indeed, the very first one), based on a novel particle interpretation for this evolution. The classical particle interpretation for the heat flow in an open set Y with Dirichlet boundary condition is based on particles which move around in Y and are killed (or lose their mass) as soon as they hit the boundary ∂Y. Our new interpretation will be based on particles moving around in Y, which are reflected if they hit the boundary, and which thereby randomly change their "charge": half of them change into "antiparticles", half of them continue to be normal particles. Effectively, they annihilate each other but the total number of charged particles remains constant.
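The two particle pictures can be compared in a toy model. The following is only a sketch under strong simplifying assumptions that are not part of the paper: the space is replaced by the path graph 0..N, the heat flows by nearest-neighbour random walks, and all function names are hypothetical. It checks that the difference of normal particles and antiparticles follows the killed walk, while their sum follows the reflected walk.

```python
from fractions import Fraction

N = 6                                  # sites 0..N; Y = {1..N-1}, boundary Z = {0, N}
BOUNDARY = (0, N)

def neighbours(x):
    """Neighbours of site x on the path graph 0..N."""
    return [y for y in (x - 1, x + 1) if 0 <= y <= N]

def step_reflecting(nu):
    """Reflecting walk on 0..N: discrete stand-in for the heat flow on X."""
    out = [Fraction(0)] * (N + 1)
    for x, mass in enumerate(nu):
        for y in neighbours(x):
            out[y] += mass / len(neighbours(x))
    return out

def step_killed(mu):
    """Walk killed on Z: discrete stand-in for the Dirichlet heat flow on Y."""
    out = [Fraction(0)] * (N + 1)
    for x, mass in enumerate(mu):
        if x in BOUNDARY:
            continue                   # mass on Z has already been killed
        for y in neighbours(x):
            if y not in BOUNDARY:      # mass arriving at Z is removed
                out[y] += mass / 2
    return out

def step_charged(plus, minus):
    """Charged walk: particles reflect at Z and arrive there with random charge."""
    new_plus = [Fraction(0)] * (N + 1)
    new_minus = [Fraction(0)] * (N + 1)
    for old, same, other in ((plus, new_plus, new_minus), (minus, new_minus, new_plus)):
        for x, mass in enumerate(old):
            for y in neighbours(x):
                share = mass / len(neighbours(x))
                if y in BOUNDARY:      # charge is forgotten: half normal, half anti
                    same[y] += share / 2
                    other[y] += share / 2
                else:
                    same[y] += share
    return new_plus, new_minus

def run(steps, start=2):
    """Evolve a unit mass of normal particles at `start` for `steps` steps."""
    plus = [Fraction(0)] * (N + 1)
    plus[start] = Fraction(1)
    minus = [Fraction(0)] * (N + 1)
    mu, nu = list(plus), list(plus)
    for _ in range(steps):
        plus, minus = step_charged(plus, minus)
        mu, nu = step_killed(mu), step_reflecting(nu)
    effective = [p - q for p, q in zip(plus, minus)]
    total = [p + q for p, q in zip(plus, minus)]
    return effective, total, mu, nu

effective, total, mu, nu = run(10)
print(effective == mu, total == nu, sum(total))   # prints: True True 1
```

Exact rational arithmetic is used so that the two identities can be compared with `==` rather than up to rounding.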
This leads us to regard the initial probability distribution as a distribution $\sigma_0^+$ of normal particles, with no antiparticles being around at time 0, i.e. $\sigma_0^- = 0$. In the course of time, σ +  1 2 δ x ) < ∞ for some/all x ∈ X . Obviously, the map $\mu \mapsto (\tfrac12\mu, \tfrac12\mu)$ defines an isometric embedding of $P_p(X)$ into $\tilde P_p(Y|X)$. Based on an isometry between $\tilde P_p(Y|X)$ and $P_p(\hat X)$ with a suitable "glued space" $\hat X$, we will deduce important metric properties of $\tilde W_p$, see Sect. 3.2:
To overcome the lack of a triangle inequality for $W_p^0$, we now strive for a related length metric. In a first step, we define a (pseudo-)metric $W_p^\flat$, and out of this the induced length (pseudo-)metric $W_p^\sharp$. (iii) For two measures $\mu, \nu \in P_p^{sub}(Y)$, the induced length metric $W_p^\sharp(\mu, \nu)$ is now obtained by (1.5). It will be called transportation-annihilation distance.

Remark 1.5
Both $W_p^\flat$ and $W_p^\sharp$ are a priori only pseudo-metrics; the former is the biggest one below $W_p^0$, the latter the smallest intrinsic one above $W_p^\flat$. In what follows, it will turn out, however, that both are indeed metrics and that for p = 1 they coincide.
We will compare the previous (pseudo-)metrics with the Kantorovich-Wasserstein metric on the extended space $Y' = Y \cup \{\partial\}$ with the shortcut metric $d'$. If $(X, d)$ is a complete length metric space, then so is $(Y', d')$. If in addition X is proper (i.e. closed balls are compact), then $(Y', d')$ is a geodesic space. We will further denote $d^\dagger(x, y) := d'(x, \partial) + d'(y, \partial)$, so that $d' = \min\{d, d^\dagger\}$.

Definition 1.6
(i) $W'_p$ will denote the $L^p$-Kantorovich-Wasserstein distance on $P_p(Y')$ induced by the distance $d'$.
(ii) Extending each subprobability measure $\mu \in P^{sub}(Y)$ to a probability measure $\mu' \in P(Y')$ by $\mu' := \mu + (1 - \mu(Y))\delta_\partial$ induces a bijective embedding of $P^{sub}(Y)$ into $P(Y')$. The induced distance on $P^{sub}(Y)$ will again be denoted by $W'_p$.
(iii) For subprobability measures μ, ν of equal mass we will also make use of the transportation cost $W_p^\dagger$ induced by $d^\dagger$.
(iv) Finally, for subprobabilities of equal mass define the $L^p$-transportation distance $W_p^*$ with respect to the meta-metric $d^*$ and let $W_p^*(\mu) := \tfrac12 W_p^*(\mu, \mu)$, which will be called annihilation cost of the subprobability μ.
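For orientation, the shortcut metric and the extension of a subprobability can be tried out on the unit interval. This is a hedged sketch only: it assumes X = R, Y = (0, 1), and that the distance to the added point ∂ is the distance to X \ Y; the names `to_boundary`, `d_prime`, `d_dagger` and `extend` are hypothetical.

```python
from fractions import Fraction
from itertools import product

BDRY = "boundary"                       # hypothetical label for the added point ∂

def to_boundary(x):
    """Distance from x in Y = (0, 1) to the complement of Y, used as d'(x, ∂)."""
    return min(x, 1 - x)

def d_prime(x, y):
    """Shortcut metric d' = min{d, d†} on Y' = Y ∪ {∂}."""
    if x == BDRY and y == BDRY:
        return Fraction(0)
    if x == BDRY:
        return to_boundary(y)
    if y == BDRY:
        return to_boundary(x)
    return min(abs(x - y), to_boundary(x) + to_boundary(y))

def d_dagger(x, y):
    """Cost d†(x, y) = d'(x, ∂) + d'(y, ∂) of travelling via the boundary."""
    return d_prime(x, BDRY) + d_prime(y, BDRY)

def extend(mu):
    """Extend a subprobability mu on Y to mu' = mu + (1 - mu(Y)) δ_∂ on Y'."""
    extended = dict(mu)
    extended[BDRY] = 1 - sum(mu.values())
    return extended

# Check the triangle inequality for d' on a grid in Y together with ∂.
points = [Fraction(i, 10) for i in range(1, 10)] + [BDRY]
triangle_ok = all(d_prime(x, z) <= d_prime(x, y) + d_prime(y, z)
                  for x, y, z in product(points, repeat=3))

# A subprobability of mass 3/4 receives the missing mass 1/4 at ∂.
mu_ext = extend({Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 4)})
print(triangle_ok, d_prime(Fraction(1, 10), Fraction(9, 10)), mu_ext[BDRY])
# prints: True 1/5 1/4
```

The points 1/10 and 9/10 are at distance 4/5 in Y but only 1/5 in Y', since the route over the boundary is shorter.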

Remark 1.7
Obviously, $W_p^*$ is symmetric in its arguments and satisfies the triangle inequality, but typically $W_p^*(\mu, \mu) \neq 0$.
Remark 1.9
One could equally well define For p = 1 the metrics W 1 and W 1 coincide, but for p > 1 this is no longer true. Take for instance [7].
From now on until the end of this subsection assume that (X , d) is a length space.
Quite intuitive characterizations of $W_p^0(\mu, \nu)$, $W_p^\flat(\mu, \nu)$, and $W_p^\sharp(\mu, \nu)$ are possible in terms of $L^p$-transportation costs and $L^p$-annihilation costs.
(ii) More generally for all p ≥ 1 and μ, ν ∈ P sub p (Y )

Remark 1.12
In general, W * p (μ) and W p (μ, 0) will not coincide. Our lower bound for Thus for ε sufficiently small, (ii) More generally, for all p ≥ 1 and all μ, ν ∈ P sub p (Y ) Taking for p > 1, n ≥ 1. In particular, the lower estimate for W p in assertion ii) of the previous Theorem is sharp.
A useful feature of W p is that it metrizes vague convergence of subprobability measures.

Proposition 1.15
Assume that X is a compact geodesic space. Then for every p ∈ [1, ∞), W p is a complete, separable, geodesic metric on P sub p (Y ) and for μ n , μ ∈ P sub p (Y ) the following are equivalent: (i) μ n → μ vaguely on Y .
(ii) W p (μ n , μ) → 0 as n → ∞.

Remark 1.16
In particular, this implies that μ n → μ weakly on Y if and only if W p (μ n , μ) → 0 and μ n (Y ) → μ(Y ). A similar result for $W_p^0$ can be deduced even without requiring that X is geodesic, see Lemma 4.4.
The implication "(ii)⇒(i)" holds true for all length spaces X without requiring their compactness. For the converse, one has to add a condition on convergence of moments, see remark following Lemma 4.4.

Gradient flow perspective and transportation estimates
From now on, let us be more specific. We assume that (X, d, m) is a metric measure space which satisfies an RCD(K, ∞)-condition for some number K ∈ R and that Y ⊂ X is a dense open subset with m(∂Y) = 0. The RCD(K, ∞)-condition means that the metric measure space (X, d, m) is infinitesimally Hilbertian with Ricci curvature bounded from below by K in the sense of Lott-Sturm-Villani [13,24]. The latter is formulated as K-convexity of the Boltzmann entropy $\mathrm{Ent}_m$ in $(P_2(X), W_2)$. We will additionally request that this property extends to the space of charged probability measures induced by Y, that is, we will request that (X, Y, d, m) satisfies the following:

Assumption 1.17 ("Charged Lower Ricci Bound K") The Boltzmann entropy

Remark 1.18 (a) Note that, due to the isometric embedding of $P_2(X)$ into $\tilde P_2(Y|X)$, this assumption will imply the K-convexity of $\mathrm{Ent}_m$ in $(P_2(X), W_2)$ and thus the CD(K, ∞)-condition for the metric measure space (X, d, m).
is infinitesimally Hilbertian and if m has full topological support then Assumption 1.17 implies that $\overline{Y} = X$. Indeed, the argument from [20] carries over to this framework and yields essential non-branching, which in turn implies the density of Y in X.
The proofs of the following results will be given in Sect. 5. They will be based on concepts and results for gluing of metric measure spaces which will be presented in Sect. 3.

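(ii) The heat flow $(\mu_t)_{t>0}$ on Y with Dirichlet boundary conditions is obtained as the effective flow $\mu_t = \sigma_t^+ - \sigma_t^-$.
(iii) For each $\nu_0 \in P_2(X)$, the heat flow $(\nu_t)_{t>0}$ on X is obtained as the total flow $\nu_t = \sigma_t^+ + \sigma_t^-$, where $(\sigma_t)_{t>0}$ is the EVI$_K$-flow as above starting in any $\sigma_0 \in \tilde P_2(Y|X)$ with $\sigma_0^+ + \sigma_0^- = \nu_0$.
(iv) For each $\sigma_0 \in \tilde P_2(Y|X)$, the EVI$_K$-flow $(\sigma_t)_{t>0}$ from (i) can be characterized as $\sigma_t = \big(\tfrac12(\nu_t + \mu_t), \tfrac12(\nu_t - \mu_t)\big)$, where $(\nu_t)_{t>0}$ denotes the heat flow on X starting in $\nu_0 = \sigma_0^+ + \sigma_0^-$ and $(\mu_t)_{t>0}$ denotes the heat flow on Y with Dirichlet boundary conditions starting in $\mu_0 = \sigma_0^+ - \sigma_0^-$.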

Proposition 1.22
The EVI$_K$-flows $(\sigma_t)_{t>0}$ and $(\tau_t)_{t>0}$ as above are K-contractive in all $L^p$-transportation distances: $\tilde W_p(\sigma_t, \tau_t) \le e^{-Kt}\, \tilde W_p(\sigma_0, \tau_0)$ for all t > 0 and all p ∈ [1, ∞).
where (μ t ) t>0 and (ν t ) t>0 denote the heat flows on Y with Dirichlet boundary conditions starting in μ 0 and ν 0 , resp.
Proof Given μ 0 , ν 0 ∈ P sub p (Y ) and ε > 0, we may choose σ 0 , Thus, by the very definition of W 0 p and by the previous proposition, Since ε > 0 was arbitrary, this proves the claim.

Corollary 1.24
Let μ 0 , ν 0 ∈ P sub p (Y ), and (μ t ) t>0 and (ν t ) t>0 denote the heat flows on Y with Dirichlet boundary conditions starting in μ 0 and ν 0 , resp. Then for all t > 0 and all p ∈ [1, ∞) we have both Here, P 0 t is the heat semigroup with Dirichlet boundary conditions on measures, see Sect. 2.2. This also implies that for a curve (

Gradient estimates and Bochner's inequality
Let us continue to assume that (X , d, m) is a metric measure space which satisfies an RCD(K , ∞)-condition and that Y ⊂ X is a dense open subset with m(∂Y ) = 0. Assumption 1.17 yields a gradient estimate which involves both semigroups, P t (with Neumann boundary condition) and P 0 t (with Dirichlet boundary condition). Before proving this estimate, we will see that it is equivalent to a Bochner inequality which involves the corresponding Laplace operators. To state directly the p-versions, let us introduce the appropriate function spaces.
and similarly for $E^0$ and $\Delta^0$, which are the Dirichlet form and generator associated to the heat flow $P_t^0$.

Proposition 1.25
Assume that m(X ) < ∞. For each p ∈ [1,2], the following properties are equivalent to each other: Note that different semigroups appear on the left and right hand side.
The proof is an adaptation of the one of [9, Thm. 3.6]. (ii) Moreover, it implies that the flows from Proposition 1.20 and the heat semigroups are related to each other by as well as Here Let us finally give a geometric characterization of Assumption 1.17. Given a metric mea-

Remark 1.29
The heat flow with Dirichlet boundary values from an optimal transport perspective has, to our knowledge, so far only been investigated in [7], where the authors define a transportation distance between measures that allows mass to be created or destroyed at the boundary. This metric is a modification of our transportation metric $W'_2$ based on the shortcut metric $d'$, see Remark 1.9. This leads to a gradient flow description of the heat equation with strictly positive, constant Dirichlet boundary conditions. However, it does not apply to the study of the heat flow with vanishing Dirichlet boundary conditions. Further approaches to metrics on the space of finite Radon measures are given in [11,12,18].

Structure of the paper:
In Sect. 1 we introduced the setting of particles and antiparticles, giving definitions, stating the main results and giving proofs of those results which do not need the doubling. Section 2 deals with the heat flow on metric measure spaces. In particular, the heat flow with Dirichlet boundary values is discussed. In Sect. 3, gluing of metric measure spaces is introduced and the space of charged probability measures is identified with the space of probability measures on the doubled space. Section 4 is devoted to the detailed study of various (generalized) metrics on the space of probability measures. Finally, in Sect. 5, we present the remaining proofs of the results of Sects. 1.2 and 1.3.
In the sequel, the notion of a metric on a space X will be crucial: it is a real-valued, symmetric function on X × X which satisfies the triangle inequality, vanishes on the diagonal and is positive otherwise. We will also use several extensions which satisfy all but one of the above properties:
• extended metric: also the value +∞ is admitted
• pseudo-metric: may vanish also outside the diagonal
• meta-metric: not necessarily vanishing on the diagonal
• semi-metric: triangle inequality is not requested.
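The four generalized notions can be told apart mechanically on a finite point set. The following sketch is an illustration only; the helper names `properties` and `classify` and the sample distance functions are hypothetical and not taken from the paper.

```python
from itertools import product
from math import inf

def properties(dist, points):
    """Report which of the metric axioms hold for dist on a finite point set."""
    pairs = list(product(points, repeat=2))
    return {
        "finite": all(dist(x, y) < inf for x, y in pairs),
        "symmetric": all(dist(x, y) == dist(y, x) for x, y in pairs),
        "triangle": all(dist(x, z) <= dist(x, y) + dist(y, z)
                        for x, y, z in product(points, repeat=3)),
        "zero_on_diagonal": all(dist(x, x) == 0 for x in points),
        "positive_off_diagonal": all(dist(x, y) > 0 for x, y in pairs if x != y),
    }

def classify(dist, points):
    """Name the (generalized) metric type in the terminology of the text."""
    p = properties(dist, points)
    failed = [name for name, ok in p.items() if not ok]
    if not p["symmetric"] or len(failed) > 1:
        return "none of the listed notions"
    names = {"finite": "extended metric", "positive_off_diagonal": "pseudo-metric",
             "zero_on_diagonal": "meta-metric", "triangle": "semi-metric"}
    return names[failed[0]] if failed else "metric"

pts = [1, 2, 3, 4]
print(classify(lambda x, y: abs(x - y), pts))              # metric
print(classify(lambda x, y: x + y, pts))                   # meta-metric
print(classify(lambda x, y: abs(x // 2 - y // 2), pts))    # pseudo-metric
print(classify(lambda x, y: (x - y) ** 2, pts))            # semi-metric
print(classify(lambda x, y: abs(x - y) if (x - y) % 2 == 0 else inf, pts))  # extended metric
```

The second example, x + y, is the cost of travelling from x to y through the point 0; it does not vanish on the diagonal, which is the typical behaviour of a meta-metric.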
As we will encounter as many as 9 generalized "W-metrics", let us give a short overview of where to find the definitions:
- $W_p^\dagger$: transportation cost "over the boundary" on measures on Y of the same mass, (1.7)
- $W_p^*$: annihilation cost; meta-metric on measures on X of the same mass, (1.8)
- $\hat W_p$: Kantorovich-Wasserstein metric on $P_p(\hat X)$, Lemma 3.11

Metric measure spaces and heat flows

Gradients and Dirichlet forms
In this subsection we will introduce some notation and collect some results for Dirichlet forms on the original space X .
Let (X , d) be a complete, separable, length metric space, and let m be a Borel measure with full support supp m = X , satisfying the exponential integrability condition In what follows, we always assume that X is infinitesimally Hilbertian, meaning that Ch is a quadratic form. By polarisation of E( f ) := 2 Ch( f ) we get a strongly local Dirichlet form (E, D(E)) on L 2 (X , m), where D(E) := F . The domain is then a Hilbert space with norm Thanks to the exponential integrability (2.1), the Cheeger energy is quasi-regular, cf. [21,Thm. 4.1].
Given an open subset Y ⊂ X with m(∂Y ) = 0, restricting to functions which vanish on Z := X \Y quasi-everywhere, we get another Dirichlet form, corresponding to homogeneous Dirichlet "boundary values" on Z : wheref is the quasi-continuous representative of f . By general Dirichlet form theory, a symmetric, strongly continuous contraction semigroup on L 2 (X , m) is associated with each Dirichlet form. Thus we have a semigroup (P t ) t>0 associated with (E, D(E)) and another one (P 0 t ) t>0 associated with (E 0 , D(E 0 )). They are related to the Dirichlet forms in the following way: Then we can recover the corresponding Dirichlet form in the following way (see [ is non-increasing and non-negative. The same is true for P 0 t and (E 0 , D(E 0 )).

Heat flows
Let us clarify the different heat flows. We have the "usual" heat flow and the one with Dirichlet boundary values, and to each a corresponding "dual" flow for measures.

Heat flow P t for functions on X
The heat flow (t, u 0 ) → u t = P t u 0 is defined by means of the semigroup in L 2 (X , m) corresponding to the Dirichlet form (E, D(E)).

Heat flow P t for probability measures on X
From now on we additionally assume that (X , d, m) is an RCD * (K , ∞) space. In this case, there is a Brownian motion (B t , P x ) on X and corresponding to this a Markov ker- This coincides with the EVI K -flow of the entropy in (P 2 (X ), W 2 ). Since the Brownian motion is connected to the Dirichlet form E uniquely, we get the following formula for the heat flow on functions through the Markov kernel The heat semigroups P t and P t are dual in the following sense: For f : X → R bounded Borel, and μ ∈ P(X ) we have The same applies to the heat flowsP t andP t onX (to be discussed in detail in the next section) and the equivalent flowP t onP(Y |X ), defined by means of the isometry introduced in Lemma 3.11.

Heat flow with Dirichlet boundary values on Y
Let Y ⊂ X be open and with m(∂Y ) = 0. Let us define a stopping time where as before Z := X \ Y . Then we can define a Markov kernel . Note that we use Fukushima's convention that a Markov kernel is a subprobability on X , in particular p 0 t (x, A) ≤ p t (x, A). This Markov kernel is associated to the Dirichlet form (E 0 , D(E 0 )) given by (2.2), see [8,Thm. 4.4.2]. With this we can define the heat flows for bounded Borel functions f : X → R and measures μ ∈ P sub (X ) as and They also satisfy the duality relation (2.4).

Remark 2.1
With the help of the Markov kernels, all of these heat flows of measures can be extended to signed, finite Borel measures.

Gluing
In this section we glue together a finite number of copies of an open subset in a metric measure space "along the boundary". We will identify the Cheeger energy and the heat semigroup of the glued space in terms of the original objects. Beginning with Alexandrov in the 40s, gluing has been studied in connection with curvature bounds a number of times, but mostly in Alexandrov spaces, see [ [22,23] applied the method of [10] to show preservation of various curvature bounds (among them Ricci curvature) on manifolds in an approximate sense which we will use later to give the Riemannian case as an example. In [14], metric measure spaces supporting Dirichlet forms are glued together. There is also a very recent preprint by Rizzi which shows that gluing does not preserve the dimension in the measure-contraction property [19]. Apart from curvature bounds, the doubling of manifolds with boundary has also been applied by other communities to produce a related manifold without boundary, see for instance [2].

Gluing of metric measure spaces
Take an open subset Y ⊂ X and denote Z := X \ Y. Fix a number k ∈ N. We now consider k copies of X, denoted by $X_1, \ldots, X_k$. We will identify these spaces with the original one via maps $\iota_i : X \to X_i$, i = 1, ..., k, which send points x ∈ X to the corresponding points in $X_i$. Each $X_i$ is equipped with the metric $d_i := d \circ (\iota_i^{-1}, \iota_i^{-1})$ and the measure $m_i := (\iota_i)_\# m$, but in this section we usually suppress the indices and write d and m on every $X_i$. Let $Y_i := \iota_i(Y)$ and $Z_i := \iota_i(Z)$. We define an equivalence relation by identifying the points in the $Z_i$'s: $x \sim y$ if and only if $x = y$, or $x \in Z_i$, $y \in Z_j$ and $\iota_i^{-1}(x) = \iota_j^{-1}(y)$. The k-gluing of X along Z is now obtained as the quotient of the disjoint union of the $X_i$ under this equivalence relation, $\hat X := \bigsqcup_{i=1}^k X_i / \sim$. We can view $X_i$ as a subset of $\hat X$, since the canonical map $\bigsqcup_i X_i \to \hat X$ restricted to $X_i$ is injective. In the following, we will also make use of the partition $\hat X = Z \sqcup \bigsqcup_{i=1}^k Y_i$. Define a metric $\hat d : \hat X \times \hat X \to \mathbb{R}$ by $\hat d(x, y) := d(x, y)$ if $x, y \in X_i$ for some i, and $\hat d(x, y) := \inf_{z \in Z} [d(x, z) + d(z, y)]$ otherwise. As a measure we use $\hat m := \frac{1}{k} \sum_{i=1}^k m_i$, meaning that for a Borel set $A \subset \hat X$, we consider the restrictions to the copies and set $\hat m(A) := \frac{1}{k} \sum_{i=1}^k m_i(A \cap X_i)$. This turns $\hat X$ into a metric measure space. For the special case of gluing together only two copies, we call the resulting space the doubling of Y in X, and as indices we will use i ∈ {+, −}.
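The doubling can be made concrete for the unit interval. The sketch below is an illustration under assumptions that are not taken from the paper: X = [0, 1], Y = (0, 1), Z = {0, 1}, and the glued metric is assumed to be d on a common sheet and the shortest detour through Z between different sheets. The names `glued`, `d_star` and `d_hat` are hypothetical.

```python
from fractions import Fraction
from itertools import product

Z = (Fraction(0), Fraction(1))          # X = [0, 1], Y = (0, 1), glued set Z = {0, 1}

def glued(x, sheet):
    """Point of the doubling; the sheet label is dropped on Z, where sheets are identified."""
    return (x, 0) if x in Z else (x, sheet)

def d_star(x, y):
    """Cost of going from x to y through Z: min over z in Z of d(x, z) + d(z, y)."""
    return min(abs(x - z) + abs(z - y) for z in Z)

def d_hat(p, q):
    """Glued metric: d on a common sheet, detour through Z between different sheets."""
    (x, i), (y, j) = p, q
    if i == j or i == 0 or j == 0:      # points of Z lie on both sheets
        return abs(x - y)
    return d_star(x, y)

# Grid on both sheets; the two copies of each boundary point collapse to one.
grid = sorted({glued(Fraction(n, 8), s) for n in range(9) for s in (+1, -1)})
triangle_ok = all(d_hat(p, r) <= d_hat(p, q) + d_hat(q, r)
                  for p, q, r in product(grid, repeat=3))

quarter = Fraction(1, 4)
print(triangle_ok, len(grid), d_hat(glued(quarter, +1), glued(quarter, -1)))
# prints: True 16 1/2
```

In this example the doubling is a circle of length 2, and the same point on opposite sheets has positive distance (here 1/2), which is why the cross-sheet cost `d_star` does not vanish on the diagonal.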

Proposition 3.1 The space $(\hat X, \hat d)$ is a complete and separable length space, and the measure $\hat m$ is Borel.
If additionally X is geodesic and Z is proper (i.e. all closed balls are compact), then $\hat X$ is geodesic.
The metric properties directly transfer to the Wasserstein space, see for instance [25].

Corollary 3.2 For $p \in [1, \infty)$, the Kantorovich-Wasserstein metric $\hat W_p$ obtained from $\hat d$ is a complete, separable length metric on $P_p(\hat X)$.
Now we introduce some notation for dealing with functions on $\hat X$. For us it will be useful to consider the functions $u_i : X_i \to \mathbb{R}$ given by $u_i := u|_{X_i}$. We consider the mean value $\bar u : X \to \mathbb{R}$, $\bar u := \frac{1}{k} \sum_{i=1}^k u_i \circ \iota_i$, and the "mean free" functions $\mathring u_i := u_i - \bar u \circ \iota_i^{-1}$.
Observe that since the $u_i$ all coincide on Z, the (3.1)
Notation: During the proof of Lemma 3.7 we will start to simplify notation, by mostly omitting the identification maps $\iota_i$. Whenever a function $u_i$ now gets an argument from X, it is understood as $u_i \circ \iota_i$, and similarly for $\bar u$, $\mathring u_i$ with $\iota_i^{-1}$. Let $(\widehat{\mathrm{Ch}}, \hat F)$ denote the Cheeger energy of the space $(\hat X, \hat d, \hat m)$.

Lemma 3.3 The space $\hat X$ is infinitesimally Hilbertian and for every $u \in \hat F$, the functions $u_i \circ \iota_i$ are in F and
Proof This follows directly from the locality property

Proof Being in $D(\hat E)$ means $\widehat{\mathrm{Ch}}(u) < \infty$. By the previous lemma, this implies
Since each term is non-negative, $\mathrm{Ch}(u_i \circ \iota_i) < \infty$ for every i = 1, ..., k. Thus $u_i \circ \iota_i \in D(E)$ and also the linear combination $\bar u \in D(E)$. The other assertion follows from the fact that all the $u_i$'s coincide on Z.
Now we are going to define a semigroup on $\hat X$ and we will show that it actually is the one corresponding to $\hat E$.

Remark 3.6
Observe that $P_t^{GL}$ is well-defined, since $u_i = u_j$ on Z for every i, j = 1, ..., k.
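A finite toy model may help to see why such a glued semigroup is well defined and Markovian. The sketch below assumes, as a labelled assumption about the omitted definition, that on the sheet $X_i$ the glued semigroup acts as $P_t \bar u + P_t^0(u_i - \bar u)$; the path graph 0..N with Z = {0, N}, the one-step operators `P` and `P0`, and all names are hypothetical stand-ins.

```python
from fractions import Fraction

N = 5                                    # sites 0..N; Z = {0, N}, Y = {1..N-1}
HALF = Fraction(1, 2)

def P(f):
    """One step of the reflecting ("Neumann") walk acting on functions on 0..N."""
    inner = [HALF * (f[x - 1] + f[x + 1]) for x in range(1, N)]
    return [f[1]] + inner + [f[N - 1]]

def P0(f):
    """One step of the walk killed on Z ("Dirichlet"): values on Z are ignored."""
    g = [Fraction(0)] + list(f[1:N]) + [Fraction(0)]
    inner = [HALF * (g[x - 1] + g[x + 1]) for x in range(1, N)]
    return [Fraction(0)] + inner + [Fraction(0)]

def mean(sheets):
    """Mean value over the k sheets, as a function on 0..N."""
    return [sum(col) / len(sheets) for col in zip(*sheets)]

def P_glued(sheets):
    """Sheet-wise P(mean) + P0(u_i - mean), the assumed form of the glued semigroup."""
    u_bar = mean(sheets)
    smooth = P(u_bar)
    out = []
    for u_i in sheets:
        free = P0([a - b for a, b in zip(u_i, u_bar)])
        out.append([s + r for s, r in zip(smooth, free)])
    return out

# A function on the doubling with values in [0, 1]; both sheets agree on Z.
u_plus = [HALF, Fraction(1), Fraction(0), Fraction(1), Fraction(1, 3), Fraction(0)]
u_minus = [HALF, Fraction(0), Fraction(1, 4), Fraction(0), Fraction(1), Fraction(0)]
one_step = P_glued([u_plus, u_minus])
two_steps = P_glued(one_step)

# Two-step formula: P(P(mean)) + P0(P0(u_i - mean)) on each sheet.
u_bar = mean([u_plus, u_minus])
direct = [[a + b for a, b in zip(P(P(u_bar)), P0(P0([s - m for s, m in zip(u, u_bar)])))]
          for u in (u_plus, u_minus)]
print(two_steps == direct)               # prints: True
```

The output stays between 0 and 1 because `P0` is dominated by `P` on non-negative functions, which is the discrete counterpart of the inequality used in the contraction argument.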

Lemma 3.7 $(P_t^{GL})_{t>0}$ is a symmetric, strongly continuous contraction semigroup on $L^2(\hat X, \hat m)$. In particular, there is a corresponding Dirichlet form
Proof Symmetry: We use that $P_t$ and $P_t^0$ are symmetric with respect to m: From now on we will apply the abuse of notation introduced before. This is in order to improve readability.
Semigroup property: First observe that on X i we have P G L 0 u = P 0ū +P 0 where we used (3.1).
Contraction: To show the contraction property in $L^2(\hat X, \hat m)$, we first show that $P_t^{GL}$ is Markovian (i.e. positivity preserving and $L^\infty$-contractive in $L^2 \cap L^\infty$). By symmetry of $P_t^{GL}$, we also get $L^1$-contractivity. Using the Riesz-Thorin interpolation theorem, we finally get contractivity in $L^2$.
Let u ∈ L 2 ∩ L ∞ (X ,m) with 0 ≤ u ≤ 1. Then also 0 ≤ u i ,ū ≤ 1. Then, on X i , For the other side, we have to show P G L t u ≥ 0, which is equivalent to But this holds true because P 0 t f ≤ P t f for every f ∈ L 2 , and P 0 t u i ≥ 0. Now we use that L 1 is a subspace of the dual of L ∞ . For u ∈ L 1 ∩ L 2 (X ,m), consider the bounded, linear functional : The dual space norm of coincides with the L 1 -norm of P G L t u, thus Here we used the symmetry of P G L t and the L ∞ -contractivity. Hence P G L Strong continuity: This follows directly from the strong continuity of P t and P 0 t :

Lemma 3.9 If u ∈ D(E G L ), thenū ∈ D(E) and
Proof By definition and (3.3), Since the sum converges and every term is non-negative and non-decreasing as t → 0, the terms converge and we can interchange sum and limit to get Now we come to the main theorem of this section, which identifies the semigroup P G L t with the heat semigroupP t associated toÊ.

Theorem 3.10
The semigroups $P_t^{GL}$ and $\hat P_t$ coincide on $L^2(\hat X, \hat m)$.

By Lemma 3.4,ū,v ∈ D(E) and
• u i , , so that we can take the limit t → 0. This yields where we used that E is an extension of E 0 . This also shows that D(Ê) ⊂ D(E G L ). The other direction works with the same argument but using Lemma 3.9 instead.

Identification of $\tilde P(Y|X)$ and $P(\hat X)$
We will show how the space of charged measures $\tilde P(Y|X)$ can be identified with the space of probability measures on the glued space, $P(\hat X)$. Since we only look at two copies of Y ⊂ X, we index the different copies by $Y_+$ and $Y_-$ instead of the numerical indices in the previous subsection. Still, Z := X \ Y and $\hat X = X_+ \sqcup X_- / \sim$. As we are dealing now with measures which are not equal on the different copies of X, in this section we do keep track of the identification maps $\iota_i$, i ∈ {+, −}. Every subset used in this section is assumed to be a Borel-measurable set in the space it is taken from.
The proof is straightforward and left to the reader. The isometry allows us to deduce a representation of the heat flow of charged measures in terms of the heat flows of their effective and total measures.
Proof We do the calculation in the equivalent setting of the doubled spaceX . Letσ ∈ P(X ). Then We relied heavily on the fact that we glue together copies of the same space, making it possible to "switch" indices when necessary.

Lemma 3.13 Assumption 1.17 inP 2 (Y |X ) is satisfied if and only if the entropy Ent is convex in
Proof Letσ ∈ P 2 (X ) withσ =ξm. We will show that the entropy ofσ in P 2 (X ) equals that of (σ ) inP 2 (Y |X ) up to an additive constant, and then the result follows by Lemma 3.11 and the fact that K -convexity is preserved if you add a constant to the functional. We have On the other hand, to compute Ent( (σ )), let us first identify the density of (σ ) i with respect to m: For a Borel-measurable set A ⊂ X

Transportation (semi-)distances between subprobabilities
Let (X, d) be a complete separable metric space and Y ⊂ X be an open subset with ∅ ≠ Y ≠ X. Recall the definition of the $L^p$-transportation semi-metric between subprobabilities $\mu, \nu \in P^{sub}(Y)$:

Proof of Lemma 1.1 This is an immediate consequence of the isometry between $\tilde P_p(Y|X)$ and $P_p(\hat X)$, together with Lemma 3.1.
Every coupling of the charged probability measures (μ + ρ, ρ) and (ν + η, η) induces a decomposition of each of the involved measures into three parts. This leads to another, more detailed description of the transportation costs from above.
The decompositions implicitly require the coupled measures to have the same mass, so for instance μ 1 (X ) = ν 1 (X ) etc.
The proof consists in using again the isometry between $\tilde P_p(Y|X)$ and $P_p(\hat X)$ and disintegrating the appearing measures. In the case p = 1, a more explicit description is possible.
Case 1: Maximality implies $z_1 \in \{x_i\}$ and $w_k \in \{y_i\}$ whereas all the other points in between The transportation cost associated with this chain is at least and thus is bounded from below by the cost of the direct transport between the endpoints. Denote by $X_1 \subset \{x_i\}$ the set of $z_1$ in case 1 and by $Y_1 \subset \{y_i\}$ the set of $w_k$. Let Then the transport cost arising from all pairs contained in any chain of case 1 is bounded from below by $W_1(\mu_n^1, \nu_n^1)$.
Case 2: This is just a relabeling of case 1 with indices running in reverse order. No additional costs arise.
Case 3: Here, maximality implies $z_1 \in \{x_i\}$ and also $z_k \in \{x_i\}$. Thus at least one of the pairs in the chain consists of points from two different sheets. Thus with the triangle inequality on $\hat X$, we conclude that the cost of this chain is at least $d^*(z_1, z_k)$.
Denote by $X_0 \subset \{x_i\}$ the set of $z_1$ in case 3. Note that this set coincides with the set of $z_k$ (just by reverting the chain), but for calculating the cost induced by the coupling $q_n$, only one of the pairs $(z_1, z_k)$ and $(z_k, z_1)$ has to be taken into account. Let Then the transport cost arising from all pairs contained in any chain of case 3 is bounded from below by $\tfrac12 W_1^*(\mu_n^0, \mu_n^0)$.
Case 4: Similarly, here we conclude $w_1 \in \{y_i\}$ as well as $w_k \in \{y_i\}$ and that the cost of the chain is at least $d^*(w_1, w_k)$. Denote by $Y_0 \subset \{y_i\}$ the set of $w_1$ in case 4 and set Then the transport cost arising from all pairs contained in any chain of case 4 is bounded from below by $\tfrac12 W_1^*(\nu_n^0, \nu_n^0)$.
Case 5: The cyclic chains in this case will produce superfluous costs which will vanish for optimal choices of measures $\rho_n$, $\eta_n$. That is, 0 is the best lower estimate for the transport costs arising from all pairs contained in any chain of case 5. This infimum will be attained by chains of length k = 2 of the form (z 1 , w 1 ), (z 1 , w 1 ) with z 1 = w 1 .
Case 6: This is a cyclic permutation of case 5. No additional costs arise.
For the general case of real masses, one can approximate Borel measures by sums of Dirac measures (with rational masses) in the weak topology. By continuity ofW 1 , W 1 and W * 1 with respect to weak convergence, one can apply the rational case and go to the limit in (4.2).
Proof of Lemma 1.10 Assertions (i) and (ii) are the content of the previous Lemma. The proof for the decomposition in assertion (iv) is straightforward. For the vanishing of the $W_p^\dagger$-term in the case p = 1 note that in this case $[d'(x, \partial) + d'(y, \partial)]^p = d'(x, \partial)^p + d'(y, \partial)^p$, whereas in general only the ≥ inequality holds. Assertion (iii) will follow from combining assertion (iv), Lemma 1.11 and Theorem 1.13(i).
In the case of a length space X , the annihilation cost W * 1 (μ) allows for an alternative characterization as inf{W 1 (μ, ξ ) : ξ ∈ P(∂Y )} and, more generally, This is the content of Lemma 1.11.
Proof of Lemma 1.11 We switch to the picture of two glued copies. Given μ, ν ∈ P(Y), consider them as $\mu \in P(Y_+)$ and $\nu \in P(Y_-)$ and fix a $\hat W_1$-optimal coupling q of them.
To simplify the presentation, let us first discuss the argument if $\hat X$ is a geodesic space. Choose a measurable selection of connecting $\hat d$-geodesics : $\hat X \times \hat X \to \mathrm{Geo}(\hat X)$. For a geodesic γ in $\hat X$ with γ 0 ∈ Y + , γ 1 Define a probability measure ξ = Z # q via push forward of the optimal coupling. Then this is a $\hat W_1$-intermediate point of μ and ν. Indeed, for the transport from μ to ν, the pair $x \in Y_+$, $y \in Y_-$ contributes the cost $d^*(x, y)$. The fraction $\alpha(x, y) \cdot d^*(x, y)$ contributes to the cost of the transport from μ to ξ. And the fraction $(1 - \alpha(x, y)) \cdot d^*(x, y)$ contributes to the cost of the transport from ξ to ν. Now let us discuss the general case of a length space X. Instead of geodesics, we now choose approximate $\hat d$-geodesics. With the same construction, ξ will then be an approximate $\hat W_1$-intermediate point. This proves the claim in the case p = 1.
Since W 1 is the length metric induced by W 1 , one gets W 1 ≤ W 1 . The other inequality is provided by the fact that W 1 is the biggest metric below W 0 1 and that W 1 = W 1 ≤ W 0 1 by the above. (ii) Now let us consider the case p > 1. The idea is that locally (along a geodesic) the contribution of W † p is negligible, so that we can compare W p and W p on a small scale and then carry it over to the induced length metrics. Let subprobabilities μ, ν be given as well as a W p -geodesic (η t ) t∈ [0,1] connecting the measures μ := μ + (1 − μ(Y ))δ ∂ and ν := ν + (1 − ν(Y ))δ ∂ . By the continuity of W p and W * p with respect to weak convergence we can assume without loss of generality that μ and ν have compact supports and for ε > 0 small η t (Y ) ≤ 1 − ε for all t ∈ [0, 1]. Recall that the measures without primes are the restrictions to Y . We thus have η t (∂) = 0, whereas η t (∂) ≥ ε. Choose δ > 0 such that η t (B δ (∂)) ≤ ε 2 . Let be the probability measure on the space of Y -geodesics such that η t = (e t ) # (where e t is the evaluation map at time t), denote by L the essential supremum of d (γ 0 , γ 1 ) under , and let δ := δ L . We consider η s and η t for |s − t| ≤ δ . Using that d † (x, y) p ≥ d (x, ∂) p + d (y, ∂) p ,
Hence ρ * = η * and in particularμ * = μ * . This way we see that every subsequence of μ (n) has a further subsequence which converges to μ * , so that also the whole sequence converges to μ * .

Proofs for Sects. 1.2 and 1.3
Proof of Proposition 1.20 This will follow from the identification with the glued space and the properties shown in Sect. 3.1, in particular Theorem 3.10. Let us provide the details.
(iv) Let σ 0 ∈P 2 (Y |X ) and define μ 0 := σ + 0 − σ − 0 and ν 0 := σ + 0 + σ − 0 . Then, again by Lemma 3.12,