1 Introduction and statement of main results

We present an approach to the heat flow with homogeneous Dirichlet boundary conditions via optimal transport, to our knowledge the first of its kind, based on a novel particle interpretation of this evolution. The classical particle interpretation of the heat flow in an open set Y with Dirichlet boundary conditions is based on particles which move around in Y and are killed (or lose their mass) as soon as they hit the boundary \(\partial Y\). Our new interpretation will be based on particles moving around in Y which are reflected if they hit the boundary and which thereby randomly change their “charge”: half of them turn into “antiparticles”, while the other half continue as normal particles. Effectively, particles and antiparticles annihilate each other, but the total number of charged particles remains constant.

This leads us to regard the initial probability distribution as a distribution \(\sigma _0^+\) of normal particles, with no antiparticles being around at time 0, i.e. \(\sigma _0^-=0\). In the course of time, \(\sigma _t^+\) and \(\sigma _t^-\) will evolve as subprobability measures on Y and so does the “effective distribution” \(\sigma _t^0:=\sigma _t^+-\sigma _t^-\) whereas the “total distribution” \({\overline{\sigma }}_t :=\sigma _t^++\sigma _t^-\) continues to be a probability measure. The latter will evolve as heat flow with Neumann boundary conditions whereas the former will evolve as heat flow with Dirichlet boundary conditions. The evolution of the charged particle distribution \(\sigma _t=(\sigma _t^+,\sigma _t^-)\) will be characterized as an EVI-gradient flow for the Boltzmann entropy. New transportation distances for subprobability measures will yield contraction estimates for the effective flow.
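The particle picture can be made concrete by a small Monte Carlo experiment. The sketch below is purely illustrative (the interval \(Y=(-1,1)\), step size, walk lengths, and sample size are all ad-hoc choices): a random walk is reflected at the boundary and flips its charge with probability 1/2 upon each reflection. The mean charge, i.e. the mass of the effective distribution \(\sigma _t^0\), decays as reflections accumulate, while the walk itself follows the reflected (Neumann) dynamics.

```python
import random

# Monte Carlo sketch of the particle picture described above: a random
# walk on Y = (-1, 1) is reflected at the boundary and flips its "charge"
# with probability 1/2 upon each reflection.  All numerical choices
# (step size, walk lengths, sample size) are illustrative.

def charged_walk(x0, steps, h, rng):
    """One reflected walk; returns its final charge."""
    x, charge = x0, +1
    for _ in range(steps):
        x += h if rng.random() < 0.5 else -h
        if x > 1.0:                    # reflect at the right boundary ...
            x = 2.0 - x
            if rng.random() < 0.5:     # ... and flip charge with prob. 1/2
                charge = -charge
        elif x < -1.0:
            x = -2.0 - x
            if rng.random() < 0.5:
                charge = -charge
    return charge

rng = random.Random(0)
n = 6000
m_short = sum(charged_walk(0.5, 50, 0.1, rng) for _ in range(n)) / n
m_long = sum(charged_walk(0.5, 200, 0.1, rng) for _ in range(n)) / n

# the effective mass starts at 1 and decays as reflections accumulate
assert 0.0 < m_long < m_short < 1.0
```

Note that a single unflipped reflection already gives conditional mean charge zero, so the mean charge equals the probability of never having hit the boundary; this is exactly the classical killing interpretation recovered from the reflection picture.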

Technically, we will interpret the pairs of subprobability measures \((\sigma ^+,\sigma ^-)\) as probability measures on the doubling of Y in X, i.e. the space obtained by gluing together two copies of X along the “boundary” \(X\setminus Y\). The two settings are equivalent. Under a curvature condition for the doubling, we obtain Wasserstein contraction results and gradient estimates for the heat flow with Dirichlet boundary values.

In particular, we also obtain the first Bochner inequality for the Dirichlet Laplacian on a convex subset of a Riemannian manifold, which, surprisingly, involves both the Dirichlet Laplacian and the Neumann Laplacian.

1.1 Transportation-annihilation distance between subprobabilities

Let \((X,d)\) be a complete separable metric space and \(Y\subset X\) be an open subset with \(\emptyset \not = Y\not = X\). The distance between two normal particles at locations \(x\) and \(y\in X\) will be given by \(d(x,y)\), and so will be the distance between two antiparticles at x and y. The distance between a normal particle at \(x\in X\) and an antiparticle at \(y\in X\) (or vice versa) will be given by

$$\begin{aligned} d^*(x,y):=\inf _{z\in X\setminus Y}\big [ d(x,z)+d(z,y)\big ]. \end{aligned}$$
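On the real line this infimum reduces to a minimum over the two boundary points nearest to Y. A minimal sketch for \(X={{\mathbb {R}}}\), \(Y=(-3,3)\) (the setting of Remark 1.3 below; all numerical choices are illustrative):

```python
# A minimal illustration of d and d* for X = R, Y = (-3, 3): on the line,
# the infimum over z in X \ Y is attained at one of the two boundary
# points -3 and 3, so a finite minimum suffices.

BOUNDARY = (-3.0, 3.0)

def d(x, y):
    return abs(x - y)

def d_star(x, y):
    # cost of a particle at x meeting an antiparticle at y outside Y
    return min(d(x, z) + d(z, y) for z in BOUNDARY)

# a particle at -2 and an antiparticle at 2 must meet outside Y (cost 6),
# while two particles at -2 and 2 are at plain distance 4
assert d(-2, 2) == 4
assert d_star(-2, 2) == 6
# d* always dominates d, by the triangle inequality
assert all(d_star(x, y) >= d(x, y)
           for x in (-2.5, 0.0, 1.5) for y in (-1.0, 2.9))
```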

The set of subprobability measures on Y (i.e. measures \(\mu \) on Y equipped with its Borel field with mass \(\mu (Y)\le 1\)) will be denoted by \({\mathcal {P}}^{sub}(Y)\). Moreover, we introduce the set of charged probability measures on X by

$$\begin{aligned} \tilde{\mathcal {P}}(Y|X):=\Big \{\sigma =(\sigma ^+,\sigma ^-) \,\big |\, \sigma ^\pm \in {\mathcal {P}}^{sub}(X), \ \sigma ^+|_{X\setminus Y}=\sigma ^-|_{X\setminus Y}, \ \sigma ^+(X)+\sigma ^-(X)=1\Big \}. \end{aligned}$$

The maps \(\sigma \mapsto \sigma ^0:=\sigma ^+-\sigma ^-\) and \( \sigma \mapsto {\overline{\sigma }}:=\sigma ^++\sigma ^-\) will assign the effective measure and the total measure, resp., to a charged probability measure. Observe that \(\sigma ^0\) is in general a signed measure. However, we will mostly have charged measures with \(\sigma ^0\ge 0\) since we are usually starting with a subprobability \(\mu \) and take an appropriate measure \(\sigma \) such that \(\sigma ^0=\mu \).

Given \(\sigma ,\tau \in \tilde{\mathcal {P}}(Y|X)\) and a coupling \(q\in \mathrm {Cpl}({\overline{\sigma }},{\overline{\tau }})\) of their total measures, there are canonical decompositions \(\sigma ^i=\sigma ^{i+}+\sigma ^{i-}\), \(\tau ^j=\tau ^{+j}+\tau ^{-j}\), \(q=q^{++}+q^{+-}+q^{-+}+q^{--}\) such that \(q^{ij}\in \mathrm {Cpl}(\sigma ^{ij},\tau ^{ij})\) for \(i,j\in \{+,-\}\). To construct these decompositions, choose nonnegative Borel functions \(u^i, v^j\) on X with \(\sigma ^i=u^i\, {\overline{\sigma }}\), \(\tau ^j=v^j\, {\overline{\tau }}\) and set \({\text {d}}\!q^{ij}(x,y):= u^i(x) v^j(y)\,{\text {d}}\!q(x,y)\) as well as \(\sigma ^{ij}(\cdot ):=q^{ij}(\cdot ,X),\quad \tau ^{ij}(\cdot ):=q^{ij}(X,\cdot )\).
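For finitely supported measures this canonical decomposition is a few lines of code. The sketch below uses arbitrary illustrative data, takes for q simply the product coupling of the total measures, and ignores the gluing constraint on \(X\setminus Y\), which plays no role for the decomposition itself:

```python
from itertools import product

# Discrete sketch of the canonical decomposition: measures are dicts
# point -> mass, q is the product coupling of the totals, and
# q^{ij}(x, y) = u^i(x) v^j(y) q(x, y) with u^i = d(sigma^i)/d(sigma-bar),
# v^j = d(tau^j)/d(tau-bar).  The data below is illustrative only.

sigma_p = {0: 0.5, 1: 0.2}          # sigma^+
sigma_m = {1: 0.2, 2: 0.1}          # sigma^-
tau_p = {0: 0.3, 2: 0.4}            # tau^+
tau_m = {0: 0.3}                    # tau^-

def total(plus, minus):
    t = dict(plus)
    for x, m in minus.items():
        t[x] = t.get(x, 0.0) + m
    return t

sbar, tbar = total(sigma_p, sigma_m), total(tau_p, tau_m)
u = {'+': {x: sigma_p.get(x, 0.0) / sbar[x] for x in sbar},
     '-': {x: sigma_m.get(x, 0.0) / sbar[x] for x in sbar}}
v = {'+': {y: tau_p.get(y, 0.0) / tbar[y] for y in tbar},
     '-': {y: tau_m.get(y, 0.0) / tbar[y] for y in tbar}}

q = {(x, y): sbar[x] * tbar[y] for x, y in product(sbar, tbar)}
q_split = {(i, j): {(x, y): u[i][x] * v[j][y] * q[(x, y)] for (x, y) in q}
           for i, j in product('+-', repeat=2)}

# the four pieces add up to q ...
for xy in q:
    assert abs(sum(q_split[ij][xy] for ij in q_split) - q[xy]) < 1e-12
# ... and the first marginals recombine: sigma^{i+} + sigma^{i-} = sigma^i
for x in sbar:
    s_pp = sum(m for (a, _), m in q_split[('+', '+')].items() if a == x)
    s_pm = sum(m for (a, _), m in q_split[('+', '-')].items() if a == x)
    assert abs(s_pp + s_pm - sigma_p.get(x, 0.0)) < 1e-12
```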

Having this canonical decomposition for \(q\in \mathrm {Cpl}({\overline{\sigma }},{\overline{\tau }})\) in mind, we define the \(L^p\)-transportation distance between charged probability measures \(\sigma ,\tau \in \tilde{\mathcal {P}}(Y|X)\) by

$$\begin{aligned} \tilde{W}_p(\sigma ,\tau )&:= \inf \Big \{ \int _{X\times X} d(x,y)^p {\text {d}}\!q^{++}(x,y)+\int _{X\times X} d^*(x,y)^p {\text {d}}\!q^{+-}(x,y) \nonumber \\&\quad +\int _{X\times X} d^*(x,y)^p {\text {d}}\!q^{-+}(x,y)+\int _{X\times X} d(x,y)^p {\text {d}}\!q^{--}(x,y) \,\Big |\, q\in \mathrm {Cpl}({\overline{\sigma }},{\overline{\tau }})\Big \}^{1/p} \end{aligned}$$
(1.1)

for \(p\in [1,\infty )\).

Define \(\tilde{\mathcal {P}}_p(Y|X):=\big \{\sigma \in \tilde{\mathcal {P}}(Y|X)\,|\, \ \tilde{W}_p\big (\sigma ,(\frac{1}{2}\delta _x,\frac{1}{2}\delta _x)\big )<\infty \) for some/all \(x\in X\big \}\). Obviously, the map \(\mu \mapsto \big (\frac{1}{2}\mu ,\frac{1}{2}\mu \big )\) defines an isometric embedding of \({\mathcal {P}}_p(X)\) into \(\tilde{\mathcal {P}}_p(Y|X)\).

Based on an isometry between \(\tilde{\mathcal {P}}_p(Y|X)\) and \({\mathcal {P}}_p({\hat{X}})\) with a suitable “glued space” \({\hat{X}}\), we will deduce important metric properties of \(\tilde{W}_p\), see Sect. 3.2:

Lemma 1.1

For each \(p\in [1,\infty )\), \(\tilde{W}_p\) is a complete separable metric on \(\tilde{\mathcal {P}}_p(Y|X)\). It is a length metric if d is a length metric; \(\tilde{\mathcal {P}}_p(Y|X)\) is compact if X is compact.
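Anticipating the identification with the glued space \({\hat{X}}\) from Sect. 3.2, for charged measures whose atoms have equal mass and definite charge \(\tilde{W}_p\) can be evaluated by brute force over matchings of the atoms: a matched pair of equal charges is charged with \(d^p\), a pair of opposite charges with \((d^*)^p\). The following sketch (with illustrative choices \(X={{\mathbb {R}}}\), \(Y=(-1,1)\), \(p=2\)) confirms the isometric embedding \(\mu \mapsto (\frac{1}{2}\mu ,\frac{1}{2}\mu )\) on a pair of Dirac measures:

```python
from itertools import permutations

# Brute-force matching evaluation of tilde-W_p for atoms of equal mass
# and definite charge, on X = R, Y = (-1, 1); an illustrative sketch.

BOUNDARY = (-1.0, 1.0)

def d(x, y):
    return abs(x - y)

def d_star(x, y):
    # travel through the complement of Y = (-1, 1)
    return min(d(x, z) + d(z, y) for z in BOUNDARY)

def tilde_W(sigma, tau, p=2):
    """sigma, tau: lists of (position, charge) atoms, each of mass 1/len."""
    assert len(sigma) == len(tau)
    mass = 1.0 / len(sigma)
    best = float('inf')
    for perm in permutations(range(len(tau))):
        cost = 0.0
        for k, (x, cx) in enumerate(sigma):
            y, cy = tau[perm[k]]
            cost += mass * ((d(x, y) if cx == cy else d_star(x, y)) ** p)
        best = min(best, cost)
    return best ** (1.0 / p)

# the embedding mu -> (mu/2, mu/2) applied to two Dirac probabilities
a, b = -0.5, 0.3
sigma = [(a, +1), (a, -1)]          # (1/2 delta_a, 1/2 delta_a)
tau = [(b, +1), (b, -1)]            # (1/2 delta_b, 1/2 delta_b)
# tilde-W_2 equals the plain distance |a - b|, as claimed for the embedding
assert abs(tilde_W(sigma, tau) - abs(a - b)) < 1e-12
```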

Now we are in a position to define the \(L^p\)-transportation semi-metric between subprobabilities.

Definition 1.2

For \(\mu ,\nu \in {\mathcal {P}}^{sub}(Y)\) and \(p\in [1,\infty )\) we define

$$\begin{aligned} W^0_p(\mu ,\nu ):&= \inf \Big \{\tilde{W}_p(\sigma ,\tau ) \,\Big |\, \sigma ,\tau \in \tilde{\mathcal {P}}(Y|X), \sigma ^0=\mu , \tau ^0=\nu \Big \} \end{aligned}$$
(1.2)
$$\begin{aligned}&=\inf \Big \{\tilde{W}_p\big ( (\mu +\rho ,\rho ), (\nu +\eta ,\eta )\big ) \,\Big |\, \rho ,\eta \in {\mathcal {P}}^{sub}(X), (\mu +2\rho )(X)=1,\nonumber \\&\quad (\nu +2\eta )(X)=1\Big \}, \end{aligned}$$
(1.3)

called the transportation-annihilation pre-distance. Moreover, we let

$$\begin{aligned} {\mathcal {P}}^{sub}_p(Y):=\big \{ \mu \in {\mathcal {P}}^{sub}(Y) \,\big |\, W^0_p(\mu ,\delta _y)<\infty \text { for some/all } y\in Y\big \}. \end{aligned}$$

Remark 1.3

  1. (a)

    The infima in the previous Definition will be attained if X is compact.

  2. (b)

    If \(\mu \) and \(\nu \) are probability measures, then \( W^0_p(\mu ,\nu )\) coincides with the usual \(L^p\)-Kantorovich-Wasserstein metric \(W_p(\mu ,\nu )\).

  3. (c)

    In general, \( W^0_p\) will not satisfy the triangle inequality. For instance, let \(X={{\mathbb {R}}}, Y=(-3,3), \mu =\delta _{-2}, \nu =\delta _2, \xi =0\). Then

    $$\begin{aligned} W^0_p(\mu ,\nu )=4\not \le W_p^0(\mu ,\xi )+W^0_p(\xi ,\nu )=2. \end{aligned}$$
  4. (d)

    The constraints \((\mu +2\rho )(X)=1, (\nu +2\eta )(X)=1\) can equally well be replaced by the seemingly weaker constraints \((\mu +2\rho )(X)\le 1,(\nu +2\eta )(X)\le 1\). Indeed, whenever we have subprobabilities such that the constraints hold with “\(\le 1\)”, the finiteness of \(\tilde{W}_p((\mu +\rho ,\rho ),(\nu +\eta ,\eta ))\) implies that \((\mu +2\rho )(X)=(\nu +2\eta )(X)\). But then we can choose an arbitrary subprobability \(\vartheta \) with \(\vartheta (X)=\frac{1}{2}(1-(\mu +2\rho )(X))\) and define \(\tilde{\rho }:= \rho +\vartheta , {\tilde{\eta }}:=\eta + \vartheta \). These subprobabilities now satisfy \((\mu +2{\tilde{\rho }})(X)=1=(\nu +2{\tilde{\eta }})(X)\) and we have

    $$\begin{aligned} {\tilde{W}}_p((\mu +{\tilde{\rho }},{\tilde{\rho }}),(\nu +{\tilde{\eta }},{\tilde{\eta }})) \le {\tilde{W}}_p((\mu +\rho ,\rho ),(\nu +\eta ,\eta )). \end{aligned}$$

To overcome the lack of a triangle inequality for \(W_p^0\), we now construct a related length metric. In a first step, we define a (pseudo-)metric, and from this the induced length (pseudo-)metric.

Definition 1.4

  1. (i)

    Given \(\mu ,\nu \in {\mathcal {P}}^{sub}_p(Y)\), let

    $$\begin{aligned} W_p^\flat (\mu ,\nu ) := \inf \left\{ \sum _{i=1}^n W_p^0(\eta _{i-1},\eta _{i}) \,\Big |\, n\in {\mathbb {N}}, \eta _i\in \mathcal P^{sub}_p(Y), \eta _0=\mu ,\eta _n=\nu \right\} . \end{aligned}$$
    (1.4)
  2. (ii)

    Given a curve \((\eta _{s})_{s\in [0,1]}\subset \mathcal P^{sub}_p(Y)\), we define its \(W_p^\flat \)-length by

    $$\begin{aligned} L_p^\flat (\eta ) := \sup \left\{ \sum _{i=1}^n W_p^\flat (\eta _{s_{i-1}}, \eta _{s_i}) \,\Big |\, n\in {\mathbb {N}}, 0=s_0<\ldots <s_n=1 \right\} . \end{aligned}$$
  3. (iii)

    For two measures \(\mu ,\nu \in {\mathcal {P}}^{sub}_p(Y)\), the induced length metric is now obtained by

    $$\begin{aligned} W_p^\sharp (\mu ,\nu ) := \inf \left\{ L_p^\flat (\eta ) \,\big |\, \eta :[0,1] \rightarrow {\mathcal {P}}^{sub}_p(Y)\, W_p^\flat \text {-continuous, } \eta _0=\mu ,\eta _1=\nu \right\} . \end{aligned}$$
    (1.5)

    It will be called transportation-annihilation distance.

Remark 1.5

Both \(W_p^\flat \) and \(W_p^\sharp \) are a priori only pseudo-metrics: the former is the biggest one below \(W_p^0\), the latter the smallest intrinsic one above \(W_p^\flat \). In what follows, however, it will turn out that both are indeed metrics and that for \(p=1\) they coincide.

We will compare the previous (pseudo-)metrics with the Kantorovich-Wasserstein metric \(W'_p\) on the one-point completion \((Y',d')\) of Y. Here \(Y':=Y\cup \{\partial \}\) and the shortcut metric \(d'\) is given by

$$\begin{aligned} d'(x,y) := \min \{ d(x,y), d'(x,\partial ) + d'(y,\partial ) \}, \end{aligned}$$
(1.6)

for \(x,y\in Y\), \(d'(x,\partial ) = d'(\partial ,x) := \inf _{z\in X\setminus Y} d(x,z)\), and \(d'(\partial ,\partial ):=0\). If \((X,d)\) is a complete length metric space, then so will be \((Y',d')\). If in addition X is proper (i.e. closed balls are compact), then \((Y',d')\) will be a geodesic space.

We will further denote \(d^\dagger (x,y):= d'(x,\partial ) + d'(y,\partial )\), so that \(d'=\min \{ d,d^\dagger \}\).
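In the model case \(X={{\mathbb {R}}}\), \(Y=(-1,1)\) one has \(d'(x,\partial )=1-|x|\), so \(d^\dagger (x,y)=2-|x|-|y|\), and a short case distinction on the signs of x and y gives \(d'(x,y)=\min \{|x-y|,2-|x-y|\}\), the closed form appearing in Example 1.8 below. A randomized numerical check of this identity (sample size is arbitrary):

```python
import random

# Check, on X = R with Y = (-1, 1), that the shortcut metric
# d' = min(d, d-dagger) has the closed form min(|x-y|, 2-|x-y|).

def d_prime(x, y):
    # d'(x, bdry) = 1 - |x| on Y = (-1, 1)
    d_dagger = (1 - abs(x)) + (1 - abs(y))
    return min(abs(x - y), d_dagger)

rng = random.Random(1)
for _ in range(1000):
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    assert abs(d_prime(x, y) - min(abs(x - y), 2 - abs(x - y))) < 1e-12
```

For points of equal sign the direct distance wins; for points of opposite sign one has \(|x|+|y|=|x-y|\), so the two expressions for the shortcut coincide exactly.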

Definition 1.6

  1. (i)

    \(W'_p\) will denote the \(L^p\)-Kantorovich-Wasserstein distance on \({\mathcal {P}}_p(Y')\) induced by the distance \(d'\).

  2. (ii)

    Extending each subprobability measure \(\mu \in \mathcal P^{sub}(Y)\) to a probability measure \(\mu '\in {\mathcal {P}}(Y')\) by \(\mu ':=\mu + (1-\mu (Y))\delta _\partial \) induces a bijective embedding of \({\mathcal {P}}^{sub}(Y)\) into \({\mathcal {P}}(Y')\). The induced distance on \({\mathcal {P}}^{sub}(Y)\) will again be denoted by \(W'_p\).

  3. (iii)

    For subprobability measures \(\mu ,\nu \) of equal mass we will also make use of the transportation cost

    $$\begin{aligned} W_p^\dagger (\mu ,\nu )^p := \inf _{q \in {\text {Cpl}}(\mu ,\nu )} \int _{Y\times Y} d^\dagger (x,y)^p {\text {d}}\!q(x,y) \end{aligned}$$
    (1.7)

    induced by \(d^\dagger \).

  4. (iv)

    Finally, for subprobabilities of equal mass define the \(L^p\)-transportation distance with respect to the meta-metric \(d^*\) by

    $$\begin{aligned} W_p^*(\mu ,\nu )^p := \inf _{q\in \mathrm {Cpl}(\mu ,\nu )} \int _{X\times X} d^*(x,y)^p {\text {d}}\!q(x,y), \end{aligned}$$
    (1.8)

    and let \(W_p^*(\mu ):=\frac{1}{2}W_p^*(\mu ,\mu )\), which will be called annihilation cost of the subprobability \(\mu \).

Remark 1.7

Obviously, \(W_p^*\) is symmetric in its arguments and satisfies the triangle inequality but typically \(W_p^*(\mu ,\mu ) \not =0\).

Example 1.8

Let \(X={{\mathbb {R}}}, Y=(-1,1), \mu =\delta _{x}, \nu =\delta _y\) for \(x,y\in Y\). Then

$$\begin{aligned} W^0_p(\mu ,\nu )=|x-y|,\qquad W^\flat _p(\mu ,\nu )=W^\sharp _p(\mu ,\nu )=W_p'(\mu ,\nu )=\min \{|x-y|,2-|x-y|\}. \end{aligned}$$

Remark 1.9

One could equally well define

$$\begin{aligned} W_p''(\mu ,\nu ) := \inf \{ W_p(\check{\mu },\check{\nu }) \,\big |\, \check{\mu },\check{\nu }\in {\mathcal {M}}(Y'), \check{\mu }|_Y=\mu , \check{\nu }|_Y=\nu \}. \end{aligned}$$

For \(p=1\) the metrics \(W_1'\) and \(W_1''\) coincide, but for \(p>1\) this is no longer true. Take for instance \(X={\mathbb {R}}\), \(Y=(-3,3)\) and \(\mu =\delta _{-2},\nu =\delta _2\). Then \(W_p'(\mu ,\nu )^p=d'(-2,2)^p=2^p\) whereas \(W_p''(\mu ,\nu )^p \le d'(-2,\partial )^p+d'(2,\partial )^p=2\). The metric \(W_2''\) coincides with Figalli & Gigli’s metric \(Wb_2\) [7].
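The arithmetic behind this example can be checked in a few lines (the boundary points \(\pm 3\) are hard-coded for this particular example):

```python
# mu = delta_{-2}, nu = delta_2 in Y = (-3, 3): transporting through the
# boundary (W'_p) versus destroying and re-creating mass there (W''_p).

def dist_to_boundary(x, a=-3.0, b=3.0):
    # d'(x, bdry) for Y = (a, b)
    return min(x - a, b - x)

p = 2
# W'_p transports all mass from -2 to 2, possibly via the boundary:
W_prime_p = min(abs(-2 - 2), dist_to_boundary(-2) + dist_to_boundary(2)) ** p
# W''_p may instead destroy mass at -2 and create mass at 2:
W_dblprime_p_bound = dist_to_boundary(-2) ** p + dist_to_boundary(2) ** p

assert W_prime_p == 4.0            # d'(-2, 2) = min(4, 1 + 1) = 2, squared
assert W_dblprime_p_bound == 2.0
assert W_dblprime_p_bound < W_prime_p
```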

From now on until the end of this subsection, assume that \((X,d)\) is a length space.

Quite intuitive characterizations of \(W^0_p(\mu ,\nu )\), \(W^\sharp _p(\mu ,\nu )\), and \(W'_p(\mu ,\nu )\) are possible in terms of \(L^p\)-transportation costs and \(L^p\)-annihilation costs.

Lemma 1.10

  1. (i)

    For all \(\mu ,\nu \in {\mathcal {P}}^{sub}_1(Y)\)

    $$\begin{aligned} {W^0_1}(\mu ,\nu )&= \inf \Big \{ W_1(\mu _1,\nu _1) + W_1^*(\mu _0) + W^*_1(\nu _0) \,\Big |\, \\&\mu =\mu _1+\mu _0, \nu =\nu _1+\nu _0, (\mu +\nu _0)(X)\le 1, (\nu +\mu _0)(X)\le 1 \Big \}. \end{aligned}$$
  2. (ii)

    More generally, for all \(p\ge 1\) and \(\mu ,\nu \in \mathcal P^{sub}_p(Y)\)

    $$\begin{aligned}&{W^0_p}(\mu ,\nu )^p \le \inf \Big \{W_p(\mu _1,\nu _1)^p + W_p^*(\mu _0)^p + W^*_p(\nu _0)^p \,\Big |\, \\&\quad \mu =\mu _1+\mu _0, \nu =\nu _1+\nu _0, (\mu +\nu _0)(X)\le 1, (\nu +\mu _0)(X)\le 1 \Big \}. \end{aligned}$$
  3. (iii)

    For all \(\mu ,\nu \in {\mathcal {P}}^{sub}_1(Y)\)

    $$\begin{aligned} W_1^\sharp (\mu ,\nu )&= \inf \Big \{ W_1(\mu _1,\nu _1) + W_1^*(\mu _0) + W^*_1(\nu _0) \,\Big |\, \mu =\mu _1+\mu _0, \nu =\nu _1+\nu _0 \Big \}. \end{aligned}$$
  4. (iv)

    For all \(\mu ,\nu \in {\mathcal {P}}^{sub}_p(Y)\)

    $$\begin{aligned} W'_p(\mu ,\nu )^p&= \inf \Big \{ W_p(\mu _1,\nu _1)^p + W^\dagger _p(\mu _2,\nu _2)^p+ W'_p(\mu _0,0)^p + W'_p(\nu _0,0)^p \,\Big |\nonumber \\ \mu&=\mu _1+\mu _2+\mu _0, \ \nu =\nu _1+\nu _2+\nu _0,\ (\mu +\nu _0)(Y)\le 1, \ (\nu +\mu _0)(Y)\le 1\Big \} \end{aligned}$$
    (1.9)

    where \(W'_p(\mu _0,0)^p = \int _Y d'(x,\partial )^p {\text {d}}\!\mu _0(x)\) with 0 denoting the subprobability measure with vanishing total mass. In the case \(p=1\), contributions from the term \(W^\dagger _p(\mu _2,\nu _2)^p\) can be avoided; in other words, one can always choose \(\mu _2=\nu _2=0\).

Lemma 1.11

For all \(p\ge 1\) and all \(\mu \in {\mathcal {P}}_p(Y)\)

$$\begin{aligned} 2^{-1+1/p}\, W'_p(\mu ,0)\le W^*_p(\mu )\le W'_p(\mu ,0)=\inf \big \{ W_p(\mu ,\xi ) \,\big |\, \ \xi \in {\mathcal {P}}( \partial Y)\big \}. \end{aligned}$$

In particular, \(W^*_1(\mu )=W'_1(\mu ,0)\). More generally, for all \(\mu ,\nu \in {\mathcal {P}}_1(Y)\)

$$\begin{aligned} W^*_1(\mu ,\nu )=\inf \Big \{ W_1(\mu ,\xi )+W_1(\xi ,\nu ) \,\big |\, \ \xi \in {\mathcal {P}}( \partial Y)\Big \}. \end{aligned}$$

Remark 1.12

In general, \(W^*_p(\mu )\) and \(W'_p(\mu ,0)\) will not coincide, and the lower bound \(2^{-1+1/p}\) for the ratio \(W^*_p(\mu )/W'_p(\mu ,0)\) from Lemma 1.11 is sharp.

For instance, let \(Y=(0,2)\subset X={{\mathbb {R}}}\) and \(\mu =\frac{1}{2}(\delta _1+\delta _\varepsilon )\) for some \(\varepsilon \in (0,1)\). Then \(W'_p(\mu ,0)^p=\frac{1}{2}(1+\varepsilon ^p)\) whereas \(W^*_p(\mu )^p=\big (\frac{1+\varepsilon }{2}\big )^p\). Thus for \(\varepsilon \) sufficiently small, \(W^*_p(\mu )/W'_p(\mu ,0)\) is arbitrarily close to \(2^{-1+1/p}\).
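The sketch below restates the closed-form costs of this example; the only optimization involved is the comparison of the two couplings of the two atoms of \(\mu \) with themselves:

```python
# Sharpness example: Y = (0, 2), mu = (delta_1 + delta_eps)/2.
# W'_p(mu, 0) sends each atom to its nearest boundary point; W*_p(mu)
# compares the swap coupling (through the boundary point 0) with the
# identity coupling, using d*(1, eps) = 1 + eps and d*(x, x) = 2 min(x, 2-x).

def ratio(p, eps):
    W_prime = (0.5 * 1.0 ** p + 0.5 * eps ** p) ** (1.0 / p)
    swap_cost = (1 + eps) ** p
    identity_cost = 0.5 * 2.0 ** p + 0.5 * (2 * eps) ** p
    W_star = 0.5 * min(swap_cost, identity_cost) ** (1.0 / p)
    return W_star / W_prime

p = 2
lower = 2.0 ** (-1 + 1.0 / p)
for eps in (0.5, 0.1, 0.01):
    r = ratio(p, eps)
    assert lower - 1e-12 <= r <= 1.0 + 1e-12   # the bounds of Lemma 1.11
assert abs(ratio(p, 1e-8) - lower) < 1e-3      # sharpness as eps -> 0
```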

Theorem 1.13

  1. (i)

    For all \(\mu ,\nu \in {\mathcal {P}}^{sub}_1(Y)\)

    $$\begin{aligned} W_1^\flat (\mu ,\nu ) = W_1^\sharp (\mu , \nu )=W_1'(\mu , \nu ). \end{aligned}$$
  2. (ii)

    More generally, for all \(p\ge 1\) and all \(\mu ,\nu \in \mathcal P^{sub}_p(Y)\)

    $$\begin{aligned} W_1'(\mu , \nu )\le W_p^\flat (\mu ,\nu )\le W^\sharp _p(\mu ,\nu )\le W'_p(\mu ,\nu ). \end{aligned}$$

Example 1.14

Let \(X={{\mathbb {R}}}, Y=(-2,2), \mu =\frac{1}{2n+1}\delta _{-1/2}, \nu =\frac{1}{2n+1}\delta _{+1/2}\) for \(n\in {{\mathbb {N}}}\). Then

$$\begin{aligned} W'_p(\mu ,\nu )^p=W_p(\mu ,\nu )^p=\frac{1}{2n+1}. \end{aligned}$$

Taking

$$\begin{aligned} \sigma := \left( \frac{1}{2n+1} \sum _{k=0}^n \delta _{\frac{2k}{2n+1}-\frac{1}{2}}, \frac{1}{2n+1} \sum _{k=1}^n \delta _{\frac{2k}{2n+1}-\frac{1}{2}} \right) \end{aligned}$$

and

$$\begin{aligned} \tau := \left( \frac{1}{2n+1} \sum _{k=0}^{n} \delta _{\frac{2k+1}{2n+1}-\frac{1}{2}}, \frac{1}{2n+1} \sum _{k=0}^{n-1} \delta _{\frac{2k+1}{2n+1}-\frac{1}{2}} \right) , \end{aligned}$$

we see that

$$\begin{aligned} W_p^0(\mu ,\nu )^p \le {\tilde{W}}_p (\sigma ,\tau )^p = \left( \frac{1}{2n+1} \right) ^p, \end{aligned}$$

so that

$$\begin{aligned} W_p^\flat (\mu ,\nu ) \le W_p^0(\mu ,\nu ) \le \left( \frac{1}{2n+1} \right) < \left( \frac{1}{2n+1} \right) ^{\frac{1}{p}} = W_p'(\mu ,\nu ), \end{aligned}$$

for \(p>1\), \(n\ge 1\). In particular, the lower estimate for \(W^\flat _p\) in assertion (ii) of the previous theorem is sharp.
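The cost bound used above comes from an explicit same-charge matching in which every atom moves by \(1/(2n+1)\); the following sketch recomputes it (the values of n and p are illustrative):

```python
# Example above, recomputed: the charged measures sigma, tau interlace
# 2n+1 atoms of mass 1/(2n+1) in Y = (-2, 2); matching each atom to the
# neighbour of the same charge moves every atom by exactly 1/(2n+1).

n, p = 3, 2
N = 2 * n + 1
mass = 1.0 / N

pos_sigma_plus = [2 * k / N - 0.5 for k in range(n + 1)]
pos_sigma_minus = [2 * k / N - 0.5 for k in range(1, n + 1)]
pos_tau_plus = [(2 * k + 1) / N - 0.5 for k in range(n + 1)]
pos_tau_minus = [(2 * k + 1) / N - 0.5 for k in range(n)]

# effective measures: sigma^+ - sigma^- = (1/N) delta_{-1/2},
# tau^+ - tau^- = (1/N) delta_{+1/2}
assert min(pos_sigma_plus) == -0.5 and max(pos_tau_plus) == 0.5

# each sigma^+ atom pairs with the tau^+ atom 1/N to its right, and each
# tau^- atom with the sigma^- atom 1/N to its right
cost = sum(mass * abs(a - b) ** p for a, b in zip(pos_sigma_plus, pos_tau_plus))
cost += sum(mass * abs(a - b) ** p for a, b in zip(pos_sigma_minus, pos_tau_minus))
assert abs(cost - (1.0 / N) ** p) < 1e-12

# whereas W'_p(mu, nu)^p = (1/N) * 1^p, strictly bigger for p > 1
W_prime_p = mass * 1.0 ** p
assert cost < W_prime_p
```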

A useful feature of \(W_p^\sharp \) is that it metrizes vague convergence of subprobability measures.

Proposition 1.15

Assume that X is a compact geodesic space. Then for every \(p\in [1,\infty )\), \(W^\sharp _p\) is a complete, separable, geodesic metric on \({\mathcal {P}}^{sub}_p(Y)\) and for \(\mu _n,\mu \in \mathcal P^{sub}_p(Y)\) the following are equivalent:

  1. (i)

    \(\mu _n\rightarrow \mu \) vaguely on Y.

  2. (ii)

    \(W^\sharp _p(\mu _n,\mu )\rightarrow 0\) as \(n\rightarrow \infty \).

Remark 1.16

In particular, this implies that \(\mu _n\rightarrow \mu \) weakly on Y if and only if \(W^\sharp _p(\mu _n,\mu )\rightarrow 0\) and \(\mu _n(Y)\rightarrow \mu (Y)\). A similar result for \(W^0_p\) can be deduced even without requiring that X is geodesic, see Lemma 4.4.

The implication “(ii)\(\Rightarrow \)(i)” holds true for all length spaces X without requiring their compactness. For the converse, one has to add a condition on convergence of moments, see remark following Lemma 4.4.

1.2 Gradient flow perspective and transportation estimates

From now on, let us be more specific. We assume that \((X,d,{\mathfrak {m}})\) is a metric measure space which satisfies an RCD\((K,\infty )\)-condition for some number \(K\in {{\mathbb {R}}}\) and that \(Y\subset X\) is a dense open subset with \({\mathfrak {m}}(\partial Y)=0\). The RCD\((K,\infty )\)-condition means that the metric measure space \((X,d,{\mathfrak {m}})\) is infinitesimally Hilbertian with Ricci curvature bounded from below by K in the sense of Lott-Sturm-Villani, [13, 24]. The latter is formulated as K-convexity of the Boltzmann entropy \(\mathrm {Ent}_{\mathfrak {m}}\) in \(\big ({\mathcal {P}}_2(X), W_2\big )\). We will additionally request that this property extends to the space of charged probability measures induced by Y, that is, we will request that \((X,Y,d,{\mathfrak {m}})\) satisfies the following:

Assumption 1.17

(“Charged Lower Ricci Bound K”) The Boltzmann entropy

$$\begin{aligned}&{\widetilde{\mathrm {Ent}}}_{\mathfrak {m}}: \quad \tilde{\mathcal {P}}_2(Y|X)\rightarrow (-\infty ,\infty ]\\&\quad \sigma \mapsto \mathrm {Ent}_{\mathfrak {m}}(\sigma ^+)+\mathrm {Ent}_{\mathfrak {m}}(\sigma ^-) \end{aligned}$$

is K-convex in the metric space \(\big (\tilde{\mathcal {P}}_2(Y|X), \tilde{W}_2\big )\).
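Spelled out as in the Lott-Sturm-Villani definition recalled above, the K-convexity requested here means that along every geodesic \((\sigma _s)_{s\in [0,1]}\) in \(\big (\tilde{\mathcal {P}}_2(Y|X), \tilde{W}_2\big )\)

```latex
\widetilde{\mathrm{Ent}}_{\mathfrak m}(\sigma_s)
  \le (1-s)\,\widetilde{\mathrm{Ent}}_{\mathfrak m}(\sigma_0)
      + s\,\widetilde{\mathrm{Ent}}_{\mathfrak m}(\sigma_1)
      - \frac{K}{2}\,s(1-s)\,\tilde{W}_2(\sigma_0,\sigma_1)^2
  \qquad \text{for all } s\in[0,1].
```

On the image of the embedding \(\mu \mapsto (\frac{1}{2}\mu ,\frac{1}{2}\mu )\), where \({\widetilde{\mathrm {Ent}}}_{\mathfrak {m}}\) differs from \(\mathrm {Ent}_{\mathfrak {m}}\) only by the additive constant \(\log \frac{1}{2}\), this reduces to the classical K-convexity of \(\mathrm {Ent}_{\mathfrak {m}}\) in \(\big ({\mathcal {P}}_2(X), W_2\big )\), cf. Remark 1.18.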

Remark 1.18

  1. (a)

    Note that, due to the isometric embedding of \({\mathcal {P}}_2(X)\) into \(\tilde{\mathcal {P}}_2(Y|X)\), this assumption will imply the K-convexity of \(\mathrm {Ent}_{\mathfrak {m}}\) in \(\big ({\mathcal {P}}_2(X), W_2\big )\) and thus the CD\((K,\infty )\)-condition for the metric measure space \((X,d,{\mathfrak {m}})\).

  2. (b)

    If \((X,d,{\mathfrak {m}})\) is infinitesimally Hilbertian and if \({\mathfrak {m}}\) has full topological support then Assumption 1.17 implies that \({\overline{Y}}=X\). Indeed, the argument from [20] carries over to this framework and yields essential non-branching which in turn implies the density of Y in X.

The proofs of the following results will be given in Sect. 5. They will be based on concepts and results for gluing of metric measure spaces which will be presented in Sect. 3. For the various kinds of heat flows appearing from this section on, see Sect. 2.2.

Theorem 1.19

Let \((M,g)\) be a complete Riemannian manifold with Ricci curvature bounded below by \(K\in {\mathbb {R}}\). Take an open, bounded, convex subset \(Y\subset M\) with smooth, compact boundary. Consider the closure \(X:= {\overline{Y}}\) with the Riemannian distance d and the Riemannian volume measure \({\mathfrak {m}}\) obtained by restriction to X. Then the metric measure space \((X,d,{\mathfrak {m}})\) satisfies the RCD\((K,\infty )\)-condition and \((X,Y,d,{\mathfrak {m}})\) satisfies Assumption 1.17.

Proposition 1.20

Assume that Assumption 1.17 holds.

  1. (i)

    For each \(\sigma _0\in \tilde{\mathcal {P}}_2(Y|X)\), there exists a unique \({\text {EVI}}_K\)-gradient flow \((\sigma _t)_{t>0}\) for the Boltzmann entropy \({\widetilde{\mathrm {Ent}}}_{\mathfrak {m}}\) in \(\big (\tilde{\mathcal {P}}_2(Y|X), \tilde{W}_2\big )\).

  2. (ii)

    For each \(\mu _0\in {\mathcal {P}}_2^{sub}(Y)\), the heat flow \((\mu _t)_{t>0}\) on Y with Dirichlet boundary conditions is obtained as the effective flow

    $$\begin{aligned} \mu _t=\sigma ^+_t-\sigma ^-_t \end{aligned}$$

    where \((\sigma _t)_{t>0}\) is the \({\text {EVI}}_K\)-flow as above starting in any \(\sigma _0\in \tilde{\mathcal {P}}_2(Y|X)\) with \(\mu _0=\sigma ^+_0-\sigma ^-_0\).

  3. (iii)

    For each \(\nu _0\in {\mathcal {P}}_2(X)\), the heat flow \((\nu _t)_{t>0}\) on X is obtained as the total flow

    $$\begin{aligned} \nu _t=\sigma ^+_t+\sigma ^-_t \end{aligned}$$

    where \((\sigma _t)_{t>0}\) is the \({\text {EVI}}_K\)-flow as above starting in any \(\sigma _0\in \tilde{\mathcal {P}}_2(Y|X)\) with \(\nu _0=\sigma ^+_0+\sigma ^-_0\).

  4. (iv)

    For each \(\sigma _0\in \tilde{\mathcal {P}}_2(Y|X)\), the \({\text {EVI}}_K\)-flow \((\sigma _t)_{t>0}\) from (i) can be characterized as

    $$\begin{aligned} \sigma _t=\Big ( \frac{\nu _t+\mu _t}{2}, \frac{\nu _t-\mu _t}{2}\Big ) \end{aligned}$$

    where \((\nu _t)_{t>0}\) will denote the heat flow on X starting in \(\nu _0=\sigma ^+_0+\sigma ^-_0\) and \((\mu _t)_{t>0}\) will denote the heat flow on Y with Dirichlet boundary conditions starting in \(\mu _0=\sigma ^+_0-\sigma ^-_0\).
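Assertion (iv) can be illustrated by an elementary finite-difference computation on an interval. The discretization below is an ad-hoc sketch (grid size, step number, and the explicit Euler scheme are illustrative choices): the Neumann flow \(\nu _t\) conserves mass, the Dirichlet flow \(\mu _t\) loses mass, and the pair \(\sigma _t=\big (\frac{\nu _t+\mu _t}{2}, \frac{\nu _t-\mu _t}{2}\big )\) remains a charged probability measure:

```python
# Finite-difference sketch on a 1-d grid: run the Neumann flow nu_t and
# the Dirichlet flow mu_t from the same initial probability vector and
# form sigma_t = ((nu + mu)/2, (nu - mu)/2).  All parameters are ad hoc.

N, lam, steps = 50, 0.25, 400

def step(u, dirichlet):
    # explicit Euler step for the discrete Laplacian; Dirichlet uses zero
    # ghost values at the walls, Neumann mirrored (reflecting) ghost values
    new = []
    for i in range(N):
        left = u[i - 1] if i > 0 else (0.0 if dirichlet else u[0])
        right = u[i + 1] if i < N - 1 else (0.0 if dirichlet else u[N - 1])
        new.append(u[i] + lam * (left - 2 * u[i] + right))
    return new

nu = [0.0] * N
nu[N // 2] = 1.0                     # nu_0 = mu_0 = Dirac at the midpoint
mu = list(nu)
for _ in range(steps):
    nu, mu = step(nu, False), step(mu, True)

sigma_plus = [(a + b) / 2 for a, b in zip(nu, mu)]
sigma_minus = [(a - b) / 2 for a, b in zip(nu, mu)]

assert abs(sum(nu) - 1.0) < 1e-9             # Neumann flow conserves mass
assert 0.0 < sum(mu) < 1.0                   # Dirichlet flow loses mass
assert all(v >= -1e-12 for v in sigma_plus + sigma_minus)
assert abs(sum(sigma_plus) + sum(sigma_minus) - 1.0) < 1e-9
```

With \(\lambda \le 1/2\) both schemes preserve positivity, and one checks directly that \(\nu \ge \mu \) is propagated, so \(\sigma ^-_t\ge 0\) as required.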

Remark 1.21

  1. (a)

    As in [21, after Cor. 4.3, Thm. 4.4] (based on [5, Prop. 3.2, Thm. 3.5]) one can extend the flow to measures without finite second moment.

  2. (b)

    In the situation of Theorem 1.19, the “heat flow on X” will be the heat flow on \({\overline{Y}}\subset M\) with Neumann boundary conditions at \(\partial Y\).

Proposition 1.22

The \({\text {EVI}}_K\)-flows \((\sigma _t)_{t>0}\) and \((\tau _t)_{t>0}\) as above are K-contractive in all \(L^p\)-transportation distances:

$$\begin{aligned} \tilde{W}_p\big ( \sigma _t,\tau _t\big )\le e^{-Kt}\cdot \tilde{W}_p\big ( \sigma _0,\tau _0\big ) \end{aligned}$$

for all \(t>0\) and all \(p\in [1,\infty )\).

Theorem 1.23

For all \(\mu _0,\nu _0\in {\mathcal {P}}^{sub}_p(Y)\), all \(t>0\) and all \(p\in [1,\infty )\)

$$\begin{aligned} W^0_p\big ( \mu _t,\nu _t\big )\le e^{-Kt}\cdot W^0_p\big ( \mu _0,\nu _0\big ) \end{aligned}$$

where \((\mu _t)_{t>0}\) and \((\nu _t)_{t>0}\) denote the heat flows on Y with Dirichlet boundary conditions starting in \(\mu _0\) and \(\nu _0\), resp.

Proof

Given \(\mu _0,\nu _0\in {\mathcal {P}}^{sub}_p(Y)\) and \(\varepsilon >0\), we may choose \(\sigma _0,\tau _0\in {\tilde{{\mathcal {P}}}}_p(Y|X)\) with \(\mu _0=\sigma ^+_0-\sigma ^-_0\) and \(\nu _0=\tau ^+_0-\tau ^-_0\) such that

$$\begin{aligned} \tilde{W}_p\big ( \sigma _0,\tau _0\big )\le W^0_p\big ( \mu _0,\nu _0\big )+\varepsilon . \end{aligned}$$

Thus, by the very definition of \( W^0_p\) and by the previous proposition,

$$\begin{aligned} W^0_p\big ( \mu _t,\nu _t\big ) \le \tilde{W}_p\big ( \sigma _t,\tau _t\big ) \le e^{-Kt}\cdot \tilde{W}_p\big ( \sigma _0,\tau _0\big )=e^{-Kt}\cdot \Big ( W^0_p\big ( \mu _0,\nu _0\big )+\varepsilon \Big ). \end{aligned}$$

Since \(\varepsilon >0\) was arbitrary, this proves the claim. \(\square \)

Corollary 1.24

Let \(\mu _0,\nu _0\in {\mathcal {P}}^{sub}_p(Y)\), and \((\mu _t)_{t>0}\) and \((\nu _t)_{t>0}\) denote the heat flows on Y with Dirichlet boundary conditions starting in \(\mu _0\) and \(\nu _0\), resp. Then for all \(t>0\) and all \(p\in [1,\infty )\) we have both

$$\begin{aligned} W^\flat _p\big ( \mu _t,\nu _t\big )\le e^{-Kt}\cdot W^\flat _p\big ( \mu _0,\nu _0\big ), \end{aligned}$$

and

$$\begin{aligned} W^\sharp _p\big ( \mu _t,\nu _t\big )\le e^{-Kt}\cdot W^\sharp _p\big ( \mu _0,\nu _0\big ). \end{aligned}$$

In particular, \(W'_1\big ( \mu _t,\nu _t\big )\le e^{-Kt}\cdot W'_1\big ( \mu _0,\nu _0\big )\).

Proof

Observe that

$$\begin{aligned} W_p^\flat (\mu _t,\nu _t)&= \inf \left\{ \sum _{i=1}^n W_p^0(\eta _{i-1}, \eta _i) \,\big |\, n\in {\mathbb {N}}, \eta _i\in \mathcal P^{sub}_p(Y), \eta _0=\mu _t, \eta _n=\nu _t \right\} \\&\le \inf \left\{ \sum _{i=1}^n W_p^0({\mathscr {P}}_t^0\xi _{i-1}, {\mathscr {P}}_t^0\xi _i) \,\big |\, n\in {\mathbb {N}}, \xi _i\in \mathcal P^{sub}_p(Y), \xi _0=\mu _0, \xi _n=\nu _0 \right\} \\&\le e^{-Kt} \inf \left\{ \sum _{i=1}^n W_p^0(\xi _{i-1}, \xi _i) \,\big |\, n\in {\mathbb {N}}, \xi _i\in {\mathcal {P}}^{sub}_p(Y), \xi _0=\mu _0, \xi _n=\nu _0 \right\} \\&= e^{-Kt} W_p^\flat (\mu _0,\nu _0). \end{aligned}$$

Here, \({\mathscr {P}}_t^0\) is the heat semigroup with Dirichlet boundary conditions acting on measures, see Sect. 2.2. This also implies that for a curve \((\eta _s)_{s\in [0,1]}\subset \mathcal P^{sub}_p(Y)\) its length satisfies \(L_p^\flat ({\mathscr {P}}_t^0\eta ) \le e^{-Kt} L_p^\flat (\eta )\), so that eventually

$$\begin{aligned} W_p^\sharp (\mu _t,\nu _t) =\inf _{\eta :\mu _t\leadsto \nu _t} L_p^\flat (\eta ) \le \inf _{\xi :\mu _0 \leadsto \nu _0} L_p^\flat ({\mathscr {P}}_t^0 \xi ) \le e^{-Kt} \inf _{\xi :\mu _0 \leadsto \nu _0} L_p^\flat (\xi ) = e^{-Kt} W_p^\sharp (\mu _0,\nu _0). \end{aligned}$$

\(\square \)

1.3 Gradient estimates and Bochner’s inequality

Let us continue to assume that \((X,d,{\mathfrak {m}})\) is a metric measure space which satisfies an RCD\((K,\infty )\)-condition and that \(Y\subset X\) is a dense open subset with \({\mathfrak {m}}(\partial Y)=0\). Assumption 1.17 yields a gradient estimate which involves both semigroups, \(P_t\) (with Neumann boundary conditions) and \(P_t^0\) (with Dirichlet boundary conditions). Before proving this estimate, we will see that it is equivalent to a Bochner inequality which involves the corresponding Laplace operators. To state the p-versions directly, let us introduce the appropriate function spaces. For \(p\in [1,\infty )\) we set

$$\begin{aligned} D_p({\mathcal {E}}) :&= \{ f\in D({\mathcal {E}})\cap L^p(X,{\mathfrak {m}}) \,\big |\, |\nabla f| \in L^p(X,{\mathfrak {m}}) \}, \end{aligned}$$
(1.10)
$$\begin{aligned} D_p(\Delta ) :&= \{ f\in D(\Delta ) \cap L^p(X,{\mathfrak {m}}) \,\big |\, \Delta f\in L^p(X,{\mathfrak {m}}) \}, \end{aligned}$$
(1.11)

and similarly for \({\mathcal {E}}^0\) and \(\Delta ^0\), which are the Dirichlet form and generator associated to the heat flow \(P_t^0\).

Proposition 1.25

Assume that \({\mathfrak {m}}(X)<\infty \). For each \(p\in [1,2]\), the following properties are equivalent to each other:

  1. (i)

    For all \(t>0\), and all \(f\in D_p({\mathcal {E}}^0)\)

$$\begin{aligned} \big |\nabla P^0_tf\big |^p\le e^{-Kpt}\cdot P_t\big (|\nabla f|^p\big ) \quad {\mathfrak {m}}\text {-a.e.\ in } X \qquad (\text {``}p\text {-gradient estimate''}). \end{aligned}$$
    (1.12)

    Note that different semigroups appear on the left and right hand side.

  2. (ii)

    For all \(f\in D_p(\Delta ^0)\) with \(\Delta ^0f\in D_p({\mathcal {E}}^0)\) and every \(\varphi \in D_\infty (\Delta )\) with \(\varphi \ge 0\)

$$\begin{aligned}&\frac{1}{p} \int _X \Delta \varphi |\nabla f|^p \mathop {}\!\mathrm {d}{\mathfrak {m}}- \int _{\{|\nabla f| \ne 0 \}} \varphi |\nabla f|^{p-2} \nabla f \cdot \nabla \Delta ^0 f\mathop {}\!\mathrm {d}{\mathfrak {m}}\nonumber \\&\quad \ge K \int _X \varphi |\nabla f|^p \mathop {}\!\mathrm {d}{\mathfrak {m}}\qquad (\text {``}p\text {-Bochner inequality''}). \end{aligned}$$
    (1.13)

The proof is an adaptation of that of [9, Thm. 3.6].

Theorem 1.26

  1. (i)

    Assumption 1.17 implies that both properties (i) and (ii) of Proposition 1.25 are satisfied, even for all \(p\in [1,\infty )\) and without the assumption that \({\mathfrak {m}}(X)<\infty \).

  2. (ii)

    Moreover, it implies that the flows from Proposition 1.20 and the heat semigroups are related to each other by

    $$\begin{aligned} \nu _t=(P_tv){\mathfrak {m}}, \qquad \mu _t=(P_t^0w){\mathfrak {m}}\end{aligned}$$

    for \(\nu _0=v{\mathfrak {m}}\in {\mathcal {P}}_2(X)\) and \(\mu _0=w{\mathfrak {m}}\in \mathcal P^{sub}_2(X)\).

Corollary 1.27

Suppose Assumption 1.17 holds. Then for all \(u:X\rightarrow {{\mathbb {R}}}\) and all \(t>0\)

$$\begin{aligned} {\text {Lip}}_d(P_t^0u)\le e^{-Kt}\, {\text {Lip}}_d(u) \end{aligned}$$

as well as

$$\begin{aligned} {\text {Lip}}_{d'}(P_t^0u)\le e^{-Kt}\, {\text {Lip}}_{d'}(u). \end{aligned}$$

Here \({\text {Lip}}_{d}(.)\) denotes the Lipschitz constant w.r.t. the original metric d on \(X={\overline{Y}}\) whereas \({\text {Lip}}_{d'}(.)\) denotes the Lipschitz constant w.r.t. the shortcut metric \(d'\) on \(Y'=Y\cup \{\partial \}\).

Proof

The \({\text {Lip}}_{d}\)-estimate follows from the gradient estimate (1.12) by passing to supremum norms. The \({\text {Lip}}_{d'}\)-estimate, on the other hand, follows via Kuwada duality from the transport estimate in Corollary 1.24 with \(p=1\). \(\square \)

Let us finally give a geometric characterization of Assumption 1.17. Given a metric measure space \((V,d_V,{\mathfrak {m}}_V)\) we say that an open subset \(U\subset V\) is a halfspace if there exists a measure-preserving isometry \(\psi : V\rightarrow V\) with invariant set \(\partial U=\{x\in V: \psi (x)=x\}\) such that \(\psi (U)=V\setminus {\overline{U}}\). We call two metric measure spaces \((V,d_V,{\mathfrak {m}}_V)\) and \((W,d_W,{\mathfrak {m}}_W)\) mms-isomorphic if there exists a measure-preserving isometry \(\xi :(V,d_V,{\mathfrak {m}}_V) \rightarrow (W,d_W,{\mathfrak {m}}_W)\).

Theorem 1.28

Let \((X,d,{\mathfrak {m}})\) be a metric measure space, and \(Y\subset X\) an open local \({\text {RCD}}^*(K,\infty )\) space. The following properties are equivalent:

  1. (i)

    Assumption 1.17.

  2. (ii)

    Y is a halfspace in some \({\text {RCD}}^*(K,\infty )\)-space \((V,d_V,{\mathfrak {m}}_V)\) in the sense that there is a halfspace \(\tilde{Y}\subset V\) and a measure-preserving isometry \(\xi :(Y,d,{\mathfrak {m}}|_Y)\rightarrow ({\tilde{Y}},d_V,{\mathfrak {m}}_V|_{{\tilde{Y}}})\).

  3. (iii)

    \(\partial Y\) is covered by open sets \(X_i\) such that \(Y\cap X_i\) for each i is mms-isomorphic to a halfspace \(W_i\) in some \({\text {RCD}}^*(K,\infty )\)-space \((V_i,d_i,{\mathfrak {m}}_i)\).

Remark 1.29

To our knowledge, the heat flow with Dirichlet boundary values has so far been investigated from an optimal transport perspective only in [7], where the authors define a transportation distance between measures which allows mass to be created or destroyed at the boundary. This metric is a modification of our transportation metric \(W_2'\) based on the shortcut metric \(d'\), see Remark 1.9. It leads to a gradient flow description of the heat equation with strictly positive, constant Dirichlet boundary conditions. However, it does not apply to the study of the heat flow with vanishing Dirichlet boundary conditions. Further approaches to metrics on the space of finite Radon measures are given in [11, 12, 18].

Structure of the paper: In Sect. 1 we introduced the setting of particles and antiparticles, gave the definitions, stated the main results, and proved those results which do not require the doubling. Section 2 deals with the heat flow on metric measure spaces; in particular, the heat flow with Dirichlet boundary values is discussed. In Sect. 3, the gluing of metric measure spaces is introduced and the space of charged probability measures is identified with the space of probability measures on the doubled space. Section 4 is devoted to the detailed study of various (generalized) metrics on the space of probability measures. Finally, in Sect. 5, we present the remaining proofs of the results of Sects. 1.2 and 1.3.

In the sequel, the notion of a metric on a space X will be crucial: it is a real-valued, symmetric function on \(X\times X\) which satisfies the triangle inequality, vanishes on the diagonal and is positive otherwise. We will also use several extensions which satisfy all but one of the above properties:

  • extended metric: also the value \(+\infty \) is admitted

  • pseudo-metric: may vanish also outside the diagonal

  • meta-metric: not necessarily vanishing on the diagonal

  • semi-metric: triangle inequality is not required.

As we will encounter as many as nine generalized “W-metrics”, let us give a short overview of where to find the definitions:

  • \(W_p\) usual Kantorovich-Wasserstein metric on \({\mathcal {P}}_p(X)\)

  • \({\tilde{W}}_p\) transportation metric on \(\tilde{\mathcal P}_p(Y|X)\), (1.1)

  • \(W_p^0\) transportation-annihilation pre-metric on \(\mathcal P^{sub}_p(Y)\), (1.2)

  • \(W_p^\flat \) pseudo-metric on \({\mathcal {P}}^{sub}_p(Y)\), (1.4)

  • \(W_p^\sharp \) transportation-annihilation metric on \(\mathcal P^{sub}_p(Y)\), (1.5)

  • \(W_p'\) Kantorovich-Wasserstein metric on \({\mathcal {P}}_p(Y')\), based on shortcut metric \(d'\), (1.6)

  • \(W_p^\dagger \) transportation cost “over the boundary” on measures on Y of the same mass, (1.7)

  • \(W_p^*\) annihilation cost; meta-metric on measures on X of the same mass, (1.8)

  • \(\hat{W}_p\) Kantorovich-Wasserstein metric on \(\mathcal P_p({\hat{X}})\), Lemma 3.11

2 Metric measure spaces and heat flows

2.1 Gradients and Dirichlet forms

In this subsection we will introduce some notation and collect some results for Dirichlet forms on the original space X.

Let \((X,d)\) be a complete, separable length metric space, and let \({\mathfrak {m}}\) be a Borel measure with full support \({\text {supp}}{\mathfrak {m}}=X\), satisfying the exponential integrability condition

$$\begin{aligned} \int _X e^{-cd(x,x^*)^2} \mathop {}\!\mathrm {d}{\mathfrak {m}}(x) < \infty \end{aligned}$$
(2.1)

for some \(c>0,\,x^*\in X\).

The Cheeger energy of a function \(f\in L^2(X,{\mathfrak {m}})\) is defined as

$$\begin{aligned} {\text {Ch}}(f):= \inf \left\{ \liminf _{k\rightarrow \infty } \frac{1}{2}\!\int _{X}\! |{\text {lip}}(f_k)|^2 \mathop {}\!\mathrm {d}{\mathfrak {m}}\,\Big |\, f_k\!\in {\text {Lip}}(X,d), \text { s.t. } f_k\rightarrow f \text { in } L^2(X,{\mathfrak {m}}) \right\} , \end{aligned}$$

with domain \({\mathcal {F}}:=\{ f\in L^2(X,{\mathfrak {m}}) \,\big |\, {\text {Ch}}(f)<\infty \}\) (sometimes also denoted by \(D({\text {Ch}})\) or \(W^{1,2}(X,d,{\mathfrak {m}})\)). Here \({\text {lip}}(f)(x) :=\limsup _{y\rightarrow x} \frac{|f(x)-f(y)|}{d(x,y)}\) denotes the local Lipschitz constant of the function f. Functions \(f\in {\mathcal {F}}\) have a weak gradient, i.e. a function \(|\nabla f|\in L^2(X,{\mathfrak {m}})\) such that \({\text {Ch}}(f)=\frac{1}{2}\int _X |\nabla f|^2\mathop {}\!\mathrm {d}{\mathfrak {m}}\).

In what follows, we always assume that X is infinitesimally Hilbertian, meaning that \({\text {Ch}}\) is a quadratic form. By polarisation of \({\mathcal {E}}(f):= 2{\text {Ch}}(f)\) we get a strongly local Dirichlet form \(({\mathcal {E}}, D({\mathcal {E}}))\) on \(L^2(X,{\mathfrak {m}})\), where \(D(\mathcal E):= {\mathcal {F}}\). The domain is then a Hilbert space with norm \(\Vert f \Vert _{{\mathcal {E}}}^2 := \Vert f\Vert _{L^2(X,{\mathfrak {m}})}^2+ {\mathcal {E}}(f)\). Thanks to the exponential integrability (2.1), the Cheeger energy is quasi-regular, cf. [21, Thm. 4.1].

Given an open subset \(Y\subset X\) with \({\mathfrak {m}}(\partial Y)=0\), restricting to functions which vanish on \(Z:=X\setminus Y\) quasi-everywhere, we get another Dirichlet form, corresponding to homogeneous Dirichlet “boundary values” on Z:

$$\begin{aligned} {\left\{ \begin{array}{ll} D({\mathcal {E}}^0) := \{ f\in D({\mathcal {E}}) \,\big |\, {\tilde{f}}=0 \text { quasi-everywhere on } Z \}, \\ {\mathcal {E}}^0(f) := {\mathcal {E}}(f) \text { for } f\in D({\mathcal {E}}^0), \end{array}\right. } \end{aligned}$$
(2.2)

where \({\tilde{f}}\) is the quasi-continuous representative of f.

By general Dirichlet form theory, a symmetric, strongly continuous contraction semigroup on \(L^2(X,{\mathfrak {m}})\) is associated with each Dirichlet form. Thus we have a semigroup \((P_t)_{t>0}\) associated with \(({\mathcal {E}}, D({\mathcal {E}}))\) and another one \((P_t^0)_{t>0}\) associated with \(({\mathcal {E}}^0, D({\mathcal {E}}^0))\). They are related to the Dirichlet forms in the following way: For functions \(f,g\in L^2(X,{\mathfrak {m}})\) define the approximated forms \({\mathcal {E}}_t, \mathcal E_t^0:L^2(X,{\mathfrak {m}})\times L^2(X,{\mathfrak {m}})\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} {\mathcal {E}}_t(f,g):&= -\frac{1}{t} \int _X g (P_tf-f) \mathop {}\!\mathrm {d}{\mathfrak {m}}, \\ {\mathcal {E}}_t^0(f,g):&= -\frac{1}{t} \int _X g (P_t^0f-f) \mathop {}\!\mathrm {d}{\mathfrak {m}}. \end{aligned}$$

Then we can recover the corresponding Dirichlet form in the following way (see [8, Lemma 1.3.4]):

$$\begin{aligned} {\left\{ \begin{array}{ll} D({\mathcal {E}}) = \left\{ f\in L^2(X,{\mathfrak {m}}) \,\Big |\, \lim _{t\rightarrow 0} {\mathcal {E}}_t(f,f) <\infty \right\} , \\ {\mathcal {E}}(f,g) =\lim _{t\rightarrow 0} {\mathcal {E}}_t(f,g), \; \text {for } f,g\in D({\mathcal {E}}). \end{array}\right. } \end{aligned}$$
(2.3)

Further, for \(f\in L^2(X,{\mathfrak {m}})\) the map \(t\mapsto {\mathcal {E}}_t(f,f)\) is non-increasing and non-negative. The same is true for \(P_t^0\) and \(({\mathcal {E}}^0, D({\mathcal {E}}^0))\).
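To see the approximated forms at work in a setting where everything is computable, one may pass to a finite model: take the graph Laplacian L of a path graph as generator, \(P_t=e^{-tL}\) as semigroup, and check numerically that \(t\mapsto {\mathcal {E}}_t(f,f)\) is non-increasing in t and recovers \(\langle f, Lf\rangle \) as \(t\rightarrow 0\). This is our own toy illustration, not part of the paper's setting:

```python
import numpy as np

# Toy Dirichlet form: path graph on n nodes, E(f) = sum over edges (f(i)-f(i+1))^2,
# generator L = graph Laplacian, semigroup P_t = exp(-t L).
n = 6
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0            # endpoint degrees are 1

lam, Q = np.linalg.eigh(L)           # L is symmetric: L = Q diag(lam) Q^T

def P(t):                            # heat semigroup exp(-t L)
    return (Q * np.exp(-t * lam)) @ Q.T

def E_t(t, f, g):                    # approximated form E_t(f, g)
    return -(1.0 / t) * g @ (P(t) @ f - f)

rng = np.random.default_rng(1)
f = rng.standard_normal(n)

ts = [1.0, 0.5, 0.1, 0.01, 0.001]
vals = [E_t(t, f, f) for t in ts]
print(np.all(np.diff(vals) >= 0))                   # grows as t decreases
print(np.isclose(vals[-1], f @ L @ f, rtol=1e-2))   # E_t(f,f) -> E(f) = <f, L f>
```

The monotonicity is transparent in the eigenbasis: \({\mathcal {E}}_t(f,f)=\sum _k c_k^2\,(1-e^{-t\lambda _k})/t\), and \((1-e^{-t\lambda })/t\) is decreasing in t for each \(\lambda \ge 0\).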

2.2 Heat flows

Let us clarify the different heat flows. We have the “usual” heat flow and the one with Dirichlet boundary values, and to each a corresponding “dual” flow for measures.

2.2.1 Heat flow \(P_t\) for functions on X

The heat flow \((t,u_0)\mapsto u_t=P_tu_0\) is defined by means of the semigroup in \(L^2(X,{\mathfrak {m}})\) corresponding to the Dirichlet form \(({\mathcal {E}}, D({\mathcal {E}}))\).

2.2.2 Heat flow \({\mathscr {P}}_t\) for probability measures on X

From now on we additionally assume that \((X,d,{\mathfrak {m}})\) is an \({\text {RCD}}^*(K,\infty )\) space. In this case, there is a Brownian motion \((B_t,{{\mathbb {P}}}_x)\) on X with a corresponding Markov kernel \(p_t(x,A)= {{\mathbb {P}}}_x(B_t\in A)\) (and even a heat kernel), all associated with the Dirichlet form \({\mathcal {E}}\), see [3, Sections 7.1, 7.2]. We use it to define the heat flow for probability measures: for \(\mu \in {\mathcal {P}}(X)\) let

$$\begin{aligned} {\mathscr {P}}_t\mu (A) := \int _X p_t(x,A) {\text {d}}\!\mu (x). \end{aligned}$$

This coincides with the \({\text {EVI}}_K\)-flow of the entropy in \(({\mathcal {P}}_2(X),W_2)\). Since the Brownian motion is uniquely determined by the Dirichlet form \({\mathcal {E}}\), the heat flow on functions can be expressed through the Markov kernel:

$$\begin{aligned} P_tf(x) = \int _X f(y) p_t(x,{\text {d}}\!y). \end{aligned}$$

The heat semigroups \(P_t\) and \({\mathscr {P}}_t\) are dual in the following sense: For \(f:X\rightarrow {\mathbb {R}}\) bounded Borel, and \(\mu \in {\mathcal {P}}(X)\) we have

$$\begin{aligned} \int _X P_tf(x) {\text {d}}\!\mu (x)&= \int _X \int _X f(y)p_t(x,{\text {d}}\!y) {\text {d}}\!\mu (x) = \int _X f(y) \int _X p_t(x,{\text {d}}\!y) {\text {d}}\!\mu (x) \nonumber \\&= \int _X f(y) {\text {d}}\!{\mathscr {P}}_t\mu (y). \end{aligned}$$
(2.4)

The same applies to the heat flows \({\hat{P}}_t\) and \({\hat{{\mathscr {P}}}}_t\) on \({\hat{X}}\) (to be discussed in detail in the next section) and the equivalent flow \({\tilde{{\mathscr {P}}}}_t\) on \(\tilde{{\mathcal {P}}}(Y|X)\), defined by means of the isometry introduced in Lemma 3.11.
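In a finite-state model, where the kernel \(p_t\) becomes a stochastic matrix, the duality (2.4) reduces to associativity of matrix-vector products. A minimal sketch (the kernel below is an arbitrary stochastic matrix, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.random((5, 5))
K /= K.sum(axis=1, keepdims=True)   # row-stochastic: a discrete p_t(x, dy)
f = rng.standard_normal(5)          # a bounded Borel function
mu = rng.random(5); mu /= mu.sum()  # a probability measure

lhs = mu @ (K @ f)                  # \int P_t f  d(mu)
rhs = (mu @ K) @ f                  # \int f  d(script-P_t mu): the dual flow is mu -> mu K
print(np.isclose(lhs, rhs))         # True
```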

2.2.3 Heat flow with Dirichlet boundary values on Y

Let \(Y\subset X\) be open with \({\mathfrak {m}}(\partial Y)=0\), and define the stopping time

$$\begin{aligned} \tau _Z := \inf \{t>0 \,\big |\, B_t\in Z\}, \end{aligned}$$

where as before \(Z:=X\setminus Y\). Then we can define a Markov kernel

$$\begin{aligned} p_t^0(x,A) := {{\mathbb {P}}}_x(B_t\in A,\, t < \tau _Z). \end{aligned}$$

Note that we use Fukushima’s convention that a Markov kernel is a subprobability on X, in particular \(p_t^0(x,A)\le p_t(x,A)\). This Markov kernel is associated to the Dirichlet form \(({\mathcal {E}}^0,D({\mathcal {E}}^0))\) given by (2.2), see [8, Thm. 4.4.2]. With this we can define the heat flows for bounded Borel functions \(f:X\rightarrow {\mathbb {R}}\) and measures \(\mu \in {\mathcal {P}}^{sub}(X)\) as

$$\begin{aligned} P_t^0f(x) := \int _X f(y) p_t^0(x,{\text {d}}\!y) \end{aligned}$$

and

$$\begin{aligned} {\mathscr {P}}_t^0\mu (A) := \int _X p_t^0(x,A) {\text {d}}\!\mu (x). \end{aligned}$$

They also satisfy the duality relation (2.4).
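A discrete caricature of the killed kernel (our own illustration, not a construction from the paper): for a random walk on a path graph killed upon hitting \(Z=\{0\}\), one sees both the domination \(p_t^0\le p_t\) and the loss of mass at the boundary.

```python
import numpy as np

# Simple random walk on X = {0,...,4}; boundary Z = {0}, endpoints reflect.
P = np.zeros((5, 5))
P[0, 1] = 1.0; P[4, 3] = 1.0
for i in range(1, 4):
    P[i, i - 1] = P[i, i + 1] = 0.5

# Killed walk: delete all transitions out of and into Z (substochastic kernel).
P0 = P.copy()
P0[0, :] = 0.0
P0[:, 0] = 0.0

Pn  = np.linalg.matrix_power(P, 8)    # 8-step kernels
P0n = np.linalg.matrix_power(P0, 8)

print(np.all(P0n <= Pn + 1e-12))               # p^0(x,A) <= p(x,A)
print(np.all(P0n.sum(axis=1) <= 1.0 + 1e-12))  # subprobability rows
print(P0n.sum(axis=1)[2] < 1.0)                # mass is strictly lost from node 2
```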

Remark 2.1

With the help of the Markov kernels, all of these heat flows of measures can be extended to signed, finite Borel measures.

3 Gluing

In this section we glue together a finite number of copies of an open subset in a metric measure space “along the boundary”. We will identify the Cheeger energy and the heat semigroup of the glued space in terms of the original objects.

Beginning with Alexandrov in the 1940s, gluing has been studied in connection with curvature bounds a number of times, but mostly in Alexandrov spaces, see [1, “Verheftungssatz”, Kap. IX, §3], [17, Chapter I, §11], [15, §5], [16, Theorem 2.1], [10, Theorem 1.1]. More recently, Schlichting [22, 23] applied the method of [10] to show preservation of various curvature bounds (among them Ricci curvature) on manifolds in an approximate sense, which we will use later to treat the Riemannian case as an example. In [14], metric measure spaces supporting Dirichlet forms are glued together. There is also a very recent preprint by Rizzi which shows that gluing does not preserve the dimension in the measure-contraction property [19]. Apart from curvature bounds, the doubling of manifolds with boundary has also been used by other communities to produce a related manifold without boundary, see for instance [2].

3.1 Gluing of metric measure spaces

Take an open subset \(Y\subset X\) and denote \(Z:= X\setminus Y\). Fix a number \(k\in {\mathbb {N}}\). We now consider k copies of X, denoted by \(X^1,\dots ,X^k\). We will identify these spaces with the original one via maps \(\iota _i:X\rightarrow X^i, i=1,\dots ,k\), which send points \(x\in X\) to the corresponding points in \(X^i\). Each \(X^i\) is equipped with the metric \(d_i:=d\circ (\iota _i^{-1}, \iota _i^{-1})\) and the measure \({\mathfrak {m}}^i:={\iota _i}_{\#}{\mathfrak {m}}\), but in this section we usually suppress the indices and write d and \({\mathfrak {m}}\) on every \(X^i\). Let \(Y^i:= \iota _i(Y),\, Z^i:= \iota _i(Z)\). We define an equivalence relation by identifying the points in the \(Z^i\)’s:

$$\begin{aligned} X^i\ni x \sim y\in X^j \;\;\; :\Leftrightarrow \;\;\; \left( i=j \text { and } x=y\right) \text { or } \left( \iota _i^{-1}(x)\in Z \text { and } \iota _i^{-1}(x) = \iota _j^{-1}(y) \right) . \end{aligned}$$

The k-gluing of X along Z is now obtained as the quotient of the disjoint union of the \(X^i\) under this equivalence relation:

$$\begin{aligned} {\hat{X}} := \left( \bigsqcup _{i=1}^k X^i \right) /\sim . \end{aligned}$$

We can view \(X^i\) as a subset of \({\hat{X}}\), since the canonical map \(\sqcup _{i} X^i\rightarrow {\hat{X}}\) restricted to \(X^i\) is injective. In the following, we will also make use of the partition

$$\begin{aligned} {\hat{X}} = \left( \bigsqcup _{i=1}^k Y^i \right) \sqcup Z . \end{aligned}$$

Define a metric \({\hat{d}}:{\hat{X}}\times {\hat{X}}\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} {\hat{d}}(x,y) := {\left\{ \begin{array}{ll} \inf _{p\in Z} \left( d_i(x,\iota _i(p)) + d_j(\iota _j(p),y) \right) , &{} \text { if } x\in X^i, y\in X^j, i\ne j \\ d(x,y) , &{} \text { otherwise}. \end{array}\right. } \end{aligned}$$

As a measure we use \({\hat{{\mathfrak {m}}}}:= \frac{1}{k}\sum _{i=1}^k {\mathfrak {m}}^i\), meaning that for a Borel set \(A\subset {\hat{X}}\), we consider the restrictions to the copies and set

$$\begin{aligned} {\hat{{\mathfrak {m}}}} (A):= \frac{1}{k}\sum _{i=1}^k {\mathfrak {m}}^i(A\cap X^i). \end{aligned}$$

This turns \({\hat{X}}\) into a metric measure space.

For the special case of gluing together only two copies, we call the resulting space the doubling of Y in X, and as indices we will use \(i\in \{+,-\}\).
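To make the definition of \({\hat{d}}\) concrete, consider the doubling of \(Y=(0,\infty )\) in \(X={\mathbb {R}}\), so that \(Z=(-\infty ,0]\): crossing between the copies forces a detour through Z, and for \(x,y>0\) the infimum over \(p\in Z\) is attained at \(p=0\). The following sketch (our own toy example) approximates the infimum on a grid and checks the triangle inequality on random triples:

```python
import numpy as np

# Points of the doubling: (copy, coordinate) with copy in {+1, -1};
# coordinates <= 0 lie in Z and are shared between the copies.
Z_grid = np.linspace(-5.0, 0.0, 2001)   # grid approximation of (a chunk of) Z

def hat_d(a, b):
    (i, x), (j, y) = a, b
    if i == j or x <= 0 or y <= 0:      # same copy, or one point already lies in Z
        return abs(x - y)
    # different copies: inf over p in Z of d(x,p) + d(p,y)
    return np.min(np.abs(x - Z_grid) + np.abs(Z_grid - y))

# For x, y > 0 in different copies the infimum is attained at p = 0:
print(np.isclose(hat_d((1, 2.0), (-1, 3.0)), 5.0))   # True

# Triangle inequality on random triples:
rng = np.random.default_rng(2)
ok = True
for _ in range(500):
    a, b, c = [(rng.choice([1, -1]), float(rng.uniform(-4, 4))) for _ in range(3)]
    ok &= hat_d(a, c) <= hat_d(a, b) + hat_d(b, c) + 1e-9
print(bool(ok))   # True
```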

Proposition 3.1

The space \(({\hat{X}},{\hat{d}})\) is a complete and separable length space, and the measure \({\hat{{\mathfrak {m}}}}\) is Borel.

If additionally X is geodesic and Z is proper (i.e. all closed balls are compact), then \({\hat{X}}\) is geodesic.

Proof

The metric properties are shown in [6, p.67f, Lemma 5.24]. \(\square \)

The metric properties directly transfer to the Wasserstein space, see for instance [25].

Corollary 3.2

For \(p\in [1,\infty )\), the Kantorovich-Wasserstein metric \({\hat{W}}_p\) obtained from \({\hat{d}}\) is a complete, separable length metric on \({\mathcal {P}}_p({\hat{X}})\).

Now we introduce some notation for dealing with functions on \({\hat{X}}\). Given a function \(u:{\hat{X}}\rightarrow {\mathbb {R}}\), it will be useful to consider the restrictions \(u_i:X^i\rightarrow {\mathbb {R}}\) given by \(u_i:=u|_{X^i}\). We consider the mean value \({\bar{u}}:X\rightarrow {\mathbb {R}}, \;\;{\bar{u}}:= \frac{1}{k} \sum _{i=1}^k u_i\circ \iota _i\) and the “mean-free” functions

$$\begin{aligned} \mathop {u_i}\limits ^{\circ } :X\rightarrow {\mathbb {R}},\;\;\; \mathop {u_i}\limits ^{\circ } := u_i\circ \iota _i-{\bar{u}}. \end{aligned}$$

Observe that since the \(u_i\) all coincide on Z, the \(\mathop {u_i}\limits ^{\circ }\) are zero everywhere on Z. Also, we have

$$\begin{aligned} \sum _{i=1}^k \mathop {u_i}\limits ^{\circ } =0. \end{aligned}$$
(3.1)
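The decomposition \(u_i={\bar{u}}+\mathop {u_i}\limits ^{\circ }\), the vanishing on Z and identity (3.1) amount to elementary linear algebra; a quick numerical check (our own illustration, with the first few coordinates playing the role of Z):

```python
import numpy as np

k = 3
rng = np.random.default_rng(3)
u = rng.standard_normal((k, 10))   # u_i on k copies (columns = points of X)
u[:, :4] = u[0, :4]                # the u_i must all agree on Z (first 4 points)

u_bar = u.mean(axis=0)             # mean part
u_circ = u - u_bar                 # mean-free parts

print(np.allclose(u_circ.sum(axis=0), 0))   # identity (3.1)
print(np.allclose(u_circ[:, :4], 0))        # mean-free parts vanish on Z
print(np.allclose(u_bar + u_circ, u))       # reconstruction of the u_i
```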

Notation: From the proof of Lemma 3.7 on, we will simplify notation by mostly omitting the identification maps \(\iota _i\). Whenever a function \(u_i\) takes an argument from X, it is understood as \(u_i\circ \iota _i\), and similarly for \({\overline{u}}, \mathop {u_i}\limits ^{\circ }\) with \(\iota _i^{-1}\).

Let \(({\widehat{{\text {Ch}}}}, \hat{{\mathcal {F}}})\) denote the Cheeger energy of the space \(({\hat{X}}, {\hat{d}}, {\hat{{\mathfrak {m}}}})\).

Lemma 3.3

The space \({\hat{X}}\) is infinitesimally Hilbertian and for every \(u\in \hat{{\mathcal {F}}}\), the functions \(u_i\circ \iota _i\) are in \(\mathcal F\) and

$$\begin{aligned} {\widehat{{\text {Ch}}}}(u) = \frac{1}{k}\sum _{i=1}^k {\text {Ch}}(u_i\circ \iota _i). \end{aligned}$$

Proof

This follows directly from the locality property (3.2) of weak gradients (see [4, Thm. 4.19]), applied to the open sets \(Y^i\) and \(Z^\circ \):

Given a complete, separable metric space equipped with a Borel measure \((W,d_W,{\mathfrak {m}}_W)\), and an open subset \(\Omega \subset W\) with \({\mathfrak {m}}_W(\partial \Omega )=0\), we have that the restriction of a function \(f\in D({\text {Ch}}^W)\) to \({\overline{\Omega }}\) is a function in \(D({\text {Ch}}^{{\overline{\Omega }}})\), and

$$\begin{aligned} |\nabla (f|_{{\overline{\Omega }}})|_{{\overline{\Omega }}} = (|\nabla f|_{W})|_{{\overline{\Omega }}} \;\;\;{\mathfrak {m}}\text {-a.e. in } {\overline{\Omega }}. \end{aligned}$$
(3.2)

\(\square \)

In particular, we get a Dirichlet form \((\hat{{\mathcal {E}}}, D(\hat{{\mathcal {E}}}))\) on \({\hat{X}}\) by polarizing \(\hat{{\mathcal {E}}}(u):= 2{\widehat{{\text {Ch}}}}(u)\) and setting \(D(\hat{{\mathcal {E}}}):= \hat{{\mathcal {F}}}\).

Lemma 3.4

If \(u\in D(\hat{{\mathcal {E}}})\), then \({\bar{u}}\in D({\mathcal {E}})\) and \(\mathop {u_i}\limits ^{\circ }\in D({\mathcal {E}}^0),\, i=1,\dots ,k\).

Proof

Being in \(D(\hat{{\mathcal {E}}})\) means \({\widehat{{\text {Ch}}}}(u)<\infty \). By the previous lemma, this implies

$$\begin{aligned} \sum _{i=1}^k \frac{1}{k}{\text {Ch}}(u_i\circ \iota _i) = {\widehat{{\text {Ch}}}}(u) <\infty . \end{aligned}$$

Since each term is non-negative, \({\text {Ch}}(u_i\circ \iota _i)<\infty \) for every \(i=1,\dots ,k\). Thus \(u_i\circ \iota _i\in D({\mathcal {E}})\) and also the linear combination \({\bar{u}}\in D({\mathcal {E}})\).

The other assertion follows from the fact that all the \(u_i\)’s coincide on Z. \(\square \)

Now we are going to define a semigroup on \({\hat{X}}\) and we will show that it actually is the one corresponding to \(\hat{{\mathcal {E}}}\).

Definition 3.5

The glued semigroup \(P_t^{GL}: L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\rightarrow L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\) is defined by

$$\begin{aligned} P_t^{GL}u(x) := P_t{\bar{u}}(\iota _i^{-1}(x)) + P_t^0\mathop {u_i}\limits ^{\circ }(\iota _i^{-1}(x)), \,\,\text { if } x\in X^i ,\,\, i=1,\dots ,k. \end{aligned}$$

Also, define the approximated glued Dirichlet form \(\mathcal E_t^{GL} : L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\times L^2({\hat{X}},{\hat{{\mathfrak {m}}}}) \rightarrow {\mathbb {R}}\),

$$\begin{aligned} {\mathcal {E}}_t^{GL}(u,v) := -\frac{1}{t}\int _{{\hat{X}}} v (P_t^{GL}u-u) {\text {d}}\!{\hat{{\mathfrak {m}}}}. \end{aligned}$$

Remark 3.6

Observe that \(P_t^{GL}\) is well-defined, since \(u_i=u_j\) on Z for every \(i,j=1,\dots ,k\).

Lemma 3.7

\((P_t^{GL})_{t>0}\) is a symmetric, strongly continuous contraction semigroup on \(L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\). In particular, there is a corresponding Dirichlet form \(({\mathcal {E}}^{GL}, D({\mathcal {E}}^{GL}))\) connected to \(P_t^{GL}\) via

$$\begin{aligned} {\left\{ \begin{array}{ll} D({\mathcal {E}}^{GL}) = \left\{ u\in L^2({\hat{X}},{\hat{{\mathfrak {m}}}}) \,\Big |\, \lim _{t\rightarrow 0} {\mathcal {E}}_t^{GL}(u,u) <\infty \right\} , \\ {\mathcal {E}}^{GL}(u,v) =\lim _{t\rightarrow 0} {\mathcal {E}}_t^{GL}(u,v), \; \text {for } u,v\in D({\mathcal {E}}^{GL}). \end{array}\right. }\end{aligned}$$

Proof

Symmetry: We use that \(P_t\) and \(P_t^0\) are symmetric with respect to \({\mathfrak {m}}\):

$$\begin{aligned}&\int _{{\hat{X}}} u P_t^{GL}v {\text {d}}\!{\hat{{\mathfrak {m}}}} = \sum _{i=1}^k \frac{1}{k} \int _{X^i} u_i \left( (P_t{\bar{v}})\circ \iota _i^{-1} + (P_t^0\mathop {v_i}\limits ^{\circ })\circ \iota _i^{-1}\right) \mathop {}\!\mathrm {d}{\mathfrak {m}}^i \\&\quad = \sum _{i=1}^k \frac{1}{k} \int _{X} {\bar{v}} P_t(u_i\circ \iota _i) + \mathop {v_i}\limits ^{\circ } P_t^0(u_i\circ \iota _i) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&\quad = \sum _{i,j=1}^k \frac{1}{k^2} \int _{X} (v_j\circ \iota _j) P_t(u_i\circ \iota _i) + (v_i\circ \iota _i) P_t^0(u_i\circ \iota _i) - (v_j\circ \iota _j) P_t^0(u_i\circ \iota _i) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&\quad = \sum _{i,j=1}^k \frac{1}{k^2} \int _{X} (v_j\circ \iota _j) P_t(u_i\circ \iota _i) + (v_j\circ \iota _j) P_t^0(u_j\circ \iota _j) - (v_j\circ \iota _j) P_t^0(u_i\circ \iota _i) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&\quad = \sum _{j=1}^k \frac{1}{k} \int _{X} (v_j\circ \iota _j) \frac{1}{k}\sum _{i=1}^k P_t(u_i\circ \iota _i) + (v_j\circ \iota _j) \left( P_t^0(u_j\circ \iota _j) - \frac{1}{k}\sum _{i=1}^k P_t^0(u_i\circ \iota _i)\right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&\quad = \sum _{j=1}^k \frac{1}{k} \int _{X} (v_j\circ \iota _j) (P_t{\bar{u}} + P_t^0 \mathop {u_j}\limits ^{\circ } ) \mathop {}\!\mathrm {d}{\mathfrak {m}}= \int _{{\hat{X}}} vP_t^{GL}u {\text {d}}\!{\hat{{\mathfrak {m}}}}. \end{aligned}$$

From now on, to improve readability, we will use the abuse of notation introduced above.

Semigroup property: First observe that on \(X^i\) we have \(P_0^{GL}u=P_0{\bar{u}} + P_0^0\mathop {u_i}\limits ^{\circ }= {\bar{u}} + u_i-{\bar{u}}=u\). Denote \(v:= P_t^{GL}u\). Then \(v_i = P_t{\bar{u}} + P_t^0 \mathop {u_i}\limits ^{\circ }\). Now on \(X^i\)

$$\begin{aligned} P_s^{GL}P_t^{GL}u&= P_s^{GL}v = P_s{\bar{v}} + P_s^0\mathop {v_i}\limits ^{\circ } = \frac{1}{k}\sum _{j=1}^k P_s v_j + P_s^0v_i - \frac{1}{k}\sum _{j=1}^k P_s^0v_j \\&= \frac{1}{k}\sum _{j=1}^k P_s (P_t{\bar{u}} + P_t^0 \mathop {u_j}\limits ^{\circ }) + P_s^0(P_t{\bar{u}} + P_t^0 \mathop {u_i}\limits ^{\circ }) - \frac{1}{k}\sum _{j=1}^k P_s^0(P_t{\bar{u}} + P_t^0 \mathop {u_j}\limits ^{\circ }) \\&= \frac{1}{k} \sum _{j=1}^k P_{s+t}{\bar{u}} + \underbrace{\frac{1}{k} \sum _{j=1}^k P_sP_t^0\mathop {u_j}\limits ^{\circ }}_{=0} + P_s^0P_t{\bar{u}} + P_{s+t}^0 \mathop {u_i}\limits ^{\circ }\\&- \frac{1}{k}\sum _{j=1}^k P_s^0P_t{\bar{u}} - \underbrace{\frac{1}{k}\sum _{j=1}^k P_{s+t}^0 \mathop {u_j}\limits ^{\circ }}_{=0} \\&= P_{s+t}{\bar{u}} + P_{s+t}^0 \mathop {u_i}\limits ^{\circ } = P_{s+t}^{GL}u, \end{aligned}$$

where we used (3.1).

Contraction: To show the contraction property in \(L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\), we first show that \(P_t^{GL}\) is Markovian (i.e. positivity preserving and \(L^\infty \)-contractive in \(L^2\cap L^\infty \)). By symmetry of \(P_t^{GL}\), we also get \(L^1\)-contractivity. Using the Riesz-Thorin interpolation theorem, we finally get contractivity in \(L^2\).

Let \(u\in L^2\cap L^\infty ({\hat{X}},{\hat{{\mathfrak {m}}}})\) with \(0\le u\le 1\). Then also \(0\le u_i,{\bar{u}}\le 1\), and on \(X^i\),

$$\begin{aligned} P_t^{GL}u = P_t{\bar{u}} + P_t^0\mathop {u_i}\limits ^{\circ } = (P_t-P_t^0){\bar{u}} + P_t^0 u_i \le (P_t-P_t^0)1 + P_t^0 1 = P_t 1 \le 1, \end{aligned}$$

since the operators \(P_t-P_t^0\) and \(P_t^0\) are positivity preserving and \({\bar{u}}\le 1,\, u_i\le 1\).

For the other side, we have to show \(P_t^{GL}u\ge 0\), which is equivalent to

$$\begin{aligned} P_t^0 {\bar{u}} \le P_t{\bar{u}} + P_t^0u_i. \end{aligned}$$

But this holds true because \(P_t^0f\le P_tf\) for every non-negative \(f\in L^2\), and \(P_t^0 u_i\ge 0\).

Now we use that \(L^1\) is a subspace of the dual of \(L^\infty \). For \(u\in L^1\cap L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\), consider the bounded, linear functional \(\ell : L^\infty ({\hat{X}},{\hat{{\mathfrak {m}}}})\rightarrow {\mathbb {R}},\, \ell (v):= \int _{{\hat{X}}} v P_t^{GL}u{\text {d}}\!{\hat{{\mathfrak {m}}}}\). The dual space norm of \(\ell \) coincides with the \(L^1\)-norm of \(P_t^{GL}u\), thus

$$\begin{aligned} \Vert P_t^{GL}u\Vert _{L^1({\hat{X}})}&= \sup _{\Vert v\Vert _{L^\infty ({\hat{X}})}\le 1} \int _{{\hat{X}}} vP_t^{GL}u {\text {d}}\!{\hat{{\mathfrak {m}}}} = \sup _{\Vert v\Vert _{L^\infty ({\hat{X}})}\le 1} \int _{{\hat{X}}} P_t^{GL}v u {\text {d}}\!{\hat{{\mathfrak {m}}}} \\&\le \sup _{\Vert v\Vert _{L^\infty ({\hat{X}})}\le 1} \int _{{\hat{X}}} vu {\text {d}}\!{\hat{{\mathfrak {m}}}} = \Vert u\Vert _{L^1({\hat{X}})}. \end{aligned}$$

Here we used the symmetry of \(P_t^{GL}\) and the \(L^\infty \)-contractivity.

Hence \(P_t^{GL}\) is a contraction in \(L^1\cap L^2\) and also in \(L^\infty \cap L^2\). By the Riesz-Thorin interpolation theorem, it is then also a contraction in \(L^2\).

Strong continuity: This follows directly from the strong continuity of \(P_t\) and \(P_t^0\):

$$\begin{aligned} \Vert P_t^{GL}u-u\Vert _{L^2({\hat{X}})}^2&= \int _{{\hat{X}}} \left( P_t^{GL}u-u \right) ^2 {\text {d}}\!{\hat{{\mathfrak {m}}}} = \sum _{i=1}^k \frac{1}{k}\int _{X^i} \left( P_t{\bar{u}} + P_t^0\mathop {u_i}\limits ^{\circ } - u_i \right) ^2\mathop {}\!\mathrm {d}{\mathfrak {m}}^i \\&= \sum _{i=1}^k \frac{1}{k} \int _{X} \left( P_t{\bar{u}} -{\bar{u}} + P_t^0\mathop {u_i}\limits ^{\circ } - \mathop {u_i}\limits ^{\circ } \right) ^2\mathop {}\!\mathrm {d}{\mathfrak {m}}\\&\le \sum _{i=1}^k \frac{2}{k} \int _{X} \left( P_t{\bar{u}} -{\bar{u}} \right) ^2 + \left( P_t^0\mathop {u_i}\limits ^{\circ } - \mathop {u_i}\limits ^{\circ } \right) ^2\mathop {}\!\mathrm {d}{\mathfrak {m}}\\&= \sum _{i=1}^k \frac{2}{k} \left( \Vert P_t{\bar{u}}- {\bar{u}}\Vert _{L^2(X)}^2 + \Vert P_t^0\mathop {u_i}\limits ^{\circ }-\mathop {u_i}\limits ^{\circ }\Vert _{L^2(X)}^2 \right) \longrightarrow 0 \end{aligned}$$

as \( t\rightarrow 0\). \(\square \)

Lemma 3.8

For every \(u,v\in L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\):

$$\begin{aligned} {\mathcal {E}}_t^{GL}(u,v) = {\mathcal {E}}_t({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k {\mathcal {E}}_t^0(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }). \end{aligned}$$
(3.3)

Proof

We just compute

$$\begin{aligned} {\mathcal {E}}_t^{GL}(u,v)&= -\frac{1}{t} \int _{{\hat{X}}} v\left( P_t^{GL}u-u\right) {\text {d}}\!{\hat{{\mathfrak {m}}}} \\&= -\sum _{i=1}^k \frac{1}{kt}\int _{X^i} v_i\left( P_t{\bar{u}} + P_t^0\mathop {u_i}\limits ^{\circ }-u_i \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}^i\\&= -\sum _{i=1}^k \frac{1}{kt}\int _{X} v_i\left( P_t{\bar{u}} - {\bar{u}} + P_t^0\mathop {u_i}\limits ^{\circ }-\mathop {u_i}\limits ^{\circ } \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&= -\frac{1}{t} \int _{X} {\bar{v}}\left( P_t{\bar{u}}-{\bar{u}} \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}- \sum _{i=1}^k \frac{1}{k} \int _X v_i\left( P_t^0 \mathop {u_i}\limits ^{\circ }- \mathop {u_i}\limits ^{\circ } \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&+ \underbrace{\sum _{i=1}^k \frac{1}{k} \int _X {\bar{v}}\left( P_t^0 \mathop {u_i}\limits ^{\circ }- \mathop {u_i}\limits ^{\circ } \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}}_{=0 \text { by } (3.1)}\\&= {\mathcal {E}}_t({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k \mathcal E_t^0(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }). \end{aligned}$$

\(\square \)

Lemma 3.9

If \(u\in D({\mathcal {E}}^{GL})\), then \({\bar{u}}\in D({\mathcal {E}})\) and \(\mathop {u_i}\limits ^{\circ }\in D({\mathcal {E}}^0), \, i=1,\dots ,k\).

Proof

By definition and (3.3),

$$\begin{aligned} \infty > {\mathcal {E}}^{GL}(u) = \lim _{t\rightarrow 0} {\mathcal {E}}_t^{GL}(u,u) = \lim _{t\rightarrow 0}\left( {\mathcal {E}}_t({\bar{u}},{\bar{u}}) + \frac{1}{k}\sum _{i=1}^k {\mathcal {E}}_t^0(\mathop {u_i}\limits ^{\circ },\mathop {u_i}\limits ^{\circ }) \right) . \end{aligned}$$

Since the sum converges and every term is non-negative and non-decreasing as \(t\rightarrow 0\), the terms converge and we can interchange sum and limit to get

$$\begin{aligned} \infty > {\mathcal {E}}^{GL}(u) = \lim _{t\rightarrow 0} {\mathcal {E}}_t({\bar{u}},{\bar{u}}) + \frac{1}{k}\sum _{i=1}^k \lim _{t\rightarrow 0} {\mathcal {E}}_t^0(\mathop {u_i}\limits ^{\circ }, \mathop {u_i}\limits ^{\circ }) = {\mathcal {E}}({\bar{u}},{\bar{u}}) + \frac{1}{k}\sum _{i=1}^k {\mathcal {E}}^0(\mathop {u_i}\limits ^{\circ },\mathop {u_i}\limits ^{\circ }). \end{aligned}$$

\(\square \)

Now we come to the main theorem of this section, which identifies the semigroup \(P_t^{GL}\) with the heat semigroup \({\hat{P}}_t\) associated to \(\hat{{\mathcal {E}}}.\)

Theorem 3.10

The semigroups \(P_t^{GL}\) and \({\hat{P}}_t\) coincide on \(L^2({\hat{X}},{\hat{{\mathfrak {m}}}})\).

Proof

We will prove that the Dirichlet forms \(({\mathcal {E}}^{GL}, D(\mathcal E^{GL}))\) and \((\hat{{\mathcal {E}}}, D(\hat{{\mathcal {E}}}))\) coincide. Let \(u,v\in D(\hat{{\mathcal {E}}})\). By Lemma 3.8,

$$\begin{aligned} {\mathcal {E}}_t^{GL}(u,v) = {\mathcal {E}}_t({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k {\mathcal {E}}_t^0(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }). \end{aligned}$$

By Lemma 3.4, \({\bar{u}},{\bar{v}} \in D({\mathcal {E}})\) and \(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }\in D({\mathcal {E}}^0)\), so that we can take the limit \(t\rightarrow 0\). This yields

$$\begin{aligned} {\mathcal {E}}^{GL}(u,v)&= \lim _{t\rightarrow 0} {\mathcal {E}}_t^{GL}(u,v) = \lim _{t\rightarrow 0} \left( {\mathcal {E}}_t({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k {\mathcal {E}}_t^0(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }) \right) \\&= {\mathcal {E}}({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k \mathcal E^0(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }) = {\mathcal {E}}({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k \mathcal E(\mathop {u_i}\limits ^{\circ },\mathop {v_i}\limits ^{\circ }) \\&= {\mathcal {E}}({\bar{u}},{\bar{v}}) + \frac{1}{k}\sum _{i=1}^k \mathcal E(u_i-{\bar{u}},v_i-{\bar{v}}) = \frac{1}{k} \sum _{i=1}^k \mathcal E(u_i,v_i) = \hat{{\mathcal {E}}}(u,v), \end{aligned}$$

where we used that \({\mathcal {E}}\) is an extension of \({\mathcal {E}}^0\). This also shows that \(D(\hat{{\mathcal {E}}})\subset D({\mathcal {E}}^{GL})\). The other direction works with the same argument but using Lemma 3.9 instead. \(\square \)
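The identification \(P_t^{GL}={\hat{P}}_t\) has an elementary discrete-time analogue which can be checked numerically: for a random walk on a path graph doubled along \(Z=\{0\}\), with \(P^0\) the walk killed on Z, the walk on the doubled graph acts copy-wise as the full walk on the mean part plus the killed walk on the mean-free part. The following toy verification is our own sketch, not a construction from the paper:

```python
import numpy as np

# X = {0,1,2,3} with Z = {0}, Y = {1,2,3}; simple random walk, endpoints reflect.
P = np.zeros((4, 4))
P[0, 1] = 1.0; P[3, 2] = 1.0
P[1, 0] = P[1, 2] = 0.5
P[2, 1] = P[2, 3] = 0.5

# Killed walk (Dirichlet): delete transitions out of and into Z.
P0 = P.copy(); P0[0, :] = 0.0; P0[:, 0] = 0.0

# Doubling: nodes [0, 1+, 2+, 3+, 1-, 2-, 3-]; from the shared node 0
# the walk enters either copy with probability 1/2.
Phat = np.zeros((7, 7))
Phat[0, 1] = Phat[0, 4] = 0.5
for base in (1, 4):                        # the two copies Y^+ and Y^-
    Phat[base, 0] = Phat[base, base + 1] = 0.5
    Phat[base + 1, base] = Phat[base + 1, base + 2] = 0.5
    Phat[base + 2, base + 1] = 1.0

plus, minus = [0, 1, 2, 3], [0, 4, 5, 6]   # indices of X^+ and X^- in the doubling
rng = np.random.default_rng(4)
u = rng.standard_normal(7)                 # a function on the doubled space

u_bar = 0.5 * (u[plus] + u[minus])         # mean part (a function on X)
u_circ = u[plus] - u_bar                   # mean-free part of the + copy

n = 6
lhs = np.linalg.matrix_power(Phat, n) @ u
Pn, P0n = np.linalg.matrix_power(P, n), np.linalg.matrix_power(P0, n)
print(np.allclose(lhs[plus],  Pn @ u_bar + P0n @ u_circ))   # True
print(np.allclose(lhs[minus], Pn @ u_bar - P0n @ u_circ))   # True
```

Note that the mean-free part of the minus copy is \(-\mathop {u}\limits ^{\circ }\) here, mirroring (3.1) for \(k=2\).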

3.2 Identification of \(\tilde{{\mathcal {P}}}(Y|X)\) and \({\mathcal {P}}({\hat{X}})\)

We will show how the space of charged measures \(\tilde{\mathcal P}(Y|X)\) can be identified with the space of probability measures on the glued space, \({\mathcal {P}}({\hat{X}})\). Since we only look at two copies of \(Y\subset X\), we index the different copies by \(Y^+\) and \(Y^-\) instead of the numerical indices in the previous subsection. Still, \(Z:=X\setminus Y\) and \({\hat{X}} = \left( X^+\sqcup X^- \right) /\sim \). As we are dealing now with measures which are not equal on the different copies of X, in this section we do keep track of the identification maps \(\iota _i, i\in \{+,-\}\). Every subset used in this section is assumed to be a Borel-measurable set in the space it is taken from.

Lemma 3.11

The maps \( \Phi : \tilde{{\mathcal {P}}}(Y|X) \rightarrow {\mathcal {P}}({\hat{X}})\) and \(\Psi : {\mathcal {P}}({\hat{X}}) \rightarrow \tilde{{\mathcal {P}}}(Y|X)\), given by

$$\begin{aligned}&\Phi ((\sigma ^+,\sigma ^-))(A):= \sigma ^+(\iota _+^{-1}(A\cap Y^+)) + \sigma ^-(\iota _-^{-1}(A\cap Y^-)) + \sigma ^+(\iota _+^{-1}(A\cap Z))\\&\quad + \sigma ^-(\iota _-^{-1}(A\cap Z)) \end{aligned}$$

for \(A\subset {\hat{X}}\), and

$$\begin{aligned} \Psi ({\hat{\sigma }})^i(B):= {\hat{\sigma }}(\iota _i(B)\cap Y^i) +\frac{1}{2}{\hat{\sigma }}(\iota _i(B)\cap Z) \end{aligned}$$

for \(B\subset X,\, i\in \{+,-\}\), respectively, are inverse to each other and isometries between \((\tilde{{\mathcal {P}}}_p(Y|X), \tilde{W}_p)\) and \(({\mathcal {P}}_p({\hat{X}}), {\hat{W}}_p)\) for each \(p\in [1,\infty )\), where \({\hat{W}}_p\) denotes the Kantorovich-Wasserstein metric on \({\mathcal {P}}_p({\hat{X}})\).

The proof is straightforward and left to the reader.

The isometry allows us to deduce a representation of the heat flow of charged measures in terms of the heat flows of their effective and total measures.

Lemma 3.12

Let \(\sigma \in {\tilde{{\mathcal {P}}}}(Y|X)\). Then

$$\begin{aligned} {\tilde{{\mathscr {P}}}}_t\sigma = \left( {\mathscr {P}}_t\frac{\sigma ^++\sigma ^-}{2} + {\mathscr {P}}_t^0\frac{\sigma ^+-\sigma ^-}{2}, {\mathscr {P}}_t\frac{\sigma ^++\sigma ^-}{2} - {\mathscr {P}}_t^0\frac{\sigma ^+-\sigma ^-}{2} \right) . \end{aligned}$$

Proof

We do the calculation in the equivalent setting of the doubled space \({\hat{X}}\). Let \({\hat{\sigma }}\in {\mathcal {P}}({\hat{X}})\). Then

$$\begin{aligned} \int _{{\hat{X}}} u {\text {d}}\!\hat{{\mathscr {P}}}\!_t{\hat{\sigma }}&= \int _{{\hat{X}}} {\hat{P}}_tu {\text {d}}\!{\hat{\sigma }} \\&= \int _{X^+} \Big (P_t\frac{u^++u^-}{2} + P^0_t\frac{u^+-u^-}{2}\Big ){\text {d}}\!\sigma ^+ \\&+ \int _{X^-} \Big (P_t\frac{u^++u^-}{2} - P^0_t\frac{u^+-u^-}{2}\Big ){\text {d}}\!\sigma ^- \\&= \int _{X^+}\frac{1}{2}u^+{\text {d}}\!{\mathscr {P}}_t\sigma ^+ + \int _{X^+}\frac{1}{2}u^-{\text {d}}\!{\mathscr {P}}_t\sigma ^+ \\&+ \int _{X^+}\frac{1}{2}u^+{\text {d}}\!{\mathscr {P}}_t^0\sigma ^+ - \int _{X^+}\frac{1}{2}u^-{\text {d}}\!{\mathscr {P}}_t^0\sigma ^+ \\&+ \int _{X^-}\frac{1}{2}u^+{\text {d}}\!{\mathscr {P}}_t\sigma ^- + \int _{X^-}\frac{1}{2}u^-{\text {d}}\!{\mathscr {P}}_t\sigma ^- - \int _{X^-}\frac{1}{2}u^+{\text {d}}\!{\mathscr {P}}_t^0\sigma ^-\\&+ \int _{X^-}\frac{1}{2}u^-{\text {d}}\!{\mathscr {P}}_t^0\sigma ^- \\&= \int _{X^+} u^+ {\text {d}}\!\left( {\mathscr {P}}_t\frac{\sigma ^++\sigma ^-}{2} + {\mathscr {P}}_t^0\frac{\sigma ^+-\sigma ^-}{2}\right) \\&+ \int _{X^-} u^- {\text {d}}\!\left( {\mathscr {P}}_t\frac{\sigma ^++\sigma ^-}{2} - {\mathscr {P}}_t^0\frac{\sigma ^+-\sigma ^-}{2}\right) \end{aligned}$$

We relied heavily on the fact that we glue together copies of the same space, making it possible to “switch” indices when necessary. \(\square \)
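The representation of Lemma 3.12 lends itself to a quick numerical sanity check. The following sketch (not part of the paper; the interval \(Y=(0,1)\), the grid size, the step ratio and the initial atom are illustration choices) evolves the total measure by a reflecting (Neumann) explicit finite-difference heat flow and the effective measure by an absorbing (Dirichlet) one, then reconstructs the charged pair: the total mass stays 1, the effective mass decays through the boundary, and both components remain nonnegative.

```python
def heat_step(u, dirichlet, r=0.2):
    """One explicit finite-difference step of the heat equation u_t = u_xx.

    dirichlet=True : absorbing boundary (ghost values 0), mimicking P_t^0;
    dirichlet=False: reflecting boundary (mirrored ghosts), mimicking P_t.
    """
    n = len(u)
    left = 0.0 if dirichlet else u[0]
    right = 0.0 if dirichlet else u[-1]
    return [u[i] + r * ((u[i - 1] if i > 0 else left)
                        - 2.0 * u[i]
                        + (u[i + 1] if i < n - 1 else right))
            for i in range(n)]

n = 100
sigma0_plus = [0.0] * n
sigma0_plus[10] = 1.0                 # all initial mass is "positive charge"
total, effective = sigma0_plus[:], sigma0_plus[:]
for _ in range(2000):
    total = heat_step(total, dirichlet=False)         # Neumann flow of sigma^+ + sigma^-
    effective = heat_step(effective, dirichlet=True)  # Dirichlet flow of sigma^+ - sigma^-

# reconstruct the charged pair as in Lemma 3.12 (here sigma_0^- = 0)
sigma_plus = [(t + e) / 2.0 for t, e in zip(total, effective)]
sigma_minus = [(t - e) / 2.0 for t, e in zip(total, effective)]
mass_total = sum(sigma_plus) + sum(sigma_minus)      # conserved by the Neumann flow
mass_effective = sum(sigma_plus) - sum(sigma_minus)  # lost at the boundary
```

Since the explicit scheme with step ratio \(r\le \frac{1}{2}\) is monotone, the discrete Dirichlet solution stays below the Neumann solution, which is why the reconstructed pair stays nonnegative.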

Lemma 3.13

Assumption 1.17 in \(\tilde{{\mathcal {P}}}_2(Y|X)\) is satisfied if and only if the entropy \({\widehat{{\text {Ent}}}}\) is \(K\)-convex in \(\mathcal P_2({\hat{X}})\) (i.e. \({\hat{X}}\) is an \({\text {RCD}}^*(K,\infty )\) space).

Proof

Let \({\hat{\sigma }}\in {\mathcal {P}}_2({\hat{X}})\) with \({\hat{\sigma }} = {\hat{\xi }}{\hat{{\mathfrak {m}}}}\). We will show that the entropy of \({\hat{\sigma }}\) in \({\mathcal {P}}_2({\hat{X}})\) equals that of \(\Psi ({\hat{\sigma }})\) in \(\tilde{{\mathcal {P}}}_2(Y|X)\) up to an additive constant; the result then follows from Lemma 3.11 and the fact that K-convexity is preserved under addition of a constant to the functional. We have

$$\begin{aligned} {\widehat{{\text {Ent}}}} ({\hat{\sigma }})&= \int _{{\hat{X}}} {\hat{\xi }} \log {\hat{\xi }}{\text {d}}\!{\hat{{\mathfrak {m}}}} \\&= \frac{1}{2}\int _{Y^+} {\hat{\xi }}|_{Y^+} \log {\hat{\xi }}|_{Y^+} \mathop {}\!\mathrm {d}{\mathfrak {m}}+ \frac{1}{2}\int _{Y^-} {\hat{\xi }}|_{Y^-} \log {\hat{\xi }}|_{Y^-} \mathop {}\!\mathrm {d}{\mathfrak {m}}+ \int _{Z} {\hat{\xi }}|_{Z} \log {\hat{\xi }}|_{Z} \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&= \frac{1}{2}\int _{X^+} {\hat{\xi }}|_{X^+} \log {\hat{\xi }}|_{X^+} \mathop {}\!\mathrm {d}{\mathfrak {m}}+ \frac{1}{2}\int _{X^-} {\hat{\xi }}|_{X^-} \log {\hat{\xi }}|_{X^-} \mathop {}\!\mathrm {d}{\mathfrak {m}}. \end{aligned}$$

On the other hand, to compute \({\widetilde{{\text {Ent}}}}(\Psi ({\hat{\sigma }}))\), let us first identify the density of \(\Psi ({\hat{\sigma }})^i\) with respect to \({\mathfrak {m}}\): for a Borel-measurable set \(A\subset X\),

$$\begin{aligned} \Psi ({\hat{\sigma }})^i (A)&= {\hat{\sigma }} (\iota _i(A)\cap Y^i) + \frac{1}{2}{\hat{\sigma }}(\iota _i(A)\cap Z) = \int _{\iota _i(A)\cap Y^i} {\text {d}}\!{\hat{\sigma }} + \frac{1}{2}\int _{\iota _i(A)\cap Z} {\text {d}}\!{\hat{\sigma }} \\&= \int _{\iota _i(A)\cap Y^i} \frac{1}{2}{\hat{\xi }} {\text {d}}\!{\mathfrak {m}}+ \frac{1}{2} \int _{\iota _i(A)\cap Z} {\hat{\xi }}\mathop {}\!\mathrm {d}{\mathfrak {m}}= \frac{1}{2} \int _{\iota _i(A)\cap X^i} {\hat{\xi }}|_{X^i} \mathop {}\!\mathrm {d}{\mathfrak {m}}, \end{aligned}$$

so that \(\Psi ({\hat{\sigma }})^i = \frac{1}{2} \left( {\hat{\xi }}|_{X^i}\circ \iota _i \right) {\mathfrak {m}}\). Thus

$$\begin{aligned} {\widetilde{{\text {Ent}}}}(\Psi ({\hat{\sigma }}))&= {\text {Ent}}(\Psi ({\hat{\sigma }})^+) +{\text {Ent}}(\Psi ({\hat{\sigma }})^-) \\&= \int _{X} \frac{1}{2} \left( {\hat{\xi }}|_{X^+}\circ \iota _+\right) \log \left( \frac{1}{2} \left( {\hat{\xi }}|_{X^+}\circ \iota _+\right) \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&+ \int _{X} \frac{1}{2} \left( {\hat{\xi }}|_{X^-}\circ \iota _-\right) \log \left( \frac{1}{2} \left( {\hat{\xi }}|_{X^-}\circ \iota _-\right) \right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&= \int _{X} \frac{1}{2} \left( {\hat{\xi }}|_{X^+}\circ \iota _+\right) \log \left( {\hat{\xi }}|_{X^+}\circ \iota _+\right) \mathop {}\!\mathrm {d}{\mathfrak {m}}+ \int _{X} \frac{1}{2} \left( {\hat{\xi }}|_{X^-}\circ \iota _-\right) \log \left( {\hat{\xi }}|_{X^-}\circ \iota _-\right) \mathop {}\!\mathrm {d}{\mathfrak {m}}\\&+ \log \frac{1}{2} \underbrace{\int _X \Big [ \frac{1}{2} \left( {\hat{\xi }}|_{X^+}\circ \iota _+\right) + \frac{1}{2} \left( {\hat{\xi }}|_{X^-}\circ \iota _-\right) \Big ] \mathop {}\!\mathrm {d}{\mathfrak {m}}}_{=1} \\&= {\widehat{{\text {Ent}}}} ({\hat{\sigma }}) + \log \frac{1}{2}. \end{aligned}$$

\(\square \)

4 Transportation (semi-)distances between subprobabilities

Let \((X,d)\) be a complete separable metric space and \(Y\subset X\) be an open subset with \(\emptyset \not = Y\not = X\). Recall the definition of the \(L^p\)-transportation semi-metric between subprobabilities \(\mu ,\nu \in {\mathcal {P}}^{sub}(Y)\):

$$\begin{aligned} W^0_p(\mu ,\nu ):&= \inf \Big \{\tilde{W}_p(\sigma ,\tau ) \,\Big |\, \sigma ,\tau \in \tilde{\mathcal {P}}(Y|X), \sigma ^0=\mu , \tau ^0=\nu \Big \} \\&=\inf \Big \{\tilde{W}_p\big ( (\mu +\rho ,\rho ), (\nu +\eta ,\eta )\big ) \,\Big |\, \rho ,\eta \in {\mathcal {P}}^{sub}(X), (\mu +2\rho )(X)=1,\\&\quad (\nu +2\eta )(X)=1\Big \}. \end{aligned}$$

Proof of Lemma 1.1

This is an immediate consequence of the isometry between \(\tilde{{\mathcal {P}}}_p(Y|X)\) and \({\mathcal {P}}_p({\hat{X}})\), together with Lemma 3.1. \(\square \)
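The glued-space formulation of \(W^0_p\) can be probed by brute force in a tiny discrete example. The sketch below (not part of the paper; the atom locations, the candidate grid for \(\rho ,\eta \), and the restriction to single-atom \(\rho ,\eta \) are simplifying assumptions) lifts each subprobability to a 4-atom probability measure on \({\hat{X}}\) and minimizes \({\hat{W}}_1\) over matchings, for \(\mu =\frac{1}{2}\delta _{0.2}\), \(\nu =\frac{1}{2}\delta _{0.9}\) on \(Y=(0,1)\subset {\mathbb {R}}\).

```python
from itertools import permutations

# Illustrative brute force (not from the paper): for Y = (0,1) in X = R,
# the glued space carries the metric
#   d^((x,s),(y,t)) = |x-y|                       if the sheets s, t agree,
#   d^((x,s),(y,t)) = min(x+y, (1-x)+(1-y))       otherwise (via the boundary).

def d_hat(p, q):
    (x, s), (y, t) = p, q
    return abs(x - y) if s == t else min(x + y, (1.0 - x) + (1.0 - y))

grid = [0.001, 0.2, 0.5, 0.9, 0.999]   # candidate locations for rho and eta
best = float("inf")
for u in grid:        # rho = (1/4)delta_u on the upper sheet, copy rho' below
    for v in grid:    # eta = (1/4)delta_v on the upper sheet, copy eta' below
        side_a = [(0.2, "+"), (0.2, "+"), (u, "+"), (u, "-")]  # mu + rho + rho'
        side_b = [(0.9, "+"), (0.9, "+"), (v, "+"), (v, "-")]  # nu + eta + eta'
        for perm in permutations(side_b):
            cost = 0.25 * sum(d_hat(p, q) for p, q in zip(side_a, perm))
            best = min(best, cost)
```

Here the search returns \(0.15=\frac{1}{2}(0.2+0.1)\), the cost of annihilating both measures at their nearest boundary points, rather than the direct transport cost \(\frac{1}{2}\cdot 0.7=0.35\).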

Every coupling of the charged probability measures \((\mu +\rho ,\rho )\) and \((\nu +\eta ,\eta )\) induces a decomposition of each of the involved measures into three parts. This leads to another, more detailed description of the transportation costs from above.

Lemma 4.1

Let \(\mu ,\nu \in {\mathcal {P}}_p^{sub}(Y)\). Then

$$\begin{aligned} W^0_p(\mu ,\nu )^p= & {} \inf \Big \{ W_p(\mu _1,\nu _1)^p+W_p(\mu _2,\eta _1^+)^p+W_p^*(\mu _3,\eta _1^-)^p \nonumber \\&+W_p(\rho ^+_1,\nu _2)^p+W_p(\rho ^+_2,\eta _2^+)^p+W_p^*(\rho ^+_3,\eta _2^-)^p \nonumber \\&+W^*_p(\rho ^-_1,\nu _3)^p+W^*_p(\rho ^-_2,\eta _3^+)^p+W_p(\rho ^-_3,\eta _3^-)^p \,\Big |\, \nonumber \\ \mu= & {} \mu _1+\mu _2+\mu _3, \rho =\rho _1^++\rho _2^++\rho _3^+=\rho _1^-+\rho _2^-+\rho _3^-,(\mu +2\rho )(X)=1, \nonumber \\ \nu= & {} \nu _1+\nu _2+\nu _3, \ \eta =\eta _1^++\eta _2^++\eta _3^+=\eta _1^-+\eta _2^-+\eta _3^-, \ (\nu +2\eta )(X)=1 \Big \}.\nonumber \\ \end{aligned}$$
(4.1)

The decompositions implicitly require the coupled measures to have the same mass, so for instance \(\mu _1(X)=\nu _1(X)\) etc.

The proof consists in again using the isometry between \(\tilde{{\mathcal {P}}}_p(Y|X)\) and \({\mathcal {P}}_p({\hat{X}})\) and disintegrating the appearing measures. In the case \(p=1\), a more explicit description is possible.

Lemma 4.2

For \(p=1\) and all \(\mu ,\nu \in {\mathcal {P}}_p^{sub}(Y)\),

$$\begin{aligned} {W^0_p}(\mu ,\nu )^p&= \inf \Big \{ W_p(\mu _1,\nu _1)^p + W_p^*(\mu _0)^p + W^*_p(\nu _0)^p \,\Big |\, \\ \mu&=\mu _1+\mu _0, \nu =\nu _1+\nu _0, (\mu +\nu _0)(X)\le 1, (\nu +\mu _0)(X)\le 1 \Big \}. \end{aligned}$$

Moreover, the \(\le \)-inequality holds for all \(p\in [1,\infty )\) if \((X,d)\) is a length space.

Proof

The “\(\le \)”-direction follows from the previous Lemma by choosing the decomposition \(\rho _3^+=\eta _2^-=\rho _2^-=\eta _3^+=0\) and \(\rho _2^+=\eta _2^+=\rho _3^-=\eta _3^-\), so that

$$\begin{aligned} {W}^0_p(\mu ,\nu )^p&\le \inf \big \{ W_p(\mu _1,\nu _1)^p + W_p(\mu _2,\eta _1^+)^p + W^*_p(\mu _3,\eta _1^-)^p + W_p(\rho _1^+,\nu _2)^p \\&\quad + W^*_p(\rho _1^-,\nu _3)^p \,\big |\, \\&\quad (\mu +2\nu _2)(X)\le 1, \ (\nu +2\mu _2)(X)\le 1 \big \}\\&\le \inf \big \{ W_p(\mu _1,\nu _1)^p + W^*_p(\mu _0)^p + W^*_p(\nu _0)^p \,\big |\, \ (\mu +\nu _0)(X)\le 1, \ \\&\quad (\nu +\mu _0)(X)\le 1 \big \}. \end{aligned}$$

For the second inequality, in the case \(p=1\) we simply used the fact that \(\rho _1^+=\rho _1^-, \eta _1^+=\eta _1^-\) and

$$\begin{aligned} \inf _{\eta _1^+,\, \mu _2+\mu _3=\mu _0} \Big [ W_1(\mu _2,\eta _1^+)+W_1^*(\eta _1^+,\mu _3)\Big ]\le \frac{1}{2} W_1^*(\mu _0,\mu _0)= W_1^*(\mu _0) \end{aligned}$$

by choosing \(\eta _1^+=\mu _2=\mu _3=\frac{1}{2}\mu _0\).

The case \(p>1\) requires a more sophisticated argument using optimal transport in the glued space \({\hat{X}}=(X\setminus Y)\cup Y^+\cup Y^-\). We freely switch between the equivalent representations in \((\tilde{{\mathcal {P}}}_p(Y|X), \tilde{W}_p)\) and in \((\mathcal P_p({\hat{X}}), {\hat{W}}_p)\). Assume for simplicity that \((X,d)\) is geodesic. (For general length spaces, one has to use approximation arguments based on almost geodesics.) Given a \({\tilde{W}}_p\)-geodesic \((\sigma _t)_{t\in [0,1]}\) connecting \(\sigma _0:=(\mu _0,0)\) and \(\sigma _1:=(0,\mu _0)\), we decompose it into two \(\tilde{W}_p\)-geodesics \((\sigma '_t)_{t\in [0,1]}\) and \((\sigma ''_t)_{t\in [0,1]}\) such that \(\tilde{W}_p(\sigma '_0,\sigma '_1)=\tilde{W}_p(\sigma ''_0,\sigma ''_1)=\frac{1}{2}{\tilde{W}}_p(\sigma _0,\sigma _1)\) and \(\sigma '_{1/2}(Y^-)=\sigma ''_{1/2}(Y^+)=0\). (Actually, it suffices that \({\tilde{W}}_p(\sigma '_0,\sigma '_1)\ge \frac{1}{2}\tilde{W}_p(\sigma _0,\sigma _1)\) and \(\sigma '_{1/2}(Y^-)=0\).) Choosing \(\mu _2=(\sigma '_0)^+\), \(\mu _3=(\sigma '_1)^-\), and \(\eta _1^+=(\sigma '_{1/2})^+\) then yields

$$\begin{aligned}&\inf _{\eta _1^+,\, \mu _2+\mu _3=\mu _0} \Big [ W_p(\mu _2,\eta _1^+)^p+W_p^*(\eta _1^+,\mu _3)^p\Big ]\\&\quad \le W_p\big ((\sigma '_0)^+,(\sigma '_{1/2})^+\big )^p+W_p^*\big ((\sigma '_{1/2})^+,(\sigma '_1)^-\big )^p\\&\quad = {\tilde{W}}_p\big (\sigma '_0,\sigma '_{1/2}\big )^p+\tilde{W}_p\big (\sigma '_{1/2},\sigma '_1\big )^p =2^{1-p}{\tilde{W}}_p\big (\sigma '_0,\sigma '_{1}\big )^p\\&\quad \le 2^{-p}\tilde{W}_p\big (\sigma _0,\sigma _{1}\big )^p=2^{-p}W^*_p(\mu _0,\mu _0)^p=W^*_p(\mu _0)^p. \end{aligned}$$

To prove the “\(\ge \)”-inequality, we assume for simplicity that minimizers in the definition of \(W_1^0\) exist. This is for instance the case when X is compact. For the general case one has to work with almost-minimizers. Let subprobabilities \(\mu \) and \(\nu \) be given as well as \(\rho \) and \(\eta \) with \((\mu +2\rho )(X)=1, (\nu +2\eta )(X)=1\) such that

$$\begin{aligned} {W}^0_1(\mu ,\nu )= & {} {\tilde{W}}_1\big ((\mu +\rho ,\rho ), (\nu +\eta ,\eta )\big )\\= & {} {\hat{W}}_1\big (\mu +\rho +\rho ',\nu +\eta +\eta '\big ) \end{aligned}$$

where for the last identity we switched to the picture of the glued space \({\hat{X}}=(X\setminus Y)\cup Y^+\cup Y^-\) with subprobabilities \(\mu ,\nu ,\rho ,\eta \) on the “upper” sheet \((X\setminus Y)\cup Y^+\) and their copies \(\rho ',\eta '\) on the “lower” sheet \((X\setminus Y)\cup Y^-\). We further assume for the moment that all masses are rational numbers.

Given \(\varepsilon >0\), choose \(n,n_1,n_2\in {{\mathbb {N}}}\) and \(x_i,y_i,u_i,v_i\in X^+\) for \(i=1,\ldots ,n\) such that

$$\begin{aligned} W_1(\mu ,\mu _n)\le \varepsilon ,\quad W_1(\nu ,\nu _n)\le \varepsilon ,\quad W_1(\rho ,\rho _n)\le \varepsilon ,\quad W_1(\eta ,\eta _n)\le \varepsilon \end{aligned}$$

for

$$\begin{aligned} \mu _n=\frac{1}{n}\sum _{i=1}^{n-2n_1}\delta _{x_i},\quad \nu _n=\frac{1}{n}\sum _{i=1}^{n-2n_2}\delta _{y_i},\quad \rho _n=\frac{1}{n}\sum _{i=1}^{n_1}\delta _{u_i},\quad \eta _n=\frac{1}{n}\sum _{i=1}^{n_2}\delta _{v_i}. \end{aligned}$$

Hence also \(W_1(\rho ',\rho '_n)\le \varepsilon \), \(W_1(\eta ',\eta '_n)\le \varepsilon \) for \(\rho '_n=\frac{1}{n}\sum _{i=1}^{n_1}\delta _{u'_i}\), \(\eta '_n=\frac{1}{n}\sum _{i=1}^{n_2}\delta _{v'_i} \) with \(u'_i= \iota _-\circ \iota _+^{-1}(u_i)\) and similarly for \(v'_i\). (To avoid ambiguity, we may assume that the sets \(\{x_i\}\) and \( \{y_i\}\) are disjoint from each other.) In particular we have \(\frac{n_1}{n}=\rho (X)\) and so on.

Now fix a \({\hat{W}}_1\)-optimal coupling \(q_n\) of \(\mu _n+\rho _n+\rho '_n\) and \(\nu _n+\eta _n+\eta '_n\) on \({\hat{X}}\). Without restriction, we can choose this coupling \(q_n\) as a matching (i.e. it does not split mass), that is,

$$\begin{aligned} q_n=\frac{1}{n}\sum _{\xi \in Q_n}\delta _\xi \end{aligned}$$

with suitable \(Q_n\subset Z\times W\) where \(Z=\{x_i\}\cup \{u_i\}\cup \{u'_i\}\) and \(W=\{y_i\}\cup \{v_i\}\cup \{v'_i\}\). Now consider chains of (pairwise disjoint) pairs in \(Q_n\) with either initial points or endpoints of subsequent pairs being conjugate to each other. These chains of maximal length will be of the form

  1. Case 1:

    \((z_1,w_1), (z'_2,w_1'), (z_2,w_2), (z'_3, w'_2),\ldots ,(z'_{k},w'_{k-1}),(z_{k},w_k)\)

  2. Case 2:

    \((z_1,w_1), (z'_1,w_2'), (z_2,w_2), (z'_2, w'_3),\ldots ,(z'_{k-1},w'_{k}),(z_{k},w_k)\)

  3. Case 3:

    \((z_1,w_1), (z'_2,w_1'), (z_2,w_2),\ldots ,(z'_{k},w'_{k-1})\) with \(z'_k\not =z'_1\)

  4. Case 4:

    \((z_1,w_1), (z'_1,w_2'), (z_2,w_2),\ldots ,(z'_{k-1},w'_{k})\) with \(w'_k\not =w'_1\)

  5. Case 5:

    \((z_1,w_1), (z'_2,w_1'), (z_2,w_2),\ldots ,(z'_{1},w'_{k-1})\)

  6. Case 6:

    \((z_1,w_1), (z'_1,w_2'), (z_2,w_2),\ldots ,(z'_{k-1},w'_{1})\)

with \(z_i,z'_i\in Z\), \(w_i,w'_i\in W\) and \(z\mapsto z'\) denoting the “conjugation map” which switches between upper and lower sheet. In particular, \((z')'=z\).

Now let us have a closer look at the previous six cases of chains of maximal length.

  1. Case 1:

    Maximality implies \(z_1\in \{x_i\}\) and \(w_k\in \{y_i\}\), whereas all the other points in between satisfy \(w_i,w'_i,z_i,z'_i\in \{u_i\}\cup \{u'_i\}\cup \{v_i\}\cup \{v'_i\}\). The transportation cost associated with this chain is at least

    $$\begin{aligned} {\hat{d}}(z_1,w_1) + {\hat{d}}(w'_1,z'_2) + \dots + {\hat{d}}(z_k,w_k)\ge {\hat{d}}(z_1,w_k) = d(z_1, w_k) \end{aligned}$$

    and thus is bounded from below by the cost of the direct transport between the endpoints.

    Denote by \(X_1\subset \{x_i\}\) the set of \(z_1\) in case 1 and by \(Y_1\subset \{y_i\}\) the set of \(w_k\). Let

    $$\begin{aligned} \mu _n^1=\frac{1}{n}\sum _{x\in X_1}\delta _x, \quad \nu _n^1=\frac{1}{n}\sum _{y\in Y_1}\delta _y. \end{aligned}$$

    Then the transport costs arising from all pairs contained in any chain of case 1 are bounded from below by \(W_1\big (\mu _n^1,\nu _n^1\big )\).

  2. Case 2:

    This is just a relabeling of case 1 with indices running in reverse order. No additional costs arise.

  3. Case 3:

    Here, maximality implies \(z_1\in \{x_i\}\) and also \(z_k'\in \{x_i\}\). Thus at least one of the pairs in the chain consists of points from two different sheets. Hence, by the triangle inequality on \({\hat{X}}\), we conclude that the cost of this chain is at least \(d^*(z_1,z'_k)\).

    Denote by \(X_0\subset \{x_i\}\) the set of \(z_1\) in case 3. Note that this set coincides with the set of \(z_k'\) (just by reverting the chain)—but for calculating the cost induced by the coupling \(q_n\), only one of the pairs \((z_1,z'_k)\) and \((z'_k,z_1)\) has to be taken into account.

    Let

    $$\begin{aligned} \mu _n^0=\frac{1}{n}\sum _{x\in X_0}\delta _x. \end{aligned}$$

    Then the transport costs arising from all pairs contained in any chain of case 3 are bounded from below by \(\frac{1}{2}W^*_1\big (\mu _n^0,\mu _n^0\big )\).

  4. Case 4:

    Similarly, here we conclude \(w_1\in \{y_i\}\) as well as \(w_k'\in \{y_i\}\) and that the cost of the chain is at least \(d^*(w_1,w'_k)\). Denote by \(Y_0\subset \{y_i\}\) the set of \(w_1\) in case 4 and set

    $$\begin{aligned} \nu _n^0=\frac{1}{n}\sum _{y\in Y_0}\delta _y. \end{aligned}$$

    Then the transport costs arising from all pairs contained in any chain of case 4 are bounded from below by \(\frac{1}{2}W^*_1\big (\nu _n^0,\nu _n^0\big )\).

  5. Case 5:

    The cyclic chains in this case will produce superfluous costs which will vanish for optimal choices of measures \(\rho _n,\eta _n\). That is, 0 is the best lower estimate for the transport costs arising from all pairs contained in any chain of case 5. This infimum will be attained by chains of length \(k=2\) of the form \((z_1,w_1), (z'_1,w'_1)\) with \(z_1=w_1\).

  6. Case 6:

    This is a cyclic permutation of case 5. No additional costs arise.

Summarizing, we obtain

$$\begin{aligned} {\hat{W}}_1\big (\mu _n+\rho _n+\rho '_n,\nu _n+\eta _n+\eta '_n\big )\ge W_1\big (\mu _n^1,\nu _n^1\big )+ \frac{1}{2}W^*_1\big (\mu _n^0,\mu _n^0\big )+\frac{1}{2}W^*_1\big (\nu _n^0,\nu _n^0\big ). \end{aligned}$$

Now for given \(\varepsilon \) and n, the decomposition \(\mu _n=\mu _n^1+\mu _n^0\) induces via the optimal coupling of \(\mu _n\) and \(\mu \) a decomposition \(\mu =\mu ^1+\mu ^0\) such that

$$\begin{aligned} W_1(\mu ^1,\mu ^1_n)\le \varepsilon , \quad W_1(\mu ^0,\mu ^0_n)\le \varepsilon . \end{aligned}$$

Similarly, for \(\nu _n=\nu _n^1+\nu _n^0\) and \(\nu =\nu ^1+\nu ^0\). Thus we finally obtain

$$\begin{aligned} W_1^0(\mu ,\nu )= & {} {\hat{W}}_1\big (\mu +\rho +\rho ',\nu +\eta +\eta '\big ) \nonumber \\\ge & {} {\hat{W}}_1\big (\mu _n+\rho _n+\rho '_n,\nu _n+\eta _n+\eta '_n\big )-6\varepsilon \nonumber \\\ge & {} W_1\big (\mu _n^1,\nu _n^1\big )+ \frac{1}{2}W^*_1\big (\mu _n^0,\mu _n^0\big )+\frac{1}{2}W^*_1\big (\nu _n^0,\nu _n^0\big )-6\varepsilon \nonumber \\\ge & {} W_1\big (\mu ^1,\nu ^1\big )+ \frac{1}{2}W^*_1\big (\mu ^0,\mu ^0\big )+\frac{1}{2}W^*_1\big (\nu ^0,\nu ^0\big )-10\varepsilon . \end{aligned}$$
(4.2)

Since \(\varepsilon >0\) was arbitrary, this proves the claim.

For the general case of real masses, one can approximate Borel measures by sums of Dirac measures (with rational masses) in the weak topology. By continuity of \({\tilde{W}}_1, W_1\) and \(W_1^*\) with respect to weak convergence, one can apply the rational case and go to the limit in (4.2). \(\square \)
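For single atoms, the \(p=1\) formula of Lemma 4.2 can be evaluated explicitly. In the sketch below (not part of the paper; \(Y=(0,1)\subset {\mathbb {R}}\) and the atom data are illustration choices), a mass \(c\) is transported directly while the remainders are annihilated at the boundary, using \(W_1^*(m\delta _a)=m\,d'(a,\partial )\); since the cost is linear in \(c\), the infimum over decompositions is attained at \(c=0\) or \(c=m\).

```python
# Toy evaluation (not from the paper) of the p = 1 formula of Lemma 4.2
# for single atoms mu = m*delta_a, nu = m*delta_b in Y = (0,1), X = R.

def d_prime(x):
    return min(x, 1.0 - x)           # distance to the boundary {0, 1} of Y

def w0_one_atom(a, b, m, steps=10):
    """Minimize over decompositions: transport mass c directly, kill the rest."""
    costs = []
    for k in range(steps + 1):
        c = m * k / steps            # mass transported directly
        costs.append(c * abs(a - b)             # W_1(c*delta_a, c*delta_b)
                     + (m - c) * d_prime(a)     # W_1^*((m-c)*delta_a)
                     + (m - c) * d_prime(b))    # W_1^*((m-c)*delta_b)
    return min(costs)
```

For atoms far apart and close to the boundary, annihilation wins (e.g. \(a=0.2\), \(b=0.9\), \(m=\frac{1}{2}\) gives cost \(0.15\)); for nearby interior atoms, direct transport wins (e.g. \(a=0.4\), \(b=0.5\), \(m=\frac{1}{2}\) gives cost \(0.05\)).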

Proof of Lemma 1.10

Assertions (i) and (ii) are the content of the previous Lemma. The proof for the decomposition in assertion (iv) is straightforward. For the vanishing of the \(W_p^\dagger \)-term in the case \(p=1\) note that in this case \([d'(x,\partial )+d'(y,\partial )]^p=d'(x,\partial )^p+d'(y,\partial )^p\) whereas in general only the \(\ge \) inequality holds.

Assertion (iii) will follow from combining assertion (iv), Lemma 1.11 and Theorem 1.13(i). \(\square \)

In the case of a length space X, the annihilation cost \(W_1^*(\mu )\) allows for an alternative characterization as \(\inf \{ W_1(\mu ,\xi ): \xi \in {{\mathcal {P}}}(\partial Y)\}\) and, more generally,

$$\begin{aligned} W_1^*(\mu ,\nu )= \inf \{ W_1(\mu ,\xi )+ W_1(\xi ,\nu ) \,\big |\, \xi \in {{\mathcal {P}}}(\partial Y)\}. \end{aligned}$$

This is the content of Lemma 1.11.

Proof of Lemma 1.11

We switch to the picture of two glued copies. Given \(\mu ,\nu \in {\mathcal {P}}(Y)\), consider them as \(\mu \in {\mathcal {P}}(Y^+)\) and \(\nu \in {\mathcal {P}}(Y^-)\) and fix a \({\hat{W}}_1\)-optimal coupling q of them.

To simplify the presentation, let us first discuss the argument if \({\hat{X}}\) is a geodesic space. Choose a measurable selection of connecting \({\hat{d}}\)-geodesics \(\Gamma : {\hat{X}}\times {\hat{X}}\rightarrow \mathrm {Geo}({\hat{X}})\). For a geodesic \(\gamma \) in \({\hat{X}}\) with \(\gamma _0\in Y^+, \gamma _1\in Y^-\) define \(\alpha (\gamma )=\inf \{s: \gamma _s\not \in Y^+\}\) and \(z(\gamma ):=\gamma _{\alpha (\gamma )}\). Finally, define a map \(Y^+\times Y^-\rightarrow \partial Y\) by \({\mathcal {Z}} =z\circ \Gamma \).

Define a probability measure \(\xi ={\mathcal {Z}}_\#q\) via push forward of the optimal coupling. Then this is a \({\hat{W}}_1\)-intermediate point of \(\mu \) and \(\nu \). Indeed, for the transport from \(\mu \) to \(\nu \), the pair \(x\in Y^+, y\in Y^-\) contributes the cost \(d^*(x,y)\). The fraction \(\alpha (x,y) \cdot d^*(x,y)\) contributes to the cost of the transport from \(\mu \) to \(\xi \). And the fraction \((1-\alpha (x,y)) \cdot d^*(x,y)\) contributes to the cost of the transport from \(\xi \) to \(\nu \).

Now let us discuss the general case of a length space X. Instead of geodesics, we now choose approximate \({\hat{d}}\)-geodesics. With the same construction then \(\xi \) will be an approximate \({\hat{W}}_1\)-intermediate point. This proves the claim in the case \(p=1\).

To prove the claim for \(p>1\), for simplicity we assume that X is compact. (This will guarantee the existence of the map \(\Phi \) to be introduced below. Otherwise, one has to use approximation arguments.)

For each \(\xi \in {{\mathcal {P}}}(\partial Y)\) and each \(W_p\)-optimal coupling q of \(\mu \) and \(\xi \)

$$\begin{aligned} W_p(\mu ,\xi )^p = \int _{X\times X} d(x,y)^p {\text {d}}\!q(x,y) \ge \int _{X\times X} d'(x,\partial )^p {\text {d}}\!q(x,y) = W'_p(\mu ,0)^p. \end{aligned}$$

To deduce the converse inequality, choose a measurable \(\Phi : Y\rightarrow \partial Y\) such that for each \(x\in Y\) the point \(\Phi (x)\) is a minimizer of \(z\mapsto d(x,z)\) on \(\partial Y\). Define a probability measure \(\xi =\Phi _\sharp \mu \). Then

$$\begin{aligned} W_p(\mu ,\xi )^p \le \int _X d(x,\Phi (x))^p {\text {d}}\!\mu (x) = \int _X d'(x,\partial )^p {\text {d}}\!\mu (x) = W'_p(\mu ,0)^p. \end{aligned}$$

This proves that

$$\begin{aligned} W_p'(\mu ,0) = \inf \{ W_p(\mu ,\xi ) \,\big |\, \xi \in {\mathcal P}(\partial Y)\}. \end{aligned}$$

Moreover, the triangle inequality for \(d^*\) implies that \(W_p(\mu ,\xi )+ W_p(\xi ,\mu ) \ge W^*_p(\mu ,\mu )\) for all \(\xi \in {{\mathcal {P}}}(\partial Y)\). Thus \(W'_p(\mu ,0)\ge W^*_p(\mu )\). An estimate in the other direction is obtained as follows

$$\begin{aligned} W_p^*(\mu )^p = 2^{-p} W_p^*(\mu ,\mu )^p =&\, 2^{-p} \int _{X\times X} \left( \inf _{z\in X\setminus Y} \big ( d(x,z)+ d(z,y) \big ) \right) ^p {\text {d}}\!q(x,y) \\ \ge&\, 2^{-p} \int _{X\times X} \left( \inf _{z\in X\setminus Y} d(x,z) + \inf _{w\in X\setminus Y} d(w,y) \right) ^p {\text {d}}\!q(x,y) \\ \ge&\, 2^{1-p} \int _{X\times X} \left( \inf _{z\in X\setminus Y} d(x,z)\right) ^p {\text {d}}\!q(x,y) = 2^{1-p} W_p'(\mu ,0)^p, \end{aligned}$$

where q denotes any \(W^*_p\)-optimal coupling of \(\mu \) and \(\mu \). \(\square \)
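The characterization of the annihilation cost can likewise be illustrated numerically. The sketch below (not part of the paper; the atoms are arbitrary illustration choices) brute-forces \(W_1^*(\mu )=\frac{1}{2}W_1^*(\mu ,\mu )\) over matchings of \(\mu \) with itself for \(Y=(0,1)\subset {\mathbb {R}}\), and compares it with the cost of projecting each atom to its nearest boundary point, i.e. \(\int d'(x,\partial )\,{\text {d}}\!\mu \); by Lemma 1.11 the two agree for \(p=1\).

```python
from itertools import permutations

# Toy check (not from the paper) for Y = (0,1) in X = R, where
# d*(x,y) = min(x+y, (1-x)+(1-y)) is the reflected distance via the boundary.

atoms = [0.15, 0.4, 0.7, 0.9]
w = 1.0 / len(atoms)                 # equal atom masses (mu is a probability)

def d_star(x, y):
    return min(x + y, (1.0 - x) + (1.0 - y))

# brute-force optimal matching of mu with itself under d*
half_self_cost = 0.5 * w * min(
    sum(d_star(x, y) for x, y in zip(atoms, perm))
    for perm in permutations(atoms)
)

# cost of sending every atom to its nearest boundary point
projection_cost = w * sum(min(x, 1.0 - x) for x in atoms)
```

The identity matching is optimal here since \(d^*(x,y)\ge d'(x,\partial )+d'(y,\partial )\), and it costs exactly twice the projection cost.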

Proof of Theorem 1.13

  1. (i)

    For simplicity of the presentation we assume that length minimizing geodesics exist. This is for instance the case when \(Y'\) is geodesic. In this case there exist \(W_1'\)-geodesics which are supported on \(d'\)-geodesics. For the general case one has to work with almost-geodesics.

    Recall that then \(W_1'\) is a geodesic metric on \(\mathcal P^{sub}_1(Y)\) and that, according to Lemma 1.10 iv) and Lemma 1.11,

    $$\begin{aligned} W_1'(\mu ,\nu ) = \inf \Big \{&W_1(\mu _1,\nu _1) + W_1^*(\mu _0) + W^*_1(\nu _0) \,\Big |\, \mu =\mu _1+\mu _0, \nu =\nu _1+\nu _0 \Big \}. \end{aligned}$$

    for all subprobability measures \(\mu ,\nu \in {\mathcal {P}}^{sub}_1(Y)\). Together with Lemma 1.10 i) this implies \( W_1'(\mu ,\nu )\le W_1^0(\mu ,\nu )\). In particular, \(W_1^0\) does not vanish outside the diagonal. As \(W_1^\flat \) is the biggest metric below \(W_1^0\), we have \(W_1' \le W_1^\flat \). Using the fact that \(W_1'\) is a geodesic metric, we thus get

    $$\begin{aligned} W_1^\sharp (\mu ,\nu ) =&\inf _{\begin{array}{c} \eta :\mu \leadsto \nu \\ W_1^\flat \text {-cont.} \end{array}} \sup _{0=s_0<\ldots<s_n=1} \sum _{i=1}^n W_1^\flat (\eta _{s_{i-1}}, \eta _{s_i}) \\ \ge&\inf _{\begin{array}{c} \eta :\mu \leadsto \nu \\ W_1^\flat \text {-cont.} \end{array}} \sup _{0=s_0<\ldots<s_n=1} \sum _{i=1}^n W_1'(\eta _{s_{i-1}}, \eta _{s_i}) \\ \ge&\inf _{\begin{array}{c} \eta :\mu \leadsto \nu \\ W_1'\text {-cont.} \end{array}} \sup _{0=s_0<\ldots <s_n=1} \sum _{i=1}^n W_1'(\eta _{s_{i-1}}, \eta _{s_i}) \ = \ W_1'(\mu ,\nu ). \end{aligned}$$

    To prove the converse inequality, given \(\mu ,\nu \in \mathcal P^{sub}(Y)\), let \((\eta _s')_{s\in [0,1]}\) be a \(W'_1\)-geodesic connecting \(\mu ',\nu '\) in \({\mathcal {P}}(Y')\) which is supported on (constant-speed) \(d'\)-geodesics. Decompose this geodesic into two geodesics \(\eta '_s=\eta '_{s,1}+\eta '_{s,0}\) where \((\eta _{s,1}')_{s\in [0,1]}\) is a \(W'_1\)-geodesic supported by \(d'\)-geodesics staying in Y and \((\eta _{s,0}')_{s\in [0,1]}\) is a \(W'_1\)-geodesic supported by \(d'\)-geodesics passing through \(\partial \).

    Now replace the latter by another curve \(({\tilde{\eta }}_{s,0}')_{s\in [0,1]}\) with the same endpoints:

    $$\begin{aligned} {\tilde{\eta }}'_{s,0} :=\left\{ \begin{array}{ll} (1-2s)\eta '_{0,0}+2s\eta '_{0,0}(Y') \, \delta _\partial ,&{} s\in [0,\frac{1}{2}]\\ (2s-1)\eta '_{1,0}+2(1-s)\eta '_{0,0}(Y') \, \delta _\partial ,\quad &{} s\in (\frac{1}{2},1]. \end{array}\right. \end{aligned}$$

    (Indeed, this is also a \(W'_1\)-geodesic since in the \(L^1\)-Wasserstein geometry also convex combinations are geodesics.) Consider \({\tilde{\eta }}_{s}={\tilde{\eta }}'_{s,0}\big |_Y+\eta _{s,1}\). This is a curve in \({\mathcal {P}}^{sub}(Y)\) which connects \(\mu \) and \(\nu \). Moreover, taking decompositions

    $$\begin{aligned} {\tilde{\eta }}_s = \underbrace{(\eta _{s,1}+{\tilde{\eta }}'_{t,0}|_Y)}_{``\mu _1''} + \underbrace{2(t-s)\eta _{0,0}}_{``\mu _0''} \;\;\text { and }\;\; {\tilde{\eta }}_t= \underbrace{(\eta _{t,1} + {\tilde{\eta }}'_{t,0}|_Y)}_{``\nu _1''} + \underbrace{0}_{``\nu _0''} \end{aligned}$$

    in Lemma 1.10 i) for \(s\le t\le \frac{1}{2}\) and similar for the other cases, we get

    $$\begin{aligned} W^0_1({\tilde{\eta }}_s,{\tilde{\eta }}_t)\le \left\{ \begin{array}{ll} |t-s|\cdot W_1(\eta _{0,1},\eta _{1,1}) +2|t-s|\cdot W_1^*(\eta _{0,0}) , \quad &{} \text{ for } s,t\le \frac{1}{2}\\ |t-s|\cdot W_1(\eta _{0,1},\eta _{1,1}) +2|t-s|\cdot W_1^*(\eta _{1,0}) , \quad &{} \text{ for } s,t\ge \frac{1}{2} \end{array}\right. \end{aligned}$$

    and thus

    $$\begin{aligned} L^\flat _1({\tilde{\eta }})\le W_1(\eta _{0,1},\eta _{1,1})+W_1^*(\eta _{0,0})+W_1^*(\eta _{1,0}) =W_1'(\mu ,\nu ) \end{aligned}$$

    which finally implies \(W_1^\sharp (\mu ,\nu )\le W_1'(\mu ,\nu ) \).

    Since \(W_1^\sharp \) is the length metric induced by \(W_1^\flat \), one gets \(W_1^\flat \le W_1^\sharp \). The other inequality is provided by the fact that \(W_1^\flat \) is the biggest metric below \(W_1^0\) and that \(W_1^\sharp = W_1' \le W_1^0\) by the above.

  2. (ii)

    Now let us consider the case \(p>1\). The idea is that locally (along a geodesic) the contribution of \(W_p^\dagger \) is negligible, so that we can compare \(W_p'\) and \(W_p^\flat \) on a small scale and then carry it over to the induced length metrics. Let subprobabilities \(\mu ,\nu \) be given as well as a \(W'_p\)-geodesic \((\eta '_t)_{t\in [0,1]}\) connecting the measures \(\mu ':=\mu +(1-\mu (Y))\delta _\partial \) and \(\nu ':= \nu + (1-\nu (Y))\delta _\partial \). By the continuity of \(W_p'\) and \(W_p^*\) with respect to weak convergence we can assume without loss of generality that \(\mu \) and \(\nu \) have compact supports and for \(\varepsilon >0\) small

    $$\begin{aligned} \eta _t(Y) \le 1-\varepsilon \end{aligned}$$

    for all \(t\in [0,1]\). Recall that the measures without primes are the restrictions to Y. We thus have \(\eta _t(\partial ) =0\), whereas \(\eta _t'(\partial )\ge \varepsilon \). Choose \(\delta >0\) such that \(\eta _t(B_\delta '(\partial ))\le \frac{\varepsilon }{2}\). Let \(\Pi \) be the probability measure on the space of \(Y'\)-geodesics such that \(\eta _t'=({\text {e}}_t)_\#\Pi \) (where \({\text {e}}_t\) is the evaluation map at time t), denote by L the essential supremum of \(d'(\gamma _0,\gamma _1)\) under \(\Pi \), and let \(\delta ':=\frac{\delta }{L}\). We consider \(\eta _s\) and \(\eta _t\) for \(|s-t|\le \delta '\). Using that \(d^\dagger (x,y)^p \ge d'(x,\partial )^p+d'(y,\partial )^p\), we see that in the decomposition (1.9) it is actually cheaper to annihilate mass at the boundary:

    $$\begin{aligned} W_p'(\eta _s,\eta _t)^p&= \inf \Big \{ W_p(\eta _{s,1},\eta _{t,1})^p + W_p^\dagger (\eta _{s,2},\eta _{t,2})^p + W_p'(\eta _{s,0},0)^p + W_p'(\eta _{t,0},0)^p \,\Big |\, \\ \eta _s&=\eta _{s,1} + \eta _{s,2} + \eta _{s,0}, \eta _{t} = \eta _{t,1} + \eta _{t,2} + \eta _{t,0}, (\eta _{s}+\eta _{t,0})(Y)\le 1,\\&\quad (\eta _{t}+\eta _{s,0})(Y)\le 1 \Big \} \\ \ge&\inf \Big \{ W_p(\eta _{s,1},\eta _{t,1})^p + W'_p(\eta _{s,0}+ \eta _{s,2},0)^p + W'_p(\eta _{t,0} + \eta _{t,2},0)^p \,\Big |\\ \eta _s&=\eta _{s,1}+\eta _{s,2} + \eta _{s,0}, \eta _t=\eta _{t,1}+\eta _{t,2} + \eta _{t,0}, (\eta _s+\eta _{t,0})(Y)\le 1, \\&\quad (\eta _t + \eta _{s,0})(Y)\le 1\Big \}. \end{aligned}$$

    Since \(W^\dagger \) only occurs where \(d^\dagger \) is smaller than d, its contribution comes from geodesics in \(B_\delta '(\partial )\), so that by our choice of \(\delta \) we know that \(\eta _{s,2}(Y)= \eta _{s,2}(B_\delta '(\partial )) \le \frac{\varepsilon }{2}\) and the same for \(\eta _{t,2}\). Hence for \(\varepsilon \) small enough we have \((\eta _s+(\eta _{t,2}+\eta _{t,0}))(Y)\le 1\), so that \(\eta _s=\eta _{s,1} + {\tilde{\eta }}_{s,0}\) with \({\tilde{\eta }}_{s,0}:=\eta _{s,0}+\eta _{s,2}\) is an admissible decomposition. In particular, the above inequality is an equality. Note that we cannot use this trick for \(s=0,t=1\) because then the constraint would not be satisfied. Thanks to Lemma 1.11 we thus have

    $$\begin{aligned} W_p'(\eta _s,\eta _t)^p&\ge \inf \Big \{ W_p(\eta _{s,1},\eta _{t,1})^p + W^*_p({\tilde{\eta }}_{s,0})^p + W^*_p({\tilde{\eta }}_{t,0})^p \,\Big |\\&\quad \eta _s=\eta _{s,1}+{\tilde{\eta }}_{s,0}, \eta _t=\eta _{t,1}+{\tilde{\eta }}_{t,0}, (\eta _s+{\tilde{\eta }}_{t,0})(Y)\le 1, (\eta _t+{\tilde{\eta }}_{s,0})(Y) \le 1 \Big \}\\&\ge W^0_p(\eta _s,\eta _t)^p \ge W^\flat _p(\eta _s,\eta _t)^p. \end{aligned}$$

    Hence, the \(W'_p\)-length of the curve \((\eta _t)_{t\in [0,1]}\) dominates its \(W^\flat _p\)-length. This finally proves

    $$\begin{aligned} W'_p(\eta _s,\eta _t)^p\ge W^\sharp _p(\eta _s,\eta _t)^p \end{aligned}$$

    for all \(s,t\). For \(s=0, t=1\), this yields the claimed upper estimate for \(W_p^\sharp \).

    The lower estimate follows from assertion i) together with the facts that \(W_1^\flat \le W_p^\flat \) (which is inherited from analogous inequalities for \({\tilde{W}}_p\) and in turn for \(W^0_p\)) and \(W_p^\flat \le W_p^\sharp \).

Proof of Proposition 1.15

Boundedness of X, say \(d(x,y)\le D\), implies that all the \(W^\sharp _p\)-metrics are continuous with respect to each other: \(W_1^\sharp \le W^\sharp _p\le D^{1-1/p}\cdot (W_1^\sharp )^{1/p}\). Thus it suffices to prove the claim for \(p=1\).

Assume that \(W^\sharp _1(\mu _n,\mu )\rightarrow 0\). Then \(W'_1(\mu '_n,\mu ')\rightarrow 0\) and thus \(\mu '_n\) converges to \(\mu '\) weakly on \(Y'\). This in turn obviously implies that \(\mu _n\) converges to \(\mu \) vaguely on Y.

Now conversely assume that \(W^\sharp _1(\mu _n,\mu )\not \rightarrow 0\). By compactness of \(Y'\), there exists \(\nu '\in {\mathcal {P}}(Y')\) such that—after passing to a suitable subsequence—\(\mu '_n\) converges to \(\nu '\) weakly on \(Y'\). Let \(\nu \) denote the restriction of \(\nu '\) to Y. Obviously, \(\nu \not =\mu \). (Otherwise, \(W^\sharp _1(\mu _n,\mu )\rightarrow 0\).) Thus there exists \(f\in {\mathcal C}_c(Y)\) with \(\int fd\mu \not =\int fd\nu =\lim _{n}\int fd\mu _n\) and therefore \(\mu _n\) does not converge to \(\mu \) vaguely on Y. \(\square \)

The following simple estimate will make it possible to prove the continuity of \(W_p^0\) with respect to weak convergence plus convergence of moments of subprobability measures.

Lemma 4.3

Let \(\mu , \nu \in {\mathcal {P}}^{sub}_1(Y)\) with \(\mu (Y) \ge \nu (Y)\). Then, for any \(z\in X\setminus Y\),

$$\begin{aligned} W_1^0(\mu ,\nu ) \le \inf \left\{ W_1(\mu _1,\nu ) + \int _{X} d(x,z) {\text {d}}\!\mu _0(x) \,\big |\, \mu =\mu _1+\mu _0, \mu _1(Y)=\nu (Y) \right\} . \end{aligned}$$

Proof

Taking a decomposition such that \(\nu _1=\nu , \nu _0=0\), Lemma 1.10 yields \(W_1^0(\mu ,\nu ) \le W_1(\mu _1,\nu ) + W^*_1(\mu _0)\). Using now

$$\begin{aligned} W_1^*(\mu _0,\mu _0) =&\inf _{q} \int _{X\times X} d^*(x,y) {\text {d}}\!q(x,y) \\ \le&\inf _{q} \int _{X\times X} \big [d(x,z) + d(z,y)\big ] {\text {d}}\!q(x,y) = 2 \int _{X} d(x,z) {\text {d}}\!\mu _0(x), \end{aligned}$$

the proof is complete. \(\square \)

Lemma 4.4

Assume that X is compact. Then for \(\mu ^{(n)},\mu ^*\in \mathcal P^{sub}(Y)\) and any \(p\in [1,\infty )\) the following are equivalent:

  1. (i)

    \(\mu ^{(n)}\rightarrow \mu ^*\) weakly on Y

  2. (ii)

    \(W^0_p(\mu ^{(n)},\mu ^*)\rightarrow 0\) and \(\mu ^{(n)}(Y)\rightarrow \mu ^*(Y)\)

Remark 4.5

Without assuming compactness in Lemma 4.4, we are still able to get that \(W_p^0(\mu ^{(n)}, \mu ^*) \rightarrow 0\) for \(\mu ^{(n)},\mu ^*\in {\mathcal {P}}_p^{sub}(Y)\) if \(\mu ^{(n)}\rightarrow \mu ^*\) weakly on Y and \(\int d(x,x_0)^p {\text {d}}\!\mu ^{(n)}(x) \rightarrow \int d(x,x_0)^p {\text {d}}\!\mu ^*(x)\) for some \(x_0\in Y\).

Proof of Lemma 4.4

Assume \(\mu ^{(n)} \rightarrow \mu ^*\) weakly on Y. It again suffices to prove the result for \(p=1\). We want to use Lemma 4.3 to show continuity; in order to apply this lemma, we have to decompose the larger measure. We will proceed in three steps. First we consider only sequences \((\mu ^{(n)})\) with \(\mu ^{(n)}(Y)\ge \mu ^*(Y)\) for all \(n\in {\mathbb {N}}\). Define \(\lambda _n :=\frac{\mu ^*(Y)}{\mu ^{(n)}(Y)}\) and \(\mu ^{(n)}_1:= \lambda _n \mu ^{(n)}\). Then \(\mu ^{(n)}_1(Y) = \mu ^*(Y)\), \(\lambda _n\rightarrow 1\), and for \(f\in C_b^0\)

$$\begin{aligned} \left| \int _Xf{\text {d}}\!\mu ^{(n)}_1 - \int _X f {\text {d}}\!\mu ^* \right| \le&\left| \int _X \lambda _n f {\text {d}}\!\mu ^{(n)} - \int _X f{\text {d}}\!\mu ^{(n)} \right| + \left| \int _X f {\text {d}}\!\mu ^{(n)} - \int _X f {\text {d}}\!\mu ^* \right| \\ =&|\lambda _n-1| \left| \int _X f {\text {d}}\!\mu ^{(n)} \right| + \left| \int _X f {\text {d}}\!\mu ^{(n)} - \int _X f {\text {d}}\!\mu ^* \right| \longrightarrow 0. \end{aligned}$$

Hence, we have convergence in the Kantorovich-Wasserstein metric: \(W_1(\mu ^{(n)}_1, \mu ^*) \rightarrow 0\). Writing \(\mu ^{(n)}_0:= (1-\lambda _n) \mu ^{(n)}\), by Lemma 4.3 we finally have

$$\begin{aligned} {W^0_1}(\mu ^{(n)},\mu ^*) \le W_1(\mu ^{(n)}_1,\mu ^*) + \int _X d(x,z) {\text {d}}\!\mu ^{(n)}_0(x) \longrightarrow 0. \end{aligned}$$

Now, for the case that \(\mu ^{(n)}(Y)\le \mu ^*(Y)\), let \(\lambda '_n:= \frac{\mu ^{(n)}(Y)}{\mu ^{*}(Y)}\), \(\mu ^*_{1,n} := \lambda '_n\mu ^*\), and \(\mu ^*_{0,n} := (1-\lambda '_n)\mu ^*\). Then \(\mu ^*_{1,n}(Y)= \mu ^{(n)}(Y)\) and \(\lambda '_n\rightarrow 1\). Given \(f\in C_b^0\), by

$$\begin{aligned} \left| \int _X f{\text {d}}\!\mu ^*_{1,n} - \int _X f {\text {d}}\!\mu ^* \right| \le |\lambda '_n-1| \left| \int _X f {\text {d}}\!\mu ^* \right| \longrightarrow 0, \end{aligned}$$

we see that \(\mu ^*_{1,n} \rightharpoonup \mu ^*\). In the next step this yields

$$\begin{aligned} \left| \int _X f {\text {d}}\!\mu ^*_{1,n} - \int _X f {\text {d}}\!\mu ^{(n)} \right| \le \left| \int _X f {\text {d}}\!\mu ^*_{1,n} - \int _X f {\text {d}}\!\mu ^{*} \right| + \left| \int _X f {\text {d}}\!\mu ^* - \int _X f {\text {d}}\!\mu ^{(n)} \right| \longrightarrow 0, \end{aligned}$$

i.e. \(\mu ^*_{1,n}-\mu ^{(n)} \rightharpoonup 0\). Hence, using again Lemma 4.3, we see that

$$\begin{aligned} {W^0_1}(\mu ^{(n)},\mu ^*) \le W_1(\mu ^{(n)},\mu ^*_{1,n}) + \int _X d(x,z) {\text {d}}\!\mu ^*_{0,n}(x) \longrightarrow 0. \end{aligned}$$

Since a sequence of reals converges to 0 if and only if every subsequence has a further subsequence converging to 0, we can now conclude that \(a_n:= {W^0_1}(\mu ^{(n)},\mu ^*)\) converges to 0. Indeed, take a subsequence \(a_{n_k}\). Then we can take a further subsequence \(a_{n_{k_\ell }}\) such that either \(\mu ^{(n_{k_\ell })}(Y) \ge \mu ^*(Y)\) for every \(\ell \in {\mathbb {N}}\), or \(\mu ^{(n_{k_\ell })}(Y) \le \mu ^*(Y)\) for every \(\ell \in {\mathbb {N}}\). But then the above ensures convergence of these subsequences to 0.

Conversely, now assume that \(\mu ^{(n)}(Y)\rightarrow \mu ^*(Y)\) and \({W^0_p}(\mu ^{(n)},\mu ^*)\rightarrow 0\). Let \(\rho ^{(n)},\eta ^{(n)} \in {{\mathcal {P}}}^{sub}(X)\) such that \((2\rho ^{(n)} +\mu ^{(n)})(X)=1 = (2\eta ^{(n)}+\mu ^*)(X)\), and \({W^0_p}(\mu ^{(n)},\mu ^*) = \tilde{W}_p((\mu ^{(n)}+\rho ^{(n)}, \rho ^{(n)}),(\mu ^*+\eta ^{(n)},\eta ^{(n)}))\). Let \(\mu ^{(n_k)}\) be any subsequence and consider the corresponding subsequences \(\rho ^{(n_k)}, \eta ^{(n_k)}\). Compactness of \({\hat{X}}\) implies that there exists a sub-subsequence \((n_{k_\ell })_\ell \) such that

$$\begin{aligned} \eta ^{(n_{k_\ell })}\rightharpoonup \eta ^* \text { and } \mu ^{(n_{k_\ell })} \rightharpoonup {\tilde{\mu }}^* \text { and } \rho ^{(n_{k_\ell })} \rightharpoonup \rho ^* \end{aligned}$$

for suitable limit points \(\eta ^*, {\tilde{\mu }}^*, \rho ^*\). Then we have

$$\begin{aligned} {\tilde{W}}_p \left( ({\tilde{\mu }}^*+\rho ^*, \rho ^*),(\mu ^*+\eta ^*,\eta ^*) \right) \le&\ {\tilde{W}}_p \left( ({\tilde{\mu }}^*+\rho ^*, \rho ^*), (\mu ^{(n_{k_\ell })}+\rho ^{(n_{k_\ell })}, \rho ^{(n_{k_\ell })}) \right) \\&+ {\tilde{W}}_p\left( (\mu ^{(n_{k_\ell })}+\rho ^{(n_{k_\ell })}, \rho ^{(n_{k_\ell })}),(\mu ^*+\eta ^{(n_{k_\ell })},\eta ^{(n_{k_\ell })}) \right) \\&+ {\tilde{W}}_p\left( (\mu ^*+\eta ^{(n_{k_\ell })}, \eta ^{(n_{k_\ell })}),(\mu ^*+\eta ^*,\eta ^*) \right) \longrightarrow 0. \end{aligned}$$

Hence \(\rho ^*=\eta ^*\) and \({\tilde{\mu }}^*+\rho ^*=\mu ^*+\eta ^*\), and therefore \({\tilde{\mu }}^*=\mu ^*\). This way we see that every subsequence of \(\mu ^{(n)}\) has a further subsequence which converges to \(\mu ^*\), so that also the whole sequence converges to \(\mu ^*\). \(\square \)

5 Proofs for Sects. 1.2 and 1.3

Proof of Proposition 1.20

This will follow from the identification with the glued space and the properties shown in Sect. 3.1, in particular Theorem 3.10. Let us provide the details.

  1. (i)

    Given \(\sigma _0\in {\tilde{{\mathcal {P}}}}(Y|X)\), consider \({\hat{\sigma }}:= \Phi (\sigma _0) \in {\mathcal {P}}({\hat{X}})\), with the isometry \(\Phi \) given in Lemma 3.11. Since \({\hat{X}}\) is an \({\text {RCD}}^*(K,\infty )\) space by Assumption 1.17 and Lemma 3.13, the \({\text {EVI}}_K\)-gradient flow \({\hat{\sigma }}_t\in {\mathcal {P}}({\hat{X}})\) starting in \({\hat{\sigma }}\) exists. Again by the identification of the entropies in Lemma 3.13, the flow \(\sigma _t:= \Psi ({\hat{\sigma }}_t)\) is the \({\text {EVI}}_K\)-gradient flow of \({\widetilde{{\text {Ent}}}}\) in \({\tilde{{\mathcal {P}}}}(Y|X)\).

  2. (ii)

    Let \(\mu _0 \in {\mathcal {P}}^{sub}_2(X)\), and let \(\sigma _0\in {\tilde{{\mathcal {P}}}} (Y|X)\) such that \(\mu _0=\sigma _0^+-\sigma _0^-\) (such a \(\sigma _0\) exists by definition of \({\mathcal {P}}^{sub}_2(X)\)). Consider \(\sigma _t := {\tilde{{\mathscr {P}}}}_t\sigma _0\). By Lemma 3.12 we have

    $$\begin{aligned} \sigma _t^+-\sigma _t^-={\mathscr {P}}_t(\sigma _0^+-\sigma _0^-) = {\mathscr {P}}_t\mu _0. \end{aligned}$$

    This also shows the independence of the chosen \(\sigma _0\), as the right-hand side is independent of it.

  3. (iii)

    As in (ii).

  4. (iv)

    Let \(\sigma _0\in {\tilde{{\mathcal {P}}}}_2(Y|X)\) and define \(\mu _0:=\sigma _0^+-\sigma _0^-\) and \(\nu _0:=\sigma _0^++\sigma _0^-\). Then, again by Lemma 3.12,

    $$\begin{aligned} \sigma _t = {\tilde{{\mathscr {P}}}}_t\sigma _0 =&\left( {\mathscr {P}}_t \frac{\sigma _0^++\sigma _0^-}{2} + {\mathscr {P}}_t^0\frac{\sigma _0^+-\sigma _0^-}{2}, {\mathscr {P}}_t \frac{\sigma _0^++\sigma _0^-}{2} - {\mathscr {P}}_t^0\frac{\sigma _0^+-\sigma _0^-}{2} \right) \\ =&\left( {\mathscr {P}}_t \frac{\nu _0}{2} + {\mathscr {P}}_t^0\frac{\mu _0}{2}, {\mathscr {P}}_t \frac{\nu _0}{2} - {\mathscr {P}}_t^0\frac{\mu _0}{2} \right) \\ =&\left( \frac{\nu _t+\mu _t}{2}, \frac{\nu _t-\mu _t}{2} \right) . \end{aligned}$$

\(\square \)
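Part (iv) is the rigorous form of the particle picture from the introduction: the total distribution \(\nu _t\) evolves by the Neumann heat flow, the effective distribution \(\mu _t\) by the Dirichlet heat flow, and the charged components recombine as half sum and half difference, which in particular remain nonnegative. A finite-difference sketch on \(Y=(0,1)\) makes this visible; the grid, time step and initial spike are illustrative assumptions, not the paper's construction.

```python
# Illustrative discretization of Y = (0,1); not the paper's construction.
N = 50                         # interior grid points
dt = 1e-4
dx = 1.0 / (N + 1)
lam = dt / dx**2               # ~0.26: explicit scheme is stable

def step_dirichlet(u):
    # absorbing boundary (ghost values 0): mass leaks out of Y
    return [u[k] + lam * ((u[k - 1] if k > 0 else 0.0) - 2 * u[k]
                          + (u[k + 1] if k < N - 1 else 0.0))
            for k in range(N)]

def step_neumann(u):
    # reflecting boundary (mirrored ghost values): mass is conserved
    return [u[k] + lam * ((u[k - 1] if k > 0 else u[k]) - 2 * u[k]
                          + (u[k + 1] if k < N - 1 else u[k]))
            for k in range(N)]

# sigma_0^- = 0, so mu_0 = nu_0 = sigma_0^+: a spike near the boundary
mu = [float(k == 4) for k in range(N)]   # effective distribution
nu = mu[:]                               # total distribution
for _ in range(2000):
    mu, nu = step_dirichlet(mu), step_neumann(nu)

sp = [(n + m) / 2 for m, n in zip(mu, nu)]   # sigma_t^+
sm = [(n - m) / 2 for m, n in zip(mu, nu)]   # sigma_t^-
```

The Neumann step conserves the total mass exactly, so \(\nu _t\) stays a probability vector (up to roundoff), while the Dirichlet step loses mass through the boundary; the discrete comparison principle keeps \(\sigma _t^-=(\nu _t-\mu _t)/2\ge 0\).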

Proof of Proposition 1.22

This is again a direct consequence of the identification, since by Assumption 1.17 the glued space is an \({\text {RCD}}^*(K,\infty )\) space and thus satisfies the desired Wasserstein contraction. \(\square \)

Proof of Theorem 1.26

  1. (i)

    Under Assumption 1.17, \({\hat{X}}\) is an \({\text {RCD}}^*(K,\infty )\) space and hence satisfies a gradient estimate with \(p=2\). By [21, Cor. 4.3] we have the improved gradient estimate for \(p\in [1,2]\) and by Jensen’s inequality one easily obtains the gradient estimate for \(p>2\) from that. Now we take a function \(f\in D({\mathcal {E}}^0)\) and define

    $$\begin{aligned} u:={\left\{ \begin{array}{ll} f, &{}\text { on } X^+\\ -f, &{} \text { on } X^-. \end{array}\right. } \end{aligned}$$

    Then \(u\in D(\hat{{\mathcal {E}}})\) and \(|\nabla u|=|\nabla f|\) on each \(X^i\). Thus, inserting u in the gradient estimate on \({\hat{X}}\) yields on the upper half \(X^+\):

    $$\begin{aligned} |\nabla P_t^0f|^p = |\nabla {\hat{P}}_tu|^p \le e^{-pKt} {\hat{P}}_t|\nabla u|^p = e^{-pKt} P_t|\nabla f|^p. \end{aligned}$$
  2. (ii)

    This follows directly from the duality of the heat semigroups (2.4).

\(\square \)
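The antisymmetric extension in (i) is the classical method of images, and it can be checked numerically: the doubling of \(X=[0,1]\) is a circle of circumference 2, and the heat flow of the extension \(u\), restricted to \(X^+\), reproduces the Dirichlet flow of f. The sketch below uses illustrative assumptions (grid size, time step, initial spike); at the level of the explicit scheme the two flows agree exactly.

```python
# Doubling of X = [0,1] along {0,1}: a discrete circle with M = 2N+2 nodes.
N = 40
dt = 1e-4
dx = 1.0 / (N + 1)
lam = dt / dx**2                 # ~0.17: explicit scheme is stable
M = 2 * N + 2

f = [float(k == N // 2) for k in range(N)]   # initial datum on Y = (0,1)

# antisymmetric extension u: +f on X^+, -f on X^-, zero on the gluing set
u = [0.0] * M
for k in range(N):
    u[1 + k] = f[k]              # X^+ copy
    u[M - 1 - k] = -f[k]         # mirrored X^- copy

def step_periodic(v):
    # heat step on the glued circle (Python's v[-1] closes the loop)
    return [v[j] + lam * (v[j - 1] - 2 * v[j] + v[(j + 1) % len(v)])
            for j in range(len(v))]

def step_dirichlet(v):
    # heat step on Y with absorbing boundary (ghost values 0)
    return [v[k] + lam * ((v[k - 1] if k > 0 else 0.0) - 2 * v[k]
                          + (v[k + 1] if k < N - 1 else 0.0))
            for k in range(N)]

for _ in range(1000):
    u, f = step_periodic(u), step_dirichlet(f)

err = max(abs(u[1 + k] - f[k]) for k in range(N))   # restriction to X^+
```

Antisymmetry is preserved by the flow (the scheme commutes with the mirror map, up to roundoff), so u vanishes on the gluing set for all times and its restriction to \(X^+\) satisfies the Dirichlet scheme.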

Proof of Theorem 1.28

(i)\(\Rightarrow \)(ii): Consider the doubling of X, \(V:= {\hat{X}}\). Then we can view Y as an open subset of \({\hat{X}}\) by identifying it with \(Y^+\). Now define \(\psi : V\rightarrow V\) as the “mirror mapping”

$$\begin{aligned} \psi (x) := {\left\{ \begin{array}{ll} \iota _-\circ \iota _+^{-1} (x), &{} \text {if } x\in X^+ \\ \iota _+\circ \iota _-^{-1} (x), &{} \text {if } x\in X^-. \end{array}\right. } \end{aligned}$$

It is easy to see that \(\psi \) is a measure-preserving isometry. Further, let \(x\in X^+\) such that \(\psi (x)=x\), i.e. \(\iota _- \circ \iota _+^{-1}(x)=x\). This in particular means \(x\in Z\) since for \(x\in Y^+\) we would have \(\iota _-\circ \iota _+^{-1}(x)\in Y^-\), which would contradict \(\psi (x)=x\in Y^+\). Finally observe that \(\psi (Y)=\psi (Y^+)=\iota _-(Y)=Y^-=V\setminus Y^+\).
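In the simplest example, hypothetical and not needed for the proof, these properties can be checked by hand: take \(X=[0,1]\), \(Y=(0,1)\), \(Z=\{0,1\}\), represent points of V as pairs (side, point), and verify that \(\psi \) preserves the glued distance and fixes exactly the points over the gluing set Z.

```python
# Toy doubling of X = [0,1] along Z = {0,1}; all choices hypothetical.
Z = [0.0, 1.0]

def d(s, t):
    return abs(s - t)

def dhat(p, q):
    # glued distance; a point is (side, x) with side in {+1, -1}
    (sp, x), (sq, y) = p, q
    if sp == sq:
        return d(x, y)
    return min(d(x, z) + d(z, y) for z in Z)   # geodesic crosses the gluing set

def psi(p):
    # the "mirror mapping": swap the two copies
    side, x = p
    return (-side, x)

pts = [(s, k / 10) for s in (+1, -1) for k in range(11)]
iso = all(abs(dhat(psi(p), psi(q)) - dhat(p, q)) < 1e-12
          for p in pts for q in pts)
fixed = [p for p in pts if dhat(psi(p), p) < 1e-12]   # points over Z
```

Here `fixed` consists precisely of the four representatives of the two glued boundary points, in accordance with the fixed-point claim above.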

(ii)\(\Rightarrow \)(iii): Take \(i=1, V_1:=V\).

(ii)\(\Rightarrow \)(i): Thanks to \(\xi \), we can define a measure-preserving isometry \(\varphi :(V,d_V,{\mathfrak {m}}_V) \rightarrow ({\hat{X}},{\hat{d}},{\hat{{\mathfrak {m}}}})\) by mapping \(Y\cong {\tilde{Y}}\) to \(Y^+\), \(\psi (Y)\) to \(Y^-\) and \(\partial Y\) to \(Z=X\setminus Y\subset {\hat{X}}\), where \(\psi \) is the map given in the definition of a halfspace. Since curvature-dimension conditions are preserved under measure-preserving isometries, \({\hat{X}}\) is an \({\text {RCD}}^*(K,\infty )\) space. Lemma 3.13 then tells us that Assumption 1.17 is satisfied.

(iii)\(\Rightarrow \)(i): We want to show that \({\hat{X}}\) is an \({\text {RCD}}^*(K,\infty )\) space by using the local-to-global property. Given \(x\in \partial Y\), choose i such that \(x\in X_i\). Then we can identify \((Y\cap X_i)^+\cup (Y\cap X_i)^-\subset {\hat{X}}\) with \({\hat{W}}_i\subset V_i\) via \(\xi _i\). Given measures \(\mu _0,\mu _1\in {\mathcal {P}}({\hat{X}})\) supported in \((Y\cap X_i)^+\cup (Y\cap X_i)^-\), then \(\nu _\ell :=(\xi _i)_\# \mu _\ell \in {\mathcal {P}}(V_i), \ell =0,1,\) are supported in \({\hat{W}}_i\). Since \(V_i\) is an \({\text {RCD}}^*(K,\infty )\) space, there is a geodesic \(\nu _t\in {\mathcal {P}}(V_i)\) connecting \(\nu _0\) and \(\nu _1\) such that the entropy \({\text {Ent}}_{{\mathfrak {m}}_{V_i}}\) is convex. Pulling back this curve via \(\mu _t:=(\xi _i^{-1})_\#\nu _t\) provides us with a geodesic in \({\mathcal {P}}({\hat{X}})\) such that \(\widehat{{\text {Ent}}}\) is convex. Combining this convex optimal transport near the boundary (i.e. the gluing edge) together with the local \({\text {RCD}}\) property of X (and hence \(X^+\) and \(X^-\)), we have that \({\hat{X}}\) is a local \({\text {RCD}}^*(K,\infty )\) space and by the local-to-global property also an \({\text {RCD}}^*(K,\infty )\) space. \(\square \)

Let us finally come to the proof of Theorem 1.19. From the point of view of curvature, gluing together Riemannian manifolds is a delicate issue: in general, the glued Riemannian metric is only continuous, so the curvature tensor is not even defined.

Theorem 5.1

Let \((M,g)\) be a complete, n-dimensional Riemannian manifold with Ricci curvature bounded below by \(K\in {\mathbb {R}}\). Let \(Y\subset M\) be an open, bounded, convex subset with a smooth, compact boundary, equip it with the Riemannian distance d and volume measure \({\mathfrak {m}}\), and write \(X:={\overline{Y}}\). Then the 2-gluing of \((X,d,{\mathfrak {m}})\) along \(\partial Y\), denoted by \(({\hat{X}},{\hat{d}},{\hat{{\mathfrak {m}}}})\), is an \({\text {RCD}}^*(K,n)\) space.

Proof

First observe that the gluing of Riemannian manifolds yields a continuous Riemannian metric

$$\begin{aligned} {\hat{g}}(p) = {\left\{ \begin{array}{ll} g_+(p), &{}\text { if } p\in Y^+ \\ g_-(p), &{}\text { if } p\in Y^-, \end{array}\right. } \end{aligned}$$

whose Riemannian distance and volume measure are \(d_{{\hat{g}}}={\hat{d}}\) and \({\mathfrak {m}}_{{\hat{g}}}=2{\hat{{\mathfrak {m}}}}\) in terms of our metric gluing.

By convexity, the submanifold Y satisfies the same lower bound on the Ricci curvature. A result of Schlichting [22, 23] now ensures that there is a sequence of smooth Riemannian metrics \({\hat{g}}_\varepsilon \) on the glued manifold \({\hat{X}}\) converging to \({\hat{g}}\) uniformly as \(\varepsilon \rightarrow 0\) and such that

$$\begin{aligned} {\text {Ric}}_{{\hat{g}}_\varepsilon } \ge (K-\varepsilon ). \end{aligned}$$

Thus we get a sequence of smooth, compact metric measure spaces \(({\hat{X}},d_{{\hat{g}}_\varepsilon },{\mathfrak {m}}_{{\hat{g}}_\varepsilon })\) which satisfy the \({\text {RCD}}^*(K-\varepsilon ,n)\) condition. The stability of this condition under measured Gromov-Hausdorff convergence together with the convergence result in the following lemma completes the proof. \(\square \)

Lemma 5.2

Let \((g_\varepsilon )_{\varepsilon >0}\) be a sequence of smooth Riemannian metrics and g a continuous Riemannian metric on a compact, smooth manifold \({\mathfrak {M}}\). If \(g_\varepsilon \rightarrow g\) uniformly as \(\varepsilon \rightarrow 0\), then \(({\mathfrak {M}}, d_\varepsilon , {\mathfrak {m}}_\varepsilon ) \rightarrow ({\mathfrak {M}}, d, {\mathfrak {m}})\) in the measured Gromov-Hausdorff sense, where \(d_\varepsilon ,{\mathfrak {m}}_\varepsilon \) and \(d,{\mathfrak {m}}\) are the distance functions and volume measures induced by \(g_\varepsilon \) and g, respectively.

This result seems to be well known; we leave its straightforward proof to the reader.

Proof of Theorem 1.19

As a Riemannian manifold with lower Ricci curvature bound K, M is an \({\text {RCD}}^*(K,\infty )\) space. Being a convex subset, \({\overline{Y}}\), equipped with the restricted distance and measure, is an \({\text {RCD}}^*(K,\infty )\) space as well. Now Assumption 1.17 is satisfied by identification of the entropies in Lemma 3.13, since the doubling of the manifold is an \({\text {RCD}}^*(K,\infty )\) space by Theorem 5.1. \(\square \)