1 Introduction

Let \((X,d)\) be a complete and separable metric space and \(P_p(X)\) be the associated Wasserstein space of order \(p\ge 1\), i.e., the space of Borel probability measures on X with finite moment of order p, endowed with the (Kantorovitch–Rubinstein–)Wasserstein distance \(W_p\). In [8], Lisini proved that, for \(p>1\), any absolutely continuous curve \((\mu _t) \in {\mathcal{A}\mathcal{C}}^p(I:P_p(X))\) with finite p-energy over a compact time interval \(I \subset {\mathbb {R}}\) can be represented by a Borel probability measure \(\pi \) on continuous curves \((\gamma _t)\) in X, that is, \(\pi \in P(C(I:X))\), satisfying the following properties:

  (i)

    \(\pi \) is concentrated on \({\mathcal{A}\mathcal{C}}^p(I:X) \subset C(I:X)\);

  (ii)

    \((e_t)_\#\pi =\mu _t\) for all \(t\in I\), where \(e_t\) is the evaluation map defined by \(e_t(\gamma ) {:=}\gamma _t\);

  (iii)

    the metric derivative \(|{\dot{\mu }}_t|\) satisfies, for \({\mathcal {L}}^1\)-a.e. \(t\in I\),

    $$\begin{aligned} |{\dot{\mu }}_t|^p = \int |{\dot{\gamma }}_t|^p {\mathrm {\,d}}\pi (\gamma ). \end{aligned}$$
    (1.1)

Measures on a path space, like \(\pi \) above, are sometimes called path measures. Item (i) tells us that to characterize \((\mu _t)\), we can restrict our attention to a specific set of continuous curves, namely, absolutely continuous curves, or AC-curves for short. Item (ii) ensures that \(\pi \) has the desired time-marginals, in other words, that it is a lift of \((\mu _t)\) to the path space. This is also known as a superposition principle, since the curve of measures \((\mu _t)\) is obtained by superposing individual curves \((\gamma _t)\) in the underlying space. Finally, Item (iii) states that the metric speed \(|{\dot{\mu }}_t|\) in the p-Wasserstein space is obtained by averaging the metric speed of the characterizing curves in the base space with respect to \(\pi \). Equation (1.1) can be regarded as a minimality property of \(\pi \). Indeed, for general lifts satisfying (i)–(ii), one can expect only an inequality \((\le )\) in (1.1) (see [8, Theorem 4]). The minimal choice, which achieves equality, is in fact constructed using techniques of optimal transport. For Wasserstein geodesics, such lifts, often called optimal dynamical plans, were constructed in earlier work by Lott and Villani [10, Proposition 2.10 and E.6] for \(p=2\) in complete locally compact length spaces. In Lisini’s work [8, Theorem 5], where local compactness is no longer required, the lift is constructed for general absolutely continuous curves in p-Wasserstein spaces, \(p>1\), and is used in particular to characterize geodesics. Later, in [9], Lisini extended these results to the so-called Wasserstein–Orlicz distance, where the usual cost function \(d^p\) is replaced by a more general function \(\psi \) with suitable properties. This extension, however, does not cover the cost \(d^1\).

In this paper, we study the peculiar case \(p=1\), where the cost function \(d^p\) in the definition of the Wasserstein distance loses its strict convexity. We first provide a simple example (see Example 1.1 below) in which an absolutely continuous curve in a 1-Wasserstein space cannot be lifted to a measure \(\pi \) on continuous curves. Nonetheless, we show that a similar superposition result still holds if we relax the notion of lift. More precisely, we need to consider a larger class of curves, namely, curves of bounded variation, or BV-curves for short (see also Example 3.5).

When considering the case \(p>1\), it is well known that the space of absolutely continuous curves with finite p-energy is closely connected to the Sobolev space of order 1 with finite p-norm via the following “identification-inclusion” relationship

$$\begin{aligned} W^{1,p}(I:X) \simeq {\mathcal{A}\mathcal{C}}^p(I:X) \subset C(I:X), \end{aligned}$$
(1.2)

which succinctly indicates that every Sobolev curve can be identified with an absolutely continuous representative. Additionally, the absolutely continuous curves form a Borel subset of the space of continuous curves, which is a Polish space when equipped with the topology of uniform convergence. In the present paper, where we study the case \(p=1\), these are replaced by

$$\begin{aligned} BV(I:X) \simeq {\mathcal{B}\mathcal{V}}(I:X) \subset D (I:X). \end{aligned}$$
(1.3)

Here BV(I : X) denotes the space of all BV-curves. As an analogue of (1.2), every BV-curve can be identified, through a Borel selection map, with a Càdlàg (right-continuous with left limits) curve of bounded variation. The space of such curves is denoted by \({\mathcal{B}\mathcal{V}}(I:X)\), which is a Borel subset of the larger space of all Càdlàg curves, denoted by D(I : X). The space D(I : X) can be equipped with a specific metric that turns it into a Polish space, known as the Skorokhod space. It is worth mentioning that, restricted to C(I : X), the Skorokhod topology is exactly the topology of uniform convergence. In short, up to the choice of a representative, we view BV-curves as a Borel subset of the Skorokhod space.

Even though the metric derivative of a BV-curve \(u\in BV(I:X)\) exists almost everywhere, as it does for AC-curves, it does not completely capture its “speed.” The natural replacement for the metric derivative in this situation is the so-called total variation measure \(|Du| \in {\mathcal {M}}(I)\), which also takes into account the singular part of the speed, in particular the jumps of the curve. Here \({\mathcal {M}}(I)\) is the set of all positive measures over I, and we use \({\mathcal {L}}^n\) to denote the n-dimensional Lebesgue measure.

Main result. In Theorem 3.3, we prove that any \((\mu _t)\in {\mathcal{B}\mathcal{V}}(I:P_1(X))\) can be represented by a Borel probability measure \(\tilde{\pi }\) on Càdlàg curves in X, that is, \(\tilde{\pi }\in P(D(I:X))\), which satisfies the following properties:

  (i)

    \(\tilde{\pi }\) is concentrated on \({\mathcal{B}\mathcal{V}}(I:X)\subset D(I:X)\);

  (ii)

    \((e_t)_\#\tilde{\pi }=\mu _t\) for all \(t\in I\);

  (iii)

    the total variation measure \(|D\mu |\in {\mathcal {M}}(I)\) of \((\mu _t)\) satisfies

    $$\begin{aligned} |D\mu |=\int |D\gamma |{\mathrm {\,d}}\tilde{\pi }(\gamma ). \end{aligned}$$
    (1.4)

Moreover, the absolutely continuous part \(|{\dot{\mu }}|{\mathcal {L}}^1\) in the Lebesgue(–Radon–Nikodym) decomposition of the measure \(|D\mu |\), given by the metric derivative (see discussion in Sect. 2.2), satisfies

$$\begin{aligned} |{\dot{\mu }}_t| = \lim _{h \rightarrow 0 } \int \frac{ d (\gamma _{t}, \gamma _{t+h})}{|h|} {\mathrm {\,d}}\tilde{\pi }(\gamma ) \end{aligned}$$
(1.5)

for \({\mathcal {L}}^1\)-a.e. \(t\in I\). In particular, if \((\mu _t) \in {\mathcal{A}\mathcal{C}}^1(I:P_1(X))\), then \(|D\mu |=|{\dot{\mu }}|{\mathcal {L}}^1\) and (1.5) characterizes the metric speed \(|{\dot{\mu }}_t|\).

Equation (1.4) is interpreted as an equality of measures, i.e., for any non-negative Borel function \(f:I \rightarrow [0,\infty )\), we have \(\int _I f(t){\mathrm {\,d}}|D\mu |(t)=\int \int _I f(t){\mathrm {\,d}}|D\gamma |(t){\mathrm {\,d}}\tilde{\pi }(\gamma )\). Theorem 3.1, which we prove first, indicates that (1.4) can be viewed as an optimality condition among all lifts of \((\mu _t)\), as in the case \(p>1\). To construct \(\tilde{\pi }\), we use optimal mass transport as in [8], with modifications for BV-curves.

A motivating example. Here we present an elementary example of an absolutely continuous curve in the 1-Wasserstein space over \({\mathbb {R}}\) that admits no lift on continuous curves (also discussed briefly in [9, Remark 3.2]). We nevertheless construct a lift on discontinuous BV-curves, which provides a first insight into our results.

Example 1.1

Consider a curve of probability measures on \({\mathbb {R}}\) defined as

$$\begin{aligned} \mu _t {:=}(1-t)\delta _0+t\delta _1, \quad t \in I = [0,1]. \end{aligned}$$
(1.6)

This is a basic situation where the mass is “teleported” from 0 to 1, but not continuously “transported,” as shown in Fig. 1 (left). First of all, observe that for any \(t,s\in I\),

$$\begin{aligned} W_p^p (\mu _t, \mu _s) = |t-s| W_p^p (\mu _1, \mu _0) = |t-s|, \end{aligned}$$

and thus, the metric derivative in p-Wasserstein space,

$$\begin{aligned} |{\dot{\mu }}_t|= \lim _{h \rightarrow 0 } \frac{W_p(\mu _{t+h}, \mu _t)}{|h|} = \lim _{h \rightarrow 0 } \frac{|h|^{1/p}}{|h|}, \end{aligned}$$

exists only for \(p=1\). Therefore, \((\mu _t) \notin {\mathcal{A}\mathcal{C}}^p(I:P_p({\mathbb {R}})) \) for all \(p\in (1, \infty )\). Nevertheless, \((\mu _t) \in {\mathcal{A}\mathcal{C}}^1(I:P_1({\mathbb {R}})) \), and it is even a constant-speed geodesic in the 1-Wasserstein space between \(\delta _0\) and \(\delta _1\).
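To make the scaling \(|h|^{1/p}/|h|\) concrete, here is a minimal Python sketch, assuming finitely supported measures on \({\mathbb {R}}\) and the standard fact that on the real line the monotone (quantile) coupling is optimal for every \(p\ge 1\), so that \(W_p^p(\mu ,\nu )=\int _0^1|F_\mu ^{-1}(u)-F_\nu ^{-1}(u)|^p{\mathrm {\,d}}u\). The helper names `quantile`, `wasserstein_p` and `mu` are hypothetical illustrations, not taken from the paper. The sketch recovers \(W_p(\mu _t,\mu _s)=|t-s|^{1/p}\) for the curve (1.6).

```python
def quantile(atoms, u):
    """Left-continuous quantile F^{-1}(u) = min{x : F(x) >= u} of a finitely
    supported measure on R, given as a list of (position, mass) pairs."""
    cum = 0.0
    for x, m in sorted(atoms):
        cum += m
        if u <= cum + 1e-15:  # tolerance absorbs rounding in the cumulative sums
            return x
    return max(x for x, _ in atoms)


def wasserstein_p(mu, nu, p):
    """W_p(mu, nu) on R via the monotone (quantile) coupling:
    W_p^p = integral over (0, 1) of |F_mu^{-1}(u) - F_nu^{-1}(u)|^p du."""
    # Both quantile functions are constant between consecutive cumulative
    # masses, so the integral reduces to a finite sum over these pieces.
    breaks = {0.0, 1.0}
    for atoms in (mu, nu):
        cum = 0.0
        for _, m in sorted(atoms):
            cum += m
            breaks.add(min(cum, 1.0))
    breaks = sorted(breaks)
    total = 0.0
    for lo, hi in zip(breaks, breaks[1:]):
        mid = 0.5 * (lo + hi)
        total += (hi - lo) * abs(quantile(mu, mid) - quantile(nu, mid)) ** p
    return total ** (1.0 / p)


def mu(t):
    """The curve (1.6): mass 1 - t at 0 and mass t at 1."""
    return [(0.0, 1.0 - t), (1.0, t)]


# Difference quotients W_p(mu_t, mu_{t+h}) / h at t = 1/2: roughly 1 for p = 1,
# roughly h^(-1/2) for p = 2.
for p in (1, 2):
    for h in (1e-2, 1e-4):
        print(p, h, wasserstein_p(mu(0.5), mu(0.5 + h), p) / h)
```

The quotient stays bounded only for \(p=1\), matching the observation that the metric derivative exists only in that case.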

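It is clear that there is no measure \(\pi \) on the set of continuous curves, i.e., no measure in \(P(C(I:{\mathbb {R}}))\), such that \(\mu _t = (e_t)_\# \pi \) for all \(t \in I\). Indeed, since each \(\mu _t\) is concentrated on \(\{0,1\}\), \(\pi \)-a.e. curve would take values in \(\{0,1\}\) at all rational times, hence, by continuity, at all times, and would therefore be constant; this would force \(\mu _t=\mu _0=\delta _0\) for all \(t\in I\), a contradiction. However, we do claim that there exists a measure \(\pi \in P(D(I:{\mathbb {R}}))\) concentrated on the set of \({\mathcal{B}\mathcal{V}}\)-curves such that \(\mu _t = (e_t)_\# \pi \) for all \(t \in I\), and moreover it enjoys the optimality property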

$$\begin{aligned} \int _{a}^{b} |{\dot{\mu }}_t| {\mathrm {\,d}}t = \int |D\gamma |([a,b]) {\mathrm {\,d}}\pi (\gamma ) \end{aligned}$$
(1.7)

for any \(a,b \in I\) with \(a<b\). Comparing with (1.4), we point out that the left-hand side of the equation above is nothing but \(|D\mu |([a,b])\) since \((\mu _t)\) in this simple example is absolutely continuous.

To construct \(\pi \), let us label the particles standing at position \(x=0\) at time 0 with a real parameter \(\alpha \in [0,1]\). These particles gradually jump to position \(x=1\), and since the rate of mass discharge is constant, we expect the jumps to happen uniformly in time. Let the particle with label \(\alpha \) jump from 0 to 1 at time \(\alpha \). Its path is then expressed using the indicator function as follows

$$\begin{aligned} t \mapsto \gamma _t^{(\alpha )} {:=}\mathbbm {1}_{[\alpha , 1]}(t). \end{aligned}$$
(1.8)

Some sample paths are plotted in Fig. 1 (right). Now, we consider a uniform measure over \(\alpha \) (as jumps happen uniformly) and consequently construct a path measure \(\pi \) by

$$\begin{aligned} \pi {:=}(\gamma ^{(\cdot )})_\# {\mathcal {L}}^1 |_{[0,1]}. \end{aligned}$$
(1.9)

We now show that \(\pi \) has the desired time-marginals and also satisfies (1.7). As for the first claim, notice that for any Borel set \(B \in {\mathcal {B}}({\mathbb {R}})\), we can write

$$\begin{aligned} (e_t)_\# \pi (B)&= \int \mathbbm {1}_{B} (e_t(\gamma )) {\mathrm {\,d}}\pi (\gamma ) = \int _{0}^1 \mathbbm {1}_{B} (\gamma _t^{(\alpha )}) {\mathrm {\,d}}\alpha \\&= \int _{0}^1 \mathbbm {1}_{B} (\mathbbm {1}_{[\alpha , 1]}(t)) {\mathrm {\,d}}\alpha = \int _{0}^1 \mathbbm {1}_{B} (\mathbbm {1}_{[0, t]}(\alpha )) {\mathrm {\,d}}\alpha \\&= (1-t) \delta _0 (B) + t \delta _1(B) = \mu _t(B) \end{aligned}$$

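where we first used the definition of the push-forward and then substituted (1.9) and (1.8). As for the second claim (1.7), we start from the right-hand side and use that, for \(\alpha \in (0,1]\), the path \(\gamma ^{(\alpha )}\) has a single jump of size 1 at time \(\alpha \), so that \(|D\gamma ^{(\alpha )}|=\delta _\alpha \):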

$$\begin{aligned} \int |D\gamma |([a,b]) {\mathrm {\,d}}\pi (\gamma )&= \int _{0}^{1} |D\gamma ^{(\alpha )} |([a,b]) {\mathrm {\,d}}\alpha \\&= \int _{0}^{1} \delta _\alpha ([a,b]) {\mathrm {\,d}}\alpha = |b-a| = \int _{a}^{b} |{\dot{\mu }}_t| {\mathrm {\,d}}t, \end{aligned}$$

which proves the claim.
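Both computations can be checked numerically. The sketch below is a hypothetical discretization, not part of the paper: it replaces the uniform law of \(\alpha \) in (1.9) by N equally spaced labels \(\alpha _k=(k+1/2)/N\), evaluates the jump paths (1.8), and compares the time-marginal mass at 1 with t, the averaged variation over [a, b] with \(b-a\) as in (1.7), and the averaged difference quotient from (1.5) with \(|{\dot{\mu }}_t|=1\), all up to an \(O(1/N)\) discretization error.

```python
def gamma(alpha, t):
    """Jump path (1.8): 0 before time alpha, 1 from time alpha on (t in [0, 1])."""
    return 1.0 if alpha <= t else 0.0


def labels(n):
    """Midpoint discretization of the uniform law of the label alpha on [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]


def marginal_mass_at_one(t, n):
    """Approximates (e_t)_# pi({1}), which should equal t."""
    return sum(gamma(a, t) for a in labels(n)) / n


def mean_variation(a, b, n):
    """Approximates the right-hand side of (1.7): |D gamma^(alpha)|([a, b]) is 1
    exactly when the single jump time alpha lies in [a, b]."""
    return sum(1.0 for al in labels(n) if a <= al <= b) / n


def mean_speed(t, h, n):
    """Approximates the integral in (1.5): the average of |gamma_{t+h} - gamma_t| / h."""
    return sum(abs(gamma(a, t + h) - gamma(a, t)) for a in labels(n)) / (n * h)
```

For instance, with N = 100000 the three quantities at t = 0.3, [a, b] = [0.2, 0.7] and (t, h) = (0.4, 0.01) come out close to 0.3, 0.5 and 1, respectively.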

As already mentioned, \((\mu _t)\) here is a constant-speed geodesic connecting \(\delta _0\) to \(\delta _1\). In fact, there are infinitely many constant-speed \(W_1\)-geodesics between \(\delta _0\) and \(\delta _1\). In Example 4.7, we present a fairly general way to construct different geodesics.

Fig. 1

(Example 1.1) Left: the measure (1.6) at times \(s=1/5\) and \(t=2/5\). Right: sample curves \(\gamma ^{(\alpha )}\) in the lift \(\pi \) whose jumps occur at times \(\alpha \in [s,t]\); the curve with \(\alpha = (s+t)/2\) is highlighted.

Applications. As a direct application of Theorems 3.1 and 3.3, we characterize BV-curves in 1-Wasserstein spaces. Furthermore, we characterize what we call BV-geodesics, i.e., variation-minimizing curves, in 1-Wasserstein spaces. Using the understanding, provided by Theorem 3.3, of the continuity and metric derivatives of Wasserstein curves of bounded variation, we then single out continuous length-minimizing geodesics and constant-speed geodesics among all BV-geodesics. We also discuss why characterizing absolutely continuous curves in 1-Wasserstein spaces via their lifts remains challenging.

As seen in Example 1.1, superposing discontinuous curves may produce a continuous curve in the 1-Wasserstein space. On the other hand, superposing continuous curves always leads to a continuous Wasserstein curve. We investigate the relation between the regularity of curves at the level of the base space and at the level of the Wasserstein space. The observations can be summarized as follows: superposition can only increase regularity; in other words, irregularities may average out under superposition, see Table 1.

Table 1 Curves \((\mu _t)\) in 1-Wasserstein spaces versus curves \(\gamma \) in their lifts as in Theorem 3.3

Finally, we study the continuity equation in a discrete setting. More precisely, using the lift from Theorem 3.3, we show that for any absolutely continuous curve \((\mu _t)\) living in a bounded subset of a (topologically) discrete metric space, there exists \((v_t)\), a suitable discrete analogue of a time-dependent vector field, such that \((\mu _t,v_t)\) satisfies the discrete continuity (or current) equation. We conclude with a discussion of a discrete Benamou–Brenier formula and of the challenges that arise if one is interested not only in the metric structure of the discrete space but also in an additional graph structure.

Organisation of the paper. The remainder of the paper is structured as follows:

  • In Sect. 2, we provide preliminary concepts concerning BV-curves in metric spaces and the Skorokhod space. Additionally, we prove equivalent definitions of BV-curves in Theorem 2.17. This result, which we could not readily find in the literature, makes it more convenient to work with BV-curves in different situations.

  • In Sect. 3, we present and prove the main results, Theorems 3.1 and 3.3. We then provide some examples to shed light on the main results.

  • In Sect. 4, we apply the main theorems to characterize BV-curves and geodesics, to better understand the regularity of curves under superposition, and finally to study the continuity equation in a discrete setting.

2 Preliminaries

2.1 Summary of main notation

Throughout the paper, we consider a complete and separable metric space \((X,d)\) and a compact time interval \(I \subset {\mathbb {R}}\). Without loss of generality, we fix \(I=[0,1]\). The two main path spaces are the space of continuous paths C(I : X), equipped with the topology of uniform convergence, and the space of Càdlàg paths D(I : X), equipped with the Skorokhod topology. For \(p\ge 1\), \(W^{1,p}(I:X)\) denotes the Sobolev space and \({\mathcal{A}\mathcal{C}}^p(I:X)\) denotes the set of absolutely continuous curves with finite p-energy (recall the relationship (1.2) for \(p>1\)). BV(I : X) is the set of curves of bounded variation, and \({\mathcal{B}\mathcal{V}}(I:X)\) is the set of Càdlàg curves of bounded variation (recall the relationship (1.3) and see Definitions 2.2 and 2.4). \({\mathcal {M}}(I)\) is the set of all positive measures over I and \({\mathcal {L}}^n\) is the n-dimensional Lebesgue measure.

For \(t\in I\), \(u\mapsto e_t (u) {:=}u_t\) is the evaluation map, where \(u:I\rightarrow X\) is a path. The variation measure of a BV-curve u is denoted by |Du|, and the density of its absolutely continuous part with respect to the Lebesgue measure at a point t is denoted by \(|\dot{u}| (t)\). If it exists, the metric derivative of a curve u at time t is denoted by \(|\dot{u}_t|\) (see Definition 2.3 and Eq. (2.4)). In fact, \(|\dot{u}| (t) = |\dot{u}_t|\) for \({\mathcal {L}}^1\)-a.e. \(t\in I\), as stated in Lemma 2.8.

The \(\sigma \)-algebra of Borel sets of X is denoted by \({\mathcal {B}}(X)\). We denote by P(X) the space of Borel probability measures on X, and by \(P_p(X) \subset P(X)\), \(p\ge 1\), its subset of measures with finite p-th moment. The space \(P_p(X)\) is endowed with the (Kantorovitch–Rubinstein–)Wasserstein metric \(W_p\). Given a map \(T: X \rightarrow Y\) between two measurable spaces and a probability measure \(\mu \in P(X)\), the push-forward measure (or image measure) is denoted by \(T_{\#} \mu \in P(Y)\).

2.2 BV-curves in metric spaces

In this subsection, we recall basic definitions and notions related to curves of bounded variation. \(L^1(I:X)\) denotes the space of all maps \(u :I \rightarrow X\) such that \(\int _I d(u (t), \bar{x}) {\mathrm {\,d}}t < \infty \) for some (and thus every) \(\bar{x} \in X\).

Definition 2.1

(Variation) The pointwise variation of a function \(u: I\rightarrow X\) on any subset \(J \subset I\) is defined as

$$\begin{aligned} \textrm{Var}(u; J){:=}\sup \left\{ \sum _{i=0}^k d(u(t_i), u (t_{i + 1})) \,\Big |\, k\in {\mathbb {N}},\ t_0<\cdots < t_{k + 1},\ \{ t_j\}_{ 0 \le j \le k + 1} \subset J \right\} , \end{aligned}$$

and its essential variation is defined as

$$\begin{aligned} \mathrm {ess\, Var}(u; J) {:=}\inf \left\{ \textrm{Var}(v; J) |u = v \text { a.e. on } J \right\} . \end{aligned}$$
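For a finite set J, the supremum in Definition 2.1 is attained by the partition that uses all points of J in increasing order, since by the triangle inequality inserting points into a partition never decreases the sum. The hypothetical helper below, written for an arbitrary metric d, uses this to evaluate \(\textrm{Var}(u;J)\) on finite grids. It is only a sketch for experimentation: grid values approximate \(\textrm{Var}(u;I)\) from below.

```python
import math
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")


def pointwise_variation(u: Callable[[float], P], J: Iterable[float],
                        d: Callable[[P, P], float]) -> float:
    """Var(u; J) from Definition 2.1 for a FINITE set J of times.

    By the triangle inequality, inserting points into a partition never
    decreases the sum, so the supremum uses all points of J in increasing order.
    """
    times = sorted(set(J))
    return sum(d(u(s), u(t)) for s, t in zip(times, times[1:]))


def step(t: float) -> float:
    """Real-valued example: a single jump of size 1 at t = 0.37."""
    return 1.0 if t >= 0.37 else 0.0


def circle(t: float) -> tuple:
    """Curve in X = R^2 running once around the unit circle (length 2*pi)."""
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))


grid = [k / 1000 for k in range(1001)]
```

On this grid the step function has variation 1, and the chord sum of the circle is slightly below its length \(2\pi \), increasing as the grid is refined.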

Definition 2.2

(BV-curves) We call \(u \in L^1(I:X)\) a curve of bounded variation, or shortly a BV-curve, if

$$\begin{aligned} \mathrm {ess\, Var}(u) {:=}\mathrm {ess\, Var}(u; I)< \infty . \end{aligned}$$

We use BV(I : X) to denote the space of all BV-curves.

For a non-decreasing function \(f :I \rightarrow {\mathbb {R}}\), we can define its variation measure as the Lebesgue–Stieltjes measure |Df| given by (see [14, Sect. 6.3.3])

$$\begin{aligned} |Df| ((a, b)) = f (b^-) - f (a^+). \end{aligned}$$

Using this, we can generalize the notion of variation measure to general metric spaces:

Definition 2.3

(Variation measure) Let \(u\in BV(I:X)\). The variation measure of u is defined as the Lebesgue–Stieltjes measure |Du| induced by the non-decreasing function \(V :I \rightarrow {\mathbb {R}}\) defined as \(V(t) {:=}\mathrm {ess\, Var}(u; (0, t))\).

By Lebesgue(–Radon–Nikodym) decomposition, the variation measure can be written as

$$\begin{aligned} |Du| = |\dot{u}| {\mathcal {L}}^1+ |Du|^{C} + |Du|^{J}, \end{aligned}$$
(2.1)

where \(|\dot{u}|\) denotes the Radon–Nikodym derivative of the variation measure with respect to the Lebesgue measure, \(|Du|^{J}\) is the purely atomic part, or “jump part,” and the remaining term \(|Du|^{C}\) is the continuous singular part, or “Cantor part.” The density \(|\dot{u}|(t)\) coincides almost everywhere with the metric derivative \(|\dot{u}_t|\), hence the notation (see Lemma 2.8 at the end of this subsection). In this paper, we reserve the term metric speed for the metric derivative of absolutely continuous curves only. We do not claim, however, that this is common practice in the literature.
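For a real-valued curve that is piecewise \(C^1\) with finitely many jumps, there is no Cantor part, and (2.1) reduces to \(|Du|((a,b))=\int _a^b|u'(t)|{\mathrm {\,d}}t+\sum _{\tau \in (a,b)}|u(\tau ^+)-u(\tau ^-)|\). The following sketch, with hypothetical names and a midpoint rule for the absolutely continuous part, computes both parts for the illustrative curve \(u(t)=t+\mathbbm {1}_{[1/2,1]}(t)\), whose variation measure is \({\mathcal {L}}^1|_I+\delta _{1/2}\).

```python
def variation_measure(a, b, speed, jumps, n=100_000):
    """(absolutely continuous part, jump part) of |Du|((a, b)) for a piecewise
    C^1 real curve with finitely many jumps (so the Cantor part vanishes).

    speed: the density |u'| on the smooth pieces.
    jumps: dict mapping jump times tau to jump sizes |u(tau+) - u(tau-)|.
    """
    h = (b - a) / n
    ac_part = h * sum(speed(a + (k + 0.5) * h) for k in range(n))  # midpoint rule
    jump_part = sum(size for tau, size in jumps.items() if a < tau < b)
    return ac_part, jump_part


def u(t):
    """u(t) = t + 1_{[1/2, 1]}(t): Lebesgue density 1 and one jump of size 1 at t = 1/2."""
    return t + (1.0 if t >= 0.5 else 0.0)
```

On (0, 1) both parts equal 1, so \(|Du|((0,1))=2\), which agrees with the pointwise variation of u computed on a fine grid; on (0.6, 0.9) only the absolutely continuous part 0.3 remains.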

Definition 2.4

(Càdlàg curves) A curve \(u:I\rightarrow X\) is called a Càdlàg curve if it is right-continuous with left-limits, i.e., for every \(t \in I \) we have that

$$\begin{aligned} u(t)=\lim _{s\searrow t}u(s), \, \exists \lim _{s\nearrow t}u(s). \end{aligned}$$

We use D(I : X) to denote the set of all Càdlàg curves and \({\mathcal{B}\mathcal{V}}(I:X)\) to denote the set of Càdlàg curves of bounded variation.

The reason for introducing Càdlàg curves here is that any BV-curve admits a Càdlàg representative, i.e., a Càdlàg curve that coincides with it \({\mathcal {L}}^1\)-a.e. This is stated in the lemma below:

Lemma 2.5

(Càdlàg-representation of BV-curves)

  (1)

    Let \(u\in D(I:X)\). For any \(0\le a \le b\le 1\),

    $$\begin{aligned} \textrm{Var}(u;(a,b))= \mathrm {ess\, Var}(u;(a,b))=|Du|((a,b)) \end{aligned}$$
    (2.2)

    and if u is left-continuous at \(t=1\), \(\textrm{Var}(u;I)=\mathrm {ess\, Var}(u;I)\).

  (2)

    Any \(u\in BV(I:X)\) admits a representative \({\tilde{u}}\in D(I: X)\) such that

    $$\begin{aligned} \mathrm {ess\, Var}(u) = \textrm{Var}({\tilde{u}}). \end{aligned}$$

Proof

It suffices to consider the nontrivial case when u has finite variation. Since a Càdlàg representative, if it exists, is unique (up to its value at \(t=1\)), we only need to show that any BV-curve has a Càdlàg representative satisfying (2.2).

Let \(u\in BV(I:X)\). By definition, there is a sequence of curves \(u_n:I\rightarrow X\) such that \(u_n=u\) a.e. and \(\textrm{Var}(u_n)\searrow \mathrm {ess\, Var}(u)\), with \(\textrm{Var}(u_n)<\infty \). For each n, the function \(t \mapsto \textrm{Var}(u_n; (0, t))\) is non-decreasing, so it has left and right limits at each \(t \in (0, 1)\) and is continuous at all \(t \in I {\setminus } N_n\), where \(N_n\) is at most countable. Since \(d(u_n(s),u_n(r))\le \textrm{Var}(u_n;[s,r])\) for \(s<r\), the completeness of \((X,d)\) implies that \(u_n\) also has left and right limits at each \(t\in (0, 1)\) and is continuous at all \(t \in I \setminus N_n\). As the curves \(u_n\) coincide a.e., at each \(t \in I\) the right limit \(u_n (t^+)\) is the same for all n; we denote it by \({\tilde{u}} (t)\). Clearly, \(u_n={\tilde{u}}\) on \(I \setminus \cup _n N_n\) for all n, so \({\tilde{u}}\) is a representative of u.

Notice that if a function has right limits everywhere, then its right-limit function is right-continuous. Therefore, \({\tilde{u}}\) is right-continuous on [0, 1), and the existence of its left limits follows once we show that \(\textrm{Var}({\tilde{u}})<\infty \). As for its variation, given any \(\varepsilon > 0\) and any partition \(0 \le t_0< \cdots < t_{k + 1} \le 1\), we can show for any \(n \in {\mathbb {N}}\),

$$\begin{aligned} \sum _{i = 0}^k d ({\tilde{u}} (t_i), {\tilde{u}} (t_{i + 1}))&= \sum _{i =0}^k d (u_n (t_i^+), u_n (t_{i + 1}^+))\nonumber \\&\le \sum _{i = 0}^k \left( d (u_n (t_i + r_n), u_n (t_{i + 1}+r_n))+ \frac{\varepsilon }{2^i}\right) \nonumber \\&\le \textrm{Var}(u_n) + 2\varepsilon , \end{aligned}$$
(2.3)

where \(0<r_n \ll 1 - t_k\), and we set \(u_n (t) {:=}u_n(1^-)\) whenever \(t\ge 1\) (in fact, without loss of generality, each \(u_n\) can be chosen to be left-continuous at \(t= 1\), as this never increases \(\textrm{Var}(u_n)\)). Taking the supremum over all partitions and letting \(\varepsilon \) tend to 0, we conclude

$$\begin{aligned} \textrm{Var}({\tilde{u}})\le \liminf _{n \rightarrow \infty }\textrm{Var}(u_n)=\mathrm {ess\, Var}(u). \end{aligned}$$

The case of a general subinterval \((a,b)\subset I\) follows by the same argument. Finally, by the definition of the variation measure,

$$\begin{aligned} |Du|((a,b))&=\lim _{r\searrow 0} \left[ \mathrm {ess\, Var}(u;(0,b-r))-\mathrm {ess\, Var}(u;(0,a+r)) \right] \\&=\lim _{r\searrow 0}\left[ \textrm{Var}({\tilde{u}};(0,b-r))-\textrm{Var}({\tilde{u}};(0,a+r)) \right] \\&= \lim _{r\searrow 0}\textrm{Var}({\tilde{u}};[a+r,b - r))= \textrm{Var}({\tilde{u}};(a,b)) \end{aligned}$$

where the last equality follows directly from the definition of the pointwise variation. \(\square \)
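Lemma 2.5 can be illustrated on a hypothetical real-valued example not taken from the paper: the step \(\mathbbm {1}_{[1/2,1]}\) altered at the single time 1/4 to take the value 5. Changing a curve on the null set \(\{1/4\}\) does not change its \(L^1\)-class, so its essential variation is 1, whereas its pointwise variation is \(5+5+1=11\). Taking right limits, as in the proof, removes the spike. The right limit below is a crude numerical stand-in (evaluation at \(t+\varepsilon \)), an assumption that is adequate only for piecewise-constant examples.

```python
def spiky(t):
    """1_{[1/2, 1]} altered on the Lebesgue-null set {1/4} (value 5 there)."""
    if t == 0.25:
        return 5.0
    return 1.0 if t >= 0.5 else 0.0


def cadlag_representative(u, eps=1e-9):
    """Right-limit representative t -> u(t+), approximated by u(t + eps).

    Crude numerical stand-in for the construction in the proof of Lemma 2.5;
    only adequate for piecewise-constant examples whose special times are
    farther apart than eps. The value at t = 1 is kept.
    """
    return lambda t: u(t + eps) if t < 1.0 else u(1.0)


def variation_on_grid(u, grid):
    """Pointwise variation of u restricted to a finite set of times."""
    times = sorted(grid)
    return sum(abs(u(t) - u(s)) for s, t in zip(times, times[1:]))


grid = [k / 8 for k in range(9)]  # contains 1/4 and 1/2 exactly
```

On this grid, the spiky curve has variation 11 while its Càdlàg representative has variation 1, which equals the essential variation, in line with Lemma 2.5 (2).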

Remark 2.6

Given \(E\subset I\) of full measure, i.e., \({\mathcal {L}}^1(E)=1\), if a function u defined on E has finite pointwise variation, then u can be extended to I with \(\textrm{Var}(u;E)=\textrm{Var}(u;I)\).

Indeed, arguing as in Lemma 2.5, at each \(t\in I\)

$$\begin{aligned} \lim _{E\ni \tau \searrow t}u(\tau ) \end{aligned}$$

exists, and for \(t\in I\setminus E\) we define u(t) as this limit. The extended u is right-continuous at every point of \(I\setminus E\), so the extension does not increase the variation.

Lemma 2.7

The function \(\mathrm {ess\, Var}:L^1(I:X)\rightarrow [0,+\infty ]\) is lower semi-continuous.

Proof

Let \((u_n)_{n\in {\mathbb {N}}}\subset L^1(I:X)\) be a sequence converging to \(u\in L^1(I:X)\), that is

$$\begin{aligned} \int _{I}d(u_n(t),u(t)){\mathrm {\,d}}t\rightarrow 0,\quad n\rightarrow \infty . \end{aligned}$$

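Without loss of generality, we may assume that \(\mathrm {ess\, Var}(u_n)<\infty \) for all \(n\in {\mathbb {N}}\) and, after passing to a subsequence, that \(\mathrm {ess\, Var}(u_n)\) converges to the \(\liminf \) of the original sequence. By Lemma 2.5, we may replace each \(u_n\) by its Càdlàg representative, so that \(\textrm{Var}(u_n)=\mathrm {ess\, Var}(u_n)\). Since \(L^1\)-convergence implies almost everywhere convergence along a further subsequence, \(u_n(t)\) converges to u(t) for all t in some set \(E\subset I\) of full measure; as every sum in Definition 2.1 with points in E passes to the limit, this yields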

$$\begin{aligned} \textrm{Var}(u;E)\le \liminf _{n\rightarrow \infty }\textrm{Var}(u_n)=\liminf _{n\rightarrow \infty }\mathrm {ess\, Var}(u_n). \end{aligned}$$

By Remark 2.6, \(\mathrm {ess\, Var}(u)\) is bounded by \(\textrm{Var}(u;E)\) and hence \(\mathrm {ess\, Var}\) is lower semi-continuous. \(\square \)
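Lower semi-continuity cannot in general be improved to continuity. In the following hypothetical real-valued example, \(u_n(t)=\sin (2\pi nt)/n\) converges to 0 in \(L^1(I:{\mathbb {R}})\) (indeed uniformly, since \(|u_n|\le 1/n\)), while each \(u_n\), being continuous, satisfies \(\mathrm {ess\, Var}(u_n)=\textrm{Var}(u_n)=4\) by Lemma 2.5 (1): on each of its n periods, \(\sin (2\pi nt)\) runs \(0\rightarrow 1\rightarrow -1\rightarrow 0\), of variation 4, and the factor 1/n compensates for the n periods. The sketch checks this numerically.

```python
import math


def u_n(n):
    """The curve t -> sin(2*pi*n*t) / n in X = R."""
    return lambda t: math.sin(2 * math.pi * n * t) / n


def grid_variation(u, m=4000):
    """Pointwise variation of u on the uniform grid {k/m} of I = [0, 1].

    For u_n with 4n dividing m the extrema lie on the grid, so this equals Var(u_n; I).
    """
    ts = [k / m for k in range(m + 1)]
    return sum(abs(u(t) - u(s)) for s, t in zip(ts, ts[1:]))


def l1_norm(u, m=4000):
    """Midpoint-rule approximation of the L^1 distance between u and the zero curve."""
    return sum(abs(u((k + 0.5) / m)) for k in range(m)) / m
```

The \(L^1\) distances \(2/(\pi n)\) tend to 0, while the variations stay equal to 4, so the inequality in Lemma 2.7 is strict here (\(0<4\)).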

To end this subsection, we briefly comment on the relation between the variation measure and the metric derivative. Recall that the metric derivative of \(u:I\rightarrow X\) at time t is defined by

$$\begin{aligned} |\dot{u}_t| {:=}\lim _{h\rightarrow 0}\frac{d(u(t),u(t+h))}{|h|} \end{aligned}$$
(2.4)

whenever the above limit exists.

Lemma 2.8

Let \(u\in {\mathcal{B}\mathcal{V}}(I:X)\). Then for \({\mathcal {L}}^1\)-a.e. \(t \in I\), the metric derivative \(|\dot{u}_t|\) exists and

$$\begin{aligned} |\dot{u}_t|=|\dot{u}|(t)=\lim _{h\rightarrow 0}\frac{|Du|([t,t+h])}{|h|}, \end{aligned}$$

where \(|\dot{u}|\) is the density in the decomposition (2.1).

Proof

It is known, e.g., from [1, Theorem 2.2], that the metric derivative \(|\dot{u}_t|\) exists almost everywhere and equals the density \(|\dot{u}|(t)\). The second equality is a general fact about measures on \({\mathbb {R}}\), by the following argument. Assume that \(\mu \) is a locally finite measure on \({\mathbb {R}}\) with the decomposition \(\mu =\rho {\mathcal {L}}^1+\mu ^s\), where \(\mu ^s\perp {\mathcal {L}}^1\). By the Lebesgue differentiation theorem, it suffices to show

$$\begin{aligned} \lim _{h\rightarrow 0}\frac{\mu ^s([t,t+h])}{|h|}=0\quad \text {for } {\mathcal {L}}^1\text {-a.e. }t \in I. \end{aligned}$$

Otherwise, there exist a Borel set \(T\subset {\mathbb {R}}\) and some \(c>0\) such that \({\mathcal {L}}^1(T)>0\) and

$$\begin{aligned} \limsup _{h\rightarrow 0}\frac{\mu ^s([t,t+h])}{|h|}>c,\quad \forall t\in T. \end{aligned}$$

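Since \(\mu ^s\perp {\mathcal {L}}^1\), there is a Lebesgue-null Borel set N on which \(\mu ^s\) is concentrated; replacing T by \(T\setminus N\), we may assume that \(\mu ^s(T)=0\) while still \({\mathcal {L}}^1(T)>0\). The standard differentiation theorem for measures (cf. [4, Theorem 2.4.3]) then yields \(\mu ^s(T)>0\), a contradiction. \(\square \)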
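As a hypothetical check of Lemma 2.8, consider again the illustrative curve \(u(t)=t+\mathbbm {1}_{[1/2,1]}(t)\) with \(|Du|={\mathcal {L}}^1|_I+\delta _{1/2}\). The quotient \(|Du|([t,t+h])/|h|\) tends to the density 1 at every \(t\ne 1/2\), while at the single jump time \(t=1/2\), a Lebesgue-null set, it behaves like \(1+1/h\) and diverges. The function names are assumptions made for this sketch.

```python
def du_interval(lo, hi, jump_time=0.5, jump_size=1.0):
    """|Du|([lo, hi]) for |Du| = Lebesgue measure on [0, 1] + jump_size * delta_{jump_time}."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    if hi < lo:
        return 0.0
    atom = jump_size if lo <= jump_time <= hi else 0.0
    return (hi - lo) + atom


def quotient(t, h):
    """|Du|([t, t+h]) / |h|, with [t, t+h] read as [t+h, t] when h < 0."""
    lo, hi = min(t, t + h), max(t, t + h)
    return du_interval(lo, hi) / abs(h)
```

For example, quotient(0.3, h) stays at 1 for small positive or negative h, whereas quotient(0.5, 1e-4) is about 10001.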

2.3 Skorokhod space

As alluded to in the previous subsection, Càdlàg curves are important in the study of BV-curves. The space of Càdlàg curves D(I : X) can be equipped with a metric, known as the Skorokhod metric, which turns it into a complete and separable metric space, known as the Skorokhod space. Recall from (1.2)–(1.3) that D(I : X) with the Skorokhod topology plays the role of C(I : X) with the topology of uniform convergence. The goal of this subsection is to recall the necessary notions concerning the Skorokhod space.

Definition 2.9

(Skorokhod space) For two curves \(\gamma ,\tilde{\gamma } \in D(I:X)\), define a distance

$$\begin{aligned} d_{Sk}(\gamma ,\tilde{\gamma }) {:=}\inf _\lambda \max \{\Vert \lambda \Vert _B, d_{\sup }(\gamma ,\tilde{\gamma }\circ \lambda )\}, \end{aligned}$$

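where the infimum runs over all increasing homeomorphisms \(\lambda :I\rightarrow I\), \(d_{\sup }(\gamma ,\tilde{\gamma }){:=}\sup _{t\in I}d(\gamma _t,\tilde{\gamma }_t)\) denotes the uniform distance, and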

$$\begin{aligned} \Vert \lambda \Vert _B{:=}\sup _{0\le s<t\le 1} \left|\log \left( \frac{\lambda (t)-\lambda (s)}{t-s}\right) \right|. \end{aligned}$$

The set D(I : X) equipped with the distance \(d_{Sk}\) is called the Skorokhod space.

By definition, a sequence of curves \(\gamma _n\in D(I:X)\) converges to \(\gamma \in D(I:X)\) if and only if there exist increasing homeomorphisms \(\lambda _n:I\rightarrow I\) such that \(d_{\sup }(\gamma ,\gamma _n\circ \lambda _n)\rightarrow 0\) and \( \Vert \lambda _n\Vert _B\rightarrow 0\), where the latter ensures that \(|\lambda _n(t)-t|\rightarrow 0\) uniformly in t.
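To see why \(d_{Sk}\), unlike the uniform distance, treats a slightly delayed jump as a small perturbation, consider \(\gamma =\mathbbm {1}_{[a,1]}\) and \(\tilde{\gamma }=\mathbbm {1}_{[b,1]}\) with \(a,b\in (0,1)\). The piecewise linear homeomorphism \(\lambda \) with \(\lambda (a)=b\) satisfies \(\tilde{\gamma }\circ \lambda =\gamma \), so \(d_{Sk}(\gamma ,\tilde{\gamma })\le \Vert \lambda \Vert _B\), whereas \(d_{\sup }(\gamma ,\tilde{\gamma })=1\) for \(a\ne b\). For a piecewise linear \(\lambda \), every difference quotient is a weighted average of the slopes, so \(\Vert \lambda \Vert _B\) is the largest \(|\log (\text {slope})|\). The sketch below, with hypothetical helper names, evaluates this bound.

```python
import math


def norm_B(knots):
    """||lambda||_B for the increasing piecewise linear homeomorphism of [0, 1]
    interpolating the knots [(0, 0), ..., (1, 1)].

    Each difference quotient of lambda is a weighted average of the slopes,
    so the supremum of |log(quotient)| is the largest |log(slope)|.
    """
    slopes = [(y1 - y0) / (x1 - x0)
              for (x0, y0), (x1, y1) in zip(knots, knots[1:])]
    return max(abs(math.log(s)) for s in slopes)


def skorokhod_bound_for_steps(a, b):
    """Upper bound for d_Sk(1_{[a,1]}, 1_{[b,1]}): the time change moving a to b
    makes the two step functions coincide, so only ||lambda||_B remains."""
    return norm_B([(0.0, 0.0), (a, b), (1.0, 1.0)])
```

For a = 0.5 and b = 0.501 the bound is about 0.002, while the uniform distance between the two curves stays equal to 1.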

Theorem 2.10

(Billingsley–Skorokhod) Let X be a complete and separable metric space. Then the Skorokhod space \((D(I:X),d_{Sk})\) is complete and separable.

The proof can be found in [5, Sect. 12], where only \(D(I:{\mathbb {R}})\) is treated; the argument carries over verbatim when \({\mathbb {R}}\) is replaced by a general complete and separable metric space.

The next lemma shows that the topology induced by the distance \(d_{Sk}\) is finer than the \(L^1\)-topology.

Lemma 2.11

Let \((\gamma _i)_i\) and \(\gamma \) be in D(I : X) such that \(\gamma _i\rightarrow \gamma \) in \(\big (D (I:X),d_{Sk}\big )\). Then \(\gamma _i\rightarrow \gamma \) in \(L^1(I:X)\).

Proof

Let \(\lambda _i\) be increasing homeomorphisms of I and \(\varepsilon _i\searrow 0\) such that \(\Vert \lambda _i\Vert _B\le \varepsilon _i\) and \(d_{\sup }(\gamma \circ \lambda _i,\gamma _i)\le \varepsilon _i\). Then

$$\begin{aligned} \int _I d(\gamma (t),\gamma _i(t)){\mathrm {\,d}}t&\le \int _Id(\gamma _i(t),\gamma \circ \lambda _i(t)){\mathrm {\,d}}t+\int _I d(\gamma \circ \lambda _i(t),\gamma (t)){\mathrm {\,d}}t\\&\le \varepsilon _i+\int _I d(\gamma \circ \lambda _i(t),\gamma (t)){\mathrm {\,d}}t\rightarrow 0, \end{aligned}$$

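as \(i\rightarrow \infty \). Here the last term converges to zero by the dominated convergence theorem: since \(\lambda _i(t)\rightarrow t\), we have \(\gamma \circ \lambda _i(t)\rightarrow \gamma (t)\) at every continuity point t of \(\gamma \), Càdlàg curves are continuous almost everywhere (cf. [5, Lemma 12.1]), and the integrand is bounded because a Càdlàg curve on the compact interval I has bounded range. \(\square \)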

Lemma 2.12

The Borel \(\sigma \)-algebra \({\mathcal {B}}(D(I:X))\) of the Skorokhod space is equal to the \(\sigma \)-algebra generated by the evaluation maps. More generally,

$$\begin{aligned} {\mathcal {B}}(D(I:X))=\sigma (e_t: t\in T), \end{aligned}$$

where \(T\subset I\) is an arbitrary dense subset of I for which \(1\in T\).

Proof

The proof goes as in the real valued case, see [5].

By Proposition 2.15, \(\sigma (e_t:t\in T)\subset {\mathcal {B}}(D(I:X))\), so it suffices to prove that \({\mathcal {B}}(D(I:X))\subset \sigma (e_t:t\in T)\). Writing \(e_0=\lim _{t_i\searrow 0}e_{t_i}\) for some sequence \((t_i)\subset T\), which is possible by right-continuity, we see that \(e_0\) is \(\sigma (e_t:t\in T)\)-measurable. Thus, we may assume that \(0\in T\).

The maps \(e_{t_J}{:=}(e_{t_1},\dots ,e_{t_n}):D(I:X)\rightarrow X^n\) are \((\sigma (e_t:t\in T), {\mathcal {B}}(X^n))\)-measurable by definition for all \(t_J=(t_1,\dots , t_n)\in T^n\) and \(|J|{:=}n\in {\mathbb {N}}\). Moreover, for a partition \(0=t_1<\cdots <t_n=1\) of [0, 1], the map \(\iota _{t_J}:X^n\rightarrow D(I:X)\), defined by \(\iota _{t_J}(x)(t){:=}x_i\) for \(t\in [t_i,t_{i+1})\), \(1\le i<n\), and \(\iota _{t_J}(x)(1){:=}x_n\), is continuous, and therefore Borel measurable.

Now let \((t_{J_n})\), with \(t_{J_n}\subset T\), be a sequence of partitions of I whose mesh \(|t_{J_n}|\) tends to zero. By the above, the composition \(S_n{:=}\iota _{t_{J_n}}\circ e_{t_{J_n}}\) is \((\sigma (e_t:t\in T),{\mathcal {B}}(D(I:X)))\)-measurable. Moreover, \(\textrm{id}_{D(I:X)}=\lim _{n\rightarrow \infty }S_n\) pointwise with respect to \(d_{Sk}\). As a pointwise limit of measurable maps into a metric space, the identity is therefore \((\sigma (e_t:t\in T),{\mathcal {B}}(D(I:X)))\)-measurable, and hence \({\mathcal {B}}(D(I:X))\subset \sigma (e_t:t\in T)\), which concludes the proof. \(\square \)

2.4 Further auxiliary results

Here we collect some results that are used later in the proofs of the main theorems. The first statement concerns the lower semi-continuity of the pointwise variation, which follows immediately by combining Lemma 2.11 with Lemmas 2.7 and 2.5 (taking into account possible discontinuities at \(t=1\)):

Lemma 2.13

The pointwise variation \(\textrm{Var}:D(I:X)\rightarrow [0,\infty ]\) is lower semi-continuous.

The next proposition concerns the identification in (1.3):

Proposition 2.14

(Borel selection) For any \(\gamma \in BV(I:X)\), let \(\tilde{\gamma }\) denote its Càdlàg representative that is left-continuous at \(t=1\). Then the selection map \(T:BV(I:X)\subset L^1\rightarrow D(I:X)\), \(\gamma \mapsto {\tilde{\gamma }}\), is a Borel map.

Proof

By Lemmas 2.7 and 2.13, BV(I : X) and \({\mathcal{B}\mathcal{V}}(I:X)\) are Borel subsets of \(\big (L^1(I:X),d_{L^1}\big )\) and \(\big (D(I:X),d_{Sk}\big )\), respectively. Notice that, by definition, the subset \({\mathcal{B}\mathcal{V}}^{1}\) of all curves in \({\mathcal{B}\mathcal{V}}\) that are left-continuous at \(t=1\) is closed in \({\mathcal{B}\mathcal{V}}\) with respect to the Skorokhod metric. So it suffices to show that the bijection

$$\begin{aligned} {\mathfrak {t}}:(BV(I:X),d_{L^1}) \rightarrow ({\mathcal{B}\mathcal{V}}^{1},d_{Sk}), \quad \gamma \mapsto \tilde{\gamma }, \end{aligned}$$

is a Borel map. By Lemma 2.11, \({\mathfrak {t}}^{-1}\) is continuous, and the assertion follows from [13, Proposition 4.5.1]. \(\square \)

The following proposition is needed in the proof of (ii) in Theorem 3.3: it reduces the verification of the time-marginal condition to a dense set of times t.

Proposition 2.15

The evaluation map \(e_t:D(I:X)\rightarrow X\) defined as \(e_t(\gamma ) {:=}\gamma _t\) is Borel measurable. Moreover, for any \(\pi \in P(D(I:X))\) the function

$$\begin{aligned} t\mapsto \int \phi \circ e_t{\mathrm {\,d}}\pi \end{aligned}$$

is Càdlàg for every continuous and bounded \(\phi \in C_b(X:{\mathbb {R}})\).

Proof

Step 1. For \(t\in \{0,1\}\), the map \(e_t\) is actually continuous. Therefore it suffices to consider \(t\in (0,1)\). Moreover, it suffices to prove that \(\psi \circ e_t\) is Borel measurable for every continuous and bounded \(\psi \in C_b(X)\). The rest of the proof follows the lines of the real-valued case, cf. [5]: for \(m\in {\mathbb {N}}\) with \(t+\frac{1}{m}\le 1\), define \(\psi _m:D(I:X)\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} \psi _m(\gamma ){:=}m\int _t^{t+\frac{1}{m}}\psi (\gamma (s)){\mathrm {\,d}}s. \end{aligned}$$

We want to show that \(\psi _m\) is continuous. Let \(\gamma _n \rightarrow \gamma \) in D(I : X). Then \(\gamma _n(s)\rightarrow \gamma (s)\) at every continuity point s of \(\gamma \), hence for \({\mathcal {L}}^1\)-a.e. s, since a Càdlàg curve has at most countably many discontinuities. Thus, since \(\psi \) is bounded, dominated convergence yields

$$\begin{aligned} \psi _m(\gamma _n)=m\int _t^{t+\frac{1}{m}}\psi (\gamma _n(s)){\mathrm {\,d}}s\rightarrow m\int _t^{t+\frac{1}{m}}\psi (\gamma (s)){\mathrm {\,d}}s=\psi _m(\gamma ), \end{aligned}$$

as \(n\rightarrow \infty \). We conclude that \(\psi _m\) is continuous. On the other hand, by the right continuity of \(\psi \circ \gamma \), we have \(\psi \circ e_t(\gamma )=\lim _{m\rightarrow \infty }\psi _m(\gamma )\). Hence, \(\psi \circ e_t\) is Borel measurable as a pointwise limit of continuous functions.
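Step 1 approximates \(\psi \circ e_t\) by averages of \(\psi \circ \gamma \) over \([t,t+\frac{1}{m}]\), normalised by the factor m. The Python sketch below illustrates why right continuity is what matters; it assumes X = ℝ, the hypothetical test function ψ = arctan and a hypothetical Càdlàg curve with a jump at t = 0.4. The averages approach ψ(γ(t)), the value seen from the right, and not ψ of the left limit.

```python
import math


def psi(x):
    """A bounded continuous test function on X = R (hypothetical choice)."""
    return math.atan(x)


def gamma(s):
    """Hypothetical Cadlag curve: linear drift plus a unit jump at s = 0.4."""
    return s + (1.0 if s >= 0.4 else 0.0)


def psi_m(curve, t, m, steps=10_000):
    """Midpoint-rule approximation of m * int_t^{t+1/m} psi(curve(s)) ds."""
    ds = 1.0 / (m * steps)
    total = sum(psi(curve(t + (k + 0.5) * ds)) for k in range(steps))
    return m * total * ds


t = 0.4
for m in (10, 100, 1000):
    print(m, psi_m(gamma, t, m), psi(gamma(t)))
```

Since arctan is 1-Lipschitz and γ takes values in \([1.4,1.4+\frac{1}{m}]\) on the averaging window, the error is at most 1/m.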

Step 2. Let us first prove the right continuity for all \(t\in [0,1)\). For that, let \(\phi \in C_b(X)\), and \(C>0\) so that \(|\phi |\le C\). Fix \(t_0\in [0,1)\) and \(\varepsilon >0\). Write

$$\begin{aligned} \Gamma {:=}D(I:X)= \bigcup _{n\in {\mathbb {N}}}\Gamma _n, \end{aligned}$$

where

$$\begin{aligned} \Gamma _n{:=}\left\{ \gamma \in D(I:X): |\phi (\gamma (t_0))-\phi (\gamma (t))|<\varepsilon , \forall t \in \left[ t_0, t_0+\frac{1}{n}\right] \right\} . \end{aligned}$$
(2.5)

This is possible since \(\phi \circ \gamma \) is right continuous for every \(\gamma \in D(I:X)\). Since \(\Gamma _n\) is increasing in n, there exists \(n_0\in {\mathbb {N}}\) so that

$$\begin{aligned} \pi (\Gamma \setminus \Gamma _{n_0})\le \frac{\varepsilon }{C}. \end{aligned}$$

Let \(s\in [t_0,t_0+\frac{1}{n_0}]\). Then

$$\begin{aligned} \Big |\int \phi \circ e_{t_0}-\phi \circ e_s{\mathrm {\,d}}\pi \Big |&\le \Big |\int _{\Gamma \setminus \Gamma _{n_0}} \phi \circ e_{t_0}-\phi \circ e_s{\mathrm {\,d}}\pi \Big |+\Big |\int _{\Gamma _{n_0}} \phi \circ e_{t_0}-\phi \circ e_s{\mathrm {\,d}}\pi \Big |\nonumber \\&\le \int _{\Gamma \setminus \Gamma _{n_0}}\big |\phi \circ e_{t_0}-\phi \circ e_s\big |{\mathrm {\,d}}\pi +\int _{\Gamma _{n_0}}\big |\phi \circ e_{t_0}-\phi \circ e_s\big |{\mathrm {\,d}}\pi \nonumber \\&\le 2C\pi (\Gamma \setminus \Gamma _{n_0})+\varepsilon \le 3\varepsilon . \end{aligned}$$
(2.6)

Thus, the map \(t\mapsto \int \phi \circ e_t{\mathrm {\,d}}\pi \) is right continuous.

The existence of left limits follows along the same lines. The only modifications needed are the following. First, in (2.5) the set \(\Gamma _n\) is replaced by

$$\begin{aligned} {\tilde{\Gamma }}_n{:=}\left\{ \gamma \in D(I:X): |\phi (\gamma (t_0^-))-\phi (\gamma (t))|<\varepsilon , \forall t\in \left[ t_0-\frac{1}{n},t_0\right] \right\} . \end{aligned}$$

Second, in (2.6), the estimates are done for \(t,s\in [t_0-\frac{1}{n_0},t_0)\) from which the existence of left limits follows by Cauchy’s criterion. \(\square \)
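For a discrete path measure, the Càdlàg property of \(t\mapsto \int \phi \circ e_t{\mathrm {\,d}}\pi \) can be seen directly. The sketch below assumes a hypothetical π consisting of three weighted Càdlàg paths in X = ℝ and the test function φ = tanh; it checks right continuity at a common jump time and shows that the left limit may differ from the value.

```python
import math

# Hypothetical discrete path measure: (weight, slope, jump size, jump time).
# Each path is gamma(t) = slope * t + jump * 1{t >= tau}, a Cadlag curve.
PATHS = [(0.5, 1.0, 2.0, 0.3), (0.3, -0.5, 1.0, 0.3), (0.2, 0.0, -1.0, 0.7)]


def phi(x):
    """Bounded continuous test function (hypothetical choice)."""
    return math.tanh(x)


def F(t):
    """t -> int phi(gamma(t)) dpi(gamma) for the discrete measure above."""
    return sum(w * phi(a * t + (j if t >= tau else 0.0)) for w, a, j, tau in PATHS)


eps = 1e-9
print(F(0.3), F(0.3 + eps), F(0.3 - eps))
```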

Finally, the observation below will be useful when dealing with (non-continuous) BV-curves in the Wasserstein space.

Lemma 2.16

Let \(\gamma \in D(I:X)\). Then the image of \(\gamma \) is precompact.

Proof

Let \(C\subset X\) be the set of all left and right limits of \(\gamma \), and let \(S{:=}C\setminus \textrm{Im}(\gamma )\). Clearly C is the closure of the image of \(\gamma \). It suffices to prove that every sequence \((x_i)\subset S\) has a convergent subsequence. Indeed, if \((x_i)\subset C\) is such that \(x_i=\gamma (t_i)\) for infinitely many i, then we can take a monotone subsequence of these \(t_i\) and, by the existence of left and right limits, conclude that this subsequence converges to a point in C.

Therefore, let \((x_i)\subset S\). For any i, choose \(t_i\) so that \(d(\gamma (t_i),x_i)<\frac{1}{2^i}\). Then, as before, there exists a monotone subsequence of \((t_i)\), still denoted by \(t_i\). Let \(a=\lim \gamma (t_i)\in C\). Then (for the corresponding subsequence of \((x_i)\)) we have that

$$\begin{aligned} d(a,x_i)\le d(a,\gamma (t_i))+\frac{1}{2^i}\rightarrow 0, \end{aligned}$$

when \(i\rightarrow \infty \). \(\square \)

2.5 Equivalent definitions of BV-curves

It is known, especially in the real-valued case (see, e.g., [7]), that different definitions of BV-curves are equivalent. Although we expect that such a result is also known in the general setting of metric spaces, we were unable to locate it in the literature. Thus, for the sake of completeness, we prove the equivalence here in the general case.

Theorem 2.17

(Equivalent definitions of BV-curves) Let (X, d) be a complete and separable metric space. Given \(u \in L^1 (I: X)\), the following are equivalent:

  1. (1)

    u is a BV-curve;

  2. (2)

    the difference quotient \(\Delta _h u(t){:=}\, d(u(t+h),u(t))/h\) for \(h \in (0,1)\) satisfies

    $$\begin{aligned} \sup _{0< h< 1} \int _0^{1 - h}\Delta _h u (t){\mathrm {\,d}}t <\infty ; \end{aligned}$$
    (2.7)
  3. (3)

    there is a finite Borel measure \(\mu \in {\mathcal {M}}([0,1])\) such that for any \(\varphi \in \textrm{Lip}_1(X:{\mathbb {R}})\), \(\varphi \circ u\) is a BV-function and

    $$\begin{aligned} |D(\varphi \circ u) |\le \mu ; \end{aligned}$$
    (2.8)
  4. (4)

    there is a finite Borel measure \(\mu \in {\mathcal {M}}([0,1])\) such that for any \(\varphi =d(\cdot ,x)\) with \(x\in X\), \(\varphi \circ u\) is a BV-function and (2.8) holds.

  5. (5)

    there exists a finite Borel measure \(\mu \in {\mathcal {M}} ([0, 1])\) such that for \({\mathcal {L}}^2\)-a.e. \((s, t) \in I^2\)

    $$\begin{aligned} d(u(t),u(s)) \le \mu ([s, t]). \end{aligned}$$
    (2.9)
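Condition (2.7) can be checked by hand on simple examples. The Python sketch below assumes X = ℝ² with the Euclidean distance and a hypothetical curve u that moves horizontally at unit speed and jumps vertically by 1 at t = 1/2, so that its pointwise variation is 2. On the grid k/n with h = j/n and 2j ≤ n, the left Riemann sum of \(\int _0^{1-h}\Delta _hu(t){\mathrm {\,d}}t\) equals \(1-2h+\sqrt{1+h^2}\) up to rounding; this increases to 2 as h decreases to 0, consistent with the supremum in (2.7) recovering the variation.

```python
import math


def u(t):
    """Hypothetical BV curve in R^2: unit-speed horizontal motion, unit vertical jump at t = 1/2."""
    return (t, 1.0 if t >= 0.5 else 0.0)


def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def quotient_integral(n, j):
    """Left Riemann sum on the grid k/n of int_0^{1-h} d(u(t+h), u(t))/h dt with h = j/n."""
    h = j / n
    cells = range(n - j)  # left endpoints k/n of the cells covering [0, 1 - h]
    return sum(dist(u((k + j) / n), u(k / n)) for k in cells) / (n * h)


n = 2000
for j in (200, 20, 2):
    h = j / n
    print(h, quotient_integral(n, j), 1 - 2 * h + math.sqrt(1 + h * h))
```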

Proof

(1) \(\Rightarrow \) (2). If \(\mathrm {ess\, Var}(u)<\infty \), since (2.7) is stable under choosing different representatives, we can assume \(u \in D (I: X)\) and \(\textrm{Var}(u) = \mathrm {ess\, Var}(u)\). Then for any \(h > 0\), the function \(t \mapsto \Delta _h u (t)\) is bounded and its set of discontinuity points is \({\mathcal {L}}^1\)-negligible, so it is Riemann integrable by the Riemann-Lebesgue theorem. For any \(n \in {\mathbb {N}}\), set \(t_i {:=}i h / n\) for all \(i\ge 0\) with \(t_i\le 1\), and \(N {:=}\lfloor (1 - h) n / h \rfloor \), so that \(0 = t_0< \cdots < t_N \le 1 - h\). Since \(t_i+h=t_{i+n}\), the triangle inequality gives

$$\begin{aligned} h \Delta _h u (t_i) = d (u (t_i), u (t_i + h)) \le \sum _{j = 0}^{n - 1} d (u (t_{i + j}), u (t_{i + j + 1})), \quad 0 \le i \le N - 1. \end{aligned}$$

Approximating the integral by left Riemann sums on the partition \(0 = t_0< \cdots < t_N \le 1 - h\), whose mesh h/n tends to zero, we obtain for any \(h > 0\)

$$\begin{aligned} \int _0^{1 - h} \Delta _h u (t) {\mathrm {\,d}}t&= \lim _{n \rightarrow \infty } \left( \Delta _h u (t_N) (1 - h - t_N) + \frac{h}{n} \sum _{i = 0}^{N - 1} \Delta _h u (t_i) \right) \\&\le \liminf _{n \rightarrow \infty } \frac{1}{n} \left( \sum _{i = 0}^{N - 1} \sum _{j = 0}^{n - 1} d (u (t_{i + j}), u (t_{i + j + 1})) + d (u (t_N), u (t_N + h)) \right) \\&\le \liminf _{n \rightarrow \infty } \left( \sum _{k = 0}^{N + n - 2} d (u (t_k), u (t_{k + 1})) + \frac{\textrm{Var}(u)}{n} \right) \le \textrm{Var}(u), \end{aligned}$$

where we used that \(1 - h - t_N < h/n\) and that each increment \(d (u (t_k), u (t_{k + 1}))\) appears at most n times in the double sum.

(2) \(\Rightarrow \) (3). Let L be the following non-decreasing function

$$\begin{aligned} L (t) = \sup _{0< h < t} \int _0^{t - h} \Delta _h u (s) {\mathrm {\,d}}s \end{aligned}$$
(2.10)

and |DL| be the Lebesgue-Stieltjes measure such that \(| D L | ((a, b]) = L (b^+) - L (a^+).\) By assumption |DL| is a finite measure and Lemma 2.18 below ensures

$$\begin{aligned} \sup _{0< h<b-a}\int _a^{b-h} \Delta _h u(t){\mathrm {\,d}}t\le |D L |([a,b]). \end{aligned}$$

Given \(\varphi \in \textrm{Lip}_1(X:{\mathbb {R}})\),

$$\begin{aligned} \sup _{0< h< b-a} \int _a^{b - h} \Delta _h (\varphi \circ u) (s) {\mathrm {\,d}}s \le \sup _{0< h < b - a} \int _a^{b - h} \Delta _h u (t) {\mathrm {\,d}}t \le | D L|([a,b]). \end{aligned}$$
(2.11)

Using [7, Corollary 2.43], \(\varphi \circ u\) is a BV-function on I and, over any interval (a, b),

$$\begin{aligned} \mathrm {ess\, Var}(\varphi \circ u;(a, b))=\sup _{0<h< b-a}\int _a^{b-h} \Delta _h (\varphi \circ u) (s) {\mathrm {\,d}}s. \end{aligned}$$
(2.12)

Thus, with (2.11),

$$\begin{aligned} | D (\varphi \circ u) | ((a, b))= & {} \lim _{r\searrow 0} | D (\varphi \circ u) | ((a+r,b-r))\\\le & {} \liminf _{r\searrow 0} | D L | ([a+r,b-r])\\\le & {} | D L | ((a, b)) \end{aligned}$$

for arbitrary \(\varphi \in \textrm{Lip}_1 (X: {\mathbb {R}})\) and \((a, b) \subset I\).

(3) \(\Rightarrow \) (4) is trivial.

(4) \(\Rightarrow \) (1). Let \(\{z_i\}_{i \in {\mathbb {N}}} \subset X\) be a countable dense set. Define the functions \(\varphi _i (\cdot ) {:=}d(z_i, \cdot )\). By assumption, \(\varphi _i\circ u \in BV(I:{\mathbb {R}})\). Thus for each i, we can take a representative \(\phi _i \in D(I:{\mathbb {R}})\) and \(N_i\subset I\) such that \({\mathcal {L}}^1(N_i) = 0\) and \(\phi _i = \varphi _i\circ u\) on \(I {\setminus } N_i\). Set \({\tilde{I}} = I {\setminus } \cup _{i \in {\mathbb {N}}} N_i\). Our goal is to find a representative \({\tilde{u}}\) whose pointwise variation is finite.

Notice that by the density of the set \(\{z_i\}\) in X, we have

$$\begin{aligned} d(u(s),u(t))&= \sup _i \big | d(z_i,u(s)) - d(z_i,u(t)) \big |. \end{aligned}$$

Using further the assumption on the measure \(\mu \), for \(s,t \in {\tilde{I}}\) with \(s<t\), we have

$$\begin{aligned} d(u(s),u(t)) = \sup _i \big | \phi _i(s) - \phi _i(t) \big | \le \sup _i |D \phi _i|((s,t]) \le \mu ((s,t]). \end{aligned}$$

Now fix \(t\in [0,1)\) and take a decreasing sequence \(t_i \searrow t\) with \(t_i \in {\tilde{I}}\). Then

$$\begin{aligned} \sum _{i=k}^{\infty } d (u(t_i),u(t_{i+1}))&\le \sum _{i=k}^{\infty } \mu ( (t_{i+1},t_{i}]) \le \mu ( (t,t_{k}]). \end{aligned}$$

Since \(\mu \) is a finite measure, the right-hand side of the inequality above tends to 0 as \(k \rightarrow \infty \). Therefore, \((u(t_i))_{i \in {\mathbb {N}}}\) is a Cauchy sequence and, by completeness, the limit

$$\begin{aligned} {\tilde{u}} (t) {:=}\lim _{i \rightarrow \infty } u(t_i) \end{aligned}$$

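exists (it does not depend on the chosen sequence, since any two such sequences can be merged into a single decreasing one), and \({\tilde{u}}= u\) on \({\tilde{I}}\). Finally, by an argument similar to the one in the proof of Lemma 2.5, Eq. (2.3), we conclude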

$$\begin{aligned} \textrm{Var}({\tilde{u}}) \le \mu (I) < \infty . \end{aligned}$$
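The identity \(d(x,y)=\sup _i |d(z_i,x)-d(z_i,y)|\) used above is where separability enters. A quick numerical illustration, assuming X = ℝ² and hypothetical finite grids \(\{z_i\}\) of mesh 1/m in the unit square: the supremum never exceeds d(x, y) (triangle inequality) and is at least \(d(x,y)-\sqrt{2}/m\), because some grid point lies within \(\sqrt{2}/(2m)\) of x.

```python
import math


def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def grid(m):
    """Hypothetical finite 'dense' set: grid of mesh 1/m in the unit square."""
    return [(i / m, j / m) for i in range(m + 1) for j in range(m + 1)]


def sup_distance(x, y, centers):
    """sup_i |d(z_i, x) - d(z_i, y)| over the given centers z_i."""
    return max(abs(dist(z, x) - dist(z, y)) for z in centers)


x, y = (0.123, 0.456), (0.789, 0.321)
for m in (4, 16, 64):
    print(m, sup_distance(x, y, grid(m)), dist(x, y))
```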

(5) \(\Rightarrow \) (3). Let \(\varphi :X\rightarrow {\mathbb {R}}\) be any 1-Lipschitz function. As assumed,

$$\begin{aligned} |\varphi _u(t)-\varphi _u(s)|\le \mu ([s,t]),\quad \text {where } \varphi _u{:=}\varphi \circ u, \end{aligned}$$

for \({\mathcal {L}}^2\)-a.e. \((s,t)\in I^2\). By Fubini's theorem, there is \(H\subset I\) of full \({\mathcal {L}}^1\) measure such that for any \(h\in H\), \( |\varphi _u(t)-\varphi _u(t+h)|\le \mu ([t,t+h])\) for \({\mathcal {L}}^1\)-a.e. \(t\in (0,1-h)\). Since the monotone function \(t\mapsto \mu ([0, t])\) has at most countably many discontinuities, \(\mu ([0, t])=\mu ([0, t))\) for \({\mathcal {L}}^1\)-a.e. \(t \in (0, 1)\). Now for every \(0\le a<b\le 1\) and \(h\in H\) with \(h<b-a\),

$$\begin{aligned} \int _a^{b- h} \Delta _h \varphi _u (t) {\mathrm {\,d}}t&\le \int _a^{b-h} \frac{\mu ([t, t + h])}{h} {\mathrm {\,d}}t\nonumber \\&= \frac{1}{h} \int _a^{b - h} \mu ([0, t + h]) - \mu ([0, t)) {\mathrm {\,d}}t\nonumber \\&= \frac{1}{h} \left( \int _{a+h}^b \mu ([0, t]) {\mathrm {\,d}}t - \int _a^{b - h} \mu ([0, t]) {\mathrm {\,d}}t \right) \nonumber \\&= \frac{1}{h} \left( \int _{b- h}^b \mu ([0, t]) dt - \int _a^{a+h} \mu ([0, t]) {\mathrm {\,d}}t \right) \nonumber \\&\le \mu ([0,b])-\mu ([0,a))=\mu ([a, b])<\infty . \end{aligned}$$
(2.13)

Moreover, we can extend (2.13) to those \(h\notin H\) using the continuity given by Lemma 2.18 (2). Using (2.12) once more, we conclude that \(\varphi _u\) is a BV-function and \(|D\varphi _u|\le \mu \).
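The chain (2.13) rests on a telescoping identity for Steklov averages of the distribution function \(F(t)=\mu ([0,t])\). The Python sketch below checks it with exact antiderivatives, assuming a hypothetical \(\mu \) equal to Lebesgue measure on [0, 1] plus an atom of mass 1/2 at 0.3: the two middle expressions in (2.13) coincide (also when \(a+h>b-h\)), and both are bounded by \(\mu ([a,b])\).

```python
ATOM, MASS = 0.3, 0.5  # hypothetical mu = Lebesgue measure + MASS * delta_ATOM


def F(t):
    """Distribution function F(t) = mu([0, t])."""
    return t + (MASS if t >= ATOM else 0.0)


def int_F(c, d):
    """Exact value of int_c^d F(t) dt for 0 <= c <= d <= 1."""
    return (d * d - c * c) / 2 + MASS * max(0.0, d - max(c, ATOM))


def mu_closed(a, b):
    """mu([a, b])."""
    return (b - a) + (MASS if a <= ATOM <= b else 0.0)


def steklov_sides(a, b, h):
    """Both sides of the telescoping step in (2.13), for 0 < h < b - a."""
    lhs = (int_F(a + h, b) - int_F(a, b - h)) / h  # (1/h) int_a^{b-h} (F(t+h) - F(t)) dt
    rhs = (int_F(b - h, b) - int_F(a, a + h)) / h  # (1/h) (int_{b-h}^b F - int_a^{a+h} F)
    return lhs, rhs


print(steklov_sides(0.2, 0.7, 0.05), mu_closed(0.2, 0.7))
```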

(1) \(\Rightarrow \) (5). It suffices to prove the statement for \(u \in {D} (I:X)\) satisfying (2.2) as in Lemma 2.5. Choosing \(\mu \) as the variation measure |Du|, we obtain

$$\begin{aligned} d(u(s), u(t))\le & {} \liminf _{r\searrow 0}\textrm{Var}(u;(s-r,t+r))\\= & {} \liminf _{r\searrow 0}\mathrm {ess\, Var}(u;(s-r,t+r))=\mu ([s,t]). \end{aligned}$$

\(\square \)

Lemma 2.18

For any \(u\in L^1(I:X)\) and \(0\le a<b\le 1\), define L as in (2.10) and

$$\begin{aligned} l^b_a(h){:=}\int _a^{b-h} \Delta _h u(t){\mathrm {\,d}}t,\quad 0<h<b-a. \end{aligned}$$

Then

  1. (1)

    for all \(0<h<b-a\), \(l^b_a(h)\le l^b_a(h/2)\), and in particular,

    $$\begin{aligned} \sup _{0<h<b-a}l^b_a(h)=\limsup _{h\rightarrow 0}l^b_a(h); \end{aligned}$$
  2. (2)

    if \(X={\mathbb {R}}\), then \((0,b-a)\ni h\mapsto l^b_a(h)\) is continuous;

  3. (3)

    for a general metric space X, if \(L(1)<\infty \), \((0,b-a)\ni h\mapsto l^b_a(h)\) is continuous and

    $$\begin{aligned} \sup _{0< h<b-a}\int _a^{b-h} \Delta _h u(t){\mathrm {\,d}}t \le | D L |([a,b]). \end{aligned}$$
    (2.14)

Proof

The first assertion follows from the triangle inequality and the substitution \(t\mapsto t+h/2\) in the second term:

$$\begin{aligned} \int ^{b-h}_{a}\frac{d(u(t),u(t+h))}{h}{\mathrm {\,d}}t&\le \int ^{b-h}_{a}\frac{d(u(t),u(t+h/2))+d(u(t+h/2),u(t+h))}{h}{\mathrm {\,d}}t\\&\le \int ^{b-h/2}_{a}\frac{d(u(t),u(t+h/2))}{h/2}{\mathrm {\,d}}t. \end{aligned}$$

Given arbitrary \(h,h'\), by the triangle inequality

$$\begin{aligned} \int |d(u(t),u(t+h))-d(u(t),u(t+h'))|{\mathrm {\,d}}t \le \int d(u(t+h),u(t+h')){\mathrm {\,d}}t. \end{aligned}$$

So the continuity of \(l^b_a\) at h boils down to showing

$$\begin{aligned} \lim _{h\rightarrow 0}\int d(u(t),u(t+h)){\mathrm {\,d}}t=0. \end{aligned}$$
(2.15)

If \(X={\mathbb {R}}\), (2.15) follows from the fact that any \(L^1\)-function can be approximated by continuous functions in the \(L^1\)-norm. On a general metric space, (2.15) follows immediately from the finiteness of L(1), since \(\int _0^{1-h} d(u(t),u(t+h)){\mathrm {\,d}}t\le h\,L(1)\).

As for (2.14), by definition, \(|DL|([a,b])=L(b^+)-L(a^-)\), so it suffices to show \(l^b_a(h_1)+l^a_0(h_2)\le L(b)\) for all \(h_1,h_2\). By the continuity of \(h\mapsto l^b_a(h)\), we can assume

$$\begin{aligned} h_i=\frac{k_i}{2^n},\quad n,k_i\in {\mathbb {N}}, 1\le k_i<2^n,i=1,2. \end{aligned}$$

The conclusion follows if we take the uniform step-size \(1/2^n\) and argue as in part (1):

$$\begin{aligned} l^b_a(h_1)+l^a_0(h_2)&= \int ^{b-h_1}_{a}\frac{d(u(t),u(t+h_1))}{h_1}{\mathrm {\,d}}t+ \int ^{a-h_2}_{0}\frac{d(u(t),u(t+h_2))}{h_2}{\mathrm {\,d}}t\\&\le \int _a^{b-h_1}\sum _{k=1}^{k_1}d\left( u\left( t+\frac{k-1}{2^n}\right) ,u\left( t+\frac{k}{2^n}\right) \right) \frac{2^n}{k_1}{\mathrm {\,d}}t\\&\quad + \int _0^{a-h_2}\sum _{k=1}^{k_2}d\left( u\left( t+\frac{k-1}{2^n}\right) ,u\left( t+\frac{k}{2^n}\right) \right) \frac{2^n}{k_2}{\mathrm {\,d}}t\\&\le \int _0^{b-1/2^n}\frac{d\big (u(t),u(t+1/2^n)\big )}{1/2^n}{\mathrm {\,d}}t\le L(b). \end{aligned}$$

\(\square \)
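Part (1) of Lemma 2.18 has an exact discrete counterpart, which may help to see why the supremum is attained as \(h\rightarrow 0\). In the Python sketch below (an illustration under assumptions, not part of the proof), u is a hypothetical curve in ℝ² sampled on the grid \(k/N\), and `l_disc(ka, kb, j)` is the left Riemann sum of \(l^b_a(h)\) for \(a=ka/N\), \(b=kb/N\), \(h=j/N\). The triangle-inequality argument above, applied term by term, gives \(l(2m)\le l(m)\) exactly, up to floating-point rounding.

```python
import math

N_GRID = 4096  # sample the curve on the grid k / N_GRID, k = 0, ..., N_GRID


def u(t):
    """Hypothetical curve in R^2: smooth Lissajous-type motion plus a jump at t = 0.6."""
    return (math.cos(7 * t) + (0.5 if t >= 0.6 else 0.0), math.sin(3 * t))


def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


PTS = [u(k / N_GRID) for k in range(N_GRID + 1)]


def l_disc(ka, kb, j):
    """Discrete l_a^b(h) for a = ka/N_GRID, b = kb/N_GRID, h = j/N_GRID.

    Left Riemann sum of int_a^{b-h} d(u(t), u(t+h))/h dt: cells have width 1/N_GRID,
    so each term d(u_k, u_{k+j}) carries the weight (1/N_GRID)/h = 1/j.
    """
    return sum(dist(PTS[k], PTS[k + j]) for k in range(ka, kb - j)) / j


ka, kb = 512, 3584  # a = 0.125, b = 0.875
for m in (256, 64, 16, 4, 1):
    print(m / N_GRID, l_disc(ka, kb, m))
```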

Remark 2.19

(Equivalence of variation measures) In Theorem 2.17, we show that five different definitions of the set of BV-curves are in fact equivalent. The proof has an even stronger implication, namely, that the five(-ish) different notions of the variation measure of a BV-curve are equal. To be more precise, given a BV-curve u, the following measures are equal:

(0):

the Lebesgue–Stieltjes measure induced by \(\mathrm {ess\, Var}(u;(0,\cdot ))\);

(1):

the measure \(\mu \) characterized by \(\mu ((a,b))=\mathrm {ess\, Var}(u;(a,b))\);

(2):

the measure \(\mu \) characterized by \(\mu ((a,b))=\sup _{h}\int _a^{b-h}\Delta _h u (t){\mathrm {\,d}}t\);

(3):

the measure \(\mu =\underset{\varphi \in \textrm{Lip}_1(X)}{{\mathcal {M}}-\sup }|D(\varphi \circ u)|\);

(3\('\)):

the minimal measure satisfying (2.8);

(4):

the minimal measure satisfying (2.8) for all \(\varphi \) of the form \(\varphi =d(\cdot ,x)\);

(5):

the minimal measure satisfying (2.9).

The fact that all these different approaches lead to the same measure is evident from the corresponding steps in the proof of Theorem 2.17. The single measure obtained in any of these ways is denoted by |Du| and called the variation measure of u. In what follows, we use these characterizations interchangeably without mentioning it explicitly.

Remark 2.20

Sometimes it is useful to have bounds as above which hold not only for almost every s and t, but in fact everywhere. For this one needs to consider a specific representative of the BV-curve in question. One choice, which we work with in this paper, is the Càdlàg representative. Since the D(I : X)-representative of a BV-curve is unique (up to its value at \(t=1\)), if \(u\in {\mathcal{B}\mathcal{V}}(I:X)\), then by Lemma 2.5 we indeed have \(d(u(s),u(t))\le |Du|((s,t])\) for all \(0\le s\le t<1\).

3 Main results

3.1 Lifts of AC- and BV-curves in 1-Wasserstein spaces

As introduced in Sects. 1 and 2.1, we let (X, d) be a complete and separable metric space and \(P_1(X)\) the associated Wasserstein space of order \(p=1\). Without loss of generality, we fix the time interval to be \(I=[0,1]\).

Theorem 3.1

Let \(\pi \in P(D(I:X))\) be concentrated on \({\mathcal{B}\mathcal{V}}(I:X) \subset D(I:X)\) such that

$$\begin{aligned} \int | D \gamma | (I) {\mathrm {\,d}}\pi (\gamma ) < \infty \end{aligned}$$
(3.1)

and \(\mu _0 {:=}(e_0)_\# \pi \in P_1(X)\). Then the curve \(t\mapsto \mu _t {:=}(e_t)_\# \pi \) belongs to \({\mathcal{B}\mathcal{V}}(I: P_1(X))\), and

$$\begin{aligned} |D \mu | \le \int |D \gamma | {\mathrm {\,d}}\pi (\gamma ) \end{aligned}$$
(3.2)

as measures.

Remark 3.2

We emphasize that the right-hand side of (3.2) is a short-hand notation for the measure which is defined by

$$\begin{aligned} \left( \int _{D(I:X)} |D \gamma | {\mathrm {\,d}}\pi (\gamma ) \right) (A) {:=}\int _{D(I:X)} |D \gamma | (A) {\mathrm {\,d}}\pi (\gamma ) \end{aligned}$$
(3.3)

for any Borel set \(A \subset I\). Notice that for an open set U, the map \(\gamma \mapsto |D \gamma |(U)\) is Borel measurable due to the lower semi-continuity of variation (Lemma 2.13). Then by standard measure theory techniques, \(\gamma \mapsto |D \gamma |(A)\) is also Borel measurable, hence the set function in (3.3) is well-defined, finite (by (3.1)), and actually a measure. Moreover, the integral of any (non-negative) Borel function \(f:I \rightarrow {\mathbb {R}}\) with respect to this measure is given by \(\iint f(t) {\mathrm {\,d}}|D \gamma | (t) {\mathrm {\,d}}\pi (\gamma )\).

Proof of Theorem 3.1

By Lemma 2.5, \(\pi \)-a.e. \(\gamma \) has bounded variation and, over each interval \((a,b)\subset I\), \(|D\gamma |((a,b))=\mathrm {ess\, Var}(\gamma ;(a,b))=\textrm{Var}(\gamma ;(a,b))\). Fix a point \(\bar{x}\in X\). Since \(\mu _t = (e_t)_\# \pi \), Remark 2.20 yields for all \(t\in [0,1)\) that

$$\begin{aligned} \int _X d (\bar{x},x) {\mathrm {\,d}}\mu _t(x)&=\int _{D(I:X)} d(\bar{x},\gamma (t)) {\mathrm {\,d}}\pi (\gamma )\nonumber \\&\le \int _{D(I:X)} \Big [ d(\bar{x},\gamma (0))+d(\gamma (0),\gamma (t))\Big ] {\mathrm {\,d}}\pi (\gamma )\nonumber \\&\le W_1(\delta _{\bar{x}},\mu _0) + \int _{D(I:X)} |D\gamma |([0,1]) {\mathrm {\,d}}\pi (\gamma )<\infty . \end{aligned}$$
(3.4)

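In particular, \(\mu _t\in P_1(X)\) for all \(t\in [0,1)\). At each \(t\in [0,1)\), by the right-continuity of curves in D(I : X) and the dominated convergence theorem (the integrand below is bounded by \(|D\gamma |([0,1])\), which is \(\pi \)-integrable by (3.1)),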

$$\begin{aligned} W_1(\mu _t,\mu _{t+r})\le \int _{D(I:X)} d(\gamma (t),\gamma (t+r)){\mathrm {\,d}}\pi (\gamma )\rightarrow 0 \end{aligned}$$

as \(r\searrow 0\). Similarly, one shows that \(t\mapsto \mu _t\) has a left limit in \(P_1(X)\) at each \(t\in (0,1]\). In other words, \((\mu _t)\in D(I:P_1(X))\) and, in particular, by (3.4), \(t\mapsto \mu _t\in L^1(I:P_1(X))\). Now for arbitrary \(0\le s<t<1\), we can estimate

$$\begin{aligned} W_1(\mu _t,\mu _s)&\le \int _{D(I:X)} d(\gamma (t),\gamma (s)){\mathrm {\,d}}\pi (\gamma )\\&\le \int _{D(I:X)} |D\gamma |([s,t]){\mathrm {\,d}}\pi (\gamma )= \left( \int _{D(I:X)} |D\gamma |{\mathrm {\,d}}\pi (\gamma )\right) ([s,t]), \end{aligned}$$

where the second inequality follows from Remark 2.20. Theorem 2.17 ensures that \((\mu _t)\in {\mathcal{B}\mathcal{V}}(I:P_1(X))\), and by Remark 2.19, we have that

$$\begin{aligned} |D\mu |\le \int _{D(I:X)} |D\gamma |{\mathrm {\,d}}\pi (\gamma ). \end{aligned}$$

This concludes the proof. \(\square \)
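A finite-dimensional sanity check of the key estimate in the proof above, \(W_1(\mu _s,\mu _t)\le \int |D\gamma |([s,t]){\mathrm {\,d}}\pi (\gamma )\), under simplifying assumptions: X = ℝ, π uniform on finitely many hypothetical Càdlàg paths with a linear drift and a few jumps, and \(W_1\) between uniform empirical measures with the same number of atoms computed by the monotone (sorted) matching, which is optimal on the real line. For such a path, `path_variation` (drift speed times \(t-s\) plus the jumps in \((s,t]\)) bounds \(d(\gamma (s),\gamma (t))\), consistent with Remark 2.20.

```python
import random


def make_path(rng):
    """Hypothetical Cadlag path on [0, 1]: x0 + c*t + sum of jumps J at times tau <= t."""
    x0, c = rng.gauss(0, 1), rng.uniform(-1, 1)
    jumps = [(rng.random(), rng.gauss(0, 1)) for _ in range(rng.randint(0, 3))]
    return x0, c, jumps


def value(path, t):
    x0, c, jumps = path
    return x0 + c * t + sum(J for tau, J in jumps if tau <= t)


def path_variation(path, s, t):
    """Variation of the path on (s, t]: |c|(t - s) plus the sizes of the jumps in (s, t]."""
    _, c, jumps = path
    return abs(c) * (t - s) + sum(abs(J) for tau, J in jumps if s < tau <= t)


def w1_empirical(a, b):
    """W_1 between uniform empirical measures on R with equally many atoms (sorted matching)."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


rng = random.Random(0)
paths = [make_path(rng) for _ in range(200)]  # pi = uniform measure on these paths
times = [i / 20 for i in range(21)]


def mu(t):
    """Atoms of mu_t = (e_t)_# pi."""
    return [value(p, t) for p in paths]


worst = max(
    w1_empirical(mu(s), mu(t)) - sum(path_variation(p, s, t) for p in paths) / len(paths)
    for s in times for t in times if s < t
)
print(worst)  # mathematically <= 0; only floating-point rounding can push it above 0
```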

The previous theorem states that any lift \(\pi \) of a BV-curve \((\mu _t)\) provides an upper bound for the variation measure of \((\mu _t)\) through Eq. (3.2). In the next theorem, a measure \(\tilde{\pi }\) is constructed, using techniques of optimal transportation, that achieves equality and thus entails a key relation on the variation of the curves. Our main result is the following:

Theorem 3.3

(AC- and BV-curves in 1-Wasserstein spaces) Let \((\mu _t)\in {\mathcal{B}\mathcal{V}}(I:P_1(X))\). Then there exists a probability measure \(\tilde{\pi }\in P(D(I:X))\) such that

  1. (i)

    \(\tilde{\pi }\) is concentrated on \({\mathcal{B}\mathcal{V}}(I:X)\subset D(I:X)\);

  2. (ii)

    \((e_t)_\#\tilde{\pi }=\mu _t\) for all \(t\in I\);

  3. (iii)

    The total variation measure \(|D\mu |\) satisfies

    $$\begin{aligned} |D\mu |=\int |D\gamma |{\mathrm {\,d}}\tilde{\pi }(\gamma ). \end{aligned}$$
    (3.5)

Moreover, the absolutely continuous part \(|{\dot{\mu }}|{\mathcal {L}}^1\) of the measure \(|D\mu |\), given by the metric derivative, satisfies

$$\begin{aligned} |{\dot{\mu }}_t| = \lim _{h \rightarrow 0 } \int \frac{ d (\gamma _{t}, \gamma _{t+h})}{|h|} {\mathrm {\,d}}\tilde{\pi }(\gamma ) = \lim _{h \rightarrow 0 } \int \frac{ |D \gamma |([t,t+h])}{|h|} {\mathrm {\,d}}\tilde{\pi }(\gamma ) \end{aligned}$$
(3.6)

for \({\mathcal {L}}^1\)-a.e. \(t\in I\).

In particular, if \((\mu _t) \in {\mathcal{A}\mathcal{C}}^1(I:P_1(X))\), then \(|D\mu |=|{\dot{\mu }}|{\mathcal {L}}^1\) and the metric speed \(|{\dot{\mu }}_t|\) is characterized by the equation above.

Proof

Our proof closely follows that of [8, Theorem 5], with modifications for BV-curves based on the results of Sect. 2.

For any integer \(N\in {\mathbb {N}}\), we divide I into \(2^N\) intervals of equal length and denote \(t^i{:=}i/2^N\) for \(i=0,\ldots ,2^N\). Let \(X_i\) denote the i-th copy of X and consider the product space

$$\begin{aligned} {{\textbf {X}}}_{N} {:=}X_0\times X_1\times \cdots \times X_{2^N}. \end{aligned}$$

It is always possible (see e.g. [3, Sect. 5.3]) to find \(\eta _N\in P({{\textbf {X}}}_N)\) such that

$$\begin{aligned} \textrm{Pr}^i_{\#}\eta _N=\mu _{t^i},\quad \textrm{Pr}^{i,i+1}_{\#}\eta _{N}\in \textrm{Opt}(\mu _{t^i},\mu _{t^{i+1}}), \end{aligned}$$

where \(\textrm{Opt}(\mu _{t^i},\mu _{t^{i+1}})\) is the set of optimal couplings between \(\mu _{t^{i}}\) and \(\mu _{t^{i+1}}\), and the maps \(\textrm{Pr}^i\), \(\textrm{Pr}^{i,j}\) are the projections from \({{\textbf {X}}}_{N}\) onto the i-th and (i, j)-th components, respectively. Finally, we define the filling map \(\sigma :{{\textbf {x}}}=(x_0,\ldots ,x_{2^N})\in {{\textbf {X}}}_N\mapsto \sigma _{{{\textbf {x}}}}\in L^1(I:X)\) by

$$\begin{aligned} \sigma _{{{\textbf {x}}}}(t){:=}x_i,\quad t\in [t^i,t^{i+1}); \quad \sigma _{{{\textbf {x}}}}(1){:=}x_{2^N}; \end{aligned}$$

and set \(\pi _N{:=}\sigma _{\#}\eta _N\in P(L^1(I:X))\).
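On the real line, and for marginals given by equally many atoms of equal weight, the construction of \(\eta _N\) can be made completely explicit: the monotone (sorted) matching is an optimal coupling between consecutive marginals, and gluing these couplings amounts to joining the q-th smallest atoms at all times. The Python sketch below illustrates only this special case (in general the construction relies on the gluing lemma, see [3, Sect. 5.3]); the marginals come from hypothetical random walks. It checks that the plan has the prescribed marginals and that \(\int \sum _i d(x_i,x_{i+1}){\mathrm {\,d}}\eta _N=\sum _i W_1(\mu _{t^i},\mu _{t^{i+1}})\), while another lift of the same marginals (the labelled walks) can only give a larger value.

```python
import random


def w1_empirical(a, b):
    """W_1 between uniform empirical measures on R with equally many atoms (sorted matching)."""
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def monotone_plan(marginals):
    """eta_N as a list of equally weighted points x = (x_0, ..., x_{2^N}) of X_N.

    Joining the q-th smallest atoms of every marginal: each consecutive
    two-dimensional projection is then the sorted, hence optimal, coupling.
    """
    cols = [sorted(m) for m in marginals]
    return [tuple(col[q] for col in cols) for q in range(len(cols[0]))]


def filling_map(x, N):
    """sigma_x: the step curve equal to x_i on [i/2^N, (i+1)/2^N) and to x_{2^N} at t = 1."""
    return lambda t: x[min(int(t * 2**N), 2**N)]


def mean_path_length(plan):
    """int sum_i d(x_i, x_{i+1}) d(plan) for an equally weighted plan on R^{2^N + 1}."""
    return sum(sum(abs(p[i + 1] - p[i]) for i in range(len(p) - 1)) for p in plan) / len(plan)


rng = random.Random(1)
N, K = 3, 50
# Hypothetical labelled random walks: they provide the marginals and a competing lift.
walks = []
for _ in range(K):
    x, w = rng.gauss(0, 1), []
    for i in range(2**N + 1):
        w.append(x)
        x += rng.gauss(0.1, 0.5)
    walks.append(tuple(w))
marginals = [[w[i] for w in walks] for i in range(2**N + 1)]

eta = monotone_plan(marginals)
sum_w1 = sum(w1_empirical(marginals[i], marginals[i + 1]) for i in range(2**N))
print(mean_path_length(eta), sum_w1, mean_path_length(walks))
```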

Step 1 (Tightness of \(\{\pi _N:N\in {\mathbb {N}}\}\)). It is known that tightness of \(\{\pi _N:N\in {\mathbb {N}}\}\) is equivalent to the existence of a function \(\Phi :L^1(I:X)\rightarrow [0,\infty ]\) whose sublevels \(\lambda _c(\Phi ){:=}\{u\in L^1(I:X):\Phi (u)\le c\}\) are compact in \(L^1(I:X)\) for any \(c\in {\mathbb {R}}_{+}\) and

$$\begin{aligned} \sup _{N\in {\mathbb {N}}} \int _{L^1(I:X)}\Phi (u){\mathrm {\,d}}\pi _N(u)<\infty . \end{aligned}$$
(3.7)

Clearly, \(\{\mu _t:t\in [0,1]\}\) is bounded in \(P_1(X)\), so for fixed \(\bar{x}\in X\)

$$\begin{aligned} C_1{:=}\sup _{t\in I}\int _{X}d(x,\bar{x}){\mathrm {\,d}}\mu _t(x)<\infty . \end{aligned}$$

From Lemma 2.16, \(\{\mu _t:t\in [0,1]\}\) is precompact in \(P_1(X)\), so by Prokhorov’s theorem it is tight, which means that there exists \(\psi :X\rightarrow [0,\infty ]\) whose sublevels \(\lambda _c(\psi ){:=}\{x\in X:\psi (x)\le c\}\) are compact in X for all \(c\in {\mathbb {R}}_{+}\) and

$$\begin{aligned} C_2{:=}\sup _{t\in I} \int _{X}\psi (x){\mathrm {\,d}}\mu _t(x)<\infty . \end{aligned}$$

We claim that the function

$$\begin{aligned} \Phi (u){:=}\int ^1_0d(u(t),\bar{x}){\mathrm {\,d}}t+ \int ^1_0\psi (u(t)){\mathrm {\,d}}t+ \sup _{0<h<1}\int ^{1-h}_0\frac{d(u(t),u(t+h))}{h}{\mathrm {\,d}}t, \end{aligned}$$

on \(L^1(I:X)\) satisfies (3.7).

Firstly, for each \(\pi _N\),

$$\begin{aligned} \int _{L^1(I:X)}\int ^1_0d(u(t),\bar{x})+\psi (u(t)){\mathrm {\,d}}t{\mathrm {\,d}}\pi _N&= \int ^1_0\int _{{{\textbf {X}}}_N}d(\sigma _{{{\textbf {x}}}}(t),\bar{x})+\psi (\sigma _{{{\textbf {x}}}}(t)){\mathrm {\,d}}\eta _N({{\textbf {x}}}){\mathrm {\,d}}t\\&= \sum ^{2^N-1}_{i=0}\int ^{t^{i+1}}_{t^i}\int _X d(x,\bar{x})+\psi (x){\mathrm {\,d}}\mu _{t^i}(x){\mathrm {\,d}}t\\&= \frac{1}{2^N}\sum _{i=0}^{2^N-1}\int _Xd(x,\bar{x})+\psi (x){\mathrm {\,d}}\mu _{t^i}(x) \le C_1+C_2. \end{aligned}$$

Secondly, by Lemma 2.18 (1), for every \(u=\sigma _{{{\textbf {x}}}}\), \({{\textbf {x}}}=(x_0,x_1,\ldots ,x_{2^N})\), we have

$$\begin{aligned} \sup _{0<h<1}\int ^{1-h}_0\frac{d(u(t),u(t+h))}{h}{\mathrm {\,d}}t&=\sup _{0<h<1/2^N}\int ^{1-h}_0\frac{d(u(t),u(t+h))}{h}{\mathrm {\,d}}t\\&=\sup _{0<h<1/2^N}\sum ^{2^N-2}_{i=0}\int ^{t^{i+1}}_{t^{i+1}-h}\frac{d(\sigma _{{{\textbf {x}}}}(t),\sigma _{{{\textbf {x}}}}(t+h))}{h}{\mathrm {\,d}}t\\&= \sum ^{2^N-2}_{i=0}d(x_i,x_{i+1}). \end{aligned}$$

Integrating the above equality over \(\pi _N\), one has

$$\begin{aligned} \int _{L^1(I:X)}\sup _{0<h<1}\int ^{1-h}_0\frac{d(u(t),u(t+h))}{h}{\mathrm {\,d}}t{\mathrm {\,d}}\pi _N(u)&=\int _{{{\textbf {X}}}_N}\sum ^{2^N-2}_{i=0}d(x_i,x_{i+1}){\mathrm {\,d}}\eta _N({{\textbf {x}}})\nonumber \\&=\sum ^{2^N-2}_{i=0}\int _{{{\textbf {X}}}_N}d(x_i,x_{i+1}){\mathrm {\,d}}\eta _N({{\textbf {x}}})\nonumber \\&=\sum ^{2^N-2}_{i=0}W_1(\mu _{t^{i}},\mu _{t^{i+1}}) \nonumber \\&\le \textrm{Var}(\mu ;[0,1))\le |D\mu |([0,1]), \end{aligned}$$
(3.8)

where the last inequality follows from the fact that \((\mu _t)\in D(I:P_1(X))\) (see Lemma 2.5).

Combining the two estimates above, we obtain

$$\begin{aligned} \sup _{N\in {\mathbb {N}}} \int _{L^1(I:X)}\Phi (u){\mathrm {\,d}}\pi _N(u)\le C_1+C_2+|D\mu |([0,1])<\infty , \end{aligned}$$

which proves (3.7). The precompactness criterion in [8, Theorem 2] guarantees that all sublevels \(\{\Phi \le c\}\) are precompact. For the tightness of \(\{\pi _N\}\), it remains to show that all sublevels of \(\Phi \) are closed in \(L^1(I:X)\), i.e., that \(\Phi \) is lower semi-continuous with respect to \(L^1\)-convergence. This is a consequence of Fatou’s Lemma; we detail the argument for the last term of \(\Phi \), the first two terms being analogous since \(\psi \) is lower semi-continuous (its sublevels are closed). Given any \(u_n\rightarrow u\) in \(L^1\) (and it is not restrictive to assume further \(u_n(t)\rightarrow u(t)\) for \({\mathcal {L}}^1\)-a.e. \(t\in I\)), we have

$$\begin{aligned} \sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu(t){\mathrm {\,d}}t&= \sup _{0<h<1}\int ^{1-h}_{0}\liminf _{n\rightarrow \infty }\Delta _hu_n(t){\mathrm {\,d}}t\\&\le \sup _{0<h<1}\liminf _{n\rightarrow \infty }\int ^{1-h}_{0}\Delta _hu_n(t){\mathrm {\,d}}t\\&\le \liminf _{n\rightarrow \infty }\sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu_n(t){\mathrm {\,d}}t. \end{aligned}$$

In conclusion, by Prokhorov’s theorem, there exist \(\pi \in P(L^1(I:X))\) and a subsequence of \((\pi _N)\), which we do not relabel, such that \(\pi _{N}\rightarrow \pi \) narrowly in \(P(L^1(I:X))\) as \(N\rightarrow \infty \).
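The identity used in Step 1 for piecewise-constant curves \(u=\sigma _{{{\textbf {x}}}}\), namely \(\int _0^{1-h}\Delta _hu(t){\mathrm {\,d}}t=\sum _{i=0}^{2^N-2}d(x_i,x_{i+1})\) for every \(0<h<2^{-N}\) (the jump at \(t=1\) is not seen), can be checked on a grid aligned with the dyadic points. The Python sketch assumes X = ℝ² and hypothetical random values \(x_i\); with grid step \(\delta =2^{-N}/M\), \(h=j\delta \) and \(0<j<M\), the left Riemann sum reproduces the sum of consecutive distances up to rounding.

```python
import math
import random


def dist(p, q):
    """Euclidean distance in R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def quotient_integral_step(x, M, j):
    """Left Riemann sum of int_0^{1-h} d(u(t), u(t+h))/h dt for u = sigma_x.

    x = (x_0, ..., x_P) with P = 2^N; grid k/n with n = P*M and h = j/n, 0 < j < M.
    """
    P = len(x) - 1
    n = P * M

    def u(k):  # sigma_x evaluated at t = k/n
        return x[min(k // M, P)]

    # cell width 1/n divided by h = j/n gives the weight 1/j
    return sum(dist(u(k), u(k + j)) for k in range(n - j)) / j


rng = random.Random(2)
N, M = 3, 100
x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2**N + 1)]
expected = sum(dist(x[i], x[i + 1]) for i in range(2**N - 1))  # i = 0, ..., 2^N - 2
for j in (1, 7, 50, 99):
    print(j, quotient_integral_step(x, M, j), expected)
```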

Step 2 (\(\pi \) is concentrated on BV(I : X)). As shown at the end of Step 1, the function

$$\begin{aligned} L^1(I:X)\ni u\mapsto \sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu(t){\mathrm {\,d}}t \end{aligned}$$

is lower semi-continuous and bounded from below. Hence, by the narrow convergence to \(\pi \) of the subsequence of \((\pi _{N})\) obtained in Step 1,

$$\begin{aligned} \int _{L^1(I:X)}\left( \sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu(t){\mathrm {\,d}}t\right) {\mathrm {\,d}}\pi&\le \liminf _{N\rightarrow \infty }\int _{L^1(I:X)}\left( \sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu(t){\mathrm {\,d}}t\right) {\mathrm {\,d}}\pi _N\nonumber \\&\le |D\mu |([0,1])<\infty , \end{aligned}$$
(3.9)

where the second inequality comes from (3.8). Therefore,

$$\begin{aligned} \sup _{0<h<1}\int ^{1-h}_{0}\Delta _hu(t){\mathrm {\,d}}t<\infty ,\quad \text {for } \pi \text {-a.e. } u\in L^1(I:X). \end{aligned}$$

By Theorem 2.17, \(\pi \) is concentrated on the Borel subset \(BV(I:X)\subset L^1(I:X)\). Pushing \(\pi \) forward via the Borel selection map \(T:BV(I:X)\subset L^1\rightarrow D(I:X)\) of Proposition 2.14, we obtain the probability measure

$$\begin{aligned} \tilde{\pi }{:=}T_{\#}\pi \in P(D(I:X)), \end{aligned}$$

which is concentrated on \({\mathcal{B}\mathcal{V}}(I:X)\).

Step 3 (Proof of (ii) and (iii)). Recall that for any BV-function u,

$$\begin{aligned} |Du|((s,t))=\sup _{0< h<t-s}\int _s^{t-h} \Delta _h u(r){\mathrm {\,d}}r,\quad 0\le s<t\le 1. \end{aligned}$$

Repeating the arguments leading to (3.9) on each subinterval \([s,t]\subset I\), we obtain

$$\begin{aligned} \left( \int _{D(I:X)}|D{\tilde{u}}|{\mathrm {\,d}}\tilde{\pi }({\tilde{u}})\right) ((s,t))&=\int _{L^1(I:X)}|D(T(u))|((s,t)){\mathrm {\,d}}\pi (u)\\&=\int _{L^1(I:X)}\sup _{0< h<t-s}\int _s^{t-h} \Delta _h u(r){\mathrm {\,d}}r{\mathrm {\,d}}\pi (u)\\&\le \liminf _{N\rightarrow \infty }\int _{L^1(I:X)}\left( \sup _{0<h<t-s}\int ^{t-h}_{s}\Delta _hu(r){\mathrm {\,d}}r\right) {\mathrm {\,d}}\pi _N(u)\\&\le |D\mu |([s,t]). \end{aligned}$$

Once (ii) is established, this estimate together with Theorem 3.1 yields (iii).

Finally, for (ii), fix test functions \(\varphi \in C_b(X)\) and \(\xi \in C_b(I)\). Noting that \(u\mapsto \int _{[0,1]}\xi (t)\varphi (u(t)){\mathrm {\,d}}t\) is continuous and bounded on \(L^1(I:X)\), and that \((e_t)_{\#}\pi _N=\mu _{t^i}\) for each \(N\in {\mathbb {N}}\) and \(t\in [t^i,t^{i+1})\), we have

$$\begin{aligned} \int _{D(I:X)}\int _0^1\xi (t)\varphi (u(t)){\mathrm {\,d}}t {\mathrm {\,d}}\tilde{\pi }(u)&=\int _{L^1(I:X)}\int _0^1\xi (t)\varphi (u(t)){\mathrm {\,d}}t {\mathrm {\,d}}\pi (u) \nonumber \\&=\lim _{N\rightarrow \infty }\int _{L^1(I:X)}\int _0^1\xi (t)\varphi (u(t)){\mathrm {\,d}}t {\mathrm {\,d}}\pi _{N}(u)\nonumber \\&=\lim _{N\rightarrow \infty }\int _0^1\xi (t)\int _{L^1(I:X)}\varphi (u(t)) {\mathrm {\,d}}\pi _{N}(u){\mathrm {\,d}}t\nonumber \\&=\lim _{N\rightarrow \infty }\sum ^{2^N-1}_{i=0}\int ^{t^{i+1}}_{t^i}\xi (t)\int _X\varphi (x){\mathrm {\,d}}(e_t)_{\#}\pi _N(x){\mathrm {\,d}}t\nonumber \\&=\lim _{N\rightarrow \infty }\sum ^{2^N-1}_{i=0}\int ^{t^{i+1}}_{t^i}\xi (t)\int _X\varphi (x){\mathrm {\,d}}\mu _{t^i}(x){\mathrm {\,d}}t\nonumber \\&= \lim _{N\rightarrow \infty }\sum ^{2^N-1}_{i=0}\frac{1}{2^N}\xi (t^i)\int _X\varphi (x){\mathrm {\,d}}\mu _{t^i}(x), \end{aligned}$$
(3.10)

where the last equality follows from the uniform continuity of \(\xi \) on the compact interval I and the boundedness of \(\varphi \).

Since \(t\mapsto \mu _t\in D(I:P_1(X))\), the function

$$\begin{aligned} t\mapsto \xi (t)\int _X\varphi (x){\mathrm {\,d}}\mu _t(x) \end{aligned}$$

is in \(D(I:{\mathbb {R}})\), so it is bounded and continuous outside a countable set and, in particular, Riemann integrable. As a result, the limit of the Riemann sums in (3.10) equals \(\int \int \xi (t)\varphi (x){\mathrm {\,d}}\mu _t{\mathrm {\,d}}t\). Since \(\xi \) is arbitrary, we deduce that

$$\begin{aligned} \int _{D(I:X)}\varphi (u(t)){\mathrm {\,d}}\tilde{\pi }(u)=\int \varphi (x){\mathrm {\,d}}\mu _t \end{aligned}$$
(3.11)

for \({\mathcal {L}}^1\)-a.e. \(t\in [0,1]\). By Proposition 2.15, the function

$$\begin{aligned} t\mapsto \int _{D(I:X)} \varphi (u(t)){\mathrm {\,d}}\tilde{\pi }(u) \end{aligned}$$

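is right-continuous, and so is \(t\mapsto \int _X\varphi {\mathrm {\,d}}\mu _t\), since \(t\mapsto \mu _t\) is right-continuous in \(P_1(X)\) and convergence in \(W_1\) implies weak convergence. Therefore, (3.11) holds for all \(t\in [0,1)\) and \(\varphi \in C_b(X)\), which implies \((e_t)_\#\tilde{\pi }=\mu _t\) for all \(t\in [0,1)\).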
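The passage from the Riemann sums in (3.10) to the integral only uses that a càdlàg integrand is Riemann integrable, so the dyadic left-endpoint sums converge even across jumps. A minimal numerical sketch, assuming the dyadic points \(t^i=i2^{-N}\) (consistent with the weights \(1/2^N\) in (3.10)) and a hypothetical integrand with two jumps standing in for \(t\mapsto \xi (t)\int _X\varphi {\mathrm {\,d}}\mu _t\):

```python
import math


def integrand(t):
    """Hypothetical cadlag integrand on [0, 1]: cos(t) times a right-continuous
    step function with jumps at t = 1/3 and t = 0.7."""
    if t < 1 / 3:
        step = 0.0
    elif t < 0.7:
        step = 2.0
    else:
        step = -1.0
    return math.cos(t) * step


def dyadic_left_sum(f, N):
    """Left-endpoint Riemann sum 2^{-N} * sum_i f(t^i), t^i = i / 2^N, i = 0, ..., 2^N - 1."""
    n = 2 ** N
    return sum(f(i / n) for i in range(n)) / n


# Exact integral over [0, 1], computed piecewise from the antiderivative sin(t).
exact = 2.0 * (math.sin(0.7) - math.sin(1 / 3)) - (math.sin(1.0) - math.sin(0.7))

for N in (4, 8, 12, 16):
    print(N, abs(dyadic_left_sum(integrand, N) - exact))
```

The error decays roughly like \(2^{-N}\): each jump contributes at most its size times \(2^{-N}\), and the continuous part contributes a term of the same order.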

Step 4 (Marginal constraint at \(t=1\)). If \(t\mapsto \mu _t\) is left-continuous at \(t=1\), i.e. \(\mu _{1^-}= \mu _{1}\), we can modify the curves on which \(\tilde{\pi }\) is concentrated so that they are left-continuous at \(t=1\). Then both sides of (3.11) depend continuously on t near \(t=1\), which yields \((e_{1})_{\#}\tilde{\pi }=\mu _1\).

If instead \(\mu _{1^-}\ne \mu _{1}\), take \(\sigma \in \textrm{Opt}(\mu _{1^-},\mu _1)\) and define the measure \(\hat{\pi }{:=}C_{\#}\tilde{\pi }\) by performing the continuation at \(t=1\), where

$$\begin{aligned} C:D(I:X)\rightarrow D(I:X),\quad C(u)(t){:=}\left\{ \begin{array}{ll} u(t),&{} t\in [0,1)\\ \lim _{s\nearrow 1}u(s),&{} t=1 \end{array}\right. . \end{aligned}$$

Denote by \(D_1(I:X)\subset D(I:X)\) the closed subset of all càdlàg curves that are left-continuous at \(t=1\). There is a natural Borel isomorphism between \(D_1(I:X)\) and D([0, 1) : X); therefore, we may regard D(I : X) as the product of the Polish spaces D([0, 1) : X) and X. In this way, \(\hat{\pi }\in P(D([0,1):X)\times X)\) with \(\textrm{Pr}^2_\#\hat{\pi }=\mu _{1^-}\), and thus, by the gluing lemma (see e.g. [2, Lemma 3.1]), it can be glued together with \(\sigma \) along X to obtain a probability measure \(\omega \) on \(D([0,1):X)\times X\times X\). The projection \(\textrm{Pr}^{1,3}_{\#}\omega \) of \(\omega \) onto the first and third factors is the desired \(\tilde{\pi }\) (with a slight abuse of notation, still denoted by \(\tilde{\pi }\)).
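For finitely supported measures, the gluing step reduces to an explicit formula: disintegrate both measures with respect to the common factor and multiply the conditional laws, which is also how the general gluing lemma is proved. The sketch below is only a finite illustration, with hypothetical curve labels standing in for elements of D([0, 1) : X) and a hypothetical coupling \(\sigma \):

```python
from collections import defaultdict


def marginal(measure, coords):
    """Push a finitely supported measure {point_tuple: weight} forward under a coordinate projection."""
    out = defaultdict(float)
    for point, weight in measure.items():
        out[tuple(point[i] for i in coords)] += weight
    return dict(out)


def same_measure(p, q, tol=1e-12):
    """Compare two finitely supported measures up to a tolerance."""
    return all(abs(p.get(k, 0.0) - q.get(k, 0.0)) <= tol for k in set(p) | set(q))


def glue(pi_hat, sigma):
    """Glue pi_hat on A x B with sigma on B x C along B:
    omega(a, b, c) = pi_hat(a, b) * sigma(b, c) / m(b), m being the common B-marginal."""
    m = marginal(pi_hat, (1,))
    if not same_measure(m, marginal(sigma, (0,))):
        raise ValueError("the B-marginals of pi_hat and sigma must agree")
    omega = {}
    for (a, b), w1 in pi_hat.items():
        for (b2, c), w2 in sigma.items():
            if b2 == b:
                omega[(a, b, c)] = omega.get((a, b, c), 0.0) + w1 * w2 / m[(b,)]
    return omega


# Hypothetical data: two curve labels with their left limits at t = 1 (law mu_{1-}),
# and a coupling sigma of mu_{1-} with mu_1.
pi_hat = {("u1", 0.0): 0.5, ("u2", 1.0): 0.5}
sigma = {(0.0, 0.0): 0.25, (0.0, 2.0): 0.25, (1.0, 2.0): 0.5}

omega = glue(pi_hat, sigma)
new_pi = marginal(omega, (0, 2))  # analogue of Pr^{1,3}_# omega
print(new_pi)  # {('u1', 0.0): 0.25, ('u1', 2.0): 0.25, ('u2', 2.0): 0.5}
```

By construction, the (first, second) marginal of \(\omega \) is \(\hat{\pi }\) and the (second, third) marginal is \(\sigma \), which is exactly what the argument uses.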

Clearly, the new \(\tilde{\pi }\) satisfies (ii) and (3.5). Moreover, since \(\textrm{Pr}^{2,3}_{\#}\omega =\sigma \in \textrm{Opt}(\mu _{1^{-}},\mu _1)\) by construction, we have

$$\begin{aligned} W_1(\mu _{1^{-}},\mu _1)=\int d(\gamma _{1^{-}},\gamma _1){\mathrm {\,d}}\tilde{\pi }(\gamma ). \end{aligned}$$

Combined with Lemma 2.5, this yields

$$\begin{aligned} \textrm{Var}(\mu ;I)=\int \textrm{Var}(\gamma ;I){\mathrm {\,d}}\tilde{\pi }(\gamma ). \end{aligned}$$

Proof of the final claim. Suppose first that \((\mu _t)\) is absolutely continuous. Then, in particular, it is a curve of bounded variation, so (i)–(iii) already hold. To derive (3.6), observe that for \({\mathcal {L}}^1\)-a.e. \(t\in I\)

$$\begin{aligned} |{\dot{\mu }}|(t)&= \lim _{h \rightarrow 0 } \frac{1}{|h|}\int _{t}^{t+h} |{\dot{\mu }}|(s) {\mathrm {\,d}}s \\&= \lim _{h \rightarrow 0 } \frac{1}{|h|} |D\mu |([t,t+h]) \\&= \lim _{h \rightarrow 0 } \int \frac{ |D\gamma |([t,t+h])}{|h|} {\mathrm {\,d}}\pi (\gamma ) \\&\ge \lim _{h \rightarrow 0 } \int \frac{ d (\gamma _{t}, \gamma _{t+h})}{|h|} {\mathrm {\,d}}\pi (\gamma ) \\ {}&\ge \lim _{h \rightarrow 0 } \frac{W_1(\mu _t,\mu _{t+h})}{|h|} {=:}|{\dot{\mu }}_t| = |{\dot{\mu }}|(t) \end{aligned}$$

where in the third step, we used (3.5). Therefore, the statement follows whenever \((\mu _t)\) is absolutely continuous.

We claim that the argument above also works in the non-absolutely continuous case. Indeed, Lemma 2.8 guarantees that \(|{\dot{\mu }}|(t)\) coincides with the expression in the second line for \({\mathcal {L}}^1\)-a.e. t, as well as the validity of the last equality (recall the notation explained in Sect. 2.2). Absolute continuity was not used in the remaining steps. Hence, the conclusion follows. \(\square \)

3.2 Remarks on the main results

In this section, we shed light on the main results by providing some examples.

First of all, note that in general we cannot expect uniqueness of \(\tilde{\pi }\) in Theorem 3.3. Indeed, uniqueness fails even in Lisini’s original result [8, Theorem 5] for \(p>1\). However, when \(p>1\), there are cases where the lift is unique: for instance, when the underlying space is non-branching, the lift of any constant-speed geodesic must be unique (see e.g. [2, Proposition 3.16]). The following example illustrates that in the case \(p=1\), where the cost lacks strict convexity, uniqueness fails in general, even in Euclidean spaces.

Example 3.4

(Nonuniqueness of lifts) Let \(\mu _0\) and \(\mu _1\) be two probability measures on \({\mathbb {R}}\) supported inside \([-2,-1]\) and [1, 2], respectively. Clearly, \(t\mapsto \mu _t{:=}(1-t)\mu _0+t\mu _1\) is a constant-speed geodesic with respect to \(W_1\), and every coupling between \(\mu _0\) and \(\mu _1\) is optimal, since all mass is transported to the right and hence every coupling has the same cost \(\int y{\mathrm {\,d}}\mu _1(y)-\int x{\mathrm {\,d}}\mu _0(x)\). For any coupling induced by a Borel map T, define a family of curves labelled by \(\alpha \in [0,1]\) and \(x \in \text {supp}(\mu _0)\) as follows:

$$\begin{aligned} t \mapsto \gamma _t^{(\alpha ,x)} {:=}x\mathbbm {1}_{[0,\alpha )}(t)+T(x)\mathbbm {1}_{[\alpha ,1]}(t). \end{aligned}$$

This is in fact a generalized version of the construction in Example 1.1 (in which we had \(\text {supp}(\mu _0) =\{ 0\}\) and \(T(x) = 1\)). Now, similar to that example, one can check that the measure

$$\begin{aligned} \pi {:=}(\gamma ^{(\cdot ,\cdot )})_{\#}({\mathcal {L}} |_{[0,1]} \otimes \mu _0 ) \end{aligned}$$

satisfies (i)–(iii) in the theorem above. Whenever there are at least two distinct transport maps (e.g. when \(\mu _0\) and \(\mu _1\) are uniform distributions), the lift \(\pi \in P (D(I:X))\) of \((\mu _t)\) is no longer unique.
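Example 3.4 can be tested on a grid. Since every coupling costs \(\int y{\mathrm {\,d}}\mu _1-\int x{\mathrm {\,d}}\mu _0\), for the uniform measures \(\mu _0\) on \([-2,-1]\) and \(\mu _1\) on [1, 2] (our assumed choice) we get \(W_1(\mu _0,\mu _1)=1.5-(-1.5)=3\). The sketch below compares two transport maps, the shift \(x\mapsto x+3\) and the reflection \(x\mapsto -x\) (also our choice): both discretized lifts have the same time marginals and the same average variation 3, but different endpoint couplings.

```python
def make_lift(T, n=200, m=200):
    """Discretize pi = (gamma^{(.,.)})_#(L|[0,1] x mu_0), with mu_0 = Uniform[-2,-1] (assumed),
    on equal-weight midpoint grids in x and alpha. A curve is stored as (alpha, x, T(x))."""
    xs = [-2.0 + (k + 0.5) / n for k in range(n)]
    alphas = [(j + 0.5) / m for j in range(m)]
    return [(a, x, T(x)) for x in xs for a in alphas]


def position(curve, t):
    """gamma_t^{(alpha,x)} = x on [0, alpha) and T(x) on [alpha, 1]."""
    a, x, tx = curve
    return x if t < a else tx


def mean_total_variation(curves):
    """Each curve has exactly one jump, of size |T(x) - x|, so |D gamma|(I) = |T(x) - x|."""
    return sum(abs(tx - x) for _, x, tx in curves) / len(curves)


def endpoint_covariance(curves):
    """Covariance of (gamma_0, gamma_1) under the discretized lift."""
    g0 = [position(c, 0.0) for c in curves]
    g1 = [position(c, 1.0) for c in curves]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return sum((a - m0) * (b - m1) for a, b in zip(g0, g1)) / len(g0)


def shift(x):
    return x + 3.0  # monotone map from [-2,-1] onto [1,2]


def flip(x):
    return -x  # anti-monotone map from [-2,-1] onto [1,2]


for T in (shift, flip):
    curves = make_lift(T)
    at_t = [position(c, 0.3) for c in curves]
    print(T.__name__,
          sum(at_t) / len(at_t),                    # mean of mu_0.3, approximately -0.6
          sum(p >= 1.0 for p in at_t) / len(at_t),  # share already in [1,2], 0.3
          mean_total_variation(curves),             # approximately W_1(mu_0, mu_1) = 3
          endpoint_covariance(curves))              # positive for shift, negative for flip
```

Equal average variation means both lifts attain equality in (3.5), while the opposite signs of the endpoint covariance show that the two path measures differ.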

Another natural question is to what extent an AC-curve in the 1-Wasserstein space admits lifts concentrated on AC-curves satisfying the optimality condition (3.5). Recall that Example 1.1 already provides an extreme case where no lift on continuous curves is possible. The example below shows that the existence of lifts on AC-curves does not guarantee the existence of an optimal one.

Example 3.5

(Non-optimality of AC lifts) Take an injective curve \(\gamma \in {\mathcal{A}\mathcal{C}}(I:{\mathbb {R}}^2)\) of unit length, parametrized by arc length, and assume that \(\gamma \) is not length-minimizing, i.e., \(d(\gamma _0,\gamma _1)<1\). Consider \((\mu _t) \subset P_1({\mathbb {R}}^2)\) defined as

$$\begin{aligned} \mu _t {:=}\frac{1}{2} \Big ( (1-t) \delta _{\gamma _0} + t \delta _{\gamma _1}\Big ) + \frac{1}{2} {\mathcal {H}}^1|_{\gamma }, \quad t \in [0,1], \end{aligned}$$

where \({\mathcal {H}}^1|_{\gamma }\) is the 1-dimensional Hausdorff measure restricted to the image of \(\gamma \). First, it can be readily checked that \((\mu _t)\) is a constant-speed geodesic. Second, there exists a lift \(\pi \) of \((\mu _t)\) concentrated on AC-curves. For example, take the two families of AC-curves, labelled by \(\alpha \in [0,1]\),

$$\begin{aligned} \gamma ^{(1,\alpha )}(t){:=}\left\{ \begin{array}{ll} \gamma (0), &{} t\le \alpha \\ \gamma (t-\alpha ),&{} t>\alpha \end{array}\right. ,\quad \gamma ^{(2,\alpha )}(t){:=}\left\{ \begin{array}{ll} \gamma (t+\alpha ), &{} t\le 1-\alpha \\ \gamma (1),&{} t>1-\alpha \end{array}\right. . \end{aligned}$$

Then \(\pi {:=}\frac{1}{2}\gamma ^{(1,\cdot )}_{\#}{\mathcal {L}}^1|_{[0,1]}+\frac{1}{2}\gamma ^{(2,\cdot )}_{\#}{\mathcal {L}}^1|_{[0,1]}\) can be checked to satisfy \((e_t)_{\#}\pi =\mu _t\) for each \(t\in [0,1]\). However, \(|{\dot{\mu }}_t| < \int |{\dot{\gamma }}_t| {\mathrm {\,d}}\pi (\gamma )\). In fact, no lift \(\pi \) concentrated on AC-curves can achieve equality in (3.5): if \((\mu _t)\) were optimally transported along continuous curves, these curves would have to be length-minimizing, whereas (almost all) curves charged by \(\pi \) must lie inside the image of \(\gamma \), as each \(\textrm{supp}(\mu _t)\) does.
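The two speed computations behind Example 3.5 are short. Assuming, as in the construction of \(\pi \), that \(\gamma \) is parametrized by arc length, the curve \(\gamma ^{(1,\alpha )}\) moves with unit speed at time \(t\in (0,1)\) exactly when \(\alpha <t\), and \(\gamma ^{(2,\alpha )}\) exactly when \(\alpha <1-t\). Together with the Kantorovich–Rubinstein duality (4.7), this gives, for all \(s,t\in [0,1]\),

$$\begin{aligned} \mu _t-\mu _s&=\tfrac{1}{2}(t-s)\big (\delta _{\gamma _1}-\delta _{\gamma _0}\big ),\qquad W_1(\mu _s,\mu _t)=\tfrac{1}{2}|t-s|\sup _{\Vert \psi \Vert _{\textrm{Lip}}\le 1}\big |\psi (\gamma _1)-\psi (\gamma _0)\big |=\tfrac{1}{2}|t-s|\,d(\gamma _0,\gamma _1),\\ \int |{\dot{\gamma }}_t|{\mathrm {\,d}}\pi (\gamma )&=\tfrac{1}{2}\,{\mathcal {L}}^1(\{\alpha \in [0,1]:\alpha <t\})+\tfrac{1}{2}\,{\mathcal {L}}^1(\{\alpha \in [0,1]:\alpha <1-t\})=\tfrac{1}{2}t+\tfrac{1}{2}(1-t)=\tfrac{1}{2}. \end{aligned}$$

Hence \((\mu _t)\) is a constant-speed geodesic with \(|{\dot{\mu }}_t|=\frac{1}{2}d(\gamma _0,\gamma _1)<\frac{1}{2}=\int |{\dot{\gamma }}_t|{\mathrm {\,d}}\pi (\gamma )\) for \({\mathcal {L}}^1\)-a.e. t, because \(d(\gamma _0,\gamma _1)<1\).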

Observe that if \(\gamma \) in the example above is length-minimizing, then the constructed lift is indeed optimal; in this case, all measures \(\mu _t\) live in a convex set. One may ask whether the strategy above yields an optimal lift concentrated on AC-curves if all \(\mu _t\) are required to be fully supported on a common convex domain. We further add the assumption \(\mu _t=\rho _t{\mathcal {L}}^n\) of absolute continuity of the measures, in an attempt to exclude the teleporting phenomenon, which appears for instance when the one-dimensional Hausdorff measure in Example 3.5 is replaced by a higher-dimensional one. However, as shown below, these seemingly reasonable assumptions (even with uniform bounds on the densities \(\rho _t\)) again fail to guarantee a lift on AC-curves, at least in higher dimensions. This once more emphasizes that it is necessary to relax the classical notion of lift and consider a larger class of curves.

Example 3.6

Consider a curve of probability measures on \({\mathbb {R}}^2\) defined as

$$\begin{aligned} \mu _t{:=}\frac{1}{2}\rho _t{\mathcal {L}}^2+\frac{1}{2}{\mathcal {L}}^2|_{[0,1]\times [0,1]}, \quad t \in [0,1], \end{aligned}$$
(3.12)

where the density \(\rho _t:{\mathbb {R}}^2\rightarrow {\mathbb {R}}\) at time t is defined as

$$\begin{aligned} \rho _t{:=}\left\{ \begin{array}{ll} \frac{1}{\varepsilon }, &{} \quad \mathrm {if\ } x\in [0,\varepsilon ]\times [t,1]\bigcup [1-\varepsilon ,1]\times [0,t], \\ 0 &{} \quad \textrm{otherwise}. \end{array}\right. \end{aligned}$$

Here \(0<\varepsilon <\frac{1}{2}\) is a fixed parameter. The measure \(\mu _t\) is shown in Fig. 2 (left). Since the curve \(t\mapsto \rho _t{\mathcal {L}}^2\) is a constant-speed 1-Wasserstein geodesic, so is the curve \((\mu _t)\). While \((\mu _t)\) has infinitely many lifts, all of them transport mass only horizontally (up to a negligible set of curves). This reduces the study of a possible lift to a one-dimensional problem for the following measures on [0, 1]. Given any \(y\in [0,1]\) (corresponding to the slices of the measures \(\mu _t\) at height y), we define a curve of probability measures on \({\mathbb {R}}\) as

$$\begin{aligned} \mu _t^y{:=}{\left\{ \begin{array}{ll} \frac{1}{2\varepsilon }{\mathcal {L}}^1|_{[0,\varepsilon ]}+\frac{1}{2}{\mathcal {L}}^1|_{[0,1]},&{}\quad \mathrm {if\ } t\le y ;\\ \frac{1}{2\varepsilon }{\mathcal {L}}^1|_{[1-\varepsilon ,1]}+\frac{1}{2}{\mathcal {L}}^1|_{[0,1]},&{}\quad \mathrm {if\ } t> y; \end{array}\right. } \end{aligned}$$
(3.13)

as shown in Fig. 2 (right). Consider now any lift \(\pi ^y\) of \((\mu ^y_t)\) as in Theorem 3.3. Since \(|D\mu ^y|=|D\mu ^y|^J=\frac{1-\varepsilon }{2}\delta _y\), (3.5) yields \(\int |D\gamma |(\{y\}){\mathrm {\,d}}\pi ^y(\gamma )=\frac{1-\varepsilon }{2}\). As \(\pi ^y\)-almost every curve takes values in [0, 1], its jump at \(t=y\) has size at most 1, so \(\pi ^y(\Gamma _y)\ge \frac{1-\varepsilon }{2}>0\), where \(\Gamma _y\) is the collection of curves that jump at \(t=y\). We conclude that any optimal lift \(\pi \) of \((\mu _t)\) as in Theorem 3.3 assigns positive mass to the set of non-absolutely continuous curves.
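The jump size \(\frac{1-\varepsilon }{2}\) of \((\mu ^y_t)\) at \(t=y\) can be double-checked numerically with the standard one-dimensional identity \(W_1(\mu ,\nu )=\int |F_\mu -F_\nu |{\mathrm {\,d}}x\) for cumulative distribution functions. A minimal sketch (function names are our own; \(\varepsilon =1/4\) matches Fig. 2):

```python
def clip01(s):
    """Clamp s to [0, 1]."""
    return min(max(s, 0.0), 1.0)


def cdf_before(x, eps):
    """CDF of mu_t^y for t <= y: (1/(2 eps)) L^1|[0, eps] + (1/2) L^1|[0, 1]."""
    return 0.5 * clip01(x / eps) + 0.5 * clip01(x)


def cdf_after(x, eps):
    """CDF of mu_t^y for t > y: (1/(2 eps)) L^1|[1 - eps, 1] + (1/2) L^1|[0, 1]."""
    return 0.5 * clip01((x - (1.0 - eps)) / eps) + 0.5 * clip01(x)


def w1_via_cdfs(F, G, n=20000):
    """W_1 on the real line is the integral of |F - G|; both measures live in [0, 1],
    so a midpoint rule on [0, 1] suffices."""
    h = 1.0 / n
    return h * sum(abs(F((k + 0.5) * h) - G((k + 0.5) * h)) for k in range(n))


def jump_size(eps):
    return w1_via_cdfs(lambda x: cdf_before(x, eps), lambda x: cdf_after(x, eps))


for eps in (0.1, 0.25):
    print(eps, jump_size(eps), (1.0 - eps) / 2.0)
```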

Fig. 2

(Example 3.6) Left image shows the measure \(\mu _t\) in (3.12) for \(\varepsilon =1/4\) at time \(t=2/5\). Right image shows the conditional measure \(\mu _t^y\) in (3.13) given \(y=3/5\) at different times

4 Applications

4.1 Characterization of BV-curves

An immediate consequence of combining Theorem 3.1 and Theorem 3.3 is that one can characterize BV-curves in 1-Wasserstein spaces:

Corollary 4.1

(Characterization of BV-curves in 1-Wasserstein spaces) Let \((\mu _t)\subset P(X)\) with \(\mu _0 \in P_1(X)\). Then \((\mu _t) \in {\mathcal{B}\mathcal{V}}(I:P_1(X))\) if and only if there exists \(\pi \in P(D(I:X))\) so that

  1. (i)

    \(\pi \) is concentrated on \({\mathcal{B}\mathcal{V}}(I:X)\subset D(I:X)\);

  2. (ii)

    \((e_t)_\#\pi =\mu _t\) for all \(t\in [0,1]\);

  3. (iii)

    \( \int |D \gamma | (I) {\mathrm {\,d}}\pi < \infty . \)

However, characterizing AC-curves in the 1-Wasserstein space via their lifts remains challenging. A naive extension of the characterization for \(p>1\) to the case \(p=1\) would lead to the condition \(\int \int _{I} |{\dot{\gamma }}_t| {\mathrm {\,d}}t {\mathrm {\,d}}\pi < \infty \), which is well defined since the metric derivative of a BV-curve still exists \({\mathcal {L}}^1\)-a.e. However, this condition does not even guarantee continuity, let alone absolute continuity; in fact, it is already implied by (iii). Recall Example 1.1, where any lift of the absolutely continuous curve \((\mu _t)\) is concentrated on discontinuous curves. Continuous curves of bounded variation, on the other hand, are easy to characterize, and the following observation is useful for this purpose:

Proposition 4.2

Under the assumptions of Theorem 3.3, we have for all \(t\in I\)

$$\begin{aligned} |D\mu |^{J} ( \{t\} ) = \int |D\gamma |^{J} ( \{t\} ) {\mathrm {\,d}}\pi (\gamma ). \end{aligned}$$
(4.1)

Proof

By considering the atomic part in the Lebesgue decomposition of the variation measure and using Eq. (3.5), we obtain

$$\begin{aligned} |D\mu |^{J} ( \{t\} ) = |D\mu | ( \{t\} ) = \int |D\gamma | ( \{t\} ) {\mathrm {\,d}}\pi (\gamma ) = \int |D\gamma |^{J} ( \{t\} ) {\mathrm {\,d}}\pi . \end{aligned}$$

\(\square \)

Equation (4.1) simply means that the jump size in the 1-Wasserstein space is the average of the jumps in the underlying space. In particular, if \(\mu _t\) jumps at time t (i.e. the left-hand side of (4.1) is non-zero), then at least some curves must jump at this time as well. This does not contradict the observation in Example 1.1: even though all the underlying curves in that example jump, \((\mu _t)\) does not, since at any given time t only one curve jumps, and a single curve has measure zero under the lift. As a result, a BV-curve \((\mu _t)\) is continuous if and only if, for every \(t\in I\), the set of curves that jump at t has \(\pi \)-measure zero. This is formally stated in the corollary below:

Corollary 4.3

(Characterization of continuous BV-curves in 1-Wasserstein spaces) We have \((\mu _t) \in {\mathcal{B}\mathcal{V}}(I:P_1(X)) \cap C(I:P_1(X))\) if and only if there exists \(\pi \in P(D(I:X))\) that in addition to (i)–(iii) of Corollary 4.1, satisfies

  1. (iv)

    for all \(t\in I\)

    $$\begin{aligned} \int |D\gamma |(\{t\}){\mathrm {\,d}}\pi (\gamma )=0. \end{aligned}$$

4.2 Characterization of geodesics

Here we apply the main theorem to study geodesics in 1-Wasserstein spaces. To this end, we shall consider a relaxed notion of geodesic:

Definition 4.4

(\({\mathcal{B}\mathcal{V}}\)-geodesics) We call \(\gamma \in D([0,1]:X)\) a \({\mathcal{B}\mathcal{V}}\)-geodesic with respect to distance d if \(d(\gamma _0,\gamma _1)=|D \gamma |([0,1])\).
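In \(X={\mathbb {R}}\), a càdlàg curve is a \({\mathcal{B}\mathcal{V}}\)-geodesic exactly when it is monotone. A sampled check can only certify failure: the variation along a partition is a lower bound for \(|D\gamma |([0,1])\) and is always at least \(|\gamma _1-\gamma _0|\), so a strict gap on the samples rules out Definition 4.4. A minimal sketch (function names are hypothetical):

```python
def partition_variation(values):
    """Variation of a real-valued path along the sampled times: sum of |increments|."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


def passes_bv_geodesic_test(values, tol=1e-12):
    """Necessary condition for Definition 4.4 with X = R and d(x, y) = |x - y|:
    the partition variation (a lower bound for |D gamma|([0, 1])) must equal |gamma_1 - gamma_0|."""
    return abs(partition_variation(values) - abs(values[-1] - values[0])) <= tol


step_path = [0.0, 0.0, 0.0, 1.0, 1.0]    # single upward jump: monotone, passes
wiggly_path = [0.0, 1.0, 0.5, 1.0, 1.0]  # moves back down: fails
print(passes_bv_geodesic_test(step_path), passes_bv_geodesic_test(wiggly_path))  # True False
```

The step path is a \({\mathcal{B}\mathcal{V}}\)-geodesic that is not continuous, which is precisely the kind of curve the relaxed definition is meant to admit.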

Theorem 4.5

(Characterization of geodesics in 1-Wasserstein spaces) Let \((\mu _t) \subset P(X)\) with \(\mu _0 \in P_1(X)\). Then \((\mu _t)\) is

  • \({\mathcal{B}\mathcal{V}}\)-geodesic with respect to \(W_1\) distance if and only if there exists \(\pi \in P(D(I:X))\) so that

    1. (i)

      \(\pi \) is concentrated on the set of \({\mathcal{B}\mathcal{V}}\)-geodesics;

    2. (ii)

      \((e_t)_\#\pi =\mu _t\) for all \(t \in I\);

    3. (iii)

      \(W_1(\mu _0,\mu _1)=\int d(\gamma _0,\gamma _1){\mathrm {\,d}}\pi (\gamma )< \infty \).

  • Continuous and length-minimizing if and only if, in addition to (i)–(iii), \(\pi \) satisfies

    1. (iv)

      for all \(t\in I\)

      $$\begin{aligned} \int |D\gamma |(\{t\}){\mathrm {\,d}}\pi (\gamma )=0. \end{aligned}$$
  • Constant-speed geodesic if and only if in addition to (i)–(iii), \(\pi \) satisfies

    1. (v)

      for \({\mathcal {L}}^1\)-a.e. \(t\in I\)

      $$\begin{aligned} \lim _{h \rightarrow 0 } \int \frac{ d (\gamma _{t}, \gamma _{t+h})}{|h|} {\mathrm {\,d}}\pi (\gamma ) = W_1(\mu _0, \mu _1). \end{aligned}$$

Proof

Suppose first that \(\pi \) satisfies (i)–(iii). Then by Theorem 3.1 we have that

$$\begin{aligned} W_1(\mu _0,\mu _1)\le |D\mu |([0,1])\le \int |D\gamma |([0,1]) {\mathrm {\,d}}\pi (\gamma )=\int d(\gamma _0,\gamma _1){\mathrm {\,d}}\pi (\gamma )=W_1(\mu _0,\mu _1). \end{aligned}$$

Hence, all the above inequalities are actually equalities, and thus \((\mu _t)\) is \({\mathcal{B}\mathcal{V}}\)-geodesic.

Suppose now that \((\mu _t)\) is a \({\mathcal{B}\mathcal{V}}\)-geodesic, and let \(\pi \) be given by Theorem 3.3. Then (ii) holds and we have

$$\begin{aligned} W_1(\mu _0,\mu _1)=|D\mu |([0,1])=\int |D\gamma |([0,1]){\mathrm {\,d}}\pi (\gamma )\ge \int d(\gamma _0,\gamma _1){\mathrm {\,d}}\pi (\gamma )\ge W_1(\mu _0,\mu _1), \end{aligned}$$

which proves (iii). Furthermore, since the first inequality is due to the pointwise inequality \(|D\gamma |([0,1])\ge d(\gamma _0,\gamma _1)\), we can actually conclude that there has to be pointwise equality for \(\pi \)-almost every curve \(\gamma \), and hence \(\pi \) is concentrated on \({\mathcal{B}\mathcal{V}}\)-geodesics.

The claim about continuous and length-minimizing curves follows immediately from Proposition 4.2. As for the characterization of constant-speed geodesics, thanks to the equality (3.6), it is enough to show sufficiency, i.e. that (i)–(iii) and (v) imply that \((\mu _t)\) is a constant-speed geodesic. By Theorem 3.3 and (iii), we always have

$$\begin{aligned} \int \lim _{h \rightarrow 0 } \int \frac{ d (\gamma _{t}, \gamma _{t+h})}{|h|} {\mathrm {\,d}}\pi (\gamma ){\mathrm {\,d}}t&=\int |\dot{\mu }|(t){\mathrm {\,d}}t \\&\le |D\mu |([0,1]) =\int |D\gamma |([0,1]){\mathrm {\,d}}\pi (\gamma )\\&=\int d(\gamma _0,\gamma _1){\mathrm {\,d}}\pi (\gamma )=W_1(\mu _0,\mu _1). \end{aligned}$$

Hence, once (v) holds, equality holds everywhere in the above and in particular,

$$\begin{aligned} |D\mu |=W_{1}(\mu _0,\mu _1){\mathcal {L}}^1|_{[0,1]} \end{aligned}$$

which means that \((\mu _t)\) has to be a constant-speed geodesic. \(\square \)

4.3 Regularity of the curves in superposition

Example 1.1 illustrates that the superposition of a family of discontinuous curves can produce an absolutely continuous Wasserstein curve. One could naturally ask similar questions, e.g., is it possible to get an absolutely continuous curve from superposing continuous singular curves? Here we answer these questions by investigating three different scenarios, where the lift as in Theorem 3.3 is concentrated purely on either absolutely continuous (AC), continuous singular (CS), or jump (J) curves. The answer is summarized in Table 1 and elaborated upon thereafter.

First, note that we can always have a curve with the same regularity as the underlying curves by simply taking \(\mu _t = \delta _{\gamma _t}, t\in I \) (diagonal entries of the table). Secondly, if all curves \(\gamma \) are AC, then \((\mu _t)\) is necessarily AC and thus cannot be CS or J due to the following simple observation:

Remark 4.6

(A sufficient condition for absolute continuity) Under the assumptions of Theorem 3.3, if the lift \(\tilde{\pi }\) is merely concentrated on \({\mathcal{A}\mathcal{C}}^1(I:X)\), then \((\mu _t) \in {\mathcal{A}\mathcal{C}}^1(I:P_1(X))\). This follows directly from (3.5) and Tonelli's theorem: for every Borel set \(A\subset I\),

$$\begin{aligned} |D\mu |(A)=\int |D\gamma |(A){\mathrm {\,d}}\tilde{\pi }(\gamma ) = \int \int _A |{\dot{\gamma }}_t| {\mathrm {\,d}}t {\mathrm {\,d}}\tilde{\pi }(\gamma ) = \int _A \left( \int |{\dot{\gamma }}_t| {\mathrm {\,d}}\tilde{\pi }(\gamma ) \right) {\mathrm {\,d}}t, \end{aligned}$$

which implies that \(|D\mu | \ll {\mathcal {L}}^1|_{I}\) and thus \((\mu _t)\) is an AC-curve.

Next, the case \(\gamma \)-CS, \((\mu _t)\)-J is not possible. As explained after Proposition 4.2, if \((\mu _t)\) jumps, then at least some curves must jump as well. The opposite case, \(\gamma \)-J, \((\mu _t)\)-CS, can however happen. Take, for example,

$$\begin{aligned} \mu _t {:=}(1-c(t))\delta _0+ c(t)\delta _1, \quad t \in I, \end{aligned}$$
(4.2)

where \(c: [0,1] \rightarrow [0,1]\) is the Cantor function. Then, in the same way as Example 1.1, one can construct a lift that is concentrated on jump curves. Lastly, in Example 4.7 below, we show that the case \(\gamma \)-CS, \((\mu _t)\)-AC is possible as well.
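The Cantor-function example (4.2) can be made concrete. One natural reading of "in the same way as Example 1.1" (our assumption) is to let the curve labelled by \(\alpha \in [0,1]\) jump from 0 to 1 at the first time c reaches \(\alpha \). With \(\alpha \) uniformly distributed, the share of curves sitting at 1 at time t is \(c(t)\), and for these two-point measures \(W_1(\mu _s,\mu _t)=|c(t)-c(s)|\), so \((\mu _t)\) is continuous and singular. A sketch on a grid of labels (all names hypothetical):

```python
def cantor(t, depth=60):
    """Cantor function c on [0, 1] via the ternary expansion of t: ternary digits 0/2
    become binary digits 0/1, and the expansion stops at the first ternary digit 1."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        t *= 3.0
        digit = int(t)
        t -= digit
        if digit == 1:
            return value + scale
        value += scale * (digit // 2)
        scale /= 2.0
    return value


def jump_curve(alpha, t):
    """Curve labelled by alpha: 0 before c reaches alpha, 1 afterwards. Since c is continuous
    and nondecreasing, {t : c(t) >= alpha} is a closed interval [tau, 1], so the curve is
    cadlag with a single jump (of size 1)."""
    return 1.0 if cantor(t) >= alpha else 0.0


m = 300
alphas = [(j + 0.5) / m for j in range(m)]
for t in (0.25, 0.5, 0.8):
    share_at_one = sum(jump_curve(a, t) for a in alphas) / m
    print(t, cantor(t), share_at_one)  # the share of curves at 1 approximates c(t)
```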

As a final remark, it is interesting to note that all cases in the upper diagonal of Table 1 turn out to be impossible. Informally, this is because one can produce regular Wasserstein curves out of irregular curves, whereas irregular Wasserstein curves cannot be produced purely out of regular curves.

Example 4.7

(Constant-speed \(W_1\)-geodesics out of arbitrary BV-curves) Here we provide a fairly general way of constructing constant-speed \(W_1\)-geodesics (which are in particular absolutely continuous) for which the lift is concentrated on an arbitrary set of BV-curves, in particular on a set of non-absolutely continuous curves. Consider \(X = {\mathbb {R}}\) with the Euclidean distance and let \(\sigma _0\in P([0,1])\) be an arbitrary probability measure with \(\sigma _0(\{1\})=0\). The aim of this example is to construct a lift \(\pi \) as in Theorem 3.3 so that the corresponding Wasserstein curve is a constant-speed geodesic connecting \(\delta _0\) to \(\delta _1\), and \(\pi \) is concentrated on curves whose total variation measures are translates of \(\sigma _0\). Then, by choosing \(\sigma _0\) arbitrarily, we obtain different geodesics.

First, we define a measure \(\sigma \) on \({\mathbb {R}}\) as the periodic extension of \(\sigma _0|_{[0,1)}\), i.e. \(\sigma (A+n){:=}\sigma _0(A)\) for all Borel sets \(A\subset [0,1)\) and \(n\in {\mathbb {Z}}\). Define a family of curves \(\{ \gamma ^{(\alpha )} \}_\alpha \) labelled by \(\alpha \in [0,1]\) as

$$\begin{aligned} \gamma ^{(\alpha )}(t){:=}\sigma ((\alpha ,\alpha +t]), \quad t \in I, \end{aligned}$$

which forms a Borel map \(\gamma ^{(\cdot )}: \alpha \mapsto \gamma ^{(\alpha )}\) from I to D(I : [0, 1]). Indeed, by Lemma 2.12, it follows from the fact that for every t, the composition

$$\begin{aligned} {[}0,1]\overset{\gamma ^{(\cdot )}}{\rightarrow } D(I:[0,1])\overset{e_t}{\rightarrow }[0,1] \end{aligned}$$

is Borel. Indeed, the map \(\alpha \mapsto \sigma ((\alpha ,\alpha +t])\) is the difference of two increasing functions, namely \(\alpha \mapsto \sigma ((0,\alpha +t])\) and \(\alpha \mapsto \sigma ((0,\alpha ])\), and is therefore Borel.

Define \(\pi \in P(D(I:X))\) as

$$\begin{aligned} \pi {:=}(\gamma ^{(\cdot )})_{\#}{\mathcal {L}}^1, \end{aligned}$$

and \(\mu _t{:=}(e_t)_\#\pi \). Clearly \(\pi \) is concentrated on \({\mathcal{B}\mathcal{V}}(I:[0,1])\) so by Theorem 3.1, \((\mu _t) \in {\mathcal{B}\mathcal{V}} (I: P_1(X))\) and

$$\begin{aligned} |D\mu |\le \int |D\gamma |{\mathrm {\,d}}\pi (\gamma )=\int _0^1|D\gamma ^{(\alpha )}| {\mathrm {\,d}}{\mathcal {L}}^1( \alpha ). \end{aligned}$$
(4.3)

As the family \(\{\gamma ^{(\alpha )}\}_\alpha \) is given by translation, we have

$$\begin{aligned} \int _0^1|D\gamma ^{(\alpha )}|{\mathrm {\,d}}{\mathcal {L}}^1( \alpha )={\mathcal {L}}^1. \end{aligned}$$
(4.4)

Indeed, for any \(f\in C([0,1]:{\mathbb {R}})\),

$$\begin{aligned} \iint f(t) |D\gamma ^{(\alpha )}|({\mathrm {\,d}}t){\mathcal {L}}^1({\mathrm {\,d}}\alpha )&=\iint f(\iota (t+\alpha ))|D\gamma ^0|({\mathrm {\,d}}t){\mathrm {\,d}}\alpha \\&=\iint f(\iota (t+\alpha )){\mathrm {\,d}}\sigma (t){\mathrm {\,d}}\alpha \\&=\iint f(\alpha ){\mathrm {\,d}}\alpha {\mathrm {\,d}}\sigma (t)=\int f{\mathrm {\,d}}{\mathcal {L}}^1, \end{aligned}$$

where \(\iota :{\mathbb {R}}\rightarrow [0,1)\simeq {\mathbb {R}}/{\mathbb {Z}}\) is the quotient map.

Notice that \(\gamma ^{(\alpha )}(0)=0\) and \(\gamma ^{(\alpha )}(1)=1\) for all \(\alpha \), so \(W_1(\mu _0,\mu _1)=W_1(\delta _0, \delta _1)=1\). On the other hand, combining (4.3) and (4.4), we have \(|D\mu |\le {\mathcal {L}}^1|_{[0,1]}\); since \(|D\mu |([0,1])\ge W_1(\mu _0,\mu _1)=1={\mathcal {L}}^1([0,1])\), equality must hold, i.e. \(|D\mu |={\mathcal {L}}^1|_{[0,1]}\). In other words, \((\mu _t)\) is a constant-speed geodesic.
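The constant-speed property can also be checked without computing \(|D\mu |\). Every \(\gamma ^{(\alpha )}\) is nondecreasing, so coupling \(\gamma _s\) with \(\gamma _t\) through \(\pi \) gives \(W_1(\mu _s,\mu _t)\le m(t)-m(s)\), where \(m(t){:=}\int x{\mathrm {\,d}}\mu _t(x)\), while the 1-Lipschitz test function \(\psi (x)=x\) gives the reverse inequality. Hence \((\mu _t)\) is a constant-speed geodesic from \(\delta _0\) to \(\delta _1\) exactly when \(m(t)=t\). The sketch below is restricted to a purely atomic \(\sigma _0\) (a hypothetical choice with atoms at 0 and 1/2, plus \(\sigma _0=\delta _0\)); it counts the periodic atoms in \((\alpha ,\alpha +t]\) and averages over a grid of \(\alpha \):

```python
import math


def make_curve(atoms, alpha):
    """gamma^{(alpha)}(t) = sigma((alpha, alpha + t]), where sigma is the 1-periodic extension
    of a purely atomic sigma_0 = sum_k w_k delta_{p_k} with p_k in [0, 1).
    The atom p_k + n lies in (alpha, alpha + t] iff alpha - p_k < n <= alpha + t - p_k."""
    def gamma(t):
        return sum(w * (math.floor(alpha + t - p) - math.floor(alpha - p)) for p, w in atoms)
    return gamma


def mean_position(atoms, t, m=1000):
    """m(t) = mean of mu_t = (e_t)_# pi, with alpha on an equal-weight midpoint grid."""
    alphas = [(j + 0.5) / m for j in range(m)]
    return sum(make_curve(atoms, a)(t) for a in alphas) / m


sigma0 = [(0.0, 0.5), (0.5, 0.5)]  # hypothetical: half the mass at 0, half at 1/2
dirac = [(0.0, 1.0)]               # sigma_0 = delta_0, giving (1 - t) delta_0 + t delta_1

for t in (0.1, 0.3, 0.75):
    print(t, mean_position(sigma0, t), mean_position(dirac, t))  # both approximately t
```

Purely atomic \(\sigma _0\) yields jump curves; a singular continuous \(\sigma _0\) (such as the Cantor measure) would require a different discretization and is not covered by this sketch.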

In conclusion, we have constructed a constant-speed geodesic \((\mu _t)\) and a lift \(\pi \) of \((\mu _t)\) concentrated on BV-curves whose variation measures are cyclic translations of \(\sigma _0\). Different choices of \(\sigma _0\) give rise to different \(W_1\)-geodesics between \(\delta _0\) and \(\delta _1\), as shown in Fig. 3. In particular,

  • \(\sigma _0= {\mathcal {L}}|_{[0,1]}\) corresponds to the trivial geodesic \( \mu _t = \delta _t\) (Fig. 3 (A)), which is also the (unique) constant-speed geodesic for \(p>1\).

  • \(\sigma _0= \delta _0\) corresponds to geodesic \(\mu _t = (1-t)\delta _0+t\delta _1\), studied in Example 1.1.

  • Finally, by choosing \(\sigma _0\) to be a probability measure with no atoms and no absolutely continuous part, we get that \(\pi \) is concentrated on BV-curves that are continuous but not absolutely continuous.

Fig. 3

(Example 4.7) Construction of different constant-speed geodesics in \(P_1({\mathbb {R}})\) between \(\delta _0\) and \(\delta _1\). Left sub-figures show example measures \(\sigma _0\) and their periodic extension \(\sigma \). Right sub-figures show sample curves \(\gamma ^{(\alpha )}, \alpha \in [0,1]\). Curves with \(\alpha =0, 0.5\) are highlighted in red and blue, respectively

4.4 Continuity equation in discrete setting

Theorem 3.3 is a useful tool for the study of BV-curves in 1-Wasserstein spaces. In the continuous setting, it is well known that whenever the space X has a kind of differential structure, absolutely continuous curves \((\mu _t) \subset P_p(X)\), for \(p>1\), are related to solutions of the continuity equation (see e.g. [3, Chapter 8]). More precisely, one can find a time-dependent Borel velocity field \(v_t: X \rightarrow X\) of \((\mu _t)\) so that the continuity equation

$$\begin{aligned} \partial _t \mu _t+ \nabla \cdot (v_t\mu _t)=0 \quad \textrm{in} \, X \times I \end{aligned}$$

holds and \(|{\dot{\mu }}_t|^p=\int |v_t|^p(x){\mathrm {\,d}}\mu _t(x)\).

Concerning the continuity equation, the case \(p=1\) is far more involved, not least due to the presence of non-localities as seen already in Example 1.1. While the exponent \(p=1\) creates great difficulties in the continuous setting, it also opens up the possibility of studying analogous questions in the discrete setting. The discrete counterpart to the continuity equation, sometimes referred to as the current equation, is also studied in the literature and has a tight connection with Markov chains. In [11], among other things, Léonard derives a Benamou–Brenier type formula relating \(W_1(\mu _0,\mu _1)\) to the current equation on metric graphs. See also [12], where an alternative metric on the space of probability measures on a finite set X is introduced via modifying the Benamou–Brenier formula in order to have a gradient flow interpretation of the heat flow in a discrete setting. Different aspects of this metric are studied later in [6], in particular, for the characterization of absolutely continuous curves in the corresponding metric space.

In this section, we study the current equation in a countable and proper metric space for which the induced topology is discrete. More precisely, we show that the current equation can be directly recovered from Theorem 3.3, yielding that for a given BV-curve \((\mu _t)\) there exists \(v_t :X\rightarrow {\mathcal {M}}(X), x \mapsto v_t^{x}\), so that the pair \((\mu _t,v_t)\) satisfies the current equation. The obtained \(v_t\) (or rather \(d(x,\cdot ) v_t^{x}(\cdot )\)) can be interpreted as a velocity field, but unlike the continuous setting, it is a time-dependent positive measure over the space X. While the (pointwise defined) continuity equation makes sense for BV-curves due to almost everywhere differentiability, the result is more meaningful whenever \((\mu _t)\) is an absolutely continuous curve and therefore completely characterized by the continuity equation.

Setting. Throughout this section, (X, d) is a countable and proper metric space whose induced topology is discrete, and we adopt the notation \(\mu _{t}(x) = \mu _{t}(\{x\})\), \(x\in X\). We start by recalling the current equation:

Definition 4.8

(Current equation) A family \((\mu _t,v_t)_{t\in I}\) with \(\mu _t\in P(X)\), \(v_t:X\rightarrow {\mathcal {M}}(X)\), is said to satisfy the current equation if for every \(x\in X\)

$$\begin{aligned} \frac{{\mathrm {\,d}}}{{\mathrm {\,d}}t}\mu _t(x)=\sum _{y \in X} v_t^y(x)\mu _t(y)-\mu _t(x)\sum _{y \in X} v_t^x(y),\quad {\mathcal {L}}^1\text {-a.e. } t\in I. \end{aligned}$$
(4.5)
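With a time-independent family \(v^x\) on a finite set, (4.5) is the forward (Kolmogorov) equation of a continuous-time Markov chain with jump rates \(v^x(\{y\})\). Summing (4.5) over x shows that total mass is conserved, and the diagonal entries \(v^x(\{x\})\) cancel. The sketch below integrates (4.5) with an explicit Euler scheme for hypothetical rates on three points:

```python
def current_rhs(mu, v):
    """Right-hand side of (4.5) on points 0, ..., n-1:
    d/dt mu(x) = sum_y v[y][x] * mu(y) - mu(x) * sum_y v[x][y],
    where v[x][y] plays the role of v_t^x({y}) (taken time-independent here)."""
    n = len(mu)
    return [sum(v[y][x] * mu[y] for y in range(n)) - mu[x] * sum(v[x]) for x in range(n)]


def euler(mu0, v, t_end=1.0, steps=1000):
    """Explicit Euler scheme for the current equation; dt * (total rate out of any point) <= 1
    keeps the iterates nonnegative."""
    mu, dt = list(mu0), t_end / steps
    for _ in range(steps):
        rhs = current_rhs(mu, v)
        mu = [m + dt * r for m, r in zip(mu, rhs)]
    return mu


# Hypothetical rates: v[x][y] is the rate at which mass flows from point x to point y.
v = [[0.0, 1.0, 0.5],
     [0.2, 0.0, 1.0],
     [0.0, 0.0, 0.0]]  # point 2 is absorbing

mu_end = euler([1.0, 0.0, 0.0], v)
print(mu_end, sum(mu_end))  # total mass stays 1 up to rounding
```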

The following lemma records a useful observation on Wasserstein curves in discrete spaces; its proof is contained in the proof of Theorem 4.10.

Lemma 4.9

Let \((\mu _t)\subset P_1(X)\). If \(t\mapsto \mu _t\) is BV or absolutely continuous, then for each \(x\in X\), \(t\mapsto \mu _t(x)\) is BV or absolutely continuous, respectively. The reverse is also true when all measures \(\mu _t\) are supported inside a common bounded set.

Theorem 4.10

(BV-curves and current equation) Let \((\mu _t)\in {\mathcal{B}\mathcal{V}}(I:P_1(X))\) and assume that for each \(t\in I=[0,1]\), \(\textrm{supp}(\mu _t)\) is bounded. Then there exists \((v_t)\) so that \((\mu _t,v_t)\) satisfies the current equation. If further all measures \(\mu _t\) are supported inside a common bounded set, then

  1. (i)

    For any \((v_t)\) such that the pair \((\mu _t,v_t)\) satisfies the current equation, we have

    $$\begin{aligned} |{\dot{\mu }}_t| \le \sum _{x,y} d(x,y) v_t^x(y) \mu _t(x), \quad {\mathcal {L}}^1\text {-a.e. } t\in I. \end{aligned}$$
  2. (ii)

    There exists a \((v_t)\) satisfying the current equation such that

    $$\begin{aligned} |{\dot{\mu }}_t|= \sum _{x,y} d(x,y) v_t^x(y) \mu _t(x), \quad {\mathcal {L}}^1\text {-a.e. } t\in I. \end{aligned}$$
    (4.6)

Proof

Proof of the a priori estimate (i). Recall the Kantorovich–Rubinstein theorem,

$$\begin{aligned} W_1(\mu _{t},\mu _{s})&= \sup _{\Vert \psi \Vert _{\textrm{Lip}} \le 1} \left| \int \psi {\mathrm {\,d}}(\mu _{t} - \mu _{s}) \right| , \end{aligned}$$
(4.7)

where the supremum runs over all Lipschitz functions \(\psi : X \rightarrow {\mathbb {R}}\) with Lipschitz constant \(\Vert \psi \Vert _{\textrm{Lip}} \le 1\). Let \((\mu _t,v_t)\) be a pair satisfying the current equation. Since \(\frac{{\mathrm {\,d}}}{{\mathrm {\,d}}t}\mu _t(x)\) exists for \({\mathcal {L}}^1\)-a.e. \(t\in I\), we have

$$\begin{aligned} \frac{\mu _t(x) - \mu _s(x)}{t-s}&= \frac{{\mathrm {\,d}}}{{\mathrm {\,d}}t} \mu _t(x) + \varepsilon _x (|t-s|) \\&= \sum _{y} v_t^y(x) \mu _t(y) - \mu _t(x) \sum _{y} v_t^x(y) + \varepsilon _x (|t-s|) \end{aligned}$$

where the error function \(\varepsilon _x (|t-s|)\), which depends on x, vanishes as \(|t-s| \rightarrow 0\). Then for any 1-Lipschitz function \(\psi \), we obtain

$$\begin{aligned} \int \psi {\mathrm {\,d}}(\mu _{t} - \mu _{s})&= \sum _{x} \psi (x) (\mu _{t}(x) - \mu _{s}(x)) \\&= (t-s) \left( \sum _{x,y} \big (\psi (x) v_t^y(x) \mu _t(y) - \psi (x) v_t^x(y) \mu _t(x)\big ) + \sum _{x} \psi (x)\varepsilon _x (|t-s|) \right) \\&= (t-s) \left( \sum _{x,y} \big (\psi (x) v_t^y(x) \mu _t(y) - \psi (y) v_t^y(x) \mu _t(y)\big ) + \sum _{x} \psi (x)\varepsilon _x (|t-s|) \right) \\&= (t-s) \left( \sum _{x,y} (\psi (x) - \psi (y)) v_t^y(x) \mu _t(y) + \sum _{x} \psi (x)\varepsilon _x (|t-s|)\right) \\&\le |t-s| \left( \sum _{x,y} d(x,y) v_t^y(x) \mu _t(y) +\sum _{x} |\psi (x)|\,|\varepsilon _x (|t-s|)|\right) \end{aligned}$$

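where in the third step we exchanged the summation indices in the second term. Since all \(\textrm{supp}(\mu _t)\) are contained in a common bounded set K, which is finite because X is proper and discrete, the sums over x above are finite sums. Moreover, \(\int \psi {\mathrm {\,d}}(\mu _t-\mu _s)\) does not change if a constant is added to \(\psi \), so we may assume \(\psi (x_0)=0\) for a fixed \(x_0\in K\); then \(|\psi |\le \textrm{diam}(K)\) on K for every 1-Lipschitz \(\psi \), and the error term is \(o(1)\) uniformly in \(\psi \). So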

$$\begin{aligned} \left| \int \psi {\mathrm {\,d}}(\mu _{t} - \mu _{s}) \right| \le |t-s| \sum _{x,y} d(x,y) v_t^y(x) \mu _t(y) + o (|t-s|). \end{aligned}$$
(4.8)

Since the right-hand side of the equation above no longer depends on the choice of function \(\psi \), we can combine (4.7) and (4.8) to get

$$\begin{aligned} |\dot{\mu }_t|= \lim _{s \rightarrow t} \frac{W_1(\mu _t, \mu _s)}{|t-s|} \le \sum _{x,y} d(x,y) v_t^y(x) \mu _t(y) = \sum _{x,y} d(x,y) v_t^x(y) \mu _t(x), \quad \mathcal {L}^1\text {-a.e. } t\in I. \end{aligned}$$
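For measures on finitely many points of the real line, \(W_1\) has the closed form \(\sum _k |F(x_k)-G(x_k)|(x_{k+1}-x_k)\) in terms of cumulative distribution functions, so the a priori estimate (i) can be checked directly. The sketch below takes the hypothetical space \(X=\{0,1,2\}\subset {\mathbb {R}}\), computes \(|{\dot{\mu }}_t|\) from the time derivative of the cumulative distribution function, and compares it with \(\sum _{x,y}d(x,y)v_t^x(y)\mu _t(x)\) for two hypothetical rate matrices:

```python
import math


def current_rhs(mu, v):
    """d/dt mu(x) from (4.5), with v[x][y] playing the role of v_t^x({y})."""
    n = len(mu)
    return [sum(v[y][x] * mu[y] for y in range(n)) - mu[x] * sum(v[x]) for x in range(n)]


def metric_speed_on_line(points, dmu):
    """|mu_t'| for measures on sorted points of R: W_1 = sum_k |F - G|(x_k) (x_{k+1} - x_k),
    so the metric derivative is sum_k |F'(x_k)| (x_{k+1} - x_k), F' being the cumulative sum of dmu."""
    speed, cumulative = 0.0, 0.0
    for k in range(len(points) - 1):
        cumulative += dmu[k]
        speed += abs(cumulative) * (points[k + 1] - points[k])
    return speed


def flux_cost(points, mu, v):
    """Right-hand side of Theorem 4.10 (i): sum_{x,y} d(x,y) v^x(y) mu(x), d(x,y) = |x - y|."""
    n = len(points)
    return sum(abs(points[x] - points[y]) * v[x][y] * mu[x] for x in range(n) for y in range(n))


points = [0.0, 1.0, 2.0]  # hypothetical X = {0, 1, 2} in R
mu = [0.5, 0.3, 0.2]
v_right = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # nearest neighbour, rightward
v_mixed = [[0.0, 1.0, 0.5], [0.4, 0.0, 1.0], [0.0, 0.2, 0.0]]  # adds backward and long jumps

for v in (v_right, v_mixed):
    print(metric_speed_on_line(points, current_rhs(mu, v)), flux_cost(points, mu, v))
```

For the rightward nearest-neighbour rates both sides are about 0.8, while the mixed rates give about 1.14 against 1.46: backward and long-range flux is counted by the right-hand side but partly cancels in \(|{\dot{\mu }}_t|\).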

Proof of the existence and (ii). Let \(\pi \) be the lift of \((\mu _t)\) given by Theorem 3.3 with

$$\begin{aligned} |{\dot{\mu }}_t|=\lim _{h\searrow 0}\int \frac{d(\gamma _t,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi (\gamma ). \end{aligned}$$

Denote by \(\{\pi ^x_t\}\) the disintegration of \(\pi \) with respect to \(e_t\), i.e., \(\pi =\int \pi ^x_t {\mathrm {\,d}}\mu _t(x)\), and let \(\nu _{t+h}^x{:=}(e_{t+h})_\#\pi ^x_t\) be the push-forward of \(\pi ^x_t\) under \(e_{t+h}\). The goal is first to prove that, for fixed x with \(\mu _t(x)>0\), the measure

$$\begin{aligned} \hat{\nu }_{t+h}^x{:=}\frac{\nu ^x_{t+h}|_{X\setminus \{x\}}}{h} \end{aligned}$$

converges weakly, along a subsequence \(h\searrow 0\), to some finite measure \(v_t^x\).
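For a concrete instance of these objects, consider a hypothetical illustration: two points \(a\ne b\), the curve \(\mu _t=(1-t)\delta _a+t\delta _b\), \(t\in [0,1]\), and the path measure \(\pi \) given by the law of the curve that sits at a and jumps to b at a time \(\tau \) uniformly distributed on [0, 1] (one possible lift, which need not coincide with the lift from Theorem 3.3). For \(t\in (0,1)\) and \(0<h\le 1-t\), conditioning on \(\gamma _t=a\) means conditioning on \(\tau >t\), while conditioning on \(\gamma _t=b\) means \(\tau \le t\), so

$$\begin{aligned} \nu ^a_{t+h}(\{b\})=\frac{h}{1-t}, \qquad \hat{\nu }^a_{t+h}=\frac{1}{1-t}\,\delta _b, \qquad \hat{\nu }^b_{t+h}=0. \end{aligned}$$

Hence \(v_t^a=\delta _b/(1-t)\) and \(v_t^b=0\), and since \(W_1(\mu _s,\mu _t)=|t-s|\,d(a,b)\),

$$\begin{aligned} \frac{{\mathrm {\,d}}}{{\mathrm {\,d}}t}\mu _t(b)=1=v_t^a(b)\,\mu _t(a), \qquad \sum _{x,y}\mu _t(x)\,d(x,y)\,v_t^x(y)=d(a,b)=|{\dot{\mu }}_t|. \end{aligned}$$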

Since X is proper with the discrete topology, the ball \(B_1(x)\) contains only finitely many points, and so

$$\begin{aligned} r_x{:=}\min \{d(x,y):y\ne x\}\wedge 1 > 0. \end{aligned}$$

Thus,

$$\begin{aligned} r_x \hat{\nu }_{t+h}^x(X)&=r_x\int _{\{\gamma : \gamma _{t+h}\ne x\}}\frac{1}{h}{\mathrm {\,d}}\pi ^x_t\nonumber \\&\le r_x\int _{\{\gamma : \gamma _{t+h} \in B_1(x) \setminus \{x\} \}}\frac{1}{h}{\mathrm {\,d}}\pi ^x_t +\int _{\{\gamma : \gamma _{t+h}\notin B_1(x)\}}\frac{1}{h}{\mathrm {\,d}}\pi ^x_t\nonumber \\&\le \int \frac{d(\gamma _t,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi ^x_t(\gamma ). \end{aligned}$$
(4.9)

Combining (4.9) and the inequality

$$\begin{aligned} \limsup _{h\searrow 0}\int \frac{d(\gamma _t,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi ^x_t(\gamma )\le \frac{|{\dot{\mu }}_t|}{\mu _t(x)}<\infty , \end{aligned}$$

which follows from \(\mu _t(x)\,\pi ^x_t\le \pi \) and the choice of \(\pi \), we have

$$\begin{aligned} \limsup _{h\searrow 0}\hat{\nu }_{t+h}^x(X)<\infty . \end{aligned}$$

Next we prove tightness of \(\{\hat{\nu }_{t+h}^x\}_h\). For every \(M\in {\mathbb {N}}\) define \(\Gamma _M{:=}\{\gamma :d(x,\gamma _{t+h})> M\}\). Then

$$\begin{aligned} \limsup _{h\searrow 0}\frac{\pi ^x_t(\Gamma _M)}{h}\le \limsup _{h\searrow 0}\frac{1}{M}\int _{\Gamma _M}\frac{d(\gamma _t,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi ^x_t\le \frac{1}{M} \frac{|{\dot{\mu }}_t|}{\mu _t(x)}. \end{aligned}$$

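In particular, \(\hat{\nu }_{t+h}^x(\{y:d(x,y)> M\})\rightarrow 0\) as \(M\rightarrow \infty \), uniformly in all sufficiently small h. Since closed balls in the proper space X are compact and the total masses \(\hat{\nu }_{t+h}^x(X)\) are bounded for small h, the family \(\{\hat{\nu }_{t+h}^x\}_h\) is tight and, by Prokhorov's theorem, has weakly convergent subsequences; for any such subsequence we define \(v^x_t\) to be its limit.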
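The limit \(v^x_t\) can be computed explicitly in a simple model. Assume, only for this sketch, that \(\pi \) is the law of a time-homogeneous continuous-time Markov chain on a finite set with a hypothetical rate matrix Q (it lifts its own marginals, though it need not be the lift from Theorem 3.3). By the Markov property, \(\nu ^x_{t+h}=\exp (hQ)(x,\cdot )\) whenever \(\mu _t(x)>0\), so \(\hat{\nu }^x_{t+h}(y)=\exp (hQ)(x,y)/h\) for \(y\ne x\), and this should converge to the jump rate \(Q(x,y)\), i.e., \(v^x_t(y)=Q(x,y)\). The code below checks this with a truncated Taylor series for the matrix exponential.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transition_matrix(Q, h, terms=20):
    """exp(hQ) via a truncated Taylor series (adequate for small h times the size of Q)."""
    n = len(Q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, [[h * q / k for q in row] for row in Q])  # now (hQ)^k / k!
        result = [[r + s for r, s in zip(rr, sr)] for rr, sr in zip(result, term)]
    return result

def nu_hat(Q, h):
    """nu_hat^x_{t+h}(y) = P(X_{t+h} = y | X_t = x) / h off the diagonal, 0 on it."""
    P = transition_matrix(Q, h)
    n = len(Q)
    return [[0.0 if x == y else P[x][y] / h for y in range(n)] for x in range(n)]

def max_error(Q, h):
    """Largest deviation of nu_hat from the jump rates Q[x][y], y != x."""
    N = nu_hat(Q, h)
    n = len(Q)
    return max(abs(N[x][y] - Q[x][y]) for x in range(n) for y in range(n) if x != y)

# Hypothetical rate matrix: off-diagonal entries are the jump rates v^x(y),
# the diagonal makes every row sum to zero.
Q = [[-1.5, 1.0, 0.5],
     [0.2, -0.2, 0.0],
     [0.0, 2.0, -2.0]]

for h in (1e-1, 1e-2, 1e-3):
    print(h, max_error(Q, h))  # decreases roughly proportionally to h
```

The error is of order h because \(\exp (hQ)=I+hQ+O(h^2)\).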

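Testing (4.7) with \(\psi {:=}r_x\chi _{\{x\}}\), which is 1-Lipschitz since \(r_x\le d(x,y)\) for all \(y\ne x\), we obtain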

$$\begin{aligned} r_x\cdot |\mu _t(x)-\mu _s(x)|\le W_1(\mu _t,\mu _s). \end{aligned}$$

So \(t\mapsto \mu _t(x)\) is BV or absolutely continuous if \(t\mapsto \mu _t\) is BV or absolutely continuous, respectively, which proves Lemma 4.9 as well. At \(t\in (0,1)\) where \(t\mapsto \mu _t(x)\) is differentiable, write

$$\begin{aligned} \frac{{\mathrm {\,d}}}{{\mathrm {\,d}}t} \mu _t(x)&=\lim _{h\searrow 0}\frac{\mu _{t+h}(x)-\mu _t(x)}{h} \\&=\lim _{h\searrow 0}\frac{(e_{t+h})_{\#}\int \pi ^y_t{\mathrm {\,d}}\mu _t(y)-\mu _t}{h}(x)\\&=\lim _{h\searrow 0}\left[ \frac{\int _{\{y\ne x\}}(e_{t+h})_{\#}\pi ^y_t{\mathrm {\,d}}\mu _t(y)}{h}(x)+\frac{(e_{t+h})_{\#}\pi ^x_t(x)\mu _t(x)-\mu _t(x)}{h} \right] \\&=\lim _{h\searrow 0} \left[ \sum _y \hat{\nu }^y_{t+h}(x)\mu _t(y)-\frac{{\nu }^x_{t+h}(X\setminus \{x\})}{h}\mu _t(x) \right] \\&=\sum _y v_t^y(x)\mu _t(y)-\mu _t(x)\sum _{y} v _t^x (y), \end{aligned}$$

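where in the last equality we used the weak convergence of \(\hat{\nu }^y_{t+h}\) to \(v^y_t\) along the chosen subsequence, tested against the bounded continuous functions \(\chi _{\{x\}}\) and 1 (every function on X is continuous in the discrete topology), together with the assumption that \(\mu _t\) is concentrated on finitely many points. Finally, for (4.6), by weak convergence,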

$$\begin{aligned} |{\dot{\mu }}_t|&=\lim _{h\searrow 0}\int \frac{d(\gamma _t,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi (\gamma )=\lim _{h\searrow 0}\iint \frac{d(x,\gamma _{t+h})}{h}{\mathrm {\,d}}\pi ^x_t(\gamma ){\mathrm {\,d}}\mu _t(x)\\&=\lim _{h\searrow 0}\iint \frac{d(x,y)}{h}{\mathrm {\,d}}\nu _{t+h}^x(y){\mathrm {\,d}}\mu _t(x) \\&=\sum _{x}\mu _t(x)\lim _{h\searrow 0}\int d(x,y) {\mathrm {\,d}}\big (\frac{\nu _{t+h}^x}{h}\big )(y)\\&=\sum _{x,y} \mu _t(x) d(x,y)v_t^x(y), \end{aligned}$$

where \(d(x,\cdot )\) can be regarded as a bounded continuous function, since all \(\mu _t\), and hence all \(\nu ^x_{t+h}\le \mu _{t+h}/\mu _t(x)\), are concentrated on a common bounded set. \(\square \)
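The pointwise bound \(r_x|\mu _t(x)-\mu _s(x)|\le W_1(\mu _t,\mu _s)\) from the proof can be sanity-checked numerically. The sketch below assumes finitely many (hypothetical) points on the real line, where \(W_1\) has the closed form \(\int |F_\mu -F_\nu |\), draws random pairs of probability vectors, and reports the largest value of \(r_x|\mu (x)-\nu (x)|-W_1(\mu ,\nu )\), which should never be positive up to rounding.

```python
import random

def w1_on_line(points, mu, nu):
    """W_1 for probability vectors on sorted real points: integral of |F_mu - F_nu|."""
    total, cdf_gap = 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_gap += mu[i] - nu[i]  # F_mu - F_nu on [points[i], points[i+1])
        total += abs(cdf_gap) * (points[i + 1] - points[i])
    return total

def r(points, i):
    """r_x = min(min{d(x, y) : y != x}, 1) for x = points[i]."""
    return min(min(abs(points[i] - p) for j, p in enumerate(points) if j != i), 1.0)

def random_probability(n, rng):
    weights = [rng.random() + 1e-9 for _ in range(n)]  # strictly positive weights
    total = sum(weights)
    return [w / total for w in weights]

def worst_violation(points, trials=1000, seed=0):
    """Max over random pairs (mu, nu) and points x of r_x |mu(x) - nu(x)| - W_1(mu, nu)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        mu = random_probability(len(points), rng)
        nu = random_probability(len(points), rng)
        w1 = w1_on_line(points, mu, nu)
        for i in range(len(points)):
            worst = max(worst, r(points, i) * abs(mu[i] - nu[i]) - w1)
    return worst

print(worst_violation([0.0, 0.3, 1.1, 3.0, 3.25]))  # never positive (up to rounding)
```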

A couple of further comments about the continuity equation in the discrete setting are in order. First of all, Theorem 4.10 could be used to prove a Benamou–Brenier-type formula for the 1-Wasserstein distance in the discrete setting,

$$\begin{aligned} W_1(\mu _0,\mu _1) = \inf _{(v_t,\mu _t)} \int _{0}^1 \sum _{x,y} d(x,y) v_t^x(y) \mu _t(x) {\mathrm {\,d}}t, \end{aligned}$$

where the infimum is taken over all \((\mu _t) \in {\mathcal{A}\mathcal{C}}^1([0,1]:P_1(X))\) connecting \(\mu _0\) and \(\mu _1\), with \((v_t,\mu _t)\) satisfying the current Eq. (4.5). In fact, the 1-Wasserstein space over any complete and separable metric space is geodesic (see Footnote 3), and thus the Benamou–Brenier formula follows whenever Theorem 4.10 is applicable. The drawback is that a Benamou–Brenier formula in such generality is of little practical use.
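A sketch of why the formula holds, assuming Theorem 4.10 applies to the linear interpolation used below (for instance, when \(\mu _0\) and \(\mu _1\) have bounded support): for every admissible pair \((v_t,\mu _t)\), the length of a curve dominates the distance between its endpoints, and the estimate \(|{\dot{\mu }}_t|\le \sum _{x,y} d(x,y) v_t^y(x) \mu _t(y)\) proved above (after relabeling \(x\leftrightarrow y\)) gives

$$\begin{aligned} W_1(\mu _0,\mu _1) \le \int _0^1 |{\dot{\mu }}_t| {\mathrm {\,d}}t \le \int _0^1 \sum _{x,y} d(x,y) v_t^x(y) \mu _t(x) {\mathrm {\,d}}t. \end{aligned}$$

Conversely, \(\mu _t{:=}(1-t)\mu _0+t\mu _1\) satisfies \(\mu _t-\mu _s=(t-s)(\mu _1-\mu _0)\), so by Kantorovich–Rubinstein duality

$$\begin{aligned} W_1(\mu _s,\mu _t)=|t-s|\,W_1(\mu _0,\mu _1), \qquad |{\dot{\mu }}_t|=W_1(\mu _0,\mu _1). \end{aligned}$$

Choosing \(v_t\) as in Theorem 4.10, identity (4.6) turns both inequalities above into equalities, so the infimum is attained by this pair.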

If, instead, one assumes more structure on the space, for instance that it is a discrete metric graph, then one can ask whether the Benamou–Brenier formula holds among all transports that respect the graph structure in a suitable manner. As alluded to before, such a formulation was proven by Léonard in [11, Theorem 3.1]. We note that, for measures with bounded support, this result can be recovered with the techniques introduced in this paper. Indeed, given any \(\mu _0\) and \(\mu _1\), take \(\sigma \in \textrm{Opt}(\mu _0,\mu _1)\). For any \((x,y)\in \textrm{supp}(\sigma )\), consider a “discrete geodesic” \((x=x_1,\dots ,x_n=y)\), and perform successive linear interpolations between \(\delta _{x_i}\) and \(\delta _{x_{i+1}}\) to obtain a Wasserstein geodesic \((\mu _t^{xy})\) between the measures \(\delta _x\) and \(\delta _y\). This can be done so that \((\mu _t^{xy})\) has constant speed. Now apply Theorem 3.3 (or simply modify Example 1.1) to obtain a lift \(\pi ^{xy}\) of \((\mu _t^{xy})\). Define

$$\begin{aligned} \pi {:=}\int \pi ^{xy}{\mathrm {\,d}}\sigma (x,y), \end{aligned}$$

and let \(\mu _t{:=}(e_t)_\#\pi \) for all \(t \in [0,1]\). By construction and Theorem 3.1, \((\mu _t)\) is a Wasserstein geodesic. Finally, it is readily checked that the measures \(v_t\) constructed (from this particular \(\pi \)) in the proof of Theorem 4.10 respect the graph structure, that is, \(v_t^x(y)=0\) whenever y is not a neighbor of x.
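The construction can be sketched for a single pair, i.e., \(\sigma =\delta _{(x,y)}\), on a hypothetical weighted graph with the shortest-path distance. The sketch assumes the lift in which the jump \(x_i\rightarrow x_{i+1}\) happens at a uniformly distributed time within its time window; for such a lift the rates obtained as in the proof of Theorem 4.10 are \(v_t^{x_i}(x_{i+1})=1/(t_{i+1}-t)\) on \([t_i,t_{i+1})\), which is the assumption encoded below. It checks that only graph edges carry positive rates and that \(\sum _{x,y}\mu _t(x)d(x,y)v_t^x(y)\) equals the constant speed \(d(x,y)\).

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra on {node: {neighbor: edge_length}}; returns a shortest node sequence
    (a 'discrete geodesic') and its length. Assumes target is reachable."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for w, length in graph[u].items():
            nd = d + length
            if nd < dist.get(w, float("inf")):
                dist[w], prev[w] = nd, u
                heapq.heappush(heap, (nd, w))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

def geodesic_state(graph, source, target, t):
    """(mu_t, rates) for the constant-speed curve interpolating delta_{x_i} -> delta_{x_{i+1}}
    along a shortest path, assuming each jump happens at a uniform time in its window."""
    path, length = shortest_path(graph, source, target)
    start = 0.0
    for a, b in zip(path, path[1:]):
        end = start + graph[a][b] / length  # time share proportional to edge length
        if t < end:
            s = (t - start) / (end - start)
            return {a: 1.0 - s, b: s}, {(a, b): 1.0 / (end - t)}  # v_t^a(b)
        start = end
    return {target: 1.0}, {}

def speed(graph, mu, rates):
    """sum_{x,y} mu_t(x) d(x, y) v_t^x(y); on a shortest path, edge length = distance."""
    return sum(mu[x] * graph[x][y] * v for (x, y), v in rates.items())

graph = {  # hypothetical weighted graph
    "a": {"b": 1.0, "c": 4.0},
    "b": {"a": 1.0, "c": 1.5, "d": 2.0},
    "c": {"a": 4.0, "b": 1.5, "d": 1.0},
    "d": {"b": 2.0, "c": 1.0},
}
for t in (0.1, 0.45, 0.8):
    mu, rates = geodesic_state(graph, "a", "d", t)
    print(t, rates, speed(graph, mu, rates))  # speed stays d(a, d) = 3.0
```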

One challenge in using our techniques to gain more insight into the graph setting, however, is that it is not clear how to detect, at the level of the Wasserstein space, those curves which respect the graph structure. More precisely, it is not clear when a Wasserstein curve has a lift that is concentrated on curves that only jump along the edges of the graph (cf. the discussion in Sect. 3.2). For instance, simply by looking at linear interpolations between measures as in Example 1.1, one ends up with constant-speed Wasserstein geodesics which often do not have any lift respecting the graph structure. Notice that, to construct a pair \((\mu _t,v_t)\) realizing the Wasserstein distance via the Benamou–Brenier formula, we do not need to lift arbitrary curves, or even arbitrary geodesics; it suffices to construct one specific Wasserstein geodesic with the desired endpoints, together with its lift.
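A minimal example of this phenomenon, under the assumption (made only for this illustration) that lifts are concentrated on right-continuous curves: let \(X=\{0,1,2\}\) be the path graph with edges \(\{0,1\}\) and \(\{1,2\}\) and graph distance d, so that \(d(0,2)=2\), and let \(\mu _t{:=}(1-t)\delta _0+t\delta _2\). This is a constant-speed Wasserstein geodesic, since \(W_1(\mu _s,\mu _t)=2|t-s|\). A right-continuous curve that visits 1 at some time s stays there on an interval \([s,s+\eta )\) (or \(s=1\)), which contains a rational time, so every lift \(\pi \) satisfies

$$\begin{aligned} \pi \big (\{\gamma : \gamma _s = 1 \text { for some } s\in [0,1]\}\big ) \le \sum _{q\in {\mathbb {Q}}\cap [0,1]} \pi \big (\{\gamma : \gamma _q = 1\}\big ) = \sum _{q\in {\mathbb {Q}}\cap [0,1]} \mu _q(\{1\}) = 0. \end{aligned}$$

Hence \(\pi \)-almost every curve jumps directly from 0 to 2, and no lift of this geodesic respects the graph structure, whereas the geodesic passing through \(\delta _1\), as in the construction above, does admit such a lift.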