1 Introduction and summary of results

This paper is devoted to the study of a nonlocal Wasserstein distance defined as the least nonlocal action needed to connect two measures via a nonlocal continuity equation. The standard Wasserstein distance is the minimal transportation cost to couple two measures, for the quadratic point-to-point cost:

$$\begin{aligned} W_{2}(\mu _{0},\mu _{1}):=\left( \inf _{\pi \in \Pi (\mu _{0},\mu _{1})}\int _{\mathbb {R}^{d}\times \mathbb {R}^{d}}|x-y|^{2}d\pi (x,y)\right) ^{1/2}, \end{aligned}$$

where \(\mu _{0}\) and \(\mu _{1}\) lie in \(\mathcal {P}_{2}(\mathbb {R}^{d})\), the space of probability measures on \(\mathbb {R}^{d}\) with finite second moments, and \(\Pi (\mu _{0},\mu _{1})\) is the space of all couplings of \(\mu _{0}\) and \(\mu _{1}\). The celebrated work of Benamou and Brenier [4] establishes that the Wasserstein metric has a dynamical reformulation inspired by fluid mechanics. There, one considers the space of all narrowly continuous curves \(\rho _{t}:[0,1]\rightarrow \mathcal {P}_{2}(\mathbb {R}^{d})\) and velocity vector fields \(v_{t}:[0,1]\rightarrow L^{2}(\rho _{t};\mathbb {R}^{d})\) such that the continuity equation

$$\begin{aligned} \partial _{t}\rho _{t}+{{\,\textrm{div}\,}}(\rho _{t}v_{t})=0 \end{aligned}$$
(1.1)

holds in the sense of distributions. In [4] it is shown that

$$\begin{aligned} W_{2}^{2}(\mu _{0},\mu _{1})=\inf \left\{ \int _{0}^{1}\!\! \int _{\mathbb {R}^{d}}|v_{t}(x)|^{2}d\rho _{t}(x)dt:\rho _{0} = \mu _{0},\rho _{1}= \mu _{1},\text { and }(\rho _{t},v_{t})_{t\in [0,1]}\text { solve (1.1)}\right\} . \end{aligned}$$
(1.2)

The integral \(\int _{0}^{1}\int _{\mathbb {R}^{d}}|v_{t}(x)|^{2}d\rho _{t}(x)dt\) represents the total action along \((\rho _{t},v_{t})_{t\in [0,1]}\); in other words, the 2-Wasserstein distance between \(\mu _{0}\) and \(\mu _{1}\) can be reformulated as the least action of a curve in \(\mathcal {P}_{2}(\mathbb {R}^{d})\) connecting \(\mu _{0}\) to \(\mu _{1}\) in which the flow of mass is continuous, in the sense of satisfying Eq. 1.1.
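For intuition, the static formulation above can be evaluated exactly in one dimension, where the optimal coupling is the monotone rearrangement. The following minimal numerical sketch (our own illustration; the function name is ours) computes \(W_2\) between two empirical measures with equally many equal-weight atoms:

```python
import numpy as np

def w2_1d(x, y):
    """2-Wasserstein distance between two empirical measures on the line,
    each with the same number of equally weighted atoms: in 1D the optimal
    coupling is the monotone (sorted) pairing."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape
    return float(np.sqrt(np.mean((x - y) ** 2)))

# Translating a measure by a constant c gives W2 distance exactly c.
pts = np.array([0.0, 1.0, 2.0])
print(w2_1d(pts, pts + 3.0))  # → 3.0
```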

In this article, our object of study is the class of nonlocal transportation metrics on \(\mathcal {P}(\mathbb {R}^{d})\), introduced by Erbar in [19], which are defined in terms of a nonlocal action minimization problem on the space of curves in \(\mathcal {P}(\mathbb {R}^{d})\). A key difference is that the curves connecting the measures are not solutions of the continuity Eq. 1.1, but of the nonlocal continuity equation:

$$\begin{aligned} \partial _{t}\rho _{t}(x)+\int _{\mathbb {R}^{d}}v_{t}(x,y)\theta (\rho _{t}(x),\rho _{t}(y))\eta (x,y)dy=0, \end{aligned}$$
(1.3)

where \(v_{t}:\mathbb {R}^{d}\times \mathbb {R}^{d}\rightarrow \mathbb {R}\) is the nonlocal velocity, \(\eta : \mathbb {R}^d \times \mathbb {R}^d \rightarrow [0, \infty )\) is the weight kernel which encodes the ability to transport mass directly from x to y, and \(\theta : [0, \infty ) \times [0, \infty ) \rightarrow [0, \infty )\) allows one to define an “interpolated density” \(\theta (\rho _{t}(x),\rho _{t}(y))\), which is a generalized average of \(\rho _{t}(x)\) and \(\rho _{t}(y)\).
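A basic structural feature of Eq. 1.3 is that total mass is conserved whenever \(v_{t}\) is antisymmetric and \(\theta \) and \(\eta \) are symmetric, since the integrand of the nonlocal divergence is then antisymmetric in \((x,y)\). A minimal numerical sketch of this fact (our own discretization; the arithmetic-mean \(\theta \) is chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = np.linspace(0.0, 1.0, n)          # 1D grid with spacing h
h = x[1] - x[0]

rho = rng.random(n) + 0.1              # positive density values on the grid
A = rng.standard_normal((n, n))
v = A - A.T                            # antisymmetric nonlocal velocity v(x, y)
eta = np.exp(-np.abs(x[:, None] - x[None, :]))   # symmetric weight kernel
theta = 0.5 * (rho[:, None] + rho[None, :])      # arithmetic-mean interpolation

# Discretized nonlocal divergence: d/dt rho(x) = -∫ v θ η dy
drho = -(v * theta * eta).sum(axis=1) * h

# Antisymmetry of v plus symmetry of θ and η ⇒ total mass is conserved.
print(abs(drho.sum() * h) < 1e-10)  # → True
```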

The nonlocal total action is formally given by

$$\begin{aligned} \frac{1}{2}\int _{0}^{1}\int _{\mathbb {R}^{d}\times \mathbb {R}^{d}}v_{t}(x,y)^{2}\theta (\rho _{t}(x),\rho _{t}(y))\eta (x,y)dxdydt. \end{aligned}$$
(1.4)

Together, Eqs. 1.3 and 1.4 allow us to define a family of nonlocal transportation distances (a family, since the distance depends on the choice of \(\eta \) and \(\theta \)):

$$\begin{aligned} \mathcal {W}_{\eta ,\theta }^{2}(\mu _{0},\mu _{1}):=\inf \left\{ \frac{1}{2}\int _{0}^{1}\! \int _{\mathbb {R}^{d}\times \mathbb {R}^{d}}v_{t}(x,y)^{2}\theta (\rho _{t}(x),\rho _{t}(y))\eta (x,y)dxdydt\right\} \end{aligned}$$
(1.5)

where the infimum runs over all \((\rho _{t},v_{t})_{t\in [0,1]}\) which solve (1.3), such that \(\rho _{0}=\mu _{0}\) and \(\rho _{1}=\mu _{1}\). This family of distances can be viewed simultaneously as nonlocal analogues of the Benamou-Brenier formulation of the \(W_{2}\) metric, and also as a “continuum state space” analogue of the graph Wasserstein distance defined in [10, 37, 38], wherein the underlying space is a finite graph or irreducible Markov chain, rather than \(\mathbb {R}^{d}\) as in the case of \(\mathcal {W}_{\eta ,\theta }\).

In this paper we investigate topological and metric properties of the family of distances \(\mathcal {W}_{\eta ,\theta }\) and compare them to the Wasserstein metric \(W_2\). Some properties have already been established in [19]: it is known that the \(\mathcal {W}_{\eta ,\theta }\) distance is lower semicontinuous with respect to narrow convergence, and that if the kernel \(\eta \) has finite second moments, then the topology induced by \(\mathcal {W}_{\eta ,\theta }\) is at least as fine as that of narrow convergence.

Here we show that the topology metrized by \(\mathcal {W}_{\eta ,\theta }\) can be strictly stronger than that of narrow convergence. In particular, we characterize for which kernels \(\mathcal {W}_{\eta ,\theta }\) metrizes the narrow, the strong, or an even stronger topology on the space of measures supported within a compact domain. The key to establishing this result is the following proposition, which, loosely speaking, characterizes the effort needed to spread mass from a point to the surrounding region.

Proposition 1.1

Let \(\mathcal {W}_{\eta ,\theta }\) be defined as in Definition 2.15. Suppose that \(\eta \) and \(\theta \) satisfy Assumption 2.1 (i–v) and Assumption 2.2 respectively. In particular, in this case \(\eta (x,y) = \varvec{\eta }(|x-y|)\) for some univariate function \(\varvec{\eta } \).

If \(\nu \) is any compactly supported probability measure singular to \(\delta _{0}\) (the Dirac measure at the origin), then depending on the choice of \(\eta \) and \(\theta \):

  1. (i)

    If \(\theta (1,0)=0\) and \(\int _{B(0,1)}\varvec{\eta }(|y|)dy<\infty \) then \(\mathcal {W}_{\eta ,\theta }(\delta _{0},\nu )=\infty \).

  2. (ii)

    If \(\theta (1,0)>0\) and \(\int _{B(0,1)}\varvec{\eta }(|y|)dy<\infty \) then \(\infty > \mathcal {W}_{\eta ,\theta }(\delta _{0},\nu )\ge 2\left( \int _{\mathbb {R}^{d}}\varvec{\eta }(|y|)dy\right) ^{-1/2}\).

  3. (iii)

    If instead \(\eta \) has algebraic blow-up at the origin, that is, there exist some \(s>0\), \(\delta >0\), and constant \(c>0\) such that \(\varvec{\eta }(|y|)\ge c|y|^{-d-s}\) when \(|y|\le \delta \), then we have the estimate

    $$\begin{aligned} \mathcal {W}_{\eta , \theta }(\delta _{0},\mathfrak {m}_{B(0,\delta )})\le C\delta ^{s/2} \end{aligned}$$

    with explicit constant C, where \(\mathfrak {m}_{B(0,\delta )}\) is the uniform probability measure on \(B(0,\delta )\). In particular, \(\inf \{\mathcal {W}_{\eta ,\theta }(\delta _{0},\nu ):\nu \in \mathcal {P}(\mathbb {R}^{d}),\,\nu \bot \delta _{0}\}=0\).

We remark that the proposition is not a trichotomy: the case where \(\int _{B(0,1)}\varvec{\eta }(|y|)dy=\infty \) but \(\varvec{\eta }\) does not blow up algebraically (that is, no bound \(\varvec{\eta }(|y|)\ge c|y|^{-d-s}\) with \(s>0\) holds near the origin) remains open.
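The dichotomy hinges on the local integrability of the radial profile near the origin: \(\int _{B(0,1)}\varvec{\eta }(|y|)dy\) is finite exactly when \(\int _{0}^{1}r^{d-1}\varvec{\eta }(r)dr<\infty \). A small numerical illustration (our own sketch, with hypothetical profiles) contrasting an algebraically blowing-up profile with an integrable one:

```python
import numpy as np

def radial_mass(eta, d, eps, r1=1.0, n=400_000):
    """Trapezoid approximation of ∫_eps^r1 r^{d-1} η(r) dr, proportional to
    the mass of η over the annulus B(0, r1) \\ B(0, eps)."""
    r = np.linspace(eps, r1, n)
    f = r ** (d - 1) * eta(r)
    return float((f.sum() - 0.5 * (f[0] + f[-1])) * (r[1] - r[0]))

d, s = 2, 0.5
blow_up = lambda r: r ** (-d - s)   # algebraic blow-up, as in case (iii)
mild = lambda r: r ** (0.5 - d)     # r^{d-1}·η(r) = r^{-1/2}: integrable at 0

# Shrinking the inner cutoff eps: the first mass diverges (like eps^{-s}),
# while the second converges to a finite limit.
m_blow = [radial_mass(blow_up, d, e) for e in (1e-2, 1e-4)]
m_mild = [radial_mass(mild, d, e) for e in (1e-2, 1e-4)]
print(m_blow, m_mild)
```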

We prove Proposition 1.1 in Sect. 3.2. While Proposition 1.1 shows that in case (i) the topology induced by \(\mathcal {W}_{\eta ,\theta }\) on \(\mathcal {P}(\mathbb {R}^{d})\) is highly disconnected, we give a further structural description of the topology of \(\mathcal {W}_{\eta ,\theta }\) in the other two cases. Moreover, in case (ii) we establish a quantitative comparison with a combination of total variation and transportation distances. Let \(W_1\) denote the Monge distance, that is, the optimal transportation distance with linear cost.

Theorem 1.2

Let \(\mathcal {W}_{\eta ,\theta }\) be defined as in Definition 2.15. Suppose that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 (i–v) and 2.2 respectively, and in particular, \(\eta (x,y) = \varvec{\eta }(|x-y|)\) for some univariate function \(\varvec{\eta } \). If \(\theta (1,0)>0\) and \(\int _{B(0,1)}\varvec{\eta }(|y|)dy<\infty \), then on a compact domain, there exists an explicit constant C such that for all \(\mu ,\nu \in \mathcal {P}(\mathbb {R}^{d})\),

$$\begin{aligned} \frac{1}{C}TV^{1/2}(\mu ,\nu )\le \mathcal {W}_{\eta ,\theta }(\mu ,\nu )\le C\cdot TV^{1/2}(\mu ,\nu ). \end{aligned}$$

In particular, on a compact domain, \(\mathcal {W}_{\eta ,\theta }\) metrizes the strong topology on probability measures.

Conversely, if \(\int _{B(0,1)}\varvec{\eta }(|y|)dy=\infty \) but \(\eta \) has finite second moment, then we merely have the lower bound

$$\begin{aligned} W_{1}(\mu ,\nu )\le C\mathcal {W}_{\eta ,\theta }(\mu ,\nu ); \end{aligned}$$

and furthermore, if there exist some \(s>0\), \(\delta >0\), and constant \(c>0\) such that \(\varvec{\eta }(|y|)\ge c|y|^{-d-s}\) when \(|y|\le \delta \), then on a compact domain \(\mathcal {W}_{\eta ,\theta }\) metrizes the weak topology on probability measures.

The lower bounds asserted in Theorem 1.2 are established in Sect. 3.1; the corresponding upper bound makes use of the estimates from Sect. 3.2, and is proved in Sect. 3.3.
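To see why the strong and weak topologies in Theorem 1.2 genuinely differ, consider translated Dirac masses: the total variation between \(\delta _{0}\) and \(\delta _{\varepsilon }\) stays maximal while \(W_{1}\) vanishes as \(\varepsilon \rightarrow 0\). A toy computation (our own illustration, using the convention \(TV(\mu ,\nu )=|\mu -\nu |(\mathbb {R}^{d})\)):

```python
# Why TV and W1 induce different topologies: for Dirac masses δ_0 and δ_eps,
#   TV(δ_0, δ_eps) = |δ_0 - δ_eps|(R^d) = 2   for every eps > 0,
# while W1(δ_0, δ_eps) = eps → 0.
# So δ_eps → δ_0 weakly, but not strongly (not in total variation).

def tv_diracs(a, b):
    # total variation norm of the signed measure δ_a - δ_b
    return 0.0 if a == b else 2.0

def w1_diracs(a, b):
    # the only coupling of two Dirac masses moves the whole mass from a to b
    return abs(a - b)

for eps in (1.0, 0.1, 0.001):
    print(eps, tv_diracs(0.0, eps), w1_diracs(0.0, eps))
```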

We now turn to making the quantitative comparison between the Wasserstein distance and the nonlocal Wasserstein distances more precise. In particular, we show that when the kernel of nonlocal transport \(\eta \) is localized, the nonlocal Wasserstein distance converges, up to the appropriate scaling, to the Wasserstein distance. We furthermore obtain explicit error bounds on the difference.

Theorem 1.3

Let \(\mathcal {W}_{\eta ,\theta }\) be defined as in Definition 2.15. Suppose that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 and 2.2 respectively, and in particular, \(\eta (x,y) = \varvec{\eta }(|x-y|)\) for some univariate function \(\varvec{\eta } \). Let \(\varepsilon \in (0,1]\), and define \(\varvec{\eta }_{\varepsilon }(|x-y|):=\varepsilon ^{-d}\varvec{\eta }\left( \frac{|x-y|}{\varepsilon }\right) \) and similarly \(\eta _\varepsilon (x,y):= \varvec{\eta }_\varepsilon (|x-y|)\). Suppose that \(\eta \) has finite second moment \(M_{2}(\eta ):=\int |y|^{2}\varvec{\eta }(|y|)dy\). Let \(\rho _{0},\rho _{1}\in \mathcal {P}(\mathbb {R}^{d})\). Then the following estimates hold:

  1. (i)

    Suppose either that \(\varvec{\eta }(|y|)\) is integrable and \(\theta (1,0)>0\), or that there exist some \(s>0\), \(\delta >0\), and constant \(c>0\) such that \(\varvec{\eta }(|y|)\ge c|y|^{-d-s}\) when \(|y|\le \delta \). Then, there exists a constant \(C_{d,\theta ,\eta }\) depending solely and explicitly on d, \(\theta \), and \(\eta \), such that

    $$\begin{aligned} \sqrt{\frac{M_{2}(\eta )}{2d}}\varepsilon \mathcal {W}_{\eta _{\varepsilon },\theta }(\rho _{0},\rho _{1})\le \left( 1+\sqrt{\varepsilon }\right) ^{2}W_{2}(\rho _{0},\rho _{1})+C_{d,\theta ,\eta }\sqrt{\varepsilon }. \end{aligned}$$
  2. (ii)

    Suppose that \(\rho _{0}\) and \(\rho _{1}\) are supported inside some domain of radius R. Then, there is a constant \(C_{R,\eta }\) depending solely and explicitly on R and \(\eta \) such that

    $$\begin{aligned} W_{2}^{2}(\rho _{0},\rho _{1})\le \frac{M_{2}(\eta )}{2d}\varepsilon ^{2}\mathcal {W}_{\eta _{\varepsilon },\theta }^{2}(\rho _{0},\rho _{1})+C_{R,\eta }\sqrt{\varepsilon }. \end{aligned}$$

In particular, when restricting attention to probability measures supported on a compact domain, these estimates imply the Gromov-Hausdorff convergence of \(\sqrt{\frac{M_{2}(\eta )}{2d}}\varepsilon \mathcal {W}_{\eta _{\varepsilon },\theta }\) to \(W_{2}\) as \(\varepsilon \rightarrow 0\).

Part (i) of Theorem 1.3 is deduced, with an explicit constant for \(C_{d,\theta ,\eta }\), as Corollary 4.3; likewise, Part (ii) is deduced, with an explicit constant for \(C_{R,\eta }\), in Corollary 5.12.
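The scaling factor \(\sqrt{M_{2}(\eta )/2d}\,\varepsilon \) in Theorem 1.3 reflects the identity \(M_{2}(\eta _{\varepsilon })=\varepsilon ^{2}M_{2}(\eta )\), which follows from the change of variables \(y\mapsto y/\varepsilon \). A quick numerical confirmation in \(d=1\) (our own sketch; the Gaussian profile is chosen purely for illustration):

```python
import numpy as np

def second_moment(eta, L=50.0, n=2_000_001):
    """Trapezoid approximation of M2(η) = ∫ y² η(|y|) dy in d = 1."""
    y = np.linspace(-L, L, n)
    f = y ** 2 * eta(np.abs(y))
    return float((f.sum() - 0.5 * (f[0] + f[-1])) * (y[1] - y[0]))

eta = lambda r: np.exp(-r ** 2 / 2) / np.sqrt(2 * np.pi)  # M2 = 1 exactly
eps = 0.25
eta_eps = lambda r: eta(r / eps) / eps   # d = 1 rescaling ε^{-d} η(r/ε)

m2, m2_eps = second_moment(eta), second_moment(eta_eps)
print(m2, m2_eps / m2)   # ratio ≈ eps² = 0.0625
```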

1.1 Related work

Having stated the main results of the article, let us give some further motivating discussion regarding nonlocal Wasserstein distances, and why their topological and asymptotic properties are of interest.

1.1.1 Nonlocal Wasserstein metric, and associated gradient flows

Nonlocal Wasserstein distances were introduced in the work of Erbar [19]. A central result of [19] is that the \(\mathcal {W}_{\eta , \theta }\) gradient flow of entropy \(E(\rho ) = \int \ln \rho d \rho \) is the fractional heat equation

$$\begin{aligned} \partial _{t}\rho _{t}+(-\Delta )^{s/2}\rho _{t}=0 \end{aligned}$$

when \(\theta \) is chosen to be the logarithmic mean, and \(\eta \) is chosen to be the jump kernel of an s-stable Lévy process: \(\eta (x,y) = c |x-y|^{-s-d}\). More broadly, this result suggests that nonlocal parabolic equations may be studied in an analogous fashion to those parabolic equations (such as the heat equation, the Fokker–Planck equation, and the porous medium equation) which can be cast as \(W_{2}\) gradient flows [3]. Erbar has made important contributions in that direction by establishing the lower semicontinuity of the nonlocal action, showing that the topology of nonlocal Wasserstein distances is at least as strong as that of the Wasserstein distance, and showing that the entropy is geodesically convex with respect to the nonlocal Wasserstein distance.

As with the standard Wasserstein distance, it is of interest to consider gradient flows of functionals that combine some or all of the entropy, potential, and interaction energies. For \(\beta _i \ge 0\), \(i=1,2,3\), consider

$$\begin{aligned} G(\rho ) = \beta _1 \int \ln \rho \, d\rho + \beta _2 \int U(x)\, d\rho (x) + \beta _3 \iint K(x,y) d\rho (x) d\rho (y). \end{aligned}$$
(1.6)

where \(K(x,y) =K(y,x)\). If \(\beta _1 \beta _2 >0 \) and \(\beta _3=0\), the gradient flow of G is a nonlocal Fokker–Planck equation; if \(\beta _1 \beta _3 >0 \), it is a nonlocal McKean–Vlasov equation. The issue that arises in nonlocal Wasserstein gradient flows is that the behavior of the solutions at low temperature, \(\beta _1 \ll 1\), crucially depends on the kernel \(\eta \) and the interpolation \(\theta \). In particular, Proposition 1.1(i) indicates that when \(\beta _1=0\), \(\eta \) is integrable, and \(\theta \) is, for example, the logarithmic mean, then the gradient flow of the potential (or interaction) energy is unable to move a delta mass! This follows from the fact that, since the potential energy is finite, the gradient flow curves have finite action. This highlights the need to better understand the influence of the choice of \(\eta \) and \(\theta \) on the nonlocal Wasserstein metric and the resulting gradient flows.

To overcome the issues with the freezing of support for the gradient flow of potential and interaction energies, Esposito et al. [25] studied a modification of the nonlocal transportation framework, inspired by upwind numerical schemes, which allows the interpolation \(\theta \) to depend on the velocity. For antisymmetric v,

$$\begin{aligned} \theta (\rho (x), \rho (y), v(x,y)) = \begin{cases} \rho (x) &\quad \text {if } v(x,y)\ge 0, \\ \rho (y) &\quad \text {otherwise.} \end{cases} \end{aligned}$$

They study the gradient flow of the nonlocal interaction energy (\(\beta _1=\beta _2=0\)) with respect to nonlocal Wasserstein distances, both on graphs and in the continuum.

The resulting upwind nonlocal Wasserstein “distance” is not symmetric, and is shown to be a quasimetric. It provides a formal Finslerian (rather than Riemannian) differential structure on \(\mathcal {P}(\mathbb {R}^{d})\), which is nonetheless sufficient to develop gradient flows as curves of maximal slope.
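The upwind interpolation can be sketched as follows (our own illustration); the point is that mass can leave a node even when the destination has zero density, which is exactly what fails for symmetric interpolations with \(\theta (1,0)=0\):

```python
import numpy as np

def theta_upwind(rho_x, rho_y, v_xy):
    """Upwind interpolation: the flux across the edge (x, y) uses the
    density at the tail of the velocity."""
    return np.where(v_xy >= 0, rho_x, rho_y)

rho = np.array([1.0, 0.0])                 # all mass sits at node 0
v = np.array([[0.0, 2.0], [-2.0, 0.0]])    # antisymmetric: push from 0 to 1

# Edge (0,1): v >= 0, so the interpolated density is rho[0] = 1; the flux
# out of node 0 is nonzero even though rho[1] = 0. A symmetric mean with
# θ(1,0) = 0 (e.g. the logarithmic mean) would freeze this configuration.
print(theta_upwind(rho[0], rho[1], v[0, 1]))  # → 1.0
```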

Lastly: in [24], the authors show that the 1D aggregation equation

$$\begin{aligned} \partial _{t}f_{t}=\partial _{v}(f_{t}\partial _{v}W*f_{t})\qquad W(v)=c|v|^{3} \end{aligned}$$

can be cast as the gradient flow of the kinetic energy with respect to a nonlocal transportation metric, which they call the nonlocal collision metric. This metric falls outside the scope of this article because the action they consider is 2-homogeneous (rather than 1-homogeneous). While the aggregation equation is also known to be a 2-Wasserstein gradient flow (of the nonlocal interaction energy, rather than the kinetic energy), it is nonetheless notable that a nonlocal transportation metric has recently been shown to be physically relevant in kinetic theory.

1.1.2 Graph Wasserstein distances

Maas [37], Mielke [38], and Chow et al. [10] have independently introduced a metric structure for probability measures on discrete spaces (finite graphs or Markov chains) modeled on the Benamou-Brenier formulation of the \(W_{2}\) metric. Our setup largely follows that of Maas.

Let \(\mathcal {X}\) be a finite set. Let \(\pi \) be some distinguished probability measure on \(\mathcal {X}\). Define

$$\begin{aligned} \mathcal {P}_{\pi }(\mathcal {X}):=\left\{ \rho :\mathcal {X}\rightarrow \mathbb {R}_{+}\mid \sum _{x\in \mathcal {X}}\rho (x)\pi (x)=1\right\} . \end{aligned}$$

In other words, \(\mathcal {P}_{\pi }(\mathcal {X})\) is the set of probability densities on \(\mathcal {X}\), w.r.t. \(\pi \).

Consider some irreducible Markov kernel \(K:\mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}_{+}\) such that \(\pi \) is the unique stationary measure for K, that is,

$$\begin{aligned} \pi (y)=\sum _{x\in \mathcal {X}}\pi (x)K(x,y). \end{aligned}$$

Furthermore, assume that K is reversible, that is, the detailed balance condition

$$\begin{aligned} K(x,y)\pi (x)=K(y,x)\pi (y) \end{aligned}$$

holds. Equivalently [36, Chapter 9], one may consider a connected weighted graph on \(\mathcal {X}\) with weights \(w(x,y)\), where \(\pi \) is the stationary distribution for the uniform random walk on \((\mathcal {X},w)\).
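The relationship between the weights, the kernel K, and the stationary measure \(\pi \) can be checked concretely (a small example of our own): for the uniform random walk, \(K(x,y)=w(x,y)/{{\,\textrm{deg}\,}}(x)\) with \({{\,\textrm{deg}\,}}(x)=\sum _{y}w(x,y)\), and \(\pi (x)\propto {{\,\textrm{deg}\,}}(x)\) is stationary and reversible.

```python
import numpy as np

# Uniform random walk on a small weighted graph with symmetric weights.
w = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
deg = w.sum(axis=1)
K = w / deg[:, None]            # Markov kernel K(x, y) = w(x, y) / deg(x)
pi = deg / deg.sum()            # candidate stationary measure π(x) ∝ deg(x)

# Stationarity: π(y) = Σ_x π(x) K(x, y).
assert np.allclose(pi @ K, pi)
# Detailed balance: K(x, y) π(x) = K(y, x) π(y), since both equal w(x, y)/Σw.
assert np.allclose(K * pi[:, None], (K * pi[:, None]).T)
print("stationarity and detailed balance hold")
```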

Let \(\theta \) be an interpolation function; define the shorthand \(\hat{\rho }(x,y):=\theta (\rho (x),\rho (y))\). We introduce the graph continuity equation

$$\begin{aligned} \dot{\rho }_{t}(x)+\sum _{y\in \mathcal {X}}v_{t}(x,y)\hat{\rho }_{t}(x,y)K(x,y)=0 \end{aligned}$$

where \(v_{t}:\mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) is thought of as a “vector field” on \(\mathcal {X}\), analogous to the vector field \(v_{t}\) appearing in the (continuum) continuity equation. The term \(\sum _{y\in \mathcal {X}}v_{t}(x,y)\hat{\rho }_{t}(x,y)K(x,y)\) can be interpreted as a graph analogue of the term \({{\,\textrm{div}\,}}(\rho v)\) from the continuity equation on \(\mathbb {R}^{d}\).

The action of a density-potential pair is given by

$$\begin{aligned} \mathcal {A}(\rho ,v):=\frac{1}{2}\sum _{x,y\in \mathcal {X}}(v_{t}(x,y))^{2}\hat{\rho }(x,y)K(x,y)\pi (x). \end{aligned}$$

From here, one can define a geodesic metric on \(\mathcal {P}_{\pi }(\mathcal {X})\) in a variational fashion, by setting

$$\begin{aligned} \mathcal {W}_{\theta , \eta , \pi }(\bar{\rho }_{0},\bar{\rho }_{1})^{2}:=\inf \left\{ \int _{0}^{1}\mathcal {A}(\rho _{t},v_{t})dt\right\} \end{aligned}$$

where the infimum runs over all pairs \((\rho _{t},v_{t})_{t\in [0,1]}\) satisfying the graph continuity equation, with \(\rho _{0}=\bar{\rho }_{0}\) and \(\rho _{1}=\bar{\rho }_{1}\). This Benamou–Brenier-type formulation of a distance on a discrete base space is more technically straightforward than its continuum ancestor \(W_{2}\). For one, it is shown in [37] that, at least on the “interior” of \(\mathcal {P}_{\pi }(\mathcal {X})\) (namely, the subset \(\{\rho \in \mathcal {P}_{\pi }(\mathcal {X}):\forall x\in \mathcal {X},\ \rho (x)>0\}\)) we can interpret \(\mathcal {W}\) as a geodesic metric arising from a bona fide Riemannian metric structure; this is in contrast to the continuum setting, where the space \(\mathcal {P}_{2}(\mathbb {R}^{d})\) can only be understood “formally” as a Riemannian manifold. Indeed, for the metric \(\mathcal {W}_{\theta , \eta , \pi }\) and related gradient flows, numerous heuristic arguments from the Otto calculus can be translated to rigorous arguments in the discrete setting. This has been exploited to study a variety of evolution equations on discrete spaces as \(\mathcal {W}_{\theta , \eta , \pi }\) gradient flows, for instance discrete analogues of the porous medium equation [22] and the McKean–Vlasov equation [20].

The reason for the need to introduce an interpolation \(\theta \) in the discrete setting is rather clear: because mass configurations are defined on the set of nodes, and vector fields are defined on edges, the discrete analogue of the flux of the continuity equation, \(\rho v\), must combine node- and edge-defined quantities in a noncanonical fashion. Indeed, in the definition of \(\mathcal {W}_{\theta , \eta , \pi }\) the role of the flux \(\rho v\) is played by the quantity \(\theta (\rho (x),\rho (y))v(x,y)\), where the interpolation \(\theta \) is introduced in order to define an edge-based quantity (the flux) from vertex-defined quantities (the masses). While there are many possible choices for \(\theta \), which \(\theta \) one chooses can significantly alter the geometry of \(\mathcal {W}_{\theta , \eta , \pi }\).
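Two standard interpolation choices make the boundary dichotomy concrete (our own sketch; the logarithmic mean is the choice relevant to the entropy gradient flow, and the arithmetic mean is a simple alternative):

```python
import math

def log_mean(s, t):
    """Logarithmic mean θ(s, t) = (s - t) / (log s - log t), with the
    continuous extensions θ(s, s) = s and θ(s, 0) = 0."""
    if s == t:
        return s
    if s == 0 or t == 0:
        return 0.0          # boundary value: θ(1, 0) = 0
    return (s - t) / (math.log(s) - math.log(t))

arith = lambda s, t: 0.5 * (s + t)

s, t, lam = 0.3, 1.7, 2.5
assert math.isclose(log_mean(s, t), log_mean(t, s))                    # symmetry
assert math.isclose(log_mean(lam * s, lam * t), lam * log_mean(s, t))  # 1-homogeneity
assert log_mean(1.0, 1.0) == 1.0                                       # normalisation
# The two means differ on the boundary: θ(1, 0) = 0 vs 1/2 — precisely the
# dichotomy of Proposition 1.1 (i)–(ii).
print(log_mean(1.0, 0.0), arith(1.0, 0.0))  # → 0.0 0.5
```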

1.1.3 From graphs to continuum

A number of works have investigated the asymptotic properties of graph Wasserstein distances as the graphs converge to a continuum limit. This is of particular interest in data science and for mesh-free numerical schemes. For instance, if one considers a sequence of finite graphs \(G_{n}\) equipped with the shortest-path metric, converging in some sense to a continuous domain \(\mathcal {G}\subset \mathbb {R}^{d}\), is it the case that \((\mathcal {P}(G_{n}),\mathcal {W}_{\theta , \eta , \pi })\) also converges to \((\mathcal {P}(\mathcal {G}),W_{2})\)? Similar Gromov–Hausdorff-type stability results are well-established for a sequence of continuous domains where each respective space of probability measures is equipped with the \(W_{2}\) metric [47]. However, the problem of discrete-to-continuum stability for the graph Wasserstein distance turns out to be considerably more delicate. In [31], it is shown that if we consider a sequence \(\mathcal {X}_{n}\) of finer and finer d-dimensional regular lattices on the flat d-torus \(\mathbb {T}^{d}\), and take for our Markov chain the uniform random walk on said lattice, then under appropriate rescaling the sequence of spaces \((\mathcal {P}(\mathcal {X}_{n}),\mathcal {W}_{\theta , \eta , \pi })\) converges to \((\mathcal {P}(\mathbb {T}^{d}),W_{2})\) in the sense of Gromov–Hausdorff. On the other hand, such convergence does not hold for an arbitrary sequence of regular meshes [33]. Despite this failure of convergence for general sequences of meshes, Garcia Trillos has shown [29] that \(\mathcal {W}_{\theta , \eta , \pi }\) corresponding to weighted random geometric graphs (e.g. where vertices are random i.i.d. samples from the Lebesgue measure on the torus) converges in the sense of Gromov–Hausdorff to \((\mathcal {P}(\mathbb {T}^{d}),W_{2})\) as the number of vertices goes to infinity and the graph bandwidth converges to zero at an appropriate rate (one for which the unweighted graph degrees go to infinity).

Our own Theorem 1.3 provides another result in this vein, whereby the 2-Wasserstein distance is recovered in the limit; but ours gives a nonlocal-to-local convergence result, rather than discrete-to-continuum.

We should also draw attention to one other question regarding the graph Wasserstein metrics: what can be said about the geodesics on \((\mathcal {P}(\mathcal {X}),\mathcal {W}_{\theta , \eta , \pi })\)? Two works [23, 28] have independently investigated graph Wasserstein geodesics via their dual description in terms of solutions to a suitable discrete Hamilton–Jacobi equation. In the present article, an analogous Hamilton–Jacobi duality result for the nonlocal Wasserstein metric is exploited in Sect. 5 to prove Part 2 of Theorem 1.3.

1.1.4 Other related work

In [40], Peletier, Rossi, Savaré and Tse consider a far-reaching generalization of the setup of [19]. While [19] shows that the fractional heat equation can be viewed as the gradient flow of \(\text {KL}(\cdot \mid \text {Leb})\) with respect to a Wasserstein-like metric, [40] investigates a large class of reversible Markov jump processes whose Kolmogorov forward equations may be cast as so-called generalized gradient systems, which are a further abstraction of the gradient flows in metric spaces studied in [3]. We note, however, that [40] imposes an assumption of uniform boundedness on the transition kernel of the Markov process, so that the results of [19] are not a special case of those established in [40].

Finally, let us remark on some related work on the subject of nonlocal conservation laws. We draw attention to two distinct uses of the term (the works cited below discuss still other uses in the literature). On the one hand, certain authors use “nonlocal conservation law” to refer to solutions of the (local) continuity equation \(\partial _{t}u_{t}+{{\,\textrm{div}\,}}(u_{t}v_{t})=0\) where \(v_{t}\) is itself a nonlocal functional of \(u_{t}\); see [5, 13] for a general discussion of this class of equations and an overview of related literature. In particular, we draw attention to recent work [11, 12] investigating the local limit (namely, as \(\varepsilon \rightarrow 0\)) of nonlocal conservation laws of the form

$$\begin{aligned} \partial _{t}u_{t}^{\varepsilon }+{{\,\textrm{div}\,}}\left( u_{t}^{\varepsilon }b\left( \eta _{\varepsilon }*u_{t}^{\varepsilon }\right) \right) =0 \end{aligned}$$

where \(b:\mathbb {R}_{+}\rightarrow \mathbb {R}^{d}\), and \(\eta :\mathbb {R}^{d}\rightarrow \mathbb {R}_{+}\) is some convolution kernel and \(\eta _{\varepsilon }(x):=\frac{1}{\varepsilon ^{d}}\eta \left( \frac{x}{\varepsilon }\right) \). Formally, the singular limit is given by the conservation law \(\partial _{t}u_{t}+{{\,\textrm{div}\,}}(u_{t}b(u_{t}))=0\), but [12] exhibits counterexamples where \(u_{t}^{\varepsilon }\not \rightarrow u_{t}\) (e.g. in \(L^{p}\) for \(p>1\)) even when b and \(\eta \) are regular. More recently, sufficient conditions for nonlocal-to-local convergence in one dimension have been given in [11]; but it remains the case that formally “obvious” nonlocal-to-local convergence problems can present unexpected technical phenomena.

In a distinct line of work, the articles [16, 17] introduce new classes of “nonlocal conservation laws” in which the divergence term in the continuity equation is replaced with a nonlocal divergence-type operator. The nonlocal divergence and gradient in these papers, as well as in [17, 32], share some of the properties of the objects we study, but also have important differences. More importantly, the authors are interested in conservation laws where the flux j is a nonlinear function of \(\rho \), and several ways to encode the nonlinear and nonlocal dependence on \(\rho \) are developed. In [16] the authors present, in the same vein as [11], sufficient conditions which allow one to recover a local conservation law in the limit, under a suitable rescaling of the nonlocal divergence operator.

2 Preliminaries

2.1 Wasserstein metric

Let \(\mathcal {P}_{2}(\mathbb {R}^{d})\) denote the space of probability measures on \(\mathbb {R}^{d}\) with finite second moments. The 2-Wasserstein distance \(W_{2}\) on \(\mathcal {P}_{2}(\mathbb {R}^{d})\) is defined by

$$\begin{aligned} W_{2}^{2}(\mu ,\nu ):=\inf _{\pi \in \Pi (\mu ,\nu )}\int |x-y|^{2}d\pi (x,y) \end{aligned}$$

where \(\Pi (\mu ,\nu )\) is the set of all transport plans (couplings) of \(\mu \) and \(\nu \). The 2-Wasserstein distance also has a well-known dynamical formulation, due to Benamou and Brenier [4]:

$$\begin{aligned} W_{2}^{2}(\mu ,\nu )=\inf \left\{ \int _{0}^{1}\int _{\mathbb {R}^{d}}|v_{t}(x)|^{2}d\rho _{t}(x)dt \,: \, \partial _{t}\rho _{t}+{{\,\textrm{div}\,}}(\rho _{t}v_{t})=0,\;\rho _{0}=\mu ,\;\rho _{1}=\nu \right\} . \end{aligned}$$

Here, the continuity equation \(\partial _{t}\rho _{t}+{{\,\textrm{div}\,}}(\rho _{t}v_{t})=0\) is interpreted in a suitable distributional sense, in particular to allow \(\rho _{t}\) to be a probability measure (rather than a smooth density).

The Benamou-Brenier formulation of the \(W_{2}\) metric can be interpreted as showing that the \(W_{2}\) distance between \(\mu \) and \(\nu \) is given by the minimal total kinetic energy of a unit-time flow of mass with initial and terminal distribution specified by \(\mu \) and \(\nu \) respectively. Classically, kinetic energy is either formulated in terms of position and velocity, or position and momentum; accordingly, as was observed in [4] (but see also further discussion and extensions in [8, 14]), one can also rewrite the Benamou-Brenier formulation of \(W_{2}\) in “mass-flux” coordinates:

$$\begin{aligned} W_{2}^{2}(\mu ,\nu )=\inf \left\{ \int _{0}^{1}\int _{\mathbb {R}^{d}}\left| \frac{d\textbf{j}_{t}}{d\rho _{t}}\right| ^{2}d\rho _{t}(x)dt \,: \, \partial _{t}\rho _{t}+{{\,\textrm{div}\,}}\textbf{j}_{t}=0,\rho _{0}=\mu ,\rho _{1}=\nu \right\} \end{aligned}$$

where \(\textbf{j}_{t}\) is a locally finite signed measure, which formally takes the place of \(\rho _{t}v_{t}\). This presentation of \(W_{2}\) has the technical advantage that the action \(\int _{\mathbb {R}^{d}}\left| \frac{d\textbf{j}}{d\rho }(x)\right| ^{2}d\rho (x)\) is jointly convex and lower semicontinuous in \(\rho \) and \(\textbf{j}\) (this is a well-known consequence of Reshetnyak’s theorem; for completeness, we provide a proof of this result which covers the case where \(\textbf{j} \in \mathcal {M}_{loc}(\mathbb {R}^{d})\) in Theorem A.1). In particular, see Remark 2.23 for a useful consequence of these properties.
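The joint convexity just mentioned can be spot-checked numerically in the scalar case (our own sketch): the mass-flux action is the perspective function \((\rho ,j)\mapsto j^{2}/\rho \), and convexity amounts to the midpoint inequality.

```python
import numpy as np

# Numeric spot-check that (ρ, j) ↦ j²/ρ is jointly convex on (0, ∞) × R:
# the value at the midpoint never exceeds the average of the endpoint values.
rng = np.random.default_rng(1)

def action(rho, j):
    return j ** 2 / rho

ok = True
for _ in range(10_000):
    r0, r1 = rng.uniform(0.01, 10.0, 2)
    j0, j1 = rng.uniform(-10.0, 10.0, 2)
    mid = action(0.5 * (r0 + r1), 0.5 * (j0 + j1))
    ok &= mid <= 0.5 * (action(r0, j0) + action(r1, j1)) + 1e-12
print(ok)  # → True
```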

2.2 Nonlocal structure: weight kernel, interpolation, vector calculus

We equip \(\mathbb {R}^{d}\) with an “underlying nonlocal structure”, as follows.

Assumption 2.1

(Weight kernel) The function \(\eta :\{(x,y)\in \mathbb {R}^{d}\times \mathbb {R}^{d}\backslash \{x=y\}\}\rightarrow [0,\infty )\) satisfies the following properties:

  1. (i)

    \(\eta \) is continuous on the set \(\{(x,y)\in \mathbb {R}^{d}\times \mathbb {R}^{d}\backslash \{x=y\}:\eta (x,y)>0\};\)

  2. (ii)

    \(\eta \) is isotropic, that is, there exists radial profile \(\varvec{\eta }: (0, \infty ) \rightarrow [0,\infty )\) such that \(\eta (x,y) = \varvec{\eta }(|x-y|)\).

  3. (iii)

    The radial profile \(\varvec{\eta }\) is non-increasing.

  4. (iv)

    \(\eta \) satisfies the tail moment bound \(\int _{\mathbb {R}^{d}}1\wedge |y|^{2} \, \varvec{\eta }(|y|)dy<\infty \).

    Additionally, for some results we require that

  5. (v)

    The support of \(\varvec{\eta }\) contains (0, 1],

    Or, furthermore, that

  6. (vi)

    The support of \(\varvec{\eta }\) is equal to (0, 1].

The assumption of isotropy is largely imposed to simplify the statements of our results. When combined with the tail moment bound, it suffices to guarantee that Assumption 1.1 from [19] is satisfied; this matters for us because we make use of several results from [19] which require that assumption. On the other hand, the arguments of Sects. 4 and 5 make use of the assumption that \(\eta \) is compactly supported. Note also that, under the assumption that \(\eta \) is isotropic, if \(\eta \) is compactly supported then the support of \(y\mapsto \varvec{\eta }(|y|)\) is equal to \(\bar{B}(0,R)\) for some \(R>0\); in assumption (vi), we fix \(R=1\) merely as a convention. Likewise, in view of assumption (iii), assumption (v) may be viewed as merely a convention unless \(\varvec{\eta }\) is identically zero.

Since \(\eta \) is taken to be isotropic, we often write \(\eta (|x-y|)\) rather than \(\varvec{\eta }(|x-y|)\); in other words, we abusively identify \(\eta \) with its radial profile.

We write \(M_{p}(\eta ):=\int _{\mathbb {R}^{d}}|x-y|^{p}\eta (|x-y|)dy\) to denote the p-th central moment of \(\eta (x,y)\) with x fixed. Note that due to isotropy, \(M_{p}(\eta )\) does not depend on the choice of x. Unless otherwise stated, we do not explicitly assume pth moment bounds on \(\eta \), but note that Theorem 1.3 assumes that \(M_{2}(\eta )<\infty \).

In what follows, given a choice of weight kernel \(\eta \), we denote \(G:=\{(x,y)\in \mathbb {R}^{d}\times \mathbb {R}^{d}\backslash \{x=y\}:\eta (x,y)>0\}\). The intended interpretation is that G is the set of edges we have placed on \(\mathbb {R}^{d}\), with \(\eta (x,y)\) being the edge weight between x and y.

We also assume that the interpolation function \(\theta \) satisfies the following:

Assumption 2.2

(Interpolation function) \(\theta :[0,\infty )\times [0,\infty )\rightarrow [0,\infty )\) satisfies the following properties:

  1. (i)

    Regularity: \(\theta \) is continuous on \([0,\infty )\times [0,\infty )\) and \(C^{1}\) on \((0,\infty )\times (0,\infty )\);

  2. (ii)

    Symmetry: \(\theta (s,t)=\theta (t,s)\) for \(s,t\ge 0\);

  3. (iii)

    Positivity, normalisation: \(\theta (s,t)>0\) for \(s,t>0\) and \(\theta (1,1)=1\);

  4. (iv)

    Monotonicity: \(\theta (r,t)\le \theta (s,t)\) for all \(0\le r\le s\) and \(t\ge 0\);

  5. (v)

    Positive 1-homogeneity: \(\theta (\lambda s,\lambda t)=\lambda \theta (s,t)\) for \(\lambda >0\) and \(s\ge t\ge 0\);

  6. (vi)

    Concavity: the function \(\theta :[0,\infty )\times [0,\infty )\rightarrow [0,\infty )\) is concave;

  7. (vii)

    Connectedness: \(C_{\theta }:=\int _{0}^{1}\frac{dr}{\theta (1-r,1+r)}\in [0,\infty ).\)
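For example, for the logarithmic mean \(\theta (s,t)=\frac{s-t}{\log s-\log t}\) (the choice for which Erbar proves several results in [19]), point (vii) can be verified by direct computation, using the expansion \(\log \frac{1+r}{1-r}=2\sum _{k=0}^{\infty }\frac{r^{2k+1}}{2k+1}\):

$$\begin{aligned} C_{\theta }=\int _{0}^{1}\frac{\log \frac{1+r}{1-r}}{2r}dr=\int _{0}^{1}\sum _{k=0}^{\infty }\frac{r^{2k}}{2k+1}dr=\sum _{k=0}^{\infty }\frac{1}{(2k+1)^{2}}=\frac{\pi ^{2}}{8}<\infty . \end{aligned}$$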

Remark 2.3

Points (i–vi) in the preceding assumption are identical to [19, Assumption 2.1], except that Erbar also assumes that \(\theta \) is zero on the boundary, namely \(\theta (0,t)=0\) for all \(t\ge 0\). However, a careful reading of [19] indicates that this extra assumption is never used (except in the sense that Erbar proves some results only for the logarithmic mean, which is indeed zero on the boundary). We do not assume \(\theta \) is zero on the boundary; moreover, we will see below that whether or not \(\theta \) is zero on the boundary has significant topological consequences for the nonlocal Wasserstein distance.

We call point (vii) “connectedness” because, as discussed in [37], if instead \(C_{\theta }=\infty \), then the space of probability measures supported on a symmetric graph with two points becomes topologically disconnected under the discrete \(\mathcal {W}_{\theta }\) distance. The assumption that \(C_{\theta }<\infty \) is required for several arguments in Sect. 3.

Lemma 2.4

Any \(\theta \) satisfying Assumption 2.2 also satisfies \(\theta (r,s)\le \frac{r+s}{2}\).

Proof

First, note that by 1-homogeneity and the normalization \(\theta (1,1)=1\), we have \(\theta (r,r)=r\) for all \(r\in \mathbb {R}_{+}\). At the same time, symmetry implies that at any point \((r,r)\) along the diagonal in \(\mathbb {R}_{+}\times \mathbb {R}_{+}\) (excluding (0, 0)), the directional derivative of \(\theta \) in the direction orthogonal to the vector (1, 1) must be zero. Therefore, by concavity, \(\theta (r,s)\) is upper-bounded by the hyperplane which takes the value r at \((r,r)\) and has directional derivative zero in the direction orthogonal to (1, 1) at every point \((r,r)\), and the only such hyperplane is given by \(\frac{r+s}{2}\). \(\square \)
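As a concrete instance of the lemma, the geometric mean \(\theta (r,s)=\sqrt{rs}\) satisfies Assumption 2.2 (with connectedness constant \(C_{\theta }=\int _{0}^{1}\frac{dr}{\sqrt{1-r^{2}}}=\frac{\pi }{2}\)), and for this choice the bound of Lemma 2.4 reduces to the AM–GM inequality:

$$\begin{aligned} \theta (r,s)=\sqrt{rs}\le \frac{r+s}{2}\quad \text {for all }r,s\ge 0. \end{aligned}$$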

Definition 2.5

(Nonlocal gradient and divergence) For any function \(\phi :\mathbb {R}^{d}\rightarrow \mathbb {R}\) we define its nonlocal gradient \(\overline{\nabla }\phi :G\rightarrow \mathbb {R}\) by

$$\begin{aligned} \overline{\nabla }\phi (x,y)=\phi (y)-\phi (x)\text { for all }(x,y)\in G. \end{aligned}$$

For any \(\textbf{j}\in \mathcal {M}_{loc}(G)\), its nonlocal divergence \(\overline{\nabla }\cdot \textbf{j}\in \mathcal {M}_{loc}(\mathbb {R}^{d})\) is defined as the \(\eta \)-weighted adjoint of \(\overline{\nabla }\), i.e.,

$$\begin{aligned} \int _{\mathbb {R}^{d}}\phi d\overline{\nabla }\cdot \textbf{j}=-\frac{1}{2}\iint _{G}\overline{\nabla }\phi (x,y)\eta (x,y)d\textbf{j}(x,y). \end{aligned}$$

Remark 2.6

It is the nonlocal divergence operator \(\overline{\nabla }\cdot \) which replaces the usual divergence operator in Definition 2.12. However, we will simply write out the integral operator \(\frac{1}{2}\iint _{G}\overline{\nabla }\phi (x,y)\eta (x,y)d\textbf{j}(x,y)\) explicitly in the sequel; the definition here is presented just for easier comparison with articles such as [19, 25]. Moreover, it is not obvious under what circumstances the measure \(\overline{\nabla }\cdot \textbf{j}\) exists, even assuming that the integral \(\frac{1}{2}\iint _{G}\overline{\nabla }\phi (x,y)\eta (x,y)d\textbf{j}(x,y)\) is finite. Additionally, we caution the reader that other conventions for the definition of the nonlocal gradient and divergence operators are present in the literature; in particular, our definitions are not the same as the ones presented in [15].

2.3 Action

We rigorously define the action in the “flux form”.

Definition 2.7

(Action) Let \(\eta \) satisfy Assumption 2.1 and \(\theta \) satisfy Assumption 2.2. Let \(\mu \in \mathcal {P}(\mathbb {R}^{d})\) and \(\textbf{j}\in \mathcal {M}_{loc}(G)\). Let \(m\in \mathcal {M}_{loc}(\mathbb {R}^{d})\) be any reference measure. Define the action of the pair \((\mu ,\textbf{j})\) by

$$\begin{aligned} \mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j};m):=\frac{1}{2} \, \iint _{G}\frac{\left( \frac{d\textbf{j}}{d\lambda }(x,y)\right) ^{2}}{\theta \left( \frac{d(\mu \otimes m)}{d\lambda }(x,y),\frac{d(m\otimes \mu )}{d\lambda }(x,y)\right) }\eta (x,y)d\lambda (x,y) \end{aligned}$$

where \(\lambda \) is taken to be any nonnegative measure in \(\mathcal {M}_{loc}^{+}(G)\) such that \(|\textbf{j}|,\mu \otimes m,m\otimes \mu \ll \lambda \). Here, the fraction in the integrand is understood with the convention that \(\frac{0}{0}=0\).

If \(\theta \) and \(\eta \) are obvious from context, and m is chosen to be \(\text {Leb}\), the Lebesgue measure on \(\mathbb {R}^{d}\), we simply write \(\mathcal {A}(\mu ,\textbf{j})\).
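As a minimal illustration of Definition 2.7, suppose all mass sits on two points \(a,b\in \mathbb {R}^{d}\) with \((a,b)\in G\) and \(w:=\eta (a,b)>0\): take \(m=\delta _{a}+\delta _{b}\), \(\mu =p\delta _{a}+(1-p)\delta _{b}\) for \(p\in [0,1]\), and the antisymmetric flux \(\textbf{j}=J(\delta _{(a,b)}-\delta _{(b,a)})\). Choosing \(\lambda =\delta _{(a,b)}+\delta _{(b,a)}\) (which dominates \(|\textbf{j}|\), \(\mu \otimes m\), and \(m\otimes \mu \) once all measures are restricted to G), the Radon–Nikodym derivatives at \((a,b)\) are \(\frac{d\textbf{j}}{d\lambda }=J\), \(\frac{d(\mu \otimes m)}{d\lambda }=p\), and \(\frac{d(m\otimes \mu )}{d\lambda }=1-p\), so by the symmetry of \(\theta \),

$$\begin{aligned} \mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j};m)=\frac{1}{2}\left( \frac{J^{2}}{\theta (p,1-p)}+\frac{(-J)^{2}}{\theta (1-p,p)}\right) w=\frac{wJ^{2}}{\theta (p,1-p)}, \end{aligned}$$

which is the action associated to the graph Wasserstein distance on a symmetric two-point graph.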

Remark 2.8

Note that \(\mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j};m)\) does not depend on the choice of \(\lambda \) satisfying this domination condition \(|\textbf{j}|,\mu \otimes m,m\otimes \mu \ll \lambda \), since \(\theta \) is 1-homogeneous.

Remark 2.9

The choice of “reference measure” m in Definition 2.7 allows us to encode alternate geometries on \(\mathbb {R}^{d}\) besides the usual Euclidean one; phrased differently, Definition 2.7 makes sense when working with any metric measure space \((\mathbb {R}^{d},d,m)\) (however, if we use a metric d other than the Euclidean one, note that this would change what it means for \(\eta \) to be isotropic). In particular, if we consider a “weighted measured graph” \(G=(V,E,w,m_{n})\) with vertices in \(\mathbb {R}^{d}\), where \(w(x_{i},x_{j})=\eta (x_{i},x_{j})\) for any \((x_{i},x_{j})\in E\), and \(m_{n}\) is a measure supported on V, then selecting \(m_{n}\) as the reference measure in Definition 2.7 causes \(\mathcal {A}_{\eta ,\theta ,m_{n}}\) to coincide with the action associated to the graph Wasserstein distance discussed in the introduction, provided we also restrict to the case where \(\mu \ll m_{n}\) and \(\textbf{j}\ll \sum _{i,j}w(x_{i},x_{j})m_{n}(x_{i})m_{n}(x_{j})\).

Lemma 2.10

The action \(\mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j};m)\) is jointly convex in \((\mu ,\textbf{j})\), and is jointly lower semicontinuous in \((\mu ,\textbf{j})\) with respect to the narrow topology on \(\mathcal {P}(\mathbb {R}^{d})\) and the weak* topology on \(\mathcal {M}_{loc}(G)\).

Proof

This is proved in Corollary A.2. \(\square \)

Remark 2.11

(Comparison with action given in Erbar) Our Definition 2.7 is superficially different from the definition of the action given in [19, Section 2]. Nonetheless, it can be seen that the two definitions are, in fact, equivalent, and so our choice of an alternate presentation of the action is largely one of taste.

Specialized to our setting, Erbar defined the action \(\mathcal {A}^{\prime }(\mu ,\varvec{\nu })\) of a pair \((\mu ,\varvec{\nu })\in \mathcal {P}(\mathbb {R}^{d})\times \mathcal {M}_{loc}(G)\) by

$$\begin{aligned} \mathcal {A}_{\theta ,\eta }^{\prime }(\mu ,\varvec{\nu }):=\iint _{G}\frac{\left( \frac{d\varvec{\nu }}{d\lambda }(x,y)\right) ^{2}}{2\theta \left( \frac{d\mu ^{1}}{d\lambda }(x,y),\frac{d\mu ^{2}}{d\lambda }(x,y)\right) }d\lambda (x,y) \end{aligned}$$

where \(d\mu ^{1}=\eta (x,y)d\mu (x)m(y)\) and \(d\mu ^{2}=\eta (x,y)dm(x)d\mu (y)\), and \(\lambda \) is any measure in \(\mathcal {M}_{loc}(G)\) dominating \(\varvec{\nu }\), \(\mu ^{1}\), and \(\mu ^{2}\) (and the fraction in the integrand is understood with the convention that \(\frac{0}{0}=0\)). Now, given a pair \((\mu ,\textbf{j})\in \mathcal {P}(\mathbb {R}^{d})\times \mathcal {M}_{loc}(G)\), it is routine to check (by using the chain rule for Radon-Nikodym derivatives and the 1-homogeneity of \(\theta \)) that

$$\begin{aligned} \mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j})=\mathcal {A}_{\theta ,\eta }^{\prime }(\mu ,\varvec{\nu }) \end{aligned}$$

provided that \(\varvec{\nu }\in \mathcal {M}_{loc}(G)\) is defined so that \(\frac{d\varvec{\nu }}{d\textbf{j}}(x,y)=\eta (x,y)\). Conversely, if one is first given \((\mu ,\varvec{\nu })\), it holds that \(\mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j})=\mathcal {A}_{\theta ,\eta }^{\prime }(\mu ,\varvec{\nu })\) provided that \(\textbf{j}\in \mathcal {M}_{loc}(G)\) is defined so that \(\frac{d\textbf{j}}{d\varvec{\nu }}=\frac{1}{\eta (x,y)}\); note that this latter definition (of \(\textbf{j}\)) is unproblematic on G, since by definition \(\eta (x,y)>0\) everywhere on G.

2.4 Nonlocal continuity equation

We define the weak solutions of the nonlocal continuity Eq. (1.3), in flux form, in the same way as [25, Section 2.3].

Definition 2.12

(Nonlocal continuity equation) Let \(T>0\). We say that \((\mu _{t},\textbf{j}_{t})_{t\in [0,T]}\) solves the nonlocal continuity equation provided that

  1. (i)

    \(\mu _{(\cdot )}:[0,T]\rightarrow \mathcal {P}(\mathbb {R}^{d})\) is continuous when \(\mathcal {P}(\mathbb {R}^{d})\) is equipped with the narrow topology,

  2. (ii)

    \(\textbf{j}_{(\cdot )}:[0,T]\rightarrow \mathcal {M}_{loc}(G)\) is Borel when \(\mathcal {M}_{loc}(G)\) is equipped with the weak* topology,

  3. (iii)

    \(\forall \varphi \in C_{c}^{\infty }([0,T]\times \mathbb {R}^{d})\)

    $$\begin{aligned} \int _{0}^{T}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\mu _{t}(x)dt+\frac{1}{2}\int _{0}^{T}\iint _{G}\overline{\nabla }\varphi _{t}(x,y)\eta (x,y)d\textbf{j}_{t}(x,y)dt=0. \end{aligned}$$
    (2.1)

We write \((\mu _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}_{T}\) to indicate that \((\mu _{t},\textbf{j}_{t})_{t\in [0,T]}\) satisfies conditions (i), (ii), and (iii) above. Furthermore, we write \(\mathcal{C}\mathcal{E}\) to denote \(\mathcal{C}\mathcal{E}_{1}\), and write \((\mu _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}_{T}(\nu ,\sigma )\) to indicate that \((\mu _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}_{T}\) and \(\mu _{0}=\nu \) and \(\mu _{T}=\sigma \).

Remark 2.13

(Comparison with nonlocal continuity equation given in [19]) In [19], a slightly different nonlocal continuity equation is considered. There, Eq. 1.3 is replaced with the following equation:

$$\begin{aligned}{} & {} \forall \varphi \in C_{c}^{\infty }([0,T]\times \mathbb {R}^{d})\nonumber \\{} & {} \qquad \int _{0}^{T}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\mu _{t}(x)dt +\frac{1}{2}\int _{0}^{T}\iint _{G}\overline{\nabla }\varphi _{t}(x,y)d\varvec{\nu }_{t}(x,y)dt=0. \end{aligned}$$
(2.2)

Here, \(\varvec{\nu }_{(\cdot )}:[0,T]\rightarrow \mathcal {M}_{loc}(G)\) is likewise assumed to be weak* Borel. It is evident that if \((\mu _{t},\textbf{j}_{t})_{t\in [0,T]}\) satisfies the nonlocal continuity equation in the sense of Definition 2.12, then \((\mu _{t},\varvec{\nu }_{t})\) satisfies 2.2, provided that \(\varvec{\nu }_{t}\in \mathcal {M}_{loc}(G)\) is defined so that \(\frac{d\varvec{\nu }_{t}}{d\textbf{j}_{t}}(x,y)=\eta (x,y)\). Conversely, if \((\mu _{t},\varvec{\nu }_{t})\) satisfies 2.2, then defining \(\textbf{j}_{t}\) so that \(\frac{d\textbf{j}_{t}}{d\varvec{\nu }_{t}}(x,y)=\frac{1}{\eta (x,y)}\), we see that \((\mu _{t},\textbf{j}_{t})_{t\in [0,T]}\) satisfies the nonlocal continuity equation in the sense of Definition 2.12. Note that this latter definition (of \(\textbf{j}_{t}\)) is unproblematic on G, since by definition \(\eta (x,y)>0\) everywhere on G.

It is sometimes advantageous to work with a stronger notion of solution to the nonlocal continuity equation:

Definition 2.14

Let \(\rho _{t}(x):[0,1]\times \mathbb {R}^{d}\rightarrow \mathbb {R}_{+}\) be a probability density which is differentiable in t, and let \(j_{t}(x,y):[0,1]\times G\rightarrow \mathbb {R}\). We say that \((\rho _{t},j_{t})_{t\in [0,1]}\) is a classical solution to the nonlocal continuity equation provided that \(j_{t}(x,y)\) is antisymmetric and

$$\begin{aligned} \partial _{t}\rho _{t}(x)+\int _{\mathbb {R}^{d}}j_{t}(x,y)\eta (x,y)dy=0 \end{aligned}$$

holds pointwise in t and x.

Note that if \((\rho _{t},j_{t})_{t\in [0,1]}\) is a classical solution to the nonlocal continuity equation, then it holds that the time-dependent measures \((\rho _{t}dx,j_{t}dxdy)_{t\in [0,1]}\) are again a solution to the nonlocal continuity equation in the sense of Definition 2.12.
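For instance, for a classical solution the conservation of total mass can be checked directly: assuming sufficient integrability to differentiate under the integral, relabeling the variables x and y and then using the antisymmetry of \(j_{t}\) together with the symmetry of \(\eta \) gives

$$\begin{aligned} \iint j_{t}(x,y)\eta (x,y)\,dy\,dx=\iint j_{t}(y,x)\eta (y,x)\,dy\,dx=-\iint j_{t}(x,y)\eta (x,y)\,dy\,dx, \end{aligned}$$

so this integral vanishes, and hence \(\frac{d}{dt}\int _{\mathbb {R}^{d}}\rho _{t}(x)dx=-\iint j_{t}(x,y)\eta (x,y)\,dy\,dx=0\).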

2.5 Nonlocal Wasserstein metric

Definition 2.15

Let \(\eta \) satisfy Assumption 2.1 (i–iv) and \(\theta \) satisfy Assumption 2.2, and let \(m\in \mathcal {M}_{loc}^{+}(\mathbb {R}^{d})\). The nonlocal Wasserstein distance \(\mathcal {W}_{\eta ,\theta ,m}\) on \(\mathcal {P}(\mathbb {R}^{d})\) is defined by

$$\begin{aligned} \mathcal {W}_{\eta ,\theta ,m}^{2}(\nu ,\sigma ):=\inf \left\{ \int _{0}^{1}\mathcal {A}_{\eta ,\theta ,m}(\mu _{t},\textbf{j}_{t})dt:(\mu _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}(\nu ,\sigma )\right\} . \end{aligned}$$

We will write \(\mathcal {W}_{\eta ,\theta }\) to denote the case where \(m=\text {Leb}\). Furthermore, we will usually drop the explicit reference to the choice of \(\theta \), and simply write \(\mathcal {W}_{\eta }\).

Remark 2.16

In view of Remarks 2.11 and 2.13, our definition of the nonlocal Wasserstein distance is equivalent to a special case of the nonlocal Wasserstein distance defined in [19].

Fact 2.17

\(\mathcal {W}_{\eta ,\theta ,m}\) is a pseudometric on \(\mathcal {P}(\mathbb {R}^{d})\). On \(\mathcal {P}(\mathbb {R}^{d})\), \(\mathcal {W}_{\eta ,\theta ,m}^{2}\) is jointly convex, and \(\mathcal {W}_{\eta ,\theta , m}\) is jointly narrowly lower semicontinuous.

The proof of this fact is exactly as in [19] and is therefore omitted.

Lemma 2.18

(antisymmetric flux) Let \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\) solve the nonlocal continuity equation. Let \(\textbf{j}_{t}^{as}:=(\textbf{j}_{t}-\textbf{j}_{t}^{\top })/2\). Then

  1. (i)

    \((\mu _{t},\textbf{j}_{t}^{as})_{t\in [0,1]}\) also solves the nonlocal continuity equation.

  2. (ii)

    \(\mathcal {A}(\mu _{t},\textbf{j}_{t}^{as})\le \mathcal {A}(\mu _{t},\textbf{j}_{t})\) for all \(t\in [0,1]\).

In particular, given any \(\mathcal {W}_{\eta ,\theta , m}\) geodesic \((\mu _{t})_{t\in [0,1]}\), there exists a tangent flux which is antisymmetric.

Proof

The proof of (i) is identical to the argument presented in [25, Corollary 2.8] and is therefore omitted.

(ii) This likewise follows from a minor modification of arguments given in [25, Lemma 2.6 and Corollary 2.8]. Namely, we reason from Lemma 2.10, in particular from the convexity of the action in the \(\textbf{j}\) variable:

$$\begin{aligned} \mathcal {A}(\mu _{t},\textbf{j}_{t}^{as})\le \frac{1}{2}\mathcal {A}(\mu _{t},\textbf{j}_{t})+\frac{1}{2}\mathcal {A}(\mu _{t},-\textbf{j}_{t}^{T}). \end{aligned}$$

Now, selecting \(\lambda _{t}\in \mathcal {M}(G)\) so that \(\lambda _{t}\gg -\textbf{j}_{t}^{T}\),

$$\begin{aligned} \mathcal {A}_{\theta ,\eta }(\mu _{t},-\textbf{j}_{t}^{T})=\iint _{G}\frac{\left( \frac{d(-\textbf{j}_{t}^{T})}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\mu _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \mu _{t})}{d\lambda _{t}}(x,y)\right) }\eta (x,y)d\lambda _{t}(x,y). \end{aligned}$$

Note that without loss of generality we can take \(\lambda _{t}\) to also dominate \(\textbf{j}_{t}\), and furthermore we can take \(\lambda _{t}\) to be symmetric (by replacing \(\lambda _{t}\) with \((\lambda _{t}+\lambda _{t}^{T})/2\)). Then, computing that

$$\begin{aligned} \frac{d(-\textbf{j}_{t}^{T})}{d\lambda _{t}}=-\frac{d\textbf{j}_{t}^{T}}{d\lambda _{t}}=-\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right) ^{T} \end{aligned}$$

and using the fact that \(\theta \left( \frac{d(\mu _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \mu _{t})}{d\lambda _{t}}(x,y)\right) \), \(\eta (x,y)\), and \(\lambda _{t}\) are all symmetric, we see that

$$\begin{aligned} \iint _{G}\frac{\left( \frac{d\left( -\textbf{j}_{t}^{T}\right) }{d\lambda _{t}}\right) ^{2}}{2\theta \left( \frac{d(\mu _{t}\otimes m)}{d\lambda _{t}},\frac{d(m\otimes \mu _{t})}{d\lambda _{t}} \right) } \, \eta d\lambda _{t} =\iint _{G}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right) ^{2}}{2\theta \left( \frac{d(\mu _{t}\otimes m)}{d\lambda _{t}},\frac{d(m\otimes \mu _{t})}{d\lambda _{t}} \right) } \, \eta d\lambda _{t}. \end{aligned}$$

Consequently, \(\mathcal {A}(\mu _{t},-\textbf{j}_{t}^{T})=\mathcal {A}(\mu _{t},\textbf{j}_{t})\), and so \(\mathcal {A}(\mu _{t},\textbf{j}_{t}^{as})\le \mathcal {A}(\mu _{t},\textbf{j}_{t})\). \(\square \)

Remark 2.19

Thus far, we have only defined \(\mathcal {W}_{\eta ,\theta ,m}(\mu ,\nu )\) in the case where \(\mu ,\nu \in \mathcal {P}(\mathbb {R}^{d})\). However, it is occasionally useful to consider the nonlocal Wasserstein distance between nonnegative Radon measures of equal mass: we do so, in particular, in the proof of Proposition 3.13 below. Indeed, the action \(\mathcal {A}(\rho ,\textbf{j})\) is still well-defined for \(\rho \in \mathcal {M}^{+}(\mathbb {R}^{d})\), and it is trivial to modify the definition of a solution to the nonlocal continuity equation given in Definition 2.12 to allow for curves \(\rho _{t}:[0,T]\rightarrow \mathcal {M}^{+}(\mathbb {R}^{d})\) whose total mass is fixed for all \(t\in [0,T]\). Therefore, in the case where \(\mu ,\nu \in \mathcal {M}^{+}(\mathbb {R}^{d})\) and \(\Vert \mu \Vert =\Vert \nu \Vert \), \(\mathcal {W}_{\eta ,\theta ,m}(\mu ,\nu )\) can be defined exactly as in Definition 2.15. More precisely, we have the following well-definedness and homogeneity result:

Proposition 2.20

(Extension of \(\mathcal {W}_{\eta ,\theta ,m}\) to general nonnegative measures) Let \(\mu ,\nu \in \mathcal {M}^{+}(\mathbb {R}^{d})\). Suppose that \(\Vert \mu \Vert _{TV}=\Vert \nu \Vert _{TV}=M>0\). Then,

$$\begin{aligned} \mathcal {W}_{\eta ,\theta ,m}^{2}(\mu ,\nu )=M\mathcal {W}_{\eta ,\theta ,m}^{2}\left( \frac{\mu }{M},\frac{\nu }{M}\right) . \end{aligned}$$

In particular, \(\mathcal {W}_{\eta ,\theta ,m}^{2}(\mu ,\nu )<\infty \) iff \(\,\mathcal {W}_{\eta ,\theta ,m}^{2}\left( \frac{\mu }{M},\frac{\nu }{M}\right) <\infty \).

Proof

Let \((\rho _{t},\textbf{j}_{t})_{t\in [0,1]}\in \mathcal{C}\mathcal{E}\left( \frac{\mu }{M},\frac{\nu }{M}\right) \), and suppose that \(\mathcal {W}_{\eta ,\theta ,m}^{2}\left( \frac{\mu }{M},\frac{\nu }{M}\right) =\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt\). Then, \(\left( M\rho _{t},M\textbf{j}_{t}\right) _{t\in [0,1]}\) is also a solution to the nonlocal continuity equation, with endpoints \(\mu \) and \(\nu \); consequently, using the 1-homogeneity of the action,

$$\begin{aligned} \mathcal {W}_{\eta ,\theta ,m}^{2}(\mu ,\nu )\le \int _{0}^{1}\mathcal {A}(M\rho _{t},M\textbf{j}_{t})dt=M\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt=M\mathcal {W}_{\eta ,\theta ,m}^{2}\left( \frac{\mu }{M},\frac{\nu }{M}\right) . \end{aligned}$$

By identical reasoning, we also deduce that \(\mathcal {W}_{\eta ,\theta ,m}^{2}\left( \frac{\mu }{M},\frac{\nu }{M}\right) \le \frac{1}{M}\mathcal {W}_{\eta ,\theta ,m}^{2}(\mu ,\nu )\). \(\square \)

2.6 Convolutions

We make frequent use of convolution estimates in Sects. 4 and 5. A number of elementary computations relating to convolutions are deferred to Appendix B; here, we fix notation and state some basic convolution stability results concerning the \(\mathcal {W}_{\eta }\) distances. Lastly, we show that a specific convolution kernel, the Laplace kernel \(K(x):=c_{K}e^{-|x|}\), has the property that smoothed densities \(K*\mu \) have relative Lipschitz regularity of the form \(\frac{K*\mu (x)}{K*\mu (x^{\prime })}\le 1+C|x-x^{\prime }|\), a fact which we exploit repeatedly in Sects. 4 and 5.

Definition 2.21

We say that \(k:(0,\infty )\rightarrow [0,\infty )\) is a convolution kernel if it is \(C^{1}\) on its support and normalized so that \(\int _{\mathbb {R}^{d}}k(|x|)dx=1\).

Given a convolution kernel k and a (possibly signed, possibly vector-valued) measure \(\mu \), we denote by \(k*\mu \) the convolution of \(k(|\,\cdot \,|)\) and \(\mu \):

$$\begin{aligned} k*\mu (x):=\int k(|x-y|)d\mu (y). \end{aligned}$$

We also use the following notation: given a convolution kernel k and a measure \(\mu \), we write \(\varvec{k}*\mu \) to denote the measure whose Lebesgue density is given by

$$\begin{aligned} \frac{d(\varvec{k}*\mu )}{dx}=k*\mu . \end{aligned}$$

Separately, for measures \(\textbf{j}\in \mathcal {M}_{loc}(G)\), we follow [19] and define the convolution \(\varvec{k}*\textbf{j}\in \mathcal {M}_{loc}(G)\) of a convolution kernel k (on \(\mathbb {R}^{d}\)) with \(\textbf{j}\) as follows:

$$\begin{aligned} \varvec{k}*\textbf{j}=\int _{\mathbb {R}^{d}}k(|z|)d\textbf{j}(x-z,y-z)dz \end{aligned}$$

in other words, for all \(\varphi \in C_{c}^{\infty }(G)\),

$$\begin{aligned} \iint _{G}\varphi (x,y)d(\varvec{k}*\textbf{j})(x,y)=\int _{\mathbb {R}^{d}}\iint _{G}k(|z|)\varphi (x+z,y+z)d\textbf{j}(x,y)dz. \end{aligned}$$

Note that this definition may be understood as a special case of convolution with respect to a translation-invariant group action on a space: in this case, \(\mathbb {R}^{d}\) acts on G by the translation \((x,y)\mapsto (x+z,y+z)\), and this action is indeed translation-invariant since \(\eta (|x-y|)\) is preserved under this translation.
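For instance, if \(\textbf{j}=\delta _{(a,b)}\) is a unit point flux on a single edge \((a,b)\in G\), then for any \(\varphi \in C_{c}^{\infty }(G)\),

$$\begin{aligned} \iint _{G}\varphi (x,y)d(\varvec{k}*\delta _{(a,b)})(x,y)=\int _{\mathbb {R}^{d}}k(|z|)\varphi (a+z,b+z)dz; \end{aligned}$$

that is, \(\varvec{k}*\delta _{(a,b)}\) spreads the edge \((a,b)\) over its rigid translates \((a+z,b+z)\), each of which has the same length \(|a-b|\) and hence the same weight \(\eta (|a-b|)\).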

Proposition 2.22

(Stability of action and metric under convolution) Let k be any convolution kernel.

  1. (i)

    For any \(\mu \in \mathcal {P}(\mathbb {R}^{d})\) and \(\textbf{j}\in \mathcal {M}_{loc}(G)\), \(\mathcal {A}_{\theta ,\eta }(\varvec{k}*\mu ,\varvec{k}*\textbf{j})\le \mathcal {A}_{\theta ,\eta }(\mu ,\textbf{j})\).

  2. (ii)

    Let \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\in \mathcal{C}\mathcal{E}\). Then, \((\varvec{k}*\mu _{t},\varvec{k}*\textbf{j}_{t})_{t\in [0,1]}\in \mathcal{C}\mathcal{E}\) also.

  3. (iii)

    ([19, Proposition 4.8]) For any \(\mu _{0},\mu _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\), \(\mathcal {W}_{\theta ,\eta }(\varvec{k}*\mu _{0},\varvec{k}*\mu _{1})\le \mathcal {W}_{\theta ,\eta }(\mu _{0},\mu _{1})\).

Proof

(i) In Lemma A.3, we show that

$$\begin{aligned} \mathcal {A}_{\eta ,\theta }(\varvec{k}*\mu ,\varvec{k}*\textbf{j})\le \int _{\mathbb {R}^{d}}\mathcal {A}_{\eta ,\theta }(\mu _{z},\textbf{j}_{z})k(|z|)dz. \end{aligned}$$

Here \(\mu _{z}\) and \(\textbf{j}_{z}\) denote the translates of \(\mu \) and \(\textbf{j}\) by z. The result now follows according to the reasoning given in [19, proof of Proposition 2.8]. (ii) follows by identical reasoning to [19, proof of Proposition 4.8]. Finally, (iii) follows by combining (i) and (ii): indeed, letting \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\) be an action-minimizing solution to the nonlocal continuity equation with endpoints \(\mu _{0}\) and \(\mu _{1}\), we see that

$$\begin{aligned} \mathcal {W}_{\theta ,\eta }^{2}(\mu _{0},\mu _{1})=\int _{0}^{1}\mathcal {A}(\mu _{t},\textbf{j}_{t})dt\ge \int _{0}^{1}\mathcal {A}(\varvec{k}*\mu _{t},\varvec{k}*\textbf{j}_{t})dt\ge \mathcal {W}_{\theta ,\eta }^{2}(\varvec{k}*\mu _{0},\varvec{k}*\mu _{1}). \end{aligned}$$

\(\square \)

Remark 2.23

Similarly, it is known [3, Lemmas 8.1.9 and 8.1.10] that if \((\rho _{t},\textbf{j}_{t})_{t\in [0,1]}\) solves the (local) continuity equation \(\partial _{t}\rho _{t}+{{\,\textrm{div}\,}}\textbf{j}_{t}=0\) in the sense of distributions, and k is a convolution kernel, then \((\varvec{k}*\rho _{t},\varvec{k}*\textbf{j}_{t})_{t\in [0,1]}\) is again a solution of \(\partial _{t}\rho _{t}+{{\,\textrm{div}\,}}\textbf{j}_{t}=0\) in the sense of distributions; and similarly,

$$\begin{aligned} \int _{\mathbb {R}^{d}}\left| \frac{d(\varvec{k}*\textbf{j}_{t})}{d(\varvec{k}*\rho _{t})}\right| ^{2}d(\varvec{k}*\rho _{t})\le \int _{\mathbb {R}^{d}}\left| \frac{d\textbf{j}_{t}}{d\rho _{t}}\right| ^{2}d\rho _{t}. \end{aligned}$$

Therefore, applying the mass-flux presentation of the \(W_{2}\) metric described at the beginning of Sect. 2 above, we can reason exactly as in the proof of part (iii) of the previous proposition to deduce that for any convolution kernel k and probability measures \(\mu _{0}\) and \(\mu _{1}\),

$$\begin{aligned} W_{2}(\varvec{k}*\mu _{0},\varvec{k}*\mu _{1})\le W_{2}(\mu _{0},\mu _{1}). \end{aligned}$$

2.6.1 Relative Lipschitz estimate for the right convolution

Let \(K(x) = c_K \, e^{-|x|}\) where \(\frac{1}{c_K} =\int _{\mathbb {R}^d} e^{-|x|} dx\). Let \(K_\delta (x) = \frac{1}{\delta ^d} K \left( \frac{x}{\delta }\right) \), where \(\delta >0\).

Lemma 2.24

Consider \(\mu \in {\mathcal {P}}_2(\mathbb {R}^d)\). Let \(\mu _\delta = K_\delta * \mu \). Then

$$\begin{aligned} \ln \frac{\mu _\delta (y)}{\mu _\delta (x)} \le \frac{1}{\delta } |y-x|. \end{aligned}$$

Furthermore, if \(|x-y|\le \delta \), then

$$\begin{aligned} \mu _\delta (y) \le \mu _\delta (x) \left( 1 + \frac{3}{\delta } |y-x| \right) . \end{aligned}$$

Proof

Let \(h = y-x\). We can assume \(h \ne 0\). By the triangle inequality, \(|x+h-z|\ge |x-z|-|h|\) for every z, so \(e^{-|x+h-z|/\delta }\le e^{-|x-z|/\delta }e^{|h|/\delta }\); by symmetry in x and \(x+h\), we therefore compute that

$$\begin{aligned} |\ln \mu _\delta (x+h) - \ln \mu _\delta (x) |&\le \ln \frac{ \int e^{-|x-z|/\delta } e^{|h|/\delta } d\mu (z)}{ \int e^{-|x-z|/\delta } d\mu (z)} = \frac{|h|}{\delta }. \end{aligned}$$

Therefore

$$\begin{aligned} \ln \frac{\mu _\delta (y)}{\mu _\delta (x)} \le \frac{1}{\delta } |y-x|. \end{aligned}$$

Hence, for \(|y-x|\le \delta \), using the convexity bound \(e^{u}\le 1+(e-1)u\le 1+3u\), valid for \(u\in [0,1]\),

$$\begin{aligned} \mu _\delta (y) \le \mu _\delta (x) e^{|y-x|/\delta } \le \mu _\delta (x) \left( 1 + \frac{3}{\delta } |y-x| \right) . \end{aligned}$$

\(\square \)

3 Metric structure of \(\mathcal {W}_{\eta ,\theta }\) distances

3.1 General lower bounds for nonlocal Wasserstein distances

In this subsection, we consider nonlocal Wasserstein distances \(\mathcal {W}_{\eta ,\theta ,m}\) with general reference measure m, since this complicates our analysis only minimally. As in Definition 2.15, we assume that \(\eta \) satisfies Assumption 2.1 (i–iv), and that \(\theta \) satisfies Assumption 2.2.

We first recall the following result of Erbar (which we specialize somewhat), which gives a partial characterization of the topology induced by the nonlocal Wasserstein distance.

Proposition 3.1

([19, Proposition 4.5]) Suppose that \(\rho _{0},\rho _{1}\in \mathcal {P}(\mathbb {R}^{d})\). The nonlocal Wasserstein distance \(\mathcal {W}_{\eta ,\theta ,m}\) with arbitrary reference measure m of Definition 2.15 satisfies

$$\begin{aligned} \sqrt{\frac{2}{\tilde{C}}}W_{1}(\rho _{0},\rho _{1})\le \mathcal {W}_{\eta ,\theta ,m}(\rho _{0},\rho _{1}). \end{aligned}$$

Here \(\tilde{C}=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}|x-y|^{2}\eta (|x-y|)dm(y)\).

In particular, observe that this \(W_{1}\) lower bound is vacuous in the case where \(m=\text {Leb}\) and the second moment of \(\eta \) is infinite.
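On the other hand, in the case \(m=\text {Leb}\) with \(M_{2}(\eta )<\infty \), translation invariance allows the constant to be computed explicitly in terms of the second moment of \(\eta \):

$$\begin{aligned} \tilde{C}=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}|x-y|^{2}\eta (|x-y|)dy=\int _{\mathbb {R}^{d}}|y|^{2}\eta (|y|)dy=M_{2}(\eta ), \end{aligned}$$

so the proposition then gives \(W_{1}(\rho _{0},\rho _{1})\le \sqrt{M_{2}(\eta )/2}\,\mathcal {W}_{\eta ,\theta }(\rho _{0},\rho _{1})\).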

When the lower bound in the previous proposition is non-vacuous, it shows, in particular, that the topology induced by \(\mathcal {W}_{\eta ,\theta }\) is as strong as or stronger than the narrow topology on \(\mathcal {P}(\mathbb {R}^{d})\). What we show below is that, when \(\eta \) is integrable, the topology is strictly stronger. More precisely, we show that the topology induced by \(\mathcal {W}_{\eta ,\theta ,m}\) is at least as strong as the strong topology on \(\mathcal {P}(\mathbb {R}^{d})\). This indicates that the nonlocal Wasserstein distances are fundamentally different from the standard Wasserstein distances.

Proposition 3.2

Suppose that \(\rho _{0},\rho _{1}\in \mathcal {P}(\mathbb {R}^{d})\). The nonlocal Wasserstein distance \(\mathcal {W}_{\eta ,\theta ,m}\) with arbitrary reference measure m of Definition 2.15 satisfies

$$\begin{aligned} \sqrt{\frac{2}{C}}TV(\rho _{0},\rho _{1})\le \mathcal {W}_{\eta ,\theta ,m}(\rho _{0},\rho _{1}). \end{aligned}$$

Here \(C=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\eta (|x-y|)dm(y)\).

This lower bound is vacuous when \(\eta \) is non-integrable. This suggests that the nonlocal transportation distance induces a weaker topology when \(\eta \) is non-integrable; we address this point later on in Lemma 3.11.

Proof

Recall that one of the several equivalent characterizations of the total variation distance is as follows:

$$\begin{aligned} TV(\rho _{0},\rho _{1})=\sup _{A\in \mathcal {B}(\mathbb {R}^{d})}|\rho _{0}(A)-\rho _{1}(A)|. \end{aligned}$$

Fix \(\varepsilon >0\), and let A be a measurable set such that \(|\rho _{0}(A)-\rho _{1}(A)|>TV(\rho _{0},\rho _{1})-\frac{\varepsilon }{2}.\) Any mass-flux pair \((\rho _{t},\textbf{j}_{t})\) connecting \(\rho _{0}\) to \(\rho _{1}\) must therefore move at least \(TV(\rho _{0},\rho _{1})-\varepsilon \) of mass from A to \(A^{C}\) (or vice versa). Without loss of generality, we can take A to be compact.

Let \((\rho _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}(\rho _{0},\rho _{1})\) be an action-minimizing mass-flux pair, so that \(\mathcal {W}_{\eta ,\theta ,m}^{2}(\rho _{0},\rho _{1})=\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt\); without loss of generality we can assume, by [19, Proposition 4.3], that \((\rho _{t},\textbf{j}_{t})\) is unit speed in the sense that \(\mathcal {W}_{\eta ,\theta ,m}^{2}(\rho _{0},\rho _{1})=\mathcal {A}(\rho _{t},\textbf{j}_{t})\) for almost all \(t\in [0,1]\). We may also assume, without loss of generality, that \(\textbf{j}_{t}\) is antisymmetric for almost all t, by Lemma 2.18.

Let \(\lambda _{t}\) be some measure which dominates all of \(\textbf{j}_{t}\), \(m\otimes \rho _{t}\), and \(\rho _{t}\otimes m\). The action for \((\rho _{t},\textbf{j}_{t})\) is

$$\begin{aligned} \mathcal {A}_{\theta }(\rho _{t},\textbf{j}_{t})&=\frac{1}{2}\iint _{G}\left( \frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{\theta \left( \frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\right) \eta (x,y)d\lambda _{t}(x,y). \end{aligned}$$

Applying the Cauchy-Schwarz inequality, and using the fact that \(\left| \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right| =\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}\), we get

$$\begin{aligned} \mathcal {A}_{\theta }(\rho _{t},\textbf{j}_{t})&\ge \frac{1}{2}\left( \iint _{G}\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}(x,y)\right) ^{2}\\&\quad \times \left( \iint _{G}\theta \left( \frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) \eta (x,y)d\lambda _{t}(x,y)\right) ^{-1}. \end{aligned}$$

By Lemma 2.4, \(\theta \) automatically satisfies \(\theta (r,s)\le (r+s)/2\), so we find that

$$\begin{aligned}{} & {} \left( \iint _{G}\theta \left( \frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) \eta (x,y)d\lambda _{t}(x,y)\right) \\{} & {} \le \frac{1}{2}\left( \iint _{G}\left( \frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y)+\frac{d(m\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) \eta (x,y)d\lambda _{t}(x,y)\right) . \end{aligned}$$

By the Radon-Nikodym theorem, we have the estimate

$$\begin{aligned} \iint _{G}\frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}(x,y)&=\iint _{G}\eta (x,y)d(\rho _{t}\otimes m)(x,y)\\&\le \int _{\mathbb {R}^{d}}\left[ \sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\eta (|x-y|)dm(y)\right] d\rho _{t}(x)\\&=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\eta (|x-y|)dm(y) =: C. \end{aligned}$$

The same estimate holds with \(\rho _{t}\otimes m\) replaced by \(m\otimes \rho _{t}\); hence, we conclude that

$$\begin{aligned} \iint _{G}\theta \left( \frac{d(\rho _{t}\otimes m)}{d\lambda _{t}}(x,y),\frac{d(m\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) \eta (x,y)d\lambda _{t}(x,y)\le C. \end{aligned}$$

Therefore, (applying the Radon–Nikodym theorem once more)

$$\begin{aligned} \mathcal {A}_{\theta }(\rho _{t},\textbf{j}_{t})\ge \frac{1}{2C}\left( \iint _{G}\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}\right) ^{2}=\frac{1}{2C}\left( \iint _{G}\eta (x,y)d|\textbf{j}_{t}|(x,y)\right) ^{2}. \end{aligned}$$
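In discrete form, the estimate controlling the constant C says that integrating η against a product measure \(\rho \otimes m\) is bounded by \(\sup _{x}\int \eta (|x-y|)dm(y)\), because ρ is a probability measure. A small synthetic check, with matrix entries playing the role of kernel values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
eta = rng.random((n, n))               # kernel values eta(x_i, y_j) >= 0
m = rng.random(n)                      # reference-measure weights
rho = rng.random(n)
rho /= rho.sum()                       # probability weights
lhs = rho @ eta @ m                    # double integral of eta d(rho x m)
C = (eta @ m).max()                    # sup over x of int eta(x, y) dm(y)
assert lhs <= C
```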

Let \(\xi _{\delta }\) be a smooth cutoff function for the set A; more precisely, \(\xi _{\delta }=1\) on A, \(\xi _{\delta }=0\) on \(A_{\delta }^{c}\), and \(\xi _{\delta }\) is smooth on \(\mathbb {R}^{d}\) (the existence of such a \(\xi _{\delta }\) is guaranteed by the smooth version of Urysohn’s lemma, see [35, Proposition 2.25]). We use \(\xi _{\delta }(x)\) as a test function in the nonlocal continuity equation: by [25, Lemma 2.15] we find that

$$\begin{aligned} \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{1}(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}=-\frac{1}{2}\int _{0}^{1}\iint _{G}(\xi _{\delta }(y)-\xi _{\delta }(x))\eta (x,y)d\textbf{j}_{t}(x,y)dt. \end{aligned}$$

Note that \(|\xi _{\delta }(y)-\xi _{\delta }(x)|\le 1\), so

$$\begin{aligned} \left| \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{1}(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}\right|&=\frac{1}{2}\left| \int _{0}^{1}\iint _{G}(\xi _{\delta }(y)-\xi _{\delta }(x))\eta (x,y)d\textbf{j}_{t}(x,y)dt\right| \\&\le \frac{1}{2}\int _{0}^{1}\iint _{G}\eta (x,y)d|\textbf{j}_{t}|(x,y)dt. \end{aligned}$$

Now, selecting \(\delta >0\) so that

$$\begin{aligned} \left| \left( \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{1}(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}\right) -\left( \rho _{1}(A)-\rho _{0}(A)\right) \right| <\frac{\varepsilon }{2} \end{aligned}$$

we can compute that

$$\begin{aligned} TV(\rho _{0},\rho _{1})&\le |\rho _{1}(A)-\rho _{0}(A)|+\frac{\varepsilon }{2}\\&\le \left| \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{1}(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}\right| +\varepsilon \\&\le \frac{1}{2}\int _{0}^{1}\iint _{G}\eta (x,y)d|\textbf{j}_{t}|(x,y)dt+\varepsilon \\&\le \frac{1}{2}\sqrt{2C}\int _{0}^{1}\sqrt{\mathcal {A}(\rho _{t},\textbf{j}_{t})}dt+\varepsilon . \end{aligned}$$

Since \(\mathcal {W}_{\eta ,\theta ,m}(\rho _{0},\rho _{1}):=\int _{0}^{1}\sqrt{\mathcal {A}(\rho _{t},\textbf{j}_{t})}dt\), and \(\varepsilon >0\) was arbitrary, we conclude that

$$\begin{aligned} \mathcal {W}_{\eta ,\theta ,m}(\rho _{0},\rho _{1})\ge \sqrt{\frac{2}{C}}\,TV(\rho _{0},\rho _{1}),\qquad \text {where }C=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\eta (|x-y|)dm(y), \end{aligned}$$

as desired. \(\square \)

We also have the following technical corollary, which will be used in Proposition 3.6.

Corollary 3.3

Suppose that \(C:=\sup _{x\in \mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\eta (|x-y|)dm(y)<\infty \). Let \(\rho _{0},\rho _{1}\in \mathcal {P}(\mathbb {R}^{d})\), and suppose that \(\mathcal {W}_{\eta ,\theta ,m}(\rho _{0},\rho _{1})<\infty \). Let \((\rho _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}(\rho _{0},\rho _{1})\) be a constant-speed action-minimizing mass-flux pair, so that \(\mathcal {W}_{\eta ,\theta ,m}^{2}(\rho _{t_{0}},\rho _{t_{1}})=\int _{t_{0}}^{t_{1}}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt\) for all \(0\le t_{0}<t_{1}\le 1\). Then, for any Borel \(A\subset \mathbb {R}^{d}\), the function \(t\mapsto \rho _{t}(A)\) is \(\frac{1}{2}\)-Hölder continuous.

Proof

Let \(0\le t_{0}<t_{1}\le 1\). Consider \((\rho _{t},\textbf{j}_{t})_{t\in [t_{0},t_{1}]}\), the \(t_{0}\)-to-\(t_{1}\) restriction of \((\rho _{t},\textbf{j}_{t})\in \mathcal{C}\mathcal{E}(\rho _{0},\rho _{1})\). Let \((\tilde{\rho }_{t},\tilde{\textbf{j}_{t}})_{t\in [0,1]}\) denote the uniform reparametrization of \((\rho _{t},\textbf{j}_{t})_{t\in [t_{0},t_{1}]}\) into a unit-time solution to the nonlocal continuity equation; compute that

$$\begin{aligned} \int _{t_{0}}^{t_{1}}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt=(t_{1}-t_{0})\int _{0}^{1}\mathcal {A}(\tilde{\rho }_{t},\tilde{\textbf{j}_{t}})dt. \end{aligned}$$

But, \((\tilde{\rho }_{t},\tilde{\textbf{j}_{t}})_{t\in [0,1]}\) is also an action-minimizing solution to the nonlocal continuity equation—otherwise, we could locally replace \((\rho _{t},\textbf{j}_{t})_{t\in [t_{0},t_{1}]}\) and get a lower-action mass-flux pair connecting \(\rho _{0}\) and \(\rho _{1}\), which is ruled out by assumption. Therefore, by Proposition 3.2,

$$\begin{aligned} (t_{1}-t_{0})^{1/2}\mathcal {W}_{\theta ,\eta ,m}(\rho _{t_{0}},\rho _{t_{1}})\ge \sqrt{\frac{2}{C}}TV(\rho _{t_{0}},\rho _{t_{1}}). \end{aligned}$$

Now, let A be any Borel set. Since \(TV(\rho _{t_{0}},\rho _{t_{1}})=\sup _{A\in \mathcal {B}(\mathbb {R}^{d})}|\rho _{t_{0}}(A)-\rho _{t_{1}}(A)|\), we find that

$$\begin{aligned} (t_{1}-t_{0})^{1/2}\mathcal {W}_{\theta ,\eta ,m}(\rho _{t_{0}},\rho _{t_{1}})\ge \sqrt{\frac{2}{C}}|\rho _{t_{0}}(A)-\rho _{t_{1}}(A)|. \end{aligned}$$

Finally, since \(\mathcal {W}_{\theta ,\eta ,m}(\rho _{t_{0}},\rho _{t_{1}})\le \mathcal {W}_{\theta ,\eta ,m}(\rho _{0},\rho _{1})\), we find that

$$\begin{aligned} |\rho _{t_{0}}(A)-\rho _{t_{1}}(A)|\le \sqrt{\frac{C}{2}}\,\mathcal {W}_{\theta ,\eta ,m}(\rho _{0},\rho _{1})(t_{1}-t_{0})^{1/2} \end{aligned}$$

which shows that \(t\mapsto \rho _{t}(A)\) is \(\frac{1}{2}\)-Hölder continuous, as desired. \(\square \)

3.2 Expel problem for \(\mathcal {W}_{\eta ,\theta }\)

In this subsection we consider the expel problem for nonlocal Wasserstein distances. That is, given a Dirac mass, say at the origin \(\delta _{0}\) for concreteness, we wish to estimate \(\inf _{\nu \bot \delta _{0}}\mathcal {W}_{\eta ,\theta }(\delta _{0},\nu )\). Throughout, we only consider the case where the reference measure is the Lebesgue measure. In this subsection, we also assume that \(\eta \) satisfies Assumption 2.1 (i–v), and that \(\theta \) satisfies Assumption 2.2.

We shall make repeated use of an adaptation of a specific computation from [37], which we present separately as Lemma B.1.

Lemma 3.4

(expel cost upper bound) Let \(0<\delta <\varepsilon \) where \(\varepsilon \le 1\), and let \(x_{0}\in \mathbb {R}^{d}\). Suppose that there is some \(s>0\) and some constant \(c_{s}>0\) such that \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\) when \(|x-y|\le \frac{\delta }{\varepsilon }\). Let \(\mathfrak {m}_{B(x_{0},\delta )}\) denote the uniform probability measure on the ball \(B(x_{0},\delta )\). Then,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )})\le \frac{C_{\theta }}{C_{d,s}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2} \end{aligned}$$

where \(C_{\theta }:=\int _{0}^{1}\frac{1}{\sqrt{\theta (1-r,1+r)}}dr\) and \(C_{d,s}\) is given explicitly in the proof.

An important consequence of the result above is that \(\mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )}) \rightarrow 0\) as \(\delta \rightarrow 0\) and hence the expel cost is zero: \(\inf _{\nu \bot \delta _{0}}\mathcal {W}_{\eta ,\theta }(\delta _{0},\nu )=0\).

Proof

Given \(x_{0}\in \mathbb {R}^{d}\) and \(r>0\) we write \(\mathfrak {A}(x_{0},r)\) to denote \(B(x_{0},r)\backslash B(x_{0},\frac{r}{2})\), that is, the annulus of outer radius r and inner radius \(\frac{r}{2}\). We let \(\mathfrak {m}_{\mathfrak {A}(x_{0},r)}\) denote the uniform probability measure on \(\mathfrak {A}(x_{0},r)\).

Fix \(\delta >0\). Applying Lemma B.1 with \(A=\mathfrak {A}(x_{0},\delta 2^{-n})\) and \(B=\mathfrak {A}(x_{0},\delta 2^{-n-1})\), we find that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n})},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n-1})})\le \frac{C_{\theta }}{4\sqrt{|\mathfrak {A}(x_{0},\delta 2^{-n})|\eta _{\varepsilon }(\delta \frac{3}{2}2^{-n})}}. \end{aligned}$$

Let us write this upper bound in a more explicit fashion. We know that

$$\begin{aligned} |\mathfrak {A}(x_{0},\delta 2^{-n})|=\alpha _{d}\left( \delta 2^{-n}\right) ^{d}-\alpha _{d}\left( \delta 2^{-n-1}\right) ^{d}=\alpha _{d}\delta ^{d}\left( \frac{1}{2^{nd}}-\frac{1}{2^{nd+d}}\right) \end{aligned}$$

where \(\alpha _{d}\) is the volume of the d-dimensional unit ball. On the other hand, \(\eta _{\varepsilon }\left( \delta \frac{3}{2}2^{-n}\right) =\frac{1}{\varepsilon ^{d}}\eta (\frac{\delta }{\varepsilon }\frac{3}{2}2^{-n})\). Suppose now that \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\) for \(|x-y|\le \frac{\delta }{\varepsilon }\), where \(s>0\). Then

$$\begin{aligned} \eta _{\varepsilon }\left( \delta \frac{3}{2}2^{-n}\right) \ge \frac{1}{\varepsilon ^{d}}c_{s}\left( \frac{\delta }{\varepsilon }\right) ^{-d-s}\left( \frac{3}{2}2^{-n}\right) ^{-d-s} \end{aligned}$$
(3.1)

so that

$$\begin{aligned} |\mathfrak {A}(x_{0},\delta 2^{-n})|\eta _{\varepsilon }\left( \delta \frac{3}{2}2^{-n}\right)&\ge \alpha _{d}\delta ^{d}\left( \frac{1}{2^{nd}}-\frac{1}{2^{nd+d}}\right) \cdot \frac{1}{\varepsilon ^{d}}c_{s}\left( \frac{\delta }{\varepsilon }\right) ^{-d-s}\left( \frac{3}{2}2^{-n}\right) ^{-d-s}\\&=\alpha _{d}\left( \frac{\delta }{\varepsilon }\right) ^{-s}2^{ns}\left( 1-\frac{1}{2^{d}}\right) c_{s}\left( \frac{3}{2}\right) ^{-d-s}. \end{aligned}$$

Putting \(\tilde{C}_{d,s}=2\left( \alpha _{d}c_{s}\left( \frac{3}{2}\right) ^{-d-s}\right) ^{1/2}\), this shows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n})},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n-1})})\le \frac{C_{\theta }}{\tilde{C}_{d,s}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2}2^{-ns/2}. \end{aligned}$$

Summing the geometric series, we find

$$\begin{aligned} \sum _{n=0}^{\infty }\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n})},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n-1})})\le \frac{C_{\theta }}{\tilde{C}_{d,s}}\frac{1}{1-2^{-s/2}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2}. \end{aligned}$$

It follows that \(\mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )})\le \frac{C_{\theta }}{\tilde{C}_{d,s}}\frac{1}{1-2^{-s/2}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2}\). To see why, observe that \((\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n})})_{n\in \mathbb {N}}\) converges to \(\delta _{x_{0}}\) in \(W_{1}\) and thus in the narrow topology. Since \(\mathcal {W}_{\eta ,\varepsilon }\) is jointly l.s.c. with respect to the narrow topology, we find that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )})&\le \liminf _{k\rightarrow \infty }\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-k})},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )})\\&\le \liminf _{k\rightarrow \infty }\sum _{n=0}^{k}\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n})},\mathfrak {m}_{\mathfrak {A}(x_{0},\delta 2^{-n-1})})\\&\le \frac{C_{\theta }}{\tilde{C}_{d,s}}\frac{1}{1-2^{-s/2}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2}. \end{aligned}$$
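The geometric series used above converges precisely because its ratio \(2^{-s/2}\) is strictly below 1 for \(s>0\); a quick numerical confirmation of the closed form (the value \(s=0.7\) is arbitrary):

```python
import math

s = 0.7                                 # any s > 0 works
ratio = 2 ** (-s / 2)                   # common ratio, strictly below 1
partial = sum(ratio**n for n in range(500))
closed_form = 1 / (1 - ratio)
assert math.isclose(partial, closed_form, rel_tol=1e-12)
```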

Finally, we can easily upper bound \(\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )},\mathfrak {m}_{B(x_{0},\delta )})\). We use another construction based on the \(\mathcal {W}\) geodesic in the two-point space, following exactly the computation in the proof of Lemma B.1. Indeed, consider the curve \(\rho _{t}:[0,1]\rightarrow \mathcal {P}(\mathbb {R}^d)\) defined by

$$\begin{aligned} \frac{d\rho _{t}}{d\text {Leb}}(x)={\left\{ \begin{array}{ll} \frac{1-\gamma _{t}}{2|\mathfrak {A}(x_{0},\delta )|} &{} x\in \mathfrak {A}(x_{0},\delta )\\ \frac{1+\gamma _{t}}{2|B(x_{0},\delta 2^{-1})|} &{} x\in B(x_{0},\delta 2^{-1})\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

Additionally, let \(\textbf{j}_{t}\) be chosen exactly as in the proof of Lemma B.1. This \((\rho _{t},\textbf{j}_{t})\) is constructed so that \(\rho _{0}=\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )}\) and \(\rho _{1}=\mathfrak {m}_{B(x_{0},\frac{\delta }{2})}\), and so that the mass on \(\mathfrak {A}\left( x_{0},\delta \right) \) is decreasing uniformly on the set, and continuously in time; therefore, there is a \(t_{0}\in (0,1)\) such that \(\rho _{t_{0}}\) has uniform distribution on \(B(x_{0},\delta )\).
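The intermediate time \(t_{0}\) can be made concrete: both branches of the density above equal \(1/|B(x_{0},\delta )|\) exactly when \(\gamma _{t}=2^{1-d}-1\), so the curve passes through the uniform measure on \(B(x_{0},\delta )\). A short check of this value across dimensions (unit-ball volumes via the Gamma function; by scaling, the value of δ is irrelevant, so we take \(\delta =1\)):

```python
import math

for d in range(1, 7):
    vol = lambda rad: math.pi ** (d / 2) / math.gamma(d / 2 + 1) * rad**d
    ball, inner = vol(1.0), vol(0.5)     # |B(x0, 1)| and |B(x0, 1/2)|
    annulus = ball - inner               # volume of the annulus A(x0, 1)
    gamma_star = 2 ** (1 - d) - 1        # candidate value of gamma_{t0}
    outer_density = (1 - gamma_star) / (2 * annulus)
    inner_density = (1 + gamma_star) / (2 * inner)
    assert math.isclose(outer_density, 1 / ball)
    assert math.isclose(inner_density, 1 / ball)
```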

In particular, it follows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )},\mathfrak {m}_{B(x_{0},\delta )})\le \sqrt{\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt}. \end{aligned}$$

Note however that \(\sqrt{\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt}\) is none other than the upper bound for \(\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )},\mathfrak {m}_{B(x_{0},\frac{\delta }{2})})\), so we have that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{\mathfrak {A}(x_{0},\delta )},\mathfrak {m}_{B(x_{0},\delta )})&\le \frac{C_{\theta }}{4\sqrt{|B\left( x_{0},\frac{\delta }{2}\right) |\eta _{\varepsilon }\left( \frac{3}{2}\delta \right) }} \le \frac{C_{\theta }}{4\sqrt{\alpha _{d}\left( \frac{\delta }{2}\right) ^{d}\frac{1}{\varepsilon ^{d}}c_{s}\left( \frac{3}{2}\frac{\delta }{\varepsilon }\right) ^{-d-s}}}\\&=\frac{C_{\theta }}{2^{1-d/2}\tilde{C}_{d,s}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2}. \end{aligned}$$

Therefore, by the triangle inequality, we have that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )})\le \frac{C_{\theta }}{C_{d,s}}\left( \frac{\delta }{\varepsilon }\right) ^{s/2} \end{aligned}$$

where \(C_{d,s}=\tilde{C}_{d,s}\left( \frac{1}{1-2^{-s/2}}+2^{d/2-1}\right) ^{-1}.\) \(\square \)
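To make the dependence on the parameters visible, the final bound can be evaluated numerically. The sketch below is purely illustrative: the function name is ours, `C_theta = 1` corresponds to the arithmetic-mean interpolation \(\theta (r,s)=\frac{r+s}{2}\) (for which \(\int _{0}^{1}dr/\sqrt{\theta (1-r,1+r)}=1\)), and the geometric-series constant is taken in its convergent form \(1/(1-2^{-s/2})\):

```python
import math

def expel_upper_bound(delta, eps, d, s, c_s, C_theta=1.0):
    """Evaluate C_theta / C_{d,s} * (delta/eps)^(s/2) from Lemma 3.4."""
    alpha_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # unit-ball volume
    C_tilde = 2 * math.sqrt(alpha_d * c_s * (3 / 2) ** (-d - s))
    C_ds = C_tilde / (1 / (1 - 2 ** (-s / 2)) + 2 ** (d / 2 - 1))
    return C_theta / C_ds * (delta / eps) ** (s / 2)

# the bound vanishes as delta -> 0, which is how the expel cost is seen to be zero
assert expel_upper_bound(1e-4, 1.0, 2, 0.5, 1.0) < expel_upper_bound(1e-2, 1.0, 2, 0.5, 1.0)
```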

The previous lemma provides an upper bound on the expel cost for a general interpolation \(\theta \), in the case where \(\eta (|\,\cdot \,|)\) is non-integrable near the origin. It is also possible to provide an expel upper bound when \(\theta \) is nonzero on the boundary; this condition is satisfied, for instance, by the arithmetic mean, but not by the logarithmic mean.

Lemma 3.5

Let \(0<\delta <\varepsilon \) where \(\varepsilon \le 1\), and let \(x_{0}\in \mathbb {R}^{d}\). Let \(\mathfrak {m}_{B(x_{0},\delta )}\) denote the uniform probability measure on the ball \(B(x_{0},\delta )\). Suppose that \(\theta (1,0)=\kappa _{\theta }>0\). (Note that if \(\theta (r,s)=\frac{r+s}{2}\), then \(\kappa _{\theta }=\frac{1}{2}\).) In this case,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )})\le \frac{1}{\sqrt{\kappa _{\theta }\alpha _{d}\left( \frac{\delta }{\varepsilon }\right) ^{d}\eta \left( \frac{\delta }{\varepsilon }\right) }} \end{aligned}$$

where \(\alpha _{d}\) is the volume of the d-dimensional unit ball.

Proof

Let \(0<\delta <\varepsilon \). Let \(g:[0,1]\rightarrow [0,1]\) be a function to be determined later, such that \(g(0)=0\) and \(g(1)=1\). Let \(\gamma :=\text {Leb}+\delta _{x_{0}}\). Consider the curve \(\rho _{t}:[0,1]\rightarrow \mathcal {P}(\mathbb {R}^d)\) defined by

$$\begin{aligned} \frac{d\rho _{t}}{d\gamma }(x)={\left\{ \begin{array}{ll} g(t) &{} x=x_{0}\\ \frac{1-g(t)}{|B(x_{0},\delta )|} &{} x\in B(x_{0},\delta )\backslash \{x_{0}\}\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

Note that with our given boundary conditions on g, \(\rho _{0}\) is the uniform measure on \(B(x_{0},\delta )\), and \(\rho _{1}=\delta _{x_{0}}\). Note also that by construction,

$$\begin{aligned} \frac{d}{dt}\frac{d\rho _{t}}{d\gamma }(x)={\left\{ \begin{array}{ll} g^{\prime }(t) &{} x=x_{0}\\ \frac{-g^{\prime }(t)}{|B(x_{0},\delta )|} &{} x\in B(x_{0},\delta )\backslash \{x_{0}\}\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

Let \(\textbf{j}_{t}(x,y)\) be a flux so that \((\rho _{t},\textbf{j}_{t})\) solves the nonlocal continuity equation; in particular we set

$$\begin{aligned} \frac{d\textbf{j}_{t}}{d(\gamma \otimes \gamma )}(x,y)={\left\{ \begin{array}{ll} -\frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x,y)|B(x_{0},\delta )|} &{} (x,y)\in \{x_{0}\}\times (B(x_{0},\delta )\backslash \{x_{0}\})\\ \frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x,y)|B(x_{0},\delta )|} &{} (x,y)\in (B(x_{0},\delta )\backslash \{x_{0}\})\times \{x_{0}\}\\ 0 &{} \text {else}. \end{array}\right. } \end{aligned}$$

Together, since \(\gamma \otimes \gamma \) dominates all of \(\textbf{j}_{t}\), \(\rho _{t}\otimes \text {Leb}\), and \(\text {Leb}\otimes \rho _{t}\), the action of \((\rho _{t},\textbf{j}_{t})\) is then

$$\begin{aligned} \mathcal {A}(\rho _{t},\textbf{j}_{t})&:=\iint _{G}\frac{\left( \frac{d\textbf{j}_{t}}{d(\gamma \otimes \gamma )}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d(\gamma \otimes \gamma )}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d(\gamma \otimes \gamma )}(x,y)\right) }\eta _{\varepsilon }(x,y)d\gamma (x)d\gamma (y). \end{aligned}$$

Observe that

$$\begin{aligned} \frac{d(\rho _{t}\otimes \text {Leb})}{d(\gamma \otimes \gamma )}=\frac{d(\rho _{t}\otimes \text {Leb})}{d((\text {Leb}+\delta _{x_{0}})\otimes (\text {Leb}+\delta _{x_{0}}))}(x,y)={\left\{ \begin{array}{ll} \frac{d\rho _{t}}{d(\text {Leb}+\delta _{x_{0}})}(x) &{} y\ne x_{0}\\ 0 &{} y=x_{0} \end{array}\right. } \end{aligned}$$

and similarly for \(\frac{d(\text {Leb}\otimes \rho _{t})}{d(\gamma \otimes \gamma )}\). Moreover, note that \(\frac{d\textbf{j}_{t}}{d(\gamma \otimes \gamma )}(x,y)=0\) off of \((\{x_{0}\}\times (B(x_{0},\delta )\backslash \{x_{0}\}))\cup ((B(x_{0},\delta )\backslash \{x_{0}\})\times \{x_{0}\})\). Consequently,

$$\begin{aligned} \mathcal {A}(\rho _{t},\textbf{j}_{t})&=\iint _{\{x_{0}\}\times B(x_{0},\delta )\backslash \{x_{0}\}}\frac{\left( -\frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x,y)|B(x_{0},\delta )|}\right) ^{2}}{2\theta \left( \frac{d\rho _{t}}{d(\text {Leb}+\delta _{x_{0}})}(x)\textbf{1}_{y\ne x_{0}},\frac{d\rho _{t}}{d(\text {Leb}+\delta _{x_{0}})}(y)\textbf{1}_{x\ne x_{0}}\right) }\\&\qquad \qquad \eta _{\varepsilon }(x,y)d\gamma (x)d\gamma (y)\\&\quad +\iint _{B(x_{0},\delta )\backslash \{x_{0}\}\times \{x_{0}\}}\frac{\left( \frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x,y)|B(x_{0},\delta )|}\right) ^{2}}{2\theta \left( \frac{d\rho _{t}}{d(\text {Leb}+\delta _{x_{0}})}(x)\textbf{1}_{y\ne x_{0}},\frac{d\rho _{t}}{d(\text {Leb}+\delta _{x_{0}})}(y)\textbf{1}_{x\ne x_{0}}\right) }\\&\qquad \qquad \eta _{\varepsilon }(x,y)d\gamma (x)d\gamma (y)\\&=\int _{B(x_{0},\delta )\backslash \{x_{0}\}}\frac{\left( \frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x_{0},y)|B(x_{0},\delta )|}\right) ^{2}}{\theta \left( g(t),0\right) }\\&\qquad \qquad \eta _{\varepsilon }(x_{0},y)dy. \end{aligned}$$

Clearly, if it were the case that \(\theta \left( g(t),0\right) =0\), then the action of \((\rho _{t},\textbf{j}_{t})\) would be infinite. However, we have instead assumed that \(\theta (1,0)=\kappa _{\theta }>0\), so by 1-homogeneity \(\theta \left( g(t),0\right) =\kappa _{\theta }g(t)\) for all t, and hence

$$\begin{aligned} \mathcal {A}(\rho _{t},\textbf{j}_{t})&=\int _{B(x_{0},\delta )\backslash \{x_{0}\}}\frac{\left( \frac{g^{\prime }(t)}{2\eta _{\varepsilon }(x_{0},y)|B(x_{0},\delta )|}\right) ^{2}}{\kappa _{\theta }g(t)}\eta _{\varepsilon }(x_{0},y)dy\\&=\frac{1}{4\kappa _{\theta }|B(x_{0},\delta )|^{2}}\int _{B(x_{0},\delta )\backslash \{x_{0}\}}\frac{(g^{\prime }(t))^{2}}{g(t)\eta _{\varepsilon }(x_{0},y)}dy\\&\le \frac{1}{4\kappa _{\theta }|B(x_{0},\delta )|\eta _{\varepsilon }(\delta )}\frac{(g^{\prime }(t))^{2}}{g(t)}. \end{aligned}$$

Consequently,

$$\begin{aligned} \int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt\le \frac{1}{4\kappa _{\theta }|B(x_{0},\delta )|\varepsilon ^{-d}\eta \left( \frac{\delta }{\varepsilon }\right) }\int _{0}^{1}\frac{(g^{\prime }(t))^{2}}{g(t)}dt. \end{aligned}$$

Finally, we select \(g(t)=t^{2}\). With this choice, \(\int _{0}^{1}\frac{(g^{\prime }(t))^{2}}{g(t)}dt=4\). And since \(|B(x_{0},\delta )|=\alpha _{d}\delta ^{d}\), we conclude that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )})\le \frac{1}{\sqrt{\kappa _{\theta }\alpha _{d}\left( \frac{\delta }{\varepsilon }\right) ^{d}\eta \left( \frac{\delta }{\varepsilon }\right) }} \end{aligned}$$

as desired. \(\square \)
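The choice \(g(t)=t^{2}\) is in fact optimal among profiles with \(g(0)=0\) and \(g(1)=1\): substituting \(u=\sqrt{g}\) gives \(\int _{0}^{1}(g^{\prime })^{2}/g\,dt=4\int _{0}^{1}(u^{\prime })^{2}dt\), which is minimized by the straight line \(u(t)=t\), i.e. \(g(t)=t^{2}\), with value 4. A quadrature sketch comparing it against another admissible profile:

```python
def cost(g, gp, N=200_000):
    # midpoint rule for int_0^1 g'(t)^2 / g(t) dt
    h = 1.0 / N
    return sum(gp(t) ** 2 / g(t) * h for t in (h * (k + 0.5) for k in range(N)))

quadratic = cost(lambda t: t**2, lambda t: 2 * t)   # optimal choice: equals 4
cubic = cost(lambda t: t**3, lambda t: 3 * t**2)    # integrand 9t: equals 9/2
assert abs(quadratic - 4.0) < 1e-6
assert quadratic < cubic
```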

Proposition 3.6

If \(\int _{\mathbb {R}^{d}}\eta (|y|)dy<\infty \), and \(\theta (1,0)=0\), then for all probability measures \(\nu \in \mathcal {P}(\mathbb {R}^{d})\) which are singular to \(\delta _{x_{0}}\), \(\mathcal {W}_{\eta ,\theta }(\delta _{x_{0}},\nu )=\infty \).

Proof

Let \((\rho _{t},\textbf{j}_{t})_{t\in [0,1]}\) solve the nonlocal continuity equation with \(\rho _{0}=\delta _{x_{0}}\) and \(\rho _{1}=\nu \); without loss of generality, we take \(x_{0}=0\). We assume, for the sake of contradiction, that \(\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt<\infty \).

We define the set

$$\begin{aligned} \mathfrak {T}=\{t\in [0,1]\,:\, \rho _{t}(\{0\})>0 \; \wedge \; |\textbf{j}_{t}|(\{0\}\times \mathbb {R}^{d}\backslash \{0\})>0\}. \end{aligned}$$

Note that this set is measurable since \(\rho _{t}:[0,1]\rightarrow \mathcal {P}(\mathbb {R}^{d})\) is narrowly continuous and \(\textbf{j}_{t}:[0,1]\rightarrow \mathcal {M}_{loc}(G)\) is a Borel function. We claim that for any \(t\in \mathfrak {T}\), it holds that

$$\begin{aligned} \mathcal {A}(\rho _{t},\textbf{j}_{t})=\infty . \end{aligned}$$

To see this, let \(\lambda _{t}\in \mathcal {M}^{+}(G)\) be any measure such that \(\rho _{t}\otimes \text {Leb}+\text {Leb}\otimes \rho _{t}+|\textbf{j}_{t}|\ll \lambda _{t}\). In particular, if \(\rho _{t}(\{0\})\ne 0\) (and, so, for any \(t\in \mathfrak {T}\)), the fact that \(\rho _{t}\otimes \text {Leb}\ll \lambda _{t}\) implies that \(\lambda _{t}\upharpoonright \{0\}\times \mathbb {R}^{d}\) is not identically zero.

At the same time, compute that

$$\begin{aligned} \frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)&=\frac{d(\text {Leb}\otimes \rho _{t})}{d(\text {Leb}\otimes \rho _{t}+\rho _{t}\otimes \text {Leb})}(x,y)\frac{d(\text {Leb}\otimes \rho _{t}+\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y) \end{aligned}$$

Note that for all \(t\in \mathfrak {T}\) we may select a representative of \(\frac{d(\text {Leb}\otimes \rho _{t})}{d(\text {Leb}\otimes \rho _{t}+\rho _{t}\otimes \text {Leb})}\) so that

$$\begin{aligned} \frac{d(\text {Leb}\otimes \rho _{t})}{d(\text {Leb}\otimes \rho _{t}+\rho _{t}\otimes \text {Leb})}(0,y)=0\quad \forall y\in \mathbb {R}^{d} \end{aligned}$$

since for all \(t\in \mathfrak {T}\) and open, bounded \(U\subset \mathbb {R}^{d}\),

$$\begin{aligned} \rho _{t}\otimes \text {Leb}(\{0\}\times U)=\rho _{t}(\{0\})\text {Leb}(U)>0 \end{aligned}$$

and so \((\text {Leb}\otimes \rho _{t}+\rho _{t}\otimes \text {Leb})(\{0\}\times U)>0\), but \(\text {Leb}\otimes \rho _{t}(\{0\}\times U)=0\). This implies that (up to a choice of a.e.-equivalent representative) \(\frac{d\text {Leb}\otimes \rho _{t}}{d\lambda _{t}}(0,y)=0\) for all \(y\in \mathbb {R}^{d}\).

Therefore, compute as follows: for all \(t\in \mathfrak {T}\),

$$\begin{aligned} \mathcal {A}_{\eta ,\varepsilon }(\rho _{t},\textbf{j}_{t})&=\iint _{G}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)\\&\ge \int _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)\\&=\int _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),0\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)\\&=\infty . \end{aligned}$$

Therefore, if \(\int _{0}^{1}\mathcal {A}(\rho _{t},\textbf{j}_{t})dt<\infty \), it must be the case that \(\text {Leb}(\mathfrak {T})=0\).

However, we claim that \(\text {Leb}(\mathfrak {T})>0\). Indeed, consider the following. Let \(\xi \in C^\infty _c(\mathbb {R}^d, [0,1])\) such that \(\xi (0)=1\) and let \(\xi _{\delta }(x):=\xi \left( \frac{x}{\delta }\right) \). Note that \(0\le \xi _{\delta }(x)\le 1\), and that as \(\delta \rightarrow 0\), \(\xi _{\delta }(x)\) converges pointwise to the indicator \(1_{\{x=0\}}\). Plugging \(\xi _{\delta }(x)\) into the continuity equation, we find that for any \(\tau \in (0,1]\),

$$\begin{aligned} \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{\tau }(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}(x)=-\int _{0}^{\tau }\iint _{G}(\xi _{\delta }(y)-\xi _{\delta }(x))\eta (x,y)d\textbf{j}_{t}(x,y)dt \end{aligned}$$

and so

$$\begin{aligned} \left| \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{\tau }(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}(x)\right|&\le \int _{0}^{\tau }\iint _{G}|\xi _{\delta }(y)-\xi _{\delta }(x)|\eta (x,y)d|\textbf{j}_{t}|(x,y)dt\\&=\int _{0}^{\tau }\iint _{G}|\xi _{\delta }(y)-\xi _{\delta }(x)|\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}(x,y)dt. \end{aligned}$$

Using the fact that \(\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}=\left| \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right| \), and observing that

$$\begin{aligned} \left| \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right| =\sqrt{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}},\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}\right) }\sqrt{\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}},\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}\right) }} \end{aligned}$$

whenever the right hand side is well defined, we deduce (now using the convention that \(0\cdot \infty =\infty \)) that for any \(\tau \in (0,1]\),

$$\begin{aligned}&\int _{0}^{\tau }\iint _{G}|\xi _{\delta }(y)-\xi _{\delta }(x)|\frac{d|\textbf{j}_{t}|}{d\lambda _{t}}\eta \, d\lambda _{t}(x,y)dt\\&\quad \le \int _{0}^{\tau }\iint _{G}|\xi _{\delta }(y)-\xi _{\delta }(x)|\sqrt{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}},\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}\right) }\sqrt{\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}},\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}\right) }}\eta \, d\lambda _{t}(x,y)dt. \end{aligned}$$

Applying Hölder’s inequality, and using the fact that \(\theta (r,s)\le \frac{r+s}{2}\), we find that this last expression is bounded above by

$$\begin{aligned}&\left( \int _{0}^{\tau }\iint _{G}\left( \frac{d\rho _{t}\otimes \text {Leb}}{d\lambda _{t}}(x,y)+\frac{d\text {Leb}\otimes \rho _{t}}{d\lambda _{t}}(x,y)\right) \eta (x,y)d\lambda _{t}(x,y)dt\right) ^{1/2}\\&\quad \times \left( \int _{0}^{\tau }\iint _{G}\frac{(\xi _{\delta }(y)-\xi _{\delta }(x))^{2}\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt\right) ^{1/2}. \end{aligned}$$

The term in the left parentheses is bounded above, in turn, by \(2\tau \int _{\mathbb {R}^{d}}\eta (|y|)dy\). To see this, compute that

$$\begin{aligned} \int _{0}^{\tau }\iint _{G}\frac{d\rho _{t}\otimes \text {Leb}}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}(x,y)dt&=\int _{0}^{\tau }\iint _{G}\eta (x,y)d(\rho _{t}\otimes \text {Leb})(x,y)dt\\&\le \int _{0}^{\tau }\int _{\mathbb {R}^{d}}\left( \int _{\mathbb {R}^{d}}\eta (x,y)dy\right) d\rho _{t}(x)dt\\&=\tau C. \end{aligned}$$

The computation for \(\int _{0}^{\tau }\iint _{G}\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\eta (x,y)d\lambda _{t}(x,y)dt\) is identical. Therefore, we find that

$$\begin{aligned}{} & {} \left| \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{\tau }(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}(x)\right| \\{} & {} \le \sqrt{2\tau C}\left( \int _{0}^{\tau }\iint _{G}\frac{(\xi _{\delta }(y)-\xi _{\delta }(x))^{2}\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt\right) ^{1/2}. \end{aligned}$$

Now, since \((\xi _{\delta }(y)-\xi _{\delta }(x))^{2}\le 1\), and \(\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)\) is integrable with respect to \(d\lambda _{t}(x,y)dt\) on \(G\times [0,\tau ]\) (assuming that \(\int _{0}^{\tau }\mathcal {A}(\rho _{t},\textbf{j}_{t})dt<\infty \)), and \((\xi _{\delta }(y)-\xi _{\delta }(x))^{2}\) converges pointwise to \(1_{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\), we can apply the dominated convergence theorem to deduce that as \(\delta \rightarrow 0\),

$$\begin{aligned}{} & {} \int _{0}^{\tau }\iint _{G}\frac{(\xi _{\delta }(y)-\xi _{\delta }(x))^{2}\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt\\{} & {} \qquad \qquad \longrightarrow \int _{0}^{\tau }\iint _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt. \end{aligned}$$

At the same time, as \(\delta \rightarrow 0\), \(\left| \int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{\tau }(x)-\int _{\mathbb {R}^{d}}\xi _{\delta }(x)d\rho _{0}(x)\right| \rightarrow \left| \rho _{\tau }(\{0\})-\rho _{0}(\{0\})\right| \). So in the limit we find that

$$\begin{aligned}{} & {} \left| \rho _{\tau }(\{0\})-\rho _{0}(\{0\})\right| \\{} & {} \le \sqrt{2\tau C}\left( \int _{0}^{\tau }\iint _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}} \right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}},\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}} \right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt\right) ^{1/2}. \end{aligned}$$

In particular, if \(\tau \) is taken to be the first time t such that \(\rho _{t}(\{0\})=0\) (note that such a first t exists, thanks to Corollary 3.3 together with the fact that \(\rho _{1}(\{0\})=\nu (\{0\})=0\)), then, since \(\rho _{0}(\{0\})=1\), we find that

$$\begin{aligned} \frac{1}{2\tau C}\le \int _{0}^{\tau }\iint _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y)dt \end{aligned}$$

which implies, in particular, that on a subset of \([0,\tau )\) of positive measure,

$$\begin{aligned} 0<\iint _{\{0\}\times \mathbb {R}^{d}\backslash \{0\}}\frac{\left( \frac{d\textbf{j}_{t}}{d\lambda _{t}}(x,y)\right) ^{2}}{2\theta \left( \frac{d(\rho _{t}\otimes \text {Leb})}{d\lambda _{t}}(x,y),\frac{d(\text {Leb}\otimes \rho _{t})}{d\lambda _{t}}(x,y)\right) }\eta _{\varepsilon }(x,y)d\lambda _{t}(x,y) \end{aligned}$$

which means that on a subset of \([0,\tau )\) of positive measure, \(|\textbf{j}_{t}|(\{0\}\times \mathbb {R}^{d}\backslash \{0\})>0\). However, for every \(t\in [0,\tau )\), we know that \(\rho _{t}(\{0\})>0\). Therefore, \(\text {Leb}(\mathfrak {T})>0\), contradicting the fact that \(\text {Leb}(\mathfrak {T})=0\). \(\square \)

In the remainder of the article, we will often need to rule out, by assumption, the phenomenon exhibited in Proposition 3.6. We do this by additionally demanding that the hypotheses of at least one of Lemmas 3.4 and 3.5 are satisfied. For clarity, we now state this as an additional Assumption, which may be imposed on \(\eta \) and \(\theta \) alongside Assumptions 2.1 and 2.2.

Assumption 3.7

Regarding the weight kernel \(\eta \) and interpolation function \(\theta \), at least one of the following sets of assumptions holds:

  1. (i)

    There exist an exponent \(s>0\) and a constant \(c_s>0\) such that for all \(x,y \in \mathbb {R}^d\) satisfying \(|x-y|\le \frac{1}{6}\), it holds that \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\); or,

  2. (ii)

    The constant \(\theta (1,0)\), which we denote by \(\kappa _{\theta }\), is strictly positive.

We remark that the constant \(\frac{1}{6}\) appearing in the Assumption above is chosen purely for convenience.

3.3 Global upper bounds for \(\mathcal {W}_{\eta ,\theta }\)

Throughout this subsection, we assume that \(\eta \) satisfies Assumption 2.1(i–v), and that \(\theta \) satisfies Assumption 2.2.

We record the following estimate, which will ultimately be used to show that on a compact domain, the topology induced by \(\mathcal {W}_{\eta ,\theta }\) is no stronger than the narrow topology.

Lemma 3.8

Suppose that there is some \(s>0\) and some constant \(c_{s}>0\) such that for all \(x,y\in \mathbb {R}^d\) satisfying \(|x-y|\le \frac{1}{2}\), it holds that \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\). Then, for all \(x,y\in \mathbb {R}^d\),

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \Phi \left( \frac{|x-y|}{\varepsilon }\right) \end{aligned}$$

where

$$\begin{aligned} \Phi (t)={\left\{ \begin{array}{ll} C_{d,\theta ,s}t^{s/2} &{} 0\le t<\frac{3}{8}\\ C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\frac{8}{3}t &{} \frac{3}{8}\le t \end{array}\right. } \end{aligned}$$

and \(C_{d,\theta ,s}\) depends only on \(d,\theta ,\) and s, and is explicitly given in the proof.

Proof

Let \(\mathfrak {m}_{B(x_{0},\delta )}\) denote the uniform probability measure on the ball \(B(x_{0},\delta )\). Then,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\mathfrak {m}_{B(x,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(y,\delta )},\delta _{y}). \end{aligned}$$

In order to estimate \(\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})\), we use Lemma B.1. Indeed, applying Lemma B.1 in the case where \(A=B(x_{0},\delta )\) and \(B=B(x_{1},\delta )\), and \(|x_{0}-x_{1}|+2\delta \le \frac{\varepsilon }{2}\), it follows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x_{0},\delta )},\mathfrak {m}_{B(x_{1},\delta )})\le \frac{C_{\theta }}{4\sqrt{\alpha _{d}\delta ^{d}\eta _{\varepsilon }(|x_{0}-x_{1}|+2\delta )}}. \end{aligned}$$

In particular, since \(\eta (|x-y|)\ge c_{s}|x-y|^{-d-s}\), we find that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x_{0},\delta )},\mathfrak {m}_{B(x_{1},\delta )})\le \frac{C_{\theta }}{4\sqrt{\alpha _{d}\left( \frac{\delta }{\varepsilon }\right) ^{d}c_{s}\left( \frac{|x_{0}-x_{1}|+2\delta }{\varepsilon }\right) ^{-d-s}}}. \end{aligned}$$

Note that \(2\delta <|x_{0}-x_{1}|\le \frac{\varepsilon }{2}-2\delta \) but otherwise \(\delta \) is arbitrary. In particular, if we pick \(\delta =\frac{1}{6}|x_{0}-x_{1}|\), this leads to the constraint on \(x_{0}\) and \(x_{1}\) that \(|x_{0}-x_{1}|<\frac{3}{8}\varepsilon \), and the estimate

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x_{0},\delta )},\mathfrak {m}_{B(x_{1},\delta )})\le \frac{C_{\theta }}{4\sqrt{\alpha _{d}c_{s}8^{-d-s}}}\left( \frac{\frac{4}{3}|x_{0}-x_{1}|}{\varepsilon }\right) ^{s/2}\text { when }|x_{0}-x_{1}|<\frac{3}{8}\varepsilon . \end{aligned}$$

At the same time, we know from Lemma 3.4 that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\mathfrak {m}_{B(x_{0},\delta )})\le \frac{C_{\theta }}{C_{d,s}}\left( \frac{\frac{1}{6}|x_{0}-x_{1}|}{\varepsilon }\right) ^{s/2} \end{aligned}$$

and similarly for \(x_{1}\), so altogether,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{0}},\delta _{x_{1}})\le \frac{1}{2}C_{d,\theta ,s}\left( \frac{|x_{0}-x_{1}|}{\varepsilon }\right) ^{s/2}\text { when }|x_{0}-x_{1}|<\frac{3}{8}\varepsilon \end{aligned}$$

where

$$\begin{aligned} C_{d,\theta ,s}=\frac{C_{\theta }}{2\sqrt{\alpha _{d}c_{s}8^{-d-s}}}\left( \frac{4}{3}\right) ^{s/2}+4\frac{C_{\theta }}{C_{d,s}}\frac{1}{6^{s/2}}. \end{aligned}$$

For arbitrary \(x,y\in \mathbb {R}^{d}\), it suffices to repeatedly apply the triangle inequality. Namely, construct a sequence \(x=x_{0},x_{1},\ldots ,x_{k}=y\) of equally spaced points along the segment from x to y, so that \(|x_{i}-x_{i+1}|<\frac{3}{8}\varepsilon \) for each \(i\in \{0,\ldots ,k-1\}\); note in particular that we can take \(k\le \frac{8}{3}\frac{|x-y|}{\varepsilon }+1\).

Thus, for \(|x-y|\ge \frac{3}{8}\varepsilon \), we have that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})&\le \sum _{i=0}^{k-1}\mathcal {W}_{\eta ,\varepsilon }(\delta _{x_{i}},\delta _{x_{i+1}})\le k\frac{C_{d,\theta ,s}}{2}\left( \frac{3}{8}\right) ^{s/2} \le \frac{C_{d,\theta ,s}}{2}\left( \frac{3}{8}\right) ^{s/2}\left( \frac{|x-y|}{\frac{3\varepsilon }{8}}+1\right) \\&\le C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\frac{8}{3}\frac{|x-y|}{\varepsilon }. \end{aligned}$$

Note that when \(|x-y|=\frac{3}{8}\varepsilon \), we have \(C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\frac{8}{3}\frac{|x-y|}{\varepsilon }=C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\), while \(\frac{1}{2}C_{d,\theta ,s}\left( \frac{|x-y|}{\varepsilon }\right) ^{s/2}=\frac{1}{2}C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\); so by defining the continuous, nondecreasing function

$$\begin{aligned}{} & {} \Phi :[0,\infty )\rightarrow [0,\infty ) \\{} & {} \Phi (t)={\left\{ \begin{array}{ll} C_{d,\theta ,s}t^{s/2} &{} 0\le t<\frac{3}{8}\\ C_{d,\theta ,s}\left( \frac{3}{8}\right) ^{s/2}\frac{8}{3}t &{} \frac{3}{8}\le t \end{array}\right. } \end{aligned}$$

we see that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \Phi \left( \frac{|x-y|}{\varepsilon }\right) . \end{aligned}$$

\(\square \)
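As a quick numerical sanity check of the two-regime bound above, the following snippet verifies that \(\Phi \) is continuous at the crossover \(t=\frac{3}{8}\) and nondecreasing. This is only a toy check: the inessential constant \(C_{d,\theta ,s}\) is set to 1 and a sample exponent \(s=1\) is assumed.

```python
# Toy sanity check of the two-regime function Phi from Lemma 3.8:
#   Phi(t) = C * t^(s/2)                  for 0 <= t < 3/8,
#   Phi(t) = C * (3/8)^(s/2) * (8/3) * t  for t >= 3/8.
# The constant C and exponent s are placeholders (C = 1, s = 1).

def phi(t, s=1.0, C=1.0):
    if t < 3 / 8:
        return C * t ** (s / 2)
    return C * (3 / 8) ** (s / 2) * (8 / 3) * t

# continuity at the crossover t = 3/8
assert abs(phi(3 / 8 - 1e-12) - phi(3 / 8)) < 1e-9

# monotonicity on a grid of t-values
ts = [i / 1000 for i in range(2001)]
vals = [phi(t) for t in ts]
assert all(a <= b for a, b in zip(vals, vals[1:]))
print("Phi is continuous at 3/8 and nondecreasing")
```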

More generally, one has the following upper bound:

Lemma 3.9

Let \(\varepsilon > 0\). Suppose that \(\eta \) and \(\theta \) satisfy Assumption 3.7. Then for all \(x,y\in \mathbb {R}^{d}\),

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+C_{d,\theta ,\eta } \end{aligned}$$

where \(C_{d,\theta ,\eta }\) is an explicit constant given in the proof.

Proof

To estimate the distance between delta masses at x and y when \(|x-y| \gg \varepsilon \), we first spread the mass onto a ball of radius \(\delta \), comparable to \(\varepsilon \), and then jump between identical balls placed at distance comparable to \(\varepsilon \) along the line segment between x and y. The number of balls is comparable to \(|x-y|/\varepsilon \), which explains the scaling of the right-hand side of our estimate.
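The hop-count arithmetic behind this heuristic can be sketched as follows. This is only an illustration: the hop spacing \(\varepsilon /6\) matches the choice \(\delta =\varepsilon /6\) made later in the proof, and the per-hop cost is treated as an abstract constant.

```python
import math

def hop_count(dist, eps):
    """Number of jumps of length at most eps/6 needed to cover dist."""
    return math.ceil(6 * dist / eps)

def chain_bound(dist, eps, per_hop_cost=1.0):
    # Total cost of the chain of balls: (number of hops) * (cost per hop).
    # This scales linearly in dist/eps, as in the statement of Lemma 3.9.
    return hop_count(dist, eps) * per_hop_cost

eps = 0.1
for dist in [0.5, 1.0, 2.0]:
    k = hop_count(dist, eps)
    # k is within one hop of 6*dist/eps, so the bound is linear in dist/eps
    assert 6 * dist / eps <= k <= 6 * dist / eps + 1
    print(dist, k, chain_bound(dist, eps))
```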

Let \(\mathfrak {m}_{B(x_{0},\delta )}\) denote the uniform probability measure on the ball \(B(x_{0},\delta )\). Then,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\mathfrak {m}_{B(x,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(y,\delta )},\delta _{y}). \end{aligned}$$

In order to estimate \(\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})\), we use Lemma B.1. Indeed, applying Lemma B.1 in the case where \(A=B(x_{0},\delta )\) and \(B=B(x_{1},\delta )\), and \(|x_{0}-x_{1}|+2\delta \le \frac{\varepsilon }{2}\), it follows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x_{0},\delta )},\mathfrak {m}_{B(x_{1},\delta )})\le \frac{C_{\theta }}{4\sqrt{\alpha _{d}\delta ^{d}\eta _{\varepsilon }(|x_{0}-x_{1}|+2\delta )}}\le \frac{C_{\theta }}{4\sqrt{\alpha _{d}\delta ^{d}\eta _{\varepsilon }(\varepsilon /2)}}. \end{aligned}$$

Then, in order to estimate \(\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})\), it suffices to select a sequence \(x=x_{0},x_{1},\ldots ,x_{k}=y\) of equally spaced points so that \(|x_{i}-x_{i+1}|\le \frac{\varepsilon }{2}-2\delta \) for all \(i\in \{0,\ldots ,k-1\}\); in particular, we can take \(k\le \frac{|x-y|}{\frac{\varepsilon }{2}-2\delta }+1\).

Hence,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})\le \sum _{i=0}^{k-1}\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x_{i},\delta )},\mathfrak {m}_{B(x_{i+1},\delta )})\le k\frac{C_{\theta }}{4\sqrt{\alpha _{d}\delta ^{d}\eta _{\varepsilon }(\varepsilon /2)}}. \end{aligned}$$

Now, note that

$$\begin{aligned} \eta _{\varepsilon }\left( \frac{\varepsilon }{2}\right) =\frac{1}{\varepsilon ^{d}}\eta \left( \frac{\frac{\varepsilon }{2}}{\varepsilon }\right) =\frac{1}{\varepsilon ^{d}}\eta \left( \frac{1}{2}\right) . \end{aligned}$$

For convenience, we also select \(\delta =\frac{\varepsilon }{6}\). Plugging this in, we get

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})\le \left( \frac{6|x-y|}{\varepsilon }+1\right) \frac{C_{\theta }}{4\sqrt{\alpha _{d}\left( \frac{1}{6}\right) ^{d}\eta \left( \frac{1}{2}\right) }}. \end{aligned}$$

In other words, putting

$$\begin{aligned} C_{d,\theta }=\frac{C_{\theta }}{\frac{2}{3}\sqrt{\alpha _{d}}\left( \frac{1}{6}\right) ^{d/2}} \end{aligned}$$

we see that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})&\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+\frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}. \end{aligned}$$

Finally, we can compute as follows:

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})&\le \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\mathfrak {m}_{B(x,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})+\mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(y,\delta )},\delta _{y})\\&\le \mathcal {W}_{\eta ,\varepsilon }(\mathfrak {m}_{B(x,\delta )},\mathfrak {m}_{B(y,\delta )})+2\cdot \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\mathfrak {m}_{B(x,\delta )})\\&\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+\frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}+2\mathcal {W}_{\eta ,\varepsilon }\left( \delta _{x},\mathfrak {m}_{B\left( x,\frac{\varepsilon }{6}\right) }\right) \end{aligned}$$

where we have used the fact that \(\delta =\frac{\varepsilon }{6}\). In particular, applying Lemma 3.4, we find that if \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\) when \(|x-y|\le \frac{1}{6}\), it holds that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+\frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}+2\frac{C_{\theta }}{C_{d,s}}\left( \frac{1}{6}\right) ^{s/2}; \end{aligned}$$

applying Lemma 3.5, we find that if \(\theta (1,0)=\kappa _{\theta }>0\), then

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+\frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}+\frac{2}{\sqrt{\kappa _{\theta }\alpha _{d}\left( \frac{1}{6}\right) ^{d}\eta \left( \frac{1}{6}\right) }}. \end{aligned}$$

So altogether, we have that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\delta _{x},\delta _{y})\le \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+C_{d,\theta ,\eta } \end{aligned}$$

where \(C_{d,\theta ,\eta }\) is defined case-wise as follows (if both conditions hold, either case may be used for the value of \(C_{d,\theta ,\eta }\)):

$$\begin{aligned} C_{d,\theta ,\eta }:={\left\{ \begin{array}{ll} \frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}+2\frac{C_{\theta }}{C_{d,s}}\left( \frac{1}{6}\right) ^{s/2} &{} \begin{aligned}\eta (x,y)\ge c_{s}|x-y|^{-d-s}\\ \text { when }|x-y|\le \frac{1}{6}; \end{aligned}\\ \frac{1}{6}\frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}+\frac{2}{\sqrt{\kappa _{\theta }\alpha _{d}\left( \frac{1}{6}\right) ^{d}\eta \left( \frac{1}{6}\right) }} &{} \theta (1,0)=\kappa _{\theta }>0. \end{array}\right. } \end{aligned}$$

\(\square \)

In order to proceed, we prove the following disintegration inequality for the \(\mathcal {W}_{\eta }\) metric, which is of independent interest, in addition to being needed in the proof of Lemma 3.11 below. An analogous result was established in [21, Proposition 2.14] in the discrete case; however, their proof does not readily adapt to our continuum setting.

Theorem 3.10

(Disintegration inequality) Let \(\mu ,\nu \in \mathcal {P}(\mathbb {R}^{d})\). Then,

$$\begin{aligned} \mathcal {W}_{\eta }^{2}(\mu ,\nu )\le \min _{\pi \in \Pi (\mu ,\nu )}\int _{\mathbb {R}^{d}\times \mathbb {R}^{d}}\mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y). \end{aligned}$$

Morally speaking, this theorem is just an instance of Jensen's inequality, since \(\mathcal {W}_{\eta }^{2}\) is a convex l.s.c. function, and \(\mathcal {W}_{\eta }^{2}\left( \mu ,\nu \right) =\mathcal {W}_{\eta }^{2}\left( \int \delta _{x}d\pi (x,y),\int \delta _{y}d\pi (x,y)\right) \). And indeed, in the discrete case, the proof of [21, Proposition 2.14] proceeds rather directly from Jensen's inequality, albeit applied to the action \(\mathcal {A}(\rho ,\textbf{j})\) rather than to \(\mathcal {W}\). However, standard proofs of Jensen's inequality (see for instance [41]) require the underlying space (in this case \(\mathcal {P}(\mathbb {R}^{d})^{2}\)) to carry the structure of a topological vector space, which we do not have here. We are aware of one more abstract version of Jensen's inequality [45] which does not require a t.v.s. structure, but in our situation a direct proof turns out to be readily available.

Proof

Let \(\pi \in \Pi (\mu ,\nu )\). Let \((X(\omega ),Y(\omega ))\) and \((X_{i}(\omega ),Y_{i}(\omega ))\), \(i=1,2,\ldots \) be i.i.d. random variables distributed according to \(\pi \). In particular, \((X,Y)_{\#}\mathbb {P}=\pi \). By the joint convexity of \(\mathcal {W}_{\eta }^{2}\), we have that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}\left( \frac{1}{n}\sum _{i=1}^{n}\delta _{X_{i}(\omega )},\frac{1}{n}\sum _{i=1}^{n}\delta _{Y_{i}(\omega )}\right) \le \frac{1}{n}\sum _{i=1}^{n}\mathcal {W}_{\eta }^{2}(\delta _{X_{i}(\omega )},\delta _{Y_{i}(\omega )}). \end{aligned}$$

At the same time, by change of variables, we observe that

$$\begin{aligned} \int \mathcal {W}_{\eta }^{2}(\delta _{X(\omega )},\delta _{Y(\omega )})d\mathbb {P}(\omega )=\int \mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y). \end{aligned}$$

Now, suppose that \(\mathcal {W}_{\eta }^{2}(\delta _{X(\omega )},\delta _{Y(\omega )})\) is an integrable random variable; otherwise, since \(\mathcal {W}_{\eta }^{2}\) is nonnegative, we would have \(\int \mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y)=\infty \), in which case the theorem holds trivially. Applying the strong law of large numbers to the i.i.d. random variables \(\mathcal {W}_{\eta }^{2}(\delta _{X_{i}(\omega )},\delta _{Y_{i}(\omega )})\), we see that with probability 1,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\mathcal {W}_{\eta }^{2}(\delta _{X_{i}(\omega )},\delta _{Y_{i}(\omega )})\rightarrow \int \mathcal {W}_{\eta }^{2}(\delta _{X(\omega )},\delta _{Y(\omega )})d\mathbb {P}(\omega ). \end{aligned}$$

Therefore,

$$\begin{aligned} \liminf _{n\rightarrow \infty }\mathcal {W}_{\eta }^{2}\left( \frac{1}{n}\sum _{i=1}^{n}\delta _{X_{i}(\omega )},\frac{1}{n}\sum _{i=1}^{n}\delta _{Y_{i}(\omega )}\right) \le \int \mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y). \end{aligned}$$

The Glivenko–Cantelli theorem tells us that with probability 1, \(\frac{1}{n}\sum _{i=1}^{n}\delta _{X_{i}(\omega )}\rightharpoonup ^{*}\mu \) and \(\frac{1}{n}\sum _{i=1}^{n}\delta _{Y_{i}(\omega )}\rightharpoonup ^{*}\nu \), so since \(\mathcal {W}_{\eta }^{2}\) is jointly l.s.c. with respect to narrow convergence,

$$\begin{aligned} \mathcal {W}_{\eta }^{2}(\mu ,\nu )\le \liminf _{n\rightarrow \infty }\mathcal {W}_{\eta }^{2}\left( \frac{1}{n}\sum _{i=1}^{n}\delta _{X_{i}(\omega )},\frac{1}{n}\sum _{i=1}^{n}\delta _{Y_{i}(\omega )}\right) . \end{aligned}$$

This shows that \(\mathcal {W}_{\eta }^{2}(\mu ,\nu )\le \int \mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y)\). But since \(\pi \in \Pi (\mu ,\nu )\) was arbitrary, we find that \(\mathcal {W}_{\eta }^{2}(\mu ,\nu )\le \inf _{\pi \in \Pi (\mu ,\nu )}\int \mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})d\pi (x,y)\). Finally, the fact that the infimum is actually attained follows from the fact that \(c(x,y):=\mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})\) is a nonnegative l.s.c. cost function, so standard Monge–Kantorovich theory applies, for instance [43, Theorem 1.7]. \(\square \)
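The sampling step of the proof (i.i.d. pairs drawn from a coupling, with the strong law of large numbers applied to the per-pair costs) can be illustrated numerically. Since \(\mathcal {W}_{\eta }^{2}(\delta _{x},\delta _{y})\) has no closed form, the toy snippet below uses the stand-in cost \(c(x,y)=|x-y|^{2}\) and a hypothetical coupling; both choices are illustrative assumptions.

```python
import random

# The proof replaces mu and nu by empirical measures of i.i.d. samples
# (X_i, Y_i) ~ pi, and invokes the strong law of large numbers for the
# per-pair costs. Stand-in cost: c(x, y) = |x - y|^2.

random.seed(0)

def sample_pair():
    # a simple illustrative coupling pi: Y = X + 0.5 + small Gaussian noise
    x = random.uniform(0.0, 1.0)
    y = x + 0.5 + 0.1 * random.gauss(0.0, 1.0)
    return x, y

# For this coupling, E|X - Y|^2 = 0.5^2 + 0.1^2 = 0.26.
n = 200_000
avg = sum((x - y) ** 2 for x, y in (sample_pair() for _ in range(n))) / n
# SLLN: the empirical average approaches the integral against pi
assert abs(avg - 0.26) < 0.01
print(avg)
```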

We now use the disintegration inequality and the estimates on nonlocal transport between delta masses to obtain an initial, crude upper bound on the nonlocal transport cost between general probability measures in terms of the Wasserstein distance.

Lemma 3.11

Let \(\nu _{0},\nu _{1}\) be any measures in \(\mathcal {P}_{2}(\mathbb {R}^{d})\). Suppose that \(\eta \) and \(\theta \) satisfy Assumption 3.7. Then,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\nu _{0},\nu _{1})+2C_{d,\theta ,\eta }^{2} \end{aligned}$$

where \(C_{d,\theta }=\frac{C_{\theta }}{\frac{2}{3}\sqrt{\alpha _{d}}\left( \frac{1}{6}\right) ^{d/2}}\), \(C_{\theta }=\int _{0}^{1}\frac{1}{\sqrt{\theta (1-r,1+r)}}dr\), and \(C_{d,\theta ,\eta }\) is the constant from Lemma 3.9.

Furthermore, in the case where Assumption 3.7 (i) is satisfied, one has the alternative upper bound

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})\le \min _{\pi \in \Pi (\nu _{0},\nu _{1})}\int _{\left( \mathbb {R}^{d}\right) ^{2}}\Phi \left( \frac{|x-y|}{\varepsilon }\right) ^{2}d\pi (x,y) \end{aligned}$$

where \(\Phi \) is the function from the statement of Lemma 3.8.

Remark 3.12

This lemma actually helps to address a question posed by Erbar. In [19], it is mentioned that it is unclear for which probability measures \(\nu _{0}\) and \(\nu _{1}\) on \(\mathbb {R}^{d}\) we have \(\mathcal {W}(\nu _{0},\nu _{1})<\infty \) (in the case of the Wasserstein distances \(W_{p}\), these are precisely the measures with finite pth moments). The lemma we are about to prove gives a sufficient condition on \(\eta \) and \(\theta \) ensuring that \(\mathcal {W}(\nu _{0},\nu _{1})<\infty \) for all \(\nu _{0},\nu _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\).

Proof

In Theorem 3.10, we proved the disintegration inequality

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})\le \min _{\pi \in \Pi (\nu _{0},\nu _{1})}\int _{\left( \mathbb {R}^{d}\right) ^{2}}\mathcal {W}_{\eta ,\varepsilon }^{2}(\delta _{x},\delta _{y})d\pi (x,y). \end{aligned}$$

On the other hand,

$$\begin{aligned} W_{2}^{2}(\nu _{0},\nu _{1}):=\min _{\pi \in \Pi (\nu _{0},\nu _{1})}\int _{\left( \mathbb {R}^{d}\right) ^{2}}|x-y|^{2}d\pi (x,y). \end{aligned}$$

Therefore, let \(\overline{\pi }\) be a \(W_{2}\)-optimal plan for \((\nu _{0},\nu _{1})\). Using the assumption that \(\eta (x,y)\ge c_{s}|x-y|^{-d-s}\) when \(|x-y|\le \frac{1}{6}\), or that \(\theta (1,0)=\kappa _{\theta }>0\) (or both), it follows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})&\le \int _{\left( \mathbb {R}^{d}\right) ^{2}}\mathcal {W}_{\eta ,\varepsilon }^{2}(\delta _{x},\delta _{y})d\overline{\pi }(x,y)\\ (\text {Lemma~3.9})&\le \int _{\left( \mathbb {R}^{d}\right) ^{2}}\left[ \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }|x-y|+C_{d,\theta ,\eta }\right] ^{2}d\overline{\pi }(x,y)\\&\le 2\left( \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }\right) ^{2}\int _{\left( \mathbb {R}^{d}\right) ^{2}}|x-y|^{2}d\overline{\pi }(x,y)+2C_{d,\theta ,\eta }^{2}\\&=2\left( \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }\right) ^{2}W_{2}^{2}(\nu _{0},\nu _{1})+2C_{d,\theta ,\eta }^{2}. \end{aligned}$$

Alternatively, in the case where \(\eta (|x-y|)\ge c_{s}|x-y|^{-d-s}\) when \(|x-y|\le \frac{1}{6}\), we can apply Lemma 3.8, and deduce that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})&\le \min _{\pi \in \Pi (\nu _{0},\nu _{1})}\int _{\left( \mathbb {R}^{d}\right) ^{2}}\mathcal {W}_{\eta ,\varepsilon }^{2}(\delta _{x},\delta _{y})d\pi (x,y)\\&\quad \le \min _{\pi \in \Pi (\nu _{0},\nu _{1})}\int _{\left( \mathbb {R}^{d}\right) ^{2}}\Phi \left( \frac{|x-y|}{\varepsilon }\right) ^{2}d\pi (x,y). \end{aligned}$$

\(\square \)

Let us mention the following consequence of Lemma 3.11. Together with Proposition 3.2, this shows that when \(\eta \) is integrable, on a bounded domain the topology induced by \(\mathcal {W}_{\eta }\) is equivalent to the strong topology on probability measures.

Proposition 3.13

(TV upper bound on a bounded set) Suppose that \(\eta \) and \(\theta \) satisfy Assumption 3.7, and that \(\nu _{0},\nu _{1}\in \mathcal {P}(\mathbb {R}^{d})\) are both supported inside some bounded set \(K\subset \mathbb {R}^{d}\). Then, there exists some constant C (independent of \(\nu _{0}\) and \(\nu _{1}\), but allowed to depend on \(d,\theta ,\eta \), and the diameter of K) such that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}(\nu _{0},\nu _{1})\le C\cdot TV(\nu _{0},\nu _{1}). \end{aligned}$$

Proof

The idea is that we simply rerun the argument for Lemma 3.11, but allow the mass in the “overlap” between \(\nu _{0}\) and \(\nu _{1}\) to stay put.

To wit, define the measure \(\Theta :=\min \{\nu _{0},\nu _{1}\}\). We suppose that \(\nu _{0}\ne \nu _{1}\), since otherwise the desired inequality holds trivially. Observe that

$$\begin{aligned} \Vert \nu _{0}-\Theta \Vert _{TV}=\Vert \nu _{1}-\Theta \Vert _{TV}=2TV(\nu _{0},\nu _{1}), \end{aligned}$$

so in particular

$$\begin{aligned} \mathcal {W}_{\eta }\left( \frac{\nu _{0}-\Theta }{2TV(\nu _{0},\nu _{1})},\frac{\nu _{1}-\Theta }{2TV(\nu _{0},\nu _{1})}\right) \end{aligned}$$

is well-defined. Moreover, by Lemma 3.11 (with \(\varepsilon =1\)) it holds that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}\left( \frac{\nu _{0}-\Theta }{2TV(\nu _{0},\nu _{1})},\frac{\nu _{1}-\Theta }{2TV(\nu _{0},\nu _{1})}\right) \le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }W_{2}^{2}\left( \frac{\nu _{0}-\Theta }{2TV(\nu _{0},\nu _{1})},\frac{\nu _{1}-\Theta }{2TV(\nu _{0},\nu _{1})}\right) +2C_{d,\theta ,\eta }^{2}. \end{aligned}$$

By the 1-homogeneity of the action \(\mathcal {A}_{\eta ,\theta }\), and also the 1-homogeneity of \(W_{2}^{2}\), this implies that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}\left( \nu _{0}-\Theta ,\nu _{1}-\Theta \right)&\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }W_{2}^{2}\left( \nu _{0}-\Theta ,\nu _{1}-\Theta \right) \\&\quad +4C_{d,\theta ,\eta }^{2}TV(\nu _{0},\nu _{1}). \end{aligned}$$

At the same time, any solution to the nonlocal continuity equation with endpoints \(\nu _{0}-\Theta \) and \(\nu _{1}-\Theta \) extends trivially to a solution to the nonlocal continuity equation with endpoints \(\nu _{0}\) and \(\nu _{1}\): this is because the nonlocal continuity equation is additive, and so we can just add on the constant solution \((\Theta ,0)_{t\in [0,1]}\) to the NCE. By the convexity of the action, this implies that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}(\nu _{0},\nu _{1})\le 2TV(\nu _{0},\nu _{1})\mathcal {W}_{\eta }^{2}\left( \frac{\nu _{0}-\Theta }{2TV(\nu _{0},\nu _{1})},\frac{\nu _{1}-\Theta }{2TV(\nu _{0},\nu _{1})}\right) =\mathcal {W}_{\eta }^{2}\left( \nu _{0}-\Theta ,\nu _{1}-\Theta \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \mathcal {W}_{\eta }^{2}\left( \nu _{0},\nu _{1}\right)&\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }W_{2}^{2}(\nu _{0},\nu _{1})+4C_{d,\theta ,\eta }^{2}TV(\nu _{0},\nu _{1}). \end{aligned}$$

as desired.

Lastly, by combining [43, Equation (5.1)] with [30, Theorem 4], we see that

$$\begin{aligned} W_{2}^{2}(\nu _{0},\nu _{1})\le {{\,\textrm{diam}\,}}(K)\cdot W_{1}(\nu _{0},\nu _{1})\le {{\,\textrm{diam}\,}}(K)^{2}\cdot TV(\nu _{0},\nu _{1}) \end{aligned}$$

which allows us to deduce that

$$\begin{aligned} \mathcal {W}_{\eta }^{2}(\nu _{0},\nu _{1})\le \left( 2\frac{C_{d,\theta }^{2}\cdot {{\,\textrm{diam}\,}}(K)^{2}}{\eta \left( \frac{1}{2}\right) }+4C_{d,\theta ,\eta }^{2}\right) TV(\nu _{0},\nu _{1}). \end{aligned}$$

\(\square \)

4 Exact nonlocalization

4.1 Exact solution to nonlocal continuity equation

We start by introducing a way to use solutions of the continuity equation to create solutions of the nonlocal continuity equation (1.3) with a kernel \(\eta \). Namely, we show that, given a solution of the continuity equation in flux form, one can convolve it with a specific, \(\eta \)-dependent kernel \(\zeta \) so that the convolved flow is an exact solution of the nonlocal continuity equation. We first present the solution formally, and then justify it for weak solutions below.

Let \(\eta (s)\) denote the radial profile of a kernel \(\eta (x,y)\) satisfying Assumption 2.1. Define

$$\begin{aligned} \zeta (r) = \int _r^\infty s \eta (s) ds. \end{aligned}$$

One can check that, under Assumption 2.1, \(\zeta (|x|)\) is integrable on \(\mathbb {R}^{d}\) even when \(\eta \) is not.
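This integrability gain can be seen concretely in dimension one. The snippet below uses the hypothetical radial profile \(\eta (s)=s^{-2}\mathbb {1}_{(0,1]}(s)\), chosen purely for illustration: then \(\zeta (r)=\int _{r}^{1}s\cdot s^{-2}ds=\log (1/r)\) for \(r\le 1\), which integrates to 1 on (0, 1), while \(\eta \) itself is not integrable near 0.

```python
import math

# d = 1 sanity check that zeta(r) = int_r^inf s*eta(s) ds can be integrable
# even when eta is not. Illustration profile: eta(s) = s^(-2) on (0, 1].
#   zeta(r) = int_r^1 s * s^(-2) ds = log(1/r)  for r <= 1,
#   int_0^1 zeta(r) dr = 1 < infinity,  while  int_0^1 eta(s) ds = infinity.

def zeta(r):
    return math.log(1.0 / r) if r < 1.0 else 0.0

n = 200_000  # midpoint-rule quadrature on (0, 1)
integral_zeta = sum(zeta((i + 0.5) / n) for i in range(n)) / n
assert abs(integral_zeta - 1.0) < 1e-3  # finite, equal to 1

# the same quadrature applied to eta(s) = s^(-2) blows up as n grows
integral_eta = sum(((i + 0.5) / n) ** -2 for i in range(n)) / n
assert integral_eta > 1e5  # diverges with n
print(integral_zeta, integral_eta)
```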

Consider a solution of the continuity equation

$$\begin{aligned} \partial _t \rho + {{\,\textrm{div}\,}}(J) = 0. \end{aligned}$$

Let \(\rho _\zeta = \rho * \zeta \) and \(J_\zeta = J * \zeta \). Then

$$\begin{aligned} \partial _t \rho _\zeta + {{\,\textrm{div}\,}}(J_\zeta ) = 0. \end{aligned}$$

Let \(j(x,y) = (y-x) \cdot (J(y) + J(x) ) \). We claim that

$$\begin{aligned} \partial _t \rho _\zeta + \int j(x,y) \eta (|x-y|) dy = 0. \end{aligned}$$

Namely note that

$$\begin{aligned} \nabla _{y}\zeta (|x-y|) = \eta (|x-y|) (x-y), \qquad \text {equivalently}\qquad \nabla _{x}\zeta (|x-y|) = -\eta (|x-y|) (x-y), \end{aligned}$$

since \(\zeta '(r)=-r\eta (r)\).

Thus, using symmetry of \(\eta \),

$$\begin{aligned} \int j(x,y) \eta (|x-y|) dy&= - \int \eta (|x-y|) (x-y) \cdot (J(y) + J(x)) dy \\&= \int \nabla _{x}\zeta (|x-y|) \cdot J(y) dy - J(x)\cdot \int \eta (|x-y|) (x-y) dy \\&= {{\,\textrm{div}\,}}\left( \int \zeta (|x-y|) J(y) dy \right) + 0 = {{\,\textrm{div}\,}}(J_\zeta ). \end{aligned}$$

While the preceding argument is formal, and written for strong solutions, making the argument rigorous and extending to weak solutions is straightforward, and is done in the lemma below.
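The key identity \(\int j(x,y)\eta (|x-y|)dy={{\,\textrm{div}\,}}(J_{\zeta })\) can also be checked numerically in one dimension. The snippet below uses the hypothetical profile \(\eta (s)=\mathbb {1}_{[0,1]}(s)\) (so that \(\zeta (r)=(1-r^{2})/2\) on [0, 1]) and a Gaussian flux; both the kernel and the flux are illustrative choices, not taken from the paper.

```python
import math

# 1-d numerical check of the formal identity
#     int j(x,y) eta(|x-y|) dy  =  div(J_zeta)  =  d/dx (zeta * J)(x),
# with j(x,y) = (y-x)(J(y) + J(x)). Illustration choices:
# eta(s) = 1 for s <= 1, so zeta(r) = int_r^1 s ds = (1 - r^2)/2 on [0,1];
# flux J(y) = exp(-y^2).

def eta(s):
    return 1.0 if s <= 1.0 else 0.0

def zeta(r):
    return 0.5 * (1.0 - r * r) if r <= 1.0 else 0.0

def J(y):
    return math.exp(-y * y)

def quad(f, a, b, n=4000):
    # midpoint rule
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

x = 0.3

# left-hand side: the nonlocal integral; the J(x) term integrates to zero
# since (y - x) * eta(|y - x|) is odd about y = x
lhs = quad(lambda y: (y - x) * (J(y) + J(x)) * eta(abs(y - x)), x - 1.0, x + 1.0)

# right-hand side: d/dx of the convolution (zeta * J), via central differences
def conv(xx):
    return quad(lambda y: zeta(abs(xx - y)) * J(y), xx - 1.0, xx + 1.0)

step = 1e-2
rhs = (conv(x + step) - conv(x - step)) / (2 * step)

assert abs(lhs - rhs) < 1e-3
print(lhs, rhs)
```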

In what follows, we use slightly more burdensome notation: given a specific kernel \(\eta \), we write \(\zeta _{\eta }(r):=\int _{r}^{\infty }s\eta (s)ds\); in particular, \(\zeta _{(\eta _{\varepsilon })}(r):=\int _r^\infty s\eta _{\varepsilon }(s)ds\). Note, however, that \(\zeta _{\eta }(|x|)\) need not be a convolution kernel, i.e. it may not be normalized when integrated over \(\mathbb {R}^{d}\). We therefore introduce the normalized convolution kernels (that these are the correct normalization constants is shown in Lemma B.3):

$$\begin{aligned} \bar{\zeta }_{\eta }:=\frac{d}{M_{2}(\eta )}\zeta _{\eta };\qquad \bar{\zeta }_{(\eta _{\varepsilon })}:=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\zeta _{(\eta _{\varepsilon })}. \end{aligned}$$

Note that \(\bar{\zeta }_{\eta }\) is a convolution kernel supported on the unit ball, while \(\bar{\zeta }_{(\eta _{\varepsilon })}\) is a convolution kernel supported on the ball of radius \(\varepsilon \).
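The normalization \(d/M_{2}(\eta )\) can be verified by hand in a simple case. The snippet below does so in dimension \(d=1\) for the hypothetical profile \(\eta =\mathbb {1}_{[0,1]}\), chosen for illustration: then \(\zeta _{\eta }(r)=(1-r^{2})/2\) on [0, 1] and \(M_{2}(\eta )=\int |x|^{2}\eta (|x|)dx=2/3\).

```python
# d = 1 check that zeta_bar = (d / M2(eta)) * zeta_eta integrates to 1.
# Illustration profile: eta(s) = 1 for 0 <= s <= 1, else 0. Then
#   zeta_eta(r) = int_r^1 s ds = (1 - r^2)/2   for r <= 1,
#   M2(eta)     = int_{-1}^{1} x^2 dx          = 2/3.
d = 1

def zeta_eta(r):
    return 0.5 * (1.0 - r * r) if r <= 1.0 else 0.0

def quad(f, a, b, n=100_000):
    # midpoint rule
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

M2 = quad(lambda x: x * x, -1.0, 1.0)                 # = 2/3
total = quad(lambda x: (d / M2) * zeta_eta(abs(x)), -1.0, 1.0)
assert abs(M2 - 2 / 3) < 1e-6
assert abs(total - 1.0) < 1e-6   # zeta_bar integrates to 1, as Lemma B.3 asserts
print(total)
```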

Lemma 4.1

(exact nonlocalization) Assume that \(\eta \) satisfies Assumption 2.1. Suppose that \((\rho _{t},\mathbf {\textbf{j}}_{t})_{t\in [0,1]}:[0,1]\rightarrow \mathcal {P}(\mathbb {R}^{d})\times \mathcal {M}_{loc}(\mathbb {R}^{d};\mathbb {R}^{d})\) satisfies, for all test functions \(\varphi _{t}\in C_{c}^{\infty }((0,1)\times \mathbb {R}^{d})\),

$$\begin{aligned} \int _{0}^{1}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\rho _{t}(x)dt+\int _{0}^{1}\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\mathbf {\textbf{j}}_{t}(x)dt=0. \end{aligned}$$

Then, it holds that \(\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t},\textbf{j}_{t}\right) _{t\in [0,1]}\) solves the nonlocal continuity equation, where the measure \(\textbf{j}_{t}:[0,1]\rightarrow \mathcal {M}_{loc}(G)\) is defined in the following way: for all test functions \(\Phi (x,y)\in C_{c}^{\infty }(G)\), we define

$$\begin{aligned} \iint _{G}\Phi (x,y)d\textbf{j}_{t}(x,y):=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\Phi (x,y)(y-x)\cdot \left( d\mathbf {\textbf{j}}_{t}(x)dy+d\mathbf {\textbf{j}}_{t}(y)dx\right) . \end{aligned}$$

Proof

Suppose that for all test functions \(\varphi _{t}\in C_{c}^{\infty }((0,1)\times \mathbb {R}^{d})\), it holds that \((\rho _{t},\mathbf {\textbf{j}}_{t})_{t\in [0,1]}:[0,1]\rightarrow \mathcal {P}(\mathbb {R}^{d})\times \mathcal {M}_{loc}(\mathbb {R}^{d};\mathbb {R}^{d})\) satisfies

$$\begin{aligned} \int _{0}^{1}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\rho _{t}(x)dt+\int _{0}^{1}\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\mathbf {\textbf{j}}_{t}(x)dt=0. \end{aligned}$$

Then, \(\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) _{t\in [0,1]}\) is also a solution to this form of the continuity equation:

$$\begin{aligned} \int _{0}^{1}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}\right) (x)dt+\int _{0}^{1}\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) (x)dt=0. \end{aligned}$$

Our goal is to show that

$$\begin{aligned} \int _{0}^{1}\int _{\mathbb {R}^{d}}\partial _{t}\varphi _{t}(x)d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}\right) (x)dt+\frac{1}{2}\int _{0}^{1}\iint _{G}\bar{\nabla }\varphi _{t}(x,y)\eta _{\varepsilon }(x,y)d\textbf{j}_{t}(x,y)dt=0, \end{aligned}$$

so we claim that

$$\begin{aligned} \frac{1}{2}\int _{0}^{1}\iint _{G}\bar{\nabla }\varphi _{t}(x,y)\eta _{\varepsilon }(x,y)d\textbf{j}_{t}(x,y)dt=\int _{0}^{1}\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) (x)dt, \end{aligned}$$

which establishes the theorem. Recalling that \(\bar{\nabla }\varphi _{t}(x,y):=\varphi _{t}(y)-\varphi _{t}(x)\), we see that for each \(t\in [0,1]\),

$$\begin{aligned}&\iint _{G}\bar{\nabla }\varphi _{t}(x,y)\eta _{\varepsilon }(x,y)d\textbf{j}_{t}(x,y)\\&=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\varphi _{t}(y)\eta _{\varepsilon }(x,y)(y-x)\cdot \left( d\mathbf {\textbf{j}}_{t}(x)dy+d\mathbf {\textbf{j}}_{t}(y)dx\right) \\&\quad -\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\varphi _{t}(x)\eta _{\varepsilon }(x,y)(y-x)\cdot \left( d\mathbf {\textbf{j}}_{t}(x)dy+d\mathbf {\textbf{j}}_{t}(y)dx\right) . \end{aligned}$$

(Note that the two integrals on the right hand side are well-defined, since \(\varphi _{t}(x)\) is smooth and compactly supported in \(\mathbb {R}^{d}\), \(\eta _{\varepsilon }(x,y)(y-x)\) is integrable and compactly supported in x for each y, and \(\mathbf {\textbf{j}}_{t}\in \mathcal {M}_{loc}(\mathbb {R}^{d};\mathbb {R}^{d})\).) So first, compute (using the fact that \(\nabla _{y}\zeta _{(\eta _{\varepsilon })}(|x-y|)=(x-y)\eta _{\varepsilon }(|x-y|)\)) that

$$\begin{aligned}&\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\varphi _{t}(y)\eta _{\varepsilon }(x,y)(y-x)\cdot d\mathbf {\textbf{j}}_{t}(x)dy\\&=-\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\varphi _{t}(y)\nabla _{y}\zeta _{(\eta _{\varepsilon })}(|x-y|)\cdot d\mathbf {\textbf{j}}_{t}(x)dy\\&=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\zeta _{(\eta _{\varepsilon })}(|x-y|)\nabla \varphi _{t}(y)\cdot d\mathbf {\textbf{j}}_{t}(x)dy\\&=\iint _{G}\bar{\zeta }_{(\eta _{\varepsilon })}(|x-y|)\nabla \varphi _{t}(y)\cdot d\mathbf {\textbf{j}}_{t}(x)dy\\&=\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(y)\cdot \left( \int _{\mathbb {R}^{d}}\bar{\zeta }_{(\eta _{\varepsilon })}(|x-y|)d\mathbf {\textbf{j}}_{t}(x)\right) dy\\&=\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(y)\cdot d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) (y). \end{aligned}$$

By identical reasoning,

$$\begin{aligned} \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\iint _{G}\varphi _{t}(x)\eta _{\varepsilon }(x,y)(y-x)\cdot d\mathbf {\textbf{j}}_{t}(y)dx=-\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) (x). \end{aligned}$$

Next, compute that

$$\begin{aligned} \iint _{G}\varphi _{t}(y)\eta _{\varepsilon }(x,y)(y-x)\cdot d\mathbf {\textbf{j}}_{t}(y)dx&=\int _{\mathbb {R}^{d}}\varphi _{t}(y)\left( \int _{\mathbb {R}^{d}}\eta _{\varepsilon }(x,y)(y-x)dx\right) \cdot d\mathbf {\textbf{j}}_{t}(y)\\&=0 \end{aligned}$$

since the function \(\eta _{\varepsilon }(x,y)(y-x)\) is radially anti-symmetric around y, implying that \(\int _{\mathbb {R}^{d}}\eta _{\varepsilon }(x,y)(y-x)dx=0\). By identical reasoning, it also holds that

$$\begin{aligned} \iint _{G}\varphi _{t}(x)\eta _{\varepsilon }(x,y)(y-x)\cdot d\mathbf {\textbf{j}}_{t}(x)dy=0. \end{aligned}$$

Therefore, for all \(t\in [0,1]\),

$$\begin{aligned} \iint _{G}\bar{\nabla }\varphi _{t}(x,y)\eta _{\varepsilon }(x,y)d\textbf{j}_{t}(x,y)=2\int _{\mathbb {R}^{d}}\nabla \varphi _{t}(x)\cdot d\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}\right) (x), \end{aligned}$$

which establishes the claim. \(\square \)
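The gradient identity \(\nabla _{y}\zeta _{(\eta _{\varepsilon })}(|x-y|)=(x-y)\eta _{\varepsilon }(|x-y|)\), where \(\zeta (r)=\int _{r}^{\infty }t\eta (t)dt\), admits a quick numerical sanity check. The sketch below uses the illustrative radial profile \(\eta (t)=e^{-t^{2}}\) (our choice, not the paper's kernel), for which \(\zeta (r)=\tfrac{1}{2}e^{-r^{2}}\) in closed form, and compares against central finite differences:

```python
import math

# Check: with zeta(r) := \int_r^\infty t*eta(t) dt, one has
#   grad_y zeta(|x - y|) = (x - y) * eta(|x - y|).
# Illustrative radial profile (an assumption): eta(t) = exp(-t^2),
# for which zeta(r) = exp(-r^2)/2 in closed form.

def eta(t):
    return math.exp(-t * t)

def zeta(r):
    return 0.5 * math.exp(-r * r)

x = (0.3, -0.7, 1.1)
y = (1.0, 0.4, -0.2)
h = 1e-6

r = math.dist(x, y)
expected = [(xi - yi) * eta(r) for xi, yi in zip(x, y)]

# Central finite differences of y -> zeta(|x - y|), coordinate by coordinate.
grad = []
for i in range(3):
    yp = list(y); yp[i] += h
    ym = list(y); ym[i] -= h
    grad.append((zeta(math.dist(x, yp)) - zeta(math.dist(x, ym))) / (2 * h))

assert all(abs(g - e) < 1e-6 for g, e in zip(grad, expected))
```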

4.2 A quantitative upper bound on the nonlocal Wasserstein distance

In this section we establish a bound on the nonlocal Wasserstein distance of the form \(\mathcal {W}_{\eta , \varepsilon } \le C W_{2} + O(\sqrt{\varepsilon })\), where C is the exact proportionality constant that also appears in the matching lower bound on \(\mathcal {W}_{\eta , \varepsilon } \) presented in Corollary 5.12.

Proposition 4.2

(bounding \(\mathcal {W}\) by \(W_{2}\)) Assume that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 and 2.2 respectively. Let K denote the convolution kernel \(K(x)=c_{K}e^{-|x|}\), where \(c_{K}\) is a normalizing constant, and let \(\bar{\zeta }_{(\eta _{\varepsilon })}\) denote the convolution kernel \(\bar{\zeta }_{(\eta _{\varepsilon })}(x)=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}\int _{|x|}^{\infty }t\eta _{\varepsilon }(t)dt\). Let \((\rho _{t},\mathbf {\textbf{j}}_{t})_{t\in [0,1]}\) be a solution to the (local) continuity equation in flux form. Furthermore, define \(\textbf{j}_{t}:[0,1]\rightarrow \mathcal {M}_{loc}(G)\) as follows:

$$\begin{aligned} d\textbf{j}_{t}(x,y)=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}(y-x)\cdot \left( d(\varvec{K_{s}}*\mathbf {\textbf{j}}_{t})(x)dy+d(\varvec{K_{s}}*\mathbf {\textbf{j}}_{t})(y)dx\right) . \end{aligned}$$

Then, \((\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\rho _{t},\textbf{j}_{t})_{t\in [0,1]}\) solves the nonlocal continuity equation, and for all \(0<\varepsilon <s\), and all \(t\in [0,1]\),

$$\begin{aligned} \mathcal {A}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\rho _{t},\textbf{j}_{t}\right) \le \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\left( 1+\frac{3}{s}\varepsilon \right) ^{4}\mathcal {A}(\varvec{K_{s}}*\rho _{t},\varvec{K_{s}}*\mathbf {\textbf{j}}_{t}). \end{aligned}$$

In particular, for all \(\rho _{0},\rho _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\),

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\rho _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\rho _{1}\right)&\le \frac{1}{\varepsilon }\left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}\left( 1+\frac{3}{s}\varepsilon \right) ^{2}W_{2}\left( \rho _{0},\rho _{1}\right) . \end{aligned}$$

Proof

Let \((\rho _{t},\mathbf {\textbf{j}}_{t})_{t\in [0,1]}\) be a solution to the (local) continuity equation in flux form. Let \(s>0\) be fixed; we define \(\rho _{t}^{s}:=\varvec{K_{s}}*\rho _{t}\) and \(\mathbf {\textbf{j}}_{t}^{s}:=\varvec{K_{s}}*\mathbf {\textbf{j}}_{t}\). Note that these objects are measures; the corresponding Lebesgue densities are \(K_{s}*\rho _{t}\) and \(\textbf{j}_{t}^{s}:=K_{s}*\mathbf {\textbf{j}}_{t}\) respectively. By [3, Lemma 8.1.9], \((\rho _{t}^{s},\mathbf {\textbf{j}}_{t}^{s})\) also solves the continuity equation in flux form. We then smooth \(\rho _{t}^{s}\) and \(\mathbf {\textbf{j}}_{t}^{s}\) again, using the kernel \(\bar{\zeta }_{(\eta _{\varepsilon })}\), so that \(\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\mathbf {\textbf{j}}_{t}^{s}\right) \) again solves the continuity equation in flux form. By Lemma B.4, we know (since \(\eta _{\varepsilon }(|x-y|)\) is supported on \(B(0,\varepsilon )\)) that if \(\varepsilon <s\), then the corresponding Lebesgue density \(\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}\) has local relative Lipschitz regularity of the form

$$\begin{aligned} \frac{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x)}\le \left( 1+\frac{3}{s}\varepsilon \right) ^{2}\left( 1+\frac{3}{s}|x-y|\right) \text { when }|x-y|<s. \end{aligned}$$

Define \(\textbf{j}_{t}:[0,1]\rightarrow \mathcal {M}_{loc}(G)\) as in Lemma 4.1 with respect to \(\mathbf {\textbf{j}}_{t}^{s}\), namely

$$\begin{aligned} d\textbf{j}_{t}(x,y)=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}(y-x)\cdot \left( d\mathbf {\textbf{j}}_{t}^{s}(x)dy+d\mathbf {\textbf{j}}_{t}^{s}(y)dx\right) . \end{aligned}$$

In this case, since \(\mathbf {\textbf{j}}_{t}^{s}\) has a density with respect to the Lebesgue measure given by \(\textbf{j}_{t}^{s}\), it follows that \(\textbf{j}_{t}\) has density with respect to the product Lebesgue measure restricted to G, given by

$$\begin{aligned} j_{t}(x,y):=\frac{d\textbf{j}_{t}}{dxdy}=\frac{d}{\varepsilon ^{2}M_{2}(\eta )}(y-x)\cdot \left[ \textbf{j}_{t}^{s}(x)+\textbf{j}_{t}^{s}(y)\right] . \end{aligned}$$

Furthermore, by Lemma 4.1, we know that \(\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\textbf{j}_{t}\right) \) solves the nonlocal continuity equation.

Now, let us compare the nonlocal action \(\mathcal {A}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\textbf{j}_{t}\right) \) with the local action \(\mathcal {A}(\rho _{t}^{s},\mathbf {\textbf{j}}_{t}^{s})\). Relying on the homogeneity of the interpolation \(\theta \), we observe (using Lemma B.4) that

$$\begin{aligned} \theta (\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x),\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y))&=\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)\theta \left( \frac{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x)}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)},1\right) \\&\ge \bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)\theta \left( \frac{1}{\left( 1+\frac{3}{s}\varepsilon \right) ^{2}}\frac{1}{1+\frac{3}{s}|x-y|},1\right) \\&\ge \bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)\cdot \frac{1}{\left( 1+\frac{3}{s}\varepsilon \right) ^{2}}\frac{1}{1+\frac{3}{s}|x-y|}. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathcal {A}_{\eta ,\varepsilon }&\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\textbf{j}_{t}\right) =\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\frac{j_{t}(x,y)^{2}}{2\theta (\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x),\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y))}\eta _{\varepsilon }(x,y)dxdy\\&=\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\frac{\left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left[ \textbf{j}_{t}^{s}(x)+\textbf{j}_{t}^{s}(y)\right] \cdot (y-x)\right) ^{2}}{2\theta (\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x),\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y))}\eta _{\varepsilon }(x,y)dxdy\\&\le \left( 1+\frac{3}{s}\varepsilon \right) ^{2}\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\left( 1+\frac{3}{s}|x-y|\right) \frac{\left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left[ \textbf{j}_{t}^{s}(x)+\textbf{j}_{t}^{s}(y)\right] \cdot (y-x)\right) ^{2}}{2\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}\\&\quad \eta _{\varepsilon }(x,y)dxdy. \end{aligned}$$

By [29, Corollary 2.16], together with Lemma B.2, we have that

$$\begin{aligned} \int _{\mathbb {R}^{d}}\left( \textbf{j}_{t}^{s}(y)\cdot (y-x)\right) ^{2}\eta _{\varepsilon }(x,y)dx=\varepsilon ^{2}\frac{M_{2}(\eta )}{d}|\textbf{j}_{t}^{s}(y)|^{2}. \end{aligned}$$

Consequently, splitting the two flux terms via the elementary inequality \((a+b)^{2}\le 2(a^{2}+b^{2})\), and also using the fact that \(\eta _{\varepsilon }(x,y)\) is supported on the set where \(|x-y|\le \varepsilon \),

$$\begin{aligned}{} & {} \left( 1+\frac{3}{s}\varepsilon \right) ^{2}\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\left( 1+\frac{3}{s}|x-y|\right) \frac{\left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\textbf{j}_{t}^{s}(y)\cdot (y-x)\right) ^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}\eta _{\varepsilon }(x,y)dxdy\\{} & {} \begin{aligned}&\le \left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\right) ^{2}\left( 1+\frac{3}{s}\varepsilon \right) ^{3}\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\frac{\left( \textbf{j}_{t}^{s}(y)\cdot (y-x)\right) ^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}\eta _{\varepsilon }(x,y)dxdy\\&\le \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left( 1+\frac{3}{s}\varepsilon \right) ^{3}\int _{\mathbb {R}^{d}}\frac{(\textbf{j}_{t}^{s}(y))^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}dy. \end{aligned} \end{aligned}$$

Likewise,

$$\begin{aligned}{} & {} \left( 1+\frac{3}{s}\varepsilon \right) ^{2}\int _{\mathbb {R}^{d}}\int _{\mathbb {R}^{d}}\left( 1+\frac{3}{s}|x-y|\right) \frac{\left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\textbf{j}_{t}^{s}(x)\cdot (y-x)\right) ^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}\eta _{\varepsilon }(x,y)dxdy\\{} & {} \begin{aligned}&\le \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left( 1+\frac{3}{s}\varepsilon \right) ^{3}\int _{\mathbb {R}^{d}}\frac{(\textbf{j}_{t}^{s}(x))^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(x)}dx\end{aligned} \end{aligned}$$

and so we deduce that

$$\begin{aligned} \mathcal {A}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\textbf{j}_{t}\right) \le \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\left( 1+\frac{3}{s}\varepsilon \right) ^{3}\int _{\mathbb {R}^{d}}\frac{(\textbf{j}_{t}^{s}(y))^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}dy. \end{aligned}$$

From Lemma B.4, we know that \(\rho _{t}^{s}(x)\left( \frac{1}{1+\frac{3}{s}\varepsilon }\right) \le (\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s})(x)\). Therefore,

$$\begin{aligned} \int _{\mathbb {R}^{d}}\frac{(\textbf{j}_{t}^{s}(y))^{2}}{\bar{\zeta }_{(\eta _{\varepsilon })}*\rho _{t}^{s}(y)}dy&\le \left( 1+\frac{3}{s}\varepsilon \right) \int _{\mathbb {R}^{d}}\frac{(\textbf{j}_{t}^{s}(y))^{2}}{\rho _{t}^{s}(y)}dy \end{aligned}$$

and hence

$$\begin{aligned} \mathcal {A}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{t}^{s},\textbf{j}_{t}\right) \le \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\left( 1+\frac{3}{s}\varepsilon \right) ^{4}\mathcal {A}(\rho _{t}^{s},\mathbf {\textbf{j}}_{t}^{s}). \end{aligned}$$

We now take \((\rho _{t},\mathbf {\textbf{j}}_{t})_{t\in [0,1]}\) to be a curve of least action for \(W_{2}\) connecting \(\rho _{0}\) and \(\rho _{1}\), so that \(\int _{0}^{1}\mathcal {A}(\rho _{t},\mathbf {\textbf{j}}_{t})dt=W_{2}^{2}(\rho _{0},\rho _{1})\). Then, by using the fact that \(\mathcal {A}(\rho _{t}^{s},\mathbf {\textbf{j}}_{t}^{s})\le \mathcal {A}(\rho _{t},\mathbf {\textbf{j}}_{t})\), and integrating in t, we have that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }\left( \varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{0}^{s},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\rho _{1}^{s}\right) \le \frac{1}{\varepsilon }\left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}\left( 1+\frac{3}{s}\varepsilon \right) ^{2}W_{2}(\rho _{0},\rho _{1}), \end{aligned}$$

as desired. \(\square \)
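The proof relies on the isotropy identity from [29, Corollary 2.16]: for a radial weight w, \(\int (v\cdot z)^{2}w(|z|)dz=\frac{|v|^{2}}{d}\int |z|^{2}w(|z|)dz\). A quick Monte Carlo sanity check of this identity; the Gaussian weight and the dimension \(d=3\) are illustrative choices of ours:

```python
import random

# Monte Carlo check: for a radial weight w,
#   \int (v.z)^2 w(|z|) dz = (|v|^2 / d) \int |z|^2 w(|z|) dz.
# Here w is the standard Gaussian density in d = 3 (illustrative), so
# both sides equal |v|^2 after normalizing by the total mass.

random.seed(0)
d = 3
v = (0.5, -1.0, 2.0)
v_sq = sum(vi * vi for vi in v)          # |v|^2 = 5.25

n = 100_000
lhs = 0.0   # running sum of (v . Z)^2
rhs = 0.0   # running sum of |Z|^2 / d, later scaled by |v|^2
for _ in range(n):
    z = [random.gauss(0.0, 1.0) for _ in range(d)]
    lhs += sum(vi * zi for vi, zi in zip(v, z)) ** 2
    rhs += sum(zi * zi for zi in z) / d
lhs /= n
rhs = v_sq * rhs / n

# Both estimates agree with |v|^2 up to sampling error.
assert abs(lhs - rhs) / v_sq < 0.05
assert abs(lhs - v_sq) / v_sq < 0.05
```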

Corollary 4.3

Let \(\mu _{0},\mu _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\). Assume that \(\eta \) and \(\theta \) satisfy Assumptions 2.1, 2.2, and 3.7. Then,

$$\begin{aligned} \varepsilon \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\le \left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}W_{2}(\mu _{0},\mu _{1})+O(\sqrt{\varepsilon }). \end{aligned}$$

Explicitly,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\le&\frac{1}{\varepsilon }\left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}\left( 1+3\sqrt{\varepsilon }\right) ^{2}W_{2}(\mu _{0},\mu _{1})\\&+2\sqrt{2}\left( \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }\right) \left( \left( d^{2}+d\right) ^{1/2}\sqrt{\varepsilon }+\left( \frac{d}{d+2}\frac{M_{4}(\eta )}{M_{2}(\eta )}\right) ^{1/2}\varepsilon \right) \\&+2\sqrt{2}C_{d,\theta ,\eta } \end{aligned}$$

where \(C_{d,\theta }=\frac{C_{\theta }}{\frac{2}{3}\sqrt{\alpha _{d}}\left( \frac{1}{6}\right) ^{d/2}}\), \(C_{\theta }=\int _{0}^{1}\frac{1}{\sqrt{\theta (1-r,1+r)}}dr\), and \(C_{d,\theta ,\eta }\) is the constant from Lemma 3.9.

Proof

We saw in Lemma 3.11 that for arbitrary \(\nu _{0},\nu _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\),

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\nu _{0},\nu _{1})\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\nu _{0},\nu _{1})+2C_{d,\theta ,\eta }^{2} \end{aligned}$$

where \(C_{d,\theta }=\frac{C_{\theta }}{\frac{2}{3}\sqrt{\alpha _{d}}\left( \frac{1}{6}\right) ^{d/2}}\), \(C_{\theta }=\int _{0}^{1}\frac{1}{\sqrt{\theta (1-r,1+r)}}dr\), and \(C_{d,\theta ,\eta }\) is the constant from Lemma 3.9. In particular, take \(\nu _{0}=\mu _{0}\) and \(\nu _{1}=\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0}\). It follows that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})+2C_{d,\theta ,\eta }^{2}. \end{aligned}$$

By identical reasoning,

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{1},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1})\le 2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\mu _{1},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1})+2C_{d,\theta ,\eta }^{2}. \end{aligned}$$

Now since, by the triangle inequality,

$$\begin{aligned}{} & {} \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})\\{} & {} \quad +\mathcal {W}_{\eta ,\varepsilon }(\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1})+\mathcal {W}_{\eta ,\varepsilon }(\mu _{1},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1}) \end{aligned}$$

we can use the previous proposition to see that

$$\begin{aligned}&\mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\le \frac{1}{\varepsilon }\left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}\left( 1+\frac{3}{s}\varepsilon \right) ^{2}W_{2}(\mu _{0},\mu _{1})\\&\quad +\sqrt{2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})+2C_{d,\theta ,\eta }^{2}}\\&\quad +\sqrt{2\frac{C_{d,\theta }^{2}}{\eta \left( \frac{1}{2}\right) }\frac{1}{\varepsilon ^{2}}W_{2}^{2}(\mu _{1},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1})+2C_{d,\theta ,\eta }^{2}}. \end{aligned}$$

It remains to estimate \(W_{2}(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})\) and \(W_{2}(\mu _{1},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{1})\). To do so, we can use two successive convolution estimates, since

$$\begin{aligned} W_{2}(\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})\le W_{2}(\mu _{0},\varvec{K_{s}}*\mu _{0})+W_{2}(\varvec{K_{s}}*\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0}) \end{aligned}$$

and similarly for \(\mu _{1}\). Thanks to the estimates from Lemmas B.5 and B.6, we know that

$$\begin{aligned} W_{2}(\mu _{0},\varvec{K_{s}}*\mu _{0})\le \left( \int |y|^{2}c_{K}e^{-|y|}dy\right) ^{1/2}s=(d^{2}+d)^{1/2}s \end{aligned}$$

and

$$\begin{aligned} W_{2}(\varvec{K_{s}}*\mu _{0},\varvec{\bar{\zeta }_{(\eta _{\varepsilon })}}*\varvec{K_{s}}*\mu _{0})\le \left( \frac{d}{d+2}\frac{M_{4}(\eta )}{M_{2}(\eta )}\right) ^{1/2}\varepsilon \end{aligned}$$

and of course the same holds for \(\mu _{1}\). Therefore, putting \(s=\sqrt{\varepsilon }\), we deduce the estimate

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\le&\frac{1}{\varepsilon }\left( \frac{2d}{M_{2}(\eta )}\right) ^{1/2}\left( 1+3\sqrt{\varepsilon }\right) ^{2}W_{2}(\mu _{0},\mu _{1})\\&+2\sqrt{2}\left( \frac{C_{d,\theta }}{\sqrt{\eta \left( \frac{1}{2}\right) }}\frac{1}{\varepsilon }\right) \left( \left( d^{2}+d\right) ^{1/2}\sqrt{\varepsilon }+\left( \frac{d}{d+2}\frac{M_{4}(\eta )}{M_{2}(\eta )}\right) ^{1/2}\varepsilon \right) \\&+2\sqrt{2}C_{d,\theta ,\eta } \end{aligned}$$

as desired. \(\square \)
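The moment computation \(\int |y|^{2}c_{K}e^{-|y|}dy=\Gamma (d+2)/\Gamma (d)=d^{2}+d\) from Lemma B.5 can be verified numerically in radial coordinates, where the surface-area factor cancels against the normalization \(c_{K}\). A sketch (the quadrature parameters are illustrative):

```python
import math

# Check: \int_{R^d} |y|^2 c_K e^{-|y|} dy
#      = \int_0^\infty r^{d+1} e^{-r} dr / \int_0^\infty r^{d-1} e^{-r} dr
#      = Gamma(d+2)/Gamma(d) = d^2 + d.

def radial_moment(d, power, r_max=60.0, n=100_000):
    # Trapezoid rule for \int_0^{r_max} r^{d-1+power} e^{-r} dr.
    h = r_max / n
    f = lambda r: r ** (d - 1 + power) * math.exp(-r)
    total = 0.5 * (f(0.0) + f(r_max))
    for i in range(1, n):
        total += f(i * h)
    return total * h

for d in (1, 2, 3, 5):
    ratio = radial_moment(d, 2) / radial_moment(d, 0)
    assert abs(ratio - (d * d + d)) < 1e-3 * (d * d + d)
```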

5 Nonlocal Hamilton–Jacobi subsolution

We first sketch the argument of this section. Our aim is to prove a bound of the following form:

$$\begin{aligned} W_{2}(\mu _{0},\mu _{1})\le \varepsilon \sqrt{\frac{M_{2}(\eta )}{2d}}\mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})+\text {error terms}. \end{aligned}$$

Ultimately, we will show that the error terms are of order \(\sqrt{\varepsilon }\). Our strategy is to use Hamilton–Jacobi duality for \(W_{2}\) (and also \(\mathcal {W}_{\eta ,\varepsilon }\)). Simplifying somewhat:

  1. (i)

    Let \((\phi _{t})_{t\in [0,1]}\) be a (viscosity) solution to the Hamilton–Jacobi equation \(\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\). Duality theory for \(W_{2}\) (specifically [46, Proposition 5.48]) tells us that

    $$\begin{aligned} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})=\max _{\phi _{0}\in C_{b}}\left\{ \int \phi _{1}d\mu _{1}-\int \phi _{0}d\mu _{0}\,:\,\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\right\} . \end{aligned}$$
  2. (ii)

    We expect that a similar duality theorem holds for \(\mathcal {W}_{\eta ,\varepsilon }\) (but there is a different notion of “nonlocal Hamilton–Jacobi equation”):

    $$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\mu _{1})=\max \left\{ \int \phi _{1}^\varepsilon d\mu _{1}-\int \phi _{0}^\varepsilon d\mu _{0}\,:\,(\phi _{t}^\varepsilon )_{t\in [0,1]} \text { is a n.l. HJ subsolution}\right\} . \end{aligned}$$
  3. (iii)

    We will use solutions of the Hamilton–Jacobi equation to construct subsolutions to the nonlocal Hamilton–Jacobi equation, thus obtaining a lower bound on \(\mathcal {W}_{\eta ,\varepsilon }^{2}\).

Unfortunately, things are not so simple, and we will need to introduce a layer of approximations. Because of the (conjectured, at this point, based on the outcome of Sect. 4) asymptotic proportionality constant between \(W_{2}\) and \(\mathcal {W}_{\eta ,\varepsilon }\), the constant prefactor in any mapping which takes a (local) HJ solution and gives us a nonlocal HJ subsolution must be \(\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\).

Before proceeding, we present several preparatory lemmas.

Definition 5.1

(space of space-time bounded Lipschitz functions) We define

$$\begin{aligned} BL([0,1]\times \mathbb {R}^{d}):=\{\phi _{t}(x):[0,1]\times \mathbb {R}^{d}\rightarrow \mathbb {R} \, \mid \, \text {Lip}_{[0,1]\times \mathbb {R}^{d}}(\phi )+\Vert \phi \Vert _{L^{\infty }([0,1]\times \mathbb {R}^{d})}<\infty \}. \end{aligned}$$

Lemma 5.2

Suppose that \(\mu _{0}\) and \(\mu _{1}\) are probability measures which are both supported within B(0, R) inside \(\mathbb {R}^{d}\). Then, in the Kantorovich duality

$$\begin{aligned} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})=\sup _{\phi \in BL(\mathbb {R}^{d})}\left\{ \int \phi ^{c}(y)d\mu _{1}(y)-\int \phi (x)d\mu _{0}(x)\right\} \end{aligned}$$

(where \(\phi ^{c}(y):=\inf _{x}\{\phi (x)+\frac{1}{2}|x-y|^{2}\}\)), the optimal potential, that is,

$$\begin{aligned} \underset{\phi \in BL(\mathbb {R}^{d})}{\text {argmax}}\left\{ \int \phi ^{c}(y)d\mu _{1}(y)-\int \phi (x)d\mu _{0}(x)\right\} \end{aligned}$$

has \(\text {Lip}(\phi )\le R\).

Note that by Rademacher’s theorem, if \(\phi \) is Lipschitz then \(\nabla \phi \) exists Lebesgue-almost everywhere; the lemma therefore also shows that for the optimal Kantorovich potential, \(|\nabla \phi |\le R\).

Proof

This is an easy refinement of standard results concerning Kantorovich duality, such as [47, Theorem 5.10].

Indeed, as discussed on [43, p. 11], if \(c(x,y)\) is any continuous cost function and \(\psi ^{c}\) is any c-convex (resp. c-concave) function, it holds automatically that any modulus of continuity for \(c(x,y)\) is also a modulus of continuity for \(\psi ^{c}\). In the case of \(c(x,y)=\frac{1}{2}|x-y|^{2}\) on a domain of diameter R, we can take \(|c(x,y)-c(x^{\prime },y)|\le R|x-x^{\prime }|\) as a crude global modulus of continuity in the x variable (and of course the same reasoning applies to the y variable). Since the optimal Kantorovich potential \(\phi \) can always be taken to be c-convex, the claim follows. \(\square \)
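The modulus-of-continuity mechanism above is easy to observe numerically: on a domain of diameter R, the c-transform \(\phi ^{c}(y)=\inf _{x}\{\phi (x)+\frac{1}{2}|x-y|^{2}\}\) is R-Lipschitz no matter how irregular \(\phi \) itself is. A minimal one-dimensional sketch (the random data and grid are our own illustrative assumptions):

```python
import random

# On a grid of diameter R, compute the c-transform of wildly
# non-Lipschitz data and check that it is R-Lipschitz.

random.seed(1)
R = 2.0
n = 400
grid = [R * i / n for i in range(n + 1)]
phi = [random.uniform(-5.0, 5.0) for _ in grid]   # rough, non-Lipschitz data

# phi^c(y) = min_x { phi(x) + |x - y|^2 / 2 }
phi_c = [min(p + 0.5 * (x - y) ** 2 for p, x in zip(phi, grid)) for y in grid]

h = R / n
max_slope = max(abs(a - b) / h for a, b in zip(phi_c[1:], phi_c[:-1]))
assert max_slope <= R + 1e-9
```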

The Kantorovich duality formula for \(W_{2}\) also has a “dynamic” counterpart in terms of solutions to a Hamilton–Jacobi equation. This fact was initially observed in [6, 39]; here we just give a proof for convenience.

Corollary 5.3

Suppose that \(\mu _{0}\) and \(\mu _{1}\) are probability measures which are both supported within some domain of radius R inside \(\mathbb {R}^{d}\). Then,

$$\begin{aligned}{} & {} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})\\{} & {} \quad =\sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\left\{ \int \phi _{1}d\mu _{1}-\int \phi _{0}d\mu _{0}:\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\text { in viscosity sense}\right\} \end{aligned}$$

and it holds that the optimal Hamilton–Jacobi subsolution, that is,

$$\begin{aligned} \underset{\phi \in BL([0,1]\times \mathbb {R}^{d})}{\textrm{argmax}}\left\{ \int \phi _{1}d\mu _{1}-\int \phi _{0}d\mu _{0}:\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\text { in viscosity sense}\right\} \end{aligned}$$

has the property that \(\text {Lip}(\phi _{t})\le R\), for all \(t\in [0,1]\).

Proof

Let \(\phi _{0}\in BL(\mathbb {R}^{d})\). By [26, Theorem 10.3.3], the unique viscosity solution of \(\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\) with initial condition \(\phi _{0}\) is given by the Hopf–Lax formula \(\phi _{t}(x)=\inf _{y}\{\phi _{0}(y)+\frac{1}{2t}|x-y|^{2}\}\) (which at time 1 just returns \(\phi _{0}^{c}\) for \(c=\frac{1}{2}|x-y|^{2}\)), and so by comparing with [47, Theorem 5.10], we see that

$$\begin{aligned} \sup _{\phi \in BL(\mathbb {R}^{d})} \int \phi ^{c}(y)d\mu _{1}(y)-\int \phi (x)d\mu _{0}(x)= & {} \sup _{\phi _{0}\in BL(\mathbb {R}^{d})}\left\{ \int \phi _{1}d\mu _{1}\right. \\{} & {} \left. - \int \phi _{0}d\mu _{0}:\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\right\} \end{aligned}$$

where on the right hand side \(\phi _{t}\) solves \(\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\) in the viscosity sense. By [26, Lemma 3.3.2], we know that \(\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})\).

Furthermore, the proof of [26, Lemma 3.3.2] indicates that \(\text {Lip}(\phi _{t})\le \text {Lip}(\phi _{0})\) for all \(t\in [0,1].\) Therefore, by Lemma 5.2 it holds that \(\text {Lip}(\phi _{t})\le R\), for all \(t\in [0,1]\). \(\square \)
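The Lipschitz monotonicity \(\text {Lip}(\phi _{t})\le \text {Lip}(\phi _{0})\) along the Hopf–Lax semigroup can be illustrated numerically: writing \(\phi _{t}(x)=\inf _{z}\{\phi _{0}(x-z)+|z|^{2}/(2t)\}\) exhibits \(\phi _{t}\) as an infimum of translates of \(\phi _{0}\), each with the same Lipschitz constant. A sketch with the illustrative datum \(\phi _{0}(x)=|\sin (3x)|\) (our choice), for which \(\text {Lip}(\phi _{0})=3\):

```python
import math

# Hopf-Lax semigroup phi_t(x) = inf_z { phi_0(x - z) + |z|^2/(2t) },
# evaluated on a grid; check that the discrete Lipschitz constant of
# phi_t never exceeds Lip(phi_0) = 3.

def phi0(x):
    return abs(math.sin(3.0 * x))

L = 3.0
n = 400
xs = [-2.0 + 4.0 * i / n for i in range(n + 1)]
zs = [-4.0 + 8.0 * k / 400 for k in range(401)]

def hopf_lax(t):
    return [min(phi0(x - z) + z * z / (2.0 * t) for z in zs) for x in xs]

h = 4.0 / n
for t in (0.25, 0.5, 1.0):
    pt = hopf_lax(t)
    max_slope = max(abs(a - b) / h for a, b in zip(pt[1:], pt[:-1]))
    assert max_slope <= L + 1e-9
```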

Definition 5.4

Following [23, Definition 3.1], we say that \(\phi _{t}(x):[0,1]\times \mathbb {R}^{d}\rightarrow \mathbb {R}\) is a nonlocal Hamilton–Jacobi (HJ) subsolution, and write \(\phi _{t}(x)\in \text {HJ}_{\text {NL}}^{1}\), if, for a.e. \(t\in (0,1)\), the partial derivative \(\partial _{t}\phi _{t}(x)\) exists for every \(x\in \mathbb {R}^{d}\), and \(\sup _{x\in \mathbb {R}^{d}}|\partial _{t}\phi _{t}(x)|<\infty \); and we have, for all probability measures \(\mu \in \mathcal {P}(\mathbb {R}^{d})\), and for any (hence all) \(\lambda \) such that \(\mu \ll \lambda \),

$$\begin{aligned} \int \partial _{t}\phi _{t}(x)d\mu (x)+\frac{1}{4}\int (\phi _{t}(y)-\phi _{t}(x))^{2}\theta \left( \frac{d\mu }{d\lambda }(x),\frac{d\mu }{d\lambda }(y)\right) \eta _{\varepsilon }(x,y)d\lambda (x)d\lambda (y)\le 0. \end{aligned}$$

Remark 5.5

(Nonlocal Hamilton–Jacobi solutions) In the present work, it is the notion of nonlocal HJ subsolution which is relevant. Nonetheless, we mention here an associated notion of solution. Following [28], we say that \(\phi _{t}(x)\) is a nonlocal Hamilton–Jacobi solution if it is a nonlocal Hamilton–Jacobi subsolution and, moreover,

$$\begin{aligned} \sup _{\mu \in \mathcal {P}(\mathbb {R}^{d});\lambda \gg \mu }\left\{ \int \partial _{t}\phi _{t}(x)d\mu (x)+\frac{1}{4}\int (\phi _{t}(y)-\phi _{t}(x))^{2}\theta \left( \frac{d\mu }{d\lambda }(x),\frac{d\mu }{d\lambda }(y)\right) \eta _{\varepsilon }(x,y)d\lambda (x)d\lambda (y)\right\} =0. \end{aligned}$$

We leave investigation of this rather atypical PDE to future work.

The duality formula we expect to hold for \(\mathcal {W}_{\eta ,\varepsilon }\) is

$$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\mu _{1})=\sup \left\{ \int \phi _{1}(x)d\mu _{1}(x)-\int \phi _{0}(x)d\mu _{0}(x):\phi _{t}\in \text {HJ}_{\text {NL}}^{1}\right\} . \end{aligned}$$

However, in this work, we do not attempt to prove this duality formula directly. Rather, for technical reasons, we introduce a “smoothed version” of the nonlocal Wasserstein distance (for which we do prove a partial duality result), as follows.

Definition 5.6

Let K denote the convolution kernel \(c_{K}e^{-|x|}\). The “s-smoothed nonlocal Wasserstein distance”, denoted \(\mathcal {W}_{\eta ,\varepsilon ,s}\), is defined as follows: given \(\mu _{0},\mu _{1}\in \mathcal {P}(\mathbb {R}^{d})\), and denoting \(\mu _{t}^{s}:=\varvec{K_{s}}*\mu _{t}\) and \(\textbf{j}_{t}^{s}:=\varvec{K_{s}}*\textbf{j}_{t}\),

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\mu _{0}^{s},\mu _{1}^{s})&:=\inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}}\left\{ \int _{0}^{1}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})dt:(\mu _{t}^{s},\textbf{j}_{t}^{s})_{t\in [0,1]}\in \mathcal{C}\mathcal{E}(\mu _{0}^{s},\mu _{1}^{s})\right\} . \end{aligned}$$

In other words, \(\mathcal {W}_{\eta ,\varepsilon ,s}\) is defined in the same variational fashion as \(\mathcal {W}_{\eta ,\varepsilon }\), except we restrict to the class of a.c. curves which have been smoothed using the mollification kernel \(K_{s}\). Similarly, we also have a notion of “smoothed nonlocal HJ subsolution:”

Definition 5.7

We say that \(\phi _{t}(x):[0,1]\times \mathbb {R}^{d}\rightarrow \mathbb {R}\) is an s-smoothed nonlocal Hamilton–Jacobi subsolution, and write \(\phi _{t}(x)\in \text {HJ}_{\text {NL}}^{1,s}\) if the weak partial derivative \(\partial _{t}\phi _{t}(x):\mathbb {R}^{d}\rightarrow \mathbb {R}\) exists, and there exists some \(p\in [1,\infty ]\) such that \(\Vert \partial _{t}\phi _{t}\Vert _{L^{p}(\mathbb {R}^{d})}<\infty \), for Lebesgue almost all \(t\in (0,1)\); and we have, for all probability measures \(\mu \in \mathcal {P}(\mathbb {R}^{d})\), for almost all \(t\in (0,1)\),

$$\begin{aligned}{} & {} \int \partial _{t}\phi _{t}(x)d(\varvec{K_{s}}*\mu )(x)+\frac{1}{4}\iint (\phi _{t}(y)-\phi _{t}(x))^{2}\\{} & {} \theta \left( \frac{d(\varvec{K_{s}}*\mu )}{dLeb}(x),\frac{d(\varvec{K_{s}}*\mu )}{dLeb}(y)\right) \eta _{\varepsilon }(x,y)dxdy\le 0. \end{aligned}$$

Remark 5.8

The regularity assumption we have imposed on \(\partial _{t}\phi _{t}(x)\) is chosen so that \(|\int \partial _{t}\phi _{t}d\varvec{K_{s}}*\mu |<\infty \) for almost all \(t\in (0,1)\). Indeed, \(\varvec{K_{s}}*\mu \) has density \(K_{s}*\mu \) with respect to the Lebesgue measure, and since \(\varvec{K_{s}}*\mu \) is a probability measure, \(K_{s}*\mu \in L^{1}(\mathbb {R}^{d})\). At the same time, Young’s convolution inequality implies that \(K_{s}*\mu \in L^{\infty }(\mathbb {R}^{d})\) since \(e^{-|x|}\in L^{\infty }\). Hence by interpolation, \(K_{s}*\mu \in L^{q}\) for all \(q\in [1,\infty ]\), so it suffices that \(\partial _{t}\phi _{t}(x)\in L^{p}(\mathbb {R}^{d})\) for some \(p\in [1,\infty ]\), for almost all \(t\in (0,1)\). In our present situation, it will always be the case that \(\phi _{t}(x)\in BL([0,1]\times \mathbb {R}^{d})\), so in particular we can take \(p=\infty \).

Lemma 5.9

For all \(\mu _{0},\mu _{1}\in \mathcal {P}_{2}(\mathbb {R}^{d})\), it holds that

$$\begin{aligned} \mathcal {W}_{\eta ,\varepsilon }(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon ,s}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1}). \end{aligned}$$

Proof

That \(\mathcal {W}_{\eta ,\varepsilon }(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon ,s}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\) holds is immediate from the fact that the infimum in the definition of \(\mathcal {W}_{\eta ,\varepsilon ,s}\) runs only over a.c. curves connecting \(\varvec{K_{s}}*\mu _{0}\) and \(\varvec{K_{s}}*\mu _{1}\) which happen to be \(K_{s}\)-smoothings of (not necessarily a.c.) curves between \(\mu _{0}\) and \(\mu _{1}\); whereas, for \(\mathcal {W}_{\eta ,\varepsilon }(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\), the infimum runs over all a.c. curves connecting \(\varvec{K_{s}}*\mu _{0}\) and \(\varvec{K_{s}}*\mu _{1}\).

That \(\mathcal {W}_{\eta ,\varepsilon ,s}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon }(\mu _{0},\mu _{1})\) follows from the fact that mollifying a.c. curves reduces their total action, see Proposition 2.22; and that the class of time-dependent mass-flux pairs \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\) considered in the infimum defining \(\mathcal {W}_{\eta ,\varepsilon ,s}\) includes all a.c. curves connecting \(\mu _{0}\) and \(\mu _{1}\). \(\square \)
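The action-reducing property of mollification invoked above (Proposition 2.22) rests on the joint convexity of \((j,\rho )\mapsto j^{2}/\rho \), via Jensen's inequality. A discrete sketch of this mechanism, with a three-point probability kernel standing in for \(K_{s}\) and random data (both illustrative assumptions of ours):

```python
import random

# Discrete analogue of "mollifying reduces the action": by joint
# convexity of (j, rho) -> j^2 / rho and the Cauchy-Schwarz inequality,
#   sum (K*j)^2 / (K*rho)  <=  sum j^2 / rho
# for any probability kernel K. Circular convolution on n points.

random.seed(2)
n = 64
rho = [random.uniform(0.1, 2.0) for _ in range(n)]   # positive "density"
j = [random.uniform(-1.0, 1.0) for _ in range(n)]    # signed "flux"

K = [0.25, 0.5, 0.25]   # simple probability kernel, stands in for K_s

def conv(f):
    return [sum(K[m] * f[(i - m) % n] for m in range(3)) for i in range(n)]

action = sum(a * a / b for a, b in zip(j, rho))
action_s = sum(a * a / b for a, b in zip(conv(j), conv(rho)))
assert action_s <= action + 1e-12
```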

Proposition 5.10

Assume that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 (i–iv) and 2.2 respectively. Let \(s>0\). Denote \(\mu ^{s}:=\varvec{K_{s}}*\mu \) for any \(\mu \in \mathcal {P}(\mathbb {R}^{d})\). The following duality inequality for \(\mathcal {W}_{\eta ,\varepsilon ,s}\) holds:

$$\begin{aligned}&\frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\mu _{0}^{s},\mu _{1}^{s}) \\&\quad \ge \sup \left\{ \int \phi _{1}(x)d\mu _{1}^{s}(x)-\int \phi _{0}(x)d\mu _{0}^{s}(x):\phi _{t}\in \text {HJ}_{\text {NL}}^{1,s}\cap BL([0,1]\times \mathbb {R}^{d})\right\} . \end{aligned}$$

Proof

We follow rather closely the argument given in [23, Section 3]. Following the exposition there, we introduce the shorthand notation

$$\begin{aligned}{} & {} \langle \phi ,\mu \rangle :=\int _{\mathbb {R}^{d}}\phi (x)d\mu (x);\qquad \langle \langle \Phi ,\textbf{j}\rangle \rangle :=\iint _{G}\Phi (x,y)\eta _{\varepsilon }(x,y)d\textbf{j}(x,y). \end{aligned}$$

We also use the notation \(\mu ^{s}:=\varvec{K_{s}}*\mu \) and \(\textbf{j}^{s}:=\varvec{K_{s}}*\textbf{j}\). In this argument, we allow \(\mu _{t}\) to take values in \(\mathcal {M}^{+}(\mathbb {R}^{d})\) rather than just \(\mathcal {P}(\mathbb {R}^{d})\). Note that in this situation, the action \(\mathcal {A}(\mu ,\textbf{j})\) and the nonlocal continuity equation are still well-defined. Additionally, there will be no loss of generality in assuming that, for paths we consider, \(\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})<\infty \) for almost all \(t\in [0,1]\); since otherwise, we will find that \(\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})=\infty \), and so the proposition holds trivially. In particular, quantification over “all” \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\) shall be understood to mean that:

  • \(\mu _{t}:[0,1]\rightarrow \mathcal {M}^{+}(\mathbb {R}^{d})\) is continuous and \(\textbf{j}_{t}:[0,1]\rightarrow \mathcal {M}_{loc}(G)\) is Borel measurable, with respect to the respective weak* topologies on \(\mathcal {M}^{+}(\mathbb {R}^{d})\) and \(\mathcal {M}_{loc}(G)\); and

  • \(\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})<\infty \) for almost all \(t\in [0,1]\).

By [19, Lemma 3.1], we know that, if \((\mu _{t}^{s},\textbf{j}_{t}^{s})_{t\in [0,1]}\) satisfies the nonlocal continuity equation in the sense of distributions, then for all \(\phi _{t}(x)\in C_{c}^{\infty }([0,1]\times \mathbb {R}^{d})\),

$$\begin{aligned} \langle \phi _{1},\mu _{1}^{s}\rangle -\langle \phi _{0},\mu _{0}^{s}\rangle -\int _{0}^{1}\left( \langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle +\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt=0. \end{aligned}$$

Now consider, more generally, the case where \(\tilde{\phi }_{t}(x)\in BL([0,1]\times \mathbb {R}^{d})\). Reasoning as in [19, Remark 3.3], we approximate \(\tilde{\phi }_{t}(x)\) by functions \(\phi _{t}\in C_{c}^{\infty }([0,1]\times \mathbb {R}^{d})\) which are uniformly bounded in \(C^{1}([0,1]\times \mathbb {R}^{d})\) norm, and use the fact that

$$\begin{aligned} |\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle |\le \left( \sup _{t\in [0,1]}\Vert \phi _{t}\Vert _{C^{1}(\mathbb {R}^{d})}\right) \iint (1\wedge |x-y|)\eta _{\varepsilon }(x,y)d|\textbf{j}_{t}^{s}|(x,y)\in L^{1}([0,1]) \end{aligned}$$

to pass to the limit in the nonlocal continuity equation, and deduce that more generally,

$$\begin{aligned}{} & {} \forall \phi _{t}(x)\in BL([0,1]\times \mathbb {R}^{d})\nonumber \\{} & {} \langle \phi _{1},\mu _{1}^{s}\rangle -\langle \phi _{0},\mu _{0}^{s}\rangle -\int _{0}^{1}\left( \langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle +\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt=0. \end{aligned}$$
(5.1)

Therefore, for any two s-smoothed probability measures \(\bar{\mu }_{0}^{s}\) and \(\bar{\mu }_{1}^{s}\) with \(\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})<\infty \), we have that

$$\begin{aligned}{} & {} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\\{} & {} =\inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}:\mu _{0}=\bar{\mu }_{0},\mu _{1}=\bar{\mu }_{1}}\left\{ \int _{0}^{1}\frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})dt:(\mu _{t}^{s},\textbf{j}_{t}^{s})_{t\in [0,1]}\text { satisfies }(5.1)\right\} . \end{aligned}$$

Introducing a Lagrange multiplier for the constraint “\((\mu _{t}^{s},\textbf{j}_{t}^{s})_{t\in [0,1]}\text { satisfies }(5.1)\)” we see that

$$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})= & {} \inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}:\mu _{0}=\bar{\mu }_{0},\mu _{1}=\bar{\mu }_{1}}\sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\bigg \{\int _{0}^{1}\frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})dt\\{} & {} +\langle \phi _{1},\mu _{1}^{s}\rangle -\langle \phi _{0},\mu _{0}^{s}\rangle -\int _{0}^{1}\left( \langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle +\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt\bigg \}. \end{aligned}$$

(Note that the inner supremum takes the value \(+\infty \) unless (5.1) holds.)

Using the general fact that \(\sup \inf \le \inf \sup \), we see that

$$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\ge & {} \sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}:\mu _{0}=\bar{\mu }_{0},\mu _{1}=\bar{\mu }_{1}}\bigg \{\langle \phi _{1},\mu _{1}^{s}\rangle -\langle \phi _{0},\mu _{0}^{s}\rangle \\{} & {} +\int _{0}^{1}\left( \frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})-\langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle -\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt\bigg \} \end{aligned}$$

which in turn implies that (now letting the infimum quantify over a larger set, without fixed endpoints \(\bar{\mu }_{0}\) and \(\bar{\mu }_{1}\))

$$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\ge & {} \sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\bigg \{\langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle \\{} & {} +\inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}}\int _{0}^{1}\left( \frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})-\langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle -\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt\bigg \}. \end{aligned}$$

Observe that due to the 1-homogeneity in \((\mu _{t}^{s},\textbf{j}_{t}^{s})_{t\in [0,1]}\) of both the total action and nonlocal continuity equation, the inner infimum evaluates to \(-\infty \) unless \(\phi _{t}\) is chosen so that, for all \((\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\),

$$\begin{aligned} \int _{0}^{1}\left( \frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})-\langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle -\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt\ge 0 \end{aligned}$$

since otherwise we can just replace \((\mu _{t}^{s},\textbf{j}_{t}^{s})\) with \((\lambda \mu _{t}^{s},\lambda \textbf{j}_{t}^{s})\) and then send \(\lambda \rightarrow \infty \).

At the same time, since \(\mu _{t}^{s}\) has full support, and \(\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})<\infty \) by assumption, we have by [19, Lemma 2.3] that \(\textbf{j}_{t}^{s}\ll dx\otimes dy\). So, we can compute that

$$\begin{aligned} \frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})-\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle= & {} \frac{1}{4}\iint \frac{\left( \frac{d\textbf{j}_{t}^{s}}{dxdy}(x,y)-\bar{\nabla }\phi _{t}(x,y)\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \right) ^{2}}{\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) }\\{} & {} \eta _{\varepsilon }(x,y)dxdy\\{} & {} -\frac{1}{4}\iint (\bar{\nabla }\phi _{t}(x,y))^{2}\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \eta _{\varepsilon }(x,y)dxdy. \end{aligned}$$

Therefore, we conclude that

$$\begin{aligned}{} & {} \int _{0}^{1}\left( \int \partial _{t}\phi _{t}d\mu _{t}^{s}+\frac{1}{4}\iint (\bar{\nabla }\phi _{t}(x,y))^{2}\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \eta _{\varepsilon }(x,y)dxdy\right) dt\nonumber \\{} & {} \le \int _{0}^{1} \frac{1}{4}\iint \frac{\left( \frac{d\textbf{j}_{t}^{s}}{dxdy}(x,y)-\bar{\nabla }\phi _{t}(x,y)\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \right) ^{2}}{\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) }\eta _{\varepsilon }(x,y)dxdy\,dt. \nonumber \\ \end{aligned}$$
(5.2)
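The step leading to (5.2) rests on a pointwise completing-the-square identity. As a sanity check (not part of the proof), the identity can be verified exactly with rational arithmetic; here \(j\) stands for the flux density \(\frac{d\textbf{j}_{t}^{s}}{dxdy}(x,y)\), \(g\) for \(\bar{\nabla }\phi _{t}(x,y)\), and \(th\) for the mobility value \(\theta (\cdot ,\cdot )\):

```python
from fractions import Fraction as F
import random

# pointwise identity behind (5.2):
#   (1/4) j^2/theta - (1/2) g*j  ==  (1/4) (j - g*theta)^2/theta - (1/4) g^2*theta
def lhs(j, g, th):
    return j**2 / (4 * th) - g * j / 2

def rhs(j, g, th):
    return (j - g * th)**2 / (4 * th) - g**2 * th / 4

random.seed(0)
for _ in range(100):
    j, g, th = (F(random.randint(1, 50), random.randint(1, 50)) for _ in range(3))
    assert lhs(j, g, th) == rhs(j, g, th)
```

Exact `Fraction` arithmetic avoids any floating-point ambiguity in checking the rational identity.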

Again, this condition holds provided that the inner infimum is \(\ge 0\) (as opposed to \(-\infty \)). Of course, if it is the case that the inner infimum is nonnegative (for a given \(\phi _{t}(x)\in BL([0,1]\times \mathbb {R}^{d})\)), we certainly have that

$$\begin{aligned}{} & {} \langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle +\inf _{(\mu _{t},\textbf{j}_{t})_{t\in [0,1]}}\int _{0}^{1}\left( \frac{1}{2}\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})-\langle \partial _{t}\phi _{t},\mu _{t}^{s}\rangle \right. \\{} & {} \quad \left. -\frac{1}{2}\langle \langle \bar{\nabla }\phi _{t},\textbf{j}_{t}^{s}\rangle \rangle \right) dt\ge \langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle . \end{aligned}$$

Therefore, we deduce the duality relation

$$\begin{aligned} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\ge \sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\{\langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle :\forall (\mu _{t},\textbf{j}_{t})_{t\in [0,1]}\text { (5.2) holds}\}. \end{aligned}$$

In turn, since in general,

$$\begin{aligned} \int _{0}^{1} \frac{1}{4}\iint \frac{\left( \frac{d\textbf{j}_{t}^{s}}{dxdy}(x,y)-\bar{\nabla }\phi _{t}(x,y)\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \right) ^{2}}{\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) }\eta _{\varepsilon }(x,y)dxdy\, dt\ge 0, \end{aligned}$$

this implies that

$$\begin{aligned}{} & {} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\ge \sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\bigg \{\langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle :\\{} & {} \forall (\mu _{t},\textbf{j}_{t})_{t\in [0,1]},\underbrace{\int _{0}^{1}\left( \int \partial _{t}\phi _{t}d\mu _{t}^{s}+\frac{1}{4}\iint (\bar{\nabla }\phi _{t}(x,y))^{2}\theta \left( \frac{d\mu _{t}^{s}}{dx}(x),\frac{d\mu _{t}^{s}}{dy}(y)\right) \eta _{\varepsilon }(x,y)dxdy\right) dt}_{(\dagger )}\le 0\bigg \}. \end{aligned}$$

However, the quantity \((\dagger )\) is independent of \(\textbf{j}_{t}\). Therefore, the quantification reduces to one over all weak*-continuous “curves” \((\mu _{t})_{t\in [0,1]}\) in \(\mathcal {M}^{+}(\mathbb {R}^{d})\) such that there exists some \((\textbf{j}_{t})_{t\in [0,1]}\) for which \(\mathcal {A}(\mu _{t}^{s},\textbf{j}_{t}^{s})<\infty \) for almost all \(t\in [0,1]\); but since we no longer require satisfaction of the nonlocal continuity equation, we can always just take \(\textbf{j}_{t}=0\). From this, by restricting the quantification only to “curves” \((\mu _{t})_{t\in [0,1]}\) which are constant in time and belong to \(\mathcal {P}(\mathbb {R}^{d})\), we deduce that

$$\begin{aligned}{} & {} \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\bar{\mu }_{0}^{s},\bar{\mu }_{1}^{s})\ge \sup _{\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})}\bigg \{\langle \phi _{1},\bar{\mu }_{1}^{s}\rangle -\langle \phi _{0},\bar{\mu }_{0}^{s}\rangle :\forall \mu \in \mathcal {P}(\mathbb {R}^{d}),\\{} & {} \int \partial _{t}\phi _{t}d\mu ^{s}+\frac{1}{4}\iint (\bar{\nabla }\phi _{t}(x,y))^{2}\theta \left( \frac{d\mu ^{s}}{dx}(x),\frac{d\mu ^{s}}{dy}(y)\right) \eta _{\varepsilon }(x,y)dxdy\le 0\quad t\text {-a.s.}\bigg \} \end{aligned}$$

as desired. \(\square \)

Proposition 5.11

Let \(\varepsilon \in (0,1]\) and \(s\ge \varepsilon \). Assume that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 (i–iv) and 2.2 respectively. Assume also that \(M_{5}(\eta )<\infty \). Suppose that \((\phi _{t})_{t\in [0,1]}\in BL([0,1]\times \mathbb {R}^{d})\) is a Hamilton–Jacobi subsolution in the sense that

$$\begin{aligned} \partial _{t}\phi _{t}(x)+\frac{1}{2}|\nabla \phi _{t}(x)|^{2}\le 0\text { a.e.} \end{aligned}$$

Then,

$$\begin{aligned}{} & {} \left( \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}K_{s}*\phi _{t}-\frac{CA^{2}}{\varepsilon s}t\right) _{t\in [0,1]};\\ C= & {} \frac{d^{2}}{M_{2}(\eta )^{2}}\left[ \frac{3}{8}M_{3}(\eta )+\sqrt{\left( \frac{M_{2}(\eta )}{d}+\frac{3}{2}M_{3}(\eta )\right) \left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\right) }+\frac{1}{4}\left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\right) \right] \\ A= & {} \sup _{t\in [0,1]}\Vert \nabla \phi _{t}\Vert _{L^{\infty }(\mathbb {R}^{d})} \end{aligned}$$

is an s-smooth nonlocal Hamilton–Jacobi subsolution, belonging to \(BL([0,1]\times \mathbb {R}^{d})\).

Proof

Let \(\phi _{t}\in BL([0,1]\times \mathbb {R}^{d})\) satisfy \(\partial _{t}\phi _{t}(x)+\frac{1}{2}|\nabla \phi _{t}(x)|^{2}\le 0\) almost everywhere. Then \(\phi _{t}^{s}(x):=(K_{s}*\phi _{t})(x)\) also satisfies

$$\begin{aligned} \partial _{t}\phi _{t}^{s}(x)+\frac{1}{2}|\nabla \phi _{t}^{s}(x)|^{2}\le 0\quad (t,x)\text {-a.e.} \end{aligned}$$

by Lemma B.7. Additionally, we observe that \(\phi _{t}^{s}\in BL([0,1]\times \mathbb {R}^{d})\), thanks to Lemma B.10.

We claim that some slight modification of

$$\begin{aligned} \tilde{\phi }_{t}^{s,\varepsilon }:=\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\phi _{t}^{s}, \end{aligned}$$

more precisely

$$\begin{aligned} \check{\phi }_{t}^{s,\varepsilon }:=\tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t \end{aligned}$$

where \(C>0\) and \(A>0\) are constants to be determined later, is a subsolution to the s-smoothed nonlocal HJ equation.

First, observe that since \(\partial _{t}\phi _{t}^{s}(x)+\frac{1}{2}|\nabla \phi _{t}^{s}(x)|^{2}\le 0\), it follows that

$$\begin{aligned} \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\partial _{t}\phi _{t}^{s}(x)+\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\frac{1}{2}|\nabla \phi _{t}^{s}(x)|^{2}\le 0 \end{aligned}$$

and so

$$\begin{aligned} \partial _{t}\tilde{\phi }_{t}^{s,\varepsilon }+\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}|\nabla \tilde{\phi }_{t}^{s,\varepsilon }(x)|^{2}\le 0. \end{aligned}$$
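This rescaling step is pure algebra: multiplying a subsolution of \(\partial _{t}\phi +\frac{1}{2}|\nabla \phi |^{2}\le 0\) by \(\lambda =\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\) yields a subsolution with gradient coefficient \(\frac{1}{2\lambda }=\frac{\varepsilon ^{2}M_{2}(\eta )}{4d}\). A quick exact check of the bookkeeping, with rational sample values standing in for the (here unconstrained) parameters:

```python
from fractions import Fraction as F
import random

# if tilde = lam * phi with lam = 2d/(eps^2 M2), then pointwise
#   lam*dphi + (1/(2*lam))*(lam*g)^2 = lam*(dphi + g^2/2),
# and the new coefficient is 1/(2*lam) = eps^2 * M2 / (4*d)
random.seed(2)
for _ in range(50):
    d, eps, M2 = (F(random.randint(1, 9)) for _ in range(3))
    lam = 2 * d / (eps**2 * M2)
    assert F(1, 2) / lam == eps**2 * M2 / (4 * d)
    dphi, grad = F(-random.randint(1, 9)), F(random.randint(1, 9))
    assert lam * dphi + (F(1, 2) / lam) * (lam * grad)**2 == lam * (dphi + grad**2 / 2)
```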

Second, note that replacing \(\tilde{\phi }_{t}^{s,\varepsilon }\) with \(\check{\phi }_{t}^{s,\varepsilon }:=\tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t\) leaves the gradient unchanged (that is, \(\nabla \tilde{\phi }_{t}^{s,\varepsilon }(x)=\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\)), whereas \(\partial _{t}(\tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t)=\partial _{t}\tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}\). In particular, given that

$$\begin{aligned} \partial _{t}\tilde{\phi }_{t}^{s,\varepsilon }+\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}|\nabla \tilde{\phi }_{t}^{s,\varepsilon }(x)|^{2}\le 0 \end{aligned}$$

we have that

$$\begin{aligned} \partial _{t}\left( \tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t\right) +\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}|\nabla \left( \tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t\right) (x)|^{2}\le -\frac{CA^{2}}{\varepsilon s} \end{aligned}$$

so we know that \(\check{\phi }_{t}^{s,\varepsilon }\) is also a (local) HJ subsolution. At the same time, it is clear that \(\check{\phi }_{t}^{s,\varepsilon }\in BL([0,1]\times \mathbb {R}^{d})\), from the fact that \(\phi _{t}^{s}\in BL([0,1]\times \mathbb {R}^{d})\).

Given some arbitrary \(\mu \in \mathcal {P}(\mathbb {R}^{d})\), we denote \(\rho :=K_{s}*\mu \). Note that \(\rho \) is a Lebesgue density. Then, we see that

$$\begin{aligned}{} & {} \int \partial _{t}\check{\phi }_{t}^{s,\varepsilon }(x)d\rho (x)+\frac{1}{4}\int \int (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\theta (\rho (x),\rho (y))\eta _{\varepsilon }(x,y)dxdy\\{} & {} \begin{aligned}&\le -\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}d\rho (x)-\frac{CA^{2}}{\varepsilon s}\\&+\frac{1}{4}\iint (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\theta (\rho (x),\rho (y))\eta _{\varepsilon }(x,y)dxdy\\&\le -\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}d\rho (x)-\frac{CA^{2}}{\varepsilon s} \\&+\frac{1}{4}\iint (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy \end{aligned} \end{aligned}$$

where we have used the fact that \(K=c_{K}e^{-|x|}\), which implies (thanks to Lemma 2.24) that

$$\begin{aligned} \theta (\rho (x),\rho (y))=\rho (x)\theta \left( 1,\frac{\rho (y)}{\rho (x)}\right) \le \rho (x)\theta \left( 1,1+\frac{3}{s}|y-x|\right) \le \rho (x)\left( 1+\frac{3}{2s}|y-x|\right) . \end{aligned}$$
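Assumption 2.2 is not restated in this excerpt, but the final inequality above is the statement that \(\theta (1,1+a)\le 1+\frac{a}{2}\), i.e. that \(\theta \) is dominated by the arithmetic mean. As an illustration only, for the logarithmic mean (one standard admissible choice of \(\theta \)) this can be sanity-checked numerically:

```python
import numpy as np

# logarithmic mean of 1 and 1+a equals a/log(1+a); check it is bounded
# by the arithmetic mean 1 + a/2 over a wide range of a > 0
a = np.linspace(1e-3, 50.0, 200_000)
theta = a / np.log1p(a)
assert np.all(theta <= 1.0 + a / 2.0)
```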

Next, we replace \((\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\) with an expression involving \(|\nabla \check{\phi }_{t}^{s,\varepsilon }|^{2}\), using a Taylor approximation. Note that such a Taylor series approximation requires sufficient regularity of \(\check{\phi }_{t}^{s,\varepsilon }\) (which is why we are using the smooth potential \(\check{\phi }_{t}^{s,\varepsilon }\) as our candidate s-smoothed nonlocal HJ subsolution).

To wit,

$$\begin{aligned}{} & {} |(\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))|-|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\le |(\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))-\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\\{} & {} \le \frac{1}{2}\Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }|x-y|^{2} \end{aligned}$$

and so

$$\begin{aligned}{} & {} (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\le \frac{1}{4}\Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }^{2}|x-y|^{4}\\{} & {} \quad +\Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }|x-y|^{2}|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|+|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|^{2}. \end{aligned}$$
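This squared Taylor bound is elementary; as a quick numerical sanity check, one can test it for a smooth function (the choice \(\phi =\sin \) on \(\mathbb {R}\), for which \(\Vert D^{2}\phi \Vert _{\infty }=1\), is purely illustrative):

```python
import numpy as np

# check (phi(y) - phi(x))^2 <= (1/4)||D^2 phi||^2 |x-y|^4
#        + ||D^2 phi|| |x-y|^2 |phi'(x)(x-y)| + |phi'(x)(x-y)|^2
# for phi = sin on R, where ||D^2 phi||_inf = 1
rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, 10_000)
y = rng.uniform(-5.0, 5.0, 10_000)
lhs = (np.sin(y) - np.sin(x))**2
cross = np.abs(np.cos(x) * (x - y))           # |grad phi(x) . (x - y)|
rhs = 0.25 * (x - y)**4 + (x - y)**2 * cross + cross**2
assert np.all(lhs <= rhs + 1e-12)
```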

And, since \(|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\le |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)||x-y|\), it follows that

$$\begin{aligned}{} & {} \frac{1}{4}\iint (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\{} & {} \begin{aligned}&\le \frac{1}{4}\iint |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|^{2}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&\quad +\frac{1}{4}\iint \Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }|x-y|^{2}|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&\quad +\frac{1}{16}\iint \Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }^{2}|x-y|^{4}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&:=I_{\varepsilon }+II_{\varepsilon }+III_{\varepsilon }. \end{aligned} \end{aligned}$$

Since (as we will explicitly show below) the latter two terms are higher order in \(\varepsilon \), we initially focus our attention on the first term \(I_{\varepsilon }\).

In what follows, recall our notation \(M_{p}(\eta ):=\int |x-y|^{p}\eta (|x-y|)dy\); note that this does not depend on the choice of \(x\in \mathbb {R}^{d}\). Additionally, observe that \(M_{p}(\eta _{\varepsilon })=\varepsilon ^{p}M_{p}(\eta ).\)
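The scaling relation \(M_{p}(\eta _{\varepsilon })=\varepsilon ^{p}M_{p}(\eta )\) can be sanity-checked numerically. The sketch below assumes the rescaling convention \(\eta _{\varepsilon }(x,y)=\varepsilon ^{-d}\eta (|x-y|/\varepsilon )\) (not restated in this excerpt) and uses the illustrative kernel \(\eta =\mathbb {1}_{[0,1]}\) in \(d=1\), for which \(M_{p}(\eta )=\frac{2}{p+1}\):

```python
import numpy as np

# assumption: eta_eps(z) = eps^{-d} * eta(|z|/eps), with d = 1 and
# eta = indicator of [0,1]; then M_p(eta_eps) should equal eps^p * M_p(eta)
def moment(p, eps, n=400_001):
    z = np.linspace(-2.0, 2.0, n)
    w = np.where(np.abs(z) <= eps, 1.0 / eps, 0.0)   # eta_eps on the grid
    return float(np.sum(np.abs(z)**p * w) * (z[1] - z[0]))

eps = 0.5
for p in (2, 3, 4, 5):
    assert abs(moment(p, eps) - eps**p * moment(p, 1.0)) < 1e-3
```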

From [29, Corollary 2.16] (together with Lemma B.2), we know that

$$\begin{aligned} \frac{1}{4}\iint |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|^{2}\rho (x)\eta _{\varepsilon }(x,y)dxdy=\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx. \end{aligned}$$
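The identity from [29, Corollary 2.16] is the standard isotropy computation for a radial kernel: \(\int (v\cdot z)^{2}\eta (|z|)dz=\frac{|v|^{2}}{d}\int |z|^{2}\eta (|z|)dz\). As an illustration (the kernel below is the indicator of the unit disk in \(d=2\), not the \(\eta \) of the paper), it can be checked on a grid:

```python
import numpy as np

# isotropy identity for a radial kernel in d = 2, with eta = 1_{unit disk}:
# integral of (v.z)^2 over the disk equals (|v|^2 / 2) * second moment
n = 2001
g = np.linspace(-1.0, 1.0, n)
h = g[1] - g[0]
X, Y = np.meshgrid(g, g)
inside = (X**2 + Y**2 <= 1.0).astype(float)
v = np.array([0.3, -1.7])
lhs = np.sum((v[0] * X + v[1] * Y)**2 * inside) * h**2
m2 = np.sum((X**2 + Y**2) * inside) * h**2
rhs = (v @ v) / 2.0 * m2                       # |v|^2 / d with d = 2
assert abs(lhs - rhs) < 2e-2 * rhs
```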

Similarly, compute that

$$\begin{aligned}&\iint |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|^{2}\left( \frac{3}{2s}|x-y|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&\le \frac{3}{2s}\iint |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}|x-y|^{3}\rho (x)\eta _{\varepsilon }(x,y)dxdy\\&=\frac{3}{2}\frac{\varepsilon ^{3}}{s}M_{3}(\eta )\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx. \end{aligned}$$

Consequently,

$$\begin{aligned} I_{\varepsilon }\le \left( \frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}+\frac{3}{8}\frac{\varepsilon ^{3}}{s}M_{3}(\eta )\right) \int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx \end{aligned}$$

and so

$$\begin{aligned}{} & {} \frac{1}{4}\iint (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\{} & {} \le \left( \frac{1}{4}\frac{M_{2}(\eta )}{d}\varepsilon ^{2}+\frac{3}{8}M_{3}(\eta )\frac{\varepsilon ^{3}}{s}\right) \int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx+II_{\varepsilon }+III_{\varepsilon }. \end{aligned}$$

Therefore, we find that

$$\begin{aligned}&\int \partial _{t}\check{\phi }_{t}^{s,\varepsilon }(x)d\rho (x)+\frac{1}{4}\int \int (\check{\phi }_{t}^{s,\varepsilon }(y)-\check{\phi }_{t}^{s,\varepsilon }(x))^{2}\theta (\rho (x),\rho (y))\eta _{\varepsilon }(x,y)dxdy \\&\le -\frac{1}{4}\varepsilon ^{2}\frac{M_{2}(\eta )}{d}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}d\rho (x)\\&-\frac{CA^{2}}{\varepsilon s}+\left( \frac{1}{4}\frac{M_{2}(\eta )}{d}\varepsilon ^{2}+\frac{3}{8}M_{3}(\eta )\frac{\varepsilon ^{3}}{s}\right) \int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx \\&+II_{\varepsilon }+III_{\varepsilon }\\&=-\frac{CA^{2}}{\varepsilon s}+\frac{3}{8}M_{3}(\eta )\frac{\varepsilon ^{3}}{s}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx+II_{\varepsilon }+III_{\varepsilon }. \end{aligned}$$

Hence, in order for \(\check{\phi }_{t}^{s,\varepsilon }\) to be an s-smoothed nonlocal HJ subsolution, it suffices to show that for \(s>\varepsilon \) and an appropriate choice of A and C,

$$\begin{aligned} -\frac{CA^{2}}{\varepsilon s}+\frac{3}{8}M_{3}(\eta )\frac{\varepsilon ^{3}}{s}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx+II_{\varepsilon }+III_{\varepsilon }\le 0. \end{aligned}$$

To see when this occurs, we first use the estimate \(\Vert \nabla \check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }\le \frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\Vert \nabla \phi _{t}\Vert _{\infty }\), which holds since \(\nabla \check{\phi }_{t}^{s,\varepsilon }=\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\nabla \phi _{t}^{s}\) and mollification does not increase the Lipschitz constant.

Consequently, for any probability density \(\rho (x)dx\), it holds that

$$\begin{aligned} \frac{3}{8}M_{3}(\eta )\frac{\varepsilon ^{3}}{s}\int |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)|^{2}\rho (x)dx&\le \frac{3}{2}\frac{d^{2}M_{3}(\eta )}{\varepsilon sM_{2}(\eta )^{2}}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}. \end{aligned}$$

We also note that this implies that

$$\begin{aligned} I_{\varepsilon }&\le \left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}+\frac{3}{2}\frac{d^{2}M_{3}(\eta )}{\varepsilon sM_{2}(\eta )^{2}}\right) \Vert \nabla \phi _{t}\Vert _{\infty }^{2}. \end{aligned}$$
(5.3)

Having analyzed the term \(I_{\varepsilon }\) from the Taylor series expansion, we turn to the remaining terms \(II_{\varepsilon }\) and \(III_{\varepsilon }\), both of which involve the quantity \(\Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }\). For the term \(III_{\varepsilon }\): using the fact that \(\Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }=\Vert D^{2}\tilde{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }=\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\Vert D^{2}\phi _{t}^{s}\Vert _{\infty }\), and also, by Lemma B.8, that \(\Vert D^{2}\phi _{t}^{s}\Vert _{\infty }^{2}\le s^{-2}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\), we see that

$$\begin{aligned} \begin{aligned} III_{\varepsilon }&=\frac{1}{16}\iint \Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }^{2}|x-y|^{4}\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&\le \frac{1}{4}\frac{d^{2}}{\left( \varepsilon ^{2}M_{2}(\eta )\right) ^{2}}\frac{1}{s^{2}}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\left( M_{4}(\eta )\varepsilon ^{4}+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon ^{5}}{s}\right) \int \rho (x)dx \\&=\frac{1}{4}\frac{d^{2}}{M_{2}(\eta )^{2}}\frac{1}{s^{2}}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon }{s}\right) . \end{aligned} \end{aligned}$$
(5.4)

It remains to consider the second “mixed” term in the Taylor series expansion. Using Hölder’s inequality, we observe that

$$\begin{aligned} II_{\varepsilon }&:=\frac{1}{4}\iint \Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }|x-y|^{2}|\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\left( 1+\frac{3}{2s}|y-x|\right) \rho (x)\eta _{\varepsilon }(x,y)dxdy\\&\le \left( \frac{1}{4}\iint \left( |\nabla \check{\phi }_{t}^{s,\varepsilon }(x)\cdot (x-y)|\left( 1+\frac{3}{2s}|y-x|\right) ^{1/2}\right) ^{2}\rho (x)\eta _{\varepsilon }(x,y)dxdy\right) ^{1/2}\\&\quad \times \left( \frac{1}{4}\iint \left( \Vert D^{2}\check{\phi }_{t}^{s,\varepsilon }\Vert _{\infty }|x-y|^{2}\left( 1+\frac{3}{2s}|y-x|\right) ^{1/2}\right) ^{2}\rho (x)\eta _{\varepsilon }(x,y)dxdy\right) ^{1/2}\\&=(I_{\varepsilon }\times 4\cdot III_{\varepsilon })^{1/2}. \end{aligned}$$

So, plugging in Eqs. 5.3 and 5.4, we see that

$$\begin{aligned} II_{\varepsilon }&\le \left( \left( \left( \frac{d}{\varepsilon ^{2}M_{2}(\eta )}+\frac{3}{2}\frac{d^{2}M_{3}(\eta )}{\varepsilon sM_{2}(\eta )^{2}}\right) \Vert \nabla \phi _{t}\Vert _{\infty }^{2}\right) \left( \frac{d^{2}}{M_{2}(\eta )^{2}}\frac{1}{s^{2}}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon }{s}\right) \right) \right) ^{1/2}\\&=\frac{d^{2}}{\varepsilon sM_{2}(\eta )^{2}}\sqrt{\left( \frac{M_{2}(\eta )}{d}+\frac{3}{2}M_{3}(\eta )\frac{\varepsilon }{s}\right) \left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon }{s}\right) }\Vert \nabla \phi _{t}\Vert _{\infty }^{2}. \end{aligned}$$
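The simplification in the last display can be verified exactly: both sides are nonnegative, so it suffices to compare their squares, which are rational functions of the parameters. The check below (a sanity check only, with the common factor \(\Vert \nabla \phi _{t}\Vert _{\infty }^{4}\) dropped from both sides) tests the identity at random rational points:

```python
from fractions import Fraction as F
import random

# squares of the two sides of the II_eps simplification, as functions of
# (d, eps, s, M2, M3, M4, M5); the common ||grad phi||^4 factor is omitted
def lhs_sq(d, e, s, M2, M3, M4, M5):
    return ((d / (e**2 * M2) + F(3, 2) * d**2 * M3 / (e * s * M2**2))
            * d**2 / M2**2 / s**2 * (M4 + F(3, 2) * M5 * e / s))

def rhs_sq(d, e, s, M2, M3, M4, M5):
    return ((d**2 / (e * s * M2**2))**2
            * (M2 / d + F(3, 2) * M3 * e / s) * (M4 + F(3, 2) * M5 * e / s))

random.seed(1)
for _ in range(50):
    args = [F(random.randint(1, 20), random.randint(1, 20)) for _ in range(7)]
    assert lhs_sq(*args) == rhs_sq(*args)
```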

In sum: we find that it suffices to pick C in such a way that

$$\begin{aligned}{} & {} -\frac{CA^{2}}{\varepsilon s}+\frac{3}{8}\frac{d^{2}M_{3}(\eta )}{\varepsilon sM_{2}(\eta )^{2}}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\\{} & {} +\frac{d^{2}}{\varepsilon sM_{2}(\eta )^{2}}\sqrt{\left( \frac{M_{2}(\eta )}{d}+\frac{3}{2}M_{3}(\eta )\frac{\varepsilon }{s}\right) \left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon }{s}\right) }\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\\{} & {} +\frac{1}{4}\frac{d^{2}}{M_{2}(\eta )^{2}}\frac{1}{s^{2}}\left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\frac{\varepsilon }{s}\right) \Vert \nabla \phi _{t}\Vert _{\infty }^{2}\le 0. \end{aligned}$$

Using the fact that \(s\ge \varepsilon \) and \(\varepsilon \in (0,1]\), we see that it suffices to pick

$$\begin{aligned} C=\frac{d^{2}}{M_{2}(\eta )^{2}}\left[ \frac{3}{8}M_{3}(\eta )+\sqrt{\left( \frac{M_{2}(\eta )}{d}+\frac{3}{2}M_{3}(\eta )\right) \left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\right) }+\frac{1}{4}\left( M_{4}(\eta )+\frac{3}{2}M_{5}(\eta )\right) \right] \end{aligned}$$

and \( A^{2}\ge \sup _{t}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}. \) \(\square \)

Corollary 5.12

Suppose that \(\eta \) and \(\theta \) satisfy Assumptions 2.1 and 2.2 respectively. Let \(\varepsilon \in (0,1]\). Let \(\mu _{0},\mu _{1}\in \mathcal {P}(\mathbb {R}^{d})\) and assume both \(\mu _{0}\) and \(\mu _{1}\) are supported inside some set with diameter at most R. Then, we have that

$$\begin{aligned} W_{2}^{2}(\mu _{0},\mu _{1})\le \varepsilon ^{2}\frac{M_{2}(\eta )}{2d}\mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\mu _{1})+\left( \frac{7}{4}dR^{2}+8dR\right) \sqrt{\varepsilon }. \end{aligned}$$

Proof

By Corollary 5.3, let \((\phi _{t})_{t\in [0,1]}\) be the optimal HJ subsolution for \((\mu _{0},\mu _{1})\); that is, \((\phi _{t})_{t\in [0,1]}\) attains the supremum in

$$\begin{aligned}{} & {} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})\\{} & {} \quad =\sup _{\phi \in BL([0,1]\times \mathbb {R}^{d})}\left\{ \int \phi _{1}d\mu _{1}-\int \phi _{0}d\mu _{0}:\partial _{t}\phi _{t}+\frac{1}{2}|\nabla \phi _{t}|^{2}=0\text { in viscosity sense}\right\} . \end{aligned}$$

Note that \((\phi _{t})_{t\in [0,1]}\) is also a (not necessarily optimal) Hamilton–Jacobi solution for \((\varvec{K_{2s}}*\mu _{0},\varvec{K_{2s}}*\mu _{1})\), and so

$$\begin{aligned} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})\ge \frac{1}{2}W_{2}^{2}(\varvec{K_{2s}}*\mu _{0},\varvec{K_{2s}}*\mu _{1})\ge \int \phi _{1}d(\varvec{K_{2s}}*\mu _{1})-\int \phi _{0}d(\varvec{K_{2s}}*\mu _{0}). \end{aligned}$$

Furthermore,

$$\begin{aligned} \int \phi _{1}d(\varvec{K_{2s}}*\mu _{1})-\int \phi _{0}d(\varvec{K_{2s}}*\mu _{0})&=\int \phi _{1}^{s}d(\varvec{K_{s}}*\mu _{1})-\int \phi _{0}^{s}d(\varvec{K_{s}}*\mu _{0})\\&=\int \phi _{1}^{2s}d\mu _{1}-\int \phi _{0}^{2s}d\mu _{0}. \end{aligned}$$

At the same time, \(\phi _{1}^{s}\rightarrow \phi _{1}\) and \(\phi _{0}^{s}\rightarrow \phi _{0}\) uniformly on compact sets as \(s\rightarrow 0\). Quantitatively, by Corollary 5.3 one has the estimate

$$\begin{aligned} |\phi -K_{s}*\phi |\le sM_{1}(K)\,\text {Lip}\,\phi \le sRM_{1}(K) \end{aligned}$$

where \(M_{1}(K):=\int _{\mathbb {R}^{d}}c_{K}|x|e^{-|x|}dx\). Hence
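As a one-dimensional illustration of this mollification estimate (with \(K(x)=\frac{1}{2}e^{-|x|}\), so \(c_{K}=\frac{1}{2}\) and \(M_{1}(K)=1=d\), and the 1-Lipschitz test function \(\phi (x)=|x|\); all choices here are for illustration only):

```python
import numpy as np

# check |phi - K_s * phi| <= s * M1(K) * Lip(phi) in d = 1, with
# K(x) = (1/2) e^{-|x|}  (so M1(K) = 1)  and  phi(x) = |x|  (Lip = 1)
s = 0.3
x = np.linspace(-20.0, 20.0, 40_001)
h = x[1] - x[0]
K_s = 0.5 / s * np.exp(-np.abs(x) / s)         # K_s(x) = (1/s) K(x/s)
phi = np.abs(x)
conv = np.convolve(phi, K_s, mode='same') * h  # grid approximation of K_s * phi
mid = slice(10_000, 30_001)                    # avoid truncation near the ends
assert np.max(np.abs(phi[mid] - conv[mid])) <= s + 1e-3
```

The worst case is at the kink of \(\phi \) at the origin, where the bound \(s\) is attained.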

$$\begin{aligned} \int |\phi _{i}-\phi _{i}^{2s}|d\mu _{i}\le 2sRM_{1}(K)\qquad i=0,1 \end{aligned}$$

and therefore

$$\begin{aligned} \int \phi _{1}^{2s}d\mu _{1}-\int \phi _{0}^{2s}d\mu _{0}\ge \int \phi _{1}d\mu _{1}-\int \phi _{0}d\mu _{0}-4sRM_{1}(K). \end{aligned}$$

Together, this implies that

$$\begin{aligned} \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})\ge \int \phi _{1}^{s}d(\varvec{K_{s}}*\mu _{1})-\int \phi _{0}^{s}d(\varvec{K_{s}}*\mu _{0})\ge \frac{1}{2}W_{2}^{2}(\mu _{0},\mu _{1})-4sRM_{1}(K). \end{aligned}$$

Since \(\phi _{t}\) is a viscosity solution of the Hamilton–Jacobi equation and is Lipschitz, it is also a Lebesgue (xt)-almost everywhere solution, by Rademacher’s theorem and [26, Theorem 10.1.1]. Therefore, we can apply Proposition 5.11, and deduce (defining \(\tilde{\phi }_{t}^{s,\varepsilon }:=\frac{2d}{\varepsilon ^{2}M_{2}(\eta )}\phi _{t}^{s}\), as in Proposition 5.11) that

$$\begin{aligned} \frac{d}{\varepsilon ^{2}M_{2}(\eta )}W_{2}^{2}(\mu _{0},\mu _{1})\ge & {} \int \tilde{\phi }_{1}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{1})-\int \tilde{\phi }_{0}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{0})\\\ge & {} \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left( W_{2}^{2}(\mu _{0},\mu _{1})-8sRM_{1}(K)\right) \end{aligned}$$

and likewise (defining \(\check{\phi }_{t}^{s,\varepsilon }:=\tilde{\phi }_{t}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}t\), as in Proposition 5.11)

$$\begin{aligned}&\int \check{\phi }_{1}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{1})-\int \check{\phi }_{0}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{0}) \\&=\int \left( \tilde{\phi }_{1}^{s,\varepsilon }-\frac{CA^{2}}{\varepsilon s}\right) d(\varvec{K_{s}}*\mu _{1})-\int \tilde{\phi }_{0}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{0})\\&\ge \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left( W_{2}^{2}(\mu _{0},\mu _{1})-8sRM_{1}(K)\right) -\frac{CA^{2}}{\varepsilon s}. \end{aligned}$$

Observe that \(\check{\phi }_{t}^{s,\varepsilon }\in BL([0,1]\times \mathbb {R}^{d})\). At the same time, we know, by Corollary 5.3, that \(\sup _{t}\Vert \nabla \phi _{t}\Vert _{\infty }^{2}\le R^{2}\); so if we put \(A=R\) and take C as in Proposition 5.11, then by Proposition 5.11, \(\check{\phi }_{t}^{s,\varepsilon }\) is an s-smooth nonlocal HJ subsolution. Therefore, we have that

$$\begin{aligned} \int \check{\phi }_{1}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{1})-\int \check{\phi }_{0}^{s,\varepsilon }d(\varvec{K_{s}}*\mu _{0})&\le \! \sup _{\phi _{t}(x)\in \text {HJ}_{\text {NL}}^{1,s}\cap BL([0,1]\times \mathbb {R}^{d})} \int \phi _{1}d(\varvec{K_{s}}*\mu _{1})\\&\quad -\int \phi _{0}d(\varvec{K_{s}}*\mu _{0})\\ \text {(Proposition~5.10)}&\le \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1}). \end{aligned}$$

This implies that

$$\begin{aligned} \frac{d}{\varepsilon ^{2}M_{2}(\eta )}\left( W_{2}^{2}(\mu _{0},\mu _{1})-8sRM_{1}(K)\right) -\frac{CR^{2}}{\varepsilon s}\le \frac{1}{2}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1}) \end{aligned}$$

and thus

$$\begin{aligned} W_{2}^{2}(\mu _{0},\mu _{1})-8sRM_{1}(K) \le \varepsilon ^{2}\frac{M_{2}(\eta )}{2d}\left( \mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})+\frac{CR^{2}}{\varepsilon s}\right) . \end{aligned}$$

Now, since \(\eta \) is supported on B(0, 1), all of the higher moments \(M_{3}(\eta )\), \(M_{4}(\eta )\), and \(M_{5}(\eta )\) are bounded above by \(M_{2}(\eta )\). We thus find that

$$\begin{aligned} C\le \frac{d^{2}}{M_{2}(\eta )}\left[ \frac{3}{8}+\sqrt{\left( \frac{1}{d}+\frac{3}{2}\right) \frac{5}{2}}+\frac{5}{16}\right] \end{aligned}$$

so that

$$\begin{aligned} W_{2}^{2}(\mu _{0},\mu _{1})\le & {} \varepsilon ^{2}\frac{M_{2}(\eta )}{2d}\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\\{} & {} +\left[ \frac{3}{8}+\sqrt{\left( \frac{1}{d}+\frac{3}{2}\right) \frac{5}{2}}+\frac{5}{16}\right] R^{2}\frac{d\varepsilon }{2s}+8sRM_{1}(K). \end{aligned}$$

Since \(\mathcal {W}_{\eta ,\varepsilon ,s}^{2}(\varvec{K_{s}}*\mu _{0},\varvec{K_{s}}*\mu _{1})\le \mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\mu _{1})\) by Lemma 5.9, we deduce that

$$\begin{aligned} W_{2}^{2}(\mu _{0},\mu _{1})\le&\varepsilon ^{2}\frac{M_{2}(\eta )}{2d}\mathcal {W}_{\eta ,\varepsilon }^{2}(\mu _{0},\mu _{1})+\frac{7}{4}dR^{2}\frac{\varepsilon }{s}+8sRM_{1}(K). \end{aligned}$$

Finally, we set \(s=\sqrt{\varepsilon }\), and use the fact that \(M_1(K)=d\), by Lemma B.6. \(\square \)