Abstract
We consider dynamics driven by interaction energies on graphs. We introduce graph analogues of the continuum nonlocal-interaction equation and interpret them as gradient flows with respect to a graph Wasserstein distance. The particular Wasserstein distance we consider arises from the graph analogue of the Benamou–Brenier formulation where the graph continuity equation uses an upwind interpolation to define the density along the edges. While this approach has both theoretical and computational advantages, the resulting distance is only a quasimetric. We investigate this quasimetric both on graphs and on more general structures where the set of “vertices” is an arbitrary positive measure. We call the resulting gradient flow of the nonlocal-interaction energy the nonlocal nonlocal-interaction equation (NL\(^2\)IE). We develop the existence theory for the solutions of the NL\(^2\)IE as curves of maximal slope with respect to the upwind Wasserstein quasimetric. Furthermore, we show that the solutions of the NL\(^2\)IE on graphs converge as the empirical measures of the set of vertices converge weakly, which establishes a valuable discrete-to-continuum convergence result.
Notation
We list here some symbols used throughout the paper.

\({\mathcal {M}}(A)\) is the set of Borel measures on \(A \subseteq {\mathbb {R}}^d\).

\({\mathcal {M}}^+(A)\) is the set of nonnegative Borel measures on A.

\({\mathcal {P}}(A)\subset {\mathcal {M}}^+(A)\) is the set of Borel probability measures on A.

\({\mathcal {P}}_{2}(A)\subseteq {\mathcal {P}}(A)\) stands for the elements of \({\mathcal {P}}(A)\) with finite second moment, that is,
$$\begin{aligned} M_2(\rho ) := {\int }_{A} |x|^2\,\text {d}\rho (x) < \infty . \end{aligned}$$ 
\(C_\mathrm {b}(A)\) is the set of bounded continuous functions from A to \({\mathbb {R}}\).

\(a_+:=\max \{0,a\}\) and \(a_-:=(-a)_+\) are the positive and negative parts of \(a \in {\mathbb {R}}\).

\(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) sets the underlying geometry of the state space; it is sometimes referred to as the base measure.

\(\rho \in {\mathcal {P}}({\mathbb {R}}^d)\) denotes a configuration; the natural setting is that \({{\,\mathrm{supp}\,}}\rho \subseteq {{\,\mathrm{supp}\,}}\mu \), although we allow for general supports as needed for stability results.

\(\eta :\{ (x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y \}\rightarrow [0,\infty )\) is the edge weight function.

\(G= \{ (x,y) \in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y ,\, \eta (x,y)>0\}\) are the edges.

\(\rho _1\otimes \rho _2 \in {\mathcal {M}}^+(G)\) is the product measure of \(\rho _1, \rho _2 \in {\mathcal {M}}^+({\mathbb {R}}^d)\) restricted to G.

\(\gamma _1 = \rho \otimes \mu \) and \(\gamma _2 = \mu \otimes \rho \).

\({\mathcal {V}}^{\mathrm {as}}(G)\) is the set of antisymmetric graph vector fields on G, defined in (1.6).

\({\overline{\nabla }}f\) is the nonlocal gradient of a function \(f :{\mathbb {R}}^d \rightarrow {\mathbb {R}}\), while \({\overline{\nabla }}\cdot {\varvec{j}}\) is the nonlocal divergence of a measure-valued flux \({\varvec{j}}\in {\mathcal {M}}(G)\); see Definition 2.7.

\({\mathcal {A}}\) stands for the action functional; see Definition 2.3.

\({\mathcal {T}}\) denotes the nonlocal transportation quasimetric; see (2.22).

\({{\,\mathrm{CE}\,}}_T(\rho _0,\rho _1)\) denotes the set of paths (solutions to the nonlocal continuity equation for densities (1.7) or measures (2.12)) on the time interval [0, T] connecting two measures \(\rho _0, \rho _1\in {\mathcal {P}}({\mathbb {R}}^d)\); we set \({{\,\mathrm{CE}\,}}:={{\,\mathrm{CE}\,}}_1\).
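Let us also specify the notions of narrow convergence and convolution. A sequence \((\rho ^n)_n\subset {\mathcal {M}}(A)\) is said to converge narrowly to \(\rho \in {\mathcal {M}}(A)\), in which case we write \(\rho ^n \rightharpoonup \rho \), provided that
$$\begin{aligned} \int _A \varphi \,\text {d}\rho ^n \rightarrow \int _A \varphi \,\text {d}\rho \quad \text {for all } \varphi \in C_\mathrm {b}(A). \end{aligned}$$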
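Given a function \(f :A \times A \rightarrow {\mathbb {R}}\) and \(\rho \in {\mathcal {M}}(A)\), we write \(f*\rho \) for the convolution of f and \(\rho \), that is,
$$\begin{aligned} (f*\rho )(x) := \int _A f(x,y)\,\text {d}\rho (y), \quad x \in A. \end{aligned}$$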
Introduction
We investigate dynamics driven by interaction energies on graphs, and their continuum limits. We interpret the relevant dynamics as gradient flows of the interaction energy with respect to a particular graph analogue of the Wasserstein distance. We prove the convergence of the dynamics on finite graphs to a continuum dynamics as the number of vertices goes to infinity. To do this we create a unified setup where the continuum and the discrete dynamics are both seen as particular instances of the gradient flow of the same energy, with respect to a nonlocal Wasserstein quasimetric whose state space is adapted to the configuration space considered.
Let us first introduce the problem on finite graphs where it is the simplest to describe.
Graph Setting with General Interactions
Consider an undirected graph with vertices \(X =\{x_1, \dots , x_n\}\) and edge weights \(w_{x,y} \geqq 0\), satisfying \(w_{x,y} = w_{y,x}\) for all \(x,y \in X\). Although technically not necessary, we impose the natural requirement that \(w_{x,x}=0\). The interaction potential is a symmetric function \(K :X \times X \rightarrow {\mathbb {R}}\), while the external potential is denoted \(P:X \rightarrow {\mathbb {R}}\). We consider a “mass” distribution \(\rho :X \rightarrow [0, \infty )\), and we require \(\sum _{x \in X} \rho _x =1\). The total energy \({\mathcal {E}}_X:{\mathcal {P}}(X)\rightarrow {\mathbb {R}}\) is a combination of the interaction energy \({\mathcal {E}}_I\) and the potential energy \({\mathcal {E}}_P\):
The gradient descent of \({\mathcal {E}}_X\) that we study is described by the following system of ODEs for the mass distribution:
The quantities \(v:X \times X \rightarrow {\mathbb {R}}\) and \(j:X \times X \rightarrow {\mathbb {R}}\) are defined on edges and model the graph analogues of velocity and flux. An evolution governed by such a system is illustrated in Fig. 1. The system (1.2)–(1.4) is the gradient flow of the energy \({\mathcal {E}}_X\) with respect to a new graph analogue of the Wasserstein metric. Wasserstein metrics on finite graphs were introduced independently by Chow et al. [14], Maas [36], and Mielke [37, 38]. All of these approaches rely on graph analogues of the continuity equation to describe the paths in the configuration space. On graphs, the mass is distributed over the vertices and is exchanged over the edges. Hence, the analogues of the vector field and the flux are defined over the edges. However, the flux should be the product of the velocity (an edge-based quantity) and the density (a vertex-based quantity). Thus, one has to interpolate the densities at vertices to define the density (and hence the flux) along the edges. The choice of interpolation is not unique and has important ramifications.
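To make the structure of (1.2)–(1.4) concrete, the following Python sketch evaluates the right-hand side of a graph interaction flow with upwind interpolation. Since the displayed system is not reproduced here, its precise form is an assumption: we take the velocity \(v_{x,y} = -(\varphi _y-\varphi _x)\) with \(\varphi _x = \sum _z K_{x,z}\rho _z + P_x\), the upwind flux \(j_{x,y} = \frac{1}{n}\big (\rho _x (v_{x,y})_+ - \rho _y (v_{x,y})_-\big )\) (the factor \(\frac{1}{n}\) follows Remark 1.2), and \(\frac{\text {d}}{\text {d}t}\rho _x = -\sum _y w_{x,y} j_{x,y}\). The function name `upwind_rhs`, the weights and the kernel in the usage example are hypothetical.

```python
import numpy as np


def upwind_rhs(rho, w, K, P):
    """Sketch of d(rho)/dt for a graph interaction flow with upwind flux.

    Assumed form (hypothetical normalization):
      phi[x]  = sum_z K[x, z] * rho[z] + P[x]       variation of the energy
      v[x, y] = -(phi[y] - phi[x])                  antisymmetric velocity
      j[x, y] = (rho[x] * v_+ - rho[y] * v_-) / n   upwind flux
      drho[x] = -sum_y w[x, y] * j[x, y]            graph continuity equation
    """
    n = rho.size
    phi = K @ rho + P
    v = phi[:, None] - phi[None, :]        # v[x, y] = phi[x] - phi[y]
    v_plus = np.maximum(v, 0.0)
    v_minus = np.maximum(-v, 0.0)
    # Density at the source of the flow: rho[x] if v > 0, rho[y] if v < 0.
    j = (rho[:, None] * v_plus - rho[None, :] * v_minus) / n
    drho = -(w * j).sum(axis=1)
    return drho, v, j


# Usage on a small hypothetical graph.
rng = np.random.default_rng(0)
n = 5
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
w = np.exp(-dist**2)        # hypothetical symmetric edge weights
np.fill_diagonal(w, 0.0)    # w_{x,x} = 0
K = dist                    # hypothetical symmetric kernel K(x, y) = |x - y|
P = np.zeros(n)
rho = np.full(n, 1.0 / n)
drho, v, j = upwind_rhs(rho, w, K, P)
print(np.allclose(v, -v.T), np.allclose(j, -j.T), abs(drho.sum()) < 1e-12)
```

Because \(w\) is symmetric and \(j\) is antisymmetric, the total mass \(\sum _x \rho _x\) is conserved, which the last check confirms numerically.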
The overall structure of our setup is derived from the one in [36], which we recall in Section 1.4, while the interpolation we use is related to the upwind interpolation used in [14] and is almost identical to the one in [13]. In [14] the authors considered only the direction of the flux due to the potential energy to determine which density to use on the edges; in our case the chosen density depends on the total velocity, and we furthermore include the interaction term, which itself depends on the configuration. In other words, we use an upwind interpolation based on the total velocity. In the context of graph Wasserstein distances, such an interpolation was first used by Chen et al. [13].
The “velocities” v we consider can be assumed to be antisymmetric: \(v_{x,y} = - v_{y,x}\) for all \(x,y \in X\). In the graph setting, which we normalize in order to consider the limit \(n \rightarrow \infty \), the continuity equation with upwind interpolation is obtained by combining (1.2) with the flux-velocity relation (1.3). Similarly to [36] and exactly as in [13], we define the graph Wasserstein distance by minimizing the action, which is the graph analogue of the kinetic energy:
As in [13, 14, 36, 38], the graph Wasserstein distance is defined by adapting the Benamou–Brenier formula:
where \({{\,\mathrm{CE}\,}}_X(\rho ^0,\rho ^1)\) is the set of all paths (i.e., solutions of (1.2)–(1.3)) connecting \(\rho ^0\) and \(\rho ^1\).
It is important to observe that, in our setting, \({\mathcal {T}}\) is not symmetric (that is, \({\mathcal {T}}(\rho ^0,\rho ^1)\) is in general different from \({\mathcal {T}}(\rho ^1,\rho ^0)\)). The reason is that, in general, \(A(\rho ,v) \ne A(\rho ,-v)\). Therefore the nonlocal Wasserstein distance arising from the upwind interpolation is only a quasimetric. The action \(A(\rho ,v)\) endows the tangent space with a Finsler structure instead of the usual Riemannian structure. Formally, the system (1.2)–(1.4) is the gradient flow of \({\mathcal {E}}_X\) with respect to this Finsler structure; we present a derivation of this fact in a more general setting in Section 3.1. The system is also the curve of steepest descent with respect to the quasimetric \({\mathcal {T}}\), which is the point of view we use to develop the rigorous theory in the general setting.
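The asymmetry \(A(\rho ,v) \ne A(\rho ,-v)\) can already be seen on a two-vertex graph. The sketch below assumes an upwind action proportional to \(\sum _{x,y} (v_{x,y})_+^2\, \rho _x\, w_{x,y}\), that is, the kinetic energy on each edge is weighted by the density at the source of the flow; the factor \(\frac{1}{2}\), the absence of the graph normalization, and the name `upwind_action` are assumptions made for illustration.

```python
import numpy as np


def upwind_action(rho, v, w):
    """Hypothetical upwind action, up to normalization:
    A(rho, v) = 1/2 * sum_{x,y} (v[x, y]_+)^2 * rho[x] * w[x, y].
    Only the density at the source of each positive velocity is charged.
    """
    v_plus = np.maximum(v, 0.0)
    return 0.5 * float(np.sum(v_plus**2 * rho[:, None] * w))


w = np.array([[0.0, 1.0], [1.0, 0.0]])
rho = np.array([0.8, 0.2])
v = np.array([[0.0, 1.0], [-1.0, 0.0]])   # antisymmetric: mass moves from vertex 0 to vertex 1

forward = upwind_action(rho, v, w)    # charges rho[0] = 0.8
backward = upwind_action(rho, -v, w)  # charges rho[1] = 0.2
print(forward, backward)
```

Reversing the velocity swaps which endpoint density is charged, so moving mass out of the heavier vertex is more expensive than the reverse motion; this is the source of the asymmetry of \({\mathcal {T}}\).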
Remark 1.1
The well-posedness of (1.2)–(1.4) is a straightforward consequence of the Picard existence theorem. Indeed, the simplex \(1 \geqq \rho _x \geqq 0\), \(\sum _{x \in X} \rho _x =1\) is an invariant region of the dynamics, and on it the vector field (1.4) is Lipschitz continuous in \(\rho _x\), \(x \in X\).
Remark 1.2
One could consider other interpolations instead of the upwind one. In particular, if we considered an interpolation of the form \(I(\rho _x, \rho _y)\), the only change in the gradient flow would be that the velocity-flux relation (1.3) would become \( j_{x,y} = \frac{1}{n} I(\rho _x, \rho _y) v_{x,y} \). This can have major implications for the resulting dynamics. In particular, for the logarithmic interpolation, \(I(r,s) = (r-s)/(\ln r - \ln s)\), or the geometric interpolation, \(I(r,s) = \sqrt{rs}\), the resulting dynamics would never expand the support of the solutions, so even for repulsive potentials the mass may not spread throughout the domain. On the other hand, the arithmetic interpolation, \(I(r,s) =(r+s)/2\), would not work directly since the solutions may become negative. In this case additional technical steps, like a Lagrange multiplier as in [39], are necessary to obtain the evolution of a nonnegative probability density. We use the more physically inspired upwind flux, which automatically ensures the positivity of the density.
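The difference between these interpolations already shows up on a single edge \((x,y)\) whose endpoint \(y\) is empty. The following sketch computes the edge density \(I(\rho _x,\rho _y)\) times the velocity for each choice; the helper names are hypothetical, and the logarithmic mean is extended by continuity, \(I(r,0)=0\).

```python
import math


def log_mean(r, s):
    """Logarithmic mean (r - s) / (ln r - ln s), extended by continuity."""
    if r == s:
        return r
    if r == 0.0 or s == 0.0:
        return 0.0          # limit as one argument tends to 0
    return (r - s) / (math.log(r) - math.log(s))


def geometric_mean(r, s):
    return math.sqrt(r * s)


def arithmetic_mean(r, s):
    return 0.5 * (r + s)


def upwind(r, s, v):
    """Density at the source: rho_x if mass flows x -> y (v > 0), else rho_y."""
    return r if v > 0 else s


rho_x, rho_y = 0.5, 0.0   # vertex y is empty

# Velocity +1 points from x into the empty vertex y.
into_empty = {name: f(rho_x, rho_y) * 1.0
              for name, f in [("log", log_mean),
                              ("geometric", geometric_mean),
                              ("arithmetic", arithmetic_mean)]}
into_empty["upwind"] = upwind(rho_x, rho_y, 1.0) * 1.0

# Velocity -1 points from the empty vertex y to x.
out_of_empty_arith = arithmetic_mean(rho_x, rho_y) * -1.0   # y would lose mass it does not have
out_of_empty_upwind = upwind(rho_x, rho_y, -1.0) * -1.0     # no flux out of the empty vertex
print(into_empty, out_of_empty_arith, out_of_empty_upwind)
```

With \(\rho _y = 0\), the logarithmic and geometric means carry no mass into \(y\), so the support cannot grow, whereas the arithmetic mean produces a nonzero flux out of the empty vertex, which would make its density negative. The upwind choice avoids both problems.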
Before we turn to the general setting we point out that the system (1.2)–(1.4) offers a new model of graph-based clustering, which is briefly discussed in Section 1.5.
General Setting for Vertices in Euclidean Space
Here we introduce the general framework for studies of interaction equations on families of graphs and their limits as the number of vertices n goes to \(\infty \). In particular, in the applications to machine learning, which we briefly discuss in Section 1.5, the graphs considered are random samples of some underlying measure in Euclidean space, and the edge weights, as well as the interaction energy, depend on the positions of the vertices. The vertices are points in \({\mathbb {R}}^d\). The edges are given in terms of a nonnegative symmetric weight function \(\eta :\{ (x,y) \in {\mathbb {R}}^d \times {\mathbb {R}}^d : x \ne y \} \rightarrow [0, \infty )\), which defines the set of edges as \(G=\{ (x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d : x\ne y , \,\eta (x,y)>0\}\). Compared with the discrete setting, the set of vertices is replaced by the more general notion of a measure on \({\mathbb {R}}^d\); the discrete graphs with vertices \(X = \{x_1, \dots , x_n\} \subset {\mathbb {R}}^d\) correspond to \(\mu \) being the empirical measure of the set of points, \(\mu = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}\). The distribution of mass over the vertices is described by the measure \(\rho \in {\mathcal {P}}({\mathbb {R}}^d)\), and in most applications we consider \({{\,\mathrm{supp}\,}}\rho \subseteq {{\,\mathrm{supp}\,}}\mu \). However, in order to prove general stability results (e.g., Theorem 3.14), we need to allow part of the support of \(\rho \) to lie initially outside the support of \(\mu \); we think of such mass as lying outside of the domain specified by \(\mu \). The mass starting outside of the support of \(\mu \) can only flow into the support of \(\mu \). Here we present the evolution assuming \(\rho \ll \mu \), while in Sections 2 and 3 we present the setup in full generality. Furthermore, we denote by \(\rho \) both the measure and its density with respect to \(\mu \).
The evolution of interest is the gradient descent of the energy \({\mathcal {E}}:{\mathcal {P}}({\mathbb {R}}^d)\rightarrow {\mathbb {R}}\) given by
where \(K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) is symmetric and \(P:{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\). This energy generalizes (1.1) in terms of the configurations \(\rho \) and specializes it in terms of the type of potentials K and P considered. In fact, from now on we omit the subscripts X referring to the vertices (e.g. in the energy) since our general setting allows for distribution of mass outside of the support of \(\mu \). The gradient flow we consider takes the form
The system (\({\text {NL}}^2 {\text {IE}}\)) consists first of a nonlocal continuity equation, where the divergence \({\overline{\nabla }}\cdot \) is encoded with the graph structure described through \(\mu \) and \(\eta \) (see Definition 2.7). Secondly, it involves a mapping from velocity to flux, which in our case is the upwind flux and encodes the geometry of the gradient structure. Finally, the third equation identifies the driving velocity as the nonlocal gradient of the variation of the energy (1.5). Overall, we obtain that (\({\text {NL}}^2 {\text {IE}}\)) is the gradient flow of the energy \({\mathcal {E}}\) with respect to a generalization of the graph Wasserstein metric we now introduce.
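The three components of (\({\text {NL}}^2 {\text {IE}}\)) can be assembled into a short Python sketch for a graph built on sample points, \(\mu = \frac{1}{n}\sum _i \delta _{x_i}\), with \(\rho \) stored as its density with respect to \(\mu \). Since Definition 2.7 is not reproduced here, we assume the nonlocal gradient \({\overline{\nabla }}f(x,y) = f(y)-f(x)\) and a nonlocal divergence that integrates in \(y\) against \(\eta (x,y)\,\text {d}\mu (y)\), consistent with the Markov-jump interpretation of (1.7) discussed below. The \(\varepsilon \)-graph weight \(\eta \), the kernel \(K(x,y)=|x-y|\), the choice \(P \equiv 0\), and the function name `nl2ie_rhs` are hypothetical.

```python
import numpy as np


def nl2ie_rhs(rho, pts, eps, P):
    """Sketch of d(rho)/dt for (NL^2IE) with mu = (1/n) sum_i delta_{x_i}.

    rho[i] is the density of the configuration with respect to mu at x_i.
    Hypothetical choices: eta(x, y) = 1 if 0 < |x - y| < eps (else 0), K(x, y) = |x - y|.
    """
    n = rho.size
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    eta = ((dist > 0) & (dist < eps)).astype(float)
    K = dist
    phi = K @ rho / n + P                # (K * rho)(x_i) + P(x_i), integrating against mu
    v = phi[:, None] - phi[None, :]      # v = -grad(phi): v(x, y) = phi(x) - phi(y)
    # Upwind flux: density at the source of the flow.
    j = rho[:, None] * np.maximum(v, 0.0) - rho[None, :] * np.maximum(-v, 0.0)
    return -(j * eta).sum(axis=1) / n    # -div j, integrating in y against mu


rng = np.random.default_rng(2)
n = 20
pts = rng.random((n, 2))
rho = np.ones(n)
rho[0] = 0.0              # an empty vertex
rho *= n / rho.sum()      # total mass (1/n) * sum(rho) = 1
drho = nl2ie_rhs(rho, pts, eps=0.5, P=np.zeros(n))
```

The checks one can run on `drho` illustrate two structural facts: the total mass \(\int \rho \,\text {d}\mu \) is conserved, and an empty vertex can only gain mass, since at such a vertex the upwind flux is purely incoming.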
Nonlocal Continuity Equation
Let us set
and call its elements nonlocal (antisymmetric) vector fields on G; for any pair \((x,y) \in G\) the value v(x, y) can be regarded as a jump rate from x to y. Let us fix a final time \(T>0\) throughout the paper and let a family \(\{v_t\}_{t\in [0,T]}\subset {\mathcal {V}}^{\mathrm {as}}(G)\) be given. In the case \(\rho _t \ll \mu \) for all \(t\in [0,T]\), it is possible to combine the first two equations in (\({\text {NL}}^2 {\text {IE}}\)) in order to arrive at the nonlocal continuity equation
For general curves \(\rho :[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})\), it is necessary to consider the weak form of (1.7), which is discussed in Section 2.3.
We remark that the general setup we develop allows the solution \(\rho \) to develop atoms and to continue to exist after the atoms have formed. Heuristic arguments and numerical experiments indicate that there are equations covered by our theory for which this is the case. For example, if \(\mu \) is the Lebesgue measure on \({\mathbb {R}}\), \(\rho _0\) the restriction of the Lebesgue measure to \([-0.5,0.5]\), \(K(x,y) = |x-y|\) and \(\eta (x,y)= 1/(x-y)^2\), then the solutions develop delta mass concentrations at 0 in finite time. Understanding for which K and \(\eta \) solutions develop finite-time singularities is an interesting open problem.
We note that, when defining the flux in (1.7), we take the density along an edge to be the density at its source, analogously to an upwind numerical scheme. While, as we show, this leads to a convenient framework for the dynamics, it has the drawback that the resulting distance, which we are about to define, is not symmetric and is thus only a quasimetric.
Upwind Nonlocal Transportation Metric
We use the nonlocal continuity equation (1.7) to define a nonlocal Wasserstein quasidistance in analogy to the Benamou–Brenier formulation [6] for the classical Kantorovich–Wasserstein distances [50]. That is, for two probability measures \(\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), let
where \({{\,\mathrm{CE}\,}}(\rho _0,\rho _1)\) is the set of weak solutions \(\rho \) to the nonlocal continuity equation (see Definition 2.14) on [0, 1] with \(\rho (0)=\rho _0\) and \(\rho (1)=\rho _1\). We note that the notion of the nonlocal Wasserstein distance for measures on \({\mathbb {R}}^d\) was introduced by Erbar [23], who used it to study the fractional heat equation. One difference is that the interpolation we consider is beyond the scope of [23]. Very recently [43] has extended the gradient flow viewpoint of the jump processes to generalized gradient structures driven by a broad class of internal energies.
Another difference is that here the measure \(\mu \) plays an important role in how the action is measured and allows one to incorporate seamlessly both the continuum case (e.g., \(\mu \) is the Lebesgue measure on \({\mathbb {R}}^d\)) and the graph case (\(\mu \) is the empirical measure of the set of vertices).
The notions above are rigorously developed in Section 2, where we list the precise assumption (W) on the edge weight function \(\eta \) and the joint assumptions (A1) and (A2) on \(\eta \) and the underlying measure \(\mu \). We then rigorously introduce the action (Definition 2.3), which is a nonlocal analogue of kinetic energy; we show its fundamental properties, in particular joint convexity (Lemma 2.12) and lower semicontinuity with respect to narrow convergence (Lemma 2.9). In Section 2.3 we rigorously introduce the nonlocal continuity equation in measure-valued flux form (2.12); we introduce the notion on all of \({\mathbb {R}}^d\), where \(\mu \) does not initially play a role. The measure \(\mu \) enters the framework by considering paths of finite action. Proposition 2.17 establishes an important compactness property of sequences of solutions. In Section 2.4 we turn our attention to the nonlocal Wasserstein quasimetric based on the upwind interpolation, which we introduce in Definition 2.18. The compactness of solutions of the nonlocal continuity equation and the lower semicontinuity of the action imply the existence of (directed) geodesics (Proposition 2.20). We do not characterize the geodesics, although this is an interesting problem. A possible approach in this direction is via duality using nonlocal analogues of the Hamilton–Jacobi equations, similarly to how this problem was recently treated in the discrete setting in [28, 30]. Following the work of Erbar [23], we show that the nonlocal Wasserstein quasimetric generates a topology on the set of probability measures which is stronger than the \(W_1\) topology (i.e., the topology of the Monge distance, or 1-Wasserstein metric). Analogously to [2], we show the equivalence between the paths of finite length with respect to the quasimetric and the solutions of the nonlocal continuity equation with finite action (Proposition 2.20). 
The set of probability measures endowed with the quasimetric \({\mathcal {T}}\) has the formal structure of a Finsler manifold, and parts of this structure can be described; in particular, in (2.27) we describe the tangent space at a given measure \(\rho \) using the fluxes. We note that using fluxes, instead of velocities, is necessary since, because of the upwinding, the relation between the velocities and the tangent vectors is not linear (Proposition 2.26), and in particular not symmetric. For this reason the resulting gradient structure is also different from the large class of nonlinear, but still symmetric, flux-velocity relations considered in [43]. We conclude Section 2 by showing that, given a measure \(\mu \), the finiteness of the action ensures that any path starting within the support of \(\mu \) remains within the support of \(\mu \) (Proposition 2.28).
Nonlocal NonlocalInteraction Equation
In Section 3 we develop the existence theory of the equation (\({\text {NL}}^2 {\text {IE}}\)) based on the interpretation as the gradient flow of \({\mathcal {E}}\) with respect to the quasimetric \({\mathcal {T}}\) defined in (1.8). We begin by listing the precise conditions (K1)–(K3) on the interaction kernel K. We note that these are less restrictive than the typical conditions for the well-posedness of the standard nonlocal-interaction equation in the Euclidean setting [2, 10].
Before we turn to the rigorous theory of weak solutions as curves of maximal slope on a quasimetric space, we discuss the gradient flow structure in a more geometric setting, namely the Finsler structure related to \({\mathcal {T}}\). Indeed, the action [formally given by the time integrand in (1.8), and rigorously defined by (2.4)] defines a positively homogeneous norm (namely a Minkowski norm) on the tangent space. The Hessian of the square of the norm endows the tangent space at each measure with the formal structure of a Riemannian manifold. We compute this Riemannian metric in “Appendix A” under an absolute-continuity assumption. Under this assumption, we show in Section 3.1 that (\({\text {NL}}^2 {\text {IE}}\)) is the gradient flow of \({\mathcal {E}}\) with respect to the Finsler structure. For simplicity, we consider \(P \equiv 0\), since the extension to \(P \not \equiv 0\) is straightforward, as explained in Remark 3.2.
In Section 3.2 we develop the rigorous gradient descent formulation based on curves of maximal slope in the space of probability measures endowed with the quasimetric \({\mathcal {T}}\). The theory of gradient flows in the spaces of probability measures endowed with the standard Wasserstein metric was developed in [2]. Here we extend it to the setting of quasimetric spaces, endowed with the nonlocal Wasserstein distance. This requires several delicate arguments. We start by introducing the notions of one-sided strong upper gradient (Definition 3.12) and curves of maximal slope (Definition 3.8). We define the local slope \({\mathcal {D}}\) in (3.19) by using a heuristically derived gradient of the energy \({\mathcal {E}}\), and show, using a chain rule established in Proposition 3.10, that \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\) with respect to \({\mathcal {T}}\). One of our main results is Theorem 3.9, which establishes the equivalence between curves of maximal slope and weak solutions of (\({\text {NL}}^2 {\text {IE}}\)). In Section 3.4 we prove several important results. Namely, Theorem 3.14 establishes that the De Giorgi functional \({\mathcal {G}}_T\) is stable under variations of the base measure \(\mu \) and of the solutions. A consequence of this result is the convergence of solutions of (\({\text {NL}}^2 {\text {IE}}\)) on graphs defined on random samples of a measure to solutions of (\({\text {NL}}^2 {\text {IE}}\)) corresponding to the full underlying measure (Remark 3.17). The proof of Theorem 3.14 relies on the lower semicontinuity of the local slope (Lemma 3.12) and the lower semicontinuity of the De Giorgi functional (3.13). Another important consequence is the existence of weak solutions of (\({\text {NL}}^2 {\text {IE}}\)), which is proved in Theorem 3.15.
Remark 1.3
(Asymptotics) Describing the steady states and determining the long-time asymptotics of (\({\text {NL}}^2 {\text {IE}}\)) are natural and important problems. Both questions have been extensively studied for the nonlocal-interaction equations (NLI), which are Wasserstein gradient flows of (1.5) with \(P \equiv 0\). For attractive interaction potentials it was shown that the solutions converge to a delta mass [7], while for more general repulsive–attractive potentials very rich families of steady states were discovered [3, 35]. We remark that the dynamics of (\({\text {NL}}^2 {\text {IE}}\)) can be significantly different. Namely, as the example of Remark 3.18 shows, the solutions for attractive potentials do not necessarily converge to a point.
A further question closely related to asymptotics is the contractivity of solutions of (\({\text {NL}}^2 {\text {IE}}\)). For Riemannian gradient flows the contractivity of the flow follows from the geodesic convexity of the energy. In particular, if \(K(x,y)=k(x-y)\), where k is symmetric and convex, the NLI flow is contractive in the Wasserstein metric [2, 11]. Determining the geodesic convexity of energies in the setting of the nonlocal Wasserstein metrics is an intriguing question. Thus far, the only result in the general (not purely discrete) setting is the geodesic convexity of the entropy [23]. However, for Finslerian gradient flows we caution that geodesic convexity does not imply contractivity, as [42] shows. Instead, a new property, skew-convexity [42, Definition 3.1], needs to be investigated.
Finally, we note that the asymptotics of gradient flows with respect to (nonlocal) Wasserstein metrics in the discrete setting have recently been investigated in [15, 26], where the equations also include diffusion (i.e., the energy includes an entropic contribution). These papers use the convexity of the total energy in the discrete setting to establish exponential convergence of the flow towards the unique minimizer. Establishing under which conditions (on the graph construction, etc.) these estimates persist in the discrete-to-continuum limit as the number of vertices increases is an interesting open problem. We also remark that, while these results do not carry over to our setting, analyzing the asymptotics of (\({\text {NL}}^2 {\text {IE}}\)) in the purely discrete setting is an intriguing and potentially approachable question.
Relation to the Numerical FiniteVolume Upwind Scheme
Equation (1.7) can be interpreted in several ways. For example, it can be understood as the master equation of a continuous-time and continuous-space Markov jump process on the graphon \(({{\mathbb {R}}^{d}}, \eta )\), that is, a continuous graph with vertices \({{\mathbb {R}}^{d}}\), and symmetric weight \(\eta (x,y)\) for \((x,y)\in \{(x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d: x\ne y\}\). The stochastic interpretation is that a particle at position \(x\in {\mathbb {R}}^d\) jumps according to the measure \(v(x,y)_+\eta (x,y)\text {d}\mu (y)\) to \(y\in {\mathbb {R}}^d\). In this way it gives rise to a Markov jump process related to the numerical upwind scheme.
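The jump interpretation can be simulated directly when \(\mu = \frac{1}{n}\sum _j \delta _{x_j}\) and the velocity field \(v\) is held fixed. The sketch below performs one step of the standard Gillespie (stochastic simulation) algorithm: a particle at \(x_i\) jumps to \(x_j\) at rate \(v(x_i,x_j)_+\,\eta (x_i,x_j)/n\). The potential used to build an antisymmetric \(v\), the complete-graph weights, and the name `gillespie_step` are assumptions for illustration.

```python
import numpy as np


def gillespie_step(i, v, eta, rng):
    """One jump of a particle at vertex i for a fixed velocity field v (sketch).

    With mu = (1/n) sum_j delta_{x_j}, the particle jumps to x_j at rate
    v[i, j]_+ * eta[i, j] / n. Returns the new vertex and the waiting time.
    """
    n = v.shape[0]
    rates = np.maximum(v[i], 0.0) * eta[i] / n
    total = rates.sum()
    if total == 0.0:
        return i, np.inf                 # no admissible jump: the particle stays put
    tau = rng.exponential(1.0 / total)   # exponential holding time with rate `total`
    j = rng.choice(n, p=rates / total)   # target chosen proportionally to its rate
    return int(j), float(tau)


phi = np.array([3.0, 1.0, 2.0, 0.0])    # hypothetical potential; v(x, y) = phi(x) - phi(y)
v = phi[:, None] - phi[None, :]
eta = 1.0 - np.eye(4)                   # complete graph without self-loops
rng = np.random.default_rng(3)
j, tau = gillespie_step(0, v, eta, rng)          # may only jump "downhill"
j_min, tau_min = gillespie_step(3, v, eta, rng)  # at the minimum of phi: no jump
```

Particles move only to vertices with \(v(x_i,x_j) > 0\), here those with a lower potential; at the minimum of the potential no jump is available.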
The numerical upwind scheme is one of the basic finite-volume methods used to solve conservation laws; see [29]. To draw the connection, let \(\{x_1, \dots , x_n\}\) be suitable representatives of a tessellation \(\{K_1,\dots ,K_n\}\), for instance a Voronoi tessellation, of some bounded domain \(\Omega \subset {\mathbb {R}}^d\). Let \(\mu \) be the Lebesgue measure on \(\Omega \) and take \(\eta \) to be the transmission coefficient common in finite-volume schemes: \(\eta (x_i,x_j) = {\mathcal {H}}^{d-1}(\overline{K_i}\cap \overline{K_j})/{\text {Leb}}(K_i)\), for \(i,j\in \{1,\dots ,n\}\), where \({\mathcal {H}}^{d-1}(\overline{K_i}\cap \overline{K_j})\) is the \((d-1)\)-dimensional Hausdorff measure of the common face of \(K_i\) and \(K_j\). With this choice, equation (1.7) becomes the (continuous-time) discretization of the classical continuity equation
for some vector field \({\mathbf {v}}_t:\Omega \rightarrow {\mathbb {R}}^d\). Hereby, the discretized vector field \(v_t\) is obtained from \({\mathbf {v}}_t\) by taking the average over common interfaces:
where \(\nu _{K_i,K_j}\) is the unit normal to \(K_i\) pointing from \(K_i\) to \(K_j\). We refer to the recent work [9] for a variational interpretation of the upwind scheme, which is close to that we propose for the more general equation (1.7). Earlier results in this direction are contained in [21, 38].
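The correspondence with the finite-volume upwind scheme can be checked numerically in one dimension. The sketch below takes a uniform grid of \(n\) cells of width \(h\) on \([0,1]\), so that \(\eta (x_i,x_{i\pm 1}) = 1/h\) (a common face has \({\mathcal {H}}^0\)-measure 1), reads (1.7) on the cell representatives as a sum over neighboring cells, and compares it with the classical upwind flux difference with no-flux boundary conditions. The velocity field and the initial density are hypothetical.

```python
import numpy as np

n = 50
h = 1.0 / n
x = (np.arange(n) + 0.5) * h              # cell centres x_i
faces = np.arange(1, n) * h               # interior faces x_{i+1/2}
vf = np.sin(2 * np.pi * faces)            # face-averaged velocity v . nu (hypothetical field)
rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)   # hypothetical initial density

# Classical upwind finite-volume right-hand side with zero flux at the boundary.
F = rho[:-1] * np.maximum(vf, 0.0) - rho[1:] * np.maximum(-vf, 0.0)
F = np.concatenate(([0.0], F, [0.0]))
rhs_fv = -(F[1:] - F[:-1]) / h

# The same scheme written as the graph equation on cell representatives.
idx = np.arange(n - 1)
v = np.zeros((n, n))
v[idx, idx + 1] = vf                      # from K_i to K_{i+1}: normal +1
v[idx + 1, idx] = -vf                     # antisymmetry
eta = np.zeros((n, n))
eta[idx, idx + 1] = eta[idx + 1, idx] = 1.0 / h   # H^0(face) / Leb(K_i)
j = rho[:, None] * np.maximum(v, 0.0) - rho[None, :] * np.maximum(-v, 0.0)
rhs_graph = -(eta * j).sum(axis=1)

print(np.allclose(rhs_fv, rhs_graph))
```

The edge flux \(j(x_i,x_{i+1})\) coincides with the upwind numerical flux \(F_{i+1/2}\), and \(j(x_i,x_{i-1}) = -F_{i-1/2}\), so the two right-hand sides agree term by term.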
The connection to finite-volume schemes also explains that the nonlocality in (1.7) introduces a regularization, which in the numerical literature is referred to as numerical diffusion. That the numerical diffusion is actually an honest Markov jump process, as described at the beginning of this section, was observed and used to find optimal convergence rates in [19, 20, 45, 46].
Comparison with Other Discrete Metrics and Gradient Structures
The interpretation of diffusion on graphs as gradient flows of the entropy was independently carried out in [14, 36, 37]. Here we recall the descriptions of the flows relying on reversible Markov chains, which was the framework used in [25, 27, 36]. Starting with Markov chains, which then determine the edge weights, offers an additional layer of modeling flexibility. In particular, consider the Markov chain with state space \(X = \{x_1, \dots , x_n\}\) and jump rates \(\{Q_{x,y}\}_{x,y\in X}\). Let \(\pi _x\) be the reversible probability measure for the Markov chain, meaning that it satisfies the detailed balance condition \(\pi _x Q_{x,y} = \pi _y Q_{y,x}\). The edge weights \(\{w_{x,y}\}_{x,y\in X}\) are given by \(w_{x,y}=\pi _x Q_{x,y}\). The energy considered is the relative entropy: for \(\rho :X \rightarrow [0,1]\) with \(\sum _{x \in X} \rho _x = 1\) we define
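The objects of this framework can be set up in a few lines. The sketch below builds a reversible Markov chain from hypothetical symmetric conductances \(c_{x,y}\) by setting \(Q_{x,y} = c_{x,y}/\pi _x\), so that \(w_{x,y} = \pi _x Q_{x,y} = c_{x,y}\) satisfies detailed balance, and evaluates the relative entropy in its standard form \(\sum _x \rho _x \log (\rho _x/\pi _x)\), which is assumed here to be the content of (1.9).

```python
import numpy as np


def relative_entropy(rho, pi):
    """H(rho | pi) = sum_x rho_x log(rho_x / pi_x), with the convention 0 log 0 = 0."""
    mask = rho > 0
    return float(np.sum(rho[mask] * np.log(rho[mask] / pi[mask])))


rng = np.random.default_rng(1)
n = 4
c = rng.random((n, n))
c = c + c.T                      # symmetric conductances (hypothetical)
np.fill_diagonal(c, 0.0)
pi = rng.random(n) + 0.1
pi /= pi.sum()                   # reversible probability measure
Q = c / pi[:, None]              # jump rates Q_{x,y} = c_{x,y} / pi_x
w = pi[:, None] * Q              # edge weights w_{x,y} = pi_x Q_{x,y}
generator = Q - np.diag(Q.sum(axis=1))   # pi @ generator = 0: pi is stationary

rho = np.array([0.4, 0.3, 0.2, 0.1])
print(relative_entropy(rho, pi), relative_entropy(pi, pi))
```

Detailed balance is exactly the symmetry of \(w\), and the relative entropy vanishes at \(\rho = \pi \) and is nonnegative otherwise.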
The paths in the configuration space are given as the solution of the continuity equation which for the flux \(\{j_{x,y}:[0,T]\rightarrow {\mathbb {R}}\}_{x,y\in X}\) takes the form (1.2).
To compute the flux from a given velocity \(\{v_{x,y}\}_{x,y\in X}\) (an edge-based quantity) and density \(\{\rho _x\}_{x\in X}\) (a vertex-based quantity), one interpolates the densities at vertices to define the density (and hence the flux) along the edges. The literature so far has considered a proportional constitutive relation of the form
where the function \(\theta :{\mathbb {R}}_+\times {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\) needs to be one-homogeneous for dimensional reasons. In addition, it is assumed that the function \(\theta \) is an interpolation, that is, \(\min \{a,b\}\leqq \theta (a,b)\leqq \max \{a,b\}\). The choice providing a gradient flow characterization for linear Markov chains is the logarithmic mean, defined by \(\theta (a,b)= \frac{a-b}{\log a - \log b}\) for \(a \ne b\) and \(\theta (a,a)=a\).
The associated action functional is given by
The corresponding transportation distance is induced as the minimum of the action along paths:
It was shown that it suffices to consider antisymmetric fluxes, as we also do in Corollary 2.8. To arrive at a gradient flow formulation, one considers the metric induced by the action functional (1.11):
Then the gradient \({{\,\mathrm{grad}\,}}{\mathcal {H}}\) of the relative entropy (1.9) with respect to this metric is given as the antisymmetric flux \(j^*\) of minimal norm satisfying
for any curve \(({\tilde{\rho }}(t))_{t\geqq 0}\) such that \(\partial _t {\tilde{\rho }}(0) = - \big ({\overline{\nabla }}\cdot j\big )\). Expanding (1.13) and using that \(j^*\) is antisymmetric gives
Since this identity holds for all \(j_{x,y}\), the flux \(j^*\) is identified by
where the last equality holds for the particular choice of the logarithmic mean interpolation \(\theta (r,s) = \frac{r-s}{\ln r - \ln s}\). By plugging \(j_{x,y}^*\) into the continuity equation (1.2), one recovers the (linear) heat equation on graphs.
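The identity behind the last equality is \(\theta (a,b)(\log a - \log b) = a-b\) for the logarithmic mean, which turns the logarithmic driving force of the entropy into a flux that is linear in the densities. The short Python check below verifies it together with the interpolation and one-homogeneity properties required of \(\theta \), and shows that the geometric mean does not share it; the sample values are arbitrary.

```python
import math


def log_mean(a, b):
    """theta(a, b) = (a - b) / (log a - log b) for a != b, theta(a, a) = a (a, b > 0)."""
    if a == b:
        return a
    return (a - b) / (math.log(a) - math.log(b))


for a, b in [(0.3, 1.7), (2.0, 0.5), (1.0, 1.0)]:
    # Key identity: theta(a, b) * (log a - log b) = a - b.
    assert math.isclose(log_mean(a, b) * (math.log(a) - math.log(b)), a - b, abs_tol=1e-12)
    # Interpolation property and one-homogeneity required of theta.
    assert min(a, b) <= log_mean(a, b) <= max(a, b)
    assert math.isclose(log_mean(3.0 * a, 3.0 * b), 3.0 * log_mean(a, b))

# The geometric mean fails the identity, so the corresponding flux is not linear.
a, b = 0.3, 1.7
geometric_gap = math.sqrt(a * b) * (math.log(a) - math.log(b)) - (a - b)
print(geometric_gap)
```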
The next relevant step is the introduction of the interaction and the potential energies as in (1.1). In particular, [25] provides a gradient flow structure for free energy functionals of the form
where \(\beta >0\) is the inverse temperature. Its gradient flow is the discrete analogue of the McKean–Vlasov equation. Finding a desirable gradient flow structure is nontrivial since considering the logarithmic interpolation, which makes the diffusion term linear, would make the potential term nonlinear, and thus the Fokker–Planck equation on graphs would be nonlinear. To cope with this, the framework of [25] extends the linear theory outlined above to a family of nonlinear Markov chains satisfying a local detailed balance condition. The consequence for the resulting gradient structure is that the quantities \(\{\pi _x\}_{x\in X}\), \(\left\{ Q_{x,y}\right\} _{x,y\in X}\) and \(\left\{ w_{x,y}\right\} _{x,y\in X}\) depend on the current state \(\rho \) in such a way that the detailed balance condition \(w_{x,y}[\rho ] = \pi _x[\rho ] Q_{x,y}[\rho ] = \pi _y[\rho ] Q_{y,x}[\rho ] \) is still valid for all \(\rho \in {\mathcal {P}}(X)\). In particular, for \({\mathcal {F}}_\beta \) defined in (1.14), it holds that
It would be natural to try to build the framework for the case \(\beta =\infty \), which we consider in this paper, by taking the limit \(\beta \rightarrow \infty \) in the framework of [25]. It turns out that this limit is singular for the constructed gradient structure. First of all, the measure \(\pi _x[\rho ]\) degenerates at all points except at the argmin of the effective potential \(x\mapsto P_x + \sum _{y} K_{x,y}\rho _y\). This causes the constitutive relation (1.10) to become meaningless. A more detailed analysis also shows that the metric in (1.12) degenerates.
We also note that in this setting the potential functions P and K and the inverse temperature \(\beta \) enter the metric in (1.11) through the weights \(w_{x,y}\) and the rate matrix \(Q_{x,y}\). This is in stark contrast to the classical continuum gradient flow formulation for free energies of the form \({\mathcal {F}}_\beta \) from (1.14). There, the metric is always the \(L^2\)-Wasserstein distance, independently of the potentials P and K and of the inverse temperature \(\beta >0\), including \(\beta =\infty \) [2, 10, 11, 33].
Another approach to McKean–Vlasov equations is to consider the arithmetic interpolation, as was done in [15]. The theory developed there requires the densities to be strictly positive and diffusion to be present. We note that the diffusion itself is nonlinear.
The above problems lead us to consider the upwind interpolation in the flux-velocity relation (1.10). In view of (1.2), this relation is replaced in the present setting by
Note that the relation (1.15) is a functional relation between velocity and flux with the interpolation \(\Theta \) depending on the velocity.
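Lemma 2.6 writes a finite-action flux as \(v_+\) times \(\gamma _1\) minus \(v_-\) times \(\gamma _2\). Following that form, we assume here that (1.15) reads \(j_{x,y} = \rho _x (v_{x,y})_+ - \rho _y (v_{x,y})_-\). Equivalently, \(j_{x,y} = \Theta (\rho _x,\rho _y; v_{x,y})\, v_{x,y}\), where \(\Theta \) takes the density at the vertex the flow leaves. The display itself is not reproduced here. The sketch below illustrates this reading, and the function names are hypothetical.

```python
import math

def upwind_theta(rho_x: float, rho_y: float, v: float) -> float:
    """Upwind interpolation: the density at the vertex the flow leaves."""
    return rho_x if v >= 0 else rho_y

def upwind_flux(rho_x: float, rho_y: float, v: float) -> float:
    """Flux from x to y: rho_x * v_+ - rho_y * v_-, with v_- = max(0, -v)."""
    return rho_x * max(0.0, v) - rho_y * max(0.0, -v)

for rho_x, rho_y, v in [(0.3, 0.9, 2.0), (0.3, 0.9, -2.0), (0.0, 0.9, 1.5)]:
    # The interpolation depends on the sign of v, so j is not linear in v.
    assert math.isclose(upwind_flux(rho_x, rho_y, v),
                        upwind_theta(rho_x, rho_y, v) * v, abs_tol=1e-15)
    # Reversing the edge and the velocity reverses the flux (antisymmetry).
    assert math.isclose(upwind_flux(rho_y, rho_x, -v),
                        -upwind_flux(rho_x, rho_y, v), abs_tol=1e-15)
print(upwind_flux(0.0, 0.9, 1.5))  # 0.0: no mass leaves an empty vertex
```

The last line shows the feature that distinguishes the upwind choice: an empty vertex cannot emit mass, however large the outgoing velocity.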
We remark that solutions of system (1.2)–(1.4) are not the limit of the gradient flows in [25] as \(\beta \rightarrow \infty \). In fact, the limit of these dynamics as \(\beta \rightarrow \infty \) would not be the desirable gradient flow of the nonlocal-interaction energy, since the initial support of the solutions would never expand; see the related Remark 1.2.
We conclude this section by observing that it seems possible to generalize the upwind interpolation continuously, so as to define a flux-velocity relation for free energies \({\mathcal {F}}_\beta \) with \(\beta >0\). A candidate, inspired by the Scharfetter–Gummel scheme [44], is the following constitutive flux-velocity relation depending on \(\beta \):
In particular, it holds that \(j^\beta _{x,y} \rightarrow j_{x,y}\) as \(\beta \rightarrow \infty \), where \(j_{x,y}\) is as in (1.15). The form of \(j^\beta _{x,y}\) can be deduced physically from the one-dimensional cell problem for the unknown value \(j^\beta _{x,y}\in {\mathbb {R}}\) and function \(\rho :[0,1]\rightarrow {\mathbb {R}}\):
Note that \(j^\beta _{x,y} = \frac{\rho _x-\rho _y}{\beta }\) for \(v_{x,y} =0\), which is the flux due to Fick’s law. Likewise, \(j^\beta _{x,y} = 0\) for \(v_{x,y} = \beta ^{-1} \log \frac{\rho _y}{\rho _x}\), which is the velocity needed to counteract the diffusion. In [47], it is shown that the Scharfetter–Gummel finite volume scheme provides a stable, positivity-preserving numerical approximation of the diffusion-aggregation equation, which also respects the thermodynamic free energy structure. We leave the investigation of a possible related gradient structure to future research.
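The displayed \(\beta \)-dependent relation is not reproduced here. A standard Scharfetter–Gummel flux with diffusivity \(\beta ^{-1}\) is \(j^\beta = \beta ^{-1}\big [B(-\beta v)\rho _x - B(\beta v)\rho _y\big ]\), where \(B(z) = z/(e^z-1)\) is the Bernoulli function. It has all three properties stated in the text: the Fick flux at \(v=0\), zero flux at \(v=\beta ^{-1}\log (\rho _y/\rho _x)\), and the upwind limit as \(\beta \rightarrow \infty \). We assume, without having checked the missing display, that it is the relation meant. The sketch below verifies the three properties numerically, and all names in it are hypothetical.

```python
import math

def bernoulli(z: float) -> float:
    """Bernoulli function B(z) = z / (e^z - 1), with B(0) = 1."""
    if abs(z) < 1e-12:
        return 1.0
    return z / math.expm1(z)

def sg_flux(rho_x: float, rho_y: float, v: float, beta: float) -> float:
    """Assumed Scharfetter-Gummel flux from x to y with diffusivity 1/beta."""
    return (bernoulli(-beta * v) * rho_x - bernoulli(beta * v) * rho_y) / beta

def upwind_flux(rho_x: float, rho_y: float, v: float) -> float:
    """Upwind flux rho_x * v_+ - rho_y * v_-."""
    return rho_x * max(0.0, v) - rho_y * max(0.0, -v)

rho_x, rho_y, beta = 0.8, 0.3, 5.0
print(sg_flux(rho_x, rho_y, 0.0, beta))    # Fick flux (rho_x - rho_y) / beta
v_eq = math.log(rho_y / rho_x) / beta      # velocity balancing the diffusion
print(sg_flux(rho_x, rho_y, v_eq, beta))   # approximately 0
for b in (10.0, 100.0, 1000.0):            # approaches the upwind flux
    print(b, sg_flux(rho_x, rho_y, 0.4, b), upwind_flux(rho_x, rho_y, 0.4))
```

The identity \(B(-z) = e^z B(z)\) explains the zero-flux velocity. For \(\beta |v| \gtrsim 700\), `math.expm1` overflows, so very large \(\beta \) would need an asymptotic branch.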
Connections to Machine Learning
Part of the motivation for the present work comes from applications to machine learning. Here we introduce a family of nonlinear gradient flows that is relevant to discovering local concentrations in networks akin to modes of a distribution.
Our main interest is in equations posed on graphs whose vertices are random samples of some underlying distribution and whose edge weights are a function of distances between vertices. In machine learning one often deals with data in the form of a point cloud in high-dimensional space. While the ambient dimension may be very large, the data often possess an underlying low-dimensional structure that can be used in making reliable inferences about the underlying data distribution. To use the geometric information, we follow one of the standard approaches and consider graphs associated to point clouds. Formulating the machine learning tasks directly on the point cloud gives access to the geometric structure of the distribution in a simple and computationally efficient way. The literature has mostly focused on two kinds of models. The first minimizes objective functionals modeling tasks such as clustering or dimensionality reduction [5, 31, 32, 34, 40]. The second characterizes clusters by estimating some property of the data distribution, most often the density; see [12] and references therein. Only a few dynamical models have been considered; notable among them are diffusion maps [16], where the heat equation is used to redistance the points.
Here we focus on models that are motivated by nonlocal PDEs. Consider a probability measure \(\mu \) on \({\mathbb {R}}^d\) with finite second moments. Let \(X =\{x_1, \dots , x_n\}\) be random i.i.d. samples of the measure \(\mu \). Let \(\mu ^n = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}\) be the empirical measure of the sample and let \(K:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) be symmetric and \(P:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\). The total energy \({\mathcal {E}}_X:{\mathcal {P}}(X)\rightarrow {\mathbb {R}}\), given in (1.1), for the empirical measure \(\mu ^n\) can be rewritten as
The gradient flow of \({\mathcal {E}}_X\) with respect to the graph Wasserstein metric \({\mathcal {T}}_{\mu ^n}\) defined in (1.8) is described by the ODE system (1.2)–(1.4), where \(K_{x_i,x_j} = K(x_i,x_j)\) and \(P_{x_i} =P(x_i)\) for all \(i,j\in \{1,\dots ,n\}\). Another evolution of such a system is illustrated in Fig. 2.
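Below is a minimal sketch of such an evolution. The displays (1.2)–(1.4) are not reproduced here, so it rests on several assumptions. The velocity is \(v_{x,y} = -(\phi _y - \phi _x)\) with \(\phi _x = P_x + \sum _z K_{x,z}\rho _z\). The flux is the upwind one, \(j_{x,y} = \rho _x\mu _y (v_{x,y})_+ - \mu _x\rho _y (v_{x,y})_-\), where \(\rho _x\) denotes the mass at x. Finally, \(\partial _t\rho _x = -\sum _y \eta (x,y)\, j_{x,y}\). Time is discretized by explicit Euler. The function name, the step size and the path-graph example are our own choices.

```python
def graph_flow_step(m, mu, eta, P, K, dt):
    """One explicit Euler step of the (assumed) upwind graph interaction dynamics.

    m   : masses rho_x of the current state (nonnegative, summing to one)
    mu  : base-measure weights mu_x
    eta : symmetric edge weights eta[x][y] >= 0 (diagonal ignored)
    P, K: potential values P_x and symmetric interaction matrix K[x][y]
    """
    n = len(m)
    # First variation of the energy: phi_x = P_x + sum_z K_{x,z} rho_z.
    phi = [P[x] + sum(K[x][z] * m[z] for z in range(n)) for x in range(n)]
    new_m = list(m)
    for x in range(n):
        for y in range(n):
            if x == y or eta[x][y] == 0.0:
                continue
            v = -(phi[y] - phi[x])  # velocity on the edge x -> y
            # Upwind flux: the transported mass comes from the vertex the flow leaves.
            j = m[x] * mu[y] * max(0.0, v) - mu[x] * m[y] * max(0.0, -v)
            new_m[x] -= dt * eta[x][y] * j
    return new_m


# Path graph on the points 0, 1, 2, 3, 4 with a confining potential and K = 0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
n = len(xs)
eta = [[1.0 if abs(i - k) == 1 else 0.0 for k in range(n)] for i in range(n)]
mu = [1.0] * n
P = [(x - 2.0) ** 2 for x in xs]
K = [[0.0] * n for _ in range(n)]
m = [1.0 / n] * n
for _ in range(2000):
    m = graph_flow_step(m, mu, eta, P, K, dt=0.01)
print([round(mk, 4) for mk in m])  # mass accumulates at the vertex x = 2
```

Each edge contributes opposite amounts to its two endpoints, so total mass is conserved exactly up to rounding. Mass slides down the potential and ends at a vertex, that is, at a data point. No mass leaves the minimizing vertex, because all velocities there point inward.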
Here we remark on the contrast between (1.2)–(1.4) and the gradient flow of (1.16) in the ambient space \({\mathbb {R}}^d\), with respect to the standard Wasserstein metric, which takes the form
The first notable difference is that, on the graph, masses change and the positions remain fixed, while in \({\mathbb {R}}^d\) positions change and the masses remain fixed. This difference is somewhat superficial, since both equations describe the rearrangement of mass in order to decrease the same energy in the most efficient way measured by two different metrics. The main difference is that the graph encodes the geometry of the space that mass is allowed to occupy. In particular, it ensures that the geometric mode discovered will be a data point itself.
We note that the popular mean-shift algorithm [17] can be interpreted as a time-stepping algorithm to approximate solutions of (1.17) with \(K\equiv 0\) and \(P = -\ln (\theta * \mu ^n(0))\). Here \(\mu ^n(0)\) is the empirical measure of the initial distribution of particles, and \(\theta * \mu ^n(0)\) is the kernel density estimate of the density \({\varvec{\rho }}\) of the underlying distribution. Namely, each step of the mean-shift algorithm replaces the position of the particle at \(x_j\) by the center of mass of the measure \(\theta ( \,\cdot \, - x_j)\, \mu ^n(0)\), and the procedure is iterated. Formal expansion shows that this is a time step of the forward scheme for the flow driven by \(P = -\ln (\theta * \mu ^n(0))\). Considering instead the gradient flow of the corresponding energy on the graph (1.2)–(1.4) ensures that the modes of the distribution discovered by the (graph) mean-shift algorithm remain within the data set. Furthermore, adding nonlocal attraction on the graph progressively clumps nearby masses together and thus provides an approach to agglomerative clustering.
One of our main results, stated in Theorem 3.14, is that as \(n \rightarrow \infty \) the solutions of the graph-based equation (1.2)–(1.4) narrowly converge along a subsequence to a solution of the nonlocal nonlocal-interaction equation (\({\text {NL}}^2 {\text {IE}}\)).
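For comparison with the graph dynamics, the classical (ambient-space) mean-shift iteration can be sketched as follows. The sketch assumes a Gaussian kernel for \(\theta \) and one-dimensional data; both are our choices, and `mean_shift` is a hypothetical helper. With the data \(\{-0.2, 0.2, 4.8, 5.2\}\), the iterates converge to 0 and 5, neither of which is a data point. This is exactly the behavior the graph formulation rules out.

```python
import math

def mean_shift(x: float, data: list, h: float, iters: int = 100) -> float:
    """Classical 1-D mean-shift with a Gaussian kernel of bandwidth h.

    Each step replaces x by the kernel-weighted center of mass of the data.
    """
    for _ in range(iters):
        w = [math.exp(-((x - xi) ** 2) / (2 * h * h)) for xi in data]
        x = sum(wi * xi for wi, xi in zip(w, data)) / sum(w)
    return x

data = [-0.2, 0.2, 4.8, 5.2]
a = mean_shift(0.4, data, h=0.5)
b = mean_shift(4.5, data, h=0.5)
print(a, b)  # close to 0 and 5, which are not data points
```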
Nonlocal Continuity Equation and Upwind Transportation Metric
Weight Function
Throughout the paper we consider a weight function \(\eta :\{(x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d : x\ne y\} \rightarrow [0,\infty )\), which shall always satisfy
Since \(\eta \) is symmetric, we regard the edge set G as an undirected graph. Many of the edge-based quantities we consider, like vector fields and fluxes, will lie in an \(\eta \)-weighted \(L^2\) space, \(L^2(\eta \, \lambda )\) for some \(\lambda \in {\mathcal {M}}(G)\). The space \(L^2(\eta \,\lambda )\) is equipped with the inner product
where the factor \(\frac{1}{2}\) ensures that each undirected edge is counted only once.
Below we state two assumptions on the base measure \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) and the weight function \(\eta \), where we use the notation \(\vee \) to denote the maximum.

(A1) (moment bound) The family of functions \(\{\left( |x-\cdot |^2 \vee |x-\cdot |^4 \right) \eta (x,\cdot )\}_{x\in {{\mathbb {R}}^{d}}}\) is uniformly integrable with respect to \(\mu \), that is, for some \(C_\eta \in (0,\infty )\), it holds that
$$\begin{aligned} \sup _{x\in {\mathbb {R}}^d} {\int } \left( |x-y|^2 \vee |x-y|^4 \right) \, \eta (x,y)\,\text {d}\mu (y) \leqq C_\eta . \end{aligned}$$ 
(A2) (local blow-up control) The family of measures \(\{|x - \cdot |^2\eta (x,\cdot ) \mu (\cdot )\}_{x\in {{\mathbb {R}}^{d}}}\) is locally uniformly integrable, that is,
$$\begin{aligned}&\lim _{\varepsilon \rightarrow 0} \sup _{x\in {\mathbb {R}}^d} {\int }_{B_\varepsilon (x){\setminus }\{x\}} |x-y| ^2 \, \eta (x,y) \,\text{ d }\mu (y)= 0, \quad \text{ where } \\&\quad B_\varepsilon (x) = \bigl \{ y\in {\mathbb {R}}^d: |x-y|<\varepsilon \bigr \}. \end{aligned}$$
Remark 2.1
Continuity on G in (W) is needed to obtain lower semicontinuity of the action functional; see Lemma 2.9. Assumption (A1) ensures well-posedness of the nonlocal continuity equation we shall introduce in Section 2.3, whereas Assumption (A2) is necessary for compactness of solutions to the nonlocal continuity equation; see Proposition 2.17.
Example 2.2
Typically the function \(\eta \) is a function of the distance
where \(\vartheta :(0,\infty ) \rightarrow [0,\infty )\) is continuous on \(\{\vartheta >0\}\) and satisfies analogues of (A1) and (A2). An important example is given by geometric graphs with connectivity distance \(\varepsilon >0\) and weight
In this example, fixing \(\mu = {\text {Leb}}({{\mathbb {R}}^{d}})\), we conjecture that the weak formulation of (\({\text {NL}}^2 {\text {IE}}\))—see Section 3—converges to the nonlocal aggregation equation \(\partial _t \rho _t = \nabla \cdot \left( \rho _t \nabla K*\rho _t+ \rho _t \nabla P\right) \) as \(\varepsilon \rightarrow 0\) for sufficiently smooth potentials K and P. See Section 3.5 for a discussion on the local limit.
Action
The form of the action inside (1.8) seems practical, but it does not have any obvious convexity or lower semicontinuity properties. Therefore, we define the action in flux variables. We start by introducing some notation. For a signed measure \({\varvec{j}}\in {\mathcal {M}}(G)\), we denote by \({\varvec{j}}={\varvec{j}}^+-{\varvec{j}}^-\) its Jordan decomposition. Moreover, for any measurable \(A\subseteq G\), let \(A^\top =\{(y,x) \in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}: (x,y)\in A\}\) be its transpose. Likewise, for \({\varvec{j}}\in {\mathcal {M}}(G)\), we denote by \({\varvec{j}}^\top \) the transposed measure defined by \({\varvec{j}}^\top (A)={\varvec{j}}(A^\top )\).
For any measures \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) and \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\), we define the (restricted) product measures \(\gamma _i\in {\mathcal {M}}^+(G)\) for \(i=1,2\) as
Note that \(\gamma ^\top _1 = \gamma _2\). We define the action for general \(\eta \) which we only require to satisfy Assumption (W), i.e., continuity on G, symmetry and positivity.
Definition 2.3
(Action) For \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in {\mathcal {M}}(G)\), consider \(\lambda \in {\mathcal {M}}(G)\) such that \(\rho \otimes \mu ,\mu \otimes \rho ,{\varvec{j}}\ll \lambda \). We define
Here, the lower semicontinuous, convex, and positively one-homogeneous function \(\alpha :{\mathbb {R}}\times {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\cup \{\infty \}\) is defined, for all \(j\in {\mathbb {R}}\) and \(r\geqq 0\), by
with \(j_+=\max \{0,j\}\). If the measure \(\mu \) is clear from the context, we write \({\mathcal {A}}(\rho ,{\varvec{j}})\) for \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})\).
Note that Definition 2.3 is well-posed. By the one-homogeneity of \(\alpha \), the action does not depend on the particular choice of \(\lambda \), as long as the absolute continuity condition in Definition 2.3 is satisfied. An example of such a measure is \(\lambda =\rho \otimes \mu +\mu \otimes \rho +|{\varvec{j}}|\). Moreover, \(\lambda \) can be chosen symmetric; otherwise, it can be replaced by \(\frac{1}{2}(\lambda +\lambda ^\top )\).
Remark 2.4
We note that the action is inversely proportional to the measure \(\mu \): doubling \(\mu \) halves the action. This has an important consequence for the way \(\mu \) influences the geometry of the space of measures. In particular, \(\mu \) not only sets the region where mass can be transported, but also makes transport less costly in regions where \(\mu \) has high density.
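To make Definition 2.3 concrete on a finite vertex set, the sketch below takes \(\lambda \) to be the counting measure on ordered pairs. It relies on our reading of the displays (2.4) and (2.5), which are not reproduced here. We assume \(\alpha (j,r) = (j_+)^2/r\) for \(r>0\), with \(\alpha (j,0) = 0\) for \(j\leqq 0\) and \(\alpha (j,0)=\infty \) for \(j>0\). We also assume \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) = \frac{1}{2}\sum _{x\ne y}\eta (x,y)\big [\alpha (j_{x,y},\rho _x\mu _y) + \alpha (-j_{x,y},\mu _x\rho _y)\big ]\). The names `alpha` and `action` are hypothetical. The final lines illustrate Remark 2.4 and the upwind feature that an empty vertex cannot emit mass at finite cost.

```python
import math

def alpha(j: float, r: float) -> float:
    """Assumed action density alpha(j, r) = (j_+)^2 / r, with alpha = 0 when j <= 0."""
    jp = max(0.0, j)
    if jp == 0.0:
        return 0.0
    return math.inf if r == 0.0 else jp * jp / r

def action(mu, rho, eta, J):
    """Assumed discrete action with lambda = counting measure on ordered pairs.

    mu, rho: vertex masses; J[x][y]: mass of the flux measure on the edge (x, y).
    gamma_1 = rho (x) mu has mass rho[x]*mu[y], gamma_2 = mu (x) rho has mu[x]*rho[y].
    """
    n = len(mu)
    total = 0.0
    for x in range(n):
        for y in range(n):
            if x == y or eta[x][y] == 0.0:
                continue
            total += eta[x][y] * (alpha(J[x][y], rho[x] * mu[y])
                                  + alpha(-J[x][y], mu[x] * rho[y]))
    return 0.5 * total

mu = [1.0, 1.0, 1.0]
rho = [0.5, 0.3, 0.2]
eta = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
J = [[0.0, 0.2, -0.1], [-0.2, 0.0, 0.05], [0.1, -0.05, 0.0]]
a1 = action(mu, rho, eta, J)
a2 = action([2.0 * w for w in mu], rho, eta, J)
print(a1, a2)                                  # a2 equals a1 / 2 up to rounding
print(action(mu, [1.0, 0.0, 0.0], eta, J))     # inf: flux leaves an empty vertex
```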
Remark 2.5
If \(\rho \ll \mu \), then, by abuse of notation, we also denote its density by \(\rho \). If furthermore \({\varvec{j}}\ll \mu \otimes \mu \) with density j, then it holds that
In the following lemma we can see that the action takes the form from the tentative definition of the metric in (1.8), as soon as it is bounded.
Lemma 2.6
Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in {\mathcal {M}}(G)\) be such that \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty \). Then there exists a measurable \(v:G\rightarrow {\mathbb {R}}\) such that
and it holds that
In particular, if \(v\in {\mathcal {V}}^{\mathrm {as}}(G)\), then
Proof
Let \(\lambda \in {\mathcal {M}}^+(G)\) be such that \(\text {d}\gamma _1(x,y) = \text {d}\rho (x) \text {d}\mu (y) = {\tilde{\gamma }}_1(x,y) \text {d}\lambda (x,y)\), likewise \(\text {d}\gamma _2(x,y) = \text {d}\mu (x) \text {d}\rho (y) = {\tilde{\gamma }}_2(x,y) \text {d}\lambda (x,y)\), and \(\text {d}{\varvec{j}}= {\tilde{j}} \text {d}\lambda \) for some measurable \({\tilde{\gamma }}_1,{\tilde{\gamma }}_2,{\tilde{j}}:G\rightarrow {\mathbb {R}}\). Without loss of generality we can assume \(\lambda \) to be symmetric; for instance by considering \(\tfrac{1}{2} (\lambda + \lambda ^\top )\) instead. Thus, (2.4) implies
By the definition of the function \(\alpha \) in (2.5), it immediately follows that the vector field \({\tilde{v}}^+(x,y) = \frac{{\tilde{j}}(x,y)_+}{{\tilde{\gamma }}_1(x,y)}\) is well-defined \(\gamma _1\)-a.e. on G. By the same argument, \({\tilde{v}}^-(x,y) = \frac{{\tilde{j}}(x,y )_-}{{\tilde{\gamma }}_2(x,y)}\) is well-defined \(\gamma _2\)-a.e. on G. Since \(\gamma _1=\gamma _2^\top \), the transpose \({\left( {\tilde{v}}^-\right) }^{\top }\) exists \(\gamma _1\)-a.e. on G. Hence, we obtain the measurable vector field
The statement (2.7) follows by using the positive one-homogeneity of \(\alpha \), the identity \(\alpha (j,r)=\alpha (j_+,r)\) and the symmetry of \(\lambda \):
\(\square \)
Definition 2.7
(Nonlocal gradient and divergence) For any function \(\phi :{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) we define its nonlocal gradient \({\overline{\nabla }}\phi :G \rightarrow {\mathbb {R}}\) by
For any \({\varvec{j}}\in {\mathcal {M}}(G)\), its nonlocal divergence \({\overline{\nabla }}\cdot {\varvec{j}}\in {\mathcal {M}}({\mathbb {R}}^d)\) is defined as the \(\eta \)-weighted adjoint of \({\overline{\nabla }}\), i.e.,
In particular, for \({\varvec{j}}\in {\mathcal {M}}^{\mathrm {as}}(G) := \{{\varvec{j}}\in {\mathcal {M}}(G): {\varvec{j}}^\top =  {\varvec{j}}\}\),
If \({\varvec{j}}\) is given by (2.6) for some \(v\in {\mathcal {V}}^{\mathrm {as}}(G)\), then the flux satisfies the antisymmetry relation \({\varvec{j}}^+=({\varvec{j}}^\top )^-\) \(\gamma _1\)-a.e. on G. The following corollary shows that such antisymmetric fluxes are the relevant ones for the minimization of the action functional. For this reason, the natural class of fluxes consists of the antisymmetric measures on G whose positive part is absolutely continuous with respect to \(\gamma _1\), that is,
Corollary 2.8
(Antisymmetric vector fields have lower action) Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in {\mathcal {M}}(G)\) be such that \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty \). Then there exists an antisymmetric flux \({\varvec{j}}^{\mathrm {as}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}\) such that
with lower action:
Proof
Let us set \({\varvec{j}}^{\mathrm {as}} = ({\varvec{j}} - {\varvec{j}}^\top )/2\). Since \(\eta \) is symmetric and \(\big ({\overline{\nabla }}\phi \big )^\top = - {\overline{\nabla }}\phi \), we get
By an application of Lemma 2.6 and comparison of (2.7) and (2.8) it is enough to show that, for all \((x,y)\in G\),
for any measurable \(v:G\rightarrow {\mathbb {R}}\), where \(v^{\mathrm {as}}(x,y) = \left( v(x,y) - v(y,x)\right) /2\). This estimate is a consequence of Jensen’s inequality applied to the convex functions
\(\square \)
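On a finite vertex set, Corollary 2.8 can be checked numerically. The sketch below shows that \({\varvec{j}}^{\mathrm {as}} = ({\varvec{j}}-{\varvec{j}}^\top )/2\) has the same nonlocal divergence as \({\varvec{j}}\) and no larger action. It assumes \({\overline{\nabla }}\phi (x,y)=\phi (y)-\phi (x)\) and the adjoint relation \(\sum _x \phi _x ({\overline{\nabla }}\cdot {\varvec{j}})_x = -\frac{1}{2}\sum _{x\ne y}{\overline{\nabla }}\phi (x,y)\,\eta (x,y)\,j_{x,y}\). It also uses the same reading of (2.4) and (2.5) as above, namely \(\alpha (j,r)=(j_+)^2/r\). These displays are not reproduced here, and all helper names are hypothetical.

```python
import math
import random

def alpha(j, r):
    """Assumed action density: (j_+)^2 / r, and 0 when j <= 0."""
    jp = max(0.0, j)
    if jp == 0.0:
        return 0.0
    return math.inf if r == 0.0 else jp * jp / r

def action(mu, rho, eta, J):
    """Assumed discrete action 1/2 sum eta [alpha(J, rho_x mu_y) + alpha(-J, mu_x rho_y)]."""
    n = len(mu)
    return 0.5 * sum(eta[x][y] * (alpha(J[x][y], rho[x] * mu[y])
                                  + alpha(-J[x][y], mu[x] * rho[y]))
                     for x in range(n) for y in range(n)
                     if x != y and eta[x][y] > 0.0)

def divergence(eta, J):
    """eta-weighted adjoint of grad phi(x, y) = phi(y) - phi(x) (sign convention assumed)."""
    n = len(J)
    return [0.5 * sum(eta[x][y] * (J[x][y] - J[y][x]) for y in range(n) if y != x)
            for x in range(n)]

random.seed(1)
n = 5
mu = [random.uniform(0.5, 2.0) for _ in range(n)]
rho = [random.uniform(0.1, 1.0) for _ in range(n)]
total = sum(rho)
rho = [r / total for r in rho]
eta = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(x + 1, n):
        eta[x][y] = eta[y][x] = random.uniform(0.0, 1.0)
J = [[random.uniform(-1.0, 1.0) if x != y else 0.0 for y in range(n)] for x in range(n)]
J_as = [[(J[x][y] - J[y][x]) / 2 for y in range(n)] for x in range(n)]

print(action(mu, rho, eta, J), action(mu, rho, eta, J_as))  # second is not larger
print(max(abs(a - b) for a, b in zip(divergence(eta, J), divergence(eta, J_as))))
```

Pairing the terms of the edges \((x,y)\) and \((y,x)\) that share the denominator \(\rho _x\mu _y\) reduces the inequality to convexity of \(t\mapsto t_+^2\). This is the discrete counterpart of the Jensen step in the proof.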
Lemma 2.9
(Lower semicontinuity of the action) The action is lower semicontinuous with respect to the narrow convergence in \({\mathcal {M}}^+({{\mathbb {R}}^{d}})\times {\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}(G)\). That is, if \(\mu ^n {\rightharpoonup }\mu \) in \({\mathcal {M}}({{\mathbb {R}}^{d}})\), \(\rho ^n {\rightharpoonup }\rho \) in \({\mathcal {P}}({{\mathbb {R}}^{d}})\), and \({\varvec{j}}^n {\rightharpoonup }{\varvec{j}}\) in \({\mathcal {M}}(G)\), then
Proof
First, note that the narrow convergence of any sequences \((\rho ^n)_n\) and \((\mu ^n)_n\) implies the narrow convergence of the products \(\rho ^n\otimes \mu ^n \rightharpoonup \rho \otimes \mu \) in \({\mathcal {M}}^+({{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}})\), and therefore also in \({\mathcal {M}}^+(G)\). Then, in Definition 2.3 consider the vector-valued measure
Further, we define the function
Since the function \(\eta \) is lower semicontinuous by (W), and \(\alpha \) defined in (2.5) is lower semicontinuous, jointly convex and positively one-homogeneous, f satisfies the assumptions of [8, Theorem 3.4.3], whence the claim follows. \(\square \)
According to Definition 2.3, fluxes and action are closely related. If \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<+ \infty \), the following lemma gives a useful upper bound that will be crucial in several technical parts later on.
Lemma 2.10
For any \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\), \({\varvec{j}}\in {\mathcal {M}}(G)\) and any measurable \(\Phi :G\rightarrow {\mathbb {R}}_+\) it holds
Proof
Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in {\mathcal {M}}(G)\) be such that \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<+ \infty \). Let \(\lambda  \in {\mathcal {M}}^+(G)\) be such that \(\gamma _1, \gamma _2, {\varvec{j}}\ll \lambda \) as in Definition 2.3 and write \(\gamma _i = {\tilde{\gamma }}_i \lambda \) and \({\varvec{j}} = j \lambda \) for the densities.
We have that \(A:=\bigl \{ (x,y) \in G:\alpha (j,{\tilde{\gamma }}_1) = \infty \text{ or } \alpha (-j,{\tilde{\gamma }}_2)=\infty \bigr \}\) is a \(\lambda \)-null set. We observe the elementary inequality
In particular, it holds that
Hence we can estimate
Now, the result follows by estimating \(\max \left\{ {\tilde{\gamma }}_1,{\tilde{\gamma }}_2\right\} \leqq {\tilde{\gamma }}_1 + {\tilde{\gamma }}_2\). \(\square \)
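The displays of Lemma 2.10 and its proof are not reproduced here. One reading consistent with the surviving steps runs as follows. The elementary inequality is \(|j|^2 \leqq \max \{{\tilde{\gamma }}_1,{\tilde{\gamma }}_2\}\big (\alpha (j,{\tilde{\gamma }}_1)+\alpha (-j,{\tilde{\gamma }}_2)\big )\), assuming \(\alpha (j,r)=(j_+)^2/r\). Cauchy–Schwarz then gives \(\int \Phi \,\eta \,\text {d}|{\varvec{j}}| \leqq \sqrt{2{\mathcal {A}}}\,\big (\int \Phi ^2\eta \,\text {d}(\gamma _1+\gamma _2)\big )^{1/2}\). We have not verified this against the lemma's displayed statement. The sketch below checks both steps on random data, with \(\lambda \) the counting measure on 200 edges, and all names are hypothetical.

```python
import math
import random

def alpha(j, r):
    """Assumed action density: (j_+)^2 / r, and 0 when j <= 0."""
    jp = max(0.0, j)
    if jp == 0.0:
        return 0.0
    return math.inf if r == 0.0 else jp * jp / r

random.seed(2)
lhs_terms, act_terms, weight_terms = [], [], []
for _ in range(200):
    j = random.uniform(-1.0, 1.0)                    # flux density on an edge
    g1 = random.uniform(0.01, 1.0)                   # density of gamma_1
    g2 = random.uniform(0.01, 1.0)                   # density of gamma_2
    eta, phi = random.uniform(0.0, 1.0), random.uniform(0.0, 2.0)
    # Pointwise step: |j|^2 <= max(g1, g2) * (alpha(j, g1) + alpha(-j, g2)).
    assert j * j <= max(g1, g2) * (alpha(j, g1) + alpha(-j, g2)) + 1e-12
    lhs_terms.append(phi * eta * abs(j))
    act_terms.append(eta * (alpha(j, g1) + alpha(-j, g2)))
    weight_terms.append(phi * phi * eta * (g1 + g2))

action = 0.5 * sum(act_terms)
lhs = sum(lhs_terms)
rhs = math.sqrt(2 * action) * math.sqrt(sum(weight_terms))
print(lhs, rhs)  # Cauchy-Schwarz step: lhs <= rhs
```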
As a consequence of the previous results we have the following corollary, which will be useful in Section 2.3:
Corollary 2.11
Let \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) satisfy (A1) for some \(C_\eta \in (0,\infty )\), then for all \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in {\mathcal {M}}(G)\) there holds
Proof
Let us consider the case \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty \); otherwise the result is trivial. From Lemma 2.6 we have \(\text {d}{\varvec{j}}(x,y)=v(x,y)_+\text {d}\gamma _1(x,y)-v(x,y)_-\text {d}\gamma _2(x,y)\), with \(\text {d}\gamma _1(x,y)=\text {d}\rho (x)\text {d}\mu (y)\) and \(\text {d}\gamma _2(x,y)=\text {d}\mu (x)\text {d}\rho (y)\). Applying Lemma 2.10 with \(\Phi (x,y)=2\wedge |x-y|\) and noticing that \(\Phi (x,y) \leqq |x-y| \leqq |x-y|\vee |x-y|^2\), we arrive at the bound
where the last estimate follows from (A1) and the integral is finite since \(\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})\). \(\square \)
Lemma 2.12
(Convexity of the action) Let \(\mu ^i\in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\), \(\rho ^i \in {\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\varvec{j}}^i \in {\mathcal {M}}(G)\) for \(i=0,1\). For \(\tau \in (0,1)\), let \(\mu ^\tau = (1-\tau ) \mu ^0 + \tau \mu ^1\), \(\rho ^\tau = (1-\tau ) \rho ^0 + \tau \rho ^1\) and \({\varvec{j}}^\tau = (1-\tau ) {\varvec{j}}^0 + \tau {\varvec{j}}^1\). Then it holds
Proof
Let us consider a measure \(\lambda \in {\mathcal {M}}(G)\) such that \(\text {d}\gamma _j^i={\tilde{\gamma }}_j^i\text {d}\lambda \) and \(\text {d}{\varvec{j}}^i=\tilde{{\varvec{j}}}^i\text {d}\lambda \) for \(i=0,1\) and \(j=1,2\). Then, the convex combinations are such that \(\text {d}\gamma _j^\tau ={\tilde{\gamma }}_j^\tau \text {d}\lambda \) and \(\text {d}{\varvec{j}}^{\tau }=\tilde{{\varvec{j}}}^{\tau }\text {d}\lambda \), where
Using the convexity of the function \(\alpha \) we get the result, that is,
\(\square \)
Nonlocal Continuity Equation
In view of the considerations made in Section 2.2, we now deal with the nonlocal continuity equation
where \((\rho _t)_{t\in [0,T]}\) and \(({\varvec{j}}_t)_{t\in [0,T]}\) are unknown Borel families of measures in \({\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\mathcal {M}}(G)\), respectively. Equation (2.12) is understood in the weak form: \(\forall \varphi \in C_\mathrm {c}^\infty ((0,T)\times {{\mathbb {R}}^{d}})\),
Since \(|{\overline{\nabla }}\varphi (x,y)|\leqq \Vert \varphi \Vert _{C^1}(2\wedge |x-y|)\), the weak formulation is well-defined under the integrability condition
Remark 2.13
The integrability condition (2.14) is automatically satisfied by a pair \((\rho _t, {\varvec{j}}_t)_{t\in [0,T]}\) such that \({\int }_0^T {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) \,\text {d}t< \infty \), due to Corollary 2.11.
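For completeness, the pointwise bound on \(|{\overline{\nabla }}\varphi |\) used in the weak formulation follows in one line. This assumes \({\overline{\nabla }}\varphi (x,y)=\varphi (y)-\varphi (x)\) and the norm convention \(\Vert \varphi \Vert _{C^1}=\Vert \varphi \Vert _\infty +\Vert \nabla \varphi \Vert _\infty \), both assumptions on notation not reproduced here. The second entry of the minimum comes from the mean value theorem:

$$\begin{aligned} |{\overline{\nabla }}\varphi (x,y)| = |\varphi (y)-\varphi (x)| \leqq \min \bigl \{ 2\Vert \varphi \Vert _{\infty },\, \Vert \nabla \varphi \Vert _{\infty }\,|x-y| \bigr \} \leqq \Vert \varphi \Vert _{C^1}\,(2\wedge |x-y|). \end{aligned}$$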
Hence we arrive at the following definition of weak solution of the nonlocal continuity equation:
Definition 2.14
(Nonlocal continuity equation in flux form) A pair \((\rho ,{\varvec{j}}):[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}(G)\) is called a weak solution to the nonlocal continuity equation (2.12) provided that

(i)
\((\rho _t)_{t\in [0,T]}\) is a weakly continuous curve in \({\mathcal {P}}({{\mathbb {R}}^{d}})\);

(ii)
\(({\varvec{j}}_t)_{t\in [0,T]}\) is a Borel-measurable curve in \({\mathcal {M}}(G)\);

(iii)
the pair \((\rho ,{\varvec{j}})\) satisfies (2.13).
We denote the set of all weak solutions on the time interval [0, T] by \( {{\,\mathrm{CE}\,}}_T\). For \(\rho ^0,\rho ^1\in {\mathcal {P}}({{\mathbb {R}}^{d}})\), a pair \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)\) if \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}:={{\,\mathrm{CE}\,}}_1\) and in addition \(\rho (0)=\rho ^0\) and \(\rho (1)=\rho ^1\).
The following lemma shows that any pair of Borel families satisfying (2.13) and the integrability condition (2.14) has a weakly continuous representative, and hence is a weak solution in the sense of Definition 2.14. This observation justifies the terminology of curve in the space of probability measures; see [2, Lemma 8.1.2] and [23, Lemma 3.1].
Lemma 2.15
Let \((\rho _t)_{t\in [0,T]}\) and \(({\varvec{j}}_t)_{t\in [0,T]}\) be Borel families of measures in \({\mathcal {P}}({{\mathbb {R}}^{d}})\) and \({\mathcal {M}}(G)\) satisfying (2.13) and (2.14). Then there exists a weakly continuous curve \(({\bar{\rho }}_t)_{t\in [0,T]}\subset {\mathcal {P}}({{\mathbb {R}}^{d}})\) such that \({\bar{\rho }}_t=\rho _t\) for a.e. \(t\in [0,T]\). Moreover, for any \(\varphi \in C_\mathrm {c}^\infty ([0,T]\times {{\mathbb {R}}^{d}})\) and all \(0\leqq t_0\leqq t_1\leqq T\) it holds that
We now prove propagation of second-order moments.
Lemma 2.16
(Uniformly bounded second moments) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) such that (A1) holds uniformly in n. Let \((\rho _0^n)_n \subset {\mathcal {P}}_{2}({{\mathbb {R}}^{d}})\) be such that \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n) < \infty \) and \((\rho ^n,{\varvec{j}}^n)_n \subset {{\,\mathrm{CE}\,}}_T\) be such that \(\sup _{n\in {\mathbb {N}}} {\int }_0^T {\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t<\infty \). Then \(\sup _{t\in [0,T]}\sup _{n\in {\mathbb {N}}} M_2(\rho _t^n) < \infty \).
Proof
We proceed by considering the time derivative of the second-order moment of \(\rho _t^n\) for all \(t\in [0,T]\) and \(n\in {\mathbb {N}}\). Since \(x\mapsto |x|^2\) is not an admissible test function in (2.13), we introduce a smooth cutoff function \(\varphi _R\) satisfying \(\varphi _R(x)=1\) for \(x\in B_R\), \(\varphi _R(x)=0\) for \(x \in {{\mathbb {R}}^{d}}{\setminus } B_{2R}\) and \(|\nabla \varphi _R| \leqq \frac{2}{R}\). Then, we can use the definition of solution with the function \(\psi _R(x)= \varphi _R(x)^2 (|x|^2+1)\) and apply Lemma 2.10 with \(\Phi =|{\overline{\nabla }}\psi _R|\) to obtain, for all \(t\in [0,T]\) and \(n\in {\mathbb {N}}\),
For \(R\geqq 1\), we estimate, for all \((x,y)\in G\),
and observe that
Hence the first term in (2.16) is bounded by \(32 |x-y| ^2\), since \(R\geqq 1\). For the second term in (2.16), we abbreviate by setting \(r = \varphi _R(x) |x| \) and \(s = \varphi _R(y)|y| \) and compute the bound
It is easy to check that \(x\mapsto \varphi _R(x) |x| \) is globally Lipschitz, and we conclude that, for some numerical constant \(C>0\) and all \((x,y)\in G\),
Thus, by sending \(R\rightarrow \infty \) and using (A1), it follows that
By integrating the above differential inequality, we arrive at the bound
whence we conclude by taking the suprema in \(n\in {\mathbb {N}}\) and \(t\in [0,T]\). \(\square \)
Now we are ready to show compactness for the solutions to (2.12).
Proposition 2.17
(Compactness of solutions to the nonlocal continuity equation) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Moreover, suppose that the base measures \(\mu ^n\) and \(\mu \) satisfy (A1) and (A2) uniformly in n. Let \((\rho ^n,{\varvec{j}}^n) \in {{\,\mathrm{CE}\,}}_T\) for each \(n\in {\mathbb {N}}\) be such that \((\rho _0^n)_n\) satisfies \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n)< \infty \) and
Then, there exists \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) such that, up to a subsequence, as \(n\rightarrow \infty \) it holds
with \(\rho _t\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) for any \(t\in [0,T]\). Moreover, the action is lower semicontinuous along the above subsequences \((\mu ^n)_n, (\rho ^n)_n\) and \(({\varvec{j}}^n)_n\), i.e.,
Proof
We argue similarly to [22, Lemma 4.5], [23, Proposition 3.4]. For each \(n\in {\mathbb {N}}\) we define \({\varvec{j}}^n\in {\mathcal {M}}(G\times [0,T])\) as \(\text {d}{\varvec{j}}^n(x,y,t)=\text {d}{\varvec{j}}_t^n(x,y)\text {d}t\). In view of Lemma 2.16 there exists \(C_2>0\) such that \(\sup _{t\in [0,T]}\sup _{n\in {\mathbb {N}}} M_2(\rho _t^n) \leqq C_2 <+ \infty \).
For any compact sets \(K\subset G\) and \(I\subseteq [0,T]\), we apply the bound (2.11) of Corollary 2.11 and the Cauchy–Schwarz inequality to get
Thanks to Assumption (W), we have that \(\inf _{(x,y)\in K} (2\wedge |x-y|)\eta (x,y)>0\) for any compact \(K\subset G\). Hence, by (2.17), \(({\varvec{j}}^n)_n\) has total variation uniformly bounded in n on every compact set of \(G\times [0,T]\). This implies, up to a subsequence, \({\varvec{j}}^n\rightharpoonup {\varvec{j}}\) as \(n\rightarrow \infty \) in \({\mathcal {M}}_{{{\,\mathrm{loc}\,}}}(G \times [0,T])\). By the disintegration theorem, there exists a Borel family \(({\varvec{j}}_t)_{t\in [0,T]}\) such that \({\varvec{j}}(K\times I)={\int }_I {\varvec{j}}_t(K) \,\text {d}t\) for all compact sets \(I\subseteq [0,T]\) and \(K\subset G\). Thanks to the bound (2.18), the family \(\{{\varvec{j}}_t\}_{t\in [0,T]}\) still satisfies (2.14).
Now, as we need to pass to the limit in (2.13), we consider a function \(\xi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) and an interval \([t_0,t_1]\subseteq [0,T]\). The function \(\chi _{[t_0,t_1]}(t){\overline{\nabla }}\xi (x,y)\) does not have compact support in \([t_0,t_1]\times G\), so we proceed by a truncation argument. Let \(\varepsilon >0\) and set \(I_\varepsilon = [t_0+\varepsilon , t_1-\varepsilon ]\), \(N_\varepsilon = {\overline{B}}_{\varepsilon ^{-1}} \times {\overline{B}}_{\varepsilon ^{-1}}\), where \(B_{\varepsilon ^{-1}}= \left\{ x \in {\mathbb {R}}^d: |x|< \varepsilon ^{-1}\right\} \), and \(G_\varepsilon =\{(x,y)\in G:\varepsilon \leqq |x-y|\}\). Then we can find \(\varphi _\varepsilon \in C_\mathrm {c}^\infty ([t_0,t_1]\times G; [0,1])\) satisfying
so that \(\varphi _\varepsilon \rightarrow \chi _{[t_0,t_1]} \, \chi _G\) as \(\varepsilon \rightarrow 0\) and \(\varphi _\varepsilon \, \chi _{[t_0,t_1]} \, {\overline{\nabla }}\xi \) has compact support in \([t_0,t_1]\times G\). Then, thanks to Assumption (W), we get that
Now, it remains to show that
We need to estimate the terms for which \(\varphi _\varepsilon (t,x,y)<1\). First, setting \(I_\varepsilon ^\mathrm {c} = [t_0,t_1]{\setminus } I_\varepsilon \), we note that
whence, by Lemma 2.10,
Since \(4\wedge |x-y|^2 \leqq |x-y|^2\vee |x-y|^4\), we have, by Assumption (A1), the bound
Likewise, using the symmetry, we arrive at
which vanishes as \(\varepsilon \rightarrow 0\) in view of Assumption (A2). Finally, the last term is estimated again using (A1):
since \(M_2(\rho _t^n) \leqq C_2\) for any \(n\in {\mathbb {N}}\) and \(t\in [0,T]\) by Lemma 2.16.
Combining (2.20) and (2.21), we get
By means of the last convergence, the tightness of \((\rho _0^n)_n\), and (2.15) with \(\varphi (t,x)=\xi (x)\), \(t_0=0\) and \(t_1=T\), we obtain that \((\rho _t^n)_n\) locally narrowly converges to some finite nonnegative measure \(\rho _t\in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) for any \(t\in [0,T]\). In particular, for any \(\xi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) and any \(t\in [0,T]\), we have
Now, for \(R>0\), let us consider a function \(\xi _R\in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) such that \(0\leqq \xi _R\leqq 1\), \(\xi _R=1\) on \(B_R\), and \(\Vert \xi _R\Vert _{C^1}\leqq 1\). Because of the integrability condition (2.14), satisfied thanks to Corollary 2.11, we have
Hence the measure \(\rho _t\) is actually a probability measure on \({{\mathbb {R}}^{d}}\) for all \(t\in [0,T]\). Moreover, the uniform second-moment bound of Lemma 2.16 yields tightness, so the convergence is narrow and not only local. As a direct consequence of the previous considerations, \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\), and the lower semicontinuity follows from Lemma 2.9. \(\square \)
Nonlocal Upwind Transportation Quasi-Metric
Here, we give a rigorous definition of the nonlocal transportation quasi-metric we introduced in (1.8). Let us recall that \(\eta :\{ (x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y \}\rightarrow [0,\infty )\) is the weight function satisfying (W).
Definition 2.18
(Nonlocal upwind transportation cost) For \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfying Assumptions (A1) and (A2), and \(\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), the nonlocal upwind transportation cost between \(\rho _0\) and \(\rho _1\) is defined by
If \(\mu \) is clear from the context, the notation \({\mathcal {T}}\) is used in place of \({\mathcal {T}}_\mu \).
Note that Proposition 2.17 ensures the existence of minimizers of (2.22) whenever \({\mathcal {T}}_\mu <\infty \), which holds if there exists a path of finite action. Otherwise, the nonlocal upwind transportation cost is infinite. For example, suppose that the graph whose vertices are given by \(\mu \) and whose edge weights are given by \(\eta \) is disconnected, meaning that there are \(x,y\in {{\,\mathrm{supp}\,}}\mu \) for which there is no finite sequence \((x_0=x,x_1,\dots ,x_{n-1},x_n=y)\) with \(\eta (x_i,x_{i+1})>0\) for all \(i=0,\dots ,n-1\). In this case, \({\mathcal {T}}_\mu (\delta _x,\delta _y)=\infty \), since the set \({{\,\mathrm{CE}\,}}(\delta _x,\delta _y)\) of solutions to the continuity equation is empty.
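The disconnectedness criterion above is a plain reachability question on the weighted graph, so on a finite vertex set it can be checked by breadth-first search. The sketch below is illustrative only. It assumes a hypothetical finite vertex set \(\{0,\dots ,n-1\}\) standing in for \({{\,\mathrm{supp}\,}}\mu \), with the weights \(\eta \) stored as a square matrix, and the helper names are our own. A positive answer certifies \({\mathcal {T}}_\mu (\delta _x,\delta _y)=\infty \); a negative answer says nothing about finiteness of the cost.

```python
from collections import deque


def reachable(eta, x, y):
    """Breadth-first search: is there a finite chain x = x_0, x_1, ..., x_n = y
    with eta[x_i][x_{i+1}] > 0 for every consecutive pair?"""
    n = len(eta)
    seen = [False] * n
    seen[x] = True
    queue = deque([x])
    while queue:
        u = queue.popleft()
        if u == y:
            return True
        for w in range(n):
            if not seen[w] and eta[u][w] > 0:
                seen[w] = True
                queue.append(w)
    return False


def cost_certified_infinite(eta, x, y):
    """True means CE(delta_x, delta_y) is empty, hence T_mu(delta_x, delta_y) = inf.
    False only says that a chain exists; finiteness of the cost is not checked."""
    return not reachable(eta, x, y)


# Hypothetical 3-vertex example: vertices 0 and 1 are joined, vertex 2 is isolated.
eta = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(cost_certified_infinite(eta, 0, 1))  # False
print(cost_certified_infinite(eta, 0, 2))  # True
```

The search follows edges in the direction in which \(\eta \) is evaluated, matching the condition \(\eta (x_i,x_{i+1})>0\) in the text, so it also applies to nonsymmetric weights.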
Due to the one-homogeneity of the action density function \(\alpha \) in (2.5), we have the following reparametrization result, which is similar to [22, Theorem 5.4]:
Lemma 2.19
(Reparametrization) For any \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) satisfying Assumptions (A1) and (A2), and any \(\rho _0,\rho _T\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), it holds that
Now, as a consequence of the above reparametrization and Jensen’s inequality, we obtain the following result, which implies that the infimum is in fact a minimum; see [23, Proposition 4.3].
Proposition 2.20
For any \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) satisfying Assumptions (A1) and (A2), and any \(\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \({\mathcal {T}}_\mu (\rho _0,\rho _1)<\infty \), the infimum in (2.22) is attained by a curve \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho _0,\rho _1)\) such that \({\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)={\mathcal {T}}_\mu (\rho _0,\rho _1)^2\) for a.e. \(t\in [0,1]\). Such a curve is a constant-speed geodesic for \({\mathcal {T}}_\mu \), i.e.,
The next proposition establishes a link between \({\mathcal {T}}_\mu \) and the \(W_1\)-distance.
Proposition 2.21
(Comparison with \(W_1\)) Let \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) satisfy (A1) for some \(C_\eta >0\) (depending only on \(\mu \) and \(\eta \)). Then for any \(\rho ^0,\rho ^1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) it holds
Proof
By a standard regularization argument and the truncation procedure as in the proof of Lemma 2.16, we can actually consider any 1-Lipschitz function \(\psi \) as a test function in the weak formulation (2.13) for some \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)\). Then we can estimate, by Lemma 2.10 and Assumption (A1),
Taking the supremum over all 1-Lipschitz functions and the infimum over all \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)\) gives the result. \(\square \)
The results above show that \({\mathcal {T}}_\mu \) is an extended (meaning that it can take the value \(\infty \)) quasimetric on the set of probability measures, which induces a topology stronger than the \(W_1\)-topology.
Theorem 2.22
Let \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) satisfy Assumptions (A1) and (A2). The nonlocal upwind transportation cost \({\mathcal {T}}_\mu \) defines an extended quasimetric on \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\). The map \((\rho _0,\rho _1)\mapsto {\mathcal {T}}_\mu (\rho _0,\rho _1)\) is lower semicontinuous with respect to narrow convergence. The topology induced by \({\mathcal {T}}_\mu \) is stronger than the \(W_1\)-topology and the narrow topology. In particular, bounded sets are narrowly relatively compact in \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\).
Proof
If \({\mathcal {T}}_\mu (\rho _0,\rho _1)=0\), then, along an optimal curve given by Proposition 2.20, \({\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)=0\) for a.e. \(t\in [0,1]\). Hence \({\varvec{j}}_t \equiv 0\) \(\gamma _t\)-a.e., which implies that \(\rho _0=\rho _1\) by the nonlocal continuity equation (2.15). The triangle inequality is a consequence of Lemma 2.19 and the fact that solutions to the nonlocal continuity equation can be concatenated. The lower semicontinuity and compactness properties of \({\mathcal {T}}_\mu \) are inherited from the action functional \({\mathcal {A}}\) via Proposition 2.17. In view of the comparison with \(W_1\) from Proposition 2.21, the topology induced by \({\mathcal {T}}_\mu \) is stronger than that induced by \(W_1\) and than the narrow topology. \(\square \)
The next lemma provides a quantitative illustration of asymmetry of \({\mathcal {T}}\).
Lemma 2.23
(Two-point space) Let us consider the two-point graph \(\Omega :=\{0,1\}\), with \(\eta (0,1)=\eta (1,0)=\alpha >0\), \(\mu (0)=p>0\) and \(\mu (1)=q>0\). Let \(\rho ,\nu \in {\mathcal {P}}_2(\Omega )\) and let \(\rho _0, \rho _1, \nu _0, \nu _1 \in [0,1]\) be such that \(\rho =\rho _0\delta _0+\rho _1\delta _1\) and \(\nu =\nu _0\delta _0+\nu _1\delta _1\). There holds
Proof
Let us fix \(\lambda =\delta _{(0,1)}+\delta _{(1,0)}\) and notice that \(\rho _0+\rho _1=1\) and \(\nu _0+\nu _1=1\) as \(\rho ,\nu \) are probability measures. Since \(\Omega =\{0,1\}\), note that for any curve \(t\in [0,1]\mapsto \rho _t\in {\mathcal {P}}_2(\Omega )\) there exists a function \(g:t\in [0,1]\mapsto g_t\in [0,1]\) accounting for the mass displacement. Thus, we notice that \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ,\nu )\) if
Hence, using that \({\varvec{j}}_t\) is antisymmetric yields
Now, let us assume without loss of generality that \(\rho _0 < \nu _0\). In this configuration we can restrict the above infimum to nondecreasing g, since these give a lower action. Therefore, by applying Jensen’s inequality, we have
The equality case is obtained by noting that the solution to \(\frac{\text {d}}{\text {d}t} \sqrt{1-g_t}=\sqrt{\nu _1}-\sqrt{\rho _1}\) for all \(t\in [0,1]\), with consistent boundary values \(g_0=\rho _0\) and \(g_1=\nu _0\), is given by \(g_t = 1-\bigl (\sqrt{\rho _1}(1-t)+\sqrt{\nu _1} t \bigr )^2\). The case \(\nu _0<\rho _0\) is treated in a similar manner, which gives formula (2.23). \(\square \)
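The optimal path from the proof can be checked numerically. In the minimal sketch below, which is not part of the original argument, the function names and the data \(\rho _0=0.2\), \(\nu _0=0.7\) are hypothetical. We assume, consistently with the Jensen step above, that for nondecreasing g the action is a constant multiple (depending only on \(\alpha \) and p) of \(\bigl (\frac{\text {d}}{\text {d}t}\sqrt{1-g_t}\bigr )^2\). Under this assumption the discretized cost of \(g_t = 1-\bigl (\sqrt{\rho _1}(1-t)+\sqrt{\nu _1}t\bigr )^2\) equals \((\sqrt{\nu _1}-\sqrt{\rho _1})^2\), while a linear transfer of mass costs strictly more.

```python
import math


def geodesic_mass(t, rho1, nu1):
    """Mass at vertex 0 along the candidate optimal path of Lemma 2.23:
    g_t = 1 - (sqrt(rho1) * (1 - t) + sqrt(nu1) * t) ** 2."""
    s = math.sqrt(rho1) * (1.0 - t) + math.sqrt(nu1) * t
    return 1.0 - s * s


def reduced_action(g, n=2000):
    """Uniform-grid discretization of int_0^1 (d/dt sqrt(1 - g_t))^2 dt.

    ASSUMPTION: for nondecreasing g this is proportional to the action of the
    path, with a constant depending only on alpha and p."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        a = math.sqrt(1.0 - g(k * h))
        b = math.sqrt(1.0 - g((k + 1) * h))
        total += (b - a) ** 2 / h
    return total


# Hypothetical data with rho0 < nu0, so mass flows from vertex 1 to vertex 0.
rho0, nu0 = 0.2, 0.7
rho1, nu1 = 1.0 - rho0, 1.0 - nu0


def geo(t):
    return geodesic_mass(t, rho1, nu1)


def lin(t):
    # Competitor path: linear transfer of mass to vertex 0.
    return (1.0 - t) * rho0 + t * nu0


print(geo(0.0), geo(1.0))  # rho0 and nu0, up to rounding
print(reduced_action(geo), (math.sqrt(nu1) - math.sqrt(rho1)) ** 2)
print(reduced_action(lin))  # strictly larger than the value for geo
```

Because the increments of \(\sqrt{1-g_t}\) are equal along the candidate path, the discrete Cauchy–Schwarz inequality makes it the minimizer of the discretized functional among all paths with the same endpoints, mirroring the Jensen argument.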
Remark 2.24
The quasimetric is in general already nonsymmetric on the two-point space, as can best be seen in Fig. 3. In the case \(p=\frac{1}{2}\), swapping the two vertices, i.e., setting \({\hat{\rho }}_0 = \rho _1\), \({\hat{\rho }}_1 = \rho _0\) and analogously for \(\nu \), preserves the quasidistance: \({\mathcal {T}}(\rho ,\nu )= {\mathcal {T}}({\hat{\rho }},\hat{\nu })\).
We now adapt the standard definition of absolutely continuous curves in metric spaces from [2, Chapter 1] to our setting. Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfy Assumptions (A1) and (A2). A curve \([0,T]\ni t\mapsto \rho _t\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is said to be 2-absolutely continuous with respect to \({\mathcal {T}}_\mu \) if there exists \(m\in L^2((0,T))\) such that
In this case, we write \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )\). For any \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )\) the quantity
is well-defined for a.e. \(t\in [0,T]\) and is called the metric derivative of \(\rho \) at t. Moreover, the function \(t\mapsto \rho '(t)\) belongs to \(L^2((0,T))\) and satisfies \(\rho '(t)\leqq m(t)\) for a.e. \(t\in [0,T]\) and every m as in (2.24), which means that \(\rho '\) is the minimal integrand satisfying (2.24). The length of a curve \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )\) is defined by \(L(\rho ):={\int }_0^T\rho '(t)\,\text {d}t\).
Proposition 2.25
(Metric velocity) Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfy Assumptions (A1) and (A2). A curve \((\rho _t)_{t\in [0,T]}\subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) belongs to \({{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) if and only if there exists a family \(({\varvec{j}}_t)_{t\in [0,T]}\) such that \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) and
In this case, the metric derivative satisfies \(\rho '^{2}(t)\leqq {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). In addition, there exists a unique family \((\tilde{{\varvec{j}}}_t)_{t\in [0,T]}\) such that \((\rho ,\tilde{{\varvec{j}}})\in {{\,\mathrm{CE}\,}}_T\) and
Furthermore, the previous identity holds if and only if \(\tilde{{\varvec{j}}}_t\in T_{\rho }{\mathcal {P}}_2({{\mathbb {R}}^{d}})\) for a.e. \(t\in [0,T]\), where
with \({\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G)\) defined in (2.9), and \({\mathcal {M}}_{{{\,\mathrm{div}\,}}}(G)\) the set of nonlocal divergence-free fluxes, that is
Proof
The first statement on the characterization of absolutely continuous curves as curves of finite action follows from [22, Theorem 5.17], in view of Lemma 2.19 and Propositions 2.17 and 2.20. Let us now show that (2.26) holds if and only if \({\tilde{{\varvec{j}}}}_t\) belongs to \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), given by (2.27), for a.e. \(t\in [0,T]\). Let \(t\in [0,T]\) be such that \({\varvec{j}}_t\) satisfies \({\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) <+ \infty \). Due to Corollary 2.8, the element \({\tilde{{\varvec{j}}}}_t\) of minimal action satisfying (2.26) is characterized by \(\partial _t \rho _t + {\overline{\nabla }}\cdot {\varvec{j}}_t = 0 = \partial _t \rho _t +{\overline{\nabla }}\cdot {\tilde{{\varvec{j}}}}_t\), that is,
Here, recalling the notation for the Jordan decomposition of a measure from Section 2.2, we use that the functional \({\varvec{j}}\mapsto {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})\) is strictly convex on the set of \({\varvec{j}}\in {\mathcal {M}}(G)\) such that \({\varvec{j}}^+ \ll \rho \otimes \mu \) and \({\varvec{j}}^- \ll \mu \otimes \rho \), which is guaranteed above since \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) < \infty \) and \({\varvec{j}}\in {\mathcal {M}}^{\mathrm {as}}_{\gamma _1}(G)\). Next, we observe that the set \(\{{\varvec{j}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G):{\overline{\nabla }}\cdot {\varvec{j}}={\overline{\nabla }}\cdot {\varvec{j}}_t\}\) is closed with respect to narrow convergence. In addition, the estimate (2.10) from Lemma 2.10 with \(\Phi (x,y) = |x-y|\vee |x-y|^2\) gives
showing, as in the proof of Proposition 2.17, that the sublevel sets of \({\varvec{j}}\mapsto {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}})\) are locally relatively compact with respect to narrow convergence. Hence the element \({\tilde{{\varvec{j}}}}_t\) is well-defined by the direct method of the calculus of variations. \(\square \)
We defined the tangent space \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) in (2.27) using the nonlocal fluxes \({\varvec{j}}\). We note that this is in some sense a nonlocal, Lagrangian description of the tangent vectors, and that the relationship between this Lagrangian description and the Eulerian description is the nonlocal continuity equation
which is satisfied in the weak sense. This provides a useful heuristic, but as for classical Wasserstein gradient flows [2] the precise, rigorous definition of the tangent space is in Lagrangian form; we note, however, that here we use fluxes instead of velocities. This is not just a superficial difference. Namely, as can be seen in Proposition 2.26, the relation between velocities and fluxes is not linear and thus the velocities do not provide a linear parametrization of the tangent space. We use the argument from [22, Theorem 5.21] to characterize the tangent space \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) in more detail.
Proposition 2.26
(Tangent fluxes have almost gradient velocities) Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfy Assumptions (A1) and (A2), and \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then, it holds that \({\varvec{j}}\in T_{\rho }{\mathcal {P}}_2({{\mathbb {R}}^{d}})\) if and only if \({\varvec{j}}\in {\mathcal {M}}(G)\) with \({\varvec{j}}^+\ll \gamma _1\), \({\varvec{j}}^- \ll \gamma _2\), and \(v^+:=\frac{\text {d}{\varvec{j}}^+}{\text {d}\gamma _1}\), \(v^- :=\frac{\text {d}{\varvec{j}}^-}{\text {d}\gamma _2}\) satisfy, for \(v:=v^+ - v^-:G\rightarrow {\mathbb {R}}\), the relation
Proof
If \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty \), then by Lemma 2.6 it holds for some \(v\in {\mathcal {V}}^{\mathrm {as}}(G)\) that
where \(\gamma _+ = \gamma _1|_{J^+}\), with \(J^+ = {{\,\mathrm{supp}\,}}{\varvec{j}}^+\), and we used that \((J^+)^\top = {{\,\mathrm{supp}\,}}{\varvec{j}}^-\). Then, recalling the definition of the norm on \(L^2(\eta \,\gamma _1)\) from (2.1),
By using the relation between \({\varvec{j}}\) and v from above, we can rewrite the divergence \({\overline{\nabla }}\cdot {\varvec{j}}\) in weak form for any \(\psi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\):
Now, the characterization (2.27) of \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is equivalent to
Hence \(v^+\) belongs to the closure of \(\{{\overline{\nabla }}\varphi : \varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}) \}\) in \(L^2(\eta \,\gamma _+)\). From the antisymmetry of v, it follows that \(v^-\) belongs to the closure of \(\{{\overline{\nabla }}\varphi : \varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}) \}\) in \(L^2(\eta \,\gamma _-)\). Thus, the conclusion follows from the identity \(\gamma _+ + \gamma _+^\top = {\hat{\gamma }}^v\) on G. \(\square \)
Remark 2.27
Proposition 2.26 shows that for \(\mu \) as in its statement, \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\) chosen from a dense subset of \(T_{\rho }{\mathcal {P}}_2({{\mathbb {R}}^{d}})\), there exists a measurable \(\varphi :{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) such that we have the identity
Finally, we provide an interesting property of absolutely continuous curves.
Proposition 2.28
(Absolutely continuous curves stay supported on \(\mu \)) Let \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfy Assumptions (A1) and (A2) and \(\rho \in {{\,\mathrm{AC}\,}}([0,T],({\mathcal {P}}_2({\mathbb {R}}^d),{\mathcal {T}}_\mu ))\) be such that \({{\,\mathrm{supp}\,}}\rho _0 \subseteq {{\,\mathrm{supp}\,}}\mu \). Then, for all \(t\in [0,T]\), it holds \({{\,\mathrm{supp}\,}}\rho _t\subseteq {{\,\mathrm{supp}\,}}\mu \).
Proof
Since \((\rho _t)_{t\in [0,T]}\) is absolutely continuous, there exists by Proposition 2.25 a unique family \(({\varvec{j}}_t)_{t\in [0,T]}\) such that \((\rho ,{\varvec{j}}) \in {{\,\mathrm{CE}\,}}_T\) and \({\varvec{j}}_t \in T_{\rho _t}{\mathcal {P}}_2({\mathbb {R}}^d)\subseteq {\mathcal {M}}^{\mathrm {as}}_{\gamma _{1,t}}(G)\), where \(\gamma _{1,t} = \rho _t \otimes \mu \), and \(\rho _t' ^2= {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). In particular, by Lemma 2.6, there exists a measurable family \((v_t)_{t\in [0,T]}\subset {\mathcal {V}}^{\mathrm {as}}(G)\) such that
Without loss of generality, let \((\rho _t)_{t\in [0,T]}\) be the weakly continuous curve from Lemma 2.15 satisfying, for any test function \(\varphi \in C_\mathrm {c}^\infty ({\mathbb {R}}^d)\) and \(t\in [0,T]\),
Now, let \(\varphi \in C_\mathrm {c}^\infty ({\mathbb {R}}^d)\) with \(\varphi \geqq 0\) and \({{\,\mathrm{supp}\,}}\varphi \subseteq {\mathbb {R}}^d {\setminus } {{\,\mathrm{supp}\,}}\mu \). Then, for all \(t\in [0,T]\), it holds
which implies that \({{\,\mathrm{supp}\,}}\rho _t \subseteq {{\,\mathrm{supp}\,}}\mu \), since \(\rho _t \in {\mathcal {P}}({\mathbb {R}}^d)\) is in particular a nonnegative measure for all \(t\in [0,T]\) by Lemma 2.15. \(\square \)
Nonlocal Nonlocal-Interaction Equation
In this section we consider gradient flows in the spaces of probability measures \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) endowed with the nonlocal transportation quasimetric \({\mathcal {T}}_\mu \), defined by (2.22). From now until Section 3.4 (excluded) we fix \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfying (A1) and (A2), unless otherwise specified. For this reason we shall use the simplifications \({\mathcal {A}}(\rho ,{\varvec{j}})\) for \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})\) and \({\mathcal {T}}\) for \({\mathcal {T}}_\mu \).
Here we investigate the nonlocal nonlocal-interaction equation (\({\text {NL}}^2 {\text {IE}}\)) as a gradient flow with respect to the quasimetric \({\mathcal {T}}\). We restate it in one-line form and note that from now on we consider the external potential \(P \equiv 0\). The extension to \(P \not \equiv 0\) is straightforward; see Remark 3.2. Thus,
In the classical setting of gradient flows in the spaces of probability measures endowed with the Wasserstein metric [2, 10], the nonlocal-interaction equation
is the gradient flow of the nonlocal-interaction energy
We start by discussing the geometry of (\({\text {NL}}^2 {\text {IE}}\)) and interpret it as the gradient flow of (3.2) in the infinite-dimensional Finsler manifold of measures endowed with the Finsler metric associated to \({\mathcal {T}}\). Following this, we develop a framework of gradient flows in the quasimetric space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\), which extends the setup of gradient flows in metric spaces [2] to quasimetric spaces. In particular, we build the existence theory for (\({\text {NL}}^2 {\text {IE}}\)) on this approach.
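For a finitely supported measure \(\rho =\sum _i w_i\delta _{x_i}\), the interaction energy (3.2) reduces to a double sum. The sketch below assumes the usual normalization \({\mathcal {E}}(\rho )=\frac{1}{2}\iint K(x,y)\,\text {d}\rho (x)\,\text {d}\rho (y)\). The kernel \(K(x,y)=|x-y|\) on \({\mathbb {R}}\) and the function names are hypothetical examples; this kernel is continuous, symmetric and globally Lipschitz, so it satisfies the assumptions (K1)–(K3) listed later in this section.

```python
def interaction_energy(points, weights, K):
    """E(rho) = 1/2 * sum_{i,k} K(x_i, x_k) w_i w_k for rho = sum_i w_i delta_{x_i}.

    ASSUMPTION: normalization E(rho) = 1/2 * double integral of K d rho d rho."""
    total = 0.0
    for xi, wi in zip(points, weights):
        for xk, wk in zip(points, weights):
            total += K(xi, xk) * wi * wk
    return 0.5 * total


def distance_kernel(x, y):
    """Hypothetical example kernel K(x, y) = |x - y| on the real line."""
    return abs(x - y)


rho_points = [0.0, 1.0]
rho_weights = [0.5, 0.5]
print(interaction_energy(rho_points, rho_weights, distance_kernel))  # 0.25
```

The double loop costs \(O(n^2)\) kernel evaluations, which is adequate for illustration; for the equation itself only the potential \(K*\rho \) at each support point is needed.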
Above, for simplicity, (\({\text {NL}}^2 {\text {IE}}\)) was written for \(\rho \ll \mu \), where we recall that we used the notation \(\rho \) to denote both the measure and the density with respect to \(\mu \). Our framework, however, also applies to the case when \(\rho \) is not absolutely continuous with respect to \(\mu \). The general weak form of (\({\text {NL}}^2 {\text {IE}}\)) is obtained in terms of the nonlocal continuity equation as introduced in Section 2.3. Specifically, we have
Definition 3.1
A curve \(\rho :[0,T]\rightarrow {\mathcal {P}}_2({\mathbb {R}}^d)\) is called a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) if, for the flux \({\varvec{j}}:[0,T]\rightarrow {\mathcal {M}}(G)\) defined by
the pair \((\rho ,{\varvec{j}})\) is a weak solution to the continuity equation
according to Definition 2.14.
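To make Definition 3.1 concrete on a finite graph, here is a minimal explicit Euler sketch. It is not a scheme taken from this paper, which constructs solutions as curves of maximal slope. The sketch assumes:

- finitely many vertices \(x_i\) with base-measure weights \(\mu _i\) and symmetric edge weights \(\eta _{ik}\);
- \(\rho \) given as a density with respect to \(\mu \);
- the nonlocal gradient \({\overline{\nabla }}\varphi (x,y)=\varphi (y)-\varphi (x)\);
- the upwind flux \(j(x,y)=\rho (x)v(x,y)_+-\rho (y)v(x,y)_-\) with velocity \(v=-{\overline{\nabla }}(K*\rho )\);
- for antisymmetric j, the divergence \(({\overline{\nabla }}\cdot j)(x_i)=\sum _k j(x_i,x_k)\eta _{ik}\mu _k\).

The function names, the path graph and the data are hypothetical.

```python
def potential(rho, mu, xs, K):
    """(K * rho)(x_i) = sum_k K(x_i, x_k) rho_k mu_k, with rho a density w.r.t. mu."""
    return [sum(K(xi, xk) * rk * mk for xk, rk, mk in zip(xs, rho, mu)) for xi in xs]


def upwind_flux(rho, phi, i, k):
    """Assumed upwind flux j(x_i, x_k) = rho_i v_+ - rho_k v_-, with v = -(phi_k - phi_i)."""
    v = -(phi[k] - phi[i])
    return rho[i] * max(v, 0.0) - rho[k] * max(-v, 0.0)


def energy(rho, mu, xs, K):
    """Discrete interaction energy 1/2 sum_i (K * rho)(x_i) rho_i mu_i."""
    phi = potential(rho, mu, xs, K)
    return 0.5 * sum(p * r * m for p, r, m in zip(phi, rho, mu))


def euler_step(rho, mu, eta, xs, K, cfl=0.5):
    """One explicit Euler step of d/dt rho_i = -sum_k j(x_i, x_k) eta_ik mu_k.

    The step size dt = cfl / max_i(outflow rate per unit density) guarantees that
    vertex i loses at most cfl * rho_i, so rho stays nonnegative for cfl <= 1."""
    n = len(xs)
    phi = potential(rho, mu, xs, K)
    div = [sum(upwind_flux(rho, phi, i, k) * eta[i][k] * mu[k] for k in range(n))
           for i in range(n)]
    out_rate = [sum(max(phi[i] - phi[k], 0.0) * eta[i][k] * mu[k] for k in range(n))
                for i in range(n)]
    fastest = max(out_rate)
    dt = cfl / fastest if fastest > 0.0 else 0.0
    return [r - dt * d for r, d in zip(rho, div)], dt


def distance_kernel(x, y):
    """Hypothetical kernel K(x, y) = |x - y|, which satisfies (K1)-(K3)."""
    return abs(x - y)


# Hypothetical path graph: 5 equispaced points in [0, 1], uniform base measure,
# unit weights between nearest neighbours only.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
mu = [0.2] * 5
eta = [[1.0 if abs(i - k) == 1 else 0.0 for k in range(5)] for i in range(5)]
rho = [1.0, 2.0, 1.0, 0.5, 0.5]  # density w.r.t. mu, total mass 1

energies = [energy(rho, mu, xs, distance_kernel)]
for _ in range(50):
    rho, dt = euler_step(rho, mu, eta, xs, distance_kernel)
    energies.append(energy(rho, mu, xs, distance_kernel))

print("mass:", sum(r * m for r, m in zip(rho, mu)))
print("energy:", energies[0], "->", energies[-1])
```

Mass is conserved because the upwind flux is antisymmetric and \(\eta \) is symmetric. With the kernel \(|x-y|\), which is conditionally negative definite, each step in this example also cannot increase the discrete energy, in line with the gradient-flow structure discussed below.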
Here we list the assumptions on the interaction kernel \(K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) we refer to throughout this section:

(K1) \(K\in C({{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}})\);

(K2) K is symmetric, i.e., \(K(x,y)=K(y,x)\) for all \((x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\);

(K3) K is L-Lipschitz near the diagonal and at most quadratic far away, that is, there exists some \(L\in (0,\infty )\) such that, for all \((x,y),(x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\),
$$\begin{aligned} |K(x,y)-K(x',y')|\leqq L\left( |(x,y)-(x',y')|\vee |(x,y)-(x',y')|^2\right) . \end{aligned}$$
Remark 3.2
Assumption (K3) implies that, for some \(C >0\) and all \(x,y\in {{\mathbb {R}}^{d}}\),
indeed, for fixed \((x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\), (K3) yields
and bounding the maximum (\(\vee \)) by the sum, we arrive at \(|K(x,y)| \leqq L +2 L \left( |(x',y')|^2 + |(x,y)|^2\right) + |K(x',y')| \), which gives (3.3) with \(C=2L\bigl (1+|(x',y')|^2\bigr ) + |K(x',y')| \). Note also that the bound (3.3) implies that \({\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}\) is proper with domain equal to \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\).
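Spelled out with \(z=(x,y)\), \(z'=(x',y')\) and \(a=|z-z'|\), the chain of elementary estimates behind this growth bound reads as follows. Besides (K3), it only uses \(a\vee a^2\leqq 1\vee a^2\leqq 1+a^2\), \(|z-z'|^2\leqq 2|z|^2+2|z'|^2\) and \(|z|^2=|x|^2+|y|^2\):
$$\begin{aligned} |K(x,y)| &\leqq |K(x',y')| + L\,\bigl (a\vee a^2\bigr ) \leqq |K(x',y')| + L\,\bigl (1+a^2\bigr ) \\ &\leqq |K(x',y')| + L + 2L\bigl (|z|^2+|z'|^2\bigr ) \leqq \Bigl (2L\bigl (1+|z'|^2\bigr )+|K(x',y')|\Bigr )\bigl (1+|x|^2+|y|^2\bigr ). \end{aligned}$$
The last step holds because the constant \(C=2L(1+|z'|^2)+|K(x',y')|\) dominates both \(L+2L|z'|^2+|K(x',y')|\) and \(2L\).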
As mentioned previously, the theory in this section can be easily extended to energies of the form (1.5) including potential energies \({\mathcal {E}}_P(\rho )={\int }_{{\mathbb {R}}^{d}}P \,\text {d}{\rho }\) for some external potential \(P:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) satisfying a local Lipschitz condition with at-most-quadratic growth at infinity; that is, similarly to (K3), there exists \(L\in (0,\infty )\) so that for all \(x,y\in {\mathbb {R}}^d\) we have
We now show that, under the above assumptions on the interaction potential K, we have narrow continuity of the energy.
Proposition 3.3
(Continuity of the energy) Let the interaction potential K satisfy Assumptions (K1)–(K3). Then, for any sequence \((\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \(\rho ^n \rightharpoonup \rho \) as \(n\rightarrow \infty \) for some \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), we have
Proof
Let \((\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) be such that \(\rho ^n \rightharpoonup \rho \) as \(n\rightarrow \infty \). For all \(R>0\), we denote by \({{\overline{B}}}_R\) the closed ball of radius R centered at the origin in \(({{\mathbb {R}}^{d}})^2\), and let \(\varphi _R :({{\mathbb {R}}^{d}})^2 \rightarrow {\mathbb {R}}\) be a continuous function such that \(\varphi _R(z) = 1\) for all \(z\in {{\overline{B}}}_R\), \(\varphi _R(z) = 0\) for all \(z\in ({{\mathbb {R}}^{d}})^2 {\setminus } {{\overline{B}}}_{2R}\), and \(|\varphi _R(z)| \leqq 1\) for all \(z\in ({{\mathbb {R}}^{d}})^2\). For all \(R>0\), we then set \(K_R = \varphi _R K\) and
Since \((\rho ^n)_n\) converges narrowly to \(\rho \) as \(n\rightarrow \infty \) and \(K_R\) is bounded and continuous, we get
Furthermore, since \(K_R \rightarrow K\) pointwise as \(R\rightarrow \infty \), \(|K_R| \leqq |K|\) for all \(R>0\), the domain of \({\mathcal {E}}\) is \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), we also have
by the Lebesgue dominated convergence theorem. Similarly, we also have
By a diagonal argument, we deduce the result. \(\square \)
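For concreteness, one admissible cutoff in the proof above is the piecewise linear function below; any continuous function with the listed properties works equally well.
$$\begin{aligned} \varphi _R(z) := \min \Bigl \{1,\ \max \Bigl \{0,\ 2-\frac{|z|}{R}\Bigr \}\Bigr \},\qquad z\in ({{\mathbb {R}}^{d}})^2, \end{aligned}$$
This function equals 1 for \(|z|\leqq R\), vanishes for \(|z|\geqq 2R\), is continuous, and takes values in \([0,1]\); in particular, \(|K_R|\leqq |K|\).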
Identification of the Gradient in Finsler Geometry
Since the nonlocal upwind transportation cost \({\mathcal {T}}\) is only a quasimetric, the underlying structure of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) does not have the formal Riemannian structure as it does in the classical gradient flow theory, but a Finslerian structure instead. This highlights the fact that at every point \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) the tangent space \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is not a Euclidean space, but rather a manifold in its own right.
In this section we provide calculations, in the spirit of Otto’s calculus, that characterize the gradient descent in the infinite-dimensional Finsler manifold of probability measures endowed with the nonlocal transportation quasimetric \({\mathcal {T}}\). To keep the following considerations simple, we assume that \(\rho \) is a given probability measure which is absolutely continuous with respect to \(\mu \). In this way, we avoid the need to introduce yet another measure \(\lambda \in {\mathcal {M}}^+(G)\) with respect to which all of the occurring measures are absolutely continuous, similar to how we proceeded in Definition 2.3 for the action. This restriction is made solely to make the presentation clearer and to highlight the geometric structure. Hence any flux \({\varvec{j}}\) of interest is absolutely continuous with respect to \(\mu \otimes \mu \), and we can think of \({\varvec{j}}\) via its density with respect to \(\mu \otimes \mu \), which we shall denote by j (using a letter which is not bold).
At every tangent flux \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) we define an inner product \(g_{\rho ,{\varvec{j}}}:T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \times T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow {\mathbb {R}}\) by
where \(\{j>0\}\) is an abbreviation for \(\{(x,y) \in G :j(x,y)>0\}\), and similarly for \(\{j<0\}\). The ratios are well-defined since \(\rho \) cannot be zero where j is not zero. We note that this is the bilinear form that corresponds to the quadratic form defining the action (see Definition 2.3 and Remark 2.5); namely,
We refer the reader to “Appendix A” for a derivation of this inner product from a Minkowski norm on \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), as is required in Finsler geometry. We recall from Proposition 2.26 that a dense subset of tangent fluxes \({\varvec{j}}\) is characterized by the existence of a potential \(\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) such that, for \(\mu \otimes \mu \)-a.e. \((x,y) \in G\),
In this Finsler setting, we now want to determine the direction of steepest descent from \(\rho \), for the underlying energy defined in (3.2). The gradient vector of some energy \({\mathcal {E}}:{\mathcal {P}}({\mathbb {R}}^d)\rightarrow {\mathbb {R}}\) at \(\rho \), which we denote by \({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\), is defined as the tangent vector which satisfies
provided this vector exists and is unique. Here, we use the continuity equation Definition 2.14 to define variations via
where \({\tilde{\rho }}\) is any curve such that \({\tilde{\rho }}_0= \rho \) and \(\left. \frac{\text {d}}{\text {d}t}\right|_{t=0}{\tilde{\rho }}_t = -{\overline{\nabla }}\cdot {\varvec{j}}\). From Definition 2.7, due to the \(\mu \otimes \mu \)-absolute continuity of \({\varvec{j}}\), we have that
In the case when \({\mathcal {M}}\) is a finite-dimensional Finsler manifold, such a gradient vector exists and is unique since the mapping \(\ell :T_\rho {\mathcal {M}}\rightarrow (T_{\rho }{\mathcal {M}})^*,\, {\varvec{j}}\mapsto g_{\rho ,{\varvec{j}}}({\varvec{j}},\cdot )\), is a bijection; see [18, Proposition 1.9]. For further details on Finsler geometry, we refer the reader to [4, 49]. In our case, we can at least claim that the functional \(\ell _\rho :T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow (T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}))^*\), given for \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) by
is injective \(\eta \, \mu \otimes \mu \)-a.e.; that is, the existence of a gradient implies its uniqueness (\(\eta \, \mu \otimes \mu \)-a.e.), in which case we have
To see the injectivity of (3.6), we first note that \(\ell _\rho \) is positively 1-homogeneous by definition. Moreover, we have the following one-sided version of a Cauchy–Schwarz-type estimate
Here, we also used that \(\sqrt{ab}+\sqrt{cd}\leqq \sqrt{(a+c)(b+d)}\) for all \(a,b,c,d>0\). Note that the above inequalities become strict if either of the integrands \(j_2(x,y)_+ j(x,y)_-\) or \(j_2(x,y)_- j(x,y)_+\) contributes. In particular, we could have \(\ell _\rho ({\varvec{j}})({\varvec{j}}_2)=-\infty \) although the right-hand side is finite. Despite this, we still have equality in (3.7) if and only if \({\varvec{j}}_2 = \beta {\varvec{j}}_1\) \(\eta \, \mu \otimes \mu \)-a.e. for some \(\beta \geqq 0\).
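The elementary inequality used above is a consequence of the arithmetic-geometric mean inequality \(2\sqrt{ad}\sqrt{bc}\leqq ad+bc\):
$$\begin{aligned} \bigl (\sqrt{ab}+\sqrt{cd}\bigr )^2 = ab+cd+2\sqrt{ad}\,\sqrt{bc} \leqq ab+cd+ad+bc = (a+c)(b+d), \end{aligned}$$
with equality if and only if \(ad=bc\), that is, when \((a,c)\) and \((b,d)\) are proportional.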
To prove the injectivity of \(\ell _\rho \), let us suppose that \({\varvec{j}}_1, {\varvec{j}}_2 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) are such that \(\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)\). If \({\varvec{j}}_1 = 0\) or \({\varvec{j}}_2 = 0\) \(\eta \, \mu \otimes \mu \)-a.e., then \(\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)\) implies that \({\varvec{j}}_1 = {\varvec{j}}_2 = 0\). If both \({\varvec{j}}_1\) and \({\varvec{j}}_2\) are nonzero, then by the above Cauchy–Schwarz inequality we get
which, after dividing by \(\sqrt{g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)}\) yields \(g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) \leqq g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)\). Similarly, one gets \(g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1) \leqq g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)\), from which we get
Hence
which is the equality case in the Cauchy–Schwarz inequality. Therefore, there exists \(\beta \geqq 0\) such that \({\varvec{j}}_2 = \beta {\varvec{j}}_1\). By positive 1-homogeneity of \(\ell _\rho \), we get \(\ell _\rho ({\varvec{j}}_2) = \ell _\rho (\beta {\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_2)\), so that \(\beta = 1\), since \(\ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) \ne 0\). This completes the proof that \(\ell _\rho \) is injective.
The direction of steepest descent on Finsler manifolds is in general not \({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\), but is defined to be the tangent flux, which we denote by \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\), such that
In other words, we define \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\) as the tangent vector (provided it exists) such that
Here we clearly see that in general \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho ) \ne {{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\), since \(\ell _\rho \) is not negatively 1-homogeneous. We can justify that \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\) indeed corresponds to the direction of steepest descent at \(\rho \) via the following criterion, which is analogous to the Riemannian case. We first note that if \({{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}= 0\), then \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )=0\). If \({{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\ne 0\), we note that minimizers \({\varvec{j}}^*\) of
are of the form \({\varvec{j}}^* = \beta {{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\) for some \(\beta >0\). Indeed, using the fact that \(\left. \frac{\text {d}}{\text {d}s}\right|_{s=0}g_{\rho ,{\varvec{j}}+ s{\varvec{j}}_1}({\varvec{j}}+s{\varvec{j}}_1,{\varvec{j}}+ s{\varvec{j}}_1) = 2g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}_1)\) for all \({\varvec{j}},{\varvec{j}}_1\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) [as shown in (A.1) of “Appendix A”] and using the Lagrange multiplier \(\beta \) and the functional
yields, for a constrained minimizer \({\varvec{j}}^*\), the condition
By the definition of \({\varvec{j}}^*\), we have \(0> {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}^*] = -\beta ^* g_{\rho ,{\varvec{j}}^*}({\varvec{j}}^*,{\varvec{j}}^*)\), which implies that \(\beta ^*>0\). By injectivity and positive 1-homogeneity of \(\ell _\rho \), we get
The gradient flows with respect to \({\mathcal {E}}\) in the Finsler space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\) can thus be written
These considerations stay valid for general energy functionals \({\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}\).
Let us compute the gradient flux for the specific case of the interaction energy (3.2). A direct computation using the symmetry of K and Definition 2.7 gives, for all \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\),
where, by comparison with (3.6), we observe that \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\) is given for \(\mu \otimes \mu \)-a.e. \((x,y) \in G\) by
This shows, by (3.8), the existence and, by our previous argument, also the uniqueness of \({{\,\mathrm{grad}\,}}^{-} {\mathcal {E}}(\rho )\). It is easily observed that it has exactly the form (3.5) with the corresponding potential given by \(\varphi = K*\rho \).
We conclude this section by mentioning that the Finsler gradient flow structure of differential equations has been discovered and investigated in other systems; see [1, 41, 42].
Variational Characterization for the Nonlocal Nonlocal-Interaction Equation
Section 3.1 shows that the nonlocal nonlocal-interaction equation (\({\text {NL}}^2 {\text {IE}}\)) can in fact be written as the gradient descent of the energy \({\mathcal {E}}\) according to the Finsler gradient operator; see (3.10) and (3.11). This is why we refer to weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) as gradient flows.
In this section we consider \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\) as a quasimetric space rather than a Finsler manifold, which allows us to prove rigorous statements more easily. In particular, we show that the weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) are curves of maximal slope for the energy (3.2) in the quasimetric space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\), and vice versa. We then establish the existence and stability of gradient flows using the variational framework of curves of maximal slope. To develop the variational formulation, we adapt the approach of [2] for curves of maximal slope in metric spaces to the quasimetric space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\). This requires introducing one-sided versions of the usual concepts from [2] to cope with the asymmetry of the quasimetric \({\mathcal {T}}\).
Definition 3.4
(One-sided strong upper gradient) A function \(h:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow [0,\infty ]\) is a one-sided strong upper gradient for \({\mathcal {E}}\) if for every \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) the function \(h\circ \rho \) is Borel and
where \(\rho '\) is the metric derivative of \(\rho \) as defined in (2.25).
The above one-sided definition is sufficient to characterize the curves of maximal slope.
Definition 3.5
(Curve of maximal slope) A curve \(\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))\) is a curve of maximal slope for \({\mathcal {E}}\) with respect to its one-sided strong upper gradient h if and only if \(t\mapsto {\mathcal {E}}(\rho _t)\) is nonincreasing and
Remark 3.6
Note that by using Young’s inequality in (3.12), we get
Hence, if the curve \((\rho _t)_{t\in [0,T]}\) is a curve of maximal slope for \({\mathcal {E}}\) with respect to its one-sided strong upper gradient h, we actually have an equality in (3.13).
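As a sketch of the Young's-inequality step (assuming, as in [2], that (3.12) and (3.13) are the usual energy-dissipation inequalities, written here for a generic one-sided strong upper gradient h), the chain of estimates reads

$$\begin{aligned} {\mathcal {E}}(\rho _s)-{\mathcal {E}}(\rho _t) \leqq {\int }_s^t h(\rho _\tau )\,|\rho '|(\tau )\,\text {d}\tau \leqq \frac{1}{2}{\int }_s^t \bigl (h(\rho _\tau )^2+|\rho '|^2(\tau )\bigr )\,\text {d}\tau , \end{aligned}$$

where the second inequality is the pointwise bound \(ab\leqq \frac{1}{2}(a^2+b^2)\), valid for \(a,b\geqq 0\) because \(\frac{1}{2}(a-b)^2\geqq 0\), with equality if and only if \(a=b\). Hence, when all integrals are finite, equality between the outer terms forces \(h(\rho _\tau )=|\rho '|(\tau )\) for a.e. \(\tau \in [s,t]\).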
Therefore, to give a variational characterization of (\({\text {NL}}^2 {\text {IE}}\)), we need to identify the right one-sided strong upper gradient. As shown in [24], the variation of the energy along solutions of the equation provides a suitable candidate. In what follows we make this point precise and outline the strategy.
We recall that Proposition 2.25 ensures that for any \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T]; ({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\) there exists a unique flux \(({\varvec{j}}_t)_{t\in [0,T]}\) in \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \({\int }_0^T{\mathcal {A}}(\rho _t,{\varvec{j}}_t)\,\text {d}{t}<\infty \), \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) and \(|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). Moreover, according to Lemma 2.6 there exists an antisymmetric measurable vector field \(w:[0,T]\times G \rightarrow {\mathbb {R}}\) such that
It will be convenient to work directly with this vector field \((w_t)_{t\in [0,T]}\): from now on we write \((\rho ,w)\in {{\,\mathrm{CE}\,}}_T\) for \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) as well as \({\widehat{{\mathcal {A}}}}(\rho _t,w_t)\) for \({\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) according to (2.8). With this convention, we can define a Finsler-type product on velocities in analogy to (3.4) as
Note that, under the absolute-continuity assumptions of Section 3.1, by comparing with (3.4) we have that \({\widehat{g}}_{\rho ,w}(u,v)= g_{\rho ,{\varvec{j}}}({\varvec{j}}_1,{\varvec{j}}_2)\), where \({\varvec{j}}_1,{\varvec{j}}_2\) are obtained from u, v by (3.14), respectively. Moreover, taking (3.6) into account, we also define
Arguing as in (3.7), we arrive at the following one-sided Cauchy–Schwarz inequality:
Lemma 3.7
(One-sided Cauchy–Schwarz inequality) For all \(v,w \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) it holds that
with equality if and only if, for some \(\lambda >0\), \(v(x,y)_+= \lambda w(x,y)_+\) for \(\eta \, \rho \otimes \mu \)-a.e. \((x,y)\in G\) (and thus, by antisymmetry, also \(v(x,y)_-= \lambda w(x,y)_-\) for \(\eta \, \mu \otimes \rho \)-a.e. \((x,y)\in G\)).
Proof
Using \(v=v_+-v_-\) and the usual Cauchy–Schwarz inequality in \(L^2(\eta \,\rho \otimes \mu )\), we get
By the equality case of the usual Cauchy–Schwarz inequality, all the inequalities above are equalities if and only if there exists \(\lambda > 0\) such that \(v(x,y)_+=\lambda w(x,y)_+\) for \(\eta \rho \otimes \mu \)-a.e. \((x,y) \in G\) and \(v(x,y)_-=\lambda w(x,y)_-\) for \(\eta \mu \otimes \rho \)-a.e. \((x,y)\in G\), since all the contributions are nonnegative. \(\square \)
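The inequality can be probed numerically on a finite graph. The sketch below rests on assumptions of our own: \(\mu \) is a sum of Dirac masses with weights \(m_i\), all measures are vectors of node masses, and the left-hand side of Lemma 3.7 is taken to be the pairing \(\frac{1}{2}\sum _{i,k}\eta _{ik}v_{ik}j^w_{ik}\) of v with the upwind flux \(j^w\) generated by w (our reading of the product, whose display is not reproduced here). The helper names `action`, `pairing` and `antisym` are hypothetical. The script checks the inequality on random antisymmetric v, w and the equality case \(v=\lambda w\) with \(\lambda >0\).

```python
import numpy as np

def action(rho, m, w, eta):
    # 1/2 sum_ik eta_ik [ (w_ik)_+^2 rho_i m_k + (w_ik)_-^2 m_i rho_k ]
    wp, wn = np.maximum(w, 0.0), np.maximum(-w, 0.0)
    return 0.5 * np.sum(eta * (wp**2 * np.outer(rho, m) + wn**2 * np.outer(m, rho)))

def pairing(rho, m, v, w, eta):
    # 1/2 sum_ik eta_ik v_ik j^w_ik, where j^w is the upwind flux generated by w
    wp, wn = np.maximum(w, 0.0), np.maximum(-w, 0.0)
    j = wp * np.outer(rho, m) - wn * np.outer(m, rho)
    return 0.5 * np.sum(eta * v * j)

def antisym(a):
    # antisymmetric part (up to a factor 2); the diagonal is zero
    return a - a.T

rng = np.random.default_rng(1)
n = 6
m = rng.uniform(0.5, 1.5, n)                        # base-measure masses
rho = rng.uniform(size=n); rho /= rho.sum()         # probability masses
eta = rng.uniform(size=(n, n)); eta = eta + eta.T   # symmetric edge weights
np.fill_diagonal(eta, 0.0)

worst = -np.inf                                     # largest observed lhs - rhs
for _ in range(1000):
    v, w = antisym(rng.normal(size=(n, n))), antisym(rng.normal(size=(n, n)))
    lhs = pairing(rho, m, v, w, eta)
    rhs = np.sqrt(action(rho, m, v, eta) * action(rho, m, w, eta))
    worst = max(worst, lhs - rhs)
print(worst <= 1e-12)

# equality case: v = lam * w with lam > 0
w = antisym(rng.normal(size=(n, n)))
lam = 2.5
print(np.isclose(pairing(rho, m, lam * w, w, eta),
                 np.sqrt(action(rho, m, lam * w, eta) * action(rho, m, w, eta))))
```

Replacing v by \(-v\) generally changes the right-hand side, which is precisely the asymmetry that the one-sided formulation accounts for.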
Now note that, from the weak formulation of the nonlocal continuity equation (2.15), we have for any \(\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) and any \(0\leqq s < t \leqq T\) the following chain rule:
Moreover, we still have the identification of the product \({\widehat{g}}\) with the action in the form of Lemma 2.6,
which shows that the action is the norm with respect to the Finsler structure.
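On a finite graph, where \(\mu =\sum _i m_i\delta _{x_i}\) and all measures are vectors of node masses, the upwind flux and the action can be written down explicitly. The following minimal sketch assumes \({\varvec{j}}=w_+\gamma _1-w_-\gamma _2\) with \(\gamma _1=\rho \otimes \mu \), \(\gamma _2=\mu \otimes \rho \), and \({\widehat{{\mathcal {A}}}}(\rho ,w)=\frac{1}{2}\iint _G (w_+^2\,\text {d}\gamma _1+w_-^2\,\text {d}\gamma _2)\,\eta \), which is our reading of the definitions in Section 2; the function names are hypothetical. It checks that the flux is antisymmetric, that pairing w with its own flux reproduces the action (the identification of the product with the action stated above), and that, by antisymmetry, the action only sees \(w_+\) against \(\rho \otimes \mu \).

```python
import numpy as np

def upwind_flux(rho, m, w):
    """Upwind flux j_ik = (w_ik)_+ rho_i m_k - (w_ik)_- m_i rho_k on a finite graph."""
    w_pos = np.maximum(w, 0.0)    # positive part (w_ik)_+
    w_neg = np.maximum(-w, 0.0)   # negative part (w_ik)_-
    return w_pos * np.outer(rho, m) - w_neg * np.outer(m, rho)

def action(rho, m, w, eta):
    """1/2 sum_ik eta_ik [ (w_ik)_+^2 rho_i m_k + (w_ik)_-^2 m_i rho_k ]."""
    w_pos, w_neg = np.maximum(w, 0.0), np.maximum(-w, 0.0)
    return 0.5 * np.sum(eta * (w_pos**2 * np.outer(rho, m) + w_neg**2 * np.outer(m, rho)))

rng = np.random.default_rng(0)
n = 5
m = rng.uniform(0.5, 1.5, n)                      # base-measure masses mu({x_i})
rho = rng.uniform(size=n); rho /= rho.sum()       # probability masses rho({x_i})
phi = rng.normal(size=n)
w = phi[None, :] - phi[:, None]                   # w_ik = phi_k - phi_i, a nonlocal gradient
eta = rng.uniform(size=(n, n)); eta = 0.5 * (eta + eta.T)
np.fill_diagonal(eta, 0.0)                        # symmetric weights, no self-loops

j = upwind_flux(rho, m, w)
print(np.allclose(j, -j.T))                                            # antisymmetric flux
print(np.isclose(0.5 * np.sum(eta * w * j), action(rho, m, w, eta)))   # pairing = action
print(np.isclose(action(rho, m, w, eta),
                 np.sum(eta * np.maximum(w, 0.0)**2 * np.outer(rho, m))))  # one-sided form
```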
A crucial step toward the variational characterization of (\({\text {NL}}^2 {\text {IE}}\)) mentioned above is to obtain the chain rule (3.17) for the energy functional (3.2), which is done in Proposition 3.10 below by a suitable regularization. As a consequence, by using the one-sided Cauchy–Schwarz inequality from Lemma 3.7, we obtain in Corollary 3.11 that the square root \(\sqrt{{\mathcal {D}}}\) of the local slope, defined below in (3.19), is a one-sided strong upper gradient for \({\mathcal {E}}\) with respect to the quasimetric \({\mathcal {T}}\) in the sense of Definition 3.4, where \(|\rho _t'|^2={\hat{{\mathcal {A}}}}(\rho _t,w_t)={\widehat{g}}_{\rho _t,w_t}(w_t,w_t)\) for a.e. \(t\in [0,T]\) due to Proposition 2.25 and (3.18). This allows us to define the De Giorgi functional, which provides the characterization of weak solutions as curves of maximal slope.
Definition 3.8
(Local slope and De Giorgi functional) For any \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), let the local slope at \(\rho \) be given by
For any \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\), the De Giorgi functional at \(\rho \) is defined as
When the dependence on the base measure \(\mu \) needs to be explicit, the local slope and the De Giorgi functional are denoted by \({\mathcal {D}}(\mu ;\rho )\) and \({\mathcal {G}}_T(\mu ;\rho )\), respectively.
If the potential K satisfies Assumptions (K1)–(K3), we note that whenever \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) and \(\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))\), the quantity \({\mathcal {G}}_T(\rho )\) is finite; indeed, the domain of the energy is all of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\), and Proposition 2.25 yields that both the local slope (since it is equal to the action of \((\rho ,{\varvec{j}})\), where \({\varvec{j}}\) is given in Definition 3.1) and the metric derivative are finite.
We are ready to state our main theorem.
Theorem 3.9
Suppose that \(\mu \) satisfies Assumptions (A1) and (A2) and K satisfies Assumptions (K1)–(K3). A curve \((\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) according to Definition 3.1 if and only if \(\rho \) belongs to \({{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) and is a curve of maximal slope for \({\mathcal {E}}\) with respect to \(\sqrt{{\mathcal {D}}}\) in the sense of Definition 3.5, that is, satisfies
where \({\mathcal {G}}_T\) is the De Giorgi functional as given in Definition 3.8.
Note that the above theorem implicitly assumes that \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\); this is indeed true by Corollary 3.11 below. In light of this, we can represent the result via the following diagram:
The Chain Rule and Proof of Theorem 3.9
Firstly, we focus on the chain-rule property, which is the main technical step in the proof of Theorem 3.9.
Proposition 3.10
Let K satisfy Assumptions (K1)–(K3). For all \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\) and \(0\leqq s\leqq t\leqq T\) we have the chain-rule identity
where \((w_t)_{t\in [0,T]}\) is the antisymmetric vector field associated by (2.6) to \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\).
Proof
Since the curve \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\), according to Proposition 2.25 there exists a unique family \(({\varvec{j}}_t)_{t\in [0,T]}\) belonging to \(T_{\rho }{\mathcal {P}}_{2}({{\mathbb {R}}^{d}})\) for a.e. \(t\in [0,T]\) such that:

(i)
\((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\);

(ii)
\({\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty \);

(iii)
\(|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\);

(iv)
\(\text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \,\text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \,\text {d}\gamma _{2,t}(x,y)\).
Then the identity (3.22) is equivalent to proving
We proceed by applying two regularization procedures. First, for all \((x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\) we define \(K^\varepsilon (x,y)=K*m_\varepsilon (x,y)={\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(z,z')m_{\varepsilon }(x-z,y-z')\,\text {d}z\,\text {d}z'\), where \(m_\varepsilon (z)=\frac{1}{\varepsilon ^{2d}}m(\frac{z}{\varepsilon })\) for all \(z\in {\mathbb {R}}^{2d}\) and \(\varepsilon >0\), with m a standard mollifier on \({\mathbb {R}}^{2d}\). We also introduce a smooth cutoff function \(\varphi _R\) on \({{\mathbb {R}}^{2d}}\) such that \(\varphi _R(z)=1\) on \(B_R\), \(\varphi _R(z)=0\) on \({{\mathbb {R}}^{2d}}{\setminus } B_{2R}\) and \(|\nabla \varphi _R|\leqq \frac{2}{R}\), where \(B_R\) is the ball of radius R in \({{\mathbb {R}}^{2d}}\) centered at the origin. We set \(K_R^\varepsilon :=\varphi _R K^\varepsilon \) and note that it is a \(C_\mathrm {c}^\infty ({{\mathbb {R}}^{2d}})\) function. We now introduce the approximate energies, indexed by \(\varepsilon \) and R,
Let us extend \(\rho \) and \({\varvec{j}}\) to \([-T,2 T]\) periodically in time, meaning that \(\rho _{-s}=\rho _{T-s}\) and \(\rho _{T+s}=\rho _{s}\) for all \(s\in (0,T]\), and likewise for \({\varvec{j}}\). We regularize \(\rho \) and \({\varvec{j}}\) in time by using a standard mollifier n on \({\mathbb {R}}\) supported on \([-1,1]\), by setting \(n_\sigma (t)=\frac{1}{\sigma }n(\frac{t}{\sigma })\) and
for any \(\sigma \in (0,T)\); hence \(\rho _t^\sigma \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Let us now show that the integral of the action is uniformly bounded with respect to \(\sigma \). Let \(\lambda \in {\mathcal {M}}^+(G)\) be such that \(\gamma _{1,t},\gamma _{2,t},{\varvec{j}}_t \ll \lambda \) for all \(t\in [0,T]\). Then, by using the joint convexity of the function \(\alpha \) from (2.5), Jensen’s inequality and Fubini’s theorem, we get
It is easy to check that \((\rho ^\sigma ,{\varvec{j}}^\sigma )\) is still a solution to the nonlocal continuity equation on [0, T]. Arguing as in the proof of Proposition 2.17, we get that, along a subsequence, \(\rho _t^\sigma \rightharpoonup \tilde{\rho }_t\) as \(\sigma \rightarrow 0\) for all \(t\in [0,T]\) for some curve \(({\tilde{\rho }}_t)_{t\in [0,T]}\) in \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\), and \({\varvec{j}}^\sigma \rightharpoonup \hat{{\varvec{j}}}\) in \({\mathcal {M}}_{\mathrm {loc}}(G \times [0,T])\), with \(\text {d}{\hat{{\varvec{j}}}} := \text {d}{\tilde{{\varvec{j}}}}_t\text {d}t\) for some curve \(({\tilde{{\varvec{j}}}}_t)_{t\in [0,T]}\) in \({\mathcal {M}}(G)\). Note that \(n_\sigma \rightharpoonup \delta _0\) as \(\sigma \rightarrow 0\) and, as a consequence, \(\rho _t^\sigma \rightharpoonup \rho _t\) for all \(t\in [0,T]\) in view of Proposition 2.21. Thus we actually have \(\tilde{\rho }=\rho \) and \(\tilde{{\varvec{j}}}={\varvec{j}}\) by uniqueness of the limit and of the flux, as highlighted above. Using the regularity for \(\varepsilon >0\) and \(\sigma >0\), we get
For the sake of completeness, we note that the second equality follows from the definition of \({{\,\mathrm{CE}\,}}_T\) by using again a cutoff argument on the function \(K_R^\varepsilon *\rho _t^\sigma \). We omit this step as it is a standard procedure. Integrating in time between s and t, with \(s\leqq t\), we obtain
In order to obtain (3.23) we need to let \(\varepsilon \) and \(\sigma \) go to 0 and R go to \(\infty \) in (3.24). The left-hand side is easy to handle since \(\rho _t^\sigma \rightharpoonup \rho _t\) as \(\sigma \rightarrow 0\) for any \(t\in [0,T]\), and \(K_R^\varepsilon \rightarrow K_R\) uniformly on compact sets as \(\varepsilon \rightarrow 0\). Finally, by letting R go to \(\infty \) we have convergence to \({\mathcal {E}}(\rho _t)\).
In order to pass to the limit in the right-hand side of (3.24), we use a truncation argument similar to that in the proof of Proposition 2.17. Let \(\delta >0\) and let us set \(N_\delta = {\overline{B}}_{\delta ^{-1}} \times {\overline{B}}_{\delta ^{-1}}\), where \(B_{\delta ^{-1}}= \left\{ x \in {\mathbb {R}}^d: |x|< \delta ^{-1}\right\} \), and \(G_\delta =\bigl \{(x,y)\in G:\delta \leqq |x-y|\bigr \}\). We can consider a family \((\varphi _\delta )_{\delta >0} \subset C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}\times G;[0,1])\) of truncation functions such that, for all \(\delta >0\),
Now we split the integral on the right-hand side of (3.24) by inserting \(\varphi _\delta \) and \(1-\varphi _\delta \) in the integrand, and we argue as follows. Since \(\rho _t^\sigma \otimes {\varvec{j}}_t^\sigma \rightharpoonup \rho _t\otimes {\varvec{j}}_t\) for any \(t\in [0,T]\) as \(\sigma \rightarrow 0\), and \(K^\varepsilon _R\rightarrow K_R\) uniformly on compact sets as \(\varepsilon \rightarrow 0\), we can pass to the limit in \(\sigma \) and \(\varepsilon \), for any R and \(\delta >0\),
By using \(\varphi _\delta \leqq 1\), Assumption (K3), Lemma 2.10 with \(\Phi (x,y)=|x-y|\vee |x-y|^2\) and (A1), we can bound the modulus of (3.25) for any \(\tau \in [s,t]\) by
Hence the integral is uniformly bounded in \(\delta \) and R, and by the Lebesgue dominated convergence theorem we can pass to the limit in (3.25) in \(\delta \) and R, obtaining
Now it remains to control the integral involving the term \(1-\varphi _\delta (z,x,y)\) in the integrand. Let us note that, for all \(\delta >0\),
Using Assumption (K3) and splitting each contribution, we obtain
Using Lemma 2.10 with \(\Phi (x,y)=|x-y|\vee |x-y|^2\), (A1) and the Cauchy–Schwarz inequality with respect to \(\eta \, \rho _t^\sigma \otimes \mu \), the right-hand side in the inequality above can be further bounded by
Thanks to the uniform second moment bound of \(\rho _t^\sigma \) from Lemma 2.16 and Assumption (A2), the above terms converge to zero as \(\delta \rightarrow 0\), which concludes the proof. \(\square \)
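The time regularization in the proof above uses two elementary facts: averaging a curve of probability measures against a nonnegative kernel of unit mass gives probability measures again, and the joint convexity of \(\alpha \) yields, via Jensen's inequality, that the action of the mollified pair is bounded by the mollified action. A minimal discrete sketch, assuming \(\alpha (j,r)=(j_+)^2/r\) for \(r>0\) (our reading of (2.5)), a uniform time grid on which the periodic extension is implemented with `np.roll`, and random placeholder data in place of an actual curve:

```python
import numpy as np

def alpha(j, r):
    # alpha(j, r) = (j_+)^2 / r, assuming r > 0 (our reading of (2.5))
    return np.maximum(j, 0.0) ** 2 / r

def bump(s):
    # standard mollifier profile supported in [-1, 1], not yet normalized
    out = np.zeros_like(s, dtype=float)
    inside = np.abs(s) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - s[inside] ** 2))
    return out

def mollify_periodic(series, dt, sigma):
    # discrete convolution in time (axis 0) of a T-periodic series with n_sigma
    J = int(sigma / dt)
    offsets = np.arange(-J, J + 1)
    weights = bump(offsets * dt / sigma)
    weights /= weights.sum()                      # discrete kernel has unit mass
    out = np.zeros_like(series, dtype=float)
    for o, wgt in zip(offsets, weights):
        out += wgt * np.roll(series, o, axis=0)   # np.roll(a, o)[k] = a[k - o]
    return out

rng = np.random.default_rng(2)
N, T, sigma = 200, 1.0, 0.05
dt = T / N
rho = rng.dirichlet(np.ones(5), size=N)           # time series of probability vectors
r = rng.uniform(0.1, 2.0, size=(N, 7))            # positive densities of gamma on 7 edges
j = rng.normal(size=(N, 7))                       # signed flux densities on the same edges

rho_s = mollify_periodic(rho, dt, sigma)
r_s, j_s = mollify_periodic(r, dt, sigma), mollify_periodic(j, dt, sigma)
jensen_rhs = mollify_periodic(alpha(j, r), dt, sigma)
print(np.allclose(rho_s.sum(axis=1), 1.0), rho_s.min() >= 0.0)
print(np.all(alpha(j_s, r_s) <= jensen_rhs * (1 + 1e-12) + 1e-12))
print(alpha(j_s, r_s).sum() * dt <= alpha(j, r).sum() * dt * (1 + 1e-12))
```

The last check mirrors the uniform-in-\(\sigma \) bound: periodic mollification preserves the time integral of \(\alpha \), so the action of the mollified pair cannot exceed the original one.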
That \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\) is an easy consequence of the previous result.
Corollary 3.11
For any curve \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) it holds that
i.e., \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\) in the sense of Definition 3.4.
Proof
Without loss of generality we assume \({\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho '|(\tau )\,\text {d}\tau <\infty \), as otherwise the inequality (3.26) is trivially satisfied. We obtain the result as a consequence of Proposition 3.10 by applying the one-sided Cauchy–Schwarz inequality (Lemma 3.7) to (3.22) as follows: for any \(0\leqq s\leqq t\leqq T\),
Note that the last two equalities are provided by identity (3.18) and Proposition 2.25. \(\square \)
At this point, we have collected all auxiliary results to deduce Theorem 3.9.
Proof of Theorem 3.9
Let us start by assuming that \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)). In view of Definition 3.1, a weak solution is obtained from the weak formulation of the nonlocal continuity equation (2.13) if we set
Then, by writing \(v_t^{\mathcal {E}}(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)\), it is easy to check that
where the finiteness follows from Assumptions (K3) and (A1), as shown by the computation
Thanks to Proposition 2.25, this also proves that \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) and \(|\rho _t'|^2\leqq {\mathcal {D}}(\rho _t)\) for a.e. \(t\in [0,T]\). In view of Proposition 3.10, we thus obtain
This implies that

(i)
the map \(t\mapsto {\mathcal {E}}(\rho _t)\) is nonincreasing;

(ii)
\({\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \bigl ({\mathcal {D}}(\rho _\tau )+|\rho _\tau '|^2\bigr )\,\text {d}\tau = 0\), by Corollary 3.11.
Hence the first part of the theorem follows by taking \(s=0\) and \(t=T\), since this gives \({\mathcal {G}}_T(\rho )=0\).
Consider now \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) satisfying the equality (3.21). Let us verify that it is a weak solution of (\({\text {NL}}^2 {\text {IE}}\)). By Proposition 2.25 there exists a unique family \(({\varvec{j}}_t)_{t\in [0,T]}\) in \(T_{\rho _t}{\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\), \({\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty \) and \(\rho _t'^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). Moreover, by Lemma 2.6 we find an antisymmetric measurable vector field \(w:[0,T]\times G \rightarrow {\mathbb {R}}\) such that
Thanks to Proposition 3.10, by applying the one-sided Cauchy–Schwarz inequality and using the identification (3.18), the definition of the local slope (3.19) and Young’s inequality, we get
Thanks to the equality (3.21), all the above inequalities are in fact equalities, which is the case if and only if \(w_t(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)\) for a.e. \(t\in [0,T]\) and \(\gamma _{1,t}\)-a.e. \((x,y)\in G\). Hence \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) with \(w=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }\), that is, \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)). \(\square \)
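The identity \({\mathcal {G}}_T(\rho )=0\) along weak solutions can be checked on a small graph. The sketch below rests on several assumptions: \(\mu =\sum _i m_i\delta _{x_i}\) with eight random vertices in the plane; the upwind system \(\frac{\text {d}}{\text {d}t}\rho _i=-\sum _k\eta _{ik}j_{ik}\) with \(j_{ik}=(v_{ik})_+\rho _i m_k-(v_{ik})_-m_i\rho _k\) and \(v_{ik}=(K*\rho )(x_i)-(K*\rho )(x_k)\) (our reading of the weak formulation; the normalization of (1.2)–(1.4) may differ by a constant factor); hypothetical Gaussian weights \(\eta _{ik}=\exp (-|x_i-x_k|^2)\); and the attractive kernel \(K(x,y)=|x-y|^2/2\). Since \(|\rho _t'|^2={\mathcal {D}}(\rho _t)\) along the solution, \({\mathcal {G}}_T\) reduces to \({\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0)+\int _0^T{\mathcal {D}}(\rho _t)\,\text {d}t\), which should vanish up to time-stepping error.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
x = rng.uniform(-1.0, 1.0, size=(n, 2))              # vertices in R^2
m = np.full(n, 1.0 / n)                               # base measure masses m_i
dist2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
eta = np.exp(-dist2)                                  # hypothetical edge weights
np.fill_diagonal(eta, 0.0)
K = 0.5 * dist2                                       # attractive kernel |x - y|^2 / 2

def velocity(rho):
    phi = K @ rho                                     # phi_i = (K * rho)(x_i)
    return phi[:, None] - phi[None, :]                # v_ik = -(phi_k - phi_i)

def rhs(rho):
    v = velocity(rho)
    j = np.maximum(v, 0) * np.outer(rho, m) - np.maximum(-v, 0) * np.outer(m, rho)
    return -(eta * j).sum(axis=1)                     # d rho_i / dt = -sum_k eta_ik j_ik

def energy(rho):
    return 0.5 * rho @ K @ rho

def slope(rho):
    v = velocity(rho)                                 # D(rho): action of the velocity v
    return np.sum(eta * np.maximum(v, 0) ** 2 * np.outer(rho, m))

def rk4(rho, dt):
    k1 = rhs(rho)
    k2 = rhs(rho + 0.5 * dt * k1)
    k3 = rhs(rho + 0.5 * dt * k2)
    k4 = rhs(rho + dt * k3)
    return rho + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

T, steps = 1.0, 4000
dt = T / steps
rho = rng.uniform(0.5, 1.5, n); rho /= rho.sum()
E0, D_vals = energy(rho), [slope(rho)]
for _ in range(steps):
    rho = rk4(rho, dt)
    D_vals.append(slope(rho))
D_vals = np.array(D_vals)
int_D = dt * (D_vals.sum() - 0.5 * (D_vals[0] + D_vals[-1]))   # trapezoidal rule
G_T = energy(rho) - E0 + int_D    # 1/2 int (D + |rho'|^2) = int D on a solution
print(abs(G_T) < 1e-5, abs(rho.sum() - 1.0) < 1e-10)
```

Mass is conserved exactly by the scheme up to round-off because \(\eta \) is symmetric and the flux antisymmetric.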
Stability and Existence of Weak Solutions
Theorem 3.9 provides a characterization of (weak) solutions to (\({\text {NL}}^2 {\text {IE}}\)) as minimizers of \({\mathcal {G}}_T\) attaining the value 0. The direct method of calculus of variations gives existence of minimizers of \({\mathcal {G}}_T\). However, it is not clear a priori whether they attain the value 0 and are thus actually weak solutions to (\({\text {NL}}^2 {\text {IE}}\)). Hence we prove compactness and stability of gradient flows (see Theorem 3.14) and approximate the desired problem by discrete problems for which the existence of solutions is easy to show; see the proof of Theorem 3.15. We start by proving that the local slope \({\mathcal {D}}\) is narrowly lower semicontinuous jointly in its arguments, \(\mu \) and \(\rho \); see Lemma 3.12. We then establish the compactness coming from a uniform control of the De Giorgi functional \({\mathcal {G}}_T\), as well as its joint narrow lower semicontinuity (see Lemma 3.13), which we prove using compactness in \({{\,\mathrm{CE}\,}}_T\) and the joint narrow lower semicontinuity of the action (see Proposition 2.17) and of the local slope. (See also [48, Theorem 2] for an analogous strategy.)
In Theorem 3.14 we prove one of our main results, namely that the functional \({\mathcal {G}}_T\) is stable under simultaneous variations of the base measures, which define the vertices of the graph, and of the absolutely continuous curves. A particular consequence of this theorem is that weak solutions to (\({\text {NL}}^2 {\text {IE}}\)) with respect to graphs defined by random samples of a measure \(\mu \) converge to weak solutions to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu \); see Remark 3.17.
The existence of weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) (and thus gradient flows) with respect to \({\mathcal {E}}\) proved in Theorem 3.15 shows that, indeed, the De Giorgi functional (3.20) corresponding to an interaction potential K satisfying (K1)–(K3) admits a minimizer when \(\mu ({{\mathbb {R}}^{d}})\) is finite.
Lemma 3.12
Let \((\mu ^n)_n\subset {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \((\mu ^n)_n\) and \(\mu \) are such that (A1) and (A2) hold uniformly in n, and let K satisfy Assumptions (K1)–(K3). Let moreover \((\rho ^n)_n\) be a sequence such that \(\rho ^n \in {\mathcal {P}}_{2}({{\mathbb {R}}^{d}})\) for all \(n\in {\mathbb {N}}\) and \(\rho ^n\rightharpoonup \rho \) as \(n\rightarrow \infty \) for some \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then
Proof
For every \(n\in {\mathbb {N}}\) we set \(u^n = -{\overline{\nabla }}K*\rho ^n\). Furthermore, we write \(u= -{\overline{\nabla }}K*\rho \) and define \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) by \(g(x) = (x_+)^2\) for all \(x \in {\mathbb {R}}\). Then note that g is convex and continuous, and
and, similarly,
We want to use [2, Theorem 5.4.4 (ii)] to prove the desired \(\liminf \) inequality. Observe that \(u^n \in L^2(\eta \,\gamma _1^n)\) and \(u \in L^2(\eta \,\gamma _1)\); indeed, (K3) and (A1) give
and, similarly, for u. Let now \(\varphi \in C_\mathrm {c}^\infty (G)\). We have
The last integral actually vanishes as \(R\rightarrow \infty \), since (K3), (A1) and Prokhorov’s theorem give
The function \((z,x,y) \mapsto (K(y,z) - K(x,z))\varphi (x,y)\eta (x,y)\) is continuous and bounded on \(({{\mathbb {R}}^{d}}\cap B_R)\times G\) thanks to Assumption (W). In addition, we note that \((\rho ^n\otimes \gamma _1^n)_n\) narrowly converges to \(\rho \otimes \gamma _1\) in \({\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}^+(G)\). Therefore, we obtain for any \(R>0\) the convergence
By sending \(R\rightarrow \infty \), we obtain
Thus, \(u^n\) converges weakly to u as \(n\rightarrow \infty \) in the sense of [2, Definition 5.4.3]. By [2, Theorem 5.4.4 (ii)] we therefore conclude that
which is the desired result. \(\square \)
Let us also prove the compactness and narrow lower semicontinuity of the De Giorgi functional.
Lemma 3.13
(Compactness and lower semicontinuity of the De Giorgi functional) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \(\mu ^n\) and \(\mu \) satisfy (A1) and (A2) uniformly in n, and let K satisfy (K1)–(K3). Let moreover \((\rho ^n)_n\) be a sequence so that \(\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))\) for all \(n\in {\mathbb {N}}\) with \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n) < \infty \) and \(\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty \). Then, up to a subsequence, \(\rho ^n_t \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\) for some \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and
Proof
For any \(n\in {\mathbb {N}}\), recall the definition
where we are careful to take the metric derivative of \(\rho ^n\) with respect to \({\mathcal {T}}_{\mu ^n}\) (as given in Definition 2.18). Since the domain of the energy \({\mathcal {E}}\) is all of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and the local slope \({\mathcal {D}}\) is nonnegative, the bound \(\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty \) ensures that
For all \(n\in {\mathbb {N}}\), since \(\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))\), Proposition 2.25 yields the existence of a flux \({\varvec{j}}^n\) such that \((\rho ^n,{\varvec{j}}^n)\in {{\,\mathrm{CE}\,}}_T\) and \(|(\rho ^n_t)'|^2 = {\mathcal {A}}(\mu ^n;\rho ^n_t,{\varvec{j}}^n_t)\) for almost all \(t\in [0,T]\). We then get
By Proposition 2.17, there now exists \((\rho ,{\varvec{j}}) \in {{\,\mathrm{CE}\,}}_T\) such that, up to subsequences, \(\rho _t^n \rightharpoonup \rho _t\) for all \(t\in [0,T]\) and \({\varvec{j}}^n \rightharpoonup {\varvec{j}}\) as \(n\rightarrow \infty \), and
By Proposition 2.25, we therefore have \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and \(|(\rho _t)'|_{{\mathcal {T}}_\mu }^2 \leqq {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\) for almost all \(t\in [0,T]\), which finally gives
By the narrow continuity of the energy proved in Proposition 3.3, we get
Furthermore, by Fatou’s lemma and the narrow lower semicontinuity of the local slope shown in Lemma 3.12, we have
Gathering (3.27), (3.28) and (3.29), we finally obtain
which ends the proof. \(\square \)
We now get our stability result.
Theorem 3.14
(Stability of gradient flows) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \(\mu ^n\) and \(\mu \) satisfy (A1) and (A2) uniformly in n, and let the interaction potential K satisfy (K1)–(K3). Suppose that \(\rho ^n\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu ^n\) for all \(n\in {\mathbb {N}}\), that is,
such that \((\rho _0^n)_n\) satisfies \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n)< \infty \) and \(\rho _t^n \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\) for some curve \((\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then, \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and \(\rho \) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu \), that is,
Proof
By Lemma 3.13 we directly obtain that \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and, up to a subsequence,
Finally, since \({\mathcal {G}}_T(\mu ;\rho ) \geqq 0\) by Young’s inequality and Corollary 3.11, we obtain \({\mathcal {G}}_T(\mu ;\rho ) = 0\). \(\square \)
Note that, via Theorem 3.9, the above theorem also shows stability of weak solutions to (\({\text {NL}}^2 {\text {IE}}\)). Typically, in Theorem 3.14, \((\mu ^n)_n\) is a sequence of atomic measures used to approximate, or sample, the support of \(\mu \). Indeed, we now use this approach to show the existence of weak solutions to the nonlocal nonlocal-interaction equation.
Theorem 3.15
(Existence of weak solutions) Let K be an interaction potential satisfying Assumptions (K1)–(K3). Suppose that \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) is finite, i.e., \(\mu ({{\mathbb {R}}^{d}})<\infty \), and satisfies (A2). Assume furthermore that for some \(C_\eta ' > 0\) it holds that
Consider \(\rho _0 \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) which is \(\mu \)-absolutely continuous. Then there exists a weakly continuous curve \(\rho :[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})\) such that \({{\,\mathrm{supp}\,}}\rho _t\subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(t\in [0,T]\), which is a weak solution of (\({\text {NL}}^2 {\text {IE}}\)) and satisfies the initial condition \(\rho (0)=\rho _0\).
Proof
Let \((\mu ^n)_n \subset {\mathcal {M}}^+({\mathbb {R}}^d)\) be a sequence of atomic measures such that \((\mu ^n)_n\) converges narrowly to \(\mu \). Moreover, assume that \(\mu ^n\) has finitely many atoms and \(\mu ^n({{\mathbb {R}}^{d}}) \leqq \mu ({{\mathbb {R}}^{d}})\) and \({{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(n\in {\mathbb {N}}\). Let \({\hat{\mu }}^n\) be the normalization of \(\mu ^n\) which has the same total mass as \(\mu \), that is,
and let \(\pi ^n\) be an optimal transport plan between \(\mu \) and \({\hat{\mu }}^n\) for the quadratic cost. Let \(\rho _0^n\) be the second marginal of \({\tilde{\rho }}_0 \pi ^n\), where \({\tilde{\rho }}_0\) is the density of the measure \(\rho _0\) with respect to \(\mu \); namely, let \(\rho _0^n(A) = {\int }_{{\mathbb {R}}^d \times A} {\tilde{\rho }}_0(x) \,\text {d}\pi ^n(x,y)\) for any Borel set \(A\subset {{\mathbb {R}}^{d}}\). Note that \(\rho _0^n({\mathbb {R}}^d) = \rho _0({\mathbb {R}}^d)\) and \(\rho _0^n \ll \mu ^n\) for all \(n\in {\mathbb {N}}\), and that, since \({\tilde{\rho }}_0 \pi ^n\) is a transport plan between \(\rho _0\) and \(\rho _0^n\), \(\rho _0^n \rightharpoonup \rho _0\) as \(n\rightarrow \infty \).
Thanks to Assumption (3.30), it holds, for all \(n\in {\mathbb {N}}\), that
Since, by construction, \(\rho _0^n \ll \mu ^n\), we have \({{\,\mathrm{supp}\,}}\rho _0^n \subseteq {{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu \). Thanks to Proposition 2.28, this nested support property is preserved in time, so that \({{\,\mathrm{supp}\,}}\rho _t^n \subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(t\in [0,T]\) and \(n\in {\mathbb {N}}\). For this reason, (3.31) can be used, under the stated support restriction on \(\rho _0\), instead of Assumption (A1) uniformly in n when calling Lemma 3.13 and Theorem 3.14 later in this proof. Since \(\mu ^n\) consists of finitely many atoms and \(\mu \) satisfies (A2), the family \((\mu ^n)_n\) satisfies (A2) uniformly in n.
By Remark 1.1, we know that the ODE system (1.2)–(1.4) admits a unique solution for all \(n\in {\mathbb {N}}\). It can easily be checked that this solution, which we denote by \(\rho ^n\), is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu ^n\) starting from \(\rho _0^n\), according to Definition 3.1. By Theorem 3.9, we then get that \(\rho ^n\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu ^n\) starting from \(\rho _0^n\) for all \(n\in {\mathbb {N}}\).
Combining the compactness part of Lemma 3.13 and the stability from Theorem 3.14, we get that, up to a subsequence, \(\rho _t^n \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\), where \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu \) starting from \(\rho _0\). Theorem 3.9 finally shows that \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu \) starting from \(\rho _0\). \(\square \)
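The construction of \(\rho _0^n\) in the proof above becomes explicit in one dimension. The sketch below assumes that \(\mu \) is the Lebesgue measure on [0, 1] and that \(\mu ^n\) puts mass 1/n at each of the n cell midpoints, so that \({\hat{\mu }}^n=\mu ^n\) and the monotone (quantile) coupling, which sends each cell to its midpoint, is an optimal plan \(\pi ^n\) for the quadratic cost. Then \(\rho _0^n\) assigns to each midpoint the \(\rho _0\)-mass of its cell. The function name and the test density \({\tilde{\rho }}_0(x)=2x\) are illustrative choices.

```python
import numpy as np

def discretize_initial_datum(density, n, q=16):
    """Masses of rho_0^n at the midpoints of n equal cells of [0, 1].

    Assumes mu = Lebesgue on [0, 1] and mu^n = (1/n) * sum of Diracs at the
    cell midpoints; the quantile coupling maps each cell to its midpoint, so
    rho_0^n({midpoint_k}) is the integral of the density over cell k
    (approximated here by a midpoint rule with q nodes per cell).
    """
    edges = np.linspace(0.0, 1.0, n + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    h = 1.0 / (n * q)
    nodes = (np.arange(n * q) + 0.5) * h             # sub-cell midpoints, cell by cell
    masses = density(nodes).reshape(n, q).sum(axis=1) * h
    return midpoints, masses

density = lambda x: 2.0 * x                           # rho_0 = 2x dx on [0, 1]
for n in (4, 16, 64):
    pts, masses = discretize_initial_datum(density, n)
    print(n, masses.sum(), 2.0 / 3.0 - pts @ masses)  # total mass, first-moment gap
```

For this density the midpoint rule is exact, the masses equal \((2k+1)/n^2\), and the first moment of \(\rho _0^n\) is \(2/3-1/(6n^2)\), which tends to the first moment 2/3 of \(\rho _0\), consistent with \(\rho _0^n\rightharpoonup \rho _0\).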
Remark 3.16
Assumption (3.30) is only needed to arrive at an atomic approximation sequence \((\mu ^n)_n\) of \(\mu \) such that Assumptions (A1) and (A2) hold uniformly in n. On a case-by-case basis, one could drop (3.30) and try to construct the sequence \((\mu ^n)_n\) explicitly in such a way as to satisfy both assumptions uniformly in n.
Remark 3.17
We conclude the section by remarking on the relevance of Theorem 3.14 to the setting of machine learning. There, \(\mu \) is the measure modeling the true data distribution, which can be assumed to be compactly supported. Let \((x_i)_i\) be a sequence of i.i.d. samples of \(\mu \) and let \(\mu ^n = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}\) be the empirical measure of the first n sample points. Assume that \((\rho ^n)_n\) is a narrowly convergent sequence of probability measures such that \({{\,\mathrm{supp}\,}}\rho ^n \subseteq \{x_1, \dots , x_n\}\) for all \(n\in {\mathbb {N}}\), and denote by \(\rho \) its limit. Assume that \(\eta \) is an edge weight kernel such that \(\mu \) and \(\eta \) satisfy (A2) and (3.30). Let K be an interaction kernel satisfying (K2) and (K3). Finally, let \(({\tilde{\rho }}^n)_n\) be the sequence of solutions of (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu ^n\) in the sense of Definition 3.1 such that \({\tilde{\rho }}^n_0 = \rho ^n\) for all \(n\in {\mathbb {N}}\). Then, by Lemma 3.13, the sequence \(({\tilde{\rho }}_t^n)_n\) narrowly converges along a subsequence for all \(t\in [0,T]\), and furthermore, by Theorem 3.15, any curve \(({\tilde{\rho }}_t)_{t\in [0,T]}\) of subsequential limits yields a solution \({\tilde{\rho }}\) of (\({\text {NL}}^2 {\text {IE}}\)) with initial condition \(\rho \).
Discussion of the Local Limit
Here we discuss at a formal level the connection between the nonlocal nonlocal-interaction equation and its limit as the graph structure localizes. We first present a very formal justification as to why we expect the solutions of (\({\text {NL}}^2 {\text {IE}}\)) to converge to the solutions of a nonlocal-interaction equation as the localizing parameter \(\varepsilon \rightarrow 0^+\), i.e., as the edge-weight function \(\eta = \eta _\varepsilon \) localizes. We conclude this section with an example that cautions that the formal argument cannot be justified in full generality. Proving the convergence of (\({\text {NL}}^2 {\text {IE}}\)) in the limit \(\varepsilon \rightarrow 0^+\), under appropriate conditions, remains an intriguing open problem.
Take \(\mu ={\text {Leb}}({{\mathbb {R}}^{d}})\) and choose \(\eta _\varepsilon \) given by (2.2). Consider a smooth interaction potential \(K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) and a compactly supported initial condition \(\rho _0\) which has a continuous density with respect to \(\mu \). Let \(\rho ^\varepsilon \) be the solution of (\({\text {NL}}^2 {\text {IE}}\)) starting from \(\rho _0\) for the edge weight function \(\eta _\varepsilon \). Assume that \(\rho ^\varepsilon _t\) is absolutely continuous with respect to \(\mu \) for all t. In the following we drop the t-dependence of \(\rho ^\varepsilon \) for brevity. From (\({\text {NL}}^2 {\text {IE}}\)), by adding and subtracting \(\rho ^\varepsilon (x) {\int }_{{\mathbb {R}}^{d}}({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_{+} \eta _\varepsilon (x,y) \,\text {d}y\), it follows that
Then, for almost all \(x \in {{\mathbb {R}}^{d}}\) we have
A standard calculation, using a second-order Taylor expansion, shows that the right-hand side approximates \(\Delta K*\rho ^\varepsilon (x)\) when \(\varepsilon \) is small, provided that derivatives of \(\rho ^\varepsilon \) remain uniformly bounded.
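The second-order claim can be checked numerically in one dimension. The sketch below uses a hypothetical weight \(\eta _\varepsilon (z) = \varepsilon ^{-3}\eta (|z|/\varepsilon )\), with \(\eta = 3\) on \([0,1)\) and 0 otherwise. This is a discontinuous stand-in, not necessarily the weight of (2.2), and the factor 3 is chosen so that \(\int z^2 \eta _\varepsilon (z)\,\text {d}z = 2\) for every \(\varepsilon \). The code evaluates \(\int (\varphi (x+z) - \varphi (x))\,\eta _\varepsilon (z)\,\text {d}z\) by the midpoint rule for \(\varphi = \sin \) and compares it with \(\varphi ''\).

```python
import math


def eta_eps(z, eps):
    """Hypothetical 1-d edge weight eta_eps(z) = eps**-3 * eta(|z| / eps) with eta = 3 on [0, 1).

    The factor 3 makes the second moment int z**2 eta_eps(z) dz equal to 2 for every eps,
    so that the nonlocal operator below approximates phi'' without an extra constant.
    """
    return 3.0 / eps**3 if abs(z) < eps else 0.0


def nonlocal_laplacian(phi, x, eps, m=2000):
    """Midpoint-rule value of int (phi(x + z) - phi(x)) eta_eps(z) dz over (-eps, eps)."""
    h = 2.0 * eps / m
    total = 0.0
    for k in range(m):
        z = -eps + (k + 0.5) * h
        total += (phi(x + z) - phi(x)) * eta_eps(z, eps) * h
    return total


x0 = 0.7
exact = -math.sin(x0)  # phi'' at x0 for phi = sin
errors = {}
for eps in (0.2, 0.1, 0.05):
    errors[eps] = abs(nonlocal_laplacian(math.sin, x0, eps) - exact)
    print(f"eps = {eps:4.2f}   |L_eps phi - phi''| = {errors[eps]:.2e}")
```

Halving \(\varepsilon \) should divide the error by roughly four. This is consistent with an \(O(\varepsilon ^2)\) remainder: the odd-order Taylor terms cancel because \(\eta _\varepsilon \) is even.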
Similarly, by Taylor expanding \({\overline{\nabla }}\rho ^\varepsilon \) and \({\overline{\nabla }}K *\rho ^\varepsilon \) to first order and changing variables so as to integrate over the unit sphere, while carefully tracking the positive part, one gets
Combining the expressions above yields
This suggests that if \(\rho ^\varepsilon \) converges as \(\varepsilon \rightarrow 0^+\), then the limit \(\rho \) is a solution of the standard nonlocal-interaction equation (3.1). A possible way to attack the local limit within the variational framework is via a stability statement similar to that of Theorem 3.14, but now with respect to the family \((\eta _\varepsilon )_{\varepsilon >0}\) in the limit \(\varepsilon \rightarrow 0^+\). The next remark indicates that this will require further regularity assumptions on the interaction kernel K.
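The formal computation can be condensed as follows. This is a sketch under explicit assumptions:

- (\({\text {NL}}^2 {\text {IE}}\)) is taken in density form with velocity \(v = -{\overline{\nabla }}K*\rho ^\varepsilon \), so that the upwind flux is \(\rho ^\varepsilon (x)({\overline{\nabla }}K*\rho ^\varepsilon )_- - \rho ^\varepsilon (y)({\overline{\nabla }}K*\rho ^\varepsilon )_+\).
- \(\eta _\varepsilon (x,x+z)\) depends only on z, is even in z, and is normalized so that \(\int z\otimes z\,\eta _\varepsilon (x,x+z)\,\text {d}z = 2\,\mathrm{Id}\). The actual normalization is the one fixed in (2.2).
- "\(\approx \)" hides errors that vanish as \(\varepsilon \rightarrow 0^+\) under the boundedness assumptions above.

The step for the upwind term uses the identity \((a\cdot z)(b\cdot z)_+ + (a\cdot (-z))(b\cdot (-z))_+ = (a\cdot z)(b\cdot z)\), which halves the second moment.

$$\begin{aligned}
\partial_t \rho^\varepsilon(x)
&= \int_{\mathbb{R}^d} \Big[\rho^\varepsilon(y)\big(\overline{\nabla} K*\rho^\varepsilon(x,y)\big)_+
   - \rho^\varepsilon(x)\big(\overline{\nabla} K*\rho^\varepsilon(x,y)\big)_-\Big]\,\eta_\varepsilon(x,y)\,\mathrm{d}y\\
&= \int_{\mathbb{R}^d} \overline{\nabla}\rho^\varepsilon(x,y)\,\big(\overline{\nabla} K*\rho^\varepsilon(x,y)\big)_+\,\eta_\varepsilon(x,y)\,\mathrm{d}y
   + \rho^\varepsilon(x)\int_{\mathbb{R}^d} \overline{\nabla} K*\rho^\varepsilon(x,y)\,\eta_\varepsilon(x,y)\,\mathrm{d}y,\\[4pt]
\int_{\mathbb{R}^d} \overline{\nabla} K*\rho^\varepsilon(x,y)\,\eta_\varepsilon(x,y)\,\mathrm{d}y
&\approx \frac12\int_{\mathbb{R}^d} z\cdot D^2(K*\rho^\varepsilon)(x)\,z\;\eta_\varepsilon(x,x+z)\,\mathrm{d}z
 = \Delta (K*\rho^\varepsilon)(x),\\[4pt]
\int_{\mathbb{R}^d} \overline{\nabla}\rho^\varepsilon(x,y)\,\big(\overline{\nabla} K*\rho^\varepsilon(x,y)\big)_+\,\eta_\varepsilon(x,y)\,\mathrm{d}y
&\approx \int_{\mathbb{R}^d} \big(\nabla\rho^\varepsilon(x)\cdot z\big)\big(\nabla(K*\rho^\varepsilon)(x)\cdot z\big)_+\,\eta_\varepsilon(x,x+z)\,\mathrm{d}z
 = \nabla\rho^\varepsilon(x)\cdot\nabla(K*\rho^\varepsilon)(x),\\[4pt]
\partial_t\rho^\varepsilon
&\approx \nabla\rho^\varepsilon\cdot\nabla(K*\rho^\varepsilon) + \rho^\varepsilon\,\Delta(K*\rho^\varepsilon)
 = \nabla\cdot\big(\rho^\varepsilon\,\nabla(K*\rho^\varepsilon)\big).
\end{aligned}$$

The last line has the form of the nonlocal-interaction equation \(\partial _t\rho = \nabla \cdot (\rho \nabla (K*\rho ))\).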
Remark 3.18
We present an example indicating that, in certain situations, solutions of (\({\text {NL}}^2 {\text {IE}}\)) cannot be expected to converge to solutions of (3.1) as the edge-weight function \(\eta _\varepsilon \) becomes more concentrated. Namely, consider \(d=1\), \(\Omega = (-2,2)\) and \(\mu ={\text {Leb}}(\Omega )\). Let \(K(x,y) = 1-e^{-|x-y|}\) for all \(x,y\in \Omega \) and let \(\eta \) be a smooth, even function, positive on \((-0.2,0.2)\) and zero otherwise. Consider \(\rho _0 = \frac{1}{2} (\delta _{-1} + \delta _1)\). It is straightforward to verify that \(\rho _t = \rho _0\) for all \(t\in [0,T]\) yields a weak solution of (\({\text {NL}}^2 {\text {IE}}\)) for all \(\varepsilon >0\). In particular, the corresponding velocity field satisfies \(v(-1,y) = -(K*\rho _0(y) - K*\rho _0(-1)) \leqq 0\) for all \(y \in (-1.2,-0.8)\), and thus the flux from \(x=-1\) remains zero; analogously, the flux from \(x=1\) remains zero. Therefore, one cannot expect the weak solutions for the interaction potential K to converge to weak solutions of (3.1) as \(\varepsilon \rightarrow 0^+\). We believe that, for this particular kernel K and edge weights \(\eta \), the problem persists for strong solutions with initial data close to \(\rho _0\), although explicit solutions are not available.
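The sign claim in the remark can be checked by a direct computation. On \(\Omega \) one has \(K*\rho _0(y) = 1 - e^{-1}\cosh y\) for \(|y|\le 1\) and \(K*\rho _0(y) = 1 - e^{-|y|}\cosh 1\) for \(|y|\ge 1\). Hence \(K*\rho _0\) attains its minimum exactly at \(y=\pm 1\), and \(v(\pm 1,y) \leqq 0\) on all of \(\Omega \). The sketch below confirms this numerically on a grid of \(\Omega \); the grid resolution is an arbitrary choice.

```python
import math


def K(x, y):
    """Interaction kernel of Remark 3.18: K(x, y) = 1 - exp(-|x - y|)."""
    return 1.0 - math.exp(-abs(x - y))


def K_conv_rho0(y):
    """(K * rho_0)(y) for rho_0 = (delta_{-1} + delta_1) / 2."""
    return 0.5 * (K(y, -1.0) + K(y, 1.0))


def velocity(x, y):
    """Nonlocal velocity v(x, y) = -(K*rho_0(y) - K*rho_0(x))."""
    return -(K_conv_rho0(y) - K_conv_rho0(x))


# Midpoints of a uniform grid on Omega = (-2, 2).
grid = [-2.0 + (k + 0.5) / 1000.0 for k in range(4000)]

max_v_from_minus1 = max(velocity(-1.0, y) for y in grid if -1.2 < y < -0.8)
max_v_from_plus1 = max(velocity(1.0, y) for y in grid if 0.8 < y < 1.2)
max_v_anywhere = max(max(velocity(-1.0, y), velocity(1.0, y)) for y in grid)

print(f"max v(-1, y), y in (-1.2, -0.8): {max_v_from_minus1:.3e}")
print(f"max v( 1, y), y in ( 0.8,  1.2): {max_v_from_plus1:.3e}")
print(f"max over all of Omega:           {max_v_anywhere:.3e}")
```

Since the upwind flux out of \(x=\pm 1\) is proportional to \(v(\pm 1,y)_+\), nonpositive velocities mean that no mass leaves either atom.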
References
[1] Agueh, M.: Finsler structure in the \(p\)-Wasserstein space and gradient flows. C. R. Math. Acad. Sci. Paris 350(1–2), 35–40, 2012
[2] Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability Measures. Lectures in Mathematics ETH Zurich. Birkhäuser Verlag, Basel 2008
[3] Balagué, D., Carrillo, J.A., Laurent, T., Raoul, G.: Dimensionality of local minimizers of the interaction energy. Arch. Ration. Mech. Anal. 209(3), 1055–1088, 2013
[4] Bao, D., Chern, S.-S., Shen, Z.: An Introduction to Riemann–Finsler Geometry. Springer, New York 2000
[5] Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396, 2002
[6] Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numer. Math. 84(3), 375–393, 2000
[7] Bertozzi, A.L., Carrillo, J.A., Laurent, T.: Blow-up in multidimensional aggregation equations with mildly singular interaction kernels. Nonlinearity 22(3), 683–710, 2009
[8] Buttazzo, G.: Semicontinuity, relaxation and integral representation in the calculus of variations, volume 207 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, Harlow; co-published in the USA with John Wiley & Sons Inc, New York, 1989
[9] Cancès, C., Gallouët, T.O., Todeschi, G.: A variational finite volume scheme for Wasserstein gradient flows. Preprint arXiv:1907.08305, 2019
[10] Carrillo, J.A., Di Francesco, M., Figalli, A., Laurent, T., Slepčev, D.: Global-in-time weak measure solutions and finite-time aggregation for nonlocal interaction equations. Duke Math. J. 156(2), 229–271, 2011
[11] Carrillo, J.A., McCann, R.J., Villani, C.: Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoam. 19(3), 971–1018, 2003
[12] Chaudhuri, K., Dasgupta, S., Kpotufe, S., von Luxburg, U.: Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inform. Theory 60(12), 7900–7912, 2014
[13] Chen, Y., Georgiou, T.T., Tannenbaum, A.: Vector-valued optimal mass transport. SIAM J. Appl. Math. 78(3), 1682–1696, 2018
[14] Chow, S.-N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008, 2012
[15] Chow, S.-N., Li, W., Zhou, H.: Entropy dissipation of Fokker–Planck equations on graphs. Discrete Contin. Dyn. Syst. 38(10), 4929–4950, 2018
[16] Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5–30, 2006
[17] Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619, 2002
[18] Dahl, M.: A brief introduction to Finsler geometry. 2006. This work is based on my licentiate thesis, Propagation of Gaussian beams using Riemann–Finsler geometry, Helsinki University of Technology, 2006, instructed by doctor Kirsi Peltonen and supervised by professor Erkki Somersalo
[19] Delarue, F., Lagoutière, F.: Probabilistic analysis of the upwind scheme for transport equations. Arch. Ration. Mech. Anal. 199(1), 229–268, 2011
[20] Delarue, F., Lagoutière, F., Vauchelet, N.: Convergence order of upwind type schemes for transport equations with discontinuous coefficients. J. Math. Pures Appl., 2017 (to appear)
[21] Disser, K., Liero, M.: On gradient structures for Markov chains and the passage to Wasserstein gradient flows. Netw. Heterog. Media 10(2), 233–253, 2015
[22] Dolbeault, J., Nazaret, B., Savaré, G.: A new class of transport distances between measures. Calc. Var. Partial Differ. Equ. 34(2), 193–231, 2009
[23] Erbar, M.: Gradient flows of the entropy for jump processes. Ann. Inst. Henri Poincaré Probab. Stat. 50(3), 920–945, 2014
[24] Erbar, M.: A gradient flow approach to the Boltzmann equation. Preprint arXiv:1603.0540, 2018
[25] Erbar, M., Fathi, M., Laschos, V., Schlichting, A.: Gradient flow structure for McKean–Vlasov equations on discrete spaces. Discrete Contin. Dyn. Syst. 36(12), 6799–6833, 2016
[26] Erbar, M., Fathi, M., Schlichting, A.: Entropic curvature and convergence to equilibrium for mean-field dynamics on discrete spaces. ALEA Lat. Am. J. Probab. Math. Stat. 17(1), 445–471, 2020
[27] Erbar, M., Maas, J.: Gradient flow structures for discrete porous medium equations. Discrete Contin. Dyn. Syst. 34(4), 1355–1374, 2014
[28] Erbar, M., Maas, J., Wirth, M.: On the geometry of geodesics in discrete optimal transport. Calc. Var. Partial Differ. Equ. 58(1), 19, 2019
[29] Eymard, R., Gallouët, T., Herbin, R.: Finite volume methods. In: Handbook of Numerical Analysis, number Part 3, pp. 713–1018. Elsevier, 2000
[30] Gangbo, W., Li, W., Mou, C.: Geodesics of minimal length in the set of probability measures on graphs. ESAIM Control Optim. Calc. Var. 25, 78, 2019
[31] García Trillos, N., Slepčev, D.: Continuum limit of total variation on point clouds. Arch. Ration. Mech. Anal. 220(1), 193–241, 2016
[32] García Trillos, N., Slepčev, D.: A variational approach to the consistency of spectral clustering. Appl. Comput. Harmon. Anal. 45(2), 239–281, 2018
[33] Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal. 29(1), 1–17, 1998
[34] Kannan, R., Vempala, S., Vetta, A.: On clusterings: good, bad and spectral. J. ACM 51(3), 497–515, 2004
[35] Kolokolnikov, T., Sun, H., Uminsky, D., Bertozzi, A.: Stability of ring patterns arising from two-dimensional particle interactions. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 84 1 Pt 2, 015203, 2011
[36] Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292, 2011
[37] Mielke, A.: A gradient structure for reaction-diffusion systems and for energy-drift-diffusion systems. Nonlinearity 24(4), 1329–1346, 2011
[38] Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Partial Differ. Equ. 48(1–2), 1–31, 2013
[39] Natale, A., Todeschi, G.: TPFA finite volume approximation of Wasserstein gradient flows. In: Finite Volumes for Complex Applications IX – Methods, Theoretical Aspects, Examples, pp. 193–201. Springer 2020
[40] Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems, pp. 849–856. MIT Press 2001
[41] Ohta, S.-I., Sturm, K.-T.: Heat flow on Finsler manifolds. Commun. Pure Appl. Math. 62(10), 1386–1433, 2009
[42] Ohta, S.-I., Sturm, K.-T.: Non-contraction of heat flow on Minkowski spaces. Arch. Ration. Mech. Anal. 204(3), 917–944, 2012
[43] Peletier, M.A., Rossi, R., Savaré, G., Tse, O.: Jump processes as Generalized Gradient Flows. Preprint arXiv:2006.10624, 2020
[44] Scharfetter, D.L., Gummel, H.K.: Large-signal analysis of a silicon Read diode oscillator. IEEE Trans. Electron Devices 16(1), 64–77, 1969
[45] Schlichting, A., Seis, C.: Convergence rates for upwind schemes with rough coefficients. SIAM J. Numer. Anal. 55(2), 812–840, 2017
[46] Schlichting, A., Seis, C.: Analysis of the implicit upwind finite volume scheme with rough coefficients. Numer. Math. 139(1), 155–186, 2018
[47] Schlichting, A., Seis, C.: The Scharfetter–Gummel scheme for aggregation-diffusion equations. Preprint arXiv:2004.13981, 2020
[48] Serfaty, S.: Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. Discrete Contin. Dyn. Syst. 31(4), 1427–1451, 2011
[49] Shen, Z.: Lectures on Finsler Geometry. World Scientific, Singapore 2001
[50] Villani, C.: Topics in Optimal Transportation, Volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence 2003
Acknowledgements
The authors are grateful to José Antonio Carrillo for several enlightening discussions, to Mark Peletier for insightful remarks on an earlier version of the paper, and to Tryphon Georgiou for valuable information. The authors would like to thank the American Institute of Mathematics (AIM) for its support through the workshop Nonlocal differential equations in collective behavior. AE and AS gratefully acknowledge the support of the Hausdorff Research Institute for Mathematics (Bonn), through the Junior Trimester Program on Kinetic Theory. AE acknowledges support by the EU-funded Erasmus Mundus programme “MathMods – Mathematical models in engineering: theory, methods, and applications” at the University of L’Aquila, and by the German Science Foundation (DFG) through CRC TR 154 “Mathematical Modelling, Simulation and Optimization Using the Example of Gas Networks”. AS is supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy EXC 2044 – 390685587, Mathematics Münster: Dynamics–Geometry–Structure, and EXC 2047 – 390685813, Hausdorff Center for Mathematics, as well as the Collaborative Research Center 1060 – 211504053, The Mathematics of Emergent Effects at the Universität Bonn. DS is grateful to NSF for support via Grants DMS 1516677 and DMS 1814991 and via Ki-Net (NSF Research Network Grant RNMS 1107444). DS and FSP are grateful to the Center for Nonlinear Analysis of CMU for its support.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Communicated by A. Figalli
Appendix A: Minkowski Norm of the Underlying Finsler Structure
In this appendix we show that, given \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), the inner product \(g_{\rho ,{\varvec{j}}}\) from Section 3.1 derives from a so-called Minkowski norm, as required in Finsler geometry; see [4, 18, 49].
Let us fix \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) which is absolutely continuous with respect to \(\mu \), in accordance with Section 3.1. For \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), we denote by j its density with respect to \(\mu \otimes \mu \). We show that the function \(F_\rho :T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow {\mathbb {R}}\) given by
is a Minkowski norm, that is, it is smooth away from 0, positively 1-homogeneous and, whenever \({\varvec{j}}\) is nonzero \(\eta \, \mu \otimes \mu \)-a.e., its second variation is a symmetric positive definite bilinear form. In fact, we now prove that, for all \({\varvec{j}},{\varvec{j}}_1,{\varvec{j}}_2 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \({\varvec{j}}\) is nonzero \(\eta \,\mu \otimes \mu \)-a.e.:
Indeed, let \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), \(s\in {\mathbb {R}}\) and \({\varvec{j}}_1 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \({\varvec{j}}+s{\varvec{j}}_1 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then,
Note that
Therefore,
Similarly, one gets
We also have that
Using Lebesgue’s dominated convergence theorem, one gets, moreover, that
We thus overall get
which shows that
Note that this equality was used in Section 3.1 to show that \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\) is indeed the direction of steepest descent from \(\rho \), i.e., to obtain (3.9). Computing a further derivative in direction \({\varvec{j}}_2 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and using similar boundedness arguments, we get
The integral over the set \(\{j=0\}\) appears because, unlike in the first derivative, j no longer enters the integrand as a multiplicative factor, so the points where \(j = 0\) have to be taken into account. If \({\varvec{j}}\ne 0\) holds \(\eta \, \mu \otimes \mu \)-a.e., this integral over \(\{j=0\}\) vanishes, which yields the claim.
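The two properties used above can be illustrated in finite dimensions. The sketch below is only an analogue and does not reproduce the exact \(F_\rho \) of Section 3.1. It replaces \(F_\rho \) by a hypothetical upwind-type asymmetric norm on \({\mathbb {R}}^m\),
\(F(j)^2 = \sum _k w_k\big ((j_k)_+^2/a_k + (j_k)_-^2/b_k\big )\),
with positive weights \(w_k\) standing in for \(\eta \), and \(a_k, b_k\) standing in for the densities at the two endpoints of an edge. The code checks four things:

- F is positively 1-homogeneous.
- At a j with all entries nonzero, the Hessian of \(F^2/2\) is the diagonal positive definite matrix obtained by differentiating under the sum.
- \(g_j(j,j) = F(j)^2\).
- At an entry \(j_k = 0\), the one-sided second derivatives are \(w_k/a_k\) and \(w_k/b_k\). These differ whenever \(a_k \ne b_k\), which mirrors why the set \(\{j=0\}\) has to be treated separately.

```python
import random


def half_F2(j, w, a, b):
    """Hypothetical upwind-type form 1/2 * sum_k w_k ((j_k)_+^2 / a_k + (j_k)_-^2 / b_k)."""
    return 0.5 * sum(wk * (max(jk, 0.0) ** 2 / ak + max(-jk, 0.0) ** 2 / bk)
                     for jk, wk, ak, bk in zip(j, w, a, b))


def F(j, w, a, b):
    """Asymmetric norm F(j) = sqrt(2 * half_F2(j)); F(-j) != F(j) in general."""
    return (2.0 * half_F2(j, w, a, b)) ** 0.5


def predicted_tensor(j, w, a, b):
    """Hessian of half_F2 at j when all j_k != 0: diagonal, w_k/a_k if j_k > 0, w_k/b_k if j_k < 0."""
    m = len(j)
    return [[(w[k] / a[k] if j[k] > 0 else w[k] / b[k]) if k == l else 0.0
             for l in range(m)] for k in range(m)]


def fd_hessian(f, j, h):
    """Central mixed second differences of f at j with step h."""
    m = len(j)

    def shifted(k, sk, l, sl):
        x = list(j)
        x[k] += sk * h
        x[l] += sl * h
        return f(x)

    return [[(shifted(k, 1, l, 1) - shifted(k, 1, l, -1)
              - shifted(k, -1, l, 1) + shifted(k, -1, l, -1)) / (4.0 * h * h)
             for l in range(m)] for k in range(m)]


rng = random.Random(1)
m = 5
w = [rng.uniform(0.5, 2.0) for _ in range(m)]   # stand-in for the edge weights eta
a = [rng.uniform(0.5, 2.0) for _ in range(m)]   # stand-in for rho at the edge tail
b = [rng.uniform(0.5, 2.0) for _ in range(m)]   # stand-in for rho at the edge head
j = [rng.choice((-1.0, 1.0)) * rng.uniform(0.1, 1.0) for _ in range(m)]  # all entries nonzero


def f(x):
    return half_F2(x, w, a, b)


# (i) positive 1-homogeneity
homog_gap = abs(F([2.5 * t for t in j], w, a, b) - 2.5 * F(j, w, a, b))
# (ii) finite-difference Hessian of F^2/2 vs. the predicted diagonal tensor
G = predicted_tensor(j, w, a, b)
H = fd_hessian(f, j, h=1e-3)   # 2h < min_k |j_k|, so no entry changes sign in the stencil
hess_gap = max(abs(H[k][l] - G[k][l]) for k in range(m) for l in range(m))
# (iii) Euler identity g_j(j, j) = F(j)^2
euler_gap = abs(sum(G[k][k] * j[k] ** 2 for k in range(m)) - F(j, w, a, b) ** 2)
# (iv) one-sided second derivatives in direction e_0 at a point with j_0 = 0
h = 1e-3
j0 = [0.0] + j[1:]
right = 2.0 * (f([h] + j[1:]) - f(j0)) / h**2
left = 2.0 * (f([-h] + j[1:]) - f(j0)) / h**2
print(homog_gap, hess_gap, euler_gap, right, left)
```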
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Esposito, A., Patacchini, F.S., Schlichting, A. et al. Nonlocal-Interaction Equation on Graphs: Gradient Flow Structure and Continuum Limit. Arch Rational Mech Anal 240, 699–760 (2021). https://doi.org/10.1007/s00205-021-01631-w
DOI: https://doi.org/10.1007/s00205-021-01631-w