## Notation

We list here some symbols used throughout the paper.

• $${\mathcal {M}}(A)$$ is the set of Borel measures on $$A \subseteq {\mathbb {R}}^d$$.

• $${\mathcal {M}}^+(A)$$ is the set of non-negative Borel measures on A.

• $${\mathcal {P}}(A)\subset {\mathcal {M}}^+(A)$$ is the set of Borel probability measures on A.

• $${\mathcal {P}}_{2}(A)\subseteq {\mathcal {P}}(A)$$ stands for the elements of $${\mathcal {P}}(A)$$ with finite second moment, that is,

\begin{aligned} M_2(\rho ) := {\int }_{A} |x|^2\,\text {d}\rho (x) < \infty . \end{aligned}
• $$C_\mathrm {b}(A)$$ is the set of bounded continuous functions from A to $${\mathbb {R}}$$.

• $$a_+:=\max \{0,a\}$$ and $$a_-:=(-a)_+$$ are the positive and negative parts of $$a \in {\mathbb {R}}$$.

• $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ sets the underlying geometry of the state space; it is sometimes referred to as base measure.

• $$\rho \in {\mathcal {P}}({\mathbb {R}}^d)$$ denotes a configuration; the natural setting is that $${{\,\mathrm{supp}\,}}\rho \subseteq {{\,\mathrm{supp}\,}}\mu$$, although we allow for general supports as needed for stability results.

• $$\eta :\{ (x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y \}\rightarrow [0,\infty )$$ is the edge weight function.

• $$G= \{ (x,y) \in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y ,\, \eta (x,y)>0\}$$ is the set of edges.

• $$\rho _1\otimes \rho _2 \in {\mathcal {M}}^+(G)$$ is the product measure of $$\rho _1, \rho _2 \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ restricted to G.

• $$\gamma _1 = \rho \otimes \mu$$ and $$\gamma _2 = \mu \otimes \rho$$.

• $${\mathcal {V}}^{\mathrm {as}}(G)$$ is the set of antisymmetric graph vector fields on G, defined in (1.6).

• $${\overline{\nabla }}f$$ is the nonlocal gradient of a function $$f :{\mathbb {R}}^d \rightarrow {\mathbb {R}}$$, while $${\overline{\nabla }}\cdot {\varvec{j}}$$ is the nonlocal divergence of a measure-valued flux $${\varvec{j}}\in {\mathcal {M}}(G)$$; see Definition 2.7.

• $${\mathcal {A}}$$ stands for the action functional; see Definition 2.3.

• $${\mathcal {T}}$$ denotes the nonlocal transportation quasi-metric; see (2.22).

• $${{\,\mathrm{CE}\,}}_T(\rho _0,\rho _1)$$ denotes the set of paths (solutions to the nonlocal continuity equation for densities (1.7) or measures (2.12)) on the time interval [0, T] connecting two measures $$\rho _0, \rho _1\in {\mathcal {P}}({\mathbb {R}}^d)$$; we set $${{\,\mathrm{CE}\,}}:={{\,\mathrm{CE}\,}}_1$$.

Let us also specify the notions of narrow convergence and convolution. A sequence $$(\rho ^n)_n\subset {\mathcal {M}}(A)$$ is said to converge narrowly to $$\rho \in {\mathcal {M}}(A)$$, in which case we write $$\rho ^n \rightharpoonup \rho$$, provided that

\begin{aligned} \forall f\in C_\mathrm {b}(A), \qquad \qquad {\int }_A f\,\text {d}{\rho ^n} \rightarrow {\int }_A f \,\text {d}{\rho } \qquad \text {as } n \rightarrow \infty . \end{aligned}

Given a function $$f :A \times A \rightarrow {\mathbb {R}}$$ and $$\rho \in {\mathcal {M}}(A)$$, we write $$f*\rho$$ for the convolution of f and $$\rho$$, that is,

\begin{aligned} f*\rho (x)={\int }_A f(x,y)\,\text {d}\rho (y) \quad \text {for any} \, x \in A \, \text {such that the right-hand side exists}. \end{aligned}
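For a discrete measure $$\rho = \sum _i w_i \delta _{y_i}$$ this convolution reduces to a finite sum, which is the form used repeatedly in the graph setting below. A minimal Python sketch (the function name and example values are ours, purely illustrative):

```python
def convolve(f, points, weights, x):
    """Compute (f * rho)(x) = sum_i w_i f(x, y_i) for the discrete
    measure rho = sum_i w_i * delta_{y_i}."""
    return sum(w * f(x, y) for y, w in zip(points, weights))

# Example: f(x, y) = |x - y| and rho the uniform measure on {0, 1, 2}
f = lambda x, y: abs(x - y)
val = convolve(f, points=[0.0, 1.0, 2.0], weights=[1/3, 1/3, 1/3], x=0.0)
```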

## 1 Introduction

We investigate dynamics driven by interaction energies on graphs, and their continuum limits. We interpret the relevant dynamics as gradient flows of the interaction energy with respect to a particular graph analogue of the Wasserstein distance. We prove the convergence of the dynamics on finite graphs to a continuum dynamics as the number of vertices goes to infinity. To do this we create a unified setup where the continuum and the discrete dynamics are both seen as particular instances of the gradient flow of the same energy, with respect to a nonlocal Wasserstein quasi-metric whose state space is adapted to the configuration space considered.

Let us first introduce the problem on finite graphs where it is the simplest to describe.

### 1.1 Graph Setting with General Interactions

Consider an undirected graph with vertices $$X =\{x_1, \dots , x_n\}$$ and edge weights $$w_{x,y} \geqq 0$$, satisfying $$w_{x,y} = w_{y,x}$$ for all $$x,y \in X$$. Although technically not necessary, we impose the natural requirement that $$w_{x,x}=0$$. The interaction potential is a symmetric function $$K :X \times X \rightarrow {\mathbb {R}}$$, while the external potential is denoted $$P:X \rightarrow {\mathbb {R}}$$. We consider a “mass” distribution $$\rho :X \rightarrow [0, \infty )$$, and we require $$\sum _{x \in X} \rho _x =1$$. The total energy $${\mathcal {E}}_X:{\mathcal {P}}(X)\rightarrow {\mathbb {R}}$$ is a combination of the interaction energy $${\mathcal {E}}_I$$ and the potential energy $${\mathcal {E}}_P$$:

\begin{aligned} {\mathcal {E}}_X(\rho ) = {\mathcal {E}}_{I}(\rho ) + {\mathcal {E}}_P(\rho ) = \frac{1}{2} \sum _{x \in X} \sum _{y \in X} K_{x,y} \rho _x \rho _y + \sum _{x \in X} P_x \rho _x. \end{aligned}
(1.1)

The gradient descent of $${\mathcal {E}}_X$$ that we study is described by the following system of ODE for the mass distribution:

\begin{aligned} \frac{\text {d}\rho _x}{\text {d}t}&= - \frac{1}{2} \sum _{y\in X} \big (j_{x,y} - j_{y,x}\big ) w_{x,y}, \end{aligned}
(1.2)
\begin{aligned} j_{x,y}&= \frac{1}{n} \big (\rho _x (v_{x,y})_+ - \rho _y (v_{x,y})_-\big ), \end{aligned}
(1.3)
\begin{aligned} v_{x,y}&= -\sum _{z\in X} \rho _z (K_{y,z}-K_{x,z}) - (P_y - P_x). \end{aligned}
(1.4)

The quantities $$v:X \times X \rightarrow {\mathbb {R}}$$ and $$j:X \times X \rightarrow {\mathbb {R}}$$ are defined on edges and model the graph analogues of velocity and flux. An evolution under such a system is illustrated in Fig. 1. The system (1.2)–(1.4) is the gradient flow of the energy $${\mathcal {E}}_X$$ with respect to a new graph analogue of the Wasserstein metric. The concept of Wasserstein metrics on finite graphs was introduced independently by Chow et al. [14], Maas [36], and Mielke [37, 38]. All of these approaches rely on graph analogues of the continuity equation to describe paths in the configuration space. On graphs the mass is distributed over the vertices and is exchanged along the edges. Hence, the analogues of the vector field and the flux are defined on the edges. However, the flux should be the product of the velocity (an edge-based quantity) and the density (a vertex-based quantity). Thus, one has to interpolate the densities at the vertices to define the density (and hence the flux) along the edges. The choice of interpolation is not unique and has important ramifications.
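To make the system concrete, (1.2)–(1.4) can be transcribed into a few lines of numpy. This is only an illustrative sketch under our own naming conventions (`rhs`, `force`), not code from the paper:

```python
import numpy as np

def rhs(rho, K, P, w):
    """Right-hand side of (1.2)-(1.4) for n vertices.
    rho: (n,) masses; K: (n,n) symmetric kernel; P: (n,) potential;
    w: (n,n) symmetric non-negative weights with zero diagonal."""
    n = len(rho)
    # (1.4): v[x,y] = -sum_z rho_z (K[y,z]-K[x,z]) - (P[y]-P[x])
    force = K @ rho + P                      # vertex-based quantity
    v = force[:, None] - force[None, :]      # v[x,y] = force_x - force_y
    # (1.3): upwind flux j[x,y] = (rho_x v_+ - rho_y v_-) / n
    j = (rho[:, None] * np.maximum(v, 0) - rho[None, :] * np.maximum(-v, 0)) / n
    # (1.2): drho_x/dt = -(1/2) sum_y (j[x,y] - j[y,x]) w[x,y]
    return -0.5 * ((j - j.T) * w).sum(axis=1)

# Two vertices with an attractive kernel K(x,y) = |x - y| on {0, 1}
rho = np.array([0.8, 0.2])
K = np.array([[0.0, 1.0], [1.0, 0.0]])
w = np.array([[0.0, 1.0], [1.0, 0.0]])
drho = rhs(rho, K, np.zeros(2), w)
```

Summing the output over the vertices gives zero, reflecting conservation of mass; in this example the mass flows towards the heavier vertex, as expected for an attractive kernel.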

While the overall structure of our setup is derived from the one in [36], which we recall in Section 1.4, the form of the interpolation used is related to the upwind interpolation used in [14] and is almost identical to the one in [13]. While in [14] the authors considered only the direction of the flux due to the potential energy to determine which density to use on the edges, in our case the density chosen depends on the total velocity, and we furthermore include the interaction term, which itself depends on the configuration. In particular, we use an upwind interpolation based on the total velocity. In the context of the graph Wasserstein distance, such an interpolation was first used by Chen et al. [13].

The “velocities” v we consider can be assumed to be antisymmetric: $$v_{x,y} = - v_{y,x}$$ for all $$x,y \in X$$. In the graph setting, which we normalize in order to consider the limit $$n \rightarrow \infty$$, the continuity equation with upwind interpolation is obtained by combining (1.2) with the flux–velocity relation (1.3). Similarly to [36], and exactly as in [13], we define the graph Wasserstein distance by minimizing the action, which is the graph analogue of the kinetic energy:

\begin{aligned} A(\rho ,v) = \frac{1}{n} \sum _{x \in X} \sum _{y \in X} \big ((v_{x,y})_+\big )^2 w_{x,y}\, \rho _x. \end{aligned}

As in [13, 14, 36, 38], the graph Wasserstein distance is defined by adapting the Benamou–Brenier formula:

\begin{aligned} {\mathcal {T}}(\rho ^0,\rho ^1)^2 = \inf _{(\rho ,v) \in {{\,\mathrm{CE}\,}}_X(\rho ^0,\rho ^1)} {\int }_0^1 A(\rho (t), v(t)) \,\text {d}{t}, \end{aligned}

where $${{\,\mathrm{CE}\,}}_X(\rho ^0,\rho ^1)$$ is the set of all paths (i.e., solutions of (1.2)–(1.3)) connecting $$\rho ^0$$ and $$\rho ^1$$.

It is important to observe that, in our setting, $${\mathcal {T}}$$ is not symmetric (that is, $${\mathcal {T}}(\rho ^0,\rho ^1)$$ is in general different from $${\mathcal {T}}(\rho ^1,\rho ^0)$$). The reason is that, in general, $$A(\rho ,v) \ne A(\rho , -v)$$. Therefore the nonlocal Wasserstein distance which arises from the upwind interpolation is only a quasi-metric. The action $$A(\rho ,v)$$ endows the tangent space with a Finsler structure, instead of the usual Riemannian structure. Formally, the system (1.2)–(1.4) is the gradient flow of $${\mathcal {E}}_X$$ with respect to this Finsler structure; we present a derivation of this fact in a more general setting in Section 3.1. The system is also the curve of steepest descent with respect to the quasi-metric $${\mathcal {T}}$$, which is the point of view we use to develop a rigorous theory in the general setting.
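The asymmetry is already visible on a single edge. In the following sketch (all values illustrative) the forward action weighs the velocity by the source density $$\rho _0 = 0.9$$, while the reversed velocity weighs it by $$\rho _1 = 0.1$$:

```python
def action(rho, v, w):
    """Graph action A(rho, v) = (1/n) sum_{x,y} ((v_{x,y})_+)^2 w_{x,y} rho_x."""
    n = len(rho)
    return sum(max(v[x][y], 0.0) ** 2 * w[x][y] * rho[x]
               for x in range(n) for y in range(n)) / n

rho = [0.9, 0.1]
v = [[0.0, 1.0], [-1.0, 0.0]]       # antisymmetric: mass pushed from vertex 0 to 1
w = [[0.0, 1.0], [1.0, 0.0]]

fwd = action(rho, v, w)                           # weighs by rho_0 = 0.9
bwd = action(rho, [[-a for a in row] for row in v], w)  # weighs by rho_1 = 0.1
# fwd != bwd, so the induced distance T is only a quasi-metric
```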

### Remark 1.1

The well-posedness of (1.2)–(1.4) is a straightforward consequence of the Picard existence theorem. Namely, note that the simplex $$\{\rho : 0 \leqq \rho _x \leqq 1 \ \text {for all}\ x \in X,\ \sum _{x \in X} \rho _x =1\}$$ is an invariant region of the dynamics, and that on it the vector field defined by (1.2)–(1.4) is Lipschitz continuous in $$(\rho _x)_{x \in X}$$.

### Remark 1.2

One could consider other interpolations instead of the upwind one. In particular, if we considered an interpolation of the form $$I(\rho _x, \rho _y)$$, the only change in the gradient flow would be that the velocity–flux relation (1.3) would become $$j_{x,y} = \frac{1}{n} I(\rho _x, \rho _y) v_{x,y}$$. We note that this can have major implications for the resulting dynamics. In particular, for the logarithmic interpolation, $$I(r,s) = (r-s)/(\ln r - \ln s)$$, or the geometric interpolation, $$I(r,s) = \sqrt{rs}$$, the resulting dynamics would never expand the support of the solutions, so even for repulsive potentials the mass may not spread throughout the domain. On the other hand, using the arithmetic interpolation, $$I(r,s) =(r+s)/2$$, would not work directly, since the solutions may become negative. In this case additional technical steps, like a Lagrange multiplier as in [39], are necessary to obtain the evolution of a non-negative probability density. We use the more physically inspired upwind flux, which automatically ensures the positivity of the density.
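The interpolations mentioned in this remark are easy to compare directly. The sketch below (function names ours) illustrates why the logarithmic and geometric means suppress flux out of regions of vanishing density, while the arithmetic mean does not:

```python
import math

def logarithmic(r, s):
    """Logarithmic mean; continuously extended on the diagonal."""
    return r if math.isclose(r, s) else (r - s) / (math.log(r) - math.log(s))

def geometric(r, s):
    return math.sqrt(r * s)

def arithmetic(r, s):
    return (r + s) / 2

# Near a vanishing density the means behave very differently: the
# logarithmic and geometric means vanish as one argument -> 0,
# while the arithmetic mean stays bounded away from zero.
r, s = 1.0, 1e-6
vals = {"log": logarithmic(r, s), "geo": geometric(r, s), "ari": arithmetic(r, s)}
```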

Before we turn to the general setting we point out that the system (1.2)–(1.4) offers a new model of graph-based clustering, which is briefly discussed in Section 1.5.

### 1.2 General Setting for Vertices in Euclidean Space

Here we introduce the general framework for studies of interaction equations on families of graphs and their limits as the number of vertices n goes to $$\infty$$. In particular, in the applications to machine learning which we briefly discuss in Section 1.5, the graphs considered are random samples of some underlying measure in Euclidean space, and the edge weights, as well as the interaction energy, depend on the positions of the vertices. The vertices are points in $${\mathbb {R}}^d$$. The edges are given in terms of a non-negative symmetric weight function $$\eta :\{ (x,y) \in {\mathbb {R}}^d \times {\mathbb {R}}^d : x \ne y \} \rightarrow [0, \infty )$$, which defines the set of edges as $$G=\{ (x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d : x\ne y , \,\eta (x,y)>0\}$$. Compared to the discrete setting, the set of vertices is replaced by the more general notion of a measure on $${\mathbb {R}}^d$$; the discrete graphs with vertices $$X = \{x_1, \dots , x_n\} \subset {\mathbb {R}}^d$$ correspond to $$\mu$$ being the empirical measure of the set of points, $$\mu = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}$$. The distribution of mass over the vertices is described by the measure $$\rho \in {\mathcal {P}}({\mathbb {R}}^d)$$, and in most applications we consider $${{\,\mathrm{supp}\,}}\rho \subseteq {{\,\mathrm{supp}\,}}\mu$$. However, in order to prove general stability results (e.g., Theorem 3.14), we need to allow part of the support of $$\rho$$ to lie initially outside the support of $$\mu$$; we think of such mass as outside of the domain specified by $$\mu$$. The mass starting outside of the support of $$\mu$$ can only flow into the support of $$\mu$$. Here we present the evolution assuming $$\rho \ll \mu$$, while in Sections 2 and 3 we present the setup in full generality. Furthermore, we denote by $$\rho$$ both the measure and its density with respect to $$\mu$$.

The evolution of interest is the gradient descent of the energy $${\mathcal {E}}:{\mathcal {P}}({\mathbb {R}}^d)\rightarrow {\mathbb {R}}$$ given by

\begin{aligned} {\mathcal {E}}(\rho )= \frac{1}{2}{\int }_{{\mathbb {R}}^{d}}{\int }_{{\mathbb {R}}^{d}}K(x,y)\,\text {d}\rho (x)\,\text {d}\rho (y) + {\int }_{{\mathbb {R}}^{d}}P(x) \,\text {d}\rho (x), \end{aligned}
(1.5)

where $$K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$ is symmetric and $$P:{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$. This energy generalizes (1.1) in terms of the configurations $$\rho$$ and specializes it in terms of the type of potentials K and P considered. In fact, from now on we omit the subscript X referring to the vertices (e.g. in the energy), since our general setting allows for distribution of mass outside of the support of $$\mu$$. The gradient flow we consider takes the form

\begin{aligned} \partial _t\rho _t + {\overline{\nabla }}\cdot {\varvec{j}}_t&= 0, \\ {\varvec{j}}_t(x,y)&= \rho _t(x)\, v_t(x,y)_+ - \rho _t(y)\, v_t(x,y)_-, \\ v_t(x,y)&= -{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _t)(x,y). \end{aligned}
($${\text {NL}}^2 {\text {IE}}$$)

The system ($${\text {NL}}^2 {\text {IE}}$$) consists, first, of a nonlocal continuity equation, in which the divergence $${\overline{\nabla }}\cdot$$ encodes the graph structure described through $$\mu$$ and $$\eta$$ (see Definition 2.7). Second, it involves a mapping from velocity to flux, which in our case is the upwind flux and encodes the geometry of the gradient structure. Finally, the third equation identifies the driving velocity as the nonlocal gradient of the variation of the energy (1.5). Overall, we obtain that ($${\text {NL}}^2 {\text {IE}}$$) is the gradient flow of the energy $${\mathcal {E}}$$ with respect to a generalization of the graph Wasserstein metric we now introduce.

#### 1.2.1 Nonlocal Continuity Equation

Let us set

\begin{aligned} {\mathcal {V}}^{\mathrm {as}}(G) := \{v:G\rightarrow {\mathbb {R}}: v(x,y) = -v(y,x)\ \text {for all}\ (x,y)\in G\} \end{aligned}
(1.6)

and call its elements nonlocal (antisymmetric) vector fields on G; for any pair $$(x,y) \in G$$ the value $$v(x,y)$$ can be regarded as a jump rate from x to y. Let us fix a final time $$T>0$$ throughout the paper and let a family $$\{v_t\}_{t\in [0,T]}\subset {\mathcal {V}}^{\mathrm {as}}(G)$$ be given. In the case where $$\rho _t \ll \mu$$ for all $$t\in [0,T]$$, it is possible to combine the first two equations in ($${\text {NL}}^2 {\text {IE}}$$) to arrive at the nonlocal continuity equation

\begin{aligned} \partial _t\rho _t(x)+{\int }_{{\mathbb {R}}^{d}}\left( \rho _t(x)v_t(x,y)_+-\rho _t(y)v_t(x,y)_-\right) \eta (x,y)\,\text {d}\mu (y) = 0, \quad \mu \text {-a.e.}\ x \in {{\mathbb {R}}^{d}}. \end{aligned}
(1.7)

For general curves $$\rho :[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})$$, it is necessary to consider the weak form of (1.7), which is discussed in Section 2.3.

We remark that the general setup we develop allows for the solution $$\rho$$ to develop atoms and persist even after the atoms have formed. Heuristic arguments and numerical experiments indicate that there are equations covered by our theory for which this is the case. For example, if $$\mu$$ is the Lebesgue measure on $${\mathbb {R}}$$, $$\rho _0$$ the restriction of the Lebesgue measure to $$[-0.5,0.5]$$, $$K(x,y) = |x-y|$$ and $$\eta (x,y)= 1/(x-y)^2$$, then the solutions develop delta mass concentrations at 0 in finite time. Understanding for which K and $$\eta$$ solutions do develop finite time singularities is an interesting open problem.

We note that when defining the flux in (1.7) we take the density along an edge to be the density at the source, analogously to an upwind numerical scheme. While, as we show, this leads to a convenient framework for studying the dynamics, it creates the difficulty that the resulting distance, which we are about to define, is not symmetric and is thus only a quasi-metric.

#### 1.2.2 Upwind Nonlocal Transportation Metric

We use the nonlocal continuity equation (1.7) to define a nonlocal Wasserstein quasi-distance in analogy to the Benamou–Brenier formulation [6] for the classical Kantorovich–Wasserstein distances [50]. That is, for two probability measures $$\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, let

\begin{aligned} {\mathcal {T}}_\mu (\rho _0,\rho _1)^2 :=\inf _{(\rho ,v)\in {{\,\mathrm{CE}\,}}(\rho _0,\rho _1)}\left\{ {\int }_0^1{\iint }_G|v_t(x,y)_+|^2\eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\,\text {d}t\right\} , \end{aligned}
(1.8)

where $${{\,\mathrm{CE}\,}}(\rho _0,\rho _1)$$ is the set of weak solutions $$\rho$$ to the nonlocal continuity equation (see Definition 2.14) on [0, 1] with $$\rho (0)=\rho _0$$ and $$\rho (1)=\rho _1$$. We note that the notion of a nonlocal Wasserstein distance for measures on $${\mathbb {R}}^d$$ was introduced by Erbar [23], who used it to study the fractional heat equation. One difference is that the interpolation we consider is beyond the scope of [23]. Very recently, [43] extended the gradient-flow viewpoint on jump processes to generalized gradient structures driven by a broad class of internal energies.

Another difference is that here the measure $$\mu$$ plays an important role in how the action is measured and allows one to incorporate seamlessly both the continuum case (e.g., $$\mu$$ is the Lebesgue measure on $${\mathbb {R}}^d$$) and the graph case ($$\mu$$ is the empirical measure of the set of vertices).

The notions above are rigorously developed in Section 2, where we list the precise assumption (W) on the edge weight function $$\eta$$ and the joint assumptions (A1) and (A2) on $$\eta$$ and the underlying measure $$\mu$$. We then rigorously introduce the action (Definition 2.3), which is a nonlocal analogue of the kinetic energy; we show its fundamental properties, in particular joint convexity (Lemma 2.12) and lower semicontinuity with respect to narrow convergence (Lemma 2.9). In Section 2.3 we rigorously introduce the nonlocal continuity equation in measure-valued flux form (2.12); we introduce the notion on all of $${\mathbb {R}}^d$$, where $$\mu$$ does not initially play a role. The measure $$\mu$$ enters the framework through the consideration of paths of finite action. Proposition 2.17 establishes an important compactness property of sequences of solutions. In Section 2.4 we turn our attention to the nonlocal Wasserstein quasi-metric based on the upwind interpolation, which we introduce in Definition 2.18. The compactness of solutions of the nonlocal continuity equation and the lower semicontinuity of the action imply the existence of (directed) geodesics (Proposition 2.20). We do not characterize the geodesics; nevertheless, we note that this is an interesting problem. A possible approach in this direction is via duality, using nonlocal analogues of the Hamilton–Jacobi equations, similarly to how this problem was recently treated in the discrete setting in [28, 30]. Following the work of Erbar [23], we show that the nonlocal Wasserstein quasi-metric generates a topology on the set of probability measures which is stronger than the $$W_1$$ topology (i.e., that of the Monge distance, or 1-Wasserstein metric). Analogously to [2], we show the equivalence between the paths of finite length with respect to the quasi-metric and the solutions of the nonlocal continuity equation with finite action (Proposition 2.20).
The set of probability measures endowed with the quasi-metric $${\mathcal {T}}$$ has the formal structure of a Finsler manifold, and parts of this structure can be described; in particular, in (2.27) we describe the tangent space at a given measure $$\rho$$ using fluxes. We note that using fluxes, instead of velocities, is necessary since, because of the upwinding, the relation between velocities and tangent vectors is not linear (Proposition 2.26) and in particular not symmetric. For this reason the resulting gradient structure is also different from the large class of nonlinear, but still symmetric, flux–velocity relations considered in [43]. We conclude Section 2 by showing that, given a measure $$\mu$$, the finiteness of the action ensures that any path starting within the support of $$\mu$$ remains within the support of $$\mu$$ (Proposition 2.28).

#### 1.2.3 Nonlocal Nonlocal-Interaction Equation

In Section 3 we develop the existence theory for equation ($${\text {NL}}^2 {\text {IE}}$$) based on its interpretation as the gradient flow of $${\mathcal {E}}$$ with respect to the quasi-metric $${\mathcal {T}}$$ defined in (1.8). We begin by listing the precise conditions (K1)–(K3) on the interaction kernel K. We note that these are less restrictive than the typical conditions for the well-posedness of the standard nonlocal-interaction equation in the Euclidean setting [2, 10].

Before we turn to the rigorous theory of weak solutions as curves of maximal slope in a quasi-metric space, we discuss the gradient flow structure in a more geometric setting, namely the Finsler structure related to $${\mathcal {T}}$$. Indeed, the action [formally given by the time integrand in (1.8), and rigorously defined by (2.4)] defines a positively homogeneous norm (namely a Minkowski norm) on the tangent space. The Hessian of the square of this norm endows the tangent space at each measure with the formal structure of a Riemannian manifold. We compute this Riemannian metric in “Appendix A” under an absolute-continuity assumption. Under this assumption, we show in Section 3.1 that ($${\text {NL}}^2 {\text {IE}}$$) is the gradient flow of $${\mathcal {E}}$$ with respect to the Finsler structure. For simplicity, we consider $$P \equiv 0$$, since the extension to $$P \not \equiv 0$$ is straightforward, as explained in Remark 3.2.

In Section 3.2 we develop the rigorous gradient descent formulation based on curves of maximal slope in the space of probability measures endowed with the quasi-metric $${\mathcal {T}}$$. The theory of gradient flows in the spaces of probability measures endowed with the standard Wasserstein metric was developed in [2]. Here we extend it to the setting of quasi-metric spaces, endowed with the nonlocal Wasserstein distance. This requires several delicate arguments. We start by introducing the notions of one-sided strong upper gradient (Definition 3.12) and curves of maximal slope (Definition 3.8). We define the local slope $${\mathcal {D}}$$ in (3.19) by using a heuristically derived gradient of the energy $${\mathcal {E}}$$, and show, using a chain rule established in Proposition 3.10, that $$\sqrt{{\mathcal {D}}}$$ is a one-sided strong upper gradient for $${\mathcal {E}}$$ with respect to $${\mathcal {T}}$$. One of our main results is Theorem 3.9, which establishes the equivalence between curves of maximal slope and weak solutions of ($${\text {NL}}^2 {\text {IE}}$$). In Section 3.4 we prove several important results. Namely Theorem 3.14 establishes that the De Giorgi functional $${\mathcal {G}}_T$$ is stable under variations of the base measure $$\mu$$ and of the solutions. A consequence of this result is the convergence of solutions of ($${\text {NL}}^2 {\text {IE}}$$) on graphs defined on random samples of a measure to solutions of ($${\text {NL}}^2 {\text {IE}}$$) corresponding to the full underlying measure (Remark 3.17). The proof of Theorem 3.14 relies on the lower semicontinuity of the local slope (Lemma 3.12) and the lower semicontinuity of the De Giorgi functional (3.13). Another important consequence is the existence of weak solutions of ($${\text {NL}}^2 {\text {IE}}$$), which is proved in Theorem 3.15.

### Remark 1.3

(Asymptotics) Describing the steady states and determining the long-time asymptotics of ($${\text {NL}}^2 {\text {IE}}$$) are natural and important problems. Both questions have been extensively studied for the nonlocal-interaction equations (NLI) which are Wasserstein gradient flows of (1.5) with $$P \equiv 0$$. For attractive interaction potentials it was shown that the solutions converge to a delta mass [7], while for more general repulsive–attractive potentials very rich families of steady states were discovered [3, 35]. We remark that the dynamics of the ($${\text {NL}}^2 {\text {IE}}$$) can be significantly different. Namely, as the example of Remark 3.18 shows, the solutions for attractive potentials do not necessarily converge to a point.

A further question closely related to asymptotics is the contractivity of solutions of ($${\text {NL}}^2 {\text {IE}}$$). For Riemannian gradient flows the contractivity of the flow follows from the geodesic convexity of the energy. In particular, if $$K(x,y)=k(x-y)$$, where k is symmetric and convex, the NLI flow is contractive in the Wasserstein metric [2, 11]. Determining the geodesic convexity of energies in the setting of nonlocal Wasserstein metrics is an intriguing question. Thus far, the only result in the general (not purely discrete) setting is the geodesic convexity of the entropy [23]. However, for Finslerian gradient flows we caution that establishing geodesic convexity does not imply contractivity, as [42] shows. Instead, a new property of skew-convexity [42, Definition 3.1] needs to be investigated.

Finally, we note that the asymptotics of gradient flows with respect to (nonlocal) Wasserstein metrics in the discrete setting has recently been investigated in [15, 26], where the equations also include diffusion (i.e., the energy includes an entropic contribution). These papers use the convexity of the total energy in the discrete setting to establish the exponential convergence of the flow towards the unique minimizer. Establishing under which conditions (on the graph construction, etc.) these estimates persist in the discrete-to-continuum limit as the number of vertices increases is an interesting open problem. We also remark that, while these results do not carry over to our setting, analyzing the asymptotics of ($${\text {NL}}^2 {\text {IE}}$$) in the purely discrete setting is an intriguing and potentially approachable question.

### 1.3 Relation to the Numerical Finite-Volume Upwind Scheme

Equation (1.7) can be interpreted in several ways. For example, it can be understood as the master equation of a continuous-time and continuous-space Markov jump process on the graphon $$({{\mathbb {R}}^{d}}, \eta )$$, that is, a continuous graph with vertex set $${{\mathbb {R}}^{d}}$$ and symmetric weight $$\eta (x,y)$$ for $$(x,y)\in \{(x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d: x\ne y\}$$. The stochastic interpretation is that a particle at position $$x\in {\mathbb {R}}^d$$ jumps to $$y\in {\mathbb {R}}^d$$ according to the rate measure $$v(x,y)_+\eta (x,y)\,\text {d}\mu (y)$$. In this way (1.7) gives rise to a Markov jump process related to the numerical upwind scheme.
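On a finite state space, the jump rates $$v(x,y)_+\eta (x,y)\mu _y$$ described above assemble into the generator matrix of a Markov jump process. A sketch with made-up rates (all values illustrative):

```python
import numpy as np

def generator(v, eta, mu):
    """Assemble the generator L of the jump process with off-diagonal
    rates L[x, y] = v(x,y)_+ * eta(x,y) * mu_y and diagonal entries
    chosen so that each row sums to zero (conservation of probability)."""
    L = np.maximum(v, 0.0) * eta * mu[None, :]
    np.fill_diagonal(L, 0.0)
    L -= np.diag(L.sum(axis=1))
    return L

# Three states with antisymmetric velocities and symmetric edge weights
v = np.array([[0.0, 1.0, -2.0], [-1.0, 0.0, 0.5], [2.0, -0.5, 0.0]])
eta = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
mu = np.array([0.2, 0.3, 0.5])
L = generator(v, eta, mu)
```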

The numerical upwind scheme is one of the basic finite-volume methods used to solve conservation laws; see [29]. To draw the connection, let $$\{x_1, \dots , x_n\}$$ be a suitable set of representatives of a tessellation $$\{K_1,\dots ,K_n\}$$, for instance a Voronoi tessellation, of some bounded domain $$\Omega \subset {\mathbb {R}}^d$$. Let $$\mu$$ be the Lebesgue measure on $$\Omega$$ and take $$\eta$$ to be the transmission coefficient common in finite-volume schemes: $$\eta (x_i,x_j) = {\mathcal {H}}^{d-1}(\overline{K_i}\cap \overline{K_j})/{\text {Leb}}(K_i)$$, for $$i,j\in \{1,\dots ,n\}$$, where $${\mathcal {H}}^{d-1}(\overline{K_i}\cap \overline{K_j})$$ is the $$(d-1)$$-dimensional Hausdorff measure of the common face between $$K_i$$ and $$K_j$$. With this choice, equation (1.7) becomes the (continuous-in-time) spatial discretization of the classical continuity equation

\begin{aligned} \partial _t \rho _t + \nabla \cdot \left( {\mathbf {v}}_t \, \rho _t \right) = 0 \end{aligned}

for some vector field $${\mathbf {v}}_t:\Omega \rightarrow {\mathbb {R}}^d$$. Here, the discretized vector field $$v_t$$ is obtained from $${\mathbf {v}}_t$$ by averaging over the common interfaces:

\begin{aligned} v_t(x_i,x_j) = \frac{1}{{\mathcal {H}}^{d-1}(\overline{K_i}\cap \overline{K_j})} {\int }_{\overline{K_i}\cap \overline{K_j}} {\mathbf {v}}_t \cdot \nu _{K_i,K_j} \,\text {d}{\mathcal {H}}^{d-1}, \end{aligned}

where $$\nu _{K_i,K_j}$$ is the unit normal to $$K_i$$ pointing from $$K_i$$ to $$K_j$$. We refer to the recent work [9] for a variational interpretation of the upwind scheme, which is close to the one we propose for the more general equation (1.7). Earlier results in this direction are contained in [21, 38].
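In one space dimension with a uniform grid of cells of width h, the transmission coefficient between neighbouring cells is $$1/h$$, and (1.7) reduces to the classical upwind scheme for $$\partial _t \rho + \partial _x (v\rho ) = 0$$. A minimal sketch (constant velocity, periodic boundary, explicit Euler in time; all parameters are illustrative):

```python
import numpy as np

def upwind_step(rho, vel, h, dt):
    """One explicit Euler step of the 1D finite-volume upwind scheme for
    d/dt rho + d/dx (vel * rho) = 0, with constant vel > 0 and periodic
    boundary conditions; the flux through each interface uses the density
    of the upstream (source) cell."""
    flux = vel * rho
    return rho - (dt / h) * (flux - np.roll(flux, 1))

h, dt, vel = 0.01, 0.005, 1.0            # CFL number vel*dt/h = 0.5
x = np.arange(0.0, 1.0, h)
rho = np.where((x > 0.4) & (x < 0.6), 1.0, 0.0) * h   # step profile
mass0 = rho.sum()
for _ in range(100):
    rho = upwind_step(rho, vel, h, dt)
# Mass is conserved exactly and the profile stays non-negative, but the
# jumps are smeared out: this is the numerical diffusion of the scheme.
```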

The connection to finite-volume schemes also explains why the nonlocality in (1.7) introduces a regularization, which in the numerical literature is referred to as numerical diffusion. The fact that the numerical diffusion is actually an honest Markov jump process, as described at the beginning of this section, was observed and used to find optimal convergence rates in [19, 20, 45, 46].

### 1.4 Comparison with Other Discrete Metrics and Gradient Structures

The interpretation of diffusion on graphs as the gradient flow of the entropy was independently carried out in [14, 36, 37]. Here we recall the description of the flows relying on reversible Markov chains, which is the framework used in [25, 27, 36]. Starting with Markov chains, which then determine the edge weights, offers an additional layer of modeling flexibility. In particular, consider the Markov chain with state space $$X = \{x_1, \dots , x_n\}$$ and jump rates $$\{Q_{x,y}\}_{x,y\in X}$$. Let $$\pi$$ be the reversible probability measure for the Markov chain, meaning that it satisfies the detailed balance condition $$\pi _x Q_{x,y} = \pi _y Q_{y,x}$$. The edge weights $$\{w_{x,y}\}_{x,y\in X}$$ are given by $$w_{x,y}=\pi _x Q_{x,y}$$. The energy considered is the relative entropy: for $$\rho :X \rightarrow [0,1]$$ with $$\sum _{x \in X} \rho _x = 1$$ we define

\begin{aligned}&{\mathcal {H}}(\rho \mid \pi ) = \sum _{x} \rho _x \log \frac{\rho _x}{\pi _x} = \sum _{x} \rho _x \log \rho _x - \sum _{x} \rho _x \log \pi _x = {\mathcal {S}}(\rho ) + {\mathcal {E}}_{P}(\rho ) \nonumber \\&\quad \text {with} \quad P_x = -\log \pi _x . \end{aligned}
(1.9)
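The decomposition in (1.9) is straightforward to verify numerically; a sketch (function names ours, values illustrative):

```python
import math

def relative_entropy(rho, pi):
    """H(rho | pi) = sum_x rho_x log(rho_x / pi_x), with 0 log 0 = 0."""
    return sum(r * math.log(r / p) for r, p in zip(rho, pi) if r > 0)

def entropy_plus_potential(rho, pi):
    """S(rho) + E_P(rho) with P_x = -log pi_x, as in (1.9)."""
    S = sum(r * math.log(r) for r in rho if r > 0)
    E_P = sum(-r * math.log(p) for r, p in zip(rho, pi))
    return S + E_P

rho = [0.5, 0.3, 0.2]
pi = [0.2, 0.3, 0.5]
```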

The paths in the configuration space are given as solutions of the continuity equation, which for the flux $$\{j_{x,y}:[0,T]\rightarrow {\mathbb {R}}\}_{x,y\in X}$$ takes the form (1.2).

To compute the flux from a given velocity $$\{v_{x,y}\}_{x,y\in X}$$ (an edge-based quantity) and density $$\{\rho _x\}_{x\in X}$$ (a vertex-based quantity), one interpolates the densities at vertices to define the density (and hence the flux) along the edges. The literature so far has considered a proportional constitutive relation of the form

\begin{aligned} j_{x,y} = v_{x,y} \, \theta \biggl ( \frac{\rho _x}{\pi _x},\frac{\rho _y}{\pi _y}\biggr ) , \end{aligned}
(1.10)

where the function $$\theta :{\mathbb {R}}_+\times {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+$$ needs to be one-homogeneous for dimensional reasons. In addition, it is assumed that the function $$\theta$$ is an interpolation, that is, $$\min \{a,b\}\leqq \theta (a,b)\leqq \max \{a,b\}$$. The choice providing a gradient flow characterization for linear Markov chains is the logarithmic mean, defined by $$\theta (a,b)= \frac{a-b}{\log a - \log b}$$ for $$a \ne b$$ and $$\theta (a,a)=a$$.

The associated transportation distance is obtained by minimizing the action functional

\begin{aligned} {\mathcal {A}}(\rho ,j) = \frac{1}{2}\sum _{x,y} j_{x,y} \, v_{x,y} \, w_{x,y}&= \frac{1}{2} \sum _{x,y} \frac{\left| j_{x,y} \right|^2}{\theta \left( \frac{\rho _x}{\pi _x},\frac{\rho _y}{\pi _y}\right) } \pi _x Q_{x,y} . \end{aligned}
(1.11)

The corresponding transportation distance is induced as the minimum of the action along paths:

\begin{aligned} \inf \left\{ {\int }_0^1 {\mathcal {A}}(\rho (t),j(t)) \,\text{ d }{t} : \big (\rho (t),j(t)\big )_{t\in [0,1]} \text{ solves }~(1.2)\, \text{ and } \rho (0)\!=\!\rho _0,\, \rho (1)\!=\!\rho _1 \right\} . \end{aligned}

As we also do in Corollary 2.8 below, one can show that it suffices to consider antisymmetric fluxes. To arrive at a gradient flow formulation, one considers the metric induced by the action functional (1.11):

\begin{aligned} g_\rho (j^1,j^2) = \frac{1}{2} \sum _{x,y} \frac{j_{x,y}^1 \, j_{x,y}^2}{\theta \Bigl ( \frac{\rho _x}{\pi _x},\frac{\rho _y}{\pi _y}\Bigr ) } \pi _x Q_{x,y} . \end{aligned}
(1.12)

Then the gradient $${{\,\mathrm{grad}\,}}{\mathcal {H}}$$ of the relative entropy (1.9) with respect to this metric is given as the antisymmetric flux $$j^*$$ of minimal norm satisfying

\begin{aligned} g_\rho ({{\,\mathrm{grad}\,}}{\mathcal {H}},j) = {{\,\mathrm{Diff}\,}}{\mathcal {H}}[j] = \left. \frac{\text {d}}{\text {d}t}\right| _{t=0} {\mathcal {H}}({\tilde{\rho }}(t)) , \end{aligned}
(1.13)

for any curve $$({\tilde{\rho }}(t))_{t\geqq 0}$$ such that $$\partial _t {\tilde{\rho }}(0) = - \big ({\overline{\nabla }}\cdot j\big )$$. Expanding (1.13) and using that $$j^*$$ is antisymmetric gives

\begin{aligned} \frac{1}{2} \sum _{x,y} \frac{j_{x,y} \, j^*_{x,y}}{\theta \Bigl ( \frac{\rho _x(t)}{\pi _x},\frac{\rho _y(t)}{\pi _y}\Bigr ) } \pi _x Q_{x,y} = - \frac{1}{2}\sum _{x,y} \left( \log \frac{\rho _x(t)}{\pi _x} - \log \frac{\rho _y(t)}{\pi _y}\right) j_{x,y} \, w_{x,y}. \end{aligned}

Since this identity holds for all $$j_{x,y}$$, the flux $$j^*$$ is identified by

\begin{aligned} j_{x,y}^* = - \left( \log \frac{\rho _x(t)}{\pi _x} - \log \frac{\rho _y(t)}{\pi _y}\right) \theta \left( \frac{\rho _x(t)}{\pi _x},\frac{\rho _y(t)}{\pi _y}\right) = -\left( \frac{\rho _x(t)}{\pi _x} - \frac{\rho _y(t)}{\pi _y}\right) , \end{aligned}

where the last equality holds for the particular choice of the logarithmic mean interpolation $$\theta (r,s) = \frac{r-s}{\ln r - \ln s}$$. By plugging $$j_{x,y}^*$$ into the continuity equation (1.2), one recovers the (linear) heat equation on graphs.
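This last computation can be verified numerically on a small chain. In the following sketch (NumPy, hypothetical weights), the flux $$j^*_{x,y} = -(\rho _x/\pi _x - \rho _y/\pi _y)$$ is plugged into the continuity equation, and the resulting rate of change $$\sum _y w_{x,y}\, j^*_{x,y}$$ coincides with the right-hand side of the linear master (heat) equation on the graph:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# hypothetical reversible chain: symmetric edge weights w_{x,y} = pi_x Q_{x,y}
w = rng.random((n, n)); w = 0.5 * (w + w.T); np.fill_diagonal(w, 0.0)
pi = rng.random(n); pi /= pi.sum()
Q = w / pi[:, None]                       # jump rates; detailed balance by construction

rho = rng.random(n); rho /= rho.sum()
u = rho / pi                              # density relative to pi
jstar = -(u[:, None] - u[None, :])        # the log-mean cancels the log-differences

drho = (w * jstar).sum(axis=1)            # rate of change produced by the flux j*
master = Q.T @ rho - Q.sum(axis=1) * rho  # sum_y (Q_{y,x} rho_y - Q_{x,y} rho_x)
assert np.allclose(drho, master)          # the (linear) heat equation on the graph
assert abs(drho.sum()) < 1e-10            # mass is conserved
```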

The next relevant step is the introduction of the interaction and the potential energies as in (1.1). In particular, [25] provides a gradient flow structure for free energy functionals of the form

\begin{aligned} {\mathcal {F}}_{\beta }(\rho ) = \beta ^{-1} {\mathcal {S}}(\rho ) + {\mathcal {E}}_X(\rho ) , \end{aligned}
(1.14)

where $$\beta >0$$ is the inverse temperature. This is the discrete analogue of the McKean–Vlasov equation. Finding a desirable gradient flow structure is nontrivial, since the logarithmic interpolation, which makes the diffusion term linear, would make the potential term nonlinear, and thus the Fokker–Planck equation on graphs would be nonlinear. To cope with this, the framework of [25] extends the linear theory outlined above to a family of nonlinear Markov chains satisfying a local detailed balance condition. The consequence for the resulting gradient structure is that the quantities $$\{\pi _x\}_{x\in X}$$, $$\left\{ Q_{x,y}\right\} _{x,y\in X}$$ and $$\left\{ w_{x,y}\right\} _{x,y\in X}$$ depend on the current state $$\rho$$ in such a way that the detailed balance condition $$w_{x,y}[\rho ] = \pi _x[\rho ] Q_{x,y}[\rho ] = \pi _y[\rho ] Q_{y,x}[\rho ]$$ is still valid for all $$\rho \in {\mathcal {P}}(X)$$. In particular, for $${\mathcal {F}}_\beta$$ defined in (1.14), it holds that

\begin{aligned}&\pi _x[\rho ] = \frac{1}{Z_\beta } \exp \left( - \beta \left( P_x + \sum _{y} K_{x,y}\rho _y\right) \right) \quad \text {with} \\&\quad Z_\beta = \sum _x \exp \left( - \beta \left( P_x + \sum _{y} K_{x,y}\rho _y\right) \right) . \end{aligned}

It would be natural to try to build the framework for the case $$\beta =\infty$$, which we consider in this paper, by taking the limit $$\beta \rightarrow \infty$$ in the framework of [25]. It turns out that this limit is singular for the constructed gradient structure. First of all, the measure $$\pi _x[\rho ]$$ degenerates at all points except at the argmin of the effective potential $$x\mapsto P_x + \sum _{y} K_{x,y}\rho _y$$. This causes the constitutive relation (1.10) to become meaningless. A more detailed analysis also shows that the metric in (1.12) degenerates.
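The degeneration of $$\pi _x[\rho ]$$ as $$\beta \rightarrow \infty$$ is easy to observe numerically. In this sketch the effective potential $$V_x = P_x + \sum _{y} K_{x,y}\rho _y$$ is replaced by a hypothetical vector of values:

```python
import numpy as np

V = np.array([0.9, 0.1, 0.5, 0.7])   # hypothetical effective potential values

def gibbs(V, beta):
    """pi_x[rho] = exp(-beta V_x) / Z_beta, shifted by min(V) for stability."""
    weights = np.exp(-beta * (V - V.min()))
    return weights / weights.sum()

# as beta grows, pi concentrates on the argmin of the effective potential
pi_small, pi_large = gibbs(V, 1.0), gibbs(V, 1e3)
assert np.isclose(pi_small.sum(), 1.0) and np.isclose(pi_large.sum(), 1.0)
assert pi_large[np.argmin(V)] > 1 - 1e-10          # nearly all mass at the argmin
assert np.delete(pi_large, np.argmin(V)).max() < 1e-10
```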

We also note that in this setting the potential functions P and K and the inverse temperature $$\beta$$ enter the metric in (1.12) through the weights $$w_{x,y}$$ and rate matrix $$Q_{x,y}$$. This is in stark contrast to the classical continuous gradient flow formulation for free energies of the form $${\mathcal {F}}_\beta$$ from (1.14), where the metric is always the $$L^2$$-Wasserstein distance, independently of the potentials P and K and also of the inverse temperature $$\beta >0$$, including $$\beta =\infty$$ [2, 10, 11, 33].

Another approach to McKean–Vlasov equations is to consider the arithmetic interpolation, as was done in [15]. The theory the authors developed requires the densities to be strictly positive and diffusion to be present. We note that the diffusion itself is nonlinear.

The above problems lead us to consider the upwind interpolation in the flux-velocity relation (1.10). In view of (1.2), this relation is replaced in the present setting by

\begin{aligned}&j_{x,y} = \rho _x (v_{x,y})_+ - \rho _y (v_{x,y})_- = \Theta (\rho _x, \rho _y; v_{x,y}) v_{x,y} \qquad \text {where } \nonumber \\&\quad \Theta (a,b; v) = {\left\{ \begin{array}{ll} a &{}\quad \text {if } v>0, \\ b &{} \quad \text {if } v<0, \\ 0 &{} \quad \text {if } v=0. \end{array}\right. } \end{aligned}
(1.15)

Note that the relation (1.15) is a functional relation between velocity and flux with the interpolation $$\Theta$$ depending on the velocity.
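A direct Python transcription of the upwind relation (1.15), with illustrative values:

```python
def upwind_flux(rho_x, rho_y, v):
    """Upwind constitutive relation (1.15): the flux is carried by the density
    at the vertex the velocity points away from."""
    return rho_x * max(v, 0.0) - rho_y * max(-v, 0.0)

assert upwind_flux(1.0, 0.5, 2.0) == 2.0    # v > 0: only rho_x enters
assert upwind_flux(1.0, 0.5, -2.0) == -1.0  # v < 0: only rho_y enters
assert upwind_flux(0.0, 0.5, 2.0) == 0.0    # no mass at the upwind vertex, no flux
```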

We remark that solutions of system (1.2)–(1.4) are not the limit of the gradient flows in [25] as $$\beta \rightarrow \infty$$. In fact, the limit of these dynamics as $$\beta \rightarrow \infty$$ would not be the desired gradient flow of the nonlocal-interaction energy, since the initial support of the solutions would never expand; see the related Remark 1.2.

We conclude this section by observing that it seems possible to generalize the upwind interpolation in a continuous way to define a flux-velocity relation to deal with free energies $${\mathcal {F}}_\beta$$ for $$\beta >0$$. A candidate, inspired by the Scharfetter–Gummel scheme [44], is the following constitutive flux-velocity relation depending on $$\beta$$:

\begin{aligned} j^\beta _{x,y} = v_{x,y} \, \frac{\rho _x \exp \left( \beta v_{x,y}/2\right) - \rho _y \exp \left( -\beta v_{x,y}/2\right) }{ \exp \left( \beta v_{x,y}/2\right) - \exp \left( -\beta v_{x,y}/2\right) }. \end{aligned}

In particular, it holds that $$j^\beta _{x,y} \rightarrow j_{x,y}$$ as $$\beta \rightarrow \infty$$, where $$j_{x,y}$$ is as in (1.15). The form of $$j^\beta _{x,y}$$ can be physically deduced from the one-dimensional cell problem for the unknown value $$j^\beta _{x,y}\in {\mathbb {R}}$$ and function $$\rho :[0,1]\rightarrow {\mathbb {R}}$$:

\begin{aligned} j^\beta _{x,y} = -\beta ^{-1} {\overline{\nabla }}\rho (\cdot ) + v_{x,y} \, \rho (\cdot ) \quad \text {on } [0,1], \qquad \text {with } \rho (0)= \rho _x \text { and } \rho (1) = \rho _y . \end{aligned}

Note that $$j^\beta _{x,y} = \frac{\rho _x-\rho _y}{\beta }$$ for $$v_{x,y} =0$$, which is the flux due to Fick’s law. Likewise, $$j^\beta _{x,y} = 0$$ for $$v_{x,y} = \beta ^{-1} \log \frac{\rho _y}{\rho _x}$$, which is the velocity needed to counteract the diffusion. In [47], it is shown that the Scharfetter–Gummel finite volume scheme provides a stable, positivity-preserving numerical approximation of the diffusion–aggregation equation, which also respects the thermodynamic free energy structure. We leave the investigation of a possible related gradient structure to future research.
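The two limits discussed above can be checked numerically. The following Python sketch (parameter values illustrative) verifies that the Scharfetter–Gummel-type flux recovers the upwind flux (1.15) as $$\beta \rightarrow \infty$$ and Fick's law as $$v_{x,y}\rightarrow 0$$:

```python
import math

def sg_flux(rho_x, rho_y, v, beta):
    """Scharfetter--Gummel-type flux j^beta_{x,y}; assumes v != 0."""
    ep, em = math.exp(beta * v / 2), math.exp(-beta * v / 2)
    return v * (rho_x * ep - rho_y * em) / (ep - em)

rho_x, rho_y, v = 0.8, 0.3, 1.5
upwind = rho_x * max(v, 0.0) - rho_y * max(-v, 0.0)
# beta -> infinity recovers the upwind flux (1.15)
assert abs(sg_flux(rho_x, rho_y, v, beta=200.0) - upwind) < 1e-9
# small velocity at fixed beta approaches Fick's law (rho_x - rho_y)/beta
beta = 2.0
assert abs(sg_flux(rho_x, rho_y, 1e-8, beta) - (rho_x - rho_y) / beta) < 1e-6
```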

### 1.5 Connections to Machine Learning

Part of the motivation for the present work comes from applications to machine learning. Here we introduce a family of nonlinear gradient flows that is relevant to discovering local concentrations in networks akin to modes of a distribution.

Our main interest is in equations posed on graphs whose vertices are random samples of some underlying distribution and whose edge weights are a function of distances between vertices. In machine learning one often deals with data in the form of a point cloud in high-dimensional space. While the ambient dimension may be very large, the data often possess an underlying low-dimensional structure that can be used in making reliable inferences about the underlying data distribution. To use the geometric information, we follow one of the standard approaches and consider graphs associated to point clouds. Formulating the machine learning tasks directly on the point cloud enables one to access the geometric structure of the distribution in a simple and computationally efficient way. The works in the literature have mostly focused on models based on minimizing objective functionals modeling tasks such as clustering or dimensionality reduction [5, 31, 32, 34, 40], or based on characterizing clusters through estimating some property of the data distribution (most often the density); see [12] and references therein. Only a few dynamical models have been considered; notable among them are diffusion maps [16], where the heat equation is used to define new distances between the points.

Here we focus on models that are motivated by nonlocal PDEs. Consider a probability measure $$\mu$$ on $${\mathbb {R}}^d$$ with finite second moment. Let $$X =\{x_1, \dots , x_n\}$$ be i.i.d. random samples of the measure $$\mu$$. Let $$\mu ^n = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}$$ be the empirical measure of the sample and let $$K:{\mathbb {R}}^d \times {\mathbb {R}}^d \rightarrow {\mathbb {R}}$$ be symmetric and $$P:{\mathbb {R}}^d \rightarrow {\mathbb {R}}$$. The total energy $${\mathcal {E}}_X:{\mathcal {P}}(X)\rightarrow {\mathbb {R}}$$, given in (1.1), for the empirical measure $$\mu ^n$$ can be rewritten as

\begin{aligned} {\mathcal {E}}_X(\mu ^n)= & {} {\mathcal {E}}_I(\mu ^n) + {\mathcal {E}}_P(\mu ^n) \nonumber \\= & {} \frac{1}{2} {\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(x,y) \,\text {d}\mu ^n(x) \,\text {d}\mu ^n(y) + {\int }_{{\mathbb {R}}^{d}}P(x)\,\text {d}\mu ^n(x).\qquad \end{aligned}
(1.16)

The gradient flow of $${\mathcal {E}}_X$$ with respect to the graph Wasserstein metric $${\mathcal {T}}_{\mu ^n}$$ defined in (1.8) is described by the ODE system (1.2)–(1.4), where $$K_{x_i,x_j} = K(x_i,x_j)$$ and $$P_{x_i} =P(x_i)$$ for all $$i,j\in \{1,\dots ,n\}$$. Another evolution of such a system is illustrated in Fig. 2.

Here we remark on the contrast between (1.2)–(1.4) and the gradient flow of (1.16) in the ambient space $${\mathbb {R}}^d$$, with respect to the standard Wasserstein metric, which takes the form

\begin{aligned} \dot{x}_i = - \nabla P(x_i)- \sum _{j=1}^n \rho _j \nabla _{x_i} K(x_i , x_j). \end{aligned}
(1.17)

The first notable difference is that, on the graph, masses change and the positions remain fixed, while in $${\mathbb {R}}^d$$ positions change and the masses remain fixed. This difference is somewhat superficial, since both equations describe the rearrangement of mass in order to decrease the same energy in the most efficient way measured by two different metrics. The main difference is that the graph encodes the geometry of the space that mass is allowed to occupy. In particular, it ensures that the geometric mode discovered will be a data point itself.

We note that the popular mean-shift algorithm [17] can be interpreted as a time-stepping algorithm to approximate solutions of (1.17) with $$K\equiv 0$$ and $$P = \ln (\theta * \mu ^n(0))$$, where $$\mu ^n(0)$$ is the empirical measure of the initial distribution of particles and $$\theta * \mu ^n(0)$$ is the kernel density estimate of the density $${\varvec{\rho }}$$ of the underlying distribution. Namely, the step of the mean-shift algorithm is to replace the position of the particle at $$x_j$$ by the center of mass of $$\theta ( \,\cdot \, - x_j)* \mu ^n(0)$$ and iterate the procedure. A formal expansion shows that this is a time step of the forward scheme for the flow driven by $$P = \ln (\theta * \mu ^n(0))$$. We note that considering the gradient flow of the corresponding energy on the graph (1.2)–(1.4) ensures that the modes of the distribution discovered by the (graph) mean-shift algorithm remain within the data set. Furthermore, we note that adding nonlocal attraction on the graph progressively clumps nearby masses together and thus provides an approach to agglomerative clustering.
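The mean-shift step just described can be sketched in a few lines of Python; the Gaussian kernel $$\theta$$, the bandwidth, and the toy one-dimensional sample are all hypothetical choices (the bandwidth is taken large enough that the kernel density estimate is unimodal):

```python
import numpy as np

rng = np.random.default_rng(2)
# toy 1-d sample: a cluster near 0 plus a single outlier at 2
pts = np.concatenate([rng.normal(0.0, 0.1, size=50), [2.0]])

def mean_shift_step(x, pts, h):
    """Replace x by the center of mass of theta(. - x) * mu^n(0) for a
    Gaussian kernel theta with bandwidth h."""
    w = np.exp(-0.5 * ((pts - x) / h) ** 2)
    return float(np.sum(w * pts) / np.sum(w))

x = 2.0                                   # start the iteration at the outlier
for _ in range(20):
    x = mean_shift_step(x, pts, h=2.0)
assert abs(x) < 0.2                       # drawn into the bulk of the sample
```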

One of our main results, stated in Theorem 3.14, is that as $$n \rightarrow \infty$$ the solutions of the graph-based equation (1.2)–(1.4) narrowly converge along a subsequence to a solution of the nonlocal nonlocal-interaction equation ($${\text {NL}}^2 {\text {IE}}$$).

## 2 Nonlocal Continuity Equation and Upwind Transportation Metric

### 2.1 Weight Function

Throughout the paper we consider a weight function $$\eta :\{(x,y)\in {\mathbb {R}}^d\times {\mathbb {R}}^d : x\ne y\} \rightarrow [0,\infty )$$, which shall always satisfy

• (W) (continuity and symmetry) $$\eta$$ is continuous on G and symmetric, that is, $$\eta (x,y)=\eta (y,x)$$ for all $$x\ne y$$.

Since $$\eta$$ is symmetric, we regard the edge set G as an undirected graph. Many of the edge-based quantities we consider, such as vector fields and fluxes, lie in an $$\eta$$-weighted $$L^2$$ space, $$L^2(\eta \, \lambda )$$ for some $$\lambda \in {\mathcal {M}}(G)$$. The space $$L^2(\eta \,\lambda )$$ is equipped with the inner product

\begin{aligned}&\langle f,g\rangle _{L^2(\eta \lambda )} = \frac{1}{2} {\iint }_G f(x,y) g(x,y) \eta (x,y) \text{ d }\lambda (x,y) \quad \text{ for } \text{ all } f,g\in L^2(\eta \,\lambda ) ,\qquad \end{aligned}
(2.1)

where the factor $$\frac{1}{2}$$ ensures that each undirected edge is counted only once.

Below we state two assumptions on the base measure $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ and the weight function $$\eta$$, where we use the notation $$\vee$$ to denote the maximum.

• (A1) (moment bound) The family of functions $$\{\left( |x-\cdot |^2 \vee |x-\cdot |^4 \right) \eta (x,\cdot )\}_{x\in {{\mathbb {R}}^{d}}}$$ is uniformly integrable with respect to $$\mu$$, that is, for some $$C_\eta \in (0,\infty )$$, it holds that

\begin{aligned} \sup _{x\in {\mathbb {R}}^d} {\int } \left( |x-y|^2 \vee |x-y|^4 \right) \, \eta (x,y)\,\text {d}\mu (y) \leqq C_\eta . \end{aligned}
• (A2) (local blow-up control) The family of measures $$\{|x - \cdot |^2\eta (x,\cdot ) \mu (\cdot )\}_{x\in {{\mathbb {R}}^{d}}}$$ is locally uniformly integrable, that is,

\begin{aligned}&\lim _{\varepsilon \rightarrow 0} \sup _{x\in {\mathbb {R}}^d} {\int }_{B_\varepsilon (x){\setminus }\{x\}} |x-y |^2 \, \eta (x,y) \,\text{ d }\mu (y)= 0, \quad \text{ where } \\&\quad B_\varepsilon (x) = \bigl \{ y\in {\mathbb {R}}^d: |x-y|<\varepsilon \bigr \}. \end{aligned}

### Remark 2.1

Continuity on G in (W) is needed to obtain lower semicontinuity of the action functional; see Lemma 2.9. Assumption (A1) ensures well-posedness of the nonlocal continuity equation we shall introduce in Section 2.3, whereas Assumption (A2) is necessary for compactness of solutions to the nonlocal continuity equation; see Proposition 2.17.

### Example 2.2

Typically the function $$\eta$$ is a function of the distance

\begin{aligned} \eta (x,y) = \vartheta \bigl (|x-y|) \quad \hbox { for all}\ (x,y)\in G, \end{aligned}

where $$\vartheta :(0,\infty ) \rightarrow [0,\infty )$$ is continuous on $$\{\vartheta >0\}$$ and satisfies analogues of (A1) and (A2). An important example is given by geometric graphs with connectivity radius $$\varepsilon >0$$ and weight

\begin{aligned} \eta _\varepsilon (x,y) = \frac{2(2+d)}{\varepsilon ^2} \frac{\chi _{B_\varepsilon (x)}(y)}{|B_\varepsilon |} \quad \hbox { for all}\ (x,y) \in G . \end{aligned}
(2.2)

In this example, fixing $$\mu = {\text {Leb}}({{\mathbb {R}}^{d}})$$, we conjecture that the weak formulation of ($${\text {NL}}^2 {\text {IE}}$$)—see Section 3—converges to the nonlocal aggregation equation $$\partial _t \rho _t = \nabla \cdot \left( \rho _t \nabla K*\rho _t+ \rho _t \nabla P\right)$$ as $$\varepsilon \rightarrow 0$$ for sufficiently smooth potentials K and P. See Section 3.5 for a discussion on the local limit.
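The prefactor $$2(2+d)/\varepsilon ^2$$ in (2.2) normalizes the second moment of $$\eta _\varepsilon$$: a direct computation gives $${\int } \eta _\varepsilon (x,y)\,|x-y|^2 \,\text {d}y = 2d$$. A quick midpoint-rule verification of this identity in dimension $$d=1$$, where $$|B_\varepsilon | = 2\varepsilon$$ (sketch in Python):

```python
import numpy as np

eps, d, N = 0.3, 1, 10_000
vol = 2 * eps                              # |B_eps| in dimension 1
ds = 2 * eps / N
s = -eps + (np.arange(N) + 0.5) * ds       # midpoints covering (-eps, eps)
eta = 2 * (2 + d) / eps**2 / vol           # eta_eps is constant on the ball
second_moment = np.sum(eta * s**2) * ds    # int eta_eps(0, y) |y|^2 dy
assert abs(second_moment - 2 * d) < 1e-4   # normalization built into (2.2)
```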

### 2.2 Action

The form of the action inside (1.8) seems practical, but it does not have any obvious convexity and lower semicontinuity properties. Therefore, we define the action in flux variables. We start by introducing some notation. For a signed measure $${\varvec{j}}\in {\mathcal {M}}(G)$$, we denote by $${\varvec{j}}={\varvec{j}}^+-{\varvec{j}}^-$$ its Jordan decomposition. Moreover, for any measurable $$A\subseteq G$$, let $$A^\top =\{(y,x) \in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}: (x,y)\in A\}$$ be its transpose. Likewise, for $${\varvec{j}}\in {\mathcal {M}}(G)$$, we denote by $${\varvec{j}}^\top$$ the transposed measure defined by $${\varvec{j}}^\top (A)={\varvec{j}}(A^\top )$$.

For any measures $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ and $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$, we define the (restricted) product measures $$\gamma _i\in {\mathcal {M}}^+(G)$$ for $$i=1,2$$ as

\begin{aligned}&\text {d}\gamma _1(x,y) = \text {d}\rho (x)\text {d}\mu (y) \qquad \text {and} \nonumber \\&\text {d}\gamma _2(x,y) = \text {d}\mu (x) \text {d}\rho (y) \qquad \text {for} \, (x,y)\in G. \end{aligned}
(2.3)

Note that $$\gamma ^\top _1 = \gamma _2$$. We define the action for general $$\eta$$ which we only require to satisfy Assumption (W), i.e., continuity on G, symmetry and positivity.

### Definition 2.3

(Action) For $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}\in {\mathcal {M}}(G)$$, consider $$\lambda \in {\mathcal {M}}(G)$$ such that $$\rho \otimes \mu ,\mu \otimes \rho ,|{\varvec{j}}|\ll |\lambda |$$. We define

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\mathbf {j}})= & {} {} \frac{1}{2}{\iint }_G \alpha \left( \frac{\text{ d }{\mathbf {j}}}{\text{ d }|\lambda |}, \frac{\text{ d }(\rho \otimes \mu )}{\text{ d }|\lambda |}\right) \eta \,\text{ d }|\lambda | \nonumber \\&\quad + \frac{1}{2}{\iint }_G \alpha \left( -\frac{\text{ d }{\mathbf {j}}}{\text{ d }|\lambda |}, \frac{\text{ d }(\mu \otimes \rho )}{\text{ d }|\lambda |}\right) \eta \,\text{ d }|\lambda | . \end{aligned}
(2.4)

Hereby, the lower semicontinuous, convex, and positively one-homogeneous function $$\alpha :{\mathbb {R}}\times {\mathbb {R}}_+\rightarrow {\mathbb {R}}_+\cup \{\infty \}$$ is defined, for all $$j\in {\mathbb {R}}$$ and $$r\geqq 0$$, by

\begin{aligned} \alpha (j,r):={\left\{ \begin{array}{ll} \frac{(j_+)^2}{r} &{}\quad \text {if}\ r>0,\\ 0 &{}\quad \text {if}\ j\leqq 0\ \text {and}\ r=0,\\ \infty &{}\quad \text {if}\ j> 0\ \text {and}\ r=0, \end{array}\right. } \end{aligned}
(2.5)

with $$j_+=\max \{0,j\}$$. If the measure $$\mu$$ is clear from the context, we write $${\mathcal {A}}(\rho ,{\varvec{j}})$$ for $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})$$.

Note that Definition 2.3 is well-posed, since the one-homogeneity of $$\alpha$$ makes it independent of the particular choice of $$\lambda$$ as long as the absolute continuity condition in Definition 2.3 is satisfied. An example of such a measure is any $$\lambda$$ such that $$|\lambda |=|\rho \otimes \mu |+|\mu \otimes \rho |+|{\varvec{j}}|$$. Moreover, $$\lambda$$ can be chosen to be symmetric; otherwise it can be replaced by $$\frac{1}{2}(\lambda +\lambda ^\top )$$.

### Remark 2.4

We note that the action is inversely proportional to the measure $$\mu$$: doubling the measure $$\mu$$ leads to halving the action. This has important consequences for the way $$\mu$$ influences the geometry of the space of measures. In particular, $$\mu$$ not only sets the region where mass can be transported, but also makes the transport less costly in the regions of high density of $$\mu$$.

### Remark 2.5

If $$\rho \ll \mu$$, then we denote its density by $$\rho$$ by abuse of notation, and if furthermore $${\varvec{j}}\ll \mu \otimes \mu$$ with density j, then it holds that

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) = \frac{1}{2}{\iint }_G \left( \frac{(j(x,y)_+)^2}{\rho (x)} + \frac{(j(x,y)_-)^2}{\rho (y)}\right) \eta (x,y)\,\text {d}\mu (x)\,\text {d}\mu (y). \end{aligned}

In the following lemma we see that the action takes the form from the tentative definition of the metric in (1.8) whenever it is finite.

### Lemma 2.6

Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}\in {\mathcal {M}}(G)$$ be such that $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty$$. Then there exists a measurable $$v:G\rightarrow {\mathbb {R}}$$ such that

\begin{aligned} \text {d}{\varvec{j}}(x,y)=v(x,y)_+\text {d}\rho (x)\text {d}\mu (y)-v(x,y)_-\text {d}\mu (x)\text {d}\rho (y), \end{aligned}
(2.6)

and it holds that

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})=\frac{1}{2}{\iint }_G\left( |v(x,y)_+|^2+|v(y,x)_-|^2\right) \eta (x,y)\,\text {d}\rho (x)\,\text {d}\mu (y). \end{aligned}
(2.7)

In particular, if $$v\in {\mathcal {V}}^{\mathrm {as}}(G)$$, then

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})={\iint }_G|v(x,y)_+|^2\eta (x,y)\,\text {d}\rho (x)\,\text {d}\mu (y). \end{aligned}
(2.8)

### Proof

Let $$\lambda \in {\mathcal {M}}^+(G)$$ be such that $$\text {d}\gamma _1(x,y) = \text {d}\rho (x) \text {d}\mu (y) = {\tilde{\gamma }}_1(x,y) \text {d}\lambda (x,y)$$, likewise $$\text {d}\gamma _2(x,y) = \text {d}\mu (x) \text {d}\rho (y) = {\tilde{\gamma }}_2(x,y) \text {d}\lambda (x,y)$$, and $$\text {d}{\varvec{j}}= {\tilde{j}} \text {d}\lambda$$ for some measurable $${\tilde{\gamma }}_1,{\tilde{\gamma }}_2,{\tilde{j}}:G\rightarrow {\mathbb {R}}$$. Without loss of generality we can assume $$\lambda$$ to be symmetric; for instance by considering $$\tfrac{1}{2} (\lambda + \lambda ^\top )$$ instead. Thus, (2.4) implies

\begin{aligned}&{\mathcal {A}}(\mu ;\rho ,{\mathbf {j}})\\&\quad = \frac{1}{2}{\iint }_G \left( \alpha \big ( \tilde{j}(x,y), {\tilde{\gamma }}_1(x,y)\big ) + \alpha \big ( -\tilde{j}(x,y), {\tilde{\gamma }}_2(x,y)\big )\right) \eta (x,y)\,\text{ d }\lambda (x,y) < \infty . \end{aligned}

By the definition of the function $$\alpha$$ in (2.5), it immediately follows that the vector field $${\tilde{v}}^+(x,y) = \frac{{\tilde{j}}(x,y)_+}{{\tilde{\gamma }}_1(x,y)}$$ is well-defined $$\gamma _1$$-a.e. on G. By the same argument, we find that $${\tilde{v}}^-(x,y) = \frac{{\tilde{j}}(x,y )_-}{{\tilde{\gamma }}_2(x,y)}$$ is well-defined $$\gamma _2$$-a.e. on G. Since $$\gamma _1=\gamma _2^\top$$ we have that $${\left( {\tilde{v}}^-\right) }^{\top }$$ exists $$\gamma _1$$-a.e. on G. Hence, we obtain the measurable vector field

\begin{aligned} v(x,y) = {\tilde{v}}^+(x,y) - {\tilde{v}}^-(x,y). \end{aligned}

The statement (2.7) follows by using the positively one-homogeneity of $$\alpha$$, the identity $$\alpha (j,r)=\alpha (j_+,r)$$ and the symmetry of $$\lambda$$:

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})&= \frac{1}{2}{\iint }_G \alpha \big (v(x,y)_+ {\tilde{\gamma }}_1(x,y), {\tilde{\gamma }}_1(x,y)\big ) \, \eta (x,y) \,\text {d}\lambda (x,y)\\&\qquad + \frac{1}{2}{\iint }_G \alpha \big (v(x,y)_- {\tilde{\gamma }}_2(x,y) , {\tilde{\gamma }}_2(x,y)\big ) \, \eta (x,y) \,\text {d}\lambda (x,y) \\&= \frac{1}{2}{\iint }_G | v(x,y)_+ |^2 {\tilde{\gamma }}_1(x,y) \, \eta (x,y) \,\text {d}\lambda (x,y) \\&\qquad + \frac{1}{2}{\iint }_G | v(y,x)_- |^2 {\tilde{\gamma }}_1(x,y) \, \eta (x,y) \,\text {d}\lambda (x,y). \qquad \qquad \quad \qquad \end{aligned}

$$\square$$

### Definition 2.7

(Nonlocal gradient and divergence) For any function $$\phi :{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$ we define its nonlocal gradient $${\overline{\nabla }}\phi :G \rightarrow {\mathbb {R}}$$ by

\begin{aligned} {\overline{\nabla }}\phi (x,y)=\phi (y)-\phi (x) \quad \text{ for } \text{ all } \, (x,y)\in G. \end{aligned}

For any $${\varvec{j}}\in {\mathcal {M}}(G)$$, its nonlocal divergence $${\overline{\nabla }}\cdot {\varvec{j}}\in {\mathcal {M}}({\mathbb {R}}^d)$$ is defined as $$\eta$$-weighted adjoint of $${\overline{\nabla }}$$, i.e.,

\begin{aligned} {\int } \phi \,\text {d}{\overline{\nabla }}\cdot {\varvec{j}}= & {} - \frac{1}{2}{\iint }_G{\overline{\nabla }}\phi (x,y) \eta (x,y)\,\text {d}{\varvec{j}}(x,y) \\= & {} \frac{1}{2}{\int } \phi (x) {\int } \eta (x,y) \left( \text {d}{\varvec{j}}(x,y) - \text {d}{\varvec{j}}(y,x)\right) . \end{aligned}

In particular, for $${\varvec{j}}\in {\mathcal {M}}^{\mathrm {as}}(G) := \{{\varvec{j}}\in {\mathcal {M}}(G): {\varvec{j}}^\top = - {\varvec{j}}\}$$,

\begin{aligned} {\int } \phi \,\text {d}{\overline{\nabla }}\cdot {\varvec{j}}= {\iint }_G \phi (x) \eta (x,y) \,\text {d}{\varvec{j}}(x,y) . \end{aligned}

If $${\varvec{j}}$$ is given by (2.6) for some $$v\in {\mathcal {V}}^{\mathrm {as}}(G)$$, then the flux satisfies the antisymmetry relation $${\varvec{j}}^+=({\varvec{j}}^\top )^-$$ $$\gamma _1$$-a.e. on G. The following corollary shows that such antisymmetric fluxes are the relevant ones for the minimization of the action functional. For this reason, the natural class of fluxes consists of those measures on G which are antisymmetric with positive part absolutely continuous with respect to $$\gamma _1$$, that is,

\begin{aligned} {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G)=\bigl \{{\mathbf {j}}\in {\mathcal {M}}(G): {\mathbf {j}}^+\ll \gamma _1,\, {\mathbf {j}}^-\ll \gamma _1^\top , {\mathbf {j}}^+=({\mathbf {j}}^\top )^-\ \gamma _1\text{-a.e. }\bigr \} \end{aligned}
(2.9)

### Corollary 2.8

(Antisymmetric vector fields have lower action) Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}\in {\mathcal {M}}(G)$$ be such that $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty$$. Then there exists an antisymmetric flux $${\varvec{j}}^{\mathrm {as}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G)$$ such that

\begin{aligned} {\overline{\nabla }}\cdot {\varvec{j}}={\overline{\nabla }}\cdot {\varvec{j}}^{\mathrm {as}}, \end{aligned}

with lower action:

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}^{\mathrm {as}})\leqq {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}). \end{aligned}

### Proof

Let us set $${\varvec{j}}^{\mathrm {as}} = ({\varvec{j}}- {\varvec{j}}^\top )/2$$. Since $$\eta$$ is symmetric and $$\big ({\overline{\nabla }}\phi \big )^\top = - {\overline{\nabla }}\phi$$, we get

\begin{aligned} {\iint }_G {\overline{\nabla }}\phi \; \eta \,\text {d}{{\varvec{j}}^{\mathrm {as}}}&= \frac{1}{2}{\iint }_G {\overline{\nabla }}\phi \; \eta \; ( \,\text {d}{\varvec{j}}- \,\text {d}{\varvec{j}}^\top ) \\&= \frac{1}{2}{\iint }_G {\overline{\nabla }}\phi \; \eta \,\text {d}{\varvec{j}}- \frac{1}{2}{\iint }_G \big ({\overline{\nabla }}\phi \big )^\top \; \eta \,\text {d}{\varvec{j}}\\&= {\iint }_G {\overline{\nabla }}\phi \; \eta \,\text {d}{\varvec{j}}. \end{aligned}

By Lemma 2.6 and a comparison of (2.7) and (2.8), it is enough to show that, for all $$(x,y)\in G$$,

\begin{aligned}&{\left| v^{\mathrm {as}}(x,y)_+ \right|^2 + \left|v^{\mathrm {as}}(x,y)_- \right|^2 + \left| v^{\mathrm {as}}(y,x)_+ \right|^2 + \left|v^{\mathrm {as}}(y,x)_- \right|^2} \\&\quad \leqq \left| v(x,y)_+ \right|^2 + \left|v(x,y)_- \right|^2 + \left| v(y,x)_+ \right|^2 + \left|v(y,x)_- \right|^2 \end{aligned}

for any measurable $$v:G\rightarrow {\mathbb {R}}$$, where $$v^{\mathrm {as}}(x,y) = \left( v(x,y) -v(y,x)\right) /2$$. This estimate is a consequence of Jensen’s inequality applied to the convex functions

\begin{aligned} \varphi ^\pm :{\mathbb {R}}\rightarrow {\mathbb {R}}\qquad \text {with}\qquad \varphi ^\pm (r) = \left( r_\pm \right) ^2. \end{aligned}

$$\square$$
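On a finite state space with $$\rho \ll \mu$$ and $${\varvec{j}}\ll \mu \otimes \mu$$ (the setting of Remark 2.5), both the action and the nonlocal divergence become finite sums, and the statement of Corollary 2.8 can be checked numerically. All weights in this Python sketch are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
m = rng.random(n) + 0.5                    # weights of the base measure mu
r = rng.random(n) + 0.5                    # density of rho w.r.t. mu (positive)
eta = rng.random((n, n)); eta = 0.5 * (eta + eta.T); np.fill_diagonal(eta, 0.0)
j = rng.normal(size=(n, n)); np.fill_diagonal(j, 0.0)  # flux density w.r.t. mu x mu
mm = np.outer(m, m)

def action(j):
    """Discrete version of the action from Remark 2.5."""
    jp, jm = np.maximum(j, 0.0), np.maximum(-j, 0.0)
    return 0.5 * np.sum((jp**2 / r[:, None] + jm**2 / r[None, :]) * eta * mm)

def divergence(j):
    """Density w.r.t. mu of the nonlocal divergence of Definition 2.7."""
    return 0.5 * (eta * (j - j.T) * mm).sum(axis=1) / m

j_as = 0.5 * (j - j.T)                     # antisymmetric part of the flux
assert np.allclose(divergence(j), divergence(j_as))  # same nonlocal divergence
assert action(j_as) <= action(j) + 1e-12             # lower action
```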

### Lemma 2.9

(Lower semicontinuity of the action) The action is lower semicontinuous with respect to narrow convergence in $${\mathcal {M}}^+({{\mathbb {R}}^{d}})\times {\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}(G)$$. That is, if $$\mu ^n {\rightharpoonup }\mu$$ in $${\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho ^n {\rightharpoonup }\rho$$ in $${\mathcal {P}}({{\mathbb {R}}^{d}})$$, and $${\varvec{j}}^n {\rightharpoonup }{\varvec{j}}$$ in $${\mathcal {M}}(G)$$, then

\begin{aligned} \liminf _{n\rightarrow \infty }{\mathcal {A}}(\mu ^n;\rho ^n,{\varvec{j}}^n)\geqq {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) \;. \end{aligned}

### Proof

First, note that the narrow convergence of the sequences $$(\rho ^n)_n$$ and $$(\mu ^n)_n$$ implies the narrow convergence of the products: $$\rho ^n\otimes \mu ^n \rightharpoonup \rho \otimes \mu$$ in $${\mathcal {M}}^+({{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}})$$, and therefore also in $${\mathcal {M}}^+(G)$$. Then, in Definition 2.3 consider the vector-valued measure

\begin{aligned} \lambda = \left( {\varvec{j}}, \rho \otimes \mu ,\mu \otimes \rho \right) . \end{aligned}

Further, we define the function

\begin{aligned} f:G \times {\mathbb {R}}^3 \rightarrow {\mathbb {R}}\quad \text {by} \quad f\big ((x,y),(j,\gamma _1,\gamma _2)\big ) = \big ( \alpha (j,\gamma _1) + \alpha (-j,\gamma _2)\big ) \, \eta (x,y). \end{aligned}

Since the function $$\eta$$ is lower semicontinuous by (W) and $$\alpha$$ defined in (2.5) is lower semicontinuous, jointly convex and positively one-homogeneous, f satisfies the assumptions of [8, Theorem 3.4.3], whence the claim follows. $$\square$$

According to Definition 2.3, fluxes and action are closely related. When $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<+ \infty$$, the following lemma provides a useful upper bound that will be crucial in several technical arguments later on.

### Lemma 2.10

For any $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$, $${\varvec{j}}\in {\mathcal {M}}(G)$$ and any measurable $$\Phi :G\rightarrow {\mathbb {R}}_+$$, it holds that

\begin{aligned} \left( \frac{1}{2}{\iint }_G\Phi \, \eta \,\text {d}|{\varvec{j}}|\right) ^2\leqq \, {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) {\iint }_G\Phi ^2\, \eta \,(\text {d}\gamma _1+\text {d}\gamma _2) . \end{aligned}
(2.10)

### Proof

Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}\in {\mathcal {M}}(G)$$ be such that $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<+ \infty$$. Let $$|\lambda | \in {\mathcal {M}}^+(G)$$ be such that $$\gamma _1, \gamma _2, |{\varvec{j}}|\ll |\lambda |$$ as in Definition 2.3 and write $$\gamma _i = {\tilde{\gamma }}_i |\lambda |$$ and $$|{\varvec{j}}| = |j| |\lambda |$$ for the densities.

We have that $$A:=\bigl \{ (x,y) \in G:\alpha (j,{\tilde{\gamma }}_1) = \infty \text{ or } \alpha (-j,{\tilde{\gamma }}_2)=\infty \bigr \}$$ is a $$\lambda$$-nullset. We observe the elementary inequality

\begin{aligned} (j_+)^2 + (j_-)^2 \leqq {\max \{{\tilde{\gamma }}_1,{\tilde{\gamma }}_2\}} \big (\alpha (j,{\tilde{\gamma }}_1) + \alpha (-j,{\tilde{\gamma }}_2)\big ), \qquad \lambda \text {-a.e.}\ \text {in} \, A^\mathrm {c}. \end{aligned}

In particular, it holds that

\begin{aligned} |j | = j_+ + j_- \leqq \sqrt{2 \max \{{\tilde{\gamma }}_1,{\tilde{\gamma }}_2\}} \sqrt{\alpha (j,{\tilde{\gamma }}_1) + \alpha (-j,{\tilde{\gamma }}_2)}, \qquad \lambda \text {-a.e. in} \, A^\mathrm {c}. \end{aligned}

Hence we can estimate

\begin{aligned} \frac{1}{2}{\iint }_G \Phi \,\eta \,\text{ d }|{\mathbf {j}} |&= \frac{1}{2}{\iint }_G \Phi \,\eta \, |j | \,\text{ d }|\lambda | = \frac{1}{2}{\iint }_{A^\mathrm {c}} \Phi \,\eta \, \left( j_+ + j_-\right) \,\text{ d }{|\lambda |} \\ {}&\leqq \frac{1}{2}{\iint }_{A^\mathrm {c}} \Phi \,\eta \, \sqrt{2\max \left\{ {\tilde{\gamma }}_1,{\tilde{\gamma }}_2\right\} } \sqrt{\alpha (j,{\tilde{\gamma }}_1) + \alpha (-j,{\tilde{\gamma }}_2)} \,\text{ d }|\lambda | \\ {}&\leqq \left( {\iint }_G \Phi ^2\,\eta \, \max \left\{ {\tilde{\gamma }}_1,{\tilde{\gamma }}_2\right\} \,\text{ d }|\lambda |\right) ^{\frac{1}{2}} \\ {}&\quad \times \left( \frac{1}{2}{\iint }_G \left( \alpha (j,{\tilde{\gamma }}_1) + \alpha (-j,{\tilde{\gamma }}_2)\right) \,\eta \,\text{ d }|\lambda |\right) ^{\frac{1}{2}}. \end{aligned}

Now, the result follows by estimating $$\max \left\{ {\tilde{\gamma }}_1,{\tilde{\gamma }}_2\right\} \leqq {\tilde{\gamma }}_1 + {\tilde{\gamma }}_2$$. $$\square$$
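On a finite graph, inequality (2.10) can be sanity-checked numerically. The sketch below takes $$\lambda$$ to be the counting measure on the edges and assumes the action density $$\alpha (j,r)=(j_+)^2/r$$ for $$r>0$$ (with $$\alpha (j,0)=0$$ if $$j_+=0$$ and $$+\infty$$ otherwise), a representative choice consistent with the properties of $$\alpha$$ in (2.5) used here; all data are random test values.

```python
import random

def alpha(j, r):
    """Action density: (j_+)^2 / r, with the usual conventions at r = 0."""
    jp = max(j, 0.0)
    if r > 0:
        return jp**2 / r
    return 0.0 if jp == 0.0 else float("inf")

random.seed(0)
edges = range(20)
eta = [random.uniform(0.1, 2.0) for _ in edges]   # edge weights eta(x, y)
g1  = [random.uniform(0.1, 1.0) for _ in edges]   # density of gamma_1 w.r.t. lambda
g2  = [random.uniform(0.1, 1.0) for _ in edges]   # density of gamma_2 w.r.t. lambda
j   = [random.uniform(-1.0, 1.0) for _ in edges]  # density of the flux j
phi = [random.uniform(0.0, 3.0) for _ in edges]   # nonnegative test function Phi

# action A(mu; rho, j) with lambda = counting measure on the edges
A = 0.5 * sum(e * (alpha(v, a) + alpha(-v, b))
              for e, v, a, b in zip(eta, j, g1, g2))

lhs = (0.5 * sum(p * e * abs(v) for p, e, v in zip(phi, eta, j))) ** 2
rhs = A * sum(p**2 * e * (a + b) for p, e, a, b in zip(phi, eta, g1, g2))
assert lhs <= rhs + 1e-12  # inequality (2.10)
```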

As a consequence of the previous results we have the following corollary, which will be useful in Section 2.3:

### Corollary 2.11

Let $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ satisfy (A1) for some $$C_\eta \in (0,\infty )$$. Then for all $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}\in {\mathcal {M}}(G)$$ it holds that

\begin{aligned} \frac{1}{2}{\iint }_G(2\wedge |x-y|)\eta (x,y)\,\text {d}|{\varvec{j}} |(x,y)\leqq \sqrt{2C_\eta \, {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})}. \end{aligned}
(2.11)

### Proof

Let us consider the case $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty$$, otherwise the result is trivial. From Lemma 2.6 we have $$\text {d}{\varvec{j}}(x,y)=v(x,y)_+\text {d}\gamma _1(x,y)-v(x,y)_-\text {d}\gamma _2(x,y)$$, with $$\text {d}\gamma _1(x,y)=\text {d}\rho (x)\,\text {d}\mu (y)$$ and $$\text {d}\gamma _2(x,y)=\text {d}\mu (x)\,\text {d}\rho (y)$$. Applying Lemma 2.10 with $$\Phi (x,y)=2\wedge |x-y|$$ and noticing $$\Phi (x,y) \leqq |x-y| \leqq |x-y|\vee |x-y|^2$$, we arrive at the bound

\begin{aligned}&\left( \frac{1}{2}{\iint }_G(2\wedge |x-y|)\eta (x,y)\,\text {d}|{\varvec{j}} |\right) ^2 \\&\quad \leqq {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) {\iint }_G(2\wedge |x-y|)^2\eta (x,y)(\text {d}\gamma _1+\text {d}\gamma _2)\\&\quad \leqq {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})\, 2 {\iint }_G \left( |x-y |^2 \vee |x-y |^4\right) \, \eta (x,y)\,\text {d}\mu (y) \,\text {d}\rho (x) \\&\quad \leqq {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) \, 2 C_\eta , \end{aligned}

where the last estimate follows from (A1) and the integral is finite since $$\rho \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$. $$\square$$

### Lemma 2.12

(Convexity of the action) Let $$\mu ^i\in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$, $$\rho ^i \in {\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}^i \in {\mathcal {M}}(G)$$ for $$i=0,1$$. For $$\tau \in (0,1)$$, set $$\mu ^\tau = (1-\tau ) \mu ^0 + \tau \mu ^1$$, $$\rho ^\tau = (1-\tau ) \rho ^0 + \tau \rho ^1$$ and $${\varvec{j}}^\tau = (1-\tau ) {\varvec{j}}^0 + \tau {\varvec{j}}^1$$. Then it holds that

\begin{aligned} {\mathcal {A}}(\mu ^\tau ;\rho ^\tau , {\varvec{j}}^\tau )\leqq (1-\tau ) {\mathcal {A}}(\mu ^0;\rho ^0,{\varvec{j}}^0) + \tau {\mathcal {A}}(\mu ^1;\rho ^1,{\varvec{j}}^1). \end{aligned}

### Proof

Let us consider a measure $$\lambda \in {\mathcal {M}}^+(G)$$ dominating all the measures involved, so that $$\text {d}\gamma _j^i={\tilde{\gamma }}_j^i\text {d}\lambda$$ and $$\text {d}{\varvec{j}}^i=\tilde{{\varvec{j}}}^i\text {d}\lambda$$ for $$i=0,1$$ and $$j=1,2$$. Then, the convex combinations satisfy $$\text {d}\gamma _j^\tau ={\tilde{\gamma }}_j^\tau \text {d}\lambda$$ and $$\text {d}{\varvec{j}}^{\tau }=\tilde{{\varvec{j}}}^{\tau }\text {d}\lambda$$, where

\begin{aligned}&\tilde{\gamma }_j^\tau =(1-\tau )\tilde{\gamma }_j^0 + \tau \tilde{\gamma }_j^1, \qquad \text {for } j=1,2,\\ \text {and}\qquad&\tilde{{\varvec{j}}}^{\tau }=(1-\tau )\tilde{{\varvec{j}}}^0 + \tau \tilde{{\varvec{j}}}^1. \end{aligned}

Using the convexity of the function $$\alpha$$ we get the result, that is,

\begin{aligned} {\mathcal {A}}(\mu ^\tau ;\rho ^\tau ,{\varvec{j}}^\tau )&=\frac{1}{2}{\iint }_G\left( \alpha (\tilde{{\varvec{j}}}^\tau ,\tilde{\gamma }_1^\tau )+\alpha (-\tilde{{\varvec{j}}}^\tau ,\tilde{\gamma }_2^\tau )\right) \eta (x,y)\,\text {d}\lambda (x,y)\\&\leqq \frac{1-\tau }{2}{\iint }_G\left( \alpha (\tilde{{\varvec{j}}}^0,\tilde{\gamma }_1^0)+\alpha (-\tilde{{\varvec{j}}}^0,\tilde{\gamma }_2^0)\right) \eta (x,y)\,\text {d}\lambda (x,y)\\&\quad +\frac{\tau }{2}{\iint }_G\left( \alpha (\tilde{{\varvec{j}}}^1,\tilde{\gamma }_1^1)+\alpha (-\tilde{{\varvec{j}}}^1,\tilde{\gamma }_2^1)\right) \eta (x,y)\,\text {d}\lambda (x,y)\\&=(1-\tau ){\mathcal {A}}(\mu ^0;\rho ^0,{\varvec{j}}^0) + \tau {\mathcal {A}}(\mu ^1;\rho ^1,{\varvec{j}}^1). \end{aligned}

$$\square$$
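The two properties of $$\alpha$$ used above, joint convexity and positive one-homogeneity, can be checked directly on random samples; as before we take the representative density $$\alpha (j,r)=(j_+)^2/r$$ (an assumption matching the properties stated for (2.5)).

```python
import random

def alpha(j, r):
    # (j_+)^2 / r with the conventions at r = 0
    jp = max(j, 0.0)
    if r > 0:
        return jp**2 / r
    return 0.0 if jp == 0.0 else float("inf")

random.seed(1)
for _ in range(1000):
    j0, j1 = random.uniform(-2, 2), random.uniform(-2, 2)
    r0, r1 = random.uniform(0.01, 2), random.uniform(0.01, 2)
    tau = random.random()
    # joint convexity in (j, r)
    lhs = alpha((1 - tau) * j0 + tau * j1, (1 - tau) * r0 + tau * r1)
    assert lhs <= (1 - tau) * alpha(j0, r0) + tau * alpha(j1, r1) + 1e-9
    # positive one-homogeneity: alpha(c j, c r) = c alpha(j, r)
    c = random.uniform(0.1, 5)
    assert abs(alpha(c * j0, c * r0) - c * alpha(j0, r0)) < 1e-9
```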

### 2.3 Nonlocal Continuity Equation

In view of the considerations made in Section 2.2, we now deal with the nonlocal continuity equation

\begin{aligned} \partial _t\rho _t+{\overline{\nabla }}\cdot {\varvec{j}}_t=0 \qquad \text {on}\ (0,T)\times {{\mathbb {R}}^{d}}, \end{aligned}
(2.12)

where $$(\rho _t)_{t\in [0,T]}$$ and $$({\varvec{j}}_t)_{t\in [0,T]}$$ are unknown Borel families of measures in $${\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\mathcal {M}}(G)$$, respectively. Equation (2.12) is understood in the weak form: $$\forall \varphi \in C_\mathrm {c}^\infty ((0,T)\times {{\mathbb {R}}^{d}})$$,

\begin{aligned} {\int }_0^T{\int }_{{\mathbb {R}}^{d}}\partial _t\varphi _t(x)\,\text{ d }\rho _t(x)\,\text{ d }t +\frac{1}{2}{\int }_0^T{\iint }_G{\overline{\nabla }}\varphi _t(x,y)\eta (x,y)\,\text{ d }{\mathbf {j}}_t(x,y)\,\text{ d }t =0. \end{aligned}
(2.13)

Since $$|{\overline{\nabla }}\varphi (x,y)|\leqq ||\varphi ||_{C^1}(2\wedge |x-y|)$$, the weak formulation is well-defined under the integrability condition

\begin{aligned} {\int }_0^T{\iint }_G(2\wedge |x-y|)\eta (x,y)\,\text {d}|{\varvec{j}}_t|(x,y)\,\text {d}t<\infty . \end{aligned}
(2.14)

### Remark 2.13

The integrability condition (2.14) is automatically satisfied by a pair $$(\rho _t, {\varvec{j}}_t)_{t\in [0,T]}$$ such that $${\int }_0^T {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) \,\text {d}t< \infty$$, due to Corollary 2.11.

Hence we arrive at the following definition of weak solution of the nonlocal continuity equation:

### Definition 2.14

(Nonlocal continuity equation in flux form) A pair $$(\rho ,{\varvec{j}}):[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}(G)$$ is called a weak solution to the nonlocal continuity equation (2.12) provided that

1. (i)

$$(\rho _t)_{t\in [0,T]}$$ is a weakly continuous curve in $${\mathcal {P}}({{\mathbb {R}}^{d}})$$;

2. (ii)

$$({\varvec{j}}_t)_{t\in [0,T]}$$ is a Borel-measurable curve in $${\mathcal {M}}(G)$$;

3. (iii)

the pair $$(\rho ,{\varvec{j}})$$ satisfies (2.13).

We denote the set of all weak solutions on the time interval [0, T] by $${{\,\mathrm{CE}\,}}_T$$. For $$\rho ^0,\rho ^1\in {\mathcal {P}}({{\mathbb {R}}^{d}})$$, we write $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)$$ if $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}:={{\,\mathrm{CE}\,}}_1$$ and, in addition, $$\rho (0)=\rho ^0$$ and $$\rho (1)=\rho ^1$$.

The following lemma shows that any pair satisfying (2.13) together with the integrability condition (2.14) has a weakly continuous representative, and hence is a weak solution in the sense of Definition 2.14. This observation justifies calling $$\rho$$ a curve in the space of probability measures; see [2, Lemma 8.1.2] and [23, Lemma 3.1].

### Lemma 2.15

Let $$(\rho _t)_{t\in [0,T]}$$ and $$({\varvec{j}}_t)_{t\in [0,T]}$$ be Borel families of measures in $${\mathcal {P}}({{\mathbb {R}}^{d}})$$ and $${\mathcal {M}}(G)$$ satisfying (2.13) and (2.14). Then there exists a weakly continuous curve $$({\bar{\rho }}_t)_{t\in [0,T]}\subset {\mathcal {P}}({{\mathbb {R}}^{d}})$$ such that $${\bar{\rho }}_t=\rho _t$$ for a.e. $$t\in [0,T]$$. Moreover, for any $$\varphi \in C_\mathrm {c}^\infty ([0,T]\times {{\mathbb {R}}^{d}})$$ and all $$0\leqq t_0\leqq t_1\leqq T$$ it holds that

\begin{aligned} \begin{aligned}&{\int }_{{\mathbb {R}}^{d}}\varphi _{t_1}(x)\,\text {d}{\bar{\rho }}_{t_1}(x)-{\int }_{{\mathbb {R}}^{d}}\varphi _{t_0}(x)\,\text {d}{\bar{\rho }}_{t_0}(x) \\&\quad ={\int }_{t_0}^{t_1}{\int }_{{\mathbb {R}}^{d}}\partial _t\varphi _t(x)\,\text {d}\rho _t(x)\,\text {d}t\\&\qquad +\frac{1}{2}{\int }_{t_0}^{t_1}{\iint }_G{\overline{\nabla }}\varphi _t(x,y)\eta (x,y)\,\text {d}{\varvec{j}}_t(x,y)\,\text {d}t. \end{aligned} \end{aligned}
(2.15)

We now prove propagation of second-order moments.

### Lemma 2.16

(Uniformly bounded second moments) Let $$(\mu ^n)_n\subset {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ be such that (A1) holds uniformly in n. Let $$(\rho _0^n)_n \subset {\mathcal {P}}_{2}({{\mathbb {R}}^{d}})$$ be such that $$\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n) < \infty$$ and let $$(\rho ^n,{\varvec{j}}^n)_n \subset {{\,\mathrm{CE}\,}}_T$$ be such that $$\sup _{n\in {\mathbb {N}}} {\int }_0^T {\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t<\infty$$. Then $$\sup _{t\in [0,T]}\sup _{n\in {\mathbb {N}}} M_2(\rho _t^n) < \infty$$.

### Proof

We proceed by considering the time derivative of the second-order moment of $$\rho _t^n$$ for all $$t\in [0,T]$$ and $$n\in {\mathbb {N}}$$. Since $$x\mapsto |x|^2$$ is not an admissible test function in (2.13), we introduce a smooth cut-off function $$\varphi _R$$ satisfying $$\varphi _R(x)=1$$ for $$x\in B_R$$, $$\varphi _R(x)=0$$ for $$x \in {{\mathbb {R}}^{d}}{\setminus } B_{2R}$$ and $$|\nabla \varphi _R |\leqq \frac{2}{R}$$. Then, we can use the definition of solution with the function $$\psi _R(x)= \varphi _R(x)^2 (|x|^2+1)$$ and apply Lemma 2.10 with $$\Phi ={\overline{\nabla }}\psi _R$$ to obtain, for all $$t\in [0,T]$$ and $$n\in {\mathbb {N}}$$,

\begin{aligned}&\frac{\text {d}}{\text {d}{t}}{\int }_{{\mathbb {R}}^{d}}\psi _R(x)\,\text {d}\rho _t^n(x) \\&\quad =\frac{1}{2}{\iint }_G {\overline{\nabla }}\psi _R(x,y)\,\eta (x,y)\,\text {d}{\varvec{j}}_t^n(x,y)\\&\quad \leqq \sqrt{{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)}\left( {\iint }_G \left|{\overline{\nabla }}\psi _R(x,y) \right|^2\eta (x,y)(\text {d}\gamma _1^n+\text {d}\gamma _2^{n}) \right) ^\frac{1}{2}. \end{aligned}

For $$R\geqq 1$$, we estimate, for all $$(x,y)\in G$$,

\begin{aligned} |{\overline{\nabla }}\psi _R(x,y) |^2&\leqq 2 |\varphi _R(y)^2-\varphi _R(x)^2 |^2 + 2 | \varphi _R(y)^2 |y|^2 - \varphi _R(x)^2 |x|^2 |^2 , \end{aligned}
(2.16)

and observe that

\begin{aligned} \left| {\overline{\nabla }}\varphi _R^2(x,y) \right| = \left|{\overline{\nabla }}\varphi _R(x,y)\left( \varphi _R(x) + \varphi _R(y)\right) \right| \leqq \frac{4}{R} \left|x-y \right|. \end{aligned}

Hence the first term in (2.16) is bounded by $$32 |x-y |^2$$, since $$R\geqq 1$$. For the second term in (2.16), we abbreviate by setting $$r = \varphi _R(x) |x |$$ and $$s = \varphi _R(y)|y |$$ and compute the bound

\begin{aligned} |s^2 - r^2 |^2&= |s-r |^2 |s+r |^2 \leqq 2 |s-r |^4 + 8 |r|^2 |s-r |^2 \\&\leqq 8 \left( |r|^2+1\right) \left( |s-r|^2 \vee |s-r|^4 \right) . \end{aligned}

It is easy to check that $$x\mapsto \varphi _R(x) |x |$$ is globally Lipschitz and we can conclude that, for some numerical constant $$C>0$$, for all $$(x,y)\in G$$ we have

\begin{aligned} \left| {\overline{\nabla }}\psi _R(x,y) \right|^2\leqq & {} 32|x-y |^2 + C |x |^2 \left( |x-y |^2 \vee |x-y |^4\right) \\\leqq & {} C\left( |x |^2+1\right) \left( |x-y |^2 \vee |x-y |^4\right) . \end{aligned}

Thus, by sending $$R\rightarrow \infty$$ and using (A1), it follows that

\begin{aligned} \frac{\text {d}}{\text {d}{t}}{\int }_{{\mathbb {R}}^{d}}\left( |x|^2+1\right) \,\text {d}\rho _t^n(x) \leqq \sqrt{{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)}\left( 2 C C_\eta {\int }_{{\mathbb {R}}^{d}}\left( |x|^2+1\right) \,\text {d}\rho ^n_t(x)\right) ^\frac{1}{2}. \end{aligned}

By integrating the above differential inequality, we arrive at the bound

\begin{aligned} {\int }_{{\mathbb {R}}^{d}}|x|^2\,\text {d}\rho _t^n(x)\leqq 2 {\int }_{{\mathbb {R}}^{d}}\left( |x|^2+1\right) \,\text {d}\rho _0^n(x)+ 2 C C_\eta T {\int }_0^T {\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t, \end{aligned}

whence we conclude by taking the suprema in $$n\in {\mathbb {N}}$$ and $$t\in [0,T]$$. $$\square$$
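For completeness, the integration step can be made explicit (a sketch, with non-optimized constants): setting $$m_n(t):={\int }_{{\mathbb {R}}^{d}}(|x|^2+1)\,\text {d}\rho _t^n(x)\geqq 1$$, the differential inequality reads $$\frac{\text {d}}{\text {d}{t}}m_n(t)\leqq \sqrt{2CC_\eta }\,\sqrt{{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)}\,\sqrt{m_n(t)}$$, that is, $$\frac{\text {d}}{\text {d}{t}}\sqrt{m_n(t)}\leqq \frac{1}{2}\sqrt{2CC_\eta }\,\sqrt{{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)}$$. Integrating in time, squaring, and using $$(a+b)^2\leqq 2a^2+2b^2$$ together with the Cauchy–Schwarz inequality in time yields

\begin{aligned} m_n(t)\leqq 2\, m_n(0)+C C_\eta \, T{\int }_0^T{\mathcal {A}}(\mu ^n;\rho _s^n,{\varvec{j}}_s^n)\,\text {d}s, \end{aligned}

which implies the stated bound.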

Now we are ready to show compactness for the solutions to (2.12).

### Proposition 2.17

(Compactness of solutions to the nonlocal continuity equation) Let $$(\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)$$ and suppose that $$(\mu ^n)_n$$ narrowly converges to $$\mu$$. Moreover, suppose that the base measures $$\mu ^n$$ and $$\mu$$ satisfy (A1) and (A2) uniformly in n. Let $$(\rho ^n,{\varvec{j}}^n) \in {{\,\mathrm{CE}\,}}_T$$ for each $$n\in {\mathbb {N}}$$ be such that $$(\rho _0^n)_n$$ satisfies $$\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n)< \infty$$ and

\begin{aligned} \sup _{n\in {\mathbb {N}}}{\int }_0^T{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t<\infty . \end{aligned}
(2.17)

Then, there exists $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$ such that, up to a subsequence, as $$n\rightarrow \infty$$ it holds

\begin{aligned} \rho _t^n\rightharpoonup \rho _t\quad&\text {for all}\ t\in [0,T],\\ {\varvec{j}}^n\rightharpoonup {\varvec{j}}\quad&\text {in}\ {\mathcal {M}}_{\mathrm {loc}}(G\times [0,T]), \end{aligned}

with $$\rho _t\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ for any $$t\in [0,T]$$. Moreover, the action is lower semicontinuous along the above subsequences $$(\mu ^n)_n, (\rho ^n)_n$$ and $$({\varvec{j}}^n)_n$$, i.e.,

\begin{aligned} \liminf _{n\rightarrow \infty }{\int }_0^T{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t\geqq {\int }_0^T{\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\,\text {d}t. \end{aligned}

### Proof

We argue similarly to [22, Lemma 4.5], [23, Proposition 3.4]. For each $$n\in {\mathbb {N}}$$ we define $${\varvec{j}}^n\in {\mathcal {M}}(G\times [0,T])$$ as $$\text {d}{\varvec{j}}^n(x,y,t)=\text {d}{\varvec{j}}_t^n(x,y)\text {d}t$$. In view of Lemma 2.16 there exists $$C_2>0$$ such that $$\sup _{t\in [0,T]}\sup _{n\in {\mathbb {N}}} M_2(\rho _t^n) \leqq C_2 <+ \infty$$.

For any compact sets $$K\subset G$$ and $$I\subseteq [0,T]$$, we apply the bound (2.11) of Corollary 2.11 and the Cauchy–Schwarz inequality to get

\begin{aligned} \sup _{n\in {\mathbb {N}}}|{\varvec{j}}^n|(K\times I)&\leqq \frac{1}{\inf _{(x,y)\in K} (2\wedge |x-y|)\eta (x,y)}\,\sup _{n\in {\mathbb {N}}}{\int }_I {\iint }_K (2\wedge |x-y|)\,\eta (x,y) \,\text {d}|{\varvec{j}}_t^n|(x,y)\,\text {d}t \nonumber \\&\leqq \frac{2 \sqrt{|I|} \sqrt{2C_\eta }}{\inf _{(x,y)\in K} (2\wedge |x-y|)\eta (x,y)}\left( \sup _{n\in {\mathbb {N}}}{\int }_0^T{\mathcal {A}}(\mu ^n;\rho _t^n,{\varvec{j}}_t^n)\,\text {d}t\right) ^\frac{1}{2} . \end{aligned}
(2.18)

Thanks to Assumption (W), we have that $$\inf _{(x,y)\in K} (2\wedge |x-y|)\eta (x,y)>0$$ for any compact $$K\subset G$$. Hence, by (2.17), the total variation of $$({\varvec{j}}^n)_n$$ is uniformly bounded in n on every compact subset of $$G\times [0,T]$$, which implies, up to a subsequence, $${\varvec{j}}^n\rightharpoonup {\varvec{j}}$$ as $$n\rightarrow \infty$$ in $${\mathcal {M}}_{{{\,\mathrm{loc}\,}}}(G \times [0,T])$$. By the disintegration theorem, there exists a Borel family $$({\varvec{j}}_t)_{t\in [0,T]}$$ such that $${\varvec{j}}(K\times I)={\int }_I {\varvec{j}}_t(K) \,\text {d}t$$ for all compact sets $$I\subseteq [0,T]$$ and $$K\subset G$$. Thanks to the bound (2.18), the family $$({\varvec{j}}_t)_{t\in [0,T]}$$ still satisfies (2.14).

Now, as we need to pass to the limit in (2.13), we consider a function $$\xi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$ and an interval $$[t_0,t_1]\subseteq [0,T]$$. The function $$\chi _{[t_0,t_1]}(t){\overline{\nabla }}\xi (x,y)$$ does not have compact support in $$[t_0,t_1]\times G$$, so we proceed by a truncation argument. For $$\varepsilon >0$$, let us set $$I_\varepsilon = [t_0+\varepsilon , t_1-\varepsilon ]$$, $$N_\varepsilon = {\overline{B}}_{\varepsilon ^{-1}} \times {\overline{B}}_{\varepsilon ^{-1}}$$, where $$B_{\varepsilon ^{-1}}= \left\{ x \in {\mathbb {R}}^d: |x|< \varepsilon ^{-1}\right\}$$, and $$G_\varepsilon =\{(x,y)\in G:\varepsilon \leqq |x-y|\}$$. Hence we can find $$\varphi _\varepsilon \in C_\mathrm {c}^\infty ([t_0,t_1]\times G; [0,1])$$ satisfying

\begin{aligned} \left\{ \varphi _\varepsilon = 1 \right\} \supseteq I_\varepsilon \times \left( G_\varepsilon \cap N_\varepsilon \right) , \end{aligned}
(2.19)

so that $$\varphi _\varepsilon \rightarrow \chi _{[t_0,t_1]} \, \chi _G$$ as $$\varepsilon \rightarrow 0$$ and $$\varphi _\varepsilon \, \chi _{[t_0,t_1]} \, {\overline{\nabla }}\xi$$ has compact support in $$[t_0,t_1]\times G$$. Then, thanks to Assumption (W), we get that

\begin{aligned}&\lim _{n\rightarrow \infty }{\int }_{t_0}^{t_1} {\iint }_G \varphi _\varepsilon (t,x,y){\overline{\nabla }}\xi (x,y) \eta (x,y) \,\text {d}{\varvec{j}}_t^n(x,y)\,\text {d}t \nonumber \\&\quad = {\int }_{t_0}^{t_1} {\iint }_G \varphi _\varepsilon (t,x,y){\overline{\nabla }}\xi (x,y) \eta (x,y)\,\text {d}{\varvec{j}}_t(x,y)\,\text {d}t . \end{aligned}
(2.20)

Now, it remains to show that

\begin{aligned} \lim _{\varepsilon \rightarrow 0} \sup _{n\in {\mathbb {N}}} \left|{\int }_{t_0}^{t_1} {\iint }_G \left( 1- \varphi _\varepsilon (t,x,y)\right) {\overline{\nabla }}\xi (x,y)\eta (x,y)\,\text {d}{\varvec{j}}_t^n(x,y)\,\text {d}t \right| = 0. \end{aligned}
(2.21)

We need to estimate the terms for which $$\varphi _\varepsilon (t,x,y)<1$$. First, setting $$I_\varepsilon ^\mathrm {c} = [t_0,t_1]{\setminus } I_\varepsilon$$, we note that

\begin{aligned}{}[t_0,t_1] \times G {\setminus } \{\varphi _\varepsilon =1\} \subseteq \big ( I_\varepsilon ^\mathrm {c} \times G \big ) \cup \big ( I_\varepsilon \times ( G{\setminus } (G_\varepsilon \cap N_\varepsilon ))\big ) =: M_\varepsilon , \end{aligned}

whence, by Lemma 2.10,

\begin{aligned}&{\left| {\int }_{t_0}^{t_1} {\iint }_G \left( 1- \varphi _\varepsilon (t,x,y)\right) {\overline{\nabla }}\xi (x,y)\eta (x,y)\,\text{ d }{\mathbf {j}}_t^n(x,y)\,\text{ d }t \right| }\\ {}&\leqq \Vert \xi \Vert _{C^1} {\int }_{t_0}^{t_1} {\iint }_G \left( 1- \varphi _\varepsilon (t,x,y)\right) \left( 2 \wedge | x-y|\right) \eta (x,y) \,\text{ d }|{\mathbf {j}}_t^n |(x,y)\,\text{ d }t \\ {}&\leqq 2\Vert \xi \Vert _{C^1} \bigg ({\int }_0^T{\mathcal {A}}(\mu ^n;\rho _t^n,{\mathbf {j}}_t^n)\,\text{ d }t\bigg )^\frac{1}{2}\\ {}&\qquad \times \left( {\iiint }_{M_\varepsilon } \left( 4\wedge | x-y|^2\right) \eta (x,y)\,\text{ d }\big (\gamma ^{n}_{1,t} + \gamma ^{n}_{2,t}\big )\, \text{ d }t\right) ^\frac{1}{2}. \end{aligned}

Since $$4\wedge |x-y|^2 \leqq |x-y|^2\vee |x-y|^4$$ we have, by Assumption (A1), the bound

\begin{aligned} {\int }_{I_\varepsilon ^\mathrm {c}}{\iint }_G \left( 4\wedge | x-y|^2\right) \eta (x,y) \,\text {d}\big (\gamma ^{n}_{1,t} + \gamma ^{n}_{2,t}\big )\, \text {d}t \leqq 2 |I_\varepsilon ^\mathrm {c}| C_\eta = 4 C_\eta \varepsilon . \end{aligned}

Likewise, using the symmetry, we arrive at

\begin{aligned}&{\int }_{I_\varepsilon } {\iint }_{G_\varepsilon ^\mathrm {c}} \left( 4\wedge | x-y|^2\right) \eta (x,y) \,\text {d}\big (\gamma ^{n}_{1,t} + \gamma ^{n}_{2,t}\big ) \,\text {d}t \\&\quad = 2{\int }_0^T {\iint }_{G_\varepsilon ^\mathrm {c}} \left( 4\wedge | x-y|^2\right) \eta (x,y) \,\text {d}\mu ^n(y)\, \text {d}\rho _t^n(x)\,\text {d}{t}, \end{aligned}

which vanishes as $$\varepsilon \rightarrow 0$$ in view of Assumption (A2). Finally, the last term is estimated again using (A1):

\begin{aligned}&{\int }_{I_\varepsilon }{\iint }_{G{\setminus } N_\varepsilon } \left( 4\wedge | x-y|^2\right) \eta (x,y)\,\text {d}\gamma ^{n}_{1,t} \,\text {d}t \\&\quad \leqq {\int }_0^T {\int }_{{\overline{B}}_{\varepsilon ^{-1}}^\mathrm {c}} {\int }_{{\mathbb {R}}^{d}}\left( 4\wedge |x-y|^2\right) \eta (x,y) \,\text {d}\mu ^n(y) \,\text {d}\rho _t^n(x)\, \text {d}t \\&\quad \leqq T C_\eta \sup _{t\in [0,T]} \rho _t^n\left( {\overline{B}}_{\varepsilon ^{-1}}^\mathrm {c}\right) \rightarrow 0 \qquad \text {as } \varepsilon \rightarrow 0 , \end{aligned}

since $$M_2(\rho _t^n) \leqq C_2$$ for any $$n\in {\mathbb {N}}$$ and $$t\in [0,T]$$ by Lemma 2.16.

Combining (2.20) and (2.21), we get

\begin{aligned}&\lim _{n\rightarrow \infty }{\int }_{t_0}^{t_1}{\iint }_G{\overline{\nabla }}\xi (x,y)\,\eta (x,y)\,\text {d}{\varvec{j}}_t^n(x,y)\,\text {d}t \\&\quad ={\int }_{t_0}^{t_1}{\iint }_G{\overline{\nabla }}\xi (x,y)\,\eta (x,y)\,\text {d}{\varvec{j}}_t(x,y)\,\text {d}t. \end{aligned}

By means of the last convergence, the tightness of $$(\rho _0^n)_n$$, and (2.15) with $$\varphi (t,x)=\xi (x)$$, $$t_0=0$$ and $$t_1=T$$, we obtain that $$(\rho _t^n)_n$$ locally narrowly converges to some finite non-negative measure $$\rho _t\in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ for any $$t\in [0,T]$$. In particular, for any $$\xi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$ and any $$t\in [0,T]$$, we have

\begin{aligned} {\int }_{{\mathbb {R}}^{d}}\xi (x)\,\text {d}\rho _t(x)={\int }_{{\mathbb {R}}^{d}}\xi (x)\,\text {d}\rho _0(x)+\frac{1}{2}{\int }_{0}^{t}{\iint }_G{\overline{\nabla }}\xi (x,y)\eta (x,y) \,\text {d}{\varvec{j}}_s(x,y)\,\text {d}s. \end{aligned}

Now, for $$R>0$$, let us consider a function $$\xi _R\in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$ such that $$0\leqq \xi _R\leqq 1$$, $$\xi _R=1$$ on $$B_R$$, and $$\Vert \xi _R\Vert _{C^1}\leqq 1$$. Because of the integrability condition (2.14), which holds thanks to Corollary 2.11, we have

\begin{aligned}&\left| {\int }_{0}^{t}\frac{1}{2}{\iint }_G{\overline{\nabla }}\xi _R(x,y)\,\eta (x,y) \,\text {d}{\varvec{j}}_s(x,y)\,\text {d}s\right| \\&\quad \leqq \frac{1}{2}{\int }_0^t{\iint }_{G{\setminus }(B_R\times B_R)}\left( 2\wedge |x-y|\right) \eta (x,y) \,\text {d}|{\varvec{j}}_s|\,\text {d}s \xrightarrow [R\rightarrow \infty ]{}0. \end{aligned}

Hence the measure $$\rho _t$$ is actually a probability measure on $${{\mathbb {R}}^{d}}$$ for all $$t\in [0,T]$$. Moreover, Lemma 2.16 ensures that the convergence is narrow and not only locally narrow. As a direct consequence of the previous considerations, $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$, and the lower semicontinuity follows from Lemma 2.9. $$\square$$

### 2.4 Nonlocal Upwind Transportation Quasi-Metric

Here, we give a rigorous definition of the nonlocal transportation quasi-metric we introduced in (1.8). Let us recall that $$\eta :\{ (x,y)\in {\mathbb {R}}^d \times {\mathbb {R}}^d : x\ne y \}\rightarrow [0,\infty )$$ is the weight function satisfying (W).

### Definition 2.18

(Nonlocal upwind transportation cost) For $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfying Assumptions (A1) and (A2), and $$\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, the nonlocal upwind transportation cost between $$\rho _0$$ and $$\rho _1$$ is defined by

\begin{aligned} {\mathcal {T}}_\mu (\rho _0,\rho _1)^2=\inf \left\{ {\int }_0^1{\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\,\text {d}t:(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho _0,\rho _1)\right\} . \end{aligned}
(2.22)

If $$\mu$$ is clear from the context, the notation $${\mathcal {T}}$$ is used in place of $${\mathcal {T}}_\mu$$.

Note that Proposition 2.17 ensures the existence of minimizers for (2.22) whenever $${\mathcal {T}}_\mu (\rho _0,\rho _1)<\infty$$, that is, whenever there exists a path of finite action; otherwise, the nonlocal upwind transportation cost is infinite. For example, suppose that the graph induced by $$\mu$$ and $$\eta$$ is disconnected, meaning that there are $$x,y\in {{\,\mathrm{supp}\,}}\mu$$ for which there is no finite sequence $$(x_0=x,x_1,\dots ,x_{n-1},x_n=y)$$ with $$\eta (x_i,x_{i+1})>0$$ for all $$i=0,\dots ,n-1$$; in this case, $${\mathcal {T}}_\mu (\delta _x,\delta _y)=\infty$$, since the set of solutions to the continuity equation $${{\,\mathrm{CE}\,}}(\delta _x,\delta _y)$$ is empty.

Due to the one-homogeneity of the action density function $$\alpha$$ in (2.5), we have the following reparametrization result, which is similar to [22, Theorem 5.4]:

### Lemma 2.19

(Reparametrization) For any $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ satisfying Assumptions (A1) and (A2), and any $$\rho _0,\rho _T\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, it holds that

\begin{aligned} {\mathcal {T}}_\mu (\rho _0,\rho _T)=\inf \left\{ {\int }_0^T\sqrt{{\mathcal {A}}(\mu ; \rho _t,{\varvec{j}}_t)}\,\text {d}t:(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T(\rho _0,\rho _T)\right\} . \end{aligned}

Now, as a consequence of the above reparametrization and Jensen's inequality, we have the following result, which implies that the infimum is in fact a minimum; see [23, Proposition 4.3].

### Proposition 2.20

For any $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ satisfying Assumptions (A1) and (A2), and any $$\rho _0,\rho _1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ such that $${\mathcal {T}}_\mu (\rho _0,\rho _1)<\infty$$, the infimum in (2.22) is attained by a curve $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho _0,\rho _1)$$ such that $${\mathcal {A}}(\rho _t,{\varvec{j}}_t)={\mathcal {T}}_\mu (\rho _0,\rho _1)^2$$ for a.e. $$t\in [0,1]$$. Such a curve is a constant-speed geodesic for $${\mathcal {T}}_\mu$$, i.e.,

\begin{aligned} {\mathcal {T}}_\mu (\rho _s,\rho _t)=|t-s|{\mathcal {T}}_\mu (\rho _0,\rho _1), \quad \text{ for } \text{ all } \, s,t\in [0,1]. \end{aligned}

The next proposition establishes a link between $${\mathcal {T}}_\mu$$ and the $$W_1$$-distance.

### Proposition 2.21

(Comparison with $$W_1$$) Let $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ satisfy (A1) for some $$C_\eta >0$$ (depending only on $$\mu$$ and $$\eta$$). Then for any $$\rho ^0,\rho ^1\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ it holds

\begin{aligned} W_1(\rho ^0,\rho ^1)\leqq \sqrt{2C_\eta }\, {\mathcal {T}}(\rho ^0,\rho ^1). \end{aligned}

### Proof

By a standard regularization argument and the truncation procedure from the proof of Lemma 2.16, we can actually use any 1-Lipschitz function $$\psi$$ as a test function in the weak formulation (2.13) for any $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)$$. Then we can estimate, by Lemma 2.10 and Assumption (A1),

\begin{aligned}&\left| {\int }_{{\mathbb {R}}^{d}}\psi \text{ d }\rho ^1 - {\int }_{{\mathbb {R}}^{d}}\psi \,\text{ d }\rho ^0 \right| \\ {}&\quad = \left| \frac{1}{2}{\int }_0^1 {\iint }_G {\overline{\nabla }}\psi \, \eta \,\text{ d }{\mathbf {j}}_t\, \text{ d }t \right| \leqq \frac{1}{2}{\int }_0^1 {\iint }_G |x-y | \, \eta (x,y)\, \text{ d }|{\mathbf {j}}_t |(x,y)\, \text{ d }t \\ {}&\quad \leqq \left( {\int }_0^1 {\mathcal {A}}(\rho _t, {\mathbf {j}}_t)\, \text{ d }t\right) ^{\frac{1}{2}} \left( {\int }_0^1{\iint }_G |x-y |^2 \eta (x,y) \bigl (\text{ d }\gamma _1+\text{ d }\gamma _2\bigr )\right) ^{\frac{1}{2}}\\ {}&\quad \leqq \left( {\int }_0^1 {\mathcal {A}}(\rho _t, {\mathbf {j}}_t)\,\text{ d }t\right) ^{\frac{1}{2}}\\ {}&\qquad \times \left( 2 {\int }_0^1{\iint }_G \left( |x-y |^2\vee |x-y |^4\right) \eta (x,y)\,\text{ d }\mu (y)\,\text{ d }\rho _t(x)\right) ^{\frac{1}{2}}\\ {}&\quad \leqq \sqrt{2C_\eta }\left( {\int }_0^1 {\mathcal {A}}(\rho _t, {\mathbf {j}}_t)\,\text{ d }t\right) ^{\frac{1}{2}}. \end{aligned}

Taking the supremum over all 1-Lipschitz functions $$\psi$$ and the infimum over all pairs $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ^0,\rho ^1)$$ gives the result. $$\square$$

The results above show that $${\mathcal {T}}_\mu$$ is an extended quasi-metric (meaning that it can take the value $$\infty$$) on the set of probability measures, inducing a topology stronger than the $$W_1$$-topology.

### Theorem 2.22

Let $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ satisfy Assumptions (A1) and (A2). The nonlocal upwind transportation cost $${\mathcal {T}}_\mu$$ defines an extended quasi-metric on $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$. The map $$(\rho _0,\rho _1)\mapsto {\mathcal {T}}_\mu (\rho _0,\rho _1)$$ is lower semicontinuous with respect to the narrow convergence. The topology induced by $${\mathcal {T}}_\mu$$ is stronger than the $$W_1$$-topology and the narrow topology. In particular, bounded sets are narrowly relatively compact in $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )$$.

### Proof

If $${\mathcal {T}}_\mu (\rho _0,\rho _1)=0$$, then $${\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)=0$$ for a.e. $$t\in [0,1]$$ along a minimizing curve. Hence $${\varvec{j}}_t \equiv 0$$ for a.e. $$t\in [0,1]$$, which implies that $$\rho _0=\rho _1$$ by the nonlocal continuity equation (2.15). The triangle inequality is a consequence of Lemma 2.19 and the fact that solutions to the nonlocal continuity equation can be concatenated. The lower semicontinuity and compactness properties of $${\mathcal {T}}_\mu$$ are inherited from the action functional $${\mathcal {A}}$$ via Proposition 2.17. In view of the comparison with $$W_1$$ from Proposition 2.21, the topology induced by $${\mathcal {T}}_\mu$$ is stronger than both the $$W_1$$-topology and the narrow topology. $$\square$$

The next lemma provides a quantitative illustration of the asymmetry of $${\mathcal {T}}$$.

### Lemma 2.23

(Two-point space) Let us consider the two-point graph $$\Omega :=\{0,1\}$$, with $$\eta (0,1)=\eta (1,0)=\alpha >0$$, $$\mu (0)=p>0$$ and $$\mu (1)=q>0$$. Let $$\rho ,\nu \in {\mathcal {P}}_2(\Omega )$$ and let $$\rho _0, \rho _1, \nu _0, \nu _1 \in [0,1]$$ be such that $$\rho =\rho _0\delta _0+\rho _1\delta _1$$ and $$\nu =\nu _0\delta _0+\nu _1\delta _1$$. There holds

\begin{aligned} {\mathcal {T}}(\rho ,\nu )= {\left\{ \begin{array}{ll} \frac{2}{\sqrt{\alpha p}} \left( \sqrt{\rho _1}-\sqrt{\nu _1}\right) &{}\quad \text { if } \rho _0< \nu _0, \\ \frac{2}{\sqrt{\alpha q}} \left( \sqrt{\rho _0}-\sqrt{\nu _0}\right) &{}\quad \text { if } \nu _0 < \rho _0. \end{array}\right. } \end{aligned}
(2.23)

### Proof

Let us fix $$\lambda =\delta _{(0,1)}+\delta _{(1,0)}$$ and note that $$\rho _0+\rho _1=1$$ and $$\nu _0+\nu _1=1$$, as $$\rho ,\nu$$ are probability measures. Since $$\Omega =\{0,1\}$$, for any curve $$t\in [0,1]\mapsto \rho _t\in {\mathcal {P}}_2(\Omega )$$ there exists a function $$g:t\in [0,1]\mapsto g_t\in [0,1]$$ accounting for the mass displacement. Thus, $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ,\nu )$$ if

\begin{aligned}&\rho _t=g_t\delta _0+(1-g_t)\delta _1,\quad {\varvec{j}}_t\ \text { such that }\ {\varvec{j}}_t(0,1)=-\frac{\dot{g}_t}{\alpha }\ \text { and }\ {\varvec{j}}_t(1,0)=\frac{\dot{g}_t}{\alpha }, \\&\quad \text{ for } \text{ all } \, t\in [0,1]. \end{aligned}

Hence, using that $${\varvec{j}}_t$$ is antisymmetric yields

\begin{aligned} {\mathcal {T}}(\rho ,\nu )^2&=\inf \left\{ {\int }_0^1{\mathcal {A}}(\rho _t,{\varvec{j}}_t)\,\text {d}t:(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}(\rho ,\nu )\right\} \\&=\inf _g\left\{ {\int }_0^1\frac{|(\dot{g}_t)_-|^2}{\alpha g_t q}+\frac{|(\dot{g}_t)_+|^2}{\alpha (1-g_t)p}\,\text {d}t : g_0 = \rho _0 \text { and } g_1 = \nu _0 \right\} . \end{aligned}

Now, let us assume without loss of generality that $$\rho _0 < \nu _0$$. In this configuration we can restrict the above infimum to non-decreasing g, as this yields a lower action. Therefore, by applying Jensen's inequality, we have

\begin{aligned} {\mathcal {T}}(\rho ,\nu )^2&=\inf _{g\nearrow }\frac{1}{\alpha p}{\int }_0^1\frac{|\dot{g}_t|^2}{(1-g_t)}\,\text {d}t = \inf _{g\nearrow }\frac{1}{\alpha p}{\int }_0^1\left| -2\frac{\text {d}}{\text {d}t} \left( \sqrt{1-g_t}\right) \right| ^2\,\text {d}t \\&\geqq \inf _{g\nearrow } \frac{4}{\alpha p} \left| {\int }_0^1 -\frac{\text {d}}{\text {d}t}\left( \sqrt{1-g_t}\right) \,\text {d}t \right|^2 = \frac{4}{\alpha p} \left( \sqrt{1-\rho _0} -\sqrt{1-\nu _0}\right) ^2 \\&= \frac{4}{\alpha p} \left( \sqrt{\rho _1} -\sqrt{\nu _1}\right) ^2 \,. \end{aligned}

The equality case is obtained by noting that the solution to $$-\frac{\text {d}}{\text {d}t} \sqrt{1-g_t}=\sqrt{\rho _1}-\sqrt{\nu _1}$$ for all $$t\in [0,1]$$, with consistent boundary values $$g_0=\rho _0$$ and $$g_1=\nu _0$$, is given by $$g_t = 1-\bigl (\sqrt{\rho _1}(1-t)+\sqrt{\nu _1} t \bigr )^2$$. The case $$\nu _0<\rho _0$$ is obtained in a similar manner, which gives formula (2.23). $$\square$$
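As a sanity check on the two-point computation, the claimed geodesic and the value of the action can be verified numerically. The sketch below uses illustrative parameter values (the edge weight $$\alpha$$ and the base-measure masses $$q=\mu (\{0\})$$, $$p=\mu (\{1\})$$ are not fixed by the text, so these numbers are assumptions) and checks the boundary values together with $${\mathcal {T}}(\rho ,\nu )^2=\tfrac{4}{\alpha p}(\sqrt{\rho _1}-\sqrt{\nu _1})^2$$.

```python
import math

# Illustrative parameters (assumed, not fixed above): edge weight alpha and
# base-measure masses q = mu({0}), p = mu({1}).
alpha, p = 2.0, 0.3
rho0, nu0 = 0.2, 0.7            # masses at node 0 of the endpoints, rho0 < nu0
rho1, nu1 = 1 - rho0, 1 - nu0   # masses at node 1

def g(t):
    # claimed geodesic: g_t = 1 - (sqrt(rho1)*(1-t) + sqrt(nu1)*t)^2
    return 1 - (math.sqrt(rho1) * (1 - t) + math.sqrt(nu1) * t) ** 2

def gdot(t, h=1e-6):
    # central difference; exact up to rounding since g is quadratic in t
    return (g(t + h) - g(t - h)) / (2 * h)

# action of a non-decreasing curve: int_0^1 |g'_t|^2 / (alpha * (1 - g_t) * p) dt
N = 20_000
action = sum(
    gdot((k + 0.5) / N) ** 2 / (alpha * (1 - g((k + 0.5) / N)) * p)
    for k in range(N)
) / N

closed_form = 4 / (alpha * p) * (math.sqrt(rho1) - math.sqrt(nu1)) ** 2
print(abs(g(0) - rho0) < 1e-9, abs(g(1) - nu0) < 1e-9)   # boundary values
print(abs(action - closed_form) < 1e-6)                   # T(rho, nu)^2
```

The integrand along the claimed geodesic is in fact constant, which is exactly the equality case of Jensen’s inequality used above.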

### Remark 2.24

Already on the two-point space the quasi-metric is in general non-symmetric, as is best observed in Fig. 3. In the case $$p=\frac{1}{2}$$, swapping the masses, that is, setting $${\hat{\rho }}_0 = \rho _1$$ and $${\hat{\rho }}_1 = \rho _0$$ (and analogously for $${\hat{\nu }}$$), preserves the quasi-distance: $${\mathcal {T}}(\rho ,\nu )= {\mathcal {T}}({\hat{\rho }},\hat{\nu })$$.

We now adapt the standard definition of absolutely continuous curves in metric spaces from [2, Chapter 1] to our setting. Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfy Assumptions (A1) and (A2). A curve $$[0,T]\ni t\mapsto \rho _t\in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ is said to be 2-absolutely continuous with respect to $${\mathcal {T}}_\mu$$ if there exists $$m\in L^2((0,T))$$ such that

\begin{aligned} {\mathcal {T}}_\mu (\rho _{t_0},\rho _{t_1})\leqq {\int }_{t_0}^{t_1}m(t)\,\text {d}t\quad \text{ for } \text{ all } \, 0< t_0\leqq t_1< T. \end{aligned}
(2.24)

In this case, we write $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )$$. For any $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )$$ the quantity

\begin{aligned} |\rho '_t|:=\lim _{h\rightarrow 0}\frac{{\mathcal {T}}_\mu (\rho _t,\rho _{t+h})}{|h|} \end{aligned}
(2.25)

is well-defined for a.e. $$t\in [0,T]$$ and is called the metric derivative of $$\rho$$ at t. Moreover, the function $$t\mapsto |\rho '|(t)$$ belongs to $$L^2((0,T))$$ and satisfies $$|\rho '|(t)\leqq m(t)$$ for a.e. $$t\in [0,T]$$; that is, $$|\rho '|$$ is the minimal integrand satisfying (2.24). The length of a curve $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu )\bigr )$$ is defined by $$L(\rho ):={\int }_0^T|\rho '|(t)\,\text {d}t$$.

### Proposition 2.25

(Metric velocity) Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfy Assumptions (A1) and (A2). A curve $$(\rho _t)_{t\in [0,T]}\subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ belongs to $${{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ if and only if there exists a family $$({\varvec{j}}_t)_{t\in [0,T]}$$ such that $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$ and

\begin{aligned} {\int }_0^T\sqrt{{\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty . \end{aligned}

In this case, the metric derivative satisfies $$|\rho '|^{2}(t)\leqq {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)$$ for a.e. $$t\in [0,T]$$. In addition, there exists a unique family $$(\tilde{{\varvec{j}}}_t)_{t\in [0,T]}$$ such that $$(\rho ,\tilde{{\varvec{j}}})\in {{\,\mathrm{CE}\,}}_T$$ and

\begin{aligned} |\rho '|^2(t)={\mathcal {A}}(\mu ;\rho _t,\tilde{{\varvec{j}}}_t)\qquad \text {for a.e. } t\in [0,T]. \end{aligned}
(2.26)

Moreover, the previous identity holds if and only if $$\tilde{{\varvec{j}}}_t\in T_{\rho _t}{\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ for a.e. $$t\in [0,T]$$, where

\begin{aligned}&T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})=\{ {\mathbf {j}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G):{\mathcal {A}}(\mu ;\rho ,{\mathbf {j}}) <\infty ,\nonumber \\&\quad {\mathcal {A}}(\mu ;\rho ,{\mathbf {j}})\leqq {\mathcal {A}}(\mu ;\rho ,{\mathbf {j}}+{\mathbf {d}}) \text{ for } \text{ all } {\mathbf {d}}\in {\mathcal {M}}_{{{\,\mathrm {div}\,}}}(G)\}, \end{aligned}
(2.27)

with $${\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G)$$ defined in (2.9), and $${\mathcal {M}}_{{{\,\mathrm{div}\,}}}(G)$$ the set of nonlocal divergence-free fluxes, that is

\begin{aligned} {\mathcal {M}}_{{{\,\mathrm{div}\,}}}(G)=\left\{ {\varvec{d}}\in {\mathcal {M}}(G):{\iint }_G{\overline{\nabla }}\psi \, \eta \,\text {d}{\varvec{d}}=0\ \text {for all } \psi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\right\} . \end{aligned}

### Proof

The first statement on the characterization of absolutely continuous curves as curves of finite action follows from [22, Theorem 5.17], in view of Lemma 2.19 and Propositions 2.17 and 2.20. Let us now show that (2.26) holds if and only if $${\tilde{{\varvec{j}}}}_t$$ belongs to $$T_{\rho _t}{\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, given by (2.27), for a.e. $$t\in [0,T]$$. Let $$t\in [0,T]$$ be such that $${\varvec{j}}_t$$ satisfies $${\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) <+ \infty$$. Due to Corollary 2.8, the element $${\tilde{{\varvec{j}}}}_t$$ of minimal action satisfying (2.26) is characterized by $$\partial _t \rho _t + {\overline{\nabla }}\cdot {\varvec{j}}_t = 0 = \partial _t \rho _t +{\overline{\nabla }}\cdot {\tilde{{\varvec{j}}}}_t$$, that is,

\begin{aligned} {\tilde{{\varvec{j}}}}_t = \mathop {\mathrm{arg\,min}}\limits _{{\varvec{j}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G)}\bigl \{ {\mathcal {A}}(\mu ;\rho _t, {\varvec{j}}) : {\overline{\nabla }}\cdot {\varvec{j}}= {\overline{\nabla }}\cdot {\varvec{j}}_t\bigr \}. \end{aligned}

Recalling the notation for the Jordan decomposition of a measure from Section 2.2, we use here that the functional $${\varvec{j}}\mapsto {\mathcal {A}}(\mu ;\rho ,{\varvec{j}})$$ is strictly convex on the set of $${\varvec{j}}\in {\mathcal {M}}(G)$$ such that $${\varvec{j}}^+ \ll \rho \otimes \mu$$ and $${\varvec{j}}^- \ll \mu \otimes \rho$$, which is guaranteed above since $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) < \infty$$ and $${\varvec{j}}\in {\mathcal {M}}^{\mathrm {as}}_{\gamma _1}(G)$$. Next, we observe that the set $$\{{\varvec{j}}\in {\mathcal {M}}_{\gamma _1}^{\mathrm {as}}(G):{\overline{\nabla }}\cdot {\varvec{j}}={\overline{\nabla }}\cdot {\varvec{j}}_t\}$$ is closed with respect to narrow convergence. In addition, the estimate (2.10) from Lemma 2.10 with $$\Phi (x,y) = |x-y|\vee |x-y|^2$$ gives

\begin{aligned} \frac{1}{2}{\iint }_K \eta (x,y) \,\text {d}\left|{\varvec{j}} \right|(x,y)\leqq \frac{\sqrt{2C_\eta } \sqrt{{\mathcal {A}}(\mu ;\rho _t,{\varvec{j}})}}{\inf _K(|x-y|\vee |x-y|^2)} \quad \text {for all compact} \, K\subset G, \end{aligned}

showing that the sublevel sets of $${\varvec{j}}\mapsto {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}})$$ are locally relatively compact with respect to the narrow convergence, arguing as in the proof of Proposition 2.17. Hence the element $${\tilde{{\varvec{j}}}}_t$$ is well-defined by applying the direct method of calculus of variations. $$\square$$

We defined the tangent space $$T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ in (2.27) using the nonlocal fluxes $${\varvec{j}}$$. We note that this is in some way a nonlocal, Lagrangian description of the tangent vectors and that the relationship between this Lagrangian description and the Eulerian description is the nonlocal continuity equation

\begin{aligned} \partial _t \rho _t = - {\overline{\nabla }}\cdot {\varvec{j}}_t, \end{aligned}

which is satisfied in the weak sense. This provides a useful heuristic, but, as for classical Wasserstein gradient flows [2], the precise, rigorous definition of the tangent space is in Lagrangian form; we note, however, that here we use fluxes instead of velocities. This is not just a superficial difference: as can be seen in Proposition 2.26, the relation between velocities and fluxes is not linear, so the velocities do not provide a linear parametrization of the tangent space. We use the argument from [22, Theorem 5.21] to characterize the tangent space $$T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ in more detail.
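The minimality condition in (2.27) has a transparent finite-dimensional analogue. The following sketch is purely illustrative (a three-node graph with $$\eta \equiv 1$$, uniform base measure, and made-up fluxes, none of which come from the text): it scans over divergence-free circulations added to a given flux and selects the action-minimal representative, which leaves the nonlocal divergence unchanged.

```python
# Toy discrete analogue (illustrative, not from the text): triangle graph,
# eta = 1 on all ordered pairs, uniform base measure mu; r are the densities
# of rho with respect to mu, and fluxes are antisymmetric 3x3 matrices of
# densities with respect to mu (x) mu.
mu = [1 / 3, 1 / 3, 1 / 3]
r = [0.6, 0.9, 1.5]                 # sum(r[i] * mu[i]) == 1

def action(j):
    # upwind action: (1/2) sum_{x != y} j(x,y)^2 (1/r_x if j>0 else 1/r_y) mu_x mu_y
    return 0.5 * sum(
        j[x][y] ** 2 * (1 / r[x] if j[x][y] > 0 else 1 / r[y]) * mu[x] * mu[y]
        for x in range(3) for y in range(3) if x != y and j[x][y] != 0
    )

def divergence(j):
    # nonlocal divergence: div j(x) = sum_y eta(x,y) j(x,y) mu_y
    return [sum(j[x][y] * mu[y] for y in range(3) if y != x) for x in range(3)]

def plus_circulation(j, c):
    # a circulation of strength c around the triangle is divergence-free
    d = [[0, c, -c], [-c, 0, c], [c, -c, 0]]
    return [[j[x][y] + d[x][y] for y in range(3)] for x in range(3)]

j0 = [[0, 1.0, 0.0], [-1.0, 0, 0.5], [0.0, -0.5, 0]]   # some antisymmetric flux
grid = [k / 1000 - 1 for k in range(2001)]             # circulation strengths in [-1, 1]
c_star = min(grid, key=lambda c: action(plus_circulation(j0, c)))
j_star = plus_circulation(j0, c_star)

# j_star approximates the tangent representative of j0's divergence class:
print(max(abs(a - b) for a, b in zip(divergence(j_star), divergence(j0))))
print(action(j_star) < action(j0))
```

In this example the optimal circulation is nonzero, so the given flux is not tangent; rerouting part of the mass around the triangle strictly lowers the upwind action without changing the divergence.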

### Proposition 2.26

(Tangent fluxes have almost gradient velocities) Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfy Assumptions (A1) and (A2), and $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$. Then, it holds that $${\varvec{j}}\in T_{\rho }{\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ if and only if $${\varvec{j}}\in {\mathcal {M}}(G)$$ with $${\varvec{j}}^+\ll \gamma _1$$, $${\varvec{j}}^- \ll \gamma _2$$, and $$v^+:=\frac{\text {d}{\varvec{j}}^+}{\text {d}\gamma _1}$$, $$v^- :=\frac{\text {d}{\varvec{j}}^-}{\text {d}\gamma _2}$$ satisfy, for $$v:=v^+ - v^-:G\rightarrow {\mathbb {R}}$$, the relation

\begin{aligned} v \in \overline{\left\{ {\overline{\nabla }}\varphi : \varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\right\} }^{L^2(\eta \,{\widehat{\gamma }}^v)}, \qquad \text {where}\qquad \text {d}{\widehat{\gamma }}^v = \chi _{\{v>0\}} \text {d}\gamma _1 + \chi _{\{v<0\}} \text {d}\gamma _2.\nonumber \\ \end{aligned}
(2.28)

### Proof

If $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})<\infty$$, then by Lemma 2.6 it holds for some $$v\in {\mathcal {V}}^{\mathrm {as}}(G)$$ that

\begin{aligned} \text {d}{\varvec{j}}(x,y)&= v(x,y)_+ \text {d}\gamma _1(x,y) - v(x,y)_- \text {d}\gamma _1(y,x)\\&= v(x,y) \text {d}\gamma _+(x,y) - v(y,x) \text {d}\gamma _+(y,x) \ , \end{aligned}

where $$\gamma _+ = \gamma _1|_{J^+}$$, with $$J^+ = {{\,\mathrm{supp}\,}}{\varvec{j}}^+$$, and we used that $$(J^+)^\top = {{\,\mathrm{supp}\,}}{\varvec{j}}^-$$. Then, by recalling the definition of the norm on $$L^2(\eta \,\gamma _1)$$ from (2.1),

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\varvec{j}}) = 2 \Vert v_+ \Vert _{L^2(\eta \,\gamma _1)}^2 = 2 \Vert v \Vert _{L^2(\eta \,\gamma _+)}^2 . \end{aligned}

By using the relation between $${\varvec{j}}$$ and v from above, we can rewrite the divergence $${\overline{\nabla }}\cdot {\varvec{j}}$$ in weak form for any $$\psi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$:

\begin{aligned} \frac{1}{2}{\iint }_G {\overline{\nabla }}\psi \,\eta \,\text {d}{\varvec{j}}= {\iint }_G {\overline{\nabla }}\psi \, v_+ \, \eta \, \text {d}\gamma _1 = {\iint }_G {\overline{\nabla }}\psi \, v \, \eta \,\text {d}\gamma _+. \end{aligned}

Now, the characterization (2.27) of $${\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ is equivalent to

\begin{aligned}&{\iint }_G |v |^2 \eta \,\text {d}\gamma _+ \leqq {\iint }_G |v+w |^2 \eta \,\text {d}\gamma _+ \quad \text {for all} \, w\in {\mathcal {V}}^\mathrm {as}(G) \, \text {so that}\\&{\iint }_G {\overline{\nabla }}\psi \, w \, \eta \,\text {d}\gamma _+ = 0\;\; \forall \psi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}). \end{aligned}

Hence $$v^+$$ belongs to the closure of $$\{{\overline{\nabla }}\varphi : \varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}) \}$$ in $$L^2(\eta \,\gamma _+)$$. From the antisymmetry of v it follows that $$v^-$$ belongs to the closure of $$\{{\overline{\nabla }}\varphi : \varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}) \}$$ in $$L^2(\eta \,\gamma _-)$$. Thus, the conclusion follows from the identity $$\gamma _+ + \gamma _+^\top = {\hat{\gamma }}^v$$ on G. $$\square$$

### Remark 2.27

Proposition 2.26 shows that for $$\mu$$ as in its statement, $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ and $${\varvec{j}}$$ chosen from a dense subset of $$T_{\rho }{\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, there exists a measurable $$\varphi :{{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$ such that we have the identity

\begin{aligned} {\mathcal {A}}(\mu ;\rho ,{\mathbf {j}})={\mathcal {A}}(\mu ;\rho ,{\overline{\nabla }}\varphi \,\gamma _1)={\iint }_G \Bigl |\bigl ({\overline{\nabla }}\varphi \bigr )_+\Bigr |^2\eta \,\text{ d }\gamma _1. \end{aligned}
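The identity in Remark 2.27 can be verified directly on a finite graph. In the sketch below all data are illustrative (triangle graph, $$\eta \equiv 1$$, uniform $$\mu$$, made-up densities and potential): the upwind flux induced by a potential $$\varphi$$ is assembled and its action is compared with $${\iint }_G |({\overline{\nabla }}\varphi )_+|^2\eta \,\text {d}\gamma _1$$.

```python
# Illustrative discrete check of the identity in Remark 2.27 on a triangle
# graph with eta = 1 and uniform mu (all values are made up for the example).
mu = [1 / 3, 1 / 3, 1 / 3]
r = [0.5, 1.0, 1.5]                  # rho densities w.r.t. mu
phi = [0.0, 0.7, -0.4]               # a potential

def nabla(f, x, y):                  # nonlocal gradient
    return f[y] - f[x]

# upwind flux induced by phi: j = grad(phi) * (r_x on {grad > 0}, r_y on {grad < 0})
j = [[nabla(phi, x, y) * (r[x] if nabla(phi, x, y) > 0 else r[y])
      for y in range(3)] for x in range(3)]

action = 0.5 * sum(
    j[x][y] ** 2 * (1 / r[x] if j[x][y] > 0 else 1 / r[y]) * mu[x] * mu[y]
    for x in range(3) for y in range(3) if x != y and j[x][y] != 0
)
rhs = sum(max(nabla(phi, x, y), 0) ** 2 * r[x] * mu[x] * mu[y]
          for x in range(3) for y in range(3) if x != y)
print(abs(action - rhs) < 1e-12)
```

The two halves of the action (outflow weighted by $$\rho (x)$$, inflow weighted by $$\rho (y)$$) pair up by antisymmetry, which is why only the positive part $$({\overline{\nabla }}\varphi )_+$$ appears on the right-hand side.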

Finally, we provide an interesting property of absolutely continuous curves.

### Proposition 2.28

(Absolutely continuous curves stay supported on $$\mu$$) Let $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfy Assumptions (A1) and (A2) and $$\rho \in {{\,\mathrm{AC}\,}}([0,T],({\mathcal {P}}_2({\mathbb {R}}^d),{\mathcal {T}}_\mu ))$$ be such that $${{\,\mathrm{supp}\,}}\rho _0 \subseteq {{\,\mathrm{supp}\,}}\mu$$. Then, for all $$t\in [0,T]$$, it holds $${{\,\mathrm{supp}\,}}\rho _t\subseteq {{\,\mathrm{supp}\,}}\mu$$.

### Proof

Since $$(\rho _t)_{t\in [0,T]}$$ is absolutely continuous, there exists by Proposition 2.25 a unique family $$({\varvec{j}}_t)_{t\in [0,T]}$$ such that $$(\rho ,{\varvec{j}}) \in {{\,\mathrm{CE}\,}}_T$$ and $${\varvec{j}}_t \in T_{\rho _t}{\mathcal {P}}_2({\mathbb {R}}^d)\subseteq {\mathcal {M}}^{\mathrm {as}}_{\gamma _{1,t}}(G)$$, where $$\gamma _{1,t} = \rho _t \otimes \mu$$, and $$|\rho _t' |^2= {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)$$ for a.e. $$t\in [0,T]$$. In particular, by Lemma 2.6, there exists a measurable family $$(v_t)_{t\in [0,T]}\subset {\mathcal {V}}^{\mathrm {as}}(G)$$ such that

\begin{aligned} \text {d}{\varvec{j}}_t(x,y) = v_t(x,y)_+\text {d}\rho _t(x)\text {d}\mu (y) - v_t(x,y)_-\text {d}\mu (x)\text {d}\rho _t(y). \end{aligned}

Without loss of generality, let $$(\rho _t)_{t\in [0,T]}$$ be the weakly continuous curve from Lemma 2.15 satisfying, for any test function $$\varphi \in C_\mathrm {c}^\infty ({\mathbb {R}}^d)$$ and $$t\in [0,T]$$,

\begin{aligned} {\int }_{{\mathbb {R}}^d} \varphi (x)\, \text {d}\rho _t(x)&= {\int }_{{\mathbb {R}}^d} \varphi (x) \,\text {d}\rho _0(x) + \frac{1}{2}{\int }_0^t {\iint }_G {\overline{\nabla }}\varphi (x,y)\eta (x,y)\,\text {d}{\varvec{j}}_s(x,y)\,\text {d}{s} \\&= {\int }_{{\mathbb {R}}^d} \varphi (x) \,\text {d}\rho _0(x) \\&\quad + {\int }_0^t {\iint }_G {\overline{\nabla }}\varphi (x,y) v_s(x,y)_+ \eta (x,y)\,\text {d}(\rho _s\otimes \mu )(x,y)\,\text {d}{s} . \end{aligned}

Now, let $$\varphi \in C_\mathrm {c}^\infty ({\mathbb {R}}^d)$$ with $$\varphi \geqq 0$$ and $${{\,\mathrm{supp}\,}}\varphi \subseteq {\mathbb {R}}^d {\setminus } {{\,\mathrm{supp}\,}}\mu$$. Then, for all $$t\in [0,T]$$, it holds

\begin{aligned} {\int }_{{\mathbb {R}}^d} \varphi (x)\,\text {d}\rho _t(x) = - {\int }_0^t {\iint }_G \varphi (x) v_s(x,y)_+ \eta (x,y)\,\text {d}(\rho _s\otimes \mu )(x,y)\,\text {d}{s} \leqq 0 , \end{aligned}

which implies that $${{\,\mathrm{supp}\,}}\rho _t \subseteq {{\,\mathrm{supp}\,}}\mu$$, since $$\rho _t \in {\mathcal {P}}({\mathbb {R}}^d)$$ is in particular a non-negative measure for all $$t\in [0,T]$$ by Lemma 2.15. $$\square$$
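Proposition 2.28 can be illustrated by an explicit discrete dynamics. The sketch below (an explicit Euler discretization with made-up data and $$\eta \equiv 1$$; not part of the proof) runs an upwind flow on three nodes, one of which carries no base measure: the inflow into that node is proportional to its $$\mu$$-mass and hence vanishes, so mass never leaves $${{\,\mathrm{supp}\,}}\mu$$, while the total mass is conserved.

```python
# Illustrative upwind dynamics on three nodes (values made up): node 2 lies
# outside supp(mu). m[x] are the masses of rho_t, v is an antisymmetric
# velocity; mass moves from x to y at rate v(x,y)_+ * m[x] * mu[y] (eta = 1).
mu = [0.5, 0.5, 0.0]                   # node 2 has no base measure
m = [0.3, 0.7, 0.0]                    # supp(rho_0) inside supp(mu)
phi = [0.0, 1.0, -2.0]
v = [[phi[x] - phi[y] for y in range(3)] for x in range(3)]  # v = -(nonlocal grad phi)

dt, steps = 1e-3, 5000
for _ in range(steps):
    flow = [[max(v[x][y], 0) * m[x] * mu[y] for y in range(3)] for x in range(3)]
    for x in range(3):
        m[x] += dt * sum(flow[y][x] - flow[x][y] for y in range(3))

print(m[2])              # stays exactly 0: inflow to node 2 carries the factor mu[2] = 0
print(abs(sum(m) - 1))   # total mass conserved up to rounding
```

Note that the potential tries to push mass toward node 2 (where $$\varphi$$ is smallest), yet none arrives there, precisely because the upwind flux weights targets by $$\mu$$.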

## 3 Nonlocal Nonlocal-Interaction Equation

In this section we consider gradient flows in the spaces of probability measures $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ endowed with the nonlocal transportation quasi-metric $${\mathcal {T}}_\mu$$, defined by (2.22). From now until Section 3.4 (excluded) we fix $$\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ satisfying (A1) and (A2), unless otherwise specified. For this reason we shall use the simplifications $${\mathcal {A}}(\rho ,{\varvec{j}})$$ for $${\mathcal {A}}(\mu ;\rho ,{\varvec{j}})$$ and $${\mathcal {T}}$$ for $${\mathcal {T}}_\mu$$.

In this section we investigate the nonlocal nonlocal-interaction equation ($${\text {NL}}^2 {\text {IE}}$$) as a gradient flow with respect to the quasi-metric $${\mathcal {T}}$$. We restate it in a one-line form and note that from now on we consider the external potential $$P \equiv 0$$; the extension to $$P \not \equiv 0$$ is straightforward, see Remark 3.2. Thus, for $$\rho \ll \mu$$, written in terms of the density, ($${\text {NL}}^2 {\text {IE}}$$) reads

\begin{aligned} \partial _t\rho _t(x)+{\int }_{{{\mathbb {R}}^{d}}}\left( {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_-\,\rho _t(x)-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_+\,\rho _t(y)\right) \eta (x,y)\,\text {d}\mu (y)=0.\end{aligned}

In the classical setting of gradient flows in the spaces of probability measures endowed with the Wasserstein metric [2, 10], the nonlocal-interaction equation

\begin{aligned} \partial _t\rho _t+ \nabla \cdot ( \rho _t \nabla (K * \rho _t)) = 0 \end{aligned}
(3.1)

is the gradient flow of the nonlocal-interaction energy

\begin{aligned} {\mathcal {E}}(\rho )= \frac{1}{2}{\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(x,y)\,\text {d}\rho (x)\,\text {d}\rho (y). \end{aligned}
(3.2)

We start by discussing the geometry of ($${\text {NL}}^2 {\text {IE}}$$) and interpret it as the gradient flow of (3.2) in the infinite-dimensional Finsler manifold of measures endowed with the Finsler metric associated to $${\mathcal {T}}$$. Following this, we develop a framework of gradient flows in the quasi-metric space $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})$$, which extends the setup of gradient flows in metric spaces [2] to quasi-metric spaces. In particular, we build the existence theory for ($${\text {NL}}^2 {\text {IE}}$$) based on this approach.

Above, for simplicity, ($${\text {NL}}^2 {\text {IE}}$$) was written for $$\rho \ll \mu$$, where we recall that we used the notation $$\rho$$ to denote both the measure and the density with respect to $$\mu$$. Our framework, however, also applies to the case when $$\rho$$ is not absolutely continuous with respect to $$\mu$$. The general weak form of ($${\text {NL}}^2 {\text {IE}}$$) is obtained in terms of the nonlocal continuity equation as introduced in Section 2.3. Specifically, we have

### Definition 3.1

A curve $$\rho :[0,T]\rightarrow {\mathcal {P}}_2({\mathbb {R}}^d)$$ is called a weak solution to ($${\text {NL}}^2 {\text {IE}}$$) if, for the flux $${\varvec{j}}:[0,T]\rightarrow {\mathcal {M}}(G)$$ defined by

\begin{aligned} \text {d}{\varvec{j}}_t(x,y)={\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_- \text {d}\rho _t(x)\text {d}\mu (y)-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_+ \text {d}\rho _t(y)\text {d}\mu (x), \end{aligned}

the pair $$(\rho ,{\varvec{j}})$$ is a weak solution to the continuity equation

\begin{aligned} \partial _t\rho _t+{\overline{\nabla }}\cdot {\varvec{j}}_t=0 \qquad \text {on}\ [0,T]\times {{\mathbb {R}}^{d}}, \end{aligned}

according to Definition 2.14.

Here we list the assumptions on the interaction kernel $$K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$ we refer to throughout this section:

• (K1) $$K\in C({{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}})$$;

• (K2) K is symmetric, i.e., $$K(x,y)=K(y,x)$$ for all $$(x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}$$;

• (K3) K is L-Lipschitz near the diagonal and at most quadratic far away; that is, there exists some $$L\in (0,\infty )$$ such that, for all $$(x,y),(x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}$$,

\begin{aligned} |K(x,y)-K(x',y')|\leqq L\left( |(x,y)-(x',y')|\vee |(x,y)-(x',y')|^2\right) . \end{aligned}

### Remark 3.2

Assumption (K3) implies that, for some $$C >0$$ and all $$x,y\in {{\mathbb {R}}^{d}}$$,

\begin{aligned} |K(x,y) | \leqq C \left( 1+ |x |^2 + |y |^2\right) ; \end{aligned}
(3.3)

indeed, for fixed $$(x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}$$, (K3) yields

\begin{aligned} |K(x,y) | - |K(x',y') | \leqq L \left( 1 \vee 2\left( |(x,y)|^2 + |(x',y')|^2\right) \right) , \end{aligned}

and bounding the maximum ($$\vee$$) by the sum, we arrive at $$|K(x,y) | \leqq L +2 L \left( |(x',y')|^2 + |(x,y)|^2\right) + |K(x',y') |$$, which gives (3.3) with $$C=2L\bigl (1+|(x',y')|^2\bigr ) + |K(x',y') |$$. We note in passing that the bound (3.3) implies that $${\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}$$ is proper, with domain equal to $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$.
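Both (K3) and the resulting growth bound (3.3) can be spot-checked numerically for a concrete kernel. The sketch below uses the illustrative choice $$K(x,y)=\mathrm {e}^{-|x-y|^2}$$ in $$d=1$$ (symmetric, continuous, and globally Lipschitz in the pair, so (K1)–(K3) hold with, for instance, $$L=3/2$$) and samples random points.

```python
import math, random

random.seed(0)
L = 1.5  # a valid Lipschitz constant for this particular kernel (assumed for the check)

def K(x, y):
    # an illustrative kernel satisfying (K1)-(K3) in d = 1
    return math.exp(-(x - y) ** 2)

ok_k3, ok_growth = True, True
for _ in range(100_000):
    x, y, x2, y2 = (random.uniform(-10, 10) for _ in range(4))
    d = math.hypot(x - x2, y - y2)
    ok_k3 &= abs(K(x, y) - K(x2, y2)) <= L * max(d, d * d) + 1e-12
    # growth bound (3.3) with C = 2L(1 + |(x',y')|^2) + |K(x',y')| at (x',y') = (0,0)
    C = 2 * L * (1 + 0) + abs(K(0, 0))
    ok_growth &= abs(K(x, y)) <= C * (1 + x * x + y * y)
print(ok_k3, ok_growth)
```

A random check of course proves nothing; it merely illustrates the shape of the two inequalities for one admissible kernel.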

As mentioned previously, the theory in this section can be easily extended to energies of the form (1.5) including potential energies $${\mathcal {E}}_P(\rho )={\int }_{{\mathbb {R}}^{d}}P \,\text {d}{\rho }$$ for some external potential $$P:{\mathbb {R}}^d \rightarrow {\mathbb {R}}$$ satisfying a local Lipschitz condition with at-most-quadratic growth at infinity; that is, similarly to (K3), there exists $$L\in (0,\infty )$$ so that for all $$x,y\in {\mathbb {R}}^d$$ we have

\begin{aligned} |P(x)-P(y) | \leqq L \left( |x-y|\vee |x-y|^2\right) . \end{aligned}

We now show that, under the above assumptions on the interaction potential K, we have narrow continuity of the energy.

### Proposition 3.3

(Continuity of the energy) Let the interaction potential K satisfy Assumptions (K1)–(K3). Then, for any sequence $$(\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ such that $$\rho ^n \rightharpoonup \rho$$ as $$n\rightarrow \infty$$ for some $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, we have

\begin{aligned} \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho ^n) = {\mathcal {E}}(\rho ). \end{aligned}

### Proof

Let $$(\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ and $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ be such that $$\rho ^n \rightharpoonup \rho$$ as $$n\rightarrow \infty$$. For all $$R>0$$, we write $${{\overline{B}}}_R$$ for the closed ball of radius R centered at the origin in $$({{\mathbb {R}}^{d}})^2$$, and we let $$\varphi _R :({{\mathbb {R}}^{d}})^2 \rightarrow [0,1]$$ be a continuous function such that $$\varphi _R \equiv 1$$ on $${{\overline{B}}}_R$$ and $$\varphi _R \equiv 0$$ on $$({{\mathbb {R}}^{d}})^2 {\setminus } {{\overline{B}}}_{2R}$$. For all $$R>0$$, we then set $$K_R = \varphi _R K$$ and

\begin{aligned} {\mathcal {E}}_R(\nu ) = \frac{1}{2}{\iint }_{{{\mathbb {R}}^{d}}\times {\mathbb {R}}^d} K_R(x,y)\,\text {d}\nu (y)\,\text {d}\nu (x) \quad \text{ for } \text{ all } \, \nu \in {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}

Since $$(\rho ^n)_n$$ converges narrowly to $$\rho$$ as $$n\rightarrow \infty$$ and $$K_R$$ is bounded and continuous, we get

\begin{aligned} {\mathcal {E}}_R(\rho ^n) \rightarrow {\mathcal {E}}_R(\rho ) \quad \text{ as } \, n\rightarrow \infty . \end{aligned}

Furthermore, since $$K_R \rightarrow K$$ pointwise as $$R\rightarrow \infty$$ with $$|K_R| \leqq |K|$$ for all $$R>0$$, and since the domain of $${\mathcal {E}}$$ is all of $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ and $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, we also have

\begin{aligned} {\mathcal {E}}_R(\rho ) \rightarrow {\mathcal {E}}(\rho ) \quad \text {as} \, R\rightarrow \infty \end{aligned}

by the Lebesgue dominated convergence theorem. Similarly, we also have

\begin{aligned} {\mathcal {E}}_R(\rho ^n) \rightarrow {\mathcal {E}}(\rho ^n) \quad \text{ as } \, R\rightarrow \infty \, \text {for all}\, n\in {\mathbb {N}}. \end{aligned}

By a diagonal argument, we deduce the result. $$\square$$
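The convergence in Proposition 3.3 can be observed numerically with empirical measures. The sketch below is an illustration, not part of the proof: $$\rho ^n$$ is the empirical measure of n standard Gaussian samples (narrowly convergent to $$\rho ={\mathcal {N}}(0,1)$$ almost surely), the kernel is $$K(x,y)=\mathrm {e}^{-(x-y)^2}$$, which satisfies (K1)–(K3), and one has the closed form $${\mathcal {E}}(\rho )=1/(2\sqrt{5})$$, since $$X-Y\sim {\mathcal {N}}(0,2)$$ and $$\mathbb {E}\,\mathrm {e}^{-Z^2}=(1+2\sigma ^2)^{-1/2}$$ for $$Z\sim {\mathcal {N}}(0,\sigma ^2)$$.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(samples):
    # E(rho^n) = (1/2) * n^{-2} * sum_{i,j} K(x_i, x_j) for the empirical measure
    diff = samples[:, None] - samples[None, :]
    return 0.5 * np.exp(-diff ** 2).mean()

exact = 1 / (2 * np.sqrt(5))               # E(rho) for rho = N(0, 1)
errs = [abs(energy(rng.standard_normal(n)) - exact) for n in (100, 500, 2000)]
print(errs[-1] < 0.02)                     # E(rho^n) -> E(rho)
```

The diagonal terms $$i=j$$ are part of $${\mathcal {E}}(\rho ^n)$$ itself (the empirical product measure charges the diagonal); they contribute an $$O(1/n)$$ amount that vanishes in the limit.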

### 3.1 Identification of the Gradient in Finsler Geometry

Since the nonlocal upwind transportation cost $${\mathcal {T}}$$ is only a quasi-metric, the underlying structure of $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ is not the formal Riemannian structure of the classical gradient flow theory, but a Finslerian one instead. In particular, at every point $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ the tangent space $$T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ does not carry an inner product, but rather a Minkowski norm.

In this section we provide calculations, in the spirit of Otto’s calculus, that characterize the gradient descent in the infinite-dimensional Finsler manifold of probability measures endowed with the nonlocal transportation quasi-metric $${\mathcal {T}}$$. To keep the following considerations simple, we assume that $$\rho$$ is a given probability measure which is absolutely continuous with respect to $$\mu$$. In this way, we avoid the need to introduce yet another measure $$\lambda \in {\mathcal {M}}^+(G)$$ with respect to which all of the occurring measures are absolutely continuous, similar to how we proceeded in Definition 2.3 for the action. This restriction is done solely to make the presentation clearer and highlight the geometric structure. Hence any flux $${\varvec{j}}$$ of interest is absolutely continuous with respect to $$\mu \otimes \mu$$ and we can think of $${\varvec{j}}$$ via its density with respect to $$\mu \otimes \mu$$, which we shall denote by j (using a letter which is not bold).

At every tangent flux $${\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ we define an inner product $$g_{\rho ,{\varvec{j}}}:T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \times T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow {\mathbb {R}}$$ by

\begin{aligned} g_{\rho ,{\varvec{j}}}({\varvec{j}}_1,{\varvec{j}}_2)&=\frac{1}{2}{\iint }_G j_1(x,y)\,j_2(x,y)\, \eta (x,y) \nonumber \\&\quad \times \left( \frac{\chi _{\{j>0\}}(x,y)}{\rho (x)} + \frac{\chi _{\{j<0\}}(x,y)}{\rho (y)} \right) \text {d}\mu (x) \,\text {d}\mu (y), \end{aligned}
(3.4)

where $$\{j>0\}$$ is an abbreviation for $$\{(x,y) \in G :j(x,y)>0\}$$ and similarly for $$\{j<0\}$$. The ratios are well-defined since $$\rho$$ cannot be zero where j is not zero. We note that this is the bilinear form that corresponds to the quadratic form defining the action (see Definition 2.3 and Remark 2.5); namely,

\begin{aligned} g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) = {\mathcal {A}}(\mu ; \rho , {\varvec{j}}). \end{aligned}

We refer the reader to “Appendix A” for a derivation of this inner product from a Minkowski norm on $$T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ as it is required in Finsler geometry. We recall that from Proposition 2.26 a dense subset of tangent-fluxes $${\varvec{j}}$$ are characterized by the existence of a potential $$\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$ such that, for $$\mu \otimes \mu$$-a.e. $$(x,y) \in G$$,

\begin{aligned} j(x,y) = {\overline{\nabla }}\varphi (x,y) \left( \rho (x) \chi _{\{{\overline{\nabla }}\varphi >0\}}(x,y) + \rho (y) \chi _{\{{\overline{\nabla }}\varphi <0\}}(x,y) \right) . \end{aligned}
(3.5)

In this Finsler setting, we now want to determine the direction of steepest descent from $$\rho$$, for the underlying energy defined in (3.2). The gradient vector of some energy $${\mathcal {E}}:{\mathcal {P}}({\mathbb {R}}^d)\rightarrow {\mathbb {R}}$$ at $$\rho$$, which we denote by $${{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )$$, is defined as the tangent vector which satisfies

\begin{aligned} {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] = g_{\rho ,{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )}\bigl ({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho ), {\varvec{j}}\bigr ) \qquad \text {for all} \, {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}), \end{aligned}

provided this vector exists and is unique. Here, we use the continuity equation from Definition 2.14 to define variations via

\begin{aligned} {{\,\mathrm{Diff}\,}}_{\rho }{\mathcal {E}}[{\varvec{j}}] = \left. \frac{\text {d}}{\text {d}t}\right| _{t=0} {\mathcal {E}}({\tilde{\rho }}_t), \end{aligned}

where $${\tilde{\rho }}$$ is any curve such that $${\tilde{\rho }}_0= \rho$$ and $$\left. \frac{\text {d}}{\text {d}t}\right| _{t=0}{\tilde{\rho }}_t = - {\overline{\nabla }}\cdot {\varvec{j}}$$. From Definition 2.7, due to $$\mu \otimes \mu$$-absolute continuity of $${\varvec{j}}$$ we have that

\begin{aligned} -{\overline{\nabla }}\cdot {\varvec{j}}(x) = -{\int } \eta (x,y) j(x,y)\, \text {d}{\mu }(y) \qquad \text {for} \, \mu \text {-a.e.}\, x \in {{\mathbb {R}}^{d}}. \end{aligned}
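This pointwise formula is consistent with the weak form of the divergence used throughout: for antisymmetric $${\varvec{j}}$$ one has $$\tfrac{1}{2}{\iint }_G {\overline{\nabla }}\psi \,\eta \,\text {d}{\varvec{j}}=-{\int }\psi \,{\overline{\nabla }}\cdot {\varvec{j}}\,\text {d}\mu$$. The sketch below checks this discrete integration by parts on an illustrative three-node graph with $$\eta \equiv 1$$ (all numbers made up).

```python
# Illustrative check of nonlocal integration by parts on a triangle graph with
# eta = 1: (1/2) sum_{x,y} (psi_y - psi_x) j(x,y) mu_x mu_y
#        = - sum_x psi_x * div_j(x) * mu_x, with div_j(x) = sum_y j(x,y) mu_y.
mu = [0.2, 0.5, 0.3]
psi = [1.0, -0.5, 2.0]
j = [[0, 0.8, -0.3], [-0.8, 0, 1.1], [0.3, -1.1, 0]]   # antisymmetric flux density

lhs = 0.5 * sum((psi[y] - psi[x]) * j[x][y] * mu[x] * mu[y]
                for x in range(3) for y in range(3))
div = [sum(j[x][y] * mu[y] for y in range(3)) for x in range(3)]
rhs = -sum(psi[x] * div[x] * mu[x] for x in range(3))
print(abs(lhs - rhs) < 1e-12)

# in particular, taking psi constant shows that total mass is conserved:
print(abs(sum(d * w for d, w in zip(div, mu))) < 1e-12)
```

The factor $$\tfrac{1}{2}$$ compensates for counting each unordered pair twice, exactly as in the weak formulation of the nonlocal continuity equation above.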

When $${\mathcal {M}}$$ is a finite-dimensional Finsler manifold, such a gradient vector exists and is unique, since the mapping $$\ell :T_\rho {\mathcal {M}}\rightarrow (T_{\rho }{\mathcal {M}})^*,\, {\varvec{j}}\mapsto g_{\rho ,{\varvec{j}}}({\varvec{j}},\cdot )$$, is a bijection; see [18, Proposition 1.9]. For further details on Finsler geometry, we refer the reader to [4, 49]. In our case, we can at least claim that the functional $$\ell _\rho :T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow (T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}))^*$$, given for $${\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ by

\begin{aligned} {\varvec{j}}_2\mapsto \ell _\rho ({\varvec{j}})({\varvec{j}}_2)&=g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}_2) \nonumber \\&=\frac{1}{2}{\iint }_G j_2(x,y) \, \eta (x,y) \left( \frac{j(x,y)_+}{\rho (x)}-\frac{j(x,y)_-}{\rho (y)} \right) \text {d}\mu (x)\, \text {d}\mu (y) , \end{aligned}
(3.6)

is injective $$\eta \, \mu \otimes \mu$$-a.e.; that is, the existence of a gradient implies its uniqueness ($$\eta \, \mu \otimes \mu$$-a.e.), in which case we have

\begin{aligned} \ell _\rho ({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )) = {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}. \end{aligned}

To see the injectivity of (3.6), we first note that $$\ell _\rho$$ is positively 1-homogeneous by definition. Moreover, we have the following one-sided version of a Cauchy–Schwarz-type estimate

\begin{aligned} \ell _\rho ({\mathbf {j}})({\mathbf {j}}_2)&\leqq \frac{1}{2}{\iint }_G \frac{ j_2(x,y)_+ j(x,y)_+}{\rho (x)} \eta (x,y) \,\text{ d }\mu (x)\, \text{ d }\mu (y) \nonumber \\&\quad + \frac{1}{2}{\iint }_G \frac{ j_2(x,y)_- j(x,y)_-}{\rho (y)} \eta (x,y) \,\text{ d }\mu (x)\, \text{ d }\mu (y) \nonumber \\&\leqq \sqrt{\ell _\rho ({\mathbf {j}})({\mathbf {j}}) \, \ell _\rho ({\mathbf {j}}_2)({\mathbf {j}}_2)}. \end{aligned}
(3.7)

Here, we also used that $$\sqrt{ab}+\sqrt{cd}\leqq \sqrt{(a+c)(b+d)}$$ for all $$a,b,c,d>0$$. Note that the above inequalities become strict if either of the integrands $$j_2(x,y)_+ j(x,y)_-$$ or $$j_2(x,y)_- j(x,y)_+$$ gives a nonzero contribution. In particular, we could have $$\ell _\rho ({\varvec{j}})({\varvec{j}}_2)=-\infty$$ although the right-hand side is finite. Despite this, we still have equality in (3.7) if and only if $${\varvec{j}}_2 = \beta {\varvec{j}}$$ $$\eta \, \mu \otimes \mu$$-a.e. for some $$\beta \geqq 0$$.
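The one-sided estimate (3.7) and its equality case can be tested in a discrete toy setting (illustrative data throughout: triangle graph, $$\eta \equiv 1$$, uniform $$\mu$$, made-up densities and fluxes). The sketch below evaluates $$\ell _\rho ({\varvec{j}})({\varvec{j}}_2)$$ for an unrelated flux, where the inequality holds, and for $${\varvec{j}}_2=\beta {\varvec{j}}$$, where it is an equality.

```python
import math

# Illustrative discrete check of the one-sided Cauchy-Schwarz estimate (3.7):
# triangle graph, eta = 1, uniform mu, r the densities of rho w.r.t. mu.
mu = [1 / 3, 1 / 3, 1 / 3]
r = [0.6, 0.9, 1.5]

def ell(j, j2):
    # l_rho(j)(j2) = (1/2) sum j2 * (j_+ / r_x - j_- / r_y) * mu_x * mu_y
    return 0.5 * sum(
        j2[x][y] * (max(j[x][y], 0) / r[x] - max(-j[x][y], 0) / r[y]) * mu[x] * mu[y]
        for x in range(3) for y in range(3) if x != y
    )

ja = [[0, 1.0, -0.4], [-1.0, 0, 0.5], [0.4, -0.5, 0]]
jb = [[0, -0.7, 0.2], [0.7, 0, 0.9], [-0.2, -0.9, 0]]

lhs = ell(ja, jb)
rhs = math.sqrt(ell(ja, ja) * ell(jb, jb))
print(lhs <= rhs + 1e-12)                       # the one-sided estimate

beta = 2.5                                      # equality case j2 = beta * ja
scaled = [[beta * v for v in row] for row in ja]
print(abs(ell(ja, scaled) - math.sqrt(ell(ja, ja) * ell(scaled, scaled))) < 1e-12)
```

Note that $$\ell _\rho ({\varvec{j}})({\varvec{j}})$$ coincides with the action, so the equality case reduces to positive 1-homogeneity.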

To prove the injectivity of $$\ell _\rho$$, let us suppose that $${\varvec{j}}_1, {\varvec{j}}_2 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ are such that $$\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)$$. If $${\varvec{j}}_1 = 0$$ or $${\varvec{j}}_2 = 0$$ $$\eta \, \mu \otimes \mu$$-a.e., then $$\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)$$ implies that $${\varvec{j}}_1 = {\varvec{j}}_2 = 0$$. If both $${\varvec{j}}_1$$ and $${\varvec{j}}_2$$ are nonzero, then by the above Cauchy–Schwarz inequality we get

\begin{aligned} 0< & {} g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_1)({\varvec{j}}_2) = g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_2) \\\leqq & {} \sqrt{ g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) }, \end{aligned}

which, after dividing by $$\sqrt{g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)}$$, yields $$g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) \leqq g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)$$. Similarly, one gets $$g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1) \leqq g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)$$, so that

\begin{aligned} g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1) = g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2). \end{aligned}

Hence

\begin{aligned} g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_2)= & {} \ell _\rho ({\varvec{j}}_1)({\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) = g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) \\= & {} \sqrt{g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)}, \end{aligned}

which is the equality case in the Cauchy–Schwarz inequality. Therefore, there exists $$\beta \geqq 0$$ such that $${\varvec{j}}_2 = \beta {\varvec{j}}_1$$. By positive 1-homogeneity of $$\ell _\rho$$ we get $$\ell _\rho ({\varvec{j}}_2) = \ell _\rho (\beta {\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_2)$$, so that $$\beta = 1$$, since $$\ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) \ne 0$$. This proves the claimed injectivity of $$\ell _\rho$$.

The direction of the steepest descent on Finsler manifolds is in general not $$-{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )$$, but is defined to be the tangent flux, which we denote by $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$, such that

\begin{aligned} -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] = g_{\rho ,{{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\,}({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ), {\varvec{j}}) \qquad \text {for all} \, {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}

In other words, we define $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$ as the tangent vector (provided it exists) such that

\begin{aligned} \ell _\rho ({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )) = -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}. \end{aligned}
(3.8)

Here we clearly see that in general $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ) \ne -{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )$$ since $$\ell _\rho$$ is not negatively 1-homogeneous. We can justify that $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$ indeed corresponds to the direction of steepest descent at $$\rho$$ via the following criterion, which is analogous to the Riemannian case. We first note that if $${{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}= 0$$ then $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )=0$$. If $${{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\ne 0$$ we note that minimizers $${\varvec{j}}^*$$ of

\begin{aligned} {\varvec{j}}\mapsto {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}], \qquad \text {with the constraint that} \, g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) = 1, \end{aligned}

are of the form $${\varvec{j}}^* = \beta {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$ for some $$\beta >0$$. Indeed, using the fact that $$\left. \frac{\text {d}}{\text {d}s}\right| _{s=0}g_{\rho ,{\varvec{j}}+ s{\varvec{j}}_1}({\varvec{j}}+s{\varvec{j}}_1,{\varvec{j}}+ s{\varvec{j}}_1) = 2g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}_1)$$ for all $${\varvec{j}},{\varvec{j}}_1\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ [as shown in (A.1) of “Appendix A”] and using the Lagrange multiplier $$\beta$$ and the functional

\begin{aligned} H(\beta ,{\varvec{j}}) := {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] + \tfrac{\beta }{2} (g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) -1), \qquad {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}),\quad \beta \in {\mathbb {R}}, \end{aligned}

yields, for a constrained minimizer $${\varvec{j}}^*$$, the condition

\begin{aligned} {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}= - \beta ^* g_{\rho ,{\varvec{j}}^*}({\varvec{j}}^*,\cdot ) = - \beta ^* \ell _\rho ({\varvec{j}}^*). \end{aligned}
(3.9)

By the definition of $${\varvec{j}}^*$$ we have $$0> {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}^*] = - \beta ^* g_{\rho ,{\varvec{j}}^*}({\varvec{j}}^*,{\varvec{j}}^*)$$, which implies that $$\beta ^*>0$$. By injectivity and positive 1-homogeneity of $$\ell _\rho$$, we get

\begin{aligned} {\varvec{j}}^* = \ell _\rho ^{-1}\left( -\frac{1}{\beta ^*}{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\right) = \frac{1}{\beta ^*} \ell _\rho ^{-1}(-{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}) = \frac{1}{\beta ^*} {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ). \end{aligned}

The gradient flows with respect to $${\mathcal {E}}$$ in the Finsler space $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})$$ can thus be written

\begin{aligned} \partial _t \rho _t = {\overline{\nabla }}\cdot {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ). \end{aligned}
(3.10)

These considerations stay valid for general energy functionals $${\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}$$.

Let us compute the gradient flux for the specific case of the interaction energy (3.2). A direct computation using the symmetry of K and Definition 2.7 gives, for all $${\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$,

\begin{aligned}&-{{\,\mathrm {Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}]\\ {}&\quad = \frac{1}{2}{\iint }_G \bigl (-{\overline{\nabla }}(K*\rho )\bigr )(x,y) \, \eta (x,y) \, j(x,y) \,\text{ d }\mu (x)\,\text{ d }\mu (y) \\ {}&\quad = \frac{1}{2}{\iint }_G j(x,y) \, \eta (x,y) \\ {}&\qquad \times \left( \frac{\rho (x) \bigl (-{\overline{\nabla }}(K*\rho )\bigr )_+(x,y)}{\rho (x)} - \frac{\rho (y) \bigl (-{\overline{\nabla }}(K*\rho )\bigr )_-(x,y)}{\rho (y)} \right) \,\text{ d }\mu (x)\,\text{ d }\mu (y) \\ {}&\quad =\frac{1}{2}{\iint }_G j(x,y) \, \eta (x,y) \bigl (-{\overline{\nabla }}(K*\rho )(x,y)\bigr ) \\ {}&\qquad \times \left( \frac{\rho (x) \chi _{\{-{\overline{\nabla }}K*\rho >0\}} (x,y)}{\rho (x)} + \frac{\rho (y) \chi _{\{-{\overline{\nabla }}K*\rho <0\}}(x,y)}{\rho (y)} \right) \,\text{ d }\mu (x)\,\text{ d }\mu (y) \\ {}&\quad = \ell _{\rho }\bigl ({{\,\mathrm {grad}\,}}^- {\mathcal {E}}(\rho )\bigr )({\varvec{j}}) , \end{aligned}

where by comparison with (3.6), we observe that $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$ is given for $$\mu \otimes \mu$$-a.e. $$(x,y) \in G$$ by

\begin{aligned}&{{\,\mathrm {grad}\,}}^- {\mathcal {E}}(\rho )(x,y) \nonumber \\&= {} -{\overline{\nabla }}(K*\rho )(x,y)\left( \rho (x)\chi _{\{-{\overline{\nabla }}K*\rho >0\}}(x,y) + \rho (y)\chi _{\{-{\overline{\nabla }}K*\rho <0\}}(x,y) \right) .\nonumber \\ \end{aligned}
(3.11)

By (3.8), this shows the existence of $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$ and, by our previous injectivity argument, also its uniqueness. It is easily observed that it has exactly the form (3.5) with the corresponding potential given by $$\varphi = -K*\rho$$.
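On a finite state space the upwind formula (3.11) can be evaluated directly. The following sketch is our own illustration, not part of the paper: it takes $$\mu$$ to be the counting measure on a handful of nodes, uses the nonlocal gradient $${\overline{\nabla }}f(x,y)=f(y)-f(x)$$, and assembles the flux $${{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )$$; all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
X = rng.standard_normal((n, 2))                       # node positions in R^2
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # pairwise distances
eta = np.exp(-D**2)                                   # edge weights eta(x, y)
np.fill_diagonal(eta, 0.0)                            # no self-loops
rho = rng.random(n); rho /= rho.sum()                 # probability configuration
K = D**2 / 2                                          # symmetric interaction kernel

Krho = K @ rho                                        # (K * rho)(x), mu = counting measure
v = -(Krho[None, :] - Krho[:, None])                  # v = -grad(K*rho), antisymmetric
# upwind flux (3.11): take density rho(x) where v > 0 and rho(y) where v < 0
grad_minus = v * np.where(v > 0, rho[:, None], rho[None, :])
```

The assertions below reflect two structural facts: the flux is antisymmetric, and it pairs non-negatively with $$-{\overline{\nabla }}(K*\rho )$$ edge by edge, so it is a descent direction.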

We conclude this section by mentioning that the Finsler gradient flow structure of differential equations has been discovered and investigated in other systems; see [1, 41, 42].

### 4.2 Variational Characterization for the Nonlocal Nonlocal-Interaction Equation

Section 3.1 shows that the nonlocal nonlocal-interaction equation ($${\text {NL}}^2 {\text {IE}}$$) can in fact be written as the gradient descent of the energy $${\mathcal {E}}$$ according to the Finsler gradient operator; see (3.10) and (3.11). This is why we refer to weak solutions of ($${\text {NL}}^2 {\text {IE}}$$) as gradient flows.

In this section we consider $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})$$ as a quasi-metric space rather than a Finsler manifold, which allows us to prove rigorous statements more easily. In particular, we show that the weak solutions of ($${\text {NL}}^2 {\text {IE}}$$) are curves of maximal slope for the energy (3.2) in the quasi-metric space $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})$$ and vice versa. We then establish the existence and stability of gradient flows using the variational framework of curves of maximal slope. To develop the variational formulation, we adapt the approach of [2] for curves of maximal slope in metric spaces to the quasi-metric space $$({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})$$. This requires introducing a one-sided version of the usual concepts from [2] to cope with the asymmetry of the quasi-metric $${\mathcal {T}}$$.

### Definition 3.4

(One-sided strong upper gradient) A function $$h:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow [0,\infty ]$$ is a one-sided strong upper gradient for $${\mathcal {E}}$$ if for every $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$ the function $$h\circ \rho$$ is Borel and

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) \geqq - {\int }_s^th(\rho _\tau )|\rho _\tau '|\,\text {d}\tau \quad \; \text{ for } \text{ all } \, 0\leqq s\leqq t\leqq T, \end{aligned}
(3.12)

where $$|\rho '|$$ is the metric derivative of $$\rho$$ as defined in (2.25).

The above one-sided definition is sufficient to characterize the curves of maximal slope.

### Definition 3.5

(Curve of maximal slope) A curve $$\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))$$ is a curve of maximal slope for $${\mathcal {E}}$$ with respect to its one-sided strong upper gradient h if and only if $$t\mapsto {\mathcal {E}}(\rho _t)$$ is non-increasing and

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \Bigl ( h(\rho _\tau )^2+|\rho _\tau '|^2 \Bigr ) \,\text {d}\tau \leqq 0 \quad \text{ for } \text{ all }\; 0\leqq s\leqq t\leqq T.\nonumber \\ \end{aligned}
(3.13)

### Remark 3.6

Note that by using Young’s inequality in (3.12), we get

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \Bigl ( h(\rho _\tau )^2+|\rho _\tau '|^2 \Bigr ) \,\text {d}\tau \geqq 0 \quad \; \text{ for } \text{ all } \, 0\leqq s\leqq t\leqq T. \end{aligned}

Hence, if the curve $$(\rho _t)_{t\in [0,T]}$$ is a curve of maximal slope for $${\mathcal {E}}$$ with respect to its one-sided strong upper gradient h, we actually have an equality in (3.13).
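Spelled out, the step behind the remark is Young's inequality $$ab \leqq \tfrac{1}{2}(a^2+b^2)$$ applied pointwise in time:

```latex
% Young's inequality with a = h(\rho_\tau) and b = |\rho_\tau'|, inserted into (3.12):
h(\rho_\tau)\,|\rho_\tau'| \leqq \tfrac{1}{2}\bigl( h(\rho_\tau)^2 + |\rho_\tau'|^2 \bigr),
\qquad\text{hence}\qquad
{\mathcal{E}}(\rho_t)-{\mathcal{E}}(\rho_s)
\geqq -\int_s^t h(\rho_\tau)\,|\rho_\tau'|\,\mathrm{d}\tau
\geqq -\frac{1}{2}\int_s^t \bigl( h(\rho_\tau)^2+|\rho_\tau'|^2 \bigr)\,\mathrm{d}\tau .
```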

Therefore, in order to give a variational characterization of ($${\text {NL}}^2 {\text {IE}}$$) we need to identify the right one-sided strong upper gradient. As shown in [24], the variation of the energy along the solution to the equation provides the suitable candidate. In what follows we clarify this point as well as the strategy.

We recall that Proposition 2.25 ensures that for any $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T]; ({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )$$ there exists a unique flux $$({\varvec{j}}_t)_{t\in [0,T]}$$ in $$T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ such that $${\int }_0^T{\mathcal {A}}(\rho _t,{\varvec{j}}_t)\,\text {d}{t}<\infty$$, $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$ and $$|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)$$ for a.e. $$t\in [0,T]$$. Moreover, according to Lemma 2.6 there exists an antisymmetric measurable vector field $$w:[0,T]\times G \rightarrow {\mathbb {R}}$$ such that

\begin{aligned} \text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y). \end{aligned}
(3.14)

It will be convenient to work directly with this vector field $$(w_t)_{t\in [0,T]}$$: from now on we write $$(\rho ,w)\in {{\,\mathrm{CE}\,}}_T$$ for $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$ as well as $${\widehat{{\mathcal {A}}}}(\rho _t,w_t)$$ for $${\mathcal {A}}(\rho _t,{\varvec{j}}_t)$$ according to (2.8). With this convention, we can define a Finsler-type product on velocities in analogy to (3.4) as

\begin{aligned} {\widehat{g}}_{\rho ,w}(u,v)= & {} {} \frac{1}{2}{\iint }_G u(x,y)\,v(x,y)\, \eta (x,y) \\&\times \big (\chi _{\{w>0\}}(x,y)\text{ d }\gamma _1(x,y) + \chi _{\{w<0\}}(x,y) \,\text{ d }\gamma _2(x,y)\big ). \end{aligned}

Note that, under the absolute-continuity assumptions of Section 3.1, by comparing with (3.4) we have that $${\widehat{g}}_{\rho ,w}(u,v)= g_{\rho ,{\varvec{j}}}({\varvec{j}}_1,{\varvec{j}}_2)$$, where $${\varvec{j}}_1,{\varvec{j}}_2$$ are obtained from u and v via (3.14), respectively. Moreover, taking (3.6) into account, we also define

\begin{aligned} {\widehat{\ell }}_{\rho }(w)(v) = {\widehat{g}}_{\rho ,w}(w,v) . \end{aligned}
(3.15)

Arguing as in (3.7), we arrive at the following one-sided Cauchy–Schwarz inequality:

### Lemma 3.7

(One-sided Cauchy–Schwarz inequality) For all $$v,w \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ it holds that

\begin{aligned} {\widehat{g}}_{\rho ,w}(w,v) \leqq \sqrt{ {\widehat{g}}_{\rho ,v}(v,v) \, {\widehat{g}}_{\rho ,w}(w,w)}, \end{aligned}
(3.16)

with equality if and only if, for some $$\lambda >0$$, $$v(x,y)_+= \lambda w(x,y)_+$$ for $$\eta \, \rho \otimes \mu$$-a.e. $$(x,y)\in G$$ (and thus, by antisymmetry, also $$v(x,y)_-= \lambda w(x,y)_-$$ for $$\eta \, \mu \otimes \rho$$-a.e. $$(x,y)\in G$$).

### Proof

Using $$v=v_+-v_-$$ and the usual Cauchy–Schwarz inequality in $$L^2(\eta \,\rho \otimes \mu )$$, we get

\begin{aligned} {\widehat{g}}_{\rho ,w}(w,v)&= \frac{1}{2}{\iint }_G v(x,y) \eta (x,y)\\ {}&\quad \times \bigl ( w(x,y)_+ \text{ d }\rho (x) \text{ d }\mu (y) - w(x,y)_- \text{ d }\mu (x) \,\text{ d }\rho (y)\bigr ) \\ {}&\leqq \frac{1}{2}{\iint }_G v(x,y)_+ w(x,y)_+ \eta (x,y) \,\text{ d }\rho (x) \,\text{ d }\mu (y) \\ {}&\quad + \frac{1}{2}{\iint }_G v(x,y)_-w(x,y)_-\eta (x,y)\, \text{ d }\mu (x) \,\text{ d }\rho (y)\\ {}&\leqq \sqrt{ {\widehat{g}}_{\rho ,v}(v,v) \, {\widehat{g}}_{\rho ,w}(w,w)}. \end{aligned}

From the usual Cauchy–Schwarz inequality we have equalities above if and only if there exists $$\lambda > 0$$ such that $$v(x,y)_+=\lambda w(x,y)_+$$ for $$\eta \rho \otimes \mu$$-a.e. $$(x,y) \in G$$ and $$v(x,y)_-=\lambda w(x,y)_-$$ for $$\eta \mu \otimes \rho$$-a.e. $$(x,y)\in G$$, since all the contributions are non-negative. $$\square$$
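The inequality (3.16) lends itself to a quick numerical sanity check. A minimal sketch, assuming a finite state space with arrays standing in for $$\eta$$, $$\mu$$, $$\rho$$ and antisymmetric fields v, w (all names ours); it also checks the identity $${\widehat{g}}_{\rho ,w}(w,w)={\hat{{\mathcal {A}}}}(\rho ,w)$$ recorded below in (3.18):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
eta = rng.random((n, n)); eta = (eta + eta.T) / 2     # symmetric edge weights
np.fill_diagonal(eta, 0.0)
mu = rng.random(n) + 0.5                              # base measure weights
rho = rng.random(n); rho /= rho.sum()
g1 = rho[:, None] * mu[None, :]                       # gamma_1 = rho (x) mu
g2 = mu[:, None] * rho[None, :]                       # gamma_2 = mu (x) rho

def antisym(seed):
    a = np.random.default_rng(seed).standard_normal((n, n))
    return a - a.T                                    # antisymmetric, zero diagonal

def g_hat(w, u, v):
    # \hat g_{rho,w}(u,v): u*v*eta against gamma_1 on {w>0} and gamma_2 on {w<0}
    base = np.where(w > 0, g1, 0.0) + np.where(w < 0, g2, 0.0)
    return 0.5 * np.sum(u * v * eta * base)

v, w = antisym(2), antisym(3)
lhs = g_hat(w, w, v)                                  # \hat g_{rho,w}(w, v)
rhs = np.sqrt(g_hat(v, v, v) * g_hat(w, w, w))        # product of the two norms
# the action of (rho, w): w_+^2 against gamma_1 plus w_-^2 against gamma_2
action = 0.5 * np.sum(np.maximum(w, 0)**2 * eta * g1) \
       + 0.5 * np.sum(np.maximum(-w, 0)**2 * eta * g2)
```

Scaling v by a positive constant of w reproduces the equality case of the lemma.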

Now note that, from the weak formulation of the nonlocal continuity equation (2.15), we have for any $$\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})$$ and any $$0\leqq s < t \leqq T$$ the following chain rule:

\begin{aligned}&{\int }_{{\mathbb {R}}^{d}}\varphi (x)\, \text{ d }\rho _t(x)-{\int }_{{\mathbb {R}}^{d}}\varphi (x)\, \text{ d }\rho _s(x) \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t{\iint }_G{\overline{\nabla }}\varphi (x,y)\,\eta (x,y) \,\text{ d }{\mathbf {j}}_\tau (x,y)\,\text{ d }\tau \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t{\iint }_G{\overline{\nabla }}\varphi (x,y)\,\eta (x,y) \nonumber \\ {}&\qquad \times \left( w_\tau (x,y)_+ \text{ d }\gamma _{1,\tau }(x,y) - w_\tau (x,y)_- \,\text{ d }\gamma _{2,\tau }(x,y) \right) \,\text{ d }\tau \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t {\iint }_G{\overline{\nabla }}\varphi (x,y)w_\tau (x,y)\,\eta (x,y) \nonumber \\ {}&\qquad \times \left( \chi _{\{w>0\}}\,\text{ d }\gamma _{1,\tau }(x,y) + \chi _{\{w<0\}}\,\text{ d }\gamma _{2,\tau }(x,y)\right) \,\text{ d }\tau \nonumber \\ {}&\quad = {\int }_{s}^t {\widehat{g}}_{\rho _\tau ,w_\tau }(w_\tau ,{\overline{\nabla }}\varphi )\,\text{ d }\tau = {\int }_{s}^t {\widehat{\ell }}_{\rho }(w_\tau )({\overline{\nabla }}\varphi )\, \text{ d }\tau . \end{aligned}
(3.17)

Moreover, we still have the identification of the product $${\widehat{g}}$$ with the action in the form of Lemma 2.6,

\begin{aligned} {\widehat{g}}_{\rho _t,w_t}(w_t,w_t)&= \frac{1}{2}{\iint }_G w_t(x,y)^2\eta (x,y) \nonumber \\ {}&\quad \times \left( \chi _{\{w>0\}}(x,y)\,\text{ d }\gamma _{1,t}(x,y) + \chi _{\{w<0\}}(x,y) \,\text{ d }\gamma _{2,t}(x,y )\right) \nonumber \\ {}&= \frac{1}{2}{\iint }_G w_t(x,y)_+^2 \eta (x,y) \,\text{ d }\gamma _{1,t}(x,y) \nonumber \\ {}&\quad + \frac{1}{2}{\iint }_G w_t(x,y)_-^2 \eta (x,y) \,\text{ d }\gamma _{2,t}(x,y) \nonumber \\ {}&= \frac{1}{2}{\iint }_G \left( w_t(x,y)_+^2 + w_t(y,x)_-^2 \right) \eta (x,y) \text{ d }\gamma _{1,t}(x,y) \nonumber \\ {}&={\hat{{\mathcal {A}}}}(\rho _t,w_t), \end{aligned}
(3.18)

which shows that the action coincides with the squared norm induced by the Finsler structure.

A crucial step toward the variational characterization of ($${\text {NL}}^2 {\text {IE}}$$) mentioned above is to obtain the chain rule (3.17) for the energy functional (3.2), which is done in Proposition 3.10 below by a suitable regularization. As a consequence, by using the one-sided Cauchy–Schwarz inequality from Lemma 3.7, we obtain in Corollary 3.11 that the square root $$\sqrt{{\mathcal {D}}}$$ of the local slope, defined below in (3.19), is a one-sided strong upper gradient for $${\mathcal {E}}$$ with respect to the quasi-metric $${\mathcal {T}}$$ in the sense of Definition 3.4; here $$|\rho _t'|^2={\hat{{\mathcal {A}}}}(\rho _t,w_t)={\widehat{g}}_{\rho _t,w_t}(w_t,w_t)$$ for a.e. $$t\in [0,T]$$ due to Proposition 2.25 and (3.18). This allows us to define the De Giorgi functional, which provides the characterization of weak solutions as curves of maximal slope.

### Definition 3.8

(Local slope and De Giorgi functional) For any $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, let the local slope at $$\rho$$ be given by

\begin{aligned} {\mathcal {D}}(\rho ) := {\widehat{g}}_{\rho ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }}\left( -{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho },-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }\right) . \end{aligned}
(3.19)

For any $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$, the De Giorgi functional at $$\rho$$ is defined as

\begin{aligned} {\mathcal {G}}_T(\rho ):={\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0)+\frac{1}{2}{\int }_0^T\big ({\mathcal {D}}(\rho _\tau ) + |\rho _\tau '|^2\big )\,\text {d}\tau . \end{aligned}
(3.20)

When the dependence on the base measure $$\mu$$ needs to be explicit, the local slope and the De Giorgi functional are denoted by $${\mathcal {D}}(\mu ;\rho )$$ and $${\mathcal {G}}_T(\mu ;\rho )$$, respectively.

If the potential K satisfies Assumptions (K1)–(K3), we note that whenever $$\rho$$ is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$) and $$\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))$$ the quantity $${\mathcal {G}}_T(\rho )$$ is finite; indeed, the domain of the energy is all of $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, and Proposition 2.25 yields that both the local slope (since it is equal to the action of $$(\rho ,{\varvec{j}})$$, where $${\varvec{j}}$$ is given in Definition 3.1) and the metric derivative are finite.

We are ready to state our main theorem.

### Theorem 3.9

Suppose that $$\mu$$ satisfies Assumptions (A1) and (A2) and K satisfies Assumptions (K1)–(K3). A curve $$(\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$) according to Definition 3.1 if and only if $$\rho$$ belongs to $${{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$ and is a curve of maximal slope for $${\mathcal {E}}$$ with respect to $$\sqrt{{\mathcal {D}}}$$ in the sense of Definition 3.5, that is, satisfies

\begin{aligned} {\mathcal {G}}_T(\rho ) = 0, \end{aligned}
(3.21)

where $${\mathcal {G}}_T$$ is the De Giorgi functional as given in Definition 3.8.

Note that the above theorem implicitly assumes that $$\sqrt{{\mathcal {D}}}$$ is a one-sided strong upper gradient for $${\mathcal {E}}$$; this is indeed true thanks to Corollary 3.11 below. In light of this we can represent the result via the following diagram:

\begin{aligned}&\rho \text { is a weak solution of } ({\text {NL}}^2{\text {IE}}) \\&\iff \!\! \rho \text { is a curve of maximal slope for } {\mathcal {E}}\text { w.r.t. }\! \sqrt{{\mathcal {D}}} \\&\iff \! {\mathcal {G}}_T(\rho )\! =\! 0. \end{aligned}
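The equivalence can be glimpsed numerically before any of the rigorous work below: discretizing (3.10) with the upwind flux (3.11) by forward Euler on a small finite graph yields a curve along which mass is conserved and the energy is non-increasing, as Definition 3.5 demands. The following toy sketch uses our own discretization choices (counting base measure, Gaussian edge weights, the update rule read off from the weak form of the nonlocal continuity equation) and is only an illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt, steps = 8, 1e-3, 400
X = rng.standard_normal((n, 2))                       # node positions
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
eta = np.exp(-D**2); np.fill_diagonal(eta, 0.0)       # edge weights
K = D**2 / 2                                          # attractive interaction kernel
rho = rng.random(n); rho /= rho.sum()

def energy(r):
    return 0.5 * r @ K @ r                            # interaction energy (3.2)

energies = [energy(rho)]
for _ in range(steps):
    Krho = K @ rho
    v = -(Krho[None, :] - Krho[:, None])              # v = -grad(K*rho)
    j = v * np.where(v > 0, rho[:, None], rho[None, :])   # upwind flux (3.11)
    # Euler step: d/dt rho(x) = -sum_y eta(x,y) j(x,y), from the weak form
    rho = rho - dt * (eta * j).sum(axis=1)
    energies.append(energy(rho))
energies = np.array(energies)
```

Mass conservation is exact (by antisymmetry of j and symmetry of $$\eta$$), while monotone energy decay holds for a sufficiently small time step.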

### 4.3 The Chain Rule and Proof of Theorem 3.9

Firstly, we focus on the chain-rule property, which is the main technical step for proving Theorem 3.9.

### Proposition 3.10

Let K satisfy Assumptions (K1)–(K3). For all $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )$$ and $$0\leqq s\leqq t\leqq T$$ we have the chain-rule identity

\begin{aligned} {\mathcal {E}}(\rho _t) - {\mathcal {E}}(\rho _s) = {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau , \end{aligned}
(3.22)

where $$(w_t)_{t\in [0,T]}$$ is the antisymmetric vector field associated by (2.6) to $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$.

### Proof

Since the curve $$\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )$$, according to Proposition 2.25 there exists a unique family $$({\varvec{j}}_t)_{t\in [0,T]}$$ belonging to $$T_{\rho }{\mathcal {P}}_{2}({{\mathbb {R}}^{d}})$$ for a.e. $$t\in [0,T]$$ such that:

1. (i)

$$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$;

2. (ii)

$${\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty$$;

3. (iii)

$$|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)$$ for a.e. $$t\in [0,T]$$;

4. (iv)

$$\text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y)$$.

Then the identity (3.22) is equivalent to proving

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) = \frac{1}{2}{\int }_s^t{\iint }_G{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )(x,y)\, \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}
(3.23)

We proceed by applying two regularization procedures. First, for all $$(x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}$$ we define $$K^\varepsilon (x,y)=K*m_\varepsilon (x,y)={\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(z,z')m_{\varepsilon }(x-z,y-z')\,\text {d}z\,\text {d}z'$$, where $$m_\varepsilon (z)=\frac{1}{\varepsilon ^{2d}}m(\frac{z}{\varepsilon })$$ for all $$z\in {\mathbb {R}}^{2d}$$ and $$\varepsilon >0$$, and m is a standard mollifier on $${\mathbb {R}}^{2d}$$. We also introduce a smooth cut-off function $$\varphi _R$$ on $${{\mathbb {R}}^{2d}}$$ such that $$\varphi _R(z)=1$$ on $$B_R$$, $$\varphi _R(z)=0$$ on $${{\mathbb {R}}^{2d}}{\setminus } B_{2R}$$ and $$|\nabla \varphi _R|\leqq \frac{2}{R}$$, where $$B_R$$ is the ball of radius R in $${{\mathbb {R}}^{2d}}$$ centered at the origin. We set $$K_R^\varepsilon :=\varphi _R K^\varepsilon$$ and note that it is a $$C_\mathrm {c}^\infty ({{\mathbb {R}}^{2d}})$$ function. We now introduce the approximate energies, indexed by $$\varepsilon$$ and R,

\begin{aligned} {\mathcal {E}}_R^\varepsilon (\nu )=\frac{1}{2}{\int }_{{\mathbb {R}}^{d}}{\int }_{{{\mathbb {R}}^{d}}} K_R^\varepsilon (x,y)\,\text {d}\nu (y)\,\text {d}\nu (x) \quad \text{ for } \text{ all }\, \nu \in {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}

Let us extend $$\rho$$ and $${\varvec{j}}$$ to $$[-T,2 T]$$ periodically in time, meaning that $$\rho _{-s}=\rho _{T-s}$$ and $$\rho _{T+s}=\rho _{s}$$ for all $$s\in (0,T]$$ and likewise for $${\varvec{j}}$$. We regularize $$\rho$$ and $${\varvec{j}}$$ in time by using a standard mollifier n on $${\mathbb {R}}$$ supported on $$[-1,1]$$, by setting $$n_\sigma (t)=\frac{1}{\sigma }n(\frac{t}{\sigma })$$ and

\begin{aligned}&\rho _t^\sigma (A)=n_\sigma *\rho _t(A)={\int }_{-\sigma }^\sigma n_\sigma (t-s)\rho _s(A)\,\text {d}s, \qquad \forall A\subseteq {{\mathbb {R}}^{d}},\\&{\varvec{j}}_t^\sigma (U)=n_\sigma *{\varvec{j}}_t(U)={\int }_{-\sigma }^\sigma n_\sigma (t-s){\varvec{j}}_s(U)\,\text {d}s, \qquad \forall U\subset G, \end{aligned}

for any $$\sigma \in (0,T)$$; whence $$\rho _t^\sigma \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$. Let us now show that the integral of the action is uniformly bounded with respect to $$\sigma$$. Let $$|\lambda | \in {\mathcal {M}}^+(G)$$ be such that $$\gamma _{1,t},\gamma _{2,t},|{\varvec{j}}_t| \ll |\lambda |$$ for all $$t\in [0,T]$$. Then by using the joint convexity of the function $$\alpha$$ from (2.5), Jensen’s inequality and Fubini’s Theorem, we get

\begin{aligned}&{\int }_0^T{\mathcal {A}}(\rho _t^\sigma ,{\mathbf {j}}_t^\sigma )\, \text{ d }t \\ {}&\quad =\frac{1}{2}{\int }_0^T {\iint }_G \alpha \left( {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s, {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} \gamma _{1,t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s \right) \eta \,\text{ d }|\lambda | \,\text{ d }t \\ {}&\qquad +\frac{1}{2}{\int }_0^T {\iint }_G \alpha \left( - {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s, {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} \gamma _{2,t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s \right) \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\quad \leqq \frac{1}{2}{\int }_0^T {\iint }_G {\int }_{-\sigma }^\sigma \alpha \left( \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}}, \frac{{\text{ d }}^{} \gamma _{1,t-s}}{\text{ d } |\lambda |^{}} \right) n_\sigma (s) \,\text{ d }s \, \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\qquad + \frac{1}{2}{\int }_0^T {\iint }_G {\int }_{-\sigma }^\sigma \alpha \left( - \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}}, \frac{{\text{ d }}^{} \gamma _{2,t-s}}{\text{ d } |\lambda |^{}} \right) n_\sigma (s) \,\text{ d }s \, \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\quad = {\int }_{-\sigma }^{+\sigma } {\int }_0^T {\mathcal {A}}(\rho _{t-s},{\mathbf {j}}_{t-s}) \,\text{ d }t\, n_\sigma (s)\,\text{ d }s\\ {}&\quad \leqq {\int }_{-T}^{2T} {\mathcal {A}}(\rho _{t},{\mathbf {j}}_{t}) \, \text{ d }t = 3 {\int }_{0}^{T} {\mathcal {A}}(\rho _{t},{\mathbf {j}}_{t}) \, \text{ d }t<\infty . \end{aligned}
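The Jensen step above uses only the joint convexity of $$\alpha$$. For the prototypical perspective function $$\alpha (j,r)=j^2/r$$, $$r>0$$ (the $$\alpha$$ from (2.5) is of this type; this is an illustration under that assumption, with all names ours), the inequality can be checked numerically, with discrete weights playing the role of $$n_\sigma (s)\,\text {d}s$$:

```python
import numpy as np

def alpha(j, r):
    # perspective function alpha(j, r) = j^2 / r for r > 0; jointly convex in (j, r)
    return j**2 / r

rng = np.random.default_rng(3)
m = 1000
j = rng.standard_normal(m)                 # flux densities d j / d|lambda|
r = rng.random(m) + 0.1                    # measure densities, kept strictly positive
wts = rng.random(m); wts /= wts.sum()      # normalized mollifier weights
# Jensen: alpha of the averaged pair is at most the average of alpha
lhs = alpha(np.sum(wts * j), np.sum(wts * r))
rhs = np.sum(wts * alpha(j, r))
```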

It is easy to check that $$(\rho ^\sigma ,{\varvec{j}}^\sigma )$$ is still a solution to the nonlocal continuity equation on [0, T]. By arguing as in the proof of Proposition 2.17, we get that, along subsequences, $$\rho _t^\sigma \rightharpoonup \tilde{\rho }_t$$ as $$\sigma \rightarrow 0$$ for all $$t\in [0,T]$$, for some curve $$({\tilde{\rho }}_t)_{t\in [0,T]}$$ in $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$, and $${\varvec{j}}^\sigma \rightharpoonup \hat{{\varvec{j}}}$$ in $${\mathcal {M}}_{\mathrm {loc}}(G \times [0,T])$$ with $$\text {d}{\hat{{\varvec{j}}}} := \text {d}{\tilde{{\varvec{j}}}}_t\,\text {d}t$$, for some curve $$({\tilde{{\varvec{j}}}}_t)_{t\in [0,T]}$$ in $${\mathcal {M}}(G)$$. Note that $$n_\sigma \rightharpoonup \delta _0$$ as $$\sigma \rightarrow 0$$, and, as a consequence, $$\rho _t^\sigma \rightharpoonup \rho _t$$ for all $$t\in [0,T]$$ in view of Proposition 2.21. Thus, we actually have $$\tilde{\rho }=\rho$$ and $$\tilde{{\varvec{j}}}={\varvec{j}}$$ by uniqueness of the limit and the flux, as highlighted above. Using the regularity for $$\varepsilon >0$$ and $$\sigma >0$$, we get

\begin{aligned} \frac{{\text {d}}^{} }{\text {d} t^{}} {\mathcal {E}}_R^\varepsilon (\rho _t^\sigma )= & {} {\int }_{{\mathbb {R}}^{d}}(K_R^\varepsilon *\rho _t^\sigma )(x)\partial _t\rho _t^\sigma (x)\,\text {d}\mu (x)\\= & {} \frac{1}{2}{\iint }_G{\overline{\nabla }}(K_R^\varepsilon *\rho _t^\sigma )(x,y)\,\eta (x,y) \,\text {d}{\varvec{j}}_t^\sigma (x,y). \end{aligned}

For the sake of completeness, we note that the second equality follows from the definition of $${{\,\mathrm{CE}\,}}_T$$ by using again a cut-off argument on the function $$K_R^\varepsilon *\rho _t^\sigma$$. We omit this step as it is a standard procedure. By integrating in time between s and t, with $$s\leqq t$$, we obtain

\begin{aligned}&{\mathcal {E}}_R^\varepsilon (\rho _t^\sigma )-{\mathcal {E}}_R^\varepsilon (\rho _s^\sigma )\nonumber \\&\quad =\frac{1}{2}{\int }_s^t{\iint }_G{\overline{\nabla }}(K_R^\varepsilon *\rho _\tau ^\sigma )(x,y)\, \eta (x,y)\, \text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau \nonumber \\&\quad =\frac{1}{2}{\int }_{s}^{t}{\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \text {d}\rho _\tau ^\sigma (z) \eta (x,y)\, \text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau . \end{aligned}
(3.24)

In order to obtain (3.23) we need to let $$\varepsilon$$ and $$\sigma$$ go to 0 and R go to $$\infty$$ in (3.24). The left-hand side is easy to handle since $$\rho _t^\sigma \rightharpoonup \rho _t$$ as $$\sigma \rightarrow 0$$ for any $$t\in [0,T]$$, and $$K_R^\varepsilon \rightarrow K_R$$ uniformly on compact sets as $$\varepsilon \rightarrow 0$$. Finally, by letting R go to $$\infty$$ we have convergence to $${\mathcal {E}}(\rho _t)$$.

In order to pass to the limit in the right-hand side of (3.24), we use a truncation argument similar to that in the proof of Proposition 2.17. Let $$\delta >0$$ and let us set $$N_\delta = {\overline{B}}_{\delta ^{-1}} \times {\overline{B}}_{\delta ^{-1}}$$, where $$B_{\delta ^{-1}}= \left\{ x \in {\mathbb {R}}^d: |x|< \delta ^{-1}\right\}$$, and $$G_\delta =\bigl \{(x,y)\in G:\delta \leqq |x-y|\bigr \}$$. We can consider a family $$(\varphi _\delta )_{\delta >0} \subset C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}\times G;[0,1])$$ of truncation functions such that, for all $$\delta >0$$,

\begin{aligned} \{\varphi _\delta = 1\} \supseteq {\overline{B}}_{\delta ^{-1}} \times (G_\delta \cap N_\delta ). \end{aligned}

Now, we add and subtract $$\varphi _\delta$$ in the integral on the right-hand side of (3.24) and argue as follows. Since $$\rho _t^\sigma \otimes {\varvec{j}}_t^\sigma \rightharpoonup \rho _t\otimes {\varvec{j}}_t$$ for any $$t\in [0,T]$$ as $$\sigma \rightarrow 0$$, and $$K^\varepsilon _R\rightarrow K_R$$ uniformly on compact sets as $$\varepsilon \rightarrow 0$$, we can pass to the limit in $$\sigma$$ and $$\varepsilon$$, for any R and $$\delta >0$$:

\begin{aligned}&\frac{1}{2}{\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\varphi _\delta (z,x,y) \left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \,\text {d}\rho _\tau ^\sigma (z) \eta (x,y)\,\text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau \nonumber \\&\rightarrow \frac{1}{2} {\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\varphi _\delta (z,x,y) \left( K_R(y,z)-K_R(x,z)\right) \,\text {d}\rho _\tau (z) \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}
(3.25)

By using $$\varphi _\delta \leqq 1$$, Assumption (K3), Lemma 2.10 with $$\Phi (x,y)=|x-y|\vee |x-y|^2$$ and (A1), we can bound the modulus of the time integrand in (3.25), for any $$\tau \in [s,t]$$, by

\begin{aligned}&\frac{1}{2}{\iint }_G {\int }_{{\mathbb {R}}^{d}}\frac{|K_R(y,z)-K_R(x,z) |}{|x-y|\vee |x-y|^2}\,\text {d}\rho _\tau (z) \left( |x-y|\vee |x-y|^2\right) \eta (x,y) \,\text {d}|{\varvec{j}}_\tau |(x,y) \\&\quad \leqq L \sqrt{2C_\eta \, {\mathcal {A}}(\rho _\tau ,{\varvec{j}}_\tau )}. \end{aligned}

Hence the integral is uniformly bounded in $$\delta$$ and R, and by the Lebesgue dominated convergence theorem we can pass to the limit in (3.25) in $$\delta$$ and R, obtaining

\begin{aligned} \frac{1}{2} {\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( K(y,z)-K(x,z)\right) \,\text {d}\rho _\tau (z) \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}

Now, it remains to control the integral involving the term $$1-\varphi _\delta (z,x,y)$$ in the integrand. Let us note that, for all $$\delta >0$$,

\begin{aligned} \left( {{\mathbb {R}}^{d}}\times G\right) {\setminus } \{\varphi _\delta =1\} \subseteq \big ({\overline{B}}_{\delta ^{-1}}^\mathrm {c} \times G \big ) \cup \big ( {{\mathbb {R}}^{d}}\times ( G{\setminus } (G_\delta \cap N_\delta ))\big ) =: M_\delta . \end{aligned}

Using Assumption (K3) and splitting each contribution, we obtain

\begin{aligned}&\left|{\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( 1-\varphi _\delta (z,x,y)\right) \left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \, \text {d}\rho _t^\sigma (z) \eta (x,y)\, \text {d}{\varvec{j}}_t^\sigma (x,y) \right| \\&\quad \leqq L {\iiint }_{M_\delta }\left( |x-y|\vee |x-y|^2\right) \eta (x,y)\,\text {d}{\varvec{j}}_t^\sigma (x,y)\,\text {d}\rho ^\sigma _t(z) \\&\quad \leqq L {\iiint }_{{\overline{B}}_{\delta ^{-1}}^\mathrm {c} \times G}\left( |x-y|\vee |x-y|^2\right) \eta (x,y)\,\text {d}{\varvec{j}}_t^\sigma (x,y)\,\text {d}\rho ^\sigma _t(z)\\&\qquad + 2L {\int }_{{{\mathbb {R}}^{d}}}\,\text {d}\rho ^\sigma _t(z){\iint }_{G_\delta ^\mathrm {c}}\left( |x-y|\vee |x-y|^2\right) w_t(x,y)_+ \eta (x,y)\,\text {d}\rho _t^\sigma (x)\,\text {d}\mu (y)\\&\qquad + 2L {\int }_{{{\mathbb {R}}^{d}}}\,\text {d}\rho ^\sigma _t(z){\iint }_{N_\delta ^\mathrm {c}}\left( |x-y|\vee |x-y|^2\right) w_t(x,y)_+ \eta (x,y)\,\text {d}\rho _t^\sigma (x)\,\text {d}\mu (y). \end{aligned}

Using Lemma 2.10 with $$\Phi (x,y)=|x-y|\vee |x-y|^2$$, Assumption (A1), and the Cauchy–Schwarz inequality with respect to $$\eta \, \rho _t^\sigma \otimes \mu$$, the right-hand side in the inequality above can be further bounded by

\begin{aligned}&4L \sqrt{ C_\eta {\mathcal {A}}(\rho _t^\sigma ,{\varvec{j}}_t^\sigma )} \ \rho _t^\sigma \left( {\overline{B}}_{\delta ^{-1}}^\mathrm {c}\right) \\&+ 2L \sqrt{{\mathcal {A}}(\rho _t^\sigma ,{\varvec{j}}_t^\sigma )} \left( \left( {\iint }_{G_\delta ^\mathrm {c}} \left|x-y \right|^2 \eta (x,y) \,\text {d}\rho _t^\sigma (x)\, \text {d}\mu (y)\right) ^{\frac{1}{2}} + \sqrt{C_\eta \rho _t^{\sigma }\left( {\overline{B}}_{\delta ^{-1}}^\mathrm {c}\right) }\right) . \end{aligned}

Thanks to the uniform second moment bound of $$\rho _t^\sigma$$ from Lemma 2.16 and Assumption (A2), the above terms converge to zero as $$\delta \rightarrow 0$$, which concludes the proof. $$\square$$

That $$\sqrt{{\mathcal {D}}}$$ is a one-sided strong upper gradient for $${\mathcal {E}}$$ is an easy consequence of the previous result.

### Corollary 3.11

For any curve $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$ it holds that

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) \geqq - {\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho _\tau '| \,\text {d}\tau \quad \text{ for } \text{ all }\;\; \, 0\leqq s\leqq t\leqq T, \end{aligned}
(3.26)

i.e., $$\sqrt{{\mathcal {D}}}$$ is a one-sided strong upper gradient for $${\mathcal {E}}$$ in the sense of Definition 3.4.

### Proof

Without loss of generality we assume $${\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}|\rho '|(\tau )\,\text {d}\tau <\infty$$, as otherwise the inequality (3.26) is trivially satisfied. We obtain the result as a consequence of Proposition 3.10 by applying the one-sided Cauchy–Schwarz inequality (Lemma 3.7) to (3.22) as follows: for any $$0\leqq s\leqq t\leqq T$$,

\begin{aligned}&{\mathcal {E}}(\rho _t) - {\mathcal {E}}(\rho _s) \\&\quad = {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) \,\text {d}\tau = - {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) \,\text {d}{\tau }\\&\quad \geqq -{\int }_s^t\sqrt{{\widehat{g}}_{\rho _\tau ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }}\left( -{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho } ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) }\sqrt{{\widehat{g}}_{\rho _\tau ,w_\tau }(w_\tau ,w_\tau )}\,\text {d}\tau \\&\quad =-{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,\sqrt{{\hat{{\mathcal {A}}}}(\rho _\tau ,w_\tau )} \,\text {d}\tau \\&\quad =-{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho '|(\tau ) \,\text {d}\tau . \end{aligned}

Note that the last two equalities are provided by identity (3.18) and Proposition 2.25. $$\square$$
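For orientation, the form of the one-sided Cauchy–Schwarz inequality used in the third step above can be read off from the chain of inequalities itself (a paraphrase of Lemma 3.7, written here with the abbreviation $$u_\tau = -{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }$$; the precise statement is in the cited lemma):

\begin{aligned} {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , u_\tau \right) \leqq \sqrt{{\widehat{g}}_{\rho _\tau ,u_\tau }\left( u_\tau , u_\tau \right) }\,\sqrt{{\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , w_\tau \right) }, \end{aligned}

applied pointwise in $$\tau$$ and then integrated over $$[s,t]$$.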

At this point, we have collected all auxiliary results to deduce Theorem 3.9.

### Proof of Theorem 3.9

Let us start by assuming that $$\rho$$ is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$). In view of Definition 3.1, a weak solution is obtained from the weak formulation of the nonlocal continuity equation (2.13) if we set

\begin{aligned} \text {d}{\varvec{j}}_t(x,y)={\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_- \text {d}\rho _t(x)\text {d}\mu (y)-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_+ \text {d}\rho _t(y)\text {d}\mu (x). \end{aligned}

Then, by writing $$v_t^{\mathcal {E}}(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)$$, it is easy to check that

\begin{aligned} {\mathcal {A}}(\rho _t,{\varvec{j}}_t)={\widehat{{\mathcal {A}}}}(\rho _t,v_t^{\mathcal {E}})={\mathcal {D}}(\rho _t) < \infty , \end{aligned}

where the finiteness follows from Assumptions (K3) and (A1), as shown by the computation

\begin{aligned} {\mathcal {D}}(\rho _t)&= {\iint }_G |({\overline{\nabla }}K*\rho _t(x,y))_-|^2\eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq {\iint }_G ({\overline{\nabla }}K*\rho _t(x,y))^2\eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&= {\iint }_G \left( {\int }_{{\mathbb {R}}^{d}}(K(x,z)-K(y,z))\,\text {d}{\rho _t}(z) \right) ^2 \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq {\iint }_G {\int }_{{\mathbb {R}}^{d}}(K(x,z)-K(y,z))^2\,\text {d}{\rho _t}(z) \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq L^2 {\int }_{{\mathbb {R}}^{d}}{\iint }_G \left( |x-y|^2\vee |x-y|^4\right) \eta (x,y)\,\text {d}\mu (y) \,\text {d}\rho _t(x)\,\text {d}{\rho _t}(z)\\&\leqq L^2 C_\eta {\int }_{{\mathbb {R}}^{d}}{\int }_{{\mathbb {R}}^{d}}\,\text {d}\rho _t(x)\,\text {d}{\rho _t}(z) = L^2 C_\eta . \end{aligned}
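The first identity above, $${\mathcal {A}}(\rho _t,{\varvec{j}}_t)={\widehat{{\mathcal {A}}}}(\rho _t,v_t^{\mathcal {E}})={\mathcal {D}}(\rho _t)$$, can be sketched as follows (using the upwind form of the action from Section 2 and the elementary relation $$v_t^{\mathcal {E}}(x,y)_+ = {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_-$$):

\begin{aligned} {\widehat{{\mathcal {A}}}}(\rho _t,v_t^{\mathcal {E}}) = {\iint }_G |v_t^{\mathcal {E}}(x,y)_+|^2 \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y) = {\iint }_G \left| {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_-\right| ^2 \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y) = {\mathcal {D}}(\rho _t). \end{aligned}

The equality $${\mathcal {A}}(\rho _t,{\varvec{j}}_t)={\widehat{{\mathcal {A}}}}(\rho _t,v_t^{\mathcal {E}})$$ holds since $${\varvec{j}}_t$$ is precisely the flux induced by the velocity $$v_t^{\mathcal {E}}$$ via the upwind interpolation.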

Thanks to Proposition 2.25, this also proves that $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$ and $$|\rho _t'|^2\leqq {\mathcal {D}}(\rho _t)$$ for a.e. $$t\in [0,T]$$. In view of Proposition 3.10, we thus obtain

\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)&= {\int }_s^t {\widehat{g}}_{\rho _\tau ,v_\tau ^{\mathcal {E}}}\left( v_\tau ^{\mathcal {E}},{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau \\&= - {\int }_s^t {\widehat{g}}_{\rho _\tau ,v_\tau ^{\mathcal {E}}}\left( v_\tau ^{\mathcal {E}},-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau \\&=-{\int }_s^t{\iint }_G\left| {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_-\right| ^2\eta (x,y)\,\text {d}\rho _\tau (x)\,\text {d}\mu (y)\,\text {d}\tau \\&=-{\int }_s^t {\mathcal {D}}(\rho _\tau )\,\text {d}\tau \leqq -{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}|\rho '_\tau |\,\text {d}\tau . \end{aligned}

This implies that

1. (i) the map $$t\mapsto {\mathcal {E}}(\rho _t)$$ is non-increasing;

2. (ii) $${\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \left( {\mathcal {D}}(\rho _\tau )+|\rho _\tau '|^2\right) \,\text {d}\tau = 0$$, by Corollary 3.11.

Hence the first part of the theorem follows by taking $$s=0$$ and $$t=T$$ in (ii), which gives $${\mathcal {G}}_T(\rho )=0$$.

Consider now $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))$$ satisfying the equality (3.21). Let us verify that it is a weak solution of ($${\text {NL}}^2 {\text {IE}}$$). By Proposition 2.25 there exists a unique family $$({\varvec{j}}_t)_{t\in [0,T]}$$ in $$T_{\rho _t}{\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ such that $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$, $${\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty$$ and $$|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)$$ for a.e. $$t\in [0,T]$$. Moreover, by Lemma 2.6 we find an antisymmetric measurable vector field $$w:[0,T]\times G \rightarrow {\mathbb {R}}$$ such that

\begin{aligned} \text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y). \end{aligned}

Thanks to Proposition 3.10, by applying the one-sided Cauchy–Schwarz inequality (Lemma 3.7), using the identification (3.18), the definition of the local slope (3.19) and Young's inequality, we get

\begin{aligned} {\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0)&={\int }_0^T {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau ) \right) \,\text {d}\tau \\&= - {\int }_0^T {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , -{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau ) \right) \,\text {d}\tau \\&\geqq -{\int }_0^T\sqrt{{\mathcal {D}}(\rho _\tau )}\sqrt{{\mathcal {A}}(\rho _\tau ,{\varvec{j}}_\tau )}\,\text {d}\tau =-{\int }_0^T\sqrt{{\mathcal {D}}(\rho _\tau )}|\rho _\tau '|\,\text {d}\tau \\&\geqq -\frac{1}{2}{\int }_0^T {\mathcal {D}}(\rho _\tau )\,\text {d}\tau - \frac{1}{2} {\int }_0^T |\rho _\tau '|^2\,\text {d}\tau . \end{aligned}

Thanks to the equality (3.21), we actually have that the above inequalities are equalities, which holds if and only if $$w_t(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)$$ for a.e. $$t\in [0,T]$$ and $$\gamma _{1,t}$$-a.e. $$(x,y)\in G$$. Hence $$(\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T$$ with $$w=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }$$, that is, $$\rho$$ is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$). $$\square$$

### 3.4 Stability and Existence of Weak Solutions

Theorem 3.9 provides a characterization of (weak) solutions to ($${\text {NL}}^2 {\text {IE}}$$) as minimizers of $${\mathcal {G}}_T$$ attaining the value 0. The direct method of calculus of variations gives existence of minimizers of $${\mathcal {G}}_T$$. However, it is not clear a priori whether they attain the value 0 and are thus actually weak solutions to ($${\text {NL}}^2 {\text {IE}}$$). Hence we prove compactness and stability of gradient flows (see Theorem 3.14) and approximate the desired problem by discrete problems for which the existence of solutions is easy to show; see the proof of Theorem 3.15. We start by proving that the local slope $${\mathcal {D}}$$ is narrowly lower semicontinuous jointly in its arguments, $$\mu$$ and $$\rho$$; see Lemma 3.12. We then establish the compactness coming from a uniform control of the De Giorgi functional $${\mathcal {G}}_T$$, as well as its joint narrow lower semicontinuity (see Lemma 3.13), which we prove using compactness in $${{\,\mathrm{CE}\,}}_T$$ and the joint narrow lower semicontinuity of the action (see Proposition 2.17) and of the local slope. (See also [48, Theorem 2] for an analogous strategy.)

In Theorem 3.14 we prove one of our main results, namely that the functional $${\mathcal {G}}_T$$ is stable under variations in base measures, defining the vertices of the graph, and absolutely continuous curves. A particular consequence of this theorem is that weak solutions to ($${\text {NL}}^2 {\text {IE}}$$) with respect to graphs defined by random samples of a measure $$\mu$$ converge to weak solutions to ($${\text {NL}}^2 {\text {IE}}$$) with respect to $$\mu$$; see Remark 3.17.

The existence of weak solutions of ($${\text {NL}}^2 {\text {IE}}$$) (and thus gradient flows) with respect to $${\mathcal {E}}$$ proved in Theorem 3.15 shows that, indeed, the De Giorgi functional (3.20) corresponding to an interaction potential K satisfying (K1)–(K3) admits a minimizer when $$\mu ({{\mathbb {R}}^{d}})$$ is finite.

### Lemma 3.12

Let $$(\mu ^n)_n\subset {\mathcal {M}}^+({{\mathbb {R}}^{d}})$$ and suppose that $$(\mu ^n)_n$$ narrowly converges to $$\mu$$. Assume that the base measures $$(\mu ^n)_n$$ and $$\mu$$ are such that (A1) and (A2) hold uniformly in n, and let K satisfy Assumptions (K1)–(K3). Let moreover $$(\rho ^n)_n$$ be a sequence such that $$\rho ^n \in {\mathcal {P}}_{2}({{\mathbb {R}}^{d}})$$ for all $$n\in {\mathbb {N}}$$ and $$\rho ^n\rightharpoonup \rho$$ as $$n\rightarrow \infty$$ for some $$\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$. Then

\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {D}}(\mu ^n;\rho ^n) \geqq {\mathcal {D}}(\mu ;\rho ) . \end{aligned}

### Proof

For every $$n\in {\mathbb {N}}$$ we set $$u^n = {\overline{\nabla }}K*\rho ^n$$. Furthermore, we write $$u= {\overline{\nabla }}K*\rho$$ and define $$g:{\mathbb {R}}\rightarrow {\mathbb {R}}$$ by $$g(x) = (x_+)^2$$ for all $$x \in {\mathbb {R}}$$. Then note that g is convex and continuous, and

\begin{aligned} {\mathcal {D}}(\mu ^n;\rho ^n) = {\iint }_G g(u^n(x,y)) \eta (x,y) \,\text {d}\rho ^n(x) \,\text {d}\mu ^n(y), \end{aligned}

and, similarly,

\begin{aligned} {\mathcal {D}}(\mu ;\rho ) = {\iint }_G g(u(x,y)) \eta (x,y) \,\text {d}\rho (x)\,\text {d}\mu (y). \end{aligned}

We want to use [2, Theorem 5.4.4 (ii)] to prove the desired $$\liminf$$ inequality. Observe that $$u^n \in L^2(\eta \,\gamma _1^n)$$ and $$u \in L^2(\eta \,\gamma _1)$$; indeed, (K3) and (A1) give

\begin{aligned}&{\iint }_G u^n(x,y)^2 \eta (x,y)\,\text {d}\gamma _1^n(x,y)\\&\quad = {\iint }_G (K*\rho ^n(y) - K*\rho ^n(x))^2 \eta (x,y)\,\text {d}\gamma _1^n(x,y)\\&\quad \leqq L^2 C_\eta , \end{aligned}
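The step compressed into the last inequality can be expanded: by (K3) and the triangle inequality applied to the integral against $$\rho ^n$$, one has, for $$(x,y)\in G$$,

\begin{aligned} (K*\rho ^n(y) - K*\rho ^n(x))^2 \leqq L^2 \left( |x-y|\vee |x-y|^2\right) ^2 = L^2 \left( |x-y|^2\vee |x-y|^4\right) , \end{aligned}

and integrating this bound against $$\eta \,\text {d}\gamma _1^n$$ and using (A1) yields the constant $$L^2 C_\eta$$.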

and, similarly, for u. Let now $$\varphi \in C_\mathrm {c}^\infty (G)$$. We have

\begin{aligned}&{{\iint }_G u^n(x,y)\varphi (x,y)\eta (x,y) \,\text {d}\gamma _1^n(x,y)}\\&\quad = {\iint }_G \left( {\int }_{{\mathbb {R}}^{d}}K(y,z)\,\text {d}\rho ^n(z) - {\int }_{{\mathbb {R}}^{d}}K(x,z)\,\text {d}\rho ^n(z)\right) \varphi (x,y)\eta (x,y) \,\text {d}\gamma _1^n(x,y)\\&\quad = {\iint }_G {\int }_{{\mathbb {R}}^{d}}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\quad ={\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\qquad + {\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}{\setminus } B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y). \end{aligned}

The last integral vanishes as $$R\rightarrow \infty$$ since (K3), (A1) and Prokhorov’s Theorem give

\begin{aligned}&{\left| {\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}{\setminus } B_R}(K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\right| }\\&\quad \leqq \frac{L\Vert \varphi \Vert _{\infty }\rho ^n({{\mathbb {R}}^{d}}{\setminus } B_R)}{\inf _{{{\,\mathrm{supp}\,}}\varphi }(|x-y|\vee |x-y|^2)} \\&\qquad {\iint }_{{{\,\mathrm{supp}\,}}\varphi }(|x-y|^2\vee |x-y|^4)\eta (x,y)\,\,\text {d}\mu ^n(y)\,\,\text {d}\rho ^n(x)\\&\quad \leqq \frac{LC_\eta \Vert \varphi \Vert _{\infty }\rho ^n({{\mathbb {R}}^{d}}{\setminus } B_R)}{\inf _{{{\,\mathrm{supp}\,}}\varphi }(|x-y|\vee |x-y|^2)}\underset{R\rightarrow \infty }{\longrightarrow }0. \end{aligned}

The function $$(z,x,y) \mapsto (K(y,z) - K(x,z))\varphi (x,y)\eta (x,y)$$ is continuous and bounded on $$({{\mathbb {R}}^{d}}\cap B_R)\times G$$ thanks to Assumption (W). In addition, we note that $$(\rho ^n\otimes \gamma _1^n)_n$$ narrowly converges to $$\rho \otimes \gamma _1$$ in $${\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}^+(G)$$. Therefore, we obtain for any $$R>0$$ the convergence

\begin{aligned}&\lim _{n\rightarrow \infty }{\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R} ( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\quad ={\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho \otimes \gamma _1)(z,x,y) . \end{aligned}

By sending $$R\rightarrow \infty$$, we obtain

\begin{aligned}&{\lim _{n\rightarrow \infty } {\iint }_G u^n(x,y)\varphi (x,y) \eta (x,y)\,\text {d}\gamma _1^n(x,y)}\\&= {\iint }_G {\int }_{{\mathbb {R}}^{d}}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho \otimes \gamma _1)(z,x,y)\\&= {\iint }_G u(x,y)\varphi (x,y) \eta (x,y)\,\text {d}\gamma _1(x,y). \end{aligned}

Thus, $$u^n$$ converges weakly to u as $$n\rightarrow \infty$$ in the sense of [2, Definition 5.4.3]. By [2, Theorem 5.4.4 (ii)] we therefore conclude that

\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {D}}(\mu ^n;\rho ^n)&= \liminf _{n\rightarrow \infty } {\iint }_G g(u^n(x,y)) \eta (x,y)\, \text {d}\rho ^n(x) \,\text {d}\mu ^n(y)\\&\geqq {\iint }_G g(u(x,y)) \eta (x,y) \,\text {d}\rho (x)\,\text {d}\mu (y) = {\mathcal {D}}(\mu ;\rho ) , \end{aligned}

which is the desired result. $$\square$$

Let us also prove the compactness and narrow lower semicontinuity of the De Giorgi functional.

### Lemma 3.13

(Compactness and lower semicontinuity of the De Giorgi functional) Let $$(\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)$$ and suppose that $$(\mu ^n)_n$$ narrowly converges to $$\mu$$. Assume that the base measures $$\mu ^n$$ and $$\mu$$ satisfy (A1) and (A2) uniformly in n, and let K satisfy (K1)–(K3). Let moreover $$(\rho ^n)_n$$ be a sequence such that $$\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))$$ for all $$n\in {\mathbb {N}}$$ with $$\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n) < \infty$$ and $$\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty$$. Then, up to a subsequence, $$\rho ^n_t \rightharpoonup \rho _t$$ as $$n\rightarrow \infty$$ for all $$t\in [0,T]$$ for some $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ and

\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n) \geqq {\mathcal {G}}_T(\mu ;\rho ). \end{aligned}

### Proof

For any $$n\in {\mathbb {N}}$$, recall the definition

\begin{aligned} {\mathcal {G}}_T(\mu ^n;\rho ^n) = {\mathcal {E}}(\rho ^n_T) - {\mathcal {E}}(\rho ^n_0) + \frac{1}{2} {\int }_0^T {\mathcal {D}}(\mu ^n; \rho ^n_t) \,\text {d}t + \frac{1}{2} {\int }_0^T |(\rho _t^n)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t, \end{aligned}

where we are careful to take the metric derivative of $$\rho ^n$$ with respect to $${\mathcal {T}}_{\mu ^n}$$ (as given in Definition 2.18). Since the domain of the energy $${\mathcal {E}}$$ is all of $${\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ and the local slope $${\mathcal {D}}$$ is non-negative, the bound $$\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty$$ ensures that

\begin{aligned} \sup _{n\in {\mathbb {N}}} {\int }_0^T |(\rho ^n_t)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t < \infty . \end{aligned}

For all $$n\in {\mathbb {N}}$$, since $$\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))$$, Proposition 2.25 yields the existence of a flux $${\varvec{j}}^n$$ such that $$(\rho ^n,{\varvec{j}}^n)\in {{\,\mathrm{CE}\,}}_T$$ and $$|(\rho ^n_t)'|^2 = {\mathcal {A}}(\mu ^n;\rho ^n_t,{\varvec{j}}^n_t)$$ for almost all $$t\in [0,T]$$. We then get

\begin{aligned} \sup _{n\in {\mathbb {N}}} {\int }_0^T {\mathcal {A}}(\mu ^n;\rho ^n_t,{\varvec{j}}^n_t) \,\text {d}t = \sup _{n\in {\mathbb {N}}} {\int }_0^T |(\rho ^n_t)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t < \infty . \end{aligned}

By Proposition 2.17, there now exists $$(\rho ,{\varvec{j}}) \in {{\,\mathrm{CE}\,}}_T$$ such that, up to subsequences, $$\rho _t^n \rightharpoonup \rho _t$$ for all $$t\in [0,T]$$ and $${\varvec{j}}^n \rightharpoonup {\varvec{j}}$$ as $$n\rightarrow \infty$$, and

\begin{aligned} \infty > \liminf _{n\rightarrow \infty } {\int }_0^T {\mathcal {A}}(\mu ^n;\rho ^n_t, {\varvec{j}}^n_t) \,\text {d}t \geqq {\int }_0^T {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) \,\text {d}t. \end{aligned}

By Proposition 2.25, we therefore have $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ and $$|(\rho _t)'|_{{\mathcal {T}}_\mu }^2 \leqq {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)$$ for almost all $$t\in [0,T]$$, which finally gives

\begin{aligned} \liminf _{n\rightarrow \infty } {\int }_0^T |(\rho ^n_t)'|^2_{{\mathcal {T}}_{\mu ^n}} \,\text {d}t \geqq {\int }_0^T |\rho _t'|^2_{{\mathcal {T}}_\mu } \,\text {d}t. \end{aligned}
(3.27)

By the narrow continuity of the energy proved in Proposition 3.3, we get

\begin{aligned} \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho ^n_T) = {\mathcal {E}}(\rho _T) \quad \text {and} \quad \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho _0^n) = {\mathcal {E}}(\rho _0). \end{aligned}
(3.28)

Furthermore, by Fatou’s lemma and the narrow lower semicontinuity of the local slope shown in Lemma 3.12, we have

\begin{aligned} \liminf _{n\rightarrow \infty } {\int }_0^T {\mathcal {D}}(\mu ^n;\rho ^n_t) \,\text {d}t \geqq {\int }_0^T {\mathcal {D}}(\mu ;\rho _t) \,\text {d}t. \end{aligned}
(3.29)

Gathering (3.27), (3.28) and (3.29), we finally obtain

\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n)\geqq & {} {\mathcal {E}}(\rho _T) - {\mathcal {E}}(\rho _0) + \frac{1}{2} {\int }_0^T {\mathcal {D}}(\mu ;\rho _t) \,\text {d}t + \frac{1}{2} {\int }_0^T |\rho _t'|_{{\mathcal {T}}_\mu }^2 \,\text {d}t \\= & {} {\mathcal {G}}_T(\mu ;\rho ), \end{aligned}

which ends the proof. $$\square$$

We now get our stability result.

### Theorem 3.14

(Stability of gradient flows) Let $$(\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)$$ and suppose that $$(\mu ^n)_n$$ narrowly converges to $$\mu$$. Assume that the base measures $$\mu ^n$$ and $$\mu$$ satisfy (A1) and (A2) uniformly in n, and let the interaction potential K satisfy (K1)–(K3). Suppose that $$\rho ^n$$ is a gradient flow of $${\mathcal {E}}$$ with respect to $$\mu ^n$$ for all $$n\in {\mathbb {N}}$$, that is,

\begin{aligned} {\mathcal {G}}_T(\mu ^n;\rho ^n) = 0 \quad \text{ for } \text{ all } \, n\in {\mathbb {N}}, \end{aligned}

such that $$(\rho _0^n)_n$$ satisfies $$\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n)< \infty$$ and $$\rho _t^n \rightharpoonup \rho _t$$ as $$n\rightarrow \infty$$ for all $$t\in [0,T]$$ for some curve $$(\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$. Then, $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ and $$\rho$$ is a gradient flow of $${\mathcal {E}}$$ with respect to $$\mu$$, that is,

\begin{aligned} {\mathcal {G}}_T(\mu ;\rho ) = 0. \end{aligned}

### Proof

By Lemma 3.13 we directly obtain that $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ and, up to a subsequence,

\begin{aligned} 0 = \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n) \geqq {\mathcal {G}}_T(\mu ;\rho ). \end{aligned}

Finally, since $${\mathcal {G}}_T(\mu ;\rho ) \geqq 0$$ by Young’s inequality and Corollary 3.11, we obtain $${\mathcal {G}}_T(\mu ;\rho ) = 0$$. $$\square$$
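The nonnegativity of $${\mathcal {G}}_T$$ invoked in the last step can be spelled out: by Corollary 3.11 and Young's inequality,

\begin{aligned} {\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0) \geqq - {\int }_0^T\sqrt{{\mathcal {D}}(\mu ;\rho _\tau )}\,|\rho _\tau '|_{{\mathcal {T}}_\mu } \,\text {d}\tau \geqq -\frac{1}{2}{\int }_0^T {\mathcal {D}}(\mu ;\rho _\tau )\,\text {d}\tau - \frac{1}{2}{\int }_0^T |\rho _\tau '|_{{\mathcal {T}}_\mu }^2\,\text {d}\tau , \end{aligned}

which, after rearranging, is precisely $${\mathcal {G}}_T(\mu ;\rho ) \geqq 0$$.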

Note that, via Theorem 3.9, the above theorem also shows stability of weak solutions to ($${\text {NL}}^2 {\text {IE}}$$). Typically, in Theorem 3.14, $$(\mu ^n)_n$$ is a sequence of atomic measures used to approximate, or sample, the support of $$\mu$$. Indeed, we now use this approach to show the existence of weak solutions to the nonlocal nonlocal-interaction equation.

### Theorem 3.15

(Existence of weak solutions) Let K be an interaction potential satisfying Assumptions (K1)–(K3). Suppose that $$\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)$$ is finite, i.e., $$\mu ({{\mathbb {R}}^{d}})<\infty$$, and satisfies (A2). Assume furthermore that for some $$C_\eta ' > 0$$ it holds that

\begin{aligned} \sup _{(x,y) \in G \cap {{\,\mathrm{supp}\,}}\mu \otimes \mu } \left( |x-y|^2 \vee |x-y|^4 \right) \eta (x,y) \leqq C_\eta '. \end{aligned}
(3.30)

Consider $$\rho _0 \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})$$ which is $$\mu$$-absolutely continuous. Then there exists a weakly continuous curve $$\rho :[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})$$ such that $${{\,\mathrm{supp}\,}}\rho _t\subseteq {{\,\mathrm{supp}\,}}\mu$$ for all $$t\in [0,T]$$, which is a weak solution of ($${\text {NL}}^2 {\text {IE}}$$) and satisfies the initial condition $$\rho (0)=\rho _0$$.

### Proof

Let $$(\mu ^n)_n \subset {\mathcal {M}}^+({\mathbb {R}}^d)$$ be a sequence of atomic measures such that $$(\mu ^n)_n$$ converges narrowly to $$\mu$$. Moreover, assume that $$\mu ^n$$ has finitely many atoms, that $$\mu ^n({{\mathbb {R}}^{d}}) \leqq \mu ({{\mathbb {R}}^{d}})$$, and that $${{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu$$ for all $$n\in {\mathbb {N}}$$. Let $${\hat{\mu }}^n$$ be the normalization of $$\mu ^n$$ with the same total mass as $$\mu$$, that is,

\begin{aligned} {\hat{\mu }}^n = \frac{\mu ({\mathbb {R}}^d)}{\mu ^n({\mathbb {R}}^d)} \, \mu ^n , \end{aligned}

and let $$\pi ^n$$ be an optimal transport plan between $$\mu$$ and $${\hat{\mu }}^n$$ for the quadratic cost. Let $$\rho _0^n$$ be the second marginal of $${\tilde{\rho }}_0 \pi ^n$$, where $${\tilde{\rho }}_0$$ is the density of the measure $$\rho _0$$ with respect to $$\mu$$; namely, let $$\rho _0^n(A) = {\int }_{{\mathbb {R}}^d \times A} {\tilde{\rho }}_0(x) \,\text {d}\pi ^n(x,y)$$ for any Borel set $$A\subset {{\mathbb {R}}^{d}}$$. Note that $$\rho _0^n({\mathbb {R}}^d) = \rho _0({\mathbb {R}}^d)$$ and $$\rho _0^n \ll \mu ^n$$ for all $$n\in {\mathbb {N}}$$, and that, since $${\tilde{\rho }}_0 \pi ^n$$ is a transport plan between $$\rho _0$$ and $$\rho _0^n$$, we have $$\rho _0^n \rightharpoonup \rho _0$$ as $$n\rightarrow \infty$$.
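The equality of masses noted above follows directly from the marginal property of the plan: since the first marginal of $$\pi ^n$$ is $$\mu$$ and $${\tilde{\rho }}_0 = \text {d}\rho _0/\text {d}\mu$$,

\begin{aligned} \rho _0^n({{\mathbb {R}}^{d}}) = {\int }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} {\tilde{\rho }}_0(x) \,\text {d}\pi ^n(x,y) = {\int }_{{{\mathbb {R}}^{d}}} {\tilde{\rho }}_0(x) \,\text {d}\mu (x) = \rho _0({{\mathbb {R}}^{d}}). \end{aligned}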

Thanks to Assumption (3.30), it holds, for all $$n\in {\mathbb {N}}$$, that

\begin{aligned} \mathop {\mu \mathrm{-ess\,sup}}\limits _{x\in {\mathbb {R}}^d} {\int } ( |x-y|^2 \vee |x-y|^4 ) \eta (x,y)\, \text {d}\mu ^n(y)\leqq & {} \mu ^n({{\mathbb {R}}^{d}}) C_\eta ' \nonumber \\\leqq & {} \mu ({{\mathbb {R}}^{d}}) C_\eta ' . \end{aligned}
(3.31)

Since, by construction, $$\rho _0^n \ll \mu ^n$$, we have $${{\,\mathrm{supp}\,}}\rho _0^n \subseteq {{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu$$. Thanks to Proposition 2.28, this nested support property is preserved in time, so that $${{\,\mathrm{supp}\,}}\rho _t^n \subseteq {{\,\mathrm{supp}\,}}\mu$$ for all $$t\in [0,T]$$ and $$n\in {\mathbb {N}}$$. For this reason, (3.31) can be used, under the stated support restriction on $$\rho _0$$, instead of Assumption (A1), uniformly in n, when invoking Lemma 3.13 and Theorem 3.14 later in this proof. Since $$\mu ^n$$ consists of finitely many atoms and $$\mu$$ satisfies (A2), the family $$(\mu ^n)_n$$ satisfies (A2) uniformly in n.

By Remark 1.1, we know that the ODE system (1.2)–(1.4) admits a unique solution for all $$n\in {\mathbb {N}}$$. It is easily checked that this solution, which we denote by $$\rho ^n$$, is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$) with respect to $$\mu ^n$$ starting from $$\rho _0^n$$, in the sense of Definition 3.1. By Theorem 3.9, we then get that $$\rho ^n$$ is a gradient flow of $${\mathcal {E}}$$ with respect to $$\mu ^n$$ starting from $$\rho _0^n$$ for all $$n\in {\mathbb {N}}$$.

Combining the compactness part of Lemma 3.13 and the stability from Theorem 3.14, we get that, up to a subsequence, $$\rho _t^n \rightharpoonup \rho _t$$ as $$n\rightarrow \infty$$ for all $$t\in [0,T]$$, where $$\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))$$ is a gradient flow of $${\mathcal {E}}$$ with respect to $$\mu$$ starting from $$\rho _0$$. Theorem 3.9 finally shows that $$\rho$$ is a weak solution to ($${\text {NL}}^2 {\text {IE}}$$) with respect to $$\mu$$ starting from $$\rho _0$$. $$\square$$

### Remark 3.16

Assumption (3.30) is only needed to arrive at an atomic approximation sequence $$(\mu ^n)_n$$ of $$\mu$$ such that Assumptions (A1) and (A2) hold uniformly in n. On a case-by-case basis, one could drop (3.30) and try to construct the sequence $$(\mu ^n)_n$$ explicitly in such a way as to satisfy both assumptions uniformly in n.

### Remark 3.17

We conclude the section by remarking on the relevance of Theorem 3.14 to the setting of machine learning. Namely, there $$\mu$$ is the measure modeling the true data distribution, which can be assumed to be compactly supported. Let $$(x_i)_i$$ be a sequence of i.i.d. samples of $$\mu$$ and let $$\mu ^n = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}$$ be the empirical measure of the first n sample points. Assume $$(\rho ^n)_n$$ is a narrowly converging sequence of probability measures such that $${{\,\mathrm{supp}\,}}\rho ^n \subseteq \{x_1, \dots , x_n\}$$ for all $$n\in {\mathbb {N}}$$, and denote by $$\rho$$ its limit. Assume that $$\eta$$ is an edge weight kernel such that $$\mu$$ and $$\eta$$ satisfy (A2) and (3.30). Let K be an interaction kernel satisfying (K2) and (K3). Finally, let $$({\tilde{\rho }}^n)_n$$ be the sequence of solutions of ($${\text {NL}}^2 {\text {IE}}$$) in the sense of Definition 3.1 such that $${\tilde{\rho }}^n_0 = \rho ^n$$ for all $$n\in {\mathbb {N}}$$. Then, by Lemma 3.13, the sequence $$({\tilde{\rho }}_t^n)_n$$ narrowly converges along a subsequence for all $$t\in [0,T]$$, and furthermore, by Theorem 3.15, any curve $$({\tilde{\rho }}_t)_{t\in [0,T]}$$ of subsequential limits yields a solution $${\tilde{\rho }}$$ of ($${\text {NL}}^2 {\text {IE}}$$) with initial condition $$\rho$$.

### 3.5 Discussion of the Local Limit

Here we discuss at a formal level the connection between the nonlocal nonlocal-interaction equation and its limit as the graph structure localizes. We first present a very formal justification as to why we expect the solutions of ($${\text {NL}}^2 {\text {IE}}$$) to converge to the solutions of a nonlocal-interaction equation as the localizing parameter $$\varepsilon \rightarrow 0^+$$, i.e., as the edge-weight function $$\eta = \eta _\varepsilon$$ localizes. We conclude this section with an example that cautions that the formal argument cannot be justified in full generality. Proving the convergence of ($${\text {NL}}^2 {\text {IE}}$$) in the limit $$\varepsilon \rightarrow 0^+$$, under appropriate conditions, remains an intriguing open problem.

Take $$\mu ={\text {Leb}}({{\mathbb {R}}^{d}})$$ and choose $$\eta _\varepsilon$$ given by (2.2). Consider a smooth interaction potential $$K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}$$ and a compactly supported initial condition $$\rho _0$$ which has a continuous density with respect to $$\mu$$. Let $$\rho ^\varepsilon$$ be the solution of ($${\text {NL}}^2 {\text {IE}}$$) starting from $$\rho _0$$ for the edge weight function $$\eta _\varepsilon$$. Assume that $$\rho ^\varepsilon _t$$ is absolutely continuous with respect to $$\mu$$ for all t. In the following we drop the t-dependence of $$\rho ^\varepsilon$$ for brevity. From ($${\text {NL}}^2 {\text {IE}}$$), by adding and subtracting $$\rho ^\varepsilon (x) {\int }_{{\mathbb {R}}^{d}}({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_{+} \eta _\varepsilon (x,y) \,\text {d}y$$, it follows that

\begin{aligned} \partial _t \rho ^\varepsilon (x)= & {} - \rho ^\varepsilon (x) {\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}K*\rho ^\varepsilon (x,y) \eta _\varepsilon (x,y) \,\text {d}y \\&- {\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}\rho ^\varepsilon (x,y) ({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_+ \eta _\varepsilon (x,y) \,\text {d}y. \end{aligned}

Then, for almost all $$x \in {{\mathbb {R}}^{d}}$$ we have

\begin{aligned}&{\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}K*\rho ^\varepsilon (x,y) \eta _\varepsilon (x,y) \,\text {d}y \\&\quad = \frac{2(2+d)}{\varepsilon ^2} {\int }_{{\mathbb {R}}^{d}}(K*\rho ^\varepsilon (y) - K*\rho ^\varepsilon (x)) \frac{\chi _{B_\varepsilon (x)}(y)}{|B_\varepsilon |} \,\text {d}y\\&\quad = \frac{2(2+d)}{\varepsilon ^2} \left( \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (x)} K*\rho ^\varepsilon (y) \,\text {d}y - K*\rho ^\varepsilon (x) \right) . \end{aligned}

A standard calculation, using a second-order Taylor expansion, shows that the right-hand side approximates $$\Delta K*\rho ^\varepsilon (x)$$ when $$\varepsilon$$ is small, provided that derivatives of $$\rho ^\varepsilon$$ remain uniformly bounded.
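For the reader's convenience, here is the calculation in question, written for a generic smooth function $$f$$ (in our case $$f = K*\rho ^\varepsilon$$). Averaging the second-order Taylor expansion of $$f$$ over $$B_\varepsilon (x)$$, the first-order terms vanish by symmetry and the second moments of the uniform measure on the ball give

\begin{aligned} \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (x)} f(y) \,\text {d}y - f(x) = \frac{1}{2} \sum _{i,j=1}^d \partial _{ij} f(x)\, \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (0)} z_i z_j \,\text {d}z + O(\varepsilon ^4) = \frac{\varepsilon ^2}{2(2+d)}\, \Delta f(x) + O(\varepsilon ^4), \end{aligned}

since $$\frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (0)} z_i z_j \,\text {d}z = \frac{\varepsilon ^2}{2+d}\,\delta _{ij}$$. Multiplying by the prefactor $$\frac{2(2+d)}{\varepsilon ^2}$$ of $$\eta _\varepsilon$$ thus leaves $$\Delta K*\rho ^\varepsilon (x) + O(\varepsilon ^2)$$, as claimed.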

Similarly, by Taylor expanding $${\overline{\nabla }}\rho ^\varepsilon$$ and $${\overline{\nabla }}K *\rho ^\varepsilon$$ to first order and changing variable over the unit sphere while carefully tracking the positive part, one gets

\begin{aligned}&{\int }_{{\mathbb {R}}^{d}}{\overline{\nabla }}\rho ^\varepsilon (x,y) ({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_+ \eta _\varepsilon (x,y) \,\text {d}y \approx \nabla \rho ^\varepsilon (x) \cdot \nabla K*\rho ^\varepsilon (x) \\&\quad \text {for small } \varepsilon . \end{aligned}

Combining the expressions above yields

\begin{aligned} \partial _t \rho ^\varepsilon (x)\approx & {} -\rho ^\varepsilon (x) \Delta K*\rho ^\varepsilon (x) - \nabla \rho ^\varepsilon (x) \cdot \nabla K*\rho ^\varepsilon (x)\\= & {} -\nabla \cdot (\rho ^\varepsilon \nabla K*\rho ^\varepsilon )(x). \end{aligned}

This suggests that if the solutions $$\rho ^\varepsilon$$ converge as $$\varepsilon \rightarrow 0^+$$, then the limit curve $$\rho$$ is a solution of the standard nonlocal-interaction equation (3.1). A possible way to attack the local limit within the variational framework is via a stability statement similar to that of Theorem 3.14, but now with respect to the family $$(\eta _\varepsilon )_{\varepsilon >0}$$ in the limit $$\varepsilon \rightarrow 0^+$$. The next remark indicates that this will require further regularity assumptions on the interaction kernel K.

### Remark 3.18

We present an example indicating that, in certain situations, solutions of ($${\text {NL}}^2 {\text {IE}}$$) cannot be expected to converge to solutions of (3.1) as the edge weight function $$\eta _\varepsilon$$ becomes more concentrated. Namely, consider $$d=1$$, $$\Omega = (-2,2)$$ and $$\mu ={\text {Leb}}(\Omega )$$. Let $$K(x,y) = 1-e^{-|x-y|}$$ for all $$x,y\in \Omega$$ and let $$\eta$$ be a smooth, even function, positive on $$(-0.2,0.2)$$ and zero otherwise. Consider $$\rho _0 = \frac{1}{2} (\delta _{-1} + \delta _1)$$. It is straightforward to verify that $$\rho _t = \rho _0$$ for all $$t\in [0,T]$$ yields a weak solution of ($${\text {NL}}^2 {\text {IE}}$$) for all $$\varepsilon >0$$. In particular, note that the corresponding velocity field satisfies $$v(-1,y) = -(K*\rho _0(y) - K*\rho _0(-1)) \leqq 0$$ for all $$y \in (-1.2,-0.8)$$, and thus the flux from $$x=-1$$ remains zero; the same holds analogously at $$x=1$$. Therefore, one cannot expect the weak solutions for the interaction potential K to converge to weak solutions of (3.1) as $$\varepsilon \rightarrow 0^+$$. We believe that, for this particular kernel K and edge weight $$\eta$$, the problem persists for strong solutions with initial data close to $$\rho _0$$, although explicit solutions are then not available.
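The sign claim for the velocity field can be verified by direct computation. Writing $$y = -1+s$$ with $$|s|\leqq 0.2$$, we have $$K*\rho _0(y) = 1 - \frac{1}{2}\left( e^{-|y+1|}+e^{-|y-1|}\right)$$ and hence

\begin{aligned} 2\left( K*\rho _0(y) - K*\rho _0(-1)\right) = \left( 1 - e^{-|s|}\right) + \left( e^{-2} - e^{-(2-s)}\right) . \end{aligned}

For $$s\in [-0.2,0]$$ both summands are non-negative, while for $$s\in [0,0.2]$$ the right-hand side equals $$(e^{s}-1)(e^{-s}-e^{-2}) \geqq 0$$; in either case $$v(-1,y) = -(K*\rho _0(y)-K*\rho _0(-1)) \leqq 0$$.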