In this section we consider gradient flows in the spaces of probability measures \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) endowed with the nonlocal transportation quasi-metric \({\mathcal {T}}_\mu \), defined by (2.22). From now until Section 3.4 (excluded) we fix \(\mu \in {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) satisfying (A1) and (A2), unless otherwise specified. For this reason we shall use the simplifications \({\mathcal {A}}(\rho ,{\varvec{j}})\) for \({\mathcal {A}}(\mu ;\rho ,{\varvec{j}})\) and \({\mathcal {T}}\) for \({\mathcal {T}}_\mu \).
In this section we investigate the nonlocal nonlocal-interaction equation (\({\text {NL}}^2 {\text {IE}}\)) as a gradient flow with respect to the metric \({\mathcal {T}}\). We restate it in a one-line form and note that from now on we consider the external potential \(P \equiv 0\). The extension to \(P \not \equiv 0\) is straightforward; see Remark 3.2. Thus,
In the classical setting of gradient flows in the spaces of probability measures endowed with the Wasserstein metric [2, 10], the nonlocal-interaction equation
$$\begin{aligned} \partial _t\rho _t+ \nabla \cdot ( \rho _t \nabla (K * \rho _t)) = 0 \end{aligned}$$
(3.1)
is the gradient flow of the nonlocal-interaction energy
$$\begin{aligned} {\mathcal {E}}(\rho )= \frac{1}{2}{\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(x,y)\,\text {d}\rho (x)\,\text {d}\rho (y). \end{aligned}$$
(3.2)
We start by discussing the geometry of (\({\text {NL}}^2 {\text {IE}}\)) and interpret it as the gradient flow of (3.2) in the infinite-dimensional Finsler manifold of measures endowed with the Finsler metric associated to \({\mathcal {T}}\). Following this, we develop a framework of gradient flows in the quasi-metric space \({\mathcal {T}}\), which extends the setup of gradient flows in metric spaces [2] to quasi-metric spaces. In particular, we build the existence theory for (\({\text {NL}}^2 {\text {IE}}\)) based on this approach.
Above, for simplicity, (\({\text {NL}}^2 {\text {IE}}\)) was written for \(\rho \ll \mu \), where we recall that we used the notation \(\rho \) to denote both the measure and the density with respect to \(\mu \). Our framework, however, also applies to the case when \(\rho \) is not absolutely continuous with respect to \(\mu \). The general weak form of (\({\text {NL}}^2 {\text {IE}}\)) is obtained in terms of the nonlocal continuity equation as introduced in Section 2.3. Specifically, we have
Definition 3.1
A curve \(\rho :[0,T]\rightarrow {\mathcal {P}}_2({\mathbb {R}}^d)\) is called a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) if, for the flux \({\varvec{j}}:[0,T]\rightarrow {\mathcal {M}}(G)\) defined by
$$\begin{aligned} \text {d}{\varvec{j}}_t(x,y)={\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_- \text {d}\rho _t(x)\text {d}\mu (y)-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_+ \text {d}\rho _t(y)\text {d}\mu (x), \end{aligned}$$
the pair \((\rho ,{\varvec{j}})\) is a weak solution to the continuity equation
$$\begin{aligned} \partial _t\rho _t+{\overline{\nabla }}\cdot {\varvec{j}}_t=0 \qquad \text {on}\ [0,T]\times {{\mathbb {R}}^{d}}, \end{aligned}$$
according to Definition 2.14.
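To make Definition 3.1 concrete, the following minimal Python sketch evaluates the flux on a finite point set and performs explicit Euler steps of the continuity equation. It is an illustration only, under assumptions that are not part of the text: \(\mu =\sum _a m_a\delta _{x_a}\) and \(\rho _t=\sum _a r_a\delta _{x_a}\) are finite sums of Dirac masses, \(\frac{\delta {\mathcal {E}}}{\delta \rho }=K*\rho \) (that is, \(P\equiv 0\)), the divergence acts as \(({\overline{\nabla }}\cdot {\varvec{j}})(x_a)=\sum _b \eta (x_a,x_b)\,{\varvec{j}}(x_a,x_b)\), and the choices \(K(x,y)=|x-y|\), \(\eta (x,y)=\chi _{\{0<|x-y|\leqq 1.5\}}\), the time step and all function names are hypothetical.

```python
def upwind_flux(K, m, r):
    """Discrete analogue of the flux in Definition 3.1 on a finite point set."""
    n = len(r)
    # First variation of the energy: (K * rho)(x_a) = sum_c K[a][c] * r[c].
    pot = [sum(K[a][c] * r[c] for c in range(n)) for a in range(n)]
    j = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            v = pot[b] - pot[a]  # nonlocal gradient of K * rho at (x_a, x_b)
            # v_- * rho(x_a) * mu(x_b)  -  v_+ * rho(x_b) * mu(x_a)
            j[a][b] = max(-v, 0.0) * r[a] * m[b] - max(v, 0.0) * r[b] * m[a]
    return j


def euler_step(K, eta, m, r, dt):
    """One explicit Euler step of d r_a / dt = -sum_b eta[a][b] * j[a][b]."""
    n = len(r)
    j = upwind_flux(K, m, r)
    return [r[a] - dt * sum(eta[a][b] * j[a][b] for b in range(n)) for a in range(n)]


def energy(K, r):
    """Interaction energy (3.2) for a sum of Dirac masses."""
    n = len(r)
    return 0.5 * sum(K[a][b] * r[a] * r[b] for a in range(n) for b in range(n))


def simulate(steps=50, dt=0.01):
    """Run the scheme on four points of the real line (hypothetical test data)."""
    xs = [0.0, 1.0, 2.0, 3.0]
    K = [[abs(x - y) for y in xs] for x in xs]
    eta = [[1.0 if 0.0 < abs(x - y) <= 1.5 else 0.0 for y in xs] for x in xs]
    m = [1.0] * len(xs)
    r = [0.1, 0.2, 0.3, 0.4]
    energies = [energy(K, r)]
    for _ in range(steps):
        r = euler_step(K, eta, m, r, dt)
        energies.append(energy(K, r))
    return r, energies
```

Because the discrete flux is antisymmetric and \(\eta \) is symmetric, the total mass is preserved by each step; for this particular choice of data the energy at the final step is lower than at the start.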
Here we list the assumptions on the interaction kernel \(K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) we refer to throughout this section:
-
(K1) \(K\in C({{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}})\);
-
(K2) K is symmetric, i.e., \(K(x,y)=K(y,x)\) for all \((x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\);
-
(K3) K is L-Lipschitz near the diagonal and at most quadratic far away, that is there exists some \(L\in (0,\infty )\) such that, for all \((x,y),(x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\),
$$\begin{aligned} |K(x,y)-K(x',y')|\leqq L\left( |(x,y)-(x',y')|\vee |(x,y)-(x',y')|^2\right) . \end{aligned}$$
Remark 3.2
Assumption (K3) implies that, for some \(C >0\) and all \(x,y\in {{\mathbb {R}}^{d}}\),
$$\begin{aligned} |K(x,y) | \leqq C \left( 1+ |x |^2 + |y |^2\right) ; \end{aligned}$$
(3.3)
indeed, for fixed \((x',y')\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\), (K3) yields
$$\begin{aligned} |K(x,y) | - |K(x',y') | \leqq L \left( 1 \vee 2\left( |(x,y)|^2 + |(x',y')|^2\right) \right) , \end{aligned}$$
and bounding the maximum (\(\vee \)) by the sum, we arrive at \(|K(x,y) | \leqq L +2 L \left( |(x',y')|^2 + |(x,y)|^2\right) + |K(x',y') |\), which gives (3.3) with \(C=2L\bigl (1+|(x',y')|^2\bigr ) + |K(x',y') |\). We notice, by the way, that the bound (3.3) implies that \({\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}\) is proper with domain equal to \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\).
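The step from (K3) to the inequality displayed in this remark can be spelled out as follows, writing \(z=(x,y)\) and \(z'=(x',y')\) (a shorthand introduced only for this display). Since \(a\vee a^2\leqq 1\vee a^2\) for all \(a\geqq 0\) and \(|z-z'|^2\leqq 2|z|^2+2|z'|^2\), we have

$$\begin{aligned} |z-z'|\vee |z-z'|^2 \leqq 1\vee |z-z'|^2 \leqq 1\vee 2\left( |z|^2+|z'|^2\right) , \end{aligned}$$

and the claim follows from \(|K(z)|-|K(z')|\leqq |K(z)-K(z')|\).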
As mentioned previously, the theory in this section can be easily extended to energies of the form (1.5) including potential energies \({\mathcal {E}}_P(\rho )={\int }_{{\mathbb {R}}^{d}}P \,\text {d}{\rho }\) for some external potential \(P:{\mathbb {R}}^d \rightarrow {\mathbb {R}}\) satisfying a local Lipschitz condition with at-most-quadratic growth at infinity; that is, similarly to (K3), there exists \(L\in (0,\infty )\) so that for all \(x,y\in {\mathbb {R}}^d\) we have
$$\begin{aligned} |P(x)-P(y) | \leqq L \left( |x-y|\vee |x-y|^2\right) . \end{aligned}$$
We now show that, under the above assumptions on the interaction potential K, we have narrow continuity of the energy.
Proposition 3.3
(Continuity of the energy) Let the interaction potential K satisfy Assumptions (K1)–(K3). Then, for any sequence \((\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \(\rho ^n \rightharpoonup \rho \) as \(n\rightarrow \infty \) for some \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), we have
$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho ^n) = {\mathcal {E}}(\rho ). \end{aligned}$$
Proof
Let \((\rho ^n)_n \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) be such that \(\rho ^n \rightharpoonup \rho \) as \(n\rightarrow \infty \). For all \(R>0\), we write \({{\overline{B}}}_R\) for the closed ball of radius R centered at the origin in \(({{\mathbb {R}}^{d}})^2\) and \(\varphi _R :({{\mathbb {R}}^{d}})^2 \rightarrow {\mathbb {R}}\) for a continuous function such that \(\varphi _R(z) = 1\) for all \(z\in {{\overline{B}}}_R\), \(\varphi _R(z) = 0\) for all \(z\in ({{\mathbb {R}}^{d}})^2 {\setminus } {{\overline{B}}}_{2R}\), and \(0\leqq \varphi _R(z) \leqq 1\) for all \(z\in ({{\mathbb {R}}^{d}})^2\). For all \(R>0\), we then set \(K_R = \varphi _R K\) and
$$\begin{aligned} {\mathcal {E}}_R(\nu ) = \frac{1}{2}{\iint }_{{{\mathbb {R}}^{d}}\times {\mathbb {R}}^d} K_R(x,y)\,\text {d}\nu (y)\,\text {d}\nu (x) \quad \text{ for } \text{ all } \, \nu \in {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}$$
Since \((\rho ^n)_n\) converges narrowly to \(\rho \) as \(n\rightarrow \infty \) and \(K_R\) is bounded and continuous, we get
$$\begin{aligned} {\mathcal {E}}_R(\rho ^n) \rightarrow {\mathcal {E}}_R(\rho ) \quad \text{ as } \, n\rightarrow \infty . \end{aligned}$$
Furthermore, since \(K_R \rightarrow K\) pointwise as \(R\rightarrow \infty \), \(|K_R| \leqq |K|\) for all \(R>0\), the domain of \({\mathcal {E}}\) is \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), we also have
$$\begin{aligned} {\mathcal {E}}_R(\rho ) \rightarrow {\mathcal {E}}(\rho ) \quad \text {as} \, R\rightarrow \infty \end{aligned}$$
by the Lebesgue dominated convergence theorem. Similarly, we also have
$$\begin{aligned} {\mathcal {E}}_R(\rho ^n) \rightarrow {\mathcal {E}}(\rho ^n) \quad \text{ as } \, R\rightarrow \infty \, \text {for all}\, n\in {\mathbb {N}}. \end{aligned}$$
By a diagonal argument, we deduce the result. \(\square \)
Identification of the Gradient in Finsler Geometry
Since the nonlocal upwind transportation cost \({\mathcal {T}}\) is only a quasi-metric, the underlying structure of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) does not have the formal Riemannian structure as it does in the classical gradient flow theory, but a Finslerian structure instead. This highlights the fact that at every point \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) the tangent space \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is not a Euclidean space, but rather a manifold in its own right.
In this section we provide calculations, in the spirit of Otto’s calculus, that characterize the gradient descent in the infinite-dimensional Finsler manifold of probability measures endowed with the nonlocal transportation quasi-metric \({\mathcal {T}}\). To keep the following considerations simple, we assume that \(\rho \) is a given probability measure which is absolutely continuous with respect to \(\mu \). In this way, we avoid the need to introduce yet another measure \(\lambda \in {\mathcal {M}}^+(G)\) with respect to which all of the occurring measures are absolutely continuous, similar to how we proceeded in Definition 2.3 for the action. This restriction is done solely to make the presentation clearer and highlight the geometric structure. Hence any flux \({\varvec{j}}\) of interest is absolutely continuous with respect to \(\mu \otimes \mu \) and we can think of \({\varvec{j}}\) via its density with respect to \(\mu \otimes \mu \), which we shall denote by j (using a letter which is not bold).
At every tangent flux \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) we define an inner product \(g_{\rho ,{\varvec{j}}}:T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \times T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow {\mathbb {R}}\) by
$$\begin{aligned} g_{\rho ,{\varvec{j}}}({\varvec{j}}_1,{\varvec{j}}_2)&= \frac{1}{2}{\iint }_G j_1(x,y)\,j_2(x,y)\, \eta (x,y) \nonumber \\&\quad \times \left( \frac{\chi _{\{j>0\}}(x,y)}{\rho (x)} + \frac{\chi _{\{j<0\}}(x,y)}{\rho (y)} \right) \,\text {d}\mu (x) \,\text {d}\mu (y), \end{aligned}$$
(3.4)
where \(\{j>0\}\) is an abbreviation for \(\{(x,y) \in G :j(x,y)>0\}\) and similarly for \(\{j<0\}\). The ratios are well-defined since \(\rho \) cannot be zero where j is not zero. We note that this is the bilinear form that corresponds to the quadratic form defining the action (see Definition 2.3 and Remark 2.5); namely,
$$\begin{aligned} g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) = {\mathcal {A}}(\mu ; \rho , {\varvec{j}}). \end{aligned}$$
We refer the reader to “Appendix A” for a derivation of this inner product from a Minkowski norm on \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) as it is required in Finsler geometry. We recall that from Proposition 2.26 a dense subset of tangent-fluxes \({\varvec{j}}\) are characterized by the existence of a potential \(\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) such that, for \(\mu \otimes \mu \)-a.e. \((x,y) \in G\),
$$\begin{aligned} j(x,y) = {\overline{\nabla }}\varphi (x,y) \left( \rho (x) \chi _{\{{\overline{\nabla }}\varphi >0\}}(x,y) + \rho (y) \chi _{\{{\overline{\nabla }}\varphi <0\}}(x,y) \right) . \end{aligned}$$
(3.5)
In this Finsler setting, we now want to determine the direction of steepest descent from \(\rho \), for the underlying energy defined in (3.2). The gradient vector of some energy \({\mathcal {E}}:{\mathcal {P}}({\mathbb {R}}^d)\rightarrow {\mathbb {R}}\) at \(\rho \), which we denote by \({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\), is defined as the tangent vector which satisfies
$$\begin{aligned} {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] = g_{\rho ,{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )}\bigl ({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho ), {\varvec{j}}\bigr ) \qquad \text {for all} \, {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}), \end{aligned}$$
provided this vector exists and is unique. Here, we use the continuity equation from Definition 2.14 to define variations via
$$\begin{aligned} {{\,\mathrm{Diff}\,}}_{\rho }{\mathcal {E}}[{\varvec{j}}] = \left. \frac{\text {d}}{\text {d}t}\right| _{t=0} {\mathcal {E}}({\tilde{\rho }}_t), \end{aligned}$$
where \({\tilde{\rho }}\) is any curve such that \({\tilde{\rho }}_0= \rho \) and \(\left. \frac{\text {d}}{\text {d}t}\right| _{t=0}{\tilde{\rho }}_t = - {\overline{\nabla }}\cdot {\varvec{j}}\). From Definition 2.7, due to \(\mu \otimes \mu \)-absolute continuity of \({\varvec{j}}\) we have that
$$\begin{aligned} -{\overline{\nabla }}\cdot {\varvec{j}}(x) = -{\int } \eta (x,y) j(x,y)\, \text {d}{\mu }(y) \qquad \text {for} \, \mu \text {-a.e.}\, x \in {{\mathbb {R}}^{d}}. \end{aligned}$$
In the case when \({\mathcal {M}}\) is a finite-dimensional Finsler manifold, such a gradient vector exists and is unique since the mapping \(\ell :T_\rho {\mathcal {M}}\rightarrow (T_{\rho }{\mathcal {M}})^*,\, {\varvec{j}}\mapsto g_{\rho ,{\varvec{j}}}({\varvec{j}},\cdot )\), is a bijection; see [18, Proposition 1.9]. For further details on Finsler geometry, we refer the reader to [4, 49]. In our case, we can at least claim that the functional \(\ell _\rho :T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}) \rightarrow (T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}))^*\), given for \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) by
$$\begin{aligned} {\varvec{j}}_2\mapsto \ell _\rho ({\varvec{j}})({\varvec{j}}_2)&= g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}_2) \nonumber \\&= \frac{1}{2}{\iint }_G j_2(x,y) \, \eta (x,y) \left( \frac{j(x,y)_+}{\rho (x)}-\frac{j(x,y)_-}{\rho (y)} \right) \,\text {d}\mu (x)\, \text {d}\mu (y), \end{aligned}$$
(3.6)
is injective \(\eta \, \mu \otimes \mu \)-a.e.; that is, the existence of a gradient implies its uniqueness (\(\eta \, \mu \otimes \mu \)-a.e.), in which case we have
$$\begin{aligned} \ell _\rho ({{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )) = {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}. \end{aligned}$$
To see the injectivity of (3.6), we first note that \(\ell _\rho \) is positively 1-homogeneous by definition. Moreover, we have the following one-sided version of a Cauchy–Schwarz-type estimate
$$\begin{aligned} \ell _\rho ({\varvec{j}})({\varvec{j}}_2)&\leqq \frac{1}{2}{\iint }_G \frac{ j_2(x,y)_+ j(x,y)_+}{\rho (x)} \eta (x,y) \,\text {d}\mu (x)\, \text {d}\mu (y) \nonumber \\&\quad + \frac{1}{2}{\iint }_G \frac{ j_2(x,y)_- j(x,y)_-}{\rho (y)} \eta (x,y) \,\text {d}\mu (x)\, \text {d}\mu (y) \nonumber \\&\leqq \sqrt{\ell _\rho ({\varvec{j}})({\varvec{j}}) \, \ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2)}. \end{aligned}$$
(3.7)
Here, we also used that \(\sqrt{ab}+\sqrt{cd}\leqq \sqrt{(a+c)(b+d)}\) for all \(a,b,c,d>0\). Note that the above inequalities become strict if either of the integrands \(j_2(x,y)_+ j(x,y)_-\) or \(j_2(x,y)_- j(x,y)_+\) has a nonzero contribution. In particular, we could have \(\ell _\rho ({\varvec{j}})({\varvec{j}}_2)=-\infty \) although the right-hand side is finite. Despite this, we still have equality in (3.7) if and only if \({\varvec{j}}_2 = \beta {\varvec{j}}\) \(\eta \, \mu \otimes \mu \)-a.e. for some \(\beta \geqq 0\).
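The one-sided estimate (3.7) can be checked numerically in a finite setting. The sketch below is only an illustration under assumptions not made in the text: \(\mu \) is supported on n points with weights `m`, the density `rho` is strictly positive, \(\eta \) is a symmetric nonnegative matrix, and the fluxes are random antisymmetric matrices; all function names are hypothetical.

```python
import math
import random


def ell(rho, eta, m, j, j2):
    """Discrete analogue of (3.6): the value ell_rho(j)(j2) on n points."""
    n = len(rho)
    total = 0.0
    for a in range(n):
        for b in range(n):
            jp = max(j[a][b], 0.0)   # positive part j(x, y)_+
            jm = max(-j[a][b], 0.0)  # negative part j(x, y)_-
            total += j2[a][b] * eta[a][b] * (jp / rho[a] - jm / rho[b]) * m[a] * m[b]
    return 0.5 * total


def random_antisymmetric(n, rng):
    """Random antisymmetric matrix with entries in [-1, 1]."""
    j = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            j[a][b] = rng.uniform(-1.0, 1.0)
            j[b][a] = -j[a][b]
    return j


def smallest_gap(trials=200, n=5, seed=0):
    """Smallest observed value of sqrt(ell(j)(j) * ell(j2)(j2)) - ell(j)(j2)."""
    rng = random.Random(seed)
    gap = float("inf")
    for _ in range(trials):
        rho = [rng.uniform(0.1, 1.0) for _ in range(n)]
        m = [rng.uniform(0.5, 2.0) for _ in range(n)]
        eta = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(a + 1, n):
                eta[a][b] = eta[b][a] = rng.uniform(0.0, 1.0)
        j = random_antisymmetric(n, rng)
        j2 = random_antisymmetric(n, rng)
        lhs = ell(rho, eta, m, j, j2)
        rhs = math.sqrt(ell(rho, eta, m, j, j) * ell(rho, eta, m, j2, j2))
        gap = min(gap, rhs - lhs)
    return gap
```

Up to rounding, `smallest_gap()` is nonnegative, and the gap closes when the second flux is a positive multiple of the first, in line with the equality case stated above.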
To prove the injectivity of \(\ell _\rho \), let us suppose that \({\varvec{j}}_1, {\varvec{j}}_2 \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) are so that \(\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)\). If \({\varvec{j}}_1 = 0\) or \({\varvec{j}}_2 = 0\) \(\eta \, \mu \otimes \mu \)-a.e., then \(\ell _\rho ({\varvec{j}}_1) = \ell _\rho ({\varvec{j}}_2)\) implies that \({\varvec{j}}_1 = {\varvec{j}}_2 = 0\). If both \({\varvec{j}}_1\) and \({\varvec{j}}_2\) are nonzero, then by the above Cauchy–Schwarz inequality we get
$$\begin{aligned} 0< & {} g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_1)({\varvec{j}}_2) = g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_2) \\\leqq & {} \sqrt{ g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) }, \end{aligned}$$
which, after dividing by \(\sqrt{g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)}\) yields \(g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) \leqq g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)\). Similarly, one gets \(g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1) \leqq g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)\), from which we get
$$\begin{aligned} g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1) = g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2). \end{aligned}$$
Hence
$$\begin{aligned} g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_2)= & {} \ell _\rho ({\varvec{j}}_1)({\varvec{j}}_2) = \ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) = g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2) \\= & {} \sqrt{g_{\rho ,{\varvec{j}}_1}({\varvec{j}}_1,{\varvec{j}}_1)g_{\rho ,{\varvec{j}}_2}({\varvec{j}}_2,{\varvec{j}}_2)}, \end{aligned}$$
which is the equality case in the Cauchy–Schwarz inequality. Therefore, there exists \(\beta \geqq 0\) such that \({\varvec{j}}_2 = \beta {\varvec{j}}_1\). By positive 1-homogeneity of \(\ell _\rho \) we get \(\ell _\rho ({\varvec{j}}_2) = \ell _\rho (\beta {\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_1) = \beta \ell _\rho ({\varvec{j}}_2)\), so that \(\beta = 1\), since \(\ell _\rho ({\varvec{j}}_2)({\varvec{j}}_2) \ne 0\). This ends the proof of the claim of injectivity of \(\ell _\rho \).
The direction of the steepest descent on Finsler manifolds is in general not \(-{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\), but is defined to be the tangent flux, which we denote by \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\), such that
$$\begin{aligned} -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] = g_{\rho ,{{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\,}({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ), {\varvec{j}}) \qquad \text {for all} \, {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}$$
In other words, we define \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\) as the tangent vector (provided it exists) such that
$$\begin{aligned} \ell _\rho ({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )) = -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}. \end{aligned}$$
(3.8)
Here we clearly see that in general \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ) \ne -{{\,\mathrm{grad}\,}}{\mathcal {E}}(\rho )\) since \(\ell _\rho \) is not negatively 1-homogeneous. We can justify that \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\) indeed corresponds to the direction of steepest descent at \(\rho \) via the following criterion, which is analogous to the Riemannian case. We first note that if \({{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}= 0\) then \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )=0\). If \({{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\ne 0\) we note that minimizers \({\varvec{j}}^*\) of
$$\begin{aligned} {\varvec{j}}\mapsto {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}], \qquad \text {with the constraint that} \, g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) = 1, \end{aligned}$$
are of the form \({\varvec{j}}^* = \beta {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\) for some \(\beta >0\). Indeed, using the fact that \(\left. \frac{\text {d}}{\text {d}s}\right| _{s=0}g_{\rho ,{\varvec{j}}+ s{\varvec{j}}_1}({\varvec{j}}+s{\varvec{j}}_1,{\varvec{j}}+ s{\varvec{j}}_1) = 2g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}_1)\) for all \({\varvec{j}},{\varvec{j}}_1\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) [as shown in (A.1) of “Appendix A”] and using the Lagrange multiplier \(\beta \) and the functional
$$\begin{aligned} H(\beta ,{\varvec{j}}) := {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}] + \tfrac{\beta }{2} (g_{\rho ,{\varvec{j}}}({\varvec{j}},{\varvec{j}}) -1), \qquad {\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}}),\quad \beta \in {\mathbb {R}}, \end{aligned}$$
yields, for a constrained minimizer \({\varvec{j}}^*\), the condition
$$\begin{aligned} {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}= - \beta ^* g_{\rho ,{\varvec{j}}^*}({\varvec{j}}^*,\cdot ) = - \beta ^* \ell _\rho ({\varvec{j}}^*). \end{aligned}$$
(3.9)
By the definition of \({\varvec{j}}^*\) we have \(0> {{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}^*] = - \beta ^* g_{\rho ,{\varvec{j}}^*}({\varvec{j}}^*,{\varvec{j}}^*)\), which implies that \(\beta ^*>0\). By injectivity and positive 1-homogeneity of \(\ell _\rho \), we get
$$\begin{aligned} {\varvec{j}}^* = \ell _\rho ^{-1}\left( -\frac{1}{\beta ^*}{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\right) = \frac{1}{\beta ^*} \ell _\rho ^{-1}(-{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}) = \frac{1}{\beta ^*} {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho ). \end{aligned}$$
The gradient flows with respect to \({\mathcal {E}}\) in the Finsler space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\) can thus be written
$$\begin{aligned} \partial _t \rho _t = -{\overline{\nabla }}\cdot {{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho _t). \end{aligned}$$
(3.10)
These considerations stay valid for general energy functionals \({\mathcal {E}}:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow {\mathbb {R}}\).
Let us compute the gradient flux for the specific case of the interaction energy (3.2). A direct computation using the symmetry of K and Definition 2.7 gives, for all \({\varvec{j}}\in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\),
$$\begin{aligned} -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}]&= \frac{1}{2}{\iint }_G \bigl (-{\overline{\nabla }}(K*\rho )\bigr )(x,y) \, \eta (x,y) \, j(x,y) \,\text {d}\mu (x)\,\text {d}\mu (y) \\&= \frac{1}{2}{\iint }_G j(x,y) \, \eta (x,y) \\&\quad \times \left( \frac{\rho (x) \bigl (-{\overline{\nabla }}(K*\rho )\bigr )_+(x,y)}{\rho (x)} - \frac{\rho (y) \bigl (-{\overline{\nabla }}(K*\rho )\bigr )_-(x,y)}{\rho (y)} \right) \,\text {d}\mu (x)\,\text {d}\mu (y) \\&= \frac{1}{2}{\iint }_G j(x,y) \, \eta (x,y) \bigl (-{\overline{\nabla }}(K*\rho )(x,y)\bigr ) \\&\quad \times \left( \frac{\rho (x) \chi _{\{-{\overline{\nabla }}K*\rho >0\}} (x,y)}{\rho (x)} + \frac{\rho (y) \chi _{\{-{\overline{\nabla }}K*\rho <0\}}(x,y)}{\rho (y)} \right) \,\text {d}\mu (x)\,\text {d}\mu (y) \\&= \ell _{\rho }\bigl ({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\bigr )({\varvec{j}}) , \end{aligned}$$
where by comparison with (3.6), we observe that \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\) is given for \(\mu \otimes \mu \)-a.e. \((x,y) \in G\) by
$$\begin{aligned}&{{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )(x,y) \nonumber \\&\quad = -{\overline{\nabla }}(K*\rho )(x,y)\left( \rho (x)\chi _{\{-{\overline{\nabla }}K*\rho >0\}}(x,y) + \rho (y)\chi _{\{-{\overline{\nabla }}K*\rho <0\}}(x,y) \right) . \end{aligned}$$
(3.11)
This shows by (3.8) the existence and by our previous argument also uniqueness of \({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )\). It is easily observed that it has exactly the form (3.5) with the corresponding potential given by \(\varphi = -K*\rho \).
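The identity \(\ell _\rho ({{\,\mathrm{grad}\,}}^- {\mathcal {E}}(\rho )) = -{{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}\) can be sanity-checked on a finite point set. The following sketch is a hedged illustration, not part of the argument: it assumes \(\mu \) is supported on four points with weights `m`, a strictly positive density `rho`, an antisymmetric test flux `j`, and it approximates \({{\,\mathrm{Diff}\,}}_\rho {\mathcal {E}}[{\varvec{j}}]\) by a central difference along \(\rho - s\,{\overline{\nabla }}\cdot {\varvec{j}}\) (exact up to rounding here, since the energy is quadratic). The data and all function names are hypothetical.

```python
def potential(K, m, rho):
    """(K * rho)(x_a) = sum_c K[a][c] * rho[c] * m[c] for a density rho w.r.t. mu."""
    n = len(rho)
    return [sum(K[a][c] * rho[c] * m[c] for c in range(n)) for a in range(n)]


def energy(K, m, rho):
    """Interaction energy (3.2) for a density rho with respect to mu."""
    pot = potential(K, m, rho)
    return 0.5 * sum(pot[a] * rho[a] * m[a] for a in range(len(rho)))


def divergence(eta, m, j):
    """Nonlocal divergence of a flux density: sum_b eta[a][b] * j[a][b] * m[b]."""
    n = len(j)
    return [sum(eta[a][b] * j[a][b] * m[b] for b in range(n)) for a in range(n)]


def grad_minus(K, m, rho):
    """Discrete analogue of (3.11)."""
    n = len(rho)
    pot = potential(K, m, rho)
    G = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            w = pot[a] - pot[b]  # equals -(nonlocal gradient of K * rho)(x_a, x_b)
            if w > 0.0:
                G[a][b] = w * rho[a]
            elif w < 0.0:
                G[a][b] = w * rho[b]
    return G


def ell(rho, eta, m, j, j2):
    """Discrete analogue of (3.6): the value ell_rho(j)(j2)."""
    n = len(rho)
    total = 0.0
    for a in range(n):
        for b in range(n):
            jp, jm = max(j[a][b], 0.0), max(-j[a][b], 0.0)
            total += j2[a][b] * eta[a][b] * (jp / rho[a] - jm / rho[b]) * m[a] * m[b]
    return 0.5 * total


def diff_energy(K, eta, m, rho, j, s=1e-3):
    """Central difference of the energy along rho - s * div(j)."""
    div = divergence(eta, m, j)
    plus = [rho[a] - s * div[a] for a in range(len(rho))]
    minus = [rho[a] + s * div[a] for a in range(len(rho))]
    return (energy(K, m, plus) - energy(K, m, minus)) / (2.0 * s)


def example():
    """Return (ell_rho(grad^- E)(j), Diff_rho E[j]) for hypothetical test data."""
    xs = [0.0, 1.0, 2.0, 3.0]
    K = [[abs(x - y) for y in xs] for x in xs]
    eta = [[1.0 if 0.0 < abs(x - y) <= 1.5 else 0.0 for y in xs] for x in xs]
    m = [1.0, 0.5, 2.0, 1.0]
    rho = [0.3, 0.6, 0.1, 0.2]  # total mass sum(rho[a] * m[a]) equals 1
    j = [[0.0, 0.3, 0.1, 0.0],
         [-0.3, 0.0, -0.2, 0.4],
         [-0.1, 0.2, 0.0, 0.5],
         [0.0, -0.4, -0.5, 0.0]]
    return ell(rho, eta, m, grad_minus(K, m, rho), j), diff_energy(K, eta, m, rho, j)
```

In this example the two returned numbers sum to zero up to rounding, which is the discrete counterpart of (3.8).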
We conclude this section by mentioning that the Finsler gradient flow structure of differential equations has been discovered and investigated in other systems; see [1, 41, 42].
Variational Characterization for the Nonlocal Nonlocal-Interaction Equation
Section 3.1 shows that the nonlocal nonlocal-interaction equation (\({\text {NL}}^2 {\text {IE}}\)) can in fact be written as the gradient descent of the energy \({\mathcal {E}}\) according to the Finsler gradient operator; see (3.10) and (3.11). This is why we refer to weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) as gradient flows.
In this section we consider \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\) as a quasi-metric space rather than a Finsler manifold, which allows us to prove rigorous statements more easily. In particular, we show that the weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) are curves of maximal slope for the energy (3.2) in the quasi-metric space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\) and vice versa. We then establish the existence and stability of gradient flows using the variational framework of curves of maximal slope. To develop the variational formulation, we adapt the approach of [2], developed for curves of maximal slope in metric spaces, to the quasi-metric space \(({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\). This requires introducing one-sided versions of the usual concepts from [2] to cope with the asymmetry of the quasi-metric \({\mathcal {T}}\).
Definition 3.4
(One-sided strong upper gradient) A function \(h:{\mathcal {P}}_2({{\mathbb {R}}^{d}})\rightarrow [0,\infty ]\) is a one-sided strong upper gradient for \({\mathcal {E}}\) if for every \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) the function \(h\circ \rho \) is Borel and
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) \geqq - {\int }_s^th(\rho _\tau )|\rho _\tau '|\,\text {d}\tau \quad \; \text{ for } \text{ all } \, 0\leqq s\leqq t\leqq T, \end{aligned}$$
(3.12)
where \(|\rho '|\) is the metric derivative of \(\rho \) as defined in (2.25).
The above one-sided definition is sufficient to characterize the curves of maximal slope.
Definition 3.5
(Curve of maximal slope) A curve \(\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))\) is a curve of maximal slope for \({\mathcal {E}}\) with respect to its one-sided strong upper gradient h if and only if \(t\mapsto {\mathcal {E}}(\rho _t)\) is non-increasing and
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \Bigl ( h(\rho _\tau )^2+|\rho _\tau '|^2 \Bigr ) \,\text {d}\tau \leqq 0 \quad \text{ for } \text{ all }\; 0\leqq s\leqq t\leqq T.\nonumber \\ \end{aligned}$$
(3.13)
Remark 3.6
Note that by using Young’s inequality in (3.12), we get
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \Bigl ( h(\rho _\tau )^2+|\rho _\tau '|^2 \Bigr ) \,\text {d}\tau \geqq 0 \quad \; \text{ for } \text{ all } \, 0\leqq s\leqq t\leqq T. \end{aligned}$$
Hence, if the curve \((\rho _t)_{t\in [0,T]}\) is a curve of maximal slope for \({\mathcal {E}}\) with respect to its one-sided strong upper gradient h, we actually have an equality in (3.13).
Therefore, in order to give a variational characterization of (\({\text {NL}}^2 {\text {IE}}\)) we need to detect the right one-sided strong upper gradient. As shown in [24], the variation of the energy along the solution to the equation provides the suitable candidate. In what follows we clarify this point as well as the strategy.
We recall that Proposition 2.25 ensures that for any \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T]; ({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\) there exists a unique flux \(({\varvec{j}}_t)_{t\in [0,T]}\) in \(T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \({\int }_0^T{\mathcal {A}}(\rho _t,{\varvec{j}}_t)\,\text {d}{t}<\infty \), \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) and \(|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). Moreover, according to Lemma 2.6 there exists an antisymmetric measurable vector field \(w:[0,T]\times G \rightarrow {\mathbb {R}}\) such that
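For completeness, the Young-inequality step in Remark 3.6 reads as follows; it uses only (3.12) and \(ab\leqq \frac{1}{2}(a^2+b^2)\) for \(a,b\geqq 0\):

$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) \geqq -{\int }_s^t h(\rho _\tau )|\rho _\tau '|\,\text {d}\tau \geqq -\frac{1}{2}{\int }_s^t \Bigl ( h(\rho _\tau )^2+|\rho _\tau '|^2 \Bigr ) \,\text {d}\tau \quad \text {for all}\; 0\leqq s\leqq t\leqq T. \end{aligned}$$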
$$\begin{aligned} \text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y). \end{aligned}$$
(3.14)
It will be convenient to work directly with this vector field \((w_t)_{t\in [0,T]}\): from now on we write \((\rho ,w)\in {{\,\mathrm{CE}\,}}_T\) for \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) as well as \({\widehat{{\mathcal {A}}}}(\rho _t,w_t)\) for \({\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) according to (2.8). With this convention, we can define a Finsler-type product on velocities in analogy to (3.4) as
$$\begin{aligned} {\widehat{g}}_{\rho ,w}(u,v)&= \frac{1}{2}{\iint }_G u(x,y)\,v(x,y)\, \eta (x,y) \\&\quad \times \big (\chi _{\{w>0\}}(x,y)\,\text {d}\gamma _1(x,y) + \chi _{\{w<0\}}(x,y) \,\text {d}\gamma _2(x,y)\big ). \end{aligned}$$
Note that, under the absolute-continuity assumptions of Section 3.1, by comparing with (3.4) we have that \({\widehat{g}}_{\rho ,w}(u,v)= g_{\rho ,{\varvec{j}}}({\varvec{j}}_1,{\varvec{j}}_2)\), where \({\varvec{j}}_1,{\varvec{j}}_2\) are obtained from u, v by (3.14), respectively. Moreover, taking (3.6) into account, we also define
$$\begin{aligned} {\widehat{\ell }}_{\rho }(w)(v) = {\widehat{g}}_{\rho ,w}(w,v) . \end{aligned}$$
(3.15)
Arguing as in (3.7), we arrive at the following one-sided Cauchy–Schwarz inequality:
Lemma 3.7
(One-sided Cauchy–Schwarz inequality) For all \(v,w \in T_\rho {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) it holds that
$$\begin{aligned} {\widehat{g}}_{\rho ,w}(w,v) \leqq \sqrt{ {\widehat{g}}_{\rho ,v}(v,v) \, {\widehat{g}}_{\rho ,w}(w,w)}, \end{aligned}$$
(3.16)
with equality if and only if, for some \(\lambda >0\), \(v(x,y)_+= \lambda w(x,y)_+\) for \(\eta \, \rho \otimes \mu \)-a.e. \((x,y)\in G\) (and thus, by antisymmetry, also \(v(x,y)_-= \lambda w(x,y)_-\) for \(\eta \, \mu \otimes \rho \)-a.e. \((x,y)\in G\)).
Proof
Using \(v=v_+-v_-\) and the usual Cauchy–Schwarz inequality in \(L^2(\eta \,\rho \otimes \mu )\), we get
$$\begin{aligned} {\widehat{g}}_{\rho ,w}(w,v)&= \frac{1}{2}{\iint }_G v(x,y) \eta (x,y)\\ {}&\quad \times \bigl ( w(x,y)_+ \text{ d }\rho (x) \text{ d }\mu (y) - w(x,y)_- \text{ d }\mu (x) \,\text{ d }\rho (y)\bigr ) \\ {}&\leqq \frac{1}{2}{\iint }_G v(x,y)_+ w(x,y)_+ \eta (x,y) \,\text{ d }\rho (x) \,\text{ d }\mu (y) \\ {}&\quad + \frac{1}{2}{\iint }_G v(x,y)_-w(x,y)_-\eta (x,y)\, \text{ d }\mu (x) \,\text{ d }\rho (y)\\ {}&\leqq \sqrt{ {\widehat{g}}_{\rho ,v}(v,v) \, {\widehat{g}}_{\rho ,w}(w,w)}. \end{aligned}$$
From the usual Cauchy–Schwarz inequality we have equalities above if and only if there exists \(\lambda > 0\) such that \(v(x,y)_+=\lambda w(x,y)_+\) for \(\eta \rho \otimes \mu \)-a.e. \((x,y) \in G\) and \(v(x,y)_-=\lambda w(x,y)_-\) for \(\eta \mu \otimes \rho \)-a.e. \((x,y)\in G\), since all the contributions are positive. \(\square \)
Now note that, from the weak formulation of the nonlocal continuity equation (2.15), we have for any \(\varphi \in C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}})\) and any \(0\leqq s < t \leqq T\) the following chain rule:
$$\begin{aligned}&{\int }_{{\mathbb {R}}^{d}}\varphi (x)\, \text{ d }\rho _t(x)-{\int }_{{\mathbb {R}}^{d}}\varphi (x)\, \text{ d }\rho _s(x) \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t{\iint }_G{\overline{\nabla }}\varphi (x,y)\,\eta (x,y) \,\text{ d }{\mathbf {j}}_\tau (x,y)\,\text{ d }\tau \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t{\iint }_G{\overline{\nabla }}\varphi (x,y)\,\eta (x,y) \nonumber \\ {}&\qquad \times \left( w_\tau (x,y)_+ \text{ d }\gamma _{1,\tau }(x,y) - w_\tau (x,y)_- \,\text{ d }\gamma _{2,\tau }(x,y) \right) \,\text{ d }\tau \nonumber \\ {}&\quad = \frac{1}{2}{\int }_{s}^t {\iint }_G{\overline{\nabla }}\varphi (x,y)w_\tau (x,y)\,\eta (x,y) \nonumber \\ {}&\qquad \times \left( \chi _{\{w>0\}}\,\text{ d }\gamma _{1,\tau }(x,y) + \chi _{\{w<0\}}\,\text{ d }\gamma _{2,\tau }(x,y)\right) \,\text{ d }\tau \nonumber \\ {}&\quad = {\int }_{s}^t {\widehat{g}}_{\rho _\tau ,w_\tau }(w_\tau ,{\overline{\nabla }}\varphi )\,\text{ d }\tau = {\int }_{s}^t {\widehat{\ell }}_{\rho }(w_\tau )({\overline{\nabla }}\varphi )\, \text{ d }\tau . \end{aligned}$$
(3.17)
Moreover, we still have the identification of the product \({\widehat{g}}\) with the action in the form of Lemma 2.6,
$$\begin{aligned} {\widehat{g}}_{\rho _t,w_t}(w_t,w_t)&= \frac{1}{2}{\iint }_G w_t(x,y)^2\eta (x,y) \nonumber \\ {}&\quad \times \left( \chi _{\{w>0\}}(x,y)\,\text{ d }\gamma _{1,t}(x,y) + \chi _{\{w<0\}}(x,y) \,\text{ d }\gamma _{2,t}(x,y )\right) \nonumber \\ {}&= \frac{1}{2}{\iint }_G w_t(x,y)_+^2 \eta (x,y) \,\text{ d }\gamma _{1,t}(x,y) \nonumber \\ {}&\quad + \frac{1}{2}{\iint }_G w_t(x,y)_-^2 \eta (x,y) \,\text{ d }\gamma _{2,t}(x,y) \nonumber \\ {}&= \frac{1}{2}{\iint }_G \left( w_t(x,y)_+^2 + w_t(y,x)_-^2 \right) \eta (x,y) \text{ d }\gamma _{1,t}(x,y) \nonumber \\ {}&={\hat{{\mathcal {A}}}}(\rho _t,w_t), \end{aligned}$$
(3.18)
which shows that the action is the norm with respect to the Finsler structure.
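As a sanity check, the identity (3.18) can be verified numerically on a finite graph. The following is only a minimal sketch under stated assumptions: the vertices carry masses `mu` and `rho`, we take \(\gamma _1=\rho \otimes \mu \) and \(\gamma _2=\mu \otimes \rho \), and the data produced by the hypothetical helper `toy_graph` (random symmetric `eta`, random antisymmetric `w`) are purely illustrative and not taken from the text.

```python
import random

def pos(a):
    """Positive part a_+ = max(a, 0)."""
    return max(a, 0.0)

def neg(a):
    """Negative part a_- = max(-a, 0)."""
    return max(-a, 0.0)

def toy_graph(n, seed=0):
    """Illustrative data: masses mu, probability rho, symmetric eta, antisymmetric w."""
    rng = random.Random(seed)
    mu = [rng.uniform(0.5, 1.5) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    rho = [r / sum(raw) for r in raw]
    eta = [[0.0] * n for _ in range(n)]
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta[i][j] = eta[j][i] = rng.uniform(0.0, 1.0)
            w[i][j] = rng.uniform(-1.0, 1.0)
            w[j][i] = -w[i][j]
    return mu, rho, eta, w

def g_hat_ww(mu, rho, eta, w):
    """First line of (3.18): {w>0} is weighted by gamma_1, {w<0} by gamma_2."""
    n = len(mu)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gamma1 = rho[i] * mu[j]  # assumed: d gamma_1 = d rho(x) d mu(y)
            gamma2 = mu[i] * rho[j]  # assumed: d gamma_2 = d mu(x) d rho(y)
            if w[i][j] > 0:
                total += w[i][j] ** 2 * eta[i][j] * gamma1
            elif w[i][j] < 0:
                total += w[i][j] ** 2 * eta[i][j] * gamma2
    return 0.5 * total

def action_hat(mu, rho, eta, w):
    """Last line of (3.18): everything is integrated against gamma_1 only."""
    n = len(mu)
    total = 0.0
    for i in range(n):
        for j in range(n):
            gamma1 = rho[i] * mu[j]
            total += (pos(w[i][j]) ** 2 + neg(w[j][i]) ** 2) * eta[i][j] * gamma1
    return 0.5 * total

mu, rho, eta, w = toy_graph(6)
lhs = g_hat_ww(mu, rho, eta, w)
rhs = action_hat(mu, rho, eta, w)
print(abs(lhs - rhs))  # difference between the two expressions in (3.18)
```

The two sums agree up to rounding because exchanging the roles of x and y maps \(\gamma _2\) to \(\gamma _1\) and, by antisymmetry of w, the set \(\{w<0\}\) to \(\{w>0\}\).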
A crucial step toward the variational characterization of (\({\text {NL}}^2 {\text {IE}}\)) mentioned above is to obtain the chain rule (3.17) for the energy functional (3.2), which is done in Proposition 3.10 below by a suitable regularization. As a consequence, by using the one-sided Cauchy–Schwarz inequality from Lemma 3.7, we obtain in Corollary 3.11 that the square root \(\sqrt{{\mathcal {D}}}\) of the local slope, defined below in (3.19), is a one-sided strong upper gradient for \({\mathcal {E}}\) with respect to the quasi-metric \({\mathcal {T}}\) in the sense of Definition 3.4, where \(|\rho _t'|^2={\hat{{\mathcal {A}}}}(\rho _t,w_t)={\widehat{g}}_{\rho _t,w_t}(w_t,w_t)\) for a.e. \(t\in [0,T]\) due to Proposition 2.25 and (3.18). This allows us to define the De Giorgi functional, which provides the characterization of weak solutions as curves of maximal slope.
Definition 3.8
(Local slope and De Giorgi functional) For any \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\), let the local slope at \(\rho \) be given by
$$\begin{aligned} {\mathcal {D}}(\rho ) := {\widehat{g}}_{\rho ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }}\left( -{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho },-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }\right) . \end{aligned}$$
(3.19)
For any \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\), the De Giorgi functional at \(\rho \) is defined as
$$\begin{aligned} {\mathcal {G}}_T(\rho ):={\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0)+\frac{1}{2}{\int }_0^T\big ({\mathcal {D}}(\rho _\tau ) + |\rho _\tau '|^2\big )\,\text {d}\tau . \end{aligned}$$
(3.20)
When the dependence on the base measure \(\mu \) needs to be explicit, the local slope and the De Giorgi functional are denoted by \({\mathcal {D}}(\mu ;\rho )\) and \({\mathcal {G}}_T(\mu ;\rho )\), respectively.
If the potential K satisfies Assumptions (K1)–(K3), we note that whenever \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) and \(\rho \in {{\,\mathrm{AC}\,}}([0,T];{\mathcal {P}}_2({{\mathbb {R}}^{d}}))\) the quantity \({\mathcal {G}}_T(\rho )\) is finite; indeed, the domain of the energy is all of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and Proposition 2.25 yields that both the local slope (since it is equal to the action of \((\rho ,{\varvec{j}})\), where \({\varvec{j}}\) is given in Definition 3.1) and metric derivative are finite.
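To make the finiteness of the local slope concrete, one may evaluate \({\mathcal {D}}\) on a finite graph and compare it with the bound \(L^2C_\eta \) that (K3) and (A1) suggest. The sketch below is illustrative only: the kernel \(K(x,y)=|x-y|^2\), the one-dimensional vertices and the helper names are assumptions, and the constants `L` and `C` are computed empirically from the sample rather than taken from the assumptions of the text.

```python
import random

def phi(r):
    """Phi(r) = max(r, r^2), the weight appearing in (K3) and (A1)."""
    return max(r, r * r)

def kernel(x, y):
    """Hypothetical symmetric interaction kernel K(x, y) = |x - y|^2."""
    return (x - y) ** 2

def toy_graph(n, seed=0):
    """Illustrative data: vertices in [0, 1], masses mu, probability rho, symmetric eta."""
    rng = random.Random(seed)
    points = [rng.uniform(0.0, 1.0) for _ in range(n)]
    mu = [rng.uniform(0.5, 1.5) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    rho = [r / sum(raw) for r in raw]
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta[i][j] = eta[j][i] = rng.uniform(0.0, 1.0)
    return points, mu, rho, eta

def local_slope(points, mu, rho, eta):
    """D(rho) = sum ((nabla-bar K*rho)(x,y)_-)^2 eta(x,y) rho(x) mu(y)."""
    n = len(points)
    conv = [sum(kernel(points[i], points[z]) * rho[z] for z in range(n))
            for i in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            grad = conv[j] - conv[i]  # nonlocal gradient of K*rho at (x_i, x_j)
            total += max(-grad, 0.0) ** 2 * eta[i][j] * rho[i] * mu[j]
    return total

def empirical_constants(points, mu, eta):
    """Smallest L in (K3) and C in (A1) that work for this finite sample."""
    n = len(points)
    L, C = 0.0, 0.0
    for i in range(n):
        row = 0.0
        for j in range(n):
            if i == j:
                continue
            r = abs(points[i] - points[j])
            row += phi(r) ** 2 * eta[i][j] * mu[j]
            for z in range(n):
                diff = abs(kernel(points[j], points[z]) - kernel(points[i], points[z]))
                L = max(L, diff / phi(r))
        C = max(C, row)
    return L, C

points, mu, rho, eta = toy_graph(6)
D = local_slope(points, mu, rho, eta)
L, C = empirical_constants(points, mu, eta)
print(D, L ** 2 * C)  # local slope and its a priori bound
```

The comparison holds for any finite sample, since the discrete computation follows the same steps (Jensen's inequality, then the Lipschitz-type bound on K) as in the continuum.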
We are ready to state our main theorem.
Theorem 3.9
Suppose that \(\mu \) satisfies Assumptions (A1) and (A2) and K satisfies Assumptions (K1)–(K3). A curve \((\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) according to Definition 3.1 if and only if \(\rho \) belongs to \({{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) and is a curve of maximal slope for \({\mathcal {E}}\) with respect to \(\sqrt{{\mathcal {D}}}\) in the sense of Definition 3.5, that is, satisfies
$$\begin{aligned} {\mathcal {G}}_T(\rho ) = 0, \end{aligned}$$
(3.21)
where \({\mathcal {G}}_T\) is the De Giorgi functional as given in Definition 3.8.
Note that the above theorem implicitly assumes that \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\); this is indeed the case thanks to Corollary 3.11 below. In light of this, we can summarize the result via the following diagram:
$$\begin{aligned}&\rho \text { is a weak solution of } ({\text {NL}}^2{\text {IE}}) \\&\iff \!\! \rho \text { is a curve of maximal slope for } {\mathcal {E}}\text { w.r.t. }\! \sqrt{{\mathcal {D}}} \\&\iff \! {\mathcal {G}}_T(\rho )\! =\! 0. \end{aligned}$$
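On a finite graph the equation reduces to a system of ordinary differential equations, and the energy dissipation behind the equivalences above can be checked directly. The sketch below is a hedged illustration, not the construction used in the proofs: it assumes the flux \({\varvec{j}}=v_+\gamma _1-v_-\gamma _2\) with \(v=-{\overline{\nabla }}K*\rho \), the discrete right-hand side \(\dot{\rho }(x)=-\sum _y \eta (x,y)\,{\varvec{j}}(x,y)\) obtained by testing the weak form with indicator functions, and a hypothetical kernel \(K(x,y)=|x-y|^2\).

```python
import random

def kernel(x, y):
    """Hypothetical symmetric interaction kernel K(x, y) = |x - y|^2."""
    return (x - y) ** 2

def toy_graph(n, seed=1):
    """Illustrative data: vertices in [0, 1], masses mu, probability rho, symmetric eta."""
    rng = random.Random(seed)
    points = [rng.uniform(0.0, 1.0) for _ in range(n)]
    mu = [rng.uniform(0.5, 1.5) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    rho = [r / sum(raw) for r in raw]
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta[i][j] = eta[j][i] = rng.uniform(0.0, 1.0)
    return points, mu, rho, eta

def potential(points, rho):
    """(K*rho)(x_i) = sum_z K(x_i, x_z) rho_z, the first variation of the energy."""
    n = len(points)
    return [sum(kernel(points[i], points[z]) * rho[z] for z in range(n))
            for i in range(n)]

def flux(points, mu, rho):
    """j(x,y) = v_+ rho(x) mu(y) - v_- mu(x) rho(y) with v = -nabla-bar K*rho."""
    n = len(points)
    pot = potential(points, rho)
    j = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            v = pot[x] - pot[y]
            j[x][y] = max(v, 0.0) * rho[x] * mu[y] - max(-v, 0.0) * mu[x] * rho[y]
    return j

def rhs(eta, j):
    """Time derivative of rho at each vertex: minus the eta-weighted outgoing flux."""
    n = len(j)
    return [-sum(eta[x][y] * j[x][y] for y in range(n)) for x in range(n)]

def local_slope(points, mu, rho, eta):
    """D(rho) = sum (v_+)^2 eta(x,y) rho(x) mu(y) with v = -nabla-bar K*rho."""
    n = len(points)
    pot = potential(points, rho)
    return sum(max(pot[x] - pot[y], 0.0) ** 2 * eta[x][y] * rho[x] * mu[y]
               for x in range(n) for y in range(n))

points, mu, rho, eta = toy_graph(6)
rate = rhs(eta, flux(points, mu, rho))
mass_rate = sum(rate)  # total mass should be conserved
energy_rate = sum(p * r for p, r in zip(potential(points, rho), rate))  # dE/dt
slope = local_slope(points, mu, rho, eta)
print(mass_rate, energy_rate + slope)
```

Both printed quantities vanish up to rounding: mass is conserved because the flux is antisymmetric and \(\eta \) is symmetric, and the instantaneous energy dissipation equals \(-{\mathcal {D}}(\rho )\), which is the differential form of \({\mathcal {G}}_T(\rho )=0\) along a solution.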
The Chain Rule and Proof of Theorem 3.9
Firstly, we focus on the chain-rule property, which is the main technical step for proving Theorem 3.9.
Proposition 3.10
Let K satisfy Assumptions (K1)–(K3). For all \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\) and \(0\leqq s\leqq t\leqq T\) we have the chain-rule identity
$$\begin{aligned} {\mathcal {E}}(\rho _t) - {\mathcal {E}}(\rho _s) = {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau , \end{aligned}$$
(3.22)
where \((w_t)_{t\in [0,T]}\) is the antisymmetric vector field associated by (2.6) to \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\).
Proof
Since \(\rho \in {{\,\mathrm{AC}\,}}\bigl ([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}})\bigr )\), according to Proposition 2.25 there exists a unique family \(({\varvec{j}}_t)_{t\in [0,T]}\), with \({\varvec{j}}_t\) belonging to \(T_{\rho _t}{\mathcal {P}}_{2}({{\mathbb {R}}^{d}})\) for a.e. \(t\in [0,T]\), such that:
-
(i)
\((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\);
-
(ii)
\({\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty \);
-
(iii)
\(|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\);
-
(iv)
\(\text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y)\).
Then proving the identity (3.22) is equivalent to proving
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) = \frac{1}{2}{\int }_s^t{\iint }_G{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )(x,y)\, \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}$$
(3.23)
We proceed by applying two regularization procedures. First, for all \((x,y)\in {{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\) we define \(K^\varepsilon (x,y)=K*m_\varepsilon (x,y)={\iint }_{{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}} K(z,z')m_{\varepsilon }(x-z,y-z')\,\text {d}z\,\text {d}z'\), where \(m_\varepsilon (z)=\frac{1}{\varepsilon ^{2d}}m(\frac{z}{\varepsilon })\) for all \(z\in {\mathbb {R}}^{2d}\) and \(\varepsilon >0\), and m is a standard mollifier on \({\mathbb {R}}^{2d}\). We also introduce a smooth cut-off function \(\varphi _R\) on \({{\mathbb {R}}^{2d}}\) such that \(\varphi _R(z)=1\) on \(B_R\), \(\varphi _R(z)=0\) on \({{\mathbb {R}}^{2d}}{\setminus } B_{2R}\) and \(|\nabla \varphi _R|\leqq \frac{2}{R}\), where \(B_R\) is the ball of radius R in \({{\mathbb {R}}^{2d}}\) centered at the origin. We set \(K_R^\varepsilon :=\varphi _R K^\varepsilon \) and note that it is a \(C_\mathrm {c}^\infty ({{\mathbb {R}}^{2d}})\) function. We now introduce the approximate energies, indexed by \(\varepsilon \) and R,
$$\begin{aligned} {\mathcal {E}}_R^\varepsilon (\nu )=\frac{1}{2}{\int }_{{\mathbb {R}}^{d}}{\int }_{{{\mathbb {R}}^{d}}} K_R^\varepsilon (x,y)\,\text {d}\nu (y)\,\text {d}\nu (x) \quad \text{ for } \text{ all }\, \nu \in {\mathcal {P}}_2({{\mathbb {R}}^{d}}). \end{aligned}$$
Let us extend \(\rho \) and \({\varvec{j}}\) to \([-T,2 T]\) periodically in time, meaning that \(\rho _{-s}=\rho _{T-s}\) and \(\rho _{T+s}=\rho _{s}\) for all \(s\in (0,T]\) and likewise for \({\varvec{j}}\). We regularize \(\rho \) and \({\varvec{j}}\) in time by using a standard mollifier n on \({\mathbb {R}}\) supported on \([-1,1]\), by setting \(n_\sigma (t)=\frac{1}{\sigma }n(\frac{t}{\sigma })\) and
$$\begin{aligned}&\rho _t^\sigma (A)=n_\sigma *\rho _t(A)={\int }_{-\sigma }^\sigma n_\sigma (s)\rho _{t-s}(A)\,\text {d}s, \qquad \forall A\subseteq {{\mathbb {R}}^{d}},\\&{\varvec{j}}_t^\sigma (U)=n_\sigma *{\varvec{j}}_t(U)={\int }_{-\sigma }^\sigma n_\sigma (s){\varvec{j}}_{t-s}(U)\,\text {d}s, \qquad \forall U\subset G, \end{aligned}$$
for any \(\sigma \in (0,T)\); whence \(\rho _t^\sigma \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Let us now show that the integral of the action is uniformly bounded with respect to \(\sigma \). Let \(|\lambda | \in {\mathcal {M}}^+(G)\) be such that \(\gamma _{1,t},\gamma _{2,t},|{\varvec{j}}_t| \ll |\lambda |\) for all \(t\in [0,T]\). Then by using the joint convexity of the function \(\alpha \) from (2.5), Jensen’s inequality and Fubini’s Theorem, we get
$$\begin{aligned}&{\int }_0^T{\mathcal {A}}(\rho _t^\sigma ,{\mathbf {j}}_t^\sigma )\, \text{ d }t \\ {}&\quad =\frac{1}{2}{\int }_0^T {\iint }_G \alpha \left( {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s, {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} \gamma _{1,t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s \right) \eta \,\text{ d }|\lambda | \,\text{ d }t \\ {}&\qquad +\frac{1}{2}{\int }_0^T {\iint }_G \alpha \left( - {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s, {\int }_{-\sigma }^{\sigma } \frac{{\text{ d }}^{} \gamma _{2,t-s}}{\text{ d } |\lambda |^{}} n_\sigma (s)\,\text{ d }s \right) \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\quad \leqq \frac{1}{2}{\int }_0^T {\iint }_G {\int }_{-\sigma }^\sigma \alpha \left( \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}}, \frac{{\text{ d }}^{} \gamma _{1,t-s}}{\text{ d } |\lambda |^{}} \right) n_\sigma (s) \,\text{ d }s \, \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\qquad + \frac{1}{2}{\int }_0^T {\iint }_G {\int }_{-\sigma }^\sigma \alpha \left( - \frac{{\text{ d }}^{} {\mathbf {j}}_{t-s}}{\text{ d } |\lambda |^{}}, \frac{{\text{ d }}^{} \gamma _{2,t-s}}{\text{ d } |\lambda |^{}} \right) n_\sigma (s) \,\text{ d }s \, \eta \,\text{ d }|\lambda |\, \text{ d }t \\ {}&\quad = {\int }_{-\sigma }^{+\sigma } {\int }_0^T {\mathcal {A}}(\rho _{t-s},{\mathbf {j}}_{t-s}) \,\text{ d }t\, n_\sigma (s)\,\text{ d }s\\ {}&\quad \leqq {\int }_{-T}^{2T} {\mathcal {A}}(\rho _{t},{\mathbf {j}}_{t}) \, \text{ d }t = 3 {\int }_{0}^{T} {\mathcal {A}}(\rho _{t},{\mathbf {j}}_{t}) \, \text{ d }t<\infty . \end{aligned}$$
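The key step in the estimate above is Jensen's inequality for the jointly convex function \(\alpha \). A small numerical illustration is sketched below; it assumes the form \(\alpha (j,r)=(j_+)^2/r\) for \(r>0\) (the definition (2.5) is not restated here, so this form is an assumption), and it replaces \(n_\sigma \) by a hypothetical discrete family of normalized weights.

```python
import math
import random

def alpha(j, r):
    """Assumed form of the function from (2.5): (j_+)^2 / r for r > 0."""
    return max(j, 0.0) ** 2 / r

def mollifier_weights(m):
    """Normalized samples of the bump exp(-1/(1-s^2)) at m midpoints of (-1, 1)."""
    nodes = [-1.0 + (2.0 * k + 1.0) / m for k in range(m)]
    raw = [math.exp(-1.0 / (1.0 - s * s)) for s in nodes]
    total = sum(raw)
    return [v / total for v in raw]

def jensen_gap(js, rs, weights):
    """Average of alpha minus alpha of the averages; non-negative by joint convexity."""
    j_bar = sum(wt * j for wt, j in zip(weights, js))
    r_bar = sum(wt * r for wt, r in zip(weights, rs))
    avg_alpha = sum(wt * alpha(j, r) for wt, j, r in zip(weights, js, rs))
    return avg_alpha - alpha(j_bar, r_bar)

rng = random.Random(0)
weights = mollifier_weights(21)
gaps = []
for _ in range(200):
    # Random samples standing in for the densities of j_{t-s} and gamma_{1,t-s}.
    js = [rng.uniform(-2.0, 2.0) for _ in weights]
    rs = [rng.uniform(0.1, 2.0) for _ in weights]
    gaps.append(jensen_gap(js, rs, weights))
print(min(gaps))  # smallest observed Jensen gap over all trials
```

The gap is non-negative in every trial, which is the pointwise inequality that, after integration against \(\eta \,\text {d}|\lambda |\,\text {d}t\), yields the uniform bound on the action of the mollified pair.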
It is easy to check that \((\rho ^\sigma ,{\varvec{j}}^\sigma )\) is still a solution to the nonlocal continuity equation on [0, T]. By arguing as in the proof of Proposition 2.17, we get that, along subsequences, \(\rho _t^\sigma \rightharpoonup \tilde{\rho }_t\) as \(\sigma \rightarrow 0\) for all \(t\in [0,T]\) for some curve \(({\tilde{\rho }}_t)_{t\in [0,T]}\) in \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\), and \({\varvec{j}}^\sigma \rightharpoonup \hat{{\varvec{j}}}\) in \({\mathcal {M}}_{\mathrm {loc}}(G \times [0,T])\), with \(\text {d}{\hat{{\varvec{j}}}} := \text {d}{\tilde{{\varvec{j}}}}_t\text {d}t\), for some curve \(({\tilde{{\varvec{j}}}}_t)_{t\in [0,T]}\) in \({\mathcal {M}}(G)\). Note that \(n_\sigma \rightharpoonup \delta _0\) as \(\sigma \rightarrow 0\), and, as a consequence, \(\rho _t^\sigma \rightharpoonup \rho _t\) for all \(t\in [0,T]\) in view of Proposition 2.21. Thus, we actually have \(\tilde{\rho }=\rho \) and \(\tilde{{\varvec{j}}}={\varvec{j}}\) by uniqueness of the limit and the flux, as highlighted above. Using the regularity for \(\varepsilon >0\) and \(\sigma >0\), we get
$$\begin{aligned} \frac{{\text {d}}^{} }{\text {d} t^{}} {\mathcal {E}}_R^\varepsilon (\rho _t^\sigma )= & {} {\int }_{{\mathbb {R}}^{d}}(K_R^\varepsilon *\rho _t^\sigma )(x)\partial _t\rho _t^\sigma (x)\,\text {d}\mu (x)\\= & {} \frac{1}{2}{\iint }_G{\overline{\nabla }}(K_R^\varepsilon *\rho _t^\sigma )(x,y)\,\eta (x,y) \,\text {d}{\varvec{j}}_t^\sigma (x,y). \end{aligned}$$
For the sake of completeness, we note that the second equality follows from the definition of \({{\,\mathrm{CE}\,}}_T\) by using again a cut-off argument on the function \(K_R^\varepsilon *\rho _t^\sigma \). We omit this step as it is a standard procedure. Integrating in time between s and t, with \(s\leqq t\), we obtain
$$\begin{aligned}&{\mathcal {E}}_R^\varepsilon (\rho _t^\sigma )-{\mathcal {E}}_R^\varepsilon (\rho _s^\sigma )\nonumber \\&\quad =\frac{1}{2}{\int }_s^t{\iint }_G{\overline{\nabla }}(K_R^\varepsilon *\rho _\tau ^\sigma )(x,y)\, \eta (x,y)\, \text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau \nonumber \\&\quad =\frac{1}{2}{\int }_{s}^{t}{\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \text {d}\rho _\tau ^\sigma (z) \eta (x,y)\, \text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau . \end{aligned}$$
(3.24)
In order to obtain (3.23) we need to let \(\varepsilon \) and \(\sigma \) go to 0 and R go to \(\infty \) in (3.24). The left-hand side is easy to handle since \(\rho _t^\sigma \rightharpoonup \rho _t\) as \(\sigma \rightarrow 0\) for any \(t\in [0,T]\), and \(K_R^\varepsilon \rightarrow K_R\) uniformly on compact sets as \(\varepsilon \rightarrow 0\). Finally, by letting R go to \(\infty \) we have convergence to \({\mathcal {E}}(\rho _t)\).
In order to pass to the limit in the right-hand side of (3.24), we use a truncation argument similar to that in the proof of Proposition 2.17. Let \(\delta >0\) and let us set \(N_\delta = {\overline{B}}_{\delta ^{-1}} \times {\overline{B}}_{\delta ^{-1}}\), where \(B_{\delta ^{-1}}= \left\{ x \in {\mathbb {R}}^d: |x|< \delta ^{-1}\right\} \), and \(G_\delta =\bigl \{(x,y)\in G:\delta \leqq |x-y|\bigr \}\). We can consider a family \((\varphi _\delta )_{\delta >0} \subset C_\mathrm {c}^\infty ({{\mathbb {R}}^{d}}\times G;[0,1])\) of truncation functions such that, for all \(\delta >0\),
$$\begin{aligned} \{\varphi _\delta = 1\} \supseteq {\overline{B}}_{\delta ^{-1}} \times \left( G_\delta \cap N_\delta \right) . \end{aligned}$$
Now, we add and subtract \(\varphi _\delta \) in the integrand on the right-hand side of (3.24) and argue as follows. Since \(\rho _t^\sigma \otimes {\varvec{j}}_t^\sigma \rightharpoonup \rho _t\otimes {\varvec{j}}_t\) for any \(t\in [0,T]\) as \(\sigma \rightarrow 0\), and \(K^\varepsilon _R\rightarrow K_R\) uniformly on compact sets as \(\varepsilon \rightarrow 0\), we can pass to the limit in \(\sigma \) and \(\varepsilon \), for any R and \(\delta >0\),
$$\begin{aligned}&\frac{1}{2}{\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\varphi _\delta (z,x,y) \left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \,\text {d}\rho _\tau ^\sigma (z) \eta (x,y)\,\text {d}{\varvec{j}}_\tau ^\sigma (x,y)\,\text {d}\tau \nonumber \\&\rightarrow \frac{1}{2} {\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\varphi _\delta (z,x,y) \left( K_R(y,z)-K_R(x,z)\right) \,\text {d}\rho _\tau (z) \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}$$
(3.25)
By using \(\varphi _\delta \leqq 1\), Assumption (K3), Lemma 2.10 with \(\Phi (x,y)=|x-y|\vee |x-y|^2\) and (A1), we can bound, for any \(\tau \in [s,t]\), the modulus of the time integrand on the right-hand side of (3.25) by
$$\begin{aligned}&\frac{1}{2}{\iint }_G {\int }_{{\mathbb {R}}^{d}}\frac{|K_R(y,z)-K_R(x,z) |}{|x-y|\vee |x-y|^2}\,\text {d}\rho _\tau (z) \left( |x-y|\vee |x-y|^2\right) \eta (x,y) \,\text {d}|{\varvec{j}}_\tau |(x,y) \\&\quad \leqq L \sqrt{2C_\eta \, {\mathcal {A}}(\rho _\tau ,{\varvec{j}}_\tau )}. \end{aligned}$$
Hence the integral is uniformly bounded in \(\delta \) and R, and by the Lebesgue dominated convergence theorem we can pass to the limit in (3.25) in \(\delta \) and R, obtaining
$$\begin{aligned} \frac{1}{2} {\int }_{s}^{t} {\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( K(y,z)-K(x,z)\right) \,\text {d}\rho _\tau (z) \eta (x,y) \,\text {d}{\varvec{j}}_\tau (x,y)\,\text {d}\tau . \end{aligned}$$
Now, it remains to control the integral involving the term \(1-\varphi _\delta (z,x,y)\) in the integrand. Let us note that, for all \(\delta >0\),
$$\begin{aligned} \left( {{\mathbb {R}}^{d}}\times G\right) {\setminus } \{\varphi _\delta =1\} \subseteq \big ({\overline{B}}_{\delta ^{-1}}^\mathrm {c} \times G \big ) \cup \big ( {{\mathbb {R}}^{d}}\times ( G{\setminus } (G_\delta \cap N_\delta ))\big ) =: M_\delta . \end{aligned}$$
Using Assumption (K3) and splitting each contribution, we obtain
$$\begin{aligned}&\left|{\iint }_G {\int }_{{\mathbb {R}}^{d}}\left( 1-\varphi _\delta (z,x,y)\right) \left( K_R^\varepsilon (y,z)-K_R^\varepsilon (x,z)\right) \, \text {d}\rho _t^\sigma (z) \eta (x,y)\, \text {d}{\varvec{j}}_t^\sigma (x,y) \right| \\&\quad \leqq L {\iiint }_{M_\delta }\left( |x-y|\vee |x-y|^2\right) \eta (x,y)\,\text {d}{\varvec{j}}_t^\sigma (x,y)\,\text {d}\rho ^\sigma _t(z) \\&\quad \leqq L {\iiint }_{{\overline{B}}_{\delta ^{-1}}^\mathrm {c} \times G}\left( |x-y|\vee |x-y|^2\right) \eta (x,y)\,\text {d}{\varvec{j}}_t^\sigma (x,y)\,\text {d}\rho ^\sigma _t(z)\\&\qquad + 2L {\int }_{{{\mathbb {R}}^{d}}}\,\text {d}\rho ^\sigma _t(z){\iint }_{G_\delta ^\mathrm {c}}\left( |x-y|\vee |x-y|^2\right) w_t(x,y)_+ \eta (x,y)\,\text {d}\rho _t^\sigma (x)\,\text {d}\mu (y)\\&\qquad + 2L {\int }_{{{\mathbb {R}}^{d}}}\,\text {d}\rho ^\sigma _t(z){\iint }_{N_\delta ^\mathrm {c}}\left( |x-y|\vee |x-y|^2\right) w_t(x,y)_+ \eta (x,y)\,\text {d}\rho _t^\sigma (x)\,\text {d}\mu (y). \end{aligned}$$
Using Lemma 2.10 with \(\Phi (x,y)=|x-y|\vee |x-y|^2\), (A1) and the Cauchy–Schwarz inequality with respect to \(\eta \, \rho _t^\sigma \otimes \mu \), the right-hand side in the inequality above can be further bounded by
$$\begin{aligned}&4L \sqrt{ C_\eta {\mathcal {A}}(\rho _t^\sigma ,{\varvec{j}}_t^\sigma )} \ \rho _t^\sigma \left( {\overline{B}}_{\delta ^{-1}}^\mathrm {c}\right) \\&+ 2L \sqrt{{\mathcal {A}}(\rho _t^\sigma ,{\varvec{j}}_t^\sigma )} \left( \left( {\iint }_{G_\delta ^\mathrm {c}} \left|x-y \right|^2 \eta (x,y) \,\text {d}\rho _t^\sigma (x)\, \text {d}\mu (y)\right) ^{\frac{1}{2}} + \sqrt{C_\eta \rho _t^{\sigma }\left( {\overline{B}}_{\delta ^{-1}}^\mathrm {c}\right) }\right) . \end{aligned}$$
Thanks to the uniform second moment bound of \(\rho _t^\sigma \) from Lemma 2.16 and Assumption (A2), the above terms converge to zero as \(\delta \rightarrow 0\), which concludes the proof. \(\square \)
That \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\) is an easy consequence of the previous result.
Corollary 3.11
For any curve \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) it holds that
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s) \geqq - {\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho _\tau '| \,\text {d}\tau \quad \text{ for } \text{ all }\;\; \, 0\leqq s\leqq t\leqq T, \end{aligned}$$
(3.26)
i.e., \(\sqrt{{\mathcal {D}}}\) is a one-sided strong upper gradient for \({\mathcal {E}}\) in the sense of Definition 3.4.
Proof
Without loss of generality we assume \({\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho _\tau '|\,\text {d}\tau <\infty \), as otherwise the inequality (3.26) is trivially satisfied. We obtain the result as a consequence of Proposition 3.10 by applying the one-sided Cauchy–Schwarz inequality (Lemma 3.7) to (3.22) as follows: for any \(0\leqq s\leqq t\leqq T\),
$$\begin{aligned}&{\mathcal {E}}(\rho _t) - {\mathcal {E}}(\rho _s) \\&\quad = {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) \,\text {d}\tau = - {\int }_s^t {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) \,\text {d}{\tau }\\&\quad \geqq -{\int }_s^t\sqrt{{\widehat{g}}_{\rho _\tau ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }}\left( -{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho } ,-{\overline{\nabla }}\frac{\delta {\mathcal {E}}(\rho _\tau )}{\delta \rho }\right) }\sqrt{{\widehat{g}}_{\rho _\tau ,w_\tau }(w_\tau ,w_\tau )}\,\text {d}\tau \\&\quad =-{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,\sqrt{{\hat{{\mathcal {A}}}}(\rho _\tau ,w_\tau )} \,\text {d}\tau \\&\quad =-{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}\,|\rho _\tau '| \,\text {d}\tau . \end{aligned}$$
Note that the last two equalities are provided by identity (3.18) and Proposition 2.25. \(\square \)
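The one-sided Cauchy–Schwarz inequality used in this proof can be probed numerically on a finite graph. The sketch below is illustrative only: it assumes that \({\widehat{g}}_{\rho ,w}(a,b)=\frac{1}{2}\sum a\,b\,\eta \,(\chi _{\{w>0\}}\gamma _1+\chi _{\{w<0\}}\gamma _2)\), as suggested by the last lines of (3.17), with \(\gamma _1=\rho \otimes \mu \) and \(\gamma _2=\mu \otimes \rho \); the random data and helper names are hypothetical.

```python
import math
import random

def random_antisymmetric(n, rng):
    """Random antisymmetric field on the edges of a complete graph with n vertices."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = rng.uniform(-1.0, 1.0)
            w[j][i] = -w[i][j]
    return w

def g_hat(mu, rho, eta, w, a, b):
    """Assumed form: (1/2) sum a b eta (chi_{w>0} gamma_1 + chi_{w<0} gamma_2)."""
    n = len(mu)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if w[i][j] > 0:
                weight = rho[i] * mu[j]  # gamma_1
            elif w[i][j] < 0:
                weight = mu[i] * rho[j]  # gamma_2
            else:
                weight = 0.0
            total += a[i][j] * b[i][j] * eta[i][j] * weight
    return 0.5 * total

rng = random.Random(0)
n = 6
worst = -math.inf
for _ in range(100):
    mu = [rng.uniform(0.5, 1.5) for _ in range(n)]
    raw = [rng.uniform(0.1, 1.0) for _ in range(n)]
    rho = [r / sum(raw) for r in raw]
    eta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            eta[i][j] = eta[j][i] = rng.uniform(0.0, 1.0)
    w = random_antisymmetric(n, rng)
    v = random_antisymmetric(n, rng)
    lhs = g_hat(mu, rho, eta, w, w, v)
    rhs = math.sqrt(g_hat(mu, rho, eta, v, v, v) * g_hat(mu, rho, eta, w, w, w))
    worst = max(worst, lhs - rhs)
print(worst)  # largest value of lhs - rhs over all trials
```

Note that the base point of the right-hand factor involving v is v itself, not w; this is what makes the inequality one-sided rather than a standard Cauchy–Schwarz estimate for a fixed inner product.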
At this point, we have collected all auxiliary results needed to deduce Theorem 3.9.
Proof of Theorem 3.9
Let us start by assuming that \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)). In view of Definition 3.1, a weak solution is obtained from the weak formulation of the nonlocal continuity equation (2.13) if we set
$$\begin{aligned} \text {d}{\varvec{j}}_t(x,y)={\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_- \text {d}\rho _t(x)\text {d}\mu (y)-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_+ \text {d}\rho _t(y)\text {d}\mu (x). \end{aligned}$$
Then, by writing \(v_t^{\mathcal {E}}(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)\), it is easy to check that
$$\begin{aligned} {\mathcal {A}}(\rho _t,{\varvec{j}}_t)={\widehat{{\mathcal {A}}}}(\rho _t,v_t^{\mathcal {E}})={\mathcal {D}}(\rho _t) < \infty , \end{aligned}$$
where the finiteness follows from Assumptions (K3) and (A1), as shown by the computation
$$\begin{aligned} {\mathcal {D}}(\rho _t)&= {\iint }_G |({\overline{\nabla }}K*\rho _t(x,y))_-|^2\eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq {\iint }_G ({\overline{\nabla }}K*\rho _t(x,y))^2\eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&= {\iint }_G \left( {\int }_{{\mathbb {R}}^{d}}(K(x,z)-K(y,z))\,\text {d}{\rho _t}(z) \right) ^2 \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq {\iint }_G {\int }_{{\mathbb {R}}^{d}}(K(x,z)-K(y,z))^2\,\text {d}{\rho _t}(z) \eta (x,y)\,\text {d}\rho _t(x)\,\text {d}\mu (y)\\&\leqq L^2 {\int }_{{\mathbb {R}}^{d}}{\iint }_G \left( |x-y|^2\vee |x-y|^4\right) \eta (x,y)\,\text {d}\mu (y) \,\text {d}\rho _t(x)\,\text {d}{\rho _t}(z)\\&\leqq L^2 C_\eta {\int }_{{\mathbb {R}}^{d}}{\int }_{{\mathbb {R}}^{d}}\,\text {d}\rho _t(x)\,\text {d}{\rho _t}(z) = L^2 C_\eta . \end{aligned}$$
Thanks to Proposition 2.25, this also proves that \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) and \(|\rho _t'|^2\leqq {\mathcal {D}}(\rho _t)\) for a.e. \(t\in [0,T]\). In view of Proposition 3.10, we thus obtain
$$\begin{aligned} {\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)&= {\int }_s^t {\widehat{g}}_{\rho _\tau ,v_\tau ^{\mathcal {E}}}\left( v_\tau ^{\mathcal {E}},{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau \\&= - {\int }_s^t {\widehat{g}}_{\rho _\tau ,v_\tau ^{\mathcal {E}}}\left( v_\tau ^{\mathcal {E}},-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau )\right) \,\text {d}\tau \\&=-{\int }_s^t{\iint }_G\left| {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)_-\right| ^2\eta (x,y)\,\text {d}\rho _\tau (x)\,\text {d}\mu (y)\,\text {d}\tau \\&=-{\int }_s^t {\mathcal {D}}(\rho _\tau )\,\text {d}\tau \leqq -{\int }_s^t\sqrt{{\mathcal {D}}(\rho _\tau )}|\rho '_\tau |\,\text {d}\tau . \end{aligned}$$
This implies that
-
(i)
the map \(t\mapsto {\mathcal {E}}(\rho _t)\) is non-increasing;
-
(ii)
\({\mathcal {E}}(\rho _t)-{\mathcal {E}}(\rho _s)+\frac{1}{2}{\int }_s^t \big ({\mathcal {D}}(\rho _\tau )+|\rho _\tau '|^2\big )\,\text {d}\tau = 0\), by Corollary 3.11.
Taking \(s=0\) and \(t=T\) in (ii) gives \({\mathcal {G}}_T(\rho )=0\), which proves the first implication of the theorem.
Consider now \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}))\) satisfying the equality (3.21). Let us verify that it is a weak solution of (\({\text {NL}}^2 {\text {IE}}\)). By Proposition 2.25 there exists a unique family \(({\varvec{j}}_t)_{t\in [0,T]}\) in \(T_{\rho _t}{\mathcal {P}}_2({{\mathbb {R}}^{d}})\) such that \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\), \({\int }_0^T\sqrt{{\mathcal {A}}(\rho _t,{\varvec{j}}_t)}\,\text {d}t<\infty \) and \(|\rho _t'|^2={\mathcal {A}}(\rho _t,{\varvec{j}}_t)\) for a.e. \(t\in [0,T]\). Moreover, by Lemma 2.6 we find an antisymmetric measurable vector field \(w:[0,T]\times G \rightarrow {\mathbb {R}}\) such that
$$\begin{aligned} \text {d}{\varvec{j}}_t(x,y) = w_t(x,y)_+ \text {d}\gamma _{1,t}(x,y) - w_t(x,y)_- \text {d}\gamma _{2,t}(x,y). \end{aligned}$$
Thanks to Proposition 3.10, by applying the one-sided Cauchy–Schwarz inequality (Lemma 3.7), using the identification (3.18), the definition of the local slope (3.19) and Young's inequality, we get
$$\begin{aligned} {\mathcal {E}}(\rho _T)-{\mathcal {E}}(\rho _0)&={\int }_0^T {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , {\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau ) \right) \,\text {d}\tau \\&= - {\int }_0^T {\widehat{g}}_{\rho _\tau ,w_\tau }\left( w_\tau , -{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(\rho _\tau ) \right) \,\text {d}\tau \\&\geqq -{\int }_0^T\sqrt{{\mathcal {D}}(\rho _\tau )}\sqrt{{\mathcal {A}}(\rho _\tau ,{\varvec{j}}_\tau )}\,\text {d}\tau =-{\int }_0^T\sqrt{{\mathcal {D}}(\rho _\tau )}|\rho _\tau '|\,\text {d}\tau \\&\geqq -\frac{1}{2}{\int }_0^T {\mathcal {D}}(\rho _\tau )\,\text {d}\tau - \frac{1}{2} {\int }_0^T |\rho _\tau '|^2\,\text {d}\tau . \end{aligned}$$
Thanks to the equality (3.21), we actually have that the above inequalities are equalities, which holds if and only if \(w_t(x,y)=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }(x,y)\) for a.e. \(t\in [0,T]\) and \(\gamma _{1,t}\)-a.e. \((x,y)\in G\). Hence \((\rho ,{\varvec{j}})\in {{\,\mathrm{CE}\,}}_T\) with \(w=-{\overline{\nabla }}\frac{\delta {\mathcal {E}}}{\delta \rho }\), that is, \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)). \(\square \)
Stability and Existence of Weak Solutions
Theorem 3.9 provides a characterization of (weak) solutions to (\({\text {NL}}^2 {\text {IE}}\)) as minimizers of \({\mathcal {G}}_T\) attaining the value 0. The direct method of calculus of variations gives existence of minimizers of \({\mathcal {G}}_T\). However, it is not clear a priori whether they attain the value 0 and are thus actually weak solutions to (\({\text {NL}}^2 {\text {IE}}\)). Hence we prove compactness and stability of gradient flows (see Theorem 3.14) and approximate the desired problem by discrete problems for which the existence of solutions is easy to show; see the proof of Theorem 3.15. We start by proving that the local slope \({\mathcal {D}}\) is narrowly lower semicontinuous jointly in its arguments, \(\mu \) and \(\rho \); see Lemma 3.12. We then establish the compactness coming from a uniform control of the De Giorgi functional \({\mathcal {G}}_T\), as well as its joint narrow lower semicontinuity (see Lemma 3.13), which we prove using compactness in \({{\,\mathrm{CE}\,}}_T\) and the joint narrow lower semicontinuity of the action (see Proposition 2.17) and of the local slope. (See also [48, Theorem 2] for an analogous strategy.)
In Theorem 3.14 we prove one of our main results, namely that the functional \({\mathcal {G}}_T\) is stable under variations in base measures, defining the vertices of the graph, and absolutely continuous curves. A particular consequence of this theorem is that weak solutions to (\({\text {NL}}^2 {\text {IE}}\)) with respect to graphs defined by random samples of a measure \(\mu \) converge to weak solutions to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu \); see Remark 3.17.
The existence of weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) (and thus gradient flows) with respect to \({\mathcal {E}}\) proved in Theorem 3.15 shows that, indeed, the De Giorgi functional (3.20) corresponding to an interaction potential K satisfying (K1)–(K3) admits a minimizer when \(\mu ({{\mathbb {R}}^{d}})\) is finite.
Lemma 3.12
Let \((\mu ^n)_n\subset {\mathcal {M}}^+({{\mathbb {R}}^{d}})\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \((\mu ^n)_n\) and \(\mu \) are such that (A1) and (A2) hold uniformly in n, and let K satisfy Assumptions (K1)–(K3). Let moreover \((\rho ^n)_n\) be a sequence such that \(\rho ^n \in {\mathcal {P}}_{2}({{\mathbb {R}}^{d}})\) for all \(n\in {\mathbb {N}}\) and \(\rho ^n\rightharpoonup \rho \) as \(n\rightarrow \infty \) for some \(\rho \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {D}}(\mu ^n;\rho ^n) \geqq {\mathcal {D}}(\mu ;\rho ) . \end{aligned}$$
Proof
For every \(n\in {\mathbb {N}}\) we set \(u^n = {\overline{\nabla }}K*\rho ^n\). Furthermore, we write \(u= {\overline{\nabla }}K*\rho \) and define \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) by \(g(x) = (x_-)^2\) for all \(x \in {\mathbb {R}}\). Then note that g is convex and continuous, and
$$\begin{aligned} {\mathcal {D}}(\mu ^n;\rho ^n) = {\iint }_G g(u^n(x,y)) \eta (x,y) \,\text {d}\rho ^n(x) \,\text {d}\mu ^n(y), \end{aligned}$$
and, similarly,
$$\begin{aligned} {\mathcal {D}}(\mu ;\rho ) = {\iint }_G g(u(x,y)) \eta (x,y) \,\text {d}\rho (x)\,\text {d}\mu (y). \end{aligned}$$
We want to use [2, Theorem 5.4.4 (ii)] to prove the desired \(\liminf \) inequality. Observe that \(u^n \in L^2(\eta \,\gamma _1^n)\) and \(u \in L^2(\eta \,\gamma _1)\); indeed, (K3) and (A1) give
$$\begin{aligned}&{\iint }_G u^n(x,y)^2 \eta (x,y)\,\text {d}\gamma _1^n(x,y)\\&\quad = {\iint }_G (K*\rho ^n(y) - K*\rho ^n(x))^2 \eta (x,y)\,\text {d}\gamma _1^n(x,y)\\&\quad \leqq L^2 C_\eta , \end{aligned}$$
and, similarly, for u. Let now \(\varphi \in C_\mathrm {c}^\infty (G)\). We have
$$\begin{aligned}&{{\iint }_G u^n(x,y)\varphi (x,y)\eta (x,y) \,\text {d}\gamma _1^n(x,y)}\\&\quad = {\iint }_G \left( {\int }_{{\mathbb {R}}^{d}}K(y,z)\,\text {d}\rho ^n(z) - {\int }_{{\mathbb {R}}^{d}}K(x,z)\,\text {d}\rho ^n(z)\right) \varphi (x,y)\eta (x,y) \,\text {d}\gamma _1^n(x,y)\\&\quad = {\iint }_G {\int }_{{\mathbb {R}}^{d}}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\quad ={\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\qquad + {\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}{\setminus } B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y). \end{aligned}$$
The last integral is actually vanishing as \(R\rightarrow \infty \) since (K3), (A1) and Prokhorov’s Theorem give
$$\begin{aligned}&{\left| {\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}{\setminus } B_R}(K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\right| }\\&\quad \leqq \frac{L\Vert \varphi \Vert _{\infty }\rho ^n({{\mathbb {R}}^{d}}{\setminus } B_R)}{\inf _{{{\,\mathrm{supp}\,}}\varphi }(|x-y|\vee |x-y|^2)} \\&\qquad {\iint }_{{{\,\mathrm{supp}\,}}\varphi }(|x-y|^2\vee |x-y|^4)\eta (x,y)\,\,\text {d}\mu ^n(y)\,\,\text {d}\rho ^n(x)\\&\quad \leqq \frac{LC_\eta \Vert \varphi \Vert _{\infty }\rho ^n({{\mathbb {R}}^{d}}{\setminus } B_R)}{\inf _{{{\,\mathrm{supp}\,}}\varphi }(|x-y|\vee |x-y|^2)}\underset{R\rightarrow \infty }{\longrightarrow }0. \end{aligned}$$
The function \((z,x,y) \mapsto (K(y,z) - K(x,z))\varphi (x,y)\eta (x,y)\) is continuous and bounded on \(({{\mathbb {R}}^{d}}\cap B_R)\times G\) thanks to Assumption (W). In addition, we note that \((\rho ^n\otimes \gamma _1^n)_n\) narrowly converges to \(\rho \otimes \gamma _1\) in \({\mathcal {P}}({{\mathbb {R}}^{d}})\times {\mathcal {M}}^+(G)\). Therefore, we obtain for any \(R>0\) the convergence
$$\begin{aligned}&\lim _{n\rightarrow \infty }{\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R} ( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho ^n\otimes \gamma _1^n)(z,x,y)\\&\quad ={\iint }_{{{\,\mathrm{supp}\,}}\varphi }{\int }_{{{\mathbb {R}}^{d}}\cap B_R}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho \otimes \gamma _1)(z,x,y) . \end{aligned}$$
By sending \(R\rightarrow \infty \), we obtain
$$\begin{aligned}&{\lim _{n\rightarrow \infty } {\iint }_G u^n(x,y)\varphi (x,y) \eta (x,y)\,\text {d}\gamma _1^n(x,y)}\\&= {\iint }_G {\int }_{{\mathbb {R}}^{d}}( K(y,z) - K(x,z) )\varphi (x,y)\eta (x,y) \,\text {d}(\rho \otimes \gamma _1)(z,x,y)\\&= {\iint }_G u(x,y)\varphi (x,y) \eta (x,y)\,\text {d}\gamma _1(x,y). \end{aligned}$$
Thus, \(u^n\) converges weakly to u as \(n\rightarrow \infty \) in the sense of [2, Definition 5.4.3]. By [2, Theorem 5.4.4 (ii)] we therefore conclude that
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {D}}(\mu ^n;\rho ^n)&= \liminf _{n\rightarrow \infty } {\iint }_G g(u^n(x,y)) \eta (x,y)\, \text {d}\rho ^n(x) \,\text {d}\mu ^n(y)\\&\geqq {\iint }_G g(u(x,y)) \eta (x,y) \,\text {d}\rho (x)\,\text {d}\mu (y) = {\mathcal {D}}(\mu ;\rho ) , \end{aligned}$$
which is the desired result. \(\square \)
Let us also prove the compactness and narrow lower semicontinuity of the De Giorgi functional.
Lemma 3.13
(Compactness and lower semicontinuity of the De Giorgi functional) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \(\mu ^n\) and \(\mu \) satisfy (A1) and (A2) uniformly in n, and let K satisfy (K1)–(K3). Let moreover \((\rho ^n)_n\) be a sequence so that \(\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))\) for all \(n\in {\mathbb {N}}\) with \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n) < \infty \) and \(\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty \). Then, up to a subsequence, \(\rho ^n_t \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\) for some \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n) \geqq {\mathcal {G}}_T(\mu ;\rho ). \end{aligned}$$
Proof
For any \(n\in {\mathbb {N}}\), recall the definition
$$\begin{aligned} {\mathcal {G}}_T(\mu ^n;\rho ^n) = {\mathcal {E}}(\rho ^n_T) - {\mathcal {E}}(\rho ^n_0) + \frac{1}{2} {\int }_0^T {\mathcal {D}}(\mu ^n; \rho ^n_t) \,\text {d}t + \frac{1}{2} {\int }_0^T |(\rho _t^n)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t, \end{aligned}$$
where we are careful to take the metric derivative of \(\rho ^n\) with respect to \({\mathcal {T}}_{\mu ^n}\) (as given in Definition 2.18). Since the domain of the energy \({\mathcal {E}}\) is all of \({\mathcal {P}}_2({{\mathbb {R}}^{d}})\) and the local slope \({\mathcal {D}}\) is non-negative, the bound \(\sup _{n\in {\mathbb {N}}} {\mathcal {G}}_T(\mu ^n;\rho ^n)<\infty \) ensures that
$$\begin{aligned} \sup _{n\in {\mathbb {N}}} {\int }_0^T |(\rho ^n_t)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t < \infty . \end{aligned}$$
For all \(n\in {\mathbb {N}}\), since \(\rho ^n \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_{\mu ^n}))\), Proposition 2.25 yields the existence of a flux \({\varvec{j}}^n\) such that \((\rho ^n,{\varvec{j}}^n)\in {{\,\mathrm{CE}\,}}_T\) and \(|(\rho ^n_t)'|_{{\mathcal {T}}_{\mu ^n}}^2 = {\mathcal {A}}(\mu ^n;\rho ^n_t,{\varvec{j}}^n_t)\) for almost all \(t\in [0,T]\). We then get
$$\begin{aligned} \sup _{n\in {\mathbb {N}}} {\int }_0^T {\mathcal {A}}(\mu ^n;\rho ^n_t,{\varvec{j}}^n_t) \,\text {d}t = \sup _{n\in {\mathbb {N}}} {\int }_0^T |(\rho ^n_t)'|_{{\mathcal {T}}_{\mu ^n}}^2 \,\text {d}t < \infty . \end{aligned}$$
By Proposition 2.17, there now exists \((\rho ,{\varvec{j}}) \in {{\,\mathrm{CE}\,}}_T\) such that, up to subsequences, \(\rho _t^n \rightharpoonup \rho _t\) for all \(t\in [0,T]\) and \({\varvec{j}}^n \rightharpoonup {\varvec{j}}\) as \(n\rightarrow \infty \), and
$$\begin{aligned} \infty > \liminf _{n\rightarrow \infty } {\int }_0^T {\mathcal {A}}(\mu ^n;\rho ^n_t, {\varvec{j}}^n_t) \,\text {d}t \geqq {\int }_0^T {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t) \,\text {d}t. \end{aligned}$$
By Proposition 2.25, we therefore have \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and \(|(\rho _t)'|_{{\mathcal {T}}_\mu }^2 \leqq {\mathcal {A}}(\mu ;\rho _t,{\varvec{j}}_t)\) for almost all \(t\in [0,T]\), which finally gives
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\int }_0^T |(\rho ^n_t)'|^2_{{\mathcal {T}}_{\mu ^n}} \,\text {d}t \geqq {\int }_0^T |\rho _t'|_{{\mathcal {T}}_\mu }^2 \,\text {d}t. \end{aligned}$$
(3.27)
By the narrow continuity of the energy proved in Proposition 3.3, we get
$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho ^n_T) = {\mathcal {E}}(\rho _T) \quad \text {and} \quad \lim _{n\rightarrow \infty } {\mathcal {E}}(\rho _0^n) = {\mathcal {E}}(\rho _0). \end{aligned}$$
(3.28)
Furthermore, by Fatou’s lemma and the narrow lower semicontinuity of the local slope shown in Lemma 3.12, we have
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\int }_0^T {\mathcal {D}}(\mu ^n;\rho ^n_t) \,\text {d}t \geqq {\int }_0^T {\mathcal {D}}(\mu ;\rho _t) \,\text {d}t. \end{aligned}$$
(3.29)
Gathering (3.27), (3.28) and (3.29), we finally obtain
$$\begin{aligned} \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n)\geqq & {} {\mathcal {E}}(\rho _T) - {\mathcal {E}}(\rho _0) + \frac{1}{2} {\int }_0^T {\mathcal {D}}(\mu ;\rho _t) \,\text {d}t + \frac{1}{2} {\int }_0^T |\rho _t'|_{{\mathcal {T}}_\mu }^2 \,\text {d}t \\= & {} {\mathcal {G}}_T(\mu ;\rho ), \end{aligned}$$
which ends the proof. \(\square \)
We now get our stability result.
Theorem 3.14
(Stability of gradient flows) Let \((\mu ^n)_n\subset {\mathcal {M}}^+({\mathbb {R}}^d)\) and suppose that \((\mu ^n)_n\) narrowly converges to \(\mu \). Assume that the base measures \(\mu ^n\) and \(\mu \) satisfy (A1) and (A2) uniformly in n, and let the interaction potential K satisfy (K1)–(K3). Suppose that \(\rho ^n\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu ^n\) for all \(n\in {\mathbb {N}}\), that is,
$$\begin{aligned} {\mathcal {G}}_T(\mu ^n;\rho ^n) = 0 \quad \text{ for } \text{ all } \, n\in {\mathbb {N}}, \end{aligned}$$
such that \((\rho _0^n)_n\) satisfies \(\sup _{n\in {\mathbb {N}}} M_2(\rho _0^n)< \infty \) and \(\rho _t^n \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\) for some curve \((\rho _t)_{t\in [0,T]} \subset {\mathcal {P}}_2({{\mathbb {R}}^{d}})\). Then, \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and \(\rho \) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu \), that is,
$$\begin{aligned} {\mathcal {G}}_T(\mu ;\rho ) = 0. \end{aligned}$$
Proof
By Lemma 3.13 we directly obtain that \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) and, up to a subsequence,
$$\begin{aligned} 0 = \liminf _{n\rightarrow \infty } {\mathcal {G}}_T(\mu ^n;\rho ^n) \geqq {\mathcal {G}}_T(\mu ;\rho ). \end{aligned}$$
Finally, since \({\mathcal {G}}_T(\mu ;\rho ) \geqq 0\) by Young’s inequality and Corollary 3.11, we obtain \({\mathcal {G}}_T(\mu ;\rho ) = 0\). \(\square \)
Note that, via Theorem 3.9, the above theorem also shows stability of weak solutions to (\({\text {NL}}^2 {\text {IE}}\)). Typically, in Theorem 3.14, \((\mu ^n)_n\) is a sequence of atomic measures used to approximate, or sample, the support of \(\mu \). Indeed, we now use this approach to show the existence of weak solutions to the nonlocal nonlocal-interaction equation.
Theorem 3.15
(Existence of weak solutions) Let K be an interaction potential satisfying Assumptions (K1)–(K3). Suppose that \(\mu \in {\mathcal {M}}^+({\mathbb {R}}^d)\) is finite, i.e., \(\mu ({{\mathbb {R}}^{d}})<\infty \), and satisfies (A2). Assume furthermore that for some \(C_\eta ' > 0\) it holds that
$$\begin{aligned} \sup _{(x,y) \in G \cap {{\,\mathrm{supp}\,}}\mu \otimes \mu } \left( |x-y|^2 \vee |x-y|^4 \right) \eta (x,y) \leqq C_\eta '. \end{aligned}$$
(3.30)
Consider \(\rho _0 \in {\mathcal {P}}_2({{\mathbb {R}}^{d}})\) which is \(\mu \)-absolutely continuous. Then there exists a weakly continuous curve \(\rho :[0,T] \rightarrow {\mathcal {P}}({{\mathbb {R}}^{d}})\) such that \({{\,\mathrm{supp}\,}}\rho _t\subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(t\in [0,T]\), which is a weak solution of (\({\text {NL}}^2 {\text {IE}}\)) and satisfies the initial condition \(\rho (0)=\rho _0\).
Proof
Let \((\mu ^n)_n \subset {\mathcal {M}}^+({\mathbb {R}}^d)\) be a sequence of atomic measures such that \((\mu ^n)_n\) converges narrowly to \(\mu \). Moreover, assume that \(\mu ^n\) has finitely many atoms and \(\mu ^n({{\mathbb {R}}^{d}}) \leqq \mu ({{\mathbb {R}}^{d}})\) and \({{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(n\in {\mathbb {N}}\). Let \({\hat{\mu }}^n\) be the normalization of \(\mu ^n\) which has the same total mass as \(\mu \), that is,
$$\begin{aligned} {\hat{\mu }}^n = \frac{\mu ({\mathbb {R}}^d)}{\mu ^n({\mathbb {R}}^d)} \, \mu ^n , \end{aligned}$$
and let \(\pi ^n\) be an optimal transportation plan between \(\mu \) and \({\hat{\mu }}^n\) for the quadratic cost. Let \(\rho _0^n\) be the second marginal of \({\tilde{\rho }}_0 \pi ^n\), where \({\tilde{\rho }}_0\) is the density of the measure \(\rho _0\) with respect to \(\mu \); namely, let \(\rho _0^n(A) = {\int }_{{\mathbb {R}}^d \times A} {\tilde{\rho }}_0(x) \,\text {d}\pi ^n(x,y)\) for any Borel set \(A\subset {{\mathbb {R}}^{d}}\). Note that \(\rho _0^n({\mathbb {R}}^d) = \rho _0({\mathbb {R}}^d)\) and \(\rho _0^n \ll \mu ^n\) for all \(n\in {\mathbb {N}}\), and that, since \({\tilde{\rho }}_0 \pi ^n\) is a transport plan between \(\rho _0\) and \(\rho _0^n\), \(\rho _0^n \rightharpoonup \rho _0\) as \(n\rightarrow \infty \).
Thanks to Assumption (3.30), it holds, for all \(n\in {\mathbb {N}}\), that
$$\begin{aligned} \mathop {\mu \mathrm{-ess\,sup}}\limits _{x\in {\mathbb {R}}^d} {\int } ( |x-y|^2 \vee |x-y|^4 ) \eta (x,y)\, \text {d}\mu ^n(y)\leqq & {} \mu ^n({{\mathbb {R}}^{d}}) C_\eta ' \nonumber \\\leqq & {} \mu ({{\mathbb {R}}^{d}}) C_\eta ' . \end{aligned}$$
(3.31)
Since, by construction, \(\rho _0^n \ll \mu ^n\), we have \({{\,\mathrm{supp}\,}}\rho _0^n \subseteq {{\,\mathrm{supp}\,}}\mu ^n \subseteq {{\,\mathrm{supp}\,}}\mu \). This nested support property is, thanks to Proposition 2.28, preserved in time, so that \({{\,\mathrm{supp}\,}}\rho _t^n \subseteq {{\,\mathrm{supp}\,}}\mu \) for all \(t\in [0,T]\) and \(n\in {\mathbb {N}}\). For this reason, (3.31) can be used, under the stated support restriction on \(\rho _0\), instead of Assumption (A1) uniformly in n when calling Lemma 3.13 and Theorem 3.14 later in this proof. Since \(\mu ^n\) consists of finitely many atoms and \(\mu \) satisfies (A2), the family \((\mu ^n)_n\) satisfies (A2) uniformly in n.
By Remark 1.1, we know that the ODE system (1.2)–(1.4) admits a unique solution for all \(n\in {\mathbb {N}}\). It can be easily checked that this solution, which we denote by \(\rho ^n\), is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu ^n\) starting from \(\rho _0^n\), according to Definition 3.1. By Theorem 3.9, we then get that \(\rho ^n\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu ^n\) starting from \(\rho _0^n\) for all \(n\in {\mathbb {N}}\).
Combining the compactness part of Lemma 3.13 and the stability from Theorem 3.14, we get that, up to a subsequence, \(\rho _t^n \rightharpoonup \rho _t\) as \(n\rightarrow \infty \) for all \(t\in [0,T]\), where \(\rho \in {{\,\mathrm{AC}\,}}([0,T];({\mathcal {P}}_2({{\mathbb {R}}^{d}}),{\mathcal {T}}_\mu ))\) is a gradient flow of \({\mathcal {E}}\) with respect to \(\mu \) starting from \(\rho _0\). Theorem 3.9 finally shows that \(\rho \) is a weak solution to (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu \) starting from \(\rho _0\). \(\square \)
Remark 3.16
Assumption (3.30) is only needed to arrive at an atomic approximation sequence \((\mu ^n)_n\) of \(\mu \) such that Assumptions (A1) and (A2) hold uniformly in n. On a case-by-case basis, one could drop (3.30) and try to construct the sequence \((\mu ^n)_n\) explicitly in such a way as to satisfy both assumptions uniformly in n.
Remark 3.17
We conclude the section by remarking on the relevance of Theorem 3.14 to the setting of machine learning. There, \(\mu \) is the measure modeling the true data distribution, which can be assumed to be compactly supported. Let \((x_i)_i\) be a sequence of i.i.d. samples of \(\mu \) and let \(\mu ^n = \frac{1}{n} \sum _{i=1}^n \delta _{x_i}\) be the empirical measure of the first n sample points. Assume \((\rho ^n)_n\) is a narrowly converging sequence of probability measures such that \({{\,\mathrm{supp}\,}}\rho ^n \subseteq \{x_1, \dots , x_n\}\) for all \(n\in {\mathbb {N}}\), and denote by \(\rho \) its limit. Assume that \(\eta \) is an edge weight kernel such that \(\mu \) and \(\eta \) satisfy (A2) and (3.30). Let K be an interaction kernel satisfying (K1)–(K3). Finally, let \(({\tilde{\rho }}^n)_n\) be the sequence of solutions of (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu ^n\), in the sense of Definition 3.1, such that \({\tilde{\rho }}^n_0 = \rho ^n\) for all \(n\in {\mathbb {N}}\). Then, by Lemma 3.13, the sequence \(({\tilde{\rho }}_t^n)_n\) narrowly converges along a subsequence for all \(t\in [0,T]\), and furthermore, by Theorem 3.14, any curve \(({\tilde{\rho }}_t)_{t\in [0,T]}\) of subsequential limits yields a solution \({\tilde{\rho }}\) of (\({\text {NL}}^2 {\text {IE}}\)) with respect to \(\mu \) with initial condition \(\rho \).
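To make the sampling picture concrete, the following is a minimal numerical sketch of the particle dynamics on an empirical measure \(\mu ^n\). It is an illustration under stated assumptions, not the system (1.2)–(1.4) verbatim: we assume the upwind flux \(j_{ik} = \rho _i (v_{ik})_+ - \rho _k (v_{ik})_-\) with \(v_{ik} = -(K*\rho (x_k) - K*\rho (x_i))\), where \(\rho _i\) is the density of \(\rho \) with respect to \(\mu ^n\) at \(x_i\). The functions `kernel` and `eta`, the uniform sampling of \(\mu \), and the explicit Euler integrator are hypothetical choices made only for brevity, and they have not been checked against (K1)–(K3), (A2) or (3.30).

```python
import math
import random

def kernel(a, b):
    # Illustrative smooth, symmetric interaction potential (an assumption).
    return 1.0 - math.exp(-(a - b) ** 2)

def eta(a, b):
    # Illustrative symmetric edge weight: connect distinct points closer than 0.5.
    return 1.0 if 0.0 < abs(a - b) < 0.5 else 0.0

def potential(rho, x):
    # (K * rho)(x_i) = (1/n) sum_k K(x_i, x_k) rho_k, rho_k = density w.r.t. mu^n.
    n = len(x)
    return [sum(kernel(xi, xk) * rk for xk, rk in zip(x, rho)) / n for xi in x]

def energy(rho, x):
    # E(rho) = (1/2) (1/n^2) sum_{i,k} K(x_i, x_k) rho_i rho_k.
    n = len(x)
    pot = potential(rho, x)
    return 0.5 * sum(p * r for p, r in zip(pot, rho)) / n

def rhs(rho, x):
    # d rho_i / dt = -(1/n) sum_k j_ik eta_ik with the (assumed) upwind flux
    # j_ik = rho_i (v_ik)_+ - rho_k (v_ik)_-,  v_ik = -(pot_k - pot_i).
    n = len(x)
    pot = potential(rho, x)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            v = -(pot[k] - pot[i])
            flux = rho[i] * max(v, 0.0) - rho[k] * max(-v, 0.0)
            total += flux * eta(x[i], x[k])
        out.append(-total / n)
    return out

def simulate(x, rho0, dt=0.01, steps=200):
    # Explicit Euler time stepping; a crude integrator chosen only for brevity.
    rho = list(rho0)
    for _ in range(steps):
        drho = rhs(rho, x)
        rho = [r + dt * d for r, d in zip(rho, drho)]
    return rho

random.seed(0)
n = 30
x = [random.uniform(-1.0, 1.0) for _ in range(n)]  # i.i.d. samples of a hypothetical mu
rho0 = [1.0] * n                                   # uniform density with respect to mu^n
rho_T = simulate(x, rho0)
```

Because the flux is antisymmetric and `eta` is symmetric, the total mass \(\frac{1}{n}\sum _i \rho _i\) is conserved by each Euler step up to rounding, and for this small step size the densities stay non-negative. One would also expect `energy(rho_T, x)` to lie below `energy(rho0, x)`, in line with the gradient-flow interpretation.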
Discussion of the Local Limit
Here we discuss at a formal level the connection between the nonlocal nonlocal-interaction equation and its limit as the graph structure localizes. We first present a very formal justification as to why we expect the solutions of (\({\text {NL}}^2 {\text {IE}}\)) to converge to the solutions of a nonlocal-interaction equation as the localizing parameter \(\varepsilon \rightarrow 0^+\), i.e., as the edge-weight function \(\eta = \eta _\varepsilon \) localizes. We conclude this section with an example that cautions that the formal argument cannot be justified in full generality. Proving the convergence of (\({\text {NL}}^2 {\text {IE}}\)) in the limit \(\varepsilon \rightarrow 0^+\), under appropriate conditions, remains an intriguing open problem.
Take \(\mu ={\text {Leb}}({{\mathbb {R}}^{d}})\) and choose \(\eta _\varepsilon \) given by (2.2). Consider a smooth interaction potential \(K:{{\mathbb {R}}^{d}}\times {{\mathbb {R}}^{d}}\rightarrow {\mathbb {R}}\) and a compactly supported initial condition \(\rho _0\) which has a continuous density with respect to \(\mu \). Let \(\rho ^\varepsilon \) be the solution of (\({\text {NL}}^2 {\text {IE}}\)) starting from \(\rho _0\) for the edge weight function \(\eta _\varepsilon \). Assume that \(\rho ^\varepsilon _t\) is absolutely continuous with respect to \(\mu \) for all t. In the following we drop the t-dependence of \(\rho ^\varepsilon \) for brevity. From (\({\text {NL}}^2 {\text {IE}}\)), by adding and subtracting \(\rho ^\varepsilon (x) {\int }_{{\mathbb {R}}^{d}}({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_{+} \eta _\varepsilon (x,y) \,\text {d}y\), it follows that
$$\begin{aligned} \partial _t \rho ^\varepsilon (x)= & {} - \rho ^\varepsilon (x) {\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}K*\rho ^\varepsilon (x,y) \eta _\varepsilon (x,y) \,\text {d}y \\&- {\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}\rho ^\varepsilon (x,y) ({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_+ \eta _\varepsilon (x,y) \,\text {d}y. \end{aligned}$$
Then, for almost all \(x \in {{\mathbb {R}}^{d}}\) we have
$$\begin{aligned}&{\int }_{{{\mathbb {R}}^{d}}} {\overline{\nabla }}K*\rho ^\varepsilon (x,y) \eta _\varepsilon (x,y) \,\text {d}y \\&\quad = \frac{2(2+d)}{\varepsilon ^2} {\int }_{{\mathbb {R}}^{d}}(K*\rho ^\varepsilon (y) - K*\rho ^\varepsilon (x)) \frac{\chi _{B_\varepsilon (x)}(y)}{|B_\varepsilon |} \,\text {d}y\\&\quad = \frac{2(2+d)}{\varepsilon ^2} \left( \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (x)} K*\rho ^\varepsilon (y) \,\text {d}y - K*\rho ^\varepsilon (x) \right) . \end{aligned}$$
A standard calculation, using a second-order Taylor expansion, shows that the right-hand side approximates \(\Delta K*\rho ^\varepsilon (x)\) when \(\varepsilon \) is small, provided that derivatives of \(\rho ^\varepsilon \) remain uniformly bounded.
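As a sketch of this calculation, let f be a generic function with bounded derivatives up to fourth order, standing in for \(K*\rho ^\varepsilon \) (the symbol f is our notation for this illustration only). Expanding f around x and integrating over the ball, the odd-order terms vanish by symmetry, and \(\frac{1}{|B_\varepsilon |}{\int }_{B_\varepsilon (0)} z_i z_j \,\text {d}z = \frac{\varepsilon ^2}{d+2}\delta _{ij}\), so that
$$\begin{aligned} \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (x)} f(y) \,\text {d}y - f(x)&= \frac{1}{|B_\varepsilon |} {\int }_{B_\varepsilon (0)} \left( \nabla f(x)\cdot z + \frac{1}{2}\, z \cdot \nabla ^2 f(x) z \right) \text {d}z + O(\varepsilon ^4)\\&= \frac{\varepsilon ^2}{2(d+2)}\, \Delta f(x) + O(\varepsilon ^4). \end{aligned}$$
Multiplying by \(\frac{2(2+d)}{\varepsilon ^2}\) then gives \(\Delta f(x) + O(\varepsilon ^2)\), where the constant in the error term depends on the fourth derivatives of f.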
Similarly, by Taylor expanding \({\overline{\nabla }}\rho ^\varepsilon \) and \({\overline{\nabla }}K *\rho ^\varepsilon \) to first order and changing variable over the unit sphere while carefully tracking the positive part, one gets
$$\begin{aligned}&{\int }_{{\mathbb {R}}^{d}}{\overline{\nabla }}\rho ^\varepsilon (x,y) ({\overline{\nabla }}K*\rho ^\varepsilon (x,y))_+ \eta _\varepsilon (x,y) \,\text {d}y \approx \nabla \rho ^\varepsilon (x) \cdot \nabla K*\rho ^\varepsilon (x) \\&\quad \text {for small } \varepsilon . \end{aligned}$$
Combining the expressions above yields
$$\begin{aligned} \partial _t \rho ^\varepsilon (x)\approx & {} -\rho ^\varepsilon (x) \Delta K*\rho ^\varepsilon (x) - \nabla \rho ^\varepsilon (x) \cdot \nabla K*\rho ^\varepsilon (x)\\= & {} -\nabla \cdot (\rho ^\varepsilon \nabla K*\rho ^\varepsilon )(x). \end{aligned}$$
This suggests that if \(\rho ^\varepsilon \) converges as \(\varepsilon \rightarrow 0^+\), then the limiting \(\rho \) is a solution of the standard nonlocal-interaction equation (3.1). A possible way to attack the local limit within the variational framework is via a stability statement similar to that of Theorem 3.14, but now with respect to the family \((\eta _\varepsilon )_{\varepsilon >0}\) in the limit \(\varepsilon \rightarrow 0^+\). The next remark indicates that this will require further regularity assumptions on the interaction kernel K.
Remark 3.18
We present an example that indicates that, in certain situations, solutions of (\({\text {NL}}^2 {\text {IE}}\)) cannot be expected to converge to solutions of (3.1) as the edge weight function \(\eta _\varepsilon \) becomes more concentrated. Namely, consider \(d=1\), \(\Omega = (-2,2)\) and \(\mu ={\text {Leb}}(\Omega )\). Let \(K(x,y) = 1-e^{-|x-y|}\) for all \(x,y\in \Omega \) and \(\eta \) be a smooth, even function, positive on \((-0.2,0.2)\) and zero otherwise. Consider \(\rho _0 = \frac{1}{2} (\delta _{-1} + \delta _1)\). It is straightforward to verify that \(\rho _t = \rho _0\) for all \(t\in [0,T]\) yields a weak solution of (\({\text {NL}}^2 {\text {IE}}\)) for all \(\varepsilon >0\). In particular, note that the corresponding velocity field satisfies \(v(-1,y) = -(K*\rho _0(y) - K*\rho _0(-1)) \leqq 0\) for all \(y \in (-1.2,-0.8)\), and thus the flux from \(x=-1\) remains zero, and analogously from \(x=1\). Therefore, one cannot expect the weak solutions of (\({\text {NL}}^2 {\text {IE}}\)) for this interaction potential K to converge to weak solutions of (3.1) as \(\varepsilon \rightarrow 0^+\). We believe that, for this particular kernel K and these edge weights \(\eta \), the problem persists for strong solutions with initial data close to \(\rho _0\), although explicit solutions are not available.
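The sign claim on the velocity field can be checked numerically. The short sketch below evaluates \(v(-1,y) = -(K*\rho _0(y) - K*\rho _0(-1))\) on a grid in \((-1.2,-0.8)\), with \(K*\rho _0(y) = \frac{1}{2}(K(y,-1)+K(y,1))\). The function names and the grid resolution are our own choices for illustration; a grid check is of course not a proof.

```python
import math

def K(x, y):
    # Interaction potential from the remark: K(x, y) = 1 - exp(-|x - y|).
    return 1.0 - math.exp(-abs(x - y))

def K_conv_rho0(y):
    # (K * rho_0)(y) for rho_0 = (delta_{-1} + delta_{1}) / 2.
    return 0.5 * (K(y, -1.0) + K(y, 1.0))

def velocity(x, y):
    # v(x, y) = -(K*rho_0(y) - K*rho_0(x)), the nonlocal velocity along the edge (x, y).
    return -(K_conv_rho0(y) - K_conv_rho0(x))

# Interior grid points of the neighbourhood (-1.2, -0.8) of the atom at x = -1.
ys = [-1.2 + 0.4 * k / 400 for k in range(1, 400)]
max_v = max(velocity(-1.0, y) for y in ys)
```

On this grid the velocity out of \(x=-1\) is never positive, so the upwind flux \(\rho _0(\{-1\})\, v_+\) out of the atom vanishes. By the symmetry \(y \mapsto -y\), the same holds at \(x=1\).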