1 Introduction

The aim of this paper is to derive the population cross-diffusion system of Shigesada et al. (1979) from a stochastic, moderately interacting particle system in a mean-field-type limit. More precisely, we derive the system of equations

$$\begin{aligned} \partial _t u_i = {\text {div}}(u_i\nabla U_i) + \Delta \bigg (\sigma _i u_i + u_i\sum _{j=1}^n f(a_{ij}u_j)\bigg ),\quad u_i(0) = u_{0,i}\quad \text{ in } {{\mathbb {R}}}^d,\ t>0, \end{aligned}$$
(1)

where \(i=1,\ldots ,n\) is the species index, \(d\ge 1\) the space dimension, \(u=(u_1,\ldots ,u_n)\) the vector of population densities, and \(U_i=U_i(x)\) are given environmental potentials. The parameters \(\sigma _i>0\) are the constant diffusion coefficients of the stochastic system, and \(a_{ij}\ge 0\) are the limiting values of the interaction potentials. In the linear case \(f(s)=s\), we recover the population model of Shigesada et al. (1979). System (1) with nonlinear functions f has also been studied in the mathematical literature; see, e.g., (Chen et al. 2018; Desvillettes et al. 2015; Lepoutre and Moussa 2017). Such systems can be formally derived from random walks on a lattice, where the nonlinearity originates from the transition rates of the random-walk model (Zamponi and Jüngel 2017, Appendix A): transition rates that depend nonlinearly on the densities lead to equations similar to (1). We assume that f is smooth but not necessarily globally Lipschitz continuous; this includes power functions. Our results also hold for species-dependent functions \(f_i\), but we use the same function for all species to simplify the presentation.

This paper extends the many-particle limit of Chen et al. (2019) leading to the cross-diffusion system

$$\begin{aligned} \partial _t u_i = {\text {div}}\bigg (\sigma _i\nabla u_i + \sum _{j=1}^n a_{ij}u_i\nabla u_j\bigg ) \quad \text{ in } {{\mathbb {R}}}^d,\ t>0,\ i=1,\ldots ,n, \end{aligned}$$
(2)

which differs from (1) by the drift term, the nonlinear function f, and the diffusion term \({\text {div}}\sum _{j=1}^n a_{ij}u_j\nabla u_i\). System (2) is the mean-field limit of the particle system for N individuals

$$\begin{aligned} \begin{aligned}&\mathrm {d}Y_{k,i}^{N,\eta } = -\sum _{j=1}^n\frac{1}{N}\sum _{\ell =1}^N \nabla B_{ij}^\eta \big (Y_{k,i}^{N,\eta }-Y_{\ell ,j}^{N,\eta }\big )\mathrm {d}t + \sqrt{2\sigma _i}\mathrm {d}W_i^k(t), \\&Y_{k,i}^{N,\eta }(0)=\xi _i^k, \quad i=1,\ldots ,n,\ k=1,\ldots ,N, \end{aligned} \end{aligned}$$
(3)

where \((W_i^k(t))_{t\ge 0}\) are d-dimensional Brownian motions and \(\xi _i^1,\ldots ,\xi _i^N\) are independent and identically distributed (iid) random variables with the common probability density function \(u_{0,i}\). The functions

$$\begin{aligned} B_{ij}^\eta (x) = \eta ^{-d}B_{ij}\bigg (\frac{|x|}{\eta }\bigg ), \quad x\in {{\mathbb {R}}}^d, \end{aligned}$$
(4)

are interaction potentials regularizing the delta distribution \(\delta _0\), i.e. \(B_{ij}^\eta \rightarrow a_{ij}\delta _0\) as \(\eta \rightarrow 0\) in the sense of distributions.
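The scaling in (4) preserves the mass of the potential, which is why \(B_{ij}^\eta \) approximates \(a_{ij}\delta _0\) as \(\eta \rightarrow 0\): the substitution \(x=\eta y\) gives \(\int B_{ij}^\eta \,\mathrm {d}x=\int B_{ij}(|y|)\,\mathrm {d}y\). A minimal numerical check in d = 1, assuming the hypothetical profile \(B(r)=\exp (-1/(1-r^2))\) for \(r<1\) and \(B(r)=0\) otherwise (the names bump, B_eta, and mass are illustrative, not from the text):

```python
import math


def bump(r):
    """Hypothetical smooth profile B(r) = exp(-1/(1 - r^2)) for r < 1, else 0."""
    return math.exp(-1.0 / (1.0 - r * r)) if r < 1.0 else 0.0


def B_eta(x, eta, d=1):
    """Scaled potential B^eta(x) = eta^{-d} B(|x| / eta), cf. (4), for scalar x."""
    return eta ** (-d) * bump(abs(x) / eta)


def mass(eta, n_cells=100_000):
    """Midpoint-rule approximation of the integral of B^eta over R (d = 1).

    B^eta vanishes outside [-eta, eta], so integrating over that interval suffices.
    """
    h = 2.0 * eta / n_cells
    return h * sum(B_eta(-eta + (m + 0.5) * h, eta) for m in range(n_cells))
```

The computed mass is approximately 0.444 for every \(\eta \), while the support \([-\eta ,\eta ]\) shrinks; this is the sense in which \(B^\eta \) concentrates at the origin with fixed weight.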

System (1) is derived from an interacting particle system for n species with particle numbers \(N_1,\ldots ,N_n\), moving in the whole space \({{\mathbb {R}}}^d\). For simplicity, we set \(N_i=N\) for all \(i=1,\ldots ,n\). The key idea of this paper is to let the particles interact through their diffusion coefficients:

$$\begin{aligned} \begin{aligned}&\mathrm {d}X_{k,i}^{N,\eta } = -\nabla U_i(X_{k,i}^{N,\eta })\mathrm {d}t \\&\quad \qquad \quad + \bigg (2\sigma _i + 2\sum _{j=1}^nf_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1\\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta (X_{k,i}^{N,\eta }-X_{\ell ,j}^{N,\eta })\bigg )\bigg )^{1/2}\mathrm {d}W_i^k(t), \\&X_{k,i}^{N,\eta }(0) = \xi _i^k, \quad i=1,\ldots ,n,\ k=1,\ldots ,N, \end{aligned} \end{aligned}$$
(5)

where \(f_\eta \) is a globally Lipschitz continuous approximation of f whose Lipschitz constant is less than or equal to \(\eta ^{-\alpha }\) for some \(\alpha \in [0,1)\). In view of (4), the scaling parameter \(\eta \) can be interpreted as the interaction radius of each particle.
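Section 7 reports Monte Carlo simulations of an Euler–Maruyama discretization of (5). The following is a minimal sketch of such a discretization, not the authors' code, written under assumptions that are not taken from the text: space dimension d = 1, all \(B_{ij}\) equal to one hypothetical smooth bump, the hypothetical nonlinearity \(f(s)=s^2\) made globally Lipschitz by clipping its argument at a level a_trunc, and standard Gaussian initial data. The drift uses \(U_i(x)=-\frac{1}{2}|x|^2\) from Assumption (A2). The names bump, simulate_particles, and a_trunc are illustrative.

```python
import math
import random


def bump(r):
    """Hypothetical smooth profile B(r) = exp(-1/(1 - r^2)) for r < 1, else 0."""
    return math.exp(-1.0 / (1.0 - r * r)) if r < 1.0 else 0.0


def simulate_particles(n=2, N=50, eta=0.5, T=0.05, dt=1e-3,
                       sigma=(1.0, 1.0), a_trunc=5.0, seed=0):
    """Euler-Maruyama sketch for the particle system (5) in d = 1.

    Assumptions (not from the text): B_ij = bump for all i, j; f(s) = s**2,
    truncated by clipping at +-a_trunc; standard Gaussian initial data.
    Returns X with X[i][k] = position of particle k of species i at time T.
    """
    rng = random.Random(seed)

    def f_eta(s):
        # globally Lipschitz (constant 2 * a_trunc) truncation of s**2
        return min(max(s, -a_trunc), a_trunc) ** 2

    def B_eta(x):
        # B^eta(x) = eta^{-d} B(|x| / eta) with d = 1, cf. (4)
        return bump(abs(x) / eta) / eta

    X = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(n)]
    for _ in range(int(round(T / dt))):
        X_new = [[0.0] * N for _ in range(n)]
        for i in range(n):
            for k in range(N):
                x = X[i][k]
                interaction = 0.0
                for j in range(n):
                    # (1/N) * sum over (l, j) != (k, i) of B^eta(x - X[j][l])
                    rho = sum(B_eta(x - X[j][l]) for l in range(N)
                              if (l, j) != (k, i)) / N
                    interaction += f_eta(rho)
                diffusion = math.sqrt(2.0 * sigma[i] + 2.0 * interaction)
                # drift -grad U_i(x) = x for U_i(x) = -x^2/2, cf. (A2)
                dW = math.sqrt(dt) * rng.gauss(0.0, 1.0)
                X_new[i][k] = x + x * dt + diffusion * dW
        X = X_new
    return X
```

For example, simulate_particles(N=50, T=0.05) returns two lists of 50 particle positions each. The pairwise loop costs \(O((nN)^2)\) operations per time step, so this direct version is only practical for small N; realistic particle numbers would call for vectorization or neighbor lists exploiting the compact support of \(B^\eta \).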

Equations (1) are derived from system (5) in the limit \(N\rightarrow \infty \), \(\eta \rightarrow 0\), with the scaling relation between \(\eta \) and N given in (9) below. First, for fixed \(\eta >0\), we perform a classical mean-field limit from (5) to the following auxiliary intermediate system:

$$\begin{aligned} \begin{aligned}&\mathrm {d}{\overline{X}}_{k,i}^{\eta } = -\nabla U_i({\overline{X}}_{k,i}^{\eta })\mathrm {d}t + \bigg (2\sigma _i + 2\sum _{j=1}^nf_\eta \big (B_{ij}^\eta * u_{\eta ,j}({\overline{X}}_{k,i}^\eta )\big )\bigg )^{1/2}\mathrm {d}W_i^k(t), \\&{\overline{X}}_{k,i}^\eta (0) = \xi _i^k, \quad i=1,\ldots ,n,\ k=1,\ldots ,N, \end{aligned} \end{aligned}$$
(6)

where we set \(u_{\eta ,j}({\overline{X}}_{k,i}^\eta )=u_{\eta ,j} (t,{\overline{X}}_{k,i}^\eta (t))\) for \(j=1,\ldots ,n\). The function \(u_{\eta ,j}\) satisfies the nonlocal cross-diffusion system

$$\begin{aligned} \begin{aligned}&\partial _t u_{\eta ,i} = {\text {div}}(u_{\eta ,i}\nabla U_i) + \Delta \bigg (\sigma _i u_{\eta ,i} + u_{\eta ,i}\sum _{j=1}^n f_\eta (B_{ij}^\eta * u_{\eta ,j})\bigg ), \\&u_{\eta ,i}(0) = u_{0,i}\ \text{ in } {{\mathbb {R}}}^d,\quad i=1,\ldots ,n, \end{aligned} \end{aligned}$$
(7)

and will later be identified as the probability density function of \({\overline{X}}_{k,i}^{\eta }\). Note that we consider N independent copies \({\overline{X}}_{k,i}^\eta \), \(k=1,\ldots ,N\), and the intermediate system depends on k only through the initial datum \(\xi _i^k\) and the Brownian motion \(W_i^k\).

Then, passing to the limit \(N\rightarrow \infty \), \(\eta \rightarrow 0\) in (5) leads to the macroscopic system

$$\begin{aligned} \begin{aligned}&\mathrm {d}{\widehat{X}}_{k,i} = -\nabla U_i({\widehat{X}}_{k,i})\mathrm {d}t + \bigg (2\sigma _i + 2\sum _{j=1}^n f(a_{ij}u_{j}({\widehat{X}}_{k,i}))\bigg )^{1/2} \mathrm {d}W_i^k(t), \\&{\widehat{X}}_{k,i}(0) = \xi _i^k, \quad i=1,\ldots ,n,\ k=1,\ldots ,N, \end{aligned} \end{aligned}$$
(8)

where the functions \(u_i\) satisfy (1) and can be identified as the probability density functions of \({\widehat{X}}_{k,i}\). In this limit, we assume that there exists \(\delta >0\), depending on n, \(\min _i\sigma _i\), and T, such that

$$\begin{aligned} \eta ^{-2(d+1+\alpha )}\le \delta \log N \end{aligned}$$
(9)

holds, where \(\alpha \ge 0\) controls the growth of the Lipschitz constant of the approximation \(f_\eta \) (see Assumption (A4) below), and that the function f and its derivatives or, alternatively, the initial data are sufficiently small (see Sect. 2 for details). The main result of the paper is the error estimate

$$\begin{aligned} \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T}\big |X_{k,i}^{N,\eta }(s) - {\widehat{X}}_{k,i}(s)\big |^2\bigg )\le C(T)\eta ^{2(1-\alpha )}. \end{aligned}$$
(10)

We prove this estimate for the potential \(U_i(x)=-\frac{1}{2}|x|^2\), but more general potentials are possible; see Remark 1. Note that estimate (10) implies propagation of chaos; see Remark 6. In the case \(\alpha =0\), our scaling (9) recovers, in the multi-species case, the result of Jourdain and Méléard (1998), where a single-species, moderately interacting particle system with interaction in the diffusion part was considered. Our strategy is similar to that of Jourdain and Méléard (1998) (and based on ideas of Oelschläger 1989). Since we only assume f to be locally Lipschitz continuous, we obtain a smaller convergence rate than Jourdain and Méléard (1998). This is natural, since we approximate the nonlinearity by functions whose Lipschitz constant is of order \(\eta ^{-\alpha }\). Another difference from Jourdain and Méléard (1998) is that those authors assume the diffusion matrix in the stochastic part to be positive definite. We do not need such a condition, but we need a smallness condition on the nonlinearity for the existence proofs for systems (1) and (7).

Next, we give a brief overview of the existing literature on mean-field limits and moderately interacting many-particle limits in the context of diffusion equations. Mean-field limits from stochastic differential equations have been investigated since the 1980s; see the reviews (Golse 2003; Jabin and Wang 2017) and the classical works by Sznitman (1984, 1991). Oelschläger proved that, in the many-particle limit, weakly interacting stochastic particle systems converge to a deterministic nonlinear process (Oelschläger 1984). Later, he generalized his approach to systems of reaction-diffusion equations (Oelschläger 1989) and to porous-medium-type equations with quadratic diffusion (Oelschläger 1990) by using moderately interacting particle systems. We also refer to the recent work (Chen et al. 2018), which includes numerical simulations as well. As already mentioned, moderate interactions in stochastic particle systems with nonlinear diffusion coefficients were first investigated in Jourdain and Méléard (1998). Later, Stevens derived the chemotaxis model from a many-particle system (Stevens 2000). Further works concern the mean-field limit leading to reaction-diffusion equations with nonlocal terms (Ichikawa et al. 2012), the hydrodynamic limit from a two-component system of Brownian motions to the cross-diffusion Maxwell–Stefan equations (Seo 2018), and the large-population limit from point-measure-valued Markov processes to nonlocal Lotka–Volterra systems with cross-diffusion (Fontbona and Méléard 2015). The latter model is similar to the nonlocal system (7). The limit from the nonlocal to the local diffusion system was shown in Moussa (2020), but only for triangular diffusion matrices. The many-particle limit from a particle system driven by Lévy noise to a fractional cross-diffusion system related to (2) was recently shown in Daus et al. (2020). Furthermore, the population system (1) was derived in Daus et al. (2019) from a time-continuous Markov chain model using the BBGKY hierarchy. To the best of our knowledge, this paper presents the first rigorous derivation of the Shigesada–Kawasaki–Teramoto (SKT) model (1) from a stochastic particle system in the moderate many-particle limit.

Porous-medium-type equations can be derived from stochastic interacting particle systems by assuming interactions in the drift term (Figalli and Philipowski 2008) or in the diffusion term (Jourdain and Méléard 1998). Like the latter work, we allow for interactions in the diffusion part, but in a multi-species setting. The paper (Fontbona and Méléard 2015) also considers a multi-species framework, but the authors assume bounded, Lipschitz continuous interaction potentials and derive only a nonlocal cross-diffusion system. We relax these assumptions and derive the local cross-diffusion system (1).

Compared to the work (Daus et al. 2019), we take the limits \(N\rightarrow \infty \), \(\eta \rightarrow 0\) simultaneously. However, our approach also implies the two-step limit. Indeed, we can first perform the limit \(N\rightarrow \infty \) for fixed \(\eta >0\) and afterward the limit \(\eta \rightarrow 0\) on the PDE level; see Lemma 9 and Theorem 3. The simultaneous limit \(N\rightarrow \infty \), \(\eta \rightarrow 0\), satisfying the scaling relation (9), gives a more complete picture, since we can prove the convergence in expectation for the difference of the solutions to the stochastic systems (5) and (8).

Finally, we remark that the cross-diffusion models (1) and (2) have quite different structural properties; see also (Burger et al. 2020a, b). First, system (2) has a formal gradient-flow structure for each species separately, while system (1) can be written, under the detailed-balance condition (Chen et al. 2019), only in a vector-valued gradient-flow form. Second, the two models exhibit different segregation behavior: segregation is stronger for solutions to (2) than for solutions to (1); see the numerical experiments in Sect. 7.

The paper is organized as follows. We present our assumptions and main results in Sect. 2. The existence of smooth solutions to the nonlocal system (7) is proved in Sect. 3; the existence of smooth solutions to the local system (1) and an error estimate for the difference of the corresponding solutions are proved in Sect. 4. The proofs are based on Banach's fixed-point theorem and higher-order energy estimates. We present the full proofs since the gradient of the environmental potential \(U_i(x)=-\frac{1}{2}|x|^2\) is not square integrable, which requires some care; see the arguments following (22). Section 5 is concerned with the identification of the solutions to the local and nonlocal cross-diffusion systems (1) and (7), respectively, with the probability density functions associated with the particle systems (8) and (6), respectively. Error estimate (10), the main result of the paper, is proved in Sect. 6. In Sect. 7, we present Monte Carlo simulations for an Euler–Maruyama discretization of system (5) and compare them with numerical results for the particle system associated with (2). In the appendix, we recall some inequalities used in the paper.

2 Assumptions and Main Results

We impose the following assumptions:

  1. (A1)

    Data: \(\sigma _i\in (0,\infty )\) and \(\xi _i^1,\ldots ,\xi _i^N\) are independent and identically distributed (iid) square-integrable random variables with the common density function \(u_{0,i}\) for \(i=1,\ldots ,n\) on the probability space \((\Omega ,{\mathcal {F}},P)\).

  2. (A2)

    Environmental potential: \(U_i(x)=-\frac{1}{2}|x|^2\), \(i=1,\ldots ,n\).

  3. (A3)

    Interaction potential: \(B_{ij}\in C_0^\infty ({{\mathbb {R}}}^d)\) satisfies \({\text {supp}}(B_{ij})\subset B_1(0)\), where \(B_1(0)\) is the unit ball in \({{\mathbb {R}}}^d\) and \(i,j=1,\ldots ,n\).

  4. (A4)

    Nonlinearity: \(f\in W_{\mathrm{loc}}^{s+1,\infty }({{\mathbb {R}}};[0,\infty ))\), and \(f_\eta \in W^{s+1,\infty }({{\mathbb {R}}};[0,\infty ))\) is such that \(f_\eta =f\) on \([-a_\eta ,a_\eta ]\) and the Lipschitz constant of \(f_\eta \) is less than or equal to \(\eta ^{-\alpha }\) for a fixed \(\alpha \in [0,1)\). Here, \(s>d/2+1\) and \(a_\eta \rightarrow \infty \) as \(\eta \rightarrow 0\). If f is globally Lipschitz continuous, we set \(\alpha = 0\) and \(f_\eta = f\).
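Assumption (A4) couples the truncation level \(a_\eta \) to the Lipschitz bound \(\eta ^{-\alpha }\). A minimal sketch for the hypothetical nonlinearity \(f(s)=s^2\), which is locally but not globally Lipschitz continuous: truncating the argument at \(\pm a_\eta \) with \(a_\eta =\eta ^{-\alpha }/2\) gives \(f_\eta =f\) on \([-a_\eta ,a_\eta ]\) and Lipschitz constant \(2a_\eta =\eta ^{-\alpha }\), while \(a_\eta \rightarrow \infty \) as \(\eta \rightarrow 0\) when \(\alpha >0\). Clipping only yields a Lipschitz function, so a smooth cutoff would be needed to meet the \(W^{s+1,\infty }\) requirement in (A4); the function names are illustrative.

```python
def make_f_eta(eta, alpha):
    """Clipped approximation of the hypothetical nonlinearity f(s) = s**2.

    f_eta(s) = f(clip(s, -a_eta, a_eta)) equals f on [-a_eta, a_eta] and has
    Lipschitz constant 2 * a_eta. With a_eta = eta**(-alpha) / 2 we get
    Lip(f_eta) <= eta**(-alpha), and a_eta -> infinity as eta -> 0 if alpha > 0.
    Caveat: clipping is only Lipschitz, not W^{s+1,inf} as (A4) requires.
    """
    a_eta = 0.5 * eta ** (-alpha)

    def f_eta(s):
        return min(max(s, -a_eta), a_eta) ** 2

    return f_eta, a_eta


def empirical_lipschitz(g, lo, hi, n=20_000):
    """Largest difference quotient |g(s') - g(s)| / |s' - s| on a uniform grid."""
    h = (hi - lo) / n
    values = [g(lo + m * h) for m in range(n + 1)]
    return max(abs(values[m + 1] - values[m]) / h for m in range(n))
```

For instance, with \(\eta =0.01\) and \(\alpha =1/2\), the truncation level is \(a_\eta =5\), and the empirical Lipschitz constant on \([-20,20]\) stays below \(\eta ^{-\alpha }=10\).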

Remark 1

(Discussion) Environmental potential: The sign of \(U_i\) guarantees that the populations are dispersed, since the drift term becomes \(-x\cdot \nabla u_i-d\,u_i\). We have taken a quadratic potential \(U_i\) to simplify the presentation. “Dispersive” potentials (i.e. potentials \(U_i\) with \(\Delta U_i\le 0\)) are needed in the analysis, since we cannot bound terms involving \(\Delta U_i\) if \(\Delta U_i>0\). It is possible to choose general (dispersive) potentials \(U_i\in C^\infty ({{\mathbb {R}}}^d)\) such that \(\nabla U_i\) is globally Lipschitz continuous, \(D^kU_i\in L^\infty ({{\mathbb {R}}}^d)\) for \(k=2,\ldots ,s+2\), the Hessian \(D^2U_i\) is negative semidefinite, \(\Delta U_i<0\), and \(D^kU_i\) for \(k=3,\ldots ,s\) is sufficiently small in the \(L^\infty ({{\mathbb {R}}}^d)\) norm. For example, we may choose \(U_i(x)=-|x|^2+g(x)\), where g is a suitable smooth perturbation.
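For the quadratic potential in (A2), the drift term and its sign in the \(L^2\) estimates can be computed explicitly. A short worked computation, assuming that \(u_i\) is smooth and decays fast enough at infinity to justify the integration by parts:

$$\begin{aligned} &\nabla U_i(x) = -x,\quad \Delta U_i = -d, \quad {\text {div}}(u_i\nabla U_i) = -x\cdot \nabla u_i - d\,u_i, \\ &-\int _{{{\mathbb {R}}}^d}\nabla u_i\cdot u_i\nabla U_i\,\mathrm {d}x = \frac{1}{2}\int _{{{\mathbb {R}}}^d}u_i^2\Delta U_i\,\mathrm {d}x = -\frac{d}{2}\int _{{{\mathbb {R}}}^d}u_i^2\,\mathrm {d}x \le 0. \end{aligned}$$

The second line shows why the sign condition \(\Delta U_i\le 0\) matters: the drift contributes a nonpositive term to the energy estimates, even though \(\nabla U_i\) itself is unbounded and not square integrable.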

Nonlinearity: Since f is not assumed to be globally Lipschitz continuous, we need to approximate it. The condition on the Lipschitz constant of \(f_\eta \) allows us to control the growth of this constant in the limit \(N\rightarrow \infty \), \(\eta \rightarrow 0\). This growth condition is needed in the proof of Lemma 9; see (34) and thereafter. The condition \(s>d/2+1\) ensures that the embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow W^{1,\infty }({{\mathbb {R}}}^d)\) is continuous; this embedding is needed to obtain solutions in \(H^s({{\mathbb {R}}}^d)\) and to derive the estimates. \(\square \)

We introduce some notation. We set

$$\begin{aligned} a_{ij} = \int _{{{\mathbb {R}}}^d}B_{ij}(|x|)\mathrm {d}x, \quad i,j=1,\ldots ,n, \end{aligned}$$

\(B_{ij}^\eta (x)=\eta ^{-d}B_{ij}(|x|/\eta )\), \(A_{ij}=\Vert B_{ij}\Vert _{L^1({{\mathbb {R}}}^d)} =\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\) and \(A=\max _{i,j=1,\ldots ,n}A_{ij}\). Let \(C_s>0\) be the constant of the continuous embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow L^\infty ({{\mathbb {R}}}^d)\) and set

$$\begin{aligned} I = [-2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)},2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}]. \end{aligned}$$
(11)

Then, for small \(\eta >0\) such that \(a_\eta \ge 2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\), we have \(f_\eta =f\) on I.
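The interval I is chosen so that the arguments of \(f_\eta \) remain in the region where \(f_\eta =f\). A short derivation, using Young's convolution inequality and the embedding constant \(C_s\): if v satisfies \(\Vert v(t)\Vert _{H^s({{\mathbb {R}}}^d)}\le 2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) (the bound defining the fixed-point space in Sect. 3), then for all \(x\in {{\mathbb {R}}}^d\),

$$\begin{aligned} |B_{ij}^\eta *v_j(t,x)| \le \Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\Vert v_j(t)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \le AC_s\Vert v(t)\Vert _{H^s({{\mathbb {R}}}^d)} \le 2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}, \end{aligned}$$

so that \(B_{ij}^\eta *v_j(t,x)\in I\), and hence \(f_\eta (B_{ij}^\eta *v_j)=f(B_{ij}^\eta *v_j)\) whenever \(a_\eta \ge 2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\).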

First, we ensure that the nonlocal and local cross-diffusion systems (7) and (1), respectively, have global smooth solutions.

Theorem 2

(Existence for the nonlocal system) Let Assumptions (A2) and (A4) hold, \(u_0\in H^s({{\mathbb {R}}}^d;{{\mathbb {R}}}^n)\) for \(s>d/2+1\), and let \(\eta >0\) be such that \(a_\eta \ge 2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\). There exists \(\varepsilon >0\) depending on \(u_0\) such that if \(\Vert f\Vert _{C^{s+1}(I)}\le \varepsilon \), system (7) possesses a unique solution \(u_\eta =(u_{\eta ,1},\ldots ,u_{\eta ,n})\) satisfying

$$\begin{aligned}&u_{\eta ,i}\in L^\infty (0,\infty ;H^s({{\mathbb {R}}}^d))\cap L^2(0,\infty ;H^{s+1}({{\mathbb {R}}}^d)), \\&\Vert u_\eta \Vert _{L^\infty (0,\infty ;H^s({{\mathbb {R}}}^d))}^2 + \sigma _*\Vert \nabla u_\eta \Vert _{L^2(0,\infty ;H^{s}({{\mathbb {R}}}^d))}^2 \le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^2, \end{aligned}$$

where \(0<\sigma _*<\sigma _{\mathrm{min}}:=\min _{i=1,\ldots ,n}\sigma _i\).

The dependence of \(\varepsilon \) on \(u_0\) can be made more explicit. The proof shows that we need to choose \(0<\varepsilon < C\sigma _{\mathrm{min}}^{1/2}\Vert u_0\Vert ^{-s}_{H^s({{\mathbb {R}}}^d)}\), where \(C>0\) is independent of \(u_0\) and \(\sigma _i\). Thus, if \(\Vert f\Vert _{C^{s+1}(I)}\) is finite, the global existence result is valid for small initial data.

Theorem 3

(Existence for the local system) Let \(u_0\) and \(\eta \) satisfy the assumptions of Theorem 2. Then there exists \(\varepsilon >0\) depending on \(u_0\) such that if \(\Vert f\Vert _{C^{s+1}(I)}\le \varepsilon \), system (1) possesses a unique solution \(u=(u_1,\ldots ,u_n)\) satisfying

$$\begin{aligned}&u_i\in L^\infty (0,\infty ;H^s({{\mathbb {R}}}^d))\cap L^2(0,\infty ;H^{s+1}({{\mathbb {R}}}^d)), \quad i=1,\ldots ,n, \\&\Vert u\Vert _{L^\infty (0,\infty ;H^s({{\mathbb {R}}}^d))}^2 + \sigma _*\Vert \nabla u\Vert _{L^2(0,\infty ;H^s({{\mathbb {R}}}^d))}^2 \le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^2, \end{aligned}$$

where \(0<\sigma _*<\sigma _{\mathrm{min}}\). Moreover, with the solution \(u_\eta \) from Theorem 2, it holds that for an arbitrary \(T>0\),

$$\begin{aligned} \Vert u-u_\eta \Vert _{L^\infty (0,T;L^2({{\mathbb {R}}}^d))} + \Vert \nabla (u-u_\eta )\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \le C(T)\eta . \end{aligned}$$

Next, we state an existence result for the stochastic particle systems (5), (6), and (8).

Proposition 4

Let Assumptions (A1)–(A4) hold and let \(\eta >0\), \(N\in {{\mathbb {N}}}\). Then:

(i) There exist unique square-integrable adapted stochastic processes with continuous paths, which are strong solutions to systems (5), (6), and (8), respectively.

(ii) For each \(t>0\), the (nNd)-dimensional random variables \({\overline{X}}^\eta (t)\) and \({\widehat{X}}(t)\) possess density functions \({\overline{u}}_\eta (t)^{\otimes N}\) and \({\widehat{u}}(t)^{\otimes N}\) with respect to the Lebesgue measure on \({{\mathbb {R}}}^{nNd}\), respectively.

The proof follows from Karatzas and Shreve (1991) and Nualart (2006). Indeed, Theorem 2.9 in (Karatzas and Shreve 1991, page 289) shows that there exist continuous square-integrable stochastic processes, which are strong solutions to (5), (6), and (8), respectively. Strong uniqueness is guaranteed by Theorem 2.5 in (Karatzas and Shreve 1991, page 287). We conclude from (Nualart 2006, Theorem 2.3.1) that the laws of \({\overline{X}}^\eta (t)\) and \({\widehat{X}}(t)\) are absolutely continuous with respect to the Lebesgue measure; thus, these random variables possess density functions \({\overline{u}}_\eta (t,x)^{\otimes N}\) and \({\widehat{u}}(t,x)^{\otimes N}\), respectively. We prove in Sect. 5 that the density functions \({\overline{u}}_\eta \) and \({\widehat{u}}\) can be identified with \(u_\eta \) and u, the solutions to (7) and (1), respectively.

The following theorem is our main result.

Theorem 5

Let \(X_{k,i}^{N,\eta }\) and \({\widehat{X}}_{k,i}\) be the solutions to (5) and (8), respectively. Then there exist parameters \(\delta >0\), depending on n, \(\sigma _{\mathrm{min}}\), and T, and \(\varepsilon >0\), depending on \(u_0\), such that if \(\eta ^{-2(d+1+\alpha )}\le \delta \log N\) and \(\Vert f\Vert _{C^{s+1}(I)}\le \varepsilon \),

$$\begin{aligned} \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T} \big |(X_{k,i}^{N,\eta }-{\widehat{X}}_{k,i})(s)\big |^2\bigg ) \le C(T,n,\sigma _{\mathrm{min}})\eta ^{2(1-\alpha )}, \end{aligned}$$

where \(\alpha \ge 0\) is defined in Assumption (A4).
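The hypothesis \(\eta ^{-2(d+1+\alpha )}\le \delta \log N\), i.e. the scaling relation (9), determines how small \(\eta \) may be chosen for a given N. A minimal sketch that solves this relation for the smallest admissible \(\eta \); the constant \(\delta \) is only known to exist, so any numerical value passed in is hypothetical, and the function names are illustrative:

```python
import math


def smallest_admissible_eta(N, delta, d, alpha):
    """Smallest eta satisfying eta**(-2*(d+1+alpha)) <= delta*log(N), cf. (9)."""
    if N < 2 or delta <= 0:
        raise ValueError("need N >= 2 and delta > 0 so that delta*log(N) > 0")
    return (delta * math.log(N)) ** (-1.0 / (2 * (d + 1 + alpha)))


def error_rate(eta, alpha):
    """eta-dependence eta**(2*(1-alpha)) of the bound in Theorem 5 (constant omitted)."""
    return eta ** (2 * (1 - alpha))
```

With the hypothetical values \(\delta =1\), \(d=1\), \(\alpha =0\), this gives \(\eta \approx 0.52\) for \(N=10^6\) and \(\eta \approx 0.44\) for \(N=10^{12}\). Because of the logarithmic dependence on N, \(\eta \), and with it the bound \(\eta ^{2(1-\alpha )}\), decreases only very slowly as the number of particles grows.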

Remark 6

It is well known that this result implies propagation of chaos in the single-species case; see, e.g., (Jabin and Wang 2017, Section 3.1). In the multi-species case, this generalizes, for fixed k, to the convergence of the k-marginal distribution \(F_k(t)\) of \((X_{{j_1},i_1}^{N,\eta }(t), \ldots , X_{{j_{k}},i_k}^{N,\eta }(t))\) at any time \(t>0\) towards the product measure \(\otimes _{\ell =1}^k u_{i_\ell }(\cdot ,t)\) as \(N\rightarrow \infty \), \(\eta \rightarrow 0\), i.e.

$$\begin{aligned} W_2^2\bigg (F_k(t),\bigotimes _{\ell =1}^k u_{i_\ell }(\cdot ,t)\bigg ) \le kC(T,n,\sigma _{\mathrm{min}})\eta ^{2(1-\alpha )}\rightarrow 0, \end{aligned}$$

where \(W_2\) denotes the 2-Wasserstein distance. \(\square \)
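A Wasserstein bound of this type follows from Theorem 5 by a coupling argument. A sketch, assuming that the index pairs \((j_1,i_1),\ldots ,(j_k,i_k)\) are pairwise distinct, that all initial data \(\xi _i^k\) and Brownian motions \(W_i^k\) are mutually independent, and that \(0<t<T\): the processes \({\widehat{X}}_{j_\ell ,i_\ell }\) are then independent with densities \(u_{i_\ell }(t)\) (Sect. 5), so the pair \(\big ((X_{j_\ell ,i_\ell }^{N,\eta }(t))_\ell ,({\widehat{X}}_{j_\ell ,i_\ell }(t))_\ell \big )\) is a coupling of \(F_k(t)\) and \(\bigotimes _\ell u_{i_\ell }(t)\), and

$$\begin{aligned} W_2^2\bigg (F_k(t),\bigotimes _{\ell =1}^k u_{i_\ell }(\cdot ,t)\bigg ) &\le {{\mathbb {E}}}\sum _{\ell =1}^k\big |X_{j_\ell ,i_\ell }^{N,\eta }(t)-{\widehat{X}}_{j_\ell ,i_\ell }(t)\big |^2 \\ &\le k\sup _{j=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T}\big |(X_{j,i}^{N,\eta }-{\widehat{X}}_{j,i})(s)\big |^2\bigg ) \le kC(T,n,\sigma _{\mathrm{min}})\eta ^{2(1-\alpha )}. \end{aligned}$$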

3 Proof of Theorem 2

We prove the global existence of smooth solutions to the nonlocal system (7). Since \(\eta \) is fixed in the proof, we drop the index \(\eta \) in \(u_\eta \) to simplify the notation. We split the proof into several steps. In the first step, we prove the existence of local-in-time solutions satisfying \(\Vert u_i(t)\Vert _{H^s({{\mathbb {R}}}^d)}\le 2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) for \(0<t<T(\eta )\) and some (possibly small) \(T(\eta )>0\). In the second step, we show that the factor 2 can actually be replaced by 1. This uniform estimate allows us to conclude global existence in the third step.

Step 1: Local existence of solutions. In this step, the smallness conditions on \(\eta \) and f are not needed. The idea is to apply Banach's fixed-point theorem in the space

$$\begin{aligned} X_T := \big \{v\in L^\infty (0,T;H^s({{\mathbb {R}}}^d;{{\mathbb {R}}}^n)): \Vert v\Vert _{L^\infty (0,T;H^s({{\mathbb {R}}}^d))} \le 2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\big \}, \end{aligned}$$

where \(T>0\) will be determined later in this proof. We define the fixed-point operator \(S:X_T\rightarrow X_T\), \(S(v)=u\), where u is the unique solution to the linear problem

$$\begin{aligned} \partial _t u_i = {\text {div}}(u_i\nabla U_i) + \Delta \big (u_i(\sigma _i + K_i(v(t,x)))\big ), \quad u_i(0)=u_{0,i}\quad \text{ in } {{\mathbb {R}}}^d,\ t>0, \end{aligned}$$
(12)

with \(K_i(v)=\sum _{j=1}^n f_\eta (B_{ij}^\eta *v_j)\ge 0\), \(i=1,\ldots ,n\). We need to show that S is well defined. We infer from Young’s convolution inequality (Lemma 11) and the embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow L^{\infty }({{\mathbb {R}}}^d)\) that

$$\begin{aligned} \sup _{0<t<T}\Vert \nabla K_i(v)\Vert _{L^\infty ({{\mathbb {R}}}^d)}&\le \sum _{j=1}^n\Vert f'_\eta \Vert _{L^\infty ({{\mathbb {R}}})}\Vert \nabla B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)} \sup _{0<t<T}\Vert v_j(t)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \nonumber \\&\le C(\eta ) \sum _{j=1}^n\Vert v_j\Vert _{L^\infty (0,T;H^s({{\mathbb {R}}}^d))} < \infty , \end{aligned}$$
(13)

i.e., \(K_i(v)\) is Lipschitz continuous in x, uniformly in \(t\in (0,T)\). Therefore, a Galerkin argument combined with higher-order regularity estimates shows that, for given \(v\in X_T\), there exists a unique solution \(u_i\in L^\infty (0,T;H^{s}({{\mathbb {R}}}^d))\cap L^2(0,T;H^{s+1}({{\mathbb {R}}}^d))\) to (12). It remains to show that \(u=(u_1,\ldots ,u_n)\in X_T\) for some \(T>0\). The estimates are not difficult, but some care is needed since \(\nabla U_i\) is not square integrable.

First, we prove higher-order estimates for \(K_i(v)\). Let \(\alpha \in {{\mathbb {N}}}_0^d\) be a multi-index with order \(|\alpha |=m\le s\). By Lemma 13 and Young’s convolution inequality,

$$\begin{aligned}&\int _0^T\Vert D^\alpha K_i(v)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \nonumber \\&\quad \le C\int _0^T\sum _{j=1}^n\Vert f'_\eta \Vert _{C^{m-1}({{\mathbb {R}}})}^2 \Vert B_{ij}^\eta *v_j\Vert _{L^\infty ({{\mathbb {R}}}^d)}^{2(m-1)} \Vert D^\alpha (B_{ij}^\eta *v_j)\Vert _{L^2({{\mathbb {R}}}^d)}^2 \mathrm {d}t \nonumber \\&\quad \le C(\eta )\int _0^T\sum _{j=1}^n\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}^{2m} \Vert v_j\Vert _{L^\infty ({{\mathbb {R}}}^d)}^{2(m-1)}\Vert D^\alpha v_j\Vert _{L^2({{\mathbb {R}}}^d)}^2 \mathrm {d}t \nonumber \\&\quad \le C(\eta )\sum _{j=1}^n\int _0^T\Vert v_j\Vert _{H^s({{\mathbb {R}}}^d)}^{2m} \mathrm {d}t < \infty , \end{aligned}$$
(14)

where here and in the following, \(C>0\), \(C(\eta )>0\), etc. are generic constants with values changing from line to line. In a similar way, applying Lemmas 11 and 12,

$$\begin{aligned}&\sup _{0<t<T}\Vert D^\alpha \nabla K_i(v)\Vert _{L^2({{\mathbb {R}}}^d)}^2 \le C\sup _{{0<t<T}}\sum _{j=1}^n\big \Vert D^\alpha \big (f_\eta '(B_{ij}^\eta *v_j) \nabla B_{ij}^\eta *v_j\big )\big \Vert _{L^2({{\mathbb {R}}}^d)}^2 \nonumber \\&\quad \le C\sup _{0<t<T}\sum _{j=1}^n\Big (\Vert f'_\eta (B_{ij}^\eta *v_j)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert \nabla B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\Vert D^{m} v_j\Vert _{L^2({{\mathbb {R}}}^d)} \nonumber \\&\qquad + \Vert D^{m} (f'_\eta (B_{ij}^\eta *v_j))\Vert _{L^2({{\mathbb {R}}}^d)} \Vert \nabla B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\Vert v_j\Vert _{L^\infty ({{\mathbb {R}}}^d)}\Big )^2 \le C(\eta ), \end{aligned}$$
(15)

since, according to Lemma 13, we can bound \(\sup _{0<t<T}\Vert D^{m}(f'_\eta (B_{ij}^\eta *v_j))\Vert _{L^2({{\mathbb {R}}}^d)}\) in terms of \(\Vert f_\eta \Vert _{C^{s+1}({{\mathbb {R}}})}\), \(\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\), and \(\sup _{0<t<T}\Vert v_j\Vert _{H^s({{\mathbb {R}}}^d)}\), and since \(\Vert \nabla B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\le C(\eta )\).

We proceed with the proof of \(u\in X_T\) for some \(T>0\). Applying \(D^\alpha \) to (12), multiplying the resulting equation by \(D^\alpha u_i\), and integrating over \((0,\tau )\times {{\mathbb {R}}}^d\) for \(\tau <T\) yields

$$\begin{aligned}&\frac{1}{2}\int _{{{\mathbb {R}}}^d} |D^\alpha u_i(\tau )|^2 \mathrm {d}x - \frac{1}{2}\int _{{{\mathbb {R}}}^d}|D^\alpha u_{0,i}|^2\mathrm {d}x \nonumber \\&\quad + \sigma _i\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2 \mathrm {d}x\mathrm {d}t = I_1 + I_2 + I_3, \end{aligned}$$
(16)

where

$$\begin{aligned} I_1&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (u_i\nabla U_i)\mathrm {d}x\mathrm {d}t, \\ I_2&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (\nabla u_iK_i(v))\mathrm {d}x\mathrm {d}t, \\ I_3&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (u_i\nabla K_i(v))\mathrm {d}x\mathrm {d}t. \end{aligned}$$

First, let \(|\alpha |=m=0\). Then, integrating by parts in \(I_1\), using Young’s inequality, and observing that \(U_i (x)= -\frac{1}{2}|x|^2\),

$$\begin{aligned} I_1&= \frac{1}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i^2\Delta U_i \mathrm {d}x\mathrm {d}t = -\frac{d}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i^2\mathrm {d}x\mathrm {d}t \le 0, \\ I_2&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}K_i(v)|\nabla u_i|^2 \mathrm {d}x\mathrm {d}t \le 0, \\ I_3&\le \frac{\sigma _i}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla u_i|^2\mathrm {d}x\mathrm {d}t\\&\quad + \frac{1}{2\sigma _i} \Vert \nabla K_i(v)\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}^2\int _0^\tau \Vert u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t, \end{aligned}$$

where we used \(K_i(v)\ge 0\) for \(I_2\). It follows from (13) that

$$\begin{aligned} I_1+I_2+I_3 \le \frac{\sigma _i}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla u_i|^2\mathrm {d}x\mathrm {d}t + C\int _0^\tau \Vert u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t, \end{aligned}$$

where \(C>0\) depends on \(\eta \) and on the \(L^\infty (0,T;H^s({{\mathbb {R}}}^d))\) norm of v. Inserting this estimate into (16) with \(\alpha =0\) and applying Gronwall's inequality, we infer that

$$\begin{aligned} \int _{{{\mathbb {R}}}^d}u_i(\tau )^2\mathrm {d}x + \frac{\sigma _i}{2} \int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla u_i|^2\mathrm {d}x\mathrm {d}t \le C(u_0)e^{C\tau }. \end{aligned}$$

This shows that \(u_i\) is bounded in \(L^\infty (0,T;L^2({{\mathbb {R}}}^d))\) and \(L^2(0,T;H^1({{\mathbb {R}}}^d))\).
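The Gronwall step can be made explicit. Writing C for the constant in the preceding estimate and absorbing \(\frac{\sigma _i}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla u_i|^2\mathrm {d}x\mathrm {d}t\) into the left-hand side of (16) with \(\alpha =0\), a short derivation gives

$$\begin{aligned} \Vert u_i(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2 + \sigma _i\int _0^\tau \Vert \nabla u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \le \Vert u_{0,i}\Vert _{L^2({{\mathbb {R}}}^d)}^2 + 2C\int _0^\tau \Vert u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Dropping the gradient term and applying Gronwall's inequality yields \(\Vert u_i(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2\le \Vert u_{0,i}\Vert _{L^2({{\mathbb {R}}}^d)}^2e^{2C\tau }\); inserting this back into the right-hand side gives

$$\begin{aligned} \Vert u_i(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2 + \sigma _i\int _0^\tau \Vert \nabla u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \le \Vert u_{0,i}\Vert _{L^2({{\mathbb {R}}}^d)}^2\bigg (1 + 2C\int _0^\tau e^{2Ct}\mathrm {d}t\bigg ) = \Vert u_{0,i}\Vert _{L^2({{\mathbb {R}}}^d)}^2e^{2C\tau }, \end{aligned}$$

which implies the bound stated above with \(C(u_0)=\Vert u_{0,i}\Vert _{L^2({{\mathbb {R}}}^d)}^2\) after renaming 2C as C.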

Now, let \(|\alpha |=m\ge 1\). Then, integrating by parts, using \(\Delta U_i\le 0\), and applying Young’s inequality again,

$$\begin{aligned} I_1&= \frac{1}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}(D^\alpha u_i)^2\Delta U_i \mathrm {d}x\mathrm {d}t \\&\quad - \int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot \big (D^\alpha (u_i\nabla U_i) -D^\alpha u_i\nabla U_i\big )\mathrm {d}x\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2 \mathrm {d}x\mathrm {d}t \\&\quad + \sum _{0<|\beta |\le |\alpha |}\int _0^\tau c_\beta \Vert D^{\alpha -\beta }u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2 \Vert D^\beta \nabla U_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2 \mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2 \mathrm {d}x\mathrm {d}t + C\int _0^\tau \Vert u_i\Vert _{H^{m-1}({{\mathbb {R}}}^d)}^2\mathrm {d}t, \end{aligned}$$

where we used the fact that \(D^\beta \nabla U_i\) is bounded for \(|\beta |=1\) and vanishes for \(|\beta |>1\). It follows from integration by parts, \(K_i(v)\ge 0\), and Lemma 14 that

$$\begin{aligned} I_2&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot \big (D^\alpha (\nabla u_i K_i(v)) - \nabla D^\alpha u_i K_i(v)\big )\mathrm {d}x\mathrm {d}t \\&\quad - \int _0^\tau \int _{{{\mathbb {R}}}^d}K_i(v)|\nabla D^\alpha u_i|^2\mathrm {d}x\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2\mathrm {d}x\mathrm {d}t + C\int _0^\tau \big (\Vert DK_i(v)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert D^{m-1}\nabla u_i\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad + \Vert D^{m} K_i(v)\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla u_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}\big )^2\mathrm {d}t. \end{aligned}$$

We infer from estimates (13) and (14) for \(K_i(v)\) and the embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow W^{1,\infty }({{\mathbb {R}}}^d)\) that

$$\begin{aligned} I_2 \le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2\mathrm {d}x\mathrm {d}t + C\int _0^\tau \Vert u_i\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Finally, we use Lemma 12 and estimates (13) and (15) to obtain

$$\begin{aligned} I_3&\le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2 \mathrm {d}x\mathrm {d}t \\&\quad + C\int _0^\tau \big (\Vert u_i\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert D^{m}\nabla K_i(v)\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad + \Vert D^{m} u_i\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla K_i(v)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\big )^2\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2 \mathrm {d}x\mathrm {d}t + C(\eta )\int _0^\tau \Vert u_i\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Inserting these estimates into (16) and summing over \(|\alpha |\le s\), we arrive at

$$\begin{aligned} \Vert u_i(\tau )\Vert _{H^s({{\mathbb {R}}}^d)}^2&+ \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla u_i\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t \le \Vert u_{0,i}\Vert _{H^s({{\mathbb {R}}}^d)}^2 \\&+ C(\eta )\int _0^\tau \Vert u_i\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Summing over \(i=1,\ldots ,n\) and applying Gronwall’s inequality gives

$$\begin{aligned} \Vert u(\tau )\Vert _{H^s({{\mathbb {R}}}^d)}^2\le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^2 e^{C(\eta )\tau } \le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^2 e^{C(\eta )T}. \end{aligned}$$

Choosing \(T>0\) sufficiently small, we can ensure that \(\Vert u(\tau )\Vert _{H^s({{\mathbb {R}}}^d)} \le 2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) for all \(0<\tau <T\). This shows that \(u\in X_T\), i.e., the operator S is well defined as a map from \(X_T\) into itself.
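As a worked version of this step (our own expansion, with the generic constant \(C(\eta )\) of the preceding bound): since that bound controls the squared norm, the requirement \(\Vert u(\tau )\Vert _{H^s({{\mathbb {R}}}^d)}\le 2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) holds as soon as \(e^{C(\eta )T}\le 4\), that is, for

$$\begin{aligned} T\le \frac{\ln 4}{C(\eta )}. \end{aligned}$$

In particular, this local existence time depends on \(\eta \) through \(C(\eta )\).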

Next, we prove that \(S:X_T\rightarrow X_T\) is a contraction. Let v, \(w\in X_T\) and set \({\bar{v}}=S(v)\) and \({\bar{w}}=S(w)\). Taking the difference of equations (12) satisfied by \({\bar{v}}_i\) and \({\bar{w}}_i\), respectively, using the test function \({\bar{v}}_i-{\bar{w}}_i\), and integrating by parts, it follows that

$$\begin{aligned} \frac{1}{2}\int _{{{\mathbb {R}}}^d}({\bar{v}}_i-{\bar{w}}_i)(\tau )^2\mathrm {d}x + \sigma _i\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla ({\bar{v}}_i-{\bar{w}}_i)|^2\mathrm {d}x\mathrm {d}t = I_4+I_5+I_6, \end{aligned}$$
(17)

where

$$\begin{aligned} I_4&= \frac{1}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}\Delta U_i({\bar{v}}_i-{\bar{w}}_i)^2\mathrm {d}x\mathrm {d}t \le 0, \\ I_5&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla \big (({\bar{v}}_i-{\bar{w}}_i)K_i(v)\big )\cdot \nabla ({\bar{v}}_i-{\bar{w}}_i)\mathrm {d}x\mathrm {d}t, \\ I_6&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla \big ({\bar{w}}_i(K_i(v)-K_i(w))\big )\cdot \nabla ({\bar{v}}_i-{\bar{w}}_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

Because of \(K_i(v)\ge 0\) and estimate (13) for \(\nabla K_i(v)\), we find that, by Young’s inequality,

$$\begin{aligned} I_5&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}K_i(v)|\nabla ({\bar{v}}_i-{\bar{w}}_i)|^2\mathrm {d}x\mathrm {d}t \\&\quad - \int _0^\tau \int _{{{\mathbb {R}}}^d}({\bar{v}}_i-{\bar{w}}_i)\nabla K_i(v)\cdot \nabla ({\bar{v}}_i-{\bar{w}}_i) \mathrm {d}x\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla ({\bar{v}}_i-{\bar{w}}_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C(\sigma _i)\int _0^\tau \Vert {\bar{v}}_i-{\bar{w}}_i\Vert _{L^2({{\mathbb {R}}}^d)}^2 \Vert \nabla K_i(v)\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla ({\bar{v}}_i-{\bar{w}}_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C(\eta )\int _0^\tau \Vert {\bar{v}}_i-{\bar{w}}_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$
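The constant \(C(\sigma _i)\) in this and the following estimates comes from a weighted form of Young's inequality. A minimal version, stated here for convenience, reads: for \(a,b\ge 0\) and \(\sigma >0\),

$$\begin{aligned} ab\le \frac{\sigma }{4}a^2 + \frac{1}{\sigma }b^2, \quad \text{since}\quad 0\le \Big (\frac{\sqrt{\sigma }}{2}a - \frac{1}{\sqrt{\sigma }}b\Big )^2 = \frac{\sigma }{4}a^2 - ab + \frac{1}{\sigma }b^2. \end{aligned}$$

Applied pointwise in time with \(a=\Vert \nabla ({\bar{v}}_i-{\bar{w}}_i)\Vert _{L^2({{\mathbb {R}}}^d)}\) and \(b=\Vert {\bar{v}}_i-{\bar{w}}_i\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla K_i(v)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\), it indicates that one may take \(C(\sigma _i)=1/\sigma _i\) in the bound for \(I_5\).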

It follows again from Young’s inequality that

$$\begin{aligned} I_6 \le&\frac{\sigma _i}{4}\int _0^\tau \Vert \nabla ({\bar{v}}_i-{\bar{w}}_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C(\sigma _i)\int _0^\tau \Vert \nabla {\bar{w}}_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2 \Vert K_i(v)-K_i(w)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \nonumber \\&+ C(\sigma _i)\int _0^\tau \Vert {\bar{w}}_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2 \Vert \nabla (K_i(v)-K_i(w))\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$
(18)

Since \({\bar{w}}\in X_T\), we have \(\Vert \nabla {\bar{w}}_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}\le C\Vert {\bar{w}}_i\Vert _{H^s({{\mathbb {R}}}^d)}\le C(u_0)\) and \(\Vert {\bar{w}}_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}\le C(u_0)\). We use the fact that \(f_\eta \) and \(f'_\eta \) are globally Lipschitz continuous:

$$\begin{aligned} \Vert K_i(v)-K_i(w)\Vert _{L^2({{\mathbb {R}}}^d)}&\le C(\eta )\sum _{j=1}^n\Vert B_{ij}^\eta *(v_j-w_j)\Vert _{L^2({{\mathbb {R}}}^d)}\\&\le C(\eta )\Vert v-w\Vert _{L^2({{\mathbb {R}}}^d)}, \\ \Vert \nabla (K_i(v)-K_i(w))\Vert _{L^2({{\mathbb {R}}}^d)}&\le \sum _{j=1}^n\Vert (f'_\eta (B_{ij}^\eta *v_j)-f'_\eta (B_{ij}^\eta *w_j)) B_{ij}^\eta *\nabla v_j\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad + \sum _{j=1}^n\Vert f'_\eta (B_{ij}^\eta *w_j)\nabla B_{ij}^\eta *(v_j-w_j)\Vert _{L^2({{\mathbb {R}}}^d)} \\&\le C(\eta )\sum _{j=1}^n\Vert v_j-w_j\Vert _{L^2({{\mathbb {R}}}^d)}\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)} \Vert \nabla v_j\Vert _{L^\infty ({{\mathbb {R}}}^d)} \\&\quad + C(\eta )\sum _{j=1}^n\Vert \nabla B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)} \Vert v_j-w_j\Vert _{L^2({{\mathbb {R}}}^d)} \\&\le C(\eta )\Vert v-w\Vert _{L^2({{\mathbb {R}}}^d)}. \end{aligned}$$

Inserting these inequalities into (18), combining the estimates for \(I_4\), \(I_5\), and \(I_6\), and summing over \(i=1,\ldots ,n\), we conclude from (17) that

$$\begin{aligned}&\frac{1}{2}\Vert ({\bar{v}}-{\bar{w}})(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2 + \sum _{i=1}^n\frac{\sigma _i}{4}\int _0^\tau \Vert \nabla ({\bar{v}}_i-{\bar{w}}_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\quad \le C_1\int _0^\tau \Vert {\bar{v}}-{\bar{w}}\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C_2\tau \Vert v-w\Vert _{L^\infty (0,\tau ;L^2({{\mathbb {R}}}^d))}^2. \end{aligned}$$

We apply Gronwall’s inequality and take the supremum over \(0<\tau <T\) to find that

$$\begin{aligned} \Vert {\bar{v}}-{\bar{w}}\Vert _{L^\infty (0,T;L^2({{\mathbb {R}}}^d))}^2 \le {2}C_2 e^{{2}C_1T}T\Vert v-w\Vert _{L^\infty (0,T;L^2({{\mathbb {R}}}^d))}^2. \end{aligned}$$

Thus, choosing \(T>0\) such that \({2}C_2e^{{2}C_1T}T<1\), we infer that \(S:X_T\rightarrow X_T\) is a contraction. By Banach’s fixed-point theorem, there exists a unique solution \(u\in L^\infty (0,T;H^s({{\mathbb {R}}}^d))\cap L^2(0,T;H^{s+1}({{\mathbb {R}}}^d))\) to (7).
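The smallness condition \(2C_2e^{2C_1T}T<1\) can be made concrete. Assuming \(C_1\ge 0\) and \(C_2>0\), the left-hand side is continuous and strictly increasing in T, vanishes at \(T=0\), and is unbounded, so there is a unique threshold \(T_*\). The Python sketch below locates it by bisection; the numerical values of \(C_1\) and \(C_2\) are hypothetical, since the proof does not compute these constants (they depend on \(\eta \) and the data).

```python
import math


def contraction_factor(T, C1, C2):
    """Factor 2*C2*exp(2*C1*T)*T from the contraction estimate."""
    return 2.0 * C2 * math.exp(2.0 * C1 * T) * T


def contraction_threshold(C1, C2, tol=1e-12):
    """Bisection for the unique T_* > 0 with contraction_factor(T_*) = 1.

    Assumes C1 >= 0 and C2 > 0. The returned value always satisfies
    contraction_factor(T) < 1, so S is a contraction for this (and any
    smaller) T.
    """
    lo, hi = 0.0, 1.0
    # Enlarge the bracket until the factor reaches 1 (it grows without bound).
    while contraction_factor(hi, C1, C2) < 1.0:
        hi *= 2.0
    # Invariant: factor(lo) < 1 <= factor(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if contraction_factor(mid, C1, C2) < 1.0:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical constants C1 = C2 = 1.
T_star = contraction_threshold(1.0, 1.0)
print(f"T_* ~ {T_star:.4f}")
```

For \(C_1=C_2=1\) the threshold is about 0.284; returning the lower end of the bracket keeps the factor strictly below one.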

Step 2: A priori estimates. Let \(u=u_\eta \) be the unique solution to (7). We know from Step 1 that \(\Vert u_i(t)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\le {C_s}\Vert u_i(t)\Vert _{H^s({{\mathbb {R}}}^d)} \le 2C_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) for any \(0<t<T\). Recall that \(T=T(\eta )\); hence, at this stage, we do not have estimates that are uniform in \(\eta \), even for small \(T>0\). In this step, we show the estimate \(\Vert u_i(t)\Vert _{H^s({{\mathbb {R}}}^d)} \le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\), which allows us to conclude that the end time T can be arbitrary and, in fact, does not depend on \(\eta \). Similarly to the corresponding estimate in Step 1, we apply \(D^\alpha \) to (7) (with \(|\alpha |=m\le s\)), multiply the resulting equation by \(D^\alpha u_i\), and integrate over \((0,\tau )\times {{\mathbb {R}}}^d\) for \(\tau <T\):

$$\begin{aligned}&\frac{1}{2}\int _{{{\mathbb {R}}}^d}|D^\alpha u_i(\tau )|^2{\mathrm {d}x} - \frac{1}{2}\int _{{{\mathbb {R}}}^d}|D^\alpha u_{0,i}|^2{\mathrm {d}x} \nonumber \\&\quad + \sigma _i \int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla D^\alpha u_i|^2\mathrm {d}x\mathrm {d}t = I_7 + I_8 + I_9, \end{aligned}$$
(19)

where

$$\begin{aligned} I_7&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (u_i\nabla U_i)\mathrm {d}x\mathrm {d}t, \\ I_8&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (\nabla u_iK_i(u))\mathrm {d}x\mathrm {d}t, \\ I_9&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla D^\alpha u_i\cdot D^\alpha (u_i \nabla K_i(u))\mathrm {d}x\mathrm {d}t, \end{aligned}$$

and we recall that \(K_i(u)=\sum _{j=1}^n f_\eta (B_{ij}^\eta *u_j)\).

First, let \(m=0\). Arguing as for \(I_1\) and \(I_2\), we find that \(I_7\le 0\) and \(I_8\le 0\). We estimate \(\nabla K_i(u)=\sum _{j=1}^n f'_\eta (B_{ij}^\eta *u_j)B_{ij}^\eta *\nabla u_j\):

$$\begin{aligned} \Vert \nabla K_i(u)\Vert _{L^2({{\mathbb {R}}}^d)} \le A\sum _{j=1}^n\Vert f'_\eta (B_{ij}^\eta *u_j)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert \nabla u_j\Vert _{L^2({{\mathbb {R}}}^d)}, \end{aligned}$$
(20)

recalling that \(A=\max _{i,j=1,\ldots ,n}\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\). This gives for \(m=0\):

$$\begin{aligned} I_9&\le \Vert u_i\Vert _{L^\infty (0,\tau ;L^\infty ({{\mathbb {R}}}^d))}\int _0^\tau \Vert \nabla u_i\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla K_i(u)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \\&\le C\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\sum _{j=1}^n \Vert f'_\eta (B_{ij}^\eta *u_j)\Vert _{L^\infty (0,\tau ;L^\infty ({{\mathbb {R}}}^d))} \int _0^\tau \Vert \nabla u_j\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

From this point on, we will need the smallness condition on \(f_\eta \) and \(f'_\eta \). Because of

$$\begin{aligned} \Vert B_{ij}^\eta *u_j(t)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\le \Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)} C_s\Vert u_j(t)\Vert _{H^s({{\mathbb {R}}}^d)} \le 2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}, \end{aligned}$$
(21)

where \(C_s>0\) is the constant of the embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow L^\infty ({{\mathbb {R}}}^d)\), the function \((B_{ij}^\eta *u_j(t))(x)\) takes values in the interval \(I=[-2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)},2AC_s\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}]\) for all \(0<t<T\) and \(x\in {{\mathbb {R}}}^d\). On this interval, \(f_\eta =f\) if \(\eta >0\) is sufficiently small. From now on, we use the smallness conditions \(f\le \varepsilon \) and \(|f'|\le \varepsilon \) on I for some small \(\varepsilon >0\). Thus, we have

$$\begin{aligned} I_9 \le C\varepsilon \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}{\sum _{j=1}^n \int _0^\tau \Vert \nabla u_j\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t}. \end{aligned}$$

Inserting these estimates into (19) and summing over \(i=1,\ldots ,n\), we conclude that

$$\begin{aligned} \Vert u(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2 + \Big (\min _{i=1,\ldots ,n}\sigma _i-C\varepsilon \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\Big ) \int _0^\tau \Vert \nabla u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t\le \Vert u_0\Vert _{L^2({{\mathbb {R}}}^d)}^2. \end{aligned}$$

Choosing \(\varepsilon >0\) sufficiently small, this gives an estimate for \(u_i\) in \(L^\infty (0,T;L^2({{\mathbb {R}}}^d))\cap L^2(0,T;H^1({{\mathbb {R}}}^d))\).

Next, let \(m\ge 1\). The estimate for \(I_7\) is delicate since \(\nabla U_i\not \in L^2({{\mathbb {R}}}^d)\), and the corresponding estimate for \(I_1\) cannot be directly used. We split \(I_7\) into two parts:

$$\begin{aligned} I_7&= \int _0^\tau \int _{{{\mathbb {R}}}^d}D^\alpha u_i D^\alpha (\nabla u_i\cdot \nabla U_i + u_i\Delta U_i)\mathrm {d}x\mathrm {d}t \nonumber \\&= \int _0^\tau \int _{{{\mathbb {R}}}^d}D^\alpha u_i\big (D^\alpha (\nabla u_i\cdot \nabla U_i) - D^\alpha \nabla u_i\cdot \nabla U_i\big )\mathrm {d}x\mathrm {d}t \nonumber \\&\quad + \int _0^\tau \int _{{{\mathbb {R}}}^d}D^\alpha u_i\big (D^\alpha (u_i\Delta U_i) - D^\alpha u_i\Delta U_i\big )\mathrm {d}x\mathrm {d}t, \end{aligned}$$
(22)

noting that the second terms in both integrals are the same (with different signs) because of

$$\begin{aligned} -\int _{{{\mathbb {R}}}^d}D^\alpha u_i D^\alpha \nabla u_i\cdot \nabla U_i\mathrm {d}x = -\frac{1}{2}\int _{{{\mathbb {R}}}^d}\nabla (D^\alpha u_i)^2\cdot {\nabla } U_i\mathrm {d}x = \frac{1}{2}\int _{{{\mathbb {R}}}^d}(D^\alpha u_i)^2\Delta U_i\mathrm {d}x. \end{aligned}$$

Moreover, the last integral in (22) vanishes since \(\Delta U_i=-{d}\). In the first integral of the right-hand side of (22), the first-order derivative of \(U_i\) cancels, while the second-order derivative equals \(\partial ^2 U_i/\partial x_j\partial x_k=-\delta _{jk}\) and all higher-order derivatives of \(U_i\) vanish. Then a straightforward computation leads to

$$\begin{aligned} I_7 = -d\int _0^\tau \int _{{{\mathbb {R}}}^d}(D^\alpha u_i)^2 \mathrm {d}x\mathrm {d}t \le 0. \end{aligned}$$

For the estimates of \(I_8\) and \(I_9\), we need a smallness condition on f and its derivatives. We apply Young’s inequality and Lemma 12 to estimate the (more delicate) term \(I_9\):

$$\begin{aligned} I_9&\le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla D^\alpha u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C(\sigma _i)\int _0^\tau \Vert D^\alpha (u_i\nabla K_i(u))\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla D^\alpha u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \big (\Vert u_i\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert D^{m}\nabla K_i(u)\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad + \Vert D^{m} u_i\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla K_i(u)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\big )^2\mathrm {d}t. \end{aligned}$$

By estimate (21), \(B_{ij}^\eta *u_j\) takes values in I, where \(f_\eta =f\) and \(|f'|\le \varepsilon \). Then, by arguments similar to those leading to (20),

$$\begin{aligned} \Vert \nabla K_i(u)\Vert _{L^\infty ({{\mathbb {R}}}^d)}&{\le A\sum _{j=1}^n\Vert f_\eta '(B_{ij}^\eta *u_j)\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert \nabla u_j\Vert _{L^\infty ({{\mathbb {R}}}^d)}} \\&\le A\varepsilon \Vert \nabla u\Vert _{L^\infty ({{\mathbb {R}}}^d)} \le \varepsilon AC_s\Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}. \end{aligned}$$

Moreover, using Lemma 13, the embedding \(H^s({{\mathbb {R}}}^d)\hookrightarrow W^{1,\infty }({{\mathbb {R}}}^d)\), and \(m\le s\),

$$\begin{aligned}&\Vert D^{m}\nabla K_i(u)\Vert _{L^2({{\mathbb {R}}}^d)} \le A\sum _{j=1}^n\Vert \nabla u_j\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert D^{m}(f'_\eta (B_{ij}^\eta *u_j))\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad \le C\sum _{j=1}^n\Vert \nabla u_j\Vert _{H^s({{\mathbb {R}}}^d)}\Vert f''\Vert _{C^{m-1}(I)} \Vert B_{ij}^\eta *u_j\Vert _{L^\infty ({{\mathbb {R}}}^d)}^{m-1} \Vert B_{ij}^\eta *D^{m} u_j\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad \le \varepsilon C\Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}\Vert u\Vert _{L^\infty ({{\mathbb {R}}}^d)}^{m-1} \Vert D^{m} u\Vert _{L^2({{\mathbb {R}}}^d)} \le \varepsilon C\Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{s}, \end{aligned}$$

recalling definition (11) of the interval I. Consequently, the estimate for \(I_9\) becomes

$$\begin{aligned} I_9 \le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla D^\alpha u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\varepsilon ^2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{2s}\int _0^\tau \Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

The term \(I_8\) is treated in a similar way, resulting in

$$\begin{aligned} I_8 \le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla D^\alpha u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\varepsilon ^2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{2s}\int _0^\tau \Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Set \(\sigma _{\mathrm{min}}=\min _{i=1,\ldots ,n}\sigma _i>0\). We conclude from (19) after summation over \(|\alpha |\le s\) and \(i=1,\ldots ,n\) that

$$\begin{aligned} \Vert u(\tau )\Vert _{H^s({{\mathbb {R}}}^d)}^2 + \big (\sigma _{\mathrm{min}}-C\varepsilon ^2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{2s}\big ) \int _0^\tau \Vert \nabla u\Vert _{H^s({{\mathbb {R}}}^d)}^2\mathrm {d}t \le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^2. \end{aligned}$$

Thus, for sufficiently small \(\varepsilon >0\), we arrive at the desired estimate uniform in \(\eta \).
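Explicitly (a sketch with the generic constant C from the bounds for \(I_8\) and \(I_9\), assuming \(u_0\ne 0\)), the coefficient of the gradient term is at least \(\sigma _{\mathrm{min}}/2\) whenever

$$\begin{aligned} C\varepsilon ^2\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{2s}\le \frac{\sigma _{\mathrm{min}}}{2}, \quad \text{i.e.}\quad \varepsilon \le \bigg (\frac{\sigma _{\mathrm{min}}}{2C\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}^{2s}}\bigg )^{1/2}, \end{aligned}$$

so the admissible size of \(\varepsilon \) shrinks as \(\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) grows.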

Step 3: Global existence and uniqueness. We have proved that \(\Vert u(\tau )\Vert _{H^s({{\mathbb {R}}}^d)}\le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) for \(0<\tau \le T\) for some sufficiently small \(T>0\). The value of T does not depend on the solution, and \(\Vert u(T)\Vert _{H^s({{\mathbb {R}}}^d)}\le \Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\). Thus, we can use u(T) as initial datum and solve the equation in [T, 2T]. Repeating this argument yields a global solution. The uniqueness of solutions follows from standard estimates based on the global Lipschitz continuity of \(f_\eta \) and \(f'_\eta \) (see the calculations for \(I_4\), \(I_5\), and \(I_6\)), choosing \(\varepsilon >0\) sufficiently small.

4 Proof of Theorem 3

We show the global existence of smooth solutions to the local system (1) and an error estimate for the difference of the solutions to (1) and (7), respectively. First, we prove that a solution \(u_\eta \) to (7) converges to a solution u to (1) in a certain sense. Then we prove the error bound in Theorem 3 by estimating the difference \(u_\eta -u\). The key step of the proof is the estimate of the difference \(f_\eta (B_{ij}^\eta *u_{\eta ,j})-f_\eta (a_{ij}u_{\eta ,j})\).

Step 1: Existence and uniqueness of solutions. Let \(u_\eta \) be a smooth solution to (7), and let \(\phi \in C_0^\infty ({{\mathbb {R}}}^d)\) with \({\text {supp}}(\phi )\subset B_R\) and \(\zeta \in C^0([0,T])\) be test functions, where \(B_R\) is the ball around the origin with radius \(R>0\). Then the weak formulation of (7) reads as

$$\begin{aligned} \begin{aligned} \int _0^T\langle \partial _t u_{\eta ,i},\phi \rangle \zeta (t)\mathrm {d}t&= -\int _0^T\int _{{{\mathbb {R}}}^d} u_{\eta ,i}\nabla U_i\cdot \nabla \phi \zeta (t)\mathrm {d}x\mathrm {d}t \\&\quad -\int _0^T\int _{{{\mathbb {R}}}^d}\big (\sigma _i\nabla u_{\eta ,i} + \nabla (u_{\eta ,i}K_i(u_\eta ))\big ) \cdot \nabla \phi \zeta (t)\mathrm {d}x\mathrm {d}t, \end{aligned} \end{aligned}$$
(23)

where \(\langle \cdot ,\cdot \rangle \) is the duality pairing between \(H^{-1}({{\mathbb {R}}}^d)\) and \(H^1({{\mathbb {R}}}^d)\) and \(K_i(u) =\sum _{j=1}^n f_\eta (B_{ij}^\eta *u_j)\). We want to perform the limit \(\eta \rightarrow 0\). By the uniform estimate of Theorem 2, there exists a subsequence, which is not relabeled, such that \(u_\eta \rightharpoonup u\) weakly in \(L^2(0,T;H^{s+1}({{\mathbb {R}}}^d))\) and weakly* in \(L^\infty (0,T;H^s({{\mathbb {R}}}^d))\subset L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\) as \(\eta \rightarrow 0\). Our aim is to prove that u is a weak solution to (1).

It follows from the proof of Lemma 7 in Chen et al. (2019) that

$$\begin{aligned} B_{ij}^\eta * \nabla u_{\eta ,j}\rightharpoonup a_{ij}\nabla u_j\quad \text{ weakly } \text{ in } L^2(0,T;L^2({{\mathbb {R}}}^d)). \end{aligned}$$

We claim that \(f_\eta (B_{ij}^\eta *u_{\eta ,j})\rightarrow f(a_{ij}u_j)\) strongly in \(L^2(0,T;L^2(B_R))\). First, we observe that \(u\in L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\). The weak formulation (23) gives

$$\begin{aligned} \Vert \partial _t u_{\eta ,i}\Vert _{L^2(0,T;H^{-1}(B_R))} \le&\Vert u_{\eta ,i}\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))}\Vert \nabla U_i\Vert _{L^\infty (B_R)} + \sigma _i\Vert \nabla u_{\eta ,i}\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \\&+ \Vert \nabla u_{\eta ,i}\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \Vert K_i(u_\eta )\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \\&+ \Vert u_{\eta ,i}\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \Vert \nabla K_i(u_\eta )\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}. \end{aligned}$$

Because of

$$\begin{aligned} \Vert K_i(u_\eta )\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}&\le \sum _{j=1}^n\Vert f_\eta (B_{ij}^\eta *u_{\eta ,j})\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \le C\Vert f\Vert _{L^\infty (I)}, \\ \Vert \nabla K_i(u_\eta )\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}&\le \sum _{j=1}^n\Vert f'_\eta (B_{ij}^\eta *u_{\eta ,j})\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}\\&\quad \times \Vert B_{ij}^\eta *\nabla u_{\eta ,j}\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \\&\le C\Vert f'\Vert _{L^\infty (I)}\Vert \nabla u_\eta \Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \le C\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}, \end{aligned}$$

we obtain a uniform bound for \(\partial _t u_{\eta ,i}\) in \(L^2(0,T;H^{-1}(B_R))\) (the bound might depend on R). In particular, up to a subsequence, as \(\eta \rightarrow 0\),

$$\begin{aligned} \partial _t u_{\eta ,i}\rightharpoonup \partial _t u_i\quad \text{ weakly } \text{ in } L^2(0,T;H^{-1}(B_R)). \end{aligned}$$

Since \(u_\eta \) is uniformly bounded in \(L^2(0,T;H^1(B_R))\), the Aubin–Lions lemma implies the existence of a subsequence (not relabeled) such that

$$\begin{aligned} u_{\eta ,i}\rightarrow u_i \quad \text{ strongly } \text{ in } L^2(0,T;L^2(B_R)). \end{aligned}$$

We use the Lipschitz continuity of \(f=f_\eta \) on I to infer that

$$\begin{aligned}&\Vert f_\eta (B_{ij}^\eta * u_{\eta ,j})-f(a_{ij}u_j)\Vert _{L^2(0,T;L^2(B_R))} \\&\quad \le C\Vert B_{ij}^\eta *(u_{\eta ,j}-u_j) + B_{ij}^\eta *u_j - a_{ij}u_j\Vert _{L^2(0,T;L^2(B_R))} \\&\quad \le C\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}\Vert u_{\eta ,j}-u_j\Vert _{L^2(0,T;L^2(B_R))} + \Vert B_{ij}^\eta *u_j-a_{ij}u_j\Vert _{L^2(0,T;L^2(B_R))} \rightarrow 0. \end{aligned}$$

This shows the claim. In a similar way, it follows from the Lipschitz continuity of \(f'_\eta \) that \(f'_\eta (B_{ij}^\eta *u_{\eta ,j})\rightarrow f'(a_{ij}u_j)\) strongly in \(L^2(0,T;L^2(B_R))\).
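The convolution bounds in this argument, like the Lipschitz estimates for \(K_i(v)-K_i(w)\) in the proof of Theorem 2, rest on Young's convolution inequality \(\Vert B*g\Vert _{L^2}\le \Vert B\Vert _{L^1}\Vert g\Vert _{L^2}\). As a quick, non-authoritative sanity check, the Python sketch below verifies the discrete analogue on a uniform 1D grid; the kernel values and the function playing the role of \(u_{\eta ,j}-u_j\) (or \(v_j-w_j\)) are random stand-ins (hypothetical data), not quantities from the paper.

```python
import numpy as np


def l1_norm(f, h):
    """Riemann-sum approximation of the L^1 norm on a uniform grid of spacing h."""
    return h * np.sum(np.abs(f))


def l2_norm(f, h):
    """Riemann-sum approximation of the L^2 norm on a uniform grid of spacing h."""
    return np.sqrt(h * np.sum(f ** 2))


def young_check(seed, h=0.01):
    """Return (||B * g||_2, ||B||_1 * ||g||_2) for random grid data.

    B plays the role of a kernel B_ij^eta and g the role of a difference
    of densities; both are hypothetical random samples.
    """
    rng = np.random.default_rng(seed)
    kernel = rng.random(21)                          # nonnegative kernel on a short support
    g = rng.standard_normal(500)                     # arbitrary square-summable grid function
    conv = h * np.convolve(kernel, g, mode="full")   # grid version of B * g
    return l2_norm(conv, h), l1_norm(kernel, h) * l2_norm(g, h)


lhs, rhs = young_check(seed=0)
print(f"||B*g|| = {lhs:.4f} <= ||B||_1 ||g|| = {rhs:.4f}")
```

With the full discrete convolution and the grid scalings above, the inequality holds exactly (up to rounding), by Minkowski's inequality applied to the sum of shifted copies of g.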

The previous convergences allow us to perform the limit \(\eta \rightarrow 0\) in (23), leading to

$$\begin{aligned} \int _0^T\langle \partial _t u_i,\phi \rangle \zeta (t)\mathrm {d}t&= -\int _0^T\int _{{{\mathbb {R}}}^d}u_i\nabla U_i\cdot \nabla \phi \zeta (t)\mathrm {d}x\mathrm {d}t \\&\quad - \int _0^T\int _{{{\mathbb {R}}}^d}\nabla F_i(u)\cdot \nabla \phi \zeta (t)\mathrm {d}x\mathrm {d}t, \end{aligned}$$

where \(F_i(u)=u_i(\sigma _i+\sum _{j=1}^n f(a_{ij}u_j))\). Moreover, \(u_{i}(0)=u_{0,i}\) in \(B_R\) for any \(R>0\). Thus, u is a weak solution to (1). Standard estimates show that u is the unique solution, again choosing \(\varepsilon >0\) sufficiently small.

Step 2: Convergence rate. We take the difference of (7) and (1), multiply the resulting equation by \(u_{\eta ,i}-u_i\), integrate over \((0,\tau )\times {{\mathbb {R}}}^d\) for any \(\tau >0\), and integrate by parts:

$$\begin{aligned}&\frac{1}{2}\int _{{{\mathbb {R}}}^d}(u_{\eta ,i}-u_i)(\tau )^2\mathrm {d}x + \sigma _i\int _0^\tau \int _{{{\mathbb {R}}}^d}|\nabla (u_{\eta ,i}-u_i)|^2\mathrm {d}x\mathrm {d}t\nonumber \\&\quad = \frac{1}{2}\int _0^\tau \int _{{{\mathbb {R}}}^d}\Delta U_i(u_{\eta ,i}-u_i)^2\mathrm {d}x\mathrm {d}t \nonumber \\&\qquad {}- \int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla \sum _{j=1}^n\big (u_{\eta ,i} f_\eta (B_{ij}^\eta *u_{\eta ,j}) - u_if(a_{ij}u_j)\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$
(24)

The first integral on the right-hand side is nonpositive since \(\Delta U_i=-d\). We split the second integral into three parts:

$$\begin{aligned}&-\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\nabla \big (u_{\eta ,i}f_\eta (B_{ij}^\eta *u_{\eta ,j})\nonumber \\&\quad - u_if(a_{ij}u_j)\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t = J_1 + J_2 + J_3, \end{aligned}$$
(25)

where

$$\begin{aligned} J_1&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\nabla \big ((u_{\eta ,i}-u_i) f_\eta (B_{ij}^\eta *u_{\eta ,j})\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_2&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\nabla \big (u_i\big ( f_\eta (B_{ij}^\eta *u_{\eta ,j}) - f_\eta (a_{ij}u_{\eta ,j})\big )\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_3&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\nabla \big (u_i\big (f_\eta (a_{ij}u_{\eta ,j}) - f(a_{ij}u_j)\big )\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

We start with the estimate of \(J_1\). The families \((B_{ij}^\eta *u_{\eta ,j})\) and \((B_{ij}^\eta *\nabla u_{\eta ,j})\) are bounded in \(L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\). Using \(\Vert f_\eta \Vert _{L^\infty (I)}=\Vert f\Vert _{L^\infty (I)}\le \varepsilon \) and Young’s inequality, we have

$$\begin{aligned} J_1 \le&\Vert f_\eta (B_{ij}^\eta *u_{\eta ,j})\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \nonumber \\&+ \int _0^\tau \Vert u_{\eta ,i}-u_i\Vert _{L^2({{\mathbb {R}}}^d)}\Vert f'_\eta (B_{ij}^\eta *u_{\eta ,j}) \Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \nonumber \\&\times \Vert B_{ij}^\eta *\nabla u_{\eta ,j}\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \nonumber \\ \le&\bigg (\frac{\sigma _i}{4}+\varepsilon \bigg )\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C(\sigma _i)\int _0^\tau \Vert u_{\eta ,i}-u_i\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$
(26)

Next, we estimate \(J_2=J_{21}+J_{22}\), where

$$\begin{aligned} J_{21}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\nabla u_i\sum _{j=1}^n \big (f_\eta (B_{ij}^\eta *u_{\eta ,j})-f_\eta (a_{ij}u_{\eta ,j})\big ) \cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_{22}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n\big (f'_\eta (B_{ij}^\eta *u_{\eta ,j}) B_{ij}^\eta *\nabla u_{\eta ,j} - f'_\eta (a_{ij}u_{\eta ,j})a_{ij}\nabla u_{\eta ,j}\big )\\&\quad \cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

It follows that

$$\begin{aligned} J_{21}&\le \Vert \nabla u_i\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \sum _{j=1}^n\int _0^\tau \Vert f_\eta (B_{ij}^\eta *u_{\eta ,j})-f_\eta (a_{ij}u_{\eta ,j}) \Vert _{L^2({{\mathbb {R}}}^d)}\\&\qquad \quad \times \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \\&\le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t {+} C\sum _{j=1}^n\int _0^\tau \Vert f_\eta (B_{ij}^\eta *u_{\eta ,j}){-}f_\eta (a_{ij}u_{\eta ,j}) \Vert ^2_{L^2({{\mathbb {R}}}^d)}\mathrm {d}t. \end{aligned}$$

Since both \(B_{ij}^\eta *u_{\eta ,j}\) and \(u_{\eta ,j}\) are uniformly bounded in \(L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\), we can choose \(\eta >0\) sufficiently small such that \(f=f_\eta \) on I. On that interval, \(f_\eta \) is Lipschitz continuous with a constant independent of \(\eta \). We use this information in

$$\begin{aligned}&\bigg |\int _{{{\mathbb {R}}}^d}\big (f_\eta (B_{ij}^\eta *u_{\eta ,j})-f_\eta (a_{ij}u_{\eta ,j}) \big )g(x)\mathrm {d}x\bigg | \\&\quad \le C\int _{{{\mathbb {R}}}^d}\big |B_{ij}^\eta *u_{\eta ,j}-a_{ij}u_{\eta ,j}\big ||g(x)|\mathrm {d}x, \end{aligned}$$

where \(g\in L^2({{\mathbb {R}}}^d)\). Recalling that \({\text {supp}}(B_{ij}^\eta )\subset B_\eta (0)\) and \(a_{ij}=\int _{B_\eta }B_{ij}^\eta \mathrm {d}x\), we obtain

$$\begin{aligned}&\bigg |\int _{{{\mathbb {R}}}^d}\big (f_\eta (B_{ij}^\eta *u_{\eta ,j})-f_\eta (a_{ij}u_{\eta ,j}) \big )g(x)\mathrm {d}x\bigg | \\&\quad \le C\int _{{{\mathbb {R}}}^d}\bigg |\int _{B_\eta }B_{ij}^\eta (y)\big (u_{\eta ,j}(x-y)-u_{\eta ,j}(x) \big )\mathrm {d}y\bigg ||g(x)|\mathrm {d}x \\&\quad \le C\int _{{{\mathbb {R}}}^d}\int _{B_\eta }|B_{ij}^\eta (y)|\bigg ( \int _0^1|\nabla u_{\eta ,j}(x-ry)|\eta \mathrm {d}r\bigg )\mathrm {d}y|g(x)|\mathrm {d}x \\&\quad = C\eta \int _0^1\int _{B_\eta }|B_{ij}^\eta (y)|\bigg (\int _{{{\mathbb {R}}}^d}|\nabla u_{\eta ,j}(x-ry)| |g(x)|\mathrm {d}x\bigg )\mathrm {d}y\mathrm {d}r \\&\quad \le C\eta \int _0^1\int _{B_\eta }|B_{ij}^\eta (y)| \Vert \nabla u_{\eta ,j}(\cdot -ry)\Vert _{L^2({{\mathbb {R}}}^d)}\Vert g\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}y\mathrm {d}r \\&\quad \le C\eta \int _{B_\eta }|B_{ij}^\eta (y)|\mathrm {d}y\Vert \nabla u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)} \Vert g\Vert _{L^2({{\mathbb {R}}}^d)} \le C\eta \Vert g\Vert _{L^2({{\mathbb {R}}}^d)}. \end{aligned}$$

By duality, we find that

$$\begin{aligned} J_{21}\le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2. \end{aligned}$$
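The duality argument above amounts to \(\Vert B_{ij}^\eta *u-a_{ij}u\Vert _{L^2}\le \eta \Vert B_{ij}^\eta \Vert _{L^1}\Vert \nabla u\Vert _{L^2}\), a first-order rate in \(\eta \). The following Python sketch checks this numerically in one space dimension for the Gaussian \(u(x)=e^{-x^2}\) and a hypothetical one-sided kernel \(B^\eta =(a/\eta )\mathbf {1}_{[0,\eta )}\), chosen only because its convolution with u has the closed form \((a/\eta )(\sqrt{\pi }/2)(\mathrm {erf}(x)-\mathrm {erf}(x-\eta ))\). For this non-symmetric kernel the error behaves like \((a\eta /2)\Vert u'\Vert _{L^2}\), so the linear rate is attained; for an even kernel the first-order term would cancel.

```python
import math

import numpy as np

# numpy has no erf; vectorize the standard-library one.
erf = np.vectorize(math.erf)

X = np.linspace(-10.0, 10.0, 20001)   # truncated domain; exp(-x^2) is negligible at |x| = 10
H = X[1] - X[0]


def l2(f):
    """Riemann-sum approximation of the L^2(R) norm on the grid X."""
    return math.sqrt(H * np.sum(f ** 2))


def mollifier_error(eta, a=1.0):
    """||B^eta * u - a u||_{L^2} for u = exp(-x^2) and the hypothetical kernel
    B^eta = (a/eta) * indicator of [0, eta), whose L^1 norm equals a."""
    u = np.exp(-X ** 2)
    # Exact convolution: (a/eta) * int_0^eta exp(-(x - y)^2) dy
    conv = (a / eta) * 0.5 * math.sqrt(math.pi) * (erf(X) - erf(X - eta))
    return l2(conv - a * u)


grad_norm = l2(-2.0 * X * np.exp(-X ** 2))   # ||u'||_{L^2}
for eta in (0.1, 0.05, 0.025, 0.0125):
    err = mollifier_error(eta)
    print(f"eta={eta:<7} error={err:.3e}  error/eta={err / eta:.4f}  bound/eta={grad_norm:.4f}")
```

On this example the ratio error/\(\eta \) settles near \(\Vert u'\Vert _{L^2}/2\approx 0.56\), half of the constant in the bound.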

The integral \(J_{22}\) is split into \(J_{22}=J_{221}+J_{222}\), where

$$\begin{aligned} J_{221}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n f'_\eta (B_{ij}^\eta *u_{\eta ,j}) \big (B_{ij}^\eta *\nabla u_{\eta ,j} - a_{ij}\nabla u_{\eta ,j}\big ) \cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_{222}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n\big (f'_\eta (B_{ij}^\eta *u_{\eta ,j}) - f'_\eta (a_{ij}u_{\eta ,j})\big )a_{ij}\nabla u_{\eta ,j} \cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

We infer from the uniform boundedness of \(B_{ij}^\eta *u_{\eta ,j}\) in \(L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\) and the fact that \(f'_\eta =f'\) on I for sufficiently small \(\eta >0\) that

$$\begin{aligned} J_{221}&\le \frac{\sigma _i}{16}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C{\sum _{j=1}^n}\int _0^T\Vert B_{ij}^\eta *\nabla u_{\eta ,j} - a_{ij}\nabla u_{\eta ,j} \Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\le \frac{\sigma _i}{16}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2{\sum _{j=1}^n}\int _0^\tau \Vert D^2 u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t, \end{aligned}$$

where we estimated the difference \(B_{ij}^\eta *\nabla u_{\eta ,j} - a_{ij}\nabla u_{\eta ,j}\) similarly to \(J_{21}\). Furthermore, the Lipschitz continuity of \(f'_\eta =f'\) on I leads to

$$\begin{aligned} J_{222}&\le C{\sum _{j=1}^n} \int _0^\tau \Vert u_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}\Vert B_{ij}^\eta *u_{\eta ,j} - a_{ij}u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)} \\&\quad \times \Vert \nabla u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \\&\le \frac{\sigma _i}{16}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2{\sum _{j=1}^n}\int _0^\tau \Vert \nabla u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Combining these estimates, we infer that

$$\begin{aligned} J_{22} \le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2, \end{aligned}$$

and, combining the estimates for \(J_{21}\) and \(J_{22}\),

$$\begin{aligned} J_2 \le \frac{\sigma _i}{4}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2. \end{aligned}$$
(27)

It remains to estimate \(J_3=J_{31}+J_{32}\), where

$$\begin{aligned} J_{31}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\big (f_\eta (a_{ij}u_{\eta ,j}) - f(a_{ij}u_j)\big )\nabla u_i\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_{32}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n\big (f'_\eta (a_{ij}u_{\eta ,j}) a_{ij}\nabla u_{\eta ,j} - f'(a_{ij}u_j)a_{ij}\nabla u_j\big ) \cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

Arguments similar to those above yield

$$\begin{aligned} J_{31}&\le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert \nabla u_i\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2\Vert u_{\eta }-u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert u_{\eta }-u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

The second term \(J_{32}\) is again split into two parts, \(J_{32}=J_{321}+J_{322}\), where

$$\begin{aligned} J_{321}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n \big (f'_\eta (a_{ij}u_{\eta ,j}) - f'_\eta (a_{ij}u_j)\big )a_{ij}\nabla u_{\eta ,j}\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t, \\ J_{322}&= -\int _0^\tau \int _{{{\mathbb {R}}}^d}u_i\sum _{j=1}^n a_{ij}\big (f'_\eta (a_{ij}u_j) \nabla u_{\eta ,j} - f'(a_{ij}u_j)\nabla u_j\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t. \end{aligned}$$

Using the Lipschitz continuity again, \(f'_\eta =f'\) on I, and \(|f'|\le \varepsilon \), we deduce that

$$\begin{aligned} J_{321}&\le C\Vert u_i\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}\int _0^\tau \sum _{j=1}^n \Vert \nabla u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)} \Vert u_{\eta ,j}-u_j\Vert _{L^2({{\mathbb {R}}}^d)}\\&\qquad \times \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \\&\le \frac{\sigma _i}{8}\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert u_\eta -u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t, \\ J_{322}&\le C\int _0^\tau \sum _{j=1}^n \Vert f'(a_{ij}u_j)\Vert _{L^\infty ({{\mathbb {R}}}^d)}\Vert \nabla (u_{\eta ,j}-u_j)\Vert _{L^2({{\mathbb {R}}}^d)}\\&\qquad \quad \times \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\mathrm {d}t \\&\le C\varepsilon \int _0^\tau \Vert \nabla (u_\eta -u)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

This shows that

$$\begin{aligned} J_{32} \le \bigg (\frac{\sigma _i}{8}+C\varepsilon \bigg ) \int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert u_\eta -u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$

Combining the estimates for \(J_{31}\) and \(J_{32}\), we arrive at

$$\begin{aligned} J_3 \le \bigg (\frac{\sigma _i}{4}+C\varepsilon \bigg ) \int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert u_\eta -u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t. \end{aligned}$$
(28)
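The absorptions into the dissipation term in the bounds above all rely on the weighted Young inequality \(ab\le \frac{\sigma _i}{8}a^2+\frac{2}{\sigma _i}b^2\) for \(a,b\ge 0\). As a worked instance for \(J_{321}\), take \(a=\Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}\) and \(b=C\Vert \nabla u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)}\Vert u_{\eta ,j}-u_j\Vert _{L^2({{\mathbb {R}}}^d)}\); as in the estimate for \(J_{321}\), \(\Vert \nabla u_{\eta ,j}\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}\) is taken to be bounded independently of \(\eta \), so the second term on the right is bounded by \(C\Vert u_{\eta ,j}-u_j\Vert _{L^2({{\mathbb {R}}}^d)}^2\):

$$\begin{aligned} C\Vert \nabla u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)}\Vert u_{\eta ,j}-u_j\Vert _{L^2({{\mathbb {R}}}^d)}\Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)} &\le \frac{\sigma _i}{8}\Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2 \\&\quad + \frac{2C^2}{\sigma _i}\Vert \nabla u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)}^2\Vert u_{\eta ,j}-u_j\Vert _{L^2({{\mathbb {R}}}^d)}^2. \end{aligned}$$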

Finally, putting together the estimates (26), (27), and (28), we infer from (25) that

$$\begin{aligned}&\bigg |\int _0^\tau \int _{{{\mathbb {R}}}^d}\sum _{j=1}^n\nabla \big (u_{\eta ,i} f_\eta (B_{ij}^\eta *u_{\eta ,j}) - u_if(a_{ij}u_j)\big )\cdot \nabla (u_{\eta ,i}-u_i)\mathrm {d}x\mathrm {d}t\bigg | \\&\quad \le \bigg (\frac{3\sigma _i}{4} + C\varepsilon \bigg )\int _0^\tau \Vert \nabla (u_{\eta ,i}-u_i)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\int _0^\tau \Vert u_{\eta }-u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2. \end{aligned}$$

This is the desired estimate for the last integral in (24). Choosing \(\varepsilon >0\) sufficiently small and summing over \(i=1,\ldots ,n\), we conclude that

$$\begin{aligned}&\Vert (u_\eta -u)(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2 + \sigma _{\mathrm{min}}C\int _0^\tau \Vert \nabla (u_\eta -u)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\quad \le C\int _0^\tau \Vert u_{\eta }-u\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t + C\eta ^2. \end{aligned}$$

The proof ends after applying Gronwall’s inequality.
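In more detail, set \(y(\tau ):=\Vert (u_\eta -u)(\tau )\Vert _{L^2({{\mathbb {R}}}^d)}^2\) and drop the nonnegative gradient term. The previous inequality then reads \(y(\tau )\le C\eta ^2+C\int _0^\tau y(t)\mathrm {d}t\) for \(0\le \tau \le T\), and the integral form of Gronwall's inequality gives the first bound below; inserting it into the right-hand side of the previous inequality gives the second one:

$$\begin{aligned} y(\tau )\le C\eta ^2e^{C\tau }\le C\eta ^2e^{CT}, \qquad \sigma _{\mathrm{min}}C\int _0^\tau \Vert \nabla (u_\eta -u)\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \le C\eta ^2\big (1+CTe^{CT}\big ). \end{aligned}$$

Hence both error terms are of order \(\eta ^2\), with constants depending on T.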

5 Links Between the SDEs and PDEs

We show that the density function \({\widehat{u}}\) from Proposition 4 coincides with the unique weak solution u to (1).

Theorem 7

Let the assumptions of Theorem 3 hold. Let \({\widehat{X}}_i\) for \(i=1,\ldots ,n\) be the square-integrable process solving (8) with density function \({\widehat{u}}_i\) and let \(u_i\) be the unique weak solution to (1). Then \({\widehat{u}}=({{\widehat{u}}}_1,\ldots ,{{\widehat{u}}}_n)\) solves the linear equation

$$\begin{aligned} \partial _t{\widehat{u}}_i = {\text {div}}({\widehat{u}}_i\nabla U_i) + \Delta \bigg (\sigma _i{\widehat{u}}_i + {\widehat{u}}_i\sum _{j=1}^n f(a_{ij}u_j)\bigg ) \quad \text{ in } {{\mathbb {R}}}^d,\ i=1,\ldots ,n, \end{aligned}$$
(29)

in the very weak sense, i.e.,

$$\begin{aligned}&\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i(t)\phi (t)\mathrm {d}x - \int _{{{\mathbb {R}}}^d}u_{0,i}\phi (0)\mathrm {d}x - \int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i\partial _t\phi \mathrm {d}x\mathrm {d}s \\&\quad = -\int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i\nabla U_i\cdot \nabla \phi \mathrm {d}x\mathrm {d}s + \int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i\bigg (\sigma _i+\sum _{j=1}^n f(a_{ij}u_j)\bigg ) \Delta \phi \mathrm {d}x\mathrm {d}s \end{aligned}$$

for all \(\phi \in C_0^\infty ([0,\infty )\times {{\mathbb {R}}}^d)\) and \(t>0\), where we assume that the initial datum \({\widehat{u}}_i(0)=u_{0,i}\) fulfils

$$\begin{aligned} \int _{{{\mathbb {R}}}^d}u_{0,i}(x)\mathrm {d}x = 1, \quad \int _{{{\mathbb {R}}}^d}u_{0,i}(x)|x|^2\mathrm {d}x < \infty . \end{aligned}$$
(30)

Additionally, \({\widehat{u}}=u\) in \((0,\infty )\times {{\mathbb {R}}}^d\), \(u_i\ge 0\), and (30) is fulfilled for \(u_i\) instead of \(u_{0,i}\) for almost all \(t>0\) and all \(i=1,\ldots ,n\).

Proof

Since \({\widehat{X}}_{k,i}\) depends on k only through the initial datum \(\xi _i^k\), whose law \(u_{0,i}\) is the same for all k, and the Brownian motion \(W_i^k\), we can omit the index k. Let \(\phi \in C_0^\infty ([0,\infty )\times {{\mathbb {R}}}^d)\) and set \(F_i(u)=\sigma _i+\sum _{j=1}^n f(a_{ij}u_j)\). By Itô’s lemma, we obtain

$$\begin{aligned}&\phi (t,{\widehat{X}}_i(t)) = \phi (0,\xi _i) + \int _0^t\partial _t\phi (s,{\widehat{X}}_i(s))\mathrm {d}s - \int _0^t\nabla U_i({\widehat{X}}_i(s))\cdot \nabla \phi (s,{\widehat{X}}_i(s))\mathrm {d}s \nonumber \\&\qquad \quad \qquad + \int _0^t F_i\big (u(s,{\widehat{X}}_i(s))\big )\Delta \phi (s,{\widehat{X}}_i(s))\mathrm {d}s \nonumber \\&\qquad \quad \qquad + \int _0^t \Big (2F_i\big (u(s,{\widehat{X}}_i(s))\big )\Big )^{1/2}\nabla \phi (s,{\widehat{X}}_i(s))\cdot \mathrm {d}W_i(s). \end{aligned}$$
(31)
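For the reader's convenience, we recall where the coefficients in (31) come from. Reading them off from the proof of Lemma 10, (8) has the drift \(-\nabla U_i({\widehat{X}}_i)\) and the scalar diffusion coefficient \((2F_i(u(t,{\widehat{X}}_i)))^{1/2}\) multiplying the d-dimensional Brownian motion \(W_i\). Itô's formula then reads

$$\begin{aligned} \mathrm {d}\phi (t,{\widehat{X}}_i) = \Big (\partial _t\phi - \nabla U_i\cdot \nabla \phi + \frac{1}{2}\cdot 2F_i(u)\Delta \phi \Big )(t,{\widehat{X}}_i)\mathrm {d}t + \big (2F_i(u(t,{\widehat{X}}_i))\big )^{1/2}\nabla \phi (t,{\widehat{X}}_i)\cdot \mathrm {d}W_i, \end{aligned}$$

so the factor 1/2 in Itô's formula cancels the factor 2 in the squared diffusion coefficient, and \(F_i(u)\), not \(2F_i(u)\), multiplies \(\Delta \phi \).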

We claim that the density function \({\widehat{u}}_i:[0,\infty )\rightarrow {\mathcal {P}}_2({{\mathbb {R}}}^d)\), where \({\mathcal {P}}_2({{\mathbb {R}}}^d)\) is the space of all probability densities with finite second moment, is continuous with respect to the 2-Wasserstein distance \(W_2\). Indeed, since \({\widehat{X}}_i\) is square-integrable, we have \({\widehat{u}}_i(t)\in {\mathcal {P}}_2({{\mathbb {R}}}^d)\) for almost all \(t>0\), and for \(s\rightarrow t\), we obtain

$$\begin{aligned} W_2({\widehat{u}}_i(t),{\widehat{u}}_i(s))&= \inf \big \{\big ({{\mathbb {E}}}(|Y_t-Y_s|^2)\big )^{1/2}:\ {\text {Law}}(Y_t)={\widehat{u}}_i(t),\ {\text {Law}}(Y_s)={\widehat{u}}_i(s)\big \} \\&\le \big ({{\mathbb {E}}}(|{\widehat{X}}_i(t)-{\widehat{X}}_i(s)|^2)\big )^{1/2}\rightarrow 0, \end{aligned}$$

using the facts that \({\widehat{X}}_i\) is continuous in time and has bounded second moments. This shows the claim. We conclude that the point evaluation \({\widehat{u}}_i(t)\) is well defined.
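In one dimension, the 2-Wasserstein distance between two empirical measures with the same number of atoms is attained by the monotone (sorted) coupling, so the inequality above can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it replaces \({\widehat{X}}_i\) by the hypothetical stand-in \(X(t)=\xi +(2\sigma )^{1/2}W(t)\) (no potential, no cross-diffusion), and the helper `w2_empirical_1d` is introduced here, not taken from the paper.

```python
import numpy as np


def w2_empirical_1d(a, b):
    """W_2 between two 1D empirical measures with equally many atoms.

    In one dimension the sorted (monotone) coupling is optimal, so W_2 is the
    root-mean-square distance between the sorted samples.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("samples must have the same size")
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
n_paths, sigma, s, t = 20000, 1.0, 0.9, 1.0

# Hypothetical stand-in for X_hat_i: X(t) = xi + sqrt(2 sigma) W(t), xi ~ N(-1, 2).
xi = rng.normal(-1.0, np.sqrt(2.0), n_paths)
w_s = rng.normal(0.0, np.sqrt(s), n_paths)
w_t = w_s + rng.normal(0.0, np.sqrt(t - s), n_paths)  # same Brownian path, later time
x_s = xi + np.sqrt(2.0 * sigma) * w_s
x_t = xi + np.sqrt(2.0 * sigma) * w_t

w2 = w2_empirical_1d(x_t, x_s)
# Cost of the synchronous coupling (X(t), X(s)); its square is close to 2*sigma*(t - s).
coupling_cost = float(np.sqrt(np.mean((x_t - x_s) ** 2)))
print(f"W2 = {w2:.4f} <= coupling cost = {coupling_cost:.4f}")
```

Because the sorted pairing minimizes the sum of squared differences over all pairings, `w2 <= coupling_cost` holds exactly for the samples, mirroring \(W_2({\widehat{u}}_i(t),{\widehat{u}}_i(s))\le ({{\mathbb {E}}}|{\widehat{X}}_i(t)-{\widehat{X}}_i(s)|^2)^{1/2}\); the coupling cost tends to zero as \(s\rightarrow t\).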

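By the previous argument, the point evaluation \({\widehat{u}}_i(t)\) is well defined for every \(t>0\). Moreover, since \(\nabla \phi \) is bounded and \(F_i(u)\) is bounded (as \(u_i\in L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))\)), the stochastic integral in (31) is a martingale with zero expectation. Taking the expectation in (31) and using that \({\widehat{X}}_i(s)\) has the density \({\widehat{u}}_i(s)\), we obtain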

$$\begin{aligned}&\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i(t)\phi (t)\mathrm {d}x = \int _{{{\mathbb {R}}}^d}u_{0,i}\phi (0)\mathrm {d}x + \int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i(s)\partial _t\phi (s)\mathrm {d}x\mathrm {d}s \\&\qquad \qquad \quad {} -\int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i(s)\nabla U_i\cdot \nabla \phi (s)\mathrm {d}x\mathrm {d}s + \int _0^t\int _{{{\mathbb {R}}}^d}{\widehat{u}}_i(s)F_i(u(s))\Delta \phi (s)\mathrm {d}x\mathrm {d}s. \end{aligned}$$

This is the very weak formulation of (29), showing the first part of the theorem.
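The identity just derived can be checked by Monte Carlo in the simplest hypothetical setting \(d=1\), \(U_i=0\), \(f\equiv 0\), so that \(F_i=\sigma _i\) and \({\widehat{X}}_i=\xi _i+(2\sigma _i)^{1/2}W_i\). The sketch compares \({{\mathbb {E}}}\phi ({\widehat{X}}_i(t))-{{\mathbb {E}}}\phi (\xi _i)\) with the time integral of \({{\mathbb {E}}}[F_i\Delta \phi ({\widehat{X}}_i(s))]\); the stochastic integral in (31) contributes only sampling noise. The test function \(\phi =\cos \) is bounded and smooth but not compactly supported, which is harmless for this illustration; for \(\xi _i\sim N(-1,2)\), the exact value is \(\cos (1)e^{-1}(e^{-\sigma _it}-1)\).

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, t_final, n_steps, n_paths = 0.5, 1.0, 200, 50000
dt = t_final / n_steps

# Assumed setting: U_i = 0 and f = 0, hence F_i = sigma and dX = sqrt(2 sigma) dW.
x = rng.normal(-1.0, np.sqrt(2.0), n_paths)  # initial data xi ~ N(-1, 2)


def phi(y):
    return np.cos(y)


def laplace_phi(y):
    return -np.cos(y)  # second derivative of cos in one dimension


e_phi_0 = phi(x).mean()
integrand = [sigma * laplace_phi(x).mean()]  # E[F_i * Laplace(phi)(X(s))] at s = 0
for _ in range(n_steps):
    x = x + np.sqrt(2.0 * sigma * dt) * rng.standard_normal(n_paths)
    integrand.append(sigma * laplace_phi(x).mean())
integrand = np.array(integrand)

lhs = phi(x).mean() - e_phi_0                                         # E phi(X(t)) - E phi(xi)
rhs = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))   # trapezoidal rule
exact = np.cos(1.0) * np.exp(-1.0) * (np.exp(-sigma * t_final) - 1.0)
print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, exact = {exact:.4f}")
```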

Next, we verify that the solution to (29) is unique. Since (29) is linear, it suffices to take \(u_0=0\) and to show that \({\widehat{u}}_i(t)=0\) for almost all \(t>0\). Such statements are usually proved by a duality argument. However, the coefficients of the dual problem associated with (29) are not regular enough, so we need to regularize them. As the proof is standard but tedious, we only sketch the arguments. Let \(\chi _k\) be a family of mollifiers and consider the regularized dual backward problem on the ball \(B_R\) around the origin with radius \(R>0\):

$$\begin{aligned}&\partial _t w_{k,R} - \nabla U_i\cdot \nabla w_{k,R} + (\chi _k*F_i(u))\Delta w_{k,R} = 0 \quad \text{ in } B_R,\ 0<s<t, \\&w_{k,R}=0\quad \text{ on } \partial B_R, \quad w_{k,R}(t)=g\in C_0^\infty (B_R)\quad \text{ in } B_R. \end{aligned}$$

We extend the unique smooth solution \(w_{k,R}\) to the whole space by setting \(w_{k,R}=0\) on \({{\mathbb {R}}}^d\setminus B_R\). Since the extension may not be smooth, we choose a cut-off function \(\psi _R\in C^\infty ({{\mathbb {R}}}^d)\) and use \(w_{k,R}\psi _R\) as an admissible test function in the very weak formulation of (29). Standard estimates yield bounds for \(w_{k,R}\) that are uniform in k and R. Then, passing to the limits \(k\rightarrow \infty \) and \(R\rightarrow \infty \) in the very weak formulation shows that \(\int _{{{\mathbb {R}}}^d}g(x){{\widehat{u}}}_i(t,x)\mathrm {d}x=0\), and since g and t were arbitrary, we conclude that \({\widehat{u}}_i(t)=0\) for almost all \(t>0\).

The weak solution u to (1) is also a very weak solution to (29). Therefore, by the previous uniqueness result, \({\widehat{u}}=u\). \(\square \)

Similar arguments lead to the following result that relates the solutions \({\overline{u}}_\eta \) and \(u_\eta \).

Theorem 8

Let the assumptions of Theorem 2 hold and let \(\eta >0\). Let \({\overline{X}}^\eta _{k,i}\) for \(i=1,\ldots ,n\) and \(k=1,\ldots ,N\) be the square-integrable process solving (6) with density function \({\overline{u}}_{\eta ,i}\). Then \({\overline{u}}_{\eta }=({\overline{u}}_{\eta ,1},\ldots , {\overline{u}}_{\eta ,n})\) solves the linear problem

$$\begin{aligned} \partial _t{\overline{u}}_{\eta ,i} = {\text {div}}({\overline{u}}_{\eta ,i}\nabla U_i) + \Delta \bigg (\sigma _i{\overline{u}}_{\eta ,i} + {\overline{u}}_{\eta ,i}\sum _{j=1}^n f_\eta (B_{ij}^\eta * u_{\eta ,j})\bigg ) \quad \text{ in } {{\mathbb {R}}}^d,\ i=1,\ldots ,n, \end{aligned}$$

with initial datum \({\overline{u}}_{\eta ,i}(0)=u_{0,i}\) fulfilling (30), where \(u_{\eta ,i}\) is the unique weak solution to (7). Moreover, \({\overline{u}}_\eta =u_\eta \) in \((0,\infty )\times {{\mathbb {R}}}^d\), \(u_{\eta ,i}\ge 0\), and

$$\begin{aligned} \int _{{{\mathbb {R}}}^d}u_{\eta ,i}(x,t)\mathrm {d}x=1, \quad \int _{{{\mathbb {R}}}^d}u_{\eta ,i}(x,t)|x|^2\mathrm {d}x<\infty \end{aligned}$$

for almost all \(t>0\) and all \(i=1,\ldots ,n\).

6 Proof of Theorem 5

The proof is split into two parts. First, we estimate the mean-square error of the difference \(X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta \), where \({\overline{X}}_{k,i}^\eta \) is the solution to the intermediate system (6). This error bound generalizes a result due to Sznitman (1991). Essential for this step are the facts that the Lipschitz constant of \(B_{ij}^\eta \) is of order \(\eta ^{-d-1}\), while the Lipschitz constant of \(f_\eta \) is of order \(\eta ^{-\alpha }\). Second, we estimate the mean-square error of the difference \({\overline{X}}_{k,i}^\eta -{\widehat{X}}_{k,i}\), based on an \(L^2\) estimate of \(f_\eta (B_{ij}^\eta *u_j)-f_\eta (a_{ij}u_j)\), which is of order \(\eta ^{1-\alpha }\).

Lemma 9

Let \(X_{k,i}^{N,\eta }\) and \({\overline{X}}_{k,i}^\eta \) be the solutions to (5) and (6), respectively, in the sense of Proposition 4. Under the assumptions of Theorem 5, there exists \(\delta >0\), depending on n, \(\sigma _{\mathrm{min}}\), and T, such that if \(\eta ^{-2(d+1+\alpha )}\le \delta \log N\), where \(\alpha \ge 0\) is fixed in Assumption (A4), we have

$$\begin{aligned} \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T} \big |(X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta )(s)\big |^2\bigg ) \le C(T,n,\sigma _{\mathrm{min}})N^{-1+(T+1)C(n,{T},\sigma _{\mathrm{min}})\delta }, \end{aligned}$$

where \(C(T,n,\sigma _{\mathrm{min}})>0\) is a constant.

Proof

The process \(D_{k,i}^{N,\eta }:=X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta \) solves

$$\begin{aligned} D_{k,i}^{N,\eta }(s) = E_{1,i}(s) + E_{2,i}(s), \quad 0\le s\le T, \end{aligned}$$
(32)

where

$$\begin{aligned} E_{1,i}(s)&= -\int _0^s\big (\nabla U_i(X_{k,i}^{N,\eta }(t)) -\nabla U_i({\overline{X}}_{k,i}^\eta (t))\big )\mathrm {d}t, \\ E_{2,i}(s)&= \int _0^s(E_{21}(t)-E_{22}(t))\mathrm {d}W_i^k(t), \\ E_{21}(t)&= \bigg (2\sigma _i + 2\sum _{j=1}^n f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta (X_{k,i}^{N,\eta }(t)-X_{\ell ,j}^{N,\eta }(t))\bigg )\bigg )^{1/2}, \\ E_{22}(t)&= \bigg (2\sigma _i + 2\sum _{j=1}^n f_\eta \big (B_{ij}^\eta * u_{\eta ,j}(t,{\overline{X}}_{k,i}^\eta (t))\big )\bigg )^{1/2}. \end{aligned}$$

We use the global Lipschitz continuity of \(\nabla U_i\) and the Fubini theorem to estimate the first term:

$$\begin{aligned} {{\mathbb {E}}}\Big (\sup _{0<s<T} |E_{1,i}(s)|^2\Big )&\le C{T} {{\mathbb {E}}}\int _0^T\big |(X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta )(s)\big |^2\mathrm {d}s \\&\le C{T} \int _0^T{{\mathbb {E}}}\Big (\sup _{0<s<t}|(X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta )(s)|^2\Big )\mathrm {d}t. \end{aligned}$$

Summing over \(i=1,\ldots ,n\) and taking the supremum over \(k=1,\ldots ,N\) leads to

$$\begin{aligned}&\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T} |E_{1,i}(s)|^2\bigg ) \nonumber \\&\quad \le C{T}\int _0^T\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} |(X_{k,i}^{N,\eta }-{\overline{X}}_{k,i}^\eta )(s)|^2\bigg )\mathrm {d}t. \end{aligned}$$
(33)

Next, we apply the Burkholder–Davis–Gundy inequality (Karatzas and Shreve 1991, Theorem 3.28) to the second term \(E_{2,i}\) and use the Lipschitz continuity of \(x\mapsto (2\sigma _i+x)^{1/2}\) for \(x\ge 0\):

$$\begin{aligned}&{{\mathbb {E}}}\Big (\sup _{0<s<T}|E_{2,i}(s)|^2\Big ) \le C{{\mathbb {E}}}\int _0^T(E_{21}(t)-E_{22}(t))^2\mathrm {d}t \nonumber \\&\quad \le C{{\mathbb {E}}}\int _0^T\bigg [\sum _{j=1}^n f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta (X_{k,i}^{N,\eta }(t)-X_{\ell ,j}^{N,\eta }(t))\bigg ) \nonumber \\&\qquad - \sum _{j=1}^n f_\eta \big (B_{ij}^\eta *u_{\eta ,j}(t,{\overline{X}}_{k,i}^\eta (t)) \big )\bigg ]^2\mathrm {d}t \nonumber \\&\quad = C{{\mathbb {E}}}\int _0^T\bigg [\sum _{j=1}^n(L^1_j(t)+L^2_j(t)+L^3_j(t))\bigg ]^2\mathrm {d}t \nonumber \\&\quad \le C(n){{\mathbb {E}}}\int _0^T\sum _{j=1}^n\big (L^1_j(t)^2 + L^2_j(t)^2 + L^3_j(t)^2\big )\mathrm {d}t, \end{aligned}$$
(34)

where

$$\begin{aligned} L^1_j(t)&= f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta (X_{k,i}^{N,\eta }(t)-X_{\ell ,j}^{N,\eta }(t))\bigg ) \\&\quad - f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\!B_{ij}^\eta ({\overline{X}}_{k,i}^{\eta }(t)-X_{\ell ,j}^{N,\eta }(t))\bigg ), \\ L^2_j(t)&= f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta ({\overline{X}}_{k,i}^{\eta }(t)-X_{\ell ,j}^{N,\eta }(t))\bigg )\\&\quad - f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta ({\overline{X}}_{k,i}^{\eta }(t)-{\overline{X}}_{\ell ,j}^{\eta }(t))\bigg ), \\ L^3_j(t)&= f_\eta \bigg (\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\! B_{ij}^\eta ({\overline{X}}_{k,i}^{\eta }(t)-{\overline{X}}_{\ell ,j}^{\eta }(t))\bigg ) - f_\eta \big (B_{ij}^\eta * u_{\eta ,j}(t,{\overline{X}}_{k,i}^\eta (t))\big ). \\ \end{aligned}$$

We estimate these three terms separately. By construction, the Lipschitz constant \(L_f\) of \(f_\eta \) satisfies \(L_f\le \eta ^{-\alpha }\). Moreover, the Lipschitz constant of \(B_{ij}^\eta (x)=\eta ^{-d}B_{ij}(|x|/\eta )\) is bounded by \(L_B=\max _{i,j=1,\ldots ,n}\Vert \nabla B_{ij}^\eta \Vert _{L^\infty ({{\mathbb {R}}}^d)} \le C\eta ^{-d-1}\). This shows that

$$\begin{aligned} |L^1_j(t)|&\le L_f\bigg |\frac{1}{N}\!\!\!\! \sum _{\begin{array}{c} \ell =1 \\ (\ell ,j)\ne (k,i) \end{array}}^N\!\!\!\!\big (B_{ij}^\eta (X_{k,i}^{N,\eta }(t)-X_{\ell ,j}^{N,\eta }(t)) - B_{ij}^\eta ({\overline{X}}_{k,i}^\eta (t)-X_{\ell ,j}^{N,\eta }(t))\big )\bigg | \\&\le L_f L_B\big |X_{k,i}^{N,\eta }(t)-{\overline{X}}_{k,i}^\eta (t)\big | \le {C}\eta ^{-d-1-\alpha }\big |X_{k,i}^{N,\eta }(t)-{\overline{X}}_{k,i}^\eta (t)\big |. \end{aligned}$$

Therefore, by Fubini’s theorem,

$$\begin{aligned} {{\mathbb {E}}}\int _0^T\sum _{j=1}^n |L^1_j(t)|^2\mathrm {d}t&\le C(n)\eta ^{-2(d+1 +\alpha )}{{\mathbb {E}}}\int _0^T\big |X_{k,i}^{N,\eta }(t) -{\overline{X}}_{k,i}^\eta (t)\big |^2\mathrm {d}t \nonumber \\&\le C(n)\eta ^{-2(d+1 +\alpha )}\int _0^T\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\Big ( \sup _{0<s<t}\big |X_{k,i}^{N,\eta }(s)-{\overline{X}}_{k,i}^\eta (s)\big |^2\Big )\mathrm {d}t. \end{aligned}$$
(35)
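The scaling \(L_B\le C\eta ^{-d-1}\), which together with \(L_f\le \eta ^{-\alpha }\) produces the factor \(\eta ^{-2(d+1+\alpha )}\) in (35), can be observed numerically in one dimension. The sketch takes the bump profile \(B(x)=\exp (-1/(1-x^2))\) of Section 7 as a hypothetical example of \(B_{ij}\) and evaluates \(\sup |\partial _xB^\eta |\) on a grid for \(B^\eta (x)=\eta ^{-1}B(x/\eta )\); the helper names are illustrative.

```python
import numpy as np


def bump_grad(x):
    """Derivative of B(x) = exp(-1/(1-x^2)) on (-1, 1), extended by zero."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    xi = x[inside]
    out[inside] = -2.0 * xi / (1.0 - xi ** 2) ** 2 * np.exp(-1.0 / (1.0 - xi ** 2))
    return out


def lipschitz_constant(eta, n_grid=200001):
    """Grid approximation of sup |d/dx B^eta| for B^eta(x) = eta^{-1} B(x/eta), d = 1."""
    x = np.linspace(-eta, eta, n_grid)
    # Chain rule: (B^eta)'(x) = eta^{-2} B'(x/eta), i.e. the factor eta^{-(d+1)} with d = 1.
    return float(np.max(np.abs(bump_grad(x / eta))) / eta ** 2)


L_ref = lipschitz_constant(1.0)
for eta in (0.5, 0.25, 0.125):
    ratio = lipschitz_constant(eta) / L_ref
    print(f"eta = {eta}: L_B(eta)/L_B(1) = {ratio:.6f}, eta^(-2) = {eta ** -2:.1f}")
```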

We can estimate the second term \(L^2_j(t)\) in a similar way, leading to

$$\begin{aligned}&{{\mathbb {E}}}\int _0^T\sum _{j=1}^n L^2_j(t)^2\mathrm {d}t \le C(n)\eta ^{-2(d+1 +\alpha )}\nonumber \\&\quad \times \int _0^T\sup _{\ell =1,\ldots ,N} {{\mathbb {E}}}\bigg (\sup _{0<s<t}\sum _{j=1}^n \big |X_{\ell ,j}^{N,\eta }(s)-{\overline{X}}_{\ell ,j}^\eta (s)\big |^2\bigg )\mathrm {d}t. \end{aligned}$$
(36)

The third term \(L^3_j(t)\) has to be treated in a different way. First, we use the Lipschitz continuity of \(f_\eta \) to find that

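$$\begin{aligned} |L^3_j(t)|\le \frac{C(n)}{N\eta ^{\alpha }}\bigg | \sum _{\ell =1}^N \big (B_{ij}^\eta ({\overline{X}}_{k,i}^\eta - {\overline{X}}_{\ell ,j}^\eta ) - B_{ij}^\eta * u_{\eta ,j}({\overline{X}}_{k,i}^\eta )\big ) - \frac{\mathbb {1}_{\{i=j\}}}{\eta ^d}B_{ii}(0)\bigg |, \end{aligned}$$

where the last term removes the self-interaction \(\ell =k\), which is excluded from the sum in \(L^3_j\) only when \(j=i\).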

This implies that

$$\begin{aligned}&{{\mathbb {E}}}\int _0^T \sum _{j=1}^n L^3_j(t)^2\mathrm {d}t \le \frac{C(n,T)}{N^2\eta ^{2(d+\alpha )}} \nonumber \\&\qquad \quad + \frac{C(n)}{N^2\eta ^{2\alpha }}\sum _{j=1}^n\int _0^T{{\mathbb {E}}}\bigg ( \!\sum _{\ell =1}^N \Big (B_{ij}^\eta \big ({\overline{X}}_{k,i}^\eta (t) - {\overline{X}}_{\ell ,j}^\eta (t)\big ) - B_{ij}^\eta * u_{\eta ,j}({\overline{X}}_{k,i}^\eta )\Big )\bigg )^2 \mathrm {d}t. \end{aligned}$$
(37)

It remains to estimate the expectation. To this end, we introduce

$$\begin{aligned} D_{(k,i),(\ell ,j)}(t) := B_{ij}^\eta ({\overline{X}}_{k,i}^\eta (t) - {\overline{X}}_{\ell ,j}^\eta (t)) - B_{ij}^\eta * u_{\eta ,j}(t,{\overline{X}}_{k,i}^\eta (t)), \quad (\ell ,j)\ne (k,i). \end{aligned}$$

For \((\ell ,j)\ne (k,i)\), the processes \({\overline{X}}_{k,i}^\eta \) and \({\overline{X}}_{\ell ,j}^\eta \) are independent: for \(i=j\), we are considering N independent copies of the same process, and for \(i\ne j\), the equation fulfilled by \({\overline{X}}_{k,i}^\eta \) does not depend on the process \({\overline{X}}_{\ell ,j}^\eta \), since its coefficients involve only the deterministic function \(u_\eta \) and the initial data and Brownian motions are independent. If \((k,i)\ne (\ell ,j)\), \((k,i)\ne (m,j)\), and \(\ell \ne m\), the processes \(D_{(k,i),(\ell ,j)}(t)\) and \(D_{(k,i),(m,j)}(t)\) are orthogonal, since

$$\begin{aligned}&{{\mathbb {E}}}\big (D_{(k,i),(\ell ,j)}(t)D_{(k,i),(m,j)}(t)\big )\\&\quad = \int _{{{\mathbb {R}}}^d}\bigg (\int _{{{\mathbb {R}}}^d}\int _{{{\mathbb {R}}}^d}B_{ij}^\eta (x-y)B_{ij}^\eta (x-z) u_{\eta ,j}(t,y)u_{\eta ,j}(t,z)\mathrm {d}y\mathrm {d}z \\&\qquad - 2\int _{{{\mathbb {R}}}^d}B_{ij}^\eta (x-y)u_{\eta ,j}(t,y)(B_{ij}^\eta *u_{\eta ,j})(t,x)\mathrm {d}y \\&\qquad + (B_{ij}^\eta *u_{\eta ,j})(t,x)(B_{ij}^\eta *u_{\eta ,j})(t,x) \bigg )u_{\eta ,i}(t,x)\mathrm {d}x = 0. \end{aligned}$$

Together with \({{\mathbb {E}}}(D_{(k,i),(\ell ,j)})=0\), this shows that the processes \(D_{(k,i),(\ell ,j)}\) are uncorrelated.
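A Monte Carlo sketch of the two facts just used, \({{\mathbb {E}}}(D_{(k,i),(\ell ,j)})=0\) and the orthogonality of \(D_{(k,i),(\ell ,j)}\) and \(D_{(k,i),(m,j)}\) for \(\ell \ne m\). To keep the convolution explicit, it assumes \(d=1\), \(u_{\eta ,i}=u_{\eta ,j}=N(0,1)\), and a Gaussian kernel \(B_{ij}^\eta (x)=(2\pi \eta ^2)^{-1/2}\exp (-x^2/(2\eta ^2))\) in place of the compactly supported potentials of the paper, so that \(B_{ij}^\eta *u_{\eta ,j}\) is the \(N(0,1+\eta ^2)\) density; the orthogonality argument does not depend on the shape of the kernel. The diagonal second moment is compared with the closed form of \(\int ((B_{ij}^\eta )^2*u_{\eta ,j}-(B_{ij}^\eta *u_{\eta ,j})^2)u_{\eta ,i}\mathrm {d}x\) for these Gaussians.

```python
import numpy as np


def gauss(x, var):
    """Density of N(0, var) evaluated at x."""
    return np.exp(-np.asarray(x) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


rng = np.random.default_rng(2)
eta, n_samples = 0.5, 400000

# Assumed laws: X_{k,i}, X_{l,j}, X_{m,j} independent with density N(0, 1).
x_k = rng.standard_normal(n_samples)
x_l = rng.standard_normal(n_samples)
x_m = rng.standard_normal(n_samples)

conv = gauss(x_k, 1.0 + eta ** 2)         # (B^eta * u_j)(X_{k,i}), closed form
d_l = gauss(x_k - x_l, eta ** 2) - conv    # D_{(k,i),(l,j)}
d_m = gauss(x_k - x_m, eta ** 2) - conv    # D_{(k,i),(m,j)} with m != l

mean_d = d_l.mean()          # close to 0
cross = (d_l * d_m).mean()   # orthogonality: close to 0
second = (d_l ** 2).mean()   # diagonal term: strictly positive

# Closed form of E(D^2) for these Gaussians, using
# (N(0,v)-density)^2 = N(0,v/2)-density / (2 sqrt(pi v)).
v, w = eta ** 2, 1.0 + eta ** 2
exact_second = (gauss(0.0, 2.0 + v / 2.0) / (2.0 * np.sqrt(np.pi * v))
                - gauss(0.0, 1.0 + w / 2.0) / (2.0 * np.sqrt(np.pi * w)))
print(f"mean = {mean_d:.5f}, cross = {cross:.5f}, E(D^2) = {second:.5f} (exact {exact_second:.5f})")
```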

However, if \((k,i)\ne (\ell ,j)\), \((k,i)\ne (m,j)\), and \(\ell =m\), the expectation does not vanish:

$$\begin{aligned} {{\mathbb {E}}}\big (D_{(k,i),(\ell ,j)}(t)^2\big ) =&\int _{{{\mathbb {R}}}^d}\bigg ((B^\eta _{ij}*u_{\eta ,j})(t,x) (B_{ij}^\eta *u_{\eta ,j})(t,x) + \int _{{{\mathbb {R}}}^d}\Big (B_{ij}^\eta (x-y)^2u_{\eta ,j}(t,y) \\&-2B_{ij}^\eta (x-y)u_{\eta ,j}(t,y)(B_{ij}^\eta *u_{\eta ,j})(t,x)\Big )\mathrm {d}y\bigg ) u_{\eta ,i}(t,x)\mathrm {d}x \\ =&\int _{{{\mathbb {R}}}^d}\big (((B_{ij}^\eta )^2*u_{\eta ,j})(t,x) - (B_{ij}^\eta *u_{\eta ,j})(t,x)^2\big ) u_{\eta ,i}(t,x)\mathrm {d}x. \end{aligned}$$

This expression is independent of the particle indices k and \(\ell \); it depends only on the species indices i and j. The case \((k,i)=(\ell ,j)\) can be treated similarly, with the difference that, since \( D_{(k,i),(k,i)}(t) = \eta ^{-d}B_{ii}(0) - B_{ii}^\eta *u_{\eta ,i}({\overline{X}}_{k,i}^\eta (t))\), we obtain for \({\mathbb {E}}(D_{(k,i),(k,i)}(t)D_{(k,i),(m,j)}(t))\) an additional term of order \(\eta ^{-2d}\). Hence, we infer from (37) and the previous computation that

$$\begin{aligned}&{{\mathbb {E}}}\int _0^T\sum _{j=1}^n L^3_j(t)^2\mathrm {d}t -\frac{C(n,T)}{N^2\eta ^{2(d+\alpha )}} \le \frac{C(n)}{N^2\eta ^{2\alpha }}\sum _{j=1}^n\sum _{\ell =1}^N \int _0^T{{\mathbb {E}}}\big (D_{(k,i),(\ell ,j)}(t)^2\big )\mathrm {d}t \nonumber \\&\quad \le \frac{C(n)}{N\eta ^{2\alpha }}\Vert u_{\eta ,i}\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \nonumber \\&\qquad \times \sum _{j=1}^n\int _0^T\bigg (\Vert (B_{ij}^\eta )^2*u_{\eta ,j}\Vert _{L^1({{\mathbb {R}}}^d)} + \Vert B_{ij}^\eta *u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)}^2\bigg (1+\frac{1}{\eta ^{2d}}\bigg ) \bigg )\mathrm {d}t \nonumber \\&\quad \le \frac{C(n)}{N\eta ^{2\alpha }}\sum _{j=1}^n\int _0^T \bigg (\Vert B_{ij}^\eta \Vert _{L^2({{\mathbb {R}}}^d)}^2 \Vert u_{\eta ,j}\Vert _{L^\infty ({{\mathbb {R}}}^d)} + \Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}^2 \Vert u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)}^2\bigg (1+\frac{1}{\eta ^{2d}}\bigg )\bigg )\mathrm {d}t \nonumber \\&\quad \le \frac{C(T,n)}{N\eta ^{2(d + \alpha )}}, \end{aligned}$$
(38)

recalling that \(\Vert B_{ij}^\eta \Vert _{L^2({{\mathbb {R}}}^d)}\le C\eta ^{-d/2}\) and \(\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}=A_{ij}\le A\) and choosing \(\eta <1\).

Inserting estimates (35), (36), and (38) for \(L^m_j(t)\) (\(m=1,2,3\)) into (34), we conclude that

$$\begin{aligned}&\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T}|E_{2,i}(s)|^2\bigg ) \le \frac{C(T,n)}{N\eta ^{2(d+\alpha )}} \\&\qquad \qquad \qquad + C(n,\sigma _{\mathrm{min}})\eta ^{-2(d+1+\alpha )} \int _0^T\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} \big |X_{k,i}^{N,\eta }(s)-{\overline{X}}_{k,i}^\eta (s)\big |^2\bigg )\mathrm {d}t. \end{aligned}$$

We infer from (32), estimate (33), and the previous estimate for \(E_{2,i}\) that

$$\begin{aligned} S(T)&:= \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T}|D_{k,i}^{N,\eta }(s)|^2\bigg ) \\&\le \frac{C(T,n)}{N\eta ^{2(d+\alpha )}} + C(n,\sigma _{\mathrm{min}})(\eta ^{-2(d+1+\alpha )}+{T})\int _0^T S(t)\mathrm {d}t. \end{aligned}$$

Note that the function S is continuous because of the continuity of the paths of \(X_{k,i}^{N,\eta }\) and \({\overline{X}}_{k,i}^\eta \). Therefore, by Gronwall’s inequality, we have

$$\begin{aligned} S(T) \le \frac{C(T,n)}{N\eta ^{2(d+\alpha )}} \exp \big (C(n,{T},\sigma _{\mathrm{min}})\eta ^{-2(d+1+\alpha )}T\big ). \end{aligned}$$

We choose \(\delta >0\) such that \(C(n,{T},\sigma _{\mathrm{min}})T\delta < 1\) and \(\eta >0\) such that \(\eta ^{-2(d+1+\alpha )}\le \delta \log N\). Then

$$\begin{aligned} S(T) \le \frac{1}{N}C(T,n)\exp \big (C(n,{T},\sigma _{\mathrm{min}})T \delta \log N\big ) = C(T,n)N^{-1+C(n,{T},\sigma _{\mathrm{min}})T\delta }. \end{aligned}$$

This finishes the proof. \(\square \)
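The constraint \(\eta ^{-2(d+1+\alpha )}\le \delta \log N\) of Lemma 9 ties the smoothing parameter to the particle number only logarithmically. The small helper below is a hypothetical illustration: \(\delta \) is a free parameter here, whereas in the lemma it must additionally satisfy \(C(n,T,\sigma _{\mathrm{min}})T\delta <1\) with constants that are not explicit. It computes the smallest admissible \(\eta \) and shows that halving \(\eta \) requires replacing \(\log N\) by \(2^{2(d+1+\alpha )}\log N\).

```python
import math


def smallest_admissible_eta(n_particles, delta, d=1, alpha=0.0):
    """Smallest eta > 0 with eta^{-2(d+1+alpha)} <= delta * log(N), as in Lemma 9.

    Requires delta * log(N) > 0; the returned eta is < 1 only if delta * log(N) > 1.
    """
    budget = delta * math.log(n_particles)
    if budget <= 0:
        raise ValueError("delta * log(N) must be positive")
    return budget ** (-1.0 / (2.0 * (d + 1 + alpha)))


for exponent in (3, 6, 12, 24, 96):
    n_particles = 10 ** exponent
    print(f"N = 1e{exponent}: eta_min = {smallest_admissible_eta(n_particles, delta=1.0):.3f}")
```

For \(d=1\) and \(\alpha =0\), passing from \(N=10^6\) to \(N=10^{96}\) (that is, \(N\mapsto N^{16}\)) only halves the admissible \(\eta \), which is consistent with the final bound in Theorem 5 being expressed in terms of \(\eta ^{2(1-\alpha )}\).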

Next, we prove an error estimate for the difference \({\overline{X}}_{k,i}^\eta -{\widehat{X}}_{k,i}\).

Lemma 10

Let \({\overline{X}}_{k,i}^\eta \) and \({\widehat{X}}_{k,i}\) be the solutions to (6) and (8) in the sense of Proposition 4. Under the assumptions of Theorem 5, it holds for small \(\eta >0\) that

$$\begin{aligned} \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<T} \big |({\overline{X}}_{k,i}^{\eta }-{\widehat{X}}_{k,i})(s)\big |^2\bigg ) \le C(T,\sigma _{\mathrm{min}})\eta ^{2(1-\alpha )}. \end{aligned}$$

Proof

Since we are considering N independent copies, we can omit the particle index k and set \(D_i^\eta (s):={\overline{X}}_{i}^{\eta }(s)-{\widehat{X}}_{i}(s)\). Then, as in the proof of Lemma 9, \(D_i^\eta (s)=D_1(s)+D_2(s)\), where

$$\begin{aligned} D_1(s)&= -\int _0^s\big (\nabla U_i({\overline{X}}_i^\eta (t))-\nabla U_i({\widehat{X}}_i(t)) \big )\mathrm {d}t, \\ D_2(s)&= \int _0^s\bigg [\bigg (2\sigma _i + 2\sum _{j=1}^n f_\eta \big (B_{ij}^\eta * u_{\eta ,j}({\overline{X}}_i^\eta )\big )\bigg )^{1/2} \\&\quad - \bigg (2\sigma _i + 2\sum _{j=1}^n f\big (a_{ij}u_j({\widehat{X}}_i)\big )\bigg )^{1/2} \bigg ]\mathrm {d}W_i(t). \end{aligned}$$

We infer from the Lipschitz continuity of \(\nabla U_i\) and Fubini’s theorem that

$$\begin{aligned} {{\mathbb {E}}}\Big (\sup _{0<s<T}|D_1(s)|^2\Big )&\le C{T}{{\mathbb {E}}}\bigg (\int _0^T\big |{\overline{X}}_{i}^{\eta }(s)-{\widehat{X}}_{i}(s)\big |^2 \mathrm {d}s\bigg ) \nonumber \\&\le C{T}\int _0^T{{\mathbb {E}}}\Big (\sup _{0<s<t}|D_i^\eta (s)|^2\Big )\mathrm {d}t. \end{aligned}$$
(39)

Similarly as in the proof of Lemma 9, we use for \(D_2\) the Burkholder–Davis–Gundy inequality and the Lipschitz continuity of \(x\mapsto (2\sigma _i+x)^{1/2}\) on \([0,\infty )\) to obtain

$$\begin{aligned} {{\mathbb {E}}}\Big (\sup _{0<s<T}|D_2(s)|^2\Big )&\le C{{\mathbb {E}}}\int _0^T\bigg (\sum _{j=1}^n\big (f(a_{ij}u_j({\widehat{X}}_i)) - f_\eta (B_{ij}^\eta *u_{\eta ,j}({\overline{X}}_i^\eta )) \big )\bigg )^2\mathrm {d}t \nonumber \\&\le C(n)(D_{21}+D_{22}+D_{23}+D_{24}), \end{aligned}$$
(40)

where

$$\begin{aligned} D_{21}&= \sum _{j=1}^n{{\mathbb {E}}}\int _0^T\big (f(a_{ij}u_j({\widehat{X}}_i)) - f_\eta (a_{ij}u_j({\widehat{X}}_i))\big )^2\mathrm {d}t, \\ D_{22}&= \sum _{j=1}^n{{\mathbb {E}}}\int _0^T\big (f_\eta (a_{ij}u_j({\widehat{X}}_i)) - f_\eta (B_{ij}^\eta *u_j({\widehat{X}}_i))\big )^2\mathrm {d}t, \\ D_{23}&= \sum _{j=1}^n{{\mathbb {E}}}\int _0^T\big (f_\eta (B_{ij}^\eta *u_j({\widehat{X}}_i)) - f_\eta (B_{ij}^\eta *u_{j}({\overline{X}}_i^\eta ))\big )^2\mathrm {d}t, \\ D_{24}&= \sum _{j=1}^n{{\mathbb {E}}}\int _0^T\big (f_\eta (B_{ij}^\eta *u_{j}({\overline{X}}_i^\eta )) - f_\eta (B_{ij}^\eta *u_{\eta ,j}({\overline{X}}_i^\eta ))\big )^2\mathrm {d}t. \end{aligned}$$

The first expression \(D_{21}\) vanishes if \(\eta >0\) is sufficiently small, since \(u_j\) is bounded and hence \(f=f_\eta \) on the range of \(a_{ij}u_j({\widehat{X}}_i)\). Using

$$\begin{aligned} \Vert a_{ij}u_j-B_{ij}^\eta *u_j\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \le C\eta \Vert \nabla u_j\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))} \le C\eta , \end{aligned}$$

which was shown in the proof of Theorem 3, and the Lipschitz continuity of \(f_\eta \) with Lipschitz constant at most \(\eta ^{-\alpha }\), we find that

$$\begin{aligned} D_{22}&= \sum _{j=1}^n\int _0^T\int _{{{\mathbb {R}}}^d}\big (f_\eta (a_{ij}u_j) - f_\eta (B_{ij}^\eta *u_j)\big )^2 u_i\mathrm {d}x\mathrm {d}t \\&\le \eta ^{-2\alpha }\sum _{j=1}^n\Vert u_i\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \Vert a_{ij}u_j-B_{ij}^\eta *u_j\Vert _{L^2(0,T;L^2({{\mathbb {R}}}^d))}^2 \le C(n)\eta ^{2(1-\alpha )}. \end{aligned}$$
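The bound \(\Vert a_{ij}u_j-B_{ij}^\eta *u_j\Vert _{L^2}\le C\eta \Vert \nabla u_j\Vert _{L^2}\) used for \(D_{22}\) can be illustrated in one dimension. The sketch assumes \(a_{ij}=1\), takes \(B_{ij}^\eta \) as the normalized bump \(\eta ^{-1}B(x/\eta )/\int B\) (an assumption; the text only requires \(\Vert B_{ij}^\eta \Vert _{L^1}=A_{ij}\)), and lets \(u_j\) be the standard Gaussian density. It checks the elementary estimate \(\Vert u-B^\eta *u\Vert _{L^2}\le \eta \Vert u'\Vert _{L^2}\), which follows from Minkowski's inequality because the kernel is a probability density supported in \([-\eta ,\eta ]\).

```python
import numpy as np


def bump(x):
    """B(x) = exp(-1/(1-x^2)) for |x| < 1 and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out


h = 0.005
x = np.arange(-10.0, 10.0 + h / 2, h)             # uniform grid on [-10, 10]
u = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)  # u_j: standard Gaussian density
du = -x * u                                        # its exact derivative
norm_du = np.sqrt(np.sum(du ** 2) * h)


def mollification_error(eta):
    """Discrete L^2 norm of u - B^eta * u with a kernel normalized to unit mass."""
    m = int(round(eta / h))
    z = h * np.arange(-m, m + 1)
    kernel = bump(z / eta)
    kernel /= kernel.sum() * h                     # discrete version of int B^eta = 1
    smoothed = np.convolve(u, kernel, mode="same") * h
    return float(np.sqrt(np.sum((u - smoothed) ** 2) * h))


for eta in (0.5, 0.25, 0.125):
    err = mollification_error(eta)
    print(f"eta = {eta}: error = {err:.2e}, bound eta*||u'|| = {eta * norm_du:.2e}")
```

For this symmetric kernel and smooth \(u\), the observed error decays faster than \(\eta \), so the first-order bound is not sharp here; its advantage is that it only requires \(u_j\in H^1\).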

Thanks to the uniform boundedness of the family \(B_{ij}^\eta *u_j\), we can choose \(\eta >0\) sufficiently small, say \(\eta \le \eta ^*\) for some \(\eta ^*>0\), such that \(f(B_{ij}^\eta *u_j)=f_\eta (B_{ij}^\eta *u_j)\) for \(0<\eta \le \eta ^*\). Then, using Young’s convolution inequality and the uniform estimate \(\Vert \nabla u_j\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))}\le C\Vert u_0\Vert _{H^s({{\mathbb {R}}}^d)}\) from Theorem 3, the third term \(D_{23}\) is estimated as

$$\begin{aligned} D_{23}&\le C(\eta ^*)\sum _{j=1}^n \Vert \nabla (B_{ij}^\eta *u_j)\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \int _0^T{{\mathbb {E}}}\big (|{\widehat{X}}_i(t)-{\overline{X}}_i^\eta (t)|^2\big )\mathrm {d}t \\&\le C\sum _{j=1}^n\Vert \nabla u_j\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \int _0^T{{\mathbb {E}}}\big (|{\widehat{X}}_i(t)-{\overline{X}}_i^\eta (t)|^2\big )\mathrm {d}t \\&\le C\int _0^T{{\mathbb {E}}}\Big (\sup _{0<s<t}|D_i^\eta (s)|^2\Big )\mathrm {d}t. \end{aligned}$$

Finally, it follows from the error estimate for \(u-u_{\eta }\) from Theorem 3 that

$$\begin{aligned} D_{24}&\le C\sum _{j=1}^n\int _0^T\int _{{{\mathbb {R}}}^d}|B_{ij}^\eta *u_j - B_{ij}^\eta *u_{\eta ,j}|^2 u_{\eta ,i}\mathrm {d}x\mathrm {d}t \\&\le C\sum _{j=1}^n\Vert u_{\eta ,i}\Vert _{L^\infty (0,T;L^\infty ({{\mathbb {R}}}^d))} \int _0^T\Vert B_{ij}^\eta \Vert _{L^1({{\mathbb {R}}}^d)}^2\Vert u_j-u_{\eta ,j}\Vert _{L^2({{\mathbb {R}}}^d)}^2\mathrm {d}t \\&\le C(T)\eta ^{2}. \end{aligned}$$

Inserting the estimates for \(D_{21},\ldots ,D_{24}\) into (40), we conclude that

$$\begin{aligned} {{\mathbb {E}}}\Big (\sup _{0<s<T}|D_2(s)|^2\Big ) \le C(T,n)\eta ^{2(1-\alpha )} + C({T})\int _0^T{{\mathbb {E}}}\Big (\sup _{0<s<t}|D_i^\eta (s)|^2\Big )\mathrm {d}t. \end{aligned}$$

Together with estimate (39) for \(D_1(s)\) and recalling that \(D_i^\eta =D_1+D_2\), we arrive at

$$\begin{aligned} {{\mathbb {E}}}\Big (\sup _{0<s<T}|D_i^\eta (s)|^2\Big ) \le C(T,n)\eta ^{2(1-\alpha )} + C({T})\int _0^T{{\mathbb {E}}}\Big (\sup _{0<s<t}|D_i^\eta (s)|^2\Big )\mathrm {d}t. \end{aligned}$$

The proof is finished after applying Gronwall’s inequality and summing over \(i=1,\ldots ,n\). \(\square \)

Theorem 5 now follows from Lemmas 9 and 10 and the triangle inequality:

$$\begin{aligned}&\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} \big |X_{\eta ,i}^{k,N}(s) - {{\widehat{X}}}_i^k(s)\big |^2\bigg ) \\&\quad \le {2}\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} \big |X_{\eta ,i}^{k,N}(s) - {\overline{X}}_{\eta ,i}^k(s)\big |^2\bigg ) \\&\qquad + {2}\sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} \big |{\overline{X}}_{\eta ,i}^k(s) - {{\widehat{X}}}_i^k(s)\big |^2\bigg ) \\&\quad \le C_1N^{-1+C_2\delta } + C_3\eta ^{2(1-\alpha )}. \end{aligned}$$

If \(\delta >0\) is chosen such that \(C_2\delta <1\), the condition \(\log N\ge \delta ^{-1}\eta ^{-2(d+1+\alpha )}\) implies \(N^{-1+C_2\delta }=\exp ((-1+C_2\delta )\log N)\le \exp ((-\delta ^{-1}+C_2)\eta ^{-2(d+1+\alpha )})\). Since \(-\delta ^{-1}+C_2<0\) and exponential decay is faster than any algebraic decay, there exists a constant \(C>0\) such that \(\exp ((-\delta ^{-1}+C_2)\eta ^{-2(d+1+\alpha )})\le C\eta ^{2(1-\alpha )}\) for all \(0<\eta <1\). This yields

$$\begin{aligned} \sup _{k=1,\ldots ,N}{{\mathbb {E}}}\bigg (\sum _{i=1}^n\sup _{0<s<t} \big |X_{\eta ,i}^{k,N}(s) - {{\widehat{X}}}_i^k(s)\big |^2\bigg ) \le C_4\eta ^{2(1-\alpha )}, \end{aligned}$$

finishing the proof.

7 Numerical Tests

In this section, we perform numerical simulations of the particle systems (3) and (5) in one space dimension, without environmental potential, and with the linear function \(f(x)=x\). We compare the two particle systems numerically with respect to their ability to model the segregation of the species. Numerical tests for the associated cross-diffusion systems (1) and (2) are work in progress.

We discretize the particle systems (3) and (5) by the Euler–Maruyama scheme. Let \(M\in {{\mathbb {N}}}\) and introduce the time steps \(0=t_0<t_1<\cdots <t_M=T\) with \(\triangle t_m=t_{m+1}-t_m\). We approximate \(X_{k,i}^{N,\eta }(t_m)\) by \(x_{m}^{k,i}\) and \(Y_{k,i}^{N,\eta }(t_m)\) by \(y_{m}^{k,i}\), which are defined, respectively, by

$$\begin{aligned} x_{m+1}^{k,i}&= x_m^{k,i} + \bigg (2\sigma _i + \frac{2}{N}\sum _{j=1}^n \sum _{\ell =1}^N B_{ij}^\eta (x_m^{k,i}-x_m^{\ell ,j})\bigg )^{1/2}\sqrt{\triangle t_m} w_m, \\ y_{m+1}^{k,i}&= y_m^{k,i} - \sum _{j=1}^n\frac{1}{N}\sum _{\ell =1}^N \nabla B_{ij}^\eta (y_m^{k,i}-y_m^{\ell ,j})\triangle t_m + \sqrt{2\sigma _i\triangle t_m} z_m, \end{aligned}$$

with initial conditions \(x_0^{k,i} = \xi _i^k\) and \(y_0^{k,i}=\xi _i^k\), where \(\xi _i^k\) are iid random variables with law \(u_{0,i}\), and \(w_m\) and \(z_m\) are independent standard normally distributed random variables. It is well known that the solutions to the Euler–Maruyama scheme converge to the associated stochastic processes in the strong sense; see, e.g., (Kloeden and Platen 1992, Theorem 9.6.2).

The numerical scheme is implemented in MATLAB using the Parallel Computing Toolbox to accelerate the simulations. The interaction potential is given by \(B(x)=\exp (-1/(1-x^2))\) for \(|x|<1\) and \(B(x)=0\) otherwise. Then \(B_{ij}^\eta (x)=\eta ^{-1}B(x/\eta )\). The numerical parameters are \(\triangle t=1/100\), \(\eta =2\), \(N=5000\) particles, and \(n_{\mathrm{sim}}=500\) simulations.
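A minimal NumPy sketch of the two Euler–Maruyama recursions above; the MATLAB implementation used for the figures is not reproduced here. Assumptions not taken from the text: the species-dependent potentials are \(B_{ij}^\eta (x)=a_{ij}\eta ^{-1}B(x/\eta )/\int B\), so that \(\int B_{ij}^\eta =a_{ij}\); the sums run over all particles including \(\ell =k\), as in the displayed scheme; a uniform time step is used; and the function names are hypothetical.

```python
import numpy as np


def bump(x):
    """B(x) = exp(-1/(1-x^2)) for |x| < 1 and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out


def bump_grad(x):
    """B'(x) = -2x/(1-x^2)^2 * B(x) for |x| < 1 and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    xi = x[inside]
    out[inside] = -2.0 * xi / (1.0 - xi ** 2) ** 2 * np.exp(-1.0 / (1.0 - xi ** 2))
    return out


_grid = np.linspace(-1.0, 1.0, 20001)
BUMP_MASS = float(bump(_grid).sum() * (_grid[1] - _grid[0]))  # approximates int B dx


def simulate(a, sigma, means, var, n_particles, eta, dt, n_steps, seed=0):
    """Euler-Maruyama for the SKT-type system (5) (x) and system (3) (y) in 1D.

    Assumption: B^eta_ij(x) = a_ij * eta^{-1} * B(x/eta) / int B, so int B^eta_ij = a_ij.
    """
    rng = np.random.default_rng(seed)
    n = len(sigma)
    x = [rng.normal(m, np.sqrt(var), n_particles) for m in means]
    y = [xk.copy() for xk in x]  # both schemes start from the same xi_i^k
    c = 1.0 / (eta * BUMP_MASS)
    for _ in range(n_steps):
        x_new, y_new = [], []
        for i in range(n):
            pressure = np.zeros(n_particles)  # (1/N) sum_j sum_l B^eta_ij(x^{k,i} - x^{l,j})
            drift = np.zeros(n_particles)     # -(1/N) sum_j sum_l (B^eta_ij)'(y^{k,i} - y^{l,j})
            for j in range(n):
                dx = (x[i][:, None] - x[j][None, :]) / eta
                dy = (y[i][:, None] - y[j][None, :]) / eta
                pressure += a[i][j] * c * bump(dx).mean(axis=1)
                drift -= a[i][j] * c / eta * bump_grad(dy).mean(axis=1)
            w = rng.standard_normal(n_particles)
            z = rng.standard_normal(n_particles)
            x_new.append(x[i] + np.sqrt((2.0 * sigma[i] + 2.0 * pressure) * dt) * w)
            y_new.append(y[i] + drift * dt + np.sqrt(2.0 * sigma[i] * dt) * z)
        x, y = x_new, y_new  # explicit update: all species use the values at t_m
    return x, y


a = [[0.0, 355.0], [25.0, 0.0]]  # nonsymmetric coefficients of Section 7.1
x_end, y_end = simulate(a, sigma=[1.0, 2.0], means=[-1.0, 1.0], var=2.0,
                        n_particles=200, eta=2.0, dt=0.01, n_steps=200)
```

Histograms of `x_end[i]` and `y_end[i]` are the kind of approximate densities shown in the figures below, although the outcome depends on the assumed normalization of \(B_{ij}^\eta \). The pairwise evaluation costs \(O(N^2)\) memory and time per step, which is the simplest choice but becomes expensive at \(N=5000\).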

7.1 Two Species: Nonsymmetric Case

We consider a nonsymmetric diffusion matrix with \(a_{11}=0\), \(a_{12}=355\), \(a_{21}=25\), \(a_{22}=0\), and \(\sigma _1=1\), \(\sigma _2=2\). The initial data are Gaussian distributions with mean \(-1\) (for species \(i=1\)) and 1 (for species \(i=2\)) and variance 2. Figure 1 shows the approximate densities (histograms) of both species for systems (5) and (3) at time \(t=2\). We observe a segregation of the densities in both models. In the population system (5), species 1 develops two clusters because of the very different “population pressure” parameters \(a_{12}=355\) and \(a_{21}=25\), while species 2 develops only one cluster around \(x=0\); see Fig. 1 left. The segregation effect is stronger in the particle system (3) in the sense that both species avoid each other as far as possible; see Fig. 1 right. This is not surprising since the diffusion in system (5) is generally larger than that in system (3). The numerical results are consistent with the segregation property studied in Bertsch et al. (1985). That work considers the cross-diffusion system (2) with \(\sigma _1=\sigma _2=0\) and \(a_{11}=a_{12}=a_{21}=a_{22}=1\) and proves that the two species remain segregated for all times if they are segregated initially. Here, segregation means that the supports of the densities have empty intersection.

Fig. 1

Nonsymmetric case: densities of particle system (5) corresponding to the SKT population model (left) and particle system (3) (right) at time \(t=2\). Solid blue line: species 1; dashed red line: species 2

7.2 Two Species: Symmetric Case

We investigate the symmetric case by choosing \(a_{11}=a_{22}=0\), \(a_{12}=a_{21}=355\), and, as before, \(\sigma _1=1\), \(\sigma _2=2\). The initial data are chosen as in the previous example. In this example, we expect cross-diffusion to dominate self-diffusion. Figure 2 shows the approximate densities at several times. In both models, the species tend to segregate. As expected, the segregation in the particle system (3) is stronger than in the particle system (5) corresponding to the SKT model.

Fig. 2

Symmetric case: densities of particle system (5) corresponding to the SKT population model (left) and particle system (3) (right) at times \(t=0.01\), 0.15, and 2. Solid blue line: species 1; dashed red line: species 2

7.3 Three Species

Our third numerical experiment illustrates the segregation behavior in the case of three interacting species with coefficients \(\sigma _1=1\), \(\sigma _2=2\), \(\sigma _3=3\) and

$$\begin{aligned} (a_{ij}) = \begin{pmatrix} 0 &{} 355 &{} 355 \\ 25 &{} 0 &{} 25 \\ 355 &{} 0 &{} 0 \end{pmatrix}. \end{aligned}$$

As in the two-species case, the initial data are overlapping normal distributions with means \(-1\), 2, and \(-3\), respectively, and variance 2. The approximate densities at \(t=2\) are shown in Fig. 3. The approximate densities of the particle model (3) show a much clearer component-wise segregation than those of the stochastic particle model (5), which corresponds to the SKT system and in which the diffusion effects are much stronger. A possible explanation is that, on the PDE level, the gradient-flow structure of model (2) can be written species-wise, whereas the SKT model (1) (with \(f(s)=s\)) only possesses a vector-valued gradient-flow structure.
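The statement that diffusion is stronger in the SKT model can be made concrete, at least heuristically, on the PDE level. Ignoring the drift term and taking \(f(s)=s\), both (1) and (2) can be written as \(\partial _t u_i = {\text {div}}\big (\sum _{k=1}^n A_{ik}(u)\nabla u_k\big )\); the notation \(A^{\mathrm{SKT}}\) for (1) and \(A^{\mathrm{CD}}\) for (2) is introduced here only for this comparison:

$$\begin{aligned} A^{\mathrm{SKT}}_{ik}(u) &= \frac{\partial }{\partial u_k}\bigg (\sigma _i u_i + u_i\sum _{j=1}^n a_{ij}u_j\bigg ) = \delta _{ik}\bigg (\sigma _i + \sum _{j=1}^n a_{ij}u_j\bigg ) + a_{ik}u_i, \\ A^{\mathrm{CD}}_{ik}(u) &= \delta _{ik}\sigma _i + a_{ik}u_i, \\ A^{\mathrm{SKT}}(u) - A^{\mathrm{CD}}(u) &= {\text {diag}}\bigg (\sum _{j=1}^n a_{1j}u_j,\ldots ,\sum _{j=1}^n a_{nj}u_j\bigg ). \end{aligned}$$

Since \(a_{ij}\ge 0\) and \(u_j\ge 0\), the difference is a nonnegative diagonal matrix; it is exactly the term \({\text {div}}\sum _{j=1}^n a_{ij}u_j\nabla u_i\) by which (1) and (2) differ. For the three-species coefficients above, the additional diagonal entries are \(355(u_2+u_3)\), \(25(u_1+u_3)\), and \(355u_1\). This comparison concerns the PDE models only and does not by itself explain the behavior of the particle systems.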

Fig. 3

Three-species case: densities of particle system (5) corresponding to the SKT population model (left) and particle system (3) (right) at time \(t=2\). Solid blue line: species 1; dashed red line: species 2; dash-dotted black line: species 3

7.4 Cubic Nonlinearity

For our last experiment, we compare the numerical results for the cubic nonlinearity \(f(s)=s^3\) with the linear case \(f(s)=s\) used in the previous examples. The parameters are the same as in Sect. 7.2. The numerical simulations are performed without the approximating functions \(f_{\eta }\). This may be justified by the fact that the simulations deal with (relatively) small time scales and with compactly supported initial data. Figure 4 shows that the cubic nonlinearity causes more clustering than the linear case. The simulations suggest that, in the cubic case, diffusion happens on a faster time scale than segregation, while in the linear case, the particles diffuse more slowly and hence form fewer but bigger clusters.

Fig. 4

Densities of particle system (5) corresponding to the SKT population model with \(f(s)=s^3\) (left) and \(f(s)=s\) (right) at time \(t=2\). Solid blue line: species 1; dashed red line: species 2. The right panel is the same as Fig. 2 (bottom right) but plotted over the range \(-50\le x\le 50\)