Rigorous derivation of population cross-diffusion systems from moderately interacting particle systems

Population cross-diffusion systems of Shigesada-Kawasaki-Teramoto type are derived in a mean-field-type limit from stochastic, moderately interacting many-particle systems for multiple population species in the whole space. The diffusion term in the stochastic model depends nonlinearly on the interactions between the individuals, and the drift term is the gradient of the environmental potential. In the first step, the mean-field limit leads to an intermediate nonlocal model. The local cross-diffusion system is derived in the second step in a moderate scaling regime, when the interaction potentials approach the Dirac delta distribution. The global existence of strong solutions to the intermediate and the local diffusion systems is proved for sufficiently small initial data. Furthermore, numerical simulations on the particle level are presented.


Introduction
The aim of this paper is to derive the population cross-diffusion system of Shigesada, Kawasaki, and Teramoto [27] from a stochastic, moderately interacting particle system in a mean-field-type limit. More precisely, we derive the system of equations (1) ∂ t u i = div(u i ∇U i ) + ∆ σ i u i + u i n j=1 f (a ij u j ) , u i (0) = u 0,i in R d , t > 0, where i = 1, . . . , n is the species index, d ≥ 1 the space dimension, u = (u 1 , . . . , u n ) is the vector of population densities, and U i = U i (x) are given environmental potentials. The parameters σ i > 0 are the constant diffusion coefficients in the stochastic system, and a ij ≥ 0 are limiting values of the interaction potentials. In the linear case f (s) = s, we obtain the population model in [27]. System (1) with nonlinear functions f have also been studied in the mathematical literature; see, e.g., [6,9,18]. Such systems can be formally derived from random walks on a lattice, where the nonlinearity originates from the transition rates in the random-walk model [31,Appendix A]. Assuming that the transition rates depend in a nonlinear way on the densities leads to equations similar to (1). We assume that f is smooth but possibly not globally Lipschitz continuous (including power functions). Our results are valid for functions f i depending on the species type, but we choose the same function for all species to simplify the presentation. This paper extends the many-particle limit of [4] leading to the cross-diffusion system a ij u i ∇u j in R d , t > 0, i = 1, . . . , n, which differs from (1) by the drift term, the nonlinear function f , and the diffusion term div n j=1 a ij u j ∇u i . System (2) is the mean-field limit of the particle system for N individuals Y N,η k,i (0) = ξ k i , i = 1, . . . , n, k = 1, . . . , N, where (W k i (t)) t≥0 are d-dimensional Brownian motions and ξ 1 i , . . . , ξ N i are independent and identically distributed (iid) random variables with the common probability density function u 0,i . The functions are interaction potentials regularizing the delta distribution δ 0 , i.e. B η ij → a ij δ 0 as η → 0 in the sense of distributions.
System (1) is derived from an interacting particle system for n species with particle numbers N 1 , . . . , N n , moving in the whole space R d . To simplify, we set N = N i for all i = 1, . . . , n. The key idea of this paper is to consider interacting diffusion coefficients: (5) dX N,η k,i = −∇U i (X N,η k,i )dt + 2σ i + 2 where we set u η,j (X η k,i ) = u η,j (t, X η k,i (t)) for j = 1, . . . , n. The function u η,j satisfies the nonlocal cross-diffusion system (7) ∂ t u η,i = div(u η,i ∇U i ) + ∆ σ i u η,i + u η,i n j=1 f η (B η ij * u η,j ) , u η,i (0) = u 0 i in R d , i = 1, . . . , n, and will be later identified as the probability density function of X η k,i . Note that we consider N independent copies X η k,i , k = 1, . . . , N, and the intermediate system depends on k only through the initial datum.
Then, passing to the limit N → ∞, η → 0 in (5) leads to the macroscopic system X η k,i (0) = ξ k i , i = 1, . . . , n, k = 1, . . . , N, where the functions u i satisfy (1) and can be identified as the probability density functions of X k,i . In this limit, we assume that there exists δ > 0, depending on n, min i σ i , and T , such that (9) η −2(d+1+α) ≤ δ log N holds, where α ≥ 0 depends on the Lipschitz condition of f , see Assumption (A4) below, and that the function f and its dervatives or, alternatively the initial data, are sufficiently small (see Section 2 for details). The main result of the paper is the error estimate We prove this estimate for the potential U i (x) = − 1 2 |x| 2 , but more general functions are possible; see Remark 1. Note that estimate (10) implies propagation of chaos; see Remark 6. In the case α = 0, our scaling (9) for the multi-species case recovers the result in [15], where a single-species, moderately interacting particle system with interaction in the diffusion part was considered. Our strategy is similar to that one of [15] (and based on ideas of Oelschläger [24]). Since we allow for locally Lipschitz continuous nonlinearities only, we obtain a smaller convergence rate compared to [15], which in fact is natural, since we approximate the nonlinearity with functions having a Lipschitz constant of order η −α . A difference to [15] is that the authors assume that the diffusion matrix in the stochastic part is positive definite. We do not suppose such a condition, but we need a smallness condition on the nonlinearity for the existence proofs of systems (1) and (7).
Next, we present a brief overview on the existing literature concerning mean-field limits and moderately interacting many-particle limits in the context of diffusion equations. Mean-field limits from stochastic differential equations have been investigated since the 1980s; see the reviews [12,14] and the classical works by Sznitman [29,30]. Oelschläger proved that in the many-particle limit, weakly interacting stochastic particle systems converge to a deterministic nonlinear process [23]. Later, he generalized his approach for systems of reaction-diffusion equations [24] and porous-medium-type equations with quadratic diffusion [25], by using moderately interacting particle systems. We also refer to the recent work [5], which also includes numerical simulations. As already mentioned, moderate interactions in stochastic particle system with nonlinear diffusion coefficients were investigated for the first time in [15]. Later, Stevens derived the chemotaxis model from a many-particle system [28]. Further works concern the mean-field limit leading to reaction-diffusion equations with nonlocal terms [13], the hydrodynamic limit in a twocomponent system of Brownian motions to the cross-diffusion Maxwell-Stefan equations [26], and the large population limit of point measure-valued Markov processes to nonlocal Lotka-Volterra systems with cross diffusion [11]. The latter model is similar to the nonlocal system (7). The limit from the nonlocal to the local diffusion system was shown in [21] but only for triangular diffusion matrices. The many-particle limit from a particle system driven by Lévy noise to a fractional cross-diffusion system related to (2) was recently shown in [8]. Furthermore, the population system (1) was derived in [7] from a time-continuous Markov chain model using the BBGKY hierarchy. This paper presents, up to our knowledge, the first rigorous derivation of the Shigesada-Kawasaki-Teramoto (SKT) model (1) from a stochastic particle system in the moderate many-particle limit.
Porous-medium-type equations can be derived from stochastic interacting particle systems by assuming interactions in the drift term [10] or in the diffusion term [15]. We allow for interactions in the diffusion part but in a multi-species setting. The paper [11] is concerned with a multi-species framework too, but the authors assume bounded Lipschitz continuous interaction potentials and derive a nonlocal cross-diffusion system only. We are able to relax the assumptions and derive the local cross-diffusion system (1).
Compared to the work [7], we take the limits N → ∞, η → 0 simultaneously. However, our approach also implies the two-step limit. Indeed, we can first perform the limit N → ∞ for fixed η > 0 and afterwards the limit η → 0 on the PDE level; see Lemma 9 and Theorem 3. The simultaneous limit N → ∞, η → 0, satisfying the scaling relation (9), gives a more complete picture, since we can prove the convergence in expectation for the difference of the solutions to the stochastic systems (5) and (8).
Finally, we remark that the cross-diffusion models (1) and (2) have quite different structural properties; also see [2,3]. First, system (2) has a formal gradient-flow structure for each species separately, while system (1) can be written, under the detailed-balance condition [4], only in a vector-valued gradient-flow form. Second, the segregation behavior of both models is different, i.e., segregation is stronger for the solutions to (2) than for model (1); see the numerical experiments in Section 7.
The paper is organized as follows. We present our assumptions and main results in Section 2. The existence of smooth solutions to the cross-diffusion systems (1) and (7) and an error estimate for the difference of the corresponding solutions is proved in Sections 3 and 4, respectively. The proofs are based on Banach's fixed-point theorem and higher-order estimations. We present the full proof since the environmental potential U i (x) = − 1 2 |x| 2 is not square-integrable, which requires some care; see the arguments following (22). Section 5 is concerned with the identification of the solutions to the local and nonlocal cross-diffusion systems (1) and (7), respectively, with the probability density functions associated to the particle systems (8) and (6), respectively. Error estimate (10), the main result of the paper, is proved in Section 6. In Section 7, we present Monte-Carlo simulations for an Euler-Maruyama discretization of system (5) and compare them to the numerical results from the particle system associated to (2). In the appendix, we recall some inequalities used in the paper.

Assumptions and main results
We impose the following assumptions: (A1) Data: σ i ∈ (0, ∞) and ξ 1 i , . . . , ξ N i are independent and identically distributed (iid) square-integrable random variables with the common density function u 0,i for i = 1, . . . , n on the probability space (Ω, F , P ). (A2) Environmental potential: and the Lipschitz constant of f η is less than or equal to η −α for a fixed α ∈ [0, 1). Here, s > d/2 + 1 and a η → ∞ as η → 0. If f is globally Lipschitz continuous, we set α = 0 and f η = f . Remark 1 (Discussion). Environmental potential: The sign of U i guarantees that the populations are dispersed since the drift term becomes −x · ∇u i − u i . We have taken a quadratic potential U i to simplify the presentation. "Dispersive" potentials (i.e. potentials U i with ∆U i ≤ 0) are needed in the analysis, since we cannot bound terms including ∆U i if ∆U i ≥ 0. It is possible to choose general (dispersive) potentials U i ∈ C ∞ (R d ) such that ∇U i is globally Lipschitz continuous, D k U i ∈ L ∞ (R d ) for k = 2, . . . , s+2, the Hessian D 2 U i is negative semidefinite, ∆U i < 0, and D k U i for k = 3, . . . , s is sufficiently small in the L ∞ (R d ) norm. Thus, we may choose U i (x) = −|x| 2 + g(x) and g is a smooth perturbation.
Nonlinearity: Since f is not assumed to be globally Lipschitz continuous, we need to approximate the nonlinearity. The condition on the Lipschitz constant of f η ensures that we have a control on the growth of the Lipschitz constant of f η in the limit N → ∞ and η → 0. This growth condition is needed in the proof of Lemma 9; see (34) and thereafter. The condition s > d/2 + 1 ensures that the embedding H s (R d ) ֒→ W 1,∞ (R d ) is continuous, and this embedding is needed to obtain solutions in H s (R d ) and to derive the estimates.
We introduce some notation. We set and A = max i,j=1,...,n A ij . Let C s > 0 be the constant of the continuous embedding H s (R d ) ֒→ L ∞ (R d ) and set (11) Then, for small η > 0 such that a η ≥ 2AC s u 0 H s (R d ) , we have f η = f on I.
First, we ensure that the nonlocal and local cross-diffusion systems (7) and (1), respectively, have global smooth solutions.
Theorem 2 (Existence for the nonlocal system). Let Assumptions (A2) and (A4) hold, u 0 ∈ H s (R d ; R n ) for s > d/2 + 1, and let η > 0 be such that a η ≥ 2AC s u 0 H s (R d ) . There exists ε > 0 depending on u 0 such that if f C s+1 (I) ≤ ε, system (7) possesses a unique solution u η = (u η,1 , . . . , u η,n ) satisfying The dependence of ε on u 0 can be made more explicit. The proof shows that we need to choose 0 < ε < Cσ where C > 0 is independent of u 0 and σ i . Thus, if f C s+1 (I) is finite, the global existence result is valid for small initial data.
(ii) For each t > 0, the (nNd)-dimensional random variables X η (t) and X(t) possess density functions u η (t) ⊗N and u(t) ⊗N with respect to the Lebesgue measure on R nN d , respectively.
The proof follows from [16] and [22]. Indeed, Theorem 2.9 in [16, page 289] shows that there exist continuous square-integrable stochastic processes, which are strong solutions to (5), (6), and (8), respectively. Strong uniqueness is guaranteed by Theorem 2.5 in [16, page 287]. We conclude from [22, Theorem 2.3.1] that X η (t) and X(t) are absolutely continuous with respect to the Lebesgue measure and thus, they possess density functions u η (t, x) ⊗N and u(t, x) ⊗N , respectively. We prove in Section 5 that the density functions u η and u can be identified with u η and u, the solutions to (7) and (1), respectively.
The following theorem is our main result.
Theorem 5. Let X N,η k,i and X k,i be the solutions to (5) and (8), respectively. Then there exist parameters δ > 0, depending on n, σ min , and T , and ε > 0, depending on u 0 , such where α ≥ 0 is defined in Assumption (A4).
Remark 6. It is well-known that this result implies propagation of chaos in the singlespecies case; see, e.g., [14,Section 3.1]. In the multi-species case, this generalizes for fixed k to the convergence of the k-marginal distribution F k (t) of (X N,η j 1 ,i 1 (t), . . . , X N,η j k ,i k (t)) at any time t > 0 towards the product measure ⊗ k ℓ=1 u i ℓ (·, t) as N → ∞, η → 0, i.e.
where W 2 denotes the 2-Wasserstein distance.

Proof of Theorem 2
We prove the global existence of smooth solutions to the nonlocal system (7). Since η is fixed in the proof, we omit it for u η to simplify the notation. We split the proof in several steps. In the first step, we prove the existence of local-in-time solutions satisfying u i (t) H s (R d ) ≤ 2 u 0 H s (R d ) for 0 < t < T (η) for some (possibly) small T (η) > 0. Actually, we show in the second step, that the factor 2 can be replaced by one. This uniform estimate allows us in the third step to conclude the global existence.
Step 1: Local existence of solutions. In this step, the smallness conditions on η and f are not needed. The idea is to apply the Banach fixed-point theorem on the space where T > 0 will be determined later in this proof. We define the fixed-point operator We need to show that S is well defined. We infer from Young's convolution inequality (Lemma 11) and the embedding i.e., K i (v) is globally Lipschitz continuous. Therefore, a Galerkin argument to verify higher-order regularity shows that, for given v ∈ X T , there exists a unique solution (12). It remains to show that u = (u 1 , . . . , u n ) ∈ X T for some T > 0. The estimations are not difficult, but since ∇U i is not square integrable, some care is needed.
First, we prove higher-order estimates for K i (v). Let α ∈ N d 0 be a multi-index with order |α| = m ≤ s. By Lemma 13 and Young's convolution inequality, where here and in the following, C > 0, C(η) > 0, etc. are generic constants with values changing from line to line. In a similar way, applying Lemmas 11 and 12, since, according to Lemma 13, we can bound sup 0<t<T . We proceed with the proof of u ∈ X T for some T > 0. Applying D α to (12), multiplying the resulting equation by D α u i , and integrating over (0, τ ) × R d for τ < T yields where First, let |α| = m = 0. Then, integrating by parts in I 1 , using Young's inequality, and observing that U i (x) = − 1 2 |x| 2 , It follows from (13) that Inserting this estimate into (16) with α = 0 and applying the Gronwall inequality, we infer that This shows that . Now, let |α| = m ≥ 1. Then, integrating by parts, using ∆U i ≤ 0, and applying Young's inequality again, where we used the fact that D β ∇U i is bounded for |β| = 1 and vanishes for |β| > 1. It follows from integration by parts, K i (v) ≥ 0, and Lemma 14 that We infer from estimates (13) and (14) for K i (v) and the embedding H s (R d ) ֒→ W 1,∞ (R d ) that Finally, we use Lemma 12 and estimates (13) and (15) to obtain Inserting these estimates into (16) and summing over |α| ≤ s, we arrive at Summing over i = 1, . . . , n and applying Gronwall's inequality gives This shows that u ∈ X T , i.e., the operator is well-defined.
Next, we prove that S : X T → X T is a contraction. Let v, w ∈ X T and setv = S(v) andw = S(w). Taking the difference of equations (12) satisfied byv i andw i , respectively, using the test functionv i −w i , and integrating by parts, it follows that where Because of K i (v) ≥ 0 and estimate (13) for ∇K i (v), we find that, by Young's inequality, It follows again from Young's inequality that . We use the fact that f η and f ′ η are globally Lipschitz continuous: Inserting these inequalities into (18) and summarizing the estimates for I 4 , I 5 , and I 6 , we conclude from (17) and summation over i = 1, . . . , n that We apply Gronwall's inequality and the supremum over Step 2: A priori estimates. Let u = u η be the unique solution to (7). We know from for any 0 < t < T . Recall that T = T (η) and hence we do not have uniform estimates in η even for small T > 0 at this step. We show in this step the estimate , which allows us to conclude that the end time T can be arbitrary and actually does not depend on η.
We apply D α to (7) (with |α| = m ≤ s), multiply the resulting equation by D α u i , and integrate over (0, τ ) × R d for τ < T , similarly to the corresponding estimate in Step 1: where . First, let m = 0. Arguing similarly as for I 1 and I 2 , we find that I 7 ≤ 0 and . This gives for m = 0: From this point on, we will need the smallness condition on f η and f ′ η . Because of (21) From now on, we use f ≤ ε and |f ′ | ≤ ε on I for a small ε > 0. Thus, we have Inserting these estimates into (19), we conclude that Choosing ε > 0 sufficiently small, this gives an estimate for u i in L ∞ (0, T ; L 2 (R d )) ∩ L 2 (0, T ; H 1 (R d )).
Next, let m ≥ 1. The estimate for I 7 is delicate since ∇U i ∈ L 2 (R d ), and the corresponding estimate for I 1 cannot be directly used. We split I 7 into two parts: noting that the second terms in both integrals are the same (with different signs) because of Moreover, the last integral in (22) vanishes since ∆U i = −d. In the first integral of the right-hand side of (22), the first-order derivative of U i cancels, while the second-order derivative equals ∂ 2 U i /∂x j ∂x k = −δ jk and all higher-order derivatives of U i vanish. Then a straightforward computation leads to For the estimates of I 8 and I 9 , we need a smallness condition on f and its derivatives. We apply Young's inequality and Lemma 12 to estimate the (more delicate) term I 9 : Estimate (21) shows that f η = f and |f ′ | ≤ ε on I. Then, by similar arguments leading to (20), Moreover, using Lemma 13, the embedding H s (R d ) ֒→ W 1,∞ (R d ), and m ≤ s, , recalling definition (11) of the interval I. Consequently, the estimate for I 9 becomes The term I 8 is treated in a similar way, resulting in Set σ min = min i=1,...,n σ i > 0. We conclude from (19) after summation over |α| ≤ s and i = 1, . . . , n that Thus, for sufficiently small ε > 0, we arrive at the desired estimate uniform in η.
Step 3: Global existence and uniqueness. We have proved that

Proof of Theorem 3
We show the global existence of smooth solutions to the local system (1) and an error estimate for the difference of the solutions to (1) and (7), respectively. First, we prove that a solution u η to (7) converges to a solution u to (1) in a certain sense. Then we prove the error bound in Theorem 3 by estimating the difference u η − u. The key of the proof is the estimate of the difference f η (B η ij * u η,j ) − f η (a ij u η,j ). Step 1. Existence and uniqueness of solutions. Let u η be a smooth solution to (7) and where B R is a ball around the origin with radius R > 0. Then the weak formulation of (7) reads as where ·, · is the duality pairing between H −1 (R d ) and H 1 (R d ) and K i (u) = n j=1 f η (B η ij * u j ). We want to perform the limit η → 0. By the uniform estimate of Theorem 2, there exists a subsequence, which is not relabeled, such that u η ⇀ u weakly in L 2 (0, T ; H s+1 (R d )) and weakly* in L ∞ (0, T ; H s (R d )) ⊂ L ∞ (0, T ; L ∞ (R d )) as η → 0. Our aim is to prove that u is a weak solution to (1).

Because of
we obtain a uniform bound for ∂ t u η,i in L 2 (0, T ; H −1 (B R )) (the bound might depend on R). In particular, up to a subsequence, as η → 0, Since u η is uniformly bounded in L 2 (0, T ; H 1 (B R )), the Aubin-Lions lemma implies the existence of a subsequence (not relabeled) such that u η,i → u i strongly in L 2 (0, T ; L 2 (B R )).
We use the Lipschitz continuity of f = f η on I to infer that ) → 0. This shows the claim. In a similar way, it follows from the Lipschitz continuity of ). The previous convergences allow us to perform the limit η → 0 in (23), leading to where F i (u) = u i (σ i + n j=1 f (a ij u j )). Moreover, u i (0) = u 0,i in B R for any R > 0. Thus, u is a weak solution to (1). Standard estimates show that u is the unique solution, again choosing ε > 0 sufficiently small.
Step 2: Convergence rate. We take the difference of (7) and (1), multiply the resulting equation by u η,i − u i , integrate over (0, τ ) × R d for any τ > 0, and integrate by parts: The first integral on the right-hand side is nonpositive since ∆U i = −d. We split the second integral into three parts: We start with the estimate of J 1 . The families (B η ij * u η,j ) and (B η ij * ∇u η,j ) are bounded in L ∞ (0, T ; L ∞ (R d )). Using f η L ∞ (I) = f L ∞ (I) ≤ ε and Young's inequality, we have Next, we estimate J 2 = J 21 + J 22 , where It follows that Since both B η ij * u η,j and u η,j are uniformly bounded in L ∞ (0, T ; L ∞ (R d )), we can choose η > 0 sufficiently small such that f = f η on I. On that interval, f is Lipschitz continuous uniformly in η. We use this information in By duality, we find that We infer from the uniform boundedness of B η ij * u η,j in L ∞ (0, T ; L ∞ (R d )) and the fact that f ′ η = f ′ on I for sufficiently small η > 0 that where we estimated the difference B η ij * ∇u η,j − a ij ∇u η,j similarly as for J 21 . Furthermore, the Lipschitz continuity of f ′ η = f ′ on I leads to Summarizing these estimates, we infer that and combining the estimate for J 21 and J 22 , It remains to estimate J 3 = J 31 + J 32 , where Similar arguments as above yield The second term J 32 is again split into two parts, J 32 = J 321 + J 322 , where Using the Lipschitz continuity again, f ′ η = f ′ on I, and |f ′ | ≤ ε, we deduce that This shows that Summarizing the estimate for J 31 and J 32 , we arrive at Finally, putting together the estimates (26), (27), and (28), we infer from (25) that This is the desired estimate for the last integral in (24). We conclude for sufficiently small ε > 0 and after summation over i = 1, . . . , n that The proof ends after applying Gronwall's inequality.

Links between the SDEs and PDEs
We show that the density function u from Proposition 4 coincides with the unique weak solution u to (1). Theorem 7. Let the assumptions of Theorem 3 hold. Let X i for i = 1, . . . , n be the squareintegrable process solving (8) with density function u i and let u i be the unique weak solution to (1). Then u = ( u 1 , . . . , u n ) solves the linear equation in the weak integrable sense, i.e.
and t > 0, where we assume that the initial datum u i (0) = u 0,i fulfils (30) is fulfilled for u i instead of u 0,i for almost all t > 0 and all i = 1, . . . , n.
Proof. Since X k,i depends on k only via the initial data ξ k i with the same law u 0,i , we can omit the index k. Let φ ∈ C ∞ 0 ([0, ∞) × R d ) and set F i (u) = σ i + n j=1 f (a ij u j ). By Itô's lemma, we obtain 1/2 ∇φ(s, X(s)) · dW i (s). (31) We claim that the density function u i : [0, ∞) → P 2 (R d ), where P 2 (R d ) is the space of all density functions with finite second moment, is continuous with respect to the 2-Wasserstein distance W 2 . Indeed, since X i is square-integrable, we have u i (t) ∈ P 2 (R d ) for almost all t > 0 and the limit s → t in the Wasserstein distance leads to using the facts that X i is continuous in time and has bounded second moments. This shows the claim. We conclude that the point evaluation u i (t) is well defined.
The previous argumentation shows that we can apply the expectation to (31) to obtain This is the very weak formulation of (29), showing the first part of the theorem. Next, we verify that the solution to (29) is unique. More precisely, we take u 0 = 0 and show that u i (t) = 0 for almost all t > 0. The statement is usually proved by a duality argument. However, the coefficients of the dual problem associated to (29) are not regular enough such that we need to regularize it. As the proof is rather standard but tedious, we only sketch the arguments. Let χ k be a family of mollifiers and consider the regularized dual backward problem on the ball B R around the origin with radius R > 0: We extend the unique smooth solution w k,R to the whole space by setting w k,R = 0 on R d \ B R . Since the extension may be not smooth, we choose a cut-off function ψ R ∈ C ∞ (R d ) and use w k,R ψ R as an admissible test function in the very weak formulation of (29). Standard estimations give bounds for w k,R uniform in k and R. Then, passing to the limit k → ∞, R → ∞ in the weak formulation shows that R d g(x) u i (s, x)dx = 0, and since g was arbitrary, we conclude that u i (s) = 0 for 0 < s < t.
The weak solution u to (1) is also a very weak solution to (29). Therefore, by the previous uniqueness result, u = u.
Similar arguments lead to the following result that relates the solutions u η and u η .

Proof of Theorem 5
The proof is split into two parts. We estimate first the square mean error of the difference X N,η k,i − X η k,i , where X η k,i is the solution to the intermediate system (6). In fact, this error bound is a generalization of a result due to [30]. Essential for this step are the facts that the Lipschitz constant of B η ij is of order η −d−1 , while the Lipschitz constant of f η is of order η −α . Second, we estimate the square mean error of the difference X η k,i − X k,i , based on an estimate of f η (B η ij * u j ) − f η (a ij u j ) in L 2 , which is of the order of η 1−α .
Lemma 9. Let X N,η k,i and X η k,i be the solutions to (5) and (8), respectively, in the sense of Proposition 4. Under the assumptions of Theorem 5, there exists δ > 0, depending on n, σ min , and T , such that if where C(T, n, σ min ) > 0 is a positive constant.
Proof. The process D N,η k,i : We use the global Lipschitz continuity of ∇U i and the Fubini theorem to estimate the first term: Summing over i = 1, . . . , n and taking the supremum over k = 1, . . . , N leads to Next, we apply the Burkholder-Davis-Gundy inequality [16,Theorem 3.28] to the second term E 2,i and use the Lipschitz continuity of x → (2σ i + x) 1/2 for x ≥ 0: where We estimate these three terms separately. By construction, the Lipschitz constant of f η can be estimated by . Therefore, by Fubini's theorem, We can estimate the second term L 2 j (t) in a similar way, leading to The third term L 3 j (t) has to be treated in a different way. First, we use the Lipschitz continuity of f η to find that

This implies that
It remains to estimate the expectation. To this end, we introduce The processes X η k,i and X η ℓ,j are independent, since for i = j, we are considering N independent copies of the same process and for i = j, the equation fulfilled by X η k,i does not depend on the process X η ℓ,j . If (k, i) = (ℓ, j), (k, i) = (m, j), and ℓ = m, the processes D (k,i),(ℓ,j) (t) and D (k,i),(m,j) (t) are orthogonal, since Together with E(D (k,i),(ℓ,j) ) = 0, this shows that the processes D (k,i),(ℓ,j) are uncorrelated. However, if (k, i) = (ℓ, j), (k, i) = (m, j), and ℓ = m, the expectation does not vanish: This expression is independent of the particle index k and ℓ, it depends only on the species numbers i and j. The case (k, i) = (ℓ, j) can be treated in a similar way with the difference that, since , we obtain for E(D (k,i),(k,i) (t)D (k,i),(m,j) (t)) an additional term of order η −2d . Hence, we infer from (37) and the previous computation that We infer from (32), estimate (33), and the previous estimate for E 2,i that Note that the function S is continuous because of the continuity of the paths of X N,η k,i and X η k,i . Therefore, by Gronwall's inequality, we have S(T ) ≤ C(T, n) Nη 2(d+α) exp C(n, T , σ min )η −2(d+1+α) T .
Next, we prove an error estimate for the difference X Proof. Since we are considering N independent copies, we can omit the particle index k.
We infer from the Lipschitz continuity of ∇U i and Fubini's theorem that Similarly as in the proof of Lemma 9, we use for D 2 the Burkholder-Davis-Gundy inequality and the Lipschitz continuity of x → (2σ where The first expression D 21 vanishes if η > 0 is sufficiently small, since then f = f η on the range of a ij u j ( X i ). Using a ij u j − B η ij * u j L 2 (0,T ;L 2 (R d )) ≤ Cη ∇u j L 2 (0,T ;L 2 (R d )) ≤ Cη, which was shown in the proof of Theorem 3, and the Lipschitz continuity of f η with Lipschitz constant less or equal η −α , we find that Thanks to the uniform boundedness of the family B η ij * u j , we can choose η > 0 sufficiently small, say η ≤ η * for some η * > 0, such that f (B η ij * u j ) = f η (B η ij * u j ) for 0 < η ≤ η * . Then, using Young's convolution inequality and the uniform estimate ∇u j L ∞ (0,T ;L ∞ (R d )) ≤ C u 0 H s (R d ) from Theorem 3, the third term D 23 is estimated as Finally, it follows from the error estimate for u − u η from Theorem 3 that Inserting the estimates for D 21 , . . . , D 24 into (40), we conclude that We choose δ > 0 such that −δ −1 + C 2 < 0 and observe that exponential decay is always faster than algebraic decay to conclude that exp( finishing the proof.

Numerical tests
In this section, we perform some numerical simulations of the particle system (5) in one space dimension, without environmental potential, and with linear function f (x) = x. We are interested in the numerical comparison of the solutions to the particle systems (3) and (5) in terms of the segregation behavior. We explore the ability of both systems to model the segregation of the species. Numerical tests for the associated cross-diffusion systems (1) and (2) are work in progress.
We discretize the particle systems (3) and (5) by the Euler-Maruyama scheme. Let M ∈ N and introduce the time steps 0 < t 1 < · · · < t M = T with △t m = t m+1 − t m . We approximate X N,η k,i (t m ) by x k,i m and Y N,η k,i (t m ) by y k,i m , defined by, respectively, with initial conditions x i,k 0 = ξ k i and y i,k 0 = ξ k i , where ξ k i are iid random variables and w m and z m are normally distributed. It is well known that the solutions to the Euler-Maruyama scheme converge to the associated stochastic processes in the strong sense; see, e.g., [17,Theorem 9.6.2].
7.1. Two species: nonsymmetric case. We consider a nonsymmetric diffusion matrix with a 11 = 0, a 12 = 355, a 21 = 25, a 22 = 0, and σ 1 = 1, σ 2 = 2. The initial data are Gaussian distributions with mean −1 (for species i = 1) and 1 (for species i = 2) and variance 2. Figure 1 shows the approximate densities of both species (histogram) for systems (5) and (3) at time t = 2. We observe a segregation of the densities in both models. In the population system (5), species 1 develops two clusters because of the very different "population pressure" parameters a 12 = 355 and a 21 = 25, while species 2 develops only one cluster around x = 0; see Figure 1 left. The segregation effect is stronger in the particle system (3) in the sense that both species avoid each other as far as possible; see Figure  1 right. This is not surprising since the diffusion of system (5) is generally larger than that one of system (3). The numerical results confirm the segregation property defined in [1]. Indeed, this work considers the cross-diffusion system (3) with σ 1 = σ 2 = 0 and a 11 = a 12 = a 21 = a 22 = 1. It was proved that the two species are segregated for all times if they do so initially. Here, segregation means that the intersection of the supports of the densities is empty. 7.2. Two species: symmetric case. We investigate the symmetric case by choosing a 11 = a 22 = 0, a 12 = a 21 = 355, and, as before, σ 1 = 1, σ 2 = 2. The initial data are chosen as in the previous example. In this example, we expect that cross-diffusion dominates selfdiffusion. We present the approximate densities for different times in Figure 2. In both models, the species have the tendency to segregate. As expected, the segregation in the particle system (3) is stronger than in system (5) corresponding to the SKT model. Similar as in the two-species case, the initial data are overlapping normal distributions with means −1, 2, and −3, respectively, and variance 2. The approximate densities at t = 2 are shown in Figure 3. We observe that the approximate densities of particle model (3) show a much clearer component-wise segregation behavior than the stochastic particle model (5), which corresponds to the SKT system, where the diffusion effects are much stronger. This may be explained by the fact that, on the PDE level, the gradient-flow structure of model (2) can be written species-wise, whereas the SKT model (1) (with f (x) = x) only posseses a vector-valued gradient-flow structure.  Figure 3. Three-species case: Densities of particle system (5) corresponding to the SKT population model (left) and particle system (3)  The parameters are the same as in Section 7.2. The numerical simulations are performed without using approximating functions f η . This may be justified by the fact that the simulations deal with (relatively) small time scales and with compactly supported initial data. We observe in Figure 4 that the cubic nonlinearity causes more clustering than the linear case f (s) = s. The simulations suggests that in the cubic case, diffusion happens on a faster time scale than segregation, while in the linear case, the particles diffuse slower and hence they form bigger but fewer clusters.

Appendix A. Auxiliary results
For the convenience of the reader, we recall some well-known estimates used in this paper.