Nonlocal approximation of nonlinear diffusion equations

We show that degenerate nonlinear diffusion equations can be asymptotically obtained as a limit from a class of nonlocal partial differential equations. The nonlocal equations are obtained as gradient flows of interaction-like energies approximating the internal energy. We construct weak solutions as the limit of a (sub)sequence of weak measure solutions by using the Jordan-Kinderlehrer-Otto scheme from the context of $2$-Wasserstein gradient flows. Our strategy allows to cover the porous medium equation, for the general slow diffusion case, extending previous results in the literature. As a byproduct of our analysis, we provide a qualitative particle approximation.


Introduction
Nonlinear diffusion equations are ubiquitous in several real world applications. They were introduced to analyse gas expansion in a porous medium, groundwater infiltration, and heat conduction in plasmas, to name a few applications in physics. These applications drove the first rigorous mathematical results by Zel'dovich and Kompaneets in [62] and Barenblatt in [2] regarding important particular weak solutions of nonlinear diffusion equations with homogeneous nonlinearity. The general filtration equation was then first developed in [38]. The use of these equations in oil recovery software is extensive nowadays. Another source of applications of this family of equations arises from population models in mathematical biology: ecological models [4,56,6] derived from probabilistic interpretations [44,34], volume effect in Keller-Segel type models [48,35,8], volume exclusion in cell-cell adhesion models [22,11], and many others.
Although a rigorous mathematical theory has been extensively provided over the years [58,47,17], there are particular aspects of renewed interest in view of novel applications as well as advances in mathematics. For instance, their derivation from interacting particles, with a distinction between deterministic and stochastic methods, has recently attracted attention for its implications in derivation of models in mathematical biology [22] and data science [27]. We take advantage of the gradient flow structure of nonlinear diffusions [47] to connect with nonlocal interaction equations. In fact, we rigorously derive particle approximations of nonlinear diffusions from these variational considerations by approximating their energy functional completing the approach started in [10].
For ease of presentation, let us focus on more standard diffusion equations. Let m ≥ 1 and consider the equation which is better known as the heat equation for m = 1, or the porous medium equation (PME) in the case m > 1. A comprehensive study of the above PDE can be found, e.g., in the book of Vázquez, [58]. Owing to the advances in optimal transport theory, [60,61,54], starting from the seminal works of Jordan, Kinderlehrer, and Otto, [37,47], such diffusion equations are known to be 2-Wasserstein gradient flows for a specific choice of the energy functional. More precisely, the previous equation can be written as (1.1) being δHm δρ the first variation of the energy functional In [37] the equation of interest was the linear Fokker-Planck equation, while Otto focused on the porous medium equation in [47]. Afterwards, a 2-Wasserstein gradient flow approach has been extended to other PDEs, in particular those modelling nonlocal interaction, [19,20,1,14]. The latter equation is of the form (1.1) with v = −∇ δW δρ and Recent works in the literature show a rigorous and fascinating connection between the two energies above for m > 1 in (1.2) and the corresponding dynamics, by means of gradient flow techniques, c.f. [10,7]. More precisely, exploiting the so-called blob method developed in [26], one can notice already at a formal level that an appropriate regularisation of H m transforms a diffusion equation (which is local) into an interaction PDE (which is nonlocal) by choosing a delocalising kernel. For simplicity, let m = 2 and consider a standard family of non-negative radial mollifiers V ε (x) = ε −d V 1 (x/ε) for ε > 0 on R d . Using the commutativity of convolution with even functions such as V ε , it is indeed not difficult to see by setting W ε := V ε * V ε . This observation sheds light on the aforementioned link between local and nonlocal PDEs. As a natural byproduct such a connection provides a rigorous particle approximation for a class of nonlinear diffusion equations. More precisely, this hinges on deterministic approaches for nonlocal interaction equations, since, particles are solutions, i.e. the following empirical measure ρ N t is a weak solution of (1.1) with v = −∇ δW δρ and W as in (1.3) where, for any i = 1, . . . , N , X i (t) solves the ODĖ Further details on this aspect can be found, e.g., in [14,9], and in [32,31] in case of systems of nonlocal PDEs. This structure is advantageous for the computational approximation of continuous solutions to (1.1). The main issue when diffusion is present is that particles do not remain particles. Indeed, if the initial datum is a Dirac delta, we have an immediate smoothing effect, excluding measure solutions. However, numerical evidence of these deterministic particle methods [10] show that this can be achieved, as we shall see later on.
In this manuscript, we consider a general class of internal energy functionals F : P 2 (R d ) → (−∞, +∞] given by where we identify the measure ρ with its density ρ(x) if it is absolutely continuous with respect to Lebesgue measure and P 2 (R d ) denotes the set of probability measures with finite second order moment. We define the regularised internal energy functional F ε : P 2 (R d ) → (−∞, +∞] given by which gives rise to a class of nonlocal PDEs The functional F includes H m , but it is not limited to it, c.f. Section 2. The reader is invited to verify which motivates the consideration of (NLE) as the 2-Wasserstein gradient flow of F ε . Following the strategy proposed in [7], defining the pressure by P (x) := xF ′ (x) − F (x), as in [42,19,1], we construct weak solutions of the nonlinear diffusion equation as a limit of a sequence of weak measure solutions of (NLE), in case F behaves like power laws of porous medium type, for m > 1.
The blob method for diffusion was first introduced in [10] for diffusion equations with the addition of local and nonlocal drifts. Let us mention that a similar approach was used on the previous work [26] approximating nonlocal equations with singular kernels by smooth kernels. The authors in [10] consider a slightly different regularisation of the internal energy which is better for numerical purposes, see [10,Eq. (6)]. Despite this difference, the gradient flow perspective remains at the forefront of their and our present work. The corresponding nonlocal gradient flow is indeed different from (NLE), c.f. [10,Eq. (8)], but it coincides with ours in case m = 2 for the energy H 2 . In [10], Γ-convergence of the regularised energy, as well as that of minimisers is proven for m ≥ 1. The authors show that stability of gradient flows in the ε → 0 can be established for m ≥ 2 using the framework introduced by Sandier and Serfaty in [53,55] and the concept of λ-gradient flows developed in [1]. This strategy requires to verify additional assumptions which are only known to hold in the case m = 2 for an initial datum with finite second order moment and log-entropy, i.e. H 1 [ρ 0 ] < ∞. The result for m = 2 was previously proven in [40], however on a bounded domain with periodic boundary conditions. The blob method in [10] is a deterministic particle method for linear and nonlinear diffusion on R d . Numerical simulations in [10, Section 6] suggest that the particle approximation remains valid even when H 1 [ρ 0 ] = ∞. Relaxing the condition H 1 [ρ 0 ] < ∞ and rigorously proving a quantitative particle approximation is still an open problem, and left for future research.
In the case m = 2, in the same spirit of [10,27], the authors in [7] construct weak solutions of the quadratic porous medium equation as a localising limit (ε → 0) of a sequence of weak measure solutions of the nonlocal interaction equation (NLE), for F (x) = x 2 . The authors work directly at the level of the (nonlocal) equations by means of a time-discretisation scheme which allows to work with lack of convexity, as for instance in the case of cross-diffusion systems, or even PDEs with no purely gradient flow structure. As in [10], finite initial log-entropy is required, thus excluding particle approximation. However, simultaneously to [7], the authors in [27] focus on a weighted (quadratic) porous medium equation which is relevant, e.g. in sampling -the weight,ρ in their notations, represents a target probability measure to be approximated from specific samples drawn from it. The blob method is indeed useful to develop a deterministic particle approximation for the weighted porous medium equation, and, as a byproduct, it provides a way to quantize a targetρ in the long-time behaviour. We stress that also in this work it is essential to assume H 1 [ρ 0 ] < ∞, however using again λ-convexity of the regularised energy one can achieve a rigorous particle approximation as consequence of λ-stability (or contractivity) of Wasserstein gradient flows, as in [1]. This means one can achieve, so far, a qualitative result, as the initial datum needs to be approximated fast enough, c.f. [27,Theorem 1.4]. To the best of our knowledge, a quantitative result has not been achieved yet in more than one dimension. Still in one space dimension, the authors in [29] introduce a deterministic particle approximation for aggregation-diffusion equations, including the porous medium equation for the subquadratic (1 < m < 2) and superquadratic (m > 2) cases. This approach, however, is limited to one space dimension. All the previous three works do not make use of gradient flow techniques. Indeed, other attempts for a particle method have been proposed in the literature. Let us mention two simultaneous numerical methods for linear diffusion (m = 1) [30,52]. In one dimension and for nonlinear diffusions, there are other numerical methods based on the PDE satisfied by the transporting maps, see [36,21,23]. A nice survey of most of the available numerical methods for these families of equations can be found in [18].
Further related to particle methods, we mention the seminal paper by Oelschläger, [45], where a stochastic particle approximation is proven for classical and positive solutions of the quadratic porous medium equation in R d , and for weak solutions in one dimension, and the recent results in [24] for systems. In [34] very weak solutions of the viscous porous medium equation (m > 1) are studied as a limit of a sequence of distributions of the solutions to nonlinear stochastic differential equations generalising previous results [46,43,34]. In [49] strong L 1 -solutions, c.f. [57], of the quadratic porous medium equation are derived from a stochastic mean field interacting particle system with the addition of a vanishing Brownian motion.
Our strategy is different from the aforementioned stochastic approaches as it is based on an optimal transport approach avoiding the addition of higher regularity induced by the (vanishing) viscosity method. We consider a time-discretisation of (NLE) à la Jordan-Kinderlehrer-Otto (JKO), c.f. [37]. This method provides uniform bounds on the approximating sequence in terms of the associated energy and second order moments. Although the sequence solving (NLE) is only a measure, we are able to prove strong L m -compactness of a smoother sequence of solutions for the ε → 0 limit by using the so-called flow interchange technique, c.f. [41]. More precisely, one of our main contributions is to construct weak solutions of (PME) as a subsequential ε → 0 limit of weak measure solutions to for all m > 1. The same result is proven also for (NLE) and (DE). In particular, this extends [7] to the case m > 2, which is not trivial in view of the nonlinearities involved, and to a class of general nonlinear diffusion function. In [10], their gradient flow convergence result for m > 2 was conditional on a uniform BV bound for ρ ε while we make no such assumptions here. Furthermore, we are also able to treat the case 1 < m < 2 which is more challenging due to the lack of regularity at zero. As a byproduct of our analysis we obtain an existence result for nonlocal diffusion equations related to a nonlocal internal energy functional. In particular, we are able to construct weak solutions to (NLE-m) via the JKO scheme for m > 1. While this may not be surprising, this is the first result in this direction to the best of our knowledge. We also provide a particle approximation for (NLE) in case F behaves like power laws, for m > 1. This result is purely qualitative, and quantitative estimates are not proven. Finally, we stress that the strategy we use to construct weak solutions does not require convexity of the internal energy, thus allowing to extend this method to non-convex energies, e.g. nonlinear cross-diffusion systems, see [7]. We leave the extension to systems for a future work as it deserves a deeper analysis.
The case m = 1, i.e. linear diffusion, is not completely covered in our theory, due to the lack of control on the compactness near the logarithmic singularity in the gradient flow approach. More precisely, our strategy does provide an approximating scheme, validated numerically in [10], but we are not able at this stage to identify the limit as solution of the heat equation. Indeed, the logarithmic singularity cannot be coped with for the case m = 1 when the mollifier V 1 is compactly supported. This is indeed one of the reasons we did not assume V 1 is compactly supported in the case m ≥ 2. Similar difficulties are found for the Landau equation [39] in plasma physics, for which efficient deterministic particle methods preserving all the properties of the Landau equation at the discrete level were introduced in [16] using the same strategy as in this work. Moreover such an approximated Landau equation has been analytically studied in [12,13] showing the existence of solutions for the approximated problems where V 1 (x) = e −|x| with an appropriate mollification at the origin. The particular non-compactly supported kernel is crucial in the detailed estimates performed in [12]. Dealing with the logarithmic singularity in these problems is a challenging open problem.
Structure of the manuscript. Section 2 sets the assumptions, notations, and definitions we use in this paper. At the end of Section 2, we state the precise results obtained once the appropriate notions of solutions are introduced. Section 3 focuses on the construction of weak solutions ρ ε to (NLE) (c.f. Theorem 2.1) based on the JKO scheme [37]. Section 4 discusses the strong compactness criteria used to construct a limit ρ (which is the candidate weak solution to (DE)) from the sequence ρ ε . Section 5 verifies that the limit ρ is a weak solution to (DE) (c.f. Theorem 2.2) by passing to the limit ε → 0 from (NLE). In Section 6, we sketch the ideas behind the proofs of Theorem 2.3, which gives conditions for uniqueness of solutions to (NLE), and Corollary 2.1, which provides a particle approximation to (DE). Finally, Section A collects various technical results which, possibly with minor adaptations, already exist in the literature.

Preliminaries and results
The mollifying sequence is generated by V ε (x) = ε −d V 1 (x/ε) for ε > 0. We assume that the generating function V 1 satisfies Depending on the results we prove, we assume the function F : [0, +∞) → (−∞, +∞] satisfies some combination of the following assumptions: (F1) F is a proper, convex, and lower semicontinuous function such that Remark 2.1 (Comments on the assumptions). (F1) is lifted directly from [1, Example 9.3.6] so that F enjoys certain properties; it is well-defined and the associated JKO scheme is well-posed c.f. [37]. 5 For the reader's convenience, we observe the condition lim inf s↓0 ensures (c.f. [1, Remark 9.3.7] and Lemma A.1) that F − (ρ) ∈ L 1 (R d ) whenever ρ ∈ P 2 (R d ) is absolutely continuous with respect to Lebesgue measure. In particular, on any sublevel subset {ρ ∈ P 2 (R d ) | m 2 (ρ) ≤ C}, the functional F is uniformly bounded below. The superlinear growth (c.f. [1,Remark 9.3.8]) and convexity ensure that F is lower semicontinuous in P 1 (R d ).
Assumption (F3) mainly refers to energies lacking regularity at the origin as in the case F (x) = x log x. We stress that (F2) is used to construct solutions to (NLE) in Section 5, however it is not used to derive the compactness estimates in Section 4. Conversely, (F3) is used for the compactness estimates in Section 4 but is not assumed to construct solutions to (NLE). Throughout the manuscript we will denote by P(R d ) the set of probability measures on R d , for d ∈ N, and by P p (R d ) := {ρ ∈ P(R d ) : m p (ρ) < +∞}, being m p (ρ) := R d |x| p dρ(x) the p th -order moment of ρ, for 1 ≤ p < ∞. We shall use P a p (R d ) for elements in P p (R d ) which are absolutely continuous with respect to the Lebesgue measure. For p = 2, the 2-Wasserstein distance between where Γ(µ 1 , µ 2 ) is the class of all transport plans between µ 1 and µ 2 , that is the class of measures γ ∈ P(R 2d ) such that, denoting by π i the projection operator on the i-th component of the product space, the marginality condition is satisfied. In the expression above, marginals are the push-forward of γ through π i . For a measure ρ ∈ P(R d ) and a Borel map T : R d → R n , n ∈ N, the push-forward of ρ through T is defined by for all Borel functions f on R n .
Setting Γ 0 (µ 1 , µ 2 ) as the class of optimal plans, i.e. minimizers of (2.1), the 2-Wasserstein distance can be written as We denote the 1-Wasserstein distance with d 1 and it is defined by We refer the reader to [1,61,54] for further details on optimal transport theory and Wasserstein spaces. Below we recall the concepts of solutions used throughout the manuscript, distinguishing between measure and weak solutions.
, is a weak measure solution to (NLE) if, for every ϕ ∈ C 1 c (R d ) and any t ∈ [0, T ], it holds Definition 2.2 (Weak measure solution to (NLE-m)). An absolutely continuous curve ρ ε : Remark 2.2. By considering fixed ε > 0 and the corresponding scaling for V 1 satisfying (V), the driving velocity field satisfies (2.5) [1, Lemma 8.2.1] provides the existence of a continuous representative for distributional solutions of continuity equations with velocity fields in L 1 ([0, T ]; L 1 (ρ t )). This justifies Definition 2.2 in the sense that the right-hand side of (2.4) is well-defined. Note that a similar computation holds true for the velocity field in (NLE) by applying Lemma A.3, thus justifying Definition 2.1 in the sense that the right-hand side of (2.3) is well-defined.
is an absolutely continuous curve ρ ∈ C([0, T ]; P 2 (R d )) satisfying the following properties: (1) for almost every t ∈ [0, T ] the measure ρ(t) has a density with respect to the Lebesgue measure, still denoted by ρ(t), such that ρ ∈ L ∞ ([0, T ]; L m (R d )) and ∇ρ Remark 2.3. For the sake of clarity we point out the weak solution we obtain initially is The chain rule in Sobolev spaces gives sense to ∇ρ m in L 1 (R d ), hence the more standard concept of weak solution for porous medium equation. A further application of the chain rule identifies ∇ρ m = m m−1 ρ∇ρ m−1 , for m ≥ 2; the same result, however, does not hold in the case 1 < m < 2.
Further details are provided in the proof of Theorem 2.2 in Section 5. Finally, the last condition in Definition 2.3 is a consequence of uniqueness of very weak solutions, cf. [28], and the theory in [1].
Equally, the same concept is extended to general diffusion equations.
) satisfying the following properties: (1) for almost every t ∈ [0, T ] the measure ρ(t) has a density with respect to the Lebesgue measure, still denoted by With the previous definitions, we are ready to state the results of this manuscript.
Theorem 2.1 (Existence for (NLE)). Fix ε > 0 and let V 1 , the generator of the mollifying sequence In the case 1 < m < 2, assume further that suppV 1 ⊂ B R for some R > 0. Let ρ ε be a sequence of weak measure solutions to (NLE) from Theorem 2.1 with initial condition ρ ε (0) = ρ 0 . Then, the sequence ρ ε converges narrowly to the unique weak solution ρ of (DE) as ε ↓ 0.
At first glance, our compactness estimates only show that a subsequence of ρ ε converges narrowly to ρ. However, we appeal to [5,28,58] which imply that weak solutions to (DE) and (PME) are unique. Hence, the entire sequence converges.
In Theorem 2.1, the construction of weak measure solutions ρ ε to (NLE) leverages the JKO scheme [37]. Just at the level of the JKO scheme, only for any m ≥ 1 are admissible. In fact, assumption (F2) enters only when verifying ρ ε is a weak measure solution of (NLE) (c.f. Section 3). This excludes F (x) = x log x, but all the power laws for m > 1 are permitted in this consistency result. Moreover, the assumption that (F m ) holds for some m > 1 in Theorem 2.2 is only used to verify that the limit ρ is a weak solution to (DE). On the other hand, the construction of the limit ρ from the sequence ρ ε allows to relax assumption (F m ) to any m ≥ 1 provided the initial condition ρ ε (0) = ρ 0 belongs in L m ∩ L log L (c.f. Section 4), thus including all of the regularised Rényi entropies H ε m . To summarise in the specific case of F ε = H ε m as the regularised energy, the construction of curves ρ ε and ρ without consideration of the respective equations (NLE-m) and (PME) can be done for any m ≥ 1. However, our technique requires m > 1 to verify that ρ ε is a weak measure solution of (NLE-m). Moreover, when 1 < m < 2, we insist that the generator, V 1 of the mollifying sequence, satisfies (V) and has compact support (in the case m ≥ 2 only (V) is required). It is certainly interesting to investigate how we can close this gap to m = 1 and we leave this direction for future research.
In Theorem 2.2 we prove that the solutions ρ ε to (NLE) coming from the construction in Theorem 2.1 converge to ρ, the unique weak solution of (DE). It is natural to ask whether other solutions ρ ε to (NLE) (not necessarily those constructed via the JKO scheme c.f. Section 3) also converge to ρ. Actually, under additional assumptions on the nonlinearity F and the mollifier V , the sequence ρ ε is unique.
Then, the weak measure solution ρ ε in Theorem 2.1 is unique among absolutely continuous curves ρ : The following concluding result is completely analogous to Theorem 1.2 of [27].
is a weak solution to (NLE) provided the particles satisfy the following ODE systeṁ Then ρ N ε (t) converges narrowly to a weak solution of (DE), ρ(t), for any t ∈ [0, T ].
In view of Corollary 2.1 and [27], if the initial distribution of particles x i ε is cleverly chosen (so that d W (ρ N ε (0), ρ(0)) = O(1/N )), then one can take N = o e −1/ε 2+d(m−1) to fulfill the hypothesis on the initial condition. However, it was also suggested in [26,27] by numerical evidence that a much smaller number of particles N ∼ ε −1.01 for m = 2 in one dimension still yields good accuracy. Bridging this gap between theory and practice is left for future investigation.

Results on the nonlocal equation
In this section we focus on (NLE). We show existence of weak measure solutions by means of the JKO scheme [37] which is needed to derive uniform bounds for the nonlocal-to-local limit proven in Section 5. Although this is not the main purpose of the paper, and it may be unsurprising, this is indeed an existence result for weak measure solutions to a class of nonlocal PDEs, including nonlocal interactions but not limited to this case. To the best of our knowledge this is the first general result in this context -the structure of F ε does not fit in the classical framework of functionals considered in [1]. Note that we do not require the functional to satisfy convexity, for instance as in [1,14].
We consider initial data ρ 0 ∈ P 2 (R d ) such that sup ε>0 F ε [ρ 0 ] < +∞. In the case of nonlinear diffusion equations, F (x) = 1 m−1 |x| m with m > 1 we denote the corresponding energy functionals by More precisely, Young's convolution inequality gives We now proceed with the JKO scheme associated to F ε . First, we define a sequence recursively as follows: • fix a time step τ ∈ (0, 1) such that ρ 0 τ,ε := ρ 0 ; • for n ∈ N and given ρ n The above sequence is well-defined for τ sufficiently small independently of ε (given explicitly in Lemma A.2). Let T > 0 be fixed, and define a piecewise constant interpolation as follows: take N := T τ the largest integer less than or equal to T τ and set ρ ε being ρ n τ,ε defined in (3.1). As usually proven, we derive energy and moments bounds sufficient to show narrow compactness.
The following proof is based on [37,1].
, and, in particular, the following bound for the regularised internal energy where the supremum is over all n = 0, . . . , N and τ ∈ (0, 1) such that N τ ≤ T with N := T τ . By summing up over k in inequality (3.3), we obtain Bounded second moment: We claim the existence of some uniform constant C > 0 (depending on the quantities discussed in the statement of this result) such that By Remark A.1, for any fixed n = 0, . . . , N − 1, we begin with . We use the triangle inequality and Cauchy-Schwarz to estimate the d 2 W term We replace the summation with (3.5) and use n + 1 ≤ N to obtain . We insert the lower bound for F ε from (A.1) so that we have . Keeping in mind that we can assume α < 1 without loss of generality, this final inequality implies the bound (3.6). This can be seen by analysing sequences x n ≥ 0 satisfying x n ≤ C 1 + C 2 x α n . Bounded squared 2-Wasserstein distance: We claim the existence of some uniform constant c > 0 such that We insert the upper bound of F ε (3.4) and the lower bound of . By the uniform second moment estimate (3.6), the inequality (3.7) is verified.
Next, we show thatρ ε provided by Proposition 3.1 is indeed a solution to (NLE), thus proving Theorem 2.1. Since we make use of (F2), the theorem below does not include linear diffusion corresponding to F (x) = x log x.
Proof of Theorem 2.1. Let us consider two consecutive elements of the sequence {ρ n τ,ε } n∈N defined from the JKO step (3.1), i.e. ρ n τ,ε and ρ n+1 τ,ε . We perturb ρ n+1 τ,ε by using the map P σ = id + σζ, for some ζ ∈ C ∞ c (R d ; R d ) and σ > 0, that is we consider the perturbation Being ρ n+1 τ,ε a minimiser of (3.1), we have 1 2τ We now let σ → 0 in (3.10) analysing the two terms involved separately. The energy functional terms in (3.10): In this part of the proof, we aim to show We apply the mean-value form of the Taylor expansion to F (3.12) In the last few lines, we used the definition of ρ σ from (3.9) and expanded the convolution. The limit (3.11) is achieved by first proving x) which appears as the integrand in (3.11). Assuming this is true for now, by Egorov's theorem, for every η > 0, there exists a measurable set S η ⊂ R d such that ρ n+1 τ,ε (S η ) < η and the convergence (3.13) is uniform on R d \ S η . Continuing from the last line of (3.12), we have (3.14) The integral over R d \ S η passes well in the limit σ → 0 owing to (3.13) and Egorov's theorem, so (3.11) is achieved once we show that the integral over S η is small. We apply the mean-value form of Taylor's theorem for V ε and Lemma A.3 (with ρ = tρ σ + (1 − t)ρ n+1 τ,ε and C = 1) to estimate M σ ε and obtain In the second to last line, we have used Fubini and the linear change of variables z = x + sσζ(x) − y for fixed x. Therefore, the integral over S η from (3.14) can be estimated by which is negligible by taking η → 0. Proving (3.13): Throughout this step, we fix x ∈ R d . We again use the mean-value form of Taylor's theorem to rewrite the difference quotient appearing in (3.13) We majorise the integrand with the sequence We seek to apply Theorem A.1 on X = R d with We have already shown the majorisation |f σ (y)| ≤ g σ (y) and the convergence for almost every y ∈ R d can be proven using the usual Dominated Convergence Theorem. In particular, the growth estimate |∇V 1 (z)| ≤ C(1 + |z|) treats the integration 1 0 ds. On the other hand, for M σ ε (y), the composition F ′ (tV ε * ρ σ (y) + (1 − t)V ε * ρ n+1 τ,ε (y)) is bounded uniformly in σ by Lemma A.3. We verify the last assumption of Theorem A.1 using Fubini and the change of variables z = −sσζ(x) + y.
Therefore, we can apply Theorem A.1 and (3.13) is established. The 2-Wasserstein terms in (3.10): the treatment here is standard and we reproduce the proof in [7, Theorem 3.1] for completeness. Consider an optimal transport plan γ n+1 τ,ε ∈ Γ o (ρ n τ,ε , ρ n+1 τ,ε ) between ρ n τ,ε and ρ n+1 τ,ε . By definition of d W , we have 1 2τ where in the last equality we applied a first order Taylor expansion. By sending σ to 0 and recalling (3.10), it holds Repeating the same computation for σ ≤ 0, we actually obtain an equality, that is, for ζ = ∇ϕ Note that the Hölder estimate (3.8) and Taking into account the last equality, by summing in (3.15) over j from h to k, we obtain (3.16) It remains to pass the limit τ ↓ 0 up to a subsequence for ρ ε τ ⇀ρ ε as in Proposition 3.1. More specifically, the result there states that ρ ε τ narrowly converges uniformly in t ∈ [0, T ] toρ ε as (a subsequence of) τ ↓ 0. Clearly, the left-hand side of (3.16) passes easily in the limit τ ↓ 0 so we only focus on the right-hand side. Let us take the following statement for granted: for fixed ε > 0 and almost every r ∈ [0, T ], we have (3.17) Passing to the limit τ ↓ 0 on the right-hand side of (3.16) reduces to finding an L 1 ((s, t); dr) majorant, assuming (3.17) holds. By Young's convolution inequality and Lemma A.3, we have Hence, we can pass to the limit τ ↓ 0 in the right-hand side of (3.16) and conclude. Let us prove (3.17). We fix r ∈ [0, T ] and henceforth drop the explicit dependence on this variable. We add and subtract Fix small η > 0 and find R > 1 large enough such that R d \B R |∇V ε (y)|dy < η where B R denotes the open ball of radius R centred at the origin. We begin with the difference in (3.18) by expanding the convolution Up to a further subsequence, Corollary A.1 gives for τ > 0 sufficiently small. Hence, Concerning the integral over R d \ B R , we apply Lemma A.3 to obtain (uniformly in τ > 0) These inequalities imply that the integral in (3.18) can be made arbitrarily small in the limit τ ↓ 0. Turning to the difference in (3.19), we only need to show that ∇V ε * [F ′ (V ε * ρ ε )] is continuous on suppϕ. Then, we can appeal to the narrow convergence ρ ε τ ⇀ρ ε in duality with continuous and bounded functions. Suppose x n ∈ suppϕ is a sequence which converges to x ∈ suppϕ, we compare the difference The integral over B R can be made arbitrarily small as n → ∞ owing to the uniform continuity of F ′ (V ε * ρ ε (·)) from Corollary A.1 and integrability of ∇V ε . This is similar to what is done for (3.20). The other integral over R d \ B R can be made arbitrarily small by the same argument for (3.21).

Compactness in the limit ε ↓ 0
This section discusses the construction of a limit ρ for a subsequence of {ρ ε } ε>0 . The key estimate is Lemma 4.1 which we are able to prove for general functions F satisfying (F1), (F3), and the growth conditions (F m ).
Assumptions (F1), (F3), and (F m ) cover all the power laws F (x) = 1 m−1 |x| m for m > 1 and F (x) = x log x (corresponding to m = 1). In the case m > 1, (F m ) implies (F2) since F ′ can be extended to x = 0. More precisely, if F ′′ satisfies the bounds in (F m ) for some m > 1, then F ′′ is locally integrable around 0. By the fundamental theorem of Calculus, Owing to Lebesgue's dominated convergence theorem, the right-hand side has a limit as x ↓ 0 and therefore so does the left-hand side which we call F ′ (0) := lim x↓0 F ′ (x).
The inequalities above and the uniform bound for F ε [ρ ε (t)] from Proposition 3.1 yield the following integrability estimate uniform in t ∈ [0, T ] and ε > 0 Concerning the m = 1 case, we directly estimate
The sequence of solutions {ρ ε } ε>0 to (NLE) constructed in Section 3 is the candidate approximating weak solution of (DE). As {ρ ε } ε>0 is in general a sequence of measures, it is useful to consider the regularised version, V ε * ρ ε . For brevity, we drop the tilde on ρ ε from now on. First, we state compactness of {ρ ε } ε>0 in C([0, T ]; P 2 (R d )).
Proof. The proof is exactly the same as in [7, Proposition 4.1] using a refined version of Ascoli-Arzelà [1, Proposition 3.3.1].
The narrow convergence proven in Proposition 4.1 is not sufficient to pass to the limit ε ↓ 0 from (NLE) to (DE). For this reason, we study the sequence v ε (t) := V ε * ρ ε (t) for t ∈ [0, T ] (we drop the subscript k for simplicity). We obtain higher regularity estimates uniform in ε by using the flow interchange technique developed by Matthes, McCann, and Savaré in [41]. The strategy is to compute the dissipation of F ε along a solution of an auxiliary gradient flow. This flow is chosen so that it satisfies an Evolution Variational Inequality (EVI) which allows us to obtain the desired estimate leading to compactness.
Since the seminal work of Jordan, Kinderlehrer, and Otto [37], it is known that the heat equation can be interpreted as the 2-Wasserstein gradient flow of the Boltzmann entropy H (see below for the precise definition). Moreover the heat semigroup, denoted by S H , is a 0-flow in the following sense.
for all t ≥ 0, with respect to every reference measureρ ∈ P 2 (R d ) such that E(ρ) < ∞.
Below we use the flow interchange by considering the heat equation as an auxiliary flow with respect to the Boltzmann entropy Again, when ρ is an absolutely continuous measure with respect to Lebesgue, we identify its density as ρ(x).

Remark 4.2. We remind the reader that H[ρ]
is bounded below by m 2 (ρ). This can be seen by looking at the relative entropy with respect to the standard Gaussian on R d denoted by M(x) = (2π) −d exp{−|x| 2 /2}. For any ρ ∈ P a 2 (R d ), Jensen's inequality with the convex function x log x gives This gives the lower bound for the entropy In the following, for any ν ∈ P 2 (R d ) such that H(ν) < +∞, we denote by S t H ν the solution at time t of the heat equation coupled with an initial value ν at t = 0. Moreover, for every ρ ∈ P 2 (R d ), we define the dissipation of F ε along S H by In order to prove stronger compactness, we begin with an L 2 t H 1 x estimate on the m 2 power of v ε τ = V ε * ρ ε τ . This generalises Lemma 4.1 from [7].
Proof. If m = 1, then the L 2 t L 2 x bound simply reads For m > 1, the estimate is very similar to that of (4.1) applied to the pre-limit curves ρ ε τ , The rest of this proof focuses on the uniform bound for ∇(v ε τ ) m 2 . For s > 0, we take S s H ρ n+1 τ,ε as a competitor against ρ n+1 τ,ε in the minimisation problem (3.1). We thus have 1 2τ which, dividing by s > 0 and passing to lim sup s↓0 , gives In the last inequality we used that S H is a 0-flow. Now, let us focus on the left hand side of (4.5). Firstly, note that Thus, we now compute the time derivative inside the above integral. Using integration by parts, the C ∞ regularity of the heat semigroup, and (F m ), we have (4.7) The previous computation is justified since S t H ρ n+1 τ,ε > 0 everywhere on R d so there is no division by zero. By substituting (4.7) into (4.6), from (4.5) we obtain In order to pass to the limit s ↓ 0 for m > 1, we first deduce V ε * ρ n+1 τ,ε ∈ L m by (F m ) and Proposition 3.1. Second, by standard properties of the heat semigroup, we obtain V ε * S st in L m as s ↓ 0. Notice that the first and second steps are immediate for m = 1. Third, by the inequality Finally, the weak L 2 lower semi-continuity of the H 1 semi-norm gives By summing up over n from 0 to N − 1, taking into account Remark 4.2 and that second order moments are uniformly bounded (see Proposition 3.1), we get For m = 1, the initial entropy is assumed to be bounded. For m > 1, since x log x ≤ x m for any In both cases, the initial entropy is bounded, and this establishes the desired The strong L m compactness in time and space follows by applying a refined version of the Aubin-Lions Lemma due to Rossi and Savaré [50,Theorem 2]. For the reader's convenience we recall the latter result below before presenting the compactness result for {v ε k } k . • a pseudo-distance g : X × X → [0, +∞], i.e., g is lower semicontinuous and such that g(ρ, η) = 0 for any ρ, η ∈ X with F (ρ) < ∞, F (η) < ∞ implies ρ = η. Let U be a set of measurable functions u : (0, T ) → X, with a fixed T > 0. Assume further that Then U contains an infinite sequence (u n ) n∈N that converges in measure, with respect to t ∈ (0, T ), to a measurableũ : (0, T ) → X, i.e.
The two conditions in (4.9) are called tightness and weak integral equicontinuity, respectively. There is a subsequence τ k ↓ 0 such that for any ε > 0, we have Moreover, there is a subsequence ε k ↓ 0 and a curve v ∈ C([0, T ]; Proof. The proof of the result is obtained by applying Proposition 4.2 to a subset of the sequence U := {v ε τ } ε∈(0,ε 0 ),τ >0 for X := L m (R d ) and g := d 1 being the 1-Wasserstein distance -extended to +∞ outside of P 1 (R d ) × P 1 (R d ). As for the functional, we consider F : otherwise.
Note that elements in the domain of the functional F belong to P 1 (R d ), thus 0 = g(ρ, η) = d 1 (ρ, η) implies ρ = η. Let us check that F is an admissible functional. Lower semicontinuity can be easily verified following, e.g., [7].
The Riesz-Fréchet-Kolmogorov theorem provides relatively compactness in L 2 (R d ) of B c . In fact, elements of B c are bounded in L 2 (R d ) and it holds the uniform continuity estimate Before proceeding to the uniform integrability, we record the following improved estimates afforded to us by the fact that B c is a bounded subset of H 1 (R d ). (4.11) In the case d = 1, for any m ≥ 1, we set δ = 1 in the following estimate Hence, uniform integrability is proven in the case d = 1 and m ≥ 1. In fact, for any d ≥ 2 and m = 1, we can simply take δ = 1 again in (4.12) to establish uniform integrability in this case. For general d ≥ 2 and m > 1, we further develop (4.12) by Hölder's inequality to obtain, for a particular choice of δ ∈ (0, 1) which will be made clear, The parameter δ ∈ (0, 1) can be chosen to take advantage of the extra integrability from (4.11). For example, we can take which is permissible in light of the Sobolev embedding (4.11 m . Thus, (4.13) yields the uniform integrability of w in L 2 . 20 We now check tightness and weak integral equicontinuity, i.e. conditions (4.9). Let us set U := {v ε τ } 0<ε≤ε 0 ,0<τ , being v ε τ : [0, T ] → L m (R d ) the sequence defined above by v ε τ = V ε * ρ ε τ , which satisfies Lemma 4.1. For any 0 < ε ≤ ε 0 and τ > 0, it holds where in the intermediate inequalities we used (3.8) for some constant c > 0 (independent of ε, τ, h) as well as standard properties of Wasserstein distances, c.f. for example [54, Section 5.1]. The equicontinuity follows by sending h ↓ 0. In the case τ > h, we use (3.7) instead to estimate where the constant c is defined when proving (3.7).
We are left to prove the relative compactness in L m ([0, T ]; L m (R d )) for all m ≥ 1. We start with the limit τ ↓ 0 for fixed ε > 0. Remember that the estimates we have proven so far are uniform in ε and τ so there is no dependence on ε as τ ↓ 0. We begin with the m > 1 case. The first part in the proof of Lemma 4 . By Proposition 3.1, we know that ρ ε τ narrowly converges toρ ε along a subsequence uniformly in [0, T ]. By testing against smooth functions, we must have agreement between these limits v ε = V ε * ρ ε . Moreover, along a further subsequence which we just label τ ↓ 0, we can apply Proposition 4.2 giving Let us denote the set above by A σ (τ ). For arbitrary σ > 0, we have Similar to (4.1), we can insert Passing to τ ↓ 0 and using lim τ ↓0 |A σ (τ )| = 0, we arrive at Since σ > 0 was arbitrary, this implies the strong L m convergence from v ε τ to v ε = V ε * ρ ε . In the case m = 1, we need to argue differently. We apply [50, Proposition 1.10] which asserts that relatively compactness in L 1 ((0, T ); X) is implied by uniform integrability and relatively compactness in measure as a function with values in X (X ≡ L 1 (R d ) in this proof). Compactness in measure has just been proven as an application of Proposition 4.2. Following [50, Remark 1.11], uniform integrability is a consequence of the strong integral equicontinuity where we used that v ε τ (t) L 1 (R d ) = 1 for any t ∈ [0, T ] and ε > 0. Strong compactness ε ↓ 0: We first claim that the estimate in Lemma 4.1 also holds uniformly for v ε , namely (4.14) This can be seen by the fact that, up to a further subsequence, Indeed, by standard results in L p integration theory and the fact that v ε τ → v ε strongly in L m , there exists w ε ∈ L m ([0, T ] × R d ) such that, along a subsequence, |v ε τ | ≤ w ε for almost every (t, since the integrand converges to 0 pointwise almost everywhere and it is majorised, uniformly in τ , by Owing to the (weak L 2 ) lower semicontinuity of the H 1 seminorm, the estimate in Lemma 4.1 passes to the limit (along a subsequence) τ ↓ 0 and (4.14) is established.
At this point, we can repeat all of the previous argument for U = {v ε } ε∈(0,ε 0 ) . We take the same space X = L m (R d ) and g = d 1 . The same functional F is still admissible. Tightness and weak integral equicontinuity can be analogously proven. Proposition 4.2 applies and we have convergence in measure for v ε to some curve v described in the statement of this result. By the same arguments as before, this convergence is strong in L m ([0, T ]; L m (R d )).

Convergence of solutions
This section addresses the proof of Theorem 2.2. We cover the case m ≥ 2 in Section 5.1 while the case 1 < m < 2 is treated in Section 5.2. To simplify the presentation, we focus on functionals F = H m but we also discuss (see Remark 5.2) the extension to general energies satisfying (F1), (F2), (F3), and (F m ) for convergence from (NLE) to (DE).

5.1.
The case m ≥ 2. Building on the previous discussions from Sections 3 and 4, we denote ρ ε the weak measure solutions to (NLE-m) constructed from the JKO scheme in Proposition 3.1. Moreover, we focus on the subsequence such that v ε = V ε * ρ ε converges to v ∈ C([0, T ]; Starting from the definition of weak measure solution to (NLE-m) we can reformulate the right-hand side as follows: being for any r ∈ [0, T ] and x ∈ R d , the error term The product (V ε * ρ ε r ) m 2 ∇(V ε * ρ ε r ) m 2 in the last line of (5.1) is a weak-strong convergence pair in L 2 t,x . Indeed, recall the uniform L 2 t H 1 x bound on (V ε * ρ ε ) m 2 and Proposition 4.2 from Section 4. Hence, the first integral in the last line of (5.1) passes well in the limit (along a subsequence) ε → 0. This is precised later in the full proof of Theorem 2.2 so we dedicate much of this section to estimates proving that the error vanishes as ε → 0.
Remark 5.1. If V 1 is compactly supported, the argument that the last term in (5.1) vanishes as ε ↓ 0 can be simplified based on the arguments in Section 5.2. In the rest of this subsection however, we present a general argument allowing for V 1 with unbounded support.
Notice that the last term in the last equality of (5.1) can be estimated as for q = 2m m−2 , so that 1 m + 1 q + 1 2 = 1 and Notice that the exponent q is only valid for m ≥ 2 based on the computations above. In order to obtain a solution of (PME) in the ε → 0 + limit, we need to prove that z ε → 0 in L m ([0, t] × R d ), for any t ∈ [0, T ]. In turn, this will imply the error term in (5.1) vanishes as ε → 0, as a consequence of the L 2 version of Lebesgue dominated convergence theorem and weak-L 2 convergence. More precisely, we note that the product z ε (v ε ) m 2 −1 ∈ L 2 ([0, t] × R d ) and it converges to 0 strongly in L 2 .
for q = 2m m−2 , so that 1 m + 1 q + 1 2 = 1; thus it vanishes as ε → 0 similar to Lemma 5.1. As for the first term, note that it can be rewritten as where F ′′ (x)x 2− m 2 is extended by zero when x = 0 owing to (F m ), and we applied the chain rule twice on the set {v ε r > 0} v ε r ∇v ε r = When multiplied with F ′′ (v ε r ), the integrand on the right-hand side makes sense in and applying the generalised version of the Lebesgue dominated convergence theorem. Therefore, in the ε → 0 + limit we obtain (up to pass to a subsequence) , and G(0) = 0, we can apply the chain rule to P ( Note that the chain rule and the construction of the pressure P holds for all m > 1. Uniqueness of distributional solutions of (DE) is proven in [5] for bounded solutions, which is actually the case for L 1 solutions -see [59,15] for further details on the so-called L 1 -L ∞ regularising effect. In particular, from [1, Theorem 11.2.5] we infer that our solution is a 2-Wasserstein gradient flow satisfying Furthermore, uniqueness of solutions implies convergence of the whole sequence ρ ε , as for (PME).
Remark 5.3. Our result can be also interpreted in the context of generalised gradient flows or gradient structures, following the dynamical interpretation of the Wasserstein distance, c.f. [3,33]. More precisely, we know where, for any λ ∈ M(R d ; R d ) such that ρ, j ≪ |λ|, if j ≤ 0 and r = 0, ∞ if j > 0 and r = 0.
Upon using a careful regularisation and cut-off argument one can prove the following chain rule for any absolutely continuous curve with respect to the Wasserstein distance hence re-intepret weak (measure) solutions of (NLE) as the zero level set of the De Giorgi functional The key idea here is to estimate the error z ε from (5.2) differently by exploiting the compact support of V 1 . We take R > 0 such that suppV 1 ⊂ B R . In the case m ≥ 2, negative powers of v ε = V ε * ρ ε never appeared in (5.1) but these computations can be recycled by cautiously avoiding the 0 level set of v ε . We define In the second line, we swapped the convolution against ∇V ε and picked up a minus sign because it is an odd function (remember from (V) that V 1 is even). At this point, we would like to perform integration by parts and apply the gradient onto (V ε * ρ ε r ) m−1 . In contrast to (5.1), when 1 < m < 2 we need to avoid the zero set of V ε * ρ ε r ; as smooth as this convolution may be, the function However, due to Lemma 5.3 (see below), we can justify the integration by parts and develop (5.3) to get The last line is nearly identical to the end result of (5.1). Here, we integrate over A ε r = {V ε * ρ ε r > 0} which is justified by Lemma 5.3. As was the case in Section 5.1, we need to show that the error term in the last line vanishes as ε ↓ 0.
Proof of Theorem 2.2 for 1 < m < 2. Convergence in the first term on the right-hand side of (5.4) can be treated as for the case m ≥ 2, due to Lemma 4.1 and Propositon 4.3. Hence we focus on the error term. For simplicity, let us assume ϕ ∈ C 2 c (R d ) since it can be approximated in such a way as described in Section 5.1. We estimate the error by first expressing it as )dρ ε (y). 27 Next, we apply the Mean Value theorem to the difference |∇ϕ(y) − ∇ϕ(x)| ≤ D 2 ϕ L ∞ |x − y| and obtain Now, we exploit the compact support of the generator V 1 . Since V 1 is supported within B R , then V ε is supported within B Rε which leads to The last integral in (5.4) can be estimated with (5.5) as follows We have suggestively recalled the notation v ε = V ε * ρ ε precisely with Lemma 4.1 in mind; . Hence, the last integral is uniformly bounded in ε. Moreover, the prefactor of vanishing ε implies that the last term of (5.4) converges to zero in the limit, thus recovering By means of the chain rule for Sobolev spaces, one can prove ∇ρ m = 2ρ m 2 ∇ρ m 2 (in the weak sense). More precisely, we can see ρ m = G • u, for u = ρ m 2 and G(x) = x 2 . The work by Dahlberg and Kenig [28] establishes uniqueness of very weak solutions for (PME), for m > 1. As a byproduct, we also infer that our solution is a gradient flow in the sense of [1,Theorem 11.2.5], meaning We are allowed to restrict the integration region to A ε r due to (5.8); if x / ∈ A ε r , then |V * (ρ∇ϕ)(x)| ≤ ∇ϕ L ∞ v(x) = 0. In order to prove (5.6), we wish to show The strategy is to apply the extended Dominated Convergence Theorem (Theorem A.1) by exhibiting an appropriate sequence of majorants to the integrand in the first line of (5.9). The first step is to remember (5.8) and estimate the integrand of I σ as follows (5.10) In the last line, we have distributed v = V * ρ into the difference quotient. We wish to re-express the term v m−1 (x − σe)v(x) as v m (x)+ error term. For this, we use the Mean Value Theorem to write Inserting this into (5.10) yields the estimate ∇V * (ρ∇ϕ)(x − sσe)ds .

(5.11)
We use the Mean-Value Theorem again with the initial difference (valid since m > 1) and insert this into (5.11) to obtain |∇v(x − sσe)|ds.
In the final line of (5.12), we used the following inequality We are now in a position to apply Theorem A.1 with X = (0, t) × R d where y = (r, x) ∈ X and Here, χ A ε r (x) is the indicator function of the set A ε r . Remember that we have suppressed the dependence on r ∈ (0, t) in ρ. The first assumption of Theorem A.1 has been verified by the estimate of (5.12). We can verify the second assumption of Theorem A.1 since the pointwise limits of f σ and g σ as σ → 0 are r is an open set. As for the pointwise limit of g σ , the usual Dominated Convergence Theorem suffices.
It remains to check the third assumption of Theorem A.1 which we do with Fubini; Therefore, by Theorem A.1, we have X f σ (r, x)dy → X f (r, x)dy as σ → 0 which is precisely (5.9); 6. Convexity, uniqueness, and particle approximation In this section, we sketch the argument adapted from [27] to prove Theorem 2.3 and Corollary 2.1. Recall that we assume (F m ) with m > 1. To simplify the exposition, we set F ′ (0) = 0. In view of the assumptions needed for the kernel V 1 , see (V), it is not reasonable to choose V 1 convex, as we require finite second order moment. However, this does not prohibit λ-convexity of the functional F ε along geodesics. Indeed, differentiability of F ε , as in [10, Proposition 3.10], holds in our case assuming F satisfies (F1), (F2), (F3), and it is convex (convexity is ensured by (F m )). Furthermore, we need V 1 ∈ C 2 (R d ) and D 2 V 1 ∈ L ∞ (R d ). In order to give a few explanations in this direction, fix ρ 1 , ρ 2 ∈ P 2 (R d ) and consider the geodesic connecting ρ 1 to ρ 2 defined by Owing to the regularisation by V ε , one can show (generalising and adapting the computations in [27,Propositions 3.4 and 3.6]) that the functional F ε satisfies a geodesic 'above the tangent line' inequality [25,Proposition 2.8] in the sense that The technical assumption F ′ (0) = 0 enters here in the computation of (6.1). As a consequence we infer λ ε F -convexity similar to [10, Proposition 3.11] as well as a characterisation of the subdifferential, [10, Proposition 3.12] -adapted to our functional, meaning that As for the modulus of convexity λ ε F , similarly to [27], using (F m ) and fixed m > 1, we have The information above are enough to prove existence of a unique gradient flow of F ε , for ε > 0 fixed, following [1] and [10, Section 5] for regularised energies, thus proving Theorem 2.3. We omit the details and refer the interested reader again to similar computations in [27].
The regularisation of the energy by mollifiers V 1 satisfying (V) used in this manuscript, as well as in [7], allows to extend stability of gradient flows to the case m > 1 -further assuming V 1 is compactly supported for 1 < m < 2 (c.f. Section 5.2). The advantage of using convex energies is given by the possibility of using stability estimates with the 2-Wasserstein distance so that one obtains a particle approximation when the number of particles involved depends on ε, i.e. N = N (ε). This is a qualitative result, recently proved rigorously in [27,Theorem 1.4], for m = 2. In our setting, we consider (NLE) as a continuity equation with velocity given by −∇V ε * F ′ (V ε * ρ). Then, under mild assumptions on the mollifer V 1 and the function F , the empirical measure ρ N ε (t) = 1 N N j=1 δ x j ε (t) is a weak solution to (NLE) provided the particles satisfy the following ODE systeṁ The regularisation in (NLE) is done in the same spirit as [27], therefore an analogous version of [27,Theorem 1.4] also holds true in our setting, with λ ε F specified above and m > 1. More precisely, as a consequence of the usual stability estimate for λ-gradient flows, [1, Theorem 11.2.1], we know ρ(0)). Therefore, if we assume that for ε → 0 there exists N = N (ε) → +∞ such that e −λ ε F t d W (ρ N ε (0), ρ(0)) → 0, we infer the mean field limit since ρ ε is converging to a weak solution of (DE).

A. Technical proofs
We begin with an extension of the Dominated Convergence Theorem [51,Chapter 4,Theorem 17]. This is quoted with less generality for our purposes.
Theorem A.1 (Extended Dominated Convergence Theorem). Let (f σ ) σ>0 and (g σ ) σ>0 be sequences of measurable functions on a (Lebesgue) measurable set X ⊂ R n such that g σ ≥ 0 and suppose that there exist measurable functions f, g satisfying the following assumptions.
Next, we turn to estimates related to the JKO scheme with the regularised energy F ε and regularity of V and F under various assumptions.
Remark A.1. From the definition of the 2-Wasserstein distance and the inequality |y| 2 ≤ 2|x| 2 + 2|x − y| 2 it follows for some constants c 1 , c 2 > 0 and we can take, without loss of generality, α ∈ d d+2 , 1 . With this notation, we have the following result.
The lower bound for F ε is used to prove that one step of the JKO scheme (3.1) is well-defined.
With this restriction on τ , the lower bound reads Step 2 -The Direct Method: From the previous step, we can find an infimising sequence ρ n ∈ P 2 (R d ) such that As a convergent sequence, we also have sup n∈N d 2 W (ρ n ,ρ) 2τ + F ε [ρ n ] ≤ C < +∞, for some constant C > 0. By Remark A.1, the second moments are uniformly bounded by sup n∈N m 2 (ρ n ) ≤ 2m 2 (ρ) + Cτ.
Therefore, by Prokhorov's theorem, there exists a subsequence, still labelled ρ n , such that ρ n ⇀ ρ (narrow convergence) for ρ ∈ P 2 (R d ). Moreover, it is known (c.f. Remark 2.1) that (F1) provides the required lower semicontinuity with respect to narrow convergence so that These results are enough to prove Proposition 3.1.
In the last line, the constant C ε, V 1 comes from the linear growth assumption of ∇V 1 scaled with ε, and the constant C K absorbs bounds for |x 1 | and |x 2 | since x 1 , x 2 ∈ K.
Turning to the sequence ρ n , standard tightness arguments using the uniform first moment bound yield ρ ∈ P(R d ) such that ρ n ⇀ ρ along a subsequence. As for the uniform convergence over K along a subsequence, properties of the mollification by V ε yield that x → |V ε * ρ n (x) − V ε * ρ(x)| is a continuous function for every n ∈ N. Therefore, for every n ∈ N, there is x n K ∈ K such that sup x∈K |V ε * ρ n (x) − V ε * ρ(x)| = |V ε * ρ n (x n K ) − V ε * ρ(x n K )|.
As well, we can take V i = V ε * ρ(x i ) for i = 1, 2 in (A.2) to conclude the uniform continuity. Note that V i ∈ [0, V ε L ∞ ].
Turning to the limit, the uniform convergence from Lemma A.4 gives the existence of N ∈ N such that for every n ≥ N and x ∈ K, we have |V ε * ρ n (x) − V ε * ρ(x)| < δ 1 .