Large deviations and gradient flows for the Brownian one-dimensional hard-rod system

We study a system of hard rods of finite size in one space dimension, which move by Brownian noise while avoiding overlap. We consider a scaling in which the number of particles tends to infinity while the volume fraction of the rods remains constant; in this limit the empirical measure of the rod positions converges almost surely to a deterministic limit evolution. We prove a large-deviation principle on path space for the empirical measure, by exploiting a one-to-one mapping between the hard-rod system and a system of non-interacting particles on a shorter domain. The large-deviation principle naturally identifies a gradient-flow structure for the limit evolution, with clear interpretations for both the driving functional (an 'entropy') and the dissipation, which in this case is the Wasserstein dissipation. This study is inspired by recent developments in the continuum modelling of multiple-species interacting particle systems with finite-size effects; for such systems many different modelling choices appear in the literature, raising the question of how one can understand such choices in terms of more microscopic models. The results of this paper give a clear answer to this question, albeit for the simpler one-dimensional hard-rod system. For this specific system this result provides a clear understanding of the value and interpretation of different modelling choices, while giving hints for more general systems.

1. Introduction
1.1. Continuum modelling of systems of interacting particles. Systems of interacting particles can be observed in physics (e.g. gases, liquids, solutions [Rue69,Lig06]), biology (e.g. populations of cells [TMPA08]), social sciences (e.g. animal swarms [Oku86]), engineering (e.g. swarms of robots [BFBD13]), and various other fields. Such systems are routinely described with different types of models: particle- or individual-based models characterize the position and velocity of every single particle, while continuum models characterize the behaviour of the system in terms of (often continuous) densities or concentrations. While particle-based models contain more information and may well be more accurate, continuum models are easier to analyze and less demanding to simulate, and there is a natural demand for continuum models that describe such systems as accurately as possible.
In this paper we focus on systems of particles for which the finite size of the particles has a prominent effect on the larger-scale, continuum-level behaviour of the system. For such systems a wide range of continuum models has been postulated (see below), but very few of them have been rigorously justified, and this holds in particular for their dynamics. We present here a rigorous derivation of the continuum equation that describes such a class of interacting particles, but restricted to one space dimension. The proof uses large-deviation theory, and this method identifies not only the limiting equation but also the gradient-flow structure of the limit. In this way it gives a rigorous derivation of the Variational Modelling structure of the limit.
1.2. Sterically interacting particles. In many applications, the finite size of the particles can be witnessed in various ways, such as in the natural upper bound on the density of such particles, in the way particles 'push away' other particles (cross-diffusion, possibly leading to uphill diffusion), and in the striking oscillations of ion densities near charged walls (see e.g. [GFVdE87,HFEL12,Gil15]).
Characterizing the macroscopic, continuum-level behaviour of such 'sterically interacting' particles (from the Greek στερεός for 'hard, solid') is a major challenge, and various communities have addressed this challenge. In the mathematical community, in one-dimensional systems the behaviour of single, 'tagged' particles has been characterized [LP67,MP02,LA09], the equilibrium statistical properties of the ensemble were determined by Percus [Per76], and the dynamic continuum limit was first derived by Rost [Ros84]. In higher dimensions cluster expansions have opened the door to accurate expansions of the free energy [JKM15,Jan15]. In a number of papers Bruna and Chapman [Bru12,BC12b,BC12a,BC14] have given asymptotic expressions for the continuum-limit partial differential equation in the limit of low volume fraction. A related line of research focuses on strongly interacting particle systems with soft interaction; Spohn characterized the central-limit fluctuations in an infinite system of interacting Brownian particles [Spo86], and Varadhan proved the continuum limit as n → ∞ for a system of n interacting particles [Var91,Uch94] in one dimension.
Despite all this activity, however, the main question for this paper is still open: Which continuum-level partial differential equation describes the evolution of systems of many finite-size particles, and what is the corresponding gradient-flow structure? Before describing the answer of this paper we first comment on the philosophy of Variational Modelling, which underlies both this paper and some of the work in this area.
1.3. Variational Modelling. Many strongly damped continuum systems can be modelled by gradient flows; they are then fully characterized by a driving functional (e.g., a Gibbs free energy) and a dissipation mechanism that describes how the system dissipates its free energy. By choosing these two components one fully determines the model, and the model equations are readily derived as an outcome of these two inputs. We call this way of working Variational Modelling; recent examples can be found in e.g. [Zie83,Doi11,AWTSK18], and the lecture notes [Pel14] describe this modelling philosophy and its foundations in detail.
The quality of a variational-modelling derivation rests on the quality of the two choices, the choice of the driving functional and the choice of the dissipation (e.g., drift-diffusion or Wasserstein gradient flow). Different combinations of choices, however, can lead to the same equation (see e.g. [PRV14] or [DFM18,Eq. (2.1)]). Therefore, it is not possible to assess the quality of the independent modelling choices, nor to deduce that the combined choices are right, based on comparison of the model predictions to particle-based simulations, or to experimental data. Accordingly, there is great importance in systematically determining the driving functional and dissipation mechanism from 'first principles'. In recent years it has been discovered that not only the free energy, but also the dissipation mechanism can be rigorously deduced from an upscaling of the underlying particle system, by determining the large-deviation rate functional in the many-particle limit. In this way various free energies and dissipation mechanisms have been placed on a secure foundation [ADPZ13,MPR14,PRV14,BP16,MPR16].
In a series of works, Hyon, Horng, Lin, Liu, and Eisenberg applied a special case of the variational modelling approach, called 'Energetic Variational Approach', to derive evolution equations for hard-sphere ions [HLLE12,LE14a,HEL11]. However, the authors solely focused on the effect of the finite size on the free energy. The impact of the finite size on the dissipation mechanism, and therefore the dynamics, was not considered, and implicitly taken to be the same as for zero-size particles.
Instead, in this work we develop a systematic derivation of both the driving functional and the dissipation mechanism for the system of this paper: a one-dimensional system of Brownian hard rods.
1.4. The model: One-dimensional hard rods. The system that we consider is a collection of $n$ hard rods of length $\alpha/n$, for $\alpha \in (0,1)$, that are free to move along the real line $\mathbb{R}$, except that they may not overlap. The position of each rod is given by its left-hand point $Y^n_i$, i.e. the rod occupies the space $[Y^n_i, Y^n_i + \alpha/n)$. Since the rods cannot overlap, the state space is the 'swiss cheese' space
$$\Omega_n := \bigl\{ y \in \mathbb{R}^n : \forall i, j,\ i \neq j,\ |y_i - y_j| \geq \alpha/n \bigr\}. \tag{1}$$
Note that the length $\alpha/n$ is scaled such that the total volume fraction of the rods is $O(1)$. The evolution of the rods is that of Brownian motion in a potential landscape with the non-overlap constraint, where we additionally allow for mean-field interaction between the particles. For this paper we choose as potentials an on-site potential $V$ and a two-particle interaction potential $W$, and both are assumed to be sufficiently smooth.
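The non-overlap condition in (1) is easy to test numerically. The following is a minimal sketch, assuming the left-hand points are stored in a plain Python list; the function name `is_admissible` is hypothetical and does not come from the paper. It relies on the observation that, once the points are sorted, only neighbouring rods need to be compared.

```python
def is_admissible(y, alpha):
    """Return True if the left-hand points y lie in Omega_n (hypothetical helper).

    Each of the n rods has length alpha/n, so every pairwise distance
    must be at least alpha/n. After sorting, it is enough to check
    the gaps between consecutive points.
    """
    n = len(y)
    rod_length = alpha / n
    ys = sorted(y)
    return all(b - a >= rod_length for a, b in zip(ys, ys[1:]))


if __name__ == "__main__":
    # Three rods of length 0.3: gaps of 0.5 leave room for each rod.
    print(is_admissible([0.0, 0.5, 1.0], alpha=0.9))
    # A gap of 0.2 is shorter than the rod length 0.3, so the rods overlap.
    print(is_admissible([0.0, 0.2, 1.0], alpha=0.9))
```

Sorting first reduces the check from all pairs to neighbouring pairs, which is specific to the one-dimensional setting.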
In the interior of $\Omega_n$ we therefore solve the stochastic differential equation (2), in which the $B_i$ are independent one-dimensional standard Brownian motions. On the boundary $\partial\Omega_n$ we assume reflecting boundary conditions.
This system has been studied before. For the case $V \equiv W \equiv 0$, Percus [Per76] calculated various distribution functions for finite $n$. Also for $V \equiv W \equiv 0$, Rost [Ros84] proved that in the $n \to \infty$ limit the empirical measures $\rho_n(t) = \frac1n \sum_{i=1}^n \delta_{Y^n_i(t)}$ converge almost surely (the continuum limit) to a solution of a nonlinear parabolic equation, namely equation (3) with $V \equiv W \equiv 0$. Bodnar and Velazquez generalized this convergence by allowing for $n$-dependent $W_n$ that shrinks to a Dirac delta function as $n \to \infty$, leading to an additional term $\partial_y(\rho\,\partial_y \rho)$ in that equation [BV05]. Bruna and Chapman studied the related case of fixed $n$ in the limit $\alpha \to 0$, in arbitrary dimensions, and calculated the approximate limit equation up to order $O(\alpha)$ [BC12b,BC12a,BC14,Bru12]. Based on analogy with continuum limits in other interacting particle systems (see e.g. [Oel84,DG87]), one would expect that the continuum limit for the case of non-zero $V$ and $W$ is equation (3), in which $(W * \rho)(y) = \int_{\mathbb{R}} W(y - y')\,\rho(dy')$ is the convolution of $W$ with $\rho$. The aims of this paper are (a) to prove the limit equation (3) rigorously, and (b) to show that it has a variational, gradient-flow structure that is generated by large deviations in a canonical way.
1.5. Main result I: Large deviations of the invariant measure. The first step in probing the variational structure of equation (3) is to derive the 'free energy' that will drive the gradient-flow evolution. For reversible stochastic processes, such as the particle system Y n , it is well known (see e.g. [MPR14]) that this driving functional is given by the large-deviation rate functional of the invariant measure.
Under our conditions on $V$ and $W$, the system of particles $Y^n$ has an invariant measure $P^{\mathrm{inv}}_n$, given in (4), where $\mathcal{L}^n|_{\Omega_n}$ is $n$-dimensional Lebesgue measure restricted to $\Omega_n$, and $Z_n$ is the normalization constant. Our first main result identifies the large-deviation behaviour of these invariant measures. In this paper, $\mathcal{P}(\mathbb{R})$ is the space of probability measures on $\mathbb{R}$.
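Theorem 1.1 (Large-deviation principle for the invariant measures). Assume that the functions $V$ and $W$ satisfy Assumption 4.4. For each $n \in \mathbb{N}$, let $Y^n \in \mathbb{R}^n$ have law $P^{\mathrm{inv}}_n$, and let $\rho_n := \frac1n \sum_{i=1}^n \delta_{Y^n_i} \in \mathcal{P}(\mathbb{R})$ be the corresponding empirical measure. Then the measures $\rho_n$ satisfy a large-deviation principle with good rate function $2\hat F$:
$$\mathrm{Prob}(\rho_n \approx \nu) \sim \exp\bigl(-2n\,\hat F(\nu)\bigr) \qquad \text{as } n \to \infty, \tag{6}$$
where the functional $\hat F$ is given in (7).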
The functional $\hat F$ is non-negative by definition; our assumptions on $V$ and $W$ imply that $\hat F$ has at least one minimizer at value zero, and possibly more than one.
The large-deviation principle (6) gives a characterization of the behaviour of the empirical measures $\rho_n$ that can be split into two parts:
(1) With probability one, and along a subsequence, $\rho_n$ converges to a minimizer of $\hat F$. (This can be proved using the Borel-Cantelli lemma; see e.g. [PS19, Th. A.2].)
(2) The event that $\rho_n \approx \nu$, where $\nu$ is not a minimizer of $\hat F$, becomes increasingly unlikely as $n$ tends to infinity; in fact, it is exponentially unlikely in $n$, with a prefactor $2\hat F(\nu)$ in the exponent that depends on $\nu$. Large values of $\hat F(\nu)$ correspond to 'even more unlikely' behaviour of $\rho_n$ than smaller values.
1.6. Gradient flows. We now turn to the evolution. A gradient-flow structure is defined by a state space, a driving functional, and a dissipation metric [Pel14,Mie16a]. The driving functional was identified above as $\hat F$; the large-deviation principle that we prove below will indicate that the state space for this gradient-flow structure is the metric space given by the set $\mathcal{P}_2(\mathbb{R})$ of probability measures of finite second moment (i.e. $\int_{\mathbb{R}} y^2 \rho(dy) < \infty$) equipped with the Wasserstein metric.
We describe the Wasserstein metric and Wasserstein gradient-flow structures in more detail in Section 2; here we only summarize a few aspects. The Wasserstein distance $W_2$ is a measure of distance between two probability measures on physical space. When modelling particles embedded in a viscous fluid, the appearance of the Wasserstein distance in gradient-flow structures can be traced back to the drag force experienced by the particles when moving through the fluid. This is illustrated by the property that if $n$ particles are dragged from positions $y_1, \dots, y_n$ to positions $y'_1, \dots, y'_n$ in time $\tau$, then the minimal viscous dissipation as a result of this motion is given by the Wasserstein distance between the two empirical measures; see (8). The drag-force parameter $c$ depends on the size of the particles and the viscosity of the fluid. This connection between the Wasserstein distance and viscous dissipation is described in depth in the lecture notes [Pel14,Ch. 5].
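For two empirical measures with the same number of atoms, the Wasserstein distance reduces to a minimization over assignments of initial to final positions, and in one dimension the minimum is attained by matching the points in sorted order. The sketch below illustrates this under the standard normalization $W_2^2 = \frac1n \min_\sigma \sum_i |y_i - y'_{\sigma(i)}|^2$, which is an assumption here and may differ from (8) by the constants $c$ and $\tau$; the function names are hypothetical.

```python
import math
from itertools import permutations


def w2_sorted(y, y_prime):
    """W_2 between two empirical measures with n atoms each, via sorting.

    In one dimension the monotone (order-preserving) matching is optimal
    for the quadratic cost.
    """
    n = len(y)
    cost = sum((a - b) ** 2 for a, b in zip(sorted(y), sorted(y_prime)))
    return math.sqrt(cost / n)


def w2_bruteforce(y, y_prime):
    """The same distance, minimizing over all permutations (small n only)."""
    n = len(y)
    best = min(
        sum((y[i] - y_prime[s[i]]) ** 2 for i in range(n))
        for s in permutations(range(n))
    )
    return math.sqrt(best / n)


if __name__ == "__main__":
    y = [0.3, 0.0, 1.0]
    y_prime = [2.0, 0.5, 1.1]
    # The two values should agree up to rounding.
    print(w2_sorted(y, y_prime), w2_bruteforce(y, y_prime))
```

The brute-force version is only there as a cross-check; its cost grows factorially in $n$.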
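For the functional $\hat F$ one can formally define a 'Wasserstein gradient' $\operatorname{grad}_W \hat F(\rho)$ for each $\rho$ as a real-valued function on $\mathbb{R}$ given by
$$\operatorname{grad}_W \hat F(\rho)(y) := -\partial_y\bigl(\rho\,\partial_y \xi\bigr)(y),$$
with $\xi$ given in (9). Therefore (3) can be rewritten abstractly as the Wasserstein gradient flow of $\hat F$,
$$\partial_t \rho = -\operatorname{grad}_W \hat F(\rho). \tag{10}$$
In this context there are two natural solution concepts for equation (3). The first is the more classical, distributional definition.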
Definition 1.2 (Distributional solutions of (3)). A Lebesgue measurable function $\rho : [0,T] \to \mathcal{P}(\mathbb{R})$ is a distributional solution of (3) if it is a solution in the sense of distributions on $(0,T) \times \mathbb{R}$ of a slightly rewritten form of equation (3).
We also use a second solution concept that is more adapted to the gradient-flow structure. The monograph [AGS08] formulates a number of alternative solution concepts for the general idea of a 'metric-space gradient flow'; in this paper we focus on the one called Curve of Maximal Slope in [AGS08] and Energy-Dissipation Principle in [Mie16b], and attributed originally to De Giorgi [DGMT80]. It is defined by the relation (12), and we call a curve satisfying (12) a gradient-flow solution. Here
• $AC^2([0,T]; \mathcal{P}_2(\mathbb{R}))$ is the space of absolutely continuous functions $\rho : [0,T] \to \mathcal{P}_2(\mathbb{R})$ (see Definition 2.6);
• The metric derivative $|\dot\rho|$ of a curve $\rho \in AC^2([0,T]; \mathcal{P}_2(\mathbb{R}))$ is defined as $|\dot\rho|(t) := \lim_{h \to 0} W_2(\rho(t+h), \rho(t))/|h|$;
• The local slope is $|\partial \hat F|(\rho) := \limsup_{\nu \to \rho} \bigl(\hat F(\rho) - \hat F(\nu)\bigr)_+ / W_2(\rho, \nu)$.
One can calculate that the metric velocity $|\dot\rho|(t)$ and the metric slope $|\partial \hat F|(\rho)$ are formally given by explicit expressions (see Sections 2.3 and 2.4), with $\xi$ given in (9).
Each gradient-flow solution also is a distributional solution, and for given initial datum ρ(0) gradient-flow solutions are unique (see Lemma 2.11).
The definition (12) is inspired by the smooth Hilbert-space case, in which $|\cdot|$ is a Hilbert norm, and an expression of the form of equation (12) for a curve $x$ and a functional $\Phi$ in Hilbert space can be rewritten as (15), where $D\Phi(x)$ is the Hilbert gradient (Riesz representative) of $\Phi$ at $x$. The right-hand side in (15) is non-negative, its minimal value is zero, and this minimal value is achieved when $\dot x(t) = -D\Phi(x(t))$ for almost all $t$. For metric spaces, under some conditions, the same is true for (12): the right-hand side is non-negative, its minimal value is zero, and this minimal value is achieved in exactly one curve $x$ among all curves $\tilde x$ with given initial datum $\tilde x(0)$. In other words, equations such as (12) and (15) both define a flow in the state space, and this is what we call a 'gradient flow' in this paper. In recent years it has become clear that expressions of the type of (12) and (15) arise naturally as large-deviation rate functions associated with stochastic processes, typically in a many-particle limit; we describe this in detail below for the system of this paper, and the general scheme can be found in [MPR14,Prop. 3.7]. Through such connections, gradient-flow structures of various partial-differential equations can be understood as a natural consequence of the upscaling from a more microscopic system of which the PDE is a scaling limit [ADPZ11, ADPZ13, DPZ13, DLR13, MPR14, PRV14, EMR15]. In addition, this connection provides a natural way to derive and understand new gradient-flow structures for equations in the long term. In this paper we use this method to investigate the gradient-flow structure that arises in this simple one-dimensional, hard-rod system.
1.7. Main result II: Large deviations of the stochastic evolutions. The second main theorem of this paper then describes the large-deviation behaviour of the empirical measures $\rho_n(t) = \frac1n \sum_{i=1}^n \delta_{Y^n_i(t)}$ as functions of time, i.e. in the state space $C([0,T]; \mathcal{P}(\mathbb{R}))$.
The choice of initial data for the process $Y^n_i$ requires some care. From the point of view of equation (3) we would like to fix a measure $\rho^\circ \in \mathcal{P}(\mathbb{R})$ and then select initial data $Y^n_i(0)$ such that the empirical measures $\frac1n \sum_{i=1}^n \delta_{Y^n_i(0)}$ converge to $\rho^\circ$ as $n \to \infty$. However, not all $\rho^\circ \in \mathcal{P}(\mathbb{R})$ are admissible, since initial data for (3) should have Lebesgue density bounded by $1/\alpha$. This is a natural consequence of the fact that each particle occupies a section of length $\alpha/n$, and it is also visible in the degeneration of the denominators in (3) and (7).
Given some $\rho^\circ$ satisfying this restriction, one might try to draw initial data $Y^n_i(0)$ i.i.d. from $\rho^\circ$, since then with probability one we have $n^{-1} \sum_{i=1}^n \delta_{Y^n_i(0)} \to \rho^\circ$. This is still problematic, since the strong interaction between the rods implies that the initial data for $Y^n_i$ can never be chosen independently. Instead, in the theorem below, we choose initial data for the $Y^n_i$ by modifying a version of the invariant measure $P^{\mathrm{inv}}_n$.
Let $f \in C_b(\mathbb{R})$, and define the tilted, '$W = 0$' invariant measure $P^{\mathrm{inv},f}_n \in \mathcal{P}(\mathbb{R})$ by
Under this measure the particles are i.i.d. Also define the tilted free energy $\hat F^f$ by the expression (17) if $\rho$ is Lebesgue-absolutely-continuous and $\rho(y) < 1/\alpha$ a.e., and by $+\infty$ otherwise, where the constant $C_f$ in (17) is chosen such that $\inf \hat F^f = 0$. The functional $\hat F^f$ is strictly convex and coercive, and we write $\rho^{\circ,f}$ for the unique minimizer of $\hat F^f$. (It is not hard to verify that any $\rho$ can be written this way, provided it satisfies $\rho < 1/\alpha$ a.e. and $\log(\rho/e^{-2V}) \in C_b(\mathbb{R})$.)
Theorem 1.4 (Large-deviation principle on path space). Assume that $V$, $W$ satisfy Assumption 4.4. For each $n$, let the particle system $t \mapsto Y^n(t) \in \mathbb{R}^n$ be given by (2), with initial positions drawn from the tilted invariant measure $P^{\mathrm{inv},f}_n$. The random evolving empirical measures $\rho_n(t) = \frac1n \sum_{i=1}^n \delta_{Y^n_i(t)}$ then satisfy a large-deviation principle on $C([0,T]; \mathcal{P}(\mathbb{R}))$ with good rate function $\hat I^f$ as $n \to \infty$; see (18).
1.8. Consequences: the limit equation as a Wasserstein gradient flow. The large-deviation rate functional $\hat I^f$ in (18) can be decomposed as
$$\hat I^f(\rho) = 2\hat F^f(\rho(0)) + \mathrm{GF}[\hat F, W_2](\rho),$$
where $\mathrm{GF}[\hat F, W_2](\rho)$ is shorthand for the right-hand side in the gradient-flow definition in (12), with driving functional $\hat F$ and dissipation metric $W_2$. Both terms are non-negative, and they represent different aspects of the large-deviation behaviour of the sequence of particle systems $Y^n$. The first term, $2\hat F^f(\rho(0))$, characterizes the probability of deviations of the initial empirical measure $\rho_n(0) = \frac1n \sum_{i=1}^n \delta_{Y^n_i(0)}$ from the minimizer $\rho^{\circ,f}$ of $\hat F^f$. The second term $\mathrm{GF}[\hat F, W_2](\rho)$ measures deviations of the time course $t \mapsto \rho_n(t)$ from 'being a solution of the gradient flow (3)' (or (10)). For minimizers both terms are zero, which implies that a minimizer $\rho$ of $\hat I^f$ has initial datum $\rho(0) = \rho^{\circ,f}$ and is a gradient-flow solution of (3).
Minimizers of $\hat I^f$ describe the typical behaviour of empirical measures $\rho_n$, by the Borel-Cantelli argument that was already mentioned above:
Corollary 1.6. The curve of empirical measures $t \mapsto \rho_n(t)$ converges almost surely in $C([0,T]; \mathcal{P}(\mathbb{R}))$ to a (unique) solution $\rho$ of (3) with initial datum $\rho(0) = \rho^{\circ,f}$.
Although the $\lambda$-convexity of $\hat F$ already guarantees existence of gradient-flow solutions by [AGS08], Corollary 1.6 trivially gives the same:
Corollary 1.7. Equation (3) with initial datum $\rho^{\circ,f}$ has a gradient-flow solution.
1.9. Ingredients of the proofs. As in many proofs of large-deviation principles, the core of the argument is Sanov's theorem, which provides a large-deviation principle for independent particles.
In the case of this paper, however, the particles are not only correlated, but the hard-core interaction is a very strong one. A central step in the proof is to replace this strong interaction by a weaker one. This step is done by the second main ingredient, a mapping from the hard-rod particle system $Y^n$ to a system of weakly-interacting zero-length particles called $X^n$. This map appears to have been known at least to Lebowitz and Percus [LP67] and was used to prove the many-particle limit by Rost [Ros84] and later by Bodnar and Velazquez [BV05].
The idea behind this mapping is to map the original collection of rods of length $\alpha/n$ to a collection of zero-length particles by 'collapsing' or 'compressing' them to zero length and moving the rods on the right-hand side up towards the left (see Figure 1). Two particles $Y^n_i$ and $Y^n_{i+1}$ that collide at some time $t_0$ are mapped by this transformation to two particles $X^n_i$ and $X^n_{i+1}$ that occupy the same point $x$ at time $t_0$. While the compressed particles $X^n_i$ and $X^n_{i+1}$ remain ordered for all time ($X^n_i(t) \leq X^n_{i+1}(t)$), the distribution of the empirical measures remains the same if the two particles are allowed to pass each other instead. Mapping the length of the particles to zero therefore allows us to remove the non-passing restriction, and by removing this restriction we eliminate the strong interaction between particles.
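The compression and expansion described above can be sketched for a finite configuration as follows. This is a minimal illustration, not the construction of Section 4: the function names `compress` and `expand` are hypothetical, and shifting by the rank in the sorted list (rather than by a count of strictly smaller points) is a choice made here so that coinciding compressed particles expand to touching rods.

```python
def compress(y, alpha):
    """Map rod positions y (left-hand points) to zero-length particles.

    The k-th rod from the left (k = 0, 1, ...) is shifted to the left by
    k * alpha / n, i.e. by the total length of the rods to its left.
    """
    n = len(y)
    ys = sorted(y)
    return [ys[k] - k * alpha / n for k in range(n)]


def expand(x, alpha):
    """Inverse operation: reinsert a rod length alpha/n per particle to the left."""
    n = len(x)
    xs = sorted(x)
    return [xs[k] + k * alpha / n for k in range(n)]


if __name__ == "__main__":
    y = [1.2, 0.0, 0.5]          # admissible for alpha = 0.6 (rod length 0.2)
    x = compress(y, alpha=0.6)
    print(x)                     # compressed positions, in increasing order
    print(expand(x, alpha=0.6))  # recovers the sorted rod positions up to rounding
```

Two rods that touch, such as `[0.0, 0.2]` with `alpha=0.4`, are sent to the same compressed point, matching the description of colliding particles above.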
The price to pay is that after transformation the effects of the on-site potential $V$ and the interaction potential $W$ come to depend on the whole particle system. This happens because the amount that particle $X^n_i$ should be considered 'shifted to the right' is equal to $\alpha/n$ times the number of particles $X^n_j$ that are, at that moment, to the left of $X^n_i$, and therefore the force exerted by the on-site potential $V$ (for instance) is equal to
$$-V'\Bigl(X^n_i(t) + \frac{\alpha}{n}\,\#\bigl\{ j \in \{1, \dots, n\} : X^n_j(t) < X^n_i(t) \bigr\}\Bigr).$$
This force on the particle $X^n_i$ depends in a discontinuous manner on the positions of all particles. Had this force been smooth, a standard application of Varadhan's Lemma would convert Sanov's theorem into a large-deviation principle for the particle system $X^n$, as done by e.g. Dai Pra and Den Hollander (see [DPdH96] or [dH00, Ch. X]). Since it is not smooth, however, we use a recent result by Hoeksema, Maurelli, Holding, and Tse [HMHT20], that generalizes Varadhan's Lemma to mildly singular and discontinuous forcings (Theorem 6.2 below).
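The discontinuity of this force can be made concrete with a small sketch. It assumes, purely for illustration, the quadratic potential $V(y) = y^2/2$ (so $V'(y) = y$); this potential and the function names are hypothetical and not taken from the paper.

```python
def v_prime(y):
    """Derivative of the illustrative potential V(y) = y**2 / 2 (an assumption)."""
    return y


def on_site_force(x, i, alpha):
    """Force -V'(x_i + (alpha/n) * #{j : x_j < x_i}) on compressed particle i."""
    n = len(x)
    n_left = sum(1 for xj in x if xj < x[i])  # particles strictly to the left
    return -v_prime(x[i] + alpha * n_left / n)


if __name__ == "__main__":
    alpha = 0.5
    # Particle 0 just to the left of particle 1: nothing is to its left.
    print(on_site_force([0.999, 1.0], 0, alpha))
    # Particle 0 just to the right of particle 1: the shift jumps by alpha/n.
    print(on_site_force([1.001, 1.0], 0, alpha))
```

Moving particle 0 across particle 1 changes its position by only 0.002 but changes the argument of $V'$ by an additional $\alpha/n = 0.25$, which is the jump referred to in the text.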
Finally, a fortuitous property of the expansion and compression maps is that they are isometries for the Wasserstein metric. This implies that the metric structure of the large-deviation rate functional $\hat I^f$, expressed in terms of the metric velocity $|\dot\rho|$ and the metric slope $|\partial \hat F|$, transforms transparently from the $X^n$ to the $Y^n$ particle system.
1.10. Conclusion and discussion, part I: Mathematics. We have proved a large-deviation principle on path space for a one-dimensional system of hard rods, in the many-particle limit. This large-deviation principle characterizes the entropy of the system as a function of the density, and identifies the limit evolution as a Wasserstein gradient flow of the entropy.
From a mathematical point of view, this result can be interpreted in different ways:
(1) It rigorously establishes equation (3) as the continuum limit of the particle system, in the sense that the empirical measures $\rho_n$ converge to a solution of (3). While this result was proved for the case $V = W = 0$ by Rost in [Ros84], it is new for the case of non-zero $V$ and $W$.
(2) In addition, it establishes the functional $\hat F$ as the driving functional and the metric $W_2$ as the dissipation of the gradient-flow structure for equation (3). This result is new, also for the case $V = W = 0$.
The difference between $W_2$- and narrow topology. Hidden in the notation of the two large-deviation theorems is a subtlety concerning topology. The $W_2$-topology is central to the gradient-flow structure, and we argue here that this structure arises from the large deviations.
On the other hand, the two large-deviation principles themselves are proved in the narrow topology on P(R), which is weaker. The large-deviation theorems themselves probably do not hold in the stronger W 2 -topology. For independent particles this can be recognized in the characterization of the validity of Sanov's theorem in Wasserstein metric spaces by Wang, Wang, and Wu [WWW10]. These authors show that Sanov's theorem is invalid without exponential moments on the underlying distribution, and this condition is much stronger than the first-moment condition induced by V in the case of this paper.
This raises the question of how the $W_2$-topology is generated by the large-deviation rate function while not being part of the large-deviation principle. The answer is that if $\hat I^f(\rho)$ is finite and if the initial datum $\rho(0)$ is in $\mathcal{P}_2(\mathbb{R})$, then $\rho(t) \in \mathcal{P}_2(\mathbb{R})$ for all time $t$; this is shown in Lemma 8.4. However, $\rho(t) \in \mathcal{P}_2(\mathbb{R})$ is a much weaker property than finiteness of exponential moments of $\rho(t)$, which is necessary for exponential tightness in $W_2$ of the underlying particle system.
1.11. Conclusion and discussion, part II: Consequences for modelling. This large-deviation result also gives rise to a rigorous Variational-Modelling derivation of the limit equation (3). It explains and motivates the choice of the modified entropy $\hat F$ as the driving functional and the Wasserstein distance as the dissipation.
The appearance of the driving functional $\hat F$ is expected. The first integral in $\hat F$ arises as a measure of 'free space' after taking into account the finite length of the particles; this becomes apparent in the discussion of the 'compression' map in Section 4. The second and third integrals are relatively standard contributions from on-site and interaction potentials.
On the other hand, the appearance of the Wasserstein distance as the dissipation metric is unexpected. This is the same metric as for non-interacting particles [DG87,KO90,ADPZ11,Pel14], and the result therefore shows that incorporating steric interactions does not change the dissipation metric, a fact that is surprising at first glance.
This fact can be understood from the proof, however. It is related to the property that the compression and expansion maps are isometries for the Wasserstein distance. The central observation is the following: the total travel distance between a set of initial points $y_1, \dots, y_n$ and final points $y'_1, \dots, y'_n$ is the same as the total travel distance between the corresponding compressed set of initial points $x_1, \dots, x_n$ and final points $x'_1, \dots, x'_n$. This is true because in the minimization problem (8) the optimal permutation $\sigma$ of the particles is such that particles preserve their ordering, and therefore the compression mapping moves the points $y'_{\sigma(i)}$ and $y_i$ to the left by the same amount.
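This isometry property can be checked numerically on a small example. The sketch below is an illustration under assumptions: it uses the sorted-matching formula for $W_2$ between empirical measures with equally many atoms, a rank-based compression map, and two admissible configurations chosen by hand; all names are hypothetical.

```python
import math


def compress(y, alpha):
    """Shift the k-th rod from the left (k = 0, 1, ...) by k * alpha / n."""
    n = len(y)
    ys = sorted(y)
    return [ys[k] - k * alpha / n for k in range(n)]


def w2(y, y_prime):
    """W_2 between empirical measures with n atoms each (sorted matching)."""
    n = len(y)
    cost = sum((a - b) ** 2 for a, b in zip(sorted(y), sorted(y_prime)))
    return math.sqrt(cost / n)


if __name__ == "__main__":
    alpha = 0.6                       # rod length alpha/n = 0.2 for n = 3
    y = [0.0, 0.5, 1.2]               # initial rod positions (admissible)
    y_prime = [0.1, 0.9, 2.0]         # final rod positions (admissible)
    d_rods = w2(y, y_prime)
    d_points = w2(compress(y, alpha), compress(y_prime, alpha))
    # The two distances should coincide up to rounding.
    print(d_rods, d_points)
```

Because the k-th rod from the left is shifted by the same amount in both configurations, each term in the sorted-matching sum is unchanged, which is the argument given in the paragraph above.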
This result therefore is intrinsically limited to the one-dimensional setup of this paper. In higher dimensions there is no such compression map, but one can still wonder whether the dissipation of particles with finite and with zero size might both be represented by the Wasserstein distance. This appears not to be the case: we illustrate this in Figure 2. In addition, in the case of multiple species the metric can certainly not be Wasserstein, since particles moving in opposite directions will be forced to move around each other.
Figure 2. In higher dimensions the metric will not be Wasserstein. In one dimension (left), linear interpolation of particle positions preserves admissibility: if the initial and final positions do not overlap, then the intermediary positions also do not overlap. In higher dimensions, this is false: two spheres arranged in admissible configurations may collide under linear interpolation (top right). We expect that the metric in higher dimensions therefore will be non-Wasserstein, since it will have to accommodate particles 'moving around' each other (bottom right).

Comparison with Bruna & Chapman's approximate equation.
In a series of publications [Bru12, BC12b, BC12a, BC14], Bruna and Chapman analyze systems of hard spheres with Brownian noise in the limit of small volume fraction. Their approach is to apply a singular-limit analysis to the Fokker-Planck equation associated with the particles, and this allows them to address this issue in all dimensions and for finite numbers of particles. For the setup of this paper with $W = 0$, Bruna finds an approximate equation (19) in the small-$\alpha$ limit [Bru12, App. D]. This equation is also found by a Taylor development of the denominator in (3). Similarly applying a formal Taylor development to $\hat F$ in (7), we find that equation (19) has a formal 'approximate' gradient flow structure. Bruna, Burger, Ranetbauer, and Wolfram study the concept of approximate gradient-flow structures in more detail in [BBRW17b,BBRW17a].
Comparison with Poisson-Nernst-Planck type models with steric effects. As described in the introduction, a wide family of generalized Poisson-Nernst-Planck models has been derived by modelling the effect of the finite particle size on the driving functional (free energy) of the system, while assuming that the dissipation mechanism is the same as for systems with point ions. Our work shows that the last assumption is valid for the case of a single species of hard rods in one spatial dimension, and is therefore consistent with the current literature. As illustrated above, however, in the case of multiple species in higher dimensions a form of cross-diffusion is to be expected. We present an example of such a system for charged particles in [GEY18]. Here the mobility matrix is nonlinear and degenerate, in that transport of particles of species A to a region diminishes with increasing concentration of that species in that region. The mobility matrix is also non-diagonal, reflecting inter-diffusion, i.e., the movement of an ionic species must involve counter movement of water and other ionic species. (This is also observed in limits of lattice models with exclusion; see e.g. [BSW12].) Furthermore, while in classical Poisson-Nernst-Planck theory the diffusivity of the ions is proportional to their concentration, the modified equations show a super-linear increase of diffusivity with ionic concentration. This increase reflects the solvent's tendency to diffuse to the regions of high ionic concentration, and may be a significant effect since the entropy per volume of many small particles is larger than the entropy of fewer larger particles and the solvent molecules are typically significantly smaller than the ions. This work should be considered a step towards the study of such systems.
1.12. Overview of the paper. In Section 2 we introduce the Wasserstein distance, Wasserstein gradient flows, and inverse cumulative distribution functions, which play a central role in the analysis. In Section 3 we introduce large-deviation principles. In Section 4 we formally define the systems that we study and the compression and expansion maps that we mentioned above. In Section 5 we formally define various functionals that appear in the analysis, and prove a number of properties. In Sections 6, 8, and 9 we prove Theorems 1.1 and 1.4 in three stages, while Section 7 is devoted to a number of estimates used in Section 8.
1.13. Notation. We sometimes write R x and R y to distinguish state spaces for particle systems of 'compressed' particles (usually called X n , sometimes Z n ) and 'expanded' particles Y n . For measures µ on R we write µ(x) for the Lebesgue density and µ(dx) for the measure inside an integral. For time-dependent measures µ(t, dx) we write both µ(t) and µ t for the measure µ(t, ·), and correspondingly µ 0 and µ(0) both indicate the measure µ(0, ·).

Symbol       Meaning                                                             Reference
|ρ̇|(t)       Metric derivative
L X , L Y    Generators for X n and Y n stochastic particle systems             Def. 4.5
Ω n          State space for particle system Y n                                 (1)
P 2 (R)      Probability measures with finite second moments and W 2 -metric    Sec. 2.3
P n (R)      Empirical measures of n points on R                                 (32)
P inv n      Invariant measure for Y n                                           (4)

2. Measures and the Wasserstein metric
The Wasserstein gradient of a functional F̂, and the corresponding gradient flow, was informally defined in Section 1.5. There is an extensive literature on the Wasserstein metric and its properties [Vil03, AGS08, Vil09, San15], but for the discussion of this paper we only need a number of facts, which we summarize in this section.
2.1. Preliminaries on one-dimensional measures. The concept of push-forward will be used throughout this work:
Definition 2.1 (Push-forwards). Let f : R → R be Borel measurable, and µ ∈ P(R). The push-forward f # µ ∈ P(R) is the measure µ ∘ f −1 , and has the equivalent characterization
\[ \int_{\mathbb R} \varphi(y)\,(f_\#\mu)(dy) = \int_{\mathbb R} \varphi(f(x))\,\mu(dx) \]
for all Borel measurable ϕ : R → R.
The Wasserstein distance and the energy functionals in this paper have convenient representations in terms of inverse cumulative distribution functions.
Definition 2.2 (Inverse cumulative distribution functions). Let µ ∈ P(R). Let F : R → [0, 1] be the right-continuous cumulative distribution function:
\[ F(x) := \mu\bigl((-\infty, x]\bigr). \]
Then the inverse cumulative distribution function X of µ is the generalized (right-continuous) inverse of F ,
\[ X(m) := \inf\{x \in \mathbb R : F(x) > m\}. \]
The following lemma collects some well-known properties of inverse cumulative distribution functions.
Lemma 2.3. Let µ ∈ P(R) and let X be the inverse cumulative distribution function of µ.
(1) X is non-decreasing and right-continuous;
(2) If µ is absolutely continuous, then F (X(m)) = m, X′(m) exists for Lebesgue-almost-every m ∈ [0, 1], and for those m we have X′(m) = 1/µ(X(m));
(3) For all Borel measurable ϕ : R → R,
\[ \int_{\mathbb R} \varphi(x)\,\mu(dx) = \int_0^1 \varphi(X(m))\,dm. \tag{20} \]
Proof. The function X is obviously non-decreasing, and the right-continuity is a direct consequence of the definition. To characterize X′(m) for absolutely-continuous µ, first note that F then is an absolutely continuous function; by [Win97, Prop. 1] we then have F (X(m)) = m for all m ∈ [0, 1]. In particular X is injective. Since X is monotonic, it is differentiable at almost all m ∈ [0, 1]. Let M be the set of such m; then for each m ∈ M and h ≠ 0,
\[ 1 = \frac{F(X(m+h)) - F(X(m))}{h} = \frac{F(X(m+h)) - F(X(m))}{X(m+h) - X(m)} \cdot \frac{X(m+h) - X(m)}{h}. \]
First, assume that X′(m) = 0. The identity above then implies that F is not differentiable at x = X(m); the set X of such x is a Lebesgue null set of R, and since the function F has the 'Lusin N' property [Bog07, Def. 9.9.1] the corresponding set of values F (X ) has Lebesgue measure zero as well. For all m in the full-measure set M \ F (X ) we therefore have that X′(m) exists and is non-zero, and by the calculation above X′(m) = 1/F′(X(m)) = 1/µ(X(m)). Finally, the transformation rule (20) is proved in [Win97, Th. 2].
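As an illustration of the inverse cumulative distribution function and of the transformation rule in part 3 of Lemma 2.3, the following minimal Python sketch treats the special case of an empirical measure, for which X is a step function given by order statistics. The function names (`empirical_icdf`, `integrate_over_mass`) and the sample points are hypothetical and chosen only for this example; the right-continuous convention X(m) = inf{x : F(x) > m} is assumed.

```python
import math

def empirical_icdf(points):
    """Return X(m) = inf{x : F(x) > m} for the empirical measure of `points`."""
    xs = sorted(points)
    n = len(xs)

    def X(m):
        if not 0.0 <= m < 1.0:
            raise ValueError("m must lie in [0, 1)")
        # F(x) > m  <=>  #{i : x_i <= x} > n*m, so X(m) is the order statistic
        # with 0-based index floor(n*m); this makes X right-continuous in m.
        return xs[min(int(math.floor(n * m)), n - 1)]

    return X

def integrate_over_mass(phi, X, n_cells):
    """Midpoint rule for int_0^1 phi(X(m)) dm with n_cells equal cells."""
    return sum(phi(X((j + 0.5) / n_cells)) for j in range(n_cells)) / n_cells

points = [0.3, -1.2, 2.5, 0.3, 1.0]
X = empirical_icdf(points)
phi = lambda x: x * x

# Left-hand side: integral of phi against the empirical measure.
lhs = sum(phi(x) for x in points) / len(points)
# Right-hand side: integral of phi(X(m)) over m; the midpoint rule is exact
# here because X is constant on each interval [i/n, (i+1)/n).
rhs = integrate_over_mass(phi, X, 4 * len(points))
```

For an empirical measure the two sides agree up to rounding, which is the discrete counterpart of the identity used repeatedly in the proofs below.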
2.2. Narrow topology and the dual bounded-Lipschitz metric. We will be using two topologies on spaces of probability measures. The first type is the narrow topology, often called the weak topology of measures, which can be defined in various ways. For the purposes of this paper it is convenient to introduce it through the set BL(R) of bounded Lipschitz functions on R, with norm
\[ \|f\|_{BL} := \|f\|_\infty + \mathrm{Lip}(f). \]
The narrow convergence on P(R) is metricised by duality with the set of bounded Lipschitz functions, leading to the dual bounded-Lipschitz metric
\[ d_{BL}(\mu,\nu) := \sup\Bigl\{ \int_{\mathbb R} f\,d\mu - \int_{\mathbb R} f\,d\nu \;:\; f \in BL(\mathbb R),\ \|f\|_{BL} \le 1 \Bigr\}. \]
Alternative ways of defining the same topology are by the Lévy metric or through duality with continuous and bounded functions [Rac91]. When we write P(R), we implicitly equip the space with the dual bounded-Lipschitz metric d BL .
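2.3. The Wasserstein metric. We write P 2 (R) for the space of probability measures with finite second moments,
\[ \mathcal P_2(\mathbb R) := \Bigl\{ \mu \in \mathcal P(\mathbb R) : \int_{\mathbb R} x^2\,\mu(dx) < \infty \Bigr\}. \]
Definition 2.4 (Wasserstein distance). The Wasserstein distance of order 2 between measures µ and ν in P 2 (R) is defined by
\[ W_2(\mu,\nu)^2 := \inf_{\gamma \in \Gamma(\mu,\nu)} \int_{\mathbb R\times\mathbb R} |x-y|^2\,\gamma(dx\,dy), \]
where Γ(µ, ν) is the set of couplings ('transport plans') of µ and ν, i.e. of measures γ ∈ P(R × R) such that
\[ \gamma(A\times\mathbb R) = \mu(A) \quad\text{and}\quad \gamma(\mathbb R\times A) = \nu(A) \]
for all Borel sets A ⊂ R.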
In this paper we always consider P 2 (R) to be equipped with the metric W 2 .
Lemma 2.5 (Properties of the Wasserstein metric).
(2) We have the characterization
\[ W_2(\mu,\nu)^2 = \int_0^1 \bigl| X_\mu(m) - X_\nu(m) \bigr|^2\,dm, \]
where X µ and X ν are the inverse cumulative distribution functions of µ and ν.
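For two empirical measures with the same number n of atoms, the inverse cumulative distribution functions are step functions, and the characterization of W 2 in terms of them reduces to matching the sorted points: W 2 squared equals the average of the squared differences of order statistics. The following Python sketch, with hypothetical function names and sample data, compares this with a brute-force minimization over all permutation couplings (which suffices for equal-weight atoms). It is only a sanity check for small n, not an efficient algorithm.

```python
import itertools
import math

def w2_sorted(xs, ys):
    """W_2 between two empirical measures via sorted points (icdf formula)."""
    if len(xs) != len(ys):
        raise ValueError("both measures need the same number of atoms")
    n = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)

def w2_bruteforce(xs, ys):
    """W_2 by minimizing the quadratic cost over all permutation couplings."""
    n = len(xs)
    best = min(
        sum((a - b) ** 2 for a, b in zip(xs, perm))
        for perm in itertools.permutations(ys)
    )
    return math.sqrt(best / n)

xs = [0.0, 2.0, 5.0, 1.5]
ys = [3.0, -1.0, 0.5, 4.0]
d_sorted = w2_sorted(xs, ys)      # sorted points differ by 1 each, so this is 1.0
d_brute = w2_bruteforce(xs, ys)   # 4! = 24 couplings checked
```

The agreement of the two values reflects the fact that in one dimension the monotone rearrangement is optimal for the quadratic cost.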
and the continuity equation In addition, if µ ∈ AC 2 ([0, T ]; P 2 (R)), then the metric derivative |μ| defined in (13) exists at a.e. t ∈ [0, T ], and satisfies Proof. The statement follows directly from [AGS08, Th. 8.3.1]; we only need to prove unique- We then calculate for a < b and Since the right-hand side does not depend on (b − a), we find , and therefore f = 0. It follows that µv and µṽ are almost everywhere equal.
The subdifferential is a closed convex subset of L 2 (µ); if it is non-empty, it therefore admits a unique element (1) In the context of Definition 2.9, if the subdifferential is non-empty, then the local slope (14) is finite and satisfies (2) The following chain rule holds. Let F : P 2 (R) → R ∪ {∞} be λ-convex, and let µ ∈ AC 2 ([0, T ]; P 2 (R)) be such that (a) µ t is Lebesgue-absolutely-continuous and ∂F For any 0 ≤ s ≤ t ≤ T and any selection ξ σ ∈ ∂F (µ σ ) we then have where v t is the velocity field given by Lemma 2.7.
2.5. Wasserstein gradient flows. Recall from Section 1.6 the definition of 'gradient flow' that we use here, applied to the case of the Wasserstein metric space P 2 (R) and a functional F : By [AGS08, Th. 11.1.3], if F is proper, lower semicontinuous, and λ-convex, then solutions in this sense satisfy the pointwise property In the case of the Wasserstein gradient flow of F̂, we show in Lemma 5.1 that F̂ satisfies these properties, and that ∂°F̂(ρ) is ∂ y ξ, where the function ξ was already introduced in (9), If ρ is a gradient-flow solution with F̂(ρ 0 ) < ∞, then writing (29) as it follows that |∂F̂(ρ t )| < ∞ and therefore ∂ y ξ(ρ t ) ∈ L 2 (ρ t ) for almost all t. Therefore solutions ρ of the gradient flow of F̂ satisfy in the sense of distributions on (0, T ) × R. Comparing equation (30) with (11) it follows that ρ satisfies (11) in the sense of distributions if we prove that ∂ y ψ(ρ t ) (y) = ρ t (y) ∂ y ψ(ρ t ) (y) in the sense of distributions on (0, T ) × R. This identity follows from the next Lemma and the fact that ∂ y ξ(ρ t ) ∈ L 2 (ρ t ).

3. Large-deviation principles
The theory of large deviations characterizes the probability of events that become exponentially small in an asymptotic sense. Consider a sequence of probability measures {γ n } ∞ n=1 on some space X . Large-deviation theory describes exponentially small probabilities under the γ n 's in the limit n → ∞, in terms of a rate function I : X → [0, ∞], in the following (rough) sense: for A ⊂ X ,
\[ \gamma_n(A) \approx \exp\Bigl( -n \inf_{x\in A} I(x) \Bigr) \qquad\text{as } n\to\infty. \]
This is formalized by the notion of a large-deviation principle. Before giving the definition we define the type of functions I of interest in this setting; X is here taken to be a complete separable metric space.
Definition 3.1 (Rate function). A function I : X → [0, ∞] is called a rate function if it is lower semicontinuous; it is called a good rate function if in addition its sublevel sets {x ∈ X : I(x) ≤ c}, c ≥ 0, are compact.
Note that for a good rate function lower semicontinuity follows from the compact sublevel sets.
We are now ready to state the definition of a large-deviation principle. The definition can be made more general, however the following form suffices for this paper.
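Definition 3.2. Let {γ n } be a sequence of probability measures on a complete separable metric space X . We say that the sequence {γ n } satisfies a large-deviation principle with rate function I : X → [0, ∞] if for every measurable set A ⊂ X ,
\[ -\inf_{x\in A^\circ} I(x) \;\le\; \liminf_{n\to\infty} \frac1n \log \gamma_n(A) \;\le\; \limsup_{n\to\infty} \frac1n \log \gamma_n(A) \;\le\; -\inf_{x\in \bar A} I(x), \]
where A° and Ā denote the interior and closure, respectively, of the set A.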
This definition is also referred to as a strong large-deviation principle, and there is a related notion of a weak large-deviation principle: the sequence {γ n } is said to satisfy a weak large-deviation principle, with rate function I, if the lower bound in the previous definition holds for all measurable sets, and the following upper bound holds for every α < ∞:
\[ \limsup_{n\to\infty} \frac1n \log \gamma_n(A) \le -\alpha \qquad\text{for all compact } A \subset \{x\in\mathcal X : I(x) > \alpha\}. \]
If γ n → δ x for some x ∈ X , that is, if the sequence of underlying random elements has a unique deterministic limit as n → ∞, then I(x) = 0 and I(x̃) > I(x) for all x̃ ∈ X \ {x}.
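A standard textbook illustration of the definition, not taken from this paper, is the sample mean of n independent standard Gaussian variables, which satisfies a large-deviation principle with rate function I(x) = x²/2. For the set A = [a, ∞) with a > 0 both the interior and the closure give the infimum a²/2, so (1/n) log γ n (A) should approach −a²/2. The Python sketch below evaluates this quantity from the exact Gaussian tail; the function name `log_tail_prob_per_n` is hypothetical.

```python
import math

def log_tail_prob_per_n(a, n):
    """(1/n) * log P(mean of n i.i.d. N(0,1) variables >= a).

    The sample mean is N(0, 1/n), so the tail probability equals
    0.5 * erfc(a * sqrt(n / 2)).
    """
    p = 0.5 * math.erfc(a * math.sqrt(n / 2.0))
    return math.log(p) / n

a = 1.0
rates = {n: log_tail_prob_per_n(a, n) for n in (10, 100, 1000)}
limit = -0.5 * a * a  # minus the infimum of I(x) = x^2 / 2 over [a, infinity)
```

The values in `rates` approach `limit` slowly, with a correction of order (log n)/n. This is typical: a large-deviation principle only captures the exponential rate and not the prefactor.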
A useful result when dealing with large deviations is the so-called contraction principle, a continuous-mapping-type theorem for the large-deviation setting.
Theorem 3.4 (Contraction principle for large deviations [DZ98]). Let X and Y be two complete separable metric spaces and f : X → Y a continuous mapping. Suppose the sequence {γ n } ⊂ P(X ) satisfies a large-deviation principle with good rate function I : X → [0, ∞]. Then the sequence of push-forward measures {f # γ n } satisfies a large-deviation principle on Y with good rate function Ĩ defined as
\[ \tilde I(y) := \inf\{ I(x) : x \in \mathcal X,\ f(x) = y \}. \]
This result can be extended to 'approximately continuous' maps (see [DZ98, Section 4.2]). To prove the main theorems of this paper we will use both the standard contraction principle above and a version with n-dependent maps.
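To make the contraction formula concrete, consider a toy example that is not part of this paper: the pair of sample means of two independent sequences of standard Gaussians has rate function I(x1, x2) = (x1² + x2²)/2, and the continuous map f(x1, x2) = x1 + x2 yields the contracted rate function Ĩ(y) = inf{I(x1, y − x1)} = y²/4. The Python sketch below approximates this infimum on a grid; the names and the grid parameters are assumptions made for illustration.

```python
def rate_pair(x1, x2):
    """Rate function of two independent Gaussian sample means."""
    return 0.5 * (x1 * x1 + x2 * x2)

def contracted_rate(y, half_width=10.0, steps=20001):
    """Grid approximation of inf{ rate_pair(x1, x2) : x1 + x2 = y }.

    The constraint f(x1, x2) = y is parametrized by x1, with x2 = y - x1.
    """
    best = float("inf")
    for k in range(steps):
        x1 = -half_width + 2.0 * half_width * k / (steps - 1)
        best = min(best, rate_pair(x1, y - x1))
    return best

y = 3.0
approx = contracted_rate(y)
exact = y * y / 4.0  # attained at x1 = x2 = y / 2
```

The grid value can only overestimate the infimum, and for this smooth convex problem the error is of the order of the squared grid spacing.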
We will also use the following 'mean-field' localization result twice.
Lemma 3.5 (Simple mean-field large-deviations result). Let X be a metric space. For each n let P n ∈ P(X ), and for each n and each y ∈ X let Q y n ∈ P(X ). Assume that for each y, Q y n satisfies a strong large-deviation principle with good rate function I y . Let f : X → R be lower semi-continuous. Assume that (2) If I y (y) = ∞, sup δ>0 sup n≥1 1 n log P n (B δ (y)) − 1 n log Q y n (B δ (y)) + f (y) =: C < ∞.
Then P n satisfies a weak large-deviation principle with good rate function x → I x (x) + f (x).
Proof. By [DZ98, Th. 4.1.11], the sequence P n satisfies a weak large-deviation principle provided that for all y ∈ X ,
\[ \lim_{\delta\downarrow0}\,\liminf_{n\to\infty} \frac1n \log P_n(B_\delta(y)) = \lim_{\delta\downarrow0}\,\limsup_{n\to\infty} \frac1n \log P_n(B_\delta(y)), \tag{31} \]
in which case the common value of the two is the negative of the rate function at y.
If y is such that I y (y) < ∞, then by condition 1, and using the lower semi-continuity of I y , Similarly, This proves (31) for the case I y (y) < ∞. If I y (y) = ∞, then by condition 2, This concludes the proof of the lemma.
4. The particle system Y n and the transformed particle system X n
As mentioned in the introduction, the proofs of the results of this paper are based on a 'compression' mapping that is very specific for this system, and which was already illustrated in Figure 1.
4.1. The 'compression' map. The idea is to consider a collection of rods of length α/n in the one-dimensional domain R y , described by their empirical measure, and map them to a collection of zero-length particles by 'collapsing' them to zero length and moving the rods on the right-hand side up towards the left. For notational reasons we prefer to define the inverse operation, which is to map zero-length particles in R x to particles of length α/n in R y by 'expanding' each zero-length particle to length α/n and 'pushing along' all the particles to the right.
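This mapping comes in two forms, one for the discrete case and one for the continuous case. For convenience we write P n (E) for the set of empirical measures of n points, i.e.
\[ \mathcal P_n(E) := \Bigl\{ \frac1n \sum_{i=1}^n \delta_{x_i} : x_i \in E \Bigr\}. \]
(1) The operator A n maps empirical measures of zero-length particles to the corresponding empirical measures of rods by expanding each particle by α/n:
\[ A_n : \mathcal P_n(\mathbb R_x) \to \mathcal P_n(\mathbb R_y), \qquad \frac1n\sum_{i=1}^n \delta_{x_i} \;\mapsto\; \frac1n\sum_{i=1}^n \delta_{x_i + \alpha(i-1)/n}, \]
where we assume that the x i are ordered (x i ≤ x i+1 ).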
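The discrete expansion map can be sketched in a few lines of Python. The sketch assumes that the i-th ordered particle is shifted by α(i−1)/n, and uses hypothetical function names; it checks three properties on sample data: consecutive expanded positions are at least α/n apart (so rods of length α/n do not overlap), compression undoes expansion, and the map preserves the Wasserstein distance between empirical measures, computed here by matching sorted points.

```python
import math

def expand(xs, alpha):
    """Shift the i-th ordered particle (0-based rank i) by alpha * i / n."""
    n = len(xs)
    return [x + alpha * i / n for i, x in enumerate(sorted(xs))]

def compress(ys, alpha):
    """Inverse of expand: remove the rank-dependent shift."""
    n = len(ys)
    return [y - alpha * i / n for i, y in enumerate(sorted(ys))]

def w2_sorted(xs, ys):
    """W_2 between two n-point empirical measures via sorted points."""
    n = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)

alpha = 0.8
xs = [1.0, 0.0, 0.4, 0.0]    # compressed particles; coincidences are allowed
zs = [0.7, -0.3, 2.0, 0.1]   # a second compressed configuration
ys = expand(xs, alpha)       # approximately [0.0, 0.2, 0.8, 1.6]

min_gap = min(b - a for a, b in zip(ys, ys[1:]))
round_trip = compress(ys, alpha)
dist_compressed = w2_sorted(xs, zs)
dist_expanded = w2_sorted(expand(xs, alpha), expand(zs, alpha))
```

The isometry is immediate in this discrete setting, since both configurations receive the same rank-dependent shift and the shifts cancel in the differences of order statistics.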
To estimate the difference W 2 (Aµ, A n µ n )−W 2 (µ, µ n ), the same formulation of the Wasserstein distance in terms of icdf's becomes The opposite inequality follows similarly. Finally, to prove part 3, the fact that A maps 1 n n i=1 δ x i to 1 n n i=1 δ α,n x i +α(i−1)/n ∈ P(R y ) follows from remarking that δ x 1 is mapped by A to the left-most smeared delta function δ α,n x 1 ; the second one, δ x 2 , to δ α,n x 2 +α/n ; and so forth.
4.2. Mapping particle systems. The compression and decompression maps A n and A −1 n create a one-to-one connection between two stochastic particle systems, which is the basis for the proofs of the two main theorems. We now make this connection explicit. First, given a measure µ on R x , the maps A and A n induce corresponding maps from R x to R y , made explicit by the following Lemma. (37) , and x i < x i+1 for all i, then A n µ = (T µ ) # µ.
(2) If X is the icdf of an absolutely continuous µ, then (3) If µ is Lebesgue-absolutely-continuous, andμ ∈ P(R x ), then Aµ = (T µ ) # µ and (t Aμ Next, assume that µ ∈ P(R x ) is Lebesgue-absolutely-continuous; then the cumulative distribution function F (x) := µ((−∞, x]) is continuous, and consequently the inverse cumulative distribution function X satisfies F (X(m)) = m for all m (Lemma 2.3). The expression (38) then follows from remarking that Turning to part 3, we have for any ϕ ∈ C b (R y ), writing Y for the icdf of Aµ as in Definition 4.1, This proves Aµ = (T µ ) # µ for absolutely-continuous µ.
Finally, to prove that (t Aμ Aµ − id)Aµ = (T µ ) # (tμ µ − id)µ , we write similarly, using (20) and (23), 4.3. The particle systems of this paper. We now state the assumptions on V and W and define precisely the systems of particles that we consider in this paper.
Assumption 4.4 (Assumptions on V and W .). Throughout the paper we make the following assumptions.
(V ) The function V : R → R is C 2 (R), globally Lipschitz, V is C 1 b (R) and there exist constants c 1 > 0, c 2 > 0 such that (W ) The function W : R → R is C 2 (R), bounded and even, and W is C 1 b (R). We will use two consequences of these assumptions: The first set of particles Y n i was already informally defined in the introduction; the second set X n i is a compressed version of Y n i . We will often use the notation η n for the empirical measure of a set of particles, Definition 4.5.
(2) For each n ∈ N, the system of particles X n = (X n i ) i=1,...,n ⊂ C([0, ∞); R n x ) is defined by the generator where the drift b is now given by Lemma 4.6. For these two particle systems, weak solutions exist and are unique, and at each t > 0 the laws of X n (t) and Y n (t) are absolutely continuous with respect to the Lebesgue measure.
This result is more-or-less standard, and the proof is given in the Appendix. In fact, throughout this paper, unless explicitly stated otherwise, whenever we speak of existence or uniqueness of a solution of a stochastic differential equation, we are referring to the existence of weak solutions and uniqueness in law [KS98, Section 5.3]. Henceforth, unless required for the argument at hand, we do not go into details (such as corresponding filtrations, or similar aspects) about the weak solutions under study.
The following lemma makes the relationship between the two particle systems precise.
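Lemma 4.7 (Equality of distributions). Let
\[ \rho_n(t) = \frac1n \sum_{i=1}^n \delta_{Y^n_i(t)} \qquad\text{and}\qquad \mu_n(t) = \frac1n \sum_{i=1}^n \delta_{X^n_i(t)} \]
be the empirical measures of the particle systems Y n and X n . The stochastic processes ρ n and A n µ n have the same distribution in C([0, ∞); P(R y )).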
Proof. The idea of this property goes back to Rost [Ros84], who used it for the particle system Y n without potentials V and W . Because of the additional complexity of the two potentials V and W we give an independent proof. Since every function of η n (x) maps one-to-one to a symmetric function of x (that is, a function f : R n → R such that f (x 1 , . . . , x n ) = f (x σ 1 , . . . , x σn ) for all permutations σ), the martingale problem for the random measure-valued process ρ n = η n (Y n ) can be reformulated as the property that
\[ f(Y^n(t)) - \int_0^t (L_Y f)(Y^n(s))\,ds \tag{44} \]
is a martingale for all symmetric f ∈ D(L Y ).
Given the process X n , consider the transformed process Ỹ n that is given by the expression (at each time t)
\[ \tilde Y^n := \bigl( T_{\eta_n(X^n)} X^n_1,\; T_{\eta_n(X^n)} X^n_2,\; \dots,\; T_{\eta_n(X^n)} X^n_n \bigr). \]
Whenever all X n i are distinct, we have η n (Ỹ n ) = A n η n (X n ) by part 1 of Lemma 4.3. Since the X n i are almost surely distinct at any time, we have proved the lemma if we show that Ỹ n satisfies (44).
Note that for any x ∈ R n without collisions, i.e. with x i ≠ x j for i ≠ j, The equality ( * ) holds because each of the terms #{ℓ : x ℓ < x j } is constant away from the set of collisions. With this expression we find that e.g. for each k, and by collecting similar arguments we conclude that Also note that the function f • T ηn is an element of D(L X ). This follows since at non-collision points f • T ηn is as smooth as f (by the same constancy argument as above); at the collision set, f • T ηn connects with regularity C 2 by the C 2 -regularity of f in Ω n , the boundary condition ∂ n f = 0, and the symmetry of f .
To conclude the proof, we show that Y n satisfies (44) by rewriting and this expression is a martingale by the properties of X n . Note that although the identity (45) holds only for non-collision points x, the process X n spends zero time on the set of remaining points. Therefore the identity ( * * ) above holds almost surely.
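In the special case without potentials (V = W = 0, the situation considered by Rost), the lemma suggests a very simple way to sample the empirical measure of the hard-rod system: evolve n independent Brownian particles in the compressed domain and expand the ordered configuration at every observation time. The Python sketch below does this under the assumptions of zero drift, unit diffusion coefficient, and standard Gaussian initial positions, none of which is prescribed by the text; the function names are hypothetical. Because the Brownian increments are sampled exactly, no time-discretization error is introduced at the grid times.

```python
import math
import random

def simulate_hard_rods(n, alpha, t_final, n_steps, seed=0):
    """Sample expanded (rod) configurations at n_steps + 1 grid times.

    Assumes V = W = 0: the compressed particles are independent Brownian
    motions, and the rods are obtained by the rank-dependent shift
    alpha * i / n applied to the sorted compressed positions.
    """
    rng = random.Random(seed)
    dt = t_final / n_steps
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed initial law
    path = []
    for _ in range(n_steps + 1):
        ranked = sorted(xs)
        path.append([x + alpha * i / n for i, x in enumerate(ranked)])
        # Exact Brownian increment over a time step of length dt.
        xs = [x + math.sqrt(dt) * rng.gauss(0.0, 1.0) for x in xs]
    return path

def min_gap(path):
    """Smallest distance between neighbouring rods over all sampled times."""
    return min(b - a for ys in path for a, b in zip(ys, ys[1:]))

n, alpha = 20, 0.5
path = simulate_hard_rods(n, alpha, t_final=1.0, n_steps=50, seed=1)
smallest = min_gap(path)
```

Only the empirical measure is reproduced in this way; the labels of individual rods are not tracked, which is consistent with the fact that the lemma is a statement about ρ n and A n µ n .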
Lemma 4.8 (Transformed version of P inv n and Z n ). The particle system X n has invariant measure P n ∈ P(R n x ), given by The normalization constant Z n is the same as in (5) and can be written as This property follows from arguments very similar to those of Lemma 4.7, and we omit the proof.

5. The functionals of this paper
With the maps A and T ρ defined in the previous section, we can also define the various functionals that we use in this paper. The functional F̂ as defined in the introduction is one of these; in this section we review this definition and place it in a larger context.
We define in total six functionals, three functionalsF, Ent V , andÊ W , on the set of "expanded" measures P(R y ), and at the same time three transformed versions F, Ent V , and E W on the set of "compressed" measures P(R x ). We split the definition ofF of the introduction up into an entropic part Ent V and an interaction-energy partÊ W : c. and ρ(y) < 1/α a.e., +∞ otherwise (47) E W (ρ) := 1 2 Ry Ry W (y − y ) ρ(dy)ρ(dy ), for ρ ∈ P(R y ).
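The constants C F and C Ent are such that inf Ent V = inf F̂ = 0. The integral in Ê W is well-defined by the boundedness of W , and we show in Lemma 5.1 below that the integral in Ent V is well-defined in (−∞, +∞]. We then define the corresponding functionals on "compressed" space P(R x ) through the isometry A:
\[ \mathcal F(\mu) := \hat{\mathcal F}(A\mu), \qquad \mathrm{Ent}_V(\mu) := \widehat{\mathrm{Ent}}_V(A\mu), \qquad \mathcal E_W(\mu) := \hat{\mathcal E}_W(A\mu), \qquad \mu \in \mathcal P(\mathbb R_x). \]
We also need the relative entropy: for two measures µ and ν on the same space,
\[ H(\mu|\nu) := \begin{cases} \displaystyle\int \log\frac{d\mu}{d\nu}\,d\mu & \text{if } \mu \ll \nu, \\ +\infty & \text{otherwise.} \end{cases} \]
Lemma 5.1 (Properties of the functionals).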
(3) The functionals F andF are lower semicontinuous and λ-convex for some λ ∈ R.
(4) (Subdifferential of F.) If µ is Lebesgue-a.c. and where b was already given in (43): is the element of minimal norm of ∂F(ρ).
Remark 5.2 (Mean-field structure of the compressed rate functions). The invariant-measure rate function F̂ and the dynamic rate function Î that we introduce below both have a particular form. This is best observed in (51) and in (56) below: the argument of the functional appears twice, first as the first argument in the relative entropy, and secondly as a parameter in the reference measure. This is a common structure in mean-field interacting particle systems (see e.g. [Léo95] or [dH00, Ch. X]). It reflects the fact that once the system has been 'compressed' (i.e., transformed to X n ) the interaction between the particles has a 'nearly-weakly-continuous' dependence on the empirical measure. The estimate (64) below illustrates this: while 1 n log dP n /dQ ν n is not completely continuous in the empirical measure (the right-hand side does not vanish as δ → 0 for finite n), the discontinuity does vanish in the limit n → ∞.
This structure is reflected in the fact that the entropic part of the free energy F is of the Gibbs-Boltzmann type µ log µ. By contrast, the expanded system has a different entropic term ρ log(ρ/(1 − αρ)), which reflects the fact that in the expanded system the particles have a strong interaction with each other.
Proof. We first show that the integrals in (49) and (47) are well defined; since µ → R µ log µ is unbounded from below on the space of probability measures, this is not immediate. For the first integral in (49), we write µ V (dx) := e −2V (Tµx) dx, and use the inequality s − ≤ (s+t) − +|t| for the negative part s − := max{−s, 0} to estimate It follows that the first integral in (49) is well defined in (−∞, ∞], and a similar calculation shows the same for the first integral in (47). We next prove the formula (49) for the functional F. Since F(µ) is defined as Ent V (Aµ) + E W (Aµ) + C F , finiteness of F(µ) implies that µ is absolutely continuous. The second integral in (49) then follows by Lemma 4.3:

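\[ \mathcal E_W(\mu) = \hat{\mathcal E}_W(A\mu) = \frac12 \int_{\mathbb R_y}\!\int_{\mathbb R_y} W(y-y')\,(A\mu)(dy)\,(A\mu)(dy') = \frac12 \int_{\mathbb R_x}\!\int_{\mathbb R_x} W\bigl(T_\mu x - T_\mu x'\bigr)\,\mu(dx)\,\mu(dx'). \]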
We turn to the first integral in (49). Again let F(µ) be finite, which implies that there exists ρ such that ρ = Aµ and Ent V (ρ) = Ent V (µ) < ∞. This implies that ρ is Lebesgue-absolutely-continuous and satisfies ρ(y) < 1/α for almost all y. By Lemma 2.3 the icdf Y of ρ is monotonic, and its derivative Y′(m) exists at almost all m ∈ (0, 1) and is equal to 1/ρ(Y(m)). We then calculate Since this integral is assumed to be finite, Y′(m) > α for Lebesgue-a.e. m, and therefore X′(m) = Y′(m) − α > 0 for almost all m. Inserting this into the expression above yields Writing we therefore have proved By reversing the argument we similarly show that which concludes the proof of (49).
To prove part 3, note that the lower semicontinuity in P 2 (R) of Ent V is a consequence of its convexity, and the lower semicontinuity of Ê W follows from the boundedness and continuity of W . The isometries A and A −1 then transfer the same properties to Ent V and E W . The λ-convexity of F̂ is also a standard result for functionals of this type; see e.g. [CMV06, Sec. 5].
We next turn to the calculation of the element of minimal norm in the subdifferential of F, evaluated at a µ ∈ P(R x ) with the properties that µ is Lebesgue absolutely continuous and ∂ x µ = wµ with w ∈ L 2 (µ).
Take φ ∈ C ∞ c (R) and set r ε (x) := x + εφ(x); note that for small ε, r ε is strictly increasing. Set µ ε := (r ε ) # µ, and note that µ ε also is absolutely continuous. From [AGS08, Theorems 10.4.4 and 10.4.6] (or an explicit calculation) we deduce that d dε Setting we calculate Combining these expressions with a similar one for E W , and using By an argument as in the proof of [AGS08, Th. 10.4.13] it follows that ∂ x µ/2µ − b(·, µ) is the element of minimal norm in the subdifferential ∂F(µ). Finally, the proof of part 5 follows along very similar lines as the previous part, and we omit it.
6. Pathwise large deviations for X n with i.i.d. initial data
The aim of this section is to prove the following large-deviation principle for the compressed particle system X n . In this theorem we start the evolution with i.i.d. initial data, which is different from the situation of Theorem 1.4; we use the name Z n in order to distinguish this case from the case we study in the proof of Theorem 1.4.
Theorem 6.1. Let ξ ∈ P(R), and for each n ∈ N let Z n be the particle system defined in Definition 4.5(2), with initial data Z n (0) drawn from ξ ⊗n .
The random variable t → η n (Z n (t)) satisfies a large-deviation principle on C([0, T ]; P(R x )) with good rate function
\[ I_\xi(\mu) := \inf\bigl\{ H(P \,|\, \mathbb W^P_\xi) \;:\; P_t = \mu_t \text{ for all } t \text{ and } H(P \,|\, \mathbb W_\xi) < \infty \bigr\}. \]
Here W ξ ∈ P(C([0, T ]; R)) is the law of a Brownian particle with initial position drawn from ξ, and for any P ∈ P(C([0, T ]; R)), the measure W P ξ ∈ P(C([0, T ]; R)) is the law of the process Z P satisfying the SDE in R,
\[ dZ^P_t = b(Z^P_t, P_t)\,dt + dB_t, \qquad Z^P_0 \sim \xi. \]
The notation P t ∈ P(R) represents the time-slice marginal of the measure P at time t.

7. Preliminary estimates for the pathwise large deviations
In the previous section we established a large-deviation principle for the particle system Z n , which starts at initial positions Z n (0) drawn i.i.d. from some distribution ν ∈ P(R x ). The particle system Z n is situated in the compressed ('X') setup. After mapping to the expanded setup, the evolutions t → Z n (t) are solutions of the 'correct' SDE (2) (or Definition 4.5(1)). However, the expansion causes the initial data to be distributed in a convoluted and unnatural way.
In this section and the two that follow we therefore adapt the large-deviation principle for Z n of Theorem 6.1 to the more natural initial distribution of Theorem 1.4. In this section we establish a number of estimates.
In the proof of Theorem 8.1 below of the large-deviation principle for Y n , the initial data Y n (0) will be distributed according to a version of the invariant measure P inv n with W = 0: The compressed version of this measure is (see Lemma 4.8) On the other hand, in the auxiliary particle system Z n the initial positions Z n i (0) will be i.i.d. distributed with common law ξ := Q ν ; recall from Lemma 5.1 that for given ν ∈ P(R x ) the single-particle measure Q ν ∈ P(R x ) is defined as Therefore the vector Z n (0) has as law the n-particle tensor product Q ν n ∈ P(R n x ), The following lemma is the main result of this section, and establishes some estimates that connect these two particle systems; recall that the metric d BL on P(R), appearing in part (3), is defined in duality with bounded Lipschitz functions (see Section 2.2).
Lemma 7.1 (Basic estimates). For ν ∈ P(R x ), let Q ν n ∈ P(R n x ) and P n be defined as above in (60) and (61). Recall that the function γ and the constant C γ were defined in Lemma 5.1.
Proof. We first show that r n is bounded as n → ∞ for any ν. Fix ν ∈ P(R x ). By the estimate (40) there exists C > 0 such that In combination with the analogous estimate from the other side, this proves that r n is bounded. It also follows that there exists a subsequence (n k ) k and a constant c ∈ R such thatr At the end of this proof we will show that c = C γ , and therefore r n =r n , and that the whole sequence converges. Part 2 will follow from part 3, since we are able to take the function R to be bounded. To show part 3, take ν ∈ P(R) and estimate for x ∈ R n 1 n k log If ν is not absolutely continuous, then we simply take R(δ, ν) := 2 Lip(V ), by which we satisfy the assertion of the Lemma. If ν is absolutely continuous, then by Lemma 7.2 below we can further estimate the right-hand side above by ≤r n k + 2 Lip(V ) ω ν (d BL (ν, η n k (x))).
Setting R(δ, ν) := 2 Lip(V ) ω ν (δ), we now have proved the slightly modified version of (64), We now come back to the property c = C γ , which we prove using Lemma 3.5. We set X := P(R x ) with the bounded-Lipschitz metric and P k := (η n k ) # P n k . For ν ∈ P(R x ) we set Q ν k := (η n k ) # Q ν n k ; by Sanov's theorem Q ν k satisfies a (strong) large-deviation principle with good rate function µ → H(µ|Q ν ).
From (67) we deduce that for any ν ∈ P(R x ), writing B δ (µ) for the d BL -ball in P(X ), Note that if ν is such that H(ν|Q ν ) < ∞, then ν is Lebesgue absolutely continuous, and the right-hand side of (68) vanishes as k → ∞ and then δ → 0; and if H(ν|Q ν ) = ∞, then the right-hand side of (68) is bounded. Therefore the two conditions of Lemma 3.5 are satisfied, and it follows that P k satisfies a weak large-deviation principle with rate function ν → H(ν|Q ν ) + γ(ν) − C γ + c. Since the infimum of this is zero, we find c = C γ .
The lemma below gives a quantitative version of a well-known result attributed to Pólya (see e.g. [AL06, Th. 9.1.4]).
Proof. Write F (x) := ν (−∞, x) and F (x) := ν (−∞, x) . Since F is bounded and nondecreasing, it is uniformly continuous on R; we write α for the modulus of continuity of F , and we assume without loss of generality that α is non-decreasing. Fix x 0 and set ε := F (x 0 ) − F (x 0 ); for definiteness we assume that ε > 0. Since F is non-decreasing and F has modulus of continuity α, we estimate for x ≥ x 0 that Let δ ε := sup{0 < δ ≤ 1 : α(δ) ≤ ε}, and define ϕ ε : R → R by We then calculate from which we deduce that Taking the supremum over x 0 ∈ R and inverting the relationship above we find Since α(ε) is strictly positive for ε > 0, lim d↓0 ω ν (d) = 0, implying that ω ν is a bona fide modulus of continuity.

8. Pathwise large deviations for Y n with special initial data
In this section we prove Theorem 8.1 below, which is a slightly weaker version of Theorem 1.4. The difference lies in the initial data Y n (0), which are not distributed by the measure P inv,f n as in Theorem 1.4, but according to the "(W = 0)-version" of the invariant measure P inv n that we introduced in the previous section: Theorem 8.1. Assume that V, W satisfy Assumption 4.4.
For each n, let the particle system t → Y n (t) be given by Definition 4.5(1), with initial positions Y n (0) drawn from the W = 0 invariant measure P inv,W =0 n .
The random evolutions t → ρ n (t) = η n (Y n (t)) then satisfy a large-deviation principle in C([0, T ]; P(R)) with good rate function Î. If in addition ρ(0) ∈ P 2 (R) and ρ satisfies F̂(ρ(0)) + Î(ρ) < ∞, then ρ ∈ AC 2 ([0, T ]; P 2 (R)) and the functional Î(ρ) can be characterized as Î Note that although the initial data Y n (0) are drawn from P inv,W =0 n , the processes Y n evolve under the dynamics that includes W .
8.1. First part of the proof of Theorem 8.1. As in the statement of the theorem, consider for each n initial data Y n (0) drawn from P inv,W =0 n . We transform these positions to initial data for the X n -process, through η n (X n (0)) = A −1 n η n (Y n (0)).
This identity only fixes the positions X n i (0) up to permutation of i; this arbitrariness has no impact, however, since all our results only depend on η n (X n ), which is invariant under such permutations.
By Lemma 4.8 the transformed initial data X n (0) have law Let t → X n (t) solve the system of Definition 4.5(2) with initial data X n (0).
The following lemma uses the result of Section 6 to give a large-deviation principle for this particle system X n .
Proof of Lemma 8.2. We use Lemma 3.5 to prove that the random measures µ n satisfy a weak large-deviation principle; subsequently we upgrade this into a strong large-deviation principle by showing exponential tightness.
We write P n for the law of X n on C([0, T ]; R n x ). Set X = C([0, T ]; P(R x )) and define the modified push-forward P n := (η n ) % P n ∈ P(X ) by P n (A) := P n η n (X n (t)) t∈[0,T ] ∈ A .
We construct the measure Q n as follows. Fix some ν ∈ X . Let the curves Z n solve the same equation in Definition 4.5(2) as X n , but with initial data Z n (0) drawn from the independent measure Q ν 0 n = (Q ν 0 ) ⊗n (see (61)). We write Q ν 0 n for the law of Z n on C([0, T ]; R n x ), and we set Q ν k := (η n ) % Q ν 0 n ∈ P(X ). By Theorem 6.1, the sequence of measures Q ν n satisfies a strong large-deviation principle with the good rate function I Q ν 0 .
Since the laws of X n and Z n are the same after conditioning on the initial positions, we have for each x ∈ C([0, T ]; R n x ) that dP n dQ ν 0 n (x) = dP n dQ ν 0 (x(0)).
The properties of r n and R now imply that the conditions of Lemma 3.5 are satisfied, and it follows that P n satisfies a weak large-deviation principle with good rate function µ → I Q µ 0 (µ) + γ(µ 0 ).
Finally, to show that the weak large-deviation principle is in fact a strong principle, take an arbitrary ν ∈ C([0, T ]; P(R x )) and construct the particle system Z n as above. By Theorem 6.1 the random variables t → η n (Z n (t)) are exponentially tight in C([0, T ]; P(R x )); by (72) and the bound (63) the same follows for t → η n (X n (t)).

Characterization of the rate function.
Lemma 8.3. Let ν ∈ P(R x ) and let µ ∈ C([0, T ]; P(R x )) satisfy I ν (µ) < ∞. Then there exists a measurable function u : [0, T ] × R x → R, such that and µ is a distributional solution of The function u is unique in L 2 (0, T ; L 2 µt (R x )).

Proof. We begin by showing existence and uniqueness in law of weak solutions to the SDE
with initial positions distributed according to ν. That is, we establish existence and uniqueness of the measure W µ ν such that under W µ ν , and with respect to some filtration {F t } t∈[0,T ] , {B t } t∈[0,T ] is a Brownian motion and W µ ν is the probability law of the solution {X t } t∈[0,T ] of the SDE. To show this we first prove that b satisfies, for any x ∈ R and ρ, η ∈ P(R), where d T V (·, ·) is the total variation distance, and |b(x, ρ)| ≤ C(1 + |x|).
With these estimates the assumptions of [DH18] are fulfilled and existence and uniqueness of weak solutions to the SDE hold.
From the inequality we obtain with the assumption that V is Lipschitz the estimate For W we split the difference as follows: Because W is Lipschitz, the first term on the right-hand side can similarly be bounded: The second term, involving the integral with respect to the difference ρ − η, can be bounded from above using the characterization of the total variation distance in terms of Borel measurable functions f ∈ B(R x ): Together the two upper bounds yield Combining this with the upper bound for the difference of V -terms, we have for some constant C.
The linear growth condition follows from the assumption that V and W are Lipschitz and bounded, respectively.
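The splitting argument for the W-terms can be sketched as follows. This is a hedged reconstruction, not the paper's own display: it assumes that the interaction part of b has the convolution form $\int W(x-z)\,\rho(dz)$, that the two evaluation points are called x and y, and that the total variation distance is normalized as $\|\rho-\eta\|_{TV} := \sup\{|\int f\,d(\rho-\eta)| : f \in B(\mathbb{R}),\ \|f\|_\infty \le 1\}$ (other normalizations differ by a factor 2).

```latex
\begin{align*}
\Bigl|\int W(x-z)\,\rho(dz) - \int W(y-z)\,\eta(dz)\Bigr|
 &\le \int \bigl|W(x-z)-W(y-z)\bigr|\,\rho(dz)
   + \Bigl|\int W(y-z)\,(\rho-\eta)(dz)\Bigr| \\
 &\le \mathrm{Lip}(W)\,|x-y| \;+\; \|W\|_\infty\,\|\rho-\eta\|_{TV}.
\end{align*}
```

The first term uses only the Lipschitz property of W and the fact that ρ has unit mass; the second uses only the boundedness of W, which is why both properties of W enter the final estimate.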
With these estimates, the conditions of [DH18] are satisfied, implying that weak existence and uniqueness of solutions hold for the SDE (76); therefore the measure W µ ν is well-defined. Define the set Since by assumption I ν (µ) < ∞, the set A µ ν is non-empty. By Theorem 3.1 of [CL94], in the definition of J ν (µ) it is sufficient to minimize over (strongly) Markovian P such that P ∈ A µ ν . Uniqueness in law corresponds to the uniqueness condition 'U' in [L12] and by Theorems 1 and 2 therein, for each Markovian P ∈ A µ ν there exists a process {β t } t∈[0,T ] adapted to the (augmented version of the) filtration {F t } t∈[0,T ] of the weak solution such that $\int_0^T \beta_t\,dt$ and $\int_0^T \beta_t^2\,dt$ are both finite P -a.s. and under P there is a P -Brownian motion B P such that the process {X t } t∈[0,T ] solves, under P , By [CL94,Thm 3.60] there is a u : [0, T ] × R x → R such that $\int_0^T\int_{\mathbb{R}_x} u^2(t,x)\,\mu_t(dx)\,dt < \infty$ and the process β can be expressed as β t = u(t, X t ); the function u can be obtained via the Riesz representation theorem, see [CL94] for details. The Radon-Nikodym derivative between P and W µ ν satisfies dP dW µ and it follows that This is precisely (74).
Replacing β with u in the SDE, we find that under P the process X solves dX t = (b(X t , µ t ) + u(t, X t ))dt + dB P t . The forward Kolmogorov equation of this SDE for the single-time marginals P t = µ t is equal to (75). The uniqueness of u is a direct consequence of the strict convexity of $u \mapsto \int u^2\,\mu$ and the linear constraint (75).
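As a hedged illustration of the last step, the forward Kolmogorov equation can be derived by applying Itô's formula to a test function. The sketch assumes unit diffusion coefficient, as in the SDE above; the test function $\varphi \in C_c^2(\mathbb{R})$ and the shorthands $b_t(x) := b(x,\mu_t)$, $u_t(x) := u(t,x)$ are notational assumptions, and the resulting equation should agree with (75) up to notation.

```latex
\begin{align*}
\frac{d}{dt}\int_{\mathbb{R}} \varphi\,d\mu_t
  &= \frac{d}{dt}\,\mathbb{E}_P\,\varphi(X_t)
   = \mathbb{E}_P\Bigl[\bigl(b_t(X_t)+u_t(X_t)\bigr)\varphi'(X_t)
      + \tfrac12\,\varphi''(X_t)\Bigr] \\
  &= \int_{\mathbb{R}} \Bigl[(b_t+u_t)\,\varphi' + \tfrac12\,\varphi''\Bigr]\,d\mu_t ,
\end{align*}
which is the weak (distributional) form of
\[
  \partial_t\mu_t = \tfrac12\,\partial_{xx}\mu_t
     - \partial_x\bigl(\mu_t\,(b_t+u_t)\bigr).
\]
```

The stochastic integral against $B^P$ drops out on taking expectations because $\varphi'$ is bounded.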
(2) For almost all t ∈ [0, T ], µ t is Lebesgue-absolutely-continuous, ∂ x µ t ∈ L 1 (R x ), and (3) The functional I ν can be written as where v t is the velocity field associated with ∂ t µ t (see Lemma 2.7).
Proof. We first show that $\int_{\mathbb{R}} x^2\,\mu_t(dx)$ is bounded uniformly in t. Formally this follows from multiplying equation (75) by $x^2$ and integrating, which gives a chain of inequalities for $\frac{d}{dt}\int_{\mathbb{R}} x^2\,\mu_t(dx)$. The second inequality follows from the assumptions (39) on V . Gronwall's Lemma then yields boundedness of $\int_{\mathbb{R}} x^2\,\mu_t(dx)$ for finite time. This argument can be made rigorous by approximating $x^2$ by smooth compactly supported functions.
We next prove part 2. Fix a function ϕ ∈ C 0,1 c ([0, T ] × R), and let g ∈ C 1,2 ([0, T ] × R) be a solution of the linear backward-parabolic equation where we use the shorthand notation b t (x) := b(x, µ t ). Existence of such a solution follows from standard linear parabolic theory; see e.g. [Fri64, Th. 1.12]. The constant initial datum and the compact support of the right-hand side imply that g(x, t) → 1 for x → ±∞ and for all t, and that all derivatives converge to zero as x → ±∞; this can be recognized in the representation formula [Fri64, (1.7.6)].
Note that by the coerciveness bound (39) on V we have To obtain bounds on the same expression at earlier times t we calculate, briefly suppressing the subscript t, after which Gronwall's Lemma implies that g 2 0 e −2V ≤ C, with a constant C > 0 that does not depend on ϕ.
The function f := 2 log g then satisfies with final value f T = 0. Multiplying (75) by f and integrating we find Briefly writing $\mu_V(dx) := Z_V^{-1} e^{-2V}\,dx$, for which we have the estimate H(µ 0 |µ V ) ≤ 2F(µ 0 ) + C, we then apply the entropy inequality to find The right-hand side of this expression is bounded from above independently of ϕ, and by the dual characterization of Fisher information (see e.g. [FK06,Lemma D.44]) it follows that where the second identity follows from a standard argument involving the separability in C b (R) of the subspace C 1 c (R). This proves part 2.
To prove part 1, i.e. to show that µ ∈ AC 2 ([0, T ]; P 2 ), we write equation (75) as The function v t satisfies $\int_0^T\int_{\mathbb{R}_x} \mu_t\,v_t^2 < \infty$ because each of the three terms in (78) satisfies a similar bound: u satisfies this property by (74), b is uniformly bounded by Assumption 4.4, and for ∂ x µ/µ this follows from part 2. By the characterization of Lemma 2.7 it follows that µ ∈ AC 2 ([0, T ]; P 2 ).
8.3. End of proof of Theorem 8.1. We now have constructed two particle systems as follows:
• The particle system Y n of Definition 4.5(1) is started at initial positions drawn from P inv,W =0 n ;
• The particle system X n of Definition 4.5(2) is started from the transformed initial positions X n i (0), as described in (70).
By Lemma 8.2, the random time-dependent measures µ n (t) = η n (X n (t)) satisfy a large-deviation principle in C([0, T ]; P(R x )) with rate function I(µ) = I Q µ 0 (µ) + γ(µ 0 ). By Lemma 4.7 the random measures ρ n (t) = η n (Y n (t)) have the same distribution as A n µ n (t).
We deduce the LDP for ρ n by applying a generalization of the contraction principle to n-dependent maps in the form of [DZ98, Corollary 4.2.21]. In the case at hand these maps will be the maps A −1 n and A −1 , which extend A and A n to time-dependent measures. The only non-trivial condition to check for [DZ98, Corollary 4.2.21] is a convergence property of the maps A n to A. We temporarily write P µn for the law of µ n on P(C([0, T ]; R x )). Define for n ∈ N and δ > 0 the set Γ n,δ := {ν ∈ supp P µn : d BL (Aν, A n ν) > δ}, where we also write d BL for the metric on C([0, T ]; P(R y )) generated by the metric d BL on R y . Corollary 4.2.21 of [DZ98] requires that this set has super-exponentially small P µn -probability.
In fact, for every δ > 0 the set is empty for sufficiently large n, since by part 3 of Lemma 4.2 we have for any ν n = 1 Since d BL (µ, ν) ≤ W 2 (µ, ν) by Lemma 2.5, the set Γ n,δ is empty for n ≥ α 2 /2δ 2 . Applying [DZ98, Corollary 4.2.21] we find that A n µ n satisfies a large-deviation principle in C([0, T ]; P(R y )) with good rate function It remains to prove the form (69) ofÎ; this is the content of the next lemma.
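The particle-level correspondence behind the maps A n can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the paper's definition from (70) or Lemma 4.2: it assumes rods of length α/n (consistent with the constraint |y i − y j | ≥ α/n used for Ω n ), and it assumes that the i-th particle from the left (counting from zero) is shifted by iα/n. The function names `expand_rods` and `compress_rods` are hypothetical.

```python
def expand_rods(x, alpha):
    """Map n point particles to hard-rod positions.

    Assumption (not taken from the paper): after sorting, the i-th
    particle (i = 0, ..., n-1) is shifted to the right by alpha * i / n,
    so neighbouring rods end up at least alpha / n apart.
    """
    xs = sorted(x)
    n = len(xs)
    return [xi + alpha * i / n for i, xi in enumerate(xs)]


def compress_rods(y, alpha):
    """Inverse map: remove the rod lengths from sorted rod positions."""
    ys = sorted(y)
    n = len(ys)
    return [yi - alpha * i / n for i, yi in enumerate(ys)]


def min_gap(points):
    """Smallest distance between neighbouring points after sorting."""
    ps = sorted(points)
    return min(b - a for a, b in zip(ps, ps[1:]))


if __name__ == "__main__":
    alpha = 0.9
    points = [0.6, 0.0, 0.2]           # non-interacting particles
    rods = expand_rods(points, alpha)  # hard-rod configuration
    print(rods, min_gap(rods) >= alpha / len(points) - 1e-12)
    print(compress_rods(rods, alpha))  # recovers the sorted points
```

Because the shift of each particle is at most α, the expanded and compressed configurations stay uniformly close after normalization by n, which is the kind of estimate the convergence condition on A n relies on.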
To show identity (b), note that since F(µ 0 ) + I Q µ 0 (µ) + γ(µ 0 ) = F̂(ρ 0 ) + Î(ρ) < ∞ we have by Lemma 8.4 The first two terms are equal to 2 Ent V (µ 0 ); we rewrite the remainder as The first integral equals $\frac12\int_0^T |\dot\mu|^2(t)\,dt$ by the characterization (26) of the velocity of absolutely continuous curves. By the characterizations (53) and (27) of the element of minimal norm in the subdifferential, the second integral equals $\frac12\int_0^T |\partial F|^2(\mu_t)\,dt$, and by the chain rule (28) the third integral equals F(µ T ) − F(µ 0 ).
9.1. Proof of Theorem 1.4. In this section we finalize the proofs of the two main theorems. We first prove a version of Theorem 1.4 that allows a little more freedom in the initial data.
Let G : P(R) → R be continuous and bounded, and set We also define the modified free energy F̂ G , where the constant C G is such that inf F̂ G = 0. Then
• For G = 0 the distribution P inv,G n coincides with P inv,W =0 n ;
• For $G(\rho) = \int f\rho$ the distribution P inv,G n coincides with P inv,f n and F̂ G with F̂ f ;
• For $G(\rho) = \iint W(x-y)\,\rho(dx)\rho(dy)$ the distribution P inv,G n coincides with P inv n and F̂ G with F̂.
Here |ρ̇| and |∂F| are the metric derivative and the local slope defined in Definition 1.3, for the Wasserstein metric space X = P 2 (R).
Theorem 1.4 is Theorem 9.1 for the special case $G(\rho) = \int f\rho$.
Proof of Theorem 9.1. As in the theorem, let Y n be solutions of equation (2) (or equivalently of Definition 4.5(1)) with initial data drawn from P inv,G n . Let Ỹ n be solutions of the same system, but with initial data Ỹ n (0) drawn from P inv,W =0 n . Let ρ n , ρ̃ n ∈ C([0, T ]; P(R y )) be the corresponding empirical measures, and write P ρn and P ρ̃n for their laws. By Theorem 8.1, ρ̃ n satisfies a large-deviation principle in C([0, T ]; P(R y )) with the rate function Î defined in (79).
Since the evolution of the two particle systems is the same, conditioned on their initial positions, we have as in the proof of Theorem 6.1 that for any ρ ∈ C([0, T ]; P(R y )) $\frac{dP_{\rho_n}}{dP_{\tilde\rho_n}}(\rho) = \frac{d(\eta_n)_\# P^{\mathrm{inv},G}_n}{d(\eta_n)_\# P^{\mathrm{inv},W=0}_n}(\rho_0) = C_n \exp\bigl(-nG(\rho_0)\bigr)$ for some normalization constants C n > 0. Since the exponent is a bounded and narrowly continuous function of ρ 0 , Varadhan's Lemma (e.g. [DZ98,Th. 4.3.1]) implies that ρ n satisfies a large-deviation principle in C([0, T ]; P(R y )) with rate function Î G , where the constant C is chosen such that inf Î G = 0. We now show the formula (81). Set F G (ν) := inf{ Î G (ρ) : ρ 0 = ν }.
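The effect of the exponential tilting on the rate function can be sketched as follows. This is the standard consequence of Varadhan's Lemma for a bounded continuous tilt, written under the assumption that the density has exactly the form $C_n \exp(-nG(\rho_0))$ stated above; the explicit expression for the constant is an assumption about the normalization, not a quotation of the paper's display.

```latex
\begin{align*}
\lim_{n\to\infty} \frac1n \log C_n
  &= -\lim_{n\to\infty}\frac1n \log
      \mathbb{E}\Bigl[\exp\bigl(-nG(\tilde\rho_n(0))\bigr)\Bigr]
   = \inf_{\rho}\,\bigl[\hat I(\rho) + G(\rho_0)\bigr], \\[4pt]
\hat I_G(\rho)
  &= \hat I(\rho) + G(\rho_0)
     - \inf_{\rho'}\bigl[\hat I(\rho') + G(\rho'_0)\bigr].
\end{align*}
```

With this choice the infimum of $\hat I_G$ is zero, as required of the rate function of a sequence of probability measures.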
In the proof of Theorem 9.1 above we showed that the functional F G defined there equals 2F̂ G , which coincides with 2F̂ for this choice of G; the characterization of Lemma 5.1 then concludes the proof.
Appendix A. Proof of Lemma 4.6
The existence and uniqueness of weak solutions X n follows from arguments similar to the proof of Lemma 8.3, and we omit this proof.
For the stochastic process Y n we prove the existence and uniqueness as follows. The main step is to show the existence and uniqueness of a fundamental solution of the parabolic partial differential equation ∂ t ρ − L * Y ρ = 0 on Ω n :
Lemma A.1. Fix T > 0. For each y 0 ∈ Ω n there exists a unique fundamental solution (t, y) → p(t, y; y 0 ) of the equation ∂ t − L * Y = 0 on Ω n with Neumann boundary conditions, i.e., a function p = p y 0 ∈ L 1 ((0, T ) × Ω n ) satisfying 0 = for all ϕ ∈ C 1,2 b ([0, T ] × Ω n ). In addition, for each t > 0 the function y → p(t, y; y 0 ) is non-negative and continuous, and satisfies $\int_{\Omega_n} p(t,y;y_0)\,dy = 1$.
Given this fundamental solution, the proof of Lemma 4.6 proceeds along classical lines. We construct a consistent family of finite-dimensional distributions P t 1 ,...,t k ∈ P((Ω n ) k ) in the usual way, by daisy-chaining copies of the fundamental solution p(t k − t k−1 , · ; · ) (see e.g. [ [SV97] then implies that P t 1 ,...,t k is generated by a unique probability measure P on the space C([0, ∞); Ω n ). This concludes the argument.
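The daisy-chaining construction can be written out as follows. This is a hedged sketch of the classical construction, assuming time-homogeneity (as suggested by the argument $t_k - t_{k-1}$ of p) and the conventions $t_0 = 0$ and $y_0$ for the starting point; the labels $y_1,\dots,y_k$ are notational assumptions.

```latex
\[
  P_{t_1,\dots,t_k}(dy_1\cdots dy_k)
   := \prod_{j=1}^{k} p\bigl(t_j - t_{j-1},\, y_j;\, y_{j-1}\bigr)\,dy_j ,
  \qquad 0 = t_0 < t_1 < \dots < t_k .
\]
Consistency of this family under removal of an intermediate time $t_j$
amounts to the Chapman--Kolmogorov identity
\[
  \int_{\Omega_n} p(s, z; y)\, p(t, y'; z)\,dz = p(s+t,\, y';\, y),
  \qquad s,t>0,
\]
while removal of the last time $t_k$ uses
$\int_{\Omega_n} p(t, y; y')\,dy = 1$.
```

In this setting the Chapman-Kolmogorov identity would follow from the uniqueness part of Lemma A.1, since both sides solve the same equation with the same initial datum.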
The main step therefore is the proof of Lemma A.1, which we now give.
Proof. Define for L > α the truncated state space Ω L n := {y ∈ [−L, L] n : |y i − y j | ≥ α/n for all i ≠ j}. Fix an initial datum φ ∈ C b (Ω L n ), φ ≥ 0; by classical methods there exists a non-negative We now take a sequence L → ∞ and choose φ L ≥ 0 with $\int \varphi_L = 1$ such that φ L converges narrowly on Ω n to δ y 0 . For each T > 0, the corresponding solution (t, y) → u L (t, y) is nonnegative and has integral over [0, T ]×Ω n equal to T (where we extend u L on Ω n \Ω L n by zero); by taking a subsequence we can therefore assume that u L converges weakly, in duality with C c ([0, T ] × Ω n ), to a non-negative limit measure p with p([0, T ] × Ω n ) ≤ T . By e.g. [BKRS15, Th. 6.4.1] the measure p has a continuous Lebesgue density on (0, T ) × Ω n , implying that for 0 < t < T we can write it as p(dt dy) = p(t, y) dt dy.
By a standard approximation argument, using the fact that the total mass of p is finite, this identity can be shown to hold for all ϕ ∈ C 1,2 b ([0, T ] × Ω n ) with ϕ(t = T ) = 0. This proves (82). By taking ϕ in (82) to be a function only of t, we also find $\partial_t \int_{\Omega_n} p(t,y)\,dy = 0$ in the distributional sense, and therefore p(t, ·) has unit mass for all time t.
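The mass-conservation step can be made explicit as follows. The sketch assumes that the weak formulation (82) has the usual form for a fundamental solution started at $y_0$, with a generator $L_Y$ that has no zeroth-order term; both the displayed form and the symbols $\psi$ and $m$ are assumptions introduced for illustration.

```latex
\begin{align*}
&\text{Assumed weak form: }\quad
  0 = \varphi(0,y_0) + \int_0^T\!\!\int_{\Omega_n}
        p\,\bigl(\partial_t\varphi + L_Y\varphi\bigr)\,dy\,dt,
  \qquad \varphi(T,\cdot)=0. \\[4pt]
&\text{Take } \varphi(t,y)=\psi(t),\ \psi(T)=0,\ \text{so } L_Y\varphi = 0:
  \qquad
  \int_0^T \psi'(t)\,m(t)\,dt = -\psi(0),
  \quad m(t) := \int_{\Omega_n} p(t,y)\,dy. \\[4pt]
&\psi \in C_c^1((0,T)) \ \Rightarrow\ m' = 0 \text{ in } \mathcal{D}'(0,T)
  \ \Rightarrow\ m \equiv c; \qquad
  c\,\bigl(\psi(T)-\psi(0)\bigr) = -\psi(0) \ \Rightarrow\ c = 1.
\end{align*}
```

The choice of a purely time-dependent test function is admissible because such functions satisfy the Neumann boundary condition trivially.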
By substituting this ϕ in (86) we find This implies that p is the zero measure on [0, T ] × Ω n , and proves the uniqueness of solutions of (82).