Generalized gradient structures for measure-valued population dynamics and their large-population limit

We consider the forward Kolmogorov equation corresponding to measure-valued processes stemming from a class of interacting particle systems in population dynamics, including variations of the Bolker–Pacala–Dieckmann-Law model. Under the assumption of detailed balance, we provide a rigorous generalized gradient structure, incorporating the fluxes arising from the birth and death of the particles. Moreover, in the large population limit, we show convergence of the forward Kolmogorov equation to a Liouville equation, which is a transport equation associated with the mean-field limit of the underlying process. In addition, we show convergence of the corresponding gradient structures in the sense of Energy-Dissipation Principles, from which we establish a propagation of chaos result for the particle system and derive a generalized gradient-flow formulation for the mean-field limit.


INTRODUCTION
An important goal in theoretical biology and population dynamics is to derive macroscopic equations from microscopic models [CFM06,FKK09]. For many stochastic interacting particle systems involving birth, mutation, and death, these connections have been made rigorous. One such class of particle systems consists of spatially-structured models such as the Bolker-Pacala and Dieckmann-Law (BPDL) model [BP97,LD00]. The dynamics of these particle systems can be described by jump processes on the space of finite positive measures and can be used to derive macroscopic models.
The convergence of such measure-valued jump processes under a mean-field scaling to a large-population limit is shown, for example, in [FM04] via martingale techniques, and in [FKK09], where an analytic approach to the convergence of rescaled moment equations is used. In both approaches, the limiting evolution is governed by a non-local evolution equation (1.1), which we will refer to as the mean-field equation. Here, represents the limiting density of particles at time , and the parameter functions and are continuous and bounded functions stemming from birth, dispersal, and competition in the BPDL model. In recent years, there has been considerable activity in studying the mean-field equation (1.1) and the BPDL model in more general spaces, allowing for dynamics involving multiple species and combinations of discrete and continuous traits. See for example [FKK21] for an overview of existing models, where instead of ℝ the underlying space is an arbitrary locally compact Polish space. However, convergence in the large-population limit is not considered there.
Meanwhile, powerful variational tools have been developed in the last decade for studying mean-field interacting jump processes and their limits under the assumption of detailed balance. To highlight only a few: [EFLS16] studied mean-field limits for measure-dependent jump processes; [Erb16] proved the convergence of the spatially-homogeneous Kac-process to the Boltzmann equation; [Sch19] investigated the macroscopic limit of Becker-Döring models; [KJZ19] showed hydrodynamic limits for zero-range and exclusion processes; [MM20] discussed convergence and higher-order approximations for chemical reaction networks, an approach that was subsequently used in the setting of discretized reaction-diffusion equations in [MSW22].
In this work, we extend and apply these variational techniques to prove the mean-field limit for population dynamics over arbitrary compact Polish spaces, with bounded measurable parameters , satisfying a detailed balance condition. In addition, we establish entropic propagation of chaos, which controls the discrepancy between the microscopic and macroscopic models in a precise sense. To the authors' knowledge, this is the first convergence result under such general assumptions.
To do so, we first introduce a new generalized gradient structure and rigorous variational formulation for the forward Kolmogorov equation (FKE) corresponding to the BPDL model, where the FKE describes the evolution of the law of the measure-valued process. Our formulation incorporates not only the equation itself but tracks the birth and death fluxes as well. This extends the generalized gradient-flow framework of [PRST22] due to the unboundedness of the underlying jump kernel, and the positivity of the fluxes.
We then show convergence of these generalized gradient structures under a mean-field scaling and the large-population limit in the sense of Energy Dissipation Principles (EDPs) (see [LMPR17]). The limiting gradient flow is the Liouville equation corresponding to the mean-field equation, namely a transport equation that describes the evolution of the law of a process that follows deterministic dynamics described by (1.1) but for possibly random initial conditions. This connection between the Liouville equation and the mean-field equation is made rigorous with the help of a modification of the superposition principle of [AC08].
In particular, we deduce that the laws determined by the FKE concentrate around the solution of the mean-field equation (1.1), which, due to the convergence of the associated free energies, translates into an entropic propagation of chaos result, see Theorem 1.10.
Outline. The rest of this section is devoted to giving a brief overview of our setting and presenting the main results. In Section 2 the mean-field equation and corresponding gradient structure are introduced. We repeat this process in Sections 3 and 4 for the forward Kolmogorov equation and the Liouville equation respectively, with the proof of a modified superposition principle delegated to Appendix B. Finally, in Section 5, we establish the EDP-convergence of the gradient structures, and prove both the convergence to the mean-field limit and the propagation of chaos.
1.1. Measure-valued population dynamics and mean-field limits. We consider the forward Kolmogorov equation that corresponds to a generalized version of the BPDL model. In its classical form, the Bolker-Pacala model is a purely spatially-structured microscopic model for a population of plants involving the birth, dispersal, and either natural death or death by competition for resources, and can be modeled as a jump process in the space of positive measures over ℝ . However, in certain models of adaptive evolution it is the mutation of traits that plays a role, instead of spatial evolution (see [LD00,CFM06,CFM08]). Moreover, if one wants to model multiple interacting species or marked configuration spaces, more general spaces than ℝ are needed (see [KLU99,FKK21]).
Therefore, let the trait space be an arbitrary Polish space, denoted henceforth as  . We model the BPDL-dynamics at any time as an interacting particle system with particles 1 , … , ∈  at positions 1 , … , ∈  , where the number of particles at time is not fixed since particles can be removed from and added to the system. Moreover, let ∈  + ( ), , ∈  + ( × ) be non-negative measurable functions, > 0 a positive parameter, and ∈  + ( ) a non-negative reference measure such that ∫  ( , ) (d ) = 1, for all ∈  .
Then the BPDL-dynamics can be described as follows:
• Each particle located at a position ∈  has two exponential clocks: a seed clock with rate ( ) and a death clock with rate 1 ∑ =1 ( , ).
• If the death clock rings, the particle is deleted.
• If the seed clock rings, a new particle is added at position ∈  with probability ( , ) (d ).
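The clock description above can be sketched as a Gillespie-type simulation. This is only a minimal illustration under simplifying assumptions, not the construction used in the cited works: the trait space is taken to be a finite set {0, …, K−1}, and the names `simulate_bpdl`, `seed_rate`, `dispersal`, `competition`, and `empirical_measure` are hypothetical.

```python
import random


def simulate_bpdl(positions, seed_rate, dispersal, competition, n, t_end, rng):
    """Simulate the clock dynamics on a finite trait space {0, ..., K-1}.

    positions   : trait indices of the initial particles
    seed_rate   : seed_rate[x] is the seed-clock rate of a particle at x
    dispersal   : dispersal[x][y] is the probability that a seed from x lands at y
    competition : competition[x][y] is the competition kernel
    n           : system size; every death rate is scaled by 1/n
    t_end       : time horizon
    rng         : a random.Random instance
    """
    particles = list(positions)
    t = 0.0
    while particles:
        # One seed clock and one death clock per particle.
        birth = [seed_rate[x] for x in particles]
        death = [sum(competition[x][y] for y in particles) / n for x in particles]
        total = sum(birth) + sum(death)
        if total == 0.0:
            break  # no clock can ring any more
        t += rng.expovariate(total)  # waiting time until the next ring
        if t > t_end:
            break
        # Pick the ringing clock with probability proportional to its rate.
        k = rng.choices(range(2 * len(particles)), weights=birth + death)[0]
        if k < len(particles):
            # Seed clock: add a new particle at a position drawn from the dispersal law.
            x = particles[k]
            y = rng.choices(range(len(dispersal[x])), weights=dispersal[x])[0]
            particles.append(y)
        else:
            # Death clock: delete the particle.
            particles.pop(k - len(particles))
    return particles


def empirical_measure(particles, n, num_traits):
    """Rescaled empirical measure: each particle carries mass 1/n."""
    weights = [0.0] * num_traits
    for x in particles:
        weights[x] += 1.0 / n
    return weights
```

Because the death rates carry the factor 1/n while the seed rates do not, a population of order n produces birth and death events at a total rate that is also of order n, which is the scaling discussed in the text.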
Alternatively, we can describe these dynamics in the form of reacting particles. Namely, setting ( , ) ∶= ( ) ( , ), then with a slight abuse of notation we have ( , .
We will refer to as the mutation kernel, and as the competition kernel. The parameter > 0 is called the system size, in the sense that the scaling −1 guarantees that if the number of particles in the system is of the order of , the total rate of created or deleted particles is of the same order.
Instead of looking at the individual positions of the particles, it is common to only consider the measure-valued process determined by the rescaled empirical measure Here, ∈ Γ ∶=  + ( ) with  + ( ) the space of finite non-negative measures. The infinitesimal generator of this process is given for all ∈ (Γ) by where ± [ ] ∈ Γ are the measure-dependent birth/death-kernels The law of the process is now given by the corresponding forward Kolmogorov equation Depending on the setting, this formulation can be made rigorous in various ways: for example via an analytical approach on configuration spaces as done in [FKK09], which in fact models infinite configurations of particles over ℝ , or via martingale techniques with  a closed subset of ℝ and = ℒ |  (see [FM04]). Moreover, in the latter, under the assumption of continuous, bounded, and integrable mutation/competition kernels, it is also shown that the process converges in the large-population limit → ∞ to the mean-field equation (1.1), which can be rewritten as While different choices of scalings are possible, the mean-field equation describes the macroscopic properties of the measure-valued process when the population is large. An alternative way is to study the evolution of the moments, which form a hierarchy similar to the BBGKY-hierarchy of correlation functions, and under the so-called Vlasov scaling the first moment or correlation function converges to ( ). For the case of infinite configurations over ℝ this has been established, see [FKK10], and both propagation of chaos in the Vlasov limit and the sub-Poissonian property have been established as well [FKKK15].
In this work, we do not consider the measure-valued process itself, but take the forward Kolmogorov equation ( ) as a starting point, and show convergence to the mean-field equation in the sense that → narrowly on (Γ) under suitable initial conditions. Throughout we assume the following: Henceforth we equip the space Γ with the narrow topology. Moreover, the assumption of no natural death means that particles can only be deleted due to competition with other particles. Together with the detailed balance condition this guarantees that the jump kernel is reversible with respect to an invariant measure Π ∈ (Γ), which is obtained as a push-forward of the Poisson measure with This allows us to write the forward Kolmogorov equation as a gradient flow of the relative entropy with respect to Π , and equip it with a corresponding variational structure, see Theorem 1.6. In light of similar results in [EFLS16,MM20] for mean-field jump processes on finite spaces and finite chemical reaction networks, one expects ( ) to converge to the following Liouville equation It is a transport equation that can be interpreted as the lifting of mean-field dynamics in Γ to evolutions in (Γ), and describes the evolution of the law of random measures that all satisfy ( ). In particular, if a solution of ( ) then ∶= is itself a solution of ( ). It turns out that in our general setting this convergence holds as well, as will be stated in Theorem

FIGURE 1. Convergence in the large-population limit
This convergence is a direct consequence of the convergence of the associated gradient structures, which we will describe below.
In these works a common starting point is to describe the relation between , representing either laws of some process or mean-field limits and generalized fluxes in the form of an abstract continuity equation. For example, in the case of independent particles following a common jump process over a graph, corresponds to the number of particles on a node at time , and a choice of flux can be the so-called net flux , which is related to the number of particles going through an edge.
However, we propose a slightly different structure, namely one that tracks the effective mass fluxes for both creation (arising from mutation) and annihilation (arising from competition) separately. The use of mass fluxes instead of usual particle fluxes ensures that in our convergence results as → ∞ we have both convergences of laws and fluxes (see Theorem 1.8).
Moreover, separating the effects of creation and annihilation (henceforth simply referred to as birth and death) instead of their combined contribution allows us to incorporate more information in our variational formulation. The downside is that we are forced to work with positive fluxes, while the framework in the aforementioned examples involves either quadratic or generalized structures for signed net fluxes. In this sense we are closer to the variational representations stemming from large deviations, involving so-called one-way or unidirectional fluxes, see for example [MPR14,PR19,BBBO21,PS22]. Indeed, our structure is motivated by large deviation theory, as we will discuss briefly in Appendix A.
In all three cases, i.e. for ( ), ( ) and ( ), our proposed structure is similar to the classical notion of a gradient flow in the sense that they all satisfy an abstract Energy-Dissipation Balance. Since we will repeat the same concept three times on different levels and for different spaces, let us make the general and abstract concepts clear: Formal Definition 1.2. Given a free energy functional ( ), a dissipation potential ( , ), a Fisher information functional ( ), and a linear operator with dual * , we consider pairs of curves ( , ) satisfying the continuity equation and define the EDP-functional Moreover, a gradient-flow solution is a pair (̂ ,̂ ) satisfying ( ) with (̂ ,̂ ) = 0.
Throughout we require the non-negativity of . For a deeper look at the mathematical basis of this sort of setting, especially for generalized gradient systems incorporating net fluxes, see [PRST22].
In all three examples the generalized fluxes consist of two parts: + and − , corresponding to birth and death. The continuity equations depend on the setting and are summarized in Table 1, with  + as the space of non-negative Radon measures. Remark 1.3. Note that the gradient-flow solution (̂ ,̂ ) is the null-minimizer of , and satisfies the energy-dissipation balance Moreover, for small ≪ 1 one would expect In light of the generalized gradient-flow framework of [PRST22] and the relation to minimizing movement schemes, a formal minimization procedure provides the gradient-flow solution ̂ + * ̂ = 0 and that along the solution where  * ( , ) is the dual of the dissipation potential . Finally, note that along the gradient-flow solution the free energy  is non-increasing, i.e.  is a Lyapunov functional. These (in)equalities indeed hold in our setting. See also Appendix A, where we compare the relation to generalized gradient flows for net fluxes, which follow from the above after a contraction argument, and the connection to the reversibility of the underlying process.
Let ( 1 , 2 ) be the Hellinger distance, see (2.2), and Ent( 1 | 2 ) the relative entropy of 1 with respect to 2 for two (possibly infinite) locally finite Borel measures 1 , 2 : where is the geometric mean of the expected birth and death fluxes, i.e.
As mentioned, although treating birth and death separately provides us with additional information, this prohibits the use of some of the previous works for gradient structures because of the positivity of the fluxes. However, there is still a strong connection to the variational formulations for jump processes arising from the large deviations of fluxes as seen in [PR19] and [BBBO21], see for example Appendix A on the equivalence of the EDP-functional to the expected rate functional.
and hence it is not directly clear that the relation (1.3) holds. However, as will be shown for Theorem 2.7, at least along the solution̂ the equivalence holds for a.e. ∈ [0, ].
Similar to the mean-field case, the dissipation potential consists of relative entropies with respect to geometric averages, now of forward and backward rates along a transition → ± 1 . Moreover, note that in contrast to the framework of [PRST22], we employ fluxes ± that are not finite measures. This is due to the unboundedness of as the mass of grows, which implies that the underlying jump kernel over Γ is itself unbounded as well, see Section 3.

Define the Fisher information  ∞ as stated in Definition 4.4, free energy
and dissipation potential Then the corresponding EDP-functional ∞ given by is non-negative, and for any 0 with  ∞ ( 0 ) < ∞ a unique gradient-flow solution ( ̂ , ̂ ± ) exists, with ̂ a weak solution to ( ) and ̂ ± = ̂ ± for almost every ∈ [0, ]. Finally, for any ( , + , − ) such that ∞ ( , + , − ) < ∞, there exists (with a slight abuse of notation) a Borel probability measure Ω over curves satisfying the mean-field continuity equation ( ℰ) such that for all the time marginals ( ) # Ω are equal to , and The statement of (1.6) is the aforementioned superposition principle, which is a modified version of the superposition principle [AT14] in metric measure spaces, and the ones used in [EFLS16], [Erb16]. It allows one to essentially jump back and forth between the Liouville equation and the mean-field dynamics, and in particular, provides us with the non-negativity of  ∞ and uniqueness of gradient-flow solutions.
1.3. Convergence results. Our final and most important result is that the above gradient structures converge in the sense of EDP-convergence (e.g. see [LMPR17,PS22]), a generalization of the evolutionary Γ-convergence approach stated by [SS04,Ser11] and expanded on in [Mie16], which implies convergence of the gradient-flow solutions and their free energies.
We say that a sequence ( , ,+ , ,− ) ∈ converges to some ( , + , − ) ∈ if for all ∈ [0, ] the probability measures converge narrowly to in (Γ), and ,± (d , d ) d converge vaguely to . Again postponing technicalities, see Theorem 5.1, we have the following lower semi-continuity and compactness result: Finally, for any sequence ( , ,+ , ,− ) ∈ such that there exists a subsequence converging to some ( , + , − ) ∈ ∞ . Here the notion of EDP-convergence or evolutionary Γ-convergence (where the Γ is not to be confused with our space of positive measures Γ) relates to the Γ-convergence of the free energies  and suitable liminf-estimates for the dissipation potentials and Fisher-information functionals (or local slopes in a metric setting).
In certain applications or for certain notions of convergence (e.g. see [MMP21]) one also establishes Γ-convergence for the total dissipation  +  when written as functionals over ([0, ]; (Γ)). Moreover, Γ-convergence of the functionals  over such path-spaces is related to the large deviations of the underlying process [Kra19], as we briefly discuss in Appendix A. In our framework this would require that for every ( , + , − ) ∈ ∞ , we can find a sequence ( , ,+ , ,− ) ∈ that converges to ( , + , − ) and satisfies the limsup-estimate However, in this paper we restrict ourselves only to the liminf-estimates, which are sufficient to obtain convergence of the solutions, an approach also taken in [EFLS16,Erb16,MM20]. Namely, by a lower semicontinuity and compactness argument, Theorem 1.8 implies the convergence of both the solutions and the free energies  , if the initial data are well prepared. The second half of Theorem 1.9, on the concentration around mean-field solutions and convergence of entropies, follows directly from the definition of  ∞ and uniqueness.
For interacting particle systems where the number of particles is fixed at ∈ ℕ the narrow convergence ̂ → ̂ is equivalent to propagation of chaos in the sense of Sznitman [Szn91], and would imply narrow convergence of the -particle marginals at time to ⊗ . However, in our setting, this implies convergence of the -correlation functions, see [BGSRS20].
Moreover, the convergence of the free energies  implies the stronger notion of entropic propagation of chaos if the initial condition is sufficiently regular.
To the authors' knowledge, this is the first entropic propagation of chaos result for bounded competition kernels over compact Polish spaces, under the assumption of detailed balance.

Comments.
We have given an overview of the generalized gradient structures that we introduced for the forward Kolmogorov equation of our underlying interacting particle system and alluded to how this sequence of structures converges to a gradient structure induced by the mean-field limit. Throughout, we assumed bounded measurable rates , over a compact Polish space  satisfying the detailed balance condition ( , ) = ( , ) and ( , ) = 0 for all , ∈  , and we would like to briefly touch on possible relaxations of these assumptions.
First, for the limit inferior in Theorem 5.1, there is a technical issue concerning the possible non-continuity of the competition kernel , which we resolve by an approximation argument from large deviation theory [HHMT20], see Appendix C. This argument can be straightforwardly extended to unbounded rates and under certain exponential integrability estimates with respect to the reference measure . However, the uniqueness of solutions and the well-posedness of the variational formulations would be less clear.
Moreover, it should be noted that although for brevity and clarity we chose  to be compact, many of the listed results carry over to the case of  Polish with finite , under suitable choices of topologies and by bootstrapping from the tightness of . For σ-finite , this is not necessarily the case and would depend strongly on newly constructed estimates on the propagation of tightness.
A more fundamental restriction is the detailed balance assumption, which is necessary to phrase the variational structures in terms of generalized gradient systems and the evolution in terms of a gradient flow. However, there exist possible extensions and decompositions of variational structures for jump processes that do not assume detailed balance or even complex balance, see for example [KJZ18] for an overview. Therefore, in future work, the authors plan to generalize the variational methods outlined here to more general evolutions.
1.4. Notation. Below we collect some of the notation used throughout this paper.

MEAN-FIELD EQUATION
In this section, we will discuss the gradient-flow formulation of the mean-field equation under the detailed balance condition. Let us first make precise the context of Theorem 1.4, and embed it within the more general statement of Theorem 2.7 below.
Recall that the trait space  is a compact Polish space, and Γ ∶=  + ( ) is the space of finite non-negative measures over  equipped with the narrow topology. Fix a reference measure ∈ Γ, and rates , satisfying Assumption 1.1, i.e. , ∈  ( ×  ) with ( , ) = ( , ) for all , ∈  , and ( , ) = 0 for all ∈  . The mean-field equation then reads with measure-dependent birth and death kernels ± ∶ Γ → Γ given by Routinely, we will also adopt the shorthand notation ± ∶= ± [ ]. Now, setting ( ) ∶= ∫  ( , ) (d ), it is clear that + = , − = , and the dynamics simplify to Strong solutions to ( ) in either total variation or appropriate 1 spaces follow straightforwardly via classical methods, see Section 2.2.
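To make the mean-field dynamics concrete, the following sketch integrates a finite-trait-space analogue with a forward Euler scheme and records the relative entropy with respect to the reference measure along the way. Everything specific here is an assumption made for illustration: the trait space has three points, the birth flux at x is read as `gamma[x] * sum_y m[y][x] * nu[y]` and the death flux as `nu[x] * sum_y c[x][y] * nu[y]`, the competition kernel is symmetric with zero diagonal, the mutation kernel is taken equal to it (one way to satisfy a detailed-balance-type condition), and the free energy is taken to be the relative entropy with respect to `gamma`. The function names are hypothetical.

```python
import math


def phi(s):
    """Entropy function s*log(s) - s + 1, extended by phi(0) = 1."""
    return 1.0 if s == 0.0 else s * math.log(s) - s + 1.0


def relative_entropy(nu, gamma):
    """Relative entropy of nu w.r.t. gamma for finite measures on a finite set."""
    return sum(g * phi(v / g) for v, g in zip(nu, gamma))


def mean_field_step(nu, gamma, m, c, dt):
    """One forward Euler step of d(nu)/dt = birth(nu) - death(nu)."""
    K = len(nu)
    new = []
    for x in range(K):
        birth = gamma[x] * sum(m[y][x] * nu[y] for y in range(K))
        death = nu[x] * sum(c[x][y] * nu[y] for y in range(K))
        new.append(nu[x] + dt * (birth - death))
    return new


# Assumed toy data: symmetric competition kernel with zero diagonal, and m = c.
gamma = [0.2, 0.3, 0.5]
c = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
m = c
nu = [0.6, 0.1, 0.1]

entropies = [relative_entropy(nu, gamma)]
for _ in range(2000):
    nu = mean_field_step(nu, gamma, m, c, dt=0.01)
    entropies.append(relative_entropy(nu, gamma))

print("initial entropy:", entropies[0])
print("final entropy:  ", entropies[-1])
```

Under these assumptions each coordinate relaxes towards the corresponding weight of `gamma`, so the recorded entropy does not increase along the discrete trajectory, in line with the Lyapunov property of the free energy mentioned in the introduction.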
and the squared Hellinger distance 2 is given by with a measure dominating both and . Note that the definition (2.2) is independent of the choice for the dominating measure , and = + is always admissible. Moreover, recall the entropy function ϕ ∶ ℝ ≥0 → ℝ ≥0 and its Legendre dual ϕ* ∶ ℝ → ℝ by ϕ(s) ∶= s log s − s + 1, ϕ*(ζ) ∶= e^ζ − 1, and the relative entropy of with respect to as We will consider curves satisfying the continuity equation in an appropriately weak sense.
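The duality between the entropy function and the exponential can be checked numerically. The sketch below assumes the standard pair s log s − s + 1 and e^ζ − 1; the names `phi`, `phi_star`, and `numeric_dual` are hypothetical, and the supremum in the Legendre transform is approximated by a maximum over a uniform grid.

```python
import math


def phi(s):
    """Entropy function s*log(s) - s + 1, extended by phi(0) = 1."""
    return 1.0 if s == 0.0 else s * math.log(s) - s + 1.0


def phi_star(z):
    """Closed-form Legendre dual of phi: exp(z) - 1."""
    return math.exp(z) - 1.0


def numeric_dual(z, s_max=20.0, steps=20000):
    """Approximate sup_{s >= 0} (z*s - phi(s)) by a maximum over a grid on [0, s_max]."""
    grid = (s_max * k / steps for k in range(steps + 1))
    return max(z * s - phi(s) for s in grid)


for z in (-2.0, -0.5, 0.0, 1.0, 2.0):
    print(z, numeric_dual(z), phi_star(z))
```

The maximizer is s = e^ζ, so the grid has to cover that point; for the values of ζ used here, the default `s_max` is large enough.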
We will refer to net = + − − as the net flux.
Remark 2.2. When seen as approximations of particle systems the birth/death fluxes ± represent the observed amount of mass being created/annihilated around a certain point, and represents the density of the particles, while ± correspond to the expected birth and death fluxes of the BPDL model. d ∶= for any dominating measure . We define the following objects: Remark 2.5. Since ( ) < ∞ by Lemma 2.9 all objects above are well-defined, and it is straightforward to verify via the dual representation of the entropy that  ,  * are truly dual objects in the sense that and vice versa.
Remark 2.6. If ≪ with d = d , note that d = √ d , and that the Fisher information simplifies to We are now able to fully state the variational characterization of strong solutions to the mean-field equation ( ).
Theorem 2.7. For any ( , Moreover, whenever  ( 0 ) < ∞ and  ( , + , − ) < ∞ the chain rule for  holds:  ( ) is absolutely continuous and The proof of Theorem 2.7 is postponed to Section 2.3, where we establish the main technical ingredient, namely the chain rule for the entropy functional.
Remark 2.8. The non-negativity of  and the fact that null-minimizers are solutions to ( ) are related to the formal equivalence where  is the so-called Lagrangian given by Note that  is non-negative and zero if and only if ± = ± . Although we do not prove the full equivalence in this work, it does play a role in the intuition and motivation behind the EDP-functional  with the Lagrangian  stemming from a large deviation perspective, as seen in Appendix A.

A priori estimates.
In this section, we will collect some elementary estimates and results that are either necessary for the well-posedness of the mean-field equation and the corresponding gradient structure, or necessary to do the same for the Liouville equation in Section 4.
Let Ψ * be given as . Then the following estimates hold: (i) The measures ± and are finite: (ii) For any birth/death fluxes ± ∈  + ( ), net flux net = + − −, and ± , ∈ ( ), For any birth/death fluxes ± ∈ Γ, Remark 2.10. Although the estimate for can be made more precise, namely we will not require it for our results.
for any dominating measure we have by Hölder's inequality Using the elementary inequality | | ≤ + − we derive by duality of the entropy Next, fix any measurable function ∈ ( ) and set its -truncation ∶= max{min{ , }, − }. Since Ψ * is even and monotone, by dominated convergence applied to the left-hand side and monotone convergence to the right-hand side, the inequality holds for as well.
(iii) Without loss of generality, suppose that  is finite. Set ( ) ∶= (1 + ( ) 2 ) −1 , and note that 0 ≤ ( ) ≤ 1. With̃ ( ) ∶= ( ∨ 1) the monotone relaxation of , we then have the following chain of inequalities, where the last inequality follows from Jensen's inequality. By convexity of̃ and̃ (0) = 0 the latter expression is monotone in ( ), and hence by (2.8) we find ± ( ) We will briefly state the improvement of regularity in time of if there exists a common dominating measure. The proof is similar to Corollary 4.14 of [PRST22] and therefore omitted here.
In particular, the continuity equation holds in the strong sense, namely that is an a.e. differentiable map from [0, ] to (Γ, ‖ ⋅ ‖ ) and Next, we will list two results that are either necessary for the chain rule in Section 3.3 or the superposition principle and well-posedness of the continuity equation in Section 4.

Moreover, for any net flux
Proof. It is straightforward to check that Ψ * ( )∕ 2 is monotone increasing for ≥ 0, from which the first statement follows. Now, for the net flux, it is convenient to go through the dual representation. Set ( ) ∶= (1+ ( )) −1 . By duality, for any ∈  ( ) However, by (2.10), Taking the supremum over all ∈  ( ) in (2.12) we find (2.11).
Proof. Since is narrowly continuous its mass is uniformly bounded in time, hence let ∶= sup ∈[0, ] ( ). By (2.9) and monotonicity of (⋅ ∨ 1) we have for a.e. ∈ [0, ], and therefore by convexity of (⋅ ∨ 1) Since the measures By a monotone class argument this can be extended to all ∈  ( ) and we derive that is indeed TV-absolutely continuous and ( , + , − ) ∈ ℰ.

Strong solutions.
Strong solutions to ( ) exist and are unique, and we list the most important properties here. It should be noted that these arguments apply even without the detailed balance condition ( , ) = ( , ) and only require both ‖ ‖ ∞ and ‖ ‖ ∞ to be finite, but for simplicity, we will restrict ourselves to our framework. Moreover, in all results the time window > 0 is arbitrary.
The proof is an adaptation from [FM04, Proposition 7.2], which is stated for Lebesgue absolutely continuous measures over  = ℝ . In short, the linear dependence of the birth flux on the mass of gives a bound on this mass uniform in time, in which case both ± are Lipschitz in on (Γ, ‖ ⋅ ‖), and classical existence theory can be applied.
Proof. First, note that for the linear case of with ∈  uniformly bounded and ∈ Γ with ∫ 0 ‖ ‖ d < ∞ with a common dominating measure, it is easy to verify that a unique strong non-negative solution exists and is given by We now set 0 ∶=̄ for all ∈ [0, ], and perform the implicit Picard iteration It is straightforward to check that for all We will show that  is contractive under a suitable metric on the space of curves with initial datā and mass bounded by . This implies there exists a -absolutely continuous curve such that Moreover, since in the iterations ≪̄ + for all it is clear that we obtain strong solutions in 1 (̄ + ). In particular, for̄ ≪ we have ≪ for all ∈ [0, ] as well. Now, note that ⟨ ( , ⋅), ⟩ depends Lipschitz on in (Γ, ‖ ⋅ ‖ ) due to the uniform bound on mass. This implies that there exists a constant such that for any two admissible curves ,̃ : Hence, by a Gronwall-type argument, we find that for any > 0 for all ∈ [0, ] , thus yielding the contraction required to apply the Banach fixed-point theorem.
Finally, for use in the entropic propagation of chaos result of Theorem 5.4, it is convenient to characterize the conditions under which is bounded from above and below. The following statement follows directly from a Gronwall-type argument.
Lemma 2.17. Suppose 0 is such that −1 ≤ d 0 ∕d ( ) < for some constant > 0 and all ∈  . Then there exists a constant > 0 such that for the corresponding solution

Variational characterization.
We will now prove the non-negativity of our EDP-functional  and the characterization of strong solutions to ( ) as minimizers of  . To do so we first need to prove the chain rule for the free energy  along curves with finite  .
There is an important technical issue concerning the Fisher information, in the sense that on curves with finite  the chain rule inequality holds for the following replacement: We will see the same principle arise in Section 3 for the variational characterization of the forward Kolmogorov equation, which is also observed in [PRST22, Section 5].

Lemma 2.18. For any curve
Moreover, for such a curve Remark 2.19. In fact, for such curves, for a.e. both the terms will be finite, and hence We will show that whenever  < ∞ the mapping ↦ Ent( | ) is absolutely continuous and satisfies the chain rule, i.e.
and in particular ({ = 0}) = 0. Similarly, ± ≪ for a.e. and hence > 0 for ± , net -a.e. for such as well. Furthermore, since for a.e. we have ± ≪ ≪ we find by Lemma 2.11 that ∶ [0, ] → 1 ( , ) is absolutely continuous and differentiable at a.e. ∈ [0, ]. Consider any such with  ( , + , − ), ( ) < ∞. By Lemma 2.9, for any ∈  ( ), Now let be the convex and uniformly Lipschitz regularizations of constructed by using the trun- Note that ′ converges pointwise to ′ , and both and | ′ | converge monotonically to and | ′ | respectively. Moreover, note that ′ ( ) = log is -a.e. finite, and similarly ± -a.e. as well. Therefore, since Ψ * is even and monotone on ℝ ≥0 we derive Recall that  − ( ) ≤  ( ). By substituting = 1 2 ′ in (2.14) we find and after a monotone convergence argument Note that for every the function is smooth and uniformly Lipschitz, thus the functional ∫ ( ) d is ‖ ⋅ ‖ -Lipschitz continuous and hence absolutely continuous by TV-regularity of . Moreover, since ± ≪ and is a.e. differentiable in 1 ( , ) it is straightforward to check that Therefore, since Ent( 0 | ) is finite by assumption and the functionals ∫ ( ) d converge monotonically to Ent( | ), we find In particular Ent( | ) is finite for all ∈ [0, ], and after repeating the argument for , ∈ [0, ] we conclude by a dominated convergence argument that We are now finally in a position to prove Theorem 2.7. With the chain rule above, all that remains is, on the one hand, showing that  − ( , + , − ) = 0 implies that ± = ± for a.e. , and on the other hand, showing that if is a strong solution it holds that  − ( , + , − ) = 0 and  − =  for a.e. ∈ [0, ]. The second part again involves proving a chain rule, but now along the solution curve.
Conversely, assume that is a strong solution with  ( 0 ) < ∞. Recall that ≪ for all ∈ [0, ] by Lemma 2.16, and hence ± ≪ as well. Therefore we can again write ∶= d ∕d , is absolutely continuous and a.e. differentiable, and thus for every regularized entropy function: Note that the latter expression is non-positive since ′ ( )( − 1) is non-negative, due to the convexity of and (1) = 0. Moreover, recall that the regularized entropies converge for every , are non-negative, and nt( 0 | ) < ∞ by assumption. Therefore It is clear that to obtain  = 0 it is sufficient to prove that for any with ≪ , By non-negativity of the integrand both Since ′ (0) = − this implies that in fact for all but since the former is finite after taking the limit → ∞, we deduce that and hence ({ = 0, > 0}) = 0. Moreover, by monotone convergence we have Note by straightforward algebraic manipulation that Therefore Since all terms are non-negative we can separate terms and reduce the expression to Here the equality follows from the fact that ({ = 0, > 0}) = 0 and hence i.e.  − ( ) =  ( ), and

FORWARD KOLMOGOROV EQUATION
In the introduction, we discussed how the BPDL model describes a measure-valued process in Γ involving particles being created and annihilated, with the corresponding forward Kolmogorov equation where ∈ (Γ) for all ∈ [0, ] and * is the dual of the infinitesimal generator with for all ∈ (Γ). Throughout this section the parameter > 0 will be fixed.
In the case of  = ℝ it is shown in [FM04] that a measure-valued process with generator exists, and is in fact a jump process in Γ corresponding to the jump kernel shown below. However, for our general setting with  a compact Polish space, we will take ( ) simply as a starting point, and do not consider the existence or convergence of the measure-valued process itself, even though we will sometimes borrow the language of jump processes for illustration purposes.
In this section, we will state the general version of Theorem 1.6 by showing that a detailed balance condition holds, establishing a generalized gradient structure for the forward Kolmogorov equation, and characterizing the solutions as minimizers of corresponding EDP-functionals. As in Section 2, we first give an overview of the ingredients needed to state the main results, and leave the proofs of the existence of solutions and of the variational characterization to Sections 3.2 and 3.3.
Note that since sup ∈Γ ± ( ) = +∞, the operator is not bounded on  (Γ). If it were, suitable solutions and a possible variational formulation would fall into the framework of [PRST22], where triples ( , , ) are considered, with a Polish space, a finite measure, and ( , d ) a jump kernel satisfying a detailed balance condition with respect to and the boundedness condition The authors of [PRST22] construct solutions to the forward Kolmogorov equation that are absolutely continuous with respect to and characterize them as minimizers of a suitable EDP functional involving the net flux. In this section, we generalize part of this framework to unbounded kernels and so-called one-way or uni-directional fluxes and tailor it to our setting of interacting particle systems.
Namely, let the rescaled empirical measure mapping ∶ ∐ ≥1  → Γ be given as and let Γ ⊂ Γ be the space of finite positive discrete measures with common unit weight 1 , i.e.
Note that the operators , * can be represented as where ( , ⋅) ∈  + (Γ ) for all ∈ Γ is a jump kernel over Γ given by Moreover, we consider Poisson measures Π ∈ (Γ ) induced by the reference measure . Namely, with the measure ∈ ( ∐ ≥1  ) given by We will show in Lemma 3.12 that the measures Π are invariant measures of ( ) and that satisfies the detailed balance condition with respect to Π , i.e. we have the symmetry It is straightforward to check that even though is unbounded, we still have the weighted integrability condition Therefore we can still bootstrap from gradient-flow solutions in the sense of [PRST22] for regularized triples (Γ , Π ,̄ ), after passing from a net flux to a one-way flux formulation, see Appendix A, to obtain unique gradient-flow solutions as defined in Section 3.2.
We consider the families of curves satisfying ( ) in the following appropriate distributional sense.
Throughout we will call arbitrary measures ± ∈  + (Γ ×  ) admissible if Moreover, since Γ is a closed subspace of the Polish space Γ, the extension of to (Γ) and the extension of ± to  + (Γ ×  ) are well-defined. For simplicity we will simply refer to them as , ± as well, and drop the -dependence in most arguments. It is also clear that for any admissible ± (∇ ,± )( , ) ∶= ( ± 1 ) − ( ) , ( , ) ∈ supp( ± ) and in particular (3.11) is equivalent to for all ∈ (Γ). Note that this can again be extended to all ∈  (Γ) via a monotone class argument.
Remark 3.2. Condition (2) represents the restriction that particles can only be deleted if there are at least two particles in the system, consistent with the fact that ∈ (Γ ) and hence the underlying process never attains = 0.
Moreover, condition (3) reflects the unboundedness of the observed fluxes ± , which stems from the unboundedness of the birth/death kernels ± in .
In order to define the dissipation potentials, let us introduce the measures ± ∈  + (Γ ×  ) Note that for any curve ( ) ∈[0, ] the measures ± ∶= ± satisfy the conditions (2) and (3), where the latter holds because ( , ) = 0. Moreover, as will be shown in Lemma 3.12, we have the following symmetry from which the detailed balance condition (3.7) directly follows.
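To illustrate the detailed balance condition in the simplest possible situation, the following sketch considers a toy model with a single trait, so that the state is just the particle number n ≥ 1. It assumes per-capita birth at rate m and death by pairwise competition only, with rate c per ordered pair; these rates, the parameter names, and the Poisson intensity ρ = m/c are illustrative assumptions, not the paper's general kernels. In this toy setting the unnormalized Poisson weights ρ^n/n! satisfy π(n) b(n) = π(n+1) d(n+1), and the death rate vanishes at n = 1, matching the restriction in Remark 3.2.

```python
import math


def log_poisson_weight(n: int, rho: float) -> float:
    """Logarithm of the unnormalized Poisson weight rho**n / n!."""
    return n * math.log(rho) - math.lgamma(n + 1)


def birth_rate(n: int, m: float) -> float:
    """Total birth rate in state n: each particle reproduces at rate m (assumed)."""
    return m * n


def death_rate(n: int, c: float) -> float:
    """Total death rate in state n: pairwise competition only (assumed)."""
    return c * n * (n - 1)


def detailed_balance_defect(m: float, c: float, n_max: int) -> float:
    """Largest relative mismatch between the flux n -> n+1 and the flux n+1 -> n."""
    rho = m / c  # Poisson intensity for which the fluxes balance
    worst = 0.0
    for n in range(1, n_max):
        forward = math.exp(log_poisson_weight(n, rho)) * birth_rate(n, m)
        backward = math.exp(log_poisson_weight(n + 1, rho)) * death_rate(n + 1, c)
        worst = max(worst, abs(forward - backward) / max(forward, backward))
    return worst
```

For example, `detailed_balance_defect(2.0, 0.02, 200)` should be zero up to floating-point rounding. If the competition rate is taken of order 1/N, the balancing intensity ρ is of order N, which is the scaling one would expect for the rescaled Poisson measures.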
Remark 3.5. The definition of Θ ,± is independent of the dominating measure Σ. Moreover, formally i.e. it represents the geometric mean of the expected fluxes going forwards and backwards along the transition ↔ + 1 . In addition, due to the symmetry (3.13) the measures Θ ,± simplify whenever ≪ Π , i.e. if d = dΠ we have Remark 3.6. Note that  is a jointly convex function in ( ± , ,∓ # ∓ ), and lower semicontinuous if  is bounded. Moreover, it is straightforward to check that whenever ≪ Π with d = Π it holds Finally, for technical purposes, we also introduce a version for net fluxes.
We are now in a position to give the general version of Theorem 1.6.
Theorem 3.8. For any ( ,
The proof of Theorem 3.8 is postponed to Section 3.3, and follows from the existence of a gradient-flow solution via EDP-convergence of a sequence of regularized problems established in Section 3.2, and its uniqueness via a convexity argument.
Remark 3.9. Similar to the mean-field case, the non-negativity of  and the identification of solutions to ( ) as null-minimizers of  are related to the formal equivalence where  is the so-called Lagrangian given by We discuss the implication of this relation in Appendix A.

3.1. A priori estimates.
Below we will state the estimates and identities necessary to prove the chain rule and establish the existence of solutions.
Recall that ± satisfies the same restrictions (Conditions (2) and (3)) as the fluxes ± . This is easily verified, but since we will use it repeatedly let us state it here precisely.  . We then have the following.
Since − [ 1 ] = 0 for any ∈  , the sum in the right-hand side of the last expression starts from = 2, thus reducing the expression to It is clear that, for our desired equality, it is enough to show that for every , To do so, note that since ( , ) = 0, Hence, by symmetry of ⊗( +1) , we obtain as desired.
We then have the following result.
Lemma 3.13. The following statements hold: (ii) For any , admissible ± , and net flux net = + − ,− Moreover, for any common dominating measure Σ. Moreover, if d = dΠ ,
Remark 3.14. Since ≤ 3 for all ≥ 1 the estimates (3.16) are uniform in , which we will use in the EDP-convergence to establish tightness of sequences ,± under a bound on  . Moreover, the representation (3.17) is used to deduce the lower semicontinuity of  for sequences of curves.
(ii) By duality we have for any ∈  (Γ ×  ), Substituting + = , − = − • ,− and using the fact that ,− # Θ ,− = Θ ,+ we derive Since Ψ * is even we can replace and by their absolutes in the inequality, after substituting for appropriately, and we conclude with a monotone convergence argument. The inequalities (3.16a) and (3.16b) now follow similarly as in Lemma 2.9 via respectively Jensen's inequality and a dual approach.
Finally, we discuss the time-regularity of for admissible curves and state the analog of Lemma 2.11. Let the weighted total variation metric , be given as Note that , is lower semicontinuous with respect to the narrow topology, and while convergence in , does not directly imply narrow convergence, it does so on narrowly pre-compact sets.
Alternatively, in terms of the net-flux net = + with net ∶= + − − • ,+ , Remark 3.16. Note that the estimate (3.19) for the weighted total variation metric blows up as → ∞.
For the proof of EDP-convergence we instead use a weaker metric, the transportation-like metric defined by (4.4), which does behave uniform-in-for a sequence of curves with finite lim sup →∞  .
Proof. Due to the continuity equation and after a monotone class argument, we have the crude estimate for any ∈  (Γ). Now fix ∈  (Γ), and let ∶= sup ∈Γ ( )(1 + ( ) 2 ). Note that by the bounds of Lemma 3.13 for any ∈ Γ , we have the following estimates Taking the supremum over all ∈  (Γ) with sup ∈Γ ( )(1 + ( ) 2 ) ≤ 1 we conclude that Next, suppose that ≪ Π , ± ≪ ± Π for all ∈ [0, ]. Let = d ∕dΠ , ± = d ± ∕d ± Π . Note that by the absolute continuity of with respect to , , the map ↦ is absolutely continuous in 1 ( ). Moreover, for every ∈  (Γ) the continuity equation reads as But due to Lemma 3.12, the integrands can be rewritten as follows and therefore which is the weak formulation of (3.20). Putting in the pre-factors (1+ ( ) 2 ) −1 to state the expression in terms of the finite measures and Σ, and noting that due to time-regularity (1 + ( ) 2 ) −1 is TV-regular, we can proceed as in Corollary 4.14 of [PRST22] and conclude the proof after redefining , ± on negligible sets.

Weak solutions.
In this section we will discuss the existence of weak solutions to ( ), i.e. solutions to = div ,+ + + div ,− − , in appropriate weak form, but with the property that  ( , + , − ) ≤ 0. In the next section we will show that  ≥ 0 and that gradient-flow solutions, i.e. those with  = 0, are in fact unique.
Definition 3.17. A curve ( ) ∈[0, ] is a weak solution to ( ) if supp ∈ Γ for all ∈ [0, ], is continuous in the narrow topology and for all , ∈ [0, ], and all ∈ (Γ), Moreover, solutions turn out to inherit polynomial mass-estimates from the initial condition, see e.g. Theorem 3.1 of [FM04] for the case in ℝ . While throughout we do not assume more from the initial condition than having finite entropy with respect to Π (which does imply the finiteness of the first moment) and unfortunately arbitrary curves ( , + , − ) with finite  do not preserve moment estimates, we will include the statement for completeness.
We can now state the existence result of a weak solution satisfying one-half of the Energy-Dissipation principle, which is complemented by the chain rule proved in Section 3.3. The existence proof is one of EDP-convergence (see also Section 5), bootstrapping from problems with bounded kernels and the results of [PRST22].
Thus, fix any > 0. Due to the bound (3.21) it is clear that , is a bounded operator since Moreover, since the prefactor ( ) ( ) is symmetric under swapping of and , it is straightforward to verify that is still reversible with respect to the same invariant measure Π , i.e. we have The triple (Γ, Π ,̄ ) therefore satisfies the assumptions of [PRST22]. Keeping in mind the difference in definitions of Ψ * due to the extra factor 2, by [PRST22, Theorem 6.6] there exists a unique curve ∈ 1 ([0, ], 1 (Γ, Π )) such that 0 = d̄ ∕dΠ , and with ∶= Π as usual. In particular the entropy nt( |Π ) decreases along the solution and hence By evenness of Ψ, symmetry of Π ̄ and the identity (A.2), we can express for any after substituting for ( , d ) Moreover, it is straightforward to check that and therefore with ± ∶= ±, we conclude Finally, note that by Lemma 3.13 and Remark 3.6 Next, we consider the sequence of pairs ( , ±, ) stemming from the regularized problems above, satisfying  , ( , +, , −, ) = 0 for all > 0. As for a priori estimates, we have Recall that is lower semicontinuous with respect to the narrow topology and convergence in implies narrow convergence on narrowly pre-compact sets. Since nt( |Π ) is bounded uniformly in and and nt(⋅|Π ) is narrowly coercive we obtain by a standard Arzelà-Ascoli argument, up to choosing a subsequence, the existence of a curve ↦ such that → narrowly for all ∈ [0, ].
Note that by the estimate (3.22) and lower semicontinuity of the entropy, we have that for every ∈ [0, ], the sequence of measures converges setwise to and nt( |Π ) ≤ nt(̄ |Π ) < ∞. Moreover, ±, ↗ ± as → 0 for every , and hence setwise convergence of implies setwise convergence on pre-compact sets of Γ ×  for It is straightforward to check that we can pass to the limit in the continuity equation (3.11), and in particular, derive that is a weak solution to the unregularized problem. Finally, recall that  ( ) is convex in and narrowly lower semicontinuous in , and as shown above the action  is jointly convex and lower semicontinuous in ( ±, , ,∓ # ∓, ). Proceeding as in Remark 3.6, we also find that the Fisher information is jointly convex and lower semicontinuous in ( ±, , ,∓ # ∓, ) if are contained in sub-level sets of  . Therefore, we conclude that thus establishing the claim.
Together, Theorems 3.21 and 3.20 provide a proof of the variational characterization for the forward Kolmogorov equation.
Proof of Theorem 3.8. Under the assumption of  ( 0 ) < ∞ we have by Theorem 3.21 a chain rule for the entropy, the inequality  ≥ 0, and the statement that  ( , + , − ) = 0 implies that is a weak solution. Moreover, due to Theorem 3.20 there exists a weak solution with  ≤ 0.

LIOUVILLE EQUATION AND LIFTED DYNAMICS
In this section, we will consider the variational formulation for our proposed limit of the forward Kolmogorov equation , namely the Liouville equation It can be interpreted as a transport equation lifted from the mean-field dynamics, in the sense that it describes the evolution of the law of a deterministic process satisfying the mean-field equation but with possibly random initial conditions. We will consider the same ingredients as in previous sections, namely a non-negative EDP functional consisting of an action term, a difference of free energies, and a corresponding Fisher information term. The main technical tool that we use is a new superposition principle, which allows us to prove the chain rule via the results on mean-field curves of Section 2.
To be precise, we consider the following type of solutions.  As will be shown in Section 4.2, ∶= ( ) #̄ is a weak solution to ( ) for any initial datā ∈ (Γ).
In particular, if is a solution to ( ) then ∶= is a weak solution to (Li).
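The interpretation of the Liouville equation as the evolution of the law of a deterministic process with random initial data can be illustrated in one dimension. The sketch below assumes a scalar toy mean-field dynamics du/dt = m u − c u² (a logistic equation standing in for the mean-field equation; the rates `m`, `c` and all function names are hypothetical). An empirical law of initial conditions is pushed forward by the flow, and the weak form of the transport equation, d/dt ∫ F dP_t = ∫ F′ V dP_t, is checked by a central finite difference for the test function F(u) = u².

```python
import math


def velocity(u: float, m: float, c: float) -> float:
    """Toy mean-field velocity field V(u) = m u - c u^2 (assumed)."""
    return m * u - c * u * u


def flow(u0: float, t: float, m: float, c: float) -> float:
    """Closed-form solution at time t of du/dt = V(u) with u(0) = u0 >= 0."""
    k = m / c  # carrying capacity
    e = math.exp(m * t)
    return k * u0 * e / (k + u0 * (e - 1.0))


def pushforward(samples, t, m, c):
    """Push an empirical law (list of initial conditions) forward by the flow."""
    return [flow(u0, t, m, c) for u0 in samples]


def liouville_residual(samples, t, m, c, h=1e-5):
    """|d/dt E[F(u_t)] - E[F'(u_t) V(u_t)]| for the test function F(u) = u^2."""
    def mean(xs):
        return sum(xs) / len(xs)

    later = mean([u * u for u in pushforward(samples, t + h, m, c)])
    earlier = mean([u * u for u in pushforward(samples, t - h, m, c)])
    lhs = (later - earlier) / (2.0 * h)
    rhs = mean([2.0 * u * velocity(u, m, c) for u in pushforward(samples, t, m, c)])
    return abs(lhs - rhs)
```

The residual is small for any choice of samples because each sample follows the characteristic of the transport equation; no uniqueness or regularity claim about the infinite-dimensional setting of the paper is implied by this finite-dimensional sketch.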
In particular, if  ∞ ( ) < ∞ we have Remark 4.6. Note that Θ ∞ (d , d ) = (d ) (d ). Moreover, if nt( ± |Θ ∞ ) is finite, we can set and it is straightforward to verify that we have the disintegration and the equivalence Together with the definitions of  ∞ and  ∞ this implies that if  ∞ ( , + , − ) is finite then the ± [ ] are well-defined for a.e. ∈ [0, ], and Throughout the rest of this section we will simply write ± , = ± [ ]. We will show the following equivalence, which subsumes Theorem 1.7.
We do not have a priori uniqueness for the Liouville equation. However, we do have uniqueness of weak solutions for which a superposition holds, in particular for curves with finite  ∞ . Therefore gradient-flow solutions (null-minimizers of  ∞ ) are in fact unique.
In the case of ∶= with the solution to the mean-field equation there is a trivial superposition principle, and we have the following consequence.

4.1. A priori estimates.
Due to the representation (4.3) of the dissipation potential in terms of mean-field objects, we can directly derive the following estimates from Lemmas 2.9 and 2.12.

Corollary 4.10. For any
for any common dominating measure Σ.
Finally, we consider the time-regularity for arbitrary curves, with respect to the following metric.
Definition 4.11. We define the following metric: Note that is narrowly lower semicontinuous. Moreover, for any ∈ Cyl (Γ) automatically ‖ ‖ (1 + ( ) 2 )grad Γ ‖ ‖∞ < ∞, and hence by a density argument it is straightforward to verify that convergence in implies vague convergence on Γ, and therefore narrow convergence on narrowly pre-compact subsets.
The inspiration for using a superposition principle stems from similar approaches in [EFLS16], [Erb16], where it is applied to transport equations lifted from the Boltzmann-equation or mean-field jump dynamics respectively, and the main ingredient is the abstract superposition principle over ℝ ℕ of [AT14]. However, these results are not directly applicable to our setting, since the mass of ( ) for a mean-field curve is not fixed, and [ ]( ) is finite but unbounded over Γ. We remedy this by combining two known superposition principles: on the one hand, the abstract superposition principle over ℝ ℕ of [AT14], and on the other hand one for finite-dimensional vector fields with linear growth, found in [AC08]. Our result is stated in Theorem B.1.

Variational characterization.
Having all the ingredients at hand, we can now prove the variational characterization for the Liouville equation, namely Theorem 4.7.
Proof of Theorem 4.7.
Since  ∞ ( 0 ) < ∞ we have that for -a.e. curve  ( 0 ) < ∞. Moreover, since both  ∞ and  ∞ are simply their mean-field counterparts integrated by , we find where the second equality follows from Fubini-Tonelli and the fact that  ,  ,  ≥ 0 and  ∞ ( 0 ) < ∞. In particular, by the non-negativity of  it holds that  ∞ ≥ 0. Moreover, since  = 0 if and only if is the unique strong solution for an initial datum̄ with nt(̄ | ) < ∞, we derive by non-negativity of  that  ∞ = 0 if and only if is concentrated on the unique solutions of the mean-field equation. In this case is characterized by where ∶ Γ → Γ defined by (4.2) maps anȳ to the unique solution to ( ) for initial condition for almost every ∈ [0, ], and in particular is a weak solution to (Li). Conversely, if is a weak solution such that = ( ) # 0 , we simply set Since  ∞ ( 0 ) < ∞, we still have nt( | ) < ∞ for 0 -almost every , and we repeat the same calculations to conclude that indeed  ∞ = 0.

EDP CONVERGENCE
In the previous sections, we have established variational formulations for the solution to the forward Kolmogorov equation of the interacting particle system, for the solutions to the mean-field equation, and the corresponding Liouville equation. Moreover, for the latter, we have shown how the corresponding EDP-functional can be represented as the expectation over a functional of mean-field paths.
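As a concrete, heavily simplified illustration of the large-population limit studied in this section, the following sketch simulates a single-trait birth-death jump process with the Gillespie algorithm and compares the rescaled particle number with the solution of the logistic equation du/dt = m u − c u². The rates (birth m per particle, death c(n − 1)/N per particle), the scaling parameter `n_scale`, and all function names are assumptions made for this toy example; the paper's model allows general trait spaces and kernels.

```python
import math
import random


def simulate_density(n_scale, u0, t_end, m, c, seed=0):
    """Gillespie simulation of the toy chain; returns n(t_end) / n_scale.

    Birth rate in state n: m * n. Death rate: c * n * (n - 1) / n_scale,
    which vanishes at n = 1, so the population never goes extinct.
    """
    rng = random.Random(seed)
    n = max(1, round(n_scale * u0))
    t = 0.0
    while True:
        birth = m * n
        death = c * n * (n - 1) / n_scale
        total = birth + death
        t += rng.expovariate(total)  # exponential waiting time until the next jump
        if t > t_end:
            return n / n_scale
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1


def logistic(u0, t, m, c):
    """Solution of the assumed mean-field limit du/dt = m u - c u^2."""
    k = m / c
    e = math.exp(m * t)
    return k * u0 * e / (k + u0 * (e - 1.0))
```

For instance, comparing `simulate_density(2000, 0.2, 8.0, 1.0, 1.0, seed=1)` with `logistic(0.2, 8.0, 1.0, 1.0)` should show agreement up to fluctuations of order N^{-1/2}. This is only a law-of-large-numbers heuristic and does not reproduce the EDP-convergence statement itself.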
We are now in a position to rigorously discuss the convergence of the forward Kolmogorov equation to the Liouville equation, in terms of EDP-convergence of their gradient structures. Namely, let us denote a sequence of curves ( , ,+ , ,− ) ∈ converging to a curve ( , + , − ), denoted by lim →∞ ( , ,+ , ,− ) = ( , + , − ), if the following holds: • → narrowly for all ∈ [0, ], Moreover, for any such converging sequence Proof. Recall that  ( , + , − ) = 0 for all ≥ 0. Therefore, by (5.3) and Theorem 5.1 we have for any subsequence indexed by ′ converging to a ( , + , − ) ∈ ∞ that (5.2) holds, and hence and thus  ∞ ( , + , − ) = 0, which implies that is the unique gradient-flow solution to (Li) and ± = ± for a.e. ∈ [0, ]. The convergence of now follows from a compactness and equicontinuity argument, and by lower semicontinuity we conclude that for every ∈ [0, ] lim sup Now suppose that in addition the initial sequence of measures̄ is chaotic, in the sense that → ̄ narrowly for somē ∈ Γ.
Then as a consequence of Theorem 5.3 we have propagation of chaos, namely where is the unique solution to the mean-field equation (2.13) with initial datum̄ . As mentioned in the introduction, while for interacting particle systems with the number of particles fixed at ∈ ℕ this would imply narrow convergence of the -marginals at time to ⊗ (e.g. see [Szn91]), in our setting this implies convergence of the -correlation functions [BGSRS20]. Moreover, note that we have a stronger notion of convergence, since the free energies  converge as well. Under appropriate conditions on the initial datum̄ , this guarantees a version of propagation of entropic chaoticity. Namely, for any we define the rescaled Poisson measures It is straightforward to check that Π , * → * narrowly. We then have the following result. Theorems 5.1 and 5.4 are proved in Section 5.3. However, first we show Γ-convergence of the free energies in Section 5.1, and establish the necessary estimates in Section 5.2.

5.1. Γ-convergence of  .
While only the liminf-estimates for the free energy  are necessary for the proof of Theorem 5.1 and the convergence of solutions, we provide here the full Γ-convergence result. We rely strongly on the characterization of [Mar12], which connects a large deviation principle with rate function to the fact that and provides useful sufficient conditions for both. Recall in our setting that We then have the following result, which we prove after Lemma 5.6 below.
Theorem 5.5. The family { } ≥1 is equicoercive and Γ-converges to  in the sense that
• for any converging sequence → ∈ (Γ):
• for any ∈ (Γ) with  ∞ ( ) < ∞ there exists a sequence ∈ Γ converging to such that
By the results of [Mar12, Theorems 3.4, 3.5] it is sufficient to merely show the corresponding bounds or limits for any of the form = for some ∈ Γ. Because of this reduction, we can make use of the so-called cumulant generating functionals given by for any ∈  (Γ), and their limit counterpart Note that by duality of the entropy, we have for all > 0 the inequality and for the Legendre-dual of we have * ( ) ∶= sup We will first simplify and show that it indeed converges to .
Proof. Using the representation for the rescaled Poisson measure Π we have and after taking logarithms and dividing by we obtain the desired statement. Moreover, recall that by assumption ( ) > 0 and note that by the boundedness of , Hence we can take limit → ∞ to deduce thereby concluding the proof.
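The computation behind this proof can be sketched in a simplified setting. The display below is an illustration under assumptions, not the paper's exact statement: it takes a Poisson point process with intensity $N\gamma$ on a space $\mathcal{X}$ with $0 < \gamma(\mathcal{X}) < \infty$, conditioned on containing at least one particle, writes $\nu = \frac{1}{N}\sum_{i=1}^{n}\delta_{x_i}$ for the rescaled empirical measure, and uses a bounded measurable test function $f$. The symbols $N$, $\gamma$, $\mathcal{X}$, $f$, and $\Pi_N$ are chosen here for illustration.

```latex
\begin{align*}
\int e^{N\langle f,\nu\rangle}\,\mathrm{d}\Pi_N(\nu)
  &= \frac{e^{-N\gamma(\mathcal{X})}}{1-e^{-N\gamma(\mathcal{X})}}
     \sum_{n\ge 1}\frac{N^n}{n!}\Big(\int_{\mathcal{X}} e^{f}\,\mathrm{d}\gamma\Big)^{n}
   = \frac{e^{-N\gamma(\mathcal{X})}}{1-e^{-N\gamma(\mathcal{X})}}
     \Big(e^{N\int_{\mathcal{X}} e^{f}\,\mathrm{d}\gamma}-1\Big),\\[4pt]
\frac{1}{N}\log\int e^{N\langle f,\nu\rangle}\,\mathrm{d}\Pi_N(\nu)
  &= -\gamma(\mathcal{X})
     +\frac{1}{N}\log\Big(e^{N\int_{\mathcal{X}} e^{f}\,\mathrm{d}\gamma}-1\Big)
     -\frac{1}{N}\log\big(1-e^{-N\gamma(\mathcal{X})}\big)
  \;\xrightarrow[N\to\infty]{}\;\int_{\mathcal{X}}\big(e^{f}-1\big)\,\mathrm{d}\gamma .
\end{align*}
```

The limit uses that $\int_{\mathcal{X}} e^{f}\,\mathrm{d}\gamma > 0$, which follows from $\gamma(\mathcal{X}) > 0$ and the boundedness of $f$; this matches the two facts invoked in the proof above.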
Proof of Theorem 5.5. First, we will show that the family { } ≥1 is equicoercive, by establishing a first moment bound for in terms of mass ( ). Namely, setting = 1 in (5.4) we have for any ∈ (Γ), ≥ 1, the inequality where the final term is bounded from above independently of .
Next, for the limit inferior, consider a converging sequence → = ̄ for somē ∈ Γ. Fix any ∈ ( ), then by the duality (5.4), Taking the supremum over all ∈ ( ) we find Finally, consider anȳ ∈ Γ with nt(̄ | ) < ∞ and set = ̄ . We will construct a sequence of measures that locally consists of Poisson measures induced bȳ . Namely, set and consider the sequence ∶= Π ,̄ . It is straightforward to verify that indeed → ̄ . Moreover, note that although is not bijective, we do have the equality due to the symmetry of the -particle distributions̄ ⊗ , ⊗ . Therefore, we derive nt( |Π ) = nt( ,̄ | ) Rescaling and taking the limit → ∞, we obtain therewith concluding the proof.
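The scaling of the entropy along such a recovery sequence can be made explicit in a simplified, unconditioned setting. The following worked identity is a sketch under assumptions: $\pi_{N\mu}$ and $\pi_{N\gamma}$ denote the laws of Poisson point processes with finite intensities $N\mu$ and $N\gamma$ on $\mathcal{X}$, with $\mu \ll \gamma$ and density $u = \mathrm{d}\mu/\mathrm{d}\gamma$; this notation is introduced here for illustration and the paper's measures may differ by a conditioning on non-empty configurations.

```latex
\begin{align*}
\frac{\mathrm{d}\pi_{N\mu}}{\mathrm{d}\pi_{N\gamma}}(x_1,\dots,x_n)
  &= e^{-N(\mu(\mathcal{X})-\gamma(\mathcal{X}))}\prod_{i=1}^{n}u(x_i),\\[4pt]
\mathrm{Ent}\big(\pi_{N\mu}\,\big|\,\pi_{N\gamma}\big)
  &= -N\big(\mu(\mathcal{X})-\gamma(\mathcal{X})\big)
     + \mathbb{E}_{\pi_{N\mu}}\Big[\sum_{i=1}^{n}\log u(x_i)\Big]\\
  &= -N\big(\mu(\mathcal{X})-\gamma(\mathcal{X})\big) + N\int_{\mathcal{X}}\log u\,\mathrm{d}\mu
   = N\int_{\mathcal{X}}\big(u\log u-u+1\big)\,\mathrm{d}\gamma .
\end{align*}
```

The second step uses Campbell's formula for the Poisson point process. After dividing by $N$, the right-hand side is independent of $N$ and equals the relative entropy density integrated against $\gamma$, which is the form of the limit one would expect in the limsup estimate.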

Uniform estimates.
In Section 3.1 we provided estimates for the flux that are uniform in the scaling parameter. Namely, from Lemma 3.13, we directly have the following.

Proof of main results.
We finally conclude the manuscript with the proof of the main results.

Now suppose that
In particular we have the bounds Due to the chain rule and the assumption on  ( 0 ), we obtain The latter guarantees, by Corollary C.3, that we have the vague convergence Recall that from Lemma 3.13 and Remark 3.6 we have for each ≥ 1: for any dominating measure Σ, and similarly, from Corollary 4.10 and Remark 4.5 that  ∞ ( ) = 2 2 ( ± , ∓ ).
By the convexity and lower semi-continuity of Υ and we conclude by standard semi-continuity results (e.g. see [But89,Theorem 3.4.3]) that for each ∈ [0, ], from which (5.1) directly follows after applying the Fatou lemma.
Next, we consider the question of compactness. As in the previous part, let us consider a sequence ( , ,+ , ,− ) ∈ with which imply that the estimates (5.5) and (5.6) still hold. The bound on the free energy ensures by Theorem 5.5 that { } ∈[0, ], ≥1 is pre-compact. Moreover, due to the bound on the action  , we have by the results of Corollary 5.7 and Lemma 5.8 that where |̇ | is again the -metric speed. From (5.7), we then conclude from the monotonicity, convexity, and superlinear growth at infinity of ̃ that, up to choosing a subsequence ′ , there exists a family { ± } ∈[0, ] ∈  + (Γ ×  ) such that for all , the sequence of measures Similarly, since the metric is narrowly lower semicontinuous and induces narrow convergence on narrowly pre-compact subsets, we find by an Arzelà-Ascoli argument and the estimate (5.8) that, up to choosing a subsequence ′′ , there exists a narrowly continuous curve ( ) ∈[0, ] such that ′′ converges to for all ∈ [0, ].
All that remains is showing that ( , + , − ) ∈ ∞ . Therefore, fix any , ∈ [0, ] and ∈ Cyl (Γ). It is straightforward to verify that there exist constants and such that the following Taylor approximation holds: Thus, we can take the limit in the continuity equation , to conclude that
For each ∈ ℕ let be the unique gradient-flow solution to ( ) with initial data . Moreover, let be the unique solution to (2.13) with initial data , and set ∶= , which is the unique gradient-flow solution to the Liouville equation (Li) with initial data . Then by Theorem 5.3 we have for every ∈ [0, ] that → , and Next, suppose that in addition there exists a constant > 1 such that −1 ≤ d̄ ∕d ≤ . By Lemma 2.17 we find that there exists ′ < ∞ with Now fix any ∈ [0, ], and recall that It is straightforward to check that Π ≪ Π , ≪ Π and hence for any Γ ∋ Γ = ( 1 , … , ), with all terms finite, and | ∑ log ( )| ≤ ′ . Therefore, by applying a similar density argument for log as in Theorem C.1 we derive Subsequently, we can compute as follows:
In Section 3, we introduced a new generalized gradient structure for the forward Kolmogorov equation and later showed convergence in the large-population limit to a structure that was lifted from the mean-field dynamics. Here we briefly discuss the relation between existing variational structures, and their connection to the asymptotic probabilities of the underlying process as treated in large deviation theory. All calculations are purely formal and are meant for illustrative purposes.
Throughout, for simplicity, let  be a finite set. Recall the reacting particle system formulation described by (1.2), i.e. as particles 1 , … , ∈  at positions 1 , … , ∈  , and with The splitting above is a direct consequence of the fact that under the assumption of ( , ) = 0, ( , ) = ( , ) for all , ∈  , the underlying jump process is reversible, i.e.,̄ satisfies the detailed balance condition Π (d )̄ (d , d ) = Π (d )̄ (d , d ). Namely, consider the functional given by And as we have shown, in the large-population limit of → ∞,  EDP-converges to a functional that is lifted from  , establishing the microscopic origin of the splitting for  .
This decomposition for reversible processes is well-known in the net-flux representation. Namely, one can show via a minimization approach that Thus is simply the EDP-functional for jump processes of [PRST22]. The works [MPR14, KJZ18,PS22] contain an extensive overview and discussion on how is the expected rate functional for a large-deviation principle for the empirical measures of independent jump processes, how the reversibility of the process ensures a possible splitting in both the interacting and non-interacting case, and how for complex-balanced systems this can even be done in the irreversible setting. Moreover, for an implicit decomposition using measure-dependent Dirichlet forms in the case of the homogeneous Boltzmann equation and the underlying process, see [BBBO21].
On a final note, due to (A.1) and the origin of in large deviations for independent particles (or via variational representations as found in [DE97]), one would expect that if ∈ (Γ) for all ∈ [0, ], we would have for all > 0 the following representation formula for the expectation: On the other hand, by the large deviation principle of ( , ,± ) as → ∞, and Varadhan's Lemma (see [DZ10]), it holds that Note that the lower bound of this equality follows from Theorem 5.1 and the superposition principle in Theorem 4.7. Moreover, we expect that the large-deviation principle implies evolutionary Γ-convergence of  in a suitable topology, an implication studied in [Kra19] in a general setting.
This raises the question of whether one can reverse this procedure, namely by using evolutionary Γ-convergence to establish large-deviation principles, similar to the non-evolutionary setting of [Mar12]. This approach was successfully applied in the case of certain diffusion processes [Fat16] and discussed for more general processes in [KJZ19].

APPENDIX B. SUPERPOSITION PRINCIPLE IN ℝ ℕ
In this section, we present a superposition principle for continuity equations over ℝ ℕ with an additional weighted integrability condition on the associated vector fields. We set Cyl (ℝ ℕ ) as the union over ∈ ℕ of all smooth -cylindrical functions with compact support. In the following, we consider pairs ( , ), where ( ) ∈[0, ] ⊂ (ℝ ℕ ) is a weakly continuous family of probability measures and ∶ [0, ] × ℝ ℕ → ℝ ℕ is a Borel vector field satisfying and all 0 ≤ ≤ ≤ .
We then have the following result.
Theorem B.1. Let ( , ) be as above. Furthermore, suppose that for some > 0 The proof of Theorem B.1 combines a slight adaptation of the proof for the superposition principle in ℝ ℕ found in [AT14, Theorem 7.1], developed for use in metric measure spaces, with a finitedimensional result for vector fields over ℝ found in [AC08,Theorem 4.4]. Due to the strong similarities with the proof found in [AT14], we merely give a brief sketch.
Proof. By tightness of 0 , we can choose a sequence of coercive functionals Φ such that show that is concentrated on solutions of ̇ = ( ).
In fact we will show that the sub-levels of  are sequentially compact with respect to setwise convergence; this is not the case for equibounded sets of { } ≥1 . Fortunately, due to the connection between Γ-convergence of  and large deviations as discussed in Appendix A, we can modify results from the authors' earlier work on large deviations for interacting systems induced by singular or irregular functionals [HHMT20]. In particular, we obtain the following convergence statement.
Remark C.2. The result can be easily generalized to bounded measurable functions ∈  ( ) for finite ∈ ℕ, but we restrict ourselves to the case = 2.
Proof. The first statement of (C.1) follows directly from Theorem C.1 by substituting ∶= . Moreover, by the uniform continuity and compact support of any ∈ (Γ ×  ) we have
For the proof of Theorem C.1 we will need some a priori bounds. Namely, recall from Section 5.1 the generating functionals and their limit For the "interacting" case, namely functionals of the form there is however a problem with the unboundedness of the mass of . Nevertheless, upon controlling the mass we can provide the following technical estimate.
With the above estimate in hand, we can now prove our convergence statement by approximating with a sequence of continuous such that The existence of such a sequence follows similarly as for density statements in ( ), see for example [HHMT20, Theorem C.5].