Abstract
We consider the forward Kolmogorov equation corresponding to measure-valued processes stemming from a class of interacting particle systems in population dynamics, including variations of the Bolker–Pacala–Dieckmann-Law model. Under the assumption of detailed balance, we provide a rigorous generalized gradient structure, incorporating the fluxes arising from the birth and death of the particles. Moreover, in the large population limit, we show convergence of the forward Kolmogorov equation to a Liouville equation, which is a transport equation associated with the mean-field limit of the underlying process. In addition, we show convergence of the corresponding gradient structures in the sense of Energy-Dissipation Principles, from which we establish a propagation of chaos result for the particle system and derive a generalized gradient-flow formulation for the mean-field limit.
1 Introduction
An important goal in theoretical biology and population dynamics is to derive macroscopic equations from microscopic models [7, 14]. For many stochastic interacting particle systems involving birth, mutation, and death, these connections have been made rigorous. One such class of particle systems consists of spatially-structured models such as the Bolker–Pacala and Dieckmann-Law (BPDL) model [5, 24]. The dynamics of these particle systems can be described by jump processes on the space of finite positive measures and can be used to derive macroscopic models.
The convergence of such measure-valued jump processes under a mean-field scaling to a large-population limit is shown for example in [18] via martingale techniques, and in [14], where an analytic approach to the convergence of rescaled moment equations is used. In both approaches, the limiting evolution is governed by a non-local evolution equation given by
We will refer to (1.1) as the mean-field equation. Here, \(u_t\) represents the limiting density of particles at time t, and the parameter functions m and c are continuous and bounded functions stemming from birth, dispersal, and competition in the BPDL model.
In recent years, there has been considerable activity in studying the mean-field Eq. (1.1) and the BPDL model in more general spaces, allowing for dynamics involving multiple species and combinations of discrete and continuous traits. See for example [16] for an overview of existing models, where instead of \({\mathbb {R}}^d\) the underlying space is an arbitrary locally compact Polish space. However, convergence in the large population limit is not considered.
Meanwhile, powerful variational tools have been developed in the last decade for studying mean-field interacting jump processes and their limits under the assumption of detailed balance. To highlight only a few: [11] studied mean-field limits for measure-dependent jump processes; [12] proved the convergence of the spatially-homogeneous Kac-process to the Boltzmann equation; [35] investigated the macroscopic limit of Becker–Döring models; [21] showed hydrodynamic limits for zero-range and exclusion processes; [28] discussed convergence and higher-order approximations for chemical reaction networks, an approach that was subsequently used in the setting of discretized reaction-diffusion equations in [31].
In this work, we extend and apply these variational techniques to prove the mean-field limit for population dynamics over arbitrary compact Polish spaces, with bounded measurable parameters m, c satisfying a detailed balance condition. In addition, we establish entropic propagation of chaos, which controls the discrepancy between the microscopic and macroscopic models in a precise sense. To the authors’ knowledge, this is the first convergence result under such general assumptions.
To do so, we first introduce a new generalized gradient structure and rigorous variational formulation for the forward Kolmogorov equation (FKE) corresponding to the BPDL model, where the FKE describes the evolution of the law of the measure-valued process. Our formulation incorporates not only the equation itself but tracks the birth and death fluxes as well. This extends the generalized gradient-flow framework of [33] due to the unboundedness of the underlying jump kernel, and the positivity of the fluxes.
We then show the convergence of these generalized gradient structures under a mean-field scaling and the large-population limit in the sense of Energy Dissipation Principles (EDPs) (see [25]). The limiting gradient flow is the Liouville equation corresponding to the mean-field equation, namely a transport equation that describes the evolution of the law of a process that follows deterministic dynamics described by (1.1) but for possibly random initial conditions. This connection between the Liouville equation and the mean-field equation is made rigorous with the help of a modification of the superposition principle of [1].
In particular, we deduce that the laws determined by the FKE concentrate around the solution of the mean-field equation (1.1), which, due to the convergence of the associated free energies, translates into an entropic propagation of chaos result; see Theorem 1.10.
Outline. The rest of this section is devoted to giving a brief overview of our setting and presenting the main results. In Sect. 2 the mean-field equation and corresponding gradient structure are introduced. We repeat this process in Sects. 3 and 4 for the forward Kolmogorov equation and the Liouville equation respectively, with the proof of a modified superposition principle deferred to “Appendix B”. Finally, in Sect. 5, we establish the EDP-convergence of the gradient structures and prove both the convergence to the mean-field limit and the propagation of chaos.
1.1 Measure-valued population dynamics and mean-field limits
We consider the forward Kolmogorov equation that corresponds to a generalized version of the BPDL model. In its classical form, the Bolker-Pacala model is a purely spatially-structured microscopic model for a population of plants involving birth, dispersal, and either natural death or death by competition for resources, and can be modeled as a jump process in the space of positive measures over \({\mathbb {R}}^d\). However, in certain models of adaptive evolution, it is the mutation of traits that plays a role, instead of spatial evolution (see [7, 8, 24]). Moreover, if one wants to model multiple interacting species or marked configuration spaces, more general spaces than \({\mathbb {R}}^d\) are needed (see [16, 22]).
Therefore, let the trait space be an arbitrary Polish space, denoted henceforth as \(\mathcal {T}\). We model the BPDL-dynamics at any time t as an interacting particle system with particles with labels \(A_t^1,\dots ,A_t^{N_t} \) and traits \(X_t^1,\dots ,X_t^{N_t} \in \mathcal {T}\), where the number of particles \(N_t\) at time t is not fixed since particles can be removed from and added to the system.
Moreover, let \(b\in \mathcal {B}^+(\mathcal {T})\), \(d,c\in \mathcal {B}^+(\mathcal {T}\times \mathcal {T})\) be non-negative measurable functions, \(n>0\) a positive parameter, and \(\gamma \in \mathcal {M}^+_{loc}(\mathcal {T})\) a non-negative reference measure such that
Then the BPDL dynamics can be described as follows:
- Each particle with trait \(x\in \mathcal {T}\) has two exponential clocks: a seed clock with rate b(x) and a death clock with rate \(\tfrac{1}{n}\sum _{i=1}^{N_t} c(x,X_t^i)\).
- If the death clock rings, the particle is deleted.
- If the seed clock rings, a new particle is added with trait \(y\in \mathcal {T}\) with probability \(d(x,y)\gamma (\textrm{d}y)\).
Alternatively, we can describe these dynamics in the form of reacting particles. Namely, setting \(m(x,y):=b(x)d(x,y)\), then with a slight abuse of notation we have
We will refer to m as the mutation kernel, and c as the competition kernel. The parameter \(n>0\) is called the system size, in the sense that the scaling \(n^{-1}c\) guarantees that if the number of particles in the system is of the order of n, the total rate of created or deleted particles is of the same order.
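The clock description above can be turned into a small Gillespie-type simulation. The following is only a minimal sketch under simplifying assumptions that are not part of the paper: the trait space is a finite set \(\{0,\dots ,K-1\}\), the reference measure is a weight vector `gamma`, the state is stored as a vector of particle counts per trait, and the function name `simulate_bpdl` and all parameter names are hypothetical.

```python
import random


def simulate_bpdl(counts, b, d, c, gamma, n, t_end, rng):
    """Gillespie-type sketch of the BPDL clocks on a finite trait space.

    Assumptions (illustrative only): traits are 0..K-1, counts[x] is the
    number of particles with trait x, and sum_y d[x][y] * gamma[y] == 1.
    Returns the count vector at time t_end.
    """
    counts = list(counts)
    K = len(counts)
    events = [("seed", x) for x in range(K)] + [("death", x) for x in range(K)]
    t = 0.0
    while True:
        # Total seed rate of all particles with trait x: counts[x] * b(x).
        seed = [counts[x] * b[x] for x in range(K)]
        # Total death rate of trait x: counts[x] * (1/n) * sum_y c(x,y) counts[y].
        death = [
            counts[x] * sum(c[x][y] * counts[y] for y in range(K)) / n
            for x in range(K)
        ]
        total = sum(seed) + sum(death)
        if total == 0.0:
            break  # no clock can ring any more
        t += rng.expovariate(total)  # waiting time until the next ring
        if t > t_end:
            break
        kind, x = rng.choices(events, weights=seed + death)[0]
        if kind == "death":
            counts[x] -= 1
        else:
            # The new trait y is drawn with probability d(x,y) * gamma(y).
            probs = [d[x][y] * gamma[y] for y in range(K)]
            y = rng.choices(range(K), weights=probs)[0]
            counts[y] += 1
    return counts
```

Dividing the returned counts by n gives the weights of the rescaled empirical measure on the finite trait set. For instance, `simulate_bpdl([10, 10], [1.0, 1.0], [[0.0, 2.0], [2.0, 0.0]], [[0.0, 2.0], [2.0, 0.0]], [0.5, 0.5], 20, 1.0, random.Random(0))` runs a two-trait system whose kernels satisfy \(m(x,y)=c(y,x)\) and \(c(x,x)=0\).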
Instead of looking at the individual traits of the particles, it is common to only consider the measure-valued process \(\nu _t\) determined by the rescaled empirical measure
Here, \(\nu _t\in \Gamma :=\mathcal {M}^+(\mathcal {T})\) with \(\mathcal {M}^+(\mathcal {T})\) the space of finite non-negative measures. The infinitesimal generator \(Q_n\) of this process is given for all \(F\in C_c(\Gamma )\) by
where \(\kappa ^{\pm }[\nu ]\in \Gamma \) are the measure-dependent birth/death-kernels
The law of the process is now given by the corresponding forward Kolmogorov equation
Depending on the setting, this formulation can be made rigorous in various ways: for example via an analytical approach on configuration spaces as done in [14], which in fact models infinite configurations of particles over \({\mathbb {R}}^d\), or via martingale techniques with \(\mathcal {T}\) a closed subset of \({\mathbb {R}}^d\) and \(\gamma =\mathscr {L}^d|_{\mathcal {T}}\) (see [18]). Moreover, in the latter, under the assumption of continuous, bounded, and integrable mutation/competition kernels, it is also shown that the process converges in the large-population limit \(n\rightarrow \infty \) to the mean-field Eq. (1.1), which can be rewritten as
i.e. \(u_t\) is the density of \(\nu _t\) with respect to \(\gamma \).
While different choices of scalings are possible, the mean-field equation describes the macroscopic properties of the measure-valued process when the population is large. An alternative way is to study the evolution of the moments, which form a hierarchy similar to the BBGKY-hierarchy of correlation functions, and under the so-called Vlasov scaling the first moment or correlation function converges to (\(\mathsf MF\)). For the case of infinite configurations over \({\mathbb {R}}^d\) this has been established, see [15], and both propagation of chaos in the Vlasov limit and the sub-Poissonian property have been established as well [17].
In this work, we do not consider the measure-valued process itself, but take the forward Kolmogorov equation (\(\mathsf FKE_n\)) as a starting point, and show convergence to the mean-field equation in the sense that \(\textsf{P}^n_t\rightarrow \delta _{\nu _t}\) narrowly on \(\mathcal {P}(\Gamma )\) under suitable initial conditions. Throughout we equip the space \(\Gamma \) with the narrow topology, and assume the following:
Assumption 1.1
The trait space \(\mathcal {T}\) is a compact Polish space, and moreover
The assumption of no natural death means that particles can only be deleted due to competition with other particles. Moreover, with a bit of abuse of notation, the two conditions \(c(x,x)=0\) for all \(x\in \mathcal {T}\) and \(m(y,x)=c(x,y)\) for all \(x,y\in \mathcal {T}\) together will be referred to as the detailed balance condition, because they imply that the jump kernel \({{\bar{\kappa }}}_n\) (defined in Sect. 3) corresponding to the measure-valued process satisfies the detailed balance condition with respect to an invariant measure \(\Pi _n\), i.e.
Here \(\Pi _n\) is obtained as a push-forward of the Poisson measure \(\pi _n\), with
under the rescaled empirical measure mapping determined by (1.3), see (3.2). This allows us to write the forward Kolmogorov equation as a gradient flow of the relative entropy with respect to \(\Pi _n\), and equip it with a corresponding variational structure, see Theorem 1.6.
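The role of the two conditions \(c(x,x)=0\) and \(m(y,x)=c(x,y)\) can be checked by hand in a toy setting. The sketch below is an illustration under assumptions that are not taken from the paper: the trait space is finite, the state is recorded as particle counts \(k\) (so that \(\nu =k/n\)), and the candidate invariant measure is the product Poisson law with intensity \(n\gamma \). In these coordinates a birth at x has rate \(\gamma _x\sum _y k_y m(y,x)\) and a death at x has rate \(k_x \tfrac{1}{n}\sum _y c(x,y)k_y\); all function names are hypothetical.

```python
import math
from itertools import product


def poisson_weight(k, n, gamma):
    """Product Poisson weight of the count vector k with intensity n * gamma."""
    return math.prod(
        math.exp(-n * g) * (n * g) ** kx / math.factorial(kx)
        for kx, g in zip(k, gamma)
    )


def birth_rate(k, x, m, gamma):
    """Rate of the jump k -> k + e_x: gamma_x * sum_y k_y * m(y, x)."""
    return gamma[x] * sum(k[y] * m[y][x] for y in range(len(k)))


def death_rate(k, x, c, n):
    """Rate of the jump k -> k - e_x: k_x * (1/n) * sum_y c(x, y) * k_y."""
    return k[x] * sum(c[x][y] * k[y] for y in range(len(k))) / n


def max_detailed_balance_gap(m, c, gamma, n, k_max):
    """Largest violation of pi(k) * birth = pi(k + e_x) * death over a grid."""
    K = len(gamma)
    gap = 0.0
    for k in product(range(k_max + 1), repeat=K):
        for x in range(K):
            k_plus = list(k)
            k_plus[x] += 1
            forward = poisson_weight(k, n, gamma) * birth_rate(k, x, m, gamma)
            backward = poisson_weight(k_plus, n, gamma) * death_rate(k_plus, x, c, n)
            gap = max(gap, abs(forward - backward))
    return gap
```

With a kernel pair satisfying both conditions the gap vanishes up to rounding, whereas a nonzero diagonal entry \(c(x,x)\) produces a visible mismatch. This is why the condition \(c(x,x)=0\) enters at the level of the particle system.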
However, the condition \(m(y,x)=c(x,y)\) for all \(x,y\in \mathcal {T}\) alone suffices to express the mean-field equation as a gradient flow (cf. Theorem 1.4), and will, therefore, be referred to as the mean-field detailed balance condition.
In light of similar results in [11, 28] for mean-field jump processes on finite spaces and finite chemical reaction networks, one expects (\(\mathsf FKE_n\)) to converge to the following Liouville equation
It is a transport equation that can be interpreted as the lifting of mean-field dynamics in \(\Gamma \) to evolutions in \(\mathcal {P}(\Gamma )\) and describes the evolution of the law of random measures \(\nu _t\) that all satisfy (\(\mathsf MF\)). In particular, if \(\nu _t\) is a solution of (\(\mathsf MF\)) then \(\textsf{P}_t:=\delta _{\nu _t}\) is itself a solution of (\(\mathsf Li\)).
It turns out that in our general setting, this convergence holds as well, as will be stated in Theorem 1.9. Letting \(V[\nu ]=\kappa ^+[\nu ]-\kappa ^-[\nu ]\), we can therefore represent part of our results in Fig. 1.
This convergence is a direct consequence of the convergence of the associated gradient structures, which we will describe below.
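To make the velocity field \(V[\nu ]=\kappa ^+[\nu ]-\kappa ^-[\nu ]\) concrete, the following sketch integrates the mean-field dynamics on a finite trait space with an explicit Euler scheme. It assumes the mean-field detailed balance condition, under which the kernels take the form \(\kappa ^+_{\nu }=c_{\nu }\gamma \) and \(\kappa ^-_{\nu }=c_{\nu }\nu \) with \(c_{\nu }(x)=\int c(x,y)\,\nu (\textrm{d}y)\) recorded in Sect. 2, so that \(V[\nu ]=c_{\nu }(\gamma -\nu )\). The finite trait space, the Euler discretization, the convention \(\mathcal {E}\textrm{nt}(\nu |\gamma )=\int (u\log u-u+1)\,\textrm{d}\gamma \) with \(u=\textrm{d}\nu /\textrm{d}\gamma \), and all function names are assumptions made for illustration.

```python
import math


def velocity(nu, c, gamma):
    """V[nu](x) = c_nu(x) * (gamma_x - nu_x), with c_nu(x) = sum_y c(x,y) nu_y."""
    K = len(nu)
    c_nu = [sum(c[x][y] * nu[y] for y in range(K)) for x in range(K)]
    return [c_nu[x] * (gamma[x] - nu[x]) for x in range(K)]


def relative_entropy(nu, gamma):
    """Ent(nu | gamma) = sum_x gamma_x * (u log u - u + 1), u = nu_x / gamma_x."""
    total = 0.0
    for v, g in zip(nu, gamma):
        u = v / g
        # The integrand extends continuously to u = 0 with value 1.
        total += g * (u * math.log(u) - u + 1.0) if u > 0 else g
    return total


def integrate_mean_field(nu0, c, gamma, dt, steps):
    """Explicit Euler steps for d(nu)/dt = V[nu]; returns the whole path."""
    nu = list(nu0)
    path = [list(nu)]
    for _ in range(steps):
        v = velocity(nu, c, gamma)
        nu = [a + dt * b for a, b in zip(nu, v)]
        path.append(list(nu))
    return path
```

For a small step size each weight moves monotonically towards the corresponding weight of \(\gamma \), so the relative entropy along the discrete path does not increase. This is consistent with the free energy acting as a Lyapunov functional along gradient-flow solutions.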
1.2 Gradient-flow formulation
Our first main result concerns the variational formulation of the equations (\(\mathsf FKE_n\)), (\(\mathsf MF\)), (\(\mathsf Li\)) and their specific gradient structure. Various gradient-flow formulations exist for jump processes, mean-field jump processes, and chemical reaction networks [11, 12, 21, 28, 33].
In these works, a common starting point is to describe the relation between \(\rho _t\), representing either the law of some process or a mean-field limit, and generalized fluxes \(j_t\) in the form of an abstract continuity equation. For example, in the case of independent particles following a common jump process over a graph, \(\rho _t\) corresponds to the number of particles on a node at time t, and one choice of flux is the so-called net flux \(j_t\), which is related to the number of particles going through an edge.
However, we propose a slightly different structure, namely one that tracks the effective mass fluxes for both creation (arising from mutation) and annihilation (arising from competition) separately. The use of mass fluxes instead of the usual particle fluxes ensures that in our convergence results as \(n\rightarrow \infty \) we obtain convergence of both the laws and the fluxes (see Theorem 1.8).
Moreover, separating the effects of creation and annihilation (henceforth simply referred to as birth and death) instead of their combined contribution allows us to incorporate more information in our variational formulation. The downside is that we are forced to work with positive fluxes, while the framework in the aforementioned examples involves either quadratic or generalized structures for signed net fluxes. In this sense we are closer to the variational representations stemming from large deviations, involving so-called one-way or unidirectional fluxes, see for example [3, 30, 32, 34]. Indeed, our structure is motivated by large deviation theory, as we will discuss briefly in “Appendix A”.
In all three cases, i.e. for (\(\mathsf FKE_n\)), (\(\mathsf MF\)) and (\(\mathsf Li\)), our proposed structure is similar to the classical notion of a gradient flow in the sense that they all satisfy an abstract Energy-Dissipation Balance. Since we will repeat the same concept three times on different levels and for different spaces, let us make the general and abstract concepts clear:
Formal Definition 1.2
Given a free energy functional \(\mathcal {F}(\rho )\), a dissipation potential \(\mathcal {R}(\rho ,j)\), a Fisher information functional \(\mathcal {D}(\rho )\), and a linear operator B with dual \(B^*\), we consider pairs of curves \((\rho ,j)\) satisfying the continuity equation
and define the EDP-functional
Moreover, a gradient-flow solution is a pair \((\hat{\rho },\hat{\jmath })\) satisfying (\(\textsf{CE}\)) with \({\mathcal {I}}(\hat{\rho },\hat{\jmath })=0\).
Throughout we require the non-negativity of \(\mathcal {I}\). For a deeper look at the mathematical basis of this sort of setting, especially for generalized gradient systems incorporating net fluxes, see [33].
In all three examples the generalized fluxes j consist of two parts: \(j^+\) and \(j^-\), corresponding to birth and death. The continuity equations depend on the setting and are summarized in Table 1, with \(\mathcal {M}^+_{loc}\) as the space of non-negative Radon measures.
Remark 1.3
Note that the gradient-flow solution \((\hat{\rho },\hat{\jmath })\) is the null-minimizer of \(\mathcal {I}\), and satisfies the energy-dissipation balance
Moreover, with \(\langle \cdot ,\cdot \rangle \) shorthand for appropriate dual pairings, one would expect for small T that
where we used the continuity equation (\(\textsf{CE}\)) and duality of \(B,B^*\), and therefore
In light of the generalized gradient-flow framework of [33] and the relation to minimizing movement schemes, a formal minimization procedure provides the gradient-flow solution
and that along the solution,
where \(\mathcal {R}^*(\rho ,w)\) is the dual of the dissipation potential \(\mathcal {R}\). Finally, note that along the gradient-flow solution the free energy \(\mathcal {F}\) is non-increasing, i.e. \(\mathcal {F}\) is a Lyapunov functional.
These (in)equalities indeed hold in our setting. See also “Appendix A”, where we compare the relation to generalized gradient flows for net fluxes, which follow from the above after a contraction argument, and the connection to the reversibility of the underlying process.
Let \(H(\mu _1,\mu _2)\) be the Hellinger distance, see (2.2), and \(\mathcal {E}\textrm{nt}(\mu _1|\mu _2)\) the relative entropy of \(\mu _1\) with respect to \(\mu _2\) for two (possibly infinite) locally finite Borel measures \(\mu _1,\mu _2\):
where
With the full technical details contained in Theorems 2.7, 3.8 and 4.7, we then have the following three results.
Theorem 1.4
(Mean-field, cf. Theorem 2.7) Consider triples \((\nu ,\lambda ^+,\lambda ^-)\), with \(\nu _t,\lambda _t^{\pm }\in \Gamma \), satisfying the mean-field continuity equation
Define the dissipation potential \(\mathcal {R}_{MF}\), free energy \(\mathcal {F}_{MF}\) and Fisher information \(\mathcal {D}_{MF}\) as
where \(\theta _{\nu }\) is the geometric mean of the expected birth and death fluxes, i.e.
Then the corresponding EDP-functional \({\mathcal {I}}_{MF}\) given by
is non-negative, and for any \(\nu _0\) with \({\mathcal {F}_{MF}(\nu _0)}<\infty \) a unique gradient-flow solution \(({\hat{\nu }},{\hat{\lambda }}^{+},\hat{\lambda }^{-})\) exists, with \({\hat{\nu }}_t\) equal to the unique strong solution to (\(\mathsf MF\)) and \(\hat{\lambda }_t^{\pm }=\kappa ^{\pm }[{\hat{\nu }}_t]\) for almost every \(t\in [0,T]\).
As mentioned, although treating birth and death separately provides us with additional information, this prohibits the use of some of the previous works for gradient structures because of the positivity of the fluxes. However, there is still a strong connection to the variational formulations for jump processes arising from the large deviations of fluxes as seen in [32] and [3], see for example “Appendix A” on the equivalence of the EDP-functional to the expected rate functional.
Remark 1.5
It is straightforward to verify that if \(\textrm{d}\nu =u \textrm{d}\gamma \)
and hence it is not directly clear that the relation (1.5) holds. However, as will be shown for Theorem 2.7, at least along the solution \({\hat{\nu }}_t\) the equivalence holds for a.e. \(t\in [0,T]\).
Theorem 1.6
(Forward Kolmogorov, cf. Theorem 3.8) Consider triples \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), with \(\textsf{P}_t\in \mathcal {P}(\Gamma )\) and \({\textsf{J}_t^{\pm }\in \mathcal {M}^+_{loc}}(\Gamma \times \mathcal {T})\), satisfying the continuity equation
where
Define the n-dependent Fisher information \(\mathcal {D}_{n}\) as stated in Definition 3.4, free energy
and dissipation potential
where, with a little abuse of notation (see (3.14)),
Then the corresponding EDP-functional \({\mathcal {I}}_{n}\) given by
is non-negative, and for any \(\textsf{P}_0\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) a unique gradient-flow solution \((\hat{\textsf{P}},\hat{\textsf{J}}^{\pm })\) exists, with \(\hat{\textsf{P}}_t\) equal to a weak solution to (\(\mathsf FKE_n\)) and \(\hat{\textsf{J}}_t^{\pm }=\hat{\textsf{P}}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\).
Similar to the mean-field case, the dissipation potential consists of relative entropies with respect to geometric averages, now of forward and backward rates along a transition \(\nu \rightarrow \nu \pm \tfrac{1}{n}\delta _{x}\). Moreover, note that in contrast to the framework of [33], we employ fluxes \(\textsf{J}^{\pm }\) that are not finite measures. This is due to the unboundedness of \(\kappa _{\nu }\) as the mass of \(\nu \) grows, which implies that the underlying jump kernel over \(\Gamma \) is itself unbounded as well, see Sect. 3.
For the Liouville equation, let us define \(\textrm{Cyl}_c(\Gamma )\) as the space of compactly supported smooth cylinder functions of the form
where \(f_1,\dots ,f_m\in C_b(\mathcal {T})\), and \(\textrm{grad}_{\Gamma }\) is the distributional gradient defined by
Theorem 1.7
(Liouville, cf. Theorem 4.7) Consider triples \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), with \(\textsf{P}_t\in \mathcal {P}(\Gamma )\), \(\textsf{J}^{\pm }\in \mathcal {M}_{loc}(\Gamma \times \mathcal {T})\), satisfying the continuity equation
Define the Fisher information \(\mathcal {D}_{\infty }\) as stated in Definition 4.4, free energy
and dissipation potential
Then the corresponding EDP-functional \({\mathcal {I}}_{\infty }\) given by
is non-negative, and for any \(\textsf{P}_0\) with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) a unique gradient-flow solution \((\hat{\textsf{P}},\hat{\textsf{J}}^{\pm })\) exists, with \(\hat{\textsf{P}}_t\) equal to a weak solution to (\(\mathsf Li\)) and \(\hat{\textsf{J}}_t^{\pm }=\hat{\textsf{P}}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\).
Finally, for any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) such that \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)<\infty \), there exists (with a little abuse of notation) a Borel probability measure \(\Omega \) over curves satisfying the mean-field continuity equation (\(\mathscr{C}\mathscr{E}\)) such that for all t the time marginals \((e_t)_{\#} \Omega \) are equal to \(\textsf{P}_t\), and
The statement of (1.8) is the aforementioned superposition principle, which is a modified version of the superposition principle [2] in metric measure spaces, and the ones used in [11, 12]. It allows one to essentially jump back and forth between the Liouville equation and the mean-field dynamics, and in particular, provides us with the non-negativity of \(\mathcal {I}_{\infty }\) and the uniqueness of gradient-flow solutions.
1.3 Convergence results
Our final and most important result is that the above gradient structures converge in the sense of EDP-convergence (e.g. see [25, 34]), a generalization of the evolutionary \(\Gamma \)-convergence approach stated by [36, 37] and expanded on in [27], which implies convergence of the gradient-flow solutions and their free energies.
We say that a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converges to some \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{{\infty }}\) if for all \(t\in [0,T]\) the probability measures \(\textsf{P}_t^n\) converge narrowly to \(\textsf{P}_t\) in \(\mathcal {P}(\Gamma )\), and \(\textsf{J}^{n,\pm }_t(\textrm{d}\nu ,\textrm{d}x) \,\textrm{d}t\) converge vaguely to \(\textsf{J}^{\pm }_t (\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}t\) in \(\mathcal {M}_{loc}^{{+}}([0,T]\times \Gamma \times \mathcal {T})\). Again postponing technicalities, see Theorem 5.1, we have the following lower semi-continuity and compactness result:
Theorem 1.8
(cf. Theorem 5.1) The sequence of free energies \(\mathcal {F}_n\) \(\varGamma \)-converges to \(\mathcal {F}_{\infty }\).
Moreover, the sequence of Fisher-information functionals and dissipation potentials are all sequentially lower semicontinuous for sequences of curves with bounded \(\mathcal {I}_n\) and initial \(\mathcal {F}_n\). In particular, for any sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converging to a \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) such that \(\mathcal {F}_{n}(\textsf{P}_0^n)\rightarrow \mathcal {F}_{\infty }(\textsf{P}_0)\) as well, we have
Finally, for any sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that
there exists a subsequence converging to some \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\).
Here the notion of EDP-convergence or evolutionary \(\varGamma \)-convergence (where the \(\varGamma \) is not to be confused with our space of positive measures \(\Gamma \)) relates to the \(\varGamma \)-convergence of the free energies \(\mathcal {F}_n\) and suitable liminf-estimates for the dissipation potentials and Fisher-information functionals (or local slopes in a metric setting).
In certain applications or for certain notions of convergence (e.g. see [29]) one also establishes \(\varGamma \)-convergence for the total dissipation \(\mathcal {R}_n+\mathcal {D}_n\) when written as functionals over \(C([0,T];\mathcal {P}(\Gamma ))\). Moreover, \(\varGamma \)-convergence of the functionals \(\mathcal {I}_n\) over such path-spaces is related to the large deviations of the underlying process [23], as we briefly discuss in “Appendix A”. In our framework this would require that for every \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\), we can find a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) that converges to \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) and satisfies the limsup-estimate
However, in this paper we restrict ourselves only to the liminf-estimates, which is sufficient to obtain convergence of the solutions, an approach also taken in [11, 12, 28]. Namely, by a lower semicontinuity and compactness argument, Theorem 1.8 implies the convergence of both the solutions and the free energies \(\mathcal {F}_n\), if the initial data are well prepared.
Theorem 1.9
(cf. Theorem 5.3) Suppose that \(\textsf{P}_0^n \rightarrow \textsf{P}_0\) with \(\mathcal {F}_{n}(\textsf{P}_0^n)\rightarrow \mathcal {F}_{\infty }(\textsf{P}_0)\) as well. Then for the sequence \(\hat{\textsf{P}}^n\) of gradient-flow solutions to (\(\mathsf FKE_n\)), and \(\hat{\textsf{P}}\) the gradient-flow solution to (\(\mathsf Li\)), we have that for all \(t\in [0,T]\)
In particular, if \(\textsf{P}_0=\delta _{{\hat{\nu }}_0}\) and \({\hat{\nu }}_t\) is the solution to the mean-field problem (\(\mathsf MF\)), then for all \(t\in [0,T]\)
The second half of Theorem 1.9, on the concentration around mean-field solutions and convergence of entropies, follows directly from the definition of \(\mathcal {F}_{\infty }\) and uniqueness.
For interacting particle systems where the number of particles is fixed at \(n\in {\mathbb {N}}\) the narrow convergence \(\hat{\textsf{P}}_t^n\rightarrow \delta _{\hat{\nu }_t}\) is equivalent to the propagation of chaos in the sense of Sznitman [38], and would imply narrow convergence of the k-particle marginals at time t to \(\nu _t^{\otimes k}\). However, in our setting, this implies convergence of the k-correlation functions, see [4].
Moreover, the convergence of the free energies \(\mathcal {F}_n\) implies the stronger notion of entropic propagation of chaos if the initial condition is sufficiently regular.
Theorem 1.10
(cf. Theorem 5.4) Suppose that \(\textsf{P}^n_0\rightarrow \delta _{{\hat{\nu }}_0}\) with \(C^{-1} \le \textrm{d}{\hat{\nu }}_0/\textrm{d}\gamma \le C\) for some \(C>0\). If the initial sequence \(\textsf{P}^n_0\) is entropically chaotic in the sense that
then this is propagated along the solution, i.e.
where \(\Pi _{n,\nu }\in \mathcal {P}(\Gamma )\) stems from the Poisson measure \(\pi _{n,\nu }\) with intensity measure \(\nu \), i.e.
To the authors’ knowledge, this is the first entropic propagation of chaos result for bounded competition kernels over compact Polish spaces, under the assumption of detailed balance.
1.3.1 Comments
We have given an overview of the generalized gradient structures that we introduced for the forward Kolmogorov equation of our underlying interacting particle system and alluded to how this sequence of structures converges to a gradient structure induced by the mean-field limit. Throughout, we assumed bounded measurable rates m, c over a compact Polish space \(\mathcal {T}\) satisfying the detailed balance condition \(m(x,y)=c(y,x)\) and \(c(x,x)=0\) for all \(x,y\in \mathcal {T}\), and we would like to briefly touch on possible relaxations of these assumptions.
First, for the limit inferior in Theorem 5.1, there is a technical issue concerning the possible non-continuity of the competition kernel c, which we resolve by an approximation argument from large deviation theory [19], see “Appendix C”. This argument can be straightforwardly extended to unbounded rates m and c under certain exponential integrability estimates with respect to the reference measure \(\gamma \). However, the uniqueness of solutions and the well-posedness of the variational formulations would be less clear.
Moreover, it should be noted that, while we chose \(\mathcal {T}\) to be compact for brevity and clarity of the exposition, many of the listed results carry over to the case of \(\mathcal {T}\) Polish with finite \(\gamma \), under suitable choices of topologies and by bootstrapping from the tightness of \(\gamma \). However, the classical case of \(\mathcal {T}={\mathbb {R}}^d\) with the merely locally finite reference measure \(\gamma =\mathscr {L}^d\) (under suitable integrability estimates on m), as treated in for example [14, 18], is not easily contained in our framework. Due to the necessity to control the entropy, any solution to this problem would involve newly constructed estimates on the propagation of tightness.
A more fundamental restriction is the detailed balance assumption, which is necessary to phrase the variational structures in terms of generalized gradient systems and the evolution in terms of a gradient flow. However, there exist possible extensions and decompositions of variational structures for jump processes that do not assume detailed balance or even complex balance, see for example [20] for an overview. Therefore, in future work, the authors plan to generalize the variational methods outlined here to more general evolutions.
1.4 Notation
Below we collect some of the notation used throughout this paper.
\(\mathcal {T}\) | Trait space, Assumption 1.1 |
m, c | Mutation/competition kernel, Assumption 1.1 |
\(\gamma \) | Reference measure, Assumption 1.1 |
n | System size, Assumption 1.1 |
\(\mathcal {E}\textrm{nt}\) | Relative entropy (2.3) |
H | Hellinger distance (2.2) |
\(\Psi ,\Psi ^*\) | |
\(\mathcal {M}^{+}\) | Space of finite non-negative Borel measures, with narrow topology |
\(\mathcal {M}^{+}_{loc}\) | Space of non-negative Radon measures, with vague topology |
\(\Gamma :=\mathcal {M}^+(\mathcal {T})\) | State space of measure-valued process |
\(\Gamma _n\subset \Gamma \) | Space of positive atomic measures with common mass \(\tfrac{1}{n}\) (3.3) |
\(\kappa ^{\pm }_{\nu }=\kappa ^{\pm }[\nu ]\) | Measure-dependent birth/death kernels (2.1) |
\(\theta _{\nu }\) | Geometric mean of \(\kappa ^{+}_{\nu }\) and \(\kappa ^{-}_{\nu }\), Definition 2.4 |
\(\mathscr{C}\mathscr{E}\) | Continuity equation for mean-field (\(\mathsf MF\)), Definition 2.1 |
\(\mathcal {R}_{MF},\mathcal {F}_{MF},\mathcal {D}_{MF}\) | Ingredients of EDP-functional \(\mathcal {I}_{MF}\) for (\(\mathsf MF\)) , Definition 2.4 |
\(Q_n,Q_n^{*}\) | Generator and dual generator (3.1) of (\(\mathsf FKE_n\)) |
\({{\bar{\kappa }}}_n\) | Jump kernel (3.4) corresponding to (\(\mathsf FKE_n\)) |
\(L_n\) | Rescaled empirical measure map (3.2) |
\(\pi _n,\Pi _n\) | Invariant measures for particle system (3.5) and measure-valued process (3.6) |
\(\textsf{T}^{n,\pm }\) | Creation/annihilation mappings (3.8) |
\(\overline{\nabla }^{n,\pm }, \overline{\text {div}}^{n,\pm }\) | Discrete \(\Gamma _n\)-gradients and \(\Gamma _n\)-divergences, Sect. 3 |
\(\vartheta _{\textsf{P}}^{\pm }\) | Expected fluxes (3.12) |
\(\Theta _{\textsf{P}}^{n,\pm }\) | Geometric average of \(\vartheta _{\textsf{P}}^{\pm }\) along a transition, Definition 3.1 |
\(\textsf{CE}_n\) | Continuity equation for (\(\mathsf FKE_n\)), Definition 3.1 |
\(\mathcal {R}_{n},\mathcal {F}_{n},\mathcal {D}_{n}\) | Ingredients of EDP-functional \(\mathcal {I}_{n}\) for (\(\mathsf FKE_n\)), Definition 3.4 |
\(d_{TV,w}, W\) | Weighted total variation metric (3.18)/transportation metric (4.11) over \(\mathcal {P}(\Gamma )\) |
\(\textsf{CE}_{\infty }\) | Continuity equation for (\(\mathsf Li\)), Definition 4.3 |
\(\mathcal {R}_{\infty },\mathcal {F}_{\infty },\mathcal {D}_{\infty }\) | Ingredients of EDP-functional \(\mathcal {I}_{\infty }\) for (\(\mathsf Li\)), Definition 4.4 |
2 Mean-field system
In this section, we will discuss the gradient-flow formulation of the mean-field equation under the detailed balance condition. Let us first make precise the context of Theorem 1.4, and embed it within the more general statement of Theorem 2.7 below.
Recall that the trait space \(\mathcal {T}\) is a compact Polish space, and \(\Gamma :=\mathcal {M}^+(\mathcal {T})\) is the space of finite non-negative measures over \(\mathcal {T}\) equipped with the narrow topology. Fix a reference measure \(\gamma \in \Gamma \), and rates m, c satisfying Assumption 1.1, i.e. \(m,c\in \mathcal {B}_b^{{+}}(\mathcal {T}\times \mathcal {T})\) with \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\), and \(c(x,x)=0\) for all \(x\in \mathcal {T}\). The mean-field equation then reads
$$\begin{aligned} \partial _t \nu _t = \kappa ^{+}[\nu _t]-\kappa ^{-}[\nu _t], \end{aligned}$$(\(\mathsf MF\))
with measure-dependent birth and death kernels \(\kappa ^{\pm }:\Gamma \rightarrow \Gamma \) given by
$$\begin{aligned} \kappa ^{+}[\nu ](\textrm{d}x):=\left( \int _{\mathcal {T}} m(y,x)\, \nu (\textrm{d}y)\right) \gamma (\textrm{d}x), \qquad \kappa ^{-}[\nu ](\textrm{d}x):=\left( \int _{\mathcal {T}} c(x,y)\, \nu (\textrm{d}y)\right) \nu (\textrm{d}x). \end{aligned}$$(2.1)
Routinely, we will also adopt the shorthand notation \(\kappa _\nu ^{\pm }:= \kappa ^{\pm }[\nu ]\). Now, setting \(c_{\nu }(x):=\int _{\mathcal {T}} c(x,y)\, \nu (\textrm{d}y)\), it follows from \(m(x,y)=c(y,x)\) that \(\kappa ^+_{\nu }=c_{\nu } \gamma \), \(\kappa ^-_{\nu }=c_{\nu } \nu \), and the dynamics simplify to
$$\begin{aligned} \partial _t \nu _t = c_{\nu _t}\, (\gamma -\nu _t). \end{aligned}$$
Strong solutions to (\(\mathsf MF\)) in either total variation or appropriate \(L^1\) spaces follow straightforwardly via classical methods, see Sect. 2.2.
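As a minimal numerical sketch of these dynamics, consider a finite trait space and write \(\nu _t=u_t\gamma \), so that the simplified equation reads \(\partial _t u_t=c_{\nu _t}(1-u_t)\) in density form. The number of traits, the kernel `c`, the weights `gamma`, and the explicit Euler scheme below are illustrative assumptions and not part of the analysis in this paper.

```python
import numpy as np

def mean_field_euler(u0, c, gamma, dt=1e-2, steps=2000):
    """Explicit Euler for du/dt = c_nu * (1 - u), where c_nu(x) = sum_y c[x, y] u[y] gamma[y]."""
    u = np.array(u0, dtype=float)
    path = [u.copy()]
    for _ in range(steps):
        c_nu = c @ (u * gamma)          # interaction rate felt at each trait x
        u = u + dt * c_nu * (1.0 - u)   # birth density c_nu minus death density c_nu * u
        path.append(u.copy())
    return np.array(path)

# Hypothetical data: 4 traits, symmetric kernel with c(x, x) = 0, so that m = c.
gamma = np.array([0.1, 0.2, 0.3, 0.4])
c = 1.0 - np.eye(4)
u0 = np.array([0.2, 3.0, 0.5, 1.5])
path = mean_field_euler(u0, c, gamma)
```

In this toy setting the density relaxes towards \(u\equiv 1\), that is, \(\nu _t\) approaches the reference measure \(\gamma \), and the largest deviation from 1 does not increase along the iteration as long as `dt * c_nu` stays below 1.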
The total variation norm \(\Vert \cdot \Vert _{TV}\) on \(\mathcal {M}(\mathcal {T})\) is defined as
and the squared Hellinger distance \(H^2\) is given by
$$\begin{aligned} H^2(\mu ,\nu ):=\frac{1}{2}\int _{\mathcal {T}} \left( \sqrt{\frac{\textrm{d}\mu }{\textrm{d}\sigma }}-\sqrt{\frac{\textrm{d}\nu }{\textrm{d}\sigma }}\right) ^2 \textrm{d}\sigma , \end{aligned}$$(2.2)
with \(\sigma \) a measure dominating both \(\mu \) and \(\nu \). Note that the definition (2.2) is independent of the choice of the dominating measure \(\sigma \), and \(\sigma =\mu +\nu \) is always admissible.
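The independence of the dominating measure can be checked numerically on a finite trait space. The sketch below assumes the normalization \(H^2(\mu ,\nu )=\tfrac{1}{2}\int (\sqrt{\textrm{d}\mu /\textrm{d}\sigma }-\sqrt{\textrm{d}\nu /\textrm{d}\sigma })^2\,\textrm{d}\sigma \); the prefactor is an assumption, and the independence of \(\sigma \) holds for any prefactor. The function name and the example measures are hypothetical.

```python
import numpy as np

def hellinger_sq(mu, nu, sigma):
    """Squared Hellinger distance via densities with respect to a dominating measure sigma.

    The prefactor 1/2 is an assumed normalization.
    """
    mu, nu, sigma = (np.asarray(a, dtype=float) for a in (mu, nu, sigma))
    support = sigma > 0                       # only points charged by sigma contribute
    f = mu[support] / sigma[support]          # density of mu with respect to sigma
    g = nu[support] / sigma[support]          # density of nu with respect to sigma
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2 * sigma[support])

# Hypothetical measures on a 4-point trait space, given by their weights.
mu = np.array([0.5, 0.0, 1.5, 0.0])
nu = np.array([0.0, 2.0, 0.5, 0.0])
h_sum = hellinger_sq(mu, nu, mu + nu)                             # sigma = mu + nu
h_other = hellinger_sq(mu, nu, np.array([1.0, 3.0, 0.25, 7.0]))   # another dominating measure
```

Both choices of \(\sigma \) give the same value, since the factor \(\textrm{d}\sigma \) cancels against the densities inside the square roots.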
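Moreover, recall the entropy function \(\phi : {\mathbb {R}}_{\ge 0}\rightarrow {\mathbb {R}}_{\ge 0}\) and its Legendre dual \(\phi ^*:{\mathbb {R}}\rightarrow {\mathbb {R}}\), given by
$$\begin{aligned} \phi (s):=s\log s-s+1, \qquad \phi ^*(z):=e^{z}-1, \end{aligned}$$
and the relative entropy of \(\nu \) with respect to \(\mu \), defined as
$$\begin{aligned} \mathcal {E}\textrm{nt}(\nu |\mu ):=\left\{ \begin{aligned}&\int _{\mathcal {T}} \phi \left( \frac{\textrm{d}\nu }{\textrm{d}\mu }\right) \textrm{d}\mu ,{} & {} \qquad \hbox { if}\ \nu \ll \mu ,\\&+\infty ,{} & {} \qquad \hbox {otherwise.} \end{aligned}\right. \end{aligned}$$(2.3)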
We will consider curves satisfying the continuity equation
$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^- \end{aligned}$$
in an appropriately weak sense.
Definition 2.1
(Mean-field continuity equation) A triple \((\nu ,\lambda ^+,\lambda ^-)\) satisfies the mean-field continuity equation \(\mathscr{C}\mathscr{E}\) if
(1) the curve \([0,T]\ni t\mapsto \nu _t\in \Gamma \) is absolutely continuous with respect to \(\Vert \cdot \Vert _{TV}\),
(2) the Borel family \((\lambda _t^\pm )_{t\in [0,T]}\subset \Gamma \) satisfies \(\int _0^T \Vert \lambda _t^{\pm }\Vert _{TV} \, \textrm{d}t<\infty \),
(3) for all \(f\in C_b(\mathcal {T})\),
$$\begin{aligned} \int _{\mathcal {T}} f \textrm{d}\nu _t - \int _{\mathcal {T}} f \textrm{d}\nu _s = \int _s^t \left( \int _{\mathcal {T}} f \textrm{d}\lambda _r^+-\int _{\mathcal {T}} f \textrm{d}\lambda _r^- \right) \, \textrm{d}r, \quad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$
We will refer to \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^-\) as the net flux.
Remark 2.2
When seen as approximations of particle systems the birth/death fluxes \(\lambda ^{\pm }_t\) represent the observed amount of mass being created/annihilated around a certain point, and \(\nu _t\) represents the density of the particles, while \(\kappa _{\nu }^{\pm }\) correspond to the expected birth and death fluxes of the BPDL model.
Remark 2.3
(Time-regularity) As we will see in Lemma 2.12, if there exists a common dominating measure for \(\{\nu _t,\lambda ^+_t,\lambda _t^-\}_{t\in [0,T]}\), then the continuity equation holds in a strong sense: \(\nu _t\) is an a.e. differentiable map from [0, T] to \((\Gamma ,\Vert \cdot \Vert _{TV})\) and
$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^- \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$
Definition 2.4
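Let \(\theta _{\nu }\) be the geometric average of \(\kappa _{\nu }^+\) and \(\kappa _{\nu }^-\), i.e.
$$\begin{aligned} \theta _{\nu }:=\sqrt{\frac{\textrm{d}\kappa _{\nu }^+}{\textrm{d}\sigma }\, \frac{\textrm{d}\kappa _{\nu }^-}{\textrm{d}\sigma }} \, \sigma \end{aligned}$$
for any dominating measure \(\sigma \). We define the following objects: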
- The dissipation potential \(\mathcal {R}_{MF}:\Gamma ^3 \rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-):=\mathcal {E}\textrm{nt}(\lambda ^+|\theta _{\nu })+\mathcal {E}\textrm{nt}(\lambda ^-|\theta _{\nu }), \end{aligned}$$
and the dual dissipation potential \(\mathcal {R}^*_{MF}:\Gamma \times \mathcal {B}_b({\mathcal {T}})^2 \rightarrow {{\mathbb {R}}}\),
$$\begin{aligned} \mathcal {R}^*_{MF}(\nu ,w^+,w^-):=\int _{\mathcal {T}} (e^{w^{+}}-1)\,\textrm{d}\theta _{\nu }+\int _{\mathcal {T}} (e^{w^{-}}-1)\,\textrm{d}\theta _{\nu }. \end{aligned}$$
- The free energy \(\mathcal {F}_{MF}:\Gamma \rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {F}_{MF}(\nu ):=\tfrac{1}{2} \mathcal {E}\textrm{nt}(\nu |\gamma ), \end{aligned}$$
and the Fisher information \(\mathcal {D}_{MF}:\Gamma \rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {D}_{MF}(\nu ):=\left\{ \begin{aligned}&2H^2(\kappa _{\nu }^+,\kappa _{\nu }^-),{} & {} \qquad \hbox { if}\ \nu \ll \gamma ,\\&+\infty ,{} & {} \qquad \hbox {otherwise.} \end{aligned}\right. \end{aligned}$$
- The EDP-functional \(\mathcal {I}_{MF}:\mathscr{C}\mathscr{E}\rightarrow [0,+\infty ]\), defined for all curves with \(\mathcal {F}_{MF}(\nu _0)<\infty \) by
$$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-):=\int _0^T \mathcal {R}_{MF}(\nu _t,\lambda _t^{+},\lambda ^-_t) \, \textrm{d}t + \mathcal {F}_{MF}(\nu _T)-\mathcal {F}_{MF}(\nu _0)+\int _0^T \mathcal {D}_{MF}(\nu _t) \, \textrm{d}t. \end{aligned}$$(2.4)
Remark 2.5
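Since \(\theta _{\nu }(\mathcal {T})<\infty \) by Lemma 2.10, all objects above are well-defined, and it is straightforward to verify via the dual representation of the entropy that \(\mathcal {R}_{MF}, \mathcal {R}^*_{MF}\) are truly dual objects in the sense that
$$\begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)=\sup _{w^{+},w^{-}\in \mathcal {B}_b(\mathcal {T})} \left\{ \int _{\mathcal {T}} w^+ \,\textrm{d}\lambda ^+ +\int _{\mathcal {T}} w^- \,\textrm{d}\lambda ^- -\mathcal {R}^*_{MF}(\nu ,w^+,w^-)\right\} , \end{aligned}$$
and vice versa.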
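The duality can be illustrated on a finite trait space, one flux at a time. The sketch below assumes \(\phi (s)=s\log s-s+1\), so that the entropy is the supremum over \(w\) of \(\int w\,\textrm{d}\lambda -\int (e^{w}-1)\,\textrm{d}\theta \), attained at \(w=\log (\textrm{d}\lambda /\textrm{d}\theta )\). The measures `theta` and `lam` are hypothetical random data with strictly positive weights.

```python
import numpy as np

def entropy(lam, theta):
    """Relative entropy Ent(lam | theta) = sum phi(lam / theta) * theta, phi(s) = s log s - s + 1."""
    g = lam / theta
    return np.sum((g * np.log(g) - g + 1.0) * theta)

def dual_objective(w, lam, theta):
    """Value of <w, lam> - sum (exp(w) - 1) * theta; its supremum over w is the entropy."""
    return np.sum(w * lam) - np.sum((np.exp(w) - 1.0) * theta)

rng = np.random.default_rng(0)
theta = rng.uniform(0.5, 2.0, size=5)      # hypothetical reference measure on 5 traits
lam = rng.uniform(0.5, 2.0, size=5)        # hypothetical flux with strictly positive density

w_opt = np.log(lam / theta)                # candidate maximiser w = log(d lam / d theta)
best = dual_objective(w_opt, lam, theta)
trials = [dual_objective(rng.normal(size=5), lam, theta) for _ in range(1000)]
```

The value `best` coincides with `entropy(lam, theta)`, and no random trial exceeds it. Summing the same identity over the birth and death fluxes gives the duality between \(\mathcal {R}_{MF}\) and \(\mathcal {R}^*_{MF}\).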
Remark 2.6
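If \(\nu \ll \gamma \) with \(\textrm{d}\nu =u \textrm{d}\gamma \), note that \(\textrm{d}\theta _{\nu }=c_{\nu } \sqrt{u}\, \textrm{d}\gamma \), and that the Fisher information simplifies to
$$\begin{aligned} \mathcal {D}_{MF}(\nu )=\int _{\mathcal {T}} c_{\nu }\, \left( \sqrt{u}-1\right) ^2 \textrm{d}\gamma . \end{aligned}$$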
We are now able to fully state the variational characterization of strong solutions to the mean-field equation (\(\mathsf MF\)).
Theorem 2.7
For any \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \), we have \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)\ge 0\) and
Moreover, whenever \(\mathcal {F}_{MF}(\nu _0)<\infty \) and \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)<\infty \) the chain rule for \(\mathcal {F}_{MF}\) holds: \(\mathcal {F}_{MF}(\nu _t)\) is absolutely continuous and
The proof of Theorem 2.7 is postponed to Sect. 2.3, where we establish the main technical ingredient, namely the chain rule for the entropy functional.
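The mechanism behind Theorem 2.7 can be checked pointwise in time on a finite trait space. The sketch below evaluates \(\mathcal {R}_{MF}+\mathcal {D}_{MF}+\tfrac{\textrm{d}}{\textrm{d}t}\mathcal {F}_{MF}\) for a strictly positive density \(u\), assuming \(\phi (s)=s\log s-s+1\), the Fisher information in the form \(\int c_{\nu }(\sqrt{u}-1)^2\,\textrm{d}\gamma \), and the chain rule \(\tfrac{\textrm{d}}{\textrm{d}t}\mathcal {F}_{MF}=\tfrac{1}{2}\int \log u\,\textrm{d}\lambda ^{\textrm{net}}\). These explicit forms, the function names, and the example data are assumptions made for illustration.

```python
import numpy as np

def phi(s):
    """Entropy density phi(s) = s log s - s + 1 for s > 0."""
    return s * np.log(s) - s + 1.0

def edp_integrand(u, lam_plus, lam_minus, c, gamma):
    """R_MF + D_MF + d/dt F_MF for nu = u * gamma with strictly positive weights."""
    c_nu = c @ (u * gamma)
    theta = c_nu * np.sqrt(u) * gamma                     # geometric mean of kappa^+ and kappa^-
    dissipation = np.sum(phi(lam_plus / theta) * theta) + np.sum(phi(lam_minus / theta) * theta)
    fisher = np.sum(c_nu * (np.sqrt(u) - 1.0) ** 2 * gamma)
    energy_rate = 0.5 * np.sum(np.log(u) * (lam_plus - lam_minus))   # assumed chain rule for Ent / 2
    return dissipation + fisher + energy_rate

# Hypothetical data on 4 traits.
gamma = np.array([0.1, 0.2, 0.3, 0.4])
c = 1.0 - np.eye(4)
u = np.array([0.2, 3.0, 0.5, 1.5])
c_nu = c @ (u * gamma)
kappa_plus, kappa_minus = c_nu * gamma, c_nu * u * gamma

at_solution = edp_integrand(u, kappa_plus, kappa_minus, c, gamma)

rng = np.random.default_rng(0)
lam_plus = kappa_plus * rng.uniform(0.5, 2.0, size=4)     # fluxes that differ from kappa^+-
lam_minus = kappa_minus * rng.uniform(0.5, 2.0, size=4)
perturbed = edp_integrand(u, lam_plus, lam_minus, c, gamma)
lagrangian = (np.sum(phi(lam_plus / kappa_plus) * kappa_plus)
              + np.sum(phi(lam_minus / kappa_minus) * kappa_minus))
```

The integrand vanishes for \(\lambda ^{\pm }=\kappa ^{\pm }_{\nu }\) and is strictly positive for the perturbed fluxes. In this discrete setting it coincides with \(\mathcal {E}\textrm{nt}(\lambda ^+|\kappa ^+_{\nu })+\mathcal {E}\textrm{nt}(\lambda ^-|\kappa ^-_{\nu })\), which is a direct consequence of the Fenchel-Young inequality for the pair \(\phi ,\phi ^*\).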
Remark 2.8
The results of this section do not depend on the no natural death condition \(c(x,x)=0\) for all \(x\in \mathcal {T}\), but arise from the bounds on m, c and depend crucially on the mean-field detailed balance condition \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\).
Remark 2.9
The non-negativity of \(\mathcal {I}_{MF}\) and the fact that null-minimizers are solutions to (\(\mathsf MF\)) is related to the formal equivalence
where \(\mathcal {L}\) is the so-called Lagrangian given by
Note that \(\mathcal {L}\) is non-negative and zero if and only if \(\lambda ^{\pm }=\kappa _{\nu }^{\pm }\). Although we do not prove the full equivalence in this work, it does play a role in the intuition and motivation behind the EDP-functional \(\mathcal {I}_{MF}\), with the Lagrangian \(\mathcal {L}\) stemming from a large-deviation perspective, as seen in “Appendix A”.
2.1 A priori estimates
In this section, we will collect some elementary estimates and results that are either necessary for the well-posedness of the mean-field equation and the corresponding gradient structure, or necessary to do the same for the Liouville equation in Sect. 4.
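Let \(\Psi ^*\) be given as
$$\begin{aligned} \Psi ^*(z):=e^{z}+e^{-z}-2=2\left( \cosh (z)-1\right) , \end{aligned}$$
and its dual \(\Psi :=(\Psi ^*)^*\)
$$\begin{aligned} \Psi (s)=s\log \left( \frac{s+\sqrt{s^2+4}}{2}\right) -\sqrt{s^2+4}+2. \end{aligned}$$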
Lemma 2.10
Let \(M:=\Vert c\Vert _{\infty } (1+\gamma (\mathcal {T}))\). Then the following estimates hold:
(i) The measures \(\kappa _{\nu }^{\pm }\) and \(\theta _{\nu }\) are finite:
$$\begin{aligned} \kappa ^{\pm }_{\nu }(\mathcal {T})\le M (1+\nu (\mathcal {T})^2) \end{aligned}$$(2.7)
and
$$\begin{aligned} \theta _{\nu }(\mathcal {T})\le M (1+\nu (\mathcal {T})^2). \end{aligned}$$(2.8)
(ii) For any birth/death fluxes \(\lambda ^{\pm }\in \mathcal {M}^+(\mathcal {T})\), net flux \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^{{-}}\), and \(w^{\pm }, w\in \mathcal {B}(\mathcal {T})\),
$$\begin{aligned} \begin{aligned} \int _{\mathcal {T}} |w^{\pm }| \,\textrm{d}\lambda ^{\pm }&\le \mathcal {E}\textrm{nt}(\lambda ^{\pm }|\theta _{\nu })+\int _{\mathcal {T}} \Psi ^*(w^{\pm }) \, \textrm{d}\theta _{\nu } + \theta _{\nu }(\mathcal {T}), \\ \int _{\mathcal {T}} |w|\,\textrm{d}|\lambda ^{\textrm{net}}| \,&\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)+\int _{\mathcal {T}} \Psi ^*(w) \, \textrm{d}\theta _{\nu }. \end{aligned} \end{aligned}$$
(iii) For any birth/death fluxes \(\lambda ^{\pm }\in \Gamma \),
$$\begin{aligned} \phi \left( \frac{\lambda ^{\pm }(\mathcal {T})}{M(1+\nu (\mathcal {T})^2)} \vee 1\right) M\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-). \end{aligned}$$(2.9)
Remark 2.11
Although the estimate for \(\theta _{\nu }\) can be made more precise, namely
we will not require it for our results.
Proof
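(i) With \(\theta _{\nu }:=\sqrt{\textrm{d}\kappa _{\nu }^+/\textrm{d}\sigma \, \textrm{d}\kappa _{\nu }^-/\textrm{d}\sigma } \, \sigma \) for any dominating measure \(\sigma \), we have by Hölder’s inequality
$$\begin{aligned} \theta _{\nu }(\mathcal {T})=\int _{\mathcal {T}} \sqrt{\frac{\textrm{d}\kappa _{\nu }^+}{\textrm{d}\sigma }\, \frac{\textrm{d}\kappa _{\nu }^-}{\textrm{d}\sigma }} \, \textrm{d}\sigma \le \sqrt{\kappa _{\nu }^+(\mathcal {T})\, \kappa _{\nu }^-(\mathcal {T})}. \end{aligned}$$
Note that \(\kappa ^+_{\nu }(\mathcal {T})\le \Vert c\Vert _{\infty } \gamma (\mathcal {T}) \nu (\mathcal {T})\), and \(\kappa ^-_{\nu }(\mathcal {T})\le \Vert c\Vert _{\infty } \nu (\mathcal {T})^2\), which provides (2.7) since \(z\le 1+z^2\) for all \(z\ge 0\). As the geometric mean is bounded by the maximum of \(\kappa ^{\pm }_{\nu }(\mathcal {T})\), (2.8) follows directly.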
(ii) First, suppose that \(w\in \mathcal {B}_b(\mathcal {T})\). Using the elementary inequality \( e^{|a|}\le e^a+e^{-a}\) we derive by duality of the entropy
Next, fix any measurable function \(w\in \mathcal {B}(\mathcal {T})\) and set its k-truncation \(w_k:=\max \{\min \{w,k\},-k\}\). Since \(\Psi ^*\) is even and monotone, by monotone convergence applied to both sides, the inequality holds for w as well, with both sides possibly equal to \(+\infty \).
Next, note that for any \({\tilde{w}}\in \mathcal {B}_b(\mathcal {T})\)
Substituting \({\tilde{w}}:=|w|1_{P}-|w|1_{P^c}\), with \(P,P^c\) stemming from the Hahn decomposition for \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^-\), the desired inequality for \(\lambda ^{\textrm{net}}\) now follows after another truncation argument.
(iii) Without loss of generality, suppose that \(\mathcal {R}_{MF}\) is finite. Set \(a(\nu ):=(1+\nu (\mathcal {T})^2)^{-1}\), and note that \(0\le a(\nu )\le 1\). With \({\tilde{\phi }}(s):=\phi (s\vee 1)\) the monotone relaxation of \(\phi \), we then have the following chain of inequalities,
where the last inequality follows from Jensen’s inequality. By convexity of \({\tilde{\phi }}\) and \({\tilde{\phi }}(0)=0\) the latter expression is monotone in \(\theta _{\nu }(\mathcal {T})\), and hence by (2.8) we find
\(\square \)
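The mass bounds (2.7) and (2.8), together with the Hölder step in part (i), can be checked numerically for discrete measures \(\nu =u\gamma \). In the sketch below, the random kernels, reference weights, and densities are hypothetical test data, and `check_mass_bounds` is an illustrative helper rather than an object from the paper.

```python
import numpy as np

def check_mass_bounds(c, gamma, u):
    """Return total masses of kappa^+, kappa^-, theta and the bound M * (1 + nu(T)^2)."""
    c_nu = c @ (u * gamma)
    kappa_plus = np.sum(c_nu * gamma)                 # kappa^+ = c_nu * gamma
    kappa_minus = np.sum(c_nu * u * gamma)            # kappa^- = c_nu * nu
    theta = np.sum(c_nu * np.sqrt(u) * gamma)         # geometric mean of the two kernels
    nu_mass = np.sum(u * gamma)
    bound = np.max(c) * (1.0 + np.sum(gamma)) * (1.0 + nu_mass ** 2)
    return kappa_plus, kappa_minus, theta, bound

rng = np.random.default_rng(1)
results = []
for _ in range(200):
    n = int(rng.integers(2, 8))
    c = rng.uniform(0.0, 1.0, size=(n, n))
    np.fill_diagonal(c, 0.0)                          # c(x, x) = 0
    gamma = rng.uniform(0.1, 1.0, size=n)
    u = rng.uniform(0.0, 5.0, size=n)
    kp, km, th, bound = check_mass_bounds(c, gamma, u)
    holder_ok = th <= np.sqrt(kp * km) * (1.0 + 1e-12)
    results.append(bool(kp <= bound and km <= bound and th <= bound and holder_ok))
```

Every sample satisfies both the Hölder estimate \(\theta _{\nu }(\mathcal {T})\le \sqrt{\kappa ^+_{\nu }(\mathcal {T})\kappa ^-_{\nu }(\mathcal {T})}\) and the bound by \(M(1+\nu (\mathcal {T})^2)\).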
We will briefly state the improvement of regularity in time of \(\nu _t\) if there exists a common dominating measure. The proof is similar to Corollary 4.14 of [33] and therefore omitted here.
Lemma 2.12
Let \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) and suppose that there exists a measure \(\ell \in \Gamma \) such that \(\nu _t,\lambda _t^{\pm }\ll \ell \) for all \(t\in [0,T]\).
Then there exists an absolutely continuous and a.e. differentiable map \(u:[0,T]\rightarrow L^1(\mathcal {T},\ell )\) and maps \(g^{\pm }:[0,T]\rightarrow L^1(\mathcal {T},\ell )\) such that \(u_t=\textrm{d}\nu _t/\textrm{d}\ell \), \(g_t^{\pm }=\textrm{d}\lambda _t^{\pm }/\textrm{d}\ell \) and
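In particular, the continuity equation holds in the strong sense, namely that \(\nu _t\) is an a.e. differentiable map from [0, T] to \((\Gamma ,\Vert \cdot \Vert _{TV})\) and
$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^- \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$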
Next, we will list two results that are either necessary for the chain rule in Sect. 3.3 or the superposition principle and well-posedness of the continuity equation in Sect. 4.
Lemma 2.13
For any \(0\le a\le 1\), \(z\in {\mathbb {R}}\)
Moreover, for any net flux \(\lambda ^{\textrm{net}}\in \mathcal {M}(\mathcal {T})\),
Proof
It is straightforward to check that \(\Psi ^*(z)/z^2\) is monotone increasing for \(z\ge 0\), from which the first statement follows.
Now, for the net flux, it is convenient to go through the dual representation. Set \(a(\nu ):=(1+\nu (\mathcal {T}))^{-1}\). By duality, for any \(w\in \mathcal {B}_b(\mathcal {T})\)
However, by (2.10),
Taking the supremum over all \(w\in \mathcal {B}_b(\mathcal {T})\) in (2.12) we find (2.11). \(\square \)
Lemma 2.14
Let \(\{f_i\}_{i\in {\mathbb {N}}} \subset C_b(\mathcal {T})\) be a countable and dense set of bounded continuous functions. Suppose \((\nu ,\lambda ^+,\lambda ^-)\) is such that
(i) the curve \([0,T]\ni t\mapsto \nu _t\in \Gamma \) is narrowly continuous,
(ii) \((\lambda _t^\pm )_{t\in [0,T]}\subset \Gamma \) is a Borel family with
$$\begin{aligned} \int _0^T \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t) \, \textrm{d}t<\infty , \end{aligned}$$
(iii) for all \(i \in {\mathbb {N}}\),
$$\begin{aligned} \int _{\mathcal {T}} f_i\, \textrm{d}\nu _t - \int _{\mathcal {T}} f_i\, \textrm{d}\nu _s = \int _s^t \left( \int _{\mathcal {T}} f_i \,\textrm{d}\lambda _r^+-\int _{\mathcal {T}} f_i \,\textrm{d}\lambda _r^- \right) \textrm{d}r, \quad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$
Then \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\), i.e. the triple satisfies the mean-field continuity equation.
Proof
Since \(\nu _t\) is narrowly continuous its mass is uniformly bounded in time, hence let \(C:=\sup _{t\in [0,T]} \nu _t(\mathcal {T})\). By (2.9) and monotonicity of \(\phi (\cdot \vee 1)\) we have for a.e. \(t\in [0,T]\),
and therefore by convexity of \(\phi (\cdot \vee 1)\)
Since the measures \(\lambda _t^{\pm }(\textrm{d}x)\, \textrm{d}t \in \mathcal {M}^+([0,T]\times \mathcal {T})\) are finite, by density of \(f_i\) in \(C_b(\mathcal {T})\) it is clear that for all \(f\in C_b(\mathcal {T})\)
By a monotone class argument, this can be extended to all \(f\in \mathcal {B}_b(\mathcal {T})\) and we derive that \(\nu _t\) is indeed TV-absolutely continuous and \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\). \(\square \)
2.2 Strong solutions
Strong solutions to (\(\mathsf MF\)) exist and are unique, and we list the most important properties here. It should be noted that these arguments apply even without the detailed balance condition \(m(x,y)=c(y,x)\) and only require both \(\Vert m\Vert _{\infty }\) and \(\Vert c\Vert _{\infty }{}\) to be finite, but for simplicity, we will restrict ourselves to our framework. Moreover, in all results the time window \(T>0\) is arbitrary.
Definition 2.15
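A strong solution to (\(\mathsf MF\)) is any TV-absolutely continuous and a.e. differentiable mapping \(\nu :[0,T]\rightarrow (\Gamma ,\Vert \cdot \Vert _{TV})\) satisfying
$$\begin{aligned} \partial _t \nu _t=\kappa ^+_{\nu _t}-\kappa ^-_{\nu _t} \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$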
Recall that \(\kappa ^+_{\nu }(\textrm{d}x)={c_{\nu }(x)} \gamma (\textrm{d}x)\) and \(\kappa ^-_{\nu }(\textrm{d}x)={c_{\nu }}(x) \nu (\textrm{d}x)\), where \(c_{\nu }(x)=\int _{\mathcal {T}} c(x,y)\, \nu (\textrm{d}y)\).
Remark 2.16
Note that if \(\nu \) is a strong solution to (\(\mathsf MF\)) automatically \((\nu ,\kappa ^{+}_{\nu },\kappa _{\nu }^-) \in \mathscr{C}\mathscr{E}\).
Vice versa, if \((\nu ,\kappa ^{+}_{\nu },\kappa _{\nu }^-) \in \mathscr{C}\mathscr{E}\) then \(\nu _t\) is a strong solution. Namely, any TV-absolutely continuous curve \(\nu _t\) possesses a common dominating measure \(\ell \in \Gamma \), which implies \(\kappa _{\nu _t}^{\pm }\ll \ell +\gamma \). By Lemma 2.12 the curve \(\nu \) is indeed an a.e. differentiable mapping to \((\Gamma ,\Vert \cdot \Vert _{TV})\).
Lemma 2.17
For any \({{\bar{\nu }}}\in \Gamma \) there exists a unique strong solution \(\nu _t\) to (\(\mathsf MF\)) such that \(\nu _0={{\bar{\nu }}}\).
Moreover, if \({{\bar{\nu }}}\ll \gamma \), then also \(\nu _t\ll \gamma \) for all \(t\in [0,T]\).
The proof is an adaptation of [18, Proposition 7.2], which is stated for Lebesgue absolutely continuous measures over \(\mathcal {T}={\mathbb {R}}^d\). In short, the linear dependence of the birth flux on the mass of \(\nu \) gives a bound on this mass uniform in time, in which case both \(\kappa ^{\pm }_{\nu }\) are Lipschitz in \(\nu \) on \((\Gamma ,\Vert \cdot \Vert _{TV})\), and classical existence theory can be applied.
Proof
First, note that for the linear case of
with \(c_t\in \mathcal {B}_b\) uniformly bounded and \(b_{{t}} \in \Gamma \) with \(\int _0^T \Vert b_{{t}}\Vert _{TV} \, \textrm{d}t<\infty \) with a common dominating measure, it is easy to verify that a unique strong non-negative solution exists and is given by
We now set \(\nu ^0_t:={{\bar{\nu }}}\) for all \(t\in [0,T]\), and perform the implicit Picard iteration
i.e. \(\nu ^{k+1}=(\mathcal {G}\nu ^k)\) with
It is straightforward to check that for all \(t\in [0,T]\)
We will show that \(\mathcal {G}\) is contractive under a suitable metric on the space of curves with initial data \({{\bar{\nu }}}\) and mass bounded by C. This implies there exists a TV-absolutely continuous curve \(\nu \) such that
Moreover, since in the iterations \(\nu ^k\ll {{\bar{\nu }}}+\gamma \) for all k, it is clear that we obtain strong solutions in \(L^1(\bar{\nu }+\gamma )\). In particular, for \({{\bar{\nu }}}\ll \gamma \) we have \(\nu _t\ll \gamma \) for all \(t\in [0,T]\) as well.
Now, note that \(\langle c(x,\cdot ),\nu \rangle \) is Lipschitz in \(\nu \) on \((\Gamma ,\Vert \cdot \Vert _{TV})\) due to the uniform bound on mass. This implies that there exists a constant K such that for any two admissible curves \(\nu ,{\tilde{\nu }}\):
Hence, by a Gronwall-type argument, we find that for any \(\varepsilon >0\) for all \(t\in [0,T]\)
thus yielding the contraction required to apply the Banach fixed-point theorem. \(\square \)
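The fixed-point construction can be sketched on a finite trait space. The code below assumes that one step of the implicit Picard iteration freezes the rate \(c_{\nu ^k_t}\) and solves the resulting linear problem \(\partial _t u^{k+1}_t=c_{\nu ^k_t}(1-u^{k+1}_t)\) in density form, whose solution is \(u^{k+1}_t=1-(1-u_0)\exp (-\int _0^t c_{\nu ^k_s}\,\textrm{d}s)\). This reading of the iteration, the trapezoidal quadrature, and the example data are assumptions made for illustration.

```python
import numpy as np

def picard_mean_field(u0, c, gamma, T=1.0, steps=200, iters=40):
    """Iterate u^{k+1} = G(u^k), where G solves the linear problem with frozen rate c_{nu^k}."""
    u0 = np.asarray(u0, dtype=float)
    dt = T / steps
    u = np.tile(u0, (steps + 1, 1))              # nu^0_t equals the initial datum for all t
    gaps = []
    for _ in range(iters):
        rate = (u * gamma) @ c.T                 # rate[k, x] = sum_y c[x, y] u[k, y] gamma[y]
        integral = np.zeros_like(rate)           # trapezoidal approximation of int_0^t rate ds
        integral[1:] = np.cumsum(0.5 * dt * (rate[1:] + rate[:-1]), axis=0)
        new_u = 1.0 - (1.0 - u0) * np.exp(-integral)   # closed form of u' = rate * (1 - u)
        gaps.append(float(np.max(np.abs(new_u - u))))  # sup-distance between successive iterates
        u = new_u
    return u, gaps

# Hypothetical data on 4 traits.
gamma = np.array([0.1, 0.2, 0.3, 0.4])
c = 1.0 - np.eye(4)
u0 = np.array([0.2, 3.0, 0.5, 1.5])
u, gaps = picard_mean_field(u0, c, gamma)
```

The successive distances in `gaps` decay rapidly, in line with the contraction argument, and each iterate stays between the initial density and the equilibrium value 1, which reflects the uniform mass bound used in the proof.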
Finally, for use in the entropic propagation of chaos result of Theorem 5.4, it is convenient to characterize the conditions for which \(u_t\) is bounded from above and below. The following statement follows directly from a Gronwall-type argument.
Lemma 2.18
Suppose \(\nu _0\) is such that \(C^{-1} \le \textrm{d}\nu _0/\textrm{d}\gamma (x) <C\) for some constant \(C>0\) and all \(x\in \mathcal {T}\). Then there exists a constant \(C_T>0\) such that for the corresponding solution
2.3 Variational characterization
We will now prove the non-negativity of our EDP-functional \(\mathcal {I}_{MF}\) and the characterization of strong solutions to (\(\mathsf MF\)) as minimizers of \(\mathcal {I}_{MF}\). To do so, we first need to prove the chain rule for the free energy \(\mathcal {F}_{MF}\) along curves with finite \(\mathcal {I}_{MF}\).
There is an important technical issue concerning the Fisher information, in the sense that on curves with finite \(\mathcal {I}_{MF}\) the chain rule inequality holds for the following replacement:
for any \(\nu \ll \gamma \) with \(u:=\textrm{d}\nu /\textrm{d}\gamma \). Note that \(0\le \mathcal {D}^-_{MF}(\nu )\le \mathcal {D}_{MF}(\nu )\) and \(\mathcal {D}^-_{MF}=\mathcal {R}_{MF}^*(\partial _{\nu } \mathcal {F}_{MF})\).
We will see the same principle arise in Sect. 3 for the variational characterization of the forward Kolmogorov equation, which is also observed in [33, Section 5].
Lemma 2.19
For any curve \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \) and \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)<\infty \) it holds that \([0,T]\ni t\mapsto \mathcal {F}_{MF}(\nu _t)\) is absolutely continuous and a.e. differentiable with
Moreover, for such a curve
Remark 2.20
In fact, for such curves, for a.e. t both the terms
will be finite, and hence
Remark 2.21
From Lemma 2.19, it is clear that an alternative approach would be to discard the functional \(\mathcal {I}_{MF}\) and only consider \(\mathcal {I}^-_{MF}\), and relate minimizers to EDP-solutions, and so forth. However, the reason for the introduction of \(\mathcal {D}_{MF}\), and \(\mathcal {I}_{MF}\) by extension, is the lower semicontinuity of \(\mathcal {D}_{MF}\) and its Liouville counterpart \(\mathcal {D}_{\infty }\) (see Sect. 4), together with the fact that \(\mathcal {I}_{MF}\) arises in the limit of the EDP-convergence of Sect. 5.
Proof
Fix any curve \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \). We will show that whenever \(\mathcal {I}_{MF}<\infty \) the mapping \(t \mapsto \mathcal {E}\textrm{nt}(\nu _t|\gamma )\) is absolutely continuous and satisfies the chain rule, i.e.
Suppose that \(\mathcal {I}_{MF}<\infty \). Since \(\mathcal {E}\textrm{nt}\) is bounded from below, \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )<\infty \) implies that
In particular, for a.e. \(t\in [0,T]\) it holds that \(\nu _t\ll \gamma \), \(\lambda _t^{\pm }\ll \theta _{\nu _t}\), and in turn \(\theta _{\nu _t} \ll \gamma \). In fact, due to TV-continuity of \(\nu _t\), we have \(\nu _t\ll \gamma \) for all \(t\in [0,T]\). Moreover, \(\int _0^T \lambda ^{\pm }_t(\mathcal {T})\, \textrm{d}t<\infty \) and \(\sup _t \nu _t(\mathcal {T})<\infty \).
Setting \(u_t:=\textrm{d}\nu _t/\textrm{d}\gamma \), we have
and in particular \(\theta _{\nu _t}(\{u_t=0\})=0\). Similarly, \(\lambda _t^{\pm }\ll \theta _{\nu _t}\) for a.e. t and hence \(u_t>0\) for \(\lambda ^{\pm }_t,\lambda ^{\textrm{net}}_t\)-a.e. x for such t as well. Furthermore, since for a.e. t we have \(\lambda _t^{\pm }\ll \theta _{\nu _t}\ll \gamma \) we find by Lemma 2.12 that \(u: [0,T]\rightarrow L^1(\mathcal {T},\gamma )\) is absolutely continuous and differentiable at a.e. \(r\in [0,T]\).
Consider any such r with \(\mathcal {R}_{MF}(\nu _r,\lambda _r^+,\lambda _r^-),\mathcal {D}_{{MF}}(\nu _r)<\infty \). By Lemma 2.10, for any \(w\in \mathcal {B}_b(\mathcal {T})\),
Now let \(\phi _m\) be the convex and uniformly Lipschitz regularizations of \(\phi \) constructed by using the truncations \(\phi _m':=[\phi ']_m=\max \{\min \{\phi ',m\},-m\}\) and \(\phi _m(s):=\int _1^s \phi _m'(z)\, \textrm{d}z\). Note that \(\phi _m'\) converges pointwise to \(\phi '\), and both \(\phi _m\) and \(|\phi '_m|\) converge monotonically to \(\phi \) and \(|\phi '|\) respectively.
Moreover, note that \(\phi '(u_r)=\log u_r\) is \(\theta _{\nu _r}\)-a.e. finite, and similarly \(\lambda ^{\pm }_r\)-a.e. as well. Therefore, since \(\Psi ^*\) is even and monotone on \({\mathbb {R}}_{\ge 0}\) we derive
Recall that \(\mathcal {D}^-_{MF}(\nu _r)\le \mathcal {D}_{MF}(\nu _r)\). By substituting \(w=\tfrac{1}{2}\phi _m'\) in (2.14) we find
and after a monotone convergence argument
Note that for every m the function \(\phi _m\) is smooth and uniformly Lipschitz, thus the functional \(\int \phi _m(u_r) \,\textrm{d}\gamma \) is \(\Vert \cdot \Vert _{TV}\)-Lipschitz continuous and hence absolutely continuous by TV-regularity of \(\nu _r\). Moreover, since \(\lambda _r^{\pm }\ll \gamma \) and \(u_r\) is a.e. differentiable in \(L^1({\mathcal {T}},\gamma )\) it is straightforward to check that
Therefore, since \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )\) is finite by assumption and the functionals \(\int \phi _m(u_t)\, \textrm{d}\gamma \) converge monotonically to \(\mathcal {E}\textrm{nt}(\nu _t|\gamma )\), we find
In particular \(\mathcal {E}\textrm{nt}(\nu _t|\gamma )\) is finite for all \(t\in [0,T]\), and after repeating the argument for \(s,t\in [0,T]\) we conclude by a dominated convergence argument that
and
\(\square \)
We are now finally in a position to prove Theorem 2.7. With the chain rule above, all that remains is, on the one hand, showing that \(\mathcal {I}^-_{MF}(\nu ,\lambda ^+,\lambda ^-)=0\) implies that \(\lambda _t^{\pm }=\kappa _{\nu _t}^{\pm }\) for a.e. t, and on the other hand, showing that if \(\nu \) is a strong solution it holds that \(\mathcal {I}^-_{MF}(\nu ,\kappa _{\nu }^+,\kappa _{\nu }^-)=0\) and \(\mathcal {D}^-_{MF}=\mathcal {D}_{MF}\) for a.e. \(t\in [0,T]\). The second part again involves proving a chain rule, but now along the solution curve.
Proof of Theorem 2.7
First, consider any \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \), and \(\mathcal {I}_{MF}=0\). By Lemma 2.19,
Now, recall that \(\textrm{d}\theta _{\nu }=c_{\nu } \sqrt{u}\, \textrm{d}\gamma \). Setting \(g_t^{\pm }:=\textrm{d}\lambda ^{\pm }_t/\textrm{d}\theta _{\nu _t}\), it holds that \(\log (u_t)\, g_t^{\pm }<\infty \) for \(\theta _{\nu _t}\)-a.e. x and a.e. t, and by the inequality (2.15) that \(|\log u_t|\, |g_t^+-g_t^-|\) is \(\theta _{\nu _t}\)-integrable. Therefore, by straightforward algebraic manipulations, we find that for a.e. t,
Due to the duality between \(\phi \) and \(\phi ^*\) this expression is zero if and only if \(\theta _{\nu _t}\)-a.e.
Recalling that \(\theta _{\nu }=c_{\nu } \sqrt{u} \gamma \), \(\kappa _{\nu }^+=c_{\nu } \gamma \) and \(\kappa ^-_{\nu }=c_{\nu } u \gamma \) we find that indeed for a.e. t,
Vice versa, assume that \(\nu _t\) is a strong solution with \(\mathcal {F}_{MF}(\nu _0)<\infty \). Recall that \(\nu _t\ll \gamma \) for all \(t\in [0,T]\) by Lemma 2.17, and hence \(\kappa _{\nu _t}^{\pm }\ll \gamma \) as well. Therefore we can again write \(u_t:=\textrm{d}\nu _t/\textrm{d}\gamma \), \(\kappa _{\nu }^+=c_{\nu } \gamma \), \(\kappa ^-_{\nu }=c_{\nu } u \gamma \) and \(\theta _{\nu }=c_{\nu } \sqrt{u} \gamma \). Moreover, \(u: [0,T]\rightarrow L^1(\mathcal {T},\gamma )\) is absolutely continuous and a.e. differentiable, and thus for every regularized entropy function:
Note that the latter expression is non-positive since \(\phi _{{m}}'(z)(z-1)\) is non-negative, due to the convexity of \(\phi _m\) and \(\phi _m(1)=0\). Moreover, recall that the regularized entropies converge for every \(\nu \), are non-negative, and \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )<{\infty }\) by assumption. Therefore
It is clear that to obtain \(\mathcal {I}_{MF}=0\) it is sufficient to prove that for any \(\nu \) with \(\nu \ll \gamma \),
By non-negativity of the integrand both
and
Since \(\phi _{{m}}'(0)=-m\) this implies that in fact for all m
but since the former is finite after taking the limit \(m\rightarrow \infty \), we deduce that
and hence \(\gamma (\{u=0,c_{\nu }>0{\}})=0\). Moreover, by monotone convergence we have
Note by straightforward algebraic manipulation that
Therefore
Since all terms are non-negative we can separate terms and reduce the expression to
Here the equality follows from the fact that \(\gamma (\{u=0,c_{\nu }>0{\}})=0\) and hence
i.e. \(\mathcal {D}_{MF}^ -(\nu )=\mathcal {D}_{MF}(\nu )\), and
\(\square \)
3 Forward Kolmogorov equation
In the Introduction, we discussed how the BPDL model describes a measure-valued process \(\nu ^n_t\) in \(\Gamma \) involving particles being created and annihilated, with the corresponding forward Kolmogorov equation
where \(\textsf{P}_t \in \mathcal {P}(\Gamma )\) for all \(t\in [0,T]\) and \(Q_n^*\) is the dual of the infinitesimal generator \(Q_n\) with
for all \(F\in C_c(\Gamma )\). Throughout this section, the parameter \(n>0\) will be fixed.
In the case of \(\mathcal {T}={\mathbb {R}}^d\) it is shown in [18] that a measure-valued process with generator \(Q_n\) exists, and is in fact a jump process in \(\Gamma \) corresponding to the jump kernel \({{\bar{\kappa }}}_n\) shown below. However, for our general setting with \(\mathcal {T}\) a compact Polish space, we will take (\(\mathsf FKE_n\)) simply as a starting point, and do not consider the existence or convergence of the measure-valued process \(\nu _t^n\) itself—even though we will sometimes borrow the language of jump processes for illustration purposes.
In this section, we will state the general version of Theorem 1.6, by showing that a detailed balance condition holds, establishing a generalized gradient structure for the forward Kolmogorov equation, and characterizing the solutions as minimizers of corresponding EDP-functionals. Similar to Sect. 2 we first give an overview of the ingredients to state the main results and then leave the proofs for the existence of solutions and the variational characterization to Sects. 3.2 and 3.3.
Note that since
the operator \(Q_n\) is not bounded on \(\mathcal {B}_b(\Gamma )\). If it were, suitable solutions and a possible variational formulation would fall into the framework of [33], where triples \((V,\pi ,\kappa )\) are considered, with V a Polish space, \(\pi \) a finite measure, and \(\kappa (x,\textrm{d}y)\) a jump kernel satisfying a detailed balance condition with respect to \(\pi \) and the boundedness condition
They construct solutions to the forward Kolmogorov equation that are absolutely continuous with respect to \(\pi \) and characterize them as minimizers of a suitable EDP functional involving the net flux. In this section, we generalize part of this framework to unbounded kernels and so-called one-way or uni-directional fluxes and tailor it to our setting of interacting particle systems.
Namely, let the rescaled empirical measure mapping \(L_n:\coprod _{N\ge 1} \mathcal {T}^{N} \rightarrow \Gamma \) be given as
and let \(\Gamma _n\subset \Gamma \) be the space of finite positive discrete measures with common unit weight \(\tfrac{1}{n}\), i.e.
Note that the operators \(Q_n, Q_n^*\) can be represented as
where \({{\bar{\kappa }}}_n(\nu ,\cdot ) \in \mathcal {M}^+(\Gamma _n)\) for all \(\nu \in \Gamma _n\) is a jump kernel over \(\Gamma _n\) given by
Moreover, we consider Poisson measures \(\Pi _n\in \mathcal {P}(\Gamma _n)\) induced by the reference measure \(\gamma \). Namely, with the measure \(\pi _n \in \mathcal {P}(\coprod _{N\ge 1}\mathcal {T}^N)\) given by
we define
We will show in Lemma 3.12 that the measures \(\Pi _n\) are invariant measures of (\(\mathsf FKE_n\)) and that \({{\bar{\kappa }}}_{n}\) satisfies the detailed balance condition with respect to \(\Pi _n\), i.e. we have the symmetry
It is straightforward to check that even though \({{\bar{\kappa }}}_n\) is unbounded, we still have the weighted integrability condition
Therefore we can still bootstrap from gradient-flow solutions in the sense of [33] for regularized triples \((\Gamma _n,\Pi _n,{{\bar{\kappa }}}^{\varepsilon }_n)\), after passing from a net flux to a one-way flux formulation, see “Appendix A”, to obtain unique gradient-flow solutions as defined in Sect. 3.2.
To discuss the continuity equation and the dissipation potentials properly, we need to introduce some additional notation. We define the following creation and annihilation operators:
with the convention that \(\textsf{T}^{n,-}(\nu ,x)=(\nu ,x)\) if \(x\notin \textrm{supp}(\nu )\). Note that \(\textsf{T}^{n,-} \circ \textsf{T}^{n,+}=\textsf{Id}\) always holds, and \(\textsf{T}^{n,+} \circ \textsf{T}^{n,-} (\nu ,x)=(\nu ,x)\) whenever \(x\in \textrm{supp}(\nu )\).
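The two composition rules can be illustrated with a small sketch in which an element of \(\Gamma _n\) is stored as a multiset of traits, each particle carrying mass \(\tfrac{1}{n}\). The sketch assumes that \(\textsf{T}^{n,\pm }(\nu ,x)=(\nu \pm \tfrac{1}{n}\delta _x,x)\) on the support, together with the stated convention off the support; the `Counter` representation and the function names are hypothetical.

```python
from collections import Counter

def create(nu, x):
    """Assumed form of T^{n,+}: add one particle (mass 1/n) at trait x."""
    out = Counter(nu)
    out[x] += 1
    return out, x

def annihilate(nu, x):
    """Assumed form of T^{n,-}: remove one particle at x, or do nothing if x is not in the support."""
    out = Counter(nu)
    if out[x] > 0:
        out[x] -= 1
        if out[x] == 0:
            del out[x]          # keep only traits that carry at least one particle
    return out, x

def mass(nu, n):
    """Total mass nu(T) of the atomic measure with common weight 1/n."""
    return sum(nu.values()) / n

# Hypothetical configuration: two particles at trait 0.1 and one at trait 0.7.
nu = Counter({0.1: 2, 0.7: 1})
round_trip = annihilate(*create(nu, 0.3))      # T^{n,-} after T^{n,+}: always the identity
on_support = create(*annihilate(nu, 0.7))      # T^{n,+} after T^{n,-}: identity on the support
off_support = create(*annihilate(nu, 0.5))     # off the support a particle is added instead
```

The last composition shows why \(\textsf{T}^{n,+}\circ \textsf{T}^{n,-}\) is the identity only for \(x\in \textrm{supp}(\nu )\): off the support the annihilation does nothing and the creation then adds mass \(\tfrac{1}{n}\).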
We further define the discrete \(\Gamma _n\)-gradients \(\overline{\nabla }^{n,\pm }: C_c(\Gamma _n)\rightarrow C_c( \Gamma _n\times \mathcal {T})\):
and the corresponding \(\Gamma _n\)-divergence \(\overline{\text {div}}^{n,\pm }: \mathcal {M}_{loc}^+(\Gamma _n\times \mathcal {T})\rightarrow \mathcal {M}_{loc}(\Gamma _n)\), dual to \(\overline{\nabla }^{n,\pm }\), given by
where \(\textsf{p}^{\Gamma _n}:{\Gamma _n}\times \mathcal {T}\rightarrow \Gamma _n\) denotes the projection to the first variable.
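As an illustration of the discrete gradients, the sketch below assumes the difference-quotient form \(\overline{\nabla }^{n,\pm }F(\nu ,x)=F(\nu \pm \tfrac{1}{n}\delta _x)-F(\nu )\), with the same convention as for \(\textsf{T}^{n,-}\) when \(x\notin \textrm{supp}(\nu )\). The helper names `shift`, `grad` and the test function `mass_squared` are hypothetical. The check at the end is the antisymmetry \((\overline{\nabla }^{n,-}F)\circ \textsf{T}^{n,+}=-\overline{\nabla }^{n,+}F\), which is what allows a death flux to be rewritten as a negative birth flux.

```python
from collections import Counter
from fractions import Fraction

n = 4  # scaling parameter: every particle has mass 1/n (illustrative value)

def shift(nu, x, sign):
    """Return nu + sign * delta_x / n for a multiset-encoded measure nu.
    A removal at a point outside supp(nu) leaves nu unchanged."""
    out = Counter(nu)
    if sign < 0 and out.get(x, 0) == 0:
        return out
    out[x] += sign
    if out[x] == 0:
        del out[x]
    return out

def grad(F, nu, x, sign):
    """Assumed discrete gradient: F(nu + sign * delta_x / n) - F(nu)."""
    return F(shift(nu, x, sign)) - F(nu)

def mass_squared(nu):
    """Test function F(nu) = nu(T)^2, with nu(T) = (#particles) / n."""
    return Fraction(sum(nu.values()), n) ** 2

nu = Counter({"a": 2, "b": 1})        # total mass 3/4
up = grad(mass_squared, nu, "a", +1)  # (4/4)^2 - (3/4)^2 = 7/16

# Removing the particle that was just added undoes the increment.
down_after_up = grad(mass_squared, shift(nu, "a", +1), "a", -1)
assert up == Fraction(7, 16)
assert down_after_up == -up
```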
We consider the families of curves satisfying
in the following appropriate distributional sense.
Definition 3.1
(Continuity equation)
A triple \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) satisfies the continuity equation \(\textsf{CE}_n\), if
- (1) the curve \([0,T]\ni t\mapsto \textsf{P}_t\in \mathcal {P}(\Gamma _n)\) is narrowly continuous,
- (2) the Borel family \((\textsf{J}^{\pm }_t)_{t\in [0,T]}\in \mathcal {M}^+_{loc}(\Gamma _n\times \mathcal {T})\) satisfies
$$\begin{aligned} \textrm{supp}(\textsf{J}^-_t) \subseteq \left\{ (\nu ,x)\,:\, \nu (\mathcal {T})\ge \tfrac{2}{n}, \, x\in \textrm{supp}(\nu ) \right\} , \end{aligned}$$
- (3) \(\int _0^T \int _{\Gamma _n\times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\textsf{J}^{\pm }_{t} \, \textrm{d}t<\infty \),
- (4) for every \(s,t\in [0,T]\) and all \(F\in C_c(\Gamma _n)\)
$$\begin{aligned} \int _{\Gamma _n} F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma _n} F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma _n\times \mathcal {T}} \left( (\overline{\nabla }^{n,+} F) \,\textrm{d}\textsf{J}_r^++(\overline{\nabla }^{n,-} F) \, \textrm{d}\textsf{J}_r^{-} \right) \, \textrm{d}r. \end{aligned}\tag{3.11}$$
Throughout we will call arbitrary measures \(\textsf{J}^{\pm } \in \mathcal {M}^+_{loc}(\Gamma _n\times \mathcal {T})\) admissible if
and
Moreover, since \(\Gamma _n\) is a closed subspace of the Polish space \(\Gamma \), the extension of \(\textsf{P}\) to \(\mathcal {P}(\Gamma )\) and the extension of \(\textsf{J}^{\pm }\) to \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) are well-defined. For simplicity, we will simply refer to them as \(\textsf{P}\), \(\textsf{J}^{\pm }\) as well, and drop the n-dependence in most arguments.
It is also clear that for any admissible \(\textsf{J}^{\pm }\)
and in particular (3.11) is equivalent to
for all \(F\in C_c(\Gamma )\). Note that this can again be extended to all \(F\in \mathcal {B}_c(\Gamma )\) via a monotone class argument.
Remark 3.2
Condition (2) represents the restriction that particles can only be deleted if there are at least two particles in the system, consistent with the fact that \(\textsf{P}\in \mathcal {P}(\Gamma _n)\) and hence the underlying process never attains \(\nu =0\).
Moreover, condition (3) reflects the unboundedness of the observed fluxes \(\textsf{J}^{\pm }\), which stems from the unboundedness of the birth/death kernels \(\kappa ^{\pm }_{\nu }\) in \(\nu \).
Remark 3.3
Whenever \(\textsf{J}^{\pm }\) are of the form
with \(\lambda ^{\pm }[t,\nu ]\in \mathcal {M}^+(\mathcal {T})\) for all \(\nu \in \Gamma \) and \(t\in [0,T]\), the continuity equation (3.11) describes the forward Kolmogorov equation corresponding to an interacting birth/death process with the birth/death kernels \(\lambda ^{\pm }[t,\nu ]\) depending on both time and the empirical measure of the particles \(\nu \). The time-dependent jump kernel is then given by
To define the dissipation potentials, let us introduce the measures \(\vartheta _\textsf{P}^{\pm } \in \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})\)
Note that for any curve \((\textsf{P}_t)_{t\in [0,T]}\) the measures \(\textsf{J}_t^{\pm }:=\vartheta _{\textsf{P}_t}^{\pm }\) satisfy the conditions (2) and (3), where the former holds because \(c(x,x)=0\).
Moreover, as will be shown in Lemma 3.12, we have the following symmetry
from which the detailed balance condition (3.7) directly follows.
Definition 3.4
Let \(\Theta ^{n,\pm }_{\textsf{P}}\in \mathcal {M}_{loc}(\Gamma \times \mathcal {T})\) be the geometric average of \(\vartheta ^{\pm }_{\textsf{P}}\) and \(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\textsf{P}}\), i.e.
for any dominating measure \(\Sigma \).
The dissipation potential \(\mathcal {R}_n:\mathcal {P}(\Gamma )\times \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})^2\rightarrow [0,+\infty ]\) and dual dissipation potential \(\mathcal {R}^*_n:\mathcal {P}(\Gamma )\times \mathcal {B}_c(\Gamma \times \mathcal {T})^2\) are given by
For the free energy \(\mathcal {F}_n:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\) and Fisher information \(\mathcal {D}_n:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\)
For the EDP-functional \(\mathcal {I}_{n}:\textsf{CE}_n\rightarrow [0,+\infty ]\) for all curves with \(\mathcal {F}_{n}(\textsf{P}_0)<\infty \)
Remark 3.5
The definition of \(\Theta _{\textsf{P}}^{n,\pm }\) is independent of the dominating measure \(\Sigma \). Moreover, formally
i.e. it represents the geometric mean of the expected fluxes going forwards and backwards along the transition \(\nu \leftrightarrow \nu +\tfrac{1}{n}\delta _x\).
In addition, due to the symmetry (3.13) the measures \(\Theta _{\textsf{P}}^{n,\pm }\) simplify whenever \(\textsf{P}\ll \Pi _n\), i.e. if \(\textrm{d}\textsf{P}= U \textrm{d}\Pi _n\) we have
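The simplification can be traced in three steps. The following is a sketch under stated assumptions rather than the paper's own display: it assumes that \(\vartheta ^{\pm }_{\textsf{P}}(\textrm{d}\nu \,\textrm{d}x)=\textsf{P}(\textrm{d}\nu )\kappa ^{\pm }_{\nu }(\textrm{d}x)\), in line with the fluxes \(\textsf{P}_t\kappa ^{\pm }_{\nu }\) appearing in Theorem 3.8, and that the symmetry (3.13) reads \(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\Pi _n}=\vartheta ^{\pm }_{\Pi _n}\). The second line uses \(\textsf{T}^{n,\pm }\circ \textsf{T}^{n,\mp }=\textsf{Id}\) on the support of \(\vartheta ^{\mp }_{\Pi _n}\), and the third is the geometric average of the first two.

```latex
\begin{aligned}
\vartheta^{\pm}_{\textsf{P}}(\textrm{d}\nu\,\textrm{d}x)
  &= U(\nu)\,\vartheta^{\pm}_{\Pi_n}(\textrm{d}\nu\,\textrm{d}x),\\
\textsf{T}^{n,\mp}_{\#}\vartheta^{\mp}_{\textsf{P}}(\textrm{d}\nu\,\textrm{d}x)
  &= U(\nu\pm\tfrac{1}{n}\delta_x)\,
     \textsf{T}^{n,\mp}_{\#}\vartheta^{\mp}_{\Pi_n}(\textrm{d}\nu\,\textrm{d}x)
   = U(\nu\pm\tfrac{1}{n}\delta_x)\,\vartheta^{\pm}_{\Pi_n}(\textrm{d}\nu\,\textrm{d}x),\\
\Theta^{n,\pm}_{\textsf{P}}(\textrm{d}\nu\,\textrm{d}x)
  &= \sqrt{U(\nu)\,U(\nu\pm\tfrac{1}{n}\delta_x)}\;
     \vartheta^{\pm}_{\Pi_n}(\textrm{d}\nu\,\textrm{d}x).
\end{aligned}
```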
Remark 3.6
Note that \(\mathcal {D}_n\) is a jointly convex function in \((\vartheta ^{\pm }_{\textsf{P}},\textsf{T}_{\#}^{n,\mp } \vartheta _{\textsf{P}}^{\mp })\), and lower semicontinuous if \(\mathcal {F}_{n}\) is bounded. Moreover, it is straightforward to check that whenever \(\textsf{P}\ll \Pi _n\) with \(\textrm{d}\textsf{P}= U \textrm{d}\Pi _n\) it holds
Finally, for technical purposes, we also introduce a version for net fluxes.
Definition 3.7
The upward net flux \(\textsf{J}^{\textrm{net}}\) is defined as
Note that \(\textsf{J}^{\textrm{net}}(\nu ,x)\) can be interpreted as the net flux along the jump \(\nu \leftrightarrow \nu +\tfrac{1}{n}\delta _x\).
The continuity equation for the net flux reduces to
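The reduction rests on a change of variables in the death term. The following is a sketch, assuming the net flux \(\textsf{J}^{\textrm{net}}=\textsf{J}^+-\textsf{T}^{n,-}_{\#}\textsf{J}^-\) as in Lemma 3.13 and the difference-quotient form \(\overline{\nabla }^{n,\pm }F(\nu ,x)=F(\nu \pm \tfrac{1}{n}\delta _x)-F(\nu )\). Substituting \(\eta =\nu -\tfrac{1}{n}\delta _x\) is legitimate because \(x\in \textrm{supp}(\nu )\) on the support of \(\textsf{J}^-_r\) by condition (2), so that \(\nu =\eta +\tfrac{1}{n}\delta _x\).

```latex
\begin{aligned}
\int_{\Gamma_n\times\mathcal{T}} \overline{\nabla}^{n,-}F\,\textrm{d}\textsf{J}^-_r
 &= \int_{\Gamma_n\times\mathcal{T}}
    \bigl(F(\nu-\tfrac{1}{n}\delta_x)-F(\nu)\bigr)\,\textsf{J}^-_r(\textrm{d}\nu\,\textrm{d}x)\\
 &= \int_{\Gamma_n\times\mathcal{T}}
    \bigl(F(\eta)-F(\eta+\tfrac{1}{n}\delta_x)\bigr)\,
    (\textsf{T}^{n,-}_{\#}\textsf{J}^-_r)(\textrm{d}\eta\,\textrm{d}x)
  = -\int_{\Gamma_n\times\mathcal{T}} \overline{\nabla}^{n,+}F\,
    \textrm{d}(\textsf{T}^{n,-}_{\#}\textsf{J}^-_r),\\
\int_{\Gamma_n} F\,\textrm{d}\textsf{P}_t-\int_{\Gamma_n} F\,\textrm{d}\textsf{P}_s
 &= \int_s^t\int_{\Gamma_n\times\mathcal{T}}
    \overline{\nabla}^{n,+}F\,\textrm{d}\textsf{J}^{\textrm{net}}_r\,\textrm{d}r .
\end{aligned}
```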
We are now in a position to give the general version of Theorem 1.6.
Theorem 3.8
For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{n}\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) we have \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-)\ge 0\),
and there exists a unique gradient-flow solution, i.e. a curve \((\textsf{P})\) such that \(\mathcal {I}_{n}(\textsf{P},\textsf{P}_t \kappa _{\nu }^{+},\textsf{P}_t \kappa _{\nu }^{-})=0\).
Moreover, whenever \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-) < \infty \), the chain rule for \(\mathcal {F}_{n}\) and the net flux holds: \(\mathcal {F}_{n}(\textsf{P}_t)\) is absolutely continuous and
The proof of Theorem 3.8 is postponed to Sect. 3.3 and follows from the existence of a gradient-flow solution via EDP-convergence of a sequence of regularized problems established in Sect. 3.2, and its uniqueness via a convexity argument.
Remark 3.9
Similar to the mean-field case, the non-negativity of \(\mathcal {I}_{n}\) and the identification of solutions to (\(\mathsf FKE_n\)) as null-minimizers of \(\mathcal {I}_n\) is related to the formal equivalence
where \(\mathcal {L}_n\) is the so-called Lagrangian given by
We discuss the implication of this relation in “Appendix A”.
Remark 3.10
(Net flux) To show the existence of gradient-flow solutions in the sense of null-minimizers of \(\mathcal {I}_n\) we will have to pass from gradient-flow solutions in the sense of [33], see Theorem 3.20. The expressions for net fluxes are in fact contractions of those for one-way or uni-directional fluxes, as discussed in “Appendix A”, which we use to show that the two notions of gradient-flow solutions are equivalent.
3.1 A priori estimates
Below we will state the estimates and identities necessary to prove the chain rule and establish the existence of solutions.
Recall that \(\vartheta _{\textsf{P}}^{\pm }\) satisfies the same restrictions (Conditions (2) and (3)) as the fluxes \(\textsf{J}^{\pm }\). This is easily verified, but since we will use it repeatedly let us state it here precisely.
Lemma 3.11
For any \(\textsf{P}\in \mathcal {P}(\Gamma _n)\)
In particular, for any \(\omega \in C_c(\Gamma \times \mathcal {T})\)
and
Finally,
The above identities allow us to prove the symmetry condition that implies the detailed balance condition (3.7).
Lemma 3.12
(Detailed balance)
Proof
Fix an arbitrary \(\omega \in C_c(\Gamma \times \mathcal {T})\), and for any ordered collection of N variables in \(\mathcal {T}\) set \({\textbf{x}}^{N}:=(x_1,\dots ,x_N) \in {\mathcal {T}}^N\). We then have the following.
Since \(\kappa ^-[\tfrac{1}{n}\delta _{y}]=0\) for any \(y\in \mathcal {T}\), the sum in the right-hand side of the last expression starts from \(N=2\), thus reducing the expression to
It is clear that, for our desired equality, it is enough to show that for every N,
To do so, note that since \(c(x,x)=0\),
Hence, by symmetry of \(\gamma ^{\otimes (N+1)}\), we obtain
as desired. \(\square \)
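The mechanism behind the lemma can be seen in a one-dimensional caricature. The following is a minimal sketch with hypothetical rates that are not the paper's kernels: the state is only the particle number \(k\ge 1\), each particle gives birth at rate `m`, and each particle is killed by each of the other \(k-1\) particles at rate `c / n`. A lone particle therefore cannot die, mirroring \(\kappa ^-[\tfrac{1}{n}\delta _{y}]=0\). For these rates the Poisson weights with intensity `m * n / c`, restricted to \(k\ge 1\), satisfy detailed balance.

```python
import math

# Hypothetical spatially homogeneous caricature (NOT the paper's kernels).
m, c, n = 1.5, 0.5, 10
K = 60  # truncation level for the numerical check

def birth(k):
    """Total birth rate with k particles."""
    return m * k

def death(k):
    """Total death rate with k particles; vanishes for k = 1."""
    return c * k * (k - 1) / n

# Candidate invariant weights: Poisson with intensity lam = m * n / c,
# restricted to k >= 1 (normalisation is irrelevant for detailed balance).
lam = m * n / c

def log_weight(k):
    return k * math.log(lam) - math.lgamma(k + 1)

# Detailed balance: pi(k) * birth(k) == pi(k + 1) * death(k + 1) for all k.
for k in range(1, K):
    forward = math.exp(log_weight(k)) * birth(k)
    backward = math.exp(log_weight(k + 1)) * death(k + 1)
    assert math.isclose(forward, backward, rel_tol=1e-9)
```

In the lemma the same cancellation is carried out configuration by configuration, with the symmetry of \(\gamma ^{\otimes (N+1)}\) playing the role of the factorial in the Poisson weights.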
Recall from Lemma 2.10 that
where \(M:=(1+\gamma (\mathcal {T}))\Vert c\Vert _{\infty }\). Now let
and the jointly convex and lower semicontinuous function \(\Upsilon :{\mathbb {R}}_{\ge 0}^3\rightarrow [0,+\infty ]\) given by
We then have the following result.
Lemma 3.13
The following statements hold:
- (i) For all \(\textsf{P}\)
$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}))^{-2}\,\Theta _{\textsf{P}}^{n,\pm }(\textrm{d}\nu \,\textrm{d}y)\le \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\Theta _{\textsf{P}}^{n,\pm }(\textrm{d}\nu \,\textrm{d}y) \le M_n. \end{aligned}$$
- (ii) For any \(\textsf{P}\), admissible \(\textsf{J}^{\pm }\), and net flux \(\textsf{J}^{\textrm{net}}=\textsf{J}^+-\textsf{T}^{n,-}_{\#}\textsf{J}^-\), \(\omega \in \mathcal {B}(\Gamma \times \mathcal {T})\), we have
$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} |\omega |\,\textrm{d}|\textsf{J}^{\textrm{net}}| \, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)+\int _{\Gamma \times \mathcal {T}} \Psi ^*(\omega ) \, \textrm{d}\Theta _{\textsf{P}}^{n,+}. \end{aligned}$$
Moreover,
$$\begin{aligned} \phi \left( 1 \vee \frac{1}{M_n} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textsf{J}^{\pm }(\textrm{d}\nu ,\textrm{d}x) \right) M_n\, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-), \end{aligned}\tag{3.16a}$$
$$\begin{aligned} \Psi \left( \frac{1}{M_n} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}))^{-1}\,|\textsf{J}^{\textrm{net}}|(\textrm{d}\nu ,\textrm{d}x) \right) M_n\, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-). \end{aligned}\tag{3.16b}$$
- (iii) For all admissible \(\textsf{P},\textsf{J}^{\pm }\),
$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{\pm }|\Theta ^{n,\pm })=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^\pm }{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^\pm }{\textrm{d}\Sigma },\frac{\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^\mp )}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma , \end{aligned}\tag{3.17}$$
for any common dominating measure \(\Sigma \). Moreover, if \(\textrm{d}\textsf{P}=U \textrm{d}\Pi _n\),
$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{\pm }|\Theta ^{n,\pm })=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^{\pm }}{\textrm{d}\vartheta _{\textsf{P}}^{\pm }},U(\nu ),U(\nu \pm \tfrac{1}{n}\delta _x)\right) \textrm{d}\vartheta ^{\pm }_{\textsf{P}}. \end{aligned}$$
Remark 3.14
Since \(M_n\le 3\,M\) for all \(n\ge 1\) the estimates (3.16) are uniform in n, which we will use in the EDP-convergence to establish tightness of sequences \(\textsf{J}^{n,\pm }\) under a bound on \(\mathcal {I}_n\). Moreover, the representation (3.17) is used to deduce the lower-semicontinuity of \(\mathcal {I}_n\) for sequences of curves.
Proof
(i) For any \(x^*\in \mathcal {T}\), \(\nu \in \Gamma \), we have
due to the inequality
In particular,
and hence the desired statement follows after applying Jensen’s inequality.
(ii) By duality we have for any \(\omega \in \mathcal {B}_c(\Gamma \times \mathcal {T})\),
Substituting \(\omega ^+=\omega \), \( \omega ^-=-\omega \circ \textsf{T}^{n,-}\) and using the fact that \(\textsf{T}^{n,-}_{\#}\Theta ^{n,-}_{\textsf{P}}=\Theta ^{n,+}_{\textsf{P}}\) we derive
Since \(\Psi ^*\) is even we can replace \(\omega \) and \(\textsf{J}\) by their absolute values in the inequality, after substituting for \(\omega \) appropriately, and we conclude with a monotone convergence argument. The inequalities (3.16a) and (3.16b) now follow as in Lemma 2.10, via Jensen’s inequality and a dual approach respectively.
(iii) Let us only consider \(\textsf{J}^+\), \(\Theta ^{n,+}\) (the case for \(\textsf{J}^-\), \(\Theta ^{n,-}\) is similar). Suppose \(\mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{n,+})<\infty \) and recall that
where \(\Sigma \) is a dominating measure, e.g. \(\Sigma =\vartheta ^{+}_{\textsf{P}}+\textsf{T}^{n,-}_{\#}\vartheta ^{-}_{\textsf{P}}\). Then \(\textsf{J}^+\ll \Theta _{\textsf{P}}^{n,+} \ll \Sigma \), and it follows that \(\textsf{J}^+\)-a.e. \(\textrm{d}\vartheta _{\textsf{P}}^{+}/\textrm{d}\Sigma \), \(\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^{-})/\textrm{d}\Sigma >0\), from which one can easily verify (3.17).
Vice versa, suppose that
for some dominating measure \(\Sigma \). Then again \(\textsf{J}^+\)-a.e. we have that \(\textrm{d}\vartheta _{\textsf{P}}^{+}/\textrm{d}\Sigma \), \(\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^{-})/\textrm{d}\Sigma >0\), and by super-linearity of \(\phi \) we deduce that in fact \(\textsf{J}^+\ll \tilde{\Sigma }\) for any dominating measure \(\tilde{\Sigma }\) of \(\vartheta _{\textsf{P}}^+\) and \(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^-\). Together these imply \(\textsf{J}^+\ll \Theta _{\textsf{P}}^{n,+}\), and the result follows as above. \(\square \)
Finally, we discuss the time-regularity of \(\textsf{P}_t\) for admissible curves and state the analog of Lemma 2.12. Let the weighted total variation metric \(d_{TV,w}\) be given as
Note that \(d_{TV,w}\) is lower semicontinuous with respect to the narrow topology, and while convergence in \(d_{TV,w}\) does not directly imply narrow convergence, it does so on narrowly pre-compact sets.
Lemma 3.15
For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\) we have for all \(s,t\in [0,T]\):
Suppose in addition that \(\textsf{P}_t\ll \Pi _n, \textsf{J}^{\pm }_t\ll \vartheta _{\Pi _n}^{\pm }\) for all \(t\in [0,T]\) and set
Then there exists an absolutely continuous and a.e. differentiable map \(U:[0,T]\rightarrow L^1(\mathcal {P}(\Gamma ),\ell )\) and maps \(G^\pm :[0,T]\rightarrow L^1(\Sigma ^{\pm })\) such that \(U_t=\textrm{d}\textsf{P}_t/\textrm{d}\Pi _n\), \(G_t^{\pm }=\textrm{d}\textsf{J}^{\pm }_t/\textrm{d}\vartheta _{\Pi _n}^{\pm }\), and
Alternatively, in terms of the net flux \(\textsf{J}^{\textrm{net}}=G^{\textrm{net}} \vartheta _{\Pi _n}^+\) with \(G^\textrm{net}:=G^+-G^-\circ \textsf{T}^{n,+}\),
Remark 3.16
Note that the estimate (3.19) for the weighted total variation metric blows up as \(n\rightarrow \infty \). For the proof of EDP-convergence we instead use a weaker metric, the transportation-like metric W defined by (4.4), which behaves uniformly in n for a sequence of curves with finite \(\limsup _{n\rightarrow \infty } \mathcal {I}_n\).
Proof
Due to the continuity equation and after a monotone class argument, we have the crude estimate
for any \(F\in \mathcal {B}_c(\Gamma )\). Now fix \(F\in \mathcal {B}_c(\Gamma )\), and let \(K:=\sup _{\nu \in \Gamma } F(\nu )(1+\nu (\mathcal {T})^2)\). Note that by the bounds of Lemma 3.13 for any \(\nu \in \Gamma _n\), we have the following estimates
and therefore
Taking the supremum over all \(F\in \mathcal {B}_c(\Gamma )\) with \(\sup _{\nu \in \Gamma } F(\nu )(1+\nu (\mathcal {T})^2)\le 1\) we conclude that
Next, suppose that \(\textsf{P}_t\ll \Pi _n, \textsf{J}^{\pm }_t\ll \vartheta _{\Pi _n}^{\pm }\) for all \(t\in [0,T]\). Let \(U_t=\textrm{d}\textsf{P}_t/\textrm{d}\Pi _n\), \(G_t^{\pm }=\textrm{d}\textsf{J}^{\pm }/\textrm{d}\vartheta _{\Pi _n}^{\pm }\). Note that by the absolute continuity of \(\textsf{P}_t\) with respect to \(d_{TV,w}\), the map \(t\mapsto U_t\) is absolutely continuous in \(L^1(\ell )\). Moreover, for every \(F\in \mathcal {B}_c(\Gamma )\) the continuity equation reads as
But due to Lemma 3.12, the integrands can be rewritten as follows
and therefore
which is the weak formulation of (3.20). Putting in the pre-factors \((1+\nu (\mathcal {T})^2)^{-1}\) to state the expression in terms of the finite measures \(\ell \) and \(\Sigma \), and noting that due to time-regularity \((1+\nu (\mathcal {T})^2)^{-1} \textsf{P}_t\) is TV-regular, we can proceed as in Corollary 4.14 of [33] and conclude the proof after redefining \(U,G^{\pm }\) on negligible sets. \(\square \)
3.2 Weak solutions
In this section, we will discuss the existence of weak solutions to (\(\mathsf FKE_n\)), i.e. solutions to
in appropriate weak form, but with the property that \(\mathcal {I}_n(\textsf{P},\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-)\le 0\). In the next section, we will show that \(\mathcal {I}_n\ge 0\) and that gradient-flow solutions, i.e. those with \(\mathcal {I}_n=0\), are unique.
Definition 3.17
A curve \((\textsf{P}_t)_{t\in [0,T]}\) is a weak solution to (\(\mathsf FKE_n\)) if \(\textrm{supp} \,\textsf{P}_t\subseteq \Gamma _n\) for all \(t\in [0,T]\), \(\textsf{P}_t\) is continuous in the narrow topology, and for all \(s,t\in [0,T]\) and all \(F\in C_c(\Gamma )\),
Remark 3.18
Recall that \(\int (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\vartheta _{\textsf{P}_t}^{\pm }\le M_n\) independently of \(\textsf{P}_t\). Hence it is easy to check that \((\textsf{P})\) is a weak solution if and only if \((\textsf{P},\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-)\in \textsf{CE}_{n}\).
Moreover, under some additional assumptions, solutions turn out to inherit polynomial mass-estimates from the initial datum, see e.g. Theorem 3.1 of [18] for the case in \({\mathbb {R}}^d\). While throughout this article we do not assume more from the initial data than having finite relative entropy with respect to \(\Pi _n\) (which does imply the finiteness of the first moment), we provide the higher-moment estimates here for completeness.
Lemma 3.19
Fix any \(p\ge 0\), and assume that \((\textsf{P})\) is a weak solution with initial datum satisfying
and such that for any \(F\in \mathcal {B}_b(\Gamma )\), \(s,t\in [0,T]\) we have the inequality
Then
The condition (3.21) is necessary to show the propagation of mass-moments, but can itself be shown to hold if the first moment is uniformly bounded in time (and in particular if the relative entropy with respect to \(\Pi _n\) is uniformly bounded in time), using the compactly supported multipliers \(\chi _m\) of Sect. 3.3.
Proof
Set \(F(\nu ):=f(\nu (\mathcal {T}))\) with \(f(z):=z^p\) and let \(f_k(z)=\min \{z,k\}^p\) be its sequence of truncations. Setting \(F_k(\nu ):=f_k(\nu (\mathcal {T}))\), we have for every \(k\ge 1\),
where we used the fact that \(f_k\) is non-decreasing, thus implying \(\overline{\nabla }^{n,-} F_k\le 0\). Recalling that
and using that \(z(f_k(z+\tfrac{1}{n})-f_k(z))\le C_{p,n}(1+f_k(z))\) for a suitable constant \(C_{p,n}\) independent of k, we can apply a standard Gronwall argument to obtain
Taking \(k\rightarrow \infty \) we derive the desired inequality by monotone convergence. \(\square \)
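The key point in the Gronwall step is that the constant \(C_{p,n}\) does not depend on the truncation level k. The following is a numerical sanity check on a finite grid, not a proof, and the values of `p`, `n`, and the grid are arbitrary illustrative choices. It compares the ratio \(z(f_k(z+\tfrac{1}{n})-f_k(z))/(1+f_k(z))\) for several k with the untruncated case \(f(z)=z^p\).

```python
import math

def f_k(z, k, p):
    """Truncated power f_k(z) = min(z, k)^p; k = math.inf gives z^p."""
    return min(z, k) ** p

def ratio(z, k, p, n):
    """z * (f_k(z + 1/n) - f_k(z)) / (1 + f_k(z)), the quantity to bound."""
    return z * (f_k(z + 1.0 / n, k, p) - f_k(z, k, p)) / (1.0 + f_k(z, k, p))

p, n = 3.0, 5                                 # illustrative exponent and scaling
grid = [i / 100.0 for i in range(0, 50001)]   # z in [0, 500]

def sup_ratio(k):
    """Largest value of the ratio over the grid for truncation level k."""
    return max(ratio(z, k, p, n) for z in grid)

untruncated = sup_ratio(math.inf)
truncated = {k: sup_ratio(k) for k in (1, 10, 100)}

# On the grid, truncation never increases the ratio, so a constant that
# works for z^p also works for every f_k.
assert all(v <= untruncated + 1e-12 for v in truncated.values())
print(untruncated, truncated)
```

The reason is visible pointwise: for \(z\le k\) the denominator is unchanged while the increment can only shrink, and for \(z>k\) the increment vanishes.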
We can now state the existence result of a weak solution satisfying one-half of the Energy-Dissipation principle, which is complemented by the chain rule proved in Sect. 3.3. The existence proof is one of EDP-convergence (see also Sect. 5), bootstrapping from problems with bounded kernels and the results of [33].
Theorem 3.20
Suppose that
Then there exists a weak solution \((\textsf{P})\) with initial datum \({\bar{\textsf{P}}}\) such that
Proof
Fix any \({\bar{\textsf{P}}}\) with \(\mathcal {E}\textrm{nt}({\bar{\textsf{P}}}|\Pi _n)<\infty \). We proceed by approximating the unbounded kernel \({{\bar{\kappa }}}_n\) with bounded ones. For every \(\varepsilon >0\), we introduce the regularized jump kernel \({{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta )\) over \(\Gamma \) defined by
In terms of birth/death kernels, this can be rewritten as
where
Note that
Correspondingly, we denote \(\vartheta _{\textsf{P}}^{\pm ,\varepsilon }\), \(\Theta ^{n,\pm ,\varepsilon }_{\textsf{P}}\), \(Q_{n,\varepsilon }\), \(Q_{n,\varepsilon }^*\), \(\mathcal {R}_{n,\varepsilon }\), \(\mathcal {D}_{n,\varepsilon }\), \(\mathcal {I}_{n,\varepsilon }\), \((\textsf{FKE}_{n,\varepsilon })\) as the relevant quantities, operators, functionals and forward Kolmogorov equations induced by \(\kappa ^{\pm ,\varepsilon }_{\nu }\). We will first show the existence of gradient-flow solutions for the regularized problems, i.e. curves such that \(\mathcal {I}_{n,\varepsilon }=0\), and then construct an appropriate limit curve as \(\varepsilon \rightarrow 0\).
Thus, fix any \(\varepsilon >0\). Due to the bound (3.22) it is clear that \(Q_{n,\varepsilon }\) is a bounded operator since
Moreover, since the prefactor \(\nu (\mathcal {T})\eta (\mathcal {T})\) is symmetric under swapping of \(\nu \) and \(\eta \), it is straightforward to verify that \({{\bar{\kappa }}}_{n}^{\varepsilon }\) is still reversible with respect to the same invariant measure \(\Pi _n\), i.e. we have
The triple \((\Gamma ,\Pi _n,{{\bar{\kappa }}}_{n}^{\varepsilon })\) therefore satisfies the assumptions of [33]. Keeping in mind the difference in definitions of \(\Psi ^*\) due to the extra factor 2, by [33, Theorem 6.6] there exists a unique curve \(U^{\varepsilon }\in C^1([0,T],L^1(\Gamma ,\Pi _n))\) such that \(U_0=\textrm{d}{\bar{\textsf{P}}}/\textrm{d}\Pi _n\), and
with \(\textsf{P}_t:=U_t \Pi _n\) as usual. In particular the entropy \(\mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n)\) decreases along the solution and hence
By evenness of \(\Psi \), symmetry of \(\Pi _n {{\bar{\kappa }}}_n^{\varepsilon }\) and the identity (A.2), we can express for any U after substituting for \({{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta )\)
Moreover, it is straightforward to check that
and therefore with \(J^{\pm }:=\vartheta _{\textsf{P}}^{\pm ,\varepsilon }\) we conclude
Finally, note that by Lemma 3.13 and Remark 3.6
for any dominating measure \(\Sigma \), which are both non-negative, convex, and vaguely lower-semicontinuous functionals of \(\vartheta _{\textsf{P}}^{\pm ,\varepsilon },\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^{\mp ,\varepsilon }\) in \(\mathcal {M}_{loc}(\Gamma \times \mathcal {T})\), see [6, Theorem 3.4.3].
Next, we consider the sequence of pairs \((\textsf{P}^{\varepsilon },\textsf{J}^{\pm ,\varepsilon })\) stemming from the regularized problems above, satisfying
As for a priori estimates, we have
and
From the latter, it can be shown similarly as in Lemma 3.15 that we have the equicontinuity result
Here \(d_{TV,w}\) is the weighted total variation-metric defined in (3.18) as
Recall that \(d_{TV,w}\) is lower semicontinuous with respect to the narrow topology and convergence in \(d_{TV,w}\) implies narrow convergence on narrowly pre-compact sets. Since \(\mathcal {E}\textrm{nt}(\textsf{P}^{\varepsilon }_t|\Pi _n)\) is bounded uniformly in \(\varepsilon \) and t and \(\mathcal {E}\textrm{nt}(\cdot |\Pi _n)\) is narrowly coercive we obtain by a standard Arzelà–Ascoli argument, up to choosing a subsequence, the existence of a curve \(t\mapsto \textsf{P}_t\) such that
Note that by the estimate (3.23) and lower-semicontinuity of the entropy, we have that for every \(t\in [0,T]\), the sequence of measures \(\textsf{P}_t^{\varepsilon }\) converges setwise to \(\textsf{P}_t\) and \(\mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n)\le \mathcal {E}\textrm{nt}({{\bar{\textsf{P}}}}|\Pi _n)<\infty \). Moreover, \(\kappa ^{\pm ,\varepsilon }_{\nu } \nearrow \kappa ^{\pm }_{\nu }\) as \(\varepsilon \rightarrow 0\) for every \(\nu \), and hence setwise convergence of \(\textsf{P}^{\varepsilon }_t\) implies setwise convergence on pre-compact sets of \(\Gamma \times \mathcal {T}\) for
see e.g. [33, Lemma 2.4] for the case of set-wise convergence for bounded jump kernels. In particular, we have the vague convergence
It is straightforward to check that we can pass to the limit in the continuity equation (3.11), and in particular, derive that \(\textsf{P}\) is a weak solution to the unregularized problem.
Finally, recall that \(\mathcal {F}_n(\textsf{P}_T^{{\varepsilon }})\) is convex and narrowly lower semicontinuous in \(\textsf{P}^{\varepsilon }_T\), and as shown above the action \(\mathcal {R}^{\varepsilon }_n\) is jointly convex and lower semicontinuous in \((\vartheta ^{\pm ,\varepsilon }_{\textsf{P}^{\varepsilon }},\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp ,\varepsilon }_{\textsf{P}^{\varepsilon }})\). Proceeding as in Remark 3.6, we also find that the Fisher information is jointly convex and lower semicontinuous in \((\vartheta ^{\pm ,\varepsilon }_{\textsf{P}^{\varepsilon }},\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp ,\varepsilon }_{\textsf{P}^{\varepsilon }})\) if \(\textsf{P}^{\varepsilon }\) are contained in sub-level sets of \(\mathcal {F}_n\). Therefore, we conclude that
thus establishing the claim. \(\square \)
3.3 Variational characterization
We will now present the chain rule for the entropy. The strategy of the proof is similar to the mean-field case and the proof for jump processes of [33], with the difference that due to the unboundedness of \({{\bar{\kappa }}}\), we need a two-fold regularization of the entropy, namely via truncations and compactly supported multipliers.
Theorem 3.21
For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{n}\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-) < \infty \), it holds that \(t \mapsto \mathcal {F}_{n}(\textsf{P}_t)\) is absolutely continuous and
Moreover, \(\mathcal {I}_n\ge 0\), and if \(\mathcal {I}_n=0\) we have
Proof
For any curve \(\textsf{P}\) with \(\textsf{P}_t\ll \Pi _n\) for all \(t\in [0,T]\) we will use
where \(\phi _{k,m}(U,\nu )=\chi _k(\nu )\phi _m(U)\), \(k,m\in {\mathbb {N}}\) with \(\phi _m\) the previously defined regularized entropy functions, and \(\chi _k:=f_k(\nu (\mathcal {T})) \in C_c(\Gamma )\) compactly supported multipliers defined via
Note that \(|f_k|\le 1\), \(|f_k'(z)|z\le 2\) uniformly in k, \(f_k\) converges monotonically to 1, and \(|\overline{\nabla }^{n,+} \chi _k|(\nu ,x)\le 3/(1+\nu (\mathcal {T}))\) if \(k\ge 1\). In addition, recall that \(\phi _m'\) converges pointwise to \(\phi '\) and \(|\phi '_m|,\phi _m\) converge monotonically to \(|\phi '|,\phi \) respectively, and in particular,
Moreover, let the distributional derivatives with respect to \(\textsf{P}\) be defined as
Note that pointwise \(\lim _{k\rightarrow \infty } \overline{\nabla }^{n,\pm } DS^{k,m}_t=\overline{\nabla }^{n,\pm } DS^{m}_t\) and \(\lim _{m\rightarrow \infty } \overline{\nabla }^{n,\pm } DS^{m}_t = \overline{\nabla }^{n,\pm } \phi '(U_t)\).
Now, consider a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_n<\infty \). Since \(\mathcal {E}\textrm{nt}\) is bounded from below
and therefore \(\textsf{P}_t\ll \Pi _n\), \(\textsf{J}^{\pm }_t\ll \Theta ^{n,\pm }_{\textsf{P}_t} \ll \vartheta ^{\pm }_{\Pi _n}\) for a.e. \(t\in [0,T]\), with
In particular \(U_t(\nu )\), \(U_t(\nu \pm \tfrac{1}{n}\delta _x)>0\) for \(\textsf{J}^{\pm }_t, \Theta ^{n,\pm }_{\textsf{P}_t}\)-a.e. \(\nu ,x\).
Moreover, set \(\textsf{J}^{\pm }_t=G_t^{\pm } \vartheta _{\Pi _{{n}}}^{\pm }\), \(\textsf{J}^{\textrm{net}}_t=G^{\textrm{net}}_t \vartheta _{\Pi _{{n}}}^+\) (or \(G^{\textrm{net}}_t:=G_t^+-G_t^-\circ \textsf{T}^{n,+})\), and
By Lemma 3.15, the map \(t\mapsto U_t\) is absolutely continuous and a.e. differentiable in \(L^1(\mathcal {P}(\Gamma ),\ell )\) with
or in terms of the net flux,
Therefore, since \((1+\nu (\mathcal {T})^2)\) is bounded from above and below on the support of \(\chi _k\), it is clear that for every k, m the maps \(t\mapsto S_t^{k,m}\) are Lipschitz, hence absolutely continuous, and for a.e. \(t\in [0,T]\)
and in particular, for all \(s,t\in [0,T]\),
Recall that the following convergences hold pointwise:
Moreover, the following estimate holds for every \((\nu ,x)\):
where the final inequality follows from the truncation inequality for discrete derivatives, i.e. \(|\phi _m(\eta )-\phi _m(\nu )|\le |\phi (\eta )-\phi (\nu )|\). Note that by Lemma 3.13, for any \(\textsf{P},\textsf{J}^{\pm }\) with finite \(\mathcal {R}_n\),
and moreover
with
Therefore, since \(\mathcal {E}\textrm{nt}(\textsf{P}_0|\Pi _n)<\infty \) we find by a dominated convergence argument and taking subsequent limits in k and m in (3.24) that \(\mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n)<\infty \) for all \(t\in [0,T]\),
and
Next, assume that \(\mathcal {I}_n=0\). Then the above arguments imply that for a.e. \(t\in [0,T]\),
To simplify manipulations, let \(U^{\pm }(\nu ,x):=U\circ \textsf{T}_x^{n,\pm }=U(\nu \pm \tfrac{1}{n}\delta _x)\). Note that for the actions,
for the modified Fisher information \(\mathcal {D}_n^-\),
and finally
which due to \(\textsf{J}^{\pm }\ll \Theta ^{n,\pm }_{\textsf{P}}\) is equal to
Therefore, after some cumbersome rewriting, the integrands of the left-hand side of (3.25) read as the indicator functions over \(\{U,U^+>0\}\) multiplied by the terms
since
By duality of \(\phi ,\phi ^*\) we have \(G^+=U\) and \(G^{-}\circ \textsf{T}^{n,+}=U^+\), hence \(G^-=U\) as well. Subsequently we can conclude that \(\mathcal {I}_n=0\) if and only if \(\textsf{J}^{\pm }_t=\vartheta _{\textsf{P}_t}^{\pm }\) for a.e. \(t\in [0,T]\) and a.e. \(\nu ,x\). \(\square \)
Together, Theorems 3.21 and 3.20 provide a proof of the variational characterization for the forward Kolmogorov equation.
Proof of Theorem 3.8
Under the assumption of \(\mathcal {F}_n(\textsf{P}_0)<\infty \) we have by Theorem 3.21 a chain rule for the entropy, the inequality \(\mathcal {I}_n\ge 0\), and the statement that \(\mathcal {I}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0\) implies that \(\textsf{P}\) is a weak solution. Moreover, due to Theorem 3.20 there exists a weak solution with \(\mathcal {I}_n\le 0\).
It remains to show that gradient-flow solutions are unique, which is a classical argument using the strict convexity of \(\mathcal {F}_n\), e.g. see Theorem 5.9 of [33]. Suppose that there exist two curves \(\textsf{P}^1,\textsf{P}^2\) such that \(\textsf{P}^1_0=\textsf{P}^2_0={{\bar{\textsf{P}}}}\) and \(\mathcal {I}_n(\textsf{P}^1,\vartheta _{\textsf{P}^1}^+,\vartheta _{\textsf{P}^1}^-)=\mathcal {I}_n( \textsf{P}^2,\vartheta _{\textsf{P}^2}^+,\vartheta _{\textsf{P}^2}^-)=0\). Applying the chain rule it is straightforward to verify that for a gradient-flow solution \(\mathcal {I}_{n}^t=0\) for every \(t \in [0,T]\), where
and that \(\mathcal {I}_n^t\ge 0\) for arbitrary curves with initial condition \({{\bar{\textsf{P}}}}\).
Now, define \({\tilde{\textsf{P}}}_t=\tfrac{1}{2}\textsf{P}^1_t+\tfrac{1}{2}\textsf{P}^2_t\) and note that \(({\tilde{\textsf{P}}},\vartheta _{\tilde{\textsf{P}}}^+,\vartheta _{{\tilde{\textsf{P}}}}^-)\in \textsf{CE}_n\) as well, and
Fix any \(t \in [0,T]\) and suppose that \(\textsf{P}_t^1\ne \textsf{P}_t^{2}\). Then by convexity of \(\mathcal {R}_n\) and \(\mathcal {D}_n\), and strict convexity of \(\mathcal {F}_n\), we have
which leads to a contradiction, and hence \(\textsf{P}_t^1= \textsf{P}_t^{2}\) for all \(t \in [0,T]\). \(\square \)
4 Liouville equation and lifted dynamics
In this section, we will consider the variational formulation for our proposed limit of the forward Kolmogorov equation (\(\mathsf FKE_n\)), namely the Liouville equation
It can be interpreted as a transport equation lifted from the mean-field dynamics, in the sense that it describes the evolution of the law of a deterministic process satisfying the mean-field equation but with possibly random initial conditions. We will consider the same ingredients as in previous sections, namely a non-negative EDP functional consisting of an action term, a difference of free energies, and a corresponding Fisher information term. The main technical tool that we use is a new superposition principle, which allows us to prove the chain rule via the results on mean-field curves of Sect. 2.
Solutions to (Li) are defined as appropriate weak solutions to
where \(\textsf{P}_t \in \mathcal {P}(\Gamma )\) for all \(t\in [0,T]\) and the operator \(Q_{\infty }^*\) is the dual of \(Q_{\infty }\) given by
for all \(F\in \textrm{Cyl}_c(\Gamma )\). Here \(\textrm{Cyl}_c(\Gamma )\) is the space of all compactly supported smooth cylinder functions, i.e. those of the form
where \(g\in C^{\infty }_c({\mathbb {R}}^{m})\) with \(m \in {\mathbb {N}}\), and \(f_1,\dots ,f_m\in C_b(\mathcal {T})\), and \(\textrm{grad}_{\Gamma }\) is the distributional gradient defined by
To be precise, we consider the following type of solutions.
Definition 4.1
A curve \((\textsf{P}_t)_{t\in [0,T]}\) is a weak solution to (\(\mathsf Li\)) if \(\textsf{P}_t\) is continuous in the narrow topology and for all \(s,t\in [0,T]\), and all \(F\in \textrm{Cyl}_c(\Gamma )\),
Remark 4.2
Note that (Li) is the transport equation associated with the measure-valued vector field \(V[\nu ]\). Now let the flow \(G:[0,T]\times \Gamma \rightarrow \Gamma \) be the unique strong solution to the mean-field equation, i.e. with
As will be shown in Sect. 4.2, \(\textsf{P}_t:=(G_t)_{\#} {{\bar{\textsf{P}}}}\) is a weak solution to (\(\mathsf Li\)) for any initial data \({{\bar{\textsf{P}}}}\in \mathcal {P}(\Gamma )\). In particular, if \(\nu _t\) is a solution to (\(\mathsf MF\)), then \(\textsf{P}_t:=\delta _{\nu _t}\) is a weak solution to (Li).
Instead of the solution to (Li), we will now consider arbitrary curves satisfying
in the following appropriate distributional sense.
Definition 4.3
(Continuity equation) A triple \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) satisfies the continuity equation \(\textsf{CE}_{\infty }\), if
(1) the curve \([0,T]\ni t\mapsto \textsf{P}_t\in \mathcal {P}(\Gamma )\) is narrowly continuous,
(2) the Borel family \((\textsf{J}^{\pm }_t)_{t\in [0,T]}\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) satisfies
$$\begin{aligned} \int _0^T \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\textsf{J}^{\pm }_{t} \, \textrm{d}t<\infty , \end{aligned}$$
(3) for every \(s,t\in [0,T]\) and all \(F\in \textrm{Cyl}_c(\Gamma )\)
$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma \times \mathcal {T}} \textrm{grad}_{\Gamma }F \,(\textrm{d}\textsf{J}_r^+-\textrm{d}\textsf{J}_r^-) \, \textrm{d}r. \end{aligned}$$
Moreover, let us introduce the EDP-functional. Recall from Sect. 3 the notation \(\vartheta _{\textsf{P}}^{\pm }(\textrm{d}\nu ,\textrm{d}x):=\kappa ^{\pm }[\nu ](\textrm{d}x)\textsf{P}(\textrm{d}\nu )\).
Definition 4.4
Let \(\Theta ^{\infty }_{\textsf{P}}\in \mathcal {M}_{loc}(\Gamma \times \mathcal {T})\) be the geometric average of \(\vartheta ^{+}_{\textsf{P}}\) and \(\vartheta ^{-}_{\textsf{P}}\), i.e.
for any dominating measure \(\Sigma \). We define the following objects:
- The dissipation potential \(\mathcal {R}_{\infty }:\mathcal {P}(\Gamma )\times \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})^2\rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{\infty }_{\textsf{P}})+\mathcal {E}\textrm{nt}(\textsf{J}^{-}|\Theta ^{\infty }_{\textsf{P}}). \end{aligned}$$
- The dual dissipation potential \(\mathcal {R}^*_{\infty }:\mathcal {P}(\Gamma )\times \mathcal {B}_c(\Gamma \times \mathcal {T})^2\rightarrow {\mathbb {R}}\),
$$\begin{aligned} \mathcal {R}_{\infty }^*(\textsf{P},\omega ^+,\omega ^-):=\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{+}}-1)\, \textrm{d}\Theta ^{\infty }_{\textsf{P}}+\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{-}}-1)\, \textrm{d}\Theta ^{\infty }_{\textsf{P}}. \end{aligned}$$
- The free energy \(\mathcal {F}_{\infty }:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {F}_{\infty }(\textsf{P}):=\int _{\Gamma } \mathcal {F}_{MF}(\nu )\, \textsf{P}(\textrm{d}\nu ). \end{aligned}$$
- The Fisher information \(\mathcal {D}_{\infty }:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\),
$$\begin{aligned} \mathcal {D}_{\infty }(\textsf{P}):=\int _{\Gamma } \mathcal {D}_{MF}(\nu ) \, \textsf{P}(\textrm{d}\nu ). \end{aligned}$$
- The EDP-functional \(\mathcal {I}_{\infty }:\textsf{CE}_{\infty }\rightarrow [0,+\infty ]\) for all curves with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \),
$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}_t^+,\textsf{J}^-_t) \, \textrm{d}t + \mathcal {F}_\infty ({\textsf{P}_T})-\mathcal {F}_\infty ({\textsf{P}_0})+\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t) \, \textrm{d}t. \end{aligned}$$
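Definition 4.4 states that the geometric average \(\Theta ^{\infty }_{\textsf{P}}\) does not depend on the chosen dominating measure \(\Sigma \). The following is a minimal numerical sketch of this fact, under the assumption of a finite state space on which measures are non-negative vectors; the function name `geometric_average` and the sample vectors are illustrative and do not appear in the paper.

```python
import math

def geometric_average(theta_plus, theta_minus, sigma):
    """Return sqrt(dtheta+/dsigma * dtheta-/dsigma) * sigma on a finite space.

    All measures are lists of non-negative weights. sigma is assumed to
    dominate theta_plus and theta_minus (positive wherever they are positive).
    """
    result = []
    for plus, minus, dominating in zip(theta_plus, theta_minus, sigma):
        if dominating > 0:
            density_product = (plus / dominating) * (minus / dominating)
            result.append(math.sqrt(density_product) * dominating)
        else:
            result.append(0.0)
    return result

# Two illustrative measures on a four-point space (hypothetical values).
theta_plus = [0.5, 2.0, 0.0, 1.0]
theta_minus = [2.0, 0.5, 3.0, 1.0]

# Two different dominating measures.
sigma_a = [p + m for p, m in zip(theta_plus, theta_minus)]
sigma_b = [3.0 * p + 0.1 * m + 1.0 for p, m in zip(theta_plus, theta_minus)]

average_a = geometric_average(theta_plus, theta_minus, sigma_a)
average_b = geometric_average(theta_plus, theta_minus, sigma_b)
```

Both choices of dominating measure give the pointwise value \(\sqrt{\vartheta ^+\vartheta ^-}\), since the factors of \(\Sigma \) cancel by one-homogeneity of the geometric mean.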
Remark 4.5
Recall from Sect. 2 that \(\mathcal {F}_{MF}(\nu ):=\tfrac{1}{2}\mathcal {E}\textrm{nt}(\nu |\gamma )\) and
In particular, if \(\mathcal {F}_{\infty }(\textsf{P})<\infty \) we have
Remark 4.6
Note that \(\Theta _{\textsf{P}}^ {\infty }(\textrm{d}\nu ,\textrm{d}x)=\textsf{P}(\textrm{d}\nu ) \theta _{\nu }(\textrm{d}x)\). Moreover, if \(\mathcal {E}\textrm{nt}(\textsf{J}_t^{\pm }|\Theta _{\textsf{P}_t}^{\infty })\) is finite, we can set
and it is straightforward to verify that we have the disintegration
and the equivalence
Together with the definitions of \(\mathcal {F}_{\infty }\) and \(\mathcal {D}_{\infty }\) this implies that if \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)\) is finite then the \(\lambda _{t}^{\pm }[\nu ]\) are well-defined for a.e. \(t\in [0,T]\), and
Throughout the rest of this section we will simply write \(\lambda _{t,\nu }^{\pm }=\lambda ^\pm _t[\nu ]\).
We will show the following equivalence, which subsumes Theorem 1.7.
Theorem 4.7
For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \), the EDP-functional \(\mathcal {I}_{\infty }\) is finite if and only if there exists a Borel probability measure Q over \(C([0,T];\Gamma )\) such that
(1) for the time-evaluations \(e_t\) we have \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\),
(2) the measure Q is concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) such that \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }) \in \mathscr{C}\mathscr{E}\), where \(\lambda _\nu ^\pm \) is defined via the disintegration
$$\begin{aligned} \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x)=\lambda _{t,\nu }^{\pm }(\textrm{d}x)\textsf{P}_t(\textrm{d}\nu )\qquad \text {for a.e. } t\in [0,T], \end{aligned}$$
(3) we have the representation
$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)= \int \mathcal {I}_{MF}\left( \nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }\right) \textrm{d}Q, \end{aligned}$$
with the latter term finite.
In particular, \(\mathcal {I}_{\infty }\ge 0\), and
Here \(G_t:\Gamma \rightarrow \Gamma \) maps \({{\bar{\nu }}}\) to the unique mean-field solution \(\nu _t\) at time t, see Remark 4.2. It is determined by
We do not have a priori uniqueness of the Liouville equation. However, we do have the uniqueness of weak solutions for which a superposition holds, in particular for curves with finite \(\mathcal {I}_{\infty }\). Therefore gradient-flow solutions (null-minimizers of \(\mathcal {I}_{\infty }\)) are unique.
In the case of \(\textsf{P}_t:=\delta _{\nu _t}\) with \(\nu _t\) the solution to the mean-field equation there is a trivial superposition principle, and we have the following consequence.
Corollary 4.8
Suppose \(\textsf{P}_0=\delta _{\nu _0}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \). Then
4.1 A priori estimates
Due to the representation (4.3) of the dissipation potential in terms of mean-field objects, we can directly derive the following estimates from Lemmas 2.10 and 2.13.
Corollary 4.9
Let \(\textsf{P}\in \mathcal {P}(\Gamma ),\textsf{J}^{\pm }\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) be such that \(\mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)<\infty \), and set
Then the following estimates hold:
Moreover, the following equivalence follows straightforwardly from Lemma 3.13.
Corollary 4.10
For any \(\textsf{P}\in \mathcal {P}(\Gamma ),\textsf{J}^{\pm }\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\)
for any common dominating measure \(\Sigma \).
Finally, we consider the time regularity for arbitrary curves, with respect to the following metric.
Definition 4.11
We define the following metric:
where
Note that W is narrowly lower semicontinuous. Moreover, we have that
and hence by a density argument, it is straightforward to verify that convergence in W implies vague convergence on \(\Gamma \), and therefore narrow convergence on narrowly pre-compact subsets.
Remark 4.12
Formally, one can represent W as a transport distance, in the sense that
where \(W_{d_{\Gamma }}\) is the 1-Wasserstein metric on \(\mathcal {P}(\Gamma )\) induced by the metric \(d_{\Gamma }\) over \(\Gamma \) given by
However, we do not require such representations in the present work.
Lemma 4.13
For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) we have
Proof
This follows directly from the continuity equation, since for any \(F\in {\mathbb {F}}\), \(s,t\in [0,T]\):
Taking the supremum over all \(F\in {\mathbb {F}}\) we obtain the desired statement. \(\square \)
4.2 Weak solutions
Here we briefly consider the existence and representations for solutions to the Liouville equation.
Lemma 4.14
For any \({{\bar{\textsf{P}}}}\in \mathcal {P}(\Gamma )\) there exists a weak solution \(\textsf{P}\) to (Li) with initial data \({{\bar{\textsf{P}}}}\).
Proof
Recall the flow \(G:[0,T]\times \Gamma \rightarrow \Gamma \) determined by
Set \(\textsf{P}_t:=(G_t)_{\#} {{\bar{\textsf{P}}}}\). We will show that \(\textsf{P}_t\) is a weak solution in the sense of (4.1). Namely, consider any \(F\in \textrm{Cyl}_c(\Gamma )\). Due to the strong regularity of solutions to the mean-field equation, it is straightforward to show that for all \(s,t\in [0,T]\) we have the chain rule
and hence
and thus \(\textsf{P}_t\) is indeed a weak solution. \(\square \)
4.3 Superposition principle
One of our main tools in proving the chain rule, uniqueness of solutions, and the variational representation of Theorem 4.7 is the superposition principle. It guarantees that we can represent the action as an expectation of the mean-field action under some measure over curves in \(\mathscr{C}\mathscr{E}\), and allows us to use the theory on mean-field dynamics of Sect. 2. In this section, we will make this notion precise.
Theorem 4.15
Let \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with
Then there exists a Borel probability measure \(Q\in \mathcal {P}(C([0,T];\Gamma ))\) satisfying \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\), and concentrated on curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\), for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }) \in \mathscr{C}\mathscr{E}\). Moreover,
Conversely, if there is a Borel probability measure \(Q\in \mathcal {P}(C([0,T];\Gamma ))\) concentrated on curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) and a Borel family \(\{\lambda ^{\pm }_{t,\nu }\}\), for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu })\in \mathscr{C}\mathscr{E}\), with
then \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in {\textsf{CE}_{\infty }}\) for \(\textsf{P}_t:=(e_t)_{\#}Q\), \(\textsf{J}_t^{\pm }:=\textsf{P}_t \lambda ^{\pm }_{t,\nu }\), and (4.5) holds as well.
The inspiration for using a superposition principle stems from similar approaches in [11, 12], where it is applied to transport equations lifted from the Boltzmann equation or mean-field jump dynamics, respectively, and where the main ingredient is the abstract superposition principle over \({\mathbb {R}}^{\mathbb {N}}\) of [2]. However, these results are not directly applicable to our setting, since the mass \(\nu _t(\mathcal {T})\) of a mean-field curve is not fixed, and \(V[\nu ](\mathcal {T})\) is finite but unbounded over \(\Gamma \). We remedy this by combining two known superposition principles: on the one hand, the abstract superposition principle over \({\mathbb {R}}^{\mathbb {N}}\) of [2], and on the other hand one for finite-dimensional vector fields with linear growth, found in [1]. Our result is stated in Theorem B.1.
Proof
Consider any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with finite \(\mathcal {R}_{\infty }\), and for a.e. \(t\in [0,T]\) set \(\lambda ^{\textrm{net}}_{t,\nu }:=\lambda ^+_{t,{\nu }}-\lambda ^-_{t,{\nu }}\). By Corollary 4.9,
Now, take a countable and dense set \(f_1,f_2,\ldots \in C_b(\mathcal {T})\), with \(f_1=1\), \(\Vert f_i\Vert _{\infty }\le 1\), \(i\ge 2\), and define \({\mathbb {T}}:\Gamma \rightarrow {\mathbb {R}}^{{\mathbb {N}}}\)
Note that \({\mathbb {T}}\) is injective, continuous when \(\Gamma \) is equipped with the narrow topology and \({\mathbb {R}}^{{\mathbb {N}}}\) with the product topology, and an isometry between \((\Gamma ,\Vert \cdot \Vert _{TV})\) and \(({\mathbb {T}}(\Gamma ),|\cdot |_{\infty })\), where \(|\cdot |_{\infty }\) is the uniform norm over \({\mathbb {R}}^{{{\mathbb {N}}}}\). We set \(\sigma _t:={\mathbb {T}}_{\#}\textsf{P}_t \in \mathcal {P}({\mathbb {R}}^{{\mathbb {N}}})\), and for a.e. \(t\in [0,T]\) define the vector field \({\textbf{W}}_t:{\mathbb {R}}^{{{\mathbb {N}}}}\rightarrow {\mathbb {R}}^{{{\mathbb {N}}}}\) via its components
Note that the support of \({\textbf{W}}_t\) is in \({\mathbb {T}}(\Gamma )\), that \(|{\textbf{W}}_t(z)|_{\infty }\le \Vert \lambda ^{{\textrm{net}}}_{t,{\mathbb {T}}^{-1}(z)}\Vert _{TV}\) and \(({\mathbb {T}}(\nu ))_1=\nu (\mathcal {T})\). Therefore, by (4.6) we have the estimate
Moreover, \((\sigma ,{\textbf{W}})\) satisfy the continuity equation, in the sense that for all \(g\in \textrm{Cyl}_c({\mathbb {R}}^{\mathbb {N}})\), we have
Indeed, take any \(g\in \textrm{Cyl}_c({\mathbb {R}}^{\mathbb {N}})\) and define \(F:=g\circ {\mathbb {T}}\), i.e.
Note that \(F\in \textrm{Cyl}_c(\Gamma )\), and therefore since \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\),
Thus, we are now in a position to apply Theorem B.1, and obtain a Borel probability measure \(\Omega \) over \(C([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) satisfying \((e_t)_{\#} \Omega =\sigma _t\) for all \(t\in [0,T]\), and which is concentrated on the family of curves \(z\in AC([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) that are solutions to the ODE
Note that since \(\textrm{supp}(\sigma )\subseteq {\mathbb {T}}(\Gamma )\), we have \(\textrm{supp}(\Omega )\subseteq AC([0,T];{\mathbb {T}}(\Gamma ))\). Now let \(\tilde{{\mathbb {T}}}:C([0,T];\Gamma )\rightarrow C([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) be defined via \((\tilde{{\mathbb {T}}}(\nu ))_t:={\mathbb {T}}(\nu _t)\). As for \({\mathbb {T}}\), the map \(\tilde{{\mathbb {T}}}\) is injective and an isometry when seen as a map \(\tilde{{\mathbb {T}}}:AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\rightarrow AC([0,T];({\mathbb {R}}^{{\mathbb {N}}},|\cdot |_{\infty }))\). Therefore, it is clear that the measure \(Q:=\tilde{{\mathbb {T}}}^{-1}_{\#} \Omega \in \mathcal {P}(C([0,T];\Gamma ))\) is well defined, satisfies \(\textsf{P}_t=(e_t)_{\#}Q\) and is concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\), for which
Moreover,
where the latter is finite by assumption, and hence, by Lemma 2.14, we deduce that \((\nu ,\lambda _{\nu }^{+},\lambda _{\nu }^{-})\in \mathscr{C}\mathscr{E}\) Q-almost everywhere.
The reverse statement can be derived straightforwardly and we omit the proof. \(\square \)
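The role of the embedding \({\mathbb {T}}\) in the proof can be illustrated on a finite state space. The sketch below is only a toy analogue under stated assumptions: \(\mathcal {T}\) is replaced by four points, measures are non-negative vectors, and the countable dense family \(f_1,f_2,\ldots \) is replaced by the finite family of all sign patterns (with the constant function 1 first), which suffices to attain the total variation norm on a finite space. The names `embedding`, `total_variation` and `sup_distance` are hypothetical.

```python
import itertools

def embedding(measure, test_functions):
    """Map a measure to the vector of pairings <f_i, measure>."""
    return [
        sum(value * mass for value, mass in zip(function, measure))
        for function in test_functions
    ]

def total_variation(nu, mu):
    """Total variation norm of nu - mu on a finite space."""
    return sum(abs(a - b) for a, b in zip(nu, mu))

def sup_distance(z, w):
    """Uniform distance between two coordinate vectors."""
    return max(abs(a - b) for a, b in zip(z, w))

NUM_POINTS = 4
# All functions with values in {+1, -1}; the first one is the constant 1.
sign_functions = list(itertools.product([1.0, -1.0], repeat=NUM_POINTS))

nu = [0.5, 1.0, 0.0, 2.0]
mu = [1.5, 0.25, 0.75, 2.0]

image_nu = embedding(nu, sign_functions)
image_mu = embedding(mu, sign_functions)

mass_coordinate = image_nu[0]                       # equals nu(T) since f_1 = 1
embedded_distance = sup_distance(image_nu, image_mu)
tv_distance = total_variation(nu, mu)
```

In this toy setting the first coordinate recovers the total mass and the uniform distance of the images equals the total variation distance, mirroring the two properties of \({\mathbb {T}}\) used in the proof.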
4.4 Variational characterization
Having all the ingredients at hand, we can now prove the variational characterization for the Liouville equation, namely Theorem 4.7.
Proof of Theorem 4.7
Suppose \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) is such that \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{\infty }<\infty \). Since \(\mathcal {F}_{\infty }\) is non-negative we have in particular that
Hence, from the superposition principle of Theorem 4.15, we obtain a Borel probability measure Q over \(C([0,T];\Gamma )\) satisfying \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\) and concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu })\in \mathscr{C}\mathscr{E}\). Moreover,
Since \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) we have that for Q-a.e. curve \(\mathcal {F}_{MF}(\nu _0)<\infty \). Moreover, since both \(\mathcal {F}_{\infty }\) and \(\mathcal {D}_{\infty }\) are simply their mean-field counterparts integrated by \(\textsf{P}\), we find
where the second equality follows from Fubini-Tonelli and the fact that \(\mathcal {R}_{MF},\mathcal {D}_{MF},\mathcal {F}_{MF}\ge 0\) and \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \). In particular, by the non-negativeness of \(\mathcal {I}_{MF}\) it holds that \(\mathcal {I}_{\infty }\ge 0\).
Moreover, since \(\mathcal {I}_{MF}=0\) if and only if \(\nu \) is the unique strong solution for an initial datum \({{\bar{\nu }}}\) with \(\mathcal {E}\textrm{nt}(\bar{\nu }|\gamma )<\infty \), we derive by non-negativeness of \(\mathcal {I}_{MF}\) that \(\mathcal {I}_{\infty }=0\) if and only if Q is concentrated on the unique solutions of the mean-field equation. In this case, Q is characterized by
where \(G_t:\Gamma \rightarrow \Gamma \) defined by (4.2) maps any \({{\bar{\nu }}}\) to the unique solution to (\(\mathsf MF\)) for initial condition \(\nu _0={{\bar{\nu }}}\) and \({\tilde{G}}:\Gamma \rightarrow C([0,T],\Gamma )\) is defined via \(({\tilde{G}}(\nu _0))_t:=G_t(\nu _0)\). Note that \(\textsf{P}_t=(G_t)_{\#}\textsf{P}_0\), \(\textsf{J}_t^{\pm }=\textsf{P}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\), and in particular \(\textsf{P}_t\) is a weak solution to (Li).
Vice versa, if \(\textsf{P}\) is a weak solution such that \(\textsf{P}_t=(G_{t})_{\#} \textsf{P}_0\), we simply set
Since \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \), we still have \(\mathcal {E}\textrm{nt}(\nu |\gamma )<\infty \) for \(\textsf{P}_0\)-almost every \(\nu \), and we repeat the same calculations to conclude that indeed \(\mathcal {I}_{\infty }=0\). \(\square \)
5 EDP-convergence
In the previous sections, we have established variational formulations for the solution to the forward Kolmogorov equation of the interacting particle system, for the solutions to the mean-field equation, and for the corresponding Liouville equation. Moreover, for the latter, we have shown how the corresponding EDP-functional can be represented as the expectation over a functional of mean-field paths.
We are now in a position to rigorously discuss the convergence of the forward Kolmogorov equation to the Liouville equation, in terms of the EDP-convergence of their gradient structures. Namely, we say that a sequence of curves \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converges to a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), denoted by \(\lim _{n\rightarrow \infty }(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-}) = (\textsf{P},\textsf{J}^+,\textsf{J}^-)\), if the following holds:
- \(\textsf{P}_t^n \rightarrow \textsf{P}_t\) narrowly for all \(t\in [0,T]\),
- \(\textsf{J}_t^{n,\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}t \rightarrow \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}t\) vaguely on \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T}\times [0,T])\).
Theorem 5.1
Suppose that a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\), \(n\ge 1\), is such that
then the family of curves \(\{(\textsf{P}^n_t)_{t\in [0,T]}\}_{n}\) is equicontinuous with respect to the metric W defined in (4.4), and there exist a (not relabelled) subsequence \((\textsf{P}^{n},\textsf{J}^{n,+},\textsf{J}^{n,-})\) and a triple \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) such that
Moreover, for any such converging sequence
Remark 5.2
The compactness result is slightly stronger. As shown in the proof of Theorem 5.1 the measures \(\textsf{J}_r^{n,\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}r\) converge vaguely on \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T}\times [s,t])\) for any \(s,t\in [0,T]\).
Note that if in addition, the initial data is well-prepared, in the sense that
then for any converging subsequence, we have the liminf-estimate
or in other words, obtain evolutionary \(\varGamma \)-convergence of \(\mathcal {I}_n\) to \(\mathcal {I}_{\infty }\).
Now, recall by Theorem 3.8 that unique gradient-flow solutions to the forward Kolmogorov equations (\(\mathsf FKE_n\)) exist, and similarly, gradient-flow solutions to the Liouville equation (Li) are unique by Theorem 4.7. Therefore, modifying classical arguments from [36, 37], we can directly conclude the following convergence for the sequence of solutions.
Theorem 5.3
Consider a converging sequence \(\mathcal {P}(\Gamma _n) \ni {{\bar{\textsf{P}}}}^n\rightarrow {{\bar{\textsf{P}}}} \in \mathcal {P}(\Gamma )\) such that
and for each \(n\ge 1\) let \(\textsf{P}_t^n\) be the unique gradient-flow solution to (\(\mathsf FKE_n\)) with initial data \({{\bar{\textsf{P}}}}^n\). Then there exists a unique gradient-flow solution \(\textsf{P}\) to (Li) with initial data \({{\bar{\textsf{P}}}}\). Moreover, we have the convergence
Proof
Recall that \(\mathcal {I}_n(\textsf{P}^n,\vartheta _{\textsf{P}^n}^{+},\vartheta _{\textsf{P}^n}^{-})=0\) for all \(n\ge 1\). Therefore, by (5.3) and Theorem 5.1 we have for any subsequence indexed by \(n'\) converging to some \((\textsf{P},\textsf{J}^{+},\textsf{J}^-)\in \textsf{CE}_{\infty }\) that (5.2) holds, and hence
and thus \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-})=0\), which implies that \(\textsf{P}\) is the unique gradient-flow solution to (Li) and \(\textsf{J}_t^{\pm }=\vartheta _{\textsf{P}_t}^{\pm }\) for a.e. \(t\in [0,T]\). The convergence of \(\textsf{P}_t^n\) now follows from a compactness argument, and by lower semicontinuity, we conclude that for every \(t\in [0,T]\),
and therefore \(\liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_t^n)=\mathcal {F}_{\infty }(\textsf{P}_t)\). \(\square \)
Now suppose that in addition, the initial sequence of measures \({\bar{\textsf{P}}}^n\) is chaotic, in the sense that
Then, as a consequence of Theorem 5.3, we have the propagation of chaos result, namely
where \(\nu _t\) is the unique solution to the mean-field equation (2.13) with initial datum \({{\bar{\nu }}}\). As mentioned in the introduction, for interacting particle systems with the number of particles fixed at \(n\in {\mathbb {N}}\), this would imply narrow convergence of the k-marginals at time t to \(\nu _t^{\otimes k}\) (see e.g. [38]); in our setting, it implies convergence of the k-correlation functions [4].
Moreover, note that we have a stronger notion of convergence since the free energies \(\mathcal {F}_n\) converge as well. Under appropriate conditions on the initial datum \({{\bar{\nu }}}\), this guarantees a version of propagation of entropic chaoticity. Namely, for any \(\nu ^*\) we define the rescaled Poisson measures
It is straightforward to check that \(\Pi _{n,\nu ^*}\rightarrow \delta _{\nu ^*}\) narrowly. We then have the following result.
Theorem 5.4
(Propagation of chaos) Consider the setting of Theorem 5.3 and assume additionally that \({{\bar{\textsf{P}}}}=\delta _{{{\bar{\nu }}}}\) for some \(\bar{\nu }\in \Gamma \) with \(\mathcal {E}\textrm{nt}({{\bar{\nu }}}|\gamma )<\infty \). Let \(\nu _t\) be the unique solution to (2.13) with initial datum \({{\bar{\nu }}}\). Then for all \(t\in [0,T]\),
If additionally there exists a constant \(C>1\) such that \(C^{-1}\le \textrm{d}{{\bar{\nu }}}/\textrm{d}\gamma \le C\) then
Theorems 5.1 and 5.4 are proved in Sect. 5.3. First, however, we show \(\varGamma \)-convergence of the free energies in Sect. 5.1 and establish the necessary estimates in Sect. 5.2.
5.1 \(\varGamma \)-convergence of \(\mathcal {F}_n\)
While only the liminf-estimates for the free energy \(\mathcal {F}_n\) are necessary for the proof of Theorem 5.1 and the convergence of solutions, we provide here the full \(\varGamma \)-convergence result. We rely strongly on the characterization of [26], which connects a large deviation principle with rate function I to the fact that
and provides useful sufficient conditions for both.
Recall in our setting that
We then have the following result, which we prove after Lemma 5.6 below.
Theorem 5.5
The family \(\{\mathcal {F}_n\}_{n\ge 1}\) is equicoercive and \(\varGamma \)-converges to \(\mathcal {F}_{\infty }\) in the sense that
- for any converging sequence \(\textsf{P}^n\rightarrow \textsf{P}\in \mathcal {P}(\Gamma )\):
$$\begin{aligned} \mathcal {F}_{\infty }(\textsf{P})\le \liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n), \end{aligned}$$
- for any \(\textsf{P}\in \mathcal {P}(\Gamma )\) with \(\mathcal {F}_{\infty }(\textsf{P})<\infty \) there exists a sequence \(\textsf{P}^n\in \mathcal {P}(\Gamma )\) converging to \(\textsf{P}\) such that
$$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n) = \mathcal {F}_{\infty }(\textsf{P}). \end{aligned}$$
By the results of [26, Theorems 3.4, 3.5] it is sufficient to merely show the corresponding bounds or limits for any \(\textsf{P}\) of the form \(\textsf{P}=\delta _{\nu }\) for some \(\nu \in \Gamma \). Because of this reduction, we can make use of the so-called cumulant generating functionals \(G_n\) given by
for any \(f\in \mathcal {B}_b(\Gamma )\), and their limit counterpart
Note that by Legendre duality of the entropy functional, we have for all \(n> 0\) the inequality
and for the Legendre dual of G, we have
We will first simplify \(G_n\) and show that it indeed converges to G.
Lemma 5.6
Let \(f\in \mathcal {B}_b(\mathcal {T})\). Then for each \(n>0\)
In particular
Proof
Using the representation for the rescaled Poisson measure \(\Pi _n\) we have
and after taking logarithms and dividing by n we obtain the desired statement. Moreover, recall that by assumption \(\gamma (\mathcal {T})>0\) and note that by the boundedness of f,
Hence we can take the limit \(n\rightarrow \infty \) to deduce
thereby concluding the proof. \(\square \)
Next, we establish convergence for suitable linear functionals of \(\nu \). In “Appendix C”, we will even prove convergence for quadratic functionals if the mass of \(\nu (\mathcal {T})\) is appropriately controlled.
Lemma 5.7
Suppose that the sequence \(\textsf{P}^n\) converges narrowly and
Then for any \(f\in \mathcal {B}_b(\mathcal {T})\) it holds that
Proof
First, let us consider \(f\in C_b(\mathcal {T})\), and introduce the functions \(F(\nu ):=\langle f,\nu \rangle \) and its truncation \(F_L(\nu ):=\alpha _{L}(\nu (\mathcal {T})) \langle f,\nu \rangle \), where \(\alpha _L(z):={{\bar{\alpha }}}(z-L)\) with \({{\bar{\alpha }}}\in C_b({\mathbb {R}})\) a continuous non-increasing function such that \(0\le \bar{\alpha }(z)\le 1\) for all z, \({{\bar{\alpha }}}(z)=1\) for \(z\le 0\), and \({{\bar{\alpha }}}(z)=0\) for all \(z\ge 1\).
Note that \(F_L(\nu )\uparrow F(\nu )\) as \(L \rightarrow \infty \) and that \(F_L\) is continuous and bounded for every \(L\ge 0\). Hence,
We will show that
From this we can obtain (5.5) since by duality,
Taking subsequent limits in n, L and \(\beta \) to infinity, we deduce
thus proving the desired equality (5.5).
Now, to establish (5.6), first note that \(|F_L-F|(\nu )\le |\alpha _L(N/n)-1|\langle |f|,\nu \rangle \), and therefore
with \(C_{\beta }:=e^{\beta \Vert f\Vert _{\infty }} \gamma (\mathcal {T})\). Suppose \(X_n\) is a Poisson variable with mean \(n C_\beta \). Then the second term in the previous estimate can be expressed as
It is clear that \(\frac{1}{n}X_n\rightarrow C_{\beta }\) almost surely as \(n\rightarrow \infty \). Moreover, by elementary large deviation results, e.g. Cramér’s theorem [10, Theorem 2.2.3], the sequence \(\frac{1}{n}X_n\) satisfies a large deviation principle with rate n and rate function \(I_\beta (z):=z \log (z/C_{\beta })-z+C_{\beta }\), which implies
Note that \(\inf _{z\ge a} I_\beta (z)=I_\beta (a)\) for \(a\ge C_{\beta }\), and hence for \(L\ge C_{\beta }\) we obtain the bound
Letting \(L\rightarrow \infty \), we deduce
Finally, let us now consider \(f\in \mathcal {B}_b(\mathcal {T})\). Using a similar density approach as above, it is sufficient to show that there exists a sequence of bounded continuous functions \(f_k\), such that
but, by Lemma 5.6, we have
Similar to density statements in \(L^p(\gamma )\), one can find a sequence such that the above integrals vanish as \(k\rightarrow \infty \), see for example [19, Theorem C.5]. \(\square \)
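The Poisson large deviation estimate used in the proof above can be checked numerically. The sketch below assumes illustrative values for the constant \(C_{\beta }\) (here `mean_intensity = 1.0`) and the level \(a\) (here `level = 2.0`), neither of which comes from the paper, and the helper names are hypothetical. It compares the exact tail probability of \(X_n\sim \textrm{Poisson}(nC_{\beta })\) with the rate function \(I_\beta (a)=a\log (a/C_{\beta })-a+C_{\beta }\); the Chernoff bound gives \(\mathbb {P}(X_n\ge na)\le e^{-nI_\beta (a)}\) for \(a\ge C_{\beta }\).

```python
import math

def rate_function(z, mean):
    """Poisson rate function I(z) = z log(z / mean) - z + mean."""
    if z == 0:
        return mean
    return z * math.log(z / mean) - z + mean

def log_poisson_upper_tail(threshold, mean):
    """Return log P(X >= threshold) for X ~ Poisson(mean).

    Terms are summed relative to the first one to avoid underflow; the
    loop stops once a term is negligible compared with the first term.
    """
    k = threshold
    first = -mean + k * math.log(mean) - math.lgamma(k + 1)
    log_term = first
    total = 0.0
    while log_term - first > -40.0:
        total += math.exp(log_term - first)
        k += 1
        log_term += math.log(mean) - math.log(k)
    return first + math.log(total)

mean_intensity = 1.0   # plays the role of C_beta (illustrative value)
level = 2.0            # threshold a >= C_beta (illustrative value)
exact_rate = rate_function(level, mean_intensity)

empirical_rates = {}
for n in (50, 100, 200, 400):
    log_tail = log_poisson_upper_tail(int(n * level), n * mean_intensity)
    empirical_rates[n] = -log_tail / n
```

The values in `empirical_rates` stay above `exact_rate` and approach it as n grows, with a gap of order \(\log (n)/n\).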
Proof of Theorem 5.5
First, we will show that the family \(\{\mathcal {F}_n\}_{n\ge 1}\) is equicoercive, by establishing a first moment bound for \(\textsf{P}\) in terms of the mass \(\nu (\mathcal {T})\). Namely, setting \(f\equiv 1\) in (5.4) we have for any \(\textsf{P}\in \mathcal {P}(\Gamma )\), \(n\ge 1\), the inequality
where the final term is bounded from above independently of \(\textsf{P}\).
Next, for the limit inferior, consider a narrowly converging sequence \(\textsf{P}^n\rightarrow \textsf{P}=\delta _{{{\bar{\nu }}}}\) for some \({{\bar{\nu }}}\in \Gamma \). Fix any \(f\in C_b(\mathcal {T})\). By duality, we have for every n,
and, due to Lemmas 5.6 and 5.7,
Taking the supremum over all \(f\in C_b(\mathcal {T})\) we find
Finally, consider an arbitrary \({{\bar{\nu }}}\in \Gamma \) with \(\mathcal {E}\textrm{nt}(\bar{\nu }|\gamma )<\infty \) and set \(\textsf{P}=\delta _{{{\bar{\nu }}}}\). We will construct a sequence of measures \(\textsf{P}^n\) that locally consists of Poisson measures induced by \({{\bar{\nu }}}\). Namely, set
and consider the sequence \(\textsf{P}^n:=\Pi _{n,{{\bar{\nu }}}}\). It is straightforward to verify that indeed \(\textsf{P}^n\rightarrow \delta _{{{\bar{\nu }}}}\). Moreover, note that although \(L_n\) is not bijective, we do have the equality
due to the symmetry of the N-particle distributions \(\bar{\nu }^{\otimes N}\), \(\gamma ^{\otimes N}\). Therefore, we derive
Rescaling and taking the limit \(n\rightarrow \infty \), we obtain
thereby concluding the proof. \(\square \)
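The scaling of the relative entropy between Poisson measures that underlies the recovery sequence can be illustrated on a finite state space. The sketch below rests on assumptions that may differ from the paper's conventions: the Poisson measures are taken to be plain (unconditioned) Poisson random measures with intensities \(n{{\bar{\nu }}}\) and \(n\gamma \), so that the point counts at distinct sites are independent Poisson variables; the entropy is taken in the generalized form \(\sum _x[\nu _x\log (\nu _x/\gamma _x)-\nu _x+\gamma _x]\); and any constant prefactor in the free energies is ignored. Under these assumptions the relative entropy equals exactly n times the entropy of the intensities.

```python
import math

def generalized_entropy(nu, gamma):
    """Sum over sites of nu log(nu / gamma) - nu + gamma (finite space)."""
    total = 0.0
    for a, b in zip(nu, gamma):
        log_part = a * math.log(a / b) if a > 0 else 0.0
        total += log_part - a + b
    return total

def poisson_kl(mean_p, mean_q, max_count=500):
    """KL(Poisson(mean_p) || Poisson(mean_q)) by direct series summation.

    Both means must be positive; max_count must be well above mean_p so
    that the neglected tail is negligible.
    """
    kl = 0.0
    for k in range(max_count):
        log_p = -mean_p + k * math.log(mean_p) - math.lgamma(k + 1)
        log_q = -mean_q + k * math.log(mean_q) - math.lgamma(k + 1)
        kl += math.exp(log_p) * (log_p - log_q)
    return kl

# Illustrative intensities on a three-point space (hypothetical values).
nu_bar = [0.5, 1.5, 1.0]
gamma = [1.0, 1.0, 2.0]
n = 20

# Independent sites: the relative entropy of the product law is the sum.
poisson_entropy = sum(
    poisson_kl(n * a, n * b) for a, b in zip(nu_bar, gamma)
)
scaled_entropy = n * generalized_entropy(nu_bar, gamma)
```

Dividing `poisson_entropy` by n therefore reproduces the entropy of \({{\bar{\nu }}}\) relative to \(\gamma \) in this toy setting, which is the mechanism behind the limit taken at the end of the proof.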
5.2 Uniform estimates
In Sect. 3.1 we provided uniform-in-n estimates for the flux. Namely, from Lemma 3.13, we directly have the following.
Corollary 5.8
Consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that
Then
where \(M:=(1+\gamma (\mathcal {T}))\Vert c\Vert _{\infty }\).
However, the weighted total variation metric \(d_{TV,w}\) that was introduced is not appropriate for taking limits, and instead, we take the weaker metric defined in (4.4),
where
Recall that W is narrowly lower semicontinuous and implies narrow convergence on narrowly pre-compact subsets. We now have the following equicontinuity result.
Lemma 5.9
Consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that
Then
where \(|{\dot{\textsf{P}}}_t|_{W}\) is the W-metric speed and \(\tilde{\phi }(s):=\phi (s \vee 1)\) is the monotone relaxation of \(\phi \).
Proof
The proof is similar to Lemmas 3.15 and 4.13, now for the distance W instead of the weighted total variation metric \(d_{TV,w}\). Namely, fix \(n>0\) and consider a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\). Then we have for any \(s,t\in [0,T]\) and any \(F\in C_c(\Gamma )\),
Substituting any \(F\in {\mathbb {F}}\) it is straightforward to verify that
for sufficiently large n, and therefore
Taking the supremum over \(F\in {\mathbb {F}}\), we find that \((\textsf{P}_t)_{t\in [0,T]}\) is absolutely continuous w.r.t. W with
where \(|{\dot{\textsf{P}}}^n_t|_W\) is the W-metric speed. Applying the estimates in Lemma 3.13 concludes the proof. \(\square \)
5.3 Proof of main results
We finally conclude the manuscript with the proof of the main results.
Proof of Theorem 5.1
We will first establish the liminf-estimates. Namely, consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) that converges to the curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\). In particular \(\textsf{P}_t^n\rightarrow \textsf{P}_t\) for all \(t\in [0,T]\), and hence by Theorem 5.5 on the \(\Gamma \)-convergence of \(\mathcal {F}_n\) we immediately obtain
Now suppose that
In particular we have the bounds
Due to the chain rule and the assumption on \(\mathcal {F}_n(\textsf{P}^n_0)\), we obtain
The latter guarantees, by Corollary C.3, that we have the vague convergence
Recall that from Lemma 3.13 and Remark 3.6 we have for each \(n\ge 1\):
for any dominating measure \(\Sigma \), and similarly, from Corollary 4.10 and Remark 4.5 that
By the convexity and lower semi-continuity of \(\Upsilon \) and H we conclude by standard semi-continuity results (e.g. see [6, Theorem 3.4.3]) that for each \(t\in [0,T]\),
from which (5.1) directly follows after applying the Fatou lemma.
Next, we consider the question of compactness. As in the previous part, let us consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) with
which imply that the estimates (5.7) and (5.8) still hold. The bound on the free energy ensures by Theorem 5.5 that \(\{\textsf{P}_{t}^n\}_{t\in [0,T],n\ge 1}\) is pre-compact. Moreover, due to the bound on the action \(\mathcal {R}_n\), we have by Corollary 5.8 and Lemma 5.9 that
where \(|{\dot{\textsf{P}}}^n_t|_W\) is again the W-metric speed. Since \({\tilde{\phi }}\) is non-decreasing, convex, and superlinear at infinity, we conclude from (5.9) that, up to choosing a subsequence \(n'\), there exists a family \(\{\textsf{J}^{\pm }_t\}_{t\in [0,T]} \subset \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})\) such that for all s, t the sequence of measures \(\textsf{J}^{n',\pm }_r(\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}r\) converges to \(\textsf{J}^{\pm }_r(\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}r\) in \(\mathcal {M}_{loc}(\Gamma \times \mathcal {T}\times [s,t])\), and
Similarly, since the metric W is narrowly lower semicontinuous and induces narrow convergence on narrowly pre-compact subsets, we find by an Arzelà–Ascoli argument and the estimate (5.10) that, up to choosing a subsequence \(n''\), there exists a narrowly continuous curve \((\textsf{P}_t)_{t\in [0,T]}\) such that \(\textsf{P}^{n''}_t\) converges to \(\textsf{P}_t\) for all \(t\in [0,T]\).
All that remains is showing that \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\). Therefore, fix any \(s,t\in [0,T]\) and \(F \in \textrm{Cyl}_c(\Gamma )\). It is straightforward to verify that there exist constants \(K_{F}\) and \(C_F\) such that the following Taylor approximation holds:
Thus, we can take the limit in the continuity equation \(\textsf{CE}_n\), to conclude that
thereby concluding the proof. \(\square \)
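The Taylor approximation used in the last step can be illustrated numerically. The following is a minimal sketch, assuming a finite trait set and a cylinder function of the simplest form \(F(\nu )=g(\langle f,\nu \rangle )\); the names `cylinder`, `discrete_gradient`, `limit_gradient` and the choices of `f`, `nu` and `g = tanh` (smooth and bounded, but not compactly supported) are illustrative assumptions and not taken from the paper. It compares the rescaled increment \(n\,(F(\nu \pm \tfrac{1}{n}\delta _x)-F(\nu ))\) with its first-order term \(\pm g'(\langle f,\nu \rangle )f(x)\), and the gap should decay like \(1/n\).

```python
import numpy as np

def cylinder(nu, f, g):
    """F(nu) = g(<f, nu>) for a measure nu given by weights on a finite trait set."""
    return g(np.dot(f, nu))

def discrete_gradient(nu, x, n, f, g, sign=+1):
    """n * (F(nu + sign * delta_x / n) - F(nu)): rescaled effect of one birth or death at x."""
    shifted = nu.copy()
    shifted[x] += sign / n
    return n * (cylinder(shifted, f, g) - cylinder(nu, f, g))

def limit_gradient(nu, x, f, dg, sign=+1):
    """sign * g'(<f, nu>) * f(x): the first-order term of the Taylor expansion."""
    return sign * dg(np.dot(f, nu)) * f[x]

# Hypothetical test data on a trait set with three elements.
f = np.array([0.5, -1.0, 2.0])
nu = np.array([1.0, 0.4, 0.7])
g = np.tanh
dg = lambda s: 1.0 / np.cosh(s) ** 2

# Gap between the rescaled increment and its limit for a birth at trait 2.
errors = {
    n: abs(discrete_gradient(nu, 2, n, f, g) - limit_gradient(nu, 2, f, dg))
    for n in (10, 100, 1000)
}
```

In this toy setting the gap is bounded by \(\tfrac{1}{2}\Vert g''\Vert _{\infty }f(x)^2/n\), which plays the role of the constants \(K_F\) and \(C_F\) above.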
Proof of Theorem 5.4
Suppose that \({{\bar{\textsf{P}}}}^n\rightarrow {{\bar{\textsf{P}}}} =\delta _{{{\bar{\nu }}}}\) with
For each \(n\in {\mathbb {N}}\) let \(\textsf{P}_t^n\) be the unique gradient-flow solution to (\(\textsf{FKE}_n\)) with initial data \({{\bar{\textsf{P}}}}^n\). Moreover, let \(\nu _t\) be the unique solution to (2.13) with initial data \({{\bar{\nu }}}\), and set \(\textsf{P}_t:=\delta _{\nu _t}\), which is the unique gradient-flow solution to the Liouville equation (Li) with initial data \({{\bar{\textsf{P}}}}\). Then by Theorem 5.3 we have for every \(t\in [0,T]\) that \(\textsf{P}_t^n\rightarrow \textsf{P}_t\), and
Next, suppose that in addition there exists a constant \(C>1\) such that \(C^{-1}\le \textrm{d}{{\bar{\nu }}}/\textrm{d}\gamma \le C\). By Lemma 2.18 there exists a \(C'<\infty \) with
Now fix any \(t\in [0,T]\), and recall that
It is straightforward to check that \(\Pi _n \ll \Pi _{n,\nu _t} \ll \Pi _n\) and hence for any \(\nu \in \Gamma _n\) with \(\nu =L_n(x_1,\dots ,x_N)\),
with all terms finite, and \(|\sum \log u_t(x_i)|\le N C'\). Therefore, by Lemma 5.7 we derive
Subsequently, we can compute as follows:
and hence the initial data are well-prepared. Therefore, we can conclude for all \(t\in [0,T]\)
thus establishing the entropic propagation of chaos result. \(\square \)
Data Availability
Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.
References
Ambrosio, L., Crippa, G.: Existence, uniqueness, stability and differentiability properties of the flow associated to weakly differentiable vector fields. In: Transport Equations and Multi-D Hyperbolic Conservation Laws, Volume 5 of Lect. Notes Unione Mat. Ital., pp. 3–57. Springer, Berlin (2008)
Ambrosio, L., Trevisan, D.: Well-posedness of Lagrangian flows and continuity equations in metric measure spaces. Anal. PDE 7(5), 1179–1234 (2014)
Basile, G., Benedetto, D., Bertini, L., Orrieri, C.: Large Deviations for Kac-Like Walks. J. Stat. Phys. 184(1), 27 (2021)
Bodineau, T., Gallagher, I., Saint-Raymond, L., Simonella, S.: Fluctuation theory in the Boltzmann–Grad limit. J. Stat. Phys. 180(1–6), 873–895 (2020)
Bolker, B., Pacala, S.W.: Using moment equations to understand stochastically driven spatial pattern formation in ecological systems. Theoret. Popul. Biol. 52(3), 179–197 (1997)
Buttazzo, G.: Semicontinuity, relaxation and integral representation in the calculus of variations, volume 207 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, Harlow; copublished in the United States with John Wiley & Sons, Inc., New York (1989)
Champagnat, N., Ferrière, R., Méléard, S.: Unifying evolutionary dynamics: from individual stochastic processes to macroscopic models. Theoret. Popul. Biol. 69(3), 297–321 (2006)
Champagnat, N., Ferrière, R., Méléard, S.: From individual stochastic processes to macroscopic models in adaptive evolution. Stoch. Model. 24, 2–44 (2008)
Dupuis, P., Ellis, R.S.: A Weak Convergence Approach to the Theory of Large Deviations. Wiley Series in Probability and Statistics: Probability and Statistics. Wiley (1997)
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Stochastic Modelling and Applied Probability, Vol. 38. Springer, Berlin (2010)
Erbar, M., Fathi, M., Laschos, V., Schlichting, A.: Gradient flow structure for McKean–Vlasov equations on discrete spaces. Discrete Contin. Dyn. Syst. 36(12), 6799–6833 (2016)
Erbar, M.: A gradient flow approach to the Boltzmann equation. arXiv preprint arXiv:1603.00540 (2016)
Fathi, M.: A gradient flow approach to large deviations for diffusion processes. J. Math. Pures Appl. (9) 106(5), 957–993 (2016)
Finkelshtein, D., Kondratiev, Y., Kutoviy, O.: Individual based model with competition in spatial ecology. SIAM J. Math. Anal. 41(1), 297–317 (2009)
Finkelshtein, D., Kondratiev, Y., Kutoviy, O.: Vlasov scaling for stochastic dynamics of continuous systems. J. Stat. Phys. 141(1), 158–178 (2010)
Finkelshtein, D., Kondratiev, Y., Kuchling, P.: Markov dynamics on the cone of discrete Radon measures. Methods Funct. Anal. Topol. 27(2), 173–191 (2021)
Finkelshtein, D., Kondratiev, Y., Kozitsky, Y., Kutoviy, O.: The statistical dynamics of a spatial logistic model and the related kinetic equation. Math. Models Methods Appl. Sci. 25(2), 343–370 (2015)
Fournier, N., Méléard, S.: A microscopic probabilistic description of a locally regulated population and macroscopic approximations. Ann. Appl. Probab. 14(4), 1880–1919 (2004)
Hoeksema, J., Holding, T., Maurelli, M., Tse, O.: Large deviations for singularly interacting diffusions. arXiv preprint arXiv:2002.01295 (2020)
Kaiser, M., Jack, R.L., Zimmer, J.: Canonical structure and orthogonality of forces and currents in irreversible Markov chains. J. Stat. Phys. 170(6), 1019–1050 (2018)
Kaiser, M., Jack, R.L., Zimmer, J.: A variational structure for interacting particle systems and their hydrodynamic scaling limits. Commun. Math. Sci. 17(3), 739–780 (2019)
Kondratiev, Y.G., Lytvynov, E.W., Us, G.F.: Analysis and geometry on marked configuration space. Methods Funct. Anal. Topol. 5(1), 29–64 (1999)
Kraaij, R.C.: Gamma convergence on path-spaces via convergence of viscosity solutions of Hamilton–Jacobi equations. arXiv preprint arXiv:1905.08785 (2019)
Law, R., Dieckmann, U.: Moment Approximations of Individual-based Models, pp. 252–270. Cambridge Studies in Adaptive Dynamics. Cambridge University Press (2000)
Liero, M., Mielke, A., Peletier, M.A., Renger, D.R.M.: On microscopic origins of generalized gradient structures. Discrete Contin. Dyn. Syst. Ser. S 10(1), 1–35 (2017)
Mariani, M.: A Gamma-convergence approach to large deviations. arXiv preprint arXiv:1204.0640 (2012)
Mielke, A.: On evolutionary Gamma-convergence for gradient systems. In: Macroscopic and large scale phenomena: coarse graining, mean field limits and ergodicity. Lect. Notes Appl. Math. Mech., Vol. 3, pp. 187–249. Springer, Cham (2016)
Maas, J., Mielke, A.: Modeling of chemical reaction systems with detailed balance using gradient structures. J. Stat. Phys. 181(6), 2257–2303 (2020)
Mielke, A., Montefusco, A., Peletier, M.A.: Exploring families of energy-dissipation landscapes via tilting: three types of EDP convergence. Contin. Mech. Thermodyn. 33(3), 611–637 (2021)
Mielke, A., Peletier, M.A., Renger, D.R.M.: On the relation between gradient flows and the large-deviation principle, with applications to Markov chains and diffusion. Potential Anal. 41(4), 1293–1327 (2014)
Montefusco, A., Schütte, C., Winkelmann, S.: A route to the hydrodynamic limit of a reaction-diffusion master equation using gradient structures. arXiv preprint arXiv:2201.02613 (2022)
Patterson, R.I.A., Renger, D.R.M.: Large deviations of jump process fluxes. Math. Phys. Anal. Geom. 22(3), 32 (2019)
Peletier, M.A., Rossi, R., Savaré, G., Tse, O.: Jump processes as generalized gradient flows. Calc. Var. Partial Differ. Equ. 61(1), 85 (2022)
Peletier, M.A., Schlichting, A.: Cosh gradient systems and tilting. arXiv preprint arXiv:2203.05435 (2022)
Schlichting, A.: Macroscopic limit of the Becker–Döring equation via gradient flows. ESAIM Control Optim. Calc. Var. 25, 36 (2019)
Serfaty, S.: Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. Discrete Contin. Dyn. Syst. 31(4), 1427–1451 (2011)
Sandier, E., Serfaty, S.: Gamma-convergence of gradient flows with applications to Ginzburg–Landau. Commun. Pure Appl. Math. 57(12), 1627–1672 (2004)
Sznitman, A.-S.: Topics in propagation of chaos. In: École d’Été de Probabilités de Saint-Flour XIX-1989, volume 1464 of Lecture Notes in Math., pp. 165–251. Springer, Berlin (1991)
Acknowledgements
The authors acknowledge support from NWO Vidi grant 016.Vidi.189.102 on “Dynamical-Variational Transport Costs and Application to Variational Evolution”.
Communicated by Andrea Mondino.
Appendices
Appendix A: Motivation from large deviations
In Sect. 3, we introduced a new generalized gradient structure for the forward Kolmogorov equation and later showed convergence in the large-population limit to a structure that was lifted from the mean-field dynamics. Here we briefly discuss the relation between existing variational structures, and their connection to the asymptotic probabilities of the underlying process as treated in large deviation theory. All calculations are purely formal and are meant for illustrative purposes.
Throughout, for simplicity, let \(\mathcal {T}\) be a finite set. Recall the reacting particle system formulation described by (1.2), i.e. as particles with labels \(A_t^1,\dots ,A_t^{N_t} \) and traits \(X_t^1,\dots ,X_t^{N_t} \in \mathcal {T}\), and with
Let \(L_t^n\) be the rescaled empirical measure
and \(W_t^{n,\pm }\) the integrated birth/death fluxes:
Moreover, assume that the particles are initially distributed at time \(t=0\) as \(\pi _n\). Then by the work of [32], one can derive under suitable assumptions that the triple \((L_t^n,W_t^{n,\pm })\) is a well-defined Markov process and satisfies a large-deviation principle as \(n\rightarrow \infty \) with rate function \(\mathcal {I}(\nu ,\lambda ^+,\lambda ^-)\) in the sense that asymptotically (as \(n\rightarrow \infty \))
where \(\mathcal {I}^0(\nu ):=\mathcal {E}\textrm{nt}(\nu |\gamma )\) and
Now, under the mean-field detailed balance assumption \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\), one can show that if \(\mathcal {F}_{MF}(\nu _0)<\infty \) the rate function \(\mathcal {I}\) is precisely the mean-field EDP-functional defined in (2.4):
This can be seen via symmetrization under time-reversal. Note that for any curve \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) the ‘reversed’ curve \((\nu _{T-t},\lambda ^{-}_{T-t},\lambda ^{+}_{T-t})\) is still contained in \(\mathscr{C}\mathscr{E}\), and
Then for suitable curves we have the decomposition
which follows from the fact that if \(\mathcal {L}(\nu _t,\lambda ^+_t,\lambda _t^-)\) and \(\mathcal {L}(\nu _t,\lambda ^-_t,\lambda _t^+)\) are finite
The splitting above is a direct consequence of the fact that under the assumption of \(c(x,x)=0\), \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\), the underlying jump process \(L_t^n\) is reversible, i.e., \({\bar{\kappa }}_{n}\) satisfies the detailed balance condition \(\Pi _n(\textrm{d}\nu ){\bar{\kappa }}_{n}(\textrm{d}\nu ,\textrm{d}\eta )=\Pi _n(\textrm{d}\eta ){\bar{\kappa }}_{n}(\textrm{d}\eta ,\textrm{d}\nu )\). Namely, consider the functional \({{\bar{\mathcal {I}}}}_n\) given by
where \(j(\textrm{d}\nu ,\textrm{d}\eta )\in \mathcal {M}^+(\Gamma \times \Gamma )\) and \((\textsf{P}_t {\bar{\kappa }}_n)\) is short-hand for the measure \(\textsf{P}(\textrm{d}\nu ){\bar{\kappa }}_n(\textrm{d}\nu ,\textrm{d}\eta )\). Let \(j^{\dag }(\textrm{d}\nu ,\textrm{d}\eta ):=j(\textrm{d}\eta ,\textrm{d}\nu )\), which again corresponds to a time-reversal procedure. We then have for suitable \((\textsf{P},j)\) the following decomposition
where
and \(({\bar{\kappa }}_n \textsf{P})\) is short-hand for the measure \(\textsf{P}(\textrm{d}\eta ){\bar{\kappa }}_n(\textrm{d}\eta ,\textrm{d}\nu )\). Now substituting
we find that
And as we have shown, in the large-population limit of \(n\rightarrow \infty \), \(\mathcal {I}_n\) EDP-converges to a functional that is lifted from \(\mathcal {I}_{MF}\), establishing the microscopic origin of the splitting for \(\mathcal {I}_{MF}\).
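The detailed balance condition and the time-reversal \(j^{\dag }\) used above can be made concrete on a finite state space. The following is a minimal sketch, assuming a single trait so that the state is just the particle number \(k\in \{0,\dots ,M\}\); the birth and death rates, the truncation level `M` and the helper names are hypothetical and chosen only for illustration. It builds the reversible measure from the rates, and the equilibrium flux matrix \(\Pi (k)\kappa (k,l)\) should then coincide with its transpose, which in turn implies stationarity.

```python
import numpy as np

def reversible_measure(birth, death):
    """Normalised weights pi with pi[k] * birth[k] == pi[k + 1] * death[k + 1]."""
    pi = np.ones(len(birth) + 1)
    for k in range(len(birth)):
        pi[k + 1] = pi[k] * birth[k] / death[k + 1]
    return pi / pi.sum()

def generator(birth, death):
    """Generator matrix Q of the birth-death chain on {0, ..., M}."""
    M = len(birth)
    Q = np.zeros((M + 1, M + 1))
    for k in range(M):
        Q[k, k + 1] = birth[k]      # jump k -> k + 1
        Q[k + 1, k] = death[k + 1]  # jump k + 1 -> k
    Q[np.diag_indices(M + 1)] = -Q.sum(axis=1)
    return Q

# Hypothetical rates: linear birth with immigration, death with quadratic competition.
M, n = 30, 10
birth = np.array([1.0 + 0.5 * k for k in range(M)])
death = np.array([0.0] + [k * (1.0 + k / n) for k in range(1, M + 1)])

pi = reversible_measure(birth, death)
Q = generator(birth, death)
flux = pi[:, None] * Q  # flux[k, l] = pi[k] * Q[k, l]; its transpose is the reversed flux
```

Symmetry of `flux` is the discrete counterpart of \(\Pi _n(\textrm{d}\nu ){\bar{\kappa }}_{n}(\textrm{d}\nu ,\textrm{d}\eta )=\Pi _n(\textrm{d}\eta ){\bar{\kappa }}_{n}(\textrm{d}\eta ,\textrm{d}\nu )\).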
This decomposition for reversible processes is well-known in the net-flux representation. Namely, one can show via a minimization approach that
using a dualization argument and the elementary equality
Thus \({\bar{\mathcal {I}}}_n\) is simply the EDP-functional for jump processes of [33]. The works [20, 30, 34] contain an extensive overview and discussion on how \({\bar{\mathcal {I}}}_n\) is the expected rate functional for a large-deviation principle for the empirical measures of independent jump processes, how the reversibility of the process ensures a possible splitting in both the interacting and non-interacting case, and how for complex-balanced systems this can even be done in the irreversible setting. Moreover, for an implicit decomposition using measure-dependent Dirichlet forms in the case of the homogeneous Boltzmann equation and the underlying process, see [3].
On a final note, due to (A.1) and the origin of \(\bar{\mathcal {I}}_n\) in large deviations for independent particles (or via variational representations as found in [9]), one would expect that if \(F_t\in C_b(\Gamma )\) for all \(t\in [0,T]\), we would have for all \(n>0\) the following representation formula for the expectation:
On the other hand, by the large deviation principle of \((L_t^n,W_t^{n,\pm })\) as \(n\rightarrow \infty \), and Varadhan’s Lemma (see [10]), it holds that
Consequently,
Note that the lower bound of this equality follows from Theorem 5.1 and the superposition principle in Theorem 4.7. Moreover, we expect that the large-deviation principle implies evolutionary \(\varGamma \)-convergence of \(\mathcal {I}_n\) in a suitable topology—an implication studied in [23] in a general setting.
This raises the question of whether one can reverse this procedure, namely use evolutionary \(\varGamma \)-convergence to establish large-deviation principles, similar to the non-evolutionary setting of [26]. This approach was successfully applied in the case of certain diffusion processes [13] and discussed for more general processes in [21].
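As a purely illustrative aside, Varadhan's Lemma as invoked in this appendix can be checked numerically on a toy example that is much simpler than the particle system. The sketch below assumes \(X_n\) is the mean of n independent Bernoulli(p) variables, whose rate function is the Bernoulli relative entropy, and a hypothetical test function `F`; none of these choices come from the paper. It compares \(\tfrac{1}{n}\log {\mathbb {E}}[e^{nF(X_n)}]\) with \(\sup _x\{F(x)-I(x)\}\).

```python
import math
import numpy as np

def log_binomial_pmf(n, p):
    """Log-probabilities of Binomial(n, p) for k = 0, ..., n."""
    k = np.arange(n + 1)
    log_comb = np.array(
        [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1) for i in k]
    )
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

def scaled_log_expectation(n, p, F):
    """(1/n) * log E[exp(n * F(S_n / n))], computed stably in log-space."""
    k = np.arange(n + 1)
    return np.logaddexp.reduce(log_binomial_pmf(n, p) + n * F(k / n)) / n

def rate_function(x, p):
    """Bernoulli relative entropy I(x), with the convention 0 * log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        a = np.where(x > 0, x * np.log(x / p), 0.0)
        b = np.where(x < 1, (1 - x) * np.log((1 - x) / (1 - p)), 0.0)
    return a + b

def varadhan_limit(p, F, grid=100001):
    """sup over x in [0, 1] of F(x) - I(x), approximated on a uniform grid."""
    x = np.linspace(0.0, 1.0, grid)
    return np.max(F(x) - rate_function(x, p))

F = lambda x: -4.0 * (x - 0.8) ** 2  # hypothetical bounded continuous test function
p = 0.5
limit = varadhan_limit(p, F)
errors = {n: abs(scaled_log_expectation(n, p, F) - limit) for n in (50, 200, 1000)}
```

The entries of `errors` should shrink as n grows, in line with the limit statement of the lemma.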
Appendix B: Superposition principle in \({\mathbb {R}}^{\mathbb {N}}\)
In this section, we present a superposition principle for continuity equations over \({\mathbb {R}}^{{\mathbb {N}}}\) with an additional weighted integrability condition on the associated vector fields.
Following [2, Section 7], we equip \({\mathbb {R}}^{{\mathbb {N}}}\) with the product topology and denote by \(\pi _n:=(p_1,\dots ,p_n)\) the canonical projections. The space \(AC_w([0,T];{\mathbb {R}}^{\mathbb {N}})\) consists of curves \(\eta \) such that \(p_i\circ \eta \in AC[0,T]\) for all \(i\in {\mathbb {N}}\). Note that both \({\mathbb {R}}^{\mathbb {N}}\) and \(C([0,T];{\mathbb {R}}^{\mathbb {N}})\) are Polish spaces. Moreover, let \(|\cdot |_{\infty }\) be the uniform norm on \({\mathbb {R}}^{{\mathbb {N}}}\).
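Smooth n-cylindrical functions with compact support \(f:{\mathbb {R}}^{\mathbb {N}}\rightarrow {\mathbb {R}}\) are given in the form of
\[ f(x)=\phi (\pi _n(x))=\phi (p_1(x),\dots ,p_n(x)), \]
with \(\phi \in C_c^{\infty }({\mathbb {R}}^n\rightarrow {\mathbb {R}})\), and their gradient is defined by
\[ \nabla f(x):=\bigl (\partial _1\phi (\pi _n(x)),\dots ,\partial _n\phi (\pi _n(x)),0,0,\dots \bigr ). \]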
We set \(\textrm{Cyl}_c({\mathbb {R}}^{\mathbb {N}})\) as the union over \(n\in {\mathbb {N}}\) of all smooth n-cylindrical functions with compact support.
In the following, we consider pairs \((\nu ,\varvec{c})\), where \((\nu _t)_{t\in [0,T]} \subset \mathcal {P}({\mathbb {R}}^{\mathbb {N}})\) is a weakly continuous family of probability measures and \(\varvec{c}:[0,T]\times {\mathbb {R}}^{\mathbb {N}}\rightarrow {\mathbb {R}}^{\mathbb {N}}\) is a Borel vector field satisfying
and all \(0\le s\le t\le T\).
We then have the following result.
Theorem B.1
Let \((\nu ,\varvec{c})\) be as above. Furthermore, suppose that for some \(M>0\)
Then there exists a Borel probability measure \(\lambda \) over \(C([0,T];{\mathbb {R}}^{\mathbb {N}})\) that satisfies \((e_t)_{\#} \lambda =\nu _t\) for all \(t\in [0,T]\) and is concentrated on the family of curves \(\gamma \in AC([0,T];{\mathbb {R}}^{\mathbb {N}})\) that satisfy
The proof of Theorem B.1 combines a slight adaptation of the proof for the superposition principle in \({\mathbb {R}}^{\mathbb {N}}\) found in [2, Theorem 7.1], developed for use in metric measure spaces, with a finite-dimensional result for vector fields over \({\mathbb {R}}^n\) found in [1, Theorem 4.4]. Due to the strong similarities with the proof found in [2], we merely give a brief sketch.
Proof
By tightness of \(\nu _0\), we can choose a sequence of coercive functionals \(\Phi _i\) such that
and consider the functional \(\mathcal {A}{}:C([0,T];{\mathbb {R}}^{\mathbb {N}})\rightarrow [0,+\infty ]\) given by
It is clear that \(\mathcal {A}\) is coercive in \(C([0,T];{\mathbb {R}}^{\mathbb {N}})\), and its sublevel sets contain curves that are absolutely continuous with respect to \(|\cdot |_{\infty }\). This follows from the fact that \(\sup _{t\in [0,T]}{|p_1\circ \eta |}\) is bounded on the sublevel sets of the functional
Now, for every \(n\in {\mathbb {N}}\), we define the marginals \(\mathcal {P}({\mathbb {R}}^n)\ni \nu ^n_t:=(\pi _n)_{\#}\nu _t\) and corresponding vector fields \(\varvec{c}_{t}^n: {\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) by
Note that \((\nu ^n,\varvec{c}_{{}}^n)\) satisfies the continuity equation in \({\mathbb {R}}^n\). By Jensen’s inequality, and the fact that \(|z_1|\le |z|\le n |z|_{\infty }\), for \(z=(x_1,\ldots ,x_n)\in {\mathbb {R}}^n\), we have that
and in particular
Hence, we can apply the finite-dimensional result [1, Theorem 4.4]. Embedding this into \({\mathbb {R}}^{\mathbb {N}}\), we obtain the probability measure \(\lambda ^n\) over \(C([0,T];{\mathbb {R}}^{\mathbb {N}})\), concentrated on absolutely continuous curves satisfying \(\dot{\gamma }=\varvec{c}_t^n(\gamma )\), and such that \((e_t)_{\#}\lambda ^n=\nu _t^n\). We immediately see that
which yields the tightness of \(\lambda ^n\).
Consider any converging sequence \(\lambda ^n\) (up to renumbering) and its limit \(\lambda \in \mathcal {P}(C([0,T];{\mathbb {R}}^{\mathbb {N}}))\). Since the sequence \((\nu _t^n)_{n\in {\mathbb {N}}}\) clearly converges to \(\nu _t:=(e_t)_{\#}\lambda \) in \(\mathcal {P}({\mathbb {R}}^{\mathbb {N}})\) for every \(t\in [0,T]\), it remains to show that \(\lambda \) is concentrated on solutions of \(\dot{\gamma }=\varvec{c}_t(\gamma )\). In fact, we will show that
Note that it suffices to show that for any vector field \(d:[0,T]\times {\mathbb {R}}^{\mathbb {N}}\rightarrow {\mathbb {R}}\) with \(d_t\) being k-cylindrical for every \(t\in [0,T]\), we have that
since then we can use the density of time-dependent cylindrical functions in \(L^1((1+|{x}_1|)^{-1} \nu _s \,\textrm{d}s)\) and the fact that for all s it holds that \(|p_1\circ \gamma (s)|\le \Vert p_1\circ \gamma \Vert _{\infty }\).
To prove (B.1), recall that \(\lambda ^n\) is concentrated on absolutely continuous solutions of \(\dot{\gamma }_s=\varvec{c}^n_s(\gamma _s)\). Hence,
Note that the integrand on the left-hand side is continuous in \(\gamma \). Therefore, since for \(n\ge k\)
the result then follows after taking the limit \(n\rightarrow \infty \). \(\square \)
Remark B.2
If one is only interested in curves in \(AC_w([0,T];{\mathbb {R}}^{\mathbb {N}})\), the theorem also holds whenever
The finite dimensional analog of this statement, set in \({\mathbb {R}}^n\) with the prefactor \((1+|x|)^{-1}\), is presented in [1, Theorem 4.4]. Moreover, for \({\mathbb {R}}^{\mathbb {N}}\), in [2, Theorem 7.1] the condition reads as
Appendix C: Non-continuous competition kernel
In the proof of Theorem 5.1 we require the vague convergence of \(\vartheta _{\textsf{P}^n}^{\pm }\) and \(\textsf{T}^{n,\pm }_{\#}\vartheta _{\textsf{P}^n}^{\pm }\) under the assumption of narrow convergence of \(\textsf{P}^n\) and equiboundedness of the free energy functionals \(\mathcal {F}_n\), where
If the competition kernel c is continuous, the desired statement would follow directly from the narrow convergence of \(\textsf{P}^n\). The case of merely bounded measurable c is however less trivial. Note that the strategy we employed in the proof of Theorem 3.20 is not possible, since although for every fixed n the sub-levels of \(\mathcal {F}_n\) are sequentially compact with respect to setwise convergence, this is not the case for equibounded sets of \(\{\mathcal {F}_n\}_{n\ge 1}\).
Fortunately, due to the connection between \(\varGamma \)-convergence of \(\mathcal {F}_n\) and large deviations as discussed in Appendix A, we can modify results from the authors’ earlier work on large deviations for interacting systems induced by singular or irregular functionals [19]. In particular, we obtain the following convergence statement.
Theorem C.1
Let \(\{\textsf{P}^n\}_{n\ge 1}\subset \mathcal {P}(\Gamma )\) be a sequence narrowly converging to \(\textsf{P}\in \mathcal {P}(\Gamma )\) with
Then for any \(\omega \in C_c(\Gamma \times \mathcal {T})\) and \(g \in \mathcal {B}_b(\mathcal {T}^2)\)
Remark C.2
The result can be easily generalized to bounded measurable functions \(g\in \mathcal {B}_b(\mathcal {T}^k)\) for finite \(k\in {\mathbb {N}}\), but we restrict ourselves to the case \(k=2\).
Corollary C.3
Let \(\{\textsf{P}^n\}_{n\ge 1}\subset \mathcal {P}(\Gamma )\) be a sequence narrowly converging to \(\textsf{P}\in \mathcal {P}(\Gamma )\) such that
Then vaguely
Proof
The first statement of (C.1) follows directly from Theorem C.1 by substituting \(g:=c\). Moreover, by the uniform continuity and compact support of any \(\omega \in C_c(\Gamma \times \mathcal {T})\) we have
and a similar approach works for \(\textsf{T}^{n,-}_{\#} \vartheta _{\textsf{P}^n}^{-}\). \(\square \)
For the proof of Theorem C.1 we will need some a priori bounds. Namely, recall from Sect. 5.1 the generating functionals and their limit
For the “interacting” case, namely functionals of the form
there is however a problem with the unboundedness of the mass of \(\nu \). Nevertheless, upon controlling the mass we can provide the following technical estimate.
Lemma C.4
Let \(F(\nu ):=h(\nu (\mathcal {T})) \langle g,\nu ^{\otimes 2}\rangle \) with \(\textrm{supp}(h)\subset [0,K]\) and \(g\in \mathcal {B}_b(\mathcal {T}^2)\). Then
and in particular
Proof
Suppose that
and let us consider the following interaction energy functional:
By a Hoeffding decomposition argument, see [19, Lemma 3.8], we have for every \(N\ge 2\), \(\alpha \ge 0\) the estimate
Moreover, since \(N/(N-1)\le 2\) for \(N\ge 2\), and
we find that
Recall that \(L_n(x_1,\dots ,x_N):=\tfrac{1}{n}\sum \delta _{x_i}\). Since the mass \(L_n(x_1,\dots ,x_N)(\mathcal {T})=N/n\) is bounded by K on the support of F we have for \(N\ge 2\):
while for \(N=1\) we have the trivial estimate \(|F|(L_n)\le \tfrac{K}{n} \Vert h\Vert _{\infty } \Vert g\Vert _{\infty }\), and hence for all \(N\ge 1\),
Using the representation for \(\Pi _n\) we can therefore estimate
which proves (C.2). The final desired statement follows directly after taking limits. \(\square \)
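The objects entering Lemma C.4 are easy to evaluate on a finite trait set. The following is a minimal sketch, assuming traits labelled \(0,\dots ,d-1\), a kernel g stored as a matrix, and a hypothetical cut-off h with support in \([0,K]\) and sup-norm 1; the helper names and all numerical values are illustrative assumptions. It computes the rescaled empirical measure \(L_n(x_1,\dots ,x_N)=\tfrac{1}{n}\sum \delta _{x_i}\) and the functional \(F(\nu )=h(\nu (\mathcal {T}))\langle g,\nu ^{\otimes 2}\rangle \), which on the support of F is bounded by \(K^2\Vert h\Vert _{\infty }\Vert g\Vert _{\infty }\) because the mass \(N/n\) is at most K there.

```python
import numpy as np

def empirical_measure(traits, n, num_traits):
    """L_n(x_1, ..., x_N) = (1/n) * sum_i delta_{x_i}, as weights on {0, ..., num_traits - 1}."""
    counts = np.bincount(traits, minlength=num_traits)
    return counts / n

def interaction_functional(nu, g, h):
    """F(nu) = h(nu(T)) * <g, nu (x) nu> for a kernel g given as a matrix."""
    return h(nu.sum()) * (nu @ g @ nu)

K = 2.0

def h(mass):
    # Hypothetical cut-off with supp(h) contained in [0, K] and sup-norm 1.
    return max(0.0, 1.0 - mass / K)

rng = np.random.default_rng(0)
num_traits, n = 4, 50
g = rng.uniform(-1.0, 1.0, size=(num_traits, num_traits))  # bounded kernel, no continuity needed

values = {}
for N in (1, 10, 60, 99, 150):
    traits = rng.integers(0, num_traits, size=N)
    nu = empirical_measure(traits, n, num_traits)
    values[N] = interaction_functional(nu, g, h)
```

For \(N=150\) the mass \(N/n=3\) exceeds K, so the cut-off makes the functional vanish, which is the mechanism that controls the unbounded mass in the lemma.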
With the above estimate in hand, we can now prove our convergence statement by approximating g with a sequence of continuous \(g_{\varepsilon }\) such that
The existence of such a sequence follows similarly as for density statements in \(L^p(\gamma )\), see for example [19, Theorem C.5].
Proof of Theorem C.1
Consider a \(g\in \mathcal {B}_b(\mathcal {T}^2)\) and let \(\{g_{\varepsilon }\}_{\varepsilon >0}\subset C_b(\mathcal {T}^2)\) be a sequence approximating g in the sense of (C.3). Note that by the narrow convergence of \(\textsf{P}^n\) we have for any \(\omega \in C_c(\Gamma \times \mathcal {T})\) and any \(\varepsilon >0\) that
Note that by the compact support of \(\omega \), it suffices to show that for every \(K>0\):
Let us consider (C.4a), and set
By duality of the entropy and Lemma C.4, we have for every \(n\ge 1,\varepsilon>0,K>0\), and \(\beta >0\),
Taking subsequently the limits \(n\rightarrow \infty \) and \(\varepsilon \rightarrow 0\), we deduce
But, since \(\beta >0\) was arbitrary, we conclude that the right-hand side reduces to zero.
Similarly, for (C.4b), let
Then by duality and Lemma 5.6 we obtain
where the last inequality follows by applying Jensen’s inequality inside the exponential. Again taking the limit \(\varepsilon \rightarrow 0\) and thereafter \(\beta \rightarrow \infty \), we conclude the proof. \(\square \)
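The duality of the entropy used twice in this proof is, in its generic form, the inequality \(\langle F,\textsf{P}\rangle \le \beta ^{-1}\bigl (\mathcal {E}\textrm{nt}(\textsf{P}|\Pi )+\log \int e^{\beta F}\,\textrm{d}\Pi \bigr )\) for every \(\beta >0\), with equality for the exponentially tilted measure. The following is a minimal sketch on a finite state space, where probability measures are vectors; the function names and the random test data are assumptions made for illustration, and the scaling in n used in the paper is deliberately omitted.

```python
import numpy as np

def relative_entropy(p, q):
    """Ent(p | q) = sum_i p_i * log(p_i / q_i), with the convention 0 * log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def duality_bound(p, q, f, beta):
    """(Ent(p | q) + log sum_i q_i * exp(beta * f_i)) / beta, an upper bound for <f, p>."""
    log_mgf = np.log(np.sum(q * np.exp(beta * f)))
    return (relative_entropy(p, q) + log_mgf) / beta

def gibbs_tilt(q, f, beta):
    """Exponentially tilted measure proportional to q * exp(beta * f); it attains equality."""
    w = q * np.exp(beta * f)
    return w / w.sum()

# Hypothetical data: two random probability vectors and a random observable.
rng = np.random.default_rng(1)
q = rng.dirichlet(np.ones(6))
p = rng.dirichlet(np.ones(6))
f = rng.normal(size=6)

gaps = {beta: duality_bound(p, q, f, beta) - p @ f for beta in (0.1, 1.0, 10.0)}
```

Each entry of `gaps` is non-negative. In the proof the free parameter \(\beta \) is sent to infinity at the end, which is what removes the entropy term from the estimate.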
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Hoeksema, J., Tse, O. Generalized gradient structures for measure-valued population dynamics and their large-population limit. Calc. Var. 62, 158 (2023). https://doi.org/10.1007/s00526-023-02500-y