1 Introduction

An important goal in theoretical biology and population dynamics is to derive macroscopic equations from microscopic models [7, 14]. For many stochastic interacting particle systems involving birth, mutation, and death, these connections have been made rigorous. One such class of particle systems consists of spatially-structured models such as the Bolker–Pacala and Dieckmann-Law (BPDL) model [5, 24]. The dynamics of these particle systems can be described by jump processes on the space of finite positive measures and can be used to derive macroscopic models.

The convergence of such measure-valued jump processes under a mean-field scaling to a large-population limit is shown for example in [18] via martingale techniques, and in [14], where an analytic approach to the convergence of rescaled moment equations is used. In both approaches, the limiting evolution is governed by a non-local evolution equation given by

$$\begin{aligned} \partial _t u_t(x) =\int _{{\mathbb {R}}^d} m(y,x)\,u_t(y) \,\textrm{d}y-u_t(x) \int _{{\mathbb {R}}^d} c(x,y)\, u_t(y) \, \textrm{d}y. \end{aligned}$$
(1.1)

We will refer to (1.1) as the mean-field equation. Here, \(u_t\) represents the limiting density of particles at time t, and the parameter functions m and c are continuous and bounded functions stemming from birth, dispersal, and competition in the BPDL model.

In recent years, there has been considerable activity in studying the mean-field Eq. (1.1) and the BPDL model in more general spaces, allowing for dynamics involving multiple species and combinations of discrete and continuous traits. See for example [16] for an overview of existing models, where instead of \({\mathbb {R}}^d\) the underlying space is an arbitrary locally compact Polish space. However, convergence in the large population limit is not considered.

Meanwhile, powerful variational tools have been developed in the last decade for studying mean-field interacting jump processes and their limits under the assumption of detailed balance. To highlight only a few: [11] studied mean-field limits for measure-dependent jump processes; [12] proved the convergence of the spatially-homogeneous Kac-process to the Boltzmann equation; [35] investigated the macroscopic limit of Becker–Döring models; [21] showed hydrodynamic limits for zero-range and exclusion processes; [28] discussed convergence and higher-order approximations for chemical reaction networks, an approach that was subsequently used in the setting of discretized reaction-diffusion equations in [31].

In this work, we extend and apply these variational techniques to prove the mean-field limit for population dynamics over arbitrary compact Polish spaces, with bounded measurable parameters \(m\) and \(c\) satisfying a detailed balance condition. In addition, we establish entropic propagation of chaos, which controls the discrepancy between the microscopic and macroscopic models in a precise sense. To the authors’ knowledge, this is the first convergence result under such general assumptions.

To do so, we first introduce a new generalized gradient structure and rigorous variational formulation for the forward Kolmogorov equation (FKE) corresponding to the BPDL model, where the FKE describes the evolution of the law of the measure-valued process. Our formulation not only incorporates the equation itself but also tracks the birth and death fluxes. This extends the generalized gradient-flow framework of [33] to accommodate the unboundedness of the underlying jump kernel and the positivity of the fluxes.

We then show the convergence of these generalized gradient structures under a mean-field scaling and the large-population limit in the sense of Energy Dissipation Principles (EDPs) (see [25]). The limiting gradient flow is the Liouville equation corresponding to the mean-field equation, namely a transport equation that describes the evolution of the law of a process that follows deterministic dynamics described by (1.1) but for possibly random initial conditions. This connection between the Liouville equation and the mean-field equation is made rigorous with the help of a modification of the superposition principle of [1].

In particular, we deduce that the laws determined by the FKE concentrate around the solution of the mean-field equation (1.1). Owing to the convergence of the associated free energies, this translates into an entropic propagation of chaos result, see Theorem 1.10.

Outline The rest of this section is devoted to giving a brief overview of our setting and presenting the main results. In Sect. 2 the mean-field equation and corresponding gradient structure are introduced. We repeat this process in Sects. 3 and 4 for the forward Kolmogorov equation and the Liouville equation respectively, with the proof of a modified superposition principle delegated to “Appendix B”. Finally, in Sect. 5, we establish the EDP-convergence of the gradient structures and prove both the convergence to the mean-field limit and the propagation of chaos.

1.1 Measure-valued population dynamics and mean-field limits

We consider the forward Kolmogorov equation that corresponds to a generalized version of the BPDL model. In its classical form, the Bolker–Pacala model is a purely spatially-structured microscopic model for a population of plants involving birth, dispersal, and either natural death or death by competition for resources, and can be modeled as a jump process in the space of positive measures over \({\mathbb {R}}^d\). However, in certain models of adaptive evolution, it is the mutation of traits that plays a role, instead of spatial evolution (see [7, 8, 24]). Moreover, if one wants to model multiple interacting species or marked configuration spaces, more general spaces than \({\mathbb {R}}^d\) are needed (see [16, 22]).

Therefore, let the trait space be an arbitrary Polish space, denoted henceforth as \(\mathcal {T}\). We model the BPDL-dynamics at any time t as an interacting particle system with particles with labels \(A_t^1,\dots ,A_t^{N_t} \) and traits \(X_t^1,\dots ,X_t^{N_t} \in \mathcal {T}\), where the number of particles \(N_t\) at time t is not fixed since particles can be removed from and added to the system.

Moreover, let \(b\in \mathcal {B}^+(\mathcal {T})\), \(d,c\in \mathcal {B}^+(\mathcal {T}\times \mathcal {T})\) be non-negative measurable functions, \(n>0\) a positive parameter, and \(\gamma \in \mathcal {M}^+_{loc}(\mathcal {T})\) a non-negative reference measure such that

$$\begin{aligned} \int _{\mathcal {T}} d(x,y)\, \gamma (\textrm{d}y) = 1, \qquad \text{ for } \text{ all } x\in \mathcal {T}. \end{aligned}$$

Then the BPDL dynamics can be described as follows:

  • Each particle with trait \(x\in \mathcal {T}\) has two exponential clocks: a seed clock with rate b(x) and a death clock with rate \(\tfrac{1}{n}\sum _{i=1}^{N_t} c(x,X_t^i)\).

  • If the death clock rings, the particle is deleted.

  • If the seed clock rings, a new particle is added with trait \(y\in \mathcal {T}\) with probability \(d(x,y)\gamma (\textrm{d}y)\).

Alternatively, we can describe these dynamics in the form of reacting particles. Namely, setting \(m(x,y):=b(x)d(x,y)\), with a slight abuse of notation we have

$$\begin{aligned} \begin{aligned} A^i_t&\rightarrow A^i_t+A^{N_t+1}_t,\quad&\hbox {with rate} \quad&m\left( X_t^i,X_t^{N_t+1}\right) \gamma \left( X_t^{N_t+1}\right) ,\\ A^i_t+A^j_t&\rightarrow A^j_t, \quad&\hbox {with rate} \quad&n^{-1}c\left( X^i_t,X^j_t\right) . \end{aligned} \end{aligned}$$
(1.2)

We will refer to m as the mutation kernel, and c as the competition kernel. The parameter \(n>0\) is called the system size: the scaling \(n^{-1}c\) guarantees that if the number of particles in the system is of order n, then the total rate at which particles are created or deleted is of the same order.
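To make the clock description concrete, the following is a minimal sketch of a Gillespie-type simulation of these dynamics. It assumes a finite trait space \(\{0,\dots ,K-1\}\), so that \(\gamma \), m and c become a weight vector and two matrices, and it stores the particle system as a vector of counts per trait. The function names `birth_death_rates` and `simulate` and this representation are illustrative assumptions, not part of the model's definition.

```python
import random

def birth_death_rates(counts, m, c, gamma, n):
    """Per-trait total birth and death rates for the population `counts`.

    Assumes a finite trait space {0, ..., K-1}: m and c are K x K nested
    lists, gamma is the list of reference weights, n is the system size.
    """
    K = len(counts)
    # A particle with trait y is born when some parent x seeds it:
    # total rate sum_x counts[x] * m[x][y] * gamma[y].
    birth = [gamma[y] * sum(counts[x] * m[x][y] for x in range(K))
             for y in range(K)]
    # Each of the counts[x] particles with trait x dies at rate
    # (1/n) * sum_y c[x][y] * counts[y].
    death = [counts[x] * sum(c[x][y] * counts[y] for y in range(K)) / n
             for x in range(K)]
    return birth, death

def simulate(counts, m, c, gamma, n, t_end, rng):
    """Run the jump process up to time t_end; return the measure counts / n."""
    counts = list(counts)
    K = len(counts)
    t = 0.0
    while True:
        birth, death = birth_death_rates(counts, m, c, gamma, n)
        total = sum(birth) + sum(death)
        if total <= 0.0:
            break  # no clock can ring (e.g. empty population)
        t += rng.expovariate(total)  # waiting time until the next event
        if t > t_end:
            break
        events = ([(+1, x, birth[x]) for x in range(K)]
                  + [(-1, x, death[x]) for x in range(K)])
        r = rng.random() * total
        for sign, x, rate in events:
            if rate <= 0.0:
                continue
            chosen = (sign, x)  # fallback guards against round-off
            if r < rate:
                break
            r -= rate
        sign, x = chosen
        counts[x] += sign
    return [k / n for k in counts]
```

A call such as `simulate([200, 200], m, c, [0.5, 0.5], n=200, t_end=20.0, rng=random.Random(0))` returns the rescaled empirical measure at the final time as a list of weights. In this finite setting, with \(m(y,x)=c(x,y)\) and \(c(x,x)=0\), one would expect the output to fluctuate around \(\gamma \) for large n.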

Instead of looking at the individual traits of the particles, it is common to only consider the measure-valued process \(\nu _t^n\) given by the rescaled empirical measure

$$\begin{aligned} \nu _t^n:=\frac{1}{n}\sum _{i=1}^{N_t} \delta _{X_t^i}. \end{aligned}$$
(1.3)

Here, \(\nu _t^n\in \Gamma :=\mathcal {M}^+(\mathcal {T})\) with \(\mathcal {M}^+(\mathcal {T})\) the space of finite non-negative measures. The infinitesimal generator \(Q_n\) of this process is given for all \(F\in C_c(\Gamma )\) by

$$\begin{aligned} (Q_n F)(\nu )= & {} n \int _{\mathcal {T}} \left( F\left( \nu +\tfrac{1}{n}\delta _x\right) -F(\nu )\right) \, \kappa ^+[\nu ](\textrm{d}x)\\{} & {} +n \int _{\mathcal {T}} \left( F\left( \nu -\tfrac{1}{n}\delta _x\right) -F(\nu )\right) \, \kappa ^-[\nu ](\textrm{d}x), \end{aligned}$$

where \(\kappa ^{\pm }[\nu ]\in \Gamma \) are the measure-dependent birth/death-kernels

$$\begin{aligned} \kappa ^+[\nu ](\textrm{d}x):= \left( \int _{y\in \mathcal {T}} m(y,x)\nu (\textrm{d}y)\right) \gamma (\textrm{d}x),\qquad \kappa ^-[\nu ](\textrm{d}x):= \left( \int _{y\in \mathcal {T}} c(x,y) \nu (\textrm{d}y)\right) \nu (\textrm{d}x). \end{aligned}$$

The law of the process is now given by the corresponding forward Kolmogorov equation

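$$\begin{aligned} \partial _t \textsf{P}^n_t = Q_n^*\, \textsf{P}^n_t, \end{aligned}$$
(\(\mathsf FKE_n\))

where \(\textsf{P}^n_t\in \mathcal {P}(\Gamma )\) denotes the law of \(\nu ^n_t\) and \(Q_n^*\) the formal adjoint of \(Q_n\).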

Depending on the setting, this formulation can be made rigorous in various ways: for example via an analytical approach on configuration spaces as done in [14], which in fact models infinite configurations of particles over \({\mathbb {R}}^d\), or via martingale techniques with \(\mathcal {T}\) a closed subset of \({\mathbb {R}}^d\) and \(\gamma =\mathscr {L}^d|_{\mathcal {T}}\) (see [18]). Moreover, in the latter, under the assumption of continuous, bounded, and integrable mutation/competition kernels, it is also shown that the process converges in the large-population limit \(n\rightarrow \infty \) to the mean-field Eq. (1.1), which can be rewritten as

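$$\begin{aligned} \partial _t \nu _t = \kappa ^+[\nu _t]-\kappa ^-[\nu _t], \qquad \nu _t(\textrm{d}x)=u_t(x)\,\gamma (\textrm{d}x), \end{aligned}$$
(\(\mathsf MF\))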

i.e. \(u_t\) is the density of \(\nu _t\) with respect to \(\gamma \).
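As an illustration of the mean-field equation, the sketch below integrates \(\partial _t \nu _t=\kappa ^+[\nu _t]-\kappa ^-[\nu _t]\) with an explicit Euler scheme. It again assumes a finite trait space with strictly positive weights \(\gamma _x\), so that measures are weight vectors. The helper names and the choice of scheme are assumptions made for this example only. `relative_entropy` evaluates \(\mathcal {E}\textrm{nt}(\nu |\gamma )\) with \(\phi (s)=s\log s-s+1\).

```python
import math

def kappa_plus(nu, m, gamma):
    """Birth kernel: kappa^+[nu](x) = gamma[x] * sum_y m[y][x] * nu[y]."""
    K = len(nu)
    return [gamma[x] * sum(m[y][x] * nu[y] for y in range(K)) for x in range(K)]

def kappa_minus(nu, c):
    """Death kernel: kappa^-[nu](x) = nu[x] * sum_y c[x][y] * nu[y]."""
    K = len(nu)
    return [nu[x] * sum(c[x][y] * nu[y] for y in range(K)) for x in range(K)]

def mean_field_step(nu, m, c, gamma, dt):
    """One explicit Euler step of d(nu)/dt = kappa^+[nu] - kappa^-[nu]."""
    kp, km = kappa_plus(nu, m, gamma), kappa_minus(nu, c)
    return [v + dt * (p - q) for v, p, q in zip(nu, kp, km)]

def relative_entropy(nu, gamma):
    """Ent(nu | gamma) = sum_x phi(nu[x] / gamma[x]) * gamma[x]."""
    total = 0.0
    for v, g in zip(nu, gamma):
        s = v / g
        # phi(s) = s log s - s + 1, extended by phi(0) = 1
        total += g * (s * math.log(s) - s + 1.0 if s > 0.0 else 1.0)
    return total

def integrate(nu0, m, c, gamma, dt, steps):
    """Return the Euler trajectory [nu_0, nu_1, ..., nu_steps]."""
    path = [list(nu0)]
    for _ in range(steps):
        path.append(mean_field_step(path[-1], m, c, gamma, dt))
    return path
```

If \(m(y,x)=c(x,y)\), the right-hand side in this finite setting reduces to \(C_x(\nu )(\gamma _x-\nu _x)\) with \(C_x(\nu )=\sum _y c(x,y)\nu _y\). Hence \(\gamma \) is stationary and, for a sufficiently small step size, the relative entropy should decrease along the discrete trajectory.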

While different choices of scalings are possible, the mean-field equation describes the macroscopic properties of the measure-valued process when the population is large. An alternative way is to study the evolution of the moments, which form a hierarchy similar to the BBGKY-hierarchy of correlation functions, and under the so-called Vlasov scaling the first moment or correlation function converges to (\(\mathsf MF\)). For the case of infinite configurations over \({\mathbb {R}}^d\) this has been established, see [15], and both propagation of chaos in the Vlasov limit and the sub-Poissonian property have been established as well [17].

In this work, we do not consider the measure-valued process itself, but take the forward Kolmogorov equation (\(\mathsf FKE_n\)) as a starting point, and show convergence to the mean-field equation in the sense that \(\textsf{P}^n_t\rightarrow \delta _{\nu _t}\) narrowly on \(\mathcal {P}(\Gamma )\) under suitable initial conditions. Throughout we equip the space \(\Gamma \) with the narrow topology, and assume the following:

Assumption 1.1

The trait space \(\mathcal {T}\) is a compact Polish space, and moreover

$$\begin{aligned} \begin{aligned} \gamma&\in \Gamma \qquad{} & {} {} & {} \hbox {(reference measure with finite mass)}\\ m,c&\in \mathcal {B}_b^+(\mathcal {T}\times \mathcal {T}) \qquad{} & {} {} & {} \hbox {(bounded rates)}\\ c(x,x)&=0 \quad{} & {} \text{ for } \text{ all } x\in \mathcal {T}\qquad{} & {} \hbox {(no natural death)}\\ m(y,x)&=c(x,y) \quad{} & {} \text{ for } \text{ all } x,y\in \mathcal {T}\qquad{} & {} \hbox {({mean-field} detailed balance)}\\ \end{aligned} \end{aligned}$$

The assumption of no natural death means that particles can only be deleted due to competition with other particles. Moreover, with a bit of abuse of notation, the two conditions \(c(x,x)=0\) for all \(x\in \mathcal {T}\) and \(m(y,x)=c(x,y)\) for all \(x,y\in \mathcal {T}\) together will be referred to as the detailed balance condition, because they imply that the jump kernel \({{\bar{\kappa }}}_n\) (defined in Sect. 3) corresponding to the measure-valued process satisfies the detailed balance condition with respect to an invariant measure \(\Pi _n\), i.e.

$$\begin{aligned} \Pi _n(\textrm{d}\nu ) {{\bar{\kappa }}}_n(\nu ,\textrm{d}\eta )=\Pi _n(\textrm{d}\eta ) \bar{\kappa }_n(\eta ,\textrm{d}\nu ). \end{aligned}$$
(1.4)

Here \(\Pi _n\) is obtained as a push-forward of the Poisson measure \(\pi _n\), with

$$\begin{aligned} \mathcal {P}\left( \coprod _{N\ge 1}\mathcal {T}^N\right) \ni \pi _n:=\frac{1}{e^{n \gamma (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}\gamma ^{\otimes N}, \end{aligned}$$

under the rescaled empirical measure mapping determined by (1.3), see (3.2). This allows us to write the forward Kolmogorov equation as a gradient flow of the relative entropy with respect to \(\Pi _n\), and equip it with a corresponding variational structure, see Theorem 1.6.
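As a formal sanity check of (1.4), consider the illustrative special case of a finite trait space \(\mathcal {T}=\{1,\dots ,K\}\) with weights \(\gamma _x>0\), and read the transition rates directly off the generator \(Q_n\). Both choices are assumptions made for this example and do not replace the construction of Sect. 3. A state \(\nu =\tfrac{1}{n}\sum _x k_x\delta _x\) is then described by its counts \(k_x\), the push-forward of \(\pi _n\) satisfies \(\Pi _n(\nu )\propto \prod _x (n\gamma _x)^{k_x}/k_x!\), and using \(c(x,x)=0\) one finds

$$\begin{aligned} \Pi _n(\nu )\; n\,\kappa ^+[\nu ](\{x\})&=\Pi _n(\nu )\,\gamma _x\sum _{y} m(y,x)\,k_y,\\ \Pi _n\left( \nu +\tfrac{1}{n}\delta _x\right) \; n\,\kappa ^-\left[ \nu +\tfrac{1}{n}\delta _x\right] (\{x\})&=\Pi _n(\nu )\,\frac{n\gamma _x}{k_x+1}\cdot \frac{k_x+1}{n}\sum _{y} c(x,y)\,k_y =\Pi _n(\nu )\,\gamma _x\sum _{y} c(x,y)\,k_y. \end{aligned}$$

The two expressions coincide for every count vector as soon as \(m(y,x)=c(x,y)\), which is the finite-space instance of (1.4).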

However, the condition \(m(y,x)=c(x,y)\) for all \(x,y\in \mathcal {T}\) alone suffices to express the mean-field equation as a gradient flow (cf. Theorem 1.4), and will therefore be referred to as the mean-field detailed balance condition.

In light of similar results in [11, 28] for mean-field jump processes on finite spaces and finite chemical reaction networks, one expects (\(\mathsf FKE_n\)) to converge to the following Liouville equation

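$$\begin{aligned} \partial _t \textsf{P}_t+\textrm{div}_{\Gamma }\Big (\textsf{P}_t\,\big (\kappa ^+[\nu ]-\kappa ^-[\nu ]\big )\Big )=0, \end{aligned}$$
(\(\mathsf Li\))

understood in the weak sense, with \(\textrm{div}_{\Gamma }\) in duality with the gradient \(\textrm{grad}_{\Gamma }\) of (1.7).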

It is a transport equation that can be interpreted as the lifting of mean-field dynamics in \(\Gamma \) to evolutions in \(\mathcal {P}(\Gamma )\) and describes the evolution of the law of random measures \(\nu _t\) that all satisfy (\(\mathsf MF\)). In particular, if \(\nu _t\) is a solution of (\(\mathsf MF\)) then \(\textsf{P}_t:=\delta _{\nu _t}\) is itself a solution of (\(\mathsf Li\)).

It turns out that in our general setting, this convergence holds as well, as will be stated in Theorem 1.9. Letting \(V[\nu ]=\kappa ^+[\nu ]-\kappa ^-[\nu ]\), we can therefore represent part of our results in Fig. 1.

Fig. 1 Convergence in the large-population limit

This convergence is a direct consequence of the convergence of the associated gradient structures, which we will describe below.

1.2 Gradient-flow formulation

Our first main result concerns the variational formulation of the equations (\(\mathsf FKE_n\)), (\(\mathsf MF\)), (\(\mathsf Li\)) and their specific gradient structure. Various gradient-flow formulations exist for jump processes, mean-field jump processes, and chemical reaction networks [11, 12, 21, 28, 33].

In these works, a common starting point is to describe the relation between \(\rho _t\), representing either the law of some process or a mean-field limit, and generalized fluxes \(j_t\) in the form of an abstract continuity equation. For example, in the case of independent particles following a common jump process over a graph, \(\rho _t\) corresponds to the number of particles on a node at time t, and one possible choice for \(j_t\) is the so-called net flux, which is related to the number of particles going through an edge.

However, we propose a slightly different structure, namely one that tracks the effective mass fluxes for creation (arising from mutation) and annihilation (arising from competition) separately. The use of mass fluxes instead of the usual particle fluxes ensures that in our convergence results as \(n\rightarrow \infty \) we obtain convergence of both the laws and the fluxes (see Theorem 1.8).

Moreover, separating the effects of creation and annihilation (henceforth simply referred to as birth and death) instead of their combined contribution allows us to incorporate more information in our variational formulation. The downside is that we are forced to work with positive fluxes, while the framework in the aforementioned examples involves either quadratic or generalized structures for signed net fluxes. In this sense we are closer to the variational representations stemming from large deviations, involving so-called one-way or unidirectional fluxes, see for example [3, 30, 32, 34]. Indeed, our structure is motivated by large deviation theory, as we will discuss briefly in “Appendix A”.

In all three cases, i.e. for (\(\mathsf FKE_n\)), (\(\mathsf MF\)) and (\(\mathsf Li\)), the proposed structure is similar to the classical notion of a gradient flow in the sense that the solutions satisfy an abstract Energy-Dissipation Balance. Since we will repeat the same concept three times on different levels and for different spaces, let us first make the general and abstract concepts clear:

Formal Definition 1.2

Given a free energy functional \(\mathcal {F}(\rho )\), a dissipation potential \(\mathcal {R}(\rho ,j)\), a Fisher information functional \(\mathcal {D}(\rho )\), and a linear operator B with dual \(B^*\), we consider pairs of curves \((\rho ,j)\) satisfying the continuity equation

$$\begin{aligned} \partial _t \rho _t - B^* j_t=0, \end{aligned}$$
(\(\textsf{CE}\))

and define the EDP-functional

$$\begin{aligned} \mathcal {I}(\rho ,j):=\int _0^T \mathcal {R}(\rho _t,j_t) \, \textrm{d}t + \mathcal {F}(\rho _T)-\mathcal {F}(\rho _0) +\int _0^T \mathcal {D}(\rho _t)\, \textrm{d}t. \end{aligned}$$

Moreover, a gradient-flow solution is a pair \((\hat{\rho },\hat{\jmath })\) satisfying (\(\textsf{CE}\)) with \({\mathcal {I}}(\hat{\rho },\hat{\jmath })=0\).

Throughout we require the non-negativity of \(\mathcal {I}\). For a deeper look at the mathematical basis of this sort of setting, especially for generalized gradient systems incorporating net fluxes, see [33].

In all three examples the generalized fluxes j consist of two parts: \(j^+\) and \(j^-\), corresponding to birth and death. The continuity equations depend on the setting and are summarized in Table 1, with \(\mathcal {M}^+_{loc}\) as the space of non-negative Radon measures.

Remark 1.3

Note that the gradient-flow solution \((\hat{\rho },\hat{\jmath })\) is the null-minimizer of \(\mathcal {I}\), and satisfies the energy-dissipation balance

$$\begin{aligned} \mathcal {F}({\hat{\rho }}_T)+ \int _0^T \left( \mathcal {R}(\hat{\rho }_t,\hat{\jmath }_t)+\mathcal {D}({\hat{\rho }}_t)\right) \, \textrm{d}t =\mathcal {F}({\hat{\rho }}_0). \end{aligned}$$

Moreover, with \(\langle \cdot ,\cdot \rangle \) shorthand for appropriate dual pairings, one would expect for small T that

$$\begin{aligned}\mathcal {F}({\hat{\rho }}_T)-\mathcal {F}({\hat{\rho }}_0) \approx \langle \partial _{\rho } \mathcal {F},\partial _t {\hat{\rho }}\rangle = \langle B\, \partial _{\rho }\mathcal {F},\hat{\jmath } \rangle , \end{aligned}$$

where we used the continuity equation (\(\textsf{CE}\)) and duality of \(B,B^*\), and therefore

$$\begin{aligned} \mathcal {I}\approx \mathcal {R}({\hat{\rho }},\hat{\jmath })+\langle \hat{\jmath }, B\, \partial _{\rho } \mathcal {F}\rangle +\mathcal {D}({\hat{\rho }}). \end{aligned}$$

In light of the generalized gradient-flow framework of [33] and the relation to minimizing movement schemes, a formal minimization procedure provides the gradient-flow solution

$$\begin{aligned} \begin{aligned} \partial _t {\hat{\rho }} {{-}} B^* \hat{\jmath }&=0\\ \hat{\jmath }&=(\,\partial _2 \mathcal {R}^*)({\hat{\rho }},-B\, \partial _{\rho } \mathcal {F}),\\ \end{aligned} \end{aligned}$$

and that along the solution,

$$\begin{aligned} \mathcal {D}({\hat{\rho }})=\mathcal {R}^*({\hat{\rho }},-B\, \partial _{\rho } \mathcal {F}). \end{aligned}$$
(1.5)

where \(\mathcal {R}^*(\rho ,w)\) is the dual of the dissipation potential \(\mathcal {R}\). Finally, note that along the gradient-flow solution the free energy \(\mathcal {F}\) is non-increasing, i.e. \(\mathcal {F}\) is a Lyapunov functional.

These (in)equalities indeed hold in our setting. See also “Appendix A”, where we compare the relation to generalized gradient flows for net fluxes, which follow from the above after a contraction argument, and the connection to the reversibility of the underlying process.

Table 1 Continuity equations

Let \(H(\mu _1,\mu _2)\) be the Hellinger distance, see (2.2), and \(\mathcal {E}\textrm{nt}(\mu _1|\mu _2)\) the relative entropy of \(\mu _1\) with respect to \(\mu _2\) for two (possibly infinite) locally finite Borel measures \(\mu _1,\mu _2\):

$$\begin{aligned} \mathcal {E}\textrm{nt}(\mu _1|\mu _2):=\left\{ \begin{aligned}&\int \phi \left( \frac{\textrm{d}\mu _1}{\textrm{d}\mu _2}\right) \textrm{d}\mu _2,{} & {} \hbox {if } \mu _1\ll \mu _2,\\&+\infty , \qquad{} & {} \hbox {otherwise,} \end{aligned}\right. \end{aligned}$$

where

$$\begin{aligned} \phi (s)=s \log s-s+1. \end{aligned}$$

With the full technical details contained in Theorems 2.7, 3.8 and 4.7, we then have the following three results.

Theorem 1.4

(Mean-field, cf. Theorem 2.7) Consider triples \((\nu ,\lambda ^+,\lambda ^-)\), with \(\nu _t,\lambda _t^{\pm }\in \Gamma \), satisfying the mean-field continuity equation

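$$\begin{aligned} \partial _t \nu _t = \lambda _t^+ - \lambda _t^-. \end{aligned}$$
(\(\mathscr{C}\mathscr{E}\))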

Define the dissipation potential \(\mathcal {R}_{MF}\), free energy \(\mathcal {F}_{MF}\) and Fisher information \(\mathcal {D}_{MF}\) as

$$\begin{aligned} \begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)&:=\mathcal {E}\textrm{nt}(\lambda ^+|\theta _{\nu })+\mathcal {E}\textrm{nt}(\lambda ^-|\theta _{\nu }),\\ \mathcal {F}_{MF}(\nu )&:=\tfrac{1}{2}\mathcal {E}\textrm{nt}(\nu |\gamma ),\\ \mathcal {D}_{MF}(\nu )&:=\left\{ \begin{aligned}&2H^2(\kappa ^+[\nu ],\kappa ^-[\nu ]),\qquad{} & {} \hbox {if } \nu \ll \gamma ,\\&+\infty ,\qquad{} & {} \hbox {otherwise,} \end{aligned}\right. \end{aligned} \end{aligned}$$

where \(\theta _{\nu }\) is the geometric mean of the expected birth and death fluxes, i.e.

$$\begin{aligned} \theta _{\nu }:=\sqrt{\kappa ^+[\nu ] \kappa ^-[\nu ]}. \end{aligned}$$

Then the corresponding EDP-functional \({\mathcal {I}}_{MF}\) given by

$$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-):=\int _0^T \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t) \, \textrm{d}t + \mathcal {F}_{MF}(\nu _T)-\mathcal {F}_{MF}(\nu _0) +\int _0^T \mathcal {D}_{MF}(\nu _t)\, \textrm{d}t, \end{aligned}$$

is non-negative, and for any \(\nu _0\) with \({\mathcal {F}_{MF}(\nu _0)}<\infty \) a unique gradient-flow solution \(({\hat{\nu }},{\hat{\lambda }}^{+},\hat{\lambda }^{-})\) exists, with \({\hat{\nu }}_t\) equal to the unique strong solution to (\(\mathsf MF\)) and \(\hat{\lambda }_t^{\pm }=\kappa ^{\pm }[{\hat{\nu }}_t]\) for almost every \(t\in [0,T]\).
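The balance \(\mathcal {I}_{MF}=0\) along the solution can be checked formally. The following computation is only a sketch: it assumes \({\hat{\nu }}_t=u_t\gamma \) with \(u_t>0\) and enough regularity to differentiate under the integral, and it uses the expression for \(\mathcal {D}_{MF}\) given in Remark 1.5. Write \(C_t(x):=\int _{\mathcal {T}} c(x,y)\,{\hat{\nu }}_t(\textrm{d}y)\). By mean-field detailed balance, \(\kappa ^+[{\hat{\nu }}_t]=C_t\gamma \), \(\kappa ^-[{\hat{\nu }}_t]=C_tu_t\gamma \) and \(\theta _{{\hat{\nu }}_t}=C_t\sqrt{u_t}\,\gamma \), so that \(\partial _t u_t=C_t(1-u_t)\) and

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\mathcal {F}_{MF}({\hat{\nu }}_t)&=\frac{1}{2}\int _{\mathcal {T}} \log u_t\,\partial _t u_t\,\textrm{d}\gamma =-\frac{1}{2}\int _{\mathcal {T}} C_t\,(u_t-1)\log u_t\,\textrm{d}\gamma ,\\ \mathcal {R}_{MF}\left( {\hat{\nu }}_t,\kappa ^+[{\hat{\nu }}_t],\kappa ^-[{\hat{\nu }}_t]\right)&=\int _{\mathcal {T}} C_t\sqrt{u_t}\left( \phi \big (u_t^{-1/2}\big )+\phi \big (u_t^{1/2}\big )\right) \textrm{d}\gamma =\frac{1}{2}\int _{\mathcal {T}} C_t\,(u_t-1)\log u_t\,\textrm{d}\gamma -\int _{\mathcal {T}} C_t\left( \sqrt{u_t}-1\right) ^2\textrm{d}\gamma ,\\ \mathcal {D}_{MF}({\hat{\nu }}_t)&=\int _{\mathcal {T}} C_t\left( \sqrt{u_t}-1\right) ^2\textrm{d}\gamma . \end{aligned}$$

The three lines sum to zero for every t, which is the differential form of \(\mathcal {I}_{MF}({\hat{\nu }},{\hat{\lambda }}^+,{\hat{\lambda }}^-)=0\).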

As mentioned, although treating birth and death separately provides us with additional information, this prohibits the use of some of the previous works for gradient structures because of the positivity of the fluxes. However, there is still a strong connection to the variational formulations for jump processes arising from the large deviations of fluxes as seen in [32] and [3], see for example “Appendix A” on the equivalence of the EDP-functional to the expected rate functional.

Remark 1.5

It is straightforward to verify that if \(\textrm{d}\nu =u \textrm{d}\gamma \)

$$\begin{aligned} \mathcal {R}^*_{MF}(\nu ,\partial _{\nu } \mathcal {F}_{MF},-\partial _{\nu } \mathcal {F}_{MF})&=\int _{\mathcal {T}^2} 1_{u(x)>0} c(x,y) \left( \sqrt{u(x)}-1\right) ^2\gamma (\textrm{d}x)\nu (\textrm{d}y), \\ \mathcal {D}_{MF}(\nu )&=\int _{\mathcal {T}^2} c(x,y) \left( \sqrt{u(x)}-1\right) ^2\gamma (\textrm{d}x)\nu (\textrm{d}y), \end{aligned}$$

and hence it is not directly clear that the relation (1.5) holds. However, as will be shown for Theorem 2.7, at least along the solution \({\hat{\nu }}_t\) the equivalence holds for a.e. \(t\in [0,T]\).
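The first identity can be traced back to the Legendre dual of the entropic dissipation potential. As a formal sketch (pointwise in x and ignoring integrability questions), note that \(\phi ^*(w)=\sup _{s\ge 0}\{ws-\phi (s)\}=e^{w}-1\), attained at \(s=e^{w}\), and therefore

$$\begin{aligned} \mathcal {R}^*_{MF}(\nu ,w^+,w^-)=\int _{\mathcal {T}} \left( e^{w^+}-1\right) \textrm{d}\theta _{\nu }+\int _{\mathcal {T}} \left( e^{w^-}-1\right) \textrm{d}\theta _{\nu },\qquad \hbox {with maximizers } \lambda ^{\pm }=e^{w^{\pm }}\theta _{\nu }. \end{aligned}$$

With \(\partial _{\nu }\mathcal {F}_{MF}=\tfrac{1}{2}\log u\) on \(\{u>0\}\), \(C(x):=\int _{\mathcal {T}} c(x,y)\,\nu (\textrm{d}y)\) and \(\theta _{\nu }=C\sqrt{u}\,\gamma \) (using mean-field detailed balance), the choice \(w^{\pm }=\mp \tfrac{1}{2}\log u\) gives on \(\{u>0\}\)

$$\begin{aligned} \left( e^{w^+}+e^{w^-}-2\right) C\sqrt{u}=C\left( \sqrt{u}-1\right) ^2,\qquad \lambda ^+=C\gamma =\kappa ^+[\nu ],\qquad \lambda ^-=Cu\,\gamma =\kappa ^-[\nu ]. \end{aligned}$$

The value is unchanged if the two arguments \(w^{\pm }\) are swapped, since both entropies share the reference measure \(\theta _{\nu }\). The two expressions in the remark thus differ only by the contribution \(\int _{\mathcal {T}} 1_{u=0}\,C\,\textrm{d}\gamma \) of the set where u vanishes, on which \(\theta _{\nu }=0\).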

Theorem 1.6

(Forward Kolmogorov, cf. Theorem 3.8) Consider triples \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), with \(\textsf{P}_t\in \mathcal {P}(\Gamma )\) and \({\textsf{J}_t^{\pm }\in \mathcal {M}^+_{loc}}(\Gamma \times \mathcal {T})\), satisfying the continuity equation

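$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\int _{\Gamma } F\, \textrm{d}\textsf{P}_t = \int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+}F \, \textrm{d}\textsf{J}^+_t + \int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,-}F \, \textrm{d}\textsf{J}^-_t \qquad \hbox {for all } F\in C_c(\Gamma ), \end{aligned}$$
(\(\textsf{CE}_n\))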

where

$$\begin{aligned} (\overline{\nabla }^{n,\pm } F)(\nu ,x):=n\left( F(\nu \pm \tfrac{1}{n}\delta _x)-F(\nu )\right) . \end{aligned}$$
(1.6)

Define the n-dependent Fisher information \(\mathcal {D}_{n}\) as stated in Definition 3.4, free energy

$$\begin{aligned} \mathcal {F}_n(\textsf{P}):=\frac{1}{2n} \mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n), \end{aligned}$$

and dissipation potential

$$\begin{aligned} \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)&:=\mathcal {E}\textrm{nt}(\textsf{J}^+|\Theta _{\textsf{P}}^{n,+})+\mathcal {E}\textrm{nt}(\textsf{J}^-|\Theta _{\textsf{P}}^{n,-}), \end{aligned}$$

where, with a little abuse of notation (see (3.14)),

$$\begin{aligned} \Theta _{\textsf{P}}^{n,\pm }(\nu ,x):=\sqrt{ \Big (\textsf{P}(\nu )\kappa ^{\pm }[\nu ]\Big ) \left( \textsf{P}(\nu \pm \tfrac{1}{n}\delta _x)\kappa ^\mp [\nu \pm \tfrac{1}{n}\delta _x]\right) }. \end{aligned}$$

Then the corresponding EDP-functional \({\mathcal {I}}_{n}\) given by

$$\begin{aligned} \mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\int _0^T \mathcal {R}_{n}(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \, \textrm{d}t + \mathcal {F}_{n}(\textsf{P}_T)-\mathcal {F}_{n}(\textsf{P}_0) +\int _0^T \mathcal {D}_{n}(\textsf{P}_t)\, \textrm{d}t, \end{aligned}$$

is non-negative, and for any \(\textsf{P}_0\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) a unique gradient-flow solution \((\hat{\textsf{P}},\hat{\textsf{J}}^{\pm })\) exists, with \(\hat{\textsf{P}}_t\) equal to a weak solution to (\(\mathsf FKE_n\)) and \(\hat{\textsf{J}}_t^{\pm }=\hat{\textsf{P}}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\).

Similar to the mean-field case, the dissipation potential consists of relative entropies with respect to geometric averages, now of forward and backward rates along a transition \(\nu \rightarrow \nu \pm \tfrac{1}{n}\delta _{x}\). Moreover, note that in contrast to the framework of [33], we employ fluxes \(\textsf{J}^{\pm }\) that are not finite measures. This is due to the unboundedness of \(\kappa _{\nu }\) as the mass of \(\nu \) grows, which implies that the underlying jump kernel over \(\Gamma \) is itself unbounded as well, see Sect. 3.

For the Liouville equation, let us define \(\textrm{Cyl}_c(\Gamma )\) as the space of compactly supported smooth cylinder functions of the form

$$\begin{aligned} F(\nu )=g\left( \langle 1,\nu \rangle ,\langle f_1,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) ,\qquad g\in C^{\infty }_c({\mathbb {R}}^{m+1}),\;m \in {\mathbb {N}}, \end{aligned}$$

where \(f_1,\dots ,f_m\in C_b(\mathcal {T})\), and \(\textrm{grad}_{\Gamma }\) is the distributional gradient defined by

$$\begin{aligned} \textrm{grad}_{\Gamma }\, F(\nu ,x)= (\nabla g)\left( \langle 1,\nu \rangle ,\langle f_1,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) \cdot (1,f_1(x),\dots ,f_m(x))^\top . \end{aligned}$$
(1.7)
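The role of (1.7) as the large-n counterpart of the discrete gradients (1.6) can be checked numerically. The sketch below assumes a finite trait space and uses hypothetical helper names. The function `g` is a smooth but not compactly supported stand-in chosen purely for illustration.

```python
import math

def features(nu, fs):
    """Return (<1,nu>, <f_1,nu>, ..., <f_m,nu>) for a measure nu on {0,...,K-1}."""
    return [sum(nu)] + [sum(f[x] * nu[x] for x in range(len(nu))) for f in fs]

def discrete_gradient(F, nu, x, n, sign):
    """n * (F(nu + sign * delta_x / n) - F(nu)), cf. (1.6); sign is +1 or -1."""
    shifted = list(nu)
    shifted[x] += sign / n
    return n * (F(shifted) - F(nu))

def grad_gamma(nu, x, fs, grad_g):
    """(grad g)(features(nu)) . (1, f_1(x), ..., f_m(x)), cf. (1.7)."""
    direction = [1.0] + [f[x] for f in fs]
    return sum(a * b for a, b in zip(grad_g(features(nu, fs)), direction))

# Illustrative choice with m = 1: g(a0, a1) = sin(a0) * a1**2.
def g(a):
    return math.sin(a[0]) * a[1] ** 2

def grad_g(a):
    return [math.cos(a[0]) * a[1] ** 2, 2.0 * math.sin(a[0]) * a[1]]
```

For a cylinder function built from `g`, the birth gradient `discrete_gradient(F, nu, x, n, +1)` should approach `grad_gamma(nu, x, fs, grad_g)` as n grows, while the death gradient (sign `-1`) approaches its negative. This is consistent with the net velocity \(\kappa ^+[\nu ]-\kappa ^-[\nu ]\) of the mean-field dynamics.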

Theorem 1.7

(Liouville, cf. Theorem 4.7) Consider triples \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), with \(\textsf{P}_t\in \mathcal {P}(\Gamma )\) and \(\textsf{J}_t^{\pm }\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\), satisfying the continuity equation

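$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\int _{\Gamma } F\, \textrm{d}\textsf{P}_t = \int _{\Gamma \times \mathcal {T}} \textrm{grad}_{\Gamma }F \, \textrm{d}\left( \textsf{J}^+_t-\textsf{J}^-_t\right) \qquad \hbox {for all } F\in \textrm{Cyl}_c(\Gamma ). \end{aligned}$$
(\(\textsf{CE}_{\infty }\))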

Define the Fisher information \(\mathcal {D}_{\infty }\) as stated in Definition 4.4, free energy

$$\begin{aligned} \mathcal {F}_{\infty }(\textsf{P})&:=\frac{1}{2}\int _{\Gamma } \mathcal {E}\textrm{nt}(\nu |\gamma ) \, \textrm{d}\textsf{P}, \end{aligned}$$

and dissipation potential

$$\begin{aligned} \mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\mathcal {E}\textrm{nt}(\textsf{J}^+|\Theta _{\textsf{P}}^{\infty })+\mathcal {E}\textrm{nt}(\textsf{J}^-|\Theta _{\textsf{P}}^{\infty }),\qquad \Theta _{\textsf{P}}^{\infty }(\textrm{d}\nu ,\textrm{d}x):=\theta _{\nu }(\textrm{d}x)\textsf{P}(\textrm{d}\nu ). \end{aligned}$$

Then the corresponding EDP-functional \({\mathcal {I}}_{\infty }\) given by

$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \, \textrm{d}t + \mathcal {F}_{\infty }(\textsf{P}_T)-\mathcal {F}_{\infty }(\textsf{P}_0) +\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t, \end{aligned}$$

is non-negative, and for any \(\textsf{P}_0\) with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) a unique gradient-flow solution \((\hat{\textsf{P}},\hat{\textsf{J}}^{\pm })\) exists, with \(\hat{\textsf{P}}_t\) equal to a weak solution to (\(\mathsf Li\)) and \(\hat{\textsf{J}}_t^{\pm }=\hat{\textsf{P}}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\).

Finally, for any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) such that \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)<\infty \), there exists (with a little abuse of notation) a Borel probability measure \(\Omega \) over curves satisfying the mean-field continuity equation (\(\mathscr{C}\mathscr{E}\)) such that for all t the time marginals \((e_t)_{\#} \Omega \) are equal to \(\textsf{P}_t\), and

$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-){=}\int \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-) \,\textrm{d}\Omega . \end{aligned}$$
(1.8)

The statement of (1.8) is the aforementioned superposition principle, which is a modified version of the superposition principle [2] in metric measure spaces, and the ones used in [11, 12]. It allows one to essentially jump back and forth between the Liouville equation and the mean-field dynamics, and in particular, provides us with the non-negativity of \(\mathcal {I}_{\infty }\) and the uniqueness of gradient-flow solutions.

1.3 Convergence results

Our final and most important result is that the above gradient structures converge in the sense of EDP-convergence (e.g. see [25, 34]), a generalization of the evolutionary \(\Gamma \)-convergence approach stated by [36, 37] and expanded on in [27], which implies convergence of the gradient-flow solutions and their free energies.

We say that a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converges to some \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{{\infty }}\) if for all \(t\in [0,T]\) the probability measures \(\textsf{P}_t^n\) converge narrowly to \(\textsf{P}_t\) in \(\mathcal {P}(\Gamma )\), and \(\textsf{J}^{n,\pm }_t(\textrm{d}\nu ,\textrm{d}x) \,\textrm{d}t\) converge vaguely to \(\textsf{J}^{\pm }_t (\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}t\) in \(\mathcal {M}_{loc}^{{+}}([0,T]\times \Gamma \times \mathcal {T})\). Again postponing technicalities, see Theorem 5.1, we have the following lower semi-continuity and compactness result:

Theorem 1.8

(cf. Theorem 5.1) The sequence of free energies \(\mathcal {F}_n\) \(\varGamma \)-converges to \(\mathcal {F}_{\infty }\).

Moreover, the sequences of Fisher-information functionals and dissipation potentials are sequentially lower semicontinuous along sequences of curves with bounded \(\mathcal {I}_n\) and bounded initial \(\mathcal {F}_n\). In particular, for any sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converging to a \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) such that \(\mathcal {F}_{n}(\textsf{P}_0^n)\rightarrow \mathcal {F}_{\infty }(\textsf{P}_0)\) as well, we have

$$\begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\ge \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-}). \end{aligned}$$

Finally, for any sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that

$$\begin{aligned}\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)<\infty ,\\ \limsup _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})<\infty , \end{aligned}\end{aligned}$$

there exists a subsequence converging to some \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\).

Here the notion of EDP-convergence or evolutionary \(\varGamma \)-convergence (where the \(\varGamma \) is not to be confused with our space of positive measures \(\Gamma \)) relates to the \(\varGamma \)-convergence of the free energies \(\mathcal {F}_n\) and suitable liminf-estimates for the dissipation potentials and Fisher-information functionals (or local slopes in a metric setting).

In certain applications or for certain notions of convergence (e.g. see [29]) one also establishes \(\varGamma \)-convergence for the total dissipation \(\mathcal {R}_n+\mathcal {D}_n\) when written as functionals over \(C([0,T];\mathcal {P}(\Gamma ))\). Moreover, \(\varGamma \)-convergence of the functionals \(\mathcal {I}_n\) over such path-spaces is related to the large deviations of the underlying process [23], as we briefly discuss in “Appendix A”. In our framework this would require that for every \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\), we can find a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) that converges to \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) and satisfies the limsup-estimate

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-}) \le \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-). \end{aligned}$$

However, in this paper we restrict ourselves only to the liminf-estimates, which is sufficient to obtain convergence of the solutions, an approach also taken in [11, 12, 28]. Namely, by a lower semicontinuity and compactness argument, Theorem 1.8 implies the convergence of both the solutions and the free energies \(\mathcal {F}_n\), if the initial data are well prepared.

Theorem 1.9

(cf. Theorem 5.3) Suppose that \(\textsf{P}_0^n \rightarrow \textsf{P}_0\) with \(\mathcal {F}_{n}(\textsf{P}_0^n)\rightarrow \mathcal {F}_{\infty }(\textsf{P}_0)\) as well. Then for the sequence \(\hat{\textsf{P}}^n\) of gradient-flow solutions to (\(\mathsf FKE_n\)), and \(\hat{\textsf{P}}\) the gradient-flow solution to (\(\mathsf Li\)), we have that for all \(t\in [0,T]\)

$$\begin{aligned} \hat{\textsf{P}}_t^n \rightarrow \hat{\textsf{P}}_t \hbox {\; narrowly,\quad and}\quad \lim _{n\rightarrow \infty } \mathcal {F}_n(\hat{\textsf{P}}^n_t)= \mathcal {F}_{\infty }(\hat{\textsf{P}}_t). \end{aligned}$$

In particular, if \(\textsf{P}_0=\delta _{{\hat{\nu }}_0}\) and \({\hat{\nu }}_t\) is the solution to the mean-field problem (\(\mathsf MF\)), then for all \(t\in [0,T]\)

$$\begin{aligned} \hat{\textsf{P}}_t^n \rightarrow \delta _{\hat{\nu }_t} \hbox {\; narrowly,\quad and}\quad \lim _{n\rightarrow \infty } \frac{1}{n} \mathcal {E}\textrm{nt}(\hat{\textsf{P}}^n_t|\Pi _n)=\mathcal {E}\textrm{nt}({\hat{\nu }}_t|\gamma ). \end{aligned}$$

The second half of Theorem 1.9, on the concentration around mean-field solutions and convergence of entropies, follows directly from the definition of \(\mathcal {F}_{\infty }\) and uniqueness.

For interacting particle systems where the number of particles is fixed at \(n\in {\mathbb {N}}\), the narrow convergence \(\hat{\textsf{P}}_t^n\rightarrow \delta _{\hat{\nu }_t}\) is equivalent to the propagation of chaos in the sense of Sznitman [38], and would imply narrow convergence of the k-particle marginals at time t to \(\hat{\nu }_t^{\otimes k}\). However, in our setting, this implies convergence of the k-correlation functions, see [4].

Moreover, the convergence of the free energies \(\mathcal {F}_n\) implies the stronger notion of entropic propagation of chaos if the initial condition is sufficiently regular.

Theorem 1.10

(cf. Theorem 5.4) Suppose that \(\textsf{P}^n_0\rightarrow \delta _{{\hat{\nu }}_0}\) with \(C^{-1} \le \textrm{d}{\hat{\nu }}_0/\textrm{d}\gamma \le C\) for some \(C>0\). If the initial sequence \(\textsf{P}^n_0\) is entropically chaotic in the sense that

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \mathcal {E}\textrm{nt}(\textsf{P}_0^n|\Pi _{n,{\hat{\nu }}_0})=0, \end{aligned}$$

then this is propagated along the solution, i.e.

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \mathcal {E}\textrm{nt}(\hat{\textsf{P}}_t^n|\Pi _{n,{\hat{\nu }}_t})=0, \qquad \text{ for } \text{ all } t\ge 0, \end{aligned}$$

where \(\Pi _{n,\nu }\in \mathcal {P}(\Gamma )\) stems from the Poisson measure \(\pi _{n,\nu }\) with intensity measure \(\nu \), i.e.

$$\begin{aligned} \pi _{n,\nu }:=\frac{1}{e^{n \nu (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N}{N!} \nu ^{\otimes N}. \end{aligned}$$
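As a quick consistency check (a sketch, reading \(\nu ^{\otimes N}\) as a measure on \(\mathcal {T}^N\) with total mass \(\nu (\mathcal {T})^N\)), the prefactor normalizes \(\pi _{n,\nu }\) to a probability measure:

$$\begin{aligned} \sum _{N=1}^{\infty } \frac{n^N}{N!}\, \nu ^{\otimes N}(\mathcal {T}^N) = \sum _{N=1}^{\infty } \frac{\big (n\nu (\mathcal {T})\big )^N}{N!} = e^{n\nu (\mathcal {T})}-1. \end{aligned}$$

Since the sum starts at \(N=1\), the empty configuration is excluded, which is why the constant is \(e^{n\nu (\mathcal {T})}-1\) rather than \(e^{n\nu (\mathcal {T})}\).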

To the authors’ knowledge, this is the first entropic propagation of chaos result for bounded competition kernels over compact Polish spaces, under the assumption of detailed balance.

1.3.1 Comments

We have given an overview of the generalized gradient structures that we introduced for the forward Kolmogorov equation of our underlying interacting particle system and alluded to how this sequence of structures converges to a gradient structure induced by the mean-field limit. Throughout, we assumed bounded measurable rates \(m, c\) over a compact Polish space \(\mathcal {T}\) satisfying the detailed balance condition \(m(x,y)=c(y,x)\) and \(c(x,x)=0\) for all \(x,y\in \mathcal {T}\), and we would like to briefly touch on possible relaxations of these assumptions.

First, for the limit inferior in Theorem 5.1, there is a technical issue concerning the possible non-continuity of the competition kernel c, which we resolve by an approximation argument from large deviation theory [19], see “Appendix C”. This argument can be straightforwardly extended to unbounded rates m and c under certain exponential integrability estimates with respect to the reference measure \(\gamma \). However, the uniqueness of solutions and the well-posedness of the variational formulations would be less clear.

Moreover, it should be noted that, while we chose \(\mathcal {T}\) to be compact for brevity and clarity of the exposition, many of the listed results carry over to the case of \(\mathcal {T}\) Polish with finite \(\gamma \), under suitable choices of topologies and by bootstrapping from the tightness of \(\gamma \). However, the classical case of \(\mathcal {T}={\mathbb {R}}^d\) with the merely locally finite reference measure \(\gamma =\mathscr {L}^d\) (under suitable integrability estimates on m), as treated in for example [14, 18], is not easily contained in our framework. Due to the necessity to control the entropy, any solution to this problem would involve newly constructed estimates on the propagation of tightness.

A more fundamental restriction is the detailed balance assumption, which is necessary to phrase the variational structures in terms of generalized gradient systems and the evolution in terms of a gradient flow. However, there exist possible extensions and decompositions of variational structures for jump processes that do not assume detailed balance or even complex balance, see for example [20] for an overview. Therefore, in future work, the authors plan to generalize the variational methods outlined here to more general evolutions.

1.4 Notation

Below we collect some of the notation used throughout this paper.

\(\mathcal {T}\): Trait space, Assumption 1.1

\(m, c\): Mutation/competition kernel, Assumption 1.1

\(\gamma \): Reference measure, Assumption 1.1

\(n\): System size, Assumption 1.1

\(\mathcal {E}\textrm{nt}\): Relative entropy (2.3)

\(H\): Hellinger distance (2.2)

\(\Psi ,\Psi ^*\): Dual pair (2.6), (2.5)

\(\mathcal {M}^{+}\): Space of finite non-negative Borel measures, with narrow topology

\(\mathcal {M}^{+}_{loc}\): Space of non-negative Radon measures, with vague topology

\(\Gamma :=\mathcal {M}^+(\mathcal {T})\): State space of measure-valued process

\(\Gamma _n\subset \Gamma \): Space of positive atomic measures with common mass \(\tfrac{1}{n}\) (3.3)

\(\kappa ^{\pm }_{\nu }=\kappa ^{\pm }[\nu ]\): Measure-dependent birth/death kernels (2.1)

\(\theta _{\nu }\): Geometric mean of \(\kappa ^{+}_{\nu }\) and \(\kappa ^{-}_{\nu }\), Definition 2.4

\(\mathscr{C}\mathscr{E}\): Continuity equation for mean-field (\(\mathsf MF\)), Definition 2.1

\(\mathcal {R}_{MF},\mathcal {F}_{MF},\mathcal {D}_{MF}\): Ingredients of EDP-functional \(\mathcal {I}_{MF}\) for (\(\mathsf MF\)), Definition 2.4

\(Q_n,Q_n^{*}\): Generator and dual generator (3.1) of (\(\mathsf FKE_n\))

\({{\bar{\kappa }}}_n\): Jump kernel (3.4) corresponding to (\(\mathsf FKE_n\))

\(L_n\): Rescaled empirical measure map (3.2)

\(\pi _n,\Pi _n\): Invariant measures for particle system (3.5) and measure-valued process (3.6)

\(\textsf{T}^{n,\pm }\): Creation/annihilation mappings (3.8)

\(\overline{\nabla }^{n,\pm }, \textrm{div}^{n,\pm }\): Discrete \(\Gamma _n\)-gradient (3.9) and divergence (3.10)

\(\vartheta _{\textsf{P}}^{\pm }\): Expected fluxes (3.12)

\(\Theta _{\textsf{P}}^{n,\pm }\): Geometric average of \(\vartheta _{\textsf{P}}^{\pm }\) along transition, Definition 3.1

\(\textsf{CE}_n\): Continuity equation for (\(\mathsf FKE_n\)), Definition 3.1

\(\mathcal {R}_{n},\mathcal {F}_{n},\mathcal {D}_{n}\): Ingredients of EDP-functional \(\mathcal {I}_{n}\) for (\(\mathsf FKE_n\)), Definition 3.4

\(d_{TV,w}, W\): Weighted total variation metric (3.18)/transportation metric (4.11) over \(\mathcal {P}(\Gamma )\)

\(\textsf{CE}_{\infty }\): Continuity equation for (\(\mathsf Li\)), Definition 4.3

\(\mathcal {R}_{\infty },\mathcal {F}_{\infty },\mathcal {D}_{\infty }\): Ingredients of EDP-functional \(\mathcal {I}_{\infty }\) for (\(\mathsf Li\)), Definition 4.4

2 Mean-field system

In this section, we will discuss the gradient-flow formulation of the mean-field equation under the detailed balance condition. Let us first make precise the context of Theorem 1.4, and embed it within the more general statement of Theorem 2.7 below.

Recall that the trait space \(\mathcal {T}\) is a compact Polish space, and \(\Gamma :=\mathcal {M}^+(\mathcal {T})\) is the space of finite non-negative measures over \(\mathcal {T}\) equipped with the narrow topology. Fix a reference measure \(\gamma \in \Gamma \), and rates \(m, c\) satisfying Assumption 1.1, i.e. \(m,c\in \mathcal {B}_b^{{+}}(\mathcal {T}\times \mathcal {T})\) with \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\), and \(c(x,x)=0\) for all \(x\in \mathcal {T}\). The mean-field equation then reads

$$\begin{aligned} \partial _t \nu _t=\kappa ^+[\nu _t]-\kappa ^-[\nu _t], \end{aligned}$$
(\(\mathsf MF\))

with measure-dependent birth and death kernels \(\kappa ^{\pm }:\Gamma \rightarrow \Gamma \) given by

$$\begin{aligned} \kappa ^+[\nu ](\textrm{d}x):= \int _{y\in \mathcal {T}} c(x,y)\gamma (\textrm{d}x)\nu (\textrm{d}y),\qquad \kappa ^-[\nu ](\textrm{d}x):= \int _{y\in \mathcal {T}} c(x,y) \nu (\textrm{d}x) \nu (\textrm{d}y). \nonumber \\ \end{aligned}$$
(2.1)

Routinely, we will also adopt the shorthand notation \(\kappa _\nu ^{\pm }:= \kappa ^{\pm }[\nu ]\). Now, setting \(c_{\nu }(x):=\int _{\mathcal {T}} c(x,y)\, \nu (\textrm{d}y)\), it is clear that \(\kappa ^+_{\nu }=c_{\nu } \gamma \), \(\kappa ^-_{\nu }=c_{\nu } \nu \), and the dynamics simplify to

$$\begin{aligned} \partial _t \nu (\textrm{d}x)=c_{\nu }(x)(\gamma (\textrm{d}x)-\nu (\textrm{d}x)). \end{aligned}$$
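To illustrate, consider the following toy example, which is chosen here purely for illustration and is not used in the results below: a two-point trait space \(\mathcal {T}=\{1,2\}\) with \(c(1,2)=c(2,1)=1\) and \(c(1,1)=c(2,2)=0\). Writing \(a_t:=\nu _t(\{1\})\), \(b_t:=\nu _t(\{2\})\) and \(\gamma _i:=\gamma (\{i\})\), we have \(c_{\nu _t}(1)=b_t\) and \(c_{\nu _t}(2)=a_t\), so that the dynamics reduce to the planar system

$$\begin{aligned} \dot{a}_t = b_t\,(\gamma _1-a_t), \qquad \dot{b}_t = a_t\,(\gamma _2-b_t). \end{aligned}$$

In this sketch \(\nu =\gamma \) is a stationary point, and as long as both masses are positive, each component moves toward its reference value at a rate given by the mass of the other trait.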

The existence of strong solutions to (\(\mathsf MF\)) in either total variation or appropriate \(L^1\) spaces follows straightforwardly via classical methods, see Sect. 2.2.

The total variation norm \(\Vert \cdot \Vert _{TV}\) on \(\mathcal {M}(\mathcal {T})\) is defined as

$$\begin{aligned} \Vert \mu \Vert _{TV}:=\sup \left\{ \int _{\mathcal {T}} f \, \textrm{d}\mu : f\in \mathcal {B}_b(\mathcal {T}), \, \Vert f\Vert _{\infty }\le 1 \right\} , \qquad \mu \in \mathcal {M}(\mathcal {T}), \end{aligned}$$

and the squared Hellinger distance \(H^2\) is given by

$$\begin{aligned} H^2(\nu ,\eta ):=\frac{1}{2}\int _{\mathcal {T}} \left( \sqrt{\frac{\textrm{d}\nu }{\textrm{d}\sigma }}-\sqrt{\frac{\textrm{d}\eta }{\textrm{d}\sigma }}\right) ^2 \textrm{d}\sigma , \end{aligned}$$
(2.2)

with \(\sigma \) a measure dominating both \(\nu \) and \(\eta \). Note that the definition (2.2) is independent of the choice for the dominating measure \(\sigma \), and \(\sigma =\nu +\eta \) is always admissible.

Moreover, recall the entropy function \(\phi : {\mathbb {R}}_{\ge 0}\rightarrow {\mathbb {R}}_{\ge 0}\) and its Legendre dual \(\phi ^*:{\mathbb {R}}\rightarrow {\mathbb {R}}\), given by

$$\begin{aligned} \phi (s):=s \log s -s+1, \qquad \phi ^*(z):=e^{z}-1, \end{aligned}$$
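The formula for \(\phi ^*\) can be checked directly (a short sketch of the standard computation). For fixed \(z\in {\mathbb {R}}\), the map \(s\mapsto sz-\phi (s)\) is concave on \({\mathbb {R}}_{\ge 0}\) with derivative \(z-\log s\), which vanishes at \(s=e^z\). Hence

$$\begin{aligned} \phi ^*(z)=\sup _{s\ge 0}\big \{ sz-\phi (s)\big \} = e^z z-\big (e^z z-e^z+1\big )=e^z-1. \end{aligned}$$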

and the relative entropy of \(\nu \) with respect to \(\mu \) as

$$\begin{aligned} \mathcal {E}\textrm{nt}(\nu |\mu ):=\left\{ \begin{aligned}&\int _{\mathcal {T}} \phi \left( \frac{\textrm{d}\nu }{\textrm{d}\mu } \right) \textrm{d}\mu ,{} & {} \quad \hbox { if}\ \nu \ll \mu ,\\&+\infty {,}{} & {} \quad \hbox {otherwise.} \end{aligned}\right. \end{aligned}$$
(2.3)

We will consider curves satisfying the continuity equation

$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^- \end{aligned}$$

in an appropriately weak sense.

Definition 2.1

(Mean-field continuity equation) A triple \((\nu ,\lambda ^+,\lambda ^-)\) satisfies the mean-field continuity equation \(\mathscr{C}\mathscr{E}\) if

  1. (1)

    the curve \([0,T]\ni t\mapsto \nu _t\in \Gamma \) is absolutely continuous with respect to \(\Vert \cdot \Vert _{TV}\),

  2. (2)

    the Borel family \((\lambda _t^\pm )_{t\in [0,T]}\subset \Gamma \) satisfies \(\int _0^T \Vert \lambda _t^{\pm }\Vert _{TV} \, \textrm{d}t<\infty \),

  3. (3)

    for every \(s,t\in [0,T]\) and all \(f\in C_b(\mathcal {T})\)

    $$\begin{aligned} \int _{\mathcal {T}} f \textrm{d}\nu _t - \int _{\mathcal {T}} f \textrm{d}\nu _s = \int _s^t \left( \int _{\mathcal {T}} f \textrm{d}\lambda _r^+-\int _{\mathcal {T}} f \textrm{d}\lambda _r^- \right) \, \textrm{d}r, \quad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$

We will refer to \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^-\) as the net flux.

Remark 2.2

When seen as approximations of particle systems the birth/death fluxes \(\lambda ^{\pm }_t\) represent the observed amount of mass being created/annihilated around a certain point, and \(\nu _t\) represents the density of the particles, while \(\kappa _{\nu }^{\pm }\) correspond to the expected birth and death fluxes of the BPDL model.

Remark 2.3

(Time-regularity) As we will see in Lemma 2.12, if there exists a common dominating measure for \(\{\nu _t,\lambda ^+_t,\lambda _t^-\}_{t\in [0,T]}\) then the continuity equation holds in a strong sense: \(\nu _t\) is an a.e. differentiable map from [0, T] to \((\Gamma ,\Vert \cdot \Vert _{TV})\) and

$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^-, \qquad \hbox { for a.e.}\ t\in [0,T]. \end{aligned}$$

Definition 2.4

Let \(\theta _{\nu }\) be the geometric average of \(\kappa _{\nu }^+\) and \(\kappa _{\nu }^-\), i.e.

$$\begin{aligned} \textrm{d}\theta _{\nu }:=\sqrt{\frac{\textrm{d}\kappa _{\nu }^+}{\textrm{d}\sigma } \frac{\textrm{d}\kappa _{\nu }^-}{\textrm{d}\sigma }} \textrm{d}\sigma , \end{aligned}$$

for any dominating measure \(\sigma \). We define the following objects:

  • The dissipation potential \(\mathcal {R}_{MF}:\Gamma ^3 \rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-):=\mathcal {E}\textrm{nt}(\lambda ^+|\theta _{\nu })+\mathcal {E}\textrm{nt}(\lambda ^-|\theta _{\nu }), \end{aligned}$$

    and the dual dissipation potential \(\mathcal {R}^*_{MF}:\Gamma \times \mathcal {B}_b({\mathcal {T}})^2 \rightarrow {{\mathbb {R}}}\),

    $$\begin{aligned} \mathcal {R}^*_{MF}(\nu ,w^+,w^-):=\int _{\mathcal {T}} (e^{w^{+}}-1)\,\textrm{d}\theta _{\nu }+\int _{\mathcal {T}} (e^{w^{-}}-1)\,\textrm{d}\theta _{\nu }. \end{aligned}$$
  • The free energy \(\mathcal {F}_{MF}:\Gamma \rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {F}_{MF}(\nu ):=\tfrac{1}{2} \mathcal {E}\textrm{nt}(\nu |\gamma ), \end{aligned}$$

    and Fisher information \(\mathcal {D}_{MF}:\Gamma \rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {D}_{MF}(\nu ):=\left\{ \begin{aligned}&2H^2(\kappa _{\nu }^+,\kappa _{\nu }^-),{} & {} \qquad \hbox { if}\ \nu \ll \gamma ,\\&+\infty ,{} & {} \qquad \hbox {otherwise.} \end{aligned}\right. \end{aligned}$$
  • The EDP-functional \(\mathcal {I}_{MF}:\mathscr{C}\mathscr{E}\rightarrow [0,+\infty ]\) for all curves with \(\mathcal {F}_{MF}(\nu _0)<\infty \)

    $$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-):=\int _0^T \mathcal {R}_{MF}(\nu _t,\lambda _t^{+},\lambda ^-_t) \, \textrm{d}t + \mathcal {F}_{MF}(\nu _T)-\mathcal {F}_{MF}(\nu _0)+\int _0^T \mathcal {D}_{MF}(\nu _t) \, \textrm{d}t. \end{aligned}$$
    (2.4)

Remark 2.5

Since \(\theta _{\nu }(\mathcal {T})<\infty \) by Lemma 2.10 all objects above are well-defined, and it is straightforward to verify via the dual representation of the entropy that \(\mathcal {R}_{MF}, \mathcal {R}^*_{MF}\) are truly dual objects in the sense that

$$\begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)=\sup _{w^{\pm } \in \mathcal {B}_b(\mathcal {T})} \left\{ \int _{\mathcal {T}} w^+ \textrm{d}\lambda ^++\int _{\mathcal {T}} w^- \textrm{d}\lambda ^- -\mathcal {R}^*_{MF}(\nu ,w^+,w^-) \right\} , \end{aligned}$$

and vice versa.

Remark 2.6

If \(\nu \ll \gamma \) with \(\textrm{d}\nu =u \textrm{d}\gamma \), note that \(\textrm{d}\theta _{\nu }=c_{\nu } \sqrt{u}\, \textrm{d}\gamma \), and that the Fisher information simplifies to

$$\begin{aligned} \mathcal {D}_{MF}(\nu )= \int _{\mathcal {T}} c_{\nu } \left( \sqrt{u}-1\right) ^2 \textrm{d}\gamma . \end{aligned}$$
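For completeness, both identities follow by taking \(\sigma =\gamma \) as dominating measure (a sketch of the computation). Since \(\textrm{d}\kappa ^+_{\nu }/\textrm{d}\gamma =c_{\nu }\) and \(\textrm{d}\kappa ^-_{\nu }/\textrm{d}\gamma =c_{\nu }u\), we have

$$\begin{aligned} \textrm{d}\theta _{\nu }&=\sqrt{c_{\nu }\cdot c_{\nu }u}\, \textrm{d}\gamma =c_{\nu }\sqrt{u}\, \textrm{d}\gamma ,\\ \mathcal {D}_{MF}(\nu )&=2H^2(\kappa ^+_{\nu },\kappa ^-_{\nu })=\int _{\mathcal {T}} \left( \sqrt{c_{\nu }}-\sqrt{c_{\nu }u}\right) ^2 \textrm{d}\gamma =\int _{\mathcal {T}} c_{\nu }\left( \sqrt{u}-1\right) ^2 \textrm{d}\gamma . \end{aligned}$$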

We are now able to fully state the variational characterization of strong solutions to the mean-field equation (\(\mathsf MF\)).

Theorem 2.7

For any \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \), we have \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)\ge 0\) and

$$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)=0 \iff \left\{ \begin{aligned} \quad&\nu _t \hbox { is the unique strong solution to } (\mathsf MF), \quad \\ \quad \lambda ^{\pm }_t&=\kappa ^{\pm }_{\nu _t} \quad \hbox {for a.e. } t\in [0,T]. \quad \end{aligned} \right. \end{aligned}$$

Moreover, whenever \(\mathcal {F}_{MF}(\nu _0)<\infty \) and \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)<\infty \) the chain rule for \(\mathcal {F}_{MF}\) holds: \(\mathcal {F}_{MF}(\nu _t)\) is absolutely continuous and

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t} \mathcal {F}_{MF}(\nu _t)=\tfrac{1}{2} \int _{\mathcal {T}} \log \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\,\textrm{d}(\lambda ^+_t-\lambda ^-_t), \qquad \hbox { for a.e.}\ t\in [0,T]. \end{aligned}$$

The proof of Theorem 2.7 is postponed to Sect. 2.3, where we establish the main technical ingredient, namely the chain rule for the entropy functional.

Remark 2.8

The results of this section do not depend on the no-natural-death condition \(c(x,x)=0\) for all \(x\in \mathcal {T}\), but arise from the bounds on \(m, c\) and depend crucially on the mean-field detailed balance condition \(m(x,y)=c(y,x)\) for all \(x,y\in \mathcal {T}\).

Remark 2.9

The non-negativity of \(\mathcal {I}_{MF}\) and the fact that null-minimizers are solutions to (\(\mathsf MF\)) are related to the formal equivalence

$$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)=\int _0^T \mathcal {L}(\nu _t,\lambda _t^+,\lambda ^-_t) \,\textrm{d}t, \end{aligned}$$

where \(\mathcal {L}\) is the so-called Lagrangian given by

$$\begin{aligned} \mathcal {L}(\nu ,\lambda ^+,\lambda ^-):=\mathcal {E}\textrm{nt}(\lambda ^+|\kappa _{\nu }^+)+\mathcal {E}\textrm{nt}(\lambda ^{{-}}|\kappa _{\nu }^-).\end{aligned}$$

Note that \(\mathcal {L}\) is non-negative and zero if and only if \(\lambda ^{\pm }=\kappa _{\nu }^{\pm }\). Although we do not prove the full equivalence in this work, it does play a role in the intuition and motivation behind the EDP-functional \(\mathcal {I}_{MF}\) with the Lagrangian \(\mathcal {L}\) stemming from a large deviation perspective, as seen in “Appendix A”.
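The formal computation behind this equivalence can be sketched as follows, under additional assumptions made only for this illustration: \(\nu =u\gamma \) and \(\lambda ^{\pm }=g^{\pm }\gamma \) with strictly positive densities, and \(c_{\nu }>0\). Using \(\kappa ^+_{\nu }=c_{\nu }\gamma \), \(\kappa ^-_{\nu }=c_{\nu }u\gamma \) and \(\theta _{\nu }=c_{\nu }\sqrt{u}\,\gamma \), expanding the entropies gives

$$\begin{aligned} \mathcal {E}\textrm{nt}(\lambda ^+|\kappa ^+_{\nu })-\mathcal {E}\textrm{nt}(\lambda ^+|\theta _{\nu })&=\int _{\mathcal {T}} \Big ( \tfrac{1}{2} g^+\log u + c_{\nu }\big (1-\sqrt{u}\big ) \Big ) \textrm{d}\gamma ,\\ \mathcal {E}\textrm{nt}(\lambda ^-|\kappa ^-_{\nu })-\mathcal {E}\textrm{nt}(\lambda ^-|\theta _{\nu })&=\int _{\mathcal {T}} \Big ( -\tfrac{1}{2} g^-\log u + c_{\nu }\big (u-\sqrt{u}\big ) \Big ) \textrm{d}\gamma . \end{aligned}$$

Adding both lines yields

$$\begin{aligned} \mathcal {L}(\nu ,\lambda ^+,\lambda ^-)-\mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)=\tfrac{1}{2}\int _{\mathcal {T}} \log u \, \textrm{d}(\lambda ^+-\lambda ^-)+\int _{\mathcal {T}} c_{\nu }\big (\sqrt{u}-1\big )^2 \textrm{d}\gamma . \end{aligned}$$

Along a curve in \(\mathscr{C}\mathscr{E}\), the first term on the right is formally \(\frac{\textrm{d}}{\textrm{d}t}\mathcal {F}_{MF}(\nu _t)\) by the chain rule of Theorem 2.7, and the second is \(\mathcal {D}_{MF}(\nu _t)\) by Remark 2.6, so integrating over [0, T] recovers the stated identity at this formal level.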

2.1 A priori estimates

In this section, we will collect some elementary estimates and results that are either necessary for the well-posedness of the mean-field equation and the corresponding gradient structure, or necessary to do the same for the Liouville equation in Sect. 4.

Let \(\Psi ^*\) be given as

$$\begin{aligned} \Psi ^*(z):=2 (\cosh (z)-1) = e^z+e^{-z}-2, \end{aligned}$$
(2.5)

and its dual \(\Psi :=(\Psi ^*)^*\)

$$\begin{aligned} \Psi (s)=s \log \left( \frac{s+\sqrt{s^2+4}}{2}\right) -\sqrt{s^2+4}+2 \end{aligned}$$
(2.6)
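The expression (2.6) can be recovered by a direct computation (a sketch). For fixed \(s\in {\mathbb {R}}\), the concave map \(z\mapsto sz-\Psi ^*(z)\) is maximized where \(s=(\Psi ^*)'(z)=2\sinh (z)\), i.e. at

$$\begin{aligned} z_s=\textrm{arsinh}\left( \frac{s}{2}\right) =\log \left( \frac{s+\sqrt{s^2+4}}{2}\right) . \end{aligned}$$

At this point \(2\cosh (z_s)=2\sqrt{1+\sinh ^2(z_s)}=\sqrt{s^2+4}\), and therefore

$$\begin{aligned} \Psi (s)=s z_s-2\cosh (z_s)+2=s \log \left( \frac{s+\sqrt{s^2+4}}{2}\right) -\sqrt{s^2+4}+2. \end{aligned}$$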

Lemma 2.10

Let \(M:=\Vert c\Vert _{\infty } (1+\gamma (\mathcal {T}))\). Then the following estimates hold:

  1. (i)

    The measures \(\kappa _{\nu }^{\pm }\) and \(\theta _{\nu }\) are finite:

    $$\begin{aligned} \kappa ^{\pm }_{\nu }(\mathcal {T})\le M (1+\nu (\mathcal {T})^2). \end{aligned}$$
    (2.7)

    and

    $$\begin{aligned} \theta _{\nu }(\mathcal {T})\le M (1+\nu (\mathcal {T})^2) \end{aligned}$$
    (2.8)
  2. (ii)

    For any birth/death fluxes \(\lambda ^{\pm }\in \mathcal {M}^+(\mathcal {T})\), net flux \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^{{-}}\), and \(w^{\pm }, w\in \mathcal {B}(\mathcal {T})\),

    $$\begin{aligned} \begin{aligned} \int _{\mathcal {T}} |w^{\pm }| \,\textrm{d}\lambda ^{\pm }&\le \mathcal {E}\textrm{nt}(\lambda ^{\pm }|\theta _{\nu })+\int _{\mathcal {T}} \Psi ^*(w^{\pm }) \, \textrm{d}\theta _{\nu } + \theta _{\nu }(\mathcal {T}), \\ \int _{\mathcal {T}} |w|\,\textrm{d}|\lambda ^{\textrm{net}}| \,&\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)+\int _{\mathcal {T}} \Psi ^*(w) \, \textrm{d}\theta _{\nu }. \end{aligned} \end{aligned}$$
  3. (iii)

    For any birth/death fluxes \(\lambda ^{\pm }\in \Gamma \),

    $$\begin{aligned} \phi \left( \frac{\lambda ^{\pm }(\mathcal {T})}{M(1+\nu (\mathcal {T})^2)} \vee 1\right) M\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-) \end{aligned}$$
    (2.9)

Remark 2.11

Although the estimate for \(\theta _{\nu }\) can be made more precise, namely

$$\begin{aligned} \theta _{\nu }(\mathcal {T})\le \Vert c\Vert _{\infty } \gamma (\mathcal {T})^{1/2} \nu (\mathcal {T})^{3/2}, \end{aligned}$$

we will not require it for our results.

Proof

(i) With \(\theta _{\nu }:=\sqrt{\textrm{d}\kappa _{\nu }^+/\textrm{d}\sigma \, \textrm{d}\kappa _{\nu }^-/\textrm{d}\sigma } \, \sigma \) for any dominating measure \(\sigma \) we have by Hölder’s inequality

$$\begin{aligned} \theta _{\nu }(\mathcal {T})\le \sqrt{\kappa _{\nu }^+(\mathcal {T})\kappa _{\nu }^-(\mathcal {T})}. \end{aligned}$$

Note that \(\kappa ^+_{\nu }(\mathcal {T})\le \Vert c\Vert _{\infty } \gamma (\mathcal {T}) \nu (\mathcal {T})\), and \(\kappa ^-_{\nu }(\mathcal {T})\le \Vert c\Vert _{\infty } \nu (\mathcal {T})^2\). Together with \(z\le 1+z^2\) for all \(z\ge 0\), this provides (2.7), and (2.8) then follows directly from the Hölder estimate above.
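Spelled out, the steps read as follows (a sketch using only the bounds above together with \(\gamma (\mathcal {T})\le 1+\gamma (\mathcal {T})\)):

$$\begin{aligned} \kappa ^+_{\nu }(\mathcal {T})&\le \Vert c\Vert _{\infty }\, \gamma (\mathcal {T})\, \nu (\mathcal {T}) \le \Vert c\Vert _{\infty } \big (1+\gamma (\mathcal {T})\big )\big (1+\nu (\mathcal {T})^2\big )=M\big (1+\nu (\mathcal {T})^2\big ),\\ \kappa ^-_{\nu }(\mathcal {T})&\le \Vert c\Vert _{\infty }\, \nu (\mathcal {T})^2 \le M\big (1+\nu (\mathcal {T})^2\big ),\\ \theta _{\nu }(\mathcal {T})&\le \sqrt{\kappa _{\nu }^+(\mathcal {T})\, \kappa _{\nu }^-(\mathcal {T})} \le M\big (1+\nu (\mathcal {T})^2\big ). \end{aligned}$$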

(ii) First, suppose that \(w\in \mathcal {B}_b(\mathcal {T})\). Using the elementary inequality \( e^{|a|}\le e^a+e^{-a}\) we derive by duality of the entropy

$$\begin{aligned} \int _{\mathcal {T}} |w|\, \textrm{d}\lambda ^{\pm }&\le \mathcal {E}\textrm{nt}(\lambda ^{\pm }|\theta _{\nu })+ \int _{\mathcal {T}} (e^{|w|}-1)\, \textrm{d}\theta _{\nu }\\&\le \mathcal {E}\textrm{nt}(\lambda ^{\pm }|\theta _{\nu })+\int _{\mathcal {T}} \Psi ^*(w) \, \textrm{d}\theta _{\nu } + \theta _{\nu }(\mathcal {T}). \end{aligned}$$

Next, fix any measurable function \(w\in \mathcal {B}(\mathcal {T})\) and consider its k-truncation \(w_k:=\max \{\min \{w,k\},-k\}\). Since \(\Psi ^*\) is even and non-decreasing on \([0,\infty )\), applying monotone convergence to both sides as \(k\rightarrow \infty \) shows that the inequality holds for w as well, with both sides possibly equal to \(+\infty \).

Next, note that for any \({\tilde{w}}\in \mathcal {B}_b(\mathcal {T})\)

$$\begin{aligned} \int _{\mathcal {T}} {\tilde{w}}\, \textrm{d}(\lambda ^{+}-\lambda ^-)&\le \mathcal {E}\textrm{nt}(\lambda ^{+}|\theta _{\nu })+\mathcal {E}\textrm{nt}(\lambda ^{-}|\theta _{\nu })+\int _{\mathcal {T}} (e^{{\tilde{w}}}-1)\, \textrm{d}\theta _{\nu }+\int _{\mathcal {T}} (e^{-{\tilde{w}}}-1)\, \textrm{d}\theta _{\nu }\\&=\mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)+\int _{\mathcal {T}} \Psi ^*({\tilde{w}}) \, \textrm{d}\theta _{\nu }. \end{aligned}$$

Substituting \({\tilde{w}}:=|w|1_{P}-|w|1_{P^c}\), with \(P,P^c\) stemming from the Hahn decomposition for \(\lambda ^{\textrm{net}}=\lambda ^+-\lambda ^-\), the desired inequality for \(\lambda ^{\textrm{net}}\) now follows after another truncation argument.

(iii) Without loss of generality, suppose that \(\mathcal {R}_{MF}\) is finite. Set \(a(\nu ):=(1+\nu (\mathcal {T})^2)^{-1}\), and note that \(0\le a(\nu )\le 1\). With \({\tilde{\phi }}(s):=\phi (s\vee 1)\) the monotone relaxation of \(\phi \), we then have the following chain of inequalities,

$$\begin{aligned} \begin{aligned} \int _{\mathcal {T}} \phi \left( \frac{\textrm{d}\lambda ^{\pm }}{\textrm{d}\theta _{\nu }}\right) \textrm{d}\theta _{\nu }&\ge \int _{\mathcal {T}} {\tilde{\phi }}\left( \frac{\textrm{d}\lambda ^{\pm }}{\textrm{d}\theta _{\nu }}\right) \textrm{d}\theta _{\nu } \\&\ge \int _{\mathcal {T}} {\tilde{\phi }}\left( \frac{\textrm{d}(a(\nu ) \lambda ^{\pm })}{\textrm{d}(a(\nu ) \theta _{\nu })}\right) \textrm{d}(a(\nu ) \theta _{\nu })\\&\ge {\tilde{\phi }}\left( \frac{a(\nu ) \lambda ^{\pm }(\mathcal {T})}{a(\nu ) \theta _{\nu }(\mathcal {T})}\right) a(\nu ) \theta _{\nu }(\mathcal {T}), \end{aligned} \end{aligned}$$

where the last inequality follows from Jensen’s inequality. By convexity of \({\tilde{\phi }}\) and \({\tilde{\phi }}(0)=0\) the latter expression is monotone in \(\theta _{\nu }(\mathcal {T})\), and hence by (2.8) we find

$$\begin{aligned} \tilde{\phi }\left( \frac{\lambda ^{\pm }(\mathcal {T})}{M(1+\nu (\mathcal {T})^2)}\right) M\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-). \end{aligned}$$

\(\square \)

We will briefly state the improvement of regularity in time of \(\nu _t\) if there exists a common dominating measure. The proof is similar to Corollary 4.14 of [33] and therefore omitted here.

Lemma 2.12

Let \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) and suppose that there exists a measure \(\ell \in \Gamma \) such that \(\nu _t,\lambda _t^{\pm }\ll \ell \) for all \(t\in [0,T]\).

Then there exists an absolutely continuous and a.e. differentiable map \(u:[0,T]\rightarrow L^1(\mathcal {T},\ell )\) and maps \(g^{\pm }:[0,T]\rightarrow L^1(\mathcal {T},\ell )\) such that \(u_t=\textrm{d}\nu _t/\textrm{d}\ell \), \(g_t^{\pm }=\textrm{d}\lambda _t^{\pm }/\textrm{d}\ell \) and

$$\begin{aligned} \partial _t u_t(x)=g_t^+(x)-g_t^-(x), \qquad \hbox { for a.e.}\ t\in [0,T]. \end{aligned}$$

In particular, the continuity equation holds in the strong sense, namely that \(\nu _t\) is an a.e. differentiable map from [0, T] to \((\Gamma ,\Vert \cdot \Vert _{TV})\) and

$$\begin{aligned} \partial _t \nu _t=\lambda _t^+-\lambda _t^-, \qquad \hbox { for a.e.}\ t\in [0,T]. \end{aligned}$$

Next, we will list two results that are either necessary for the chain rule in Sect. 3.3 or the superposition principle and well-posedness of the continuity equation in Sect. 4.

Lemma 2.13

For any \(0\le a\le 1\), \(z\in {\mathbb {R}}\)

$$\begin{aligned} \Psi ^*(a z)\le a^2\Psi ^*(z). \end{aligned}$$
(2.10)

Moreover, for any net flux \(\lambda ^{\textrm{net}}\in \mathcal {M}(\mathcal {T})\),

$$\begin{aligned} \Psi \left( \frac{\Vert \lambda ^{\textrm{net}}\Vert _{TV}}{M(1+\nu (\mathcal {T}))}\right) M&\le \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-). \end{aligned}$$
(2.11)

Proof

It is straightforward to check that \(\Psi ^*(z)/z^2\) is monotone increasing for \(z\ge 0\), from which the first statement follows.
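One way to see this is via the power series of the hyperbolic cosine (a sketch). Since \(\Psi ^*(z)=2\sum _{k\ge 1} z^{2k}/(2k)!\) and \(a^{2k}\le a^2\) for all \(k\ge 1\) whenever \(0\le a\le 1\), we obtain

$$\begin{aligned} \Psi ^*(az)=2\sum _{k=1}^{\infty } \frac{a^{2k} z^{2k}}{(2k)!} \le a^2 \cdot 2\sum _{k=1}^{\infty } \frac{z^{2k}}{(2k)!}=a^2\, \Psi ^*(z), \qquad z\in {\mathbb {R}}. \end{aligned}$$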

Now, for the net flux, it is convenient to go through the dual representation. Set \(a(\nu ):=(1+\nu (\mathcal {T}))^{-1}\). By duality, for any \(w\in \mathcal {B}_b(\mathcal {T})\)

$$\begin{aligned} \begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)\ge a(\nu )\int _{\mathcal {T}} w(x) \,\textrm{d}\lambda ^{\textrm{net}}- \int _{\mathcal {T}} \Psi ^*\big (a(\nu )w(x)\big ) \,\textrm{d}\theta _{\nu }. \end{aligned} \end{aligned}$$
(2.12)

However, by (2.10),

$$\begin{aligned} \int _{\mathcal {T}} \Psi ^*\big (a(\nu )w(x)\big ) \,\textrm{d}\theta _{\nu } \le \int _{\mathcal {T}} \Psi ^*\big (w(x)\big ) a(\nu )^2 \,\textrm{d}\theta _{\nu } \le M \Psi ^*(\Vert w\Vert _{\infty }). \end{aligned}$$

Taking the supremum over all \(w\in \mathcal {B}_b(\mathcal {T})\) in (2.12) we find (2.11). \(\square \)
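In more detail, the supremum can be evaluated in two stages (a sketch, assuming \(M>0\)). The last inequality in the previous display uses (2.8) together with \(1+\nu (\mathcal {T})^2\le (1+\nu (\mathcal {T}))^2\), so that \(a(\nu )^2\theta _{\nu }(\mathcal {T})\le M\). Choosing \(w=r(1_P-1_{P^c})\) with \(r\ge 0\) and \(P,P^c\) a Hahn decomposition for \(\lambda ^{\textrm{net}}\), we have \(\int _{\mathcal {T}} w\, \textrm{d}\lambda ^{\textrm{net}}=r\Vert \lambda ^{\textrm{net}}\Vert _{TV}\), and hence

$$\begin{aligned} \mathcal {R}_{MF}(\nu ,\lambda ^+,\lambda ^-)&\ge \sup _{r\ge 0} \Big \{ r\, a(\nu )\Vert \lambda ^{\textrm{net}}\Vert _{TV}-M\Psi ^*(r) \Big \} \\&=M \sup _{r\ge 0} \left\{ r\, \frac{a(\nu )\Vert \lambda ^{\textrm{net}}\Vert _{TV}}{M}-\Psi ^*(r) \right\} =M\, \Psi \left( \frac{\Vert \lambda ^{\textrm{net}}\Vert _{TV}}{M(1+\nu (\mathcal {T}))}\right) . \end{aligned}$$

The restriction to \(r\ge 0\) loses nothing here, since \(\Psi ^*\) is even and the argument of \(\Psi \) is non-negative.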

Lemma 2.14

Let \(\{f_i\}_{i\in {\mathbb {N}}} \subset C_b(\mathcal {T})\) be a countable and dense set of bounded continuous functions. Suppose \((\nu ,\lambda ^+,\lambda ^-)\) is such that

  1. (i)

    the curve \([0,T]\ni t\mapsto \nu _t\in \Gamma \) is narrowly continuous

  2. (ii)

    \((\lambda _t^\pm )_{t\in [0,T]}\subset \Gamma \) is a Borel family with

    $$\begin{aligned} \int _0^T \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t) \, \textrm{d}t<\infty \end{aligned}$$
  3. (iii)

    For all \(i \in {\mathbb {N}}\)

    $$\begin{aligned}{} & {} \int _{\mathcal {T}} f_i\, \textrm{d}\nu _t - \int _{\mathcal {T}} f_i\, \textrm{d}\nu _s = \int _s^t \left( \int _{\mathcal {T}} f_i \,\textrm{d}\lambda _r^+-\int _{\mathcal {T}} f_i \,\textrm{d}\lambda _r^- \right) \textrm{d}r,\\{} & {} \qquad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$

Then \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\), i.e. the triple satisfies the mean-field continuity equation.

Proof

Since \(\nu _t\) is narrowly continuous its mass is uniformly bounded in time, hence let \(C:=\sup _{t\in [0,T]} \nu _t(\mathcal {T})\). By (2.9) and monotonicity of \(\phi (\cdot \vee 1)\) we have for a.e. \(t\in [0,T]\),

$$\begin{aligned} \phi \left( \frac{\lambda _t^{\pm }(\mathcal {T})}{M(1+C^2)} \vee 1\right) M\le \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t), \end{aligned}$$

and therefore by convexity of \(\phi (\cdot \vee 1)\)

$$\begin{aligned} \int _0^T \lambda _t^{\pm }(\mathcal {T}) \, \textrm{d}t<\infty . \end{aligned}$$

Since the measures \(\lambda _t^{\pm }(\textrm{d}x)\, \textrm{d}t \in \mathcal {M}^+([0,T]\times \mathcal {T})\) are finite, by density of \(\{f_i\}_{i\in {\mathbb {N}}}\) in \(C_b(\mathcal {T})\) it is clear that for all \(f\in C_b(\mathcal {T})\)

$$\begin{aligned} \int _{\mathcal {T}} f \,\textrm{d}\nu _t - \int _{\mathcal {T}} f \,\textrm{d}\nu _s = \int _s^t \left( \int _{\mathcal {T}} f \,\textrm{d}\lambda _r^+-\int _{\mathcal {T}} f \,\textrm{d}\lambda _r^- \right) \, \textrm{d}r, \qquad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$

By a monotone class argument, this can be extended to all \(f\in \mathcal {B}_b(\mathcal {T})\) and we derive that \(\nu _t\) is indeed TV-absolutely continuous and \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\). \(\square \)

2.2 Strong solutions

Strong solutions to (\(\mathsf MF\)) exist and are unique, and we list the most important properties here. It should be noted that these arguments apply even without the detailed balance condition \(m(x,y)=c(y,x)\) and only require both \(\Vert m\Vert _{\infty }\) and \(\Vert c\Vert _{\infty }{}\) to be finite, but for simplicity, we will restrict ourselves to our framework. Moreover, in all results the time window \(T>0\) is arbitrary.

Definition 2.15

A strong solution to (\(\mathsf MF\)) is any TV-absolutely continuous and a.e. differentiable mapping \(\nu :[0,T]\rightarrow (\Gamma ,\Vert \cdot \Vert _{TV})\) satisfying

$$\begin{aligned} \begin{aligned} \partial _t \nu _t(\textrm{d}x)&=\kappa _{\nu _t}^+(\textrm{d}x)-\kappa _{\nu _t}^-(\textrm{d}x)\\ \end{aligned} \end{aligned}$$
(2.13)

Recall that \(\kappa ^+_{\nu }(\textrm{d}x)={c_{\nu }(x)} \gamma (\textrm{d}x)\) and \(\kappa ^-_{\nu }(\textrm{d}x)={c_{\nu }}(x) \nu (\textrm{d}x)\), where \(c_{\nu }(x)=\int _{\mathcal {T}} c(x,y)\, \nu (\textrm{d}y)\).

Remark 2.16

Note that if \(\nu \) is a strong solution to (\(\mathsf MF\)) automatically \((\nu ,\kappa ^{+}_{\nu },\kappa _{\nu }^-) \in \mathscr{C}\mathscr{E}\).

Vice versa, if \((\nu ,\kappa ^{+}_{\nu },\kappa _{\nu }^-) \in \mathscr{C}\mathscr{E}\) then \(\nu _t\) is a strong solution. Namely, any TV-absolutely continuous curve \(\nu _t\) possesses a common dominating measure \(\ell \in \Gamma \), which implies \(\kappa _{\nu _t}^{\pm }\ll \ell +\gamma \). By Lemma 2.12 the curve \(\nu \) is indeed an a.e. differentiable mapping to \((\Gamma ,\Vert \cdot \Vert _{TV})\).

Lemma 2.17

For any \({{\bar{\nu }}}\in \Gamma \) there exists a unique strong solution \(\nu _t\) to (\(\mathsf MF\)) such that \(\nu _0={{\bar{\nu }}}\).

Moreover, if \({{\bar{\nu }}}\ll \gamma \), then also \(\nu _t\ll \gamma \) for all \(t\in [0,T]\).

The proof is an adaptation of [18, Proposition 7.2], which is stated for Lebesgue absolutely continuous measures over \(\mathcal {T}={\mathbb {R}}^d\). In short, the linear dependence of the birth flux on the mass of \(\nu \) gives a bound on this mass that is uniform in time, in which case both \(\kappa ^{\pm }_{\nu }\) are Lipschitz in \(\nu \) on \((\Gamma ,\Vert \cdot \Vert _{TV})\), and classical existence theory can be applied.

Proof

First, note that for the linear case of

$$\begin{aligned} \partial _t \nu _t(\textrm{d}x) = b_t(\textrm{d}x)-c_t(x)\nu _t(\textrm{d}x), \end{aligned}$$

with \(c_t\in \mathcal {B}_b\) uniformly bounded in t, and \(b_{{t}} \in \Gamma \) admitting a common dominating measure and satisfying \(\int _0^T \Vert b_{{t}}\Vert _{TV} \, \textrm{d}t<\infty \), it is easy to verify that a unique strong non-negative solution exists and is given by

$$\begin{aligned} \nu _t:=e^{-\int _0^t c_s(x)\, \textrm{d}s}\left( \int _0^t b_s e^{\int _0^s c_r \, \textrm{d}r} \textrm{d}s+\nu _0\right) . \end{aligned}$$

We now set \(\nu ^0_t:={{\bar{\nu }}}\) for all \(t\in [0,T]\), and perform the implicit Picard iteration

$$\begin{aligned} \partial _t \nu ^{k+1}_t(\textrm{d}x)={c_{\nu _t^k}(x)} \gamma (\textrm{d}x) -{c_{\nu _t^k}(x)} \nu _t^{k+1}(\textrm{d}x), \qquad \nu ^{k+1}_0:={{\bar{\nu }}}, \end{aligned}$$

i.e. \(\nu ^{k+1}=(\mathcal {G}\nu ^k)\) with

$$\begin{aligned} (\mathcal {G}\nu )_t(\textrm{d}x):=e^{-\int _0^t {c_{\nu _s}(x)\, } \textrm{d}s}\left( \int _0^t {{c_{\nu _s}(x)}\gamma (\textrm{d}x)} e^{\int _0^s {{c_{\nu _r}(x)}} \, \textrm{d}r} \textrm{d}s+\bar{\nu }(\textrm{d}x)\right) . \end{aligned}$$

It is straightforward to check that for all \(t\in [0,T]\)

$$\begin{aligned}\sup _{k\ge 1} \nu ^k_t(\mathcal {T}) \le e^{\Vert c\Vert _{\infty } \gamma (\mathcal {T})t}{{\bar{\nu }}}(\mathcal {T})\le e^{\Vert c\Vert _{\infty } \gamma (\mathcal {T})T}{{\bar{\nu }}}(\mathcal {T})=:C.\end{aligned}$$

We will show that \(\mathcal {G}\) is contractive under a suitable metric on the space of curves with initial data \({{\bar{\nu }}}\) and mass bounded by C. This implies there exists a TV-absolutely continuous curve \(\nu \) such that

$$\begin{aligned} \nu _t-\nu _s = \int _s^t \left( \kappa _{\nu _r}^{+}-\kappa _{\nu _r}^-\right) \, \textrm{d}r, \qquad \hbox {for all } s,t \hbox { with } 0\le s,t\le T. \end{aligned}$$

Moreover, since in the iterations \(\nu ^k\ll {{\bar{\nu }}}+\gamma \) for all k, it is clear that we obtain strong solutions in \(L^1(\bar{\nu }+\gamma )\). In particular, for \({{\bar{\nu }}}\ll \gamma \) we have \(\nu _t\ll \gamma \) for all \(t\in [0,T]\) as well.

Now, note that \(\langle c(x,\cdot ),\nu \rangle \) depends Lipschitz-continuously on \(\nu \) in \((\Gamma ,\Vert \cdot \Vert _{TV})\) due to the uniform bound on mass. This implies that there exists a constant K such that for any two admissible curves \(\nu ,{\tilde{\nu }}\):

$$\begin{aligned} \Vert (\mathcal {G}\nu )_t-(\mathcal {G}{\tilde{\nu }})_t\Vert _{TV}\le K \int _0^t \Vert \nu _s-{\tilde{\nu }}_s\Vert _{TV} \, \textrm{d}s, \qquad \hbox {for all } t\in [0,T]. \end{aligned}$$

Hence, by a Gronwall-type argument, we find that for any \(\varepsilon >0\) for all \(t\in [0,T]\)

$$\begin{aligned} \Vert (\mathcal {G}\nu )_t-(\mathcal {G}{\tilde{\nu }})_t\Vert _{TV} e^{-(K+\varepsilon ) t} \le \frac{K}{K+\varepsilon }\left( \sup _{s\in [0,T]} \Vert \nu _s-\tilde{\nu }_s\Vert _{TV} e^{-(K+\varepsilon ) s} \right) , \end{aligned}$$

thus yielding the contraction required to apply the Banach fixed-point theorem. \(\square \)
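The fixed-point construction can be illustrated numerically. The following is a minimal sketch and not part of the proof: it assumes that \(\mathcal {T}\) is a finite set, so that measures are vectors, and it uses that for the birth flux \(c_{\nu }\gamma \) the explicit formula for \(\mathcal {G}\) reduces to \((\mathcal {G}\nu )_t=\gamma +({{\bar{\nu }}}-\gamma )e^{-\int _0^t c_{\nu _s}\,\textrm{d}s}\). The function names, the trapezoid quadrature, and the sample data are illustrative assumptions.

```python
import math

def c_nu(c, nu):
    """c_nu(x) = sum_y c(x, y) nu(y) on a finite trait space."""
    return [sum(cx[y] * nu[y] for y in range(len(nu))) for cx in c]

def picard_step(c, gamma, nu_bar, curve, dt):
    """One iteration nu^k -> nu^{k+1} = G(nu^k) on a uniform time grid.

    Uses nu_t = gamma + (nu_bar - gamma) * exp(-A_t) with
    A_t = int_0^t c_{nu^k_s} ds approximated by the trapezoid rule.
    """
    rates = [c_nu(c, nu) for nu in curve]
    d = len(gamma)
    A = [0.0] * d
    new_curve = [list(nu_bar)]
    for j in range(1, len(curve)):
        A = [A[x] + 0.5 * dt * (rates[j - 1][x] + rates[j][x]) for x in range(d)]
        new_curve.append(
            [gamma[x] + (nu_bar[x] - gamma[x]) * math.exp(-A[x]) for x in range(d)]
        )
    return new_curve

def sup_tv(curve_a, curve_b):
    """Supremum over the time grid of the total variation distance."""
    return max(
        sum(abs(a - b) for a, b in zip(na, nb)) for na, nb in zip(curve_a, curve_b)
    )

def solve(c, gamma, nu_bar, T=1.0, steps=200, iterations=40):
    """Iterate G starting from the constant curve nu^0_t = nu_bar."""
    dt = T / steps
    curve = [list(nu_bar) for _ in range(steps + 1)]
    gaps = []  # sup-TV distance between consecutive iterates
    for _ in range(iterations):
        nxt = picard_step(c, gamma, nu_bar, curve, dt)
        gaps.append(sup_tv(curve, nxt))
        curve = nxt
    return curve, gaps

# Illustrative data (hypothetical): three traits, symmetric bounded kernel c.
c = [[0.0, 1.0, 0.5], [1.0, 0.0, 2.0], [0.5, 2.0, 0.0]]
gamma = [1.0, 0.5, 0.25]
nu_bar = [0.2, 0.0, 1.0]
curve, gaps = solve(c, gamma, nu_bar)
```

In this sketch the list `gaps` records how fast consecutive iterates approach each other in the sup-TV distance, which is the discrete counterpart of the contraction estimate above.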

Finally, for use in the entropic propagation of chaos in Theorem 5.4, it is convenient to characterize the conditions under which the density \(u_t=\textrm{d}\nu _t/\textrm{d}\gamma \) is bounded from above and below. The following statement follows directly from a Gronwall-type argument.

Lemma 2.18

Suppose \(\nu _0\) is such that \(C^{-1} \le \textrm{d}\nu _0/\textrm{d}\gamma (x) <C\) for some constant \(C>0\) and all \(x\in \mathcal {T}\). Then there exists a constant \(C_T>0\) such that for the corresponding solution

$$\begin{aligned} C_T^{-1} \le \frac{\textrm{d}\nu _t}{\textrm{d}\gamma } (x) <C_T, \qquad \text{ for } \text{ all } x\in \mathcal {T}, \text{ for } \text{ all } t\in [0,T]. \end{aligned}$$
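One way to see such a bound is sketched below. It assumes that the representation formula from the proof of Lemma 2.17 may be read at the level of densities \(u_t=\textrm{d}\nu _t/\textrm{d}\gamma \), and the symbol \(A_t\) is introduced only for this illustration. Writing \(A_t(x):=\int _0^t c_{\nu _s}(x)\,\textrm{d}s\ge 0\), the formula gives

$$\begin{aligned} u_t(x)=e^{-A_t(x)}\,u_0(x)+\left( 1-e^{-A_t(x)}\right) , \end{aligned}$$

so that \(u_t(x)\) is a convex combination of \(u_0(x)\) and 1, and hence \(\min \{u_0(x),1\}\le u_t(x)\le \max \{u_0(x),1\}\). Since \(C^{-1}\le u_0<C\) forces \(C>1\), under these assumptions the bounds on \(u_0\) carry over to \(u_t\).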

2.3 Variational characterization

We will now prove the non-negativity of our EDP-functional \(\mathcal {I}_{MF}\) and the characterization of strong solutions to (\(\mathsf MF\)) as minimizers of \(\mathcal {I}_{MF}\). To do so, we first need to prove the chain rule for the free energy \(\mathcal {F}_{MF}\) along curves with finite \(\mathcal {I}_{MF}\).

There is an important technical issue concerning the Fisher information, in the sense that on curves with finite \(\mathcal {I}_{MF}\) the chain rule inequality holds for the following replacement:

$$\begin{aligned} \mathcal {D}^-_{MF}(\nu ):=\int _{\mathcal {T}} \Psi ^*\left( \frac{1}{2}\log u\right) \textrm{d}\theta _{\nu } = \int _{u>0} c_{\nu }(x)\left( \sqrt{u}-1\right) ^2 \textrm{d}\gamma , \end{aligned}$$

for any \(\nu \ll \gamma \) with \(u:=\textrm{d}\nu /\textrm{d}\gamma \). Note that \(0\le \mathcal {D}^-_{MF}(\nu )\le \mathcal {D}_{MF}(\nu )\) and \(\mathcal {D}^-_{MF}=\mathcal {R}_{MF}^*(\partial _{\nu } \mathcal {F}_{MF})\).

We will see the same principle arise in Sect. 3 for the variational characterization of the forward Kolmogorov equation, which is also observed in [33, Section 5].

Lemma 2.19

For any curve \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \) and \(\mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)<\infty \) it holds that \([0,T]\ni t\mapsto \mathcal {F}_{MF}(\nu _t)\) is absolutely continuous and a.e. differentiable with

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}\, t} \mathcal {F}_{MF}(\nu _t)=\frac{1}{2} \int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \,\textrm{d}\lambda ^{\textrm{net}}_t, \qquad \hbox { for a.e.}\ t\in [0,T]. \end{aligned}$$

Moreover, for such a curve

$$\begin{aligned} \mathcal {I}_{MF}(\nu ,\lambda ^+,\lambda ^-)\ge \mathcal {I}_{MF}^-:=\int _0^T \left( \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t) +\frac{1}{2} \int _{\mathcal {T}} \log \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\,\textrm{d}\lambda ^{\textrm{net}}_t+\mathcal {D}^-_{MF}(\nu _t)\right) \, \textrm{d}t \ge 0. \end{aligned}$$

Remark 2.20

In fact, for such curves, for a.e. t both the terms

$$\begin{aligned} \int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \,\textrm{d}\lambda ^{\pm }_t, \end{aligned}$$

will be finite, and hence

$$\begin{aligned} \int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \,\textrm{d}\lambda ^{{\textrm{net}}}_t=\int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \,\textrm{d}\lambda _t^+-\int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \,\textrm{d}\lambda _t^-.\end{aligned}$$

Remark 2.21

From Lemma 2.19, it is clear that an alternative approach would be to discard the functional \(\mathcal {I}_{MF}\) and only consider \(\mathcal {I}_{MF}^-\), and relate minimizers to EDP-solutions, and so forth. However, \(\mathcal {D}_{MF}\), and by extension \(\mathcal {I}_{MF}\), is introduced because \(\mathcal {D}_{MF}\) and its Liouville counterpart \(\mathcal {D}_{\infty }\) (see Sect. 4) are lower semicontinuous, and because \(\mathcal {I}_{MF}\) arises in the limit of the EDP-convergence of Sect. 5.

Proof

Fix any curve \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \). We will show that whenever \(\mathcal {I}_{MF}<\infty \) the mapping \(t \mapsto \mathcal {E}\textrm{nt}(\nu _t|\gamma )\) is absolutely continuous and satisfies the chain rule, i.e.

$$\begin{aligned} \frac{\textrm{d}\, \mathcal {E}\textrm{nt}(\nu _t|\gamma )}{\textrm{d}\, t}=\int _{\mathcal {T}} \log \left( \frac{\textrm{d}\nu _t}{\textrm{d}\gamma }\right) \textrm{d}(\lambda ^+_t-\lambda ^-_t) \,, \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$

Suppose that \(\mathcal {I}_{MF}<\infty \). Since \(\mathcal {E}\textrm{nt}\) is bounded from below, \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )<\infty \) implies that

$$\begin{aligned} \int _0^T \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda ^-_t)\, \textrm{d}t<\infty , \qquad \int _0^T \mathcal {D}_{MF}(\nu _t) \, \textrm{d}t < \infty . \end{aligned}$$

In particular, for a.e. \(t\in [0,T]\) it holds that \(\nu _t\ll \gamma \), \(\lambda _t^{\pm }\ll \theta _{\nu _t}\), and in turn \(\theta _{\nu _t} \ll \gamma \). In fact, due to TV-continuity of \(\nu _t\), we have \(\nu _t\ll \gamma \) for all \(t\in [0,T]\). Moreover, \(\int _0^T \lambda ^{\pm }_t(\mathcal {T})\, \textrm{d}t<\infty \) and \(\sup _t \nu _t(\mathcal {T})<\infty \).

Setting \(u_t:=\textrm{d}\nu _t/\textrm{d}\gamma \), we have

$$\begin{aligned} \textrm{d}\theta _{\nu _t}=c_{\nu _t} \sqrt{u_t} \,\textrm{d}\gamma , \end{aligned}$$

and in particular \(\theta _{\nu _t}(\{u_t=0\})=0\). Similarly, \(\lambda _t^{\pm }\ll \theta _{\nu _t}\) for a.e. t and hence \(u_t>0\) for \(\lambda ^{\pm }_t,\lambda ^{\textrm{net}}_t\)-a.e. x for such t as well. Furthermore, since for a.e. t we have \(\lambda _t^{\pm }\ll \theta _{\nu _t}\ll \gamma \) we find by Lemma 2.12 that \(u: [0,T]\rightarrow L^1(\mathcal {T},\gamma )\) is absolutely continuous and differentiable at a.e. \(r\in [0,T]\).

Consider any such r with \(\mathcal {R}_{MF}(\nu _r,\lambda _r^+,\lambda _r^-),\mathcal {D}_{{MF}}(\nu _r)<\infty \). By Lemma 2.10, for any \(w\in \mathcal {B}_b(\mathcal {T})\),

$$\begin{aligned} \left| \int _{\mathcal {T}} w \,\textrm{d}\lambda ^{\textrm{net}}_r \right| \le \int _{\mathcal {T}}|w| \,\textrm{d}|\lambda _r^{{\textrm{net}}}| \le \mathcal {R}_{MF}(\nu _r,\lambda ^+_r,\lambda ^-_r)+\int _{\mathcal {T}} \Psi ^*(w) \,\textrm{d}\theta _{\nu _r}. \end{aligned}$$
(2.14)

Now let \(\phi _m\) be the convex and uniformly Lipschitz regularizations of \(\phi \) constructed by using the truncations \(\phi _m':=[\phi ']_m=\max \{\min \{\phi ',m\},-m\}\) and \(\phi _m(s):=\int _1^s \phi _m'(z)\, \textrm{d}z\). Note that \(\phi _m'\) converges pointwise to \(\phi '\), and both \(\phi _m\) and \(|\phi '_m|\) converge monotonically to \(\phi \) and \(|\phi '|\) respectively.

Moreover, note that \(\phi '(u_r)=\log u_r\) is \(\theta _{\nu _r}\)-a.e. finite, and similarly \(\lambda ^{\pm }_r\)-a.e. as well. Therefore, since \(\Psi ^*\) is even and monotone on \({\mathbb {R}}_{\ge 0}\) we derive

$$\begin{aligned} \begin{aligned} \int _{\mathcal {T}} \Psi ^*(\tfrac{1}{2}\phi '_m(u_r)) \,\textrm{d}\theta _{\nu _r}&\le \int _{\mathcal {T}} \Psi ^*(\tfrac{1}{2}\phi '(u_r)) \,\textrm{d}\theta _{\nu _r}= \mathcal {D}^-_{MF}(\nu _r).\\ \end{aligned} \end{aligned}$$

Recall that \(\mathcal {D}^-_{MF}(\nu _r)\le \mathcal {D}_{MF}(\nu _r)\). By substituting \(w=\tfrac{1}{2}\phi _m'(u_r)\) in (2.14) we find

$$\begin{aligned} \frac{1}{2}\int _{\mathcal {T}} \phi _m'(u_r) \,\textrm{d}\lambda ^{\textrm{net}}_r\le \frac{1}{2} \int _{\mathcal {T}} |\phi _m'(u_r)|\, \textrm{d}|\lambda ^{\textrm{net}}_r|\le \mathcal {R}_{MF}(\nu _r,\lambda ^+_r,\lambda ^-_r)+\mathcal {D}^-_{MF}(\nu _r), \end{aligned}$$

and after a monotone convergence argument

$$\begin{aligned} \frac{1}{2}\int _{\mathcal {T}} \phi '(u_r) \, \textrm{d}\lambda ^{\textrm{net}}_r\le \frac{1}{2} \int _{\mathcal {T}} |\phi '(u_r)|\,\textrm{d}|\lambda ^{\textrm{net}}_r|\le \mathcal {R}_{MF}(\nu _r,\lambda ^+_r,\lambda ^-_r)+\mathcal {D}^-_{MF}(\nu _r). \end{aligned}$$
(2.15)

Note that for every m the function \(\phi _m\) is smooth and uniformly Lipschitz, thus the functional \(\int \phi _m(u_r) \,\textrm{d}\gamma \) is \(\Vert \cdot \Vert _{TV}\)-Lipschitz continuous and hence absolutely continuous by TV-regularity of \(\nu _r\). Moreover, since \(\lambda _r^{\pm }\ll \gamma \) and \(u_r\) is a.e. differentiable in \(L^1({\mathcal {T}},\gamma )\) it is straightforward to check that

$$\begin{aligned} \int _{\mathcal {T}} \phi _m(u_t) \,\textrm{d}\gamma -\int _{\mathcal {T}} \phi _m(u_s) \,\textrm{d}\gamma =\int _s^t \int _{\mathcal {T}} \phi _m'(u_r)\,\textrm{d}(\lambda ^+_r-\lambda ^-_r) \, \textrm{d}r, \qquad \hbox {for all } s,t\in [0,T]. \end{aligned}$$

Therefore, since \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )\) is finite by assumption and the functionals \(\int \phi _m(u_t)\, \textrm{d}\gamma \) converge monotonically to \(\mathcal {E}\textrm{nt}(\nu _t|\gamma )\), we find

$$\begin{aligned} \begin{aligned} \frac{1}{2} \left| \int _{\mathcal {T}} \phi (u_t) \,\textrm{d}\gamma -\int _{\mathcal {T}} \phi (u_0) \, \textrm{d}\gamma \right|&\le \frac{1}{2}\limsup _{m\rightarrow \infty } \int _0^t \int _{\mathcal {T}} |\phi _m'(u_r)|\,\textrm{d}|\lambda _r^+-\lambda _r^-|\, \textrm{d}r \\&\le \int _0^T \left( \mathcal {R}_{MF}(\nu _r,\lambda _r^+,\lambda _r^-)+\mathcal {D}^-_{MF}(\nu _r)\right) \, \textrm{d}r. \end{aligned} \end{aligned}$$

In particular \(\mathcal {E}\textrm{nt}(\nu _t|\gamma )\) is finite for all \(t\in [0,T]\), and after repeating the argument for \(s,t\in [0,T]\) we conclude by a dominated convergence argument that

$$\begin{aligned} \int _{\mathcal {T}} \phi (u_t)\, \textrm{d}\gamma -\int _{\mathcal {T}} \phi (u_s) \, \textrm{d}\gamma =\int _s^t \int _{\mathcal {T}} \phi '(u_r) \,\textrm{d}\lambda ^{\textrm{net}}_r \, \textrm{d}r, \end{aligned}$$

and

$$\begin{aligned} \begin{aligned} \mathcal {I}_{MF}&= \int _0^T \left( \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda _t^-)+\frac{1}{2}\int _{\mathcal {T}} \phi '(u_t)\, \textrm{d}\lambda ^{\textrm{net}}_t+\mathcal {D}_{MF}(\nu _t) \right) \, \textrm{d}t \\&\ge \int _0^T \left( \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda _t^-)+\frac{1}{2}\int _{\mathcal {T}} \phi '(u_t)\, \textrm{d}\lambda ^{\textrm{net}}_t+\mathcal {D}^-_{MF}(\nu _t) \right) \, \textrm{d}t \ge 0. \end{aligned} \end{aligned}$$

\(\square \)

We are now finally in a position to prove Theorem 2.7. With the chain rule above, all that remains is, on the one hand, to show that \(\mathcal {I}^-_{MF}(\nu ,\lambda ^+,\lambda ^-)=0\) implies that \(\lambda _t^{\pm }=\kappa _{\nu _t}^{\pm }\) for a.e. t, and, on the other hand, to show that if \(\nu \) is a strong solution then \(\mathcal {I}^-_{MF}(\nu ,\kappa _{\nu }^+,\kappa _{\nu }^-)=0\) and \(\mathcal {D}^-_{MF}(\nu _t)=\mathcal {D}_{MF}(\nu _t)\) for a.e. \(t\in [0,T]\). The second part again involves proving a chain rule, but now along the solution curve.

Proof of Theorem 2.7

First, consider any \((\nu ,\lambda ^+,\lambda ^-)\in \mathscr{C}\mathscr{E}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \), and \(\mathcal {I}_{MF}=0\). By Lemma 2.19,

$$\begin{aligned} \int _0^T \left( \mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda _t^-)+\frac{1}{2}\int _{\mathcal {T}} \phi '(u_t)\, \textrm{d}\lambda ^{\textrm{net}}_t+\mathcal {D}^-_{MF}(\nu _t) \right) \, \textrm{d}t = 0. \end{aligned}$$

Now, recall that \(\textrm{d}\theta _{\nu }=c_{\nu } \sqrt{u}\, \textrm{d}\gamma \). Setting \(g_t^{\pm }:=\textrm{d}\lambda ^{\pm }_t/\textrm{d}\theta _{\nu _t}\), it holds that \(\log (u_t)\, g_t^{\pm }<\infty \) for \(\theta _{\nu _t}\)-a.e. x and a.e. t, and by the inequality (2.15) that \(|\log u_t|\, |g_t^+-g_t^-|\) is \(\theta _{\nu _t}\)-integrable. Therefore, by straightforward algebraic manipulations, we find that for a.e. t,

$$\begin{aligned} \begin{aligned}&\mathcal {R}_{MF}(\nu _t,\lambda ^+_t,\lambda _t^-)+\frac{1}{2}\int _{\mathcal {T}} \phi '(u_t)\, \textrm{d}\lambda ^{\textrm{net}}_t+\mathcal {D}_{MF}^-(\nu _t)\\&=\int _{\mathcal {T}} \left( \phi (g^+_t)+\tfrac{1}{2} \log (u_t) g^+_t+ \phi ^*(\tfrac{1}{2}\log u_t) + \phi (g^-_t)-\tfrac{1}{2} \log (u_t) g^-_t+ \phi ^*(-\tfrac{1}{2}\log u_t) \right) \textrm{d}\theta _{\nu _t}. \end{aligned} \end{aligned}$$

Due to the duality between \(\phi \) and \(\phi ^*\), this expression is zero if and only if \(\theta _{\nu _t}\)-a.e.

$$\begin{aligned}g_t^{\pm }=(\phi ')^{-1}(\mp \tfrac{1}{2}\log u_t).\end{aligned}$$

Recalling that \(\theta _{\nu }=c_{\nu } \sqrt{u} \gamma \), \(\kappa _{\nu }^+=c_{\nu } \gamma \) and \(\kappa ^-_{\nu }=c_{\nu } u \gamma \) we find that indeed for a.e. t,

$$\begin{aligned} \lambda ^{\pm }_t = \kappa _{\nu _t}^{\pm }. \end{aligned}$$

Vice versa, assume that \(\nu _t\) is a strong solution with \(\mathcal {F}_{MF}(\nu _0)<\infty \). Recall that \(\nu _t\ll \gamma \) for all \(t\in [0,T]\) by Lemma 2.17, and hence \(\kappa _{\nu _t}^{\pm }\ll \gamma \) as well. Therefore we can again write \(u_t:=\textrm{d}\nu _t/\textrm{d}\gamma \), \(\kappa _{\nu }^+=c_{\nu } \gamma \), \(\kappa ^-_{\nu }=c_{\nu } u \gamma \) and \(\theta _{\nu }=c_{\nu } \sqrt{u} \gamma \). Moreover, \(u: [0,T]\rightarrow L^1(\mathcal {T},\gamma )\) is absolutely continuous and a.e. differentiable, and thus for every regularized entropy function:

$$\begin{aligned} \int _{\mathcal {T}} \phi _m(u_T) \,\textrm{d}\gamma -\int _{\mathcal {T}} \phi _m(u_0) \,\textrm{d}\gamma =\int _0^T \int _{\mathcal {T}} c_{\nu _t} \phi _m'(u_t)(1-u_t)\,\textrm{d}\gamma \, \textrm{d}t. \end{aligned}$$

Note that the latter expression is non-positive since \(\phi _{{m}}'(z)(z-1)\) is non-negative, due to the convexity of \(\phi _m\) and \(\phi _m'(1)=\phi '(1)=0\). Moreover, recall that the regularized entropies converge for every \(\nu \), are non-negative, and \(\mathcal {E}\textrm{nt}(\nu _0|\gamma )<{\infty }\) by assumption. Therefore

$$\begin{aligned} \limsup _{m\rightarrow \infty } \int _0^T \int _{\mathcal {T}} c_{\nu _t} \phi _m'(u_t)(u_t-1)\,\textrm{d}\gamma \, \textrm{d}t \le \mathcal {E}\textrm{nt}(\nu _0|\gamma ) < \infty .\end{aligned}$$

It is clear that to obtain \(\mathcal {I}_{MF}=0\) it is sufficient to prove that for any \(\nu \) with \(\nu \ll \gamma \),

$$\begin{aligned} \frac{1}{2} \lim _{m\rightarrow \infty } \int _{\mathcal {T}} c_{\nu } \phi _m'(u)(u-1)\,\textrm{d}\gamma =\mathcal {R}_{{MF}}(\nu ,\kappa _{\nu }^+,\kappa _{\nu }^-)+\mathcal {D}_{{MF}}(\nu ). \end{aligned}$$

By non-negativity of the integrand both

$$\begin{aligned} \lim _{m\rightarrow \infty } \int _{u=0} c_{\nu } \phi _m'(u)(u-1)\,\textrm{d}\gamma < \infty \end{aligned}$$

and

$$\begin{aligned} \lim _{m\rightarrow \infty } \int _{u>0} c_{\nu } \phi _m'(u)(u-1)\,\textrm{d}\gamma < \infty . \end{aligned}$$

Since \(\phi _{{m}}'(0)=-m\) this implies that in fact for all m

$$\begin{aligned} \int _{u=0} c_{\nu } \phi _m'(u)(u-1)\,\textrm{d}\gamma =m \int _{u=0} c_{\nu } \,\textrm{d}\gamma , \end{aligned}$$

but since the former is finite after taking the limit \(m\rightarrow \infty \), we deduce that

$$\begin{aligned}\int _{u=0} c_{\nu } \,\textrm{d}\gamma =0,\end{aligned}$$

and hence \(\gamma (\{u=0,c_{\nu }>0{\}})=0\). Moreover, by monotone convergence we have

$$\begin{aligned} \int _{u>0} c_{\nu } \log (u)(u-1)\,\textrm{d}\gamma =\lim _{m\rightarrow \infty } \int _{u>0} c_{\nu } \phi _m'(u)(u-1)\,\textrm{d}\gamma . \end{aligned}$$

Note by straightforward algebraic manipulation that

$$\begin{aligned} \tfrac{1}{2}\log (z)(z-1)=\phi (\sqrt{z})\sqrt{z}+\phi (1/\sqrt{z})\sqrt{z}+(\sqrt{z}-1)^2\, \qquad \text{ for } \text{ all } z>0. \end{aligned}$$

Therefore

$$\begin{aligned} \begin{aligned} \frac{1}{2}\int _{u>0} c_{\nu } \log (u)(u-1) \textrm{d}\gamma&=\int _{u>0,\,c_{\nu }>0}c_{\nu } \left( \phi \left( \sqrt{u}\right) \sqrt{u}+\phi \left( 1/\sqrt{u}\right) \sqrt{u{}}+(\sqrt{u}-1)^2 \right) \textrm{d}\gamma \\&=\int _{u>0,\,c_{\nu }>0}\left( \phi \left( \frac{\textrm{d}\kappa _{\nu }^+}{\textrm{d}\theta _{\nu }}\right) \frac{\textrm{d}\theta _{\nu }}{\textrm{d}\gamma }+\phi \left( \frac{\textrm{d}\kappa _{\nu }^-}{\textrm{d}\theta _{\nu }}\right) \frac{\textrm{d}\theta _{\nu }}{\textrm{d}\gamma }+c_{\nu }(\sqrt{u}-1)^2 \right) \textrm{d}\gamma .\\ \end{aligned} \end{aligned}$$

Since all terms are non-negative we can separate terms and reduce the expression to

$$\begin{aligned} \tfrac{1}{2}\int _{u>0} c_{\nu } \log (u)(u-1) \textrm{d}\gamma = \mathcal {R}_{MF}(\nu ,\kappa _{\nu }^+,\kappa _{\nu }^-) +\mathcal {D}_{MF}(\nu ). \end{aligned}$$

Here the equality follows from the fact that \(\gamma (\{u=0,c_{\nu }>0{\}})=0\) and hence

$$\begin{aligned} \int _{u>0,\,c_{\nu }>0} c_{\nu } (\sqrt{u}-1)^2 \textrm{d}\gamma = \int _{\mathcal {T}} c_{\nu } (\sqrt{u}-1)^2 \textrm{d}\gamma =\mathcal {D}_{MF}(\nu ), \end{aligned}$$

i.e. \(\mathcal {D}_{MF}^-(\nu )=\mathcal {D}_{MF}(\nu )\), and

$$\begin{aligned} \int _{u>0,\,c_{\nu }>0} \phi \left( \frac{\textrm{d}\kappa _{\nu }^{\pm }}{\textrm{d}\theta _{\nu }}\right) \frac{\textrm{d}\theta _{\nu }}{\textrm{d}\gamma } \textrm{d}\gamma = \mathcal {E}\textrm{nt}(\kappa ^{\pm }_{\nu }|\theta _{\nu }).\end{aligned}$$

\(\square \)
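The algebraic identity for \(\tfrac{1}{2}\log (z)(z-1)\) used in the proof above can be sanity-checked numerically. The sketch below assumes \(\phi (s)=s\log s-s+1\), which is the function determined by \(\phi '(s)=\log s\) and \(\phi (1)=0\); the function names and sample points are illustrative.

```python
import math

def phi(s):
    """Entropy density with phi'(s) = log s and phi(1) = 0."""
    return s * math.log(s) - s + 1.0

def lhs(z):
    """Left-hand side: (1/2) log(z) (z - 1)."""
    return 0.5 * math.log(z) * (z - 1.0)

def rhs(z):
    """Right-hand side: phi(sqrt z) sqrt z + phi(1/sqrt z) sqrt z + (sqrt z - 1)^2."""
    r = math.sqrt(z)
    return phi(r) * r + phi(1.0 / r) * r + (r - 1.0) ** 2

samples = [1e-3, 0.1, 0.5, 1.0, 2.0, 10.0, 1e3]
# Largest discrepancy between both sides, scaled by the size of the left-hand side.
max_gap = max(abs(lhs(z) - rhs(z)) / (1.0 + abs(lhs(z))) for z in samples)
```

Up to floating-point rounding, `max_gap` vanishes; the three summands on the right correspond to the two entropy terms in \(\mathcal {R}_{MF}\) and the Fisher-information term \(\mathcal {D}_{MF}\).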

3 Forward Kolmogorov equation

In the Introduction, we discussed how the BPDL model describes a measure-valued process \(\nu ^n_t\) in \(\Gamma \) involving particles being created and annihilated, with the corresponding Forward Kolmogorov equation

$$\begin{aligned} \partial _t \textsf{P}_t = Q_n^* \textsf{P}_t, \end{aligned}$$
(\(\mathsf FKE_n\))

where \(\textsf{P}_t \in \mathcal {P}(\Gamma )\) for all \(t\in [0,T]\) and \(Q_n^*\) is the dual of the infinitesimal generator \(Q_n\) with

$$\begin{aligned} (Q_n F)(\nu ) = n \int _{\mathcal {T}} \big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\, \kappa ^+_{\nu }(\textrm{d}x)+n \int _{\mathcal {T}} \big (F(\nu -\tfrac{1}{n}\delta _x)-F(\nu )\big )\, \kappa ^-_{\nu }(\textrm{d}x), \end{aligned}$$
(3.1)

for all \(F\in C_c(\Gamma )\). Throughout this section, the parameter \(n>0\) will be fixed.

In the case of \(\mathcal {T}={\mathbb {R}}^d\) it is shown in [18] that a measure-valued process with generator \(Q_n\) exists, and is in fact a jump process in \(\Gamma \) corresponding to the jump kernel \({{\bar{\kappa }}}_n\) shown below. However, for our general setting with \(\mathcal {T}\) a compact Polish space, we will take (\(\mathsf FKE_n\)) simply as a starting point, and do not consider the existence or convergence of the measure-valued process \(\nu _t^n\) itself—even though we will sometimes borrow the language of jump processes for illustration purposes.

In this section, we will state the general version of Theorem 1.6, by showing that a detailed balance condition holds, establishing a generalized gradient structure for the Forward-Kolmogorov equation, and characterizing the solutions as minimizers of corresponding EDP-functionals. Similar to Sect. 2 we first give an overview of the ingredients to state the main results and then leave the proofs for the existence of solutions and the variational characterization to Sects. 3.2 and 3.3.

Note that since

$$\begin{aligned}\sup _{\nu \in \Gamma } \kappa ^{\pm }_{\nu }(\mathcal {T})=+\infty ,\end{aligned}$$

the operator \(Q_n\) is not bounded on \(\mathcal {B}_b(\Gamma )\). If it were, suitable solutions and a possible variational formulation would fall into the framework of [33], where triples \((V,\pi ,\kappa )\) are considered, with V a Polish space, \(\pi \) a finite measure, and \(\kappa (x,\textrm{d}y)\) a jump kernel satisfying a detailed balance condition with respect to \(\pi \) and the boundedness condition

$$\begin{aligned} \sup _{x} \int _{V} \kappa (x,\textrm{d}y) < \infty . \end{aligned}$$

They construct solutions to the forward Kolmogorov equation that are absolutely continuous to \(\pi \) and characterize them as minimizers of a suitable EDP functional involving the net flux. In this section, we generalize part of this framework to unbounded kernels and so-called one-way or uni-directional fluxes and tailor it to our setting of interacting particle systems.

Namely, let the rescaled empirical measure mapping \(L_n:\coprod _{N\ge 1} \mathcal {T}^{N} \rightarrow \Gamma \) be given as

$$\begin{aligned} L_n(x_1,\dots ,x_N):=\frac{1}{n} \sum _{i=1}^N \delta _{x_i}. \end{aligned}$$
(3.2)

and let \(\Gamma _n\subset \Gamma \) be the space of finite positive discrete measures with common unit weight \(\tfrac{1}{n}\), i.e.

$$\begin{aligned} \Gamma _n:= L_n\left( \coprod _{N\ge 1}\mathcal {T}^N\right) . \end{aligned}$$
(3.3)

Note that the operators \(Q_n, Q_n^*\) can be represented as

$$\begin{aligned} \begin{aligned} (Q_n F)(\nu )&= \int _{\Gamma _n} \left( F(\eta )-F(\nu )\right) {{\bar{\kappa }}}_n(\nu ,\textrm{d}\eta ), \\ (Q_n^* \textsf{P})(\textrm{d}\nu )&= \int _{\eta \in \Gamma _n} \textsf{P}(\textrm{d}\eta ) {{\bar{\kappa }}}_n(\eta ,\textrm{d}\nu ) - \textsf{P}(\textrm{d}\nu )\int _{\eta \in \Gamma _n} {{\bar{\kappa }}}_n(\nu ,\textrm{d}\eta ), \end{aligned} \end{aligned}$$

where \({{\bar{\kappa }}}_n(\nu ,\cdot ) \in \mathcal {M}^+(\Gamma _n)\) for all \(\nu \in \Gamma _n\) is a jump kernel over \(\Gamma _n\) given by

$$\begin{aligned} {{\bar{\kappa }}}_n(\nu ,\textrm{d}\eta ):= n \int _{\mathcal {T}}\delta _{\nu + \tfrac{1}{n} \delta _x}(\textrm{d}\eta )\, \kappa ^+_{\nu }(\textrm{d}x) + n \int _{\mathcal {T}} \delta _{\nu -\tfrac{1}{n}\delta _x}(\textrm{d}\eta )\,\kappa ^-_{\nu }(\textrm{d}x). \end{aligned}$$
(3.4)

Moreover, we consider Poisson measures \(\Pi _n\in \mathcal {P}(\Gamma _n)\) induced by the reference measure \(\gamma \). Namely, with the measure \(\pi _n \in \mathcal {P}(\coprod _{N\ge 1}\mathcal {T}^N)\) given by

$$\begin{aligned} \pi _n:=\frac{1}{e^{n \gamma (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}\gamma ^{\otimes N}, \end{aligned}$$
(3.5)

we define

$$\begin{aligned} \Pi _n:=(L_n)_{\#} \pi _{{n}}. \end{aligned}$$
(3.6)
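Under \(\pi _n\), the number N of particles follows a Poisson distribution with parameter \(n\gamma (\mathcal {T})\) conditioned on \(N\ge 1\), since \(\pi _n(\mathcal {T}^N)=\frac{(n\gamma (\mathcal {T}))^N}{N!\,(e^{n\gamma (\mathcal {T})}-1)}\). The following minimal sketch tabulates these weights; the function name, the truncation level `n_max`, and the parameter values are illustrative assumptions.

```python
import math

def count_weights(n, gamma_mass, n_max):
    """Probabilities pi_n(T^N) for N = 1, ..., n_max.

    gamma_mass stands for gamma(T); the series is truncated at n_max.
    """
    norm = math.expm1(n * gamma_mass)  # e^{n gamma(T)} - 1
    weights = []
    term = 1.0
    for N in range(1, n_max + 1):
        term *= n * gamma_mass / N  # (n gamma(T))^N / N!, built up recursively
        weights.append(term / norm)
    return weights

n, gamma_mass = 5, 1.5
weights = count_weights(n, gamma_mass, n_max=80)
total = sum(weights)
# Expected total mass nu(T) = N / n under Pi_n.
mean_mass = sum(N * w for N, w in enumerate(weights, start=1)) / n
```

In this sketch the expected mass is \(\gamma (\mathcal {T})/(1-e^{-n\gamma (\mathcal {T})})\), which is close to \(\gamma (\mathcal {T})\) already for moderate n.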

We will show in Lemma 3.12 that the measures \(\Pi _n\) are invariant measures of (\(\mathsf FKE_n\)) and that \({{\bar{\kappa }}}_{n}\) satisfies the detailed balance condition with respect to \(\Pi _n\), i.e. we have the symmetry

$$\begin{aligned} \Pi _n(\textrm{d}\nu ) {{\bar{\kappa }}}_n(\nu ,\textrm{d}\eta )=\Pi _n(\textrm{d}{\eta }) {{\bar{\kappa }}}_n(\eta ,\textrm{d}{\nu }). \end{aligned}$$
(3.7)
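The detailed balance symmetry can be checked by hand on a finite trait space, where a state \(\nu \in \Gamma _n\) is a vector of particle counts k with \(\nu =k/n\) and \(\Pi _n\) has weights proportional to \(\prod _x (n\gamma _x)^{k_x}/k_x!\). The sketch below is an illustration under these assumptions, with a kernel satisfying \(c(x,x)=0\); all names and numerical values are hypothetical.

```python
import math
from itertools import product

n = 3
gamma = [0.7, 0.4]            # weights of the reference measure (illustrative)
c = [[0.0, 1.3], [1.3, 0.0]]  # bounded kernel with zero diagonal (assumption)

def weight(k):
    """Unnormalised Pi_n-weight of nu = k/n: prod_x (n gamma_x)^{k_x} / k_x!."""
    return math.prod((n * g) ** kx / math.factorial(kx) for g, kx in zip(gamma, k))

def c_nu(k, x):
    """c_nu(x) = sum_y c(x, y) nu({y}) with nu = k/n."""
    return sum(c[x][y] * k[y] / n for y in range(len(k)))

def birth_rate(k, x):
    """Rate of the jump nu -> nu + delta_x/n, i.e. n * kappa^+_nu({x})."""
    return n * c_nu(k, x) * gamma[x]

def death_rate(k, x):
    """Rate of the jump nu -> nu - delta_x/n, i.e. n * kappa^-_nu({x})."""
    return n * c_nu(k, x) * k[x] / n

max_gap = 0.0
for k in product(range(6), repeat=2):
    if sum(k) == 0:
        continue  # Gamma_n does not contain the zero measure
    for x in range(2):
        up = tuple(kj + (j == x) for j, kj in enumerate(k))
        forward = weight(k) * birth_rate(k, x)     # Pi_n(nu) * rate(nu -> up)
        backward = weight(up) * death_rate(up, x)  # Pi_n(up) * rate(up -> nu)
        max_gap = max(max_gap, abs(forward - backward) / (1.0 + forward))
```

The normalising constant of \(\Pi _n\) cancels on both sides, so unnormalised weights suffice. In this sketch the balance relies on \(c(x,x)=0\): otherwise adding a particle at x would change \(c_{\nu }(x)\) and the two sides would differ.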

It is straightforward to check that even though \({{\bar{\kappa }}}_n\) is unbounded, we still have the weighted integrability condition

$$\begin{aligned} \sup _{\nu \in \Gamma _n} \left\{ (1+\nu (\mathcal {T})^{2})^{-1} \int _{\Gamma _n} {{\bar{\kappa }}}_{n}(\nu ,\textrm{d}\eta )\right\} < +\infty . \end{aligned}$$

Therefore we can still bootstrap from gradient-flow solutions in the sense of [33] for regularized triples \((\Gamma _n,\Pi _n,{{\bar{\kappa }}}^{\varepsilon }_n)\), after passing from a net flux to a one-way flux formulation, see “Appendix A”, to obtain unique gradient-flow solutions as defined in Sect. 3.2.

To discuss the continuity equation and the dissipation potentials properly, we need to introduce some additional notation. We define the following creation and annihilation operators:

$$\begin{aligned} \begin{aligned} \textsf{T}^{n,+}&:\Gamma _n\times \mathcal {T}\rightarrow \Gamma _{n}\times \mathcal {T},\qquad \textsf{T}^{n,+}(\nu ,x) = (\nu + \tfrac{1}{n}\delta _x,x) =: (\textsf{T}_x^{n,+}\nu ,x),\\ \textsf{T}^{n,-}&:\Gamma _{n}\times \mathcal {T}\rightarrow \Gamma _{n}\times \mathcal {T},\qquad \textsf{T}^{n,-}(\nu ,x) = (\nu - \tfrac{1}{n}\delta _x,x) =: (\textsf{T}_x^{n,-}\nu ,x), \end{aligned} \end{aligned}$$
(3.8)

with the convention that \(\textsf{T}^{n,-}(\nu ,x)={(}\nu {,x)}\) if \(x\notin \textrm{supp}(\nu )\). Note that \(\textsf{T}^{n,-} \circ \textsf{T}^{n,+}=\textsf{Id}\) always holds, and \(\textsf{T}^{n,+} \circ \textsf{T}^{n,-} (\nu ,x)=(\nu ,x)\) whenever \(x\in \textrm{supp}(\nu )\).

We further define the discrete \(\Gamma _n\)-gradients \(\overline{\nabla }^{n,\pm }: C_c(\Gamma _n)\rightarrow C_c( \Gamma _n\times \mathcal {T})\):

$$\begin{aligned} (\overline{\nabla }^{n,\pm } F)(\nu ,x):= n(F(\textsf{T}_x^{n,\pm }\nu ) - F(\nu )), \end{aligned}$$
(3.9)

and the corresponding \(\Gamma _n\)-divergence \(\overline{\text {div}}^{n,\pm }: \mathcal {M}_{loc}^+(\Gamma _n\times \mathcal {T})\rightarrow \mathcal {M}_{loc}(\Gamma _n)\), dual to \(\overline{\nabla }^{n,\pm }\), given by

$$\begin{aligned} (\overline{\text {div}}^{n,\pm } \textsf{J}) = n\left( \textsf{p}^{\Gamma _n}_\#\textsf{J}- (\textsf{p}^{\Gamma _n}\circ \textsf{T}^{n,\pm })_\# \textsf{J}\right) , \end{aligned}$$
(3.10)

where \(\textsf{p}^{\Gamma _n}:{\Gamma _n}\times \mathcal {T}\rightarrow \Gamma _n\) denotes the projection to the first variable.
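The duality between \(\overline{\nabla }^{n,\pm }\) and \(\overline{\text {div}}^{n,\pm }\), in the form \(\int \overline{\nabla }^{n,\pm }F\,\textrm{d}\textsf{J}=-\int F\,\textrm{d}(\overline{\text {div}}^{n,\pm }\textsf{J})\), can be illustrated for finitely supported fluxes. The sketch below assumes a finite trait space with states stored as count vectors k (so \(\nu =k/n\)) and fluxes stored as dictionaries; the names and sample data are hypothetical.

```python
import random

n = 2

def shift(k, x, sign):
    """First component of T^{n,+} (sign=+1) or T^{n,-} (sign=-1) applied to (k, x)."""
    if sign < 0 and k[x] == 0:
        return k  # convention: T^{n,-} fixes (nu, x) when x is not in supp(nu)
    return tuple(kj + sign * (j == x) for j, kj in enumerate(k))

def grad(F, k, x, sign):
    """Discrete gradient n * (F(T_x nu) - F(nu))."""
    return n * (F(shift(k, x, sign)) - F(k))

def div(J, sign):
    """Discrete divergence n * (p_# J - (p o T)_# J) as a dict of signed masses."""
    out = {}
    for (k, x), mass in J.items():
        out[k] = out.get(k, 0.0) + n * mass
        target = shift(k, x, sign)
        out[target] = out.get(target, 0.0) - n * mass
    return out

random.seed(0)
values = {}

def F(k):
    """Bounded test function with arbitrary cached values."""
    if k not in values:
        values[k] = random.uniform(-1.0, 1.0)
    return values[k]

# A finitely supported flux on pairs (state, trait index).
J = {((1, 0), 0): 0.3, ((2, 1), 1): 1.1, ((0, 3), 0): 0.5, ((1, 1), 1): 0.2}

gaps = []
for sign in (+1, -1):
    pairing_grad = sum(grad(F, k, x, sign) * m for (k, x), m in J.items())
    pairing_div = sum(F(k) * m for k, m in div(J, sign).items())
    gaps.append(abs(pairing_grad + pairing_div))  # should vanish by duality
```

The entry `((0, 3), 0)` exercises the convention for \(\textsf{T}^{n,-}\) when \(x\notin \textrm{supp}(\nu )\): both the gradient and the divergence contribution vanish there.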

We consider the families of curves satisfying

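$$\begin{aligned} \partial _t \textsf{P}_t + \overline{\text {div}}^{n,+} \textsf{J}^+_t + \overline{\text {div}}^{n,-} \textsf{J}^-_t = 0 \end{aligned}$$
(\(\textsf{CE}_n\))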

in the following appropriate distributional sense.

Definition 3.1

(Continuity equation) 

A triple \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) satisfies the continuity equation \(\textsf{CE}_n\), if

(1) the curve \([0,T]\ni t\mapsto \textsf{P}_t\in \mathcal {P}(\Gamma _n)\) is narrowly continuous,

(2) the Borel family \((\textsf{J}^{\pm }_t)_{t\in [0,T]}\in \mathcal {M}^+_{loc}(\Gamma _n\times \mathcal {T})\) satisfies

$$\begin{aligned} \textrm{supp}(\textsf{J}^-_t) \subseteq \left\{ (\nu ,x)\,:\, \nu (\mathcal {T})\ge \tfrac{2}{n}, \, x\in \textrm{supp}(\nu ) \right\} , \end{aligned}$$

(3) \(\int _0^T \int _{\Gamma _n\times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\textsf{J}^{\pm }_{t} \, \textrm{d}t<\infty \),

(4) for every \(s,t\in [0,T]\) and all \(F\in C_c(\Gamma _n)\)

$$\begin{aligned} \int _{\Gamma _n} F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma _n} F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma _n\times \mathcal {T}} \left( (\overline{\nabla }^{n,+} F) \,\textrm{d}\textsf{J}_r^++(\overline{\nabla }^{n,-} F) \, \textrm{d}\textsf{J}_r^{-} \right) \, \textrm{d}r. \end{aligned}$$
(3.11)

Throughout we will call arbitrary measures \(\textsf{J}^{\pm } \in \mathcal {M}^+_{loc}(\Gamma _n\times \mathcal {T})\) admissible if

$$\begin{aligned} \textrm{supp}(\textsf{J}^-) \subseteq \left\{ (\nu ,x)\,:\, \nu (\mathcal {T})\ge \tfrac{2}{n}, \, x\in \textrm{supp}(\nu ) \right\} \end{aligned}$$

and

$$\begin{aligned}\int _{\Gamma _n\times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\textsf{J}^{\pm }<\infty .\end{aligned}$$

Moreover, since \(\Gamma _n\) is a closed subspace of the Polish space \(\Gamma \), the extension of \(\textsf{P}\) to \(\mathcal {P}(\Gamma )\) and the extension of \(\textsf{J}^{\pm }\) to \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) are well-defined. For simplicity, we will simply refer to them as \(\textsf{P}\), \(\textsf{J}^{\pm }\) as well, and drop the n-dependence in most arguments.

It is also clear that for any admissible \(\textsf{J}^{\pm }\)

$$\begin{aligned} (\overline{\nabla }^{n,\pm } F)(\nu ,x)=n\left( F(\nu \pm \tfrac{1}{n}\delta _x)-F(\nu )\right) , \qquad (\nu ,x)\in \textrm{supp}(\textsf{J}^\pm ) \end{aligned}$$

and in particular (3.11) is equivalent to

$$\begin{aligned}&\int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s \\&\quad = \int _s^t \int _{\Gamma \times \mathcal {T}} \Big ( n\big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\,\textrm{d}\textsf{J}_r^++n\big (F(\nu -\tfrac{1}{n}\delta _x)-F(\nu )\big )\, \textrm{d}\textsf{J}_r^- \Big ) \, \textrm{d}r \end{aligned}$$

for all \(F\in C_c(\Gamma )\). Note that this can again be extended to all \(F\in \mathcal {B}_c(\Gamma )\) via a monotone class argument.

Remark 3.2

Condition (2) represents the restriction that particles can only be deleted if there are at least two particles in the system, consistent with the fact that \(\textsf{P}\in \mathcal {P}(\Gamma _n)\) and hence the underlying process never attains \(\nu =0\).

Moreover, condition (3) reflects the unboundedness of the observed fluxes \(\textsf{J}^{\pm }\), which stems from the unboundedness of the birth/death kernels \(\kappa ^{\pm }_{\nu }\) in \(\nu \).

Remark 3.3

Whenever \(\textsf{J}^{\pm }\) are of the form

$$\begin{aligned} \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x)=\textsf{P}_t(\textrm{d}\nu ) \lambda ^{\pm }[t,\nu ](\textrm{d}x)\end{aligned}$$

with \(\lambda ^{\pm }[t,\nu ]\in \mathcal {M}^+(\mathcal {T})\) for all \(\nu \in \Gamma \) and \(t\in [0,T]\), the continuity equation (3.11) describes the forward Kolmogorov equation corresponding to an interacting birth/death process with the birth/death kernels \(\lambda ^{\pm }[t,\nu ]\) depending on both time and the empirical measure of the particles \(\nu \). The time-dependent jump kernel is then given by

$$\begin{aligned} {{\bar{\kappa }}}_{n,t}(\textrm{d}\nu ,\textrm{d}\eta )= n \left( \int _{\mathcal {T}}\delta _{\nu +\tfrac{1}{n}\delta _x}(\textrm{d}\eta ) \, \lambda ^{+}[t,\nu ](\textrm{d}x) + \int _{\mathcal {T}} \delta _{\nu -\tfrac{1}{n}\delta _x}(\textrm{d}\eta )\, \lambda ^{-}[t,\nu ](\textrm{d}x)\right) . \end{aligned}$$
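
As a formal consistency check (a sketch only, ignoring integrability questions), inserting this form of \(\textsf{J}^{\pm }\) into the continuity equation and differentiating in time gives, for \(F\in C_c(\Gamma )\) and a.e. \(t\in [0,T]\),

$$\begin{aligned} \frac{\textrm{d}}{\textrm{d}t}\int _{\Gamma } F \,\textrm{d}\textsf{P}_t = \int _{\Gamma } \left( n\int _{\mathcal {T}} \big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\, \lambda ^{+}[t,\nu ](\textrm{d}x) + n\int _{\mathcal {T}} \big (F(\nu -\tfrac{1}{n}\delta _x)-F(\nu )\big )\, \lambda ^{-}[t,\nu ](\textrm{d}x) \right) \textsf{P}_t(\textrm{d}\nu ). \end{aligned}$$

For fixed \(\nu \), the bracket is the integral of \(F(\eta )-F(\nu )\) against the kernel displayed above, i.e. the generator of a process that jumps from \(\nu \) to \(\nu \pm \tfrac{1}{n}\delta _x\) at rate \(n\lambda ^{\pm }[t,\nu ](\textrm{d}x)\).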

To define the dissipation potentials, let us introduce the measures \(\vartheta _\textsf{P}^{\pm } \in \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})\)

$$\begin{aligned} \vartheta _\textsf{P}^{\pm }(\textrm{d}\nu \,\textrm{d}x):= \textsf{P}(\textrm{d}\nu )\kappa ^{\pm }_{\nu }(\textrm{d}x). \end{aligned}$$
(3.12)

Note that for any curve \((\textsf{P}_t)_{t\in [0,T]}\) the measures \(\textsf{J}_t^{\pm }:=\vartheta _{\textsf{P}_t}^{\pm }\) satisfy the conditions (2) and (3), where the former holds because \(c(x,x)=0\).

Moreover, as will be shown in Lemma 3.12, we have the following symmetry

$$\begin{aligned} \vartheta _{\Pi _n}^{\pm }=\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\Pi _n}, \end{aligned}$$
(3.13)

from which the detailed balance condition (3.7) directly follows.

Definition 3.4

Let \(\Theta ^{n,\pm }_{\textsf{P}}\in \mathcal {M}_{loc}(\Gamma \times \mathcal {T})\) be the geometric average of \(\vartheta ^{\pm }_{\textsf{P}}\) and \(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\textsf{P}}\), i.e.

$$\begin{aligned} \Theta ^{n,\pm }_{\textsf{P}}(\textrm{d}\nu ,\textrm{d}x):=\sqrt{\frac{\textrm{d}\vartheta ^{\pm }_{\textsf{P}}}{\textrm{d}\Sigma }\frac{\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\textsf{P}})}{\textrm{d}\Sigma }}\, \,\textrm{d}\Sigma , \end{aligned}$$
(3.14)

for any dominating measure \(\Sigma \).

The dissipation potential \(\mathcal {R}_n:\mathcal {P}(\Gamma )\times \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})^2\rightarrow [0,+\infty ]\) and dual dissipation potential \(\mathcal {R}^*_n:\mathcal {P}(\Gamma )\times \mathcal {B}_c(\Gamma \times \mathcal {T})^2\) are given by

$$\begin{aligned} \begin{aligned} \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)&:=\mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{n,+}_{\textsf{P}})+\mathcal {E}\textrm{nt}(\textsf{J}^{-}|\Theta ^{n,-}_{\textsf{P}}),\\ \mathcal {R}_n^*(\textsf{P},\omega ^+,\omega ^-)&:=\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{+}}-1)\, \textrm{d}\Theta ^{n,+}_{\textsf{P}}+\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{-}}-1)\, \textrm{d}\Theta ^{n,-}_{\textsf{P}} \end{aligned} \end{aligned}$$

The free energy \(\mathcal {F}_n:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\) and the Fisher information \(\mathcal {D}_n:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\) are given by

$$\begin{aligned} \begin{aligned} \mathcal {F}_n(\textsf{P})&:=\tfrac{1}{2n} \mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n)\\ \mathcal {D}_n(\textsf{P})&:=\left\{ \begin{aligned}&H^2(\vartheta _{\textsf{P}}^{+},\textsf{T}_{\#}^{n,-}\vartheta _{\textsf{P}}^{-})+H^2(\vartheta _{\textsf{P}}^{-},\textsf{T}_{\#}^{n,+}\vartheta _{\textsf{P}}^{+})\qquad{} & {} \hbox {if } \textsf{P}\ll \Pi _n,\\&+\infty{} & {} \hbox {otherwise}.\\ \end{aligned}\right. \end{aligned} \end{aligned}$$

Finally, the EDP-functional \(\mathcal {I}_{n}:\textsf{CE}_n\rightarrow [0,+\infty ]\) is defined, for all curves with \(\mathcal {F}_{n}(\textsf{P}_0)<\infty \), by

$$\begin{aligned} \mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\int _0^T \mathcal {R}_{n}(\textsf{P}_t,\textsf{J}_t^+,\textsf{J}^-_t) \, \textrm{d}t + \mathcal {F}_n({\textsf{P}_T})-\mathcal {F}_n({\textsf{P}_0})+\int _0^T \mathcal {D}_{n}(\textsf{P}_t) \, \textrm{d}t. \end{aligned}$$

Remark 3.5

The definition of \(\Theta _{\textsf{P}}^{n,\pm }\) is independent of the dominating measure \(\Sigma \). Moreover, formally

$$\begin{aligned}\Theta _{\textsf{P}}^{n,+}(\nu ,x)=\sqrt{ \big (\textsf{P}(\nu )\kappa ^+[\nu ]\big )\big (\textsf{P}(\nu +\tfrac{1}{n}\delta _x)\kappa ^-[\nu +\tfrac{1}{n}\delta _x]\big )}, \end{aligned}$$

i.e. it represents the geometric mean of the expected fluxes going forwards and backwards along the transition \(\nu \leftrightarrow \nu +\tfrac{1}{n}\delta _x\).

In addition, due to the symmetry (3.13) the measures \(\Theta _{\textsf{P}}^{n,\pm }\) simplify whenever \(\textsf{P}\ll \Pi _n\), i.e. if \(\textrm{d}\textsf{P}= U \textrm{d}\Pi _n\) we have

$$\begin{aligned} \Theta ^{n,\pm }_{\textsf{P}}(\textrm{d}\nu ,\textrm{d}x)=\sqrt{U(\nu )U(\nu \pm \tfrac{1}{n}\delta _x)}\, \vartheta _{\Pi _n}^{\pm }(\textrm{d}\nu ,\textrm{d}x). \end{aligned}$$
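
A short formal sketch of why this simplification holds, assuming \(\textsf{T}^{n,\pm }(\nu ,x)=(\nu \pm \tfrac{1}{n}\delta _x,x)\) and using the push-forward identity of Lemma 3.11 with test functions that need not be compactly supported: since \(\vartheta ^{\pm }_{\textsf{P}}=U(\nu )\,\vartheta ^{\pm }_{\Pi _n}\), we have for any test function \(\omega \)

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} \omega \,\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\textsf{P}})&=\int _{\Gamma \times \mathcal {T}} \omega (\nu \mp \tfrac{1}{n}\delta _x,x)\,U(\nu )\,\textrm{d}\vartheta ^{\mp }_{\Pi _n}\\&=\int _{\Gamma \times \mathcal {T}} \omega (\nu ,x)\,U(\nu \pm \tfrac{1}{n}\delta _x)\,\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\Pi _n})\\&=\int _{\Gamma \times \mathcal {T}} \omega (\nu ,x)\,U(\nu \pm \tfrac{1}{n}\delta _x)\,\textrm{d}\vartheta ^{\pm }_{\Pi _n}, \end{aligned}$$

where the last step is the symmetry (3.13). Choosing \(\Sigma =\vartheta ^{\pm }_{\Pi _n}\) in (3.14), the two densities are therefore \(U(\nu )\) and \(U(\nu \pm \tfrac{1}{n}\delta _x)\), and their geometric mean gives the stated formula.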

Remark 3.6

Note that \(\mathcal {D}_n\) is a jointly convex function in \((\vartheta ^{\pm }_{\textsf{P}},\textsf{T}_{\#}^{n,\mp } \vartheta _{\textsf{P}}^{\mp })\), and lower semicontinuous if \(\mathcal {F}_{n}\) is bounded. Moreover, it is straightforward to check that whenever \(\textsf{P}\ll \Pi _n\) with \(\textrm{d}\textsf{P}= U \textrm{d}\Pi _n\) we have

$$\begin{aligned} \mathcal {D}_n(\textsf{P})&=\frac{1}{2}\int _{\Gamma \times \mathcal {T}} \left( \sqrt{U(\nu +\tfrac{1}{n}\delta _x)}-\sqrt{U(\nu )}\right) ^2 \textrm{d}\vartheta ^{+}_{\Pi _n}\\&\quad +\frac{1}{2}\int _{\Gamma \times \mathcal {T}} \left( \sqrt{U(\nu -\tfrac{1}{n}\delta _x)}-\sqrt{U(\nu )}\right) ^2 \textrm{d}\vartheta ^{-}_{\Pi _n}\\&=\int _{\Gamma \times \mathcal {T}} \left( \sqrt{U(\nu \pm \tfrac{1}{n}\delta _x)}-\sqrt{U(\nu )}\right) ^2 \textrm{d}\vartheta ^{\pm }_{\Pi _n}. \end{aligned}$$

Finally, for technical purposes, we also introduce a version for net fluxes.

Definition 3.7

The upward net flux \(\textsf{J}^{\textrm{net}}\) is defined as

$$\begin{aligned} \textsf{J}^{\textrm{net}}:=\textsf{J}^+-\textsf{T}^{n,-}_{\#} \textsf{J}^{-} \end{aligned}$$

Note that \(\textsf{J}^{\textrm{net}}(\nu ,x)\) can be interpreted as the net flux along the jump \(\nu \leftrightarrow \nu +\tfrac{1}{n}\delta _x\).

The continuity equation for the net flux reduces to

$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma \times \mathcal {T}} n\big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\,\textrm{d}\textsf{J}^{\textrm{net}}_r\, \textrm{d}r \end{aligned}$$
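
The reduction can be checked directly. The following is a brief sketch, assuming \(\textsf{T}^{n,-}(\nu ,x)=(\nu -\tfrac{1}{n}\delta _x,x)\) and that all integrals are finite. Applying the definition of the push-forward to \(g(\nu ,x):=n\big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\) gives

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} g \,\textrm{d}(\textsf{T}^{n,-}_{\#}\textsf{J}^-_r)&=\int _{\Gamma \times \mathcal {T}} n\big (F(\nu )-F(\nu -\tfrac{1}{n}\delta _x)\big )\,\textrm{d}\textsf{J}^-_r\\&=-\int _{\Gamma \times \mathcal {T}} n\big (F(\nu -\tfrac{1}{n}\delta _x)-F(\nu )\big )\,\textrm{d}\textsf{J}^-_r, \end{aligned}$$

so the two terms in the continuity equation for \((\textsf{J}^+,\textsf{J}^-)\) combine into a single integral of g against \(\textsf{J}^+_r-\textsf{T}^{n,-}_{\#}\textsf{J}^-_r=\textsf{J}^{\textrm{net}}_r\).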

We are now in a position to give the general version of Theorem 1.6.

Theorem 3.8

For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{n}\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) we have \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-)\ge 0\),

$$\begin{aligned} \mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0 \implies \left\{ \begin{aligned} \quad&\textsf{P}_t \hbox { is a weak solution to } (\mathsf {FKE}_n) \quad \\ \quad&\textsf{J}^{\pm }_t=\textsf{P}_t \kappa _{\nu }^{\pm } \quad \hbox {for a.e. } t\in [0,T], \quad \end{aligned} \right. \end{aligned}$$
(3.15)

and there exists a unique gradient-flow solution, i.e. a curve \((\textsf{P})\) such that \(\mathcal {I}_{n}(\textsf{P},\textsf{P}_t \kappa _{\nu }^{+},\textsf{P}_t \kappa _{\nu }^{-})=0\).

Moreover, whenever \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-) < \infty \), the chain rule for \(\mathcal {F}_{n}\) and the net flux holds: \(\mathcal {F}_{n}(\textsf{P}_t)\) is absolutely continuous and

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}t} \mathcal {F}_n(\textsf{P}_t)=\frac{n}{2}\int _{\Gamma \times \mathcal {T}} (\log U(\nu +\tfrac{1}{n}\delta _x)-\log U(\nu ))\,\textrm{d}\textsf{J}^{\textrm{net}}_t, \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$

The proof of Theorem 3.8 is postponed to Sect. 3.3 and follows from the existence of a gradient-flow solution via EDP-convergence of a sequence of regularized problems established in Sect. 3.2, and its uniqueness via a convexity argument.

Remark 3.9

Similar to the mean-field case, the non-negativity of \(\mathcal {I}_{n}\) and the identification of solutions to (\(\mathsf {FKE}_n\)) as null-minimizers of \(\mathcal {I}_n\) are related to the formal equivalence

$$\begin{aligned} \mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-)=\int _0^T \mathcal {L}_n(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \,\textrm{d}t, \end{aligned}$$

where \(\mathcal {L}_n\) is the so-called Lagrangian given by

$$\begin{aligned} \mathcal {L}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\mathcal {E}\textrm{nt}(\textsf{J}^+|\textsf{P}\kappa _{\nu }^+)+\mathcal {E}\textrm{nt}(\textsf{J}^-|\textsf{P}\kappa _{\nu }^-).\end{aligned}$$

We discuss the implication of this relation in “Appendix A”.

Remark 3.10

(Net flux) To show the existence of gradient-flow solutions in the sense of null-minimizers of \(\mathcal {I}_n\) we will have to pass from gradient-flow solutions in the sense of [33], see Theorem 3.20. The expressions for net fluxes are in fact contractions of those for one-way or uni-directional fluxes, as discussed in “Appendix A”, which we use to show that the two notions of gradient-flow solutions are equivalent.

3.1 A priori estimates

Below we will state the estimates and identities necessary to prove the chain rule and establish the existence of solutions.

Recall that \(\vartheta _{\textsf{P}}^{\pm }\) satisfies the same restrictions (Conditions (2) and (3)) as the fluxes \(\textsf{J}^{\pm }\). This is easily verified, but since we will use it repeatedly let us state it here precisely.

Lemma 3.11

For any \(\textsf{P}\in \mathcal {P}(\Gamma _n)\)

$$\begin{aligned} \textrm{supp}(\vartheta _{\textsf{P}}^-) \subseteq \left\{ (\nu ,x)\,:\, \nu (\mathcal {T})\ge \tfrac{2}{n}, \, x\in \textrm{supp}(\nu ) \right\} . \end{aligned}$$

In particular, for any \(\omega \in C_c(\Gamma \times \mathcal {T})\)

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} \omega (\nu ,x) \,\textrm{d}(\textsf{T}^{n,\pm }_{\#} \vartheta _{\textsf{P}}^\pm )&=\int _{\Gamma \times \mathcal {T}} \omega (\nu \pm \tfrac{1}{n}\delta _x,x)\, \textrm{d}\vartheta _{\textsf{P}}^\pm , \end{aligned}$$

and

$$\begin{aligned} \textsf{T}^{n,\mp }_{\#} \circ \textsf{T}^{n,\pm }_{\#} \vartheta _{\textsf{P}}^{\pm }=\vartheta _{\textsf{P}}^{\pm }.\end{aligned}$$

Finally,

$$\begin{aligned} \textsf{T}^{n,\pm }_{\#} \Theta ^{n,\pm }_{\textsf{P}} = \Theta ^{n,\mp }_{\textsf{P}}. \end{aligned}$$

The above identities allow us to prove the symmetry condition that implies the detailed balance condition (3.7).

Lemma 3.12

(Detailed balance)

$$\begin{aligned} \vartheta _{\Pi _n}^{\pm }=\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp }_{\Pi _n}. \end{aligned}$$

Proof

Fix an arbitrary \(\omega \in C_c(\Gamma \times \mathcal {T})\), and for any ordered collection of N variables in \(\mathcal {T}\) set \({\textbf{x}}^{N}:=(x_1,\dots ,x_N) \in {\mathcal {T}}^N\). We then have the following.

$$\begin{aligned}&\int _{\Gamma \times \mathcal {T}}\omega (\nu ,x) \,\textrm{d}\vartheta ^+_{\Pi _n}\\&\quad =\frac{1}{e^{n \gamma (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!} \int _{\mathcal {T}^N} \left( \int _{\mathcal {T}} \omega \left( L_n({\textbf{x}}^{N}),x\right) \kappa ^+\left[ L_n({\textbf{x}}^{N})\right] (\textrm{d}x) \right) \gamma ^{\otimes N}(\textrm{d}{\textbf{x}}^N),\\&\int _{\Gamma \times \mathcal {T}}\omega (\nu ,x) \, \textrm{d}(\textsf{T}^{n,-}_{\#} \vartheta ^-_{\Pi _n})\\&\quad =\frac{1}{e^{n\gamma (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!} \int _{\mathcal {T}^N} \left( \int _{\mathcal {T}} \omega \left( L_n({\textbf{x}}^{N})-\tfrac{1}{n}\delta _x,x\right) \kappa ^-\left[ L_n({\textbf{x}}^N)\right] (\textrm{d}x) \right) \gamma ^{\otimes N}(\textrm{d}{\textbf{x}}^N). \end{aligned}$$

Since \(\kappa ^-[\tfrac{1}{n}\delta _{y}]=0\) for any \(y\in \mathcal {T}\), the sum in the right-hand side of the last expression starts from \(N=2\), thus reducing the expression to

$$\begin{aligned}&\frac{1}{e^{n \gamma (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^{N+1}}{(N+1)!} \int _{\mathcal {T}^{N+1}} \left( \int _{\mathcal {T}} \omega \left( L_{n}({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _x,x\right) \kappa ^-\left[ L_n({\textbf{x}}^{N+1})\right] (\textrm{d}x) \right) \\&\quad \gamma ^{\otimes (N+1)}(\textrm{d}{\textbf{x}}^{N+1}). \end{aligned}$$

It is clear that, for our desired equality, it is enough to show that for every N,

$$\begin{aligned} \begin{aligned} n \int _{\mathcal {T}^{N+1}}&\left( \int _{\mathcal {T}^2} \omega \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _x,x\right) c(x,y)\, L_n({\textbf{x}}^{N+1})^{\otimes 2}(\textrm{d}x,\textrm{d}y)\right) \, \gamma ^{\otimes (N+1)}(\textrm{d}{\textbf{x}}^{N+1})\\&=(N+1)\int _{\mathcal {T}^{N}} \left( \int _{\mathcal {T}^2} \omega \left( L_n({\textbf{x}}^{N}),x\right) c(x,y)\gamma (\textrm{d}x) L_n({\textbf{x}}^{N})(\textrm{d}y)\right) \, \gamma ^{\otimes N}(\textrm{d}{\textbf{x}}^N). \end{aligned} \end{aligned}$$

To do so, note that since \(c(x,x)=0\),

$$\begin{aligned}&n \int _{\mathcal {T}^2} \omega \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _x,x\right) c(x,y)\, L_n({\textbf{x}}^{N+1})^{\otimes 2}(\textrm{d}x,\textrm{d}y)\\&=\frac{1}{n}\sum _{i=1}^{N+1} \sum _{j\ne i} \omega \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _{x_i},x_i\right) c(x_i,x_j)\\&=\sum _{i=1}^{N+1} \int _{\mathcal {T}} \omega \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _{x_i},x_i\right) c(x_i,y) \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _{x_i}\right) (\textrm{d}y). \end{aligned}$$

Hence, by symmetry of \(\gamma ^{\otimes (N+1)}\), we obtain

$$\begin{aligned} \int _{\mathcal {T}^{N+1}}&\left( \sum _{i=1}^{N+1} \int _{\mathcal {T}} \omega \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _{x_i},x_i\right) c(x_i,y) \left( L_n({\textbf{x}}^{N+1})-\tfrac{1}{n}\delta _{x_i}\right) (\textrm{d}y) \right) \gamma ^{\otimes (N+1)}(\textrm{d}{\textbf{x}}^{N+1})\\&=(N+1) \int _{\mathcal {T}^N} \left( \int _{\mathcal {T}^2} \omega \left( L_n({\textbf{x}}^{N}),x\right) c(x,y) \gamma (\textrm{d}x)L_n({\textbf{x}}^{N})(\textrm{d}y) \right) \gamma ^{\otimes N}(\textrm{d}{\textbf{x}}^{N}), \end{aligned}$$

as desired. \(\square \)
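
To illustrate the counting in this proof, consider the smallest case \(N=1\). This is only a worked instance, assuming \(L_n({\textbf{x}}^{N})=\tfrac{1}{n}\sum _{i=1}^{N}\delta _{x_i}\) as used above, and writing \(I:=\int _{\mathcal {T}^2}\omega (\tfrac{1}{n}\delta _y,x)\,c(x,y)\,\gamma (\textrm{d}x)\gamma (\textrm{d}y)\) as a shorthand introduced here. Since \(c(x,x)=0\), the left-hand side of the claimed identity reads

$$\begin{aligned} n\int _{\mathcal {T}^2} \frac{1}{n^2}\Big ( \omega (\tfrac{1}{n}\delta _{x_2},x_1)\,c(x_1,x_2)+\omega (\tfrac{1}{n}\delta _{x_1},x_2)\,c(x_2,x_1)\Big )\,\gamma (\textrm{d}x_1)\gamma (\textrm{d}x_2)=\frac{2}{n}\,I, \end{aligned}$$

while the right-hand side is

$$\begin{aligned} 2\int _{\mathcal {T}} \left( \int _{\mathcal {T}} \omega (\tfrac{1}{n}\delta _{x_1},x)\,c(x,x_1)\,\tfrac{1}{n}\,\gamma (\textrm{d}x)\right) \gamma (\textrm{d}x_1)=\frac{2}{n}\,I. \end{aligned}$$

The factor \(N+1=2\) on the right thus matches the two choices of which particle is removed on the left.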

Recall from Lemma 2.10 that

$$\begin{aligned} \kappa ^{\pm }_{\nu }(\mathcal {T})\le M (1+\nu (\mathcal {T})^2), \end{aligned}$$

where \(M:=(1+\gamma (\mathcal {T}))\Vert c\Vert _{\infty }\). Now let

$$\begin{aligned} M_n:=\max \bigl \{1+2/n^2,2\bigr \} M, \end{aligned}$$

and consider the jointly convex and lower semicontinuous function \(\Upsilon :{\mathbb {R}}_{\ge 0}^3\rightarrow [0,+\infty ]\) given by

$$\begin{aligned} \Upsilon (w,u,v):= {\left\{ \begin{array}{ll} \sqrt{u v} &{} \quad \hbox {if } w=0,\\ \phi \left( \frac{w}{\sqrt{u v}}\right) \sqrt{u v} &{} \quad \hbox {if } u,v>0,\\ +\infty &{} \quad \hbox {if } w>0 \hbox { and either } u=0 \hbox { or } v=0. \end{array}\right. } \end{aligned}$$

We then have the following result.

Lemma 3.13

The following statements hold:

  1. (i)

    For all \(\textsf{P}\)

    $$\begin{aligned} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}))^{-2}\,\Theta _{\textsf{P}}^{n,\pm }(\textrm{d}\nu \,\textrm{d}y)\le \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\Theta _{\textsf{P}}^{n,\pm }(\textrm{d}\nu \,\textrm{d}y) \le M_n. \end{aligned}$$
  2. (ii)

    For any \(\textsf{P}\), admissible \(\textsf{J}^{\pm }\), and net flux \(\textsf{J}^{\textrm{net}}=\textsf{J}^+-\textsf{T}^{n,-}_{\#}\textsf{J}^-\), \(\omega \in \mathcal {B}(\Gamma \times \mathcal {T})\), we have

    $$\begin{aligned} \int _{\Gamma \times \mathcal {T}} |\omega |\,\textrm{d}|\textsf{J}^{\textrm{net}}| \, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)+\int _{\Gamma \times \mathcal {T}} \Psi ^*(\omega ) \, \textrm{d}\Theta _{\textsf{P}}^{n,+}. \end{aligned}$$

    Moreover,

    $$\begin{aligned} \phi \left( 1 \vee \frac{1}{M_n} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}{})^2)^{-1}\,\textsf{J}^{\pm }(\textrm{d}\nu ,\textrm{d}x) \right) M_n\, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-), \end{aligned}$$
    (3.16a)
    $$\begin{aligned} \Psi \left( \frac{1}{M_n} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}{}))^{-1}\,|\textsf{J}^{\textrm{net}}|(\textrm{d}\nu ,\textrm{d}x) \right) M_n\, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-). \end{aligned}$$
    (3.16b)
  3. (iii)

    For all admissible \(\textsf{P},\textsf{J}^{\pm }\),

    $$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{\pm }|\Theta ^{n,\pm })=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^\pm }{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^\pm }{\textrm{d}\Sigma },\frac{\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^\mp )}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma , \end{aligned}$$
    (3.17)

    for any common dominating measure \(\Sigma \). Moreover, if \(\textrm{d}\textsf{P}=U \textrm{d}\Pi _n\),

    $$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{\pm }|\Theta ^{n,\pm })=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^{\pm }}{\textrm{d}\vartheta _{\textsf{P}}^{\pm }},U(\nu ),U(\nu \pm \tfrac{1}{n}\delta _x)\right) \textrm{d}\vartheta ^{\pm }_{\textsf{P}}. \end{aligned}$$

Remark 3.14

Since \(M_n\le 3\,M\) for all \(n\ge 1\), the estimates (3.16) are uniform in n, which we will use in the EDP-convergence to establish tightness of sequences \(\textsf{J}^{n,\pm }\) under a bound on \(\mathcal {I}_n\). Moreover, the representation (3.17) is used to deduce the lower semicontinuity of \(\mathcal {I}_n\) for sequences of curves.

Proof

(i) For any \(x^*\in \mathcal {T}\), \(\nu \in \Gamma \), we have

$$\begin{aligned} \max \{\kappa ^{\pm }[\nu ](\mathcal {T}), \kappa ^{\pm }[\textsf{T}^{n,\pm }_{x^*}(\nu )](\mathcal {T})\}&\le M \max \left\{ 1+\nu (\mathcal {T})^2,\, 1+(\textsf{T}^{n,+}_{x^*}(\nu ))(\mathcal {T})^2,\, 1+(\textsf{T}^{n,-}_{x^*}(\nu ))(\mathcal {T})^2\right\} \\&\le M_n (1+\nu (\mathcal {T})^2) \end{aligned}$$

due to the inequality

$$\begin{aligned}1+(\tfrac{1}{n}+z)^2\le 1+\tfrac{2}{n^2}+2z^2, \qquad \text{for all } z\ge 0.\end{aligned}$$
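
For completeness, a short verification of this inequality and of the resulting constant: by Young's inequality \(\tfrac{2z}{n}\le \tfrac{1}{n^2}+z^2\), so that

$$\begin{aligned} 1+(\tfrac{1}{n}+z)^2&=1+\tfrac{1}{n^2}+\tfrac{2z}{n}+z^2\le 1+\tfrac{2}{n^2}+2z^2\\&\le \max \bigl \{1+\tfrac{2}{n^2},2\bigr \}\,(1+z^2). \end{aligned}$$

The same bound holds with \(\tfrac{1}{n}+z\) replaced by \(z-\tfrac{1}{n}\), since \(-\tfrac{2z}{n}\le \tfrac{1}{n^2}+z^2\) as well. Multiplying by M gives the factor \(M_n\).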

In particular,

$$\begin{aligned} \max \left\{ \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \textrm{d}\vartheta _{\textsf{P}}^\pm , \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \textrm{d}(\textsf{T}^{n,\mp }_{\#} \vartheta _{\textsf{P}}^\mp ) \right\} \le M_n, \end{aligned}$$

and hence the desired statement follows after applying Jensen’s inequality.

(ii) By duality we have for any \(\omega \in \mathcal {B}_c(\Gamma \times \mathcal {T})\),

$$\begin{aligned}&\int _{\Gamma \times \mathcal {T}} \omega ^+\, \textrm{d}\textsf{J}^++\int _{\Gamma \times \mathcal {T}} \omega ^- \,\textrm{d}\textsf{J}^-\le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)+\int _{\Gamma \times \mathcal {T}} (e^{\omega ^+}-1) \,\textrm{d}\Theta _{\textsf{P}}^{n,+}\\&\quad +\int _{\Gamma \times \mathcal {T}} (e^{\omega ^-}-1)\, \textrm{d}\Theta _{\textsf{P}}^{n,-}. \end{aligned}$$

Substituting \(\omega ^+=\omega \), \( \omega ^-=-\omega \circ \textsf{T}^{n,-}\) and using the fact that \(\textsf{T}^{n,-}_{\#}\Theta ^{n,-}_{\textsf{P}}=\Theta ^{n,+}_{\textsf{P}}\) we derive

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} \omega \,\textrm{d}\textsf{J}^{\textrm{net}}\le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)+\int _{\Gamma \times \mathcal {T}} \Psi ^*(\omega ) \,\textrm{d}\Theta _{\textsf{P}}^{n,+}. \end{aligned}$$
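
The substitution can be unpacked as follows. This is a sketch, assuming \(\textsf{T}^{n,-}(\nu ,x)=(\nu -\tfrac{1}{n}\delta _x,x)\); here \(\omega ^{\pm }\) denote the two test functions in the duality estimate, not positive and negative parts. On the left-hand side,

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} \omega \,\textrm{d}\textsf{J}^+-\int _{\Gamma \times \mathcal {T}} \omega \circ \textsf{T}^{n,-}\,\textrm{d}\textsf{J}^-=\int _{\Gamma \times \mathcal {T}} \omega \,\textrm{d}\big (\textsf{J}^+-\textsf{T}^{n,-}_{\#}\textsf{J}^-\big )=\int _{\Gamma \times \mathcal {T}} \omega \,\textrm{d}\textsf{J}^{\textrm{net}}, \end{aligned}$$

while the exponential terms on the right-hand side become

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} (e^{\omega }-1)\,\textrm{d}\Theta ^{n,+}_{\textsf{P}}+\int _{\Gamma \times \mathcal {T}} (e^{-\omega }-1)\,\textrm{d}(\textsf{T}^{n,-}_{\#}\Theta ^{n,-}_{\textsf{P}})=\int _{\Gamma \times \mathcal {T}} (e^{\omega }+e^{-\omega }-2)\,\textrm{d}\Theta ^{n,+}_{\textsf{P}}. \end{aligned}$$

The step therefore identifies \(\Psi ^*(\omega )\) with \(e^{\omega }+e^{-\omega }-2=2(\cosh \omega -1)\); we take this to agree with the definition of \(\Psi ^*\) given earlier in the paper, which is not restated in this section.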

Since \(\Psi ^*\) is even, we can replace \(\omega \) and \(\textsf{J}^{\textrm{net}}\) by their absolute values in the inequality, after substituting for \(\omega \) appropriately, and we conclude with a monotone convergence argument. The inequalities (3.16a) and (3.16b) now follow as in Lemma 2.10, via Jensen’s inequality and a dual approach, respectively.

(iii) Let us only consider \(\textsf{J}^+\), \(\Theta ^{n,+}\) (the case for \(\textsf{J}^-\), \(\Theta ^{n,-}\) is similar). Suppose \(\mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{n,+})<\infty \) and recall that

$$\begin{aligned} \Theta ^{n,+}_{\textsf{P}}(\textrm{d}\nu ,\textrm{d}x):=\sqrt{\frac{\textrm{d}\vartheta ^{+}_{\textsf{P}}}{\textrm{d}\Sigma }\frac{\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta ^{-}_{\textsf{P}})}{\textrm{d}\Sigma }}\, \,\textrm{d}\Sigma , \end{aligned}$$

where \(\Sigma \) is a dominating measure, e.g. \(\Sigma =\vartheta ^{+}_{\textsf{P}}+\textsf{T}^{n,-}_{\#}\vartheta ^{-}_{\textsf{P}}\). Then \(\textsf{J}^+\ll \Theta _{\textsf{P}}^{n,+} \ll \Sigma \), and it follows that \(\textsf{J}^+\)-a.e. \(\textrm{d}\vartheta _{\textsf{P}}^{+}/\textrm{d}\Sigma \), \(\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^{-})/\textrm{d}\Sigma >0\), from which one easily verifies (3.17).

Vice versa, suppose that

$$\begin{aligned}\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^+}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^+}{\textrm{d}\Sigma },\frac{\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^-)}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma <\infty ,\end{aligned}$$

for some dominating measure \(\Sigma \). Then again \(\textsf{J}^+\)-a.e. we have that \(\textrm{d}\vartheta _{\textsf{P}}^{+}/\textrm{d}\Sigma \), \(\textrm{d}(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^{-})/\textrm{d}\Sigma >0\), and by the superlinearity of \(\phi \) we deduce that in fact \(\textsf{J}^+\ll \tilde{\Sigma }\) for any dominating measure \(\tilde{\Sigma }\) of \(\vartheta _{\textsf{P}}^+\) and \(\textsf{T}^{n,-}_{\#}\vartheta _{\textsf{P}}^-\). Together, these imply \(\textsf{J}^+\ll \Theta _{\textsf{P}}^{n,+}\), and the result follows as above. \(\square \)

Finally, we discuss the time-regularity of \(\textsf{P}_t\) for admissible curves and state the analog of Lemma 2.12. Let the weighted total variation metric \(d_{TV,w}\) be given as

$$\begin{aligned} \begin{aligned} d_{TV,w}(\textsf{P}^1,\textsf{P}^2)&:=\int _{\Gamma } (1+\nu (\mathcal {T})^2)^{-1}\, \textrm{d}|\textsf{P}^1-\textsf{P}^2|. \end{aligned} \end{aligned}$$
(3.18)

Note that \(d_{TV,w}\) is lower semicontinuous with respect to the narrow topology, and while convergence in \(d_{TV,w}\) does not directly imply narrow convergence, it does so on narrowly pre-compact sets.

Lemma 3.15

For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\) we have for all \(s,t\in [0,T]\):

$$\begin{aligned} d_{TV,w}(\textsf{P}_s,\textsf{P}_t)\le 4 n \max \Bigl \{1+\tfrac{2}{n^2},2\Bigr \} \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \textrm{d}(\textsf{J}_r^{+}+\textsf{J}^-_r)\, \textrm{d}r. \end{aligned}$$
(3.19)

Suppose in addition that \(\textsf{P}_t\ll \Pi _n, \textsf{J}^{\pm }_t\ll \vartheta _{\Pi _n}^{\pm }\) for all \(t\in [0,T]\) and set

$$\begin{aligned} \ell :=(1+\nu (\mathcal {T})^2)^{-1} \Pi _n,\qquad \Sigma ^{\pm }:=(1+\nu (\mathcal {T})^2)^{-1} \vartheta ^{\pm }_{\Pi _n}. \end{aligned}$$

Then there exist an absolutely continuous and a.e. differentiable map \(U:[0,T]\rightarrow L^1(\ell )\) and maps \(G^\pm :[0,T]\rightarrow L^1(\Sigma ^{\pm })\) such that \(U_t=\textrm{d}\textsf{P}_t/\textrm{d}\Pi _n\), \(G_t^{\pm }=\textrm{d}\textsf{J}_t^{\pm }/\textrm{d}\vartheta _{\Pi _n}^{\pm }\), and

$$\begin{aligned} \begin{aligned} \partial _t U_t(\nu )&= n \int _{\mathcal {T}} (G_t^-(\nu +\tfrac{1}{n}\delta _x,x)-G_t^+(\nu ,x)) \, \kappa ^+_{\nu }(\textrm{d}x)\\&+ n\int _{\mathcal {T}} (G_t^+(\nu -\tfrac{1}{n}\delta _x,x)-G_t^-(\nu ,x)) \, \kappa ^-_{\nu }(\textrm{d}x). \end{aligned} \end{aligned}$$
(3.20)

Alternatively, in terms of the net flux \(\textsf{J}^{\textrm{net}}=G^\textrm{net}\, \vartheta _{\Pi _n}^+\) with \(G^\textrm{net}:=G^+-G^-\circ \textsf{T}^{n,+}\),

$$\begin{aligned} \begin{aligned} \partial _t U_t(\nu )= n \int _{\mathcal {T}} G^\textrm{net}_t(\nu -\tfrac{1}{n}\delta _x,x) \, \kappa ^-_{\nu }(\textrm{d}x)-n \int _{\mathcal {T}} G^\textrm{net}_t(\nu ,x) \, \kappa ^+_{\nu }(\textrm{d}x). \end{aligned} \end{aligned}$$
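
The net-flux form follows from (3.20) by regrouping. As a brief check, assuming \(\textsf{T}^{n,+}(\nu ,x)=(\nu +\tfrac{1}{n}\delta _x,x)\), the definition of \(G^\textrm{net}\) gives

$$\begin{aligned} G^\textrm{net}_t(\nu ,x)&=G_t^+(\nu ,x)-G_t^-(\nu +\tfrac{1}{n}\delta _x,x),\\ G^\textrm{net}_t(\nu -\tfrac{1}{n}\delta _x,x)&=G_t^+(\nu -\tfrac{1}{n}\delta _x,x)-G_t^-(\nu ,x). \end{aligned}$$

The first integrand in (3.20) is therefore \(-G^\textrm{net}_t(\nu ,x)\) and the second is \(G^\textrm{net}_t(\nu -\tfrac{1}{n}\delta _x,x)\), which is the displayed identity.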

Remark 3.16

Note that the estimate (3.19) for the weighted total variation metric blows up as \(n\rightarrow \infty \). For the proof of EDP-convergence we instead use a weaker metric, the transportation-like metric W defined by (4.4), which behaves uniformly in n for a sequence of curves with finite \(\limsup _{n\rightarrow \infty } \mathcal {I}_n\).

Proof

Due to the continuity equation and after a monotone class argument, we have the crude estimate

$$\begin{aligned} \left| \int _{\Gamma } F \textrm{d}(\textsf{P}_t-\textsf{P}_s) \right|\le & {} n \int _s^t \left[ \int _{\Gamma \times \mathcal {T}} (|F(\nu +\tfrac{1}{n}\delta _x)|+|F(\nu )|) \, \textrm{d}\textsf{J}_r^{+} \right. \\{} & {} \quad \left. + \int _{\Gamma \times \mathcal {T}} (|F(\nu -\tfrac{1}{n}\delta _x)|+|F(\nu )|) \, \textrm{d}\textsf{J}_r^{-} \right] \textrm{d}r, \end{aligned}$$

for any \(F\in \mathcal {B}_c(\Gamma )\). Now fix \(F\in \mathcal {B}_c(\Gamma )\), and let \(K:=\sup _{\nu \in \Gamma } |F(\nu )|(1+\nu (\mathcal {T})^2)\). Note that by the bounds of Lemma 3.13 for any \(\nu \in \Gamma _n\), we have the following estimates

$$\begin{aligned} |F|(\nu )&\le K (1+\nu (\mathcal {T})^2)^{-1}\\ |F|(\nu +\tfrac{1}{n}\delta _x)&\le K \big (1+(\tfrac{1}{n}+\nu (\mathcal {T}))^2\big )^{-1} \le K (1+\nu (\mathcal {T})^2)^{-1},\\ |F|(\nu -\tfrac{1}{n}\delta _x)&\le K \big (1+(\tfrac{-1}{n}+\nu (\mathcal {T}))^2\big )^{-1} \le K \max \{1+\tfrac{2}{n^2},2\} (1+\nu (\mathcal {T})^2)^{-1}, \end{aligned}$$

and therefore

$$\begin{aligned} \left| \int _{\Gamma } F \textrm{d}(\textsf{P}_t-\textsf{P}_s) \right| \le 4 n K \max \Bigl \{1+\tfrac{2}{n^2},2\Bigr \} \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \textrm{d}( \textsf{J}^+_r+\textsf{J}^-_r) \, \textrm{d}r. \end{aligned}$$
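
To make the constant explicit, write \(w(\nu ):=(1+\nu (\mathcal {T})^2)^{-1}\) and \(c_n:=\max \{1+\tfrac{2}{n^2},2\}\); both are shorthands introduced only for this note. The three pointwise estimates above give

$$\begin{aligned} |F|(\nu +\tfrac{1}{n}\delta _x)+|F|(\nu )&\le 2K\,w(\nu ),\\ |F|(\nu -\tfrac{1}{n}\delta _x)+|F|(\nu )&\le (c_n+1)K\,w(\nu )\le 2c_nK\,w(\nu ), \end{aligned}$$

where we used \(c_n\ge 2\). Both integrands in the crude estimate are thus bounded by \(2c_nK\,w(\nu )\), so the displayed bound holds with room to spare in the factor 4.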

Taking the supremum over all \(F\in \mathcal {B}_c(\Gamma )\) with \(\sup _{\nu \in \Gamma } |F(\nu )|(1+\nu (\mathcal {T})^2)\le 1\) we conclude that

$$\begin{aligned} d_{TV,w}(\textsf{P}_s,\textsf{P}_t)&=\int _{\Gamma } (1+\nu (\mathcal {T})^2)^{-1}\, \textrm{d}|\textsf{P}_t-\textsf{P}_s|\\&\le 4 n \max \Bigl \{1+\tfrac{2}{n^2},2\Bigr \} \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\, \textrm{d}(\textsf{J}^+_r+\textsf{J}^-_r) \, \textrm{d}r. \end{aligned}$$

Next, suppose that \(\textsf{P}_t\ll \Pi _n, \textsf{J}^{\pm }_t\ll \vartheta _{\Pi _n}^{\pm }\) for all \(t\in [0,T]\). Let \(U_t=\textrm{d}\textsf{P}_t/\textrm{d}\Pi _n\), \(G_t^{\pm }=\textrm{d}\textsf{J}_t^{\pm }/\textrm{d}\vartheta _{\Pi _n}^{\pm }\). Note that by the absolute continuity of \(\textsf{P}_t\) with respect to \(d_{TV,w}\), the map \(t\mapsto U_t\) is absolutely continuous in \(L^1(\ell )\). Moreover, for every \(F\in \mathcal {B}_c(\Gamma )\) the continuity equation reads as

$$\begin{aligned} \int _{\Gamma } F (U_t-U_s) \,\textrm{d}\Pi _n&= n\int _s^t \int _{\Gamma \times \mathcal {T}} (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )) G_r^+(\nu ,x) \, \textrm{d}\vartheta _{\Pi _n}^+ \, \textrm{d}r\\&\quad + n\int _s^t \int _{\Gamma \times \mathcal {T}} (F(\nu -\tfrac{1}{n}\delta _x)-F(\nu )) G_r^-(\nu ,x) \, \textrm{d}\vartheta _{\Pi _n}^- \, \textrm{d}r. \end{aligned}$$

But due to Lemmas 3.11 and 3.12, the integrals involving the shifted arguments can be rewritten as follows

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} F(\nu \pm \tfrac{1}{n}\delta _x) G_r^{\pm }(\nu ,x) \, \textrm{d}\vartheta _{\Pi _n}^{\pm }&= \int _{\Gamma \times \mathcal {T}} F(\nu ) G_r^{\pm }(\nu \mp \tfrac{1}{n}\delta _x,x) \, \textrm{d}(\textsf{T}^{n,\pm }_{\#} \vartheta _{\Pi _n}^{\pm })\\&=\int _{\Gamma \times \mathcal {T}} F(\nu ) G_r^{\pm }(\nu \mp \tfrac{1}{n}\delta _x,x) \, \textrm{d}\vartheta _{\Pi _n}^\mp , \end{aligned}$$

and therefore

$$\begin{aligned} \int _{\Gamma } F (U_t-U_s) \,\textrm{d}\Pi _n&= n\int _s^t \int _{\Gamma \times \mathcal {T}} F(\nu )(G_r^-(\nu +\tfrac{1}{n}\delta _x,x)-G_r^+(\nu ,x)) \, \kappa ^+_{\nu }(\textrm{d}x)\, \Pi _n(\textrm{d}\nu )\, \textrm{d}r\\&\quad +n\int _s^t \int _{\Gamma \times \mathcal {T}} F(\nu )(G_r^+(\nu -\tfrac{1}{n}\delta _x,x)-G_r^-(\nu ,x)) \, \kappa ^-_{\nu }(\textrm{d}x)\, \Pi _n(\textrm{d}\nu ) \, \textrm{d}r, \end{aligned}$$

which is the weak formulation of (3.20). Putting in the pre-factors \((1+\nu (\mathcal {T})^2)^{-1}\) to state the expression in terms of the finite measures \(\ell \) and \(\Sigma ^{\pm }\), and noting that due to time-regularity \((1+\nu (\mathcal {T})^2)^{-1} \textsf{P}_t\) is TV-regular, we can proceed as in Corollary 4.14 of [33] and conclude the proof after redefining \(U,G^{\pm }\) on negligible sets. \(\square \)

3.2 Weak solutions

In this section, we will discuss the existence of weak solutions to (\(\mathsf {FKE}_n\)), i.e. solutions to

$$\begin{aligned} \partial _t \textsf{P}= \overline{\text {div}}^{n,+} \vartheta _{\textsf{P}}^{+}+\overline{\text {div}}^{n,-} \vartheta _{\textsf{P}}^-, \end{aligned}$$

in appropriate weak form, but with the property that \(\mathcal {I}_n(\textsf{P},\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-)\le 0\). In the next section, we will show that \(\mathcal {I}_n\ge 0\) and that gradient-flow solutions, i.e. those with \(\mathcal {I}_n=0\), are unique.

Definition 3.17

A curve \((\textsf{P}_t)_{t\in [0,T]}\) is a weak solution to (\(\mathsf {FKE}_n\)) if \(\textrm{supp} \,\textsf{P}_t\subseteq \Gamma _n\) for all \(t\in [0,T]\), \(\textsf{P}_t\) is continuous in the narrow topology, and for all \(s,t\in [0,T]\) and all \(F\in C_c(\Gamma )\),

$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma \times \mathcal {T}} \left( (\overline{\nabla }^{n,+} F) \,\textrm{d}\vartheta _{\textsf{P}_r}^++(\overline{\nabla }^{n,-} F) \, \textrm{d}\vartheta _{\textsf{P}_r}^- \right) \, \textrm{d}r. \end{aligned}$$

Remark 3.18

Recall that \(\int (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\vartheta _{\textsf{P}_t}^{\pm }\le M_n\) independently of \(\textsf{P}_t\). Hence it is easy to check that \((\textsf{P})\) is a weak solution if and only if \((\textsf{P},\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-)\in \textsf{CE}_{n}\).

Moreover, under some additional assumptions, solutions turn out to inherit polynomial mass-estimates from the initial datum, see e.g. Theorem 3.1 of [18] for the case in \({\mathbb {R}}^d\). While throughout this article we do not assume more from the initial data than having finite relative entropy with respect to \(\Pi _n\) (which does imply the finiteness of the first moment), we provide the higher-moment estimates here for completeness.

Lemma 3.19

Fix any \(p\ge 0\), and assume that \((\textsf{P})\) is a weak solution with initial datum satisfying

$$\begin{aligned} \int _{\Gamma } \nu (\mathcal {T})^p \,\textsf{P}_0(\textrm{d}\nu )<\infty , \end{aligned}$$

and such that for any \(F\in \mathcal {B}_b(\Gamma )\), \(s,t\in [0,T]\) we have the inequality

$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s \le \int _s^t \int _{\Gamma \times \mathcal {T}} \left( (\overline{\nabla }^{n,+} F)_+ \,\textrm{d}\vartheta _{\textsf{P}_r}^++(\overline{\nabla }^{n,-} F)_+ \, \textrm{d}\vartheta _{\textsf{P}_r}^- \right) \, \textrm{d}r. \end{aligned}$$
(3.21)

Then

$$\begin{aligned} \sup _{t\in [0,T]} \int _{\Gamma } \nu (\mathcal {T})^p \,\textsf{P}_t(\textrm{d}\nu )<\infty . \end{aligned}$$

The condition (3.21) is necessary to show the propagation of mass-moments, but can itself be shown to hold if the first moment is uniformly bounded in time (and in particular if the relative entropy with respect to \(\Pi _n\) is uniformly bounded in time), using the compactly supported multipliers \(\chi _m\) of Sect. 3.3.

Proof

Set \(F(\nu ):=f(\nu (\mathcal {T}))\) with \(f(z):=z^p\) and let \(f_k(z)=\min \{z,k\}^p\) be its sequence of truncations. Setting \(F_k(\nu ):=f_k(\nu (\mathcal {T}))\), we have for every \(k\ge 1\),

$$\begin{aligned} \int _{\Gamma } F_k\, \textrm{d}(\textsf{P}_t-\textsf{P}_0)&{\le \int _0^t \left( \int _{\Gamma \times \mathcal {T}}(\overline{\nabla }^{n,+} F_k)_+ \, \textrm{d}\vartheta _{\textsf{P}_r}^++\int _{\Gamma \times \mathcal {T}}(\overline{\nabla }^{n,-} F_k)_+ \, \textrm{d}\vartheta _{\textsf{P}_r}^- \right) \, \textrm{d}r}\\&\le \int _0^t \int _{\Gamma \times \mathcal {T}} \big (f_k(\nu (\mathcal {T})+\tfrac{1}{n})-f_k(\nu (\mathcal {T}))\big ) \,\kappa ^+_{\nu }(\textrm{d}x)\,\textsf{P}_r(\textrm{d}\nu )\, \textrm{d}r, \end{aligned}$$

where we used the fact that \(f_k\) is non-decreasing, thus implying \(\overline{\nabla }^{n,-} F_k\le 0\). Recalling that

$$\begin{aligned} \kappa _\nu ^+(\mathcal {T}) \le \Vert c\Vert _\infty \gamma (\mathcal {T})\nu (\mathcal {T}), \end{aligned}$$

and using that \(z(f_k(z+\tfrac{1}{n})-f_k(z))\le C_{p,n}(1+f_k(z))\) for a suitable constant \(C_{p,n}\) independent of k, we can apply a standard Gronwall argument to obtain

$$\begin{aligned} \int _{\Gamma } F_k(\nu ) \,\textrm{d}\textsf{P}_t&\le e^{ C_{p,n}\Vert c\Vert _{\infty }\gamma (\mathcal {T})t}\left( C_{p,n}\Vert c\Vert _{\infty }\gamma (\mathcal {T}) t+\int _{\Gamma } F_k(\nu ) \, \textrm{d}\textsf{P}_0 \right) . \end{aligned}$$
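
The estimate above can be unpacked as follows; this is only a sketch, and the shorthand \(y_k(t):=\int _{\Gamma } F_k\,\textrm{d}\textsf{P}_t\) and \(A:=C_{p,n}\Vert c\Vert _{\infty }\gamma (\mathcal {T})\) is introduced solely for this illustration. Combining the two preceding bounds with the fact that each \(\textsf{P}_r\) is a probability measure, and noting that \(0\le y_k\le k^p\) is bounded, we find

$$\begin{aligned} y_k(t)\le y_k(0)+A\int _0^t \big (1+y_k(r)\big )\,\textrm{d}r \qquad \Longrightarrow \qquad 1+y_k(t)\le \big (1+y_k(0)\big )e^{At}, \end{aligned}$$

and therefore

$$\begin{aligned} y_k(t)\le e^{At}y_k(0)+\big (e^{At}-1\big )\le e^{At}\big (y_k(0)+At\big ), \end{aligned}$$

where the last step uses \(e^{At}-1=\int _0^t Ae^{Ar}\,\textrm{d}r\le At\,e^{At}\).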

Taking \(k\rightarrow \infty \) we derive the desired inequality by monotone convergence. \(\square \)

We can now state the existence result of a weak solution satisfying one-half of the Energy-Dissipation principle, which is complemented by the chain rule proved in Sect. 3.3. The existence proof is one of EDP-convergence (see also Sect. 5), bootstrapping from problems with bounded kernels and the results of [33].

Theorem 3.20

Suppose that

$$\begin{aligned} \mathcal {E}\textrm{nt}({\bar{\textsf{P}}}|\Pi _n) < \infty . \end{aligned}$$

Then there exists a weak solution \((\textsf{P})\) with initial datum \({\bar{\textsf{P}}}\) such that

$$\begin{aligned} \int _0^T \mathcal {R}_n(\textsf{P}_t,\vartheta _{\textsf{P}_t}^+,\vartheta _{\textsf{P}_t}^-)\, \textrm{d}t + \mathcal {F}_n(\textsf{P}_T)-\mathcal {F}_n(\textsf{P}_0)+\int _0^T \mathcal {D}_n(\textsf{P}_t) \, \textrm{d}t \le 0. \end{aligned}$$

Proof

Fix any \({\bar{\textsf{P}}}\) with \(\mathcal {E}\textrm{nt}({\bar{\textsf{P}}}|\Pi _n)<\infty \). We proceed by approximating the unbounded kernel \({{\bar{\kappa }}}_n\) with bounded ones. For every \(\varepsilon >0\), we introduce the regularized jump kernel \({{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta )\) over \(\Gamma \) defined by

$$\begin{aligned} {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ):=\frac{1}{1+\varepsilon \nu (\mathcal {T})\eta (\mathcal {T})} {{\bar{\kappa }}}_{n}(\nu ,\textrm{d}\eta ). \end{aligned}$$

In terms of birth/death kernels, this can be rewritten as

$$\begin{aligned} {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) = n \int _{\mathcal {T}}\delta _{\nu + \tfrac{1}{n} \delta _x}(\textrm{d}\eta )\, \kappa ^{+,\varepsilon }_{\nu }(\textrm{d}x) + n \int _{\mathcal {T}} \delta _{\nu -\tfrac{1}{n} \delta _x}(\textrm{d}\eta )\,\kappa ^{-,\varepsilon }_{\nu }(\textrm{d}x), \end{aligned}$$

where

$$\begin{aligned} \kappa ^{\pm ,\varepsilon }_{\nu }:=\frac{1}{1+\varepsilon \nu (\mathcal {T})(\nu (\mathcal {T})\pm \tfrac{1}{n})}\kappa ^{\pm }_{\nu }. \end{aligned}$$

Note that

$$\begin{aligned} \sup _{\nu \in \Gamma } \kappa ^{\pm ,\varepsilon }_{\nu }{(\mathcal {T})} < \infty \qquad \text{ for } \text{ all } \varepsilon >0. \end{aligned}$$
(3.22)
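
For the birth part, the bound (3.22) can be made quantitative in a short sketch, using only the estimate \(\kappa _\nu ^+(\mathcal {T}) \le \Vert c\Vert _\infty \gamma (\mathcal {T})\nu (\mathcal {T})\) recalled in the proof of Lemma 3.19 and the elementary inequality \(1+\varepsilon z^2\ge 2\sqrt{\varepsilon }\,z\). Writing \(z:=\nu (\mathcal {T})\ge 0\) (shorthand used only here),

$$\begin{aligned} \kappa ^{+,\varepsilon }_{\nu }(\mathcal {T})=\frac{\kappa ^{+}_{\nu }(\mathcal {T})}{1+\varepsilon z(z+\tfrac{1}{n})}\le \Vert c\Vert _\infty \gamma (\mathcal {T})\,\frac{z}{1+\varepsilon z^2}\le \frac{\Vert c\Vert _\infty \gamma (\mathcal {T})}{2\sqrt{\varepsilon }}. \end{aligned}$$

The death part can be treated analogously, provided \(\kappa ^-_{\nu }(\mathcal {T})\) grows at most quadratically in \(\nu (\mathcal {T})\), which we assume for the purpose of this sketch. Note that the bound degenerates as \(\varepsilon \rightarrow 0\), in line with the unboundedness of the original kernel.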

Correspondingly, we denote \(\vartheta _{\textsf{P}}^{\pm ,\varepsilon }\), \(\Theta ^{n,\pm ,\varepsilon }_{\textsf{P}}\), \(Q_{n,\varepsilon }\), \(Q_{n,\varepsilon }^*\), \(\mathcal {R}_{n,\varepsilon }\), \(\mathcal {D}_{n,\varepsilon }\), \(\mathcal {I}_{n,\varepsilon }\), \((\textsf{FKE}_{n,\varepsilon })\) as the relevant quantities, operators, functionals and forward Kolmogorov equations induced by \(\kappa ^{\pm ,\varepsilon }_{\nu }\). We will first show the existence of gradient-flow solutions for the regularized problems, i.e. curves such that \(\mathcal {I}_{n,\varepsilon }=0\), and then construct an appropriate limit curve as \(\varepsilon \rightarrow 0\).

Thus, fix any \(\varepsilon >0\). Due to the bound (3.22) it is clear that \(Q_{n,\varepsilon }\) is a bounded operator since

$$\begin{aligned} \sup _{\nu \in \Gamma } \int _{\Gamma } {{\bar{\kappa }}}_n^{\varepsilon }(\nu ,\textrm{d}\eta ) < \infty . \end{aligned}$$

Moreover, since the prefactor \(\nu (\mathcal {T})\eta (\mathcal {T})\) is symmetric under swapping of \(\nu \) and \(\eta \), it is straightforward to verify that \({{\bar{\kappa }}}_{n}^{\varepsilon }\) is still reversible with respect to the same invariant measure \(\Pi _n\), i.e. we have

$$\begin{aligned} \Pi _n(\textrm{d}\nu ){{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) = \Pi _n(\textrm{d}\eta ){{\bar{\kappa }}}_{n}^{\varepsilon }(\eta ,\textrm{d}\nu ). \end{aligned}$$
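
For completeness, the verification is a one-line computation. It takes for granted the detailed balance of the unregularized kernel, \(\Pi _n(\textrm{d}\nu ){{\bar{\kappa }}}_{n}(\nu ,\textrm{d}\eta ) = \Pi _n(\textrm{d}\eta ){{\bar{\kappa }}}_{n}(\eta ,\textrm{d}\nu )\), which is used in the middle step:

$$\begin{aligned} \Pi _n(\textrm{d}\nu ){{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) =\frac{\Pi _n(\textrm{d}\nu ){{\bar{\kappa }}}_{n}(\nu ,\textrm{d}\eta )}{1+\varepsilon \nu (\mathcal {T})\eta (\mathcal {T})} =\frac{\Pi _n(\textrm{d}\eta ){{\bar{\kappa }}}_{n}(\eta ,\textrm{d}\nu )}{1+\varepsilon \eta (\mathcal {T})\nu (\mathcal {T})} =\Pi _n(\textrm{d}\eta ){{\bar{\kappa }}}_{n}^{\varepsilon }(\eta ,\textrm{d}\nu ). \end{aligned}$$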

The triple \((\Gamma ,\Pi _n,{{\bar{\kappa }}}_{n}^{\varepsilon })\) therefore satisfies the assumptions of [33]. Keeping in mind the difference in definitions of \(\Psi ^*\) due to the extra factor 2, by [33, Theorem 6.6] there exists a unique curve \(U^{\varepsilon }\in C^1([0,T],L^1(\Gamma ,\Pi _n))\) such that \(U_0=\textrm{d}{\bar{\textsf{P}}}/\textrm{d}\Pi _n\), and

$$\begin{aligned} \left\{ \begin{aligned} \partial _t U_t(\nu )&= \int _{\Gamma } (U_t(\eta )-U_t(\nu )){{\bar{\kappa }}}_n^{\varepsilon }(\nu ,\textrm{d}\eta ), \qquad \hbox {for a.e.}\ t\in [0,T], \\ \mathcal {E}\textrm{nt}(\textsf{P}_0|\Pi _n)- \mathcal {E}\textrm{nt}(\textsf{P}_T|\Pi _n)&= \int _0^T \int _{\Gamma \times \Gamma } \Psi \left( U_t(\eta )-U_t(\nu ) \right) \sqrt{U_t(\nu )U_t(\eta )} \, \Pi _n(\textrm{d}\nu ) \, {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) \, \textrm{d}t\\&\quad + \int _0^T \int _{\Gamma \times \Gamma } \left( \sqrt{U_t(\eta )}-\sqrt{U_t(\nu )}\right) ^2 \Pi _n(\textrm{d}\nu ) \, {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) \, \textrm{d}t, \end{aligned}\right. \end{aligned}$$

with \(\textsf{P}_t:=U_t \Pi _n\) as usual. In particular the entropy \(\mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n)\) decreases along the solution and hence

$$\begin{aligned} \sup _{t\in [0,T]} \mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n) \le \mathcal {E}\textrm{nt}({\bar{\textsf{P}}}|\Pi _n). \end{aligned}$$

By evenness of \(\Psi \), symmetry of \(\Pi _n {{\bar{\kappa }}}_n^{\varepsilon }\) and the identity (A.2), we can express for any U after substituting for \({{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta )\)

$$\begin{aligned} \frac{1}{2} \int _{\Gamma \times \Gamma }&\Psi \left( U(\eta )-U(\nu ) \right) \sqrt{U(\nu )U(\eta )} \, \Pi _n(\textrm{d}\nu ) \, {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) \\&= \int _{U(\eta )>0,U(\nu )>0}\phi \left( \sqrt{U(\nu )/U(\eta )}\right) \sqrt{U(\eta )U(\nu )} \, \Pi _n(\textrm{d}\nu ) \, {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta )\\&= \int _{U(\nu +n^{-1}\delta _x)>0,U(\nu )>0} \phi \left( \frac{\textrm{d}\vartheta _{\textsf{P}}^{+,\varepsilon }}{\textrm{d}\Theta ^{n,+,\varepsilon }_{\textsf{P}}} \right) \sqrt{U(\nu +\tfrac{1}{n}\delta _x)U(\nu )} \, \Pi _n(\textrm{d}\nu ) \, \kappa ^{+,\varepsilon }_{\nu }(\textrm{d}x) \\&\qquad + \int _{U(\nu -n^{-1}\delta _x)>0,U(\nu )>0} \phi \left( \frac{\textrm{d}\vartheta _{\textsf{P}}^{-,\varepsilon }}{\textrm{d}\Theta ^{n,-,\varepsilon }_{\textsf{P}}} \right) \sqrt{U(\nu -\tfrac{1}{n}\delta _x)U(\nu )} \, \Pi _n(\textrm{d}\nu ) \, \kappa ^{-,\varepsilon }_{\nu }(\textrm{d}x) \\&=\mathcal {R}_{n,\varepsilon }\left( \textsf{P},\vartheta _{\textsf{P}}^{+,\varepsilon },\vartheta _{\textsf{P}}^{-,\varepsilon }\right) . \end{aligned}$$

Moreover, it is straightforward to check that

$$\begin{aligned} \int _{\Gamma \times \Gamma } \left( \sqrt{U_t(\eta )}-\sqrt{U_t(\nu )}\right) ^2 \Pi _n(\textrm{d}\nu ) \, {{\bar{\kappa }}}_{n}^{\varepsilon }(\nu ,\textrm{d}\eta ) = \mathcal {D}_{n,\varepsilon }(\textsf{P}), \end{aligned}$$

and therefore with \(\textsf{J}^{\pm }:=\vartheta _{\textsf{P}}^{\pm ,\varepsilon }\) we conclude

$$\begin{aligned} \mathcal {I}_{n,\varepsilon }(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0. \end{aligned}$$

Finally, note that by Lemma 3.13 and Remark 3.6

$$\begin{aligned} \begin{aligned} \mathcal {E}\textrm{nt}\left( \textsf{J}^{\pm }|\Theta _{\textsf{P}}^{n,\pm ,\varepsilon }\right)&=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\vartheta _{\textsf{P}}^{\pm ,\varepsilon }}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^{\pm ,\varepsilon }}{\textrm{d}\Sigma },\frac{\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^{\mp ,\varepsilon })}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma ,\\ \mathcal {D}_{n,\varepsilon }&=2H^2(\vartheta _{\textsf{P}}^{\pm ,\varepsilon },\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^{\mp ,\varepsilon }), \end{aligned} \end{aligned}$$

for any dominating measure \(\Sigma \), which are both non-negative, convex, and vaguely lower-semicontinuous functionals of \(\vartheta _{\textsf{P}}^{\pm ,\varepsilon },\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}}^{\mp ,\varepsilon }\) in \(\mathcal {M}_{loc}(\Gamma \times \mathcal {T})\), see [6, Theorem 3.4.3].

Next, we consider the sequence of pairs \((\textsf{P}^{\varepsilon },\textsf{J}^{\pm ,\varepsilon })\) stemming from the regularized problems above, satisfying

$$\begin{aligned} \mathcal {I}_{n,\varepsilon }(\textsf{P}^{\varepsilon },\textsf{J}^{+,\varepsilon },\textsf{J}^{-,\varepsilon })=0 \qquad \text{ for } \text{ all } \varepsilon >0. \end{aligned}$$

As for a priori estimates, we have

$$\begin{aligned} \sup _{\varepsilon ,t\in [0,T]} \mathcal {E}\textrm{nt}(\textsf{P}_t^{\varepsilon }|\Pi _n) \le \mathcal {E}\textrm{nt}({\bar{\textsf{P}}}|\Pi _n), \end{aligned}$$
(3.23)

and

$$\begin{aligned} \kappa ^{\pm ,\varepsilon }_{\nu }(\mathcal {T})\le \kappa ^{\pm }_{\nu }(\mathcal {T}) \qquad \text{ for } \text{ all } \varepsilon >0. \end{aligned}$$

From the latter, it can be shown similarly as in Lemma 3.15 that we have the equicontinuity result

$$\begin{aligned} d_{TV,w}(\textsf{P}^{\varepsilon }_t,\textsf{P}^{\varepsilon }_s)\le 2 n \max \{1+\tfrac{2}{n^2},2\} |t-s|. \end{aligned}$$

Here \(d_{TV,w}\) is the weighted total-variation metric defined in (3.18) as

$$\begin{aligned} \begin{aligned} d_{TV,w}(\textsf{P}^{\varepsilon }_t,\textsf{P}^{\varepsilon }_s)&:=\int _{\Gamma } (1+\nu (\mathcal {T})^2)^{-1}\, \textrm{d}|\textsf{P}^{{\varepsilon }}_{{t}}-\textsf{P}^{{\varepsilon }}_{{s}}|, \text{ for } \text{ all } \varepsilon >0,\, \text{ for } \text{ all } s,t\in [0,T]. \end{aligned} \end{aligned}$$

Recall that \(d_{TV,w}\) is lower semicontinuous with respect to the narrow topology and that convergence in \(d_{TV,w}\) implies narrow convergence on narrowly pre-compact sets. Since \(\mathcal {E}\textrm{nt}(\textsf{P}^{\varepsilon }_t|\Pi _n)\) is bounded uniformly in \(\varepsilon \) and t, and \(\mathcal {E}\textrm{nt}(\cdot |\Pi _n)\) is narrowly coercive, we obtain by a standard Arzelà–Ascoli argument, up to choosing a subsequence, the existence of a curve \(t\mapsto \textsf{P}_t\) such that

$$\begin{aligned} \textsf{P}_t^{\varepsilon } \rightarrow \textsf{P}_t \quad \hbox {narrowly for all } t\in [0,T]. \end{aligned}$$

Note that by the estimate (3.23) and lower-semicontinuity of the entropy, we have that for every \(t\in [0,T]\), the sequence of measures \(\textsf{P}_t^{\varepsilon }\) converges setwise to \(\textsf{P}_t\) and \(\mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n)\le \mathcal {E}\textrm{nt}({{\bar{\textsf{P}}}}|\Pi _n)<\infty \). Moreover, \(\kappa ^{\pm ,\varepsilon }_{\nu } \nearrow \kappa ^{\pm }_{\nu }\) as \(\varepsilon \rightarrow 0\) for every \(\nu \), and hence setwise convergence of \(\textsf{P}^{\varepsilon }_t\) implies setwise convergence on pre-compact sets of \(\Gamma \times \mathcal {T}\) for

$$\begin{aligned} \vartheta _{\textsf{P}^{\varepsilon }_t}^{\pm ,\varepsilon }(\textrm{d}\nu ,\textrm{d}x)=\textsf{P}_t^{\varepsilon }(\textrm{d}\nu )\kappa ^{\pm ,\varepsilon }[\nu ](\textrm{d}x), \end{aligned}$$

see e.g. [33, Lemma 2.4] for the case of setwise convergence for bounded jump kernels. In particular, we have the vague convergence

$$\begin{aligned} \vartheta ^{\pm ,\varepsilon }_{\textsf{P}_t^{\varepsilon }} \rightarrow \vartheta _{\textsf{P}_t}^\pm ,\qquad \textsf{T}^{n,\pm }_{\#}\vartheta ^{\pm ,\varepsilon }_{\textsf{P}_t^{\varepsilon }} \rightarrow \textsf{T}^{n,\pm }_{\#}\vartheta _{\textsf{P}_t}^\pm . \end{aligned}$$

It is straightforward to check that we can pass to the limit in the continuity Eq. (3.11), and in particular, derive that \(\textsf{P}\) is a weak solution to the unregularized problem.

Finally, recall that \(\mathcal {F}_n(\textsf{P}_T^{{\varepsilon }})\) is convex and narrowly lower semicontinuous in \(\textsf{P}^{\varepsilon }_T\), and as shown above the action \(\mathcal {R}_{n,\varepsilon }\) is jointly convex and lower semicontinuous in \((\vartheta ^{\pm ,\varepsilon }_{\textsf{P}^{\varepsilon }},\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp ,\varepsilon }_{\textsf{P}^{\varepsilon }})\). Proceeding as in Remark 3.6, we also find that the Fisher information is jointly convex and lower semicontinuous in \((\vartheta ^{\pm ,\varepsilon }_{\textsf{P}^{\varepsilon }},\textsf{T}^{n,\mp }_{\#}\vartheta ^{\mp ,\varepsilon }_{\textsf{P}^{\varepsilon }})\) if \(\textsf{P}^{\varepsilon }\) are contained in sub-level sets of \(\mathcal {F}_n\). Therefore, we conclude that

$$\begin{aligned} \mathcal {I}_{n}(\textsf{P})&\le \liminf _{\varepsilon \rightarrow 0} \left( \int _0^T \mathcal {R}_{n,\varepsilon }(\textsf{P}^{\varepsilon }_t,\textsf{J}_t^{+,\varepsilon },\textsf{J}_t^{-,\varepsilon }) \, \textrm{d}t+\mathcal {F}_n(\textsf{P}_T^{\varepsilon })-\mathcal {F}_n({{\bar{\textsf{P}}}})+\int _0^T \mathcal {D}_{n,\varepsilon }(\textsf{P}^ {\varepsilon }_t) \, \textrm{d}t \right) \\&={\liminf _{\varepsilon \rightarrow 0}}\mathcal {I}_{n,\varepsilon }(\textsf{P}^{\varepsilon },\textsf{J}^{+,\varepsilon },\textsf{J}^{-,\varepsilon }) =0, \end{aligned}$$

thus establishing the claim. \(\square \)

3.3 Variational characterization

We will now present the chain rule for the entropy. The strategy of the proof is similar to the mean-field case and the proof for jump processes of [33], with the difference that due to the unboundedness of \({{\bar{\kappa }}}\), we need a two-fold regularization of the entropy, namely via truncations and compactly supported multipliers.

Theorem 3.21

For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{n}\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{n}(\textsf{P},\textsf{J}^+,\textsf{J}^-) < \infty \), it holds that \(t \mapsto \mathcal {F}_{n}(\textsf{P}_t)\) is absolutely continuous and

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}t} \mathcal {F}_n(\textsf{P}_t)=\int _{\Gamma \times \mathcal {T}} \bigl (\log U(\nu +\delta _x)-\log U(\nu )\bigr )\,\textrm{d}\textsf{J}^{\textrm{net}}_t {}, \qquad \hbox {for a.e. } t\in [0,T]. \end{aligned}$$

Moreover, \(\mathcal {I}_n\ge 0\), and if \(\mathcal {I}_n=0\) we have

$$\begin{aligned} \textsf{J}^{\pm }_t=\textsf{P}_t \kappa _{\nu }^{\pm } \qquad \hbox {for a.e. } t\in [0,T].\end{aligned}$$

Proof

For any curve \(\textsf{P}\) with \(\textsf{P}_t\ll \Pi _n\) for all \(t\in [0,T]\) we will use

$$\begin{aligned} S^{k,m}_t:=\int _{\Gamma } \phi _{k,m}(U_t)\, \textrm{d}\Pi _n, \qquad S^{m}_t:=\int _{\Gamma } \phi _{m}(U_t)\, \textrm{d}\Pi _n, \end{aligned}$$

where \(\phi _{k,m}(U,\nu )=\chi _k(\nu )\phi _m(U)\), \(k,m\in {\mathbb {N}}\) with \(\phi _m\) the previously defined regularized entropy functions, and \(\chi _k:=f_k(\nu (\mathcal {T})) \in C_c(\Gamma )\) compactly supported multipliers defined via

$$\begin{aligned} f_k(z):=\left\{ \begin{aligned} 1&, \qquad{} & {} 0\le z\le k, \\ 2-\frac{z}{k}&, \qquad{} & {} k\le z\le 2k, \\ 0&, \qquad{} & {} z\ge 2k. \\ \end{aligned}\right. \end{aligned}$$

Note that \(|f_k|\le 1\), \(|f_k'(z)|z\le 2\) uniformly in k, \(f_k\) converges monotonically to 1, and \(|\overline{\nabla }^{n,+} \chi _k|(\nu ,x)\le 3/(1+\nu (\mathcal {T}))\) if \(k\ge 1\). In addition, recall that \(\phi _m'\) converges pointwise to \(\phi '\) and \(|\phi '_m|,\phi _m\) converge monotonically to \(|\phi '|,\phi \) respectively, and in particular,
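
The gradient bound on the multipliers \(\chi _k\) can be checked in a few lines. The sketch below assumes that the discrete gradient is normalized as \(\overline{\nabla }^{n,+}F(\nu ,x)=n\big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\), and writes \(z:=\nu (\mathcal {T})\) as shorthand. Since \(f_k\) is \(\tfrac{1}{k}\)-Lipschitz,

$$\begin{aligned} |\overline{\nabla }^{n,+} \chi _k|(\nu ,x)=n\,\big |f_k(z+\tfrac{1}{n})-f_k(z)\big |\le n\cdot \frac{1}{n}\cdot \frac{1}{k}=\frac{1}{k}. \end{aligned}$$

Moreover, the difference vanishes unless \(z<2k\), because \(f_k\equiv 0\) on \([2k,\infty )\). On the remaining set we have \(1+z<1+2k\le 3k\) for \(k\ge 1\), and therefore

$$\begin{aligned} |\overline{\nabla }^{n,+} \chi _k|(\nu ,x)\le \frac{1}{k}\le \frac{3}{1+\nu (\mathcal {T})}. \end{aligned}$$

The same computation shows that \(\overline{\nabla }^{n,+} \chi _k(\nu ,x)=0\) as soon as \(k\ge \nu (\mathcal {T})+\tfrac{1}{n}\), since \(f_k\equiv 1\) on \([0,k]\).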

$$\begin{aligned} \lim _{k\rightarrow \infty } S^{k,m}_t{=}S^m_t, \qquad \lim _{m\rightarrow \infty } S^m_t=\mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n). \end{aligned}$$

Moreover, let the distributional derivatives with respect to \(\textsf{P}\) be defined as

$$\begin{aligned} DS^{k,m}_t(\nu ):=\chi _k(\nu ) \phi '_m(U_t(\nu )), \qquad DS^{m}_t(\nu ):= \phi '_m(U_t(\nu )) \end{aligned}$$

Note that pointwise \(\lim _{k\rightarrow \infty } \overline{\nabla }^{n,\pm } DS^{k,m}_t=\overline{\nabla }^{n,\pm } DS^{m}_t\) and \(\lim _{m\rightarrow \infty } \overline{\nabla }^{n,\pm } DS^{m}_t = \overline{\nabla }^{n,\pm } \phi '(U_t)\).

Now, consider a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\) with \(\mathcal {F}_n(\textsf{P}_0)<\infty \) and \(\mathcal {I}_n<\infty \). Since \(\mathcal {E}\textrm{nt}\) is bounded from below

$$\begin{aligned} \int _0^T \mathcal {R}_n(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)\, \textrm{d}t<\infty , \qquad \int _0^T \mathcal {D}_{n}(\textsf{P}_t) \, \textrm{d}t < \infty , \end{aligned}$$

and therefore \(\textsf{P}_t\ll \Pi _n\), \(\textsf{J}^{\pm }_t\ll \Theta ^{n,\pm }_{\textsf{P}_t} \ll \vartheta ^{\pm }_{\Pi _n}\) for a.e. \(t\in [0,T]\), with

$$\begin{aligned} \Theta ^{n,\pm }_{\textsf{P}_t}(\textrm{d}\nu ,\textrm{d}x)=\sqrt{U_t(\nu )U_t(\nu \pm \tfrac{1}{n}\delta _x)}\, \vartheta _{\Pi _n}^{\pm }(\textrm{d}\nu ,\textrm{d}x). \end{aligned}$$

In particular \(U_t(\nu )\), \(U_t(\nu \pm \tfrac{1}{n}\delta _x)>0\) for \(\textsf{J}^{\pm }_t, \Theta ^{n,\pm }_{\textsf{P}_t}\)-a.e. \(\nu ,x\).

Moreover, set \(\textsf{J}^{\pm }_t=G_t^{\pm } \vartheta _{\Pi _{{n}}}^{\pm }\), \(\textsf{J}^{\textrm{net}}_t=G^{\textrm{net}}_t \vartheta _{\Pi _{{n}}}^+\) (equivalently, \(G^{\textrm{net}}_t:=G_t^+-G_t^-\circ \textsf{T}^{n,+}\)), and

$$\begin{aligned} \ell :=(1+\nu (\mathcal {T})^2)^{-1} \Pi _n,\qquad \Sigma ^{\pm }:=(1+\nu (\mathcal {T})^2)^{-1} \vartheta ^{\pm }_{\Pi _n}. \end{aligned}$$

By Lemma 3.15, the map \(t\mapsto U_t\) is absolutely continuous and a.e. differentiable in \(L^1(\Gamma ,\ell )\) with

$$\begin{aligned} \begin{aligned} \partial _t U_t(\nu )&= n \int _{\mathcal {T}} (G_t^-(\nu +\tfrac{1}{n}\delta _x,x)-G_t^+(\nu ,x)) \, \kappa ^+_{\nu }(\textrm{d}x)\\&+ n\int _{\mathcal {T}} (G_t^+(\nu -\tfrac{1}{n}\delta _x,x)-G_t^-(\nu ,x)) \, \kappa ^-_{\nu }(\textrm{d}x), \end{aligned} \end{aligned}$$

or in terms of the net flux,

$$\begin{aligned} \begin{aligned} \partial _t U_t(\nu )= n \int _{\mathcal {T}} G^{\textrm{net}}_t(\nu -\tfrac{1}{n}\delta _x,x) \, \kappa ^-_{\nu }(\textrm{d}x)-n \int _{\mathcal {T}} G^{\textrm{net}}_t(\nu ,x) \, \kappa ^+_{\nu }(\textrm{d}x). \end{aligned} \end{aligned}$$

Therefore, since \((1+\nu (\mathcal {T})^2)\) is bounded from above and below on the support of \(\chi _k\), it is clear that for every \(k,m\) the maps \(t\mapsto S_t^{k,m}\) are Lipschitz, hence absolutely continuous, and for a.e. \(t\in [0,T]\)

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}t} S_t^{k,m}&=n \int _{\Gamma }\int _{\mathcal {T}} DS_t^{k,m}(\nu ) G^{\textrm{net}}_t(\nu -\tfrac{1}{n}\delta _x,x) \, \kappa ^-_{\nu }(\textrm{d}x)\,\Pi _n(\textrm{d}\nu )-n \int _{\Gamma }\int _{\mathcal {T}} DS_t^{k,m}(\nu ) G^{\textrm{net}}_t(\nu ,x) \, \kappa ^+_{\nu }(\textrm{d}x)\,\Pi _n(\textrm{d}\nu )\\&=\int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} DS_t^{k,m} \, \textrm{d}\textsf{J}^{\textrm{net}}_t, \end{aligned}$$

and in particular, for all \(s,t\in [0,T]\),

$$\begin{aligned} S_t^{k,m}-S_s^{k,m} = \int _s^t \int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} DS_r^{k,m} \, \textrm{d}\textsf{J}^{\textrm{net}}_r\, \textrm{d}r. \end{aligned}$$
(3.24)
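
The discrete integration by parts behind (3.24) can be sketched as follows. We assume here that detailed balance takes the form \(\textsf{T}^{n,-}_{\#}\vartheta ^{-}_{\Pi _n}=\vartheta ^{+}_{\Pi _n}\) with \(\textsf{T}^{n,-}(\nu ,x)=(\nu -\tfrac{1}{n}\delta _x,x)\), and that the discrete gradient is normalized as \(\overline{\nabla }^{n,+}F(\nu ,x)=n\big (F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )\big )\). Applying the change of variables \((\nu ,x)\mapsto \textsf{T}^{n,-}(\nu ,x)\) to the death term gives

$$\begin{aligned} n\int _{\Gamma }\int _{\mathcal {T}} DS_t^{k,m}(\nu )\, G^{\textrm{net}}_t(\nu -\tfrac{1}{n}\delta _x,x) \, \kappa ^-_{\nu }(\textrm{d}x)\,\Pi _n(\textrm{d}\nu ) = n\int _{\Gamma \times \mathcal {T}} DS_t^{k,m}(\nu +\tfrac{1}{n}\delta _x)\, G^{\textrm{net}}_t(\nu ,x) \, \vartheta ^{+}_{\Pi _n}(\textrm{d}\nu ,\textrm{d}x), \end{aligned}$$

so that, subtracting the birth term and recalling \(\textsf{J}^{\textrm{net}}_t=G^{\textrm{net}}_t \vartheta _{\Pi _n}^+\),

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}t} S_t^{k,m} = \int _{\Gamma \times \mathcal {T}} n\big (DS_t^{k,m}(\nu +\tfrac{1}{n}\delta _x)-DS_t^{k,m}(\nu )\big )\, G^{\textrm{net}}_t(\nu ,x) \, \vartheta ^{+}_{\Pi _n}(\textrm{d}\nu ,\textrm{d}x) = \int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} DS_t^{k,m} \, \textrm{d}\textsf{J}^{\textrm{net}}_t. \end{aligned}$$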

Recall that the following convergences hold pointwise:

$$\begin{aligned} \lim _{m\rightarrow \infty } \lim _{k\rightarrow \infty } \overline{\nabla }^{n,+} DS_t^{k,m}=\overline{\nabla }^{n,+} \phi '(U_t),\qquad \text {and}\qquad \lim _{k\rightarrow \infty } \overline{\nabla }^{n,+} \chi _k=0. \end{aligned}$$

Moreover, the following estimate holds for every \((\nu ,x)\):

$$\begin{aligned} |\overline{\nabla }^{n,+} DS_t^{k,m}|(\nu ,x)&\le \Vert \chi _k\Vert _{\infty } |\overline{\nabla }^{n,+} DS_t^m|(\nu ,x)+\Vert \phi '_m\Vert _{\infty } |\overline{\nabla }^{n,+} \chi _k|(\nu ,x) \\&\le |\overline{\nabla }^{n,+} DS_t^m|(\nu ,x)+3 m (1+\nu (\mathcal {T}))^{-1}\\&\le |\overline{\nabla }^{n,+} \phi '(U_t)|(\nu ,x)+3 m (1+\nu (\mathcal {T}))^{-1}, \end{aligned}$$

where the final inequality follows from the truncation inequality for discrete derivatives, i.e. \(|\phi _m(\eta )-\phi _m(\nu )|\le |\phi (\eta )-\phi (\nu )|\). Note that by Lemma 3.13, for any \(\textsf{P},\textsf{J}^{\pm }\) with finite \(\mathcal {R}_n\) we have

$$\begin{aligned} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}))^{-1} \textrm{d}|\textsf{J}^{\textrm{net}}|<\infty , \end{aligned}$$

and moreover

$$\begin{aligned} \frac{1}{2n}\int _{\Gamma \times \mathcal {T}} |\overline{\nabla }^{n,+} \phi '(U)|\,\textrm{d}|\textsf{J}^{\textrm{net}}| \, \le \mathcal {R}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)+\int _{\Gamma \times \mathcal {T}} \Psi ^*\left( \frac{1}{2n}\overline{\nabla }^{n,+} \phi '(U)\right) \textrm{d}\Theta _{\textsf{P}}^{n,+}, \end{aligned}$$

with

$$\begin{aligned} \mathcal {D}_n^-(\textsf{P})&:= \int _{\Gamma \times \mathcal {T}} \Psi ^*\left( \frac{1}{2n}\overline{\nabla }^{n,+} \phi '(U)\right) \textrm{d}\Theta _{\textsf{P}}^{n,+} \\&=\int _{U(\nu +n^{-1}\delta _x)>0,U(\nu )>0} \Psi ^*\left( \tfrac{1}{2}\big (\log U(\nu +\tfrac{1}{n}\delta _x)-\log U(\nu )\big )\right) \sqrt{U(\nu +\tfrac{1}{n}\delta _x)U(\nu )}\, \textrm{d}\vartheta _{\Pi _n}^+\\&=\int _{U(\nu +n^{-1}\delta _x)>0,U(\nu )>0} \left( \sqrt{U(\nu +\tfrac{1}{n}\delta _x)}-\sqrt{U(\nu )}\right) ^2 \textrm{d}\vartheta ^+_{\Pi _n}\\&\le \mathcal {D}_n(\textsf{P}). \end{aligned}$$

Therefore, since \(\mathcal {E}\textrm{nt}(\textsf{P}_0|\Pi _n)<\infty \) we find by a dominated convergence argument and taking subsequent limits in k and m in (3.24) that \(\mathcal {E}\textrm{nt}(\textsf{P}_t|\Pi _n)<\infty \) for all \(t\in [0,T]\),

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{P}_t)-\mathcal {E}\textrm{nt}(\textsf{P}_s)&= \int _s^t \int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} \phi '(U_r) \, \textrm{d}\textsf{J}^{\textrm{net}}_r\, \textrm{d}r, \qquad s,t\in [0,T],\\ \frac{1}{2n}\int _{\Gamma \times \mathcal {T}} |\overline{\nabla }^{n,+} \phi '(U_t)| \,\textrm{d}|\textsf{J}^{\textrm{net}}_t|&\le \mathcal {R}_n(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)+\mathcal {D}^-_n(\textsf{P}_t), \qquad t\in [0,T], \end{aligned}$$

and

$$\begin{aligned} \mathcal {I}_n\ge \int _0^T \mathcal {R}_n(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)\, \textrm{d}t + \tfrac{1}{2n}\left( \mathcal {E}\textrm{nt}(\textsf{P}_T)-\mathcal {E}\textrm{nt}(\textsf{P}_0)\right) + \int _0^T \mathcal {D}^-_n(\textsf{P}_t)\, \textrm{d}t\ge 0. \end{aligned}$$

Next, assume that \(\mathcal {I}_n=0\). Then the above arguments imply that for a.e. \(t\in [0,T]\),

$$\begin{aligned} \mathcal {R}_n(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)\, +\frac{1}{2n}\int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} \phi '(U_t) \, \textrm{d}\textsf{J}^{\textrm{net}}_t+ \mathcal {D}^-_n(\textsf{P}_t) =0. \end{aligned}$$
(3.25)

To simplify manipulations, let \(U^{\pm }(\nu ,x):=U\circ \textsf{T}_x^{n,\pm }=U(\nu \pm \tfrac{1}{n}\delta _x)\). Note that for the actions,

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{n,+}_{\textsf{P}})&=\int _{\Gamma \times \mathcal {T}} 1_{U,U^+>0} \, \phi \left( G^{+} U/U^+\right) \sqrt{U U^+} \, \textrm{d}\vartheta _{\Pi _n}^{+},\\ \mathcal {E}\textrm{nt}(\textsf{J}^{-}|\Theta ^{n,-}_{\textsf{P}})&=\int _{\Gamma \times \mathcal {T}} 1_{U,U^->0} \, \phi \left( G^{-} U^-/U\right) \sqrt{U U^-} \, \textrm{d}\vartheta _{\Pi _n}^{-}\\&=\int _{\Gamma \times \mathcal {T}} 1_{U,U^+>0} \, \phi \left( G^{-} U^+/U\right) \sqrt{U U^+} \, \textrm{d}\vartheta _{\Pi _n}^{+}, \end{aligned}$$

for the modified Fisher information \(\mathcal {D}_n^-\),

$$\begin{aligned} \mathcal {D}_n^-(\textsf{P})&=\int _{\Gamma \times \mathcal {T}} 1_{U,U^+>0} \left( \sqrt{U^+}-\sqrt{U}\right) ^2 \, \textrm{d}\vartheta ^+_{\Pi _n}, \end{aligned}$$

and finally

$$\begin{aligned} \frac{1}{2n}\int _{\Gamma \times \mathcal {T}} \overline{\nabla }^{n,+} \phi '(U) \, \textrm{d}\textsf{J}^{\textrm{net}}=\frac{1}{2} \int _{\Gamma \times \mathcal {T}} (\phi '(U^+)-\phi '(U)) (G^+-G^-\circ \textsf{T}^{n,+}) \, \textrm{d}\vartheta _{\Pi _n}^+, \end{aligned}$$

which due to \(\textsf{J}^{\pm }\ll \Theta ^{n,\pm }_{\textsf{P}}\) is equal to

$$\begin{aligned}\frac{1}{2} \int _{\Gamma \times \mathcal {T}} 1_{U,U^+>0} (\phi '(U^+)-\phi '(U)) (G_t^+-G_t^-\circ \textsf{T}^{n,+}) \, \textrm{d}\vartheta _{\Pi _n}^+.\end{aligned}$$

Therefore, after some cumbersome rewriting, the integrand of the left-hand side of (3.25) reads as the indicator function of \(\{U,U^+>0\}\) multiplied by the terms

$$\begin{aligned}&\phi \left( G^{+} U/U^+\right) \sqrt{U U^+} + \tfrac{1}{2} (\phi '(U^+)-\phi '(U) )G_t^++\phi ^*\left( -\tfrac{1}{2}(\phi '(U^+)-\phi '(U))\right) \\&\quad +\;\phi \left( G^{-}\circ \textsf{T}^{n,+} U^+/U\right) \sqrt{U U^+} - \tfrac{1}{2} (\phi '(U^+)-\phi '(U)) G^{-}\circ \textsf{T}^{n,+}\\&\quad +\phi ^*\left( -\tfrac{1}{2}(\phi '(U^+)-\phi '(U))\right) , \end{aligned}$$

since

$$\begin{aligned}\phi ^*\left( -\tfrac{1}{2}(\phi '(U^+)-\phi '(U))\right) =U-\sqrt{U U^+}, \quad \phi ^*\left( \tfrac{1}{2}(\phi '(U^+)-\phi '(U))\right) =U^+-\sqrt{U U^+}. \end{aligned}$$

By duality of \(\phi ,\phi ^*\) we have \(G^+=U\) and \(G^{-}\circ \textsf{T}^{n,+}=U^+\), hence \(G^-=U\) as well. Subsequently we can conclude that \(\mathcal {I}_n=0\) if and only if \(\textsf{J}^{\pm }_t=\vartheta _{\textsf{P}_t}^{\pm }\) for a.e. \(t\in [0,T]\) and a.e. \(\nu ,x\). \(\square \)

Together, Theorems 3.21 and 3.20 provide a proof of the variational characterization for the forward Kolmogorov equation.

Proof of Theorem 3.8

Under the assumption of \(\mathcal {F}_n(\textsf{P}_0)<\infty \) we have by Theorem 3.21 a chain rule for the entropy, the inequality \(\mathcal {I}_n\ge 0\), and the statement that \(\mathcal {I}_n(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0\) implies that \(\textsf{P}\) is a weak solution. Moreover, due to Theorem 3.20 there exists a weak solution with \(\mathcal {I}_n\le 0\).

It remains to show that gradient-flow solutions are unique, which is a classical argument using the strict convexity of \(\mathcal {F}_n\); see e.g. Theorem 5.9 of [33]. Suppose that there exist two curves \(\textsf{P}^1,\textsf{P}^2\) such that \(\textsf{P}^1_0=\textsf{P}^2_0={{\bar{\textsf{P}}}}\) and \(\mathcal {I}_n(\textsf{P}^1,\vartheta _{\textsf{P}^1}^+,\vartheta _{\textsf{P}^1}^-)=\mathcal {I}_n( \textsf{P}^2,\vartheta _{\textsf{P}^2}^+,\vartheta _{\textsf{P}^2}^-)=0\). Applying the chain rule, it is straightforward to verify that for a gradient-flow solution \(\mathcal {I}_{n}^t=0\) for every \(t \in [0,T]\), where

$$\begin{aligned} \mathcal {I}_n^t(\textsf{P},\textsf{J}^+,\textsf{J}^-):= \int _0^t \mathcal {R}_n(\textsf{P}_r,\textsf{J}^+_r,\textsf{J}_r^-)\, \textrm{d}r + \mathcal {F}_n(\textsf{P}_t)-\mathcal {F}_n({{\bar{\textsf{P}}}})+\int _0^t \mathcal {D}_n(\textsf{P}_r) \, \textrm{d}r,\end{aligned}$$

and that \(\mathcal {I}_n^t\ge 0\) for arbitrary curves with initial condition \({{\bar{\textsf{P}}}}\).

Now, define \({\tilde{\textsf{P}}}_t=\tfrac{1}{2}\textsf{P}^1_t+\tfrac{1}{2}\textsf{P}^2_t\) and note that \(({\tilde{\textsf{P}}},\vartheta _{\tilde{\textsf{P}}}^+,\vartheta _{{\tilde{\textsf{P}}}}^-)\in \textsf{CE}_n\) as well, and

$$\begin{aligned} \vartheta _{{\tilde{\textsf{P}}}}^{\pm }=\tfrac{1}{2}\vartheta _{\textsf{P}^1}^{\pm }+\tfrac{1}{2}\vartheta _{\textsf{P}^2}^{\pm }. \end{aligned}$$

Fix any \(t \in [0,T]\) and suppose that \(\textsf{P}_t^1\ne \textsf{P}_t^{2}\). Then by convexity of \(\mathcal {R}_n\) and \(\mathcal {D}_n\), and strict convexity of \(\mathcal {F}_n\), we have

$$\begin{aligned} \mathcal {I}_n^t({\tilde{\textsf{P}}},\vartheta _{{\tilde{\textsf{P}}}}^{+},\vartheta _{{\tilde{\textsf{P}}}}^{-})&= \int _0^t \mathcal {R}_n({\tilde{\textsf{P}}}_r,\vartheta _{{\tilde{\textsf{P}}}_r}^+,\vartheta _{{\tilde{\textsf{P}}}_r}^-)\, \textrm{d}r + \mathcal {F}_n({\tilde{\textsf{P}}}_t)-\mathcal {F}_n({{\bar{\textsf{P}}}})+\int _0^t \mathcal {D}_n({\tilde{\textsf{P}}}_r) \, \textrm{d}r\\&< \tfrac{1}{2}\mathcal {I}_n^t(\textsf{P}^1,\vartheta _{\textsf{P}^1}^+,\vartheta _{\textsf{P}^1}^-)+\tfrac{1}{2}\mathcal {I}_n^t(\textsf{P}^2,\vartheta _{\textsf{P}^2}^+,\vartheta _{\textsf{P}^2}^-)=0, \end{aligned}$$

which leads to a contradiction, and hence \(\textsf{P}_t^1= \textsf{P}_t^{2}\) for all \(t \in [0,T]\). \(\square \)

4 Liouville equation and lifted dynamics

In this section, we will consider the variational formulation for our proposed limit of the forward Kolmogorov equation (\(\textsf{FKE}_n\)), namely the Liouville equation

[Display equation (Li); rendered as an image in the source and not recoverable here]

It can be interpreted as a transport equation lifted from the mean-field dynamics, in the sense that it describes the evolution of the law of a deterministic process satisfying the mean-field equation but with possibly random initial conditions. We will consider the same ingredients as in previous sections, namely a non-negative EDP functional consisting of an action term, a difference of free energies, and a corresponding Fisher information term. The main technical tool that we use is a new superposition principle, which allows us to prove the chain rule via the results on mean-field curves of Sect. 2.

Solutions to (Li) are defined as appropriate weak solutions to

$$\begin{aligned} \partial _t \textsf{P}_t =Q_{\infty }^* \, \textsf{P}_t, \end{aligned}$$

where \(\textsf{P}_t \in \mathcal {P}(\Gamma )\) for all \(t\in [0,T]\) and the operator \(Q_{\infty }^*\) is the dual of \(Q_{\infty }\) given by

$$\begin{aligned} \begin{aligned} (Q_{\infty } F)(\nu )&= \int _{\mathcal {T}}(\textrm{grad}_{\Gamma }F)(\nu ,x) V[\nu ](\textrm{d}x),\\ V[\nu ]&:=\kappa ^+[\nu ]-\kappa ^-[\nu ], \end{aligned} \end{aligned}$$

for all \(F\in \textrm{Cyl}_c(\Gamma )\). Here \(\textrm{Cyl}_c(\Gamma )\) is the space of all compactly supported smooth cylinder functions, i.e. those of the form

$$\begin{aligned} F(\nu )=g\left( \langle 1,\nu \rangle ,\langle f_1,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) , \end{aligned}$$

where \(g\in C^{\infty }_c({\mathbb {R}}^{m+1})\) with \(m \in {\mathbb {N}}\), and \(f_1,\dots ,f_m\in C_b(\mathcal {T})\), and \(\textrm{grad}_{\Gamma }\) is the distributional gradient defined by

$$\begin{aligned} \textrm{grad}_{\Gamma }\, F(\nu ,x)= (\nabla g)\left( \langle 1,\nu \rangle ,\langle f_1,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) \cdot (1,f_1(x),\dots ,f_m(x))^\top . \end{aligned}$$

To be precise, we consider the following type of solutions.

Definition 4.1

A curve \((\textsf{P}_t)_{t\in [0,T]}\) is a weak solution to (Li) if \(\textsf{P}_t\) is continuous in the narrow topology and for all \(s,t\in [0,T]\), and all \(F\in \textrm{Cyl}_c(\Gamma )\),

$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma \times \mathcal {T}} (\textrm{grad}_{\Gamma }F)(\nu ,x) V[\nu ](\textrm{d}x) \textsf{P}_r(\textrm{d}\nu ) \, \textrm{d}r. \end{aligned}$$
(4.1)

Remark 4.2

Note that (Li) is the transport equation associated with the measure-valued vector field \(V[\nu ]\). Now let the flow \(G:[0,T]\times \Gamma \rightarrow \Gamma \) be the unique strong solution to the mean-field equation, i.e. with

$$\begin{aligned} \partial _t G_t[\nu ]=V[G_t[\nu ]]. \end{aligned}$$
(4.2)

As will be shown in Sect. 4.2, \(\textsf{P}_t:=(G_t)_{\#} {{\bar{\textsf{P}}}}\) is a weak solution to (Li) for any initial data \({{\bar{\textsf{P}}}}\in \mathcal {P}(\Gamma )\). In particular, if \(\nu _t\) is a solution to (\(\textsf{MF}\)) then \(\textsf{P}_t:=\delta _{\nu _t}\) is a weak solution to (Li).
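
The last claim can be checked directly on cylinder functions; the following is only a sketch. We assume that \(\nu _t\) solves the mean-field equation in the weak form \(\tfrac{\textrm{d}}{\textrm{d}t}\langle f,\nu _t\rangle =\langle f,V[\nu _t]\rangle \) for all \(f\in C_b(\mathcal {T})\), and write \(f_0:=1\) and \(\partial _i g\), \(i=0,\dots ,m\), for the partial derivatives of g (notation used only here). For \(F(\nu )=g\left( \langle f_0,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) \) the chain rule gives

$$\begin{aligned} \frac{\textrm{d}\,}{\textrm{d}t} F(\nu _t)=\sum _{i=0}^m (\partial _i g)\left( \langle f_0,\nu _t\rangle ,\dots ,\langle f_m,\nu _t\rangle \right) \langle f_i,V[\nu _t]\rangle =\int _{\mathcal {T}}(\textrm{grad}_{\Gamma }F)(\nu _t,x)\, V[\nu _t](\textrm{d}x)=(Q_{\infty } F)(\nu _t). \end{aligned}$$

Integrating from s to t yields

$$\begin{aligned} F(\nu _t)-F(\nu _s)=\int _s^t (Q_{\infty } F)(\nu _r)\,\textrm{d}r, \end{aligned}$$

which is exactly (4.1) for \(\textsf{P}_r=\delta _{\nu _r}\). Narrow continuity of \(t\mapsto \delta _{\nu _t}\) is inherited from the continuity of \(t\mapsto \nu _t\).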

Instead of the solution to (Li), we will now consider arbitrary curves satisfying

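[Displayed equation: the continuity equation \(\textsf{CE}_{\infty }\) for \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\); the display was not recoverable from the source.]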

in the following appropriate distributional sense.

Definition 4.3

(Continuity equation)  A triple \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) satisfies the continuity equation \(\textsf{CE}_{\infty }\), if

  1. the curve \([0,T]\ni t\mapsto \textsf{P}_t\in \mathcal {P}(\Gamma )\) is narrowly continuous,

  2. the Borel family \((\textsf{J}^{\pm }_t)_{t\in [0,T]}\subset \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) satisfies

    $$\begin{aligned} \int _0^T \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}\textsf{J}^{\pm }_{t} \, \textrm{d}t<\infty , \end{aligned}$$

  3. for every \(s,t\in [0,T]\) and all \(F\in \textrm{Cyl}_c(\Gamma )\)

    $$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s = \int _s^t \int _{\Gamma \times \mathcal {T}} \textrm{grad}_{\Gamma }F \,(\textrm{d}\textsf{J}_r^+-\textrm{d}\textsf{J}_r^-) \, \textrm{d}r. \end{aligned}$$

Moreover, let us introduce the EDP-functional. Recall from Sect. 3 the notation \(\vartheta _{\textsf{P}}^{\pm }(\textrm{d}\nu ,\textrm{d}x):=\kappa ^{\pm }[\nu ](\textrm{d}x)\textsf{P}(\textrm{d}\nu )\).

Definition 4.4

Let \(\Theta ^{\infty }_{\textsf{P}}\in \mathcal {M}_{loc}(\Gamma \times \mathcal {T})\) be the geometric average of \(\vartheta ^{+}_{\textsf{P}}\) and \(\vartheta ^{-}_{\textsf{P}}\), i.e.

$$\begin{aligned} \Theta ^{\infty }_{\textsf{P}}(\textrm{d}\nu ,\textrm{d}x):=\sqrt{\frac{\textrm{d}\vartheta ^{+}_{\textsf{P}}}{\textrm{d}\Sigma }\frac{\textrm{d}\vartheta ^{-}_{\textsf{P}}}{\textrm{d}\Sigma }}\, \,\textrm{d}\Sigma , \end{aligned}$$

for any dominating measure \(\Sigma \). We define the following objects:

  • The dissipation potential \(\mathcal {R}_{\infty }:\mathcal {P}(\Gamma )\times \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})^2\rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\mathcal {E}\textrm{nt}(\textsf{J}^{+}|\Theta ^{\infty }_{\textsf{P}})+\mathcal {E}\textrm{nt}(\textsf{J}^{-}|\Theta ^{\infty }_{\textsf{P}}). \end{aligned}$$
  • The dual dissipation potential \(\mathcal {R}^*_{\infty }:\mathcal {P}(\Gamma )\times \mathcal {B}_c(\Gamma \times \mathcal {T})^2\rightarrow {\mathbb {R}}\),

    $$\begin{aligned} \mathcal {R}_{\infty }^*(\textsf{P},\omega ^+,\omega ^-):=\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{+}}-1)\, \textrm{d}\Theta ^{\infty }_{\textsf{P}}+\int _{\Gamma \times \mathcal {T}} (e^{\omega ^{-}}-1)\, \textrm{d}\Theta ^{\infty }_{\textsf{P}}. \end{aligned}$$
  • The free energy \(\mathcal {F}_{\infty }:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {F}_{\infty }(\textsf{P}):=\int _{\Gamma } \mathcal {F}_{MF}(\nu )\, \textsf{P}(\textrm{d}\nu ). \end{aligned}$$
  • The Fisher information \(\mathcal {D}_{\infty }:\mathcal {P}(\Gamma )\rightarrow [0,+\infty ]\),

    $$\begin{aligned} \mathcal {D}_{\infty }(\textsf{P}):=\int _{\Gamma } \mathcal {D}_{MF}(\nu ) \, \textsf{P}(\textrm{d}\nu ). \end{aligned}$$
  • The EDP-functional \(\mathcal {I}_{\infty }:\textsf{CE}_{\infty }\rightarrow [0,+\infty ]\) for all curves with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \),

    $$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-):=\int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}_t^+,\textsf{J}^-_t) \, \textrm{d}t + \mathcal {F}_\infty ({\textsf{P}_T})-\mathcal {F}_\infty ({\textsf{P}_0})+\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t) \, \textrm{d}t. \end{aligned}$$
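The pair \(\mathcal {R}_{\infty }\), \(\mathcal {R}^*_{\infty }\) is related by a Legendre transform. The following is a formal sketch under an assumption that is not restated in this section, namely that the relative entropy has the form \(\mathcal {E}\textrm{nt}(\textsf{J}|\Theta )=\int \phi (\textrm{d}\textsf{J}/\textrm{d}\Theta )\,\textrm{d}\Theta \) with \(\phi (s)=s\log s-s+1\). Pointwise, the supremum of \(s\omega -\phi (s)\) over \(s\ge 0\) is attained at \(s=e^{\omega }\), so that

$$\begin{aligned} \phi ^*(\omega )=\sup _{s\ge 0}\left\{ s\omega -\phi (s)\right\} =e^{\omega }\omega -\left( e^{\omega }\omega -e^{\omega }+1\right) =e^{\omega }-1. \end{aligned}$$

Integrating against \(\Theta ^{\infty }_{\textsf{P}}\) and (formally) exchanging supremum and integral then gives

$$\begin{aligned} \mathcal {R}_{\infty }^*(\textsf{P},\omega ^+,\omega ^-)=\sup _{\textsf{J}^+,\textsf{J}^-}\left\{ \int _{\Gamma \times \mathcal {T}} \omega ^+\, \textrm{d}\textsf{J}^+ +\int _{\Gamma \times \mathcal {T}} \omega ^-\, \textrm{d}\textsf{J}^- -\mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)\right\} . \end{aligned}$$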

Remark 4.5

Recall from Sect. 2 that \(\mathcal {F}_{MF}(\nu ):=\tfrac{1}{2}\mathcal {E}\textrm{nt}(\nu |\gamma )\) and

$$\begin{aligned} \mathcal {D}_{MF}(\nu ):=\left\{ \begin{aligned}&2H^2(\kappa _{\nu }^+,\kappa _{\nu }^-),{} & {} \qquad \hbox { if}\ \nu \ll \gamma ,\\&+\infty ,{} & {} \qquad \hbox {otherwise.} \end{aligned}\right. \end{aligned}$$

In particular, if \(\mathcal {F}_{\infty }(\textsf{P})<\infty \) we have

$$\begin{aligned} \mathcal {D}_{\infty }(\textsf{P})&=2 \int _{\Gamma } H^2(\kappa _{\nu }^+,\kappa _{\nu }^-) \, \textsf{P}(\textrm{d}\nu ) =2 H^2(\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-). \end{aligned}$$
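The second equality can be checked by disintegration. This is a sketch, assuming that \(H^2\) is defined through densities with respect to a dominating measure, \(H^2(\mu ,\eta )=\int (\sqrt{\textrm{d}\mu /\textrm{d}\sigma }-\sqrt{\textrm{d}\eta /\textrm{d}\sigma })^2\,\textrm{d}\sigma \) (a different normalising constant would not affect the argument), and that \(\sigma _{\nu }:=\kappa _{\nu }^++\kappa _{\nu }^-\) depends measurably on \(\nu \). With \(\Sigma (\textrm{d}\nu ,\textrm{d}x):=\sigma _{\nu }(\textrm{d}x)\textsf{P}(\textrm{d}\nu )\) one has \(\tfrac{\textrm{d}\vartheta ^{\pm }_{\textsf{P}}}{\textrm{d}\Sigma }(\nu ,x)=\tfrac{\textrm{d}\kappa ^{\pm }_{\nu }}{\textrm{d}\sigma _{\nu }}(x)\), and therefore

$$\begin{aligned} H^2(\vartheta _{\textsf{P}}^+,\vartheta _{\textsf{P}}^-)&=\int _{\Gamma }\int _{\mathcal {T}} \left( \sqrt{\frac{\textrm{d}\kappa ^{+}_{\nu }}{\textrm{d}\sigma _{\nu }}}-\sqrt{\frac{\textrm{d}\kappa ^{-}_{\nu }}{\textrm{d}\sigma _{\nu }}}\right) ^2 \sigma _{\nu }(\textrm{d}x)\, \textsf{P}(\textrm{d}\nu )\\&=\int _{\Gamma } H^2(\kappa _{\nu }^+,\kappa _{\nu }^-)\, \textsf{P}(\textrm{d}\nu ). \end{aligned}$$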

Remark 4.6

Note that \(\Theta _{\textsf{P}}^ {\infty }(\textrm{d}\nu ,\textrm{d}x)=\textsf{P}(\textrm{d}\nu ) \theta _{\nu }(\textrm{d}x)\). Moreover, if \(\mathcal {E}\textrm{nt}(\textsf{J}_t^{\pm }|\Theta _{\textsf{P}_t}^{\infty })\) is finite, we can set

$$\begin{aligned} \lambda ^{\pm }_{t}[\nu ](\textrm{d}x):=\frac{\textrm{d}\textsf{J}_t^{\pm }}{\textrm{d}\Theta _{\textsf{P}_t}^ {\infty }}(\nu ,x)\,\theta _{\nu }(\textrm{d}x), \end{aligned}$$

and it is straightforward to verify that we have the disintegration

$$\begin{aligned} \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x)=\lambda _t^{\pm }[\nu ](\textrm{d}x)\textsf{P}_t(\textrm{d}\nu ), \end{aligned}$$

and the equivalence

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}_t^\pm |\Theta ^{\infty }_{\textsf{P}_t})=\int _{\Gamma } \mathcal {E}\textrm{nt}(\lambda _{t}^\pm [\nu ]|\theta _{\nu })\, \textrm{d}\textsf{P}_t. \end{aligned}$$
(4.3)

Together with the definitions of \(\mathcal {F}_{\infty }\) and \(\mathcal {D}_{\infty }\) this implies that if \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)\) is finite then the \(\lambda _{t}^{\pm }[\nu ]\) are well-defined for a.e. \(t\in [0,T]\), and

$$\begin{aligned} {\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)}&= \int _{\Gamma } \mathcal {F}_{MF}(\nu ) \textsf{P}_T(\textrm{d}\nu )-\int _{\Gamma } \mathcal {F}_{MF}(\nu ) \textsf{P}_0(\textrm{d}\nu ) \\&+ \int _0^T \int _{\Gamma } \left( \mathcal {R}_{MF}(\nu ,\lambda ^+_{t}[\nu ],\lambda ^-_{t}[\nu ])+\mathcal {D}_{MF}(\nu )\right) \textsf{P}_t(\textrm{d}\nu ) \, \textrm{d}t. \end{aligned}$$

Throughout the rest of this section we will simply write \(\lambda _{t,\nu }^{\pm }=\lambda ^\pm _t[\nu ]\).
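For completeness, here is a short verification of (4.3), written as a sketch under the assumption that \(\mathcal {E}\textrm{nt}(\mu |\eta )=\int \phi (\textrm{d}\mu /\textrm{d}\eta )\,\textrm{d}\eta \) for a convex function \(\phi \) (the precise definition is given earlier in the paper and is not repeated here). Since \(\Theta ^{\infty }_{\textsf{P}_t}(\textrm{d}\nu ,\textrm{d}x)=\textsf{P}_t(\textrm{d}\nu )\theta _{\nu }(\textrm{d}x)\) and, by construction, \(\tfrac{\textrm{d}\lambda ^{\pm }_t[\nu ]}{\textrm{d}\theta _{\nu }}(x)=\tfrac{\textrm{d}\textsf{J}_t^{\pm }}{\textrm{d}\Theta ^{\infty }_{\textsf{P}_t}}(\nu ,x)\), we find

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}_t^{\pm }|\Theta ^{\infty }_{\textsf{P}_t})&=\int _{\Gamma \times \mathcal {T}} \phi \left( \frac{\textrm{d}\textsf{J}_t^{\pm }}{\textrm{d}\Theta ^{\infty }_{\textsf{P}_t}}\right) \textrm{d}\Theta ^{\infty }_{\textsf{P}_t} =\int _{\Gamma }\left( \int _{\mathcal {T}} \phi \left( \frac{\textrm{d}\lambda ^{\pm }_t[\nu ]}{\textrm{d}\theta _{\nu }}\right) \theta _{\nu }(\textrm{d}x)\right) \textsf{P}_t(\textrm{d}\nu )\\&=\int _{\Gamma } \mathcal {E}\textrm{nt}(\lambda _{t}^{\pm }[\nu ]|\theta _{\nu })\, \textsf{P}_t(\textrm{d}\nu ). \end{aligned}$$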

We will show the following equivalence, which subsumes Theorem 1.7.

Theorem 4.7

For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \), the EDP-functional \(\mathcal {I}_{\infty }\) is finite if and only if there exists a Borel probability measure Q over \(C([0,T];\Gamma )\) such that

  1. for the time-evaluations \(e_t\) we have \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\),

  2. the measure Q is concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) such that \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }) \in \mathscr{C}\mathscr{E}\), where \(\lambda _\nu ^\pm \) is defined via the disintegration

    $$\begin{aligned} \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x)=\lambda _{t,\nu }^{\pm }(\textrm{d}x)\textsf{P}_t(\textrm{d}\nu )\qquad \text {for a.e. } t\in [0,T], \end{aligned}$$

  3. we have the representation

    $$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)= \int \mathcal {I}_{MF}\left( \nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }\right) \textrm{d}Q, \end{aligned}$$

    with the latter term finite.

In particular, \(\mathcal {I}_{\infty }\ge 0\), and

$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0 \iff \left\{ \begin{aligned}&\textsf{P}_t \hbox { is the weak solution to } (\textsf{Li}) \hbox { with } \textsf{P}_t=(G_t)_{\#} \textsf{P}_0,\\&\textsf{J}^{\pm }_t=\textsf{P}_t \kappa _{\nu }^{\pm } \quad \hbox {for a.e. } t\in [0,T]. \end{aligned} \right. \end{aligned}$$

Here \(G_t:\Gamma \rightarrow \Gamma \) maps \({{\bar{\nu }}}\) to the unique mean-field solution \(\nu _t\) at time t, see Remark 4.2. It is determined by

$$\begin{aligned} \partial _t G_t[\nu ]=V[G_t[\nu ]]. \end{aligned}$$

We do not have a priori uniqueness of solutions to the Liouville equation. However, we do have uniqueness of weak solutions for which a superposition principle holds, in particular for curves with finite \(\mathcal {I}_{\infty }\). Therefore gradient-flow solutions (null-minimizers of \(\mathcal {I}_{\infty }\)) are unique.

In the case of \(\textsf{P}_t:=\delta _{\nu _t}\) with \(\nu _t\) the solution to the mean-field equation there is a trivial superposition principle, and we have the following consequence.

Corollary 4.8

Suppose \(\textsf{P}_0=\delta _{\nu _0}\) with \(\mathcal {F}_{MF}(\nu _0)<\infty \). Then

$$\begin{aligned} \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)=0 \iff \left\{ \begin{aligned}&\textsf{P}_t=\delta _{\nu _t},\quad \nu _t \hbox { is the unique strong solution to } (\textsf{MF}),\\&\textsf{J}^{\pm }_t=\textsf{P}_t \kappa _{\nu }^{\pm } \quad \hbox {for a.e. } t\in [0,T]. \end{aligned} \right. \end{aligned}$$

4.1 A priori estimates

Due to the representation (4.3) of the dissipation potential in terms of mean-field objects, we can directly derive the following estimates from Lemmas 2.10 and 2.13.

Corollary 4.9

Let \(\textsf{P}\in \mathcal {P}(\Gamma ),\textsf{J}^{\pm }\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\) be such that \(\mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-)<\infty \), and set

$$\begin{aligned} \lambda ^{\textrm{net}}_{\nu }:=\lambda ^{+}_{\nu }-\lambda ^{-}_{\nu }. \end{aligned}$$

Then the following estimates hold:

$$\begin{aligned} \int _{\Gamma } M\phi \left( \frac{\lambda _{\nu }^{\pm }(\mathcal {T})}{M(1+\nu (\mathcal {T}{})^2)}\vee 1\right) \textsf{P}(\textrm{d}\nu )&\le \mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-),\\ \int _{\Gamma } M\Psi \left( \frac{\Vert \lambda ^{\textrm{net}}_{\nu }\Vert _{TV}}{M(1+\nu (\mathcal {T}{}))}\right) \textsf{P}(\textrm{d}\nu )&\le \mathcal {R}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-). \end{aligned}$$

Moreover, the following equivalence follows straightforwardly from Lemma 3.13.

Corollary 4.10

For any \(\textsf{P}\in \mathcal {P}(\Gamma ),\textsf{J}^{\pm }\in \mathcal {M}^+_{loc}(\Gamma \times \mathcal {T})\)

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{J}^{\pm }|\Theta ^{\infty }_{\textsf{P}})=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}^{\pm }}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^+}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}}^-}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma , \end{aligned}$$

for any common dominating measure \(\Sigma \).

Finally, we consider the time regularity for arbitrary curves, with respect to the following metric.

Definition 4.11

We define the following metric:

$$\begin{aligned} W(\textsf{P}^1,\textsf{P}^2):=\sup _{F\in {\mathbb {F}}} \left\{ \int _{\Gamma } F \,\textrm{d}(\textsf{P}^1-\textsf{P}^2) \right\} ,\qquad \textsf{P}^1,\textsf{P}^2\in \mathcal {P}(\Gamma ), \end{aligned}$$
(4.4)

where

$$\begin{aligned} {{\mathbb {F}}:=\Bigl \{ F\in \textrm{Cyl}_c(\Gamma )\,: \, (1+\nu (\mathcal {T})^2)\left| (\textrm{grad}_{\Gamma }F)(\nu ,x)\right| \le 1, \text{ for } \text{ all } x\in \mathcal {T}, \nu \in \Gamma \Bigr \}.} \end{aligned}$$

Note that W is narrowly lower semicontinuous. Moreover, we have that

$$\begin{aligned} {\sup _{(\nu ,x)\in \Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)|(\textrm{grad}_{\Gamma }\,F)(\nu ,x)| <\infty \qquad \hbox { for any}\ F\in \textrm{Cyl}_c(\Gamma ),} \end{aligned}$$

and hence by a density argument, it is straightforward to verify that convergence in W implies vague convergence on \(\Gamma \), and therefore narrow convergence on narrowly pre-compact subsets.

Remark 4.12

Formally, one can represent W as a transport distance, in the sense that

$$\begin{aligned} W(\textsf{P}^1,\textsf{P}^2)=W_{d_{\Gamma }}(\textsf{P}^1,\textsf{P}^2), \end{aligned}$$

where \(W_{d_{\Gamma }}\) is the 1-Wasserstein metric on \(\mathcal {P}(\Gamma )\) induced by the metric \(d_{\Gamma }\) over \(\Gamma \) given by

$$\begin{aligned} d_{\Gamma }(\nu ^1,\nu ^2):=\inf _{(\nu _t)_{t\in [0,1]}} \left\{ \int _0^1 \frac{|{\dot{\nu }}_t|_{TV}}{1+\nu _t(\mathcal {T})^2}\, \textrm{d}t\,: \, \nu _0=\nu ^1, \, \nu _1=\nu ^2\right\} . \end{aligned}$$

However, we do not require such a representation in the present work.

Lemma 4.13

For any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) we have

$$\begin{aligned} W(\textsf{P}_s,\textsf{P}_t)\le 2 \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \textrm{d}(\textsf{J}_r^{+}+\textsf{J}^-_r)\, \textrm{d}r, \qquad \text{ for } \text{ all } s,t\in [0,T]. \end{aligned}$$

Proof

This follows directly from the continuity equation, since for any \(F\in {\mathbb {F}}\), \(s,t\in [0,T]\):

$$\begin{aligned} \left| \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s \right|&\le \int _s^t \int _{\Gamma \times \mathcal {T}} \left| (1+\nu (\mathcal {T})^2) \textrm{grad}_{\Gamma }F \right| (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}(\textsf{J}_r^++\textsf{J}_r^-) \, \textrm{d}r\\&\le \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textrm{d}(\textsf{J}_r^++\textsf{J}_r^-) \, \textrm{d}r. \end{aligned}$$

Taking the supremum over all \(F\in {\mathbb {F}}\) we obtain the desired statement. \(\square \)

4.2 Weak solutions

Here we briefly consider the existence and representations for solutions to the Liouville equation.

Lemma 4.14

For any \({{\bar{\textsf{P}}}}\in \mathcal {P}(\Gamma )\) there exists a weak solution \(\textsf{P}\) to (Li) with initial data \({{\bar{\textsf{P}}}}\).

Proof

Recall the flow \(G:[0,T]\times \Gamma \rightarrow \Gamma \) determined by

$$\begin{aligned} \partial _t G_t[\nu ]=V[G_t[\nu ]]. \end{aligned}$$

Set \(\textsf{P}_t:=(G_t)_{\#} {{\bar{\textsf{P}}}}\). We will show that \(\textsf{P}_t\) is a weak solution in the sense of (4.1). Namely, consider any \(F\in \textrm{Cyl}_c(\Gamma )\). Due to the strong regularity of solutions to the mean-field equation, it is straightforward to show that for all \(s,t\in [0,T]\) we have the chain rule

$$\begin{aligned} F(G_t[\nu ])-F(G_s[\nu ])=\int _s^t \int _{\mathcal {T}} (\textrm{grad}_{\Gamma } F)(G_r[\nu ],x) \, V[G_r[\nu ]](\textrm{d}x) \, \textrm{d}r, \end{aligned}$$

and hence

$$\begin{aligned} \int _{\Gamma } F \textrm{d}\textsf{P}_t-\int _{\Gamma } F \textrm{d}\textsf{P}_s&= \int _{\Gamma } \left( \int _s^t \int _{\mathcal {T}} (\textrm{grad}_{\Gamma } F)(G_r[\nu ],x) \, V[G_r[\nu ]](\textrm{d}x)\, \textrm{d}r\right) {{\bar{\textsf{P}}}}(\textrm{d}\nu )\\&=\int _s^t \int _{\Gamma \times \mathcal {T}}(\textrm{grad}_{\Gamma } F)(\nu ,x) V[\nu ](\textrm{d}x) \textsf{P}_r(\textrm{d}\nu ) \, \textrm{d}r, \end{aligned}$$

and thus \(\textsf{P}_t\) is indeed a weak solution. \(\square \)
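The passage from the chain rule to the integrated identity in this proof rests on the change-of-variables formula for push-forward measures. As a sketch, for any bounded Borel function \(\Phi :\Gamma \rightarrow {\mathbb {R}}\) and \(\textsf{P}_r=(G_r)_{\#}{{\bar{\textsf{P}}}}\),

$$\begin{aligned} \int _{\Gamma } \Phi (\nu )\, \textsf{P}_r(\textrm{d}\nu )=\int _{\Gamma } \Phi (G_r[\nu ])\, {{\bar{\textsf{P}}}}(\textrm{d}\nu ). \end{aligned}$$

It is applied once with \(\Phi =F\), at times t and s, and once with \(\Phi (\nu )=\int _{\mathcal {T}}(\textrm{grad}_{\Gamma }F)(\nu ,x)\,V[\nu ](\textrm{d}x)\), followed by an exchange of the integrals in r and \(\nu \). The latter requires integrability of this \(\Phi \), which we take for granted here.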

4.3 Superposition principle

One of our main tools in proving the chain rule, uniqueness of solutions, and the variational representation of Theorem 4.7 is the superposition principle. It guarantees that we can represent the action as an expectation of the mean-field action under some measure over curves in \(\mathscr{C}\mathscr{E}\), and allows us to use the theory on mean-field dynamics of Sect. 2. In this section, we will make this notion precise.

Theorem 4.15

Let \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with

$$\begin{aligned} \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \, \textrm{d}t < \infty . \end{aligned}$$

Then there exists a Borel probability measure \(Q\in \mathcal {P}(C([0,T];\Gamma ))\) satisfying \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\), and concentrated on curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\), for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu }) \in \mathscr{C}\mathscr{E}\). Moreover,

$$\begin{aligned} \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)\, \textrm{d}t = \int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \, \textrm{d}t \right) Q(\textrm{d}\nu ). \end{aligned}$$
(4.5)

Conversely, if there is a Borel probability measure \(Q\in \mathcal {P}(C([0,T];\Gamma ))\) concentrated on curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) and a Borel family \(\{\lambda ^{\pm }_{t,\nu }\}\), for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu })\in \mathscr{C}\mathscr{E}\), with

$$\begin{aligned} \int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \, \textrm{d}t \right) Q(\textrm{d}\nu )<\infty , \end{aligned}$$

then \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in {\textsf{CE}_{\infty }}\) for \(\textsf{P}_t:=(e_t)_{\#}Q\), \(\textsf{J}_t^{\pm }:=\textsf{P}_t \lambda ^{\pm }_{t,\nu }\), and (4.5) holds as well.

The inspiration for using a superposition principle stems from similar approaches in [11, 12], where it is applied to transport equations lifted from the Boltzmann equation or mean-field jump dynamics respectively, and the main ingredient is the abstract superposition principle over \({\mathbb {R}}^{\mathbb {N}}\) of [2]. However, these results are not directly applicable to our setting, since the mass \(\nu _t(\mathcal {T})\) of a mean-field curve is not fixed, and \(V[\nu ](\mathcal {T})\) is finite but unbounded over \(\Gamma \). We remedy this by combining two known superposition principles: on the one hand, the abstract superposition principle over \({\mathbb {R}}^{\mathbb {N}}\) of [2], and on the other hand one for finite-dimensional vector fields with linear growth, found in [1]. Our result is stated in Theorem B.1.

Proof

Consider any \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) with finite \(\mathcal {R}_{\infty }\), and for a.e. \(t\in [0,T]\) set \(\lambda ^{\textrm{net}}_{t,\nu }:=\lambda ^+_{t,{\nu }}-\lambda ^-_{t,{\nu }}\). By Corollary 4.9,

$$\begin{aligned} \int _{\Gamma } M\Psi \left( \frac{\Vert \lambda ^{\textrm{net}}_{t,\nu }\Vert _{TV}}{M(1+\nu (\mathcal {T}))}\right) \textsf{P}_t(\textrm{d}\nu ) \le \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,{\textsf{J}_t^-}). \end{aligned}$$
(4.6)

Now, take a countable and dense set \(f_1,f_2,\ldots \in C_b(\mathcal {T})\), with \(f_1=1\), \(\Vert f_i\Vert _{\infty }\le 1\), \(i\ge 2\), and define \({\mathbb {T}}:\Gamma \rightarrow {\mathbb {R}}^{{\mathbb {N}}}\)

$$\begin{aligned} {\mathbb {T}}(\nu ):=\left( \int _{\mathcal {T}} f_1 \,\textrm{d}\nu , \int _{\mathcal {T}} f_2 \,\textrm{d}\nu , \ldots \right) . \end{aligned}$$

Note that \({\mathbb {T}}\) is injective, continuous when \(\Gamma \) is equipped with the narrow topology and \({\mathbb {R}}^{{\mathbb {N}}}\) with the product topology, and is an isometry between \((\Gamma ,\Vert \cdot \Vert _{TV})\) and \(({\mathbb {T}}(\Gamma ),|\cdot |_{\infty })\), where \(|\cdot |_{\infty }\) is the uniform norm over \({\mathbb {R}}^{{{\mathbb {N}}}}\). We set \(\sigma _t:={\mathbb {T}}_{\#}\textsf{P}_t \in \mathcal {P}({\mathbb {R}}^{{\mathbb {N}}})\), and for a.e. \(t\in [0,T]\) define the vector field \({\textbf{W}}_t:{\mathbb {R}}^{{{\mathbb {N}}}}\rightarrow {\mathbb {R}}^{{{\mathbb {N}}}}\) via its components

$$\begin{aligned} W_i(t,z):=\int _{\mathcal {T}} f_i(x) \,\lambda ^{{\textrm{net}}}_{t,{\mathbb {T}}^{-1}(z)}(\textrm{d}x). \end{aligned}$$

Note that the support of \({\textbf{W}}_t\) is in \({\mathbb {T}}(\Gamma )\), that \(|{\textbf{W}}_t(z)|_{\infty }\le \Vert \lambda ^{{\textrm{net}}}_{t,{\mathbb {T}}^{-1}(z)}\Vert _{TV}\) and \(({\mathbb {T}}(\nu ))_1=\nu (\mathcal {T})\). Therefore, by (4.6) we have the estimate

$$\begin{aligned} \int _0^T \int _{{\mathbb {R}}^{{\mathbb {N}}}} M \Psi \left( \frac{|{\textbf{W}}_t(z)|_{\infty }}{M(1+|z_1|)}\right) \sigma _{{t}}(\textrm{d}z)\, \textrm{d}t \le \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t)\, \textrm{d}t < \infty . \end{aligned}$$
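The two ingredients of this estimate can be spelled out as follows. This is a sketch, assuming that \(\Vert \cdot \Vert _{TV}\) denotes the total mass of the variation measure and that \(\Psi \) is non-decreasing on \([0,\infty )\). For \(z={\mathbb {T}}(\nu )\) and every i, using \(\Vert f_i\Vert _{\infty }\le 1\),

$$\begin{aligned} |W_i(t,z)|=\left| \int _{\mathcal {T}} f_i \,\textrm{d}\lambda ^{\textrm{net}}_{t,\nu }\right| \le \Vert f_i\Vert _{\infty }\, \Vert \lambda ^{\textrm{net}}_{t,\nu }\Vert _{TV}\le \Vert \lambda ^{\textrm{net}}_{t,\nu }\Vert _{TV}. \end{aligned}$$

Since \(\sigma _t={\mathbb {T}}_{\#}\textsf{P}_t\) and \(z_1=\nu (\mathcal {T})\ge 0\), a change of variables together with (4.6) then gives, for a.e. t,

$$\begin{aligned} \int _{{\mathbb {R}}^{{\mathbb {N}}}} M \Psi \left( \frac{|{\textbf{W}}_t(z)|_{\infty }}{M(1+|z_1|)}\right) \sigma _{t}(\textrm{d}z)&=\int _{\Gamma } M \Psi \left( \frac{|{\textbf{W}}_t({\mathbb {T}}(\nu ))|_{\infty }}{M(1+\nu (\mathcal {T}))}\right) \textsf{P}_t(\textrm{d}\nu )\\&\le \int _{\Gamma } M\Psi \left( \frac{\Vert \lambda ^{\textrm{net}}_{t,\nu }\Vert _{TV}}{M(1+\nu (\mathcal {T}))}\right) \textsf{P}_t(\textrm{d}\nu ) \le \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t). \end{aligned}$$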

Moreover, \((\sigma ,{\textbf{W}})\) satisfy the continuity equation, in the sense that for all \(g\in \textrm{Cyl}_c({\mathbb {R}}^{\mathbb {N}})\), we have

$$\begin{aligned} \int _{{\mathbb {R}}^{\mathbb {N}}} g\, \textrm{d}\sigma _t - \int _{{\mathbb {R}}^{\mathbb {N}}} g\, \textrm{d}\sigma _s = \int _s^t\int _{{\mathbb {R}}^{\mathbb {N}}} ({\textbf{W}}_r,\nabla g)\, \textrm{d}\sigma _r\,\textrm{d}r\qquad \text {for every } s,t \in [0,T]. \end{aligned}$$

Indeed, take any \(g\in \textrm{Cyl}_c({\mathbb {R}}^{\mathbb {N}})\) and define \(F:=g\circ {\mathbb {T}}\), i.e.

$$\begin{aligned} F(\nu )=g\left( \langle f_1,\nu \rangle ,\dots ,\langle f_m,\nu \rangle \right) . \end{aligned}$$

Note that \(F\in \textrm{Cyl}_c(\Gamma )\), and therefore since \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\),

$$\begin{aligned} \int _{{\mathbb {R}}^{{\mathbb {N}}}} g(z)\, \sigma _t(\textrm{d}z)-\int _{{\mathbb {R}}^{{\mathbb {N}}}} g(z)\, \sigma _s(\textrm{d}z)&=\int _{\Gamma } F \textrm{d}\textsf{P}_t-\int _{\Gamma } F \textrm{d}\textsf{P}_s \\&= \int _s^t \int _{\Gamma \times \mathcal {T}} (\textrm{grad}_{\Gamma }\,F)(\nu ,x)(\textsf{J}^+_r-\textsf{J}^-_r)(\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}r\\&=\int _s^t \int _{\Gamma } \sum _{i} (\partial _i g)({\mathbb {T}}(\nu )) \left( \int _{\mathcal {T}} f_i(x) \lambda ^{{\textrm{net}}}_{r,\nu }(\textrm{d}x)\right) \textsf{P}_r(\textrm{d}\nu )\, \textrm{d}r\\&=\int _s^t \int _{{\mathbb {R}}^{{\mathbb {N}}}} \nabla g(z) \cdot {\textbf{W}}_r(z)\, \sigma _r(\textrm{d}z) \, \textrm{d}r. \end{aligned}$$

Thus, we are now in a position to apply Theorem B.1, and obtain a Borel probability measure \(\Omega \) over \(C([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) satisfying \((e_t)_{\#} \Omega =\sigma _t\) for all \(t\in [0,T]\), and which is concentrated on the family of curves \(z\in AC([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) that are solutions to the ODE

$$\begin{aligned} \dot{z}_t={\textbf{W}}_t(z_t)\qquad \text {for almost every } t\in [0,T]. \end{aligned}$$

Note that since \(\textrm{supp}(\sigma )\subseteq {\mathbb {T}}(\Gamma )\), we have \(\textrm{supp}(\Omega )\subseteq AC([0,T];{\mathbb {T}}(\Gamma ))\). Now let \(\tilde{{\mathbb {T}}}:C([0,T];\Gamma )\rightarrow C([0,T];{\mathbb {R}}^{{\mathbb {N}}})\) be defined via \((\tilde{{\mathbb {T}}}(\nu ))_t:={\mathbb {T}}(\nu _t)\). As for \({\mathbb {T}}\), the map \(\tilde{{\mathbb {T}}}\) is injective and an isometry when seen as a map \(\tilde{{\mathbb {T}}}:AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\rightarrow AC([0,T];({\mathbb {R}}^{{\mathbb {N}}},|\cdot |_{\infty }))\). Therefore, the measure \(Q:=\tilde{{\mathbb {T}}}^{-1}_{\#} \Omega \in \mathcal {P}(C([0,T];\Gamma ))\) is well defined, satisfies \(\textsf{P}_t=(e_t)_{\#}Q\), and is concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) for which

$$\begin{aligned} \int _{\mathcal {T}} f_i \, \textrm{d}\nu _t - \int _{\mathcal {T}} f_i \, \textrm{d}\nu _s = \int _s^t \int _{\mathcal {T}} f_i \, \textrm{d}(\lambda ^+_{r,\nu }-\lambda ^-_{r,\nu }) \, \textrm{d}r \qquad \hbox {for all } s,t\in [0,T],\ i\in {\mathbb {N}}. \end{aligned}$$

Moreover,

$$\begin{aligned} \int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \, \textrm{d}t \right) Q(\textrm{d}\nu )&=\int _0^T \int _{\Gamma } \mathcal {R}_{MF}(\nu ,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu })\, \textsf{P}_t(\textrm{d}\nu ) \, \textrm{d}t\\&=\int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \, \textrm{d}t, \end{aligned}$$

where the latter is finite by assumption, and hence, by Lemma 2.14, we deduce that \((\nu ,\lambda _{\nu }^{+},\lambda _{\nu }^{-})\in \mathscr{C}\mathscr{E}\) for Q-almost every \(\nu \).

The reverse statement can be derived straightforwardly and we omit the proof. \(\square \)

4.4 Variational characterization

Having all the ingredients at hand, we can now prove the variational characterization for the Liouville equation, namely Theorem 4.7.

Proof of Theorem 4.7

Suppose \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\) is such that \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) and \(\mathcal {I}_{\infty }<\infty \). Since \(\mathcal {F}_{\infty }\) is non-negative we have in particular that

$$\begin{aligned} \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,{\textsf{J}}_t^+,{\textsf{J}}_t^-)\, \textrm{d}t<\infty , \quad \mathcal {F}_{\infty }(\textsf{P}_T)<\infty , \quad \int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t <\infty . \end{aligned}$$

Hence, from the superposition principle of Theorem 4.15, we obtain a Borel probability measure Q over \(C([0,T];\Gamma )\) satisfying \((e_t)_{\#}Q=\textsf{P}_t\) for all \(t\in [0,T]\) and concentrated on the family of curves \(\nu \in AC([0,T];(\Gamma ,\Vert \cdot \Vert _{TV}))\) for which \((\nu ,\lambda ^+_{\nu },\lambda ^-_{\nu })\in \mathscr{C}\mathscr{E}\). Moreover,

$$\begin{aligned} \int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \textrm{d}t \right) Q(\textrm{d}\nu )=\int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^+_t,\textsf{J}^-_t) \, \textrm{d}t < \infty . \end{aligned}$$

Since \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \) we have that for Q-a.e. curve \(\mathcal {F}_{MF}(\nu _0)<\infty \). Moreover, since both \(\mathcal {F}_{\infty }\) and \(\mathcal {D}_{\infty }\) are simply their mean-field counterparts integrated by \(\textsf{P}\), we find

$$\begin{aligned}&\int _{C([0,T];\Gamma )} \mathcal {I}_{MF}\left( \nu ,\lambda ^+_{\nu },\lambda _{\nu }^-\right) Q(\textrm{d}\nu ) \\&\, =\int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \textrm{d}t+\mathcal {F}_{MF}(\nu _T)-\mathcal {F}_{MF}(\nu _0)+\int _0^T \mathcal {D}_{MF}(\nu _t) \, \textrm{d}t\right) Q(\textrm{d}\nu ) \\&\, =\int _{C([0,T];\Gamma )} \left( \int _0^T \mathcal {R}_{MF}\left( \nu _t,\lambda ^+_{t,\nu },\lambda ^-_{t,\nu }\right) \textrm{d}t \right) Q(\textrm{d}\nu ) + \mathcal {F}_{\infty }(\textsf{P}_T)-\mathcal {F}_{\infty }(\textsf{P}_0)+\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t\\&\, = \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^+,\textsf{J}^-), \end{aligned}$$

where the second equality follows from Fubini-Tonelli and the fact that \(\mathcal {R}_{MF},\mathcal {D}_{MF},\mathcal {F}_{MF}\ge 0\) and \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \). In particular, by the non-negativity of \(\mathcal {I}_{MF}\) it holds that \(\mathcal {I}_{\infty }\ge 0\).
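In more detail, the energy and Fisher-information terms are handled with the marginal condition \((e_t)_{\#}Q=\textsf{P}_t\) alone: for the end-point energy,

$$\begin{aligned} \int _{C([0,T];\Gamma )} \mathcal {F}_{MF}(\nu _T)\, Q(\textrm{d}\nu )=\int _{\Gamma } \mathcal {F}_{MF}\, \textrm{d}\bigl ((e_T)_{\#}Q\bigr )=\int _{\Gamma } \mathcal {F}_{MF}(\nu )\, \textsf{P}_T(\textrm{d}\nu )=\mathcal {F}_{\infty }(\textsf{P}_T), \end{aligned}$$

and likewise at time 0, while for the non-negative integrand \(\mathcal {D}_{MF}\) Tonelli's theorem gives

$$\begin{aligned} \int _{C([0,T];\Gamma )} \int _0^T \mathcal {D}_{MF}(\nu _t)\, \textrm{d}t\, Q(\textrm{d}\nu )=\int _0^T \int _{\Gamma } \mathcal {D}_{MF}(\nu )\, \textsf{P}_t(\textrm{d}\nu )\, \textrm{d}t=\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t. \end{aligned}$$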

Moreover, since \(\mathcal {I}_{MF}=0\) if and only if \(\nu \) is the unique strong solution for an initial datum \({{\bar{\nu }}}\) with \(\mathcal {E}\textrm{nt}(\bar{\nu }|\gamma )<\infty \), we derive by non-negativity of \(\mathcal {I}_{MF}\) that \(\mathcal {I}_{\infty }=0\) if and only if Q is concentrated on the unique solutions of the mean-field equation. In this case, Q is characterized by

$$\begin{aligned} Q={\tilde{G}}_{\#} \textsf{P}_0, \end{aligned}$$

where \(G_t:\Gamma \rightarrow \Gamma \) defined by (4.2) maps any \({{\bar{\nu }}}\) to the unique solution to (\(\mathsf MF\)) for initial condition \(\nu _0={{\bar{\nu }}}\) and \({\tilde{G}}:\Gamma \rightarrow C([0,T],\Gamma )\) is defined via \(({\tilde{G}}(\nu _0))_t:=G_t(\nu _0)\). Note that \(\textsf{P}_t=(G_t)_{\#}\textsf{P}_0\), \(\textsf{J}_t^{\pm }=\textsf{P}_t \kappa _{\nu }^{\pm }\) for almost every \(t\in [0,T]\), and in particular \(\textsf{P}_t\) is a weak solution to (Li).

Vice versa, if \(\textsf{P}\) is a weak solution such that \(\textsf{P}_t=(G_{t})_{\#} \textsf{P}_0\), we simply set

$$\begin{aligned} Q:={\tilde{G}}_{\#} \textsf{P}_0, \qquad \lambda _{\nu }^{\pm }:=\kappa _{\nu }^{\pm } \quad \text{ for } \text{ all } t\in [0,T]. \end{aligned}$$

Since \(\mathcal {F}_{\infty }(\textsf{P}_0)<\infty \), we still have \(\mathcal {E}\textrm{nt}(\nu |\gamma )<\infty \) for \(\textsf{P}_0\)-almost every \(\nu \), and we repeat the same calculations to conclude that indeed \(\mathcal {I}_{\infty }=0\). \(\square \)

5 EDP-convergence

In the previous sections, we have established variational formulations for the solution to the forward Kolmogorov equation of the interacting particle system, for the solutions to the mean-field equation, and for the corresponding Liouville equation. Moreover, for the latter, we have shown how the corresponding EDP-functional can be represented as the expectation over a functional of mean-field paths.

We are now in a position to rigorously discuss the convergence of the forward Kolmogorov equation to the Liouville equation, in terms of the EDP-convergence of their gradient structures. Namely, we say that a sequence of curves \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) converges to a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\), and write \(\lim _{n\rightarrow \infty }(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-}) = (\textsf{P},\textsf{J}^+,\textsf{J}^-)\), if the following holds:

  • \(\textsf{P}_t^n \rightarrow \textsf{P}_t\) narrowly for all \(t\in [0,T]\),

  • \(\textsf{J}_t^{n,\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}t \rightarrow \textsf{J}_t^{\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}t\) vaguely on \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T}\times [0,T])\).

Theorem 5.1

Suppose that a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\), \(n\ge 1\), is such that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)<\infty , \qquad \limsup _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})<\infty . \end{aligned}$$

Then the family of curves \(\{(\textsf{P}^n_t)_{t\in [0,T]}\}_{n}\) is equicontinuous with respect to the metric W defined in (4.4), and there exists a (not relabelled) subsequence \((\textsf{P}^{n},\textsf{J}^{n,+},\textsf{J}^{n,-})\) and a \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } (\textsf{P}^{n},\textsf{J}^{n,+},\textsf{J}^{n,-}) = (\textsf{P},\textsf{J}^+,\textsf{J}^-). \end{aligned}$$

Moreover, for any such converging sequence

$$\begin{aligned} \begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {F}_{n}(\textsf{P}^{n}_t)&\ge \mathcal {F}_{\infty }(\textsf{P}_t),\qquad \text{ for } \text{ all } t\in [0,T],\\ \liminf _{n\rightarrow \infty } \int _0^T \mathcal {R}_{n}(\textsf{P}_t^{n},\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t)\, \textrm{d}t&\ge \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^{+}_t,\textsf{J}^{-}_t)\, \textrm{d}t, \\ \liminf _{n\rightarrow \infty } \int _0^T \mathcal {D}_{n}(\textsf{P}_t^{n}) \, \textrm{d}t&\ge \int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t. \end{aligned} \end{aligned}$$
(5.1)

Remark 5.2

The compactness result is slightly stronger. As shown in the proof of Theorem 5.1 the measures \(\textsf{J}_r^{n,\pm }(\textrm{d}\nu ,\textrm{d}x) \, \textrm{d}r\) converge vaguely on \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T}\times [s,t])\) for any \(s,t\in [0,T]\).

Note that if in addition, the initial data is well-prepared, in the sense that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)=\mathcal {F}_{\infty }(\textsf{P}_0), \end{aligned}$$

then for any converging subsequence, we have the liminf-estimate

$$\begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {I}_{n}(\textsf{P}^{n},\textsf{J}^{n,+},\textsf{J}^{n,-})\ge \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-}), \end{aligned}$$
(5.2)

or in other words, we obtain evolutionary \(\varGamma \)-convergence of \(\mathcal {I}_n\) to \(\mathcal {I}_{\infty }\).
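The estimate (5.2) is obtained by combining the three bounds in (5.1) at time \(t=T\) with the convergence of the initial energies. As a sketch, assuming that \(\mathcal {I}_n\) has the same structure as \(\mathcal {I}_{\infty }\) in Definition 4.4 (its definition is given in Sect. 3 and not repeated here), superadditivity of the limit inferior yields

$$\begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {I}_{n}(\textsf{P}^{n},\textsf{J}^{n,+},\textsf{J}^{n,-})&\ge \liminf _{n\rightarrow \infty } \int _0^T \mathcal {R}_{n}(\textsf{P}_t^{n},\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t)\, \textrm{d}t+\liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_T^n)\\&\quad -\lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)+\liminf _{n\rightarrow \infty } \int _0^T \mathcal {D}_{n}(\textsf{P}_t^{n})\, \textrm{d}t\\&\ge \int _0^T \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^{+}_t,\textsf{J}^{-}_t)\, \textrm{d}t+\mathcal {F}_{\infty }(\textsf{P}_T)-\mathcal {F}_{\infty }(\textsf{P}_0)+\int _0^T \mathcal {D}_{\infty }(\textsf{P}_t)\, \textrm{d}t\\&=\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-}). \end{aligned}$$

The well-preparedness of the initial data is what allows the term with the negative sign to pass to the limit.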

Now, recall by Theorem 3.8 that unique gradient-flow solutions to the forward Kolmogorov equations (\(\mathsf FKE_n\)) exist, and similarly, gradient-flow solutions to the Liouville equation (Li) are unique by Theorem 4.7. Therefore, modifying classical arguments from [36, 37], we can directly conclude the following convergence for the sequence of solutions.

Theorem 5.3

Consider a converging sequence \(\mathcal {P}(\Gamma _n) \ni {{\bar{\textsf{P}}}}^n\rightarrow {{\bar{\textsf{P}}}} \in \mathcal {P}(\Gamma )\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n({{\bar{\textsf{P}}}}^n)=\mathcal {F}_{\infty }({{\bar{\textsf{P}}}}), \end{aligned}$$
(5.3)

and for each \(n\ge 0\) let \(\textsf{P}_t^n\) be the unique gradient-flow solution to (\(\mathsf FKE_n\)) with initial data \({{\bar{\textsf{P}}}}^n\). Then there exists a unique gradient-flow solution \(\textsf{P}\) to (Li) with initial data \({{\bar{\textsf{P}}}}\). Moreover, we have the convergence

$$\begin{aligned} \begin{aligned} \lim _{n\rightarrow \infty } (\textsf{P}^n,\vartheta _{\textsf{P}^n}^{+},\vartheta _{\textsf{P}^n}^{-})&=(\textsf{P},\vartheta _{\textsf{P}}^{+},\vartheta _{\textsf{P}}^{-})\\ \lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n_t)&=\mathcal {F}_{\infty }(\textsf{P}_t), \qquad \text{ for } \text{ all } t\in [0,T]. \end{aligned} \end{aligned}$$

Proof

Recall that \(\mathcal {I}_n(\textsf{P}^n,\vartheta _{\textsf{P}^n}^{+},\vartheta _{\textsf{P}^n}^{-})=0\) for all \(n\ge 0\). Therefore, by (5.3) and Theorem 5.1 we have for any subsequence indexed by \(n'\) converging to a \((\textsf{P},\textsf{J}^{+},\textsf{J}^-)\in \textsf{CE}_{\infty }\) that (5.2) holds, and hence

$$\begin{aligned} 0 = \liminf _{n'\rightarrow \infty } \mathcal {I}_{n{'}}(\textsf{P}^{n'},\vartheta _{\textsf{P}^{n'}}^{+},\vartheta _{\textsf{P}^{n'}}^{-}) \ge \mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-}), \end{aligned}$$

and thus \(\mathcal {I}_{\infty }(\textsf{P},\textsf{J}^{+},\textsf{J}^{-})=0\), which implies that \(\textsf{P}\) is the unique gradient-flow solution to (Li) and \(\textsf{J}_t^{\pm }=\vartheta _{\textsf{P}_t}^{\pm }\) for a.e. \(t\in [0,T]\). The convergence of \(\textsf{P}_t^n\) now follows from a compactness argument, and by lower semicontinuity, we conclude that for every \(t\in [0,T]\),

$$\begin{aligned} {\liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_t^n)}&{\ge \mathcal {F}_{\infty }(\textsf{P}_t)}, \\ \limsup _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_t^n)&= \lim _{n\rightarrow \infty }\mathcal {F}_n(\textsf{P}_0^n)-\liminf _{n\rightarrow \infty } \int _0^t \left( \mathcal {R}_n(\textsf{P}^n_r,\vartheta _{\textsf{P}^n_r}^{+},\vartheta _{\textsf{P}^n_r}^{-})+\mathcal {D}_n(\textsf{P}_r^n)\right) \, \textrm{d}r\\&{\le }\, \mathcal {F}_{\infty }(\textsf{P}_0)-\int _0^t \left( \mathcal {R}_{\infty }(\textsf{P}_r,\vartheta _{\textsf{P}_r}^{+},\vartheta _{\textsf{P}_r}^{-})+\mathcal {D}_{\infty }(\textsf{P}_r)\right) \, \textrm{d}r = \mathcal {F}_{\infty }(\textsf{P}_t), \end{aligned}$$

and therefore \(\lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_t^n)=\mathcal {F}_{\infty }(\textsf{P}_t)\). \(\square \)

Now suppose that in addition, the initial sequence of measures \({\bar{\textsf{P}}}^n\) is chaotic, in the sense that

$$\begin{aligned}{{\bar{\textsf{P}}}}^n \rightarrow \delta _{{{\bar{\nu }}}}\quad \text {narrowly for some } {{\bar{\nu }}} \in \Gamma .\end{aligned}$$

Then, as a consequence of Theorem 5.3, we have the propagation of chaos result, namely

$$\begin{aligned}\textsf{P}^n_t \rightarrow \delta _{\nu _t} \quad \text {narrowly} \text{ for } \text{ all } t\in [0,T],\end{aligned}$$

where \(\nu _t\) is the unique solution to the mean-field equation (2.13) with initial datum \({{\bar{\nu }}}\). As mentioned in the introduction, for interacting particle systems with the number of particles fixed at \(n\in {\mathbb {N}}\), this would imply narrow convergence of the k-marginals at time t to \(\nu _t^{\otimes k}\) (e.g. see [38]); in our setting, it implies convergence of the k-correlation functions [4].

Moreover, note that we have a stronger notion of convergence since the free energies \(\mathcal {F}_n\) converge as well. Under appropriate conditions on the initial datum \({{\bar{\nu }}}\), this guarantees a version of propagation of entropic chaoticity. Namely, for any \(\nu \) we define the rescaled Poisson measures

$$\begin{aligned} \Pi _{n,\nu }:=(L_n)_{\#} \pi _{n,\nu },\qquad \text {where}\qquad \pi _{n,\nu }:=\frac{1}{e^{n \nu (\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}\nu ^{\otimes N}. \end{aligned}$$

It is straightforward to check that \(\Pi _{n,\nu }\rightarrow \delta _{\nu }\) narrowly. We then have the following result.

Theorem 5.4

(Propagation of chaos) Consider the setting of Theorem 5.3 and assume additionally that \({{\bar{\textsf{P}}}}=\delta _{{{\bar{\nu }}}}\) for some \(\bar{\nu }\in \Gamma \) with \(\mathcal {E}\textrm{nt}({{\bar{\nu }}}|\gamma )<\infty \). Let \(\nu _t\) be the unique solution to (2.13) with initial datum \({{\bar{\nu }}}\). Then for all \(t\in [0,T]\),

$$\begin{aligned} \begin{aligned} \textsf{P}^n_t \rightarrow \delta _{\nu _t}\quad \text {narrowly},\qquad \text {and}\qquad \lim _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}_t^n|\Pi _n)&=\mathcal {E}\textrm{nt}(\nu _t|\gamma ). \end{aligned} \end{aligned}$$

If additionally there exists a constant \(C>1\) such that \(C^{-1}\le \textrm{d}{{\bar{\nu }}}/\textrm{d}\gamma \le C\) then

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n_t|\Pi _{n,\nu _t})=0, \qquad \text{ for } \text{ all } t\in [0,T].\end{aligned}$$

Theorems 5.1 and 5.4 are proved in Sect. 5.3. First, however, we show \(\varGamma \)-convergence of the free energies in Sect. 5.1 and establish the necessary estimates in Sect. 5.2.

5.1 \(\varGamma \)-convergence of \(\mathcal {F}_n\)

While only the liminf-estimates for the free energy \(\mathcal {F}_n\) are necessary for the proof of Theorem 5.1 and the convergence of solutions, we provide here the full \(\varGamma \)-convergence result. We rely strongly on the characterization of [26], which connects a large deviation principle with rate function I to the fact that

$$\begin{aligned} \mathop {\varGamma \text {-lim}}_{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}|\Pi ^n) = \int _{\Gamma } I(\nu )\, \textsf{P}(\textrm{d}\nu ), \end{aligned}$$

and provides useful sufficient conditions for both.

Recall in our setting that

$$\begin{aligned} \mathcal {F}_n(\textsf{P})=\frac{1}{2n}\mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n),\qquad \mathcal {F}_{\infty }(\textsf{P})=\frac{1}{2}\int _{\Gamma } \mathcal {E}\textrm{nt}(\nu |\gamma )\,\textsf{P}(\textrm{d}\nu ). \, \end{aligned}$$

We then have the following result, which we prove after Lemma 5.6 below.

Theorem 5.5

The family \(\{\mathcal {F}_n\}_{n\ge 1}\) is equicoercive and \(\varGamma \)-converges to \(\mathcal {F}_{\infty }\) in the sense that

  • for any converging sequence \(\textsf{P}^n\rightarrow \textsf{P}\in \mathcal {P}(\Gamma )\):

    $$\begin{aligned} \mathcal {F}_{\infty }(\textsf{P})\le \liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n), \end{aligned}$$
  • for any \(\textsf{P}\in \mathcal {P}(\Gamma )\) with \(\mathcal {F}_{\infty }(\textsf{P})<\infty \) there exists a sequence \(\textsf{P}^n\in \mathcal {P}(\Gamma )\) converging to \(\textsf{P}\) such that

    $$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n) = \mathcal {F}_{\infty }(\textsf{P}). \end{aligned}$$

By the results of [26, Theorems 3.4, 3.5] it is sufficient to merely show the corresponding bounds or limits for any \(\textsf{P}\) of the form \(\textsf{P}=\delta _{\nu }\) for some \(\nu \in \Gamma \). Because of this reduction, we can make use of the so-called cumulant generating functionals \(G_n\) given by

$$\begin{aligned} G_n(f):=\frac{1}{n}\log \int _{\Gamma } e^{n \langle f,\nu \rangle }\, \Pi _n(\textrm{d}\nu ), \end{aligned}$$

for any \(f\in \mathcal {B}_b(\mathcal {T})\), and their limit counterpart

$$\begin{aligned} G(f):=\int _{\mathcal {T}} (e^f-1)\, \textrm{d}\gamma . \end{aligned}$$

Note that by Legendre duality of the entropy functional, we have for all \(n> 0\) the inequality

$$\begin{aligned} \int _{\Gamma } \langle f,\nu \rangle \, \textrm{d}\textsf{P}\le \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n)+G_n(f), \quad \end{aligned}$$
(5.4)

and for the Legendre dual of G, we have

$$\begin{aligned}G^*(\nu ):=\sup _{f\in \mathcal {C}_b(\mathcal {T})} \bigl \{\langle f,\nu \rangle - G(f)\bigr \}=\mathcal {E}\textrm{nt}(\nu |\gamma ).\end{aligned}$$
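As a numerical sanity check of this duality, the following sketch evaluates the dual functional on a hypothetical three-point trait space, where every function is bounded and continuous. The weights `gamma` and `nu` are illustrative assumptions, taken strictly positive so that the first-order condition \(\nu = e^{f}\gamma \) has the solution \(f=\log (\textrm{d}\nu /\textrm{d}\gamma )\). The relative entropy is taken in the form \(\int \log (\textrm{d}\nu /\textrm{d}\gamma )\,\textrm{d}\nu -\nu (\mathcal {T})+\gamma (\mathcal {T})\).

```python
import math
import random

def G(f, gamma):
    """Limit functional G(f) = sum_x (e^{f(x)} - 1) * gamma(x) on a finite space."""
    return sum((math.exp(fx) - 1.0) * g for fx, g in zip(f, gamma))

def rel_entropy(nu, gamma):
    """Ent(nu|gamma) = sum_x [nu log(nu/gamma) - nu + gamma], positive weights only."""
    return sum(v * math.log(v / g) - v + g for v, g in zip(nu, gamma))

def dual_objective(f, nu, gamma):
    """The map f -> <f, nu> - G(f) whose supremum defines G^*(nu)."""
    return sum(fx * v for fx, v in zip(f, nu)) - G(f, gamma)

# Hypothetical data (assumptions for illustration, not from the text).
gamma = [0.5, 1.0, 1.5]
nu = [1.2, 0.3, 2.0]

# Candidate maximiser from the first-order condition nu(x) = e^{f(x)} gamma(x).
f_star = [math.log(v / g) for v, g in zip(nu, gamma)]
best = dual_objective(f_star, nu, gamma)

# The objective is concave in f, so random perturbations should not do better.
rng = random.Random(0)
perturbed = [
    dual_objective([fx + rng.uniform(-1.0, 1.0) for fx in f_star], nu, gamma)
    for _ in range(1000)
]

print(best, rel_entropy(nu, gamma), max(perturbed))
```

The first two printed values should agree up to rounding, and the third should not exceed them. This only illustrates the identity in a finite setting and is not a substitute for the general argument.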

We will first simplify \(G_n\) and show that it indeed converges to G.

Lemma 5.6

Let \(f\in \mathcal {B}_b(\mathcal {T})\). Then for each \(n>0\)

$$\begin{aligned} G_n(f)=\tfrac{1}{n}\log \frac{e^{n \int _{\mathcal {T}} e^f \textrm{d}\gamma }-1}{e^{n \gamma (\mathcal {T})}-1}. \end{aligned}$$

In particular

$$\begin{aligned} \lim _{n\rightarrow \infty } G_n(f)=G(f). \end{aligned}$$

Proof

Using the representation for the rescaled Poisson measure \(\Pi _n\) we have

$$\begin{aligned} \begin{aligned} \int _{\Gamma } e^{n \langle f,\nu \rangle } \Pi _n(\textrm{d}\nu )&=\frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N}{N!}\int _{\mathcal {T}^N} e^{\sum _{i=1}^N f(x_i)} \textrm{d}\gamma ^{\otimes N}\\&=\frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N\left( \int _{\mathcal {T}} e^f \textrm{d}\gamma \right) ^N}{N!} =\frac{e^{n \int _{\mathcal {T}} e^f \textrm{d}\gamma }-1}{e^{n \gamma (\mathcal {T})}-1}, \end{aligned} \end{aligned}$$

and after taking logarithms and dividing by n we obtain the desired statement. Moreover, recall that by assumption \(\gamma (\mathcal {T})>0\) and note that by the boundedness of f,

$$\begin{aligned} 0< \int _{\mathcal {T}} e^f \textrm{d}\gamma < \infty . \end{aligned}$$

Hence we can take the limit \(n\rightarrow \infty \) to deduce

$$\begin{aligned} \lim _{n\rightarrow \infty } G_n(f)&=\lim _{n\rightarrow \infty } \frac{1}{n}\log \left( e^{n \int _{\mathcal {T}} e^f \textrm{d}\gamma }-1\right) -\frac{1}{n}\log \left( e^{n \gamma (\mathcal {T})}-1\right) \\&=\int _{\mathcal {T}} e^f \textrm{d}\gamma -\gamma (\mathcal {T})=G(f), \end{aligned}$$

thereby concluding the proof. \(\square \)
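The statement of Lemma 5.6 can be checked numerically on a finite trait space. The sketch below compares the closed form of \(G_n\) with a truncated version of the series over particle numbers N appearing in the proof, and then tracks \(|G_n(f)-G(f)|\) as n grows. The weights `gamma`, the test function `f`, and the truncation level `terms` are assumptions chosen for illustration.

```python
import math

def log_expm1(a):
    """Numerically stable log(e^a - 1) for a > 0."""
    return a + math.log1p(-math.exp(-a))

def G_limit(f, gamma):
    """G(f) = sum_x (e^{f(x)} - 1) * gamma(x)."""
    return sum((math.exp(fx) - 1.0) * g for fx, g in zip(f, gamma))

def G_n_closed(f, gamma, n):
    """Closed form of G_n(f) stated in Lemma 5.6."""
    tilted = sum(math.exp(fx) * g for fx, g in zip(f, gamma))
    return (log_expm1(n * tilted) - log_expm1(n * sum(gamma))) / n

def G_n_series(f, gamma, n, terms=300):
    """Same quantity from the series over N >= 1, truncated after `terms` terms."""
    tilted = sum(math.exp(fx) * g for fx, g in zip(f, gamma))
    logs = [N * math.log(n * tilted) - math.lgamma(N + 1)
            for N in range(1, terms + 1)]
    top = max(logs)  # log-sum-exp to avoid overflow
    log_sum = top + math.log(sum(math.exp(v - top) for v in logs))
    return (log_sum - log_expm1(n * sum(gamma))) / n

# Hypothetical data (assumptions for illustration).
gamma = [0.5, 1.0, 1.5]
f = [0.3, -1.0, 0.7]

series_gap = [abs(G_n_series(f, gamma, n) - G_n_closed(f, gamma, n)) for n in (1, 5)]
errors = [abs(G_n_closed(f, gamma, n) - G_limit(f, gamma)) for n in (1, 10, 100, 1000)]
print(series_gap, errors)
```

In this example the discrepancy with the limit is already at rounding level for moderate n, consistent with the exponentially small corrections \(\tfrac{1}{n}\log (1-e^{-na})\) in the closed form.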

Next, we establish convergence for suitable linear functionals of \(\nu \). In “Appendix C”, we will even prove convergence for quadratic functionals if the mass \(\nu (\mathcal {T})\) is appropriately controlled.

Lemma 5.7

Suppose that the sequence \(\textsf{P}^n\) converges narrowly to \(\textsf{P}\) and

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n) < \infty . \end{aligned}$$

Then for any \(f\in B_b(\mathcal {T})\) it holds that

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _{\Gamma } \langle f,\nu \rangle \, \textrm{d}\textsf{P}^n = \int _{\Gamma } \langle f,\nu \rangle \, \textrm{d}\textsf{P}. \end{aligned}$$
(5.5)

Proof

First, let us consider \(f\in C_b(\mathcal {T})\), and introduce the functions \(F(\nu ):=\langle f,\nu \rangle \) and its truncation \(F_L(\nu ):=\alpha _{L}(\nu (\mathcal {T})) \langle f,\nu \rangle \), where \(\alpha _L(z):={{\bar{\alpha }}}(z-L)\) with \({{\bar{\alpha }}}\in C_b({\mathbb {R}})\) a continuous non-increasing function such that \(0\le \bar{\alpha }(z)\le 1\) for all z, \({{\bar{\alpha }}}(z)=1\) for \(z\le 0\), and \({{\bar{\alpha }}}(z)=0\) for all \(z\ge 1\).

Note that \(F_L(\nu )\uparrow F(\nu )\) as \(L \rightarrow \infty \) and that \(F_L\) is continuous and bounded for every \(L\ge 0\). Hence,

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _{\Gamma } F_L \, \textsf{P}^n(\textrm{d}\nu )=\int _{\Gamma } F_L\, \textrm{d}\textsf{P}, \qquad \lim _{L\rightarrow \infty } \int _{\Gamma } F_L\, \textrm{d}\textsf{P}=\int _{\Gamma } \langle f,\nu \rangle \, \textrm{d}\textsf{P}. \end{aligned}$$

We will show that

$$\begin{aligned} \limsup _{L\rightarrow \infty } \limsup _{n\rightarrow \infty } \frac{1}{n} \log \int _{\Gamma } e^{n \beta |F_L-F|}\, \textrm{d}\Pi _n=0, \qquad \text{ for } \text{ all } \beta \ge 0. \end{aligned}$$
(5.6)

From this we can obtain (5.5) since by duality,

$$\begin{aligned} \int _\Gamma |F_L-F|(\nu ) \,\textrm{d}\textsf{P}^n\le \frac{1}{\beta } \left( \mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)+\frac{1}{n} \log \int _{\Gamma } e^{n \beta |F_L-F|}\, \textrm{d}\Pi _n \right) ,\quad \text {for every } \beta ,L\ge 0. \end{aligned}$$

Taking successive limits \(n\rightarrow \infty \), \(L\rightarrow \infty \), and \(\beta \rightarrow \infty \), we deduce

$$\begin{aligned} \limsup _{L\rightarrow \infty } \limsup _{n\rightarrow \infty } \int |F_L-F|\, \textrm{d}\textsf{P}^n = 0, \end{aligned}$$

thus proving the desired equality (5.5).

Now, to establish (5.6), first note that \(|F_L-F|(\nu )\le |\alpha _L(N/n)-1|\langle |f|,\nu \rangle \), and therefore

$$\begin{aligned} \int _{\Gamma } e^{n \beta |F_L-F|}\, \textrm{d}\Pi _n&\le \frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N}{N!}\int _{\mathcal {T}^N} e^{\beta |\alpha _L(N/n)-1|\sum _{i=1}^N |f|(x_i)} \textrm{d}\gamma ^{\otimes N}\\&=\frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N}{N!}\left( \int _{\mathcal {T}} e^{\beta |\alpha _L(N/n)-1| |f|(x)} \gamma (\textrm{d}x)\right) ^N\\&\le \frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=1}^{\infty } \frac{n^N}{N!}\gamma (\mathcal {T})^N + \frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=\left\lfloor n L\right\rfloor }^{\infty } \frac{n^N}{N!}\left( \int _{\mathcal {T}} e^{\beta \Vert f\Vert _{\infty }} \gamma (\textrm{d}x)\right) ^N\\&= 1+ \frac{1}{e^{n \gamma (\mathcal {T})}-1} \sum _{N=\left\lfloor n L\right\rfloor }^{\infty } \frac{n^N}{N!}C_\beta ^N, \end{aligned}$$

with \(C_{\beta }:=e^{\beta \Vert f\Vert _{\infty }} \gamma (\mathcal {T})\). Suppose \(X_n\) is a Poisson variable with mean \(n C_\beta \). Then the second term in the previous estimate can be expressed as

$$\begin{aligned} \frac{1}{e^{n C_\beta }} \sum _{N=\left\lfloor n L\right\rfloor }^{\infty } \frac{n^N}{N!} C_\beta ^N = \textrm{Prob}\left( \tfrac{1}{n}X_n \ge \tfrac{1}{n}\left\lfloor n L\right\rfloor \right) . \end{aligned}$$

It is clear that \(\frac{1}{n}X_n\rightarrow C_{\beta }\) almost surely as \(n\rightarrow \infty \). Moreover, by elementary large deviation results, e.g. Cramér’s theorem [10, Theorem 2.2.3], it satisfies a large deviation principle with rate n and rate function \(I_\beta (z):=z \log (z/C_{\beta })-z+C_{\beta }\), which implies

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n} \log \textrm{Prob}\left( \tfrac{1}{n}X_n \ge a\right) \le -\inf _{z\ge a} I_\beta (z). \end{aligned}$$

Note that \(\inf _{z\ge a} I_\beta (z)=I_\beta (a)\) for \(a\ge C_{\beta }\), and hence for \(L\ge C_{\beta }\) we obtain the bound

$$\begin{aligned} \limsup _{n\rightarrow \infty } \frac{1}{n}\log \int _{\Gamma } e^{n \beta |F_L-F|}\, \textrm{d}\Pi _n&\le \limsup _{n\rightarrow \infty } \frac{1}{n}\log \max \left\{ 1,\frac{e^{n C_{\beta }}}{e^{n \gamma (\mathcal {T})}-1}e^{-n C_{\beta }} \sum _{N=\left\lfloor n L\right\rfloor }^{\infty } \frac{n^N}{N!}C_\beta ^N\right\} \\&\le \max \left\{ 0,(C_{\beta }-\gamma (\mathcal {T}))- I_\beta (L)\right\} . \end{aligned}$$

Letting \(L\rightarrow \infty \), we deduce

$$\begin{aligned} \limsup _{L\rightarrow \infty } \ \limsup _{n\rightarrow \infty } \frac{1}{n} \log \int _{\Gamma } e^{n \beta |F_L-F|}\, \textrm{d}\Pi _n\le \max \left\{ 0, (C_{\beta }-\gamma (\mathcal {T}))- \liminf _{L\rightarrow \infty } I_\beta (L)\right\} = 0.\end{aligned}$$

Finally, let us now consider \(f\in \mathcal {B}_b\). Using a similar density approach as above it is sufficient to show that there exists a sequence of bounded continuous functions \(f_k\), such that

$$\begin{aligned} \limsup _{k\rightarrow \infty } \limsup _{n\rightarrow \infty } \frac{1}{n} \log \int _{\Gamma } e^{n \beta \langle |f-f_k|,\nu \rangle }\, \textrm{d}\Pi _n=0, \qquad \text{ for } \text{ all } \beta >0, \end{aligned}$$

but, by Lemma 5.6, we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n} \log \int _{\Gamma } e^{n \beta \langle |f-f_k|,\nu \rangle } \, \textrm{d}\Pi _n = \int _{\mathcal {T}} \left( e^{\beta |f-f_k|}-1\right) \textrm{d}\gamma .\end{aligned}$$

Similar to density statements in \(L^p(\gamma )\), one can find a sequence such that the above integrals vanish as \(k\rightarrow \infty \), see for example [19, Theorem C.5]. \(\square \)
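The Poisson tail estimate used in the proof above can be illustrated numerically. The following sketch computes \(\tfrac{1}{n}\log \textrm{Prob}(\tfrac{1}{n}X_n\ge a)\) for a Poisson variable \(X_n\) with mean \(nC\) and compares it with \(-I(a)\), where \(I(z)=z\log (z/C)-z+C\). The values of `c` and `a` and the truncation `extra_terms` are assumptions for illustration. Since \(X_n\) is integer-valued, the event \(X_n\ge na\) is evaluated as \(X_n\ge \lceil na\rceil \).

```python
import math

def rate(z, c):
    """Poisson rate function I(z) = z log(z/c) - z + c."""
    return z * math.log(z / c) - z + c

def log_poisson_tail(mean, k_min, extra_terms=5000):
    """log Prob(X >= k_min) for X ~ Poisson(mean), summed in log-space.

    The series is truncated after `extra_terms` terms; for k_min > mean the
    terms decay geometrically, so the neglected remainder is negligible here.
    """
    logs = [-mean + k * math.log(mean) - math.lgamma(k + 1)
            for k in range(k_min, k_min + extra_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))

c, a = 2.0, 5.0  # hypothetical C_beta and threshold a >= C_beta
results = []
for n in (10, 100, 1000):
    scaled = log_poisson_tail(n * c, math.ceil(n * a)) / n
    results.append((n, scaled, -rate(a, c)))
    print(n, scaled, -rate(a, c))
```

For \(a\ge C\) the Chernoff bound gives \(\tfrac{1}{n}\log \textrm{Prob}(\tfrac{1}{n}X_n\ge a)\le -I(a)\) for every n, so the second printed column should stay below the third and approach it as n increases.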

Proof of Theorem 5.5

First, we will show that the family \(\{\mathcal {F}_n\}_{n\ge 1}\) is equicoercive, by establishing a first moment bound for \(\textsf{P}\) in terms of the mass \(\nu (\mathcal {T})\). Namely, setting \(f\equiv 1\) in (5.4) we have for any \(\textsf{P}\in \mathcal {P}(\Gamma )\), \(n\ge 1\), the inequality

$$\begin{aligned} \int _{\Gamma } \nu (\mathcal {T})\, \textrm{d}\textsf{P}&\le \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}|\Pi _n)+G_n(1) \le 2\mathcal {F}_n(\textsf{P})+\frac{1}{n}\log \frac{e^{n e \gamma (\mathcal {T}) }-1}{e^{n \gamma (\mathcal {T})}-1}, \end{aligned}$$

where the final term is bounded from above independently of \(\textsf{P}\).

Next, for the limit inferior, consider a narrowly converging sequence \(\textsf{P}^n\rightarrow \textsf{P}=\delta _{{{\bar{\nu }}}}\) for some \({{\bar{\nu }}}\in \Gamma \). Fix any \(f\in C_b(\mathcal {T})\). By duality, we have for every n,

$$\begin{aligned} \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)&\ge \int _{\Gamma }\langle f,\nu \rangle \,\textrm{d}\textsf{P}^n- \frac{1}{n} \log \int _{\Gamma } e^{n \langle f,\nu \rangle }\, \textrm{d}\Pi _n, \end{aligned}$$

and due to Lemmas 5.6 and 5.7,

$$\begin{aligned} \liminf _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)&\ge \liminf _{n\rightarrow \infty } \int _{\Gamma }\langle f,\nu \rangle \,\textrm{d}\textsf{P}^n- \frac{1}{n} \log \int _{\Gamma } e^{n \langle f,\nu \rangle }\, \textrm{d}\Pi _n \\&=\langle f,{{\bar{\nu }}}\rangle -G(f). \end{aligned}$$

Taking the supremum over all \(f\in C_b(\mathcal {T})\) we find

$$\begin{aligned} \mathcal {F}_{\infty }(\delta _{{{\bar{\nu }}}})=\frac{1}{2}\mathcal {E}\textrm{nt}(\bar{\nu }|\gamma )\le \liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n). \end{aligned}$$

Finally, consider an arbitrary \({{\bar{\nu }}}\in \Gamma \) with \(\mathcal {E}\textrm{nt}(\bar{\nu }|\gamma )<\infty \) and set \(\textsf{P}=\delta _{{{\bar{\nu }}}}\). We will construct a sequence of measures \(\textsf{P}^n\) that locally consists of Poisson measures induced by \({{\bar{\nu }}}\). Namely, set

$$\begin{aligned} \Pi _{n,{{\bar{\nu }}}}:=(L_n)_{\#} \pi _{n,\bar{\nu }},\qquad \text {with}\qquad \pi _{n,{{\bar{\nu }}}}:=\frac{1}{e^{n \bar{\nu }(\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}{{\bar{\nu }}}^{\otimes N}, \end{aligned}$$

and consider the sequence \(\textsf{P}^n:=\Pi _{n,{{\bar{\nu }}}}\). It is straightforward to verify that indeed \(\textsf{P}^n\rightarrow \delta _{{{\bar{\nu }}}}\). Moreover, note that although \(L_n\) is not bijective, we do have the equality

$$\begin{aligned} \mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)=\mathcal {E}\textrm{nt}(\pi _{n,{{\bar{\nu }}}}|\pi _n), \end{aligned}$$

due to the symmetry of the N-particle distributions \(\bar{\nu }^{\otimes N}\), \(\gamma ^{\otimes N}\). Therefore, we derive

$$\begin{aligned} \begin{aligned} \mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)&=\mathcal {E}\textrm{nt}(\pi _{n,{{\bar{\nu }}}}|\pi _n)\\&=\frac{1}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}\int _{\mathcal {T}^N} \log \left( \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1}\frac{\textrm{d}{{\bar{\nu }}}^{\otimes N}}{\textrm{d}\gamma ^{\otimes N}}\right) \textrm{d}{{\bar{\nu }}}^{\otimes N}\\&=\frac{1}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!} \left( N {{\bar{\nu }}}(\mathcal {T})^{N-1} \int _{\mathcal {T}} \log \left( \frac{\textrm{d}{{\bar{\nu }}}}{\textrm{d}\gamma }\right) \textrm{d}{{\bar{\nu }}}+{{\bar{\nu }}}(\mathcal {T})^{N}\log \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1} \right) \\&=n\frac{e^{n {{\bar{\nu }}}(\mathcal {T})}}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1}\int _{\mathcal {T}} \log \left( \frac{\textrm{d}{{\bar{\nu }}}}{\textrm{d}\gamma }\right) \textrm{d}{{\bar{\nu }}} +\log \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n {{\bar{\nu }}}(\mathcal {T})}-1}. \end{aligned} \end{aligned}$$

Rescaling and taking the limit \(n\rightarrow \infty \), we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n|\Pi _n)=\int _{\mathcal {T}} \log \left( \frac{\textrm{d}{{\bar{\nu }}}}{\textrm{d}\gamma }\right) \textrm{d}{{\bar{\nu }}} -\bar{\nu }(\mathcal {T})+\gamma (\mathcal {T}) =\mathcal {E}\textrm{nt}({{\bar{\nu }}}|\gamma ), \end{aligned}$$

therewith concluding the proof. \(\square \)
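The entropy computation for the recovery sequence can also be checked on a finite trait space. In the sketch below, `entropy_series` evaluates the sum over particle numbers N (rewritten with the zero-truncated Poisson weights of N), `entropy_closed_form` evaluates the final closed-form expression, and the last lines compare \(\tfrac{1}{n}\mathcal {E}\textrm{nt}(\pi _{n,{{\bar{\nu }}}}|\pi _n)\) with \(\mathcal {E}\textrm{nt}({{\bar{\nu }}}|\gamma )\). The weights `nu`, `gamma` and the truncation `terms` are assumptions for illustration.

```python
import math

def log_expm1(a):
    """Numerically stable log(e^a - 1) for a > 0."""
    return a + math.log1p(-math.exp(-a))

def rel_entropy(nu, gamma):
    """Ent(nu|gamma) = sum_x [nu log(nu/gamma) - nu + gamma]."""
    return sum(v * math.log(v / g) - v + g for v, g in zip(nu, gamma))

def entropy_closed_form(nu, gamma, n):
    """Closed form: n e^{nm}/(e^{nm}-1) * int log(dnu/dgamma) dnu + log-ratio."""
    m_nu, m_gamma = sum(nu), sum(gamma)
    log_density = sum(v * math.log(v / g) for v, g in zip(nu, gamma))
    prefactor = n / (1.0 - math.exp(-n * m_nu))
    return prefactor * log_density + log_expm1(n * m_gamma) - log_expm1(n * m_nu)

def entropy_series(nu, gamma, n, terms=400):
    """Truncated sum over N >= 1 of the N-particle contributions."""
    m_nu, m_gamma = sum(nu), sum(gamma)
    log_density = sum(v * math.log(v / g) for v, g in zip(nu, gamma))
    log_ratio = log_expm1(n * m_gamma) - log_expm1(n * m_nu)
    total = 0.0
    for N in range(1, terms + 1):
        # Probability of N particles under the zero-truncated Poisson law.
        p_N = math.exp(N * math.log(n * m_nu) - math.lgamma(N + 1)
                       - log_expm1(n * m_nu))
        total += p_N * (N * log_density / m_nu + log_ratio)
    return total

# Hypothetical data (assumptions for illustration).
gamma = [0.5, 1.0, 1.5]
nu = [1.2, 0.3, 2.0]

gaps = [abs(entropy_series(nu, gamma, n) - entropy_closed_form(nu, gamma, n))
        for n in (1, 5)]
limit_errors = [abs(entropy_closed_form(nu, gamma, n) / n - rel_entropy(nu, gamma))
                for n in (1, 10, 1000)]
print(gaps, limit_errors)
```

Both lists should be small: the first confirms the resummation over N for small n, the second the convergence of the rescaled entropy for this particular choice of data.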

5.2 Uniform estimates

In Sect. 3.1 we provided uniform-in-n estimates for the flux. Namely, from Lemma 3.13, we directly have the following.

Corollary 5.8

Consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T \mathcal {R}_n(\textsf{P}_t^n,\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t)\, \textrm{d}t< \infty .\end{aligned}$$

Then

$$\begin{aligned} {\limsup _{n\rightarrow \infty }}\int _0^{T} 3M {\tilde{\phi }}\left( \frac{1}{3M}\int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1}\,\textsf{J}^{n,{\pm }}_t(\textrm{d}\nu ,\textrm{d}x)\right) \, \textrm{d}t < \infty , \end{aligned}$$

where \(M:=(1+\gamma (\mathcal {T}))\Vert c\Vert _{\infty }\).

However, the weighted total variation metric \(d_{TV,w}\) that was introduced is not appropriate for taking limits, and instead, we take the weaker metric defined in (4.4),

$$\begin{aligned} W(\textsf{P}^1,\textsf{P}^2):=\sup _{F\in {\mathbb {F}}} \left\{ \int _{\Gamma } F \,\textrm{d}(\textsf{P}^1-\textsf{P}^2) \right\} , \end{aligned}$$

where

$$\begin{aligned} {\mathbb {F}}:=\left\{ F\in \textrm{Cyl}_c(\Gamma )\,: \, (1+\nu (\mathcal {T})^2)\left| (\textrm{grad}_{\Gamma }F)(\nu ,x)\right| \le 1, \text{ for } \text{ all } x\in \mathcal {T}, \nu \in \Gamma \right\} . \end{aligned}$$

Recall that W is narrowly lower semicontinuous and implies narrow convergence on narrowly pre-compact subsets. We now have the following equicontinuity result.

Lemma 5.9

Consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) such that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T \mathcal {R}_n(\textsf{P}_t^n,\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t)\, \textrm{d}t< \infty .\end{aligned}$$

Then

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T {\tilde{\phi }}\left( \frac{|\dot{\textsf{P}}^n_t|_{W}}{12M}\right) \, \textrm{d}t<\infty , \end{aligned}$$

where \(|{\dot{\textsf{P}}}_t|_{W}\) is the W-metric speed and \(\tilde{\phi }(s):=\phi (s \vee 1)\) is the monotone relaxation of \(\phi \).

Proof

The proof is similar to Lemmas 3.15 and 4.13, now for the distance W instead of the weighted total variation metric \(d_{TV,w}\). Namely, fix \(n>0\) and consider a curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_n\). Then we have for any \(s,t\in [0,T]\) and any \(F\in C_c(\Gamma )\),

$$\begin{aligned} \left| \int _{\Gamma } F \textrm{d}(\textsf{P}_t-\textsf{P}_s) \right| \le \int _s^t \int _{\Gamma \times \mathcal {T}} |\overline{\nabla }^{n,+}F(\nu ,x)| \, \textrm{d}\textsf{J}_r^{+} \, \textrm{d}r + \int _s^t \int _{\Gamma \times \mathcal {T}} |\overline{\nabla }^{n,-}F(\nu ,x)| \, \textrm{d}\textsf{J}_r^{-} \, \textrm{d}r. \end{aligned}$$

Substituting any \(F\in {\mathbb {F}}\) it is straightforward to verify that

$$\begin{aligned} |\overline{\nabla }^{n,+}F(\nu ,x)| = n|F(\nu +\tfrac{1}{n}\delta _x)-F(\nu )|&\le (1+\nu (\mathcal {T})^2)^{-1}\\ |\overline{\nabla }^{n,-}F(\nu ,x)| = n|F(\nu )-F(\nu -\tfrac{1}{n}\delta _x)|&\le (1+(\nu (\mathcal {T})-\tfrac{1}{n})^2)^{-1} \le 2(1+\nu (\mathcal {T})^2)^{-1}, \end{aligned}$$

for sufficiently large n, and therefore

$$\begin{aligned} \left| \int _{\Gamma } F \textrm{d}(\textsf{P}_t-\textsf{P}_s) \right| \le 2 \int _s^t \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \, \textrm{d}(\textsf{J}_r^{+}+\textsf{J}_r^{-}) \, \textrm{d}r. \end{aligned}$$

Taking the supremum over \(F\in {\mathbb {F}}\), we find that \((\textsf{P}_t)_{t\in [0,T]}\) is absolutely continuous w.r.t. W with

$$\begin{aligned} |{\dot{\textsf{P}}}_t|_{W}\le 2 \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T})^2)^{-1} \, \textrm{d}(\textsf{J}_t^{+}+\textsf{J}_t^{-}) \qquad \hbox { for a.e.}\ t\in [0,T], \end{aligned}$$

where \(|{\dot{\textsf{P}}}_t|_W\) is the W-metric speed. Applying the estimates in Lemma 3.13 concludes the proof. \(\square \)

5.3 Proof of main results

We finally conclude the manuscript with the proof of the main results.

Proof of Theorem 5.1

We will first establish the liminf-estimates. Namely, consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) that converges to the curve \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\). In particular \(\textsf{P}_t^n\rightarrow \textsf{P}_t\) for all \(t\in [0,T]\), and hence by Theorem 5.5 on the \(\varGamma \)-convergence of \(\mathcal {F}_n\) we immediately obtain

$$\begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_t^n)\ge \mathcal {F}_{\infty }(\textsf{P}_t), \qquad \text{ for } \text{ all } t\in [0,T]. \end{aligned}$$

Now suppose that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)<\infty , \qquad \limsup _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})<\infty . \end{aligned}$$

In particular we have the bounds

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T \mathcal {R}_n(\textsf{P}^n_t,\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t) \, \textrm{d}t< \infty , \quad \limsup _{n\rightarrow \infty } \int _0^T \mathcal {D}_n(\textsf{P}^n_t) \, \textrm{d}t < \infty . \end{aligned}$$
(5.7)

Due to the chain rule and the assumption on \(\mathcal {F}_n(\textsf{P}^n_0)\), we obtain

$$\begin{aligned} \limsup _{n\rightarrow \infty } \sup _{t\in [0,T]} \mathcal {F}_n(\textsf{P}^n_t) < \infty . \end{aligned}$$
(5.8)

The latter guarantees, by Corollary C.3, that we have the vague convergence

$$\begin{aligned} \begin{aligned} \lim _{n\rightarrow \infty } \vartheta ^{\pm }_{\textsf{P}_t^n} = \vartheta ^{\pm }_{\textsf{P}_t}, \qquad \lim _{n\rightarrow \infty } \textsf{T}^{n,\pm }_{\#}\vartheta _{\textsf{P}_t^n}^{\pm } =\vartheta ^{\pm }_{\textsf{P}_t}. \end{aligned} \end{aligned}$$

Recall that from Lemma 3.13 and Remark 3.6 we have for each \(n\ge 1\):

$$\begin{aligned} \begin{aligned} \mathcal {E}\textrm{nt}\left( \textsf{J}_t^{n,\pm }|\Theta _{\textsf{P}}^{n,+}\right)&=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}_t^{n,\pm }}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}_t}^{\pm }}{\textrm{d}\Sigma },\frac{\textrm{d}(\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}_t}^{\mp })}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma ,\\ \mathcal {D}_{n}(\textsf{P}_t)&=2H^2(\vartheta _{\textsf{P}_t}^{\pm },\textsf{T}^{n,\mp }_{\#}\vartheta _{\textsf{P}_t}^{\mp }), \end{aligned} \end{aligned}$$

for any dominating measure \(\Sigma \), and similarly, from Corollary 4.10 and Remark 4.5 that

$$\begin{aligned} \begin{aligned} \mathcal {E}\textrm{nt}\left( \textsf{J}_t^{\pm }|\Theta _{\textsf{P}}^{+}\right)&=\int _{\Gamma \times \mathcal {T}} \Upsilon \left( \frac{\textrm{d}\textsf{J}_t^{\pm }}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}_t}^{\pm }}{\textrm{d}\Sigma },\frac{\textrm{d}\vartheta _{\textsf{P}_t}^{\mp }}{\textrm{d}\Sigma }\right) \textrm{d}\Sigma ,\\ \mathcal {D}_{\infty }(\textsf{P}_t)&=2H^2(\vartheta _{\textsf{P}_t}^{\pm },\vartheta _{\textsf{P}_t}^{\mp }). \end{aligned} \end{aligned}$$

By the convexity and lower semi-continuity of \(\Upsilon \) and H we conclude by standard semi-continuity results (e.g. see [6, Theorem 3.4.3]) that for each \(t\in [0,T]\),

$$\begin{aligned} \liminf _{n\rightarrow \infty } \mathcal {R}_n(\textsf{P}_t^n,\textsf{J}^{n,+}_t,\textsf{J}^{n,-}_t) \ge \mathcal {R}_{\infty }(\textsf{P}_t,\textsf{J}^{+}_t,\textsf{J}^{-}_t), \qquad \liminf _{n\rightarrow \infty } \mathcal {D}_n(\textsf{P}_t^n) \ge \mathcal {D}_{\infty }(\textsf{P}_t), \end{aligned}$$

from which (5.1) directly follows after applying Fatou’s lemma.

Next, we consider the question of compactness. As in the previous part, let us consider a sequence \((\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})\in \textsf{CE}_n\) with

$$\begin{aligned} \limsup _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}_0^n)<\infty , \qquad \limsup _{n\rightarrow \infty } \mathcal {I}_n(\textsf{P}^n,\textsf{J}^{n,+},\textsf{J}^{n,-})<\infty , \end{aligned}$$

which imply that the estimates (5.7) and (5.8) still hold. The bound on the free energy ensures by Theorem 5.5 that \(\{\textsf{P}_{t}^n\}_{t\in [0,T],n\ge 1}\) is pre-compact. Moreover, due to the bound on the action \(\mathcal {R}_n\), we have by Corollary 5.8 and Lemma 5.9 that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T {\tilde{\phi }}\left( \frac{1}{3M} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}{})^2)^{-1}\,\textsf{J}^{n,\pm }_t(\textrm{d}\nu ,\textrm{d}x) \right) \, \textrm{d}t < \infty , \end{aligned}$$
(5.9)
$$\begin{aligned} \limsup _{n\rightarrow \infty } \int _0^T {\tilde{\phi }}\left( \frac{|\dot{\textsf{P}}^n_t|_{W}}{12M}\right) \, \textrm{d}t<\infty , \end{aligned}$$
(5.10)

where \(|{\dot{\textsf{P}}}^n_t|_W\) is again the W-metric speed. Since \({\tilde{\phi }}\) is non-decreasing, convex, and superlinear at infinity, we conclude from (5.9) that, up to choosing a subsequence \(n'\), there exists a family \(\{\textsf{J}^{\pm }_t\}_{t\in [0,T]} \subset \mathcal {M}_{loc}^+(\Gamma \times \mathcal {T})\) such that for all \(s,t\in [0,T]\) the sequence of measures \(\textsf{J}^{n',\pm }_r(\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}r\) converges vaguely to \(\textsf{J}^{\pm }_r(\textrm{d}\nu ,\textrm{d}x)\, \textrm{d}r\) in \(\mathcal {M}^+_{loc}(\Gamma \times \mathcal {T}\times [s,t])\), and

$$\begin{aligned} \int _0^T {} {\tilde{\phi }}\left( \frac{1}{3M} \int _{\Gamma \times \mathcal {T}} (1+\nu (\mathcal {T}{})^2)^{-1}\,\textsf{J}^{\pm }_t(\textrm{d}\nu ,\textrm{d}x) \right) \, \textrm{d}t < \infty . \end{aligned}$$

Similarly, since the metric W is narrowly lower semicontinuous and induces narrow convergence on narrowly pre-compact subsets, we find by an Arzelà-Ascoli argument and the estimate (5.10) that, up to choosing a subsequence \(n''\), there exists a narrowly continuous curve \((\textsf{P}_t)_{t\in [0,T]}\) such that \(\textsf{P}^{n''}_t\) converges to \(\textsf{P}_t\) for all \(t\in [0,T]\).

All that remains is to show that \((\textsf{P},\textsf{J}^+,\textsf{J}^-)\in \textsf{CE}_{\infty }\). To this end, fix any \(s,t\in [0,T]\) and \(F \in \textrm{Cyl}_c(\Gamma )\). It is straightforward to verify that there exist constants \(K_{F}\) and \(C_F\) such that the following Taylor approximation holds:

$$\begin{aligned} \left| (\textrm{grad}_{\Gamma }F)(\nu ,x)\mp n\left( F(\nu {\pm }\tfrac{1}{n}\delta _x)-F(\nu )\right) \right| \le \tfrac{C_{F}}{n} 1_{\nu (\mathcal {T})\le K_F}(\nu ,x), \qquad \text{ for } \text{ all } \nu \in \Gamma ,\, x\in \mathcal {T}. \end{aligned}$$

Thus, we can take the limit in the continuity equation \(\textsf{CE}_n\), to conclude that

$$\begin{aligned} \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}_s&=\lim _{n''\rightarrow \infty } \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}^{n''}_t - \int _{\Gamma } F(\nu ) \,\textrm{d}\textsf{P}^{n''}_s \\&= \lim _{n''\rightarrow \infty } \int _s^t \left( \int _{\Gamma \times \mathcal {T}} (\overline{\nabla }^{n'',+}F)\,\textrm{d}\textsf{J}_r^{n'',+}+(\overline{\nabla }^{n'',-}F)\, \textrm{d}\textsf{J}_r^{n'',-} \right) \, \textrm{d}r \\&=\int _s^t \Big (\int _{\Gamma \times \mathcal {T}} (\textrm{grad}_{\Gamma } F)\,\textrm{d}\textsf{J}_r^+-(\textrm{grad}_{\Gamma } F)\, \textrm{d}\textsf{J}_r^- \Big ) \, \textrm{d}r, \end{aligned}$$

thereby concluding the proof. \(\square \)

Proof of Theorem 5.4

Suppose that \({{\bar{\textsf{P}}}}^n\rightarrow {{\bar{\textsf{P}}}} =\delta _{{{\bar{\nu }}}}\) with

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n(\bar{\textsf{P}}^n)=\frac{1}{2}\mathcal {E}\textrm{nt}({{{\bar{\nu }}}}|\gamma ). \end{aligned}$$

For each \(n\in {\mathbb {N}}\) let \(\textsf{P}_t^n\) be the unique gradient-flow solution to (\(\textsf{FKE}_n\)) with initial data \({{\bar{\textsf{P}}}}^n\). Moreover, let \(\nu _t\) be the unique solution to (2.13) with initial data \({{\bar{\nu }}}\), and set \(\textsf{P}_t:=\delta _{\nu _t}\), which is the unique gradient-flow solution to the Liouville equation (Li) with initial data \({{\bar{\textsf{P}}}}\). Then by Theorem 5.3 we have for every \(t\in [0,T]\) that \(\textsf{P}_t^n\rightarrow \textsf{P}_t\), and

$$\begin{aligned} \lim _{n\rightarrow \infty } \mathcal {F}_n(\textsf{P}^n_t)=\mathcal {F}_{\infty }(\textsf{P}_t)=\frac{1}{2}\mathcal {E}\textrm{nt}(\nu _t|\gamma ). \end{aligned}$$

Next, suppose that in addition there exists a constant \(C>1\) such that \(C^{-1}\le \textrm{d}{{\bar{\nu }}}/\textrm{d}\gamma \le C\). By Lemma 2.18 there exists a \(C'<\infty \) with

$$\begin{aligned}\sup _{t\in [0,T]} \left\| \log u_t\right\| _{\infty }<C', \qquad u_t:=\textrm{d}\nu _t/\textrm{d}\gamma . \end{aligned}$$

Now fix any \(t\in [0,T]\), and recall that

$$\begin{aligned} \Pi _{n,\nu _t}:=(L_n)_{\#} \pi _{n,\nu _t},\qquad \pi _{n,\nu _t}=\frac{1}{e^{n {\nu _t(\mathcal {T})}}-1}\sum _{N=1}^{\infty } \frac{n^N}{N!}\nu _t^{\otimes N}. \end{aligned}$$

It is straightforward to check that \(\Pi _n \ll \Pi _{n,\nu _t} \ll \Pi _n\) and hence for any \(\Gamma _n \ni \nu =L_n(x_1,\dots ,x_N)\),

$$\begin{aligned} \log \left( \frac{\textrm{d}\Pi _{n,\nu _t}}{\textrm{d}\Pi _n}\right) (\nu )=\log \left( \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n \nu _t(\mathcal {T})}-1}\frac{\textrm{d}\nu _t^{\otimes N}}{\textrm{d}\gamma ^{\otimes N}}\right) =\log \left( \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n \nu _t(\mathcal {T})}-1}\right) +\sum _{i=1}^N \log u_t(x_i), \end{aligned}$$

with all terms finite, and \(|\sum _{i=1}^N \log u_t(x_i)|\le N C'\). Therefore, by Lemma 5.7 we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\int _{\Gamma } \log \left( \frac{\textrm{d}\Pi _{n,\nu _t}}{\textrm{d}\Pi _n}\right) \, \textrm{d}\textsf{P}_t^n&=\lim _{n\rightarrow \infty } \frac{1}{n}\log \left( \frac{e^{n \gamma (\mathcal {T})}-1}{e^{n \nu _t(\mathcal {T})}-1}\right) +\lim _{n\rightarrow \infty }\int _{\Gamma } \langle \log u_t,\nu \rangle \, \textrm{d}\textsf{P}_t^n\\&=\gamma (\mathcal {T})-\nu _t(\mathcal {T})+\langle \log u_t,\nu _t\rangle \\&=\mathcal {E}\textrm{nt}(\nu _t|\gamma ). \end{aligned}$$
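
The two elementary steps behind this display can be checked by hand. The following is a sketch under assumptions that are not restated in this section: that \(a:=\gamma (\mathcal {T})>0\) and \(b:=\nu _t(\mathcal {T})>0\), that \(L_n(x_1,\dots ,x_N)=\tfrac{1}{n}\sum _{i=1}^N\delta _{x_i}\) (so that \(\tfrac{1}{n}\sum _{i=1}^N\log u_t(x_i)=\langle \log u_t,\nu \rangle \)), and that \(\mathcal {E}\textrm{nt}(\nu |\gamma )=\int \phi (\textrm{d}\nu /\textrm{d}\gamma )\,\textrm{d}\gamma \) with \(\phi (s)=s\log s-s+1\). Then

$$\begin{aligned} \frac{1}{n}\log \left( \frac{e^{na}-1}{e^{nb}-1}\right)&=a-b+\frac{1}{n}\log \left( \frac{1-e^{-na}}{1-e^{-nb}}\right) \longrightarrow a-b \qquad \text{as } n\rightarrow \infty ,\\ \mathcal {E}\textrm{nt}(\nu _t|\gamma )&=\int _{\mathcal {T}} \left( u_t\log u_t-u_t+1\right) \,\textrm{d}\gamma =\langle \log u_t,\nu _t\rangle -\nu _t(\mathcal {T})+\gamma (\mathcal {T}). \end{aligned}$$

The first line accounts for the normalising constants, and the second identifies the limit with the relative entropy.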

Subsequently, we can compute as follows:

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n_0|\Pi _{n})&=\lim _{n\rightarrow \infty } \frac{1}{n}\int _{\Gamma } \phi \left( \frac{\textrm{d}\textsf{P}^n_0}{\textrm{d}\Pi _{n}}\right) \, \textrm{d}\Pi _{n}\\&=\lim _{n\rightarrow \infty } \frac{1}{n}\int _{\Gamma } \left( \log \left( \frac{\textrm{d}\textsf{P}^n_0}{\textrm{d}\Pi _{n,\nu _0}}\right) +\log \left( \frac{\textrm{d}\Pi _{n,\nu _0}}{\textrm{d}\Pi _{n}}\right) \right) \, \textrm{d}\textsf{P}^n_0\\&=\mathcal {E}\textrm{nt}(\nu _0|\gamma ), \end{aligned}$$

and hence the initial data are well-prepared. Therefore, we can conclude that for all \(t\in [0,T]\)

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{n}\mathcal {E}\textrm{nt}(\textsf{P}^n_t|\Pi _{n,\nu _t})&=\lim _{n\rightarrow \infty } \frac{1}{n}\int _{\Gamma } \phi \left( \frac{\textrm{d}\textsf{P}^n_t}{\textrm{d}\Pi _{n,\nu _t}}\right) \, \textrm{d}\Pi _{n,\nu _t}\\&=\lim _{n\rightarrow \infty } \frac{1}{n}\int _{\Gamma } \left( \log \left( \frac{\textrm{d}\textsf{P}^n_t}{\textrm{d}\Pi _{n}}\right) +\log \left( \frac{\textrm{d}\Pi _{n}}{\textrm{d}\Pi _{n,\nu _t}}\right) \right) \, \textrm{d}\textsf{P}^n_t\\&=\mathcal {E}\textrm{nt}(\nu _t|\gamma )-\mathcal {E}\textrm{nt}(\nu _t|\gamma ) =0, \end{aligned}$$

thus establishing the entropic propagation of chaos result. \(\square \)
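
The splitting of the logarithm in the last two displays of the proof is the chain rule for densities. As a sketch, let \(P\ll \Pi \ll \Pi '\) be generic probability measures on \(\Gamma \) (placeholder symbols, not notation from the text) for which the integrals below are finite, and assume again that \(\phi (s)=s\log s-s+1\). Then

$$\begin{aligned} \int _{\Gamma } \phi \left( \frac{\textrm{d}P}{\textrm{d}\Pi }\right) \,\textrm{d}\Pi&=\int _{\Gamma } \log \left( \frac{\textrm{d}P}{\textrm{d}\Pi }\right) \,\textrm{d}P-P(\Gamma )+\Pi (\Gamma )=\int _{\Gamma } \log \left( \frac{\textrm{d}P}{\textrm{d}\Pi }\right) \,\textrm{d}P,\\ \int _{\Gamma } \log \left( \frac{\textrm{d}P}{\textrm{d}\Pi '}\right) \,\textrm{d}P&=\int _{\Gamma } \log \left( \frac{\textrm{d}P}{\textrm{d}\Pi }\right) \,\textrm{d}P+\int _{\Gamma } \log \left( \frac{\textrm{d}\Pi }{\textrm{d}\Pi '}\right) \,\textrm{d}P. \end{aligned}$$

Taking \(P=\textsf{P}^n_t\), \(\Pi =\Pi _n\) and \(\Pi '=\Pi _{n,\nu _t}\) gives the decomposition used at time t, and exchanging the roles of \(\Pi _n\) and \(\Pi _{n,\nu _0}\) gives the one used at time 0.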