1 Introduction

There are many different techniques to study the long-time behaviour of Markov processes that excel in different situations. One very common and powerful technique is the use of Lyapunov functionals, i.e., functionals that are monotone in time. An example of such a functional is the variance

$$\begin{aligned} \textbf{Var}_\mu (f) := {\mathbb {E}}_\mu [f^2]-{\mathbb {E}}_\mu [f]^2, \quad f \in L^2(\mu ), \end{aligned}$$

where \(\mu \) is an invariant measure for some Markov process \((X_t)_{t \ge 0}\) with semigroup \((P_t)_{t \ge 0}\). If we now fix an observable f and consider the function

$$\begin{aligned}{}[0,\infty ) \ni t \mapsto \textbf{Var}_\mu (P_tf) \in [0,\infty ), \end{aligned}$$

then it is easy to see that this is non-increasing and under some further assumptions one can even show that it is strictly decreasing for all non-constant observables f. This whole viewpoint is purely based on functional analytic arguments and one does not even need to speak about the underlying Markov process itself to carry out the corresponding calculations. Looking at this result from a different perspective, we observe that on average the process \(((P_t f(X_t))^2)_{t \ge 0}\) is non-increasing. However, the purely analytic approach does not provide us with any insight into the behaviour of this process on the level of single trajectories. Of course, in general we cannot expect that every trajectory of this process is non-increasing, but we can hope that the process exhibits some stochastic form of monotonicity, such as being non-increasing in conditional mean, i.e., a supermartingale. From a probabilistic standpoint, these limitations of the purely analytic tools are somewhat dissatisfying. Consequently, we seek to enhance this coarse approach by applying a more detailed, probabilistic technique that allows us to extend these results to a trajectorial level, by which we mean results on the behaviour of single realisations of a stochastic process. By doing so, we uncover more of the underlying probabilistic mechanisms governing the decay of variance, or more generally, the decay of \(\Phi \)-entropies. For this, we will first briefly recall the notion of \(\Phi \)-entropies and then explain our main results and ideas with the help of the simple example of a continuous-time Markov chain on a finite state space. The rest of the article is then devoted to extending these ideas to the setting of spatially extended systems of infinitely many interacting particles as e.g. considered in [13].

1.1 \(\Phi \)-Entropies and Their Decay Under Markovian Dynamics

Let \(\Phi : I \rightarrow \mathbb {R}\) be a smooth and convex function defined on a not necessarily bounded interval \(I \subset \mathbb {R}\). Let \((E, \mathcal {B}(E))\) be a Polish space equipped with its Borel \(\sigma \)-algebra and assume that \(\mu \) is a probability measure on \((E, \mathcal {B}(E))\). The \(\Phi \)-entropy functional is then defined on the set of \(\mu \)-integrable functions \(f:E \rightarrow I\) by

$$\begin{aligned} \textbf{Ent}_\mu ^\Phi (f) := \int _E \Phi (f) d\mu - \Phi \left( \int _E f d \mu \right) = {\mathbb {E}}_\mu \left[ \Phi (f)\right] - \Phi \left( {\mathbb {E}}_\mu \left[ f\right] \right) . \end{aligned}$$

By Jensen’s inequality one can immediately deduce that the \(\Phi \)-entropy functional takes its values in \(\mathbb {R}_+ \cup \left\{ +\infty \right\} \). Moreover, \(\textbf{Ent}_\mu ^\Phi (f)\) vanishes if its argument is constant, and if \(\Phi \) is strictly convex, then the converse is also true. For special choices of \(\Phi \) one can recover the classical variance and relative entropy functionals since we have

$$\begin{aligned} \textbf{Ent}_\mu ^{u \mapsto u^2} = \textbf{Var}_\mu , \quad \textbf{Ent}_\mu ^{u \mapsto u \log u} = h(\cdot |\mu ). \end{aligned}$$
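To make the definition concrete, the following minimal numerical sketch evaluates \(\textbf{Ent}_\mu ^\Phi \) on a finite state space; the probability vector mu and the observable f in the code are arbitrary illustrative choices and not objects from the text. For \(\Phi (u) = u^2\) it recovers the variance, and for \(\Phi (u) = u \log u\), applied to a probability density, the relative entropy.

```python
import numpy as np

def phi_entropy(phi, mu, f):
    """Ent_mu^Phi(f) = E_mu[Phi(f)] - Phi(E_mu[f]) on a finite state space."""
    return np.dot(mu, phi(f)) - phi(np.dot(mu, f))

# illustrative probability vector mu and positive observable f on four states
mu = np.array([0.1, 0.2, 0.3, 0.4])
f = np.array([0.5, 1.5, 0.8, 1.2])

# Phi(u) = u^2 recovers the variance
var = phi_entropy(lambda u: u**2, mu, f)
assert np.isclose(var, np.dot(mu, f**2) - np.dot(mu, f)**2)

# Phi(u) = u log u applied to the density g = f / E_mu[f] recovers h(g * mu | mu)
g = f / np.dot(mu, f)
ent = phi_entropy(lambda u: u * np.log(u), mu, g)
assert np.isclose(ent, np.dot(mu * g, np.log(g)))
```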

Now let \((X(t))_{t \ge 0}\) be a Markov process on our Polish space E with associated semigroup \((P_t)_{t \ge 0}\) acting on \(C_b(E;\mathbb {R})\), the space of continuous and bounded real-valued functions on E. Let us assume that there exists an invariant probability measure \(\mu \) and denote by \(\mathscr {L}\) the generator of the semigroup \((P_t)_{t \ge 0}\) with domain \(\text {dom} (\mathscr {L}) \subset C_b(E;\mathbb {R})\).

By invariance of \(\mu \) and Jensen’s inequality one can now deduce that for all continuous and bounded \(f:E \rightarrow I\)

$$\begin{aligned} \textbf{Ent}_\mu ^\Phi (P_tf)&= {\mathbb {E}}_\mu \left[ \Phi (P_tf)\right] - \Phi \left( {\mathbb {E}}_\mu \left[ P_tf\right] \right) \le {\mathbb {E}}_\mu \left[ P_t\Phi (f)\right] - \Phi \left( {\mathbb {E}}_\mu \left[ f\right] \right) = \textbf{Ent}_\mu ^\Phi (f). \end{aligned}$$

This tells us that the \(\Phi \)-entropy is non-increasing as a function of t and can be used as a Lyapunov function. More precisely, with purely analytic arguments, one can even deduce the following general result about the decay of \(\Phi \)-entropies.

Proposition 1.1

(DeBruijn-like property for Markov semigroups, [2]) Let \((X(t))_{t \ge 0}\) be a Markov process on a Polish space E equipped with its Borel \(\sigma \)-algebra \(\mathcal {B}(E)\) and let \((P_t)_{t \ge 0}\) be the associated Markov semigroup with generator \(\mathscr {L}\). Assume that \(\mu \) is an invariant probability measure. Then, for any continuous and bounded function \(f:E \rightarrow I\) and any \(t>0\), it holds that

$$\begin{aligned} \partial _t \ \textbf{Ent}_\mu ^\Phi (P_tf) = {\mathbb {E}}_\mu \left[ \Phi '(P_t f)\mathscr {L}(P_t f) \right] \le 0. \end{aligned}$$

This result is classical, but we nevertheless recall its short analytic proof.

Proof

The chain rule and the definition of the generator \(\mathscr {L}\) directly imply that

$$\begin{aligned} \partial _t \ \textbf{Ent}_\mu ^\Phi (P_tf) = {\mathbb {E}}_\mu \left[ \Phi '(P_t f)\frac{d}{dt}(P_t f) \right] = {\mathbb {E}}_\mu \left[ \Phi '(P_t f)\mathscr {L}(P_t f) \right] . \end{aligned}$$

To see that the left-hand side is actually non-positive, it suffices to observe that the convexity of \(\Phi \) implies via Jensen’s inequality for conditional expectations

$$\begin{aligned} \Phi (P_{t+s}g) \le P_t(\Phi (P_sg)) \end{aligned}$$

for any \(s,t \ge 0\) and hence for all f we have

$$\begin{aligned} \textbf{Ent}_\mu ^\Phi (P_{t+s}f) \le \textbf{Ent}_\mu ^\Phi (P_sf), \end{aligned}$$

by invariance of \(\mu \). \(\square \)
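As a sanity check of Proposition 1.1 in the simplest possible setting, the following sketch compares a central finite difference of \(t \mapsto \textbf{Ent}_\mu ^\Phi (P_tf)\) with \({\mathbb {E}}_\mu \left[ \Phi '(P_t f)\mathscr {L}(P_t f) \right] \) for a small irreducible Markov chain; the \(3\times 3\) generator, the observable and the choice \(\Phi (u)=u\log u\) are arbitrary illustrative choices, and no reversibility is assumed.

```python
import numpy as np
from scipy.linalg import expm

# illustrative irreducible 3-state generator (rows sum to zero, not reversible)
L = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.6, 0.4, -1.0]])
# stationary distribution: solve mu L = 0 together with sum(mu) = 1
A = np.vstack([L.T, np.ones(3)])
mu = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

phi = lambda u: u * np.log(u)
dphi = lambda u: np.log(u) + 1.0
f = np.array([0.3, 1.4, 2.1])                      # positive observable
ent = lambda g: mu @ phi(g) - phi(mu @ g)          # Ent_mu^Phi

t, h = 0.7, 1e-5
Ptf = expm(t * L) @ f
lhs = (ent(expm((t + h) * L) @ f) - ent(expm((t - h) * L) @ f)) / (2 * h)
rhs = mu @ (dphi(Ptf) * (L @ Ptf))
assert np.isclose(lhs, rhs, atol=1e-6) and rhs <= 0.0
```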

By integrating with respect to the time variable one obtains the following classical corollary, which links exponential decay of \(\Phi \)-entropies and functional inequalities involving \(\Phi \)-entropies.

Corollary 1.2

In the setting of Proposition 1.1, the following two statements are equivalent.

  1. i.

    There exists a constant \(c>0\) such that for all \(f\in \text {dom} (\mathscr {L})\)

    $$\begin{aligned} \textbf{Ent}_\mu ^\Phi (f) \le -c {\mathbb {E}}_\mu \left[ \Phi '(f)\mathscr {L}f \right] . \end{aligned}$$
  2. ii.

    There exists a constant \(c>0\) such that for all continuous and bounded \(f:E \rightarrow I\) and all \(t \ge 0\)

    $$\begin{aligned} \textbf{Ent}_\mu ^\Phi (P_tf) \le e^{-\frac{t}{c}}\textbf{Ent}_\mu ^\Phi (f). \end{aligned}$$

Note that, in the special case \(\Phi : u \mapsto u^2\), one recovers the Poincaré inequality

$$\begin{aligned} \textbf{Var}_\mu (f) \le -2c \ \langle f, \mathscr {L}f \rangle _{L^2(\mu )}, \end{aligned}$$

which is well-known to be equivalent to exponential \(L^2\) ergodicity, see e.g. [7]. For a more detailed review of \(\Phi \)-entropies and further results we refer the interested reader to [2].
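As an illustration of Corollary 1.2 in the variance case, the following sketch computes the spectral gap \(\lambda \) of a reversible birth–death chain (the rates are arbitrary illustrative choices) and checks both the Poincaré inequality \(\textbf{Var}_\mu (f) \le -\lambda ^{-1}\langle f, \mathscr {L}f \rangle _{L^2(\mu )}\) and the decay \(\textbf{Var}_\mu (P_tf) \le e^{-2\lambda t}\,\textbf{Var}_\mu (f)\), which corresponds to the choice \(c = (2\lambda )^{-1}\) in the notation above.

```python
import numpy as np
from scipy.linalg import expm

# illustrative reversible birth-death generator on {0, ..., 4}
n = 5
birth = np.array([1.0, 2.0, 1.5, 0.5])
death = np.array([0.8, 1.2, 2.0, 1.0])
L = np.zeros((n, n))
for i in range(n - 1):
    L[i, i + 1] = birth[i]
    L[i + 1, i] = death[i]
L -= np.diag(L.sum(axis=1))

# stationary measure via detailed balance, then the spectral gap of the symmetrised generator
mu = np.ones(n)
for i in range(n - 1):
    mu[i + 1] = mu[i] * birth[i] / death[i]
mu /= mu.sum()
D = np.diag(np.sqrt(mu))
gap = -np.sort(np.linalg.eigvalsh(D @ L @ np.linalg.inv(D)))[-2]

rng = np.random.default_rng(0)
f = rng.normal(size=n)
var = lambda g: mu @ g**2 - (mu @ g)**2

assert var(f) <= -(1.0 / gap) * (mu @ (f * (L @ f))) + 1e-12      # Poincare inequality
for t in [0.1, 0.5, 1.0, 3.0]:
    assert var(expm(t * L) @ f) <= np.exp(-2 * gap * t) * var(f) + 1e-12
```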

1.2 A Finite State-Space Example for the Trajectorial Approach

As one can see, the results above can be obtained without even mentioning the underlying stochastic process and just dealing with the semigroup and its generator. While the simplicity of this method is certainly attractive, it solely provides information about averaged quantities, i.e., the mean behaviour of the process, but it does not offer any insights into the behaviour of individual realisations. We therefore want to complement this perspective with a more refined and probabilistic approach that enables us to derive results at a trajectorial level.

For simplicity, we will first discuss the main ideas for the example of a continuous-time Markov chain on a finite state space. More precisely, let \((X_t)_{t\ge 0}\) be a Markov chain on a finite set E with irreducible generator \(\mathscr {L}\) and strictly positive invariant measure \(\mu \). Hence, the corresponding Markov semigroup is given by the matrix exponential \((e^{t\mathscr {L}})_{t \ge 0}\). Denote the underlying probability space by \((\Omega , \mathcal {A}, {\mathbb {P}})\) and assume that \(X_0 \sim \mu \) under \({\mathbb {P}}\).

It is easy to check that for all bounded \(f:[0,\infty ) \times E \rightarrow \mathbb {R}\) such that for all \(x \in E\) the partial derivatives \(\partial _t f(\cdot , x)\) are continuous and bounded, the process defined by

$$\begin{aligned} f(t, X_t) - \int _0^t (\partial _s + \mathscr {L})f(s,X_s)ds, \quad t \ge 0, \end{aligned}$$
(1.1)

is a martingale with respect to the canonical filtration generated by \((X_t)_{t \ge 0}\), see e.g. [14, Lemma IV.4.20].

If we now fix a finite time horizon \(T>0\) and consider the time-reversal \(({\hat{X}}_t)_{0 \le t \le T}\) of \((X_t)_{t\ge 0}\), where \({\hat{X}}_t = X_{T-t}\), then under \({\mathbb {P}}\) the time-reversed process is again a time-homogeneous Markov process with generator \(\hat{\mathscr {L}}\), where

$$\begin{aligned} \hat{\mathscr {L}}(x,y) = \frac{\mu (y)}{\mu (x)}\mathscr {L}(y,x). \end{aligned}$$
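In matrix form this time-reversed generator is straightforward to construct and to check. The sketch below (the \(3\times 3\) generator is an arbitrary illustrative choice) verifies that \(\hat{\mathscr {L}}\) defined by the formula above is again a generator and that \(\mu \) is also invariant for it.

```python
import numpy as np

# illustrative irreducible generator on three states (rows sum to zero)
L = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.6, 0.4, -1.0]])
A = np.vstack([L.T, np.ones(3)])
mu = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

# time-reversed generator: Lhat(x, y) = mu(y) / mu(x) * L(y, x)
Lhat = (L.T * mu[None, :]) / mu[:, None]

assert np.allclose(Lhat.sum(axis=1), 0.0)   # Lhat is again a generator
assert np.allclose(mu @ Lhat, 0.0)          # mu is invariant for the time-reversal
```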

A short calculation now shows that, for each bounded \(g: E \rightarrow \mathbb {R}\) and \(T>0\), the process \((P_{T-s}g({\hat{X}}_s))_{0 \le s\le T}\) is a \((({\hat{\mathcal {F}}}_t)_{0 \le t \le T}, {\mathbb {P}})\)-martingale, where \({\hat{\mathcal {F}}}_t = \sigma (X_{T-s}: \ 0 \le s \le t)\). Indeed, we can use the chain rule to calculate

$$\begin{aligned} \partial _t P_{T-t}g(x) = - \hat{\mathscr {L}} P_{T-t}g(x), \end{aligned}$$

so the correction term in (1.1) vanishes. Note that it is crucial to use the time-reversed process here, since the correction term does not cancel out if one uses the forward process.

By convexity this directly implies that the time-reversed trajectorial \(\Phi \)-entropy, i.e., the process defined by

$$\begin{aligned} \Phi (P_{T-s}g(X(T-s))), \quad 0 \le s \le T, \end{aligned}$$

is a submartingale. The submartingale property of this process should be thought of as a stochastic monotonicity because it tells us that almost surely we have

$$\begin{aligned} {\mathbb {E}}\left[ \Phi (P_{T-s}g(X(T-s)))\Big |{\hat{\mathcal {F}}}_t\right] \ge \Phi (P_{T-t}g(X(T-t))), \quad 0 \le t \le s \le T. \end{aligned}$$
(1.2)

This provides us with valuable insights into the behaviour of \(\Phi \)-entropy functionals along individual trajectories and can be viewed as a trajectorial refinement of DeBruijn’s Theorem, which we can recover from (1.2) by taking expectations with respect to \({\mathbb {P}}\). The submartingale property of the backward dynamics can be interpreted as follows: Regardless of our knowledge about the system’s future trajectory, we expect the \(\Phi \)-entropy to have decreased in the past. Hence, in a sense, the knowledge of a system’s trajectory does not influence the DeBruijn-like decay of \(\Phi \)-entropies when looking backwards in time.
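Inequality (1.2) can be checked by direct matrix computations. The following minimal sketch does this for a small reversible chain (the rates, the convex function, the observable and the time points are all illustrative choices and not taken from the text); for a reversible chain the time-reversed semigroup coincides with the forward one, so the conditional expectation on the left-hand side of (1.2) is simply \(P_{s-t}\big [\Phi (P_{T-s}g)\big ]\) evaluated at \(X_{T-t}\).

```python
import numpy as np
from scipy.linalg import expm

# illustrative reversible 3-state generator w.r.t. mu, built from symmetric "conductances"
mu = np.array([0.2, 0.3, 0.5])
K = np.array([[0.0, 1.0, 0.4],
              [0.0, 0.0, 0.7],
              [0.0, 0.0, 0.0]])
C = K + K.T                               # symmetric, zero diagonal
L = C / mu[:, None]                       # L(x, y) = C(x, y) / mu(x) satisfies detailed balance
np.fill_diagonal(L, -L.sum(axis=1))       # rows sum to zero

phi = lambda u: u**2                      # any convex Phi works here
g = np.array([0.3, 1.4, 2.1])
T, s, t = 2.0, 1.5, 0.5                   # 0 <= t <= s <= T

# left-hand side of (1.2), as a vector indexed by the value of X_{T-t}
lhs = expm((s - t) * L) @ phi(expm((T - s) * L) @ g)
# right-hand side of (1.2)
rhs = phi(expm((T - t) * L) @ g)
assert np.all(lhs >= rhs - 1e-12)
```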

Moreover, we can now apply the standard machinery of martingale inequalities to get concentration bounds on the fluctuations around the mean. For example, Doob’s classical submartingale inequality, see [14, Theorem II.52.1], implies that for all \(C > 0\) we have

$$\begin{aligned} {\mathbb {P}}\left[ \sup _{0 \le t \le T} \Phi (P_tg(X_t)) \ge C \right] \le \frac{\int \Phi (g)d\mu }{C}. \end{aligned}$$

Since the right-hand side does not depend on \(T>0\), we even get a tail bound for unbounded time intervals

$$\begin{aligned} {\mathbb {P}}\left[ \sup _{0 \le t < \infty } \Phi (P_tg(X_t)) \ge C \right] \le \frac{\int \Phi (g)d\mu }{C}. \end{aligned}$$

An analogous time-reversed martingale structure has been identified in the field of stochastic thermodynamics, specifically in the investigation of the decay of non-equilibrium free energies under Markovian dynamics, see [15, Sect. 9.1.4]. Exploring the underlying reasons for why the submartingale property appears exclusively in reversed time is a subject of ongoing research. For readers interested in delving deeper into this topic from a physics perspective, we recommend the comprehensive review article [15].

The main work is now to establish that an argument similar to the one for finite state spaces can also be made rigorous for infinite-dimensional systems such as the interacting particle systems we consider. To the best of our knowledge, the first results of this kind, in the context of diffusions in \(\mathbb {R}^n\), were obtained in [4]. More recently, starting with [9], these results have been extended to more and more classes of Markov processes, including continuous-time Markov chains on countable state spaces, see [10]. The works [11, 16] are also in a similar spirit.

The setting will be made precise in Sect. 2, but roughly speaking, we consider continuous-time Markov jump processes on general configuration spaces \(\Omega = \Omega _0^S\), where S is an arbitrary countable set and \(\Omega _0\) is a compact Polish space. We will refer to the elements of S as sites and call \(\Omega _0\) the local state-space. In most examples considered in the literature, S is the vertex set of some graph like the d-dimensional hypercubic lattice \(\mathbb {Z}^d\), a tree or the Cayley graph of a group. This underlying spatial geometry dictates which particles can interact with each other and we are therefore not in the setting of mean-field systems but in an infinite-dimensional setting. This of course brings with it its own set of technical difficulties which need to be dealt with for making the time-reversal arguments work.

The main technical difficulties come from making sure that the time-reversal is again a well-defined interacting particle system and from obtaining a description of its generator. This is made possible by assuming some local regularity of the local conditional distributions of the time-stationary measure \(\mu \), namely that \(\mu \) is actually a Gibbs measure with respect to a quasilocal specification that additionally satisfies a certain smoothness condition. This condition is e.g. satisfied if the specification is given in terms of a potential \(\Phi = (\Phi _B)_{B \Subset S}\) such that

$$\begin{aligned} \sup _{x \in S}\sum _{B \Subset S: \ B \ni x}\left|B \right|\left\Vert \Phi _B \right\Vert _\infty < \infty , \end{aligned}$$

where the notation \(B \Subset S\) means that B is a finite subset of S. Note that this condition is for example satisfied for any translation-invariant finite-range potential, so our theory applies to a fairly large class of models.

1.3 Organisation of the Manuscript

The rest of this article is organised as follows. We will first collect the necessary notation and formulate our main results in Sect. 2. Then, as a first step, we investigate the time-reversal of interacting particle systems in equilibrium in Sect. 3 with the main goal of obtaining an explicit representation of the (formal) generator of the time-reversed dynamics. In Sect. 4, we will then apply these results to establish pathwise properties of general \(\Phi \)-entropy functionals.

2 Setting and Main Results

Let \((\Omega _0, \mathcal {B}_0)\) be a compact Polish space equipped with its Borel \(\sigma \)-algebra and \(\lambda _0\) a probability measure on \((\Omega _0, \mathcal {B}_0)\), which will serve as our reference measure. We will consider Markovian dynamics on the configuration space \(\Omega = \Omega _0^{S}\), where S is some countable set whose elements we will refer to as sites. In most applications this will be the set of vertices of some graph, e.g. \(\mathbb {Z}^d\) or a tree. We equip \(\Omega \) with the product topology and corresponding Borel \(\sigma \)-algebra \(\mathcal {F}\). Note that \(\mathcal {F}\) coincides with the product \(\sigma \)-algebra \(\otimes _{x \in S}\mathcal {B}_0\). For \(\Delta \subset S\) we will also write \(\Omega _\Delta := \Omega _0^\Delta \) for the set of partial configurations. We will also equip \(\Omega _\Delta \) with the product \(\sigma \)-algebra and the probability measure \(\lambda _\Delta = \otimes _{x \in \Delta }\lambda _0\). For \(\Lambda \subset S\), let \(\mathcal {F}_\Lambda \) be the sub-\(\sigma \)-algebra of \(\mathcal {F}\) that is generated by the projections \(\omega \mapsto \omega _\Delta \in \Omega _\Delta \) for \(\Delta \Subset \Lambda \), where we write \(\Subset \) to signify that a set is a finite subset of another set. For \(\Delta \subset S\) and (partial) configurations \(\eta _{\Delta ^c} \in \Omega _{\Delta ^c}\) and \(\xi _\Delta \in \Omega _\Delta \), we will write \(\xi _\Delta \eta _{\Delta ^c}\) for the configuration that is defined on all of S and agrees with \(\eta _{\Delta ^c}\) on \(\Delta ^c\) and with \(\xi _\Delta \) on \(\Delta \). For a topological space E, we will denote its Borel \(\sigma \)-algebra by \(\mathcal {B}(E)\) and the space of continuous real-valued functions on E by C(E). The space of non-negative measures on E, or more precisely on \(\mathcal {B}(E)\), will be denoted by \(\mathcal {M}(E)\) and is equipped with the topology of weak-convergence. The subset of probability measures, i.e., non-negative measures with total mass equal to one, will be denoted by \(\mathcal {M}_1(E)\). The total variation distance on \(\mathcal {M}(E)\) will be denoted by \(\left\Vert \cdot \right\Vert _{\text {TV}}\).

2.1 Interacting Particle Systems and Gibbs Measures

2.1.1 Interacting Particle Systems

We will consider time-continuous Markovian dynamics on \(\Omega \), namely interacting particle systems characterised by time-homogeneous generators \(\mathscr {L}\) with domain \(\text {dom}(\mathscr {L}) \subset C(\Omega )\) and the associated Markovian semigroup \((P_t)_{t \ge 0}\) on \(C(\Omega )\). For interacting particle systems we adopt the notation and exposition of the standard reference [13, Chapter I].

In our setting, the generator \(\mathscr {L}\) is given by a collection of transition measures \((c_\Delta (\cdot , d\xi _\Delta ))_{\Delta \Subset S}\) in finite volumes \(\Delta \Subset S\), i.e., mappings

$$\begin{aligned} \Omega \ni \eta \mapsto c_\Delta (\eta , d\xi _\Delta ) \in \mathcal {M}(\Omega _\Delta ). \end{aligned}$$

These transition measures can be interpreted as the infinitesimal rates at which the particles inside \(\Delta \) switch from the configuration \(\eta _\Delta \) to \(\xi _\Delta \), given that the rest of the system is currently in state \(\eta _{\Delta ^c}\). The full dynamics of the interacting particle system is then given as the superposition of these local dynamics,

$$\begin{aligned} \mathscr {L}f(\eta ) = \sum _{\Delta \Subset S}\int _{\Omega _\Delta }\left[ f(\xi _\Delta \eta _{\Delta ^c})-f(\eta )\right] c_\Delta (\eta , d\xi _\Delta ). \end{aligned}$$
(2.1)

In [13, Chapter I] it is shown that the following conditions are sufficient to guarantee well-definedness.

(L1):

For each \(\Delta \Subset S\) the mapping

$$\begin{aligned} \Omega \ni \eta \mapsto c_\Delta (\eta , d\xi _\Delta ) \in \mathcal {M}(\Omega _\Delta ) \end{aligned}$$

is continuous.

(L2):

The total rate at which a single particle switches its state is uniformly bounded, i.e.,

$$\begin{aligned} \sup _{x \in S}\sum _{\Delta \ni x}\sup _{\eta \in \Omega }c_\Delta (\eta , \Omega _\Delta ) < \infty . \end{aligned}$$
(L3):

The total influence of all other particles on the transition rates of a single particle is uniformly bounded, i.e.,

$$\begin{aligned} M := \sup _{x \in S}\sum _{\Delta \ni x}\sum _{y \ne x}\delta _y c_\Delta < \infty , \end{aligned}$$

where

$$\begin{aligned} \delta _y c_\Delta := \sup \left\{ \left\Vert c_\Delta (\eta , \cdot ) - c_\Delta (\xi , \cdot ) \right\Vert _{\text {TV}}: \ \eta _{y^c} = \xi _{y^c} \right\} . \end{aligned}$$

Under these conditions, a core of the operator \(\mathscr {L}\) is given by

$$\begin{aligned} D(\Omega ) := \left\{ f \in C(\Omega ): \ {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| f \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| } := \sum _{x \in S}\delta _x(f) < \infty \right\} , \end{aligned}$$

where for \(x \in S\)

$$\begin{aligned} \delta _x(f) := \sup _{\eta , \xi : \ \eta _{x^c} = \xi _{x^c}}\left|f(\eta ) - f(\xi ) \right| \end{aligned}$$

is the oscillation of a function \(f:\Omega \rightarrow \mathbb {R}\) at the site x. In addition, one can show the following estimates for \(\mathscr {L}\) and the action of the semigroup \((P_t)_{t \ge 0}\) generated by \(\mathscr {L}\). We will need these later on.
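Before turning to these estimates, here is a brute-force illustration of the oscillations \(\delta _x(f)\) and of the total oscillation \(\sum _{x \in S}\delta _x(f)\) on the illustrative finite configuration space \(\{0,1\}^4\); the two test functions are arbitrary choices and only serve to contrast a local observable with a non-local one.

```python
from itertools import product

sites = range(4)                                    # four sites, local state space {0, 1}
configs = list(product([0, 1], repeat=len(sites)))

def delta_x(f, x):
    """Oscillation of f at site x: sup over pairs of configurations agreeing off x."""
    osc = 0.0
    for eta in configs:
        flipped = list(eta)
        flipped[x] = 1 - flipped[x]
        osc = max(osc, abs(f(eta) - f(tuple(flipped))))
    return osc

triple_norm = lambda f: sum(delta_x(f, x) for x in sites)

f_local = lambda eta: float(eta[0])                 # depends on site 0 only
f_global = lambda eta: float(sum(eta))              # depends on every site

print([delta_x(f_local, x) for x in sites])         # [1.0, 0.0, 0.0, 0.0]
print(triple_norm(f_local), triple_norm(f_global))  # 1.0 4.0
```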

Lemma 2.1

Assume that the generator \(\mathscr {L}\) satisfies \(\mathbf {(L1)}-\mathbf {(L3)}\) and denote by \((P_t)_{t \ge 0}\) the semigroup generated by \(\mathscr {L}\).

  1. i.

    For \(f\in D(\Omega )\) we have \(P_tf \in D(\Omega )\) for all \(t \ge 0\) and

    $$\begin{aligned} {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| P_tf \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| } \le \exp \left( (M-\varepsilon )t\right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| f \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }. \end{aligned}$$
  2. ii.

    For all \(f \in D(\Omega )\) it holds that

    $$\begin{aligned} \left\Vert \mathscr {L}f \right\Vert _\infty \le \left( \sup _{x \in S}\sum _{\Delta \ni x}\sup _{\eta \in \Omega }c_\Delta (\eta , \Omega _\Delta ) \right) {\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| f \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| }. \end{aligned}$$

The constants are explicitly given by

$$\begin{aligned} M&= \sup _{x \in S} \sum _{\Delta \ni x}\sum _{y \ne x}\delta _y c_{\Delta } < \infty , \\ \varepsilon&= \inf _{x \in S} \inf _{\eta ,\zeta :\ \eta _{x^c} = \zeta _{x^c},\ \eta _x \ne \zeta _x} \sum _{\Delta \ni x} \left( c_{\Delta }\left( \eta , \{\xi _{\Delta } \in \Omega _\Delta : \xi _x = \zeta _x\}\right) + c_{\Delta }\left( \zeta , \{\xi _{\Delta } \in \Omega _\Delta : \xi _x = \eta _x\}\right) \right) . \end{aligned}$$

Proof

Combine the results from Proposition 3.2(a) and Theorem 3.9.(d) in [13, Chapter I]. \(\square \)

For our purposes, the mere well-definedness of an interacting particle system is not sufficient and we need to assume some more regularity. All the additional assumptions we impose will be used to make the generator of the time-reversal well-defined.

(R1):

For each \(\Delta \) and \(\eta \in \Omega \) the measure \(c_\Delta (\eta , d\xi _\Delta )\) is absolutely continuous with respect to  the reference measure \(\lambda _\Delta (d\xi _\Delta )\) with density \(c_\Delta (\eta , \cdot )\).

(R2):

For each \(\Delta \Subset S\) the map

$$\begin{aligned} \Omega \times \Omega _\Delta \ni (\eta , \xi _\Delta ) \mapsto c_\Delta (\eta , \xi _\Delta ) \in \mathbb {R}\end{aligned}$$

is continuous with respect to the product topology.

(R3):

The total rate of transition for a single site is uniformly bounded from above

$$\begin{aligned} \sup _{x \in S}\sum _{\Delta \ni x}\sup _{\eta \in \Omega }\left\Vert c_\Delta (\eta , \cdot ) \right\Vert _\infty < \infty . \end{aligned}$$
(R4):

The condition \(\mathbf {(L3)}\) is satisfied, i.e.,

$$\begin{aligned} \sup _{x \in S}\sum _{\Delta \ni x}\sum _{y \ne x} \delta _yc_\Delta < \infty . \end{aligned}$$
(R5):

There exists an \(R>0\) such that for all \(\Delta \Subset S\) with \(\left|\Delta \right| > R\) we have

$$\begin{aligned} \sup _{\eta \in \Omega ,\xi _\Delta \in \Omega _\Delta }c_\Delta (\eta ,\xi _\Delta ) = 0. \end{aligned}$$

We will comment on where and why we need these assumptions and their connection to the classical conditions (L1)–(L3) at the end of Sect. 2.1.2, after we have stated our assumptions on the local conditional distribution of the time-stationary measure \(\mu \).

2.1.2 Gibbs Measures and the DLR Formalism

We will mainly be interested in the situation where the process generated by \(\mathscr {L}\) admits a time-stationary measure \(\mu \) with a well-behaved local representation, namely that \(\mu \) is a Gibbs measure with respect to to a sufficiently nice specification \(\gamma \). Let us therefore first recall the general definition of a specification.

Definition 2.2

A specification \(\gamma = (\gamma _\Lambda )_{\Lambda \Subset S}\) is a family of probability kernels \(\gamma _{\Lambda }\) from \(\Omega _{\Lambda ^c}\) to \(\mathcal {M}_1(\Omega )\) that additionally satisfies the following properties.

  1. i.

    Each \(\gamma _{\Lambda }\) is proper, i.e., for all \(B \in \mathcal {F}_{\Lambda ^c}\) it holds that

    $$\begin{aligned} \gamma _\Lambda (B | \cdot ) = \textbf{1}_B(\cdot ). \end{aligned}$$
  2. ii.

    The probability kernels are consistent in the sense that if \(\Delta \subset \Lambda \Subset S\), then for all \(A \in \mathcal {F}\)

    $$\begin{aligned} \gamma _{\Lambda }\gamma _{\Delta }(A|\cdot ) = \gamma _\Lambda (A|\cdot ), \end{aligned}$$

    where the concatenation of two probability kernels is defined as usual via

    $$\begin{aligned} \gamma _\Lambda \gamma _\Delta (A |\eta ) = \int _\Omega \gamma _\Delta (A |\omega ) \gamma _\Lambda (d\omega |\eta ). \end{aligned}$$

For the existence and further properties of Gibbs measures with specification \(\gamma \) one needs to impose some conditions on the specification \(\gamma \). One sufficient condition for the existence of a Gibbs measure for a specification \(\gamma \) is quasilocality, see e.g. [5] or [6]. For the following sections we will need to assume some more regularity for the specification \(\gamma \). In particular, these assumptions will guarantee that \(\gamma \) is quasilocal.

(S1):

For each \(\Delta \Subset S\) and \(\eta \in \Omega \), the probability measure \(\gamma _\Delta (d\xi _\Delta |\eta )\) is absolutely continuous with respect to the reference measure \(\lambda _\Delta (d\xi _\Delta )\) with density \(\gamma _\Delta (\cdot |\eta )\).

(S2):

For all \(\Delta \Subset S\), the map

$$\begin{aligned} \Omega \times \Omega _\Delta \ni (\eta , \xi _\Delta ) \mapsto \gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c}) \in [0, \infty ) \end{aligned}$$

is continuous (with respect to the product topology).

(S3):

The conditional densities on the single spin spaces are uniformly bounded away from zero and infinity, i.e.,

$$\begin{aligned} 0< \delta \le \inf _{x \in S}\inf _{\eta \in \Omega }\gamma _{x}(\eta _x|\eta _{x^c}) \le \sup _{x \in S}\sup _{\eta \in \Omega }\gamma _x(\eta _x |\eta _{x^c}) \le \delta ^{-1} <\infty . \end{aligned}$$
(S4):

We have

$$\begin{aligned} \sup _{x \in S}\sum _{\Delta \ni x:\ c_\Delta >0} \sum _{y \ne x}\delta _y\gamma _\Delta < \infty , \end{aligned}$$

where

$$\begin{aligned} \delta _y\gamma _\Delta = \sup \left\{ \left\Vert \gamma _\Delta (d\xi _\Delta |\eta ) - \gamma _\Delta (d\xi _\Delta |\zeta ) \right\Vert _{\text {TV}}: \ \eta _{y^c}=\zeta _{y^c} \right\} . \end{aligned}$$

Remark 2.3

Now that we have stated all of the conditions that we need, let us briefly comment on why and where we need them.

  1. i.

    Assumption \(\mathbf {(R3)}\) clearly implies \(\mathbf {(L2)}\), and together with \(\mathbf {(R2)}\) and \(\mathbf {(R4)}\) it ensures that the interacting particle system is well-defined.

  2. ii.

    Assumptions \(\mathbf {(R1)}\) and \(\mathbf {(S1)}\) allow us to write down the local transition densities of the time-reversal and \(\mathbf {(S3)}\) makes sure that we are not performing a division by zero.

  3. iii.

    The further regularity assumptions \(\mathbf {(R3)}\), \(\mathbf {(R5)}\), \(\mathbf {(S3)}\) and \(\mathbf {(S4)}\), together with the continuity assumptions \(\mathbf {(R2)}\) and \(\mathbf {(S2)}\), make sure that the local transition densities of the time-reversal also give rise to a well-defined interacting particle system.

  4. iv.

    The quantity in \(\mathbf {(S4)}\) is similar to the classical Dobrushin uniqueness condition, see [6]. However, we only need it to be finite and not strictly smaller than one.

Example 2.4

One particular class of models to which our theory can be applied are spin systems for which the specification \(\gamma \) is defined via a potential \(\Phi = (\Phi _B)_{B \Subset S}\) that satisfies

$$\begin{aligned} \sup _{x \in S}\sum _{B \ni x} \left|B \right|\left\Vert \Phi _B \right\Vert _\infty < \infty , \end{aligned}$$

and where the rates are of the form

$$\begin{aligned} c_\Delta (\eta , \xi _\Delta ) = {\left\{ \begin{array}{ll} \exp \left( -\beta \sum _{B: B \cap \Delta \ne \emptyset }\Phi _B(\xi _\Delta \eta _{\Delta ^c})\right) , \quad &{}\text {if } \left|\Delta \right| = 1, \\ 0, &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$

Instead of these single-site updates one could also consider updates in larger regions with a bounded diameter. Then the rates satisfy \(\mathbf {(R1)}-\mathbf {(R5)}\) and the specification satisfies \(\mathbf {(S1)}-\mathbf {(S4)}\), as one can see by using similar arguments as in the proof of [5, Lemma 6.28]. Note that this class of examples includes models with long-range pairwise interactions \((\Phi _{\{x,y\}})_{x,y \in \mathbb {Z}^d}\) that satisfy

$$\begin{aligned} \left\Vert \Phi _{\{x,y\}} \right\Vert _\infty \sim \left\Vert x-y \right\Vert _1^{-\alpha } \end{aligned}$$

for some \(\alpha > d\).
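To make Example 2.4 concrete, the following sketch writes out the single-site rates for a nearest-neighbour Ising-type pair potential on a small periodic box; the box size, the inverse temperature \(\beta \) and the potential itself are illustrative choices only. For this potential every site lies in exactly two bonds of size two with \(\left\Vert \Phi _B \right\Vert _\infty = 1\), so the summability condition above holds trivially.

```python
import numpy as np

# illustrative nearest-neighbour pair potential Phi_{x,y}(eta) = -eta_x * eta_y on Z/nZ,
# with single-site (Glauber-type) update rates of the form given in Example 2.4
n, beta = 6, 0.5
neighbours = lambda x: [(x - 1) % n, (x + 1) % n]

def rate(eta, x, xi_x):
    """c_x(eta, xi_x) = exp(-beta * sum over bonds B containing x of Phi_B(xi_x eta_{x^c}))."""
    return np.exp(-beta * sum(-xi_x * eta[y] for y in neighbours(x)))

eta = np.array([1, -1, 1, 1, -1, -1])
print([rate(eta, x, -eta[x]) for x in range(n)])    # spin-flip rates at every site
```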

2.2 The Time-Reversal of an Interacting Particle System

In the notation introduced above, assume that \(\mu \in \mathscr {G}(\gamma )\) is a Gibbs measure for a quasilocal specification \(\gamma \), i.e., assume that \(\mu \) satisfies the DLR equations

$$\begin{aligned} \mu (f)=\mu (\gamma _{\Lambda }(f| \cdot )) \end{aligned}$$

for all \(\Lambda \Subset S\) and bounded measurable functions f. Further assume that \(\mu \) is time-stationary with respect to the Markovian dynamics with generator \(\mathscr {L}\). Denote the semigroup generated by \(\mathscr {L}\) by \((P_t)_{t \ge 0}\) and the corresponding process on \(\Omega \) by \((\eta (t))_{t \ge 0}\). As discussed in Sect. 1.2, for each fixed \(T>0\) the process \((\eta (T-t))_{0 \le t \le T}\) is again a time-homogeneous Markov process and under some suitable assumptions its associated semigroup has a generator \(\hat{\mathscr {L}}\). But what does this generator look like? For general Markov processes it is not possible to give a closed form expression, but in our case we can use the special structure of \(\mathscr {L}\) as the superposition of local dynamics in finite volumes. In each of these finite volumes, it is clear how the time-reversal with respect to \(\mu \) should look and we can hope that we can again write \(\hat{\mathscr {L}}\) as the superposition of finite volume processes. With this Ansatz, the probabilistic intuition dictates the educated guess

$$\begin{aligned} {\hat{c}}_\Delta (\eta , \xi _\Delta ) = c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _\Delta ) \frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})} \end{aligned}$$
(2.2)

for the transition densities appearing in the generator of the time-reversed interacting particle system. However, at this stage, it is not obvious that the generator of the time-reversed system is again of the form (2.1) and has precisely these rates. For Markov processes on finite state spaces this is an easy calculation but we have to put in some more work, which will be carried out in Sect. 3. We obtain the following result that extends results from [12] to a much more general setting.

Proposition 2.5

(Time-reversal generator) Let the rates of an interacting particle system with generator \(\mathscr {L}\) satisfy \(\mathbf {(R1)}-\mathbf {(R5)}\) and assume that \(\mu \) is a time-stationary measure for the corresponding Markov semigroup \((P_t)_{t \ge 0}\) on \(C(\Omega )\) that is generated by \(\mathscr {L}\) such that \(\mu \) is a Gibbs measure with respect to a specification \(\gamma \) that satisfies \(\mathbf {(S1)}-\mathbf {(S4)}\). Then, the time-reversed process has a generator \(\hat{\mathscr {L}}\) whose transition densities (with respect to the reference measures \(\lambda _\Delta \)) are given by

$$\begin{aligned} {\hat{c}}_\Delta (\eta , \xi _\Delta ) = c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _\Delta ) \frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})}. \end{aligned}$$

The proof of this can be found at the end of Sect. 3.
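On a finite configuration space the content of Proposition 2.5 can be verified by elementary linear algebra: there the conditional densities of \(\mu \) are ratios of point masses, the quotient in (2.2) collapses to \(\mu (\xi _\Delta \eta _{\Delta ^c})/\mu (\eta )\), and \(\hat{\mathscr {L}}\) coincides with the matrix time-reversal from Sect. 1.2. The sketch below (two sites, two local states, arbitrary illustrative single-site rates) builds the full generator, solves for \(\mu \), forms the rates (2.2) and checks the duality \(\int f \mathscr {L}g \, d\mu = \int (\hat{\mathscr {L}}f) g \, d\mu \).

```python
import numpy as np
from itertools import product

configs = list(product([0, 1], repeat=2))               # two sites, local states {0, 1}
idx = {eta: i for i, eta in enumerate(configs)}
rng = np.random.default_rng(1)
c = rng.uniform(0.5, 2.0, size=(2, 4))                  # c[x, idx[eta]]: illustrative flip rate at site x

def flip(eta, x):
    eta = list(eta)
    eta[x] = 1 - eta[x]
    return tuple(eta)

# forward generator built from the single-site rates
L = np.zeros((4, 4))
for eta in configs:
    for x in range(2):
        L[idx[eta], idx[flip(eta, x)]] += c[x, idx[eta]]
L -= np.diag(L.sum(axis=1))

# stationary measure mu (solve mu L = 0, normalised)
A = np.vstack([L.T, np.ones(4)])
mu = np.linalg.lstsq(A, np.array([0.0] * 4 + [1.0]), rcond=None)[0]

# time-reversed rates via (2.2): the quotient of conditional densities is mu(xi eta)/mu(eta)
Lhat = np.zeros((4, 4))
for eta in configs:
    for x in range(2):
        xi = flip(eta, x)
        Lhat[idx[eta], idx[xi]] += c[x, idx[xi]] * mu[idx[xi]] / mu[idx[eta]]
Lhat -= np.diag(Lhat.sum(axis=1))

# duality <f, L g>_mu = <Lhat f, g>_mu and agreement with the matrix time-reversal of Sect. 1.2
f, g = rng.normal(size=4), rng.normal(size=4)
assert np.allclose(mu @ (f * (L @ g)), mu @ ((Lhat @ f) * g))
assert np.allclose(Lhat, (L.T * mu[None, :]) / mu[:, None])
```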

2.3 Trajectorial Decay of \(\Phi \)-Entropies

With this auxiliary result at hand, we can then obtain the following result, which describes the dissipation of general \(\Phi \)-entropies on a trajectorial level. Before we state the theorem, let us introduce some further notation to express the main equation in a cleaner way. The Bregman \(\Phi \)-divergence associated with \(\Phi : I \rightarrow \mathbb {R}\) is defined as

$$\begin{aligned} \text {div} ^{\Phi }(p |q) := \Phi (p) - \Phi (q) - (p-q)\Phi '(q), \quad p,q \in I. \end{aligned}$$

This is precisely the difference between the value of \(\Phi \) at the point p and the value of the first-order Taylor expansion of \(\Phi \) around q, evaluated at p; it is non-negative since we assumed that \(\Phi \) is convex. Bregman divergences are sometimes also referred to as Bregman distances, despite not being metrics, since they are in general not symmetric and do not satisfy the triangle inequality. They are however still useful for applying techniques from optimisation theory in more general contexts, e.g. in statistical learning theory [1]. As an example, consider the classical entropy function \(\Phi : u \mapsto u \log u\). Then the associated Bregman divergence is given by

$$\begin{aligned} \text {div} ^{\Phi }(p |q) = p \log p - q \log q - (p-q)(\log q + 1) = p \log \left( \frac{p}{q}\right) - p + q. \end{aligned}$$
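The following short sketch numerically checks this formula and the non-negativity of Bregman divergences; the sample points are arbitrary and the helper function is only a sketch of the definition above.

```python
import numpy as np

def bregman(phi, dphi, p, q):
    """div^Phi(p | q) = Phi(p) - Phi(q) - (p - q) * Phi'(q)."""
    return phi(p) - phi(q) - (p - q) * dphi(q)

rng = np.random.default_rng(2)
p, q = rng.uniform(0.1, 5.0, size=100), rng.uniform(0.1, 5.0, size=100)

# entropy function Phi(u) = u log u: div^Phi(p | q) = p log(p/q) - p + q >= 0
d_ent = bregman(lambda u: u * np.log(u), lambda u: np.log(u) + 1.0, p, q)
assert np.allclose(d_ent, p * np.log(p / q) - p + q) and np.all(d_ent >= 0)

# quadratic case Phi(u) = u^2: div^Phi(p | q) = (p - q)^2
d_var = bregman(lambda u: u**2, lambda u: 2.0 * u, p, q)
assert np.allclose(d_var, (p - q) ** 2)
```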

Note that we now have to be careful with the probability space and filtration we are working with, since we are talking about results on a trajectorial level.

Theorem 2.6

(Trajectorial decay of \(\Phi \)-entropies) Let \((\Omega , \mathcal {A}, {\mathbb {P}})\) be a probability space on which the interacting particle system \((\eta (s))_{s \ge 0}\) is defined. Denote the generator of the interacting particle system by \(\mathscr {L}\), assume that its rates satisfy \(\mathbf {(R1)}-\mathbf {(R5)}\), and assume that under \({\mathbb {P}}\) we have \(\eta (0) \sim \mu \), where \(\mu \) is a time-stationary measure for the corresponding Markov semigroup \((P_t)_{t \ge 0}\) on \(C(\Omega )\) that is generated by \(\mathscr {L}\) such that \(\mu \) is a Gibbs measure with respect to a specification \(\gamma \) that satisfies \(\mathbf {(S1)}-\mathbf {(S4)}\). Let \(I \subset \mathbb {R}\) be an interval and \(\Phi : I \rightarrow \mathbb {R}\) be a continuously differentiable convex function. Then, for any \(f \in D(\Omega )\) such that \(f(\Omega ) \subset I\), and \(T>0\), the process defined by

$$\begin{aligned} L^{\Phi , f}(s) := \Phi (P_{T-s}f(\eta _{T-s})), \quad 0 \le s \le T, \end{aligned}$$
(2.3)

is a \((({\hat{\mathcal {G}}}_t)_{0 \le t \le T},{\mathbb {P}})\)-submartingale, where \({\hat{\mathcal {G}}}_t = \sigma (\eta (T-s): \ 0 \le s \le t)\). Its Doob–Meyer decomposition is given by

$$\begin{aligned} L^{\Phi , f}(t)&= M^{\Phi , f}(t) \\&\quad +\int _0^t \sum _{\Delta \Subset S}\int _{\Omega _\Delta }{\hat{c}}_\Delta (\eta (T-s), \xi _\Delta )\, \text {div} ^{\Phi }\left( P_{T-s}f(\xi _\Delta \eta _{\Delta ^c}(T-s)) \,\big |\, P_{T-s}f(\eta (T-s))\right) \lambda _\Delta (d\xi _\Delta ) ds, \end{aligned}$$
(2.4)

where \(M^{\Phi , f}\) denotes the martingale part of the decomposition.

The proof of this theorem can be found at the end of Sect. 4. While one can recover the classical DeBruijn-like decay of \(\Phi \)-entropies as stated in Proposition 1.1 by taking expectations, let us stress that Theorem 2.6 gives us a much more precise description of the process. Indeed, the submartingale property provides us with information on the behaviour of the process in conditional mean, and not just on the level of ensemble averages. So we can actually say that, irrespective of the current state and future trajectory of our system, we expect that the \(\Phi \)-entropy process has decreased in the past. Furthermore, the submartingale property allows us to make use of the strong machinery of (sub)martingale inequalities and thereby derive new results on the pathwise deviations from the ensemble average. Indeed, from Doob’s submartingale inequality, see [14, Theorem II.52.1], we immediately get the following explicit concentration bound for the trajectories:

$$\begin{aligned} \forall C >0: \quad {\mathbb {P}}\left[ \sup _{0 \le t < \infty } \Phi (P_tf(\eta _t)) \ge C\right] \le \frac{\int _\Omega \Phi (f) d\mu }{C}. \end{aligned}$$

This allows us to bound the probability of ever seeing large pathwise deviations from the ensemble average, as described by DeBruijn’s Theorem.

For the sake of concreteness, let us write out the result from Theorem 2.6 explicitly for one of the simplest cases, namely the trajectorial decay of variance, corresponding to \(\Phi : u \mapsto u^2\).

Corollary 2.7

(Trajectorial decay of variance) In the setting of Theorem 2.6, we have that, for any \(f \in D(\Omega )\) and \(T>0\), the process defined by \((P_{T-s}f(\eta _{T-s}))^2\), \(0 \le s \le T\), is a \((({\hat{\mathcal {G}}}_t)_{0 \le t \le T},{\mathbb {P}})\)-submartingale, where \({\hat{\mathcal {G}}}_t = \sigma (\eta (T-s): \ 0 \le s \le t)\). Its Doob–Meyer decomposition is given by

$$\begin{aligned} (P_{T-t}f(\eta _{T-t}))^2&= M^f(t)+ \int _0^t \sum _{\Delta \Subset S} \int _{\Omega _\Delta }{\hat{c}}_\Delta (\eta (T-s), \xi _\Delta ) \\&\quad \times \left[ P_{T-s}f(\xi _\Delta \eta _{\Delta ^c}(T-s)) - P_{T-s}f(\eta (T-s))\right] ^2\lambda _\Delta (d\xi _\Delta ) ds. \end{aligned}$$
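Indeed, for \(\Phi : u \mapsto u^2\) the Bregman divergence collapses to a squared increment,

$$\begin{aligned} \text {div} ^{u \mapsto u^2}(p |q) = p^2 - q^2 - 2q(p-q) = (p-q)^2, \end{aligned}$$

which, evaluated at \(p = P_{T-s}f(\xi _\Delta \eta _{\Delta ^c}(T-s))\) and \(q = P_{T-s}f(\eta (T-s))\), yields the compensator displayed above.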

2.4 Outlook

Even though we were able to show the trajectorial decay for the relative entropy under quite general assumptions on the dynamics, these results are not fully satisfactory in the context of statistical mechanics. The usually more interesting Lyapunov functional in this setting is the so-called relative entropy density, as e.g. considered in [5], which is not only defined for measures \(\nu \) that are absolutely continuous with respect to \(\mu \). Therefore, it would be much more natural to work with this functional \(h(\cdot |\mu ):\mathcal {M}_1^{inv}(\Omega ) \rightarrow \mathbb {R}\) instead. One can show that it is also a Lyapunov function for interacting particle systems under quite general assumptions, see [8], but it is somewhat unclear how to even formulate conjectures about the trajectorial properties of this functional, since one cannot naively evaluate it pointwise.

As we already saw in the case of a continuous-time Markov chain on a finite state space, the main ingredient for this type of result is to obtain an explicit description of the generator of the time-reversed process. Another class of processes that could be of interest and is not covered by our results are systems which evolve continuously on their single spin spaces, as opposed to our pure-jump processes. The first examples that come to mind are of course systems of (infinitely many) interacting diffusions, e.g. indexed by \(\mathbb {Z}^d\). We expect that, if a given system of interacting diffusions admits a Gibbs measure as an invariant probability measure, then a combination of the techniques in [9] and this article should yield analogous results, of course under some suitable regularity conditions on the coefficients.

3 The Time-Reversed Interacting Particle System and Its Generator

The main goal of this section is to prove Proposition 2.5, thereby establishing that the generator of the time-reversal is indeed given by \(\hat{\mathscr {L}}\). For this we will need to establish some regularity properties for the transition densities as defined in (2.2).

3.1 Upper and Lower Bounds for the Conditional Densities

Since we will need to deal with quotients involving the conditional densities \(\gamma _\Delta \) on arbitrary finite subsets \(\Delta \Subset S\), we will need to lift the upper and lower bounds from \(\mathbf {(S3)}\) to this more general case. This is essentially the content of the following lemma.

Lemma 3.1

Let \(\gamma \) be a specification that satisfies \(\mathbf {(S1)}\) and \(\mathbf {(S3)}\). Then, there exists a constant \(C \in (0,\infty )\) such that for all \(\Delta \Subset S\) we have the estimate

$$\begin{aligned} e^{-C\left|\Delta \right|} \le \inf _{ \eta \in \Omega }\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c}) \le \sup _{\eta \in \Omega } \gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c}) \le e^{C\left|\Delta \right|}. \end{aligned}$$

This constant is precisely given by \(C = \left|\log \delta \right|\).

Proof

For this, fix an enumeration \(i_1, \dots , i_k\) of the elements of \(\Delta \) and introduce the notation

$$\begin{aligned}{}[i_j, i_k] := \left\{ i_j, i_{j+1},\dots , i_k \right\} , \quad 1 \le j \le k. \end{aligned}$$

With this notation at hand, we can use the chain rule for conditional probability densities to write

$$\begin{aligned} \gamma _\Delta (\eta _{\Delta }|\eta _{\Delta ^c}) = \prod _{j=1}^k\gamma _{[i_1,i_j]}(\eta _{i_j}|\eta _{[i_{j+1},i_k]}\eta _{\Delta ^c}), \end{aligned}$$
(3.1)

where \(\gamma _{[i_1,i_j]}(\eta _{i_j}|\eta _{[i_{j+1},i_k]}\eta _{\Delta ^c})\) is the marginal density of the measure \(\gamma _{[i_1,i_j]}(d \eta _{[i_1,i_j]} |\eta _{[i_{j+1},i_k]} \eta _{\Delta ^c})\) with respect to the site \(i_j\). But, using consistency of the specification \(\gamma \), we have

$$\begin{aligned} \gamma _{[i_1,i_j]}(\eta _{i_j}|\eta _{[i_{j+1},i_k]}\eta _{\Delta ^c})&=\int \gamma _{[i_1,i_j]}(d\xi _{[i_1,i_j]}|\eta _{[i_{j+1},i_k]}\eta _{\Delta ^c})\gamma _{i_j}(\eta _{i_j}|\xi _{[i_1,i_{j-1}]}\eta _{[i_{j+1},i_k]}\eta _{\Delta ^c}), \end{aligned}$$

which is, by assumption, upper bounded by \(\delta ^{-1}\) and lower bounded by \(\delta \). In conjunction with the representation (3.1) this implies the desired upper and lower bound where the constant C is explicitly given by \(C= \left|\log (\delta ) \right|\). \(\square \)

As a corollary we now get the following estimate for the quotients that appear in the definition (2.2) of the transition densities of the time-reversal.

Lemma 3.2

Let \(\gamma \) be a specification that satisfies \(\mathbf {(S1)}\) and \(\mathbf {(S3)}\). Then, for all \(\Delta \Subset S\), \(\eta \in \Omega \) and \(\xi _\Delta \in \Omega _\Delta \), we have

$$\begin{aligned} 0 < e^{-2C \left|\Delta \right|} \le \frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})} \le e^{2C \left|\Delta \right|}. \end{aligned}$$

3.2 The Switching Lemma

Now that we can be sure that the densities as in (2.2) are actually well-defined and we are not performing a division by zero, we can start showing that \(\hat{\mathscr {L}}\) is indeed the generator of the time-reversed process. The main technical tool will be the following lemma.

Lemma 3.3

Let the rates of a well-defined interacting particle system with generator \(\mathscr {L}\) satisfy \(\mathbf {(R1)}\) and assume that \(\mu \) is a time-stationary measure for \(\mathscr {L}\) and \(\mu \) is a Gibbs measure with respect to a specification \(\gamma \) that satisfies \(\mathbf {(S1)}\) and \(\mathbf {(S3)}\). Then, we have for all bounded and measurable \(f,g:\Omega \rightarrow \mathbb {R}\) and \(\Delta \Subset S\) that

$$\begin{aligned}&\int _{\Omega _\Delta }\int _{\Omega }c_{\Delta }(\omega , \xi _{\Delta })f(\omega )g(\xi _{\Delta }\omega _{\Delta ^c})\mu (d\omega ) \lambda _\Delta (d\xi _\Delta ) \nonumber \\&\quad = \int _{\Omega _\Delta }\int _{\Omega }{\hat{c}}_{\Delta }(\omega , \xi _{\Delta })f(\xi _{\Delta }\omega _{\Delta ^c})g(\omega )\mu (d\omega )\lambda _\Delta (d\xi _\Delta ), \end{aligned}$$
(3.2)

where

$$\begin{aligned} {\hat{c}}_\Delta (\eta , \xi _\Delta ) = c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _\Delta ) \frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})}. \end{aligned}$$

To keep the notation for conditional expectations in the upcoming proof simple, we will denote integration with respect to \(\mu \) by \(\mathbb {E}[\cdot ]\).

Proof

As a first step, note that for fixed \(\Delta \Subset S\) and \(\xi _\Delta \in \Omega _\Delta \) the maps

$$\begin{aligned} \Omega \ni \omega \mapsto g(\xi _{\Delta }\omega _{\Delta ^c}) \in \mathbb {R}, \quad \Omega \ni \omega \mapsto f(\xi _{\Delta }\omega _{\Delta ^c}) \in \mathbb {R}, \end{aligned}$$

are \(\mathcal {F}_{\Delta ^c}\)-measurable. Therefore, we can use that \(\gamma \) is the local conditional distribution of \(\mu \) and the definition of the rates \({\hat{c}}\) to obtain the \(\mu \)-almost-sure identity

$$\begin{aligned}&\mathbb {E}\left[ c_{\Delta }(\cdot , \xi _{\Delta })f(\cdot )g(\xi _{\Delta }\cdot _{\Delta ^c}) |\mathcal {F}_{\Delta ^c} \right] (\omega ) \\&\quad = g(\xi _{\Delta }\omega _{\Delta ^c}) \mathbb {E}\left[ c_{\Delta }(\cdot , \xi _{\Delta })f(\cdot ) |\mathcal {F}_{\Delta ^c} \right] (\omega ) \\&\quad = g(\xi _{\Delta }\omega _{\Delta ^c}) \int _{\Omega _\Delta } \gamma _{\Delta }(\zeta _{\Delta }|\omega _{\Delta ^c})c_{\Delta }(\zeta _{\Delta }\omega _{\Delta ^c}, \xi _{\Delta })f(\zeta _{\Delta }\omega _{\Delta ^c})\lambda _\Delta (d\zeta _\Delta ) \\&\quad = g(\xi _{\Delta }\omega _{\Delta ^c}) \int _{\Omega _\Delta } \gamma _{\Delta }(\xi _{\Delta }|\omega _{\Delta ^c}){\hat{c}}_{\Delta }(\xi _{\Delta }\omega _{\Delta ^c}, \zeta _{\Delta })f(\zeta _{\Delta }\omega _{\Delta ^c}) \lambda _\Delta (d\zeta _\Delta ). \end{aligned}$$

If we now integrate over \(\xi _\Delta \), exchange the order of integration (via Fubini) and apply the same arguments in reverse, with \(f\) taking the role of \(g\) and vice versa, we get

$$\begin{aligned}&\int _{\Omega _\Delta } \mathbb {E}\left[ c_{\Delta }(\cdot , \xi _{\Delta })f(\cdot )g(\xi _{\Delta }\cdot _{\Delta ^c}) |\mathcal {F}_{\Delta ^c} \right] (\eta ) \lambda _\Delta (d\xi _\Delta ) \\&\quad = \int _{\Omega _\Delta } \mathbb {E}\left[ {\hat{c}}_{\Delta }(\cdot , \zeta _{\Delta })f(\zeta _{\Delta }\cdot _{\Delta ^c})g(\cdot ) |\mathcal {F}_{\Delta ^c} \right] (\eta )\lambda _\Delta (d\zeta _\Delta ). \end{aligned}$$

By integrating both sides with respect to \(\mu \), exchanging the order of integration, and applying the law of total expectation we obtain

$$\begin{aligned}&\int _{\Omega _\Delta }\int _{\Omega }c_{\Delta }(\omega , \xi _{\Delta })f(\omega )g(\xi _{\Delta }\omega _{\Delta ^c})\mu (d\omega ) \lambda _\Delta (d\xi _\Delta ) \\&= \int _{\Omega _\Delta }\int _{\Omega }{\hat{c}}_{\Delta }(\omega , \zeta _{\Delta })f(\zeta _{\Delta }\omega _{\Delta ^c})g(\omega )\mu (d\omega )\lambda _\Delta (d\zeta _\Delta ), \end{aligned}$$

as desired. \(\square \)

3.3 Regularity of the Time-Reversal Rates

To make sure that \(\hat{\mathscr {L}}\) is actually the generator of a well-defined interacting particle system we now show that the collection of transition measures \(({\hat{c}}_\Delta (\cdot , \cdot ))_{\Delta \Subset S}\) satisfies the three conditions (L1)–(L3).

Proposition 3.4

Let the rates of an interacting particle system with generator \(\mathscr {L}\) satisfy \(\mathbf {(R1)}-\mathbf {(R5)}\) and assume that \(\mu \) is a time-stationary measure for \(\mathscr {L}\) such that \(\mu \) is a Gibbs measure with respect to a specification \(\gamma \) that satisfies (S1)–(S4). Then, the transition measures \(({\hat{c}}_\Delta (\cdot , d\xi _\Delta ))_{\Delta \Subset S}\) with \(\lambda _\Delta \)-densities given by

$$\begin{aligned} {\hat{c}}_\Delta (\eta , \xi _\Delta ) = c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _\Delta ) \frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})} \end{aligned}$$

satisfy the conditions (L1)–(L3).

Proof

Ad \(\mathbf {(L1)}\): This follows from the continuity assumptions \(\mathbf {(R2)}\) and \(\mathbf {(S2)}\), together with assumption \(\mathbf {(S3)}\) and Lemma 3.2.

Ad \(\mathbf {(L2)}\): Note that for fixed \(\Delta \Subset S\), \(\xi _\Delta \in \Omega _\Delta \) and \(\eta \in \Omega \) we have by Lemma 3.2 and assumption \(\mathbf {(R5)}\)

$$\begin{aligned} \left|{\hat{c}}_\Delta (\eta , \xi _\Delta ) \right| = \left|c_\Delta (\xi _\Delta \eta _{\Delta ^c},\eta _\Delta )\frac{\gamma _\Delta (\xi _\Delta |\eta _{\Delta ^c})}{\gamma _\Delta (\eta _\Delta |\eta _{\Delta ^c})} \right| \le \frac{1}{\delta }e^R c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _\Delta ). \end{aligned}$$

So we get

$$\begin{aligned} \sup _{\eta \in \Omega }{\hat{c}}_\Delta (\eta , \Omega _\Delta ) = \sup _{\eta \in \Omega }\int _{\Omega _\Delta }{\hat{c}}_\Delta (\eta , \xi _\Delta )\lambda _\Delta (d\xi _\Delta )&\le \frac{1}{\delta }e^R \sup _{\eta \in \Omega }\int _{\Omega _\Delta }c_\Delta (\xi _\Delta \eta _{\Delta ^c}, \eta _{\Delta })\lambda _\Delta (d\xi _\Delta ) \\&\le \frac{1}{\delta }e^R \sup _{\eta \in \Omega } \left\Vert c_\Delta (\eta , \cdot ) \right\Vert _\infty . \end{aligned}$$

Therefore, assumption \((\textbf{R3})\) implies that \(\mathbf {(L2)}\) is also satisfied.

Ad \(\mathbf {(L3)}\): Fix \(\Delta \Subset S\), \(y \in S\) and two configurations \(\eta ^1, \eta ^2\) that only disagree at y. Then, for any \(\xi _\Delta \in \Omega _\Delta \) we have

$$\begin{aligned}&\left|{\hat{c}}_\Delta \left( \eta ^1, \xi _\Delta \right) -{\hat{c}}_\Delta \left( \eta ^2, \xi _\Delta \right) \right|\\&= \left| c_\Delta \left( \xi _\Delta \eta ^1_{\Delta ^c}, \eta ^1_\Delta \right) \frac{\gamma _\Delta \left( \xi _\Delta |\eta ^1_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^1_\Delta |\eta ^1_{\Delta ^c}\right) } -c_\Delta \left( \xi _\Delta \eta ^2_{\Delta ^c},\eta ^2_\Delta \right) \frac{\gamma _\Delta \left( \xi _\Delta |\eta ^2_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^2_\Delta |\eta ^2_{\Delta ^c}\right) } \right| \\&\le \left|c_\Delta \left( \xi _\Delta \eta ^1_{\Delta ^c}, \eta ^1_\Delta \right) \right| \left|\frac{\gamma _\Delta \left( \xi _\Delta |\eta ^1_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^1_\Delta |\eta ^1_{\Delta ^c}\right) } - \frac{\gamma _\Delta \left( \xi _\Delta |\eta ^2_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^2_\Delta |\eta ^2_{\Delta ^c}\right) } \right| \\&\quad + \left|\frac{\gamma _\Delta \left( \xi _\Delta |\eta ^2_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^2_\Delta |\eta ^2_{\Delta ^c}\right) } \right| \left|c_\Delta \left( \xi _\Delta \eta ^1_{\Delta ^c}, \eta ^1_{\Delta }\right) - c_\Delta \left( \xi _\Delta \eta ^2_{\Delta ^c}, \eta ^2_{\Delta }\right) \right|. \end{aligned}$$

To estimate this further, we will have to make a case distinction over whether the site y is contained in \(\Delta \) or not. If y is contained in \(\Delta \), then we can naively use Lemma 3.2 and assumption \(\mathbf {(R5)}\) to obtain the rough estimate

$$\begin{aligned} \left|{\hat{c}}_\Delta \left( \eta ^1, \xi _\Delta \right) -{\hat{c}}_\Delta \left( \eta ^2, \xi _\Delta \right) \right| \le 4 \frac{1}{\delta }e^R \sup _{\eta \in \Omega , \xi _\Delta \in \Omega _\Delta }\left|c_\Delta \left( \eta , \xi _\Delta \right) \right| \le \frac{4 e^R K(c)}{\delta }. \end{aligned}$$

In the case where y is not contained in \(\Delta \), we can (and have to) be a bit more precise. Via the elementary algebraic rule

$$\begin{aligned} ac - bd = \frac{1}{2}\left[ (a-b)(c+d) + (a+b)(c-d)\right] , \end{aligned}$$

and Lemma 3.2 plus assumption \(\mathbf {(R5)}\) one obtains

$$\begin{aligned}&\left|c_\Delta \left( \xi _\Delta \eta ^1_{\Delta ^c}, \eta ^1_\Delta \right) \right| \left|\frac{\gamma _\Delta \left( \xi _\Delta |\eta ^1_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^1_\Delta |\eta ^1_{\Delta ^c}\right) } - \frac{\gamma _\Delta \left( \xi _\Delta |\eta ^2_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^2_\Delta |\eta ^2_{\Delta ^c}\right) } \right| + \left|\frac{\gamma _\Delta \left( \xi _\Delta |\eta ^2_{\Delta ^c}\right) }{\gamma _\Delta \left( \eta ^2_\Delta |\eta ^2_{\Delta ^c}\right) } \right| \left|c_\Delta \left( \xi _\Delta \eta ^1_{\Delta ^c}, \eta ^1_{\Delta }\right) - c_\Delta \left( \xi _\Delta \eta ^2_{\Delta ^c}, \eta ^2_{\Delta }\right) \right| \\&\quad = \frac{1}{2} \left|c_{\Delta }\left( \xi _{\Delta }\eta ^1_{\Delta ^c}, \eta ^1_{\Delta }\right) \right| \left|\frac{1}{\gamma _{\Delta }\left( \eta ^1_{\Delta }|\eta ^1_{\Delta ^c}\right) \gamma _{\Delta }\left( \eta ^2_{\Delta }|\eta ^2_{\Delta ^c}\right) } \right| \\&\qquad \times \left|\gamma _{\Delta }\left( \xi _{\Delta }|\eta ^1_{\Delta ^c}\right) - \gamma _{\Delta }\left( \xi _{\Delta }|\eta ^2_{\Delta ^c}\right) \right| \left|\gamma _{\Delta }\left( \eta ^1_{\Delta }|\eta ^1_{\Delta ^c}\right) +\gamma _{\Delta }\left( \eta ^2_{\Delta }|\eta ^2_{\Delta ^c}\right) \right| \\&\qquad + \left|\frac{\gamma _{\Delta }\left( \xi _{\Delta }|\eta ^2_{\Delta ^c}\right) }{\gamma _{\Delta }\left( \eta ^2_{\Delta }|\eta ^2_{\Delta ^c}\right) } \right| \left|c_{\Delta }\left( \xi _{\Delta }\eta ^1_{\Delta ^c}, \eta ^1_{\Delta }\right) -c_{\Delta }\left( \xi _{\Delta }\eta ^2_{\Delta ^c}, \eta ^2_{\Delta }\right) \right| \\&\quad \le \frac{1}{2\delta ^2}e^{2R} K(c)K(\gamma )\left|\gamma _{\Delta }\left( \xi _{\Delta }|\eta ^1_{\Delta ^c}\right) - \gamma _{\Delta }\left( \xi _{\Delta }|\eta ^2_{\Delta ^c}\right) \right| + \frac{1}{\delta }e^R \left|c_{\Delta }\left( \xi _{\Delta }\eta ^1_{\Delta ^c}, \eta ^1_{\Delta }\right) -c_{\Delta }\left( \xi _{\Delta }\eta ^2_{\Delta ^c}, \eta ^2_{\Delta }\right) \right|. \end{aligned}$$

Now, by integrating this pointwise difference of the densities over \(\xi _\Delta \), we obtain via all of the other assumptions that

$$\begin{aligned} \sup _{x \in S}\sum _{\Delta \ni x}\sum _{y \ne x} \delta _y{\hat{c}}_\Delta < \infty . \end{aligned}$$

But this is precisely \(\mathbf {(L3)}\) and the proof is finished. \(\square \)

With these two intermediate results at hand, we can now show that \(\hat{\mathscr {L}}\) is indeed the generator of the time-reversal of \((\eta _t)_{t \ge 0}\) (with respect to the time-stationary measure \(\mu \)).

Proof of Proposition 2.5

It only remains to show that for all \(f,g \in D(\Omega )\) we have

$$\begin{aligned} \int _\Omega f(\omega ) \mathscr {L}g(\omega ) \mu (d\omega ) = \int _\Omega \left( \hat{\mathscr {L}}f\right) (\omega )g(\omega )\mu (d\omega ), \end{aligned}$$

since then the claimed time-reversal duality follows from Lemma A.4.

For this, we first note that it suffices to show that the duality relation for the generators holds for all local functions \(f,g: \Omega \rightarrow \mathbb {R}\). Indeed, if it holds for all pairs of local functions, we can then extend it to functions with bounded total oscillation by using the estimates from Lemma 2.1 and dominated convergence. Therefore, let \(f,g\) be two local functions and let \(\Lambda \Subset S\) be sufficiently large such that both f and g only depend on coordinates in \(\Lambda \). By first applying Lemma 3.3 and then using that \(\mu \) is time-stationary with respect to the Markovian dynamics generated by \(\mathscr {L}\), we see that

$$\begin{aligned}&\int _{\Omega }f(\omega )\mathscr {L}g(\omega )\mu (d\omega ) - \int _{\Omega }\left( \hat{\mathscr {L}}f(\omega )\right) g(\omega )\mu (d\omega ) \\&\quad = \sum _{\Delta \cap \Lambda \ne \emptyset }\int _{\Omega _\Delta } \int _{\Omega }c_{\Delta }(\omega , \xi _{\Delta })[f\cdot g(\xi _{\Delta }\omega _{\Delta ^c})- f\cdot g(\omega )]\mu (d\omega )\lambda _\Delta (d\xi _\Delta ) \nonumber \\&\quad = \int _{\Omega }\mathscr {L} (f\cdot g) (\omega )\mu (d\omega ) = 0, \end{aligned}$$

which finishes the proof. \(\square \)

4 Trajectorial Decay of \(\Phi \)-Entropies

In this section we use the time-reversed process and a martingale argument to prove Theorem 2.6.

4.1 The Time-Dependent Martingale Property

The main technical tool will be the following lemma which can be seen as an extension of [14, Lemma IV.20.12] to our setting.

Lemma 4.1

Let \(\mathscr {L}\) be the generator of an interacting particle system \((\eta (s))_{s \ge 0}\) such that its transition rates satisfy \((\textbf{L1})-(\textbf{L3})\) and let \(\mu \) be a time-stationary measure with respect to \(\mathscr {L}\). Then, for all \(f:[0,\infty ) \times \Omega \rightarrow \mathbb {R}\) such that

  1. i.

    \(f(\cdot , \eta ) \in C^1([0,\infty ))\) for all \(\eta \in \Omega \) and

  2. ii.

    for all \(T > 0\) it holds that

    $$\begin{aligned} \sup _{0 \le t \le T}{\left| \hspace{-1.0625pt}\left| \hspace{-1.0625pt}\left| f(t,\cdot ) \right| \hspace{-1.0625pt}\right| \hspace{-1.0625pt}\right| } < \infty , \end{aligned}$$

the process defined by

$$\begin{aligned} f(t, \eta (t)) - \int _0^t (\partial _s + \mathscr {L})f(s,\eta (s))ds \end{aligned}$$

is a martingale with respect to the filtration \(\mathcal {G}_t:= \sigma (\eta (u): 0 \le u \le t)\).

The proof of this lemma is not difficult but hard to find in the existing literature, so we give it in some detail.

Proof

For functions f as above, we define

$$\begin{aligned} M(s) := f(s,\eta (s)) - \int _0^s (\partial _u + \mathscr {L})f(u,\eta (u))du, \quad s \ge 0 \end{aligned}$$

and denote by \((\mathcal {P}_t)_{t \ge 0}\) the space-time semigroup generated by \((\partial _s + \mathscr {L})\), i.e., \(\mathcal {P}_t f(s,\eta ) = {\mathbb {E}}\left[ f(s+t, \eta (t)) \,|\, \eta (0) = \eta \right] \), which we distinguish notationally from the semigroup \((P_t)_{t \ge 0}\) generated by \(\mathscr {L}\). Then, for \(s<t\), the Markov property and the elementary identity

$$\begin{aligned} \frac{d}{dt}\mathcal {P}_t = \mathcal {P}_t(\partial _t + \mathscr {L}) = (\partial _t + \mathscr {L})\mathcal {P}_t, \end{aligned}$$

give us

$$\begin{aligned}&{\mathbb {E}}\left[ f(t, \eta (t)) - \int _0^t (\partial _u + \mathscr {L})f(u,\eta (u))du\Big |\mathcal {G}_s \right] \\&\quad = \mathcal {P}_{t-s}f(s, \eta (s)) - \int _0^s (\partial _u + \mathscr {L})f(u, \eta (u))du - \int _s^t \mathcal {P}_{u-s}(\partial _u + \mathscr {L})f(s, \eta (s))du \\&\quad = \mathcal {P}_{t-s}f(s, \eta (s)) - \int _0^s (\partial _u + \mathscr {L})f(u, \eta (u))du - \int _s^t \frac{d}{du}\mathcal {P}_{u-s}f(s, \eta (s))du \\&\quad = f(s,\eta (s)) - \int _0^s (\partial _u + \mathscr {L})f(u,\eta (u))du. \end{aligned}$$

This shows that the process \((M(s))_{s \ge 0}\) is indeed a martingale. \(\square \)
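The statement can at least be sanity-checked in mean on a finite state space: taking expectations from a deterministic initial state, the martingale property above reduces to \(\frac{d}{dt}{\mathbb {E}}_x\left[ f(t,\eta (t))\right] = {\mathbb {E}}_x\left[ (\partial _t + \mathscr {L})f(t,\eta (t))\right] \). The sketch below verifies this identity numerically; the \(3\times 3\) generator and the time-dependent observable are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import expm

L = np.array([[-1.0, 0.7, 0.3],
              [0.2, -0.5, 0.3],
              [0.6, 0.4, -1.0]])
f = lambda t: np.array([np.sin(t), t**2, np.exp(-t)])        # f(t, .) as a vector over states
dtf = lambda t: np.array([np.cos(t), 2 * t, -np.exp(-t)])    # its time derivative

t, h = 0.8, 1e-5
# d/dt E_x[f(t, eta_t)] via a central difference of t -> exp(tL) f(t, .)
lhs = (expm((t + h) * L) @ f(t + h) - expm((t - h) * L) @ f(t - h)) / (2 * h)
# E_x[(d_t + L) f(t, eta_t)] computed directly
rhs = expm(t * L) @ (dtf(t) + L @ f(t))
assert np.allclose(lhs, rhs, atol=1e-6)
```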

This abstract tool now lets us establish the analogue of the first step in the case of a finite state space considered in Sect. 1.2.

Proposition 4.2

Let \((\Omega , \mathcal {A}, {\mathbb {P}})\) be a probability space on which the interacting particle system \((\eta (s))_{s \ge 0}\) is defined. Denote the generator of \((\eta (s))_{s \ge 0}\) by \(\mathscr {L}\), assume that the rates satisfy (R1)–(R5), and assume that under \(\mathbb {P}\) we have \(\eta (0) \sim \mu \), where \(\mu \) is a time-stationary measure for the corresponding Markov semigroup \((P_t)_{t \ge 0}\) on \(C(\Omega )\) generated by \(\mathscr {L}\) and a Gibbs measure with respect to a specification \(\gamma \) that satisfies (S1)–(S4). Then, for all \(f \in D(\Omega )\) and \(T > 0\), the process defined by

$$\begin{aligned} P_{T-s}f(\eta (T-s)), \quad 0 \le s \le T, \end{aligned}$$

is a \((({\hat{\mathcal {G}}}_t)_{0 \le t \le T}, {\mathbb {P}})\)-martingale, where \({\hat{\mathcal {G}}}_t = \sigma (\eta (T-s): \ 0 \le s \le t)\).

Proof

Note that, by Proposition 3.4, the time-reversed process is again an interacting particle system whose rates satisfy \(\mathbf {(L1)}-\mathbf {(L3)}\), so that by Lemma 2.1 we can apply Lemma 4.1 (with \(\hat{\mathscr {L}}\) in place of \(\mathscr {L}\)) to the function

$$\begin{aligned}{}[0,T] \times \Omega \ni (s, \eta ) \mapsto P_{T-s} f(\eta ). \end{aligned}$$

But since we have by the chain rule

$$\begin{aligned} \partial _s P_{T-s}f = -\hat{\mathscr {L}}P_{T-s}f, \end{aligned}$$

the correction term cancels out and we obtain the claimed martingale property. \(\square \)

4.2 Trajectorial Decay of \(\Phi \)-Entropies

With this preliminary result in place, we can now come to the proof of our main result.

Proof of Theorem 2.6

Submartingale property: By Jensen’s inequality and Proposition 4.2 we immediately see that the process \((L^{\Phi ,f}(t))_{0 \le t \le T}\), as defined in (2.3), is a submartingale.

Doob–Meyer decomposition: Here we want to apply Lemma 4.1 to the function

$$\begin{aligned} g: [0,T] \times \Omega \ni (s, \eta ) \mapsto \Phi (P_{T-s}f(\eta )) \in \mathbb {R}. \end{aligned}$$

Via the chain rule we see that

$$\begin{aligned} \partial _s g(s, \cdot ) = \partial _s \Phi (P_{T-s}f(\cdot )) = -\Phi '(P_{T-s}f(\cdot ))\hat{\mathscr {L}}P_{T-s}f(\cdot ). \end{aligned}$$

Applying the generator \(\hat{\mathscr {L}}\) for fixed \(s \in [0,T]\) yields

$$\begin{aligned} \hat{\mathscr {L}}g(s, \eta ) = \sum _{\Delta \Subset S}\int _{\Omega _\Delta }{\hat{c}}_\Delta (\eta , \xi _\Delta )\left[ \Phi (P_{T-s}f(\xi _\Delta \eta _{\Delta ^c}))-\Phi (P_{T-s}f(\eta )) \right] \lambda _\Delta (d\xi _\Delta ). \end{aligned}$$

By putting these two ingredients together and using the previously introduced notation for the Bregman \(\Phi \)-divergence we obtain

$$\begin{aligned} (\partial _s + \hat{\mathscr {L}})g(s,\eta ) = \sum _{\Delta \Subset S}\int _{\Omega _\Delta }{\hat{c}}_\Delta (\eta , \xi _\Delta )\text {div} ^{\Phi }(P_{T-s}f(\xi _\Delta \eta _{\Delta ^c}) |P_{T-s}f(\eta ))\lambda _\Delta (d\xi _\Delta ). \end{aligned}$$

The claimed Doob–Meyer decomposition (2.4) of the submartingale \(L^{\Phi ,f}\) now follows from Lemma 4.1. \(\square \)