
Is the Boltzmann Equation Reversible? A Large Deviation Perspective on the Irreversibility Paradox


We consider the kinetic theory of dilute gases in the Boltzmann–Grad limit. We propose a new perspective based on a large deviation estimate for the probability of the empirical distribution dynamics. Assuming Boltzmann’s molecular chaos hypothesis (Stosszahlansatz), we derive a large deviation rate function, or action, that describes the stochastic process for the empirical distribution. The quasipotential for this action is the negative of the entropy when the conservation laws are verified, as should be expected. While the Boltzmann equation appears as the most probable evolution, corresponding to a law of large numbers, the action describes a genuine reversible stochastic process for the empirical distribution, in agreement with the microscopic reversibility. As a consequence, this large deviation perspective gives the expected meaning to the Boltzmann equation and explains its irreversibility as the natural consequence of limiting the physical description to the most probable evolution. More interestingly, it also quantifies the probability of any dynamical evolution departing from solutions of the Boltzmann equation. This picture is fully compatible with the heuristic classical view of irreversibility, but makes it much more precise in various ways. We also explain that this large deviation action provides a natural gradient structure for the Boltzmann equation.



  1. Bertini, L., De Sole, A., Gabrielli, D., Jona-Lasinio, G., Landim, C.: Macroscopic fluctuation theory. Rev. Mod. Phys. 87(2), 593 (2015)


  2. Binney, J., Tremaine, S.: Galactic Dynamics, p. 747. Princeton University Press, Princeton, NJ (1987)


  3. Bodineau, T., Gallagher, I., Saint-Raymond, L.: From hard sphere dynamics to the Stokes–Fourier equations: an analysis of the Boltzmann–Grad limit. Ann. PDE 3(1), 2 (2017)


  4. Bodineau, T., Gallagher, I., Saint-Raymond, L., Simonella, S.: Fluctuation theory in the Boltzmann–Grad limit. J. Stat. Phys. (2020)


  5. Bogoliubov, N.: Problems of a dynamical theory in statistical physics. Gostekhisdat, Moscow (1946) (in Russian). English translation in de Boer, J., Uhlenbeck, G.E. (eds.) Studies in Statistical Mechanics, vol. 1 (1962)

  6. Boltzmann, L.: Entgegnung auf die wärmetheoretischen Betrachtungen des Hrn. E. Zermelo. Ann. Phys. 293(4), 773–784 (1896)


  7. Boltzmann, L.: Vorlesungen über Gastheorie, 2 vols. Barth, Leipzig (1896)

  8. Brush, S.G.: Kinetic Theory: The Nature of Gases and of Heat, vol. 1. Elsevier, Amsterdam (2013)


  9. Brush, S.G.: Kinetic Theory: Irreversible Processes. Elsevier, Amsterdam (2016)


  10. Cercignani, C.: The Boltzmann equation. In: The Boltzmann Equation and Its Applications, pp. 40–103. Springer, New York (1988)


  11. Chapman, S., Cowling, T.G., Burnett, D.: The Mathematical Theory of Non-uniform Gases: An Account of the Kinetic Theory of Viscosity, Thermal Conduction and Diffusion in Gases. Cambridge University Press, Cambridge (1990)


  12. Dawson, D.A., Gärtner, J.: Large deviations from the McKean–Vlasov limit for weakly interacting diffusions. Stochastics (1987)

  13. Derrida, B., Lebowitz, J.L.: Exact large deviation function in the asymmetric exclusion process. Phys. Rev. Lett. 80, 209–213 (1998)


  14. Desvillettes, L., Villani, C.: On the trend to global equilibrium for spatially inhomogeneous kinetic systems: the Boltzmann equation. Invent. Math. 159(2), 245–316 (2005)


  15. Eyink, G.L., Spohn, H.: Space-time invariant states of the ideal gas with finite number, energy, and entropy density. Transl. Am. Math. Soc. 2(198), 71–90 (2000)


  16. Feng, J., Kurtz, T.G.: Large Deviations for Stochastic Processes, vol. 131. American Mathematical Society, Providence, RI (2006)


  17. Feynman, R.: The Character of Physical Law. 1965. Cox and Wyman Ltd., London (1967)


  18. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Springer, Berlin, New York (1984)


  19. Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, 3rd edn. Springer-Verlag, New York (2012)


  20. Frisch, U., Hasslacher, B., Pomeau, Y.: Lattice-gas automata for the Navier–Stokes equation. Phys. Rev. Lett. 56(14), 1505 (1986)


  21. Gallagher, I., Saint Raymond, L., Texier, B.: From Newton to Boltzmann: the case of hard spheres and short-range potentials. European Mathematical Society, p. 150 (2014)

  22. Gardiner, C.W.: Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 2nd edn., corr. 3rd printing. Springer Series in Synergetics. Springer, Berlin (1994)

  23. Gaspard, P.: Time-reversal symmetry relation for nonequilibrium flows ruled by the fluctuating Boltzmann equation. Physica A 392(4), 639–655 (2013)


  24. Goldstein, S., Lebowitz, J.L.: On the (Boltzmann) entropy of non-equilibrium systems. Physica D 193, 53 (2004)


  25. Kac, M.: Foundations of kinetic theory. In: Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 3, pp. 171–197. University of California Press, Berkeley and Los Angeles, California (1956)

  26. Landau, L.D., Lifshitz, E.M.: Statistical Physics. Vol. 5 of the Course of Theoretical Physics. Pergamon Press (1980)

  27. Lanford, O.E.: Time evolution of large classical systems. Dynamical Systems, Theory and Applications, pp. 1–111. Springer, New York (1975)


  28. Lebowitz, J.L.: Boltzmann’s entropy and time’s arrow. Phys. Today 46, 32 (1993)


  29. Lebowitz, J.L.: Time’s arrow and Boltzmann’s entropy. Scholarpedia 3(4), 3448 (2008)


  30. Lu, X., Mouhot, C.: On measure solutions of the Boltzmann equation, part II: Rate of convergence to equilibrium. J. Differ. Equ. 258(11), 3742–3810 (2015)


  31. Mielke, A., Peletier, M.A., Renger, D.M.: On the relation between gradient flows and the large-deviation principle, with applications to Markov chains and diffusion. Potential Anal. 41(4), 1293–1327 (2014)


  32. Mischler, S., Mouhot, C.: Kac’s program in kinetic theory. Invent. Math. 193(1), 1–147 (2013)


  33. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation (2001)

  34. Pulvirenti, M., Saffirio, C., Simonella, S.: On the validity of the Boltzmann equation for short range potentials. Rev. Math. Phys. 26(02), 1450001 (2014)


  35. Rezakhanlou, F.: Large deviations from a kinetic limit. Ann. Probab. 26(3), 1259–1340 (1998)


  36. Saint Raymond, L.: Hydrodynamic Limits of the Boltzmann Equation. Springer Science & Business Media, Berlin (2009)


  37. Spohn, H.: Large Scale Dynamics of Interacting Particles. Springer, New York (2002)


  38. Thomson, W.: The kinetic theory of the dissipation of energy. Proc. R. Soc. Edinb. 8, 325–334 (1875)


  39. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer Science & Business Media, Berlin (2008)




This work was initiated following discussions with L. Saint Raymond. I thank her for very fruitful discussions. I thank C. Villani for pointing me to the works of F. Rezakhanlou, in 2015, after I derived this large deviation principle from a chaotic hypothesis. I thank G. Eyink, O. Feliachi, J. Reygner and E. Woillez for comments on this manuscript. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013), Grant Agreement No. 616811. In its last stage, this work was supported by a Subagreement from the Johns Hopkins University with funds provided by Grant No. 663054 from the Simons Foundation. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the Simons Foundation or the Johns Hopkins University.

Author information



Corresponding author

Correspondence to Freddy Bouchet.

Additional information

Communicated by Herbert Spohn.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Large Deviation Rate Functions from the Infinitesimal Generator of a Continuous Time Markov Process

Infinitesimal Generator of a Continuous Time Markov Process

We recall the notion of the infinitesimal generator of a continuous time Markov process. We consider continuous time Markov processes \(\left\{ X(t)\right\} _{0\le t\le T}\), for instance with \(X(t)\in {\mathbb {R}}^{n}\). The infinitesimal generator acts on test functions \(\phi :{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) and is defined by

$$\begin{aligned} G\left[ \phi \right] (x)=\lim _{t\downarrow 0}\frac{{\mathbb {E}}_{x} \left[ \phi (X(t))\right] -\phi (x)}{t}. \end{aligned}$$

For example, for the diffusion \(\hbox {d}x=R(x)\hbox {d}t+\sqrt{2}\hbox {d}W_{t}\), the infinitesimal generator is \(G\left[ \phi \right] (x)=R(x).\nabla \phi +\Delta \phi \), the adjoint of the Fokker–Planck operator.
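As a quick sanity check of this definition, the following Python sketch (our own illustration, with hypothetical function names) takes the one-dimensional case \(R(x)=-x\) and the test function \(\phi (x)=x^{2}\). Under this linear-drift assumption, X(t) is Gaussian with mean \(x\hbox {e}^{-t}\) and variance \(1-\hbox {e}^{-2t}\), so the finite-t difference quotient in the definition of G can be computed exactly and compared with \(R\phi '+\phi ''=2-2x^{2}\).

```python
import math


def ou_second_moment(x, t):
    """E_x[X(t)^2] for dX = -X dt + sqrt(2) dW, i.e. R(x) = -x in one dimension.

    X(t) is Gaussian with mean x e^{-t} and variance 1 - e^{-2t}.
    """
    mean = x * math.exp(-t)
    variance = -math.expm1(-2.0 * t)  # 1 - e^{-2t}, computed without cancellation
    return mean * mean + variance


def generator_quotient(x, t):
    """Finite-t quotient (E_x[phi(X(t))] - phi(x)) / t for phi(x) = x^2."""
    return (ou_second_moment(x, t) - x * x) / t


def generator_formula(x):
    """G[phi](x) = R(x) phi'(x) + phi''(x) with R(x) = -x and phi(x) = x^2."""
    return (-x) * (2.0 * x) + 2.0


if __name__ == "__main__":
    for t in (1e-2, 1e-4, 1e-6):
        print(t, generator_quotient(0.8, t), generator_formula(0.8))
```

The quotient approaches the generator with an error of order t, as expected from the definition.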

As an example, let us compute the infinitesimal generator for the radioactive decay of a single particle, defined in Sect. 3.2. If \(X=1\) at time \(t=0\), the probability that \(X=1\) at a small time t is \(1-\lambda t\), up to terms of order \(t^{2}\). The probability that \(X=0\) at time t is \(\lambda t\), up to terms of order \(t^{2}\). If \(X=0\) at time \(t=0\), it remains zero for all times. Then

$$\begin{aligned} G\left[ \phi \right] (1)=\lambda \left[ \phi (0)-\phi (1)\right] . \end{aligned}$$


$$\begin{aligned} G\left[ \phi \right] (0)=0. \end{aligned}$$

We can write

$$\begin{aligned} G\left[ \phi \right] (x)=\lambda x(\phi (0)-\phi (1)). \end{aligned}$$

The generator is the value of the function after the jump minus its value before the jump, \(\phi (0)-\phi (1)\), multiplied by the jump rate \(\lambda \).
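The small-t expansion above can be checked directly. In the following Python sketch (function names are ours and purely illustrative), the exact law of a single decaying particle started at \(X=1\), namely \(P(X(t)=0)=1-\hbox {e}^{-\lambda t}\), is used to evaluate the difference quotient in the definition of G, which converges to \(\lambda \left[ \phi (0)-\phi (1)\right] \) as \(t\downarrow 0\).

```python
import math


def decay_quotient(phi, lam, t):
    """(E_1[phi(X(t))] - phi(1)) / t for a single decaying particle, X(0) = 1.

    The particle has jumped to X = 0 by time t with probability 1 - e^{-lam t}.
    """
    p_jump = -math.expm1(-lam * t)  # 1 - e^{-lam t}
    expectation = (1.0 - p_jump) * phi(1) + p_jump * phi(0)
    return (expectation - phi(1)) / t


def decay_generator(phi, lam, x):
    """G[phi](x) = lam * x * (phi(0) - phi(1)) for x in {0, 1}."""
    return lam * x * (phi(0) - phi(1))


if __name__ == "__main__":
    phi = lambda x: 3.0 * x + 2.0  # arbitrary test function
    for t in (1e-1, 1e-3, 1e-5):
        print(t, decay_quotient(phi, 0.7, t), decay_generator(phi, 0.7, 1))
```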

In the example of the radioactive decay, \(X_{N}(t)\) (11) is also a continuous time Markov process. We can compute directly its infinitesimal generator by studying all possible changes of the variable \(X_{N}\). We then obtain

$$\begin{aligned} G_{N}\left[ \phi \right] (x)=N\lambda x\left[ \phi \left( x-\frac{1}{N}\right) -\phi \left( x\right) \right] , \end{aligned}$$

where \(x=n/N\), with n an integer such that \(1\le n\le N\), and \(\phi \) is a real-valued function on \(\left[ 0,1\right] \). We also have \(G_{N}\left[ \phi \right] (0)=0\). The generator has one contribution per possible jump: the value of the function after the jump minus its value before the jump, \(\phi (x-1/N)-\phi (x)\), multiplied by the jump rate \(N\lambda x\). This jump rate is the single-particle jump rate \(\lambda \) multiplied by the density x and by the total number of particles N, that is, by the number \(n=Nx\) of particles that can still jump.
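To illustrate how \(G_{N}\) acts on test functions, the following Python sketch (hypothetical helper names; parameter values chosen arbitrarily) applies it to \(\phi (y)=y\) and \(\phi (y)=y^{2}\). The first gives the deterministic drift \(-\lambda x\) for every N, while the combination \(G_{N}[y^{2}]-2xG_{N}[y]=\lambda x/N\), the growth rate of the variance of \(X_{N}\), vanishes as \(N\rightarrow \infty \), consistent with a law of large numbers.

```python
def decay_generator_N(phi, lam, N, x):
    """G_N[phi](x) = N lam x [phi(x - 1/N) - phi(x)].

    x = n/N; the only possible jump is x -> x - 1/N, at rate N * lam * x.
    """
    return N * lam * x * (phi(x - 1.0 / N) - phi(x))


def drift(lam, N, x):
    """G_N applied to phi(y) = y: mean rate of change of X_N."""
    return decay_generator_N(lambda y: y, lam, N, x)


def noise_intensity(lam, N, x):
    """G_N[y^2] - 2 x G_N[y]: growth rate of the variance of X_N."""
    second = decay_generator_N(lambda y: y * y, lam, N, x)
    return second - 2.0 * x * drift(lam, N, x)


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, drift(0.5, N, 0.3), noise_intensity(0.5, N, 0.3))
```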

Heuristic Derivation of the Large Deviation Rate Functions from the Infinitesimal Generator of a Continuous Time Markov Process

We give in this section a heuristic derivation of the relation between (8), (9), and (10).

Let us consider trajectories \(\left\{ X_{\epsilon }(t)\right\} _{0\le t<\infty }\) starting at x. We denote by \(P_{t,\epsilon }\left( x,{\dot{x}}\right) \) the probability density that the Newton difference quotient \(\frac{X_{\epsilon }(t)-x}{t}\) is equal to \({\dot{x}}\) after a time t:

$$\begin{aligned} P_{t,\epsilon }\left( x,{\dot{x}}\right) \equiv {\mathbb {E}}_{x} \left[ \delta \left( \frac{X_{\epsilon }(t)-x}{t}-{\dot{x}}\right) \right] . \end{aligned}$$

Let us first assume that for small time t, \(P_{t,\epsilon }\) verifies the large deviation estimate

$$\begin{aligned} P_{t,\epsilon }\left( x,{\dot{x}}\right) \underset{\epsilon \downarrow 0}{\asymp }\exp \left( -\frac{tL\left( x,{\dot{x}}\right) }{\epsilon }\right) \end{aligned}$$

(more precisely, we first take the limit \(\epsilon \downarrow 0\): \(L\left( x,{\dot{x}}\right) =-\lim _{t\downarrow 0}\lim _{\epsilon \downarrow 0}\epsilon \log P_{t,\epsilon }\left( x,{\dot{x}}\right) /t\)). Then, decomposing the path X(t) into small sub-paths and using the Markov property, we can construct a path integral, and the large deviation principle (10) follows. It is thus sufficient to prove the large deviation estimate (43) in order to conclude that (10) holds.

In order to assess the large deviation result (43), we can study the cumulant generating function of \(P_{t,\epsilon }\). A sufficient condition for (43) to hold is then given by the Gärtner–Ellis theorem: if for all p the limit

$$\begin{aligned} \begin{aligned} H(x,p)&=\lim _{t\downarrow 0}\lim _{\epsilon \downarrow 0}\frac{\epsilon }{t}\log {\mathbb {E}}_{x}\left[ \exp \left( \frac{t}{\epsilon }\frac{p. \left( X_{\epsilon }(t)-x\right) }{t}\right) \right] \\&=\lim _{t\downarrow 0}\lim _{\epsilon \downarrow 0}\frac{\epsilon }{t}\log \left\{ {\mathbb {E}}_{x}\left[ \exp \left( \frac{p.X_{\epsilon }(t)}{\epsilon }\right) \right] \exp \left( -\frac{p.x}{\epsilon }\right) \right\} \end{aligned} \end{aligned}$$

exists and H is everywhere differentiable, then (10) holds with L given by (9). Now, using the definition of the infinitesimal generator (41), we have

$$\begin{aligned} \frac{1}{t}\log \left\{ {\mathbb {E}}_{x}\left[ \exp \left( \frac{p.X_{\epsilon }(t)}{\epsilon } \right) \right] \exp \left( -\frac{p.x}{\epsilon }\right) \right\}&=\frac{1}{t}\log \left\{ 1+tG_{\epsilon }\left[ \exp \left( \frac{p.x}{\epsilon }\right) \right] \exp \left( -\frac{p.x}{\epsilon }\right) +o(t)\right\} \\&=G_{\epsilon }\left[ \exp \left( \frac{p.x}{\epsilon }\right) \right] \exp \left( -\frac{p.x}{\epsilon }\right) +o(1). \end{aligned}$$

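Multiplying by \(\epsilon \) and taking the limits as in (44), we see that H is given by the limit (8). Hence, if the limit (8) exists, the large deviation estimate (10) holds.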
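This recipe can be sketched numerically. Assuming, as in the heuristic computation above, that H is obtained as \(\epsilon G_{\epsilon }[\exp (p.x/\epsilon )]\exp (-p.x/\epsilon )\) in the limit, the following one-dimensional Python sketch (our own, with hypothetical function names) evaluates this expression for the radioactive decay generator \(G_{N}\) with \(\epsilon =1/N\). In that example the expression equals \(\lambda x(\hbox {e}^{-p}-1)\) for every N, so the limit trivially exists.

```python
import math


def decay_generator_N(phi, lam, N, x):
    """G_N[phi](x) = N lam x [phi(x - 1/N) - phi(x)] for the decay fraction."""
    return N * lam * x * (phi(x - 1.0 / N) - phi(x))


def hamiltonian_from_generator(generator, eps, x, p):
    """Evaluate eps * G_eps[exp(p y / eps)](x) * exp(-p x / eps) in one dimension.

    G_eps is linear, so it is applied to exp(p (y - x) / eps) instead, which is
    the same quantity but avoids overflow of exp(p x / eps) for small eps.
    """
    phi = lambda y: math.exp(p * (y - x) / eps)
    return eps * generator(phi, x)


if __name__ == "__main__":
    lam, x, p = 0.5, 0.3, 0.8
    for N in (10, 1000, 100000):
        gen = lambda phi, y, N=N: decay_generator_N(phi, lam, N, y)
        print(N, hamiltonian_from_generator(gen, 1.0 / N, x, p))
    print("lam x (e^{-p} - 1) =", lam * x * (math.exp(-p) - 1.0))
```

Shifting the exponent by \(p.x/\epsilon \) inside the test function is a purely numerical design choice; it does not change the value because the generator is linear.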


The example of locally infinitely divisible processes is discussed in the Freidlin–Wentzell textbook. This class includes the diffusion and Poisson process cases discussed below.

Diffusion with small noise. We consider the diffusion

$$\begin{aligned} \hbox {d}X_{\epsilon }=R(X_{\epsilon })\hbox {d}t+\sqrt{2\epsilon } \sigma (X_{\epsilon })\hbox {d}W_{t} \end{aligned}$$

where \(X_{\epsilon }\in {\mathbb {R}}^{n}\), R(.) is a vector field and \(\sigma \) an \(n\times n\) matrix. We denote \(a(x)\equiv \sigma (x)\sigma (x)^{T}\), where T stands for transposition. The infinitesimal generator is

$$\begin{aligned} G_{\epsilon }\left[ \phi \right] =R.\nabla \phi +\epsilon a:\nabla \nabla \phi , \end{aligned}$$

where ":" denotes the contraction of two second-order tensors. It is then easily checked from the definitions (8) and (9) that

$$\begin{aligned} H(x,p)=p.ap+p.R. \end{aligned}$$

Then, whenever a is invertible,

$$\begin{aligned} L(x,{\dot{x}})=\frac{1}{4}\left( {\dot{x}}-R\right) .a^{-1}\left( {\dot{x}}-R\right) . \end{aligned}$$

H and L are the classical Hamiltonian and Lagrangian for a diffusion with small noise.
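The Legendre–Fenchel relation between these H and L can be checked numerically in one dimension. The Python sketch below is only an illustration: it maximizes \(p{\dot{x}}-H(x,p)\) by ternary search, which is valid because H is convex in p, and compares the result with the closed form \(({\dot{x}}-R)^{2}/4a\). The drift \(R(x)=-x\) and the coefficient \(a(x)=1+x^{2}/2\) are arbitrary choices.

```python
def legendre_fenchel(h, v, lo=-100.0, hi=100.0, iters=200):
    """sup_p [p v - h(p)] for convex h, by ternary search on [lo, hi]."""
    g = lambda p: p * v - h(p)  # concave in p since h is convex
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1  # the maximum lies in [m1, hi]
        else:
            hi = m2  # the maximum lies in [lo, m2]
    return g(0.5 * (lo + hi))


def diffusion_H(x, p, R, a):
    """One-dimensional H(x, p) = p a(x) p + p R(x)."""
    return a(x) * p * p + p * R(x)


def diffusion_L(x, xdot, R, a):
    """Closed form L(x, xdot) = (xdot - R(x))^2 / (4 a(x))."""
    return (xdot - R(x)) ** 2 / (4.0 * a(x))


if __name__ == "__main__":
    R = lambda x: -x                 # hypothetical drift
    a = lambda x: 1.0 + 0.5 * x * x  # hypothetical positive diffusion coefficient
    x, xdot = 0.5, 1.0
    numeric = legendre_fenchel(lambda p: diffusion_H(x, p, R, a), xdot)
    print(numeric, diffusion_L(x, xdot, R, a))
```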

Poisson process. The case of a Poisson process is discussed in the book of Freidlin–Wentzell. That textbook considers a single Poisson process, rescaled so that its infinitesimal generator fits the asymptotics leading to a large deviation estimate, as in equation (8). We instead consider N independent Poisson processes \(\left\{ x_{n}(t)\right\} _{1\le n\le N}\) and study the large deviations of their empirical average. This case is closer to what will be needed in this paper. Each of these Poisson processes increases by 1 at rate 1 (the probability that \(x_{n}\) increases by one during an infinitesimal time interval dt is dt).

We consider the average

$$\begin{aligned} X_{N}(t)=\frac{1}{N}\sum _{n=1}^{N}x_{n}(t). \end{aligned}$$

During an infinitesimal interval dt, the probability for \(X_{N}(t)\) to increase by an amount 1/N is \(N\hbox {d}t\). The infinitesimal generator is thus

$$\begin{aligned} G_{N}\left[ \phi \right] (x)=N\left[ \phi \left( x+\frac{1}{N}\right) -\phi \left( x\right) \right] . \end{aligned}$$

Using (8) with \(\epsilon =1/N\), we deduce that the process \(\left\{ X_{N}(t)\right\} \) verifies a large deviation principle with an action characterized by the Hamiltonian

$$\begin{aligned} H(x,p)=\exp (p)-1. \end{aligned}$$
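The following Python sketch (our own illustration; names are hypothetical) checks two facts for the averaged Poisson process. First, the rescaled generator applied to an exponential test function gives \(\hbox {e}^{p}-1\) for every N. Second, the Legendre–Fenchel transform of this Hamiltonian is \(L(x,{\dot{x}})={\dot{x}}\log {\dot{x}}-{\dot{x}}+1\) for \({\dot{x}}>0\), with \(L=1\) at \({\dot{x}}=0\) and \(L=+\infty \) for \({\dot{x}}<0\); this standard computation is not written out in the text. The numerical supremum uses ternary search, which is valid because H is convex.

```python
import math


def poisson_average_generator(phi, N, x):
    """G_N[phi](x) = N [phi(x + 1/N) - phi(x)] for the average of N Poisson processes."""
    return N * (phi(x + 1.0 / N) - phi(x))


def rescaled_hamiltonian(N, x, p):
    """eps G_N[exp(p y / eps)](x) exp(-p x / eps), eps = 1/N, shifted to avoid overflow."""
    eps = 1.0 / N
    phi = lambda y: math.exp(p * (y - x) / eps)
    return eps * poisson_average_generator(phi, N, x)


def poisson_L(xdot):
    """Closed-form Legendre-Fenchel transform of H(p) = exp(p) - 1."""
    if xdot < 0.0:
        return math.inf  # the average of Poisson processes never decreases
    if xdot == 0.0:
        return 1.0       # supremum approached as p -> -infinity
    return xdot * math.log(xdot) - xdot + 1.0


def legendre_fenchel(h, v, lo=-50.0, hi=50.0, iters=200):
    """sup_p [p v - h(p)] for convex h, by ternary search on [lo, hi]."""
    g = lambda p: p * v - h(p)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))


if __name__ == "__main__":
    print(rescaled_hamiltonian(1000, 0.4, 0.8), math.exp(0.8) - 1.0)
    for v in (0.5, 1.0, 3.0):
        print(v, legendre_fenchel(lambda p: math.exp(p) - 1.0, v), poisson_L(v))
```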

Quasipotential, Relaxation Paths, Fluctuation Paths, and Conservation Laws

Some Properties of the Lagrangian and of the Hamiltonian

L is a large deviation rate function for the variable \({\dot{x}}\). By definition, a large deviation rate function has zero as its minimum value. We thus have

$$\begin{aligned} \inf _{{\dot{x}}}L(x,{\dot{x}})=0=L\left( x,R(x)\right) , \end{aligned}$$

where for the second equality we assume that the infimum is achieved at \({\dot{x}}=R(x)\). We also have

$$\begin{aligned} L(x,{\dot{x}})\ge 0. \end{aligned}$$

From the definition of the Hamiltonian H as a rescaled cumulant generating function (44), we can conclude that for any x, H is a convex function of the variable p and that

$$\begin{aligned} H(x,0)=0. \end{aligned}$$

The Legendre–Fenchel relation between L and H (9) implies that for any x and p

$$\begin{aligned} p{\dot{x}}\le L(x,{\dot{x}})+H(x,p) \end{aligned}$$

from which, taking \(p=0\) and using (47), we recover (46).

Relaxation Paths

The relaxation paths \(X_{r}(t,x)\) are the most probable paths of the dynamics described by the action (13), starting from a state x at time \(t=0\); they minimize the action. From the definition of R (45), as \(L\ge 0\) and \(L\left( x,R(x)\right) =0\), relaxation paths solve

$$\begin{aligned} {\dot{X}}_{r}=R(X_{r}), \end{aligned}$$

with the initial condition \(X_{r}(0,x)=x\).

Moreover, consider the variational problem \(0=L(X,R(X))=\sup _{p}\left[ p.R(X)-H\left( X,p\right) \right] \). Since \(p=0\) gives \(p.R(X)-H(X,0)=0\), the supremum is achieved at \(p=0\), and the stationarity condition at this optimum gives

$$\begin{aligned} R(X_{r})=\frac{\partial H}{\partial p}(X_{r},0). \end{aligned}$$

In the following, in order to keep the discussion simple, we assume that the relaxation dynamics has a single global point attractor \(x_{0}\), with \(R(x_{0})=0\). The generalization to multiple attractors or to other types of attractors could be considered following the classical discussion (see for instance [18]). As we will see, this hypothesis will be verified for the Boltzmann equation.
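For the radioactive decay example, applying (8) with \(\epsilon =1/N\) to \(G_{N}\) gives \(H(x,p)=\lambda x(\hbox {e}^{-p}-1)\), so that \(R(x)=\frac{\partial H}{\partial p}(x,0)=-\lambda x\), the relaxation path is \(X_{r}(t,x)=x\hbox {e}^{-\lambda t}\), and the single attractor is \(x_{0}=0\). The Python sketch below (an illustration with an arbitrary value of \(\lambda \); names are ours) recovers R from H by a central finite difference and integrates the relaxation equation with a classical fourth-order Runge–Kutta scheme.

```python
import math

LAM = 0.5  # decay rate (arbitrary illustrative value)


def H(x, p):
    """Hamiltonian of the radioactive decay fraction: lam x (e^{-p} - 1)."""
    return LAM * x * (math.exp(-p) - 1.0)


def relaxation_field(x, h=1e-6):
    """R(x) = dH/dp (x, 0), estimated by a central finite difference."""
    return (H(x, h) - H(x, -h)) / (2.0 * h)


def relaxation_path(x0, T, steps=1000):
    """Integrate dX/dt = R(X) from X(0) = x0 up to time T with classical RK4."""
    dt = T / steps
    x = x0
    for _ in range(steps):
        k1 = relaxation_field(x)
        k2 = relaxation_field(x + 0.5 * dt * k1)
        k3 = relaxation_field(x + 0.5 * dt * k2)
        k4 = relaxation_field(x + dt * k3)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


if __name__ == "__main__":
    print(relaxation_path(1.0, 4.0), math.exp(-LAM * 4.0))
```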


We now consider the stationary distribution \(P_{s}\) of the process \(X_{\epsilon }\), whose dynamics follows the large deviation principle (10). We assume that the stationary distribution also satisfies a large deviation principle:

$$\begin{aligned} P_{s}(x)\equiv {\mathbb {E}}\left[ \delta \left( X_{\epsilon }-x\right) \right] \underset{\epsilon \downarrow 0}{\asymp }\exp \left( -\frac{U(x)}{\epsilon }\right) , \end{aligned}$$

where U is called the quasipotential. In the case when the relaxation dynamics has a single global attractor \(x_{0}\), the quasipotential is characterized by the variational problem

$$\begin{aligned} \begin{aligned} U(x)&=\inf _{\left\{ X(t)\left| X(-\infty )=x_{0}\,\,\hbox {and}\,\,X(0) =x\right. \right\} }\int _{-\infty }^{0}\hbox {d}t\,L(X,{\dot{X}})\\&=\inf _{\left\{ X(t),P(t)\left| X(-\infty )=x_{0}\,\,\hbox {and}\,\,X(0) =x\right. \right\} }\int _{-\infty }^{0}\hbox {d}t\,\left[ P{\dot{X}}-H(X,P)\right] . \end{aligned} \end{aligned}$$

It is a classical result of analytical mechanics that the value function of such a variational problem solves a Hamilton–Jacobi equation. Hence the quasipotential U solves the stationary Hamilton–Jacobi equation

$$\begin{aligned} H(x,\nabla U)=0. \end{aligned}$$
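A simple example where this equation can be checked explicitly, added here only for illustration, is N independent particles switching from state 0 to state 1 at rate \(\alpha \) and back at rate \(\beta \). For the fraction x in state 1, the generator has two jumps, \(x\rightarrow x+1/N\) at rate \(N\alpha (1-x)\) and \(x\rightarrow x-1/N\) at rate \(N\beta x\), and (8) gives \(H(x,p)=\alpha (1-x)(\hbox {e}^{p}-1)+\beta x(\hbox {e}^{-p}-1)\). The relaxation dynamics has the single attractor \(x_{0}=\alpha /(\alpha +\beta )\), and the candidate quasipotential \(U(x)=x\log (x/x_{0})+(1-x)\log ((1-x)/(1-x_{0}))\), a relative entropy, solves \(H(x,\nabla U)=0\), mirroring in a toy setting the role played by the entropy in the main text. The Python sketch below verifies this on a grid; the rates are arbitrary.

```python
import math

ALPHA, BETA = 1.0, 2.0       # hypothetical switching rates 0 -> 1 and 1 -> 0
X0 = ALPHA / (ALPHA + BETA)  # attractor of the relaxation dynamics: R(X0) = 0


def H(x, p):
    """Hamiltonian for the fraction x of N independent two-state particles."""
    return ALPHA * (1.0 - x) * (math.exp(p) - 1.0) + BETA * x * (math.exp(-p) - 1.0)


def U(x):
    """Relative entropy of (x, 1 - x) with respect to (X0, 1 - X0)."""
    return x * math.log(x / X0) + (1.0 - x) * math.log((1.0 - x) / (1.0 - X0))


def dU(x):
    """Derivative of U."""
    return math.log(x / X0) - math.log((1.0 - x) / (1.0 - X0))


def max_hj_residual(n=99):
    """max |H(x, U'(x))| over a uniform grid of interior points of (0, 1)."""
    xs = [(k + 1) / (n + 1) for k in range(n)]
    return max(abs(H(x, dU(x))) for x in xs)


if __name__ == "__main__":
    print("max |H(x, U'(x))| =", max_hj_residual())
    print("U(X0) =", U(X0))
```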

Fluctuation Paths

The fluctuation paths are the minimizers of the quasipotential variational problem (52). They are very important as they describe the most probable path starting from the attractor \(x_{0}\) and leading to a fluctuation x.

The fluctuation paths define a flow parametrized by x, which we denote by \(X_{f}(t,x)\) (the path evolution) and \(P_{f}(t,x)\) (the conjugate momentum evolution). They satisfy Hamilton’s equations, that is, the Euler–Lagrange equations of the variational problem (52) written in Hamiltonian form

$$\begin{aligned} \left\{ \begin{array}{ccc} \dot{X_{f}} &{} = &{} \frac{\partial H}{\partial p}\left( X_{f},P_{f}\right) \\ {\dot{P}}_{f} &{} = &{} -\frac{\partial H}{\partial x}\left( X_{f},P_{f}\right) , \end{array}\right. \end{aligned}$$

with the boundary conditions \(X_{f}(-\infty ,x)=x_{0}\) and \(X_{f}(0,x)=x\). As any fluctuation path converges to \(x_{0}\) as \(t\downarrow -\infty \), we have \(R(x_{0})=\frac{\partial H}{\partial p}(x_{0},0)=0\). If H is a strictly convex function of the variable p, the equation \(\frac{\partial H}{\partial p}(x_{0},p)=0\) has at most one root, namely \(p=0\), from which we deduce that \(\lim _{t\downarrow -\infty }P_{f}(t,x)=0\). Moreover, Hamilton’s equations (54) conserve the Hamiltonian H along their dynamics. From the value of H for \(t\downarrow -\infty \), \(H(x_{0},0)=0\), we deduce that \(H(X_{f},P_{f})=0\) along the fluctuation paths. From the variational characterization of the quasipotential (52), we then deduce that \(U(x)=\int _{-\infty }^{0}\hbox {d}t\,P_{f}(t,x){\dot{X}}_{f}(t,x)\). It then follows that \(\nabla U(x)=P_{f}(0,x)\). By the flow property, this relation holds all along the fluctuation paths. Then for any x and t

$$\begin{aligned} \nabla U(X_{f}(t,x))=P_{f}(t,x). \end{aligned}$$

Using this result and (54), we deduce that the fluctuation paths solve the first order equation

$$\begin{aligned} {\dot{X}}_{f}=F(X_{f})\equiv \frac{\partial H}{\partial p}\left( X_{f},\nabla U(X_{f})\right) , \end{aligned}$$

where the second equality defines the fluctuation path vector field F.

Decay (resp. Increase) of the Quasipotential Along the Relaxation (resp. Fluctuation) Paths

We now prove that the value of U characterizes the relaxation towards the attractor \(x_{0}\): U decreases monotonically along any relaxation path. Indeed, from (49) and (50), we have

$$\begin{aligned} \begin{aligned} \frac{\hbox {d}U}{\hbox {d}t}(X_{r})&=\frac{\partial H}{\partial p}\left( X_{r},0\right) .\nabla U(X_{r})\\&=H(X_{r},0)-H(X_{r},\nabla U(X_{r}))+\frac{\partial H}{\partial p}\left( X_{r},0\right) .\nabla U(X_{r})\le 0 \end{aligned} \end{aligned}$$

where we have used (47) and the Hamilton–Jacobi equation (53) to write the second equality. The inequality is a consequence of the convexity of H with respect to its second variable. In the case of strict convexity, which will often be the case, equality holds if and only if \(\nabla U(X_{r})=0\).

Moreover, the condition: for any \(\alpha \in [0,1]\)

$$\begin{aligned} \left( \nabla U\right) ^{T}\frac{\partial ^{2}H}{\partial p^{2}}\left( X_{r},\alpha \nabla U\right) \nabla U\ge CU \end{aligned}$$

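implies that U decays along the relaxation paths at least as fast as \(\hbox {e}^{-Ct/2}\). Indeed, a second-order Taylor expansion of \(H(X_{r},\cdot )\) between 0 and \(\nabla U\) shows that the right-hand side of the second equality above equals \(-\frac{1}{2}\left( \nabla U\right) ^{T}\frac{\partial ^{2}H}{\partial p^{2}}\left( X_{r},\alpha \nabla U\right) \nabla U\) for some \(\alpha \in [0,1]\), so that \(\frac{\hbox {d}U}{\hbox {d}t}(X_{r})\le -\frac{C}{2}U(X_{r})\). The condition that the quasipotential is uniformly convex in the norm of the second variation of H: for any p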

$$\begin{aligned} p^{T}\frac{\partial ^{2}H}{\partial p^{2}}\left( x,\alpha \nabla U\right) \hbox {Hess}U(x)p\ge Cp^{T}p \end{aligned}$$

implies (56). In the case of the sum of N independent particles, each following a diffusion, the second variation of H is associated with the Wasserstein distance, and the condition (57) is a log-Sobolev inequality.

We now prove similarly that U increases monotonically along the fluctuation paths. Using (55), we have

$$\begin{aligned} \begin{aligned} \frac{\hbox {d}U}{\hbox {d}t}(X_{f})&=\frac{\partial H}{\partial p} \left( X_{f},\nabla U(X_{f})\right) .\nabla U(X_{f})\\&=H(X_{f},0)-H(X_{f},\nabla U(X_{f}))+\frac{\partial H}{\partial p}\left( X_{f},\nabla U(X_{f})\right) .\nabla U(X_{f})\ge 0, \end{aligned} \end{aligned}$$

where the second equality is a consequence of the Hamilton–Jacobi equation (53) and of (47), and the inequality is again a consequence of the convexity of H with respect to its second argument. Again, if H is strictly convex, the inequality is strict whenever \(\nabla U(X_{f})\ne 0\).
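Both monotonicity properties can be observed numerically on a hypothetical two-state model: N independent particles switching \(0\rightarrow 1\) at rate \(\alpha \) and \(1\rightarrow 0\) at rate \(\beta \), with \(H(x,p)=\alpha (1-x)(\hbox {e}^{p}-1)+\beta x(\hbox {e}^{-p}-1)\) and U the relative entropy with respect to \(x_{0}=\alpha /(\alpha +\beta )\). The Python sketch below, an illustration only, integrates \({\dot{X}}=R(X)\) and \({\dot{X}}=F(X)\) with an explicit Euler scheme and checks that U decreases along the former and increases along the latter.

```python
import math

ALPHA, BETA = 1.0, 2.0       # hypothetical switching rates 0 -> 1 and 1 -> 0
X0 = ALPHA / (ALPHA + BETA)  # attractor


def U(x):
    return x * math.log(x / X0) + (1.0 - x) * math.log((1.0 - x) / (1.0 - X0))


def dU(x):
    return math.log(x / X0) - math.log((1.0 - x) / (1.0 - X0))


def R(x):
    """Relaxation field dH/dp(x, 0)."""
    return ALPHA * (1.0 - x) - BETA * x


def F(x):
    """Fluctuation field dH/dp(x, U'(x))."""
    q = math.exp(dU(x))
    return ALPHA * (1.0 - x) * q - BETA * x / q


def U_along(field, x, dt=1e-3, steps=1500):
    """Values of U along an explicit Euler discretization of dX/dt = field(X)."""
    values = [U(x)]
    for _ in range(steps):
        x += dt * field(x)
        values.append(U(x))
    return values


def strictly_decreasing(v):
    return all(b < a for a, b in zip(v, v[1:]))


def strictly_increasing(v):
    return all(b > a for a, b in zip(v, v[1:]))


if __name__ == "__main__":
    down = U_along(R, 0.9)
    up = U_along(F, X0 + 1e-3)
    print(down[0], down[-1], up[0], up[-1])
```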

Conservation Laws

It may happen that the stochastic process has a conservation law C: for any \(\epsilon \) and t, \(C\left( X_{\epsilon }(t)\right) =C_{0}\). Then, \({\dot{X}}_{\epsilon }(t).\frac{\partial C}{\partial x}(X_{\epsilon }(t))=0\). As a consequence, from the definition of the Lagrangian (43), we deduce that

$$\begin{aligned} L(x,{\dot{x}})=+\infty \,\,\,\hbox {if}\,\,\,{\dot{x}}.\frac{\partial C}{\partial x}(x)\ne 0. \end{aligned}$$

At the level of the Hamiltonian, using the Legendre–Fenchel transform (9), we conclude that the conservation law translates to the continuous symmetry property

$$\begin{aligned} \hbox {for any}\,\,x,p\,\,\hbox {and}\,\,\alpha ,\,\,\,H\left( x,p+\alpha \frac{\partial C}{\partial x}\right) =H(x,p) \end{aligned}$$

or equivalently

$$\begin{aligned} \hbox {for any}\,\,x\,\,\hbox {and}\,\,p,\,\,\,\frac{\partial H}{\partial p}\left( x,p\right) .\frac{\partial C}{\partial x}(x)=0. \end{aligned}$$

Hence, as a function of its second variable, \(H(x,\cdot )\) is constant along the direction \(\frac{\partial C}{\partial x}\).

As far as the Hamilton–Jacobi equation is concerned, this means that only the projection of \(\nabla U\) onto the orthogonal complement of \(\nabla C\) matters.
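As an illustration of this symmetry (not a model from the text), consider N particles hopping between two boxes, from box 1 to box 2 at rate \(k_{12}\) per particle and back at rate \(k_{21}\), with \(x=(x_{1},x_{2})\) the fractions in each box and the conserved quantity \(C(x)=x_{1}+x_{2}\). The same construction as for the Poisson process gives \(H(x,p)=k_{12}x_{1}(\hbox {e}^{p_{2}-p_{1}}-1)+k_{21}x_{2}(\hbox {e}^{p_{1}-p_{2}}-1)\), which depends on p only through \(p_{2}-p_{1}\). The Python sketch below checks the invariance under \(p\rightarrow p+\alpha (1,1)\) and the orthogonality \(\frac{\partial H}{\partial p}.\frac{\partial C}{\partial x}=0\); the rate values are arbitrary.

```python
import math

K12, K21 = 1.5, 0.5  # hypothetical hopping rates: box 1 -> box 2 and box 2 -> box 1


def H(x, p):
    """Hamiltonian for fractions (x1, x2) of N particles hopping between two boxes."""
    (x1, x2), (p1, p2) = x, p
    return K12 * x1 * (math.exp(p2 - p1) - 1.0) + K21 * x2 * (math.exp(p1 - p2) - 1.0)


def grad_p_H(x, p):
    """Exact gradient of H with respect to p = (p1, p2)."""
    (x1, x2), (p1, p2) = x, p
    flux = K12 * x1 * math.exp(p2 - p1) - K21 * x2 * math.exp(p1 - p2)  # net flux 1 -> 2
    return (-flux, flux)


if __name__ == "__main__":
    x, p = (0.3, 0.7), (0.4, -1.1)
    for alpha in (-2.0, 0.5, 3.0):
        print(alpha, H(x, (p[0] + alpha, p[1] + alpha)) - H(x, p))
    print("dH/dp . dC/dx =", sum(grad_p_H(x, p)))
```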

Time Reversal Symmetry and Detailed Balance

Detailed Balance

If the Markov process is time reversible, or equivalently if it verifies a detailed balance condition, this implies a time reversal symmetry for the path large deviation estimate (10), or equivalently the action (13). We explain this point in this section.

A stationary continuous time Markov process is said to be time reversible if its backward and forward histories have the same probabilities. We consider the transition probability \(P_{T}\) for the Markov process (\(P_{T}(y;x)\) is the transition probability from the state x towards the state y). If for any states x and y,

$$\begin{aligned} P_{T}(y;x)P_{S}(x)=P_{T}(x;y)P_{S}(y), \end{aligned}$$

we say that the process verifies a detailed balance property with respect to the distribution \(P_{S}\). It is then easily checked that \(P_{S}\) is a stationary distribution of the Markov process. The detailed balance condition is a necessary and sufficient condition for the Markov process to be time reversible. Another characterization of the time reversibility of the process is that the infinitesimal generator of the time-reversed process is identical to the infinitesimal generator of the initial process.

If for any \(\epsilon \) the process \(\left\{ X_{\epsilon }\right\} \) verifies a detailed balance property, then the large deviation dynamics inherits this symmetry property. However, the converse is not necessarily true: the detailed balance property can hold at the level of the large deviation dynamics without holding at the level of the process \(\left\{ X_{\epsilon }\right\} \).
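For a finite-N Markov jump process, the detailed balance condition can be checked directly. The Python sketch below, an illustration we add, uses the number n of particles in state 1 among N independent two-state particles (switching \(0\rightarrow 1\) at rate \(\alpha \) and \(1\rightarrow 0\) at rate \(\beta \)), which is a birth–death chain with rates \(\alpha (N-n)\) and \(\beta n\). It verifies that the binomial distribution with parameter \(\alpha /(\alpha +\beta )\) satisfies \(P_{S}(n)\,\alpha (N-n)=P_{S}(n+1)\,\beta (n+1)\) and, as a consequence, is stationary for the master equation. All names and rate values are hypothetical.

```python
import math


def up_rate(n, N, alpha):
    """Rate of n -> n + 1: each of the N - n particles in state 0 switches at rate alpha."""
    return alpha * (N - n)


def down_rate(n, beta):
    """Rate of n -> n - 1: each of the n particles in state 1 switches at rate beta."""
    return beta * n


def binomial_pmf(n, N, q):
    return math.comb(N, n) * q ** n * (1.0 - q) ** (N - n)


def detailed_balance_residual(N, alpha, beta):
    """max_n |pi(n) up(n) - pi(n+1) down(n+1)| for the binomial candidate pi."""
    q = alpha / (alpha + beta)
    return max(
        abs(binomial_pmf(n, N, q) * up_rate(n, N, alpha)
            - binomial_pmf(n + 1, N, q) * down_rate(n + 1, beta))
        for n in range(N)
    )


def stationarity_residual(N, alpha, beta):
    """max_n |inflow(n) - outflow(n)| in the master equation for the same pi."""
    q = alpha / (alpha + beta)
    pi = [binomial_pmf(n, N, q) for n in range(N + 1)]
    worst = 0.0
    for n in range(N + 1):
        inflow = 0.0
        if n > 0:
            inflow += pi[n - 1] * up_rate(n - 1, N, alpha)
        if n < N:
            inflow += pi[n + 1] * down_rate(n + 1, beta)
        outflow = pi[n] * (up_rate(n, N, alpha) + down_rate(n, beta))
        worst = max(worst, abs(inflow - outflow))
    return worst


if __name__ == "__main__":
    print(detailed_balance_residual(50, 1.0, 2.0), stationarity_residual(50, 1.0, 2.0))
```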

For the process we are interested in, the condition for detailed balance can be written

$$\begin{aligned} P_{\Delta t,\epsilon }\left( x,{\dot{x}}\right) P_{S}(x)\underset{\Delta t\rightarrow 0}{\sim }P_{\Delta t,\epsilon }\left( x+\Delta t{\dot{x}},-{\dot{x}}\right) P_{S}(x+\Delta t{\dot{x}}), \end{aligned}$$

where \(P_{\Delta t,\epsilon }\) is defined by (42). Using the large deviation estimates (43) and (51) for small \(\Delta t\), the detailed balance condition reads: for any x and \({\dot{x}}\)

$$\begin{aligned} L(x,{\dot{x}})-L(x,-{\dot{x}})={\dot{x}}.\nabla U. \end{aligned}$$

Using the Legendre–Fenchel relation between H and L (9), this detailed balance condition reads: for any x and p

$$\begin{aligned} H\left( x,-p\right) =H\left( x,p+\nabla U\right) . \end{aligned}$$

If H and U verify the detailed balance condition (62), using \(H(x,0)=0\) (eq. (47)), we easily deduce that \(H\left( x,\nabla U\right) =0\) which is the Hamilton–Jacobi equation. With some further conditions on U, see for instance Sect. 7.4, we may conclude that U is the quasipotential.

Moreover, if detailed balance is verified, then one expects to observe the time reversal symmetry at the level of the relaxation and fluctuation paths. Indeed, differentiating (62) with respect to p at \(p=0\), we easily derive \(R(x)=\frac{\partial H}{\partial p}\left( x,0\right) =-\frac{\partial H}{\partial p}\left( x,\nabla U\right) =-F(x)\). We thus conclude that for Hamiltonians satisfying the detailed balance relation, the fluctuation paths are the time reversals of the relaxation paths.
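Both relations can be tested on a hypothetical two-state model, \(H(x,p)=\alpha (1-x)(\hbox {e}^{p}-1)+\beta x(\hbox {e}^{-p}-1)\), with U the relative entropy with respect to \(x_{0}=\alpha /(\alpha +\beta )\), for which \(\nabla U(x)=\log \frac{x(1-x_{0})}{(1-x)x_{0}}\). The Python sketch below, an illustration only, checks \(H(x,-p)=H(x,p+\nabla U)\) on a grid of points and the relation \(F=-R\).

```python
import math

ALPHA, BETA = 1.0, 2.0       # hypothetical switching rates 0 -> 1 and 1 -> 0
X0 = ALPHA / (ALPHA + BETA)


def H(x, p):
    return ALPHA * (1.0 - x) * (math.exp(p) - 1.0) + BETA * x * (math.exp(-p) - 1.0)


def dU(x):
    """Gradient of the relative-entropy quasipotential."""
    return math.log(x / X0) - math.log((1.0 - x) / (1.0 - X0))


def R(x):
    """Relaxation field dH/dp(x, 0)."""
    return ALPHA * (1.0 - x) - BETA * x


def F(x):
    """Fluctuation field dH/dp(x, U'(x))."""
    q = math.exp(dU(x))
    return ALPHA * (1.0 - x) * q - BETA * x / q


def detailed_balance_residuals(n=49, ps=(-2.0, -0.5, 0.3, 1.7)):
    """Worst violations of H(x, -p) = H(x, p + U'(x)) and of F = -R on a grid."""
    worst_h = worst_f = 0.0
    for k in range(n):
        x = (k + 1) / (n + 1)
        for p in ps:
            worst_h = max(worst_h, abs(H(x, -p) - H(x, p + dU(x))))
        worst_f = max(worst_f, abs(F(x) + R(x)))
    return worst_h, worst_f


if __name__ == "__main__":
    print(detailed_balance_residuals())
```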

Generalized Detailed Balance

For most physical systems, the notion of time reversibility has to be extended, for instance to take into account that the sign of the velocity has to be changed in systems with inertia, or that other fields have to be transformed under time reversal. This is true for the time-reversal symmetry of dynamical systems, for instance of mechanical systems described by Hamiltonian equations, but also for the time-reversal symmetry of Markov processes. Such a generalized definition of time-reversal symmetry is classical both in the physics and the mathematics literature, see for instance [22].

We consider a map I from the state space to itself. We assume that I is an involution (\(I^{2}=Id\)) and that I is self adjoint for the canonical scalar product: for any x and y, \(I(x).y=x.I(y)\). A continuous time Markov process is said to be time-reversal symmetric in the generalized sense if its backward histories with the application of I and its forward histories have the same probabilities. If the distribution \(P_{S}\) is I-symmetric (for any x \(P_{S}\left( I(x)\right) =P_{S}(x)\)) and if for any states x and y,

$$\begin{aligned} P_{T}(y;x)P_{S}(x)=P_{T}\left( I(x);I(y)\right) P_{S}\left( I(y)\right) , \end{aligned}$$

we say that the process verifies a generalized detailed balance property with respect to the distribution \(P_{S}\) and the symmetry I. It is then easily checked that \(P_{S}\) is a stationary distribution of the Markov process. The generalized detailed balance condition is a necessary and sufficient condition for the Markov process to be time reversible in the generalized sense. Another characterization of time reversibility in the generalized sense is that the infinitesimal generator of the time-reversed process is identical to the generator of the initial process, up to application of the involution I.

Then, the discussion of Sect. 7.3.1 easily generalizes. The conditions of generalized detailed balance at the level of large deviation read \(U\left( I(x)\right) =U(x)\) and

$$\begin{aligned} L(x,{\dot{x}})-L\left( I(x),-I\left[ {\dot{x}}\right] \right) ={\dot{x}}.\nabla U \end{aligned}$$

or equivalently

$$\begin{aligned} H\left( I\left[ x\right] ,-I\left[ p\right] \right) =H\left( x,p+\nabla U\right) . \end{aligned}$$

If a generalized detailed balance is verified, then the quasipotential solves the Hamilton–Jacobi equation, and the fluctuation paths are the time reversals of the relaxation paths composed with the symmetry I: \(F(x)=-I\left[ R\left( I\left( x\right) \right) \right] \).
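A standard example of generalized detailed balance, given here only as an illustration, is the underdamped Langevin dynamics \(\hbox {d}q=v\hbox {d}t\), \(\hbox {d}v=(-\gamma v-V'(q))\hbox {d}t+\sqrt{2\epsilon \gamma }\hbox {d}W_{t}\), with the velocity reversal \(I(q,v)=(q,-v)\) and \(U(q,v)=v^{2}/2+V(q)\). By the small-noise diffusion formula, \(H(x,p)=p_{q}v+p_{v}(-\gamma v-V'(q))+\gamma p_{v}^{2}\). The Python sketch below checks \(H(I[x],-I[p])=H(x,p+\nabla U)\) at random points, together with the relation obtained by differentiating it with respect to p at \(p=0\), namely \(F(x)=-I[R(I(x))]\). The potential \(V(q)=q^{4}/4-q^{2}/2\) and the friction \(\gamma \) are arbitrary choices.

```python
import random

GAMMA = 0.7  # friction coefficient (arbitrary illustrative value)


def dV(q):
    """V'(q) for the hypothetical potential V(q) = q^4/4 - q^2/2."""
    return q ** 3 - q


def H(x, p):
    """H(x, p) = p_q v + p_v (-GAMMA v - V'(q)) + GAMMA p_v^2."""
    q, v = x
    pq, pv = p
    return pq * v + pv * (-GAMMA * v - dV(q)) + GAMMA * pv * pv


def grad_U(x):
    """Gradient of U(q, v) = v^2/2 + V(q)."""
    q, v = x
    return (dV(q), v)


def involution(y):
    """Velocity reversal (q, v) -> (q, -v): a self-adjoint involution."""
    return (y[0], -y[1])


def R(x):
    """Relaxation field dH/dp(x, 0)."""
    q, v = x
    return (v, -GAMMA * v - dV(q))


def F(x):
    """Fluctuation field dH/dp(x, grad U(x))."""
    q, v = x
    _, pv = grad_U(x)
    return (v, -GAMMA * v - dV(q) + 2.0 * GAMMA * pv)


def max_residuals(samples=1000, seed=0):
    """Worst violations of H(I x, -I p) = H(x, p + grad U) and of F(x) = -I R(I x)."""
    rng = random.Random(seed)
    worst_h = worst_f = 0.0
    for _ in range(samples):
        x = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        p = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
        gu = grad_U(x)
        ix, ip = involution(x), involution(p)
        lhs = H(ix, (-ip[0], -ip[1]))
        rhs = H(x, (p[0] + gu[0], p[1] + gu[1]))
        worst_h = max(worst_h, abs(lhs - rhs))
        f, irix = F(x), involution(R(involution(x)))
        worst_f = max(worst_f, abs(f[0] + irix[0]), abs(f[1] + irix[1]))
    return worst_h, worst_f


if __name__ == "__main__":
    print(max_residuals())
```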

A Sufficient Condition for U to be the Quasipotential

We know that if U is the quasipotential then it solves the Hamilton–Jacobi equation \(H\left( x,\nabla U\right) =0\). The converse is not necessarily true. For instance \(U=0\) solves the Hamilton–Jacobi equation but is not the quasipotential.

We give a sufficient condition for a function V to be the quasipotential U, in the simple case where V has a unique global minimum \(x_{0}\).

Suppose that V solves the Hamilton–Jacobi equation, that V has a single minimum \(x_{0}\) with \(V(x_{0})=0\), and that, for any x, the solution of the reverse fluctuation path dynamics \({\dot{X}}=-F(X)=-\frac{\partial H}{\partial p}\left( X,\nabla V(X)\right) \) with \(X(t=0)=x\) converges to \(x_{0}\) as \(t\rightarrow \infty \). Then V is the quasipotential. We now give a simple proof.

From the definition of L (9), taking \(p=\nabla V(X)\), we have for any X and \({\dot{X}}\), \(L\left( X,{\dot{X}}\right) \ge {\dot{X}}\nabla V(X)-H\left( X,\nabla V(X)\right) \). Hence, using that V solves the Hamilton–Jacobi equation (\(H\left( x,\nabla V\right) =0\)), we obtain, for any path X such that \(X(0)=x\) and \(X(-\infty )=x_{0}\),

$$\begin{aligned} \int _{-\infty }^{0}\hbox {d}t\,L\left( X,{\dot{X}}\right) \ge \int _{-\infty }^{0} \hbox {d}t\,{\dot{X}}\nabla V(X)=V(x). \end{aligned}$$

Hence, using the characterization of the quasipotential (52), we get \(U(x)\ge V(x)\).

Moreover, from the definition of L (9), for any x and p we have

$$\begin{aligned} L\left( x,\frac{\partial H}{\partial p}(x,p)\right) =p\frac{\partial H}{\partial p}(x,p)-H(x,p). \end{aligned}$$
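This identity is the equality case of the Fenchel–Young inequality \(L(x,{\dot{x}})\ge p{\dot{x}}-H(x,p)\) used in the first half of the proof. The following is a minimal numerical sketch on a toy Hamiltonian assumed for illustration: a Poisson counting process with rate A, H(p) = A(e^p - 1), which is not a model from the paper. It compares a brute-force Legendre transform on a grid of p values with the closed form and with \(p\,\partial H/\partial p-H\). The rate A and the grid are hypothetical choices.

```python
import numpy as np

A = 2.0  # hypothetical jump rate


def H(p):
    # Large deviation Hamiltonian of a Poisson counting process with rate A (toy example).
    return A * (np.exp(p) - 1.0)


def L_exact(v):
    # Closed-form Legendre transform, valid for v > 0.
    return v * np.log(v / A) - v + A


def L_numeric(v, grid=np.linspace(-10.0, 10.0, 200_001)):
    # L(v) = sup_p [p v - H(p)], approximated by a maximum over a grid of p values.
    return np.max(grid * v - H(grid))


p0 = 0.7
v0 = A * np.exp(p0)              # v0 = dH/dp at p0
identity_rhs = p0 * v0 - H(p0)   # p dH/dp - H
print(L_exact(v0), L_numeric(v0), identity_rhs)
```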

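Consider the fluctuation path that verifies \(\dot{X_{f}}=\frac{\partial H}{\partial p}\left( X_{f},\nabla V(X_{f})\right) \) with \(X_{f}(0)=x\). It is the time reversal of the reverse fluctuation path starting from x, so by assumption \(X_{f}(t)\rightarrow x_{0}\) as \(t\rightarrow -\infty \). Applying the formula above along this path with \(p=\nabla V\), and using moreover \(H(x,\nabla V(x))=0\), we get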

$$\begin{aligned} \int _{-\infty }^{0}\hbox {d}t\,L\left( X_{f},{\dot{X}}_{f}\right) =\int _{-\infty }^{0}\hbox {d}t\,{\dot{X}}_{f}\nabla V(X_{f})=V(x). \end{aligned}$$

Hence \(U(x)\le V(x)\). We thus conclude that V is the quasipotential.
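Both halves of the argument can be illustrated numerically. The sketch below uses an assumed one-dimensional gradient diffusion dX = -V'(X) dt + sqrt(2 eps) dW, which is not a model from the paper. Its Hamiltonian H(x, p) = -p V'(x) + p^2 satisfies H(x, V'(x)) = 0, and its fluctuation field is F = V'. Running the reverse fluctuation dynamics from x and reversing time gives a path whose action matches V(x), up to discretization error. A non-optimal path from x0 to x has a larger action; here these are straight lines traversed in time T, a hypothetical choice. The potential V and all step sizes are illustrative assumptions.

```python
import numpy as np


def V(x):
    # Candidate quasipotential (hypothetical choice): unique minimum x0 = 0 with V(0) = 0.
    return 0.5 * x**2 + 0.25 * x**4


def dV(x):
    return x + x**3


def lagrangian(x, xdot):
    # Legendre transform of H(x, p) = -p V'(x) + p^2, the small-noise Hamiltonian of
    # dX = -V'(X) dt + sqrt(2 eps) dW (assumed toy model).
    return 0.25 * (xdot + dV(x)) ** 2


def fluctuation_action(x, T=15.0, n=150_000):
    """Integrate the reverse fluctuation dynamics dY/ds = -F(Y) = -V'(Y) from Y(0) = x
    with explicit Euler steps; the fluctuation path is X_f(t) = Y(-t). Return the
    action of X_f on [-T, 0], which approximates the integral over (-inf, 0]."""
    ds = T / n
    y = np.empty(n + 1)
    yk = float(x)
    y[0] = yk
    for k in range(n):
        yk -= ds * dV(yk)
        y[k + 1] = yk
    xdot_f = (y[:-1] - y[1:]) / ds   # velocity of X_f at the point y[k]
    return np.sum(lagrangian(y[:-1], xdot_f)) * ds


def straight_line_action(x, T, n=10_000):
    """Action of the non-optimal path X(t) = x t / T from x0 = 0 to x (midpoint rule)."""
    dt = T / n
    t = (np.arange(n) + 0.5) * dt
    return np.sum(lagrangian(x * t / T, x / T)) * dt


x = 1.0
print(V(x), fluctuation_action(x))        # approximately equal (O(ds) Euler error)
for T in (0.5, 1.0, 10.0):
    print(T, straight_line_action(x, T))  # each value exceeds V(x)
```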

The Infinitesimal Generator for the Free Transport

We consider N particles that undergo free transport. Each particle \(1\le n\le N\) has a position \({\mathbf {r}}_{n}(t)\) and a velocity \({\mathbf {v}}_{n}(t)\). Then the empirical distribution f verifies the equation

$$\begin{aligned} \frac{\partial f}{\partial t}=-{\mathbf {v}}.\frac{\partial f}{\partial {\mathbf {r}}} \end{aligned}$$

Let us consider a functional \(\phi \) of f. Then \(\phi \) evolves according to

$$\begin{aligned} \frac{\hbox {d}\phi }{\hbox {d}t}=\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\, \frac{\partial f}{\partial t}({\mathbf {r}},{\mathbf {v}})\frac{\delta \phi }{\delta f({\mathbf {r}},{\mathbf {v}})}=-\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\, {\mathbf {v}}.\frac{\partial f}{\partial {\mathbf {r}}}({\mathbf {r}},{\mathbf {v}})\frac{\delta \phi }{\delta f({\mathbf {r}},{\mathbf {v}})}. \end{aligned}$$

Then the infinitesimal generator of the free transport is

$$\begin{aligned} G\left[ \phi \right] =-\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\,{\mathbf {v}}. \frac{\partial f}{\partial {\mathbf {r}}}({\mathbf {r}},{\mathbf {v}}) \frac{\delta \phi }{\delta f({\mathbf {r}},{\mathbf {v}})}. \end{aligned}$$

If \(\phi =\hbox {e}^{\frac{\int \text {d}{\mathbf {r}}\text {d}{\mathbf {v}}\,pf}{\epsilon }}\), then

$$\begin{aligned} \frac{\delta \phi }{\delta f({\mathbf {r}},{\mathbf {v}})}=\frac{p({\mathbf {r}}, {\mathbf {v}})}{\epsilon }\hbox {e}^{\frac{\int \text {d}{\mathbf {r}}_{1}\text {d}{\mathbf {v}}_{1}\,pf}{\epsilon }}, \end{aligned}$$

and therefore

$$\begin{aligned} \epsilon G\left[ \hbox {e}^{\frac{\int \text {d}{\mathbf {r}}\text {d}{\mathbf {v}}\,pf}{\epsilon }} \right] \hbox {e}^{-\frac{\int \text {d}{\mathbf {r}}\text {d}{\mathbf {v}}\,pf}{\epsilon }} =-\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\,p({\mathbf {r}},{\mathbf {v}}) {\mathbf {v}}.\frac{\partial f}{\partial {\mathbf {r}}}({\mathbf {r}},{\mathbf {v}}). \end{aligned}$$
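Free transport is deterministic, so \(G[\phi ]\) is simply \(\hbox {d}\phi /\hbox {d}t\) along the flow. With periodic boundary conditions, an integration by parts in \({\mathbf {r}}\) turns the right-hand side into \(\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\,f\,{\mathbf {v}}.\partial p/\partial {\mathbf {r}}\). The sketch below checks this in one dimension by finite differences. It assumes the normalization \(f=\frac{1}{N}\sum _{n}\delta ({\mathbf {r}}-{\mathbf {r}}_{n}(t))\delta ({\mathbf {v}}-{\mathbf {v}}_{n})\), a periodic box [0, 1), and a test function p(r, v) = sin(2 pi r) v^2. N, EPS and p are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
r0 = rng.random(N)        # initial positions in the periodic box [0, 1)
v = rng.normal(size=N)    # velocities (unchanged by free transport)
EPS = 0.1                 # hypothetical value of the small parameter epsilon


def p(r, v):
    # Hypothetical test function p(r, v), 1-periodic in r.
    return np.sin(2 * np.pi * r) * v**2


def dp_dr(r, v):
    return 2 * np.pi * np.cos(2 * np.pi * r) * v**2


def phi(t):
    # phi = exp(integral of p f / eps) for the empirical distribution at time t,
    # with free streaming r_n(t) = r_n(0) + v_n t. Because p is periodic in r,
    # positions need not be wrapped back into the box.
    return np.exp(np.mean(p(r0 + v * t, v)) / EPS)


h = 1e-6
generator_ratio = EPS * (phi(h) - phi(-h)) / (2 * h) / phi(0.0)  # eps G[phi] / phi
hamiltonian = np.mean(v * dp_dr(r0, v))                           # integral of f v dp/dr
print(generator_ratio, hamiltonian)
```

The result is linear in p, as expected for a deterministic evolution: free transport contributes a term \(\int \hbox {d}{\mathbf {r}}\hbox {d}{\mathbf {v}}\,f\,{\mathbf {v}}.\partial p/\partial {\mathbf {r}}\) to the Hamiltonian, with no fluctuation part.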


Cite this article

Bouchet, F. Is the Boltzmann Equation Reversible? A Large Deviation Perspective on the Irreversibility Paradox. J Stat Phys 181, 515–550 (2020).



  • Boltzmann equation
  • Kinetic theory
  • Large deviation theory
  • Macroscopic fluctuation theory
  • Dilute gases
  • Gradient flows