Abstract
We study the Dyson–Ornstein–Uhlenbeck diffusion process, an evolving gas of interacting particles. Its invariant law is the beta Hermite ensemble of random matrix theory, a non-product log-concave distribution. We explore the convergence to equilibrium of this process for various distances or divergences, including total variation, relative entropy, and transportation cost. When the number of particles is sent to infinity, we show that a cutoff phenomenon occurs: the distance to equilibrium vanishes abruptly at a critical time. A remarkable feature is that this critical time is independent of the parameter beta that controls the strength of the interaction; in particular, the result is identical in the non-interacting case, which is nothing but the Ornstein–Uhlenbeck process. We also provide a complete analysis of the non-interacting case that reveals some new phenomena. Our work relies, among other ingredients, on convexity and functional inequalities, exact solvability, exact Gaussian formulas, coupling arguments, stochastic calculus, variational formulas, and contraction properties. Beyond the specific process that we study, this work leads to questions on the high-dimensional analysis of heat kernels of curved diffusions.


Notes
Here “H” is the capital \(\eta \) used by Boltzmann for entropy, “W” is for Wasserstein, “I” is for Fisher information.
References
Aldous, D., Diaconis, P.: Shuffling cards and stopping times. Am. Math. Mon. 93, 333–348 (1986)
Anderson, G.W., Guionnet, A., Zeitouni, O.: An introduction to random matrices, volume 118 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge (2010)
Ané, C., Blachère, S., Chafaï, D., Fougères, P., Gentil, I., Malrieu, F., Roberto, C., Scheffer, G.: Sur les inégalités de Sobolev logarithmiques, vol. 10. Société Mathématique de France, Paris (2000)
Baker, T.H., Forrester, P.J.: The Calogero-Sutherland model and polynomials with prescribed symmetry. Nuclear Phys. B 492(3), 682–716 (1997)
Bakry, D.: Remarques sur les semigroupes de Jacobi. Astérisque 236, 23–39 (1996). (Hommage à P. A. Meyer et J. Neveu)
Bakry, D., Gentil, I., Ledoux, M.: Analysis and geometry of Markov diffusion operators, vol. 348. Springer, Cham (2014)
Barrera, G.: Abrupt convergence for a family of Ornstein-Uhlenbeck processes. Braz. J. Probab. Stat. 32(1), 188–199 (2018)
Barrera, G., Högele, M.A., Pardo, J.C.: The cutoff phenomenon in total variation for nonlinear Langevin systems with small layered stable noise. preprint arXiv:2011.10806v1, (2020)
Barrera, G., Högele, M.A., Pardo, J.C.: Cutoff thermalization for Ornstein-Uhlenbeck systems with small Lévy noise in the Wasserstein distance. preprint arXiv:2009.10590v1 (2020), to appear in J. Stat. Phys.
Barrera, G., Jara, M.: Thermalisation for small random perturbations of dynamical systems. Ann. Appl. Probab. 30(3), 1164–1208 (2020)
Barrera, G., Pardo, J.C.: Cut-off phenomenon for Ornstein-Uhlenbeck processes driven by Lévy processes. Electron. J. Probab. 25, 33 (2020). (Paper No. 15)
Ben Arous, G., Guionnet, A.: Large deviations for Wigner’s law and Voiculescu’s non-commutative entropy. Probab. Theory Relat. Fields 108(4), 517–542 (1997)
Bertucci, C., Debbah, M., Lasry, J.-M., Lions, P.-L.: A spectral dominance approach to large random matrices. preprint arXiv:2105.08983v1, (2021)
Biane, P., Speicher, R.: Free diffusions, free entropy and free Fisher information. Ann. Inst. H. Poincaré Probab. Stat. 37(5), 581–606 (2001)
Bolley, F., Chafaï, D., Fontbona, J.: Dynamics of a planar Coulomb gas. Ann. Appl. Probab. 28(5), 3152–3183 (2018)
Bolley, F., Gentil, I., Guillin, A.: Convergence to equilibrium in Wasserstein distance for Fokker-Planck equations. J. Funct. Anal. 263(8), 2430–2457 (2012)
Bourgade, P., Erdős, L., Yau, H.-T.: Edge universality of beta ensembles. Comm. Math. Phys. 332(1), 261–353 (2014)
Caputo, P., Labbé, C., Lacoin, H.: Mixing time of the adjacent walk on the simplex. Ann. Probab. 48(5), 2449–2493 (2020)
Caputo, P., Labbé, C., Lacoin, H.: Spectral gap and cutoff phenomenon for the Gibbs sampler of \(\nabla \varphi \) interfaces with convex potential. Ann. Inst. H. Poincaré Probab. Stat. 58(2), 794–826 (2022)
Carrillo, J.A., McCann, R.J., Villani, C.: Kinetic equilibration rates for granular media and related equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoamericana 19(3), 971–1018 (2003)
Carrillo, J.A., McCann, R.J., Villani, C.: Contractions in the 2-Wasserstein length space and thermalization of granular media. Arch. Ration. Mech. Anal. 179(2), 217–263 (2006)
Cépa, E., Lépingle, D.: Diffusing particles with electrostatic repulsion. Probab. Theory Relat. Fields 107(4), 429–449 (1997)
Chafaï, D.: Entropies, convexity, and functional inequalities: on \(\Phi \)-entropies and \(\Phi \)-Sobolev inequalities. J. Math. Kyoto Univ. 44(2), 325–363 (2004)
Chafaï, D.: Binomial-Poisson entropic inequalities and the M/M/\(\infty \) queue. ESAIM, Probab. Stat. 10, 317–339 (2006)
Chafaï, D., Lehec, J.: On Poincaré and logarithmic Sobolev inequalities for a class of singular Gibbs measures. In: Geometric aspects of functional analysis. Israel seminar (GAFA) 2017–2019. Volume 1, pages 219–246. Springer, Cham (2020)
Chen, G.-Y., Saloff-Coste, L.: The cutoff phenomenon for ergodic Markov processes. Electron. J. Probab. 13(3), 26–78 (2008)
Devroye, L., Mehrabian, A., Reddad, T.: The total variation distance between high-dimensional Gaussians. preprint arXiv:1810.08693v5, (2018)
Diaconis, P.: The cutoff phenomenon in finite Markov chains. Proc. Nat. Acad. Sci. U.S.A. 93(4), 1659–1664 (1996)
Diaconis, P., Saloff-Coste, L.: Logarithmic Sobolev inequalities for finite Markov chains. Ann. Appl. Probab. 6(3), 695–750 (1996)
Diaconis, P., Shahshahani, M.: Time to reach stationarity in the Bernoulli-Laplace diffusion model. SIAM J. Math. Anal. 18, 208–218 (1987)
Donati-Martin, C., Groux, B., Maïda, M.: Convergence to equilibrium in the free Fokker-Planck equation with a double-well potential. Ann. Inst. Henri Poincaré, Probab. Stat. 54(4), 1805–1818 (2018)
Dumitriu, I., Edelman, A.: Matrix models for beta ensembles. J. Math. Phys. 43(11), 5830–5847 (2002)
Dyson, F.J.: A Brownian-motion model for the eigenvalues of a random matrix. J. Math. Phys. 3, 1191–1198 (1962)
Edelman, A.: The random matrix technique of ghosts and shadows. Markov Process. Relat. Fields 16(4), 783–792 (2010)
Edelman, A., Rao, N.R.: Random matrix theory. Acta Numerica 14, 233–297 (2005)
Engoulatov, A.: A universal bound on the gradient of logarithm of the heat kernel for manifolds with bounded Ricci curvature. J. Funct. Anal. 238(2), 518–529 (2006)
Erdős, L., Yau, H.-T.: A dynamical approach to random matrix theory, volume 28 of Courant Lecture Notes in Mathematics. Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI (2017)
Feller, W.: Two singular diffusion problems. Ann. Math. 2(54), 173–182 (1951)
Gibbs, A.L., Su, F.E.: On choosing and bounding probability metrics. Int. Stat. Rev. 70(3), 419–435 (2002)
Givens, C.R., Shortt, R.M.: A class of Wasserstein metrics for probability distributions. Michigan Math. J. 31(2), 231–240 (1984)
Grigor’yan, A.: Heat kernel and analysis on manifolds, volume 47. Providence, RI: American Mathematical Society (AMS); Somerville, MA: International Press (2009)
Gustavsson, J.: Gaussian fluctuations of eigenvalues in the GUE. Ann. Inst. Henri Poincaré, Probab. Stat. 41(2), 151–178 (2005)
Hoffman, A.J., Wielandt, H.W.: The variation of the spectrum of a normal matrix. Duke Math. J. 20, 37–39 (1953)
Holcomb, D., Paquette, E.: Tridiagonal models for Dyson Brownian motion. preprint arXiv:1707.02700 (2017)
Horn, R.A., Johnson, C.R.: Matrix analysis, 2nd edn. Cambridge University Press, Cambridge (2013)
Huang, J., Landon, B.: Rigidity and a mesoscopic central limit theorem for Dyson Brownian motion for general \(\beta \) and potentials. Probab. Theory Relat. Fields 175(1–2), 209–253 (2019)
Lachaud, B.: Cut-off and hitting times of a sample of Ornstein-Uhlenbeck processes and its average. J. Appl. Probab. 42(4), 1069–1080 (2005)
Lacoin, H.: Mixing time and cutoff for the adjacent transposition shuffle and the simple exclusion. Ann. Probab. 44(2), 1426–1487 (2016)
Lassalle, M.: Polynômes de Hermite généralisés. C. R. Acad. Sci. Paris Sér. I Math. 313(9), 579–582 (1991)
Lassalle, M.: Polynômes de Jacobi généralisés. C. R. Acad. Sci. Paris Sér. I Math. 312(6), 425–428 (1991)
Lassalle, M.: Polynômes de Laguerre généralisés. C. R. Acad. Sci. Paris Sér. I Math. 312(10), 725–728 (1991)
Levin, D.A., Peres, Y., Wilmer, E.L.: Markov chains and mixing times. With a chapter on “Coupling from the past” by James G. Propp and David B. Wilson, 2nd edn. American Mathematical Society, Providence, RI (2017)
Li, S., Li, X.-D., Xie, Y.-X.: On the law of large numbers for the empirical measure process of generalized Dyson Brownian motion. J. Stat. Phys. 181(4), 1277–1305 (2020)
Lippert, R.A.: A matrix model for the \(\beta \)-Jacobi ensemble. J. Math. Phys. 44(10), 4807–4816 (2003)
Méliot, P.-L.: The cut-off phenomenon for Brownian motions on compact symmetric spaces. Potential Anal. 40(4), 427–509 (2014)
Pardo, L.: Statistical inference based on divergence measures, volume 185 of Statistics: Textbooks and Monographs. Chapman & Hall/CRC, Boca Raton, FL (2006)
Pollard, D.: A user’s guide to measure theoretic probability, volume 8 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2002)
Potters, M., Bouchaud, J.-P.: A first course in random matrix theory: for physicists, engineers and data scientists. Cambridge University Press, Cambridge (2021)
Rachev, S.T.: Probability metrics and the stability of stochastic models. John Wiley & Sons Ltd., Chichester etc. (1991)
Rogers, L., Shi, Z.: Interacting Brownian particles and the Wigner law. Probab. Theory Relat. Fields 95(4), 555–570 (1993)
Salez, J.: Cutoff for non-negatively curved Markov chains. preprint arXiv:2102.05597v1, (2021)
Saloff-Coste, L.: Precise estimates on the rate at which certain diffusions tend to equilibrium. Mathematische Zeitschrift 217(1), 641–677 (1994)
Saloff-Coste, L.: Aspects of Sobolev-type inequalities, vol. 289. Cambridge University Press, Cambridge (2002)
Saloff-Coste, L.: On the convergence to equilibrium of Brownian motion on compact simple Lie groups. J. Geom. Anal. 14(4), 715–733 (2004)
Souplet, P., Zhang, Q.S.: Sharp gradient estimate and Yau’s Liouville theorem for the heat equation on noncompact manifolds. Bull. Lond. Math. Soc. 38(6), 1045–1053 (2006)
Villani, C.: Optimal transport. Old and new, vol. 338. Springer, Berlin (2009)
Acknowledgements
JB is supported by a “Fondation CFM pour la Recherche” grant. DC is supported by project EFI ANR-17-CE40-0030. CL is supported by project SINGULAR ANR-16-CE40-0020-01.
Appendices
Appendix A. Distances and divergences
We use the following standard distances and divergences to quantify the trend to equilibrium of Markov processes and to formulate the cutoff phenomena.
The Wasserstein–Kantorovich–Monge transportation distance of order 2, with respect to the underlying Euclidean distance, is defined for all probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^n\) by
$$\begin{aligned} \mathrm {Wasserstein}(\mu ,\nu ) = \inf _{(X,Y)}{\mathbb {E}}[|X-Y|^2]^{1/2}, \end{aligned}$$
where \(|x|=\sqrt{x_1^2+\ldots +x_n^2}\) and where the infimum runs over all couples (X, Y) with \(X\sim \mu \) and \(Y\sim \nu \).
The total variation distance between probability measures \(\mu \) and \(\nu \) on the same space is
$$\begin{aligned} \Vert \mu -\nu \Vert _{\mathrm {TV}} = \sup _{A}|\mu (A)-\nu (A)|, \end{aligned}$$
where the supremum runs over Borel subsets A. If \(\mu \) and \(\nu \) are absolutely continuous with respect to a reference measure \(\lambda \) with densities \(f_\mu \) and \(f_\nu \) then \(\Vert \mu -\nu \Vert _{\mathrm {TV}}=\frac{1}{2}\int |f_\mu -f_\nu |\mathrm {d}\lambda =\frac{1}{2}\Vert f_\mu -f_\nu \Vert _{L^1(\lambda )}\).
The Hellinger distance between probability measures \(\mu \) and \(\nu \) with densities \(f_\mu \) and \(f_\nu \) with respect to the same reference measure \(\lambda \) is
$$\begin{aligned} \mathrm {Hellinger}(\mu ,\nu ) = \Bigl (\frac{1}{2}\int (\sqrt{f_\mu }-\sqrt{f_\nu })^2\,\mathrm {d}\lambda \Bigr )^{1/2}. \end{aligned}$$
This quantity does not depend on the choice of \(\lambda \). We have \(\mathrm {Hellinger}(\mu ,\nu )=\frac{1}{\sqrt{2}}\Vert \sqrt{f_\mu }-\sqrt{f_\nu }\Vert _{L^2(\lambda )}\). Note that an alternative normalization is sometimes considered in the literature, making the maximal value of the Hellinger distance equal \(\sqrt{2}\).
The Kullback–Leibler divergence or relative entropy is defined by
$$\begin{aligned} \mathrm {Kullback}(\nu \mid \mu ) = \int \log \Bigl (\frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr )\mathrm {d}\nu \end{aligned}$$
if \(\nu \) is absolutely continuous with respect to \(\mu \), and \(\mathrm {Kullback}(\nu \mid \mu )=+\infty \) otherwise.
The \(\chi ^2\) divergence or relative variance is given by
$$\begin{aligned} \chi ^2(\nu \mid \mu ) = \int \Bigl (\frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr )^2\mathrm {d}\mu -1 \end{aligned}$$
when \(\nu \) is absolutely continuous with respect to \(\mu \). We set it to \(+\infty \) if \(\nu \) is not absolutely continuous with respect to \(\mu \). If \(\mu \) and \(\nu \) have densities \(f_\mu \) and \(f_\nu \) with respect to a reference measure \(\lambda \) then \(\chi ^2(\nu \mid \mu )=\int (f_\nu ^2/f_\mu )\mathrm {d}\lambda -1\).
The (logarithmic) Fisher information or divergence is defined by
$$\begin{aligned} \mathrm {Fisher}(\nu \mid \mu ) = \int \Bigl |\nabla \log \frac{\mathrm {d}\nu }{\mathrm {d}\mu }\Bigr |^2\mathrm {d}\nu \end{aligned}$$
if \(\nu \) is absolutely continuous with respect to \(\mu \), and \(\mathrm {Fisher}(\nu \mid \mu )=+\infty \) otherwise.
Each of these distances or divergences has its advantages and drawbacks. In some sense, the most sensitive is Fisher, due to its Sobolev nature, then \(\chi ^2\), then Kullback, which can be seen as a sort of \(L^{1+}=L\log L\) norm, then TV and Hellinger, which are comparable, and finally Wasserstein; this rough hierarchy nevertheless misses some subtleties related to scales and to the nature of the arguments.
Some of these distances or divergences can generically be compared as the following result shows.
Lemma A.1
(Inequalities). For any probability measures \(\mu \) and \(\nu \) on the same space,
$$\begin{aligned} \mathrm {Hellinger}^2(\mu ,\nu ) \le \Vert \mu -\nu \Vert _{\mathrm {TV}} \le \mathrm {Hellinger}(\mu ,\nu )\sqrt{2-\mathrm {Hellinger}^2(\mu ,\nu )} \quad \text {and}\quad \Vert \mu -\nu \Vert _{\mathrm {TV}} \le \sqrt{\tfrac{1}{2}\mathrm {Kullback}(\nu \mid \mu )}. \end{aligned}$$
We refer to [57, p. 61-62] for a proof. The inequality between the total variation distance and the relative entropy is known as the Pinsker or Csiszár–Kullback inequality, while the inequalities between the total variation distance and the Hellinger distance are due to Kraft. There are many other metrics between probability measures, see for instance [39, 59] for a discussion.
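On a finite state space, the Pinsker and Kraft inequalities just mentioned can be sanity-checked numerically. The following self-contained Python sketch is our own illustration (not taken from the proof in [57]); it tests them on random discrete distributions, with the Hellinger normalization of maximal value 1 used in this appendix.

```python
import math
import random

def tv(p, q):
    # total variation distance: half the L1 distance between the densities
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    # normalization with maximal value 1, as in the text
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def kullback(nu, mu):
    # Kullback(nu | mu) on a finite space with everywhere positive mu
    return sum(b * math.log(b / a) for a, b in zip(mu, nu) if b > 0)

random.seed(0)
for _ in range(100):
    p = [random.random() + 1e-3 for _ in range(5)]
    q = [random.random() + 1e-3 for _ in range(5)]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    d, h = tv(p, q), hellinger(p, q)
    # Pinsker / Csiszar-Kullback inequality
    assert d <= math.sqrt(0.5 * kullback(q, p)) + 1e-12
    # Kraft's inequalities between TV and Hellinger
    assert h ** 2 - 1e-12 <= d <= h * math.sqrt(2 - h ** 2) + 1e-12
```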
The total variation distance can also be seen as a special Wasserstein distance of order 1 with respect to the atomic distance, namely
$$\begin{aligned} \Vert \mu -\nu \Vert _{\mathrm {TV}} = \inf \,{\mathbb {P}}(X\ne Y), \end{aligned}$$
where the infimum runs over all couplings \(X\sim \mu \) and \(Y\sim \nu \). This explains in particular why \(\mathrm {TV}\) is more sensitive than \(\mathrm {Wasserstein}\) at short scales but less sensitive at large scales, a consequence of the sensitivity difference between the underlying atomic and Euclidean distances. The probabilistic representations of \(\mathrm {TV}\) and \(\mathrm {Wasserstein}\) make them compatible with techniques of coupling, which play an important role in the literature on convergence to equilibrium of Markov processes.
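On a finite state space both representations of \(\mathrm {TV}\) are elementary to implement, and the infimum in the coupling representation is attained by the maximal coupling, which matches the two laws on \(\min (p_i,q_i)\). A minimal Python illustration of our own:

```python
def tv(p, q):
    # sup_A |p(A) - q(A)| equals half the L1 distance on a finite space
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def tv_coupling(p, q):
    # coupling representation: inf P(X != Y) = 1 - sum_i min(p_i, q_i),
    # the infimum being attained by the maximal coupling
    return 1.0 - sum(min(a, b) for a, b in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
assert abs(tv(p, q) - 0.4) < 1e-12
assert abs(tv(p, q) - tv_coupling(p, q)) < 1e-12
```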
We gather now useful results on distances and divergences.
Lemma A.2
(Contraction properties). Let \(\mu \) and \(\nu \) be two probability measures on the same measurable space S. Let \(f:S\rightarrow T\) be a measurable map, where T is another measurable space.
-
If \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Kullback}, \chi ^2\}\) then
$$\begin{aligned} \mathrm {dist}(\nu \circ f^{-1}\mid \mu \circ f^{-1}) \le \mathrm {dist}(\nu \mid \mu ). \end{aligned}$$ -
If \(S={\mathbb {R}}^n\), \(T={\mathbb {R}}^k\) then, denoting \(\left\| f\right\| _{\mathrm {Lip}}=\sup _{x\ne y}\frac{|f(x)-f(y)|}{|x-y|}\),
$$\begin{aligned} \mathrm {Wasserstein}(\mu \circ f^{-1},\nu \circ f^{-1}) \le \left\| f\right\| _{\mathrm {Lip}}\mathrm {Wasserstein}(\mu ,\nu ). \end{aligned}$$
The notation \(f^{-1}\) stands for the reciprocal map \(f^{-1}(A)=\{x\in S:f(x)\in A\}\) and \(\mu \circ f^{-1}\) is the image measure or push-forward of \(\mu \) by the map f, defined by \((\mu \circ f^{-1})(A)=\mu (f^{-1}(A))\). In terms of random variables we have \(Y\sim \mu \circ f^{-1}\) if and only if \(Y=f(X)\) where \(X\sim \mu \).
The proofs of the contraction properties of Lemma A.2 are all based on variational formulas. Note that following [66, Ex. 22.20 p. 588], there is a variational formula for \(\mathrm {Fisher}\) that comes from its dual representation as an inverse Sobolev norm. We do not develop this idea in this work.
Proof
The proof of the contraction property for Wasserstein comes from the fact that every coupling of \(\mu \) and \(\nu \) produces a coupling of \(\mu \circ f^{-1}\) and \(\nu \circ f^{-1}\). Regarding TV, the contraction property is a consequence of the definition of this distance and of measurability. In the case of Kullback, the property can be proved using the following well-known variational formula:
$$\begin{aligned} \mathrm {Kullback}(\nu \mid \mu ) = \sup _g\Bigl \{\int g\,\mathrm {d}\nu -\log \int \mathrm {e}^g\,\mathrm {d}\mu \Bigr \}, \end{aligned}$$
where the supremum runs over all \(g\in L^1(\nu )\), or by approximation when the supremum runs over all bounded measurable g. This variational formula can be derived for instance by applying Jensen’s inequality to \(- \log {\mathbb {E}}_{\nu }[\mathrm {e}^g \frac{\mathrm {d}\mu }{\mathrm {d}\nu }]\).
Equality is achieved for \(g=\log (\mathrm {d}\nu /\mathrm {d}\mu )\). Now, taking \(g=h\circ f\) gives
$$\begin{aligned} \int h\,\mathrm {d}(\nu \circ f^{-1})-\log \int \mathrm {e}^h\,\mathrm {d}(\mu \circ f^{-1}) = \int h\circ f\,\mathrm {d}\nu -\log \int \mathrm {e}^{h\circ f}\,\mathrm {d}\mu \le \mathrm {Kullback}(\nu \mid \mu ), \end{aligned}$$
and it remains to take the supremum over h to get
$$\begin{aligned} \mathrm {Kullback}(\nu \circ f^{-1}\mid \mu \circ f^{-1}) \le \mathrm {Kullback}(\nu \mid \mu ). \end{aligned}$$
The variational formula for \(\mathrm {Kullback}(\cdot \mid \mu )\) is a manifestation of its convexity: it expresses this functional as the envelope of its tangents, and its Fenchel–Legendre transform or convex dual is the log-Laplace transform. Such a variational formula is equivalent to tensorization, and is available for all \(\Phi \)-entropies such that \((u,v)\mapsto \Phi ''(u)v^2\) is convex, see [24, Th. 4.4]. In particular, the analogous variational formula as well as the consequence in terms of contraction are also available for \(\chi ^2\), which corresponds to the \(\Phi \)-entropy with \(\Phi (u)=u^2-1\) (variance as a \(\Phi \)-entropy). \(\square \)
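The contraction (data-processing) property of Lemma A.2 is easy to observe on a finite space, where the push-forward simply aggregates masses along the fibers of f. A small Python sketch of our own:

```python
import math
from collections import defaultdict

def kullback(nu, mu):
    # Kullback(nu | mu) on a finite space (dictionaries: point -> mass)
    return sum(w * math.log(w / mu[x]) for x, w in nu.items() if w > 0)

def pushforward(law, f):
    # image measure law o f^{-1}: aggregate the masses along the fibers of f
    out = defaultdict(float)
    for x, w in law.items():
        out[f(x)] += w
    return dict(out)

mu = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
nu = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
f = lambda x: x % 2  # a non-injective map S = {0,1,2,3} -> T = {0,1}

# contraction: Kullback can only decrease under a push-forward
assert kullback(pushforward(nu, f), pushforward(mu, f)) <= kullback(nu, mu) + 1e-12
```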
Lemma A.3
(Scale invariance versus homogeneity) The total variation distance is scale invariant while the Wasserstein distance is homogeneous just like a norm, namely for all probability measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^n\) and all scaling factors \(\sigma \in (0,\infty )\), denoting \(\mu _\sigma =\mathrm {Law}(\sigma X)\) where \(X\sim \mu \), we have
$$\begin{aligned} \Vert \mu _\sigma -\nu _\sigma \Vert _{\mathrm {TV}} = \Vert \mu -\nu \Vert _{\mathrm {TV}} \quad \text {and}\quad \mathrm {Wasserstein}(\mu _\sigma ,\nu _\sigma ) = \sigma \,\mathrm {Wasserstein}(\mu ,\nu ). \end{aligned}$$
Proof
For the Wasserstein distance, the result follows from the identity \(|\sigma x-\sigma y|=\sigma |x-y|\) together with the bijection between couplings of \((\mu ,\nu )\) and couplings of \((\mu _\sigma ,\nu _\sigma )\) induced by the dilation,
while for the \(\mathrm {TV}\) distance, it comes from the fact that \(A\mapsto A_\sigma := \{\sigma x: x\in A\}\) is a bijection. \(\square \)
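For one-dimensional Gaussian laws, both behaviors can be checked against classical closed forms: \(\Vert {\mathcal {N}}(m_1,s^2)-{\mathcal {N}}(m_2,s^2)\Vert _{\mathrm {TV}}=\mathrm {erf}(|m_1-m_2|/(2\sqrt{2}\,s))\) for equal variances, and \(\mathrm {Wasserstein}({\mathcal {N}}(m_1,s_1^2),{\mathcal {N}}(m_2,s_2^2))=\sqrt{(m_1-m_2)^2+(s_1-s_2)^2}\). A Python sketch of our own:

```python
import math

def tv_gauss(m1, m2, s):
    # TV between N(m1, s^2) and N(m2, s^2): erf(|m1 - m2| / (2 * sqrt(2) * s))
    return math.erf(abs(m1 - m2) / (2 * math.sqrt(2) * s))

def w2_gauss(m1, s1, m2, s2):
    # Wasserstein distance of order 2 between one-dimensional Gaussians
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

m1, m2, s, sigma = 0.0, 1.0, 0.7, 3.0
# scale invariance of TV under the dilation x -> sigma * x
assert abs(tv_gauss(sigma * m1, sigma * m2, sigma * s) - tv_gauss(m1, m2, s)) < 1e-12
# homogeneity of Wasserstein: the distance is multiplied by sigma
assert abs(w2_gauss(sigma * m1, sigma * s, sigma * m2, sigma * s)
           - sigma * w2_gauss(m1, s, m2, s)) < 1e-12
```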
We turn to the behavior of the distances/divergences under tensorization.
Lemma A.4
(Tensorization) For all probability measures \(\mu _1,\ldots ,\mu _n\) and \(\nu _1,\ldots ,\nu _n\) on \({\mathbb {R}}\), we have
$$\begin{aligned} \mathrm {Wasserstein}^2(\mu _1\otimes \cdots \otimes \mu _n,\nu _1\otimes \cdots \otimes \nu _n) = \sum _{i=1}^n\mathrm {Wasserstein}^2(\mu _i,\nu _i) \end{aligned}$$
and
$$\begin{aligned} \max _{1\le i\le n}\Vert \mu _i-\nu _i\Vert _{\mathrm {TV}} \le \Vert \mu _1\otimes \cdots \otimes \mu _n-\nu _1\otimes \cdots \otimes \nu _n\Vert _{\mathrm {TV}} \le \sum _{i=1}^n\Vert \mu _i-\nu _i\Vert _{\mathrm {TV}}. \end{aligned}$$
The equality for the Wasserstein distance comes by taking the product of optimal couplings. The first inequality for the total variation distance comes from its contraction property (Lemma A.2), while the second comes from \(|(a_1\ldots a_n)-(b_1\ldots b_n)|\le \sum _{i=1}^n|a_i-b_i|(a_1\ldots a_{i-1})(b_{i+1}\ldots b_n)\), \(a_1,\ldots ,a_n,b_1,\ldots ,b_n\in [0,+\infty )\), which itself comes from the triangle inequality applied to the telescoping sum \(\sum _{i=1}^n(c_i-c_{i-1})\) where \(c_i=(a_1\ldots a_i)(b_{i+1}\ldots b_n)\), via \(c_i-c_{i-1}=(a_i-b_i)(a_1\ldots a_{i-1})(b_{i+1}\ldots b_n)\).
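For products of one-dimensional Gaussians, the covariances commute and the tensorization identity for \(\mathrm {Wasserstein}^2\) reduces to a coordinatewise sum, which the following Python sketch (our own illustration) makes explicit:

```python
def w2_sq_1d(m1, s1, m2, s2):
    # squared Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

# coordinates of two product Gaussian laws on R^3 (means, standard deviations)
ms1, ss1 = [0.0, 1.0, -2.0], [1.0, 2.0, 0.5]
ms2, ss2 = [0.5, 0.0, 1.0], [2.0, 1.0, 1.5]

# product law: W2^2 = |m1 - m2|^2 + sum_i (s1_i - s2_i)^2 (diagonal covariances)
lhs = sum((a - b) ** 2 for a, b in zip(ms1, ms2)) + sum((u - v) ** 2 for u, v in zip(ss1, ss2))
# tensorization: the same quantity as a sum of coordinatewise squared distances
rhs = sum(w2_sq_1d(a, u, b, v) for a, u, b, v in zip(ms1, ss1, ms2, ss2))
assert abs(lhs - rhs) < 1e-12
```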
Lemma A.5
(Explicit formulas for Gaussian distributions) For all \(n\ge 1\), \(m_1,m_2\in {\mathbb {R}}^n\), and all \(n\times n\) covariance matrices \(\Sigma _1,\Sigma _2\), denoting \(\Gamma _1={\mathcal {N}}(m_1,\Sigma _1)\) and \(\Gamma _2={\mathcal {N}}(m_2,\Sigma _2)\), we have
where the formula for \(\chi ^2(\Gamma _1\mid \Gamma _2)\) holds if \(2\Sigma _2>\Sigma _1\), and \(\chi ^2(\Gamma _1\mid \Gamma _2)=+\infty \) otherwise. Moreover the formulas for Fisher and Wasserstein rewrite, if \(\Sigma _1\) and \(\Sigma _2\) commute, \(\Sigma _1\Sigma _2=\Sigma _2\Sigma _1\), to
Regarding the total variation distance, there is no general simple formula for Gaussian laws, but we can use for instance the comparisons with \(\mathrm {Kullback}\) and \(\mathrm {Hellinger}\) (Lemma A.1), see [27] for a discussion.
Proof of Lemma A.5
We refer to [56, p. 47 and p. 51] for Kullback and Hellinger, and to [40] for Wasserstein, a far more subtle case. The formula for \(\chi ^2(\Gamma _1\mid \Gamma _2)\) follows easily from a direct computation. We have not found in the literature a formula for Fisher. Let us give it here for the sake of completeness. Using \({\mathbb {E}}[X_iX_j]=\Sigma _{ij}+m_im_j\) when \(X\sim {\mathcal {N}}(m,\Sigma )\) we get, for all \(n\times n\) symmetric matrices A and B
and thus for all n-dimensional vectors a and b,
Now, using the notation \(q_i(x)=\Sigma _i^{-1}(x-m_i)\cdot (x-m_i)\) and \(|\Sigma _i|=\det (\Sigma _i)\),
The formula when \(\Sigma _1\Sigma _2=\Sigma _2\Sigma _1\) follows immediately. \(\square \)
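The closed-form Gaussian formulas can be cross-checked by brute-force numerical integration. Here is a self-contained Python sketch for the one-dimensional Kullback formula (the closed form below is the classical one, consistent with [56], and is our own addition for illustration):

```python
import math

def kl_gauss_1d(m1, s1, m2, s2):
    # classical closed form: Kullback(N(m1, s1^2) | N(m2, s2^2))
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

def kl_numeric(m1, s1, m2, s2, lo=-12.0, hi=12.0, n=100000):
    # brute-force midpoint rule for the integral of f1 * log(f1 / f2)
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        f1 = math.exp(-((x - m1) ** 2) / (2 * s1 ** 2)) / (s1 * math.sqrt(2 * math.pi))
        f2 = math.exp(-((x - m2) ** 2) / (2 * s2 ** 2)) / (s2 * math.sqrt(2 * math.pi))
        total += f1 * math.log(f1 / f2) * dx
    return total

assert kl_gauss_1d(0.0, 1.0, 0.0, 1.0) == 0.0
assert abs(kl_gauss_1d(0.5, 1.2, -0.3, 0.8) - kl_numeric(0.5, 1.2, -0.3, 0.8)) < 1e-5
```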
Appendix B. Convexity and its dynamical consequences
We gather useful dynamical consequences of convexity. We start with functional inequalities.
Lemma B.1
(Logarithmic Sobolev inequality) Let \(P_n^\beta \) be the invariant law of the DOU process solving (1.3). Then, for every law \(\nu \) on \({\mathbb {R}}^n\), we have
$$\begin{aligned} \mathrm {Kullback}(\nu \mid P_n^\beta ) \le \frac{1}{2n}\mathrm {Fisher}(\nu \mid P_n^\beta ). \end{aligned}$$
Moreover the constant \(\frac{1}{2n}\) is optimal.
Furthermore, finite equality is achieved if and only if \(\mathrm {d}\nu /\mathrm {d}P_n^\beta \) is proportional to \(\mathrm {e}^{\lambda (x_1+\ldots +x_n)}\) for some \(\lambda \in {\mathbb {R}}\).
Linearizing the log-Sobolev inequality above with \(\mathrm {d}\nu /\mathrm {d}P_n^\beta =1+\varepsilon f\) gives the Poincaré inequality
$$\begin{aligned} \mathrm {Var}_{P_n^\beta }(f) \le \frac{1}{n}\int |\nabla f|^2\,\mathrm {d}P_n^\beta . \end{aligned}$$
It can be extended by truncation and regularization from the case where f is smooth and compactly supported to the case where f is in the Sobolev space \(H^1(P^\beta _n)\). Finite equality is achieved when f is an eigenfunction associated to the eigenvalue \(-1\) of \({{\,\mathrm{G}\,}}\), namely \(f(x)=a(x_1+\ldots +x_n)+b\), \(a,b\in {\mathbb {R}}\), hence the other name spectral gap inequality. It rewrites in terms of the \(\chi ^2\) divergence as
$$\begin{aligned} \chi ^2(\nu \mid P_n^\beta ) \le \frac{1}{n}\int \Bigl |\nabla \frac{\mathrm {d}\nu }{\mathrm {d}P_n^\beta }\Bigr |^2\mathrm {d}P_n^\beta . \end{aligned}$$
The right-hand side plays for the \(\chi ^2\) divergence the role played by Fisher for Kullback.
We refer to [25, 37] for a proof of Lemma B.1. This logarithmic Sobolev inequality is a consequence of the log-concavity of \(P_n^\beta \) with respect to \({\mathcal {N}}(0,\frac{1}{n}\mathrm {I}_n)\). A slightly delicate aspect lies in the presence of the restriction to \(D_n\), which can be circumvented by using a regularization procedure.
There are many other functional inequalities which are a consequence of this log-concavity, for instance the Talagrand transportation inequality that states that when \(\nu \) has finite second moment,
$$\begin{aligned} \mathrm {Wasserstein}^2(\nu ,P_n^\beta ) \le \frac{2}{n}\mathrm {Kullback}(\nu \mid P_n^\beta ), \end{aligned}$$
and the HWI inequalityFootnote 1 that states that when \(\nu \) has finite second moment,
$$\begin{aligned} \mathrm {Kullback}(\nu \mid P_n^\beta ) \le \mathrm {Wasserstein}(\nu ,P_n^\beta )\sqrt{\mathrm {Fisher}(\nu \mid P_n^\beta )} -\frac{n}{2}\mathrm {Wasserstein}^2(\nu ,P_n^\beta ), \end{aligned}$$
and we refer to [66] for this couple of functional inequalities, that we do not use here.
Lemma B.2
(Sub-exponential convergence to equilibrium) Let \({(X^n_t)}_{t\ge 0}\) be the DOU process solution of (1.3) with \(\beta =0\) or \(\beta \ge 1\), and let \(P_n^\beta \) be its invariant law. Then for all \(t\ge 0\), we have the sub-exponential convergences
$$\begin{aligned} \mathrm {Kullback}(\mathrm {Law}(X^n_t)\mid P_n^\beta )&\le \mathrm {e}^{-2t}\,\mathrm {Kullback}(\mathrm {Law}(X^n_0)\mid P_n^\beta ),\\ \chi ^2(\mathrm {Law}(X^n_t)\mid P_n^\beta )&\le \mathrm {e}^{-2t}\,\chi ^2(\mathrm {Law}(X^n_0)\mid P_n^\beta ),\\ \mathrm {Fisher}(\mathrm {Law}(X^n_t)\mid P_n^\beta )&\le \mathrm {e}^{-2t}\,\mathrm {Fisher}(\mathrm {Law}(X^n_0)\mid P_n^\beta ),\\ \mathrm {Wasserstein}(\mathrm {Law}(X^n_t),P_n^\beta )&\le \mathrm {e}^{-t}\,\mathrm {Wasserstein}(\mathrm {Law}(X^n_0),P_n^\beta ). \end{aligned}$$
Recall that when \(\beta >0\) the initial condition \(X^n_0\) is always taken in \(D_n\).
For each inequality, if the right-hand side is infinite then the inequality is trivially satisfied. This is in particular the case for \(\mathrm {Kullback}\) and \(\mathrm {Fisher}\) when \(\mathrm {Law}(X^n_0)\) is not absolutely continuous with respect to the Lebesgue measure, and for Wasserstein when \(\mathrm {Law}(X^n_0)\) has infinite second moment.
Elements of proof of Lemma B.2
The idea is that an exponential decay for \(\mathrm {Kullback}\), \(\chi ^2\), \(\mathrm {Fisher}\), and \(\mathrm {Wasserstein}\) can be established by taking the derivative, using a functional inequality, and applying the Grönwall lemma. More precisely, for \(\mathrm {Kullback}\) it is a log-Sobolev inequality, for \(\chi ^2\) a Poincaré inequality, for \(\mathrm {Wasserstein}\) a transportation type inequality, and for \(\mathrm {Fisher}\) a Bakry–Émery \(\Gamma _2\) inequality, see for instance [3, 6, 66]. It is a rather standard piece of probabilistic functional analysis, related to the log-concavity of \(P_n^\beta \). We recall the crucial steps for the reader's convenience. Let us set \(\mu _t=\mathrm {Law}(X^n_t)\) and \(\mu =P_n^\beta \). For \(t>0\) the density \(p_t=\mathrm {d}\mu _t/\mathrm {d}\mu \) exists and solves the evolution equation \(\partial _tp_t={{\,\mathrm{G}\,}}p_t\) where \({{\,\mathrm{G}\,}}\) is as in (2.5). We have the integration by parts
For \(\mathrm {Kullback}\), we find using these tools, for all \(t>0\), denoting \(\Phi (u):=u\log (u)\),
where the inequality comes from the logarithmic Sobolev inequality of Lemma B.1. It remains to use the Grönwall lemma to get the exponential decay of \(\mathrm {Kullback}\).
The derivation of the exponential decay of the Fisher divergence follows the same lines by differentiating again with respect to time. Indeed, after a sequence of differential computations and integration by parts, we find, see for instance [3, Ch. 5], [6], or [66],
where \(\Gamma _{\!2}(f):=\frac{1}{n^2}f''^2+\frac{1}{n}V''f'^2\) is the Bakry–Émery “Gamma-two” operator of the dynamics. Now, using the convexity of V, we get, by the Grönwall lemma, for all \(t>0\),
This can be used to prove the log-Sobolev inequality, see [3, Ch. 5], [6], and [66]. This differential approach goes back at least to Boltzmann (statistical physics) and Stam (information theory) and was notably extensively developed later on by Bakry, Ledoux, Villani and their followers.
For the Wasserstein distance, we proceed by coupling. Indeed, since the diffusion coefficient is constant in space, we can simply use a parallel coupling. Namely, let \({(X'_t)}_{t\ge 0}\) be the process started from another, possibly random, initial condition \(X'_0\), and satisfying the same stochastic differential equation, driven by the same Brownian motion. We get
hence
Now since E is uniformly convex with \(\nabla ^2 E\ge n I_n\), we get, for all \(x,y\in {\mathbb {R}}^n\),
which gives
and by the Grönwall lemma,
It follows that
By taking the infimum over all couplings of \(X_0\) and \(X_0'\) we get
Taking \(X_0'\sim P_n^\beta \) we get, by invariance, for all \(t\ge 0\),
\(\square \)
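In the non-interacting case \(\beta =0\), the exponential decay of \(\mathrm {Wasserstein}\) can be observed exactly, since each coordinate is then an Ornstein–Uhlenbeck diffusion and Gaussian laws stay Gaussian. The following Python sketch assumes, consistently with the normalization suggested by the log-concavity with respect to \({\mathcal {N}}(0,\frac{1}{n}\mathrm {I}_n)\), the coordinatewise dynamics \(\mathrm {d}X=-X\,\mathrm {d}t+\sqrt{2/n}\,\mathrm {d}B\); this normalization is our reading of (1.3) and should be treated as an assumption:

```python
import math

n = 10                       # number of particles; invariant variance per coordinate 1/n
sig_inf = 1 / math.sqrt(n)   # standard deviation of the invariant law N(0, 1/n)
m0, s0 = 2.0, 0.1            # Gaussian initial law N(m0, s0^2) of one coordinate

def w2_gauss_1d(m1, s1, m2, s2):
    # Wasserstein distance of order 2 between one-dimensional Gaussians
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

def law_at(t):
    # mean and standard deviation of Law(X_t) for dX = -X dt + sqrt(2/n) dB
    m = m0 * math.exp(-t)
    s = math.sqrt(s0 ** 2 * math.exp(-2 * t) + sig_inf ** 2 * (1 - math.exp(-2 * t)))
    return m, s

w0 = w2_gauss_1d(m0, s0, 0.0, sig_inf)
for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    m, s = law_at(t)
    # exponential decay with rate 1, as in Lemma B.2 for Wasserstein
    assert w2_gauss_1d(m, s, 0.0, sig_inf) <= math.exp(-t) * w0 + 1e-12
```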
Lemma B.3
(Monotonicity) Let \({(X^n_t)}_{t\ge 0}\) be the DOU process (1.3), with \(\beta =0\) or \(\beta \ge 1\) and invariant law \(P^\beta _n\). Then for all \(\mathrm {dist}\in \{\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2, \mathrm {Fisher}, \mathrm {Wasserstein}\}\), the function \(t\ge 0\mapsto \mathrm {dist}(\mathrm {Law}(X^n_t)\mid P^\beta _n)\) is non-increasing.
Elements of proof of Lemma B.3
The monotonicity for \(\mathrm {TV}, \mathrm {Hellinger}, \mathrm {Kullback}, \chi ^2\) comes from the Markov nature of the process and the convexity of
This is known as the \(\Phi \)-entropy dissipation of Markov processes, see [6, 23, 66]. This can also be seen from (B.3). The monotonicity for \(\mathrm {TV}\) follows also from the contraction property of the total variation with respect to general Markov kernels, see [52, Ex. 4.2].
The monotonicity for \(\mathrm {Fisher}\) comes from the identity (B.4) and the convexity of V. By (B.3) this monotonicity is also equivalent to the convexity of \(\mathrm {Kullback}\) along the dynamics. The monotonicity for \(\mathrm {Wasserstein}\) can be obtained by computing the derivative along the dynamics starting from (B.6), but this is more subtle due to the variational nature of this distance and involves the convexity of V, see for instance [16, Bottom of p. 2442 and Lem. 3.2].
The monotonicities can also be extracted from the exponential decays of Lemma B.2 thanks to the Markov property and the profile \(\mathrm {e}^{-t}=1-t+o(t)\) of the prefactor in the right hand side. \(\square \)
The convexity of the interaction \(-\log \), as well as the constant nature of the diffusion coefficient in the evolution Eq. (1.3), allows us to use simple “maximum principle” type arguments to prove that the dynamics exhibits a monotone behavior and an exponential decay.
Lemma B.4
(Monotonicity and exponential decay) Let \({(X_t^n)}_{t\ge 0}\) and \({(Y_t^n)}_{t\ge 0}\) be a pair of DOU processes solving (1.3), \(\beta \ge 1\), driven by the same Brownian motion \((B_t)_{t\ge 0}\) on \({\mathbb {R}}^n\) and with respective initial conditions \(X_0^n\in {\overline{D}}_n\) and \(Y_0^n\in {\overline{D}}_n\). If for all \(i\in \{1,\ldots ,n\}\)
then the following properties hold true:
-
(Monotonicity property) for all \(t\ge 0\) and \(i \in \{1,\ldots ,n\}\),
$$\begin{aligned} X_t^{n,i}\le Y_t^{n,i}, \end{aligned}$$ -
(Decay estimate) for all \(t\ge 0\),
$$\begin{aligned} \max _{i\in \{1,\ldots ,n\}} (Y_t^{n,i}-X_t^{n,i}) \le \max _{i\in \{1,\ldots ,n\}} (Y_0^{n,i}-X_0^{n,i})\mathrm {e}^{-t}. \end{aligned}$$
Proof of Lemma B.4
The difference \(Y_t^n - X_t^n\) satisfies
Since there are almost surely no collisions between the coordinates of \(X^n\), resp. of \(Y^n\), the right-hand side is almost surely finite for all \(t > 0\) and every process \(Y_t^{n,i}-X_t^{n,i}\) is \({\mathcal {C}}^1\) on \((0,\infty )\). Note that at time 0 some derivatives may blow up as two coordinates of \(X^n\) or \(Y^n\) may coincide.
Let us define
$$\begin{aligned} M(t) := \max _{i\in \{1,\ldots ,n\}}(Y_t^{n,i}-X_t^{n,i}) \quad \text {and}\quad m(t) := \min _{i\in \{1,\ldots ,n\}}(Y_t^{n,i}-X_t^{n,i}). \end{aligned}$$
Elementary considerations imply that M and m are themselves \({\mathcal {C}}^1\) on \((0,\infty )\) and that at all times \(t>0\), there exist i, j such that
$$\begin{aligned} M(t) = Y_t^{n,i}-X_t^{n,i} \quad \text {and}\quad m(t) = Y_t^{n,j}-X_t^{n,j}. \end{aligned}$$
Of course, this would not be true if there were infinitely many processes. Now observe that if at time \(t>0\) we have \(Y_t^{n,i}-X_t^{n,i} = M(t)\), then
This implies that \(\partial _t M(t) \le - M(t)\). Similarly, we can deduce that \(\partial _t m(t) \ge - m(t)\). Integrating these differential equations, we get for all \(t\ge t_0 > 0\)
Since all processes are continuous on \([0,\infty )\), we can pass to the limit \(t_0\downarrow 0\) and get for all \(t\ge 0\),
\(\square \)
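The monotonicity and the decay estimate of Lemma B.4 can be observed on an Euler–Maruyama discretization with shared noise. The drift normalization below, \(-x_i+\frac{\beta }{2n}\sum _{j\ne i}(x_i-x_j)^{-1}\), is our hypothetical reading of (1.3) and the parameters are arbitrary; the shared noise cancels in the differences \(Y-X\), so the ordering and the \(\mathrm {e}^{-t}\) bound survive discretization up to \(O(\mathrm {d}t)\) error:

```python
import math
import random

random.seed(42)
n, beta, dt, T = 4, 4.0, 1e-3, 0.5
steps = int(T / dt)

def drift(x):
    # assumed drift of (1.3): -x_i + (beta / (2n)) * sum_{j != i} 1 / (x_i - x_j)
    return [-x[i] + (beta / (2 * n)) * sum(1.0 / (x[i] - x[j]) for j in range(n) if j != i)
            for i in range(n)]

x = [-3.0, -1.0, 1.0, 3.0]                               # X_0 in the ordered chamber
y = [xi + g for xi, g in zip(x, [0.1, 0.4, 0.2, 0.3])]   # Y_0 >= X_0 coordinatewise
gap0 = max(b - a for a, b in zip(x, y))

for _ in range(steps):
    # shared Brownian increments: the noise cancels in the differences Y - X
    noise = [math.sqrt(2 / n) * math.sqrt(dt) * random.gauss(0.0, 1.0) for _ in range(n)]
    dx, dy = drift(x), drift(y)
    x = [a + d * dt + w for a, d, w in zip(x, dx, noise)]
    y = [b + d * dt + w for b, d, w in zip(y, dy, noise)]
    assert all(b >= a for a, b in zip(x, y))  # monotonicity is preserved

# decay estimate: max_i (Y_T - X_T) <= max_i (Y_0 - X_0) e^{-T}, up to O(dt) error
gapT = max(b - a for a, b in zip(x, y))
assert gapT <= gap0 * math.exp(-T) * 1.1
```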
Remark B.5
(Beyond DOU dynamics) The monotonicity property of Lemma B.4 relies on the convexity of the interaction \(-\log \), and has nothing to do with the long-time behavior and the strength of V. In particular, this monotonicity property remains valid for the process solving (1.3) with an arbitrary V provided that it is \({\mathcal {C}}^1\) and there is no explosion, even in the situation where V is not strong enough to ensure that the process has an invariant law. If V is \({\mathcal {C}}^2\) then the decay estimate of Lemma B.4 survives in the following decay or growth form:
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Boursier, J., Chafaï, D. & Labbé, C. Universal cutoff for Dyson Ornstein Uhlenbeck process. Probab. Theory Relat. Fields 185, 449–512 (2023). https://doi.org/10.1007/s00440-022-01158-5
Keywords
- Dyson process
- Ornstein–Uhlenbeck process
- Coulomb gas
- Random matrix theory
- High dimensional phenomenon
- Cutoff phenomenon
- High-dimensional probability
- Functional inequalities
- Spectral analysis
- Stochastic calculus
- Gaussian analysis
- Markov process
- Diffusion process
- Interacting particle system
Mathematics Subject Classification
- Diffusion processes: 60J60
- Interacting particle systems: 82C22