On Large Deviations for Gibbs Measures, Mean Energy and Gamma-Convergence

Open Access


We consider the random point processes on a measure space \((X,\mu _{0})\) defined by the Gibbs measures associated with a given sequence of N-particle Hamiltonians \(H^{(N)}.\) Inspired by the method of Messer–Spohn for proving concentration properties for the laws of the corresponding empirical measures, we propose a number of hypotheses on \(H^{(N)}\) that are quite general but still strong enough to extend the approach of Messer–Spohn. The hypotheses are formulated in terms of the asymptotics of the corresponding mean energy functionals. We show that in many situations, the approach even yields a large deviation principle (LDP) for the corresponding laws. Connections to Gamma-convergence of (free) energy type functionals at different levels are also explored. The focus is on differences between positive and negative temperature situations, motivated by applications to complex geometry. The results yield, in particular, large deviation principles at positive as well as negative temperatures for quite general classes of singular mean field models with pair interactions, generalizing the 2D vortex model and Coulomb gases. In a companion paper, the results will be illustrated in the setting of Coulomb and Riesz type gases on a Riemannian manifold X,  comparing with the complex geometric setting.


Keywords

Interacting particle systems · Mean-field limit · Large deviations · Coulomb interaction

Mathematics Subject Classification

82C22 · 60F10

1 Introduction

Let X be a compact topological space endowed with a probability measure \(\mu _{0}.\) Given a sequence of symmetric functions \(H^{(N)}\) on the N-fold products \(X^{N},\) which are absolutely integrable with respect to the Borel measure \(\mu _{0}^{\otimes N},\) the corresponding Gibbs measures at inverse temperature \(\beta _{N}\in \mathbb {R}\) are defined as the following sequence of symmetric probability measures on \(X^{N}:\)
$$\begin{aligned} \mu _{\beta _{N}}^{(N)}:=\frac{1}{Z_{N,\beta _{N}}}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}, \end{aligned}$$
assuming that the partition function \(Z_{N,\beta _{N}}\) is finite:
$$\begin{aligned} Z_{N,\beta _{N}}:=\int _{X^{N}}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}<\infty . \end{aligned}$$
We also assume that the following limit exists:
$$\begin{aligned} \beta :=\lim _{N\rightarrow \infty }\beta _{N}\in ]-\infty ,\infty ]. \end{aligned}$$
The ensemble \((X^{N},\mu _{\beta _{N}}^{(N)})\) (called the canonical ensemble) defines a random point process with N particles on X which, from the point of view of statistical mechanics, models N identical particles on X interacting by the Hamiltonian (interaction energy) \(H^{(N)}\) in thermal equilibrium at inverse temperature \(\beta _{N}.\) The corresponding empirical measure is the random measure
$$\begin{aligned} \delta _{N}:\,\,X^{N}\rightarrow \mathcal {P}(X),\,\,\,(x_{1}, \ldots ,x_{N})\mapsto \delta _{N}(x_{1},\ldots ,x_{N}):=\frac{1}{N}\sum _{i=1}^{N} \delta _{x_{i}} \end{aligned}$$
taking values in the space \(\mathcal {P}(X)\) of all probability measures on X. A recurrent theme in statistical mechanics is to study the large N-limit (i.e., the “macroscopic limit”) of the canonical ensemble through the large N-limit of the laws of \(\delta _{N},\) i.e. through the sequence of probability measures
$$\begin{aligned} \Gamma _{N}:=(\delta _{N})_{*}\mu _{\beta _{N}}^{(N)} \end{aligned}$$
on \(\mathcal {P}(X).\) In many situations, the laws \(\Gamma _{N}\) can be shown to concentrate, as \(N\rightarrow \infty ,\) at the subset of \(\mathcal {P}(X)\) consisting of the minima of a free energy type functional \(F_{\beta }\) on \(\mathcal {P}(X);\) we will then say that “the sequence \(\Gamma _{N}\) has the concentration property.” For example, if the functional \(F_{\beta }\) has a unique minimizer \(\mu _{\beta }\), then it follows that the random measures \(\delta _{N}\) converge in law to a unique deterministic measure \(\mu _{\beta }.\) A stronger exponential notion of concentration—with an explicit speed and rate functional—is offered by the theory of large deviations, by demanding that the laws \(\Gamma _{N}\) satisfy a large deviation principle (LDP) with speed \(r_{N}\) and a rate functional F,  symbolically expressed as
$$\begin{aligned} \Gamma _{N}(\mu )\sim e^{-r_{N}F(\mu )},\,\,N\rightarrow \infty . \end{aligned}$$
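The symbolic relation above can be checked numerically in the simplest possible case. The following is a minimal sketch, assuming (purely for illustration) that X is a finite set and \(H^{(N)}=0\), so that \(\Gamma _{N}\) is the law of the empirical measure of N independent samples from \(\mu _{0}\); in that case the classical Sanov theorem gives speed \(r_{N}=N\) and rate functional equal to the relative entropy with respect to \(\mu _{0}\). The function names `log_prob_type` and `relative_entropy` are hypothetical helpers introduced here.

```python
import math


def log_prob_type(counts, p):
    """Log-probability that N i.i.d. samples from p hit site k exactly counts[k] times.

    This is the log of the multinomial probability, i.e. log Gamma_N({nu})
    for the empirical measure nu = counts / N when H^(N) = 0.
    """
    n_total = sum(counts)
    out = math.lgamma(n_total + 1)
    for n, q in zip(counts, p):
        out -= math.lgamma(n + 1)
        if n > 0:
            out += n * math.log(q)
    return out


def relative_entropy(nu, p):
    """Relative entropy of nu with respect to p (nu is assumed absolutely continuous)."""
    return sum(a * math.log(a / b) for a, b in zip(nu, p) if a > 0)


if __name__ == "__main__":
    mu0 = (0.5, 0.3, 0.2)   # reference measure on a 3-point space (toy choice)
    nu = (0.2, 0.3, 0.5)    # target empirical measure
    for n_particles in (10, 1000, 100000):
        counts = tuple(round(a * n_particles) for a in nu)
        rate_estimate = -log_prob_type(counts, mu0) / n_particles
        print(n_particles, rate_estimate, relative_entropy(nu, mu0))
```

The printed estimate \(-N^{-1}\log \Gamma _{N}(\{\nu \})\) should approach the relative entropy as N grows, with an error of order \(\log N/N\).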
The present paper is inspired by the method introduced by Messer–Spohn [28] to establish the concentration property of the laws of the empirical measures \(\delta _{N}\) using the Gibbs variational principle, combined with properties of the mean energy of the system. In the original approach in [28], \(H^{(N)}\) was assumed to be the mean field Hamiltonian corresponding to a continuous pair interaction potential \(W(x,y)\):
$$\begin{aligned} H^{(N)}(x_{1},\ldots ,x_{N}):=\frac{1}{N}\sum _{1\le i,j\le N}W(x_{i},x_{j}), \end{aligned}$$
where \(\beta _{N}=\beta \in ]0,\infty [\) (this is a mean field interaction in the sense that each particle \(x_{i}\) is exposed to the average of the pair interactions \(W(x_{i},x_{j})\) of all N particles, including the self-interaction). But the approach has also been extended to handle some situations where \(W(x,y)\) is allowed to be singular [15, 22, 24], as in Onsager’s vortex model for 2D turbulence [29], where \(W(x,y)=-\log |x-y|.\) The corresponding mean field Hamiltonian is then “renormalized” by removing the self-interaction terms in order to make sure that \(H^{(N)}\) is generically finite on \(X^{N}.\) The aim of the present paper is to:
  • Propose a number of quite general hypotheses on \(H^{(N)},\) formulated in terms of the corresponding mean energy functional \(E^{(N)}\) on \(\mathcal {P}(X^{N})\) (defined by formula (1.3)) that are strong enough to extend the approach of Messer–Spohn.

  • Show that the approach also yields the stronger exponential concentration property in the sense of an LDP, almost “for free,” in several situations.

  • Explore some relations to the notion of Gamma-convergence of functionals: first by reformulating the approach of Messer–Spohn in terms of Gamma-convergence of the induced mean free energy functionals \(F_{\beta _{N}}^{(N)}\) on \(\mathcal {P}(\mathcal {P}(X))\) (formula (2.4)) and then by deducing a Gamma-convergence result for the sequence \(H^{(N)}/N\) on \(X^{N}.\)

The main motivation comes from the probabilistic approach to the construction of Kähler–Einstein metrics on a complex algebraic manifold X, introduced in [4, 5]. In that situation, the corresponding Hamiltonians \(H^{(N)}\) are highly nonlinear and singular (and not of the simpler mean field type appearing in formula (1.5), which has previously been used in the statistical mechanical approach to conformal geometry introduced in [23]). But still, as shown in [5], building on [7], the sequence \(H^{(N)}/N\) Gamma-converges towards a certain energy type functional \(E(\mu )\) on \(\mathcal {P}(X).\) Exploiting superharmonicity properties of \(H^{(N)}/N\), the corresponding LDP is then established at any positive inverse temperature \(\beta \) in [5], producing Kähler–Einstein metrics with negative Ricci curvature in the large N-limit. The approach in [5] bypasses the problem of the existence of the macroscopic mean energy (hypothesis H1 below), which is open in the complex geometric setting. On the other hand, as discussed in [5], extending the LDP in [5] to negative \(\beta \) (which is needed to produce Kähler–Einstein metrics with positive Ricci curvature) necessitates the existence of the macroscopic mean energy. This is the motivation behind Theorem 1.3 below, which shows that, conversely, hypothesis H1 together with the additional hypothesis H4 implies an LDP for appropriate negative \(\beta .\) The hypothesis H4 is inspired by the energy-entropy compactness results in [9], which can be viewed as the macroscopic analog of H4 in the complex geometric setting. Incidentally, in the lowest-dimensional case when X is a Riemann surface, the probabilistic setting in [4, 5] is essentially equivalent to a mean field model with a logarithmic pair interaction, which is thus similar to Onsager’s vortex model for 2D turbulence [29]. In the latter situation, the corresponding concentration properties were established in [15, 22], for any \(\beta \) above the critical negative temperature (and the LDP was established using a different method in [12]). This is in line with Onsager’s prediction of the existence of macroscopic negative temperature states.

Another motivation for the present paper comes from random matrix theory (or more generally Coulomb gases), which can be viewed as a vortex type model with \(\beta _{N}\sim N\) (and in particular \(\beta =\infty \)). The corresponding concentration property was established in [24], using the method of Messer–Spohn as in [15, 22]. Here we observe that, with a simple modification, the concentration property can be upgraded to an LDP (Corollary 1.2). In particular, this allows one to dispense with some technical assumptions (such as regularity properties of the corresponding pair interaction \(W(x,y)\) away from the diagonal) used in the different approaches to LDPs in [1, 2, 16, 34] (see Sect. 1.3).

Yet another motivation comes from approximation and sampling theory and, in particular, the problem of finding nearly minimal configurations for a given energy type interaction on a Riemannian manifold, in the spirit of [20, 32].

Let us also point out that the restriction that X be compact can be removed if suitable growth assumptions on \(H^{(N)}\) at infinity are made, as in the settings in \(\mathbb {R}^{n}\) considered in [1, 2, 16, 18, 24, 34] (using appropriate tightness estimates). But in order to (hopefully) convey the conceptual simplicity of the arguments, we stick with a compact X.

1.1 Hypotheses

We may as well assume that X coincides with the support of \(\mu _{0}.\) In the following, \(\mu ^{(N)}\) will denote a symmetric probability measure on \(X^{N}\) and \(Y:=\mathcal {P}(X).\) We recall that the mean (microscopic) energy of \(\mu ^{(N)},\) in the usual sense of statistical mechanics, is defined by
$$\begin{aligned} E^{(N)}(\mu ^{(N)}):=\frac{1}{N}\int _{X^{N}}H^{(N)}\mu ^{(N)} \end{aligned}$$
(assuming that \(H^{(N)}\in L^{1}(\mu ^{(N)})).\) We introduce the following hypotheses:
  • (H1) “Existence of a macroscopic mean energy \(E(\mu )\)”: There exists a functional \(E(\mu )\) on \(\mathcal {P}(X)\) such that for any \(\mu \) in \(\mathcal {P}(X)\) satisfying \(E(\mu )<\infty \),
    $$\begin{aligned} \lim _{N\rightarrow \infty }E^{(N)}(\mu ^{\otimes N})=E(\mu ). \end{aligned}$$
    Moreover, \(E(\mu _{0})<\infty \).
  • (H2) “Lower bound on the mean energy”: For any sequence of \(\mu ^{(N)}\) such that \(\Gamma _{N}:=(\delta _{N})_{*}\mu ^{(N)}\rightarrow \Gamma \) weakly in \(\mathcal {P}(Y)\), we have
    $$\begin{aligned} \liminf _{N\rightarrow \infty }E^{(N)}(\mu ^{(N)})\ge E(\Gamma ):=\int _{Y}E(\mu )\Gamma (\mu ). \end{aligned}$$
  • (H3) “Approximation property”: For any \(\mu \) such that \(E(\mu )<\infty \), there exists a sequence \(\mu _{j}\) converging weakly to \(\mu \) such that \(\mu _{j}\) has finite entropy with respect to \(\mu _{0}\) and satisfies \(E(\mu _{j})\rightarrow E(\mu ).\)

  • (H4) “Mean energy/entropy compactness”: If
    $$\begin{aligned} E^{(N)}(\mu ^{(N)})\le C,\,\,\,D^{(N)}(\mu ^{(N)})\le C, \end{aligned}$$
    where \(D^{(N)}(\mu ^{(N)})\) is the mean entropy, then the following convergence holds, after perhaps replacing \(\mu ^{(N)}\) by a subsequence such that \(\Gamma _{N}:=(\delta _{N})_{*}\mu ^{(N)}\rightarrow \Gamma \) weakly in \(\mathcal {P}(Y):\)
    $$\begin{aligned} \lim _{N\rightarrow \infty }E^{(N)}(\mu ^{(N)})=\int _{Y}E(\mu )\Gamma (\mu ). \end{aligned}$$
The first hypothesis will be assumed throughout the paper. The second and third ones will appear naturally in positive and vanishing temperature, respectively, while the fourth one turns out to be useful in some cases of negative temperature. However, it may very well be that hypothesis H4 needs to be weakened a bit in order to increase its scope. For example, in the proof of the large N–concentration properties, one only needs to assume that H4 holds when \(\mu ^{(N)}\) is the Gibbs measure corresponding to \(H^{(N)}\) (see Remark 2.11).

Of course, the sign of the temperature may be switched by replacing \(H^{(N)}\) with \(-H^{(N)},\) but the point is that, in practice, we will consider settings where the sign of \(H^{(N)}\) is fixed by the requirement that \(H^{(N)}\) be bounded from below (which essentially means that the system is assumed to be stable at zero temperature).

1.2 Large Deviation Results

We start with the simpler setting of positive temperature:

Theorem 1.1

Suppose that hypotheses H1 and H2 hold, and let \(\beta _{N}\) be a sequence of positive numbers tending to \(\beta \in ]0,\infty [.\) Then the measures \((\delta _{N})_{*}(e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N})\) on \(\mathcal {P}(X)\) satisfy, as \(N\rightarrow \infty ,\) a large deviation principle (LDP) with speed \(\beta _{N}N\) and rate functional
$$\begin{aligned} F_{\beta }(\mu )=E(\mu )+\frac{1}{\beta }D_{\mu _{0}}(\mu ). \end{aligned}$$
Equivalently, the LDP holds for the corresponding Gibbs measures with \(F_{\beta }\) replaced by \(F_{\beta }-\inf F_{\beta }.\) Under the additional hypothesis H3, the result also holds when \(\beta =\infty \).
The previous theorem in particular applies to the following “finite order” Hamiltonians of mean field type. Given symmetric functions \(W_{m}\) on \(X^{m}\) for \(m\le M\), set
$$\begin{aligned} H^{(N)}(x_{1},\ldots ,x_{N}):=\sum _{m=1}^{M}\frac{1}{N^{m-1}} \sum _{I}W_{m}(x_{i_{1}},\ldots ,x_{i_{m}}), \end{aligned}$$
where the inner sum runs over all multi-indices \(I=(i_{1},\ldots ,i_{m})\) of length m with the property that no two indices of I coincide. Then it is easy to verify H1 and H2 above with
$$\begin{aligned} E(\mu ):=\sum _{m=1}^{M}\int _{X^{m}}W_{m}\mu ^{\otimes m}. \end{aligned}$$
The main case of interest is when \(M=2\) and \(H^{(N)}\) is a sum of pair-interactions \(W(x_{i},x_{j}),\) scaled by \(1/N.\) But since it will require no extra effort in the proofs, we consider the more general “finite order setting.”
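In the pair-interaction case, hypothesis H1 can be checked by hand: since the sum excludes coinciding indices, \(E^{(N)}(\mu ^{\otimes N})=(1-1/N)\int _{X^{2}}W\mu ^{\otimes 2}.\) The following is a minimal sketch verifying this identity by brute-force enumeration, assuming (for illustration only) that X is a finite set, \(\mu \) a probability vector, and W a symmetric matrix with no one-particle term; all function names are hypothetical.

```python
import itertools
import math


def hamiltonian(config, W):
    """Mean field Hamiltonian H^(N): (1/N) * sum over ordered pairs i != j of W(x_i, x_j)."""
    n = len(config)
    total = sum(
        W[x][y]
        for i, x in enumerate(config)
        for j, y in enumerate(config)
        if i != j
    )
    return total / n


def mean_energy_product(mu, W, n):
    """E^(N)(mu^{tensor N}) = (1/N) * integral of H^(N) against the product measure."""
    total = 0.0
    for config in itertools.product(range(len(mu)), repeat=n):
        weight = math.prod(mu[x] for x in config)
        total += weight * hamiltonian(config, W)
    return total / n


def macroscopic_energy(mu, W):
    """E(mu) = integral of W against mu tensor mu."""
    sites = range(len(mu))
    return sum(mu[x] * mu[y] * W[x][y] for x in sites for y in sites)


if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2]
    W = [[1.0, 2.0, 0.0], [2.0, -1.0, 3.0], [0.0, 3.0, 0.5]]
    for n in (2, 4, 6):
        print(n, mean_energy_product(mu, W, n), (1 - 1 / n) * macroscopic_energy(mu, W))
```

The two printed columns agree for every N, and the factor \(1-1/N\) tends to 1, which is the convergence demanded by H1 in this toy setting.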

Corollary 1.2

Let \(W(x_{1},\ldots ,x_{m})\) be a symmetric lower semi-continuous function on \(X^{m}\) taking values in \(]-\infty ,\infty ]\) and \(\beta _{N}\) a sequence of positive numbers tending to \(\beta \in ]0,\infty [.\) Assume that the Gibbs measures \(\mu _{\beta _{N}}^{(N)}\) of the corresponding mean field Hamiltonians are well-defined probability measures. Then the laws \((\delta _{N})_{*}(\mu _{\beta _{N}}^{(N)})\) satisfy an LDP with speed \(\beta _{N}N\) and rate functional \(F_{\beta },\) with
$$\begin{aligned} E(\mu ):=\int _{X^{m}}W\mu ^{\otimes m}. \end{aligned}$$
The corresponding result also holds for \(\beta =\infty \) if hypothesis H3 holds.

In the Euclidean setting and with \(M=2\), the previous corollary was established very recently in [18] using somewhat different methods (the results in [18] have also independently been generalized to the setting of the previous theorem and corollary in [19]).

We next turn to the case of negative temperature.

Theorem 1.3

Suppose that hypotheses H1 and H4 hold, and fix a negative number \(\beta _{0}.\) Then the following are equivalent:
  • For any \(\beta >\beta _{0},\) we have \(Z_{N,\beta }\le C_{\beta }^{N}\).

  • For any \(\beta >\beta _{0},\) the measures \((\delta _{N})_{*}\left( e^{-\beta H^{(N)}}\mu _{0}^{\otimes N}\right) \) on \(\mathcal {P}(X)\) satisfy an LDP with speed N and rate functional
    $$\begin{aligned} \beta F_{\beta }(\mu )=\beta E(\mu )+D_{\mu _{0}}(\mu ). \end{aligned}$$

Corollary 1.4

Let \(W(x,y)\) be a symmetric function in \(L^{1}(X^{2},\mu _{0}^{\otimes 2})\) such that the corresponding partition functions \(Z_{N,\beta }\) satisfy
$$\begin{aligned} Z_{N,\beta }\le C_{\beta }^{N} \end{aligned}$$
for any \(N\ge 2\) and \(\beta \in ]-\infty ,\infty [.\) Then the Gibbs measures \(\mu _{\beta }^{(N)}\) of the mean field Hamiltonians corresponding to W satisfy an LDP with speed N and rate functional \(\beta F_{\beta },\) with
$$\begin{aligned} E(\mu ):=\int _{X^{2}}W\mu ^{\otimes 2}. \end{aligned}$$

The key observation in the proof of Corollary 1.4 is that the first point in Theorem 1.3 always implies, “for free,” a uniform estimate in the Orlicz (Zygmund) space \(L^{1}\hbox {Log}L^{1},\) so that some general Orlicz space duality results [26, 30] can be exploited in order to verify the hypothesis H4.

It seems natural to ask if the previous corollary can be generalized to the case when one only assumes the integrability condition that \(Z_{2,\beta }\) be finite when \(\beta >\beta _{0},\) for some (finite) negative number \(\beta _{0}.\) The following theorem gives an affirmative answer if one strengthens the integrability condition a bit:

Theorem 1.5

Let X be a compact metric space, W a lower semi-continuous symmetric measurable function on \(X^{2}\), and \(\beta _{0}\) a negative number such that
$$\begin{aligned} \sup _{x\in X}\int _{X}e^{-\beta _{0}W(x,y)}\mu _{0}(y)<\infty . \end{aligned}$$
Then, for any \(\beta >\beta _{0}\), the Gibbs measures \(\mu _{\beta }^{(N)}\) satisfy an LDP as in the previous corollary.
Specialized to the logarithmic case of the vortex model, i.e. to the case
$$\begin{aligned} W(x,y)=-\log |x-y|, \end{aligned}$$
the previous theorem recovers the LDP in [12] with a new proof. The proof follows closely the corresponding (weaker) concentration result for the vortex model originally established in [15, 22]. The new observation is that with a little twist, the argument in [15, 22] can be supplemented to give the LDP in question.

1.3 Relations to Gamma-Convergence at Different Levels

The proofs of the LDPs above are based on the Gamma-convergence of the corresponding free energy functionals \(F_{\beta _{N}}^{(N)}\) when viewed as functionals on the space \(\mathcal {P}(\mathcal {P}(X))\) (a similar approach is used in the dynamic setting considered in [10] where the assumptions H1 and H2 also appear naturally). Incidentally, as observed in the following corollary, the LDPs then imply the Gamma-convergence of the scaled Hamiltonians \(H^{(N)}/N\) when viewed as functionals on \(\mathcal {P}(X).\)

Corollary 1.6

Suppose that the Hamiltonians \(H^{(N)}\) satisfy H1 and H2. Then \(H^{(N)}/N\) Gamma-converges towards \(E(\mu )\) on \(\mathcal {P}(X).\) In particular, this applies to the finite order mean field Hamiltonians.

The previous result generalizes the Gamma-convergence result in [34, Prop 2.8, Remark 2.19] for the mean field Hamiltonians corresponding to pair interactions \(W(x,y)\) (on a domain in \(\mathbb {R}^{D}\)), where it was assumed that \(W(x,y)\) essentially only blows up along the diagonal (similar results also appear implicitly in [1, 2, 16]). The proofs in [1, 2, 16, 34] are based on some rather intricate combinatorial constructions, involving small cubes in \(\mathbb {R}^{D}.\) On the other hand, the latter results yield the stronger result that any measure \(\mu \) with a positive continuous density admits a recovery sequence \(x^{(N)}\) such that
$$\begin{aligned} \limsup _{N\rightarrow \infty }\sup _{B_{\epsilon _{N}}}E_{N}\le E(\mu ), \end{aligned}$$
where \(B_{\epsilon _{N}}\) denotes the \(L^{\infty }\)-ball with center \(x^{(N)}\) and radius \(\epsilon _{N}\) of the order \(1/N^{1/D}.\) In turn, as shown in [34], the latter stronger form of Gamma-convergence implies the LDP for the corresponding Gibbs measures when \(\beta _{N}\gg \log N\) (which is used to make sure that \(\beta _{N}^{-1}N^{-1}\log \int _{B_{\epsilon _{N}}}\mu _{0}^{\otimes N}\rightarrow 0\) when \(\mu _{0}\) is a volume form). But in fact, as pointed out by the referee, the technical condition that \(\beta _{N}\gg \log N\) can be replaced by \(\beta _{N}\gg 1\) by a slight modification of the proof of Prop 2.5 in [16].

Relations between Gamma-convergence and large deviation principles have also been previously studied in [27] but from a somewhat different perspective (see also [13] for some related results).

1.4 Applications to the Coulomb Gas on a Riemannian Manifold

In the companion paper [6], the general large deviation results above are illustrated and further developed for Coulomb and Riesz type gases on a compact D-dimensional Riemannian manifold \((X,g)\) and more generally for suitable compact subsets \(K\subset X\) (the case when \(\mu _{0}\) is a volume form and \(\beta >0\) has also independently been obtained in [19]).

Here we will only state the corresponding LDP for the Coulomb gas on \((X,g),\) defined as follows. Let \(W(x,y)\) be 1/2 times the integral kernel of the inverse of the positive Laplacian \(-\Delta \) on the space of all functions in \(L^{2}(X,dV_{g})\) with mean zero, where \(dV_{g}\) denotes the volume form determined by the metric g. As is classical, W is symmetric and smooth away from the diagonal, and close to the diagonal it admits the following asymptotics when \(D>2:\)
$$\begin{aligned} W(x,y)=\frac{C_{D}}{d(x,y)^{D-2}}(1+o(1)),\,\,\,D>2, \end{aligned}$$
for a positive constant \(C_{D}.\) Moreover, when \(D=2\),
$$\begin{aligned} W(x,y)=-\frac{1}{(4\pi )}\log d^{2}(x,y)+O(1),\,\,D=2. \end{aligned}$$
In particular, W is lsc and in \(L^{1}(X\times X).\) Given a probability measure \(\mu _{0}\) on X, the Coulomb gas on \((X,g,\mu _{0})\) at inverse temperature \(\beta _{N}\) is defined by the Gibbs measures corresponding to \((\mu _{0},H^{(N)},\beta _{N}),\) where \(H^{(N)}\) is the mean field Hamiltonian corresponding to the pair interaction \(W(x,y).\) In this setting, Corollary 1.2 and Theorem 1.5 yield, as shown in [6], the following LDP for the laws of the empirical measures of the Coulomb gas, formulated in terms of the potential theoretic properties of the measure \(\mu _{0}:\)

Theorem 1.7

Let \((X,g)\) be a compact Riemannian manifold, and consider the Coulomb gas at inverse temperature \(\beta _{N}\) on \((X,g,\mu _{0}).\)
  • When \(\beta \in ]0,\infty [,\) the LDP holds if the measure \(\mu _{0}\) is nonpolar.

  • When \(\beta =\infty \), the LDP holds if \(\mu _{0}\) is nonpolar and \(\mu _{0}\) is determining for its support K.

  • When \(D=2\) and \(\beta <\infty \), the LDP holds when \(\beta >-4\pi d(\mu _{0}),\) where \(d(\mu _{0})\in [0,\infty [\) is the sup over all \(t>0\) such that there exists a positive constant C (depending on t) such that
    $$\begin{aligned} \mu _{0}(B_{R}(x))\le CR^{t} \end{aligned}$$
    as \(R\rightarrow 0,\) for any Riemannian ball \(B_{R}(x)\) of radius R centered at a given point x in X.
We briefly recall that a compact subset \(K\subset X\) is polar if it is locally contained in the \(-\infty \)-set of a local subharmonic function (or equivalently, if K has vanishing capacity). Accordingly, a measure \(\mu _{0}\) is said to be nonpolar if it does not charge any polar set. The notion of a determining measure \(\mu _{0}\) appearing in the second point above means that for any continuous function u,
$$\begin{aligned} \left\| e^{\varphi -u}\right\| _{L^{\infty }(K,\mu _{0})}=\sup _{K}e^{\varphi -u}, \end{aligned}$$
for any quasi-subharmonic function \(\varphi \) on X; i.e., \(\varphi \) is strongly usc and satisfies \(\Delta \varphi \ge -1.\) This notion is closely related to the notion of measures satisfying a Bernstein–Markov property in pluripotential theory [8, 11] and measures with regular asymptotic behavior in the theory of planar orthogonal polynomials [33]. For example, \(\mu _{0}\) can be taken to be the D-dimensional Hausdorff measure on a Lipschitz domain \(K\subset (X,g)\) or the \((D-1)\)-dimensional Hausdorff measure on a Lipschitz hypersurface in \((X,g).\) The point is that the assumption that \(\mu _{0}\) is nonpolar and determining implies that the hypothesis H3 is satisfied, as shown in [6] (an alternative proof of the LDP in the case when \(\beta =\infty \) can also be given using the approach in the complex geometric setting in [3]). Finally, we recall that measures satisfying \(d(\mu _{0})>0,\) as in the third point above, are sometimes called Frostman measures in the classical literature (for example, the property in question holds with \(d(\mu _{0})=d\) when \(\mu _{0}\) is the d-dimensional Hausdorff measure of a compact subset K of X of Hausdorff dimension d).

More generally, an LDP as in the previous theorem is obtained in [6] when \(W(x,y)\) is taken as the integral kernel of the inverse of \((-\Delta )^{p}\) and the (possibly fractional) power p is in \(]0,D/2]\) (or even more generally: when \((-\Delta )^{p}\) is replaced by a suitable pseudodifferential operator of order at most D). Then the last point in the previous theorem holds in the critical case \(p=D/2.\) However, the LDP for \(\beta =\infty \) appears to be rather subtle in the general setting and is only shown to hold when \(\mu _{0}\) is a volume form (or comparable to a volume form), except when \(p\le 2\), where it applies to measures \(\mu _{0}\) that are determining in a suitable sense.

Let us also point out that in the Euclidean setting of the Coulomb and Riesz gases in \(\mathbb {R}^{n},\) with \(\mu _{0}\) given by the Euclidean volume form and \(\beta _{N}\) of the order N,  a refined “microscopic” large deviation principle “at the level of processes” is obtained in [25]. Such large deviation principles are beyond the scope of the present paper and seem to require different methods—the point here is rather to allow the measure \(\mu _{0}\) to be very singular (and the inverse temperature to be negative, in some cases).

2 Proofs of the Large Deviations Results

2.1 General Notation

Given a compact topological space X, we will denote by \(C^{0}(X)\) the space of all continuous functions u on X,  equipped with the sup-norm, and by \(\mathcal {M}(X)\) the space of all signed (Borel) measures on X. The subset of \(\mathcal {M}(X)\) consisting of all probability measures will be denoted by \(\mathcal {P}(X).\) We endow \(\mathcal {M}(X)\) with the weak topology; i.e., \(\mu _{j}\) is said to converge to \(\mu \) weakly in \(\mathcal {M}(X)\) if
$$\begin{aligned} \left\langle \mu _{j},u\right\rangle \rightarrow \left\langle \mu ,u\right\rangle :=\int _{X}u\mu \end{aligned}$$
for any continuous function u on X,  i.e., for any \(u\in C^{0}(X)\) (in other words, the weak topology is the weak-star topology when \(\mathcal {M}(X)\) is identified with the topological dual of \(C^{0}(X)).\) Since X is compact, so is \(\mathcal {P}(X).\) Given a lower semi-continuous function F on
$$\begin{aligned} Y:=\mathcal {P}(X), \end{aligned}$$
we will, abusing notation slightly, also write F for the induced linear lower semi-continuous functional on \(\mathcal {P}(Y):\)
$$\begin{aligned} F(\Gamma ):=\int _{Y}F(\mu )\Gamma (\mu ). \end{aligned}$$
Equivalently, under the natural embedding \(\mu \mapsto \delta _{\mu }\) of Y into \(\mathcal {P}(Y)\), the function \(F(\Gamma )\) is the unique lower semi-continuous affine extension of F to \(\mathcal {P}(Y).\)

We will denote by \(S_{N}\) the permutation group acting on \(X^{N}\) and by \(\mathcal {P}(X^{N})^{S_{N}}\) the space of symmetric measures \(\mu _{N}\) (i.e., \(S_{N}\)-invariant) on \(X^{N}.\) Also note that, following standard practice, we will denote by C a generic constant whose value may change from line to line.

2.1.1 Entropy

We will write \(D(\nu _{1},\nu _{2})\) for the relative entropy (also called the Kullback–Leibler divergence in information theory) of two measures \(\nu _{1}\) and \(\nu _{2}\) on a topological space Z: if \(\nu _{1}\) is absolutely continuous with respect to \(\nu _{2},\) i.e., \(\nu _{1}=f\nu _{2},\) one defines
$$\begin{aligned} D(\nu _{1},\nu _{2}):=\int _{Z}\log (\nu _{1}/\nu _{2})\nu _{1}, \end{aligned}$$
and otherwise one declares that \(D(\nu _{1},\nu _{2}):=\infty .\) Note the sign convention used: D is minus the physical entropy. In our setting, the space Z will always be of the form \(X^{N}\), and we will then take the reference measure \(\nu _{2}=\mu _{0}^{\otimes N}\) and write \(D(\cdot ):=D(\cdot ,\mu _{0}^{\otimes N}).\) It will also be convenient to define the mean entropy of a probability measure \(\mu _{N}\) on \(X^{N}\) (i.e., \(\mu _{N}\in \mathcal {P}(X^{N}))\) as
$$\begin{aligned} D^{(N)}(\mu _{N}):=\frac{1}{N}D(\mu _{N},\mu _{0}^{\otimes N}). \end{aligned}$$
Then it follows directly that
$$\begin{aligned} D^{(N)}(\mu ^{\otimes N})=D(\mu ). \end{aligned}$$
Moreover, denoting by \(\left( \mu _{N}\right) _{j}\) the jth marginal of \(\mu _{N}\) (which defines a probability measure on \(X^{j})\),
$$\begin{aligned} D^{(N)}(\mu _{N})\ge D^{(j)}(\left( \mu _{N}\right) _{j}), \end{aligned}$$
as follows from the concavity of the function \(t\mapsto \log t\) on \(\mathbb {R}_{+}\) (see, for example, [22]).
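The two properties of the mean entropy just stated can be illustrated on a finite space. The following is a minimal sketch, assuming X is a finite set and measures are given as lists of weights (an assumption made only for this illustration; the helper names are hypothetical). It implements the convention \(D=\infty \) when absolute continuity fails and checks \(D^{(N)}(\mu ^{\otimes N})=D(\mu )\) by direct enumeration.

```python
import itertools
import math


def relative_entropy(nu1, nu2):
    """D(nu1, nu2) on a finite space; +inf if nu1 is not absolutely continuous w.r.t. nu2."""
    total = 0.0
    for a, b in zip(nu1, nu2):
        if a > 0:
            if b == 0:
                return math.inf
            total += a * math.log(a / b)
    return total


def product_measure(mu, n):
    """Weights of the n-fold product of mu on X^n, listed in itertools.product order."""
    sites = range(len(mu))
    return [math.prod(mu[x] for x in cfg) for cfg in itertools.product(sites, repeat=n)]


def mean_entropy(mu_n, mu0, n):
    """D^(N)(mu_N) = (1/N) * D(mu_N, mu0^{tensor N}), with mu_N listed in the same order."""
    return relative_entropy(mu_n, product_measure(mu0, n)) / n


if __name__ == "__main__":
    mu0 = [1 / 3, 1 / 3, 1 / 3]
    mu = [0.7, 0.2, 0.1]
    for n in (1, 2, 3):
        print(n, mean_entropy(product_measure(mu, n), mu0, n), relative_entropy(mu, mu0))
```

For a non-product example of the marginal inequality, take X with two points, \(\mu _{0}\) uniform, and the symmetric measure on \(X^{2}\) giving mass 1/2 to each of the two diagonal configurations: its mean entropy is \((\log 2)/2,\) while its first marginal is \(\mu _{0},\) with entropy 0.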

2.2 Preliminaries

2.2.1 Large Deviation Principles

Let us start by recalling the general definition of a large deviation principle (LDP) for a sequence of measures.

Definition 2.1

Let Y be a Polish space, i.e., a complete separable metric space.
  (i) A function \(I:Y\rightarrow ]-\infty ,\infty ]\) is a rate function if it is lower semi-continuous. It is a good rate function if it is also proper; i.e., \(I^{-1}]-\infty ,a]\) is compact for any given \(a\in \mathbb {R}.\)

  (ii) A sequence \(\Gamma _{N}\) of measures on Y satisfies a large deviation principle with speed \(r_{N}\) and rate function I if

$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{r_{N}}\log \Gamma _{N}(\mathcal {F}) \le -\inf _{\mu \in \mathcal {F}}I(\mu ) \end{aligned}$$
for any closed subset \(\mathcal {F}\) of Y and
$$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{r_{N}}\log \Gamma _{N}(\mathcal {G}) \ge -\inf _{\mu \in \mathcal {G}}I(\mu ) \end{aligned}$$
for any open subset \(\mathcal {G}\) of Y.

Remark 2.2

The LDP is said to be weak if the upper bound is only assumed to hold when \(\mathcal {F}\) is compact. Anyway, we will only consider the case when Y is compact, and hence the notion of a weak LDP and an LDP then coincide (and moreover, any rate functional is automatically good).

Lemma 2.3

(Bryc). Let Y be a compact Polish space. Suppose that there exists a function f on \(C^{0}(Y)\) such that for any \(\Phi \in C^{0}(Y)\),
$$\begin{aligned} f_{N}(\Phi ):=\frac{1}{r_{N}}\log \int e^{r_{N}\Phi }\Gamma _{N}\rightarrow f(\Phi ). \end{aligned}$$
Then \(\Gamma _{N}\) satisfies an LDP with speed \(r_{N}\) and rate functional
$$\begin{aligned} I(\mu )=\sup _{\Phi \in C^{0}(Y)}\left( \Phi (\mu )-f(\Phi )\right) \end{aligned}$$
(by Varadhan’s lemma the converse also holds).

We also have the following simple lemma:

Lemma 2.4

The measures \(\tilde{\Gamma }_{N}:=(\delta _{N})_{*}(e^{-\beta H^{(N)}}\mu _{0}^{\otimes N})\) are finite and satisfy the asymptotics in Bryc’s lemma with rate functional \(\tilde{I}(\mu )\) and speed N if and only if the corresponding probability measures \(\Gamma _{N}:=(\delta _{N})_{*}(\mu _{\beta }^{(N)})\) on \(\mathcal {P}(X)\) satisfy an LDP at speed N with rate functional \(I:=\tilde{I}-C_{\beta },\) where \(C_{\beta }:=\inf _{\mathcal {\mu \in }\mathcal {P}(X)}\tilde{I}(\mu )\) and the sequence \(-N^{-1}\log Z_{N,\beta }\) is convergent in \(\mathbb {R}\) (and then the limit is equal to \(C_{\beta }).\)


Proof. By definition,
$$\begin{aligned} \frac{1}{N}\log \int e^{N\Phi }\Gamma _{N}=\frac{1}{N}\log \int e^{N\Phi }\tilde{\Gamma }_{N}-\frac{1}{N}\log Z_{N,\beta },\,\,\,Z_{N,\beta }=\int e^{N0}\tilde{\Gamma }_{N}. \end{aligned}$$
Hence, if the measures \(\tilde{\Gamma }_{N}\) satisfy the asymptotics in Bryc’s lemma with rate functional \(\tilde{I}(\mu )\), then the measures \(\Gamma _{N}\) satisfy the asymptotics in Bryc’s lemma with rate functional \(\tilde{I}(\mu )-C_{\beta },\) where \(C_{\beta }\) is the limit of \(-\frac{1}{N}\log Z_{N,\beta }\) as \(N\rightarrow \infty .\) Since \(\mu _{\beta }^{(N)}\) is a probability measure, the LDP for \(\Gamma _{N}\) implies that the inf of \(\tilde{I}(\mu )-C_{\beta }\) vanishes; i.e., \(C_{\beta }\) is the inf of \(\tilde{I}(\mu ).\) The converse is proved in a similar way. \(\square \)
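The convergence of \(-N^{-1}\log Z_{N,\beta }\) to \(C_{\beta }=\inf \tilde{I}\) can be observed numerically in a toy model. The following is a minimal sketch, assuming (for illustration only) that X consists of two points, \(\mu _{0}\) is uniform, W is a bounded symmetric \(2\times 2\) matrix, and \(\tilde{I}=\beta E+D_{\mu _{0}}\) as in Theorem 1.3; the function names are hypothetical. Since \(H^{(N)}\) then depends only on the number n of particles at the second point, the partition function is a one-dimensional sum.

```python
import math


def log_partition_two_point(n_particles, beta, W):
    """log Z_{N,beta} for X = {0, 1}, mu_0 uniform, pair interaction matrix W."""
    logs = []
    for n in range(n_particles + 1):  # n = number of particles at site 1
        m = n_particles - n
        # H^(N) = (1/N) * sum over ordered pairs i != j of W(x_i, x_j)
        energy = (n * (n - 1) * W[1][1] + m * (m - 1) * W[0][0] + 2 * n * m * W[0][1]) / n_particles
        log_binom = math.lgamma(n_particles + 1) - math.lgamma(n + 1) - math.lgamma(m + 1)
        logs.append(log_binom - n_particles * math.log(2) - beta * energy)
    top = max(logs)  # log-sum-exp to avoid overflow/underflow
    return top + math.log(sum(math.exp(v - top) for v in logs))


def rate(p, beta, W):
    """beta * E(mu) + D(mu) for mu = (1 - p, p), with D relative to the uniform measure."""
    energy = p * p * W[1][1] + (1 - p) ** 2 * W[0][0] + 2 * p * (1 - p) * W[0][1]
    entropy = sum(q * math.log(2 * q) for q in (p, 1 - p) if q > 0)
    return beta * energy + entropy


def inf_rate(beta, W, grid=10001):
    """Grid approximation of the infimum of the rate over the two-point simplex."""
    return min(rate(k / (grid - 1), beta, W) for k in range(grid))


if __name__ == "__main__":
    W = [[0.0, 1.0], [1.0, 0.0]]
    for beta in (0.5, -1.0):  # one positive and one negative inverse temperature
        for n_particles in (10, 100, 2000):
            print(beta, n_particles, -log_partition_two_point(n_particles, beta, W) / n_particles, inf_rate(beta, W))
```

Because W is bounded here, negative \(\beta \) poses no integrability problem; the singular cases treated in the text are precisely those where this fails.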

2.2.2 Gamma-Convergence

We recall that a sequence of functions \(f_{j}\) on a topological space \(\mathcal {X}\) is said to Gamma-converge to a function f on \(\mathcal {X}\) if
$$\begin{aligned} \begin{array}{lll} x_{j}\rightarrow x\,\text{ in } \mathcal {X} &{} \implies &{} \liminf \nolimits _{j\rightarrow \infty }f_{j}(x_{j})\ge f(x)\\ \forall x &{} \exists x_{j}\rightarrow x\,\text{ in } \mathcal {X}: &{} \lim \nolimits _{j\rightarrow \infty }f_{j}(x_{j})=f(x) \end{array} \end{aligned}$$
(such a sequence \(x_{j}\) is called a recovery sequence); see [14]. More generally, given a subset \(\mathcal {S}\subset \mathcal {X}\), we will say that \(f_{j}\) Gamma-converges to f relative to \(\mathcal {S}\) if the existence of a recovery sequence in \(\mathcal {X}\) is only demanded when \(x\in \mathcal {S}.\)
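To illustrate the definition, we include a standard textbook example (it is not taken from the setting of this paper and is meant only as a sketch). On \(\mathcal {X}=\mathbb {R}\) take \(f_{j}(x):=\sin (jx).\) This sequence has no pointwise limit for general x, but it Gamma-converges to the constant function \(f\equiv -1\):
$$\begin{aligned} \liminf _{j\rightarrow \infty }f_{j}(x_{j})&\ge -1=f(x)\quad \text{for any sequence } x_{j}\rightarrow x,\\ x_{j}&:=\frac{2\pi k_{j}-\pi /2}{j},\,\,\,k_{j}\in \mathbb {Z} \text{ chosen so that } |x_{j}-x|\le \frac{\pi }{j},\quad \text{gives}\quad f_{j}(x_{j})=-1=f(x). \end{aligned}$$
The second line exhibits a recovery sequence, since the points \((2\pi k-\pi /2)/j\) are spaced \(2\pi /j\) apart. Note also that \(\inf f_{j}=-1=\inf f\) for every j in this example.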

Lemma 2.5

Assume that \(f_{j}\) Gamma-converges to f relative to \(\mathcal {S}\subset \mathcal {X}.\) Then \(f_{|\mathcal {S}}\) is lower semi-continuous.


Consider a sequence \(s_{i}\rightarrow s\) in \(\mathcal {S}.\) For each \(s_{i}\) we take a recovery sequence \(y_{i}^{(j)}\) in \(\mathcal {X}\) converging to \(s_{i}.\) Setting \(y_{i}:=y_{i}^{(n_{i})},\) for a suitable increasing function \(i\mapsto n_{i},\) yields a sequence \(y_{i}\) in \(\mathcal {X}\) converging to s such that \(f(s_{i})\ge f_{n_{i}}(y_{i})-1/i.\) Setting \(x_{n_{i}}:=y_{i}\) (and \(x_{j}:=s\) when j is not of the form \(j=n_{i}\) for any i) and using the first implication in the definition of the relative Gamma-convergence of the sequence \(f_{j},\) we thus deduce that \(\liminf _{i\rightarrow \infty }f(s_{i})\ge f(s),\) as desired. \(\square \)

Lemma 2.6

Let \(\mathcal {X}\) be a compact topological space and assume that \(f_{j}\) Gamma-converges to f relative to a set \(\mathcal {S}\) containing all minima of f. Then
$$\begin{aligned} \lim _{j\rightarrow \infty }\inf _{\mathcal {X}}f_{j}=\inf _{\mathcal {X}}f. \end{aligned}$$


Given \(s\in \mathcal {S}\) we take a recovery sequence \(x_{n}\) and observe that
$$\begin{aligned} f(s)\ge f_{n}(x_{n})+o(1)\ge \inf f_{n}+o(1)=f_{n}(y_{n})+o(1)\ge f(y)+o(1) \end{aligned}$$
for some \(y_{n},y\in \mathcal {X},\) by the compactness and the assumption of Gamma-convergence. In particular, when s realizes the minimum of f so does y, and hence equalities must hold above, which concludes the proof. \(\square \)

2.2.3 Legendre–Fenchel Transforms

Let f be a function on a topological vector space V. Then its Legendre–Fenchel transform is defined as the following convex lower semi-continuous function \(f^{*}\) on the topological dual \(V^{*}\):
$$\begin{aligned} f^{*}(w):=\sup _{v\in V}\left\langle v,w\right\rangle -f(v) \end{aligned}$$
in terms of the canonical pairing between V and \(V^{*}.\) In the present setting, we will take \(V=C^{0}(X)\) and \(V^{*}=\mathcal {M}(X),\) the space of all signed Borel measures on X. Then \(f^{**}=f\) for any lower semi-continuous convex function (by standard duality in locally convex topological vector spaces [17]).
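As an illustration of this duality (a standard fact, stated here as a sketch under the assumption that \(\mu =\rho \mu _{0}\) is a probability measure with \(\log \rho \) continuous), consider the functional \(f(\Phi ):=\log \int _{X}e^{\Phi }\mu _{0}\) on \(C^{0}(X).\) Its Legendre–Fenchel transform at \(\mu \) is the relative entropy \(D_{\mu _{0}}(\mu )=\int _{X}\rho \log \rho \,\mu _{0}\):
$$\begin{aligned} \left\langle \Phi ,\mu \right\rangle -f(\Phi )&=\int _{X}\Phi \mu -\log \int _{X}e^{\Phi -\log \rho }\mu \le \int _{X}\Phi \mu -\int _{X}(\Phi -\log \rho )\mu =D_{\mu _{0}}(\mu ),\\ \left\langle \log \rho ,\mu \right\rangle -f(\log \rho )&=\int _{X}\rho \log \rho \,\mu _{0}-\log \int _{X}\rho \mu _{0}=D_{\mu _{0}}(\mu ). \end{aligned}$$
The first line uses Jensen’s inequality for the probability measure \(\mu \), and the second line shows that the supremum is attained at \(\Phi =\log \rho .\)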

2.3 Proof of Theorem 1.1

We set
$$\begin{aligned} E_{N}(x_{1},\ldots ,x_{N}):=H^{(N)}(x_{1},\ldots ,x_{N})/N \end{aligned}$$
so that the mean energy (1.3) can be written as
$$\begin{aligned} E^{(N)}(\mu _{N}):=\int _{X^{N}}E_{N}\mu _{N}. \end{aligned}$$
We denote by \(F_{\beta _{N}}^{(N)}\) the corresponding mean free energy functional on \(\mathcal {P}(X^{N})^{S_{N}}\), at inverse temperature \(\beta _{N}:\)
$$\begin{aligned} F_{\beta _{N}}^{(N)}(\mu _{N}):=E^{(N)}(\mu _{N})+ \frac{1}{\beta _{N}}D^{(N)}(\mu _{N}), \end{aligned}$$
assuming that \(H^{(N)}\in L^{1}(\mu _{N}),\) which ensures that \(E^{(N)}(\mu _{N})\) is well defined and finite. If \(H^{(N)}\) is not in \(L^{1}(\mu _{N})\), we set \(F_{\beta _{N}}^{(N)}(\mu _{N}):=\infty \) if \(\beta _{N}\in ]0,\infty ]\) and \(F_{\beta _{N}}^{(N)}(\mu _{N}):=-\infty \) if \(\beta _{N}<0.\) Thus \(\beta _{N}F_{\beta _{N}}^{(N)}\) takes values in \(]-\infty ,\infty ].\)

Now set \(Y:=\mathcal {P}(X).\) By embedding \(\mathcal {P}(X^{N}/S_{N})\) into \(\mathcal {P}(Y),\) using the push-forward map \((\delta _{N})_{*},\) we can identify the mean free energies \(F_{\beta _{N}}^{(N)}\) with functionals on \(\mathcal {P}(Y),\) extended by \(\infty \) to all of \(\mathcal {P}(Y).\) We will identify Y with its image in \(\mathcal {P}(Y)\) under the embedding \(\mu \mapsto \delta _{\mu }.\)

The starting point of the proof of the LDP is the following reformulation of Bryc’s lemma in terms of the Legendre–Fenchel transform, using the Gibbs variational principle:

Lemma 2.7

(Bryc+Gibbs): Suppose that the Legendre–Fenchel transforms \(f_{N}\) of the free energy functionals \(F_{\beta _{N}}^{(N)}\) converge point-wise to a function f on \(C^{0}(Y):\)
$$\begin{aligned} \lim _{N\rightarrow \infty }f_{N}(\Phi )=f(\Phi ); \end{aligned}$$
i.e.,
$$\begin{aligned} \lim _{N\rightarrow \infty }\inf _{\Gamma \in \mathcal {P}(Y)}\left( F_{\beta _{N}}^{(N)}(\Gamma )+\left\langle \Phi ,\Gamma \right\rangle \right) =-f(-\Phi ). \end{aligned}$$
Then the LDP holds with speed \(N\beta _{N}\) and rate functional
$$\begin{aligned} I(\mu ):=f^{*}(\delta _{\mu }), \end{aligned}$$
where \(f^{*}\) is the Legendre–Fenchel transform of f.


The Gibbs variational principle says that if \(\mu _{\beta _{N}}^{(N)}\) is a well-defined probability measure, then
$$\begin{aligned} \inf _{\mathcal {P}(X^{N})^{S_{N}}}F_{\beta _{N}}^{(N)}=F_{ \beta _{N}}^{(N)}(\mu _{\beta _{N}}^{(N)})=-\frac{1}{N\beta _{N}} \log \int _{X^{N}}e^{-\beta _{N}NE^{(N)}}\mu _{0}^{\otimes N}. \end{aligned}$$
Indeed, rewriting \(F_{\beta _{N}}^{(N)}(\mu _{N})=\frac{1}{\beta _{N}}D(\mu _{N},\mu _{\beta _{N}}^{(N)})- \frac{1}{N\beta _{N}}\log \int _{X^{N}}e^{-\beta _{N}NE^{(N)}}\mu _{0}^{\otimes N}\), this follows immediately from the fact that \(D\ge 0\) (which in turn follows from Jensen’s inequality). Hence, replacing \(H^{(N)}\) with the new Hamiltonian \(H^{(N)}+N\delta _{N}^{*}(\Phi )\) and applying Bryc’s lemma concludes the proof. \(\square \)
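The rewriting used in the proof above can be spelled out as follows. This is a sketch under the assumptions that \(\mu _{N}\) has a density with respect to \(\mu _{0}^{\otimes N}\), that \(D^{(N)}(\mu _{N})\) denotes the relative entropy with respect to \(\mu _{0}^{\otimes N}\) divided by N, and that \(Z_{N,\beta _{N}}:=\int _{X^{N}}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\):
$$\begin{aligned} \frac{1}{N}\int _{X^{N}}\log \frac{\mu _{N}}{\mu _{\beta _{N}}^{(N)}}\mu _{N}&=\frac{1}{N}\int _{X^{N}}\left( \log \frac{\mu _{N}}{\mu _{0}^{\otimes N}}+\beta _{N}H^{(N)}+\log Z_{N,\beta _{N}}\right) \mu _{N}\\&=D^{(N)}(\mu _{N})+\beta _{N}E^{(N)}(\mu _{N})+\frac{1}{N}\log Z_{N,\beta _{N}}. \end{aligned}$$
Dividing by \(\beta _{N}\) expresses \(F_{\beta _{N}}^{(N)}(\mu _{N})\) as the (normalized) entropy of \(\mu _{N}\) relative to the Gibbs measure, divided by \(\beta _{N}\), minus \(\frac{1}{N\beta _{N}}\log Z_{N,\beta _{N}}.\) The relative entropy term vanishes precisely when \(\mu _{N}=\mu _{\beta _{N}}^{(N)}.\)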

Remark 2.8

Varadhan’s lemma implies that the converse of the previous lemma also holds.

In order to verify the criterion in the previous lemma, we will use the following lemma:

Lemma 2.9

Under the hypotheses H1 and H2 and \(\beta \in ]0,\infty [\), the mean free energies \(F_{\beta _{N}}^{(N)}\) Gamma-converge to the lower semi-continuous linear functional \(F_{\beta }(\Gamma )\) on \(\mathcal {P}(Y),\) relative to Y,  where \(F_{\beta }(\mu )\) is the macroscopic free energy on Y : 
$$\begin{aligned} F_{\beta }:=E+D/\beta . \end{aligned}$$
If moreover H3 holds, then the corresponding result also holds when \(\beta =\infty \).


First assume that \(\beta <\infty .\) The lower bound follows directly from hypotheses H1 and H2 together with the fact that the mean entropy functionals satisfy the lower bound in the Gamma-convergence (by subadditivity [31]; see also Theorem 5.5 in [21] for generalizations). To prove the existence of recovery sequences, we fix an element \(\Gamma \) of the form \(\delta _{\mu }\) and take the recovery sequence to be of the form \((\delta _{N})_{*}\mu ^{\otimes N}.\) Then the required convergence follows from H1 together with the product property (2.1). Finally, when \(\beta =\infty \), the previous argument for the existence of a recovery sequence still applies as long as \(\mu \) satisfies \(D(\mu )<\infty .\) The general case then follows by a simple diagonal approximation argument using H3. \(\square \)

Now, since the limiting functional \(F_{\beta }(\Gamma )\) is affine and lower semi-continuous (by Lemma 2.5) and the set Y is extremal in \(\mathcal {P}(Y)\), the infimum of \(F_{\beta }\) on \(\mathcal {P}(Y)\) is attained in Y (for example, by Choquet’s theorem). Fixing a function \(\Phi \in C^{0}(Y)\) and replacing \(H^{(N)}\) with the new Hamiltonian \(H^{(N)}+N\delta _{N}^{*}(\Phi ),\) Lemma 2.6 thus shows that the criterion in Lemma 2.7 is satisfied. Hence the LDP holds with lower semi-continuous rate functional \(I(\mu )=f^{*}(\delta _{\mu }).\) Finally, extending I to \(\mathcal {P}(Y)\) by linearity, this means that \(I(\Gamma )\) is the Legendre–Fenchel transform of f, i.e., \(I=f^{*}.\) But in our case f is itself defined as \(f:=F^{*}\), and hence, \(I=F^{**}=F\) since F is convex (and even affine) and lower semi-continuous.

Remark 2.10

An inspection of the proof of Theorem 1.1 above reveals that, in the case \(\beta =\infty ,\) the hypothesis H3 may be replaced by the following weaker one:

  • (H3)’ The functional \(F_{\beta }\) Gamma-converges towards E,  as \(\beta \rightarrow \infty \).

2.4 Proof of Corollary 1.2

First note that H1 is trivially satisfied. To verify the second hypothesis H2, we may as well, by linearity, assume that \(M=m\) and that there is just one term with \(W:=W_{m}(x_{1},\ldots ,x_{m}).\) Since W is lower semi-continuous, there exists a sequence of continuous functions \(W_{R}\) increasing to W as \(R\rightarrow \infty \), and we denote by \(E_{W_{R}}\) the corresponding functionals on \(\mathcal {P}(X).\) It follows readily from the definitions that for any fixed \(R>0\),
$$\begin{aligned} E_{W_{R}}\left( \delta _{N}(x_{1},\ldots ,x_{N})\right) +O( \frac{1}{N})=E_{W_{R}}^{(N)}(x_{1},x_{2},\ldots ,x_{N}), \end{aligned}$$
and, in particular,
$$\begin{aligned} E_{W}^{(N)}(\mu _{N})\ge \int E_{W_{R}}\left( \delta _{N}(x_{1},\ldots ,x_{N})\right) \mu _{N}+C_{R}/N. \end{aligned}$$
But since \(E_{W_{R}}\) is continuous,
$$\begin{aligned} \int E_{W_{R}}\left( \delta _{N}(x_{1},\ldots ,x_{N})\right) \mu _{N}= \int _{\mathcal {P}(X)}E_{W_{R}}(\mu )(\delta _{N})_{*}\mu _{N}\rightarrow \int _{\mathcal {P}(X)}E_{W_{R}}(\mu )\Gamma . \end{aligned}$$
Hence,
$$\begin{aligned} \liminf _{N\rightarrow \infty }\int _{X^{N}}E^{(N)}\mu _{N}\ge \int E_{W_{R}}(\mu )\Gamma \end{aligned}$$
for any \(R>0.\) Finally, letting \(R\rightarrow \infty \) and using the monotone convergence theorem of integration theory concludes the proof.

2.5 Proof of Theorem 1.3

First observe that if the LDP in the second point of the theorem holds, then integrating over all of \(\mathcal {P}(X)\) reveals that the first point holds. To prove the converse, we fix \(\beta >\beta _{0}\) and note that the Gibbs variational principle applied at the inverse temperature \(\beta -\epsilon \) gives
$$\begin{aligned} (\beta -\epsilon )F_{\beta -\epsilon }^{(N)}\ge -C_{\epsilon }, \end{aligned}$$
which we rewrite as
$$\begin{aligned} \beta F_{\beta }^{(N)}\ge \epsilon E^{(N)}-C_{\epsilon }. \end{aligned}$$
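This rewriting uses only the definition \(F_{\beta }^{(N)}=E^{(N)}+D^{(N)}/\beta \), so that \(\beta F_{\beta }^{(N)}=\beta E^{(N)}+D^{(N)}.\) As a short check, splitting off \(\epsilon E^{(N)}\) gives
$$\begin{aligned} \beta F_{\beta }^{(N)}=\epsilon E^{(N)}+\left( (\beta -\epsilon )E^{(N)}+D^{(N)}\right) =\epsilon E^{(N)}+(\beta -\epsilon )F_{\beta -\epsilon }^{(N)}\ge \epsilon E^{(N)}-C_{\epsilon }. \end{aligned}$$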
Thus, by the Gibbs variational principle,
$$\begin{aligned} \beta F_{\beta }^{(N)}(\nu ^{\otimes N})\ge \beta F_{\beta }^{(N)}(\mu _{\beta }^{(N)})\ge \epsilon E^{(N)}(\mu _{\beta }^{(N)})-C_{\epsilon } \end{aligned}$$
for any fixed \(\nu \in \mathcal {P}(X).\) In particular, taking \(\nu =\mu _{0}\) and using the hypothesis H1 (which implies that \(F_{\beta }^{(N)}(\mu _{0}^{\otimes N})\rightarrow E(\mu _{0})\) as \(N\rightarrow \infty )\) gives that
$$\begin{aligned} E^{(N)}(\mu _{\beta }^{(N)})\le C'. \end{aligned}$$
But then the previous inequalities force
$$\begin{aligned} D^{(N)}(\mu _{\beta }^{(N)})\le C''. \end{aligned}$$
Hence, by the hypothesis H4,
$$\begin{aligned} \lim _{N\rightarrow \infty }E^{(N)}(\mu _{\beta }^{(N)})=\int _{\mathcal {P}(X)}E( \mu )\Gamma (\mu ) \end{aligned}$$
for any limit point \(\Gamma \) of the laws of \(\delta _{N}.\) As a consequence, we deduce precisely as before that the desired asymptotics for the \(\beta _{N}F_{\beta _{N}}^{(N)}\) hold. Finally, repeating the argument with \(H^{(N)}\) replaced by the new Hamiltonian \(H^{(N)}+N\delta _{N}^{*}(\Phi )\) (which satisfies the same hypothesis) concludes the proof, just as before.

Remark 2.11

If one only wants to prove that the laws of \(\delta _{N}\) concentrate on the minima of \(F_{\beta }\) (rather than proving an LDP), it is enough to show that the convergence of the free energies holds for \(\Phi =0\) (as in the original approach in [28]). As revealed by the previous proof, this only requires that the hypothesis H4 holds for the particular sequence \(\mu _{\beta }^{(N)}.\)

2.6 Proof of Corollary 1.4

First observe that the assumption on \(Z_{N,\beta }\) in the case \(N=2\) gives the following exponential integrability property of W:
$$\begin{aligned} \int _{X^{2}}e^{-\beta W}\mu _{0}^{\otimes 2}<\infty \end{aligned}$$
for any \(\beta .\) Hence, by Theorem 1.3, it will be enough to show that the previous integrability property implies that H4 is satisfied (for any sequence \(\mu _{N})\). To this end, we will apply a duality argument. First recall that given a measure space \((\mathcal {X},\mu )\) and a finite Young function \(\theta \) on \(\mathbb {R}\) (i.e., a non-negative even lower semi-continuous convex function such that \(\theta (0)=0\)), the corresponding large Orlicz space is defined by
$$\begin{aligned} L_{\theta }(\mathcal {X},\mu ):=\left\{ f:\,\,\exists \alpha >0:\,\int \theta (\alpha f)\mu <\infty \right\} \end{aligned}$$
and the corresponding small Orlicz space is defined by
$$\begin{aligned} M_{\theta }(\mathcal {X},\mu ):=\left\{ f:\,\,\forall \alpha >0:\,\int \theta (\alpha f)\mu <\infty \right\} \end{aligned}$$
(where all functions f are assumed measurable). The space \(L_{\theta }(\mathcal {X},\mu )\) may be equipped with a norm \(\left\| \cdot \right\| _{\theta },\) called the Luxemburg norm, which turns \(L_{\theta }(\mathcal {X},\mu )\) and its subspace \(M_{\theta }(\mathcal {X},\mu )\) into Banach spaces:
$$\begin{aligned} \left\| f\right\| _{\theta }:=\inf \left\{ b>0:\,\int \theta (b^{-1}f)\mu \le 1\right\} , \end{aligned}$$
i.e., the gauge of the set (unit-ball)
$$\begin{aligned} \left\{ f:\,\int \theta (f)\mu \le 1\right\} . \end{aligned}$$
By the Hölder–Young inequality,
$$\begin{aligned} \left| \int fg\mu \right| \le 2\left\| f\right\| _{\theta }\left\| g\right\| _{\theta ^{*}}, \end{aligned}$$
where \(\theta ^{*}\) is the Young function defined as the Legendre–Fenchel transform of \(\theta .\) In particular, for any \(g\in L_{\theta ^{*}}\), the map \(f\mapsto \int fg\mu \) defines a continuous linear functional on \(L_{\theta }\) with bounded operator norm; i.e., \(L_{\theta ^{*}}\subset L_{\theta }^{*},\) where \(L^{*}\) denotes the Banach space dual of a Banach space L, endowed with the operator norm. To apply this in the present context, we note that
$$\begin{aligned} E^{(N)}(\mu _{N})=\int _{X^{2}}W\rho _{N}\mu _{0}^{\otimes 2}, \end{aligned}$$
where \(\rho _{N}\) is the density of the second marginal of \(\mu _{N}.\) The assumption that \(D^{(N)}(\mu _{N})\le C\) implies that
$$\begin{aligned} \int _{X^{2}}(\rho _{N}\log \rho _{N})\mu _{0}^{\otimes 2}\le C2, \end{aligned}$$
according to (2.2).
Now set \(\theta (s):=e^{s}-s-1,\) when \(s\ge 0.\) Then \(\theta ^{*}(t)=(t+1)\log (1+t)-t\) when \(t\ge 0.\) By the previous entropy inequality for \(\rho _{N}\), the sequence \(\{\rho _{N}\}\) stays in a fixed ball in \(L_{\theta ^{*}}\), and hence, by the Hölder-Young inequality, \(\{\rho _{N}\}\) stays in a fixed ball in the dual Banach space \(L_{\theta }^{*}.\) By weak compactness, it then follows that there exists \(\Lambda \in L_{\theta }^{*}\) such that for any \(g\in L_{\theta }\),
$$\begin{aligned} \int _{X^{2}}\rho _{N}g\mu _{0}^{\otimes 2}\rightarrow \left\langle \Lambda ,g\right\rangle \end{aligned}$$
(after perhaps passing to a subsequence). Now, since \(\rho _{N}\mu _{0}^{\otimes 2}\) is a probability measure, we may also assume that there exists \(\rho \in L_{\theta ^{*}}\) such that
$$\begin{aligned} \int _{X^{2}}\rho _{N}u\mu _{0}^{\otimes 2}\rightarrow \int _{X^{2}}\rho u\mu _{0}^{\otimes 2} \end{aligned}$$
for any continuous function u. In our case \(g=W\), and we just need to check that \(\left\langle \Lambda ,g\right\rangle =\left\langle \rho \mu _{0}^{\otimes 2},g\right\rangle \). But, by assumption, \(W\in M_{\theta }\), and by the general duality theorems in [26, 30] the topological dual of \(M_{\theta }\) identifies with \(L_{\theta ^{*}},\) i.e. any continuous functional \(\Lambda \) on \(M_{\theta }\) is obtained by integrating against a (unique) \(\rho \in L_{\theta ^{*}},\) which concludes the proof.
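For the reader’s convenience, the formula for \(\theta ^{*}\) used in the proof above can be verified by a direct computation. For \(t\ge 0\) the function \(s\mapsto st-\theta (s)=st-e^{s}+s+1\) is concave on \([0,\infty [\), with derivative \(t-e^{s}+1\) vanishing at \(s=\log (1+t)\ge 0.\) Hence
$$\begin{aligned} \theta ^{*}(t)=\sup _{s\ge 0}\left( st-e^{s}+s+1\right) =t\log (1+t)-(1+t)+\log (1+t)+1=(t+1)\log (1+t)-t. \end{aligned}$$
In particular \(\theta ^{*}(t)\) grows like \(t\log t\) as \(t\rightarrow \infty \), which is why a bound on the entropy \(\int \rho _{N}\log \rho _{N}\) controls the size of \(\rho _{N}\) in \(L_{\theta ^{*}}.\)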

2.7 Proof of Theorem 1.5

Given a compact metric space X, we endow \(Y(:=\mathcal {P}(X))\) with the Wasserstein \(L^{1}\)-metric d,  which is compatible with the weak topology:
$$\begin{aligned} d(\mu ,\nu )=\sup _{f:\,L(f)\le 1}\int f(\mu -\nu ), \end{aligned}$$
where f is Lipschitz continuous on X with Lipschitz constant \(L(f)\le 1.\) Since \(\int (\mu -\nu )=0\), we may as well assume that \(f(x_{0})=0\) for a fixed point \(x_{0}\) and hence that \(|f(x)|\le C_{X}\), where \(C_{X}\) is independent of f (since X is compact and, in particular, has bounded diameter).
Let us first show that, when \(\beta >\beta _{0},\)
$$\begin{aligned} Z_{N,\beta }\le A_{\beta }^{N-1},\,\,\,A_{\beta }=\sup _{x\in X}\int _{X}e^{-\beta W(x,y)}\mu _{0}(y). \end{aligned}$$
To see this, rewrite \(-\beta H^{(N)}=\frac{1}{N}\sum _{i=1}^{N}f_{i},\) where \(f_{i}\) is the sum of \(-\beta W(x_{i},x_{j})\) over all j such that \(j\ne i.\) The arithmetic-geometric means inequality gives
$$\begin{aligned} \int _{X^{N}}e^{-\beta H^{(N)}}\mu _{0}^{\otimes N}\le \sum _{i=1}^{N}\frac{1}{N}\int _{X^{N}}e^{f_{i}}\mu _{0}^{\otimes N}. \end{aligned}$$
Now, for a given i, we have (by first integrating over the \(N-1\) variables different from \(x_{i}\) and then taking the sup over \(x_{i})\)
$$\begin{aligned} \int _{X^{N}}e^{f_{i}}\mu _{0}^{\otimes N}=\int _{X}\left( \int _{X}e^{-\beta W(x,y)}\mu _{0}(y)\right) {}^{N-1}\mu _{0}(x)\le A_{\beta }^{N-1}, \end{aligned}$$
where, by assumption, \(A_{\beta }<\infty \) (since \(A_{\beta _{0}}<\infty \) and W is lsc on the compact space \(X^{2}\) and thus bounded from below).
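The two steps of this argument can be made explicit as follows. This is a sketch assuming the sign convention \(f_{i}:=-\beta \sum _{j\ne i}W(x_{i},x_{j})\), which is the one compatible with the identity \(-\beta H^{(N)}=\frac{1}{N}\sum _{i}f_{i}\) and with the definition of \(A_{\beta }.\) The arithmetic-geometric means inequality is applied pointwise on \(X^{N}\) to the N positive numbers \(e^{f_{i}}\):
$$\begin{aligned} e^{-\beta H^{(N)}}=\prod _{i=1}^{N}\left( e^{f_{i}}\right) ^{1/N}\le \frac{1}{N}\sum _{i=1}^{N}e^{f_{i}}. \end{aligned}$$
For fixed i and fixed \(x_{i}\), the function \(e^{f_{i}}\) is a product of \(N-1\) factors, each depending on a single variable \(x_{j}\) with \(j\ne i\), so that integrating over these variables factorizes:
$$\begin{aligned} \int _{X^{N-1}}\prod _{j\ne i}e^{-\beta W(x_{i},x_{j})}\mu _{0}^{\otimes (N-1)}=\left( \int _{X}e^{-\beta W(x_{i},y)}\mu _{0}(y)\right) ^{N-1}\le A_{\beta }^{N-1}. \end{aligned}$$
Since \(\mu _{0}\) is a probability measure, integrating over the remaining variable \(x_{i}\) and averaging over i preserves the bound.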

Next we fix a continuous function \(\Phi \) on \(Y:=\mathcal {P}(X).\) Without loss of generality, we may as well assume that \(W,\Phi \ge 0.\)

First observe that when \(\beta >\beta _{0}\), we have
$$\begin{aligned} Z_{N,\beta }[\Phi ]:=\int _{X^{N}}e^{-\beta \left( H^{(N)}+N\delta _{N}^{*}( \Phi )\right) }\mu _{0}^{\otimes N}\le C_{\beta }^{N}, \end{aligned}$$
as follows directly from the bound (2.7) (using that \(\Phi \) is bounded).
Using the convergence of the mean energies and Gibbs variational principle, as before, we thus have
$$\begin{aligned} -\log C_{\beta }\le \limsup _{N\rightarrow \infty }-\frac{1}{N}\log Z_{N,\beta }[\Phi ]\le \inf _{\mu \in \mathcal {P}(X)}\left( \beta E(\mu )+\Phi (\mu )+D(\mu )\right) . \end{aligned}$$
Now, to prove the LDP we need, in view of Lemma 2.7, to complement the upper bound on \(-\frac{1}{N}\log Z_{N,\beta }[\Phi ]\) in formula (2.9) with a corresponding lower bound. To this end it would, by Theorem 1.3, be enough to establish the hypothesis H4 in the present context (for example by trying to extend the Orlicz space duality argument in the proof of Corollary 1.4). Such an approach remains to be developed (however, a macroscopic version of H4 does hold; see formula (2.12)). Here we will instead take another road, exploiting the stronger \(L^{p}\)-bounds provided by the following lemma (inspired by [15, 22]):

Lemma 2.12

Let \(\Phi \) be a given Lipschitz continuous function on \(Y:=\mathcal {P}(X)\) and fix \(\beta >\beta _{0}.\) Then the following estimate holds for the densities \(\rho _{j}^{(N)}\) of the jth marginal of the Gibbs measures corresponding to the Hamiltonian \(H^{(N)}+N\delta _{N}^{*}(\Phi ):\)
$$\begin{aligned} \rho _{j}^{(N)}(x_{1},\ldots ,x_{j})\le C_{j}e^{-\frac{1}{N}\sum \sum _{k\ne l\le j}W(x_{k},x_{l})} \end{aligned}$$
as \(N\rightarrow \infty .\) In particular, for any \(p>1\), \(\rho _{j}^{(N)}(x_{1},\ldots ,x_{j})\) is uniformly bounded in \(L^{p}\) as \(N\rightarrow \infty .\)


To fix ideas, we start with the case \(\Phi =0,\) following closely the proof of Theorem 3.1 in [15]. Set
$$\begin{aligned} W(X,Y):=\sum _{x\in X,y\in Y}W(x,y),\,\,\,\,d\mu (Y):=\mu _{0}^{\otimes (N-j)}, \end{aligned}$$
where \(X:=\{x_{1},\ldots ,x_{j}\}\) and \(Y:=\{x_{j+1},\ldots ,x_{N}\}.\) Then we can decompose \(E^{(N)}(x_{1},\ldots ,x_{N})=\frac{1}{N}W(X,X)+\frac{1}{N}W(X,Y)+\frac{1}{N}W(Y,Y).\) Accordingly,
$$\begin{aligned} \rho _{j}^{(N)}(X)=e^{-\frac{\beta }{N}W(X,X)}\frac{1}{Z_{N}}\int e^{-\frac{\beta }{N}W(X,Y)}e^{-\frac{\beta }{N}W(Y,Y)}d\mu (Y). \end{aligned}$$
Applying Hölder’s inequality with \(p=N/j\) (and thus \(q=1+j/(N-j))\), the integral in the right-hand side is bounded from above by
$$\begin{aligned} \left( \int e^{-\frac{\beta }{j}W(X,Y)}d\mu (Y)\right) ^{j/N}\left( \int e^{-\frac{\beta }{N}qW(Y,Y)}d\mu (Y)\right) ^{1/q}. \end{aligned}$$
By assumption, the integral appearing in the first factor above is bounded from above by \(A^{N}\) for some positive constant A. It will thus be enough to show that the second integral is controlled by \(Z_{N}\) in the sense that it is bounded from above by a uniform constant times \(Z_{N}.\) To this end, we will apply Hölder’s inequality again, now with conjugate exponents u and w with u sufficiently close to 1 (to be quantified below). We thus rewrite
$$\begin{aligned} q=\frac{1}{u}+\left( q-\frac{1}{u}\right) \end{aligned}$$
and apply Hölder’s inequality. Since \(w(q-1/u)=1+w(q-1)=1+\frac{wj}{(N-j)},\) this gives
$$\begin{aligned} \int e^{-\frac{\beta }{N}qW(Y,Y)}d\mu (Y)\le & {} \left( \int e^{-\frac{\beta }{N}W(Y,Y)}d\mu (Y)\right) ^{1/u}\nonumber \\&\left( \int e^{-\frac{\beta }{N}(1+\frac{wj}{(N-j)})W(Y,Y)}d\mu (Y)\right) ^{1/w}. \end{aligned}$$
The first factor is controlled by \(Z_{N}\) (since \(W\ge 0).\) Moreover, taking \(w=\epsilon (N-j)/j\) for a sufficiently small positive number \(\epsilon \) ensures that the integral in the second factor is controlled by \(Z_{N,(1+\epsilon )\beta }\le B^{N}\) (by (2.8)). Since w is of the order N, this concludes the proof when \(\Phi =0.\) To treat the general case, we will use the following:
$$\begin{aligned} \text{ Claim: } \left| \Phi \left( \frac{1}{N}\sum _{i=1}^{N} \delta _{x_{i}}\right) -\Phi \left( \frac{1}{N-j}\sum _{i=j+1}^{N} \delta _{x_{i}}\right) \right| \le C\frac{1}{N}. \end{aligned}$$
Accepting the claim for the moment and introducing the notation \(\Phi (\frac{1}{N-j}\sum _{i=j+1}^{N}\delta _{x_{i}})=\phi (Y)\), we have
$$\begin{aligned} \rho _{j}^{(N)}(X)\le \frac{e^{C}}{Z_{N}}e^{-\frac{\beta }{N}W(X,X)}\int e^{-\frac{\beta }{N}W(X,Y)}e^{-\frac{\beta }{N}(W(Y,Y)+N^{2}\phi (Y))}d\mu (Y). \end{aligned}$$
We then use first Hölder’s inequality with p and q and then with u and w exactly as above to get the same factors as above apart from the last factor in formula (2.10), which now becomes
$$\begin{aligned} \int e^{-\frac{\beta }{N}(1+\gamma )(W(Y,Y)+N^{2}\phi (Y))}d\mu (Y), \end{aligned}$$
which is bounded from above by \(C'^{N},\) according to the estimate (2.8) (when \(\gamma \) is sufficiently small). This proves the lemma once we have verified the claim above. To this end, we assume to simplify the notation that \(j=1\) (the general case is similar) and observe that setting \(\mu :=\frac{1}{N}\sum _{i=1}^{N}\delta _{x_{i}}\) and \(\nu :=\frac{1}{N-1}\sum _{i=2}^{N}\delta _{x_{i}}\) gives
$$\begin{aligned} N(\mu -\nu )=\delta _{x_{1}}-\frac{1}{N-1}\sum _{i=2}^{N}\delta _{x_{i}}. \end{aligned}$$
Hence, for any f such that \(|f|\le C_{X}\), we have
$$\begin{aligned} \left| \int f(\mu -\nu )\right| \le \frac{1}{N}\left( C_{X}+\frac{1}{N-1} \sum _{i=2}^{N}C_{X}\right) \le \frac{1}{N}2C_{X}, \end{aligned}$$
and hence \(d(\mu ,\nu )\le 2C_{X}/N.\) But then the claim follows directly from the Lipschitz continuity of \(\Phi .\) \(\square \)
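The exponent bookkeeping in the two applications of Hölder’s inequality in the first part of the proof above can be checked directly; the following lines are only a sketch of that arithmetic. With \(p=N/j\) and \(1/p+1/q=1\), and with conjugate exponents u and w,
$$\begin{aligned} q&=\frac{N}{N-j}=1+\frac{j}{N-j},\\ w\left( q-\frac{1}{u}\right)&=w(q-1)+w\left( 1-\frac{1}{u}\right) =w(q-1)+1=1+\frac{wj}{N-j},\\ w&=\epsilon \frac{N-j}{j}\implies 1+\frac{wj}{N-j}=1+\epsilon ,\,\,\,\left( B^{N}\right) ^{1/w}=B^{\frac{Nj}{\epsilon (N-j)}}. \end{aligned}$$
For fixed j and \(\epsilon \), the last quantity stays bounded as \(N\rightarrow \infty \), and \(u=w/(w-1)\rightarrow 1\), which is the sense in which w being of the order N concludes the argument.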
Now, to verify the missing lower bound on \(-\frac{1}{N}\log Z_{N,\beta }[\Phi ]\), we first claim that it will be enough to verify the case when \(\Phi \) is Lipschitz continuous. Indeed, any continuous function \(\Phi \) on a compact metric space Y can be written as a uniform limit of Lipschitz continuous functions \(\Phi ^{(R)}\) (for example, \(\Phi ^{(R)}(x):=\inf _{y\in Y}\left( \Phi (y)+Rd(x,y)\right) \) increases to \(\Phi ,\) as \(R\rightarrow \infty ,\) and has Lipschitz constant R). Moreover we may, after relabeling the sequence, assume that \(|\Phi _{\epsilon }-\Phi |\le \epsilon .\) But since \(\Phi \mapsto -\frac{1}{N\beta }\log Z_{N,\beta }[\Phi ]\) is increasing and \(-\frac{1}{N\beta }\log Z_{N,\beta }[\Phi +c]=-\frac{1}{N\beta }\log Z_{N,\beta }[\Phi ]+c\) for any \(c\in \mathbb {R}\), we get
$$\begin{aligned} \left| \frac{1}{N\beta }\log Z_{N,\beta }[\Phi ]-\frac{1}{N\beta }\log Z_{N,\beta }[\Phi _{\epsilon }]\right| \le \epsilon , \end{aligned}$$
which proves the claim. Next, we recall that, by the Gibbs variational principle,
$$\begin{aligned} -\frac{1}{N}\log Z_{N,\beta }[\Phi ]=\beta F_{\beta }^{(N)}(\mu _{\beta }^{(N)}), \end{aligned}$$
where \(\mu _{\beta }^{(N)}\) denotes the sequence of Gibbs measures, at inverse temperature \(\beta ,\) corresponding to the Hamiltonian \(H^{(N)}+N\delta _{N}^{*}(\Phi ),\) for \(\Phi \) Lipschitz continuous, and decompose
$$\begin{aligned} E^{(N)}(\mu _{\beta }^{(N)})=\int _{X^{2}}W\rho _{2}^{(N)}\mu _{0}^{\otimes 2}+\left\langle \Phi ,\Gamma _{N}\right\rangle . \end{aligned}$$
By continuity the second term above converges towards \(\left\langle \Phi ,\Gamma \right\rangle \). To prove the desired lower bound on \(F_{\beta }^{(N)}(\mu _{\beta }^{(N)})\), it will thus, just as in the proof of Corollary 1.4, be enough to show that
$$\begin{aligned} \lim _{N\rightarrow \infty }\int _{X^{2}}W\rho _{2}^{(N)}\mu _{0}^{\otimes 2}=\int W\rho _{2}\mu _{0}^{\otimes 2} \end{aligned}$$
for any weak limit point \(\rho _{2}\mu _{0}^{\otimes 2}\) of \(\rho _{2}^{(N)}\mu _{0}^{\otimes 2}.\) To this end, we recall that, by the previous lemma, \(\rho _{2}^{(N)}\) is uniformly bounded in \(L^{p},\) as \(N\rightarrow \infty ,\) for any fixed \(p>1.\) Hence, by standard \(L^{p}\)-duality, the limit (2.11) follows from the fact that \(W\in L^{q}\) for some (any) \(q>1,\) since by assumption, \(e^{\epsilon W}\in L^{1}\) for any sufficiently small positive number \(\epsilon .\) This concludes the proof of Theorem 1.5.

2.7.1 An Alternative Direct Proof of the Lower Semi-continuity of \(\beta F_{\beta }\)

A consequence of the LDP established above is that the corresponding (scaled) free energy functional \(\beta F_{\beta }\) is lsc on \(\mathcal {P}(X).\) As we show next, this could also be shown directly by establishing a macroscopic version of the hypothesis H4 (using Orlicz space duality). This indicates that there could be a more direct proof of the LDP that avoids Lemma 2.12, as discussed above.

Lemma 2.13

Under the assumptions of Theorem 1.5, the following holds: for any \(\beta >\beta _{0}\) the (scaled) free energy functional \(\beta F_{\beta }\) on \(\mathcal {P}(X)\) is lower semi-continuous.


First observe that by the inequality (2.9) (applied to \(\Phi =0):\)
$$\begin{aligned} \beta F_{\beta }\ge -C_{\beta }. \end{aligned}$$
Now, applying the previous bound to \(\beta -\epsilon >\beta _{0}\) reveals that
$$\begin{aligned} \beta F_{\beta }\ge \epsilon E-C'. \end{aligned}$$
Given \(\mu \) in \(\mathcal {P}(X)\) with \(E(\mu )<\infty \), we set
$$\begin{aligned} u_{\mu }(x):=\int W(x,y)\mu (y), \end{aligned}$$
so that
$$\begin{aligned} E(\mu )=\int _{X}u_{\mu }\mu . \end{aligned}$$
Using the previous estimate it will, to prove the lemma, be enough to verify the following “macroscopic” version of H4 for any sequence \(\mu _{j}\) converging weakly towards \(\mu :\)
$$\begin{aligned} \text {``Macro H4'':}\quad D(\mu _{j})\le C\implies E(\mu _{j})\rightarrow E(\mu ) \end{aligned}$$
(compare [9, Theorem 2.17]). To this end, set
$$\begin{aligned} u_{j}:=u_{\mu _{j}}:=\int W(x,y)\mu _{j}(y). \end{aligned}$$
First observe that it will be enough to prove that
$$\begin{aligned} \left\| u_{j}-u\right\| _{\theta }\rightarrow 0, \end{aligned}$$
where \(\theta \) is of exponential type (as in the proof of Corollary 1.4). Indeed, \(E(\mu _{j})-E(\mu )=\left\langle u_{\mu _{j}},\mu _{j}\right\rangle -\left\langle u_{\mu },\mu \right\rangle =\left\langle u_{\mu _{j}}-u_{\mu },\mu _{j}\right\rangle -\left\langle u_{\mu },\mu -\mu _{j}\right\rangle =\left\langle u_{\mu _{j}}-u_{\mu },\mu _{j}\right\rangle -\left\langle \mu ,u_{\mu }-u_{\mu _{j}}\right\rangle \), using that W(x, y) is symmetric in the last step. Hence, the Hölder–Young inequality gives
$$\begin{aligned} |E(\mu _{j})-E(\mu )|\le 2\left\| u_{j}-u\right\| _{\theta }\left( \left\| \rho _{j}\right\| _{\theta ^{*}}+\left\| \rho \right\| _{\theta ^{*}}\right) . \end{aligned}$$
Since the second factor in the right-hand side above is uniformly bounded (since \(D(\mu _{j})\le C)\), this shows that it is indeed enough to prove (2.13). To prove the latter convergence, observe that the Hölder–Young inequality implies that
$$\begin{aligned} |u_{j}|\le C,\,\,\,|u|\le C \end{aligned}$$
(using the integrability assumption on W). Indeed, if \(\mu =\rho \mu _{0},\) then
$$\begin{aligned} |u_{\mu }(x)|=|\int W(x,y)\rho (y)\mu _{0}(y)|\le 2\left\| W(x,\cdot )\right\| _{\theta }\left\| \rho \right\| _{\theta ^{*}}, \end{aligned}$$
where the first term is uniformly bounded in x,  by assumption, and the second term is controlled by \(D(\mu ).\) Next, note that the convergence (2.13) that we intend to prove is equivalent to proving
$$\begin{aligned} \int _{X}\theta (a(u_{j}-u))\mu _{0}\rightarrow 0 \end{aligned}$$
for any fixed positive number a. Now, since \(\theta (t)\le |t|e^{|t|}\) and the sup norms of \(|u_{j}|\) and |u| are bounded from above by C, we have
$$\begin{aligned} \int _{X}\theta (a(u_{j}-u))\mu _{0}\le a\int _{X}|u_{j}-u|e^{a|u_{j}-u|}\mu _{0}\le ae^{2aC}\int _{X}|u_{j}-u|\mu _{0}. \end{aligned}$$
All that remains is thus to show that
$$\begin{aligned} \left\| u_{j}-u\right\| _{L^{1}(X,\mu _{0})}\rightarrow 0. \end{aligned}$$
Since, by the lower semi-continuity of W,
$$\begin{aligned} \liminf _{j\rightarrow \infty }u_{j}\ge u, \end{aligned}$$
the desired convergence (2.14) will follow from general measure theory if \(\int u_{j}\mu _{0}\rightarrow \int u\mu _{0}\) (using that \(u_{\mu }\ge 0\) if W is normalized so that \(W\ge 0).\) But
$$\begin{aligned} \int _{X}u_{j}\mu _{0}=\int _{X}v(y)\mu _{j}(y),\,\,\,v:=\int _{X}W(x,y)\mu _{0}(y), \end{aligned}$$
where v is bounded, by the previous argument (since \(\mu _{0}\) trivially has finite entropy). In particular, v is in the small Orlicz space \(M_{\theta }(X,\mu _{0})\), and since \(D(\mu _{j})\le C\), the desired convergence (2.14) then follows from the duality argument towards the end of the proof of Corollary 1.4. \(\square \)
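The elementary bound \(\theta (t)\le |t|e^{|t|}\) used in the proof above can be verified from the power series of the exponential. This is a sketch assuming that \(\theta \) is extended evenly to all of \(\mathbb {R}\), i.e., \(\theta (t)=e^{|t|}-|t|-1\):
$$\begin{aligned} \theta (t)=\sum _{k\ge 2}\frac{|t|^{k}}{k!}=|t|\sum _{k\ge 1}\frac{|t|^{k}}{(k+1)!}\le |t|\sum _{k\ge 1}\frac{|t|^{k}}{k!}=|t|\left( e^{|t|}-1\right) \le |t|e^{|t|}. \end{aligned}$$
Applied to \(t=a(u_{j}-u)\) with \(|u_{j}-u|\le 2C\), this gives \(\theta (a(u_{j}-u))\le a|u_{j}-u|e^{2aC}\) pointwise, which is the integrand estimate used to reduce the convergence in the Luxemburg norm to convergence in \(L^{1}(X,\mu _{0}).\)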

2.8 Relations to Gamma-Convergence of \(E^{(N)}\) on \(\mathcal {P}(X):\) Proof of Corollary 1.6

First observe that the required lower bound on \(E^{(N)}(x^{(N)})\) is obtained by taking \(\mu _{N}\in \mathcal {P}(X^{N})^{S_{N}}\) to be the normalized \(S_{N}\)-orbit in \(X^{N}\) of the Dirac measure supported at a given \(x^{(N)}\in X^{N}\):
$$\begin{aligned} \mu _{N}=\frac{1}{N!}\sum _{\sigma \in S_{N}}\delta _{\sigma (x^{(N)})}. \end{aligned}$$
Then \(E^{(N)}(x^{(N)})=E^{(N)}(\mu _{N}).\) Now, if \(\delta _{N}(x^{(N)})\) converges towards \(\mu \) in \(\mathcal {P}(X),\) then \((\delta _{N})_{*}\mu _{N}\) converges towards \(\Gamma :=\delta _{\mu }\), and hence the desired lower bound on \(E^{(N)}(x^{(N)})\) follows from the assumed lower bound on \(E^{(N)}(\mu _{N}).\)
Next, to construct recovery sequences, we fix some \(\beta >0,\) say \(\beta =1\), and a probability measure \(\mu _{0}.\) By Theorem 1.1 an LDP holds with rate functional \(E(\mu )+D_{\mu _{0}}(\mu ).\) In particular, the lower bound in the LDP gives that, for any given \(\mu \) and \(\mu _{0}\) in \(\mathcal {P}(X),\)
$$\begin{aligned}&-E(\mu )-D_{\mu _{0}}(\mu )\le \limsup _{\epsilon \rightarrow 0} \limsup _{N\rightarrow \infty }\frac{1}{N}\log \int _{B_{\epsilon }(\mu )} e^{-H^{(N)}}\mu _{0}^{\otimes N}\\&\quad \le \limsup _{\epsilon \rightarrow 0}\limsup _{N\rightarrow \infty } \left( -\inf _{B_{\epsilon }(\mu )}E^{(N)}+0\right) . \end{aligned}$$
Hence, taking a sequence \(x_{\epsilon }^{(N)}\in B_{\epsilon }(\mu )\) such that \(E^{(N)}(x_{\epsilon }^{(N)})\le \inf _{B_{\epsilon }(\mu )}E^{(N)}+\epsilon \) gives
$$\begin{aligned} -E(\mu )-D_{\mu _{0}}(\mu )\le \limsup _{\epsilon \rightarrow 0} \limsup _{N\rightarrow \infty }\left( -E^{(N)}(x_{\epsilon }^{(N)})\right) . \end{aligned}$$
In particular, taking \(\mu _{0}=\mu \) (which gives \(D_{\mu _{0}}(\mu )=0)\), we deduce, by a diagonal argument, the existence of a recovery sequence for any given \(\mu \in \mathcal {P}(X).\)
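The second inequality in the two-line display above is elementary. A sketch, identifying \(B_{\epsilon }(\mu )\) with its preimage under \(\delta _{N}\) in \(X^{N}\) and using \(H^{(N)}=NE^{(N)}\):
$$\begin{aligned} \frac{1}{N}\log \int _{B_{\epsilon }(\mu )}e^{-H^{(N)}}\mu _{0}^{\otimes N}&\le \frac{1}{N}\log \left( e^{-N\inf _{B_{\epsilon }(\mu )}E^{(N)}}\,\mu _{0}^{\otimes N}(B_{\epsilon }(\mu ))\right) \\&=-\inf _{B_{\epsilon }(\mu )}E^{(N)}+\frac{1}{N}\log \mu _{0}^{\otimes N}(B_{\epsilon }(\mu ))\le -\inf _{B_{\epsilon }(\mu )}E^{(N)}+0, \end{aligned}$$
since \(\mu _{0}^{\otimes N}\) is a probability measure. With \(\mu _{0}=\mu \), the resulting inequality can be rewritten, using \(\limsup (-a)=-\liminf a\), as
$$\begin{aligned} \liminf _{\epsilon \rightarrow 0}\liminf _{N\rightarrow \infty }E^{(N)}(x_{\epsilon }^{(N)})\le E(\mu ), \end{aligned}$$
which is the form to which the diagonal argument is applied.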

3 Concluding Remarks

3.1 A Weaker form of the Hypothesis H2

Let us come back to the setting of Theorem 1.1 and observe that the hypothesis H2 may be replaced by the following one, which is a priori weaker (see the beginning of Sect. 2.8):
  • (H2’) For any sequence of \(x^{(N)}\in X^{N}\) such that \(\delta _{N}(x^{(N)})\rightarrow \mu \) weakly in \(\mathcal {P}(X)\), we have
    $$\begin{aligned} \liminf _{N\rightarrow \infty }\frac{1}{N}H^{(N)}(x^{(N)})\ge E(\mu ). \end{aligned}$$
In other words, (H2’) says that \(E^{(N)}:=H^{(N)}/N,\) when viewed as a functional on \(\mathcal {P}(X),\) satisfies the lower bound property that is one of the two requirements for the Gamma-convergence of \(E^{(N)}\) towards \(E(\mu ),\) where \(E(\mu )\) denotes, as before, the macroscopic mean energy whose existence is postulated in hypothesis H1.
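For reference, the two requirements can be written out as follows. This is the standard sequential definition of Gamma-convergence (see, e.g., [14]), transcribed to the present notation, with \(E^{(N)}\) evaluated at \(x^{(N)}\in X^{N}\) through the empirical measure \(\delta _{N}(x^{(N)})\); the labels (i) and (ii) are introduced here only for convenience:
$$\begin{aligned}&\text {(i)}\quad \delta _{N}(x^{(N)})\rightarrow \mu \implies \liminf _{N\rightarrow \infty }E^{(N)}(x^{(N)})\ge E(\mu ),\\&\text {(ii)}\quad \text {for every } \mu \in \mathcal {P}(X) \text { there exist } x^{(N)}\in X^{N} \text { with } \delta _{N}(x^{(N)})\rightarrow \mu \text { and } \limsup _{N\rightarrow \infty }E^{(N)}(x^{(N)})\le E(\mu ). \end{aligned}$$
Hypothesis (H2’) is exactly (i); a sequence as in (ii) is a recovery sequence for \(\mu .\)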

Theorem 3.1

The conclusion of Theorem 1.1 remains valid if H2 is replaced by H2’.


Just as in the proof of Theorem 1.1, in order to verify the convergence in Bryc’s lemma, we may without loss of generality assume that \(\Phi =0.\) Moreover, exactly as before, H1 combined with the Gibbs variational principle yields the upper bound on \(-\log Z_{N,\beta _{N}}.\) Thus we just have to prove the following bound:
$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{X^{N}}e^{- \beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le -\inf _{\mathcal {P}(X)}F_{\beta }. \end{aligned}$$
To this end, for any fixed \(\delta >0,\) we cover the compact space \(\mathcal {P}(X)\) by a finite number \(M_{\delta }\) of balls of radius \(\delta .\) Then
$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{X^{N}}e^{- \beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\!\le \!0+\limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{B_{ \delta }(\mu _{\delta ,N})}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}, \end{aligned}$$
where \(\mu _{\delta ,N}\) is the center of the ball with the largest integral (using that \((N\beta _{N})^{-1}\log M_{\delta }\rightarrow 0).\) Denote by \(\mu _{\delta }\) a weak limit point in \(\mathcal {P}(X)\) of the family \(\mu _{\delta ,N}\) as \(N\rightarrow \infty .\) Then \(B_{\delta }(\mu _{\delta ,N})\subset B_{2\delta }(\mu _{\delta })\) as \(N\rightarrow \infty \) (along the subsequence of \(\{N\}\)). Next, denote by \(\mu \) a weak limit point of \(\mu _{\delta }\) as \(\delta \rightarrow 0,\) i.e., \(\mu \) is the limit of \(\mu _{\delta _{j}}\) as \(\delta _{j}\rightarrow 0\), and fix \(\epsilon >0.\) For any sufficiently small \(\delta _{j}\), we have \(B_{2\delta _{j}}(\mu _{\delta _{j}})\subset B_{\epsilon }(\mu ).\) Hence, for any such \(\delta _{j}\), we have
$$\begin{aligned} B_{\delta _{j}}(\mu _{\delta _{j},N})\subset B_{\epsilon }(\mu ) \end{aligned}$$
as \(N\rightarrow \infty \) (along the subsequence of \(\{N\}\)). Thus
$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{X^{N}}e^{- \beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le \limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{B_{\epsilon }(\mu )}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}. \end{aligned}$$
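The reduction to a single ball used in the covering step rests on an elementary estimate. A sketch, assuming \(\beta _{N}>0\) with \(N\beta _{N}\rightarrow \infty \) (as is the case when \(\beta _{N}\rightarrow \beta \in ]0,\infty ]\)), and writing \(\nu _{1},\ldots ,\nu _{M_{\delta }}\) for the centers of the covering balls (a notation introduced only for this sketch): since the balls cover \(\mathcal {P}(X)\) and \(\mu _{\delta ,N}\) is the center with the largest integral,
$$\begin{aligned} \int _{X^{N}}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le \sum _{i=1}^{M_{\delta }}\int _{B_{\delta }(\nu _{i})}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le M_{\delta }\int _{B_{\delta }(\mu _{\delta ,N})}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}. \end{aligned}$$
Taking logarithms and dividing by \(N\beta _{N}\) produces the additive error \((N\beta _{N})^{-1}\log M_{\delta }\), which tends to 0 because \(M_{\delta }\) does not depend on N. The two inclusions are instances of the triangle inequality for a metric d on \(\mathcal {P}(X)\) compatible with the weak topology:
$$\begin{aligned} d(\mu _{\delta ,N},\mu _{\delta })<\delta&\implies B_{\delta }(\mu _{\delta ,N})\subset B_{2\delta }(\mu _{\delta }),\\ 2\delta _{j}+d(\mu _{\delta _{j}},\mu )<\epsilon&\implies B_{2\delta _{j}}(\mu _{\delta _{j}})\subset B_{\epsilon }(\mu ). \end{aligned}$$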
Moreover, by Sanov’s theorem,
$$\begin{aligned} \limsup _{\epsilon \rightarrow 0}\limsup _{N\rightarrow \infty } \frac{1}{N\beta _{N}}\log \int _{B_{\epsilon }(\mu )}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le -\liminf _{\epsilon \rightarrow 0}\liminf _{N\rightarrow \infty }\inf _{B_{\epsilon }(\mu )}E^{(N)}-\frac{1}{\beta }D(\mu ), \end{aligned}$$
where, by hypothesis H2’, the right-hand side is bounded from above by \(-E(\mu )-\frac{1}{\beta }D(\mu ):=-F_{\beta }(\mu ).\) Hence,
$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N\beta _{N}}\log \int _{X^{N}} e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le -F_{\beta }(\mu )\le -\inf _{\mathcal {P}(X)}F_{\beta }, \end{aligned}$$
which concludes the proof of the bound (3.1) (strictly speaking we have proved the bound for some subsequence of the \(\{N\}\), but this is enough since we could have started by replacing the sequence \(\{N\}\) with a subsequence \(N_{j}\) with the property that the limsup in formula (3.1) is a lim along the subsequence \(N_{j}\)). \(\square \)
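The step invoking Sanov’s theorem in the proof above can be unpacked as follows. This is a sketch for \(\beta <\infty \), with \(\overline{B}_{\epsilon }(\mu )\) denoting the closed ball and \(B_{\epsilon }(\mu )\) identified with its preimage under \(\delta _{N}.\) Since \(H^{(N)}=NE^{(N)}\) and \(\beta _{N}>0,\)
$$\begin{aligned} \frac{1}{N\beta _{N}}\log \int _{B_{\epsilon }(\mu )}e^{-\beta _{N}H^{(N)}}\mu _{0}^{\otimes N}\le -\inf _{B_{\epsilon }(\mu )}E^{(N)}+\frac{1}{\beta _{N}}\cdot \frac{1}{N}\log \mu _{0}^{\otimes N}\left\{ \delta _{N}\in B_{\epsilon }(\mu )\right\} . \end{aligned}$$
The upper bound in Sanov’s theorem, applied to the closed set \(\overline{B}_{\epsilon }(\mu )\), gives
$$\begin{aligned} \limsup _{N\rightarrow \infty }\frac{1}{N}\log \mu _{0}^{\otimes N}\left\{ \delta _{N}\in B_{\epsilon }(\mu )\right\} \le -\inf _{\overline{B}_{\epsilon }(\mu )}D, \end{aligned}$$
and the lower semicontinuity of D yields \(\inf _{\overline{B}_{\epsilon }(\mu )}D\rightarrow D(\mu )\) as \(\epsilon \rightarrow 0.\) Combining the two displays with \(\beta _{N}\rightarrow \beta \) gives the inequality used in the proof.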

A result essentially equivalent to the previous theorem appears in [13].

Combining Theorem 3.1 and Corollary 1.6 thus reveals that the hypotheses H1 and H2’ actually imply the Gamma-convergence of \(\frac{1}{N}H^{(N)}\) towards E on \(\mathcal {P}(X).\) But it seems unlikely that, in general, the assumption that \(\frac{1}{N}H^{(N)}\) Gamma-converges towards a functional E on \(\mathcal {P}(X)\) is sufficient to deduce an LDP (even if one also assumes H3). On the other hand, as shown in [5], one does get an LDP for any \(\beta \in ]0,\infty ]\) under an assumption of quasi-superharmonicity:

Theorem 3.2

[5] Let \(H^{(N)}\) be a sequence of lower semi-continuous symmetric functions on \(X^{N},\) where X is a compact Riemannian manifold. Assume that:
  • The sequence \(\frac{1}{N}H^{(N)}\) on \(X^{N}\) (identified with a sequence of functions on \(\mathcal {P}(X)\)) Gamma-converges towards a functional E on \(\mathcal {P}(X)\).

  • \(H^{(N)}\) is uniformly quasi-superharmonic, i.e., \(\Delta _{x_{1}}H^{(N)}(x_{1},x_{2},\ldots ,x_{N})\le C\) on \(X^{N}\).

Then, for any sequence of positive numbers \(\beta _{N}\rightarrow \beta \in ]0,\infty ]\), the measures \(\Gamma _{N}:=(\delta _{N})_{*}e^{-\beta _{N}H^{(N)}}\) on \(\mathcal {M}_{1}(X)\) satisfy, as \(N\rightarrow \infty ,\) an LDP with speed \(\beta _{N}N\) and good rate functional
$$\begin{aligned} F_{\beta }(\mu )=E(\mu )+\frac{1}{\beta }D_{dV}(\mu ). \end{aligned}$$

This is not hard to see when \(\beta =\infty ,\) but for \(\beta <\infty \), the proof hinges on a submean inequality for quasi-subharmonic functions with a distortion factor that is subexponential in the dimension, proved in [5].


  1. The Hamiltonians in the random matrix and Coulomb gas literature are usually scaled in a different way, so that our zero-temperature case \((\beta =\infty )\) corresponds to a fixed inverse temperature.



This work was supported by grants from the ERC and the KAW foundation. It is a pleasure to thank Sebastien Boucksom, Vincent Guedj, Philippe Eyssidieux, and Ahmed Zeriahi for the stimulating collaboration [9] and the referee for careful reading of the manuscript and many helpful remarks. Thanks also to the special issue editors Peter Forrester, Doug Hardin, and Sylvia Serfaty for the invitation to contribute to the special issue of Constructive Approximation.


References

  1. Ben Arous, G., Guionnet, A.: Large deviations for Wigner’s law and Voiculescu’s non-commutative entropy. Probab. Theory Rel. Fields 108(4), 517–542 (1997)
  2. Ben Arous, G., Zeitouni, O.: Large deviations from the circular law. ESAIM Probab. Stat. 2, 123–134 (1998)
  3. Berman, R.J.: Determinantal point processes and fermions on complex manifolds: large deviations and Bosonization. Comm. Math. Phys. 327(1), 1–47 (2014)
  4. Berman, R.J.: Kähler–Einstein metrics, canonical random point processes and birational geometry. In: AMS Proceedings of the 2015 Summer Research Institute on Algebraic Geometry (to appear). arXiv:1307.3634
  5. Berman, R.J.: Large deviations for Gibbs measures with singular Hamiltonians and emergence of Kähler–Einstein metrics. Comm. Math. Phys. 354(3), 1133–1172 (2017)
  6. Berman, R.J.: Large Deviations for Gibbs Measures and Global Potential Theory: Riemannian Versus Kähler Manifolds (in preparation)
  7. Berman, R.J., Boucksom, S.: Growth of balls of holomorphic sections and energy at equilibrium. Invent. Math. 181(2), 337 (2010)
  8. Berman, R.J., Boucksom, S., Witt Nyström, D.: Fekete points and convergence towards equilibrium measures on complex manifolds. Acta Math. 207(1), 1–27 (2011)
  9. Berman, R.J., Boucksom, S., Eyssidieux, P., Guedj, V., Zeriahi, A.: Kähler–Einstein metrics and the Kähler–Ricci flow on log Fano varieties. Crelle’s J. (to appear). arXiv:1111.7158
  10. Berman, R.J., Onnheim, M.: Propagation of Chaos for a Class of First Order Models with Singular Mean Field Interactions. arXiv:1610.04327
  11. Bloom, T., Levenberg, N., Piazzon, P., Wielonsky, F.: Bernstein–Markov: A Survey. Dolomites Res. Notes Approx. Vol. (Special Issue) 75–91 (2015). arXiv:1512.00739
  12. Bodineau, T., Guionnet, A.: About the stationary states of vortex systems. Ann. Inst. Henri Poincaré Probab. Stat. 35, 205–237 (1999)
  13. Boucksom, S.: Limite thermodynamique et théorie du potentiel. SMF Gazette Octobre. No. 146 (2015)
  14. Braides, A.: Gamma-Convergence for Beginners. Oxford University Press, Oxford (2002)
  15. Caglioti, E., Lions, P.-L., Marchioro, C., Pulvirenti, M.: A special class of stationary flows for two-dimensional Euler equations: a statistical mechanics description. Commun. Math. Phys. 143(3), 501–525 (1992)
  16. Chafaï, D., Gozlan, N., Zitt, P.-A.: First-order global asymptotics for confined particles with singular pair repulsion. Ann. Appl. Probab. 24(6), 2371–2413 (2014)
  17. Dembo, A., Zeitouni, O.: Large deviation techniques and applications. Corrected reprint of the second (1998) edition. In: Stochastic Modelling and Applied Probability, 38. Springer, Berlin (2010)
  18. Dupuis, P., Laschos, V., Ramanan, K.: Large Deviations for Empirical Measures Generated by Gibbs Measures with Singular Energy Functionals. arXiv:1511.06928
  19. García Zelada, D.: A Large Deviation Principle for Empirical Measures on Polish Spaces: Application to Singular Gibbs Measures on Manifolds. arXiv:1703.02680
  20. Hardin, D.P., Saff, E.B.: Discretizing manifolds via minimum energy points. Not. Am. Math. Soc. 51(10), 1186–1194 (2004)
  21. Hauray, M., Mischler, S.: On Kac’s chaos and related problems. J. Funct. Anal. (2014). arXiv:1205.4518
  22. Kiessling, M.K.H.: Statistical mechanics of classical particles with logarithmic interactions. Comm. Pure Appl. Math. 46, 27–56 (1993)
  23. Kiessling, M.K.H.: Statistical mechanics approach to some problems in conformal geometry. Phys. A: Stat. Mech. Appl. 279(1–4), 353–368 (2000)
  24. Kiessling, M.K.-H., Spohn, H.: A note on the eigenvalue density of random matrices. Comm. Math. Phys. 199(3), 683–695 (1999)
  25. Leblé, T., Serfaty, S.: Large Deviation Principle for Empirical Fields of Log and Riesz Gases. arXiv:1502.02970
  26.
  27. Mariani, M.: A Gamma-Convergence Approach to Large Deviations. arXiv:1204.0640
  28. Messer, J., Spohn, H.: Statistical mechanics of the isothermal Lane–Emden equation. J. Stat. Phys. 29(3), 561–578 (1982)
  29. Onsager, L.: Statistical hydrodynamics. Supplemento al Nuovo Cimento 6, 279–287 (1949)
  30. Rao, M.M., Ren, Z.D.: Theory of Orlicz Spaces, Volume 146 of Pure and Applied Mathematics. Marcel Dekker, New York (1991)
  31. Robinson, D.W., Ruelle, D.: Mean entropy of states in classical statistical mechanics. Comm. Math. Phys. 5, 288–300 (1967)
  32. Saff, E.B., Kuijlaars, A.B.J.: Distributing many points on a sphere. Math. Intell. 19(1), 5–11 (1997)
  33. Saff, E., Totik, V.: Logarithmic Potentials with External Fields. Springer, Berlin (1997) (with an appendix by Bloom, T.)
  34. Serfaty, S.: Coulomb gases and Ginzburg–Landau vortices. Zurich Lectures in Advanced Mathematics. European Mathematical Society (EMS), Zürich (2015)

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Department of Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg, Sweden
