On Large Deviations for Gibbs Measures, Mean Energy and Gamma-Convergence

We consider the random point processes on a measure space $(X,\mu_0)$ defined by the Gibbs measures associated with a given sequence of $N$-particle Hamiltonians $H^{(N)}$. Inspired by the method of Messer–Spohn for proving concentration properties for the laws of the corresponding empirical measures, we propose a number of hypotheses on $H^{(N)}$ that are quite general but still strong enough to extend the approach of Messer–Spohn. The hypotheses are formulated in terms of the asymptotics of the corresponding mean energy functionals. We show that in many situations, the approach even yields a large deviation principle (LDP) for the corresponding laws. Connections to Gamma-convergence of (free) energy type functionals at different levels are also explored. The focus is on differences between positive and negative temperature situations, motivated by applications to complex geometry. The results yield, in particular, large deviation principles at positive as well as negative temperatures for quite general classes of singular mean field models with pair interactions, generalizing the 2D vortex model and Coulomb gases.
In a companion paper, the results will be illustrated in the setting of Coulomb and Riesz type gases on a Riemannian manifold X, comparing with the complex geometric setting.


Introduction
Let $X$ be a compact topological space endowed with a probability measure $\mu_0$. Given a sequence of symmetric functions $H^{(N)}$ on the $N$-fold products $X^N$, which are absolutely integrable with respect to the Borel measure $\mu_0^{\otimes N}$, the corresponding Gibbs measures at inverse temperature $\beta_N \in \mathbb{R}$ are defined as the following sequence of symmetric probability measures on $X^N$:
$$\mu^{(N)}_{\beta_N} := \frac{e^{-\beta_N H^{(N)}}}{Z_{N,\beta_N}}\,\mu_0^{\otimes N}, \qquad Z_{N,\beta_N} := \int_{X^N} e^{-\beta_N H^{(N)}}\,\mu_0^{\otimes N},$$
assuming that the partition function $Z_{N,\beta_N}$ is finite. We also assume that the following limit exists:
$$\beta := \lim_{N\to\infty} \beta_N \in\, ]-\infty, \infty].$$
The ensemble $(X^N, \mu^{(N)}_{\beta_N})$ (called the canonical ensemble) defines a random point process with $N$ particles on $X$ which, from the point of view of statistical mechanics, models $N$ identical particles on $X$ interacting by the Hamiltonian (interaction energy) $H^{(N)}$ in thermal equilibrium at inverse temperature $\beta_N$. The corresponding empirical measure is the random measure
$$\delta_N : X^N \to \mathcal{P}(X), \qquad (x_1, \dots, x_N) \mapsto \delta_N(x_1, \dots, x_N) := \frac{1}{N}\sum_{i=1}^{N} \delta_{x_i},$$
taking values in the space $\mathcal{P}(X)$ of all probability measures on $X$. A recurrent theme in statistical mechanics is to study the large $N$-limit (i.e., the "macroscopic limit") of the canonical ensemble through the large $N$-limit of the laws of $\delta_N$, i.e. through the sequence
$$\Gamma_N := (\delta_N)_*\,\mu^{(N)}_{\beta_N}$$
of probability measures on $\mathcal{P}(X)$. In many situations, the laws $\Gamma_N$ can be shown to concentrate, as $N \to \infty$, at the subset of $\mathcal{P}(X)$ consisting of the minima of a free energy type functional $F_\beta$ on $\mathcal{P}(X)$; we will then say that "the sequence $\Gamma_N$ has the concentration property." For example, if the functional $F_\beta$ has a unique minimizer $\mu_\beta$, then it follows that the random measures $\delta_N$ converge in law to the deterministic measure $\mu_\beta$.
A stronger exponential notion of concentration, with an explicit speed and rate functional, is offered by the theory of large deviations, by demanding that the laws $\Gamma_N$ satisfy a large deviation principle (LDP) with speed $r_N$ and a rate functional $F$, symbolically expressed as

The present paper is inspired by the method introduced by Messer-Spohn [28] to establish the concentration property of the laws of the empirical measures $\delta_N$ using the Gibbs variational principle, combined with properties of the mean energy of the system. In the original approach in [28], $H^{(N)}$ was assumed to be the mean field Hamiltonian corresponding to a continuous pair interaction potential $W(x, y)$:
$$H^{(N)}(x_1, \dots, x_N) := \frac{1}{N}\sum_{1 \le i, j \le N} W(x_i, x_j),$$
where $\beta_N = \beta \in\, ]0, \infty[$ (this is a mean field interaction in the sense that each particle $x_i$ is exposed to the average of the pair interactions $W(x_i, x_j)$ of all $N$ particles, including the self-interaction). But the approach has also been extended to handle some situations where $W(x, y)$ is allowed to be singular [15,22,24], as in Onsager's vortex model for 2D turbulence [29], where $W(x, y) = -\log|x - y|$. The corresponding mean field Hamiltonian is then "renormalized" by removing the self-interaction terms in order to make sure that $H^{(N)}$ is generically finite on $X^N$. The aim of the present paper is to:
• Propose a number of quite general hypotheses on $H^{(N)}$, formulated in terms of the corresponding mean energy functional $E^{(N)}$ on $\mathcal{P}(X^N)$ (defined by formula (1.3)), that are strong enough to extend the approach of Messer-Spohn.
• Show that the approach also yields the stronger exponential concentration property in the sense of an LDP, almost "for free," in several situations.
• Explore some relations to the notion of Gamma-convergence of functionals: first by reformulating the approach of Messer-Spohn in terms of Gamma-convergence of the induced mean free energy functionals $F^{(N)}_{\beta_N}$ on $\mathcal{P}(\mathcal{P}(X))$ (formula (2.4)) and then by deducing a Gamma-convergence result for the sequence $H^{(N)}/N$ on $X^N$.
The main motivation comes from the probabilistic approach to the construction of Kähler-Einstein metrics on a complex algebraic manifold $X$, introduced in [4,5]. In that situation, the corresponding Hamiltonians $H^{(N)}$ are highly nonlinear and singular (and not of the simpler mean field type appearing in formula (1.5), which has previously been used in the statistical mechanical approach to conformal geometry introduced in [23]). But still, as shown in [5], building on [7], the sequence $H^{(N)}/N$ Gamma-converges towards a certain energy type functional $E(\mu)$ on $\mathcal{P}(X)$. Exploiting superharmonicity properties of $H^{(N)}/N$, the corresponding LDP is then established at any positive inverse temperature $\beta$ in [5], producing Kähler-Einstein metrics with negative Ricci curvature in the large $N$-limit. The approach in [5] bypasses the problem of the existence of the macroscopic mean energy (hypothesis H1 below), which is open in the complex geometric setting. On the other hand, as discussed in [5], extending the LDP in [5] to negative $\beta$ (which is needed to produce Kähler-Einstein metrics with positive Ricci curvature) necessitates the existence of the macroscopic mean energy. This is the motivation behind Theorem 1.3 below, which shows that, conversely, hypothesis H1 together with the additional hypothesis H4 implies an LDP for appropriate negative $\beta$. The hypothesis H4 is inspired by the energy-entropy compactness results in [9], which can be viewed as the macroscopic analog of H4 in the complex geometric setting. Incidentally, in the lowest-dimensional case when $X$ is a Riemann surface, the probabilistic setting in [4,5] is essentially equivalent to a mean field model with a logarithmic pair interaction, which is thus similar to Onsager's vortex model for 2D turbulence [29].
In the latter situation, the corresponding concentration properties were established in [15,22], for any β above the critical negative temperature (and the LDP was established using a different method in [12]). This is in line with Onsager's prediction of the existence of macroscopic negative temperature states.
Another motivation for the present paper comes from random matrix theory (or more generally Coulomb gases), which can be viewed as a vortex type model with $\beta_N \sim N$ (and in particular $\beta = \infty$). The corresponding concentration property was established in [24], using the method of Messer-Spohn as in [15,22]. Here we observe that, with a simple modification, the concentration property can be upgraded to an LDP (Corollary 1.2). In particular, this allows one to dispense with some technical assumptions (such as regularity properties of the corresponding pair interaction $W(x, y)$ away from the diagonal) used in the different approaches to LDPs in [1,2,16,34] (see Sect. 1.3).¹ Yet another motivation comes from approximation and sampling theory and, in particular, the problem of finding nearly minimal configurations for a given energy type interaction on a Riemannian manifold, in the spirit of [20,32].
Let us also point out that the restriction that $X$ be compact can be removed if suitable growth assumptions on $H^{(N)}$ at infinity are made, as in the settings in $\mathbb{R}^n$ considered in [1,2,16,18,24,34] (using appropriate tightness estimates). But in order to (hopefully) convey the conceptual simplicity of the arguments, we stick with a compact $X$.

Hypotheses
We may as well assume that $X$ coincides with the support of $\mu_0$. In the following, $\mu^{(N)}$ will denote a symmetric probability measure on $X^N$ and $Y := \mathcal{P}(X)$. We recall that the mean (microscopic) energy of $\mu^{(N)}$, in the usual sense of statistical mechanics, is defined by
$$E^{(N)}(\mu^{(N)}) := \frac{1}{N}\int_{X^N} H^{(N)}\,\mu^{(N)} \qquad (1.3)$$
(assuming that $H^{(N)} \in L^1(\mu^{(N)})$). We introduce the following hypotheses:
• (H1) "Existence of a macroscopic mean energy $E(\mu)$": There exists a functional $E(\mu)$ on $\mathcal{P}(X)$ such that for any $\mu$ in $\mathcal{P}(X)$ satisfying $E(\mu) < \infty$,
$$\lim_{N\to\infty} E^{(N)}(\mu^{\otimes N}) = E(\mu).$$
Moreover, $E(\mu_0) < \infty$.
• (H2) "Lower bound on the mean energy": For any sequence of $\mu^{(N)}$ such that
• (H3) "Approximation property": For any $\mu$ such that $E(\mu) < \infty$, there exists a sequence $\mu_j$ converging weakly to $\mu$ such that $\mu_j$ has finite entropy with respect to $\mu_0$ and satisfies $E(\mu_j) \to E(\mu)$.
• (H4) "Mean energy/entropy compactness": If where $D^{(N)}(\mu^{(N)})$ is the mean entropy, then the following convergence holds, after perhaps replacing $\mu^{(N)}$ by a subsequence such that N :

The first hypothesis will be assumed throughout the paper. The second and third ones will appear naturally in positive and vanishing temperature, respectively, while the fourth one turns out to be useful in some cases of negative temperature. However, it may very well be that hypothesis H4 needs to be weakened a bit in order to increase its scope. For example, in the proof of the large $N$-concentration properties, one only needs to assume that H4 holds when $\mu^{(N)}$ is the Gibbs measure corresponding to $H^{(N)}$ (see Remark 2.11).

¹ The Hamiltonians in the random matrix and Coulomb gas literature are usually scaled in a different way so that our zero-temperature ($\beta = \infty$) corresponds to a fixed inverse temperature.
Of course, the sign of the temperature may be switched by replacing H (N ) with −H (N ) , but the point is that, in practice, we will consider settings where the sign of H (N ) is fixed by the requirement that H (N ) be bounded from below (which essentially means that the system is assumed to be stable at zero temperature).

Large Deviation Results
We start with the simpler setting of positive temperature:

The previous theorem in particular applies to the following "finite order" Hamiltonians of mean field type. Given symmetric functions $W_m$ on $X^m$ for $m \le M$, set
$$H^{(N)}(x_1, \dots, x_N) := \sum_{m=1}^{M} \frac{1}{N^{m-1}} \sum_{I} W_m(x_{i_1}, \dots, x_{i_m}), \qquad (1.5)$$
where the inner sum runs over all multi-indices $I = (i_1, \dots, i_m)$ of length $m$ with the property that no two indices of $I$ coincide. Then it is easy to verify H1 and H2 above with
$$E(\mu) := \sum_{m=1}^{M} \int_{X^m} W_m\,\mu^{\otimes m}.$$
The main case of interest is when $M = 2$ and $H^{(N)}$ is a sum of pair interactions $W(x_i, x_j)$, scaled by $1/N$. But since it will require no extra effort in the proofs, we consider the more general "finite order setting." The corresponding result also holds for $\beta = \infty$ if hypothesis H3 holds.
In the Euclidean setting and with M = 2, the previous corollary was established very recently in [18] using somewhat different methods (the results in [18] have also independently been generalized to the setting of the previous theorem and corollary in [19]).
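To illustrate how H1 arises for pair interactions, the following computation is a minimal sketch for the case $M = 2$. It assumes the normalization $H^{(N)} = \frac{1}{N}\sum_{i \neq j} W(x_i, x_j)$ (our reading of "scaled by $1/N$"; the exact normalization is an assumption), the mean energy $E^{(N)}(\mu^{(N)}) = \frac{1}{N}\int H^{(N)}\mu^{(N)}$, and $W \in L^1(\mu \otimes \mu)$.

```latex
% Assumed normalization (not taken verbatim from the text):
%   H^{(N)}(x_1,\dots,x_N) = \frac{1}{N}\sum_{i \neq j} W(x_i,x_j).
\begin{align*}
E^{(N)}(\mu^{\otimes N})
  &= \frac{1}{N}\int_{X^N} H^{(N)}\,\mu^{\otimes N}
   = \frac{1}{N^2}\sum_{i \neq j}\int_{X^N} W(x_i,x_j)\,\mu^{\otimes N} \\
  % each of the N(N-1) terms integrates to the same two-point integral
  &= \frac{N(N-1)}{N^2}\int_{X^2} W\,\mu\otimes\mu
   \;\longrightarrow\; \int_{X^2} W\,\mu\otimes\mu
   \qquad (N\to\infty).
\end{align*}
```

Under these assumptions the limit is the quadratic energy $\int W\,\mu\otimes\mu$; removing the diagonal terms $i = j$ is what keeps every integral finite when $W$ is singular on the diagonal.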
We next turn to the case of negative temperature.

Theorem 1.3
Suppose that hypotheses H1 and H4 hold, and fix a negative number $\beta_0$. Then the following are equivalent: on $\mathcal{P}(X)$ satisfy an LDP with speed $N$ and rate functional

The key observation in the proof of Corollary 1.4 is that the first point in Theorem 1.3 always implies, "for free," a uniform estimate in the Orlicz (Zygmund) space $L^1 \log L^1$, so that some general Orlicz space duality results [26,30] can be exploited in order to verify the hypothesis H4.
It seems natural to ask if the previous corollary can be generalized to the case when one only assumes the integrability condition that $Z_{2,\beta}$ be finite when $\beta > \beta_0$, for some (finite) negative number $\beta_0$. The following theorem gives an affirmative answer if one strengthens the integrability condition a bit:

Theorem 1.5
Let $X$ be a compact metric space, $W$ a lower semi-continuous symmetric measurable function on $X^2$, and $\beta_0$ a negative number such that Then, for any $\beta > \beta_0$, the Gibbs measures $\mu^{(N)}_\beta$ satisfy an LDP as in the previous corollary.
Specialized to the logarithmic case of the vortex model, i.e. to the case $W(x, y) = -\log|x - y|$, the previous theorem recovers the LDP in [12] with a new proof. The proof follows closely that of the corresponding (weaker) concentration result for the vortex model originally established in [15,22]. The new observation is that, with a little twist, the argument in [15,22] can be supplemented to give the LDP in question.

Relations to Gamma-Convergence at Different Levels
The proofs of the LDPs above are based on the Gamma-convergence of the corresponding free energy functionals F (N ) β N when viewed as functionals on the space P(P(X )) (a similar approach is used in the dynamic setting considered in [10] where the assumptions H1 and H2 also appear naturally). Incidentally, as observed in the following corollary, the LDPs then imply the Gamma-convergence of the scaled Hamiltonians H (N ) /N when viewed as functionals on P(X ).

Corollary 1.6 Suppose that the Hamiltonians H (N ) satisfy H1 and H2. Then H (N ) /N Gamma-converges towards E(μ) on P(X ). In particular, this applies to the finite order mean field Hamiltonians.
The previous result generalizes the Gamma-convergence result in [34, Prop 2.8, Remark 2.19] for the mean field Hamiltonians corresponding to pair interactions $W(x, y)$ (on a domain in $\mathbb{R}^D$), where it was assumed that $W(x, y)$ essentially only blows up along the diagonal (similar results also appear implicitly in [1,2,16]). The proofs in [1,2,16,34] are based on some rather intricate combinatorial constructions, involving small cubes in $\mathbb{R}^D$. On the other hand, the latter results yield the stronger result that any measure $\mu$ with a positive continuous density admits a recovery sequence where $B_N$ denotes the $L^\infty$-ball with center $x^{(N)}$ and radius of the order $1/N^{1/D}$. In turn, as shown in [34], the latter stronger form of Gamma-convergence implies the LDP for the corresponding Gibbs measures when $\beta_N \gg \log N$ (which is used to make sure that β −1 as pointed out by the referee, the technical condition that $\beta_N \gg \log N$ can be replaced by $\beta_N \gg 1$ by a slight modification of the proof of Prop 2.5 in [16]. Relations between Gamma-convergence and large deviation principles have also been previously studied in [27] but from a somewhat different perspective (see also [13] for some related results).

Applications to the Coulomb Gas on a Riemannian Manifold
In the companion paper [6], the general large deviation results above are illustrated and further developed for Coulomb and Riesz type gases on a compact $D$-dimensional Riemannian manifold $(X, g)$ and more generally for suitable compact subsets $K \subset X$ (the case when $\mu_0$ is a volume form and $\beta > 0$ has also independently been obtained in [19]).
Here we will only state the corresponding LDP for the Coulomb gas on $(X, g)$ defined as follows. Let $W(x, y)$ be 1/2 times the integral kernel of the inverse of the positive Laplacian $-\Delta$ on the space of all functions in $L^2(X, dV_g)$ with mean zero, where $dV_g$ denotes the volume form determined by the metric $g$. As is classical, $W$ is symmetric and smooth away from the diagonal, and close to the diagonal it admits the following asymptotics when $D > 2$: for a positive constant $C_D$. Moreover, when $D = 2$, In particular, $W$ is lsc and in $L^1(X \times X)$. Given a probability measure $\mu_0$ on $X$, the Coulomb gas on $(X, g, \mu_0)$ at inverse temperature $\beta_N$ is defined by the Gibbs measures is the mean field Hamiltonian corresponding to the pair interaction $W(x, y)$. In this setting, Corollary 1.2 and Theorem 1.5 yield, as shown in [6], the following LDP for the laws of the empirical measures of the Coulomb gas, formulated in terms of the potential theoretic properties of the measure $\mu_0$:

Theorem 1.7
Let $(X, g)$ be a compact Riemannian manifold, and consider the Coulomb gas at inverse temperature $\beta_N$ on $(X, g, \mu_0)$.
• When β = ∞, the LDP holds if μ 0 is nonpolar and μ 0 is determining for its support K .
We briefly recall that a compact subset $K \subset X$ is polar if it is locally contained in the $-\infty$-set of a local subharmonic function (or equivalently, if $K$ has vanishing capacity). Accordingly, a measure $\mu_0$ is said to be nonpolar if it does not charge any polar set. The notion of a determining measure $\mu_0$ appearing in the second point above means that for any continuous function $u$, for any quasi-subharmonic function $\varphi$ on $X$; i.e., $\varphi$ is strongly usc and satisfies $\Delta\varphi \ge -1$. This notion is closely related to the notion of measures satisfying a Bernstein-Markov property in pluripotential theory [8,11] and measures with regular asymptotic behavior in the theory of planar orthogonal polynomials [33]. For example, $\mu_0$ can be taken to be the $D$-dimensional Hausdorff measure on a Lipschitz domain $K \subset (X, g)$ or the $(D-1)$-dimensional Hausdorff measure on a Lipschitz hypersurface in $(X, g)$. The point is that the assumption that $\mu_0$ is nonpolar and determining implies that the hypothesis H3 is satisfied, as shown in [6] (an alternative proof of the LDP in the case when $\beta = \infty$ can also be given using the approach in the complex geometric setting in [3]). Finally, we recall that measures satisfying $d(\mu_0) > 0$, as in the third point above, are sometimes called Frostman measures in the classical literature (for example, the property in question holds with $d(\mu_0) = d$ when $\mu_0$ is the $d$-dimensional Hausdorff measure of a compact subset $K$ of $X$ of Hausdorff dimension $d$).
More generally, an LDP as in the previous theorem is obtained in [6] when $W(x, y)$ is taken as the integral kernel of the inverse of $(-\Delta)^p$ and the (possibly fractional) power $p$ is in $]0, D/2]$ (or even more generally, when $(-\Delta)^p$ is replaced by a suitable pseudodifferential operator of order at most $D$). Then the last point in the previous theorem holds in the critical case $p = D/2$. However, the LDP for $\beta = \infty$ appears to be rather subtle in the general setting and is only shown to hold when $\mu_0$ is a volume form (or comparable to a volume form), except when $p \le 2$, where it applies to measures $\mu_0$ that are determining in a suitable sense.
Let us also point out that in the Euclidean setting of the Coulomb and Riesz gases in R n , with μ 0 given by the Euclidean volume form and β N of the order N , a refined "microscopic" large deviation principle "at the level of processes" is obtained in [25]. Such large deviation principles are beyond the scope of the present paper and seem to require different methods-the point here is rather to allow the measure μ 0 to be very singular (and the inverse temperature to be negative, in some cases).

General Notation
Given a compact topological space $X$, we will denote by $C^0(X)$ the space of all continuous functions $u$ on $X$, equipped with the sup-norm, and by $\mathcal{M}(X)$ the space of all signed (Borel) measures on $X$. The subset of $\mathcal{M}(X)$ consisting of all probability measures will be denoted by $\mathcal{P}(X)$. We endow $\mathcal{M}(X)$ with the weak topology; i.e., $\mu_j \to \mu$ if and only if
$$\int_X u\,\mu_j \to \int_X u\,\mu$$
for any continuous function $u$ on $X$, i.e., for any $u \in C^0(X)$ (in other words, the weak topology is the weak-star topology when $\mathcal{M}(X)$ is identified with the topological dual of $C^0(X)$). Since $X$ is compact, so is $\mathcal{P}(X)$. Given a lower semi-continuous function $F$ on $Y := \mathcal{P}(X)$, we will, abusing notation slightly, also write $F$ for the induced linear lower semi-continuous functional on $\mathcal{P}(Y)$:
$$F(\Gamma) := \int_Y F(\mu)\,\Gamma.$$
Equivalently, under the natural embedding $\mu \mapsto \delta_\mu$ of $Y$ into $\mathcal{P}(Y)$, the function $F(\Gamma)$ is the unique lower semi-continuous affine extension of $F$ to $\mathcal{P}(Y)$.
We will denote by S N the permutation group acting on X N and by P(X N ) S N the space of symmetric measures μ N (i.e., S N -invariant) on X N . Also note that, following standard practice, we will denote by C a generic constant whose value may change from line to line.

Entropy
We will write $D(\nu_1, \nu_2)$ for the relative entropy (also called the Kullback-Leibler divergence in information theory) of two measures $\nu_1$ and $\nu_2$ on a topological space $Z$: if $\nu_1$ is absolutely continuous with respect to $\nu_2$, i.e., $\nu_1 = f\nu_2$, one defines
$$D(\nu_1, \nu_2) := \int_Z \log f\,\nu_1,$$
and otherwise one declares that $D(\nu_1, \nu_2) := \infty$. Note the sign convention used: $D$ is minus the physical entropy. In our setting, the space $Z$ will always be of the form $X^N$, and we will then take the reference measure $\nu_2 = \mu_0^{\otimes N}$ and write $D(\cdot) := D(\cdot, \mu_0^{\otimes N})$. It will also be convenient to define the mean entropy of a probability measure $\mu_N$ on $X^N$ (i.e., $\mu_N \in \mathcal{P}(X^N)$) as
$$D^{(N)}(\mu_N) := \frac{1}{N} D(\mu_N).$$
Then it follows directly that
$$D^{(N)}(\mu^{\otimes N}) = D(\mu). \qquad (2.1)$$
Moreover, denoting by $(\mu_N)_j$ the $j$th marginal of $\mu_N$ (which defines a probability measure on $X^j$), as follows from the concavity of the function $t \mapsto \log t$ on $\mathbb{R}_+$ (see, for example, [22]).
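The product property of the mean entropy can be checked directly. The following is a short worked computation, assuming the mean entropy is normalized as $D^{(N)} := D/N$ and that $\mu = f\mu_0$ has finite entropy (both assumptions are stated in the comments).

```latex
% Assumption: mean entropy D^{(N)}(\mu_N) := N^{-1} D(\mu_N, \mu_0^{\otimes N}).
% Let \mu = f\mu_0 with D(\mu) < \infty, so that \mu^{\otimes N} has density
% f(x_1)\cdots f(x_N) with respect to \mu_0^{\otimes N}.
\begin{align*}
D\big(\mu^{\otimes N}\big)
  &= \int_{X^N} \log\big(f(x_1)\cdots f(x_N)\big)\,\mu^{\otimes N}
   = \sum_{i=1}^{N} \int_{X^N} \log f(x_i)\,\mu^{\otimes N} \\
  % each term only depends on x_i; the other N-1 factors integrate to 1
  &= N \int_X \log f\,\mu \;=\; N\,D(\mu),
\qquad\text{so}\qquad
D^{(N)}\big(\mu^{\otimes N}\big) = D(\mu).
\end{align*}
```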

Large Deviation Principles
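Let us start by recalling the general definition of a large deviation principle (LDP) for a sequence of measures.

Definition 2.1
Let $Y$ be a Polish space and $I$ a lower semi-continuous function on $Y$ (a rate functional; it is called good if its sublevel sets are compact). A sequence $\Gamma_N$ of probability measures on $Y$ satisfies an LDP with speed $r_N$ and rate functional $I$ if
$$\limsup_{N\to\infty} \frac{1}{r_N}\log \Gamma_N(F) \le -\inf_{\mu \in F} I(\mu)$$
for any closed subset $F$ of $Y$ and
$$\liminf_{N\to\infty} \frac{1}{r_N}\log \Gamma_N(G) \ge -\inf_{\mu \in G} I(\mu)$$
for any open subset $G$ of $Y$.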

Remark 2.2
The LDP is said to be weak if the upper bound is only assumed to hold when F is compact. Anyway, we will only consider the case when Y is compact, and hence the notion of a weak LDP and an LDP then coincide (and moreover, any rate functional is automatically good).

Lemma 2.3 (Bryc)
Let $Y$ be a compact Polish space. Suppose that there exists a function $f$ on $C^0(Y)$ such that for any $\Phi \in C^0(Y)$, Then $\Gamma_N$ satisfies an LDP with speed $r_N$ and rate functional (by Varadhan's lemma the converse also holds).
We also have the following simple lemma:

Proof By definition, Hence, if the measures $\tilde{\Gamma}_N$ satisfy the asymptotics in Bryc's lemma with rate functional $\tilde{I}(\mu)$, then the measures $\Gamma_N$ satisfy the asymptotics in Bryc's lemma with rate functional $\tilde{I}(\mu) - C_\beta$, where $C_\beta$ is the limit of $-\frac{1}{N}\log Z_{N,\beta}$ as $N \to \infty$. Since $\mu^{(N)}_\beta$ is a probability measure, the LDP for $\Gamma_N$ implies that the infimum of $\tilde{I}(\mu) - C_\beta$ vanishes; i.e., $C_\beta$ is the infimum of $\tilde{I}(\mu)$. The converse is proved in a similar way.

Gamma-Convergence
We recall that a sequence of functions $f_j$ on a topological space $X$ is said to Gamma-converge to a function $f$ on $X$ if
$$x_j \to x \ \text{in } X \implies \liminf_{j\to\infty} f_j(x_j) \ge f(x), \qquad \forall x \in X\ \ \exists\, x_j \to x \ \text{in } X:\ \lim_{j\to\infty} f_j(x_j) = f(x)$$
(such a sequence $x_j$ is called a recovery sequence); see [14]. More generally, given a subset $S \subset X$, we will say that $f_j$ Gamma-converges to $f$ relative to $S$ if the existence of a recovery sequence in $X$ is only demanded when $x \in S$.

, for a suitable increasing function $i \mapsto n_i$, yields a sequence $y_i$ in $X$ converging to $s$ such that $f(s_i) \ge f_{n_i}(y_i) - 1/i$. Setting $x_{n_i} := y_i$ (and $x_j := s$ when $j$ is not of the form $j = n_i$ for any $i$) and using the first implication in the definition of the relative Gamma-convergence of the sequence $f_j$, we thus deduce that $\liminf_{i\to\infty} f(s_i) \ge f(s)$, as desired.

Lemma 2.6 Let $X$ be a compact topological space and assume that $f_j$ Gamma-converges to $f$ relative to a set $S$ containing all minima of $f$. Then
$$\lim_{j\to\infty} \inf_X f_j = \inf_X f.$$
Proof Given s ∈ S we take a recovery sequence x n and observe that for some y n , y ∈ X , by the compactness and the assumption of Gamma-convergence. In particular, when s realizes the minimum of f so does y, and hence equalities must hold above, which concludes the proof.
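The chain of inequalities behind this kind of argument can be written out as follows. This is a sketch of the standard reasoning rather than a transcription of the original display; it assumes that $X$ is sequentially compact, that $f$ attains its minimum at some $s \in S$, and that $\inf_X f_j > -\infty$ for large $j$.

```latex
% Upper bound: let s \in S be a minimizer of f and x_j \to s a recovery sequence.
\[
\limsup_{j\to\infty}\,\inf_X f_j
  \;\le\; \lim_{j\to\infty} f_j(x_j) \;=\; f(s) \;=\; \inf_X f .
\]
% Lower bound: pick y_j with f_j(y_j) \le \inf_X f_j + 1/j and, after passing to a
% subsequence realizing the liminf, assume y_j \to y \in X (compactness).
% The liminf inequality of Gamma-convergence holds at every point of X.
\[
\liminf_{j\to\infty}\,\inf_X f_j
  \;=\; \liminf_{j\to\infty} f_j(y_j) \;\ge\; f(y) \;\ge\; \inf_X f .
\]
% Combining the two displays:
\[
\lim_{j\to\infty}\,\inf_X f_j \;=\; \inf_X f ,
\qquad\text{and in particular } f(y) = \inf_X f .
\]
```

Note that only the recovery sequence at the minimizer $s$ is used, which is why Gamma-convergence relative to a set containing the minima suffices.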

Legendre-Fenchel Transforms
Let f be a function on a topological vector space V. Then its Legendre-Fenchel transform is defined as the following convex lower semi-continuous function f * on the topological dual V * : in terms of the canonical pairing between V and V * . In the present setting, we will take V = C 0 (X ) and V * = M(X ), the space of all signed Borel measures on X. Then f * * = f for any lower semi-continuous convex function (by standard duality in locally convex topological vector spaces [17]).

Proof of Theorem 1.1
Set $E_N(x_1, \dots, x_N) := H^{(N)}(x_1, \dots, x_N)/N$ so that the mean energy (1.3) can be written as
$$E^{(N)}(\mu^{(N)}) = \int_{X^N} E_N\,\mu^{(N)}.$$
We denote by $F^{(N)}_{\beta_N}$ the corresponding mean free energy functional on $\mathcal{P}(X^N)^{S_N}$, at inverse temperature $\beta_N$: Now set $Y := \mathcal{P}(X)$. By embedding $\mathcal{P}(X^N/S_N)$ into $\mathcal{P}(Y)$, using the push-forward map $(\delta_N)_*$, we can identify the mean free energies $F^{(N)}_{\beta_N}$ with functionals on $\mathcal{P}(Y)$, extended by $\infty$ to all of $\mathcal{P}(Y)$. We will identify $Y$ with its image in $\mathcal{P}(Y)$ under the embedding $\mu \mapsto \delta_\mu$.
The starting point of the proof of the LDP is the following reformulation of Bryc's lemma in terms of the Legendre-Fenchel transform, using the Gibbs variational principle: i.e., Then the LDP holds with speed Nβ N and rate functional where f * is the Legendre-Fenchel transform of f.
Proof The Gibbs variational principle says that if μ (N ) β N is a well-defined probability measure, then  In order to verify the criterion in the previous lemma, we will use the following lemma:

If moreover H3 holds, then the corresponding result also holds when β = ∞.
Proof First assume that $\beta < \infty$. The lower bound follows directly from hypotheses H1 and H2 together with the fact that the mean entropy functionals satisfy the lower bound in the Gamma-convergence (by subadditivity [31]; see also Theorem 5.5 in [21] for generalizations). To prove the existence of recovery sequences, we fix an element of the form $\delta_\mu$ and take the recovery sequence to be of the form $(\delta_N)_*\mu^{\otimes N}$. Then the required convergence follows from H1 together with the product property (2.1). Finally, when $\beta = \infty$, the previous argument for the existence of a recovery sequence still applies as long as $\mu$ satisfies $D(\mu) < \infty$. The general case then follows by a simple diagonal approximation argument using H3.

Now, since the limiting functional $F_\beta(\Gamma)$ is affine and lower semi-continuous (by Lemma 2.5) and the set $Y$ is extremal in $\mathcal{P}(Y)$, the infimum of $F$ on $\mathcal{P}(Y)$ is attained in $Y$ (for example, by Choquet's theorem). Fixing a continuous function $\Phi$ in $C^0(Y)$ and replacing $H^{(N)}$ with the new Hamiltonian $H^{(N)} + N\delta_N^*(\Phi)$, Lemma 2.6 thus shows that the criterion in Lemma 2.7 is satisfied. Hence the LDP holds with lower semi-continuous rate functional $I(\mu) = f^*(\delta_\mu)$. Finally, extending $I$ to $\mathcal{P}(Y)$ by linearity, this means that $I(\Gamma)$ is the Legendre-Fenchel transform of $f$, i.e., $I = f^*$. But in our case $f$ is itself defined as $f := F^*$, and hence, $I = F^{**} = F$ since $F$ is convex (and even affine) and lower semi-continuous.
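The Gibbs variational principle invoked in the argument above can be derived in a few lines. The following is a sketch under labelled assumptions: $\beta > 0$, $Z_{N,\beta} < \infty$, and the mean free energy normalized as $F^{(N)}_\beta := E^{(N)} + \beta^{-1} D^{(N)}$ (this normalization is our assumption, chosen to match the speed $N\beta_N$).

```latex
% Assumptions: \beta > 0; \mu^{(N)} has finite entropy and H^{(N)} \in L^1(\mu^{(N)});
% E^{(N)}(\mu) = N^{-1}\int H^{(N)}\mu,  D^{(N)}(\mu) = N^{-1} D(\mu,\mu_0^{\otimes N}),
% F^{(N)}_\beta := E^{(N)} + \beta^{-1} D^{(N)}  (assumed normalization).
\begin{align*}
0 \;\le\; D\big(\mu^{(N)}, \mu^{(N)}_\beta\big)
  &= \int_{X^N} \log\frac{\mu^{(N)}}{\mu_0^{\otimes N}}\,\mu^{(N)}
     + \beta \int_{X^N} H^{(N)}\,\mu^{(N)} + \log Z_{N,\beta} \\
  &= N\beta\, F^{(N)}_\beta\big(\mu^{(N)}\big) + \log Z_{N,\beta},
\end{align*}
% with equality if and only if \mu^{(N)} = \mu^{(N)}_\beta. Dividing by N\beta > 0:
\[
-\frac{1}{N\beta}\log Z_{N,\beta}
  \;=\; \inf_{\mu^{(N)}} F^{(N)}_\beta\big(\mu^{(N)}\big)
  \;=\; F^{(N)}_\beta\big(\mu^{(N)}_\beta\big).
\]
```

For $\beta < 0$ the division by $N\beta$ reverses the inequality, so the infimum becomes a supremum; this sign change is one source of the differences between positive and negative temperature.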

Remark 2.10
An inspection of the proof of Theorem 1.1 above reveals that, in the case β = ∞, the hypothesis H3 may be replaced by the following weaker one: • (H3)' The functional F β Gamma-converges towards E, as β → ∞.

Proof of Corollary 1.2
First note that H1 is trivially satisfied. To verify the second hypothesis H2, we may as well, by linearity, assume that $M = m$ and that there is just one term with $W := W_m(x_1, \dots, x_m)$. Since $W$ is lower semi-continuous, there exists a sequence of continuous functions $W_R$ increasing to $W$ as $R \to \infty$, and we denote by $E_{W_R}$ the corresponding functionals on $\mathcal{P}(X)$. It follows readily from the definitions that for any fixed $R > 0$, N (x 1 , . . . , x N and, in particular, But since $E_{W_R}$ is continuous, for any $R > 0$. Finally, letting $R \to \infty$ and using the monotone convergence theorem of integration theory concludes the proof.

Proof of Theorem 1.3
First observe that if the LDP in the second point of the theorem holds, then integrating over all of P(X ) reveals that the first point holds. To prove the converse, we fix β > β 0 and note that the Gibbs variational principle applied at the inverse temperature β − gives which we rewrite as Thus, by the Gibbs variational principle, for any fixed ν ∈ P(Y ). In particular, taking ν = μ 0 and using the hypothesis H1 (which implies that F But then the previous inequalities force Hence, by the hypothesis H4, for any limit point of the laws of δ N . As a consequence, we deduce precisely as before that the desired asymptotics for the β N F (N ) β N hold. Finally, repeating the argument with H (N ) replaced by the new Hamiltonian H (N ) + N δ * N ( ) (which satisfies the same hypothesis) concludes the proof, just as before.

Remark 2.11
If one only wants to prove that the laws of $\delta_N$ concentrate on the minima of $F_\beta$ (rather than proving an LDP), it is enough to show that the convergence of the free energies holds for $\Phi = 0$ (as in the original approach in [28]). As revealed by the previous proof, this only requires that the hypothesis H4 holds for the particular sequence $\mu^{(N)}_\beta$.

Proof of Corollary 1.4
First observe that the assumption on $Z_{N,\beta}$ in the case $N = 2$ gives the following exponential integrability property of $W$: for any $\beta$. Hence, by Theorem 1.3, it will be enough to show that the previous integrability property implies that H4 is satisfied (for any sequence $\mu_N$). To this end, we will apply a duality argument. First recall that given a measure space $(X, \mu)$ and a finite Young function $\theta$ on $\mathbb{R}$ (i.e., a non-negative even lower semi-continuous convex function such that $\theta(0) = 0$), the corresponding large Orlicz space is defined by and the corresponding small Orlicz space is defined by (where all functions $f$ are assumed measurable). The space $L_\theta(X, \mu)$ may be equipped with a norm $\|\cdot\|_\theta$, called the Luxemburg norm, which turns $L_\theta(X, \mu)$ and its subspace $M_\theta(X, \mu)$ into Banach spaces: i.e., the gauge of the set (unit-ball) f : By the Hölder-Young inequality, where $\theta^*$ is the Young function defined as the Legendre-Fenchel transform of $\theta$. In particular, for any $g \in L_{\theta^*}$, $f \mapsto \int f g\,\mu$ defines a continuous linear functional on $L_\theta$ with bounded operator norm; i.e., $L_{\theta^*} \subset L_\theta^*$, where $L^*$ denotes the Banach space dual of a Banach space $L$, endowed with the operator norm. To apply this in the present context, we note that where $\rho_N$ is the density of the second marginal of $\mu_N$. The assumption that $D^{(N)}(\mu^{(N)}) \le C$ implies that according to (2.2). Now set $\theta(s) := e^s - s - 1$, when $s \ge 0$. Then $\theta^*(t) = (t + 1)\log(1 + t) - t$ when $t \ge 0$. By the previous entropy inequality for $\rho_N$, the sequence $\{\rho_N\}$ stays in a fixed ball in $L_{\theta^*}$, and hence, by the Hölder-Young inequality, $\{\rho_N\}$ stays in a fixed ball in the dual Banach space $L_\theta^*$. By weak compactness, it then follows that there exists an element of $L_\theta^*$ such that for any $g \in L_\theta$, (after perhaps passing to a subsequence). Now, since $\rho_N \mu_0^{\otimes 2}$ is a probability measure, we may also assume that there exists $\rho \in L_{\theta^*}$ such that for any continuous function $u$. In our case $g = W$, and we just need to check that this limit element and $\rho\mu_0^{\otimes 2}$ give the same value when paired with $g$.
But, by assumption, $W \in M_\theta$, and by the general duality theorems in [26,30] the topological dual of $M_\theta$ identifies with $L_{\theta^*}$, i.e. any continuous functional on $M_\theta$ is obtained by integrating against a (unique) $\rho \in L_{\theta^*}$, which concludes the proof.
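For the reader's convenience, the formula for the conjugate Young function used in this proof can be verified by a direct computation. The sketch below takes $\theta(s) = e^{s} - s - 1$ for $s \ge 0$ (extended evenly) and computes $\theta^*(t) = \sup_s\,(st - \theta(s))$ for $t \ge 0$; the auxiliary name $g$ is introduced here only for the computation.

```latex
% For fixed t \ge 0, maximize the concave function
%   g(s) := st - \theta(s) = st - e^{s} + s + 1   over s \ge 0
% (since \theta is even and t \ge 0, the supremum over all s is attained for s \ge 0).
\begin{align*}
g'(s) &= t - e^{s} + 1 = 0 \iff s = \log(1+t) \;(\ge 0), \\
\theta^*(t) &= g\big(\log(1+t)\big)
  = t\log(1+t) - (1+t) + \log(1+t) + 1 \\
 &= (1+t)\log(1+t) - t .
\end{align*}
```

Since $\theta^*(t)$ grows like $t\log t$, a uniform entropy bound on the densities is exactly a uniform bound in the Orlicz space associated with $\theta^*$, which is what the duality argument exploits.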

Proof of Theorem 1.5
Given a compact metric space $X$, we endow $Y$ ($:= \mathcal{P}(X)$) with the Wasserstein $L^1$-metric $d$, which is compatible with the weak topology:
$$d(\mu, \nu) := \sup_{f} \int_X f\,(\mu - \nu),$$
where $f$ is Lipschitz continuous on $X$ with Lipschitz constant $L(f) \le 1$. Since $\int_X (\mu - \nu) = 0$, we may as well assume that $f(x_0) = 0$ for a fixed point $x_0$ and hence that $|f(x)| \le C_X$, where $C_X$ is independent of $f$ (since $X$ is compact and, in particular, has bounded diameter).
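The normalization $f(x_0) = 0$ and the resulting uniform bound can be spelled out as follows. In this sketch, $d_X$ denotes the metric on $X$ and $\operatorname{diam}(X)$ its diameter; this notation is introduced here for illustration.

```latex
% Notation assumed here: d_X is the metric on X, diam(X) its (finite) diameter.
\begin{align*}
\int_X f\,(\mu-\nu)
  &= \int_X \big(f - f(x_0)\big)\,(\mu-\nu)
  && \text{since } \int_X (\mu-\nu) = 0, \\
|f(x) - f(x_0)|
  &\le L(f)\, d_X(x,x_0) \;\le\; \operatorname{diam}(X) =: C_X
  && \text{since } L(f) \le 1 .
\end{align*}
```

So replacing $f$ by $f - f(x_0)$ changes neither the integral nor the Lipschitz constant, and one may take $C_X = \operatorname{diam}(X)$.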
Let us first show that, when β > β 0 , To see this, rewrite −β over all j such that j = i. The arithmetric-geometric means inequality gives Now, for a given i, we have (by first integrating over the N − 1 variables different from x i and then taking the sup over x i ) where, by assumption, A β < ∞ (since A β 0 < ∞ and W is lsc on the compact space X 2 and thus bounded from below). Next we fix a continuous function on Y := P(X ). Without loss of generality, we may as well assume that W, ≥ 0.
First observe that when $\beta > \beta_0$, a bound of the same exponential type holds for $Z_{N,\beta}[\Phi]$, as follows directly from the bound (2.7) (using that $\Phi$ is bounded).
Using the convergence of the mean energies and the Gibbs variational principle, as before, we thus obtain an upper bound on $-\frac{1}{N}\log Z_{N,\beta}[\Phi]$. Now, to prove the LDP we need, in view of Lemma 2.7, to complement this upper bound (formula (2.9)) with a corresponding lower bound. To this end it would, by Theorem 1.3, be enough to establish the hypothesis H4 in the present context (for example by trying to extend the Orlicz space duality argument in the proof of Corollary 1.4). Such an approach remains to be developed (however, a macroscopic version of H4 does hold; see formula (2.12)). Here we will instead take another road, exploiting the stronger $L^p$-bounds provided by the following lemma (inspired by [15,22]), which concerns the behavior of the marginal densities as $N \to \infty$. In particular, for any $p > 1$, the density $\rho^{(N)}_j$ of the $j$-th marginal is uniformly bounded in $L^p$.

Proof To fix ideas, we start with the case $\Phi = 0$, following closely the proof of Theorem 3.1 in [15]. Applying Hölder's inequality with $p = N/j$ (and thus $q = 1 + j/(N - j)$), the integral to be estimated is bounded from above by a product of two factors. By assumption, the integral appearing in the first factor is bounded from above by $A^N$ for some positive constant $A$. It will thus be enough to show that the second integral is controlled by $Z_N$ in the sense that it is bounded from above by a uniform constant times $Z_N$. To this end, we will apply Hölder's inequality again, now with conjugate exponents $u$ and $w$ with $u$ sufficiently close to 1 (to be quantified below). We thus rewrite the integrand as a product and apply Hölder's inequality. The first resulting factor is controlled by $Z_N$ (since $W \ge 0$). Moreover, taking $w$ to be a suitable multiple of $(N - j)/j$, determined by a sufficiently small positive number $\epsilon$, ensures that the integral in the second factor is controlled by $Z_{N,(1+\epsilon)\beta} \le B^N$ (by (2.8)). Since $w$ is of the order $N$, this concludes the proof when $\Phi = 0$.
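
The conjugate exponent appearing in the first application of Hölder's inequality in the proof above follows from the usual relation between conjugate exponents; a short check:

```latex
\[
% Conjugate exponents satisfy 1/p + 1/q = 1; insert p = N/j and solve for q.
\frac{1}{p} + \frac{1}{q} = 1, \quad p = \frac{N}{j}
\;\Longrightarrow\;
\frac{1}{q} = \frac{N - j}{N},
\qquad
q = \frac{N}{N - j} = 1 + \frac{j}{N - j}.
\]
```

For fixed $j$, the excess $q - 1 = j/(N - j)$ is thus of order $1/N$, while $p$ grows linearly in $N$.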
To treat the general case, we will use the following claim, which compares the values of $\Phi$ on empirical measures that differ in $j$ points. Accepting the claim for the moment, we use Hölder's inequality, first with $p$ and $q$ and then with $u$ and $w$, exactly as above, to get the same factors as above apart from the last factor in formula (2.10), which is now replaced by a factor that is bounded from above by $C^N$, according to the estimate (2.8) (when $\gamma$ is sufficiently small). This proves the lemma once we have verified the claim above. To this end, we assume, to simplify the notation, that $j = 1$ (the general case is similar) and compare the two measures $\mu$ and $\nu$ in question. For any $f$ such that $|f| \le C_X$, the integral of $f$ against $\mu - \nu$ is then at most $2C_X/N$, and hence $d(\mu,\nu) \le 2C_X/N$. But then the claim follows directly from the Lipschitz continuity of $\Phi$.

Now, to verify the missing lower bound on $-\frac{1}{N}\log Z_{N,\beta}[\Phi]$, we first claim that it will be enough to verify the case when $\Phi$ is Lipschitz continuous. Indeed, any continuous function $\Phi$ on a compact metric space $Y$ can be written as a uniform limit of Lipschitz continuous functions $\Phi^{(R)}$ (for example, $\Phi^{(R)}(x) := \inf_{y \in Y}(\Phi(y) + R\,d(x,y))$ increases to $\Phi$, as $R \to \infty$, and has Lipschitz constant at most $R$). Moreover we may, after relabeling the sequence, assume that $|\Phi - \Phi_\epsilon| \le \epsilon$. But the map $\Phi \mapsto -\frac{1}{N}\log Z_{N,\beta}[\Phi]$ is Lipschitz continuous with respect to the sup-norm, uniformly in $N$, which proves the claim.

Next, we recall that, by the Gibbs variational principle, $-\frac{1}{N}\log Z_{N,\beta}[\Phi]$ is expressed in terms of the free energy $F^{(N)}_\beta(\mu^{(N)}_\beta)$, where $\mu^{(N)}_\beta$ denotes the sequence of Gibbs measures, at inverse temperature $\beta$, corresponding to the Hamiltonian $H^{(N)} + N\delta_N^*(\Phi)$, for $\Phi$ Lipschitz continuous, and decompose $F^{(N)}_\beta(\mu^{(N)}_\beta)$ into a sum of terms. By continuity the second term converges towards the integral of $\Phi$ against the limiting measure. To prove the desired lower bound on $F^{(N)}_\beta(\mu^{(N)}_\beta)$, it will thus, just as in the proof of Corollary 1.4, be enough to show that
$$\lim_{N \to \infty} \int_{X^2} W\,\rho^{(N)}_2\,\mu_0^{\otimes 2} = \int_{X^2} W\,\rho_2\,\mu_0^{\otimes 2}$$
for any weak limit point $\rho_2\,\mu_0^{\otimes 2}$ of $\rho^{(N)}_2\,\mu_0^{\otimes 2}$. To this end, we recall that, by the previous lemma, $\rho^{(N)}_2$ is uniformly bounded in $L^p$, as $N \to \infty$, for any fixed $p > 1$.
Hence, by standard $L^p$-duality, the limit (2.11) follows from the fact that $W \in L^q$ for some (any) $q > 1$, since by assumption, $e^{\epsilon W} \in L^1$ for any sufficiently small positive number $\epsilon$. This concludes the proof of Theorem 1.5.
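
The passage from exponential integrability to $L^q$-integrability used in the last step is elementary. A sketch, assuming (as in the normalization made earlier in the proof) that $W \ge 0$, writing $\epsilon$ for the small positive number in question and taking the integrals over $X^2$ with respect to $\mu_0^{\otimes 2}$:

```latex
\begin{align*}
% The function s -> s^q e^{-s} on [0, infinity) attains its maximum at s = q:
\sup_{s \ge 0} s^{q} e^{-s} &= q^{q} e^{-q}, \\
% Apply this with s = epsilon * W:
W^{q} = \epsilon^{-q} (\epsilon W)^{q} &\le \Bigl(\frac{q}{e\,\epsilon}\Bigr)^{q} e^{\epsilon W}, \\
\int_{X^2} W^{q}\,\mu_0^{\otimes 2} &\le \Bigl(\frac{q}{e\,\epsilon}\Bigr)^{q}
   \int_{X^2} e^{\epsilon W}\,\mu_0^{\otimes 2} < \infty .
\end{align*}
```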

An Alternative Direct Proof of the Lower Semi-continuity of $\beta F_\beta$
A consequence of the LDP established above is that the corresponding (scaled) free energy functional $\beta F_\beta$ is lsc on $\mathcal{P}(X)$. As we show next, this could also be shown directly by establishing a macroscopic version of the hypothesis H4 (using Orlicz space duality). This indicates that there could be a more direct proof of the LDP that avoids Lemma 2.12, as discussed above.

Lemma 2.13
Under the assumptions of Theorem 1.5, the following holds: for any $\beta > \beta_0$ the (scaled) free energy functional $\beta F_\beta$ on $\mathcal{P}(X)$ is lower semi-continuous.
Proof First observe that the inequality (2.9) (applied to $\Phi = 0$) gives a bound which, when applied to $\beta - \epsilon > \beta_0$, yields a further estimate. Using this estimate it will, to prove the lemma, be enough to verify the following "macroscopic" version of H4 ("MacroH4") for any sequence $\mu_j$ converging weakly towards $\mu$ with $D(\mu_j) \le C$: the mean energies of $\mu_j$ converge towards the mean energy of $\mu$.

First observe that it will be enough to prove that
$$\|u_j - u\|_\theta \to 0 \quad \text{in } L_\theta(X,\mu_0), \qquad (2.13)$$
where $u_j := u_{\mu_j}$ and $u := u_\mu$ denote the potentials $u_\mu(x) := \int_X W(x,y)\,\mu(y)$ (and where $\theta$ is of exponential type, as in the proof of Corollary 1.4). Indeed, the difference of the mean energies can be expressed in terms of $u_j - u$, using that $W(x,y)$ is symmetric in the last step. Hence, the Hölder-Young inequality bounds this difference by a product of two factors. Since the second factor is uniformly bounded (since $D(\mu_j) \le C$), this shows that it is indeed enough to prove (2.13).

To prove the latter convergence, observe that the Hölder-Young inequality implies that $|u_j| \le C$, $|u| \le C$ (using the integrability assumption on $W$). Indeed, if $\mu = \rho\mu_0$, then $u_\mu(x)$ is bounded by a sum of two terms, where the first term is uniformly bounded in $x$, by assumption, and the second term is controlled by $D(\mu)$. Next, note that the convergence (2.13) that we intend to prove is equivalent to proving $\int_X \theta(a(u_j - u))\,\mu_0 \to 0$ for any fixed positive number $a$. Now, since $\theta(t) \le |t|e^{|t|}$ and the sup norms of $u_j$ and $u$ are bounded from above by $C$, we have
$$\int_X \theta(a(u_j - u))\,\mu_0 \le a\,e^{2aC} \int_X |u_j - u|\,\mu_0 .$$
All that remains is thus to show that
$$\int_X |u_j - u|\,\mu_0 \to 0. \qquad (2.14)$$
By the lower semi-continuity of $W$, the desired convergence (2.14) will follow from general measure theory if $\int_X u_j\,\mu_0 \to \int_X u\,\mu_0$ (using that $u_\mu \ge 0$ if $W$ is normalized so that $W \ge 0$). But $\int_X u_j\,\mu_0 = \int_X v\,\mu_j$, where $v := u_{\mu_0}$ is bounded, by the previous argument (since $\mu_0$ trivially has finite entropy).
In particular, $v$ is in the little Orlicz space $M_\theta(X,\mu_0)$, and since $D(\mu_j) \le C$, the desired convergence (2.14) then follows from the duality argument towards the end of the proof of Corollary 1.4.
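
The elementary inequality $\theta(t) \le |t|e^{|t|}$ invoked in the proof can be verified from the power series of the exponential. A sketch, assuming that $\theta$ is the even extension $\theta(t) = e^{|t|} - |t| - 1$ of the Young function from the proof of Corollary 1.4:

```latex
\begin{align*}
\theta(t) &= e^{|t|} - |t| - 1 = \sum_{k \ge 2} \frac{|t|^{k}}{k!}
   = |t| \sum_{m \ge 1} \frac{|t|^{m}}{(m+1)!} \\
% Each term only increases when (m+1)! is replaced by m!:
   &\le |t| \sum_{m \ge 1} \frac{|t|^{m}}{m!}
   = |t|\,\bigl(e^{|t|} - 1\bigr) \le |t|\,e^{|t|} .
\end{align*}
```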

Relations to Gamma-Convergence of $E^{(N)}$ on $\mathcal{P}(X)$: Proof of Corollary 1.6
First observe that the required lower bound on $E^{(N)}(x^{(N)})$ is obtained by taking $\mu_N \in \mathcal{P}(X^N)^{S_N}$ to be the normalized $S_N$-orbit in $X^N$ of the Dirac measure supported at a given $x^{(N)} \in X^N$:
$$\mu_N := \frac{1}{N!} \sum_{\sigma \in S_N} \delta_{\sigma \cdot x^{(N)}} .$$
Then $E^{(N)}(\mu_N) = E^{(N)}(x^{(N)})$. Moreover, if $\delta_N(x^{(N)})$ converges towards $\mu$ in $\mathcal{P}(X)$, then $(\delta_N)_*\mu_N$ converges towards $\Gamma := \delta_\mu$, and hence the desired lower bound on $E^{(N)}(x^{(N)})$ follows from the assumed lower bound on $E^{(N)}(\mu_N)$.

Next, to construct recovery sequences, we fix some $\beta > 0$, say $\beta = 1$, and a probability measure $\mu_0$. By Theorem 1.1 an LDP holds with rate functional $E(\mu) + D_{\mu_0}(\mu)$. In particular, the lower bound in the LDP gives, for any given $\mu$ and $\mu_0$ in $\mathcal{P}(X)$, an estimate from which one can extract a sequence $x^{(N)}$ with $\delta_N(x^{(N)}) \to \mu$ whose energies are controlled in terms of $E(\mu) + D_{\mu_0}(\mu)$. In particular, taking $\mu_0 = \mu$ (which gives $D_{\mu_0}(\mu) = 0$), we deduce, by a diagonal argument, the existence of a recovery sequence for any given $\mu \in \mathcal{P}(X)$.
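
The properties of the orbit measure used in the first step of the proof can be checked directly. The following is a sketch, assuming that the mean energy of a measure on $X^N$ is $E^{(N)}(\mu_N) := \frac{1}{N}\int_{X^N} H^{(N)}\mu_N$ and that $H^{(N)}$ is symmetric; the explicit formula for $\mu_N$ spells out "normalized $S_N$-orbit of the Dirac measure".

```latex
\begin{align*}
\mu_N &:= \frac{1}{N!} \sum_{\sigma \in S_N} \delta_{(x_{\sigma(1)}, \dots, x_{\sigma(N)})}, \\
% By symmetry of H^{(N)}, every term in the sum over permutations is the same:
E^{(N)}(\mu_N) &= \frac{1}{N}\,\frac{1}{N!} \sum_{\sigma \in S_N}
     H^{(N)}(x_{\sigma(1)}, \dots, x_{\sigma(N)})
   = \frac{1}{N} H^{(N)}(x^{(N)}), \\
% The empirical measure is invariant under permutations of the points:
(\delta_N)_* \mu_N &= \delta_{\delta_N(x^{(N)})}, \qquad
\delta_N(x^{(N)}) := \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i} .
\end{align*}
```

Hence $(\delta_N)_*\mu_N$ converges towards $\delta_\mu$ as soon as $\delta_N(x^{(N)})$ converges towards $\mu$.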

A Weaker form of the Hypothesis H2
Let us come back to the setting of Theorem 1.1 and observe that the hypothesis H2 may be replaced by the following one, which is a priori weaker (see the beginning of Sect. 2.8):
• (H2') For any sequence of $x^{(N)} \in X^N$ such that $\delta_N(x^{(N)}) \to \mu$ weakly in $\mathcal{P}(X)$, we have $\liminf_{N \to \infty} \frac{1}{N} H^{(N)}(x^{(N)}) \ge E(\mu)$.
In other words, (H2') says that $E^{(N)} := H^{(N)}/N$, when viewed as a functional on $\mathcal{P}(X)$, satisfies the lower bound property that is one of the two requirements for the Gamma-convergence of $E^{(N)}$ towards $E(\mu)$, where $E(\mu)$ denotes, as before, the macroscopic mean energy whose existence is postulated in hypothesis H1.
(3.2) where $\mu_{\delta,N}$ is the center of the ball with the largest integral (using that $(N\beta_N)^{-1}\log M_\delta \to 0$). Denote by $\mu_\delta$ a weak limit point in $\mathcal{P}(X)$ of the family $\mu_{\delta,N}$ as $N \to \infty$. Then $B_\delta(\mu_{\delta,N}) \subset B_{2\delta}(\mu_\delta)$ for $N$ sufficiently large (along the subsequence of $\{N\}$). Next, denote by $\mu$ a weak limit point of $\mu_\delta$ as $\delta \to 0$, i.e., $\mu$ is the limit of $\mu_{\delta_j}$ as $\delta_j \to 0$, and fix $\epsilon > 0$. For any sufficiently small $\delta_j$, we have $B_{2\delta_j}(\mu_{\delta_j}) \subset B_\epsilon(\mu)$. Hence, for any such $\delta_j$, we have the corresponding bound (3.4), where, by hypothesis H2', the right-hand side is bounded from above by $-E(\mu)$ up to an arbitrarily small error, which concludes the proof of the bound (3.1) (strictly speaking we have proved the bound for some subsequence of the $\{N\}$, but this is enough since we could have started by replacing the sequence $\{N\}$ with a subsequence $N_j$ with the property that the limsup in formula (3.1) is a lim along the subsequence $N_j$).
A result essentially equivalent to the previous theorem appears in [13]. Combining Theorem 3.1 and Corollary 1.6 thus reveals that the hypotheses H1 and H2' actually imply the Gamma-convergence of $\frac{1}{N}H^{(N)}$ towards $E$ on $\mathcal{P}(X)$. But it seems unlikely that, in general, the assumption that $\frac{1}{N}H^{(N)}$ Gamma-converges towards a functional $E$ on $\mathcal{P}(X)$ is sufficient to deduce an LDP (even if one also assumes H3). On the other hand, as shown in [5], one does get an LDP for any $\beta \in\, ]0,\infty]$ under an assumption of quasi-superharmonicity:

Theorem 3.2 [5] Let $H^{(N)}$ be a sequence of lower semi-continuous symmetric functions on $X^N$, where $X$ is a compact Riemannian manifold. Assume that:
• The sequence $\frac{1}{N}H^{(N)}$ on $X^N$ (identified with a sequence of functions on $\mathcal{P}(X)$) Gamma-converges towards a functional $E$ on $\mathcal{P}(X)$.

This is not hard to see when $\beta = \infty$, but for $\beta < \infty$, the proof hinges on a submean inequality for quasi-subharmonic functions with a distortion factor that is subexponential in the dimension, proved in [5].
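
For reference, the notion of Gamma-convergence used in this section consists of the following two conditions. This is the standard definition, written here for a sequence of functionals $E^{(N)}$ on $\mathcal{P}(X)$ with limit $E$; the way $\frac{1}{N}H^{(N)}$ is extended to all of $\mathcal{P}(X)$ (for instance by the value $+\infty$ outside the image of $\delta_N$) is an assumption of this sketch.

```latex
\begin{align*}
&\text{(lower bound)} &&
  \mu_N \to \mu \text{ in } \mathcal{P}(X)
  \;\Longrightarrow\; \liminf_{N \to \infty} E^{(N)}(\mu_N) \ge E(\mu), \\
&\text{(recovery sequence)} &&
  \forall\, \mu \in \mathcal{P}(X) \ \ \exists\, \mu_N \to \mu : \ \
  \limsup_{N \to \infty} E^{(N)}(\mu_N) \le E(\mu) .
\end{align*}
```

Hypothesis (H2') is the first condition for sequences of empirical measures, while the second condition is provided by the recovery sequences constructed in the proof of Corollary 1.6.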