On large deviations for Gibbs measures, mean energy and Gamma-convergence

We consider the random point processes on a measure space X defined by the Gibbs measures associated to a given sequence of N-particle Hamiltonians H^{(N)}. Inspired by the method of Messer-Spohn for proving concentration properties for the laws of the corresponding empirical measures, we propose a number of hypotheses on H^{(N)} which are quite general, but still strong enough to extend the approach of Messer-Spohn. The hypotheses are formulated in terms of the asymptotics of the corresponding mean energy functionals. We show that in many situations the approach even yields a Large Deviation Principle (LDP) for the corresponding laws. Connections to Gamma-convergence of (free) energy type functionals at different levels are also explored. The focus is on differences between positive and negative temperature situations, motivated by applications to complex geometry. The results yield, in particular, large deviation principles at positive as well as negative temperatures for quite general classes of singular mean field models with pair interactions, generalizing the 2D vortex model and Coulomb gases. In a companion paper the results are illustrated in the setting of Coulomb and Riesz type gases on a Riemannian manifold X, comparing with the complex geometric setting.


Introduction
Let $X$ be a compact topological space endowed with a probability measure $\mu_0$. Given a sequence of symmetric functions $H^{(N)}$ on the $N$-fold products $X^N$, assumed measurable with respect to $\mu_0^{\otimes N}$, the corresponding Gibbs measures at inverse temperature $\beta_N \in \mathbb{R}$ are defined as the following sequence of symmetric probability measures on $X^N$:
$$\mu^{(N)}_{\beta_N} := \frac{e^{-\beta_N H^{(N)}}\mu_0^{\otimes N}}{Z_{N,\beta_N}}, \qquad Z_{N,\beta_N} := \int_{X^N} e^{-\beta_N H^{(N)}}\mu_0^{\otimes N},$$
assuming that the partition function $Z_{N,\beta_N}$ is finite. We also assume that the following limit exists:
$$\beta := \lim_{N\to\infty}\beta_N \in \,]-\infty,\infty].$$
The ensemble $(X^N, \mu^{(N)}_{\beta_N})$ (called the canonical ensemble) defines a random point process with $N$ particles on $X$ which, from the point of view of statistical mechanics, models $N$ identical particles on $X$ interacting through the Hamiltonian (interaction energy) $H^{(N)}$ in thermal equilibrium at inverse temperature $\beta_N$. The corresponding empirical measure is the random measure
$$\delta_N : X^N \to \mathcal{P}(X), \qquad (x_1,\dots,x_N) \mapsto \delta_N(x_1,\dots,x_N) := \frac{1}{N}\sum_{i=1}^N \delta_{x_i}, \tag{1.1}$$
taking values in the space $\mathcal{P}(X)$ of all probability measures on $X$. A recurrent theme in statistical mechanics is to study the large $N$-limit (i.e. the "macroscopic limit") of the canonical ensemble through the large $N$-limit of the laws of $\delta_N$, i.e. through the sequence of probability measures
$$\Gamma_N := (\delta_N)_*\mu^{(N)}_{\beta_N}$$
on $\mathcal{P}(X)$. In many situations the laws $\Gamma_N$ can be shown to concentrate, as $N \to \infty$, on the subset of $\mathcal{P}(X)$ consisting of the minimizers of a free energy type functional $F_\beta$ on $\mathcal{P}(X)$; we will then say that "the sequence $\Gamma_N$ has the concentration property". For example, if $F_\beta$ has a unique minimizer $\mu_\beta$, then the random measures $\delta_N$ converge in law to the deterministic measure $\mu_\beta$.
A stronger, exponential, notion of concentration, with an explicit speed and rate functional, is offered by the theory of large deviations, by demanding that the laws $\Gamma_N$ satisfy a Large Deviation Principle (LDP) with speed $r_N$ and rate functional $F$, symbolically expressed as
$$\Gamma_N \sim e^{-r_N F}, \qquad N \to \infty.$$
The present paper is inspired by the method introduced by Messer-Spohn [28] to establish the concentration property of the laws of the empirical measures $\delta_N$, using the Gibbs variational principle combined with properties of the mean energy of the system. In the original approach in [28], $H^{(N)}$ was assumed to be the mean field Hamiltonian corresponding to a continuous pair interaction potential $W(x,y)$:
$$H^{(N)}(x_1,\dots,x_N) := \frac{1}{N}\sum_{i,j=1}^N W(x_i,x_j), \tag{1.4}$$
with $\beta_N = \beta \in \,]0,\infty[$ (this is a mean field interaction in the sense that each particle $x_i$ is exposed to the average of the pair interactions $W(x_i,x_j)$ over all $N$ particles, including the self-interaction). But the approach has also been extended to some situations where $W(x,y)$ is allowed to be singular [16,22,24], as in Onsager's vortex model for 2D turbulence [29], where $W(x,y) = -\log|x-y|$.
The corresponding mean field Hamiltonian is then "renormalized" by removing the self-interaction terms, in order to make sure that $H^{(N)}$ is generically finite on $X^N$. The aim of the present note is to:
• propose a number of quite general hypotheses on $H^{(N)}$, formulated in terms of the corresponding mean energy functional $E^{(N)}$ on $\mathcal{P}(X^N)$, which are strong enough to extend the approach of Messer-Spohn;
• show that, in several situations, the approach also yields, almost "for free", the stronger exponential concentration property in the sense of an LDP;
• explore some relations to the notion of Gamma-convergence of functionals: first by reformulating the approach of Messer-Spohn in terms of Gamma-convergence of the induced free energy functionals $F^{(N)}_{\beta_N}$ on $\mathcal{P}(\mathcal{P}(X))$, and then by deducing a Gamma-convergence result for the sequence $H^{(N)}/N$ on $X^N$.
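To make the objects above concrete, here is a minimal Python sketch (purely illustrative, not the method of the paper) of a random-scan Metropolis sampler for the Gibbs measure $\mu^{(N)}_\beta$ of the mean field Hamiltonian (1.4). It assumes $X = [0,1]$ with $\mu_0$ the Lebesgue measure; the kernel `W`, the proposal and all function names are hypothetical choices. The flag `renormalized` drops the self-interaction terms $i = j$, as one must do for singular kernels such as $-\log|x-y|$.

```python
import numpy as np


def mean_field_hamiltonian(x, W, renormalized=False):
    """H^{(N)}(x) = (1/N) * sum_{i,j} W(x_i, x_j); the terms i == j are dropped if renormalized."""
    n = len(x)
    with np.errstate(divide="ignore"):  # singular kernels may be infinite on the diagonal
        pair = W(x[:, None], x[None, :])  # N x N matrix of W(x_i, x_j)
    if renormalized:
        np.fill_diagonal(pair, 0.0)  # remove the self-interaction terms
    return pair.sum() / n


def metropolis_gibbs(W, n, beta, steps, seed=0, renormalized=False):
    """Approximate sample from mu^{(N)}_beta, proportional to exp(-beta H^{(N)}) mu_0^{⊗N} on [0,1]^N.

    Each step resamples one coordinate from mu_0 (uniform on [0,1]); since the proposal
    is the reference measure itself, the acceptance probability is min(1, exp(-beta * dH)).
    """
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    h = mean_field_hamiltonian(x, W, renormalized)
    for _ in range(steps):
        y = x.copy()
        y[rng.integers(n)] = rng.random()
        h_new = mean_field_hamiltonian(y, W, renormalized)
        if rng.random() < np.exp(min(0.0, -beta * (h_new - h))):
            x, h = y, h_new
    return x


# Usage: a continuous (illustrative) pair interaction; delta_N summarized on 10 bins.
W_quad = lambda a, b: (a - b) ** 2
sample = metropolis_gibbs(W_quad, n=50, beta=5.0, steps=2000)
empirical_weights = np.histogram(sample, bins=10, range=(0.0, 1.0))[0] / len(sample)
```

Recomputing $H^{(N)}$ from scratch costs $O(N^2)$ per step; this keeps the sketch short at the expense of speed.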
The main motivation comes from the probabilistic approach to the construction of Kähler-Einstein metrics on a complex algebraic manifold $X$, introduced in [5,4]. In that situation the corresponding Hamiltonians $H^{(N)}$ are highly non-linear and singular (and not of the simpler mean field type appearing in formula (1.4), which is used in the statistical mechanical approach to conformal geometry introduced in [23]). Still, as shown in [5], building on [7], the sequence $H^{(N)}/N$ Gamma-converges towards a certain energy type functional $E(\mu)$ on $\mathcal{P}(X)$. Exploiting superharmonicity properties of $H^{(N)}/N$, the corresponding LDP is then established at any positive inverse temperature $\beta$ in [5], producing Kähler-Einstein metrics with negative Ricci curvature in the large $N$-limit. The approach in [5] bypasses the problem of the existence of the macroscopic mean energy (hypothesis H1 below), which is open in the complex geometric setting. On the other hand, as discussed in [5], extending the LDP in [5] to negative $\beta$, which is needed to produce Kähler-Einstein metrics with positive Ricci curvature, requires the existence of the macroscopic mean energy. This is the motivation behind Theorem 1.3 below, which shows that, conversely, hypothesis H1 together with the additional hypothesis H4 implies an LDP for appropriate negative $\beta$. The hypothesis H4 is inspired by the energy-entropy compactness results in [10], which can be viewed as the macroscopic analog of H4 in the complex geometric setting. Incidentally, in the lowest-dimensional case, when $X$ is a Riemann surface, the probabilistic setting in [5,4] is essentially equivalent to a mean field model with a logarithmic pair interaction, which is thus similar to Onsager's vortex model for 2D turbulence [29]. In the latter situation the corresponding concentration properties were established in [16,22] for any $\beta$ above the critical negative inverse temperature (and the LDP was established using a different method in [13]).
This is in line with Onsager's prediction of the existence of macroscopic negative temperature states.
Another motivation for the present note comes from random matrix theory (or, more generally, Coulomb gases), which can be viewed as a vortex type model with $\beta_N \sim N$ (and in particular $\beta = \infty$). The corresponding concentration property was established in [24], using the method of Messer-Spohn as in [16,22]. Here we observe that, with a simple modification, the concentration property can be upgraded to an LDP (Corollary 1.2). In particular, this allows one to dispense with the technical assumption $\beta_N \gg \log N$, which is needed in the approach in [1,17,2,34], where moreover some regularity properties of the corresponding pair interaction $W(x,y)$ away from the diagonal are required (see Section 1.3). Yet another motivation comes from approximation and sampling theory and, in particular, the problem of finding nearly minimal configurations for a given energy type interaction on a Riemannian manifold, in the spirit of [20,32].
Let us also point out that the restriction that $X$ be compact can be removed if suitable growth assumptions on $H^{(N)}$ at infinity are made, as in the settings in $\mathbb{R}^n$ considered in [1,17,2,34,24,19] (using appropriate tightness estimates). But in order to (hopefully) convey the conceptual simplicity of the arguments we stick with a compact $X$.
1.1. Hypotheses. We may as well assume that $X$ coincides with the support of $\mu_0$. In the following $\mu^{(N)}$ will denote a symmetric probability measure on $X^N$ and $Y := \mathcal{P}(X)$. We recall that the mean (microscopic) energy of $\mu^{(N)}$, in the usual sense of statistical mechanics, is defined by
$$E^{(N)}(\mu^{(N)}) := \int_{X^N} H^{(N)}\,\mu^{(N)}.$$
We introduce the following hypotheses:
• (H1) ("existence of a macroscopic mean energy $E(\mu)$"): There exists a functional $E(\mu)$ on $\mathcal{P}(X)$ such that for any $\mu$ in $\mathcal{P}(X)$
$$\lim_{N\to\infty} N^{-1}E^{(N)}(\mu^{\otimes N}) = E(\mu).$$
Moreover, $E(\mu_0) < \infty$.
• (H2) ("lower bound on the mean energy"): For any sequence $\mu^{(N)}$ such that $\Gamma_N := (\delta_N)_*\mu^{(N)} \to \Gamma$ weakly in $\mathcal{P}(Y)$,
$$\liminf_{N\to\infty} N^{-1}E^{(N)}(\mu^{(N)}) \geq \int_Y E(\mu)\,\Gamma.$$
• (H3) ("approximation property"): For any $\mu$ such that $E(\mu) < \infty$ there exists a sequence $\mu_j$ converging weakly to $\mu$ such that $\mu_j$ is absolutely continuous with respect to $\mu_0$ and $E(\mu_j) \to E(\mu)$.
• (H4) ("mean energy/entropy compactness"): If
$$D^{(N)}(\mu^{(N)}) \leq C,$$
where $D^{(N)}(\mu^{(N)})$ is the mean entropy, then the following convergence holds, after perhaps replacing $\mu^{(N)}$ by a subsequence such that $\Gamma_N := (\delta_N)_*\mu^{(N)} \to \Gamma$ weakly in $\mathcal{P}(Y)$:
$$\lim_{N\to\infty} N^{-1}E^{(N)}(\mu^{(N)}) = \int_Y E(\mu)\,\Gamma.$$
The first hypothesis will be assumed throughout the paper. The second and third ones will appear naturally at positive and vanishing temperature, respectively, while the fourth one turns out to be useful in some cases of negative temperature. However, it may very well be that hypothesis H4 needs to be weakened a bit in order to increase its scope. For example, in the proof of the large $N$ concentration properties one only needs to assume that H4 holds when $\mu^{(N)}$ is the Gibbs measure corresponding to $H^{(N)}$ (see Remark 2.11).
Of course, the sign of the temperature may be switched by replacing $H^{(N)}$ with $-H^{(N)}$, but the point is that, in practice, we will consider settings where the sign of $H^{(N)}$ is fixed by the requirement that $H^{(N)}$ be bounded from below (which essentially means that the system is assumed to be stable at zero temperature).

Theorem 1.1. Suppose that hypotheses H1 and H2 hold and that $\beta_N \to \beta \in \,]0,\infty[$. Then the measures $(\delta_N)_*\big(e^{-\beta_N H^{(N)}}\mu_0^{\otimes N}\big)$ on $\mathcal{P}(X)$ satisfy, as $N \to \infty$, a large deviation principle (LDP) with speed $\beta_N N$ and rate functional
$$F_\beta(\mu) := E(\mu) + \frac{1}{\beta}D_{\mu_0}(\mu).$$
Equivalently, the LDP holds for the laws $\Gamma_N$ of the corresponding Gibbs measures with $F_\beta$ replaced by $F_\beta - \inf F_\beta$. Under the additional hypothesis H3 the result also holds when $\beta = \infty$.

The previous theorem applies, in particular, to the following "finite order" Hamiltonians of mean field type. Given symmetric functions $W_m$ on $X^m$, for $m \leq M$, set
$$H^{(N)}(x_1,\dots,x_N) := \sum_{m=1}^M \frac{1}{N^{m-1}}\sum_{I} W_m(x_{i_1},\dots,x_{i_m}),$$
where the inner sum runs over all multi-indices $I = (i_1,\dots,i_m)$ of length $m$ with the property that no two indices of $I$ coincide. Then it is easy to verify H1 and H2 above with
$$E(\mu) := \sum_{m=1}^M \int_{X^m} W_m\,\mu^{\otimes m}.$$
The main case of interest is when $M = 2$ and $H^{(N)}$ is a sum of pair interactions $W(x_i,x_j)$, scaled by $1/N$. But since it requires no extra effort in the proofs, we consider the more general "finite order" setting. The corresponding result also holds for $\beta = \infty$ if the regularization hypothesis H3 holds.
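As a sanity check of H1 in the simplest finite order case $M = 2$, the following Python sketch computes $N^{-1}E^{(N)}(\mu^{\otimes N})$ by brute-force enumeration when $X$ is a finite set (an assumption made only for this illustration), $\mu$ is a probability vector and $W$ a symmetric matrix. With $H^{(N)} = N^{-1}\sum_{i\neq j}W(x_i,x_j)$ the exact value is $\frac{N-1}{N}\int W\,\mu^{\otimes 2}$, which tends to $E(\mu)$; the function names are hypothetical.

```python
import itertools

import numpy as np


def hamiltonian_pair(config, W):
    """H^{(N)} = (1/N) * sum over ordered pairs i != j of W[x_i, x_j]."""
    n = len(config)
    return sum(W[config[i], config[j]] for i in range(n) for j in range(n) if i != j) / n


def mean_energy_per_particle(mu, W, n):
    """N^{-1} E^{(N)}(mu^{⊗N}), summing over all |X|^N configurations."""
    total = 0.0
    for config in itertools.product(range(len(mu)), repeat=n):
        weight = np.prod([mu[k] for k in config])  # mass of this configuration under mu^{⊗N}
        total += weight * hamiltonian_pair(config, W)
    return total / n


def macroscopic_energy(mu, W):
    """E(mu) = sum_{a,b} W[a, b] mu[a] mu[b]."""
    return float(mu @ W @ mu)


mu = np.array([0.2, 0.5, 0.3])
W = np.array([[1.0, 0.4, 2.0], [0.4, 0.0, 1.5], [2.0, 1.5, 3.0]])
for n in (2, 3, 4, 5):
    # the two numbers agree: both equal (N - 1)/N * E(mu)
    print(n, mean_energy_per_particle(mu, W, n), (n - 1) / n * macroscopic_energy(mu, W))
```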
In the Euclidean setting and with M = 2 the previous corollary was established very recently in [19] using different methods (based on control theory).
We next turn to the case of negative temperature.
Theorem 1.3. Suppose that hypotheses H1 and H4 hold and fix a negative number $\beta_0$. Then the following are equivalent:
• for any $\beta > \beta_0$ there exists a constant $C_\beta$ such that $Z_{N,\beta} \leq C_\beta^N$;
• for any $\beta > \beta_0$ the measures $(\delta_N)_*\big(e^{-\beta H^{(N)}}\mu_0^{\otimes N}\big)$ on $\mathcal{P}(X)$ satisfy an LDP with speed $N$ and rate functional $\beta F_\beta(\mu) = \beta E(\mu) + D_{\mu_0}(\mu)$.

Corollary 1.4. Let $W(x,y)$ be a symmetric measurable function on $(X^2,\mu_0^{\otimes 2})$ such that for any $\beta \in \mathbb{R}$
$$\int_{X^2} e^{-\beta W}\,\mu_0^{\otimes 2} < \infty. \tag{1.5}$$
Then, for any $\beta \in \mathbb{R}$, the Gibbs measures $\mu^{(N)}_\beta$ of the mean field Hamiltonians corresponding to $W$ satisfy an LDP with speed $N$ and rate functional $\beta F_\beta$, with $E(\mu) := \int_{X^2} W\,\mu^{\otimes 2}$.

For example, the previous corollary applies when $(X,\mu_0)$ is a domain in $\mathbb{R}^D$ endowed with the Lebesgue measure and $W$ is any symmetric function with gradient in $L^D_{loc}(\mathbb{R}^D)$ (then the exponential integrability condition (1.5) follows from Trudinger's inequality). The key observation in the proof of Corollary 1.4 is that the first point in Theorem 1.3 always implies, "for free", a uniform estimate in the Orlicz (Zygmund) space $L\log L$, so that some general Orlicz space duality results [30,26] can be exploited in order to verify hypothesis H4. It is natural to ask whether the previous corollary can be generalized to the case when the integrability condition (1.5) is only assumed to hold for $\beta > \beta_0$, for some (finite) negative number $\beta_0$. The following theorem gives an affirmative answer if one strengthens the integrability condition a bit:
Theorem 1.5. Let $X$ be a compact metric space, $W$ a lower semi-continuous symmetric measurable function on $X^2$ and $\beta_0$ a negative number such that the following strengthened version of the integrability condition holds: [...]. Then, for any $\beta > \beta_0$, the Gibbs measures $\mu^{(N)}_\beta$ satisfy an LDP as in the previous corollary.
Specialized to the logarithmic case of the vortex model, i.e. to the case $W(x,y) = -\log|x-y|$, the previous theorem recovers the LDP in [13] with a new proof. The proof follows closely the corresponding (weaker) concentration result for the vortex model originally established in [16,22]. The new observation is that, with a little twist, the argument in [16,22] can be supplemented to give the LDP in question.
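For the logarithmic interaction $W(x,y) = -\log|x-y|$ the integrand in the integrability condition (1.5) is $e^{-\beta W} = |x-y|^{\beta}$. As a one-dimensional toy illustration (an assumption of this sketch; the vortex model itself is two-dimensional), for $X = [0,1]$ with Lebesgue measure the substitution $t = |x-y|$ gives $\int_0^1\int_0^1|x-y|^\beta\,dx\,dy = \int_0^1 2(1-t)t^\beta\,dt = 2/((\beta+1)(\beta+2))$ for $\beta > -1$, while the integral diverges for $\beta \le -1$. So (1.5) fails for sufficiently negative $\beta$, which is the situation addressed by Theorem 1.5. The sketch compares the closed form with a midpoint rule; the function names are hypothetical.

```python
import numpy as np


def log_kernel_partition_exact(beta):
    """Closed form of int_0^1 int_0^1 |x - y|^beta dx dy; infinite for beta <= -1."""
    if beta <= -1:
        return np.inf
    return 2.0 / ((beta + 1.0) * (beta + 2.0))


def log_kernel_partition_midpoint(beta, m=200_000):
    """Midpoint rule for int_0^1 2 (1 - t) t^beta dt, the same integral after t = |x - y|."""
    t = (np.arange(m) + 0.5) / m
    return float(np.sum(2.0 * (1.0 - t) * t**beta) / m)
```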

1.3. Relations to Gamma-convergence at different levels. The proofs of the LDPs above are based on the Gamma-convergence of the corresponding free energy functionals $F^{(N)}_{\beta_N}$, viewed as functionals on the space $\mathcal{P}(\mathcal{P}(X))$ (a similar approach is used in the dynamic setting considered in [11], where the assumptions H1 and H2 also appear naturally). Incidentally, as observed in the following corollary, the LDPs then imply the Gamma-convergence of the scaled Hamiltonians $H^{(N)}/N$, viewed as functionals on $\mathcal{P}(X)$.
Corollary 1.6. Suppose that the Hamiltonians $H^{(N)}$ satisfy H1, H2 and H3 for some measure $\mu_0$. Then $H^{(N)}/N$ Gamma-converges towards $E(\mu)$ on $\mathcal{P}(X)$. In particular, this applies to the finite order mean field Hamiltonians (assuming the approximation property H3).
The previous result generalizes the Gamma-convergence result in [34, Prop 2.8, Remark 2.19] for the mean field Hamiltonians corresponding to pair interactions $W(x,y)$ (on Euclidean domains), where it was assumed that $W(x,y)$ essentially only blows up along the diagonal (similar results also appear implicitly in [1,17,2]). The proofs in [34,1,17,2] are based on some rather intricate combinatorial constructions involving small cubes. On the other hand, the latter results yield the stronger statement that any measure $\mu$ with a positive continuous density admits a recovery sequence $x^{(N)}$ such that [...], where $B_{\epsilon_N}$ denotes the $L^\infty$-ball centered at $x^{(N)}$ with radius $\epsilon_N$ of the order $1/N^{1/D}$, for $D = \dim X$. In turn, as shown in [34], the latter stronger form of Gamma-convergence implies the LDP for the corresponding Gibbs measures when $\beta_N \gg \log N$. Relations between Gamma-convergence and large deviation principles have also been studied previously in [27], but from a rather different perspective (see also [14] for some related results, which in particular allow one to dispense with the assumption H3 in Corollary 1.6).

1.4. Applications to the Coulomb gas on a Riemannian manifold. In the companion paper [6] the general large deviation results above are illustrated and further developed for Coulomb and Riesz type gases on a compact $D$-dimensional Riemannian manifold $(X,g)$ (and, more generally, for suitable compact subsets $K \subset X$). Here we will only state the corresponding LDP for the Coulomb gas on $(X,g)$, defined as follows. Let $W(x,y)$ be $1/2$ times the integral kernel of the inverse of the positive Laplacian $-\Delta$ on the space of all functions in $L^2(X, dV_g)$ with mean zero. As is classical, $W$ is symmetric and smooth away from the diagonal, and close to the diagonal it admits the following asymptotics when $D > 2$:
$$W(x,y) = \frac{\Gamma(D/2-1)}{8\pi^{D/2}}\,d_g(x,y)^{2-D}\,\big(1+o(1)\big),$$
where the leading constant is expressed in terms of the classical $\Gamma$-function. Moreover, when $D = 2$,
$$W(x,y) = -\frac{1}{4\pi}\log d_g(x,y) + O(1).$$
In particular, $W$ is lower semi-continuous on $X \times X$ and belongs to $L^1(X \times X)$. Given a probability measure $\mu_0$ on $X$, the Coulomb gas on $(X,g,\mu_0)$ at inverse temperature $\beta_N$ is defined by the Gibbs measures corresponding to $(\mu_0, H^{(N)}, \beta_N)$, where $H^{(N)}$ is the mean field Hamiltonian corresponding to the pair interaction $W(x,y)$. In this setting Corollary 1.2 and Theorem 1.5 yield, as shown in [6], the following LDP for the laws of the empirical measures of the Coulomb gas, formulated in terms of the potential theoretic properties of the measure $\mu_0$: Let $(X,g)$ be a compact Riemannian manifold and consider the Coulomb gas at inverse temperature $\beta_N$ on $(X,g,\mu_0)$.
• When $\beta \in \,]0,\infty[$ the LDP holds if the measure $\mu_0$ is non-polar.
• When $\beta = \infty$ the LDP holds if $\mu_0$ is non-polar and determining for its support $K$.
• When $D = 2$ and $\beta < \infty$ the LDP holds when $\beta > -4\pi d(\mu_0)$, where $d(\mu_0) \in [0,\infty[$ is the supremum over all $t > 0$ such that there exists a positive constant $C$ (depending on $t$) such that
$$\mu_0(B_R(x)) \leq C R^t$$
as $R \to 0$, for any Riemannian ball $B_R(x)$ of radius $R$ centered at a given point $x$ in $X$.
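The leading constant in the near-diagonal asymptotics of $W$ for $D > 2$ can be cross-checked: since $W$ is half the kernel of $(-\Delta)^{-1}$, its leading term should be half of the Euclidean Newtonian kernel $|x-y|^{2-D}/((D-2)|S^{D-1}|)$, where $|S^{D-1}| = 2\pi^{D/2}/\Gamma(D/2)$ is the area of the unit sphere. A minimal Python sketch (function names hypothetical) using only `math.gamma`:

```python
import math


def sphere_area(D):
    """Area |S^{D-1}| of the unit sphere in R^D."""
    return 2.0 * math.pi ** (D / 2) / math.gamma(D / 2)


def coulomb_constant_gamma(D):
    """Leading constant Gamma(D/2 - 1) / (8 pi^{D/2}) of W = (1/2) * kernel of (-Delta)^{-1}, D > 2."""
    return math.gamma(D / 2 - 1) / (8.0 * math.pi ** (D / 2))


def coulomb_constant_newton(D):
    """Half of the Newtonian constant 1 / ((D - 2) |S^{D-1}|)."""
    return 0.5 / ((D - 2) * sphere_area(D))
```

The two expressions agree because $\Gamma(D/2) = (D/2-1)\,\Gamma(D/2-1)$; for $D = 3$ both give $1/(8\pi)$.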
We briefly recall that a compact subset $K \subset X$ is polar if it is locally contained in the $-\infty$-set of a local subharmonic function (or, equivalently, if $K$ has vanishing capacity). Accordingly, a measure $\mu_0$ is said to be non-polar if it does not charge any polar set. The notion of a determining measure $\mu_0$ appearing in the second point above means that, for any given $u \in C^0(X)$, if $\varphi \leq u$ holds $\mu_0$-almost everywhere then $\varphi \leq u$ holds on the support $K$ of $\mu_0$, for any quasi-subharmonic function $\varphi$ on $X$, i.e. $\varphi$ is strongly upper semi-continuous and satisfies $\Delta\varphi \geq -1$. This notion is closely related to the notion of measures satisfying a Bernstein-Markov property in pluripotential theory [12,8] and to measures with regular asymptotic behaviour in the theory of planar orthogonal polynomials [33]. For example, $\mu_0$ can be taken to be the $D$-dimensional Hausdorff measure on a Lipschitz domain $K \subset (X,g)$ or the $(D-1)$-dimensional Hausdorff measure on a Lipschitz hypersurface in $(X,g)$. The point is that the assumption that $\mu_0$ is non-polar and determining implies that hypothesis H3 is satisfied, as shown in [6] (an alternative proof of the LDP in the case $\beta = \infty$ can also be given using the approach in the complex geometric setting in [3], based on [7, ?]). Finally, we recall that measures satisfying $d(\mu_0) > 0$, as in the third point above, are sometimes called Frostman measures in the classical literature (for example, the property in ...).
More generally, an LDP as in the previous theorem is obtained in [6] when $W(x,y)$ is taken as the integral kernel of the inverse of $(-\Delta)^p$ and the (possibly fractional) power $p$ is in $]0, D/2]$ (or, even more generally, when $(-\Delta)^p$ is replaced by a suitable pseudodifferential operator of order at most $D$). Then the last point in the previous theorem holds in the critical case $p = D/2$. However, the LDP for $\beta = \infty$ appears to be rather subtle in the general setting and is only shown to hold when $\mu_0$ is a volume form (or comparable to a volume form), except when $p \leq 2$, where it applies to measures $\mu_0$ which are determining in a suitable sense.
Let us also point out that in the Euclidean setting of the Coulomb and Riesz gases in $\mathbb{R}^n$, with $\mu_0$ given by the Euclidean volume form and $\beta_N$ of the order $N$, a refined "microscopic" large deviation principle "at the level of processes" is obtained in [25]. Such large deviation principles are beyond the scope of the present paper and seem to require different methods; the point here is rather to allow the measure $\mu_0$ to be very singular (and the inverse temperature to be negative, in some cases).
Acknowledgment. It is a pleasure to thank Sebastien Boucksom, Vincent Guedj, Philippe Eyssidieux and Ahmed Zeriahi for the stimulating collaboration [10]. Thanks also to the editors Doug Hardin, Edward Saff and Sylvia Serfaty for the invitation to contribute to the special issue of the Journal of Constructive Approximation on the theme 'Approximation and statistical physics', which prompted the present paper.

Proofs of the large deviation results
2.1. General notation. Given a compact topological space $X$ we will denote by $C^0(X)$ the space of all continuous functions $u$ on $X$, equipped with the sup-norm, and by $\mathcal{M}(X)$ the space of all signed (Borel) measures on $X$. The subset of $\mathcal{M}(X)$ consisting of all probability measures will be denoted by $\mathcal{P}(X)$. We endow $\mathcal{M}(X)$ with the weak topology, i.e. $\mu_j$ is said to converge to $\mu$ weakly in $\mathcal{M}(X)$ if
$$\langle \mu_j, u\rangle \to \langle \mu, u\rangle := \int_X u\,\mu$$
for any continuous function $u$ on $X$, i.e. for any $u \in C^0(X)$ (in other words, the weak topology is the weak*-topology when $\mathcal{M}(X)$ is identified with the topological dual of $C^0(X)$). Since $X$ is compact, so is $\mathcal{P}(X)$. Given a lower semi-continuous function $F$ on $Y := \mathcal{P}(X)$ we will, abusing notation slightly, also write $F$ for the induced linear lower semi-continuous functional on $\mathcal{P}(Y)$:
$$F(\Gamma) := \int_Y F(\mu)\,\Gamma.$$
Equivalently, under the natural embedding $\mu \mapsto \delta_\mu$ of $Y$ into $\mathcal{P}(Y)$, the functional $F(\Gamma)$ is the unique lower semi-continuous affine extension of $F$ to $\mathcal{P}(Y)$. We will denote by $S_N$ the permutation group acting on $X^N$ and by $\mathcal{P}(X^N)^{S_N}$ the space of symmetric (i.e. $S_N$-invariant) probability measures $\mu_N$ on $X^N$. Also, following standard practice, we will denote by $C$ a generic constant whose value may change from line to line.
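2.1.1. Entropy. We will write $D(\nu_1,\nu_2)$ for the relative entropy (also called the Kullback-Leibler divergence in information theory) of two measures $\nu_1$ and $\nu_2$ on a topological space $Z$: if $\nu_1$ is absolutely continuous with respect to $\nu_2$, i.e. $\nu_1 = f\nu_2$, one defines
$$D(\nu_1,\nu_2) := \int_Z \log\Big(\frac{\nu_1}{\nu_2}\Big)\,\nu_1,$$
and otherwise one declares that $D(\nu_1,\nu_2) := \infty$. Note the sign convention used: $D$ is minus the physical entropy. In our setting the space $Z$ will always be of the form $X^N$, and we will then take the reference measure $\nu_2 = \mu_0^{\otimes N}$ and write $D(\cdot) := D(\cdot,\mu_0^{\otimes N})$. It will also be convenient to define the mean entropy of a probability measure $\mu_N$ on $X^N$ (i.e. $\mu_N \in \mathcal{P}(X^N)$) as
$$D^{(N)}(\mu_N) := \frac{1}{N}D(\mu_N).$$
Then it follows directly that
$$D^{(N)}(\mu^{\otimes N}) = D(\mu). \tag{2.1}$$
Moreover, denoting by $(\mu_N)_j$ the $j$-th marginal of $\mu_N$ (which defines a probability measure on $X^j$), [...] as follows from the concavity of the function $t \mapsto \log t$ on $\mathbb{R}_+$ (see for example [22]).

We recall that a sequence of probability measures $\Gamma_N$ on a topological space $Y$ satisfies a large deviation principle (LDP) with speed $r_N$ and rate functional $I$ if $I$ is lower semi-continuous,
$$\limsup_{N\to\infty}\frac{1}{r_N}\log\Gamma_N(F) \leq -\inf_F I$$
for any closed subset $F$ of $Y$, and
$$\liminf_{N\to\infty}\frac{1}{r_N}\log\Gamma_N(G) \geq -\inf_G I$$
for any open subset $G$ of $Y$.
Remark 2.2. The LDP is said to be weak if the upper bound is only assumed to hold when $F$ is compact. In any case, we will only consider the case when $Y$ is compact, and hence the notions of a weak LDP and an LDP coincide (and, moreover, any rate functional is automatically good).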
Then $\Gamma_N$ satisfies an LDP with speed $r_N$ and rate functional [...] (by Varadhan's lemma the converse also holds).
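A minimal numerical illustration of the Bryc/Varadhan correspondence, in the simplest case $\beta_N = 0$ (independent particles), may help. Take $X = \{0,1\}$ and $\mu_0 = (1-p,p)$ (assumptions of this sketch), so that $\delta_N$ is determined by the fraction $k/N$ of particles at $1$. By Sanov's theorem the laws $\Gamma_N$ satisfy an LDP with speed $N$ and rate functional $D(\cdot,\mu_0)$, and Varadhan's lemma then predicts
$$-\frac{1}{N}\log\int_Y e^{-N\Phi}\,\Gamma_N \to \inf_Y\big(\Phi + D(\cdot,\mu_0)\big).$$
The Python sketch below evaluates the left hand side exactly (a binomial sum computed in log-space) and the right hand side on a grid; the test function `Phi` is an arbitrary choice.

```python
import math

import numpy as np


def lhs(N, p, Phi):
    """-(1/N) log sum_k Binom(N, p)(k) exp(-N Phi(k/N)), computed stably in log-space."""
    k = np.arange(N + 1)
    log_binom = np.array([math.lgamma(N + 1) - math.lgamma(j + 1) - math.lgamma(N - j + 1) for j in k])
    log_terms = log_binom + k * math.log(p) + (N - k) * math.log(1 - p) - N * Phi(k / N)
    m = log_terms.max()
    return -(m + math.log(np.exp(log_terms - m).sum())) / N


def entropy_two_point(x, p):
    """Relative entropy D((1-x, x), (1-p, p)) for x in (0, 1)."""
    return x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))


def rhs(p, Phi, grid=200_000):
    """inf over x in (0,1) of Phi(x) + D(x | p), approximated on a midpoint grid."""
    x = (np.arange(grid) + 0.5) / grid
    return float(np.min(Phi(x) + entropy_two_point(x, p)))
```

The discrepancy between the two sides is of order $1/N$ here, reflecting the polynomial prefactors in the binomial probabilities.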

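2.2.2. Gamma-convergence. We recall that a sequence of functions $f_j$ on a topological space $\mathcal{X}$ is said to Gamma-converge to a function $f$ on $\mathcal{X}$ if
$$x_j \to x \implies \liminf_{j\to\infty} f_j(x_j) \geq f(x),$$
and if for any $x \in \mathcal{X}$ there exists a sequence $x_j \to x$ such that
$$\lim_{j\to\infty} f_j(x_j) = f(x)$$
(such a sequence $x_j$ is called a recovery sequence); see [15]. More generally, given a subset $S \subset \mathcal{X}$, we will say that $f_j$ Gamma-converges to $f$ relative to $S$ if the existence of a recovery sequence in $\mathcal{X}$ is only demanded when $x \in S$.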
Lemma 2.5. Assume that $f_j$ Gamma-converges to $f$ relative to $S \subset \mathcal{X}$. Then $f|_S$ is lower semi-continuous.
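Proof. Consider a sequence $s_i \to s$ in $S$. For each $s_i$ we take a recovery sequence $x^{(j)}_i$ in $\mathcal{X}$ converging to $s_i$. By a diagonal argument we get a sequence $x_i$ in $\mathcal{X}$ converging to $s$ such that $f(s_i) = f_{j_i}(x_i) + o(1)$ for some increasing sequence $j_i$, and hence, by the lower bound in the definition of Gamma-convergence, $\liminf_{i\to\infty} f(s_i) \geq f(s)$, as desired.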
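Lemma 2.6. Let $\mathcal{X}$ be a compact topological space and assume that $f_j$ Gamma-converges to $f$ relative to a set $S$ containing all minima of $f$. Then
$$\lim_{j\to\infty}\inf_{\mathcal{X}} f_j = \inf_{\mathcal{X}} f.$$
Proof. Given $s \in S$ we take a recovery sequence $x_j$ and observe that
$$f(s) = \lim_{j\to\infty} f_j(x_j) \geq \limsup_{j\to\infty}\inf_{\mathcal{X}} f_j \geq \liminf_{j\to\infty}\inf_{\mathcal{X}} f_j \geq f(y)$$
for some $y \in \mathcal{X}$, by compactness and the assumption of Gamma-convergence. In particular, when $s$ realizes the minimum of $f$ so does $y$, and hence equalities must hold above, which concludes the proof.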
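Lemma 2.6 can be seen at work on a toy example (an illustrative choice, not taken from the text). On $\mathcal{X} = [-1,1]$ the functions $f_j(x) = (x-0.3)^2 + \sin^2(jx)$ have no pointwise limit at most points, yet they Gamma-converge to $f(x) = (x-0.3)^2$: the lower bound holds since $\sin^2 \geq 0$, and the zeros $k\pi/j$ of $\sin(jx)$ provide recovery sequences. Accordingly $\inf f_j \to \inf f = 0$, and in fact $\inf f_j \leq (\pi/(2j))^2$. The Python sketch approximates $\inf f_j$ by minimizing over a fine grid.

```python
import numpy as np


def f_j(x, j):
    """Oscillating functions whose Gamma-limit is (x - 0.3)^2."""
    return (x - 0.3) ** 2 + np.sin(j * x) ** 2


def grid_inf(j, num=400_001):
    """Approximate inf over [-1, 1] of f_j by minimizing over a uniform grid."""
    x = np.linspace(-1.0, 1.0, num)
    return float(f_j(x, j).min())
```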

2.2.3. Legendre-Fenchel transforms. Let $f$ be a function on a topological vector space $V$. Then its Legendre-Fenchel transform is defined as the following convex lower semi-continuous function $f^*$ on the topological dual $V^*$:
$$f^*(w) := \sup_{v \in V}\big(\langle v, w\rangle - f(v)\big),$$
in terms of the canonical pairing between $V$ and $V^*$. In the present setting we will take $V = C^0(X)$ and $V^* = \mathcal{M}(X)$, the space of all signed Borel measures on $X$. Then $f^{**} = f$ for any lower semi-continuous convex function $f$ (by standard duality in locally convex topological vector spaces [18]).

Denote by $F^{(N)}_{\beta_N}$ the corresponding mean free energy functional on $\mathcal{P}(X^N)^{S_N}$ at inverse temperature $\beta_N$:
$$F^{(N)}_{\beta_N}(\mu_N) := \frac{1}{N}E^{(N)}(\mu_N) + \frac{1}{\beta_N}D^{(N)}(\mu_N).$$
Now set $Y := \mathcal{P}(X)$. By embedding $\mathcal{P}(X^N/S_N)$ isometrically into $\mathcal{P}(Y)$, using the push-forward map $(\delta_N)_*$, we can identify the mean free energies $F^{(N)}$ with functionals on $\mathcal{P}(Y)$, extended by $\infty$ to all of $\mathcal{P}(Y)$. We will identify $Y$ with its image in $\mathcal{P}(Y)$ under the embedding $\mu \mapsto \delta_\mu$. The starting point of the proof of the LDP is the following reformulation of Bryc's lemma in terms of Legendre-Fenchel transforms, using the Gibbs variational principle: [...]

By the Gibbs variational principle, if $\mu^{(N)}_{\beta_N}$ is a well-defined probability measure, then
$$-\frac{1}{N}\log Z_{N,\beta_N} = \inf_{\mu_N}\beta_N F^{(N)}_{\beta_N}(\mu_N),$$
where the infimum is attained precisely at $\mu_N = \mu^{(N)}_{\beta_N}$.

In order to verify the criterion in the previous lemma we will use the following lemma:
Lemma 2.9. Under the hypotheses H1 and H2 and $\beta \in \,]0,\infty[$ the mean free energies $F^{(N)}_{\beta_N}$ Gamma-converge to the lower semi-continuous linear functional $F_\beta(\Gamma)$ on $\mathcal{P}(Y)$, relative to $Y$, where $F_\beta(\mu)$ is the macroscopic free energy on $Y$:
$$F_\beta(\mu) := E(\mu) + \frac{1}{\beta}D(\mu).$$
If, moreover, H3 holds, then the corresponding result also holds when $\beta = \infty$.
Proof. First assume that $\beta < \infty$. The lower bound follows directly from hypotheses H1 and H2, together with the fact that the mean entropy functionals satisfy the lower bound in the Gamma-convergence (by subadditivity [31]; see also Theorem 5.5 in [21] for generalizations). To prove the existence of recovery sequences we fix an element $\Gamma$ of the form $\delta_\mu$ and take the recovery sequence to be of the form $(\delta_N)_*\mu^{\otimes N}$.
Then the required convergence follows from H1 together with the product property (2.1). Finally, when $\beta = \infty$, the previous argument for the existence of a recovery sequence still applies as long as $\mu$ satisfies $D(\mu) < \infty$. The general case then follows by a simple diagonal approximation argument using H3. Now, since the limiting functional $F(\Gamma)$ is affine and lower semi-continuous (by Lemma 2.5) and $Y$ is the set of extreme points of $\mathcal{P}(Y)$, the infimum of $F$ on $\mathcal{P}(Y)$ is attained in $Y$ (for example, by Choquet's theorem). Fixing a continuous function $\Phi \in C^0(Y)$ and replacing $H^{(N)}$ with the new Hamiltonian $H^{(N)} + N\delta_N^*(\Phi)$, Lemma 2.6 thus shows that the criterion in Lemma 2.7 is satisfied. Hence the LDP holds with lower semi-continuous rate functional $I(\mu) = f^*(\delta_\mu)$. Finally, extending $I$ to $\mathcal{P}(Y)$ by linearity, this means that $I(\Gamma)$ is the Legendre-Fenchel transform of $f$, i.e. $I = f^*$. But in our case $f$ is itself defined as $f := F^*$, and hence $I = F^{**} = F$, since $F$ is convex (and even affine) and lower semi-continuous.
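The Gibbs variational principle used above can be checked directly on a finite state space $Z$ standing in for $X^N$ (an assumption of this sketch), with a reference probability vector $\nu$. For any function $H$ and any $\beta \in \mathbb{R}$ one has $\beta\langle H,\mu\rangle + D(\mu,\nu) = D(\mu,\mu_\beta) - \log Z_\beta$, where $\mu_\beta \propto e^{-\beta H}\nu$; hence $-\log Z_\beta = \min_\mu\big(\beta\langle H,\mu\rangle + D(\mu,\nu)\big)$, with the minimum attained exactly at $\mu_\beta$, for both signs of $\beta$. The Python sketch (helper names hypothetical) verifies this and also the product property (2.1) for $N = 2$.

```python
import numpy as np


def relative_entropy(mu, nu):
    """D(mu, nu) = sum mu log(mu / nu), with 0 log 0 = 0 and +inf if mu is not << nu."""
    mask = mu > 0
    if np.any(nu[mask] == 0):
        return np.inf
    return float(np.sum(mu[mask] * np.log(mu[mask] / nu[mask])))


def gibbs_measure(H, beta, nu):
    """Return mu_beta = exp(-beta H) nu / Z and -log Z, using a stable log-sum-exp (nu > 0)."""
    a = -beta * H + np.log(nu)
    m = a.max()
    weights = np.exp(a - m)
    log_Z = m + np.log(weights.sum())
    return weights / weights.sum(), -log_Z


def free_energy(mu, H, beta, nu):
    """beta <H, mu> + D(mu, nu): the functional minimized in the Gibbs variational principle."""
    return beta * float(H @ mu) + relative_entropy(mu, nu)
```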
Remark 2.10. An inspection of the proof of Theorem 1.1 above reveals that, in the case $\beta = \infty$, the hypothesis H3 may be replaced by the following weaker one:
• (H3)' The functional $F_\beta$ Gamma-converges towards $E$ as $\beta \to \infty$.
2.4. Proof of Corollary 1.2. First note that H1 is trivially satisfied (by the Fubini-Tonelli theorem). To verify the second hypothesis H2 we may as well, by linearity, assume that $M = m$ and that there is just one term, with $W := W_m(x_1,\dots,x_m)$. Since $W$ is lower semi-continuous, there exists a sequence of continuous functions $W_R$ increasing to $W$ as $R \to \infty$, and we denote by $E_{W_R}$ the corresponding functionals on $\mathcal{P}(X)$. It follows readily from the definitions that, for any fixed $R > 0$, [...] and in particular [...] for any $R > 0$. Finally, letting $R \to \infty$ and using the monotone convergence theorem of integration theory concludes the proof.
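The text does not specify how the continuous approximations $W_R$ are built. One standard construction on a compact metric space, chosen here purely for illustration, is the inf-convolution $W_R(x) := \inf_y\big(W(y) + R\,d(x,y)\big)$: each $W_R$ is $R$-Lipschitz, $W_R \leq W$, and $W_R$ increases pointwise to $W$ as $R \to \infty$ when $W$ is lower semi-continuous. The Python sketch computes it on a grid in $[0,1]$ for the lower semi-continuous step function $W = \mathbf{1}_{]1/2,1]}$; the grid and the function name are assumptions.

```python
import numpy as np


def inf_convolution(values, grid, R):
    """W_R(x_i) = min_j (W(x_j) + R |x_i - x_j|): an R-Lipschitz minorant of W on the grid."""
    dist = np.abs(grid[:, None] - grid[None, :])
    return (values[None, :] + R * dist).min(axis=1)


grid = np.arange(101) / 100.0  # exact grid points 0, 0.01, ..., 1
W = (grid > 0.5).astype(float)  # lower semi-continuous: {W > t} is open for every t
approximations = {R: inf_convolution(W, grid, R) for R in (1.0, 2.0, 10.0, 100.0)}
```

At $x = 0.75$ one gets $W_R(x) = \min(R/4, 1)$, which increases to $W(0.75) = 1$.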

2.5. Proof of Theorem 1.3. First observe that if the LDP in the second point of the theorem holds, then integrating over all of $\mathcal{P}(X)$ reveals that the first point holds. To prove the converse we fix $\beta > \beta_0$ and note that the Gibbs variational principle applied at the inverse temperature $\beta - \epsilon$ gives [...] for any fixed $\nu \in \mathcal{P}(Y)$. In particular, taking $\nu = \mu_0$ and using hypothesis H1 gives that
$$D^{(N)}(\mu^{(N)}_\beta) \leq C'.$$
But then the previous inequalities force [...] for any limit point $\Gamma$ of the laws of $\delta_N$. As a consequence we deduce, precisely as before, that the desired asymptotics for $\beta_N F^{(N)}_{\beta_N}$ hold. Finally, repeating the argument with $H^{(N)}$ replaced by the new Hamiltonian $H^{(N)} + N\delta_N^*(\Phi)$ (which satisfies the same hypotheses) concludes the proof, just as before.
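Remark 2.11. If one only wants to prove that the laws of $\delta_N$ concentrate on the minima of $F_\beta$ (rather than proving an LDP), it is enough to show that the convergence of the free energies holds for $\Phi = 0$ (as in the original approach in [28]). As revealed by the previous proof, this only requires that hypothesis H4 holds for the particular sequence $\mu^{(N)}_\beta$ [...]
Performing the latter integral first over the $N-1$ variables different from $x_i$ gives $C^{N-1}$, where $C = \int e^{-\beta W(x,y)}\mu_0^{\otimes 2}$, which is assumed finite, giving the desired bound. Hence, by Theorem 1.3, it will be enough to show that H4 is satisfied (for any sequence $\mu_N$). To this end we will apply a duality argument. First recall that, given a measure space $(\mathcal{X},\mu)$ and a finite Young function $\theta$ on $\mathbb{R}$ (i.e. a non-negative even lower semi-continuous convex function), the corresponding large Orlicz space is defined by
$$L^\theta(\mathcal{X},\mu) := \Big\{f : \int_{\mathcal{X}}\theta(\lambda f)\,\mu < \infty \text{ for some } \lambda > 0\Big\}$$
(where all functions $f$ are assumed measurable), and the corresponding small Orlicz space $M^\theta(\mathcal{X},\mu)$ consists of all $f$ such that $\int_{\mathcal{X}}\theta(\lambda f)\,\mu < \infty$ for all $\lambda > 0$. The space $L^\theta(\mathcal{X},\mu)$ may be equipped with a norm $\|\cdot\|_\theta$, called the Luxemburg norm, which turns $L^\theta(\mathcal{X},\mu)$ and its subspace $M^\theta(\mathcal{X},\mu)$ into Banach spaces:
$$\|f\|_\theta := \inf\Big\{\lambda > 0 : \int_{\mathcal{X}}\theta(f/\lambda)\,\mu \leq 1\Big\},$$
i.e. the gauge of the set (unit ball) $\{f : \int_{\mathcal{X}}\theta(f)\,\mu \leq 1\}$.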

By the Hölder-Young inequality
$$\Big|\int_{\mathcal{X}} fg\,\mu\Big| \leq 2\|f\|_\theta\,\|g\|_{\theta^*},$$
where $\theta^*$ is the Young function defined as the Legendre-Fenchel transform of $\theta$. In particular, for any $g \in L^{\theta^*}$, the map $f \mapsto \int fg\,\mu$ defines a continuous linear functional on $L^\theta$ with bounded operator norm, i.e. $L^{\theta^*} \subset (L^\theta)^*$, where $L^*$ denotes the Banach space dual of a Banach space $L$, endowed with the operator norm. To apply this in the present context we note that [...]
Now set $\theta(s) := e^s - s - 1$. Then $\theta^*(s) = (s+1)\log(1+s) - s$. By the previous entropy inequality for $\rho_N$, the sequence $\{\rho_N\}$ stays in a fixed ball in $L^{\theta^*}$ and hence, by the Hölder-Young inequality, $\{\rho_N\}$ stays in a fixed ball in the dual Banach space $(L^\theta)^*$. By weak* compactness it then follows that there exists $\Lambda \in (L^\theta)^*$ such that, for any $g \in L^\theta$,
$$\int_{X^m}\rho_N g\,\mu_0^{\otimes m} \to \langle\Lambda, g\rangle$$
(after perhaps passing to a subsequence). Now, since $\rho_N\mu_0^{\otimes m}$ is a probability measure, we may also assume that there exists $\rho \in L^{\theta^*}$ such that
$$\int_{X^m}\rho_N u\,\mu_0^{\otimes m} \to \int_{X^m}\rho u\,\mu_0^{\otimes m}$$
for any continuous function $u$. In our case $g = W$, and we just need to check that $\langle\Lambda, g\rangle = \int_{X^m}\rho g\,\mu_0^{\otimes m}$. But, by assumption, $W \in M^\theta$, and by the general duality theorems in [30,26] the topological dual of $M^\theta$ identifies with $L^{\theta^*}$, i.e. any continuous functional $\Lambda$ on $M^\theta$ is obtained by integrating against a (unique) $\rho \in L^{\theta^*}$, which concludes the proof.
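The Orlicz space objects in the proof above can be made concrete for a probability vector $\mu$ with positive weights on a finite set (an assumption of this sketch). Below we use the even Young function $\theta(s) = e^{|s|} - |s| - 1$ (the even extension of the function used in the text), check numerically that its Legendre-Fenchel transform is $\theta^*(s) = (1+|s|)\log(1+|s|) - |s|$, compute Luxemburg norms by bisection, and test the Hölder-Young inequality. The function names are hypothetical.

```python
import numpy as np


def theta(s):
    """Even Young function theta(s) = e^{|s|} - |s| - 1."""
    a = np.abs(s)
    return np.expm1(a) - a


def theta_star(s):
    """Its Legendre-Fenchel transform (1 + |s|) log(1 + |s|) - |s|."""
    a = np.abs(s)
    return (1.0 + a) * np.log1p(a) - a


def luxemburg_norm(f, mu, young, iters=200):
    """inf{lam > 0 : sum_i young(f_i / lam) mu_i <= 1}, found by bisection (assumes mu > 0)."""
    if not np.any(f):
        return 0.0
    modular = lambda lam: float(np.sum(young(f / lam) * mu))
    lo, hi = 0.0, 1.0
    with np.errstate(over="ignore"):
        while modular(hi) > 1.0:  # the modular decreases in lam: first find a bracket
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if modular(mid) > 1.0:
                lo = mid
            else:
                hi = mid
    return hi  # modular(hi) <= 1, so hi is an upper approximation of the norm
```

The inequality $|\sum_i f_ig_i\mu_i| \leq 2\|f\|_\theta\|g\|_{\theta^*}$ follows from Young's inequality $st \leq \theta(s) + \theta^*(t)$ applied to $f/\|f\|_\theta$ and $g/\|g\|_{\theta^*}$.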

2.7. Proof of Theorem 1.5. Given a compact metric space $X$ we endow $Y\ (:= \mathcal{P}(X))$ with the Wasserstein $L^1$-metric $d$, which is compatible with the weak topology:
$$d(\mu,\nu) := \sup_f \int_X f\,(\mu - \nu),$$
where $f$ ranges over all Lipschitz continuous functions on $X$ with Lipschitz constant $1$. Since $\int(\mu - \nu) = 0$, we may as well assume that $f(x_0) = 0$ for a fixed point $x_0$, and hence that $|f(x)| \leq C_X$, where $C_X$ is independent of $f$ (since $X$ is compact and, in particular, has bounded diameter).
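On a compact interval $X = [a,b] \subset \mathbb{R}$ (an assumption of this sketch), the Wasserstein $L^1$-metric above reduces to the classical formula $d(\mu,\nu) = \int_a^b |F_\mu(t) - F_\nu(t)|\,dt$ in terms of cumulative distribution functions. For two empirical measures with the same number $N$ of points it equals $\frac{1}{N}\sum_i |x_{(i)} - y_{(i)}|$, with the points sorted. The Python sketch implements the distribution function formula for finitely supported measures; it is checked against the sorted-points formula and against the dual formulation with the $1$-Lipschitz test function $f(t) = t$. The function names are hypothetical.

```python
import numpy as np


def wasserstein1(xs, ws, ys, vs):
    """W_1 between sum_i ws_i delta_{xs_i} and sum_j vs_j delta_{ys_j} on the real line,
    computed as the integral of |F_mu - F_nu| over the gaps between merged support points."""
    points = np.concatenate([xs, ys])
    weights = np.concatenate([ws, -np.asarray(vs)])
    order = np.argsort(points, kind="stable")
    points, weights = points[order], weights[order]
    cdf_diff = np.cumsum(weights)[:-1]  # F_mu - F_nu, constant on each gap
    return float(np.sum(np.abs(cdf_diff) * np.diff(points)))


def empirical_w1(x, y):
    """The same distance for two N-point empirical measures: mean gap of the sorted samples."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))
```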
We fix, as before, a continuous function Φ on Y := P(X). Without loss of generality we may as well assume that W, Φ ≥ 0.
First observe that, when $\beta > \beta_0$, we have [...] precisely as in the beginning of the proof of Corollary 1.4 (using that $\Phi$ is bounded).
Using the convergence of the mean energies and the Gibbs variational principle, as before, we thus have [...] (2.7)
Lemma 2.12. For any $\beta > \beta_0$ the corresponding (scaled) free energy functional $\beta F_\beta$ on $\mathcal{P}(X)$ is lower semi-continuous.
Proof. First observe that, by inequality (2.7) (applied to $\Phi = 0$), [...] Now, applying the previous bound to $\beta - \epsilon > \beta_0$ reveals that [...] Given $\mu$ in $\mathcal{P}(X)$ with $E(\mu) < \infty$ we set $u_\mu(x) := \int_X W(x,y)\,\mu(y)$. Then [...] Using the previous estimate it will, to prove the lemma, be enough to verify the following "macroscopic" version of H4 for any sequence $\mu_j$ converging weakly towards $\mu$:
$$D(\mu_j) \leq C \implies E(\mu_j) \to E(\mu)$$
(compare [10, Theorem 2.17]). To this end we set $u_j := u_{\mu_j}$ and observe that the Hölder-Young inequality (with $\theta$ of exponential type, as in the proof of Corollary 1.4) implies that $|u_j| \leq C$ (using the assumption on $W$). Hence, using the Hölder-Young inequality again, it will be enough to show that $\|u_j - u\|_\theta \to 0$, or equivalently that, for any given $a > 0$,
$$\int_X \theta\big(a(u_j - u)\big)\,\mu_0 \to 0.$$
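Since $\theta(t) \le |t|e^{|t|}$ and $|u_j| \le C$, it will thus be enough to show that $\int_X |u_j - u|\,\mu_0 \to 0$. By the lower semi-continuity of $W$ we have $\liminf_{j\to\infty} u_j \ge u$, so the desired convergence will follow from general measure theory if $\int_X u_j\,\mu_0 \to \int_X u\,\mu_0$ (using that $u_\mu \ge 0$ if $W$ is normalized so that $W \ge 0$). But, by Fubini-Tonelli, $\int_X u_j\,\mu_0 = \int_X v\,\mu_j$ with $v(y) := \int_X W(x,y)\,\mu_0(x)$, where $v$ is bounded by the previous argument (since $\mu_0$ trivially has finite entropy).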
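The "general measure theory" step is a Scheffé-type argument. Spelled out for $f_j := u_j$ and $f := u$ (both nonnegative, with $f$ assumed integrable), it reads:

```latex
\begin{aligned}
&\text{Assume } f_j, f \ge 0,\ f \in L^1(\mu_0),\ \liminf_{j\to\infty} f_j \ge f \ \ \mu_0\text{-a.e.},\ \int_X f_j\,\mu_0 \to \int_X f\,\mu_0. \\
&\text{Then } 0 \le (f - f_j)^+ \le f \ \text{ and } \ \limsup_{j\to\infty} (f - f_j)^+ = \big(f - \liminf_{j\to\infty} f_j\big)^+ = 0, \\
&\text{so dominated convergence gives } \int_X (f - f_j)^+\,\mu_0 \to 0. \ \text{Since } |f_j - f| = (f_j - f) + 2(f - f_j)^+, \\
&\int_X |f_j - f|\,\mu_0 = \int_X f_j\,\mu_0 - \int_X f\,\mu_0 + 2\int_X (f - f_j)^+\,\mu_0 \;\longrightarrow\; 0 .
\end{aligned}
```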
In particular, $v$ is in the little Orlicz space $M_\theta(X, \mu_0)$, and since $D(\mu_j) \le C$ the desired convergence then follows from the duality argument towards the end of the proof of Cor 1.4. Now, to prove the LDP we need, in view of Lemma 2.7, to complement the upper bound on $-\frac{1}{N}\log Z_{N,\beta}[\Phi]$ in formula (2.7) with a corresponding lower bound. To this end it seems natural to try to extend the Orlicz space duality argument in the proof of Cor 1.4 to the present setting, exploiting the uniform bound on the entropy of the marginals. But here we will instead take another road (inspired by [16,22]), exploiting the stronger $L^p$-bounds provided by the following lemma.
Lemma 2.13. Let $\Phi$ be a given Lipschitz continuous function on $Y := P(X)$ and fix $\beta > \beta_0$. Then the following estimate holds for the densities $\rho$ as $N \to \infty$. In particular, for any $p > 1$ the densities $\rho$ are uniformly bounded in $L^p$ as $N \to \infty$.

Proof. To fix ideas we start with the case $\Phi = 0$, following closely the proof of Theorem 3.1 in [16]. Set
Applying Hölder's inequality with $p = N$ (and thus $q = 1 + 1/(N-1)$), the integral in the right hand side is bounded from above by
By assumption, the integral appearing in the first factor above is bounded from above by $A^N$. It will thus be enough to show that the second integral is controlled by $Z_N$, in the sense that it is bounded from above by a uniform constant times $Z_N$.
To this end we will apply Hölder's inequality again, now with conjugate exponents $u$ and $w$, with $u$ sufficiently close to 1 (to be quantified below). We thus rewrite
and apply Hölder's inequality. Since $w(q - 1/u) = 1 + w(q-1) = 1 + w/(N-1)$, this gives (2.8).
Hence, taking $w = \varepsilon(N-1)$ for a sufficiently small positive number $\varepsilon$, the first factor is controlled by $Z_N$ (since $W \ge 0$) and the integral in the second factor is controlled by $Z_{N,(1+\varepsilon)\beta} \le B^N$ (by (2.6)). Since $w$ is of the order $N$, this concludes the proof when $\Phi = 0$. To treat the general case we will use the following claim:
Accepting the claim for the moment and introducing the corresponding notation for $\Phi$, we then use Hölder's inequality first with $p$ and $q$ and then with $u$ and $w$, exactly as above, to get the same factors as above apart from the last factor in formula (2.8), which now becomes
which is bounded from above by $C'^N$, according to the estimate (2.6) (when $\gamma$ is sufficiently small). This proves the lemma once we have verified the claim above.
To this end we assume, to simplify the notation, that $j = 1$ (the general case is similar) and observe that, setting $\mu := \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}$ and ν := 1
Hence, for any $f$ such that $|f| \le C_X$ we have
and hence $d(\mu,\nu) \le 2C_X/N$. But then the claim follows directly from the Lipschitz continuity of $\Phi$.
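The bound $d(\mu,\nu) \le 2C_X/N$ says that changing a single atom of an $N$-point empirical measure moves it by $O(1/N)$ in the Wasserstein $L^1$-metric. The sketch below illustrates this on the hypothetical example $X = [0,1]$ (with $x_0 = 0$, so $C_X = 1$), using the one-dimensional formula $d(\mu,\nu) = \int |F_\mu - F_\nu|\,dt$ for distribution functions. Since the definition of $\nu$ is not legible here, two readings are tried, both assumptions: dropping $x_1$, or moving it to another point.

```python
import random
from bisect import bisect_right
from typing import Sequence

def w1_on_line(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Wasserstein L^1 distance between the empirical measures of xs and ys on R,
    via the one-dimensional formula W1 = integral of |F_mu(t) - F_nu(t)| dt."""
    xs, ys = sorted(xs), sorted(ys)
    pts = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(pts, pts[1:]):
        # both distribution functions are constant on [left, right)
        F_mu = bisect_right(xs, left) / len(xs)
        F_nu = bisect_right(ys, left) / len(ys)
        total += abs(F_mu - F_nu) * (right - left)
    return total

random.seed(0)
N = 40
x = [random.random() for _ in range(N)]      # hypothetical configuration in X = [0, 1]
C_X = 1.0                                     # |f| <= 1 if f is 1-Lipschitz with f(0) = 0
nu_removed = x[1:]                            # reading (a): drop x_1
nu_replaced = [random.random()] + x[1:]       # reading (b): move x_1 elsewhere
print(w1_on_line(x, nu_removed), w1_on_line(x, nu_replaced), 2 * C_X / N)
```

Both distances stay below $2C_X/N$, as the estimate predicts.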
Now, to verify the missing lower bound on $-\frac{1}{N}\log Z_{N,\beta}[\Phi]$, we first claim that it will be enough to verify the case when $\Phi$ is Lipschitz continuous. Indeed, any continuous function $\Phi$ on a compact metric space $Y$ can be written as a uniform limit of Lipschitz continuous functions $\Phi^{(R)}$ (for example, $\Phi^{(R)}(x) := \inf_{y \in Y}(\Phi(y) + R\,d(x,y))$ increases to $\Phi$ as $R \to \infty$, uniformly by Dini's theorem, and has Lipschitz constant $R$). Moreover we may, after relabeling the sequence, assume that $|\Phi_\varepsilon - \Phi| \le \varepsilon$. But since
which proves the claim. Next, we consider the sequence $\mu_N = \mu^{(N)}_{\beta_N}$ of Gibbs measures corresponding to the Hamiltonian $H^{(N)} + N\delta_N^*(\Phi)$, for $\Phi$ Lipschitz continuous, and decompose
By continuity the second term above converges towards $\langle \Phi, \Gamma\rangle$. To prove the desired lower bound in terms of $F_\beta$ it will thus, just as in the proof of Cor 1.4, be enough to show that for any weak limit point
To this end we recall that, by the previous lemma, $\rho^{(N)}_2$ is uniformly bounded in $L^p$, as $N \to \infty$, for any fixed $p > 1$. Hence, by standard $L^p$-duality, the limit (2.9) follows from the fact that $W \in L^q$ for some (in fact any) $q > 1$, which holds since, by assumption, $e^{\varepsilon W} \in L^1$ for any sufficiently small positive number $\varepsilon$.
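The Lipschitz regularization $\Phi^{(R)}(x) := \inf_{y}(\Phi(y) + R\,d(x,y))$ can be tested on a finite grid. The sketch below uses a hypothetical grid in $[0,1]$ standing in for $Y$ and the continuous but non-Lipschitz function $\Phi(x) = \sqrt{x}$, and reports the uniform distance to $\Phi$ for increasing $R$; the names are illustrative.

```python
import math
from typing import Callable, List, Sequence

def inf_convolution(phi: Sequence[float], points: Sequence[float],
                    dist: Callable[[float, float], float], R: float) -> List[float]:
    """Phi^(R)(x) = min over y of Phi(y) + R d(x, y), on a finite metric space."""
    return [min(phi[j] + R * dist(x, y) for j, y in enumerate(points)) for x in points]

def dist(a: float, b: float) -> float:
    return abs(a - b)

points = [k / 200 for k in range(201)]   # finite grid standing in for a compact space Y
phi = [math.sqrt(p) for p in points]     # continuous, but not Lipschitz near 0

Rs = (1.0, 4.0, 16.0, 64.0)
approx = {R: inf_convolution(phi, points, dist, R) for R in Rs}
for R in Rs:
    # Phi^(R) <= Phi, so this is the uniform distance to Phi
    print(R, max(p - a for p, a in zip(phi, approx[R])))
```

On a finite grid the approximation becomes exact once $R$ exceeds the largest difference quotient of $\Phi$; on a genuine continuum one only gets uniform convergence, which is all the argument needs.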

2.8. Relations to Gamma-convergence of $E^{(N)}$ on $P(X)$: Proof of Cor 1.6.

First observe that the required lower bound on $E^{(N)}(x^{(N)})$ is obtained by taking $\mu_N$ to be the normalized $S_N$-orbit in $X^N$ of the Dirac measure supported at $x^{(N)}$. Then $E^{(N)}(x^{(N)}) = E^{(N)}(\mu_N)$, and since $\mu_N$ converges towards $\Gamma := \delta_\mu$ this proves the desired lower bound.
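For a symmetric Hamiltonian, averaging over the $S_N$-orbit leaves the energy unchanged, and every point of the orbit has the same empirical measure. The sketch checks both facts for a hypothetical pair Hamiltonian $H^{(N)}(x) = \frac{1}{2N}\sum_{i \ne j} W(x_i, x_j)$ with the illustrative choice $W(a,b) = (a-b)^2$ and a hypothetical configuration of $N = 5$ points on the line.

```python
import itertools
from typing import Callable, Sequence

def pair_hamiltonian(x: Sequence[float], W: Callable[[float, float], float]) -> float:
    """Hypothetical mean-field Hamiltonian H^(N)(x) = (1/(2N)) sum_{i != j} W(x_i, x_j)."""
    N = len(x)
    return sum(W(x[i], x[j]) for i in range(N) for j in range(N) if i != j) / (2 * N)

def mean_energy_of_orbit(x: Sequence[float], H: Callable[[Sequence[float]], float]) -> float:
    """E^(N)(mu_N): integral of H/N against mu_N := (1/N!) sum_sigma delta_{sigma x},
    the normalized S_N-orbit of the Dirac measure at x."""
    N = len(x)
    orbit = list(itertools.permutations(x))
    return sum(H(p) for p in orbit) / (len(orbit) * N)

def W(a: float, b: float) -> float:
    return (a - b) ** 2

def H(y: Sequence[float]) -> float:
    return pair_hamiltonian(y, W)

x = (0.1, 0.7, 0.3, 0.9, 0.5)
print(mean_energy_of_orbit(x, H), H(x) / len(x))   # equal up to rounding
```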
Next, to construct recovery sequences we take a sequence $\beta_N \to \infty$. By Theorem 1.1 an LDP holds with rate functional $E(\mu)$. In particular, the lower bound in the LDP gives that for any $\delta > 0$ we have, for $\varepsilon \le \varepsilon_\delta$, $-E(\mu) - \delta \le \limsup$
Hence, if $D(\mu) < \infty$, then Sanov's theorem (i.e. the LDP for $H^{(N)} = 0$) shows that there exists a sequence $x^{(N)}$
Now a diagonal argument shows that any such $\mu$ admits a recovery sequence. Finally, the regularity assumption allows us to deduce the existence of a recovery sequence for any $\mu$.

Concluding remarks
3.1. A weaker form of the hypothesis H2. Let us come back to the setting of Theorem 1.1 and observe that the hypothesis H2 may be replaced by the following one, which is a priori weaker (see the beginning of Section 2.8):

• (H2') For any sequence of $x^{(N)} \in X^N$ such that $\delta_N(x^{(N)}) \to \mu$ weakly in $P(X)$ we have $\liminf_{N\to\infty} \frac{1}{N}H^{(N)}(x^{(N)}) \ge E(\mu)$.

In other words, (H2') says that $E^{(N)} := H^{(N)}/N$, when viewed as a functional on $P(X)$, satisfies the lower bound property, which is one of the two requirements for the Gamma-convergence of $E^{(N)}$ towards $E(\mu)$, where $E(\mu)$ denotes, as before, the macroscopic mean energy whose existence is postulated in hypothesis H1.

Proof. Just as in the proof of Theorem 1.1, in order to verify the convergence in Bryc's lemma, we may without loss of generality assume that $\Phi = 0$. Moreover, exactly as before, H1 combined with the Gibbs variational principle yields the upper bound on $-\log Z_{N,\beta_N}$. To prove the lower bound, first note that for any fixed $\mu \in P(X)$ and $\varepsilon > 0$ Sanov's theorem gives, just as in the proof of Cor 1.6 in Section 2.8, that
where, by hypothesis H2', the right hand side is bounded from above by $-E(\mu) - \frac{1}{\beta}D(\mu) = -F_\beta(\mu)$. Now, for any fixed $\delta > 0$ we cover the compact space $P(X)$ by a finite number $M_\delta$ of balls $B_\delta(\mu_{\delta,i})$ with $i = 1, \ldots, M_\delta$. Then
where $\mu_\delta$ is the center of the ball with the largest integral. Next, fix $\varepsilon > 0$ and let $\mu$ be a limit point in $P(X)$ of the family $\mu_\delta$ as $\delta \to 0$ (which exists by compactness), say $\mu_{\delta_j} \to \mu$ along a sequence $\delta_j \to 0$. For $j$ sufficiently large we have $B_{\delta_j}(\mu_{\delta_j}) \subset B_\varepsilon(\mu)$. Hence, estimating the right hand side in the previous formula by the integral over $B_\varepsilon(\mu)$ and using the inequality (3.1) gives
which shows that Bryc's lemma can be applied, just as before, to deduce the LDP in question.
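The covering step replaces a sum over $M_\delta$ balls by its largest term, at the cost of a factor $M_\delta$, which is negligible on the exponential scale because $\frac{1}{N}\log M_\delta \to 0$ for fixed $\delta$. A minimal numerical illustration of the elementary inequality $\max_i a_i \le \frac{1}{N}\log\sum_{i=1}^{M} e^{N a_i} \le \max_i a_i + \frac{\log M}{N}$, with hypothetical exponents $a_i$ playing the role of the per-ball rates:

```python
import math
from typing import Sequence

def scaled_log_sum_exp(a: Sequence[float], N: int) -> float:
    """(1/N) log sum_i exp(N a_i), computed stably by factoring out max(a)."""
    m = max(a)
    return m + math.log(sum(math.exp(N * (ai - m)) for ai in a)) / N

a = [-0.7, -0.25, -0.3, -1.2]   # hypothetical exponents, one per ball of the cover
for N in (1, 10, 100, 1000):
    print(N, scaled_log_sum_exp(a, N), max(a), max(a) + math.log(len(a)) / N)
```

As $N$ grows the scaled sum is squeezed onto the largest exponent, which is why only the ball with the largest integral matters in the limit.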
A result essentially equivalent to the previous theorem appears in [14]. Combining Theorem 3.1 and Cor 1.6 thus reveals that the hypotheses H1 and H2' actually imply the Gamma-convergence of $\frac{1}{N}H^{(N)}$ towards $E$ on $P(X)$, if the approximation hypothesis H3 holds.
Finally, let us point out that it seems unlikely that, in general, the assumption that $\frac{1}{N}H^{(N)}$ Gamma-converges towards a functional $E$ on $P(X)$ is enough to deduce an LDP (even if one also assumes H3). On the other hand, as shown in [5], one does get an LDP for any $\beta \in\, ]0,\infty]$ under an assumption of quasi-superharmonicity:

Theorem 3.2. [5] Let $H^{(N)}$ be a sequence of lower semi-continuous symmetric functions on $X^N$, where $X$ is a compact Riemannian manifold. Assume that

• the sequence $\frac{1}{N}H^{(N)}$ on $X^N$ (identified with a sequence of functions on $P(X)$) Gamma-converges towards a functional $E$ on $P(X)$;
• $H^{(N)}$ is uniformly quasi-superharmonic, i.e. $\Delta_{x_1}H^{(N)}(x_1, x_2, \ldots, x_N) \le C$ on $X^N$.

Then, for any sequence of positive numbers $\beta_N \to \beta \in\, ]0,\infty]$, the measures $\Gamma_N := (\delta_N)_* e^{-\beta_N H^{(N)}}$ on $M_1(X)$ satisfy, as $N \to \infty$, an LDP with speed $\beta_N N$ and good rate functional
$F_\beta(\mu) = E(\mu) + \frac{1}{\beta}D_{dV}(\mu)$ (3.2)
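In the simplest possible setting the free energy asymptotics behind these LDPs can be watched directly. The sketch assumes a hypothetical two-point space $X = \{0,1\}$ with $\mu_0$ uniform, a bounded symmetric $W$, and the illustrative Hamiltonian $H^{(N)}(x) = N E(\delta_N(x))$ with $E(\mu) = \frac{1}{2}\iint W\,\mu\otimes\mu$ (self-interactions included), so that $Z_{N,\beta}$ reduces to an exact binomial sum. It compares $-\frac{1}{N}\log Z_{N,\beta}$ with $\inf_\mu \beta F_\beta(\mu) = \inf_\mu(\beta E(\mu) + D(\mu))$ for one positive and one negative $\beta$. This bounded toy is far from the singular models treated in the text; all names are hypothetical.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]

def toy_energy(t: float, W: Matrix) -> float:
    """E(mu_t) = (1/2) sum_{a,b} W[a][b] mu_t(a) mu_t(b) for mu_t = (1 - t, t) on {0, 1}."""
    p = (1.0 - t, t)
    return 0.5 * sum(W[a][b] * p[a] * p[b] for a in range(2) for b in range(2))

def toy_entropy(t: float) -> float:
    """Relative entropy D(mu_t | mu_0) for mu_0 uniform on {0, 1}, with 0 log 0 := 0."""
    return sum(p * math.log(2.0 * p) for p in (1.0 - t, t) if p > 0.0)

def minus_log_Z_over_N(N: int, beta: float, W: Matrix) -> float:
    """-(1/N) log Z_{N,beta} for the hypothetical H^(N)(x) = N E(delta_N(x)).
    Configurations are grouped by k = #{i : x_i = 1}, so Z is an exact binomial sum."""
    logs = [
        math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        - N * math.log(2.0) - beta * N * toy_energy(k / N, W)
        for k in range(N + 1)
    ]
    m = max(logs)   # log-sum-exp for numerical stability
    return -(m + math.log(sum(math.exp(v - m) for v in logs))) / N

def inf_scaled_free_energy(beta: float, W: Matrix, grid: int = 20_000) -> float:
    """Grid approximation of inf over mu of beta F_beta(mu) = beta E(mu) + D(mu)."""
    return min(beta * toy_energy(i / grid, W) + toy_entropy(i / grid)
               for i in range(grid + 1))

W = [[0.0, 1.0], [1.0, 0.0]]
for beta in (1.0, -3.0):   # one positive and one negative temperature
    print(beta, [minus_log_Z_over_N(N, beta, W) for N in (10, 100, 1000)],
          inf_scaled_free_energy(beta, W))
```

Working with $\beta F_\beta = \beta E + D$ rather than $F_\beta$ keeps the statement meaningful for both signs of $\beta$ in this bounded example.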