Estimation of local microcanonical averages in two lattice mean-field models using coupling techniques

We consider an application of probabilistic coupling techniques which provides explicit estimates for comparison of local expectation values between label permutation invariant states, for instance, between certain microcanonical, canonical, and grand canonical ensemble expectations. A particular goal is to obtain good bounds for how such errors will decay with increasing system size. As explicit examples, we focus on two well-studied mean-field models: the discrete model of a paramagnet and the mean-field spherical model of a continuum field, both of which are related to the Curie–Weiss model. The proof is based on a construction of suitable probabilistic couplings between the relevant states, using Wasserstein fluctuation distance to control the difference between the expectations in the thermodynamic limit.


Introduction
We consider a novel method of analysis of convergence of local expectation values in probability distributions associated with microcanonical ensembles. Our approach aims at answering the following question which would be natural, for example, to control expectations in states arising in ergodic theory: Assume that the system is in a microcanonical state with one or two known fixed conserved quantities which are label permutation invariant. Consider an observable which depends only on a few degrees of freedom of some finite but large system, for example, consider local correlation functions. Assume that there is some other permutation invariant probability distribution, such as the corresponding grand canonical ensemble, in which the expectation of the observable can be computed, either via a simulation or analytically. Can we estimate the error which arises from replacing the typically not computable microcanonical expectation with the second result? Assuming that the two ensembles are thermodynamically equivalent, how fast does the error decrease with increasing system size?
From the perspective of uniform measures with constraints, we mainly focus on the related standard ensembles, i.e., microcanonical, canonical, and grand canonical ensembles, each with parameters associated with the thermodynamically relevant quantities. For the sake of completeness, we will give a heuristic overview of the standard ensemble theory in Sec. 1.1. There we also introduce notations and terminology which will be used later for defining the ensemble measures of the two models.
In the above standard ensemble set-up, the thermodynamic equivalence of ensembles can often be studied via relative entropy methods. In certain models, in particular, of discrete lattice fields, the relative entropy bounds can also provide an answer to the question stated in the beginning via the Pinsker inequality. However, relative entropy estimates are not always readily available, cannot be used between measures which are not absolutely continuous at least in one direction, and as we will show explicitly later, the estimates they provide might not be optimal.
The motivation to look for improvements of the well-developed earlier methods comes from a recent result [13] for the supercritical Berlin-Kac spherical model [1]. This is a model with two thermodynamic quantities in a canonical ensemble where one of them becomes frustrated and forms a condensate. This results in a nonequivalence between the canonical and grand canonical ensembles. However, it was shown in [13] that, after separating the condensate modes, the state of the remaining modes is well described by a grand canonical state. Comparing the canonical ensemble with this modified grand canonical ensemble yields local expectations which converge to each other in the thermodynamic limit. The result was proven using a suitably constructed coupling and relying on the translation invariance of the system, and the resulting estimates imply a power-law convergence in the system size of the errors between the two expectation values.
The coupling technique in [13] is, however, quite specific to the Berlin-Kac model, and relies partially on the existence of the condensate. Here, we explore the extension of these ideas to well-studied cases where equivalence and non-equivalence of various ensembles are known, and which are sufficiently simple to be fairly explicitly computable. For the models chosen here, translation invariance is replaced by label permutation invariance, and it will serve as an important tool to lift the fairly crude coupling estimates into convergence of various local expectations. We also explore the idea of replacing the standard ensembles of some of these models with other measures which are more accurate but still easy to evaluate. The standard theory of ensemble equivalence will serve as a guide in this choice, and in most of the cases studied here, it will suffice on its own.
The first of the models is the simple paramagnet which one can find in [11]. In working with this model, there will be a slight abuse of terminology. There is no associated Hamiltonian, but the magnetization is the "conserved quantity" of this model. The corresponding canonical ensemble then has a parameter associated with the control of expectation of magnetization. The paramagnet model is, however, closely related to the standard Curie-Weiss model since the ensemble expectations of the latter can be expressed as a convex combination of those of the former (this connection will be discussed also in Sec. 3.4).
The second model is a continuum modification of the Curie-Weiss model called the mean-field spherical model. The model has been studied in [9] and it is a simplification of the Berlin-Kac model introduced in [1]. In [9], the authors consider the thermodynamic properties of the microcanonical and canonical ensembles. In Sec. 4, we explore the mean-field spherical model in a slightly generalized set-up, namely, by also considering the density of the system to be a free parameter. This allows us to study the properties of the grand canonical ensemble, which is not explored in [9].
For both of these models, we will give detailed proofs of explicit rates of convergence of finite marginal distributions and/or finite moments of all orders between the ensembles of the models. The main result here is the development of novel methods which employ rigorous and well-understood analysis of the thermodynamic properties of the ensembles in order to prove a form of weak convergence of the probability measures corresponding to the different ensembles.
For the simple paramagnet, we will supply two distinct proofs with explicit errors for the convergence of local observables. The first proof will utilize relative entropy and, as such, will mainly reprove and collect known results. The second proof will utilize a coupling argument. Due to the simple nature of this model, we can show explicitly that the error bounds given by the coupling method are strictly better than the bounds given by the relative entropy method that we used. For the mean-field spherical model, we will focus solely on application of the coupling methods to prove local convergence results.
The main mathematical tools for the rigorous control of expectations in the various ensemble measures are couplings of the ensemble measures and the related Wasserstein distance between them, with suitably chosen "cost functions." We give a brief review of couplings and Wasserstein metric in Sec. 2.1.
A crucial property of the ensemble measures and of the couplings constructed here is their invariance under permutations of the particle labels. The permutation invariance improves the control of differences of expectations under the ensemble measures, allowing us to bound the error by the above Wasserstein distance. The method is similar to how translation invariance has been used in [13] for the supercritical Berlin-Kac spherical model, and it is described in detail in Sec. 2. Another tool for such an estimation is the Laplace method of asymptotic analysis of integrals. The method, and how it applies to the above error estimation, is also discussed in Sec. 2 and A.2.
We postpone a more detailed discussion of related previous works, and of how the present estimates connect to them, to Sec. 1.2 at the end of the Introduction.

Equilibrium ensembles with two thermodynamic quantities
To fix our conventions, let us record here briefly our definitions and parametrization of the standard ensembles. For a review of further results and discussion about the thermodynamic equivalence of ensembles, we refer to [16] and for mathematical details also to [12].
In the following, $S$ is some arbitrary state space with some fixed positive reference measure $\mathrm{d}\phi$. The two thermodynamic observables, the "conserved quantities", will be called the energy $H : S \to \mathbb{R}$ and the particle number $N : S \to \mathbb{R}$. We use $V > 0$ to represent the number of degrees of freedom of the system, and we focus on the properties of the system for large $V$. It is typically related to the "volume" of the state space $S$ in some way.
We represent the constraints using, at the moment somewhat formal, δ-function notations; the rigorous meaning of the notations will be discussed later. Let us stress that we do not take the commonly used thin-shell smoothing of these measures since this would for our purposes unnecessarily complicate the analysis. However, we then have to be careful in the choice of allowed parameter values in some of the ensembles below, to avoid instances where the normalization factor is zero or otherwise ill-defined.
The microcanonical ensemble with energy density $\varepsilon \in \mathbb{R}$ and particle density $\rho \in \mathbb{R}$ is then given by
The canonical ensemble with inverse temperature $\beta \in \mathbb{R}$ and particle density $\rho \in \mathbb{R}$ is given by
Finally, the analogously defined grand canonical ensemble with inverse temperature $\beta \in \mathbb{R}$ and chemical potential $\mu \in \mathbb{R}$ is
Let us remark that, for later convenience, we do not follow the standard physics conventions here, according to which our parameter "$\mu$" would be replaced by "$-\beta\mu$".
With the above definitions, there are a number of immediate, explicit relations between some of the above ensembles. In particular, we will need later the following two observations which allow representing an ensemble as a mixture of the more constrained ensemble:
Next, we define the specific microcanonical entropy, or microcanonical entropy per degree of freedom, by
We define the specific canonical free energy, or canonical free energy per degree of freedom, by
Note that we do not divide here by $\beta$, as would be common in the definition of a free energy: this would not be convenient for our models since zero and negative values of $\beta$ may also occur here.
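For orientation, the definitions described in this paragraph can be written out as follows. This is only a sketch in the formal δ-function notation: the symbols $Z_{\mathrm{MC}}$, $Z_{\mathrm{C}}$, $Z_{\mathrm{GC}}$ for the normalization factors and the placement of the indices are assumptions made here for illustration. The signs are chosen to match the verbal description (no division by $\beta$, and $\mu$ in place of $-\beta\mu$), and the prefactors $V$ in the mixture relations depend on the δ-function convention.

```latex
% Sketch only: Z_MC, Z_C, Z_GC are assumed names for the normalization factors.
\begin{align*}
\langle g \rangle^{\varepsilon,\rho;V}_{\mathrm{MC}}
  &:= \frac{1}{Z_{\mathrm{MC}}(\varepsilon,\rho;V)} \int_S \mathrm{d}\phi\,
      \delta\big(H[\phi]-\varepsilon V\big)\,\delta\big(N[\phi]-\rho V\big)\, g(\phi), \\
\langle g \rangle^{\beta,\rho;V}_{\mathrm{C}}
  &:= \frac{1}{Z_{\mathrm{C}}(\beta,\rho;V)} \int_S \mathrm{d}\phi\,
      e^{-\beta H[\phi]}\,\delta\big(N[\phi]-\rho V\big)\, g(\phi), \\
\langle g \rangle^{\beta,\mu;V}_{\mathrm{GC}}
  &:= \frac{1}{Z_{\mathrm{GC}}(\beta,\mu;V)} \int_S \mathrm{d}\phi\,
      e^{-\beta H[\phi]-\mu N[\phi]}\, g(\phi), \\
% Mixture relations, from inserting 1 = V \int d\varepsilon\, \delta(H - \varepsilon V)
% and 1 = V \int d\rho\, \delta(N - \rho V):
Z_{\mathrm{C}}(\beta,\rho;V)\, \langle g \rangle^{\beta,\rho;V}_{\mathrm{C}}
  &= V \int_{\mathbb{R}} \mathrm{d}\varepsilon\, e^{-\beta \varepsilon V}\,
     Z_{\mathrm{MC}}(\varepsilon,\rho;V)\, \langle g \rangle^{\varepsilon,\rho;V}_{\mathrm{MC}}, \\
Z_{\mathrm{GC}}(\beta,\mu;V)\, \langle g \rangle^{\beta,\mu;V}_{\mathrm{GC}}
  &= V \int_{\mathbb{R}} \mathrm{d}\rho\, e^{-\mu \rho V}\,
     Z_{\mathrm{C}}(\beta,\rho;V)\, \langle g \rangle^{\beta,\rho;V}_{\mathrm{C}}, \\
% Specific entropy and free energy (no division by beta):
s_{\mathrm{MC}}(\varepsilon,\rho;V) &:= \frac{1}{V} \ln Z_{\mathrm{MC}}(\varepsilon,\rho;V),
\qquad
f_{\mathrm{C}}(\beta,\rho;V) := -\frac{1}{V} \ln Z_{\mathrm{C}}(\beta,\rho;V).
\end{align*}
```

With these conventions the canonical expectation is a weighted average of microcanonical expectations over the energy density, which is the form used when comparing the two ensembles.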
Similarly, the specific grand canonical free energy, or grand canonical free energy per degree of freedom, is defined here by
Now, in order to see the relationship to Laplace-type integrals, we note that
Assuming that the limits exist, we define
Then either Laplace-type integral estimates or large deviation techniques [12] can often be used to show that the limit functions are related by a Legendre transform:
Typically, this results in a one-to-one correspondence between the parameters $\varepsilon$ and $\rho$ in the microcanonical ensemble and the associated free parameters $\beta$ and $\mu$. Assuming that the above thermodynamic limits exist and agree with each other using this correspondence, we say that the ensembles are thermodynamically equivalent.
The theory of Laplace-type integrals is well-developed and allows one to compute explicit asymptotics of such integrals. In particular, one is typically interested in second-order fluctuations. Indeed, from the specific free energies, we obtain
Using the theory of Laplace-type integrals, we typically have
The notion of "typical" here is rather vague, and we refer the reader to A.2 for a more detailed account of the use of the Laplace method. The first limit implies that the energy density of the canonical system converges to a constant, which, in turn, implies that the energy density behaves like $O(1)$ for large $V$. The second limit implies that the standard deviation of the energy density of the canonical system behaves like $O(V^{-1/2})$. However, one should not rely on these formulae directly at phase transition points where the differentiability assumptions fail: in such cases, more refined tools, such as subdifferentials and convex analysis, will be needed to study the related behaviour.
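The relations described in this paragraph can be sketched in formulas. The display below is an illustration under assumptions: $Z_{\mathrm{C}}(\beta,\rho;V)$ and $Z_{\mathrm{GC}}(\beta,\mu;V)$ are taken to be the normalization factors of the weights $e^{-\beta H}\delta(N-\rho V)$ and $e^{-\beta H-\mu N}$, the specific free energies are taken to be $f = -V^{-1}\ln Z$ and the specific entropy $s_{\mathrm{MC}} = V^{-1}\ln Z_{\mathrm{MC}}$, and the prefactors $V$ depend on the δ-function convention. The Legendre relations are the expected limiting forms, not statements proven here.

```latex
% Sketch under the assumptions named in the lead-in.
\begin{align*}
% Laplace-type representations of the normalization factors:
Z_{\mathrm{C}}(\beta,\rho;V)
  &= V \int_{\mathbb{R}} \mathrm{d}\varepsilon\,
     e^{-V\left[\beta\varepsilon - s_{\mathrm{MC}}(\varepsilon,\rho;V)\right]},
\qquad
Z_{\mathrm{GC}}(\beta,\mu;V)
   = V \int_{\mathbb{R}} \mathrm{d}\rho\,
     e^{-V\left[\mu\rho + f_{\mathrm{C}}(\beta,\rho;V)\right]}, \\
% Expected Legendre-type relations between the V -> infinity limits:
f_{\mathrm{C}}(\beta,\rho)
  &= \inf_{\varepsilon}\big[\beta\varepsilon - s_{\mathrm{MC}}(\varepsilon,\rho)\big],
\qquad
f_{\mathrm{GC}}(\beta,\mu)
   = \inf_{\rho}\big[\mu\rho + f_{\mathrm{C}}(\beta,\rho)\big], \\
% First and second derivatives at finite V (expectations in the canonical ensemble):
\partial_\beta f_{\mathrm{C}}(\beta,\rho;V)
  &= \Big\langle \frac{H}{V} \Big\rangle^{\beta,\rho;V}_{\mathrm{C}},
\qquad
-\partial_\beta^2 f_{\mathrm{C}}(\beta,\rho;V)
   = V \left( \Big\langle \Big(\frac{H}{V}\Big)^{2} \Big\rangle^{\beta,\rho;V}_{\mathrm{C}}
     - \Big(\Big\langle \frac{H}{V} \Big\rangle^{\beta,\rho;V}_{\mathrm{C}}\Big)^{2} \right).
\end{align*}
```

In particular, whenever $\partial_\beta^2 f_{\mathrm{C}}(\beta,\rho;V)$ remains bounded as $V \to \infty$, the second identity shows that the variance of the energy density is $O(V^{-1})$, so its standard deviation is $O(V^{-1/2})$.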
However, in addition to analysing the thermodynamic properties of the system, the Laplace-type analysis offers us something more. Indeed, if we return to the alternative representation of the canonical ensemble and denote the minimizing $\varepsilon$ of $f_{\mathrm{C}}(\beta,\rho)$ by $\varepsilon^*$, then, for some suitable class of observables $g(\phi)$, one might expect that
We then say that the two ensembles are equivalent in this observable class. For instance, if the above result held for every Lipschitz continuous function $g : S \to \mathbb{C}$, we could say that the microcanonical and canonical ensembles are Lipschitz observable equivalent. Analogously, if the result holds for all polynomials $g$ of the field whose degree is not allowed to grow with $V$, we say that the ensembles are equivalent in their local moments. In this paper, we consider the suitable class of functions $g$, and the rate of convergence in (1.2), in more detail.

Clarification of terminology
To avoid possible misunderstandings, let us explicitly record our usage of the terminology concerning ensembles and related objects such as the partition function and free energy. Most notably, we will need to make a distinction between thermodynamic and auxiliary statistical ensembles. A statistical ensemble is a probability distribution describing the state of a system. A thermodynamic ensemble is a particular statistical ensemble which is determined by the physical properties of the system, in particular, by its dynamics. The most common examples start with a Hamiltonian defining the dynamics and then include any other relevant conserved quantities using one of the above discussed forms leading to microcanonical, canonical, and possibly one or more grand canonical ensembles. Partition functions and free energies can then be associated with these thermodynamic ensembles. We do make some choices of convenience to simplify the overall constant in the partition function: to avoid misunderstandings, we include also their explicit definitions in the following.
Here, we start from some given thermodynamic ensemble in the microcanonical form. This yields the physical probability distribution whose local expectation values we aim to estimate. For this estimation, it turns out to be helpful to introduce new probability measures, i.e., statistical ensembles, on the system which we will call auxiliary ensembles. Since many of these auxiliary measures can be written in the same form as standard thermodynamic ensembles, it will be helpful to extend the standard terminology also there, leading, for example, to "auxiliary microcanonical ensemble with fixed magnetization density" for the Curie-Weiss model.
The auxiliary ensembles can be associated with "partition functions" and "free energies" in analogy with the standard ensembles, and this indeed will become a helpful shorthand notation in some of our computations. However, it should be stressed that the auxiliary ensembles usually do not have any thermodynamic meaning, for example, the magnetization defining the auxiliary ensemble above is not implied to be a conserved quantity in any dynamics leading to the Curie-Weiss model. In addition, when talking about phase transitions and their order, we will always refer to the parameters in the original thermodynamic ensembles, and not to those appearing in the auxiliary ensembles.

Related works and further motivation
There has always been considerable interest in trying to classify the "correct" notions of convergence of the equilibrium ensembles. For a particularly illuminating and modern account on some of the various notions which have been considered, we refer to [16] and its references. Thermodynamic equivalence from the point of view of large deviations and convexity properties of entropy is considered in great generality in [16]. Here, we approach the problem more from the point of view of convergence of generic local expectation values, and the additional facilitating ingredient is label permutation invariance of the studied equilibrium ensembles. For rigorous applications of the ensembles in non-equilibrium phenomena, such as for estimating the accuracy of local thermal equilibrium while studying heat transport, it would be important to be able to estimate the error in the approximation. This ultimate goal is the second motivation for starting with the simple example cases in the present contribution.
In fact, such rigorous proofs are already available in the literature, albeit for different systems from the ones studied here. A very detailed mathematical account of such a convergence has been given in [2] starting from uniform distributions on the intersection of a simplex and a sphere. By appropriately parametrizing the radius of the sphere, and considering the behaviour of finite dimensional marginals and moments of this uniform distribution as the dimension of the space is increased, the author is able to rigorously prove that a phase transition occurs for this specific system. In particular, the author is able to prove that in the high dimensional limit the finite marginal distributions of the given uniform distributions are of product form.
Another work in this direction, which cites the previous article, is given in [8]. In this work the authors consider the convergence of the microcanonical and grand canonical measures related to the Bose-Hubbard model. The commonality between both [2] and [8] is that the models they are considering are defined on state spaces with strictly positive unbounded elements. Such a feature seems to be a key property of these models since both of these works observe a phase transition into a state which can be characterized as containing a condensate.
In fact, a fairly satisfying account of ensembles with unbounded strictly positive phase spaces has been given in [14]. In this work the author proves a form of the equivalence of ensembles for systems with multiple constraints satisfying certain conditions, and the results are quite general as to their applicability. However, the main theorems presented there hold for phase spaces which are defined on $[0,\infty)^N$ rather than $\mathbb{R}^N$, and, furthermore, the assumptions of the main theorem do not hold for the ensembles we are considering here.
We also mention an extensive source for references to the relative entropy method and usage of the method in [7] and [3]. Some of these references will also be explicitly quoted later when discussing the usage of relative entropy.
Finally, let us mention the origin of the continuum model we are considering. First, we recall the (discrete) Curie-Weiss model. For a general overview of the discrete model, we refer to [10]. We also mention the classical work of Ellis in [6] which goes beyond the standard Curie-Weiss model. In [9], the authors consider a further simplification to the Berlin-Kac model introduced in [1]. In particular, the nearest neighbour Ising model is replaced by a mean-field Hamiltonian, and, as evidenced in the article, the thermodynamic properties of the microcanonical and canonical ensembles become exactly computable. However, the authors do not consider the properties of local observables in their analysis. The following references contain results about the phase structure of these models [9], as well as of their Potts model type generalizations to multicomponent cases [4].
Our approach differs significantly from those of the above previous works and their associated models. In particular, we will employ various coupling methods to prove convergence of finite dimensional marginal distributions and finite moments of all orders. In addition, our arguments do not hinge on definitions of the microcanonical ensembles with thin-set approximations. Instead, we define the microcanonical ensembles directly as constrained measures and explore their properties via analytic rather than probabilistic methods. For the first model, we refer to [11] for a considerably more detailed analysis of the various properties of the model. However, for the second model introduced in [9], there do not seem to be proofs pertaining to the convergence of finite dimensional marginals or finite moments. There is a considerable amount of fine structure which must be considered to give a full account of the local convergence at this level.
Finally, let us stress that the main purpose of this paper is to display the specific methods of coupling and their relationship with the local convergence properties of the equilibrium ensembles. The thermodynamic properties of these systems are already well-known and have been studied extensively, but we wish to give an alternate, simpler and more accurate, account of the two models present in this paper, with the hope that the ideas used here generalize to other, less explicitly tractable models.

Two methods of coupling and main lemmas
In this section, we will present definitions relevant to this article including the concept of coupling, the Wasserstein distance metric, and their two application methods which will be presented as theorems later on.

Couplings and Wasserstein distances
We collect some of the basic notions related to couplings here. A more thorough introduction is available, for instance, in [17].

Couplings and transport maps
We will frequently make use of the notion of coupling between probability measures. Let $X$ be a sample space and let $\Sigma$ be a $\sigma$-algebra on $X$. Let $\mu_1$ and $\mu_2$ be two probability measures on $X$.
Define the coordinate projections $P_1 : X \times X \to X$ and $P_2 : X \times X \to X$ by $P_1(x,y) := x$ and $P_2(x,y) := y$. A probability measure $\gamma$ on the sample space $X \times X$ with the $\sigma$-algebra $\Sigma \otimes \Sigma$ is called a coupling of $\mu_1$ and $\mu_2$ if $\gamma \circ P_1^{-1} = \mu_1$ and $\gamma \circ P_2^{-1} = \mu_2$. Here, and in the following, $P^{-1}$ will be used not only to denote the inverse of a mapping $P$, but also for the associated map which takes a set to its preimage under $P$.
In this paper, we will often give the definitions of probability measures with the explicit assumption that they can be constructed by simply giving suitable values of the expectations of measurable functions. For example, if $X$ is a locally compact Hausdorff space and we are able to construct a bounded positive linear functional $L$ on $C_c(X)$, the space of continuous functions with compact support equipped with the supremum norm, such that $\|L\| = 1$, then by the Riesz-Markov-Kakutani representation theorem, there exists a unique Radon probability measure $\mu$ on $X$ such that $L(f) = \langle f \rangle_\mu$ for all $f \in C_c(X)$.
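For the contents of this paper, we will use the following equivalent notion of coupling. Let $f : X \to \mathbb{R}$ be a measurable function. A probability measure $\gamma$, as defined in the previous paragraph, is a coupling if
\[ \langle f \circ P_1 \rangle_\gamma = \langle f \rangle_{\mu_1} \quad \text{and} \quad \langle f \circ P_2 \rangle_\gamma = \langle f \rangle_{\mu_2} \]
holds for all such functions $f$. One typically says that the marginal distributions of $\gamma$ are given by $\mu_1$ and $\mu_2$.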
In this paper, we will sometimes refer to specific types of couplings as transport maps. Let $\mu_1$ be a probability measure as before, and let $T : X \to X$ be a measurable map. Define the probability measure $\mu_2$ by setting $\mu_2(A) := \mu_1(T^{-1}(A))$ for all $A \in \Sigma$. Such a probability measure $\mu_2$ is called the pushforward measure of $\mu_1$ by the map $T$. We then denote $\mu_2 = T_* \mu_1$. This notion is also sometimes called the abstract change of variables due to the following equivalent definition of the pushforward measure: If $f : X \to \mathbb{R}_+$ is a characteristic function of a measurable set, we may set
\[ \langle f \rangle_{\mu_3} := \langle f \circ T \rangle_{\mu_1}, \tag{2.1} \]
and this defines a positive measure $\mu_3$ on $\Sigma$. Then, it is straightforward to check that $\mu_3$ indeed is a probability measure for which (2.1) holds for every non-negative measurable function $f$. In addition, $\mu_3 = \mu_2$, and thus (2.1) provides an alternative definition of $T_* \mu_1$.
When $\mu_2$ and $\mu_1$ are measures such that there is a measurable map $T$ for which $\mu_2 = T_* \mu_1$, we call $T$ a transport map from the measure $\mu_1$ to $\mu_2$. A transport map $T$ can always be used to construct a coupling between $\mu_1$ and $\mu_2$ as follows: If $g : X \times X \to \mathbb{R}_+$ is a measurable function, we define a probability measure $\gamma$ by setting
\[ \langle g \rangle_\gamma := \int_X \mu_1(\mathrm{d}x)\, g(x, T(x)). \]
One can go through analogous steps as above and show that $\gamma$ is then indeed a coupling of $\mu_1$ and $\mu_2 = T_* \mu_1$.
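As a concrete illustration of a transport map and the coupling it induces, consider the following one-dimensional Gaussian example. The choice of $\mu_1$, the parameters $m$ and $\sigma$, and the quadratic cost are assumptions made only for this sketch and do not appear in the models studied in this paper.

```latex
% Illustrative example: X = R, mu_1 standard Gaussian, T affine with sigma > 0.
\begin{align*}
\mu_1(\mathrm{d}x) &= \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, \mathrm{d}x,
\qquad T(x) := m + \sigma x,
\qquad \mu_2 := T_*\mu_1 = \mathcal{N}(m, \sigma^2), \\
% Coupling induced by the transport map: the law of (x, T(x)) under mu_1.
\langle g \rangle_\gamma &= \int_{\mathbb{R}} \mu_1(\mathrm{d}x)\, g\big(x, m + \sigma x\big), \\
% Marginals: g(x,y) = f(x) returns <f>_{mu_1}; g(x,y) = f(y) returns <f o T>_{mu_1} = <f>_{mu_2}.
% Quadratic cost of this coupling (the cross term vanishes since <x>_{mu_1} = 0):
\big\langle |x_1 - x_2|^2 \big\rangle_\gamma
  &= \int_{\mathbb{R}} \mu_1(\mathrm{d}x)\, \big( (1-\sigma)x - m \big)^2
   = (1-\sigma)^2 + m^2 .
\end{align*}
```

The cost is small exactly when $T$ is close to the identity, which is the general mechanism by which a well-chosen transport map yields a useful coupling.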

Wasserstein distance and coupling optimization
For the moment, we will specialize to probability measures on $\mathbb{R}^n$. Let $\mu_1$ and $\mu_2$ be probability measures on $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{R}$ be a bounded 1-Lipschitz function with respect to the $\|\cdot\|_p$ norm for some $p \ge 1$. To be explicit, we require that $f$ is a function for which its optimal Lipschitz constant $K$, defined by
\[ K := \sup_{x \ne y} \frac{|f(x) - f(y)|}{\|x - y\|_p}, \]
satisfies $K \le 1$. This is a property which depends on the choice of norm, and restricts the class of allowed functions. Naturally, if $f$ is a function with $K > 1$, then we can apply the results below to the 1-Lipschitz function $\frac{1}{K} f$, and the conclusions for the original function $f$ will be the same up to the factor $K$, as long as the constant $K$ remains bounded in $n$. We have chosen to use the "1-Lipschitz" assumption in order to remove one, otherwise quite relevant, constant from the estimates.
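Suppose that $\gamma$ is a coupling of $\mu_1$ and $\mu_2$. Using the properties of probability measures, we have
\[ \begin{aligned} \big| \langle f \rangle_{\mu_1} - \langle f \rangle_{\mu_2} \big| &= \big| \langle f \circ P_1 - f \circ P_2 \rangle_\gamma \big| \\ &\le \big\langle | f \circ P_1 - f \circ P_2 | \big\rangle_\gamma \\ &\le \big\langle \| x_1 - x_2 \|_p \big\rangle_\gamma . \end{aligned} \]
On the last line, we have used the shorthand notation $x_i = P_i(x)$, $i = 1, 2$, for clarity. One should note that the coupling does not appear on the left-hand side of this inequality, and we are thus free to minimize the right-hand side with respect to all couplings $\gamma$. Since there always exists at least one coupling, given by the product coupling $\gamma = \mu_1 \otimes \mu_2$, and since the functions $f$ are bounded, the middle expression has a uniform upper bound for any coupling. Therefore,
\[ \big| \langle f \rangle_{\mu_1} - \langle f \rangle_{\mu_2} \big| \le \inf_\gamma \big\langle \| x_1 - x_2 \|_p \big\rangle_\gamma . \]
Naturally, we can swap the norm $\|\cdot\|_p$ for any cost function $c : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}_+$ with enough regularity, as long as we can relate the difference of the expectations somehow to the given cost function.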
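A simple worked instance of this estimate is obtained from a shift. The vector $a$ and the map $T$ below are illustrative choices, not objects used elsewhere in this paper; $f$ is any bounded 1-Lipschitz function with respect to $\|\cdot\|_p$.

```latex
% Illustrative example: mu_2 is mu_1 translated by a fixed vector a in R^n.
\begin{align*}
T(x) &:= x + a, \qquad \mu_2 := T_*\mu_1, \qquad
\langle g \rangle_\gamma := \int_{\mathbb{R}^n} \mu_1(\mathrm{d}x)\, g\big(x, x + a\big), \\
\big| \langle f \rangle_{\mu_1} - \langle f \rangle_{\mu_2} \big|
  &\le \big\langle \| x_1 - x_2 \|_p \big\rangle_\gamma
   = \int_{\mathbb{R}^n} \mu_1(\mathrm{d}x)\, \big\| x - (x + a) \big\|_p
   = \| a \|_p .
\end{align*}
```

The bound does not depend on $\mu_1$ at all. The product coupling would instead give $\langle \| x - y - a \|_p \rangle_{\mu_1 \otimes \mu_1}$, which in general does not vanish as $a \to 0$; this is why the choice of coupling matters.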
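For $p \ge 1$, define $\mathcal{P}_p(\mathbb{R}^n)$ to be the space of probability measures $\mu$ with finite $p$:th moments, i.e., those for which $\langle \|x\|_p^p \rangle_\mu < \infty$. Consider $\mu_1, \mu_2 \in \mathcal{P}_p(\mathbb{R}^n)$. Given also some $q \ge 1$, we denote the $p$-Wasserstein distance between $\mu_1$ and $\mu_2$ with respect to the $q$-norm by $W_{p;q}(\mu_1, \mu_2)$. Explicitly,
\[ W_{p;q}(\mu_1, \mu_2) := \Big( \inf_\gamma \big\langle \| x_1 - x_2 \|_q^p \big\rangle_\gamma \Big)^{1/p}, \]
and, since $\|x\|_q \le n^{1/q} \max_j |x_j|$, it is straightforward to check that then $W_{p;q}(\mu_1, \mu_2) < \infty$.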
The p-Wasserstein distance has been studied comprehensively and applied in a great variety of circumstances; examples and discussion are provided in [17]. However, for the purposes of this paper, we will be more interested in slightly modified cost functions which are similar in nature to the p-Wasserstein distances. The main drawback of many of the methods and papers associated with the Wasserstein distances is that the focus has been on the case where the dimension of the space n is fixed. In the context of statistical mechanics, we are typically interested in asymptotic properties for arbitrarily large n.

Definitions
For the purposes of this section and for the definition of the lattice model later, let us fix some shorthand notations first. Given $N \in \mathbb{N}$, we denote the collection of the first $N$ positive integers by
\[ [N] := \{1, 2, \ldots, N\}, \]
and we denote the group of permutations of its elements by $S_N$. Given a subset $I \subset [N]$ of length $n := |I|$, there is a unique bijection $\pi_I : I \to [n]$ which retains the order of the elements in the subsequence. We let $\tilde{\pi}_I \in S_N$ denote the extension of $\pi_I$ which is obtained by mapping the elements in $[N] \setminus I$ in an order-preserving manner onto the set $[N] \setminus [n]$. In addition, every bijection $\pi_I$ as above defines a projection $P_I : \mathbb{R}^N \to \mathbb{R}^n$ via the formula $(P_I x)_j := x_{\pi_I^{-1}(j)}$, $j \in [n]$. Analogously, given a permutation $\pi \in S_N$, the corresponding coordinate permutation will be denoted $Q_\pi : \mathbb{R}^N \to \mathbb{R}^N$; explicitly, we set $(Q_\pi x)_j := x_{\pi^{-1}(j)}$, $j \in [N]$ (note that using the inverse permutation in the formula results in a map which sends coordinate $i$ into coordinate $\pi(i)$).
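The index conventions are easiest to check on a small case. The values $N = 5$, $I = \{2, 5\}$ and the permutation $\pi$ below are illustrative choices only.

```latex
% Illustrative example: N = 5, I = {2, 5}, hence n = 2.
\begin{align*}
\pi_I(2) &= 1, \quad \pi_I(5) = 2,
\qquad (P_I x)_j = x_{\pi_I^{-1}(j)} \ \Longrightarrow\ P_I x = (x_2, x_5), \\
% Take pi in S_5 with pi(1) = 2, pi(2) = 3, pi(3) = 1, pi(4) = 4, pi(5) = 5.
\pi^{-1}(1) &= 3, \quad \pi^{-1}(2) = 1, \quad \pi^{-1}(3) = 2,
\qquad (Q_\pi x)_j = x_{\pi^{-1}(j)} \ \Longrightarrow\ Q_\pi x = (x_3, x_1, x_2, x_4, x_5).
\end{align*}
```

In $Q_\pi x$ the value $x_1$ sits in position $2 = \pi(1)$, in agreement with the remark that coordinate $i$ is sent into coordinate $\pi(i)$.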
Given $y \in \mathbb{R}$, there is a unique integer $k \in \mathbb{Z}$ for which $k \le y < k + 1$, and we denote this by using the "floor" notation, $k := \lfloor y \rfloor$. In particular, given $n, N \in \mathbb{N}$ such that $n \le N$ and setting $k = \lfloor N/n \rfloor$, we have $k \in \mathbb{N}$ and $k$ satisfies $kn \le N < (k+1)n$.
Definition 2.1 (Permutation invariance of measures on $\mathbb{R}^N$). Given $N \in \mathbb{N}$ and a probability measure $\mu$ on $\mathbb{R}^N$, we say that $\mu$ is permutation invariant if for every integrable function $f : \mathbb{R}^N \to \mathbb{R}$ and every permutation $\pi \in S_N$, we have $f \circ Q_\pi \in L^1(\mu)$ and
\[ \langle f \circ Q_\pi \rangle_\mu = \langle f \rangle_\mu . \]
Finally, instead of using a standard $p$-norm to measure distances in $\mathbb{R}^N$, we scale it suitably with $N$ so that the Wasserstein cost function becomes an average over particle labels. The benefits of this definition will become apparent in Sec. 2.3.
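Definition 2.2 (Specific $p$-norm fluctuation distance). Suppose $p \ge 1$ and $N \in \mathbb{N}$. Let $\mu_1$ and $\mu_2$ be two Radon probability measures on $\mathbb{R}^N$ such that the $p$:th moments under both measures are finite. Their specific $p$-norm fluctuation distance $w_p$ is then defined as
\[ w_p(\mu_1, \mu_2) := \Big( \inf_\gamma \Big\langle \frac{1}{N} \| x_1 - x_2 \|_p^p \Big\rangle_\gamma \Big)^{1/p}, \]
where the infimum is taken over all couplings $\gamma$ of $\mu_1$ and $\mu_2$.
Clearly, this definition relates to the standard $p$-norm Wasserstein distance mentioned earlier via a scaling:
\[ w_p(\mu_1, \mu_2) = N^{-1/p}\, W_{p;p}(\mu_1, \mu_2). \]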
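The effect of the scaling can be seen in a simple example. The sketch assumes the normalization $w_p = N^{-1/p} W_{p;p}$, with $W_{p;p}$ the $p$-Wasserstein distance with respect to the $p$-norm; the uniform shift by $a \in \mathbb{R}$ is an illustrative choice.

```latex
% Illustrative example: mu_2 is mu_1 shifted by the same amount a in every coordinate.
\begin{align*}
T(x) &:= x + a\,(1, 1, \ldots, 1), \qquad \mu_2 := T_*\mu_1, \\
% Cost of the coupling induced by T:
\big\langle \| x_1 - x_2 \|_p^p \big\rangle_\gamma
  &= \sum_{j=1}^{N} |a|^p = N |a|^p, \\
% Resulting bounds for the two distances:
W_{p;p}(\mu_1, \mu_2) &\le N^{1/p} |a|,
\qquad
w_p(\mu_1, \mu_2) \le |a| .
\end{align*}
```

The unscaled bound grows with the number of particles even though every particle moves by the same fixed amount, whereas the bound for $w_p$ is uniform in $N$.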

The direct coupling method
To highlight the benefits of the above definitions, we offer the following fundamental lemma which will be used to prove the main theorems of this paper. It should be stressed that the key assumption is to specialize to permutation invariant measures. We aim to consider local expectations, i.e., $\langle F \rangle$ for functions $F : \mathbb{R}^N \to \mathbb{R}$ which depend only on the components $x_i$, $i \in I$, where $I \subset [N]$ can be otherwise arbitrary but has a bounded size, i.e., $|I|$ remains bounded when $N \to \infty$. In particular, note that then there is some $f : \mathbb{R}^{|I|} \to \mathbb{R}$ such that $F = f \circ P_I$.
Proof. For the proof, set $n := |I|$ and $k := \lfloor N/n \rfloor$, so that $k \in \mathbb{N}$ and $k$ satisfies $kn \le N < (k+1)n$. We define the sets $I_i$, $i \in [k]$, by setting $I_1 := I$ and, for $i > 1$, we proceed inductively by
For any $i$, there is a permutation in $S_N$ which is a bijection between $I_i$ and $I$. Thus, by the assumed permutation invariance of the measures, we have $\langle f \circ P_{I_i} \rangle = \langle f \circ P_I \rangle$ for either measure and all $i$. Therefore,
Suppose then that $\gamma$ is a coupling between $\mu_1$ and $\mu_2$. Then $\langle f \circ P_{I_i} \rangle_{\mu_j} = \langle f \circ P_{I_i} \circ P_j \rangle_\gamma$ for both $j = 1, 2$. Again resorting to the shorthand notations $x_j := P_j x$, we can rewrite
The absolute value of this expression can now be estimated using the assumed 1-Lipschitz property of $f$. Combining the results and using the triangle inequality, we thus obtain
where in the last step we have used Hölder's inequality. Since the sets $I_i$ are disjoint, here
Because the left-hand side of the above estimate does not depend on the coupling $\gamma$, we can take the infimum over all possible couplings. Then, using the relation between $k$ and $n$ stated in the beginning of the proof, we obtain
as desired.
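The averaging step of this proof can be spelled out as follows. This is a sketch of one way to carry it out, assuming the normalization $w_p(\mu_1,\mu_2)^p = \inf_\gamma \langle \frac{1}{N} \|x_1 - x_2\|_p^p \rangle_\gamma$, pairwise disjoint sets $I_1, \ldots, I_k \subset [N]$ of size $n$ with $I_1 = I$, and a bounded function $f$ that is 1-Lipschitz with respect to $\|\cdot\|_p$; the constant in the final bound need not be arranged in the same way as in the lemma itself.

```latex
% gamma is an arbitrary coupling of the permutation invariant measures mu_1 and mu_2.
\begin{align*}
\big| \langle f \circ P_I \rangle_{\mu_1} - \langle f \circ P_I \rangle_{\mu_2} \big|
  &= \Big| \frac{1}{k} \sum_{i=1}^{k}
     \big\langle f(P_{I_i} x_1) - f(P_{I_i} x_2) \big\rangle_\gamma \Big| \\
  % 1-Lipschitz property and linearity of P_{I_i}:
  &\le \frac{1}{k} \sum_{i=1}^{k}
     \big\langle \| P_{I_i}(x_1 - x_2) \|_p \big\rangle_\gamma \\
  % Hoelder (Jensen) for the average over i and over gamma:
  &\le \Big( \frac{1}{k} \sum_{i=1}^{k}
     \big\langle \| P_{I_i}(x_1 - x_2) \|_p^p \big\rangle_\gamma \Big)^{1/p} \\
  % Disjointness: sum_i ||P_{I_i} y||_p^p <= ||y||_p^p.
  &\le \Big( \frac{N}{k} \Big)^{1/p}
     \Big\langle \frac{1}{N} \| x_1 - x_2 \|_p^p \Big\rangle_\gamma^{1/p} .
\end{align*}
% Taking the infimum over gamma and using N/k < (k+1)n/k <= 2n:
\[
\big| \langle f \circ P_I \rangle_{\mu_1} - \langle f \circ P_I \rangle_{\mu_2} \big|
  \le \Big( \frac{N}{k} \Big)^{1/p} w_p(\mu_1, \mu_2)
  \le (2n)^{1/p}\, w_p(\mu_1, \mu_2) .
\]
```

The gain from permutation invariance is visible in the last two lines: the $N$-dimensional transport cost is shared among $k \approx N/n$ copies of the local observable, so only the per-particle cost $w_p$ enters the bound.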
The first Lemma concerned bounds on local observables which were bounded 1-Lipschitz functions. This next variant of the Lemma concerns estimation of arbitrary finite moments.
For any $x \in \mathbb{R}^N$, we then let $x^J$ denote the power
Assuming also $n_J \le p_0 + 1 - \frac{p_0}{p}$, it follows that
Since $1 \le n_J \le p_0$, the assumptions guarantee that $x^J$ is integrable with respect to both $\mu_1$ and $\mu_2$. On the other hand, $n_J \le p_0 + 1 - \frac{p_0}{p}$ implies $q(n_J - 1) \le p_0$, so that also $M(J, p) < \infty$.
First, note that for $x, y \in \mathbb{R}^N$, we have
There are $n_J$ factors in each of the products under the sum. Thus, for any coupling $\gamma$ between $\mu_1$ and $\mu_2$ and, for simplicity, replacing $x_1, x_2$ by $x, y$, we find using the generalized Hölder's inequality
where $q' := q(n_J - 1)$, so that indeed $\frac{1}{p} + (n_J - 1)\frac{1}{q'} = 1$, as required by Hölder's inequality. Apart from the first term, the remaining $n_J - 1$ terms are all bounded by $M(J, p)$. Therefore,
where Hölder's inequality has been used in the second step. Here, even if there are repetitions in the sequence $J$, we have
To finish the proof, one should notice that the label subset $I$ which appears in this theorem can be regarded in the same way as in the proof of Lemma 2.3. Using the assumed permutation invariance to clone the labels yields collections of subsequences $J(\ell)$ and subsets $I_\ell$ for $\ell \in [k]$, where $k := \lfloor N/n \rfloor$, $n := |I|$. Since $\langle x^J \rangle_{\mu_i} = \langle x^{J(\ell)} \rangle_{\mu_i}$ by construction, we find using permutation invariance that
as desired.

Free energy method
By applying Lemma 2.3, we are now able to produce two distinct types of coupling proofs which concern the ensembles discussed in the introduction.
Theorem 2.5. Let µ ε,ρ;N MC be a permutation invariant probability measure corresponding to a microcanonical ensemble with energy density ε and particle density ρ. In addition, assume that if we fix a possible energy density ε ′ , then for any other possible energy density ε there exists a constant C(ε, ρ) > 0 independent of ε ′ and N , but possibly dependent on ε and ρ, such that Suppose also that the microcanonical and canonical measures, for some parameter β, have finite p:th moments.
Fix n < ∞ and consider any I ⊂ [N ] with |I| = n. Let f : R |I| → R be a bounded 1-Lipschitz function with respect to the || · || p norm. Then where the canonical standard deviation of energy density reads explicitly Using the notation of the specific free energies, the same result can be rewritten as Applying Lemma 2.3 together with the assumptions of this theorem, we thus obtain where the first term on the right hand side does not depend on ε ′ , we obtain by Hölder's inequality an estimate as desired. Then we use the generic properties listed in Sec. 1.1 to express the result in terms of the canonical free energy.
Following the theme of the direct coupling method, the approach can also be applied to the case of finite moments.
Theorem 2.6. Let µ ε,ρ;N MC be a permutation invariant probability measure corresponding to a microcanonical ensemble with energy density ε and particle density ρ. In addition, assume that if we fix a possible energy density ε ′ then for any other possible energy density ε there exists a constant C(ε, ρ) > 0 independent of ε ′ and N , but possibly dependent on ε and ρ such that Suppose also that the microcanonical and canonical measures, for some parameter β, have finite p 0 :th moments for some p 0 ≥ p. Let J be a finite sequence of elements in [N ] where elements may be repeated, let n J := |J|, and suppose that $n_J \le p_0 + 1 - \frac{p_0}{p}$. Collect into I ⊂ [N ] the elements occurring in the sequence. It follows that where, using the dual exponent $q = \frac{p}{p-1}$,
Proof. The proof is almost identical to the proof of the previous theorem. In order to isolate the moments of the canonical ensemble, one needs an additional application of Hölder's inequality.
For suitable ensembles, these theorems together imply that with bounded moments, one can achieve an explicit rate of convergence of the finite dimensional moments and marginals of the ensembles.

Simple model of a paramagnet
In this section, we will consider a simple model of a paramagnet discussed in [11]. We begin by defining the magnetization of the lattice system and the two probability measures on the lattice that we will consider. Furthermore, we define the magnetization density m N as the magnetization divided by N . In the following definitions, there will be a slight abuse of the ensemble terminology. We will refer to the fixed magnetization probability measures as an auxiliary microcanonical ensemble, and the probability measure with a parameter controlling the expectation of magnetization will be referred to as an auxiliary canonical ensemble. Let µ ∈ R. The auxiliary canonical ensemble with magnetic potential µ is defined via its action on functions f : S → R by The second representation is called the magnetization representation of this ensemble.
We will also need the following standard "thermodynamic" properties of these ensembles. We have compiled them in the following lemma.
Proof. The calculation of the auxiliary microcanonical partition function is based on the fact that the number of positive spins in a field configuration fully defines the total magnetization of the configuration, and, as a result, one only needs to consider the number of configurations with a specific number of positive spins. The final equality follows from the representation of the beta function after opening up the combination and subsequent factorials. The rest of the results concerning the auxiliary canonical ensemble follow by first computing the partition function, by noting that the structure of the measures is that of a product measure. Then we can differentiate the free energy with respect to µ and divide appropriately by the degrees of freedom N .
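The counting argument and the product structure used in this proof can be checked by brute force on a small lattice. The sketch below assumes that the auxiliary canonical weight of a configuration with total magnetization M is exp(−µM); this sign convention is an assumption, chosen because it reproduces the relation m = −tanh µ that appears in the text. All function names are hypothetical.

```python
import math
from itertools import product


def microcanonical_count(N, M):
    """Number of configurations in {-1, +1}^N with total magnetization M.

    The magnetization is fixed by the number of positive spins, (N + M) / 2.
    """
    if abs(M) > N or (N + M) % 2 != 0:
        return 0
    return math.comb(N, (N + M) // 2)


def canonical_partition(N, mu):
    """Partition function for the assumed weight exp(-mu * M).

    The measure factorizes over sites, giving (2 cosh mu)^N.
    """
    return (2.0 * math.cosh(mu)) ** N


def brute_force(N, mu):
    """Enumerate all 2^N configurations; return (Z, mean magnetization density)."""
    Z = 0.0
    weighted_M = 0.0
    for phi in product((-1, 1), repeat=N):
        M = sum(phi)
        w = math.exp(-mu * M)
        Z += w
        weighted_M += M * w
    return Z, weighted_M / (Z * N)
```

Under the stated assumption, the enumeration agrees with the closed form (2 cosh µ)^N, and the mean magnetization density equals −tanh µ, which is also what differentiating the free energy with respect to µ and dividing by N gives.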
Next, we are going to present two distinct methods with which to compute upper bounds for the rate of convergence of expectations of functions between these two probability measures.

Relative entropy
We will, again, follow the presentation of this topic given in [11]. We begin with the definition of relative entropy. Definition 3.5 (Relative entropy). Let λ 1 and λ 2 be two probability measures on a space X. If λ 1 is not absolutely continuous with respect to λ 2 , the relative entropy H(λ 1 ||λ 2 ) = ∞. If λ 1 is absolutely continuous with respect to λ 2 , then we have For the paramagnet, we have the following calculation of the relative entropy.  Proof. We have .
Define the function $f(t) := \frac{1+m}{2}\ln t + \frac{1-m}{2}\ln(1-t)$. It can be shown by differentiation that the mapping f is strictly concave and thus attains its unique maximum on the interval (0, 1). This maximum is attained at the point $t_0 := \frac{1+m}{2} \in (0, 1)$. Therefore, Recalling the definition of F (m, µ) in (3.2), we conclude that By computation, if m ∈ (−1, 1) and µ ∈ R are such that m = − tanh µ, then $\mu = \frac{1}{2}\ln\frac{1-m}{1+m}$ and thus F (m, µ) = 0. For fixed m, by differentiation we find that µ → F (m, µ) is a strictly convex function with a minimum at $\mu = \frac{1}{2}\ln\frac{1-m}{1+m}$. Hence, this value of µ is the only zero of the function.
To get a useful lower bound, we need to inspect the values of the function f more carefully.
and since f ′ (t 0 ) = 0, we find from Taylor's theorem that there is K > 0 which depends only on m via t 0 such that for the above choices of t, which satisfy |t − t 0 | ≤ δt 0 , Therefore, by the positivity of the integrand, we obtain Consider then arbitrary $p \in (0, \frac{1}{2})$, and set δ : . Therefore, there exists N 0 which depends only on m and p such that for all N ≥ N 0 , the lower bound is greater than $\frac{1}{N}\left(\frac{1}{2} - 2p\right)\ln(N+1)$. If η < 1 we can set p = η/2, and the above estimates together thus prove (3.1) for all N ≥ N 0 . The lower bound is trivial if η ≥ 1. This concludes the proof of the Proposition.
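The logarithmic growth of the relative entropy can be observed numerically. The sketch below is not the paper's computation: it assumes the auxiliary canonical weight exp(−µM) (so that m = −tanh µ, as in the text) and evaluates the relative entropy of the uniform fixed-magnetization measure with respect to the product measure directly as N ln(2 cosh µ) + µM − ln C(N, N₊), where N₊ is the number of positive spins. The function names are hypothetical.

```python
import math


def log_binom(N, k):
    """Natural logarithm of the binomial coefficient C(N, k)."""
    return math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)


def relative_entropy(N, n_plus):
    """Relative entropy of the fixed-magnetization measure (uniform on
    configurations with n_plus positive spins) with respect to the product
    measure with assumed weight exp(-mu * M), where mu solves -tanh(mu) = M / N.

    Requires 0 < n_plus < N so that the magnetization density lies in (-1, 1).
    """
    M = 2 * n_plus - N
    m = M / N
    mu = 0.5 * math.log((1.0 - m) / (1.0 + m))
    return N * math.log(2.0 * math.cosh(mu)) + mu * M - log_binom(N, n_plus)
```

For the values tried (for example N = 10^4 at m = 0 and N = 10^6 at m = 1/2), the total relative entropy is close to half of ln N, so the specific relative entropy behaves like ln(N)/(2N), in line with the logarithmic factor discussed in the Proposition.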

Coupling
For the use of the coupling method, the previous thermodynamic calculations concerning the auxiliary canonical ensemble will be needed. The additional ingredient is the explicit coupling for certain observables presented in the following theorem.
By construction, we have In the other direction, if we fix φ ′ with magnetization M ′ and consider the number of ways to go to a field configuration φ which agrees on the positive lattice sites of φ, then clearly we must take ∆ = M ′ −M . Now, we have the following simple binomial coefficient manipulations This verifies that γ is indeed a coupling between the fixed magnetization density ensembles with different magnetizations m and m ′ . For such a coupling, by construction, we have from which it follows that On the other hand, if η is any other coupling of µ m;N MC and µ m ′ ;N MC , we also have This implies that the coupling γ is an optimal coupling, and we have This completes the proof assuming M ′ > M , and hence, by symmetry, also the proof of the Theorem.
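A coupling of this type can be sketched concretely. The exact definition of γ is not legible in this text, so the construction below is one natural reading and should be treated as an assumption: draw φ uniformly with magnetization M, then flip (M′ − M)/2 uniformly chosen negative spins to obtain φ′. The function names are hypothetical. The exhaustive enumeration confirms, for small N, that the second marginal is uniform over configurations with magnetization M′.

```python
import math
import random
from collections import Counter
from itertools import combinations


def flip_coupling_sample(N, M, M_prime, rng):
    """Sample a pair (phi, phi') with phi uniform at magnetization M and phi'
    obtained by flipping (M' - M) / 2 uniformly chosen negative spins of phi.

    Assumes M' >= M and that N + M and N + M' are even.
    """
    n_plus = (N + M) // 2
    plus_sites = set(rng.sample(range(N), n_plus))
    phi = [1 if i in plus_sites else -1 for i in range(N)]
    minus_sites = [i for i in range(N) if i not in plus_sites]
    phi_prime = list(phi)
    for i in rng.sample(minus_sites, (M_prime - M) // 2):
        phi_prime[i] = 1
    return phi, phi_prime


def second_marginal_counts(N, M, M_prime):
    """Count, over all (phi, flipped set) pairs, how often each phi' occurs.

    phi' is identified by its sorted tuple of positive sites.
    """
    n_plus = (N + M) // 2
    d = (M_prime - M) // 2
    counts = Counter()
    for plus in combinations(range(N), n_plus):
        minus = [i for i in range(N) if i not in plus]
        for flipped in combinations(minus, d):
            counts[tuple(sorted(plus + flipped))] += 1
    return counts
```

In every sampled pair the two configurations differ at exactly (M′ − M)/2 sites, so their ‖·‖₁ distance equals M′ − M, which is also the smallest value any pair with these two magnetizations can have.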

Convergence of local observables
In this subsection, we will present two distinct proofs of the convergence of local observables based on the two previously introduced objects. The first proof will be based on utilizing the Pinsker inequality and its relationship with relative entropy. This type of argument can be found in [7]. The argument uses the information divergence related methods from [5] in which a statement concerning the relationship between weak convergence and relative entropy is also given. We must also emphasize that whatever we refer to here as the relative entropy method is precisely the collection of arguments and theorems that will be presented shortly. We are not stating that the classical inequalities could not be, for instance, strengthened or leveraged with other theorems in order to produce better results. For an example of this sort of work, we refer to [3].

Relative entropy method
Again, we will follow the example of [11]. We have the following theorem. Proof. The following proof can be considered a sketch of the standard argument presented for the same model in [11].
Since f is a bounded function, and the measures are absolutely continuous with respect to each other, we have By Pinsker's inequality, we have Now, let N ≥ |I| and let I k ⊂ Λ be disjoint copies of size |I| on the lattice such that $\Lambda \subset \bigcup_{k=1}^{K} I_k$ and $\bigcup_{k=1}^{K-1} I_k \subset \Lambda$. It follows that (K − 1)|I| ≤ |Λ| ≤ K|I| which implies that $\frac{1}{K} \le \frac{|I|}{N}$. Now, by utilizing the fact that $\mu_C^{\mu;N}$ is a product measure, and that both measures are permutation invariant, it follows that The statement follows by combining these calculations.
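As a small numerical illustration of the Pinsker step, the following sketch compares the total variation distance and the relative entropy of two discrete distributions. It uses the normalization in which the total variation distance is half of the ℓ¹ distance, so that Pinsker's inequality reads TV ≤ sqrt(H/2); the normalization used in the paper may differ by a constant factor. The function names are hypothetical.

```python
import math


def total_variation(p, q):
    """Total variation distance, normalized as half of the l1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def relative_entropy_discrete(p, q):
    """H(p || q) = sum_i p_i ln(p_i / q_i); infinite if p is not
    absolutely continuous with respect to q."""
    h = 0.0
    for a, b in zip(p, q):
        if a > 0.0:
            if b == 0.0:
                return math.inf
            h += a * math.log(a / b)
    return h


def expectation_gap_bound(f_sup, p, q):
    """Bound on |E_p f - E_q f| for |f| <= f_sup: 2 f_sup TV <= f_sup sqrt(2 H)."""
    return f_sup * math.sqrt(2.0 * relative_entropy_discrete(p, q))
```

With p = (0.5, 0.3, 0.2) and q = (0.4, 0.4, 0.2), the total variation distance is 0.1 while sqrt(H/2) is slightly larger, as the inequality requires.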
We give the full convergence rate in the following corollary. Here, and in the following, we employ the standard rigorous definition of the "O"-notation: given g(N ) ≥ 0 for N ∈ N, "X(N ) = O(g(N ))" refers to the limit N → ∞, i.e., it means there is an N -independent constant C and some N 0 ∈ N such that |X(N )| ≤ Cg(N ) for all N ≥ N 0 . However, in these results, the constant C is allowed to depend on possible other parameters of the setup: for example, no uniformity of C in the parameters m, |I| or ‖f‖∞ is claimed below.
By applying the relative entropy method, we have The factor of log(N ) cannot be removed by using this specific inequality since if we choose $\eta = \frac{1}{4}$, there exists a cutoff N (m) ∈ N such that for N ≥ N (m), we have
Proof. The corollary follows directly by combining the contents and bounds from Proposition 3.6 and Theorem 3.8.

Coupling method
The main theorems formulated in the coupling section concern 1-Lipschitz functions with respect to some norm ||·|| p . Since the domain set is finite, all functions f : {−1, 1} |I| → R are automatically Lipschitz functions with respect to all of these norms. The choice of using p = 1 norm below is partially a matter of convenience, due to equivalence of the finite set p-norms, but one should be careful in the application of the result if the size of the set I is allowed to become unbounded as N → ∞. We recall from Sec. 2.1.2 that the optimal Lipschitz constant does depend on the choice of norm, and it will affect the overall constant in the bounds, unless scaled to one, as we require here.
We can now state the full convergence theorem. We continue to use the notations introduced before Corollary 3.9, i.e., "$A(N) = B(N) + O(N^{-1/2})$" here means that there exists N 0 ∈ N and
Proof. The result follows by applying the free energy method presented in Theorem 2.5, along with the w 1 fluctuation distance bound presented in Theorem 3.7, and the equations in Lemma 3.4.
For the auxiliary microcanonical ensemble with fixed magnetization density, the w 1 choice of cost function is natural since the || · || 1 -norm satisfies

Relationship to the Curie-Weiss model
Let us first recall the Curie-Weiss Hamiltonian.
This relation leads to a simplification when studying the microcanonical ensemble of the Curie-Weiss model, defined as follows. We will always use the lower case letter m to specify fixed magnetization densities introduced in Definition 3.2, and ε for fixed energy densities so that there is no ambiguity.
In some sense, the fixed energy ensemble for some values of J and h is not necessarily fundamental as it can be represented as a convex combination of fixed magnetizations. The energy density can be written in terms of the magnetization density as from which it is clear that for some values of h and J there are multiple magnetization densities which give the same energy density. The following lemma makes the previous statements more quantitative.
with the convention that f m;N from which the statement follows.
From the previous lemma, it is apparent that if one has knowledge of the auxiliary microcanonical partition function and local observables of the fixed magnetization ensemble, then, in principle, one has a full description of the weak convergence properties of corresponding fixed energy density ensemble. In the later sections, we will give an example of this exact kind of analysis with the mean-field spherical model.
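The many-to-one relation between magnetization density and energy density mentioned above can be made explicit in a short sketch. The formula for the energy density is not legible in this text, so the code assumes the standard Curie-Weiss form ε(m) = −(J/2)m² − hm with J > 0; this choice is consistent with the threshold h²/(2J) and the degenerate value m = −h/J that appear later in the paper. The function names are hypothetical.

```python
import math


def energy_density(m, J, h):
    """Assumed Curie-Weiss energy density as a function of magnetization density."""
    return -0.5 * J * m * m - h * m


def magnetization_roots(eps, J, h):
    """Return (m_minus, m_plus) with energy_density(m) == eps, or None if the
    energy density eps exceeds h^2 / (2 J) and no real solution exists.

    Assumes J > 0. The two roots coincide at -h / J when eps == h^2 / (2 J).
    """
    disc = (h / J) ** 2 - 2.0 * eps / J
    if disc < 0.0:
        return None
    r = math.sqrt(disc)
    return (-h / J - r, -h / J + r)
```

For h = 0 the two roots are ±sqrt(−2ε/J) and have equal modulus, whereas for h ≠ 0 their moduli differ; this is the distinction that matters when the fixed energy ensemble is decomposed into fixed magnetization ensembles.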

Final remarks and comparison of methods
From the results in the previous subsections, we can see that the coupling method generates a strictly better convergence rate by removing the factor of log(N ) which is an irremovable part of the magnitude of specific relative entropy in this case. For the relative entropy method, the primary object of interest is the calculation of the specific relative entropy. As can be seen, the selection of the magnetization density m and the parameter µ comes down to solving an equation which relates these parameters. Once the pair has been realized from this equation, we obtain the desired upper bound. In our calculation, the partition functions of both the auxiliary microcanonical ensemble and auxiliary canonical ensemble needed to be estimated sufficiently accurately (other more refined approaches for using relative entropy may be found in [12]).
For the coupling method, one needed to come up with the coupling of the auxiliary microcanonical ensembles resulting in Theorem 3.7. In addition, one uses the standard "thermodynamic" relations for the auxiliary canonical ensemble, given in Lemma 3.4. Because the auxiliary canonical ensemble was in product form, the calculations were particularly simple.
From these observations, the main difference between the methods concerns the treatment of the auxiliary microcanonical ensemble. In the above direct relative entropy computation, we need to calculate the partition function of the auxiliary microcanonical ensemble and the auxiliary canonical ensemble, but the "thermodynamic" relations of the auxiliary canonical ensemble do not seem important. For the coupling method, the auxiliary microcanonical partition function does not play the same role, and the coupling is the most important object along with the "thermodynamic" relations from the auxiliary canonical ensemble.

Mean-field spherical model aka Continuum Curie-Weiss model
For this model, we will need to clarify the goals and priority of the limiting measures for the local convergence result. Our main goal is to analyse explicitly the probability measure associated with the microcanonical ensemble. To that end, there will be some local convergence results in which thermodynamic equivalence might hold between two ensembles, but we will opt for a simpler auxiliary ensemble measure as the approximating measure.
In particular, for this model, we will see that the simplest limit measure to consider will be either a Gaussian measure or a convex combination of Gaussian measures. We will also prove some local convergence results where the limiting measure is not a product measure.
In the second model considered here, the "spin-field" φ is allowed to take all real values otherwise being similar to the discrete Curie-Weiss model.
We also define the magnetization M : S → R by In this model, the particle number function N [·] is much more relevant than in the discrete case. For this Hamiltonian, we will need to consider probability measures described by products of delta functions. To properly resolve them, we begin with an observation concerning a matrix relevant to the definitions of the ensembles. In the following, we employ the notation M N (R) for the collection of real N × N matrices. Writing out the above matrix multiplication componentwise explicitly, we find for all i, j that $N Q_{1i} Q_{1j} = 1$.
In particular, then $|Q_{1i}| = \frac{1}{\sqrt{N}}$ for all i ∈ [N ], and thus for each i there is σ i ∈ {±1} such that $Q_{1i} = \sigma_i \frac{1}{\sqrt{N}}$. Using a proof by contradiction, one can see that, in fact, the elements Q 1i must either all be negative or all be positive. Now, define U ∈ M N (R) by U := −Q if the elements Q 1i are all negative, and U := Q if the elements Q 1i are all positive. It follows that U is an orthogonal matrix and, by definition, we have This completes the proof of the Lemma.
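An orthogonal matrix whose first row has all entries equal to 1/√N can also be written down explicitly. The sketch below is not the construction used in the proof: it uses a Householder reflection that maps the first basis vector to the normalized constant vector, which is merely one convenient choice, and the function name is hypothetical. With such a U, the first component of Uφ equals the sum of the components of φ divided by √N, while the Euclidean norm of φ is preserved.

```python
import math


def orthogonal_with_uniform_first_row(N):
    """Householder reflection U = I - 2 v v^T / (v^T v) with v = e_1 - u,
    where u = (1, ..., 1) / sqrt(N).

    U is symmetric and orthogonal and satisfies U e_1 = u, so its first row
    (equal to its first column) is u.
    """
    u = [1.0 / math.sqrt(N)] * N
    v = [-x for x in u]
    v[0] += 1.0
    vv = sum(x * x for x in v)
    if vv == 0.0:
        # N = 1: e_1 already equals u, so the identity works
        return [[1.0]]
    return [
        [(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(N)]
        for i in range(N)
    ]
```

Since the left hand sides of the δ-function identities do not depend on the choice of U, any matrix of this kind can be used in numerical checks.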
Next, we will give two examples of how to apply δ-function calculation rules to resolve the ones relevant to the Curie-Weiss system, for both microcanonical and canonical ensembles. It is possible to prove the validity of these manipulations under the assumptions made in the Examples, for instance, following the discussion in Appendix A of [13].
Here we have first made a change of variables to (z, ψ) = U φ and then used spherical coordinates system to integrate out the resulting δ-functions. Since the left hand side does not depend on the choice of the matrix U , all choices must result in the same value for the integral on the right hand side.
to conclude that We will utilize these forms for more explicit definitions of the ensembles and in the proof concerning the boundedness of moments of the microcanonical ensembles.

Microcanonical analysis
From here on, whenever the mapping U is present, we are always referring to the mapping U defined by a matrix satisfying Lemma 4.2. We fix the choice of this matrix in the following. We begin with definitions of the two ensembles, related to fixed magnetization and to fixed energy.
Definition 4.6 (Fixed energy density and particle density/microcanonical ensemble). Let ρ > 0, and ε, m + , and m − be as in Example 4.4, in particular, assume $\varepsilon < \frac{h^2}{2J}$ and $m_-^2, m_+^2 < \rho$. The microcanonical ensemble with energy density ε and particle density ρ > 0 is then defined via its action on bounded 1-Lipschitz functions f : S → R by If $\varepsilon < \frac{h^2}{2J}$ but $\min(m_-^2, m_+^2) < \rho \le \max(m_-^2, m_+^2)$, we set f One can indeed verify by using the calculations in Examples 4.3 and 4.4 that these measures correspond to δ-function definitions, resolved in the manner used in the Examples. The second definition serves as an explanation of the choice of multiplicative constants in the definition of the microcanonical partition function and specific entropy. In the degenerate case $\varepsilon = \frac{h^2}{2J}$, we would have above $m_+ = m_- = -\frac{h}{J} = m$, and in this case the δ-function definition in Example 4.4 does not really make sense since it would contain a singular term $\delta\big((z - m\sqrt{N})^2\big)$. Following these observations, we define the fixed energy microcanonical ensemble via the corresponding fixed magnetization ensemble.
Note that in addition to values of (ε, ρ) for which there are no solutions to the constraints, we have also left undefined the degenerate energy ensembles for which $\varepsilon \le \frac{h^2}{2J}$ but $\min(m_-^2, m_+^2) = \rho$, as well as the degenerate magnetization ensembles with m 2 = ρ. In these cases, the dimensionality of the solution manifold does not increase with N since all solutions have ψ = 0. As such, the resulting degenerate ensemble does not have standard thermodynamic behaviour.
We begin by estimating the fluctuation distance of two fixed magnetization ensembles by constructing a suitable transport map between them.
In the previous theorem, the fluctuation distance is seemingly bounded asymmetrically with respect to the magnetization densities m and m ′ . By symmetry, the bound holds for either choice, and thus a symmetric bound can also be straightforwardly derived. The reason for the asymmetric choice is that while using the fluctuation distance, we will always consider one of the magnetization densities to be fixed.
To study the fixed energy ensembles, we begin with a Lemma which implies that, for h ≠ 0, one of the fixed magnetization measures dominates in the fixed energy ensemble.
Proof. If we consider the mapping m → Z MC (m, ρ; N ), then it is clear that Z MC (m, ρ; N ) ≥ Z MC (m ′ , ρ; N ) for all |m| ≤ |m ′ | < √ ρ. Now, note that The result follows by plugging in the values in the partition function.
With the above computations, we can also now give a simpler definition of the set of allowed energies, i.e., of those values of ε for which the fixed energy ensemble is defined using Definition 4.6.
Definition 4.9. For h ∈ R and ρ > 0, we define the set of possible energy densities E h,ρ by We remark that for all of the above h, ρ the set E h,ρ contains ε = 0 and an interval of negative values of ε. In particular, E h,ρ is non-empty. Also, in case h = 0, we have $E_{0,\rho} = \left(-\frac{\rho J}{2}, 0\right]$. where In addition, The results follow since the term inside the absolute values on the right hand side in (4.1) is strictly less than one.
If h = 0, there exists a suitable coupling which can be constructed from the couplings used for the fixed magnetization ensembles.
Suppose first that ε, ε ′ < 0 and let m ± and m ′ ± be corresponding positive and negative magnetization densities to ε and ε ′ , respectively. To obtain a transport map, we define T : Now, let U ∈ M N (R) be the same unitary mapping as before. By setting T ′ := U −1 • T • U and going through the same calculations as earlier, one can confirm that f • T ′ ε,ρ;N for all observables f . Thus T ′ is a transport map and the associated coupling yields a bound The bound is also trivially true if ε = ε ′ = 0. Combining the above estimates proves the statement in the Theorem.

Canonical analysis
Compared to the regular Curie-Weiss model from [10], the canonical ensemble is somewhat simpler to analyse. In particular, one should note that when applying the identity which is sometimes referred to as Gaussian linearization, to solve the partition function of the regular Curie-Weiss model, we are effectively adding an extra variable over which to integrate when using Laplace's method. To this end, in order to avoid multi-dimensional Laplace analysis, we can forego the use of Gaussian linearization, and use a direct 1-dimensional Laplace method. In the following definitions and calculations, there will be a significant difference in the treatment of the models depending on whether we are dealing with h = 0 or h ≠ 0. In the previous section, for h = 0, we constructed an explicit coupling between the fixed energy density ensembles. In the case of h ≠ 0, it will turn out that the coupling between the fixed magnetization density ensembles is the more important object of study. This phenomenon seems to be closely related to the phase transitions in the mean-field spherical model which are thoroughly presented and analysed in [9].
We will define the canonical ensembles with the help of the microcanonical ensembles.
Definition 4.12. (Fluctuating magnetization and fixed particle density/auxiliary canonical ensemble) Let ρ > 0 and µ ∈ R. The auxiliary canonical ensemble with magnetic potential µ and particle density ρ is defined via its action on bounded 1-Lipschitz functions f : S → R by where we define the auxiliary canonical partition function by and the specific auxiliary canonical free energy by The case h ≠ 0 is taken care of by the previous definition. We will refer to the special case of h = 0 as the fluctuating energy density ensemble. One can verify by formal calculations that this corresponds to the typical definition of an ensemble with a fixed average constraint. We have overloaded the notation here similarly to the previous section: for example, the functions Z C (µ, ρ; N ) and Z C (β, ρ; N ) are different, but the name of the first parameter will uniquely determine to which we refer in the following. Proceeding as before, we first present the asymptotics of the derivatives of the partition function.
Theorem 4.14. Let ρ > 0 and µ ∈ R. Define ψ µ,ρ : Employing the shorthand notation Furthermore, if we fix ρ > 0, then for every µ ∈ R there exists m ∈ (− √ ρ, √ ρ) such that ψ µ,ρ is minimized at m, and, for every m ∈ (− √ ρ, √ ρ), there exists µ ∈ R such that ψ µ,ρ is minimized at m. The following asymptotics hold
Proof. The first part of the theorem follows directly by differentiating the specific free energies with respect to µ and dividing by the degrees of freedom N appropriately. Next, for fixed ρ > 0, we compute It follows that the map ψ µ,ρ is strictly convex for all µ ∈ R, and we can check that there is a unique global minimum at m ∈ (− √ ρ, √ ρ) which satisfies ψ ′ µ,ρ (m) = 0. First, if µ = 0, then clearly the minimizing m = 0. If µ ≠ 0, we have Next, if µ > 0, then $\frac{1}{2\mu} + \sqrt{\left(\frac{1}{2\mu}\right)^2 + \rho} > \sqrt{\rho}$, and thus the minimizing m must be the other root of ψ ′ µ,ρ (m) = 0. Similarly, if µ < 0, then $\frac{1}{2\mu} - \sqrt{\left(\frac{1}{2\mu}\right)^2 + \rho} < -\sqrt{\rho}$, and thus the minimizing m must satisfy The conclusion is that, if |m| < √ ρ, then Furthermore, the above relation goes both ways. For every µ ∈ R there exists a unique minimizing m for the above equation, and, for every m ∈ (− √ ρ, √ ρ), there exists µ ∈ R such that the given m is the minimizer. This can be seen by studying the equation above, considering the limits |µ| → 0 and |µ| → ∞, and using the continuity on the open intervals (−∞, 0) and (0, ∞).
The asymptotics of the average and standard deviation of magnetization density are given by the asymptotics of Laplace type integrals. We have as desired.
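The two-way relation between µ and the minimizing m can be sketched numerically. The definition of ψ µ,ρ is not legible in this text, so the code assumes ψ µ,ρ (m) = µm − ½ ln(ρ − m²) on (−√ρ, √ρ); this is only an assumption, chosen because its stationarity condition µ(ρ − m²) + m = 0 has the roots 1/(2µ) ± sqrt(1/(4µ²) + ρ) that appear in the proof. The function names are hypothetical.

```python
import math


def psi_prime(m, mu, rho):
    """Derivative of the assumed psi(m) = mu * m - 0.5 * ln(rho - m^2)."""
    return mu + m / (rho - m * m)


def minimizing_m(mu, rho):
    """Root of psi_prime lying in (-sqrt(rho), sqrt(rho)).

    With a = 1 / (2 mu) and r = sqrt(a^2 + rho), the admissible root is
    a - sign(mu) * r. It is evaluated in the algebraically equivalent form
    -rho / (a + sign(mu) * r) to avoid cancellation when |mu| is small.
    """
    if mu == 0.0:
        return 0.0
    a = 1.0 / (2.0 * mu)
    r = math.sqrt(a * a + rho)
    return -rho / (a + math.copysign(r, mu))


def mu_from_m(m, rho):
    """Inverse relation: the mu for which a given |m| < sqrt(rho) is the minimizer."""
    return -m / (rho - m * m)
```

Under this assumption, the minimizer has the opposite sign to µ, tends to 0 as |µ| → 0 and approaches ∓√ρ as µ → ±∞, which matches the limits considered in the proof.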
Next, we present the asymptotics of the h = 0 case.
For the asymptotics, if $\beta > \frac{1}{J\rho}$, then the asymptotics are standard and we have If $\beta = \frac{1}{J\rho}$, then we need to choose half-integer values of "α" in the Laplace method, but this will not alter the scaling of the asymptotics for the above ratios. However, if $\beta < \frac{1}{J\rho}$, then ψ ′ β,ρ (ε) < 0 for all ε, and since then "µ = 1" in the Laplace method, it follows that This completes the proof of the Theorem.

Grand canonical analysis
Finally, we will present the grand canonical ensemble and auxiliary grand canonical ensemble and the direct coupling method. If one considers microcanonical to be the most fundamental ensemble, this will result in substantial simplification of computation of its expectation values in the thermodynamic limit since these can now be computed using the grand canonical ensemble which is a Gaussian measure.
The definition may be rewritten using the same parametrization of the integrals as for the auxiliary microcanonical ensemble. The result is summarized in the following Lemma.
We can now construct a direct coupling between the auxiliary microcanonical ensemble and the auxiliary grand canonical ensemble. Then, which implies , and thus T ′ is a transport map. Therefore, using the related coupling we find an estimate We compute The converse result states that It follows that for every pair (m, ρ) for which the auxiliary microcanonical ensemble exists, there exists a pair (µ, η) such that the auxiliary grand canonical ensemble exists, and, the converse result holds as well. For such a pair satisfying the equations given above, we have , We have ||ψ|| 2 − N = 1 + , and thus It follows that 1 Combining all the terms, we find which implies the bound stated in the Theorem.
If h ≠ 0, the microcanonical energy ensemble is well-approximated by an auxiliary microcanonical magnetization ensemble whose auxiliary grand canonical theory we already covered above. For the case of h = 0, we consider the following grand canonical energy ensembles.
The definition may be rewritten using the same parametrization of the integrals as for the microcanonical ensemble. The result is summarized in the following Lemma.
For the fixed energy density ensemble, there is only a single value of energy density for which a direct coupling can be constructed. Then, for all $\beta < \frac{2\mu}{J}$, we have
Proof. Let us begin by considering the more general case with h ∈ R and µ, ρ > 0 arbitrary. Let m + and m − be the corresponding positive and negative magnetization densities to the given ε.
MC , and T ′ is a transport map. Using the associated coupling, we find Note that for ε ≠ 0, we have which does not provide any additional convergence for the local expectation error estimates. However, under the assumptions listed in the Theorem, i.e., if h = 0 = ε, µ > 0, $\rho = \frac{1}{2\mu}$, we find via the same computation as above that Note that the above holds for all $\beta < \frac{2\mu}{J} \iff \beta < \frac{1}{\rho J}$.
For the cases $\beta \ge \frac{1}{\rho J}$, we must introduce another class of auxiliary measures. One should note that there is no direct coupling of this new alternate auxiliary grand canonical ensemble to the microcanonical ensemble because the probability measures are not disjoint. However, the individual grand canonical ensembles do converge suitably to the fixed magnetization density case, and thus we still have the desired local convergence properties. We also remark that the case µ = 0 corresponds to the regular grand canonical ensemble given by a Gaussian measure with β = 0.

Convergence of finite marginal distributions and finite moments
In this subsection, we will collect and apply the upper bounds and error estimates presented for the continuum model to formulate the main local convergence theorems. Since our main goal is to prove convergence theorems and error estimates for local observables of the mean-field spherical microcanonical ensemble, we will present a variety of target measures which the local observables can converge to. Some of these target measures will come from thermodynamic ensembles and others from auxiliary ensembles. In particular, the value of the constant h will have a significant impact on the choice of target measure.
The main theorems concerning the convergence of moments required the boundedness of single moments of all degrees. To this end, we will employ the following lemma.
For N ≥ 5, we have By the dominated convergence theorem, using the assumed integrability of f , we have Lemma 4.24. Let f : R → R be integrable with respect to any Gaussian measure. Let x ∈ Λ and define P x : S → R by P x (φ) = φ x . Then for all ρ > 0, h ∈ R, and ε ∈ E h,ρ , we have . In both of these cases, the result remains bounded as N → ∞.
As we remarked earlier, the phase transitions in the mean-field spherical model result in the need for different limiting measures outside of the standard ensembles when using the coupling method. Of particular importance is the parameter h ∈ R. In the following convergence results, we will always explicitly state for which different parameters and limiting measures the convergence results hold.
First, we will state the convergence result for the auxiliary microcanonical, auxiliary canonical, microcanonical, and canonical ensembles.
Proof. The result follows by applying the free energy coupling presented in Theorem 2.5, along with the w 2 bound presented in Theorem 4.7, and with the asymptotics presented in Theorem 4.14.
Proof. If $\varepsilon \in \left(-\frac{J\rho}{2}, 0\right)$, the result follows by applying the free energy coupling presented in Theorem 2.5, along with the w 2 bound presented in Theorem 4.11, and the asymptotics presented in Theorem 4.15.
If ε = 0, then observe that the w 2 bound in Theorem 4.11 is not Lipschitz in the appropriate sense to directly apply Theorem 2.5. However, following the proof of Theorem 2.5, we can apply the following inequality
Finally, we will state the convergence result for the auxiliary microcanonical, auxiliary grand canonical, alternate auxiliary grand canonical, microcanonical, and grand canonical ensembles.
Proof. The result follows by splitting the fixed energy density ensemble into its fixed magnetization density ensembles and applying Theorem 4.28.

Remark on choice of cost function
For this model, it should be observed that the w 2 convergence is a natural choice of convergence from the perspective that it implies both w 1 and w 2 convergence simultaneously, which in turn implies that the magnetization density converges along with the energy density. Without this property, a coupling of suitable strength between the microcanonical and grand canonical measures seems unlikely. Observe that Thus, even though optimality of the transport was not necessarily achieved as in the discrete case, the best possible scaling in the dependence on changes in the parameter m was still obtained here.
to denote the above without reference to N , even if the power series on the right does not converge. Analogously, we define asymptotic power series representation as x → ∞ by requiring that x → f (1/x), x > 0, has an asymptotic power series at 0. Explicitly, we then require a k x −k−µ = 0 , for all N ∈ N 0 , and denote this by The asymptotic analysis of Laplace-type integrals has been studied extensively. For completeness, we will present below a general form of the asymptotics of Laplace-type integrals. • h is differentiable in a neighbourhood of a and the previous power series representation can be term-wise differentiated to give a s (s + µ)(x − a) s+µ−1 .
• h ′ is continuous in a neighbourhood of a except possibly at a.
Suppose also that ϕ : [a, b] → R is a function satisfying all of the following: • ϕ is continuous in a neighbourhood of a except possibly at a.