FLUCTUATION RESULTS FOR GENERAL BLOCK SPIN ISING MODELS

We study a block spin mean-field Ising model. For the vector of block magnetizations we prove Large Deviation Principles and Central Limit Theorems under general assumptions for the block interaction matrix. Using the exchangeable pair approach of Stein’s method we are also able to establish a speed of convergence for the Central Limit Theorem for the vector of block magnetizations in the high temperature regime.


Introduction
Mean-field block models were introduced as an approximation of a lattice model of a meta-magnet, see e.g. formula (4.1) in [23]. Later, they were rediscovered as interesting models for statistical mechanics systems, see [18], [15], [8], [26], [24], as well as models for social interactions between several groups, e.g. in [17], [1], [28]. This latter approach closely follows the social reinterpretation of the Curie-Weiss model for one group in [6], or of the Hopfield model in [9] or [25]. A third source of interest in mean-field block spin models is statistical. In [3], the authors gave another analysis of the bipartite mean-field Ising block model with equal block sizes, and asked whether one can recover the blocks from several observations of this model and, if so, how many observations are needed. This aspect treats block spin models as a special instance of block models, which have been at the center of interest in statistics and probability theory over the past couple of years (see, e.g. [2], [19]). The statistical interest in them arises from their relation to graphical models. In this framework a major question is how to reconstruct the block structure under sparsity assumptions (see e.g. [5], [27], [4]).
Our starting point is [26]. There, the fluctuations of an order parameter for a two-group block model with equal block sizes were analyzed on the level of large deviation principles (LDPs, for short) and central limit theorems (CLTs). Starting from these results, there are several natural questions. First: Can these results also be proven for systems with not necessarily identical block sizes? Second: Can we generalize the results to the situation of more than two groups? And third: Can we give a speed of convergence for the CLT? The main goal of the current note is to (partially) answer these questions. To this end, we will present a new approach to mean-field block spin models, via the corresponding block interaction matrix.
Moreover, to obtain a speed of convergence in the CLT, we will employ Stein's method as in [12], [7] for the standard mean-field Ising, or Curie-Weiss model.
The rest of this note is organized in the following way. In the remaining part of this introduction, we define our model in a way that makes it accessible to our techniques in Sections 2 and 3, and state our main results. Section 2 is devoted to the proof of the LDP results. Afterwards, we analyze the critical points of the rate function and obtain the mean field equations, showing that in the high temperature case the only maximum is 0, whereas in the low temperature case there are nonzero maximizers, and we obtain a solution for a special class of block interaction matrices. In Section 3 we prove the CLT for the order parameter of the model in two ways. One uses the classical Hubbard-Stratonovich transformation. This was already used for proving the CLT for the magnetization in the Curie-Weiss model in [14], and also is the core technique for the CLT in [26]. The second proof uses a multivariate version of the exchangeable pair approach in Stein's method, developed in [29]. Lastly, Section 4 contains a discussion of some of the results and further open questions.
1.1. The model. The block spin Ising model is characterized by two quantities: a number $k \in \mathbb{N}$ (which we interpret as the number of blocks), and a symmetric, positive definite matrix $A \in \mathbb{R}^{k \times k}$, which is the block interaction matrix. The entry $A_{ij}$ determines the strength of interaction between a particle in block $i$ and a particle in block $j$. Here, $\mathbb{R}^{r_1 \times r_2}$ is the set of all $r_1$ by $r_2$ matrices with real entries.
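Let $N(n)$ be a strictly increasing subsequence of $\mathbb{N}$, and for each $n \in \mathbb{N}$ let $B_1^{(n)}, \ldots, B_k^{(n)}$ be a partition of $\{1, \ldots, N(n)\}$ into $k$ blocks. Define for each $n \in \mathbb{N}$ the matrix of the relative block sizes
$$\Gamma_n^2 := \mathrm{diag}\big(|B_1^{(n)}|/N, \ldots, |B_k^{(n)}|/N\big).$$
We assume that for each $i = 1, \ldots, k$ the limit
$$\gamma_i := \lim_{n \to \infty} |B_i^{(n)}|/N \in (0, 1)$$
exists, so that the matrix of asymptotic relative block sizes $\Gamma_\infty^2 := \mathrm{diag}(\gamma_1, \ldots, \gamma_k)$ is invertible. If the $k$ partition blocks are asymptotically of the same size, i.e. $\Gamma_\infty^2 = k^{-1}\,\mathrm{Id}$, we speak of the asymptotically uniform case. The interaction matrix of the model is the block matrix
$$J_n := \frac{1}{N}\Big(A_{ij}\, O\big(|B_i^{(n)}|, |B_j^{(n)}|\big)\Big)_{i,j = 1, \ldots, k},$$
where $O(m, n) \in \mathbb{R}^{m \times n}$ is the matrix with all entries equal to 1. We denote this model by $\mu_{J_n}$. More precisely, $\mu_{J_n}$ is the probability measure on $\{-1, +1\}^N$, $N = N(n)$, defined by
$$\mu_{J_n}(x) := Z_n^{-1} \exp\Big(\frac{1}{2}\langle x, J_n x\rangle\Big).$$
Here, of course, $Z_n$ is the normalizing constant. Note that, for technical convenience and contrary to the usual convention, we do not require the diagonal of $J_n$ to be zero. However, since $x_i^2 = 1$, both $J_n$ and its "dediagonalized" version $J_n^{(d)} = J_n - \mathrm{diag}(J_{ii})$ result in the same Ising model. Here and in the sequel, $\mathrm{diag}(\lambda_1, \ldots, \lambda_l)$ is a diagonal $l \times l$ matrix with values $\lambda_1, \ldots, \lambda_l$ on its diagonal. Lastly, for any $p, q \in [1, \infty]$ and any matrix $A \in \mathbb{R}^{k \times k}$ we define the operator norm $\|A\|_{p \to q} := \sup_{\|x\|_p \le 1} \|Ax\|_q$.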

Main results.
We prove results on the fluctuations of the block magnetization vector on different scales. In what follows, we use the non-normalized and normalized versions of the block magnetization vector defined as Note that this allows us to rewrite the Hamiltonian H n of µ Jn as which we use tacitly. We begin by presenting the large deviation results. The first result is a generalization of [26,Theorem 2.1]. In that paper, an LDP for m (n) was proved in the situation of k = 2 blocks of equal size. Here we analyze the general case.
and L * denotes the convex conjugate of log cosh, i.e.
More precisely, in the notion of large deviations, the sequence of push forwards ( m (n) • µ Jn ) n∈N satisfies an LDP with speed N and the rate function I.
In the special case of asymptotically uniform block sizes the function I is related to the matrix A in an even more straightforward way, since in this case
We show that the rate function I has a unique minimum at 0 in the case $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} \le 1$, which yields the following corollary.
Corollary 1.2. Under the general assumptions, if $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} \le 1$, the normalized vector of magnetizations $m^{(n)}$ converges to 0 exponentially fast in $\mu_{J_n}$-probability. More precisely, for each $\varepsilon > 0$ there is a constant $I_\varepsilon$ such that
Let us discuss the large deviation results. In the classical Curie-Weiss model, i.e. the case $k = 1$, there is a phase transition: the limiting behavior of $m^{(n)}$ changes, depending on whether $A_{11} \le 1$ (the high temperature regime) or $A_{11} > 1$ (the low temperature regime); see [13] for an extensive treatment of this model. A corresponding phase transition can be observed in our model. This is stated in [16] for the bipartite model. In [24] the authors prove the existence of such a phase transition using the method of moments. Of course, with that method one cannot obtain an exponential speed of convergence as in Corollary 1.2. In accordance with the terminology for the classical Curie-Weiss model, we will call these different parameter regimes the high temperature and low temperature regime, respectively. Here, the high temperature regime corresponds to $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} \le 1$ and the low temperature regime to $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} > 1$. In the special case of asymptotically uniform block sizes (i.e. $\Gamma_\infty = k^{-1/2}\,\mathrm{Id}$) these conditions reduce to $\|A\|_{2 \to 2} \le k$ and $\|A\|_{2 \to 2} > k$, respectively.
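The regime condition can be checked numerically for a concrete block interaction matrix. The following is a minimal sketch, assuming that $\Gamma_\infty = \mathrm{diag}(\sqrt{\gamma_1}, \ldots, \sqrt{\gamma_k})$ and that A is symmetric and positive semidefinite, so that the $2 \to 2$ operator norm equals the largest eigenvalue. The function names and the use of power iteration are illustrative choices, not part of the original argument.

```python
import math

def gamma_a_gamma(A, gammas):
    """Return Gamma A Gamma for Gamma = diag(sqrt(gamma_1), ..., sqrt(gamma_k))."""
    k = len(A)
    s = [math.sqrt(g) for g in gammas]
    return [[s[i] * A[i][j] * s[j] for j in range(k)] for i in range(k)]

def spectral_norm_psd(M, iters=2000):
    """Largest eigenvalue of a symmetric positive semidefinite matrix,
    approximated by power iteration from a generic start vector."""
    k = len(M)
    v = [1.0 + 0.1 * i for i in range(k)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm
    return lam

def regime(A, gammas):
    """Classify the parameters according to the norm of Gamma A Gamma."""
    norm = spectral_norm_psd(gamma_a_gamma(A, gammas))
    return "high temperature" if norm <= 1.0 else "low temperature"
```

For two blocks of equal asymptotic size and A with diagonal entries 1.5 and off-diagonal entries 0.3, the matrix $\Gamma_\infty A \Gamma_\infty$ equals A/2 and has eigenvalues 0.9 and 0.6, so `regime([[1.5, 0.3], [0.3, 1.5]], [0.5, 0.5])` reports the high temperature regime.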
Next, we consider the scaled block magnetization vector $m^{(n)}$. Again, in the classical (i.e. one-dimensional) case it is known that the magnetization satisfies a central limit theorem with variance $\sigma^2 = (1 - A_{11})^{-1}$ whenever $A_{11} < 1$. The following theorem is a generalization of this phenomenon.
Note that $\Sigma_\infty$ exists, and it can be written as a von Neumann series. Moreover, if
Again, a similar statement is derived in [24] using the method of moments.
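The von Neumann series representation can be illustrated numerically. The sketch below assumes that the limiting covariance is $\Sigma_\infty = (\mathrm{Id} - M)^{-1}$ with $M = \Gamma_\infty A \Gamma_\infty$ and $\|M\|_{2 \to 2} < 1$, as in the discussion section, and approximates it by the truncated series $\sum_{m=0}^{T} M^m$. The helper names and the truncation level are hypothetical.

```python
def mat_mul(P, Q):
    """Product of two square matrices given as nested lists."""
    k = len(P)
    return [[sum(P[i][m] * Q[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def neumann_inverse(M, terms=200):
    """Approximate (Id - M)^{-1} by the truncated series sum_{m=0}^{terms} M^m.
    The series converges when the operator norm of M is below 1."""
    k = len(M)
    power = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # M^0
    total = [row[:] for row in power]
    for _ in range(terms):
        power = mat_mul(power, M)
        total = [[total[i][j] + power[i][j] for j in range(k)] for i in range(k)]
    return total
```

In the one-dimensional case `neumann_inverse([[0.5]])` returns approximately `[[2.0]]`, which agrees with the Curie-Weiss variance $(1 - A_{11})^{-1}$ for $A_{11} = 0.5$.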
Furthermore, we can treat the critical case. In the Curie-Weiss model, for $\beta = 1$, the quantity $N^{-3/4} \sum_{i=1}^N \sigma_i$ converges weakly to a measure with Lebesgue-density $g_1(x) := Z^{-1} \exp(-x^4/12)$ (see e.g. [13, Theorem V.9.5]). As proven in [26] and [16], a similar statement holds true for the vector of magnetizations in the case of $k = 2$ blocks. The next theorem gives a further generalization of this fact in the case $k \ge 2$. Moreover, it shows that statistics associated to the orthogonal decomposition of the block interaction matrix give rise to $k$ asymptotically independent random variables with either a Gaussian distribution or a distribution with a Lebesgue-density of the type $g_1$.
In the multidimensional critical case $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} = 1$ we restrict to the uniform case with a simple eigenvalue $\lambda_k = k$, i.e. we have $0 < \lambda_1 \le \ldots \le \lambda_{k-1} < \lambda_k = k$.
Theorem 1.4. Let $Y_n \sim \mathcal{N}(0, \hat{C}_N^{-1})$ and $X_n \sim \mu_{J_n}$ be independent random variables, defined on a common probability space. Then $w_n(X_n) + Y_n$ converges in distribution to a probability measure with density
for a suitable normalization Z that makes the expression (1.1) a probability density. Thus, the vector $(w_n(X_n)_j)_{j = 1, \ldots, k-1}$ converges to a normal distribution with covariance matrix $\Sigma = \mathrm{diag}((k - \lambda_j)^{-1})$ and the random variable $w_n(X_n)_k$ converges to a distribution with Lebesgue-density proportional to $\exp\big(-\frac{k^3}{12} \sum_{i=1}^k V_{ki}^4\, x^4\big)\,dx$. We believe it is possible to extend Theorem 1.4 to the case where the eigenvalue $k$ has multiplicity greater than 1, by appropriately rescaling all the eigenvectors which belong to the eigenvalue $k$.
Note that the parameter $\sigma^2 := \frac{k^3}{12} \sum_{i=1}^k V_{ki}^4$ is directly related to the variance of a random variable with that distribution; indeed, a short calculation shows that for $X \sim \exp(-\sigma^2 x^4)\,dx$ (up to normalization) we have $\mathrm{Var}(X) = c\,\sigma^{-1}$, where $c$ is an absolute constant.
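The scaling $\mathrm{Var}(X) = c\,\sigma^{-1}$ can be checked numerically. The sketch below is an illustration and not part of the original text: it compares a midpoint-rule approximation of the variance with the closed form obtained from the substitution $u = \sigma^2 x^4$, which gives $c = \Gamma(3/4)/\Gamma(1/4)$. The function names, integration window and step count are assumptions made for the example.

```python
import math

def quartic_variance_numeric(sigma2, steps=200_000):
    """Variance of the density proportional to exp(-sigma2 * x**4),
    computed with the midpoint rule on a window where the tail is negligible."""
    half_width = 6.0 / sigma2 ** 0.25  # at the edge, sigma2 * x**4 = 6**4
    h = 2.0 * half_width / steps
    mass = 0.0
    second = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        w = math.exp(-sigma2 * x ** 4)
        mass += w
        second += x * x * w
    return second / mass  # the mean vanishes by symmetry

def quartic_variance_closed_form(sigma2):
    """Closed form c / sigma with c = Gamma(3/4) / Gamma(1/4)."""
    c = math.gamma(0.75) / math.gamma(0.25)
    return c / math.sqrt(sigma2)
```

Both functions take $\sigma^2$ (not $\sigma$) as their argument, matching the parametrization of the density above.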
where v k is the eigenvector belonging to the eigenvalue k. In a final step, we establish convergence rates in the CLT in the high temperature case for a special class of functions. We use the exchangeable pair approach of Stein's method, that was also used in [12] and [7] in the case of the Curie-Weiss model. The proof of the next result will rely on a multivariate version of Stein's method proven in [29]. To this end, define the function class of all three times differentiable functions with all partial derivatives (up to order three) bounded.

Proofs of the large deviation results and the mean-field equations
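Let us start off by proving the LDP result for the rescaled block magnetization vector $m^{(n)}$. Recall the notion of an LDP (for which we also refer to [22, 11]): If $\mathcal{X}$ is a Polish space and $(a_n)_{n \in \mathbb{N}}$ is an increasing sequence of non-negative real numbers, we say that a sequence of probability measures $(\nu_n)_n$ on $\mathcal{X}$ satisfies a large deviation principle with speed $a_n$ and rate function $I : \mathcal{X} \to \mathbb{R}$ (i.e. a lower semi-continuous function with compact level sets $\{x : I(x) \le L\}$) if for any Borel set $B \in \mathcal{B}(\mathcal{X})$ we have
$$-\inf_{x \in \mathrm{int}(B)} I(x) \le \liminf_{n \to \infty} \frac{1}{a_n} \log \nu_n(B) \le \limsup_{n \to \infty} \frac{1}{a_n} \log \nu_n(B) \le -\inf_{x \in \mathrm{cl}(B)} I(x),$$
where $\mathrm{int}(B)$ and $\mathrm{cl}(B)$ denote the topological interior and closure of a set $B$, respectively.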
We say that a sequence of random variables $X_n : \Omega \to \mathcal{X}$ satisfies an LDP with speed $a_n$ and rate function $I : \mathcal{X} \to \mathbb{R}$ under a sequence of measures $\mu_n$ if the pushforward sequence $\nu_n := \mu_n \circ X_n^{-1}$ satisfies an LDP with speed $a_n$ and rate function $I$.
To prove Theorem 1.1, we will need the following lemma.
Lemma 2.1. Let $\mathcal{X}$ be a Polish space and assume that a sequence of measures $(\mu_n)_{n \in \mathbb{N}}$ on $\mathcal{X}$ satisfies an LDP with speed $n$ and rate function $I$. Let $F : \mathcal{X} \to \mathbb{R}$ be a continuous function which is bounded from above and $\eta_n : \mathcal{X} \to \mathbb{R}$ a sequence of functions such that $\|\eta_n\|_{L^\infty(\mu_n)} \to 0$. Then the sequence of measures $d\widetilde{\mu}_n = \exp(nF + n\eta_n)\,d\mu_n$ satisfies an LDP with speed $n$ and rate function
Proof. Note that this is a slight modification of the tilted LDP, which is an immediate consequence of Varadhan's Lemma ([22, Theorem III.17]). Indeed, according to this tilted LDP, the sequence of measures $(\nu_n)_n$ with $\mu_n$-density $\exp(nF)$ satisfies an LDP with speed $n$ and rate function $J$. Since for any $n \in \mathbb{N}$ and any $B \in \mathcal{B}(\mathcal{X})$ the inequalities
hold, this easily implies an LDP for $(\widetilde{\mu}_n)_n$ with speed $n$ and the same rate function $J$ due to $\|\eta_n\|_{L^\infty(\mu_n)} \to 0$.
Proof of Theorem 1.1. First, note that under the uniform measure µ 0 (i.e. A ≡ 0) we have By the Gärtner-Ellis Theorem ([11, Theorem 2.3.6]), m (n) satisfies an LDP under µ 0 with speed N and rate function where L * (x) is the convex conjugate of log cosh. Next, it is easy to see that we can rewrite the µ 0 -density of µ Jn as where Note that we artificially inserted the truncation in F to emphasize the boundedness of F ( m (n) ) -this does not affect the quadratic form, since Moreover, F is obviously continuous and η n satisfies so that the assertion follows from Lemma 2.1.
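For concreteness, the convex conjugate of log cosh has the closed form $L^*(x) = \frac{1}{2}\big[(1+x)\log(1+x) + (1-x)\log(1-x)\big]$ for $|x| \le 1$ and $L^*(x) = +\infty$ otherwise; this follows by inserting the maximiser $t = \operatorname{artanh}(x)$ into $tx - \log\cosh t$. The following sketch evaluates this formula; the function name is hypothetical and the convention $0 \log 0 = 0$ is used at the endpoints.

```python
import math

def log_cosh_conjugate(x):
    """Convex conjugate L*(x) = sup_t (t*x - log cosh t) in closed form."""
    if abs(x) > 1.0:
        return math.inf
    total = 0.0
    for u in (1.0 + x, 1.0 - x):
        if u > 0.0:  # convention 0 * log 0 = 0 at x = +1 or x = -1
            total += 0.5 * u * math.log(u)
    return total
```

At the endpoints the formula gives $L^*(\pm 1) = \log 2$, so $L^*$ is bounded on $[-1, 1]$.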
2.1. The mean-field equations. Theorem 1.1 states that the function determines the asymptotic behavior of the magnetization, and thus the critical points of I are of utter importance. These satisfy the so-called mean-field equations For example, in the well-studied case k = 2, choosing for a positive definite matrix A and γ ∈ (0, 1) equations (2.1) reduce to Whereas for the two-dimensional fixed point problem the existence of a solution can be shown by monotonicity arguments, the existence of a solution to (2.1) for general k is more involved. First off, we will show that in the high temperature regime the only critical point of I is 0. This will immediately yield Corollary 1.2.
Proof of Corollary 1.2. In the sense of the formulation in Corollary 1.2, $m^{(n)}$ concentrates exponentially fast in the minima of the function $J$. However, under the condition $\|\Gamma_\infty A \Gamma_\infty\|_{2 \to 2} \le 1$ there is only one minimum, which is zero. To see this, note that any local minimum satisfies
Here, $\operatorname{artanh}(x)$ is understood componentwise. Clearly, 0 is a solution, and due to
this is a local minimum. We claim that this is the only solution. Indeed, if $y \ne 0$ solves (2.2), we have
Here the first inequality follows from the general fact that the spectra of the matrices $BC$ and $CB$ agree, applied to $B = \Gamma_\infty$ and $C = \Gamma_\infty A$. The last inequality follows from $\operatorname{artanh}(x)\,x \ge x^2$ for all $x \in (-1, 1)$. This means that for any solution $y$ we have equality in (2.4). However, equality can only hold if $y_i = 0$ whenever $\gamma_i \ne 0$, since if $y_i \ne 0$ for some $i$, there is a strict inequality in the last step, which results in a contradiction. Due to our assumption $\gamma_i \in (0, 1)$, this proves the claim.
In contrast, in the low temperature regime, there are other solutions to the meanfield equations (2.1). Let us start with the following lemma showing the connection of the k-dimensional mean-field equations to the one-dimensional equations of the Curie-Weiss model. It provides an explicit formula for the solution of the k-dimensional problem in terms of the solution of the Curie-Weiss equation.
Proof. Let $m^* > 0$ be the unique positive solution of the Curie-Weiss equation
where in the second-to-last step we have used explicitly that $v_k \in \{-1, 0, 1\}^k$, and so $v$ is a critical point of $I$. Moreover, in this case it is easily seen that
Example 2.3. Even though the assumptions in the previous proposition seem to be tailor-made for its proof (and the conclusion also holds true more generally), there are interesting non-trivial examples of matrices satisfying the conditions of Proposition 2.2. One of them is the family of $k \times k$ matrices ($k \in \mathbb{N}$) of the form
$$A = (\beta - \alpha)\,\mathrm{Id} + \alpha\, O(k, k),$$
i.e. with $\beta$ on the diagonal and $\alpha$ in all other entries, for any parameters such that
$$\beta + (k - 1)\alpha > k \quad \text{and} \quad \beta > \alpha. \tag{2.5}$$
This corresponds to $k$ groups with an interaction parameter $\beta$ within the group and $\alpha$ between the groups. For example, the condition (2.5) is satisfied whenever $\beta > \alpha > 1$.
In the general case, the conclusion of Proposition 2.2 holds as well. In this case the proof relies on the fact that the continuous function $I$ has a global maximum on its (compact) domain $[-1, 1]^k$, and the next lemma excludes maxima on the boundary. Hence there is always at least one solution $y \ne 0$ to (2.1), since 0 is either an inflection point or a minimum.

Lemma 2.4.
Let I be the large deviation rate function from Theorem 1.1, i.e.
and L * denotes the convex conjugate of log cosh.
where $x_j \in \mathbb{R}^{k-1}$ is the vector obtained from $x$ by deleting the $j$-th component. If we divide both sides by $1 - y$ and take $\limsup_{y \to 1}$, the left hand side is finite, as $\frac{1}{2}\langle x, Cx\rangle \in C^\infty(\mathbb{R}^k)$, and the right hand side tends to $\infty$ by l'Hôpital's rule. This proves statement (1).
(2): Clearly, x can only satisfy the mean-field equations if x ∈ (−1, +1) k . Since it solves the mean-field equations, for any i = 1, . . . , k we have Inserting this into the function I gives (3): The function I is bounded in [−1, 1] k , as On the other hand, if there exists a sequence of maximisers approaching the boundary, i.e. for at least one i we have x i → 1, this gives R(x i ) → ∞.
In the case of two blocks, i.e. $k = 2$, equal block sizes and the same interaction within a group, the set of maximisers of the rate function is explicitly known. Note that we have to restrict to $|\alpha| \le \beta$ and $\beta > 0$ in order for $A$ to be positive definite. Moreover, the characterization of the high temperature phase $\Gamma_\infty A \Gamma_\infty \preceq \mathrm{Id}$ (where $\preceq$ is the Loewner partial ordering) can be reduced to $\langle (\mathrm{Id} - \Gamma_\infty A \Gamma_\infty) e_1, e_1 \rangle \ge 0$ and $\det(\mathrm{Id} - \Gamma_\infty A \Gamma_\infty) \ge 0$. Thus we are in the high temperature regime if and only if
Proof. The case $\alpha = 0$ is an easy consequence of the statements for the one-dimensional Curie-Weiss model, since
We treat the case $\alpha > 0$ only; the case $\alpha < 0$ follows immediately from the equality $I_{\alpha,\beta}(x, y) = I_{-\alpha,\beta}(x, -y)$ (with the appropriate modifications, e.g. the maximum will be in the second quadrant instead of the first).
Due to (2.6) the maximum of the rate function is non-negative; let us call this maximum $\eta$. Then, $I(x, y) = \eta = 0$ implies $(x, y) = 0$, which is a contradiction to the low temperature case (recall the Hessian of $I$ in 0 given in equation (2.3)), so that $\eta > 0$. Moreover, every global maximum (and thus local maximum, as it is not attained on the boundary) satisfies the mean-field equations, and so the value of $I$ at any maximum is given by equation (2.6). As a consequence, all global maxima lie on a contour line $C_\eta$, where $R(x_i)$ was defined in the previous lemma. Firstly, let us show that in the first quadrant there can only be one such point. Due to symmetry, the global maximum will also be present in the third quadrant. For $x_1 > 0$ the points on the contour line $C_\eta$ can be described by a function $x_2 = g(x_1)$, and due to the monotonicity of $R$ the function $g$ is non-increasing. Moreover, the solutions of the mean-field equations can be described by the functions
The function $f_1$ can behave in two ways, depending on the parameter $\gamma\beta$: For $\gamma\beta \le 1$ it increases monotonically. For $\gamma\beta > 1$ it decreases first and then increases. More precisely, in the latter case, $f_1(t) = 0$ if and only if $t \in \{0, \pm m_{\gamma\beta}\}$ for some $m_{\gamma\beta} > 0$, and $f_1$ is strictly increasing for $t \ge m_{\gamma\beta}$. Moreover, the curve $(x, f_1(x))$ is only in the first quadrant if $m_{\gamma\beta} < x \le 1$. In either case, there is only one intersection point of $g$ and $f_1$ in the first quadrant.
Secondly, the maximum cannot be in the second quadrant. Assume that there are solutions to the mean-field equations both in the first and in the second quadrant. If we denote by $m_c$ the positive zero of $\varphi_c(t) := \operatorname{artanh}(t) - ct$, then for the solution in the second quadrant we easily see that $-m_c < x < 0$ and $0 \le y \le m_{\beta(1-\gamma)}$. Hence
If there is also a solution in the first quadrant with coordinates $(x^*, y^*)$, we obtain analogously
This yields that the maximum must lie in the first quadrant.
Furthermore, we can treat the case $k > 2$ for uniform block sizes and special matrices. The proof is motivated by [3, Proposition 2.1].
Lemma 2.6. Let $k \ge 2$ and $A$ be a block interaction matrix with positive entries such that for two constants $c_1, c_2 > 0$ and any $i = 1, \ldots, k$ we have $A_{ii} = c_1$ and $\sum_{j \ne i} A_{ij} = c_2$. In the uniform case, there are exactly two maximisers of the rate function $I$ and they satisfy $x = m^* (1, \ldots, 1)$ for $m^*$ solving the Curie-Weiss equation $\frac{c_1 + c_2}{k}\, x = \operatorname{artanh}(x)$.
Proof. Using the equality $xy = -\frac{1}{2}(x - y)^2 + \frac{1}{2}x^2 + \frac{1}{2}y^2$ we can rewrite the rate function as
where equality only holds in the case $x_i = x_j$ for all $i, j$. Thus, we search for maximisers of $I$ on the generalized diagonal $\{x \in [-1, 1]^k : x_i = x_j \ \forall i, j\}$. On this set we have
i.e. it reduces to the Curie-Weiss equation in one dimension. For $c_1 + c_2 > k$ it has a unique positive solution $m^*$, and $x = \pm m^* (1, \ldots, 1)$ solve the $k$-dimensional maximization problem.
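The reduction to the one-dimensional Curie-Weiss equation makes the maximisers easy to compute numerically. The sketch below solves $c\,x = \operatorname{artanh}(x)$, rewritten as $x = \tanh(cx)$, by bisection and returns the positive diagonal maximiser for given $c_1$, $c_2$ and $k$. It is an illustration under the assumptions of the lemma; the function names and the tolerance are hypothetical choices.

```python
import math

def curie_weiss_solution(c, tol=1e-13):
    """Positive solution m* of c*x = artanh(x), i.e. x = tanh(c*x).
    Returns 0.0 for c <= 1, where 0 is the only solution."""
    if c <= 1.0:
        return 0.0
    lo, hi = 0.0, 1.0
    # For c > 1, tanh(c*x) - x is positive on (0, m*) and negative on (m*, 1].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.tanh(c * mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def diagonal_maximiser(c1, c2, k):
    """Positive maximiser m*(1, ..., 1) for diagonal entry c1 and off-diagonal row sum c2."""
    m = curie_weiss_solution((c1 + c2) / k)
    return [m] * k
```

For example, `diagonal_maximiser(1.0, 2.0, 2)` solves the equation with $c = 1.5$; the second maximiser is obtained by flipping the sign of every component.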
Unfortunately, the proof cannot be modified in a straightforward way to deal with non-equal block sizes, not even in the case $k = 2$. The reason is that the inequality used in the proof does not give any information on the actual maximiser in this setting (i.e. $I$ is not maximized on any type of (weighted) diagonal). As such, we cannot reduce this to the one-dimensional setting.
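Example. Lemma 2.6 can be used to prove that given three positive parameters $\alpha, \beta, \gamma$ with $\beta > \alpha$ and $\beta + \alpha > 2\gamma$, the rate function corresponding to
$$A = \begin{pmatrix} \beta & \alpha & \gamma & \gamma \\ \alpha & \beta & \gamma & \gamma \\ \gamma & \gamma & \beta & \alpha \\ \gamma & \gamma & \alpha & \beta \end{pmatrix}$$
has exactly two maximisers in the uniform case. The conditions on $\alpha, \beta, \gamma$ ensure that $A$ is positive definite, and it is clear that $c_1 = \beta$ and $c_2 = \alpha + 2\gamma$.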
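The positive definiteness claim can be checked numerically. The sketch below assumes that the matrix of the example is the $4 \times 4$ matrix with $\beta$ on the diagonal, $\alpha$ inside the two pairs of blocks and $\gamma$ across the pairs; this shape is an assumption inferred from $c_1 = \beta$, $c_2 = \alpha + 2\gamma$ and the stated conditions. Positive definiteness is tested by attempting a Cholesky factorization; the function names are hypothetical.

```python
import math

def example_matrix(alpha, beta, gamma):
    """Assumed 4x4 block interaction matrix of the example."""
    return [[beta, alpha, gamma, gamma],
            [alpha, beta, gamma, gamma],
            [gamma, gamma, beta, alpha],
            [gamma, gamma, alpha, beta]]

def is_positive_definite(M):
    """A symmetric matrix is positive definite iff its Cholesky factorization
    exists with strictly positive diagonal entries."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def row_constants(M):
    """Return (c1, c2) for the first row: diagonal entry and off-diagonal row sum."""
    return M[0][0], sum(M[0][1:])
```

For this shape the eigenvalues are $\beta + \alpha + 2\gamma$, $\beta + \alpha - 2\gamma$ and $\beta - \alpha$ (twice), which matches the two conditions in the example.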
As a concluding remark let us note that the previous results imply that there is indeed a phase transition in our block spin model. However, if k > 2 or the block sizes are not equal, it seems hard to give a similarly explicit formula for the limit points. Nevertheless, the above observations show that there is a phase transition in the general block spin Ising model with an arbitrary number of blocks and general block sizes. In particular, they also justify the names "high temperature regime" and "low temperature regime".

Proofs of the limit theorems
In this section we prove (standard and non-standard) Central Limit Theorems for the vector m (n) . In the first subsection we will treat the high temperature regime. Here we derive a standard CLT using the Hubbard-Stratonovich transform. This is in spirit similar to the third section in [26] and technically related to [20]. The result can also be derived from [15], where similar techniques are used. However, the subsection also prepares nicely for Subsection 3.2, where we treat the critical case and show a non standard CLT. This generalizes results from [16] and [26]. Finally, in Subsection 3.3 we will use Stein's method, an alternative approach to prove the CLT for m (n) . This is not only interesting in its own rights, but also has the advantage of providing a speed of convergence, which is missing in the case of a proof via the Hubbard-Stratonovich transform.

Central limit theorem: Hubbard-Stratonovich approach. For the proof we shall use the transformed block magnetization vectors
where Γ n AΓ n = V T n Λ n V n is the orthogonal decomposition. It is easy to see that Proof of Theorem 1.3. As in [26] or [15] (both papers are inspired by [14]), we use the Hubbard-Stratonovich transform (i.e. a convolution with an independent normal distribution). For each n ∈ N, Our first step is to prove that w n converges weakly to a normal distribution. Let Y n ∼ N (0, Λ −1 n ) be an independent sequence, which is moreover independent of ( w n ) n∈N . We have for any B ∈ B(R k ) where we have defined For parameters r, R > 0 let B 0,r,R := {x ∈ R k : r ≤ x 2 2 ≤ R} and decompose Since Λ n → Λ ∞ (which is a consequence of the continuity of the eigenvalues) we have for any R > 0 Next, we will estimate (3.1) from below in order to obtain an upper bound for I 2 . If we define C 2,4 := Id 2→4 , then where we have used the convergence of Γ n to Γ ∞ to bound Γ −1/2 n 4→4 and the fact that C(r)r 2 → 0 as r → 0, so that the right hand side is positive definite for r small enough, uniformly in n. Thus, after taking the limit n → ∞, I 2 will vanish in the limit R → ∞.
Lastly, we need to show that $I_3$ vanishes as well. To this end, we will show that we can choose $r > 0$ small enough to ensure that $\Phi_n(x) \ge \exp(-Nc)$ uniformly for $x \in B_{r\sqrt{N}}(0)^c$ and for $n$ large enough. Since $\|\Lambda_n - \Lambda_\infty\|_{2 \to 2} \to 0$ and $\|\Lambda_\infty\|_{2 \to 2} < 1$, choose $n$ large enough so that $\|\Lambda_n\|_{2 \to 2} < 1$ uniformly. Again, as before, it can be seen that 0 is the only minimum for $n$ chosen that way. Indeed, after some manipulations any critical point satisfies $\Gamma_n A \Gamma_n \tanh(y) = y$, and since $\|\tanh(y)\|_2 \le \|y\|_2$ and $\|\Gamma_n A \Gamma_n\|_{2 \to 2} < 1$, this is only possible for $y = 0$. As a consequence, for any $r > 0$ there is a constant $c$ such that uniformly $\Phi_n(x) \ge c$, i.e.
Lastly, choose $r > 0$ so small that $\Lambda_n - \Lambda_n^2 - C(r) r^2 C$ is uniformly positive definite, and observe that we obtain
From here, it remains to undo the convolution, giving
With the help of Slutsky's theorem and the definition $m_n = V_n^T w_n$ this implies
Example. Consider the case $k = 2$ and
We have the diagonalization
and w = V m = 1
which is exactly the covariance matrix in [26] (again up to a factor of 2). Note that similar results have been derived in [24].
Remark. If A ∈ M k (R) is symmetric and positive semidefinite, then a variant of the proof shows that if we let A = V T ΛV with Λ = diag(λ 1 , . . . , λ l , 0, . . . , 0) for l < k, ((V m) i ) i≤l converges to an l-dimensional normal distribution with covariance matrix . . . , λ l ). This can be applied to the matrix A 2 above with α = β, resulting in a CLT for the magnetization in a Curie-Weiss model, which of course can also be obtained by choosing k = 1 and 0 < β < 1.

Non-central limit theorem.
Recall the situation of Theorem 1.4: The block interaction matrix has eigenvalues 0 < λ 1 ≤ . . . ≤ λ k−1 < λ k = k and we consider the uniform case, i.e. Γ 2 ∞ = k −1 . Moreover, we use the definitions Proof of Theorem 1.4. Let Y n ∼ N (0,Ĉ −1 N ) and X n ∼ µ Jn be independent random variables, defined on a common probability space. We have for any Borel set B ∈ B(R k ) Now the proof is along the same lines as the proof of the CLT in the high temperature phase, with the slight modification that we use expansion of log cosh to fourth order We again split R k into three regions, namely the inner region I 1 = B R (0) for an arbitrary R > 0, the intermediate region I 2 = K r \B R (0) for some arbitrary r > 0, where and the outer region I 3 := K c r . Also define the rescaled vector Firstly, in the inner region we rewrite and since the convergence of the error terms is uniform on any compact subset of R k , for any fixed R > 0 this yields Secondly, we show that the outer region does not contribute to the limit N → ∞. It can be seen by elementary tools that Φ N has a unique minimum 0 in 0, and so for any r > 0 we have inf x∈I 3 Φ(x) > 0. Using the monotone convergence theorem, we obtain lim Lastly, we will estimate the contribution of the intermediate region from above by a quantity which vanishes as R → ∞. To this end, we will bound the function Φ N from below. Recall that Now, as in the case of the central limit theorem, we can estimate from below the error term in such a way that there is a positive constant c and a positive definite matrix C such that from which we obtain an upper bound, i.e.
and the right hand side vanishes as R → ∞ by dominated convergence. As a result, the limit n → ∞ exists and is equal to The convergence results for the non-convoluted vector follow easily by considering the characteristic functions. We have for any Using the independence of X n and Y n , the results follow by simple calculations.

Central limit theorem: Stein's method.
Lastly, we will prove Theorem 1.5 using Stein's method of exchangeable pairs. For brevity's sake, for the rest of this section we fix $n \in \mathbb{N}$ and drop all sub- and superscripts (e.g. we write $B_i$ instead of $B_i^{(n)}$, $m$ instead of $m^{(n)}$, $J$ instead of $J_n$, et cetera). It is more convenient to formulate this approach in terms of random variables. Let $X$ be a random vector with distribution $\mu_J$ and $I$ be an independent random variable uniformly distributed on $\{1, \ldots, N\}$. First, denote by $(X, \widetilde{X})$ the exchangeable pair which is given by taking a step in the Glauber chain for $\mu_J$, i.e. $\widetilde{X}$ is the vector after replacing $X_I$ by an independent $\widetilde{X}_I$ with distribution $\mu_J(\cdot \mid \overline{X}_I)$, where $\overline{X}_I$ collects all spins except $X_I$ (the exchangeability follows from the reversibility of the Glauber dynamics). Consequently, $(m, m') = (m(X), m(\widetilde{X}))$ is also exchangeable. More precisely, with the standard basis vectors $(e_i)_{i = 1, \ldots, k}$ of $\mathbb{R}^k$ we have
We need the following lemma to identify the conditional expectation of $\widetilde{X}_i$. Here, we write $h : \{1, \ldots, N\} \to \{1, \ldots, k\}$ for the function that assigns to each position its block, i.e. $h(j) = i \iff j \in B_i$.
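One step of the Glauber chain, and hence one draw of the exchangeable pair, can be sketched as follows. The sketch assumes the normalization $J_{ij} = A_{h(i)h(j)}/N$ and a density proportional to $\exp(\frac{1}{2}\langle x, Jx\rangle)$, under which the conditional probability of a spin being $+1$ given all other spins is $\frac{1}{2}(1 + \tanh(g))$ with local field $g = \sum_{j \ne i} J_{ij} x_j$. The function name, the list `block_of` (playing the role of $h$ with zero-based block labels) and the random number generator argument are hypothetical.

```python
import math
import random

def glauber_step(x, block_of, A, rng):
    """Return a copy of x in which one uniformly chosen spin has been
    resampled from its conditional law given all other spins."""
    n = len(x)
    i = rng.randrange(n)  # the uniformly distributed index I
    # Local field at site i; the diagonal term j = i is excluded.
    g = sum(A[block_of[i]][block_of[j]] * x[j] for j in range(n) if j != i) / n
    p_plus = 0.5 * (1.0 + math.tanh(g))  # P(spin i = +1 | other spins)
    y = list(x)
    y[i] = 1 if rng.random() < p_plus else -1
    return y
```

Starting from a configuration `x` with law $\mu_J$, the pair `(x, glauber_step(x, block_of, A, rng))` is then a realization of the exchangeable pair; the two configurations differ in at most one coordinate.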
Proof. For any Ising model $\mu = \mu_J$ the conditional distribution of $\widetilde{X}_i$ is given by $\mu(\cdot \mid \overline{X}_i)$, the conditional law given all spins except the $i$-th, and so
where we recall the notation $J^{(d)}$ for the matrix without its diagonal, i.e. $J^{(d)} = J - \mathrm{diag}(J_{ii})$. In the case that $J = J_n$ is the block model matrix, this yields
Since the conditional expectation will be of importance, we define
so that $E(\widetilde{X}_i \mid \mathcal{F}) = \tanh(g_i(X))$. Note that $g_i$ actually does not depend on $X_i$; the latter term is added for convenience to rewrite the first term. Thus we have
where, with $\lambda^{(i)} := \sum_{m=1}^d |(\Lambda^{-1})_{m,i}|$, we define the three error terms
Here, $|h|_j$ denotes the supremum of the partial derivatives of $h$ up to order $j$. Note that in the proof the choice of $\sigma(W)$ for the conditional expectation is arbitrary; it suffices to take any $\sigma$-algebra $\mathcal{F}$ with respect to which $W$ is measurable. Clearly, the value $E_1$ has to be adjusted accordingly.
with the three error terms Var(R i ).
Finally, the following lemma shows that all error terms E i can be bounded by a term of order N −3/2 .

Lemma 3.5. In the situation of Corollary 3.4 we have
Before we prove this lemma (and consequently Theorem 1.5), we will state concentration of measure results in the block spin Ising models. These will be necessary to bound $E_1, E_2, E_3$. The first step is the existence of a logarithmic Sobolev inequality for the Ising model $\mu_{J_n}$ with a constant that is uniform in $n$.
where Ent is the entropy functional and T i :
This follows immediately from [21, Proposition 1.1], since $\Gamma_n A \Gamma_n \to \Gamma_\infty A \Gamma_\infty$, which implies the convergence of the norms, i.e. for $n$ large enough we have $\|\Gamma_n A \Gamma_n\|_{2 \to 2} < 1$. Although the condition in [21] is $\|J\|_{1 \to 1} < 1$, this was merely for applications' sake and $\|J\|_{2 \to 2} < 1$ is sufficient to establish the logarithmic Sobolev inequality.
For any function f : {−1, +1} N → R and any r ∈ {1, . . . , N } we write Moreover, it is known that (3.3) implies a Poincaré inequality Proof of Lemma 3.5. Error term E 1 : To treat the term E 1 , fix i ∈ {1, . . . , k} and observe that Thus, if we define i | −1/2 (Var(f i (X))) 1/2 , and we need to show that Var(f i (X)) = O(1). Using the Poincaré inequality (3.4) it suffices to prove that The second case r / ∈ B (n) i follows by similar reasoning.

Error term E 2 : The second term E 2 is much easier to estimate, as Error term E 3 : To estimate the variance of the remainder term R we first split it into two sums. For any i = 1, . . . , k write j (X) + R (2) j (X).
i − E R (2) i 2 and we estimate these terms separately. It is obvious that the L 2 norm of the second term is of order O(N −2 ). To estimate R In the last line we have used the fact that (AΓm) 3 i 2 = (AΓm i For the details see [21]. The constant depends on a norm of AΓ, which by convergence to AΓ ∞ can again be chosen independently of n.
Proof of Theorem 1.5. The theorem follows immediately from Corollary 3.4 and Lemma 3.5.

Discussion and open questions
Although the questions raised in the introduction have been answered to a certain degree, there are still open questions that we were not yet able to answer.
The first question concerns the maxima of the rate function $I$. Firstly, note that by [10, Theorem A.1] the global maxima of $I$ are related to the global minima of the so-called pressure functional, which can for example be found in [15, equation (14)]. Using the compactness of $[-1, 1]^k$ and the continuity of $I$, the existence of a maximiser easily follows, but the number of maximisers is still unclear. From the real-analyticity of $I$, we can infer that the set of maximisers is a null set with respect to the $k$-dimensional Lebesgue measure $\lambda^k$, but it could in principle contain infinitely many points (e.g. consider the real-analytic function $g(x, y) = \sin(x) - y$). However, Lemmas 2.5 and 2.6 as well as numerics suggest that for all $k \ge 2$, the number of local minima is twice the number of independent systems; see Figure 2 for the case $k = 3$ and Figure 3 for the case $k = 2$ below.
Another question is the relationship of Theorems 1.3 and 1.5. In Theorem 1.5 we consider the distance to a normal distribution with covariance matrix $\Sigma_n := \mathbb{E}\, m^{(n)} (m^{(n)})^T$ and not to $\Sigma_\infty := (\mathrm{Id} - \Gamma_\infty A \Gamma_\infty)^{-1}$, which is the covariance matrix of the limiting distribution. Testing against functions $h \in C_c^\infty(\mathbb{R}^k)$, we see that $\Sigma_\infty$ is the limit of the matrices $\Sigma_n$. It is an interesting task to provide suitable bounds on $\Sigma_n - \Sigma_\infty$ in any matrix norm, since [29, Proposition 2.8] provides bounds on $|\mathbb{E}\, h(X) - \mathbb{E}\, h(Y)|$ for two random vectors with $X \sim \mathcal{N}(0, \Sigma_0)$ and $Y \sim \mathcal{N}(0, \Sigma_1)$ in terms of the 1-distance of $\Sigma_0$ and $\Sigma_1$.
Thirdly, it remains an open problem to quantify the distance to a normal distribution with the "limiting" covariance matrix $\Sigma_\infty$. For the central limit theorem in the one-dimensional Curie-Weiss model this problem has been solved, for example in [12, Corollary 2.9]. Therein one can see that the limiting covariance is $(1 - \beta)^{-1}$ by considering the approximate linear regression condition. A similar condition is true in the multidimensional case. For example, in Lemma 3.2 we have proven
where $\lambda = N^{-1}$ and $\Lambda = (\mathrm{Id} - \Gamma_n A \Gamma_n)^{-1}$. Thus, in the case $\Gamma_n \equiv \Gamma_\infty$ (e.g. consider a subsequence along which this holds) $\Lambda$ is the covariance matrix of the limit distribution. However, we have been unable to find a suitable modification of [29, Theorem 2.1] that enables one to compare the distribution of the random vector $m^{(n)}$ with $\mathcal{N}(0, \Lambda)$.