1 Introduction

In the early 2000s, a statistical mechanics approach to social choice in several groups of interacting agents led to the development of so-called block spin models [4]. These block spin models were investigated in a number of papers, from both a static and a dynamic point of view, see [3, 12, 13, 14, 17, 18, 20, 23, 25]. Statistical questions in block spin models were studied in [2] and [24]. All the above papers, however, discuss the situation where each spin takes only two values, that is, generalized mean-field Ising or Curie–Weiss models.

In [19] and [21], block spin models with three or more possible spin values were considered for the first time. They are natural extensions of the mean-field Potts models considered in [16] and [11] as a generalization of the Curie–Weiss model. In [21] the author discusses a block spin Potts model with two blocks or groups and proves limit theorems for this situation. In [19], on the other hand, a setting with several groups is analyzed; however, these groups must have equal size. In this case the authors show a large deviation principle and a phase transition, identify the limit points, and prove logarithmic Sobolev inequalities. In particular, the authors in [19] identify a parameter regime in the model that corresponds to the high-temperature regime in usual mean-field spin models. In this regime the matrix-valued order parameter of the model has a unique limit point.

The main goal of the present note is to prove a Central Limit Theorem (CLT, for short) for this order parameter (the block magnetization) in the high-temperature regime. Notice that Theorem 2 in [21] studies a similar question. However, there only two blocks are allowed (which makes the situation technically considerably easier) and little is known about the parameter regime where the CLT holds. In the present paper we will study an arbitrary (finite) number of blocks of arbitrary block sizes with more general interactions possible.

To be more specific, to present a simplified model and to fix some notation, partition \(\{1, \ldots , N\}, N \in \mathbb {N}\) into s blocks \(S_k\). Let \(\beta >0\) be the intra-block-interactivity within one of the blocks (for \(s=1\), the parameter \(\beta \) corresponds to the inverse temperature) and \(\alpha \in (0, \beta )\) be the inter-block-interactivity between different blocks. Then, for any \(q \in \mathbb {N}\setminus \{1\}\) (which can be thought of as the number of colors) and \(\omega \in \{1, \ldots , q\}^N\), we consider the Hamiltonian

$$\begin{aligned} H_N(\omega ) := -\frac{\beta }{2N} \sum _{i \sim j} \mathbf {1}_{\omega _i = \omega _j} - \frac{\alpha }{2N} \sum _{i \not \sim j} \mathbf {1}_{\omega _i = \omega _j}, \end{aligned}$$
(1)

where \(i \sim j\) means that i and j belong to the same block \(S_k\) and \(i \not \sim j\) means that they belong to different blocks. Here, for \(i=j\) we have \(i \sim i\), and for \(i \ne j\) we count both pairs \((i,j)\) and \((j,i)\). This gives rise to the Gibbs measure

$$\begin{aligned} \mu _N(\omega ) := \frac{\exp (-H_N(\omega ))}{Z_{N}}, \end{aligned}$$

where \(Z_{N}\) is the partition function

$$\begin{aligned} Z_N := \sum _{\omega '} \exp (-H_N(\omega ')). \end{aligned}$$

This definition agrees with the one given in [19], where the case of \(q \ge 3\) was considered. Here we also include \(q=2\), in which case we obtain a version of the Ising block model, cf. e. g. [23] or [20]. Note that our \(\beta \) and \(\alpha \) correspond to \(2\beta \) and \(2\alpha \) in [23], and similar remarks hold for [20]. This is a consequence of using indicator functions rather than products \(\omega _i\omega _j\) of spins \(\omega _i, \omega _j \in \{\pm 1\}\).

In the sequel it will turn out that our quantities of interest are most conveniently described in terms of the Kronecker product of matrices, denoted by \(\otimes \) in the rest of the paper. We define the magnetization of the block spin Potts model as the vector m whose components are given by

$$\begin{aligned} m_{k,c}:=(e_k\otimes e_c)^Tm:=\frac{1}{\left|S_k\right|}\sum _{i\in S_k}{\mathbbm {1}}_c(\omega _i), \end{aligned}$$
(2)

where \(k \in \{1, \ldots , s\}\), \(c \in \{1, \ldots , q\}\) and \(e_k \in \mathbb {R}^s\), \(e_c \in \mathbb {R}^q\) are the respective standard unit vectors. Note that m corresponds to the matrix \(M_N\) in [19], written as a vector, and under slight abuse of notation, we will still denote its components in a double index notation. Denote by \(I_d\) the identity matrix, by \(\mathbf{1}_{d\times e}\) the \(d\times e\) matrix consisting only of ones and \(\mathbf{1}_d=\mathbf{1}_{d\times 1}\). Define the diagonal matrix consisting of the block sizes \(S= {{{\,\mathrm{diag}\,}}}( (\left|S_k\right|)_{k=1,\dots ,s})\).

In the most general case treated in this note, the \(s\times s\) block interaction matrix A is assumed to be symmetric and positive definite with only positive entries \(A_{j,k}>0\). For instance, it suffices that \(A^{-1}\) is a so-called Stieltjes matrix with negative off-diagonal entries. A more specific example is the structured matrix \(A_{\alpha ,\beta }=\alpha \mathbf{1}_{s\times s}+(\beta -\alpha ) I_s\), which has \(\beta \) on the diagonal and \(\alpha \) elsewhere, corresponding to the Hamiltonian (1). We will denote by

$$\begin{aligned} \mathcal S= S\otimes I_q,\quad \mathcal A=A\otimes I_q \end{aligned}$$

the corresponding block-structured matrices of the block sizes and interaction matrix, respectively.

Next let \(\gamma _k=\lim _N\left|S_k\right|/N\) and \(\Gamma =\lim _{N\rightarrow \infty }\mathcal S /N={{{\,\mathrm{diag}\,}}}(\gamma )\otimes I_q\). Then the Hamiltonian can be rewritten as

$$\begin{aligned} H_N(\omega )=-\frac{1}{2N} m^T \mathcal S\mathcal A \mathcal Sm. \end{aligned}$$
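The equivalence of the pair-sum form (1) and the matrix form of the Hamiltonian can be checked numerically for \(A=A_{\alpha ,\beta }\). The following is a minimal sketch, not part of the original argument; the function names, the block sizes and the parameter values are illustrative assumptions, and colors are labeled \(0,\dots ,q-1\) instead of \(1,\dots ,q\).

```python
import numpy as np

def hamiltonian_direct(omega, block, alpha, beta):
    """Hamiltonian (1): sums over all ordered pairs (i, j), including i = j."""
    N = len(omega)
    same_color = omega[:, None] == omega[None, :]
    same_block = block[:, None] == block[None, :]
    intra = np.sum(same_color & same_block)    # pairs with i ~ j and equal colors
    inter = np.sum(same_color & ~same_block)   # pairs with i !~ j and equal colors
    return -(beta * intra + alpha * inter) / (2 * N)

def hamiltonian_matrix(omega, block, alpha, beta, s, q):
    """Matrix form -(1/2N) m^T S A S m with calligraphic S and A as Kronecker products."""
    N = len(omega)
    sizes = np.bincount(block, minlength=s)            # |S_k|
    counts = np.zeros((s, q))
    np.add.at(counts, (block, omega), 1)               # number of sites of color c in block k
    m = (counts / sizes[:, None]).reshape(-1)          # m_{k,c}, stored at index k*q + c
    A = alpha * np.ones((s, s)) + (beta - alpha) * np.eye(s)
    cal_S = np.kron(np.diag(sizes), np.eye(q))
    cal_A = np.kron(A, np.eye(q))
    return -(m @ cal_S @ cal_A @ cal_S @ m) / (2 * N)

# Hypothetical example: three blocks of sizes 5, 7, 8 and q = 4 colors.
rng = np.random.default_rng(0)
s, q = 3, 4
block = np.repeat(np.arange(s), [5, 7, 8])     # block label of each site
omega = rng.integers(0, q, size=block.size)    # random color configuration
h_direct = hamiltonian_direct(omega, block, alpha=0.4, beta=1.3)
h_matrix = hamiltonian_matrix(omega, block, alpha=0.4, beta=1.3, s=s, q=q)
print(h_direct, h_matrix)
```

Both values agree up to floating-point error, since \(\mathcal S m\) is simply the vector of color counts per block.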

Finally, let us recall that the critical inverse temperature of the q-color block Potts model is given by \(\zeta _q := 2 \frac{q-1}{q-2} \log (q-1) \). For \(q=2\), this has to be read as \(\zeta _2 = 2 = \lim _{q \downarrow 2} \zeta _q\) (see [11, 16]). Our main result in the present paper is a CLT for the matrix of block magnetizations. If the block sizes are asymptotically equal, this CLT reads as follows:

Theorem 1

Consider \(A=A_{\alpha ,\beta }\) in the high-temperature case, i.e.,

$$\begin{aligned} \Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert _2=\lim _{N\rightarrow \infty }\Vert \sqrt{S}A\sqrt{S}/N\Vert _2<\zeta _q, \end{aligned}$$

with asymptotically equal block sizes, i.e., \(\Gamma =I_{sq}/s\). Then, the normalized magnetization satisfies the following Central Limit Theorem

$$\begin{aligned} \sqrt{\mathcal S}(m-\tfrac{1}{q}\mathbf{1} _{sq})\Rightarrow W, \end{aligned}$$

where \(W\sim \mathcal N (0,\Sigma )\) is an sq-dimensional centered Gaussian random variable with singular covariance matrix

$$\begin{aligned} \Sigma =s\mathcal A^{-1}\left( \left( I_{sq}+\frac{\mathcal A}{s}\big ( I_s\otimes (\tfrac{1}{q^2}\mathbf{1}_{q\times q}-\tfrac{1}{q} I_q)\big )\right) ^{-1}-I_{sq}\right) . \end{aligned}$$

For non-homogeneous block sizes, we can prove a CLT under a more restrictive condition on the “inverse temperature” of a more general interaction matrix A.

Theorem 2

Let \(\gamma _k>0\) for all \(k=1,\dots ,s\) and \(\mathcal A = A\otimes I_q\) for some symmetric, positive definite interaction matrix A with positive entries. If \(\Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert _2<4\frac{q-1}{q}\), then the normalized magnetization satisfies the following Central Limit Theorem

$$\begin{aligned} \sqrt{\mathcal S}(m-\tfrac{1}{q}\mathbf{1} _{sq})\Rightarrow W, \end{aligned}$$

where \(W\sim \mathcal N (0, \Sigma )\) is an sq-dimensional centered Gaussian random variable with singular covariance matrix

$$\begin{aligned} \Sigma&=\big (\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big )^{-1}\left( \Big (I_{sq}+\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big ( I_s\otimes (\tfrac{1}{q^2}\mathbf{1}_{q\times q}-\tfrac{1}{q} I_q)\big )\Big )^{-1}-I_{sq}\right) . \end{aligned}$$

Remark 3

The value \(4\frac{q-1}{q}\) defines a threshold between different cases in [11, Proof of Theorem 2.1] as well, however it is not possible to generalize their monotonicity (in \(\beta \)) arguments to \(s>1\) blocks. Instead, our proof relies on a fixed-point argument, as in [20, 23]. This argument does not apply above the given threshold, where additional local minima appear already in the classical Potts model, and hence we cannot reach the critical inverse temperature \(\zeta _q\). Note that \(4\frac{q-1}{q}\le \zeta _q\) with equality for \(q=2\). Moreover, the same spectral norm also distinguishes between the high and low-temperature case in [20] and [6].

For \(s=2\), Theorem 1 is an explicit version of [21, Theorem 2] and the limiting distribution coincides. However, it is not apparent from [21] that the limiting distribution is degenerate, which we shall circumvent in the following.

Moreover, if we replace the condition \(\Vert {\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }}\Vert _2<4\frac{q-1}{q}\) by the same abstract condition as in [21, Equation (16)], i. e. if \(\phi \) has a unique minimizer \(\xi ^*\) at which the Hessian is strictly positive definite and if \(\phi \) grows at least quadratically far away from \(\xi ^*\), then our CLT continues to hold (since this is the last part of the proof from (40) and below). Here, \(\phi \) is given in Lemma 7 below.
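To get a feeling for the gap between the threshold \(4\frac{q-1}{q}\) of Theorem 2 and the critical inverse temperature \(\zeta _q\), one may tabulate both for small q. The snippet below is only an illustrative sketch; the function names are our own choice and not taken from the references.

```python
import math

def zeta(q):
    """Critical inverse temperature zeta_q; for q = 2 the limiting value 2 is used."""
    if q == 2:
        return 2.0
    return 2.0 * (q - 1) / (q - 2) * math.log(q - 1)

def fixed_point_threshold(q):
    """Threshold 4(q-1)/q appearing in Theorem 2."""
    return 4.0 * (q - 1) / q

# Compare both quantities for q = 2, ..., 7.
table = [(q, fixed_point_threshold(q), zeta(q)) for q in range(2, 8)]
for q, t, z in table:
    print(f"q={q}: 4(q-1)/q = {t:.4f}, zeta_q = {z:.4f}")
```

The inequality \(4\frac{q-1}{q}\le \zeta _q\) is equivalent to \(\log x \ge 2\frac{x-1}{x+1}\) for \(x=q-1\ge 1\), which explains why equality holds only for \(q=2\).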

Remark 4

The components of m are coupled since \(m_{k,q}=1-\sum _{c=1}^{q-1} m_{k,c}\). Therefore \(\Sigma \) is singular and has exactly s zero eigenvalues. We shall circumvent this by projecting to the first \(q-1\) coordinates of each block after rotating the hyperplane supporting the limiting Gaussian distribution. We rotate by the matrix \(I_s\otimes \tilde{R}\), each of whose s diagonal blocks is the \(q\times q\) matrix \(\tilde{R}\) (given in (42)), which rotates the unit normal vector of the support to a standard unit vector, i.e., \(\tilde{R} \frac{1}{\sqrt{q}}\mathbf{1} _{q}=e_q\). If \(\tilde{P}\) is the \((q-1)\times q\) projection matrix onto the first \(q-1\) coordinates, define \(\mathcal R=I_s\otimes \tilde{P} \tilde{R}\). According to the previous discussion, it is reasonable to consider the rotated (as well as projected and rescaled) magnetization given by

$$\begin{aligned} \hat{m} =\mathcal R\sqrt{\mathcal S}(m-\tfrac{1}{q}\mathbf{1} _{sq})=({{{\,\mathrm{diag}\,}}}(\sqrt{\left|S_k\right|}_{k=1,\dots , s})\otimes \tilde{P} \tilde{R}) m. \end{aligned}$$
(3)

Note that the rotated magnetization \(\hat{m}\) is already centered because of \(\mathcal R \mathbf{1}_{sq}=0\).

Theorem 5

Under the conditions of Theorem 1 or Theorem 2, respectively, the rotated magnetization \(\hat{m}\) satisfies the Central Limit Theorem

$$\begin{aligned} {\hat{m}}\Rightarrow \mathcal N (0 , (q-A/s)^{-1}\otimes I_{q-1}) \end{aligned}$$

in the case of asymptotically equal block sizes or, more generally

$$\begin{aligned} \hat{m}\Rightarrow \mathcal N \Big (0 , \big (q-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\big )^{-1}\otimes I_{q-1}\Big ), \end{aligned}$$

where the covariance matrix is non-singular and strictly positive definite.
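The passage from the singular covariance \(\Sigma \) of Theorem 1 to the non-singular covariance of Theorem 5 can be illustrated numerically. The sketch below is an assumption-laden illustration: since the matrix \(\tilde{R}\) from (42) is not reproduced here, we use a Householder reflection as a hypothetical stand-in. It is orthogonal and maps \(\frac{1}{\sqrt{q}}\mathbf{1}_q\) to \(e_q\), which is all that the covariance computation uses. The parameter values are arbitrary choices in the high-temperature regime.

```python
import numpy as np

def householder_to_eq(q):
    """Orthogonal matrix mapping 1_q/sqrt(q) to e_q (a reflection, stand-in for R tilde)."""
    u = np.ones(q) / np.sqrt(q)
    e = np.zeros(q)
    e[-1] = 1.0
    v = u - e
    return np.eye(q) - 2.0 * np.outer(v, v) / (v @ v)

def sigma_theorem1(A, q):
    """Covariance matrix Sigma of Theorem 1 for equal block proportions 1/s."""
    s = A.shape[0]
    cal_A = np.kron(A, np.eye(q))
    K = np.ones((q, q)) / q**2 - np.eye(q) / q
    inner = np.eye(s * q) + cal_A @ np.kron(np.eye(s), K) / s
    return s * np.linalg.inv(cal_A) @ (np.linalg.inv(inner) - np.eye(s * q))

# Hypothetical parameters: ||A||_2 / s = 1 lies below zeta_3.
s, q, alpha, beta = 3, 3, 0.5, 2.0
A = alpha * np.ones((s, s)) + (beta - alpha) * np.eye(s)
Sigma = sigma_theorem1(A, q)

P = np.eye(q)[: q - 1]                              # projection onto the first q-1 coordinates
R = np.kron(np.eye(s), P @ householder_to_eq(q))    # stand-in for the matrix R of Remark 4
rotated = R @ Sigma @ R.T
expected = np.kron(np.linalg.inv(q * np.eye(s) - A / s), np.eye(q - 1))

# Count the (numerically) vanishing eigenvalues of Sigma.
zero_eigs = int(np.sum(np.abs(np.linalg.eigvalsh((Sigma + Sigma.T) / 2)) < 1e-10))
print(np.allclose(rotated, expected), zero_eigs)
```

In this example the rotated covariance agrees with \((q-A/s)^{-1}\otimes I_{q-1}\), and \(\Sigma \) has s vanishing eigenvalues, in line with Remark 4.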

This rotated CLT makes a comparison to the mean-field Ising model, or its block model version, more apparent. We would like to emphasize that in the case of the Curie–Weiss model, we take \(q=2,s=1\), and \(\hat{m}= \sqrt{N/2}(m_2-m_1)\) is the usual magnetization of the Ising model up to a factor of \(1/\sqrt{2}\). The limiting distribution is given by \(\mathcal N (0 , (2-\beta )^{-1})\) which coincides with the classical results, see [10, Theorem V.9.4]. Note that once again, our \(\beta \) corresponds to \(2\beta \) in [10]. Moreover, one may retrieve [23, Theorem 1.2] in the case of \(q=s=2\), where our \(\beta \) corresponds to \(4\beta \) of [23].

The object of study in the LLN/LDP in [20] differs from that of the CLTs above by the scaling of the magnetization. The following moderate deviation principle (MDP) describes the fluctuations on the scales in between the two.

Theorem 6

Assume the conditions of Theorems 1 or 2. For any \(0<\theta <1/2\), the distribution of \(\mathcal R\mathcal S^{\theta }(m-\frac{1}{q} \mathbf{1}_{sq})\) under the Gibbs measure \(\mu _{N}\) satisfies a moderate deviation principle of speed \(N^{1-2\theta }\) with good rate function

$$\begin{aligned} \Lambda (t)=\frac{1}{2} t^T \left( \left( q{{{\,\mathrm{diag}\,}}(\gamma )}^{1-2\theta }-{{{\,\mathrm{diag}\,}}(\gamma )}^{1-\theta }A {{{\,\mathrm{diag}\,}}(\gamma )}^{1-\theta }\right) \otimes I_{q-1}\right) t. \end{aligned}$$

As usual in large deviations theory, the large deviations rate function derived in [19, §3] is not obtained when setting \(\theta =0\) in Theorem 6, while for \(\theta =1/2\) one obtains as rate function the exponent of the CLT in Theorems 1 and 2. It is folklore knowledge in large and moderate deviation theory that such a smooth transition from the MDP regime to the regime of the CLT is usually true, even though there are counterexamples (see, e.g., [22]).
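To make the statement about \(\theta =1/2\) explicit, one may formally insert this value into the rate function of Theorem 6 (which itself is only stated for \(\theta <1/2\)). Writing \(qI_s\) for what is abbreviated by q in Theorem 5 and using \((X\otimes I_{q-1})^{-1}=X^{-1}\otimes I_{q-1}\), this gives

$$\begin{aligned} \Lambda (t)\big |_{\theta =1/2}=\frac{1}{2} t^T \Big ( \big (qI_s-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\big ) \otimes I_{q-1}\Big ) t =\frac{1}{2} t^T \Big ( \big (qI_s-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\big )^{-1} \otimes I_{q-1}\Big )^{-1} t, \end{aligned}$$

i.e., the quadratic form in the exponent of the density of the Gaussian limit of the rotated magnetization in Theorem 5.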

Let us yet again consider the special case \(q=2\), \(s=1\) corresponding to the classical Curie–Weiss model. Here, \(\Lambda (t)=(2-\beta )t^2/2\), which coincides with [8, Example 2.1], after noticing that our \(\beta \) corresponds to \(2\beta \) in [8].

The proofs of the results presented above are based on the Hubbard–Stratonovich transform: convolving the distribution of the rescaled magnetization under the Gibbs measure with a suitable Gaussian measure leads to a representation to which Laplace’s method can be applied. Let us first state what this convolution looks like. In particular, the function \(\phi \) (cf. (5) below) in the exponent of the resulting density will be of major interest.
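For orientation, the Gaussian identity behind the Hubbard–Stratonovich transform can be stated as follows; the symbols B, y and d are introduced here only for illustration. For any symmetric, positive definite matrix \(B\in \mathbb {R}^{d\times d}\) and any \(y\in \mathbb {R}^d\), completing the square yields

$$\begin{aligned} \exp \left[ \tfrac{1}{2} y^TBy\right] =\sqrt{\frac{\det B}{(2\pi )^d}}\int _{\mathbb {R}^d}\exp \left[ -\tfrac{1}{2} x^TBx+x^TBy\right] d^{d}x. \end{aligned}$$

With the choice \(B=\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }/N\) and \(y=\mathcal S^{\theta }m\), the left-hand side becomes \(\exp (-H_N(\omega ))\), so that the quadratic interaction in m is replaced by a term that is linear in m and hence factorizes over the sites.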

Lemma 7

For any \(0\le \theta \le 1/2\) and all \(v\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}\) we have

$$\begin{aligned}&\mu _N\circ \left( \mathcal S^\theta (m-v)\right) ^{-1}*\mathcal N \left( 0,N( \mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1}\right) (d^{sq}x)\nonumber \\&=c_N \exp \left[ - N \phi \left( \left( \frac{\mathcal S}{ N}\right) ^{1-\theta }\frac{x}{ N^{\theta }}+\frac{\mathcal S}{N}v \right) \right] d^{sq}x, \end{aligned}$$
(4)

where

$$\begin{aligned} \phi (\xi )=\tfrac{1}{2} \xi ^T\mathcal A \xi -\sum _{k=1}^s \frac{\left|S_k\right|}{N} \log \left( \sum _{c=1}^q \exp \left[ \xi ^T\mathcal A(e_k\otimes e_c) \right] \right) . \end{aligned}$$
(5)

Since \(\phi \) depends on N in the case of non-equal block sizes \(\left|S_k\right|\not \equiv N/s\), we shall write \(\phi _N\) whenever this dependency needs to be stressed. Note that this is the analogue of [11, Lemma 3.2] in the case of \(\theta =1/2, s=1\) and that the sum of exponential functions corresponds to \(2\cosh (\cdot )\) in the Ising model, cf. [23, Proof of Lemma 3.1].
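The function \(\phi \) from (5) is easy to evaluate numerically. The following sketch, which is an illustration under stated assumptions and not part of the proofs, replaces \(|S_k|/N\) by its limit \(\gamma _k\) and checks that the uniform point \(\xi ^*_{k,c}=\gamma _k/q\) is a critical point of \(\phi \) with positive definite Hessian for one hypothetical choice of A and \(\gamma \) satisfying the condition of Theorem 2. All function names and parameter values are our own.

```python
import numpy as np

def phi(xi, A, gamma, q):
    """phi from (5) with |S_k|/N replaced by gamma_k; xi_{k,c} is stored at index k*q + c."""
    s = len(gamma)
    cal_A = np.kron(A, np.eye(q))
    y = (cal_A @ xi).reshape(s, q)                   # y[k, c] = xi^T cal_A (e_k (x) e_c)
    y_max = y.max(axis=1)
    log_sum = y_max + np.log(np.sum(np.exp(y - y_max[:, None]), axis=1))
    return 0.5 * xi @ cal_A @ xi - gamma @ log_sum

def grad_phi(xi, A, gamma, q):
    """Gradient of phi: cal_A (xi - gamma_k * softmax of block k of cal_A xi)."""
    s = len(gamma)
    cal_A = np.kron(A, np.eye(q))
    y = (cal_A @ xi).reshape(s, q)
    p = np.exp(y - y.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return cal_A @ (xi - (gamma[:, None] * p).reshape(-1))

def hessian_at_uniform(A, gamma, q):
    """Hessian of phi at xi* = gamma_k / q, where every softmax is uniform."""
    cal_A = np.kron(A, np.eye(q))
    D = np.kron(np.diag(gamma), np.eye(q) / q - np.ones((q, q)) / q**2)
    return cal_A - cal_A @ D @ cal_A

# Hypothetical example with two blocks of unequal proportions and q = 3.
A = np.array([[2.0, 0.5], [0.5, 2.0]])
gamma = np.array([0.3, 0.7])
q = 3
root = np.diag(np.sqrt(gamma))
spectral_norm = np.linalg.norm(root @ A @ root, 2)   # norm appearing in Theorem 2
xi_star = np.repeat(gamma / q, q)
grad_norm = np.linalg.norm(grad_phi(xi_star, A, gamma, q))
min_hess_eig = np.linalg.eigvalsh(hessian_at_uniform(A, gamma, q)).min()
print(spectral_norm, grad_norm, min_hess_eig)
```

Note that this only verifies that \(\xi ^*\) is a non-degenerate local minimum in the example; uniqueness of the global minimizer is not checked by the code.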

Finding the global minima of \(\phi \) will be crucial, hence we shall investigate \(\phi \) in the following. In Sect. 2 we build a bridge to the convex dual of \(\phi \) which is studied in [19]. In order to do so, we would have liked to invoke classical results from the theory of convex duality as given, e.g., by Eisele and Ellis [9, Appendix C]. However, our function of interest fails to meet the assumptions made in this theory. This is repaired by introducing subdifferentials and by applying the theory of convex duality to submanifolds. Note that a similar result can be found in [5, Theorem A.1], using a related methodology. Nevertheless, we chose to include our own arguments for completeness and to keep this paper self-contained. Moreover, we provide a result not only for the minimizer but for all critical points, and we explicitly state the bijection on which the duality is based.

In Sect. 3 we prove Lemma 7. Moreover, we study the properties of \(\phi \) in detail and, in particular, find its minima. For asymptotically equal block sizes we can make use of results obtained in [19]. This is why in this case we obtain results up to the critical temperature, while in the case of general block sizes and general interaction matrix we need to study \(\phi \) directly. Finally, in Sect. 4 we will prove the main theorems of this note.

2 Some Facts on Convex Duality

Recall that the convex conjugate of a function \(f :\mathbb {R}^d \rightarrow \mathbb {R} \cup \{\infty \}\) is defined by

$$\begin{aligned} f^*(\nu )=\sup _{\xi \in \mathbb {R}^d}\{\xi ^T\nu -f(\xi )\} = \sup _{\xi \in \mathrm {dom}(f)}\{\xi ^T\nu -f(\xi )\}, \end{aligned}$$

where \(\mathrm {dom}(f) := \{\xi \in \mathbb {R}^d :f(\xi ) < \infty \}\) (in passing, note that we never allow for \(f(\xi ) = - \infty \)). According to [26, Theorem 12.2], this transform is an involution \((f^*) ^*=f\) for closed convex functions.

For the ease of presentation, we first show a duality result in a simplified situation, i.e., for functions which behave particularly well with respect to convex conjugates. That is, assume that \(B := \mathrm {int}(\mathrm {dom}(f))\) is a non-empty open convex set and that f is strictly convex and essentially smooth, i.e., f is differentiable on B and \(\lim |\nabla f(x_i)| = \infty \) whenever \(x_i\) is a sequence in B converging to a boundary point of B. Such a function is also called a convex function of Legendre type. In this case, by [26, Theorem 26.5], \(f^*\) is a convex function of Legendre type as well, we have

$$\begin{aligned} \nabla f \circ \nabla f^*=Id=\nabla f^*\circ \nabla f, \end{aligned}$$
(6)

and by [26, Theorem 23.5],

$$\begin{aligned} {\text {argsup}}_{\nu }\{\nu ^T\xi -f^*(\nu )\}=\nabla f (\xi ). \end{aligned}$$
(7)

Using these basic relations, we obtain the following lemma:

Lemma 8

Let f, g be convex functions of Legendre type. Then we have the convex duality

$$\begin{aligned} \sup _{\xi \in \mathrm {dom}(g)}\{f(\xi )-g(\xi )\}=\sup _{\nu \in \mathrm {dom}(f^*)}\{g^*(\nu )-f^*(\nu )\}. \end{aligned}$$
(8)

Moreover, the bijection \(\nabla g\), given by (6), is a bijection between critical points and a bijection between maximizers of (8), and for any critical point \(\xi \) of \(f-g\), we have

$$\begin{aligned} f(\xi )-g(\xi ) = g^*(\nabla g(\xi ))-f^*(\nabla g(\xi )). \end{aligned}$$

In particular, if the right supremum is attained at a unique maximizer \(\nu ^*\), then the left supremum is attained at the unique maximizer \(\xi ^*=\nabla g^*(\nu ^* )\).

Proof

The duality (8) can be found in Eisele and Ellis [9, Appendix C].

Suppose that \(\xi \) is a critical point of \(f-g\). In particular,

$$\begin{aligned} \nabla (f (\xi )-g(\xi ))=0, \end{aligned}$$
(9)

i.e., \(\nabla f\) and \(\nabla g\) take the same value on critical points, and hence by (6),

$$\begin{aligned} \nabla f (\nabla g^*(\nabla g(\xi )))=\nabla f(\xi )=\nabla g(\xi ). \end{aligned}$$

Applying \(\nabla f^*\) to both sides reveals that \(\nabla g(\xi )\) is a critical point of \(g^*-f^*\). Moreover, it follows from (9) and the representation (7) that

$$\begin{aligned} g^*(\nabla g(\xi ))-f^*(\nabla g(\xi ))&= g^*(\nabla g(\xi ))-f^*(\nabla f(\xi ))+\xi ^T(\nabla f(\xi )-\nabla g(\xi ))\nonumber \\ {}&=f(\xi )-g(\xi ). \end{aligned}$$
(10)

Conversely, using the above reasoning together with the involution property of the Legendre transform, any critical point \(\nu \) of \(g^*-f^*\) is also a critical point of \(f-g\) and satisfies

$$\begin{aligned} g^*(\nu )-f^*(\nu ) = f(\nabla g^*(\nu )) - g(\nabla g^*(\nu )). \end{aligned}$$
(11)

Suppose in addition that \(\xi \) is a maximizer such that \(f(\xi )-g(\xi )=\sup \{f-g\}\). Then, by convex duality (8) and (10), we obtain that \(\nabla g(\xi )\) is a maximizer of \(g^*-f^*\). Conversely let \(\nu \) be a maximizer of \(g^*-f^*\), then by (8) and (11) also \(\nabla g^*(\nu )\) is a maximizer of \(f-g\). \(\square \)

In the proofs of Theorems 1 and 2, we need a result of the same type as Lemma 8 applied to the functions

$$\begin{aligned} f(\xi ) := \sum _{k=1}^s\frac{|S_k|}{N}\log \sum _{c=1}^q\exp (\xi _{k,c}),\qquad g(\xi ):= \frac{1}{2}\xi ^T\mathcal A ^{-1}\xi \end{aligned}$$
(12)

such that \(\phi (\xi )=g(\mathcal A \xi )-f(\mathcal A \xi )\). Here, and in the sequel, we used the same notation as for the magnetization (2), i.e. we denote the components by \(\xi _{k,c}:=(e_k\otimes e_c)^T\xi =\xi _{(k-1)q+c}\). Let us calculate the convex conjugates \(f^*\) and \(g^*\). First, note that \(g^*(\nu )=\frac{1}{2}\nu ^T\mathcal A \nu \), see [10, Example VI.5.1]. Therefore, it remains to find

$$\begin{aligned} f^*(\nu )=\sup _{\xi } \{\nu ^T\xi - f(\xi )\}. \end{aligned}$$
(13)

Differentiation with respect to \(\xi \) shows that the supremum is attained at a critical point satisfying

$$\begin{aligned} \nu _{k,c}-\frac{|S_k|}{N}\frac{\exp ( \xi _{k,c})}{\sum _{\tilde{c}=1}^q\exp ( \xi _{k,\tilde{c}}) }=0 \end{aligned}$$

for all \(c=1,\dots ,q\) and \(k=1,\dots ,s\). In particular, critical points can only exist if \(\nu _{k,c} > 0\) and moreover, summing over \(c=1,\dots ,q\) shows that we must have \(\sum _{c=1}^q\nu _{k,c}=|S_k|/N\) for all \(k=1,\dots ,s\). (If these conditions are not satisfied, it is not hard to see that \(f^*(\nu ) = \infty \).)

Applied to (13), this readily implies \(f^*(\nu )=\nu ^T\log \nu -\mathcal H\), where the logarithm is considered componentwise and \(\mathcal H = \mathcal H(|S_\cdot |/N) = \sum _{k=1}^s(|S_k|/N)\log (|S_k|/N)\) is the entropy of \((|S_1|/N, \ldots , |S_s|/N)\). Summing up, we have

$$\begin{aligned} f^*(\nu ) = {\left\{ \begin{array}{ll} \nu ^T\log \nu -\mathcal H, &{} \nu \in C\\ \infty , &{}\text {else}\end{array}\right. },\qquad g^*(\nu )=\frac{1}{2}\nu ^T\mathcal A \nu , \end{aligned}$$
(14)

where

$$\begin{aligned} C := \{\nu \in {{\,\mathrm{\mathbb {R}}\,}}^{sq} :\nu _{k,c} > 0\ \text {and}\ \sum _{c=1}^q \nu _{k,c} = |S_k|/N\ \text {for all}\ k\}. \end{aligned}$$
(15)
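The formula for \(f^*\) on C in (14) can be sanity-checked numerically: for any \(\xi \), the point \(\nu =\nabla f(\xi )\) lies in C, and the supremum in (13) is attained at \(\xi \), so that \(\nu ^T\xi -f(\xi )\) must equal \(\nu ^T\log \nu -\mathcal H\). The sketch below does this for one random \(\xi \); the proportions \(\gamma _k\) standing in for \(|S_k|/N\) and all function names are illustrative assumptions.

```python
import numpy as np

def f(xi, gamma, q):
    """f from (12) with |S_k|/N replaced by gamma_k; xi_{k,c} is stored at index k*q + c."""
    y = xi.reshape(len(gamma), q)
    return gamma @ np.log(np.sum(np.exp(y), axis=1))

def grad_f(xi, gamma, q):
    """Gradient of f: gamma_k times the softmax of the k-th block of xi."""
    y = xi.reshape(len(gamma), q)
    p = np.exp(y - y.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return (gamma[:, None] * p).reshape(-1)

def f_star_on_C(nu, gamma):
    """Closed form (14) of the convex conjugate for nu in C."""
    H = gamma @ np.log(gamma)
    return nu @ np.log(nu) - H

# Hypothetical example: three blocks with proportions gamma and q = 4 colors.
rng = np.random.default_rng(1)
gamma, q = np.array([0.2, 0.3, 0.5]), 4
xi = rng.normal(size=len(gamma) * q)
nu = grad_f(xi, gamma, q)
lhs = nu @ xi - f(xi, gamma, q)      # objective of (13) at its critical point xi
rhs = f_star_on_C(nu, gamma)
block_sums = nu.reshape(len(gamma), q).sum(axis=1)
print(lhs, rhs, block_sums)
```

The block sums of \(\nu \) reproduce \(\gamma \), which is the constraint defining C.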

In particular, note that in this situation, we cannot directly apply Lemma 8, since f is not a convex function of Legendre type. In fact, its convex conjugate \(f^*\) is not even differentiable (as a function on \(\mathbb {R}^{sq}\)). Therefore, we need to modify the arguments leading to Lemma 8.

To this end, let us recall the notion of subdifferentials. Let \(h :\mathbb {R}^{sq} \rightarrow \mathbb {R} \cup \{ \infty \}\) be a convex function. Then, the subdifferential \(\partial h(\nu )\) of h in \(\nu \in \mathbb {R}^{sq}\) is the set of all vectors (subgradients) \(\nabla h(\nu ) \in \mathbb {R}^{sq}\) such that

$$\begin{aligned} h(\eta ) \ge h(\nu ) - \langle \nabla h(\nu ), \eta -\nu \rangle \end{aligned}$$
(16)

for all \(\eta \in \mathbb {R}^{sq}\) or, equivalently, all \(\eta \in \mathrm {dom}(h)\). Labeling the elements of \(\partial h(\nu )\) by \(\nabla h(\nu )\) is non-standard but convenient for our purposes. If h is differentiable in \(\nu \), then \(\partial h(\nu )\) just consists of a single element \(\nabla h(\nu )\), which is the usual Euclidean gradient of h at \(\nu \), cf. [26, Theorem 25.1]. Assuming h to be a closed function, (7) continues to hold for subgradients, and generalizing (6), we have

$$\begin{aligned} \begin{aligned} \nabla h(\nu ) \in \partial h(\nu )&\Leftrightarrow \nu \in \partial h^*(\nabla h(\nu )),\\ \nabla h^*(\nu ) \in \partial h^*(\nu )&\Leftrightarrow \nu \in \partial h(\nabla h^*(\nu )) \end{aligned} \end{aligned}$$
(17)

according to [26, Theorem 23.5]. In particular, we can always choose suitable elements of the respective subdifferentials such that (6) holds (pointwise). For illustration, consider f and g as in (12) for \(q=2\) and \(s=1\). In this case,

$$\begin{aligned} \partial f(\xi )&= \Big (\frac{e^{\xi _1}}{e^{\xi _1} + e^{\xi _2}}, \frac{e^{\xi _2}}{e^{\xi _1} + e^{\xi _2}}\Big ),\quad \partial g(\xi ) = (\xi _1/\beta , \xi _2/\beta )\\ \partial f^*(\nu )&= {\left\{ \begin{array}{ll} \{ (z, z + \log (\nu _2/\nu _1)) :z \in \mathbb {R} \} &{} \nu \in (0,1)^2, \nu _1 + \nu _2 = 1\\ \emptyset &{}\text {else}\end{array}\right. },\quad \partial g^*(\nu ) = (\beta \nu _1, \beta \nu _2). \end{aligned}$$

In Lemma 8, the supremum of \(g^*-f^*\) is evaluated over \(\mathrm {dom}(f^*)\), which in our case equals C as in (15). Therefore, we have to consider functions \(h :C \rightarrow \mathbb {R}\). Set

$$\begin{aligned} \hat{C} := \{\hat{\nu } \in {{\,\mathrm{\mathbb {R}}\,}}^{(q-1)s} :\hat{\nu }_{k,c} > 0\ \text {and}\ \sum _{c=1}^{q-1} \hat{\nu }_{k,c} < |S_k|/N\ \text {for all}\ k\}. \end{aligned}$$
(18)

Obviously, there is a natural bijection between C and \(\hat{C}\): any \(\nu \in C\) corresponds to a \(\hat{\nu }\) whose k-th block is given by \((\nu _{k,1}, \ldots , \nu _{k,q-1})\) (i. e. the first \(q-1\) coordinates of the k-th block of \(\nu \)). In other words, recalling the projection \(\tilde{P}\) from Remark 4, we have \(\hat{\nu }=(I_s\otimes \tilde{P})\nu \in \hat{C}\). Conversely, given \(\hat{\nu }\), we get back \(\nu \) by adding a q-th coordinate to each block by setting \(\nu _{k,q} = |S_k|/N-\sum _{c=1}^{q-1}\hat{\nu }_{k,c}\). In particular, any function \(h :C \rightarrow \mathbb {R}\) naturally gives rise to a function \(\hat{h} :\hat{C} \rightarrow \mathbb {R}\) given by \(\hat{h}(\hat{\nu }) := h(\nu )\), where \(\nu \) is the element of C corresponding to \(\hat{\nu }\) as described above. In the same way, any function \(\hat{h}\) on \(\hat{C}\) naturally corresponds to a function h on C. The subdifferentials of h and \(\hat{h}\) are related as follows:

Lemma 9

Consider C as in (15) and \(\hat{C}\) as in (18), let \(h :\mathbb {R}^{sq} \rightarrow \mathbb {R} \cup \{\infty \}\) be a convex function with \(C \subset \mathrm {dom}(h)\) and \(\hat{h}\) as introduced above. For any vector \(\xi \in \mathbb {R}^{sq}\), let \(\widetilde{\xi } \in \mathbb {R}^{(q-1)s}\) be the vector whose k-th block is given by

$$\begin{aligned} (\xi _{k,c}-\xi _{k,q})_{c=1}^{q-1}, \end{aligned}$$
(19)

i.e., the q-th component of each block is subtracted from all the other components.

  1. \(\hat{h}\) is a convex function on \(\hat{C}\), and for any \(\nu \in C\) and any \(\nabla h(\nu ) \in \partial h(\nu )\), the vector \(\widetilde{\nabla h(\nu )}\) lies in \(\partial \hat{h}(\hat{\nu })\).

  2. Assume \(\mathrm {dom}(h) = C\), i. e. \(h(\nu ) = \infty \) if \(\nu \notin C\). If \(\hat{\nu } \in \hat{C}\) and \(\nabla \hat{h}(\hat{\nu }) \in \partial \hat{h}(\hat{\nu })\), then any vector \(x \in \mathbb {R}^{sq}\) such that \(\widetilde{x} = \nabla \hat{h}(\hat{\nu })\) lies in \(\partial h(\nu )\).

Proof

This follows readily from the definition of the subdifferential as in (16). Indeed, to see (1), take any \(\hat{\eta } \in \hat{C}\). Then, writing \(\nu _{k,q} = |S_k|/N-\sum _{c=1}^{q-1}\hat{\nu }_{k,c}\), we have

$$\begin{aligned} \hat{h}(\hat{\eta }) = h(\eta )&\ge h(\nu ) - \langle \nabla h(\nu ), \eta - \nu \rangle \\&= \hat{h}(\hat{\nu }) - \langle \widehat{\nabla h(\nu )}, \hat{\eta } - \hat{\nu } \rangle - \sum _{k=1}^s (\nabla h(\nu ))_{k,q}\sum _{c=1}^{q-1} (\nu _{k,c}-\eta _{k,c})\\&=\hat{h} (\hat{\nu })-\langle \widetilde{\nabla h (\nu )},\hat{\eta }-\hat{\nu }\rangle , \end{aligned}$$

which is what had to be proven. Reversing these arguments immediately leads to (2), where the condition \(\mathrm {dom}(h) = C\) guarantees that we may restrict ourselves to vectors \(\eta \in C\). \(\square \)

We are now ready to prove an analogue of Lemma 8 for the functions f and g from (12).

Lemma 10

The statement of Lemma 8 continues to hold for f and g as in (12).

Note that in this case, a critical point of \(g^*-f^*\) has to be understood as a critical point of \(\hat{g}^* - \hat{f}^*\), which is a differentiable function on \(\hat{C}\). To switch between the differentials of \(g^*\) and \(\hat{g}^*\), we may then use Lemma 9 (1) and the uniqueness of the subdifferential for differentiable functions. In particular, once again it holds that if the right supremum is attained at a unique maximizer \(\nu ^*\), then the left supremum is attained at the unique maximizer \(\xi ^*=\nabla g^*(\nu ^* )\).

Proof

We follow the proof of Lemma 8 and adapt it to the situation under consideration. Obviously, (8) holds since for this relation, no differentiability assumptions are needed. For the rest of the proof, note that for f and g as in (12), the functions f, g, \(g^*\), \(\hat{f}^*\) and \(\hat{g}^*\) are differentiable, which in particular yields uniqueness of the corresponding subdifferentials.

Suppose that \(\xi \) is a critical point of \(f-g\). In particular,

$$\begin{aligned} \nabla (f (\xi )-g(\xi ))=0, \end{aligned}$$
(20)

i.e., \(\nabla f\) and \(\nabla g\) take the same value on critical points, and hence by (6),

$$\begin{aligned} \nabla f (\nabla g^*(\nabla g(\xi )))=\nabla f(\xi )=\nabla g(\xi ). \end{aligned}$$

Now choose any \(\nabla f^*(\nabla g(\xi )) \in \partial f^*(\nabla g(\xi ))\) and apply it to both sides of this equality, which yields

$$\begin{aligned} \nabla f^*(\nabla f (\nabla g^*(\nabla g(\xi ))))=\nabla f^*(\nabla g(\xi )), \end{aligned}$$

or, in terms of \(\hat{f}^*\),

$$\begin{aligned} \nabla \hat{f}^*\big ((I_s\otimes \tilde{P})\nabla f (\nabla g^*(\nabla g(\xi )))\big )=\nabla \hat{f}^*(\widehat{\nabla g(\xi )}), \end{aligned}$$
(21)

where for convenience of notation we used \(I_s\otimes \tilde{P}\) instead of the \(\widehat{\cdot }\) notation.

Now note that for any \(\omega \in \mathbb {R}^{sq}\), \(\nabla \hat{f}^*(\widehat{\nabla f(\omega )}) = \widetilde{\omega }\). To see this, note that by (17), a suitable choice of subgradient yields \(\nabla f^*(\nabla f(\omega )) = \omega \), which by Lemma 9 (part (1), for \(h = f^*\), \(\nu = \nabla f(\omega )\)) yields the claim for this choice of subgradient, from where the claim also follows in general since \(\hat{f}^*\) is differentiable (and thus has a unique subdifferential). In particular, the left-hand side of (21) reads \(\widetilde{\nabla g^*(\nabla g(\xi ))}\), which by Lemma 9 (1) equals \(\nabla \hat{g}^*(\widehat{\nabla g(\xi )})\). Altogether, \(\nabla \hat{f}^*(\widehat{\nabla g(\xi )}) = \nabla \hat{g}^*(\widehat{\nabla g(\xi )})\), i. e. \(\widehat{\nabla g(\xi )}\) is a critical point of \(\hat{g}^* - \hat{f}^*\). Moreover, it follows from (20) and the representation (7) that

$$\begin{aligned} \hat{g}^*(\widehat{\nabla g(\xi )})-\hat{f}^*(\widehat{\nabla g(\xi )})&= g^*(\nabla g(\xi ))-f^*(\nabla g(\xi ))\nonumber \\&= g^*(\nabla g(\xi ))-f^*(\nabla f(\xi ))+\xi ^T(\nabla f(\xi )-\nabla g(\xi ))\nonumber \\&=f(\xi )-g(\xi ). \end{aligned}$$
(22)

Conversely, let \(\hat{\nu }\) be a critical point of \(\hat{g}^*-\hat{f}^*\), i. e. \(\nabla (\hat{g}^*(\hat{\nu }) -\hat{f}^*(\hat{\nu })) = 0\). Using Lemma 9 (both parts), we may therefore choose \(\nabla f^*(\nu ) \in \partial f^*(\nu )\) such that \(\nabla (g^*(\nu ) -f^*(\nu )) = 0\). As above, it follows that

$$\begin{aligned} \nabla f^* (\nabla g(\nabla g^*(\nu )))=\nabla f^*(\nu )=\nabla g^*(\nu ). \end{aligned}$$

Applying \(\nabla f\) to both sides, we obtain that \(\nabla g^*(\nu )\) is also a critical point of \(f-g\), and as above we may furthermore deduce that it satisfies

$$\begin{aligned} f(\nabla g^*(\nu )) - g(\nabla g^*(\nu )) = g^*(\nu )-f^*(\nu ). \end{aligned}$$
(23)

Suppose in addition that \(\xi \) is a maximizer such that \(f(\xi )-g(\xi )=\sup \{f-g\}\). Then, by convex duality (8) and (22), we obtain that \(\nabla g(\xi )\) is a maximizer of \(g^*-f^*\). Conversely let \(\nu \) be a maximizer of \(g^*-f^*\), then by (8) and (23) also \(\nabla g^*(\nu )\) is a maximizer of \(f-g\). \(\square \)

Subsequently, no subdifferentials will appear and all \(\nabla , \partial \) are classical derivatives.

3 Finding the Minimizer of \(\phi \)

To start, we derive the appearance of the function

$$\begin{aligned} \phi _N(\xi ):=\phi (\xi ):=\tfrac{1}{2} \xi ^T\mathcal A \xi -\sum _{k=1}^s \frac{\left|S_k\right|}{N} \log \left( \sum _{c=1}^q \exp \left[ \xi ^T\mathcal A(e_k\otimes e_c) \right] \right) \end{aligned}$$
(24)

in Lemma 7 as the exponential density for the smoothened distribution of the magnetization under the Gibbs measure.

Proof

(Proof of Lemma 7)

As the Hubbard-Stratonovich transform from [11, 23] suggests, we begin to evaluate for any measurable set \(U\in \mathcal B ({{\,\mathrm{\mathbb {R}}\,}}^{sq})\)

$$\begin{aligned}&\mu _N\circ ( \mathcal S^{\theta }m)^{-1}*\mathcal N (0,N( \mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1})(U)\\&\quad =c_N\sum _{\omega \in \{1,\dots q\}^N} \int _U\exp \left[ -\frac{1}{2}\left( x-\mathcal S^\theta m\right) ^T\left( \frac{\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }}{N}\right) \left( x-\mathcal S^\theta m\right) \right. \\ {}&\left. \qquad +\frac{1}{2} \left( \frac{\mathcal S m}{\sqrt{N}}\right) ^T\mathcal A \frac{\mathcal S m}{\sqrt{N}}\right] d^{sq}x \\&\quad =c_N \int _U \exp \left[ -\tfrac{1}{2} x^T\left( \frac{\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }}{N}\right) x\right] \sum _{\omega \in \{1,\dots q\}^N} \exp \left[ x^T\left( \frac{\mathcal S^{1-\theta }\mathcal A \mathcal S}{N}\right) m\right] d^{sq}x\\&\quad =c_N \int _U \exp \left[ -\tfrac{1}{2} x^T\left( \frac{\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }}{N}\right) x\right] \sum _{\omega _1=1}^q\cdots \sum _{\omega _N=1}^q\\ {}&\qquad \times \exp \left[ \tfrac{1}{ N} \sum _{k=1 }^s \sum _{c=1}^q \sum _{i\in S_k}x^T(\mathcal S^{1-\theta }\mathcal A)(e_k\otimes e_c){\mathbbm {1}}_c(\omega _i)\right] d^{sq}x\\&\quad =c_N \int _U \exp \left[ -\tfrac{1}{2} x^T\left( \frac{\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }}{N}\right) x\right] \sum _{\omega _1=1}^q\cdots \sum _{\omega _N=1}^q\prod _{k=1}^s\prod _{i\in S_k}\\ {}&\qquad \times \exp \left[ \tfrac{1}{ N} x^T(\mathcal S^{1-\theta }\mathcal A)(e_k\otimes e_{\omega _i})\right] d^{sq}x\\&\quad =c_N \int _U \exp \left[ -\tfrac{1}{2} x^T\left( \frac{\mathcal S^{1-\theta }\mathcal A\mathcal S^{1-\theta }}{N}\right) x\right] \\&\qquad \times \prod _{k=1}^s\left( \sum _{c=1}^q \exp \left[ \tfrac{1}{ N} x^T(\mathcal S^{1-\theta }\mathcal A)(e_k\otimes e_c)\right] \right) ^{\left|S_k\right|}d^{sq}x, \end{aligned}$$

where in the last two steps we used that, within each block \(S_k\), the exponential factor depends on the site \(i\in S_k\) only through its spin value \(\omega _i\), so that the sum over configurations factorizes into a product over sites. Slightly rewriting, we therefore obtain

$$\begin{aligned}&\mu _N\circ ( \mathcal S^\theta m)^{-1}*\mathcal N (0,N(\mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1})(U)\\&\quad =c_N \int _U \exp \left[ -\frac{N}{2} \left( \frac{\mathcal S^{1-\theta }}{N}x \right) ^T\mathcal A \left( \frac{\mathcal S^{1-\theta }}{N}x \right) \right. \\ {}&\qquad \left. +\sum _{k=1}^s \left|S_k\right|\log \left( \sum _{c=1}^q \exp \left[ \left( \frac{\mathcal S^{1-\theta }}{N}x \right) ^T\mathcal A(e_k\otimes e_c) \right] \right) \right] d^{sq}x. \end{aligned}$$

For simplicity, we write \(\phi \) as presented in (24) and shift by an arbitrary \(v\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}\) to obtain

$$\begin{aligned}&\mu _N\circ \left( \mathcal S^\theta (m-v)\right) ^{-1}*\mathcal N \left( 0,N( \mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1}\right) (d^{sq}x)\nonumber \\&\quad =c_N \exp \left[ - N \phi \left( \left( \frac{\mathcal S}{ N}\right) ^{1-\theta }\frac{x}{ N^{\theta }}+\frac{\mathcal S}{N}v \right) \right] d^{sq}x, \end{aligned}$$

as claimed. \(\square \)
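The factorization used in the last two steps of the computation above can be checked by brute force for a tiny system. The following is only a sketch: the block sizes, the number of colors, and the array `y` (which stands in for the exponents \(\tfrac{1}{N}x^T(\mathcal S^{1-\theta }\mathcal A)(e_k\otimes e_c)\)) are arbitrary illustrative choices, and the function names are hypothetical.

```python
import itertools
import numpy as np

def brute_force_sum(y, block_of_site, q):
    """Sum over all omega in {0,...,q-1}^N of prod_i exp(y[block(i), omega_i])."""
    total = 0.0
    for omega in itertools.product(range(q), repeat=len(block_of_site)):
        exponent = sum(y[k, c] for k, c in zip(block_of_site, omega))
        total += np.exp(exponent)
    return total

def factorized_sum(y, block_sizes):
    """prod_k (sum_c exp(y[k, c]))^{|S_k|}, i.e. the last line of the display."""
    per_block = np.exp(y).sum(axis=1)
    return float(np.prod(per_block ** np.asarray(block_sizes)))

rng = np.random.default_rng(0)
q = 3                                   # number of colors (illustrative)
block_sizes = [2, 3]                    # |S_1| = 2, |S_2| = 3, hence N = 5
block_of_site = [k for k, n in enumerate(block_sizes) for _ in range(n)]
y = rng.normal(size=(len(block_sizes), q))  # arbitrary stand-in exponents

lhs = brute_force_sum(y, block_of_site, q)
rhs = factorized_sum(y, block_sizes)
print(lhs, rhs)                         # the two numbers should agree
```

The brute-force sum has \(q^N\) terms, so this check is only feasible for very small N; the factorized form is what makes the large-N analysis tractable.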

The central aim of this section is to find the minimizer of \(\phi \). To see that the set of minimizers is non-empty, note that there exists an \(R>0\) such that

$$\begin{aligned} \phi (\xi )\ge \tfrac{1}{3}\xi ^T\mathcal A \xi \quad \text {for all }\left\Vert \xi \right\Vert >R \end{aligned}$$
(25)

for N sufficiently large, i.e., \(\phi \) grows at least quadratically for large \(\Vert \xi \Vert \). Indeed, the logarithmic term in (24) grows at most linearly in \(\Vert \xi \Vert \), so the claim follows from considering the limit \(\left\Vert \xi \right\Vert \rightarrow \infty \) and \(\mathcal A\) being positive definite, i. e. \(\tfrac{1}{6} \xi ^T\mathcal A \xi \ge \lambda \left\Vert \xi \right\Vert ^2\) for some \(\lambda >0\).

It turns out that finding the minimizer in the case of (asymptotically) equal block sizes is fairly easily dealt with by the convex duality results from the previous section and [19].

Lemma 11

The minimum \(\phi ^*=\inf _{\xi \in {{\,\mathrm{\mathbb {R}}\,}}^ {sq}}\phi (\xi )\) of \(\phi \) is attained at

$$\begin{aligned} \xi ^*\in \Big \{\xi \in (0,1) ^{sq}: \sum _{c=1}^{q}\xi _{k,c}=\frac{\left|S_k\right|}{N}\text { for all }k=1,\dots ,s\Big \} = C \end{aligned}$$
(26)

with C as in (15). Under the conditions of Theorem 1, the minimum is uniquely attained at \(\xi ^* = \frac{1}{q s}\mathbf{1} _{sq}\) in the case of equal block sizes. If the block sizes are asymptotically equal only, then for any (big) \(R>0\) and any (small) \(r>0\) to be chosen later, any minimizer \(\xi _N^*\) of \(\phi _N\) in \(B_R(0)\) satisfies \(\xi ^*_N\in B_r(\frac{1}{q s}\mathbf{1} _{sq})\) for N large enough.

Proof

To see the first claim, first note that

$$\begin{aligned} \nabla \phi (\xi )=\mathcal A\xi - \sum _{k=1}^s\frac{\left|S_k\right|}{N} \frac{\sum _{c=1}^q\mathcal A(e_k\otimes e_c)\exp (\xi ^T\mathcal A(e_k\otimes e_c))}{\sum _{c=1}^q\exp (\xi ^T\mathcal A(e_k\otimes e_c))}=0, \end{aligned}$$
(27)

so that we must have \(\xi _{k,c} > 0\). Moreover, we may plug the identity above into the s equations \(\sum _{c=1}^q \partial _{k,c} \phi (\xi ) = 0\), which immediately leads to the result.

In order to find the supremum of \(-\phi \), note that by Lemma 10,

$$\begin{aligned} \sup _{\xi \in {{\,\mathrm{\mathbb {R}}\,}}^{sq}} \{-\phi (\xi )\}&= \sup _{\tilde{\xi }=\mathcal A \xi }\left\{ \sum _{k=1}^s\frac{\left|S_k\right|}{N}\log \sum _{c=1}^q\exp (\tilde{\xi }_{k,c})- \frac{1}{2}\tilde{\xi }^T\mathcal A ^{-1}\tilde{\xi }\right\} \\&=\sup _{\tilde{\xi }\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}}\left\{ f(\tilde{\xi })-g(\tilde{\xi })\right\} \\&=\sup _{\nu \in \mathrm {dom}(f^*)}\{g^*(\nu )-f^*(\nu )\}, \end{aligned}$$

and if \(\nu ^*\) is a maximizer of \(g^*-f^*\), then \(\tilde{\xi }^*=\nabla g^*(\nu ^*)\) maximizes \(f-g\). Since \(\nabla g^*(\nu ) = \mathcal A \nu \), it follows that \(\nu ^*\) is a minimizer of \(\phi \).

Therefore, it remains to check the maximizers of \(g^*-f^*\). Here, using (14), we have

$$\begin{aligned} \sup _{\nu \in \mathrm {dom}(f^*)} \{g^*(\nu ) - f^*(\nu )\} = \sup _{\nu \in C}\left\{ \frac{1}{2}\nu ^T\mathcal A \nu -\nu ^T\log \nu \right\} +\mathcal H. \end{aligned}$$

In [19, §4 & (4.1)], it has been shown that for equal block sizes \(|S_k|=N/s\), the supremum on the right-hand side is attained at a unique point \(\nu ^*=\frac{1}{ sq} \mathbf{1}_{sq}\). For asymptotically equal block sizes, the function \(\phi _N\) converges uniformly on compact sets to

$$\begin{aligned} \phi _\infty (\xi ) :=\tfrac{1}{2} \xi ^T\mathcal A \xi -\frac{1}{s}\sum _{k=1}^s \log \left( \sum _{c=1}^q \exp \left[ \xi ^T\mathcal A(e_k\otimes e_c) \right] \right) , \end{aligned}$$

i.e., \(\phi \) from (24) for (non-asymptotically) equal block sizes. Take \(R > 0\) such that \(\frac{1}{sq}\mathbf{1}_{sq}\in B_R(0)\). Then, for any small \(\varepsilon > 0\), the minimum \(\phi _N^*\) of \(\phi _N\) must satisfy \(\phi ^*-\varepsilon< \phi _N^* < \phi ^*+\varepsilon \) by uniform convergence. In particular, it follows that if \(\xi _N^*\) is a minimizer of \(\phi _N\), then by continuity, for any small \(r > 0\), \(\xi ^*_N\in B_r(\frac{1}{ sq} \mathbf{1}_{sq})\) for any N sufficiently large. \(\square \)

Let us now turn to the case of general block sizes, for which we will study \(\phi \) in more detail. This analysis is inspired by [11], which in turn relies on the ideas of [16]. In our case of \(s>1\) blocks, however, we need to replace several arguments that cannot be generalized directly, like monotonicity (in \(\beta \) and t defined below), real convex functions having at most two zeros or applying the logarithm.

According to (27), any minimizer satisfies the critical equation

$$\begin{aligned} \xi _{k,c}:=(e_k\otimes e_c)^T\xi = \frac{\left|S_k\right|\exp (\xi ^T(Ae_k\otimes e_c))}{N\sum _{\tilde{c}=1}^q\exp (\xi ^T(Ae_k\otimes e_{\tilde{c}}))}. \end{aligned}$$
(28)

For simplicity we write \(\xi _{\cdot ,c}=(I_s\otimes e_c^T)\xi \in {{\,\mathrm{\mathbb {R}}\,}}^s\) for the vector of a color c in different blocks and \(\xi _{k,\cdot }=(e_k^T\otimes I_q)\xi \in {{\,\mathrm{\mathbb {R}}\,}}^q\) for the vector of colors in a block k. Using the componentwise exponential and abbreviating \(w_k=\frac{\exp (\xi ^T(Ae_k\otimes I_q))}{\sum _{\tilde{c}=1}^q\exp (\xi ^T(Ae_k\otimes e_{\tilde{c}}))}\in {{\,\mathrm{\mathbb {R}}\,}}^q\), we may write the Hessian of \(\phi \) as the block matrix

$$\begin{aligned} H_\phi (\xi )&=A\otimes I_q-\sum _{k=1}^s\left( \frac{AS}{N}e_k(Ae_k)^T \right) \otimes \left( {{{\,\mathrm{diag}\,}}}(w_k)-w_kw_k^T\right) \nonumber \\&=\mathcal A\left[ I_{sq}-\sum _{k=1}^s\left( e_ke_k^T \right) \otimes \left( {{{\,\mathrm{diag}\,}}}(\tfrac{\left|S_k\right|}{N}w_k)-\tfrac{\left|S_k\right|}{N}w_kw_k^T\right) \mathcal A\right] . \end{aligned}$$
(29)

Also note that by (28), any minimizer \(\xi \) satisfies \(\xi _{k,\cdot }=\tfrac{\left|S_k\right|}{N}w_k\).
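The formulas (24), (27) and (29) can be cross-checked numerically. The sketch below is not part of the argument: the matrix `A`, the weights `gamma` (playing the role of \(|S_k|/N\)) and `q` are illustrative assumptions, and `make_phi` and `central_diff` are hypothetical helper names. It compares the closed-form gradient and Hessian with central finite differences and evaluates the gradient at the uniform point \(\xi _{k,c}=|S_k|/(qN)\).

```python
import numpy as np

def make_phi(A, gamma, q):
    """Return phi, its gradient and its Hessian for weights gamma_k = |S_k|/N."""
    s = len(gamma)
    calA = np.kron(A, np.eye(q))            # index of e_k (x) e_c is k*q + c

    def weights(xi):
        z = (calA @ xi).reshape(s, q)        # z[k, c] = xi^T calA (e_k (x) e_c)
        z = z - z.max(axis=1, keepdims=True)
        w = np.exp(z)
        return w / w.sum(axis=1, keepdims=True)   # row k is the vector w_k

    def phi(xi):
        z = (calA @ xi).reshape(s, q)
        return 0.5 * xi @ calA @ xi - gamma @ np.log(np.exp(z).sum(axis=1))

    def grad(xi):                            # formula (27)
        return calA @ (xi - (gamma[:, None] * weights(xi)).ravel())

    def hess(xi):                            # formula (29)
        w = weights(xi)
        D = np.zeros((s * q, s * q))
        for k in range(s):
            block = slice(k * q, (k + 1) * q)
            D[block, block] = gamma[k] * (np.diag(w[k]) - np.outer(w[k], w[k]))
        return calA - calA @ D @ calA

    return phi, grad, hess

def central_diff(f, x, eps=1e-6):
    """Central finite differences of f at x, one row per coordinate of x."""
    rows = []
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        rows.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.array(rows)

A = np.array([[2.0, 0.5], [0.5, 1.5]])      # illustrative interaction matrix
gamma = np.array([0.25, 0.75])              # illustrative block proportions
q = 3
phi, grad, hess = make_phi(A, gamma, q)

xi_star = np.repeat(gamma / q, q)           # uniform point |S_k| / (q N)
xi = np.random.default_rng(1).uniform(0.0, 0.5, size=len(gamma) * q)
grad_error = np.abs(central_diff(phi, xi) - grad(xi)).max()
hess_error = np.abs(central_diff(grad, xi) - hess(xi)).max()
print(np.abs(grad(xi_star)).max(), grad_error, hess_error)
```

All three printed numbers should be close to zero; in particular the uniform point solves the critical equation (28) for any choice of weights.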

Rearranging (28) and applying the logarithm, we obtain for all \(k=1,\dots ,s\) that

$$\begin{aligned}&\xi _{\cdot ,c}^TA e_k-\log (\xi _{\cdot ,c}^T e_k)=\log \left( \sum _{\tilde{c}=1}^q\exp (\xi ^T_{\cdot ,\tilde{c}}Ae_k)\right) -\log \left( \frac{\left|S_k\right|}{N}\right) \end{aligned}$$
(30)

is independent of \(c=1,\dots ,q\). Multiplying the same equation with \(\xi _{k,c}\) and summing over k and c implies that for any critical point \(\xi ^*\) the value of \(\phi \) can also be represented as a relative entropy

$$\begin{aligned} \phi (\xi ^*)=-\tfrac{1}{2} \sum _{k,c}\xi _{k,c}\log \left( \frac{\sum _{\tilde{c}=1}^q \exp \big (\xi _{\cdot ,\tilde{c}}^TAe_k\big )}{\xi _{k,c}}\right) -\tfrac{1}{2}\mathcal H(|S_\cdot |/N). \end{aligned}$$

Let us collect some important properties that will help to classify and parametrize minimizers, generalizing [19, Lemmas 4.3 & 4.4].

Lemma 12

Any minimizer \(\xi \) of \(\phi \) satisfies

  1. The color-coordinates may be ordered decreasingly, i.e. \(\xi _{k,1}\ge \dots \ge \xi _{k,q}\) for all k.

  2. If for some \(c=1,\dots ,q\) we have \(\xi _{k,c}=\xi _{k,c+1}\) for some k, then it holds for all k.

  3. There are at most two different entries in \(\xi _{k,\cdot }\), for any \(k=1,\dots ,s\).

This is the only proof that uses the assumption of A having positive entries.

Proof

(1) As we have seen in the proof of Lemma 11, we have

$$\begin{aligned} \sup _{\xi \in {{\,\mathrm{\mathbb {R}}\,}}^{sq}} \{-\phi (\xi )\} = \sup _{\nu \in C}\left\{ \frac{1}{2}\nu ^T\mathcal A \nu -\nu ^T\log \nu \right\} + \mathcal H. \end{aligned}$$

Hence, by convex duality \(\xi \) is a minimizer if and only if \(\nu =\xi \) is a minimizer of \(\xi ^T\log \xi -\tfrac{1}{2} \xi ^T\mathcal A \xi \). Let us assume that the color coordinates are ordered \(\xi _{k,1}\ge \dots \ge \xi _{k,q}\) for all k and show that any tuple \(\sigma \) of s permutations \(\sigma _k\in \mathbb S ^q\), given by \((e_k\otimes e_c)^T\xi _\sigma =\xi _{k,\sigma _k(c)}\) would only increase the value of \(\phi \). Indeed, for any \(j,k=1,\dots ,s\) the rearrangement inequality [15, Theorem 368] states that \(\sum _{c=1}^q\xi _{j,c}\xi _{k,c}\ge \sum _{c=1}^q\xi _{j,\sigma _j(c)}\xi _{k,\sigma _k(c)} \), hence

$$\begin{aligned} \xi ^T\log \xi -\frac{1}{2} \xi ^T\mathcal A \xi&=\sum _{k=1} ^s\sum _{c=1}^q\xi _{k,c}\log \xi _{k,c}-\frac{1}{2}\sum _{j,k=1}^sA_{j,k}\sum _{c=1}^q\xi _{j,c}\xi _{k,c}\nonumber \\ {}&\le \xi _\sigma ^T\log \xi _\sigma -\frac{1}{2} \xi _\sigma ^T\mathcal A \xi _\sigma . \end{aligned}$$
(31)

Here, we used that \(\xi ^T\log \xi \) is invariant under permutation of the vector and that \(A_{j,k}> 0\). Note that the rearrangement inequality is strict if two different indices are reordered differently, i.e., the value of (31) may only increase. Thus, if there exists \(j\ne k\) and \(\tilde{c} <c\) with different orderings \(\xi _{j,\tilde{c}}<\xi _{j,c}\) and \(\xi _{k,\tilde{c}}>\xi _{k,c}\), then it is not a minimum of (31) and not a minimum of \(\phi \).

(2) Assume \(\xi _{k,c}=\xi _{k, c+1}\) for some \(k=1,\dots ,s \) and some c and consider the difference of (30) for \(k,c\) and for \(k,c+1\), which is equivalent to

$$\begin{aligned} (\xi _{\cdot ,c}-\xi _{\cdot ,c+1})^TAe_k-\log (\xi _{k,c}/\xi _{k,c+1})=0\ \Leftrightarrow \ \sum _{j=1}^sA_{j,k}\xi _{j,c}=\sum _{j=1}^sA_{j,k}\xi _{j,c+1}. \end{aligned}$$

On the other hand by (1), we have \(\xi _{k,c}\ge \xi _{k, c+1}\) for all k and equality follows from \(A_{j,k}> 0\).

(3) Suppose there exist \(c_1<c_2<c_3\) with \(\xi _{k,c_1}>\xi _{k,c_2}>\xi _{k,c_3}\) for some \(k=1,\dots ,s\) and hence for all k due to (2). Again, from (30) it follows that

$$\begin{aligned} \frac{(\xi _{\cdot ,c_1}-\xi _{\cdot ,c_2})^TAe_k}{\xi _{k,c_1}-\xi _{k,c_2}}&=\frac{\log (\xi _{k,c_1})-\log (\xi _{k,c_2})}{\xi _{k,c_1}-\xi _{k,c_2}}<\frac{\log (\xi _{k,c_1})-\log (\xi _{k,c_3})}{\xi _{k,c_1}-\xi _{k,c_3}}\\ {}&= \frac{(\xi _{\cdot ,c_1}-\xi _{\cdot ,c_3})^TAe_k}{\xi _{k,c_1}-\xi _{k,c_3}}, \end{aligned}$$

where the inequality is nothing but the concavity of the logarithm. Multiplying with \((\xi _{k,c_1}-\xi _{k,c_2})\) and \((\xi _{k,c_1}-\xi _{k,c_3})\) and then summing over \(k=1,\dots ,s \) implies

$$\begin{aligned} (\xi _{\cdot ,c_1}-\xi _{\cdot ,c_2})^TA(\xi _{\cdot ,c_1}-\xi _{\cdot ,c_3})<(\xi _{\cdot ,c_1}-\xi _{\cdot ,c_3})^TA(\xi _{\cdot ,c_1}-\xi _{\cdot ,c_2}), \end{aligned}$$

which is a contradiction since A is symmetric. \(\square \)

Rewriting Lemma 12, we may search the minimizer \(\xi ^*\) of \(\phi \) among the vectors \(\xi \) for which there exists \(l \in \{0, 1, \ldots , q-1\}\) such that for any \(k = 1, \ldots , s\)

$$\begin{aligned} \xi _{k,1} = \ldots = \xi _{k,l} > \xi _{k,l+1} = \ldots = \xi _{k,q}. \end{aligned}$$

Recalling (26), for each k there exists \(t_k \in [0,1]\) such that the two possible values are given by \(|S_k|t_k/(Nl) + |S_k|(1-t_k)/(Nq)\) and \(|S_k|(1-t_k)/(Nq)\), i. e. any minimizer is of the form

$$\begin{aligned} \xi = \xi (t) =\sum _{c=1}^l\frac{St}{Nl}\otimes e_c+ \frac{S}{N}(\mathbf{1}_s-t)\otimes \tfrac{1}{q}\mathbf{1}_q \end{aligned}$$
(32)

for some \(t \in [0,1]^s\). Next, we show that \(l\le 1\), i. e. only the first coordinate may have a different value from the remaining ones.

Lemma 13

Any minimizer of \(\phi \) is given by

$$\begin{aligned} \xi (t)=\frac{S}{N}t\otimes e_1+ \frac{S}{N}(\mathbf{1}_s-t)\otimes \tfrac{1}{q}\mathbf{1}_q \end{aligned}$$
(33)

for some \(t\in [0,1]^s\).

This replaces [11, Equation (2.8)], where \(t\in [0,1]\). Surprisingly, it is not sufficient to consider values \(t=\tilde{t} \mathbf{1}_s\), \(\tilde{t}\in [0,1]\), in (33) in order to follow the lines of [11]. It will be apparent from Figure 1 below that minima might appear off the diagonal. In the sequel, we will use the Loewner order of matrices, writing \(X>_LY\) (or \(X\ge _L Y\)) if \(X-Y\) is positive (semi-)definite.

Proof

Let us first consider the case \({{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})\le _L A^{-1}\), which is equivalent to \(A\le _L{{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})^{-1}\). Let us view the left hand side of (30) as the component \(g_k(\xi _{\cdot ,c})\) for a function \(g:{{\,\mathrm{\mathbb {R}}\,}}^s\rightarrow {{\,\mathrm{\mathbb {R}}\,}}^s\). Then by our assumption, \(Dg(x)=A-{{{\,\mathrm{diag}\,}}}(x)^{-1}<_L0\) for all \(x\in {{\,\mathrm{\mathbb {R}}\,}}^s\) such that \({{{\,\mathrm{diag}\,}}}(x)<_L{{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})\). Hence \(g(\xi _{\cdot ,c})=g(\xi _{\cdot ,1})\) readily implies \(\xi _{\cdot ,c}=\xi _{\cdot ,1}\), i.e. \(\xi =\xi (0)\).

On the other hand, let us now assume \({{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})\not \le _L A^{-1}\). Since A is symmetric and positive definite, this is equivalent to \(A-A{{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})A\not \ge _L0\), that is \(u^T(A-A{{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})A)u<0\) for some \(u\in {{\,\mathrm{\mathbb {R}}\,}}^s\). Now assume \(l>1\) in (32). In this case, if \(\xi \) is a minimizer of \(\phi \), then \(x=0\) is a minimum of the function \(h(x)=\phi (\xi +xu\otimes (e_1-e_l))\). However, its second derivative is explicitly given by

$$\begin{aligned} h''(0)&=(u\otimes (e_1-e_l))^TH_\phi (\xi )(u\otimes (e_1-e_l))\\&=2u^TAu-\sum _{k=1}^s\left( u^TAe_k\right) ^2(e_1-e_l)^T\left( {{{\,\mathrm{diag}\,}}}(\xi _{k,\cdot })-\tfrac{N}{\left|S_k\right|}\xi _{k,\cdot }\xi _{k,\cdot }^T\right) (e_1-e_l)\\&=2u^TAu-\sum _{k=1}^s \left( u^TAe_k\right) ^2\left( \xi _{k,1}+\xi _{k,l}-\tfrac{N}{\left|S_k\right|}(\xi _{k,1}-\xi _{k,l})^2\right) \\&=2u^T\left( A-A{{{\,\mathrm{diag}\,}}}(\xi _{\cdot ,1})A\right) u<0. \end{aligned}$$

Therefore \(\xi \) cannot be a minimizer of \(\phi \) and the contradiction proves \(l\le 1\).Footnote 2\(\square \)

Fig. 1

The function \([0,1]^2\ni t\mapsto \phi (\xi (t))\), for \(q=5\) colors\(^5\), \(s=2\) blocks of size \(\gamma =(1/4,3/4)\) and different regimes of the norm of the interaction matrix: Smaller than \(4\tfrac{q-1}{q}=3.2\) (left), bigger than \(\zeta _q=\tfrac{8}{3} \log 4\approx 3.7\) (right) and in between (middle). The left graph clearly shows that for \(\Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert =3.1<3.2\), the (global) minimum is attained at \(t=0\). If we increase the inverse temperature to \(3.2<\Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert =3.65< 3.7\) (center), another local minimum appears at \(t\ne 0\). If \(\Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert =3.8>\zeta _q\) (right), then the new local minimum becomes global and a CLT around \(\tfrac{1}{q} \mathbf{1}_{sq}\) cannot hold.

Next, we study the minimizer of \(\phi (\xi (t))\) in the regime where the interaction matrix has a small norm as illustrated in Fig. 1 on the left. Figure 1 also exposes the difficulty of the intermediate regime, which is not covered by our results for inhomogeneous block sizes.

Proposition 14

If \(\left\Vert \sqrt{\tfrac{S}{N}}A\sqrt{\tfrac{S}{N}}\right\Vert _2<4\frac{q-1}{q}\), then \(\xi ^*= \tfrac{\mathcal S}{qN} \mathbf{1} _{sq}\) is the unique minimizer of \(\phi \).

Proof

By Lemma 13 it remains to study the minima of \(\Phi (t)=\phi (\xi (t))\) as a function of \(t\in [0,1]^s\). We have

$$\begin{aligned} \Phi (t)&= \tfrac{1}{2} \xi (t)^T(A\otimes I_q)\xi (t)-\sum _{k=1}^s\frac{\left|S_k\right|}{N} \log \left[ \sum _{c=1}^q\exp (\xi (t)^T(Ae_k\otimes e_c))\right] \\&=\tfrac{1}{2} t^T \frac{SAS}{N^2}t+t^T\frac{SAS}{N^2}(\mathbf{1}_s-t)\tfrac{1}{q}+\tfrac{1}{2}(\mathbf{1}_s-t)^T\frac{SAS}{N^2}(\mathbf{1}_s-t)\tfrac{1}{q}\\&\quad -\sum _{k=1}^s\frac{\left|S_k\right|}{N} \log \left[ \exp \Big (\big (\tfrac{1}{q}\mathbf{1}_s+\tfrac{q-1}{q} t\big )^T\tfrac{SA}{N}e_k\Big )+(q-1)\exp \Big (\tfrac{1}{q}\big (\mathbf{1}_s-t)^T\tfrac{SA}{N}e_k\big )\Big ) \right] \\&=\frac{q-1}{2q} t^T\frac{SAS}{N^2}t +\frac{1}{2q}\mathbf{1}_s^T \frac{SAS}{N^2}\mathbf{1}_s\\&\quad -\sum _{k=1}^s\frac{\left|S_k\right|}{N} \left( \tfrac{1}{q}\mathbf{1}^T_s \frac{SA}{N} e_k+ \tfrac{q-1}{q} t^T\tfrac{SA}{N}e_k+\log \left[ 1+(q-1)\exp \big (-t^T\tfrac{SA}{N}e_k\big )\right] \right) \\&=\frac{q-1}{2q} t^T\frac{SAS}{N^2}t -\frac{1}{2q}\mathbf{1}_s^T \frac{SAS}{N^2}\mathbf{1}_s-\frac{q-1}{q} t^T\frac{SAS}{N^2}\mathbf{1}_s\\ {}&\quad -\sum _{k=1}^s\frac{\left|S_k\right|}{N} \log \left[ 1+(q-1)\exp \big (-t^T\tfrac{SA}{N}e_k\big )\right] . \end{aligned}$$

In this form, we easily obtain the derivative

$$\begin{aligned} \nabla \Phi (t)&=\frac{q-1}{q}\frac{SAS}{N^2}(t-\mathbf{1}_s)+\sum _{k=1}^s\frac{\left|S_k\right|}{N} \frac{(q-1)\tfrac{SA}{N}e_k\exp \big (-t^T\tfrac{SA}{N}e_k\big ) }{1+(q-1)\exp \big (-t^T\tfrac{SA}{N}e_k\big )}\\&=\sum _{k=1}^s\frac{q-1}{q}\frac{SAS}{N^2}e_k\left( \frac{1+(q-1)t_k-(1-t_k)\exp \big (t^T\tfrac{SA}{N}e_k\big ) }{(q-1)+\exp \big (t^T\tfrac{SA}{N}e_k\big )}\right) \\&=\frac{q-1}{q}\frac{SAS}{N^2}\left( \sum _{k=1}^s\frac{q e_k}{(q-1)+\exp \big (t^T\tfrac{SA}{N}e_k\big )}-\big (\mathbf{1}_s-t\big )\right) \end{aligned}$$
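As a sanity check on the algebra above, one may compare the closed form of \(\Phi (t)\) with a direct evaluation of \(\phi (\xi (t))\), and the closed-form gradient with finite differences. The sketch below assumes an illustrative symmetric matrix `A`, weights `gamma` standing for \(|S_k|/N\), and \(q=3\); the function names are hypothetical and none of the numbers are taken from the paper.

```python
import numpy as np

def xi_of_t(t, gamma, q):
    """The vector xi(t) from (33), flattened so that (k, c) has index k*q + c."""
    xi = np.tile((gamma * (1 - t) / q)[:, None], (1, q))
    xi[:, 0] += gamma * t
    return xi.ravel()

def phi(xi, A, gamma, q):
    """phi from (24) with weights gamma_k = |S_k|/N."""
    calA = np.kron(A, np.eye(q))
    z = (calA @ xi).reshape(len(gamma), q)
    return 0.5 * xi @ calA @ xi - gamma @ np.log(np.exp(z).sum(axis=1))

def Phi(t, A, gamma, q):
    """Last line of the display for Phi(t); M stands for S A S / N^2."""
    M = np.diag(gamma) @ A @ np.diag(gamma)
    ones = np.ones_like(t)
    x = A @ (gamma * t)                      # x_k = t^T (S A / N) e_k
    return ((q - 1) / (2 * q) * t @ M @ t - ones @ M @ ones / (2 * q)
            - (q - 1) / q * t @ M @ ones
            - gamma @ np.log1p((q - 1) * np.exp(-x)))

def grad_Phi(t, A, gamma, q):
    """Last line of the display for the gradient of Phi."""
    M = np.diag(gamma) @ A @ np.diag(gamma)
    x = A @ (gamma * t)
    return (q - 1) / q * M @ (q / ((q - 1) + np.exp(x)) - (1 - t))

A = np.array([[2.0, 0.5], [0.5, 1.5]])      # illustrative, symmetric
gamma = np.array([0.25, 0.75])
q = 3
t = np.array([0.3, 0.8])

value_gap = abs(Phi(t, A, gamma, q) - phi(xi_of_t(t, gamma, q), A, gamma, q))
eps = 1e-6
fd = np.array([(Phi(t + eps * e, A, gamma, q) - Phi(t - eps * e, A, gamma, q)) / (2 * eps)
               for e in np.eye(len(t))])
grad_gap = np.abs(fd - grad_Phi(t, A, gamma, q)).max()
print(value_gap, grad_gap)
```

Both gaps should be numerically negligible. Evaluating `grad_Phi` at `t = 0` returns the zero vector, which matches the observation that \(t=0\) is always a critical point.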

Similar to (28), a critical point satisfies \(\nabla \Phi =0\) and it followsFootnote 3

$$\begin{aligned} t=\mathbf{1}_s -\sum _{k=1}^s\frac{q e_k}{(q-1)+\exp \Big (t^T\tfrac{SA}{N}e_k\Big )}=:h(t), \end{aligned}$$
(34)

hence we are looking for a fixed point of the auxiliary function h. The most obvious candidate is the fixed point \(t=0\), which we shall prove to be unique. The Jacobian of h is given by

$$\begin{aligned} Dh(t)= \sum _{k=1}^s\frac{q e_k\big (\tfrac{SA}{N}e_k\big )^T\exp \big (t^T\tfrac{SA}{N}e_k\big )}{\Big ((q-1)+\exp \big (t^T\tfrac{SA}{N}e_k\big )\Big )^2} = \sum _{k=1}^s\frac{q e_ke_k^T \exp \big (t^T\tfrac{SA}{N}e_k\big )}{\Big ((q-1)+\exp \big (t^T\tfrac{SA}{N}e_k\big )\Big )^2}A\frac{S}{N}. \end{aligned}$$

The fixed point \(t=0\) is unique for h if and only if it is the unique fixed point for \(\tilde{h}(t)=(\tfrac{S}{N})^{1/2}h((\tfrac{S}{N})^{-1/2}t)\). The latter follows from the Banach fixed point theorem, if the spectral norm of the Jacobian

$$\begin{aligned} D\tilde{h} (t)&=(\tfrac{S}{N})^{1/2}(Dh)((\tfrac{S}{N})^{-1/2}t)(S/N)^{-1/2}\\ {}&= \sum _{k=1}^s\frac{q e_ke_k^T \exp \Big (t^T\sqrt{\tfrac{S}{N}}Ae_k\Big )}{\Big ((q-1)+\exp \Big (t^T\sqrt{\tfrac{S}{N}}Ae_k\Big )\Big )^2}\sqrt{\frac{S}{N}}A\sqrt{\frac{S}{N}} \end{aligned}$$

is smaller than 1 for all t. A simple analysis shows that the function \(x\mapsto q\exp (x)/(q-1+\exp (x))^2\) has its maximum at \(x=\log (q-1)\) with value \(q/(4(q-1))\).Footnote 4 Therefore, the spectral norm is bounded by

$$\begin{aligned} \left\Vert D\tilde{h}(t)\right\Vert _2\le \frac{q}{4(q-1)}\left\Vert \sqrt{S}A\sqrt{S}/N\right\Vert _2<1, \end{aligned}$$

where the first inequality holds since the first matrix is diagonal and the second inequality follows from our assumption. The Banach fixed point theorem implies that the fixed point \(t=0\) is unique and the claim follows from Lemma 13.

\(\square \)
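The contraction argument in the proof of Proposition 14 can be illustrated numerically. In the sketch below, the matrix `A`, the weights `gamma` (standing for \(|S_k|/N\)), the starting point and the names `h` and `peak` are illustrative assumptions. It evaluates the bound \(\tfrac{q}{4(q-1)}\Vert \sqrt{S}A\sqrt{S}/N\Vert _2\), checks the maximum of \(x\mapsto q\exp (x)/(q-1+\exp (x))^2\), and iterates the map h from (34).

```python
import numpy as np

def h(t, A, gamma, q):
    """Fixed-point map from (34); (A @ (gamma * t))_k equals t^T (S A / N) e_k."""
    return 1.0 - q / ((q - 1) + np.exp(A @ (gamma * t)))

def peak(x, q):
    """Scalar factor appearing on the diagonal of the Jacobian of h."""
    return q * np.exp(x) / (q - 1 + np.exp(x)) ** 2

A = np.array([[2.0, 0.5], [0.5, 1.5]])      # illustrative, symmetric
gamma = np.array([0.25, 0.75])
q = 3

root = np.sqrt(gamma)
norm_B = np.linalg.norm(root[:, None] * A * root[None, :], 2)  # spectral norm
lipschitz_bound = q / (4 * (q - 1)) * norm_B

grid_max = peak(np.linspace(-10.0, 10.0, 2001), q).max()
peak_value = peak(np.log(q - 1), q)          # claimed maximum q / (4 (q - 1))

t = np.array([0.9, 0.4])                     # arbitrary starting point in [0, 1]^s
for _ in range(200):
    t = h(t, A, gamma, q)
print(norm_B, lipschitz_bound, t)
```

For these illustrative values the norm is below \(4\tfrac{q-1}{q}\), the bound on the Jacobian is below 1, and the iteration settles at \(t=0\). The iteration is only a demonstration of the contraction property; uniqueness of the fixed point is what the proof actually uses.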

Note that the vector \(\xi ^*=\tfrac{\mathcal S}{qN}\mathbf{1}_{sq}\) is always a critical point. Moreover, it follows from Lemma 11 and Proposition 14 that for equal block sizes or in the situation of Theorem 2, it is a global minimizer. Next, we verify that the Hessian of \(\phi \) is positive definite at \(\xi ^*\) as well as in some neighbourhood which can be chosen uniformly for N sufficiently large, cf. [11, Lemma 3.4a].

Lemma 15

If \(\Vert \sqrt{\Gamma }\mathcal A \sqrt{\Gamma }\Vert _2<q\), then for N sufficiently large there exists a neighbourhood \(B_r(\xi ^*)\) of \(\xi ^*=\tfrac{\mathcal S}{qN}\mathbf{1}_{sq}\), where the Hessian of \(\phi \) is positive definite, i.e. there exists \(r,\lambda _{\min }>0\) such that

$$\begin{aligned} \xi ^TH_\phi (\tilde{\xi })\xi >\tfrac{1}{2} \lambda _{\min }\left\Vert \xi \right\Vert ^2 \end{aligned}$$
(35)

for all \(\xi \in {{\,\mathrm{\mathbb {R}}\,}}^{sq},\tilde{\xi }\in B_r(\xi ^*)\).

In particular, for asymptotically equal block sizes, \(\xi ^*\) converges to \(\frac{1}{q s}\mathbf{1} _{sq}\) and any minimizer \(\xi _N^*\) lies in a neighborhood of \(\frac{1}{q s}\mathbf{1} _{sq}\) for N large enough by Lemma 11. Combining these facts with Lemma 15, we see that for N large enough, \(\xi ^*\) and the so far unspecified vectors \(\xi _N^*\) from Lemma 11 are minimizers which all lie in an area of positive definite Hessian. This readily implies uniqueness of the minimizer in some ball \(B_R(0)\). Choosing R sufficiently large (cf. (25)) it follows that the unique minimizer of \(\phi \) is indeed given by \(\xi ^*=\tfrac{\mathcal S}{qN}\mathbf{1}_{sq}\) under the assumptions from Theorem 1.

Proof

The Hessian is given by (29), where we already saw that at the critical point \(\xi ^*=\tfrac{\mathcal S}{qN}\mathbf{1}_{sq}\) we may plug in the critical equation (28). Thus, in terms of the Loewner order, we have the equivalence

$$\begin{aligned} H_\phi (\xi ^*)&=\mathcal A\left[ I_{sq}-\sum _{k=1}^s(e_k e_k^T)\otimes \left( {{{\,\mathrm{diag}\,}}}(\xi ^*_{k,\cdot }) -\tfrac{N}{\left|S_k\right|}\xi ^*_{k,\cdot }{\xi ^*_{k,\cdot }}^T\right) \mathcal A\right] \nonumber \\&=A\otimes I_{q}-\frac{ASA}{q N}\otimes (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q})>_L0 \end{aligned}$$
(36)
$$\begin{aligned}&\Leftrightarrow \frac{\sqrt{S}A\sqrt{S}}{N}\otimes I_{q}-\frac{\sqrt{S} ASA\sqrt{S}}{q N^2}\otimes (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q})>_L0\nonumber \\&\Leftrightarrow \frac{\sqrt{S} A\sqrt{S}}{q N}\otimes (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q})<_L I_s\otimes I_q. \end{aligned}$$
(37)

Similarly to the proof of Lemma 13, these equivalences hold since all the occurring matrices are symmetric and positive definite. For example the eigenvectors of \(A_{\alpha ,\beta }\) are \(v_1=\mathbf{1}_s\) with eigenvalue \(\lambda _1=\beta +(s-1)\alpha \) and \(v_k=e_k-e_{k-1}\) for \(1<k\le s\) having eigenvalue \(\lambda _k=\beta -\alpha >0\). In the same notation, they are also the eigenvectors of \(I_q-\tfrac{1}{q}\mathbf{1} _{q\times q}\) (corresponding to \(s\leftrightarrow q\), \(\beta =(q-1)/q\) and \(\alpha =-1/q\)) to the eigenvalues 0 and 1. In other words we have \(0\le _L I_q-\tfrac{1}{q}\mathbf{1} _{q\times q}\le _L I_q\). By [1, Theorem 2.5] it holds that the Kronecker product preserves the Loewner order. Applied to our setting, it remains to show \(\frac{\sqrt{S} A\sqrt{S}}{Nq}<_LI_s\), since then (37) follows from the Kronecker product with \(I_q-\tfrac{1}{q}\mathbf{1} _{q\times q}\le _L I_q\). Our assumption yields \(\Vert \frac{\sqrt{S} A\sqrt{S}}{N}\Vert _2< q\) for N sufficiently large, hence \(\sqrt{S}A\sqrt{S}/N<_LqI_s\) holds. Thus we have shown that \(H_\phi (\xi ^*)\) is positive definite with minimal eigenvalue \(\lambda _{\min } >0\). By continuity of \(H_\phi \), there exists \(r>0\) such that (35) follows. \(\square \)
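The equivalence between positive definiteness of \(H_\phi (\xi ^*)\) and the norm condition can be observed on a small example. The sketch below builds the matrix in (36) for an illustrative `A` and weights `gamma` (standing for \(|S_k|/N\)), and then rescales `A` so that the norm condition fails; all numerical values and function names are assumptions made for the illustration.

```python
import numpy as np

def hessian_at_uniform(A, gamma, q):
    """Right-hand side of (36): A (x) I_q - (A diag(gamma) A / q) (x) (I_q - 1/q 11^T)."""
    P = np.eye(q) - np.ones((q, q)) / q
    return np.kron(A, np.eye(q)) - np.kron(A @ np.diag(gamma) @ A / q, P)

def scaled_norm(A, gamma):
    """Spectral norm of sqrt(diag(gamma)) A sqrt(diag(gamma))."""
    root = np.sqrt(gamma)
    return np.linalg.norm(root[:, None] * A * root[None, :], 2)

def min_eigenvalue(A, gamma, q):
    return np.linalg.eigvalsh(hessian_at_uniform(A, gamma, q)).min()

A = np.array([[2.0, 0.5], [0.5, 1.5]])      # illustrative, symmetric positive definite
gamma = np.array([0.25, 0.75])
q = 3

small = (scaled_norm(A, gamma), min_eigenvalue(A, gamma, q))
large = (scaled_norm(3.2 * A, gamma), min_eigenvalue(3.2 * A, gamma, q))
print(small)   # norm below q, smallest Hessian eigenvalue positive
print(large)   # norm above q, smallest Hessian eigenvalue negative
```

The second case shows that the condition \(\Vert \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\Vert _2<q\) cannot be dropped for this argument: once the norm exceeds q, the uniform critical point is no longer a local minimum.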

4 Proofs of the Theorems

We will prove the Central Limit Theorems via convergence of the moment-generating function.Footnote 5

Proof

(Proof of Theorems 1 and 2)

According to the previous section, we assume that \(\xi ^*(=\frac{\mathcal S }{qN}\mathbf{1}_{sq})\) is the unique minimizer of \(\phi \), and the Hessian \(H_\phi \) is positive definite in some neighborhood. Applying Lemma 7 for \(\theta =1/2\), \(v=\big (\frac{\mathcal S}{N}\big )^{-1} \xi ^*(=\tfrac{1}{q}\mathbf{1} _{sq})\) and \(Z\sim \mathcal N (0,N(\sqrt{ \mathcal S}\mathcal A\sqrt{ \mathcal S})^{-1})\) implies for all \(t\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}\)

$$\begin{aligned}&\int \exp \left[ t^T\big (Z+ \sqrt{\mathcal S}(m-v)\big )\right] d{{\,\mathrm{\mathbb {P}}\,}}\nonumber \\ =&c_N \int \exp \left[ - N \phi \left( \xi ^*+\sqrt{\frac{\mathcal S}{ N}}\frac{x}{\sqrt{N}}\right) +t^Tx\right] d^{sq}x\nonumber \\ =&c_Ne^{-N\phi ^*} \int \exp \left[ - N\big (\phi (\xi )-\phi ^*\big )+t^TN\sqrt{\mathcal S ^{-1}}(\xi -\xi ^*)\right] \left|\det (\sqrt{\mathcal S}/N) \right|^{-1}d^{sq}\xi , \end{aligned}$$
(38)

where we used the change of variables \(\xi =\xi ^*+\sqrt{\mathcal S/ N}x /{\sqrt{N}}\) and added and subtracted the minimum \(\phi ^*=\phi (\xi ^*)\) of \(\phi \) in the exponent. Note that the Jacobian is of order \(\sim N^{sq/2}\). For \(r>0\) from Lemma 15, we treat the integral over \(B_r(\xi ^*)\) and the integral over its complement separately. The latter can be bounded as follows. Since no \(\xi \in B^c_r(\xi ^*)\) is a global minimizer of \(\phi \), we have \(\phi (\xi )\ge \phi ^*+\delta \) on \(B^c_r(\xi ^*)\) for some \(\delta >0\). We choose \(R>0\) from (25) and possibly enlarge it so that \(\left\Vert t\right\Vert \le R\) in order to estimate

$$\begin{aligned}&\int _{B^c_r(\xi ^*)} \exp \left[ - N\big (\phi (\xi )-\phi ^*\big )+t^TN\sqrt{\mathcal S ^{-1}}(\xi -\xi ^*)\right] d^{sq}\xi \nonumber \\&\le \int _{B^c_r(\xi ^*)\cap B_R(0)} \exp \left[ -N\delta +c\sqrt{N} R^2\right] d^{sq}\xi \nonumber \\&\quad +e^{N\phi ^*}\int _{B^c_r(\xi ^*)\cap B^c_R(0)} \exp \left[ -\frac{N}{3} \xi ^T\mathcal A\xi + c \sqrt{N}\left\Vert \xi \right\Vert ^2\right] d^{sq}\xi \le Ce^{-\epsilon N}. \end{aligned}$$
(39)

for some \(\epsilon ,c,C>0\) and N sufficiently large. The last step follows from \(\phi ^*\le \phi (0)=-\log q<0\) (recall that \(\sum _{k}|S_k|=N\)) and \(\mathcal A\) being positive definite. We also see that \(\epsilon \) does depend on \(\mathcal A\) (or more precisely on \(\left|\beta -\alpha \right|\) for \(A=A_{\alpha ,\beta }\)), whereas C is independent of it.

The major contribution comes from the local part of \(\xi \in B_r(\xi ^*)\), where we use a second order Taylor approximation with Lagrange remainder, i.e. there exists an intermediate point \(\tilde{\xi }\in B_r(\xi ^*)\) between \(\xi \) and \(\xi ^*\) such that

$$\begin{aligned} \phi (\xi )=\phi (\xi ^*)+(\xi -\xi ^*)^T\nabla \phi (\xi ^*)+\tfrac{1}{2} (\xi -\xi ^*) ^T H_\phi (\tilde{\xi })(\xi -\xi ^*). \end{aligned}$$
(40)

Note that Lemma 11 or Proposition 14 respectively yields \(\phi (\xi ^*)=\phi ^*\) and \(\nabla \phi (\xi ^*)=0\), where \(\Vert \sqrt{S}A\sqrt{S}/N\Vert _2<4\frac{q-1}{q}\) holds for N sufficiently large. After plugging this into (38) and undoing the change of variables, we arrive at

$$\begin{aligned}&\lim _{N\rightarrow \infty } \int \exp \left[ t^T\big (Z+ \sqrt{\mathcal S}(m-v)\big )\right] d{{\,\mathrm{\mathbb {P}}\,}}\nonumber \\&\quad =\lim _{N\rightarrow \infty }c_N e^{- N\phi ^*}\left( \int _{\sqrt{N\mathcal S ^{-1}}B_{r\sqrt{N}}(0)} \exp \left[ -\tfrac{1}{2} x^T\sqrt{\frac{\mathcal S}{N}} H_\phi (\tilde{\xi })\sqrt{\frac{\mathcal S}{N}}x+t^Tx\right] d^{sq}x+\mathcal O (e^{-\epsilon N})\right) \nonumber \\&\quad =\lim _{N\rightarrow \infty }\left( c_N e^{- N\phi ^*}\left( \sqrt{(2\pi )^{sq}\det (\Theta )}+\mathcal O (e^{-\epsilon N})\right) \right) \int \exp \left[ t^Tx\right] d\mathcal N(0,\Theta )(x), \end{aligned}$$
(41)

where \(\Theta = (\sqrt{\Gamma } H_\phi (\xi ^*)\sqrt{ \Gamma })^{-1}\). The last step follows from the dominated convergence theorem: an integrable majorant exists due to (35), and pointwise convergence holds since the intermediate point satisfies \(\tilde{\xi }\rightarrow \xi ^*\rightarrow \Gamma \tfrac{1}{q} \mathbf{1}_{sq}\) as \(N\rightarrow \infty \) for each fixed x. In particular, setting \(t=0\) yields the normalizing constant

$$\begin{aligned} \lim _{N\rightarrow \infty }c_N e^{-N\phi ^*}=\sqrt{\det (\Theta ^{-1})/(2\pi )^{sq}}. \end{aligned}$$

The moment-generating function of Z is given by \(t\mapsto \exp [\tfrac{1}{2} t^TN(\sqrt{ \mathcal S}\mathcal A\sqrt{ \mathcal S})^{-1}t]\), which converges to \(\exp [\tfrac{1}{2} t^T(\sqrt{ \Gamma }\mathcal A\sqrt{\Gamma })^{-1}t]\) as \(N\rightarrow \infty \). Since Z is independent from m, the moment generating functions factorize and we conclude our claim

$$\begin{aligned} \lim _{N\rightarrow \infty }{{\,\mathrm{\mathbb {E}}\,}}\exp [ t^T \sqrt{\mathcal S}(m-N\mathcal S ^{-1}\xi ^*)]=\exp \left[ \tfrac{1}{2} t^T \left( \Gamma ^{-1/2} (\lim _NH_\phi (\xi ^*)^{-1}-\mathcal A^{-1})\Gamma ^{-1/2}\right) t\right] . \end{aligned}$$

The explicit representation of the Hessian has been evaluated in (36) already, hence

$$\begin{aligned} \lim _{N\rightarrow \infty } H_\phi (\xi ^*) = A\otimes I_{q}-\frac{A{{{\,\mathrm{diag}\,}}}(\gamma ) A}{q}\otimes (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}). \end{aligned}$$

In the case of asymptotically equal block sizes \(\gamma =1/s\), we have \(\xi ^*=\frac{1}{sq}\mathbf{1} _{sq}\), \(\Gamma =\frac{1}{s} I_{sq}\). \(\square \)

Let us now derive the rotated CLT, Theorem 5, from the CLTs with degenerate multivariate Gaussian limit distribution.

Proof

(Proof of Theorem 5) The matrix \(\tilde{R}\in SO(q)\), where SO(q) stands for the special orthogonal group of dimension q, is given by

$$\begin{aligned} \tilde{R}=\left( \begin{matrix} 1-\frac{1}{q+\sqrt{q}} &{}-\frac{1}{q+\sqrt{q}} &{} \cdots &{} -\frac{1}{q+\sqrt{q}} &{} -\frac{1}{\sqrt{q}} \\ -\frac{1}{q+\sqrt{q}} &{}1-\frac{1}{q+\sqrt{q}} &{} \ddots &{} \vdots &{} \vdots \\ \vdots &{} \ddots &{} \ddots &{} -\frac{1}{q+\sqrt{q}} &{}\vdots \\ -\frac{1}{q+\sqrt{q}} &{} \cdots &{} -\frac{1}{q+\sqrt{q}} &{}1-\frac{1}{q+\sqrt{q}} &{} -\frac{1}{\sqrt{q}} \\ \frac{1}{\sqrt{q}}&{}\cdots &{}\cdots &{}\cdots &{}\frac{1}{\sqrt{q}} \end{matrix}\right) . \end{aligned}$$
(42)
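The claim \(\tilde{R}\in SO(q)\) can be checked numerically. The following is a minimal sketch, not part of the proof: it builds the matrix from (42) for a few values of q and tests orthogonality and unit determinant. The helper name `rotation_matrix` is ours and purely illustrative.

```python
import numpy as np

def rotation_matrix(q: int) -> np.ndarray:
    """Build the q x q matrix from (42)."""
    a = 1.0 / (q + np.sqrt(q))
    b = 1.0 / np.sqrt(q)
    R = np.empty((q, q))
    # Upper-left (q-1) x (q-1) block: identity minus the constant a.
    R[:q - 1, :q - 1] = np.eye(q - 1) - a
    # Last column of the first q-1 rows.
    R[:q - 1, q - 1] = -b
    # Last row: the normalized all-ones vector.
    R[q - 1, :] = b
    return R

for q in (2, 3, 5, 10):
    R = rotation_matrix(q)
    assert np.allclose(R @ R.T, np.eye(q))            # orthogonal
    assert np.isclose(np.linalg.det(R), 1.0)          # determinant one
    assert np.allclose(R[:q - 1] @ np.ones(q), 0.0)   # first q-1 rows are orthogonal to the all-ones vector
```

The last assertion records the property that is used below: the first q-1 rows of \(\tilde{R}\) span the orthogonal complement of the all-ones vector.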

By the definition of \(\mathcal S\) it follows from Theorem 1 or Theorem 2 that

$$\begin{aligned} \hat{m}=\mathcal R\sqrt{\mathcal S}(m-\tfrac{1}{q}\mathbf{1} _{sq})\Rightarrow \tilde{W} \end{aligned}$$

where we applied the continuous mapping theorem. The limit \(\tilde{W}\) has Gaussian distribution with covariance matrix \(\mathcal R\Sigma \mathcal R^T\). Using the Kronecker product structure \(\mathcal A=A\otimes I_q\) and \(\Gamma ={{{\,\mathrm{diag}\,}}}(\gamma )\otimes I_q\), we rewrite

$$\begin{aligned} \Sigma&=\big (\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big )^{-1}\left( \Big (I_{sq}+\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big ( I_s\otimes (\tfrac{1}{q^2}\mathbf{1}_{q\times q}-\tfrac{1}{q} I_q)\big )\Big )^{-1}-I_{sq}\right) \\&=\big (\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big )^{-1}\Big (I_{sq}+\sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big ( I_s\otimes (\tfrac{1}{q^2}\mathbf{1}_{q\times q}-\tfrac{1}{q} I_q)\big )\Big )^{-1}\\ {}&\quad \times \sqrt{\Gamma }\mathcal A\sqrt{\Gamma }\big ( I_s\otimes (\tfrac{1}{q} I_q-\tfrac{1}{q^2}\mathbf{1}_{q\times q})\big ) \\&=\left( qI_{sq}-\sqrt{\Gamma }\mathcal A \sqrt{\Gamma }\left( I_s\otimes \big (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\big )\right) \right) ^{-1}\cdot \left( I_s\otimes \left( I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\right) \right) \end{aligned}$$
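As a sanity check on the three expressions for \(\Sigma \) in the preceding display, one may compare them numerically. The sketch below uses hypothetical test data: a randomly generated symmetric positive definite matrix A and proportions \(\gamma \), chosen by us only so that all inverses exist. It is not taken from the paper and does not replace the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
s, q = 3, 4

# Hypothetical test data (an assumption of this sketch).
G = rng.normal(size=(s, s))
A = 0.1 * (G @ G.T) + 0.5 * np.eye(s)      # symmetric positive definite
gamma = np.array([0.2, 0.3, 0.5])          # block proportions, summing to one

I_sq = np.eye(s * q)
P = np.eye(q) - np.ones((q, q)) / q        # I_q - (1/q) 1_{q x q}
D_half = np.diag(np.sqrt(gamma))
# sqrt(Gamma) A_cal sqrt(Gamma), using A_cal = A (x) I_q and Gamma = diag(gamma) (x) I_q
M = np.kron(D_half @ A @ D_half, np.eye(q))
# I_s (x) ((1/q^2) 1_{q x q} - (1/q) I_q), which equals I_s (x) (-P/q)
K = np.kron(np.eye(s), -P / q)

inner = np.linalg.inv(I_sq + M @ K)
sigma_1 = np.linalg.inv(M) @ (inner - I_sq)                 # first line
sigma_2 = np.linalg.inv(M) @ inner @ M @ (-K)               # second line
sigma_3 = np.linalg.inv(q * I_sq - M @ np.kron(np.eye(s), P)) @ np.kron(np.eye(s), P)  # third line

assert np.allclose(sigma_1, sigma_2)
assert np.allclose(sigma_2, sigma_3)
```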

From \(I_s\otimes \left( I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\right) \cdot \mathcal R^T= \mathcal R^T\) it follows that

$$\begin{aligned}&\mathcal R\Sigma \mathcal R^T\cdot \left( qI_{s(q-1)}-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\otimes I_{q-1}\right) \\&\quad =\mathcal R \left( qI_{sq}-\sqrt{\Gamma }\mathcal A \sqrt{\Gamma }\left( I_s\otimes \big (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\big )\right) \right) ^{-1}\\ {}&\qquad \times \mathcal R^T\cdot \left( qI_{s(q-1)}-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\otimes I_{q-1}\right) \\&\quad =\mathcal R\left( qI_{sq}-\sqrt{\Gamma }\mathcal A \sqrt{\Gamma }\left( I_s\otimes \big (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\big )\right) \right) ^{-1}\\ {}&\qquad \cdot \left( q\mathcal R ^T-\sqrt{\Gamma }\mathcal A \sqrt{\Gamma }\left( I_s\otimes \big (I_q-\tfrac{1}{q}\mathbf{1}_{q\times q}\big )\right) \mathcal R^T\right) \\&\quad =\mathcal R\mathcal R ^T=I_{s}\otimes I_{q-1}=I_{s(q-1)}. \end{aligned}$$

Here we used \(\mathcal R\mathcal R ^T=I_{s(q-1)}\) in the last step. Thus \(\mathcal R\Sigma \mathcal R^T\) is the inverse of the second factor on the left-hand side, that is, the covariance matrix is given by

$$\begin{aligned} \mathcal R\Sigma \mathcal R^T&=\left( qI_{s(q-1)}-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\otimes I_{q-1}\right) ^{-1}\nonumber \\&=\left( q-{{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}A {{{\,\mathrm{diag}\,}}(\gamma )}^{1/2}\right) ^{-1}\otimes I_{q-1} \end{aligned}$$
(43)

and is non-singular. In the case of asymptotically equal block sizes, this equals \((q-A/s)^{-1}\otimes I_{q-1}\). \(\square \)
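The identity (43) can also be checked numerically. The sketch below makes one assumption that is not restated in this proof: it takes \(\mathcal R=I_s\otimes R\), where R consists of the first q-1 rows of \(\tilde{R}\) from (42). The matrix A and the proportions \(\gamma \) are hypothetical test data, and the function names are ours.

```python
import numpy as np

def rotation_matrix(q):
    """The q x q matrix from (42)."""
    a = 1.0 / (q + np.sqrt(q))
    b = 1.0 / np.sqrt(q)
    R = np.empty((q, q))
    R[:q - 1, :q - 1] = np.eye(q - 1) - a
    R[:q - 1, q - 1] = -b
    R[q - 1, :] = b
    return R

def rotated_covariance(A, gamma, q):
    """Return both sides of (43) for given A, gamma and q."""
    s = len(gamma)
    B = np.diag(np.sqrt(gamma)) @ A @ np.diag(np.sqrt(gamma))
    P = np.eye(q) - np.ones((q, q)) / q
    IP = np.kron(np.eye(s), P)
    # Sigma in the form of the last line of the display preceding (43)
    Sigma = np.linalg.inv(q * np.eye(s * q) - np.kron(B, np.eye(q)) @ IP) @ IP
    # ASSUMPTION: the block rotation is I_s (x) (first q-1 rows of R~)
    R_cal = np.kron(np.eye(s), rotation_matrix(q)[:q - 1])
    lhs = R_cal @ Sigma @ R_cal.T
    rhs = np.kron(np.linalg.inv(q * np.eye(s) - B), np.eye(q - 1))
    return lhs, rhs

# Hypothetical symmetric coupling matrix and block proportions.
A = np.array([[1.0, 0.3, 0.1],
              [0.3, 0.8, 0.2],
              [0.1, 0.2, 1.2]])
gamma = np.array([0.2, 0.3, 0.5])

lhs, rhs = rotated_covariance(A, gamma, q=4)
assert np.allclose(lhs, rhs)                   # identity (43)
assert np.all(np.linalg.eigvalsh(rhs) > 0)     # non-singular for this choice of A

# Asymptotically equal block sizes: gamma = 1/s gives (q - A/s)^{-1} (x) I_{q-1}.
s = 3
lhs_eq, _ = rotated_covariance(A, np.full(s, 1.0 / s), q=4)
assert np.allclose(lhs_eq, np.kron(np.linalg.inv(4 * np.eye(s) - A / s), np.eye(3)))
```

For this choice of A all eigenvalues of \({{\,\mathrm{diag}\,}}(\gamma )^{1/2}A{{\,\mathrm{diag}\,}}(\gamma )^{1/2}\) lie below q, so the inverse exists; for other inputs the check is only meaningful when that matrix inequality holds.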

Finally, we turn to the proof of Theorem 6.

Proof

(Proof of Theorem 6) We follow the steps of the proofs of the CLTs. Let us first consider the usual magnetization and rotate it only in the last step, when this becomes necessary. We are interested in the asymptotics of the moment generating function at any \(t\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}\), i.e.

$$\begin{aligned}&\int \exp \left[ N^{1-2\theta }t^T\big (Z+ \mathcal S^{\theta }(m-v)\big )\right] d{{\,\mathrm{\mathbb {P}}\,}}\\ {}&\quad =c_N \int \exp \left[ - N \phi \left( \xi ^*+\left( \frac{\mathcal S}{ N}\right) ^{1-\theta }\frac{x}{ N^{\theta }} \right) +N^{1-2\theta }t^Tx\right] d^{sq}x \end{aligned}$$

for \(v=N\mathcal S ^{-1}\xi ^*\) and \(Z\sim \mathcal N \left( 0,N( \mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1}\right) \). Repeating the tail estimate along the lines of (39) with the change of variables \(\xi =\xi ^*+(\mathcal S/N)^{1-\theta }x/N^{\theta }\), we obtain

$$\begin{aligned}&\int _{B^c_r(\xi ^*)} \exp \left[ - N\big (\phi (\xi )-\phi ^*\big )+t^TN^{1-\theta }(N\mathcal S ^{-1})^{1-\theta }(\xi -\xi ^*)\right] d^{sq}\xi \nonumber \\&\quad \le \int _{B^c_r(\xi ^*)\cap B_R(0)} \exp \left[ -N\delta +cN^{1-\theta } R^2\right] d^{sq}\xi +e^{N\phi ^*}\nonumber \\&\qquad \times \int _{B^c_r(\xi ^*)\cap B^c_R(0)} \exp \left[ -\frac{N}{3} \xi ^T\mathcal A\xi + c N^{1-\theta }\left\Vert \xi \right\Vert ^2\right] d^{sq}\xi \le Ce^{-\epsilon N} \end{aligned}$$
(44)

for some \(r,R,c,C,\delta ,\epsilon >0\). For brevity, let us denote \(J_N(x)=\left( \frac{\mathcal S}{N}\right) ^{1-\theta }H_\phi (\tilde{\xi })\left( \frac{\mathcal S}{N}\right) ^{1-\theta }\), where \(\tilde{\xi }\) is the intermediate point between \(\xi ^*\) and \(\xi \) in the Lagrange remainder of the Taylor expansion (40). Analogously to the derivation of (41), we obtain for all \(t\in {{\,\mathrm{\mathbb {R}}\,}}^{sq}\)

$$\begin{aligned}&\int \exp \left[ N^{1-2\theta }t^T\big (Z+ \mathcal S^{\theta }(m-v)\big )\right] d{{\,\mathrm{\mathbb {P}}\,}}\nonumber \\&\quad \sim c_N e^{- N\phi ^*}\int _{U_N} \exp \left[ -\frac{N^{1-2\theta }}{2} x^TJ_N(x)x+N^{1-2\theta }t^Tx\right] d^{sq}x\nonumber \\&\quad =c_Ne^{-N\phi ^*} \int _{U_N}\exp \left[ -\frac{N^{1-2\theta }}{2}(x-J_N(x)^{-1} t)^TJ_N(x)(x-J_N(x)^{-1} t)\right] \nonumber \\&\qquad \times \exp \left[ \frac{N^{1-2\theta }}{2}t^TJ_N(x)^{-1}t\right] d^{sq}x \end{aligned}$$
(45)

with \(U_N=(\mathcal S/N)^{\theta -1}B_{r N^{\theta }}(0)\). Suppose for a moment that \(J_N(x)\) can be replaced by its limit \(J_\infty =\lim _N J_N(x)=\Gamma ^{1-\theta } H_\phi (\xi ^*) \Gamma ^{1-\theta }\). Then it would follow from dominated convergence with an integrable majorant from (35) and \( U_N\rightarrow {{\,\mathrm{\mathbb {R}}\,}}^{sq}\) that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{1}{N^{1-2\theta }}\log \int \exp \left[ N^{1-2\theta }t^T\big (Z+ \mathcal S^{\theta }(m-v)\big )\right] d{{\,\mathrm{\mathbb {P}}\,}}=\frac{1}{2} t^T (\Gamma ^{1-\theta } H_\phi (\xi ^*) \Gamma ^{1-\theta })^{-1}t. \end{aligned}$$
(46)

This is immediate in the previous case \(\theta =1/2\), because there the integrand in (45) converges. In order to justify this limit in general, we split the domain of integration \(U_N\) in (45) into a compact set and its complement. First choose \(t=0\), perform the change of variables \(y=N^{1/2-\theta }x\) and use dominated convergence with an integrable majorant from (35), which yields the asymptotics

$$\begin{aligned} c_Ne^{-N\phi ^*}N^{(\theta -1/2)sq}= c+o(1). \end{aligned}$$
(47)

Hence the normalizing constants are negligible after taking the logarithm and dividing by \(N^{1-2\theta }\). It follows from (35) and continuity of \(H_\phi \) that \(\left\Vert H_\phi (\tilde{\xi })^{-1}-H_\phi (\xi ^*)^{-1}\right\Vert =o (1)\) uniformly for \(\xi \in B_{cN^{-\theta }}(\xi ^*)\), hence we have \(J_N(x)^{-1}=J_\infty ^{-1}+o(1)\) uniformly for x in a compact set, say \(x\in \bar{B}_\kappa (0)\). Thus,

$$\begin{aligned}&\int _{\bar{B}_\kappa (0)}\exp \left[ -\frac{N^{1-2\theta }}{2}(x-J_N(x)^{-1} t)^TJ_N(x)(x-J_N(x)^{-1} t)\right] \exp \left[ \frac{N^{1-2\theta }}{2}t^TJ_N(x)^{-1}t\right] d^{sq}x\nonumber \\&\quad =\exp \left[ \frac{N^{1-2\theta }}{2}t^T(J_\infty ^{-1}+o(1))t\right] \int _{\bar{B}_\kappa (0)}\nonumber \\&\qquad \times \exp \left[ -\frac{N^{1-2\theta }}{2}(x-(J_\infty ^{-1}+o(1)) t)^TJ_N(x)(x-(J_\infty ^{-1}+o(1)) t)\right] d^{sq}x \end{aligned}$$
(48)

with the integral being of order \(I_N\asymp N^{(\theta -1/2)sq}\), similar to (47). It follows from (35) and continuity of \(H_\phi \) that \(\left\Vert H_\phi (\tilde{\xi })^{-1}-H_\phi (\xi ^*)^{-1}\right\Vert =\mathcal O (1)\) uniformly for \(\xi \in B_r(\xi ^*)\), hence we have \(J_N(x)^{-1}=J_\infty ^{-1}+\mathcal O (1)\) uniformly for \(x\in U_N\). Thus we can bound

$$\begin{aligned}&\int _{U_N\setminus \bar{B}_\kappa (0)}\exp \left[ -\frac{N^{1-2\theta }}{2}(x-J_N(x)^{-1} t)^TJ_N(x)(x-J_N(x)^{-1} t)\right] \exp \left[ \frac{N^{1-2\theta }}{2}t^TJ_N(x)^{-1}t\right] d^{sq}x\\&\quad \lesssim \int _{U_N\setminus \bar{B}_\kappa (0)}\exp \left[ -c N^{1-2\theta }\left\Vert x-J_\infty ^{-1}t+t\mathcal O (1)\right\Vert ^2\right] \exp \left[ \frac{\varrho }{2} N^{1-2\theta } \left\Vert t\right\Vert ^2\right] d^{sq}x, \end{aligned}$$

where \(c>0\) comes from the positive definiteness shown in Lemma 15 and \(\varrho =\mathcal O (1)\) is bounded by the spectral radius of \(J_N(x)=J_\infty +\mathcal O (1)\). Choosing \(\kappa >0\) sufficiently large, it follows

$$\begin{aligned}&\int _{U_N\setminus \bar{B}_\kappa (0)}\exp \left[ -c N^{1-2\theta }\left\Vert x-J_\infty ^{-1}t+t\mathcal O (1)\right\Vert ^2\right] \exp \left[ \frac{\varrho }{2} N^{1-2\theta } \left\Vert t\right\Vert ^2\right] d^{sq}x\\&\lesssim \int _{U_N\setminus \bar{B}_\kappa (0)}\exp \left[ -\frac{c}{2} N^{1-2\theta }\left\Vert x\right\Vert ^2\right] d^{sq}x\lesssim N^{sq\theta }\exp \left[ -\frac{c}{2} N^{1-2\theta }\kappa ^2\right] =o(1). \end{aligned}$$

Therefore, combining this part, (48) and (47), taking the logarithm and dividing by \(N^{1-2\theta }\), we have shown

$$\begin{aligned}&\frac{1}{N^{1-2\theta }}\log \left( c_N e^{- N\phi ^*}\int _{U_N} \exp \left[ -\frac{N^{1-2\theta }}{2} x^TJ_N(x)x+N^{1-2\theta }t^Tx\right] d^{sq}x\right) \\&=\frac{1}{N^{1-2\theta }}\log \left[ cN^{(1/2-\theta )sq}\left( \exp \left[ \frac{N^{1-2\theta }}{2}t^T(J_\infty ^{-1}+o(1))t\right] \cdot I_N+o(1) \right) \right] \\&= \frac{1}{2} t^T (\Gamma ^{1-\theta } H_\phi (\xi ^*) \Gamma ^{1-\theta })^{-1}t+o(1). \end{aligned}$$

Consequently, we have shown that (46) holds.
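The mechanism behind (46) can be illustrated in a one-dimensional toy case. The sketch below is only an illustration under simplifying assumptions of ours: the dimension is one, \(J_N(x)\) is replaced by a constant \(J>0\), and a plays the role of the speed \(N^{1-2\theta }\). It evaluates \(\tfrac{1}{a}\log \int \exp [-\tfrac{a}{2} Jx^2+atx]\,dx\) by a Riemann sum and compares it with the limit \(t^2/(2J)\).

```python
import numpy as np

def scaled_log_integral(a: float, J: float, t: float) -> float:
    """(1/a) * log of the integral of exp(-a/2 * J * x^2 + a * t * x) over [-10, 10]."""
    x = np.linspace(-10.0, 10.0, 200_001)
    f = a * (-0.5 * J * x**2 + t * x)
    f_max = f.max()
    dx = x[1] - x[0]
    # Subtract the maximum before exponentiating to avoid overflow for large a.
    log_integral = f_max + np.log(np.sum(np.exp(f - f_max)) * dx)
    return log_integral / a

J, t = 2.0, 1.0
limit = t**2 / (2.0 * J)
for a in (1.0, 10.0, 100.0, 1000.0):
    approx = scaled_log_integral(a, J, t)
    # The gap equals log(2*pi/(a*J)) / (2*a) and vanishes as a grows.
    print(f"a = {a:7.1f}   value = {approx:.6f}   limit = {limit:.6f}")
```

In this toy case the Gaussian integral is explicit, so the prefactor contributes \(\log (2\pi /(aJ))/(2a)\), which vanishes after dividing by a. This is the same reason why the normalizing constants in (47) are negligible.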

The moment generating function of \(\sqrt{N^{1-2\theta }}Z\) equals \(\exp [\tfrac{1}{2}t^T( (\mathcal S/N)^{1-\theta }\mathcal A (\mathcal S/N)^{1-\theta })^{-1}t]\), since \(N^{1-2\theta }\cdot N( \mathcal S^{1-\theta }\mathcal A \mathcal S^{1-\theta })^{-1}=( (\mathcal S/N)^{1-\theta }\mathcal A (\mathcal S/N)^{1-\theta })^{-1}\). Hence

$$\begin{aligned}&\frac{1}{N^{1-2\theta }}\log {{\,\mathrm{\mathbb {E}}\,}}\left( \exp [N^{1-2\theta }t^T Z]\right) \\ {}&\quad =\tfrac{1}{2}t^T( (\mathcal S/N)^{1-\theta }\mathcal A (\mathcal S/N)^{1-\theta })^{-1}t\rightarrow \tfrac{1}{2} t^T (\Gamma ^{1-\theta }\mathcal A\Gamma ^{1-\theta })^{-1}t \end{aligned}$$

as \(N\rightarrow \infty \). Thus we obtain

$$\begin{aligned}&\lim _{N\rightarrow \infty }\frac{1}{N^{1-2\theta }}\log \int \exp \left[ N^{1-2\theta }t^T \mathcal S^{\theta }(m-v)\right] d{{\,\mathrm{\mathbb {P}}\,}}\\ {}&\quad =\frac{1}{2} t^T \left( \Gamma ^{\theta -1} (H_\phi (\xi ^*)^{-1}-\mathcal A ^{-1}) \Gamma ^{\theta -1}\right) t. \end{aligned}$$

Now, setting \(t=\mathcal R ^T \tilde{t}\), we arrive at

$$\begin{aligned}&\lim _{N\rightarrow \infty }\frac{1}{N^{1-2\theta }}\log \int \exp \left[ N^{1-2\theta }\tilde{t}^T\mathcal R\mathcal S^{\theta }(m-v)\right] d{{\,\mathrm{\mathbb {P}}\,}}\\ {}&\quad =\frac{1}{2} \tilde{t}^T \mathcal R\left( \Gamma ^{\theta -1} (H_\phi (\xi ^*)^{-1}-\mathcal A ^{-1}) \Gamma ^{\theta -1}\right) \mathcal R ^T \tilde{t}=:\Lambda ^*(\tilde{t}). \end{aligned}$$

Taking the same route as in the proof of Theorem 5, we obtain

$$\begin{aligned}&\mathcal R\left( \Gamma ^{\theta -1} (H_\phi (\xi ^*)^{-1}-\mathcal A ^{-1}) \Gamma ^{\theta -1}\right) \mathcal R ^T\\ {}&\quad =\left( q{{{\,\mathrm{diag}\,}}(\gamma )}^{1-2\theta }-{{{\,\mathrm{diag}\,}}(\gamma )}^{1-\theta }A {{{\,\mathrm{diag}\,}}(\gamma )}^{1-\theta }\right) ^{-1}\otimes I_{q-1} \end{aligned}$$

as in (43), where we omit the analogous details. In particular, this matrix is positive definite and hence the quadratic form \(\Lambda ^*\) is an essentially smooth function. The Gärtner–Ellis Theorem [7, Theorem 2.3.6] now provides the large deviation principle for \(\mathcal S^{\theta }(m-N\mathcal S^{-1}\xi ^*)\) with speed \(N^{1-2\theta }\) and good rate function given by the Legendre transform \((\Lambda ^*)^*=\Lambda \). This is the quadratic form of half the inverse matrix; in the statement of the theorem we wrote t instead of \(\tilde{t}\). \(\square \)
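The last step uses the standard fact that the Legendre transform of \(t\mapsto \tfrac{1}{2} t^TCt\) with C positive definite is \(x\mapsto \tfrac{1}{2} x^TC^{-1}x\), the supremum being attained at \(t=C^{-1}x\). A minimal numerical sketch of this fact follows; the matrix C is hypothetical test data standing in for the matrix above, and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
G = rng.normal(size=(d, d))
C = G @ G.T + np.eye(d)          # hypothetical positive definite matrix

def cgf(t):
    """Quadratic form 1/2 t^T C t (the role played by Lambda^*)."""
    return 0.5 * t @ C @ t

def rate(x):
    """Candidate Legendre transform 1/2 x^T C^{-1} x (the role played by Lambda)."""
    return 0.5 * x @ np.linalg.solve(C, x)

x = rng.normal(size=d)
t_star = np.linalg.solve(C, x)               # stationary point of t -> <t, x> - cgf(t)
value_at_max = t_star @ x - cgf(t_star)
assert np.isclose(value_at_max, rate(x))

# No randomly perturbed t gives a larger value, as expected for a concave objective.
for _ in range(1000):
    t = t_star + rng.normal(size=d)
    assert t @ x - cgf(t) <= value_at_max + 1e-9
```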