1 Introduction

For a graph \(G=(V,E)\) the partition function of the random cluster model is defined by

$$\begin{aligned} Z_G(q,w)=\sum _{A\subseteq E(G)}q^{k(A)}w^{|A|}, \end{aligned}$$

where k(A) denotes the number of connected components of the graph (V, A). In many papers, one uses the parametrization \(w=e^{\beta }-1\).
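
Since \(Z_G(q,w)\) is a finite sum over edge subsets, it can be evaluated directly on small graphs. The following minimal Python sketch (the helper names and the test graph are ad hoc, and the brute force is only feasible for graphs with a handful of edges) does exactly that; on the triangle with \(q=2\), \(w=1\) it returns 28.

```python
from itertools import combinations

def num_components(n, edges):
    """Number of connected components of the graph on {0,...,n-1} with the given edge set."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def Z_rc(n, edges, q, w):
    """Brute-force random cluster partition function: sum over A of q^{k(A)} w^{|A|}."""
    return sum(q ** num_components(n, A) * w ** len(A)
               for r in range(len(edges) + 1)
               for A in combinations(edges, r))

# Triangle K_3 with q = 2, w = 1: the sum is 8 + 3*4 + 3*2 + 2 = 28.
print(Z_rc(3, [(0, 1), (1, 2), (0, 2)], 2.0, 1.0))
```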

When q is a positive integer, \(Z_G(q,w)\) is also the partition function of the Potts model with q spins; moreover, there is a natural coupling between the two models, see for example [16]. In this paper we call a model a spin model with r spins if there is an \(r\times r\) symmetric matrix N and a vector \(\underline{\mu }\in \mathbb {R}^r\) such that for a graph \(G=(V,E)\) the probability of a configuration \(\sigma : V\rightarrow \{1,2,\dots ,r\}\) is

$$\begin{aligned} \mathbb {P}(\sigma )=\frac{1}{Z_G(N,\underline{\mu })}\prod _{v\in V}\mu _{\sigma (v)}\prod _{(u,v)\in E(G)}N_{\sigma (u),\sigma (v)}, \end{aligned}$$

where with the notation \([r]=\{1,2,\dots ,r\}\) we have

$$\begin{aligned} Z_G(N,\underline{\mu })=\sum _{\sigma :V\rightarrow [r]}\prod _{v\in V}\mu _{\sigma (v)} \prod _{(u,v)\in E(G)}N_{\sigma (u),\sigma (v)}. \end{aligned}$$

In both expressions the second product runs over the edge set E(G); the symmetry of N ensures that the expression is well-defined. The quantity \(Z_G(N,\underline{\mu })\) is the partition function of the model. In the case of the Potts model we have \(r=q\) and \(N=J_q+wI_q\), where \(J_q\) is the \(q\times q\) all-ones matrix and \(I_q\) is the \(q\times q\) identity matrix. The vector \(\underline{\mu }\) is the constant 1 vector in this case. In general, if \(\underline{\mu }\) is the constant 1 vector, then we will simply write \(Z_G(N)\) instead of \(Z_G(N,\underline{\mu })\).
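
For integer q the identity \(Z_G(q,w)=Z_G(J_q+wI_q)\) can be checked on the same small examples. The sketch below (again an ad hoc brute force over all colorings, only feasible for tiny graphs) evaluates \(Z_G(N,\underline{\mu })\); for the triangle with \(q=3\), \(w=0.7\) it returns 51.339, the same value as the random cluster sum above.

```python
from itertools import product

def Z_spin(n, edges, N, mu):
    """Partition function of the spin model (N, mu): sum over maps sigma: V -> [r]."""
    r = len(mu)
    total = 0.0
    for sigma in product(range(r), repeat=n):
        term = 1.0
        for v in range(n):
            term *= mu[sigma[v]]
        for u, v in edges:
            term *= N[sigma[u]][sigma[v]]
        total += term
    return total

# Potts model on the triangle: N = J_q + w*I_q, mu = all ones.
q, w = 3, 0.7
N = [[1.0 + (w if i == j else 0.0) for j in range(q)] for i in range(q)]
print(Z_spin(3, [(0, 1), (1, 2), (0, 2)], N, [1.0] * q))  # 51.339, matching Z_G(q, w)
```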

Let v(G) denote the number of vertices of a graph G. In this paper we study the quantity

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\ln Z_{G_n}(q,w) \end{aligned}$$

when \((G_n)_n\) is an essentially large girth sequence of d-regular graphs. A graph sequence \((G_n)_n\) is called essentially large girth if for all g we have \(\lim _{n\rightarrow \infty }\frac{L(G_n,g)}{v(G_n)}=0\), where L(G, g) denotes the number of cycles of length at most \(g-1\). It is known that a sequence of random d-regular graphs is an essentially large girth graph sequence with probability one (see for instance [19]). So the problems of determining \(\lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\mathbb {E}\ln Z_{G_n}(q,w)\) or \(\lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\ln \mathbb {E}Z_{G_n}(q,w)\) for a random d-regular graph sequence \((G_n)_n\) are very strongly related to this question. In fact, it will turn out that all these limits are the same. The main theorem of this paper is the following.

Theorem 1.1

If \((G_n)_n\) is an essentially large girth sequence of d-regular graphs, then the limit

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\ln \Phi _{d,q,w} \end{aligned}$$

exists for \(q\ge 2\) and \(w\ge 0\). The quantity \(\Phi _{d,q,w}\) can be computed as follows. Let

$$\begin{aligned} \Phi _{d,q,w}(t):=&\left( \sqrt{1+\frac{w}{q}}\cos (t)+\sqrt{\frac{(q-1)w}{q}}\sin (t)\right) ^{d}\\&+(q-1) \left( \sqrt{1+\frac{w}{q}}\cos (t)-\sqrt{\frac{w}{q(q-1)}}\sin (t)\right) ^{d}, \end{aligned}$$

then

$$\begin{aligned} \Phi _{d,q,w}:=\max _{t\in [-\pi ,\pi ]}\Phi _{d,q,w}(t). \end{aligned}$$

The same conclusion holds true with probability one for a sequence of random d-regular graphs.

The quantity \(\Phi _{d,q,w}\) has various alternative descriptions. As far as we know the description in the theorem is new even for the Potts model. There is a critical value \(w_c(d,q)\) such that if \(0\le w\le w_c(d,q)\), then \(\Phi _{d,q,w}(t)\) is maximized at \(t=0\), and so \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\), and for \(w>w_c(d,q)\) we have \(\Phi _{d,q,w}>q\left( 1+\frac{w}{q}\right) ^{d/2}\). Moreover, if \(q>2\), then \(\frac{\partial }{\partial w }\Phi _{d,q,w}\) is discontinuous at \(w_c(d,q)\), that is, there is a first order phase transition at \(w_c(d,q)\). We will see that

$$\begin{aligned} w_c(d,q)=\frac{q-2}{(q-1)^{1-2/d}-1}-1. \end{aligned}$$

For random d-regular graphs and the Potts model this was established by Galanis et al. [15]. For not necessarily integer \(q\ge 2\) this was conjectured by Helmuth et al. [17]. We will also prove that for any d-regular graph G we have

$$\begin{aligned} Z_{G}(q,w)\ge \Phi _{d,q,w}^{v(G)}, \end{aligned}$$

and we also show that if G contains \(\varepsilon v(G)\) cycles of length at most g for some fixed \(\varepsilon \) and g, then \(Z_{G}(q,w)\) is exponentially larger than that bound. Related results were obtained by Ruozzi [22].
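
Both the maximization defining \(\Phi _{d,q,w}\) and the behaviour around \(w_c(d,q)\) are easy to explore numerically. The sketch below (a crude grid scan over t, with arbitrary test values \(d=3\), \(q=4\), standing in for an exact maximization) compares the scan with \(\Phi _{d,q,w}(0)=q(1+w/q)^{d/2}\) for a few values of w around \(w_c(3,4)\).

```python
import math

def phi_t(d, q, w, t):
    """Phi_{d,q,w}(t) as in Theorem 1.1."""
    a = math.sqrt(1 + w / q) * math.cos(t) + math.sqrt((q - 1) * w / q) * math.sin(t)
    b = math.sqrt(1 + w / q) * math.cos(t) - math.sqrt(w / (q * (q - 1))) * math.sin(t)
    return a ** d + (q - 1) * b ** d

def Phi(d, q, w, steps=20000):
    """Approximate Phi_{d,q,w} = max_t Phi_{d,q,w}(t) by a grid scan over [-pi, pi]."""
    return max(phi_t(d, q, w, -math.pi + 2 * math.pi * k / steps) for k in range(steps + 1))

d, q = 3, 4.0
wc = (q - 2) / ((q - 1) ** (1 - 2 / d) - 1) - 1
for w in (0.5 * wc, 0.9 * wc, 1.2 * wc, 2.0 * wc):
    # Below w_c the two printed numbers agree; above w_c the first one is strictly larger.
    print(round(w, 4), Phi(d, q, w), q * (1 + w / q) ** (d / 2))
```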

1.1 Related works

There has been a lot of work on Potts model and random cluster model both on random regular graphs and essentially large girth graph sequences. The papers mentioned below treat various related problems, but we only mention the results that are directly related to Theorem 1.1.

The case \(q=2\) and \(w\ge 0\) is the so-called ferromagnetic Ising model. In this case, Dembo and Montanari [12] proved that \(Z_{G_n}(2,w)^{1/v(G_n)}\) converges if \((G_n)_n\) is an essentially large girth sequence of d-regular graphs. In fact, they proved a significantly more general theorem about essentially large girth graphs that are not necessarily regular. Note that they use the terminology locally tree-like for what we call essentially large girth. When q is a positive integer and \(w\ge 0\), that is, in the case of the ferromagnetic Potts model, Dembo et al. [14] proved the convergence of \(Z_{G_n}(q,w)^{1/v(G_n)}\) for essentially large girth sequences of d-regular graphs for every d except when w belongs to a certain interval \((w_0,w_1)\). Later Dembo et al. [13] proved the convergence of \(Z_{G_n}(q,w)^{1/v(G_n)}\) for essentially large girth sequences of d-regular graphs, when d is even, q is a positive integer and \(w\ge 0\), even if \(w\in (w_0,w_1)\). Very recently, Helmuth et al. [17] proved the convergence of \(Z_{G_n}(q,w)^{1/v(G_n)}\) for essentially large girth sequences of d-regular graphs for large (not necessarily integer) q and \(w\ge 0\), with the additional hypothesis that \((G_n)_n\) satisfies some expansion condition. This line of research, using cluster expansion and an expansion property of \((G_n)_n\), was extended by Carlson et al. [7] and by Carlson et al. [8] for the Potts model. Ferromagnetic Potts models on random regular graphs were also studied by Galanis et al. [15].

Our theorem does not cover the case when \(w<0\). When q is a positive integer and \(w=-1\), then \(Z_G(q,-1)\) counts the number of proper colorings of the graph G. This case was treated by Bandyopadhyay and Gamarnik [2]. They showed that if \(q\ge d+1\), then for an essentially large girth graph sequence of d-regular graphs \((G_n)_n\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }Z_{G_n}(q,-1)^{1/v(G_n)}=q\left( 1-\frac{1}{q}\right) ^{d/2}. \end{aligned}$$

Their result was extended for integer \(q\ge 2\Delta \) and \(w\ge -1\) by Borgs et al. [6] for general Benjamini–Schramm convergent graph sequences, where \(\Delta \) is a bound on the degrees of all \(G_n\) in the sequence. The result of Bandyopadhyay and Gamarnik [2] was extended for not necessarily integer \(q\ge 8\Delta \) and \(w=-1\) by Abért and Hubai [1] also for arbitrary Benjamini–Schramm convergent graph sequences. Csikvári and Frenkel [11] showed that the same conclusion holds true for every fixed \(w\ge 0\) and q sufficiently large in terms of w and \(\Delta \). The partition function of the random cluster model is strongly related to the so-called Tutte polynomial [25]. For a graph \(G=(V,E)\) the Tutte polynomial \(T_G(x,y)\) is defined by

$$\begin{aligned} T_G(x,y)=\sum _{A\subseteq E}(x-1)^{k(A)-k(E)}(y-1)^{k(A)+|A|-v(G)}. \end{aligned}$$

The connection between the Tutte polynomial and the random cluster model is

$$\begin{aligned} T_G(x,y)=(x-1)^{-k(E)}(y-1)^{-v(G)}Z_G((x-1)(y-1),y-1). \end{aligned}$$
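
This identity is again easy to verify on a small example. The following sketch (ad hoc helper names, brute-force subset expansions, only for tiny graphs) evaluates both sides on the triangle at an arbitrary point (x, y) with \(x,y>1\).

```python
from itertools import combinations

def num_components(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def tutte(n, edges, x, y):
    """Brute-force Tutte polynomial via its subset expansion."""
    kE = num_components(n, edges)
    return sum((x - 1) ** (num_components(n, A) - kE) * (y - 1) ** (num_components(n, A) + len(A) - n)
               for r in range(len(edges) + 1) for A in combinations(edges, r))

def Z_rc(n, edges, q, w):
    return sum(q ** num_components(n, A) * w ** len(A)
               for r in range(len(edges) + 1) for A in combinations(edges, r))

n, edges, x, y = 3, [(0, 1), (1, 2), (0, 2)], 2.5, 3.0
lhs = tutte(n, edges, x, y)
rhs = (x - 1) ** (-num_components(n, edges)) * (y - 1) ** (-n) * Z_rc(n, edges, (x - 1) * (y - 1), y - 1)
print(lhs, rhs)  # the two numbers coincide (11.75 for these values)
```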

Bencs and Csikvári [4] proved that for an essentially large girth sequence of d-regular graphs \((G_n)_n\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }T_{G_n}(x,y)^{1/v(G_n)}= \left\{ \begin{array}{lc} (d-1)\left( \frac{(d-1)^2}{(d-1)^2-x}\right) ^{d/2-1}&{} \ \ \text{ if }\ x\le d-1,\\ x\left( 1+\frac{1}{x-1}\right) ^{d/2-1} &{}\ \ \text{ if }\ x> d-1 \end{array}\right. \end{aligned}$$

for \(x\ge 1\) and \(0\le y\le 1\). The next theorem summarizes the known results for the Tutte polynomial for non-negative x, y.

Theorem 1.2

Let \((G_n)_n\) be an essentially large girth sequence of d-regular graphs. Then the limit

$$\begin{aligned} \lim _{n\rightarrow \infty }T_{G_n}(x,y)^{1/v(G_n)}=t_d(x,y) \end{aligned}$$

exists if x, y satisfy one of the following conditions

  1. (i)

    (Theorem 1.1) \((x-1)(y-1)\ge 2\) and \(y>1\),

  2. (ii)

    (see Sect. 4) \(x\ge d-1\) and \(y\ge 0\),

  3. (iii)

    (Bencs and Csikvári [4]) \(x\ge 1\) and \(0\le y\le 1\).

Figure 1 depicts the regions described in Theorem 1.2.

Fig. 1

The investigated parameters of the article (in blue). For a description of the related regions (in gray) see Sect. 4. The dashed lines are \(x=d-1\) and the phase transition curve parametrized in x, y.

1.2 Plan of the paper

This paper has essentially two parts. In the first part we show that \(Z_G(q,w)\) can be approximated by the partition function of a 2-spin model for essentially large girth graphs, namely \(Z_G(q,w)\approx Z_G(M'_2,\underline{\nu }_2)\), where

$$\begin{aligned} M'_2=\left( \begin{array}{cc} 1+w &{} 1 \\ 1 &{} 1+\frac{w}{q-1} \end{array}\right) \ \ \ \text {and}\ \ \ \underline{\nu }_2= \left( \begin{array}{c} 1 \\ q-1 \end{array}\right) . \end{aligned}$$

The precise statement is Theorem 2.4. In this theorem we do not use that G is regular, so we believe that this statement is very useful for studying the random cluster model on other essentially large girth graphs like the Erdős–Rényi random graphs \(G(n,\frac{c}{n})\). This statement implies that for an essentially large girth sequence of d-regular graphs \((G_n)_n\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\lim _{n\rightarrow \infty } \frac{1}{v(G_n)}\ln Z_{G_n}(M'_2,\underline{\nu }_2). \end{aligned}$$

At that point one can simply cite a theorem of Sly and Sun [23, 24] (relying on a theorem of Dembo and Montanari [12]) that shows that the aforementioned limit exists since \(\det (M'_2)>0\). Indeed, Sly and Sun [23, 24] proved that for regular graphs any 2-spin model \((N,\underline{\mu })\) having a positive determinant is equivalent to a ferromagnetic Ising model. Dembo and Montanari [12] showed that for the ferromagnetic Ising model the limit indeed exists. In these papers the main technique is an abstract interpolation method. Nevertheless, in this paper we do not rely on these papers; instead we develop a little theory of “ferromagnetic” 2-spin models that builds on Lee–Yang theory [18, 26, 27] and the gauge theory of Chertkov and Chernyak [9, 10]. This approach has some additional gains. First of all, it shows that the limit exists not only for essentially large girth sequences of d-regular graphs but for Benjamini–Schramm convergent sequences of d-regular graphs (for the definition of a Benjamini–Schramm convergent graph sequence see Definition 3.27, for the precise statement see Theorem 3.29). Secondly, we prove a stability theorem showing that if a d-regular graph contains a linear number of short cycles, that is, it contains at least \(\varepsilon v(G)\) cycles of length at most g for some fixed \(\varepsilon \) and g, then both \(Z_G(q,w)\) and \(Z_G(N,\underline{\mu })\) are exponentially larger than the value obtained for an essentially large girth sequence of d-regular graphs (for the precise statement see Theorem 3.41 and the remark after the theorem).

So the second part of the paper is an elaborate analysis of the 2-spin model \((N,\underline{\mu })\), where N is a positive definite \(2\times 2\) matrix with positive entries and \(\underline{\mu }\) is a vector in \(\mathbb {R}^2\) with positive entries. We will show that there exists a quantity \(\Phi _d(N,\underline{\mu })\) such that for every essentially large girth sequence of d-regular graphs \((G_n)_n\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _d(N,\underline{\mu }). \end{aligned}$$

To analyse these models we use a strategy that can be of independent interest. The idea is that we can associate many different polynomials to the same computational problem, and the zeros of one of these polynomials satisfy a Lee–Yang theorem, that is, they lie on a circle. Indeed, for a d-regular graph G let us introduce the following polynomials [5, 26]

$$\begin{aligned} F_G(x_0,\dots ,x_d)=\sum _{A\subseteq E}\left( \prod _{v\in V} x_{d_A(v)}\right) , \end{aligned}$$

and a bit more generally,

$$\begin{aligned} F_G(x_0,\dots ,x_d|z)=\sum _{A\subseteq E}\left( \prod _{v\in V} x_{d_A(v)}\right) z^{2|A|}=F_G(x_0,x_1z,x_2z^2,\dots ,x_dz^d). \end{aligned}$$

We call \(F_G(x_0,\dots ,x_d)\) and \(F_G(x_0,\dots ,x_d|z)\) the subgraph counting polynomials.

We show that if N is a \(2\times 2\) positive definite matrix, then there are vectors \(\underline{v}(t)\in \mathbb {R}^{d+1}\) for each \(t\in [0,2\pi ]\) such that \(F_G(\underline{v}(t))=Z_G(N,\underline{\mu }).\) We will use these polynomials for two different things. First we show that there exists a \(t_1\) such that the zeros of \(F_G(\underline{v}(t_1)|z)\) lie on a circle for all d-regular graphs G. This will enable us to use a standard technique about limits of root measures. More details about this plan can be found in the introduction of Sect. 3. The second application is that there is a \(t_0\) such that the first coordinate of \(\underline{v}(t_0)\) is exactly \(\Phi _{d}(N,\underline{\mu })\) and all other coordinates have a nice sign structure. This will enable us to prove the aforementioned stability theorem for graphs containing a linear number of short cycles.

Notations. Given a graph \(G=(V,E)\) we use the notation v(G) and e(G) for the number of vertices and edges, respectively. Given a set \(S\subseteq V\) let E(S) denote the edges induced by S, that is, \(\{(u,v)\in E(G)\ |\ u,v\in S\}\) and let \(e(S)=|E(S)|\). Let G[S] denote the induced subgraph with vertex set S and edge set E(S). Similarly, \(e(V-S)\) is the number of edges induced by \(V\setminus S\), and \(G-S\) denotes the subgraph induced by \(V\setminus S\).

For an \(A\subseteq E(G)\) and a \(v\in V\) let \(d_A(v)\) denote the degree of the vertex v in the graph (VA), that is, the number of edges of A incident to v.

For an \(A\subset E(G)\) and an \(S\subseteq V(G)\) let \(A\llbracket S\rrbracket =\{(u,v)\in A\ |\ u,v\in S\}\), so these are the edges of A that are induced by S.

Given graphs H and G let \(\hom (H,G)\) denote the number of homomorphisms from H to G, that is, the number of maps \(\varphi :V(H)\rightarrow V(G)\) such that \((\varphi (u),\varphi (v))\in E(G)\) whenever \((u,v)\in E(H)\).

The notation [q] stands for the set \(\{1,2,\dots ,q\}\). We denote the scalar products of vectors \(\underline{x}\) and \(\underline{y}\) by \(\langle \underline{x},\underline{y}\rangle \).

This paper is organized as follows. In the next section we introduce the rank 1 and rank 2 approximations of \(Z_G(q,w)\) and study their basic properties. In Sect. 3 we study the rank 2 approximation or, more generally, \(Z_G(N,\underline{\mu })\). We end the paper with some remarks about the case \(1<q<2\).

2 Approximations

In this section we introduce various approximations of the partition function of the random cluster model. In the sequel the rank 2 approximation will be especially important for us.

2.1 Rank 1 approximation

For motivational purposes let us assume for a moment that q is a positive integer. Then it is known that

$$\begin{aligned} Z_G(q,w)=Z_G(M), \end{aligned}$$

where M is the \(q\times q\) matrix with entries \(1+w\) in the diagonal and 1’s as off-diagonal elements. It is a natural idea to approximate M with the rank 1 matrix \(M_1\) such that the sum of all entries of M and \(M_1\) are equal. In other words, let \(M_1\) be the \(q\times q\) matrix with entries \(1+\frac{w}{q}\) everywhere. Note that by the definition of \(Z_G(M_1)\) we have

$$\begin{aligned} Z_G(M_1)=q^{v(G)}\left( 1+\frac{w}{q}\right) ^{e(G)}. \end{aligned}$$

Let us call the quantity

$$\begin{aligned} Z^{(1)}_G(q,w)=q^{v(G)}\left( 1+\frac{w}{q}\right) ^{e(G)} \end{aligned}$$

the rank 1 approximation of \(Z_G(q,w)\). This quantity makes sense even if q is positive, but not necessarily integer and we will refer to it as the rank 1 approximation of \(Z_G(q,w)\) even in this case.

Lemma 2.1

If \(q\ge 1\), then

$$\begin{aligned} Z_G(q,w)\ge Z^{(1)}_G(q,w). \end{aligned}$$

If \(0< q\le 1\), then

$$\begin{aligned} Z_G(q,w)\le Z^{(1)}_G(q,w). \end{aligned}$$

Proof

Using the fact that \(k(A)\ge v(G)-|A|\) for an \(A\subseteq E(G)\) we get that for \(q\ge 1\) we have

$$\begin{aligned} Z_G(q,w)=\sum _{A\subseteq E(G)}q^{k(A)}w^{|A|}\ge \sum _{A\subseteq E(G)}q^{v(G)-|A|}w^{|A|}=q^{v(G)}\left( 1+\frac{w}{q}\right) ^{e(G)}. \end{aligned}$$

For \(q\le 1\) we have the opposite inequality in the above computation. \(\square \)

Lemma 2.1 implies that for any d-regular graph G and \(q\ge 1\) we have

$$\begin{aligned} Z_G(q,w)^{1/v(G)}\ge q\left( 1+\frac{w}{q}\right) ^{d/2}. \end{aligned}$$

For \(0<q\le 1\) the same quantity is an upper bound. Note that the very same quantity appears as \(\Phi _{d,q,w}(0)\) in Theorem 1.1.
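
For instance, on the complete graph \(K_4\) (which is 3-regular) the bound can be compared with a brute-force evaluation of \(Z_G(q,w)\); a small sketch (ad hoc helpers and test values):

```python
from itertools import combinations

def num_components(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def Z_rc(n, edges, q, w):
    return sum(q ** num_components(n, A) * w ** len(A)
               for r in range(len(edges) + 1) for A in combinations(edges, r))

# K_4 is 3-regular on 4 vertices; for q >= 1 the rank 1 quantity is a lower bound.
n, d = 4, 3
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
q, w = 2.5, 1.3
print(Z_rc(n, edges, q, w) ** (1 / n), q * (1 + w / q) ** (d / 2))  # first value >= second value
```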

2.2 Rank 2 approximation

What is better than a rank 1 approximation? Naturally, a rank 2 approximation.

Again for motivational purposes let us assume for a moment that \(q\ge 2\) is an integer. This time let us approximate the matrix M with the following rank 2 matrix \(M_2\).

$$\begin{aligned} M_2=\left( \begin{array}{cccc} 1+w &{} 1 &{} \ldots &{} 1 \\ 1 &{} 1+\frac{w}{q-1} &{} \ldots &{} 1+\frac{w}{q-1} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 1 &{} 1+\frac{w}{q-1} &{} \ldots &{} 1+\frac{w}{q-1} \end{array}\right) . \end{aligned}$$

Then

$$\begin{aligned} Z_G(M_2)=\sum _{S\subseteq V}(1+w)^{e(S)}(q-1)^{v(G)-|S|}\left( 1+\frac{w}{q-1}\right) ^{e(G-S)}. \end{aligned}$$

Indeed, let \(S=\varphi ^{-1}(1)\) in the definition of \(Z_G(M_2)\). Let us introduce the quantity

$$\begin{aligned} Z^{(2)}_G(q,w)=\sum _{S\subseteq V}(1+w)^{e(S)}(q-1)^{v(G)-|S|}\left( 1+\frac{w}{q-1}\right) ^{e(G-S)}. \end{aligned}$$

The definition of \(Z^{(2)}_G(q,w)\) makes perfect sense for any \(q>1\), not necessarily an integer, and we will refer to it as the rank 2 approximation of \(Z_G(q,w)\). Recall that

$$\begin{aligned} M'_2=\left( \begin{array}{cc} 1+w &{} 1 \\ 1 &{} 1+\frac{w}{q-1} \end{array}\right) \ \ \ \text {and}\ \ \ \underline{\nu }_2=\left( \begin{array}{c} 1 \\ q-1 \end{array}\right) , \end{aligned}$$

and note that

$$\begin{aligned} Z^{(2)}_G(q,w)=Z_G(M'_2,\underline{\nu }_2) \end{aligned}$$

even if q is not an integer.
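
Indeed, in the 2-spin sum the set \(S=\sigma ^{-1}(1)\) contributes \((q-1)^{v(G)-|S|}\) from \(\underline{\nu }_2\), \((1+w)^{e(S)}\) from the edges inside S and \((1+\frac{w}{q-1})^{e(G-S)}\) from the edges inside the complement. A small numerical sketch (arbitrary test graph and non-integer q):

```python
from itertools import combinations, product

def Z2(n, edges, q, w):
    """Rank 2 approximation as a sum over vertex subsets S."""
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            S = set(S)
            eS = sum(1 for u, v in edges if u in S and v in S)
            eC = sum(1 for u, v in edges if u not in S and v not in S)
            total += (1 + w) ** eS * (q - 1) ** (n - len(S)) * (1 + w / (q - 1)) ** eC
    return total

def Z_M2(n, edges, q, w):
    """Z_G(M'_2, nu_2) as a 2-spin partition function."""
    N = [[1 + w, 1.0], [1.0, 1 + w / (q - 1)]]
    mu = [1.0, q - 1]
    total = 0.0
    for sigma in product(range(2), repeat=n):
        term = 1.0
        for v in range(n):
            term *= mu[sigma[v]]
        for u, v in edges:
            term *= N[sigma[u]][sigma[v]]
        total += term
    return total

n, edges, q, w = 4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], 3.4, 0.9  # q need not be an integer
print(Z2(n, edges, q, w), Z_M2(n, edges, q, w))  # equal
```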

This time it is less clear that this is a natural approximation, but as it turns out, it is an asymptotically precise approximation for essentially large girth graphs if \(q\ge 2\) and \(w\ge 0\). We prove this through a series of lemmas.

Lemma 2.2

We have

$$\begin{aligned} Z_G(q,w)=\sum _{S\subseteq V}(1+w)^{e(S)}Z_{G-S}(q-1,w). \end{aligned}$$

Proof

This identity is trivially true for positive integer q using the interpretation of \(Z_G(q,w)\) as the partition function of the Potts-model. Since we have polynomials on both sides we get that it is true for all q and w. \(\square \)

Lemma 2.3

For \(q\ge 2\) we have

$$\begin{aligned} Z_G(q,w)\ge Z^{(2)}_G(q,w). \end{aligned}$$

For \(1< q\le 2\) we have

$$\begin{aligned} Z_G(q,w)\le Z^{(2)}_G(q,w). \end{aligned}$$

Proof

By Lemma 2.2 we have

$$\begin{aligned} Z_G(q,w)=\sum _{S\subseteq V}(1+w)^{e(S)}Z_{G-S}(q-1,w). \end{aligned}$$

By the definitions of \(Z^{(2)}_G(q,w)\) and \(Z^{(1)}_G(q,w)\) we have

$$\begin{aligned} Z^{(2)}_G(q,w)=\sum _{S\subseteq V}(1+w)^{e(S)}Z^{(1)}_{G-S}(q-1,w). \end{aligned}$$

Now the claim follows from Lemma 2.1. \(\square \)

Now we are ready to prove that the rank 2 approximation is asymptotically precise for essentially large girth graphs if \(q\ge 2\) and \(w\ge 0\).

Theorem 2.4

Let G be a graph on n vertices with \(L=L(G,g)\) cycles of length at most \(g-1\). Let \(q\ge 2\). Then

$$\begin{aligned} Z^{(2)}_G(q,w)\le Z_G(q,w)\le q^{n/g+L}Z^{(2)}_G(q,w). \end{aligned}$$

Proof

The lower bound was already proven in Lemma 2.3. So we only need to prove the upper bound.

Given \(A\subseteq E(G)\) we can decompose A as follows. Let \(V_1,\dots ,V_r\) be the vertex sets of the connected components of the graph \(H=(V,A)\), and let \(A_1,\dots ,A_r\) be the corresponding subsets of A. If \(V_i\) is an isolated vertex, then \(A_i=\emptyset \).

Let us say that \(V_i\) is small if the induced graph \(G[V_i]\) does not contain a cycle. In particular, \(A_i\) does not contain a cycle either. Note that it is possible that \(A_i\) does not contain a cycle, but the induced graph \(G[V_i]\) contains a cycle, and so \(V_i\) is not small. Let \(\mathcal {S}_A\) denote the set of small \(V_i\)’s. We say that \(V_i\) is large if it is not small, and we denote by \(\mathcal {L}_A\) the set of large \(V_i\)’s. Note that \(|\mathcal {L}_A|\le n/g+L\) since each large connected component has size at least g or it contains a cycle of length at most \(g-1\).

Finally, let us say that a vertex set R is compatible with A if R is the union of some small \(V_i\)’s. Note that R may be the empty set. We denote this relation by \(R\sim A\). Furthermore, let \(A\llbracket R\rrbracket \) be the edges of A induced by the vertex set R. Note that if \(R\sim A\), then \(A\llbracket R\rrbracket \) is a forest. On the other hand, there is no restriction on \(A\llbracket V\setminus R\rrbracket .\) Figure 2 depicts an example for the introduced concepts.

Fig. 2

A subgraph A is depicted with thick edges. There are 4 components. The edge sets with connected components of size 3 and 4 belong to \(A_{\ell }\). The edge sets with connected components of size 1 and 2 belong to \(A_s\). A compatible set R is either the vertex set of the latter components or the empty set or the union of these two connected components

Let \(k(R,A\llbracket R \rrbracket )\) denote the number of connected components of the graph \((R,A\llbracket R \rrbracket )\). By the binomial identity we have

$$\begin{aligned} q^{|\mathcal {S}_A|}=((q-1)+1)^{|\mathcal {S}_A|}=\sum _{R\sim A}(q-1)^{k(R,A\llbracket R \rrbracket )}. \end{aligned}$$

Then

$$\begin{aligned} Z_G(q,w)&=\sum _{A \subseteq E(G)}q^{k(A)}w^{|A|}\\&=\sum _{A \subseteq E(G)}q^{|\mathcal {S}_A|+|\mathcal {L}_A|}w^{|A|}\\&\le q^{n/g+L}\sum _{A \subseteq E(G)}q^{|\mathcal {S}_A|}w^{|A|}\\&=q^{n/g+L}\sum _{A \subseteq E(G)}\sum _{R: R\sim A}(q-1)^{k(R,A\llbracket R\rrbracket )}w^{|A|}\\&=q^{n/g+L}\sum _{R \subseteq V(G)}\sum _{A: R\sim A} (q-1)^{k(R,A\llbracket R\rrbracket )}w^{|A\llbracket R\rrbracket |+|A\llbracket V\setminus R\rrbracket |}\\&=q^{n/g+L}\sum _{R \subseteq V(G)}(1+w)^{e(V\setminus R)}\sum _{D}(q-1)^{k(R,D)}w^{|D|}, \end{aligned}$$

where in the last sum, \(D=A\llbracket R\rrbracket \) is a subset of the edges induced by R such that none of the induced connected components contains a cycle. Then

$$\begin{aligned} \sum _{D}(q-1)^{k(R,D)}w^{|D|}=\sum _{D}(q-1)^{|R|-|D|}w^{|D|}\le (q-1)^{|R|}\left( 1+\frac{w}{q-1}\right) ^{e(R)}. \end{aligned}$$

Hence

$$\begin{aligned} Z_G(q,w)\le q^{n/g+L}\sum _{R \subseteq V(G)}(1+w)^{e(V\setminus R)}Z^{(1)}_{G[R]}(q-1,w), \end{aligned}$$

that is

$$\begin{aligned} Z_G(q,w)\le q^{n/g+L}Z^{(2)}_G(q,w). \end{aligned}$$

\(\square \)
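
As a quick sanity check of Theorem 2.4, the following sketch (ad hoc helpers, arbitrary q and w) compares \(Z^{(2)}_G\), \(Z_G\) and \(q^{n/g+L}Z^{(2)}_G\) on the 5-cycle, which has girth 5, so that with \(g=5\) we have \(L=0\).

```python
from itertools import combinations

def num_components(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})

def Z_rc(n, edges, q, w):
    return sum(q ** num_components(n, A) * w ** len(A)
               for r in range(len(edges) + 1) for A in combinations(edges, r))

def Z2(n, edges, q, w):
    total = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            S = set(S)
            eS = sum(1 for u, v in edges if u in S and v in S)
            eC = sum(1 for u, v in edges if u not in S and v not in S)
            total += (1 + w) ** eS * (q - 1) ** (n - len(S)) * (1 + w / (q - 1)) ** eC
    return total

# C_5 has girth 5, so L(C_5, 5) = 0 and the upper bound is q^{n/g} Z^{(2)}.
n, g = 5, 5
edges = [(i, (i + 1) % n) for i in range(n)]
q, w = 2.7, 1.5
print(Z2(n, edges, q, w), Z_rc(n, edges, q, w), q ** (n / g) * Z2(n, edges, q, w))  # increasing
```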

The following theorem is an immediate consequence of Theorem 2.4.

Theorem 2.5

Let \(q\ge 2\) and \(w\ge 0\). Let \((G_n)_n\) be an essentially large girth sequence of d-regular graphs. If the limit

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z^{(2)}_{G_n}(q,w) \end{aligned}$$

exists, then the limit

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(q,w) \end{aligned}$$

exists too, and they have the same value.

3 Ferromagnetic 2-Spin Models

In this section we analyze the rank 2 approximation of the random cluster model. Since \(Z^{(2)}_G(q,w)=Z_G(M'_2,\underline{\nu }_2)\) for a \(2\times 2\) matrix \(M'_2\) we will actually prove that if N is a \(2\times 2\) positive definite matrix with positive entries and \(\underline{\mu }=(\mu _1,\mu _2)\) is a positive vector, then

$$\begin{aligned} \lim _{n\rightarrow \infty }Z_{G_n}(N,\underline{\mu })^{1/v(G_n)}=\Phi _{d}(N,\underline{\mu }) \end{aligned}$$

exists for every essentially large girth sequence of d-regular graphs \((G_n)_n\). In fact, we will prove a much stronger theorem about Benjamini–Schramm convergent graph sequences.

The plan is to connect the quantity \(Z_G(N,\underline{\mu })\) with Lee–Yang theory. This connection is established through the so-called subgraph counting polynomial (see [26]).

3.1 Subgraph counting polynomial

From now on we always assume that G is a d-regular graph.

Let us introduce the so-called subgraph counting polynomial

$$\begin{aligned} F_G(x_0,\dots ,x_d)=\sum _{A\subseteq E}\left( \prod _{v\in V} x_{d_A(v)}\right) , \end{aligned}$$

and a bit more generally,

$$\begin{aligned} F_G(x_0,\dots ,x_d|z)=\sum _{A\subseteq E}\left( \prod _{v\in V} x_{d_A(v)}\right) z^{2|A|}=F_G(x_0,x_1z,x_2z^2,\dots ,x_dz^d). \end{aligned}$$

As an example we give the subgraph counting polynomial \(F_{K_5}(x_0,x_1,x_2,x_3,x_4)\) of the complete graph \(K_5\) on 5 vertices. The first term corresponds to the empty subgraph, the last term corresponds to the graph itself.

$$\begin{aligned}&x_{0}^{5} + 10 x_{0}^{3} x_{1}^{2} + 15 x_{0} x_{1}^{4} + 30 x_{0}^{2} x_{1}^{2} x_{2} + 30 x_{1}^{4} x_{2} + 60 x_{0} x_{1}^{2} x_{2}^{2} + 10 x_{0}^{2} x_{2}^{3} + 70 x_{1}^{2} x_{2}^{3} + 15 x_{0} x_{2}^{4} \\&\quad + 12 x_{2}^{5} + 20 x_{0} x_{1}^{3} x_{3} + 60 x_{1}^{3} x_{2} x_{3} + 60 x_{0} x_{1} x_{2}^{2} x_{3} + 120 x_{1} x_{2}^{3} x_{3} + 60 x_{1}^{2} x_{2} x_{3}^{2} + 30 x_{0} x_{2}^{2} x_{3}^{2} \\&\quad + 70 x_{2}^{3} x_{3}^{2} + 60 x_{1} x_{2} x_{3}^{3} + 5 x_{0} x_{3}^{4} + 30 x_{2} x_{3}^{4} + 5 x_{1}^{4} x_{4} + 30 x_{1}^{2} x_{2}^{2} x_{4} + 15 x_{2}^{4} x_{4} + 60 x_{1} x_{2}^{2} x_{3} x_{4} \\&\quad + 60 x_{2}^{2}x_{3}^{2} x_{4} + 20 x_{1} x_{3}^{3} x_{4} + 15 x_{3}^{4} x_{4} + 10 x_{2}^{3} x_{4}^{2} + 30 x_{2} x_{3}^{2} x_{4}^{2} + 10 x_{3}^{2} x_{4}^{3} + x_{4}^{5}. \end{aligned}$$
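
Such expansions are convenient to check by brute force: evaluating \(F_G\) at \(x_0=\dots =x_d=1\) must return \(2^{e(G)}\), and indeed the coefficients above sum to \(1024=2^{10}\). A minimal sketch (hypothetical helper name):

```python
from itertools import combinations

def F_G(edges, n, x):
    """Subgraph counting polynomial F_G(x_0,...,x_d) evaluated at a given point x."""
    total = 0.0
    for r in range(len(edges) + 1):
        for A in combinations(edges, r):
            deg = [0] * n
            for u, v in A:
                deg[u] += 1
                deg[v] += 1
            term = 1.0
            for v in range(n):
                term *= x[deg[v]]
            total += term
    return total

K5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(F_G(K5, 5, [1.0] * 5))  # 1024 = 2^{10}
```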

The general plan is the following. In the next section we show that there are vectors \(\underline{v}(t)\) for each \(t\in [0,2\pi ]\) such that

$$\begin{aligned} F_G(\underline{v}(t))=Z_G(N,\underline{\mu }). \end{aligned}$$

We will show that there exists a \(t_1\) such that all zeros of \(F_G(\underline{v}(t_1)|z)\) lie on a circle for every d-regular graph G. This will imply the convergence of the sequence \(\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })\) not only for essentially large girth sequences of d-regular graphs but for all Benjamini–Schramm convergent sequences of d-regular graphs.

We will also show that there exists a \(t_0\) such that the first coordinate of \(\underline{v}(t_0)\) is exactly \(\Phi _d(N,\underline{\mu })\), and all other coordinates have a nice sign structure. This will enable us to show that \(Z_G(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}\) for every d-regular graph G, and if G contains a linear number of short cycles, then \(Z_G(N,\underline{\mu })\ge ((1+\delta )\Phi _d(N,\underline{\mu }))^{v(G)}\) for some \(\delta >0\).

3.2 Rank 2 matrices

Suppose that we can write an \(r\times r\) matrix N in the form \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) with \(\underline{a},\underline{b}\in \mathbb {R}^r\), and let \(\underline{\mu }\in \mathbb {R}^r\). Then

$$\begin{aligned} Z_G(N,\underline{\mu })&=\sum _{\varphi : V\rightarrow [r]}\prod _{v\in V}\mu _{\varphi (v)}\prod _{(u,v)\in E}N_{\varphi (u)\varphi (v)}\\&=\sum _{\varphi : V\rightarrow [r]}\prod _{v\in V}\mu _{\varphi (v)}\prod _{(u,v)\in E}(\underline{a}\underline{a}^T+\underline{b}\underline{b}^T)_{\varphi (u)\varphi (v)}\\&=\sum _{A\subseteq E}\sum _{\varphi : V\rightarrow [r]}\prod _{v\in V}\mu _{\varphi (v)}\prod _{(u,v)\in E\setminus A}(\underline{a}\underline{a}^T)_{\varphi (u)\varphi (v)}\prod _{(u,v)\in A}(\underline{b}\underline{b}^T)_{\varphi (u)\varphi (v)}\\&=\sum _{A\subseteq E}\sum _{\varphi : V\rightarrow [r]}\prod _{v\in V}\mu _{\varphi (v)}\prod _{(u,v)\in E\setminus A}a_{\varphi (u)}a_{\varphi (v)}\prod _{(u,v)\in A}b_{\varphi (u)}b_{\varphi (v)}\\&=\sum _{A\subseteq E}\prod _{v\in V}\left( \sum _{k=1}^r\mu _ka_k^{d-d_A(v)}b_k^{d_A(v)}\right) \\&=F_G(r_0,\dots ,r_d), \end{aligned}$$

where \(r_j=\sum _{k=1}^r\mu _ka_k^{d-j}b_k^{j}\). On the other hand, \(\underline{a}\) and \(\underline{b}\) are not the only vectors satisfying \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\). Indeed, let us define the vectors \(\underline{a}(t)\) and \(\underline{b}(t)\) as follows:

$$\begin{aligned} \underline{a}(t)_j=a_j\cos (t)+b_j\sin (t), \end{aligned}$$

and

$$\begin{aligned} \underline{b}(t)_j=-a_j\sin (t)+b_j\cos (t). \end{aligned}$$

Then \(N=\underline{a}(t)\underline{a}(t)^T+\underline{b}(t)\underline{b}(t)^T\). So each pair \(\underline{a}(t),\underline{b}(t)\) gives rise to a vector \(\underline{v}(t)=(r_0(t),\dots ,r_d(t))\) such that

$$\begin{aligned} F_G(\underline{v}(t))=Z_G(N,\underline{\mu }). \end{aligned}$$
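
This invariance in t can be tested numerically. The sketch below picks an arbitrary decomposition \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) (the concrete vectors are test values), forms \(\underline{v}(t)=(r_0(t),\dots ,r_d(t))\) for a few values of t, and compares \(F_G(\underline{v}(t))\) with a direct evaluation of \(Z_G(N,\underline{\mu })\) on \(K_4\) (which is 3-regular).

```python
from itertools import combinations, product
import math

def F_G(edges, n, x):
    total = 0.0
    for r in range(len(edges) + 1):
        for A in combinations(edges, r):
            deg = [0] * n
            for u, v in A:
                deg[u] += 1
                deg[v] += 1
            term = 1.0
            for v in range(n):
                term *= x[deg[v]]
            total += term
    return total

def Z_spin(n, edges, N, mu):
    total = 0.0
    for sigma in product(range(2), repeat=n):
        term = 1.0
        for v in range(n):
            term *= mu[sigma[v]]
        for u, v in edges:
            term *= N[sigma[u]][sigma[v]]
        total += term
    return total

# A test decomposition N = a a^T + b b^T; the resulting N has positive entries and is positive definite.
a, b, mu, d = [1.2, 0.8], [0.5, -0.9], [1.0, 1.7], 3
N = [[a[i] * a[j] + b[i] * b[j] for j in range(2)] for i in range(2)]
K4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]
for t in (0.0, 0.4, 1.1):
    at = [a[k] * math.cos(t) + b[k] * math.sin(t) for k in range(2)]
    bt = [b[k] * math.cos(t) - a[k] * math.sin(t) for k in range(2)]
    v = [sum(mu[k] * at[k] ** (d - j) * bt[k] ** j for k in range(2)) for j in range(d + 1)]
    print(F_G(K4, 4, v), Z_spin(4, K4, N, mu))  # the same value for every t
```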

Remark 3.1

We can apply our argument to \(N=M'_2\), \(\underline{\mu }=\underline{\nu }_2\) with the following vectors.

$$\begin{aligned} \underline{a}=\left( \begin{array}{c} \sqrt{1+\frac{w}{q}}\\ \sqrt{1+\frac{w}{q}}\end{array}\right) \ \ \ \text {and}\ \ \ \underline{b}=\left( \begin{array}{c}\sqrt{\frac{(q-1)w}{q}}\\ -\sqrt{\frac{w}{q(q-1)}}\end{array}\right) . \end{aligned}$$

One can check that \(M'_2=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) indeed holds true. We can again introduce the vectors \(\underline{a}(t),\underline{b}(t)\) giving rise to a vector \(\underline{v}(t)=(r_0(t),\dots ,r_d(t))\) such that

$$\begin{aligned} F_G(\underline{v}(t))=Z_G(M'_2,\underline{\nu }_2)=Z^{(2)}_G(q,w). \end{aligned}$$

In this case

$$\begin{aligned} r_j(t)&=\sum _{k=1}^2\mu _ka(t)_k^{d-j}b(t)_k^{j} \\&=\left( \sqrt{1+\frac{w}{q}}\cos (t)+\sqrt{\frac{(q-1)w}{q}}\sin (t)\right) ^{d-j}\\&\qquad \times \left( -\sqrt{1+\frac{w}{q}}\sin (t)+\sqrt{\frac{(q-1)w}{q}}\cos (t)\right) ^{j}\\&\qquad +(q-1)\left( \sqrt{1+\frac{w}{q}}\cos (t)-\sqrt{\frac{w}{q(q-1)}}\sin (t)\right) ^{d-j}\\&\qquad \times \left( -\sqrt{1+\frac{w}{q}}\sin (t)-\sqrt{\frac{w}{q(q-1)}}\cos (t)\right) ^{j}. \end{aligned}$$

In particular,

$$\begin{aligned} r_0(t)=&\left( \sqrt{1+\frac{w}{q}}\cos (t)+\sqrt{\frac{(q-1)w}{q}}\sin (t)\right) ^{d}\\&+(q-1) \left( \sqrt{1+\frac{w}{q}}\cos (t)-\sqrt{\frac{w}{q(q-1)}}\sin (t)\right) ^{d}. \end{aligned}$$

In other words, \(r_0(t)=\Phi _{d,q,w}(t)\).

3.2.1 Decompositions of \(2\times 2\) positive definite matrices

Sometimes it will be convenient to require extra conditions about the vectors \(\underline{a}\) and \(\underline{b}\) in the decomposition of \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\).

Lemma 3.2

Let N be a \(2\times 2\) positive definite matrix with positive entries.

  1. (i)

    Then there exists a decomposition \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) such that \(a_1,a_2,b_1>0\) and \(b_2<0\).

  2. (ii)

    There is also a decomposition \(N=\underline{a}'\underline{a}'^T+\underline{b}'\underline{b}'^T\) such that \(a'_1,a'_2,b'_1,b'_2>0\).

Proof

First we prove (i). Let \(\underline{v}_1,\underline{v}_2\) be an orthonormal set of eigenvectors of N corresponding to the eigenvalues \(\lambda _1, \lambda _2>0\). If \(\lambda _1\ge \lambda _2\), then by Perron–Frobenius theory we can assume that \(\underline{v}_1\) has positive entries. Since \(\underline{v}_1,\underline{v}_2\) are orthogonal, one of the entries of \(\underline{v}_2\) is positive, the other is negative. By considering \(-\underline{v}_2\) if necessary we can assume that the first entry is positive and the second is negative. Hence \(\underline{a}=\sqrt{\lambda _1}\underline{v}_1\) and \(\underline{b}=\sqrt{\lambda _2}\underline{v}_2\) satisfy the conditions.

Next let us prove (ii). We can assume that we have already found an \(\underline{a}\) and \(\underline{b}\) such that \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) and \(a_1,a_2,b_1>0\) and \(b_2<0\). Let

$$\begin{aligned} a_1'=a_1\cos (\alpha )+b_1\sin (\alpha )\ \ \ \text {and}\ \ \ b_1'=-a_1\sin (\alpha )+b_1\cos (\alpha ), \end{aligned}$$

and

$$\begin{aligned} a_2'=a_2\cos (\alpha )+b_2\sin (\alpha )\ \ \ \text {and}\ \ \ b_2'=-a_2\sin (\alpha )+b_2\cos (\alpha ). \end{aligned}$$

If we choose \(\alpha \) in such a way that \(\alpha \in \left( -\frac{\pi }{2},\frac{\pi }{2}\right) \), that is, \(\cos (\alpha )>0\), and

$$\begin{aligned} -\frac{a_2}{b_2}>\frac{b_1}{a_1}>0>\frac{b_2}{a_2}>\tan (\alpha )>-\frac{a_1}{b_1}, \end{aligned}$$

then \(a'_1,a'_2,b'_1,b'_2>0\). Note that \(\frac{b_2}{a_2}>-\frac{a_1}{b_1}\) since \(N_{12}=a_1a_2+b_1b_2>0\). \(\square \)

3.2.2 The functions \(a_1(t),a_2(t),b_1(t),b_2(t)\)

In this section we introduce some functions that will appear many times in this paper.

Definition 3.3

For \(a_1,a_2,b_1,b_2\in \mathbb {R}\) let

$$\begin{aligned} a_1(t)&=a_1\cos (t)+b_1\sin (t)\ \ \ \text {and}\ \ \ b_1(t)=b_1\cos (t)-a_1\sin (t),\\ a_2(t)&=a_2\cos (t)+b_2\sin (t)\ \ \ \text {and}\ \ \ b_2(t)=b_2\cos (t)-a_2\sin (t). \end{aligned}$$

Lemma 3.4

Suppose that for the \(2\times 2\) positive definite matrix N we have \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T=\hat{\underline{a}}\hat{\underline{a}}^T+\hat{\underline{b}}\hat{\underline{b}}^T\). Then there exists a t such that \(\underline{a}(t)=\hat{\underline{a}}\) and \(\underline{b}(t)=\hat{\underline{b}}\) or there exists a t such that \(\underline{a}(t)=\hat{\underline{a}}\) and \(\underline{b}(t)=-\hat{\underline{b}}\), and all vectors of those forms are solutions.

Proof

Consider the vectors \(\underline{x}=(a_1,b_1)\) and \(\underline{y}=(a_2,b_2)\). Our goal is to prove that the orthogonal group \(O(2)\) acts transitively on such pairs \(\underline{x},\underline{y}\). The equation \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) is equivalent to \(N_{11}=\langle \underline{x},\underline{x}\rangle \), \(N_{12}=\langle \underline{x},\underline{y}\rangle \), \(N_{22}=\langle \underline{y},\underline{y}\rangle \). Thus we know the lengths of \(\underline{x},\underline{y}\), and from these the angle between them. Thus with an orthogonal transformation we can transform any solution into any other solution, and by applying an orthogonal transformation to a solution we always get a solution. \(\square \)

Remark 3.5

For the specific choice \(a_1=a_2=\sqrt{1+\frac{w}{q}}\), \(b_1=\sqrt{\frac{(q-1)w}{q}}\) and \(b_2=-\sqrt{\frac{w}{q(q-1)}}\) we use the notation

$$\begin{aligned} a_{q,w,1}(t)&=\sqrt{1+\frac{w}{q}}\cos (t)+\sqrt{\frac{(q-1)w}{q}}\sin (t)\ \ \ \text {and}\\ b_{q,w,1}(t)&=-\sqrt{1+\frac{w}{q}}\sin (t)+\sqrt{\frac{(q-1)w}{q}}\cos (t),\\ a_{q,w,2}(t)&=\sqrt{1+\frac{w}{q}}\cos (t)-\sqrt{\frac{w}{q(q-1)}}\sin (t)\ \ \ \text {and}\\ b_{q,w,2}(t)&=-\sqrt{1+\frac{w}{q}}\sin (t)-\sqrt{\frac{w}{q(q-1)}}\cos (t). \end{aligned}$$

We collect some claims about \(a_1(t),a_2(t),b_1(t),b_2(t)\) whose proofs are straightforward computations. First we describe the sign structure of the functions \(a_1(t),a_2(t),b_1(t),b_2(t)\) on the interval \(\left[ 0,\frac{\pi }{2}\right) \).

Lemma 3.6

Let \(a_1,a_2,b_1,b_2\in \mathbb {R}\) such that \(a_1,a_2,b_1>0\) and \(b_2<0\) and \(a_1a_2+b_1b_2>0\). Let \(t\in \left[ 0,\frac{\pi }{2}\right) \). Then

  1. (a)

    if \(0\le \tan (t)\le \frac{b_1}{a_1}\) we have \(a_1(t),a_2(t),b_1(t)\ge 0\) and \(b_2(t)<0\),

  2. (b)

    if \(\frac{b_1}{a_1}\le \tan (t)\le \frac{a_2}{-b_2}\) we have \(a_1(t),a_2(t)\ge 0\) and \(b_1(t),b_2(t)\le 0\),

  3. (c)

    if \(\frac{a_2}{-b_2}\le \tan (t)\), then \(a_1(t)>0\) and \(a_2(t),b_1(t),b_2(t)\le 0\).

Lemma 3.7

Let \(a_1,a_2,b_1,b_2\in \mathbb {R}\). Then

$$\begin{aligned} \frac{\partial }{\partial t}\left( \frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}\right) =\frac{(a_1a_2+b_1b_2)(a_2b_1-a_1b_2)}{a_2(t)^2b_2(t)^2}. \end{aligned}$$

Lemma 3.8

Let Q be a \(2\times 2\) real matrix with non-zero determinant. Then for every \(c\in \mathbb {R}\), there is a unique \(t\in [0,\pi )\) such that \(\frac{Q_{11}\cos (t)+Q_{12}\sin (t)}{Q_{21}\cos (t)+Q_{22}\sin (t)}=c\).

Proof

Let \(F_Q(t)=\frac{Q_{11}\cos (t)+Q_{12}\sin (t)}{Q_{21}\cos (t)+Q_{22}\sin (t)}.\) We have \(\frac{\partial }{\partial t}F_Q(t)=\frac{Q_{12}Q_{21}-Q_{11}Q_{22}}{(Q_{21}\cos (t)+Q_{22}\sin (t))^2}.\) Hence \(F_Q(t)\) is either strictly monotone decreasing or strictly monotone increasing on \([0,\pi )\), depending on the sign of \(\det (Q)\), with a discontinuity at \(t_0\), where \(\tan (t_0)=-\frac{Q_{21}}{Q_{22}}\). Since \(F_Q(0)=F_Q(\pi )=\frac{Q_{11}}{Q_{21}}\), and \(\lim _{t\searrow t_0}F_Q(t)=\pm \infty \) and \(\lim _{t\nearrow t_0}F_Q(t)=\mp \infty \), the claim follows. \(\square \)

We will also use the following identities.

Lemma 3.9

For arbitrary \(a_1,a_2,b_1,b_2\in \mathbb {R}\) we have

$$\begin{aligned} a_1(t)^2+b_1(t)^2&=a_1^2+b_1^2,\ \ \ a_2(t)^2+b_2(t)^2=a_2^2+b_2^2,\\ a_1(t)a_2(t)+b_1(t)b_2(t)&=a_1a_2+b_1b_2,\ \ \ a_2(t)b_1(t)-a_1(t)b_2(t)=a_2b_1-a_1b_2. \end{aligned}$$

Remark 3.10

In case of \(Z^{(2)}_G(q,w)\) we get

$$\begin{aligned} a_{q,w,1}(t)a_{q,w,2}(t)+b_{q,w,1}(t)b_{q,w,2}(t)=1. \end{aligned}$$

It is also true that

$$\begin{aligned} a_{q,w,1}(t)b_{q,w,1}(t)+(q-1)a_{q,w,2}(t)b_{q,w,2}(t)=-q\cos (t)\sin (t). \end{aligned}$$

3.3 The vector \(\underline{v}(t_1)\)

In this section we show that there exists a \(t_1\) such that for every d-regular graph G the zeros of \(F_G(\underline{v}(t_1)|z)\) lie on a circle.

3.3.1 Wagner’s subgraph counting technique.

In this section we recall a theorem of Wagner (Theorem 3.2 of [26]) about the location of the zeros of \(F_G(x_0,\dots ,x_d|z)\). For any fixed \(x_0,\dots ,x_d\) let us define the following key polynomial

$$\begin{aligned} K(x_0,\dots ,x_d|z)=\sum _{k=0}^d {d\atopwithdelims ()k} x_k z^k. \end{aligned}$$

Theorem 3.11

(Wagner [26]). If \(K(x_0,\dots ,x_d|z)\) has no complex zero in the open disk of radius \(\kappa \) around 0, then \(F_G(x_0,\dots ,x_d |z)\) has no complex zero in the open disk of radius \(\kappa \) around 0 for any d-regular graph G.

If \(K(x_0,\dots ,x_d|z)\) has no complex zero in the complement of a closed disk of radius \(\kappa \) around 0, then \(F_G(x_0,\dots ,x_d |z)\) has no complex zero in the complement of a closed disk of radius \(\kappa \) around 0 for any d-regular graph G.

In particular, if \(K(x_0,\dots ,x_d|z)\) has only zeros on the circle of radius \(\kappa \) around 0, then \(F_G(x_0,\dots ,x_d|z)\) has complex zeros only on the circle of radius \(\kappa \) for any d-regular graph G.

3.3.2 Key polynomials for rank 2 matrices

Suppose that we have a rank 2 matrix \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\in \mathbb {R}^{2\times 2}\) and a \(\underline{\mu }\in \mathbb {R}^{2}\). Then we know that \(F_G(\underline{v}(t))=Z_G(N,\underline{\mu })\), where \(\underline{v}(t)=(r_0(t),\dots ,r_d(t))\) for any \(t\in [0,2\pi )\).

Lemma 3.12

Let \(\underline{a},\underline{b},\underline{\mu }\in \mathbb {R}^{r}\). For \(k=1,\dots ,r\) let

$$\begin{aligned} a_k(t)=a_k\cos (t)+b_k\sin (t)\ \ \ \text {and}\ \ \ b_k(t)=b_k\cos (t)-a_k\sin (t), \end{aligned}$$

and for \(j=0,\dots ,d\) let \(r_j(t)=\sum _{k=1}^r\mu _ka_k(t)^{d-j}b_k(t)^j\). Finally, let \(\underline{v}(t)=(r_0(t),\dots ,r_d(t))\) and

$$\begin{aligned} K(\underline{v}(t) | z)=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(t)z^j. \end{aligned}$$

Then

$$\begin{aligned} K(\underline{v}(t) | z)= \sum _{k=1}^r \mu _k(b_k(t)z+a_k(t))^d \end{aligned}$$

Proof

By definition we have

$$\begin{aligned} K(\underline{v}(t) | z)&=\sum _{j=0}^d \left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(t)z^j \\&=\sum _{j=0}^d \left( {\begin{array}{c}d\\ j\end{array}}\right) \left( \sum _{k=1}^{r} \mu _ka_k(t)^{d-j}b_k(t)^j\right) z^j\\&=\sum _{k=1}^{r} \sum _{j=0}^d \left( {\begin{array}{c}d\\ j\end{array}}\right) \mu _ka_k(t)^{d-j}b_k(t)^jz^j\\&=\sum _{k=1}^{r} \mu _k\left( a_k(t)+b_k(t)z\right) ^d. \end{aligned}$$

\(\square \)

Lemma 3.13

Let \(a_1,a_2,b_1,b_2\in \mathbb {R}\) and \(\mu _1,\mu _2> 0\). Then all the complex zeros of \(K(\underline{v}(t)|z)\) lie on a circle or on a line.

Moreover, if \(t_1\) satisfies

$$\begin{aligned} \frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}, \end{aligned}$$

then the circle has center at 0. Furthermore, the radius of this circle is

$$\begin{aligned} R_c=\left( \frac{\mu _2}{\mu _1}\right) ^{1/d}\left| \frac{a_2(t_1)}{b_1(t_1)}\right| = \left( \frac{\mu _2}{\mu _1}\right) ^{-1/d}\left| \frac{a_1(t_1)}{b_2(t_1)}\right| =\left| \frac{a_1(t_1)a_2(t_1)}{b_1(t_1)b_2(t_1)}\right| ^{1/2}. \end{aligned}$$

Proof

From Lemma 3.12 we have that

$$\begin{aligned} K(\underline{v}(t)|z)=\mu _1(a_1(t)+b_1(t) z)^d+\mu _2(a_2(t)+b_2(t) z)^d. \end{aligned}$$

Let us assume that \(K(\underline{v}(t)|\zeta )=0\).

If \(a_1(t)+b_1(t)\zeta =a_2(t)+b_2(t)\zeta =0\), then \(\zeta \) is the only zero of \(K(\underline{v}(t)|z)\) with multiplicity d, thus all the complex zeros are on a circle of radius \(|\zeta |=\left| \frac{a_1(t)a_2(t)}{b_1(t)b_2(t)}\right| ^{1/2}\) with center at 0.

If \(a_1(t)+b_1(t)\zeta \) or \(a_2(t)+b_2(t)\zeta \) is not 0, then by symmetry we can assume that \(a_2(t)+b_2(t)\zeta \ne 0\), and we get that

$$\begin{aligned} \mu _1(a_1(t)+b_1(t) \zeta )^d+\mu _2(a_2(t)+b_2(t) \zeta )^d&=0\\ \left( \frac{a_1(t)+b_1(t)\zeta }{a_2(t)+b_2(t)\zeta }\right) ^d&=-\frac{\mu _2}{\mu _1}\\ M_t(\zeta )^d&=-\frac{\mu _2}{\mu _1}, \end{aligned}$$

where \(M_t(z)=\frac{a_1(t)+b_1(t)z}{a_2(t)+b_2(t)z}\) is a Möbius transformation with real coefficients. Let us introduce the notation \(T=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). Thus we obtain that for any zero \(\zeta \) of \(K(\underline{v}(t)|z)\) we have

$$\begin{aligned} |M_t(\zeta )|=\sqrt{T}. \end{aligned}$$

Since \(M_t(z)\) is a Möbius transformation, \(M_t^{(-1)}(z)\) maps circles into circles and lines, i.e. \(\zeta \in M_{t}^{(-1)}(S_{\sqrt{T}} )\), where \(S_c\) is the circle of radius c around 0.

In order to prove the second part of the statement we have to investigate when the circle \(M_{t_1}^{(-1)}(S_{\sqrt{T}})\) has its center at 0. Since \(M_t(z)\) is a Möbius transformation with real coefficients, \(M_t^{(-1)}(z)\) is also a Möbius transformation with real coefficients. This means that the image of a circle that is perpendicular to the real line is also perpendicular to the real line. We claim that \(M_{t_1}^{(-1)}(S_{\sqrt{T}})\) is not a line. To see this, it is enough to show that \(M^{(-1)}_{t_1}(\pm \sqrt{T})\) is not \(\infty \), or equivalently that \(M_{t_1}(\infty )\ne \pm \sqrt{T}\). If this were the case, then \(M_{t_1}(\infty )=\frac{b_1(t_1)}{b_2(t_1)}=\pm \sqrt{T}\) would imply that \(a_1(t_1)+b_1(t_1)z\) and \(a_2(t_1)+b_2(t_1)z\) have a common zero, which leads to a contradiction.

Thus the center of \(M^{(-1)}_{t_1}(S_{\sqrt{T}} )\) is at

$$\begin{aligned} \frac{1}{2}\left( M^{(-1)}_{t_1}\left( \sqrt{T}\right) +M^{(-1)}_{t_1}\left( -\sqrt{T}\right) \right) . \end{aligned}$$

This is 0 if and only if

$$\begin{aligned} M^{(-1)}_{t_1}\left( \sqrt{T}\right)&=-M^{(-1)}_{t_1}\left( -\sqrt{T}\right) \\ \frac{a_2(t_1) \sqrt{T}-a_1(t_1)}{-b_2(t_1)\sqrt{T} +b_1(t_1)}&= \frac{a_2(t_1) \sqrt{T}+a_1(t_1)}{b_2(t_1)\sqrt{T} +b_1(t_1)}\\ a_2(t_1)b_2(t_1)T&= a_1(t_1)b_1(t_1) \end{aligned}$$

This is equivalent to

$$\begin{aligned} T= \frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}. \end{aligned}$$

To find the corresponding radius we have to calculate \(\left| M_{t_1}^{(-1)}(\sqrt{T})\right| \).

$$\begin{aligned} M_{t_1}^{(-1)}(\sqrt{T})&=\frac{a_2(t_1) \sqrt{T}-a_1(t_1)}{-b_2(t_1)\sqrt{T} +b_1(t_1)}\\&=\frac{a_2(t_1)}{b_1(t_1)}\sqrt{T}\left( \frac{a_2(t_1)b_1(t_1)\sqrt{T}-a_1(t_1)b_1(t_1)}{-a_2(t_1)b_2(t_1)T+b_1(t_1)a_2(t_1)\sqrt{T}}\right) \\&=\frac{a_2(t_1)}{b_1(t_1)}\sqrt{T} \end{aligned}$$

This implies that \(R_c=\sqrt{T}\left| \frac{a_2(t_1)}{b_1(t_1)}\right| \). Thus by equation \(T= \frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}\) we also have \(R_c=T^{-1/2}\left| \frac{a_1(t_1)}{b_2(t_1)}\right| \), and by multiplying the two equations we get that \(R_c^2=\left| \frac{a_1(t_1)a_2(t_1)}{b_1(t_1)b_2(t_1)}\right| \). \(\square \)

Lemma 3.14

Let \(a_1,a_2,b_1,b_2,\mu _1,\mu _2\in \mathbb {R}\) be such that \(a_1,a_2,b_1,\mu _1,\mu _2>0\), \(b_2<0\) and \(a_1a_2+b_1b_2>0\). Then there is a unique \(t_1\in \left[ 0,\frac{\pi }{2}\right] \) such that \(\frac{b_1}{a_1}<\tan (t_1)<\frac{a_2}{-b_2}\) and

$$\begin{aligned} \frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}. \end{aligned}$$

For such a \(t_1\) we have \(a_1(t_1),a_2(t_1)>0\) and \(b_1(t_1),b_2(t_1)<0\) implying that \(\frac{a_1(t_1)a_2(t_1)}{b_1(t_1)b_2(t_1)}>0\).

Proof

Note that for \(t\in \left[ 0,\frac{\pi }{2}\right] \) the function \(\frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}\) is positive only if \(a_1(t),a_2(t)>0\) and \(b_1(t),b_2(t)<0\), that is, if \(\frac{b_1}{a_1}<\tan (t)<\frac{a_2}{-b_2}\). When \(t\rightarrow \arctan \left( \frac{b_1}{a_1}\right) \), then \(b_1(t)\rightarrow 0\), and so \(\frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}\rightarrow 0\). If \(t\rightarrow \arctan \left( \frac{a_2}{-b_2}\right) \), then \(a_2(t)\rightarrow 0\), and so \(\frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}\rightarrow \infty \). Since

$$\begin{aligned} \frac{\partial }{\partial t}\left( \frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}\right) =\frac{(a_1a_2+b_1b_2)(a_2b_1-a_1b_2)}{a_2(t)^2b_2(t)^2}>0 \end{aligned}$$

the function is strictly monotone increasing, hence there is a unique \(t_1\) satisfying \(\frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). \(\square \)

Theorem 3.15

Let N be a \(2\times 2\) positive definite matrix with positive entries and let \(\underline{\mu }\in \mathbb {R}^2_{>0}\). Then there exists a \(\underline{v}_c\in \mathbb {R}^{d+1}\) and an \(R_c(N,\underline{\mu })\in \mathbb {R}_{>0}\) such that for any d-regular graph G we have \(Z_G(N,\underline{\mu })=F_G(\underline{v}_c)\) and all complex zeros of \(F_G(\underline{v}_c|z)\) lie on a circle around 0 of radius \(R_c(N,\underline{\mu })\). Moreover, \(R=R_c(N,\mu )\) is a positive real solution of

$$\begin{aligned} (N_{11}N_{22} - N_{12}^2) R^4 + ( -N_{22}^2T + 2N_{12}^2 - N_{11}^2T^{-1} ) R^2 + (N_{11}N_{22}- N_{12}^2)=0, \end{aligned}$$

where \(T=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\).

Proof

The first part of the claim follows by combining Lemmas 3.2, 3.14 and 3.13 with Theorem 3.11. Indeed, by Lemma 3.2 we know that there exists a decomposition \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) such that \(a_1,a_2,b_1>0\) and \(b_2<0\). Then Lemma 3.14 implies that there exists a \(t_1\) such that \(\frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). Then Lemma 3.13 shows that the zeros of \(K(\underline{v}(t_1)|z)\) lie on a circle centered at 0. Then Theorem 3.11 implies that all complex zeros of \(F_G(\underline{v}(t_1)|z)\) lie on a circle around 0 for any d-regular graph G. Thus \(\underline{v}_c=\underline{v}(t_1)\) satisfies the conditions of the theorem.

To prove the statement concerning the radius of the circle note that by Lemma 3.9 and 3.13 we have \(N_{11}=a_1(t_1)^2+b_1(t_1)^2\), \(N_{22}=a_2(t_1)^2+b_2(t_1)^2\), \(N_{12}=a_1(t_1)a_2(t_1)+b_1(t_1)b_2(t_1)\), \(T=\frac{a_1(t_1)b_1(t_1)}{a_2(t_1)b_2(t_1)}\) and \(R^2=\frac{a_1(t_1)a_2(t_1)}{b_1(t_1)b_2(t_1)}\). Let us introduce the notations \(\overline{a}_1=a_1(t_1)\), \(\overline{a}_2=a_2(t_1)\), \(\overline{b}_1=b_1(t_1)\) and \(\overline{b}_2=b_2(t_1)\). Then we get that

$$\begin{aligned}&(N_{11}N_{22} - N_{12}^2) R^4 + ( -N_{22}^2T + 2N_{12}^2 - N_{11}^2T^{-1} ) R^2 + (N_{11}N_{22}- N_{12}^2)\\&\quad =((\overline{a}_1^2+\overline{b}_1^2)(\overline{a}_2^2+\overline{b}_2^2)-(\overline{a}_1\overline{a}_2+\overline{b}_1\overline{b}_2)^2)\left( \frac{\overline{a}_1^2\overline{a}_2^2}{\overline{b}_1^2\overline{b}_2^2}+1\right) \\&\qquad -(\overline{a}_2^2+\overline{b}_2^2)^2\frac{\overline{a}_1\overline{b}_1}{\overline{a}_2\overline{b}_2}\cdot \frac{\overline{a}_1\overline{a}_2}{\overline{b}_1\overline{b}_2} +2(\overline{a}_1\overline{a}_2+\overline{b}_1\overline{b}_2)^2\cdot \frac{\overline{a}_1\overline{a}_2}{\overline{b}_1\overline{b}_2} -(\overline{a}_1^2+\overline{b}_1^2)^2\frac{\overline{a}_2\overline{b}_2}{\overline{a}_1\overline{b}_1}\cdot \frac{\overline{a}_1\overline{a}_2}{\overline{b}_1\overline{b}_2}\\&\quad =\frac{(\overline{a}_1^2\overline{b}_2^2-2\overline{a}_1\overline{a}_2\overline{b}_1\overline{b}_2+\overline{a}_2^2\overline{b}_1^2)(\overline{a}_1^2\overline{a}_2^2+\overline{b}_1^2\overline{b}_2^2)}{\overline{b}_1^2\overline{b}_2^2} -\frac{(\overline{a}_2^4+2\overline{a}_2^2\overline{b}_2^2+\overline{b}_2^4)\overline{a}_1^2\overline{b}_2^2}{\overline{b}_1^2\overline{b}_2^2}\\&\qquad +\frac{2(\overline{a}_1^2\overline{a}_2^2+2\overline{a}_1\overline{a}_2\overline{b}_1\overline{b}_2+\overline{b}_1^2\overline{b}_2^2)\overline{a}_1\overline{a}_2\overline{b}_1\overline{b}_2}{\overline{b}_1^2\overline{b}_2^2} -\frac{(\overline{a}_1^4+2\overline{a}_1^2\overline{b}_1^2+\overline{b}_1^4)\overline{a}_2^2\overline{b}_2^2}{\overline{b}_1^2\overline{b}_2^2} \end{aligned}$$

Now one can see that everything cancels, and this is indeed 0. \(\square \)
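
These statements are easy to test numerically. The sketch below (numpy is an assumed dependency; the matrix N, the vector \(\underline{\mu }\) and the degree d are arbitrary test values) takes the decomposition of Lemma 3.2, locates \(t_1\) by bisection on \(\frac{a_1(t)b_1(t)}{a_2(t)b_2(t)}=T\), computes the roots of the key polynomial \(K(\underline{v}(t_1)|z)\) and checks that their common absolute value appears among the positive roots R of the quartic in Theorem 3.15.

```python
import math
import numpy as np

# A positive definite test matrix with positive entries, a positive mu, and the degree d.
N = np.array([[1.7, 0.5], [0.5, 1.45]])
mu = np.array([1.0, 2.0])
d = 3

# Decomposition N = a a^T + b b^T as in Lemma 3.2 (i): a = sqrt(l1) v1 > 0, b_1 > 0 > b_2.
lam, vec = np.linalg.eigh(N)        # eigenvalues in increasing order
a = math.sqrt(lam[1]) * vec[:, 1]
b = math.sqrt(lam[0]) * vec[:, 0]
if a[0] < 0: a = -a                 # make the Perron eigenvector positive
if b[0] < 0: b = -b                 # make b_1 > 0 (then b_2 < 0)

def rot(t):
    at = a * math.cos(t) + b * math.sin(t)
    bt = b * math.cos(t) - a * math.sin(t)
    return at, bt

T = (mu[1] / mu[0]) ** (2 / d)

def ratio(t):
    at, bt = rot(t)
    return at[0] * bt[0] / (at[1] * bt[1]) - T

# Bisection for t_1 on the interval given by Lemma 3.14 (the ratio increases from 0 to infinity there).
lo, hi = math.atan(b[0] / a[0]) + 1e-9, math.atan(a[1] / (-b[1])) - 1e-9
for _ in range(200):
    mid = (lo + hi) / 2
    if ratio(mid) < 0:
        lo = mid
    else:
        hi = mid
t1 = (lo + hi) / 2

at, bt = rot(t1)
# Key polynomial K(v(t_1)|z) = mu_1 (a_1(t_1) + b_1(t_1) z)^d + mu_2 (a_2(t_1) + b_2(t_1) z)^d.
coeffs = [mu[0] * math.comb(d, j) * at[0] ** (d - j) * bt[0] ** j
          + mu[1] * math.comb(d, j) * at[1] ** (d - j) * bt[1] ** j for j in range(d, -1, -1)]
roots = np.roots(coeffs)
print(sorted(abs(z) for z in roots))  # all absolute values coincide

# Positive roots R of the quartic of Theorem 3.15 (a quadratic in R^2).
A = N[0, 0] * N[1, 1] - N[0, 1] ** 2
B = -N[1, 1] ** 2 * T + 2 * N[0, 1] ** 2 - N[0, 0] ** 2 / T
r2 = np.roots([A, B, A])
print([math.sqrt(x.real) for x in r2 if abs(x.imag) < 1e-9 and x.real > 0])  # contains the value above
```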

3.4 Random regular graphs and Bethe approximation

In this section we recall some results of Dembo et al. [13] on the Bethe approximation. We introduce a quantity \(\Phi _d(N,\underline{\mu })\) for which it is true that if G is a random d-regular graph on n vertices, then \(\mathbb {E}Z_G(N,\underline{\mu })=n^{O(1)}\Phi _{d}(N,\underline{\mu })^n\). As a consequence of a theorem of Ruozzi we will also get that \(Z_G(N,\underline{\mu })\ge \Phi _{d}(N,\underline{\mu })^{v(G)}\) for a d-regular graph G if N is a \(2\times 2\) positive definite matrix.

In general, let \(N\in \mathbb {R}^{r\times r}_{>0}\) be a symmetric matrix and \(\underline{\mu }\in \mathbb {R}^r_{>0}\). Let \(B_{N,\underline{\mu }}\) be a symmetric distribution on \([r]^2\). Let \(b_{N,\underline{\mu }}\) be the marginal of \(B_{N,\underline{\mu }}\) to its first coordinate. Let us define

$$\begin{aligned} \mathbb {F}_{\hom }(B_{N,\underline{\mu }}):=\frac{d}{2}H(B_{N,\underline{\mu }})-(d-1)H(b_{N,\underline{\mu }})+\frac{d}{2}[\ln N]_{B_{N,\underline{\mu }}}+[\ln \mu ]_{b_{N,\underline{\mu }}}, \end{aligned}$$

where for a probability distribution \(P=(p_1,\dots ,p_n)\) and a vector \(f=(f_1,\dots ,f_n)\) we have

$$\begin{aligned} H(P)=\sum _{i=1}^np_i\ln \frac{1}{p_i}\ \ \ \text {and}\ \ \ [f]_P=\sum _{i=1}^np_if_i \end{aligned}$$

with the usual convention \(0\cdot \ln \frac{1}{0}=0\). The subscript \(\hom \) in \(\mathbb {F}_{\hom }\) simply stands for homomorphism. Let

$$\begin{aligned} \Phi _d(N,\underline{\mu })=\max _{B_{N,\underline{\mu }}}\exp \left( \mathbb {F}_{\hom }(B_{N,\underline{\mu }})\right) . \end{aligned}$$

The quantity \(\Phi _d(N,\underline{\mu })\) also has a description through the belief propagation equation, see Proposition 14.6 of [20] or Section 1.2 of [13]. Let \(h\in \mathbb {R}^r\) be a probability distribution. The belief propagation equation or Bethe recursion is

$$\begin{aligned} \textrm{BP}(h)_{\sigma }:=\frac{1}{z_h}\mu _{\sigma }\left( \sum _{\sigma '}N_{\sigma ,\sigma '}h_{\sigma '}\right) ^{d-1} \end{aligned}$$

for all \(\sigma \in [r]\), where \(z_h\) is the normalizing constant ensuring that \(\textrm{BP}(h)\) is a probability distribution too. Let \(\mathcal {H}^*\) be the set of probability distributions for which \(\textrm{BP}(h)=h\). The Bethe functional is defined as

$$\begin{aligned} \widetilde{\Phi }_{N,\underline{\mu },d}(h)=\ln \left( \sum _{\sigma }\mu _{\sigma }\left( \sum _{\sigma '} N_{\sigma ,\sigma '}h_{\sigma '}\right) ^{d}\right) -\frac{d}{2}\ln \left( \sum _{\sigma ,\sigma '}N_{\sigma ,\sigma '}h_{\sigma }h_{\sigma '}\right) . \end{aligned}$$

Then

$$\begin{aligned} \Phi _d(N,\underline{\mu })=\sup _{h\in \mathcal {H}^*}\exp (\widetilde{\Phi }_{N,\underline{\mu },d}(h)). \end{aligned}$$
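
In practice \(\Phi _d(N,\underline{\mu })\) can be estimated by iterating the Bethe recursion from several starting points and evaluating \(\widetilde{\Phi }_{N,\underline{\mu },d}\) at the fixed points reached; plain fixed-point iteration is not guaranteed to find all elements of \(\mathcal {H}^*\), so this only gives a lower estimate of the supremum. A sketch for the matrix \(M'_2\) and \(\underline{\nu }_2\) coming from the random cluster model (test values of d, q, w):

```python
import math
import random

def bethe_phi(N, mu, d, h):
    """The Bethe functional evaluated at a probability vector h."""
    r = len(mu)
    s1 = sum(mu[i] * sum(N[i][j] * h[j] for j in range(r)) ** d for i in range(r))
    s2 = sum(N[i][j] * h[i] * h[j] for i in range(r) for j in range(r))
    return math.log(s1) - (d / 2) * math.log(s2)

def bp_fixed_point(N, mu, d, h, iters=2000):
    """Iterate the Bethe recursion BP(h) starting from h (convergence is not guaranteed in general)."""
    r = len(mu)
    for _ in range(iters):
        new = [mu[i] * sum(N[i][j] * h[j] for j in range(r)) ** (d - 1) for i in range(r)]
        z = sum(new)
        h = [x / z for x in new]
    return h

# Random cluster data: N = M'_2, mu = nu_2.
d, q, w = 3, 4.0, 5.0
N = [[1 + w, 1.0], [1.0, 1 + w / (q - 1)]]
mu = [1.0, q - 1]
best = -float("inf")
random.seed(0)
for _ in range(20):
    p = random.random()
    h = bp_fixed_point(N, mu, d, [p, 1 - p])
    best = max(best, bethe_phi(N, mu, d, h))
print(math.exp(best))  # an estimate of Phi_d(M'_2, nu_2), which should match Phi_{d,q,w} of Theorem 1.1
```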

Dembo et al. [13] showed that the quantity \(\Phi _d(N,\underline{\mu })\) is directly related to the expected value of \(Z_G(N,\underline{\mu })\) for random d-regular graphs.

Theorem 3.16

(Dembo et al. [13]). Let G be a random d-regular graph on n vertices. Then

$$\begin{aligned} \mathbb {E} Z_G(N,\underline{\mu })=n^{O(1)}\Phi _{d}(N,\underline{\mu })^{n}. \end{aligned}$$

3.4.1 A theorem of Ruozzi.

In this section we show that if N is a \(2\times 2\) positive definite matrix with positive entries and \(\underline{\mu }\) is a positive vector, then for any d-regular graph G we have \(Z_G(N,\underline{\mu })\ge \Phi _{d}(N,\underline{\mu })^{v(G)}\). First we recall the setting of factor graphs.

Definition 3.17

A factor graph \(\mathcal {G}=(F,V,E,\mathcal {X},(g_a)_{a\in F})\) is a bipartite graph equipped with a set of functions. Its vertex set is \(F\cup V\), where F is the set of function nodes, and V is the set of variable nodes. The edge set of \(\mathcal {G}\) will be denoted by \(E(\mathcal {G})\). The neighbors of a factor node a or variable node v will be denoted by \(\partial a\) or \(\partial v\), respectively. For each variable node v we associate a variable \(x_v\) taking its values from the alphabet \(\mathcal {X}\). For each a there is an associated function \(g_a: \mathcal {X}^{\partial a}\rightarrow \mathbb {R}_{\ge 0}\). The partition function of the factor graph \(\mathcal {G}\) is

$$\begin{aligned} Z(\mathcal {G})=\sum _{\underline{x}\in \mathcal {X}^V}\prod _{a\in F}g_a(\underline{x}_{\partial a}), \end{aligned}$$

where \(\underline{x}_{\partial a}\) is the restriction of \(\underline{x}\) to the set \(\partial a\).

When \(\mathcal {X}=\{0,1\}\) we speak about a binary factor graph.

Let us consider an example.

Example 3.18

Suppose that \(G=(V,E)\) is an (ordinary) graph. We can associate a factor graph \(\mathcal {G}\) with it as follows. For each \(v\in V\) we introduce a variable node v and a function node \(v'\), and for each edge \(e=(u,v)\) we introduce a function node e. In \(\mathcal {G}\) let us connect v with \(v'\) and \(e=(u,v)\) with u and v. Set \(\mathcal {X}=[r]\). Let N be an \(r\times r\) matrix and \(\underline{\mu }\in \mathbb {R}^r\). For each function node \(v'\) we introduce the function \(g_{v'}(x)=\mu _x\) and for each edge e we introduce the function \(g_e(x,y)=N_{x,y}\). Then

$$\begin{aligned} Z(\mathcal {G})=Z_G(N,\underline{\mu }). \end{aligned}$$

The middle picture of Fig. 3 depicts the factor graph \(\mathcal {G}\) for the diamond graph G.

Example 3.19

Let \(G=(V,E)\) be a graph. Recall that

$$\begin{aligned} F_G(x_0,\dots ,x_d)=\sum _{A\subseteq E}\left( \prod _{v\in V} x_{d_A(v)}\right) . \end{aligned}$$

For \((a_0,\dots ,a_d)\in \mathbb {R}^{d+1}\) let us consider the following factor graph \(\mathcal {G}'=(F',V',E',\mathcal {X}',(g'_a)_{a\in F}).\) We subdivide each edge of E with one vertex. In the resulting bipartite graph one side corresponds to \(F'\), the other side corresponds to \(V'\). So with a slight abuse of notation we have \(F'=V\) and \(V'=E\). Let \(\mathcal {X}'=\{0,1\}\). For each \(v\in V\) let us introduce the function

$$\begin{aligned} g'_v(x_{e_1},\dots ,x_{e_{d_v}})=a_{|x|}, \end{aligned}$$

where \(|x|=x_{e_1}+\dots +x_{e_{d_v}}\), and \(e_1,\dots ,e_{d_v}\) are the edges incident to v. Then

$$\begin{aligned} Z(\mathcal {G}')=F_G(a_0,\dots ,a_d). \end{aligned}$$

As we can see \(\mathcal {G}'\) is in some sense the dual of \(\mathcal {G}\). The picture on the right hand side of Fig. 3 depicts the factor graph \(\mathcal {G}'\) for the diamond graph G.

Fig. 3

A graph with two graphical models. Square shape nodes are function vertices, circle shape nodes are variable vertices. We can see the factor graph of Example 3.18 in the middle and the factor graph of Example 3.19 on the right. In this example the original graph is not regular, nevertheless we can see that in \(\mathcal {G}'\) the variable nodes correspond to the edges of the original graph while the function nodes correspond to the vertices of the original graph

Next we need the concept of the Bethe approximation for factor graphs. First we need to introduce the pseudo-marginal polytope.

Definition 3.20

For each variable node v let us introduce a probability distribution \(b_v\) on \(\mathcal {X}\), and for each function node a let us also introduce a probability distribution \(b_a\) on \(\mathcal {X}^{\partial a}\):

$$\begin{aligned} \sum _{x\in \mathcal {X}}b_v(x)=1\ \ \forall v\in V,\ \ b_v(x)\ge 0\ \ \ \forall x\in \mathcal {X}, \end{aligned}$$

and

$$\begin{aligned} \sum _{\underline{x}\in \mathcal {X}^{\partial {a}}}b_a(\underline{x})=1\ \ \forall a\in F,\ \ b_a(\underline{x})\ge 0\ \ \ \forall \underline{x}\in \mathcal {X}^{\partial a}. \end{aligned}$$

Furthermore, \(b_v\) and \(b_a\) have to be consistent in the following sense: for all \(c\in \mathcal {X},\ a\in F, v\in \partial a\) we have

$$\begin{aligned} \sum _{\underline{x}\in \mathcal {X}^{\partial a\setminus v}}b_a(\underline{x},c)=b_v(c). \end{aligned}$$

We will call a \(\underline{b}=((b_v)_{v\in V},(b_a)_{a\in F})\) a locally consistent set of marginals or simply pseudo-marginal. The set of such \(\underline{b}\) will be denoted by \(\textrm{Mar}(\mathcal {G})\).

Definition 3.21

The Bethe partition function \(Z_B(\mathcal {G})\) is defined as follows. Let \(\mathbb {F}\) be the following function evaluated on a \(\underline{b} \in \textrm{Mar}(\mathcal {G})\):

$$\begin{aligned} \mathbb {F}(\underline{b})=\sum _{a\in F}\sum _{\underline{x}\in \mathcal {X}^{\partial {a}}}b_a(\underline{x}) \ln \frac{g_a(\underline{x})}{b_a(\underline{x})} -\sum _{v \in V}(1-|\partial v|)\sum _{x \in \mathcal {X}}b_v(x)\ln b_v(x). \end{aligned}$$

The notation \(\mathbb {F}\) is consistent with our previous notation \(\mathbb {F}_{\hom }\) as it will be explained later. Finally, let

$$\begin{aligned} H_B(\mathcal {G})=\sup _{\underline{b}\in \textrm{Mar}(\mathcal {G})} \mathbb {F}(\underline{b}), \end{aligned}$$

and

$$\begin{aligned} Z_B(\mathcal {G})=\exp (H_B(\mathcal {G})). \end{aligned}$$

Here \(H_B(\mathcal {G})\) is the Bethe free entropy, and \(Z_B(\mathcal {G})\) is the Bethe partition function. We note that if \(g_a(\underline{x})=0\), then we require \(b_a(\underline{x})=0\) and use the convention \(0\cdot \ln \frac{0}{0}=0\).

Example 3.22

By continuing Examples 3.18 and 3.19 we can consider the Bethe partition functions of \(\mathcal {G}\) and \(\mathcal {G}'\). For \(\mathcal {G}\) we will denote it by \(Z_G^B(N,\underline{\mu })\).

Recall that a function g is log-supermodular if for all \(\underline{x},\underline{y}\in \{0,1\}^k\) we have

$$\begin{aligned} g(\underline{x})g(\underline{y})\le g(\underline{x}\wedge \underline{y})g(\underline{x}\vee \underline{y}), \end{aligned}$$

where \(\underline{x}\wedge \underline{y},\underline{x}\vee \underline{y}\in \{0,1\}^k\) such that \((\underline{x}\wedge \underline{y})_i=\min (x_i,y_i)\) and \((\underline{x}\vee \underline{y})_i=\max (x_i,y_i)\) for \(i\in [k]\).
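
Log-supermodularity on \(\{0,1\}^k\) can be checked by brute force over all pairs \(\underline{x},\underline{y}\); the small sketch below (the test matrix is an illustrative choice) confirms in particular that the edge functions \(g_e(x,y)=N_{x,y}\) of Example 3.18 are log-supermodular precisely when \(N_{11}N_{22}\ge N_{12}N_{21}\), which is the observation used in the proof of Theorem 3.25 below.

```python
import itertools
import numpy as np

def is_log_supermodular(g, k, tol=1e-12):
    """Check g(x) g(y) <= g(x wedge y) g(x vee y) for all x, y in {0,1}^k."""
    for x in itertools.product((0, 1), repeat=k):
        for y in itertools.product((0, 1), repeat=k):
            lo = tuple(min(a, b) for a, b in zip(x, y))   # coordinatewise minimum
            hi = tuple(max(a, b) for a, b in zip(x, y))   # coordinatewise maximum
            if g(x) * g(y) > g(lo) * g(hi) + tol:
                return False
    return True

N = np.array([[2.0, 1.0], [1.0, 1.5]])     # positive definite with positive entries
g_edge = lambda x: N[x[0], x[1]]           # g_e(x, y) = N_{x,y} as in Example 3.18
print(is_log_supermodular(g_edge, 2))      # True, since N11 * N22 >= N12 * N21
```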

Theorem 3.23

(Ruozzi [21]). Let \(\mathcal {G}=(F,V,E,\mathcal {X},(g_a)_{a\in F})\) be a factor graph with \(\mathcal {X}=\{0,1\}\) such that for all \(a\in F\) the functions \(g_a\) are log-supermodular. Then \(Z(\mathcal {G})\ge Z_B(\mathcal {G})\).

Lemma 3.24

For an \(r \times r\) matrix N and \(\underline{\mu }\in \mathbb {R}^r_{\ge 0}\), and for a d-regular graph G we have \(Z_G^B(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}\).

Proof

By using the same probability distribution \(B_{N,\underline{\mu }}\) everywhere in the definition of \(Z_G^B(N,\underline{\mu })\) the consistency of marginals is immediately satisfied. Then the function \(\mathbb {F}\) on this pseudo-marginal simplifies to \(\mathbb {F}_{\hom }(B_{N,\underline{\mu }})\), and we get that \(Z_G^B(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}\). \(\square \)

Theorem 3.25

For a \(2\times 2\) positive definite matrix N with positive entries and \(\underline{\mu }\in \mathbb {R}^2_{\ge 0}\), and a d-regular graph G we have \(Z_G(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}\).

Proof

Note that the log-supermodularity of \(g_a\) in the case of the factor graph \(\mathcal {G}\) in Example 3.18 simply means that \(N_{11}N_{22}\ge N_{12}N_{21}\) which is satisfied as N is positive definite. Hence by combining Ruozzi’s theorem with the previous lemma we get that

$$\begin{aligned} Z_G(N,\underline{\mu })\ge Z_G^B(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}. \end{aligned}$$

\(\square \)

Remark 3.26

For Theorem 3.25 we will give a new proof in Sect. 3.7 that implies a slightly stronger statement. Namely, if G is a d-regular graph such that for some g and \(\varepsilon >0\) the graph G contains at least \(\varepsilon v(G)\) cycles of length at most g, then \(Z_G(N,\underline{\mu })\ge ((1+\delta )\Phi _d(N,\underline{\mu }))^{v(G)}\) for some \(\delta =\delta (d,N,\underline{\mu },g,\varepsilon )>0\).

3.5 Convergence of \(Z_{G_n}(N,\underline{\mu })\)

In this section we prove that if N is a \(2\times 2\) positive definite matrix with positive entries and \(\underline{\mu }\) is a positive vector, then \(\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })\) is convergent for an essentially large girth sequence of d-regular graphs \((G_n)_n\). In fact, we will prove a stronger statement. Namely, we prove the convergence of the sequence \(\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })\) for any Benjamini–Schramm convergent graph sequence \((G_n)_n\) of regular graphs.

Definition 3.27

For a finite graph G, a finite connected rooted graph \(\alpha \) and a positive integer r, let \(\mathbb {P}(G,\alpha ,r)\) be the probability that the r-ball centered at a uniform random vertex of G is isomorphic to \(\alpha \).

We say that a bounded-degree graph sequence \((G_n)_n\) is Benjamini–Schramm convergent if for every finite connected rooted graph \(\alpha \) and every positive integer r the probabilities \(\mathbb {P}(G_n,\alpha ,r)\) converge.

Benjamini–Schramm convergence is also called local convergence, as it primarily captures the local structure of the graphs \((G_n)_n\).

Given a vector \(\underline{a}\in \mathbb {R}^{d+1}\) and a d-regular graph G on n vertices let \(\lambda _1(G),\dots ,\lambda _{nd}(G)\) be the zeros of the polynomial \(F_G(\underline{a}|z)\). Let us define the probability measure \(\rho _{G,\underline{a}}\) on \(\mathbb {C}\) as follows:

$$\begin{aligned} \rho _{G,\underline{a}}:=\frac{1}{nd}\sum _{k=1}^{nd}\delta _{\lambda _k(G)}, \end{aligned}$$

where \(\delta _{\lambda }\) denotes the Dirac measure concentrated at the point \(\lambda \).

Lemma 3.28

(a) For any integer \(k\ge 0\), a vector \(\underline{a}\in \mathbb {R}^{d+1}\) and a Benjamini–Schramm convergent sequence of d-regular graphs \((G_n)_n\) the sequence

$$\begin{aligned} \int z^k\ d\rho _{G_n,\underline{a}}(z) \end{aligned}$$

is convergent.

(b) Let \(\underline{v}_c\in \mathbb {R}^{d+1}\) be such that the zeros of \(F_G(\underline{v}_c|z)\) lie on a circle of radius \(R_c\) for every graph G. If \((G_n)_n\) is a Benjamini–Schramm convergent sequence of d-regular graphs, then the sequence of measures \(\rho _{G_n,\underline{v}_c}\) converges weakly.

Proof

Part (a) is a special case of a much more general theorem claiming that

$$\begin{aligned} \int z^k\ d\rho _{G,\underline{a}}(z)=\frac{1}{dv(G)}\sum _{j=1}^{dv(G)}\lambda _j(G)^k \end{aligned}$$

can be expressed as \(\frac{1}{v(G)}\sum _Hc_{H,k}\hom (H,G)\), where the sum runs over a fixed finite set of connected graphs H, combined with the fact that a sequence of bounded-degree graphs \((G_n)_n\) is Benjamini–Schramm convergent if and only if for every connected graph H the sequence \(\frac{\hom (H,G_n)}{v(G_n)}\) is convergent. For details see the paper of Csikvári and Frenkel [11].

Part (a) implies part (b) for the following reason. The weak convergence of measures \(\rho _n\) supported on a fixed compact subset of \(\mathbb {C}\) is equivalent to the convergence of \(\int z^k\overline{z}^{\ell }\,d\rho _n(z)\) for all integers \(k,\ell \ge 0\). If the \(\rho _n\) are supported on a fixed circle and are symmetric with respect to the real line, then this is equivalent to the convergence of \(\int z^m\,d\rho _n(z)\) for all positive integers m. \(\square \)

Theorem 3.29

For any Benjamini–Schramm convergent sequence of d-regular graphs \((G_n)_n\) the sequence

$$\begin{aligned} \frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu }) \end{aligned}$$

is convergent.

Proof

By Theorem 3.15 there exists a \(\underline{v}_c\in \mathbb {R}^{d+1}\) such that for any d-regular graph G we have \(Z_G(N,\underline{\mu })=F_G(\underline{v}_c)\) and all zeros of \(F_G(\underline{v}_c|z)\) lie on a circle of radius \(R_c(N,\underline{\mu })\). First suppose that \(R_c=R_c(N,\underline{\mu })\ne 1\). We have

$$\begin{aligned} \frac{1}{v(G)}\ln Z_{G}(N,\mu )&=\frac{1}{v(G)}\ln F_G(\underline{v}_c|z)\bigg |_{z=1}\\&=\frac{1}{v(G)}\ln \left( \prod _{j=1}^{dv(G)}(1-\lambda _j(G))\right) \\&=d\int \ln |z-1|d\rho _{G,\underline{v}_c}(z). \end{aligned}$$

The measures \(\rho _{G_n,\underline{v}_c}\) are supported on a circle of radius \(R_c\ne 1\), thus \(\ln |z-1|\) is a continuous function on a closed region that contains this circle and avoids an open neighborhood of \(z=1\). Since the measures \(\rho _{G_n,\underline{v}_c}\) converge weakly, we get that the sequence \(\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })\) is convergent.

Next we show that the limit exists even if \(R_c(N,\underline{\mu })=1\). Let \(\Phi _L(N,\underline{\mu })\) be defined by

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _L(N,\mu ) \end{aligned}$$

if \(R_c(N,\underline{\mu })\ne 1\). Here L in \(\Phi _L(N,\underline{\mu })\) simply stands for the word limit. We show that \(\Phi _L(N,\underline{\mu })\) is a monotone increasing continuous function of \(\mu _1\). Indeed, if \(\mu '_1<\mu _1\), then

$$\begin{aligned} Z_G(N,(\mu '_1,\mu _2))\le Z_G(N,(\mu _1,\mu _2))\le \left( \frac{\mu _1}{\mu '_1}\right) ^{v(G)}Z_G(N,(\mu '_1,\mu _2)). \end{aligned}$$

So if \(R_c(N,(\mu '_1,\mu _2))\ne 1\), then

$$\begin{aligned} \ln \Phi _L(N,(\mu '_1,\mu _2))= & {} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,(\mu '_1,\mu _2))\\\le & {} \liminf _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,(\mu _1,\mu _2))\\\le & {} \limsup _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,(\mu _1,\mu _2))\\\le & {} \ln \left( \frac{\mu _1}{\mu '_1}\right) +\ln \Phi _L(N,(\mu '_1,\mu _2)). \end{aligned}$$

Note that if \(R_c(N,(\mu _1,\mu _2))=1\), then

$$\begin{aligned} 2(N_{11}N_{22} - N_{12}^2)+ ( -N_{22}^2T + 2N_{12}^2 - N_{11}^2T^{-1} )=0, \end{aligned}$$

where \(T=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). For fixed N and \(\mu _2\) there are at most two \(\mu _1\) such that this equation is satisfied. Thus for such a \(\mu _1\) we can define

$$\begin{aligned} \Phi _L(N,(\mu _1,\mu _2))=\lim _{\mu '_1\rightarrow \mu _1}\Phi _L(N,(\mu '_1,\mu _2)), \end{aligned}$$

and we get that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,(\mu _1,\mu _2))=\ln \Phi _L(N,(\mu _1,\mu _2)). \end{aligned}$$

\(\square \)

Next we need a result about the number of short cycles in random regular graphs. There are many such results in the literature; we chose the following one.

Lemma 3.30

(McKay et al. [19]) . Let \(\{c_1,\dots ,c_t\}\) be a non-empty subset of \(\{3,\dots ,g\}\). For a random regular graph G of order n and degree d, define \(M_C(G)=(m_1,\dots ,m_t)\), where \(m_i\) is the number of cycles of length \(c_i\) in G for \(1\le i\le t\). For \(1\le i\le t\) let \(\mu _i=\frac{(d-1)^{c_i}}{2c_i}\). Let S be a set of non-negative integer t-tuples. Then as \(n\rightarrow \infty \) the probability that \(M_C(G)\in S\) is equal to

$$\begin{aligned} (1+o(1))\left( \sum _{(m_1,\dots ,m_t)\in S}\prod _{i=1}^t\frac{e^{-\mu _i}\mu _i^{m_i}}{m_i!}\right) +o(1). \end{aligned}$$
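
As a quick numerical illustration of Lemma 3.30 (restricted to triangles, i.e. \(c_1=3\)), one can compare the empirical mean number of triangles in random d-regular graphs with \(\mu _1=\frac{(d-1)^3}{6}\). The sketch below uses the third-party package networkx to sample the graphs; this dependency is an assumption of the sketch and plays no role in the paper.

```python
import networkx as nx
import numpy as np

d, n, samples = 3, 200, 200
rng = np.random.default_rng(0)
triangle_counts = []
for _ in range(samples):
    G = nx.random_regular_graph(d, n, seed=int(rng.integers(10**6)))
    A = nx.to_numpy_array(G)
    triangle_counts.append(np.trace(A @ A @ A) / 6)   # number of triangles = tr(A^3)/6

print("empirical mean number of triangles:", np.mean(triangle_counts))
print("Poisson mean (d-1)^3/6:", (d - 1) ** 3 / 6)
```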

Now we are ready to give a new proof of the fact that \(\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _{d}(N,\underline{\mu })\) both for random d-regular graphs and for essentially large girth sequences of d-regular graphs.

Theorem 3.31

(Sly and Sun [23, 24] building on Dembo and Montanari [12]). Let N be a \(2\times 2\) positive definite matrix with positive entries and let \(\underline{\mu }\in \mathbb {R}_{>0}^2\). If \((G_n)_n\) is an essentially large girth sequence of d-regular graphs, then

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _{d}(N,\underline{\mu }). \end{aligned}$$

The same statement holds true for a sequence of random d-regular graphs with probability one.

Proof

We know from Theorem 3.29 that \(\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })\) exists for an essentially large girth sequence of d-regular graphs. We only have to prove that this limit is \(\ln \Phi _{d}(N,\underline{\mu })\). To prove this it is enough to show one essentially large girth sequence of d-regular graphs \((G_n)_n\) for which \(\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _d(N,\underline{\mu })\). Let \(G_n\) be a random d-regular graph on n vertices, then by Markov’s inequality

$$\begin{aligned} \mathbb {P}\left( Z_{G_n}(N,\underline{\mu })\ge n^2\mathbb {E}Z_{G_n}(N,\underline{\mu })\right) \le \frac{1}{n^2}. \end{aligned}$$

Note that \(\mathbb {E}Z_{G_n}(N,\underline{\mu })=n^C\Phi _{d}(N,\underline{\mu })^n\) by Theorem 3.16, and for every d-regular graph G we have \(Z_{G}(N,\underline{\mu })\ge \Phi _{d}(N,\underline{\mu })^{v(G)}\) by Theorem 3.25. By the Borel–Cantelli lemma we immediately get that \(\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _d(N,\underline{\mu })\) holds true with probability one. Finally, by Lemma 3.30 we can easily find an essentially large girth sequence of d-regular graphs \((G_n)_n\) such that \(\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })=\ln \Phi _{d}(N,\underline{\mu })\). \(\square \)

3.6 Trigonometric Bethe approximation

In this subsection we define a trigonometric polynomial and prove that its maximum is exactly the Bethe approximation.

Definition 3.32

By using the notations of Definition 3.3 for \(\underline{a},\underline{b},\underline{\mu }\in \mathbb {R}^2\) let

$$\begin{aligned} \Phi _{\underline{a},\underline{b},\underline{\mu }}(t)= & {} \mu _1a_1(t)^d+\mu _2a_2(t)^d=\mu _1(a_1\cos (t)+b_1\sin (t))^d\\{} & {} +\mu _2(a_2\cos (t)+b_2\sin (t))^d. \end{aligned}$$

The following lemma is an immediate consequence of Lemma 3.4.

Lemma 3.33

If \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T=\hat{\underline{a}}\hat{\underline{a}}^T+\hat{\underline{b}}\hat{\underline{b}}^T\), then there exist an \(s\in \{-1,1\}\) and an \(\alpha \in [0,2\pi ]\) such that

$$\begin{aligned} \Phi _{\hat{\underline{a}},\hat{\underline{b}},\underline{\mu }}(t)=\Phi _{\underline{a},\underline{b},\underline{\mu }}(st+\alpha ). \end{aligned}$$

By Lemma 3.33 we can introduce the following concept.

Definition 3.34

Let N be a \(2\times 2\) positive definite matrix with positive entries, and let \(\underline{\mu }\in \mathbb {R}^2\) be a vector with positive entries. Let \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) be any representation of N, then let us define

$$\begin{aligned} \widetilde{\Phi }_d(N,\underline{\mu })=\max _{t\in [0,2\pi ]}\Phi _{\underline{a},\underline{b},\underline{\mu }}(t). \end{aligned}$$

The main theorem of this section is the following.

Theorem 3.35

Let N be a \(2\times 2\) positive definite matrix with positive entries, and let \(\underline{\mu }\in \mathbb {R}^2\) be a vector with positive entries. Then

$$\begin{aligned} \Phi _d(N,\underline{\mu })=\widetilde{\Phi }_d(N,\underline{\mu }). \end{aligned}$$

As a preparation for the proof we introduce some notations. We will also use the notions and tools from Sect. 3.4. The equation \(\textrm{BP}(h)=h\) using the substitution \(R=\frac{h_1}{h_2}\) becomes

$$\begin{aligned} R=\frac{\mu _1}{\mu _2}\left( \frac{N_{11}R+N_{12}}{N_{12}R+N_{22}}\right) ^{d-1}. \end{aligned}$$

Call \(\mathcal {R}_{N,\underline{\mu }}\) the set of non-negative solutions of this equation. By a simple calculation we have

$$\begin{aligned} \overline{\mathbb {F}}(R,N,\underline{\mu })&=\exp \left( \widetilde{\Phi }_{N,\underline{\mu },d}(h)\right) =\left( \sum _{\sigma }\mu _{\sigma }\left( \sum _{\sigma '} N_{\sigma ,\sigma '}h_{\sigma '}\right) ^{d}\right) \left( \sum _{\sigma ,\sigma '}N_{\sigma ,\sigma '}h_{\sigma }h_{\sigma '}\right) ^{\!-\frac{d}{2}}\\&=\mu _1\left[ \frac{N_{11}R+N_{12}}{\sqrt{N_{11}R^2+2N_{12}R+N_{22}}}\right] ^d+ \mu _2\left[ \frac{N_{12}R+N_{22}}{\sqrt{N_{11}R^2+2N_{12}R+N_{22}}} \right] ^d. \end{aligned}$$

We know that

$$\begin{aligned} \Phi _d(N,\underline{\mu })=\max _{R\in \mathcal {R}_{N,\underline{\mu }}}\overline{\mathbb {F}}(R,N,\underline{\mu }) \end{aligned}$$

Let us also choose a representation \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\), and let

$$\begin{aligned} R(t)=-\frac{b_2(t)}{b_1(t)}\ \ \ \text {and}\ \ \ S(t)=\frac{a_1(t)}{a_2(t)}. \end{aligned}$$

Lemma 3.36

Let \(a_1,a_2,b_1,b_2,\mu _1,\mu _2\in \mathbb {R}\), then for every \(t\in [0,2\pi ]\) such that \(b_1(t),a_2(t)\ne 0\) we have

$$\begin{aligned} S(t)=\frac{N_{11}R(t)+N_{12}}{N_{12}R(t)+N_{22}}. \end{aligned}$$

Furthermore,

$$\begin{aligned} |a_1(t)|=\frac{|N_{11}R(t)+N_{12}|}{\sqrt{N_{11}R(t)^2+2N_{12}R(t)+N_{22}}} \end{aligned}$$

and

$$\begin{aligned} |a_2(t)|=\frac{|N_{12}R(t)+N_{22}|}{\sqrt{N_{11}R(t)^2+2N_{12}R(t)+N_{22}}}, \end{aligned}$$

and if \(t_0\) maximizes \(\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)\), then

$$\begin{aligned} R(t_0)=\frac{\mu _1}{\mu _2}\left( \frac{N_{11}R(t_0)+N_{12}}{N_{12}R(t_0)+N_{22}}\right) ^{d-1}. \end{aligned}$$

Proof

We have

$$\begin{aligned}&\frac{N_{11}R(t)+N_{12}}{N_{12}R(t)+N_{22}}=\frac{N_{11}\left( -\frac{b_2(t)}{b_1(t)}\right) +N_{12}}{N_{12}\left( -\frac{b_2(t)}{b_1(t)}\right) +N_{22}}=\frac{-N_{11}b_2(t)+N_{12}b_1(t)}{-N_{12}b_2(t)+N_{22}b_1(t)}\\&\quad =\frac{(a_1^2+b_1^2)(a_2\sin (t)-b_2\cos (t))+(a_1a_2+b_1b_2)(-a_1\sin (t)+b_1\cos (t))}{(a_1a_2+b_1b_2)(a_2\sin (t)-b_2\cos (t))+(a_2^2+b_2^2)(-a_1\sin (t)+b_1\cos (t))}\\&\quad =\frac{b_1(a_2b_1-a_1b_2)\sin (t)+a_1(a_2b_1-a_1b_2)\cos (t)}{b_2(a_2b_1-a_1b_2)\sin (t)+a_2(a_2b_1-a_1b_2)\cos (t)}\\&\quad =\frac{b_1\sin (t)+a_1\cos (t)}{b_2\sin (t)+a_2\cos (t)}\\&\quad =\frac{a_1(t)}{a_2(t)}\\&\quad =S(t). \end{aligned}$$

Note that we have \((a_2b_1-a_1b_2)^2=(a_1^2+b_1^2)(a_2^2+b_2^2)-(a_1a_2+b_1b_2)^2=N_{11}N_{22}-N_{12}^2\ne 0\). Next let us prove that

$$\begin{aligned} |a_1(t)|=\frac{|N_{11}R(t)+N_{12}|}{\sqrt{N_{11}R(t)^2+2N_{12}R(t)+N_{22}}}. \end{aligned}$$

Let us multiply both sides by the denominator of the right hand side, and take the square of both sides. Note that by Lemma 3.9 and the decomposition \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) we have \(N_{11}=a_1(t)^2+b_1(t)^2\), \(N_{12}=a_1(t)a_2(t)+b_1(t)b_2(t)\) and \(N_{22}=a_2(t)^2+b_2(t)^2\) for every t. For ease of notation let

$$\begin{aligned} \textrm{LHS}=a_1(t)^2( N_{11}R(t)^2+2N_{12}R(t)+N_{22}). \end{aligned}$$

Then

$$\begin{aligned} \textrm{LHS}&=a_1(t)^2( N_{11}R(t)^2+2N_{12}R(t)+N_{22})\\&=a_1(t)^2\left( (a_1(t)^2+b_1(t)^2)\frac{b_2(t)^2}{b_1(t)^2}-2(a_1(t)a_2(t)\right. \\&\quad \left. +b_1(t)b_2(t))\frac{b_2(t)}{b_1(t)}+(a_2(t)^2+b_2(t)^2)\right) \\&=a_1(t)^4\frac{b_2(t)^2}{b_1(t)^2}-2a_1(t)^3a_2(t)\frac{b_2(t)}{b_1(t)}+a_1(t)^2a_2(t)^2\\&=\left( -a_1(t)^2\frac{b_2(t)}{b_1(t)}+a_1(t)a_2(t)\right) ^2\\&=\left( -(a_1(t)^2+b_1(t)^2)\frac{b_2(t)}{b_1(t)}+(a_1(t)a_2(t)+b_1(t)b_2(t))\right) ^2\\&=\left( N_{11}R(t)+N_{12}\right) ^2. \end{aligned}$$

The proof of the third identity follows similarly and we omit it.

If \(t_0\) maximizes \(\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)\) (in fact, we only need that \(\Phi _{\underline{a},\underline{b},\underline{\mu }}'(t_0)=0\)), then \(\mu _1a_1(t_0)^{d-1}b_1(t_0)+\mu _2a_2(t_0)^{d-1}b_2(t_0)=0\), which rearranges to

$$\begin{aligned} R(t_0)=\frac{\mu _1}{\mu _2}S(t_0)^{d-1}=\frac{\mu _1}{\mu _2}\left( \frac{N_{11}R(t_0)+N_{12}}{N_{12}R(t_0)+N_{22}}\right) ^{d-1}. \end{aligned}$$

\(\square \)

Now we are ready to prove Theorem 3.35.

Proof of Theorem 3.35

Note that \(\widetilde{\Phi }_d(N,\underline{\mu })\) does not depend on which representation \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) we choose so we can assume by part (ii) of Lemma 3.2 that \(a_1,a_2,b_1,b_2>0\). Then

$$\begin{aligned} \Phi _{\underline{a},\underline{b},\underline{\mu }}(t)=\mu _1(a_1\cos (t)+b_1\sin (t))^d+\mu _2(a_2\cos (t)+b_2\sin (t))^d \end{aligned}$$

is maximized at some \(t_0\in \left[ 0,\frac{\pi }{2}\right] \), since replacing \(\cos (t)\) and \(\sin (t)\) by their absolute values does not decrease the value. Then \(S(t_0)>0\), and by \(R(t_0)=\frac{\mu _1}{\mu _2}S(t_0)^{d-1}\) we get that \(R(t_0)>0\). Thus \(R(t_0)\in \mathcal {R}_{N,\underline{\mu }}\) and we have

$$\begin{aligned} \widetilde{\Phi }_d(N,\underline{\mu })=\Phi _{\underline{a},\underline{b},\underline{\mu }}(t_0)=\overline{\mathbb {F}}(R(t_0),N,\underline{\mu })\le \max _{R\in \mathcal {R}_{N,\underline{\mu }}}\overline{\mathbb {F}}(R,N,\underline{\mu })=\Phi _d(N,\underline{\mu }). \end{aligned}$$

On the other hand, \(\Phi _d(N,\underline{\mu })=\max _{R\in \mathcal {R}_{N,\underline{\mu }}}\overline{\mathbb {F}}(R,N,\underline{\mu })=\overline{\mathbb {F}}(R_0,N,\underline{\mu })\) for some \(R_0\). Then by Lemma 3.8 there exists a \(t'\in [0,2\pi ]\) such that \(R(t')=R_0\). Note that from

$$\begin{aligned} \frac{a_1(t')}{a_2(t')}=S(t')=\frac{N_{11}R_0+N_{12}}{N_{12}R_0+N_{22}}>0, \end{aligned}$$

we get that \(a_1(t'),a_2(t')\) have the same sign. By changing \(t'\) to \(t'+\pi \) if necessary we can ensure that they are both positive. Hence we have

$$\begin{aligned} a_1(t')=\frac{N_{11}R_0+N_{12}}{\sqrt{N_{11}R_0^2+2N_{12}R_0+N_{22}}}\ \text {and}\ \ a_2(t')=\frac{N_{12}R_0+N_{22}}{\sqrt{N_{11}R_0^2+2N_{12}R_0+N_{22}}}. \end{aligned}$$

Then

$$\begin{aligned} \Phi _d(N,\underline{\mu })=\overline{\mathbb {F}}(R_0,N,\underline{\mu })=\Phi _{\underline{a},\underline{b},\underline{\mu }}(t')\le \max _{t\in [0,2\pi ]}\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)=\widetilde{\Phi }_d(N,\underline{\mu }). \end{aligned}$$

\(\square \)
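
Theorem 3.35 is easy to test numerically. The sketch below (with illustrative values of d, N and \(\underline{\mu }\)) obtains one representation \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) from the columns of the Cholesky factor of N, computes \(\widetilde{\Phi }_d(N,\underline{\mu })\) by maximizing \(\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)\) on a fine grid, and compares it with \(\max _{R\in \mathcal {R}_{N,\underline{\mu }}}\overline{\mathbb {F}}(R,N,\underline{\mu })\), locating the elements of \(\mathcal {R}_{N,\underline{\mu }}\) by a sign-change search and bisection.

```python
import numpy as np

d = 4
N = np.array([[2.0, 1.0], [1.0, 1.5]])     # positive definite with positive entries
mu = np.array([1.0, 1.0])

L = np.linalg.cholesky(N)                  # N = L L^T, so a = L[:,0], b = L[:,1] is one representation
a, b = L[:, 0], L[:, 1]

t = np.linspace(0, 2 * np.pi, 200001)
phi_t = mu[0] * (a[0] * np.cos(t) + b[0] * np.sin(t)) ** d \
      + mu[1] * (a[1] * np.cos(t) + b[1] * np.sin(t)) ** d
phi_tilde = phi_t.max()                    # \tilde{Phi}_d(N, mu)

def fp(R):   # R = (mu1/mu2) ((N11 R + N12)/(N12 R + N22))^{d-1}, written as a zero problem
    return (mu[0] / mu[1]) * ((N[0, 0] * R + N[0, 1]) / (N[0, 1] * R + N[1, 1])) ** (d - 1) - R

def Fbar(R):
    z = np.sqrt(N[0, 0] * R ** 2 + 2 * N[0, 1] * R + N[1, 1])
    return mu[0] * ((N[0, 0] * R + N[0, 1]) / z) ** d + mu[1] * ((N[0, 1] * R + N[1, 1]) / z) ** d

grid = np.linspace(0, 50, 500001)
vals = fp(grid)
roots = []
for i in np.where(np.diff(np.sign(vals)) != 0)[0]:
    lo, hi = grid[i], grid[i + 1]
    for _ in range(60):                    # bisection refinement
        mid = 0.5 * (lo + hi)
        if fp(lo) * fp(mid) <= 0:
            hi = mid
        else:
            lo = mid
    roots.append(0.5 * (lo + hi))

phi_d = max(Fbar(R) for R in roots)
print(phi_tilde, phi_d)                    # the two numbers agree, illustrating Theorem 3.35
```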

We end this section with a lemma that we will use later.

Lemma 3.37

Let Q be a \(2\times 2\) matrix with positive entries and positive determinant, and let k be a positive integer. Then the equation

$$\begin{aligned} R=\left( \frac{Q_{11}R+Q_{12}}{Q_{21}R+Q_{22}}\right) ^{k} \end{aligned}$$

has at most 3 non-negative solutions.

Proof

Let

$$\begin{aligned} f(R)=\left( \frac{Q_{11}R+Q_{12}}{Q_{21}R+Q_{22}}\right) ^{k}-R. \end{aligned}$$

Then \(\frac{\partial ^2}{\partial R^2}f\) is given as

$$\begin{aligned} \left( \frac{Q_{11}R+Q_{12}}{Q_{21}R+Q_{22}}\right) ^{k}\frac{k(Q_{11}Q_{22}-Q_{12}Q_{21}) ((k-1)Q_{11}Q_{22}-(k+1)Q_{12}Q_{21}-2Q_{11}Q_{21}R)}{(Q_{11}R+Q_{12})^2(Q_{12}R+Q_{22})^2}. \end{aligned}$$

Let

$$\begin{aligned} R^*=\frac{(k-1)Q_{11}Q_{22}-(k+1)Q_{12}Q_{21}}{2Q_{11}Q_{21}}. \end{aligned}$$

If \(R^*\le 0\), then f is concave on \([0,\infty )\), and so it has at most two zeros there.

If \(R^*> 0\), then \(\frac{\partial ^2}{\partial R^2}f\) is positive if \(0\le R<R^*\), and negative if \(R>R^*\). If \(f(R^*)>0\), then f has at most 2 zeros on \((0,R^*)\) and at most 1 zero on \((R^*,\infty )\). If \(f(R^*)<0\), then f has at most 1 zero on \((0,R^*)\) and at most 2 zeros on \((R^*,\infty )\). Finally, if \(f(R^*)=0\), then f has at most 1 zero on \((0,R^*)\) and at most 1 zero on \((R^*,\infty )\). So in all cases the equation has at most 3 non-negative solutions. \(\square \)
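
The bound of Lemma 3.37 is easy to check numerically: clearing denominators, the equation becomes \(R(Q_{21}R+Q_{22})^{k}-(Q_{11}R+Q_{12})^{k}=0\), a polynomial of degree \(k+1\), whose non-negative real roots can be counted directly. The random test data below is purely illustrative.

```python
import numpy as np

P = np.polynomial.Polynomial
rng = np.random.default_rng(1)
for _ in range(1000):
    k = int(rng.integers(1, 8))
    Q = rng.uniform(0.1, 3.0, size=(2, 2))
    if np.linalg.det(Q) <= 0:
        continue                           # Lemma 3.37 requires a positive determinant
    # R (Q21 R + Q22)^k - (Q11 R + Q12)^k, coefficients listed in increasing degree
    p = P([0.0, 1.0]) * P([Q[1, 1], Q[1, 0]]) ** k - P([Q[0, 1], Q[0, 0]]) ** k
    roots = p.roots()
    real = roots[np.abs(roots.imag) < 1e-7].real
    nonneg = np.unique(np.round(real[real > -1e-7], 6))    # distinct non-negative solutions
    assert len(nonneg) <= 3, (Q, k, nonneg)
print("every trial had at most 3 non-negative solutions, as Lemma 3.37 claims")
```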

3.7 The vector \(\underline{v}(t_0)\)

Let \(t_0\) be the maximizer of the function \(\Phi _{\underline{a},\underline{b},\underline{\mu }}\). In this section we study the vector \(\underline{v}(t_0)=(r_0(t_0),r_1(t_0),\dots ,r_d(t_0))\). First we need a simple lemma.

Lemma 3.38

Let \(\underline{a},\underline{b},\underline{\mu }\in \mathbb {R}^r\). For \(j=1,\dots ,r\) let

$$\begin{aligned} a_j(t)=a_j\cos (t)+b_j\sin (t)\ \ \ \text {and}\ \ \ b_j(t)=-a_j\sin (t)+b_j\cos (t). \end{aligned}$$

Let \(r_j(t)=\sum _{k=1}^r\mu _ka_k(t)^{d-j}b_k(t)^{j}\), then

$$\begin{aligned} r_0(t)=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(0)\cos (t)^{d-j}\sin (t)^j. \end{aligned}$$

Proof

We have

$$\begin{aligned} r_0(t)&=\sum _{k=1}^r\mu _ka_k(t)^d\\&=\sum _{k=1}^r\mu _k(a_k\cos (t)+b_k\sin (t))^d\\&=\sum _{k=1}^r\mu _k\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) a_k^{d-j}b_k^{j}\cos (t)^{d-j}\sin (t)^j\\&=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) \cos (t)^{d-j}\sin (t)^j\left( \sum _{k=1}^r\mu _ka_k^{d-j}b_k^{j}\right) \\&=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(0)\cos (t)^{d-j}\sin (t)^j \end{aligned}$$

\(\square \)

Consider the vector \(\underline{v}(t_0)=(r_0(t_0),r_1(t_0),\dots ,r_d(t_0))\). We show that \(r_1(t_0)=0\) and \(r_j(t_0)\ge 0\) if j is even, and the numbers \(r_j(t_0)\) have the same sign for odd \(j\ge 3\). This follows from the following more general lemma.

Lemma 3.39

Let \(\mu _1,\mu _2>0\) and \(a_1,a_2,b_1,b_2\in \mathbb {R}\) such that \(a_1,a_2,b_1,b_2>0\). Let \(t_0\in \left[ 0,\frac{\pi }{2}\right] \) be the maximizer of \(\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)=\mu _1a_1(t)^d+\mu _2a_2(t)^d\). Let

$$\begin{aligned} r_j(t)=\mu _1a_1(t)^{d-j}b_1(t)^j+\mu _2a_2(t)^{d-j}b_2(t)^j. \end{aligned}$$

Then \(r_1(t_0)=0\) and either

  1. (i)

    \(r_j(t_0)\ge 0\) for \(j=0,\dots ,d\) or

  2. (ii)

    \(r_j(t_0)\ge 0\) for even j, and \(r_j(t_0)\le 0\) for odd j.

Proof

Observe that \(\frac{\partial }{\partial t}a_1(t)=b_1(t)\) and \(\frac{\partial }{\partial t}a_2(t)=b_2(t)\) and so

$$\begin{aligned} \frac{\partial }{\partial t}r_0(t)=d\mu _1a_1(t)^{d-1}b_1(t)+d\mu _2a_2(t)^{d-1}b_2(t)=dr_1(t). \end{aligned}$$

Hence if \(t_0\) maximizes \(r_0(t)\), then \(r_1(t_0)=0\).

To prove the inequalities we need to study several cases. First of all, \(r_1(t_0)=0\) implies that \(a_1(t_0)b_1(t_0)=0\) if and only if \(a_2(t_0)b_2(t_0)=0\). If \(a_1(t_0)b_1(t_0)=a_2(t_0)b_2(t_0)=0\), then \(r_1(t_0)=r_2(t_0)=\dots =r_{d-1}(t_0)=0\). We know that \(r_0(t_0)\ge r_0(0)=\mu _1a_1^d+\mu _2a_2^d>0\). Finally, \(r_d(t_0)=\mu _1b_1(t_0)^d+\mu _2b_2(t_0)^d\ge 0\) if d is even. If d is odd, then no matter what the sign of \(r_d(t_0)\) is, case (i) or (ii) is satisfied.

So we can assume that \(a_1(t_0)b_1(t_0)\ne 0\) and \(a_2(t_0)b_2(t_0)\ne 0\). By symmetry we can assume that \(|\mu _2a_2(t_0)^d|\ge |\mu _1a_1(t_0)^d|\). Note that

$$\begin{aligned} r_j(t_0)=\mu _1a_1(t_0)^d\left( \frac{b_1(t_0)}{a_1(t_0)}\right) ^j\left( 1+\frac{\mu _2a_2(t_0)^d}{\mu _1a_1(t_0)^d}\left( \frac{a_1(t_0)b_2(t_0)}{a_2(t_0)b_1(t_0)}\right) ^j\right) . \end{aligned}$$

From \(r_1(t_0)=0\) we get that

$$\begin{aligned} \left| \frac{a_1(t_0)b_2(t_0)}{a_2(t_0)b_1(t_0)}\right| =\left| \frac{\mu _1a_1(t_0)^d}{\mu _2a_2(t_0)^d}\right| \le 1. \end{aligned}$$

Note that if \(\mu _2a_2(t_0)^d=\mu _1a_1(t_0)^d>0\), then \(\frac{a_1(t_0)b_2(t_0)}{a_2(t_0)b_1(t_0)}=-1\). Then for odd j we have \(r_j(t_0)=0\), and for even j all terms are positive in the above product, so \(r_j(t_0)>0\).

If \(\mu _2a_2(t_0)^d>\mu _1a_1(t_0)^d\), then \(\left| \frac{a_1(t_0)b_2(t_0)}{a_2(t_0)b_1(t_0)}\right| <1\), and so the last factor in the above product is positive for all \(j\ge 2\). If \(\frac{b_1(t_0)}{a_1(t_0)}> 0\), then \(r_j(t_0)\ge 0\) for all j, and if \(\frac{b_1(t_0)}{a_1(t_0)}<0\), then \(r_j(t_0)\) is positive for even j and negative for odd \(j\ge 3\). We are done. \(\square \)

Remark 3.40

Note that besides \(r_1(t_0)=0\) we also have \(\frac{r_2(t_0)}{r_0(t_0)}\le \frac{1}{d-1}\) since \(\frac{\partial ^2}{\partial t^2}r_0(t)=d((d-1)r_2(t)-r_0(t))\) should be non-positive at \(t=t_0\).

The following theorem is a strengthening of Theorem 3.25 with a new proof.

Theorem 3.41

Let N be a \(2\times 2\) positive definite matrix with positive entries and let \(\underline{\mu }\in \mathbb {R}_{>0}^2\). For any d-regular graph G we have \(Z_G(N,\underline{\mu })\ge \Phi _{d}(N,\underline{\mu })^{v(G)}\). Furthermore, if G contains \(\varepsilon v(G)\) cycles of length at most g, then there exists a \(\delta =\delta (d,N,\underline{\mu },\varepsilon ,g)>0\) such that \(Z_G(N,\underline{\mu })\ge ((1+\delta )\Phi _{d}(N,\underline{\mu }))^{v(G)}\).

The proof presented below is strongly inspired by the work of Chertkov and Chernyak [10] on loop series and gauge transformation. The paper of Borbényi and Csikvári [5] contains a similar proof about the number of Eulerian orientations in regular graphs.

Proof

By part (ii) of Lemma 3.2 we can choose \(\underline{a},\underline{b}\in \mathbb {R}^2\) such that \(a_1,a_2,b_1,b_2>0\). Then \(r_j(0)=\mu _1a_1^{d-j}b_1^j+\mu _2a_2^{d-j}b_2^j>0\) for all \(j\in \{0,1,\dots ,d\}\). This implies that

$$\begin{aligned} \Phi _{\underline{a},\underline{b},\underline{\mu }}(t)=r_0(t)=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(0)\cos (t)^{d-j}\sin (t)^j \end{aligned}$$

has a maximizer \(t_0\) in the interval \(\left[ 0,\frac{\pi }{2}\right] \). Indeed, for any \(t\in [0,2\pi ]\) there is a \(t'\in \left[ 0,\frac{\pi }{2}\right] \) such that \(\cos (t')=|\cos (t)|\) and \(\sin (t')=|\sin (t)|\), so \(|\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)|\le \Phi _{\underline{a},\underline{b},\underline{\mu }}(t')\). Thus the conditions of Lemma 3.39 are satisfied. We have

$$\begin{aligned} Z_G(N,\underline{\mu })=F_G(r_0(t_0),\dots ,r_d(t_0))= \sum _{A \subseteq E(G)}\prod _{v\in V}r_{d_A(v)}(t_0). \end{aligned}$$

For each \(A\subseteq E(G)\) the number of vertices with odd \(d_A(v)\) is even, so by Lemma 3.39 each term in the sum is non-negative. Then taking \(A=\emptyset \) we get that

$$\begin{aligned} Z_G(N,\underline{\mu })\ge r_0(t_0)^{v(G)}=\Phi _{d}(N,\underline{\mu })^{v(G)}. \end{aligned}$$

This completes the proof of the first part.

To prove the second part first observe that \(r_2(t_0)>0\). Indeed, since \(a_1,a_2,b_1,b_2>0\) and \(t_0\in \left[ 0,\frac{\pi }{2}\right] \) we get that \(a_1(t_0),a_2(t_0)>0\) and thus

$$\begin{aligned} r_2(t_0)=\mu _1a_1(t_0)^{d-2}b_1(t_0)^2+\mu _2a_2(t_0)^{d-2}b_2(t_0)^2=0 \end{aligned}$$

would imply that \(b_1(t_0)=b_2(t_0)=0\) which then implies that

$$\begin{aligned} N_{11}N_{22}-N_{12}^2=(a_1(t_0)b_2(t_0)-a_2(t_0)b_1(t_0))^2=0 \end{aligned}$$

contradicting the positive definiteness of N. Also observe that if G contains \(\varepsilon v(G)\) cycles of length at most g, then it also contains \(\varepsilon ' v(G)\) vertex-disjoint cycles of length at most g for some \(\varepsilon '\) depending on d, g and \(\varepsilon \), but not on v(G). Then

$$\begin{aligned} Z_G(N,\underline{\mu })\ge \Phi _d(N,\underline{\mu })^{v(G)}\left( 1+\left( \frac{r_2(t_0)}{r_0(t_0)}\right) ^g\right) ^{\varepsilon ' v(G)}. \end{aligned}$$

Indeed, we can consider those sets \(A\subseteq E(G)\) that are unions of some of the vertex-disjoint cycles of length at most g. Here we also use the fact that \(0\le \frac{r_2(t_0)}{r_0(t_0)}\le \frac{1}{d-1}\le 1\) by Remark 3.40. Hence \(1+\delta =\left( 1+\left( \frac{r_2(t_0)}{r_0(t_0)}\right) ^g\right) ^{\varepsilon '}\) satisfies the claim of the theorem. \(\square \)
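
For small graphs the inequality of Theorem 3.41, together with the identity \(Z_G(N,\underline{\mu })=F_G(r_0(t_0),\dots ,r_d(t_0))\) used in the proof, can be verified directly. The sketch below does this for \(K_4\) with an illustrative choice of N and \(\underline{\mu }\); the quantity \(\Phi _d(N,\underline{\mu })\) is computed as \(\max _t\Phi _{\underline{a},\underline{b},\underline{\mu }}(t)\), which is legitimate by Theorem 3.35, and the rotation angle below is only there to produce a representation with positive entries, as in the proof.

```python
import itertools
import numpy as np

d, n = 3, 4
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]    # K_4 is 3-regular
N = np.array([[2.0, 1.0], [1.0, 1.5]])
mu = np.array([1.0, 2.0])

L = np.linalg.cholesky(N)
alpha = -0.3                                   # rotating the Cholesky columns keeps N = a a^T + b b^T
a = L[:, 0] * np.cos(alpha) + L[:, 1] * np.sin(alpha)
b = -L[:, 0] * np.sin(alpha) + L[:, 1] * np.cos(alpha)
assert (a > 0).all() and (b > 0).all()         # representation with positive entries, as in the proof

ts = np.linspace(0, 2 * np.pi, 400001)
phi = mu[0] * (a[0] * np.cos(ts) + b[0] * np.sin(ts)) ** d \
    + mu[1] * (a[1] * np.cos(ts) + b[1] * np.sin(ts)) ** d
t0 = ts[np.argmax(phi)]
Phi_d = phi.max()                              # = Phi_d(N, mu) by Theorem 3.35

at = a * np.cos(t0) + b * np.sin(t0)
bt = -a * np.sin(t0) + b * np.cos(t0)
r = [mu @ (at ** (d - j) * bt ** j) for j in range(d + 1)]   # r_j(t_0)

F = 0.0                                        # F_G(r_0(t_0),...,r_d(t_0)) as a sum over edge subsets
for k in range(len(edges) + 1):
    for A in itertools.combinations(edges, k):
        deg = [0] * n
        for (u, v) in A:
            deg[u] += 1
            deg[v] += 1
        F += np.prod([r[j] for j in deg])

Z = 0.0                                        # brute-force Z_G(N, mu)
for sigma in itertools.product(range(2), repeat=n):
    weight = np.prod([mu[s] for s in sigma])
    for (u, v) in edges:
        weight *= N[sigma[u], sigma[v]]
    Z += weight

print(Z, F, Phi_d ** n)                        # Z = F, and both are at least Phi_d^{v(G)}
```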

Remark 3.42

Theorem 3.41 implies that if \((G_n)_n\) is a sequence of d-regular graphs that is not of essentially large girth, then

$$\begin{aligned} \limsup _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(N,\underline{\mu })>\ln \Phi _d(N,\underline{\mu }). \end{aligned}$$

Since \(Z_G(q,w)\ge Z_{G}^{(2)}(q,w)\) for \(q\ge 2\) and \(w\ge 0\) this kind of stability statement is also true for \(Z_G(q,w)\).

3.8 Mixed state

In this section we introduce a concept that is strongly related to the phase transition of the random cluster model. We will see that there exists a \(w_c=w_c(q)\) such that if \(0\le w\le w_c\), then \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\), and if \(w>w_c\), then \(\Phi _{d,q,w}>q\left( 1+\frac{w}{q}\right) ^{d/2}\). The problem with such a statement is that it depends on the parametrization (q,w); for a general model \((N,\underline{\mu })\) it does not make sense. On the other hand, there is a concept that makes sense even for a general \((N,\underline{\mu })\), where N is a \(2\times 2\) positive definite matrix.

Definition 3.43

We say that \((N,\underline{\mu })\) exhibits a mixed state for a fixed positive integer d if \(R_c(N,\underline{\mu })=1\).

Note that \(R_c(N,\underline{\mu })=1\) does not depend on which representation \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) we choose. We also know that \(R=R_c(N,\underline{\mu })\) is a solution of

$$\begin{aligned} (N_{11}N_{22} - N_{12}^2) R^4 + ( -N_{22}^2T + 2N_{12}^2 - N_{11}^2T^{-1} ) R^2 + (N_{11}N_{22}- N_{12}^2)=0, \end{aligned}$$

where \(T=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). This shows that \((N,\underline{\mu })\) exhibits a mixed state for d if

$$\begin{aligned} 2(N_{11}N_{22} - N_{12}^2)-(N_{22}^2T - 2N_{12}^2 + N_{11}^2T^{-1} )=0. \end{aligned}$$

The main lemma of this section is the following.

Lemma 3.44

Let \(N=\underline{a}\underline{a}^T+\underline{b}\underline{b}^T\) for some \(a_1,a_2,b_1,b_2\in \mathbb {R}\) and let \(\mu _1,\mu _2>0\). Suppose that \((N,\underline{\mu })\) exhibits a mixed state for d, that is, for some \(t_1\) we have

$$\begin{aligned} a_1(t_1)=\left( \frac{\mu _2}{\mu _1}\right) ^{1/d}(-b_2(t_1))\ \ \text {and}\ \ \ -b_1(t_1)=\left( \frac{\mu _2}{\mu _1}\right) ^{1/d}a_2(t_1). \end{aligned}$$

Let \(t_2=2t_1-\frac{\pi }{2}\). Then for every \(t\in \mathbb {R}\) we have

$$\begin{aligned} \mu _1a_1(t)^d=\mu _2a_2(t_2-t)^d\ \ \ \text {and}\ \ \ \mu _2a_2(t)^d=\mu _1a_1(t_2-t)^d. \end{aligned}$$

In particular,

$$\begin{aligned} \Phi _{\underline{a},\underline{b},\underline{\mu }}(t)=\Phi _{\underline{a},\underline{b},\underline{\mu }}(t_2-t). \end{aligned}$$

Proof

Note that for any \(u_1,u_2\in \mathbb {R}\) we have

$$\begin{aligned} a_1(u_1+u_2)=a_1(u_1)\cos (u_2)+b_1(u_1)\sin (u_2). \end{aligned}$$

To prove \(\mu _1a_1(t)^d=\mu _2a_2(t_2-t)^d\) it will be more convenient to prove the statement in the form

$$\begin{aligned} a_2(t_1-t)=\left( \frac{\mu _1}{\mu _2}\right) ^{1/d}a_1\left( t+t_1-\frac{\pi }{2}\right) . \end{aligned}$$

This is indeed true,

$$\begin{aligned} a_2(t_1-t)&=a_2(t_1)\cos (-t)+b_2(t_1)\sin (-t)\\&=a_2(t_1)\cos (t)-b_2(t_1)\sin (t)\\&=\left( \frac{\mu _1}{\mu _2}\right) ^{1/d}(-b_1(t_1)\cos (t)+a_1(t_1)\sin (t))\\&=\left( \frac{\mu _1}{\mu _2}\right) ^{1/d}\left( a_1(t_1)\cos \left( t-\frac{\pi }{2}\right) +b_1(t_1)\sin \left( t-\frac{\pi }{2}\right) \right) \\&=\left( \frac{\mu _1}{\mu _2}\right) ^{1/d}a_1\left( t+t_1-\frac{\pi }{2}\right) \end{aligned}$$

By symmetry the other claim is also true. \(\square \)

3.9 Specialization to \(N=M'_2\) and \(\underline{\mu }=\underline{\nu }_2\)

In this section we collected some results that are specialized to \(N=M'_2\) and \(\underline{\mu }=\underline{\nu }_2\). In particular, we will choose \(a_1=a_2=\sqrt{1+\frac{w}{q}}\), \(b_1=\sqrt{\frac{(q-1)w}{q}}\) and \(b_2=-\sqrt{\frac{w}{q(q-1)}}\).

Lemma 3.45

Let \(q\ge 2,w\ge 0\). Let \(a_1=a_2=\sqrt{1+\frac{w}{q}}\), \(b_1=\sqrt{\frac{(q-1)w}{q}}\), \(b_2=-\sqrt{\frac{w}{q(q-1)}}\), \(\nu _1=1\) and \(\nu _2=q-1\). Let \(r_j(0)=\nu _1a_1^{d-j}b_1^j+\nu _2a_2^{d-j}b_2^j\). Then we have \(r_j(0)\ge 0\) for \(j=0,1,\dots ,d\) and \(r_1(0)=0\).

Proof

We have

$$\begin{aligned} r_j(0)&=\left( 1+\frac{w}{q}\right) ^{(d-j)/2}\left( \frac{(q-1)w}{q}\right) ^{j/2}+(q-1)(-1)^j \left( 1+\frac{w}{q}\right) ^{(d-j)/2}\left( \frac{w}{q(q-1)}\right) ^{j/2}\\&=\left( 1+\frac{w}{q}\right) ^{(d-j)/2}\left( \frac{w}{q(q-1)}\right) ^{j/2}((q-1)^j+(-1)^j(q-1)) \end{aligned}$$

This is 0 if \(j=1\), and non-negative if \(j\ne 1\), since \(q\ge 2\) implies \((q-1)^j+(-1)^j(q-1)\ge 0\). \(\square \)

Recall that

$$\begin{aligned} \Phi _{d,q,w}=\max _{t\in [0,2\pi ]}\Phi _{d,q,w}(t). \end{aligned}$$

The next lemma shows that it is enough to consider the interval \(\left[ 0,\frac{\pi }{2}\right] \) to find the maximum when \(q\ge 2\) and \(w\ge 0\).

Lemma 3.46

If \(q\ge 2\) and \(w\ge 0\), then there is a \(t_0\in \left[ 0,\frac{\pi }{2}\right] \) for which \(\Phi _{d,q,w}=\Phi _{d,q,w}(t_0)\).

Proof

By Lemma 3.38 we have

$$\begin{aligned} \Phi _{d,q,w}(t)=r_0(t)=\sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(0)\cos (t)^{d-j}\sin (t)^j. \end{aligned}$$

By Lemma 3.45 we have \(r_j(0)\ge 0\) for all \(j\in \{0,1,\dots ,d\}\) if \(q\ge 2\) and \(w\ge 0\). For any \(t\in [0,2\pi ]\) there is a \(t'\in \left[ 0,\frac{\pi }{2}\right] \) such that \(|\cos (t)|=\cos (t')\) and \(|\sin (t)|=\sin (t')\), thus

$$\begin{aligned} |\Phi _{d,q,w}(t)|\le \sum _{j=0}^d\left( {\begin{array}{c}d\\ j\end{array}}\right) r_j(0)|\cos (t)|^{d-j}|\sin (t)|^j= \Phi _{d,q,w}(t'). \end{aligned}$$

Hence \(\max _{t\in [0,2\pi ]}\Phi _{d,q,w}(t)=\max _{t\in \left[ 0,\frac{\pi }{2}\right] }\Phi _{d,q,w}(t).\) \(\square \)

The next lemma will be useful to get even more precise bounds on \(\tan (t_0)\).

Lemma 3.47

Let \(q\ge 2\) and \(w\ge 0\). Let \(\overline{t}\in \left[ 0,\frac{\pi }{2}\right] \) such that

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w}(t)\bigg |_{t=\overline{t}}=0. \end{aligned}$$

Then we have \(a_{q,w,1}(\overline{t}),a_{q,w,2}(\overline{t}),b_{q,w,1}(\overline{t})>0\) and \(b_{q,w,2}(\overline{t})<0\). In particular, this is true if \(\overline{t}=t_0\) maximizing the function \(\Phi _{d,q,w}(t)\) in the interval \(\left[ 0,\frac{\pi }{2}\right] \).

Proof

Note that \(\Phi _{d,q,w}(t)=a_{q,w,1}(t)^d+(q-1)a_{q,w,2}(t)^d\), and its derivative is

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w}(t)=d(b_{q,w,1}(t)a_{q,w,1}(t)^{d-1}+(q-1)b_{q,w,2}(t)a_{q,w,2}(t)^{d-1}). \end{aligned}$$

Note that \(a_{q,w,1}(t)>0\) and \(b_{q,w,2}(t)<0\) for all \(t\in \left[ 0,\frac{\pi }{2}\right] \). Suppose for contradiction that \(b_{q,w,1}(\overline{t})<0\). Then from \(b_{q,w,1}(\overline{t})a_{q,w,1}(\overline{t})^{d-1}+(q-1)b_{q,w,2}(\overline{t})a_{q,w,2}(\overline{t})^{d-1}=0\) we also get that \(a_{q,w,2}(\overline{t})^{d-1}<0\), that is, \(a_{q,w,2}(\overline{t})<0\) and d is even. Then

$$\begin{aligned} \left( \frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}\right) ^{d-1}=\frac{(q-1)(-b_{q,w,2}(\overline{t}))}{-b_{q,w,1}(\overline{t})}. \end{aligned}$$

Note that \(a_{q,w,1}(t)>-a_{q,w,2}(t)\) for all \(t\in \left[ 0,\frac{\pi }{2}\right] \), and so

$$\begin{aligned} \left( \frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}\right) ^{d-1}>\frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}. \end{aligned}$$

By \(a_{q,w,1}(t)b_{q,w,1}(t)+(q-1)a_{q,w,2}(t)b_{q,w,2}(t)=-q\cos (t)\sin (t)\) we have

$$\begin{aligned} \frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}>\frac{(q-1)(-b_{q,w,2}(\overline{t}))}{-b_{q,w,1}(\overline{t})}, \end{aligned}$$

but then

$$\begin{aligned} \left( \frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}\right) ^{d-1}>\frac{a_{q,w,1}(\overline{t})}{-a_{q,w,2}(\overline{t})}>\frac{(q-1)(-b_{q,w,2}(\overline{t}))}{-b_{q,w,1}(\overline{t})} \end{aligned}$$

leads to a contradiction. Hence \(b_{q,w,1}(\overline{t})>0\). But then \(a_{q,w,1}(\overline{t})a_{q,w,2}(\overline{t})+b_{q,w,1}(\overline{t})b_{q,w,2}(\overline{t})=1\) implies that \(a_{q,w,2}(\overline{t})>0\). \(\square \)

Finally, we collected some claims about the derivatives of \(\Phi _{d,q,w}(t)\) at \(t=0.\)

Lemma 3.48

We have

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w}(t)\bigg |_{t=0}=0\ \ \ \text {and}\ \ \ \frac{\partial ^2}{\partial t^2}\Phi _{d,q,w}(t)\bigg |_{t=0}=d\left( 1+\frac{w}{q}\right) ^{d/2-1}((d-2)w-q). \end{aligned}$$

In particular, if \(w<\frac{q}{d-2}\), then the function \(\Phi _{d,q,w}(t)\) has a local maximum at \(t=0\), and if \(w>\frac{q}{d-2}\), then the function \(\Phi _{d,q,w}(t)\) has a local minimum at \(t=0.\)
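
Lemma 3.48 is a direct computation from the definition of \(\Phi _{d,q,w}(t)\); the quick numerical cross-check below (parameter values are arbitrary illustrations) compares a finite-difference second derivative at \(t=0\) with the closed form.

```python
import numpy as np

def Phi(t, d, q, w):
    a1 = np.sqrt(1 + w / q) * np.cos(t) + np.sqrt((q - 1) * w / q) * np.sin(t)
    a2 = np.sqrt(1 + w / q) * np.cos(t) - np.sqrt(w / (q * (q - 1))) * np.sin(t)
    return a1 ** d + (q - 1) * a2 ** d

h = 1e-4
for d, q, w in [(4, 5.0, 3.0), (8, 5.0, 1.0), (3, 2.5, 0.7)]:
    numeric = (Phi(h, d, q, w) - 2 * Phi(0.0, d, q, w) + Phi(-h, d, q, w)) / h ** 2
    closed = d * (1 + w / q) ** (d / 2 - 1) * ((d - 2) * w - q)
    print(d, q, w, numeric, closed)    # the last two columns agree up to discretization error
```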

3.9.1 Mixed state and phase transition.

In this section we discuss the mixed state and phase transition of \(Z^{(2)}_G(q,w)\).

We know that \((N,\underline{\mu })\) exhibits a mixed state for a given d if

$$\begin{aligned} 2(N_{11}N_{22} - N_{12}^2)-(N_{22}^2T - 2N_{12}^2 + N_{11}^2T^{-1} )=0, \end{aligned}$$

where \(T=\left( \frac{\mu _2}{\mu _1}\right) ^{2/d}\). Applying this equation to \((M'_2,\underline{\nu }_2)\) we get that

$$\begin{aligned}{} & {} 2\left( (1+w)\left( 1+\frac{w}{q-1}\right) -1\right) -\left( \left( 1+\frac{w}{q-1}\right) ^2(q-1)^{2/d}-2+(1+w)^2(q-1)^{-2/d}\right) \\{} & {} \quad = -(((q-1)^{-1/d}-(q-1)^{1/d-1})w-((q-1)^{1/d}-(q-1)^{-1/d}))^2. \end{aligned}$$

Note that if \(q=2\), then this expression is identically 0, so in the case of \(q=2\) the spin system \((M'_2,\underline{\nu }_2)\) always exhibits a mixed state. If \(q\ne 2\), then the solution \(w_c=w_c(q)\) of this equation is

$$\begin{aligned} w_c=\frac{(q-1)^{1/d}-(q-1)^{-1/d}}{(q-1)^{-1/d}-(q-1)^{1/d-1}}=\frac{(q-1)-(q-1)^{1-2/d}}{(q-1)^{1-2/d}-1}=\frac{q-2}{(q-1)^{1-2/d}-1}-1. \end{aligned}$$

Note that by L’Hôpital’s rule we have

$$\begin{aligned} \lim _{q\rightarrow 2+}w_c(q)=\frac{2}{d-2}, \end{aligned}$$

so we will define \(w_c(2)=\frac{2}{d-2}\) even though the spin system \((M'_2,\underline{\nu }_2)\) itself always exhibits mixed state for every w if \(q=2\).
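
The formula for \(w_c(q)\) and its limit at \(q=2\) are easy to evaluate; the short sketch below tabulates \(w_c(q)\) for a few illustrative values of q with \(d=4\).

```python
def w_c(q, d):
    """w_c(q) = (q-2)/((q-1)^{1-2/d} - 1) - 1 for q > 2."""
    return (q - 2) / ((q - 1) ** (1 - 2 / d) - 1) - 1

d = 4
for q in [10.0, 5.0, 3.0, 2.1, 2.01, 2.001]:
    print(q, w_c(q, d))          # w_c(10) = 3 and w_c(5) = 2 for d = 4, cf. Fig. 5 and Example 3.58
print("limit as q -> 2+, i.e. 2/(d-2):", 2 / (d - 2))
```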

The main theorem of this section is the following. It asserts that the mixed state also describes a phase transition in the value of \(\Phi _{d,q,w}\).

Theorem 3.49

Let \(q\ge 2\). If \(0\le w\le w_c\), then \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\). If \(w>w_c\), then \(\Phi _{d,q,w}>q\left( 1+\frac{w}{q}\right) ^{d/2}\).

Before we start to prove Theorem 3.49 we need a lemma about the curve \((q,w_c(q))\). For a visualization of this lemma see the dashed curve on Fig. 4.

Lemma 3.50

For every \(q\ge 2\) let \(x(q)=1+\frac{q}{w_c(q)}\) and \(y(q)=1+w_c(q)\). Then the curve (x(q), y(q)) in the (x, y) plane is the graph of a monotone increasing function. Furthermore, \(w_c(q)\le \frac{q}{d-2}\).

Proof

We have

$$\begin{aligned} y=w_c(q)+1=\frac{q-2}{(q-1)^{1-2/d}-1}, \end{aligned}$$

and \(q=(x(q)-1)(y(q)-1)\). Thus

$$\begin{aligned} \frac{\partial y}{\partial x}=\frac{\partial y}{\partial q}\cdot \frac{\partial q}{\partial x}=\frac{(q-1)^{1-2/d}-1-(1-2/d)(q-1)^{-2/d}(q-2)}{((q-1)^{1-2/d}-1)^2}\cdot (y-1). \end{aligned}$$

Clearly, \(y-1=w>0\) so we only need to show that the first term is also positive. We can rewrite the numerator as

$$\begin{aligned} \frac{2}{d}(q-1)^{-2/d}(q-2)+(q-1)^{-2/d}\!-\!1\!=\!(q-1)^{-2/d}\left( \frac{2}{d}(q-2)+1-(q-1)^{2/d}\right) . \end{aligned}$$

This is 0 if \(q=2\) and its derivative is \(\frac{2}{d} (1-(q-1)^{-2/d})\ge 0\) for \(q\ge 2\). Hence \(\frac{dy }{dx}>0\).

The second part follows from the first part since if we follow any hyperbola \((x-1)(y-1)=q\) while decreasing x, and hence increasing y, it intersects the curve (x(q), y(q)) before hitting the line \(x=d-1\). The intersection of the hyperbola \((x-1)(y-1)=q\) and the line \(x=d-1\) is at \(y=1+\frac{q}{d-2}\), thus \(w_c(q)\le \frac{q}{d-2}\). \(\square \)

Fig. 4

A generic hyperbola intersecting the line \(x=d-1\) and the curve of the phase transition

We decompose the proof of Theorem 3.49 into three propositions dealing with \(w=w_c\), \(0\le w<w_c\) and \(w>w_c\).

Proposition 3.51

For \(q\ge 2\) and \(w=w_c\) we have \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\).

Proof

Let us assume that \(q>2\), the statement for \(q=2\) follows by continuity. We show that for \(w=w_c\) the function \(\Phi _{d,q,w}(t)\) has a global maximizer in \((0,\pi /2)\) with value \(\Phi _{d,q,w}(0)=q\left( 1+\frac{w}{q}\right) ^{d/2}\). We know that for \(w=w_c\) there is a \(t_1\in \left[ 0,\frac{\pi }{2}\right] \) such that

$$\begin{aligned} \frac{a_{q,w,2}(t_1)}{-b_{q,w,1}(t_1)}=\left( \frac{\nu _1}{\nu _2}\right) ^{1/d}. \end{aligned}$$

This means that

$$\begin{aligned} \frac{\sqrt{1+\frac{w}{q}}\cos (t_1)-\sqrt{\frac{w}{q(q-1)}}\sin (t_1)}{\sqrt{1+\frac{w}{q}}\sin (t_1)-\sqrt{\frac{(q-1)w}{q}}\cos (t_1)}=(q-1)^{-1/d}\le 1. \end{aligned}$$

Note that if \(q=2\), then \(t_1=\frac{\pi }{4}\) is a solution. If \(q>2\), then

$$\begin{aligned} \frac{\sin (t_1)}{\cos (t_1)}\ge \frac{\sqrt{1+\frac{w}{q}}+\sqrt{\frac{(q-1)w}{q}}}{\sqrt{1+\frac{w}{q}}+\sqrt{\frac{w}{q(q-1)}}}>1 \end{aligned}$$

showing that \(t_1\in \left( \frac{\pi }{4},\frac{\pi }{2}\right) \). Let \(t_2=2t_1-\frac{\pi }{2}\in \left( 0,\frac{\pi }{2}\right) \). By Lemma 3.44 we know that \(\Phi _{d,q,w_c}(t)=\Phi _{d,q,w_c}(t_2-t)\). (For an example of the graph of a function \(\Phi _{d,q,w_c}(t)\) see Fig. 5)

Fig. 5

For \(d=4\) and \(q=10\) we have \(w_c=3\). The graph of the trigonometric polynomial \(\Phi _{4,10,3}(t)\) is depicted in the figure

We know that

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w_c}(t)\bigg |_{t=0}=0. \end{aligned}$$

This immediately implies that

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w_c}(t)\bigg |_{t=t_2}=0. \end{aligned}$$

The equation \(\Phi _{d,q,w_c}(t)=\Phi _{d,q,w_c}(t_2-t)\) also implies that

$$\begin{aligned} \frac{\partial }{\partial t}\Phi _{d,q,w_c}(t)\bigg |_{t=t_2/2}=0. \end{aligned}$$

This means that a computation similar to the one in Sect. 3.6 gives that if

$$\begin{aligned} R(t)=-(q-1)\frac{b_{q,w,2}(t)}{b_{q,w,1}(t)}\ \ \ \text {and}\ \ \ S(t)=\frac{a_{q,w,1}(t)}{a_{q,w,2}(t)}=\frac{(1+w)R(t)+q-1}{R(t)+q+w-1}, \end{aligned}$$

then the values \(R=R(0),R\left( \frac{t_2}{2}\right) ,R(t_2)\) are all solutions of the equation

$$\begin{aligned} R=\left( \frac{(1+w)R+q-1}{R+w+q-1}\right) ^{d-1}. \end{aligned}$$

The values \(R(0),R\left( \frac{t_2}{2}\right) ,R(t_2)\) are at least 1, because by Lemma 3.47 they are non-negative, and \(S(t)>1\) whenever \(t\in \left[ 0,\frac{\pi }{2}\right] \) and both \(a_{q,w,1}(t),a_{q,w,2}(t)>0\). The equation

$$\begin{aligned} R=\left( \frac{(1+w)R+q-1}{R+w+q-1}\right) ^{d-1} \end{aligned}$$

has at most 3 solutions satisfying \(R\ge 1\) by Lemma 3.37 which means that there is no other \(t'\in \left( 0,\frac{\pi }{2}\right) \) that is a local maximizer or minimizer of \(\Phi _{d,q,w}(t)\). Note that \(\frac{d^2}{dt^2}\Phi _{d,q,w_c}\bigg |_{t=0}<0\) since \(w_c(q)<\frac{q}{d-2}\) by Lemma 3.50. So at \(\frac{t_2}{2}\) we have a local minimum, and at \(t_2\) we have a local maximum. Hence

$$\begin{aligned} \Phi _{d,q,w_c}=\Phi _{d,q,w_c}(0)=\Phi _{d,q,w_c}(t_2)=q\left( 1+\frac{w}{q}\right) ^{d/2}. \end{aligned}$$

\(\square \)

Proposition 3.52

For \(q\ge 2\) and \(0\le w\le w_c\) we have \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\).

Proof

We will describe the pairs (qw) for which \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\). To do this it is better to use the Tutte polynomial \(T_G(x,y)\) instead of \(Z_G(q,w)\) with \(q=(x-1)(y-1)\) and \(w=y-1\). Recall that the connection between the Tutte polynomial and the partition function of the random cluster model is the following:

$$\begin{aligned} T_G(x,y)=(x-1)^{-k(E)}(y-1)^{-v(G)}Z_G((x-1)(y-1),y-1). \end{aligned}$$

Then for \(q\ge 2\) and an essentially large girth sequence of d-regular graphs \((G_n)_n\) the statement

$$\begin{aligned} \lim _{n\rightarrow \infty } Z^{(2)}_{G_n}(q,w)^{1/v(G_n)}=\lim _{n\rightarrow \infty } Z_{G_n}(q,w)^{1/v(G_n)}=q\left( 1+\frac{w}{q}\right) ^{d/2} \end{aligned}$$

is equivalent with

$$\begin{aligned} \lim _{n\rightarrow \infty }T_{G_n}(x,y)^{1/v(G_n)}=x\left( 1+\frac{1}{x-1}\right) ^{d/2-1}. \end{aligned}$$

This is independent of y. The Tutte polynomial has only non-negative coefficients [25], so if this limit value holds true for \((x,y_1)\) and \((x,y_2)\), then it also holds for every \(y\in [y_1,y_2]\). Note that for \(x\ge d-1\) and \(y=1\) this was indeed proved by Bencs and Csikvári [4]. In fact, we do not even need to use this result, since for \(q=1\) the statement is trivial. By Lemma 3.50 the curve \((q,w_c(q))\) for \(q\ge 2\), reparametrized with x and y, is the graph of a monotone increasing function on the interval \([d-1,\infty )\), see the dashed line on Fig. 1. In particular, for \(q\ge 2\) the part of the hyperbola \((x-1)(y-1)=q\) with \(0\le w=y-1\le w_c\) goes under this curve, implying \(\Phi _{d,q,w}=q\left( 1+\frac{w}{q}\right) ^{d/2}\). \(\square \)

Remark 3.53

We remark that the same argument also gives that if \(1<q<2\) and \(0\le w\le \frac{q}{d-2}\), then for an essentially large girth sequence of d-regular graphs \((G_n)_n\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z^{(2)}_{G_n}(q,w)=\lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\ln \left( q\left( 1+\frac{w}{q}\right) ^{d/2}\right) . \end{aligned}$$

Proposition 3.54

For \(q\ge 2\) and \(w>w_c\) we have \(\Phi _{d,q,w}>q\left( 1+\frac{w}{q}\right) ^{d/2}\). Furthermore, the function \(\frac{\partial }{\partial w}\Phi _{d,q,w}\) has a discontinuity at \(w=w_c\) if \(q>2\).

Proof

Consider the function

$$\begin{aligned} h(w,t)=\left( 1+\frac{w}{q}\right) ^{-d/2}\Phi _{d,q,w}(t). \end{aligned}$$

We show that it is a strictly monotone increasing function in w for every \(t\in \left( 0,\frac{\pi }{2}\right) \). By definition

$$\begin{aligned} h(w,t)= & {} \left( \cos (t)+\sqrt{\frac{(q-1)w}{q+w}}\sin (t)\right) ^d+(q-1)\\{} & {} \quad \left( \cos (t)-\sqrt{\frac{w}{(q-1)(q+w)}}\sin (t)\right) ^d. \end{aligned}$$

Then \(\frac{\partial h}{\partial w}\) is given as

$$\begin{aligned}{} & {} \frac{dq\sqrt{q-1}\sin (t)}{2\sqrt{w(q+w)^3}}\left( \left( \cos (t)+\sqrt{\frac{(q-1)w}{q+w}}\sin (t)\right) ^{d-1}\right. \\{} & {} \quad \left. -\left( \cos (t)-\sqrt{\frac{w}{(q-1)(q+w)}}\sin (t)\right) ^{d-1}\right) . \end{aligned}$$

This is positive if \(t\in \left( 0,\frac{\pi }{2}\right) \) since

$$\begin{aligned} \cos (t)+\sqrt{\frac{(q-1)w}{q+w}}\sin (t)> \left| \cos (t)-\sqrt{\frac{w}{(q-1)(q+w)}}\sin (t)\right| . \end{aligned}$$

Note that for \(q>2\) there is a \(t_0(w_c)\in \left( 0,\frac{\pi }{2}\right) \) such that \(\Phi _{d,q,w_c}(t_0(w_c))=\Phi _{d,q,w_c}(0)\), that is, \(h(w_c,t_0(w_c))=q\). Then for \(w>w_c\) we have \(h(w,t_0(w_c))>q\) which gives that

$$\begin{aligned} \Phi _{d,q,w}\ge \Phi _{d,q,w}(t_0(w_c))>q\left( 1+\frac{w}{q}\right) ^{d/2}. \end{aligned}$$

For \(q=2\) we know that \(w_c=\frac{2}{d-2}\), and for \(w> \frac{q}{d-2}=\frac{2}{d-2}\) we have \(\frac{\partial }{\partial t}\Phi _{d,q,w}(t)\bigg |_{t=0}=0\) and \(\frac{\partial ^2}{\partial t^2}\Phi _{d,q,w}(t)\bigg |_{t=0}>0\) by Lemma 3.48, so at \(t=0\) we have a local minimum, thus \(\Phi _{d,q,w}>\Phi _{d,q,w}(0)\) for \(w>w_c\).

Next we prove the claim about \(\frac{\partial }{\partial w}\Phi _{d,q,w}\). Let \(w>w_c\) such that \(w-w_c\) is small enough, namely it satisfies

$$\begin{aligned} h(w,t_0(w_c))\ge & {} h(w_c,t_0(w_c))+\frac{1}{2}(w-w_c)\frac{\partial }{\partial w}h(w,t_0(w_c))\bigg |_{w=w_c}\\= & {} q+\frac{1}{2}(w-w_c)\frac{\partial }{\partial w}h(w,t_0(w_c))\bigg |_{w=w_c}. \end{aligned}$$

Then

$$\begin{aligned} \frac{\Phi _{d,q,w}-\Phi _{d,q,w_c}}{w-w_c}&\ge \frac{1}{w-w_c}\left( \left( 1+\frac{w}{q}\right) ^{d/2}h(w,t_0(w_c))-q\left( 1+\frac{w_c}{q}\right) ^{d/2}\right) \\&\ge \frac{1}{w-w_c}\left( \left( 1+\frac{w}{q}\right) ^{d/2}\left( q+\frac{1}{2}(w-w_c)\frac{\partial }{\partial w}h(w,t_0(w_c))\bigg |_{w=w_c}\right) \right. \\&\left. \quad -q\left( 1+\frac{w_c}{q}\right) ^{d/2}\right) \end{aligned}$$

From this it follows that

$$\begin{aligned} \frac{\partial }{\partial w^+}\Phi _{d,q,w}\bigg |_{w=w_c}\ge \frac{\partial }{\partial w^-}\Phi _{d,q,w}\bigg |_{w=w_c}+\frac{1}{2}\left( 1+\frac{w_c}{q}\right) ^{d/2}\frac{\partial }{\partial w}h(w,t_0(w_c))\bigg |_{w=w_c}. \end{aligned}$$

Since \(\frac{\partial }{\partial w}h(w,t_0(w_c))\big |_{w=w_c}>0\), the right-hand derivative is strictly larger than the left-hand derivative, so \(\frac{\partial }{\partial w}\Phi _{d,q,w}\) is indeed discontinuous at \(w=w_c\). \(\square \)

Remark 3.55

It is well-known that if \(q=2\), then there is a second order phase transition, that is, \(\frac{\partial }{\partial w}\Phi _{d,2,w}\) is continuous, but \(\frac{\partial ^2}{\partial w^2}\Phi _{d,2,w}\) is discontinuous at \(w=w_c=\frac{2}{d-2}\). For details see Chapter 4.8 of [3].
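
The first order phase transition of Theorem 3.49 is also easy to see numerically. The sketch below (with the illustrative values \(d=4\) and \(q=10\), so that \(w_c=3\) as in Fig. 5) evaluates \(\Phi _{d,q,w}=\max _{t\in [0,\pi /2]}\Phi _{d,q,w}(t)\) (which suffices by Lemma 3.46) on a grid of w and compares it with \(q\left( 1+\frac{w}{q}\right) ^{d/2}\); the two columns agree for \(w\le w_c\) and the first one is strictly larger for \(w>w_c\).

```python
import numpy as np

d, q = 4, 10.0
w_c = (q - 2) / ((q - 1) ** (1 - 2 / d) - 1) - 1     # = 3 for d = 4, q = 10

t = np.linspace(0, np.pi / 2, 200001)
def Phi_max(w):
    a1 = np.sqrt(1 + w / q) * np.cos(t) + np.sqrt((q - 1) * w / q) * np.sin(t)
    a2 = np.sqrt(1 + w / q) * np.cos(t) - np.sqrt(w / (q * (q - 1))) * np.sin(t)
    return (a1 ** d + (q - 1) * a2 ** d).max()

for w in [2.0, 2.5, 3.0, 3.5, 4.0, 5.0]:
    print(w, Phi_max(w), q * (1 + w / q) ** (d / 2))
```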

3.10 Examples

In this section we give some examples for the theorems we proved.

Example 3.56

Let \(d=8\), \(q=5\) and \(w=1\). Then the vector

$$\begin{aligned} \underline{v}(0)=(10.368, 0, 1.728, 1.058, 0.936, 0.749, 0.615, 0.501, 0.409), \end{aligned}$$

where we kept only the first three digits everywhere. Note that \(10.368=5\cdot \left( 1+\frac{1}{5}\right) ^{8/2}.\) So for every 8-regular graph G we have

$$\begin{aligned} Z^{(2)}_G(5,1)=F_G(10.368, 0, 1.728, 1.058, 0.936, 0.749, 0.615, 0.501, 0.409). \end{aligned}$$

Using \(t_0=0.6619549492373429\) we get the vector

$$\begin{aligned} \underline{v}(t_0)=(16.277, 0, 0.433, -0.496, 0.581, -0.679, 0.794, -0.929, 1.086) \end{aligned}$$

again only keeping the first 3 digits everywhere. A more precise value of the first coordinate is 16.277748757985485, and so this is \(\Phi _{8,5,1}\). Note that the sign structure of \(\underline{v}(t_0)\) shows that

$$\begin{aligned} Z^{(2)}_G(5,1)= & {} F_G(16.277, 0, 0.433, -0.496, 0.581, -0.679, 0.794, -0.929, 1.086)\\ {}\ge & {} 16.277^{v(G)} \end{aligned}$$

for every 8-regular graph G.
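
The vectors of Example 3.56 can be reproduced with a few lines, using only the parametrization of Sect. 3.9 and the formula for \(r_j(t)\) from Lemma 3.38; the maximizer \(t_0\) is located on a grid over \(\left[ 0,\frac{\pi }{2}\right] \), which is sufficient by Lemma 3.46.

```python
import numpy as np

d, q, w = 8, 5.0, 1.0
a = np.array([np.sqrt(1 + w / q), np.sqrt(1 + w / q)])
b = np.array([np.sqrt((q - 1) * w / q), -np.sqrt(w / (q * (q - 1)))])
nu = np.array([1.0, q - 1])

def r(t):
    at = a * np.cos(t) + b * np.sin(t)
    bt = -a * np.sin(t) + b * np.cos(t)
    return np.array([nu @ (at ** (d - j) * bt ** j) for j in range(d + 1)])

ts = np.linspace(0, np.pi / 2, 400001)
phi = nu[0] * (a[0] * np.cos(ts) + b[0] * np.sin(ts)) ** d \
    + nu[1] * (a[1] * np.cos(ts) + b[1] * np.sin(ts)) ** d
t0 = ts[np.argmax(phi)]

print(np.round(r(0.0), 3))        # (10.368, 0, 1.728, 1.058, 0.936, ...)
print(t0, np.round(r(t0), 3))     # t0 ~ 0.66195 and the vector whose first entry is Phi_{8,5,1} ~ 16.277
```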

Example 3.57

Let \(d=4\), \(q=5\) and \(w=3\). Then

$$\begin{aligned} \underline{v}(0)=(12.8, 0, 4.8, 4.409, 5.85) \end{aligned}$$

where we again kept only the first three digits everywhere. This time \(t_0=0.8316331320342567\) and \(\Phi _{4,5,3}=16.315621073058985\) while

$$\begin{aligned} \underline{v}(t_0)=(16.315, 0, 1.878, -3.867, 8.176). \end{aligned}$$

In this case \(t_1=1.06627054934707\) and the corresponding vector

$$\begin{aligned} \underline{v}(t_1)=(15.010, -2.835, 0.994, -2.454, 11.249). \end{aligned}$$

For the complete graph \(K_5\) on 5 vertices the subgraph counting polynomial is the following:

$$\begin{aligned}&F_{K_5}(15.010, -2.835, 0.994, -2.454, 11.249\ |\ z)=180176.234z^{20} + 85764.618z^{18} \\&\quad +28876.392z^{16} + 15784.587z^{14} + 10454.536z^{12} + 9093.510z^{10}+ 13949.743z^{8}\\&\quad + 28103.222z^6 + 68600.498z^4 + 271865.398z^2 + 762087.303 \end{aligned}$$

All zeros of this polynomial have absolute value approximately 1.0747696. Of course, we could have used any 4-regular graph instead of \(K_5\) (see Fig. 6).

Fig. 6

The zeros of \(F_{G}(15.010, -2.835, 0.994, -2.454, 11.249\ |\ z)\), where G is \(K_5\) (red) and G is the octahedron (black x)

Example 3.58

Let \(d=4\) and \(q=5\) again, but let \(w=w_c=2\). Then

$$\begin{aligned} \underline{v}(0)=(9.8, 0, 2.8, \sqrt{5.04}, 2.6), \end{aligned}$$

\(t_0=0.5575988373258864\) and

$$\begin{aligned} \underline{v}(t_0)=(9.8, 0, 2.8, -\sqrt{5.04}, 2.6). \end{aligned}$$

We have \(t_1=1.06419757674722\) and

$$\begin{aligned} \underline{v}(t_1)=(8, -\sqrt{4.5}, 1, -\sqrt{4.5}, 8). \end{aligned}$$

One can check that

$$\begin{aligned}&F_{K_5}(8, -\sqrt{4.5}, 1, -\sqrt{4.5}, 8\ |\ z)= 32768z^{20} + 23040z^{18} + 11070z^{16} + 6647.5z^{14} \\&\quad +4620z^{12} + 3927z^{10} + 4620z^8 + 6647.5z^6 + 11070z^4 + 23040z^2 + 32768 \end{aligned}$$

and all of its zeros have absolute value 1.
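
Examples 3.57 and 3.58 can be reproduced as well. Below we assume that \(F_G(a_0,\dots ,a_d\,|\,z)\) denotes the polynomial obtained from \(F_G\) by substituting \(a_jz^j\) for \(a_j\), i.e. \(\sum _{A\subseteq E(G)}\big (\prod _{v\in V}a_{d_A(v)}\big )z^{2|A|}\); this convention is consistent with the constant and leading coefficients of the polynomials displayed above. The sketch computes \(F_{K_5}(8,-\sqrt{4.5},1,-\sqrt{4.5},8\,|\,z)\) and checks that all of its zeros have absolute value 1.

```python
import itertools
import numpy as np

edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]       # K_5
vt1 = [8.0, -np.sqrt(4.5), 1.0, -np.sqrt(4.5), 8.0]               # the vector v(t_1) of Example 3.58

coeffs = np.zeros(2 * len(edges) + 1)                             # coefficient of z^m
for k in range(len(edges) + 1):
    for A in itertools.combinations(edges, k):
        deg = [0] * 5
        for (x, y) in A:
            deg[x] += 1
            deg[y] += 1
        coeffs[2 * k] += np.prod([vt1[j] for j in deg])

print(coeffs[::2])                                           # 32768, 23040, 11070, 6647.5, 4620, 3927, ...
print(np.abs(np.polynomial.polynomial.polyroots(coeffs)))    # all moduli equal 1
```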

4 Selected Remarks About the Interval \(1<q<2\)

In this section we collected several remarks about the interval \(1<q<2\).

4.1 Two different quantities

In this section we aim to explain a seemingly negligible difference that makes the intervals \(q\ge 2\) and \(1<q<2\) behave really differently.

Once again let \(N=M_2'\) and \(\underline{\mu }=\underline{\nu }_2\), and parametrize the distribution h in the Bethe recursion as follows:

$$\begin{aligned} h=\left( \frac{R}{R+q-1},\frac{q-1}{R+q-1}\right) . \end{aligned}$$

Then

$$\begin{aligned} \textrm{BP}(h)_{1}=\frac{1}{z_h}\left( \frac{(1+w)R+q-1}{R+q-1}\right) ^{d-1} \end{aligned}$$

and

$$\begin{aligned} \textrm{BP}(h)_{2}:=\frac{1}{z_h}(q-1)\left( \frac{R+w+q-1}{R+q-1}\right) ^{d-1}. \end{aligned}$$

If \(\textrm{BP}(h)=h\), then by dividing the Bethe recursions for \(h_1\) and \(h_2\) we get that

$$\begin{aligned} R=\left( \frac{(1+w)R+q-1}{R+w+q-1}\right) ^{d-1}. \end{aligned}$$

We remark that if we study the Potts model \(N=M=wI_q+J_q\) and \(\mu \equiv 1\) with

$$\begin{aligned} h=\left( \frac{R}{R+q-1},\frac{1}{R+q-1},\dots ,\frac{1}{R+q-1}\right) , \end{aligned}$$

then we would have arrived at the same equation. Let \(\mathcal {R}_{d,q,w}\) be the set of non-negative solutions of this equation. Let \(\mathcal {R}^*_{d,q,w}\) be the set of those solutions that also satisfy \(R\ge 1\). Let us introduce the notation

$$\begin{aligned} \overline{\mathbb {F}}(R,d,q,w)&=\left( \frac{(1+w)R+q-1}{\sqrt{(1+w)(R^2+q-1)+2R(q-1)+(q-1)(q-2)}}\right) ^d\\&\quad +(q-1)\left( \frac{R+q+w-1}{\sqrt{(1+w)(R^2+q-1)+2R(q-1)+(q-1)(q-2)}}\right) ^d. \end{aligned}$$

Then we know that

$$\begin{aligned} \Phi _d(M'_2,\nu _2)=\max _{R\in \mathcal {R}_{d,q,w}}\overline{\mathbb {F}}(R,d,q,w). \end{aligned}$$

For later use let us also introduce

$$\begin{aligned} \Phi ^*_d(M'_2,\nu _2)=\max _{R\in \mathcal {R}^*_{d,q,w}}\overline{\mathbb {F}}(R,d,q,w). \end{aligned}$$

Similarly, we can consider the pair

$$\begin{aligned} \Phi _{d,q,w}=\max _{t\in [0,2\pi ]}\Phi _{d,q,w}(t)\ \ \text {and}\ \ \ \Phi ^*_{d,q,w}=\max _{t\in \left[ 0,\frac{\pi }{2}\right] }\Phi _{d,q,w}(t). \end{aligned}$$

In case of \(q\ge 2\) we have

$$\begin{aligned} \Phi _d(M'_2,\nu _2)=\Phi _{d,q,w}=\Phi ^*_d(M'_2,\nu _2)=\Phi ^*_{d,q,w}. \end{aligned}$$

But when \(1<q<2\) we have

$$\begin{aligned} \Phi _d(M'_2,\nu _2)=\Phi _{d,q,w}>\Phi ^*_d(M'_2,\nu _2)=\Phi ^*_{d,q,w}. \end{aligned}$$

While it is still true that for an essentially large girth sequence of d-regular graphs we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z^{(2)}_{G_n}(q,w)=\ln \Phi _{d,q,w}, \end{aligned}$$

we actually believe that

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\ln \Phi ^*_{d,q,w}. \end{aligned}$$

This means that the rank 2 approximation is not good enough in the interval \(1<q<2\).
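
For concrete parameter values the two maxima can be compared numerically; the sketch below (with the illustrative choice \(d=3\), \(q=1.5\)) evaluates \(\Phi _{d,q,w}=\max _{t\in [0,2\pi ]}\Phi _{d,q,w}(t)\) and \(\Phi ^*_{d,q,w}=\max _{t\in [0,\pi /2]}\Phi _{d,q,w}(t)\) on a small grid of w.

```python
import numpy as np

d, q = 3, 1.5
t_full = np.linspace(0, 2 * np.pi, 400001)
t_half = np.linspace(0, np.pi / 2, 100001)

def Phi_vals(t, w):
    a1 = np.sqrt(1 + w / q) * np.cos(t) + np.sqrt((q - 1) * w / q) * np.sin(t)
    a2 = np.sqrt(1 + w / q) * np.cos(t) - np.sqrt(w / (q * (q - 1))) * np.sin(t)
    return a1 ** d + (q - 1) * a2 ** d

for w in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(w, Phi_vals(t_full, w).max(), Phi_vals(t_half, w).max())   # Phi_{d,q,w} versus Phi^*_{d,q,w}
```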

Nevertheless, by Remark 3.53 we know that for \(1<q<2\) and \(0\le w\le \frac{q}{d-2}\) we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\ln \left( q\left( 1+\frac{w}{q}\right) ^{d/2}\right) . \end{aligned}$$

We remark that this result is compatible with the conjecture

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{v(G_n)}\ln Z_{G_n}(q,w)=\ln \Phi ^*_{d,q,w} \end{aligned}$$

since for the function \(\Phi _{d,q,w}(t)\) we have

$$\begin{aligned} \frac{\partial }{ \partial t}\Phi _{d,q,w}(t)\bigg |_{t=0}=0\ \ \ \text {and}\ \ \ \frac{\partial ^2}{\partial t^2}\Phi _{d,q,w}(t)\bigg |_{t=0}=d\left( 1+\frac{w}{q}\right) ^{d/2-1}((d-2)w-q) \end{aligned}$$

which is negative if \(w<\frac{q}{d-2}\) and positive if \(w>\frac{q}{d-2}\). So in the first case we get that \(t=0\) is a local maximum, while in the second case it is a local minimum.