1 Introduction

Given two graphs \(\mathbb{G }\) and \(W\), a graph homomorphism is a mapping from the nodes of \(\mathbb{G }\) to the nodes of \(W\), such that every edge in \(\mathbb{G }\) is mapped onto an edge in \(W\). When nodes and edges of \(W\) are weighted, the homomorphism inherits a certain weight itself (see Sect. 2 for details). A sequence of \(N\)-node graphs \(\mathbb{G }_N, N\ge 1\) is defined to be right-converging with respect to \(W\) if the logarithm of the sum of homomorphism weights, normalized by \(N\), has a limit. In the statistical physics terminology, the nodes of \(W\) correspond to spin values, and the sum of homomorphism weights is called the partition function. Graph homomorphisms have been studied from the statistical physics perspective in many papers, including [9, 10]. See also Chapter 4 of the recent book by Lovász [19] on this subject.

The theory of graph convergence is now well developed for the case of dense graphs (graphs with number of edges of the order \(N^2\)), see [2, 3, 12, 20], where the normalization is appropriately \(N^2\), not \(N\). The theory of sparse graph convergence, however, presents some fundamental challenges [4, 7], as even some of the basic properties of convergence for sparse graphs remain conjectures at best. For example, it is an open problem to show that the most basic sequence of sparse random graphs, namely the sequence of sparse Erdös-Rényi graphs, is right-converging with respect to every target graph \(W\).

In this paper we prove a special case of this conjecture, under the assumption that a certain tensor product associated with the target graph \(W\) satisfies a certain convexity property. While the convexity assumption is somewhat complicated in the fully general case (see Theorem 1, which is the main result of the paper), for the special case when \(\mathbb{G }_N\) is a sequence of random graphs (as opposed to hyper-graphs) and when the matrix \(J\) of edge weights corresponding to \(W\) is deterministic and symmetric, the convexity assumption means the existence of a value \(\alpha \) which is larger than the largest element of \(J\), such that \(\alpha -J\) is positive semi-definite. Our additional technical assumption is the existence of spin values (labels of \(W\)) with positive interaction with every other spin value (namely the corresponding edge weights are positive). This assumption is adopted in order to avoid the possibility of a vanishing partition function. See [1], where a similar issue is treated instead by conditioning on the partition function staying positive. We formulate the problem of right-convergence both for the case of discrete and continuous spin values, the latter corresponding to the case when the nodes of \(W\) are indexed by real values and node and edge weights are random quantities described by some random measurable functions. Our framework is rich enough to accommodate a recent conjecture by Talagrand regarding the existence of the limit of the measure of a set obtained from \(\mathbb{R }^N\) by intersecting it with subsets of \(\mathbb{R }^N\), chosen i.i.d. according to some common probability law; see Sect. 3 for the precise statement. Stated as Research Problem 6.7.2 in his recent book [24], this is one of many fascinating so-called Level III problems in the book.

Our proof method is based on the interpolation technique introduced by Guerra and Toninelli [13] in the context of the Sherrington–Kirkpatrick model in statistical physics and further developed for the case of sparse graphs (called diluted spin glass models in the statistical physics literature) by Franz and Leone [14, 15], Panchenko and Talagrand [23], Montanari [22], Bayati et al. [5], and Abbe and Montanari [1]. The idea is to build a sequence of graphs interpolating between a random graph on \(N\) nodes, on the one hand, and a disjoint union of two random graphs with \(N_1\) and \(N_2\) nodes on the other hand, where \(N_1+N_2=N\). The interpolation is constructed in such a way that at every step of the interpolation the log-partition function increases or decreases in expectation (depending on the model). Such a property means that the expected log-partition function is sub- or super-additive in \(N\), thus implying the existence of the limit by Fekete’s lemma. The convergence with high probability requires an additional concentration type argument, which for our case turns out to be more involved than usual, due to the continuity of spin values. We show that the super-additivity property holds for the class of models of interest when the assumptions of our main theorem hold. We further verify these assumptions for a broad range of models considered earlier in the literature. In particular, most of the results obtained in [5], including those regarding independent set, coloring, Ising and random K-SAT models, follow from the main result of the present paper as special cases. As mentioned above, we do suspect that the right-convergence holds for sparse Erdös-Rényi graphs for all target graphs \(W\) and state this explicitly as a conjecture, since at the present time we do not have a single counterexample, however contrived.

The remainder of the paper is structured as follows. The notions of graph homomorphisms and right-convergence are introduced in the following section, where many examples are discussed as well. The same section also introduces the definition of graph homomorphisms for continuous spin values. While continuous spin models are ubiquitous in the physics literature, they are rarely discussed in the context of graph theory. Our modeling assumptions, conjectures and main result are stated in Sect. 3. In the same section we provide an in-depth discussion of the convexity of tensor products property—the principal technical tool underlying our main result—and discuss its relevance to the examples introduced earlier in Sect. 2. In Sect. 4 we establish some basic properties of log-partition functions and prove a concentration result. The interpolation technique is introduced in Sect. 5, where the proof of our main result is also found.

We finish this section by introducing some notation. \(\mathbf{1}_d=(1,1,\ldots ,1)^T\) denotes the \(d\)-dimensional vector of ones. \({\varvec{I}}(A)\) is the indicator function, taking value \(1\) when the event \(A\) takes place and zero otherwise. Throughout the paper we use the standard order of magnitude notations \(O(\cdot )\) and \(o(\cdot )\), where the constants hidden in \(O(\cdot )\) should be clear from the context. \(\mathbb{R }~ (\mathbb{R }_+)\) denotes the set of real (non-negative real) values.

2 Graph homomorphisms and right-convergence

Consider a \(K\)-uniform directed hypergraph \(\mathbb{G }\) on nodes \(\{1,2,\ldots ,N\}\triangleq V(\mathbb{G })\). Let \(E(\mathbb{G })\) be the set of hyperedges of \(\mathbb{G }\), where each hyperedge \(e=(u^e_1,\ldots ,u^e_K)\in E(\mathbb{G })\) is an ordered set of \(K\) nodes in \(\mathbb{G }\). We fix a positive integer \(q\) and refer to integers \(0,1,\ldots ,q-1\) as colors or spin values, interchangeably. For every node \(u\in V(\mathbb{G })\) of the hypergraph, let \(\mathcal N (u,\mathbb{G })\) be the set of edges incident to \(u\). Then \(|\mathcal N (u,\mathbb{G })|\) is the degree of \(u\) in \(\mathbb{G }\), namely the total number of hyperedges containing \(u\). Similarly, for every edge \(e\), let \(\mathcal N (e,\mathbb{G })\) be the set of hyperedges incident to \(e\), including \(e\). Namely, \(\mathcal N (e,\mathbb{G })\) is the set of hyperedges sharing at least one node with \(e\). For simplicity, from this point on we will use the terms graphs and edges in place of hypergraphs and hyperedges.

Each node \(u\in V(\mathbb{G })\) of \(\mathbb{G }\) is associated with a random map \(h_u:\{0,1,\ldots ,q-1\}\mapsto \mathbb{R }_+\) called the potential at node \(u\). The sequence \(h_u, u\in V(\mathbb{G })\) is assumed to be i.i.d., distributed according to some probability measure \(\nu _h\). Similarly, each hyperedge \(e\in E(\mathbb{G })\) is associated with a random map \(J_e:\{0,1,\ldots ,q-1\}^K\mapsto \mathbb{R }_+\) called the potential at hyperedge \(e\). The sequence \(J_e, e\in E(\mathbb{G })\) is i.i.d. as well, with common probability measure \(\nu _J\). Further details concerning the probability measures \(\nu _h\) and \(\nu _J\) will be discussed later.

Given an arbitrary map \(\sigma :V(\mathbb{G })\mapsto \{0,\ldots ,q-1\}\), we associate with it a (random) weight

$$\begin{aligned} H(\sigma )\triangleq \prod _{u\in V(\mathbb{G })}h_u(\sigma (u))\prod _{e\in E(\mathbb{G })} J_{e}(\sigma (u^e_1),\ldots ,\sigma (u^e_K)). \end{aligned}$$
(1)

Any such map is called a homomorphism, for reasons explained below. The following random variable is defined to be the partition function of \(\mathbb{G }\):

$$\begin{aligned} Z(\mathbb{G })\triangleq \sum _\sigma H(\sigma ), \end{aligned}$$
(2)

where the sum is taken over all maps \(\sigma :V(\mathbb{G })\mapsto \{0,\ldots ,q-1\}\). We note that the value \(Z(\mathbb{G })=0\) is possible according to this definition. In case \(Z(\mathbb{G })>0\), this induces a random Gibbs probability measure on the set of maps \(\sigma \), where the probability mass of \(\sigma \) is \(H(\sigma )/Z(\mathbb{G })\). The graph \(\mathbb{G }\) with potentials \(h_u,J_e\) and the corresponding Gibbs measure is also commonly called Markov Random Field in the Electrical Engineering literature.

Already a very rich class of models is obtained when the potentials \(h_u\) and \(J_e\) are deterministic, denoted by \(h\) and \(J\) for simplicity. In this case \(h\) and \(J\) induce a weighted complete graph on \(q\) nodes with node weights defined by \(h\) and edge weights defined by \(J\). We denote this weighted graph by \(W\). Consider further the special case when \(h=1, K=2\) and \(J\) is a symmetric zero-one valued function of its two arguments. Then we can think of \(W\) as an undirected unweighted graph on nodes \(0,\ldots ,q-1\), where \((i,j)\) is an edge in \(W\) if and only if \(J(i,j)=1\) (the symmetry of \(J\) makes the definition consistent). Then observe that \(H(\sigma )=1\) if \(\sigma \) defines a (proper) graph homomorphism from \(\mathbb{G }\) to \(W\) and \(H(\sigma )=0\) otherwise. (Recall that given two graphs \(\mathbb{G }_1,\mathbb{G }_2\), a map \(\sigma :V(\mathbb{G }_1)\mapsto V(\mathbb{G }_2)\) is called a graph homomorphism if for every \((i_1,i_2)\in E(\mathbb{G }_1)\) we have \((\sigma (i_1),\sigma (i_2))\in E(\mathbb{G }_2)\).) Thus \(Z(\mathbb{G })\) is the number of homomorphisms from \(\mathbb{G }\) to \(W\). In this sense we can think of an arbitrary map \(\sigma :V(\mathbb{G })\mapsto \{0,1,\ldots ,q-1\}\) as a graph homomorphism, with the associated partition function \(Z(\mathbb{G })\) giving the “number” of homomorphisms.
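As a sanity check of this counting interpretation, here is a small brute-force sketch (our own illustration; the helper name is ours, not from the paper) that enumerates all maps \(\sigma \) and counts the proper homomorphisms into an unweighted target \(W\):

```python
from itertools import product

def count_homomorphisms(edges_G, edges_W, q, N):
    """Count maps sigma: {0,...,N-1} -> {0,...,q-1} sending every edge of
    G into an edge of W (W given as a set of ordered pairs of colors)."""
    total = 0
    for sigma in product(range(q), repeat=N):
        if all((sigma[u], sigma[v]) in edges_W for (u, v) in edges_G):
            total += 1
    return total

# G = a single edge (0, 1); W = complete graph on q = 3 nodes, no loops
# (the proper 3-coloring target). An edge maps into K_3 in 3 * 2 = 6 ways.
W = {(i, j) for i in range(3) for j in range(3) if i != j}
print(count_homomorphisms([(0, 1)], W, q=3, N=2))  # prints 6
```

Here \(Z(\mathbb{G })\) equals the returned count because every \(H(\sigma )\) is either \(0\) or \(1\).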

Let us discuss some well-known examples from combinatorics and statistical physics in the context of our definition.

2.1 Examples

Independent sets (Hard-Core) model. A parameter \(\lambda >0\) is fixed. \(K=2,q=2\). The node potentials are deterministic and are defined by \(h(1)=\lambda \) and \(h(0)=1\). The edge potentials are also deterministic and are defined by \(J(i_1,i_2)=0\) if \(i_1=i_2=1\) and \(J(i_1,i_2)=1\) otherwise. Then \(H(\sigma )>0\) if the set of nodes \(u\) with \(\sigma (u)=1\) is an independent set in \(\mathbb{G }\), and \(H(\sigma )=0\) otherwise. Moreover, in the former case \(H(\sigma )=\lambda ^{k}\), where \(k\) is the cardinality of the corresponding independent set. Thus \(Z(\mathbb{G })\) is the usual partition function associated with the independent set model, also known as the hard-core model. The parameter \(\lambda \) is commonly called the activity or fugacity. When \(\lambda =1, Z(\mathbb{G })\) is simply the number of independent sets in the graph \(\mathbb{G }\).
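To see the identification \(Z(\mathbb{G })=\sum _{I}\lambda ^{|I|}\) (sum over independent sets \(I\)) concretely, the following sketch (illustrative only) evaluates the partition function (2) with the hard-core potentials above:

```python
from itertools import product

def hardcore_Z(edges, N, lam):
    """Partition function of the hard-core model: sum over sigma in {0,1}^N
    of lam^(#ones), times the indicator that the ones form an independent set."""
    Z = 0.0
    for sigma in product((0, 1), repeat=N):
        H = lam ** sum(sigma)             # product of node potentials
        if any(sigma[u] == 1 and sigma[v] == 1 for (u, v) in edges):
            H = 0.0                       # some edge potential J = 0
        Z += H
    return Z

# Triangle on 3 nodes with lam = 1: independent sets are {}, {0}, {1}, {2}.
print(hardcore_Z([(0, 1), (1, 2), (0, 2)], N=3, lam=1.0))  # prints 4.0
```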

Colorings (Potts) model. \(K=2, h_u=h\equiv 1\). A parameter \(\beta \in \mathbb{R }\) is fixed. The edge potentials are identical for all edges and given by \(J(i_1,i_2)=1\) if \(i_1\ne i_2\) and \(=\exp (-\beta )\) otherwise. In this case \(H(\sigma )=\exp (-\beta k)\), where \(k\) is the number of monochromatic edges, namely edges receiving the same color at the incident vertices. The case \(\beta >0 ~(\beta <0)\) is usually called anti-ferromagnetic (ferromagnetic) Potts model. In this paper we focus on the anti-ferromagnetic case.

Ising model. \(K=2,q=2\). \(h_u(1)=h(1)=h\), for some constant value \(h\in \mathbb{R }\), and \(h_u(0)=h(0)=1\). The edge potentials are identical for all edges and given by \(J(i_1,i_2)=\exp (-\beta {\varvec{I}}(i_1\ne i_2)+\beta {\varvec{I}}(i_1=i_2))\), for some parameter \(\beta \). As for the coloring model, the case \(\beta >0\) (\(\beta <0\)) is called ferromagnetic (anti-ferromagnetic) Ising model. Parameter \(h\) is called the external magnetization. It is more common to consider \(\{-1,1\}\) as opposed to \(\{0,1\}\) as a spin value space, in which case we can simply write \(J(i_1,i_2)=\exp (\beta i_1 i_2)\). When \(h=1\), it is easy to see that the Ising model is equivalent to a special case of the Potts model when \(q=2\), by multiplying each \(H(\sigma )\) by a constant factor.

Viana–Bray model.   \(q=2\). \(h(1)=h>0\), for some constant value \(h\), and \(h(0)=1\). A random variable \(I\), which is symmetric around zero and has bounded support, and a parameter \(\beta >0\) are fixed. Let \(J(i_1,\ldots ,i_K)=\exp \left( \beta I\prod _{1\le l\le K}({\varvec{I}}(i_l=1)-{\varvec{I}}(i_l=0))\right) \). A more common approach is to fix the set of colors to be \(\{-1,1\}\), in which case \(J(i_1,\ldots ,i_K)=\exp \left( \beta I\prod _{1\le l\le K}i_l\right) \). The assumption of bounded support is not standard, but is adopted in this paper for convenience. One can think of the Viana–Bray model as a randomized mixture of ferromagnetic and anti-ferromagnetic Ising models.

XOR Model. Consider the Viana–Bray model with \(h=1\) and \(I\) taking values \(1\) and \(-1\) with equal probability. For convenience let us assume that the spin values are \(-1\) and \(1\) as opposed to \(0\) and \(1\). When \(I=1\), the potential \(J(i_1,\ldots ,i_K)\) takes value \(\exp (\beta )\) if and only if an even number of the \(i_k, 1\le k\le K\), equal \(-1\), and otherwise it takes value \(\exp (-\beta )\). The situation is reversed when \(I=-1\). A more common definition of the XOR model is as follows: \(J(i_1,\ldots ,i_K)=(I+i_1\cdots i_K )\mod (2)\). Namely \(J=0\) when the parity of the product \(i_1\cdots i_K\) coincides with the parity of \(I\), and \(J=1\) otherwise. Thus the model involves hard-core interaction (namely allows \(J\) to be zero). In this case the corresponding partition function \(Z(\mathbb{G })\) is the number of valid assignments, namely assignments such that the potential value at every edge equals unity. This version of the model is outside of the scope of this paper, but we do notice that we obtain it from our version by defining \(J(i_1,\ldots ,i_K)=\exp \left( -\beta +\beta I\prod _{1\le l\le K}i_l\right) \) instead, and sending \(\beta \) to \(+\infty \). The conditions for the existence of valid assignments and the number of valid assignments have been subjects of study in their own right [1, 17, 21].

Random K-SAT model. This is our first example for which considering the order of nodes in edges \(e=(u^e_1,\ldots ,u^e_K)\) is relevant. We set \(q=2\). \(h_u=h\equiv 1\) for all \(u\). For every edge \(e\) a vector \((i_1^*,\ldots ,i_K^*)\) is selected uniformly at random from \(\{0,1\}^K\). We define

$$\begin{aligned} J_e(i_1,\ldots ,i_K)=\left\{ \begin{array}{ll} 1, &{} (i_1,\ldots ,i_K)\ne (i_1^*,\ldots ,i_K^*) ; \\ \exp (-\beta ), &{} (i_1,\ldots ,i_K)=(i_1^*,\ldots ,i_K^*). \end{array} \right. \end{aligned}$$

Again, a more common version of this model is to define \(J_e(i_1,\ldots ,i_K)\) to be \(0\) when \((i_1,\ldots ,i_K)=(i_1^*,\ldots ,i_K^*)\) and \(1\) otherwise, in which case \(Z(\mathbb{G })\) is the number of satisfying assignments. As for the Viana–Bray model, this version can be seen as the limit of our model as \(\beta \rightarrow \infty \).
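The soft K-SAT model above can be simulated directly; the following sketch (our own illustration, with hypothetical helper names) builds random clauses as (variable tuple, forbidden pattern) pairs and checks numerically that \(Z(\mathbb{G })\) approaches the number of satisfying assignments as \(\beta \) grows:

```python
import random
from itertools import product
from math import exp

def ksat_Z(clauses, N, beta):
    """Z of the soft K-SAT model: each clause e = (vars, forbidden) pays a
    factor exp(-beta) when its variables hit the forbidden pattern, 1 otherwise."""
    Z = 0.0
    for sigma in product((0, 1), repeat=N):
        H = 1.0
        for (vars_, forb) in clauses:
            if tuple(sigma[v] for v in vars_) == forb:
                H *= exp(-beta)
        Z += H
    return Z

rng = random.Random(0)
K, N, M = 3, 6, 5
clauses = [(tuple(rng.randrange(N) for _ in range(K)),
            tuple(rng.randrange(2) for _ in range(K))) for _ in range(M)]

# As beta -> infinity, Z converges to the number of satisfying assignments.
num_sat = sum(all(tuple(sigma[v] for v in vs) != fb for (vs, fb) in clauses)
              for sigma in product((0, 1), repeat=N))
print(abs(ksat_Z(clauses, N, beta=50.0) - num_sat) < 1e-6)  # prints True
```

Each non-satisfying assignment contributes at most \(\exp (-\beta )\), so the gap to the satisfying count is exponentially small in \(\beta \).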

2.2 Continuous spin values

We now generalize the notion of graph homomorphisms to the case of real valued colors (spins). This is achieved by making \(h\) and \(J\) random measurable functions with real valued inputs. Specifically, assume that we have random i.i.d. (Lebesgue) measurable functions \(h_u:\mathbb{R }\mapsto \mathbb{R }_+, ~u\in V(\mathbb{G })\), and random i.i.d. (Lebesgue) measurable functions \(J_e:\mathbb{R }^K\mapsto \mathbb{R }_+, ~e\in E(\mathbb{G })\). The probability measures corresponding to the randomness in the choices of \(h_u\) and \(J_e\) are again denoted by \(\nu _h\) and \(\nu _J\), respectively. Every map \(\sigma :V(\mathbb{G })\mapsto \mathbb{R }\) is associated with a random weight defined as

$$\begin{aligned} H(\sigma )\triangleq \prod _{u\in V(\mathbb{G })}h_u(\sigma (u))\prod _{e\in E(\mathbb{G })} J_{e}(\sigma (u^e_1),\ldots ,\sigma (u^e_K)). \end{aligned}$$

As in the discrete case, we call any such map a homomorphism and do not distinguish between homomorphisms with zero vs positive weights. The associated partition function is defined as the following integral taken in the Lebesgue sense:

$$\begin{aligned} Z(\mathbb{G })\triangleq \int \limits _{\mathbb{R }^N} H(\sigma )d\sigma . \end{aligned}$$

A more conventional way to write the partition function is to think of \((\sigma (u), u\in V(\mathbb{G }))\) as an \(N=|V(\mathbb{G })|\)-dimensional real vector \((x_u, u\in V(\mathbb{G }))=(x_1,\ldots ,x_N)\), thus defining the associated partition function

$$\begin{aligned} Z(\mathbb{G })=\int \limits _{x\in \mathbb{R }^N} \prod _{1\le u\le N}h_u(x_u)\prod _{e\in E(\mathbb{G })} J_{e}(x_{u^e_1},\ldots ,x_{u^e_K})dx. \end{aligned}$$

We adopt this notational convention from this point on. It is simple to see that the discrete case is a special case of the continuous spin value model. Indeed given a discrete model corresponding to some \(K,q\) and realizations of \(h_u,J_e\), define for every \(x\in \mathbb{R }\)

$$\begin{aligned} h_u(x)=\left\{ \begin{array}{l@{\quad }l} h_u(i), &{} \hbox {if} \,\,x\in [i,i+1), \hbox {for}\quad i=0,1,\ldots ,q-1; \\ 0, &{} \hbox {otherwise.} \end{array} \right. \end{aligned}$$
(3)

Similarly, \(J_e(x_1,\ldots ,x_K)=J_e(i_1,\ldots ,i_K)\) if \(x_{l}\in [i_l,i_l+1), l=1,\ldots ,K,\) and \(J_e=0\) otherwise. Then \(\int H(x)dx=\sum _{\sigma }H(\sigma )\), with the appropriate meaning of \(x\) and \(\sigma \) in the two expressions.
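The identity \(\int H(x)dx=\sum _\sigma H(\sigma )\) can be checked numerically on a toy model; in the sketch below (illustrative values, not from the paper) the embedded potentials are piecewise constant on unit boxes, so a midpoint Riemann sum recovers the discrete partition function exactly:

```python
import math

# Toy discrete model (q = 3 colors, a single edge, K = 2); arbitrary values.
q = 3
h = [0.5, 1.0, 2.0]
J = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.5]]

# Discrete partition function for the single-edge graph (u, v) = (0, 1).
Z_disc = sum(h[i] * h[j] * J[i][j] for i in range(q) for j in range(q))

# Continuous embedding (3): piecewise-constant h and J on unit intervals.
def h_cont(x):
    return h[math.floor(x)] if 0 <= x < q else 0.0

def J_cont(x1, x2):
    if 0 <= x1 < q and 0 <= x2 < q:
        return J[math.floor(x1)][math.floor(x2)]
    return 0.0

# Midpoint Riemann sum; the integrand is constant on unit boxes, so the
# sum reproduces the discrete partition function (up to rounding error).
m = 20                              # grid points per unit interval
dx = 1.0 / m
grid = [(k + 0.5) * dx for k in range(q * m)]
Z_cont = sum(h_cont(x1) * h_cont(x2) * J_cont(x1, x2)
             for x1 in grid for x2 in grid) * dx * dx
print(abs(Z_disc - Z_cont) < 1e-9)  # prints True
```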

The following example of a continuous spin valued model is the subject of Talagrand’s conjecture, which is stated precisely in Sect. 3.1 below. Let \(h\) be a Gaussian kernel \(h(x)\triangleq (2\pi )^{-{1\over 2}}\exp (-x^2/2), x\in \mathbb{R }\) and let \(J:\mathbb{R }^K\mapsto \{0,1\}\) be a zero-one valued deterministic function. Equivalently \(J\) defines a set \(A_J=\{x\in \mathbb{R }^K: J(x)=1\}\). Then \(Z(\mathbb{G })\) is the probability that a vector \(X_1,\ldots ,X_N\) of i.i.d. standard normal random variables is such that \((X_{u^e_1},\ldots ,X_{u^e_K})\) belongs to the set \(A_J\) for every edge \(e\in E(\mathbb{G })\). If in place of a Gaussian kernel we use \(h\equiv 1\), then \(Z(\mathbb{G })\) is instead the Lebesgue measure of the corresponding set. Another set of examples associated with continuous spin valued models is obtained by considering linear programming relaxations of some of the discrete models considered in the earlier section, see [16] for more details.

2.3 Right-convergence of graph sequences

We will be interested primarily in the case of sequences of sparse graphs \(\mathbb{G }\), which for the purposes of this paper we define as a sequence of graphs \((\mathbb{G }_N, N\ge 1)\) such that \(\sup _N |E(\mathbb{G }_N)|/|V(\mathbb{G }_N)|<\infty \). Namely, the number of edges grows at most linearly in the number of nodes. For simplicity we assume from now on that \(|V(\mathbb{G }_N)|=N\) and \(V(\mathbb{G }_N)=\{1,\ldots ,N\}\). The right-convergence of graph sequences concerns the existence of the limit of the normalized log-partition function \(N^{-1}\log Z(\mathbb{G }_N)\). Here \(\log Z(\mathbb{G }_N)\) is defined to be \(-\infty \) in the case \(Z(\mathbb{G }_N)=0\). The following definition applies to both the discrete and continuous spin value cases.

Definition 1

Given probability measures \(\nu _h,\nu _J\), a sequence of graphs \((\mathbb{G }_N)\) is defined to be right-converging with respect to \(\nu _h\) and \(\nu _J\) if the sequence \(N^{-1} \log Z(\mathbb{G }_N)\) converges in distribution, as \(N\rightarrow \infty \), to some random variable \(Z\). The sequence of graphs is defined to be right-converging if it is right-converging with respect to every \(\nu _h,\nu _J\).

In many interesting cases, including the cases considered in this paper, the random variable \(N^{-1}\log Z(\mathbb{G }_N)\) is concentrated around its mean \(\mathbb{E }[N^{-1}\log Z(\mathbb{G }_N)]\), in which case right-convergence amounts to convergence in probability. Namely, the sequence \(\mathbb{G }_N\) is right-converging if there exists a deterministic quantity \(z\) such that

$$\begin{aligned} \lim _{\epsilon \downarrow 0}\lim _{N\rightarrow \infty }\mathbb{P }\left( |N^{-1}\log Z(\mathbb{G }_N)-z|>\epsilon \right) =0. \end{aligned}$$

It is easy to construct examples of trivially converging graph sequences. For example, suppose \(E(\mathbb{G }_N)=\emptyset \). Then

$$\begin{aligned} Z(\mathbb{G }_N)=\prod _{u\in V(\mathbb{G }_N)} \int \limits _{x\in \mathbb{R }}h_u(x)dx, \end{aligned}$$

implying that \(N^{-1}\log Z(\mathbb{G }_N)\) is an average of an i.i.d. sequence of random variables distributed as \(\log \int _{x\in \mathbb{R }}h_u(x)dx\). The limit equals \(\mathbb{E }\log \int h_u(x)dx\) if \(\mathbb{E }\left|\log \int h_u(x)dx\right| <\infty \), or is possibly described by a stable law otherwise. In general, it is easy to see that if \(\mathbb{G }_N\) is a disjoint union of \(N/k\) identical graphs with \(k\) nodes, where \(k\) is constant, then \(\mathbb{G }_N\) is right-converging.

The notion of right-convergence stands in contrast to the notion of left-convergence [8], which is roughly defined as the convergence of constant depth neighborhoods of uniformly randomly chosen nodes in \(\mathbb{G }_N\). We do not provide a formal definition of left-convergence since we will not be working with this notion in this paper. It is known that right-convergence implies left-convergence [4], but very simple examples show that the converse is not true (see again [4]).

We are interested in defining right-convergence for sequences of random graphs. A sequence of random graphs \(\mathbb{G }_N, N\ge 1\) is defined to be right-converging if it is right-converging with respect to the randomness associated both with the graph and the potentials \(h_u,J_e\). In this paper we consider exclusively the following sequence of sparse random graphs on \(N\) nodes, known as the (sparse) Erdös-Rényi graph, denoted by \(\mathbb{G }(N,c)\). A constant \(c>0\) is fixed. For each \(j=1,2,\ldots ,\lfloor cN\rfloor \) the \(j\)-th edge is an ordered \(K\)-tuple \(e_j=(u^{e_j}_1,\ldots ,u^{e_j}_K)\) chosen uniformly at random with replacement from the set of nodes \(1,2,\ldots ,N\). It is not hard to show that the graph sequence \(\mathbb{G }(N,c), N\ge 1\) is right-converging when \(c<(K(K-1))^{-1}\), as in this case the graph breaks down into a disjoint union of linearly many graphs with a constant average size [6, 18]. A far more interesting case is when \(c>(K(K-1))^{-1}\), as in this case a giant (linear size) connected component exists and understanding the limit of the log-partition function in this regime is far from trivial.
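A sampler for \(\mathbb{G }(N,c)\) as just described might look as follows (the function name is ours; note that edges are ordered \(K\)-tuples drawn with replacement, so repeated nodes within an edge are allowed):

```python
import random

def sample_G_N_c(N, c, K, rng):
    """Sample the sparse Erdos-Renyi hypergraph G(N, c): floor(c*N)
    hyperedges, each an ordered K-tuple of nodes drawn uniformly with
    replacement from {1, ..., N}."""
    M = int(c * N)                        # floor(c * N) edges for c, N > 0
    return [tuple(rng.randint(1, N) for _ in range(K)) for _ in range(M)]

edges = sample_G_N_c(N=100, c=1.5, K=3, rng=random.Random(0))
print(len(edges))                                     # prints 150
print(all(1 <= u <= 100 for e in edges for u in e))   # prints True
```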

3 Conjectures and the main result

3.1 Assumptions and main results

Our main goal is establishing the conditions under which the sequence of Erdös-Rényi graphs is right-converging. Since there is no particular reason why there should exist measures \(\nu _h,\nu _J\) and a constant \(c>0\) such that \(\mathbb{G }(N,c)\) fails to be right-converging with respect to \(\nu _h,\nu _J\), and since, at the very least, no counterexamples are known to this day, the following conjecture seems plausible.

Conjecture 1

For every \(c>0\), the random graph sequence \(\mathbb{G }(N,c)\) is right-converging.

Now we turn to Talagrand’s Problem 6.7.2 in [24], mentioned above. Framed in our terminology, it corresponds to the special case of Erdös-Rényi graph convergence when \(J\) takes only values \(0\) and \(1\) and \(h\) corresponds to a Gaussian kernel. There are compelling reasons to consider even this special case, and the details can be found in the aforementioned book. Furthermore, it is of interest to consider the even more restricted case when the sets defined by the condition \(J=1\) are convex. Since the motivation behind the latter assumption is beyond the scope of the present paper, we will not focus on it here.

The precise statement of Talagrand’s conjecture is as follows.

Conjecture 2

(Research Problem 6.7.2 in [24]) For every \(c>0\), the random graph sequence \(\mathbb{G }(N,c)\) is right-converging when \(h(x)=\exp (-x^2)\) and \(J\in \{0,1\}\).

Despite the Gaussian distribution suggested by the kernel \(h(x)=\exp (-x^2)\), one should be aware that we are dealing here with the case of a deterministic node potential. In fact, all of our previous examples correspond to deterministic node potentials as well. The author is not aware of interesting models with genuinely random node potentials studied in the past, but since the techniques of the present paper easily extend to the case of random node potentials, they are allowed in our model.

We now turn to assumptions needed for the statement and the proof of our main result.

Assumption 1

There exist values \(\kappa >0, \rho _{\max }\ge \rho _{\min }>0, 0<J_{\max }\le \rho _{\max }\) and \(\Omega _h\subset \mathbb{R }\) such that

  1.

    \([0,\kappa ]\subset \Omega _h\), and, almost surely with respect to \(\nu _h, h(x)=0\) for all \(x\notin \Omega _h\). Namely, the support of \(h\) lies in \(\Omega _h\) almost surely. Furthermore,

    $$\begin{aligned} \rho _{\min }\le \int \limits _{x\in [0,\kappa ]} h(x)dx\le \int \limits _\mathbb{R }h(x)dx\le \rho _{\max } \end{aligned}$$
    (4)

    almost surely.

  2.

    \(\sup _{x\in \mathbb{R }}J(x)\le J_{\max }\) almost surely.

  3.

    For every \(i=1,2,\ldots ,K\) almost surely

    $$\begin{aligned} \{x\in \mathbb{R }^K: x_i\in [0,\kappa ), x_k\in \Omega _h, 1\le k\le K\}\subset \{x\in \mathbb{R }^K: J(x)\ge \rho _{\min }\}. \end{aligned}$$

The last assumption means that there is a positive-measure set of spin values which have a positive interaction with every other choice of spin values within \(\Omega _h\). In the statistical physics terminology this means that continuous spin values in \([0,\kappa )\) have a positive (“soft”) interaction with all other spin values.

Let us now verify that Assumption 1 holds for all models discussed in Sect. 2.1. For all of these models we take \(\Omega _h=[0,q-1]\). Namely, it is the range of the continuous representation of the discrete set of colors \(0,1,\ldots ,q-1\). For the case of the independent set model, parts 1) and 2) are verified by taking \(\kappa =1,\rho _{\min }=\min (\lambda ,1), \rho _{\max }=1+\lambda \), and \(J_{\max }=1\). Part 3), concerning the soft-core interaction, is verified as well, since \(J\) takes value \(1\) as long as at least one of the arguments of \(J\) belongs to \([0,1)\).

Assumption 1 is verified in a straightforward way for the coloring and Ising models, as in these cases \(J\ge \exp (-|\beta |)>0\) for any choice of spin values. We set \(\kappa =q-1\) so that all values in the domain \(\Omega _h\) “qualify” for the soft-core interaction. We set \(\rho _{\max }=\max (J_{\max },q\max (1,h))\), \(J_{\max }=\exp (|\beta |)\), and \(\rho _{\min }=\min (\min (1,h),\exp (-|\beta |))\). For the Viana–Bray model we set \(J_{\max }=\rho _{\max }=\max (h,\exp (\beta c_I))\), where \([-c_I,c_I]\) is the support of \(I\). The remaining parts of the assumptions for the Viana–Bray model, and the verification of the assumptions for the XOR model, are similar. For the K-SAT model we set \(J_{\max }=1\), and \(\kappa =\rho _{\max }=2, \rho _{\min }=\exp (-\beta )\). It is easy to check that Assumption 1 is then verified.

In general, it is easy to see that for discrete deterministic models the only non-trivial part of Assumption 1 is the existence of a state with soft interactions. Namely, the existence of a state \(q_0\in \{0,1,\ldots ,q-1\}\) such that for every \(k=1,\ldots ,K\) and every \(i_1,\ldots ,i_K\in \{0,1,\ldots ,q-1\}\) with \(i_k=q_0\), we have \(J(i_1,\ldots ,i_K)>0\). In this case we can take

$$\begin{aligned} J_{\max }&= \max _{0\le i_1,\ldots ,i_K\le q-1}J(i_1,\ldots i_K), \end{aligned}$$
(5)
$$\begin{aligned} \rho _{\max }&= \max \left( J_{\max },q\max _i h(i)\right) , \end{aligned}$$
(6)
$$\begin{aligned} \rho _{\min }&= \min _{1\le k\le K}\min _{i_1,\ldots ,i_K: i_k=q_0}J(i_1,\ldots ,i_K). \end{aligned}$$
(7)
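For a discrete deterministic model given as a table, the parameters (5)-(7) can be computed mechanically; the sketch below (helper name ours) does so and checks the hard-core model with \(q_0=0\) as the soft state:

```python
from itertools import product

def soft_core_parameters(J, h, q, K, q0):
    """Compute J_max, rho_max, rho_min of (5)-(7) for a discrete
    deterministic model with soft state q0 (every potential value with
    some coordinate equal to q0 is positive). J maps K-tuples to weights."""
    J_max = max(J[idx] for idx in product(range(q), repeat=K))
    rho_max = max(J_max, q * max(h))
    # min over k and over tuples with i_k = q0, i.e. tuples containing q0
    rho_min = min(J[idx] for idx in product(range(q), repeat=K) if q0 in idx)
    return J_max, rho_max, rho_min

# Hard-core model (q = 2, K = 2, lambda = 1/2): q0 = 0 is a soft state,
# since J(i, j) = 0 only when i = j = 1.
lam = 0.5
J = {(i, j): 0.0 if i == j == 1 else 1.0 for i in range(2) for j in range(2)}
print(soft_core_parameters(J, h=[1.0, lam], q=2, K=2, q0=0))  # prints (1.0, 2.0, 1.0)
```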

We now turn to our next key assumption, regarding the convexity property of the edge potentials \(J_e\). For this purpose we need to resort to the notions of multidimensional arrays and their tensor products. An \(n\)-dimensional array \(A\) of \(K\)-th order is an ordered set of real values of the form \(A=(a_{\underline{i}})\), where \(\underline{i}=(i_1,\ldots ,i_K)\) and \(i_1,\ldots ,i_K\) vary over all indices \(1,2,\ldots ,n\). The tensor product of two arrays \(A=(a_{i_1,\ldots ,i_K})\) and \(B=(b_{i_1,\ldots ,i_K})\) of the same dimension \(n\) and order \(K\) is an \(n^2\)-dimensional array of order \(K\), denoted by \(A\otimes B\), where for each \(\underline{i}=(i_{1},\ldots ,i_{K})\) and \(\underline{j}=(j_{1},\ldots ,j_{K})\), the corresponding entry \((A\otimes B)_{\underline{i},\underline{j}}\) of the product \(A\otimes B\) is \(a_{\underline{i}}b_{\underline{j}}\). While one can define tensor products of arrays of different orders and dimensions, this will never be relevant for our purposes. Given a convex set \(S\subset \mathbb{R }^n\), an \(n\)-dimensional array \(A\) of order \(K\) is defined to be convex over \(S\) if the following multilinear form, defined on \(y=(y_1,\ldots ,y_n)\in S\), is convex:

$$\begin{aligned} (y_1,\ldots ,y_n)\mapsto \sum _{\underline{i}=(i_1,\ldots ,i_K)}y_{i_1}y_{i_2}\cdots y_{i_K}a_{\underline{i}}, \end{aligned}$$
(8)

where the sum is over all \(i_1,\ldots ,i_K\) taking values \(1,\ldots ,n\). When \(S=\mathbb{R }^n\) we simply say the array is convex. For example, a second order array is convex if and only if it is positive semi-definite. On the other hand, consider a one-dimensional array of order \(K=3\), defined by a single number \(a>0\). The corresponding multilinear form is just \(y\mapsto ay^3\), which is convex over \(S=\mathbb{R }_+\) but not over \(S=\mathbb{R }\). This observation will be useful in our analysis of the random K-SAT problem when \(K\) is odd. For brevity, we write \(\langle y,A\rangle \) for the expression on the right-hand side of (8). We now state our main result.

Theorem 1

Suppose Assumption 1 holds. Suppose further that there exists a constant \(\alpha \ge J_{\max }\) such that for every \(n,r\ge 1\) and almost surely every \(x_1,\ldots ,x_r\in \Omega _h^n, ~x_l=(x^l_1,\ldots ,x^l_n),~1\le l\le r\), the expected tensor product

$$\begin{aligned} \mathbb{E }\bigotimes _{1\le l\le r} A_l \end{aligned}$$
(9)

is convex on the set \(\mathbb{R }_+^{n^r}\), where \(A_l\) is an \(n\)-dimensional array of order \(K\) defined by

$$\begin{aligned} A_l=\alpha -J=\left( \alpha -J\left( x^l_{i_1},\ldots ,x^l_{i_K}\right) , ~1\le i_1,\ldots ,i_K\le n\right) . \end{aligned}$$

Then the graph sequence \(\mathbb{G }(N,c)\) is right-converging with respect to \(\nu _h,\nu _J\).

We note that the tensor product \(\bigotimes _{1\le l\le r} A_l\) is an \(n^r\)-dimensional array of order \(K\). We further note that, importantly, the same copy of random \(J\), generated according to \(\nu _J\), is used in this tensor product, and the expectation is with respect to the randomness of \(J\). The expectation operator when applied to arrays is understood componentwise.

As we will show in the next section, Theorem 1 covers many special cases, some of them already covered in the literature. In particular, the right-convergence for the K-SAT and Viana–Bray models was established in [14], and the right-convergence for the independent set, Ising and coloring models was established in [5]. The right-convergence for the K-SAT and XOR model with hard-core interaction was established in [1], but is not covered by our theorem, since part 3 of Assumption 1 fails for this model.

3.2 Examples and special cases

Let us verify that the convexity assumption of Theorem 1 holds for many examples, including our examples in Sect. 2.1. For most of the examples we will be able to verify convexity of the expected tensor product (9) on all of \(\mathbb{R }^{n^r}\), as opposed to \(\mathbb{R }_+^{n^r}\). In particular, let us now focus on discrete models with \(q\) spin values. Observe that we can bypass the embedding of the discrete model into a continuous model via (3). Furthermore, in the special case when \(K=2\) and \(J\) is deterministic and symmetric, it is well-known that the tensor product of positive semi-definite matrices is positive semi-definite. Thus it suffices to assert the convexity of each individual matrix \(\left( \alpha -J(x_i,x_j)\right) , x=(x_1,\ldots ,x_n)\in \mathbb{R }^n\), rather than their tensor product. Finally, since the \(x_i\) take arbitrary values in \(\{0,1,\ldots ,q-1\}\), it suffices to simply verify the convexity of the \(q\times q\) matrix \((\alpha -J(i,j), 0\le i,j\le q-1)\). Recall our earlier observation that Assumption 1 holds in this case if there exists \(i_0\) such that \(J(i_0,j)>0\) for all \(j\), namely \(\max _i\min _jJ(i,j)>0\). In this case \(J_{\max }\) can be set as in (5). We obtain the following result.

Theorem 2

Suppose \(K=2\) and \(J\) is a deterministic and symmetric edge potential such that \(\max _i\min _jJ(i,j)>0\) and the matrix \(\left( \alpha -J(i,j), 0\le i,j\le q-1\right) \) is positive semi-definite for some \(\alpha \ge J_{\max }\), where \(J_{\max }\) is defined by (5). Then the sequence \(\mathbb{G }(N,c)\) is right-converging with respect to \(J\).

Let us apply this result to our examples, beginning with the independent set model. The matrix \(\alpha -J\) is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c} \alpha -1 &{} \alpha -1 \\ \alpha -1 &{} \alpha \\ \end{array} \right) , \end{aligned}$$

which is positive semi-definite. For the case of the coloring model we obtain that \(\alpha -J\) is a matrix with diagonal entries \(\alpha -\exp (-\beta )\) and off-diagonal entries \(\alpha -1\le \alpha -\exp (-\beta )\). This matrix is positive semi-definite since \(\beta >0\) (anti-ferromagnetism assumption). The situation for the anti-ferromagnetic Ising model is the same, since the only difference is the possible presence of the magnetic field. We conclude that the sequence of random graphs \(\mathbb{G }(N,c)\) is right-converging with respect to these models. This result was already obtained in [5] by conducting an individual analysis for each of the cases. Here we obtain it as a simple consequence of Theorem 2.
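Both positive semi-definiteness claims are immediate to confirm numerically. A small sketch (assuming numpy; the helper `is_psd` and the parameter values are illustrative choices, not from the text):

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """Check positive semi-definiteness via the smallest eigenvalue."""
    return bool(np.linalg.eigvalsh((M + M.T) / 2).min() >= -tol)

# Independent set model: the matrix alpha - J displayed above, with alpha = 1.
alpha = 1.0
ind_set = np.array([[alpha - 1.0, alpha - 1.0],
                    [alpha - 1.0, alpha]])

# Coloring model with q colors: J has diagonal exp(-beta), off-diagonal 1,
# so alpha - J = (alpha - 1) * ones + (1 - exp(-beta)) * identity.
q, beta = 5, 0.7
J = np.where(np.eye(q, dtype=bool), np.exp(-beta), 1.0)
coloring = alpha - J
```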

Before we turn to other examples, we ask the following question: under what conditions does the convexity assumption hold for discrete models? While we do not have the answer for the general case, the answer for the case of symmetric deterministic matrices is rather simple (the author wishes to thank László Lovász for this observation).

Lemma 1

Suppose \(J=(J_{i,j}, 1\le i,j\le n)\) is a deterministic, symmetric \(n\times n\) matrix. There exists \(\alpha _0>0\) such that \(\alpha -J\) is positive definite for all \(\alpha \ge \alpha _0\) if and only if \(-J\) is positive definite on the linear subspace

$$\begin{aligned} R_n\triangleq \left\{ y\in \mathbb{R }^n: \mathbf{1}_n^Ty=0\right\} . \end{aligned}$$
(10)

Namely \(y^T(-J)y>0\) for every nonzero \(y\in R_n\).

We obtain the following result.

Theorem 3

Suppose \(J\) is a deterministic matrix such that \(-J\) is positive definite on \(R_q\) and \(\max _i\min _jJ(i,j)>0\). Then the random graph sequence \(\mathbb{G }(N,c)\) is right-converging with respect to \(J\).

Proof of Lemma 1

Suppose \(\alpha -J\) is positive definite for some \(\alpha >0\). Fix any non-zero \(y\in R_n\) and observe that

$$\begin{aligned} 0< y^T(\alpha -J)y=\alpha y^T(\mathbf{1}_n\mathbf{1}_n^T)y-y^TJy=\alpha (\mathbf{1}_n^Ty)^2-y^TJy=-y^TJy, \end{aligned}$$

implying that \(-J\) is positive definite on \(R_n\).

Conversely, suppose \(-J\) is positive definite on \(R_n\). Fix any sequence \(\alpha _r\rightarrow \infty \). Let

$$\begin{aligned} y_r=\arg \min _{y\in \mathbb{R }^n, \Vert y\Vert _2=1} y^T(\alpha _r-J)y=\arg \min _{y\in \mathbb{R }^n,\Vert y\Vert _2=1}\alpha _r(\mathbf{1}^T_ny)^2-y^TJy. \end{aligned}$$

Such \(y_r\) clearly exists by a compactness argument. Find \(y^*, \Vert y^*\Vert _2=1\), such that \(y_r\) converges to \(y^*\) along some subsequence \(r_l, l\ge 1\). If \(\mathbf{1}^T_n y^*=0\), then \(y^*\in R_n\), giving \(-(y^*)^TJy^*>0\). Then we can find large enough \(r\) such that \(\min _{y\in \mathbb{R }^n,\Vert y\Vert _2=1}\alpha _r(\mathbf{1}^T_ny)^2-y^TJy>0\), and the assertion is proven.

On the other hand, if \(\mathbf{1}^T_n y^*\ne 0\), then we find \(r_0\) large enough so that \(\alpha _r(\mathbf{1}^T_ny^*)^2-(y^*)^TJy^*>0\) for all \(r\ge r_0\). Then since \(y_{r_l}\rightarrow y^*\), we can find \(r_l\) large enough so that \(\min _{y\in \mathbb{R }^n,\Vert y\Vert _2=1}\alpha _{r_l}(\mathbf{1}^T_ny)^2-y^TJy>0\), and the assertion is established. \(\square \)

Interestingly, the definiteness condition in Lemma 1 cannot be relaxed to the case when \(-J\) is positive semi-definite. Indeed let

$$\begin{aligned} J=\left( \begin{array}{c@{\quad }c} -1 &{} 0 \\ 0 &{} 1 \\ \end{array} \right) . \end{aligned}$$

Then \(y^T(-J)y=y_1^2-y_2^2\), which is \(0\) when \(y_1+y_2=0\). Nevertheless, \(\alpha -J\) is never positive semi-definite, since its determinant is \(-1<0\) (the author wishes to thank Rob Freund for this counterexample).
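The counterexample is easy to confirm numerically (numpy assumed; `alpha * np.ones((2, 2))` plays the role of the constant array \(\alpha \)):

```python
import numpy as np

J = np.array([[-1.0, 0.0],
              [0.0, 1.0]])

# -J is positive semi-definite on the subspace {y : y_1 + y_2 = 0}:
y = np.array([1.0, -1.0])
on_subspace = float(y @ (-J) @ y)      # y_1^2 - y_2^2 = 0

# ... yet alpha*ones - J is never positive semi-definite, since its
# determinant equals -1 for every alpha:
dets = [float(np.linalg.det(alpha * np.ones((2, 2)) - J))
        for alpha in (1.0, 10.0, 1e6)]
```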

Let us now give an example of a continuous spin model for which the product in (9) is convex. Such an example can be derived as a continuous analogue of the coloring model with countably infinitely many colors. In particular we let \(K=2\). Fix a countable sequence of mutually disjoint Lebesgue measurable sets \(A_r\subset \mathbb{R }, r\ge 1\), and positive weights \(\gamma _r>0, r\ge 1\), such that \(\sup \gamma _r<\infty \). Fix any \(\gamma \ge \sup \gamma _r\) and define

$$\begin{aligned} J(x,y)=\gamma -\sum _{r\ge 1}\gamma _r{\varvec{I}}\{x,y\in A_r\}. \end{aligned}$$
(11)

Namely, for every vector \((x_i,1\le i\le n)\), the corresponding matrix is

$$\begin{aligned} A\triangleq \left( \gamma -\sum _{r\ge 1}\gamma _r{\varvec{I}}\{x_i,x_j\in A_r\}, ~1\le i,j\le n\right) . \end{aligned}$$

Then \(\gamma -A\) is a positive semi-definite matrix since for every \(y\in \mathbb{R }^n\),

$$\begin{aligned} y^T(\gamma -A)y=\sum _r\gamma _r\left( \sum _{i: x_i\in A_r}y_i\right) ^2\ge 0. \end{aligned}$$

In the special case \(\gamma =\gamma _r=1\) for all \(r\), we also obtain \(J(x,y)\in \{0,1\}\) which conforms with Talagrand’s Conjecture 2. Thus in the special case \(h(x)=\exp (-x^2)\) corresponding to \(\Omega _h=\mathbb{R }\), Assumption 1 corresponding to the existence of soft states is satisfied if there exists \(\kappa >0\) such that \([0,\kappa )\cap A_r=\emptyset \) for all \(r\). (This requirement can be generalized to the case when we replace \([0,\kappa )\) by any positive length interval or even positive (Lebesgue) measure subset of \(\mathbb{R }\).)
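A numerical sketch of this construction (numpy assumed; the sets \(A_r=[r,r+1)\), the weights \(\gamma _r\), and the sample points are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choice: A_r = [r, r+1) for r = 1..5, with weights gamma_r,
# and gamma >= sup_r gamma_r.
gammas = [0.5, 0.9, 0.3, 0.7, 0.2]
gamma = 1.0

def label(x):
    """Index r with x in A_r = [r, r+1), or 0 if x lies in no A_r."""
    r = int(np.floor(x))
    return r if 1 <= r <= len(gammas) else 0

n = 8
x = rng.uniform(0.0, 7.0, size=n)
lab = [label(xi) for xi in x]

# The J-matrix of (11) evaluated at x_1,...,x_n:
A = np.full((n, n), gamma)
for i in range(n):
    for j in range(n):
        if lab[i] != 0 and lab[i] == lab[j]:
            A[i, j] -= gammas[lab[i] - 1]

# gamma*ones - A has quadratic form sum_r gamma_r (sum_{i: x_i in A_r} y_i)^2,
# hence is positive semi-definite:
M = gamma * np.ones((n, n)) - A
min_eig = float(np.linalg.eigvalsh((M + M.T) / 2).min())
```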

Corollary 1

Conjecture 2 holds when \(J(x,y)=1-\sum _{r\ge 1}{\varvec{I}}\{x,y\in A_r\},\) and there exists \(\kappa >0\) such that \([0,\kappa )\cap A_{r}=\emptyset \), for all \(r\), where \(A_r\) is any countably infinite collection of mutually disjoint measurable subsets of \(\mathbb{R }\).

Unfortunately, for the case when \(K=2\) and the potentials \(J\) are zero-one valued, the example above describes the only form of \(J\) which can make \(\alpha -J\) positive semi-definite for some \(\alpha \). Indeed, suppose \(J\) is a zero-one valued deterministic edge potential such that the product (9) is convex for every \(x_1,\ldots ,x_r\in \mathbb{R }^n\). Let \(A_0\) be the set of \(x\) such that the set \(\{y: J(x,y)=0\}\) is non-empty. We claim that \(J(x,x)=0\) for every \(x\in A_0\). Indeed, otherwise there is \(x'\) such that \(J(x,x')=0\). Then for the vector \((x_1,x_2)=(x,x')\) and every \(\alpha \ge 1\), the matrix \(\alpha -J\) is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c} \alpha -J(x,x) &{} \alpha -J(x,x') \\ \alpha -J(x,x') &{} \alpha -J(x',x') \\ \end{array} \right) = \left( \begin{array}{c@{\quad }c} \alpha -1 &{} \alpha \\ \alpha &{} \alpha -J(x',x') \\ \end{array} \right) . \end{aligned}$$

The entry \(\alpha -J(x',x')\) is either \(\alpha \) or \(\alpha -1\). In either case the determinant of the matrix is negative. Thus the matrix is not positive semi-definite, and the claim is established.

If \(A_0\) is empty, then we can take \(A_1=\mathbb{R }\), in which case \(J\equiv 1\) and the assertion is established. Otherwise, define the following equivalence relation on \(A_0\): \(x,y\in A_0\) are equivalent if \(J(x,y)=0\). Reflexivity follows from the observation above that \(J(x,x)=0\) for every \(x\in A_0\), and symmetry follows from the symmetry of \(J\). For transitivity, suppose there exist \(x_1,x_2,x_3\in A_0\) such that \(J(x_1,x_2)=J(x_2,x_3)=0\), but \(J(x_1,x_3)=1\). Then the matrix \((\alpha -J(x_i,x_j), 1\le i,j\le 3)\) is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c} \alpha &{} \alpha &{} \alpha -1 \\ \alpha &{} \alpha &{} \alpha \\ \alpha -1 &{} \alpha &{} \alpha \\ \end{array} \right) , \end{aligned}$$

which is not positive semi-definite, since the determinant of this matrix is \(-\alpha \). This establishes transitivity. From the equivalence relation, we have \(A_0=\cup _{r\ge 1} A_r\), where the \(A_r\) are mutually disjoint positive measure sets, and almost surely \(J(x,y)=0\) if and only if \(x,y\in A_r\) for some \(r\ge 1\). Thus indeed \(J\) can satisfy the assumptions of Theorem 1 only when it is of the form \(J(x,y)=1-\sum _{r\ge 1}{\varvec{I}}\{x,y\in A_r\}\).

Now let us turn to the examples when \(K\ge 3\). We begin with the random K-SAT model. This case was also earlier analyzed in [14] and [5]. Recall that for this model \(J\) takes values \(1\) or \(\exp (-\beta )\). Assumption 1 holds for this model. We set \(\alpha =1\) and claim that the assumptions of Theorem 1 hold as well. We fix an arbitrary sequence of \(r\) elements: \(x^l=(x_1^l,\ldots ,x_n^l)\in \{0,1\}^n,~l=1,2,\ldots ,r \). Fix also a realization of \(J\), and let \(z_1^*,\ldots ,z_K^*\in \{0,1\}\) be the corresponding unique binary assignment such that \(J(z_1^*,\ldots ,z_K^*)=\exp (-\beta )\), and \(J(z_1,\ldots ,z_K)=1\) when \((z_1,\ldots ,z_K)\ne (z_1^*,\ldots ,z_K^*)\). Consider the corresponding array \(\bigotimes _{1\le l\le r} A_l\), where

$$\begin{aligned} A_l=\left( 1-J\left( x^l_{i_1},\ldots ,x^l_{i_K}\right) , ~1\le i^l_1,\ldots ,i^l_K\le n\right) . \end{aligned}$$

Every entry of the tensor product \(\bigotimes _{1\le l\le r} A_l\) is conveniently indexed by a sequence \((i^1_1,\ldots ,i^r_1), \ldots , (i^1_K,\ldots ,i^r_K)\), where \(i^l_k, 1\le l\le r, 1\le k\le K\) vary over \(1,\ldots ,n\). The corresponding entry is

$$\begin{aligned} \prod _{1\le l\le r}\left( 1-J\left( x^l_{i^l_1},\ldots ,x^l_{i^l_K}\right) \right) . \end{aligned}$$
(12)

This entry is non-zero if and only if \((x^l_{i^l_1},\ldots ,x^l_{i^l_K})=(z_1^*,\ldots ,z_K^*)\), for all \(l=1,2,\ldots ,r\), in which case the value is \((1-\exp (-\beta ))^r\). In particular, for the product to be non-zero it should be the case that for each \(k=1,\ldots ,K\),

$$\begin{aligned} x^1_{i^1_k}=x^2_{i^2_k}=\cdots =x^r_{i^r_k}. \end{aligned}$$
(13)

Now recalling that \(z_1^*,\ldots ,z^*_K\) are chosen uniformly at random, we see that the product (12) equals \((1-\exp (-\beta ))^r\) with probability \(1/2^K\) if the relation (13) holds, and equals zero with probability one if this relation does not hold. Now for \(b\in \{0,1\}\) and \(1\le l\le r\), let \(S^l_b\subset \{1,\ldots ,n\}\) be the set of indices \(i\) such that \(x^l_i=b\). Then (13) holds if and only if for each \(k\) there exists \(b_k\in \{0,1\}\) such that \(i^l_k\in S^l_{b_k}\) for all \(l=1,\ldots ,r\). Then for \(y\in \mathbb{R }^{n^r}_+\), the corresponding multilinear form is

$$\begin{aligned}&\displaystyle \Bigg \langle y,\mathbb{E }\bigotimes _{1\le l\le r} A_l\Bigg \rangle =2^{-K}(1-\exp (-\beta ))^r\sum \limits _{b_1,\ldots ,b_K}\sum \limits _{i^l_k\in S^l_{b_k}}y_{i^1_1,\ldots ,i^r_1}\cdots y_{i^1_K,\ldots ,i^r_K}, \end{aligned}$$

where the first sum is over all \(b_1,b_2,\ldots ,b_K\) taking values \(0\) and \(1\), and the second sum is over all choices of \(i^l_k\in S^l_{b_k}, 1\le l\le r, 1\le k\le K\). The expression above equals

$$\begin{aligned} 2^{-K}(1-\exp (-\beta ))^r\left( \sum _{j_l\in S^l_0, 1\le l\le r}y_{j_1,\ldots ,j_r}+\sum _{j_l\in S^l_1, 1\le l\le r}y_{j_1,\ldots ,j_r}\right) ^K. \end{aligned}$$

This form is convex on \(\mathbb{R }^{n^r}_+\), since \(t\mapsto t^K\) is convex and non-decreasing on \(\mathbb{R }_+\) for every positive integer \(K\), and the expression inside the \(K\)-th power is a non-negative linear form of \(y\) on \(\mathbb{R }^{n^r}_+\). We have verified that the K-SAT model satisfies the assumptions of Theorem 1.
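The computation above can be checked by brute force on small instances. The sketch below (assuming Python with numpy; all parameter choices are illustrative) builds the expected tensor product entrywise by averaging over the \(2^K\) choices of \(z^*\), evaluates the multilinear form directly, and compares it with the factored expression written in terms of the level sets \(S^l_b=\{i: x^l_i=b\}\) of the individual vectors \(x^l\):

```python
import numpy as np
from itertools import product

# Small brute-force check: n = 3, r = 2, K = 2.
n, r, K, beta = 3, 2, 2, 1.0
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(r, n))          # X[l, i] = x^l_i
w = 1.0 - np.exp(-beta)

# Expected tensor product, averaging over the 2^K choices of z*.
# Flat index layout: position k*r + l holds i^l_k (0-based l and k).
shape = (n,) * (r * K)
E = np.zeros(shape)
for z in product((0, 1), repeat=K):
    for idx in product(range(n), repeat=r * K):
        val = 1.0
        for l in range(r):
            ok = all(X[l, idx[k * r + l]] == z[k] for k in range(K))
            val *= w if ok else 0.0
        E[idx] += val / 2 ** K

# Brute-force multilinear form <y, E> for a non-negative y.
Y = rng.random((n,) * r)
form = 0.0
for idx in product(range(n), repeat=r * K):
    f = E[idx]
    for k in range(K):
        f *= Y[idx[k * r:(k + 1) * r]]
    form += f

# Closed form: with level sets S^l_b = {i : x^l_i = b}, the form equals
# 2^{-K} w^r (sum over j_l in S^l_0 of Y_j + sum over j_l in S^l_1 of Y_j)^K.
S = {b: [[i for i in range(n) if X[l, i] == b] for l in range(r)]
     for b in (0, 1)}

def block(b):
    return sum(Y[j] for j in product(*S[b]))

closed = 2 ** (-K) * w ** r * (block(0) + block(1)) ** K
```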

We now turn to the Viana–Bray model for the case when \(K\) is even. The right-convergence result for this model was established earlier in [14], also only for the case of even \(K\). The odd \(K\) case remains an open problem. We set \(\alpha =J_{\max }\). For convenience we will use the encoding \(-1,1\) for \(x\) instead of \(0,1\), as discussed when we introduced the example. In other words, \(J(x_1,\ldots ,x_K)=\exp (\beta Ix_1\cdots x_K)\) for any \(x_1,\ldots ,x_K\in \{-1,1\}\). We will use the same symmetrization trick as in [14] and [23]. For any \(x_1,\ldots ,x_K\in \{-1,1\}\) observe that

$$\begin{aligned} \alpha -J(x_1,\ldots ,x_K)&= J_{\max }-\exp \left( \beta I \prod x_i\right) \\&= J_{\max }-2^{-1}(\exp (\beta I)+\exp (-\beta I)) \\&\quad -\,2^{-1}(\exp (\beta I)-\exp (-\beta I))\prod x_i \\&\triangleq f_1(I)-f_2(I)\prod x_i. \end{aligned}$$

Observe also that by symmetry of the distribution of \(I\), and as a result, of \(f_2(I)\), for every odd \(r\) we have

$$\begin{aligned} \mathbb{E }f_2^r(I)=0. \end{aligned}$$
(14)
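The symmetrization identity and the parity observation (14) are elementary to check pointwise. A short sketch (illustrative values of \(\beta \) and \(I\); here \(J_{\max }\) is taken as \(\exp (\beta |I|)\), the largest value of \(J\) for the given realization of \(I\), an assumption made purely for this illustration):

```python
import numpy as np

beta = 0.8
# A particular realization; the identity below is pointwise, so the
# distribution of I plays no role here.
I = 1.3
J_max = np.exp(beta * abs(I))        # = max over s in {-1,1} of exp(beta*I*s)

f1 = J_max - np.cosh(beta * I)
f2 = np.sinh(beta * I)

# exp(beta*I*s) = cosh(beta*I) + s*sinh(beta*I) for s in {-1,1}, hence
# J_max - exp(beta*I*s) = f1 - f2*s:
gap = max(abs((J_max - np.exp(beta * I * s)) - (f1 - f2 * s))
          for s in (-1.0, 1.0))

# f2 is an odd function of I, so E f_2^r(I) = 0 for odd r whenever the
# distribution of I is symmetric; pointwise: f2(-I) = -f2(I).
odd_check = np.sinh(beta * (-I)) + np.sinh(beta * I)
```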

Now we verify the convexity of (9). Fix any sequence \(x^l=(x^l_1,\ldots ,x^l_n)\in \{-1,1\}^n\) for \(l=1,2,\ldots ,r\), and consider the corresponding array \(\bigotimes _{1\le l\le r} A_l\), where

$$\begin{aligned} A_l=\alpha -J&= \left( J_{\max }-\exp (\beta I x^l_{i^l_1}\cdots x^l_{i^l_K}), ~1\le i^l_1,\ldots ,i^l_K\le n\right) \\&= \left( f_1(I)-f_2(I)x^l_{i^l_1}\cdots x^l_{i^l_K}, ~1\le i^l_1,\ldots ,i^l_K\le n\right) . \end{aligned}$$

Every entry of the tensor product \(\bigotimes _{1\le l\le r} A_l\) is again conveniently indexed by a sequence \((i^1_1,\ldots ,i^r_1),\ldots ,(i^1_K,\ldots ,i^r_K)\), where \(i^l_k, 1\le l\le r, 1\le k\le K\) vary over \(1,\ldots ,n\). The corresponding entry is then

$$\begin{aligned} \prod _{1\le l\le r}\left( f_1(I)-f_2(I)x^l_{i^l_1}\cdots x^l_{i^l_K}\right) = \sum _{S\subset \{1,\ldots ,r\}}f_1^{r-|S|}(I)\left( -f_2(I)\right) ^{|S|}\prod _{l\in S}x^l_{i^l_1}\cdots x^l_{i^l_K}, \end{aligned}$$

where the product over the empty set \(S\) is understood to be one. Since \(f_1(I)\) is an even function of \(I\) and \(f_2(I)\) is an odd function of \(I\), and the distribution of \(I\) is symmetric, every term with odd \(|S|\) has zero expectation; observation (14) corresponds to the special case \(S=\{1,\ldots ,r\}\). The expected entry is therefore

$$\begin{aligned} \sum _{S\subset \{1,\ldots ,r\}, |S|\text {~even~}}\mathbb{E }\left[ f_1^{r-|S|}(I)f_2^{|S|}(I)\right] \prod _{l\in S}x^l_{i^l_1}\cdots x^l_{i^l_K}. \end{aligned}$$

Then for any vector \(y=(y_{j_1,\ldots ,j_r}, 1\le j_1,\ldots ,j_r\le n)\in \mathbb{R }^{n^r}\), the sum over entries factorizes into \(K\)-th powers of linear forms, exactly as in the K-SAT example, and we obtain

$$\begin{aligned} \left\langle y,\mathbb{E }\bigotimes _{1\le l\le r} A_l\right\rangle =\sum _{S\subset \{1,\ldots ,r\}, |S|\text {~even~}}\mathbb{E }\left[ f_1^{r-|S|}(I)f_2^{|S|}(I)\right] \left( \sum _{1\le j_1,\ldots ,j_r\le n}y_{j_1,\ldots ,j_r}\prod _{l\in S}x^l_{j_l}\right) ^K. \end{aligned}$$

Every prefactor \(\mathbb{E }[f_1^{r-|S|}(I)f_2^{|S|}(I)]\) is non-negative, since \(|S|\) is even and \(f_1(I)=J_{\max }-\cosh (\beta I)\ge 0\) almost surely (as \(J_{\max }\ge \exp (\beta |I|)\ge \cosh (\beta I)\)). Finally, each summand is convex, since \(K\) is even and the expression inside the \(K\)-th power is a linear form in the vector \(y\). This verifies the assumptions of Theorem 1 for the Viana–Bray model with even \(K\).

4 Upper, lower and concentration bounds for the log-partition function

We first obtain some basic upper and lower bounds on the log-partition function.

Lemma 2

Under Assumption 1, for every graph \(\mathbb{G }\) with \(N\) nodes and \(M\) edges,

$$\begin{aligned} (M+N)\log \rho _{\min }&\le \log Z(\mathbb{G })\le (M+N)\log \rho _{\max }, \end{aligned}$$
(15)

almost surely. As a result, \(\mathbb{E }[\log Z(\mathbb{G })]\) is well defined for every graph \(\mathbb{G }\).

Proof

We have

$$\begin{aligned} Z(\mathbb{G })&\le \int \limits _{\mathbb{R }^N}\prod _{1\le u\le N} h_u(x_u) \prod _{e\in E(\mathbb{G })} J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx\\&\le J_{\max }^{M}\prod _{1\le u\le N}\int \limits _\mathbb{R }h_u(x)dx \\&\le \rho _{\max }^{M+N}, \end{aligned}$$

and

$$\begin{aligned} Z(\mathbb{G })&\ge \int \limits _{x\in [0,\kappa )^N}\prod _u h_u(x_u) \prod _e J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx \\&\ge \rho _{\min }^{M}\prod _u \int \limits _{[0,\kappa )} h_u(x)dx \\&\ge \rho _{\min }^{M+N}, \end{aligned}$$

from which (15) follows. \(\square \)
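A discrete toy analogue of these bounds can be verified by brute force. The sketch below (assuming Python with numpy; the model, with a single shared edge potential and uniform random potentials, is chosen purely for illustration) mimics the argument with \(\rho _{\max }\) dominating both the largest edge weight and every \(\sum _x h_u(x)\), and \(\rho _{\min }\) dominated by both:

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(3)
# Toy discrete model: q spin values, N nodes, K = 2, brute-force Z.
q, N = 3, 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
M = len(edges)
h = rng.uniform(0.5, 1.5, size=(N, q))       # node potentials h_u(x)
J = rng.uniform(0.5, 1.5, size=(q, q))       # a shared edge potential

Z = 0.0
for x in itertools.product(range(q), repeat=N):
    wgt = math.prod(h[u, x[u]] for u in range(N))
    wgt *= math.prod(J[x[u], x[v]] for (u, v) in edges)
    Z += wgt

# rho_max dominates J_max and every sum_x h_u(x); rho_min is dominated by
# J_min and every h_u(x); then (M+N) log rho_min <= log Z <= (M+N) log rho_max.
rho_max = max(J.max(), h.sum(axis=1).max())
rho_min = min(J.min(), h.min())
log_Z = math.log(Z)
```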

We now study the impact of adding/deleting one edge from a given realization of a graph and node and edge potentials.

Lemma 3

Consider any graph \(\mathbb{G }\) and realizations of node and edge potentials. Suppose the potential at node \(v\in V(\mathbb{G })\) is changed from \(h_v\) to \(\hat{h}_v\), in such a way that Assumption 1 remains valid. Denote the resulting instance by \({\hat{\mathbb{G }}}\). The following holds almost surely

$$\begin{aligned} |\log Z(\mathbb{G })-\log Z({\hat{\mathbb{G }}})|\le 2(1+|\mathcal N (v,\mathbb{G })|)(\log \rho _{\max }-\log \rho _{\min }). \end{aligned}$$
(16)

Proof

Let \(\mathbb{G }_0\) be the graph obtained from \(\mathbb{G }\) after deleting node \(v\) and all the edges in \(\mathcal N (v,\mathbb{G })\) (together with the node potential of \(v\) and edge potentials corresponding to \(\mathcal N (v,\mathbb{G })\)). Applying Assumption 1 we have

$$\begin{aligned} Z(\mathbb{G })&= \int \limits _{\mathbb{R }^N}\prod _u h_u(x_u)\prod _{e\in E(\mathbb{G })} J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx\\&\le \rho _{\max }^{|\mathcal N (v,\mathbb{G })|}\int \limits _\mathbb{R }h_v(x)dx \int \limits _{x\in \mathbb{R }^{N-1}}\prod _{u\ne v} h_u(x_u)\prod _{e\in E(\mathbb{G })\setminus \mathcal N (v,\mathbb{G })}J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx\\&\le \rho _{\max }^{1+|\mathcal N (v,\mathbb{G })|}Z(\mathbb{G }_0), \end{aligned}$$

where \(Z(\mathbb{G }_0)\) is the partition function associated with \(\mathbb{G }_0\):

$$\begin{aligned} Z(\mathbb{G }_0)=\int \limits _{x\in \mathbb{R }^{N-1}}\prod _{u\ne v} h_u(x_u)\prod _{e\in E(\mathbb{G })\setminus \mathcal N (v,\mathbb{G })}J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx, \end{aligned}$$

and where the first inequality follows since \(x_v\) is not coupled anymore with \(x_u, u\ne v\) through the edge potentials. On the other hand, applying the third part of Assumption 1

$$\begin{aligned} Z(\mathbb{G })&\ge \int \limits _{x_v\in [0,\kappa ), x_u\in \mathbb{R }, u\ne v}\prod _u h_u(x_u)\prod _e J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx \\&\ge \rho _{\min }^{1+|\mathcal N (v,\mathbb{G })|}Z(\mathbb{G }_0). \end{aligned}$$

We obtain

$$\begin{aligned} |\log Z(\mathbb{G })-\log Z(\mathbb{G }_0)|\le (1+|\mathcal N (v,\mathbb{G })|)(\log \rho _{\max }-\log \rho _{\min }). \end{aligned}$$

Since the same bound holds for \({\hat{\mathbb{G }}}\), by the triangle inequality we obtain the required bound. \(\square \)

Lemma 4

Consider any graph \(\mathbb{G }\) and realizations of node and edge potentials. Suppose an edge \(e=(v_1,\ldots ,v_K)\) is added to the graph together with an edge potential \(J_e\) satisfying Assumption 1. Denote the resulting instance by \({\hat{\mathbb{G }}}\). The following holds almost surely

$$\begin{aligned} \left| \log Z(\mathbb{G }+e)-\log Z(\mathbb{G })\right| \le (2K+2|\mathcal N (e,\mathbb{G })|+1)\left( \log \rho _{\max }-\log \rho _{\min }\right) .\qquad \end{aligned}$$
(17)

Proof

Consider the graph \(\mathbb{G }_0\) obtained from \(\mathbb{G }\) by deleting nodes \(v_1,\ldots ,v_K\) and all the edges in \(\mathcal N (e,\mathbb{G })\), together with their associated node and edge potentials. From Assumption 1 we obtain

$$\begin{aligned} Z(\mathbb{G })&= \int \limits _{\mathbb{R }^N}\prod _u h_u(x_u)\prod _e J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx\\&\le \rho _{\max }^{K+|\mathcal N (e,\mathbb{G })|} \int \limits _{x\in \mathbb{R }^{N-K}}\prod _{u\ne v_1,\ldots ,v_K} h_u(x_u)\prod _{e\notin \mathcal N (e,\mathbb{G })}J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx\\&= \rho _{\max }^{K+|\mathcal N (e,\mathbb{G })|}Z(\mathbb{G }_0). \end{aligned}$$

On the other hand, by the third part of Assumption 1

$$\begin{aligned} Z(\mathbb{G })&\ge \int \limits _{x_{v_1},\ldots ,x_{v_K}\in [0,\kappa ), x_u\in \mathbb{R }, u\ne v_1,\ldots ,v_K}\prod _u h_u(x_u)\prod _e J_e(x_{u^e_1},\ldots ,x_{u^e_K})dx \\&\ge \rho _{\min }^{K+|\mathcal N (e,\mathbb{G })|}Z(\mathbb{G }_0)\!. \end{aligned}$$

We conclude

$$\begin{aligned} |\log Z(\mathbb{G })-\log Z(\mathbb{G }_0)|\le (K+|\mathcal N (e,\mathbb{G })|)\left( \log \rho _{\max }-\log \rho _{\min }\right) \!. \end{aligned}$$

Observe that a similar bound holds when \(\mathbb{G }+e\) replaces \(\mathbb{G }\), where we simply replace \(|\mathcal N (e,\mathbb{G })|\) with \(|\mathcal N (e,\mathbb{G })|+1\) in the upper and lower bounds. Putting the two bounds together via the triangle inequality, we obtain the result. \(\square \)
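In the discrete toy model used above for Lemma 2, the edge-perturbation bound (17) can be checked directly (numpy assumed; the graph, the new edge, and the definitions of \(\rho _{\min },\rho _{\max }\) are illustrative):

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(4)
q, N, K = 3, 5, 2
h = rng.uniform(0.5, 1.5, size=(N, q))
J = rng.uniform(0.5, 1.5, size=(q, q))

def log_Z(edges):
    """Brute-force log-partition function of the toy discrete model."""
    Z = 0.0
    for x in itertools.product(range(q), repeat=N):
        wgt = math.prod(h[u, x[u]] for u in range(N))
        wgt *= math.prod(J[x[u], x[v]] for (u, v) in edges)
        Z += wgt
    return math.log(Z)

edges = [(0, 1), (1, 2), (2, 3)]
e_new = (3, 4)
delta = abs(log_Z(edges + [e_new]) - log_Z(edges))

rho_max = max(J.max(), h.sum(axis=1).max())
rho_min = min(J.min(), h.min())
shared = sum(bool(set(e) & set(e_new)) for e in edges)   # |N(e, G)|
bound = (2 * K + 2 * shared + 1) * (math.log(rho_max) - math.log(rho_min))
```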

We now establish a result regarding the concentration of \(\log Z(\mathbb{G }(N,c))\) around its mean.

Proposition 1

Under Assumption 1 the following concentration bound holds:

$$\begin{aligned} \lim _{N\rightarrow \infty }\mathbb{P }\left( \left| N^{-1}\log Z(\mathbb{G }(N,c))-N^{-1}\mathbb{E }[\log Z(\mathbb{G }(N,c))]\right| >\log ^3 N/\sqrt{N}\right) =0. \end{aligned}$$

Proof

A standard approach for proving such a concentration result is the Azuma–Hoeffding inequality, which establishes concentration bounds for martingales with bounded increments. The application of this technique would be straightforward if we had a deterministic bound on the node and edge degrees \(|\mathcal N (u,\mathbb{G })|, |\mathcal N (e,\mathbb{G })|\). Unfortunately, this is not the case, as the largest degree in the sparse random graph \(\mathbb{G }(N,c)\) is known to grow at a nearly logarithmic rate. In order to deal with this, we first establish a simple bound on the degree which holds with high probability, and then apply the martingale concentration bound to a truncated version of \(Z(\mathbb{G })\).

Thus let us establish the following simple bound on the largest degree of \(\mathbb{G }\).

$$\begin{aligned} \mathbb{P }\left( \max _{u\le N}|\mathcal N (u,\mathbb{G })|\ge \log N\right) =N^{-O(\log \log N)}. \end{aligned}$$
(18)

The total number of edges containing node \(u\) is \(N^K-(N-1)^K\). Thus the probability that a randomly chosen edge contains \(u\) is

$$\begin{aligned} {N^K-(N-1)^K\over N^K}=1-(1-1/N)^K={K\over N}+o(N^{-1}). \end{aligned}$$

Given an arbitrary \(m\), we then obtain

$$\begin{aligned} \mathbb{P }\left( |\mathcal N (u,\mathbb{G })|=m\right)&= \left( \begin{array}{c}\lfloor cN\rfloor \\ m\end{array}\right) \left( {K\over N}+o(N^{-1})\right) ^m\left( 1-{K\over N}+o(N^{-1})\right) ^{\lfloor cN\rfloor -m}\\&\le \left( \begin{array}{c} \lfloor cN\rfloor \\ m \end{array}\right) \left( {K\over N}+o(N^{-1})\right) ^m\\&\le {(cN)^m\over m!}\left( {K^m\over N^m}+o(N^{-m})\right) \\&= {(cK)^m\over m!}+o\left( {(cK)^m\over m!}\right) . \end{aligned}$$

When \(m \ge \log N\), this bound is \(N^{-O(\log \log N)}\). Using the union bound, (18) follows from \(N^2 N^{-O(\log \log N)}=N^{-O(\log \log N)}\), where the first factor \(N\) in \(N^2\) is obtained by summing over the nodes, and the second factor \(N\) by summing over \(\log N\le m\le N\).
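Both displayed estimates are easy to confirm numerically (illustrative parameters \(N=1000\), \(K=3\), \(c=2\)):

```python
import math

# The probability p that a uniformly random (ordered) K-edge contains a
# fixed node u, and the resulting binomial-coefficient bound.
N, K, c = 1000, 3, 2.0
p = (N**K - (N - 1)**K) / N**K          # = 1 - (1 - 1/N)^K
# p = K/N + o(1/N); in fact |p - K/N| <= K^2 / N^2:
gap = abs(p - K / N)

m_edges = math.floor(c * N)             # floor(cN) edges in total
m = math.ceil(math.log(N))              # the threshold m ~ log N

# comb(cN, m) p^m <= (cN)^m / m! * (K/N)^m = (cK)^m / m!
binom_bound = math.comb(m_edges, m) * p**m
poisson_style = (c * K)**m / math.factorial(m)
```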

We now return to the proof of the concentration result. For every \(n=1,2,\ldots ,N\), let \(\mathcal F _n\) be the filtration associated with random variables \(h_u, 1\le u\le n\), all edges \(e\) spanned by the nodes \(1,\ldots ,n\), as well as their associated random variables \(J_{e}\). Namely, \(\mathcal F _n\) is the information revealed by the portion of the graph \(\mathbb{G }(N,c)\) associated with the first \(n\) nodes, the edges spanned by these nodes, as well as their associated potentials \(h_u, J_{e}\). Then \(R_N=\log Z(\mathbb{G }(N,c))\) and \(R_n\triangleq \mathbb{E }[\log Z(\mathbb{G }(N,c))|\mathcal F _n], 0\le n\le N\) is a martingale, where \(\mathcal F _0\) is the trivial \(\sigma \)-field and \(\mathbb{E }[\log Z(\mathbb{G }(N,c))|\mathcal F _0]=\mathbb{E }[\log Z(\mathbb{G }(N,c))]\). For every \(n\), let \(D_n\) be the maximum degree of the nodes \(1,\ldots ,n\) in the subgraph spanned by \(1,\ldots ,n\). Applying Lemmas 3 and 4, we have \(|R_n-R_{n-1}|\le c_1+c_2D_n^2\) for some constants \(c_1,c_2>0\) which depend on \(K,\rho _{\min },\rho _{\max }\) only. Indeed, conditioning on the node potential \(h_n\) of the node \(n\) changes the conditional expectation by at most \(c_1+c_2 D_n\), by Lemma 3. Also, revealing each of the at most \(D_n\) edges incident to \(n\) and spanned by the nodes \(1,\ldots ,n\) changes the conditional expectation by at most \(c_1+c_2 D_n\), by Lemma 4. Thus the total change is at most \((1+D_n)(c_1+c_2D_n)\le c_3(1+D_n^2)\) for an appropriate constant \(c_3\).

Let \(M\le N\) be defined as the smallest \(n\) such that \(D_n>\log n\). If no such \(n\) exists (which by (18) occurs with overwhelming probability), then we set \(M=N\). Clearly, \(M\) is a stopping time with respect to the filtration \(\mathcal F _n\), and \({\hat{R}}_n\triangleq R_{\min (M,n)}, 0\le n\le N\) is a stopped martingale, satisfying \(|{\hat{R}}_n-{\hat{R}}_{n-1}|\le c_3\log ^2 N\), after enlarging the constant \(c_3\) if necessary. Now applying the Azuma–Hoeffding inequality, we obtain for every \(x>0\),

$$\begin{aligned} \mathbb{P }\left( \left| {\hat{R}}_N-\mathbb{E }{\hat{R}}_N\right| > x( c_3\log ^2 N) \sqrt{N}\right) \le \exp (-x^2/2). \end{aligned}$$

From this we obtain

$$\begin{aligned} \mathbb{P }\left( \left| R_N-\mathbb{E }[R_N]\right| >\sqrt{N}\log ^3 N\right)&\le \mathbb{P }\left( \left| {\hat{R}}_N-\mathbb{E }[{\hat{R}}_N]\right| >\sqrt{N}\log ^3 N, M=N\right) \\&\quad +\,\mathbb{P }(M<N)\\&\le \mathbb{P }\left( \left| {\hat{R}}_N-\mathbb{E }[{\hat{R}}_N]\right| >\sqrt{N}\log ^3 N\right) \\&\quad +\,N^{-O(\log \log N)}\\&\le \exp \left( -\log ^2 N/(2c_3^2)\right) +N^{-O(\log \log N)}\\&= N^{-O(\log \log N)}. \end{aligned}$$

We obtain the claimed concentration result:

$$\begin{aligned} \mathbb{P }\left( \left| N^{-1}\log Z(\mathbb{G }(N,c))\!-\!N^{-1}\mathbb{E }[\log Z(\mathbb{G }(N,c))]\right| \!>\!\log ^3 N/\sqrt{N}\right) \!=\!N^{-O(\log \log N)}. \end{aligned}$$

\(\square \)

5 Interpolation scheme and superadditivity

In this section we introduce the interpolation method and use it to prove our main result, Theorem 1. Given a positive integer \(N\), consider any positive integers \(N_1,N_2\) such that \(N_1+N_2=N\). For every \(t=0,1,\ldots ,\lfloor cN\rfloor \) we introduce a random graph denoted by \(\mathbb{G }(N,c,t)\) generated as follows. The graph \(\mathbb{G }(N,c,t)\) has \(\lfloor cN\rfloor \) edges. Among those, \(\lfloor cN\rfloor -t\) edges are generated independently and uniformly at random among all the \(N^K\) potential edges on \(N\) nodes. Each of the remaining \(t\) edges is generated independently and uniformly at random among the \(N_1^K\) edges of the complete graph supported by nodes \(1,2,\ldots ,N_1\), with probability \(N_1/N\), and is generated independently and uniformly at random among the \(N_2^K\) edges of the complete graph supported by nodes \(N_1+1,\ldots ,N\), with probability \(N_2/N\). Observe that \(\mathbb{G }(N,c,0)=\mathbb{G }(N,c)\) and \(\mathbb{G }(N,c,\lfloor cN\rfloor )\) is a disjoint union of two graphs \(\mathbb{G }_j, j=1,2\), where \(\mathbb{G }_j\) has \(N_j\) nodes and \(R_j\) edges chosen uniformly at random from the \(N_j^K\) potential edges, and \(R_j\) has a binomial distribution with \(\lfloor cN\rfloor \) trials and success probability \(N_j/N\). In particular, the expected number of edges in \(\mathbb{G }_j\) is \((N_j/N)\lfloor cN\rfloor \in [cN_j-1, cN_j]\). Every node \(u=1,\ldots ,N\) of the graph \(\mathbb{G }(N,c,t)\) is equipped with a node potential \(h_u\) distributed according to \(\nu _h\), and every edge \(e=e_1,\ldots ,e_{\lfloor cN\rfloor }\) of the graph is equipped with an edge potential \(J_e\) distributed according to \(\nu _J\), all choices made independently. Our main technical result leading to Theorem 1 is that the expected log-partition function of \(\mathbb{G }(N,c,t)\) is non-increasing in \(t\):
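The interpolation construction can be sketched in code as follows (a simplified illustration for \(K=2\), following the convention \(\mathbb{G }(N,c,0)=\mathbb{G }(N,c)\); the function name and parameters are ours, not from the text):

```python
import numpy as np

def interpolated_edges(N, N1, c, t, rng):
    """Sample the edge list of G(N, c, t) for K = 2 (a sketch; edges are
    ordered pairs drawn with replacement, as in the text)."""
    m = int(np.floor(c * N))
    assert 0 <= t <= m
    edges = []
    # floor(cN) - t edges uniform among all N^K potential edges
    for _ in range(m - t):
        edges.append(tuple(rng.integers(0, N, size=2)))
    # t edges supported by one of the two parts
    for _ in range(t):
        if rng.random() < N1 / N:
            edges.append(tuple(rng.integers(0, N1, size=2)))
        else:
            edges.append(tuple(rng.integers(N1, N, size=2)))
    return edges

rng = np.random.default_rng(5)
full = interpolated_edges(20, 12, 1.5, t=0, rng=rng)    # t = 0: G(N, c)
split = interpolated_edges(20, 12, 1.5, t=30, rng=rng)  # t = floor(cN)
# at t = floor(cN), no edge crosses between the two parts:
crosses = sum((u < 12) != (v < 12) for (u, v) in split)
```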

Proposition 2

Suppose Assumption 1 holds. Then

$$\begin{aligned} \mathbb{E }[\log Z(\mathbb{G }(N,c,t))]\ge \mathbb{E }[\log Z(\mathbb{G }(N,c,t+1))] \end{aligned}$$

for every \(0\le t\le \lfloor cN\rfloor -1\).

Before we prove the proposition, we use it to establish our main result.

Proof of Theorem 1

Since the partition function of a disjoint union of two graphs is the product of the corresponding partition functions, we obtain, as a corollary of Proposition 2,

$$\begin{aligned} \mathbb{E }[\log Z(\mathbb{G }(N,c))]\ge \mathbb{E }[\log Z(\mathbb{G }_1)]+\mathbb{E }[\log Z(\mathbb{G }_2)], \end{aligned}$$
(19)

where \(\mathbb{G }_1\) and \(\mathbb{G }_2\) are the disjoint parts of \(\mathbb{G }(N,c,\lfloor cN\rfloor )\) described above. Since the expected number of edges of \(\mathbb{G }_j, j=1,2\) is in the interval \([cN_j-1,cN_j]\), and the number of edges has a binomial distribution with \(O(N)\) trials, we can obtain a graph \(\mathbb{G }(N_j,c)\) from \(\mathbb{G }_j\) by deleting or adding at most \(O(\sqrt{N})\) edges in expectation. Applying Lemma 4, this implies that for \(j=1,2\)

$$\begin{aligned} \Big |\mathbb{E }[\log Z(\mathbb{G }_j)]-\mathbb{E }[\log Z(\mathbb{G }(N_j,c))]\Big |\le O(\sqrt{N}). \end{aligned}$$

Combining with (19) this implies that the sequence \(\mathbb{E }[\log Z(\mathbb{G }(N,c))]\) satisfies the following near super-additivity property:

$$\begin{aligned} \mathbb{E }[\log Z(\mathbb{G }(N,c))]\ge \mathbb{E }[\log Z(\mathbb{G }(N_1,c))]+\mathbb{E }[\log Z(\mathbb{G }(N_2,c))]-O(\sqrt{N}). \end{aligned}$$

It is a classical fact, known as Fekete's Lemma, that super-additive sequences, after normalization, converge to a limit. It is rather straightforward to show that the same applies to nearly super-additive sequences, provided the correction term is \(O(N^\alpha )\) with \(\alpha <1\) (\(\alpha =1/2\) in our case). The complete proof can be found in [5]. We conclude that the following limit exists:

$$\begin{aligned} \lim _{N\rightarrow \infty }{\mathbb{E }[\log Z(\mathbb{G }(N,c))]\over N}. \end{aligned}$$

Combining with the concentration result of Proposition 1, we conclude that the sequence \(\mathbb{G }(N,c)\) is right-converging. \(\square \)
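The near super-additivity step above can be checked numerically. The following sketch uses a hypothetical sequence \(a_n=2n-3\sqrt{n}\), chosen purely for illustration (correction constant \(C=3\), limit \(2\)); it is not derived from the partition functions in the paper:

```python
import math

# Hypothetical nearly super-additive sequence: a_n = 2n - 3*sqrt(n).
def a(n):
    return 2 * n - 3 * math.sqrt(n)

# Near super-additivity: a(n+m) >= a(n) + a(m) - C*sqrt(n+m), with C = 3.
for n in range(1, 50):
    for m in range(1, 50):
        assert a(n + m) >= a(n) + a(m) - 3 * math.sqrt(n + m)

# Since the correction term is O(sqrt(n)) = o(n), a(n)/n converges (to 2 here).
assert abs(a(10**6) / 10**6 - 2) < 0.01
```

The same Fekete-type argument applies whenever the correction term is \(O(n^\alpha )\) with \(\alpha <1\).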

Proof of Proposition 2

Fix any \(t<\lfloor cN\rfloor \). Observe that the graph \(\mathbb{G }(N,c,t+1)\) can be obtained from \(\mathbb{G }(N,c,t)\) by removing from \(\mathbb{G }(N,c,t)\) an edge \(e\) (together with the associated edge potential) chosen uniformly at random from all the edges of \(\mathbb{G }(N,c,t)\), and adding an edge \({\hat{e}}\) supported on the nodes \(1,\ldots ,N_1\), chosen uniformly at random from \(N_1^K\) possibilities, with probability \(N_1/N\), or supported on the nodes \(N_1+1,\ldots ,N\), chosen uniformly at random from \(N_2^K\) possibilities, with probability \(N_2/N\). The newly created edge \({\hat{e}}\) is equipped with an edge potential \(J_{\hat{e}}\) generated at random using \(\nu _J\), independently from all the other randomness of the graph. In this edge-removal and edge-addition procedure we keep the node potentials intact. Similarly, we keep the edge potentials intact for all edges other than the removed and the added one. Let \(\mathbb{G }_0\) be the realization of the graph obtained after removing edge \(e\), but before adding \({\hat{e}}\). We assume that \(\mathbb{G }_0\) encodes the node/edge potentials as well. The proposition follows from the following apparently stronger inequality, which we claim holds for every \(\mathbb{G }_0\):

$$\begin{aligned} \mathbb{E }[\log Z(\mathbb{G }(N,c,t))|\mathbb{G }_0]-\log Z(\mathbb{G }_0)\ge \mathbb{E }[\log Z(\mathbb{G }(N,c,t+1))|\mathbb{G }_0]-\log Z(\mathbb{G }_0). \end{aligned}$$
(20)

Note by Lemma 2 that \(\log Z(\mathbb{G }_0)\), as well as both expectations, are finite, and thus the proposition indeed follows from (20). Let \(e=(v_1,\ldots ,v_K)\) and let \(J\) be the corresponding edge potential. Similarly, let \({\hat{e}}=({\hat{v}}_1,\ldots ,{\hat{v}}_K)\) and let \({\hat{J}}\) be the corresponding edge potential. Notice that the random choices of \(e,{\hat{e}},J\) and \({\hat{J}}\) are the only sources of randomness in the expectations in (20).
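The edge-resampling step described above can be sketched in code. The following is a hypothetical illustration only (edges as \(K\)-tuples over 0-indexed nodes, potentials omitted), not an implementation from the paper:

```python
import random

# One interpolation step G(N,c,t) -> G(N,c,t+1): delete a uniformly chosen
# edge e, then add a K-tuple edge e_hat supported on the first N1 nodes with
# probability N1/N, or on the last N2 nodes with probability N2/N.
def interpolation_step(edges, N1, N2, K, rng=random):
    N = N1 + N2
    edges = list(edges)
    edges.pop(rng.randrange(len(edges)))                        # remove edge e
    if rng.random() < N1 / N:
        e_hat = tuple(rng.randrange(N1) for _ in range(K))      # inside [N1]
    else:
        e_hat = tuple(N1 + rng.randrange(N2) for _ in range(K)) # inside the rest
    return edges + [e_hat]
```

Each step preserves the total number of edges, while moving one edge from the "mixed" graph into one of the two blocks.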

Fix \(\alpha \ge J_{\max }\) such that (9) is convex. Using the Taylor expansion

$$\begin{aligned} \log x-\log x_0=-\sum _{r\ge 1} r^{-1}(x_0-x)^r x_0^{-r} \end{aligned}$$

around \(x_0=\alpha Z(\mathbb{G }_0)\), with \(x=Z(\mathbb{G }(N,c,t))\), we obtain

$$\begin{aligned}&\mathbb{E }[\log Z(\mathbb{G }(N,c,t))|\mathbb{G }_0,J]-\log Z(\mathbb{G }_0) \\&\quad =\log \alpha -\sum _{r\ge 1}r^{-1}\alpha ^{-r}Z^{-r}(\mathbb{G }_0)\mathbb{E }\left[ \left( \alpha Z(\mathbb{G }_0)-Z(\mathbb{G }(N,c,t))\right) ^r\right] . \end{aligned}$$

Before we proceed, we need to justify the interchange of the infinite summation and the expectation. First observe that \(\alpha Z(\mathbb{G }_0)\ge Z(\mathbb{G }(N,c,t))\), since adding an edge can increase the partition function by at most a multiplicative factor of \(J_{\max }\le \alpha \). (The bound in Lemma 4 is cruder, since we needed it to be two-sided.) Thus every term \(\left( \alpha Z(\mathbb{G }_0)-Z(\mathbb{G }(N,c,t))\right) ^r\) is non-negative, and the interchange is justified by the Monotone Convergence Theorem.
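The Taylor expansion used above can be verified numerically. A minimal sketch (the values of \(x_0\) and \(x\) are illustrative), relying on the fact that the series converges whenever \(0<x\le x_0\):

```python
import math

def log_via_series(x, x0, terms=200):
    # log x - log x0 = -sum_{r>=1} r^{-1} ((x0 - x)/x0)^r, valid for 0 < x <= x0,
    # since then 0 <= (x0 - x)/x0 < 1 and the series converges geometrically.
    q = (x0 - x) / x0
    return math.log(x0) - sum(q ** r / r for r in range(1, terms + 1))

x0, x = 5.0, 1.0   # here (x0 - x)/x0 = 0.8, so 200 terms suffice
assert abs(log_via_series(x, x0) - math.log(x)) < 1e-9
```

In the proof, \(x_0=\alpha Z(\mathbb{G }_0)\ge Z(\mathbb{G }(N,c,t))>0\) plays exactly this role.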

With a similar expression for \(Z(\mathbb{G }(N,c,t+1))\), we see that it suffices to show that for every \(r\ge 1\)

$$\begin{aligned} \mathbb{E }\left[ \left( \alpha Z(\mathbb{G }_0)-Z(\mathbb{G }(N,c,t))\right) ^r\right] \le \mathbb{E }\left[ \left( \alpha Z(\mathbb{G }_0)-Z(\mathbb{G }(N,c,t+1))\right) ^r\right] . \end{aligned}$$
(21)

We begin with the expression on the left and expand it as

$$\begin{aligned}&\mathbb{E }\left[ \left( \alpha Z(\mathbb{G }_0)-Z(\mathbb{G }(N,c,t))\right) ^r\right] \\&\quad =\mathbb{E }\left( \,\,\int \limits _{\mathbb{R }^N} \left( \alpha -J\left( x_{v_1},\ldots ,x_{v_K}\right) \right) \prod _{u}h_u(x_{u}) \prod _{e\in E(\mathbb{G }_0)}J_e\left( x_{u^e_1},\ldots ,x_{u^e_K}\right) dx\right) ^r\\&\quad =\mathbb{E }\int \limits _{x^1,\ldots ,x^r\in \mathbb{R }^N} \prod _{1\le l\le r}\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) \right) \prod _{u}h_u\left( x^l_{u}\right) \\&\quad \quad \times \prod _{e\in E(\mathbb{G }_0)}J_e\left( x^l_{u^e_1},\ldots ,x^l_{u^e_K}\right) dx^1\cdots dx^r\\&\quad ={N^{-K}}\sum _{1\le v_1,\ldots ,v_K\le N} \int \limits _{x^1,\ldots ,x^r\in \mathbb{R }^N} \mathbb{E }\prod _{1\le l\le r}\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) \right) \prod _{u}h_u\left( x^l_{u}\right) \\&\quad \quad \times \prod _{e\in E(\mathbb{G }_0)}J_e\left( x^l_{u^e_1},\ldots ,x^l_{u^e_K}\right) dx^1\cdots dx^r. \end{aligned}$$

Here we note that the expectation in the last term is with respect to the randomness of \(J\) only. Given \(x^1,\ldots ,x^r\), we focus on

$$\begin{aligned} {N^{-K}}\sum _{1\le v_1,\ldots ,v_K\le N}\mathbb{E }\prod _{1\le l\le r}\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) \right) \!. \end{aligned}$$
(22)

For each \(l=1,\ldots ,r\), consider the \(K\)-th order \(N\)-dimensional array

$$\begin{aligned} A_l=\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) ,\ 1\le v_1,\ldots ,v_K\le N\right) \!. \end{aligned}$$

Also consider the \(N^r\)-dimensional vector \(e^{N,r}\) defined as follows: for every \(1\le i_1,\ldots ,i_r\le N\),

$$\begin{aligned} e^{N,r}_{i_1,\ldots ,i_r}=\left\{ \begin{array}{l@{\quad }l} N^{-1}, &{} i_1=i_2=\cdots =i_r; \\ 0, &{} \hbox {otherwise.} \end{array} \right. \end{aligned}$$

Observe that (22) is

$$\begin{aligned} \mathbb{E }\left[ \Bigg \langle e^{N,r},\bigotimes _{1\le l\le r}A_l\Bigg \rangle \right] . \end{aligned}$$
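To make this identification concrete, here is a minimal numerical sketch for the simplest case \(K=1\); the arrays \(A_l\) are filled with illustrative random values in place of the paper’s potentials:

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, alpha = 4, 3, 2.0
# Illustrative values of alpha - J(x^l_v) for K = 1; row l is the array A_l.
A = alpha - rng.uniform(0.0, alpha, size=(r, N))

# Direct evaluation of (22) for K = 1: N^{-1} * sum_v prod_l A_l[v].
direct = np.prod(A, axis=0).sum() / N

# Inner product <e^{N,r}, tensor product of the A_l>.
T = A[0]
for l in range(1, r):
    T = np.tensordot(T, A[l], axes=0)    # r-fold outer product
e = np.zeros((N,) * r)
for i in range(N):
    e[(i,) * r] = 1.0 / N                # diagonal vector e^{N,r}
inner = (e * T).sum()

assert np.isclose(direct, inner)
```

The vector \(e^{N,r}\) simply averages the tensor product along its diagonal, which is exactly the sum appearing in (22).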

Now consider the right-hand side of (21). For convenience, denote the set of nodes \(1,\ldots ,N_1\) by \([N_1]\), and the set of nodes \(N_1+1,\ldots ,N\) by \([N_2]\). Using the same expansion, but keeping in mind that we have \({\hat{e}}\) in place of \(e\), we obtain

$$\begin{aligned}&\sum _{j=1,2}{N_j\over N}{N_j^{-K}}\sum _{v_1,\ldots ,v_K\in [N_j]} \int \limits _{x^1,\ldots ,x^r\in \mathbb{R }^N} \mathbb{E }\prod _{1\le l\le r}\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) \right) \prod _{u}h_u\left( x^l_{u}\right) \\&\quad \times \prod _{e\in E(\mathbb{G }_0)}J_e\left( x^l_{u^e_1},\ldots ,x^l_{u^e_K}\right) dx^1\cdots dx^r. \end{aligned}$$

Given \(x^1,\ldots ,x^r\), we now focus on

$$\begin{aligned} \sum _{j=1,2}{N_j\over N}{N_j^{-K}}\sum _{v_1,\ldots ,v_K\in [N_j]}\mathbb{E }\prod _{1\le l\le r}\left( \alpha -J\left( x^l_{v_1},\ldots ,x^l_{v_K}\right) \right) . \end{aligned}$$
(23)

For each \(j=1,2\) consider the \(N^r\)-dimensional vector \(e^{N,r,j}\) defined as follows: for every \(1\le i_1,\ldots ,i_r\le N\),

$$\begin{aligned} e^{N,r,j}_{i_1,\ldots ,i_r}=\left\{ \begin{array}{l@{\quad }l} N_j^{-1}, &{} \hbox {if} \ i_1=i_2=\cdots =i_r\in [N_j]; \\ 0, &{} \hbox {otherwise.} \end{array} \right. \end{aligned}$$

Now observe that (23) is

$$\begin{aligned} \sum _{j=1,2}{N_j\over N}\mathbb{E }\left[ \Bigg \langle e^{N,r,j},\bigotimes _{1\le l\le r}A_l\Bigg \rangle \right] , \end{aligned}$$

where \(A_l, 1\le l\le r\) are defined as above. By the assumption of convexity of the expected tensor product \(\mathbb{E }[\bigotimes _{1\le l\le r}A_l]\), which is (9), we obtain

$$\begin{aligned} \sum _{j=1,2}{N_j\over N}\mathbb{E }\left[ \Bigg \langle e^{N,r,j},\bigotimes _{1\le l\le r}A_l\Bigg \rangle \right] \ge \mathbb{E }\left[ \Bigg \langle \sum _{j=1,2}{N_j\over N}e^{N,r,j},\bigotimes _{1\le l\le r}A_l\Bigg \rangle \right] . \end{aligned}$$

Recognizing \(\sum _{j=1,2}{N_j\over N}e^{N,r,j}\) as \(e^{N,r}\) we obtain the claimed bound (21). This completes the proof of Proposition 2. \(\square \)
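The final identification \(\sum _{j=1,2}{N_j\over N}e^{N,r,j}=e^{N,r}\) can be checked entry by entry: on the diagonal over \([N_j]\), the weight is \({N_j\over N}\cdot N_j^{-1}=N^{-1}\). A minimal sketch with illustrative dimensions:

```python
import numpy as np

def diag_vec(indices, weight, N, r):
    # Diagonal vector with the given weight at positions (i,...,i), i in indices.
    e = np.zeros((N,) * r)
    for i in indices:
        e[(i,) * r] = weight
    return e

N1, N2, r = 3, 2, 4
N = N1 + N2
e_N = diag_vec(range(N), 1.0 / N, N, r)        # e^{N,r}
e_1 = diag_vec(range(N1), 1.0 / N1, N, r)      # e^{N,r,1}, supported on [N1]
e_2 = diag_vec(range(N1, N), 1.0 / N2, N, r)   # e^{N,r,2}, supported on the rest

assert np.allclose((N1 / N) * e_1 + (N2 / N) * e_2, e_N)
```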