Abstract
We discuss a generalization of the classic condition of validity of the interpolation method for the density of quenched free energy of mean field spin glasses. The condition is written just in terms of the \(L^2\) metric structure of the Gaussian random variables. As an example of application, we deduce the existence of the thermodynamic limit for a GREM model with infinitely many branches for which the classic conditions of validity fail. We underline the dependence of the density of quenched free energy on the metric structure alone and discuss the models from a metric viewpoint.
1 Introduction
The interpolation method is a simple but powerful technique used to prove inequalities for Gaussian random vectors (see for example [20] and [21]). This method has great relevance in the field of Mathematical and Theoretical Physics since it represents an essential ingredient in the study of mean field spin glasses. In the breakthrough paper [19] it has been used to prove the existence of the thermodynamic limit for the quenched density of free energy for the Sherrington–Kirkpatrick model. This was a longstanding problem and its solution was the turning point towards the proof of the Parisi Formula [29].
Spin glasses are simple mathematical models for disordered systems whose rigorous analysis is indeed a challenge for mathematicians. We refer the mathematically interested reader to [24, 28] and the physically interested one to [23]. Among plenty of models, one of the most studied is that introduced by Sherrington and Kirkpatrick in [26] as a solvable elementary model. Indeed, the structure of the solution turned out to be much richer and more complex than expected and was built up in a series of papers by Parisi (see [23] for a detailed discussion). A rigorous proof of the Parisi conjectured solution was missing for a long time and the interpolation method played a key role in its proof. See [18] for a review on this.
Using the same idea of [19], the authors of [12] proposed a general setting for the interpolation method in the framework of mean field spin glasses. Furthermore, they successfully applied this technique to prove the existence of the thermodynamic limit for the Generalized Random Energy Model (GREM, a family of models introduced in [15]) with a finite number of levels.
The interpolation method is now a powerful technique with many different applications in different contexts; see for example [1,2,3,4,5,6, 22], a list that is far from exhaustive.
The “classical” hypothesis under which the interpolation method can be applied to the quenched free energy of mean field spin glasses consists of a collection of equalities and inequalities for the covariance matrix of the underlying multivariate Gaussian process. We show that less restrictive conditions are actually needed. More precisely, we show that the method works under conditions that involve just the \(L^2\) metric structure of the Gaussian random vectors. By the correspondence in [27, 17] this is always a Euclidean metric structure. A condition of this type is very natural since the quenched free energy depends on the distribution of the Gaussian random vector only through its metric structure. This gives an interesting geometric flavor and interpretation, and we discuss, at the end of the paper, the models from a purely metric viewpoint. This generalized condition of validity was also obtained, through a tricky computation, in the framework of Sudakov–Fernique inequalities in [11]. Here we deduce the condition by a general argument that could in principle be applied also to comparison inequalities involving expected values of different functions of Gaussian vectors. As an example of application of the generalized condition, we consider a GREM model with infinitely many levels and deduce the existence of the thermodynamic limit for the quenched density of free energy. Indeed, in this case the usual condition of validity of the interpolation method used in [12, 19] fails. We can therefore deduce the existence of the thermodynamic limit directly using the simple argument of the interpolation method. We refer to [9, 25] and [10] for the beautiful mathematics involved in the limit of such kind of models.
The structure of the paper is the following.
In Sect. 2 we briefly recall the basics of the interpolation method together with the conditions used in [19] and [12]; we then discuss the Euclidean metric structure associated to any Gaussian random vector and finally show the generalized conditions.
In Sect. 3 we discuss two examples. The first one is the Sherrington–Kirkpatrick model. This is done simply to recall the basic mechanism and idea of application. The second example is a GREM model with infinite levels for which it is necessary to use the generalized conditions to prove the existence of the thermodynamic limit. In the final part of this section we discuss the models from a purely metric viewpoint introducing a class of models that have a natural metric structure and for which it is possible to show the existence of the thermodynamic limit.
In the Appendix we state and prove an elementary auxiliary Lemma.
2 The Interpolation Method
2.1 The Interpolation Method
Let \(X=(X_1,\dots ,X_n)\) be an n-dimensional zero mean Gaussian random vector having covariance matrix C. The \(n\times n\) symmetric matrix C is nonnegative definite and its elements are defined by \(C_{i,j}{:=}{\mathbb {E}}\left[ X_iX_j\right] \). When C is positive definite, the distribution of X is absolutely continuous with respect to the Lebesgue measure on \({\mathbb {R}}^n\) and the density is
where \(\left( \,\cdot \, ,\,\cdot \,\right) \) denotes the Euclidean scalar product in \({\mathbb {R}}^n\). We restrict to the case of positive definite matrices since the other cases can be deduced by a limiting procedure. We have the Fourier transform representation
We denote by \(\mathrm {Tr}\,(\,\cdot \,)\) the trace of a matrix and consider the set \(\,\overline{\! \mathcal {C}}\) of nonnegative definite symmetric matrices endowed with the Hilbert–Schmidt scalar product
We denote by \({\mathcal {C}}\) the open subset of positive definite symmetric matrices.
Let \(\phi : {\mathcal {C}} \times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^+\) be as defined in (2.1). By (2.2) and a direct computation we have
and
Recall that in the above formulas C is a symmetric matrix so that the variations in the computation of (2.4) are constructed varying symmetrically the matrix C. More precisely let \(E^{\{i,j\}}\) with \(i\ne j\) be the symmetric matrix such that \(E^{\{i,j\}}_{i,j}=E^{\{i,j\}}_{j,i}=1\) and having all the remaining elements equal to zero. Given \(F:{\mathcal {C}}\rightarrow {\mathbb {R}}\) we define
Consider now \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) a \(C^2\) function with moderate growth at infinity, for example such that \(|f(x)|\le \mathrm {e}^{\lambda |x|}\) for a suitable constant \(\lambda \ge 0\). This technical condition is related to the validity of some integrations by parts. We call \(\nabla ^2f\left( x\right) \) the Hessian matrix of f at x, that is, the symmetric matrix having elements
The following result is the interpolation method. For the reader's convenience we give the short proof.
Lemma 2.1
(Interpolation method) Consider two zero mean Gaussian random vectors X, Y having covariance matrices respectively given by \(C^X\) and \(C^Y\). Consider a \(C^2\) function f with moderate growth. We have
where
and X, Y are two independent copies of the random vectors.
Proof
When Z is an n-dimensional centered Gaussian random vector, \(\mathbb {E}\left[ f(Z)\right] \) depends only on the covariance matrix C of the vector Z. Fix a \(C^2\) function f and define the function \(F:\overline{{\mathcal {C}}}\rightarrow {\mathbb {R}}\) as
With the help of formulas (2.4), (2.5), when \(C\in {\mathcal {C}}\) we can compute
and
Given a \(C^1\) parametric curve \(\left\{ C(t)\right\} _{t\in [0,1]}\) on \({\mathcal {C}}\) such that \(C(0)=C^X\) and \(C(1)=C^Y\), then we have
where \(Z\left( t\right) \) is a centered Gaussian random vector having covariance \(C\left( t\right) \). The special case when the curve linearly interpolates between \(C^X\) and \(C^Y\) gives (2.7) with Z(t) given by (2.8). If one or both of the matrices \(C^X\) and \(C^Y\) are not strictly positive definite, it is possible to add \(\varepsilon \mathbb I\) to the matrices, perform the same computation as above and finally take the limit \(\varepsilon \rightarrow 0\). \(\square \)
The above formula is the core of the interpolation method. It is very useful to establish inequalities between the two expected values on the left hand side of (2.7).
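The interpolation formula can be checked exactly in the quadratic case, where everything is computable in closed form. The sketch below assumes (2.7) has the standard Gaussian-interpolation form \(\mathbb {E}[f(Y)]-\mathbb {E}[f(X)]=\frac{1}{2}\int _0^1\sum _{i,j}(C^Y-C^X)_{i,j}\,\mathbb {E}\big [\partial ^2_{i,j}f(Z(t))\big ]\,dt\); for \(f(x)=x^TMx\) one has \(\mathbb {E}[f(Z)]=\mathrm {Tr}(MC)\) and a constant Hessian \(2M\), so the integral is trivial. The matrices below are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

# Sketch: verify the interpolation identity in the exactly solvable quadratic
# case f(x) = x^T M x, where E[f(Z)] = Tr(M C) for Z ~ N(0, C) and the
# Hessian of f is the constant matrix 2M.

def expected_f(M, C):
    """E[x^T M x] for x ~ N(0, C)."""
    return np.trace(M @ C)

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
CX, CY = A @ A.T, B @ B.T          # two covariance matrices C^X, C^Y
M = rng.standard_normal((n, n))
M = (M + M.T) / 2                   # symmetric, so the Hessian of f is 2M

lhs = expected_f(M, CY) - expected_f(M, CX)
# Right hand side of the interpolation formula: the second derivatives of the
# quadratic f are the constants 2 M_{ij}, so the t-integral disappears.
rhs = 0.5 * np.sum((CY - CX) * (2 * M))
assert np.isclose(lhs, rhs)
```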
The Guerra–Toninelli interpolation method is a simple but powerful technique developed in the study of mean field spin glasses (see [18, 19] and references therein), which is based on an abstract theorem about Gaussian random variables. It corresponds to the interpolation method Lemma 2.1 with the special choice of the function
where \(w_i\in {\mathbb {R}}^+\) are some fixed positive weights.
In particular, Guerra and Toninelli obtained and used the following result (this is Theorem 2 in [18]) to prove the existence of the thermodynamic limit of the Sherrington–Kirkpatrick model. The same idea and the same Theorem (Theorem 2.2 below) was used later on in [12] to deduce the existence of the thermodynamic limit for a GREM model [15] with a finite number of levels.
Theorem 2.2
Let X, Y be two centered Gaussian random vectors and let the function f be given by (2.14). If
then we have
We give the proof of Theorem 2.2, which is based on the interpolation formula (2.7).
Proof of Theorem 2.2
Let us call, for any \(i=1,\ldots ,n\)
By a direct computation, when f is (2.14), we have
By the formulas (2.19), (2.20) and conditions (2.15), (2.16), we have that
and the result follows by (2.7). \(\square \)
2.2 Covariances and Metrics
We start recalling some simple but useful Lemmas.
Lemma 2.3
We have that the \(n\times n\) symmetric matrix C belongs to \({\mathcal {C}}\) if and only if there exist n vectors \(a^{(i)}\in {\mathbb {R}}^n\) such that
This is a classic result and the matrix C is called the Gram matrix of the vectors \(\left( a^{(i)}\right) _{i=1,\dots ,n}\), see for example [7].
A finite metric space with n points is called Euclidean if there exists a collection of n points in \({\mathbb {R}}^k\) having the same relative interdistances. Of course we can always fix \(k=n\). Not every metric space can be realized in this way. The simplest example is the shortest-path metric on the vertices of the graph in Fig. 1, where the edges all have length 1.
Given a centered Gaussian random vector X there is naturally associated the metric \(d_X\) that is the \(L^2\) distance between the random variables
We have the following result (see also [17, 27]).
Lemma 2.4
A finite metric space \(\left( \left\{ 1,\dots ,n\right\} , d\right) \) is Euclidean if and only if there exists a zero mean Gaussian random vector \(X=(X_1,\dots ,X_n)\) such that \(d=d_X\).
Proof
Consider a Euclidean distance d and let \(a^{(i)}\), \(i=1,\dots ,n\), be points in \({\mathbb {R}}^n\) that realize it. This means that \(d(i,j)=\left| a^{(i)}-a^{(j)}\right| \), where \(\left| \,\cdot \,\right| \) denotes the Euclidean norm in \({\mathbb {R}}^n\). Such a collection of vectors exists by definition of a Euclidean metric space. Let A be the \(n\times n\) matrix defined by \(A_{i,j}{:=}a^{(i)}_j\). Let \(Z=(Z_1,\dots ,Z_n)\) be a vector of i.i.d. standard Gaussian random variables and consider the Gaussian vector \(X=AZ\), whose covariance \(C^X=AA^T\) coincides with the right hand side of (2.22). Using (2.23) we have
Conversely, let X be a zero mean Gaussian vector with covariance \(C^X\) and let A be an \(n\times n\) matrix such that \(C^X=AA^T\). Define n vectors in \({\mathbb {R}}^n\) by \(a^{(i)}_j{:=}A_{i,j}\); by (2.23), \(d_X\) is determined by the first equality in (2.24) and is therefore Euclidean.\(\square \)
Other simple but useful lemmas are the following. We give just the statements, the proofs can be found for example in [8].
Lemma 2.5
Let \(v^{(1)},\dots , v^{(n)}\) and \(w^{(1)},\dots , w^{(n)}\) be two collections of n vectors in \({\mathbb {R}}^n\). We have that
if and only if there exists \(O\in O\left( n\right) \) such that \(w^{(i)}=Ov^{(i)}\) for any i.
Lemma 2.6
Let \(v^{(1)},\dots ,v^{(n)}\) and \(w^{(1)},\dots ,w^{(n)}\) be two collections of vectors in \({\mathbb {R}}^n\). We have that
if and only if there exists \(O\in O\left( n\right) \) and a vector \(b\in {\mathbb {R}}^n\) such that
The metric structure \(d_X\), associated to a Gaussian random vector X, contains less information than the covariance \(C^X\) and there are random vectors having different covariances but the same metric structure. This type of invariance is best understood in terms of the vectors in \({\mathbb {R}}^n\) using the above Lemmas that characterize invariance by rotations and translations. In particular we can completely characterize the centered Gaussian random variables that share the same metric structure.
Lemma 2.7
Given two n-dimensional centered Gaussian random vectors X and Y, we have that \(d_X=d_Y\) if and only if there exists a centered Gaussian random variable W such that the random vector \(\left( X_i+ W\right) _{i=1,\dots ,n}\) has the same distribution as Y.
Proof
If Y has the same distribution as \(X+W{1}_n\), where \({1}_n=\left( 1,1,\ldots ,1\right) \) is the n-dimensional vector of all ones, then
Conversely, suppose that \(d_X=d_Y\). There exist two matrices \(A^X\) and \(A^Y\) such that \(A^XZ\) has the same distribution as X and \(A^YZ\) has the same distribution as Y, where Z is an n-dimensional vector of i.i.d. standard Gaussian random variables. We define two collections \(v^{(i)}, w^{(i)}\), \(i=1,\dots, n\), of vectors in \({\mathbb {R}}^n\) by \(v^{(i)}_j{:=}A^X_{i,j}\) and \(w^{(i)}_j{:=}A^Y_{i,j}\). Since \(d_X=d_Y\) we have
and by Lemma 2.6 there exist \(O\in O(n)\) and a vector \(b\in {\mathbb {R}}^n\) such that \(w^{(i)}=Ov^{(i)}+b\), \(i=1,\dots ,n\). In terms of the corresponding matrices this means that \(A^Y=A^XO^T+B\), where the matrix B is defined as \(B_{i,j}{:=}b_j\). We obtain therefore
The random vector \(A^XO^TZ\) is a centered Gaussian random vector with covariance \(A^XO^TO(A^X)^T=C^X\), so that it has the same law as X. The random vector BZ has all components equal and, setting \(W=\sum _{j=1}^nb_jZ_j\), we finish the proof. \(\square \)
A direct consequence of the above result is the following. Define the function \(F:\,\overline{\! {\mathcal {C}}}\rightarrow {\mathbb {R}}\) by
where X is a centered Gaussian random vector with covariance C.
Lemma 2.8
Given \(C^X, C^Y \in \,\overline{\! {\mathcal {C}}}\) such that \(d_X=d_Y\), then \(F(C^X)=F(C^Y)\).
Proof
Since \(d_X=d_Y\), by Lemma 2.7 we have that \(Y=X+W{1}_n\) in distribution and therefore
where the last equality follows by the fact that W is centered. \(\square \)
This Lemma simply says that we can write the right hand side of (2.30) as \(\widetilde{F}(d)\), since the function depends just on the metric structure of the random variables and not on their covariances.
We expect therefore to have a version of Theorem 2.2 with conditions written just in terms of the metrics. This is done in the next section.
2.3 A Generalized Condition
We show how to generalize Theorem 2.2, proving that (2.17) can be deduced under weaker hypotheses concerning just the metric structures. The same inequality has been obtained in [11] with a tricky computation. Here we show that this fact follows from a general argument that may be applied to different functions f.
Theorem 2.9
Let X, Y be two centered Gaussian random vectors and let the function f be given by (2.14). If
then
Note that if conditions (2.15) and (2.16) are satisfied then (2.31) holds, but it is easy to construct examples for which (2.31) holds but (2.15), (2.16) are violated.
Observe that for any x we have that \(\mu \left( x\right) =\left( \mu _1\left( x\right) , \dots ,\mu _n\left( x\right) \right) \in {\mathcal {I}}^n\) (recall definition (2.18)) where
Namely, \({\mathcal {I}}^n\subset {\mathbb {R}}^n\) is an \((n-1)\)-dimensional simplex with extremal elements \(\mu ^{(1)},\dots ,\mu ^{(n)}\), where \( \mu ^{(l)}_i=\delta _{li}\).
We start with a preliminary Lemma
Lemma 2.10
Consider a symmetric matrix D and the function \(G:{\mathcal {I}}^n\rightarrow {\mathbb {R}}\) defined as
We have that
if and only if
Proof
If condition (2.35) holds, then
To obtain the last identity we used the fact that \(\mu \in {\mathcal {I}}^n\). Conversely, suppose that inequality (2.34) holds. Choose \(\mu \) such that \(\mu _l=\mu _m=\frac{1}{2}\) for some \(l\ne m\) and \(\mu _i=0\) otherwise; then (2.33) becomes
where we used the symmetry of D. Consider all the couples \(l,m\in \left\{ 1,\ldots ,n\right\} \) to get the result. \(\square \)
Proof of Theorem 2.9
By formula (2.7) we deduce the result once we show that
where we called
Using (2.19) and (2.20) we obtain that the expression to be minimized in (2.37) is
We have therefore that the infimum in (2.37) coincides with \(\inf _{\mu \in {\mathcal {I}}^n}G(\mu )\) and the result follows by Lemma 2.10 since (2.35) with the matrix D defined by (2.38) coincides with (2.31). \(\square \)
3 Examples
In this section we discuss two examples, obtaining the existence of the thermodynamic limit for the quenched free energy of two models. The first one is the Sherrington–Kirkpatrick model. The existence of the thermodynamic limit for this model was obtained, by the interpolation method, in the breakthrough paper [19], using Theorem 2.2. We review this result as a warm-up to fix ideas and the basic constructions. We use however Theorem 2.9 and discuss the result just in terms of the metrics. Then we discuss a class of Generalized Random Energy Models [15] for which in general conditions (2.15), (2.16) fail while condition (2.31) holds. We refer to [9, 25] and [10] for the beautiful mathematics involved in the limit of such kind of models. In the final part of the section we discuss some models from a purely metric viewpoint.
3.1 The Sherrington–Kirkpatrick Model
The Sherrington–Kirkpatrick model is a mean field spin glass model [18, 24, 26, 28]. Spin configurations are \(\sigma \in \{-1,1\}^N\) and the energy of the system is given by
where \(J_{i,j}\) are i.i.d. standard Gaussian random variables. Small variants of the model consider different sums in (3.1) but all the variants are equivalent modulo simple transformations. The spins are associated to the vertices of a complete graph and the interaction between each pair of spins is determined by the variables J’s. The partition function is defined as
where the parameter \(\beta \) is the inverse temperature and the quenched free energy per site is defined by
where the last equality defines the symbol \(\alpha _N\left( \beta \right) \). The variables \(\left( \beta H_N\left( \sigma \right) \right) _{\sigma \in \left\{ -1,1\right\} ^N}\) are a centered Gaussian random vector with covariance
where
is the overlap between the configurations \(\sigma \) and \(\sigma '\). The corresponding Euclidean distance according to (2.23) is given by
where
is the Hamming distance. By the way, the Hamming distance is an example of a non-Euclidean metric. Notice that we have of course \(d_N\left( \sigma ,\sigma \right) =0\), but we also have \(d_N\left( \sigma ,-\sigma \right) =0\) since \(H_N\left( -\sigma \right) =H_N\left( \sigma \right) \). The fact that the right hand side of (3.7) is a distance (indeed a pseudo-distance) is not trivial, but follows directly since it is obtained by (2.23) (a function of a metric that is again a metric, see [13, 16]).
Let us split the system into two subsystems \(S_1, S_2\) with respectively \(N_1\) and \(N_2\) vertices with \(N_1+N_2=N\). We erase the interaction between spins that belong to different subsystems. We define the restricted Hamiltonians of the subsystems as
where we remark that the sum is restricted to the indices belonging to the subsystems labeled \(k=1,2\). Here and hereafter we continue to use the symbol \(\sigma \) both for the full configuration as well as for the configuration restricted to a subsystem. When a configuration appears in an expression that is labeled by a subsystem then we mean the configuration restricted to the subsystem. For example \(d^H_{N_k}\left( \sigma ,\sigma '\right) \) and \(d_{N_k}\left( \sigma ,\sigma '\right) \) are respectively the Hamming distance (3.7) and the distance (3.6) when the configuration is restricted to the subsystem \(k=1,2\). Note that with this notation we have the key relationship
Another important relationship is
We apply Theorem 2.9 with the vectors
The condition (2.31) becomes the super-Pythagorean relation
that is equivalent to
The above inequality is true by (3.9) and the concavity of the real function \(x \rightarrow x\left( 1-x\right) \). By Theorem 2.9 and (3.10) we deduce
and by subadditivity and the classic Fekete Lemma we deduce that the limit of the quenched free energy per site exists
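The concavity step above can be checked mechanically. Writing \(x=d^H_N\left( \sigma ,\sigma '\right) /N\) for the Hamming fraction, the Hamming distance is additive over the split, \(Nx_N=N_1x_1+N_2x_2\), and the concavity of \(x\rightarrow x(1-x)\) gives \(Nx_N(1-x_N)\ge N_1x_1(1-x_1)+N_2x_2(1-x_2)\), which is the inequality needed for the super-Pythagorean relation. A minimal sketch over random configurations (the split sizes are arbitrary examples):

```python
import numpy as np

# Sketch: for x_N = (N1 x_1 + N2 x_2)/N a convex combination with weights
# N1/N, N2/N, Jensen's inequality for the concave map phi(x) = x(1-x) gives
#   N phi(x_N) >= N1 phi(x_1) + N2 phi(x_2).

def phi(x):
    return x * (1.0 - x)

rng = np.random.default_rng(3)
N1, N2 = 6, 10
N = N1 + N2
for _ in range(100):
    sigma = rng.choice([-1, 1], size=N)
    tau = rng.choice([-1, 1], size=N)
    h1 = np.sum(sigma[:N1] != tau[:N1])     # Hamming distance in subsystem 1
    h2 = np.sum(sigma[N1:] != tau[N1:])     # Hamming distance in subsystem 2
    x1, x2, xN = h1 / N1, h2 / N2, (h1 + h2) / N
    assert N * phi(xN) >= N1 * phi(x1) + N2 * phi(x2) - 1e-12
```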
3.2 The Generalized Random Energy Model
The Generalized Random Energy Model (GREM) is a spin glass model introduced by Derrida [15] to generalize the REM (Random Energy Model) by imposing pair correlations between energies. The model has a hierarchical structure, as any spin configuration corresponds to a leaf of a given rooted tree.
We consider sequences of finite trees codified by finite strings of nonnegative integers. Let \(n\in \mathbb {N}\) and \(\underline{k}=\left( k_1,\ldots ,k_n\right) \) be a vector of nonnegative integers and call \(\left| \underline{k}\right| {:=}k_1+\ldots +k_n\). The tree \(\mathcal {T}_{\underline{k}}\) is constructed as follows. The root (that is, the unique node at level 0) is connected to \(2^{k_1}\) nodes to compose the first level. Each node of the first level is connected to \(2^{k_2}\) nodes of the second level; we have therefore \(2^{k_1+k_2}\) nodes on the second level, and so on. The nth level consists of \(2^{k_1}2^{k_2}\cdots 2^{k_n}=2^{\left| \underline{k}\right| }\) leaves. If there exists a \(1\le j<n\) such that \(k_j=0\), we mean that the nodes of level j coincide with those of level \(j-1\). A spin configuration \(\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\right| }\) is then attached to each leaf. The Hamiltonian is
where \(\varepsilon _i^{(\sigma )}\sim \mathcal {N}\left( 0,a_i\right) \) if \(k_i>0\) and \(\varepsilon _i^{(\sigma )}=0\) if \(k_i=0\). The \(a_i\)'s are positive numbers such that \(\sum _{i=1}^{+\infty } a_i=1\).
The random variables \(\varepsilon \)'s are attached to the edges of the tree. More precisely, attached to the edges that connect level \(i-1\) to level i there is a family of i.i.d. centered Gaussian random variables with variance \(a_i\), one for each edge. When we write \(\varepsilon ^{(\sigma )}_i\) we mean the random variable associated to the unique edge that connects level \(i-1\) to level i and that belongs to the unique path from the leaf associated to \(\sigma \) to the root. When \(k_i=0\) there are no edges from level \(i-1\) to level i and therefore we set \(\varepsilon ^{(\sigma )}_i=0\). Then, \(\left( H_{\underline{k}}(\sigma )\right) _{\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\right| }}\) is a centered Gaussian random vector on the \(\left| \underline{k}\right| \)-dimensional hypercube \(\left\{ -1,1\right\} ^{\left| \underline{k}\right| }\).
We call \(l=l\left( \sigma ,\tau \right) \in \{0,1, \dots, n-1\}\) the level of the hierarchy at which the two paths from the leaves \(\sigma \) and \(\tau \) of \(\mathcal {T}_{\underline{k}}\) to the root merge. The two configurations share the same energy variables \(\varepsilon ^{(\sigma )}_i=\varepsilon ^{(\tau )}_i\) for any \(i\le l\), while \(\varepsilon ^{(\sigma )}_i\ne \varepsilon ^{(\tau )}_i\) whenever \(i> l\). When \(\varepsilon ^{(\sigma )}_i\) and \(\varepsilon ^{(\tau )}_i\) are different, they are independent. Furthermore, \(\varepsilon ^{(\sigma )}_i\) and \(\varepsilon ^{(\tau )}_j\) are always independent if \(j\ne i\). We define \(\widetilde{a}_i{:=}a_i\) when \(k_i>0\) and \(\widetilde{a}_i{:=}0\) when \(k_i=0\). We get
pointing out that the right hand side above is zero when \(l=0\). The corresponding metric according to (2.23) is given by
The term inside the square root on the right hand side represents, up to a multiplicative factor, the minimal path length distance between the two leaves \(\sigma \) and \(\tau \) on the tree when each edge between level \(i-1\) and level i has length \(a_i\). Since the graph is a tree the path is unique and the metric (3.17) is an ultrametric. We introduce, for notational convenience, the normalized distance
so that \(d_{\underline{k}}\left( \sigma ,\tau \right) =\sqrt{\left| \underline{k}\right| }\, s_{\underline{k}}\left( \sigma ,\tau \right) \) for any pair of configurations \(\sigma \) and \(\tau \).
Both the correlations (3.16) and the metric (3.17) depend on the vector \(\underline{k}\) and on the assignment of configurations to leaves. We will discuss this shortly.
Like for the Sherrington–Kirkpatrick model, given an inverse temperature \(\beta \), we introduce the disorderdependent partition function
and the quenched average of the free energy per site
We prove the existence of the thermodynamic limit of (3.20) under general assumptions when a parameter N is diverging and the vector \(\underline{k}=\underline{k}\left( N\right) \) is growing in such a way that also \(n=n\left( N\right) \) diverges. Contucci et al. [12] proved this fact when n is constant. This was obtained applying the same strategy of the Guerra–Toninelli interpolation method [19]; in particular, they used the inequality in Theorem 2.2. When n is no longer bounded this inequality fails while the inequality in Theorem 2.9 continues to work. We describe now more precisely the growing mechanism of the model and prove the existence of the thermodynamic limit.
3.2.1 Growing and Labeling
We consider a sequence of growing trees labeled by a sequence of vectors \(\underline{k}\left( N\right) \). For each \(N\in \mathbb {N}\) we have the tree \(\mathcal {T}_{\underline{k}\left( N\right) }\) defined by the following hypothesis and rules.

(H1)
Let \(\left( \alpha _i\right) _{i=1}^{\infty }\) be a sequence of reals larger than 1 satisfying the constraint
$$\begin{aligned} \sum _{i=1}^{\infty }\log \alpha _i=\log 2. \end{aligned}$$(3.21)The \(\alpha _i\)’s define the tree \(\mathcal {T}_{\underline{k}\left( N\right) }\) through
$$\begin{aligned} k_i\left( N\right) {:=}\left\lfloor \frac{N\log \alpha _i}{\log 2}\right\rfloor , \qquad i\in \mathbb {N}, \end{aligned}$$(3.22)where \(\lfloor \, \cdot \,\rfloor \) denotes the integer part.

(H2)
The sequence \(\left( a_i\right) _{i=1}^{\infty }\) corresponds to the lengths of the edges from the different levels and the variance of the associated random variables and satisfies the condition \(\sum _{i=1}^{\infty }a_i=1\).
The exact values of the sums of the series are not really important and could be substituted just by summability conditions. Formula (3.22) follows from the requirement that the number of edges connecting a given node at level \(i-1\) to nodes at level i grows exponentially like \(\alpha _i^N\).
Observe that, by (3.21), for any fixed \(N>0\) only a finite number of components of \({\underline{k}\left( N\right) }\) are different from zero. We define
and the finite vector \(\underline{k}\left( N\right) {:=}\left( k_1\left( N\right) ,\ldots ,k_n\left( N\right) \right) \). Then, a spin configuration \(\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\left( N\right) \right| }\) is assigned to each leaf. The assignment is actually arbitrary; indeed, the free energy of the system is obtained summing over all the configurations, thus getting rid of any dependence on the underlying choice.
We assign a spin configuration to each leaf of the tree as follows. At fixed N, we attach to every edge one or more labels of type \(\left( m,s\right) \), where \(s=\pm 1\) and \(m\in \left\{ 1,\ldots , \left| \underline{k}(N)\right| \right\} \). Given a leaf, there exists a unique path toward the root. If this path crosses an edge having a label \(\left( m,s\right) \), then the configuration \(\sigma \) associated to the leaf is such that \(\sigma \left( m\right) =s\). We assign the labels in such a way that every path meets all the labels \(m=1, \dots , \left| \underline{k}\left( N\right) \right| \) and different leaves are associated with different configurations.
We embed the tree in the plane so that the root is on the top and the paths from the leaves to the root go upwards. Moreover, all the edges connecting a given node with the nodes at the successive level are ordered from left to right. Each edge connecting level \(i-1\) to level i has exactly \(k_i\left( N\right) \) labels, corresponding to the values \(m=\sum _{j=1}^{i-1}k_j\left( N\right) +1, \sum _{j=1}^{i-1}k_j\left( N\right) +2, \dots , \sum _{j=1}^{i}k_j\left( N\right) \). The corresponding values of the parameter s are fixed as follows.
Fix a node at level \(i-1\). Number each edge connecting this node with a node at level i with an integer going from left to right from 0 to \(2^{k_i\left( N\right) }-1\). The leftmost will correspond to 0 while the rightmost to \(2^{k_i\left( N\right) }-1\). Do this for each node. Write these integers in binary code so that the leftmost edges are numbered with \(k_i\left( N\right) \) zeros and the rightmost with \(k_i(N)\) ones. In our setting, the 0 corresponds to the − sign and the 1 to the \(+\) sign. Then, we associate the lowest value of m to the most significant digit and the highest value of m to the least significant one. See Fig. 3 for an example.
3.2.2 Splitting the System
Let \(N>0\) and consider a pair of integers \(N_1,N_2\) such that \(N_1+N_2=N\). We already know how to construct the trees \(\mathcal {T}_{\underline{k}\left( N\right) }\), \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) and \(\mathcal {T}_{\underline{k}\left( N_2\right) }\). Their geometric structure is simply codified by the finite vectors \(\underline{k}\left( N\right) \), \(\underline{k}\left( N_1\right) \), \(\underline{k}\left( N_2\right) \) and we recall that, by definition, we have
Notice that
We associate the labels to the edges and leaves of the full system \(\mathcal {T}_{\underline{k}\left( N\right) }\) as in the previous section. The labels of the two subsystems \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) and \(\mathcal {T}_{\underline{k}\left( N_2\right) }\) are instead attributed in a slightly different way in order to have different spins (different labels m) belonging to the two subsystems.
The labels m attributed to the edges from level \(i-1\) to level i in the full system coincide with the set \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +1,\sum _{j=1}^{i-1}k_j\left( N\right) +2, \dots , \sum _{j=1}^{i}k_j\left( N\right) \right\} \). When we split the system into the two subsystems we assign to the edges that connect each node in level \(i-1\) to level i of the subsystem \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) the labels \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +1, \dots , \sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) \right\} \), while we assign to the edges that connect each node in level \(i-1\) to level i of the subsystem \(\mathcal {T}_{\underline{k}\left( N_2\right) }\) the labels \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) +1,\dots ,\sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) +k_i\left( N_2\right) \right\} \). By (3.25) this is well defined. Once the labels m are split between the two subsystems, the assignment of the label \(s=\pm \) follows the same rule as in the previous section. Since \(k_i\left( N_1\right) +k_i\left( N_2\right) \) may be strictly less than \(k_i\left( N\right) \), some of the labels m (i.e. some spins) may disappear in the splitting.
We discuss now the behavior of the distances. Consider two finite vectors \(\underline{k}\) and \(\underline{k}'\) such that \(k'_i\le k_i\) for any i. We assign the labels to \(\mathcal T_{\underline{k}}\) in the usual way, while we assign the labels to \(\mathcal T_{\underline{k}'}\) as follows. To the edges that connect each node in the level \(i-1\) to the level i of \(\mathcal {T}_{\underline{k}'}\) we assign arbitrarily \(k'_i\) of the \(k_i\) labels used in \(\mathcal T_{\underline{k}}\). The assignment of the labels \(s=\pm \) then follows the usual rule.
We call respectively \(d_{\underline{k}}\) and \(d_{\underline{k}'}\) the metrics defined by formula (3.17) for the two trees \(\mathcal T_{\underline{k}}\) and \(\mathcal T_{\underline{k}'}\), and \(s_{\underline{k}}\), \(s_{\underline{k}'}\) the corresponding normalized distances (see (3.18)). As before, given two spin configurations \(\sigma ,\tau \in \left\{ -1,1\right\} ^{\left| \underline{k}\right| }\) we call again \(\sigma ,\tau \in \left\{ -1,1\right\} ^{\left| \underline{k}'\right| }\) the same configurations restricted to the labels assigned to the edges in \(\mathcal T_{\underline{k}'}\). We have the following.
Lemma 3.1
Consider two finite vectors \(\underline{k}' \le \underline{k}\) and the corresponding trees \(\mathcal T_{\underline{k}}\) and \(\mathcal T_{\underline{k}'}\) with configurations of spins associated to the leaves as above. Then we have
Proof
Consider the tree \(\mathcal T_{\underline{k}}\), two configurations \(\sigma , \tau \) associated to two leaves, and their corresponding geodetic path. Let us now consider a new finite vector \(\underline{k}'\) obtained from \(\underline{k}\) by decreasing a single component by one and preserving all the remaining ones, i.e. \(k'_i=k_i-1\) and \(k'_j=k_j\) for all \(j\ne i\). Suppose that the label m that is missing in \(\mathcal T_{\underline{k}'}\) is \(m^*\). The tree \(\mathcal T_{\underline{k}'}\) with the corresponding labeling is obtained from \(\mathcal T_{\underline{k}}\) and the original labeling simply as follows. All the edges connecting nodes at level \(i-1\) to nodes at level i in \(\mathcal T_{\underline{k}}\) can be grouped into pairs having exactly the same labels apart from the one corresponding to \(m^*\). The two paired edges will have labels respectively \(\left( m^*,+\right) \) and \(\left( m^*,-\right) \). If we identify each paired couple of edges, and consequently also identify the subtrees starting from the identified nodes, we get a tree that coincides with \(\mathcal T_{\underline{k}'}\) with exactly the same assignments of labels. In particular, the leaves associated to \(\sigma \), \(\tau \) in the new tree will be exactly the original ones after the identification. Finally, the geodetic path too remains the same after the identification (see e.g. Fig. 4).
Since the identification procedure can only shorten this path, we have the statement of the lemma when \(\underline{k}'\) is obtained from \(\underline{k}\) by decreasing just one of its components by one. We finish the proof observing that any \(\underline{k}'\le \underline{k}\) can be obtained from \(\underline{k}\) after a finite number of iterations of this type. \(\square \)
Remark 3.2
Both \(\mathcal T_{\underline{k}\left( N_i\right) }\), \(i=1,2\), are obtained from \(\mathcal T_{\underline{k}\left( N\right) }\) as in the hypothesis of Lemma 3.1 and we have therefore
Since by (3.25) we have \(\frac{\left| \underline{k}(N_1)\right| +\left| \underline{k}(N_2)\right| }{\left| \underline{k}(N)\right| }\le 1\), we deduce
that is equivalent to the super-Pythagorean condition
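Schematically, and in our notation (the precise statement is (3.11) and the displayed condition above), the super-Pythagorean condition says that along every pair of configurations the squared \(L^2\) increments of the full system dominate the sum of those of the two subsystems:

```latex
% Sketch in our notation; the precise form is (3.11).
\mathbb{E}\left[\left(H_{\underline{k}(N)}(\sigma)-H_{\underline{k}(N)}(\tau)\right)^{2}\right]
\;\ge\;
\mathbb{E}\left[\left(H_{\underline{k}(N_1)}(\sigma)-H_{\underline{k}(N_1)}(\tau)\right)^{2}\right]
+\,\mathbb{E}\left[\left(H_{\underline{k}(N_2)}(\sigma)-H_{\underline{k}(N_2)}(\tau)\right)^{2}\right].
```

The name recalls the Pythagorean identity: the squared "hypotenuse" (the full system) is at least the sum of the squared "legs" (the subsystems).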
3.2.3 Thermodynamic Limit
We define the energy of our sequence of GREM models as \(H_N\left( \sigma \right) := H_{\underline{k}\left( N\right) }\left( \sigma \right) \) (recall definition (3.15)) and the corresponding partition function and density of free energy as in (3.19), (3.20); more precisely, \(Z_N\left( \beta \right) =\sum _{\left\{ \sigma \right\} } \mathrm {e}^{\beta H_N\left( \sigma \right) }\) and
where the last equality defines the symbol \(\alpha _N(\beta )\).
We need a preliminary Lemma. Let us call \(\gamma _i{:=}\frac{\log \alpha _i}{\log 2}>0\). Observe that by definition we have \(\sum _{i=1}^{+\infty }\gamma _i=1\).
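For instance (a hypothetical choice of parameters, ours and not the paper's), taking \(\alpha_i = 2^{2^{-i}}\) gives \(\gamma_i = 2^{-i} > 0\), which indeed sums to 1. A quick numerical sanity check:

```python
import math

# Hypothetical example (our choice, not from the paper): branching factors
# alpha_i = 2**(2**-i) give gamma_i = log(alpha_i)/log(2) = 2**-i > 0,
# and the gamma_i sum to 1 as required.

def gamma(i):
    alpha_i = 2.0 ** (2.0 ** (-i))
    return math.log(alpha_i) / math.log(2.0)

assert all(gamma(i) > 0 for i in range(1, 20))
partial = sum(gamma(i) for i in range(1, 60))
assert abs(partial - 1.0) < 1e-9   # sum_{i>=1} gamma_i = 1 (up to the tail 2**-59)
```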
Lemma 3.3
We have
Proof
For any finite k we have
The right hand side of the above equation is 1, while the left hand side converges, when \(N\rightarrow \infty \), to \(\sum _{i=1}^k\gamma _i\). Taking then the limit \(k\rightarrow \infty \) we deduce the statement of the lemma. \(\square \)
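The lemma can be illustrated numerically under the hypothetical choice \(k_i(N)=\lfloor N\gamma_i\rfloor\) with \(\gamma_i=2^{-i}\) (our choice for the sketch, not the paper's): the fraction of spins \(\frac{1}{N}\sum_i k_i(N)\) approaches 1 as N grows.

```python
import math

# Numerical illustration of Lemma 3.3 under the hypothetical choice
# k_i(N) = floor(N * gamma_i) with gamma_i = 2**-i (ours, not the paper's):
# the fraction (1/N) * sum_i k_i(N) tends to sum_i gamma_i = 1.

def total_fraction(N):
    tot, i = 0, 1
    while N * 2.0 ** (-i) >= 1.0:     # only finitely many k_i(N) are positive
        tot += math.floor(N * 2.0 ** (-i))
        i += 1
    return tot / N

assert total_fraction(10) <= 1.0
assert total_fraction(10**6) > 0.999
assert total_fraction(10**8) > total_fraction(10**2)   # closer and closer to 1
```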
We can now prove the existence of the limit of the quenched free energy per site for a GREM model with infinitely many levels.
Theorem 3.4
Under the hypotheses (H1) and (H2), the density of free energy (3.30) defined on \(\mathcal {T}_{\underline{k}\left( N\right) }\) has a limit when \(N\rightarrow \infty \), in the sense that the following limit exists and coincides with an infimum
Proof
We apply the interpolation method to the Gaussian random vectors \(H_{\underline{k}\left( N\right) }\left( \sigma \right) \) and \(H_{\underline{k}\left( N_1\right) }\left( \sigma \right) +H_{\underline{k}\left( N_2\right) }\left( \sigma \right) \), both labeled by the configurations \(\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\left( N\right) \right| }\). The Gaussian random variables used to compute \(H_{\underline{k}\left( N\right) }\), \(H_{\underline{k}\left( N_1\right) }\) and \(H_{\underline{k}\left( N_2\right) }\) are all mutually independent. Note that, since some spins are lost in the splitting, the second Gaussian random vector is degenerate.
We have the following identity
The last term is due to the fact that some spins may be lost in the splitting.
By Remark 3.2, we can apply Theorem 2.9, obtaining
Since the last term in the above inequality is nonnegative we obtain that the sequence \(\alpha _N(\beta )\) is subadditive. By Fekete’s Lemma we deduce that there exists the limit
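To recall how Fekete's Lemma operates here, consider a toy example (ours, unrelated to the specific \(\alpha_N\) of the model): the sequence \(a_N = cN + \sqrt N\) is subadditive, since \(\sqrt{m+n}\le\sqrt m+\sqrt n\), and \(a_N/N\) decreases to \(\inf_N a_N/N = c\).

```python
import math

# Toy illustration of Fekete's lemma (our example, not the paper's alpha_N):
# a_N = c*N + sqrt(N) is subadditive and a_N / N decreases to c.

c = 0.5
a = lambda n: c * n + math.sqrt(n)

# subadditivity a_{m+n} <= a_m + a_n on a few pairs
for m in (1, 7, 100):
    for n in (2, 33, 10**4):
        assert a(m + n) <= a(m) + a(n) + 1e-12

ratios = [a(n) / n for n in (10, 10**3, 10**6, 10**9)]
assert ratios == sorted(ratios, reverse=True)   # a_N / N is decreasing here
assert abs(ratios[-1] - c) < 1e-4               # and tends to inf_N a_N / N = c
```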
By Lemma 3.3 we have that
and we get the main statement of the Theorem.
It remains just to prove that the limit is strictly bigger than \(-\infty \).
This follows from the summability of the variances \(a_i\). Indeed, we prove that for any \(N>0\), \(\beta F_{N}\left( \beta \right) \) is bounded from above. We have
where we used Jensen’s inequality. Since the \(\varepsilon _i^{(\sigma )}\) are independent, the expectation value in the last row factorizes into a product of moment generating functions:
hence
where we used the fact that \(\sum _{i=1}^{\infty }a_i=1\). \(\square \)
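Schematically, and with our normalizations (assuming \({\mathbb E}\left[H_N(\sigma)^2\right]=N\sum_i a_i\) and at most \(2^N\) configurations; the precise constants are those of the displayed computation), the chain of inequalities behind this bound is:

```latex
% Sketch of the annealed bound, under our normalization assumptions:
% Jensen's inequality plus the Gaussian moment generating function
% E[e^{t\varepsilon}] = e^{t^2/2}.
\frac{1}{N}\,\mathbb{E}\left[\log Z_N(\beta)\right]
\;\le\; \frac{1}{N}\log \mathbb{E}\left[Z_N(\beta)\right]
\;=\; \frac{\left|\underline{k}(N)\right|}{N}\log 2
      + \frac{\beta^{2}}{2}\sum_{i=1}^{\infty} a_i
\;\le\; \log 2 + \frac{\beta^{2}}{2}.
```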
As a remark, we show in Lemma 3.5 in the Appendix that the third term in the right hand side of (3.35) is negligible when N is large. This fact is irrelevant for the proof but it is interesting in itself, since for different models we could have a similar situation but with the wrong sign, and a bound of this type could allow one to apply the generalized subadditive lemmas in [14].
3.3 Geometric Remarks
Since by Lemma 2.8 the density of free energy depends just on the metric structure of the Gaussian random variables, it is interesting to analyze the metric structure corresponding to the different models. Moreover, natural and interesting models can be introduced starting directly from the metric description. Since all the metric spaces involved must be Euclidean, a relevant characteristic is the dimension of the space in which the metric can be realized as a collection of points.
Another useful remark is that the super-Pythagorean relation (3.11) implies (3.13), which gives the convergence (3.14) to the infimum of the densities of free energy, while a sub-Pythagorean relation (i.e. (3.11) with the opposite inequality) would imply convergence of the density of free energy to the supremum.
Let us start with the Sherrington–Kirkpatrick model. Since the energy is defined in terms of \(N^2\) i.i.d. Gaussian random variables, the metric of the model with N sites can be represented by \(2^N\) points embedded in \({\mathbb {R}}^{N^2}\). Indeed a natural representation of this metric is the following. Consider \(\sigma \in \left\{ -1,1\right\} ^N\) as a column vector and define the rescaled projector \(\widehat{\sigma }:=\sqrt{N}\sigma \sigma ^T\), which is a positive semidefinite \(N\times N\) matrix. By a direct computation we have that the metric induced by the Sherrington–Kirkpatrick model is given by
i.e. it is the Euclidean metric on the one dimensional projectors induced by the Hilbert–Schmidt scalar product. The super-Pythagorean relation is strictly related to the fact that all the projectors belong to the cone of positive semidefinite matrices.
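Normalization constants aside, the underlying Hilbert–Schmidt identity is \(\Vert \sigma\sigma^T-\tau\tau^T\Vert^2_{HS} = 2N^2-2\left(\sigma,\tau\right)^2\), i.e. the squared distance between the rank-one matrices is a decreasing function of the squared overlap. This can be checked numerically (a sketch of ours):

```python
import random

# Numerical check of the Hilbert-Schmidt geometry behind the SK metric
# (normalization constants aside): for sigma, tau in {-1,1}^N,
#   || sigma sigma^T - tau tau^T ||_HS^2 = 2 N^2 - 2 (sigma . tau)^2 .

def hs_sq_dist(s, t):
    """Squared Hilbert-Schmidt distance between s s^T and t t^T."""
    n = len(s)
    return sum((s[i] * s[j] - t[i] * t[j]) ** 2
               for i in range(n) for j in range(n))

N = 5
random.seed(0)
for _ in range(20):
    s = [random.choice((-1, 1)) for _ in range(N)]
    t = [random.choice((-1, 1)) for _ in range(N)]
    overlap = sum(si * ti for si, ti in zip(s, t))
    assert hs_sq_dist(s, t) == 2 * N**2 - 2 * overlap**2
```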
The metric structure of the GREM is best described with the trees illustrated in Sect. 3.2.
An interesting class of models can be introduced directly defining the metric. Let \(v^{\pm 1}\in {\mathbb {R}}^2\) be two vectors. To any \(\sigma \in \left\{ -1,1\right\} ^N\) we associate the vector \(v\left( \sigma \right) \in {\mathbb {R}}^{2^N}\) defined by
We have
If we let \(v^{\pm 1}=v^{\pm 1, N}\) depend on N in such a way that \(\left\| v^{\pm 1,N}\right\| ^2=N^{\frac{1}{N}}\) and \(\left( v^{1,N}, v^{-1, N}\right) =N^{\frac{1}{N}}\alpha ^{\frac{1}{N}}\), \(\alpha \in [0,1)\), we obtain that the Euclidean metric on the \(2^N\) points embedded into \({\mathbb {R}}^{2^N}\) (the points corresponding to the vectors) is given by
This Euclidean metric (as before, it is a function of a metric that is again a metric [13, 16]) satisfies the super-Pythagorean relation (3.11) since the real function \(1-\alpha ^x\) is concave. We have therefore convergence of the free energy densities. A model that corresponds to this metric can be fixed such that \({\mathbb {E}}\left[ H_N\left( \sigma \right) H_N\left( \eta \right) \right] \sim \alpha ^{d^H_N\left( \sigma ,\eta \right) }\). The special case \(\alpha =0\) corresponds to the REM, a special case of the GREM discussed in Sect. 3.2 with one single branch and all the leaves directly connected to the root. In this case all the points are equally spaced.
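The concavity argument can be checked numerically (our sketch; \(\alpha=0.3\) is an arbitrary choice): \(f(x)=1-\alpha^x\) is concave with \(f(0)=0\), hence subadditive, which is the metric-preserving property invoked above.

```python
import random

# Numerical check (our sketch, alpha = 0.3 arbitrary): f(x) = 1 - alpha**x
# with alpha in [0,1) is concave and vanishes at 0, hence subadditive --
# the metric-preserving property making f(d) a metric whenever d is.

alpha = 0.3
f = lambda x: 1.0 - alpha ** x

random.seed(1)
assert f(0.0) == 0.0
for _ in range(1000):
    x, y = random.uniform(0, 10), random.uniform(0, 10)
    # midpoint concavity
    assert f((x + y) / 2) >= (f(x) + f(y)) / 2 - 1e-12
    # subadditivity: f(x + y) <= f(x) + f(y)
    assert f(x + y) <= f(x) + f(y) + 1e-12
```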
References
Alaoui, A.E., Krzakala, F., Vail, C.O.: Estimation in the spiked Wigner model: a short proof of the replica formula. IEEE Int. Symp. Inf. Theory 2018, 1874–1878 (2018)
Barbier, J., Dia, M., Macris, N., Krzakala, F.: The mutual information in random linear estimation. 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, pp. 625–632 (2016)
Barbier, J., Macris, N.: The adaptive interpolation method: a simple scheme to prove replica formulas in Bayesian inference. Probab. Theory Relat. Fields 174, 1133–1185 (2019)
Barbier, J., Macris, N., Dia, M., Krzakala, F.: Mutual information and optimality of approximate message-passing in random linear estimation. IEEE Trans. Inf. Theory 66(7), 4270–4303 (2020)
Barra, A.: The mean field Ising model through interpolating techniques. J. Stat. Phys. 132(5), 787–809 (2008)
Barra, A., Contucci, P., Mingione, E., Tantari, D.: Multispecies mean field spin glasses. Rigorous results. Ann. Henri Poincaré 16(3), 691–708 (2015)
Bhatia, R.: Positive Definite Matrices. Princeton Series in Applied Mathematics. Princeton University Press, Princeton, NJ (2007)
Blumenthal, L.M.: Theory and Applications of Distance Geometry, 2nd edn. Chelsea Publishing Co., New York (1970)
Bovier, A., Kurkova, I.: Derrida’s generalized random energy models. 1. Models with infinitely many hierarchies. Ann. Inst. H. Poincaré. Prob. Stat. 40, 439–480 (2004)
Bovier, A., Kurkova, I.: Derrida’s generalized random energy models. II. Models with continuous hierarchies. Ann. Inst. H. Poincaré Probab. Statist. 40(4), 481–495 (2004)
Chatterjee, S.: An error bound in the Sudakov–Fernique inequality. arXiv:math/0510424
Contucci, P., Degli Esposti, M., Giardinà, C., Graffi, S.: Thermodynamical limit for correlated Gaussian random energy models. Commun. Math. Phys. 236(1), 55–63 (2003)
Corazza, P.: Introduction to metric-preserving functions. Amer. Math. Monthly 106(4), 309–323 (1999)
de Bruijn, N.G. , Erdös, P.: Some linear and some quadratic recursion formulas. II. Nederl. Akad. Wetensch. Proc. Ser. A. 55 = Indagationes Math. 14, 152–163 (1952)
Derrida, B.: A generalization of the random energy model that includes correlations between the energies. J. Phys. Lett. 46, 401–407 (1985)
Doboš, J.: Metric Preserving Functions. Vydavateľstvo Štroffek, Košice, Slovakia, ISBN 8088896304 (1998)
Dokmanic, I., Parhizkar, R., Ranieri, J., Vetterli, M.: Euclidean Distance Matrices: Essential theory, algorithms, and applications. IEEE Signal Processing Magazine 32, 6 (2015)
Guerra, F.: Spin Glasses. In: Bovier, A., et al. (eds.) Mathematical Statistical Physics, pp. 243–271. Elsevier, Oxford (2006)
Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)
Joag-Dev, K., Perlman, M.D., Pitt, L.D.: Association of normal random variables and Slepian’s inequality. Ann. Probab. 11, 451–455 (1983)
Kahane, J.P.: Une inégalité du type Slepian et Gordon sur les processus gaussiens. Israel J. Math. 55, 109–110 (1986)
Korada, S.B., Macris, N.: Exact solution of the gauge symmetric p-spin glass model on a complete graph. J. Stat. Phys. 136(2), 205–230 (2009)
Mezard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific, Singapore (1987)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer Monographs in Mathematics. Springer, New York (2013)
Ruelle, D.: A mathematical reformulation of Derrida’s REM and GREM. Comm. Math. Phys. 108(2), 225–239 (1987)
Sherrington, D., Kirkpatrick, S.: Solvable model of a spin-glass. Phys. Rev. Lett. 35, 1792–1796 (1975)
Schoenberg, I.: Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espaces distanciés vectoriellement applicables sur l’espace de Hilbert”. Ann. Math. 36, 724–732 (1935)
Talagrand, M.: Spin Glasses: A Challenge for Mathematicians. Cavity and Mean Field Models. Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics, 46. Springer-Verlag, Berlin (2003)
Talagrand, M.: The Parisi formula. Ann. Math. (2) 163(1), 221–263 (2006)
Acknowledgements
We thank Adriano Barra, Francesco Guerra and Fabio Lucio Toninelli for several useful comments, suggestions and remarks.
Funding
Open access funding provided by Università degli Studi dell’Aquila within the CRUI-CARE Agreement.
Additional information
Communicated by Federico Ricci-Tersenghi.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
We show here that the extra terms in (3.35) are indeed negligible when N is large.
Lemma 3.5
We have
Proof
Let us call \(I\left( N\right) \subseteq \mathbb N\) the set \(I\left( N\right) {:=}\left\{ i\,:\, k_i\left( N\right) >0\right\} \). We define also \(J\left( N\right) {:=}I\left( N\right) \cap \left( I\left( N-1\right) \right) ^C\). We have for any \(N_1+N_2=N\) that
By definition we have that if \(i\in J\left( N\right) \) then \(\frac{1}{N}\le \gamma _i<\frac{1}{N-1}\). We have therefore
We deduce therefore that the series on the right hand side has to be convergent. Since \(\left| I(N)\right| =\sum _{\ell =1}^N\left| J(\ell )\right| \) we have
The series on the right hand side of (3.45) is convergent, thus we deduce the statement by the dominated convergence theorem. This, together with (3.44) and Lemma 3.3, concludes the proof. \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Boccagna, R., Gabrielli, D. Remarks on the Interpolation Method. J Stat Phys 181, 1218–1238 (2020). https://doi.org/10.1007/s10955-020-02624-x