1 Introduction

The interpolation method is a simple but powerful technique used to prove inequalities for Gaussian random vectors (see for example [20] and [21]). It is of great relevance in mathematical and theoretical physics, since it is an essential ingredient in the study of mean field spin glasses. In the breakthrough paper [19] it was used to prove the existence of the thermodynamic limit for the quenched density of free energy of the Sherrington–Kirkpatrick model. This was a longstanding problem and its solution was the turning point towards the proof of the Parisi formula [29].

Spin glasses are simple mathematical models for disordered systems whose rigorous analysis is a genuine challenge for mathematicians. We refer the mathematically interested reader to [24, 28] and the physically interested one to [23]. Among the many models, one of the most studied is the one introduced by Sherrington and Kirkpatrick in [26] as a solvable elementary model. The structure of the solution turned out to be much richer and more complex than expected and was built up in a series of papers by Parisi (see [23] for a detailed discussion). A rigorous proof of the solution conjectured by Parisi was missing for a long time, and the interpolation method played a key role in its proof. See [18] for a review.

Using the same idea as in [19], the authors of [12] proposed a general setting for the interpolation method in the framework of mean field spin glasses. Furthermore, they successfully applied this technique to prove the existence of the thermodynamic limit for the Generalized Random Energy Model (GREM, a family of models introduced in [15]) with a finite number of levels.

The interpolation method is by now a powerful technique with many applications in different contexts; see for example [1,2,3,4,5,6, 22], a list that is by far not exhaustive.

The “classical” hypothesis under which the interpolation method can be applied to the quenched free energy of mean field spin glasses consists of a collection of equalities and inequalities for the covariance matrix of the underlying multivariate Gaussian process. We show that less restrictive conditions actually suffice. More precisely, we show that the method works under conditions that involve just the \(L^2\) metric structure of the Gaussian random vectors. By the correspondence in [27, 17] this is always a Euclidean metric structure. A condition of this type is very natural, since the quenched free energy depends on the distribution of the Gaussian random vector only through its metric structure. This gives an interesting geometric flavor and interpretation, and at the end of the paper we discuss the models from a purely metric viewpoint. This generalized condition of validity was also obtained, through a tricky computation, in the framework of Sudakov–Fernique inequalities in [11]. Here we deduce the condition from a general argument that could in principle be applied also to comparison inequalities involving expected values of other functions of Gaussian vectors. As an application of the generalized condition, we consider a GREM model with infinitely many levels and deduce the existence of the thermodynamic limit for the quenched density of free energy. Indeed, in this case the usual condition of validity of the interpolation method used in [12, 19] fails. We can therefore deduce the existence of the thermodynamic limit directly, using the simple argument of the interpolation method. We refer to [9, 25] and [10] for the beautiful mathematics involved in the limit of such models.

The structure of the paper is the following.

In Sect. 2 we briefly recall the basics of the interpolation method together with the conditions used in [19] and [12]; we then discuss the Euclidean metric structure associated with any Gaussian random vector, and finally prove the generalized conditions.

In Sect. 3 we discuss two examples. The first one is the Sherrington–Kirkpatrick model. This is done simply to recall the basic mechanism and idea of application. The second example is a GREM model with infinite levels for which it is necessary to use the generalized conditions to prove the existence of the thermodynamic limit. In the final part of this section we discuss the models from a purely metric viewpoint introducing a class of models that have a natural metric structure and for which it is possible to show the existence of the thermodynamic limit.

The Appendix contains an elementary auxiliary lemma.

2 The Interpolation Method

2.1 The Interpolation Method

Let \(X=(X_1,\dots ,X_n)\) be an n-dimensional zero mean Gaussian random vector with covariance matrix C. The \(n\times n\) symmetric matrix C is non-negative definite and its elements are defined by \(C_{i,j}{:=}{\mathbb {E}}\left[ X_iX_j\right] \). When C is positive definite the distribution of X is absolutely continuous with respect to the Lebesgue measure on \({\mathbb {R}}^n\) and the density is

$$\begin{aligned} \phi _{C}\left( x\right) {:=}\frac{1}{\sqrt{\left( 2\pi \right) ^n\text {det}\left( C\right) }}\mathrm {e}^{-\frac{1}{2} (x,C^{-1}x)}, \end{aligned}$$
(2.1)

where \(\left( \,\cdot \, ,\,\cdot \,\right) \) denotes the Euclidean scalar product in \({\mathbb {R}}^n\). We restrict to the case of positive definite matrices since the other cases can be deduced by a limiting procedure. We have the Fourier transform representation

$$\begin{aligned} \phi _C\left( x\right) =\frac{1}{(2\pi )^n}\int _{{\mathbb {R}}^n}\mathrm {d}\lambda \, \mathrm {e}^{-i(\lambda ,x)}\mathrm {e}^{-\frac{1}{2} (\lambda ,C\lambda )}. \end{aligned}$$
(2.2)

We denote by \(\mathrm {Tr}\,(\,\cdot \,)\) the trace of a matrix and by \(\,\overline{\! \mathcal {C}}\) the set of non-negative definite symmetric matrices, endowed with the Hilbert-Schmidt scalar product

$$\begin{aligned} \left( A,B\right) {:=}\text {Tr}\left( AB\right) , \qquad A, B\in \,\overline{\! {\mathcal {C}}}. \end{aligned}$$
(2.3)

We denote by \({\mathcal {C}}\) the open subset of positive definite symmetric matrices.

Let \(\phi : {\mathcal {C}} \times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^+\) be defined by (2.1). By (2.2) and a direct computation we have

$$\begin{aligned} \frac{\partial \phi _C\left( x\right) }{\partial C_{i,j}}=\frac{\partial \phi _C\left( x\right) }{\partial C_{j,i}}=\frac{\partial ^2\phi _C\left( x\right) }{\partial x_i\partial x_j}, \end{aligned}$$
(2.4)

and

$$\begin{aligned} \frac{\partial \phi _C\left( x\right) }{\partial C_{i,i}}=\frac{1}{2}\frac{\partial ^2\phi _C\left( x\right) }{\partial x_i^2}. \end{aligned}$$
(2.5)

Recall that in the above formulas C is a symmetric matrix, so that the variations in the computation of (2.4) are obtained by varying the matrix C symmetrically. More precisely, let \(E^{\{i,j\}}\), with \(i\ne j\), be the symmetric matrix such that \(E^{\{i,j\}}_{i,j}=E^{\{i,j\}}_{j,i}=1\), all the remaining elements being equal to zero. Given \(F:{\mathcal {C}}\rightarrow {\mathbb {R}}\) we define

$$\begin{aligned} \frac{\partial F\left( C\right) }{\partial C_{j,i}}=\frac{\partial F\left( C\right) }{\partial C_{i,j}}{:=}\lim _{\delta \rightarrow 0}\frac{F\left( C+\delta E^{\{i,j\}}\right) -F\left( C\right) }{\delta }. \end{aligned}$$
(2.6)

Consider now a \(C^2\) function \(f:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) with moderate growth at infinity, for example such that \(|f(x)|\le \mathrm {e}^{\lambda |x|}\) for a suitable constant \(\lambda \ge 0\). This technical condition is related to the validity of some integrations by parts below. We call \(\nabla ^2f\left( x\right) \) the Hessian matrix of f at x, that is the symmetric matrix with elements

$$\begin{aligned} \left( \nabla ^2f\right) _{i,j}\left( x\right) {:=}\frac{\partial ^2f\left( x\right) }{\partial x_i \partial x_j}. \end{aligned}$$

The following result is the interpolation method. For the reader's convenience we give the short proof.

Lemma 2.1

(Interpolation method) Consider two zero mean Gaussian random vectors X, Y having covariance matrices \(C^X\) and \(C^Y\) respectively, and a \(C^2\) function f with moderate growth. We have

$$\begin{aligned} {\mathbb {E}}\left[ f\left( Y\right) \right] -{\mathbb {E}} \left[ f\left( X\right) \right] =\frac{1}{2} \int _0^1 \mathrm {d}t\,{\mathbb {E}} \Big [\mathrm {Tr}\Big (\left( C^Y-C^X\right) \nabla ^2f\left( Z\left( t\right) \right) \Big )\Big ], \end{aligned}$$
(2.7)

where

$$\begin{aligned} Z(t)=\sqrt{t}X+\sqrt{(1-t)}Y, \end{aligned}$$
(2.8)

and the X and Y appearing in (2.8) are independent copies of the two random vectors.

Proof

When Z is an n-dimensional centered Gaussian random vector, \(\mathbb {E}\left[ f(Z)\right] \) depends only on the covariance matrix C of the vector Z. Fix a \(C^2\) function f and define the function \(F:\overline{{\mathcal {C}}}\rightarrow {\mathbb {R}}\) as

$$\begin{aligned} F\left( C\right) {:=}{\mathbb {E}}\left[ f\left( Z\right) \right] . \end{aligned}$$
(2.9)

With the help of formulas (2.4), (2.5), when \(C\in {\mathcal {C}}\) we can compute

$$\begin{aligned} \frac{\partial F\left( C\right) }{\partial C_{i,j}}= & {} \int _{{\mathbb {R}}^n}\mathrm {d}x\,\frac{\partial \phi _{C}\left( x\right) }{\partial C_{i,j}}f\left( x\right) =\int _{{\mathbb {R}}^n}\mathrm {d}x\,\frac{\partial ^2 \phi _{C}\left( x\right) }{\partial x_i\partial x_j}f\left( x\right) \end{aligned}$$
(2.10)
$$\begin{aligned}= & {} \int _{{\mathbb {R}}^n}\mathrm {d}x\, \phi _{C}\left( x\right) \frac{\partial ^2f\left( x\right) }{\partial x_i\partial x_j}={\mathbb {E}}\left[ \left( \nabla ^2 f\right) _{i,j}\left( Z\right) \right] , \end{aligned}$$
(2.11)

and

$$\begin{aligned} \frac{\partial F\left( C\right) }{\partial C_{i,i}}= & {} \int _{{\mathbb {R}}^n}\mathrm {d}x\,\frac{\partial \phi _{C}\left( x\right) }{\partial C_{i,i}}f\left( x\right) =\frac{1}{2}\int _{{\mathbb {R}}^n}\mathrm {d}x\,\frac{\partial ^2 \phi _{C}\left( x\right) }{\partial x_i^2}f\left( x\right) \nonumber \\= & {} \frac{1}{2}\int _{{\mathbb {R}}^n}\mathrm {d}x\, \phi _{C}\left( x\right) \frac{\partial ^2f\left( x\right) }{\partial x_i^2}=\frac{1}{2}\, {\mathbb {E}}\left[ \left( \nabla ^2f\right) _{i,i}\left( Z\right) \right] . \end{aligned}$$
(2.12)

Given a \(C^1\) parametric curve \(\left\{ C(t)\right\} _{t\in [0,1]}\) on \({\mathcal {C}}\) such that \(C(0)=C^X\) and \(C(1)=C^Y\), then we have

$$\begin{aligned} {\mathbb {E}}\left[ f\left( Y\right) \right] -{\mathbb {E}}\left[ f\left( X\right) \right] =\frac{1}{2}\int _0^1 \mathrm {d}t\,{\mathbb {E}}\left[ \mathrm {Tr}\left( \frac{\mathrm {d}C\left( t\right) }{\mathrm {d}t}\nabla ^2f\left( Z\left( t\right) \right) \right) \right] , \end{aligned}$$
(2.13)

where \(Z\left( t\right) \) is a centered Gaussian random vector having covariance \(C\left( t\right) \). The special case in which the curve interpolates linearly between \(C^X\) and \(C^Y\) gives (2.7) with Z(t) given by (2.8), after the change of variable \(t\rightarrow 1-t\). If one or both of the matrices \(C^X\) and \(C^Y\) are not strictly positive definite, it is possible to add \(\varepsilon \mathbb I\) to the matrices, perform the same computation as above and finally take the limit \(\varepsilon \rightarrow 0\). \(\square \)

The above formula is the core of the interpolation method. It is very useful for establishing inequalities between the two expected values on the left hand side of (2.7).
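As a quick numerical illustration (ours, not part of the original argument), the following Python sketch estimates both sides of (2.7) by Monte Carlo, for the choice of f in (2.14) below with unit weights; the covariances CX and CY are arbitrary randomly generated non-negative definite matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 3, 200_000

A = rng.standard_normal((n, n)); CX = A @ A.T   # hypothetical covariance C^X
B = rng.standard_normal((n, n)); CY = B @ B.T   # hypothetical covariance C^Y
D = CY - CX

def f(x):
    # the function (2.14) with all weights w_i = 1
    return np.log(np.exp(x).sum(axis=-1))

def trace_term(x):
    # Tr(D * Hessian f(x)), computed via (2.19)-(2.20); it reduces to
    # sum_i D_ii mu_i - sum_ij D_ij mu_i mu_j, cf. (2.39)
    mu = np.exp(x)
    mu = mu / mu.sum(axis=-1, keepdims=True)
    return (np.diag(D) * mu).sum(axis=-1) - np.einsum('...i,ij,...j->...', mu, D, mu)

X = rng.multivariate_normal(np.zeros(n), CX, size=M)
Y = rng.multivariate_normal(np.zeros(n), CY, size=M)
lhs = f(Y).mean() - f(X).mean()

ts = np.linspace(0.0, 1.0, 21)
means = np.array([trace_term(np.sqrt(t) * X + np.sqrt(1 - t) * Y).mean() for t in ts])
rhs = 0.5 * ((means[:-1] + means[1:]) / 2 * np.diff(ts)).sum()  # trapezoidal rule in t

print(lhs, rhs)  # the two estimates agree up to Monte Carlo and quadrature error
```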

The Guerra–Toninelli interpolation method is a simple but powerful technique developed in the study of mean field spin glasses (see [18, 19] and references therein), based on an abstract theorem about Gaussian random variables. It corresponds to the interpolation method of Lemma 2.1 with the special choice of the function

$$\begin{aligned} f(x)=\log \sum _{i=1}^nw_i \mathrm {e}^{x_i}, \end{aligned}$$
(2.14)

where \(w_i\in {\mathbb {R}}^+\) are some fixed positive weights.

In particular, Guerra and Toninelli obtained and used the following result (this is Theorem 2 in [18]) to prove the existence of the thermodynamic limit of the Sherrington–Kirkpatrick model. The same idea and the same theorem (Theorem 2.2 below) were used later in [12] to deduce the existence of the thermodynamic limit for a GREM model [15] with a finite number of levels.

Theorem 2.2

Let X, Y be two centered Gaussian random vectors and let f be given by (2.14). If

$$\begin{aligned}&C^X_{i,i}=C^Y_{i,i}, \qquad \forall \, i, \end{aligned}$$
(2.15)
$$\begin{aligned}&C^X_{i,j}\ge C^Y_{i,j}, \qquad \forall \, i\ne j, \end{aligned}$$
(2.16)

then we have

$$\begin{aligned} {\mathbb {E}}\left[ f\left( Y\right) \right] \ge {\mathbb {E}}\left[ f\left( X\right) \right] . \end{aligned}$$
(2.17)

We give the proof of Theorem 2.2, which is based on the interpolation formula (2.7).

Proof of Theorem 2.2

Define, for any \(i=1,\ldots ,n\),

$$\begin{aligned} \mu _i\left( x\right) {:=}\frac{w_i\mathrm {e}^{x_i}}{\sum _{j=1}^nw_j\mathrm {e}^{x_j}}. \end{aligned}$$
(2.18)

By a direct computation, when f is (2.14), we have

$$\begin{aligned}&\frac{\partial ^2f\left( x\right) }{\partial x_i^2}=\mu _i\left( x\right) -\mu _i^2\left( x\right) , \end{aligned}$$
(2.19)
$$\begin{aligned}&\frac{\partial ^2f\left( x\right) }{\partial x_i\partial x_j}=-\mu _i\left( x\right) \mu _j\left( x\right) . \end{aligned}$$
(2.20)

By the formulas (2.19), (2.20) and conditions (2.15), (2.16), we have that

$$\begin{aligned} (C^Y-C^X)_{i,j}\left( \nabla ^2f\right) _{i,j}\left( x\right) \ge 0, \ \ \ \ \ \forall \, x\in {\mathbb {R}}^n, \;\;\; \forall \,i,j, \end{aligned}$$
(2.21)

and the result follows by (2.7). \(\square \)

2.2 Covariances and Metrics

We start by recalling some simple but useful lemmas.

Lemma 2.3

The \(n\times n\) symmetric matrix C belongs to \(\,\overline{\! {\mathcal {C}}}\) if and only if there exist n vectors \(a^{(i)}\in {\mathbb {R}}^n\) such that

$$\begin{aligned} C_{i,j}=\big (a^{(i)},a^{(j)}\big ). \end{aligned}$$
(2.22)

This is a classic result and the matrix C is called the Gram matrix of the vectors \(\left( a^{(i)}\right) _{i=1,\dots ,n}\), see for example [7].

A finite metric space with n points is called Euclidean if there exists a collection of n points in \({\mathbb {R}}^k\) having the same relative interdistances. Of course we can always take \(k=n\). Not every metric space can be realized in this way. The simplest example is the minimal path metric on the vertices of the graph in Fig. 1, where the edges all have length 1.

Fig. 1
An example of a non-Euclidean metric space. The distance is the minimal path distance on the graph

Given a centered Gaussian random vector X, there is a naturally associated metric \(d_X\), the \(L^2\) distance between the component random variables:

$$\begin{aligned} d_X(i,j){:=}\sqrt{{\mathbb {E}}\left[ \left( X_i-X_j\right) ^2\right] }=\sqrt{C^X_{i,i}+C^X_{j,j}-2C^X_{i,j}}. \end{aligned}$$
(2.23)

We have the following result (see also [17, 27]).

Lemma 2.4

A finite metric space \(\left( \left\{ 1,\dots ,n\right\} , d\right) \) is Euclidean if and only if there exists a zero mean Gaussian random vector \(X=(X_1,\dots ,X_n)\) such that \(d=d_X\).

Proof

Let d be a Euclidean distance and let \(a^{(i)}\), \(i=1,\dots ,n\), be points in \({\mathbb {R}}^n\) that realize it, that is \(d(i,j)=\left| a^{(i)}-a^{(j)}\right| \), where \(|\cdot |\) denotes the Euclidean norm in \({\mathbb {R}}^n\). Such a collection of vectors exists by definition of a Euclidean metric space. Let A be the \(n\times n\) matrix defined by \(A_{i,j}{:=}a^{(i)}_j\). Let \(Z=(Z_1,\dots ,Z_n)\) be a vector of i.i.d. standard Gaussian random variables and consider the Gaussian vector \(X=AZ\), whose covariance \(C^X=AA^T\) coincides with the right hand side of (2.22). Using (2.23) we have

$$\begin{aligned} d_X\left( i,j\right) =\big |a^{(i)}-a^{(j)}\big |=d\left( i,j\right) . \end{aligned}$$
(2.24)

Conversely, let X be a zero mean Gaussian vector with covariance \(C^X\) and let A be an \(n\times n\) matrix such that \(C^X=AA^T\). Define n vectors in \({\mathbb {R}}^n\) by \(a^{(i)}_j{:=}A_{i,j}\); by (2.23) we have that \(d_X\) is given by the first equality in (2.24) and is therefore Euclidean.\(\square \)
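The construction in the proof is easy to implement. The following minimal sketch (our illustration; the points pts and all other names are arbitrary) recovers the Gram matrix (2.22) from a Euclidean distance matrix, factors it, and checks that the corresponding Gaussian vector reproduces the metric through (2.23).

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.standard_normal((4, 4))          # 4 arbitrary points in R^4 define d
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

# Gram matrix of the vectors a^(i) - a^(1): C_ij = (d_1i^2 + d_1j^2 - d_ij^2)/2
C = (d[0][:, None] ** 2 + d[0][None, :] ** 2 - d ** 2) / 2

w, V = np.linalg.eigh(C)                   # C is non-negative definite
A = V @ np.diag(np.sqrt(np.clip(w, 0, None)))   # C = A A^T, so X = A Z realizes d

dX = np.sqrt(np.maximum(np.add.outer(np.diag(C), np.diag(C)) - 2 * C, 0))  # (2.23)
print(np.allclose(dX, d))                  # True: d_X = d
```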

Other simple but useful lemmas are the following. We give just the statements; the proofs can be found, for example, in [8].

Lemma 2.5

Let \(v^{(1)},\dots , v^{(n)}\) and \(w^{(1)},\dots , w^{(n)}\) be two collections of n vectors in \({\mathbb {R}}^n\). We have that

$$\begin{aligned} \big ( v^{(i)},v^{(j)}\big ) =\big ( w^{(i)},w^{(j)}\big ) \qquad \forall \,i,j\, \end{aligned}$$
(2.25)

if and only if there exists \(O\in O\left( n\right) \) such that \(w^{(i)}=Ov^{(i)}\) for any i.

Lemma 2.6

Let \(v^{(1)},\dots ,v^{(n)}\) and \(w^{(1)},\dots ,w^{(n)}\) be two collections of vectors in \({\mathbb {R}}^n\). We have that

$$\begin{aligned} \big |v^{(i)}-v^{(j)}\big |=\big |w^{(i)}-w^{(j)}\big |, \qquad \forall \,i,j, \end{aligned}$$
(2.26)

if and only if there exists \(O\in O\left( n\right) \) and a vector \(b\in {\mathbb {R}}^n\) such that

$$\begin{aligned} w^{(i)}=Ov^{(i)}+b, \qquad \forall i. \end{aligned}$$
(2.27)

The metric structure \(d_X\) associated with a Gaussian random vector X contains less information than the covariance \(C^X\), and there are random vectors having different covariances but the same metric structure. This type of invariance is best understood in terms of the vectors in \({\mathbb {R}}^n\), using the above lemmas that characterize invariance under rotations and translations. In particular, we can completely characterize the centered Gaussian random vectors that share the same metric structure.

Lemma 2.7

Given two n-dimensional centered Gaussian random vectors X and Y, we have \(d_X=d_Y\) if and only if there exists a centered Gaussian random variable W such that the random vector \(\left( X_i+ W\right) _{i=1,\dots ,n}\) has the same distribution as Y.

Proof

If Y has the same distribution as \(X+W{1}_n\), where \({1}_n=\left( 1,1,\ldots ,1\right) \) is the n-dimensional vector of all ones, then

$$\begin{aligned} d_Y(i,j)=\sqrt{\mathbb {E}\left[ \left( Y_i-Y_j\right) ^2\right] }=\sqrt{\mathbb {E}\left[ \left( X_i+W-X_j-W\right) ^2\right] }=d_X(i,j). \end{aligned}$$

Conversely, suppose that \(d_X=d_Y\). There exist two matrices \(A^X\) and \(A^Y\) such that \(A^XZ\) has the same distribution as X and \(A^YZ\) has the same distribution as Y, where Z is an n-dimensional vector of i.i.d. standard Gaussian random variables. We define two collections \(v^{(i)}, w^{(i)}\), \(i=1,\dots ,n\), of vectors in \({\mathbb {R}}^n\) by \(v^{(i)}_j{:=}A^X_{i,j}\) and \(w^{(i)}_j{:=}A^Y_{i,j}\). Since \(d_X=d_Y\) we have

$$\begin{aligned} \big |v^{(i)}-v^{(j)}\big |=\big |w^{(i)}-w^{(j)}\big |, \qquad \forall \, i,j, \end{aligned}$$
(2.28)

and by Lemma 2.6 there exist \(O\in O(n)\) and a vector \(b\in {\mathbb {R}}^n\) such that \(w^{(i)}=Ov^{(i)}+b\), \(i=1,\dots ,n\). In terms of the corresponding matrices this means that \(A^Y=A^XO^T+B\), where the matrix B is defined as \(B_{i,j}{:=}b_j\). We obtain therefore

$$\begin{aligned} Y=A^XO^TZ+BZ. \end{aligned}$$
(2.29)

The random vector \(A^XO^TZ\) is a centered Gaussian random vector with covariance \(A^XO^TO(A^X)^T=C^X\), so that it has the same law as X. The random vector BZ has all components equal; setting \(W=\sum _{j=1}^nb_jZ_j\) we finish the proof. \(\square \)
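The "if" direction of the lemma can be visualized with a two-line computation. In the sketch below (ours; for simplicity W is taken independent of X, a special case allowed by the lemma), adding a common centered Gaussian W to all components changes the covariance but leaves the metric (2.23) unchanged.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
A = rng.standard_normal((n, n))
CX = A @ A.T                               # covariance of X

s2 = 2.0                                   # variance of W, here independent of X
CY = CX + s2 * np.ones((n, n))             # covariance of Y = X + W 1_n

dX = np.sqrt(np.add.outer(np.diag(CX), np.diag(CX)) - 2 * CX)   # metric (2.23)
dY = np.sqrt(np.add.outer(np.diag(CY), np.diag(CY)) - 2 * CY)
print(np.allclose(dX, dY), np.allclose(CX, CY))                 # True False
```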

A direct consequence of the above result is the following. Define the function \(F:\,\overline{\! {\mathcal {C}}}\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} F(C){:=}{\mathbb {E}}\left[ \log \sum _{i=1}^nw_i\mathrm {e}^{X_i}\right] , \end{aligned}$$
(2.30)

where X is a centered Gaussian random vector with covariance C.

Lemma 2.8

If \(C^X, C^Y \in \,\overline{\! {\mathcal {C}}}\) are such that \(d_X=d_Y\), then \(F(C^X)=F(C^Y)\).

Proof

Since \(d_X=d_Y\), by Lemma 2.7 we can write \(Y=X+W{1}_n\) in distribution, and therefore

$$\begin{aligned}&F(C^Y) ={\mathbb {E}}\left[ \log \sum _{i=1}^nw_i\mathrm {e}^{Y_i}\right] ={\mathbb {E}}\left[ \log \sum _{i=1}^nw_i\mathrm {e}^{X_i+W}\right] \\&={\mathbb {E}}\left[ W+\log \sum _{i=1}^nw_i\mathrm {e}^{X_i}\right] =F(C^X), \end{aligned}$$

where the last equality follows by the fact that W is centered. \(\square \)

This lemma simply says that we can regard the right hand side of (2.30) as a function \(\widetilde{F}(d)\), since it depends just on the metric structure of the random variables and not on their correlations.

We expect therefore to have a version of Theorem 2.2 with conditions written just in terms of the metrics. This is done in the next section.

2.3 A Generalized Condition

We show how to generalize Theorem 2.2, proving that (2.17) can be deduced under weaker hypotheses involving just the metric structures. The same inequality was obtained in [11] with a tricky computation. Here we show that it follows from a general argument that may be applied to different functions f.

Theorem 2.9

Let X, Y be two centered Gaussian random vectors and let f be given by (2.14). If

$$\begin{aligned} d_Y\left( i,j\right) \ge d_X\left( i,j\right) \qquad \forall \, i,j, \end{aligned}$$
(2.31)

then

$$\begin{aligned} {\mathbb {E}}\left[ f\left( Y\right) \right] \ge {\mathbb {E}}\left[ f\left( X\right) \right] . \end{aligned}$$
(2.32)

Note that if conditions (2.15) and (2.16) are satisfied then (2.31) holds, but it is easy to construct examples for which (2.31) holds but (2.15), (2.16) are violated.
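A concrete such example is easy to exhibit. In the sketch below (our illustration), the diagonals of CX and CY differ, so (2.15) fails, while (2.31) holds; a Monte Carlo estimate confirms the conclusion (2.32) of Theorem 2.9.

```python
import numpy as np

CX = np.array([[1.0, 0.0], [0.0, 1.0]])
CY = np.array([[2.0, 0.4], [0.4, 2.0]])    # diagonals differ: (2.15) fails

dX = np.sqrt(np.add.outer(np.diag(CX), np.diag(CX)) - 2 * CX)   # metric (2.23)
dY = np.sqrt(np.add.outer(np.diag(CY), np.diag(CY)) - 2 * CY)
assert (dY >= dX - 1e-12).all()            # condition (2.31) holds

rng = np.random.default_rng(2)
M = 400_000
f = lambda x: np.log(np.exp(x).sum(axis=-1))   # (2.14) with w_i = 1
EX = f(rng.multivariate_normal([0.0, 0.0], CX, size=M)).mean()
EY = f(rng.multivariate_normal([0.0, 0.0], CY, size=M)).mean()
print(EY >= EX)                            # True, as (2.32) predicts
```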

Observe that for any x we have that \(\mu \left( x\right) =\left( \mu _1\left( x\right) , \dots ,\mu _n\left( x\right) \right) \in {\mathcal {I}}^n\) (recall definition (2.18)) where

$$\begin{aligned} {\mathcal {I}}^n=\bigg \{\mu =\left( \mu _1,\dots ,\mu _n\right) :\,0\le \mu _i \le 1,\,\, \sum _{i=1}^n\mu _i=1\bigg \}. \end{aligned}$$

Namely, \({\mathcal {I}}^n\subset {\mathbb {R}}^n\) is an \((n-1)\)-dimensional simplex with extremal elements \(\mu ^{(1)},\dots ,\mu ^{(n)}\), where \( \mu ^{(l)}_i=\delta _{li}\).

We start with a preliminary lemma.

Lemma 2.10

Consider a symmetric matrix D and the function \(G:{\mathcal {I}}^n\rightarrow {\mathbb {R}}\) defined as

$$\begin{aligned} G\left( \mu \right) {:=}\sum _{i=1}^n\mu _iD_{ii}-\sum _{i=1}^n\sum _{j=1}^n\mu _i\mu _jD_{ij}. \end{aligned}$$
(2.33)

We have that

$$\begin{aligned} \inf _{\mu \in {\mathcal {I}}^n}G\left( \mu \right) \ge 0 \end{aligned}$$
(2.34)

if and only if

$$\begin{aligned} D_{ii}+D_{jj}-2D_{ij}\ge 0 \qquad \forall \, i,j\in \left\{ 1,\dots ,n\right\} . \end{aligned}$$
(2.35)

Proof

If condition (2.35) holds, then

$$\begin{aligned} G(\mu )\ge \sum _{i=1}^n\mu _iD_{ii}-\frac{1}{2} \sum _{i=1}^n\sum _{j=1}^n\mu _i\mu _j\left( D_{ii}+D_{jj}\right) =0. \end{aligned}$$

To obtain the last identity we used the fact that \(\mu \in {\mathcal {I}}^n\). Conversely, suppose that inequality (2.34) holds. Choose \(\mu \) such that \(\mu _l=\mu _m=\frac{1}{2}\) for some \(l\ne m\), all the other components being 0; then (2.33) becomes

$$\begin{aligned} \frac{1}{4}\left( D_{ll}+D_{mm}\right) -\frac{1}{2}D_{lm}\ge 0 \end{aligned}$$
(2.36)

where we used the symmetry of D. Considering all the pairs \(l,m\in \left\{ 1,\ldots ,n\right\} \) we get the result. \(\square \)

Proof of Theorem 2.9

By formula (2.7) we deduce the result once we show that

$$\begin{aligned} \inf _{x\in {\mathbb {R}}^n}\mathrm {Tr}\Big (D\, \nabla ^2f\left( x\right) \Big )\ge 0, \end{aligned}$$
(2.37)

where we called

$$\begin{aligned} D{:=}C^Y-C^X. \end{aligned}$$
(2.38)

Using (2.19) and (2.20) we obtain that the expression to be minimized in (2.37) is

$$\begin{aligned} \sum _{i=1}^nD_{i,i}\mu _i\left( x\right) -\sum _{i=1}^n\sum _{j=1}^nD_{i,j}\mu _i\left( x\right) \mu _j\left( x\right) . \end{aligned}$$
(2.39)

Therefore the infimum in (2.37) coincides with \(\inf _{\mu \in {\mathcal {I}}^n}G(\mu )\), and the result follows by Lemma 2.10, since (2.35) with the matrix D defined by (2.38) coincides with (2.31). \(\square \)

3 Examples

In this section we discuss two examples, obtaining the existence of the thermodynamic limit for the quenched free energy of two models. The first one is the Sherrington–Kirkpatrick model. The existence of the thermodynamic limit for this model was obtained, by the interpolation method, in the breakthrough paper [19], using Theorem 2.2. We review this result as a warm-up, to fix ideas and the basic constructions; we use however Theorem 2.9 and discuss the result just in terms of the metrics. Then we discuss a class of Generalized Random Energy Models [15] for which in general conditions (2.15), (2.16) fail while condition (2.31) holds. We refer to [9, 25] and [10] for the beautiful mathematics involved in the limit of such models. In the final part of the section we discuss some models from a purely metric viewpoint.

3.1 The Sherrington–Kirkpatrick Model

The Sherrington–Kirkpatrick model is a mean field spin glass model [18, 24, 26, 28]. Spin configurations are \(\sigma \in \{-1,1\}^N\) and the energy of the system is given by

$$\begin{aligned} H_N\left( \sigma \right) {:=}-\frac{1}{\sqrt{N}}\sum _{i,j=1}^NJ_{i,j}\sigma \left( i\right) \sigma \left( j\right) , \end{aligned}$$
(3.1)

where \(J_{i,j}\) are i.i.d. standard Gaussian random variables. Small variants of the model consider different sums in (3.1), but all the variants are equivalent modulo simple transformations. The spins are associated with the vertices of a complete graph and the interaction between each pair of spins is determined by the variables J. The partition function is defined as

$$\begin{aligned} Z_N\left( \beta \right) {:=}\sum _{\{\sigma \}}\mathrm {e}^{-\beta H_N(\sigma )}, \end{aligned}$$
(3.2)

where the parameter \(\beta \) is the inverse temperature and the quenched free energy per site is defined by

$$\begin{aligned} F_N\left( \beta \right) {:=}-\frac{1}{\beta N}\mathbb {E}\left[ \log Z_N\left( \beta \right) \right] =:\frac{1}{\beta N}\alpha _N\left( \beta \right) , \end{aligned}$$
(3.3)

where the last equality defines the symbol \(\alpha _N\left( \beta \right) \). The variables \(\left( -\beta H_N\left( \sigma \right) \right) _{\sigma \in \left\{ -1,1\right\} ^N}\) are a centered Gaussian random vector with covariance

$$\begin{aligned} \beta ^2{\mathbb {E}}\left[ H_N\left( \sigma \right) H_N\left( \sigma '\right) \right] =\frac{\beta ^2}{N} \sum _{i,j=1}^N\sigma \left( i\right) \sigma \left( j\right) \sigma '\left( i\right) \sigma '\left( j\right) =N\beta ^2q_N^2\left( \sigma ,\sigma '\right) , \end{aligned}$$
(3.4)

where

$$\begin{aligned} q_N\left( \sigma ,\sigma '\right) {:=}\frac{1}{N}\sum _{i=1}^N\sigma \left( i\right) \sigma '\left( i\right) , \end{aligned}$$
(3.5)

is the overlap between the configurations \(\sigma \) and \(\sigma '\). The corresponding Euclidean distance according to (2.23) is given by

$$\begin{aligned} d_N\left( \sigma ,\sigma '\right) = \beta \sqrt{8N\Big [d^H_N\left( 1-d^H_N\right) \Big ]}, \end{aligned}$$
(3.6)

where

$$\begin{aligned} d^H_N\left( \sigma ,\sigma '\right) {:=}\frac{1}{N}\sum _{i=1}^N\mathbb I\Big (\sigma \left( i\right) \ne \sigma '\left( i\right) \Big ) \end{aligned}$$
(3.7)

is the Hamming distance. Incidentally, the Hamming distance itself is an example of a non-Euclidean metric. Notice that of course \(d_N\left( \sigma ,\sigma \right) =0\), but we also have \(d_N\left( \sigma ,-\sigma \right) =0\), since \(H_N\left( \sigma \right) =H_N\left( -\sigma \right) \). The fact that the right hand side of (3.6) is a distance (indeed a pseudo distance) is not trivial, but follows directly since it is obtained by (2.23) (it is a function of a metric that is again a metric, see [13, 16]).
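The algebra relating (3.4), (3.5), (3.6) and (3.7) can be spot-checked numerically. The following short sketch (ours) draws a random pair of configurations and compares the distance computed from the covariance (3.4) via (2.23) with the Hamming-distance expression (3.6).

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta = 50, 1.3                        # hypothetical system size and temperature
s1 = rng.choice([-1, 1], size=N)
s2 = rng.choice([-1, 1], size=N)

q = (s1 * s2).mean()                     # overlap (3.5)
dH = (s1 != s2).mean()                   # Hamming distance (3.7)

d_from_cov = np.sqrt(2 * beta**2 * N * (1 - q**2))   # (2.23) applied to (3.4)
d_from_dH = beta * np.sqrt(8 * N * dH * (1 - dH))    # formula (3.6)
print(np.isclose(d_from_cov, d_from_dH))             # True
```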

Let us split the system into two subsystems \(S_1, S_2\) with respectively \(N_1\) and \(N_2\) vertices with \(N_1+N_2=N\). We erase the interaction between spins that belong to different subsystems. We define the restricted Hamiltonians of the subsystems as

$$\begin{aligned} H_{N_k}(\sigma ){:=}-\frac{1}{\sqrt{N_k}}\sum _{i,j\in S_k}J_{i,j}\sigma \left( i\right) \sigma \left( j\right) , \qquad k=1,2, \end{aligned}$$
(3.8)

where we remark that the sum is restricted to the indices belonging to the subsystem labeled \(k=1,2\). Here and in the following we use the symbol \(\sigma \) both for the full configuration and for the configuration restricted to a subsystem: when a configuration appears in an expression labeled by a subsystem, we mean the configuration restricted to that subsystem. For example, \(d^H_{N_k}\left( \sigma ,\sigma '\right) \) and \(d_{N_k}\left( \sigma ,\sigma '\right) \) are respectively the Hamming distance (3.7) and the distance (3.6) when the configurations are restricted to the subsystem \(k=1,2\). Note that with this notation we have the key relationship

$$\begin{aligned} d^H_N\left( \sigma ,\sigma '\right) =\frac{N_1}{N}d^H_{N_1}\left( \sigma ,\sigma '\right) + \frac{N_2}{N}d^H_{N_2}\left( \sigma ,\sigma '\right) . \end{aligned}$$
(3.9)

Another important relationship is

$$\begin{aligned} \sum _{\{\sigma \}} \mathrm {e}^{-\beta \left( H_{N_1}(\sigma )+H_{N_2}(\sigma )\right) }=Z_{N_1}(\beta )Z_{N_2}(\beta ). \end{aligned}$$
(3.10)

We apply Theorem 2.9 with the vectors

$$\begin{aligned} \left\{ \begin{array}{l} Y=\big (-\beta H_N(\sigma )\big )_{\sigma \in \{-1,1\}^N},\\ X=\left( -\beta H_{N_1}(\sigma )-\beta H_{N_2}(\sigma )\right) _{\sigma \in \{-1,1\}^N}. \end{array} \right. \end{aligned}$$

The condition (2.31) becomes the super-Pythagorean relation

$$\begin{aligned} d_N\ge \sqrt{d_{N_1}^2+d_{N_2}^2}, \end{aligned}$$
(3.11)

that is equivalent to

$$\begin{aligned} \Big [d^H_N\left( 1-d^H_N\right) \Big ]\ge \frac{N_1}{N}\Big [d^H_{N_1}\left( 1-d^H_{N_1}\right) \Big ]+\frac{N_2}{N} \Big [d^H_{N_2}\left( 1-d^H_{N_2}\right) \Big ]. \end{aligned}$$
(3.12)

The above inequality is true by (3.9) and the concavity of the real function \(x \rightarrow x\left( 1-x\right) \). By Theorem 2.9 and (3.10) we deduce

$$\begin{aligned} \alpha _N\left( \beta \right) \le \alpha _{N_1}\left( \beta \right) +\alpha _{N_2}\left( \beta \right) , \end{aligned}$$
(3.13)

and by subadditivity and the classical Fekete lemma we deduce that the limit of the quenched free energy per site exists:

$$\begin{aligned} \lim _{N\rightarrow \infty }F_N\left( \beta \right) =\lim _{N\rightarrow \infty }\frac{1}{\beta N}\alpha _N\left( \beta \right) =\inf _N\frac{1}{\beta N}\alpha _N\left( \beta \right) . \end{aligned}$$
(3.14)
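For very small systems the subadditivity (3.13) can also be observed directly by Monte Carlo over the disorder. The sketch below (ours; system sizes and sample counts are arbitrary toy choices) enumerates all configurations and estimates \(\alpha _N(\beta )=-\mathbb {E}[\log Z_N(\beta )]\); the inequality holds exactly, so the printed comparison should be True up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(7)
beta = 1.5

def alpha(N, samples=2000):
    # alpha_N(beta) = -E[log Z_N(beta)], estimated over disorder samples,
    # with an exact sum over all 2^N configurations
    sig = np.array([[1 - 2 * ((s >> i) & 1) for i in range(N)] for s in range(2 ** N)])
    logZ = np.empty(samples)
    for t in range(samples):
        J = rng.standard_normal((N, N))
        H = -((sig @ J) * sig).sum(axis=1) / np.sqrt(N)   # Hamiltonian (3.1)
        logZ[t] = np.log(np.exp(-beta * H).sum())
    return -logZ.mean()

N1, N2 = 4, 5
print(alpha(N1 + N2) <= alpha(N1) + alpha(N2))            # (3.13), up to MC error
```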

3.2 The Generalized Random Energy Model

The Generalized Random Energy Model (GREM) is a spin glass model introduced by Derrida [15] as a generalization of the REM (Random Energy Model) with pair correlations between energies. The model has a hierarchical structure, as each spin configuration corresponds to a leaf of a given rooted tree.

We consider sequences of finite trees encoded by finite strings of non-negative integers. Let \(n\in \mathbb {N}\), let \(\underline{k}=\left( k_1,\ldots ,k_n\right) \) be a vector of non-negative integers and set \(|\underline{k}| {:=}k_1+\ldots +k_n\). The tree \(\mathcal {T}_{\underline{k}}\) is constructed as follows. The root (the unique node at level 0) is connected to \(2^{k_1}\) nodes that compose the first level. Each node of the first level is connected to \(2^{k_2}\) nodes of the second level; we have therefore \(2^{k_1+k_2}\) nodes at the second level, and so on. The n-th level consists of \(2^{k_1}2^{k_2}\cdots 2^{k_n}=2^{|\underline{k}|}\) leaves. If there exists a \(1\le j<n\) such that \(k_j=0\), we mean that the nodes of level j coincide with those of level \(j-1\). A spin configuration \(\sigma \in \left\{ -1,1\right\} ^{|\underline{k}|}\) is then attached to each leaf. The Hamiltonian is

$$\begin{aligned} H_{\underline{k}}\left( \sigma \right) =-\sqrt{|\underline{k}|}\left( \varepsilon _1^{(\sigma )}+\cdots +\varepsilon _{n}^{(\sigma )}\right) , \end{aligned}$$
(3.15)

where \(\varepsilon _i^{(\sigma )}\sim \mathcal {N}\left( 0,a_i\right) \) if \(k_i>0\) and \(\varepsilon _i^{(\sigma )}=0\) if \(k_i=0\). The \(a_i\)'s, \(i\in \mathbb N\), are positive numbers such that \(\sum _{i=1}^{+\infty } a_i=1\).

The random variables \(\varepsilon \) are attached to the edges of the tree. More precisely, attached to the edges that connect level \(i-1\) to level i there is a family of i.i.d. centered Gaussian random variables with variance \(a_i\), one for each edge. When we write \(\varepsilon ^{(\sigma )}_i\) we then mean the random variable associated with the unique edge connecting level \(i-1\) to level i that belongs to the unique path from the leaf associated with \(\sigma \) to the root. When \(k_i=0\) there are no edges from level \(i-1\) to level i and we therefore set \(\varepsilon ^{(\sigma )}_i=0\). Then \(\left( H_{\underline{k}}(\sigma )\right) _{\sigma \in \left\{ -1,1\right\} ^{|\underline{k}| }}\) is a centered Gaussian random vector indexed by the \(|\underline{k}|\)-dimensional hypercube \(\left\{ -1,1\right\} ^{|\underline{k}|}\).

We call \(l=l\left( \sigma ,\tau \right) \in \{0,1, \dots ,n-1\}\) the level of the hierarchy at which the two paths from the leaves \(\sigma \) and \(\tau \) of \(\mathcal {T}_{\underline{k}}\) to the root merge. The two configurations share the same energy variables, \(\varepsilon ^{(\sigma )}_i=\varepsilon ^{(\tau )}_i\), for any \(i\le l\), while \(\varepsilon ^{(\sigma )}_i\ne \varepsilon ^{(\tau )}_i\) whenever \(i> l\). When \(\varepsilon ^{(\sigma )}_i\) and \(\varepsilon ^{(\tau )}_i\) are different, they are independent. Furthermore, \(\varepsilon ^{(\sigma )}_i\) and \(\varepsilon ^{(\tau )}_j\) are always independent if \(j\ne i\). We define \(\widetilde{a}_i{:=}a_i\) when \(k_i>0\) and \(\widetilde{a}_i{:=}0\) when \(k_i=0\). We get

$$\begin{aligned} \mathbb {E}\left[ H_{\underline{k}}\left( \sigma \right) H_{\underline{k}}\left( \tau \right) \right] =\left| \underline{k}\right| \sum _{i=1}^{l}\widetilde{a}_i, \end{aligned}$$
(3.16)

where we point out that the right hand side above is zero when \(l=0\). The corresponding metric, according to (2.23), is given by

$$\begin{aligned} d_{\underline{k}}\left( \sigma ,\tau \right) =\sqrt{\mathbb {E}\left[ \left( H_{\underline{k}} \left( \sigma \right) -H_{\underline{k}}\left( \tau \right) \right) ^2\right] }=\sqrt{ 2 \left| \underline{k}\right| \sum _{i=l+1}^n\widetilde{a}_i}. \end{aligned}$$
(3.17)

The term inside the square root on the right hand side represents, up to a multiplicative factor, the minimal path length distance between the two leaves \(\sigma \) and \(\tau \) on the tree when each edge between level \(i-1\) and i has a length given by \(a_i\). Since the graph is a tree the path is unique and the metric (3.17) is an ultrametric. We introduce, for notational convenience, the normalized distance

$$\begin{aligned} {{s}_{\underline{k}}}\left( \sigma ,\tau \right) {:=}\sqrt{2\sum _{i=l+1}^n\widetilde{a}_i}, \end{aligned}$$
(3.18)

so that \(d_{\underline{k}}\left( \sigma ,\tau \right) =\sqrt{\left| {\underline{k}}\right| } s_{\underline{k}}\left( \sigma ,\tau \right) \) for any pair of configurations \(\sigma \) and \(\tau \).

Fig. 2
The paths \(\sigma \) and \(\tau \) are at distance \(s_{{\underline{k}}}\left( \sigma ,\tau \right) =\sqrt{2\left( a_2+a_3\right) }\)
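The tree construction is easy to put in code. The following sketch (ours; the vectors k and a are arbitrary toy values) indexes the leaves of \(\mathcal T_{\underline{k}}\) by their branch choices, so that the merging level l of two leaves is the length of their common prefix; it then builds the covariance (3.16), recovers the metric via (2.23), checks it against (3.17), and verifies that it is an ultrametric.

```python
import numpy as np
from itertools import product

k = [1, 2, 1]                 # toy tree T_k with |k| = 4
a = [0.5, 0.3, 0.2]           # edge lengths / variances, sum = 1
K = sum(k)

leaves = list(product(*[range(2 ** ki) for ki in k]))   # 2^|k| = 16 leaves

def merge_level(u, v):
    # level at which the paths from u and v to the root merge:
    # the length of the common prefix of branch choices
    l = 0
    while l < len(k) and u[l] == v[l]:
        l += 1
    return l

cov = np.array([[K * sum(a[:merge_level(u, v)]) for v in leaves]
                for u in leaves])                                     # (3.16)
d = np.sqrt(np.add.outer(np.diag(cov), np.diag(cov)) - 2 * cov)       # (2.23)

# spot check of (3.17) for one pair of leaves
u, v = leaves[0], leaves[-1]
assert np.isclose(d[0, -1], np.sqrt(2 * K * sum(a[merge_level(u, v):])))

# the metric is an ultrametric: d(u,w) <= max(d(u,v), d(v,w)) for all triples
m = len(leaves)
print(all(d[i, j] <= max(d[i, h], d[h, j]) + 1e-9
          for i in range(m) for j in range(m) for h in range(m)))     # True
```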

Both the correlations (3.16) and the metric (3.17) depend on the vector \(\underline{k}\) and on the assignment of configurations to leaves. We will discuss this shortly.

As for the Sherrington–Kirkpatrick model, given an inverse temperature \(\beta \), we introduce the disorder-dependent partition function

$$\begin{aligned} Z_{\underline{k}}\left( \beta \right) {:=}\sum _{\left\{ \sigma \right\} }\mathrm {e}^{-\beta H_{{\underline{k}}}\left( \sigma \right) } \end{aligned}$$
(3.19)

and the quenched average of the free energy per site

$$\begin{aligned} F_{\underline{k}}\left( \beta \right) {:=}-\frac{1}{\beta |\underline{k}|}\,\mathbb {E}\left[ \log Z_{\underline{k}}\left( \beta \right) \right] . \end{aligned}$$
(3.20)

We prove the existence of the thermodynamic limit of (3.20) under general assumptions, when a parameter N diverges and the vector \(\underline{k}=\underline{k}\left( N\right) \) grows in such a way that \(n=n\left( N\right) \) diverges as well. Contucci et al. [12] proved this fact when n is constant. This was obtained by applying the same strategy as the Guerra–Toninelli interpolation method [19]; in particular, they used the inequality in Theorem 2.2. When n is no longer bounded, the conditions of Theorem 2.2 fail, while the condition of Theorem 2.9 continues to hold. We now describe more precisely the growing mechanism of the model and prove the existence of the thermodynamic limit.

3.2.1 Growing and Labeling

We consider a sequence of growing trees labeled by a sequence of vectors \(\underline{k}\left( N\right) \). For each \(N\in \mathbb {N}\) we have the tree \(\mathcal {T}_{\underline{k}\left( N\right) }\) defined by the following hypothesis and rules.

  1. (H1)

    Let \(\left( \alpha _i\right) _{i=1}^{\infty }\) be a sequence of reals larger than 1 satisfying the constraint

    $$\begin{aligned} \sum _{i=1}^{\infty }\log \alpha _i=\log 2. \end{aligned}$$
    (3.21)

    The \(\alpha _i\)’s define the tree \(\mathcal {T}_{\underline{k}\left( N\right) }\) through

    $$\begin{aligned} k_i\left( N\right) {:=}\left\lfloor \frac{N\log \alpha _i}{\log 2}\right\rfloor , \qquad i\in \mathbb {N}, \end{aligned}$$
    (3.22)

    where \(\lfloor \, \cdot \,\rfloor \) denotes the integer part.

  2. (H2)

    The sequence \(\left( a_i\right) _{i=1}^{\infty }\) gives the lengths of the edges at the different levels and the variances of the associated random variables, and satisfies the condition \(\sum _{i=1}^{\infty }a_i=1\).

The exact values of the sums of the two series are not really important and could be replaced by mere summability conditions. Formula (3.22) comes from the requirement that the number of edges connecting a given node at level \(i-1\) to nodes at level i grows exponentially, like \(\alpha _i^N\).

Observe that by (3.21), for any fixed \(N>0\), only a finite number of components of \({\underline{k}\left( N\right) }\) are different from zero. We define

$$\begin{aligned} n{:=}n\left( N\right) {:=}\max \{i\,:\, k_i\left( N\right) >0\} \end{aligned}$$
(3.23)

and the finite vector \(\underline{k}\left( N\right) {:=}\left( k_1\left( N\right) ,\ldots ,k_n\left( N\right) \right) \). Then a spin configuration \(\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\left( N\right) \right| }\) is assigned to each leaf. The assignment is actually arbitrary; indeed, the free energy of the system is obtained by summing over all the configurations, thus getting rid of any dependence on the underlying choice.

We assign a spin configuration to each leaf of the tree as follows. At fixed N, we attach to every edge one or more labels of type \(\left( m,s\right) \), where \(s=\pm 1\) and \(m\in \left\{ 1,\ldots , \left| \underline{k}(N)\right| \right\} \). Given a leaf, there exists a unique path toward the root. If this path crosses an edge having a label \(\left( m,s\right) \), then the configuration \(\sigma \) associated with the leaf is such that \(\sigma \left( m\right) =s\). We assign the labels in such a way that every path meets all the labels \(m=1, \dots , \left| \underline{k}\left( N\right) \right| \) and different leaves are associated with different configurations.

We embed the tree in the plane so that the root is at the top and the paths from the leaves to the root go upwards. Moreover, all the edges connecting a given node with the nodes at the successive level are ordered from left to right. Each edge connecting level \(i-1\) to level i carries exactly \(k_i\left( N\right) \) labels, corresponding to the values \(m=\sum _{j=1}^{i-1}k_j\left( N\right) +1, \sum _{j=1}^{i-1}k_j\left( N\right) +2, \dots , \sum _{j=1}^{i}k_j\left( N\right) \). The corresponding values of the parameter s are fixed as follows.

Fix a node at level \(i-1\). Number the edges connecting this node with the nodes at level i with integers going, from left to right, from 0 to \(2^{k_i\left( N\right) }-1\): the leftmost corresponds to 0 and the rightmost to \(2^{k_i\left( N\right) }-1\). Do this for each node. Write these integers in binary code, so that the leftmost edges are numbered with \(k_i\left( N\right) \) zeros and the rightmost with \(k_i(N)\) ones. In our setting, 0 corresponds to the − sign and 1 to the \(+\) sign. Then we associate the lowest value of m with the most significant digit and the highest value of m with the least significant one. See Fig. 3 for an example.

Fig. 3
Example of the assignment of labels for \(\underline{k}=\left( 1,2,1\right) \). The paths \(\sigma \) and \(\tau \) have spin configurations \(\sigma =\left( -1,-1,1,-1\right) \), \(\tau =\left( -1,1,1,1\right) \)

3.2.2 Splitting the System

Let \(N>0\) and consider a pair of integers \(N_1,N_2\) such that \(N_1+N_2=N\). We already know how to construct the trees \(\mathcal {T}_{\underline{k}\left( N\right) }\), \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) and \(\mathcal {T}_{\underline{k}\left( N_2\right) }\). Their geometric structure is simply encoded by the finite vectors \(\underline{k}\left( N\right) \), \(\underline{k}\left( N_1\right) \), \(\underline{k}\left( N_2\right) \), and we recall that, by definition, we have

$$\begin{aligned} k_{i}\left( N_j\right) {:=}\left\lfloor \frac{N_j\log \alpha _i}{\log 2}\right\rfloor , \qquad j=1,2, \,\,\,\,\,i\in \mathbb N. \end{aligned}$$
(3.24)

Notice that

$$\begin{aligned} k_i\left( N_1\right) +k_i\left( N_2\right) \le k_i\left( N\right) \le k_i\left( N_1\right) +k_i\left( N_2\right) +1. \end{aligned}$$
(3.25)
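Both the definition (3.22)/(3.24) and the inequality (3.25), which is just the elementary bound \(\lfloor x\rfloor +\lfloor y\rfloor \le \lfloor x+y\rfloor \le \lfloor x\rfloor +\lfloor y\rfloor +1\), are immediate to check numerically, together with the convergence \(|\underline{k}(N)|/N\rightarrow 1\) of Lemma 3.3 below. Here is a short sketch (ours, with a hypothetical geometric choice of the \(\gamma _i\)'s).

```python
import numpy as np

# hypothetical geometric choice: gamma_i = log(alpha_i)/log 2 = 2^{-i},
# whose sum is essentially 1, as required by (3.21)
gammas = np.array([2.0 ** -i for i in range(1, 30)])

def k_vec(N):
    return np.floor(N * gammas).astype(int)               # formulas (3.22), (3.24)

N1, N2 = 37, 58
kN, k1, k2 = k_vec(N1 + N2), k_vec(N1), k_vec(N2)
print(((k1 + k2 <= kN) & (kN <= k1 + k2 + 1)).all())      # inequality (3.25): True

for N in (10, 100, 10_000):                               # Lemma 3.3: |k(N)|/N -> 1
    print(N, k_vec(N).sum() / N)
```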

We associate the labels to the edges and leaves of the full system \(\mathcal {T}_{\underline{k}\left( N\right) }\) as in the previous section. The labels of the two subsystems \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) and \(\mathcal {T}_{\underline{k}\left( N_2\right) }\) are instead attributed in a slightly different way, in order to have different spins (different labels m) belonging to the two subsystems.

The labels m attributed to the edges from level \(i-1\) to level i in the full system coincide with the set \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +1,\sum _{j=1}^{i-1}k_j\left( N\right) +2, \dots , \sum _{j=1}^{i}k_j\left( N\right) \right\} \). When we split the system into the two subsystems, we assign to the edges that connect each node at level \(i-1\) to level i of the subsystem \(\mathcal {T}_{\underline{k}\left( N_1\right) }\) the labels \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +1, \dots , \sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) \right\} \), while we assign to the edges that connect each node at level \(i-1\) to level i of the subsystem \(\mathcal {T}_{\underline{k}\left( N_2\right) }\) the labels \(\left\{ \sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) +1,\dots ,\sum _{j=1}^{i-1}k_j\left( N\right) +k_i\left( N_1\right) +k_i\left( N_2\right) \right\} \). By (3.25) this is well defined. Once the labels m are split between the two subsystems, the assignment of the labels \(s=\pm \) follows the same rule as in the previous section. Since \(k_i\left( N_1\right) +k_i\left( N_2\right) \) may be strictly less than \(k_i\left( N\right) \), some of the labels m (i.e. some spins) may disappear in the splitting.

We now discuss the behavior of the distances. Consider two finite vectors \(\underline{k}\) and \(\underline{k}'\) such that \(k'_i\le k_i\) for any i. We assign the labels to \(\mathcal T_{\underline{k}}\) in the usual way, while we assign the labels to \(\mathcal T_{\underline{k}'}\) as follows: to the edges that connect each node at level \(i-1\) to level i of \(\mathcal {T}_{\underline{k}'}\) we arbitrarily assign \(k'_i\) of the \(k_i\) labels used in \(\mathcal T_{\underline{k}}\). The assignment of the labels \(s=\pm \) then follows the usual rule.

We call respectively \(d_{\underline{k}}\) and \(d_{\underline{k}'}\) the metrics defined by formula (3.17) for the two trees \(\mathcal T_{\underline{k}}\) and \(\mathcal T_{\underline{k}'}\), and \(s_{\underline{k}}\), \(s_{\underline{k}'}\) the corresponding normalized distances (see (3.18)). As before, given two spin configurations \(\sigma ,\tau \in \left\{ -1,1\right\} ^{\left| \underline{k}\right| }\), we call again \(\sigma ,\tau \in \left\{ -1,1\right\} ^{\left| \underline{k}'\right| }\) the same configurations restricted to the labels assigned to the edges in \(\mathcal T_{\underline{k}'}\). We have the following.

Lemma 3.1

Consider two finite vectors \(\underline{k}' \le \underline{k}\) and the corresponding trees \(\mathcal T_{\underline{k}}\) and \(\mathcal T_{\underline{k}'}\), with spin configurations associated with the leaves as above. Then we have

$$\begin{aligned} s_{\underline{k}'}\left( \sigma ,\tau \right) \le s_{\underline{k}}\left( \sigma ,\tau \right) , \qquad \forall \, \sigma , \tau . \end{aligned}$$
(3.26)

Proof

Consider the tree \(\mathcal T_{\underline{k}}\), two configurations \(\sigma , \tau \) associated with two leaves, and the corresponding geodesic path. Consider now a new finite vector \(\underline{k}'\) obtained from \(\underline{k}\) by decreasing a single component by one and keeping all the remaining ones, i.e. \(k'_i=k_i-1\) and \(k'_j=k_j\) for all \(j\ne i\). Suppose that the label m that is missing in \(\mathcal T_{\underline{k}'}\) is \(m^*\). The tree \(\mathcal T_{\underline{k}'}\), with the corresponding labeling, is obtained from \(\mathcal T_{\underline{k}}\) and the original labeling simply as follows. All the edges connecting nodes at level \(i-1\) to nodes at level i in \(\mathcal T_{\underline{k}}\) can be grouped into pairs having exactly the same labels apart from the one corresponding to \(m^*\): the two paired edges have labels \(\left( m^*,+\right) \) and \(\left( m^*,-\right) \) respectively. If we identify each pair of edges, and consequently also the subtrees starting from the identified nodes, we get a tree that coincides with \(\mathcal T_{\underline{k}'}\), with exactly the same assignment of labels. In particular, the leaves associated with \(\sigma \), \(\tau \) in the new tree are exactly the original ones after the identification. Finally, the geodesic path too remains the same after the identification (see e.g. Fig. 4).

Since the identification procedure can only shorten this path, we have the statement of the lemma when \(\underline{k}'\) is obtained from \(\underline{k}\) by decreasing one of its components by one. We finish the proof by observing that any \(\underline{k}'\le \underline{k}\) can be obtained from \(\underline{k}\) after a finite number of iterations of this type. \(\square \)

Fig. 4
After the coalescence of the branches with \(m^*=2\), the configurations \(\sigma \) and \(\tau \) in Figs. 3 and 4 are at distance \(s_{\underline{k}'}\left( \sigma ,\tau \right) =\sqrt{2 a_3}<s_{\underline{k}}\left( \sigma ,\tau \right) =\sqrt{2\left( a_2+a_3\right) }\)

Remark 3.2

Both \(\mathcal T_{\underline{k}\left( N_i\right) }\), \(i=1,2\), are obtained from \(\mathcal T_{\underline{k}\left( N\right) }\) as in the hypothesis of Lemma 3.1, and we therefore have

$$\begin{aligned} s_{\underline{k}\left( N\right) }\left( \sigma ,\tau \right) \ge \max \left\{ s_{\underline{k}\left( N_1\right) }\left( \sigma ,\tau \right) , s_{\underline{k}\left( N_2\right) }\left( \sigma ,\tau \right) \right\} , \qquad \forall \,\sigma , \tau . \end{aligned}$$
(3.27)

Since by (3.25) we have \(\frac{|\underline{k}(N_1)|+|\underline{k}(N_2)|}{|\underline{k}(N)|}\le 1\), we deduce

$$\begin{aligned} s^2_{\underline{k}\left( N\right) }\left( \sigma ,\tau \right) \ge \frac{\left| \underline{k}\left( N_1\right) \right| }{\left| \underline{k}\left( N\right) \right| }s^2_{\underline{k}\left( N_1\right) }\left( \sigma ,\tau \right) +\frac{\left| \underline{k}\left( N_2\right) \right| }{\left| \underline{k}\left( N\right) \right| }s^2_{\underline{k}\left( N_2\right) }\left( \sigma ,\tau \right) , \end{aligned}$$
(3.28)

that is equivalent to the super-Pythagorean condition

$$\begin{aligned} d_{\underline{k}\left( N\right) }\left( \sigma ,\tau \right) \ge \sqrt{d^2_{\underline{k}\left( N_1\right) }\left( \sigma ,\tau \right) +d^2_{\underline{k}\left( N_2\right) }\left( \sigma ,\tau \right) }. \end{aligned}$$
(3.29)

3.2.3 Thermodynamic Limit

We define the energy of our sequence of GREM models as \(H_N\left( \sigma \right) := H_{\underline{k}\left( N\right) }\left( \sigma \right) \) (recall definition (3.15)) and the corresponding partition function and density of free energy as in (3.19), (3.20); more precisely, \(Z_N\left( \beta \right) =\sum _{\left\{ \sigma \right\} } \mathrm {e}^{-\beta H_N\left( \sigma \right) }\) and

$$\begin{aligned} F_N(\beta )=-\frac{1}{\beta \left| \underline{k}\left( N\right) \right| }{\mathbb {E}}\left[ \log Z_N\left( \beta \right) \right] =:\frac{\alpha _N(\beta )}{\beta \left| \underline{k}\left( N\right) \right| }, \end{aligned}$$
(3.30)

where the last equality defines the symbol \(\alpha _N(\beta )\).

We need a preliminary lemma. Let us set \(\gamma _i{:=}\frac{\log \alpha _i}{\log 2}>0\) and observe that by definition \(\sum _{i=1}^{+\infty }\gamma _i=1\).

Lemma 3.3

We have

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\left| \underline{k}(N)\right| }{N}=1 \end{aligned}$$
(3.31)

Proof

For any finite k we have

$$\begin{aligned} \frac{\sum _{i=1}^{k}\left( N\gamma _i-1\right) }{N}\le \frac{\left| \underline{k}\left( N\right) \right| }{N}\le \frac{\sum _{i=1}^{+\infty }N\gamma _i}{N}. \end{aligned}$$
(3.32)

The right hand side of the above expression is 1. The left hand side converges, when \(N\rightarrow \infty \), to \(\sum _{i=1}^k\gamma _i\). Taking then the limit \(k\rightarrow \infty \) we deduce the statement of the lemma. \(\square \)

We can now prove the existence of the limit of the quenched free energy per site of a GREM model with infinitely many levels.

Theorem 3.4

Under the hypotheses (H1) and (H2), the density of free energy (3.30) defined on \(\mathcal {T}_{\underline{k}\left( N\right) }\) has a limit when \(N\rightarrow \infty \), in the sense that the following limit exists and coincides with an infimum:

$$\begin{aligned} -\infty<\lim _{N\rightarrow \infty }-\frac{1}{\beta \left| \underline{k}\left( N\right) \right| }\,\mathbb {E}\left[ \log Z_{N}\left( \beta \right) \right] =\inf _N\frac{\alpha _N(\beta )}{\beta N} <\infty . \end{aligned}$$
(3.33)

Proof

We apply the interpolation method to the Gaussian random vectors \(H_{\underline{k}\left( N\right) }\left( \sigma \right) \) and \(H_{\underline{k}\left( N_1\right) }\left( \sigma \right) +H_{\underline{k}\left( N_2\right) }\left( \sigma \right) \), both labeled by the configurations \(\sigma \in \left\{ -1,1\right\} ^{\left| \underline{k}\left( N\right) \right| }\). The Gaussian random variables used to compute \(H_{\underline{k}\left( N\right) }, H_{\underline{k}\left( N_1\right) }\) and \(H_{\underline{k}\left( N_2\right) }\) are all mutually independent. Note that, since some spins are lost in the splitting, the second Gaussian random vector is degenerate.

We have the following identity

$$\begin{aligned} \sum _{\left\{ \sigma \right\} }\mathrm {e}^{-\beta H_{\underline{k}\left( N_1\right) }\left( \sigma \right) }\mathrm {e}^{-\beta H_{\underline{k}\left( N_2\right) }\left( \sigma \right) }=Z_{N_1}\left( \beta \right) Z_{N_2}\left( \beta \right) 2^{\left| \underline{k}\left( N\right) \right| -\left| \underline{k}\left( N_1\right) \right| -\left| \underline{k}\left( N_2\right) \right| }. \end{aligned}$$
(3.34)

The last term is due to the fact that some spins may be lost in the splitting.

By Remark 3.2, we can apply Theorem 2.9, getting

$$\begin{aligned} \alpha _N\left( \beta \right) \le \alpha _{N_1}\left( \beta \right) +\alpha _{N_2}\left( \beta \right) -\Big (\left| \underline{k}\left( N\right) \right| -\left| \underline{k}\left( N_1\right) \right| -\left| \underline{k}\left( N_2\right) \right| \Big )\log 2. \end{aligned}$$
(3.35)

Since the quantity \(\left( \left| \underline{k}\left( N\right) \right| -\left| \underline{k}\left( N_1\right) \right| -\left| \underline{k}\left( N_2\right) \right| \right) \log 2\) is non-negative, the sequence \(\alpha _N(\beta )\) is subadditive. By Fekete's lemma we deduce that the following limit exists:

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\alpha _N(\beta )}{\beta N}=\inf _N\frac{\alpha _N(\beta )}{\beta N}. \end{aligned}$$
(3.36)

By Lemma 3.3 we have that

$$\begin{aligned} \lim _{N\rightarrow \infty }\frac{\alpha _N(\beta )}{\beta \left| \underline{k}\left( N\right) \right| }= \lim _{N\rightarrow \infty }\frac{\alpha _N(\beta )}{\beta N}, \end{aligned}$$
(3.37)

and we get the main statement of the Theorem.

It remains just to prove that the limit is strictly bigger than \(-\infty \).

This follows from the summability of the variances \(a_i\). Indeed, we prove that for any \(N>0\), \(-\beta F_{N}\left( \beta \right) \) is bounded from above uniformly in N. We have

$$\begin{aligned} -\beta F_{N}\left( \beta \right)= & {} \frac{1}{\left| \underline{k}\left( N\right) \right| }\,\mathbb {E}\left[ \log Z_{N}\left( \beta \right) \right] =\frac{1}{\left| \underline{k}\left( N\right) \right| }\,\mathbb {E}\left[ \log \sum _{\left\{ \sigma \right\} }\mathrm {e}^{\beta \sqrt{\left| \underline{k}\left( N\right) \right| }\,(\varepsilon ^{(\sigma )}_1+\varepsilon ^{(\sigma )}_2+\ldots +\varepsilon ^{(\sigma )}_{n\left( N\right) })}\right] \nonumber \\\le & {} \frac{1}{\left| \underline{k}\left( N\right) \right| }\,\log \sum _{\left\{ \sigma \right\} } \mathbb {E}\left[ \mathrm {e}^{\beta \sqrt{\left| \underline{k}\left( N\right) \right| }\,(\varepsilon ^{(\sigma )}_1+\varepsilon ^{(\sigma )}_2+\ldots +\varepsilon ^{(\sigma )}_{n\left( N\right) })} \right] , \end{aligned}$$
(3.38)

where we used Jensen's inequality. Since the \(\varepsilon _i^{(\sigma )}\) are independent, the expectation value in the last row is the product of the moment generating functions:

$$\begin{aligned} \mathbb {E}\left[ \mathrm {e}^{\beta \sqrt{\left| \underline{k}\left( N\right) \right| }\,\varepsilon _i^{(\sigma )}}\right] =\mathrm {e}^{\frac{\beta ^2\left| \underline{k}\left( N\right) \right| }{2}a_i}\qquad \forall \,i, \end{aligned}$$
(3.39)

hence

$$\begin{aligned} -\beta F_{N}\left( \beta \right) \le \frac{1}{\left| \underline{k}\left( N\right) \right| }\,\log \sum _{\left\{ \sigma \right\} } \mathrm {e}^{\frac{\beta ^2\left| \underline{k}\left( N\right) \right| }{2}\sum _{i=1}^{n\left( N\right) }a_i} \le \frac{1}{\left| \underline{k}\left( N\right) \right| }\,\log \sum _{\left\{ \sigma \right\} }\mathrm {e}^{\frac{\beta ^2\left| \underline{k}\left( N\right) \right| }{2}}=\log 2 + \frac{\beta ^2}{2}, \end{aligned}$$
(3.40)

where we used the fact that \(\sum _{i=1}^{\infty }a_i=1\). \(\square \)
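The annealed bound just derived can be observed on a small instance. The sketch below (ours; tree, variances and temperature are toy values) estimates the quenched quantity \(\frac{1}{|\underline{k}|}\mathbb {E}[\log Z]\) by Monte Carlo over the disorder and compares it with \(\log 2 + \beta ^2/2\).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(4)
k, a, beta = [1, 2, 1], [0.5, 0.3, 0.2], 1.0     # toy tree, sum(a) = 1
K = sum(k)
leaves = list(product(*[range(2 ** ki) for ki in k]))

def log_Z():
    # one disorder realization: an independent N(0, a_i) per edge at level i;
    # the edge used by leaf u between levels i and i+1 is its prefix u[:i+1]
    eps = [{p: rng.normal(0.0, np.sqrt(a[i])) for p in {u[:i + 1] for u in leaves}}
           for i in range(len(k))]
    H = np.array([-np.sqrt(K) * sum(eps[i][u[:i + 1]] for i in range(len(k)))
                  for u in leaves])               # Hamiltonian (3.15)
    return np.log(np.exp(-beta * H).sum())

quenched = np.mean([log_Z() for _ in range(2000)]) / K   # (1/|k|) E[log Z_N]
print(quenched <= np.log(2) + beta ** 2 / 2)             # annealed bound (3.40): True
```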

As a remark, we show in Lemma 3.5 in the Appendix that the third term on the right hand side of (3.35) is negligible when N is large. This fact is irrelevant for the proof, but it is interesting in itself: for different models we could have a similar situation but with the wrong sign, and a bound of this type could allow one to apply the generalized subadditive lemmas of [14].

3.3 Geometric Remarks

Since by Lemma 2.8 the density of free energy depends just on the metric structure of the Gaussian random variables, it is interesting to analyze the metric structure corresponding to the different models. Moreover, natural and interesting models can be introduced starting directly from the metric description. Since all the metric spaces involved must be Euclidean, a relevant characteristic is the dimension of the space where the metric can be realized as a collection of points.

Another useful remark is that the super-Pythagorean relation (3.11) implies (3.13), which gives the convergence (3.14) of the density of free energy to the infimum, while a sub-Pythagorean relation (i.e. (3.11) with the opposite inequality) would imply convergence of the density of free energy to the supremum.

Let us start with the Sherrington–Kirkpatrick model. Since the energy is defined in terms of \(N^2\) i.i.d. Gaussian random variables, the metric of the model with N sites can be represented by \(2^N\) points embedded in \({\mathbb {R}}^{N^2}\). A natural representation of this metric is the following. Consider \(\sigma \in \left\{ -1,1\right\} ^N\) as a column vector and define \(\widehat{\sigma }:=\frac{\beta }{\sqrt{N}}\,\sigma \sigma ^T\), a non-negative definite \(N\times N\) matrix of rank one (a multiple of the orthogonal projector onto \(\sigma \)). By a direct computation, the metric induced by the Sherrington–Kirkpatrick model is given by

$$\begin{aligned} d_N\left( \sigma ,\eta \right) =\sqrt{\mathrm {Tr}\left( \left( \widehat{\sigma }-\widehat{\eta }\right) ^2\right) }, \end{aligned}$$

i.e. it is the Euclidean metric on the rank-one matrices induced by the Hilbert-Schmidt scalar product. The super-Pythagorean relation is strictly related to the fact that all these matrices belong to the cone of non-negative definite matrices.
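This representation is easy to verify numerically; the sketch below (ours) checks that the Hilbert–Schmidt distance between the rank-one matrices reproduces formula (3.6).

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta = 40, 0.7
s1, s2 = rng.choice([-1, 1], size=(2, N))

P1 = beta / np.sqrt(N) * np.outer(s1, s1)     # sigma_hat
P2 = beta / np.sqrt(N) * np.outer(s2, s2)     # eta_hat
d_HS = np.sqrt(np.trace((P1 - P2) @ (P1 - P2)))

dH = (s1 != s2).mean()                        # Hamming distance (3.7)
d_SK = beta * np.sqrt(8 * N * dH * (1 - dH))  # formula (3.6)
print(np.isclose(d_HS, d_SK))                 # True
```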

The metric structure of the GREM is best described with the trees illustrated in Sect. 3.2.

An interesting class of models can be introduced by directly defining the metric. Let \(v^{+1}, v^{-1}\in {\mathbb {R}}^2\) be two vectors. To any \(\sigma \in \left\{ -1,1\right\} ^N\) we associate the vector \(v\left( \sigma \right) \in {\mathbb {R}}^{2^N}\) defined by

$$\begin{aligned} v\left( \sigma \right) {:=}\otimes _{i=1}^N v^{\sigma (i)}. \end{aligned}$$
(3.41)

We have

$$\begin{aligned} \Big (v\left( \sigma \right) , v\left( \eta \right) \Big )=\prod _{i=1}^N\left( v^{\sigma (i)}, v^{\eta (i)}\right) . \end{aligned}$$

If we let \(v^{\pm 1}=v^{\pm 1, N}\) depend on N in such a way that \(|v^{\pm 1,N}|^2=N^{\frac{1}{N}}\) and \(\left( v^{1,N}, v^{-1, N}\right) =N^{\frac{1}{N}}\alpha ^{\frac{1}{N}}\), with \(\alpha \in [0,1)\), we obtain that the Euclidean metric on the \(2^N\) points embedded into \({\mathbb {R}}^{2^N}\) (the points corresponding to the vectors \(v(\sigma )\)) is given by

$$\begin{aligned} d_N(\sigma ,\eta )=\sqrt{2N\left( 1-\alpha ^{d^H_N(\sigma ,\eta )}\right) }. \end{aligned}$$
(3.42)

This Euclidean metric (as before, it is a function of a metric that is again a metric [13, 16]) satisfies the super-Pythagorean relation (3.11), since the real function \(1-\alpha ^x\) is concave. We have therefore convergence of the free energy densities. A model corresponding to this metric can be fixed in such a way that \({\mathbb {E}}\left[ H_N\left( \sigma \right) H_N\left( \eta \right) \right] \sim \alpha ^{d^H_N\left( \sigma ,\eta \right) }\). The special case \(\alpha =0\) corresponds to the REM model, a special case of the GREM discussed in Sect. 3.2 with a single level and all the leaves directly connected to the root. In this case all the points are equally spaced.
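Formula (3.42) can also be checked directly from the tensor-product definition (3.41); the following sketch (ours; N and α are arbitrary) builds v(σ) via iterated Kronecker products.

```python
import numpy as np
from functools import reduce

N, alpha = 8, 0.6
r2 = N ** (1 / N)                              # |v^{+-1,N}|^2 = N^{1/N}
c = alpha ** (1 / N)                           # so that (v^{1,N}, v^{-1,N}) = N^{1/N} alpha^{1/N}
vp = np.sqrt(r2) * np.array([1.0, 0.0])
vm = np.sqrt(r2) * np.array([c, np.sqrt(1 - c ** 2)])

def v(sigma):
    # v(sigma) in R^{2^N}: the tensor product (3.41)
    return reduce(np.kron, [vp if s == 1 else vm for s in sigma])

rng = np.random.default_rng(6)
s1, s2 = rng.choice([-1, 1], size=(2, N))
dH = (s1 != s2).mean()
print(np.isclose(np.linalg.norm(v(s1) - v(s2)),
                 np.sqrt(2 * N * (1 - alpha ** dH))))    # formula (3.42): True
```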