Remarks on the interpolation method

We discuss a generalization of the conditions of validity of the interpolation method for the quenched free energy density of mean field spin glasses. The condition is expressed solely in terms of the $L^2$ metric structure of the Gaussian random variables. As an application, we deduce the existence of the thermodynamic limit for a GREM model with infinite branches, for which the classical conditions of validity fail.


Introduction
The interpolation method is a simple but powerful technique used to prove inequalities for Gaussian random vectors (see for example [10] and [11]). This method has great relevance in the field of Mathematical and Theoretical Physics since it represents an essential ingredient in the study of mean field spin glasses. In the breakthrough paper [9] it has been used to prove the existence of the thermodynamic limit for the quenched density of free energy for the Sherrington-Kirkpatrick model. This was a longstanding problem and its solution was the turning point towards the proof of the Parisi Formula [18].
Spin glasses are simple mathematical models for disordered systems whose rigorous analysis is a real challenge for mathematicians. We refer the mathematically interested reader to [13], [17] and the physically interested one to [12]. Among the many models, one of the most studied was introduced by Sherrington and Kirkpatrick in [15] as a solvable elementary model. In fact, the structure of the solution turned out to be much richer and more complex than expected, and it was built up in a series of papers by Parisi (see [12] for a detailed discussion). A rigorous proof of the solution conjectured by Parisi was missing for a long time, and the interpolation method played a key role in its proof. See [8] for a review on this.

arXiv:2004.00714v1 [math-ph] 1 Apr 2020
Using the same idea as in [9], the authors of [4] proposed a general setting for the interpolation method in the framework of mean field spin glasses. Furthermore, they successfully applied this technique to prove the existence of the thermodynamic limit for the Generalized Random Energy Model (GREM, a family of models introduced in [6]) with a finite number of levels.
The "classical" hypotheses under which the interpolation method can be applied to the quenched free energy of mean field spin glasses consist of a collection of equalities and inequalities for the covariance matrix of the underlying multivariate Gaussian process. We show that less restrictive conditions are actually sufficient. More precisely, we show that the method works under conditions that involve only the $L^2$ metric structure of the Gaussian random vectors. By the correspondence in [16], [7], this is always a Euclidean metric structure. A condition of this type is very natural, since the quenched free energy depends on the distribution of the Gaussian random vector only through its metric structure. This gives the result an interesting geometric flavor and interpretation. A similar inequality was obtained through a tricky computation in the framework of Sudakov-Fernique inequalities in [3]. Here we show that the result follows from a general argument that exploits the special form of the function, and that it is, in a sense, the best possible. As an application of the improved conditions, we consider a GREM model with infinitely many levels and deduce the existence of the thermodynamic limit for the quenched density of free energy. In this case the usual conditions of validity of the interpolation method used in [9], [4] fail; with the generalized conditions we can nevertheless deduce the existence of the thermodynamic limit directly, using the simple argument of the interpolation method. We refer to [14], [1] and [2] for the beautiful mathematics involved in the limit of such models.
The structure of the paper is the following. In Section 2 we briefly recall the basics of the interpolation method together with the conditions used in [9] and [4]; we then discuss the Euclidean metric structure associated with any Gaussian random vector and finally present our generalized conditions.
In Section 3 we discuss two examples. The first one is the Sherrington-Kirkpatrick model. This is done simply to recall the basic mechanism and idea of application. The second example is a GREM model with infinite levels for which it is necessary to use our generalized conditions to prove the existence of the thermodynamic limit.
In the Appendix we collect some elementary Lemmas.

The interpolation method
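2.1. The interpolation method. Let $X = (X_1, \dots, X_n)$ be an $n$-dimensional zero mean Gaussian random vector having covariance matrix $C$. The $n \times n$ symmetric matrix $C$ is non-negative definite and its elements are defined by
$$C_{i,j} := \mathbb E[X_i X_j], \qquad i, j = 1, \dots, n.$$
When $C$ is positive definite, the distribution of $X$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb R^n$ and the density is
$$\phi(C, x) = \frac{1}{\sqrt{(2\pi)^n \det C}}\, e^{-\frac12 (x, C^{-1} x)}, \tag{2.1}$$
where $(\cdot, \cdot)$ denotes the Euclidean scalar product in $\mathbb R^n$. We restrict to the case of positive definite matrices, since the other cases can be deduced by a limiting procedure. We have the Fourier transform representation
$$\phi(C, x) = \frac{1}{(2\pi)^n} \int_{\mathbb R^n} e^{-i(\lambda, x) - \frac12 (\lambda, C\lambda)}\, d\lambda. \tag{2.2}$$
We denote by $\mathrm{Tr}(\cdot)$ the trace of a matrix and consider the set $\mathcal C$ of non-negative definite symmetric matrices endowed with the Hilbert-Schmidt scalar product
$$\langle A, B \rangle := \mathrm{Tr}(A B^T) = \sum_{i,j=1}^n A_{i,j} B_{i,j}.$$
The open set of positive definite symmetric matrices corresponds to the interior of $\mathcal C$.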
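Let $\phi(C, x)$, for $C$ positive definite and $x \in \mathbb R^n$, be as defined in (2.1). By (2.2) and a direct computation we have
$$\frac{\partial \phi}{\partial C_{i,j}}(C, x) = \frac{\partial^2 \phi}{\partial x_i \partial x_j}(C, x), \qquad i \ne j, \tag{2.4}$$
and
$$\frac{\partial \phi}{\partial C_{i,i}}(C, x) = \frac12\, \frac{\partial^2 \phi}{\partial x_i^2}(C, x). \tag{2.5}$$
Recall that in the above formulas $C$ is a symmetric matrix, so the variations in the computation of (2.4) are constructed by varying $C$ symmetrically. More precisely, let $E^{\{i,j\}}$ with $i \ne j$ be the symmetric matrix such that $E^{\{i,j\}}_{i,j} = E^{\{i,j\}}_{j,i} = 1$ and having all the remaining elements equal to zero, while $E^{\{i,i\}}$ has the single nonzero element $E^{\{i,i\}}_{i,i} = 1$. Given $F : \mathcal C \to \mathbb R$ we define
$$\frac{\partial F}{\partial C_{i,j}}(C) := \lim_{h \to 0} \frac{F(C + h E^{\{i,j\}}) - F(C)}{h}.$$
Consider now $f : \mathbb R^n \to \mathbb R$, a $C^2$ function with moderate growth at infinity, for example such that $|f(x)| \le e^{\lambda |x|}$ for a suitable constant $\lambda \ge 0$. This technical condition is related to the validity of some integrations by parts. We call $\nabla^2 f(x)$ the Hessian matrix of $f$ at $x$, that is the symmetric matrix having elements
$$\left(\nabla^2 f(x)\right)_{i,j} = \frac{\partial^2 f}{\partial x_i \partial x_j}(x).$$
The following result is the interpolation method. For the reader's convenience we give the short proof.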
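Lemma 2.1. [Interpolation method] Consider two mean zero Gaussian random vectors $X, Y$ having covariance matrices $C_X$ and $C_Y$ respectively. For a $C^2$ function $f$ with moderate growth we have
$$\mathbb E[f(Y)] - \mathbb E[f(X)] = \frac12 \int_0^1 dt \sum_{i,j=1}^n \left((C_Y)_{i,j} - (C_X)_{i,j}\right) \mathbb E\!\left[\frac{\partial^2 f}{\partial x_i \partial x_j}(Z(t))\right] = \frac12 \int_0^1 \left\langle C_Y - C_X,\, \mathbb E\left[\nabla^2 f(Z(t))\right] \right\rangle dt, \tag{2.7}$$
where
$$Z(t) := \sqrt{t}\, Y + \sqrt{1-t}\, X \tag{2.8}$$
and $X, Y$ are two independent copies of the random vectors.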
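Proof. When $Z$ is an $n$-dimensional centered Gaussian random vector, $\mathbb E[f(Z)]$ depends only on the covariance matrix $C$ of $Z$. Fix a $C^2$ function $f$ and define the function $F : \mathcal C \to \mathbb R$ as
$$F(C) := \mathbb E[f(Z)] = \int_{\mathbb R^n} f(x)\, \phi(C, x)\, dx.$$
With the help of formulas (2.4) and (2.5) and two integrations by parts, when $C$ is positive definite we can compute
$$\frac{\partial F}{\partial C_{i,j}}(C) = \mathbb E\!\left[\frac{\partial^2 f}{\partial x_i \partial x_j}(Z)\right], \quad i \ne j, \qquad \frac{\partial F}{\partial C_{i,i}}(C) = \frac12\, \mathbb E\!\left[\frac{\partial^2 f}{\partial x_i^2}(Z)\right],$$
and therefore, along a smooth curve $t \mapsto C(t)$ of positive definite matrices,
$$\frac{d}{dt} F(C(t)) = \frac12 \sum_{i,j=1}^n \dot C_{i,j}(t)\, \mathbb E\!\left[\frac{\partial^2 f}{\partial x_i \partial x_j}(Z(t))\right],$$
where $Z(t)$ is a centered Gaussian random vector having covariance $C(t)$. The special case when the curve linearly interpolates between $C_X$ and $C_Y$, i.e. $C(t) = t C_Y + (1-t) C_X$, gives (2.7), since this is the covariance of the vector $Z(t)$ in (2.8). If one or both of the matrices $C_X$ and $C_Y$ are not strictly positive definite, it is possible to add $\varepsilon I$ to the matrices, do the same computation as above and finally take the limit $\varepsilon \to 0$.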
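As a numerical sanity check of the two identities (2.4) and (2.5) on which the proof rests, the following Python sketch compares central finite differences of the bivariate density (2.1) with respect to the covariance entries against second derivatives in $x$. It is only an illustration for $n = 2$; the function names, the test points and the step $h$ are arbitrary choices, not taken from the text.

```python
import math


def density(c11, c12, c22, x1, x2):
    """Centered bivariate Gaussian density phi(C, x), C = [[c11, c12], [c12, c22]] positive definite."""
    det = c11 * c22 - c12 * c12
    # (x, C^{-1} x) written out for a 2x2 symmetric matrix
    quad = (c22 * x1 * x1 - 2.0 * c12 * x1 * x2 + c11 * x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def compare_derivatives(c11=1.0, c12=0.3, c22=2.0, x1=0.4, x2=-0.7, h=1e-4):
    """Return (dphi/dC12, d2phi/dx1dx2, dphi/dC11, (1/2) d2phi/dx1^2) by central differences.

    Changing c12 moves both off-diagonal entries at once: the symmetric variation E^{1,2}.
    """
    p = density
    dphi_dc12 = (p(c11, c12 + h, c22, x1, x2) - p(c11, c12 - h, c22, x1, x2)) / (2 * h)
    dphi_dc11 = (p(c11 + h, c12, c22, x1, x2) - p(c11 - h, c12, c22, x1, x2)) / (2 * h)
    d2_x1x2 = (p(c11, c12, c22, x1 + h, x2 + h) - p(c11, c12, c22, x1 + h, x2 - h)
               - p(c11, c12, c22, x1 - h, x2 + h) + p(c11, c12, c22, x1 - h, x2 - h)) / (4 * h * h)
    d2_x1x1 = (p(c11, c12, c22, x1 + h, x2) - 2 * p(c11, c12, c22, x1, x2)
               + p(c11, c12, c22, x1 - h, x2)) / (h * h)
    return dphi_dc12, d2_x1x2, dphi_dc11, 0.5 * d2_x1x1
```

Moving `c12` shifts both off-diagonal entries at once, which is exactly the symmetric variation $E^{\{1,2\}}$; this is why no factor $\frac12$ appears in (2.4), while it does appear in (2.5).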
Formula (2.7) is the core of the interpolation method. It is very useful to establish inequalities between the two expected values on its left hand side.
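The following sketch estimates both sides of (2.7) by Monte Carlo for $n = 2$, unit weights in the log-sum-exp function used below, $C_X = \begin{pmatrix}1 & 1/2\\ 1/2 & 1\end{pmatrix}$ and $C_Y = I$; these matrices, the sample sizes and the midpoint rule in $t$ are illustrative assumptions. The two estimates should agree up to sampling error; since here $C_X$ and $C_Y$ have the same diagonal and $(C_X)_{1,2} \ge (C_Y)_{1,2}$, the common value is also positive.

```python
import math
import random


def f(x):
    """f(x) = log(e^{x_1} + e^{x_2}), computed stably."""
    m = max(x)
    return m + math.log(sum(math.exp(xi - m) for xi in x))


def hessian(x):
    """Hessian of f: delta_ij mu_i - mu_i mu_j, with mu_i = e^{x_i} / sum_l e^{x_l}."""
    m = max(x)
    e = [math.exp(xi - m) for xi in x]
    s = sum(e)
    mu = [ei / s for ei in e]
    return [[(mu[i] if i == j else 0.0) - mu[i] * mu[j] for j in range(2)] for i in range(2)]


def chol2(c):
    """Lower-triangular L with c = L L^T, for a 2x2 positive definite matrix c."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    return [[l11, 0.0], [l21, math.sqrt(c[1][1] - l21 * l21)]]


def transform(l, g):
    """Map two i.i.d. standard normals g to a Gaussian vector with covariance L L^T."""
    return [l[0][0] * g[0], l[1][0] * g[0] + l[1][1] * g[1]]


def both_sides(cx, cy, samples=40000, steps=10, seed=1):
    """Monte Carlo estimates of the left and right hand sides of (2.7) for n = 2."""
    rng = random.Random(seed)
    lx, ly = chol2(cx), chol2(cy)

    def normals():
        return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]

    # E f(Y) - E f(X): each expectation depends on one law only, so the same
    # normals can be reused for X and Y (common random numbers reduce the variance).
    lhs = 0.0
    for _ in range(samples):
        g = normals()
        lhs += f(transform(ly, g)) - f(transform(lx, g))
    lhs /= samples

    # Right side: X and Y independent as in (2.8); midpoint rule for the t-integral.
    pairs = [(transform(lx, normals()), transform(ly, normals())) for _ in range(samples // 4)]
    rhs = 0.0
    for k in range(steps):
        t = (k + 0.5) / steps
        acc = 0.0
        for x, y in pairs:
            z = [math.sqrt(t) * yi + math.sqrt(1.0 - t) * xi for xi, yi in zip(x, y)]
            h = hessian(z)
            acc += sum((cy[i][j] - cx[i][j]) * h[i][j] for i in range(2) for j in range(2))
        rhs += 0.5 * acc / len(pairs) / steps
    return lhs, rhs
```

Reusing the same normals for $X$ and $Y$ is legitimate only on the left hand side, where the two expectations are computed separately; the right hand side needs independent copies, as required in (2.8).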
The Guerra-Toninelli interpolation method is a simple but powerful technique developed in the study of mean field spin glasses (see [8,9] and references therein), which is based on an abstract theorem about Gaussian random variables. It corresponds to the interpolation method of Lemma 2.1 with the special choice of the function
$$f(x) = \log \sum_{i=1}^n w_i e^{x_i}, \tag{2.14}$$
where $w_i \in \mathbb R^+$ are some fixed positive weights. In particular, Guerra and Toninelli obtained and used the following result (this is Theorem 2 in [8]) to prove the existence of the thermodynamic limit of the Sherrington-Kirkpatrick model. The same idea and the same theorem (Theorem 2.2 below) were used later on in [4] to deduce the existence of the thermodynamic limit for a GREM model [6] with a finite number of levels.
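Theorem 2.2. Let $X, Y$ be two centered Gaussian random vectors and let $f$ be given by (2.14). If
$$(C_X)_{i,i} = (C_Y)_{i,i}, \qquad i = 1, \dots, n, \tag{2.15}$$
$$(C_X)_{i,j} \ge (C_Y)_{i,j}, \qquad i, j = 1, \dots, n, \tag{2.16}$$
then
$$\mathbb E[f(X)] \le \mathbb E[f(Y)]. \tag{2.17}$$
Proof. By a direct computation, when $f$ is (2.14), we have
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \delta_{i,j}\, \mu_i(x) - \mu_i(x)\, \mu_j(x), \qquad \mu_i(x) := \frac{w_i e^{x_i}}{\sum_{l=1}^n w_l e^{x_l}},$$
where $\mu(x) = (\mu_1(x), \dots, \mu_n(x))$ belongs to the simplex
$$I_n := \Big\{\mu \in \mathbb R^n : \mu_i \ge 0,\ \sum_{i=1}^n \mu_i = 1\Big\}. \tag{2.18}$$
Under (2.15) the diagonal terms in (2.7) vanish, so that
$$\mathbb E[f(Y)] - \mathbb E[f(X)] = \frac12 \int_0^1 dt \sum_{i,j=1}^n \left((C_X)_{i,j} - (C_Y)_{i,j}\right) \mathbb E\left[\mu_i(Z(t))\, \mu_j(Z(t))\right] \ge 0$$
by (2.16), and the result follows by (2.7).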
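The Hessian formula used in the proof can be checked against finite differences. The sketch below does this at one arbitrary point $x$ with arbitrary positive weights $w_i$ (both chosen only for illustration), and also confirms that $\mu(x)$ lies in $I_n$.

```python
import math


def f(x, w):
    """f(x) = log(sum_i w_i e^{x_i}), the function (2.14), computed stably."""
    m = max(x)
    return m + math.log(sum(wi * math.exp(xi - m) for xi, wi in zip(x, w)))


def mu(x, w):
    """mu_i(x) = w_i e^{x_i} / sum_l w_l e^{x_l}."""
    m = max(x)
    e = [wi * math.exp(xi - m) for xi, wi in zip(x, w)]
    s = sum(e)
    return [ei / s for ei in e]


def hessian_closed_form(x, w):
    """delta_ij mu_i(x) - mu_i(x) mu_j(x)."""
    p = mu(x, w)
    n = len(x)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(n)] for i in range(n)]


def hessian_finite_diff(x, w, h=1e-4):
    """Central differences; for i == j this is the second difference with step 2h."""
    n = len(x)

    def f_shift(i, si, j, sj):
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return f(y, w)

    return [[(f_shift(i, 1, j, 1) - f_shift(i, 1, j, -1) - f_shift(i, -1, j, 1) + f_shift(i, -1, j, -1))
             / (4 * h * h) for j in range(n)] for i in range(n)]
```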

2.2. Covariances and metrics. We start by showing a simple but useful lemma.
Lemma 2.3. An $n \times n$ symmetric matrix $C$ belongs to $\mathcal C$ if and only if there exist $n$ vectors $a^{(i)} \in \mathbb R^n$ such that
$$C_{i,j} = (a^{(i)}, a^{(j)}), \qquad i, j = 1, \dots, n. \tag{2.22}$$
Proof. If a matrix can be written as in (2.22), then for any $x \in \mathbb R^n$ we have
$$(x, Cx) = \sum_{i,j=1}^n x_i x_j\, (a^{(i)}, a^{(j)}) = \Big|\sum_{i=1}^n x_i a^{(i)}\Big|^2 \ge 0.$$
Conversely, if $C \in \mathcal C$ then it can be written as $C = AA^T$, where $A$ is an $n \times n$ matrix and $A^T$ denotes its transpose. Let us introduce the vectors $a^{(i)} = (A_{i,1}, \dots, A_{i,n})$, i.e. the rows of $A$; then $C_{i,j} = \sum_{k} A_{i,k} A_{j,k} = (a^{(i)}, a^{(j)})$.
A finite metric space with $n$ points is called Euclidean if there exists a collection of $n$ points in $\mathbb R^k$ having the same distances. Of course we can always fix $k = n$. Not every metric space can be realized in this way. The simplest example is the minimal path metric on the vertices of the graph in Figure 1, where all the edges have length 1.
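Lemma 2.3 is constructive: a Cholesky factorization $C = AA^T$ produces the vectors $a^{(i)}$ as the rows of $A$. A minimal Python sketch, restricted like the text to positive definite matrices (the helper names are hypothetical):

```python
import math


def cholesky(c):
    """Lower-triangular A with C = A A^T, for a symmetric positive definite C given as a list of rows."""
    n = len(c)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                a[i][i] = math.sqrt(s)
            else:
                a[i][j] = s / a[j][j]
    return a


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_vectors(c):
    """Vectors a^(i) with C_ij = (a^(i), a^(j)): the rows of the Cholesky factor."""
    return cholesky(c)
```

A matrix that is not non-negative definite, such as $\begin{pmatrix}1 & 2\\ 2 & 1\end{pmatrix}$, makes the factorization fail, in agreement with the first half of the proof.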
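Given a centered Gaussian random vector $X$, there is naturally associated with it the metric $d_X$, that is the $L^2$ distance between the random variables
$$d_X(i,j) := \sqrt{\mathbb E\left[(X_i - X_j)^2\right]} = \sqrt{(C_X)_{i,i} + (C_X)_{j,j} - 2 (C_X)_{i,j}}. \tag{2.23}$$
We have the following result (see for example [7,16]): a metric $d$ on $\{1, \dots, n\}$ is Euclidean if and only if there exists a mean zero Gaussian random vector $X = (X_1, \dots, X_n)$ such that $d = d_X$.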
Proof. Let $d$ be a Euclidean distance and let $a^{(i)}$, $i = 1, \dots, n$, be points of $\mathbb R^n$ realizing it. Let $A$ be the $n \times n$ matrix defined by $A_{i,j} := a^{(i)}_j$. Let $Z = (Z_1, \dots, Z_n)$ be a vector of i.i.d. standard Gaussian random variables and consider the Gaussian vector $X = AZ$, whose covariance $C_X = AA^T$ coincides with the right hand side of (2.22). Using (2.23) and (2.22) we have $d_X(i,j)^2 = |a^{(i)}|^2 + |a^{(j)}|^2 - 2(a^{(i)}, a^{(j)})$, that is
$$d_X(i,j) = |a^{(i)} - a^{(j)}| = d(i,j). \tag{2.24}$$
Figure 1. The simplest non-Euclidean metric space.
Conversely, let $X$ be a mean zero Gaussian vector with covariance $C_X$ and let $A$ be an $n \times n$ matrix such that $C_X = AA^T$. Define $n$ vectors in $\mathbb R^n$ by $a^{(i)}_j := A_{i,j}$; by (2.23), $d_X$ is given by the first equality in (2.24) and is therefore Euclidean.
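The proof suggests a practical test for whether a finite metric is Euclidean. Choosing a base point $0$ and forming $G_{i,j} = \frac12\big(d(0,i)^2 + d(0,j)^2 - d(i,j)^2\big)$ for $i, j \ge 1$, one gets, when $d = d_X$, the covariance of $(X_i - X_0)_{i \ge 1}$; by Lemma 2.3, $d$ is Euclidean exactly when $G$ is non-negative definite (this is the classical criterion of Schoenberg). The sketch below checks this numerically, with a small tolerance as an assumption. The four-point star, a center at distance 1 from three leaves that are pairwise at distance 2, is a standard non-Euclidean example, used here only for illustration.

```python
import math


def metric_from_covariance(c):
    """The L^2 metric (2.23): d_X(i, j) = sqrt(C_ii + C_jj - 2 C_ij)."""
    n = len(c)
    return [[math.sqrt(max(c[i][i] + c[j][j] - 2.0 * c[i][j], 0.0)) for j in range(n)] for i in range(n)]


def _is_positive_definite(g):
    """True if a Cholesky factorization of the symmetric matrix g succeeds."""
    n = len(g)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                a[i][i] = math.sqrt(s)
            else:
                a[i][j] = s / a[j][j]
    return True


def is_euclidean(d, tol=1e-9):
    """Schoenberg's criterion for a finite metric given as a distance matrix d.

    With point 0 as base point, G_ij = (d_0i^2 + d_0j^2 - d_ij^2) / 2 for i, j >= 1.
    If d = d_X, G is the covariance of (X_i - X_0)_{i >= 1}. The metric is Euclidean iff G is
    non-negative definite, tested here (up to tol) as positive definiteness of G + tol * I.
    """
    n = len(d)
    g = [[0.5 * (d[0][i] ** 2 + d[0][j] ** 2 - d[i][j] ** 2) + (tol if i == j else 0.0)
          for j in range(1, n)] for i in range(1, n)]
    return _is_positive_definite(g)
```

Every metric of the form $d_X$ passes the test, as the result above guarantees, while the star fails it.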
The metric structure $d_X$ contains less information than the covariance $C_X$, and there are random vectors having different covariances but the same metric structure. This type of invariance is best understood in terms of the vectors in $\mathbb R^n$, using the Lemmas in the Appendix that characterize invariance under rotations and translations. In particular, we can completely characterize the centered Gaussian random vectors that share the same metric structure.
Lemma 2.5. Given two $n$-dimensional centered Gaussian random vectors $X$ and $Y$, we have $d_X = d_Y$ if and only if there exists a centered Gaussian random variable $W$ such that the random vector $(X_i + W)_{i=1}^n$ has the same distribution as $Y$.
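Proof. If $Y$ has the same distribution as $(X_i + W)_{i=1}^n$, then
$$d_Y(i,j)^2 = \mathbb E\left[(X_i + W - X_j - W)^2\right] = d_X(i,j)^2.$$
Conversely, suppose that $d_X = d_Y$ and write $C_X = A_X (A_X)^T$, $C_Y = A_Y (A_Y)^T$; let $v^{(i)}$ and $w^{(i)}$ be the $i$-th rows of $A_X$ and $A_Y$ respectively. By (2.24) we have
$$|v^{(i)} - v^{(j)}| = d_X(i,j) = d_Y(i,j) = |w^{(i)} - w^{(j)}|, \qquad i, j = 1, \dots, n,$$
and by Lemma 3.6 there exist $O \in O(n)$ and a vector $b \in \mathbb R^n$ such that $w^{(i)} = O v^{(i)} + b$, $i = 1, \dots, n$. In terms of the corresponding matrices this means that
$$A_Y = A_X O^T + B,$$
where $B$ is the $n \times n$ matrix whose rows are all equal to $b$. If $Z$ is a vector of i.i.d. standard Gaussian random variables, then $Y$ has the same law as $A_Y Z = A_X O^T Z + BZ$. The random vector $A_X O^T Z$ is a centered Gaussian random vector with covariance $A_X O^T O (A_X)^T = C_X$, so that it has the same law as $X$. The random vector $BZ$ has all its components equal, and setting $W = \sum_{j=1}^n b_j Z_j$ we finish the proof.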
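In covariance terms, Lemma 2.5 says that adding a common Gaussian shift does not move the metric: if $\mathrm{Cov}(X_i, W) = c_i$ and $\mathrm{Var}(W) = s$, then $Y_i = X_i + W$ has covariance $(C_X)_{i,j} + c_i + c_j + s$, and the $c_i$ and $s$ cancel in (2.23). The sketch checks this for one hypothetical choice of $C_X$, $c$ and $s$ (picked so that the joint covariance of $(X, W)$ is positive definite), together with the pointwise identity $\log\sum_i w_i e^{x_i + s} = s + \log\sum_i w_i e^{x_i}$ satisfied by the function (2.14).

```python
import math


def shifted_covariance(cx, cov_xw, var_w):
    """Covariance of Y_i = X_i + W when Cov(X_i, W) = cov_xw[i] and Var(W) = var_w."""
    n = len(cx)
    return [[cx[i][j] + cov_xw[i] + cov_xw[j] + var_w for j in range(n)] for i in range(n)]


def metric(c):
    """The L^2 metric (2.23) of a centered Gaussian vector with covariance c."""
    n = len(c)
    return [[math.sqrt(max(c[i][i] + c[j][j] - 2.0 * c[i][j], 0.0)) for j in range(n)] for i in range(n)]


def log_sum_exp(x, w):
    """The function (2.14): log(sum_i w_i e^{x_i}), computed stably."""
    m = max(x)
    return m + math.log(sum(wi * math.exp(xi - m) for xi, wi in zip(x, w)))
```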
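A direct consequence of the above result is the following. Define the function
$$F(C) := \mathbb E\Big[\log \sum_{i=1}^n w_i e^{X_i}\Big], \tag{2.27}$$
where $X$ is a centered Gaussian random vector with covariance $C$.
Lemma 2.6. If $X$ and $Y$ are centered Gaussian random vectors with $d_X = d_Y$, then $F(C_X) = F(C_Y)$.
Proof. By Lemma 2.5 we have
$$F(C_Y) = \mathbb E\Big[\log \sum_{i=1}^n w_i e^{X_i + W}\Big] = \mathbb E[W] + \mathbb E\Big[\log \sum_{i=1}^n w_i e^{X_i}\Big] = F(C_X),$$
where the last equality follows from the fact that $W$ is centered.
This Lemma simply says that we can denote the right hand side of (2.27) by $F(d)$, since the function depends only on the metric structure of the random variables and not on their correlations.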
We therefore expect a version of Theorem 2.2 with conditions written only in terms of the metrics. This is done in the next subsection.

2.3. A generalized condition. We show how to generalize Theorem 2.2 by proving that (2.17) can be deduced under weaker hypotheses involving only the metric structures. The same inequality has been obtained in [3] through a tricky computation.
Here we show that this fact follows from a general argument and that it is, in a sense, the best possible bound.
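Theorem 2.7. Let $X, Y$ be two centered Gaussian random vectors and let $f$ be given by (2.14). If
$$d_X(i,j) \le d_Y(i,j), \qquad i, j = 1, \dots, n, \tag{2.29}$$
then (2.17) holds.
Observe that for any $x$ we have $\mu(x) = (\mu_1(x), \dots, \mu_n(x)) \in I_n$ (recall definition (2.18)), where
$$\mu_i(x) := \frac{w_i e^{x_i}}{\sum_{l=1}^n w_l e^{x_l}}.$$
Namely, $I_n \subset \mathbb R^n$ is an $(n-1)$-dimensional simplex with extremal elements $\mu^{(1)}, \dots, \mu^{(n)}$, where $\mu^{(i)}_j = \delta_{i,j}$. We start with a preliminary lemma.
Lemma 2.8. Consider a symmetric matrix $D$ and the function $G : I_n \to \mathbb R$ defined as
$$G(\mu) := \sum_{i=1}^n D_{i,i}\, \mu_i - \sum_{i,j=1}^n D_{i,j}\, \mu_i \mu_j. \tag{2.31}$$
Then
$$G(\mu) \ge 0 \quad \text{for all } \mu \in I_n \tag{2.32}$$
if and only if
$$D_{i,i} + D_{j,j} - 2 D_{i,j} \ge 0, \qquad i, j = 1, \dots, n. \tag{2.33}$$
Proof. If condition (2.33) holds, then
$$G(\mu) = \frac12 \sum_{i,j=1}^n \left(D_{i,i} + D_{j,j} - 2 D_{i,j}\right) \mu_i \mu_j \ge 0.$$
To obtain the last identity we used the fact that $\mu \in I_n$, so that $\sum_i D_{i,i}\, \mu_i = \frac12 \sum_{i,j} (D_{i,i} + D_{j,j})\, \mu_i \mu_j$. Conversely, suppose inequality (2.32) holds. Choose $\mu$ such that $\mu_l = \mu_m = \frac12$ for some $l \ne m$ and $\mu_i = 0$ otherwise; then (2.31) becomes
$$G(\mu) = \frac14 \left(D_{l,l} + D_{m,m} - 2 D_{l,m}\right) \ge 0,$$
where we used the symmetry of $D$. Considering all the couples $l, m \in \{1, \dots, n\}$ gives the result.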
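The identity behind Lemma 2.8, $G(\mu) = \frac12\sum_{i,j}(D_{i,i} + D_{j,j} - 2D_{i,j})\mu_i\mu_j$ on $I_n$, and the value $\frac14(D_{l,l} + D_{m,m} - 2D_{l,m})$ at the midpoint of an edge of the simplex can be confirmed numerically. The random matrices and simplex points below are illustrative only.

```python
import random


def G(d, mu):
    """G(mu) = sum_i D_ii mu_i - sum_ij D_ij mu_i mu_j, as in (2.31)."""
    n = len(mu)
    return (sum(d[i][i] * mu[i] for i in range(n))
            - sum(d[i][j] * mu[i] * mu[j] for i in range(n) for j in range(n)))


def G_symmetrized(d, mu):
    """(1/2) sum_ij (D_ii + D_jj - 2 D_ij) mu_i mu_j, which equals G(mu) when sum_i mu_i = 1."""
    n = len(mu)
    return 0.5 * sum((d[i][i] + d[j][j] - 2.0 * d[i][j]) * mu[i] * mu[j]
                     for i in range(n) for j in range(n))


def random_symmetric(n, rng):
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d[i][j] = d[j][i] = rng.uniform(-1.0, 1.0)
    return d


def random_simplex_point(n, rng):
    """A random point of I_n (normalized exponential variables)."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(e)
    return [x / s for x in e]


def max_discrepancy(trials=200, n=5, seed=0):
    """Largest |G - G_symmetrized| over random symmetric D and random mu in I_n."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        d = random_symmetric(n, rng)
        mu = random_simplex_point(n, rng)
        worst = max(worst, abs(G(d, mu) - G_symmetrized(d, mu)))
    return worst
```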
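Proof of Theorem 2.7. By formula (2.7) we deduce the result once we show that
$$\inf_{x \in \mathbb R^n} \sum_{i,j=1}^n \left((C_Y)_{i,j} - (C_X)_{i,j}\right) \frac{\partial^2 f}{\partial x_i \partial x_j}(x) \ge 0. \tag{2.35}$$
Since $\frac{\partial^2 f}{\partial x_i \partial x_j}(x) = \delta_{i,j}\, \mu_i(x) - \mu_i(x)\, \mu_j(x)$, the sum in (2.35) equals $G(\mu(x))$ with
$$D := C_Y - C_X. \tag{2.36}$$
As $x$ ranges over $\mathbb R^n$, $\mu(x)$ ranges over the relative interior of $I_n$, so by continuity the infimum in (2.35) coincides with $\inf_{\mu \in I_n} G(\mu)$, and the result follows by Lemma 2.8, since (2.33) with the matrix $D$ defined by (2.36) coincides with (2.29): indeed, by (2.23), $D_{i,i} + D_{j,j} - 2D_{i,j} = d_Y(i,j)^2 - d_X(i,j)^2$.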
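For $n = 2$ and unit weights the metric structure reduces to the single number $d = d_X(1,2)$, and Lemma 2.6 and Theorem 2.7 can be seen very concretely. Writing $\log(e^{x_1} + e^{x_2}) = \frac{x_1 + x_2}{2} + \log\big(2\cosh\frac{x_1 - x_2}{2}\big)$ and using that the first term is centered gives $F(d) = \mathbb E[\log(2\cosh(dg/2))]$ with $g$ a standard Gaussian, which, as Theorem 2.7 predicts, should be nondecreasing in $d$. The sketch evaluates this by a trapezoidal rule; the truncation interval and the number of nodes are assumptions.

```python
import math


def log2cosh(u):
    """Numerically stable log(2 cosh u) = |u| + log(1 + e^{-2|u|})."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a))


def F_of_distance(d, half_width=10.0, steps=4000):
    """E[log(e^{X_1} + e^{X_2})] for a centered Gaussian pair with d_X(1, 2) = d.

    Equals E[log(2 cosh(d g / 2))] with g standard Gaussian; trapezoidal rule on [-half_width, half_width].
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        g = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * log2cosh(0.5 * d * g) * math.exp(-0.5 * g * g)
    return total * h / math.sqrt(2.0 * math.pi)
```

At $d = 0$ the two components coincide almost surely and the value is $\log 2$, which gives a simple check of the quadrature.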

Examples
In this section we discuss two examples, obtaining the existence of the thermodynamic limit for the quenched free energy of two models. The first one is the Sherrington-Kirkpatrick model, for which the existence of the thermodynamic limit was obtained by the interpolation method in the breakthrough paper [9], using Theorem 2.2. We review this result as a warm-up, to fix ideas and the basic constructions; however, we use Theorem 2.7 and discuss the result only in terms of the metrics. Then we discuss a class of Generalized Random Energy Models [6] for which conditions (2.15), (2.16) fail in general, while condition (2.29) holds. We refer to [14], [1] and [2] for the beautiful mathematics involved in the limit of such models.
3.1. The Sherrington-Kirkpatrick model. The Sherrington-Kirkpatrick model is a mean field spin glass model [8], [13], [15], [17]. Spin configurations are $\sigma \in \{-1,1\}^N$ and the energy of the system is given by the Hamiltonian $H_N(\sigma)$. The partition function is defined as
$$Z_N(\beta) := \sum_{\sigma \in \{-1,1\}^N} e^{-\beta H_N(\sigma)},$$
where the parameter $\beta$ is the inverse temperature, and the quenched free energy per site is defined by
$$-\beta F_N(\beta) := \frac1N\, \mathbb E\left[\log Z_N(\beta)\right] = -\frac{\alpha_N(\beta)}{N},$$
where the last equality defines the symbol $\alpha_N(\beta)$. The variables $(-\beta H_N(\sigma))_{\sigma \in \{-1,1\}^N}$ form a centered Gaussian random vector whose covariance depends on the configurations only through the overlap
$$q_N(\sigma,\sigma') := \frac1N \sum_{i=1}^N \sigma_i \sigma'_i$$
between the configurations $\sigma$ and $\sigma'$. The corresponding distance $d_N(\sigma,\sigma')$ according to (2.23) can be written in terms of the Hamming distance
$$d^H_N(\sigma,\sigma') := \#\{i \in \{1, \dots, N\} : \sigma_i \ne \sigma'_i\},$$
since $q_N(\sigma,\sigma') = 1 - 2 d^H_N(\sigma,\sigma')/N$. Notice that of course $d_N(\sigma,\sigma) = 0$, but also $d_N(\sigma,-\sigma) = 0$, since $H_N(\sigma) = H_N(-\sigma)$. The fact that $d_N$ is a distance (indeed a pseudo-distance) is not trivial, but it follows directly from the fact that it is obtained through (2.23). Let us split the system into two subsystems $S_1$, $S_2$ with $N_1$ and $N_2$ vertices respectively, $N_1 + N_2 = N$. We erase the interaction between spins that belong to different subsystems and define the restricted Hamiltonians $H_{N_k}(\sigma)$ of the subsystems, where we remark that the sums are restricted to the indices belonging to the subsystem labeled $k = 1, 2$. Here and hereafter we continue to use the symbol $\sigma$ both for the full configuration and for the configuration restricted to a subsystem.
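The passage from the overlap to the Hamming distance, and the concavity inequality used when the system is split, are elementary and can be verified by brute force for small sizes. The sketch below checks, with exact rational arithmetic, that $1 - q_N^2 = 4x(1-x)$ with $x = d^H_N/N$, and that $N x(1-x) \ge N_1 x_1(1-x_1) + N_2 x_2(1-x_2)$ whenever $d^H_N = d^H_{N_1} + d^H_{N_2}$. It works only with these normalized quantities and does not assume any particular normalization of the Hamiltonian; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def overlap(s, t):
    """q_N(s, t) = (1/N) sum_i s_i t_i, as an exact fraction."""
    return Fraction(sum(a * b for a, b in zip(s, t)), len(s))


def hamming(s, t):
    """Number of sites where the two configurations differ."""
    return sum(1 for a, b in zip(s, t) if a != b)


def check_overlap_identity(n):
    """1 - q^2 == 4 x (1 - x) with x = d^H / n, for all pairs of configurations of n spins."""
    for s in product((-1, 1), repeat=n):
        for t in product((-1, 1), repeat=n):
            x = Fraction(hamming(s, t), n)
            if 1 - overlap(s, t) ** 2 != 4 * x * (1 - x):
                return False
    return True


def check_concavity_bound(n1, n2):
    """n x (1 - x) >= n1 x1 (1 - x1) + n2 x2 (1 - x2), with x = (h1 + h2) / n, x1 = h1 / n1, x2 = h2 / n2."""
    n = n1 + n2
    for h1 in range(n1 + 1):
        for h2 in range(n2 + 1):
            x, x1, x2 = Fraction(h1 + h2, n), Fraction(h1, n1), Fraction(h2, n2)
            if n * x * (1 - x) < n1 * x1 * (1 - x1) + n2 * x2 * (1 - x2):
                return False
    return True
```

The concavity bound reduces to $h_1^2/N_1 + h_2^2/N_2 \ge (h_1 + h_2)^2/N$, a form of the Cauchy-Schwarz inequality, which is why it holds for every split.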
When a configuration appears in an expression labeled by a subsystem, we mean the configuration restricted to that subsystem. For example, $d^H_{N_k}(\sigma,\sigma')$ and $d_{N_k}(\sigma,\sigma')$ are respectively the Hamming distance (3.7) and the distance (3.6) when the configurations are restricted to the subsystem $k = 1, 2$. Note that with this notation we have the key relationship
$$d^H_N(\sigma,\sigma') = d^H_{N_1}(\sigma,\sigma') + d^H_{N_2}(\sigma,\sigma'). \tag{3.9}$$
Another important relationship is
$$\sum_{\sigma \in \{-1,1\}^N} e^{-\beta H_{N_1}(\sigma) - \beta H_{N_2}(\sigma)} = Z_{N_1}(\beta)\, Z_{N_2}(\beta). \tag{3.10}$$
We apply Theorem 2.7, with unit weights, to the vectors $X = \big(-\beta H_{N_1}(\sigma) - \beta H_{N_2}(\sigma)\big)_{\sigma}$ and $Y = \big(-\beta H_N(\sigma)\big)_{\sigma}$. Since $H_{N_1}$ and $H_{N_2}$ are independent, $d_X^2 = d_{N_1}^2 + d_{N_2}^2$, and condition (2.29) becomes the super-Pythagorean relation
$$d_{N_1}(\sigma,\sigma')^2 + d_{N_2}(\sigma,\sigma')^2 \le d_N(\sigma,\sigma')^2,$$
which is equivalent to
$$N_1 x_1 (1 - x_1) + N_2 x_2 (1 - x_2) \le N x (1 - x), \qquad x := \frac{d^H_N(\sigma,\sigma')}{N}, \quad x_k := \frac{d^H_{N_k}(\sigma,\sigma')}{N_k}. \tag{3.12}$$
The above inequality is true by (3.9) and the concavity of the real function $x \mapsto x(1-x)$. By Theorem 2.7 and (3.10) we deduce
$$\alpha_N(\beta) \le \alpha_{N_1}(\beta) + \alpha_{N_2}(\beta),$$
and by subadditivity and the classic Fekete Lemma we deduce that the limit of the quenched free energy per site exists.

The model has a hierarchical structure, as any spin configuration corresponds to a leaf of a given rooted tree. We consider sequences of finite trees codified by finite strings of nonnegative integers. Let $n \in \mathbb N$, let $k = (k_1, \dots, k_n)$ be a vector of nonnegative integers and set $|k| := k_1 + \dots + k_n$. The tree $T_k$ is constructed as follows. The root (that is, the unique node at level 0) is connected to $2^{k_1}$ nodes, which compose the first level. Each node of the first level is connected to $2^{k_2}$ nodes of the second level; we have therefore $2^{k_1 + k_2}$ nodes on the second level, and so on. The $n$-th level consists of $2^{k_1} 2^{k_2} \cdots 2^{k_n} = 2^{|k|}$ leaves. If $k_j = 0$ for some $1 \le j < n$, we mean that the nodes of level $j$ coincide with those of level $j-1$. A spin configuration $\sigma \in \{-1,1\}^{|k|}$ is then attached to each leaf. The Hamiltonian $H_k(\sigma)$ is built from the random variables $\varepsilon^{(\sigma)}_i$, $i = 1, \dots, n$. The random variables $\varepsilon$ are attached to the edges of the tree. More precisely, attached to the edges that connect level $i-1$ to level $i$ there is a family of i.i.d. centered Gaussian random variables with variance $a_i$, one for each edge.
When we write $\varepsilon^{(\sigma)}_i$ we mean the random variable associated with the unique edge that connects level $i-1$ to level $i$ and belongs to the unique path from the leaf associated with $\sigma$ to the root. When $k_i = 0$ there are no edges from level $i-1$ to level $i$, and therefore we set $\varepsilon^{(\sigma)}_i = 0$. Then $(H_k(\sigma))_{\sigma \in \{-1,1\}^{|k|}}$ is a centered Gaussian random vector indexed by the $|k|$-dimensional hypercube $\{-1,1\}^{|k|}$. We call $l = l(\sigma,\tau) \in \{0, 1, \dots, n-1\}$ the level of the hierarchy at which the two paths from the leaves $\sigma \ne \tau$ of $T_k$ to the root merge. The two configurations share the same energy variables $\varepsilon^{(\sigma)}_i = \varepsilon^{(\tau)}_i$ for $i \le l$, and these shared variables determine the correlation (3.16) of $H_k(\sigma)$ and $H_k(\tau)$; we point out that this correlation is zero when $l = 0$. The corresponding metric (3.17) is obtained from (2.23); the term inside its square root represents, up to a multiplicative factor, the minimal path length between the two leaves $\sigma$ and $\tau$ on the tree when each edge between level $i-1$ and level $i$ has length $a_i$. Since the graph is a tree the path is unique, and since all the leaves lie at the same depth, the metric (3.17) is an ultrametric. For notational convenience we introduce the normalized distance $s_k$ (3.18), so that $d_k(\sigma,\tau) = |k|\, s_k(\sigma,\tau)$ for any pair of configurations $\sigma$ and $\tau$. Both the correlations (3.16) and the metric (3.17) depend on the vector $k$ and on the assignment of configurations to leaves; we discuss this shortly.
As for the Sherrington-Kirkpatrick model, given an inverse temperature $\beta$, we introduce the disorder-dependent partition function
$$Z_k(\beta) := \sum_{\sigma \in \{-1,1\}^{|k|}} e^{-\beta H_k(\sigma)} \tag{3.19}$$
and the quenched average of the free energy per site (3.20). We prove the existence of the thermodynamic limit of (3.20) under general assumptions, when a parameter $N$ diverges and the vector $k = k(N)$ grows in such a way that $n = n(N)$ also diverges. Contucci et al. [4] proved this fact when $n$ is constant, applying the same strategy as the Guerra-Toninelli interpolation method [9]; in particular, they used the inequality in Theorem 2.2. When $n$ is no longer bounded, the hypotheses of Theorem 2.2 fail, while Theorem 2.7 continues to apply. We now describe more precisely the growing mechanism of the model and prove the existence of the thermodynamic limit. The $\alpha_i$'s define the tree $T_{k(N)}$ through
$$k_i(N) := \left\lfloor \frac{\log \alpha_i^N}{\log 2} \right\rfloor,$$
where $\lfloor \cdot \rfloor$ denotes the integer part.
The sequence $(a_i)$ corresponds to the lengths of the edges of the different levels and to the variances of the associated random variables, and it satisfies a summation condition. The exact values of the sums of the series are not really important and could be replaced by mere summability conditions. Formula (3.22) follows from the requirement that the number of edges connecting a given node at level $i-1$ to nodes at level $i$ grows exponentially like $\alpha_i^N$. Observe that by (3.21), for any fixed $N > 0$, only finitely many components of $k(N)$ are different from zero. We define $n(N)$ as the largest index $i$ with $k_i(N) \ne 0$ and $k(N) := (k_1(N), \dots, k_{n(N)}(N))$. Then a spin configuration $\sigma \in \{-1,1\}^{|k(N)|}$ is assigned to each leaf. The choice of the assignment is actually arbitrary; indeed, the free energy of the system is obtained by summing over all the configurations, which removes any dependence on the underlying choice.
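The construction of $k(N)$ can be sketched as follows, assuming, as the integer-part construction and the growth $\alpha_i^N$ indicate, that $k_i(N) = \lfloor N \log\alpha_i/\log 2 \rfloor = \lfloor N\gamma_i \rfloor$ with $\gamma_i = \log\alpha_i/\log 2$. The sequence $\gamma_i = 2^{-(i+1)}$ used in the check is hypothetical, and only finitely many $\gamma_i$ are kept. The check illustrates that $k(N)$ has finitely many nonzero components and that $k_i(N_1) + k_i(N_2) \le k_i(N_1 + N_2)$, which follows from $\lfloor a \rfloor + \lfloor b \rfloor \le \lfloor a + b \rfloor$.

```python
import math


def k_vector(n_param, gammas):
    """k(N) with components k_i(N) = floor(N * gamma_i), trailing zero components dropped.

    gammas: a finite list of gamma_i = log(alpha_i) / log 2 (a truncation of an infinite sequence).
    """
    k = [math.floor(n_param * g) for g in gammas]
    while k and k[-1] == 0:
        k.pop()
    return k


def padded(k, length):
    """Append zeros so that vectors of different lengths can be compared componentwise."""
    return k + [0] * (length - len(k))


def splitting_fits(n1, n2, gammas):
    """Check k_i(N1) + k_i(N2) <= k_i(N1 + N2) for every component i."""
    a, b, full = k_vector(n1, gammas), k_vector(n2, gammas), k_vector(n1 + n2, gammas)
    length = max(len(a), len(b), len(full))
    return all(x + y <= z for x, y, z in zip(padded(a, length), padded(b, length), padded(full, length)))
```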
We assign a spin configuration to each leaf of the tree as follows. At fixed $N$, we attach to every edge one or more labels of type $(m, s)$, where $s = \pm 1$ and $m \in \{1, \dots, |k(N)|\}$. Given a leaf, there exists a unique path toward the root. If this path crosses an edge having label $(m, s)$, then the configuration $\sigma$ associated with the leaf is such that $\sigma_m = s$. We assign the labels in such a way that every path meets all the labels $m = 1, \dots, |k(N)|$ and different leaves have different associated configurations.
We embed the tree in a plane so that the root is at the top and the paths from the leaves to the root go upwards. Moreover, all the edges connecting a given node with the nodes at the next level are ordered from left to right. Each edge connecting level $i-1$ to level $i$ has exactly $k_i(N)$ labels, corresponding to the values $m = \sum_{j=1}^{i-1} k_j(N) + 1, \dots, \sum_{j=1}^{i} k_j(N)$. Fix a node at level $i-1$. Number the edges connecting this node with nodes at level $i$ by integers going from left to right, from $0$ to $2^{k_i(N)} - 1$: the leftmost corresponds to $0$ and the rightmost to $2^{k_i(N)} - 1$. Do this for each node. Write these integers in binary with $k_i(N)$ digits, so that the leftmost edges are numbered with $k_i(N)$ zeros and the rightmost with $k_i(N)$ ones. In our setting, $0$ corresponds to the $-$ sign and $1$ to the $+$ sign. Then we associate the lowest value of $m$ with the most significant digit and the highest value of $m$ with the least significant one. See Figure 3 for an example.

3.2.2. Splitting the system. Let $N > 0$ and consider a pair of integers $N_1, N_2$ such that $N_1 + N_2 = N$. We already know how to construct the trees $T_{k(N)}$, $T_{k(N_1)}$ and $T_{k(N_2)}$. Their geometric structure is simply codified by the finite vectors $k(N)$, $k(N_1)$, $k(N_2)$, and we recall their definition given above. Notice that
$$k_i(N_1) + k_i(N_2) \le k_i(N), \qquad i \ge 1. \tag{3.25}$$
We associate the labels with the edges and leaves of the full system $T_{k(N)}$ as in the previous section. The labels of the two subsystems $T_{k(N_1)}$ and $T_{k(N_2)}$ are instead attributed in a slightly different way, in order to have different spins (different labels $m$) belonging to the two subsystems. The labels $m$ attributed to the edges from level $i-1$ to level $i$ in the full system coincide with the set
$$\Big\{\sum_{j=1}^{i-1} k_j(N) + 1, \dots, \sum_{j=1}^{i} k_j(N)\Big\}.$$
When we split the system into the two subsystems, we assign to the edges that connect each node in level $i-1$ to level $i$ of the subsystem $T_{k(N_1)}$ the labels $\sum_{j=1}^{i-1} k_j(N) + 1, \dots, \sum_{j=1}^{i-1} k_j(N) + k_i(N_1)$, while we assign to the edges that connect each node in level $i-1$ to level $i$ of the subsystem $T_{k(N_2)}$ the labels $\sum_{j=1}^{i-1} k_j(N) + k_i(N_1) + 1, \dots, \sum_{j=1}^{i-1} k_j(N) + k_i(N_1) + k_i(N_2)$. By (3.25) this is well defined. Once the labels $m$ are split between the two subsystems, the assignment of the label $s = \pm$ follows the same rule as in the previous section. Since $k_i(N_1) + k_i(N_2)$ may be strictly less than $k_i(N)$, some of the labels $m$ (i.e. some spins) may disappear in the splitting.
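The labeling rule can be written as a short program. In the sketch below a leaf is identified with the tuple $(e_1, \dots, e_n)$ of left-to-right edge indices along its path, $e_i \in \{0, \dots, 2^{k_i} - 1\}$, and the binary digits of $e_i$, most significant first, give the spins with labels $m = k_1 + \dots + k_{i-1} + 1, \dots, k_1 + \dots + k_i$. It also computes the merge level $l(\sigma,\tau)$ and, under the illustrative assumption that $H_k(\sigma) = \sum_i \varepsilon^{(\sigma)}_i$ (the normalization in the text may differ by a constant), the distance $\sqrt{2\sum_{i > l,\, k_i > 0} a_i}$, whose ultrametric property is checked on a small tree. The helper names and the examples are hypothetical.

```python
import math
from itertools import product


def leaf_configuration(k, path):
    """Spin configuration of the leaf reached by path = (e_1, ..., e_n), e_i in range(2 ** k_i).

    Level i carries the labels m = k_1 + ... + k_{i-1} + 1, ..., k_1 + ... + k_i; the binary digits
    of e_i, most significant first, are given to these labels in increasing order (0 -> -1, 1 -> +1).
    """
    sigma = []
    for k_i, e_i in zip(k, path):
        for bit in range(k_i - 1, -1, -1):
            sigma.append(1 if (e_i >> bit) & 1 else -1)
    return tuple(sigma)


def all_leaves(k):
    """All paths from the root; a level with k_i = 0 has the single index 0."""
    return product(*(range(2 ** k_i) for k_i in k))


def merge_level(path1, path2):
    """Number of initial levels on which the two paths coincide."""
    l = 0
    for e1, e2 in zip(path1, path2):
        if e1 != e2:
            break
        l += 1
    return l


def distance(k, a, path1, path2):
    """sqrt(2 * sum of a_i over levels i > l with k_i > 0): the L^2 distance of the energies
    under the illustrative assumption H(sigma) = sum_i eps_i^(sigma) with Var(eps_i) = a_i."""
    l = merge_level(path1, path2)
    return math.sqrt(2.0 * sum(a_i for k_i, a_i in zip(k[l:], a[l:]) if k_i > 0))
```

Since the binary expansion of each $e_i$ is unique, distinct leaves always receive distinct configurations, and all $2^{|k|}$ configurations are used.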
We now discuss the behavior of the distances. Consider two finite vectors $k'$ and $k$ such that $k'_i \le k_i$ for any $i$. We assign the labels to $T_k$ in the usual way, while we assign the labels to $T_{k'}$ as follows: to the edges that connect each node in level $i-1$ to level $i$ of $T_{k'}$ we assign arbitrarily $k'_i$ of the $k_i$ labels of $T_k$. The assignment of the labels $s = \pm$ then follows the usual rule.
We call $d_k$ and $d_{k'}$ the metrics defined by formula (3.17) for the two trees $T_k$ and $T_{k'}$ respectively, and $s_k$, $s_{k'}$ the corresponding normalized distances (see (3.18)). As before, given two spin configurations $\sigma, \tau \in \{-1,1\}^{|k|}$, we again call $\sigma, \tau \in \{-1,1\}^{|k'|}$ the same configurations restricted to the labels assigned to the edges of $T_{k'}$. We have the following.
Lemma 3.1. Consider two finite vectors $k' \le k$ and the corresponding trees $T_{k'}$ and $T_k$, with configurations of spins associated with the leaves as above. Then we have
Proof. Consider the tree $T_k$, two configurations $\sigma, \tau$ associated with two leaves, and the corresponding geodesic path. Let us now consider a new finite vector $k'$ obtained from $k$ simply by decreasing a single component by one and preserving all the remaining ones, i.e. $k'_i = k_i - 1$ and $k'_j = k_j$ for all $j \ne i$. Suppose that the label $m$ missing in $T_{k'}$ is $m^*$. The tree $T_{k'}$ with the corresponding labeling is obtained from $T_k$ and the original labeling as follows. The edges connecting each node at level $i-1$ to nodes at level $i$ in $T_k$ can be grouped into pairs having exactly the same labels apart from the one corresponding to $m^*$; the two paired edges have labels $(m^*, +)$ and $(m^*, -)$ respectively. If we identify the two edges of each pair, and consequently also the subtrees starting from the identified nodes, we get a tree that coincides with $T_{k'}$ with exactly the same assignment of labels. In particular, the leaves associated with $\sigma, \tau$ in the new tree are exactly the original ones after the identification. Finally, the geodesic path also remains the same after the identification (see e.g. Figure 4). Since the identification procedure can only shorten this path, we have the statement of the lemma when $k'$ is obtained from $k$ by decreasing just one of its components by one. We finish the proof by observing that any $k' \le k$ can be obtained from $k$ after a finite number of iterations of this type.
that is equivalent to the super-Pythagorean condition.

where the last equality defines the symbol $\alpha_N(\beta)$. We need a preliminary Lemma. Let us call $\gamma_i := \frac{\log \alpha_i}{\log 2} > 0$. Observe that by definition we have

Proof. For any finite $k$ we have

The right hand side of the above equation is 1. The left hand side converges, when $N \to \infty$, to $\sum_{i=1}^k \gamma_i$. Taking now the limit $k \to \infty$ we deduce the statement of the Lemma.
We can now prove the existence of the limit for quenched free energy per site of a GREM model with infinite levels.
Theorem 3.4. Under hypotheses (H1) and (H2), the density of free energy (3.30) defined on $T_{k(N)}$ has a limit as $N \to \infty$, in the sense that the following limit exists and coincides with an infimum.
Proof. We apply the interpolation method to the Gaussian random vectors $\big(H_{k(N)}(\sigma)\big)_\sigma$ and $\big(H_{k(N_1)}(\sigma) + H_{k(N_2)}(\sigma)\big)_\sigma$, both indexed by the configurations $\sigma \in \{-1,1\}^{|k(N)|}$. The Gaussian random variables used to compute $H_{k(N)}$, $H_{k(N_1)}$ and $H_{k(N_2)}$ are mutually independent. Note that, since some spins are lost in the splitting, the second Gaussian random vector is degenerate.
We have an identity relating the partition functions of the full system and of the two subsystems, whose last term is due to the fact that some spins may be lost in the splitting. By Remark 3.2, we can apply Theorem 2.7 and obtain an inequality whose last term is nonnegative; hence the sequence $\alpha_N(\beta)$ is subadditive. By Fekete's Lemma the limit exists, and we get the main statement of the Theorem.
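Fekete's lemma states that if $a_{m+n} \le a_m + a_n$ for all $m, n \ge 1$, then $\lim_N a_N/N$ exists and equals $\inf_N a_N/N$ (possibly $-\infty$). The sketch below illustrates it on the hypothetical subadditive sequence $a_N = \lceil \pi N \rceil$, for which $a_N/N$ converges to $\pi$ without ever reaching it; it is not the free energy sequence $\alpha_N(\beta)$.

```python
import math


def is_subadditive(a, upto):
    """Check a(m + n) <= a(m) + a(n) for all m, n >= 1 with m + n <= upto."""
    return all(a(m + n) <= a(m) + a(n)
               for m in range(1, upto) for n in range(1, upto - m + 1))


def fekete_ratios(a, upto):
    """Lists of a(N)/N and of the running infimum min_{M <= N} a(M)/M for N = 1, ..., upto."""
    ratios, running_inf = [], []
    best = math.inf
    for n in range(1, upto + 1):
        r = a(n) / n
        best = min(best, r)
        ratios.append(r)
        running_inf.append(best)
    return ratios, running_inf


def example_sequence(n):
    """Hypothetical subadditive sequence: ceil(x + y) <= ceil(x) + ceil(y)."""
    return math.ceil(math.pi * n)
```

Here $a_N/N - \pi < 1/N$, so the ratios approach the infimum $\pi$ from above, as the lemma requires.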
It remains only to prove that the limit is strictly bigger than $-\infty$. This follows from the summability of the variances $a_i$. Indeed, we prove that for any $N > 0$, $-\beta F_N(\beta)$ is bounded from above. By Jensen's inequality, $\mathbb E[\log Z_{k(N)}(\beta)] \le \log \mathbb E[Z_{k(N)}(\beta)]$. Since the $\varepsilon^{(\sigma)}_i$ are independent, each term of $\mathbb E[Z_{k(N)}(\beta)]$ factorizes into a product of moment generating functions, for which we use the fact that $\mathbb E[e^{t\varepsilon}] = e^{t^2 a/2}$ for a centered Gaussian random variable $\varepsilon$ with variance $a$. Just as a remark, we show in Lemma 3.7 in the Appendix that the third term on the right hand side of (3.35) is negligible when $N$ is large. This fact is irrelevant for the proof, but it is interesting in itself: for different models we could have a similar situation with the wrong sign, and a bound of this type could allow one to apply the generalized subadditive lemmas in [5].
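The upper bound in the last step rests on Jensen's inequality $\mathbb E[\log Z] \le \log \mathbb E[Z]$ together with the Gaussian moment generating function. The Monte Carlo sketch below illustrates it on a small tree, under the illustrative assumption that $H_k(\sigma) = \sum_i \varepsilon^{(\sigma)}_i$ (the normalization of the Hamiltonian in the text may differ); then $\log \mathbb E[Z_k(\beta)] = |k|\log 2 + \frac{\beta^2}{2}\sum_{i:\,k_i > 0} a_i$. The tree, the variances and the sample size are hypothetical. The estimate should also lie above $|k| \log 2$, since applying Jensen's inequality to the average over configurations gives $\mathbb E[\log Z_k(\beta)] \ge |k|\log 2$.

```python
import math
import random


def sample_log_partition(k, a, beta, rng):
    """log Z for one disorder sample of a small GREM tree T_k.

    Illustrative assumption: H(sigma) = sum_i eps_i^(sigma), where every edge between levels i-1 and i
    carries an independent N(0, a_i) variable (eps = 0 when k_i = 0). Leaves are in bijection with
    configurations, so summing over leaves is summing over sigma.
    """
    weights = [0.0]  # -beta * (partial energy) for every node of the current level
    for k_i, a_i in zip(k, a):
        nxt = []
        for w in weights:
            for _ in range(2 ** k_i):
                eps = rng.gauss(0.0, math.sqrt(a_i)) if k_i > 0 else 0.0
                nxt.append(w - beta * eps)
        weights = nxt
    m = max(weights)
    return m + math.log(sum(math.exp(x - m) for x in weights))


def annealed_bound(k, a, beta):
    """log E Z = |k| log 2 + (beta^2 / 2) * sum of a_i over levels with k_i > 0."""
    return sum(k) * math.log(2.0) + 0.5 * beta ** 2 * sum(a_i for k_i, a_i in zip(k, a) if k_i > 0)


def mean_log_partition(k, a, beta, samples=4000, seed=7):
    """Monte Carlo estimate of E[log Z] over independent disorder samples."""
    rng = random.Random(seed)
    return sum(sample_log_partition(k, a, beta, rng) for _ in range(samples)) / samples
```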
Given any $x \in \mathbb R^n$ we have $x = \sum_i c^x_i v^{(i)}$, and we use these coefficients to define the matrix $O$. We obtain as before that $O$ is orthogonal. With an argument similar to the one in (3.43) it is easy to obtain that $w^{(i)} = O v^{(i)}$.

We show here that the extra terms in (3.35) are indeed negligible when $N$ is large. By definition, if $i \in J(N)$ then $\frac1N \le \gamma_i < \frac1{N-1}$. We deduce therefore that the series on the right hand side has to be convergent. Since the series on the right hand side of (3.52) is convergent, we deduce the statement by the dominated convergence theorem. This, together with (3.51) and Lemma 3.3, concludes the proof.