Abstract
The main result in this paper is motivated by the Mézard–Parisi ansatz, which predicts a very special structure for the distribution of spins in diluted mean field spin glass models, such as the random \(K\)-sat model at positive temperature. Using the fact that one can safely assume the validity of the Ghirlanda–Guerra identities in these models, we prove hierarchical exchangeability of pure states for the asymptotic Gibbs measures, which allows us to apply a representation result for hierarchically exchangeable arrays recently proved by Austin and Panchenko (Probab. Theory Relat. Fields, 2013). Comparing this representation with the predictions of the Mézard–Parisi ansatz, one can see that the key property still missing is that the multi-overlaps between pure states depend only on their overlaps.
1 Introduction
Many mean field spin glass models are described by a random Hamiltonian \(H_N(\sigma )\) on the space of spin configurations \(\Sigma _N = \{-1,+1\}^N\) (see [12, 29] or [32]). For example, in the classical Sherrington–Kirkpatrick model [27],
where \((g_{i,j})_{i,j\ge 1}\) are i.i.d. standard Gaussian random variables, while in the random \(K\)-sat model,
where \(\alpha >0\) is called the connectivity parameter, \(\pi (\alpha N)\) is a Poisson random variable with mean \(\alpha N\), \(({\varepsilon }_{j,k})_{j,k\ge 1}\) are independent Rademacher random variables, and the indices \((i_{j,k})_{j,k\ge 1}\) are independent uniform on \(\{1,\ldots ,N\}\). The random \(K\)-sat model is an example of a so-called diluted model (diluted refers to the fact that the (hyper)graph of interactions between spins is sparse), and the main goal of this paper is to make some progress toward the Mézard–Parisi ansatz for diluted models originating in [13]. The above two models are called mean field models because the distributions of their Hamiltonians are invariant under permutations of the coordinates \(\sigma _1,\ldots ,\sigma _N\). This property is called symmetry between sites.
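For reference, in one common convention (see e.g. [21]; sign conventions for the Gibbs weights vary) these two Hamiltonians take the form

$$\begin{aligned} H_N(\sigma ) = \frac{1}{\sqrt{N}}\sum _{i,j=1}^N g_{i,j}\sigma _i\sigma _j \end{aligned}$$

for the Sherrington–Kirkpatrick model, and

$$\begin{aligned} -H_N(\sigma ) = \sum _{k\le \pi (\alpha N)} \prod _{j=1}^{K} \frac{1+{\varepsilon }_{j,k}\sigma _{i_{j,k}}}{2} \end{aligned}$$

for the random \(K\)-sat model, so that, in this convention, \(-H_N(\sigma )\) counts the number of clauses not satisfied by the assignment \(\sigma \).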
The main goal in spin glass models is usually to compute the limit of the free energy
as \(N\rightarrow \infty \), for all inverse temperature parameters \(\beta >0\). In the Sherrington–Kirkpatrick model, the formula for the free energy was famously invented by Parisi in [23, 24] and proved rigorously by Talagrand in [30] following important work of Guerra in [9], who showed that the Parisi formula is an upper bound on the free energy. A more recent proof of the Parisi formula in [18] was based on understanding the structure of the Gibbs measure in the infinite-volume limit predicted by the physicists in the eighties (see [12]; this direction of research was jump-started in [3]).
For diluted models, like the random \(K\)-sat model, the analogue of the Parisi formula for the free energy was proposed by Mézard and Parisi in [13] in the so-called \(1\)-RSB case (a replica symmetric solution was proposed earlier in [14]), but it was also clear what the natural extension of their solution should look like in the general case. A detailed description of the general Mézard–Parisi formula can be found, for example, in [15]. The fact that this formula gives an upper bound on the free energy was proved by Franz and Leone in [7], which was the analogue of Guerra’s bound [9] in the SK model. One approach to proving the matching lower bound was given in [19] (see Theorem \(2\) and, in particular, Section 2.2 there), where the problem was reduced via an analogue of the Aizenman–Sims–Starr scheme [2] to showing that the structure of the Gibbs measure in the infinite-volume limit is described by the functional order parameter proposed by Mézard and Parisi in [13]. Our main result here combined with the main result in [4] can be viewed as partial progress in this direction, and at the end of the introduction we will explain what the remaining gap is. Let us mention that the Mézard–Parisi ansatz has been proved in full generality in the setting of the Sherrington–Kirkpatrick model and \(p\)-spin models (see Chapter 4 in [21]), but the proof heavily relies on the special Gaussian nature of the Hamiltonian (1). In diluted models, where this ansatz is of real interest, the problem is still open in general, with one special case handled recently in [22].
In this paper, we will not work with any particular model and will simply assume that the asymptotic Gibbs measures satisfy the Ghirlanda–Guerra identities [8]. In the next section we will review how the Ghirlanda–Guerra identities arise in spin glass models and, as an example, show that one can safely assume their validity in the random \(K\)-sat model. The Ghirlanda–Guerra identities will be stated in this paper in a slightly more general form than usual to accommodate the more general notion of the asymptotic Gibbs measures in models other than the SK model but, of course, one gets this more general form for free from the usual proof of these identities.
Let us begin by recalling the definition of asymptotic Gibbs measures introduced in [19] (see also [5] for a different approach via exchangeable random measures). The Gibbs measure \(G_N\) corresponding to the Hamiltonian \(H_N(\sigma )\) is a (random) probability measure on \(\{-1,+1\}^N\) defined by
where the normalizing factor \(Z_N\) is called the partition function. Let \((\sigma ^\ell )_{\ell \ge 1}\) be an i.i.d. sequence of replicas from the Gibbs measure \(G_N\) and let \(\mu _N\) be the joint distribution of the array of all spins on all replicas \((\sigma _i^\ell )_{1\le i\le N, \ell \ge 1}\) under the average product Gibbs measure \(\mathbb {E}G_N^{\otimes \infty }\),
for any \(n\ge 1\) and any \(a_i^\ell \in \{-1,+1\}\). We extend \(\mu _N\) to a distribution on \(\{-1,+1\}^{\mathbb {N}\times \mathbb {N}}\) by setting \(\sigma _i^\ell =1\) for \(i\ge N+1.\) Let \(\mathcal{M}\) be the set of all possible limits of \((\mu _N)\) over subsequences with respect to the weak convergence of measures on the compact product space \(\{-1,+1\}^{\mathbb {N}\times \mathbb {N}}\). Because of the symmetry between sites in mean field models, these measures inherit from \(\mu _N\) the invariance under permutations of both the spin and replica indices \(i\) and \(\ell .\) By the Aldous–Hoover representation [1, 10], for any \(\mu \in \mathcal{M}\), there exists a measurable function \(s:[0,1]^4\rightarrow \{-1,+1\}\) such that \(\mu \) is the distribution of the array
where the random variables \(w,(u_\ell ), (v_i), (x_{i,\ell })\) are i.i.d. uniform on \([0,1]\). The function \(s\) is defined uniquely for a given \(\mu \in \mathcal{M}\) up to measure-preserving transformations (Theorem 2.1 in [11]), so we can identify the distribution \(\mu \) of array \((s_i^\ell )\) with \(s\). Since \(s\) takes values in \(\{-1,+1\}\), the distribution \(\mu \) can actually be encoded by the function
where \(\mathbb {E}_x\) is the expectation in \(x\) only. The last coordinate \(x_{i,\ell }\) in (6) is independent over all pairs \((i,\ell )\), so it plays the role of “flipping a coin” with the expected value \(\sigma (w,u_\ell ,v_i)\). In fact, given the function (7), we can redefine \(s\) by
without affecting the distribution of the array \((s_i^\ell )\). This allows us to separate the randomness of the last coordinate \(x_{i,\ell }\) from the randomness of the array \((\sigma (w,u_\ell ,v_i))\) generated by the function \(\sigma (w,u,v)\).
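As an illustration, the sampling scheme (6)–(8) is easy to simulate; the kernel \(\sigma \) below is an invented toy example, not one arising from any particular model:

```python
import numpy as np

def sigma(w, u, v):
    # invented toy kernel sigma(w, u, v) with values in [-1, 1]
    return np.cos(2 * np.pi * (w + u)) * (2 * v - 1)

def sample_spins(n_spins, n_replicas, rng):
    # the scheme (6)-(8): one w for the measure, one u per replica,
    # one v per spin, and an independent coin x per (spin, replica) pair
    w = rng.uniform()
    u = rng.uniform(size=n_replicas)
    v = rng.uniform(size=n_spins)
    x = rng.uniform(size=(n_spins, n_replicas))
    mean = sigma(w, u[None, :], v[:, None])
    # redefinition (8): a +-1 coin flip with conditional expectation sigma(w, u, v)
    return np.where(x <= (1 + mean) / 2, 1, -1)

rng = np.random.default_rng(0)
S = sample_spins(n_spins=1000, n_replicas=3, rng=rng)
```

Separating the coin flips \(x_{i,\ell }\) from the kernel \(\sigma (w,u,v)\), as in the last two lines of `sample_spins`, is exactly the decomposition described in the text.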
Then we change the perspective as follows. Let \(du\) and \(dv\) denote the Lebesgue measure on \([0,1]\) and let us define a (random) probability measure
on the space of functions of \(v\in [0,1]\),
(the unit ball of \(L^\infty \)), equipped with the topology of \(L^2([0,1], dv)\). We will denote by \(\sigma ^1\cdot \sigma ^2\) the scalar product in \(L^2([0,1], dv)\) and by \(\Vert \sigma \Vert \) the corresponding \(L^2\) norm. The random measure \(G\) in (9) is what we call the asymptotic Gibbs measure, which encodes the limit \(\mu \in \mathcal{M}\) above. The whole process of generating spins according to \(\mu \in \mathcal{M}\) can now be visualized in several steps. First, we generate the Gibbs measure \(G=G_w\) using the uniform random variable \(w\). An i.i.d. sequence \(\sigma ^\ell = \sigma (w,u_{\ell },\cdot )\) for \(\ell \ge 1\) of replicas from \(G\) gives us a sequence of functions in \(H\). Then, we plug in i.i.d. uniform random variables \((v_i)_{i\ge 1}\) into these functions to obtain the array \(\sigma ^\ell (v_i) = \sigma (w,u_\ell ,v_i)\) and, finally, use it to generate spins as in (8). From now on, we will keep the dependence of \(G\) on \(w\) implicit, denote i.i.d. replicas from \(G\) by \((\sigma ^\ell )_{\ell \ge 1}\) (which are now functions on \([0,1]\)) and no longer explicitly use the random variables \((u_{\ell })\), and denote the sequence of spins (8) corresponding to the replica \(\sigma ^\ell \) by
Given \(n\ge 1\) and replicas \(\sigma ^1,\ldots , \sigma ^n\), we will denote the array of spins corresponding to these replicas by
We will denote by \(\langle \cdot \rangle \) the average with respect to \(G^{\otimes \infty }\) (corresponding to the average in \((u_\ell )_{\ell \ge 1}\) in the sequence \((\sigma (w,u_{\ell },\cdot ))_{\ell \ge 1}\)) and by \(\mathbb {E}\) the expectation with respect to \(w\), \((v_i)\) and \((x_{i,\ell })\). In the definition of \(\langle \cdot \rangle \) one can also include averaging in the random variables \((x_{i,\ell })\), since they depend on the replica index \(\ell \), and such a convention would be especially necessary if we dealt with cavity computations (see e.g. [19, 22]), when averaging in spins \(S(\sigma ^\ell )\) can also appear in the denominator. However, throughout this paper this will not happen and, by the linearity of expectation, we can think of averaging in \((x_{i,\ell })\) as a part of the expectation \(\mathbb {E}\).
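The spins of two replicas are correlated only through the shared variables \(w\) and \((v_i)\), and their empirical correlation recovers the scalar product \(\sigma ^1\cdot \sigma ^2\) in \(L^2([0,1],dv)\) by the law of large numbers; a quick numerical check with an invented kernel:

```python
import numpy as np

def sigma(w, u, v):
    # invented toy kernel with values in [-1, 1]
    return np.cos(2 * np.pi * (w + u)) * (2 * v - 1)

rng = np.random.default_rng(0)
w, u1, u2 = rng.uniform(size=3)
N = 200_000

# two replicas share w and the site variables v, but have their own u and coins
v = rng.uniform(size=N)
m1 = sigma(w, u1, v)
m2 = sigma(w, u2, v)
S1 = np.where(rng.uniform(size=N) <= (1 + m1) / 2, 1, -1)
S2 = np.where(rng.uniform(size=N) <= (1 + m2) / 2, 1, -1)

empirical = np.mean(S1 * S2)   # empirical correlation of the spins
overlap = np.mean(m1 * m2)     # Monte Carlo value of sigma^1 . sigma^2
```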
Because of the geometric nature of the asymptotic Gibbs measures \(G\) as measures on the subset of \(L^2([0,1],dv)\), the distance and scalar product between replicas play a crucial role in the description of the structure of \(G\). We will denote the scalar product between replicas \(\sigma ^\ell \) and \(\sigma ^{\ell '}\) by \(R_{\ell ,\ell '} = \sigma ^\ell \cdot \sigma ^{\ell '}\), which is more commonly called the overlap of \(\sigma ^\ell \) and \(\sigma ^{\ell '}\). Let us notice that the overlap \(R_{\ell ,\ell '}\) is a function of the spin sequences (11) generated by \(\sigma ^\ell \) and \(\sigma ^{\ell '}\) since, by the strong law of large numbers,
almost surely. We mention this here just to emphasize the obvious point that the array \(S^n\) in (12) contains much more information about the replicas on the space \(H\) than just their overlaps. For example, one can similarly compute the multi-overlaps between replicas,
From now on we will assume that the measure \(G\) satisfies the Ghirlanda–Guerra identities, which means that for any \(n\ge 2,\) any bounded measurable function \(f\) of the spins \(S^n\) in (12) and any bounded measurable function \(\psi \) of one overlap,
Another way to express the Ghirlanda–Guerra identities is to say that, conditionally on \(S^n\), the law of \(R_{1,n+1}\) is given by the mixture
where \(\zeta \) denotes the distribution of \(R_{1,2}\) under the measure \(\mathbb {E}G^{\otimes 2}\),
The identities (15) are usually proved for the function \(f\) of the overlaps \((R_{\ell ,\ell '})_{\ell ,\ell '\le n}\) instead of \(S^n\), but exactly the same proof yields (15) as well (see e.g. Section 3.2 in [21]). It is well known that these identities arise from the Gaussian integration by parts of a certain Gaussian perturbation Hamiltonian against the test function \(f\), and one is free to choose this function to depend on all spins and not only overlaps.
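For the reader’s convenience, in this notation the identities and the conditional law just described read (cf. Section 3.2 in [21]):

$$\begin{aligned} \mathbb {E}\bigl \langle f(S^n)\psi (R_{1,n+1})\bigr \rangle = \frac{1}{n}\,\mathbb {E}\bigl \langle f(S^n)\bigr \rangle \int \psi \, d\zeta + \frac{1}{n}\sum _{\ell =2}^{n} \mathbb {E}\bigl \langle f(S^n)\psi (R_{1,\ell })\bigr \rangle , \end{aligned}$$

so that, conditionally on \(S^n\), the law of \(R_{1,n+1}\) is the mixture \(\frac{1}{n}\zeta + \frac{1}{n}\sum _{\ell =2}^{n}\delta _{R_{1,\ell }}\).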
In this paper we will be interested in saying something about the distribution of the array of spins generated by the Gibbs measure \(G\), but if one is only interested in the behavior of the overlaps, then it is now known that the Ghirlanda–Guerra identities completely describe the measure in this sense, in terms of the functional order parameter \(\zeta \) in (17). Let us first list several purely geometric consequences.
(i) ([29] or Theorem 2.16 in [21]) By Talagrand’s positivity principle, the overlaps can take only nonnegative values, \(\zeta ([0,\infty ))=1\).

(ii) ([16] or Theorem 2.15 in [21]) With probability one over the choice of the random measure \(G\) the following holds. If \(q^*\) is the largest point in the support \(\text{ supp }(\zeta )\) of the measure \(\zeta \), then \(G(\sigma : \Vert \sigma \Vert ^2 = q^*)=1\). If \(\zeta (\{q^*\})>0\) then \(G\) is purely atomic; otherwise, \(G\) has no atoms.

(iii) ([20] or Theorem 2.14 in [21]) With probability one, the support of \(G\) is ultrametric, i.e. \(G^{\otimes 3}(R_{2,3} \ge \min (R_{1,2},R_{1,3}))=1\).
When \(G\) is purely atomic, its atoms are called pure states. Otherwise, we will define pure states in some approximate sense that will be explained more precisely below. By ultrametricity, for any \(q\ge 0\), the relation defined by
is an equivalence relation on the support of \(G\). We will call the equivalence classes of \(\sim _q\) simply \(q\)-clusters. Throughout the paper we will use the convention that, whenever we write \(\sigma \), it belongs to the support of \(G\) rather than the ambient space \(H\).
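Since, by ultrametricity, the relation (18) is transitive on the support of \(G\), the \(q\)-clusters of any finite sample of replicas can be read off its overlap matrix with a simple union-find. A minimal sketch, on an invented \(4\times 4\) overlap matrix:

```python
import numpy as np

def q_clusters(R, q):
    # group replicas by the relation R[a, b] >= q; under ultrametricity
    # this relation is transitive, so union-find recovers the clusters
    n = R.shape[0]
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for a in range(n):
        for b in range(a + 1, n):
            if R[a, b] >= q:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[ra] = rb
    clusters = {}
    for a in range(n):
        clusters.setdefault(find(a), []).append(a)
    return list(clusters.values())

# invented ultrametric overlap matrix: replicas 0, 1 form one cluster
# at level 0.8, replicas 2, 3 another, and the two clusters overlap at 0.2
R = np.array([[1.0, 0.8, 0.2, 0.2],
              [0.8, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.8],
              [0.2, 0.2, 0.8, 1.0]])
```

For example, `q_clusters(R, 0.5)` splits the four replicas into two clusters, while `q_clusters(R, 0.1)` merges everything into one.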
To state our main result, let us first describe what is called the \(r\)-step replica symmetry breaking (RSB) approximation, which means that we will group the values of the overlap into \(r+1\) groups. Let us fix an integer \(r\ge 1\) throughout the paper. Consider an infinitary rooted tree of depth \(r\) with the vertex set
where \(\mathbb {N}^0 = \{*\}\), \(*\) is the root of the tree and each vertex \(\alpha =(n_1,\ldots ,n_p)\in \mathbb {N}^{p}\) for \(p\le r-1\) has children
for all \(n\in \mathbb {N}\). Each vertex \(\alpha \) is connected to the root \(*\) by the path
We will denote the set of vertices in this path (excluding the root) by
We will denote by \(|\alpha |\) the distance of \(\alpha \) from the root (the same as the cardinality of \(p(\alpha )\)). We will write \(\alpha \succ \beta \) if \(\beta \in p(\alpha )\cup \{*\}\) and say that \(\alpha \) is a descendant of \(\beta \), and \(\beta \) is an ancestor of \(\alpha \). We will sometimes denote the set of leaves \(\mathbb {N}^r\) of \(\mathcal{A}\) by \(\mathcal{L}(\mathcal{A})\). For any \(\alpha , \beta \in \mathcal{A}\), let
be the number of common vertices in the paths from the root to the vertices \(\alpha \) and \(\beta \). In other words, \(\alpha \wedge \beta \) is the distance of the lowest common ancestor of \(\alpha \) and \(\beta \) from the root.
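The bookkeeping just introduced (the path \(p(\alpha )\), the depth \(|\alpha |\), and \(\alpha \wedge \beta \)) is straightforward to implement with vertices represented as tuples; a small sketch:

```python
def path(alpha):
    # p(alpha): the vertices on the path from the root to alpha, excluding the root
    return [alpha[:k] for k in range(1, len(alpha) + 1)]

def depth(alpha):
    # |alpha|: distance from the root, i.e. the cardinality of p(alpha)
    return len(alpha)

def wedge(alpha, beta):
    # alpha ^ beta: the number of common vertices on the two root paths,
    # i.e. the depth of the lowest common ancestor of alpha and beta
    k = 0
    while k < min(len(alpha), len(beta)) and alpha[k] == beta[k]:
        k += 1
    return k
```

For instance, `wedge((1, 2, 3), (1, 2, 5))` is `2`, since the two leaves share the ancestors `(1,)` and `(1, 2)`.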
Let us now consider \(r+1\) disjoint intervals of the type
such that
We will allow the possibility of \(I_p = \{q_p\}\) only when the point \(q_p\) is an isolated atom ‘from the right’, namely, \(\zeta (\{q_p\})>0\) and \(\zeta ((q_p,q_p +{\varepsilon }))=0\) for some \({\varepsilon }>0.\) The idea here is that we will use the intervals \(I_p\) to discretize the values of the overlap (so we imagine them as being small), but when we have such an isolated atom, we can simply include it if we wish. For example, when the overlap takes only finitely many values, we can ‘discretize exactly’ by considering only these values.
Without loss of generality, we can also assume that \(q_p < q_{p+1}\) for all \(p\le r-1\), and \(q_0\ge 0\) by Talagrand’s positivity principle. Later on we will need the sequence
such that \(\zeta _p - \zeta _{p-1} = \zeta (I_p)\) for \(0\le p\le r.\) Let us now enumerate all the \(q_p\)-clusters defined by (18) according to Gibbs’ weights as follows. Let \(H_{*}\) be the entire support of \(G\) so that \(V_* = G(H_*) =1\). Next, the support is split into \(q_1\)-clusters \((H_n)_{n\ge 1}\), which are then enumerated in the non-increasing order of their weights \(V_n = G(H_n)\),
We then continue recursively over \(p\le r-1\) and enumerate the \(q_{p+1}\)-subclusters \((H_{\alpha n})_{n\ge 1}\) of a cluster \(H_\alpha \) for \(\alpha \in \mathbb {N}^p\) in the non-increasing order of their weights \(V_{\alpha n} = G(H_{\alpha n})\),
It is a well-known fact that each cluster \(H_\alpha \) is split into infinitely many subclusters \((H_{\alpha n})_{n\ge 1}\) and that their weights are all different and nonzero; this is another consequence of the Ghirlanda–Guerra identities. Therefore, all the inequalities in (25) and (26) are strict. More specifically, it is well known that the Ghirlanda–Guerra identities imply that the cluster weights
are generated by the Ruelle probability cascades (RPC) [26]. This will be reviewed in Sect. 4 (see also Chapter 2 in [21]). We will call the \(q_r\)-clusters \(H_\alpha \) indexed by the leaves \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) the pure states. Of course, if \(\zeta (\{q^*\})>0\) then one can take \(I_r = \{q^*\}\) in (22) to ensure that the pure states are exactly the atoms of \(G\). (For a way to construct pure states for the non-asymptotic Gibbs measure \(G_N\) in (4), see [31].)
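The weights at a single level of the Ruelle probability cascades can be simulated directly (a truncated one-level sketch, not the full recursive construction): the points of a Poisson process on \((0,\infty )\) with intensity \(\zeta _p x^{-1-\zeta _p}dx\) can be realized as \(a_k^{-1/\zeta _p}\), where \(a_k\) are the arrival times of a rate-one Poisson process, and normalizing the largest points gives approximately Poisson–Dirichlet distributed weights:

```python
import numpy as np

def cascade_level_weights(zeta, n_points, rng):
    # points of a Poisson process with intensity zeta * x^(-1 - zeta) dx on
    # (0, inf), realized as a_k^(-1/zeta) for rate-one arrival times a_k;
    # keeping the n_points largest points and normalizing truncates the cascade
    arrivals = np.cumsum(rng.exponential(size=n_points))
    points = arrivals ** (-1.0 / zeta)
    return points / points.sum()

rng = np.random.default_rng(0)
V = cascade_level_weights(zeta=0.5, n_points=1000, rng=rng)
```

Since the arrival times are increasing, the resulting weights come out already sorted in the strictly decreasing order, matching the enumeration in (25) and (26).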
Notice that the diameter of a pure state \(H_\alpha \) for \(\alpha \in \mathbb {N}^r\) can be bounded in \(L^2\) by
and when \(q_r\) is close to \(q^*\), these clusters are small and can be well approximated by one point, for example, the \(G\)-barycenter of the cluster. We can take these barycenters as an approximate definition of pure states but, in order not to lose any information, we will encode a pure state by an infinite sample as follows. First of all, notice that sampling from \(G\) can now be done in two steps:
1. Choose \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) according to the weights \((V_\alpha )_{\alpha \in \mathbb {N}^r}\).
2. Sample from the pure state \(H_\alpha \) according to the conditional distribution
$$\begin{aligned} G_\alpha (\ \cdot \ ) = \frac{G( \ \cdot \ \cap H_\alpha )}{G(H_\alpha )}. \end{aligned}$$ (28)
For each \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\), let us consider an i.i.d. sample \((\sigma ^{\alpha \ell })_{\ell \ge 1}\) with the distribution \(G_\alpha \) and let these samples be independent over such \(\alpha \). As in (11), let us consider the sequence of spins
generated by \(\sigma ^{\alpha \ell }\) and let
This array of spins completely encodes the pure state \(H_\alpha \) for all practical purposes, if we remember that our main object of interest is the array of spins (6) generated by the measure \(G\).
To state our main result, it remains to recall the definition of hierarchical exchangeability introduced in [4]. Consider the following family of maps on the leaves \(\mathbb {N}^r\) of the tree \(\mathcal{A}\),
As explained in [4], the condition \(\pi (\alpha )\wedge \pi (\beta ) = \alpha \wedge \beta \) simply means that the genealogy on the tree is preserved after the permutation and such \(\pi \) can be realized as a recursive rearrangement of children of each vertex starting from the root. We say that an array of random variables \((X_\alpha )_{\alpha \in \mathbb {N}^r}\) taking values in a standard Borel space is hierarchically exchangeable if
for all \(\pi \in \mathcal{H}\). Our main result will be the following structure theorem for the Gibbs measure \(G\).
Theorem 1
If (15) holds then the array (30) of spins \((S_\alpha )_{\alpha \in \mathbb {N}^r}\) within pure states is hierarchically exchangeable and independent of the cluster weights \((V_\alpha )_{\alpha \in \mathcal{A}}\) in (27).
If we write \(S_\alpha = (S_{\alpha ,i})_{i\ge 1}\), by making the dependence on the spin index \(i\) in (29) explicit, then it is obvious that the distribution of the array \((S_{\alpha ,i})\) is also invariant under the permutation of spins,
for all \(\pi \in \mathcal{H}\) and all bijections \(\rho :\mathbb {N}\rightarrow \mathbb {N}\). The Aldous–Hoover representation was generalized to such hierarchically exchangeable arrays in [4] and, in particular, Theorem 2 in [4] implies the following.
Corollary 1
If (15) holds then the array \((S_{\alpha ,i})_{\alpha \in \mathbb {N}^r, i\in \mathbb {N}}\) can be generated in distribution as
where \(f: [0,1]^{2(r+1)} \rightarrow \{-1,+1\}^\mathbb {N}\) is a measurable function and \(\omega _\alpha ,\omega _\alpha ^i\) for \(\alpha \in \mathcal{A}\) and \(i\in \mathbb {N}\) are i.i.d. random variables with the uniform distribution on \([0,1]\).
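The maps in \(\mathcal{H}\) can be generated exactly as described in [4]: independently permute the children of every vertex and relabel recursively. Below is a small sketch checking, on a finite subtree with children labels \(\{0,1,2\}\) and an invented choice of per-vertex permutations, that such a relabeling preserves the genealogy \(\alpha \wedge \beta \):

```python
from itertools import product

def wedge(alpha, beta):
    # number of common vertices on the paths from the root
    k = 0
    while k < min(len(alpha), len(beta)) and alpha[k] == beta[k]:
        k += 1
    return k

def apply_pi(alpha, sigma_at):
    # recursive rearrangement of children: the next coordinate of alpha is
    # relabeled by the permutation attached to the current vertex (prefix)
    out, prefix = [], ()
    for n in alpha:
        out.append(sigma_at(prefix)[n])
        prefix += (n,)
    return tuple(out)

def sigma_at(prefix):
    # invented choice: at each vertex, rotate the children labels {0, 1, 2}
    shift = len(prefix) + 1
    return {n: (n + shift) % 3 for n in range(3)}

leaves = list(product(range(3), repeat=2))
images = [apply_pi(a, sigma_at) for a in leaves]
preserved = all(wedge(apply_pi(a, sigma_at), apply_pi(b, sigma_at)) == wedge(a, b)
                for a in leaves for b in leaves)
```

The check `preserved` holds because two relabeled paths agree up to depth \(k\) exactly when the original paths do: along a common prefix the same bijection is applied at every vertex.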
Note a slight difference in notation between this paper and [4]: in this paper we chose not to include the root \(*\) in the path (20), while in [4] it was included. This is why we write \(\omega _*\) and \(\omega _*^i\) in (34) separately. Let us now explain the connection of the representation (34) to the Mézard–Parisi ansatz, and what seems to be the main remaining obstacle. First of all, if we denote the barycenter of the pure state \(H_\alpha \) by
then, by the strong law of large numbers, (29) implies that
almost surely. In the case when the pure state consists of one point \({\bar{\sigma }}^\alpha \) (for example, we mentioned above that if \(\zeta (\{q^*\})>0\) and we choose \(I_r = \{q^*\}\) then all pure states will be points) the vector \(m^\alpha \) is called the magnetization inside the pure state \(\alpha \), otherwise, we can view it as an approximate notion of magnetization. The representation (34) and (36) imply that
for some measurable function \(m: [0,1]^{2(r+1)} \rightarrow [-1,1]\).
What the Mézard–Parisi ansatz predicts is that, when \(r\) is large and all the intervals \(I_p\) in (22) are small (which means that the \(r\)-step RSB scheme gives a good approximation of the overlap distribution), the magnetizations inside the pure states can be generated approximately (in the sense of distribution) by
for some measurable function \(m: [0,1]^{r+1} \rightarrow [-1,1]\). As we already mentioned above, only the so-called \(1\)-RSB case corresponding to \(r=1\) was described in detail in [13] (see Section V there), but the general case is just a natural extension. The function \(m\) is the order parameter of the Mézard–Parisi ansatz in the sense that one can express the free energy by some variational formula in terms of \(m\). Obviously, (38) can hold only if the spin magnetizations are generated independently over the spin index \(i\ge 1\) within pure states (which was, in fact, an assumption in [13]), but this assumption can be relaxed and the Mézard–Parisi formula for the free energy can be proved using the approach in Theorem \(2\) in [19] under a slightly weaker hypothesis that the magnetizations inside the pure states are generated approximately by
for some measurable function \(m: [0,1]^{r+2} \rightarrow [-1,1]\). The difference between (37) and (39) can be informally expressed as follows. In (39), we have one (random) function \(m(\omega _*, \ \cdot \ ,\ \cdot \ )\) that is used to generate spin magnetizations \(m^\alpha _i\) in each pure state \(\alpha \) using the randomness \(\omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )}\) along the path from the root to \(\alpha \). In (37), for each pure state \(\alpha \) we first generate its own function \(m(\omega _*, (\omega _{\beta })_{\beta \in p(\alpha )}, \ \cdot \ ,\ \cdot \ )\) in a hierarchically symmetric fashion and then use it to generate spin magnetizations inside that pure state.
A possible way to go from (37) to (39) is to show that multi-overlaps are functions of the overlaps, which means the following. Let us consider \(n\) pure state indices \(\alpha _1,\ldots ,\alpha _n \in \mathbb {N}^r\). If we compare the representations of \(m^\alpha _i\) in terms of the barycenter \({\bar{\sigma }}^\alpha \) in (36) and in terms of the function \(m\) in (37) then the so called multi-overlap between these \(n\) barycenters can be written as
where \(\mathbb {E}_i\) denotes the average in the random variables that depend on the spin index \(i\). If (39) holds then, similarly,
which clearly depends only on \((\alpha _\ell \wedge \alpha _{\ell '})_{1\le \ell ,\ell '\le n}\). In the opposite direction, it is also not difficult to show that if \(R_{\alpha _1,\ldots ,\alpha _n}\) depends only on \((\alpha _\ell \wedge \alpha _{\ell '})_{1\le \ell ,\ell '\le n}\) for all \(n\ge 2\) then (37) can be replaced by (39). Of course, in the \(r\)-step RSB approximation, \(\alpha _\ell \wedge \alpha _{\ell '}\) describes the overlap \({\bar{\sigma }}^{\alpha _\ell }\cdot {\bar{\sigma }}^{\alpha _{\ell '}}\) only approximately, so the statement “multi-overlaps are functions of overlaps” should be understood in an approximate sense for a finite \(r\)-step RSB approximation and should only become exact as \(r\) goes to infinity, or if the distribution of the overlap is indeed concentrated on \(r+1\) points.
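Explicitly, the two representations of the multi-overlap compared above read

$$\begin{aligned} R_{\alpha _1,\ldots ,\alpha _n} = \mathbb {E}_i \prod _{\ell \le n} m\bigl (\omega _*, (\omega _{\beta })_{\beta \in p(\alpha _\ell )}, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha _\ell )}\bigr ) \end{aligned}$$

under (37), and

$$\begin{aligned} R_{\alpha _1,\ldots ,\alpha _n} = \mathbb {E}_i \prod _{\ell \le n} m\bigl (\omega _*, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha _\ell )}\bigr ) \end{aligned}$$

under (39); in the second case the variables \(\omega _{\beta }^i\) are shared precisely along the common segments of the paths \(p(\alpha _\ell )\), which is why the latter expression depends only on \((\alpha _\ell \wedge \alpha _{\ell '})_{1\le \ell ,\ell '\le n}\).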
In the next section, we will begin with a review of the Ghirlanda–Guerra identities. In Sect. 3, we will prove some analogue of Theorem 1 at the level of the sample from the Gibbs measure rather than working with the pure states directly. In Sect. 4, we will prove a technical result about the weights in the RPC and, in Sect. 5, we will deduce Theorem 1 from the main result in Sect. 3 by sending the sample size to infinity.
2 The Ghirlanda–Guerra identities
In this section, we will explain in what sense the Ghirlanda–Guerra identities are valid in diluted models, and we will use the example of the random \(K\)-sat model (2) for this purpose. For each \(p\ge 1\), let us consider the process \(g_p(\sigma )\) on \(\Sigma _N = \{-1,+1\}^N\) given by
where \((g_{i_1,\ldots , i_p})\) are i.i.d. standard Gaussian random variables, and define
for parameters \((x_p)_{p\ge 1}\) taking values in the interval \(x_p\in [0,3]\). It is easy to check that the variance of this Gaussian process satisfies \(\mathbb {E}g(\sigma )^2 \le 3.\) Given the Hamiltonian \(H_N(\sigma )\) in (2), let us consider the perturbed Hamiltonian
for some parameter \(s\ge 0.\) It is easy to see, using Jensen’s inequality on each side, that
Therefore, if we let \(s\) in (42) depend on \(N\), \(s=s_N\), in such a way that
then the limit of the free energy is not affected by the perturbation term \((s/\beta )g(\sigma ).\) Since our ultimate goal is to find the formula for the free energy in the limit \(N\rightarrow \infty \), adding a perturbation term is allowed if it helps us in some other way. Of course, the real purpose of adding the perturbation term is to obtain the Ghirlanda–Guerra identities for the Gibbs measure
which now corresponds to the perturbed Hamiltonian (42). In other words, even though the perturbation term does not affect the free energy, it will affect the Gibbs measure ‘in a good way’ by regularizing it and forcing it to satisfy the Ghirlanda–Guerra identities. From now on, everything is defined with respect to this Gibbs measure corresponding to the perturbed Hamiltonian, such as any limit \(\mu \in \mathcal{M}\) and the corresponding asymptotic Gibbs measure \(G\).
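One standard choice of the perturbation process described above (as in Section 3.2 of [21]) is

$$\begin{aligned} g_p(\sigma ) = \frac{1}{N^{p/2}} \sum _{1\le i_1,\ldots ,i_p\le N} g_{i_1,\ldots ,i_p}\sigma _{i_1}\cdots \sigma _{i_p}, \qquad g(\sigma ) = \sum _{p\ge 1} 2^{-p} x_p\, g_p(\sigma ), \end{aligned}$$

in which case \(\mathbb {E}g_p(\sigma )^2 = 1\) and \(\mathbb {E}g(\sigma )^2 = \sum _{p\ge 1} 4^{-p} x_p^2 \le 9\sum _{p\ge 1} 4^{-p} = 3\) whenever \(x_p\in [0,3]\), consistent with the variance bound noted above.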
Since we will soon pass to the limit \(N\rightarrow \infty \), it should not cause any confusion if we temporarily denote by \(\langle \cdot \rangle \) the average with respect to \(G_N^{\otimes \infty }\) in (44), let \((\sigma ^\ell )_{\ell \ge 1}\) be a sequence of replicas from \(G_N\) and denote by
the overlap between replicas \(\sigma ^\ell \) and \(\sigma ^{\ell '}\). Let us consider the function
viewed as a random function \(\varphi = \varphi \bigl ((x_p)\bigr )\) of the parameters \((x_p)\) in (41), and suppose that
for some function \(v_N(s)\) that describes how well \(\varphi ((x_p))\) is concentrated around its expected value uniformly over all possible choices of the parameters \((x_p)\) from the interval \([0,3].\) Now, for any \(n\ge 2, p\ge 1\) and any function \(f=f(\sigma ^1,\ldots ,\sigma ^n)\) on \(\Sigma _N^n\) uniformly bounded by \(1\), let us define
Let us now think of \((x_p)_{p\ge 1}\) as a sequence of i.i.d. random variables with the uniform distribution on \([1,2]\) and denote by \(\mathbb {E}_x\) the expectation with respect to such a sequence. Here is one common formulation of the Ghirlanda–Guerra identities from Theorem 3.2 in [21].
Theorem 2
Suppose that the parameter \(s\) in (42) depends on \(N\), \(s=s_N\), and the sequence \((s_N)\) satisfies \(\lim _{N\rightarrow \infty } s_N=\infty \) and \(\lim _{N\rightarrow \infty } s_N^{-2} v_N(s_N) = 0\). Then
for any \(p\ge 1, n\ge 2\) and any measurable function \(f\) such that \(\Vert f\Vert _\infty \le 1\).
Of course, since the space \(\Sigma _N\) changes with \(N\), the function \(f\) here is really a sequence \(f=f_N\) such that \(\Vert f_N\Vert _\infty \le 1\) for all \(N\ge 1\).
We will show below that, in the setting of the \(K\)-sat model, one can find a sequence \((s_N)\) that satisfies (43) and the conditions in Theorem 2. However, first let us recall how one can go from (49) to (15) for any asymptotic Gibbs measure \(G\) that arises in the limit (as explained in the introduction) from a sequence of measures \(G_N\) that satisfy (49). We simply consider the collection \(\mathcal F\) of all triples \((f,n,p)\) such that \(p\ge 1, n\ge 2\) and \(f = \prod _{(i,\ell )\in F}\sigma _i^\ell \) for a finite subset \(F\subseteq \mathbb {N}\times \{1,\ldots , n\}.\) This is a countable collection, so we can enumerate it, \(\mathcal{F} = \{(f_j,n_j,p_j) \ |\ j\ge 1\}\), and consider
Then (49) implies that \(\lim _{N\rightarrow \infty } \mathbb {E}_x \Delta _N(x) = 0\) and, as a consequence, we can choose a sequence \(x^N = (x_p^N)_{p\ge 1}\) changing with \(N\) such that \(\lim _{N\rightarrow \infty } \Delta _N(x^N) = 0.\) Therefore, if we now define the perturbation (41) and the Gibbs measure (44) with this choice of parameters \(x^N\) that depend on \(N\), we get
for any \((f,n,p)\in \mathcal{F}\). It should be obvious that this implies (15) for any asymptotic Gibbs measure \(G\) corresponding to a limit \(\mu \in \mathcal M\) of \((\mu _N)\) in (5) over any subsequence. The fact that the overlaps in (45) converge in distribution to the overlap in (13) over the same subsequence can be easily seen by computing their joint moments using the symmetry between sites (see the introduction in [19] for details). Moreover, the identities (15) for \(\psi (x) = x^p\) and \(f\) given by a product of finitely many spins, clearly, imply (15) for any \(f\) and \(\psi \). (Finally, let us point out that, even though the Ghirlanda–Guerra identities are typically proved via the above perturbation, in the mixed \(p\)-spin models they can be proved without any perturbation, see [17] or Section 3.7 in [21].)
Let us check the conditions of Theorem 2 in the random \(K\)-sat model.
Lemma 1
For the \(K\)-sat Hamiltonian (2), both (43) and the conditions in Theorem 2 are satisfied with \(s_N = N^{\gamma }\) for any \(\gamma \in (1/4, 1/2).\)
Proof
We need to estimate the left hand side of (47) with \(H_N(\sigma )\) given by (2). We will separate various sources of randomness as follows. For a function \(\varphi =\varphi (X,Y)\) of two independent random variables \(X\) and \(Y\), by the triangle inequality and Jensen’s inequality,
where \(\mathbb {E}_X\) and \(\mathbb {E}_Y\) denote the expectation in \(X\) and \(Y\) only. Similarly, for a function \(\varphi =\varphi (X,Y,Z)\) of three independent random variables,
In the case of the function (46), these three sources of randomness will come from the perturbation term \(g(\sigma )\), the Poisson random variable \(\pi (\alpha N)\), and the sequence of Rademacher random variables \(({\varepsilon }_{j,k})\) and random indices \((i_{j,k})\). We will denote the corresponding expectations by \(\mathbb {E}_g\), \(\mathbb {E}_\pi \) and \(\mathbb {E}_\theta \), so that
In each term, we will first fix all other randomness and estimate \(\mathbb {E}_g | \varphi - \mathbb {E}_g \varphi |\), \(\mathbb {E}_\pi |\varphi - \mathbb {E}_\pi \varphi |\) and \(\mathbb {E}_\theta | \varphi - \mathbb {E}_\theta \varphi |\). The first one can be estimated using the standard Gaussian concentration (see e.g. Theorem 1.2 in [21]). Since the variance of \(sg(\sigma )\) is bounded by \(3s^2\), we get \(\mathbb {E}_g | \varphi - \mathbb {E}_g \varphi | \le L s\) for some absolute constant \(L\). This gives \(\mathbb {E}| \varphi - \mathbb {E}_g \varphi | \le Ls\). To estimate the last two terms, we will use the fact that each term in (2) for a fixed \(k\),
is bounded uniformly by \(1\). First of all, if \(\pi _1\) and \(\pi _2\) are two independent copies of \(\pi (\alpha N)\), and we think of \(\varphi \) for a moment as a function \(\varphi (\pi (\alpha N))\) of \(\pi (\alpha N)\) only, then
This gives \(\mathbb {E}|\varphi - \mathbb {E}_\pi \varphi | \le 2\beta \sqrt{\alpha N}\). Finally, to estimate \(\mathbb {E}_\theta | \varphi - \mathbb {E}_\theta \varphi |\), we can use the standard martingale difference representation for \(\varphi - \mathbb {E}_\theta \varphi = \sum _{k\le \pi (\alpha N)} d_k\) by adding the randomness of one term (50) at a time to obtain
Therefore, \(\mathbb {E}(\varphi - \mathbb {E}_\theta \varphi )^2 \le 4\beta ^2 \alpha N\) and \(\mathbb {E}| \varphi - \mathbb {E}_\theta \varphi | \le 2\beta \sqrt{\alpha N}\). Combining all three estimates, we proved that \(\mathbb {E}|\varphi - \mathbb {E}\varphi | \le L s + 4\beta \sqrt{\alpha N}.\) Now it is easy to see that we can take \(s_N = N^{\gamma }\) for any \(\gamma \in (1/4, 1/2)\) to satisfy (43) and the conditions in Theorem 2. \(\square \)
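To see the last step concretely: the bound \(\mathbb {E}|\varphi - \mathbb {E}\varphi | \le L s + 4\beta \sqrt{\alpha N}\) forces \(\sqrt{N} = o(s_N^2)\), while the requirement that the perturbation not affect the limit of the free energy forces \(s_N^2 = o(N)\); both hold precisely when \(s_N = N^{\gamma }\) with \(\gamma \in (1/4,1/2)\). The sketch below only illustrates this scaling numerically; the two ratio forms are our reading of the conditions, and the precise statement is (43) and Theorem 2.

```python
# Check that s_N = N**gamma satisfies both scaling conditions
# (assumed forms: s_N^2 / N -> 0 and sqrt(N) / s_N^2 -> 0)
# exactly for gamma in (1/4, 1/2).

def ratios(gamma, N):
    s = N ** gamma
    return s**2 / N, N**0.5 / s**2

for gamma in (0.20, 0.30, 0.45, 0.55):
    r1_small, r2_small = ratios(gamma, 10**4)
    r1_big, r2_big = ratios(gamma, 10**8)
    # both ratios must decrease as N grows
    ok = r1_big < r1_small and r2_big < r2_small
    print(f"gamma={gamma}: both conditions hold in the limit: {ok}")
```

For \(\gamma = 0.2\) or \(\gamma = 0.55\), one of the two ratios grows with \(N\), so the flag is printed as False outside \((1/4,1/2)\).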
We now return to the notation for asymptotic Gibbs measures from the introduction, and we will end this section with the invariance property that will be the main tool in the proof of Theorem 1. Given \(n\ge 1\), consider \(n\) bounded measurable functions \(f_1,\ldots , f_n: \mathbb {R}\rightarrow \mathbb {R}\) and define
For \(1\le \ell \le n\) we define
Consider a finite index set \(\mathcal{T}.\) Given a realization of the random measure \(G\) and a sample \(\sigma ^1,\ldots ,\sigma ^n\) from \(G\), let \((B_t)_{t\in \mathcal{T}}\) be a partition of the support of \(G\) such that, for each \(t\in \mathcal{T}\), the indicator \({\mathrm{I}}(\sigma \in B_t)\) is a measurable function of \((\sigma ^\ell \cdot \sigma ^{\ell '})_{\ell ,\ell '\le n}\) and \((\sigma \cdot \sigma ^\ell )_{\ell \le n}\). Let
Let us define the map \(T\) by
where \(\langle \cdot \rangle _{{\_}}\) denotes the average with respect to the measure \(G\) in \(\sigma \) only for fixed \(\sigma ^1,\ldots , \sigma ^n\). The following result was proved in [20] (see also Theorem 2.19 in [21]) as a consequence of the Ghirlanda–Guerra identities (15). Recall the definition of \(S^n\) in (12).
Theorem 3
If (15) holds then, for any bounded measurable function \(\Phi =\Phi (S^n, \delta )\),
This theorem was proved in [20] for the function \(\Phi \) of the overlaps \((R_{\ell ,\ell '})_{\ell ,\ell '\le n}\) instead of all spins \(S^n\). This is because the Ghirlanda–Guerra identities in [20] were stated only for the function of the overlaps, while here we wrote them in (15) for a function of all spins. Otherwise, the proof of Theorem 3 from (15) is identical to the one in [20].
3 At the level of replicas
The main work will be to prove an analogue of Theorem 1 at the level of the replicas \(\sigma ^1,\ldots , \sigma ^n\) sampled from the Gibbs measure \(G\) described at the end of the previous section, which will then imply Theorem 1 by letting \(n\) go to infinity. Until further notice, however, \(n\) will be fixed.
Let \(\mathcal{T}\) be a finite rooted labelled tree of depth \(r\). We will label the vertices of \(\mathcal{T}\) by a finite subset of \(\mathcal{A}\) in (19) as follows. The root will again be labelled by \(*\). Then, recursively for \(p\le r-1\), if a vertex at the distance \(p\) from the root labelled by \(t\in \mathbb {N}^p\) has \(k_t\) children then we label them by \(t 1,\ldots , t k_t \in \mathbb {N}^{p+1}\) (recall that for simplicity we write \(tk\) for \((t,k)\)). We identify the tree \(\mathcal{T}\) with the set of vertex labels and use the same notation, \(|t|, t\wedge s\), \(t\succ s\) for \(t,s\in \mathcal{T}\), as for the tree \(\mathcal{A}\). We will denote by \(\mathcal{L}(\mathcal{T})\) the set of leaves of \(\mathcal{T}\) and consider a function
We will call the pair \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) a configuration if \(\mathcal{P}^{-1}(t) \not = \emptyset \, \text{ for } \text{ all } \, t\in \mathcal{L}(\mathcal{T}),\) i.e. at least one replica index is mapped into each leaf. Of course, this means that the cardinality \(|\mathcal{L}(\mathcal{T})| \le n\). The role of the function \(\mathcal{P}\) is to partition replica indices among the leaves of \(\mathcal{T}\) and then use the tree structure to describe how replicas \(\sigma ^1,\ldots , \sigma ^n\) cluster according to the overlap equivalence relations (18) along the tree \(\mathcal{T}\). More precisely, we will consider the event
This event depends on the tree \(\mathcal{T}\) via \(\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')\) and \(I_{\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')}\) is one of the intervals in (23). In other words, on this event the overlap of replicas “assigned by \(\mathcal{P}\)” to the leaves \(t,t'\in \mathcal{L}(\mathcal{T})\) is determined by the depth \(t\wedge t'\) of their lowest common ancestor.
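For intuition only, a configuration \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) can be encoded with tree vertices as tuples of child indices; the particular tree, the map \(\mathcal{P}\) and the encoding below are hypothetical toy data, not notation from the paper. The check at the end is the defining condition \(\mathcal{P}^{-1}(t)\not =\emptyset \) for every leaf, and \(t\wedge s\) is computed as the depth of the lowest common ancestor.

```python
# A toy configuration C = (T, P): vertices as tuples (root = ()),
# and P mapping replica indices {1,...,n} to the leaves.
T = [(), (1,), (2,), (1, 1), (1, 2), (2, 1)]      # a tree of depth r = 2
leaves = [t for t in T if len(t) == 2]

def wedge(t, s):
    """t ^ s: depth of the lowest common ancestor of vertices t and s."""
    d = 0
    while d < min(len(t), len(s)) and t[d] == s[d]:
        d += 1
    return d

P = {1: (1, 1), 2: (1, 1), 3: (1, 2), 4: (2, 1)}  # replica index -> leaf

# C is a configuration: every leaf receives at least one replica index
assert all(any(P[l] == t for l in P) for t in leaves)

# e.g. replicas 1 and 3 sit in leaves that branch at depth 1, so on the
# event O(C) their overlap lies in the interval I_1; replicas 1 and 4
# branch at the root, so their overlap lies in I_0
assert wedge(P[1], P[3]) == 1 and wedge(P[1], P[4]) == 0
print("valid configuration with", len(leaves), "leaves")
```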
Let us assume from now on that the sample belongs to the event \({\mathcal{O}(\mathcal{C})}\). Then, we can use ultrametricity of the support of the measure \(G\) to partition it in a natural way ‘along the tree \(\mathcal{T}\)’ according to the overlaps with the replicas \(\sigma ^1,\ldots , \sigma ^n\). For each \(t\in \mathcal{T}\), let
be the set of replica indices assigned to the leaves which are descendants of \(t\). Consider the sets
Since, obviously, \(t'\wedge t'' \ge |t|\) for any \(t',t''\succ t\), the overlap \(\sigma ^\ell \cdot \sigma ^{\ell '} \ge q_{|t|}\) for all \(\ell ,\ell ' \in \mathcal{R}(t)\) on the event \({\mathcal{O}(\mathcal{C})}\). By ultrametricity, this implies that we can also write the set (59) as
This makes it obvious that the sets \(C_t\) are nested, \(C_{t'} \subseteq C_t\) for \(t'\succ t\). Another simple property is that the sets indexed by the children of \(t\) are disjoint subsets of \(C_t\),
(recall that \(k_t\) is the number of children of \(t\in \mathcal{T}\)). To see this, if we take \(\ell \in \mathcal{R}(tk)\) and \(\ell '\in \mathcal{R}(tk')\) then \(\sigma ^\ell \cdot \sigma ^{\ell '} \in I_{|t|}= [q_{|t|},q_{|t|}') \) by (57). On the other hand,
so (61) again follows by ultrametricity. Let us now consider the sets \(B_t := C_t\) for \(t\in \mathcal{L}(\mathcal{T})\) and
for \(t\in \mathcal{T}{\setminus }\mathcal{L}(\mathcal{T}).\) On the event \({\mathcal{O}(\mathcal{C})}\), the collection \((B_t)_{t\in \mathcal{T}}\) forms a random partition of the support of the Gibbs measure \(G\) and, by definition, the indicator \({\mathrm{I}}(\sigma \in B_t)\) depends only on the overlaps \((\sigma \cdot \sigma ^\ell )_{\ell \le n}\). Below, this will allow us to apply Theorem 3 to this partition with some specific choice of functions \(f_1,\ldots , f_n\) in (51).
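The nesting \(C_{t'} \subseteq C_t\) and the disjointness (61) are consequences of ultrametricity alone, and can be checked on a toy overlap matrix. In the sketch below (purely illustrative; the \(4\)-replica matrix and the thresholds \(q_1 < q_2\) are hypothetical), "balls" of replicas within a given overlap of a fixed replica play the role of the sets \(C_t\): they are equal or disjoint within a level and nested across levels.

```python
# Toy illustration of how ultrametricity makes overlap "balls" nested:
# replicas 1,2 form one cluster (overlap q2), replicas 3,4 another,
# and the two clusters overlap at the lower level q1.  (Hypothetical data.)
q1, q2 = 0.3, 0.7
R = {  # symmetric overlap matrix R[l][lp], self-overlap set to 1.0
    1: {1: 1.0, 2: q2, 3: q1, 4: q1},
    2: {1: q2, 2: 1.0, 3: q1, 4: q1},
    3: {1: q1, 2: q1, 3: 1.0, 4: q2},
    4: {1: q1, 2: q1, 3: q2, 4: 1.0},
}

# ultrametric inequality: R[a][c] >= min(R[a][b], R[b][c]) for all triples
assert all(R[a][c] >= min(R[a][b], R[b][c])
           for a in R for b in R for c in R)

def ball(l, q):
    """Indices of replicas within overlap >= q of replica l."""
    return frozenset(lp for lp in R if R[l][lp] >= q)

# balls at the same level are equal or disjoint...
for a in R:
    for b in R:
        assert ball(a, q2) == ball(b, q2) or not (ball(a, q2) & ball(b, q2))
# ...and nested across levels
for a in R:
    assert ball(a, q2) <= ball(a, q1)
print("nesting verified:", sorted(ball(1, q2)), "inside", sorted(ball(1, q1)))
```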
Let us denote the Gibbs weights of the above sets by
It is obvious that two different configurations \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) and \(\mathcal{C}'=(\mathcal{T}',\mathcal{P}')\) can result in the same event, \({\mathcal{O}(\mathcal{C})}=\mathcal{O}(\mathcal{C}')\), if we simply reshuffle the labels of \(\mathcal{T}\) in a hierarchical way and then redefine \(\mathcal{P}\) accordingly. Later on, we will need to fix a special configuration among these, and this will be done using the cluster weights \(W_t\) around the sample points, as follows. Consider the event
It is obvious that such an ordering of the weights makes the events \({\mathcal{O}(\mathcal{C})}\cap {\mathcal{W}(\mathcal{C})}\) disjoint for different configurations \(\mathcal{C}\), and each sample \((\sigma ^1,\ldots ,\sigma ^n)\) belongs to one and only one of these events. We will denote the corresponding configuration by \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\),
and call \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\) the sample configuration. In other words, \(\mathcal{C}_n\) is a function of \(\sigma ^1,\ldots ,\sigma ^n\) such that the tree is defined according to the overlap structure \((\sigma ^\ell \cdot \sigma ^{\ell '})_{\ell ,\ell '\le n}\) with the vertices labelled according to the weights of the neighborhoods of these replicas. The event \({\mathcal{W}(\mathcal{C})}\) and the sample configuration \(\mathcal{C}_n\) will not be used in this section, but will play an important role in the last section where they will be utilized to partition an event into disjoint events indexed by configurations \(\mathcal{C}\).
For the remainder of this section, we will fix a configuration \(\mathcal{C}\) once and for all and, for simplicity of notation, will omit the dependence of \({\mathcal{O}(\mathcal{C})}\) on \(\mathcal{C}\) and write \(\mathcal{O}\) instead. Let us denote \(\mathbb {P}(\ \cdot \ ) = \mathbb {E}\langle {\mathrm{I}}(\ \cdot \ )\rangle \) and let
be the conditional distribution given the event \(\mathcal{O}\). Since \(n\) is fixed in this section, we will write \(S\) to denote \(S^n\) in (12). Let
We exclude the root, because \(W_*=1\). Theorem 1 will follow from the main result of this section.
Theorem 4
For any measurable sets \(A\) and \(B\),
Since the weights \((W_t)\) and \((\delta _t)\) in (63) are functions of each other, the independence of \(S\) and \(W\) in (68) is equivalent to independence of \(S\) and \(\delta \),
where \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\). Again, we can exclude the root, because \(\delta _* = 1-\sum _{t\in \mathcal{T}_*}\delta _t.\) The vector \(\delta \) takes values in the open subset
of \(\mathbb {R}^{|\mathcal{T}_*|}.\) Given a vector \(a=(a_t)_{t\in \mathcal{T}_*}\in \mathbb {R}^{|\mathcal{T}_*|}\), let us define the map \(T_a: \mathcal{D}\rightarrow \mathcal{D}\) by
One can easily check that for \(a,b\in \mathbb {R}^{|\mathcal{T}_*|}\) we have \(T_a\circ T_b = T_{a+b}\) and, therefore, \(T_a^{-1} = T_{-a}\). It is also easy to check that
Let us denote by \(B_{\varepsilon }(x)\) the open ball of radius \({\varepsilon }\) in \( \mathbb {R}^{|\mathcal{T}_*|}\) centered at \(x.\) Then the following holds.
Lemma 2
For any \(a=(a_t)_{t\in \mathcal{T}_*}\in \mathbb {R}^{|\mathcal{T}_*|}\) and \(x\in \mathcal{D}\),
whenever either of the limits exists.
Proof
As we mentioned above, we will apply Theorem 3 to the partition \((B_t)_{t\in \mathcal{T}}\) in (62) with the following choice of functions \(f_1,\ldots , f_n\) in (51). Let us consider an arbitrary function
such that \(\ell (t) \in \mathcal{R}(t)\) in (58) for all \(t\in \mathcal{T}\). In other words, we pick one replica index \(\ell (t)\) assigned to one of the leaves that are descendants of \(t\). Consider a vector \(b=(b_t)_{t\in \mathcal{T}}\in \mathbb {R}^{|\mathcal{T}|}\). For each replica index \(1\le \ell \le n\), let
Then the function \(F\) in (51) can be written as
Let us fix \(u\in \mathcal{T}\) and compute \(\langle {\mathrm{I}}(\sigma \in B_u) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\rangle _{{\_}}\). We will now fix \(\sigma \in B_u\) and consider several different cases when \(t\) belongs to different subsets of the tree \(\mathcal{T}\).
1. First of all, if \(t=u\) then \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 1\) by the definition of \(B_u\) in (62).
2. If \(t\succ u\), \(t\not =u,\) then \(\ell (t) \in \mathcal{R}(u)\) and \(\sigma \cdot \sigma ^{\ell (t)} \in I_{|u|}\), which implies that \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0\).
3. If neither \(t\succ u\) nor \(u\succ t\) then (on the event \(\mathcal{O}\)) \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{t\wedge u}\) and \(t\wedge u < \min (|t|,|u|)\). Since for \(\sigma \in B_u\) we have \(\sigma \cdot \sigma ^{\ell (u)} \in I_{|u|}\), by ultrametricity, \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0.\)
4. If \(u\succ t\), \(t\not = u\), then, in general, the answer depends on the choice of the function (74) or, more specifically, on whether \(\mathcal{P}(\ell (u))\wedge \mathcal{P}(\ell (t)) = |t|\) or \(\mathcal{P}(\ell (u))\wedge \mathcal{P}(\ell (t)) > |t|\). In the first case, on the event \(\mathcal{O}\), \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{|t|}\). Since for \(\sigma \in B_u\) we have \(\sigma \cdot \sigma ^{\ell (u)} \in I_{|u|}\) and \(I_{|u|}\) lies strictly above \(I_{|t|}\), by ultrametricity, \(\sigma \cdot \sigma ^{\ell (t)} = \sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{|t|}\) and \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 1\). In the second case, on the event \(\mathcal{O}\), \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)}\) also lies strictly above \(I_{|t|}\) and, therefore, by ultrametricity, \(\sigma \cdot \sigma ^{\ell (t)}\) lies strictly above \(I_{|t|}\) and \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0\).
Therefore, if we consider the set
then, for \(\sigma \in B_u\) we have \(F(\sigma ,\sigma ^1,\ldots ,\sigma ^n) = b_u + \sum _{t\in \mathcal{T}(u)} b_t\) and
Let us now set \(b_{*}=0\) and recursively set \(b_u = a_u - \sum _{t\in \mathcal{T}(u)} b_t\) for \(u\in \mathcal{T}_{*}\). Then,
Adding them up, we get
We showed that, with this choice of functions \(f_1,\ldots , f_n\), the map \(T\) in (54) coincides with the map \(T_a\) in (71) on the coordinates indexed by \(t\in \mathcal{T}_*.\) Also, it is clear that, on the event \(\mathcal{O}\), the sum
is a constant, which we will denote by \(\gamma (a)\). If we denote \(Z_a(\delta ) = e^{\gamma (a)}/\Delta _a(\delta )^n\) then Theorem 3 implies that
The same equality, obviously, holds without the event \(\{S\in A\}\), which proves that
if the numerator is not zero. When \(T_a(\delta )\in B_{\varepsilon }(x)\), by (72),
and, hence, \(Z_a(\delta ) \in e^{\gamma (a)} \Delta _{-a}(B_{\varepsilon }(x))^n\). As a result, as \({\varepsilon }\downarrow 0\), the factor \(Z_a(\delta )\) converges uniformly to a constant \(e^{\gamma (a)} \Delta _{-a}(x)^n\) that will cancel out on the right hand side of (76), yielding (73). \(\square \)
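The proofs of Lemma 2 and Theorem 4 use only two algebraic facts about the maps \(T_a\): the group property \(T_a\circ T_b = T_{a+b}\) (hence \(T_a^{-1}=T_{-a}\)) and that \(T_a\) maps \(\mathcal{D}\) to itself. Assuming (71) is the normalized exponential tilt written below (this explicit form is an assumption of the sketch, reconstructed from the role of the normalizing factor \(\Delta _a(\delta )\) in the proof), both facts can be verified numerically:

```python
import math
import random

# Assumed form of the tilting map (71): for x in the open set D
# (x_t > 0, sum_t x_t < 1), tilt each coordinate by exp(a_t) and
# renormalize, leaving the root coordinate x_* = 1 - sum_t x_t untilted.
def Delta(a, x):
    return (1 - sum(x)) + sum(xt * math.exp(at) for xt, at in zip(x, a))

def T(a, x):
    d = Delta(a, x)
    return [xt * math.exp(at) / d for xt, at in zip(x, a)]

random.seed(0)
x = [0.2, 0.1, 0.4]                      # a point of D: positive, sum < 1
a = [random.gauss(0, 1) for _ in x]
b = [random.gauss(0, 1) for _ in x]

# group property T_a o T_b = T_{a+b}
lhs = T(a, T(b, x))
rhs = T([ai + bi for ai, bi in zip(a, b)], x)
assert all(abs(u - v) < 1e-12 for u, v in zip(lhs, rhs))

# inverse T_a^{-1} = T_{-a}
back = T([-ai for ai in a], T(a, x))
assert all(abs(u - v) < 1e-12 for u, v in zip(back, x))
print("group property verified")
```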
We will need one more technical result that will be postponed until the next section.
Lemma 3
The distribution \(\mathbb {P}_\mathcal{O}(\delta \in \ \cdot \ )\) of weights \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|\mathcal{T}_*|}\).
We are now ready to prove Theorem 4.
Proof of Theorem 4
Let \(p(x)\) be the Lebesgue density of the distribution \(\mathbb {P}_\mathcal{O}(\delta \in \, \cdot \, )\) and let \(p_A(\delta )\) be the conditional expectation of the indicator \({\mathrm{I}}(S\in A)\) given \(\delta \) under the measure \(\mathbb {P}_\mathcal{O}.\) Then,
To prove (69), it is enough to show that \(p_A(x)\) is a constant a.e. on the set \(\{x: p(x)>0\}\). By the Lebesgue differentiation theorem (see Corollary 1.6 in [28]), for almost every \(x'\in \mathbb {R}^{|\mathcal{T}_*|}\),
If \(p_A(x)\) is not constant a.e. on \(\{p(x)>0\}\) then we can find two points \(x',x''\) for which both (78) and (79) hold and such that \(p(x'), p(x'')>0\) and \(p_A(x')\not = p_A(x'').\) We can also assume that \(x',x''\in \mathcal{D}\) in (70) since \(\mathbb {P}_\mathcal{O}(\delta \not \in \mathcal{D}) = 0.\) First of all, equations (77)–(79) imply that the left hand side of (73) is equal to
It is easy to check that if we take
for \(t\in \mathcal{T}_*\) then \(T_a(x'') = x'\) for \(T_a\) defined in (71). Equations (73) and (80) imply that
To finish the proof, we will follow the argument of Corollary 1.7 in [28] and use the fact that the sets \(T_{-a} (B_{\varepsilon }(x'))\) are of bounded eccentricity. Namely, since all partial derivatives of \(T_a\) are uniformly bounded in a small neighborhood of \(x''\) and all partial derivatives of \(T_a^{-1}=T_{-a}\) are uniformly bounded in a small neighborhood of \(x'\), there exist some constants \(c, C>0\) such that \(B_{c {\varepsilon }}(x'') \subseteq T_{-a} (B_{\varepsilon }(x')) \subseteq B_{C {\varepsilon }}(x'')\) for small \({\varepsilon }>0\). Therefore,
and, using that (79) holds with \(x''\) instead of \(x'\), we get
Similarly, using (78) with \(x''\) instead of \(x'\),
These equations together with (77) for \(B = T_{-a} (B_{\varepsilon }(x'))\) imply that
Recalling (81), we arrive at a contradiction, \(p_A(x') = p_A(x'')\). \(\square \)
4 Absolute continuity of cluster weight distribution
In this section, we will prove Lemma 3. First, we will reduce the problem to proving absolute continuity for the distribution of finitely many cluster weights \(V_\alpha \) in (27). Then we will recall that these cluster weights are generated by the Ruelle probability cascades (RPC) thanks to the Ghirlanda–Guerra identities, so the proof of absolute continuity will be based solely on the properties of the RPC.
Let \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) be a fixed configuration as in the previous section. With probability one, the vector of weights \(W=(W_t)_{t\in \mathcal{T}_*}\) defined in (63) belongs to the open subset
of \(\mathbb {R}^{|\mathcal{T}_*|},\) where we set \(y_* = 1.\) The map given by \(x_t = y_t -\sum _{k\le k_t} y_{tk}\) for \(t\in \mathcal{T}_*\) is a linear bijection between \(\mathcal{W}\) and the set \(\mathcal{D}\) defined in (70). Recall that this is precisely the relationship between the weights \(W=(W_t)_{t\in \mathcal{T}_*}\) and \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\) in (63). Therefore, in order to prove Lemma 3, it is enough to prove that the distribution \(\mathbb {P}_\mathcal{O}(W\in \ \cdot \ )\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|\mathcal{T}_*|}\).
Let us now recall the definition of the clusters \((H_\alpha )_{\alpha \in \mathcal{A}}\) and their Gibbs weights \((V_\alpha )_{\alpha \in \mathcal{A}}\) in the paragraph above equation (27). Suppose that the cardinality of \(\mathcal{L}(\mathcal{T})\) is equal to \(m\). Let us look at all possible choices of \(m\) pure states \(H_{\alpha _t}\) for \(t\in \mathcal{L}(\mathcal{T})\) indexed by the leaves \(\alpha _t\in \mathcal{L}(\mathcal{A})=\mathbb {N}^r\) that “form the same pattern” according to their overlaps as the tree \(\mathcal{T}\). More precisely, we will denote \(\bar{\alpha } := (\alpha _t)_{t\in \mathcal{L}(\mathcal{T})}\) and consider the set
Then it should be obvious that the event \(\mathcal{O}= {\mathcal{O}(\mathcal{C})}\) defined in (57) can be written as a disjoint union \(\mathcal{O}= \bigcup _{\bar{\alpha }\in \mathcal{A}(\mathcal{C})} \mathcal{O}(\bar{\alpha })\), where (recall the definition of \(\mathcal{R}(t)\) in (58))
Then, we can write
On the event \(\mathcal{O}(\bar{\alpha })\), the vector of weights \(W=(W_t)_{t\in \mathcal{T}_*}\) can also be written as a vector of cluster weights \(V_\alpha \) in (27) indexed by the vertices \(\alpha \) in the subtree formed by all paths from the root to the leaves \((\alpha _t)_{t\in \mathcal{L}(\mathcal{T})}\). Let us call this vector \(V(\bar{\alpha })\). Also, obviously,
and, therefore,
To finish the proof of Lemma 3, it is enough to show that the distribution of \(V(\bar{\alpha })\) is absolutely continuous with respect to the Lebesgue measure. For the remainder of this section, we will forget about the configuration \(\mathcal{C}\) and will focus on proving the absolute continuity of the distribution of cluster weights \((V_\alpha )_{\alpha \in F}\) indexed by an arbitrary finite subset \(F\) of the tree \(\mathcal{A}\). Of course, this will be based on the properties of the RPC, so we will first recall the construction of these cascades and how it relates to the weights \(V_\alpha \).
Recall the sequence of parameters in (24). For each \(\alpha \in \mathcal{A}{\setminus } \mathbb {N}^r\), let \(\Pi _\alpha \) be a Poisson process on \((0,\infty )\) with the mean measure \(\zeta _{p}x^{-1-\zeta _{p}}dx\) with \(p=|\alpha |\), and we assume that these processes are independent for all \(\alpha \). Let us arrange all the points of \(\Pi _\alpha \) in decreasing order,
and enumerate them using the children \((\alpha n)_{n\ge 1}\) of the vertex \(\alpha \). Given a vertex \(\alpha \in \mathcal{A}{\setminus } \{*\}\) and the path \(p(\alpha )\) in (20), we define
and for the leaf vertices \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) we define
For other vertices \(\alpha \in \mathcal{A}{\setminus } \mathcal{L}(\mathcal{A})\) we define
Of course, this definition implies that \(v_\alpha = \sum _{n\ge 1} v_{\alpha n}\) when \(|\alpha |<r\). Notice that, for a given \(\alpha \), the sequence of weights \((v_{\alpha n})_{n\ge 1}\) is not necessarily decreasing. For example, when \(r=2\), sequences \((u_n)_{n\ge 1}\) and \((u_{nm})_{m\ge 1}\) for all \(n\) are decreasing by construction, but \(v_n\) is proportional to \(u_n\sum _{m\ge 1} u_{nm}\) and does not have to be decreasing. Let us now rearrange the vertex labels so that the weights indexed by children will be decreasing. For each \(\alpha \in \mathcal{A}{\setminus } \mathbb {N}^r\), let \(\pi _\alpha : \mathbb {N}\rightarrow \mathbb {N}\) be a bijection such that the sequence \((v_{\alpha \pi _\alpha (n)})_{n\ge 1}\) is decreasing. Using these “local rearrangements” we define a global bijection \(\pi : \mathcal{A}\rightarrow \mathcal{A}\) in a natural way, as follows. We let \(\pi (*)=*\) and then define
recursively from the root to the leaves of the tree. Finally, we define
It is not a coincidence that we used here the same notation as in (27), since they have the same distribution. This relationship between cluster weights of a random measure \(G\) and the RPC is a well-known consequence of the Ghirlanda–Guerra identities (see Section 2.4 in [21]). Therefore, our goal is to prove the following.
Lemma 4
The distribution of weights \((V_\alpha )_{\alpha \in F}\) in (88) indexed by an arbitrary finite subset \(F\) of the tree \(\mathcal{A}\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|F|}\).
Let us first introduce some more notation and recall some definitions. Let \((u_n)_{n\ge 1}\) be the decreasing enumeration of a Poisson process on \((0,\infty )\) with the mean measure \(x u^{-1-x}du\) for some \(x\in (0,1)\) and let
The distribution of the sequence \((p_n)_{n\ge 1}\) is called the Poisson–Dirichlet distribution \(PD(x)\) (or \(PD(x,0)\)). It is well known that the distribution of finitely many coordinates of \((p_n)\) is absolutely continuous. For example, Proposition 47 in [25] gives some representation for the density, but the existence of the density is also easy to see directly from the representation of this process in Proposition 8 in [25].
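For concreteness, one can sample (a truncation of) \(PD(x)\) directly from the definition: the decreasing points of a Poisson process with mean measure \(x u^{-1-x}du\) can be written as \(u_n = \Gamma _n^{-1/x}\), where \(\Gamma _n\) are the arrival times of a rate-one Poisson process, since the expected number of points above \(u\) is \(u^{-x}\). An illustrative sketch (the truncation level \(K\) is arbitrary and slightly perturbs the normalization):

```python
import random

random.seed(2)
x = 0.6          # PD(x) parameter, 0 < x < 1
K = 2000         # truncation level for the Poisson process (approximation)

# decreasing points of the Poisson process with mean measure x*u**(-1-x)du:
# u_n = Gamma_n**(-1/x), with Gamma_n arrival times of a rate-one process
g, u = 0.0, []
for _ in range(K):
    g += random.expovariate(1.0)
    u.append(g ** (-1.0 / x))

U = sum(u)
p = [un / U for un in u]     # (p_n): a (truncated) sample from PD(x)

assert p == sorted(p, reverse=True) and abs(sum(p) - 1.0) < 1e-12
print("largest PD(0.6) weights:", [round(q, 3) for q in p[:5]])
```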
Let us consider \(a<x\). Then the distribution of \((p_n)_{n\ge 1}\) under the change of density \(U^a/\mathbb {E}U^a\) is called the Poisson–Dirichlet distribution \(PD(x,-a)\). The usual condition \(a<x\) ensures that \(\mathbb {E}U^a <\infty \) and the change of density is well defined (see e.g. Lemma 2.1 in [21]). The definition of this distribution in Section 1.1 in [25] was different but its equivalence to this one was shown in Proposition 14 there. (In [25], the parameter \(-a\) was denoted \(\theta \) and the condition was stated as \(\theta >-x\).) It is easy to see that the distribution of finitely many coordinates of \((p_n)\) under \(PD(x,-a)\) is also absolutely continuous. Indeed, for any \(N\ge 1\) and a measurable set \(A\) in \(\mathbb {R}^N\) of Lebesgue measure \(0\), by Hölder’s inequality,
for small enough \({\varepsilon }>0\) such that \(a(1+{\varepsilon })<x\), in which case \(\mathbb {E}U^{a(1+{\varepsilon })}<\infty \).
For each \(\alpha \in \mathbb {N}^{r-1}\), let us now consider the sequence
By definition, this sequence is decreasing and \(\sum _{n\ge 1} p_{\alpha n}=1\). The following holds.
Lemma 5
For each \(\alpha \in \mathbb {N}^{r-1}\), the sequence \((p_{\alpha n})_{n\ge 1}\) in (91) has distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). These sequences are independent of each other and of \((V_{\alpha })_{|\alpha |\le r-1}\).
First, let us show how this implies Lemma 4.
Proof of Lemma 4
This now follows easily by induction on \(r\). For \(r=1\), this is just absolute continuity of weights from the Poisson–Dirichlet distribution \(PD(\zeta _0)\). For the induction step, we use the well-known fact that the array \((V_\alpha )_{|\alpha | \le r-1}\) can be constructed as in (83)–(88) with \(r\) replaced by \(r-1\) and \(\zeta _{r-1}\) removed from the sequence (24). This observation goes back to [26], but is also a trivial consequence of the Ghirlanda–Guerra identities. (In any case, the proof of this fact will appear below as a byproduct of the proof of Lemma 5.) By the induction hypothesis, this implies that the distribution of finitely many coordinates of \((V_\alpha )_{|\alpha | \le r-1}\) is absolutely continuous. To include coordinates \(V_{\alpha n}\) for \(\alpha \in \mathbb {N}^{r-1}\) and \(n\ge 1\), we write them as \(V_{\alpha n} = V_\alpha p_{\alpha n}\) and use Lemma 5 together with the observation in (90) about absolute continuity of the distribution of finitely many coordinates under \(PD(x,-a)\). \(\square \)
Proof of Lemma 5
We only need to consider the case \(r\ge 2\). For each \(\alpha \in \mathbb {N}^{r-2}\), consider the process \((u_{\alpha n}, (u_{\alpha n m})_{m\ge 1} )_{n\ge 1}\) and let \(U_{\alpha n} := \sum _{m\ge 1} u_{\alpha n m}\). If we define
then \(Y_{\alpha n} := (d_{\alpha n m})_{m\ge 1}\) has the Poisson–Dirichlet distribution \(PD(\zeta _{r-1})\). Notice that the random variables \((U_{\alpha n}, Y_{\alpha n})_{n\ge 1}\) are i.i.d. and independent of \((u_{\alpha n})_{n\ge 1}\). Moreover, all these processes are independent over \(\alpha \in \mathbb {N}^{r-2}\), and also independent of \(U_{r-2} = (u_\alpha )_{|\alpha |\le r-2}\).
For a fixed \(\alpha \in \mathbb {N}^{r-2}\), let \(\pi _\alpha :\mathbb {N}\rightarrow \mathbb {N}\) be a bijection such that the sequence \((u_{\alpha \pi _\alpha (n)} U_{\alpha \pi _\alpha (n)})_{n\ge 1}\) is decreasing. This is exactly the same permutation defined in the paragraph above (87) since, for a fixed \(\alpha \in \mathbb {N}^{r-2}\), \(v_{\alpha n}\) is proportional to \(u_{\alpha n}U_{\alpha n}\). Since \((u_{\alpha n})_{n\ge 1}\) is a Poisson process with the mean measure \(\zeta _{r-2} \,x^{-1-\zeta _{r-2}} dx\), Theorem 2.6 in [21] (Proposition A.2 in [6]) implies that
where \(c= \bigl (\mathbb {E}U_{\alpha 1}^{\zeta _{r-2}} \bigr )^{1/\zeta _{r-2}}\), \((u_{\alpha n})_{n\ge 1}\) and \((Y_{\alpha n}')_{n\ge 1}\) on the right hand side are independent, and the random variables \((Y_{\alpha n}')_{n\ge 1}\) are i.i.d. with the distribution of \(Y_{\alpha 1} = (d_{\alpha 1 m})_{m\ge 1}\) under the change of density
which is precisely the Poisson–Dirichlet distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). It remains to notice that the weights \((V_{\alpha })_{|\alpha |\le r-1}\) are, obviously, a function of the arrays
and are, therefore, independent of the random variables \(Y_{\alpha \pi _\alpha (n)}\), which are i.i.d. for all \(\alpha \in \mathbb {N}^{r-2}\) and \(n\ge 1\) and have the distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). In particular, the permutation \(\pi \) defined in (87), restricted to \(|\alpha |\le r-1\), will be a function of these arrays and, therefore,
are still i.i.d. over all \(\alpha \in \mathbb {N}^{r-2}\) and \(n\ge 1\), have distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\), and are independent of \((V_{\alpha })_{|\alpha |\le r-1}\). This finishes the proof since, by the definition (91), for \(\alpha n\in \mathbb {N}^{r-1}\),
Finally, let us notice that the above argument also proves the fact mentioned in the proof of Lemma 4, namely, that the array \((V_\alpha )_{|\alpha | \le r-1}\) can be constructed as in (83)–(88) with \(r\) replaced by \(r-1\) and \(\zeta _{r-1}\) removed from the sequence (24). This is because \((V_\alpha )_{|\alpha | \le r-1}\) is constructed from the arrays in (93) as in (83)–(88) and, by (92), for each \(\alpha \in \mathbb {N}^{r-2}\), the second array in (93) is, up to a factor \(c\), a Poisson process with the mean measure \(\zeta _{r-2} \,x^{-1-\zeta _{r-2}} dx\). Of course, this constant factor \(c\) will cancel at the step (85), so the claim follows. \(\square \)
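The construction (83)–(88) of the cascade weights is straightforward to simulate, again writing the decreasing points of a Poisson process with mean measure \(\zeta x^{-1-\zeta }dx\) as \(\Gamma _n^{-1/\zeta }\) for the arrival times \(\Gamma _n\) of a rate-one Poisson process. A truncated, purely illustrative sketch for \(r=2\) (the truncation level \(K\) and the parameters \(\zeta _0 < \zeta _1\) are hypothetical):

```python
import random

random.seed(1)
K = 5                 # truncation: keep only K children per vertex
zeta = [0.4, 0.7]     # hypothetical parameters 0 < zeta_0 < zeta_1 < 1

def poisson_points(z, K):
    """Decreasing enumeration of a Poisson process with mean measure
    z*x**(-1-z)dx: u_n = Gamma_n**(-1/z), Gamma_n rate-one arrival times."""
    g, out = 0.0, []
    for _ in range(K):
        g += random.expovariate(1.0)
        out.append(g ** (-1.0 / z))
    return out

u1 = poisson_points(zeta[0], K)                      # (u_n)
u2 = [poisson_points(zeta[1], K) for _ in range(K)]  # (u_{nm}) for each n

# leaf weights: products of u along each root-to-leaf path, normalized
# over all leaves, cf. (83)-(88)
w = [[u1[n] * u2[n][m] for m in range(K)] for n in range(K)]
total = sum(sum(row) for row in w)
v_leaf = [[w[n][m] / total for m in range(K)] for n in range(K)]
v_node = [sum(row) for row in v_leaf]    # internal weights: v_n = sum_m v_{nm}

assert abs(sum(v_node) - 1.0) < 1e-12
# (v_n) need not be decreasing even though (u_n) is, as remarked above,
# since v_n is proportional to u_n * sum_m u_{nm}
print("node weights:", [round(v, 3) for v in v_node])
```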
5 From replicas to the Gibbs measure
In this section, we will show how Theorem 1 can be deduced from Theorem 4. The main idea is that when the sample size \(n\) goes to infinity, there will be many replicas in any given subset of pure states, and the statement in Theorem 4 about spins and cluster weights corresponding to the sample can be translated into a statement in Theorem 1 about spins inside pure states and cluster weights of the Gibbs measure.
Before we begin the proof, let us first notice that Theorem 1 follows from its analogue for finite subsets of the tree \(\mathcal{A}\), as follows. Let us consider integers \(d\ge 1\) and \(N\ge 1\) that will be fixed throughout this section (note that now the notation \(N\) is not related to the number of coordinates of the system in the introduction). Let \([d]=\{1,\ldots , d\}\) and let
be a \(d\)-regular subtree of \(\mathcal{A}\). Any finite subset of \(\mathcal{A}\) will be covered by \(\mathcal{A}_d\) for \(d\) sufficiently large (depending on the subset). Now, recall the array \(S_\alpha = (S(\sigma ^{\alpha n}))_{n\ge 1}\) in (30) and let us truncate it to the array
generated by a sample \((\sigma ^{\alpha n})_{n\le N}\) of size \(N\) from the pure state \(H_{\alpha }\). We will only consider these arrays for \(\alpha \in [d]^r = \mathcal{L}(\mathcal{A}_d)\), so we will need to restrict the notion of hierarchical exchangeability to the finite tree \(\mathcal{A}_d\). Similarly to (31), let
Then, naturally, we will call a finite array \((X_\alpha )_{\alpha \in [d]^r}\) hierarchically exchangeable if
for all \(\pi \in \mathcal{H}_d\). It is obvious that, in order to prove Theorem 1, it is sufficient to show the following for all \(d,N \ge 1\).
Theorem 1\(^{\prime }\)
The array of spins \((S_{\alpha ,N})_{\alpha \in [d]^r}\) defined in (94) is hierarchically exchangeable and independent of the array of cluster weights \((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}\).
To prove this, we will apply Theorem 4 to the following set of configurations \(\mathcal{C}=(\mathcal{T},\mathcal{P})\),
(this set depends on \(n\) through the mapping \(\mathcal{P}\)). In words, the tree \(\mathcal{T}\) contains \(\mathcal{A}_d\) (so it is big enough) and at least \(N\) replica indices are mapped by \(\mathcal{P}\) into each leaf \(t\in [d]^r = \mathcal{L}(\mathcal{A}_d)\subseteq \mathcal{L}(\mathcal{T})\). For a given configuration \(\mathcal{C}\in \mathcal{C}(n,d,N)\) and \(t\in [d]^r\), let \(\mathcal{R}_N(t)\) be the set of the smallest \(N\) replica indices in \(\mathcal{P}^{-1}(t)\) (we choose the smallest \(N\) just for concreteness; any \(N\) of them would do) and define \(\mathcal{R}_{d,N} = \bigcup _{t\in [d]^r} \mathcal{R}_N(t)\). Let us recall the definition of \(S^n\) in (11) and (12) and, similarly, define
In other words, we are now only interested in a set of \(N\) replicas for each of the leaves in \([d]^r.\) Similarly to (57), let us define the event
which involves only the replicas with indices in \(\mathcal{R}_{d,N}\) and, similarly to the definition of \(\mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\) in (66), we let
We will need the following simple consequence of the Ghirlanda–Guerra identities (15).
Lemma 6
For any \(\mathcal{C}\in \mathcal{C}(n,d,N)\), we have
Proof
Let us consider the numerator and denominator on the left hand side of (101),
Consider any replica index \(\ell \in \{1,\ldots , n\}{\setminus } \mathcal{R}_{d,N}\) not appearing in \(S^{d,N}\). For simplicity of notation, suppose that this index is \(n\). Then, let \(\ell ' \not = n\) be a replica index such that \(\mathcal{P}(n) \wedge \mathcal{P}(\ell ')\) is as large as possible. Again, for simplicity of notation, suppose that \(\ell ' =1\) (it does not matter whether this replica index is in \(\mathcal{R}_{d,N}\) or not). Let \(p = \mathcal{P}(n) \wedge \mathcal{P}(1)\) so that, on the event \({\mathcal{O}(\mathcal{C})}\) in (57), we have \(\sigma ^1\cdot \sigma ^n \in I_p\). By assumption, \(\mathcal{P}(\ell )\wedge \mathcal{P}(n) \le p\) for \(2\le \ell \le n-1\) and, therefore, \(\mathcal{P}(1)\wedge \mathcal{P}(\ell ) = \mathcal{P}(\ell )\wedge \mathcal{P}(n).\) By ultrametricity, the constraint \(\sigma ^1\cdot \sigma ^{\ell }\in I_{\mathcal{P}(1)\wedge \mathcal{P}(\ell )}\) automatically implies that \(\sigma ^\ell \cdot \sigma ^{n}\in I_{\mathcal{P}(1)\wedge \mathcal{P}(\ell )} = I_{\mathcal{P}(\ell )\wedge \mathcal{P}(n)}\), which means that we can write
where
Then, using the Ghirlanda–Guerra identities, we get
By the definition (24), \(\mathbb {E}\langle {\mathrm{I}}(\sigma ^1\cdot \sigma ^2 \in I_p)\rangle = \zeta (I_p) = \zeta _{p}-\zeta _{p-1}.\) In the second sum,
depending on whether \(\ell \in \mathcal{I}=\{2\le \ell \le n-1 \ |\ \mathcal{P}(\ell )\wedge \mathcal{P}(1) =p \}\) or not. Therefore,
Since this computation did not depend on the set \(A\), we similarly get
Dividing these two equations, we see that \( \mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\bigl (S^{d,N}\in A \bigr ) = \mathbb {P}_{{\mathcal{O}(\mathcal{C})}^-}\bigl (S^{d,N}\in A \bigr ). \) We can now proceed in the same way, removing replica indices one by one until only the replicas with indices in the set \(\mathcal{R}_{d,N}\) remain. This finishes the proof. \(\square \)
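For the reader's convenience, the Ghirlanda–Guerra step in the proof above can be sketched as follows (a hedged restatement; the exact form and normalization should be matched against (15)): for a bounded measurable function \(f=f(\sigma^1,\ldots,\sigma^{n-1})\) of \(n-1\) replicas,

```latex
\mathbb{E}\bigl\langle f\,{\mathrm I}(\sigma^1\cdot\sigma^n \in I_p)\bigr\rangle
  = \frac{1}{n-1}\,\mathbb{E}\langle f\rangle\,
    \mathbb{E}\bigl\langle {\mathrm I}(\sigma^1\cdot\sigma^2 \in I_p)\bigr\rangle
  + \frac{1}{n-1}\sum_{\ell=2}^{n-1}
    \mathbb{E}\bigl\langle f\,{\mathrm I}(\sigma^1\cdot\sigma^\ell \in I_p)\bigr\rangle.
```

Here the first expectation equals \(\zeta(I_p)=\zeta_p-\zeta_{p-1}\) by (24), while the term indexed by \(\ell\) in the sum reproduces \(\mathbb{E}\langle f\rangle\) when \(\ell\in\mathcal{I}\) (the constraint \(\sigma^1\cdot\sigma^\ell\in I_p\) is already part of the event) and vanishes otherwise (the constraint is incompatible with the event).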
Remark 1
Notice that the right hand side of (101) does not really depend on the configuration \(\mathcal{C}\) since the set \(\mathcal{R}_{d,N}\) involves \(N\) replicas assigned to the leaves \([d]^r\) of the tree \(\mathcal{A}_d\), and we can relabel those replicas using indices \(1,\ldots , N d^r.\) Let \(\mathcal{C}_{d,N}\) be a configuration consisting of the tree \(\mathcal{A}_d\) and a map \(\mathcal{P}_{d,N}\) that maps exactly \(N\) indices in \(\{1,\ldots , N d^r\}\) to each leaf in \([d]^r\). Then the equation (101) can be rewritten as
We use the same notation \(S^{d,N}\) on the right hand side but, of course, we need to change the definition of \(S^{d,N}\) to take into account this relabeling of indices. In fact, for clarity, let us index the \(N\) replicas mapped by \(\mathcal{P}_{d,N}\) into the leaf \(\alpha \in [d]^r = \mathcal{L}(\mathcal{A}_d)\) by \(\sigma ^{(\alpha , 1)},\ldots , \sigma ^{(\alpha , N)}.\) Then \(S^{d,N}\) on the right hand side of (102) is understood as
Notice that we use the notation \(\sigma ^{(\alpha ,\ell )}\) here to distinguish these (usual, unconditional) replicas, sampled from the Gibbs measure \(G\), from the replicas \(\sigma ^{\alpha \ell }\) in (30), which denoted samples from the conditional Gibbs measure \(G_\alpha \) on the pure state \(H_\alpha \).
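The relabeling in Remark 1 can be sketched as follows (illustrative only; the pair encoding \((\alpha,\ell)\) and the dictionary representation of the chosen replica indices are our own assumptions):

```python
def relabel(R):
    """Given, for each leaf alpha of [d]^r, the sorted list of the N replica
    indices mapped to it, index them as (alpha, 1), ..., (alpha, N), mirroring
    the relabeling sigma^{(alpha, 1)}, ..., sigma^{(alpha, N)} in Remark 1."""
    return {(alpha, k): i
            for alpha, ids in R.items()
            for k, i in enumerate(ids, start=1)}

# Example with d = 2, r = 1, N = 2: leaf (1,) received replicas 1, 3
# and leaf (2,) received replicas 2, 4
labels = relabel({(1,): [1, 3], (2,): [2, 4]})
```

After this relabeling only the positions \((\alpha,\ell)\) matter, which is why the right hand side of (102) no longer depends on the original configuration \(\mathcal{C}\).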
For a given configuration \(\mathcal{C}=(\mathcal{T},\mathcal{P})\), let us recall the definition of \(W = (W_t)_{t\in \mathcal{T}_*}\) in (63) and (67), which represent the cluster weights around the sample on the event \({\mathcal{O}(\mathcal{C})}\). For a configuration \(\mathcal{C}\in \mathcal{C}(n,d,N)\) in (97), we will denote by
the subset of these weights along the subtree \(\mathcal{A}_d\subseteq \mathcal{T}\). Let us recall the definition of the sample configuration \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\) in (65) and consider two events
To understand what these events represent, let us see what they will look like with high probability when the sample size \(n\rightarrow \infty \). When \(n\) gets large, with high probability, at least \(N\) replicas will fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\). First of all, this means that with high probability the sample configuration \(\mathcal{C}_n \in \mathcal{C}(n,d,N)\). Second, on the event that at least \(N\) replicas fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), what are \(S^{d,N}\) and \(W^d\) in (105) and (106)? Recall that \(\mathcal{C}_n = \mathcal{C}\) means that the event \({\mathcal{W}(\mathcal{C})}\) in (64) occurs and, for each vertex \(t\in \mathcal{T}{\setminus } \mathcal{L}(\mathcal{T})\), the cluster weights indexed by its children are arranged in decreasing order. The pure states \(H_\alpha \) and the weights \(V = (V_\alpha )_{\alpha \in \mathcal{A}}\) in (27) of the clusters around the pure states were labelled in a similar fashion in (26). This implies that whenever \(\mathcal{C}_n = \mathcal{C}\in \mathcal{C}(n,d,N)\) and at least \(N\) replicas fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), we must have \(W^d = (V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}.\) Moreover, in this case, the spins \(S^{d,N}\) correspond to \(N\) replicas sampled from each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), i.e. \(S^{d,N} = (S_{\alpha ,N})_{\alpha \in [d]^r}\) defined in (94). This implies that
To finish the proof of Theorem 1\({}^\prime \), it remains to show the following.
Lemma 7
We have,
Proof
First of all, when we defined the sample configuration \(\mathcal{C}_n\) in (65) we explained that the events \(\mathcal{C}_n =\mathcal{C}\) are disjoint for different \(\mathcal{C}\) and \(\{\mathcal{C}_n =\mathcal{C}\} = {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\). Therefore,
Notice that \(\{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\) is an event which involves only the weights \(W = (W_t)_{t\in \mathcal{T}_*}\) and can be written as \(\{W\in B'\}\) for some set \(B'\). Therefore, Theorem 4 implies that
Finally, using (102), we can write
which finishes the proof. \(\square \)
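Schematically, the chain of equalities in the proof of Lemma 7 can be summarized as follows (a sketch under the assumption that Theorem 4 yields the independence of the weights \(W\) from the event \(\mathcal{O}(\mathcal{C})\); each equality should be matched against the corresponding display in the proof and against (102)):

```latex
\mathbb{P}\bigl(W^d\in B,\ S^{d,N}\in A,\ \mathcal{C}_n=\mathcal{C}\bigr)
  = \mathbb{P}\bigl(\{W\in B'\}\cap\{S^{d,N}\in A\}\cap\mathcal{O}(\mathcal{C})\bigr)
  = \mathbb{P}(W\in B')\,
    \mathbb{P}\bigl(\{S^{d,N}\in A\}\cap\mathcal{O}(\mathcal{C})\bigr)
  = \mathbb{P}(W\in B')\,\mathbb{P}(\mathcal{O}(\mathcal{C}))\,
    \mathbb{P}_{\mathcal{O}(\mathcal{C}_{d,N})}\bigl(S^{d,N}\in A\bigr),
```

where the first equality uses \(\{\mathcal{C}_n=\mathcal{C}\}={\mathcal{W}(\mathcal{C})}\cap{\mathcal{O}(\mathcal{C})}\) and \(\{W^d\in B\}\cap{\mathcal{W}(\mathcal{C})}=\{W\in B'\}\), and the last uses (102).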
Together with (107) and (108), Lemma 7 implies
Therefore, \((S_{\alpha ,N})_{\alpha \in [d]^r}\) and \((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}\) are independent and, recalling (103),
The hierarchical exchangeability of \((S_{\alpha ,N})_{\alpha \in [d]^r}\) follows from the obvious invariance of the event \(\mathcal{O}(\mathcal{C}_{d,N})\) under the permutations \(\pi \in \mathcal{H}_d\) in (95),
This finishes the proof of Theorem 1\({}^\prime \) and, thus, Theorem 1.
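The invariance used in this last step can be restated as follows (a hedged sketch, with \(\mathcal{H}_d\) the group of tree automorphisms in (95)): for any \(\pi\in\mathcal{H}_d\),

```latex
\bigl(S_{\alpha,N}\bigr)_{\alpha\in[d]^r}
  \stackrel{d}{=} \bigl(S_{\pi(\alpha),N}\bigr)_{\alpha\in[d]^r},
```

since permuting the leaves \(\alpha\in[d]^r\) by \(\pi\) leaves the event \(\mathcal{O}(\mathcal{C}_{d,N})\) unchanged.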
References
Aldous, D.: Representations for partially exchangeable arrays of random variables. J. Multivar. Anal. 11(4), 581–598 (1981)
Aizenman, M., Sims, R., Starr, S.L.: An extended variational principle for the SK spin-glass model. Phys. Rev. B. 68, 214403 (2003)
Arguin, L.-P., Aizenman, M.: On the structure of quasi-stationary competing particles systems. Ann. Probab. 37(3), 1080–1113 (2009)
Austin, T., Panchenko, D.: A hierarchical version of the de Finetti and Aldous–Hoover representations. Probab. Theory Relat. Fields (2013). doi:10.1007/s00440-013-0521-0
Austin, T.: Exchangeable random measures. Ann. Inst. Henri Poincaré Probab. Stat. (to appear). arXiv:1302.2116 (2013)
Bolthausen, E., Sznitman, A.-S.: On Ruelle’s probability cascades and an abstract cavity method. Commun. Math. Phys. 197(2), 247–276 (1998)
Franz, S., Leone, M.: Replica bounds for optimization problems and diluted spin systems. J. Stat. Phys. 111(3–4), 535–564 (2003)
Ghirlanda, S., Guerra, F.: General properties of overlap probability distributions in disordered spin systems. Towards Parisi ultrametricity. J. Phys. A 31(46), 9149–9155 (1998)
Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233(1), 1–12 (2003)
Hoover, D. N.: Row–Column Exchangeability and a Generalized Model for Probability. Exchangeability in Probability and Statistics (Rome, 1981), pp. 281–291, North-Holland, Amsterdam-New York (1982)
Kallenberg, O.: On the representation theorem for exchangeable arrays. J. Multivar. Anal. 30(1), 137–154 (1989)
Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific Lecture Notes in Physics, 9. World Scientific Publishing Co., Teaneck, NJ (1987)
Mézard, M., Parisi, G.: The Bethe lattice spin glass revisited. Eur. Phys. J. B. Condens. Matter Phys. 20(2), 217–233 (2001)
Monasson, R., Zecchina, R.: Statistical mechanics of the random K-satisfiability model. Phys. Rev. E(3) 56(2), 1357–1370 (1997)
Panchenko, D., Talagrand, M.: Bounds for diluted mean-fields spin glass models. Probab. Theory Relat. Fields 130(3), 319–336 (2004)
Panchenko, D.: A connection between Ghirlanda–Guerra identities and ultrametricity. Ann. Probab. 38(1), 327–347 (2010)
Panchenko, D.: The Ghirlanda–Guerra identities for mixed \(p\)-spin model. C.R. Acad. Sci. Paris Ser. I 348, 189–192 (2010)
Panchenko, D.: The Parisi formula for mixed \(p\)-spin models. Ann. Probab. (to appear). arXiv:1112.4409 (2011)
Panchenko, D.: Spin glass models from the point of view of spin distributions. Ann. Probab. 41(3A), 1315–1361 (2013)
Panchenko, D.: The Parisi ultrametricity conjecture. Ann. Math. (2) 177(1), 383–393 (2013)
Panchenko, D.: The Sherrington–Kirkpatrick Model. Springer Monographs in Mathematics. Springer, New York (2013)
Panchenko, D.: Structure of \(1\)-RSB asymptotic Gibbs measures in the diluted \(p\)-spin models. J. Stat. Phys. (2014). doi:10.1007/s10955-014-0955-5
Parisi, G.: Infinite number of order parameters for spin-glasses. Phys. Rev. Lett. 43, 1754–1756 (1979)
Parisi, G.: A sequence of approximate solutions to the S-K model for spin glasses. J. Phys. A 13, L115 (1980)
Pitman, J., Yor, M.: The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25(2), 855–900 (1997)
Ruelle, D.: A mathematical reformulation of Derrida’s REM and GREM. Commun. Math. Phys. 108(2), 225–239 (1987)
Sherrington, D., Kirkpatrick, S.: Solvable model of a spin glass. Phys. Rev. Lett. 35, 1792–1796 (1975)
Stein, E.M., Shakarchi, R.: Real Analysis. Measure Theory, Integration, and Hilbert Spaces. Princeton Lectures in Analysis, III. Princeton University Press, Princeton, NJ (2005)
Talagrand, M.: Spin Glasses: a Challenge for Mathematicians. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics. Springer, Berlin (2003)
Talagrand, M.: The Parisi formula. Ann. Math. (2) 163(1), 221–263 (2006)
Talagrand, M.: Construction of pure states in mean-field models for spin glasses. Probab. Theory Relat. Fields 148(3–4), 601–643 (2010)
Talagrand, M.: Mean-Field Models for Spin Glasses. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics. Springer, Berlin (2011)
Acknowledgments
The author would like to thank the referees for very thorough reviews and many suggestions to improve the paper.
Additional information
D. Panchenko: Partially supported by an NSF grant.
Cite this article
Panchenko, D. Hierarchical exchangeability of pure states in mean field spin glass models. Probab. Theory Relat. Fields 161, 619–650 (2015). https://doi.org/10.1007/s00440-014-0555-y
Keywords
- Spin glasses
- Diluted models
- Exchangeability
Mathematics Subject Classification (2010)
- 60K35
- 60G09
- 82B44