1 Introduction

Many mean field spin glass models are described by a random Hamiltonian \(H_N(\sigma )\) on the space of spin configurations \(\Sigma _N = \{-1,+1\}^N\) (see [12, 29] or [32]). For example, in the classical Sherrington–Kirkpatrick model [27],

$$\begin{aligned} H_N(\sigma ) = \frac{1}{\sqrt{N}}\sum _{i,j=1}^N g_{i,j} \sigma _i \sigma _j, \end{aligned}$$
(1)

where \((g_{i,j})_{i,j\ge 1}\) are i.i.d. standard Gaussian random variables, while in the random \(K\)-sat model,

$$\begin{aligned} H_N(\sigma ) = \sum _{k\le \pi (\alpha N)} \prod _{1\le j\le K} \frac{1+{\varepsilon }_{j,k} \sigma _{i_{j,k}}}{2}, \end{aligned}$$
(2)

where \(\alpha >0\) is called the connectivity parameter, \(\pi (\alpha N)\) is a Poisson random variable with mean \(\alpha N\), \(({\varepsilon }_{j,k})_{j,k\ge 1}\) are independent Rademacher random variables, and the indices \((i_{j,k})_{j,k\ge 1}\) are independent uniform on \(\{1,\ldots ,N\}\). The random \(K\)-sat model is an example of a so-called diluted model (diluted refers to the fact that the (hyper)graph of interactions between spins is sparse), and the main goal of this paper is to make some progress toward the Mézard–Parisi ansatz for diluted models originating in [13]. The above two models are called mean field models because the distributions of their Hamiltonians are invariant under permutations of the coordinates \(\sigma _1,\ldots ,\sigma _N\). This property is called symmetry between sites.
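For readers who wish to experiment numerically, the two Hamiltonians (1) and (2) can be simulated directly; the following Python sketch is only an illustration (the function names are ours, not part of the models). Each clause term in (2) equals \(0\) or \(1\), so the \(K\)-sat Hamiltonian counts the number of "violated" clauses.

```python
import numpy as np

rng = np.random.default_rng(0)

def sk_hamiltonian(sigma, g):
    # H_N(sigma) = N^{-1/2} * sum_{i,j} g_{ij} sigma_i sigma_j, as in (1)
    n = len(sigma)
    return float(sigma @ g @ sigma) / np.sqrt(n)

def ksat_hamiltonian(sigma, alpha, K, rng):
    # H_N(sigma) = sum over pi(alpha*N) random clauses of
    # prod_{j <= K} (1 + eps_{j,k} sigma_{i_{j,k}}) / 2, as in (2);
    # each clause term is 0 or 1
    n = len(sigma)
    n_clauses = rng.poisson(alpha * n)
    total = 0.0
    for _ in range(n_clauses):
        eps = rng.choice([-1, 1], size=K)    # Rademacher signs
        idx = rng.integers(0, n, size=K)     # uniform spin indices
        total += np.prod((1 + eps * sigma[idx]) / 2)
    return total
```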

The main goal in spin glass models is usually to compute the limit of the free energy

$$\begin{aligned} F_N = \frac{1}{N}\mathbb {E}\log \sum _{\sigma \in \Sigma _N} \exp \bigl (-\beta H_N(\sigma ) \bigr ) \end{aligned}$$
(3)

as \(N\rightarrow \infty \), for all inverse temperature parameters \(\beta >0\). In the Sherrington–Kirkpatrick model, the formula for the free energy was famously invented by Parisi in [23, 24] and proved rigorously by Talagrand in [30] following important work of Guerra in [9], who showed that the Parisi formula is an upper bound on the free energy. A more recent proof of the Parisi formula in [18] was based on understanding the structure of the Gibbs measure in the infinite-volume limit predicted by the physicists in the eighties (see [12]; this direction of research was jump-started in [3]).
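For very small \(N\), the free energy (3) can be computed by exact enumeration of \(\Sigma _N\), with only the disorder average \(\mathbb {E}\) estimated by Monte Carlo; the sketch below (our own illustration, not a method from the literature) does this for the SK Hamiltonian (1).

```python
import itertools
import numpy as np

def free_energy_sk(n, beta, n_samples=100, seed=0):
    # F_N = (1/N) E log sum_sigma exp(-beta H_N(sigma)), as in (3);
    # the sum over Sigma_N is exact (2^n terms), while the disorder
    # average E is estimated over fresh Gaussian couplings
    rng = np.random.default_rng(seed)
    configs = np.array(list(itertools.product([-1, 1], repeat=n)))
    vals = []
    for _ in range(n_samples):
        g = rng.standard_normal((n, n))
        h = np.einsum('si,ij,sj->s', configs, g, configs) / np.sqrt(n)
        vals.append(np.log(np.sum(np.exp(-beta * h))) / n)
    return float(np.mean(vals))
```

For \(\beta \) not too large this stays numerically stable; for larger \(\beta \) one would subtract the maximum of \(-\beta h\) before exponentiating.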

For diluted models, like the random \(K\)-sat model, the analogue of the Parisi formula for the free energy was proposed by Mézard and Parisi in [13] in the so-called \(1\)-RSB case (a replica symmetric solution was proposed earlier in [14]), but it was also clear what the natural extension of their solution should look like in the general case. A detailed description of the general Mézard–Parisi formula can be found, for example, in [15]. The fact that this formula gives an upper bound on the free energy was proved by Franz and Leone in [7], which was the analogue of Guerra’s bound [9] in the SK model. One approach to proving the matching lower bound was given in [19] (see Theorem \(2\) and, in particular, Section 2.2 there), where the problem was reduced via an analogue of the Aizenman–Sims–Starr scheme [2] to showing that the structure of the Gibbs measure in the infinite-volume limit is described by the functional order parameter proposed by Mézard and Parisi in [13]. Our main result here combined with the main result in [4] can be viewed as partial progress in this direction, and at the end of the introduction we will explain what the remaining gap is. Let us mention that the Mézard–Parisi ansatz has been proved in full generality in the setting of the Sherrington–Kirkpatrick model and \(p\)-spin models (see Chapter 4 in [21]), but the proof heavily relies on the special Gaussian nature of the Hamiltonian (1). In diluted models, where this ansatz is of real interest, the problem is still open in general, with one special case handled recently in [22].

In this paper, we will not work with any particular model and will simply assume that the asymptotic Gibbs measures satisfy the Ghirlanda–Guerra identities [8]. In the next section we will review how the Ghirlanda–Guerra identities arise in spin glass models and, as an example, show that one can safely assume their validity in the random \(K\)-sat model. The Ghirlanda–Guerra identities will be stated in this paper in a slightly more general form than usual to accommodate the more general notion of the asymptotic Gibbs measures in models other than the SK model but, of course, one gets this more general form for free from the usual proof of these identities.

Let us begin by recalling the definition of asymptotic Gibbs measures introduced in [19] (see also [5] for a different approach via exchangeable random measures). The Gibbs measure \(G_N\) corresponding to the Hamiltonian \(H_N(\sigma )\) is a (random) probability measure on \(\{-1,+1\}^N\) defined by

$$\begin{aligned} G_N(\sigma ) = \frac{1}{Z_N} \exp \bigl (- \beta H_N(\sigma )\bigr ) \end{aligned}$$
(4)

where the normalizing factor \(Z_N\) is called the partition function. Let \((\sigma ^\ell )_{\ell \ge 1}\) be an i.i.d. sequence of replicas from the Gibbs measure \(G_N\) and let \(\mu _N\) be the joint distribution of the array of all spins on all replicas \((\sigma _i^\ell )_{1\le i\le N, \ell \ge 1}\) under the average product Gibbs measure \(\mathbb {E}G_N^{\otimes \infty }\),

$$\begin{aligned}&\mu _N\Bigl ( \bigl \{\sigma _i^\ell = a_i^\ell \ :\ 1\le i\le N, 1\le \ell \le n \bigr \} \Bigr )\nonumber \\&\quad =\mathbb {E}G_N^{\otimes n}\Bigl ( \bigl \{\sigma _i^\ell = a_i^\ell \ :\ 1\le i\le N, 1\le \ell \le n \bigr \} \Bigr ) \end{aligned}$$
(5)

for any \(n\ge 1\) and any \(a_i^\ell \in \{-1,+1\}\). We extend \(\mu _N\) to a distribution on \(\{-1,+1\}^{\mathbb {N}\times \mathbb {N}}\) by setting \(\sigma _i^\ell =1\) for \(i\ge N+1.\) Let \(\mathcal{M}\) be the set of all possible limits of \((\mu _N)\) over subsequences with respect to the weak convergence of measures on the compact product space \(\{-1,+1\}^{\mathbb {N}\times \mathbb {N}}\). Because of the symmetry between sites in mean field models, these measures inherit from \(\mu _N\) the invariance under the permutation of both spin and replica indices \(i\) and \(\ell .\) By the Aldous–Hoover representation [1, 10], for any \(\mu \in \mathcal{M}\), there exists a measurable function \(s:[0,1]^4\rightarrow \{-1,+1\}\) such that \(\mu \) is the distribution of the array

$$\begin{aligned} s_i^\ell =s(w,u_\ell ,v_i,x_{i,\ell }), \end{aligned}$$
(6)

where the random variables \(w,(u_\ell ), (v_i), (x_{i,\ell })\) are i.i.d. uniform on \([0,1]\). The function \(s\) is defined uniquely for a given \(\mu \in \mathcal{M}\) up to measure-preserving transformations (Theorem 2.1 in [11]), so we can identify the distribution \(\mu \) of array \((s_i^\ell )\) with \(s\). Since \(s\) takes values in \(\{-1,+1\}\), the distribution \(\mu \) can actually be encoded by the function

$$\begin{aligned} {\sigma }(w,u,v) = \mathbb {E}_x\, s(w,u,v,x) \end{aligned}$$
(7)

where \(\mathbb {E}_x\) is the expectation in \(x\) only. The last coordinate \(x_{i,\ell }\) in (6) is independent for all pairs \((i,\ell )\), so it plays the role of “flipping a coin” with the expected value \(\sigma (w,u_\ell ,v_i)\). In fact, given the function (7), we can, obviously, redefine \(s\) by

$$\begin{aligned} s(w,u_\ell ,v_i,x_{i,\ell }) = 2 {\mathrm{I}}\Bigl (x_{i,\ell } \le \frac{1+ \sigma (w,u_\ell ,v_i) }{2}\Bigr ) -1 \end{aligned}$$
(8)

without affecting the distribution of the array \((s_i^\ell )\). This allows us to separate the randomness of the last coordinate \(x_{i,\ell }\) from the randomness of the array \((\sigma (w,u_\ell ,v_i))\) generated by the function \(\sigma (w,u,v)\).
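The coin-flipping mechanism in (8) is easy to verify by simulation: conditionally on \(\sigma (w,u_\ell ,v_i)\), the spin is a \(\pm 1\) variable with exactly that conditional mean. A minimal Python check (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def spin(sigma_val, x):
    # s = 2 * I(x <= (1 + sigma_val)/2) - 1, as in (8): a +-1 coin flip
    # whose conditional mean given sigma_val is exactly sigma_val
    return 2 * (x <= (1 + sigma_val) / 2) - 1

x = rng.uniform(size=200_000)
s = spin(0.3, x)
empirical_mean = float(s.mean())   # should be close to 0.3
```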

Then we change the perspective as follows. Let \(du\) and \(dv\) denote the Lebesgue measure on \([0,1]\) and let us define a (random) probability measure

$$\begin{aligned} G = G_w = du \circ \bigl (u\rightarrow \sigma (w,u,\cdot )\bigr )^{-1} \end{aligned}$$
(9)

on the space of functions of \(v\in [0,1]\),

$$\begin{aligned} H = \bigl \{ \Vert \sigma \Vert _\infty \le 1 \bigr \} \end{aligned}$$
(10)

(the unit ball of \(L^\infty \)), equipped with the topology of \(L^2([0,1], dv)\). We will denote by \(\sigma ^1\cdot \sigma ^2\) the scalar product in \(L^2([0,1], dv)\) and by \(\Vert \sigma \Vert \) the corresponding \(L^2\) norm. The random measure \(G\) in (9) is what we call the asymptotic Gibbs measure, which encodes the limit \(\mu \in \mathcal{M}\) above. The whole process of generating spins according to \(\mu \in \mathcal{M}\) can now be visualized in several steps. First, we generate the Gibbs measure \(G=G_w\) using the uniform random variable \(w\). An i.i.d. sequence \(\sigma ^\ell = \sigma (w,u_{\ell },\cdot )\) for \(\ell \ge 1\) of replicas from \(G\) gives us a sequence of functions in \(H\). Then, we plug in i.i.d. uniform random variables \((v_i)_{i\ge 1}\) into these functions to obtain the array \(\sigma ^\ell (v_i) = \sigma (w,u_\ell ,v_i)\) and, finally, use it to generate spins as in (8). From now on, we will keep the dependence of \(G\) on \(w\) implicit, denote i.i.d. replicas from \(G\) by \((\sigma ^\ell )_{\ell \ge 1}\) (which are now functions on \([0,1]\), so the random variables \((u_{\ell })\) no longer appear explicitly), and denote the sequence of spins (8) corresponding to the replica \(\sigma ^\ell \) by

$$\begin{aligned} S(\sigma ^\ell ) = \Bigl ( 2 {\mathrm{I}}\Bigl (x_{i,\ell } \le \frac{1+ \sigma ^\ell (v_i) }{2}\Bigr ) -1 \Bigr )_{i\ge 1} \in \{-1,+1\}^\mathbb {N}. \end{aligned}$$
(11)

Given \(n\ge 1\) and replicas \(\sigma ^1,\ldots , \sigma ^n\), we will denote the array of spins corresponding to these replicas by

$$\begin{aligned} S^n = \bigl (S(\sigma ^\ell ) \bigr )_{1\le \ell \le n}. \end{aligned}$$
(12)

We will denote by \(\langle \cdot \rangle \) the average with respect to \(G^{\otimes \infty }\) (corresponding to the average in \((u_\ell )_{\ell \ge 1}\) in the sequence \((\sigma (w,u_{\ell },\cdot ))_{\ell \ge 1}\)) and by \(\mathbb {E}\) the expectation with respect to \(w\), \((v_i)\) and \((x_{i,\ell })\). In the definition of \(\langle \cdot \rangle \) one can also include averaging in the random variables \((x_{i,\ell })\), since they depend on the replica index \(\ell \), and such a convention would be especially necessary if we dealt with cavity computations (see e.g. [19, 22]), where averaging over the spins \(S(\sigma ^\ell )\) can also appear in the denominator. However, throughout this paper this will not happen and, by the linearity of expectation, we can think of averaging in \((x_{i,\ell })\) as a part of the expectation \(\mathbb {E}\).

Because of the geometric nature of the asymptotic Gibbs measures \(G\) as measures on the subset of \(L^2([0,1],dv)\), the distance and scalar product between replicas play a crucial role in the description of the structure of \(G\). We will denote the scalar product between replicas \(\sigma ^\ell \) and \(\sigma ^{\ell '}\) by \(R_{\ell ,\ell '} = \sigma ^\ell \cdot \sigma ^{\ell '}\), which is more commonly called the overlap of \(\sigma ^\ell \) and \(\sigma ^{\ell '}\). Let us notice that the overlap \(R_{\ell ,\ell '}\) is a function of spin sequence (11) generated by \(\sigma ^\ell \) and \(\sigma ^{\ell '}\) since, by the strong law of large numbers,

$$\begin{aligned} R_{\ell ,\ell '} = \int \! \sigma ^\ell (v) \sigma ^{\ell '}(v)\, dv = \lim _{j\rightarrow \infty } \frac{1}{j}\sum _{i=1}^j S\bigl (\sigma ^\ell \bigr )_i \,S\bigl (\sigma ^{\ell '}\bigr )_i \end{aligned}$$
(13)

almost surely. We mention this here just to emphasize an obvious point that the array \(S^n\) in (12) contains much more information about the replicas on the space \(H\) than just their overlaps. For example, one can similarly compute the multi-overlaps between replicas,

$$\begin{aligned} R_{\ell _1,\ldots ,\ell _n} = \int \! \sigma ^{\ell _1}(v) \cdots \sigma ^{\ell _n}(v)\, dv = \lim _{j\rightarrow \infty } \frac{1}{j}\sum _{i=1}^j S\bigl (\sigma ^{\ell _1}\bigr )_i \cdots S\bigl (\sigma ^{\ell _n}\bigr )_i. \end{aligned}$$
(14)
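The law-of-large-numbers identities (13) and (14) are also easy to see numerically. In the Python sketch below (purely illustrative; the two replica functions are arbitrary choices of ours), the empirical spin average over \(j\) coordinates recovers the overlap \(\int \sigma ^1(v)\sigma ^2(v)\,dv\).

```python
import numpy as np

rng = np.random.default_rng(2)
j = 100_000
v = rng.uniform(size=j)                     # shared coordinates v_i

# two illustrative replicas: functions on [0,1] with values in [-1,1]
sig1 = lambda t: np.cos(2 * np.pi * t)
sig2 = lambda t: 0.5 * np.ones_like(t)

def spins(p, rng):
    # S(sigma)_i = 2 * I(x_i <= (1 + sigma(v_i))/2) - 1, as in (11)
    return 2 * (rng.uniform(size=p.shape) <= (1 + p) / 2) - 1

s1, s2 = spins(sig1(v), rng), spins(sig2(v), rng)
empirical_overlap = float(np.mean(s1 * s2))  # right-hand side of (13)
exact_overlap = 0.0                          # int_0^1 cos(2 pi v) * 0.5 dv = 0
```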

From now on we will assume that the measure \(G\) satisfies the Ghirlanda–Guerra identities, which means that for any \(n\ge 2,\) any bounded measurable function \(f\) of the spins \(S^n\) in (12) and any bounded measurable function \(\psi \) of one overlap,

$$\begin{aligned} \mathbb {E}\bigl \langle f(S^n)\psi (R_{1,n+1}) \bigr \rangle = \frac{1}{n} \mathbb {E}\bigl \langle f(S^n) \bigr \rangle \mathbb {E}\bigr \langle \psi (R_{1,2})\bigr \rangle + \frac{1}{n}\sum _{\ell =2}^{n}\mathbb {E}\bigl \langle f(S^n)\psi (R_{1,\ell })\bigr \rangle . \end{aligned}$$
(15)

Another way to express the Ghirlanda–Guerra identities is to say that, conditionally on \(S^n\), the law of \(R_{1,n+1}\) is given by the mixture

$$\begin{aligned} \frac{1}{n} \zeta + \frac{1}{n} \sum _{\ell =2}^n \delta _{R_{1,\ell }}, \end{aligned}$$
(16)

where \(\zeta \) denotes the distribution of \(R_{1,2}\) under the measure \(\mathbb {E}G^{\otimes 2}\),

$$\begin{aligned} \zeta (\ \cdot \ ) = \mathbb {E}G^{\otimes 2}\bigl (R_{1,2}\in \ \cdot \ \bigr ). \end{aligned}$$
(17)

The identities (15) are usually proved for the function \(f\) of the overlaps \((R_{\ell ,\ell '})_{\ell ,\ell '\le n}\) instead of \(S^n\), but exactly the same proof yields (15) as well (see e.g. Section 3.2 in [21]). It is well known that these identities arise from the Gaussian integration by parts of a certain Gaussian perturbation Hamiltonian against the test function \(f\), and one is free to choose this function to depend on all spins and not only overlaps.
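The conditional description (16) is a sampling rule, and as such it can be written in a few lines of Python (an illustration of the mixture only, with a toy choice of \(\zeta \)): given the overlaps of the first \(n\) replicas with replica \(1\), the next overlap \(R_{1,n+1}\) is a fresh draw from \(\zeta \) with probability \(1/n\) and a copy of each existing \(R_{1,\ell }\) with probability \(1/n\).

```python
import numpy as np

def sample_next_overlap(overlaps, zeta_sample, rng):
    # Conditionally on n replicas, R_{1,n+1} is drawn from the mixture (16):
    # a fresh draw from zeta with probability 1/n, and each existing value
    # R_{1,ell}, ell = 2..n, with probability 1/n.
    n = len(overlaps) + 1            # overlaps = (R_{1,2}, ..., R_{1,n})
    if rng.uniform() < 1.0 / n:
        return zeta_sample(rng)      # new value from zeta
    return rng.choice(overlaps)      # copy an existing overlap

rng = np.random.default_rng(3)
zeta_sample = lambda rng: rng.choice([0.2, 0.7], p=[0.4, 0.6])  # toy zeta
draws = np.array([sample_next_overlap([0.2, 0.7, 0.7], zeta_sample, rng)
                  for _ in range(10_000)])
```

With \(n=4\), the frequency of the value \(0.7\) should approach \(\frac{1}{4}\cdot 0.6 + \frac{2}{4} = 0.65\).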

In this paper we will be interested in saying something about the distribution of the array of spins generated by the Gibbs measure \(G\); if one is interested only in the behavior of the overlaps, it is now known that the Ghirlanda–Guerra identities completely describe the measure in this sense in terms of the functional order parameter \(\zeta \) in (17). Let us first list several purely geometric consequences.

  (i) ([29] or Theorem 2.16 in [21]) By Talagrand’s positivity principle, the overlaps can take only nonnegative values, \(\zeta ([0,\infty ))=1\).

  (ii) ([16] or Theorem 2.15 in [21]) With probability one over the choice of the random measure \(G\) the following holds. If \(q^*\) is the largest point of the support \(\mathrm{supp}(\zeta )\) of the measure \(\zeta \), then \(G(\sigma : \Vert \sigma \Vert ^2 = q^*)=1\). If \(\zeta (\{q^*\})>0\) then \(G\) is purely atomic; otherwise, \(G\) has no atoms.

  (iii) ([20] or Theorem 2.14 in [21]) With probability one, the support of \(G\) is ultrametric, i.e. \(G^{\otimes 3}(R_{2,3} \ge \min (R_{1,2},R_{1,3}))=1\).

When \(G\) is purely atomic, its atoms are called pure states. Otherwise, we will define pure states in some approximate sense that will be explained more precisely below. By ultrametricity, for any \(q\ge 0\), the relation defined by

$$\begin{aligned} \sigma \sim _q \sigma ' \Longleftrightarrow \sigma \cdot \sigma ' \ge q \end{aligned}$$
(18)

is an equivalence relation on the support of \(G\). We will call these \(\sim _q\) equivalence clusters simply \(q\)-clusters. Throughout the paper we will use the convention that, whenever we write \(\sigma \), it belongs to the support of \(G\) rather than the ambient space \(H\).

To state our main result, let us first describe what is called the \(r\)-step replica symmetry breaking (RSB) approximation, which means that we will group the values of the overlap into \(r+1\) groups. Let us fix an integer \(r\ge 1\) throughout the paper. Consider an infinitary rooted tree of depth \(r\) with the vertex set

$$\begin{aligned} \mathcal{A}= \mathbb {N}^0 \cup \mathbb {N}\cup \mathbb {N}^2 \cup \cdots \cup \mathbb {N}^r, \end{aligned}$$
(19)

where \(\mathbb {N}^0 = \{*\}\), \(*\) is the root of the tree and each vertex \(\alpha =(n_1,\ldots ,n_p)\in \mathbb {N}^{p}\) for \(p\le r-1\) has children

$$\begin{aligned} \alpha n : = (n_1,\ldots ,n_p,n) \in \mathbb {N}^{p+1} \end{aligned}$$

for all \(n\in \mathbb {N}\). Each vertex \(\alpha \) is connected to the root \(*\) by the path

$$\begin{aligned} * \rightarrow n_1 \rightarrow (n_1,n_2) \rightarrow \cdots \rightarrow (n_1,\ldots ,n_p) = \alpha . \end{aligned}$$

We will denote the set of vertices in this path (excluding the root) by

$$\begin{aligned} p(\alpha ) = \bigl \{n_1, (n_1,n_2),\ldots ,(n_1,\ldots ,n_p) \bigr \}. \end{aligned}$$
(20)

We will denote by \(|\alpha |\) the distance of \(\alpha \) from the root (the same as cardinality of \(p(\alpha )\)). We will write \(\alpha \succ \beta \) if \(\beta \in p(\alpha )\cup \{*\}\) and say that \(\alpha \) is a descendant of \(\beta \), and \(\beta \) is an ancestor of \(\alpha \). We will sometimes denote the set of leaves \(\mathbb {N}^r\) of \(\mathcal{A}\) by \(\mathcal{L}(\mathcal{A})\). For any \(\alpha , \beta \in \mathcal{A}\), let

$$\begin{aligned} \alpha \wedge \beta := |p(\alpha ) \cap p(\beta ) | \end{aligned}$$
(21)

be the number of common vertices in the paths from the root to the vertices \(\alpha \) and \(\beta \). In other words, \(\alpha \wedge \beta \) is the distance of the lowest common ancestor of \(\alpha \) and \(\beta \) from the root.
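The combinatorics of \(p(\alpha )\) in (20) and \(\alpha \wedge \beta \) in (21) amount to prefixes and longest common prefixes of tuples; the following short Python sketch (our own illustration, with vertices encoded as tuples of integers) makes this concrete.

```python
def path(alpha):
    # p(alpha) from (20): all prefixes of alpha = (n_1, ..., n_p),
    # root excluded
    return [alpha[:k] for k in range(1, len(alpha) + 1)]

def wedge(alpha, beta):
    # alpha ^ beta from (21): the number of common vertices on the two
    # root paths, i.e. the length of the longest common prefix
    k = 0
    while k < min(len(alpha), len(beta)) and alpha[k] == beta[k]:
        k += 1
    return k
```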

Let us now consider \(r+1\) disjoint intervals of the type

$$\begin{aligned} I_p = [q_p,q_p') \ \text{ or } \ I_p = \{q_p\} \ \text{ for } \ 0\le p \le r \end{aligned}$$
(22)

such that

$$\begin{aligned} \mathrm{supp}(\zeta )\subseteq \bigcup _{0\le p\le r} I_p \ \text{ and } \ \zeta (I_p) >0 \ \text{ for } \text{ all } \ 0\le p\le r. \end{aligned}$$
(23)

We will allow the possibility of \(I_p = \{q_p\}\) only when the point \(q_p\) is an isolated atom ‘from the right’, namely, \(\zeta (\{q_p\})>0\) and \(\zeta ((q_p,q_p +{\varepsilon }))=0\) for some \({\varepsilon }>0.\) The idea here is that we will use the intervals \(I_p\) to discretize the values of the overlap (so we imagine them as being small), but when we have such an isolated atom, we can simply include it if we wish. For example, when the overlap takes only finitely many values, we can ‘discretize exactly’ by considering only these values.
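In the special case where the intervals abut, i.e. \(I_p = [q_p, q_{p+1})\) with \(I_r\) extending to the right, the discretization of an overlap value reduces to a binary search; the helper below is our own illustration of this special case of (22), not a construction from the paper.

```python
import numpy as np

def interval_index(q, overlap):
    # Map an overlap value to the index p with overlap in I_p, in the
    # special case I_p = [q_p, q_{p+1}) for p < r and I_r = [q_r, inf);
    # q = (q_0, ..., q_r) is the increasing sequence of left endpoints.
    return int(np.searchsorted(np.asarray(q), overlap, side='right') - 1)
```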

Without loss of generality, we can also assume that \(q_p < q_{p+1}\) for all \(p\le r-1\), and \(q_0\ge 0\) by Talagrand’s positivity principle. Later on we will need the sequence

$$\begin{aligned} 0= \zeta _{-1} <\zeta _0 <\cdots < \zeta _{r-1} <\zeta _r = 1 \end{aligned}$$
(24)

such that \(\zeta _p - \zeta _{p-1} = \zeta (I_p)\) for \(0\le p\le r.\) Let us now enumerate all the \(q_p\)-clusters defined by (18) according to their Gibbs weights as follows. Let \(H_{*}\) be the entire support of \(G\) so that \(V_* = G(H_*) =1\). Next, the support is split into \(q_1\)-clusters \((H_n)_{n\ge 1}\), which are then enumerated in the non-increasing order of their weights \(V_n = G(H_n)\),

$$\begin{aligned} V_1 \ge V_2 \ge \cdots \ge V_n \ge \cdots \ge 0. \end{aligned}$$
(25)

We then continue recursively over \(p\le r-1\) and enumerate the \(q_{p+1}\)-subclusters \((H_{\alpha n})_{n\ge 1}\) of a cluster \(H_\alpha \) for \(\alpha \in \mathbb {N}^p\) in the non-increasing order of their weights \(V_{\alpha n} = G(H_{\alpha n})\),

$$\begin{aligned} V_{\alpha 1} \ge V_{\alpha 2} \ge \cdots \ge V_{\alpha n} \ge \cdots \ge 0. \end{aligned}$$
(26)

It is a well-known fact that each cluster \(H_\alpha \) is split into infinitely many subclusters \((H_{\alpha n})_{n\ge 1}\) and that their weights are pairwise distinct and strictly positive; this is another consequence of the Ghirlanda–Guerra identities. Therefore, all the inequalities in (25) and (26) are strict. More specifically, it is well known that the Ghirlanda–Guerra identities imply that the cluster weights

$$\begin{aligned} V = (V_\alpha )_{\alpha \in \mathcal{A}} \end{aligned}$$
(27)

are generated by the Ruelle probability cascades (RPC) [26]. This will be reviewed in Sect. 4 (see also Chapter 2 in [21]). We will call the \(q_r\)-clusters \(H_\alpha \) indexed by the leaves \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) the pure states. Of course, if \(\zeta (\{q^*\})>0\) then one can take \(I_r = \{q^*\}\) in (22) to ensure that the pure states are exactly the atoms of \(G\). (For a way to construct pure states for the non-asymptotic Gibbs measure \(G_N\) in (4), see [31].)
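Ahead of the review in Sect. 4, one can already simulate cascade-like weights: at each vertex, the children weights come from the decreasing points of a Poisson process with intensity \(\zeta _p x^{-1-\zeta _p}\), normalized. The Python sketch below is only a finitary caricature (we truncate to \(n\) children per vertex and normalize within the truncation, so it approximates rather than reproduces the infinite cascade).

```python
import numpy as np

def pd_weights(zeta_p, n, rng):
    # First n points of a Poisson process with intensity
    # zeta_p * x^(-1 - zeta_p) on (0, inf), in decreasing order,
    # normalized: the standard building block of the Ruelle cascades.
    arrivals = np.cumsum(rng.exponential(size=n))  # rate-1 Poisson arrivals
    points = arrivals ** (-1.0 / zeta_p)           # decreasing points
    return points / points.sum()

def rpc_weights(zetas, n, rng, prefix=()):
    # Attach weights V_alpha to an n-ary truncation of the tree (19);
    # zetas = (zeta_0, ..., zeta_{r-1}) are the parameters from (24),
    # and children of each vertex get fresh normalized weights.
    if not zetas:
        return {}
    out = {}
    w = pd_weights(zetas[0], n, rng)
    for k in range(n):
        child = prefix + (k + 1,)
        out[child] = w[k]
        for beta, wb in rpc_weights(zetas[1:], n, rng, child).items():
            out[beta] = w[k] * wb
    return out
```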

Notice that the diameter of a pure state \(H_\alpha \) for \(\alpha \in \mathbb {N}^r\) can be bounded in \(L^2\) by

$$\begin{aligned} \mathrm{diam}(H_\alpha ) \le \sqrt{2(q^*-q_r)}, \end{aligned}$$

and when \(q_r\) is close to \(q^*\), these clusters are small and can be well approximated by one point, for example, the \(G\)-barycenter of the cluster. We can take these barycenters as an approximate definition of pure states but, in order not to lose any information, we will encode a pure state by an infinite sample as follows. First of all, notice that sampling from \(G\) can now be done in two steps:

  1. Choose \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) according to the weights \((V_\alpha )_{\alpha \in \mathbb {N}^r}\).

  2. Sample from the pure state \(H_\alpha \) according to the conditional distribution

     $$\begin{aligned} G_\alpha (\ \cdot \ ) = \frac{G( \ \cdot \ \cap H_\alpha )}{G(H_\alpha )}. \end{aligned}$$
     (28)
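The two-step sampling just described is straightforward to code; the Python sketch below is our own schematic (the per-state samplers stand in for the conditional measures \(G_\alpha \) in (28), which the sketch does not construct).

```python
import numpy as np

def sample_replica(leaf_weights, pure_state_samplers, rng):
    # Two-step sampling from G: first choose a leaf alpha with
    # probability V_alpha, then draw from the conditional measure
    # G_alpha of that pure state, as in (28).
    leaves = list(leaf_weights)
    probs = np.array([leaf_weights[a] for a in leaves], dtype=float)
    k = rng.choice(len(leaves), p=probs / probs.sum())
    alpha = leaves[k]
    return alpha, pure_state_samplers[alpha](rng)
```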

For each \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\), let us consider an i.i.d. sample \((\sigma ^{\alpha \ell })_{\ell \ge 1}\) with the distribution \(G_\alpha \) and let these samples be independent over such \(\alpha \). As in (11), let us consider the sequence of spins

$$\begin{aligned} S(\sigma ^{\alpha \ell }) = \Bigl ( 2 {\mathrm{I}}\Bigl (x_{i,\alpha \ell } \le \frac{1+ \sigma ^{\alpha \ell }(v_i) }{2}\Bigr ) -1 \Bigr )_{i\ge 1} \end{aligned}$$
(29)

generated by \(\sigma ^{\alpha \ell }\) and let

$$\begin{aligned} S_\alpha = (S(\sigma ^{\alpha \ell }))_{\ell \ge 1}. \end{aligned}$$
(30)

This array of spins completely encodes the pure state \(H_\alpha \) for all practical purposes, if we remember that our main object of interest is the array of spins (6) generated by the measure \(G\).

To state our main result, it remains to recall the definition of hierarchical exchangeability introduced in [4]. Consider the following family of maps on the leaves \(\mathbb {N}^r\) of the tree \(\mathcal{A}\),

$$\begin{aligned} \mathcal{H} = \bigl \{ \pi : \mathbb {N}^r\rightarrow \mathbb {N}^r \,\bigr |\, \pi \text{ is } \text{ a } \text{ bijection }, \pi (\alpha )\wedge \pi (\beta ) = \alpha \wedge \beta \quad \text{ for } \text{ all } \alpha ,\beta \in \mathbb {N}^r \bigr \}.\nonumber \\ \end{aligned}$$
(31)

As explained in [4], the condition \(\pi (\alpha )\wedge \pi (\beta ) = \alpha \wedge \beta \) simply means that the genealogy on the tree is preserved after the permutation and such \(\pi \) can be realized as a recursive rearrangement of children of each vertex starting from the root. We say that an array of random variables \((X_\alpha )_{\alpha \in \mathbb {N}^r}\) taking values in a standard Borel space is hierarchically exchangeable if

$$\begin{aligned} \bigl (X_{\pi (\alpha )} \bigr )_{\alpha \in \mathbb {N}^r} \mathop {=}\limits ^{d} \bigl (X_\alpha \bigr )_{\alpha \in \mathbb {N}^r} \end{aligned}$$
(32)

for all \(\pi \in \mathcal{H}\). Our main result will be the following structure theorem for the Gibbs measure \(G\).

Theorem 1

If (15) holds then the array (30) of spins \((S_\alpha )_{\alpha \in \mathbb {N}^r}\) within pure states is hierarchically exchangeable and independent of the cluster weights \((V_\alpha )_{\alpha \in \mathcal{A}}\) in (27).

If we write \(S_\alpha = (S_{\alpha ,i})_{i\ge 1}\), by making the dependence on the spin index \(i\) in (29) explicit, then it is obvious that the distribution of the array \((S_{\alpha ,i})\) is also invariant under the permutation of spins,

$$\begin{aligned} \bigl (S_{\pi (\alpha ), \rho (i)} \bigr )_{\alpha \in \mathbb {N}^r, i\in \mathbb {N}} \mathop {=}\limits ^{d} \bigl (S_{\alpha ,i} \bigr )_{\alpha \in \mathbb {N}^r, i\in \mathbb {N}} \end{aligned}$$
(33)

for all \(\pi \in \mathcal{H}\) and all bijections \(\rho :\mathbb {N}\rightarrow \mathbb {N}\). The Aldous–Hoover representation was generalized to such hierarchically exchangeable arrays in [4] and, in particular, Theorem 2 in [4] implies the following.

Corollary 1

If (15) holds then the array \((S_{\alpha ,i})_{\alpha \in \mathbb {N}^r, i\in \mathbb {N}}\) can be generated in distribution as

$$\begin{aligned} S_{\alpha ,i} = f\bigl ( \omega _*, (\omega _{\beta })_{\beta \in p(\alpha )}, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )} \bigr ), \end{aligned}$$
(34)

where \(f: [0,1]^{2(r+1)} \rightarrow \{-1,+1\}^\mathbb {N}\) is a measurable function and \(\omega _\alpha ,\omega _\alpha ^i\) for \(\alpha \in \mathcal{A}\) and \(i\in \mathbb {N}\) are i.i.d. random variables with the uniform distribution on \([0,1]\).

Note a slight difference in notation between this paper and [4]: here we chose not to include the root \(*\) in the path (20), while in [4] it was included. This is why we write \(\omega _*\) and \(\omega _*^i\) in (34) separately. Let us now explain the connection of the representation (34) to the Mézard–Parisi ansatz and what seems to be the main obstacle left. First of all, if we denote the barycenter of the pure state \(H_\alpha \) by

$$\begin{aligned} {\bar{\sigma }}^\alpha = \int \limits _{H_\alpha } \sigma \, dG_\alpha (\sigma ) \end{aligned}$$
(35)

then, by the strong law of large numbers, (29) implies that

$$\begin{aligned} m^{\alpha } =(m_i^\alpha )_{i\ge 1}:= \bigl ({\bar{\sigma }}^\alpha (v_i)\bigr )_{i\ge 1} = \lim _{n\rightarrow \infty } \frac{1}{n} \sum _{\ell =1}^n S(\sigma ^{\alpha \ell }) \end{aligned}$$
(36)

almost surely. In the case when the pure state consists of one point \({\bar{\sigma }}^\alpha \) (for example, we mentioned above that if \(\zeta (\{q^*\})>0\) and we choose \(I_r = \{q^*\}\) then all pure states will be points) the vector \(m^\alpha \) is called the magnetization inside the pure state \(\alpha \), otherwise, we can view it as an approximate notion of magnetization. The representation (34) and (36) imply that

$$\begin{aligned} m^\alpha _i = m\bigl ( \omega _*, (\omega _{\beta })_{\beta \in p(\alpha )}, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )} \bigr ) \end{aligned}$$
(37)

for some measurable function \(m: [0,1]^{2(r+1)} \rightarrow [-1,1]\).

What the Mézard–Parisi ansatz predicts is that, when \(r\) is getting large and all the intervals \(I_p\) in (22) are getting small (which means that the \(r\)-step RSB scheme gives a good approximation of the overlap distribution), the magnetizations inside the pure states can be generated approximately (in the sense of distribution) by

$$\begin{aligned} m^\alpha _i = m\bigl (\omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )} \bigr ) \end{aligned}$$
(38)

for some measurable function \(m: [0,1]^{r+1} \rightarrow [-1,1]\). As we already mentioned above, only the so-called \(1\)-RSB case corresponding to \(r=1\) was described in detail in [13] (see Section V there), but the general case is just a natural extension. The function \(m\) is the order parameter of the Mézard–Parisi ansatz in the sense that one can express the free energy by some variational formula in terms of \(m\). Obviously, (38) can hold only if the spin magnetizations are generated independently over the spin index \(i\ge 1\) within pure states (which was, in fact, an assumption in [13]), but this assumption can be relaxed and the Mézard–Parisi formula for the free energy can be proved using the approach of Theorem \(2\) in [19] under the slightly weaker hypothesis that the magnetizations inside the pure states are generated approximately by

$$\begin{aligned} m^\alpha _i = m\bigl (\omega _*, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )} \bigr ) \end{aligned}$$
(39)

for some measurable function \(m: [0,1]^{r+2} \rightarrow [-1,1]\). The difference between (37) and (39) can be informally expressed as follows. In (39), we have one (random) function \(m(\omega _*, \ \cdot \ ,\ \cdot \ )\) that is used to generate spin magnetizations \(m^\alpha _i\) in each pure state \(\alpha \) using the randomness \(\omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha )}\) along the path from the root to \(\alpha \). In (37), for each pure state \(\alpha \) we first generate its own function \(m(\omega _*, (\omega _{\beta })_{\beta \in p(\alpha )}, \ \cdot \ ,\ \cdot \ )\) in a hierarchically symmetric fashion and then use it to generate spin magnetizations inside that pure state.
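The structural difference between (37) and (39) is simply which arguments the generating function sees; the Python sketch below (our own illustration with \(r=2\), an arbitrary choice of \(m\), and vertices encoded as tuples) makes the bookkeeping explicit. Changing a non-spin vertex variable \(\omega _\beta \) changes the output of (37) but leaves (39) untouched.

```python
import numpy as np

def path(alpha):
    # p(alpha): the prefixes of alpha, root excluded, as in (20)
    return [alpha[:k] for k in range(1, len(alpha) + 1)]

def magnetization_37(m, omega, alpha, i):
    # (37): the function sees the vertex randomness (omega_beta) along
    # p(alpha), so each pure state first draws its own random function
    # before the spin-indexed randomness enters
    args = ([omega['*']] + [omega[b] for b in path(alpha)]
            + [omega[('*', i)]] + [omega[(b, i)] for b in path(alpha)])
    return m(*args)

def magnetization_39(m, omega, alpha, i):
    # (39): one random function m(omega_*, ., .) shared by all pure
    # states; only spin-indexed randomness varies along the path
    args = ([omega['*']] + [omega[('*', i)]]
            + [omega[(b, i)] for b in path(alpha)])
    return m(*args)
```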

A possible way to go from (37) to (39) is to show that multi-overlaps are functions of the overlaps, which means the following. Let us consider \(n\) pure state indices \(\alpha _1,\ldots ,\alpha _n \in \mathbb {N}^r\). If we compare the representations of \(m^\alpha _i\) in terms of the barycenter \({\bar{\sigma }}^\alpha \) in (36) and in terms of the function \(m\) in (37) then the so called multi-overlap between these \(n\) barycenters can be written as

$$\begin{aligned} R_{\alpha _1,\ldots ,\alpha _n} := \int \prod _{\ell \le n}{\bar{\sigma }}^{\alpha _\ell }(v) \,dv = \mathbb {E}_i \prod _{\ell \le n} m\bigl ( \omega _*, (\omega _{\beta })_{\beta \in p(\alpha _\ell )}, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha _\ell )} \bigr ), \end{aligned}$$

where \(\mathbb {E}_i\) denotes the average in the random variables that depend on the spin index \(i\). If (39) holds then, similarly,

$$\begin{aligned} R_{\alpha _1,\ldots ,\alpha _n} := \int \prod _{\ell \le n}{\bar{\sigma }}^{\alpha _\ell }(v) \,dv = \mathbb {E}_i \prod _{\ell \le n} m\bigl ( \omega _*, \omega _*^i, (\omega _{\beta }^i)_{\beta \in p(\alpha _\ell )} \bigr ), \end{aligned}$$

which clearly depends only on \((\alpha _\ell \wedge \alpha _{\ell '})_{1\le \ell ,\ell '\le n}\). In the opposite direction, it is also not difficult to show that if \(R_{\alpha _1,\ldots ,\alpha _n}\) depends only on \((\alpha _\ell \wedge \alpha _{\ell '})_{1\le \ell ,\ell '\le n}\) for all \(n\ge 2\) then (37) can be replaced by (39). Of course, in the \(r\)-step RSB approximation, \(\alpha _\ell \wedge \alpha _{\ell '}\) describes the overlap \({\bar{\sigma }}^{\alpha _\ell }\cdot {\bar{\sigma }}^{\alpha _{\ell '}}\) only approximately, so the statement “multi-overlaps are functions of overlaps” should be understood in an approximate sense for a finite \(r\)-step RSB approximation and should only become exact as \(r\) goes to infinity, or if the distribution of the overlap is indeed concentrated on \(r+1\) points.

In the next section, we will begin with a review of the Ghirlanda–Guerra identities. In Sect. 3, we will prove some analogue of Theorem 1 at the level of the sample from the Gibbs measure rather than working with the pure states directly. In Sect. 4, we will prove a technical result about the weights in the RPC and, in Sect. 5, we will deduce Theorem 1 from the main result in Sect. 3 by sending the sample size to infinity.

2 The Ghirlanda–Guerra identities

In this section, we will explain in what sense the Ghirlanda–Guerra identities are valid in diluted models, and we will use the example of the random \(K\)-sat model (2) for this purpose. For each \(p\ge 1\), let us consider the process \(g_p(\sigma )\) on \(\Sigma _N = \{-1,+1\}^N\) given by

$$\begin{aligned} g_{p}(\sigma ) = \frac{1}{N^{p/2}} \sum _{i_1,\ldots ,i_p = 1}^N g_{i_1,\ldots , i_p} \sigma _{i_1}\ldots \sigma _{i_p}, \end{aligned}$$
(40)

where \((g_{i_1,\ldots , i_p})\) are i.i.d. standard Gaussian random variables, and define

$$\begin{aligned} g(\sigma ) = \sum _{p\ge 1} 2^{-p} x_pg_{p}(\sigma ) \end{aligned}$$
(41)

for parameters \((x_p)_{p\ge 1}\) that take values in the interval \(x_p\in [0,3]\) for all \(p\ge 1\). It is easy to check that the variance of this Gaussian process satisfies \(\mathbb {E}g(\sigma )^2 \le 3.\) Given the Hamiltonian \(H_N(\sigma )\) in (2), let us consider the perturbed Hamiltonian

$$\begin{aligned} H_N^{\mathrm {pert}}(\sigma ) = H_N(\sigma ) - \frac{s}{\beta } g(\sigma ) \end{aligned}$$
(42)

for some parameter \(s\ge 0.\) It is easy to see, using Jensen’s inequality on each side, that

$$\begin{aligned} \frac{1}{N}\mathbb {E}\log \sum _{\sigma \in \Sigma _N} \exp \bigl (-\beta H_N(\sigma )\bigr )&\le \ \frac{1}{N}\mathbb {E}\log \sum _{\sigma \in \Sigma _N} \exp \bigl (-\beta H_N^{\mathrm {pert}}(\sigma ) \bigr ) \\&\le \ \frac{1}{N}\mathbb {E}\log \sum _{\sigma \in \Sigma _N} \exp \bigl (-\beta H_N(\sigma ) \bigr ) + \frac{3s^2}{2N}. \end{aligned}$$

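The two bounds above rest on two elementary facts: \(\mathbb {E}g(\sigma )^2 = \sum _{p\ge 1} 4^{-p}x_p^2 \le 3\) (a geometric series, maximized at \(x_p=3\), since \(\mathbb {E}g_p(\sigma )^2=1\)), and the Gaussian moment generating function, which produces the correction \(3s^2/2N\). As a purely illustrative aside (not part of the argument), both quantities are easy to check numerically; the parameter values below are arbitrary:

```python
# E g(sigma)^2 = sum_{p>=1} 4^{-p} x_p^2, since E g_p(sigma)^2 = 1 and the
# processes g_p are independent; with x_p in [0, 3] the series is at most 3.
def variance_bound(x, terms=60):
    # x: a function p -> x_p in [0, 3]; 60 terms suffice at double precision
    return sum(4.0 ** -p * x(p) ** 2 for p in range(1, terms + 1))

worst = variance_bound(lambda p: 3.0)     # worst case: x_p = 3 for all p
assert worst <= 3.0

# the resulting shift of the free energy is at most 3 s^2 / (2N), which
# vanishes whenever s = s_N = N^gamma with gamma < 1/2, as in (43) below
def correction(N, gamma):
    return 3.0 * N ** (2 * gamma) / (2.0 * N)
```

The worst-case sum is \(9\sum _{p\ge 1}4^{-p}=3\), which the partial sums approach from below.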
Therefore, if we let \(s\) in (42) depend on \(N\), \(s=s_N\), in such a way that

$$\begin{aligned} \lim _{N\rightarrow \infty } N^{-1} s_N^2 = 0, \end{aligned}$$
(43)

then the limit of the free energy is not affected by the perturbation term \((s/\beta )g(\sigma ).\) Since our ultimate goal is to find the formula for the free energy in the limit \(N\rightarrow \infty \), adding a perturbation term is allowed if it helps us in some other way. Of course, the real purpose of adding the perturbation term is to obtain the Ghirlanda–Guerra identities for the Gibbs measure

$$\begin{aligned} G_N(\sigma ) = \frac{\exp (-\beta H_N^{\mathrm {pert}}(\sigma ) )}{Z_N} \, \text{ where } \, Z_N = \sum _{\sigma \in \Sigma _{N}} \exp \bigl (-\beta H_N^{\mathrm {pert}}(\sigma ) \bigr ), \end{aligned}$$
(44)

which now corresponds to the perturbed Hamiltonian (42). In other words, even though the perturbation term does not affect the free energy, it will affect the Gibbs measure ‘in a good way’ by regularizing it and forcing it to satisfy the Ghirlanda–Guerra identities. From now on, everything is defined with respect to this Gibbs measure corresponding to the perturbed Hamiltonian, such as any limit \(\mu \in \mathcal{M}\) and the corresponding asymptotic Gibbs measure \(G\).
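To make the construction concrete, here is a small, purely illustrative Python sketch (not from the paper) that builds the \(K\)-sat Hamiltonian (2), a truncation of the perturbation (40)–(41) at \(p\le 3\) (the \(2^{-p}\) weights make the tail negligible for this illustration), and the Gibbs weights (44) by exact enumeration for a tiny \(N\); all parameter values are arbitrary:

```python
import itertools, math, random

rng = random.Random(0)
N, K, alpha, beta, s = 6, 3, 1.0, 1.0, 0.5

def poisson(lam, rng):
    # Knuth's sampler; adequate for the small mean used here
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

# disorder of the K-sat Hamiltonian (2): pi(alpha N) clauses, each a list
# of K pairs (epsilon_{j,k}, i_{j,k})
clauses = [[(rng.choice([-1, 1]), rng.randrange(N)) for _ in range(K)]
           for _ in range(poisson(alpha * N, rng))]

def H(sigma):
    return sum(math.prod((1 + eps * sigma[i]) / 2 for eps, i in clause)
               for clause in clauses)

# perturbation (40)-(41), truncated at p <= 3; x_p = 1.5 lies in [0, 3]
x = {1: 1.5, 2: 1.5, 3: 1.5}
g_disorder = {p: {idx: rng.gauss(0, 1)
                  for idx in itertools.product(range(N), repeat=p)}
              for p in x}

def g(sigma):
    return sum(2.0 ** -p * x[p] * N ** (-p / 2)
               * sum(c * math.prod(sigma[i] for i in idx)
                     for idx, c in coeffs.items())
               for p, coeffs in g_disorder.items())

def H_pert(sigma):                       # perturbed Hamiltonian (42)
    return H(sigma) - (s / beta) * g(sigma)

# Gibbs weights (44) by exact enumeration of Sigma_N
configs = list(itertools.product([-1, 1], repeat=N))
Z = sum(math.exp(-beta * H_pert(sig)) for sig in configs)
G = {sig: math.exp(-beta * H_pert(sig)) / Z for sig in configs}
```

Each clause term \(\theta _k(\sigma )\) lies in \([0,1]\), so \(0\le H_N(\sigma )\le \pi (\alpha N)\), a fact used repeatedly in the concentration estimates below.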

Since we will soon pass to the limit \(N\rightarrow \infty \), it should not cause any confusion if we temporarily denote by \(\langle \cdot \rangle \) the average with respect to \(G_N^{\otimes \infty }\) in (44), let \((\sigma ^\ell )_{\ell \ge 1}\) be a sequence of replicas from \(G_N\) and denote by

$$\begin{aligned} R_{\ell ,\ell '} = \frac{1}{N} \sum _{i=1}^N \sigma _i^\ell \sigma _i^{\ell '} \end{aligned}$$
(45)

the overlap between replicas \(\sigma ^\ell \) and \(\sigma ^{\ell '}\). Let us consider the function

$$\begin{aligned} \varphi = \log Z_N = \log \sum _{\sigma \in \Sigma _N} \exp \bigl ( -\beta H_N(\sigma ) + s g(\sigma )\bigr ) = \log \sum _{\sigma \in \Sigma _N} \exp \bigl ( -\beta H_N^{\mathrm {pert}}(\sigma )\bigr ), \end{aligned}$$
(46)

viewed as a random function \(\varphi = \varphi \bigl ((x_p)\bigr )\) of the parameters \((x_p)\) in (41), and suppose that

$$\begin{aligned} \sup \Bigl \{ \mathbb {E}|\varphi - \mathbb {E}\varphi | \ \bigr | \ 0\le x_p\le 3, p\ge 1\Bigr \}\le v_N(s) \end{aligned}$$
(47)

for some function \(v_N(s)\) that describes how well \(\varphi ((x_p))\) is concentrated around its expected value uniformly over all possible choices of the parameters \((x_p)\) from the interval \([0,3].\) Now, for any \(n\ge 2, p\ge 1\) and any function \(f=f(\sigma ^1,\ldots ,\sigma ^n)\) on \(\Sigma _N^n\) uniformly bounded by \(1\), let us define

$$\begin{aligned} \Delta (f,n,p) = \Bigl | \mathbb {E}\bigl \langle f R_{1,n+1}^p \bigr \rangle - \frac{1}{n}\mathbb {E}\bigl \langle f \bigr \rangle \mathbb {E}\bigl \langle R_{1,2}^p\bigr \rangle - \frac{1}{n}\sum _{\ell =2}^{n}\mathbb {E}\bigl \langle f R_{1,\ell }^p\bigr \rangle \Bigr |. \end{aligned}$$
(48)
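For concreteness, the three terms in (48) can be assembled by exact enumeration for a tiny system. The sketch below is illustrative only (at finite \(N\), and without the perturbation and the average over \((x_p)\), \(\Delta (f,n,p)\) need not be small); it computes \(\Delta (f,2,1)\) for a fixed Gibbs measure on \(\Sigma _4\) with an arbitrary choice of \(f\):

```python
import itertools, math, random

rng = random.Random(1)
N, n, p = 4, 2, 1
configs = [tuple(c) for c in itertools.product([-1, 1], repeat=N)]

# a fixed Gibbs measure on Sigma_4 (random but frozen weights)
w = [math.exp(rng.gauss(0, 1)) for _ in configs]
G = [wi / sum(w) for wi in w]

def R(s1, s2):                                    # overlap (45)
    return sum(a * b for a, b in zip(s1, s2)) / N

def f(s1, s2):                                    # a function bounded by 1
    return s1[0] * s2[0]

def avg(h, k):
    # <h(sigma^1, ..., sigma^k)> over k i.i.d. replicas, exact enumeration
    return sum(math.prod(G[i] for i in idx) * h(*[configs[i] for i in idx])
               for idx in itertools.product(range(len(configs)), repeat=k))

term1 = avg(lambda s1, s2, s3: f(s1, s2) * R(s1, s3) ** p, 3)
term2 = avg(lambda s1, s2: f(s1, s2), 2) * avg(lambda s1, s2: R(s1, s2) ** p, 2)
term3 = avg(lambda s1, s2: f(s1, s2) * R(s1, s2) ** p, 2)
delta = abs(term1 - (term2 + term3) / n)          # Delta(f, 2, 1) in (48)
```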

Let us now think of \((x_p)_{p\ge 1}\) as a sequence of i.i.d. random variables with the uniform distribution on \([1,2]\) and denote by \(\mathbb {E}_x\) the expectation with respect to such a sequence. Here is one common formulation of the Ghirlanda–Guerra identities from Theorem 3.2 in [21].

Theorem 2

Suppose that the parameter \(s\) in (42) depends on \(N\), \(s=s_N\), and the sequence \((s_N)\) satisfies \(\lim _{N\rightarrow \infty } s_N=\infty \) and \(\lim _{N\rightarrow \infty } s_N^{-2} v_N(s_N) = 0\). Then

$$\begin{aligned} \lim _{N\rightarrow \infty } \mathbb {E}_x \Delta (f,n,p) = 0 \end{aligned}$$
(49)

for any \(p\ge 1, n\ge 2\) and any measurable function \(f\) such that \(\Vert f\Vert _\infty \le 1\).

Of course, since the space \(\Sigma _N\) changes with \(N\), the function \(f\) here is really a sequence \(f=f_N\) such that \(\Vert f_N\Vert _\infty \le 1\) for all \(N\ge 1\).

We will show below that, in the setting of the \(K\)-sat model, one can find a sequence \((s_N)\) that satisfies (43) and the conditions in Theorem 2. However, first let us recall how one can go from (49) to (15) for any asymptotic Gibbs measure \(G\) that arises in the limit (as explained in the introduction) from a sequence of measures \(G_N\) that satisfy (49). We simply consider the collection \(\mathcal F\) of all triples \((f,n,p)\) such that \(p\ge 1, n\ge 2\) and \(f = \prod _{(i,\ell )\in F}\sigma _i^\ell \) for a finite subset \(F\subseteq \mathbb {N}\times \{1,\ldots , n\}.\) This is a countable collection, so we can enumerate it, \(\mathcal{F} = \{(f_j,n_j,p_j) \ |\ j\ge 1\}\), and consider

$$\begin{aligned} \Delta _N(x) = \sum _{j\ge 1} 2^{-j} \Delta (f_j,n_j,p_j). \end{aligned}$$

Then (49) implies that \(\lim _{N\rightarrow \infty } \mathbb {E}_x \Delta _N(x) = 0\) and, as a consequence, we can choose a sequence \(x^N = (x_p^N)_{p\ge 1}\) changing with \(N\) such that \(\lim _{N\rightarrow \infty } \Delta _N(x^N) = 0.\) Therefore, if we now define the perturbation (41) and the Gibbs measure (44) with this choice of parameters \(x^N\) that depend on \(N\), we get

$$\begin{aligned} \lim _{N\rightarrow \infty } \Bigl | \mathbb {E}\bigl \langle f R_{1,n+1}^p \bigr \rangle - \frac{1}{n}\mathbb {E}\bigl \langle f \bigr \rangle \mathbb {E}\bigl \langle R_{1,2}^p\bigr \rangle - \frac{1}{n}\sum _{\ell =2}^{n}\mathbb {E}\bigl \langle f R_{1,\ell }^p\bigr \rangle \Bigr | =0 \end{aligned}$$

for any \((f,n,p)\in \mathcal{F}\). It should be obvious that this implies (15) for any asymptotic Gibbs measure \(G\) corresponding to a limit \(\mu \in \mathcal M\) of \((\mu _N)\) in (5) over any subsequence. The fact that the overlaps in (45) converge in distribution to the overlap in (13) over the same subsequence can be easily seen by computing their joint moments using the symmetry between sites (see the introduction in [19] for details). Moreover, the identities (15) for \(\psi (x) = x^p\) and \(f\) given by a product of finitely many spins clearly imply (15) for any \(f\) and \(\psi \). (Finally, let us point out that, even though the Ghirlanda–Guerra identities are typically proved via the above perturbation, in the mixed \(p\)-spin models they can be proved without any perturbation, see [17] or Section 3.7 in [21].)

Let us check the conditions of Theorem 2 in the random \(K\)-sat model.

Lemma 1

For the \(K\)-sat Hamiltonian (2), both (43) and the conditions in Theorem 2 are satisfied with \(s_N = N^{\gamma }\) for any \(\gamma \in (1/4, 1/2).\)

Proof

We need to estimate the left hand side of (47) with \(H_N(\sigma )\) given by (2). We will separate various sources of randomness as follows. For a function \(\varphi =\varphi (X,Y)\) of two independent random variables \(X\) and \(Y\), by the triangle inequality and Jensen’s inequality,

$$\begin{aligned} \mathbb {E}|\varphi - \mathbb {E}\varphi | \le \mathbb {E}|\varphi - \mathbb {E}_X \varphi | + \mathbb {E}| \mathbb {E}_X \varphi - \mathbb {E}\varphi | \le \mathbb {E}|\varphi - \mathbb {E}_X \varphi | + \mathbb {E}| \varphi - \mathbb {E}_Y \varphi |, \end{aligned}$$

where \(\mathbb {E}_X\) and \(\mathbb {E}_Y\) denote the expectation in \(X\) and \(Y\) only. Similarly, for a function \(\varphi =\varphi (X,Y,Z)\) of three independent random variables,

$$\begin{aligned} \mathbb {E}|\varphi - \mathbb {E}\varphi | \le \mathbb {E}| \varphi - \mathbb {E}_X \varphi | + \mathbb {E}|\varphi - \mathbb {E}_Y \varphi | + \mathbb {E}| \varphi - \mathbb {E}_Z \varphi |. \end{aligned}$$

In the case of the function (46), these three sources of randomness will come from the perturbation term \(g(\sigma )\), the Poisson random variable \(\pi (\alpha N)\), and the sequence of Rademacher random variables \(({\varepsilon }_{j,k})\) and random indices \((i_{j,k})\). We will denote the corresponding expectations by \(\mathbb {E}_g\), \(\mathbb {E}_\pi \) and \(\mathbb {E}_\theta \) respectively, so that

$$\begin{aligned} \mathbb {E}|\varphi - \mathbb {E}\varphi | \le \mathbb {E}| \varphi - \mathbb {E}_g \varphi | + \mathbb {E}|\varphi - \mathbb {E}_\pi \varphi | + \mathbb {E}| \varphi - \mathbb {E}_\theta \varphi |. \end{aligned}$$

In each term, we will first fix all other randomness and estimate \(\mathbb {E}_g | \varphi - \mathbb {E}_g \varphi |\), \(\mathbb {E}_\pi |\varphi - \mathbb {E}_\pi \varphi |\) and \(\mathbb {E}_\theta | \varphi - \mathbb {E}_\theta \varphi |\). The first one can be estimated using the standard Gaussian concentration (see e.g. Theorem 1.2 in [21]). Since the variance of \(sg(\sigma )\) is bounded by \(3s^2\), we get \(\mathbb {E}_g | \varphi - \mathbb {E}_g \varphi | \le L s\) for some absolute constant \(L\). This gives \(\mathbb {E}| \varphi - \mathbb {E}_g \varphi | \le Ls\). To estimate the last two terms, we will use the fact that each term in (2) for a fixed \(k\),

$$\begin{aligned} \theta _k(\sigma ) = \prod _{1\le j\le K} \frac{1+{\varepsilon }_{j,k} \sigma _{i_{j,k}}}{2}, \end{aligned}$$
(50)

is bounded uniformly by \(1\). First of all, if \(\pi _1\) and \(\pi _2\) are two independent copies of \(\pi (\alpha N)\), and we think of \(\varphi \) for a moment as a function \(\varphi (\pi (\alpha N))\) of \(\pi (\alpha N)\) only, then

$$\begin{aligned} \mathbb {E}_\pi |\varphi - \mathbb {E}_\pi \varphi | \le \mathbb {E}_\pi |\varphi (\pi _1) - \varphi (\pi _2)| \le \beta \mathbb {E}|\pi _1 - \pi _2| \le 2\beta \sqrt{\alpha N}. \end{aligned}$$

This gives \(\mathbb {E}|\varphi - \mathbb {E}_\pi \varphi | \le 2\beta \sqrt{\alpha N}\). Finally, to estimate \(\mathbb {E}_\theta | \varphi - \mathbb {E}_\theta \varphi |\), we can use the standard martingale difference representation for \(\varphi - \mathbb {E}_\theta \varphi = \sum _{k\le \pi (\alpha N)} d_k\) by adding the randomness of one term (50) at a time to obtain

$$\begin{aligned} \mathbb {E}_\theta (\varphi - \mathbb {E}_\theta \varphi )^2 = \sum _{k\le \pi (\alpha N)} \mathbb {E}_\theta d_k^2 \le 4\beta ^2 \pi (\alpha N). \end{aligned}$$

Therefore, \(\mathbb {E}(\varphi - \mathbb {E}_\theta \varphi )^2 \le 4\beta ^2 \alpha N\) and \(\mathbb {E}| \varphi - \mathbb {E}_\theta \varphi | \le 2\beta \sqrt{\alpha N}\). Combining all three estimates, we have proved that \(\mathbb {E}|\varphi - \mathbb {E}\varphi | \le L s + 4\beta \sqrt{\alpha N}.\) It is now easy to see that we can take \(s_N = N^{\gamma }\) for any \(\gamma \in (1/4, 1/2)\) to satisfy (43) and the conditions in Theorem 2. \(\square \)
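Two ingredients of this proof are easy to check numerically: the bound \(\mathbb {E}|\pi _1-\pi _2|\le 2\sqrt{\alpha N}\) for independent Poisson variables, and the fact that \(s_N=N^\gamma \) with \(\gamma \in (1/4,1/2)\) makes both \(s_N^2/N\) and \(v_N(s_N)/s_N^2\) vanish. A sketch with arbitrary illustrative constants (not from the paper):

```python
import math, random

rng = random.Random(2)

def poisson(lam, rng):                     # Knuth's sampler
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

alpha, N, trials = 2.0, 50, 20000
lam = alpha * N
mean_abs = sum(abs(poisson(lam, rng) - poisson(lam, rng))
               for _ in range(trials)) / trials
# E|pi_1 - pi_2| <= Var(pi_1 - pi_2)^{1/2} = (2 alpha N)^{1/2} <= 2 (alpha N)^{1/2}
assert mean_abs <= 2 * math.sqrt(alpha * N)

def conditions(gamma, N, L=10.0, beta=1.0):
    # returns (s_N^2 / N, v_N(s_N) / s_N^2), with the proof's estimate
    # v_N(s) = L s + 4 beta (alpha N)^{1/2}; both must tend to 0
    s = N ** gamma
    return s ** 2 / N, (L * s + 4 * beta * math.sqrt(alpha * N)) / s ** 2
```

The first ratio vanishes because \(\gamma <1/2\); the second because \(2\gamma >1/2\), i.e. \(\gamma >1/4\).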

We now go back to the notation of the asymptotic Gibbs measures in the introduction, and we will end this section with the invariance property that will be the main tool in the proof of Theorem 1. Given \(n\ge 1\), consider \(n\) bounded measurable functions \(f_1,\ldots , f_n: \mathbb {R}\rightarrow \mathbb {R}\) and define

$$\begin{aligned} F(\sigma ,\sigma ^1,\ldots ,\sigma ^n) = f_1(\sigma \cdot \sigma ^1)+\cdots +f_n(\sigma \cdot \sigma ^n). \end{aligned}$$
(51)

For \(1\le \ell \le n\) we define

$$\begin{aligned} F_\ell (\sigma ,\sigma ^1,\ldots ,\sigma ^n) = F(\sigma ,\sigma ^1,\ldots ,\sigma ^n) - f_\ell (\sigma \cdot \sigma ^\ell )+ \mathbb {E}\bigl \langle f_\ell (R_{1,2}) \bigr \rangle . \end{aligned}$$
(52)

Consider a finite index set \(\mathcal{T}.\) Given a realization of the random measure \(G\) and a sample \(\sigma ^1,\ldots ,\sigma ^n\) from \(G\), let \((B_t)_{t\in \mathcal{T}}\) be a partition of the support of \(G\) such that, for each \(t\in \mathcal{T}\), the indicator \({\mathrm{I}}(\sigma \in B_t)\) is a measurable function of \((\sigma ^\ell \cdot \sigma ^{\ell '})_{\ell ,\ell '\le n}\) and \((\sigma \cdot \sigma ^\ell )_{\ell \le n}\). Let

$$\begin{aligned} \delta _t=\delta _t(\sigma ^1,\ldots ,\sigma ^n)=G(B_t). \end{aligned}$$
(53)

Let us define the map \(T\) by

$$\begin{aligned} \delta =(\delta _t)_{t\in \mathcal{T}}\rightarrow T(\delta ) = \Bigl (\frac{\langle {\mathrm{I}}(\sigma \in B_t) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\rangle _{{\_}}}{\langle \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\rangle _{{\_}}} \Bigr )_{t\in \mathcal{T}}, \end{aligned}$$
(54)

where \(\langle \cdot \rangle _{{\_}}\) denotes the average with respect to the measure \(G\) in \(\sigma \) only for fixed \(\sigma ^1,\ldots , \sigma ^n\). The following result was proved in [20] (see also Theorem 2.19 in [21]) as a consequence of the Ghirlanda–Guerra identities (15). Recall the definition of \(S^n\) in (12).

Theorem 3

If (15) holds then, for any bounded measurable function \(\Phi =\Phi (S^n, \delta )\),

$$\begin{aligned} \mathbb {E}\bigl \langle \Phi (S^n, \delta ) \bigr \rangle = \mathbb {E}\Bigl \langle \frac{ \Phi (S^n,T(\delta )) \exp \sum _{\ell =1}^{n} F_\ell (\sigma ^\ell ,\sigma ^1,\ldots ,\sigma ^n)}{\langle \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n)\rangle _{{\_}}^n} \Bigr \rangle . \end{aligned}$$
(55)

This theorem was proved in [20] for functions \(\Phi \) of the overlaps \((R_{\ell ,\ell '})_{\ell ,\ell '\le n}\) instead of all spins \(S^n\). This is because the Ghirlanda–Guerra identities in [20] were stated only for functions of the overlaps, while here we wrote them in (15) for functions of all spins. Otherwise, the proof of Theorem 3 from (15) is identical to the one in [20].

3 At the level of replicas

The main work will be to prove an analogue of Theorem 1 at the level of the replicas \(\sigma ^1,\ldots , \sigma ^n\) sampled from the Gibbs measure \(G\) described at the end of the previous section, which will then imply Theorem 1 by letting \(n\) go to infinity. Until further notice, however, \(n\) will be fixed.

Let \(\mathcal{T}\) be a finite rooted labelled tree of depth \(r\). We will label the vertices of \(\mathcal{T}\) by a finite subset of \(\mathcal{A}\) in (19) as follows. The root will again be labelled by \(*\). Then, recursively for \(p\le r-1\), if a vertex at distance \(p\) from the root labelled by \(t\in \mathbb {N}^p\) has \(k_t\) children then we label them by \(t 1,\ldots , t k_t \in \mathbb {N}^{p+1}\) (recall that for simplicity we write \(tk\) for \((t,k)\)). We identify the tree \(\mathcal{T}\) with the set of vertex labels and use the same notation \(|t|\), \(t\wedge s\), \(t\succ s\) for \(t,s\in \mathcal{T}\) as for the tree \(\mathcal{A}\). We will denote by \(\mathcal{L}(\mathcal{T})\) the set of leaves of \(\mathcal{T}\) and consider a function

$$\begin{aligned} \mathcal{P}: \{1,\ldots , n\} \rightarrow \mathcal{L}(\mathcal{T}). \end{aligned}$$
(56)

We will call the pair \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) a configuration if \(\mathcal{P}^{-1}(t) \not = \emptyset \, \text{ for } \text{ all } \, t\in \mathcal{L}(\mathcal{T}),\) i.e. at least one replica index is mapped into each leaf. Of course, this means that the cardinality \(|\mathcal{L}(\mathcal{T})| \le n\). The role of the function \(\mathcal{P}\) is to partition replica indices among the leaves of \(\mathcal{T}\) and then use the tree structure to describe how replicas \(\sigma ^1,\ldots , \sigma ^n\) cluster according to the overlap equivalence relations (18) along the tree \(\mathcal{T}\). More precisely, we will consider the event

$$\begin{aligned} {\mathcal{O}(\mathcal{C})}= \Bigl \{ (\sigma ^1,\ldots ,\sigma ^n) \ \bigr |\ \sigma ^\ell \cdot \sigma ^{\ell '}\in I_{\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')} \text{ for } \text{ all } 1\le \ell ,\ell ' \le n\Bigr \}. \end{aligned}$$
(57)

This event depends on the tree \(\mathcal{T}\) via \(\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')\) and \(I_{\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')}\) is one of the intervals in (23). In other words, on this event the overlap of replicas “assigned by \(\mathcal{P}\)” to the leaves \(t,t'\in \mathcal{L}(\mathcal{T})\) is determined by the depth \(t\wedge t'\) of their lowest common ancestor.
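The event (57) is straightforward to encode: representing the vertices of \(\mathcal{T}\) by tuples (with the root \(*\) as the empty tuple), \(\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')\) is the length of the longest common prefix. The sketch below is illustrative only and assumes the intervals in (23) have the form \(I_p=[q_p,q_{p+1})\); the tree, thresholds and overlap values are arbitrary:

```python
def wedge(t, s):
    # depth of the lowest common ancestor of two vertices (tuples)
    d = 0
    while d < min(len(t), len(s)) and t[d] == s[d]:
        d += 1
    return d

def in_event_O(P, overlaps, q):
    # P: replica index -> leaf (tuple); overlaps: (l, l') -> sigma^l . sigma^l'
    # q: thresholds with I_p = [q_p, q_{p+1})  (assumed form of (23))
    for (l, lp), r in overlaps.items():
        p = wedge(P[l], P[lp])
        if not (q[p] <= r < q[p + 1]):
            return False
    return True

# a depth-2 tree with leaves (1,1), (1,2), (2,1); four replicas
P = {1: (1, 1), 2: (1, 1), 3: (1, 2), 4: (2, 1)}
q = [-1.0, 0.2, 0.5, 1.01]                # q_0, q_1, q_2, q_3
overlaps = {(1, 2): 0.8, (1, 3): 0.3, (2, 3): 0.35,
            (1, 4): 0.0, (2, 4): 0.1, (3, 4): -0.2}
assert in_event_O(P, overlaps, q)
```

Replicas \(1\) and \(2\) share a leaf, so their overlap must fall in \(I_2\); replicas in different subtrees of the root must have overlap in \(I_0\).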

Let us assume from now on that the sample belongs to the event \({\mathcal{O}(\mathcal{C})}\). Then, we can use ultrametricity of the support of the measure \(G\) to partition it in a natural way ‘along the tree \(\mathcal{T}\)’ according to the overlaps with the replicas \(\sigma ^1,\ldots , \sigma ^n\). For each \(t\in \mathcal{T}\), let

$$\begin{aligned} \mathcal{R}(t) = \bigl \{ 1\le \ell \le n \ |\ \mathcal{P}(\ell ) \succ t \bigr \} \end{aligned}$$
(58)

be the set of replica indices assigned to the leaves which are descendants of \(t\). Consider the sets

$$\begin{aligned} C_t = \bigl \{\sigma \ | \ \sigma \cdot \sigma ^{\ell } \ge q_{|t|} \text{ for } \text{ all } \ell \in \mathcal{R}(t) \bigr \}. \end{aligned}$$
(59)

Since, obviously, \(t'\wedge t'' \ge |t|\) for any \(t',t''\succ t\), the overlap \(\sigma ^\ell \cdot \sigma ^{\ell '} \ge q_{|t|}\) for all \(\ell ,\ell ' \in \mathcal{R}(t)\) on the event \({\mathcal{O}(\mathcal{C})}\). By ultrametricity, this implies that we can also write the set (59) as

$$\begin{aligned} C_t = \bigl \{\sigma \ | \ \sigma \cdot \sigma ^{\ell } \ge q_{|t|} \text{ for } \text{ any } \ell \in \mathcal{R}(t) \bigr \}. \end{aligned}$$
(60)

This makes it obvious that the sets \(C_t\) are nested, \(C_{t'} \subseteq C_t\) for \(t'\succ t\). Another simple property is that the sets indexed by the children of \(t\) are disjoint subsets of \(C_t\),

$$\begin{aligned} C_{tk} \cap C_{tk'} = \emptyset \ \text{ for } \text{ all } \ k\not = k'\le k_t \end{aligned}$$
(61)

(recall that \(k_t\) is the number of children of \(t\in \mathcal{T}\)). To see this, if we take \(\ell \in \mathcal{R}(tk)\) and \(\ell '\in \mathcal{R}(tk')\) then \(\sigma ^\ell \cdot \sigma ^{\ell '} \in I_{|t|}= [q_{|t|},q_{|t|+1}) \) by (57). On the other hand,

$$\begin{aligned} \sigma \cdot \sigma ^{\ell } \ge q_{|t|+1} \text{ for } \sigma \in C_{tk} \text{ and } \sigma \cdot \sigma ^{\ell '} \ge q_{|t|+1} \text{ for } \sigma \in C_{tk'}, \end{aligned}$$

so (61) again follows by ultrametricity. Let us now consider the sets \(B_t := C_t\) for \(t\in \mathcal{L}(\mathcal{T})\) and

$$\begin{aligned} B_t : = C_t{\setminus } \cup _{k\le k_t} C_{tk} = \bigl \{\sigma \ |\ \sigma \cdot \sigma ^\ell \in I_{|t|} \text{ for } \text{ all } \ell \in \mathcal{R}(t)\bigr \} \end{aligned}$$
(62)

for \(t\in \mathcal{T}{\setminus }\mathcal{L}(\mathcal{T}).\) On the event \({\mathcal{O}(\mathcal{C})}\), the collection \((B_t)_{t\in \mathcal{T}}\) forms a random partition of the support of the Gibbs measure \(G\) and, by definition, the indicator \({\mathrm{I}}(\sigma \in B_t)\) depends only on the overlaps \((\sigma \cdot \sigma ^\ell )_{\ell \le n}\). Below, this will allow us to apply Theorem 3 to this partition with some specific choice of functions \(f_1,\ldots , f_n\) in (51).

Let us denote the Gibbs weights of the above sets by

$$\begin{aligned} W_t = G(C_t) \ \text{ and } \ \delta _t = G(B_t) = W_t - \sum _{k\le k_t} W_{tk}. \end{aligned}$$
(63)
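The relations in (63) and the ordering event (64) below amount to simple bookkeeping on the tree. A minimal illustrative sketch with arbitrary weights (vertices again encoded as tuples, root \(= ()\)):

```python
# children lists and cluster weights W_t = G(C_t) for a small tree
children = {(): [(1,), (2,)], (1,): [(1, 1), (1, 2)], (2,): [],
            (1, 1): [], (1, 2): []}
W = {(): 1.0, (1,): 0.6, (2,): 0.4, (1, 1): 0.35, (1, 2): 0.15}

# delta_t = G(B_t) = W_t - sum of the children's weights, as in (63)
delta = {t: W[t] - sum(W[c] for c in children[t]) for t in W}

# since (B_t) partitions the support of G, the delta_t sum to W_* = 1
assert all(d >= 0 for d in delta.values())
assert abs(sum(delta.values()) - 1.0) < 1e-9

# the ordering event (64): siblings' weights are strictly decreasing
on_event_W = all(W[c1] > W[c2]
                 for cs in children.values() for c1, c2 in zip(cs, cs[1:]))
```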

It is obvious that two different configurations \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) and \(\mathcal{C}'=(\mathcal{T}',\mathcal{P}')\) can result in the same event, \({\mathcal{O}(\mathcal{C})}=\mathcal{O}(\mathcal{C}')\), if we simply reshuffle the labels of \(\mathcal{T}\) in a hierarchical way and then redefine \(\mathcal{P}\) accordingly. Later on, we will need to fix a special configuration among these, and this will be done using the cluster weights \(W_t\) around the sample points, as follows. Consider the event

$$\begin{aligned} {\mathcal{W}(\mathcal{C})}= \Bigl \{ (\sigma ^1,\ldots ,\sigma ^n) \ \bigr |\ W_{t1}> \cdots > W_{tk_t} \text{ for } \text{ all } t\in \mathcal{T}{\setminus } \mathcal{L}(\mathcal{T}) \Bigr \}. \end{aligned}$$
(64)

It is obvious that such an ordering of the weights makes the events \({\mathcal{O}(\mathcal{C})}\cap {\mathcal{W}(\mathcal{C})}\) disjoint for different configurations \(\mathcal{C}\), and each sample \((\sigma ^1,\ldots ,\sigma ^n)\) belongs to one and only one of these events. We will denote the corresponding configuration by \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\),

$$\begin{aligned} \mathcal{C}_n = \mathcal{C}\Longleftrightarrow (\sigma ^1,\ldots ,\sigma ^n) \in {\mathcal{O}(\mathcal{C})}\cap {\mathcal{W}(\mathcal{C})}, \end{aligned}$$
(65)

and call \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\) the sample configuration. In other words, \(\mathcal{C}_n\) is a function of \(\sigma ^1,\ldots ,\sigma ^n\) such that the tree is defined according to the overlap structure \((\sigma ^\ell \cdot \sigma ^{\ell '})_{\ell ,\ell '\le n}\) with the vertices labelled according to the weights of the neighborhoods of these replicas. The event \({\mathcal{W}(\mathcal{C})}\) and the sample configuration \(\mathcal{C}_n\) will not be used in this section, but will play an important role in the last section where they will be utilized to partition an event into disjoint events indexed by configurations \(\mathcal{C}\).

For the remainder of this section, we will fix a configuration \(\mathcal{C}\) once and for all and, for simplicity of notation, will omit the dependence of \({\mathcal{O}(\mathcal{C})}\) on \(\mathcal{C}\) and write \(\mathcal{O}\) instead. Let us denote \(\mathbb {P}(\ \cdot \ ) = \mathbb {E}\langle {\mathrm{I}}(\ \cdot \ )\rangle \) and let

$$\begin{aligned} \mathbb {P}_\mathcal{O}(\ \cdot \ ) = \frac{\mathbb {P}(\ \cdot \ \cap \mathcal{O})}{\mathbb {P}(\mathcal{O})} \end{aligned}$$
(66)

be the conditional distribution given the event \(\mathcal{O}\). Since \(n\) is fixed in this section, we will write \(S\) to denote \(S^n\) in (12). Let

$$\begin{aligned} \mathcal{T}_* : = \mathcal{T}{\setminus } \{*\} \ \text{ and } \ W = (W_t)_{t\in \mathcal{T}_*}. \end{aligned}$$
(67)

We exclude the root, because \(W_*=1\). Theorem 1 will follow from the main result of this section.

Theorem 4

For any measurable sets \(A\) and \(B\),

$$\begin{aligned} \mathbb {P}_\mathcal{O}( S \in A, W\in B) = \mathbb {P}_\mathcal{O}( S \in A) \mathbb {P}_\mathcal{O}(W\in B). \end{aligned}$$
(68)

Since the weights \((W_t)\) and \((\delta _t)\) in (63) are functions of each other, the independence of \(S\) and \(W\) in (68) is equivalent to independence of \(S\) and \(\delta \),

$$\begin{aligned} \mathbb {P}_\mathcal{O}( S \in A, \delta \in B) = \mathbb {P}_\mathcal{O}( S \in A) \mathbb {P}_\mathcal{O}(\delta \in B), \end{aligned}$$
(69)

where \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\). Again, we can exclude the root, because \(\delta _* = 1-\sum _{t\in \mathcal{T}_*}\delta _t.\) The vector \(\delta \) takes values in the open subset

$$\begin{aligned} \mathcal{D}= \Bigl \{ (x_t)_{t\in \mathcal{T}_*} \ \bigr |\ \sum _{t\in \mathcal{T}_*} x_t<1 \text{ and } \text{ all } x_t>0 \Bigr \} \end{aligned}$$
(70)

of \(\mathbb {R}^{|\mathcal{T}_*|}.\) Given a vector \(a=(a_t)_{t\in \mathcal{T}_*}\in \mathbb {R}^{|\mathcal{T}_*|}\), let us define the map \(T_a: \mathcal{D}\rightarrow \mathcal{D}\) by

$$\begin{aligned} T_a(x) = \Bigl (\frac{x_t e^{a_t}}{\Delta _a(x)}\Bigr )_{t\in \mathcal{T}_*} \ \text{ where } \ \Delta _a(x) = \sum _{t\in \mathcal{T}_*} x_t e^{a_t} + 1- \sum _{t\in \mathcal{T}_*} x_t. \end{aligned}$$
(71)

One can easily check that for \(a,b\in \mathbb {R}^{|\mathcal{T}_*|}\) we have \(T_a\circ T_b = T_{a+b}\) and, therefore, \(T_a^{-1} = T_{-a}\). It is also easy to check that

$$\begin{aligned} \Delta _a(T_{-a}(x)) = \frac{1}{\Delta _{-a}(x)}. \end{aligned}$$
(72)
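The group property \(T_a\circ T_b=T_{a+b}\), the inverse \(T_a^{-1}=T_{-a}\) and the identity (72) all follow from the computation \(\Delta _a(T_b(x)) = \Delta _{a+b}(x)/\Delta _b(x)\); they are also easy to confirm numerically. A sketch with an arbitrary three-element index set standing in for \(\mathcal{T}_*\):

```python
import math, random

rng = random.Random(3)
T_star = ["a", "b", "c"]           # stand-in for the index set T_*

def Delta(a, x):                   # Delta_a(x) in (71)
    return sum(x[t] * math.exp(a[t]) for t in T_star) + 1 - sum(x.values())

def T(a, x):                       # the map T_a in (71)
    d = Delta(a, x)
    return {t: x[t] * math.exp(a[t]) / d for t in T_star}

x = {"a": 0.2, "b": 0.3, "c": 0.1}              # a point of D in (70)
a = {t: rng.uniform(-1, 1) for t in T_star}
b = {t: rng.uniform(-1, 1) for t in T_star}

# group property: T_a o T_b = T_{a+b}
lhs = T(a, T(b, x))
rhs = T({t: a[t] + b[t] for t in T_star}, x)
group_ok = all(abs(lhs[t] - rhs[t]) < 1e-9 for t in T_star)

# inverse: T_a^{-1} = T_{-a}
neg_a = {t: -a[t] for t in T_star}
back = T(neg_a, T(a, x))
inverse_ok = all(abs(back[t] - x[t]) < 1e-9 for t in T_star)

# identity (72): Delta_a(T_{-a}(x)) = 1 / Delta_{-a}(x)
identity_ok = abs(Delta(a, T(neg_a, x)) - 1.0 / Delta(neg_a, x)) < 1e-9
```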

Let us denote by \(B_{\varepsilon }(x)\) the open ball of radius \({\varepsilon }\) in \( \mathbb {R}^{|\mathcal{T}_*|}\) centered at \(x.\) Then the following holds.

Lemma 2

For any \(a=(a_t)_{t\in \mathcal{T}_*}\in \mathbb {R}^{|\mathcal{T}_*|}\) and \(x\in \mathcal{D}\),

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \frac{\mathbb {P}_\mathcal{O}(S\in A, \delta \in B_{\varepsilon }(x))}{ \mathbb {P}_\mathcal{O}(\delta \in B_{\varepsilon }(x))} = \lim _{{\varepsilon }\downarrow 0} \frac{\mathbb {P}_\mathcal{O}(S\in A, T_a(\delta ) \in B_{\varepsilon }(x))}{ \mathbb {P}_\mathcal{O}(T_a(\delta ) \in B_{\varepsilon }(x))} \end{aligned}$$
(73)

whenever either of the limits exists.

Proof

As we mentioned above, we will apply Theorem 3 to the partition \((B_t)_{t\in \mathcal{T}}\) in (62) with the following choice of functions \(f_1,\ldots , f_n\) in (51). Let us consider an arbitrary function

$$\begin{aligned} \ell : \mathcal{T}\rightarrow \{1,\ldots , n\} \end{aligned}$$
(74)

such that \(\ell (t) \in \mathcal{R}(t)\) in (58) for all \(t\in \mathcal{T}\). In other words, we pick one replica index \(\ell (t)\) assigned to one of the leaves that are descendants of \(t\). Consider a vector \(b=(b_t)_{t\in \mathcal{T}}\in \mathbb {R}^{|\mathcal{T}|}\). For each replica index \(1\le \ell \le n\), let

$$\begin{aligned} \mathcal{T}_\ell = \bigl \{t\in \mathcal{T}\ |\ \ell (t) = \ell \bigr \} \ \text{ and } \ f_\ell (x) = \sum _{t\in \mathcal{T}_\ell } b_t {\mathrm{I}}(x\in I_{|t|}). \end{aligned}$$
(75)

Then the function \(F\) in (51) can be written as

$$\begin{aligned} F(\sigma ,\sigma ^1,\ldots ,\sigma ^n) = \sum _{\ell \le n} \sum _{t\in \mathcal{T}_\ell } b_t {\mathrm{I}}(\sigma \cdot \sigma ^\ell \in I_{|t|}) = \sum _{t\in \mathcal{T}} b_t {\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}). \end{aligned}$$

Let us fix \(u\in \mathcal{T}\) and compute \(\langle {\mathrm{I}}(\sigma \in B_u) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\rangle _{{\_}}\). We will now fix \(\sigma \in B_u\) and consider several different cases when \(t\) belongs to different subsets of the tree \(\mathcal{T}\).

1. First of all, if \(t=u\) then \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 1\) by the definition of \(B_u\) in (62).

2. If \(t\succ u\), \(t\not =u,\) then \(\ell (t) \in \mathcal{R}(u)\) and \(\sigma \cdot \sigma ^{\ell (t)} \in I_{|u|}\), which implies that \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0\).

3. If neither \(t\succ u\) nor \(u\succ t\) then (on the event \(\mathcal{O}\)) \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{t\wedge u}\) and \(t\wedge u < \min (|t|,|u|)\). Since for \(\sigma \in B_u\) we have \(\sigma \cdot \sigma ^{\ell (u)} \in I_{|u|}\), by ultrametricity, \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0.\)

4. If \(u\succ t\), \(t\not = u\), then, in general, the answer depends on the choice of the function (74) or, more specifically, on whether \(\mathcal{P}(\ell (u))\wedge \mathcal{P}(\ell (t)) = |t|\) or \(\mathcal{P}(\ell (u))\wedge \mathcal{P}(\ell (t)) > |t|\). In the first case, on the event \(\mathcal{O}\), \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{|t|}\). Since for \(\sigma \in B_u\) we have \(\sigma \cdot \sigma ^{\ell (u)} \in I_{|u|}\) and \(I_{|u|}\) lies strictly above \(I_{|t|}\), by ultrametricity, \(\sigma \cdot \sigma ^{\ell (t)} = \sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)} \in I_{|t|}\) and \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 1\). In the second case, on the event \(\mathcal{O}\), \(\sigma ^{\ell (t)}\cdot \sigma ^{\ell (u)}\) also lies strictly above \(I_{|t|}\) and, therefore, by ultrametricity, \(\sigma \cdot \sigma ^{\ell (t)}\) lies strictly above \(I_{|t|}\) and \({\mathrm{I}}(\sigma \cdot \sigma ^{\ell (t)} \in I_{|t|}) = 0\).

Therefore, if we consider the set

$$\begin{aligned} \mathcal{T}(u) = \Bigl \{ t\in \mathcal{T}\, \bigr | \ u\succ t, t\not =u \, \text{ and } \mathcal{P}(\ell (u))\wedge \mathcal{P}(\ell (t)) = |t| \Bigr \} \end{aligned}$$

then, for \(\sigma \in B_u\) we have \(F(\sigma ,\sigma ^1,\ldots ,\sigma ^n) = b_u + \sum _{t\in \mathcal{T}(u)} b_t\) and

$$\begin{aligned} \bigl \langle {\mathrm{I}}(\sigma \in B_u) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\bigr \rangle _{{\_}} = G(B_u) \exp \Bigl (b_u+ \sum _{t\in \mathcal{T}(u)} b_t \Bigr ). \end{aligned}$$

Let us now set \(b_{*}=0\) and recursively set \(b_u = a_u - \sum _{t\in \mathcal{T}(u)} b_t\) for \(u\in \mathcal{T}_{*}\). Then,

$$\begin{aligned} \bigl \langle {\mathrm{I}}(\sigma \in B_*) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\bigr \rangle _{{\_}}&=\ \delta _* = 1-\sum _{t\in \mathcal{T}_*} \delta _t,\\ \bigl \langle {\mathrm{I}}(\sigma \in B_u) \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\bigr \rangle _{{\_}}&=\ \delta _u e^{a_u} \ \text{ for } \ u\in \mathcal{T}_*. \end{aligned}$$

Adding them up, we get

$$\begin{aligned} \bigl \langle \exp F(\sigma ,\sigma ^1,\ldots ,\sigma ^n )\bigr \rangle _{{\_}} = \sum _{t\in \mathcal{T}_*}\delta _t e^{a_t} + 1-\sum _{t\in \mathcal{T}_*} \delta _t = \Delta _a(\delta ). \end{aligned}$$

We showed that, with this choice of functions \(f_1,\ldots , f_n\), the map \(T\) in (54) coincides with the map \(T_a\) in (71) on the coordinates indexed by \(t\in \mathcal{T}_*.\) Also, it is clear that, on the event \(\mathcal{O}\), the sum

$$\begin{aligned} \sum _{\ell =1}^n F_\ell (\sigma ^\ell ,\sigma ^1,\ldots ,\sigma ^n ) \end{aligned}$$

is a constant, which we will denote by \(\gamma (a)\). If we denote \(Z_a(\delta ) = e^{\gamma (a)}/\Delta _a(\delta )^n\) then Theorem 3 implies that

$$\begin{aligned} \mathbb {E}\bigl \langle {\mathrm{I}}\bigl (S\in A, \delta \in B_{\varepsilon }(x) \bigr ) {\mathrm{I}}_{\mathcal{O}} \bigr \rangle = \mathbb {E}\bigl \langle {\mathrm{I}}\bigl (S\in A,T_a(\delta )\in B_{\varepsilon }(x)\bigr ) Z_a(\delta ) {\mathrm{I}}_{\mathcal{O}} \bigr \rangle . \end{aligned}$$

The same equality, obviously, holds without the event \(\{S\in A\}\), which proves that

$$\begin{aligned} \frac{\mathbb {E}\langle {\mathrm{I}}(S\in A, \delta \in B_{\varepsilon }(x)) {\mathrm{I}}_{\mathcal{O}} \rangle }{\mathbb {E}\langle {\mathrm{I}}(\delta \in B_{\varepsilon }(x)) {\mathrm{I}}_{\mathcal{O}} \rangle } = \frac{\mathbb {E}\langle {\mathrm{I}}(S\in A,T_a(\delta )\in B_{\varepsilon }(x)) Z_a(\delta ) {\mathrm{I}}_{\mathcal{O}} \rangle }{\mathbb {E}\langle {\mathrm{I}}(T_a(\delta )\in B_{\varepsilon }(x)) Z_a(\delta ) {\mathrm{I}}_{\mathcal{O}} \rangle }, \end{aligned}$$
(76)

provided that the denominators are not zero. When \(T_a(\delta )\in B_{\varepsilon }(x)\), by (72),

$$\begin{aligned} \frac{1}{\Delta _a(\delta )} = \Delta _{-a}(T_a(\delta )) \in \Delta _{-a}(B_{\varepsilon }(x)) \end{aligned}$$

and, hence, \(Z_a(\delta ) \in e^{\gamma (a)} \Delta _{-a}(B_{\varepsilon }(x))^n\). As a result, as \({\varepsilon }\downarrow 0\), the factor \(Z_a(\delta )\) converges uniformly to a constant \(e^{\gamma (a)} \Delta _{-a}(x)^n\) that will cancel out on the right hand side of (76), yielding (73). \(\square \)

We will need one more technical result that will be postponed until the next section.

Lemma 3

The distribution \(\mathbb {P}_\mathcal{O}(\delta \in \ \cdot \ )\) of weights \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|\mathcal{T}_*|}\).

We are now ready to prove Theorem 4.

Proof of Theorem 4

Let \(p(x)\) be the Lebesgue density of the distribution \(\mathbb {P}_\mathcal{O}(\delta \in \, \cdot \, )\) and let \(p_A(\delta )\) be the conditional expectation of the indicator \({\mathrm{I}}(S\in A)\) given \(\delta \) under the measure \(\mathbb {P}_\mathcal{O}.\) Then,

$$\begin{aligned} \mathbb {P}_\mathcal{O}(S\in A, \delta \in B) = \int \limits _B p_A(x) p(x) \, dx \ \text{ and } \ \mathbb {P}_\mathcal{O}(\delta \in B) = \int \limits _B p(x) \,dx. \end{aligned}$$
(77)

To prove (69), it is enough to show that \(p_A(x)\) is a constant a.e. on the set \(\{x: p(x)>0\}\). By the Lebesgue differentiation theorem (see Corollary 1.6 in [28]), for almost every \(x'\in \mathbb {R}^{|\mathcal{T}_*|}\),

$$\begin{aligned}&\lim _{{\varepsilon }\downarrow 0} \,\frac{1}{|B_{\varepsilon }(x')|}\int \limits _{B_{\varepsilon }(x')} \bigl | p_A(x)p(x) - p_A(x')p(x')\bigr | \,dx = 0,\end{aligned}$$
(78)
$$\begin{aligned}&\lim _{{\varepsilon }\downarrow 0}\, \frac{1}{|B_{\varepsilon }(x')|}\int \limits _{B_{\varepsilon }(x')} \bigl |p(x) - p(x') \bigr | \,dx = 0. \end{aligned}$$
(79)

If \(p_A(x)\) is not a constant a.e. on \(\{p(x)>0\}\) then we can find two points \(x',x''\) for which both (78) and (79) hold and such that \(p(x'), p(x'')>0\) and \(p_A(x')\not = p_A(x'').\) We can also assume that \(x',x''\in \mathcal{D}\) in (70) since \(\mathbb {P}_\mathcal{O}(\delta \not \in \mathcal{D}) = 0.\) First of all, equations (77)–(79) imply that the left hand side of (73) is equal to

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \, \frac{\mathbb {P}_\mathcal{O}(S\in A, \delta \in B_{\varepsilon }(x'))}{ \mathbb {P}_\mathcal{O}(\delta \in B_{\varepsilon }(x'))} = p_A(x'). \end{aligned}$$
(80)

It is easy to check that if we take

$$\begin{aligned} a_t = \log \frac{x_t'}{x_t''} - \log \frac{1-\sum _{s\in \mathcal{T}_*} x_s'}{1-\sum _{s\in \mathcal{T}_*} x_s''} \end{aligned}$$

for \(t\in \mathcal{T}_*\) then \(T_a(x'') = x'\) for \(T_a\) defined in (71). Equations (73) and (80) imply that

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \, \frac{\mathbb {P}_\mathcal{O}(S\in A, \delta \in T_{-a} (B_{\varepsilon }(x')))}{ \mathbb {P}_\mathcal{O}(\delta \in T_{-a} (B_{\varepsilon }(x')))} = p_A(x'). \end{aligned}$$
(81)

To finish the proof, we will follow the argument of Corollary 1.7 in [28] and use the fact that the sets \(T_{-a} (B_{\varepsilon }(x'))\) are of bounded eccentricity. Namely, since all partial derivatives of \(T_a\) are uniformly bounded in a small neighborhood of \(x''\) and all partial derivatives of \(T_a^{-1}=T_{-a}\) are uniformly bounded in a small neighborhood of \(x'\), there exist some constants \(c, C>0\) such that \(B_{c {\varepsilon }}(x'') \subseteq T_{-a} (B_{\varepsilon }(x')) \subseteq B_{C {\varepsilon }}(x'')\) for small \({\varepsilon }>0\). Therefore,

$$\begin{aligned} \frac{1}{| T_{-a} (B_{\varepsilon }(x')) |}\int \limits _{ T_{-a} (B_{\varepsilon }(x'))} \bigl |p(x) - p(x'')\bigr | \,dx \le \frac{1}{ |B_{c{\varepsilon }}(x'')|}\int \limits _{B_{C{\varepsilon }}(x'')} \bigl |p(x) - p(x'')\bigr |\, dx \\ = \frac{(C/c)^{|\mathcal{T}_*|}}{|B_{C{\varepsilon }}(x'')|}\int \limits _{B_{C{\varepsilon }}(x'')} \bigl |p(x) - p(x'')\bigr | \,dx, \end{aligned}$$

and, using that (79) holds with \(x''\) instead of \(x'\), we get

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \, \frac{1}{| T_{-a} (B_{\varepsilon }(x')) |}\int \limits _{ T_{-a} (B_{\varepsilon }(x'))} \bigl |p(x) - p(x'')\bigr | \,dx = 0. \end{aligned}$$

Similarly, using (78) with \(x''\) instead of \(x'\),

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \, \frac{1}{| T_{-a} (B_{\varepsilon }(x')) |}\int \limits _{ T_{-a} (B_{\varepsilon }(x'))} \bigl |p_A(x) p(x) - p_A(x'') p(x'')\bigr |\,dx = 0. \end{aligned}$$

These equations together with (77) for \(B = T_{-a} (B_{\varepsilon }(x'))\) imply that

$$\begin{aligned} \lim _{{\varepsilon }\downarrow 0} \, \frac{\mathbb {P}_\mathcal{O}(S\in A, \delta \in T_{-a} (B_{\varepsilon }(x')))}{ \mathbb {P}_\mathcal{O}(\delta \in T_{-a}( B_{\varepsilon }(x')))} = p_A(x''). \end{aligned}$$

Recalling (81), we arrive at a contradiction, since then \(p_A(x') = p_A(x'')\). \(\square \)

4 Absolute continuity of cluster weight distribution

In this section, we will prove Lemma 3. First, we will reduce the problem to proving absolute continuity for the distribution of finitely many cluster weights \(V_\alpha \) in (27). Then we will recall that these cluster weights are generated by the RPC thanks to the Ghirlanda–Guerra identities, so the proof of absolute continuity will be based solely on the properties of the RPC.

Let \(\mathcal{C}=(\mathcal{T},\mathcal{P})\) be a fixed configuration as in the previous section. With probability one, the vector of weights \(W=(W_t)_{t\in \mathcal{T}_*}\) defined in (63) belongs to the open subset

$$\begin{aligned} \mathcal{W}= \Bigl \{ (y_t)_{t\in \mathcal{T}_*} \ \bigr |\ \sum _{k\le k_t} y_{tk}<y_t \text{ for } t\in \mathcal{T}{\setminus }\mathcal{L}(\mathcal{T}) \text{ and } \text{ all } y_t>0 \Bigr \} \end{aligned}$$
(82)

of \(\mathbb {R}^{|\mathcal{T}_*|},\) where we set \(y_* = 1.\) The map given by \(x_t = y_t -\sum _{k\le k_t} y_{tk}\) for \(t\in \mathcal{T}_*\) is a linear bijection between \(\mathcal{W}\) and the set \(\mathcal{D}\) defined in (70). Recall that this is precisely the relationship between the weights \(W=(W_t)_{t\in \mathcal{T}_*}\) and \(\delta = (\delta _t)_{t\in \mathcal{T}_*}\) in (63). Therefore, in order to prove Lemma 3, it is enough to prove that the distribution \(\mathbb {P}_\mathcal{O}(W\in \ \cdot \ )\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|\mathcal{T}_*|}\).
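Purely as an illustration (this plays no role in the proofs), the linear bijection between \(W\) and \(\delta \) is easy to make concrete. The sketch below, for a small hypothetical tree, computes \(\delta _t = W_t - \sum _{k\le k_t}W_{tk}\) and inverts it by summing \(\delta \) over descendants.

```python
# A minimal sketch; the tree below is a hypothetical example, not one from
# the paper: root * has children 1, 2 and vertex 1 has children 11, 12.
children = {"*": ["1", "2"], "1": ["11", "12"], "2": [], "11": [], "12": []}

def to_delta(W):
    """Forward map: delta_t = W_t - sum of W over the children of t, t != *."""
    return {t: W[t] - sum(W[c] for c in children[t]) for t in W if t != "*"}

def to_W(delta):
    """Inverse map: W_t is the sum of delta_s over all descendants s of t
    (including t itself), filled in recursively from the leaves up."""
    W = {}
    def fill(t):
        for c in children[t]:
            fill(c)
        if t != "*":
            W[t] = delta[t] + sum(W[c] for c in children[t])
    fill("*")
    return W

# Example weights in the set (82): children sum to strictly less than the
# parent, with W_* = 1 kept implicit.
W = {"*": 1.0, "1": 0.6, "2": 0.3, "11": 0.25, "12": 0.2}
delta = to_delta(W)   # lies in the simplex D of (70): positive entries, sum < 1
W2 = to_W(delta)      # recovers W on the vertices t != *
```

The forward map is triangular (each \(\delta _t\) involves only \(t\) and its children), which is one way to see that it is a linear bijection with determinant one.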

Let us now recall the definition of the clusters \((H_\alpha )_{\alpha \in \mathcal{A}}\) and their Gibbs weights \((V_\alpha )_{\alpha \in \mathcal{A}}\) in the paragraph above equation (27). Suppose that the cardinality of \(\mathcal{L}(\mathcal{T})\) is equal to \(m\). Let us look at all possible choices of \(m\) pure states \(H_{\alpha _t}\) for \(t\in \mathcal{L}(\mathcal{T})\) indexed by the leaves \(\alpha _t\in \mathcal{L}(\mathcal{A})=\mathbb {N}^r\) that “form the same pattern” according to their overlaps as the tree \(\mathcal{T}\). More precisely, we will denote \(\bar{\alpha } := (\alpha _t)_{t\in \mathcal{L}(\mathcal{T})}\) and consider the set

$$\begin{aligned} \mathcal{A}(\mathcal{C}) = \Bigl \{ \bar{\alpha } \in \mathcal{L}(\mathcal{A})^m \ \bigr | \ \alpha _t \wedge \alpha _{t'} = t\wedge t' \ \text{ for } \text{ all } \ t,t'\in \mathcal{L}(\mathcal{T}) \Bigr \}. \end{aligned}$$

Then it should be obvious that the event \(\mathcal{O}= {\mathcal{O}(\mathcal{C})}\) defined in (57) can be written as a disjoint union \(\mathcal{O}= \bigcup _{\bar{\alpha }\in \mathcal{A}(\mathcal{C})} \mathcal{O}(\bar{\alpha })\), where (recall the definition of \(\mathcal{R}(t)\) in (58))

$$\begin{aligned} \mathcal{O}(\bar{\alpha }) = \Bigl \{ (\sigma ^1,\ldots ,\sigma ^n) \ \bigr |\ \sigma ^\ell \in H_{\alpha _t} \text{ for } \text{ all } t\in \mathcal{L}(\mathcal{T}) \text{ and } \ell \in \mathcal{R}(t) \Bigr \}. \end{aligned}$$

Then, we can write

$$\begin{aligned} \mathbb {P}_\mathcal{O}(W\in B) = \mathbb {E}\bigl \langle {\mathrm{I}}(W\in B) {\mathrm{I}}_{\mathcal{O}}\bigr \rangle = \mathbb {E}\sum _{ \bar{\alpha } \in \mathcal{A}(\mathcal{C})} \bigl \langle {\mathrm{I}}(W\in B) {\mathrm{I}}_{\mathcal{O}(\bar{\alpha })}\bigr \rangle . \end{aligned}$$

On the event \(\mathcal{O}(\bar{\alpha })\), the vector of weights \(W=(W_t)_{t\in \mathcal{T}_*}\) can also be written as a vector of cluster weights \(V_\alpha \) in (27) indexed by the vertices \(\alpha \) in the subtree formed by all paths from the root to the leaves \((\alpha _t)_{t\in \mathcal{L}(\mathcal{T})}\). Let us call this vector \(V(\bar{\alpha })\). Also, obviously,

$$\begin{aligned} \bigl \langle {\mathrm{I}}_{\mathcal{O}(\bar{\alpha })}\bigr \rangle = \prod _{t\in \mathcal{L}(\mathcal{T})} V_{\alpha _t}^{|\mathcal{R}(t)|} \end{aligned}$$

and, therefore,

$$\begin{aligned} \mathbb {P}_\mathcal{O}(W\in B) = \mathbb {E}\sum _{\bar{\alpha } \in \mathcal{A}(\mathcal{C})} {\mathrm{I}}\bigl (V(\bar{\alpha }) \in B\bigr ) \prod _{t\in \mathcal{L}(\mathcal{T})} V_{\alpha _t}^{|\mathcal{R}(t)|}. \end{aligned}$$

To finish the proof of Lemma 3, it is enough to show that the distribution of \(V(\bar{\alpha })\) is absolutely continuous with respect to the Lebesgue measure. For the remainder of this section, we will forget about the configuration \(\mathcal{C}\) and will focus on proving the absolute continuity of the distribution of cluster weights \((V_\alpha )_{\alpha \in F}\) indexed by an arbitrary finite subset \(F\) of the tree \(\mathcal{A}\). Of course, this will be based on the properties of the RPC, so we will first recall the construction of these cascades and how it relates to the weights \(V_\alpha \).

Recall the sequence of parameters in (24). For each \(\alpha \in \mathcal{A}{\setminus } \mathbb {N}^r\), let \(\Pi _\alpha \) be a Poisson process on \((0,\infty )\) with the mean measure \(\zeta _{p}x^{-1-\zeta _{p}}dx\) with \(p=|\alpha |\), and we assume that these processes are independent for all \(\alpha \). Let us arrange all the points in \(\Pi _\alpha \) in decreasing order,

$$\begin{aligned} u_{\alpha 1} > u_{ \alpha 2} >\cdots >u_{\alpha n} > \cdots , \end{aligned}$$
(83)

and enumerate them using the children \((\alpha n)_{n\ge 1}\) of the vertex \(\alpha \). Given a vertex \(\alpha \in \mathcal{A}{\setminus } \{*\}\) and the path \(p(\alpha )\) in (20), we define

$$\begin{aligned} w_\alpha = \prod _{\beta \in p(\alpha )} u_{\beta }, \end{aligned}$$
(84)

and for the leaf vertices \(\alpha \in \mathcal{L}(\mathcal{A}) = \mathbb {N}^r\) we define

$$\begin{aligned} v_\alpha = \frac{w_\alpha }{\sum _{\beta \in \mathbb {N}^r} w_\beta }. \end{aligned}$$
(85)

For other vertices \(\alpha \in \mathcal{A}{\setminus } \mathcal{L}(\mathcal{A})\) we define

$$\begin{aligned} v_\alpha = \sum _{\beta \in \mathcal{L}(\mathcal{A}),\,\beta \succ \alpha } v_\beta . \end{aligned}$$
(86)

Of course, this definition implies that \(v_\alpha = \sum _{n\ge 1} v_{\alpha n}\) when \(|\alpha |<r\). Notice that, for a given \(\alpha \), the sequence of weights \((v_{\alpha n})_{n\ge 1}\) is not necessarily decreasing. For example, when \(r=2\), sequences \((u_n)_{n\ge 1}\) and \((u_{nm})_{m\ge 1}\) for all \(n\) are decreasing by construction, but \(v_n\) is proportional to \(u_n\sum _{m\ge 1} u_{nm}\) and does not have to be decreasing. Let us now rearrange the vertex labels so that the weights indexed by children will be decreasing. For each \(\alpha \in \mathcal{A}{\setminus } \mathbb {N}^r\), let \(\pi _\alpha : \mathbb {N}\rightarrow \mathbb {N}\) be a bijection such that the sequence \((v_{\alpha \pi _\alpha (n)})_{n\ge 1}\) is decreasing. Using these “local rearrangements” we define a global bijection \(\pi : \mathcal{A}\rightarrow \mathcal{A}\) in a natural way, as follows. We let \(\pi (*)=*\) and then define

$$\begin{aligned} \pi (\alpha n) = \pi (\alpha ) \pi _{\pi (\alpha )}(n) \end{aligned}$$
(87)

recursively from the root to the leaves of the tree. Finally, we define

$$\begin{aligned} V_\alpha = v_{\pi (\alpha )} \ \text{ for } \text{ all } \ \alpha \in \mathcal{A}. \end{aligned}$$
(88)

It is not a coincidence that we used here the same notation as in (27), since they have the same distribution. This relationship between cluster weights of a random measure \(G\) and the RPC is a well-known consequence of the Ghirlanda–Guerra identities (see Section 2.4 in [21]). Therefore, our goal is to prove the following.
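As a purely illustrative aside (it plays no role in the proofs), the construction (83)–(88) is easy to simulate for \(r=2\). The sketch below truncates each Poisson process to its \(K\) largest points, so the weights are only approximate, and uses the standard fact that the decreasing points of a Poisson process with mean measure \(\zeta x^{-1-\zeta }dx\) can be realized as \(\Gamma _n^{-1/\zeta }\), where \(\Gamma _n\) are the arrival times of a rate-one Poisson process; the truncation level \(K\) and the parameters \(\zeta _0<\zeta _1\) are arbitrary choices.

```python
import itertools
import random

# Hypothetical simulation parameters; each Poisson process is truncated to
# its K largest points, so the resulting weights are only approximate.
random.seed(0)
K = 50
zeta = [0.3, 0.7]  # zeta_0 < zeta_1, playing the role of (24) with r = 2

def poisson_points(z, k):
    """The k largest points of a Poisson process with mean measure
    z * x^(-1-z) dx, realized as Gamma_n^(-1/z) for arrival times Gamma_n
    of a rate-one Poisson process; the returned list is decreasing."""
    gammas = itertools.accumulate(random.expovariate(1.0) for _ in range(k))
    return [g ** (-1.0 / z) for g in gammas]

# (83)-(84): points u_n at the first level, u_{nm} at the second level,
# and the products w_{nm} = u_n * u_{nm} along paths from the root.
u = poisson_points(zeta[0], K)
u2 = [poisson_points(zeta[1], K) for _ in range(K)]
w = {(n, m): u[n] * u2[n][m] for n in range(K) for m in range(K)}

# (85)-(86): normalize over the leaves and sum over subtrees.
total = sum(w.values())
v_leaf = {a: x / total for a, x in w.items()}
v_node = [sum(v_leaf[(n, m)] for m in range(K)) for n in range(K)]

# (87)-(88): relabel so that the weights of the children of each vertex are
# decreasing. Only the first level needs rearranging here: for fixed n, the
# sequence (v_{nm})_m is proportional to the already-decreasing (u_{nm})_m.
order = sorted(range(K), key=lambda n: -v_node[n])
V_node = [v_node[n] for n in order]
V_leaf = {(i, m): v_leaf[(order[i], m)] for i in range(K) for m in range(K)}
```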

Lemma 4

The distribution of weights \((V_\alpha )_{\alpha \in F}\) in (88) indexed by an arbitrary finite subset \(F\) of the tree \(\mathcal{A}\) is absolutely continuous with respect to the Lebesgue measure on \(\mathbb {R}^{|F|}\).

Let us first introduce some more notation and recall some definitions. Let \((u_n)_{n\ge 1}\) be the decreasing enumeration of a Poisson process on \((0,\infty )\) with the mean measure \(x u^{-1-x}du\) for some \(x\in (0,1)\) and let

$$\begin{aligned} U = \sum _{n\ge 1} u_n \ \text{ and } \ p_n = \frac{u_n}{U} \ \text{ for } \ n\ge 1. \end{aligned}$$
(89)

The distribution of the sequence \((p_n)_{n\ge 1}\) is called the Poisson–Dirichlet distribution \(PD(x)\) (or \(PD(x,0)\)). It is well known that the distribution of finitely many coordinates of \((p_n)\) is absolutely continuous. For example, Proposition 47 in [25] gives some representation for the density, but the existence of the density is also easy to see directly from the representation of this process in Proposition 8 in [25].
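As a quick numerical illustration of the \(PD(x)\) weights (again, with no role in the argument), one can combine the representation \(u_n = \Gamma _n^{-1/x}\), with \(\Gamma _n\) the arrival times of a rate-one Poisson process, with the well-known identity \(\mathbb {E}\sum _{n\ge 1} p_n^2 = 1-x\). The truncation level and sample size below are arbitrary choices.

```python
import itertools
import random

# Monte Carlo illustration: realize the PD(x) weights as p_n = u_n / U with
# u_n = Gamma_n^(-1/x), Gamma_n the arrival times of a rate-one Poisson
# process, and check the identity E sum_n p_n^2 = 1 - x approximately.
random.seed(1)
x, K, reps = 0.5, 200, 1000  # truncation level K and sample size are arbitrary

def pd_sample(x, k):
    gammas = itertools.accumulate(random.expovariate(1.0) for _ in range(k))
    u = [g ** (-1.0 / x) for g in gammas]
    total = sum(u)
    return [un / total for un in u]

est = sum(sum(p * p for p in pd_sample(x, K)) for _ in range(reps)) / reps
# est should be close to 1 - x
```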

Let us consider \(a<x\). Then the distribution of \((p_n)_{n\ge 1}\) under the change of density \(U^a/\mathbb {E}U^a\) is called the Poisson–Dirichlet distribution \(PD(x,-a)\). The usual condition \(a<x\) ensures that \(\mathbb {E}U^a <\infty \) and the change of density is well defined (see e.g. Lemma 2.1 in [21]). The definition of this distribution in Section 1.1 in [25] was different but its equivalence to this one was shown in Proposition 14 there. (In [25], the parameter \(-a\) was denoted \(\theta \) and the condition was stated as \(\theta >-x\).) It is easy to see that the distribution of finitely many coordinates of \((p_n)\) under \(PD(x,-a)\) is also absolutely continuous. Indeed, for any \(N\ge 1\) and a measurable set \(A\) in \(\mathbb {R}^N\) of Lebesgue measure \(0\), by Hölder’s inequality,

$$\begin{aligned} \mathbb {E}U^a {\mathrm{I}}\bigl ((p_n)_{n\le N}\in A\bigr ) \le (\mathbb {E}U^{a(1+{\varepsilon })})^{1/(1+{\varepsilon })} \mathbb {P}\bigl ((p_n)_{n\le N}\in A\bigr )^{{\varepsilon }/(1+{\varepsilon })}=0, \end{aligned}$$
(90)

for small enough \({\varepsilon }>0\) such that \(a(1+{\varepsilon })<x\), in which case \(\mathbb {E}U^{a(1+{\varepsilon })}<\infty \).
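For simulation purposes only, \(PD(x,-a)\) also admits the standard stick-breaking (GEM) description: with independent \(W_k\) having \(\mathrm{Beta}(1-x,\,kx-a)\) distributions, the stick-breaking weights sorted in decreasing order have the \(PD(x,-a)\) distribution. Note that the second Beta parameter must already be positive for \(k=1\), which is exactly the condition \(a<x\) above. A sketch, with an arbitrary truncation level:

```python
import random

# Stick-breaking (GEM) sketch of PD(x, -a), not used in the proofs:
# q_k = W_k * prod_{j<k}(1 - W_j) with independent W_k ~ Beta(1-x, k*x-a),
# then sort in decreasing order.
random.seed(2)
x, a, K = 0.6, 0.2, 500      # K is an arbitrary truncation level; a < x

q, rest = [], 1.0
for k in range(1, K + 1):
    w = random.betavariate(1 - x, k * x - a)
    q.append(rest * w)       # the k-th stick-breaking weight
    rest *= 1 - w            # remaining stick length
p = sorted(q, reverse=True)  # approximately a PD(x, -a) sample
```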

For each \(\alpha \in \mathbb {N}^{r-1}\), let us now consider the sequence

$$\begin{aligned} p_{\alpha n} = \frac{V_{\alpha n}}{V_\alpha } \ \text{ for } \ n\ge 1. \end{aligned}$$
(91)

By definition, this sequence is decreasing and \(\sum _{n\ge 1} p_{\alpha n}=1\). The following holds.

Lemma 5

For each \(\alpha \in \mathbb {N}^{r-1}\), the sequence \((p_{\alpha n})_{n\ge 1}\) in (91) has distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). These sequences are independent of each other and of \((V_{\alpha })_{|\alpha |\le r-1}\).

First, let us show how this implies Lemma 4.

Proof of Lemma 4

This now follows easily by induction on \(r\). For \(r=1\), this is just absolute continuity of weights from the Poisson–Dirichlet distribution \(PD(\zeta _0)\). To make an induction step, we use the well-known fact that the array \((V_\alpha )_{|\alpha | \le r-1}\) can be constructed as in (83)–(88) with \(r\) replaced by \(r-1\) and \(\zeta _{r-1}\) removed from the sequence (24). This observation goes back to [26], but is also a trivial consequence of the Ghirlanda–Guerra identities. (In any case, the proof of this fact will appear below as a byproduct of the proof of Lemma 5.) By the induction hypothesis, this implies that the distribution of finitely many coordinates of \((V_\alpha )_{|\alpha | \le r-1}\) is absolutely continuous. To include coordinates \(V_{\alpha n}\) for \(\alpha \in \mathbb {N}^{r-1}\) and \(n\ge 1\), we write them as \(V_{\alpha n} = V_\alpha p_{\alpha n}\) and use Lemma 5 together with the observation in (90) about absolute continuity of the distribution of finitely many coordinates under \(PD(x,-a)\). \(\square \)

Proof of Lemma 5

We only need to consider the case \(r\ge 2\). For each \(\alpha \in \mathbb {N}^{r-2}\), consider the process \((u_{\alpha n}, (u_{\alpha n m})_{m\ge 1} )_{n\ge 1}\) and let \(U_{\alpha n} := \sum _{m\ge 1} u_{\alpha n m}\). If we define

$$\begin{aligned} d_{\alpha n m} = \frac{v_{\alpha n m}}{v_{\alpha n}} = \frac{u_{\alpha n m}}{U_{\alpha n}} \end{aligned}$$

then \(Y_{\alpha n} := (d_{\alpha n m})_{m\ge 1}\) has the Poisson–Dirichlet distribution \(PD(\zeta _{r-1})\). Notice that the random variables \((U_{\alpha n}, Y_{\alpha n})_{n\ge 1}\) are i.i.d. and independent of \((u_{\alpha n})_{n\ge 1}\). Moreover, all these processes are independent over \(\alpha \in \mathbb {N}^{r-2}\), and also independent of \(U_{r-2} = (u_\alpha )_{|\alpha |\le r-2}\).

For a fixed \(\alpha \in \mathbb {N}^{r-2}\), let \(\pi _\alpha :\mathbb {N}\rightarrow \mathbb {N}\) be a bijection such that the sequence \((u_{\alpha \pi _\alpha (n)} U_{\alpha \pi _\alpha (n)})_{n\ge 1}\) is decreasing. This is exactly the same permutation defined in the paragraph above (87) since, for a fixed \(\alpha \in \mathbb {N}^{r-2}\), \(v_{\alpha n}\) is proportional to \(u_{\alpha n}U_{\alpha n}\). Since \((u_{\alpha n})_{n\ge 1}\) is a Poisson process with the mean measure \(\zeta _{r-2} \,x^{-1-\zeta _{r-2}} dx\), Theorem 2.6 in [21] (Proposition A.2 in [6]) implies that

$$\begin{aligned} \bigl (u_{\alpha \pi _\alpha (n)} U_{\alpha \pi _\alpha (n)} , Y_{\alpha \pi _\alpha (n)}\bigr )_{n\ge 1} \mathop {=}\limits ^{d} \bigl (u_{\alpha n} c, Y_{\alpha n}'\bigr )_{n\ge 1} \end{aligned}$$
(92)

where \(c= \bigl (\mathbb {E}U_{\alpha 1}^{\zeta _{r-2}} \bigr )^{1/\zeta _{r-2}}\), \((u_{\alpha n})_{n\ge 1}\) and \((Y_{\alpha n}')_{n\ge 1}\) on the right hand side are independent, and the random variables \((Y_{\alpha n}')_{n\ge 1}\) are i.i.d. with the distribution of \(Y_{\alpha 1} = (d_{\alpha 1 m})_{m\ge 1}\) under the change of density

$$\begin{aligned} U_{\alpha 1}^{\zeta _{r-2}} \bigr / \,\mathbb {E}U_{\alpha 1}^{\zeta _{r-2}}, \end{aligned}$$

which is precisely the Poisson–Dirichlet distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). It remains to notice that the weights \((V_{\alpha })_{|\alpha |\le r-1}\) are, obviously, a function of the arrays

$$\begin{aligned} U_{r-2} = (u_\alpha )_{|\alpha |\le r-2} \ \text{ and } \ \bigl (u_{\alpha \pi _\alpha (n)} U_{\alpha \pi _\alpha (n)} \bigr )_{\alpha \in \mathbb {N}^{r-2},n\ge 1} \end{aligned}$$
(93)

and are, therefore, independent of the random variables \(Y_{\alpha \pi _\alpha (n)}\), which are i.i.d. for all \(\alpha \in \mathbb {N}^{r-2}\) and \(n\ge 1\) and have the distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\). In particular, the permutation \(\pi \) defined in (87), restricted to \(|\alpha |\le r-1\), will be a function of these arrays and, therefore,

$$\begin{aligned} \bigl (d_{\pi (\alpha n)m} \bigr )_{m\ge 1} = Y_{\pi (\alpha n)} = Y_{\pi (\alpha ) \pi _{\pi (\alpha )}(n)} \end{aligned}$$

are still i.i.d. over all \(\alpha \in \mathbb {N}^{r-2}\) and \(n\ge 1\), have distribution \(PD(\zeta _{r-1},-\zeta _{r-2})\), and are independent of \((V_{\alpha })_{|\alpha |\le r-1}\). This finishes the proof since, by the definition (91), for \(\alpha n\in \mathbb {N}^{r-1}\),

$$\begin{aligned} p_{\alpha n m} = \frac{V_{\alpha n m}}{V_{\alpha n}} = \frac{v_{\pi (\alpha n) m}}{v_{\pi (\alpha n)}} = d_{\pi (\alpha n)m}. \end{aligned}$$

Finally, let us notice that the above argument also proves the fact mentioned in the proof of Lemma 4, namely, that the array \((V_\alpha )_{|\alpha | \le r-1}\) can be constructed as in (83)–(88) with \(r\) replaced by \(r-1\) and \(\zeta _{r-1}\) removed from the sequence (24). This is because \((V_\alpha )_{|\alpha | \le r-1}\) is constructed from the arrays in (93) as in (83)–(88) and, by (92), for each \(\alpha \in \mathbb {N}^{r-2}\), the second array in (93) is, up to a factor \(c\), a Poisson process with the mean measure \(\zeta _{r-2} \,x^{-1-\zeta _{r-2}} dx\). Of course, this constant factor \(c\) will cancel at the step (85), so the claim follows. \(\square \)

5 From replicas to the Gibbs measure

In this section, we will show how Theorem 1 can be deduced from Theorem 4. The main idea is that when the sample size \(n\) goes to infinity, there will be many replicas in any given subset of pure states, and the statement in Theorem 4 about spins and cluster weights corresponding to the sample can be translated into a statement in Theorem 1 about spins inside pure states and cluster weights of the Gibbs measure.

Before we begin the proof, let us first notice that Theorem 1 follows from its analogue for finite subsets of the tree \(\mathcal{A}\), as follows. Let us consider integers \(d\ge 1\) and \(N\ge 1\) that will be fixed throughout this section (note that now the notation \(N\) is not related to the number of coordinates of the system in the introduction). Let \([d]=\{1,\ldots , d\}\) and let

$$\begin{aligned} \mathcal{A}_d = \{*\}\cup [d] \cup [d]^2 \cup \cdots \cup [d]^r \subseteq \mathcal{A}\end{aligned}$$

be a \(d\)-regular subtree of \(\mathcal{A}\). Any finite subset of \(\mathcal{A}\) will be covered by \(\mathcal{A}_d\) for \(d\) sufficiently large (depending on the subset). Now, recall the array \(S_\alpha = (S(\sigma ^{\alpha n}))_{n\ge 1}\) in (30) and let us truncate it to the array

$$\begin{aligned} S_{\alpha ,N} = \bigl (S(\sigma ^{\alpha n}) \bigr )_{n\le N} \end{aligned}$$
(94)

generated by a sample \((\sigma ^{\alpha n})_{n\le N}\) of size \(N\) from the pure state \(H_{\alpha }\). We will only consider these arrays for \(\alpha \in [d]^r = \mathcal{L}(\mathcal{A}_d)\), so we will need to restrict the notion of hierarchical exchangeability to the finite tree \(\mathcal{A}_d\). Similarly to (31), let

$$\begin{aligned} \mathcal{H}_d = \Bigl \{ \pi : [d]^r \rightarrow [d]^r \ \bigr |\ \pi \text{ is } \text{ a } \text{ bijection }, \pi (\alpha )\wedge \pi (\beta ) = \alpha \wedge \beta \text{ for } \text{ all } \alpha ,\beta \in [d]^r \Bigr \}. \end{aligned}$$
(95)
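To make the finite group \(\mathcal{H}_d\) concrete (purely as an illustration), one can enumerate it by brute force for tiny \(d\) and \(r\), interpreting \(\alpha \wedge \beta \) as the depth of the longest common prefix of the leaves \(\alpha \) and \(\beta \); leaf labels in the sketch are \(0\)-based tuples.

```python
from itertools import permutations

# Brute-force enumeration of the group H_d for the tiny case d = r = 2:
# keep the bijections of the leaf set [d]^r that preserve alpha ^ beta,
# the depth of the longest common prefix of two leaves.
d, r = 2, 2
leaves = [(i, j) for i in range(d) for j in range(d)]

def wedge(a, b):
    """Depth of the longest common prefix of two leaves."""
    k = 0
    while k < r and a[k] == b[k]:
        k += 1
    return k

H = [pi for pi in permutations(leaves)
     if all(wedge(pi[i], pi[j]) == wedge(leaves[i], leaves[j])
            for i in range(len(leaves)) for j in range(len(leaves)))]
```

The bijections found this way are exactly the automorphisms of the complete \(d\)-ary tree of depth \(r\) restricted to its leaves, a group of cardinality \((d!)^{1+d+\cdots +d^{r-1}}\); for \(d=r=2\) there are \(8\) of them.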

Then, naturally, we will call a finite array \((X_\alpha )_{\alpha \in [d]^r}\) hierarchically exchangeable if

$$\begin{aligned} \bigl (X_{\pi (\alpha )} \bigr )_{\alpha \in [d]^r} \mathop {=}\limits ^{d} \bigl (X_\alpha \bigr )_{\alpha \in [d]^r} \end{aligned}$$
(96)

for all \(\pi \in \mathcal{H}_d\). It is obvious that, in order to prove Theorem 1, it is sufficient to show the following for all \(d,N \ge 1\).

Theorem 1 \(^{\prime }\)   The array of spins \((S_{\alpha ,N})_{\alpha \in [d]^r}\) defined in (94) is hierarchically exchangeable and independent of the array of cluster weights \((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}\).

To prove this, we will apply Theorem 4 to the following set of configurations \(\mathcal{C}=(\mathcal{T},\mathcal{P})\),

$$\begin{aligned} \mathcal{C}(n,d,N) = \Bigl \{ \mathcal{C}=(\mathcal{T},\mathcal{P}) \ \bigr |\ \mathcal{A}_d\subseteq \mathcal{T} \text{ and } |\mathcal{P}^{-1}(t)|\ge N \text{ for } t\in [d]^r \Bigr \}\quad \end{aligned}$$
(97)

(this set depends on \(n\) through the mapping \(\mathcal{P}\)). In words, the tree \(\mathcal{T}\) contains \(\mathcal{A}_d\) (so it is big enough) and at least \(N\) replica indices are mapped by \(\mathcal{P}\) into each leaf \(t\in [d]^r = \mathcal{L}(\mathcal{A}_d)\subseteq \mathcal{L}(\mathcal{T})\). For a given configuration \(\mathcal{C}\in \mathcal{C}(n,d,N)\) and \(t\in [d]^r\), let \(\mathcal{R}_N(t)\) be the set of the smallest \(N\) replica indices in \(\mathcal{P}^{-1}(t)\) (we choose the smallest \(N\) just for definiteness; any \(N\) indices would do) and define \(\mathcal{R}_{d,N} = \bigcup _{t\in [d]^r} \mathcal{R}_N(t)\). Let us recall the definition of \(S^n\) in (11) and (12) and, similarly, define

$$\begin{aligned} S^{d,N} = \bigl (S(\sigma ^\ell ) \bigr )_{\ell \in \mathcal{R}_{d,N}}. \end{aligned}$$
(98)

In other words, we are now only interested in a set of \(N\) replicas for each of the leaves in \([d]^r.\) Similarly to (57), let us define the event

$$\begin{aligned} \mathcal{O}(\mathcal{C},d,N) = \Bigl \{ (\sigma ^\ell )_{\ell \in \mathcal{R}_{d,N}} \ \bigr |\ \sigma ^\ell \cdot \sigma ^{\ell '}\in I_{\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')} \text{ for } \text{ all } \ell ,\ell ' \in \mathcal{R}_{d,N}\Bigr \},\quad \end{aligned}$$
(99)

which involves only the replicas with indices in \(\mathcal{R}_{d,N}\) and, similarly to the definition of \(\mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\) in (66), we let

$$\begin{aligned} \mathbb {P}_{\mathcal{O}(\mathcal{C},d,N)}(\ \cdot \ ) = \frac{\mathbb {P}(\ \cdot \ \cap \mathcal{O}(\mathcal{C},d,N))}{\mathbb {P}(\mathcal{O}(\mathcal{C},d,N))}. \end{aligned}$$
(100)

We will need the following simple consequence of the Ghirlanda–Guerra identities (15).

Lemma 6

For any \(\mathcal{C}\in \mathcal{C}(n,d,N)\), we have

$$\begin{aligned} \mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\bigl (S^{d,N}\in \ \cdot \ \bigr ) = \mathbb {P}_{\mathcal{O}(\mathcal{C},d,N)}\bigl (S^{d,N}\in \ \cdot \ \bigr ). \end{aligned}$$
(101)

Proof

Let us consider the numerator and denominator on the left hand side of (101),

$$\begin{aligned} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}}\bigr \rangle \ \text{ and } \ \mathbb {E}\bigl \langle {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}}\bigr \rangle . \end{aligned}$$

Consider any replica index \(\ell \in \{1,\ldots , n\}{\setminus } \mathcal{R}_{d,N}\) not appearing in \(S^{d,N}\). For simplicity of notation, suppose that this index is \(n\). Then, let \(\ell ' \not = n\) be a replica index such that \(\mathcal{P}(n) \wedge \mathcal{P}(\ell ')\) is as large as possible. Again, for simplicity of notation, suppose that \(\ell ' =1\) (it does not matter whether this replica index is in \(\mathcal{R}_{d,N}\) or not). Let \(p = \mathcal{P}(n) \wedge \mathcal{P}(1)\) so that, on the event \({\mathcal{O}(\mathcal{C})}\) in (57), we have \(\sigma ^1\cdot \sigma ^n \in I_p\). By assumption, \(\mathcal{P}(\ell )\wedge \mathcal{P}(n) \le p\) for \(2\le \ell \le n-1\) and, therefore, \(\mathcal{P}(1)\wedge \mathcal{P}(\ell ) = \mathcal{P}(\ell )\wedge \mathcal{P}(n).\) By ultrametricity, the constraint \(\sigma ^1\cdot \sigma ^{\ell }\in I_{\mathcal{P}(1)\wedge \mathcal{P}(\ell )}\) automatically implies that \(\sigma ^\ell \cdot \sigma ^{n}\in I_{\mathcal{P}(1)\wedge \mathcal{P}(\ell )} = I_{\mathcal{P}(\ell )\wedge \mathcal{P}(n)}\), which means that we can write

$$\begin{aligned} {\mathcal{O}(\mathcal{C})}={\mathcal{O}(\mathcal{C})}^-\bigcap \, \{\sigma ^1\cdot \sigma ^n \in I_p\}, \end{aligned}$$

where

$$\begin{aligned} {\mathcal{O}(\mathcal{C})}^- = \Bigl \{ (\sigma ^\ell )_{1\le \ell \le n-1} \ \bigr |\ \sigma ^\ell \cdot \sigma ^{\ell '}\in I_{\mathcal{P}(\ell )\wedge \mathcal{P}(\ell ')} \text{ for } \text{ all } 1\le \ell ,\ell ' \le n-1\Bigr \}. \end{aligned}$$

Then, using the Ghirlanda–Guerra identities, we get

$$\begin{aligned} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}}\bigr \rangle =&\ \frac{1}{n-1} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-}\bigr \rangle \mathbb {E}\bigl \langle {\mathrm{I}}(\sigma ^1\cdot \sigma ^2 \in I_p)\bigr \rangle \\&+ \frac{1}{n-1} \sum _{\ell =2}^{n-1} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-} {\mathrm{I}}(\sigma ^1\cdot \sigma ^\ell \in I_p)\bigr \rangle . \end{aligned}$$

By the definition (24), \(\mathbb {E}\langle {\mathrm{I}}(\sigma ^1\cdot \sigma ^2 \in I_p)\rangle = \zeta (I_p) = \zeta _{p}-\zeta _{p-1}.\) In the second sum,

$$\begin{aligned} \ \text{ either } \ {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-} {\mathrm{I}}(\sigma ^1\cdot \sigma ^\ell \in I_p) ={\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-} \ \text{ or } \ {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-} {\mathrm{I}}(\sigma ^1\cdot \sigma ^\ell \in I_p) = 0 \end{aligned}$$

depending on whether \(\ell \in \mathcal{I}=\{2\le \ell \le n-1 \ |\ \mathcal{P}(\ell )\wedge \mathcal{P}(1) =p \}\) or not. Therefore,

$$\begin{aligned} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}}\bigr \rangle = \frac{\zeta _{p}-\zeta _{p-1} + |\mathcal{I}|}{n-1} \mathbb {E}\bigl \langle {\mathrm{I}}(S^{d,N}\in A) {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-}\bigr \rangle . \end{aligned}$$

Since this computation did not depend on the set \(A\), similarly, we get

$$\begin{aligned} \mathbb {E}\bigl \langle {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}}\bigr \rangle = \frac{\zeta _{p}-\zeta _{p-1} + |\mathcal{I}|}{n-1} \mathbb {E}\bigl \langle {\mathrm{I}}_{{\mathcal{O}(\mathcal{C})}^-}\bigr \rangle . \end{aligned}$$

Dividing these two equations, we see that \( \mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\bigl (S^{d,N}\in A \bigr ) = \mathbb {P}_{{\mathcal{O}(\mathcal{C})}^-}\bigl (S^{d,N}\in A \bigr ). \) We can now proceed in the same way, removing replica indices one by one until only the replicas with indices in the set \(\mathcal{R}_{d,N}\) remain. This finishes the proof. \(\square \)

Remark 1

Notice that the right hand side of (101) does not really depend on the configuration \(\mathcal{C}\) since the set \(\mathcal{R}_{d,N}\) involves \(N\) replicas assigned to the leaves \([d]^r\) of the tree \(\mathcal{A}_d\), and we can relabel those replicas using indices \(1,\ldots , N d^r.\) Let \(\mathcal{C}_{d,N}\) be a configuration consisting of the tree \(\mathcal{A}_d\) and a map \(\mathcal{P}_{d,N}\) that maps exactly \(N\) indices in \(\{1,\ldots , N d^r\}\) to each leaf in \([d]^r\). Then the equation (101) can be rewritten as

$$\begin{aligned} \mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\bigl (S^{d,N}\in \ \cdot \ \bigr ) = \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (S^{d,N}\in \ \cdot \ \bigr ). \end{aligned}$$
(102)

We use the same notation \(S^{d,N}\) on the right hand side but, of course, we need to change the definition of \(S^{d,N}\) to take into account this relabeling of indices. In fact, for clarity, let us index the \(N\) replicas mapped by \(\mathcal{P}_{d,N}\) into the leaf \(\alpha \in [d]^r = \mathcal{L}(\mathcal{A}_d)\) by \(\sigma ^{(\alpha , 1)},\ldots , \sigma ^{(\alpha , N)}.\) Then \(S^{d,N}\) on the right hand side of (102) is understood as

$$\begin{aligned} S^{d,N} = \bigl (S(\sigma ^{(\alpha ,\ell )}) \bigr )_{\alpha \in [d]^r,\ell \le N}. \end{aligned}$$
(103)

Notice that we use the notation \(\sigma ^{(\alpha ,\ell )}\) here to distinguish these (usual, unconditional) replicas sampled from the Gibbs measure \(G\) from the replicas \(\sigma ^{\alpha \ell }\) in (30), which denoted the sample from the conditional Gibbs measure \(G_\alpha \) on the pure state \(H_\alpha \).

For a given configuration \(\mathcal{C}=(\mathcal{T},\mathcal{P})\), let us recall the definition of \(W = (W_t)_{t\in \mathcal{T}_*}\) in (63) and (67), which represent the cluster weights around the sample on the event \({\mathcal{O}(\mathcal{C})}\). For a configuration \(\mathcal{C}\in \mathcal{C}(n,d,N)\) in (97), we will denote by

$$\begin{aligned} W^d = (W_t)_{t\in \mathcal{A}_d{\setminus }\{*\}} \end{aligned}$$
(104)

the subset of these weights along the subtree \(\mathcal{A}_d\subseteq \mathcal{T}\). Let us recall the definition of the sample configuration \(\mathcal{C}_n = (\mathcal{T}_n,\mathcal{P}_n)\) in (65) and consider two events

$$\begin{aligned} \mathcal{E}_1(n)&= \bigcup _{\mathcal{C}\in \mathcal{C}(n,d,N)} \Bigl \{S^{d,N} \in A, W^d\in B, \mathcal{C}_n = \mathcal{C}\Bigr \},\end{aligned}$$
(105)
$$\begin{aligned} \mathcal{E}_2(n)&= \bigcup _{\mathcal{C}\in \mathcal{C}(n,d,N)} \Bigl \{W^d\in B, \mathcal{C}_n = \mathcal{C}\Bigr \}. \end{aligned}$$
(106)

To understand what these events represent, let us see what they will look like with high probability when the sample size \(n\rightarrow \infty \). When \(n\) gets large, with high probability, at least \(N\) replicas will fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\). First of all, this means that with high probability the sample configuration \(\mathcal{C}_n \in \mathcal{C}(n,d,N)\). Second, conditionally on this event that at least \(N\) replicas fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), what are \(S^{d,N}\) and \(W^d\) in (105) and (106)? Recall that \(\mathcal{C}_n = \mathcal{C}\) means that the event \({\mathcal{W}(\mathcal{C})}\) in (64) occurs and, for each vertex \(t\in \mathcal{T}{\setminus } \mathcal{L}(\mathcal{T})\), the cluster weights indexed by its children are arranged in decreasing order. The pure states \(H_\alpha \) and the weights \(V = (V_\alpha )_{\alpha \in \mathcal{A}}\) in (27) of the clusters around the pure states were labelled in a similar fashion in (26). This implies that whenever \(\mathcal{C}_n = \mathcal{C}\in \mathcal{C}(n,d,N)\) and at least \(N\) replicas fall into each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), we must have \(W^d = (V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}.\) Moreover, in this case, the spins \(S^{d,N}\) correspond to \(N\) replicas sampled from each of the pure states \(H_\alpha \) for \(\alpha \in [d]^r\), i.e. \(S^{d,N} = (S_{\alpha ,N})_{\alpha \in [d]^r}\) defined in (94). This implies that

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {P}\bigl (\mathcal{E}_1(n)\bigr )&= \mathbb {P}\bigl ((S_{\alpha ,N})_{\alpha \in [d]^r} \in A, (V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}} \in B\bigr ),\end{aligned}$$
(107)
$$\begin{aligned} \lim _{n\rightarrow \infty }\mathbb {P}\bigl (\mathcal{E}_2(n)\bigr )&= \mathbb {P}\bigl ((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}} \in B\bigr ). \end{aligned}$$
(108)
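The claim that, for large \(n\), each pure state receives at least \(N\) replicas with high probability is a simple multinomial fact: replicas drawn independently from finitely many clusters with positive weights eventually fill every cluster. A minimal numerical sketch of this fact, with small toy values of \(d\), \(r\), \(N\) and fixed illustrative weights standing in for the random cluster weights \(V_\alpha \) (all names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, N = 2, 2, 5              # toy tree: d**r = 4 pure states, want >= N replicas in each
K = d ** r

# fixed toy cluster weights playing the role of the (random) weights V_alpha
V = np.array([0.4, 0.3, 0.2, 0.1])

def prob_all_states_filled(n, trials=2000):
    """Estimate P(each of the K pure states receives at least N of n i.i.d. replicas)."""
    counts = rng.multinomial(n, V, size=trials)   # replica counts per pure state, per trial
    return (counts >= N).all(axis=1).mean()

for n in (10, 100, 500):
    print(n, prob_all_states_filled(n))
```

The estimated probability is \(0\) for \(n < KN\) and increases to \(1\) as \(n\) grows, which is the high-probability event conditioned on throughout this section.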

To finish the proof of Theorem 1\({}^\prime \), it remains to show the following.

Lemma 7

We have,

$$\begin{aligned} \mathbb {P}\bigl (\mathcal{E}_1(n)\bigr ) = \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (S^{d,N}\in A \bigr ) \mathbb {P}\bigl (\mathcal{E}_2(n)\bigr ). \end{aligned}$$
(109)

Proof

First of all, when we defined the sample configuration \(\mathcal{C}_n\) in (65), we explained that the events \(\{\mathcal{C}_n =\mathcal{C}\}\) are disjoint for different \(\mathcal{C}\) and that \(\{\mathcal{C}_n =\mathcal{C}\} = {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\). Therefore,

$$\begin{aligned} \mathbb {P}\bigl (\mathcal{E}_1(n) \bigr )&= \sum _{\mathcal{C}\in \mathcal{C}(n,d,N)} \mathbb {P}\bigl ( \{S^{d,N} \in A\} \cap \{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\bigr ),\\ \mathbb {P}\bigl (\mathcal{E}_2(n) \bigr )&= \sum _{\mathcal{C}\in \mathcal{C}(n,d,N)} \mathbb {P}\bigl (\{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\bigr ). \end{aligned}$$

Notice that \(\{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\) is an event which involves only the weights \(W = (W_t)_{t\in \mathcal{T}_*}\) and can be written as \(\{W\in B'\}\) for some set \(B'\). Therefore, Theorem 4 implies that

$$\begin{aligned}&\mathbb {P}\bigl ( \{S^{d,N} \in A\} \cap \{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\bigr )\\&= \mathbb {P}_{{\mathcal{O}(\mathcal{C})}}\bigl (S^{d,N} \in A\bigr )\, \mathbb {P}\bigl (\{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\bigr ). \end{aligned}$$

Finally, using (102), we can write

$$\begin{aligned} \mathbb {P}\bigl (\mathcal{E}_1(n) \bigr )&= \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (S^{d,N}\in A \bigr ) \sum _{\mathcal{C}\in \mathcal{C}(n,d,N)} \mathbb {P}\bigl (\{W^d\in B\}\cap {\mathcal{W}(\mathcal{C})}\cap {\mathcal{O}(\mathcal{C})}\bigr )\\&= \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (S^{d,N}\in A \bigr )\, \mathbb {P}\bigl (\mathcal{E}_2(n) \bigr ), \end{aligned}$$

which finishes the proof. \(\square \)

Together with (107) and (108), Lemma 7 implies

$$\begin{aligned} \mathbb {P}\bigl ((S_{\alpha ,N})_{\alpha \in [d]^r} \in A, (V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}} \in B\bigr ) = \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (S^{d,N}\in A \bigr )\, \mathbb {P}\bigl ((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}} \in B\bigr ). \end{aligned}$$

Therefore, \((S_{\alpha ,N})_{\alpha \in [d]^r}\) and \((V_\alpha )_{\alpha \in \mathcal{A}_d{\setminus }\{*\}}\) are independent and, recalling (103),

$$\begin{aligned} \mathbb {P}\bigl ((S_{\alpha ,N})_{\alpha \in [d]^r} \in A \bigr ) = \mathbb {P}_{\mathcal{O}(\mathcal{C}_{d,N}) }\bigl (\bigl (S(\sigma ^{(\alpha ,\ell )}) \bigr )_{\alpha \in [d]^r,\ell \le N} \in A \bigr ). \end{aligned}$$
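The independence step uses the standard fact that if a joint law assigns product probability to every measurable rectangle \(A\times B\), then the two coordinates are independent; taking \(A\) or \(B\) to be the whole space identifies the two marginals, as in the display above. A toy discrete check of this mechanism (the arrays below are hypothetical stand-ins for the laws of the spins and the weights):

```python
import numpy as np

# marginals of two toy discrete variables, standing in for the laws of
# the spins (S_{alpha,N}) and the weights (V_alpha)
mu = np.array([0.2, 0.5, 0.3])
nu = np.array([0.6, 0.4])

# a joint law that assigns product probability to every rectangle {i} x {j}
joint = np.outer(mu, nu)

# taking B (resp. A) to be the whole space recovers the marginals
assert np.allclose(joint.sum(axis=1), mu)
assert np.allclose(joint.sum(axis=0), nu)

# factorization over an arbitrary rectangle A x B
A, B = [0, 2], [1]
p_rect = joint[np.ix_(A, B)].sum()
assert np.isclose(p_rect, mu[A].sum() * nu[B].sum())
print("factorizes on A x B:", p_rect)   # 0.5 * 0.4 = 0.2
```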

The hierarchical exchangeability of \((S_{\alpha ,N})_{\alpha \in [d]^r}\) now follows from the invariance of the event \(\mathcal{O}(\mathcal{C}_{d,N})\) under the permutations \(\pi \in \mathcal{H}_d\) in (95),

$$\begin{aligned} \bigl (\sigma ^{(\alpha ,\ell )}\bigr )_{\alpha \in [d]^r,\ell \le N} \in \mathcal{O}(\mathcal{C}_{d,N}) \Longleftrightarrow \bigl (\sigma ^{(\pi (\alpha ),\ell )}\bigr )_{\alpha \in [d]^r,\ell \le N} \in \mathcal{O}(\mathcal{C}_{d,N}). \end{aligned}$$
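The mechanism behind this last step can be illustrated by a toy simulation: conditioning i.i.d. coordinates on an event that is invariant under permuting them yields a conditional law that is exchangeable, i.e. every coordinate has the same conditional marginal. A hedged sketch, with a symmetric event playing the role of \(\mathcal{O}(\mathcal{C}_{d,N})\) (all names and values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
k, trials = 3, 200_000
X = rng.integers(0, 4, size=(trials, k))   # i.i.d. coordinates, uniform on {0,1,2,3}

# condition on an event invariant under permuting the coordinates,
# playing the role of O(C_{d,N})
cond = X[X.sum(axis=1) >= 6]

# the conditional law is exchangeable: all coordinates share the same marginal
for j in range(4):
    p0 = (cond[:, 0] == j).mean()
    p1 = (cond[:, 1] == j).mean()
    assert abs(p0 - p1) < 0.02             # equal in law; small Monte Carlo error
print("conditional sample size:", len(cond))
```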

This finishes the proof of Theorem 1\({}^\prime \) and, thus, Theorem 1.