Abstract
A class of deep Boltzmann machines is considered in the simplified framework of a quenched system with Gaussian noise and independent entries. The quenched pressure of a K-layer spin glass model is studied allowing interactions only among consecutive layers. A lower bound for the pressure is found in terms of a convex combination of K Sherrington–Kirkpatrick models and used to study the annealed and replica symmetric regimes of the system. A mapping to a one-dimensional monomer–dimer system is identified and used to rigorously control the annealed region at arbitrary depth K with the methods introduced by Heilmann and Lieb. The compression of this high-noise region displays a remarkable phenomenon of localisation of the processing layers. Furthermore, a replica symmetric lower bound for the limiting quenched pressure of the model is obtained in a suitable region of the parameters and the replica symmetric pressure is proved to have a unique stationary point.
1 Introduction and Results
The mean-field setting in Statistical Mechanics corresponds to the invariance of an N-particle system under the action of the permutation group. When this condition is weakened to permutation invariance within each set of a K-partition of the system \(\big (\sum _{p=1}^{K}N_p=N\big )\), a homogeneous model generalizes to its K-populated version. This generalization has been considered in spin systems for both non-random interactions, i.e. the Curie–Weiss model [12, 13], and random interactions, i.e. the Sherrington–Kirkpatrick model [7, 25]. In the first case, complete control of the thermodynamic properties has been reached for general values of the interaction parameters. In the random case, instead, only the so-called elliptic structure of the interactions is fully controlled, while the hyperbolic one is still not understood. We mention that the case \(K=2\) has already been solved in two particular frameworks characterized by replica symmetry: on the Nishimori line [6] or with spherical spins [4, 5].
In this paper, we continue the analysis started in [2, 8] concerning a mean-field spin glass with pure hyperbolic structure of the interactions, i.e. a random version of deep Boltzmann machines [DBM] over K layers [26]. The framework of [2] is generalized by dealing with a general number K of layers and by allowing local (layer dependent) temperatures. A lower bound for the quenched pressure in terms of K Sherrington–Kirkpatrick models [SK] coupled in temperature along a linear chain is obtained and used to study the annealed and replica symmetric regimes of the random DBM in the large volume limit. We mention that an upper bound for the quenched pressure in terms of the solution of an infinite-dimensional Hamilton–Jacobi equation has recently been obtained in [22] for \(K=2\) and layers of equal size; see also [23] for a generalization by the same author.
Our first result is a control of the annealed region \(A_K\) in terms of the largest zero of a matching polynomial which—up to a change of variable in the complex plane—is the partition function of a monomer–dimer system over the linear chain of length K [18, 19]. This region \(A_K\) turns out to be exactly the one where the annealed solution \(q=0\) is stable for the replica symmetric consistency equation. The compression of the annealed region leads to a peculiar structure of the layers: in particular, the extensive layers are localized along a chain of length two or three.
A replica symmetric lower bound for the quenched pressure is obtained in a suitable region of the parameters. In the case of Gaussian external fields, this region is identified by a K-dimensional version of the Almeida–Thouless condition for SK. Within this framework, the replica symmetric consistency equation is proved to have a unique solution on the whole space of parameters. It is important to mention that the uniqueness for the elliptic case [9, 25] is still an open problem when \(K>2\).
The paper is organised as follows. Section 2 introduces the model. In Sect. 3, we provide a lower bound for the quenched pressure of the DBM in terms of an interacting variational principle. In Sect. 4, we identify and study a region where the quenched and the annealed pressures of the DBM coincide. In Sect. 5, we derive the replica symmetric functional for the DBM and we study its stationary point(s). In Sect. 6, we provide a lower bound for the quenched pressure of the DBM in terms of the previous replica symmetric functional under suitable conditions on the parameters of the model. Appendix A contains properties of the zeros of the matching polynomials, which are useful to characterize the annealed region in Sect. 4 and are mainly due to Heilmann and Lieb [18].
2 Definitions
Consider N spin variables \(\sigma =(\sigma _i)_{i=1,\ldots ,N}\in \{-1,1\}^N\) arranged over K layers \(L_1,\dots ,L_K\) of cardinality \(N_1,\dots ,N_K\), respectively, so that \(\sum _{p=1}^K N_p=N\,\). Assume that the relative sizes of the layers converge in the large volume limit:
\[\lambda _p^{(N)}\,\equiv \,\frac{N_p}{N}\;\longrightarrow \;\lambda _p\in [0,1]\quad \text{as } N\rightarrow \infty \tag{1}\]
for every \(p=1,\dots ,K\,\). We denote \(\varLambda _N=(L_p)_{p=1,\dots ,K}\,\), \(\lambda ^{(N)}=\big (\lambda _p^{(N)}\big )_{p=1,\dots ,K}\) and \(\lambda =(\lambda _p)_{p=1,\dots ,K}\). Clearly, \(\sum _{p=1}^K\lambda _p=1\,\).
Let \(J_{ij}\) for \((i,j)\in L_p\times L_{p+1}\) and \(p=1,\dots ,K-1\,\) be a family of i.i.d. standard Gaussian random variables coupling spins in two consecutive layers. We introduce a vector of positive inverse temperatures tuning the interactions among consecutive layers \(\beta =(\beta _p)_{p=1,\dots ,K-1}\in \mathbb {R}_+^{K-1}\,\).
Let \(h_i\) for \(i\in L_p\) and \(p=1,\dots ,K\) be a family of independent real random variables, independent also of the \(J_{ij}\)’s, acting as external fields on the spins. Assume that \((h_i)_{i\in L_p}\) are i.i.d. copies of a random variable \(h^{(p)}\) such that \(\mathbb {E}|h^{(p)}|<\infty \,\). We denote \(h=(h^{(p)})_{p=1,\dots ,K}\,\).
Definition 1
The Hamiltonian of the random deep Boltzmann machine [DBM] is
for every spin configuration \(\sigma \in \{-1,1\}^N\,\).
Definition 2
Given two spin configurations \(\sigma ,\tau \in \{-1,1\}^N\), for every \(p=1,\ldots ,K\) we define their overlap over the layer \(L_p\) as
\[q_{L_p}(\sigma ,\tau )\,\equiv \,\frac{1}{N_p}\sum _{i\in L_p}\sigma _i\,\tau _i\,. \tag{3}\]
Remark 1
The covariance matrix of the centred Gaussian process \(H_{\varLambda _N}\) is
for every \(\sigma ,\tau \in \{-1,1\}^N\). Here, we set \(q_{\varLambda _N}(\sigma ,\tau ) \equiv \big (q_{L_p}(\sigma ,\tau )\big )_{p=1,\dots ,K}\;\),
and we denote \(M_1^{(N)}\equiv M_1(\beta ,\lambda ^{(N)})\,\). Notice that \(M_0(\beta )\) can be interpreted as a weighted adjacency matrix for the layers structure of the DBM.
Definition 3
The random partition function of the model introduced by Hamiltonian (2) is
and its quenched pressure density is
where \(\mathbb {E}\) denotes the expectation over all the couplings \(J_{ij}\,\)’s and the external fields \(h_i\)’s.
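As a concrete illustration of Definitions 1 and 3, the following Python sketch evaluates the finite-volume pressure of a very small DBM by enumerating all \(2^N\) spin configurations. It is only a sketch under stated assumptions: it takes the Hamiltonian to be \(-\sqrt{2/N}\sum _p\beta _p\sum _{(i,j)\in L_p\times L_{p+1}}J_{ij}\sigma _i\sigma _j\) and lets the external fields enter the partition function through \(\sum _i h_i\sigma _i\). This normalisation is an assumption, and the function names are hypothetical.

```python
import itertools
import math
import random


def sample_couplings(layer_sizes, rng):
    """J[p][i][j]: i.i.d. standard Gaussian coupling between spin i of
    layer p and spin j of layer p+1 (consecutive layers only)."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(layer_sizes[p + 1])]
             for _ in range(layer_sizes[p])]
            for p in range(len(layer_sizes) - 1)]


def dbm_pressure(layer_sizes, beta, J, h):
    """(1/N) log Z for one realisation of couplings J and fields h,
    computed by brute-force enumeration (feasible only for tiny N)."""
    N = sum(layer_sizes)
    # offsets[p] is the global index of the first spin of layer p
    offsets = [sum(layer_sizes[:p]) for p in range(len(layer_sizes))]
    scale = math.sqrt(2.0 / N)  # ASSUMED normalisation of the Hamiltonian
    Z = 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        minus_H = 0.0
        for p in range(len(layer_sizes) - 1):
            for i in range(layer_sizes[p]):
                for j in range(layer_sizes[p + 1]):
                    minus_H += (scale * beta[p] * J[p][i][j]
                                * sigma[offsets[p] + i]
                                * sigma[offsets[p + 1] + j])
        Z += math.exp(minus_H + sum(h[i] * sigma[i] for i in range(N)))
    return math.log(Z) / N


def quenched_pressure(layer_sizes, beta, n_samples=50, seed=0):
    """Monte Carlo estimate of E[(1/N) log Z] at zero external field."""
    rng = random.Random(seed)
    N = sum(layer_sizes)
    total = 0.0
    for _ in range(n_samples):
        J = sample_couplings(layer_sizes, rng)
        total += dbm_pressure(layer_sizes, beta, J, [0.0] * N)
    return total / n_samples
```

For instance, `quenched_pressure((2, 3, 2), (0.8, 1.1))` averages over 50 disorder samples of a three-layer system with seven spins. With all couplings set to zero the routine returns \(\log 2\) at zero field, which is a quick sanity check of the enumeration.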
3 A Lower Bound for the Quenched Pressure of the DBM
In this section, we give an explicit bound for the quenched pressure of the K layers DBM in terms of K independent Sherrington–Kirkpatrick spin glasses [SK] [14, 24, 27].
Considering N spin variables \(\sigma _i\), \(i=1,\dots ,N\), we recall that the Hamiltonian of the SK model is
where \({\tilde{J}}_{ij}\), \(i,j=1,\dots ,N\) is a family of i.i.d. standard Gaussian random couplings. Given two spin configurations \(\sigma ,\tau \in \{-1,1\}^N\), their overlap is
and the covariance matrix of the Gaussian process \(H^{\mathrm{SK}}_N\) is:
Given an inverse temperature \(\beta >0\), the random partition function of the SK model is
where \({\tilde{h}}_i\), \(i=1,\dots ,N\) is a family of i.i.d. copies of a random variable h such that \(\mathbb {E}|h|<\infty \,\). The quenched pressure density of the SK model is
where \(\mathbb {E}\) denotes the expectation over all couplings \(\tilde{J}_{ij}\)’s and fields \({\tilde{h}}_i\)’s. The quenched pressure converges as \(N\rightarrow \infty \) and many properties of its limit, that we will denote by \({{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h)\,\), have been investigated in the literature [3, 15, 17, 21, 24, 27].
Theorem 1
The quenched pressure of the DBM satisfies the following lower bound:
where, for every \(a= (a_p)_{p=1,\dots ,K-1}\in \mathbb {R}_+^{K-1}\), the functional \({{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a)={{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h)\) is defined as:
and the parameter \(\theta _p(a)=\theta _p(a;\beta ,\lambda )\ge 0\) is defined by:
Proof
We are going to prove the following lower bound at finite volume:
where \(\theta _p^{(N)}\equiv \theta _p(a;\beta ,\lambda ^{(N)})\) and \(a\in \mathbb {R}_{+}^{K-1}\) can be arbitrarily chosen. The lower bound (14) will follow immediately by letting \(N\rightarrow \infty \), since \({{\,\mathrm{p^{\mathrm{SK}}_N}\,}}(\beta ,h)\) is convex with respect to \(\beta \), and thus, the convergence to \({{\,\mathrm{p^{\mathrm{SK}}}\,}}\) is uniform on compact sets.
For every \(p=1,\ldots ,K\), let \(H^{\mathrm{SK}}_{L_p}(s)\), \(s\in \{-1,1\}^{L_p}\,\) be a Gaussian process representing the Hamiltonian of an SK model over the \(N_p\) spin variables in the layer \(L_p\,\). We assume that \(H^{\mathrm{SK}}_{L_1},\dots ,H^{\mathrm{SK}}_{L_K}\) are independent processes, also independent of the Hamiltonian \(H_{\varLambda _N}\). For \(\sigma \in \{-1,1\}^N\) and \(t\in [0,1]\), we define an interpolating Hamiltonian as follows:
where of course \(\sigma _{L_p}\equiv (\sigma _i)_{i\in L_p}\,\). An interpolating quenched pressure is naturally defined as
where
and \(\mathbb {E}\) denotes the expectation with respect to all the couplings \(J_{ij}\)’s, \({\tilde{J}}_{ij}\)’s, \(h_i\)’s. The quenched pressure of the DBM and a convex combination of quenched pressures of SK models are recovered for \(t=1\) and \(t=0\), respectively:
For every function \(f:\{-1,1\}^{N}\times \{-1,1\}^{N}\rightarrow \mathbb {R}\,\), we denote
Let \(Q_N:\{-1,1\}^{N}\times \{-1,1\}^{N}\rightarrow \mathbb {R}\,\),
Gaussian integration by parts leads to the following result:
Now, replacing the definition (16) of \(\theta _p^{(N)}=\theta _p(a;\beta ,\lambda ^{(N)})\) into (24), we obtain
The claim (17) follows immediately from (21), (22), (25) and (26). \(\square \)
Remark 2
\(a=(a_p)_{p=1,\dots ,K-1}\) is a stationary point of \({{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}\) if and only if
for every \(p=1,\dots ,K-1\,\), where we define \({{\,\mathrm{q^{\mathrm{SK}}}\,}}(\beta ,h)\ge 0\) by
and
Since \(\frac{\partial }{\partial \beta }{{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h)=\beta \,\big (1-{{\,\mathrm{q^{\mathrm{SK}}}\,}}(\beta ,h)^2\big )\,\) [27], it is straightforward to compute \(\frac{\partial }{\partial a_p}{{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}\,\) from definition (15) and find the stationary condition (27).
4 The Annealed Region of the DBM
In this section, we consider the model in the absence of external fields (\(h=0\)) and we identify a region where the quenched and the annealed pressures of the DBM coincide.
Definition 4
The annealed pressure of the DBM is
It can be easily computed due to the Gaussian nature of the model:
By concavity of the \(\log \), the annealed pressure is an upper bound for the quenched one:
The system is said to be in the annealed regime when the parameters \((\beta ,\lambda )\) are such that \(\lim _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}= {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}\,\).
By Theorem 1, we can investigate the annealed regime of the DBM relying on the established results for the annealed regime of the SK model. Let \({{\,\mathrm{p^{\mathrm{SK}}}\,}}\) be the limiting quenched pressure of an SK model, and let \({{\,\mathrm{p^{\mathrm{SK-A}}}\,}}\equiv \lim _{N\rightarrow \infty }N^{-1}\log \mathbb {E}Z^{\mathrm{SK}}_N\) be its annealed version. Clearly:
Equality is achieved in the so-called annealed region of the SK model [1, 14, 24, 27]:
Now, consider the following system of inequalities:
and the following region of parameters of the DBM:
where \(T_K \equiv \{ (\lambda _1,\dots ,\lambda _K) \in [0,1]^K \,|\, \sum _{p=1}^K\lambda _p=1 \}\,\) denotes the \(K\)-dimensional simplex. We denote by \(\overline{A_K}\) the topological closure of \(A_K\,\).
Theorem 2
If \((\beta ,\lambda )\in \overline{A_K}\), there exists
Proof
The lower bound (14) for the quenched pressure of the DBM rewrites as:
Thanks to (33) and (34), if \((\beta ,\lambda )\in \overline{A_K}\), then the supremum in (38) vanishes and
This bound together with (32) concludes the proof. \(\square \)
It is an open question whether \(\overline{A_K}\) is the full annealed region of the system. We will see that Proposition 4 suggests a positive answer. We are now interested in a more explicit characterization of \(A_K\). We mention that such a characterization can be interesting for inference problems as suggested in [10]. It is convenient to introduce the following family of polynomials.
Definition 5
Let \(x\in \mathbb {C}\) and \(t=(t_p)_{p=1,\dots ,K-1}\in [0,\infty )^{K-1}\). We define recursively
These orthogonal polynomials have several characterizations and were studied by Heilmann and Lieb [18, 19]. Some relevant properties can be found in Appendix A.
Remark 3
The polynomial \({{\,\mathrm{\Delta }\,}}_K(x,t)\) has an interesting combinatorial interpretation. Let us denote by \({\mathscr {L}}_K\) the linear graph of vertex set \(\{1,\dots ,K\}\) and edge set \(\{(p,p+1)\,|\,p=1,\dots ,K-1\}\,\). A matching on \(\mathscr {L}_K\) is a subset of pairwise disjoint edges. Then:
where:
Indeed, the polynomial on the right-hand side of (41) verifies the recursion relation (40) (see [18]).
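The equivalence between the recursive and the combinatorial descriptions can be checked numerically. The sketch below assumes that the recursion reads \(\Delta _0=1\), \(\Delta _1=x\), \(\Delta _{p+1}=x\,\Delta _p-t_p\,\Delta _{p-1}\), and that the matching expansion is \(\sum _{D}(-1)^{|D|}x^{K-2|D|}\prod _{p\in D}t_p\), with D ranging over the matchings of the linear graph. Both forms are the standard ones for matching polynomials but are assumptions here, and the function names are hypothetical.

```python
import itertools
import math


def delta_recursive(K, x, t):
    """Delta_K(x, t) from the ASSUMED three-term recursion
    Delta_0 = 1, Delta_1 = x, Delta_{p+1} = x*Delta_p - t_p*Delta_{p-1}.
    t[p-1] stores the activity t_p of the edge (p, p+1)."""
    prev, curr = 1.0, x
    if K == 0:
        return prev
    for p in range(1, K):
        prev, curr = curr, x * curr - t[p - 1] * prev
    return curr


def delta_matchings(K, x, t):
    """Delta_K(x, t) as a signed sum over matchings D of the linear graph
    (ASSUMED form): sum_D (-1)^|D| * x^(K - 2|D|) * prod_{p in D} t_p."""
    total = 0.0
    for size in range(K // 2 + 1):
        for D in itertools.combinations(range(K - 1), size):
            # Edges p and p+1 share a vertex, so the edge indices of a
            # matching are never consecutive.
            if all(b - a >= 2 for a, b in zip(D, D[1:])):
                weight = math.prod(t[p] for p in D)
                total += (-1) ** size * x ** (K - 2 * size) * weight
    return total
```

For \(K=3\) both routines reduce to \(x^3-(t_1+t_2)\,x\), since the only matchings of the chain with three vertices are the empty one and the two single edges.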
Proposition 1
Let \((\beta ,\lambda )\in \mathbb {R}_{+}^{K-1}\times T_K\) and set
where the parameter \(t=(t_p)_{p=1,\dots ,K-1}\,\) is defined by
The following are equivalent:
(i) \((\beta ,\lambda )\in A_K\)
(ii) \({{\,\mathrm{\Delta }\,}}_p\!\big (\,1,\,t(\beta ,\lambda )\,\big )>0 \quad \forall \,p=2,\dots ,K\)
(iii) \(\rho (\beta ,\lambda )<1\).
Proof
(i)\(\Leftrightarrow \)(ii). To shorten the notation set \(z_p\equiv {{\,\mathrm{\Delta }\,}}_p\!\big (1,\,t(\beta ,\lambda )\big )\,\). By (40), we have
Set \(a_K^*\equiv \frac{z_K}{z_{K-1}}\) and, for \(p=1,\dots ,K-1\)
Notice that if \(\lambda _p=0\), then \(a_p^*\) diverges, while \(2\lambda _p\beta _p^2\,a_p^*=1\), since \(z_{p-1}=z_p=z_{p+1}\,\). The following recursion relation follows from (45):
Now, assume \(z_1,\dots ,z_K>0\). Then, \(a_1^*,\dots ,a_K^*>0\) and choosing \(a_1=a^*_1\),..., \(a_{K-1}=a^*_{K-1}\) the system of inequalities (35) is verified.
On the other hand, assuming that there exist \(a_1,\dots ,a_{K-1}>0\) verifying (35), one can prove by induction that \(a_p^*\ge a_p>0\) for \(p=1,\dots ,K-1\) and \(a_K^*>0\,\). Therefore, \(z_1,\dots ,z_K>0\,\).
(ii)\(\Leftrightarrow \)(iii). Equivalence of these conditions is a consequence of the interlacing property of the zeros of \({{\,\mathrm{\Delta }\,}}_p\,\). A detailed proof can be found in Appendix (Corollary 4 with \(\rho =1\)). \(\square \)
Remark 4
The polynomial \({{\,\mathrm{\Delta }\,}}_K(x,t)\) with \(t=t(\beta ,\lambda )\) defined in (44) has also a linear algebra interpretation. Set:
where \(M_0(\beta )\) is defined by (6). The characteristic polynomial of \(M(\beta ,\lambda )\) is actually
Indeed, using the Laplace expansion along the last row of the matrix, it is easy to verify that the determinant on the right-hand side of (49) satisfies the recursion relation (40). Now, since the zeros of \(x\mapsto {{\,\mathrm{\Delta }\,}}_K(x,t(\beta ,\lambda ))\) are all real and symmetric with respect to the origin (see Appendix), the largest one, \(\rho (\beta ,\lambda )\), is the spectral radius of \(M(\beta ,\lambda )\,\).
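The identification of the matching polynomial with a characteristic polynomial can be verified on small examples. The sketch below assumes that M is the tridiagonal matrix with entries \(M_{p,p+1}=2\beta _p^2\lambda _{p+1}\) and \(M_{p+1,p}=2\beta _p^2\lambda _p\), that the activities are \(t_p=4\beta _p^4\lambda _p\lambda _{p+1}\), and that the recursion is \(\Delta _{p+1}=x\,\Delta _p-t_p\,\Delta _{p-1}\) with \(\Delta _0=1\), \(\Delta _1=x\). These explicit forms are assumptions, and the helper names are hypothetical.

```python
def build_M(beta, lam):
    """ASSUMED form M = 2 * M_0(beta) * diag(lam): a tridiagonal matrix
    with zero diagonal coupling each layer to its two neighbours."""
    K = len(lam)
    M = [[0.0] * K for _ in range(K)]
    for p in range(K - 1):
        M[p][p + 1] = 2.0 * beta[p] ** 2 * lam[p + 1]
        M[p + 1][p] = 2.0 * beta[p] ** 2 * lam[p]
    return M


def activities(beta, lam):
    """ASSUMED activities t_p = 4 * beta_p^4 * lam_p * lam_{p+1}."""
    return [4.0 * beta[p] ** 4 * lam[p] * lam[p + 1] for p in range(len(beta))]


def det(A):
    """Determinant by Laplace expansion along the last row (small K only)."""
    n = len(A)
    if n == 0:
        return 1.0
    if n == 1:
        return A[0][0]
    last = n - 1
    total = 0.0
    for c in range(n):
        if A[last][c] == 0.0:
            continue
        minor = [row[:c] + row[c + 1:] for row in A[:last]]
        total += (-1) ** (last + c) * A[last][c] * det(minor)
    return total


def char_poly(M, x):
    """det(x*I - M), evaluated at the point x."""
    K = len(M)
    return det([[(x if i == j else 0.0) - M[i][j] for j in range(K)]
                for i in range(K)])


def delta(K, x, t):
    """Matching polynomial from the ASSUMED three-term recursion."""
    prev, curr = 1.0, x
    if K == 0:
        return prev
    for p in range(1, K):
        prev, curr = curr, x * curr - t[p - 1] * prev
    return curr
```

For \(K=2\) the check is immediate: \(\det (x-M)=x^2-4\beta _1^4\lambda _1\lambda _2=x^2-t_1\). The symmetry of the zeros shows up as \(\det (-x-M)=(-1)^K\det (x-M)\).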
The next proposition exploits the result of Proposition 1 in order to study the role of the parameters \(\beta \) and \(\lambda \) in the annealed behaviour of the system.
Proposition 2
(i) For every \(\beta \in \mathbb {R}_+^{K-1}\,\),
The supremum is reached exactly for those \(\lambda =\lambda ^*(\beta )\in T_K\) such that there exists \(p^*\in \{1,\dots ,K-1\}\,\):
or \(p^*\in \{2,\dots ,K-1\}\,\):
(ii) Moreover, for every \(\lambda \in T_K\), \(\rho (\beta ,\lambda )\) is a non-decreasing function of each \(\beta _p\) for \(p=1,\dots ,K-1\).
Physically, (ii) means that increasing the local temperatures pushes the system towards the annealed region. On the other hand, (i) implies that if all the inverse temperatures satisfy \(\beta _p<1\) for \(p=1,\dots ,K-1\), then the system is in the annealed regime for every choice of the form factors \(\lambda \). Furthermore, if this is not the case, the system can be driven out of the region \(A_K\) by localizing the positive density layers around the minimal temperature(s).
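The role of \(\beta \) and \(\lambda \) can be explored numerically. The sketch below computes \(\rho (\beta ,\lambda )\) as the largest zero of the matching polynomial by bisection, using the fact that x exceeds the largest zero exactly when \(\Delta _p(x,t)>0\) for every \(p=1,\dots ,K\) (Sylvester's criterion applied to the symmetrised tridiagonal matrix). It assumes the activities \(t_p=4\beta _p^4\lambda _p\lambda _{p+1}\) and the recursion \(\Delta _{p+1}=x\,\Delta _p-t_p\,\Delta _{p-1}\). Both are assumptions, and the function names are hypothetical.

```python
import random


def activities(beta, lam):
    """ASSUMED activities t_p = 4 * beta_p^4 * lam_p * lam_{p+1}."""
    return [4.0 * beta[p] ** 4 * lam[p] * lam[p + 1] for p in range(len(beta))]


def all_minors_positive(x, t):
    """True iff Delta_p(x, t) > 0 for p = 1..K (ASSUMED recursion)."""
    prev, curr = 1.0, x
    if curr <= 0.0:
        return False
    for tp in t:
        prev, curr = curr, x * curr - tp * prev
        if curr <= 0.0:
            return False
    return True


def rho(beta, lam, tol=1e-12):
    """Largest zero of Delta_K(., t(beta, lam)), located by bisection."""
    t = activities(beta, lam)
    lo, hi = 0.0, max(b * b for b in beta) + 1.0  # hi lies above the zero
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if all_minors_positive(mid, t):
            hi = mid
        else:
            lo = mid
    return hi


def random_simplex_point(K, rng):
    """A random point of T_K obtained by normalising exponential variables."""
    w = [rng.expovariate(1.0) for _ in range(K)]
    s = sum(w)
    return [x / s for x in w]
```

With \(\beta =(0.6,\,1.3,\,0.9)\), concentrating the mass on the two layers joined by the largest coupling, \(\lambda =(0,\tfrac{1}{2},\tfrac{1}{2},0)\), gives \(\rho =\beta _2^2=1.69\), while randomly drawn \(\lambda \) never exceed this value. This is consistent with part (i) and with the localisation along a chain of length two.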
In order to prove Proposition 2, we need the following elementary (but useful)
Lemma 1
Let \(P\ge 2\), \(x_1,\dots ,x_P\ge 0\) and \(b_1,\dots ,b_{P-1}\ge 0\,\). Set \(S\equiv \sum _{p=1}^Px_p\) and \(B \equiv \max _{p=1,\dots ,P-1} b_p\,\). Then:
Moreover, we have equality in (54) if and only if there exists \(p^*\in \{2,\dots ,P-1\}\) such that
or there exists \(p^*\in \{1,\dots ,P-1\}\) such that
Proof
Since
the following inequality holds true:
Therefore:
As a trivial consequence, we have:
Now, all the previous inequalities are saturated if and only if the following conditions are fulfilled:
It is easy to check that (61) is equivalent to (55) or (56), concluding the proof.
\(\square \)
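Lemma 1 lends itself to a quick numerical check. The sketch below assumes that inequality (54) reads \(\sum _{p=1}^{P-1}b_p\,x_p\,x_{p+1}\le \tfrac{B}{4}\,S^2\); this form is inferred from the way the lemma is used in the proof of Proposition 2 and is an assumption. The function names are hypothetical.

```python
import random


def chain_sum(x, b):
    """Left-hand side of the ASSUMED inequality: sum_p b_p * x_p * x_{p+1}."""
    return sum(b[p] * x[p] * x[p + 1] for p in range(len(b)))


def lemma_bound(x, b):
    """Right-hand side of the ASSUMED inequality: (B / 4) * S^2."""
    return max(b) * sum(x) ** 2 / 4.0


def random_instance(P, rng):
    """Random nonnegative data x_1..x_P and b_1..b_{P-1}."""
    x = [rng.random() for _ in range(P)]
    b = [rng.random() for _ in range(P - 1)]
    return x, b
```

The bound is saturated by a chain of length two, for example \(x=(0,\tfrac{1}{2},\tfrac{1}{2},0)\) with the middle coefficient equal to B, and by a chain of length three such as \(x=(0.2,\,0.5,\,0.3)\) with \(b_1=b_2=B\). These are the two kinds of equality case named in the statement.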
Proof (of Proposition 2) By Remark 4, \(\rho (\beta ,\lambda )\) is the spectral radius of the matrix \(M(\beta ,\lambda )\,\). Hence:
and the square of the matrix (48) can be easily computed leading to
where for every \(p=1,\dots ,K\), \(p'=p-2,\dots ,p+1\) we set
and for convenience we denote \(\lambda _{p}\equiv 0\) for \(p\notin \{1,\dots ,K\}\,\) and \(\beta _{p}\equiv 0\) for \(p\notin \{1,\dots ,K-1\}\). The inequality in (63) follows by Lemma 1 since \(\sum _p \lambda _p=1\,\).
Now, assume that \(\rho (\beta ,\lambda )=\max _{p=1,\dots ,K-1}\beta _p^2\equiv {\hat{\beta }}^2\). In particular, the inequality in (63) must be saturated; namely, there exists \(p\in \{1,\dots ,K\}\) such that
Then, (52) or (53) follows from Lemma 1.
On the other hand, assume that condition (52) or (53) holds true. In order to prove that \(\rho (\beta ,\lambda )={\hat{\beta }}^2\), it suffices to show that \(x={\hat{\beta }}^2\) is a zero of the matching polynomial \({{\,\mathrm{\Delta }\,}}_K\!\big (x,t(\beta ,\lambda )\big )\), where the activities vector \(t(\beta ,\lambda )\) is defined by (44). Now, condition (53) implies that
while condition (52) implies that
This concludes the proof of Proposition 2 part (i). In order to prove part (ii), we observe that the matrix \(M(\beta ,\lambda )\) has nonnegative entries; therefore, its spectral radius \(\rho (\beta ,\lambda )\) is a non-decreasing function of its entries. \(\square \)
5 The Replica Symmetric Ansatz for the DBM
In this section, we derive a replica symmetric expression for the pressure of the DBM. We show that at zero magnetic field, the annealed region \(A_K\) identified by Theorem 2 and Proposition 1 is the only region where the annealed solution is stable for the replica symmetric consistency equation. Finally, we prove the uniqueness of the solution of the replica symmetric consistency equation, under the hypothesis of Gaussian centred external fields.
Let \(q=(q_p)_{p=1,\dots ,K}\in [0,1]^K\,\). Consider the matrices \(M=M(\beta ,\lambda )\), \(M_1=M_1(\beta ,\lambda )\) defined by (48), (5), respectively. For \(p=1,\dots ,K\), we have
where \(\beta _0=\beta _K=\lambda _0=\lambda _{K+1}=q_0=q_{K+1}\equiv 0\,\) for convenience. We have
Definition 6
For every \(q=(q_p)_{p=1,\dots ,K}\in [0,1]^K\), the replica symmetric functional of the DBM is
where z is a standard Gaussian random variable independent of h and \(M^{(N)}\equiv M(\beta ,\lambda ^{(N)})\,\), \(M_1^{(N)}\equiv M_1(\beta ,\lambda ^{(N)})\) are tridiagonal matrices defined by (48), (5), respectively. The limit of the functional as \(N\rightarrow \infty \) is
where \(M=M(\beta ,\lambda )\) and \(M_1=M_1(\beta ,\lambda )\,\).
Definition 6 is motivated by the following
Proposition 3
For every \(q=(q_p)_{p=1,\dots ,K}\in [0,1]^K\)
where \(q_{\varLambda _N}\equiv \big (q_{L_p}(\sigma ,\tau )\big )_{p=1,\dots ,K}\) and \(\langle \,\cdot \,\rangle _{N,t}\) denotes the quenched Gibbs expectation associated with a suitable Hamiltonian.
Proof
Let \(q\in [0,\infty )^K\). For every \(p=1,\dots , K\), we consider a one-body model over the \(N_p\) spin variables indexed by the layer \(L_p\) at inverse temperature \(\sqrt{(M^{(N)}q)_p}\,\) and external fields distributed as \(h^{(p)}\). For \(\sigma \in \{-1,1\}^N\) and \(t\in [0,1]\), we define an interpolating Hamiltonian as follows:
where \(z_i\), \(i\in L_p\), \(p=1,\dots , K\) are i.i.d. standard Gaussian random variables, independent also of \(h_i\)’s and \(J_{ij}\)’s. The interpolating pressure is
Observe that the quenched pressure of the DBM and a convex combination of quenched pressures of one-body models are recovered for \(t=1\), \(t=0\), respectively:
Gaussian integration by parts leads to the following result:
where \(\langle \,\cdot \,\rangle _{N,t}\) denotes the quenched Gibbs expectation associated with the Hamiltonian \(\mathcal H_N(\sigma ,t)+{\mathcal {H}}_N(\tau ,t)\). Therefore, (72) follows by (75), (76), (77) concluding the proof. \(\square \)
We say that the DBM is in the replica symmetric regime when there exists \(q^*\) stationary point of \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q)\) such that \(\lim _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}= {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q^*)\,\).
Remark 5
\(q=(q_p)_{p=1,\dots ,K}\) is a stationary point of \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}\) if and only if
where the matrices \(M=M(\beta ,\lambda )\), \(M_1=M_1(\beta ,\lambda )\) are defined by (48), (5), respectively, and z is a standard Gaussian random variable independent of h. Indeed, Gaussian integration by parts allows one to compute \(\frac{\partial }{\partial q_p}{{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}\) from definition (71).
Remark 6
For \(h=0\), observe that \(q=0\) is a solution of (78) and the replica symmetric functional computed at this stationary point equals the annealed pressure of the DBM:
Proposition 4
Set \(F:[0,1]^K\rightarrow [0,1]^K\), \(F_p(q) \,\equiv \, \mathbb {E}\tanh ^2\!\left( z\,\sqrt{(Mq)_p}\right) \) for every \(p=1,\dots ,K\). The region of parameters \((\beta ,\lambda )\) such that the annealed solution \(q=0\) is a stable solution of the replica symmetric consistency equation \(q=F(q)\) coincides with the region \(A_K\) introduced in Sect. 4. Precisely:
Proof
Gaussian integration by parts allows one to compute the derivatives of F with respect to q, leading to
Therefore, (80) follows immediately by Proposition 1 and Remark 4. \(\square \)
When the matrix \(M_1\) is invertible, the replica symmetric Eq. (78) rewrites as:
The problem of uniqueness of the solution of (82) has been proposed by Panchenko in [25] for the convex case (where M is replaced by a positive definite matrix) and solved in [9] for \(K=2\). In the following, we prove the uniqueness for the deep case (our matrix M is highly non-definite) under the assumption of Gaussian centred external fields. Denote \(T_K^+ \,\equiv \, \{ (\lambda _1,\dots ,\lambda _K) \in (0,1]^K \,|\, \sum _{p=1}^K\lambda _p=1 \}\,\).
Theorem 3
Let \(h^{(p)}\), \(p=1,\dots ,K\) be centred Gaussian variables with variance \(v_p>0\), respectively. Let \(\lambda \in T_K^+\) and \(\beta \in \mathbb {R}_+^{K-1}\). The consistency Eq. (82), which rewrites as
with \(M=M(\beta ,\lambda )\) defined in (48), has a unique solution.
The proof of Theorem 3 relies on the following
Lemma 2
Let h be a centred Gaussian variable with variance \(v>0\). Let \(\beta >0\). Then, equation
\[q\,=\,\mathbb {E}\tanh ^2\!\big (z\,\sqrt{2\,q\,\beta ^2+v}\,\big ) \tag{84}\]
has a unique solution that we denote by \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}(\beta ,v)>0\,\). The function \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\) is strictly increasing with respect to both \(\beta \) and v.
The uniqueness part in Lemma 2 is the well-known Latala–Guerra’s lemma [27]. The monotonicity part is based on a similar argument. Whereas the uniqueness property holds true for much more general choices of the external field h, we notice that the monotonicity property in \(\beta \) is lost for deterministic (large enough) h.
Proof (Lemma 2) Set \(f(q)\equiv q^{-1}\,\mathbb {E}\tanh ^2(z\,\sqrt{2\,q\,\beta ^2+v}\,)\) for \(q>0\). To prove that (84) has a unique solution, it suffices to show that f is strictly decreasing. Now, taking the derivative of f (avoiding Gaussian integration by parts) leads to:
where \(\phi (y)\equiv \tanh y\) and \(y\equiv z\,\sqrt{2\,q\,\beta ^2+v}\,\). Since \(\phi \) is odd, strictly positive on \(\mathbb {R}_+\), strictly increasing on \(\mathbb {R}\) and strictly concave on \(\mathbb {R}_+\), it follows that the functions inside each expectation in (85) are strictly positive for \(y\ne 0\,\). In particular, observe that \({{\,\mathrm{sign}\,}}\phi (y)={{\,\mathrm{sign}\,}}y\) and that
Therefore, \(\frac{df}{dq}<0\), proving uniqueness of the solution of Eq. (84).
Now, let’s prove that the solution \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\) is strictly increasing with respect to \(\beta >0\). Taking the derivative with respect to \(\beta ^2\) on both sides of (84) (avoiding integration by parts), one finds:
where \(Y\equiv z\,\sqrt{2\beta ^2{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}+v}\,\). Reordering terms and replacing \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\) by \(\mathbb {E}\,\phi (Y)^2\) lead to:
In a similar way, one can prove that \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\) is strictly increasing with respect to v, indeed:
\(\square \)
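The two claims of Lemma 2 can be observed numerically. The sketch below solves \(q=\mathbb {E}\tanh ^2(z\sqrt{2q\beta ^2+v})\) by bisection on \([0,1]\), which is legitimate because the function f used in the proof is strictly decreasing, so the difference between the two sides changes sign exactly once. Gaussian expectations are approximated by a trapezoidal rule on \([-8,8]\). The function names are hypothetical.

```python
import math

# Trapezoidal grid on [-8, 8] for expectations over a standard Gaussian z.
_STEP = 0.02
_GRID = [-8.0 + _STEP * k for k in range(801)]
_WEIGHTS = [_STEP * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            for z in _GRID]


def mean_tanh2(c):
    """Approximate E tanh^2(z * sqrt(c)) for a standard Gaussian z, c >= 0."""
    s = math.sqrt(c)
    return sum(w * math.tanh(s * z) ** 2 for z, w in zip(_GRID, _WEIGHTS))


def q_rs_sk(beta, v, tol=1e-12):
    """Unique root in (0, 1) of q = E tanh^2(z * sqrt(2 q beta^2 + v))."""
    def gap(q):
        # Positive to the left of the root, negative to its right.
        return mean_tanh2(2.0 * q * beta ** 2 + v) - q

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Evaluating `q_rs_sk` on a grid of values of \(\beta \) at fixed v, or of v at fixed \(\beta \), produces increasing sequences, in line with the monotonicity statement.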
Proof (Theorem 3) A key observation is that the system (83) is equivalent to the following:
where we have introduced the auxiliary variables \(a_1,\dots ,a_{K-1}>0\,\). This can be easily checked by comparing definitions (16) and (68). By Lemma 2, the first line of (90) entails
where \({{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\) is uniquely defined and strictly increasing with respect to both arguments. On the other hand, the second line of (90) rewrites as
Therefore, in order to prove the theorem it suffices to prove uniqueness of the solution \(a\in \mathbb {R}_+^{K-1}\) of the following system:
It is convenient to set \(Q_1(a_1)\,\equiv \, \lambda _1\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\big (\lambda _1\,\beta _1^2\,a_1,v_1\big )\) and for every \(p\ge 2\)
We are going to prove by induction on \(p\ge 1\) that for any given \(a_{p+1}\ge 0\), there exists a unique \(a_p\,=\,a^*_p(a_{p+1})>0\) such that
and moreover \(a_p^*\) is strictly increasing with respect to \(a_{p+1}\,\). The uniqueness of solution of (93) will follow immediately by stopping at \(p=K-1\) and choosing \(a_K=0\,\).
\(\bullet \) Case \(p=1\): given \(a_2\ge 0\), let’s consider the equation
By Lemma 2, the left-hand side of (96) is a strictly increasing function of \(a_1>0\) and takes all the values in the interval \((0,\infty )\), while the right-hand side is a decreasing function of \(a_1>0\) and takes nonnegative values. Therefore, there exists a unique \(a_1=a_1^*(a_2)>0\) solution of (96). Now, taking derivatives on both sides of (96) and using again Lemma 2, one finds:
Hence, \(a_1^*\) is a strictly increasing function of \(a_2\,\).
\(\bullet \) For \(p>1\,\), \(p-1\) \(\Rightarrow \) p. Fix \(a_{p+1}\ge 0\,\). By inductive hypothesis, \(a_1^*,\dots ,a_{p-1}^*\) are well defined and strictly increasing functions. Defining the composition \(A_l^*\equiv a_l^*\circ \dots \circ a_{p-1}^*\) for every \(l=1,\dots ,p-1\), Eq. (95) rewrites as:
By inductive hypothesis and Lemma 2, the left-hand side of (98) is a strictly increasing function of \(a_p>0\) and takes all the values in the interval \((0,\infty )\), while the right-hand side of (98) is a decreasing function of \(a_p>0\) and takes nonnegative values. Therefore, for every \(a_{p+1}\ge 0\) there exists a unique \(a_p=a_p^*(a_{p+1})>0\) solution of (98). Now, taking derivatives on both sides of (98) one finds:
which, using again the inductive hypothesis and Lemma 2, entails that \(a_p^*\) is a strictly increasing function of \(a_{p+1}\,\). \(\square \)
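Uniqueness can be probed numerically by running the fixed-point iteration from two extreme starting points. The sketch below assumes that, for centred Gaussian fields of variance \(v_p\), the consistency equation takes the form \(q_p=\mathbb {E}\tanh ^2\big (z\sqrt{(Mq)_p+v_p}\big )\) with \(M_{p,p+1}=2\beta _p^2\lambda _{p+1}\), \(M_{p+1,p}=2\beta _p^2\lambda _p\). These explicit forms are assumptions, and the function names are hypothetical. The map is monotone in q, so the iteration from \(q=0\) increases to the smallest solution and the one from \(q=(1,\dots ,1)\) decreases to the largest; if the two limits agree, no other solution fits in between.

```python
import math

# Trapezoidal grid on [-8, 8] for expectations over a standard Gaussian z.
_STEP = 0.02
_GRID = [-8.0 + _STEP * k for k in range(801)]
_WEIGHTS = [_STEP * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            for z in _GRID]


def mean_tanh2(c):
    """Approximate E tanh^2(z * sqrt(c)) for a standard Gaussian z, c >= 0."""
    s = math.sqrt(c)
    return sum(w * math.tanh(s * z) ** 2 for z, w in zip(_GRID, _WEIGHTS))


def build_M(beta, lam):
    """ASSUMED form of M: tridiagonal with zero diagonal."""
    K = len(lam)
    M = [[0.0] * K for _ in range(K)]
    for p in range(K - 1):
        M[p][p + 1] = 2.0 * beta[p] ** 2 * lam[p + 1]
        M[p + 1][p] = 2.0 * beta[p] ** 2 * lam[p]
    return M


def rs_map(q, M, v):
    """ASSUMED consistency map: q_p -> E tanh^2(z sqrt((M q)_p + v_p))."""
    K = len(q)
    return [mean_tanh2(sum(M[p][r] * q[r] for r in range(K)) + v[p])
            for p in range(K)]


def solve_from(q0, beta, lam, v, n_iter=300):
    """Iterate the consistency map n_iter times from the starting point q0."""
    M = build_M(beta, lam)
    q = list(q0)
    for _ in range(n_iter):
        q = rs_map(q, M, v)
    return q
```

For example, with \(\beta =(0.8,\,1.2)\), \(\lambda =(0.3,\,0.4,\,0.3)\) and \(v_p=0.5\), the iterations started from \((0,0,0)\) and from \((1,1,1)\) converge to the same vector, as the theorem predicts.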
6 A Replica Symmetric Bound for the DBM
In this section, a lower bound for the quenched pressure of the DBM in terms of the replica symmetric functional is provided in a suitable region of the parameters \(\beta ,\,\lambda ,\,h\). For centred Gaussian external fields, this region is defined through a system of K inequalities which mimic the Almeida–Thouless condition for the SK model.
By Theorem 1, we can investigate the replica symmetric regime of the DBM relying on the established results for the replica symmetric regime of the SK model. Denote by \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\) the replica symmetric functional of an SK model, namely for every \(q\in [0,1]\), \(\beta >0\), h real random variable with \(\mathbb {E}\,|h|<\infty \),
where z is a standard Gaussian random variable independent of h. Stationary points of \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\) are identified by the consistency equation
where z is a standard Gaussian r.v. independent of h. Guerra’s celebrated bound [15] states in particular that
for every \(\beta ,h\). Identifying the exact replica symmetric region of the SK model, where equality in (102) is achieved, is an open problem. A first result about the replica symmetric region of the DBM under general (but implicit) conditions is provided by the following
Theorem 4
For every \(q\in [0,1]^K\), \(a\in \mathbb {R}_+^K\) related by
the following inequality holds true:
Moreover, if the parameters \(\beta ,\,\lambda ,\,h\) are such that there exist \(q,\,a\) related by (103) and verifying
then equality is achieved in (104) and as a consequence
Proof
Since \(q,\,a\) are related by (103), it is straightforward to verify that
By Guerra’s bound (102), substituting \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\) for \({{\,\mathrm{p^{\mathrm{SK}}}\,}}\) in the right-hand side of expression (15) provides an upper bound for \({{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a)\,\). Now, using the expression (100) of \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\), the relation (107) and comparing with the expression (71) of \({{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}\), bound (104) is finally proved.
Following the same computations, if (105) holds true, then
and bound (106) then follows by Theorem 1. \(\square \)
More explicit conditions for achieving equality in (104) and having the replica symmetric bound (106) are based on the control of the replica symmetric region in the SK model. For example, it is known that equality in (102) is achieved for \(\beta \) small enough. Precisely, in Theorem 1.4.10 of [27] Talagrand proves that for every h
where q is the unique solution of (101). (Notice the different parametrisation with respect to [27].)
Corollary 1
Let \(\beta ,\,\lambda ,\,h\) be such that a solution q of the replica symmetric consistency Eq. (82) satisfies the inequalities
Then, the replica symmetric bound (106) holds true.
Proof
Let q be a solution of (82) satisfying (110). Let \(a\in \mathbb {R}_+^{K-1}\) verifying (103), so that the relation (107) holds true. Then, (110) and (82) rewrite, respectively, as:
for every \(p=1,\dots ,K\,\). By Talagrand’s result (109), this entails
for every \(p=1,\dots ,K\,\). Therefore, by Theorem 4,
and the bound (106) holds true. \(\square \)
A complete characterization of the SK replica symmetric region where equality is achieved in (102) is still missing (see nevertheless [16, 20, 27]). A necessary condition is the Almeida–Thouless condition [28]:
where q is a solution of the consistency Eq. (101).
However, if h is taken to be a centred Gaussian random variable with variance \(v>0\), it was recently proved [11] that the Almeida–Thouless condition is also sufficient to have equality in (102). Precisely:
Corollary 2
Assume that \(h^{(p)}\), \(p=1,\dots ,K\), are centred Gaussian random variables of variance \(v_p>0\), respectively. Let \(\beta ,\,\lambda ,\,v\) be such that the (unique) solution q of the replica symmetric consistency Eq. (83) satisfies the inequalities
Then, the replica symmetric bound (106) holds true.
Proof
Let q be the unique solution of (83). Let \(a\in \mathbb {R}_+^{K-1}\) verify (103), so that the relation (107) holds true. Then, (116) and (83) can be rewritten, respectively, as:
for every \(p=1,\dots ,K\,\). By Chen’s result (115), this entails
for every \(p=1,\dots ,K\,\). Therefore, by Theorem 4,
and the bound (106) holds true. \(\square \)
References
[1] Aizenman, M., Lebowitz, J.L., Ruelle, D.: Some rigorous results on the Sherrington–Kirkpatrick spin glass model. Commun. Math. Phys. 112, 3–20 (1987)
[2] Alberici, D., Barra, A., Contucci, P., Mingione, E.: Annealing and replica symmetry in deep Boltzmann machines. J. Stat. Phys. 180, 665–677 (2020)
[3] Auffinger, A., Chen, W.-K.: The Parisi formula has a unique minimizer. Commun. Math. Phys. 335, 1429–1444 (2015)
[4] Auffinger, A., Chen, W.-K.: Free energy and complexity of spherical bipartite models. J. Stat. Phys. 157(1), 40–59 (2014)
[5] Baik, J., Lee, J.O.: Free energy of bipartite spherical Sherrington–Kirkpatrick model. arXiv:1711.06364
[6] Barbier, J., Macris, N., Miolane, L.: The layered structure of tensor estimation and its mutual information. In: 55th Annual Allerton Conference on Communication Control and Computing (2017)
[7] Barra, A., Contucci, P., Mingione, E., Tantari, D.: Multi-species mean field spin glasses: rigorous results. Annales Henri Poincaré 16(3), 691–708 (2015)
[8] Barra, A., Genovese, G., Guerra, F.: Equilibrium statistical mechanics of bipartite spin systems. J. Phys. A 44, 245002 (2011)
[9] Bates, E., Sloman, L., Sohn, Y.: Replica symmetry breaking in multi-species Sherrington–Kirkpatrick model. J. Stat. Phys. 174, 333–350 (2019)
[10] Chen, W.-K.: Phase transition in the spiked random tensor with Rademacher prior. Ann. Stat. 47(5), 2734–2756 (2019)
[11] Chen, W.-K.: private communication (unpublished)
[12] Contucci, P., Fedele, M.: Scaling limits for multispecies statistical mechanics mean-field models. J. Stat. Phys. 144(6), 1186–1205 (2011)
[13] Contucci, P., Gallo, I.: Bipartite mean field spin systems. Existence and solution. Math. Phys. Electronic J. 14, 1–22 (2008)
[14] Contucci, P., Giardinà, C.: Perspectives on Spin Glasses. Cambridge University Press, Cambridge (2013)
[15] Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233(1), 1–12 (2003)
[16] Guerra, F., Toninelli, F.L.: Quadratic replica coupling in the Sherrington–Kirkpatrick mean field spin glass model. J. Math. Phys. 43, 3704 (2002)
[17] Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)
[18] Heilmann, O.J., Lieb, E.H.: Theory of monomer-dimer systems. Commun. Math. Phys. 25(3), 190–232 (1972)
[19] Heilmann, O.J., Lieb, E.H.: Monomers and dimers. Phys. Rev. Lett. 24, 1412–1414 (1970)
[20] Jagannath, A., Tobasco, I.: Some properties of the phase diagram for mixed p-spin glasses. Probab. Theory Relat. Fields 167, 615–672 (2017)
[21] Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications. World Scientific, Singapore (1987)
[22] Mourrat, J.-C.: Nonconvex interactions in mean-field spin glass. arXiv:2004.01679
[23] Mourrat, J.-C.: Free energy upper bound for mean-field vector spin glasses. arXiv:2010.09114
[24] Panchenko, D.: The Sherrington–Kirkpatrick model. Springer, Berlin (2013)
[25] Panchenko, D.: The free energy in a multi-species Sherrington–Kirkpatrick model. Ann. Probab. 43(6), 3494–3513 (2015)
[26] Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5, 448–455 (2009)
[27] Talagrand, M.: Mean Field Models for Spin Glasses. Volume I: Basic Examples. Springer, Berlin (2011)
[28] Toninelli, F.L.: About the Almeida–Thouless transition line in the Sherrington–Kirkpatrick mean field spin glass model. Europhys. Lett. 60(5), 764–767 (2002)
Acknowledgements
The authors thank Adriano Barra, Wei-Kuo Chen, Francesco Guerra and Daniele Tantari for interesting discussions. D.A. is grateful to Alberto Viscardi for his contribution to Proposition 2. P.C. was partially supported by PRIN project Statistical Mechanics and Complexity (2015K7KK8L). D.A. and E.M. were partially supported by Progetto Almaidea 2018.
Funding
Open Access funding provided by EPFL Lausanne.
Additional information
Communicated by Vieri Mastropietro.
Appendix: Matching Polynomials
In this Appendix, we give some properties of the polynomials \({{\,\mathrm{\Delta }\,}}_p(x,t)\) introduced by Definition 5 and characterizing the annealed region of the DBM. In particular, we are interested in the location of the zeros of \({{\,\mathrm{\Delta }\,}}_p\), namely the points \(x\in \mathbb {C}\) such that \({{\,\mathrm{\Delta }\,}}_p(x,t)=0\,\).
Theorem 5 and Corollary 3 are due to Heilmann and Lieb [18] and show that the zeros are real and have an interlacing property. Proposition 5 and Corollary 4, by using these results, contribute to the proof of Proposition 1 in Sect. 4. Precisely, we show that the zeros of \({{\,\mathrm{\Delta }\,}}_K\) lie in the interval \((-\rho ,\rho )\) if and only if all the polynomials \({{\,\mathrm{\Delta }\,}}_p\) for \(p\le K\) are positive at \(x=\rho \,\).
Theorem 5
(Heilmann-Lieb [18]) Let \(t_p>0\) for all \(p=1,\dots ,K-1\,\). Then, for every \(p=1,\dots ,K\)
(i) the zeros of \({{\,\mathrm{\Delta }\,}}_p\) are real and simple;
(ii) if \(p\ge 1\), the zeros of \({{\,\mathrm{\Delta }\,}}_p\) “interlace” with those of \({{\,\mathrm{\Delta }\,}}_{p-1}\). Namely, denoting by \(x_1^{(p-1)}<\dots <x_{p-1}^{(p-1)}\) the zeros of \({{\,\mathrm{\Delta }\,}}_{p-1}\) and by \(x_1^{(p)}<\dots <x_{p}^{(p)}\) the zeros of \({{\,\mathrm{\Delta }\,}}_p\), we have:
$$\begin{aligned} x_1^{(p)} \,<\, x_1^{(p-1)} \,<\, x_2^{(p)} \,<\, x_2^{(p-1)} \,<\, \dots \,<\, x_{p-1}^{(p)} \,<\, x_{p-1}^{(p-1)} \,<\, x_p^{(p)}\ . \end{aligned}\tag{120}$$
Proof
The statement is trivially true for \(p=0\) and \(p=1\). Consider \(p\ge 1\), assume the statement holds true for \(p-1\) and p, and prove it for \(p+1\). By induction hypothesis, the zeros of \({{\,\mathrm{\Delta }\,}}_{p}\) and those of \({{\,\mathrm{\Delta }\,}}_{p-1}\) are real and simple and they are interlaced; namely, (120) holds true.
Since the zeros of \({{\,\mathrm{\Delta }\,}}_{p-1}\) are simple, \({{\,\mathrm{\Delta }\,}}_{p-1}\) changes its sign exactly at every \(x_1^{(p-1)},\dots ,x_{p-1}^{(p-1)}\). By (120), it follows that \({{\,\mathrm{\Delta }\,}}_{p-1}\) has alternating signs at the points \(x_1^{(p)},\dots ,x_{p}^{(p)}\). Therefore, also \({{\,\mathrm{\Delta }\,}}_{p+1}\) has alternating signs at the points \(x_1^{(p)},\dots ,x_{p}^{(p)}\,\), indeed by the recursion relation (40)
$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_{p+1}\big (x_k^{(p)},t\big ) \,=\, -\,t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}\big (x_k^{(p)},t\big ) \end{aligned}\tag{121}$$
for every \(k=1,\dots ,p\). As a consequence, \({{\,\mathrm{\Delta }\,}}_{p+1}\) has (at least) one zero in each interval \(\big (x_k^{(p)},\,x_{k+1}^{(p)}\big )\) for \(k=1,\dots ,p-1\). Moreover, since \({{\,\mathrm{\Delta }\,}}_{p+1}\) and \({{\,\mathrm{\Delta }\,}}_{p-1}\) share the same sign as \(x\rightarrow \infty \) and as \(x\rightarrow -\infty \,\), (121) implies that \({{\,\mathrm{\Delta }\,}}_{p+1}\) has (at least) one zero in \(\big (x_p^{(p)},\,\infty \big )\) and (at least) one zero in \(\big (-\infty ,\,x_{1}^{(p)}\big )\,\). Since the zeros of \({{\,\mathrm{\Delta }\,}}_{p+1}\) are exactly \(p+1\), the thesis follows. \(\square \)
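As a numerical illustration of Theorem 5, the following minimal Python sketch builds the polynomials from a three-term recursion and locates their zeros by bisection, using the zeros of \({{\,\mathrm{\Delta }\,}}_p\) as brackets for those of \({{\,\mathrm{\Delta }\,}}_{p+1}\), exactly as in the induction step of the proof. It assumes that the recursion (40) reads \({{\,\mathrm{\Delta }\,}}_0=1\), \({{\,\mathrm{\Delta }\,}}_1=x\), \({{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}\); this form is inferred from the proof and is not quoted from Definition 5. The function names and the sample coefficients are illustrative choices.

```python
def delta_polys(t):
    """Coefficients (lowest degree first) of Delta_0, ..., Delta_K, K = len(t) + 1.

    ASSUMED recursion: Delta_0 = 1, Delta_1 = x,
    Delta_{p+1} = x * Delta_p - t_p * Delta_{p-1}.
    """
    polys = [[1.0], [0.0, 1.0]]
    for tp in t:
        prev, cur = polys[-2], polys[-1]
        nxt = [0.0] + cur                 # multiply Delta_p by x
        for i, c in enumerate(prev):
            nxt[i] -= tp * c              # subtract t_p * Delta_{p-1}
        polys.append(nxt)
    return polys


def evaluate(coeffs, x):
    """Horner evaluation of a polynomial given lowest degree first."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def bisect_root(coeffs, lo, hi, iters=200):
    """Bisection on [lo, hi], assuming opposite signs at the endpoints."""
    f_lo = evaluate(coeffs, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = evaluate(coeffs, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def zeros_by_interlacing(t):
    """Zeros of Delta_0, ..., Delta_K for strictly positive t_p.

    The zeros of Delta_{p-1}, together with +/- the Cauchy bound
    1 + max|coefficient| of the monic Delta_p, bracket the zeros of Delta_p.
    """
    polys = delta_polys(t)
    zeros = [[], [0.0]]
    for p in range(2, len(polys)):
        coeffs = polys[p]
        bound = 1.0 + max(abs(c) for c in coeffs[:-1])
        brackets = [-bound] + zeros[p - 1] + [bound]
        roots = []
        for lo, hi in zip(brackets, brackets[1:]):
            if evaluate(coeffs, lo) * evaluate(coeffs, hi) >= 0:
                raise ValueError("no sign change: interlacing check failed")
            roots.append(bisect_root(coeffs, lo, hi))
        zeros.append(roots)
    return zeros


if __name__ == "__main__":
    for p, z in enumerate(zeros_by_interlacing([1.0, 2.0, 0.5])):
        print(p, [round(v, 6) for v in z])
```

Since each zero is found inside an interval delimited by two consecutive zeros of the previous polynomial, a successful run exhibits the chain (120) for the chosen coefficients; it is a sanity check, not a proof.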
Theorem 5 can be extended to the case of nonnegative coefficients:
Corollary 3
(Heilmann-Lieb [18]) Let \(t_p\ge 0\) for all \(p=1,\dots ,K-1\,\). Then, for every \(p=1,\dots ,K\)
(i) the zeros of \({{\,\mathrm{\Delta }\,}}_p\) are real;
(ii) if \(p\ge 1\), the zeros of \({{\,\mathrm{\Delta }\,}}_p\) “weakly interlace” with those of \({{\,\mathrm{\Delta }\,}}_{p-1}\). Namely, denoting by \(x_1^{(p-1)}\le \dots \le x_{p-1}^{(p-1)}\) the zeros of \({{\,\mathrm{\Delta }\,}}_{p-1}\) and by \(x_1^{(p)}\le \dots \le x_{p}^{(p)}\) the zeros of \({{\,\mathrm{\Delta }\,}}_p\) repeated according to their multiplicity, we have:
$$\begin{aligned} x_1^{(p)} \,\le \, x_1^{(p-1)} \,\le \, x_2^{(p)} \,\le \, x_2^{(p-1)} \,\le \, \dots \,\le \, x_{p-1}^{(p)} \,\le \, x_{p-1}^{(p-1)} \,\le \, x_p^{(p)}\ . \end{aligned}\tag{122}$$
Proof
It follows from Theorem 5 by continuity. \(\square \)
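A small exact computation may help to see why only weak interlacing survives when some coefficient vanishes. The sketch below is a hypothetical example, not taken from [18]: it assumes the recursion (40) has the form \({{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}\) with \({{\,\mathrm{\Delta }\,}}_0=1\), \({{\,\mathrm{\Delta }\,}}_1=x\), and takes \(t=(1,0,1)\). Under this assumption \({{\,\mathrm{\Delta }\,}}_3=x^3-x\) and \({{\,\mathrm{\Delta }\,}}_4=(x^2-1)^2\), so the zeros of \({{\,\mathrm{\Delta }\,}}_4\) are double and coincide with zeros of \({{\,\mathrm{\Delta }\,}}_3\).

```python
def delta_polys(t):
    """Integer coefficients (lowest degree first) of Delta_0, ..., Delta_K.

    ASSUMED recursion: Delta_{p+1} = x * Delta_p - t_p * Delta_{p-1},
    with Delta_0 = 1 and Delta_1 = x.
    """
    polys = [[1], [0, 1]]
    for tp in t:
        prev, cur = polys[-2], polys[-1]
        nxt = [0] + cur                   # multiply Delta_p by x
        for i, c in enumerate(prev):
            nxt[i] -= tp * c              # subtract t_p * Delta_{p-1}
        polys.append(nxt)
    return polys


def evaluate(coeffs, x):
    """Horner evaluation (exact for integer inputs)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coeffs)][1:]


def weakly_interlaced(inner, outer):
    """Check outer[0] <= inner[0] <= outer[1] <= ... <= inner[-1] <= outer[-1]."""
    chain = []
    for k, x in enumerate(outer):
        chain.append(x)
        if k < len(inner):
            chain.append(inner[k])
    return all(a <= b for a, b in zip(chain, chain[1:]))


polys = delta_polys([1, 0, 1])            # hypothetical coefficients, t_2 = 0
delta3, delta4 = polys[3], polys[4]

# x = 1 and x = -1 are double zeros of Delta_4 and simple zeros of Delta_3.
double_zero_at_1 = evaluate(delta4, 1) == 0 and evaluate(derivative(delta4), 1) == 0
shared_zero_at_1 = evaluate(delta3, 1) == 0

# Zeros listed with multiplicity, as in the statement of Corollary 3.
zeros3 = [-1, 0, 1]
zeros4 = [-1, -1, 1, 1]
print(double_zero_at_1, shared_zero_at_1, weakly_interlaced(zeros3, zeros4))
```

In this example several inequalities of the chain (122) hold as equalities, which is the behaviour the continuity argument allows for and the strict statement of Theorem 5 excludes.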
Remark 7
The zeros of \({{\,\mathrm{\Delta }\,}}_p\) are symmetric with respect to \(x=0\). Indeed,
$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_p(-x,t) \,=\, (-1)^p\,{{\,\mathrm{\Delta }\,}}_p(x,t) \end{aligned}\tag{123}$$
because both polynomials verify the same recursion relation (40).
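The symmetry of Remark 7 amounts to the parity identity \({{\,\mathrm{\Delta }\,}}_p(-x,t)=(-1)^p\,{{\,\mathrm{\Delta }\,}}_p(x,t)\), that is, \({{\,\mathrm{\Delta }\,}}_p\) contains only powers of x with the same parity as p. The following sketch checks this in exact rational arithmetic, assuming that the recursion (40) has the form \({{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}\) with \({{\,\mathrm{\Delta }\,}}_0=1\), \({{\,\mathrm{\Delta }\,}}_1=x\); the coefficients and the sample point are arbitrary illustrative values.

```python
from fractions import Fraction


def delta_polys(t):
    """Exact coefficients (lowest degree first) of Delta_0, ..., Delta_K.

    ASSUMED recursion: Delta_{p+1} = x * Delta_p - t_p * Delta_{p-1},
    with Delta_0 = 1 and Delta_1 = x.
    """
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for tp in t:
        prev, cur = polys[-2], polys[-1]
        nxt = [Fraction(0)] + cur         # multiply Delta_p by x
        for i, c in enumerate(prev):
            nxt[i] -= tp * c              # subtract t_p * Delta_{p-1}
        polys.append(nxt)
    return polys


def evaluate(coeffs, x):
    """Horner evaluation in exact arithmetic."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def has_parity_of_degree(coeffs):
    """True if every monomial x^i with nonzero coefficient has i = degree mod 2."""
    p = len(coeffs) - 1
    return all(c == 0 for i, c in enumerate(coeffs) if (p - i) % 2 == 1)


t = [Fraction(1, 2), Fraction(2), Fraction(3, 4)]    # arbitrary positive values
polys = delta_polys(t)
x0 = Fraction(5, 7)                                   # arbitrary sample point

parity_ok = all(has_parity_of_degree(c) for c in polys)
identity_ok = all(
    evaluate(c, -x0) == (-1) ** p * evaluate(c, x0) for p, c in enumerate(polys)
)
print(parity_ok, identity_ok)
```

Because the arithmetic is exact, the two checks are genuine identities for the chosen data; they do not replace the one-line induction argument of the Remark.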
Proposition 5
Let \(t_p>0\) for all \(p=1,\dots ,K-1\,\). Then, for every \(\rho >0\) the following are equivalent:
(i) the zeros of \({{\,\mathrm{\Delta }\,}}_K\) are contained in \((-\rho ,\rho )\,\);
(ii) the zeros of \({{\,\mathrm{\Delta }\,}}_p\) are contained in \((-\rho ,\rho )\) for every \(p=1,\dots ,K\,\);
(iii) \({{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0\) for every \(p\le K\) such that \(p\equiv _{\text {mod}2}K\,\);
(iv) \({{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0\) for every \(p=1,\dots ,K\,\).
Proof
i\(\Rightarrow \)ii. This is a consequence of Theorem 5.
ii\(\Rightarrow \)iii. Trivial since \({{\,\mathrm{\Delta }\,}}_p(x,t)\rightarrow \infty \) as \(x\rightarrow \infty \) for every \(p\ge 1\,\).
iii\(\Rightarrow \)iv. From the recursion relation (40), one sees that if \({{\,\mathrm{\Delta }\,}}_{p+1}(\rho ,t)>0\) and \({{\,\mathrm{\Delta }\,}}_{p-1}(\rho ,t)>0\) then also \({{\,\mathrm{\Delta }\,}}_{p}(\rho ,t)>0\,\).
iv\(\Rightarrow \)i. By contradiction, assume that \({{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0\) for every \(p=1,\dots ,K\) and not all the zeros of \({{\,\mathrm{\Delta }\,}}_K\) are contained in \((-\rho ,\rho )\).
Claim: \({{\,\mathrm{\Delta }\,}}_p\) has at least two zeros in \((\rho ,\infty )\) for every \(p=2,\dots ,K\,\).
We prove the claim by backward induction on p, starting from \(p=K\,\). The claim contradicts the fact that \({{\,\mathrm{\Delta }\,}}_2\) has only one positive zero.
We start from \(p=K\). By hypothesis, \({{\,\mathrm{\Delta }\,}}_K(\rho ,t)>0\) and, since the zeros are symmetric with respect to \(x=0\) (Remark 7), \({{\,\mathrm{\Delta }\,}}_K\) has a zero \(x_0^{(K)}\in (\rho ,\infty )\,\). Theorem 5 guarantees that \({{\,\mathrm{\Delta }\,}}_K\) changes its sign at \(x=x_0^{(K)}\) (because every zero is simple). On the other hand, we know that \({{\,\mathrm{\Delta }\,}}_K(x,t)\rightarrow \infty \) as \(x\rightarrow \infty \). Therefore, \({{\,\mathrm{\Delta }\,}}_K\) has (at least) another zero \(x_1^{(K)}\in (\rho ,\infty )\,\), \(x_1^{(K)}\ne x_0^{(K)}\,\). This proves the claim for \(p=K\,\).
Now, let \(3\le p\le K\), assume the claim for p and prove it for \(p-1\,\). By induction hypothesis, \({{\,\mathrm{\Delta }\,}}_p\) has two zeros \(x_0^{(p)},x_1^{(p)}\in (\rho ,\infty )\,\), \(x_1^{(p)}\ne x_0^{(p)}\,\). By Theorem 5, it follows that \({{\,\mathrm{\Delta }\,}}_{p-1}\) has a zero \(x_0^{(p-1)}\in (\rho ,\infty )\) lying between them (interlacing of the zeros). Since by hypothesis \({{\,\mathrm{\Delta }\,}}_{p-1}(\rho ,t)>0\), and since \({{\,\mathrm{\Delta }\,}}_{p-1}\) changes its sign at the simple zero \(x_0^{(p-1)}\) while \({{\,\mathrm{\Delta }\,}}_{p-1}(x,t)\rightarrow \infty \) as \(x\rightarrow \infty \), it follows that \({{\,\mathrm{\Delta }\,}}_{p-1}\) has another zero \(x_1^{(p-1)}\in (\rho ,\infty )\,\), \(x_1^{(p-1)}\ne x_0^{(p-1)}\). This proves the claim for \(p-1\,\). \(\square \)
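The equivalences of Proposition 5 can be illustrated in the special case of uniform coefficients \(t_p=\tau \) for all p, which is a hypothetical choice made here only because the zeros are then known in closed form. Assuming that the recursion (40) reads \({{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}\) with \({{\,\mathrm{\Delta }\,}}_0=1\), \({{\,\mathrm{\Delta }\,}}_1=x\), one has \({{\,\mathrm{\Delta }\,}}_p(x)=\tau ^{p/2}\,U_p\big (x/(2\sqrt{\tau })\big )\) with \(U_p\) the Chebyshev polynomial of the second kind, so the largest zero of \({{\,\mathrm{\Delta }\,}}_K\) is \(2\sqrt{\tau }\cos \big (\pi /(K+1)\big )\). The sketch compares condition (i), computed from this closed form, with conditions (iii) and (iv), computed from the recursion evaluated at \(x=\rho \).

```python
import math


def delta_values(rho, t):
    """Values Delta_1(rho), ..., Delta_K(rho), K = len(t) + 1.

    ASSUMED recursion: Delta_{p+1} = x * Delta_p - t_p * Delta_{p-1},
    with Delta_0 = 1 and Delta_1 = x.
    """
    vals = [1.0, rho]
    for tp in t:
        vals.append(rho * vals[-1] - tp * vals[-2])
    return vals[1:]


def condition_iv(rho, t):
    """Delta_p(rho) > 0 for every p = 1, ..., K."""
    return all(v > 0 for v in delta_values(rho, t))


def condition_iii(rho, t):
    """Delta_p(rho) > 0 only for the p <= K with the same parity as K."""
    vals = delta_values(rho, t)
    K = len(vals)
    return all(vals[p - 1] > 0 for p in range(K, 0, -2))


def largest_zero_uniform(tau, K):
    """Largest zero of Delta_K when t_p = tau for all p (Chebyshev case)."""
    return 2.0 * math.sqrt(tau) * math.cos(math.pi / (K + 1))


def condition_i_uniform(rho, tau, K):
    """Zeros of Delta_K in (-rho, rho), using symmetry and the closed form."""
    return rho > largest_zero_uniform(tau, K)


if __name__ == "__main__":
    K, tau = 5, 1.0
    for rho in [0.5, 1.0, 1.5, 1.7, 1.8, 2.5]:
        t = [tau] * (K - 1)
        print(rho, condition_i_uniform(rho, tau, K),
              condition_iii(rho, t), condition_iv(rho, t))
```

For \(K=5\) and \(\tau =1\) the threshold is \(2\cos (\pi /6)=\sqrt{3}\), so the three conditions are expected to switch from false to true together between \(\rho =1.7\) and \(\rho =1.8\). Note that positivity of \({{\,\mathrm{\Delta }\,}}_K(\rho ,t)\) alone is not enough: it must hold along the whole sequence, or along the same-parity subsequence.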
Proposition 5 also extends to the case of nonnegative coefficients:
Corollary 4
Let \(t_p\ge 0\) for all \(p=1,\dots ,K-1\,\). Then, for every \(\rho >0\) the following are equivalent:
(i) the zeros of \({{\,\mathrm{\Delta }\,}}_K\) are contained in \((-\rho ,\rho )\,\);
(ii) the zeros of \({{\,\mathrm{\Delta }\,}}_p\) are contained in \((-\rho ,\rho )\) for every \(p=1,\dots ,K\,\);
(iii) \({{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0\) for every \(p\le K\) such that \(p\equiv _{\text {mod}2}K\,\);
(iv) \({{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0\) for every \(p=1,\dots ,K\,\).
Proof
Implications i\(\Rightarrow \)ii\(\Rightarrow \)iii\(\Rightarrow \)iv are proven as before. iv\(\Rightarrow \)i follows from Proposition 5 by continuity. \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Alberici, D., Contucci, P. & Mingione, E. Deep Boltzmann Machines: Rigorous Results at Arbitrary Depth. Ann. Henri Poincaré 22, 2619–2642 (2021). https://doi.org/10.1007/s00023-021-01027-2
DOI: https://doi.org/10.1007/s00023-021-01027-2