Deep Boltzmann Machines: Rigorous Results at Arbitrary Depth

Alberici, Diego; Contucci, Pierluigi; Mingione, Emanuele

doi:10.1007/s00023-021-01027-2

Original Paper
Open access
Published: 22 February 2021

Volume 22, pages 2619–2642, (2021)
Annales Henri Poincaré

Abstract

A class of deep Boltzmann machines is considered in the simplified framework of a quenched system with Gaussian noise and independent entries. The quenched pressure of a K-layers spin glass model is studied allowing interactions only among consecutive layers. A lower bound for the pressure is found in terms of a convex combination of K Sherrington–Kirkpatrick models and used to study the annealed and replica symmetric regimes of the system. A map with a one-dimensional monomer–dimer system is identified and used to rigorously control the annealed region at arbitrary depth K with the methods introduced by Heilmann and Lieb. The compression of this high-noise region displays a remarkable phenomenon of localisation of the processing layers. Furthermore, a replica symmetric lower bound for the limiting quenched pressure of the model is obtained in a suitable region of the parameters and the replica symmetric pressure is proved to have a unique stationary point.

1 Introduction and Results

The mean-field setting in Statistical Mechanics corresponds to the invariance of an N particles system under the permutation group action. When this condition is weakened to permutation invariance within each set of a K-partition of the system $\big (\sum _{p=1}^{K}N_p=N\big )$, a homogeneous model generalizes to its K-populated version. This generalization has been considered in spin systems for both non-random interactions, i.e. the Curie–Weiss model [12, 13], and random interactions, i.e. the Sherrington–Kirkpatrick model [7, 25]. For the first case, a complete control of the thermodynamic properties has been reached for general values of the interaction parameters. In the random case, instead only the so-called elliptic structure of the interactions is fully controlled, while the hyperbolic one is still not understood. We mention that the case $K=2$ has already been solved in two particular frameworks characterized by replica symmetry: on the Nishimori line [6] or with spherical spins [4, 5].

In this paper, we continue the analysis started in [2, 8] concerning a mean-field spin glass with pure hyperbolic structure of the interactions, i.e. a random version of deep Boltzmann machines [DBM] over K layers [26]. The framework of [2] is generalized by dealing with a general number K of layers and by allowing local (layer dependent) temperatures. A lower bound for the quenched pressure in terms of K Sherrington–Kirkpatrick models [SK] coupled in temperature along a linear chain is obtained and used to study the annealed and replica symmetric regimes of the random DBM in the large volume limit. We mention that an upper bound for the quenched pressure in terms of the solution of an infinite-dimensional Hamilton–Jacobi equation has recently been obtained in [22] for $K=2$ and layers of equal size; see also [23] for a generalization by the same author.

Our first result is a control of the annealed region $A_K$ in terms of the largest zero of a matching polynomial which—up to a change of variable in the complex plane—is the partition function of a monomer–dimer system over the linear chain of length K [18, 19]. This region $A_K$ turns out to be exactly the one where the annealed solution $q=0$ is stable for the replica symmetric consistency equation. The compression of the annealed region leads to a peculiar structure of the layers: in particular, the extensive layers are localized along a chain of length two or three.

A replica symmetric lower bound for the quenched pressure is obtained in a suitable region of the parameters. In the case of Gaussian external fields, this region is identified by a K-dimensional version of the Almeida–Thouless condition for SK. Within this framework, the replica symmetric consistency equation is proved to have a unique solution on the whole space of parameters. It is important to mention that the uniqueness for the elliptic case [9, 25] is still an open problem when $K>2$.

The paper is organised as follows. Section 2 introduces the model. In Sect. 3, we provide a lower bound for the quenched pressure of the DBM in terms of an interacting variational principle. In Sect. 4, we identify and study a region where the quenched and the annealed pressure of the DBM coincide. In Sect. 5, we derive the replica symmetric functional for the DBM and we study its stationary point(s). In Sect. 6, we provide a lower bound for the quenched pressure of the DBM in terms of the previous replica symmetric functional under suitable conditions on the parameters of the model. Appendix A contains properties of the zeros of the matching polynomials, which are useful to characterize the annealed region in Sect. 4 and are mainly due to Heilmann and Lieb [18].

2 Definitions

Consider N spin variables $\sigma =(\sigma _i)_{i=1,\ldots ,N}\in \{-1,1\}^N$ arranged over K layers $L_1,\dots ,L_K$ of cardinality $N_1,\dots ,N_K$, respectively, so that $\sum _{p=1}^K N_p=N\,$. Assume that the relative sizes of the layers converge in the large volume limit:

$$\begin{aligned} \lambda _p^{(N)} \equiv \, \frac{N_p}{N} \,\xrightarrow [N\rightarrow \infty ]{}\, \lambda _p \,\in [0,1] \end{aligned}$$

(1)

for every $p=1,\dots ,K\,$. We denote $\varLambda _N=(L_p)_{p=1,\dots ,K}\,$, $\lambda ^{(N)}=\big (\lambda _p^{(N)}\big )_{p=1,\dots ,K}$ and $\lambda =(\lambda _p)_{p=1,\dots ,K}$. Clearly, $\sum _{p=1}^K\lambda _p=1\,$.

Let $J_{ij}$ for $(i,j)\in L_p\times L_{p+1}$ and $p=1,\dots ,K-1\,$ be a family of i.i.d. standard Gaussian random variables coupling spins in two consecutive layers. We introduce a vector of positive inverse temperatures tuning the interactions among consecutive layers $\beta =(\beta _p)_{p=1,\dots ,K-1}\in \mathbb {R}_+^{K-1}\,$.

Let $h_i$ for $i\in L_p$ and $p=1,\dots ,K$ be a family of independent real random variables, independent also of the $J_{ij}$’s, acting as external fields on the spins. Assume that $(h_i)_{i\in L_p}$ are i.i.d. copies of a random variable $h^{(p)}$ such that $\mathbb {E}|h^{(p)}|<\infty \,$. We denote $h=(h^{(p)})_{p=1,\dots ,K}\,$.

Definition 1

The Hamiltonian of the random deep Boltzmann machine [DBM] is

$$\begin{aligned} H_{\varLambda _N}(\sigma ) \,\equiv \, -\frac{\sqrt{2}}{\sqrt{N}}\; \sum _{p=1}^{K-1}\,\beta _p\!\!\sum _{(i,j)\in L_p\times L_{p+1}}\!\!\!\!\! J_{ij}\, \sigma _i\sigma _j \end{aligned}$$

(2)

for every spin configuration $\sigma \in \{-1,1\}^N\,$.

Definition 2

Given two spin configurations $\sigma ,\tau \in \{-1,1\}^N$, for every $p=1,\ldots ,K$ we define their overlap over the layer $L_p$ as

$$\begin{aligned} q_{L_p}(\sigma ,\tau ) \,\equiv \, \frac{1}{N_p}\,\sum _{i\in L_p} \sigma _i\,\tau _i \;\in [-1,1] . \end{aligned}$$

(3)

Remark 1

The covariance matrix of the centred Gaussian process $H_{\varLambda _N}$ is

$$\begin{aligned} \mathbb {E}\,H_{\varLambda _N}(\sigma )\, H_{\varLambda _N}(\tau ) \,=\, N\, q_{\varLambda _N}(\sigma ,\tau )^T\, M_1^{(N)}\, q_{\varLambda _N}(\sigma ,\tau ) \end{aligned}$$

(4)

for every $\sigma ,\tau \in \{-1,1\}^N$. Here, we set $q_{\varLambda _N}(\sigma ,\tau ) \equiv \big (q_{L_p}(\sigma ,\tau )\big )_{p=1,\dots ,K}\;$,

$$\begin{aligned} M_1(\beta ,\lambda ) \,\equiv & {} \, {{\,\mathrm{diag}\,}}(\lambda )\,M_0(\beta )\, {{\,\mathrm{diag}\,}}(\lambda ), \end{aligned}$$

(5)

$$\begin{aligned} M_0(\beta ) \,\equiv & {} \, \begin{pmatrix} 0 &{}\quad \beta _1^2 &{}\quad &{} \quad &{} \quad &{} \\ \beta _1^2 &{}\quad 0 &{}\quad \beta _2^2 &{}\quad \quad &{} \quad &{} \\ &{}\quad \beta _2^2 &{}\quad 0 &{}\quad &{} \quad &{} \quad \\ &{}\quad &{}\quad &{}\quad \ddots &{} \quad &{} \quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \beta _{K-1}^2 \\ &{} \quad &{} \quad &{}\quad &{} \quad \beta _{K-1}^2\! &{}\quad 0 \\ \end{pmatrix} \end{aligned}$$

(6)

and we denote $M_1^{(N)}\equiv M_1(\beta ,\lambda ^{(N)})\,$. Notice that $M_0(\beta )$ can be interpreted as a weighted adjacency matrix for the layers structure of the DBM.
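As a sanity check on (4), (5) and (6), the following minimal Python sketch compares $N\,q^T M_1^{(N)} q$ with the covariance computed directly from the Hamiltonian (2), using only that the couplings $J_{ij}$ are i.i.d. standard Gaussians. The function names and the toy layer sizes are illustrative assumptions and do not come from the paper.

```python
import random

def m1_matrix(beta, lam):
    """M_1 = diag(lam) M_0(beta) diag(lam), see (5)-(6), as a K x K list of lists."""
    K = len(lam)
    M1 = [[0.0] * K for _ in range(K)]
    for p in range(K - 1):
        w = lam[p] * beta[p] ** 2 * lam[p + 1]
        M1[p][p + 1] = w
        M1[p + 1][p] = w
    return M1

def overlaps(sigma, tau, layers):
    """Layer overlaps q_{L_p}(sigma, tau) of definition (3)."""
    return [sum(sigma[i] * tau[i] for i in L) / len(L) for L in layers]

def covariance_from_overlaps(sigma, tau, layers, beta):
    """Right-hand side of (4): N q^T M_1^{(N)} q."""
    N = len(sigma)
    lam = [len(L) / N for L in layers]
    q = overlaps(sigma, tau, layers)
    M1 = m1_matrix(beta, lam)
    K = len(layers)
    return N * sum(q[r] * M1[r][c] * q[c] for r in range(K) for c in range(K))

def covariance_direct(sigma, tau, layers, beta):
    """E[H(sigma) H(tau)] from (2), using E[J_ij J_kl] = 1 if (i,j)=(k,l), else 0."""
    N = len(sigma)
    total = 0.0
    for p in range(len(layers) - 1):
        for i in layers[p]:
            for j in layers[p + 1]:
                total += beta[p] ** 2 * sigma[i] * tau[i] * sigma[j] * tau[j]
    return 2.0 * total / N

# Toy example (assumed sizes): K = 3 layers with 2, 3 and 1 spins.
random.seed(0)
layers = [[0, 1], [2, 3, 4], [5]]
beta = [0.7, 1.3]
sigma = [random.choice((-1, 1)) for _ in range(6)]
tau = [random.choice((-1, 1)) for _ in range(6)]
lhs = covariance_direct(sigma, tau, layers, beta)
rhs = covariance_from_overlaps(sigma, tau, layers, beta)
```

The two quantities `lhs` and `rhs` should agree up to rounding for any pair of configurations, since the identity is purely algebraic.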

Definition 3

The random partition function of the model introduced by Hamiltonian (2) is

$$\begin{aligned} Z_{\varLambda _N} \,\equiv \, \sum _{\sigma \in \{-1,1\}^N} \exp \Bigg (-H_{\varLambda _N}(\sigma )\,+\,\sum _{p=1}^K\sum _{i\in L_p}h_i\,\sigma _i\Bigg ) \end{aligned}$$

(7)

and its quenched pressure density is

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\equiv \, \frac{1}{N}\,\mathbb {E}\,\log Z_{\varLambda _N} \end{aligned}$$

(8)

where $\mathbb {E}$ denotes the expectation over all the couplings $J_{ij}\,$’s and the external fields $h_i$’s.

3 A Lower Bound for the Quenched Pressure of the DBM

In this section, we give an explicit bound for the quenched pressure of the K layers DBM in terms of K independent Sherrington–Kirkpatrick spin glasses [SK] [14, 24, 27].

Considering N spin variables $\sigma _i$, $i=1,\dots ,N$, we recall that the Hamiltonian of the SK model is

$$\begin{aligned} H^{\mathrm{SK}}_N(\sigma ) \,\equiv \, -\frac{1}{\sqrt{N}}\; \sum _{i,j=1}^N \tilde{J}_{ij}\, \sigma _i\sigma _j \end{aligned}$$

(9)

where ${\tilde{J}}_{ij}$, $i,j=1,\dots ,N$ is a family of i.i.d. standard Gaussian random couplings. Given two spin configurations $\sigma ,\tau \in \{-1,1\}^N$, their overlap is

$$\begin{aligned} q_N(\sigma ,\tau ) \,\equiv \, \frac{1}{N}\,\sum _{i=1}^N \sigma _i\,\tau _i \;\in [-1,1] \end{aligned}$$

(10)

and the covariance matrix of the Gaussian process $H^{\mathrm{SK}}_N$ is:

$$\begin{aligned} \mathbb {E}\,H^{\mathrm{SK}}_N(\sigma )\, H^{\mathrm{SK}}_N(\tau ) \,=\, N\, q_N(\sigma ,\tau )^2 . \end{aligned}$$

(11)

Given an inverse temperature $\beta >0$, the random partition function of the SK model is

$$\begin{aligned} Z^{\mathrm{SK}}_N\,\equiv \, \sum _{\sigma \in \{-1,1\}^N} \exp \Bigg ( -\beta \,H^{\mathrm{SK}}_N(\sigma ) \,+\, \sum _{i=1}^N {\tilde{h}}_i\,\sigma _i\Bigg ) \end{aligned}$$

(12)

where ${\tilde{h}}_i$, $i=1,\dots ,N$ is a family of i.i.d. copies of a random variable h such that $\mathbb {E}|h|<\infty \,$. The quenched pressure density of the SK model is

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}_N}\,}}(\beta ,h) \,\equiv \, \frac{1}{N}\,\mathbb {E}\,\log Z^{\mathrm{SK}}_N\end{aligned}$$

(13)

where $\mathbb {E}$ denotes the expectation over all couplings $\tilde{J}_{ij}$’s and fields ${\tilde{h}}_i$’s. The quenched pressure converges as $N\rightarrow \infty $ and many properties of its limit, that we will denote by ${{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h)\,$, have been investigated in the literature [3, 15, 17, 21, 24, 27].

Theorem 1

The quenched pressure of the DBM satisfies the following lower bound:

$$\begin{aligned} \liminf _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\ge \sup _{a\in \mathbb {R}_{+}^{K-1}} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a) , \end{aligned}$$

(14)

where, for every $a= (a_p)_{p=1,\dots ,K-1}\in \mathbb {R}_+^{K-1}$, the functional ${{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a)={{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h)$ is defined as:

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a) \,\equiv \, \sum _{p=1}^K \lambda _p\, {{\,\mathrm{p^{\mathrm{SK}}}\,}}\!\big (\theta _p(a),h^{(p)}\big ) \,-\, \frac{1}{2}\,\sum _{p=1}^K \lambda _p\,\theta _p(a)^2 \,+ \sum _{p=1}^{K-1}\lambda _p\,\beta _p^2\,\lambda _{p+1}\nonumber \\ \end{aligned}$$

(15)

and the parameter $\theta _p(a)=\theta _p(a;\beta ,\lambda )\ge 0$ is defined by:

$$\begin{aligned} \theta _p(a)^{\,2} \,\equiv \, {\left\{ \begin{array}{ll} \,\lambda _1\,a_1\,\beta _1^2\ &{} \text {for }p=1 \\ \,\lambda _p\left( \,\dfrac{1}{a_{p-1}}\,\beta _{p-1}^2 +\, a_p\,\beta _p^2\right) \ &{} \text {for }p=2,\dots ,K-1 \\ \,\lambda _K\,\dfrac{1}{a_{K-1}}\,\beta _{K-1}^2\ &{} \text {for }p=K \\ \end{array}\right. }. \end{aligned}$$

(16)
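The definition (16) can be transcribed directly. The sketch below is only an illustration under the assumption that layers are indexed from 0 in the code, so that `a[p]` and `beta[p]` correspond to $a_{p+1}$ and $\beta_{p+1}$ in the text; the function name `theta_sq` is hypothetical.

```python
def theta_sq(a, beta, lam):
    """Return [theta_1(a)^2, ..., theta_K(a)^2] as defined in (16).

    a, beta: sequences of length K-1 (positive entries),
    lam: sequence of length K (relative layer sizes).
    """
    K = len(lam)
    out = []
    for p in range(K):
        # contribution of the edge towards the previous layer (absent for the first layer)
        left = beta[p - 1] ** 2 / a[p - 1] if p >= 1 else 0.0
        # contribution of the edge towards the next layer (absent for the last layer)
        right = a[p] * beta[p] ** 2 if p <= K - 2 else 0.0
        out.append(lam[p] * (left + right))
    return out

# Example with K = 3 (values chosen only for illustration).
example = theta_sq(a=[2.0, 0.5], beta=[1.0, 2.0], lam=[0.2, 0.5, 0.3])
```

Each $a_p$ redistributes the weight of the edge $(p,p+1)$ between the two adjacent layers: a larger $a_p$ raises the effective temperature parameter of layer $p$ and lowers that of layer $p+1$.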

Proof

We are going to prove the following lower bound at finite volume:

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\ge \, \sum _{p=1}^K \lambda _p^{(N)}\, {{\,\mathrm{p^{\mathrm{SK}}_{N_p}}\,}}\big (\theta _p^{(N)},h^{(p)}\big ) \,-\, \frac{1}{2}\,\sum _{p=1}^K \lambda _p^{(N)} \big (\theta _p^{(N)}\big )^2 \,+\, \sum _{p=1}^{K-1}\lambda _p^{(N)}\beta _p^2\,\lambda _{p+1}^{(N)}\nonumber \\ \end{aligned}$$

(17)

where $\theta _p^{(N)}\equiv \theta _p(a;\beta ,\lambda ^{(N)})$ and $a\in \mathbb {R}_{+}^{K-1}$ can be arbitrarily chosen. The lower bound (14) will follow immediately by letting $N\rightarrow \infty $, since ${{\,\mathrm{p^{\mathrm{SK}}_N}\,}}(\beta ,h)$ is convex with respect to $\beta $, and thus, the convergence to ${{\,\mathrm{p^{\mathrm{SK}}}\,}}$ is uniform on compact sets.

For every $p=1,\ldots ,K$, let $H^{\mathrm{SK}}_{L_p}(s)$, $s\in \{-1,1\}^{L_p}\,$ be a Gaussian process representing the Hamiltonian of an SK model over the $N_p$ spin variables in the layer $L_p\,$. We assume that $H^{\mathrm{SK}}_{L_1},\dots ,H^{\mathrm{SK}}_{L_K}$ are independent processes, also independent of the Hamiltonian $H_{\varLambda _N}$. For $\sigma \in \{-1,1\}^N$ and $t\in [0,1]$, we define an interpolating Hamiltonian as follows:

$$\begin{aligned} {\mathcal {H}}_{N}(\sigma ;t) \,\equiv \, \sqrt{t}\; H_{\varLambda _N}(\sigma ) \,+\, \sqrt{1-t}\; \sum _{p=1}^K\, \theta _p^{(N)}\, H^{\mathrm{SK}}_{L_p}(\sigma _{L_p}) , \end{aligned}$$

(18)

where of course $\sigma _{L_p}\equiv (\sigma _i)_{i\in L_p}\,$. An interpolating quenched pressure is naturally defined as

$$\begin{aligned} \varphi _{N}(t)\,\equiv \, \frac{1}{N}\, \mathbb {E}\,\log \,{\mathcal {Z}}_{N}(t) \ , \end{aligned}$$

(19)

where

$$\begin{aligned} {\mathcal {Z}}_{N}(t)\,\equiv \, \sum _{\sigma \in \{-1,1\}^N} \exp \bigg (-{\mathcal {H}}_{N}(\sigma ;t) \,+\, \sum _{p=1}^K\sum _{i\in L_p} h_i\,\sigma _i \bigg ) \end{aligned}$$

(20)

and $\mathbb {E}$ denotes the expectation with respect to all the couplings $J_{ij}$’s, ${\tilde{J}}_{ij}$’s, $h_i$’s. The quenched pressure of the DBM and a convex combination of quenched pressures of SK models are recovered for $t=1$ and $t=0$, respectively:

$$\begin{aligned}&\varphi _{N}(1) \,=\, {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}, \end{aligned}$$

(21)

$$\begin{aligned}&\varphi _{N}(0) \,=\, \sum _{p=1}^K \lambda _p^{(N)}\, {{\,\mathrm{p^{\mathrm{SK}}_{N_p}}\,}}(\theta _p^{(N)},h^{(p)}) . \end{aligned}$$

(22)

For every function $f:\{-1,1\}^{N}\times \{-1,1\}^{N}\rightarrow \mathbb {R}\,$, we denote

$$\begin{aligned} \left\langle \, f\,\right\rangle _{N,t} \,\equiv \, \mathbb {E}\,\sum _{\sigma ,\tau }\frac{e^{-{\mathcal {H}}_{N}(\sigma ;t)\,-\,{\mathcal {H}}_{N}(\tau ;t) \,+\, \sum _{p=1}^K\sum _{i\in L_p}h_i(\sigma _i+\tau _i)}}{{\mathcal {Z}}_N^2(t)}\,f(\sigma ,\tau ) . \end{aligned}$$

(23)

Let $Q_N:\{-1,1\}^{N}\times \{-1,1\}^{N}\rightarrow \mathbb {R}\,$,

$$\begin{aligned} Q_N\,\equiv \; 2\sum _{p=1}^{K-1} \lambda _p^{(N)}\beta _p^2\,\lambda _{p+1}^{(N)}\; q_{L_p}\,q_{L_{p+1}}\,-\, \sum _{p=1}^K \lambda _p^{(N)} \big (\theta _p^{(N)}\big )^2 q_{L_p}^2 . \end{aligned}$$

(24)

Gaussian integration by parts leads to the following result:

$$\begin{aligned} \frac{d\varphi _N}{dt} \,=\, \frac{1}{2}\, \Bigg (2\sum _{p=1}^{K-1}\lambda _p^{(N)}\,\beta _p^2\,\lambda _{p+1}^{(N)} \,-\, \sum _{p=1}^K \lambda _p^{(N)} \big (\theta _p^{(N)}\big )^2 \Bigg ) \,-\, \frac{1}{2}\,\Big \langle Q_N \Big \rangle _{N,t} . \end{aligned}$$

(25)

Now, replacing the definition (16) of $\theta _p^{(N)}=\theta _p(a;\beta ,\lambda ^{(N)})$ into (24), we obtain

$$\begin{aligned} Q_N \,=\, -\sum _{p=1}^{K-1} \beta _p^2\; \bigg (\lambda _{p+1}^{(N)}\,\frac{1}{\sqrt{a_p}}\;q_{L_{p+1}}\,-\, \lambda _p^{(N)}\,\sqrt{a_p}\;q_{L_p}\bigg )^{\!2} \,\le 0 . \end{aligned}$$

(26)

The claim (17) follows immediately from (21), (22), (25) and (26). $\square $
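The algebraic step from (24) to (26) can be checked numerically. The sketch below evaluates $Q_N$ in both forms for randomly drawn parameters, reading the cross term of (24) with weight $\lambda_p\beta_p^2\lambda_{p+1}$ as dictated by the covariance (4). All names, and the choice $K=5$, are assumptions made for illustration.

```python
import math
import random

def theta_sq(a, beta, lam):
    """theta_p(a)^2 for p = 1..K, following definition (16)."""
    K = len(lam)
    out = []
    for p in range(K):
        left = beta[p - 1] ** 2 / a[p - 1] if p >= 1 else 0.0
        right = a[p] * beta[p] ** 2 if p <= K - 2 else 0.0
        out.append(lam[p] * (left + right))
    return out

def q_from_24(q, a, beta, lam):
    """Q_N written as cross terms minus diagonal terms, as in (24)."""
    K = len(lam)
    th2 = theta_sq(a, beta, lam)
    cross = 2.0 * sum(lam[p] * beta[p] ** 2 * lam[p + 1] * q[p] * q[p + 1]
                      for p in range(K - 1))
    diag = sum(lam[p] * th2[p] * q[p] ** 2 for p in range(K))
    return cross - diag

def q_from_26(q, a, beta, lam):
    """Q_N written as minus a sum of squares, as in (26)."""
    K = len(lam)
    return -sum(
        beta[p] ** 2
        * (lam[p + 1] * q[p + 1] / math.sqrt(a[p]) - lam[p] * math.sqrt(a[p]) * q[p]) ** 2
        for p in range(K - 1)
    )

random.seed(1)
K = 5
raw = [random.random() for _ in range(K)]
lam = [r / sum(raw) for r in raw]                      # point of the simplex T_K
beta = [random.uniform(0.1, 2.0) for _ in range(K - 1)]
a = [random.uniform(0.1, 3.0) for _ in range(K - 1)]
q = [random.uniform(-1.0, 1.0) for _ in range(K)]      # layer overlaps in [-1, 1]
value_24 = q_from_24(q, a, beta, lam)
value_26 = q_from_26(q, a, beta, lam)
```

Since $Q_N\le 0$ pointwise, the term $-\frac{1}{2}\langle Q_N\rangle_{N,t}$ in (25) is non-negative, which is exactly what turns the interpolation into a lower bound.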

Remark 2

$a=(a_p)_{p=1,\dots ,K-1}$ is a stationary point of ${{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}$ if and only if

$$\begin{aligned} \frac{1}{a_p}\;\lambda _{p+1}\,{{\,\mathrm{q^{\mathrm{SK}}}\,}}\!\big (\theta _{p+1}(a),h^{(p+1)}\big ) \,=\, \lambda _p\,{{\,\mathrm{q^{\mathrm{SK}}}\,}}\!\big (\theta _p(a),h^{(p)}\big ) \end{aligned}$$

(27)

for every $p=1,\dots ,K-1\,$, where we define ${{\,\mathrm{q^{\mathrm{SK}}}\,}}(\beta ,h)\ge 0$ by

$$\begin{aligned} {{\,\mathrm{q^{\mathrm{SK}}}\,}}(\beta ,h)^2\,\equiv \, \lim _{N\rightarrow \infty }\,\mathbb {E}\sum _{\sigma ,\tau \in \{-1,1\}^N} q_N(\sigma ,\tau )^2\; \mu _N^{\mathrm{SK}}(\sigma ,\tau ) \end{aligned}$$

(28)

and

$$\begin{aligned} \mu _N^{\mathrm{SK}}(\sigma ,\tau ) \,\equiv \, \frac{1}{\big (Z^{\mathrm{SK}}_N\big )^2}\,\exp \bigg (-\beta H^{\mathrm{SK}}_N(\sigma )\,-\,\beta H^{\mathrm{SK}}_N(\tau )\,+\,\sum _{i=1}^N {\tilde{h}}_i\,(\sigma _i+\tau _i)\,\bigg ).\nonumber \\ \end{aligned}$$

(29)

Since $\frac{\partial }{\partial \beta }{{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h)=\beta \,\big (1-{{\,\mathrm{q^{\mathrm{SK}}}\,}}(\beta ,h)^2\big )\,$ [27], it is straightforward to compute $\frac{\partial }{\partial a_p}{{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}\,$ from definition (15) and find the stationary condition (27).

4 The Annealed Region of the DBM

In this section, we consider the model in the absence of external fields ($h=0$) and we identify a region where the quenched and the annealed pressure of the DBM coincide.

Definition 4

The annealed pressure of the DBM is

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}\,\equiv \, \lim _{N\rightarrow \infty } \frac{1}{N}\log \mathbb {E}\,Z_{\varLambda _N} . \end{aligned}$$

(30)

It can be easily computed due to the Gaussian nature of the model:

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}(\beta ,\lambda ) \,=\, \log 2 \,+\, \sum _{p=1}^{K-1}\lambda _p\,\beta _p^2\,\lambda _{p+1} . \end{aligned}$$

(31)

By concavity of the $\log $, the annealed pressure is an upper bound for the quenched one:

$$\begin{aligned} \limsup _{N\rightarrow \infty } {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\le \, {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}. \end{aligned}$$

(32)

The system is said to be in the annealed regime when the parameters $(\beta ,\lambda )$ are such that $\lim _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}= {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}\,$.
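For concreteness, formula (31) can be evaluated as follows. This is only a small illustrative helper (the name `annealed_pressure` is an assumption), with layers indexed from 0 in the code.

```python
import math

def annealed_pressure(beta, lam):
    """Annealed pressure (31): log 2 + sum_p lam_p * beta_p^2 * lam_{p+1}."""
    return math.log(2.0) + sum(
        lam[p] * beta[p] ** 2 * lam[p + 1] for p in range(len(lam) - 1)
    )

# Two balanced layers at unit inverse temperature (illustrative values).
two_layers = annealed_pressure([1.0], [0.5, 0.5])
# Three layers: only consecutive layers contribute to the sum.
three_layers = annealed_pressure([1.0, 2.0], [0.25, 0.5, 0.25])
```

By (32), these values are upper bounds for the limiting quenched pressure at the same parameters.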

By Theorem 1, we can investigate the annealed regime of the DBM relying on the established results for the annealed regime of the SK model. Let ${{\,\mathrm{p^{\mathrm{SK}}}\,}}$ be the limiting quenched pressure of an SK model, and let ${{\,\mathrm{p^{\mathrm{SK-A}}}\,}}\equiv \lim _{N\rightarrow \infty }N^{-1}\log \mathbb {E}Z^{\mathrm{SK}}_N$ be its annealed version. Clearly:

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}\,\le \, {{\,\mathrm{p^{\mathrm{SK-A}}}\,}}\,=\, \log 2 +\frac{\beta ^2}{2} . \end{aligned}$$

(33)

Equality is achieved in the so-called annealed region of the SK model [1, 14, 24, 27]:

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ) \,=\, {{\,\mathrm{p^{\mathrm{SK-A}}}\,}}(\beta ) \quad \text {if }\beta ^2\le \frac{1}{2} . \end{aligned}$$

(34)

Now, consider the following system of inequalities:

$$\begin{aligned} {\left\{ \begin{array}{ll} \lambda _1\,a_1\,\beta _1^2 \le \, \dfrac{1}{2}\\ \lambda _p\,\Big (\,\dfrac{1}{a_{p-1}}\,\beta _{p-1}^2 +\, a_p\,\beta _p^2\Big ) \le \, \dfrac{1}{2}\ &{} \text {for }p=2,\dots ,K-1 \\ \lambda _K\,\dfrac{1}{a_{K-1}}\,\beta _{K-1}^2 <\, \dfrac{1}{2} \end{array}\right. } \end{aligned}$$

(35)

and the following region of parameters of the DBM:

$$\begin{aligned} A_K \equiv \Big \{(\beta ,\lambda )\in \mathbb {R}_{+}^{K-1}\times T_K\ \Big |\ \exists \,a_1,\dots ,a_{K-1}>0\,: (35) \text { is verified} \Big \}, \end{aligned}$$

(36)

where $T_K \equiv \{ (\lambda _1,\dots ,\lambda _K) \in [0,1]^K \,|\, \sum _{p=1}^K\lambda _p=1 \}\,$ denotes the $K$-dimensional simplex. We denote by $\overline{A_K}$ the topological closure of $A_K\,$.

Theorem 2

If $(\beta ,\lambda )\in \overline{A_K}$, then the limit of the quenched pressure exists and equals the annealed pressure:

$$\begin{aligned} \lim _{N\rightarrow \infty } {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,=\, {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}. \end{aligned}$$

(37)

Proof

The lower bound (14) for the quenched pressure of the DBM rewrites as:

$$\begin{aligned} \liminf _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\ge \, \sup _{a\in \mathbb {R}_{+}^{K-1}}\, \sum _{p=1}^K\, \lambda _p\, \Big ({{\,\mathrm{p^{\mathrm{SK}}}\,}}\left( \theta _p(a)\right) \,-\, {{\,\mathrm{p^{\mathrm{SK-A}}}\,}}\left( \theta _p(a)\right) \Big ) \,+\, {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}.\nonumber \\ \end{aligned}$$

(38)

Thanks to (33) and (34), if $(\beta ,\lambda )\in \overline{A_K}$, then the supremum in (38) vanishes and

$$\begin{aligned} \liminf _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\ge \, {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}. \end{aligned}$$

(39)

This bound together with (32) concludes the proof. $\square $

It is an open question whether $\overline{A_K}$ is the full annealed region of the system. We will see that Proposition 4 suggests a positive answer. We are now interested in a more explicit characterization of $A_K$. We mention that such a characterization can be interesting for inference problems as suggested in [10]. It is convenient to introduce the following family of polynomials.

Definition 5

Let $x\in \mathbb {C}$ and $t=(t_p)_{p=1,\dots ,K-1}\in [0,\infty )^{K-1}$. We define recursively

$$\begin{aligned} {\left\{ \begin{array}{ll} \,{{\,\mathrm{\Delta }\,}}_{p+1}(x,t) \,\equiv \, x\,{{\,\mathrm{\Delta }\,}}_{p}(x,t) - t_p\, {{\,\mathrm{\Delta }\,}}_{p-1}(x,t)\quad \text {for }p=1,\dots ,K-1 \\ \,{{\,\mathrm{\Delta }\,}}_1(x,t) \,\equiv \, x ,\ {{\,\mathrm{\Delta }\,}}_0(x,t) \,\equiv \, 1 \end{array}\right. }. \end{aligned}$$

(40)

These orthogonal polynomials have several characterizations and were studied by Heilmann and Lieb [18, 19]. Some relevant properties can be found in Appendix A.

Remark 3

The polynomial ${{\,\mathrm{\Delta }\,}}_K(x,t)$ has an interesting combinatorial interpretation. Let us denote by ${\mathscr {L}}_K$ the linear graph with vertex set $\{1,\dots ,K\}$ and edge set $\{(p,p+1)\,|\,p=1,\dots ,K-1\}\,$. A matching on $\mathscr {L}_K$ is a subset of pairwise disjoint edges. Then:

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_K(x,t) \,=\, \sum _{d=0}^{\lfloor K/2\rfloor } (-1)^d\, x^{K-2d}\, f_{d,K}(t) \ , \end{aligned}$$

(41)

where:

$$\begin{aligned} f_{d,K}(t) \,\equiv \, \sum _{\begin{array}{c} D\text { matching on }{\mathscr {L}}_K\\ |D|=d \end{array}}\, \prod _{(p,p+1)\in D}\!\!\!\!t_{p} \ . \end{aligned}$$

(42)

Indeed, the polynomial on the right-hand side of (41) verifies the recursion relation (40) (see [18]).
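The agreement between the recursion (40) and the matching expansion (41), (42) can be verified by brute force for small $K$. The following sketch is an illustration only: it enumerates matchings of the linear graph as sets of edge indices with no two consecutive indices, and the function names are hypothetical.

```python
from itertools import combinations

def delta_recursive(K, x, t):
    """Delta_K(x, t) from the three-term recursion (40); t[p-1] stores t_p."""
    prev, cur = 1.0, x            # Delta_0 and Delta_1
    if K == 0:
        return prev
    for p in range(1, K):
        prev, cur = cur, x * cur - t[p - 1] * prev
    return cur

def delta_matchings(K, x, t):
    """Delta_K(x, t) from the sum over matchings (41)-(42)."""
    total = 0.0
    for d in range(K // 2 + 1):
        # edge index e (0-based) joins vertices e+1 and e+2 of the linear graph
        for D in combinations(range(K - 1), d):
            # edges of a path are pairwise disjoint iff no two indices are consecutive
            if all(D[i + 1] - D[i] >= 2 for i in range(d - 1)):
                weight = 1.0
                for e in D:
                    weight *= t[e]
                total += (-1) ** d * x ** (K - 2 * d) * weight
    return total

# Illustrative values: K = 4, activities t = (1, 2, 3), evaluated at x = 2.
by_recursion = delta_recursive(4, 2.0, [1.0, 2.0, 3.0])
by_matchings = delta_matchings(4, 2.0, [1.0, 2.0, 3.0])
```

The enumeration is exponential in $K$ and is meant only as a check; the recursion evaluates $\Delta_K$ in $O(K)$ operations.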

Proposition 1

Let $(\beta ,\lambda )\in \mathbb {R}_{+}^{K-1}\times T_K$ and set

$$\begin{aligned} \rho (\beta ,\lambda ) \,\equiv \, \max \big \{\, x>0 \;\big | {{\,\mathrm{\Delta }\,}}_K\!\big (x,\,t(\beta ,\lambda )\big )=0 \,\big \} , \end{aligned}$$

(43)

where the parameter $t=(t_p)_{p=1,\dots ,K-1}\,$ is defined by

$$\begin{aligned} t_p(\beta ,\lambda ) \,\equiv \, 4\,\lambda _p\,\beta _p^4\,\lambda _{p+1} ,\quad p=1,\dots ,K-1. \end{aligned}$$

(44)

The following are equivalent:

(i) $(\beta ,\lambda )\in A_K$;

(ii) ${{\,\mathrm{\Delta }\,}}_p\!\big (\,1,\,t(\beta ,\lambda )\,\big )>0 \quad \forall \,p=2,\dots ,K$;

(iii) $\rho (\beta ,\lambda )<1$.

Proof

(i)$\Leftrightarrow $(ii). To shorten the notation set $z_p\equiv {{\,\mathrm{\Delta }\,}}_p\!\big (1,\,t(\beta ,\lambda )\big )\,$. By (40), we have

$$\begin{aligned} {\left\{ \begin{array}{ll} z_{p+1} \,=\, z_{p} \,-\, 4\,\lambda _p\,\beta _p^4\,\lambda _{p+1}\,z_{p-1} \quad \text {for }p=1,\dots ,K-1 \\ z_1 = 1 ,\ z_0 = 1 \end{array}\right. }\ . \end{aligned}$$

(45)

Set $a_K^*\equiv \frac{z_K}{z_{K-1}}$ and, for $p=1,\dots ,K-1$

$$\begin{aligned} a_p^*\,\equiv \, \dfrac{1}{2\,\lambda _p\,\beta _p^2}\,\dfrac{z_p}{z_{p-1}} . \end{aligned}$$

(46)

Notice that if $\lambda _p=0$, then $a_p^*$ diverges, while $2\lambda _p\beta _p^2\,a_p^*=1$, since $z_{p-1}=z_p=z_{p+1}\,$. The following recursion relation follows from (45):

$$\begin{aligned} {\left\{ \begin{array}{ll} \,a_K^* \,=\, 1-\dfrac{2\lambda _K\beta _{K-1}^2}{a_{K-1}^*} \\ \,2\lambda _p\beta _p^2\;a_p^* \,=\, 1 \,-\, \dfrac{2\lambda _p\beta _{p-1}^2}{a_{p-1}^*} \quad \text {for }p=2,\dots ,K-1 \\ \,2\lambda _1\beta _1^2\; a_1^* \,=\, 1 \end{array}\right. }, \end{aligned}$$

(47)

Now, assume $z_1,\dots ,z_K>0$. Then, $a_1^*,\dots ,a_K^*>0$ and choosing $a_1=a^*_1$,..., $a_{K-1}=a^*_{K-1}$ the system of inequalities (35) is verified.

On the other hand, assuming that there exist $a_1,\dots ,a_{K-1}>0$ verifying (35), one can prove by induction that $a_p^*\ge a_p>0$ for $p=1,\dots ,K-1$ and $a_K^*>0\,$. Therefore, $z_1,\dots ,z_K>0\,$.

(ii)$\Leftrightarrow $(iii). Equivalence of these conditions is a consequence of the interlacing property of the zeros of ${{\,\mathrm{\Delta }\,}}_p\,$. A detailed proof can be found in Appendix A (Corollary 4 with $\rho =1$). $\square $
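The construction used in the proof of (i)$\Leftrightarrow$(ii) is explicit and can be sketched in a few lines: compute $z_p$ from (45), test their positivity, and build the candidate $a_p^*$ of (46). The code below is a hedged illustration; the function names are assumptions, `a_star` assumes $\lambda_p>0$ and $\beta_p>0$ for $p\le K-1$, and a small tolerance is used because $a^*$ saturates the non-strict inequalities of (35).

```python
def activities(beta, lam):
    """t_p = 4 lam_p beta_p^4 lam_{p+1}, see (44)."""
    return [4.0 * lam[p] * beta[p] ** 4 * lam[p + 1] for p in range(len(lam) - 1)]

def z_sequence(beta, lam):
    """[z_0, ..., z_K] with z_p = Delta_p(1, t), from recursion (45)."""
    t = activities(beta, lam)
    z = [1.0, 1.0]
    for p in range(1, len(lam)):
        z.append(z[p] - t[p - 1] * z[p - 1])
    return z

def in_annealed_region(beta, lam):
    """Criterion (ii) of Proposition 1: z_p > 0 for p = 2..K."""
    return all(zp > 0 for zp in z_sequence(beta, lam)[2:])

def a_star(beta, lam):
    """a_p^* for p = 1..K-1 from (46); assumes lam_p > 0 and beta_p > 0."""
    z = z_sequence(beta, lam)
    return [z[p] / z[p - 1] / (2.0 * lam[p - 1] * beta[p - 1] ** 2)
            for p in range(1, len(lam))]

def satisfies_35(a, beta, lam, tol=1e-12):
    """Check the system (35): '<=' for layers 1..K-1, strict '<' for layer K."""
    K = len(lam)
    for p in range(K):
        left = beta[p - 1] ** 2 / a[p - 1] if p >= 1 else 0.0
        right = a[p] * beta[p] ** 2 if p <= K - 2 else 0.0
        value = lam[p] * (left + right)
        if p < K - 1 and value > 0.5 + tol:
            return False
        if p == K - 1 and not value < 0.5:
            return False
    return True

# Illustrative parameters with K = 3.
inside = in_annealed_region([0.8, 0.9], [0.3, 0.4, 0.3])
witness_ok = satisfies_35(a_star([0.8, 0.9], [0.3, 0.4, 0.3]), [0.8, 0.9], [0.3, 0.4, 0.3])
outside = in_annealed_region([2.0, 2.0], [0.3, 0.4, 0.3])
```

In this sketch the vector returned by `a_star` plays the role of the witness $a_1,\dots,a_{K-1}$ required in the definition (36) of $A_K$.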

Remark 4

The polynomial ${{\,\mathrm{\Delta }\,}}_K(x,t)$ with $t=t(\beta ,\lambda )$ defined in (44) has also a linear algebra interpretation. Set:

$$\begin{aligned} \begin{aligned} M(\beta ,\lambda ) \,&\equiv \, 2\,M_0(\beta )\,{{\,\mathrm{diag}\,}}(\lambda ) \,\\&=\, 2\,\begin{pmatrix} 0 &{}\quad \beta _1^2\lambda _2 &{} \quad &{}\quad &{}\quad \\ \lambda _1\beta _1^2 &{}\quad 0 &{}\quad \beta _2^2\lambda _3 &{}\quad &{}\quad \\ &{}\quad \lambda _2\beta _2^2 &{}\quad 0 &{}\quad &{}\quad \\ &{}\quad &{} \quad &{} \quad \ddots &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad \beta _{K-1}^2\lambda _K \\ &{} \quad &{}\quad &{} \quad \lambda _{K-1}\beta _{K-1}^2 &{}\quad 0 \\ \end{pmatrix} \end{aligned}\end{aligned}$$

(48)

where $M_0(\beta )$ is defined by (6). The characteristic polynomial of $M(\beta ,\lambda )$ is actually

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_K\!\big (x,\,t(\beta ,\lambda )\big ) \,=\, \det \big ( x\,I - M(\beta ,\lambda ) \big ) . \end{aligned}$$

(49)

Indeed, using the Laplace expansion along the last row of the matrix, it is easy to verify that the determinant on the right-hand side of (49) satisfies the recursion relation (40). Now, since the zeros of $x\mapsto {{\,\mathrm{\Delta }\,}}_K(x,t(\beta ,\lambda ))$ are all real and symmetric with respect to the origin (see Appendix A), the largest one is the spectral radius of $M(\beta ,\lambda )\,$:

$$\begin{aligned} \rho (\beta ,\lambda ) \,=\, \max \{|x| : x\text { eigenvalue of }M(\beta ,\lambda ) \} . \end{aligned}$$

(50)
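One possible way to evaluate $\rho(\beta,\lambda)$ numerically without a linear algebra library is sketched below. It relies on an assumption that goes slightly beyond the text: $M(\beta,\lambda)$ is similar to a symmetric tridiagonal matrix with off-diagonal entries $\sqrt{t_p}$, whose shifted leading principal minors are $\Delta_p(x,t)$, so by Sylvester's criterion $x>\rho$ if and only if $\Delta_p(x,t)>0$ for every $p=1,\dots,K$. Bisection on this predicate then locates $\rho$. The function names and the iteration count are illustrative choices.

```python
import math

def leading_minors(x, t):
    """[Delta_1(x,t), ..., Delta_K(x,t)] from recursion (40), with K = len(t) + 1."""
    prev, cur = 1.0, x
    out = [cur]
    for tp in t:
        prev, cur = cur, x * cur - tp * prev
        out.append(cur)
    return out

def spectral_radius(beta, lam, iterations=200):
    """Largest zero of Delta_K(., t(beta, lam)), located by bisection."""
    t = [4.0 * lam[p] * beta[p] ** 4 * lam[p + 1] for p in range(len(lam) - 1)]
    # Each row of M sums to at most 2 * max beta_p^2, which bounds the spectral radius.
    lo, hi = 0.0, 2.0 * max(b ** 2 for b in beta) + 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if all(d > 0 for d in leading_minors(mid, t)):
            hi = mid          # mid lies strictly above every zero
        else:
            lo = mid          # mid is at or below the largest zero
    return hi

# K = 2: the eigenvalues of M are +-2 beta_1^2 sqrt(lam_1 lam_2).
rho_two = spectral_radius([1.5], [0.2, 0.8])
# K = 3: Delta_3(x,t) = x^3 - (t_1 + t_2) x, so rho = sqrt(t_1 + t_2).
rho_three = spectral_radius([0.8, 0.9], [0.3, 0.4, 0.3])
```

With this helper, condition (iii) of Proposition 1 becomes the single comparison `spectral_radius(beta, lam) < 1`.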

The next proposition exploits the result of Proposition 1 in order to study the role of the parameters $\beta $ and $\lambda $ in the annealed behaviour of the system.

Proposition 2

(i) For every $\beta \in \mathbb {R}_+^{K-1}\,$,

$$\begin{aligned} \sup _{\lambda \in T_K}\rho (\beta ,\lambda ) \,=\, \max _{p=1,\dots ,K-1} \beta _p^2 \ . \end{aligned}$$

(51)

The supremum is reached exactly for those $\lambda =\lambda ^*(\beta )\in T_K$ such that there exists $p^*\in \{1,\dots ,K-1\}\,$:

$$\begin{aligned} \lambda _{p^*} \,=\, \lambda _{p^*\!+1} \,=\, \frac{1}{2} \quad ,\quad \beta _{p^*}=\max _{p=1,\dots ,K-1}\beta _p \end{aligned}$$

(52)

or $p^*\in \{2,\dots ,K-1\}\,$:

$$\begin{aligned} \lambda _{p^*} \,=\, \lambda _{p^*\!-1}+\lambda _{p^*\!+1} \,=\, \frac{1}{2} \quad ,\quad \beta _{p^*}=\beta _{p^*\!-1}=\max _{p=1,\dots ,K-1}\beta _p . \end{aligned}$$

(53)

(ii) Moreover, for every $\lambda \in T_K$, $\rho (\beta ,\lambda )$ is a non-decreasing function of each $\beta _p$ for $p=1,\dots ,K-1$.

Physically, (ii) means that increasing the local temperatures pushes the system towards the annealed region. On the other hand, (i) implies that if all the inverse temperatures satisfy $\beta _p<1$ for $p=1,\dots ,K-1$, then the system is in the annealed regime for every choice of the form factors $\lambda $. Furthermore, if this is not the case, the system can be driven out of the region $A_K$ by localizing the positive density layers around the minimal temperature(s).

In order to prove Proposition 2, we need the following elementary but useful lemma.
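The localisation phenomenon can be made concrete for $K=3$, where the recursion (40) gives $\Delta_3(x,t)=x^3-(t_1+t_2)\,x$ and hence $\rho=\sqrt{t_1+t_2}$. The sketch below scans a grid on the simplex $T_3$ and records where $\rho$ is largest; the grid resolution and the function names are assumptions chosen for illustration.

```python
import math

def rho_three_layers(beta, lam):
    """Largest zero of Delta_3(x,t) = x^3 - (t_1 + t_2) x, with t_p from (44)."""
    t1 = 4.0 * lam[0] * beta[0] ** 4 * lam[1]
    t2 = 4.0 * lam[1] * beta[1] ** 4 * lam[2]
    return math.sqrt(t1 + t2)

def grid_sup(beta, n=100):
    """Maximise rho over the grid {(i/n, j/n, (n-i-j)/n)} of the simplex T_3."""
    best, best_lam = -1.0, None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            lam = (i / n, j / n, (n - i - j) / n)
            r = rho_three_layers(beta, lam)
            if r > best:
                best, best_lam = r, lam
    return best, best_lam

# Distinct temperatures: the maximiser should sit on the two layers joined by the
# largest beta_p, as in (52).
best, best_lam = grid_sup([1.2, 0.7])
# Equal temperatures: a three-layer profile as in (53) also attains max beta_p^2.
rho_equal = rho_three_layers([1.0, 1.0], (0.2, 0.5, 0.3))
```

In the first case the third layer carries zero density at the maximiser, which is the chain of length two described in the Introduction; in the second case the middle layer holds half of the spins and the two outer layers share the rest.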

Lemma 1

Let $P\ge 2$, $x_1,\dots ,x_P\ge 0$ and $b_1,\dots ,b_{P-1}\ge 0\,$. Set $S\equiv \sum _{p=1}^Px_p$ and $B \equiv \max _{p=1,\dots ,P-1} b_p\,$. Then:

$$\begin{aligned} 4\,\sum _{p=1}^{P-1} b_p\,x_p\,x_{p+1} \,\le \, B \,S^2 . \end{aligned}$$

(54)

Moreover, we have equality in (54) if and only if there exists $p^*\in \{2,\dots ,P-1\}$ such that

$$\begin{aligned} x_{p^*} \,=\, x_{p^*-1}+x_{p^*+1} \,=\, \frac{S}{2} \quad ,\quad b_{p^*-1} = b_{p^*} = B \end{aligned}$$

(55)

or there exists $p^*\in \{1,\dots ,P-1\}$ such that

$$\begin{aligned} x_{p^*} \,=\, x_{p^*+1} \,=\, \frac{S}{2} \quad ,\quad b_{p^*} = B . \end{aligned}$$

(56)

Proof

Since

$$\begin{aligned} 0\,\le \, \bigg (\sum _p (-1)^p\, x_p\bigg )^2 =\, \sum _p x_p^2 \,+\, 2\,\sum _{p<p'}(-1)^{p+p'}x_p\,x_{p'} , \end{aligned}$$

(57)

the following inequality holds true:

$$\begin{aligned} \sum _p x_p^2 \;\ge \, -2\,\sum _{p<p'}(-1)^{p+p'}x_p\,x_{p'} . \end{aligned}$$

(58)

Therefore:

$$\begin{aligned} \begin{aligned} \bigg (\sum _p x_p\bigg )^2&=\, \sum _p x_p^2 \,+\, 2\sum _{p<p'}x_p\,x_{p'} \\&\ge \, 2\,\sum _{p<p'}\Big (1-(-1)^{p+p'}\Big )\,x_p\,x_{p'} \\&\ge \, 4\,\sum _{p}x_p\,x_{p+1} . \end{aligned}\end{aligned}$$

(59)

As a trivial consequence, we have:

$$\begin{aligned} 4\,\sum _{p}b_p\,x_p\,x_{p+1} \,\le \, 4\,B\sum _{p}x_p\,x_{p+1} \,\le \, B\,\bigg (\sum _p x_p\bigg )^2. \end{aligned}$$

(60)

Now, all the previous inequalities are saturated if and only if the following conditions are fulfilled:

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\sum _{p\,\text {even}}x_p=\sum _{p\,\text {odd}}x_p \\ \, x_p\,x_{p'}=0 \quad \forall \,p,p':\,p+p'\,\text {odd},\, p'\ge p+3 \\ \, b_p=B \quad \forall \,p:\,x_p\,x_{p+1}\ne 0 \end{array}\right. } . \end{aligned}$$

(61)

It is easy to check that (61) is equivalent to (55) or (56), concluding the proof.

$\square $
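Inequality (54) and its equality cases (55), (56) are easy to probe numerically. The following sketch evaluates the gap $B\,S^2-4\sum_p b_p x_p x_{p+1}$ on random non-negative inputs and on two configurations that saturate the bound; the helper name and the sample sizes are illustrative assumptions.

```python
import random

def lemma_gap(x, b):
    """B * S^2 - 4 * sum_p b_p x_p x_{p+1}; Lemma 1 states that this is >= 0."""
    S = sum(x)
    B = max(b)
    return B * S ** 2 - 4.0 * sum(b[p] * x[p] * x[p + 1] for p in range(len(b)))

random.seed(2)
gaps = []
for _ in range(1000):
    P = random.randint(2, 8)
    x = [random.random() for _ in range(P)]
    b = [random.random() for _ in range(P - 1)]
    gaps.append(lemma_gap(x, b))

# Equality case (55): a middle entry carrying S/2, its two neighbours sharing S/2.
gap_three = lemma_gap([0.0, 1.0, 2.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0])
# Equality case (56): two adjacent entries carrying S/2 each, joined by the largest b_p.
gap_two = lemma_gap([0.0, 3.0, 3.0, 0.0], [0.5, 2.0, 0.1])
```

The two saturating configurations mirror the localised layer profiles (53) and (52) of Proposition 2, with $x_p$ in the role of $\lambda_p$.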

Proof (of Proposition 2) By Remark 4, $\rho (\beta ,\lambda )$ is the spectral radius of the matrix $M(\beta ,\lambda )\,$. Hence:

$$\begin{aligned} \rho (\beta ,\lambda ) \,\le \, \Vert M(\beta ,\lambda )^2 \Vert _\infty ^{1/2} \end{aligned}$$

(62)

and the square of the matrix (48) can be easily computed leading to

$$\begin{aligned} \begin{aligned} \Vert M(\beta ,\lambda )^2 \Vert _\infty \,&=\, 4\,\max _{p=1,\dots ,K} \sum _{p'=p-2}^{p+1} b_{p'}^{(p)}\, \lambda _{p'}\,\lambda _{p'+1} \\&\le \, \max _{p=1,\dots ,K-1} \beta _p^4 , \end{aligned} \end{aligned}$$

(63)

where for every $p=1,\dots ,K$, $p'=p-2,\dots ,p+1$ we set

$$\begin{aligned} b_{p'}^{(p)} \,\equiv \, \beta _{p-2}^2\,\beta _{p-1}^2\;\delta _{p-2,p'} +\, \beta _{p-1}^4\;\delta _{p-1,p'} +\, \beta _{p}^4\;\delta _{p,p'} +\, \beta _{p}^2\,\beta _{p+1}^2\;\delta _{p+1,p'} \end{aligned}$$

(64)

and for convenience we denote $\lambda _{p}\equiv 0$ for $p\notin \{1,\dots ,K\}\,$ and $\beta _{p}\equiv 0$ for $p\notin \{1,\dots ,K-1\}$. The inequality in (63) follows by Lemma 1 since $\sum _p \lambda _p=1\,$.

Now, assume that $\rho (\beta ,\lambda )=\max _{p=1,\dots ,K-1}\beta _p^2\equiv {\hat{\beta }}^2$. In particular, the inequality in (63) must be saturated; namely, there exists $p\in \{1,\dots ,K\}$ such that

$$\begin{aligned} 4\sum _{p'=p-2}^{p+1}b_{p'}^{(p)}\,\lambda _{p'}\,\lambda _{p'+1} \,=\, {\hat{\beta }}^4 . \end{aligned}$$

(65)

Then, (52) or (53) follows from Lemma 1.

On the other hand, assume that condition (52) or (53) holds true. In order to prove that $\rho (\beta ,\lambda )={\hat{\beta }}^2$, it suffices to show that $x={\hat{\beta }}^2$ is a zero of the matching polynomial ${{\,\mathrm{\Delta }\,}}_K\!\big (x,t(\beta ,\lambda )\big )$, where the activities vector $t(\beta ,\lambda )$ is defined by (44). Now, condition (53) implies that

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_K\!\big ({\hat{\beta }}^2,\,t(\beta ,\lambda )\big ) \,=\, {\hat{\beta }}^{2K}\,\big (1-4\,\lambda _{p^*}\lambda _{p^*\!+1}\big ) \,=\, 0 \, \end{aligned}$$

(66)

while condition (52) implies that

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_K\!\big ({\hat{\beta }}^2,\,t(\beta ,\lambda )\big ) \,=\, {\hat{\beta }}^{2K}\,\big (1-4\,\lambda _{p^*\!-1}\lambda _{p^*}-4\,\lambda _{p^*}\lambda _{p^*\!+1}\big ) \,=\, 0 . \end{aligned}$$

(67)

This concludes the proof of Proposition 2 part (i). In order to prove part (ii), we observe that the matrix $M(\beta ,\lambda )$ has nonnegative entries; therefore, its spectral radius $\rho (\beta ,\lambda )$ is a non-decreasing function of its entries. $\square $
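The bound (62)–(63) can be illustrated on a small example. The sketch below is only a numerical check under stated assumptions: definition (48) lies outside this part of the text, so the entries of $M(\beta ,\lambda )$ are read off from the action (68) of $M$ on a vector, and the parameter values are arbitrary.

```python
def build_m(beta, lam):
    """Tridiagonal matrix with (M q)_p = 2*beta_{p-1}^2*lam_{p-1}*q_{p-1} + 2*beta_p^2*lam_{p+1}*q_{p+1}.

    beta holds the K-1 couplings between consecutive layers, lam the K layer sizes.
    """
    k = len(lam)
    m = [[0.0] * k for _ in range(k)]
    for p in range(k - 1):
        m[p][p + 1] = 2.0 * beta[p] ** 2 * lam[p + 1]
        m[p + 1][p] = 2.0 * beta[p] ** 2 * lam[p]
    return m

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def inf_norm(a):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

beta = [0.9, 1.3, 0.7]
beta_hat4 = max(beta) ** 4

# Generic layer sizes: ||M^2||_inf stays below max_p beta_p^4, as in (63).
m = build_m(beta, [0.1, 0.4, 0.3, 0.2])
bound_sq = inf_norm(matmul(m, m))

# All spins in the two layers joined by the strongest coupling: (63) is saturated.
m_sat = build_m(beta, [0.0, 0.5, 0.5, 0.0])
sat_sq = inf_norm(matmul(m_sat, m_sat))
```

Since $\rho \le \Vert M^2\Vert _\infty ^{1/2}$, the value `bound_sq ** 0.5` is an upper bound for the spectral radius in the generic case, while in the saturating case the bound equals ${\hat{\beta }}^2$.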

5 The Replica Symmetric Ansatz for the DBM

In this section, we derive a replica symmetric expression for the pressure of the DBM. We show that at zero magnetic field, the annealed region $A_K$ identified by Theorem 2 and Proposition 1 is the only region where the annealed solution is stable for the replica symmetric consistency equation. Finally, we prove the uniqueness of the solution of the replica symmetric consistency equation, under the hypothesis of Gaussian centred external fields.

Let $q=(q_p)_{p=1,\dots ,K}\in [0,1]^K\,$. Consider the matrices $M=M(\beta ,\lambda )$, $M_1=M_1(\beta ,\lambda )$ defined by (48), (5), respectively. For $p=1,\dots ,K$, we have

$$\begin{aligned} \big (M q\big )_p \,=\, 2\,q_{p-1}\,\lambda _{p-1}\,\beta _{p-1}^2 +\, 2\,\beta _p^2\,\lambda _{p+1}\,q_{p+1} \end{aligned}$$

(68)

where $\beta _0=\beta _K=\lambda _0=\lambda _{K+1}=q_0=q_{K+1}\equiv 0\,$ for convenience. We have

$$\begin{aligned} \frac{1}{2}\,q^T M_1\,q \,=\, \sum _{p=1}^{K-1}\lambda _p\,\beta _p^2\,\lambda _{p+1}\, q_p\,\,q_{p+1} . \end{aligned}$$

(69)

Definition 6

For every $q=(q_p)_{p=1,\dots ,K}\in [0,1]^K$, the replica symmetric functional of the DBM is

$$\begin{aligned} \begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}_{\varLambda _N}}\,}}(q) \equiv&\sum _{p=1}^{K} \lambda _p^{(N)}\; \mathbb {E}\log \cosh \left( z\,\sqrt{\big (M^{(N)}\,q\big )_p\,}+h^{(p)}\right) \,\\&+ \frac{1}{2}\,(1-q)^T\,M_1^{(N)}\,(1-q) \,+\, \log 2 \end{aligned} \end{aligned}$$

(70)

where z is a standard Gaussian random variable independent of h and $M^{(N)}\equiv M(\beta ,\lambda ^{(N)})\,$, $M_1^{(N)}\equiv M_1(\beta ,\lambda ^{(N)})$ are tridiagonal matrices defined by (48), (5), respectively. The limit of the functional as $N\rightarrow \infty $ is

$$\begin{aligned} \begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) \equiv&\sum _{p=1}^{K} \lambda _p\; \mathbb {E}\log \cosh \left( z\,\sqrt{\big (M q\big )_p\,}+h^{(p)}\right) \\&+\frac{1}{2}\,(1-q)^T\,M_1\,(1-q) \,+\, \log 2 \end{aligned} \end{aligned}$$

(71)

where $M=M(\beta ,\lambda )$ and $M_1=M_1(\beta ,\lambda )\,$.

Definition 6 is motivated by the following

Proposition 3

For every $q=(q_p)_{p=1,\dots ,K}\in [0,1]^K$

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,=\, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}_{\varLambda _N}}\,}}(q) \,-\, \frac{1}{2}\, \int _0^1 \Big \langle \big (q_{\varLambda _N}-q\big )^T M_1^{(N)} \big (q_{\varLambda _N}-q\big ) \Big \rangle _{N,t} \,dt \end{aligned}$$

(72)

where $q_{\varLambda _N}\equiv \big (q_{L_p}(\sigma ,\tau )\big )_{p=1,\dots ,K}$ and $\langle \,\cdot \,\rangle _{N,t}$ denotes the quenched Gibbs expectation associated with a suitable Hamiltonian.

Proof

Let $q\in [0,\infty )^K$. For every $p=1,\dots , K$, we consider a one-body model over the $N_p$ spin variables indexed by the layer $L_p$ at inverse temperature $\sqrt{(M^{(N)}q)_p}\,$ and external fields distributed as $h^{(p)}$. For $\sigma \in \{-1,1\}^N$ and $t\in [0,1]$, we define an interpolating Hamiltonian as follows:

$$\begin{aligned} {\mathcal {H}}_N(\sigma ,t) \,\equiv \, \sqrt{t}\; H_{\varLambda _N}(\sigma ) \,+\, \sum _{p=1}^K \,\sum _{i\in L_p} \left( z_i\, \sqrt{(1-t)\,(M^{(N)}q)_p}\,+h_i\right) \sigma _i \end{aligned}$$

(73)

where $z_i$, $i\in L_p$, $p=1,\dots , K$ are i.i.d. standard Gaussian random variables, independent also of $h_i$’s and $J_{ij}$’s. The interpolating pressure is

$$\begin{aligned} \varphi _N(t) \,\equiv \, \frac{1}{N}\,\mathbb {E}\,\log \,\sum _{\sigma } e^{-{\mathcal {H}}_{N}(\sigma ,t)} \ . \end{aligned}$$

(74)

Observe that the quenched pressure of the DBM and a convex combination of quenched pressures of one-body models are recovered for $t=1$, $t=0$, respectively:

$$\begin{aligned}&\displaystyle \varphi _{N}(1) \,=\, {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}, \end{aligned}$$

(75)

$$\begin{aligned}&\displaystyle \varphi _{N}(0) \,=\, \log 2 \,+\, \sum _{p=1}^K \lambda _p^{(N)}\; \mathbb {E}\log \cosh \left( z\,\sqrt{(M^{(N)}q)_p\,}+h^{(p)}\right) . \end{aligned}$$

(76)

Gaussian integration by parts leads to the following result:

$$\begin{aligned} \frac{d\varphi _N}{dt}\,(t) \,=\, \frac{1}{2}\,(1-q)^T M_1^{(N)} (1-q) \,-\, \frac{1}{2}\,\Big \langle \big (q_{\varLambda _N}-q\big )^T M_1^{(N)} \big (q_{\varLambda _N}-q\big ) \Big \rangle _{N,t}\nonumber \\ \end{aligned}$$

(77)

where $\langle \,\cdot \,\rangle _{N,t}$ denotes the quenched Gibbs expectation associated with the Hamiltonian $\mathcal H_N(\sigma ,t)+{\mathcal {H}}_N(\tau ,t)$. Therefore, (72) follows by (75), (76), (77) concluding the proof. $\square $

We say that the DBM is in the replica symmetric regime when there exists a stationary point $q^*$ of ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q)$ such that $\lim _{N\rightarrow \infty }{{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}= {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q^*)\,$.

Remark 5

$q=(q_p)_{p=1,\dots ,K}$ is a stationary point of ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}$ if and only if

$$\begin{aligned} M_1 \cdot \left( q_p-\mathbb {E}\tanh ^2\left( z\,\sqrt{(Mq)_p\,}+h^{(p)}\right) \right) _{p=1,\dots ,K} \,=\, 0 \end{aligned}$$

(78)

where the matrices $M=M(\beta ,\lambda )$, $M_1=M_1(\beta ,\lambda )$ are defined by (48), (5), respectively, and z is a standard Gaussian random variable independent of h. Indeed, Gaussian integration by parts allows one to compute $\frac{\partial }{\partial q_p}{{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}$ from definition (71).

Remark 6

For $h=0$, observe that $q=0$ is a solution of (78) and the replica symmetric functional computed at this stationary point equals the annealed pressure of the DBM:

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}\big (q=0;\,\beta ,\lambda ,h=0\big ) \,=\, {{\,\mathrm{p^{\mathrm{DBM-A}}}\,}}(\beta ,\lambda ) . \end{aligned}$$

(79)
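Remark 6 can be made concrete by evaluating (71) numerically. The sketch below is a minimal illustration, assuming deterministic external fields $h^{(p)}$ (a simplification made here, the definition allows random fields), a trapezoidal rule for the Gaussian expectation, and the expression (69) for the quadratic form. At $q=0$, $h=0$ the functional reduces to $\log 2+\sum _{p}\lambda _p\,\beta _p^2\,\lambda _{p+1}$.

```python
import math

def gauss_expect(f, half_width=10.0, n=2000):
    """Trapezoidal approximation of E[f(z)] for a standard Gaussian z."""
    dz = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def m_times(beta, lam, q):
    """(M q)_p as in (68), with out-of-range terms set to zero."""
    k = len(lam)
    out = []
    for p in range(k):
        left = 2.0 * beta[p - 1] ** 2 * lam[p - 1] * q[p - 1] if p > 0 else 0.0
        right = 2.0 * beta[p] ** 2 * lam[p + 1] * q[p + 1] if p < k - 1 else 0.0
        out.append(left + right)
    return out

def rs_functional(q, beta, lam, h):
    """Replica symmetric functional (71) for deterministic fields h (an assumption)."""
    mq = m_times(beta, lam, q)
    one_body = sum(
        lam[p] * gauss_expect(
            lambda z, s=mq[p], f=h[p]: math.log(math.cosh(z * math.sqrt(s) + f))
        )
        for p in range(len(lam))
    )
    # (1/2) (1-q)^T M_1 (1-q), expanded through (69)
    quad = sum(
        lam[p] * beta[p] ** 2 * lam[p + 1] * (1.0 - q[p]) * (1.0 - q[p + 1])
        for p in range(len(lam) - 1)
    )
    return one_body + quad + math.log(2.0)

beta = [0.6, 0.9]
lam = [0.3, 0.4, 0.3]
at_zero = rs_functional([0.0, 0.0, 0.0], beta, lam, [0.0, 0.0, 0.0])
annealed = math.log(2.0) + sum(lam[p] * beta[p] ** 2 * lam[p + 1] for p in range(2))
```

Here `at_zero` and `annealed` coincide, which is the content of (79) for this choice of parameters.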

Proposition 4

Set $F:[0,1]^K\rightarrow [0,1]^K$, $F_p(q) \,\equiv \, \mathbb {E}\tanh ^2\!\left( z\,\sqrt{(Mq)_p}\right) $ for every $p=1,\dots ,K$. The region of parameters $(\beta ,\lambda )$ such that the annealed solution $q=0$ is a stable solution of the replica symmetric consistency equation $q=F(q)$ coincides with the region $A_K$ introduced in Sect. 4. Precisely:

$$\begin{aligned} |x|<1\ \;\forall \,x\,\text {eigenvalue of }{{\,\mathrm{Jac}\,}}F\Big |_{q=0} \quad \Leftrightarrow \quad (\beta ,\lambda )\in A_K. \end{aligned}$$

(80)

Proof

Gaussian integration by parts allows one to compute the derivatives of F with respect to q, leading to

$$\begin{aligned} {{\,\mathrm{Jac}\,}}F\,\Big |_{q=0} \,=\, M . \end{aligned}$$

(81)

Therefore, (80) follows immediately by Proposition 1 and Remark 4. $\square $
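Identity (81) can be checked by finite differences. The following is a sketch under assumptions: the entries of $M$ are read off from (68), the Gaussian expectation is approximated by a trapezoidal rule, and the step `eps` and the parameter values are arbitrary choices.

```python
import math

def gauss_expect(f, half_width=10.0, n=2000):
    """Trapezoidal approximation of E[f(z)] for a standard Gaussian z."""
    dz = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def build_m(beta, lam):
    """Tridiagonal matrix whose action on q is given by (68)."""
    k = len(lam)
    m = [[0.0] * k for _ in range(k)]
    for p in range(k - 1):
        m[p][p + 1] = 2.0 * beta[p] ** 2 * lam[p + 1]
        m[p + 1][p] = 2.0 * beta[p] ** 2 * lam[p]
    return m

def f_map(q, m):
    """F_p(q) = E tanh^2(z * sqrt((M q)_p)) at zero external field."""
    out = []
    for row in m:
        s = sum(a * b for a, b in zip(row, q))
        out.append(gauss_expect(lambda z, s=s: math.tanh(z * math.sqrt(s)) ** 2))
    return out

beta = [0.9, 1.3]
lam = [0.2, 0.5, 0.3]
m = build_m(beta, lam)
k = len(lam)
eps = 1e-6

# Column j of the Jacobian at q = 0 by a forward difference (note F(0) = 0).
jac = [[0.0] * k for _ in range(k)]
for j in range(k):
    q = [eps if i == j else 0.0 for i in range(k)]
    col = f_map(q, m)
    for p in range(k):
        jac[p][j] = col[p] / eps

max_err = max(abs(jac[p][j] - m[p][j]) for p in range(k) for j in range(k))
```

The discrepancy `max_err` is of order `eps`, consistent with the expansion $\mathbb {E}\tanh ^2(z\sqrt{x})=x+O(x^2)$ as $x\rightarrow 0$.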

When the matrix $M_1$ is invertible, the replica symmetric Eq. (78) rewrites as:

$$\begin{aligned} q_p \,=\, \mathbb {E}\tanh ^2\left( z\,\sqrt{(Mq)_p\,}+h^{(p)}\right) \quad \forall \ p=1,\dots ,K . \end{aligned}$$

(82)

The problem of uniqueness of the solution of (82) has been proposed by Panchenko in [25] for the convex case (where M is replaced by a positive definite matrix) and solved in [9] for $K=2$. In the following, we prove the uniqueness for the deep case (our matrix M is highly non-definite) under the assumption of Gaussian centred external fields. Denote $T_K^+ \,\equiv \, \{ (\lambda _1,\dots ,\lambda _K) \in (0,1]^K \,|\, \sum _{p=1}^K\lambda _p=1 \}\,$.

Theorem 3

Let $h^{(p)}$, $p=1,\dots ,K$ be centred Gaussian variables with variance $v_p>0$, respectively. Let $\lambda \in T_K^+$ and $\beta \in \mathbb {R}_+^{K-1}$. The consistency Eq. (82), which rewrites as

$$\begin{aligned} q_p\,=\,\mathbb {E}\tanh ^2\left( z\,\sqrt{(M q)_p+v_p}\,\right) \qquad \forall \,p=1,\dots ,K \end{aligned}$$

(83)

with $M=M(\beta ,\lambda )$ defined in (48), has a unique solution.

The proof of Theorem 3 relies on the following

Lemma 2

Let h be a centred Gaussian variable with variance $v>0$. Let $\beta >0$. Then, equation

$$\begin{aligned} q\,=\,\mathbb {E}\tanh ^2\left( z\,\sqrt{2\,q\,\beta ^2+v}\,\right) \end{aligned}$$

(84)

has a unique solution that we denote by ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}(\beta ,v)>0\,$. The function ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}$ is strictly increasing with respect to both $\beta $ and v.

The uniqueness part in Lemma 2 is the well-known Latala–Guerra’s lemma [27]. The monotonicity part is based on a similar argument. Whereas the uniqueness property holds true for much more general choices of the external field h, we notice that the monotonicity property in $\beta $ is lost for deterministic (large enough) h.

Proof (Lemma 2) Set $f(q)\equiv q^{-1}\,\mathbb {E}\tanh ^2(z\,\sqrt{2\,q\,\beta ^2+v}\,)$ for $q>0$. To prove that (84) has a unique solution, it suffices to show that f is strictly decreasing. Now, taking the derivative of f (avoiding Gaussian integration by parts) leads to:

$$\begin{aligned} q^2\,\frac{df}{d q} \,=\, -\,\mathbb {E}\left[ \phi (y)\,\left( \phi (y)- y\,\phi '(y)\,\right) \right] -\, \frac{v}{2\,q\,\beta ^2+v}\;\mathbb {E}\left[ y\,\phi (y)\,\phi '(y)\right] \end{aligned}$$

(85)

where $\phi (y)\equiv \tanh y$ and $y\equiv z\,\sqrt{2\,q\,\beta ^2+v}\,$. Since $\phi $ is odd, strictly positive on $\mathbb {R}_+$, strictly increasing on $\mathbb {R}$ and strictly concave on $\mathbb {R}_+$, it follows that the functions inside each expectation in (85) are strictly positive for $y\ne 0\,$. In particular, observe that ${{\,\mathrm{sign}\,}}\phi (y)={{\,\mathrm{sign}\,}}y$ and that

$$\begin{aligned} \frac{d}{dy}\,\big (\phi (y)-y\,\phi '(y)\big ) = -y\,\phi ''(y) \,>\,0\ \ \Rightarrow \ \ {{\,\mathrm{sign}\,}}\big (\phi (y)-y\,\phi '(y)\big ) = {{\,\mathrm{sign}\,}}y .\nonumber \\ \end{aligned}$$

(86)

Therefore, $\frac{df}{dq}<0$, proving uniqueness of the solution of Eq. (84).

Now, let’s prove that the solution ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}$ is strictly increasing with respect to $\beta >0$. Taking the derivative with respect to $\beta ^2$ on both sides of (84) (avoiding integration by parts), one finds:

$$\begin{aligned} \frac{d{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}}{d\beta ^2} \,=\, \frac{\mathbb {E}\big [Y\phi (Y)\,\phi '(Y)\big ]}{2\,\beta ^2\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}+v}\; \left( 2\,\beta ^2\,\frac{d{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}}{d\beta ^2}+\,2{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\right) \end{aligned}$$

(87)

where $Y\equiv z\,\sqrt{2\beta ^2{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}+v}\,$. Reordering terms and replacing ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}$ by $\mathbb {E}\,\phi (Y)^2$ lead to:

$$\begin{aligned} \frac{d{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}}{d\beta ^2} \,=\, \frac{ \mathbb {E}\big [Y\phi (Y)\,\phi '(Y)\big ]\;2{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}}{v+2\,\beta ^2\,\mathbb {E}\big [\phi (Y)\,\left( \phi (Y)- Y\,\phi '(Y)\,\right) \big ] } \,>0. \end{aligned}$$

(88)

In a similar way, one can prove that ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}$ is strictly increasing with respect to v, indeed:

$$\begin{aligned} \frac{d}{d v}{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\,=\, \frac{\mathbb {E}\big [Y\phi (Y)\,\phi '(Y)\big ]}{v+2\,\beta ^2\,\mathbb {E}\big [\phi (Y)\,\left( \phi (Y)- Y\,\phi '(Y)\,\right) \big ] } \,>0. \end{aligned}$$

(89)

$\square $
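Lemma 2 suggests a simple numerical procedure. Since $f(q)=q^{-1}\,\mathbb {E}\tanh ^2(z\sqrt{2q\beta ^2+v})$ is strictly decreasing, the difference between the right-hand side of (84) and $q$ changes sign exactly once on $(0,1)$, so bisection applies. The sketch below assumes a trapezoidal rule for the Gaussian expectation; the function name `q_rs_sk` and the sample parameters are illustrative choices.

```python
import math

def gauss_expect(f, half_width=10.0, n=2000):
    """Trapezoidal approximation of E[f(z)] for a standard Gaussian z."""
    dz = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def rhs(q, beta, v):
    """Right-hand side of (84)."""
    return gauss_expect(lambda z: math.tanh(z * math.sqrt(2.0 * q * beta ** 2 + v)) ** 2)

def q_rs_sk(beta, v, tol=1e-12):
    """Solve (84) by bisection on [0, 1]; requires v > 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid, beta, v) > mid:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies to the left of mid
    return 0.5 * (lo + hi)

q_a = q_rs_sk(0.5, 0.3)
q_b = q_rs_sk(1.0, 0.3)   # larger beta, same v
q_c = q_rs_sk(1.0, 0.6)   # same beta, larger v
residual = abs(rhs(q_b, 1.0, 0.3) - q_b)
```

The three values are ordered as `q_a < q_b < q_c`, in agreement with the monotonicity in $\beta $ and $v$ stated in Lemma 2.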

Proof (Theorem 3) A key observation is that the system (83) is equivalent to the following:

$$\begin{aligned} {\left\{ \begin{array}{ll} \,q_p \,=\, \mathbb {E}\tanh ^{2}\left( z\,\sqrt{2\,q_p\,\theta _p(a)^2+v_p}\,\right) &{} p=1,\dots ,K\\ \,\lambda _p\,q_p\;a_p \,=\, \lambda _{p+1}\,q_{p+1} &{} p=1,\dots ,K-1 \end{array}\right. } \end{aligned}$$

(90)

where we have introduced the auxiliary variables $a_1,\dots ,a_{K-1}>0\,$. This can be easily checked by comparing definitions (16) and (68). By Lemma 2, the first line of (90) entails

$$\begin{aligned} q_p \,=\, {{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\!\big (\theta _p(a),v_p\big ) \quad \forall \,p=1,\dots ,K \end{aligned}$$

(91)

where ${{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}$ is uniquely defined and strictly increasing with respect to both arguments. On the other hand, the second line of (90) rewrites as

$$\begin{aligned} \lambda _1\,q_1\;\prod _{l=1}^p a_l \,=\, \lambda _{p+1}\,q_{p+1} \quad \forall \,p=1,\dots ,K-1 . \end{aligned}$$

(92)

Therefore, in order to prove the theorem it suffices to prove uniqueness of the solution $a\in \mathbb {R}_+^{K-1}$ of the following system:

$$\begin{aligned}&\lambda _1\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\!\big (\theta _1(a),v_1\big )\,\prod _{l=1}^{p} a_l\nonumber \\&\quad \,=\, \lambda _{p+1}\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\!\big (\theta _{p+1}(a),v_{p+1}\big ) \quad \forall \,p=1,\dots ,K-1 . \end{aligned}$$

(93)

It is convenient to set $Q_1(a_1)\,\equiv \, \lambda _1\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\big (\lambda _1\,\beta _1^2\,a_1,v_1\big )$ and for every $p\ge 2$

$$\begin{aligned} Q_p\bigg (\frac{1}{a_{p-1}},a_{p}\bigg ) \,\equiv \, \lambda _p\,{{\,\mathrm{q^{\mathrm{RS-SK}}}\,}}\bigg (\lambda _p\,\frac{\beta _{p-1}^2}{a_{p-1}}+\lambda _p\,\beta _p^2\,a_p,\,v_p\bigg ) . \end{aligned}$$

(94)

We are going to prove by induction on $p\ge 1$ that for any given $a_{p+1}\ge 0$, there exists a unique $a_p\,=\,a^*_p(a_{p+1})>0$ such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \;a_l = a_l^*(a_{l+1}) \quad \forall \;l=1,\dots ,p-1\\ \;Q_1(a_1)\;a_1\,\cdots \,a_{p-1}\,a_p \,=\, Q_{p+1}\bigg (\dfrac{1}{a_p},a_{p+1}\bigg ) \end{array}\right. } \end{aligned}$$

(95)

and moreover $a_p^*$ is strictly increasing with respect to $a_{p+1}\,$. The uniqueness of solution of (93) will follow immediately by stopping at $p=K-1$ and choosing $a_K=0\,$.

$\bullet $ Case $p=1$: given $a_2\ge 0$, let’s consider the equation

$$\begin{aligned} Q_1(a_1)\,a_1 \,=\, Q_2\bigg (\frac{1}{a_1},a_2\bigg ) . \end{aligned}$$

(96)

By Lemma 2, the left-hand side of (96) is a strictly increasing function of $a_1>0$ and takes all the values in the interval $(0,\infty )$, while the right-hand side is a decreasing function of $a_1>0$ and takes nonnegative values. Therefore, there exists a unique $a_1=a_1^*(a_2)>0$ solution of (96). Now, taking derivatives on both sides of (96) and using again Lemma 2, one finds:

$$\begin{aligned} \frac{d a_1^*}{d a_2} \,=\, \frac{\partial }{\partial a_2}Q_2\Big (\frac{1}{a_1},a_2\Big ) \, \Bigg [\frac{\partial }{\partial a_1}\big (Q_1(a_1)\,a_1\big ) - \frac{\partial }{\partial a_1}Q_2\Big (\frac{1}{a_1},a_2\Big ) \Bigg ]^{-1}_{|a_1=a_1^*(a_2)} >0\nonumber \\ \end{aligned}$$

(97)

Hence, $a_1^*$ is a strictly increasing function of $a_2\,$.

$\bullet $ Case $p>1$, inductive step from $p-1$ to $p$. Fix $a_{p+1}\ge 0\,$. By the inductive hypothesis, $a_1^*,\dots ,a_{p-1}^*$ are well defined and strictly increasing functions. Defining the composition $A_l^*\equiv a_l^*\circ \dots \circ a_{p-1}^*$ for every $l=1,\dots ,p-1$, Eq. (95) rewrites as:

$$\begin{aligned} \big (Q_1\circ A_1^*\big )(a_p) \, \prod _{l=1}^{p-1}\!A_l^*(a_p)\; a_p \,=\, Q_{p+1}\bigg (\frac{1}{a_p},a_{p+1}\bigg ) . \end{aligned}$$

(98)

By the inductive hypothesis and Lemma 2, the left-hand side of (98) is a strictly increasing function of $a_p>0$ and takes all the values in the interval $(0,\infty )$, while the right-hand side of (98) is a decreasing function of $a_p>0$ and takes nonnegative values. Therefore, for every $a_{p+1}\ge 0$ there exists a unique $a_p=a_p^*(a_{p+1})>0$ solution of (98). Now, taking derivatives on both sides of (98) one finds:

$$\begin{aligned} \begin{aligned} \frac{d a_p^*}{d a_{p+1}} =&\; \frac{\partial }{\partial a_{p+1}}Q_{p+1}\Big (\frac{1}{a_p},a_{p+1}\Big )\; \cdot \\&\cdot \Bigg [\frac{\partial }{\partial a_p}\bigg (\!\big (Q_1\circ A_1^*\big )(a_p) \prod _{l=1}^{p-1}A_l^*(a_p)\,a_p\bigg ) - \frac{\partial }{\partial a_p}Q_{p+1}\Big (\frac{1}{a_p},a_{p+1}\Big ) \Bigg ]^{-1}_{|a_p=a_p^*(a_{p+1})} \end{aligned}\nonumber \\ \end{aligned}$$

(99)

which, using again the inductive hypothesis and Lemma 2, entails that $a_p^*$ is a strictly increasing function of $a_{p+1}\,$. $\square $
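Theorem 3 can be illustrated numerically. The sketch below does not follow the inductive argument of the proof: it relies instead on the observation that the map on the right-hand side of (83) is monotone in $q$ (the entries of $M$ are nonnegative and $s\mapsto \mathbb {E}\tanh ^2(z\sqrt{s})$ is increasing), so that iterating from $q=0$ and from $q=1$ produces an increasing and a decreasing sequence, respectively. Uniqueness implies that the two limits coincide. The trapezoidal quadrature, the tolerances and the parameter values are assumptions of this example.

```python
import math

def gauss_expect(f, half_width=10.0, n=2000):
    """Trapezoidal approximation of E[f(z)] for a standard Gaussian z."""
    dz = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def m_times(beta, lam, q):
    """(M q)_p as in (68), with out-of-range terms set to zero."""
    k = len(lam)
    out = []
    for p in range(k):
        left = 2.0 * beta[p - 1] ** 2 * lam[p - 1] * q[p - 1] if p > 0 else 0.0
        right = 2.0 * beta[p] ** 2 * lam[p + 1] * q[p + 1] if p < k - 1 else 0.0
        out.append(left + right)
    return out

def rs_step(q, beta, lam, v):
    """One application of the right-hand side of (83)."""
    mq = m_times(beta, lam, q)
    return [
        gauss_expect(lambda z, s=mq[p] + v[p]: math.tanh(z * math.sqrt(s)) ** 2)
        for p in range(len(q))
    ]

def iterate(q, beta, lam, v, tol=1e-12, max_iter=500):
    """Iterate rs_step until successive iterates differ by less than tol."""
    for _ in range(max_iter):
        new = rs_step(q, beta, lam, v)
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new
        q = new
    return q

beta = [0.8, 1.1]
lam = [0.3, 0.4, 0.3]
v = [0.2, 0.1, 0.3]
from_below = iterate([0.0, 0.0, 0.0], beta, lam, v)
from_above = iterate([1.0, 1.0, 1.0], beta, lam, v)
spread = max(abs(a - b) for a, b in zip(from_below, from_above))
```

For these parameters the two iterations agree up to numerical precision, as expected from the uniqueness of the solution of (83).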

6 A Replica Symmetric Bound for the DBM

In this section, a lower bound for the quenched pressure of the DBM in terms of the replica symmetric functional is provided in a suitable region of the parameters $\beta ,\,\lambda ,\,h$. For centred Gaussian external fields, this region is defined through a system of K inequalities which mimic the Almeida–Thouless condition for the SK model.

By Theorem 1, we can investigate the replica symmetric regime of the DBM relying on the established results for the replica symmetric regime of the SK model. Denote by ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}$ the replica symmetric functional of an SK model, namely for every $q\in [0,1]$, $\beta >0$, h real random variable with $\mathbb {E}\,|h|<\infty $,

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}(q;\,\beta ,h) \,\equiv \, \mathbb {E}\log \cosh \left( z\,\sqrt{2\,q\,\beta ^2}\,+h\right) \,+\, \frac{\beta ^2}{2}\,(1-q)^2 \,+\, \log 2\qquad \end{aligned}$$

(100)

where z is a standard Gaussian random variable independent of h. Stationary points of ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}$ are identified by the consistency equation

$$\begin{aligned} q \,=\, \mathbb {E}\tanh ^2\left( z\,\sqrt{2\,q\,\beta ^2}\,+h\right) \end{aligned}$$

(101)

where z is a standard Gaussian r.v. independent of h. The celebrated Guerra’s bound [15] states in particular that

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h) \,\le \, \inf _q {{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}(q;\,\beta ,h) \end{aligned}$$

(102)

for every $\beta ,h$. Identifying the exact replica symmetric region of the SK model, where equality in (102) is achieved, is an open problem. A first result about the replica symmetric region of the DBM under general (but implicit) conditions is provided by the following

Theorem 4

For every $q\in [0,1]^K$, $a\in \mathbb {R}_+^{K-1}$ related by

$$\begin{aligned} \lambda _p\,q_p\;a_p \,=\, \lambda _{p+1}\,q_{p+1} \quad \forall \,p=1,\dots ,K-1 \end{aligned}$$

(103)

the following inequality holds true:

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h) \,\le \, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) . \end{aligned}$$

(104)

Moreover, if the parameters $\beta ,\,\lambda ,\,h$ are such that there exist $q,\,a$ related by (103) and verifying

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}\big (\theta _p(a),h^{(p)}\big )\,=\,{{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\big (q_p\,;\theta _p(a),h^{(p)}\big ) \quad \forall \,p=1,\dots ,K , \end{aligned}$$

(105)

then equality is achieved in (104) and as a consequence

$$\begin{aligned} \liminf _{N\rightarrow \infty } {{\,\mathrm{p^{\mathrm{DBM}}_{\varLambda _N}}\,}}\,\ge \, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) . \end{aligned}$$

(106)

Proof

Since $q,\,a$ are related by (103), it is straightforward to verify that

$$\begin{aligned} 2\,q_p\,\theta _p(a)^2 \,=\, (Mq)_p \quad \forall \,p=1,\dots ,K, \end{aligned}$$

(107)

By Guerra’s bound (102), substituting ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}$ for ${{\,\mathrm{p^{\mathrm{SK}}}\,}}$ in the right-hand side of expression (15) provides an upper bound to ${{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a)\,$. Now, using the expression (100) of ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}$, the relation (107) and comparing with the expression (71) of ${{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}$, bound (104) is finally proved.

Following the same computations, if (105) holds true, then

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h) \,=\, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) \end{aligned}$$

(108)

and bound (106) then follows by Theorem 1. $\square $
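The relation (107) between $q$, $a$ and $M$ is a matter of direct substitution, which the following sketch carries out numerically. Definition (16) of $\theta _p(a)$ lies outside this part of the text, so the code assumes the form $\theta _p(a)^2=\lambda _p\big (\beta _{p-1}^2/a_{p-1}+\beta _p^2\,a_p\big )$ suggested by (94), with the boundary terms set to zero. The values of $q$, $\beta $, $\lambda $ are arbitrary.

```python
def m_times(beta, lam, q):
    """(M q)_p as in (68), with out-of-range terms set to zero."""
    k = len(lam)
    out = []
    for p in range(k):
        left = 2.0 * beta[p - 1] ** 2 * lam[p - 1] * q[p - 1] if p > 0 else 0.0
        right = 2.0 * beta[p] ** 2 * lam[p + 1] * q[p + 1] if p < k - 1 else 0.0
        out.append(left + right)
    return out

def a_from_q(q, lam):
    """Auxiliary variables fixed by (103): lam_p * q_p * a_p = lam_{p+1} * q_{p+1}."""
    return [lam[p + 1] * q[p + 1] / (lam[p] * q[p]) for p in range(len(q) - 1)]

def theta_sq(a, beta, lam):
    """ASSUMED form of theta_p(a)^2: lam_p * (beta_{p-1}^2 / a_{p-1} + beta_p^2 * a_p)."""
    k = len(lam)
    out = []
    for p in range(k):
        left = beta[p - 1] ** 2 / a[p - 1] if p > 0 else 0.0
        right = beta[p] ** 2 * a[p] if p < k - 1 else 0.0
        out.append(lam[p] * (left + right))
    return out

beta = [0.7, 1.2, 0.5]
lam = [0.1, 0.4, 0.3, 0.2]
q = [0.35, 0.6, 0.2, 0.45]

a = a_from_q(q, lam)
lhs = [2.0 * q[p] * t for p, t in enumerate(theta_sq(a, beta, lam))]
rhs = m_times(beta, lam, q)
mismatch = max(abs(x - y) for x, y in zip(lhs, rhs))
```

Under the stated assumption on $\theta _p$, `mismatch` vanishes up to rounding, which is relation (107).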

More explicit conditions for achieving equality in (104) and having the replica symmetric bound (106) are based on the control of the replica symmetric region in the SK model. For example, it is known that equality in (102) is achieved for $\beta $ small enough. Precisely, in Theorem 1.4.10 of [27] Talagrand proves that for every h

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h) \,=\, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}(q;\,\beta ,h) \quad \text {if } \beta ^2<\frac{1}{8} \end{aligned}$$

(109)

where q is the unique solution of (101). (Notice the different parametrisation with respect to [27].)

Corollary 1

Let $\beta ,\,\lambda ,\,h$ be such that a solution q of the replica symmetric consistency Eq. (82) satisfies the inequalities

$$\begin{aligned} (Mq)_p < \frac{1}{4}\,q_p \quad \forall \,p=1,\dots ,K \end{aligned}$$

(110)

Then, the replica symmetric bound (106) holds true.

Proof

Let q be a solution of (82) satisfying (110). Let $a\in \mathbb {R}_+^{K-1}$ verifying (103), so that the relation (107) holds true. Then, (110) and (82) rewrite, respectively, as:

$$\begin{aligned} {\left\{ \begin{array}{ll} \,\theta _p(a)^2 \,<\, \dfrac{1}{8} \\ \,q_p \,=\, \mathbb {E}\tanh ^{2}\left( z\,\sqrt{2\,q_p\,\theta _p(a)^2}+h^{(p)}\,\right) \end{array}\right. }\end{aligned}$$

(111)

for every $p=1,\dots ,K\,$. By Talagrand’s result (109), this entails

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}\big (\theta _p(a),h^{(p)}\big )\,=\,{{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\big (q_p\,;\theta _p(a),h^{(p)}\big ) \end{aligned}$$

(112)

for every $p=1,\dots ,K\,$. Therefore, by Theorem 4,

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h) \,=\, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) \end{aligned}$$

(113)

and the bound (106) holds true. $\square $

A complete characterization of the SK replica symmetric region where equality is achieved in (102) is still missing (see nevertheless [16, 20, 27]). A necessary condition is the Almeida–Thouless condition [28]:

$$\begin{aligned} \beta ^2\;\mathbb {E}\cosh ^{-4}\left( z\,\sqrt{2\,q\,\beta ^2}\,+h\right) \,\le \, \frac{1}{2} \end{aligned}$$

(114)

where q is a solution of the consistency Eq. (101).

However, if h is a centred Gaussian random variable with variance $v>0$, it was recently proved [11] that the Almeida–Thouless condition is also sufficient to have equality in (102). Precisely:

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}(\beta ,h) = {{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}(q;\,\beta ,h) \ \Leftrightarrow \ {\left\{ \begin{array}{ll} \,\beta ^2\;\mathbb {E}\cosh ^{-4}\left( z\,\sqrt{2\,q\,\beta ^2+v}\,\right) \,\le \, \dfrac{1}{2} \\ \,q \text { is the (unique) solution of }(84) \end{array}\right. }.\nonumber \\ \end{aligned}$$

(115)

Corollary 2

Assume that $h^{(p)}$, $p=1,\dots ,K$, are centred Gaussian variables with variance $v_p>0$, respectively. Let $\beta ,\,\lambda ,\,v$ be such that the (unique) solution q of the replica symmetric consistency Eq. (83) satisfies the inequalities

$$\begin{aligned} (M q)_p\,\ \mathbb {E}\cosh ^{-4}\left( z\,\sqrt{(M q)_p+v_p}\,\right) \,\le \, q_p \qquad \forall \,p=1,\dots ,K . \end{aligned}$$

(116)

Then, the replica symmetric bound (106) holds true.

Proof

Let q be the unique solution of (83). Let $a\in \mathbb {R}_+^{K-1}$ verifying (103), so that the relation (107) holds true. Then, (116) and (83) rewrite, respectively, as:

$$\begin{aligned} {\left\{ \begin{array}{ll} \theta _p(a)^2\;\mathbb {E}\cosh ^{-4}\left( z\,\sqrt{2\,q_p\,\theta _p(a)^2+v_p}\,\right) \,\le \, \dfrac{1}{2} \\ q_p \,=\, \mathbb {E}\tanh ^{2}\left( z\,\sqrt{2\,q_p\,\theta _p(a)^2+v_p}\,\right) \end{array}\right. }\end{aligned}$$

(117)

for every $p=1,\dots ,K\,$. By Chen’s result (115), this entails

$$\begin{aligned} {{\,\mathrm{p^{\mathrm{SK}}}\,}}\big (\theta _p(a),h^{(p)}\big )\,=\,{{\,\mathrm{\mathcal {P}^{\mathrm{RS-SK}}}\,}}\big (q_p\,;\theta _p(a),h^{(p)}\big ) \end{aligned}$$

(118)

for every $p=1,\dots ,K\,$. Therefore, by Theorem 4,

$$\begin{aligned} {{\,\mathrm{\mathcal {P}^{\mathrm{DBM}}}\,}}(a;\,\beta ,\lambda ,h) \,=\, {{\,\mathrm{\mathcal {P}^{\mathrm{RS-DBM}}}\,}}(q;\,\beta ,\lambda ,h) \end{aligned}$$

(119)

and the bound (106) holds true. $\square $
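The hypothesis (116) of Corollary 2 can be tested numerically for given parameters. The sketch below is an illustration under assumptions: the solution of (83) is approximated by plain fixed-point iteration started at $q=0$, Gaussian expectations are computed with a trapezoidal rule, and the (small) couplings are chosen so that the condition is expected to hold.

```python
import math

def gauss_expect(f, half_width=10.0, n=2000):
    """Trapezoidal approximation of E[f(z)] for a standard Gaussian z."""
    dz = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2.0 * math.pi)

def m_times(beta, lam, q):
    """(M q)_p as in (68), with out-of-range terms set to zero."""
    k = len(lam)
    out = []
    for p in range(k):
        left = 2.0 * beta[p - 1] ** 2 * lam[p - 1] * q[p - 1] if p > 0 else 0.0
        right = 2.0 * beta[p] ** 2 * lam[p + 1] * q[p + 1] if p < k - 1 else 0.0
        out.append(left + right)
    return out

def solve_rs(beta, lam, v, tol=1e-12, max_iter=500):
    """Approximate the solution of (83) by fixed-point iteration from q = 0."""
    q = [0.0] * len(lam)
    for _ in range(max_iter):
        mq = m_times(beta, lam, q)
        new = [
            gauss_expect(lambda z, s=mq[p] + v[p]: math.tanh(z * math.sqrt(s)) ** 2)
            for p in range(len(q))
        ]
        if max(abs(a - b) for a, b in zip(new, q)) < tol:
            return new
        q = new
    return q

def at_lhs(q, beta, lam, v):
    """Left-hand side of (116) for every layer p."""
    mq = m_times(beta, lam, q)
    return [
        mq[p] * gauss_expect(lambda z, s=mq[p] + v[p]: math.cosh(z * math.sqrt(s)) ** -4)
        for p in range(len(q))
    ]

beta = [0.3, 0.4]
lam = [0.3, 0.4, 0.3]
v = [0.2, 0.1, 0.3]
q = solve_rs(beta, lam, v)
lhs = at_lhs(q, beta, lam, v)
condition_holds = all(lhs[p] <= q[p] for p in range(len(q)))
```

For these couplings `condition_holds` is true, so Corollary 2 applies and the replica symmetric bound (106) holds for this choice of parameters.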

References

1. Aizenman, M., Lebowitz, J.L., Ruelle, D.: Some rigorous results on the Sherrington–Kirkpatrick spin glass model. Commun. Math. Phys. 112, 3–20 (1987)
2. Alberici, D., Barra, A., Contucci, P., Mingione, E.: Annealing and replica symmetry in deep Boltzmann machines. J. Stat. Phys. 180, 665–677 (2020)
3. Auffinger, A., Chen, W.-K.: The Parisi formula has a unique minimizer. Commun. Math. Phys. 335, 1429–1444 (2015)
4. Auffinger, A., Chen, W.-K.: Free energy and complexity of spherical bipartite models. J. Stat. Phys. 157(1), 40–59 (2014)
5. Baik, J., Lee, J.O.: Free energy of bipartite spherical Sherrington–Kirkpatrick model. arXiv:1711.06364
6. Barbier, J., Macris, N., Miolane, L.: The layered structure of tensor estimation and its mutual information. In: 55th Annual Allerton Conference on Communication Control and Computing (2017)
7. Barra, A., Contucci, P., Mingione, E., Tantari, D.: Multi-species mean field spin glasses: rigorous results. Annales Henri Poincaré 16(3), 691–708 (2015)
8. Barra, A., Genovese, G., Guerra, F.: Equilibrium statistical mechanics of bipartite spin systems. J. Phys. A 44, 245002 (2011)
9. Bates, E., Sloman, L., Sohn, Y.: Replica symmetry breaking in multi-species Sherrington–Kirkpatrick model. J. Stat. Phys. 174, 333–350 (2019)
10. Chen, W.-K.: Phase transition in the spiked random tensor with Rademacher prior. Ann. Stat. 47(5), 2734–2756 (2019)
11. Chen, W.-K.: private communication (unpublished)
12. Contucci, P., Fedele, M.: Scaling limits for multispecies statistical mechanics mean-field models. J. Stat. Phys. 144(6), 1186–1205 (2011)
13. Contucci, P., Gallo, I.: Bipartite mean field spin systems. Existence and solution. Math. Phys. Electronic J. 14, 1–22 (2008)
14. Contucci, P., Giardinà, C.: Perspectives on Spin Glasses. Cambridge University Press, Cambridge (2013)
15. Guerra, F.: Broken replica symmetry bounds in the mean field spin glass model. Commun. Math. Phys. 233(1), 1–12 (2003)
16. Guerra, F., Toninelli, F.L.: Quadratic replica coupling in the Sherrington–Kirkpatrick mean field spin glass model. J. Math. Phys. 43, 3704 (2002)
17. Guerra, F., Toninelli, F.L.: The thermodynamic limit in mean field spin glass models. Commun. Math. Phys. 230(1), 71–79 (2002)
18. Heilmann, O.J., Lieb, E.H.: Theory of monomer-dimer systems. Commun. Math. Phys. 25(3), 190–232 (1972)
19. Heilmann, O.J., Lieb, E.H.: Monomers and dimers. Phys. Rev. Lett. 24, 1412–1414 (1970)
20. Jagannath, A., Tobasco, I.: Some properties of the phase diagram for mixed p-spin glasses. Probab. Theory Relat. Fields 167, 615–672 (2017)
21. Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond: An Introduction to the Replica Method and Its Applications. World Scientific, Singapore (1987)
22. Mourrat, J.-C.: Nonconvex interactions in mean-field spin glass. arXiv:2004.01679
23. Mourrat, J.-C.: Free energy upper bound for mean-field vector spin glasses. arXiv:2010.09114
24. Panchenko, D.: The Sherrington–Kirkpatrick model. Springer, Berlin (2013)
25. Panchenko, D.: The free energy in a multi-species Sherrington–Kirkpatrick model. Ann. Probab. 43(6), 3494–3513 (2015)
26. Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR 5, 448–455 (2009)
27. Talagrand, M.: Mean Field Models for Spin Glasses. Volume I: Basic Examples. Springer, Berlin (2011)
28. Toninelli, F.L.: About the Almeida–Thouless transition line in the Sherrington–Kirkpatrick mean field spin glass model. Europhys. Lett. 60(5), 764–767 (2002)


Acknowledgements

The authors thank Adriano Barra, Wei-Kuo Chen, Francesco Guerra and Daniele Tantari for interesting discussions. D.A. is grateful to Alberto Viscardi for his contribution to Proposition 2. P.C. was partially supported by PRIN project Statistical Mechanics and Complexity (2015K7KK8L). D.A. and E.M. were partially supported by Progetto Almaidea 2018.

Funding

Open Access funding provided by EPFL Lausanne.

Author information

Authors and Affiliations

Communication Theory Laboratory, E.P.F.L., Lausanne, Switzerland
Diego Alberici
Dipartimento di Matematica, Università di Bologna, Bologna, Italy
Pierluigi Contucci & Emanuele Mingione

Corresponding author

Correspondence to Diego Alberici.

Additional information

Communicated by Vieri Mastropietro.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Matching Polynomials

In this Appendix, we give some properties of the polynomials ${{\,\mathrm{\Delta }\,}}_p(x,t)$ introduced by Definition 5 and characterizing the annealed region of the DBM. In particular, we are interested in the location of the zeros of ${{\,\mathrm{\Delta }\,}}_p$, namely the points $x\in \mathbb {C}$ such that ${{\,\mathrm{\Delta }\,}}_p(x,t)=0\,$.

Theorem 5 and Corollary 3 are due to Heilmann and Lieb [18] and show that the zeros are real and have an interlacing property. Proposition 5 and Corollary 4, by using these results, contribute to the proof of Proposition 1 in Sect. 4. Precisely, we show that the zeros of ${{\,\mathrm{\Delta }\,}}_K$ lie in the interval $(-\rho ,\rho )$ if and only if all the polynomials ${{\,\mathrm{\Delta }\,}}_p$ for $p\le K$ are positive at $x=\rho \,$.

Theorem 5

(Heilmann-Lieb [18]) Let $t_p>0$ for all $p=1,\dots ,K-1\,$. Then, for every $p=1,\dots ,K$

(i)
the zeros of ${{\,\mathrm{\Delta }\,}}_p$ are real and simple;
(ii)
if $p\ge 1$, the zeros of ${{\,\mathrm{\Delta }\,}}_p$ “interlace” with those of ${{\,\mathrm{\Delta }\,}}_{p-1}$. Namely, denoting by $x_1^{(p-1)}<\dots <x_{p-1}^{(p-1)}$ the zeros of ${{\,\mathrm{\Delta }\,}}_{p-1}$ and by $x_1^{(p)}<\dots <x_{p}^{(p)}$ the zeros of ${{\,\mathrm{\Delta }\,}}_p$, we have:
$$\begin{aligned} x_1^{(p)} \,<\, x_1^{(p-1)} \,<\, x_2^{(p)} \,<\, x_2^{(p-1)} \,<\, \dots \,<\, x_{p-1}^{(p)} \,<\, x_{p-1}^{(p-1)} \,<\, x_p^{(p)}\ . \end{aligned}$$
(120)

Proof

The statement is trivially true for $p=0$ and $p=1$. Consider $p\ge 1$, assume the statement holds true for $p-1$ and p, and prove it for $p+1$. By induction hypothesis, the zeros of ${{\,\mathrm{\Delta }\,}}_{p}$ and those of ${{\,\mathrm{\Delta }\,}}_{p-1}$ are real and simple and they are interlaced; namely, (120) holds true.

Since the zeros of ${{\,\mathrm{\Delta }\,}}_{p-1}$ are simple, ${{\,\mathrm{\Delta }\,}}_{p-1}$ changes its sign exactly at each of the points $x_1^{(p-1)},\dots ,x_{p-1}^{(p-1)}$. By (120), it follows that ${{\,\mathrm{\Delta }\,}}_{p-1}$ has alternating signs at the points $x_1^{(p)},\dots ,x_{p}^{(p)}$. Therefore, ${{\,\mathrm{\Delta }\,}}_{p+1}$ also has alternating signs at the points $x_1^{(p)},\dots ,x_{p}^{(p)}\,$, since the recursion relation (40) gives

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_{p+1}\!\big (x_k^{(p)},t\big ) \,=\, -\,\underbrace{t_p}_{>0}\,{{\,\mathrm{\Delta }\,}}_{p-1}\!\big (x_k^{(p)},t\big ) \end{aligned}$$

(121)

for every $k=1,\dots ,p$. As a consequence, ${{\,\mathrm{\Delta }\,}}_{p+1}$ has (at least) one zero in each interval $\big (x_k^{(p)},\,x_{k+1}^{(p)}\big )$ for $k=1,\dots ,p-1$. Moreover, since ${{\,\mathrm{\Delta }\,}}_{p+1}$ and ${{\,\mathrm{\Delta }\,}}_{p-1}$ share the same sign as $x\rightarrow \infty $ and as $x\rightarrow -\infty \,$, (121) implies that ${{\,\mathrm{\Delta }\,}}_{p+1}$ has (at least) one zero in $\big (x_p^{(p)},\,\infty \big )$ and (at least) one zero in $\big (-\infty ,\,x_{1}^{(p)}\big )\,$. Since ${{\,\mathrm{\Delta }\,}}_{p+1}$ has exactly $p+1$ zeros counted with multiplicity, each of these $p+1$ disjoint intervals contains exactly one zero. Hence, the zeros of ${{\,\mathrm{\Delta }\,}}_{p+1}$ are real and simple, and they interlace with those of ${{\,\mathrm{\Delta }\,}}_{p}$. $\square $
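The sign-alternation argument of the proof can be traced numerically. The following is a minimal sketch, not part of the original paper. It assumes that the recursion (40), which is not restated in this Appendix, reads ${{\,\mathrm{\Delta }\,}}_0=1$, ${{\,\mathrm{\Delta }\,}}_1=x$, ${{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}$, a form consistent with (121). The function names and the sample values of $t$ are hypothetical. The zeros of ${{\,\mathrm{\Delta }\,}}_p$ are located by bisection inside the intervals delimited by the zeros of ${{\,\mathrm{\Delta }\,}}_{p-1}$, and the bisection step fails loudly if the expected sign change is absent.

```python
def delta_values(x, t):
    """Return [Delta_0(x), ..., Delta_K(x)], where K = len(t) + 1.

    ASSUMPTION: the recursion (40) is taken to be
    Delta_0 = 1, Delta_1 = x, Delta_{p+1} = x*Delta_p - t_p*Delta_{p-1}.
    """
    vals = [1.0, float(x)]
    for tp in t:  # t = [t_1, ..., t_{K-1}]
        vals.append(x * vals[-1] - tp * vals[-2])
    return vals


def bisect_zero(p, t, lo, hi, iters=200):
    """Bisection for a zero of Delta_p in (lo, hi); requires a sign change."""
    f_lo = delta_values(lo, t)[p]
    f_hi = delta_values(hi, t)[p]
    assert f_lo * f_hi < 0, "expected opposite signs at the bracket ends"
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = delta_values(mid, t)[p]
        if f_mid == 0:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def zeros_by_interlacing(t):
    """zeros[p] lists the zeros of Delta_p in increasing order, p = 0..K."""
    K = len(t) + 1
    zeros = [[]]  # Delta_0 = 1 has no zeros
    for p in range(1, K + 1):
        prev = zeros[p - 1]
        # Move right of the largest zero of Delta_{p-1} until Delta_p > 0.
        hi = (prev[-1] if prev else 0.0) + 1.0
        while delta_values(hi, t)[p] <= 0:
            hi *= 2.0
        # Brackets (-hi, x_1), (x_1, x_2), ..., (x_{p-1}, hi),
        # where x_1 < ... < x_{p-1} are the zeros of Delta_{p-1}.
        ends = [-hi] + prev + [hi]
        zeros.append([bisect_zero(p, t, a, b) for a, b in zip(ends, ends[1:])])
    return zeros


zeros = zeros_by_interlacing([1.0, 2.0, 0.5])  # hypothetical t_1, t_2, t_3
for p in range(2, len(zeros)):
    low, high = zeros[p - 1], zeros[p]
    # Strict interlacing (120): x_k^(p) < x_k^(p-1) < x_{k+1}^(p).
    assert all(high[k] < low[k] < high[k + 1] for k in range(len(low)))
```

For these sample values the recursion gives ${{\,\mathrm{\Delta }\,}}_3=x^3-3x$, so the zeros found at level $p=3$ should be $0$ and $\pm \sqrt{3}$ up to rounding.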

Theorem 5 can be extended to the case of nonnegative coefficients:

Corollary 3

(Heilmann-Lieb [18]) Let $t_p\ge 0$ for all $p=1,\dots ,K-1\,$. Then, for every $p=1,\dots ,K$

(i)
the zeros of ${{\,\mathrm{\Delta }\,}}_p$ are real;
(ii)
if $p\ge 1$, the zeros of ${{\,\mathrm{\Delta }\,}}_p$ “weakly interlace” with those of ${{\,\mathrm{\Delta }\,}}_{p-1}$. Namely, denoting by $x_1^{(p-1)}\le \dots \le x_{p-1}^{(p-1)}$ the zeros of ${{\,\mathrm{\Delta }\,}}_{p-1}$ and by $x_1^{(p)}\le \dots \le x_{p}^{(p)}$ the zeros of ${{\,\mathrm{\Delta }\,}}_p$ repeated according to their multiplicity, we have:
$$\begin{aligned} x_1^{(p)} \,\le \, x_1^{(p-1)} \,\le \, x_2^{(p)} \,\le \, x_2^{(p-1)} \,\le \, \dots \,\le \, x_{p-1}^{(p)} \,\le \, x_{p-1}^{(p-1)} \,\le \, x_p^{(p)}\ . \end{aligned}$$
(122)

Proof

It follows from Theorem 5 by continuity. $\square $
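A small example shows why only weak interlacing survives when some coefficient vanishes. The sketch below is an illustration under the assumption that the recursion (40) reads ${{\,\mathrm{\Delta }\,}}_0=1$, ${{\,\mathrm{\Delta }\,}}_1=x$, ${{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}$; the values $t_1=1$, $t_2=0$ are hypothetical. In this case ${{\,\mathrm{\Delta }\,}}_2=x^2-1$ and ${{\,\mathrm{\Delta }\,}}_3=x\,(x^2-1)$, so the two polynomials share the zeros $\pm 1$ and some inequalities in (122) become equalities.

```python
def delta_values(x, t):
    """Return [Delta_0(x), ..., Delta_K(x)], where K = len(t) + 1.

    ASSUMPTION: the recursion (40) is taken to be
    Delta_0 = 1, Delta_1 = x, Delta_{p+1} = x*Delta_p - t_p*Delta_{p-1}.
    """
    vals = [1.0, float(x)]
    for tp in t:  # t = [t_1, ..., t_{K-1}]
        vals.append(x * vals[-1] - tp * vals[-2])
    return vals


t = [1.0, 0.0]  # hypothetical coefficients with t_2 = 0
zeros_2 = [-1.0, 1.0]        # zeros of Delta_2 = x^2 - 1
zeros_3 = [-1.0, 0.0, 1.0]   # zeros of Delta_3 = x (x^2 - 1)

# Check that the listed points are indeed zeros of Delta_2 and Delta_3.
assert all(delta_values(z, t)[2] == 0 for z in zeros_2)
assert all(delta_values(z, t)[3] == 0 for z in zeros_3)

# Weak interlacing (122): the merged chain is non-decreasing, with ties.
chain = [zeros_3[0], zeros_2[0], zeros_3[1], zeros_2[1], zeros_3[2]]
assert chain == sorted(chain)
assert chain[0] == chain[1] and chain[3] == chain[4]  # equalities occur
```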

Remark 7

The zeros of ${{\,\mathrm{\Delta }\,}}_p$ are symmetric with respect to $x=0$. Indeed,

$$\begin{aligned} {{\,\mathrm{\Delta }\,}}_p(x,t) \,=\, (-1)^p\,{{\,\mathrm{\Delta }\,}}_p(-x,t) \end{aligned}$$

(123)

because both sides satisfy the same recursion relation (40) with the same initial conditions.

Proposition 5

Let $t_p>0$ for all $p=1,\dots ,K-1\,$. Then, for every $\rho >0$ the following are equivalent:

(i)
the zeros of ${{\,\mathrm{\Delta }\,}}_K$ are contained in $(-\rho ,\rho )\,$;
(ii)
the zeros of ${{\,\mathrm{\Delta }\,}}_p$ are contained in $(-\rho ,\rho )$ for every $p=1,\dots ,K\,$;
(iii)
${{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0$ for every $p\le K$ such that $p\equiv K \pmod 2\,$;
(iv)
${{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0$ for every $p=1,\dots ,K\,$.

Proof

i$\Rightarrow $ii. This is a consequence of Theorem 5.

ii$\Rightarrow $iii. Trivial since ${{\,\mathrm{\Delta }\,}}_p(x,t)\rightarrow \infty $ as $x\rightarrow \infty $ for every $p\ge 1\,$.

iii$\Rightarrow $iv. From the recursion relation (40), one sees that if ${{\,\mathrm{\Delta }\,}}_{p+1}(\rho ,t)>0$ and ${{\,\mathrm{\Delta }\,}}_{p-1}(\rho ,t)>0$ then also ${{\,\mathrm{\Delta }\,}}_{p}(\rho ,t)>0\,$.

iv$\Rightarrow $i. By contradiction, assume that ${{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0$ for every $p=1,\dots ,K$ and not all the zeros of ${{\,\mathrm{\Delta }\,}}_K$ are contained in $(-\rho ,\rho )$.

Claim: ${{\,\mathrm{\Delta }\,}}_p$ has at least two zeros in $(\rho ,\infty )$ for every $p=2,\dots ,K\,$.

We are going to prove the claim by induction. It will contradict the fact that ${{\,\mathrm{\Delta }\,}}_2$ has only one positive zero.

We start from $p=K$. Since not all the zeros of ${{\,\mathrm{\Delta }\,}}_K$ are contained in $(-\rho ,\rho )$, the symmetry of Remark 7 gives a zero of ${{\,\mathrm{\Delta }\,}}_K$ in $[\rho ,\infty )$, and since ${{\,\mathrm{\Delta }\,}}_K(\rho ,t)>0$ by hypothesis, this zero $x_0^{(K)}$ lies in $(\rho ,\infty )\,$. Theorem 5 guarantees that ${{\,\mathrm{\Delta }\,}}_K$ changes its sign at $x=x_0^{(K)}$ (because every zero is simple). On the other hand, we know that ${{\,\mathrm{\Delta }\,}}_K(x,t)\rightarrow \infty $ as $x\rightarrow \infty $. Therefore, ${{\,\mathrm{\Delta }\,}}_K$ has (at least) another zero $x_1^{(K)}\in (\rho ,\infty )\,$, $x_1^{(K)}\ne x_0^{(K)}\,$. This proves the claim for $p=K\,$.

Now, let $3\le p\le K$, assume the claim for p and prove it for $p-1\,$. By induction hypothesis, ${{\,\mathrm{\Delta }\,}}_p$ has two zeros $x_0^{(p)},x_1^{(p)}\in (\rho ,\infty )\,$, $x_1^{(p)}\ne x_0^{(p)}\,$. By Theorem 5, it follows that ${{\,\mathrm{\Delta }\,}}_{p-1}$ has a zero $x_0^{(p-1)}\in (\rho ,\infty )$ lying between them (interlacing of the zeros). Since by hypothesis ${{\,\mathrm{\Delta }\,}}_{p-1}(\rho ,t)>0$, since ${{\,\mathrm{\Delta }\,}}_{p-1}(x,t)\rightarrow \infty $ as $x\rightarrow \infty $, and since ${{\,\mathrm{\Delta }\,}}_{p-1}$ changes its sign at each of its (simple) zeros, the number of zeros of ${{\,\mathrm{\Delta }\,}}_{p-1}$ in $(\rho ,\infty )$ is even. It follows that ${{\,\mathrm{\Delta }\,}}_{p-1}$ has another zero $x_1^{(p-1)}\in (\rho ,\infty )\,$, $x_1^{(p-1)}\ne x_0^{(p-1)}$. $\square $
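The equivalences of Proposition 5 can be checked on a case where the zeros are known in closed form. The sketch below is an illustration under stated assumptions, not a construction taken from the paper. It assumes that the recursion (40) reads ${{\,\mathrm{\Delta }\,}}_0=1$, ${{\,\mathrm{\Delta }\,}}_1=x$, ${{\,\mathrm{\Delta }\,}}_{p+1}=x\,{{\,\mathrm{\Delta }\,}}_p-t_p\,{{\,\mathrm{\Delta }\,}}_{p-1}$ and takes the hypothetical choice $t_p=1$ for every p. Under this assumption, ${{\,\mathrm{\Delta }\,}}_p(x)=U_p(x/2)$, where $U_p$ is the Chebyshev polynomial of the second kind, so the zeros of ${{\,\mathrm{\Delta }\,}}_K$ are $2\cos \big (k\pi /(K+1)\big )$ for $k=1,\dots ,K$.

```python
import math


def delta_values(x, t):
    """Return [Delta_0(x), ..., Delta_K(x)], where K = len(t) + 1.

    ASSUMPTION: the recursion (40) is taken to be
    Delta_0 = 1, Delta_1 = x, Delta_{p+1} = x*Delta_p - t_p*Delta_{p-1}.
    """
    vals = [1.0, float(x)]
    for tp in t:  # t = [t_1, ..., t_{K-1}]
        vals.append(x * vals[-1] - tp * vals[-2])
    return vals


def criteria(rho, t):
    """Evaluate conditions (iii) and (iv) of Proposition 5 at x = rho."""
    K = len(t) + 1
    vals = delta_values(rho, t)
    cond_iii = all(vals[p] > 0 for p in range(1, K + 1) if (K - p) % 2 == 0)
    cond_iv = all(vals[p] > 0 for p in range(1, K + 1))
    return cond_iii, cond_iv


K = 5
t = [1.0] * (K - 1)  # hypothetical choice t_p = 1
# Closed-form zeros of Delta_K for t_p = 1 (Chebyshev, second kind).
zeros_K = [2.0 * math.cos(k * math.pi / (K + 1)) for k in range(1, K + 1)]


def zeros_inside(rho):
    """Condition (i): all zeros of Delta_K lie in (-rho, rho)."""
    return all(abs(z) < rho for z in zeros_K)


for rho in (0.5, 1.7, 1.8):
    cond_iii, cond_iv = criteria(rho, t)
    assert cond_iii == cond_iv == zeros_inside(rho)
```

For $K=5$ the largest zero is $\sqrt{3}$. At $\rho =0.5$ one finds ${{\,\mathrm{\Delta }\,}}_5(\rho ,t)>0$ while ${{\,\mathrm{\Delta }\,}}_3(\rho ,t)<0$, which illustrates why condition (iii) involves every p with the parity of K and not only $p=K$.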

Proposition 5 also extends to the case of nonnegative coefficients.

Corollary 4

Let $t_p\ge 0$ for all $p=1,\dots ,K-1\,$. Then, for every $\rho >0$ the following are equivalent:

(i)
the zeros of ${{\,\mathrm{\Delta }\,}}_K$ are contained in $(-\rho ,\rho )\,$;
(ii)
the zeros of ${{\,\mathrm{\Delta }\,}}_p$ are contained in $(-\rho ,\rho )$ for every $p=1,\dots ,K\,$;
(iii)
${{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0$ for every $p\le K$ such that $p\equiv K \pmod 2\,$;
(iv)
${{\,\mathrm{\Delta }\,}}_p(\rho ,t)>0$ for every $p=1,\dots ,K\,$.

Proof

Implications i$\Rightarrow $ii$\Rightarrow $iii$\Rightarrow $iv are proven as before. iv$\Rightarrow $i follows from Proposition 5 by continuity. $\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alberici, D., Contucci, P. & Mingione, E. Deep Boltzmann Machines: Rigorous Results at Arbitrary Depth. Ann. Henri Poincaré 22, 2619–2642 (2021). https://doi.org/10.1007/s00023-021-01027-2

Received: 04 September 2020
Accepted: 25 January 2021
Published: 22 February 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s00023-021-01027-2
