Deep Boltzmann Machines: Rigorous Results at Arbitrary Depth

A class of deep Boltzmann machines is considered in the simplified framework of a quenched system with Gaussian noise and independent entries. The quenched pressure of a K-layers spin glass model is studied allowing interactions only among consecutive layers. A lower bound for the pressure is found in terms of a convex combination of K Sherrington–Kirkpatrick models and used to study the annealed and replica symmetric regimes of the system. A map with a one-dimensional monomer–dimer system is identified and used to rigorously control the annealed region at arbitrary depth K with the methods introduced by Heilmann and Lieb. The compression of this high-noise region displays a remarkable phenomenon of localisation of the processing layers. Furthermore, a replica symmetric lower bound for the limiting quenched pressure of the model is obtained in a suitable region of the parameters and the replica symmetric pressure is proved to have a unique stationary point.


Introduction and results
The mean-field setting in Statistical Mechanics corresponds to the invariance of an $N$-particle system under the action of the permutation group. When this condition is weakened to permutation invariance within each set of a $K$-partition of the system, $\sum_{p=1}^{K} N_p = N$, a homogeneous model generalizes to its $K$-populated version. This generalization has been considered in spin systems for both non-random interactions, i.e. the Curie-Weiss model [13,12], and random interactions, i.e. the Sherrington-Kirkpatrick model [7,23]. For the first case a complete control of the thermodynamic properties has been reached for general values of the interaction parameters. In the random case, instead, only the so-called elliptic structure of the interactions is fully controlled, while the hyperbolic one is still not understood. We mention that the case $K = 2$ has already been solved in two particular frameworks characterized by replica symmetry: on the Nishimori line [6] or with spherical spins [4,5].
In this paper we continue the analysis started in [2,8] concerning a mean-field spin glass with pure hyperbolic structure of the interactions, i.e. a random version of deep Boltzmann machines [DBM] over $K$ layers [24]. The framework of [2] is generalized by dealing with a general number $K$ of layers and by allowing local (layer dependent) temperatures. A lower bound for the quenched pressure in terms of $K$ Sherrington-Kirkpatrick models [SK] coupled in temperature along a linear chain is obtained and used to study the annealed and replica symmetric regimes of the random DBM in the large volume limit.
Our first result is a control of the annealed region $A_K$ in terms of the largest zero of a matching polynomial which, up to a change of variable in the complex plane, is the partition function of a monomer-dimer system over the linear chain of length $K$ [18,19]. This region $A_K$ turns out to be exactly the one where the annealed solution $q = 0$ is stable for the replica symmetric consistency equation. The compression of the annealed region leads to a peculiar structure of the layers: in particular the extensive layers are localized along a chain of length two or three.
A replica symmetric lower bound for the quenched pressure is obtained in a suitable region of the parameters. In the case of Gaussian external fields this region is identified by a $K$-dimensional version of the Almeida-Thouless condition for SK. Within this framework the replica symmetric consistency equation is proved to have a unique solution on the whole space of parameters. It is important to mention that the uniqueness for the elliptic case [23] is still an open problem.
The paper is organised as follows. Section 2 introduces the model. In Section 3 we provide a lower bound for the quenched pressure of the DBM in terms of an interacting variational principle. In Section 4 we identify and study a region where the quenched and the annealed pressure of the DBM coincide. In Section 5 we derive the replica symmetric functional for the DBM and we study its stationary point(s). In Section 6 we provide a lower bound for the quenched pressure of the DBM in terms of the previous replica symmetric functional under suitable conditions on the parameters of the model. Appendix A contains properties of the zeros of the matching polynomials, which are useful to characterize the annealed region in Section 4 and are mainly due to Heilmann and Lieb [18].

Definitions
Consider $N$ spin variables $\sigma = (\sigma_i)_{i=1,\dots,N} \in \{-1,1\}^N$ arranged over $K$ layers $L_1,\dots,L_K$ of cardinality $N_1,\dots,N_K$ respectively, so that $\sum_{p=1}^{K} N_p = N$. Assume that the relative sizes of the layers converge in the large volume limit: $N_p/N \to \lambda_p \in [0,1]$ as $N \to \infty$, for every $p = 1,\dots,K$. We denote $\lambda = (\lambda_p)_{p=1,\dots,K}$. Clearly $\sum_{p=1}^{K} \lambda_p = 1$. Let $J_{ij}$ for $(i,j) \in L_p \times L_{p+1}$ and $p = 1,\dots,K-1$ be a family of i.i.d. standard Gaussian random variables coupling spins in two consecutive layers. We introduce a vector of positive inverse temperatures tuning the interactions among consecutive layers, $\beta = (\beta_p)_{p=1,\dots,K-1}$. Let $h_i$ for $i \in L_p$ and $p = 1,\dots,K$ be a family of independent real random variables, independent also of the $J_{ij}$'s, acting as external fields on the spins. Assume that $(h_i)_{i \in L_p}$ are i.i.d. copies of a random variable $h^{(p)}$ such that $\mathbb{E}|h^{(p)}| < \infty$. We denote $h = (h^{(p)})_{p=1,\dots,K}$.
and its quenched pressure density is where $\mathbb{E}$ denotes the expectation over all the couplings $J_{ij}$ and the external fields $h_i$.

A lower bound for the quenched pressure of the DBM
In this section we give an explicit bound for the quenched pressure of the $K$-layer DBM in terms of $K$ independent Sherrington-Kirkpatrick spin glasses [SK] [22,25,14]. Considering $N$ spin variables $\sigma_i$, $i = 1,\dots,N$, we recall that the Hamiltonian of the SK model is where $\tilde J_{ij}$, $i,j = 1,\dots,N$ is a family of i.i.d. standard Gaussian random couplings. Given two spin configurations $\sigma, \tau \in \{-1,1\}^N$, their overlap is and the covariance matrix of the Gaussian process Given an inverse temperature $\beta > 0$, the random partition function of the SK model is where $\tilde h_i$, $i = 1,\dots,N$ is a family of i.i.d. copies of a random variable $h$ such that $\mathbb{E}|h| < \infty$. The quenched pressure density of the SK model is where $\mathbb{E}$ denotes the expectation over all couplings $\tilde J_{ij}$ and fields $\tilde h_i$. The quenched pressure converges as $N \to \infty$ and many properties of its limit, which we denote by $p^{\mathrm{SK}}(\beta, h)$, have been investigated in the literature [21,17,15,25,22,3].
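As a small illustration of the overlap and of the covariance structure of the SK Hamiltonian, the following sketch checks a purely algebraic identity. It assumes the normalisation $H(\sigma) = -N^{-1/2}\sum_{i,j=1}^{N} \tilde J_{ij}\sigma_i\sigma_j$ and the overlap $q_N(\sigma,\tau) = N^{-1}\sum_i \sigma_i\tau_i$; both are assumptions made here for illustration, chosen to be consistent with the replica symmetric functional (6.1), and the function names are hypothetical.

```python
import numpy as np

def overlap(sigma, tau):
    """Overlap q_N(sigma, tau) = (1/N) * sum_i sigma_i * tau_i."""
    return float(np.mean(sigma * tau))

def sk_covariance(sigma, tau):
    """E[H(sigma) H(tau)] for the assumed Hamiltonian
    H(s) = -(1/sqrt(N)) * sum_{i,j} J_ij s_i s_j with i.i.d. N(0,1) couplings.
    Since E[J_ij J_kl] = delta_ik * delta_jl, only matching pairs survive."""
    n = sigma.size
    return float(np.sum(np.outer(sigma, sigma) * np.outer(tau, tau)) / n)

rng = np.random.default_rng(0)
N = 12
sigma = rng.choice([-1, 1], size=N)
tau = rng.choice([-1, 1], size=N)

q = overlap(sigma, tau)
cov = sk_covariance(sigma, tau)
print(q, cov, N * q**2)  # under the assumed normalisation, cov equals N * q^2
```

Under this assumed normalisation the covariance depends on the two configurations only through their overlap, which is the property the interpolation arguments rely on.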

For $a \in \mathbb{R}^{K-1}_+$, the functional $P^{\mathrm{DBM}}(a) = P^{\mathrm{DBM}}(a; \beta, \lambda, h)$ is defined as: (3.7) and the parameter $\theta_p(a) = \theta_p(a; \beta, \lambda) \ge 0$ is defined by:
Proof. We are going to prove the following lower bound at finite volume: (3.9) where $\theta^{(N)}_p \equiv \theta_p(a; \beta, \lambda^{(N)})$ and $a \in \mathbb{R}^{K-1}_+$ can be arbitrarily chosen. The lower bound (3.6) will follow immediately by letting $N \to \infty$, since $p^{\mathrm{SK}}_N(\beta, h)$ is convex with respect to $\beta$ and thus the convergence to $p^{\mathrm{SK}}$ is uniform on compact sets.
For every $p = 1,\dots,K$ let $H^{\mathrm{SK}}_{L_p}(s)$, $s \in \{-1,1\}^{L_p}$ be a Gaussian process representing the Hamiltonian of an SK model over the $N_p$ spin variables in the layer $L_p$. We assume that $H^{\mathrm{SK}}_{L_1},\dots,H^{\mathrm{SK}}_{L_K}$ are independent processes, also independent of the Hamiltonian $H_{\Lambda_N}$. For $\sigma \in \{-1,1\}^N$ and $t \in [0,1]$ we define an interpolating Hamiltonian as follows: where of course $\sigma_{L_p} \equiv (\sigma_i)_{i \in L_p}$. An interpolating quenched pressure is naturally defined as where and $\mathbb{E}$ denotes the expectation with respect to all the $J_{ij}$, $\tilde J_{ij}$ and $h_i$. The quenched pressure of the DBM and a convex combination of quenched pressures of SK models are recovered for $t = 1$ and $t = 0$ respectively: For every function Gaussian integration by parts leads to the following result: Now replacing the definition (3.8) of $\theta$ The claim (3.9) follows immediately from (3.13), (3.14), (3.17) and (3.18).
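The interpolation argument rests on Gaussian integration by parts, whose one-dimensional form is $\mathbb{E}[z f(z)] = \mathbb{E}[f'(z)]$ for a standard Gaussian $z$. The sketch below checks this identity numerically for the test function $f(z) = \tanh(cz)$ using Gauss-Hermite quadrature; the choice of $f$, the constant $c$ and the number of quadrature nodes are illustrative assumptions, not taken from the proof.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes and weights for the weight function exp(-x^2 / 2);
# dividing by sqrt(2*pi) turns the weights into a standard Gaussian average.
nodes, weights = hermegauss(80)
weights = weights / np.sqrt(2.0 * np.pi)

def gauss_mean(f):
    """Approximate E[f(z)] for z ~ N(0, 1)."""
    return float(np.sum(weights * f(nodes)))

c = 0.7  # illustrative constant
lhs = gauss_mean(lambda z: z * np.tanh(c * z))              # E[z f(z)]
rhs = gauss_mean(lambda z: c * (1.0 - np.tanh(c * z) ** 2))  # E[f'(z)]
print(lhs, rhs)
```

In the proof the same identity is applied coordinate-wise to the Gaussian couplings, which is what turns the $t$-derivative of the interpolating pressure into an expression in the overlaps.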

The annealed region of the DBM
In this section we consider the model in the absence of an external field ($h = 0$) and we identify a region where the quenched and the annealed pressure of the DBM coincide.
Definition 4.1. The annealed pressure of the DBM is It can be easily computed due to the Gaussian nature of the model: By concavity of the logarithm (Jensen's inequality), the annealed pressure is an upper bound for the quenched one: $\limsup_{N \to \infty} p^{\mathrm{DBM}}_{\Lambda_N} \le p^{\mathrm{DBM\text{-}A}}$ (4.3). The system is said to be in the annealed regime when the parameters $(\beta, \lambda)$ are such that $\lim_{N \to \infty} p^{\mathrm{DBM}}_{\Lambda_N} = p^{\mathrm{DBM\text{-}A}}$. By Theorem 3.1 we can investigate the annealed regime of the DBM relying on the established results for the annealed regime of the SK model. Let $p^{\mathrm{SK}}$ be the limiting quenched pressure of an SK model and let $p^{\mathrm{SK\text{-}A}} \equiv \lim_{N \to \infty} N^{-1} \log \mathbb{E} Z^{\mathrm{SK}}_N$ be its annealed version. Clearly: Equality is achieved in the so-called annealed region of the SK model [1,14,22,25]: Now consider the following system of inequalities: and the following region of parameters of the DBM: where
Proof. The lower bound (3.6) for the quenched pressure of the DBM rewrites as: (4.9) Thanks to (4.4) and (4.5), if $(\beta, \lambda) \in A_K$ then the supremum in (4.9) vanishes and lim inf This bound together with (4.3) concludes the proof.
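The inequality between quenched and annealed pressure can be observed directly on a very small system by exhaustive enumeration. The sketch below is only illustrative: the Hamiltonian $H(\sigma) = -N^{-1/2}\sum_{p}\beta_p\sum_{(i,j)\in L_p\times L_{p+1}} J_{ij}\sigma_i\sigma_j$ with zero external field is an assumed normalisation (the constant in (2.2) may differ), and the layer sizes, temperatures, sample count and function name are arbitrary choices.

```python
import itertools
import numpy as np

def dbm_log_partition(layer_sizes, betas, couplings):
    """log Z of a tiny layered model by brute-force enumeration.
    Assumed Hamiltonian (illustrative normalisation, zero external field):
    -H(sigma) = (1/sqrt(N)) * sum_p beta_p * sum_{i in L_p, j in L_{p+1}} J_ij s_i s_j
    """
    n = sum(layer_sizes)
    offsets = np.cumsum([0] + list(layer_sizes))
    configs = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    minus_h = np.zeros(len(configs))
    for p, (beta, J) in enumerate(zip(betas, couplings)):
        left = configs[:, offsets[p]:offsets[p + 1]]
        right = configs[:, offsets[p + 1]:offsets[p + 2]]
        minus_h += beta / np.sqrt(n) * np.einsum("ci,ij,cj->c", left, J, right)
    m = minus_h.max()  # log-sum-exp for numerical stability
    return m + np.log(np.sum(np.exp(minus_h - m)))

rng = np.random.default_rng(1)
layer_sizes = (3, 3, 2)  # K = 3 layers, N = 8 spins
betas = (0.8, 1.2)       # one inverse temperature per pair of consecutive layers
N = sum(layer_sizes)

log_z = np.array([
    dbm_log_partition(
        layer_sizes, betas,
        [rng.standard_normal((layer_sizes[p], layer_sizes[p + 1]))
         for p in range(len(betas))])
    for _ in range(200)
])

quenched = log_z.mean() / N                               # (1/N) * mean of log Z
m = log_z.max()
annealed = (m + np.log(np.mean(np.exp(log_z - m)))) / N   # (1/N) * log of mean Z
print(quenched, annealed)
```

Because Jensen's inequality also holds for the empirical average over the sampled disorder, the printed quenched value never exceeds the annealed one, whatever the seed.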
It is an open question whether $A_K$ is the full annealed region of the system. We will see that Proposition 5.2 suggests a positive answer. We are now interested in a more explicit characterization of $A_K$. We mention that such a characterization can be interesting for inference problems as suggested in [10]. It is convenient to introduce the following family of polynomials.
Definition 4.2. Let $x \in \mathbb{C}$ and $t = (t_p)_{p=1,\dots,K-1} \in [0,\infty)^{K-1}$. We define recursively These orthogonal polynomials have several characterizations and were studied by Heilmann and Lieb [18,19]. Some relevant properties can be found in Appendix A.
Indeed, using the Laplace expansion along the last row of the matrix, it is easy to verify that the determinant on the right-hand side of (4.20) satisfies the recursion relation (4.11). Now, since the zeros of $x \mapsto \Delta_K(x, t(\beta, \lambda))$ are all real and symmetric with respect to the origin (see Appendix A), the largest one is the spectral radius of $M(\beta, \lambda)$: The next Proposition exploits the result of Proposition 4.1 in order to study the role of the parameters $\beta$ and $\lambda$ in the annealed behaviour of the system.
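The link between the recursively defined polynomials and a tridiagonal determinant can be checked numerically. The sketch below assumes the standard three-term recurrence $\Delta_0 = 1$, $\Delta_1 = x$, $\Delta_{p+1}(x) = x\,\Delta_p(x) - t_p\,\Delta_{p-1}(x)$ for the matching polynomial of a weighted path, and a symmetric tridiagonal matrix with zero diagonal and off-diagonal entries $\sqrt{t_p}$. Both are assumptions standing in for (4.11) and (4.19), and the weights `t` are arbitrary illustrative values rather than $t(\beta, \lambda)$.

```python
import numpy as np

def matching_polynomial(t):
    """Coefficients (highest degree first) of Delta_K, K = len(t) + 1,
    from the assumed recurrence Delta_{p+1} = x * Delta_p - t_p * Delta_{p-1}."""
    prev = np.array([1.0])        # Delta_0 = 1
    curr = np.array([1.0, 0.0])   # Delta_1 = x
    for tp in t:
        nxt = np.append(curr, 0.0)   # multiply Delta_p by x
        nxt[2:] -= tp * prev         # subtract t_p * Delta_{p-1}
        prev, curr = curr, nxt
    return curr

def jacobi_matrix(t):
    """Symmetric tridiagonal matrix with zero diagonal and sqrt(t_p) off-diagonal."""
    off = np.sqrt(np.asarray(t, dtype=float))
    return np.diag(off, 1) + np.diag(off, -1)

t = [0.5, 1.5, 0.2, 0.9]  # illustrative weights on a chain of K = 5 sites
coeffs = matching_polynomial(t)
largest_zero = float(np.max(np.roots(coeffs).real))
spectral_radius = float(np.max(np.abs(np.linalg.eigvalsh(jacobi_matrix(t)))))
print(coeffs)
print(largest_zero, spectral_radius)
```

In this form the coefficient of $x^{K-2k}$ is, up to the sign $(-1)^k$, the total weight of the configurations with $k$ dimers on the chain, which is the monomer-dimer reading of the polynomial.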
Physically, ii) means that increasing the local temperatures pushes the system toward the annealed region. On the other hand, i) implies that if all the inverse temperatures satisfy $\beta_p < 1$ for $p = 1,\dots,K-1$, then the system is in the annealed regime for every choice of the form factors $\lambda$. Furthermore, if this is not the case, the system can be driven out of the region $A_K$ by localizing the positive density layers around the minimal temperature(s).
In order to prove Proposition 4.2 we need the following elementary (but useful) or there exists $p^* \in \{1,\dots,P-1\}$ such that the following inequality holds true: and the square of the matrix (4.19) can be easily computed leading to where for every $p = 1,\dots,K$, $p' = p-2,\dots,p+1$ we set $b^{(p)}$ This concludes the proof of Proposition 4.2 part i). In order to prove part ii), we observe that the matrix $M(\beta, \lambda)$ has non-negative entries, therefore its spectral radius $\rho(\beta, \lambda)$ is a non-decreasing function of its entries.
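The monotonicity used for part ii) is the Perron-Frobenius fact that the spectral radius of a non-negative matrix does not decrease when an entry is increased. A minimal numerical sketch follows; the $3 \times 3$ tridiagonal matrix and its entries are illustrative assumptions and are not the entries of $M(\beta, \lambda)$.

```python
import numpy as np

def spectral_radius(matrix):
    """Largest modulus among the eigenvalues of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(matrix))))

# Non-negative symmetric tridiagonal matrix with zero diagonal (illustrative values).
base = np.array([[0.0, 0.6, 0.0],
                 [0.6, 0.0, 1.1],
                 [0.0, 1.1, 0.0]])

# Increase one pair of symmetric entries, keeping everything non-negative.
bumped = base.copy()
bumped[0, 1] += 0.3
bumped[1, 0] += 0.3

rho_base = spectral_radius(base)
rho_bumped = spectral_radius(bumped)
print(rho_base, rho_bumped)
```

For this particular shape the eigenvalues are $0$ and $\pm\sqrt{a^2 + b^2}$, with $a, b$ the two off-diagonal entries, so the increase can also be verified by hand.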

The replica symmetric ansatz for the DBM
In this section we derive a replica symmetric expression for the pressure of the DBM. We show that at zero magnetic field the annealed region $A_K$ identified by Theorem 4.1 and Proposition 4.1 is the only region where the annealed solution is stable for the replica symmetric consistency equation. Finally we prove the uniqueness of the solution of the replica symmetric consistency equation, under the hypothesis of Gaussian centred external fields.
Proof. Let $q \in [0,\infty)^K$. For every $p = 1,\dots,K$ we consider a one-body model over the $N_p$ spin variables indexed by the layer $L_p$ at inverse temperature $(M^{(N)} q)_p$ and external fields distributed as $h^{(p)}$. For $\sigma \in \{-1,1\}^N$ and $t \in [0,1]$ we define an interpolating Hamiltonian as follows: where $z_i$, $i \in L_p$, $p = 1,\dots,K$ are i.i.d. standard Gaussian random variables, independent also of the $h_i$ and the $J_{ij}$. The interpolating pressure is (5.7) Observe that the quenched pressure of the DBM and a convex combination of quenched pressures of one-body models are recovered for $t = 1$ and $t = 0$ respectively: Gaussian integration by parts leads to the following result: where $\langle \cdot \rangle_{N,t}$ denotes the quenched Gibbs expectation associated to the Hamiltonian $H_N(\sigma, t) + H_N(\tau, t)$. Therefore (5.5) follows by (5.8), (5.9), (5.10), concluding the proof.
We say that the DBM is in the replica symmetric regime when there exists a stationary point $q^*$ of $P^{\mathrm{RS\text{-}DBM}}(q)$ such that $\lim_{N \to \infty} p^{\mathrm{DBM}}_{\Lambda_N} = P^{\mathrm{RS\text{-}DBM}}(q^*)$.
Remark 5.1. $q = (q_p)_{p=1,\dots,K}$ is a stationary point of $P^{\mathrm{RS\text{-}DBM}}$ if and only if where the matrices $M = M(\beta, \lambda)$, $M_1 = M_1(\beta, \lambda)$ are defined by (4.19), (2.5) respectively and $z$ is a standard Gaussian random variable independent of $h$. Indeed Gaussian integration by parts allows one to compute $\partial P^{\mathrm{RS\text{-}DBM}} / \partial q_p$ from definition (5.4).
Remark 5.2. For $h = 0$ observe that $q = 0$ is a solution of (5.11) and the replica symmetric functional computed at this stationary point equals the annealed pressure of the DBM: (5.12) for every $p = 1,\dots,K$. The region of parameters $(\beta, \lambda)$ such that the annealed solution $q = 0$ is a stable solution of the replica symmetric consistency equation $q = F(q)$ coincides with the region $A_K$ introduced in Section 4. Precisely:
Proof. Gaussian integration by parts allows one to compute the derivatives of $F$ with respect to $q$, leading to Jac $F$ When the matrix $M_1$ is invertible, the replica symmetric equation (5.11) rewrites as: The problem of uniqueness of the solution of (5.15) has been proposed by Panchenko in [23] for the convex case (where $M$ is replaced by a positive definite matrix) and solved in [9] for $K = 2$. In the following we prove the uniqueness for the deep case (our matrix $M$ is highly non-definite) under the assumption of Gaussian centred external fields. Denote . The consistency equation (5.15), which rewrites as with $M = M(\beta, \lambda)$ defined in (4.19), has a unique solution.
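The proof of Theorem 5.1 relies on the following
Lemma 5.1. Let $h$ be a centred Gaussian variable with variance $v > 0$. Let $\beta > 0$. Then the equation $q = \mathbb{E} \tanh^2\big(z \sqrt{2 q \beta^2 + v}\big)$ (5.17), with $z$ a standard Gaussian random variable, has a unique solution that we denote by $q^{\mathrm{RS\text{-}SK}}(\beta, v) > 0$. The function $q^{\mathrm{RS\text{-}SK}}$ is strictly increasing with respect to both $\beta$ and $v$.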
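The statement of Lemma 5.1 can be explored numerically. The sketch below solves the fixed-point equation $q = \mathbb{E}\tanh^2\big(z\sqrt{2q\beta^2 + v}\big)$, read off from the function $f$ used in the proof, by bisection on $[0, 1]$, with the Gaussian expectation approximated by Gauss-Hermite quadrature. The function names, the number of nodes and the parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature rule for averages over a standard Gaussian variable z.
_nodes, _weights = hermegauss(100)
_weights = _weights / np.sqrt(2.0 * np.pi)

def rhs(q, beta, v):
    """E tanh^2(z * sqrt(2 q beta^2 + v)) for z ~ N(0, 1)."""
    scale = np.sqrt(2.0 * q * beta**2 + v)
    return float(np.sum(_weights * np.tanh(_nodes * scale) ** 2))

def q_rs_sk(beta, v, tol=1e-12):
    """Solve q = rhs(q) by bisection. Since rhs(q)/q is strictly decreasing
    (Lemma 5.1), rhs(q) - q changes sign exactly once on (0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rhs(mid, beta, v) > mid:
            lo = mid   # still to the left of the fixed point
        else:
            hi = mid
    return 0.5 * (lo + hi)

q_base = q_rs_sk(beta=0.5, v=0.3)
q_more_beta = q_rs_sk(beta=1.0, v=0.3)
q_more_v = q_rs_sk(beta=0.5, v=0.6)
print(q_base, q_more_beta, q_more_v)
```

The three values illustrate the monotonicity claimed in the lemma: raising either $\beta$ or $v$ increases the solution.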
The uniqueness part in Lemma 5.1 is the well-known Latala-Guerra lemma [25]. The monotonicity part is based on a similar argument. Whereas the uniqueness property holds true for much more general choices of the external field $h$, we notice that the monotonicity property in $\beta$ is lost for deterministic (large enough) $h$.
Proof of Lemma 5.1. Set $f(q) \equiv q^{-1}\, \mathbb{E} \tanh^2\big(z \sqrt{2 q \beta^2 + v}\big)$ for $q > 0$. To prove that (5.17) has a unique solution it suffices to show that $f$ is strictly decreasing. Now taking the derivative of $f$ (avoiding Gaussian integration by parts) leads to: E y φ(y) φ′(y) (5.18) where $\varphi(y) \equiv \tanh y$ and $y \equiv z \sqrt{2 q \beta^2 + v}$. Since $\varphi$ is odd, strictly positive on $\mathbb{R}_+$, strictly increasing on $\mathbb{R}$ and strictly concave on $\mathbb{R}_+$, it follows that the functions inside each expectation in (5.18) are strictly positive for $y \neq 0$.
(5.19) Therefore $\frac{df}{dq} < 0$, proving uniqueness of the solution of equation (5.17). Now let us prove that the solution $q^{\mathrm{RS\text{-}SK}}$ is strictly increasing with respect to $\beta > 0$. Taking the derivative with respect to $\beta^2$ on both sides of (5.17) (avoiding integration by parts), one finds: where $Y \equiv z \sqrt{2 \beta^2 q^{\mathrm{RS\text{-}SK}} + v}$. Reordering terms and replacing $q^{\mathrm{RS\text{-}SK}}$ by $\mathbb{E}\, \varphi(Y)^2$ leads to: In a similar way one can prove that $q^{\mathrm{RS\text{-}SK}}$ is strictly increasing with respect to $v$, indeed: > 0 . (5.22)
Proof of Theorem 5.1. A key observation is that the system (5.16) is equivalent to the following: where we have introduced the auxiliary variables $a_1,\dots,a_{K-1} > 0$. This can be easily checked by comparing definitions (3.8) and (5.1). By Lemma 5.1, the first line of (5.23) entails where $q^{\mathrm{RS\text{-}SK}}$ is uniquely defined and strictly increasing with respect to both arguments. On the other hand the second line of (5.23) rewrites as Therefore in order to prove the Theorem it suffices to prove uniqueness of the solution $a \in \mathbb{R}^{K-1}_+$ of the following system: (5.27) We are going to prove by induction on $p \ge 1$ that for any given $a_{p+1} \ge 0$ there exists a unique $a_p = a^*_p(a_{p+1}) > 0$ such that (5.28) and moreover $a^*_p$ is strictly increasing with respect to $a_{p+1}$. The uniqueness of the solution of (5.26) will follow immediately by stopping at $p = K - 1$ and choosing $a_K = 0$.
• Case $p = 1$: given $a_2 \ge 0$, let us consider the equation (5.29) By Lemma 5.1 the left-hand side of (5.29) is a strictly increasing function of $a_1 > 0$ and takes all the values in the interval $(0, \infty)$, while the right-hand side is a decreasing function of $a_1 > 0$ and takes non-negative values. Therefore there exists a unique solution $a_1 = a^*_1(a_2) > 0$ of (5.29). Now taking derivatives on both sides of (5.29) and using again Lemma 5.1, one finds: (5.31) By the inductive hypothesis and Lemma 5.1, the left-hand side of (5.31) is a strictly increasing function of $a_p > 0$ and takes all the values in the interval $(0, \infty)$, while the right-hand side of (5.31) is a decreasing function of $a_p > 0$ and takes non-negative values. Therefore for every $a_{p+1} \ge 0$ there exists a unique solution $a_p = a^*_p(a_{p+1}) > 0$ of (5.31). Now taking derivatives on both sides of (5.31) one finds: (5.32) which, using again the inductive hypothesis and Lemma 5.1, entails that $a^*_p$ is a strictly increasing function of $a_{p+1}$.

A replica symmetric bound for the DBM
In this section a lower bound for the quenched pressure of the DBM in terms of the replica symmetric functional is provided in a suitable region of the parameters $\beta$, $\lambda$, $h$. For centred Gaussian external fields this region is defined through a system of $K$ inequalities which mimic the Almeida-Thouless condition for the SK model.
By Theorem 3.1 we can investigate the replica symmetric regime of the DBM relying on the established results for the replica symmetric regime of the SK model. Denote by $P^{\mathrm{RS\text{-}SK}}$ the replica symmetric functional of an SK model, namely for every $q \in [0,1]$, $\beta > 0$, and real random variable $h$ with $\mathbb{E}|h| < \infty$,
$P^{\mathrm{RS\text{-}SK}}(q; \beta, h) \equiv \mathbb{E} \log\cosh\big(z \sqrt{2 q \beta^2} + h\big) + \frac{\beta^2}{2}(1 - q)^2 + \log 2$ (6.1)
where $z$ is a standard Gaussian random variable independent of $h$. Stationary points of $P^{\mathrm{RS\text{-}SK}}$ are identified by the consistency equation
$q = \mathbb{E} \tanh^2\big(z \sqrt{2 q \beta^2} + h\big)$ (6.2)
where $z$ is a standard Gaussian random variable independent of $h$. The celebrated Guerra bound [15] states in particular that
$p^{\mathrm{SK}}(\beta, h) \le \inf_q P^{\mathrm{RS\text{-}SK}}(q; \beta, h)$ (6.3)
for every $\beta$, $h$. Identifying the exact replica symmetric region of the SK model, where equality in (6.3) is achieved, is an open problem. A first result about the replica symmetric region of the DBM under general (but implicit) conditions is provided by the following
Theorem 6.1. For every $q \in [0,1]^K$, $a \in \mathbb{R}^K_+$ related by
$\lambda_p\, q_p\, a_p = \lambda_{p+1}\, q_{p+1} \quad \forall\, p = 1,\dots,K-1$ (6.4)
the following inequality holds true:
$P^{\mathrm{DBM}}(a; \beta, \lambda, h) \le P^{\mathrm{RS\text{-}DBM}}(q; \beta, \lambda, h)$. (6.5)
Moreover, if the parameters $\beta$, $\lambda$, $h$ are such that there exist $q$, $a$ related by (6.4) and verifying
$p^{\mathrm{SK}}\big(\theta_p(a), h^{(p)}\big) = P^{\mathrm{RS\text{-}SK}}\big(q_p; \theta_p(a), h^{(p)}\big) \quad \forall\, p = 1,\dots,K$, (6.6)
then equality is achieved in (6.5) and as a consequence
$\liminf_{N \to \infty} p^{\mathrm{DBM}}_{\Lambda_N} \ge P^{\mathrm{RS\text{-}DBM}}(q; \beta, \lambda, h)$. (6.7)
for every $p = 1,\dots,K$. Therefore by Theorem 6.1,
$P^{\mathrm{DBM}}(a; \beta, \lambda, h) = P^{\mathrm{RS\text{-}DBM}}(q; \beta, \lambda, h)$ (6.14)
and the bound (6.7) holds true.
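The relation between the functional (6.1) and its consistency equation can be checked numerically for a centred Gaussian field of variance $v$, in which case $z\sqrt{2q\beta^2} + h$ has the same law as $z\sqrt{2q\beta^2 + v}$. The sketch below compares a finite-difference derivative of $P^{\mathrm{RS\text{-}SK}}$ in $q$ with the closed form $\beta^2\big(q - \mathbb{E}\tanh^2(z\sqrt{2q\beta^2+v})\big)$ obtained by Gaussian integration by parts. Function names, quadrature size and parameter values are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Quadrature rule for averages over a standard Gaussian variable z.
nodes, weights = hermegauss(100)
weights = weights / np.sqrt(2.0 * np.pi)

def log_cosh(x):
    """Numerically stable log(cosh(x))."""
    return np.logaddexp(x, -x) - np.log(2.0)

def p_rs_sk(q, beta, v):
    """Functional (6.1) for a centred Gaussian field h of variance v."""
    scale = np.sqrt(2.0 * q * beta**2 + v)
    gaussian_term = float(np.sum(weights * log_cosh(nodes * scale)))
    return gaussian_term + 0.5 * beta**2 * (1.0 - q) ** 2 + np.log(2.0)

def dp_dq_closed_form(q, beta, v):
    """beta^2 * (q - E tanh^2(z * sqrt(2 q beta^2 + v)))."""
    scale = np.sqrt(2.0 * q * beta**2 + v)
    mean_tanh2 = float(np.sum(weights * np.tanh(nodes * scale) ** 2))
    return beta**2 * (q - mean_tanh2)

q, beta, v, eps = 0.4, 0.9, 0.5, 1e-5
finite_diff = (p_rs_sk(q + eps, beta, v) - p_rs_sk(q - eps, beta, v)) / (2 * eps)
closed_form = dp_dq_closed_form(q, beta, v)
print(finite_diff, closed_form)
```

The derivative vanishes exactly when $q$ solves the consistency equation, and at $q = 0$, $v = 0$ the functional reduces to $\log 2 + \beta^2/2$.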
A complete characterization of the SK replica symmetric region where equality is achieved in (6.3) is still missing (see nevertheless [16,25,20]). A necessary condition is the Almeida-Thouless condition [26]: where $q$ is a solution of the consistency equation (6.2). However, if we take $h$ to be a centred Gaussian random variable with variance $v > 0$, it was recently proved [11] that the Almeida-Thouless condition is also sufficient to have equality in (6.3). Precisely: $p^{\mathrm{SK}}(\beta, h) = P^{\mathrm{RS\text{-}SK}}(q; \beta, h)$ ⇔ $q$ is the (unique) solution of (5.17).

Definition 2.3. The random partition function of the model introduced by Hamiltonian (2.2) is