Abstract
Auffinger and Chen (J Stat Phys 157:40–59, 2014) proved a variational formula for the free energy of the spherical bipartite spin glass in terms of a global minimum over the overlaps. We show that a different optimisation procedure leads to a saddle point, similar to the one achieved for models on the vertices of the hypercube.
1 Introduction
Let \(\sigma _{N}(dx)\) denote the uniform probability measure on \(S^N:=\{x\in \mathbb {R}^N\,:\,\Vert x\Vert ^2_2=N\}\), where \(\Vert x\Vert _2\) is the Euclidean norm. For \(x:=(x_1,\ldots ,x_{N_1})\in \mathbb {R}^{N_{1}}\) and \(y:=(y_1,\ldots ,y_{N_2})\in \mathbb {R}^{N_{2}}\) the bipartite spin glass is defined by the energy function
Here \(\{\xi _{ij}\}_{i\in [N_1],j\in [N_{2}]}\) are i.i.d. \({\mathcal {N}}(0,1)\) quenched random variables and we set \(N:=N_1+N_2\). The object of interest of this note is the free energy
in the limit in which \(N_1,N_2\rightarrow \infty \) with \(N_1/N\rightarrow \alpha \in (0,1)\). Here \(\beta \;\geqslant \;0\) is the inverse temperature, \(b_1,b_2\in \mathbb {R}\) are external fields and \((\cdot ,\cdot )\) denotes the Euclidean inner product. By concentration of Lipschitz functions of Gaussian random variables, one is reduced to studying the average free energy \(A_{N_1,N_2}(\beta ):=E[A_{N_1,N_2}(\beta ,\xi )]\), whose limit we denote by \(A(\alpha ,\beta )\).
Auffinger and Chen proved in [1] the following variational formula for \(A(\alpha ,\beta )\) for \(\beta \) small enough
(the normalisation in (1) leads to different constants w.r.t. [1]). The above formula was subsequently proved to hold in the whole range \(\beta \;\geqslant \;0\) in [2, 9]. Yet these proofs are indirect: in both cases one obtains a formula for the free energy and then verifies a posteriori that it coincides with (3) (analytically in [2], and numerically [14] in the case of [9]). We also mention that the results in [1] have recently been extended in [10, 11] for the complexity and in [5, 6] for the free energy.
The convex variational principle found by Auffinger and Chen appears to be in contrast with the \(\min \max \) characterisation given in [4, 7] for models on the vertices of the hypercube (see also [3] for the Hopfield model). The aim of this note is to show that the Auffinger and Chen formula can be equivalently expressed in terms of a \(\min \max \).
One disadvantage of the spherical prior is that the associated moment generating function
is not easy to compute. If h is random with i.i.d. \({\mathcal {N}}(b,q)\) components it is convenient to set
The so-called Crisanti–Sommers variational characterisation of it as \(N\rightarrow \infty \) reads as follows.
Lemma 1
Let \(b\in \mathbb {R}\), \(q>0\), \(h\in \mathbb {R}^N\) with i.i.d. \({\mathcal {N}}(b,q)\) components. Then
At the end of this note we give a simple proof of this statement, based on the method of [8, 9]. We first get a variational characterisation of the moment generating function of a Gaussian distribution (whose variance is Legendre conjugate to q) and then use concentration of measure.
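The content of Lemma 1 can be checked numerically in a simple special case. The following Python sketch is only illustrative and uses a normalisation that we assume, rather than read off from (7). It relies on two standard facts: if \(\Vert h\Vert_2^2/N\rightarrow s\), then by rotation invariance \(\frac1N\log\int\sigma_N(dx)e^{(h,x)}\) depends on h only through s and converges to \(\frac{\sqrt{1+4s}-1}{2}-\frac12\log\frac{1+\sqrt{1+4s}}{2}\) (a Laplace computation on the cosine m between x and h, whose density is proportional to \((1-m^2)^{(N-3)/2}\)); and one can check that the same number equals \(\inf_{\theta>0}\big[\frac{s}{2\theta}+\frac{\theta-1}{2}-\frac12\log\theta\big]\), i.e. the sphere can be traded for a Gaussian of precision \(\theta\). For h with i.i.d. \({\mathcal N}(b,q)\) entries one has \(s=b^2+q\). All function names are hypothetical.

```python
import math


def sphere_mgf_limit(s):
    """Closed-form large-N limit of (1/N) log int sigma_N(dx) exp((h, x)) when ||h||^2/N -> s."""
    d = math.sqrt(1.0 + 4.0 * s)
    return 0.5 * (d - 1.0) - 0.5 * math.log(0.5 * (1.0 + d))


def gaussian_objective(theta, s):
    """Assumed Gaussian-precision objective: s/(2 theta) + (theta - 1)/2 - log(theta)/2."""
    return s / (2.0 * theta) + 0.5 * (theta - 1.0) - 0.5 * math.log(theta)


def golden_section_min(f, a, b, tol=1e-10):
    """Return an approximate minimiser of a unimodal function f on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c  # minimiser lies in [a, d]; the old c becomes the new d
            c = b - invphi * (b - a)
        else:
            a, c = c, d  # minimiser lies in [c, b]; the old d becomes the new c
            d = a + invphi * (b - a)
    return 0.5 * (a + b)


def sphere_mgf_variational(s):
    """Minimise the (strictly convex) Gaussian objective over the precision theta > 0."""
    theta = golden_section_min(lambda t: gaussian_objective(t, s), 1e-6, 10.0 + 10.0 * s)
    return gaussian_objective(theta, s)


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sphere_mgf_finite_n(s, n, grid=100_000):
    """Finite-n value (1/n) log E[exp(n sqrt(s) m)], where m is the cosine between a uniform
    point of the sphere in R^n and a fixed direction (density prop. to (1 - m^2)^((n-3)/2))."""
    step = 2.0 / grid
    ms = [-1.0 + (k + 0.5) * step for k in range(grid)]  # midpoints avoid m = +-1
    log_w = [0.5 * (n - 3) * math.log(1.0 - m * m) for m in ms]
    log_num = [n * math.sqrt(s) * m + lw for m, lw in zip(ms, log_w)]
    # the grid step cancels in the ratio numerator / normalisation
    return (_logsumexp(log_num) - _logsumexp(log_w)) / n


if __name__ == "__main__":
    for s in (0.5, 1.5, 4.0):
        print(s, sphere_mgf_limit(s), sphere_mgf_variational(s), sphere_mgf_finite_n(s, 4000))
```

The finite-n column approaches the other two at rate roughly \(1/n\); the golden-section search is used only to keep the sketch dependency-free.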
A direct computation shows that the minimum of (7) is attained for
A standard replica symmetric interpolation gives that for any \(q_1,q_2\in [0,1]\)
The last summand is an error term whose specific form is not important here. What matters is that by [1, Lemma 1] there is a choice of \((q_1,q_2)\) (see below) for which this remainder goes to zero as \(N\rightarrow \infty \) if \(\beta \) is small enough. Combining (7) and (8) we can rewrite the first line of (9) as
under the condition
Here we used that there is a sequence \(o_N\rightarrow 0\) uniformly in \(q_1,q_2,\beta ,\alpha \) such that
Indeed, (12) follows easily by applying Lemma 1 to the limit of the functions \(\Gamma _N\) and noting that (11) are the critical point equations associated with the minimisation of (7).
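To illustrate the kind of critical point computation involved, consider the model functional \(F_s\) below. We take it as an assumption: it is the Gaussian-precision form of the spherical moment generating function in one possible normalisation and need not coincide term by term with (7); here \(s=b^2+q\) plays the role of \(\lim_N\Vert h\Vert_2^2/N\) for h with i.i.d. \({\mathcal N}(b,q)\) entries. The functional is strictly convex in \(\theta>0\) and its critical point equation has an explicit solution:

```latex
\begin{aligned}
F_s(\theta)&:=\frac{s}{2\theta}+\frac{\theta-1}{2}-\frac12\log\theta,
  \qquad F_s''(\theta)=\frac{s}{\theta^3}+\frac{1}{2\theta^2}>0,\\
F_s'(\theta)=0&\iff -\frac{s}{2\theta^2}+\frac12-\frac{1}{2\theta}=0
  \iff \theta^2-\theta-s=0
  \iff \theta^*=\frac{1+\sqrt{1+4s}}{2},\\
F_s(\theta^*)&=\theta^*-1-\frac12\log\theta^*
  =\frac{\sqrt{1+4s}-1}{2}-\frac12\log\frac{1+\sqrt{1+4s}}{2}.
\end{aligned}
```

In the last line we used \(s/\theta^*=\theta^*-1\), which is the critical point equation itself.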
The main observation of this note is that (10) under (11) is optimised as a \(\min \max \).
Proposition 1
Assume \(b_1^2+b_2^2>0\). The function \({{\,\mathrm{RS}\,}}(q_1,q_2)\) has a unique stationary point \((\bar{q}_1, \bar{q}_2)\). It solves
Moreover
If \(b_1=b_2=0\) and
the origin is the unique solution of (13) and
If \(b_1=b_2=0\) and
there is a unique \((\bar{q}_1, \bar{q}_2)\ne (0,0)\) which solves (13) and such that (14) holds. Moreover
For our purposes, the crucial point of [1, Lemma 1] is that the Latala argument [13, Sect. 1.4] implies that, as \(N\rightarrow \infty \), the overlaps self-average at a point \((\tilde{q}_1, \tilde{q}_2)\) uniquely given by
which (see [12, Lemma 7]) are indeed asymptotically equivalent to
naturally arising from the replica symmetric interpolation (here h is random with i.i.d. \(\mathcal N(0,1)\) entries). Comparing (11) and (19) readily implies that we can plug \((r_1,r_2)=(q_1,q_2)\) into (10) and obtain the convex function \(P(q_1,q_2)\) of [1, Theorem 1], optimised by (19).
On the other hand, without using the Latala method one might still optimise (10) as a function of four variables, ignoring (11). Taking derivatives first in \(q_1,q_2\), the critical point equations (24), (25) below select exactly \((q_1,q_2)=(r_1,r_2)\). This procedure is, however, not justified a priori; this particular application of Latala’s method legitimises exchanging the order in which the q and the r variables are optimised for small \(\beta \), and a posteriori this can be extended to all \(\beta \) [2, 9].
We stress that the Latala method by itself is not variational: it only gives the self-consistent equations for the critical points. It is the Crisanti–Sommers formula (7) which makes it implicitly variational. Such a variational representation is not needed in other cases of interest, for instance the bipartite SK model (namely Hamiltonian (1) with \(\pm 1\) spins), for which the moment generating function is explicit and is simply given by a \(\log \cosh \). Indeed, in this case a direct use of the Latala method yields the validity of the \(\min \max \) formula of [4] for \(\beta \) and \(|b_1|,|b_2|\) small enough. The proof is essentially an exercise building on [13, Proposition 1.4.8] and [1, Formula (9)] and will not be reproduced here in detail. The replica symmetric sum rule for the free energy (the analogue of formula (9)) reads
(here \(g\sim {\mathcal N}(0,1)\)) and the error term can be shown by the Latala method to vanish for small \(\beta ,|b_1|, |b_2|\), if \((q_1,q_2)=(\bar{q}_1, \bar{q}_2)\) are given by
Therefore the free energy equals the first two lines on the r.h.s. of (22) evaluated at \((q_1,q_2)=(\bar{q}_1, \bar{q}_2)\), which is the value attained at the \(\min \max \), as shown in [4, 7].
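Replica symmetric equations of this kind are usually solved numerically by fixed-point iteration. The following Python sketch is an illustration only: the equations for \((\bar{q}_1,\bar{q}_2)\) are not reproduced here, so we assume that (1) carries the factor \(1/\sqrt N\) and use the standard cavity form for \(\pm1\) spins, namely \(q_1=E\tanh^2(\beta\sqrt{(1-\alpha)q_2}\,g+b_1)\) and \(q_2=E\tanh^2(\beta\sqrt{\alpha q_1}\,g+b_2)\) with \(g\sim{\mathcal N}(0,1)\). A different normalisation would only change the coefficients under the square roots. The function names and the damping parameter are hypothetical choices.

```python
import math


def gauss_expect(f, half_width=8.0, n=1601):
    """E[f(g)] for g ~ N(0, 1), by the trapezoidal rule on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        x = -half_width + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(x) * math.exp(-0.5 * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


def rs_fixed_point(beta, alpha, b1, b2, q1=0.5, q2=0.5, damping=0.5, tol=1e-10, max_iter=5000):
    """Damped fixed-point iteration for the (assumed) bipartite SK replica symmetric equations."""
    for _ in range(max_iter):
        sd1 = beta * math.sqrt((1.0 - alpha) * q2)  # std of the cavity field on the x-layer
        sd2 = beta * math.sqrt(alpha * q1)  # std of the cavity field on the y-layer
        t1 = gauss_expect(lambda g: math.tanh(sd1 * g + b1) ** 2)
        t2 = gauss_expect(lambda g: math.tanh(sd2 * g + b2) ** 2)
        # convex combination of old and new values keeps q1, q2 in [0, 1]
        new_q1 = damping * q1 + (1.0 - damping) * t1
        new_q2 = damping * q2 + (1.0 - damping) * t2
        if abs(new_q1 - q1) + abs(new_q2 - q2) < tol:
            return new_q1, new_q2
        q1, q2 = new_q1, new_q2
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Zero fields and small beta: the iteration is driven to the origin.
    print(rs_fixed_point(beta=0.5, alpha=0.5, b1=0.0, b2=0.0))
    # Non-zero fields: a fixed point strictly inside (0, 1)^2.
    print(rs_fixed_point(beta=0.5, alpha=0.4, b1=0.3, b2=-0.2))
```

For small \(\beta\) the map is a contraction, so the iteration converges from any starting point; for larger \(\beta\) several fixed points may coexist and the output depends on the initial condition.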
2 Proofs
Proof of Proposition 1
Assume first \(b_1^2+b^2_2>0\). We differentiate (10) and by (11) we get
The functions \(r_1,r_2\) can be written explicitly as
It is easy to see that \(r_1,r_2\) are concave and increase from \(r_1(0),r_2(0)>0\) (explicitly computable from the formulas above) to 1. Moreover, we record for later use that if \(b_1=b_2=0\) we have
Now we take the derivative w.r.t. \(q_1\) and note that the r.h.s. of (24) is decreasing as a function of \(q_1\), thus \(\partial ^2_{q_1} {{\,\mathrm{RS}\,}}<0\). Therefore, by the implicit function theorem, there is a unique function \(q_1=q_1(q_2)\) such that \(q_2=r_2(q_1)\). As a function of \(q_2\), \(q_1\) is non-negative, increasing and convex, and \(q_1(r_2(0))=0\). We set
and compute
By the properties of the functions \(q_1\) and \(r_1\) it is clear that there is a unique intersection point \(\bar{q}_2\); moreover \(q_1\;\leqslant \;r_1\) for \(q_2\;\leqslant \;\bar{q}_2\) and otherwise \(q_1\;\geqslant \;r_1\). Therefore \(\partial _{q_2}{{\,\mathrm{RS}\,}}_1(q_2)\) is increasing in a neighbourhood of \(\bar{q}_2\) which allows us to conclude \(\partial ^2_{q_2}{{\,\mathrm{RS}\,}}_1>0\). This finishes the proof if \(b_1^2+b_2^2>0\).
If \(b_1=b_2=0\) the origin is always a stationary point. It is unique if
which, bearing in mind (28), amounts to requiring (15).
Since \(r_2\) is increasing around the origin, we have \(\partial ^2_{q_1}{{\,\mathrm{RS}\,}}<0\), and by the implicit function theorem we locally define an increasing function \(q_1(q_2)\), positive for \(q_2>0\) and vanishing at the origin. We set
and compute
By (31) we have \(\partial ^2_{q_2}{{\,\mathrm{RS}\,}}_1\big |_{q_2=0}>0\,\), whence we obtain (16).
If (17) holds, then
which, proceeding as before, leads to (18).
However, also in the case \(b_1=b_2=0\) we can repeat all the steps done in the case \(b_1^2+b_2^2>0\), showing the existence of a point \((\bar{q}_1, \bar{q}_2)\) at which a \(\min \max \) of \({{\,\mathrm{RS}\,}}\) is attained. If (31) (i.e. (15)) holds, then necessarily \((\bar{q}_1, \bar{q}_2)=(0,0)\). If (17) holds, then (34) enforces
in a neighbourhood of the origin (as \(q_1(0)=r_1(0)=0\)), which implies that the critical point \((\bar{q}_1, \bar{q}_2)\) must fall elsewhere. \(\square \)
Proof of Lemma 1
We will prove that for all \(u\in \sqrt{q}S^N\)
We show first that (35) implies the assertion. Let h be a random vector with i.i.d. \(\mathcal N(0,q)\) entries. (As customary we write \(X\simeq Y\) if there are constants \(c,C>0\) such that \(cY\;\leqslant \;X\;\leqslant \;CY\)). The classical estimates
permit us to write for all \(t>0\) (small)
for some \(u^*\in \sqrt{q}S^{N}\) and \(o(t)\rightarrow 0\) as \(t\rightarrow 0\). Since \(t>0\) is arbitrary we obtain
It remains to show (35). Given \(\varepsilon >0\) we introduce the spherical shell
and the measure \(\sigma _N^{(\varepsilon )}\) as the uniform probability on it. For any \(\theta >0\) we have
Therefore for \(C>0\) large enough
Since this inequality holds for all \(\theta >0\) and \(\varepsilon >0\) we have
We set for brevity
and notice that \(\Gamma _1\) is uniformly convex on every interval \((0,\theta _0)\) with finite \(\theta _0>0\).
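We do not reproduce the expression of \(\Gamma_1\) here. As an illustration of a function that is uniformly convex on every bounded interval \((0,\theta_0)\) but not on \((0,\infty)\), consider the normalised log-partition function g of a centred Gaussian weight of precision \(\theta\), tilted by \(e^{\theta N/2}\) (an illustrative example, not claimed to be \(\Gamma_1\)):

```latex
g(\theta):=\frac{\theta}{2}-\frac12\log\theta
=\frac1N\log\int_{\mathbb R^N}e^{-\frac{\theta}{2}\left(\Vert x\Vert_2^2-N\right)}\frac{dx}{(2\pi)^{N/2}},
\qquad
g''(\theta)=\frac{1}{2\theta^2}\;\geqslant\;\frac{1}{2\theta_0^2}\quad\text{for }\theta\in(0,\theta_0).
```

Since \(g''(\theta)\rightarrow 0\) as \(\theta\rightarrow\infty\), the convexity modulus \(1/(2\theta_0^2)\) degenerates as \(\theta_0\rightarrow\infty\), so for this g uniform convexity holds on each bounded interval only.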
For the reverse bound, again we let \(\theta >0\) and write
The first summand on the r.h.s. can be written as before
For the second summand we introduce \(\eta \in (0,\frac{\theta }{2})\) and bound
Thus
with
Now we define
and we seek \(\bar{\theta }>0\) for which \(\Delta _{12},\Delta _{13}\;\geqslant \;0\) for sufficiently small \(\eta \). Since \(\Delta _{12}(0,\theta )=\Delta _{13}(0,\theta )=0\) it suffices to study
A direct computation shows
Combining (47), (48) and (49), and plugging \(\bar{\theta }=\arg \min \Gamma _1\) into (45), we arrive at
and changing variables via \(\theta =(1-r)^{-1}\) we obtain (35). \(\square \)
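The effect of the substitution \(\theta=(1-r)^{-1}\) can be seen on the model functional \(F_s(\theta)=\frac{s}{2\theta}+\frac{\theta-1}{2}-\frac12\log\theta\), which is convex since \(F_s''(\theta)=\frac{s}{\theta^3}+\frac{1}{2\theta^2}>0\); it is an illustrative stand-in, not the exact expression entering (35). Since \(1/\theta=1-r\), \(\theta-1=r/(1-r)\) and \(\log\theta=-\log(1-r)\), the half-line \(\theta\;\geqslant\;1\) is mapped onto \(r\in[0,1)\) and

```latex
\begin{aligned}
F_s\big((1-r)^{-1}\big)&=\frac12\Big[s(1-r)+\frac{r}{1-r}+\log(1-r)\Big],\qquad r\in[0,1),\\
\frac{d}{dr}F_s\big((1-r)^{-1}\big)&=\frac12\Big[-s+\frac{1}{(1-r)^2}-\frac{1}{1-r}\Big]
  =\frac12\Big[\frac{r}{(1-r)^2}-s\Big],\\
r^*&=\frac{\sqrt{1+4s}-1}{\sqrt{1+4s}+1},\qquad \frac{r^*}{(1-r^*)^2}=s.
\end{aligned}
```

Here \(r^*=1-1/\theta^*\) with \(\theta^*=\frac{1+\sqrt{1+4s}}{2}\;\geqslant\;1\) the minimiser of \(F_s\), so restricting to \(\theta\;\geqslant\;1\) loses nothing in this example.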
Data Availability
The preparation of this manuscript did not require any data or materials.
References
[1] Auffinger, A., Chen, W.-K.: Free energy and complexity of spherical bipartite models. J. Stat. Phys. 157, 40–59 (2014)
[2] Baik, J., Lee, J.O.: Free energy of bipartite spherical Sherrington–Kirkpatrick model. Ann. Inst. H. Poincaré Probab. Stat. 56(4), 2897–2934 (2020)
[3] Barra, A., Genovese, G., Guerra, F.: The replica symmetric behaviour of the analogical neural network. J. Stat. Phys. 142, 654 (2010)
[4] Barra, A., Genovese, G., Guerra, F.: Equilibrium statistical mechanics of bipartite spin systems. J. Phys. A 44, 245002 (2011)
[5] Bates, E., Sohn, Y.: Free energy in multi-species mixed \(p\)-spin spherical models. arXiv:2109.14790 (2021)
[6] Bates, E., Sohn, Y.: Crisanti–Sommers formula and simultaneous symmetry breaking in multi-species spherical spin glasses. arXiv:2109.14791 (2021)
[7] Genovese, G.: Minimax formula for the replica symmetric free energy of deep restricted Boltzmann machines (2020)
[8] Genovese, G., Tantari, D.: Legendre duality of spherical and Gaussian spin glasses. Math. Phys. Anal. Geom. 18, 1 (2015)
[9] Genovese, G., Tantari, D.: Legendre equivalences of spherical Boltzmann machines. J. Phys. A 53(9), 094001 (2020)
[10] Kivimae, P.: The ground state energy and concentration of complexity in spherical bipartite models. arXiv:2107.13138 (2021)
[11] McKenna, B.: Complexity of bipartite spherical spin glasses. arXiv:2105.05043 (2021)
[12] Panchenko, D.: Cavity method in the spherical SK model. Ann. Inst. H. Poincaré Probab. Stat. 45(4), 1020–1047 (2009)
[13] Talagrand, M.: Mean Field Models for Spin Glasses, vol. 1. Springer, Berlin (2011)
[14] Tantari, D.: Private communication
Acknowledgements
This manuscript benefited greatly from the observations of two anonymous referees, who are gratefully acknowledged.
Funding
Open access funding provided by University of Zurich.
Ethics declarations
Conflict of interest
The preparation of this manuscript did not involve any financial or non-financial conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Genovese, G. A Remark on the Spherical Bipartite Spin Glass. Math Phys Anal Geom 25, 14 (2022). https://doi.org/10.1007/s11040-022-09426-5