Abstract
We investigate the thermodynamic properties of a restricted Boltzmann machine (RBM), a simple energy-based generative model used in the context of unsupervised learning. Assuming that the information content of this model is mainly reflected by the spectral properties of its weight matrix W, we attempt a realistic analysis by averaging over an appropriate statistical ensemble of RBMs. First, a phase diagram is derived. Although otherwise similar to that of the Sherrington–Kirkpatrick (SK) model with ferromagnetic couplings, the RBM’s phase diagram presents a ferromagnetic phase which may or may not be of compositional type, depending on the kurtosis of the distribution of the components of the singular vectors of W. Subsequently, the learning dynamics of the RBM is studied in the thermodynamic limit. A “typical” learning trajectory is shown to solve an effective dynamical equation, based on the aforementioned ensemble average and explicitly involving order parameters obtained from the thermodynamic analysis. In particular, this lets us show how the evolution of the dominant singular values of W, and hence of the unstable modes, is driven by the input data. At the beginning of training, during which the RBM operates in the linear regime, the unstable modes reflect the dominant covariance modes of the data. In the non-linear regime, instead, the selected modes interact and eventually impose a matching of the order parameters to their empirical counterparts estimated from the data. Finally, we illustrate our considerations with experiments on both artificial and real data, showing in particular how the RBM operates in the ferromagnetic compositional phase.
Notes
A somewhat different form of the TAP equations.
Note that in [17] a dependence \(\sqrt{\kappa (1-\kappa )}\) \(\left( \sqrt{\alpha (1-\alpha )} \text {in their notation} \right) \) is found. This dependence is hidden in our definition of \(\sigma ^2\), which yields \(L=\sqrt{N_v N_h}\) times the variance of \(r_{ij}\) instead of \(N_v+N_h\) as in their case.
References
Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory, chapter 6. In: Rumelhart, D., McClelland, J. (eds.) Parallel Distributed Processing, pp. 194–281. MIT Press, Cambridge (1986)
Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. In: Artificial Intelligence and Statistics, pp. 448–455 (2009)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002)
Tieleman, T.: Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pp. 1064–1071. ACM, New York (2008)
Hinton, G.E.: A Practical Guide to Training Restricted Boltzmann Machines. Springer, Berlin (2012)
Salazar, D.S.P.: Nonequilibrium thermodynamics of restricted Boltzmann machines. Phys. Rev. E 96, 022131 (2017)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8), 2554–2558 (1982)
Amit, D.J., Gutfreund, H., Sompolinsky, H.: Statistical mechanics of neural networks near saturation. Ann. Phys. 173(1), 30–67 (1987)
Gardner, E.: Maximum storage capacity in neural networks. Europhys. Lett. 4(4), 481 (1987)
Gardner, E., Derrida, B.: Optimal storage properties of neural network models. J. Phys. A 21(1), 271 (1988)
Barra, A., Bernacchia, A., Santucci, E., Contucci, P.: On the equivalence of Hopfield networks and Boltzmann machines. Neural Netw. 34, 1–9 (2012)
Gabrié, M., Tramel, E.W., Krzakala, F.: Training restricted Boltzmann machines via the Thouless–Anderson–Palmer free energy. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pp. 640–648 (2015)
Huang, H., Toyoizumi, T.: Advanced mean-field theory of the restricted Boltzmann machine. Phys. Rev. E 91(5), 050101 (2015)
Takahashi, C., Yasuda, M.: Mean-field inference in Gaussian restricted Boltzmann machine. J. Phys. Soc. Jpn. 85(3), 034001 (2016)
Furtlehner, C., Lasgouttes, J.-M., Auger, A.: Learning multiple belief propagation fixed points for real time inference. Phys. A 389(1), 149–163 (2010)
Barra, A., Genovese, G., Sollich, P., Tantari, D.: Phase diagram of restricted Boltzmann machines and generalized Hopfield networks with arbitrary priors. Phys. Rev. E 97, 022310 (2018)
Huang, H.: Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses. J. Stat. Mech. 2017(5), 053302 (2017)
Agliari, E., Barra, A., Galluzzi, A., Guerra, F., Moauro, F.: Multitasking associative networks. Phys. Rev. Lett. 109, 268101 (2012)
Monasson, R., Tubiana, J.: Emergence of compositional representations in restricted Boltzmann machines. Phys. Rev. Lett. 118, 138301 (2017)
Zdeborová, L., Krzakala, F.: Statistical physics of inference: thresholds and algorithms. Adv. Phys. 65(5), 453–552 (2016)
Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analyzers. Neural Comput. 11(2), 443–482 (1999)
Bourlard, H., Kamp, Y.: Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 59(4), 291–294 (1988)
Saxe, A. M., McClelland, J. L., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (2014). arXiv:1312.6120
Decelle, A., Fissore, G., Furtlehner, C.: Spectral dynamics of learning in restricted Boltzmann machines. EPL 119(6), 60001 (2017)
Tramel, E.W., Gabrié, M., Manoel, A., Caltagirone, F., Krzakala, F.: A Deterministic and generalized framework for unsupervised learning with restricted Boltzmann machines (2017). arXiv:1702.03260
Marčenko, V.A., Pastur, L.A.: Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik 1(4), 457 (1967)
Mézard, M.: Mean-field message-passing equations in the Hopfield model and its generalizations. Phys. Rev. E 95, 022117 (2017)
Parisi, G., Potters, M.: Mean-field equations for spin models with orthogonal interaction matrices. J. Phys. A 28(18), 5267 (1995)
Opper, M., Winther, O.: Adaptive and self-averaging Thouless–Anderson–Palmer mean field theory for probabilistic modeling. Phys. Rev. E 64, 056131 (2001)
Amit, D.J., Gutfreund, H., Sompolinsky, H.: Spin-glass models of neural networks. Phys. Rev. A 32, 1007–1018 (1985)
Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific, Singapore (1987)
de Almeida, J.R.L., Thouless, D.J.: Stability of the Sherrington–Kirkpatrick solution of a spin glass model. J. Phys. A 11(5), 983–990 (1978)
Hohenberg, P.C., Cross, M.C.: An introduction to pattern formation in nonequilibrium systems, pp. 55–92. Springer, Berlin (1987)
Mastromatteo, I., Marsili, M.: On the criticality of inferred models. J. Stat. Mech. 2011(10), P10012 (2011)
Appendices
Appendix A: AT Line
The stability of the RS solution to the mean-field equations is studied along the lines of [33], by looking at the Hessian of the replicated version of the free energy and identifying its eigenmodes from symmetry arguments. Before taking the limit \(p\rightarrow 0\), the free energy reads
with \(A_p\) and \(B_p\) given in (10,11). Assuming the small perturbations
around the saddle point \((m_\alpha ,\bar{m}_\alpha ,q,\bar{q})\), the perturbed free energy reads
where CT stands for the “conjugate term”, obtained through the exchanges \(\epsilon \leftrightarrow \bar{\epsilon }\), \(A_{\alpha \beta } \leftrightarrow \bar{A}_{\alpha \beta }\), etc.; here \(\bar{\delta }_{ab} {\mathop {=}\limits ^{\text{ def }}}1-\delta _{ab}\) and the operators are given by
with
Conjugate quantities are obtained by replacing \(m_\alpha \) by \(\bar{m}_\alpha \), q by \(\bar{q}\), \(u^\alpha \) by \(v^\alpha \), \(\eta _\alpha \) by \(\theta _\alpha \) and \(\kappa \) by \(1/\kappa \). As in the SK model, the \(2Kp\times 2Kp\) Hessian thereby defined can be diagonalized with the help of three analogous sets of eigenmodes, corresponding to different permutation symmetries in replica space.
The first set corresponds to \(2K+2\) replica symmetric modes defined by \(\eta _\alpha ^a = \eta _\alpha \) and \(\eta _{ab} = \eta \) solving the linear system
with eigenvalue \(\lambda \) solving a polynomial equation of degree \(2K+2\), obtained by requiring that the determinant of the above system vanish.
The second set corresponds to a broken replica symmetry where one replica \(a_0\) is different from the others
This set has dimension \((2K+2)(p-1)\). Its parameterization is obtained by imposing orthogonality with the previous one. The corresponding system reads
Finally the eigenmodes of the Hessian are made complete by considering a broken symmetry where two replicas \(a_0\) and \(a_1\) are different from the others, with the following parameterization dictated again by orthogonality constraints with the previous sets:
The dimension of this set is now \(p(p-3)\), and its elements are eigenvectors if and only if the following system of equations is satisfied
The corresponding eigenvalues read
with degeneracy \(p(p-3)/2\). Finally the RS stability condition reads
which reduces to the AT line of the SK model when \(\kappa =1\), except for the u and v averages, which are specific to our model. As seen in Fig. 2, the influence of \(\kappa \) is very limited.
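For comparison, recall the standard AT stability condition of the SK model with unit-variance couplings and external field h, expressed as a Gaussian average over the local field; up to the model-specific u and v averages, the stability condition above takes this same form at \(\kappa =1\):

\[ 1 \;\ge \; \beta ^2 \int \frac{dz}{\sqrt{2\pi }}\, e^{-z^2/2}\, \cosh ^{-4}\!\left[ \beta \left( \sqrt{q}\, z + h \right) \right] . \]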
Appendix B: Synthetic Dataset
The multimodal distribution modeling the N-dimensional synthetic data is the mixture
\[ p({\varvec{s}}) = \sum _{c=1}^{C} p_c \prod _{i=1}^{N} \frac{e^{h_i^c s_i}}{2\cosh h_i^c}, \]
where \(C\) is the number of clusters, \(p_c\) is the weight and \({\varvec{h}}^c\) the hidden field of cluster \(c\). The values of \(p_c\) are taken at random and normalized, while to compute \(h_i^c\) we work with the magnetizations \(m_i^c = \tanh (h_i^c)\). Expanding over the spectral modes, we can set an effective dimension \(d\) by constraining the sum to the range \(\alpha = 1, \dots , d \):
\[ m_i^c = \sum _{\alpha =1}^{d} m_\alpha ^c\, u_i^\alpha . \]
The clusters’ magnetizations \(m_{\alpha }^c\) are drawn uniformly at random in \([-1, 1]\) and normalized with a factor involving a parameter \(r\), introduced to decrease the clusters’ polarizations (in our simulations, we used \(r = 0.3\)). The spectral basis \( u_i^\alpha \) is obtained by drawing \(d\) N-dimensional vectors at random and applying the Gram–Schmidt process (which can be safely employed since N is assumed large, so that the initial vectors are nearly orthogonal). The hidden fields are then obtained from the magnetizations,
\[ h_i^c = \tanh ^{-1}(m_i^c), \]
and the samples are generated by choosing a cluster \(c\) according to \(p_c\) and setting the visible variables to \( \pm 1\) according to
\[ p(s_i = \pm 1 \mid c) = \frac{1 \pm m_i^c}{2}. \]
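The generation procedure above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors’ code: the function name, the argument defaults, and the exact normalization of the cluster magnetizations (rows rescaled to norm \(r\sqrt{d}\)) are our assumptions, and a reduced QR decomposition plays the role of the Gram–Schmidt step.

```python
import numpy as np

def make_synthetic_dataset(N=1000, d=20, C=10, r=0.3, n_samples=5000, seed=0):
    """Sample from a C-cluster mixture of independent +/-1 spins in N dimensions,
    with cluster magnetizations supported on d random orthonormal modes."""
    rng = np.random.default_rng(seed)

    # Cluster weights p_c: random, normalized.
    p = rng.random(C)
    p /= p.sum()

    # Random orthonormal spectral basis u^alpha (columns of u), via reduced QR.
    u, _ = np.linalg.qr(rng.standard_normal((N, d)))

    # Cluster magnetizations in the spectral basis, drawn in [-1, 1] and
    # rescaled so each cluster's coefficient vector has norm r*sqrt(d)
    # (assumed normalization; r < 1 keeps the polarizations weak).
    m_spec = rng.uniform(-1.0, 1.0, size=(C, d))
    m_spec *= r * np.sqrt(d) / np.linalg.norm(m_spec, axis=1, keepdims=True)

    # Site magnetizations m_i^c = sum_alpha m_alpha^c u_i^alpha,
    # and hidden fields h_i^c = atanh(m_i^c).
    m = np.clip(m_spec @ u.T, -0.999, 0.999)   # shape (C, N); clip for safety
    h = np.arctanh(m)

    # Draw a cluster per sample according to p_c, then set each visible
    # variable to +1 with probability (1 + m_i^c)/2, else -1.
    c = rng.choice(C, size=n_samples, p=p)
    probs = 0.5 * (1.0 + m[c])                 # shape (n_samples, N)
    s = np.where(rng.random((n_samples, N)) < probs, 1, -1)
    return s, h, p

samples, fields, weights = make_synthetic_dataset(N=200, d=5, C=3, n_samples=1000)
```

Each row of `samples` is one \(\pm 1\) spin configuration drawn from the mixture; `fields` holds the per-cluster hidden fields \(h_i^c\).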
Decelle, A., Fissore, G. & Furtlehner, C. Thermodynamics of Restricted Boltzmann Machines and Related Learning Dynamics. J Stat Phys 172, 1576–1608 (2018). https://doi.org/10.1007/s10955-018-2105-y