Abstract
We investigate the thermodynamic properties of a restricted Boltzmann machine (RBM), a simple energy-based generative model used in the context of unsupervised learning. Assuming that the information content of this model is mainly reflected by the spectral properties of its weight matrix W, we attempt a realistic analysis by averaging over an appropriate statistical ensemble of RBMs. First, a phase diagram is derived. Although otherwise similar to that of the Sherrington–Kirkpatrick (SK) model with ferromagnetic couplings, the RBM’s phase diagram presents a ferromagnetic phase which may or may not be of compositional type, depending on the kurtosis of the distribution of the components of the singular vectors of W. Subsequently, the learning dynamics of the RBM is studied in the thermodynamic limit. A “typical” learning trajectory is shown to solve an effective dynamical equation, based on the aforementioned ensemble average and explicitly involving order parameters obtained from the thermodynamic analysis. In particular, this lets us show how the evolution of the dominant singular values of W, and hence of the unstable modes, is driven by the input data. At the beginning of training, during which the RBM operates in the linear regime, the unstable modes reflect the dominant covariance modes of the data. In the non-linear regime, instead, the selected modes interact and eventually impose a matching of the order parameters to their empirical counterparts estimated from the data. Finally, we illustrate our considerations with experiments on both artificial and real data, showing in particular how the RBM operates in the ferromagnetic compositional phase.
Notes
A somewhat different form of the TAP equations.
Note that in [17] a dependence \(\sqrt{\kappa (1-\kappa )}\) \(\left( \sqrt{\alpha (1-\alpha )} \text {in their notation} \right) \) is found. This dependence is hidden in our definition of \(\sigma ^2\), which yields \(L=\sqrt{N_v N_h}\) times the variance of \(r_{ij}\) instead of \(N_v+N_h\) as in their case.
References
Smolensky, P.: Information processing in dynamical systems: foundations of harmony theory, chapter 6. In: Rumelhart, D., McClelland, J. (eds.) Parallel Distributed Processing, pp. 194–281. MIT Press, Cambridge (1986)
Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. In: Artificial Intelligence and Statistics, pp. 448–455 (2009)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14, 1771–1800 (2002)
Tieleman, T.: Training restricted Boltzmann machines using approximations to the likelihood gradient. In: Proceedings of the 25th International Conference on Machine Learning, ICML ’08, pp. 1064–1071. ACM, New York (2008)
Hinton, G.E.: A Practical Guide to Training Restricted Boltzmann Machines. Springer, Berlin (2012)
Salazar, D.S.P.: Nonequilibrium thermodynamics of restricted Boltzmann machines. Phys. Rev. E 96, 022131 (2017)
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8), 2554–2558 (1982)
Amit, D.J., Gutfreund, H., Sompolinsky, H.: Statistical mechanics of neural networks near saturation. Ann. Phys. 173(1), 30–67 (1987)
Gardner, E.: Maximum storage capacity in neural networks. Europhys. Lett. 4(4), 481 (1987)
Gardner, E., Derrida, B.: Optimal storage properties of neural network models. J. Phys. A 21(1), 271 (1988)
Barra, A., Bernacchia, A., Santucci, E., Contucci, P.: On the equivalence of Hopfield networks and Boltzmann machines. Neural Netw. 34, 1–9 (2012)
Gabrié, M., Tramel, E.W., Krzakala, F.: Training restricted Boltzmann machines via the Thouless–Anderson–Palmer free energy. In: Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15, pp. 640–648 (2015)
Huang, H., Toyoizumi, T.: Advanced mean-field theory of the restricted Boltzmann machine. Phys. Rev. E 91(5), 050101 (2015)
Takahashi, C., Yasuda, M.: Mean-field inference in Gaussian restricted Boltzmann machine. J. Phys. Soc. Jpn. 85(3), 034001 (2016)
Furtlehner, C., Lasgouttes, J.-M., Auger, A.: Learning multiple belief propagation fixed points for real time inference. Phys. A 389(1), 149–163 (2010)
Barra, A., Genovese, G., Sollich, P., Tantari, D.: Phase diagram of restricted Boltzmann machines and generalized Hopfield networks with arbitrary priors. Phys. Rev. E 97, 022310 (2018)
Huang, H.: Statistical mechanics of unsupervised feature learning in a restricted Boltzmann machine with binary synapses. J. Stat. Mech. 2017(5), 053302 (2017)
Agliari, E., Barra, A., Galluzzi, A., Guerra, F., Moauro, F.: Multitasking associative networks. Phys. Rev. Lett. 109, 268101 (2012)
Monasson, R., Tubiana, J.: Emergence of compositional representations in restricted Boltzmann machines. Phys. Rev. Lett. 118, 138301 (2017)
Zdeborová, L., Krzakala, F.: Statistical physics of inference: thresholds and algorithms. Adv. Phys. 65(5), 453–552 (2016)
Tipping, M.E., Bishop, C.M.: Mixtures of probabilistic principal component analyzers. Neural Comput. 11(2), 443–482 (1999)
Bourlard, H., Kamp, Y.: Auto-association by multilayer perceptrons and singular value decomposition. Biol. Cybern. 59(4), 291–294 (1988)
Saxe, A. M., McClelland, J. L., Ganguli, S.: Exact solutions to the nonlinear dynamics of learning in deep linear neural networks (2014). arXiv:1312.6120
Decelle, A., Fissore, G., Furtlehner, C.: Spectral dynamics of learning in restricted Boltzmann machines. EPL 119(6), 60001 (2017)
Tramel, E.W., Gabrié, M., Manoel, A., Caltagirone, F., Krzakala, F.: A Deterministic and generalized framework for unsupervised learning with restricted Boltzmann machines (2017). arXiv:1702.03260
Marčenko, V.A., Pastur, L.A.: Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sbornik 1(4), 457 (1967)
Mézard, M.: Mean-field message-passing equations in the Hopfield model and its generalizations. Phys. Rev. E 95, 022117 (2017)
Parisi, G., Potters, M.: Mean-field equations for spin models with orthogonal interaction matrices. J. Phys. A 28(18), 5267 (1995)
Opper, M., Winther, O.: Adaptive and self-averaging Thouless–Anderson–Palmer mean field theory for probabilistic modeling. Phys. Rev. E 64, 056131 (2001)
Amit, D.J., Gutfreund, H., Sompolinsky, H.: Spin-glass models of neural networks. Phys. Rev. A 32, 1007–1018 (1985)
Mézard, M., Parisi, G., Virasoro, M.A.: Spin Glass Theory and Beyond. World Scientific, Singapore (1987)
de Almeida, J.R.L., Thouless, D.J.: Stability of the Sherrington–Kirkpatrick solution of a spin glass model. J. Phys. A 11(5), 983–990 (1978)
Hohenberg, P.C., Cross, M.C.: An introduction to pattern formation in nonequilibrium systems, pp. 55–92. Springer, Berlin (1987)
Mastromatteo, I., Marsili, M.: On the criticality of inferred models. J. Stat. Mech. 2011(10), P10012 (2011)
Appendices
Appendix A: AT Line
The stability of the RS solution to the mean-field equations is studied along the lines of [33], by looking at the Hessian of the replicated version of the free energy and identifying its eigenmodes from symmetry arguments. Before taking the limit \(p\rightarrow 0\), the free energy reads
with \(A_p\) and \(B_p\) given in (10,11). Assuming the small perturbations
around the saddle point \((m_\alpha ,\bar{m}_\alpha ,q,\bar{q})\), the perturbed free energy reads
where CT stands for the “conjugate term”, obtained through the exchanges \(\epsilon \leftrightarrow \bar{\epsilon }\), \(A_{\alpha \beta } \leftrightarrow \bar{A}_{\alpha \beta }\), etc.; here \(\bar{\delta }_{ab} {\mathop {=}\limits ^{\text{ def }}}1-\delta _{ab}\) and the operators are given by
with
Conjugate quantities are obtained by replacing \(m_\alpha \) by \(\bar{m}_\alpha \), q by \(\bar{q}\), \(u^\alpha \) by \(v^\alpha \), \(\eta _\alpha \) by \(\theta _\alpha \) and \(\kappa \) by \(1/\kappa \). As in the SK model, the \(2Kp\times 2Kp\) Hessian thereby defined can be diagonalized with the help of three analogous sets of eigenmodes, corresponding to different permutation symmetries in replica space.
The first set corresponds to \(2K+2\) replica symmetric modes defined by \(\eta _\alpha ^a = \eta _\alpha \) and \(\eta _{ab} = \eta \) solving the linear system
with eigenvalue \(\lambda \) solving a polynomial equation of degree \(2K+2\), obtained by requiring that the determinant of the above system vanish.
The second set corresponds to a broken replica symmetry where one replica \(a_0\) is different from the others
This set has dimension \((2K+2)(p-1)\). Its parameterization is obtained by imposing orthogonality with the previous one. The corresponding system reads
Finally the eigenmodes of the Hessian are made complete by considering a broken symmetry where two replicas \(a_0\) and \(a_1\) are different from the others, with the following parameterization dictated again by orthogonality constraints with the previous sets:
The dimension of this set is now \(p(p-3)\), and its elements are eigenvectors if and only if the following system of equations is satisfied
The corresponding eigenvalues read
with degeneracy \(p(p-3)/2\). Finally the RS stability condition reads
which reduces to the AT line of the SK model when \(\kappa =1\), except for the u and v averages, which are specific to our model. As seen in Fig. 2, the influence of \(\kappa \) is very limited.
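For comparison, recall the standard AT stability condition of the SK model with unit-variance couplings and external field h, expressed as a Gaussian average over the local field; up to the model-specific u and v averages, the stability condition above takes this same form at \(\kappa =1\):

\[ 1 \;\ge \; \beta ^2 \int \frac{dz}{\sqrt{2\pi }}\, e^{-z^2/2}\, \cosh ^{-4}\!\left[ \beta \left( \sqrt{q}\, z + h \right) \right] . \]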
Appendix B: Synthetic Dataset
The multimodal distribution modeling the N-dimensional synthetic data is the mixture
\[ p({\varvec{s}}) = \sum _{c=1}^{C} p_c \prod _{i=1}^{N} \frac{e^{h_i^c s_i}}{2\cosh h_i^c}, \]
where \(C\) is the number of clusters, \(p_c\) is the weight and \({\varvec{h}}^c\) the hidden field of cluster \(c\). The values of \(p_c\) are taken at random and normalized, while to compute \(h_i^c\) we work with the magnetizations \(m_i^c = \tanh (h_i^c)\). Expanding over the spectral modes, we can set an effective dimension \(d\) by constraining the sum to the range \(\alpha = 1, \dots , d \):
\[ m_i^c = \sum _{\alpha =1}^{d} m_\alpha ^c\, u_i^\alpha . \]
The clusters’ magnetizations \(m_{\alpha }^c\) are drawn uniformly at random in \([-1, 1]\) and normalized with a factor involving a parameter \(r\), introduced to decrease the clusters’ polarizations (in our simulations, we used \(r = 0.3\)). The spectral basis \( u_i^\alpha \) is obtained by drawing \(d\) N-dimensional vectors at random and applying the Gram–Schmidt process (which can be safely employed since N is assumed large, so that the initial vectors are nearly orthogonal). The hidden fields are then obtained from the magnetizations,
\[ h_i^c = \tanh ^{-1}(m_i^c), \]
and the samples are generated by choosing a cluster \(c\) according to \(p_c\) and setting the visible variables to \( \pm 1\) according to
\[ p(s_i = \pm 1 \mid c) = \frac{1 \pm m_i^c}{2}. \]
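The generation procedure above can be sketched in a few lines of NumPy. This is an illustrative implementation, not the authors’ code: the function name, the argument defaults, and the exact normalization of the cluster magnetizations (rows rescaled to norm \(r\sqrt{d}\)) are our assumptions, and a reduced QR decomposition plays the role of the Gram–Schmidt step.

```python
import numpy as np

def make_synthetic_dataset(N=1000, d=20, C=10, r=0.3, n_samples=5000, seed=0):
    """Sample from a C-cluster mixture of independent +/-1 spins in N dimensions,
    with cluster magnetizations supported on d random orthonormal modes."""
    rng = np.random.default_rng(seed)

    # Cluster weights p_c: random, normalized.
    p = rng.random(C)
    p /= p.sum()

    # Random orthonormal spectral basis u^alpha (columns of u), via reduced QR.
    u, _ = np.linalg.qr(rng.standard_normal((N, d)))

    # Cluster magnetizations in the spectral basis, drawn in [-1, 1] and
    # rescaled so each cluster's coefficient vector has norm r*sqrt(d)
    # (assumed normalization; r < 1 keeps the polarizations weak).
    m_spec = rng.uniform(-1.0, 1.0, size=(C, d))
    m_spec *= r * np.sqrt(d) / np.linalg.norm(m_spec, axis=1, keepdims=True)

    # Site magnetizations m_i^c = sum_alpha m_alpha^c u_i^alpha,
    # and hidden fields h_i^c = atanh(m_i^c).
    m = np.clip(m_spec @ u.T, -0.999, 0.999)   # shape (C, N); clip for safety
    h = np.arctanh(m)

    # Draw a cluster per sample according to p_c, then set each visible
    # variable to +1 with probability (1 + m_i^c)/2, else -1.
    c = rng.choice(C, size=n_samples, p=p)
    probs = 0.5 * (1.0 + m[c])                 # shape (n_samples, N)
    s = np.where(rng.random((n_samples, N)) < probs, 1, -1)
    return s, h, p

samples, fields, weights = make_synthetic_dataset(N=200, d=5, C=3, n_samples=1000)
```

Each row of `samples` is one \(\pm 1\) spin configuration drawn from the mixture; `fields` holds the per-cluster hidden fields \(h_i^c\).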
Decelle, A., Fissore, G. & Furtlehner, C. Thermodynamics of Restricted Boltzmann Machines and Related Learning Dynamics. J Stat Phys 172, 1576–1608 (2018). https://doi.org/10.1007/s10955-018-2105-y