The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
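For concreteness, the model described above can be written out in its standard binary form. This is a sketch under assumed notation: the symbols \(n\), \(m\), \(W\), \(b\), \(c\) and \(Z\) are introduced here for illustration and do not come from the text above. With visible states \(v \in \{0,1\}^n\) and hidden states \(h \in \{0,1\}^m\), the visible distribution is the marginal of a Gibbs distribution whose energy contains only visible-hidden pair interactions and biases:
\[
p(v) \;=\; \frac{1}{Z} \sum_{h \in \{0,1\}^m} \exp\Big( \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} v_i h_j + \sum_{i=1}^{n} b_i v_i + \sum_{j=1}^{m} c_j h_j \Big),
\qquad
Z \;=\; \sum_{v \in \{0,1\}^n} \sum_{h \in \{0,1\}^m} \exp\Big( \sum_{i,j} W_{ij} v_i h_j + \sum_{i} b_i v_i + \sum_{j} c_j h_j \Big).
\]
Because no two hidden units interact, the sum over \(h\) factorizes over the hidden units:
\[
p(v) \;=\; \frac{1}{Z} \exp\Big( \sum_{i=1}^{n} b_i v_i \Big) \prod_{j=1}^{m} \Big( 1 + \exp\Big( c_j + \sum_{i=1}^{n} W_{ij} v_i \Big) \Big).
\]
In this form each hidden unit contributes one factor that is a sum of two exponential terms, which is the structure behind the geometric questions about which probability distributions the model can represent.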
I thank Shun-ichi Amari for inspiring discussions over the years. This review article originated at the IGAIA IV conference in 2016 dedicated to his 80th birthday. I am grateful to Nihat Ay, Johannes Rauh, Jason Morton, and more recently Anna Seigal for our collaborations. I thank Fero Matúš for discussions on the divergence maximization for hierarchical models, lastly at the MFO Algebraic Statistics meeting in 2017. I thank Bernd Sturmfels for many fruitful discussions, and Dave Ackley for insightful discussions at the Santa Fe Institute in 2016. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no 757983).
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Montúfar, G. (2018). Restricted Boltzmann Machines: Introduction and Review. In: Ay, N., Gibilisco, P., Matúš, F. (eds) Information Geometry and Its Applications . IGAIA IV 2016. Springer Proceedings in Mathematics & Statistics, vol 252. Springer, Cham. https://doi.org/10.1007/978-3-319-97798-0_4
DOI: https://doi.org/10.1007/978-3-319-97798-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97797-3
Online ISBN: 978-3-319-97798-0
eBook Packages: Mathematics and Statistics (R0)