Restricted Boltzmann Machines: Introduction and Review

  • Conference paper
  • In: Information Geometry and Its Applications (IGAIA IV 2016)
  • Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 252)

Abstract

The restricted Boltzmann machine is a network of stochastic units with undirected interactions between pairs of visible and hidden units. This model was popularized as a building block of deep learning architectures and has continued to play an important role in applied and theoretical machine learning. Restricted Boltzmann machines carry a rich structure, with connections to geometry, applied algebra, probability, statistics, machine learning, and other areas. The analysis of these models is attractive in its own right and also as a platform to combine and generalize mathematical tools for graphical models with hidden variables. This article gives an introduction to the mathematical analysis of restricted Boltzmann machines, reviews recent results on the geometry of the sets of probability distributions representable by these models, and suggests a few directions for further investigation.
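The model structure described above can be made concrete with a small numerical sketch (not taken from the paper; variable names and sizes are illustrative): a binary RBM with visible states \(v \in \{0,1\}^m\) and hidden states \(h \in \{0,1\}^n\) assigns probabilities \(p(v,h) \propto e^{-E(v,h)}\) with energy \(E(v,h) = -a^\top v - b^\top h - v^\top W h\), where interactions run only between visible–hidden pairs. The representable distributions are the visible marginals \(p(v) = \sum_h p(v,h)\), computed here by brute-force enumeration for a tiny model.

```python
# Minimal sketch of a binary restricted Boltzmann machine (illustrative
# parameters, not from the paper): energy, partition function, and the
# visible marginal p(v) = sum_h p(v,h) by brute-force enumeration.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 2                      # numbers of visible and hidden units
W = rng.normal(size=(m, n))      # visible-hidden interaction weights only
a = rng.normal(size=m)           # visible biases
b = rng.normal(size=n)           # hidden biases

def energy(v, h):
    # E(v,h) = -a.v - b.h - v^T W h; no visible-visible or hidden-hidden terms
    return -(a @ v + b @ h + v @ W @ h)

states_v = [np.array(s) for s in itertools.product([0, 1], repeat=m)]
states_h = [np.array(s) for s in itertools.product([0, 1], repeat=n)]

# Partition function over all joint states, then the visible marginal
Z = sum(np.exp(-energy(v, h)) for v in states_v for h in states_h)
p_v = np.array([sum(np.exp(-energy(v, h)) for h in states_h) / Z
                for v in states_v])

print(p_v)           # a point in the set of distributions the RBM represents
print(p_v.sum())     # probabilities over the 2^m visible states sum to 1
```

Enumerating all \(2^{m+n}\) joint states is feasible only for toy sizes; the geometric questions reviewed in the article concern exactly the set of marginals \(p(v)\) obtainable as the parameters \(W, a, b\) vary.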


Acknowledgements

I thank Shun-ichi Amari for inspiring discussions over the years. This review article originated at the IGAIA IV conference in 2016, dedicated to his 80th birthday. I am grateful to Nihat Ay, Johannes Rauh, Jason Morton, and more recently Anna Seigal for our collaborations. I thank Fero Matúš for discussions on divergence maximization for hierarchical models, most recently at the MFO Algebraic Statistics meeting in 2017. I thank Bernd Sturmfels for many fruitful discussions, and Dave Ackley for insightful discussions at the Santa Fe Institute in 2016. This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement no. 757983).

Corresponding author

Correspondence to Guido Montúfar.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Montúfar, G. (2018). Restricted Boltzmann Machines: Introduction and Review. In: Ay, N., Gibilisco, P., Matúš, F. (eds) Information Geometry and Its Applications. IGAIA IV 2016. Springer Proceedings in Mathematics & Statistics, vol 252. Springer, Cham. https://doi.org/10.1007/978-3-319-97798-0_4
