An Introduction to Restricted Boltzmann Machines

  • Asja Fischer
  • Christian Igel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7441)


Restricted Boltzmann machines (RBMs) are probabilistic graphical models that can be interpreted as stochastic neural networks. The increase in computational power and the development of faster learning algorithms have made them applicable to relevant machine learning problems. They have recently attracted much attention after being proposed as building blocks of multi-layer learning systems called deep belief networks. This tutorial introduces RBMs as undirected graphical models. The basic concepts of graphical models are introduced first; basic knowledge of statistics is, however, presumed. Different learning algorithms for RBMs are discussed. As most of them are based on Markov chain Monte Carlo (MCMC) methods, an introduction to Markov chains and the required MCMC techniques is provided.
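The abstract names Gibbs sampling as the MCMC technique behind most RBM learning algorithms but does not spell it out. The following is a minimal sketch of block Gibbs sampling, assuming the standard binary RBM in which each unit, given the other layer, is switched on with a logistic (sigmoid) probability. The names `W`, `b`, `c`, `gibbs_chain` and the pure-Python list representation are illustrative assumptions, not notation taken from the tutorial.

```python
import math
import random


def sigmoid(x):
    """Logistic function giving the conditional activation probability."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_hidden(v, W, c, rng):
    """Sample all hidden units at once, given the visible vector v.

    Assumed layout: W[i][j] couples hidden unit i with visible unit j,
    and c[i] is the bias of hidden unit i.
    """
    probs = [sigmoid(c[i] + sum(W[i][j] * v[j] for j in range(len(v))))
             for i in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, b, rng):
    """Sample all visible units at once, given the hidden vector h.

    b[j] is the bias of visible unit j (assumed name).
    """
    probs = [sigmoid(b[j] + sum(W[i][j] * h[i] for i in range(len(h))))
             for j in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]


def gibbs_chain(v0, W, b, c, steps, seed=0):
    """Run `steps` block Gibbs updates starting from the visible vector v0.

    Each update resamples the visible layer from the current hidden state
    and then the hidden layer from the new visible state.
    """
    rng = random.Random(seed)
    v = list(v0)
    h = sample_hidden(v, W, c, rng)
    for _ in range(steps):
        v = sample_visible(h, W, b, rng)
        h = sample_hidden(v, W, c, rng)
    return v, h


if __name__ == "__main__":
    # Tiny hypothetical model: 3 visible units, 2 hidden units.
    W = [[0.5, -0.3, 0.8],
         [-0.2, 0.4, 0.1]]
    b = [0.0, 0.1, -0.1]
    c = [0.2, -0.2]
    print(gibbs_chain([1, 0, 1], W, b, c, steps=10))
```

Because an RBM has no connections within a layer, all units of one layer can be sampled together given the other layer, which is why each function updates a whole layer in a single pass.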





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Asja Fischer (1, 2)
  • Christian Igel (2)
  1. Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany
  2. Department of Computer Science, University of Copenhagen, Denmark
