Encyclopedia of Machine Learning and Data Mining

2017 Edition | Editors: Claude Sammut, Geoffrey I. Webb

Deep Belief Nets

  • Geoffrey Hinton
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_67



Deep belief nets are probabilistic generative models that are composed of multiple layers of stochastic latent variables (also called “feature detectors” or “hidden units”). The top two layers have undirected, symmetric connections between them and form an associative memory. The lower layers receive top-down, directed connections from the layer above. Deep belief nets have two important computational properties. First, there is an efficient procedure for learning the top-down, generative weights that specify how the variables in one layer determine the probabilities of variables in the layer below. This procedure learns one layer of latent variables at a time. Second, after learning multiple layers, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction.
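The two properties above can be illustrated with a small NumPy sketch. It is not taken from this entry: it assumes binary logistic units, assumes that each layer is trained as a restricted Boltzmann machine with one-step contrastive divergence (the procedure described in Hinton, Osindero and Teh 2006), and uses hypothetical names (`train_rbm`, `train_dbn`, `infer`), layer sizes, and random toy data chosen only for illustration. It also omits the undirected top-level associative memory and any fine-tuning.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, rng, epochs=5, lr=0.1):
    """Train one layer with one-step contrastive divergence (assumed rule)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    n = data.shape[0]
    for _ in range(epochs):
        # Bottom-up: probabilities and sampled states of the hidden units.
        p_h = sigmoid(data @ W + b_hid)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Top-down: the same weights, transposed, reconstruct the layer below.
        p_v = sigmoid(h @ W.T + b_vis)
        p_h_recon = sigmoid(p_v @ W + b_hid)
        # Update: data-driven statistics minus reconstruction-driven statistics.
        W += lr * (data.T @ p_h - p_v.T @ p_h_recon) / n
        b_vis += lr * (data - p_v).mean(axis=0)
        b_hid += lr * (p_h - p_h_recon).mean(axis=0)
    return W, b_vis, b_hid


def train_dbn(data, layer_sizes, rng):
    """Greedy stacking: learn one layer of latent variables at a time."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_vis, b_hid = train_rbm(x, n_hidden, rng)
        layers.append((W, b_vis, b_hid))
        # Hidden probabilities of this layer become the "data" for the next.
        x = sigmoid(x @ W + b_hid)
    return layers


def infer(layers, v):
    """Single bottom-up pass using the learned weights in reverse direction."""
    activations = []
    x = v
    for W, _, b_hid in layers:
        x = sigmoid(x @ W + b_hid)
        activations.append(x)
    return activations


rng = np.random.default_rng(0)
data = (rng.random((64, 12)) < 0.5).astype(float)  # toy binary data vectors
layers = train_dbn(data, [8, 4], rng)              # two hidden layers
hidden = infer(layers, data[:3])
print([h.shape for h in hidden])  # [(3, 8), (3, 4)]
```

Note that `infer` involves no iterative settling: each layer's activation probabilities are computed once from the layer below, which is what makes inference a single pass.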

Motivation and Background




Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. University of Toronto, Toronto, Canada