Encyclopedia of Machine Learning

2010 Edition
Editors: Claude Sammut, Geoffrey I. Webb

Boltzmann Machines

  • Geoffrey Hinton
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30164-8_83



A Boltzmann machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off. Boltzmann machines have a simple learning algorithm (Hinton & Sejnowski, 1983) that allows them to discover interesting features that represent complex regularities in the training data. The learning algorithm is very slow in networks with many layers of feature detectors, but it is fast in “restricted Boltzmann machines” that have a single layer of feature detectors. Many hidden layers can be learned efficiently by composing restricted Boltzmann machines, using the feature activations of one as the training data for the next.
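The paragraph above describes learning only in words. As a hedged illustration, the sketch below trains a small binary restricted Boltzmann machine with one-step contrastive divergence (CD-1), the approximation from Hinton (2002) in the reading list. That is not the original Boltzmann machine learning procedure of Hinton & Sejnowski (1983). It then composes RBMs greedily, feeding each one's hidden activation probabilities to the next as training data. The class `RBM`, the function `train_stack`, the learning rate, the epoch count, and the use of probabilities rather than sampled states as the next layer's data are illustrative assumptions. It uses only the Python standard library, so it is meant for reading, not for real datasets.

```python
import math
import random


def sigmoid(x):
    # Logistic function; the clamp only guards math.exp against overflow.
    x = max(-500.0, min(500.0, x))
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Hypothetical binary RBM: visible units connect only to hidden units, never within a layer."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # One weight per visible-hidden pair, used in both directions (symmetric).
        self.w = [[self.rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.bv = [0.0] * n_visible  # visible biases
        self.bh = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        # With no hidden-hidden links, hidden units are independent given v.
        return [sigmoid(self.bh[j] + sum(v[i] * self.w[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h):
        # Likewise, visible units are independent given h.
        return [sigmoid(self.bv[i] + sum(h[j] * self.w[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs):
        # Each unit makes a stochastic decision to be on (1) or off (0).
        return [1 if self.rng.random() < p else 0 for p in probs]

    def cd1_update(self, v0, lr):
        """One CD-1 step: data statistics minus one-step reconstruction statistics."""
        ph0 = self.hidden_probs(v0)
        v1 = self.sample(self.visible_probs(self.sample(ph0)))  # reconstruction
        ph1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        for i in range(self.nv):
            self.bv[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.bh[j] += lr * (ph0[j] - ph1[j])

    def train(self, data, epochs=50, lr=0.1):
        for _ in range(epochs):
            for v in data:
                self.cd1_update(v, lr)


def train_stack(data, layer_sizes, epochs=50, lr=0.1):
    """Greedy composition: each RBM is trained on the feature activations of the one below."""
    rbms, layer_input = [], data
    for k, n_hidden in enumerate(layer_sizes):
        rbm = RBM(len(layer_input[0]), n_hidden, seed=k)
        rbm.train(layer_input, epochs, lr)
        rbms.append(rbm)
        # Hidden activation probabilities become the training data for the next RBM.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms
```

For example, `train_stack([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 5, [4, 2], epochs=20)` returns two RBMs with 6-by-4 and 4-by-2 weight matrices. The learning signal in `cd1_update` is the difference between pairwise statistics measured with the data clamped and those measured after one reconstruction step; the absence of within-layer connections is what lets each layer be sampled in a single parallel step.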

Boltzmann machines are used to solve two quite different computational problems. For a search problem, the weights on the connections are fixed and are used to represent a cost function. The stochastic dynamics of a Boltzmann machine then allow it to sample binary state vectors that...
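Because the preview cuts off here, the following is only a hedged sketch of the standard formulation, not the entry's own text. It assumes the usual energy E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i and the usual stochastic rule in which unit i turns on with probability 1 / (1 + exp(-gap_i / T)), where gap_i = b_i + sum_j w_ij s_j is the energy gap between its off and on states and T is a temperature. Gradually lowering T follows simulated annealing (Kirkpatrick et al., 1983, in the reading list). The functions `energy`, `anneal`, and `brute_force_min`, the cooling schedule, and the toy cost function are hypothetical illustrations.

```python
import math
import random
from itertools import product


def energy(s, w, b):
    """E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i for a binary state vector s."""
    n = len(s)
    pair = sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(b[i] * s[i] for i in range(n))


def anneal(w, b, t_start=5.0, t_end=0.05, sweeps=300, seed=0):
    """Stochastic search with fixed symmetric weights w (w[i][i] == 0) and biases b."""
    rng = random.Random(seed)
    n = len(b)
    s = [rng.randint(0, 1) for _ in range(n)]
    best, best_e = list(s), energy(s, w, b)
    decay = (t_end / t_start) ** (1.0 / (sweeps - 1))  # geometric cooling
    t = t_start
    for _ in range(sweeps):
        for i in range(n):
            # Energy gap E(s_i = 0) - E(s_i = 1), given the other units' states.
            gap = b[i] + sum(w[i][j] * s[j] for j in range(n) if j != i)
            x = max(-500.0, min(500.0, gap / t))
            s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0
            e = energy(s, w, b)
            if e < best_e:  # remember the lowest-cost state visited so far
                best, best_e = list(s), e
        t *= decay
    return s, best, best_e


def brute_force_min(w, b):
    """Exact minimum by enumeration; only feasible for a handful of units."""
    states = [list(s) for s in product((0, 1), repeat=len(b))]
    return min(states, key=lambda s: energy(s, w, b))


# Hypothetical cost: exactly one of four units should be on, the second (index 1) preferred.
w = [[0.0 if i == j else -2.0 for j in range(4)] for i in range(4)]
b = [1.0, 1.5, 1.0, 1.0]
final, best, best_e = anneal(w, b)
```

The toy weights encode the penalty (sum_i s_i - 1)^2, which for binary units equals 2 sum_{i<j} s_i s_j - sum_i s_i + 1, so w_ij = -2 and b_i = 1 up to a constant; raising one bias to 1.5 makes [0, 1, 0, 0] the unique minimum, with energy -1.5. Tracking the lowest-energy state visited is a design choice for search: the final state is only a sample and need not be the minimum.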


Recommended Reading

  1. Ackley, D., Hinton, G., & Sejnowski, T. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147–169.
  2. Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741.
  3. Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554–2558.
  4. Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
  5. Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.
  6. Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313, 504–507.
  7. Hinton, G. E., & Sejnowski, T. J. (1983). Optimal perceptual inference. In Proceedings of the IEEE conference on computer vision and pattern recognition, Washington, DC (pp. 448–453).
  8. Jordan, M. I. (1998). Learning in graphical models. Cambridge, MA: MIT Press.
  9. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671–680.
  10. Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th international conference on machine learning (pp. 282–289). San Francisco: Morgan Kaufmann.
  11. Peterson, C., & Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1(5), 995–1019.
  12. Sejnowski, T. J. (1986). Higher-order Boltzmann machines. AIP Conference Proceedings, 151(1), 398–403.
  13. Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In D. E. Rumelhart, & J. L. McClelland (Eds.), Parallel distributed processing: Vol. 1: Foundations (pp. 194–281). Cambridge, MA: MIT Press.
  14. Welling, M., Rosen-Zvi, M., & Hinton, G. E. (2005). Exponential family harmoniums with an application to information retrieval. In Advances in neural information processing systems (vol. 17, pp. 1481–1488). Cambridge, MA: MIT Press.

Copyright information

© Springer Science+Business Media, LLC 2011
