The physics of energy-based models

  • Research Article
  • Published in: Quantum Machine Intelligence

Abstract

Energy-based models (EBMs) are experiencing a resurgence of interest in both the physics and machine learning communities. This article provides an intuitive introduction to EBMs that requires no background in machine learning: it connects elementary concepts from physics with basic concepts and tools in generative modeling, and closes with a perspective on where current research in the field is heading. The article was originally written as an online lecture note in HTML and JavaScript and contains interactive graphics; we recommend that the reader also visit the interactive version.

[Figures 1–7 are available in the full article.]


Notes

  1. The entropy of a system is given by \(S = \sum _x -P(x)\log P(x)\), where P(x) is the probability of a state x. If all states are equally likely, the entropy is maximal. If a single state has \(P(x) = 1\), the entropy is minimal (zero).
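As a concrete numerical illustration of this formula (a sketch added here, not part of the original article):

```python
import numpy as np

def entropy(p):
    """Shannon/Gibbs entropy S = -sum_x P(x) log P(x), with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                       # skip zero-probability states
    return float(-np.sum(nz * np.log(nz)))

# Uniform distribution over 4 states: maximal entropy, log(4) ≈ 1.386
print(entropy([0.25, 0.25, 0.25, 0.25]))
# A single certain state: minimal entropy, 0.0
print(entropy([1.0, 0.0, 0.0, 0.0]))
```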

  2. By definition, being at thermal equilibrium with a bath at temperature T means that the system is also at temperature T. The temperature determines the average energy of the system: with large T, the probability of high-energy states increases, and so does the average energy.
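This can be checked numerically. The following sketch (added here for illustration, with units such that \(k_B = 1\)) computes Boltzmann probabilities \(P(x) \propto e^{-E(x)/T}\) at two temperatures and shows that the average energy rises with T:

```python
import numpy as np

def boltzmann_probs(energies, T):
    """Boltzmann distribution P(x) ∝ exp(-E(x)/T), with k_B = 1."""
    w = np.exp(-np.asarray(energies, dtype=float) / T)
    return w / w.sum()

energies = [0.0, 1.0, 2.0]
for T in (0.5, 5.0):
    p = boltzmann_probs(energies, T)
    avg_E = float(np.dot(p, energies))
    print(f"T={T}: P={np.round(p, 3)}, <E>={avg_E:.3f}")
# At larger T the weight on the high-energy states grows, raising <E>.
```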

  3. For any distribution in the exponential family, a statistic T(x) is sufficient if we can write the probability p(x) as

    $$\begin{aligned}p(x) = \exp \left( \alpha (\theta ) T(x) + A(\theta ) \right) ,\end{aligned}$$

    where \(\alpha (\theta )\) is a vector-valued function and \(A(\theta )\) is a scalar, which for a Boltzmann distribution is related to the partition function as \(A(\theta ) = \log (1/Z)\) (Li et al. 2013).

  4. There are other conventions for the Ising energy function in the literature, where the signs of the energy terms change. For example,

    $$\begin{aligned} E(\sigma ) = \sum \limits _i b_i \sigma _i + \sum \limits _{ij} w_{ij} \sigma _i\sigma _j. \end{aligned}$$

    We follow the convention introduced above.
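    The two conventions differ only by an overall sign of the energy terms. A minimal sketch of evaluating the Ising energy (added here for illustration; the `sign` argument is our own device, not notation from the article):

```python
import numpy as np

def ising_energy(sigma, w, b, sign=-1.0):
    """Ising energy sign * (sum_i b_i sigma_i + sum_{i<j} w_ij sigma_i sigma_j).

    sign=-1 gives the common convention E = -sum b sigma - sum w sigma sigma;
    sign=+1 gives the alternative convention quoted in this note.
    """
    sigma = np.asarray(sigma, dtype=float)
    field = float(np.dot(b, sigma))
    # Use the upper triangle of w so each pair (i, j) is counted once.
    coupling = float(np.sum(np.triu(w, k=1) * np.outer(sigma, sigma)))
    return sign * (field + coupling)

# Two spins, ferromagnetic coupling w_01 = 1, no local fields:
w = np.array([[0.0, 1.0], [1.0, 0.0]])
b = np.zeros(2)
print(ising_energy([+1, +1], w, b))  # aligned spins: -1.0
print(ising_energy([+1, -1], w, b))  # anti-aligned spins: +1.0
```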

  5. Non-commuting operators do not have a common eigenfunction or eigenstate. Since these eigenfunctions (for Hermitian operators in quantum mechanics) are orthogonal, non-commuting operators lead to non-orthogonal eigenfunctions or eigenstates, and therefore no measurement operator can reliably distinguish these non-orthogonal states. This gives rise to the uncertainty principle. The no-cloning theorem follows a similar argument: orthogonal states can be cloned, but non-orthogonal ones cannot.

  6. A droplet is a local low-energy cluster of spins whose distribution is disconnected from the rest of the system.

References

  • Amin MH, Andriyash E, Rolfe J, Kulchytskyy B, Melko R (2018) Quantum Boltzmann machine. Physical Review X 8(2):021050

  • Aurell E, Ekeberg M (2012) Inverse Ising inference using all the data. Physical Review Letters 108(9):090201

  • Amit DJ, Gutfreund H, Sompolinsky H (1985) Spin-glass models of neural networks. Physical Review A 32(2):1007

  • Ackley DH, Hinton GE, Sejnowski TJ (1985) A learning algorithm for Boltzmann machines. Cognitive Science 9(1):147–169

  • Biamonte JD (2008) Nonperturbative k-body to two-body commuting conversion Hamiltonians and embedding problem instances into Ising spins. Physical Review A 77(5):052331

  • Babbush R, O’Gorman B, Aspuru-Guzik A (2013) Resource efficient gadgets for compiling adiabatic quantum optimization problems. Annalen der Physik 525(10–11):877–888

  • Borders WA, Pervaiz AZ, Fukami S, Camsari KY, Ohno H, Datta S (2019) Integer factorization using stochastic magnetic tunnel junctions. Nature 573(7774):390–393

  • Benedetti M, Realpe-Gómez J, Biswas R, Perdomo-Ortiz A (2017) Quantum-assisted learning of hardware-embedded probabilistic graphical models. Physical Review X 7(4):041052

  • Courville A, Bergstra J, Bengio Y (2011) A spike and slab restricted Boltzmann machine. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. JMLR Workshop and Conference Proceedings, pp 233–241

  • Cipra BA (1987) An introduction to the Ising model. The American Mathematical Monthly 94(10):937–959

  • Carreira-Perpinan MA, Hinton GE (2005) On contrastive divergence learning. In: Aistats, vol 10. Citeseer, pp 33–40

  • Carleo G, Troyer M (2017) Solving the quantum many-body problem with artificial neural networks. Science 355(6325):602–606

  • Du Y, Lin T, Mordatch I (2019) Model based planning with energy based models. arXiv:1909.06878

  • Du Y, Mordatch I (2019) Implicit generation and generalization in energy-based models

  • Dahl G, Ranzato MA, Mohamed A-R, Hinton GE (2010) Phone recognition with the mean-covariance restricted Boltzmann machine. In: Advances in neural information processing systems. pp 469–477

  • Earl DJ, Deem MW (2005) Parallel tempering: theory, applications, and new perspectives. Phys Chem Chem Phys 7:3910–3916

  • Finn C, Christiano P, Abbeel P, Levine S (2016) A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv:1611.03852

  • Gao X, Duan L-M (2017) Efficient representation of quantum many-body states with deep neural networks. Nature Communications 8(1):662

  • Goldstein H (2002) Classical mechanics. Pearson Education

  • Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications

  • Hamze F, de Freitas N (2012) From fields to trees. arXiv:1207.4149

  • Huembeli P, Dauphin A, Wittek P, Gogolin C (2019) Automated discovery of characteristic features of phase transitions in many-body localization. Physical Review B 99(10):104106

  • Hen I (2017) Solving spin glasses with optimized trees of clustered spins. Phys Rev E 96:022105

  • Hopfield JJ, Feinstein DI, Palmer RG (1983) Unlearning has a stabilizing effect in collective memories. Nature 304(5922):158

  • Hartnett GS, Mohseni M (2020) Self-supervised learning of generative spin-glasses with normalizing flows

  • Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences 79(8):2554–2558

  • Houdayer J (2001) A cluster Monte Carlo algorithm for 2-dimensional spin glasses. The European Physical Journal B-Condensed Matter and Complex Systems 22(4):479–484

  • Hsieh CY, Sun Q, Zhang S, Lee CK (2021) Unitary-coupled restricted Boltzmann machine ansatz for quantum simulations. npj Quantum Information 7(1):1–10

  • Haarnoja T, Tang H, Abbeel P, Levine S (2017) Reinforcement learning with deep energy-based policies. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70. pp 1352–1361, JMLR. org

  • Iten R, Metger T, Wilming H, Del Rio L, Renner R (2018) Discovering physical concepts with neural networks. arXiv:1807.10300

  • Jaynes ET (1957) Information theory and statistical mechanics. Physical Review 106(4):620

  • Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680

  • Khoshaman A, Vinci W, Denis B, Andriyash E, Sadeghi H, Amin MH (2018) Quantum variational autoencoder. Quantum Science and Technology 4(1):014001

  • Kieferová M, Wiebe N (2017) Tomography and generative training with quantum Boltzmann machines. Physical Review A 96(6):062327

  • LeCun Y, Chopra S, Hadsell R (2006) A tutorial on energy-based learning

  • Le Roux N, Bengio Y (2008) Representational power of restricted Boltzmann machines and deep belief networks. Neural Computation 20(6):1631–1649

  • Le Roux N, Bengio Y (2010) Deep belief networks are compact universal approximators. Neural Computation 22(8):2192–2207

  • Liu J-G, Wang L (2018) Differentiable learning of quantum circuit born machines. Physical Review A 98(6):062324

  • Li X, Wang B, Liu Y, Lee TS (2013) Learning discriminative sufficient statistics score space for classification. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp 49–64

  • Melko RG, Carleo G, Carrasquilla J, Cirac JI (2019) Restricted Boltzmann machines in quantum physics. Nature Physics 15(9):887–892

  • Mezard M, Montanari A (2009) Information, physics, and computation. Oxford University Press Inc, New York

  • Moore C, Mertens S (2011) The nature of computation. Oxford University Press Inc, New York

  • Melnikov AA, Nautrup HP, Krenn M, Dunjko V, Tiersch M, Zeilinger A, Briegel HJ (2018) Active learning machine learns to create new quantum experiments. Proceedings of the National Academy of Sciences 115(6):1221–1226

  • Mohseni M (2021) Article in preparation

  • Neyshabur B, Bhojanapalli S, McAllester D, Srebro N (2017) Exploring generalization in deep learning. In: Advances in neural information processing systems. pp 5947–5956

  • Nielsen MA, Chuang I (2002) Quantum computation and quantum information

  • Nijkamp E, Hill M, Han T, Zhu S-C, Wu YN (2019) On the anatomy of MCMC-based maximum likelihood learning of energy-based models

  • Robert CP, Casella G (1999) The Metropolis-Hastings algorithm. In: Monte Carlo statistical methods. Springer, pp 231–283

  • Rojas R (1996) Neural networks: a systematic introduction. Springer-Verlag, Berlin, Heidelberg

  • Rojas R (2013) Neural networks: a systematic introduction. Springer Science & Business Media, New York

  • Swersky K, Buchman D, Freitas ND, Marlin BM, et al (2011) On autoencoders and score matching for energy based models. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11). pp 1201–1208

  • Selby A (2014) Efficient subgraph-based sampling of Ising-type models with frustration. arXiv:1409.3934

  • Sutton B, Faria R, Ghantasala LA, Jaiswal R, Camsari KY, Datta S (2020) Autonomous probabilistic coprocessing with petaflips per second. IEEE Access 8:157238–157252

  • Salakhutdinov R, Mnih A, Hinton G (2007) Restricted Boltzmann machines for collaborative filtering. In: Proceedings of the 24th International Conference on Machine Learning. ACM, pp 791–798

  • Stein DL, Newman CM (2013) Spin glasses and complexity. Princeton University Press, Princeton

  • Swendsen RH, Wang J-S (1987) Nonuniversal critical dynamics in Monte Carlo simulations. Phys Rev Lett 58:86–88

  • Torlai G, Mazzola G, Carrasquilla J, Troyer M, Melko R, Carleo G (2018) Neural-network quantum state tomography. Nature Physics 14(5):447

  • van Hemmen JL (1986) Spin-glass models of a neural network. Physical Review A 34(4):3435–3445

  • Verdon G, Marks J, Nanda S, Leichenauer S, Hidary J (2019) Quantum Hamiltonian-based models and the variational quantum thermalizer algorithm. arXiv:1910.02071

  • Wetzel SJ (2017) Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders. Physical Review E 96(2):022140

  • Wolff U (1989) Collective Monte Carlo updating for spin systems. Phys Rev Lett 62:361–364

  • Zhai S, Cheng Y, Lu W, Zhang Z (2016) Deep structured energy based models for anomaly detection. arXiv:1605.07717

  • Zhao J, Mathieu M, LeCun Y (2016) Energy-based generative adversarial network. arXiv:1609.03126

  • Zhang J, Wang H, Chu J, Huang S, Li T, Zhao Q (2019) Improved Gaussian-Bernoulli restricted Boltzmann machine for learning discriminative representations. Knowledge-Based Systems 185:104911


Author information

Corresponding author

Correspondence to Patrick Huembeli.

About this article


Cite this article

Huembeli, P., Arrazola, J.M., Killoran, N. et al. The physics of energy-based models. Quantum Mach. Intell. 4, 1 (2022). https://doi.org/10.1007/s42484-021-00057-7
