Learning DFT

Regular Article · The European Physical Journal Special Topics

Abstract

We present an extension of reverse-engineered Kohn-Sham potentials from a density matrix renormalization group calculation towards the construction of a density functional theory (DFT) functional via deep learning. Instead of applying machine learning to the energy functional itself, we apply these techniques to the Kohn-Sham potentials. To this end, we develop a scheme to train a neural network to represent the mapping from local densities to Kohn-Sham potentials. Finally, we use the neural network to up-scale the simulation to larger system sizes.


References

  1. S.R. White, Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, 2863–2866 (1992). https://doi.org/10.1103/PhysRevLett.69.2863

  2. S.R. White, R.M. Noack, Real-space quantum renormalization groups. Phys. Rev. Lett. 68, 3487–3490 (1992). https://doi.org/10.1103/PhysRevLett.68.3487

  3. S.R. White, Density matrix renormalization group. Phys. Rev. B 48, 10345 (1993)

  4. R.M. Noack, S.R. Manmana, Diagonalization- and numerical renormalization-group-based methods for interacting quantum systems, in Lectures on the physics of highly correlated electron systems IX: ninth training course in the physics of correlated electron systems and high-Tc superconductors, ed. by A. Avella, F. Mancini, vol. 789, pp. 93–163, Salerno, Italy (2005)

  5. K.A. Hallberg, New trends in density matrix renormalization. Adv. Phys. 55(5), 477–526 (2006). https://doi.org/10.1080/00018730600766432

  6. P. Hohenberg, W. Kohn, Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964). https://doi.org/10.1103/PhysRev.136.B864

  7. W. Kohn, L.J. Sham, Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965). https://doi.org/10.1103/PhysRev.140.A1133

  8. R.M. Dreizler, E.K.U. Gross, Density Functional Theory (Springer, Berlin, 1990)

  9. O. Gunnarsson, K. Schönhammer, Density-functional treatment of an exactly solvable semiconductor model. Phys. Rev. Lett. 56(18), 1968–1971 (1986). https://doi.org/10.1103/PhysRevLett.56.1968

  10. P. Schmitteckert, F. Evers, Exact ground state density-functional theory for impurity models coupled to external reservoirs and transport calculations. Phys. Rev. Lett. 100(8), 086401 (2008). https://doi.org/10.1103/PhysRevLett.100.086401

  11. P. Schmitteckert, Inverse mean field theories. Phys. Chem. Chem. Phys. 20, 27600–27610 (2018). https://doi.org/10.1039/C8CP03763A

  12. M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, Baltimore, 2015)

  13. W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259

  14. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors. Nature 323, 533 (1986)

  15. Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 3361, 3539 (1995)

  16. T. Nomi et al., tiny-dnn (2019)

  17. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/. Software available from tensorflow.org

  18. F. Chollet et al., Keras. GitHub (2015). https://github.com/fchollet/keras

  19. G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019). https://doi.org/10.1103/RevModPhys.91.045002

  20. F. Brockherde, L. Vogt, L. Li, M.E. Tuckerman, K. Burke, K.-R. Müller, Bypassing the Kohn-Sham equations with machine learning. Nature Commun. 8(1), 872 (2017). https://doi.org/10.1038/s41467-017-00839-3. ISSN 2041-1723

  21. B. Kolb, L.C. Lentz, A.M. Kolpak, Discovering charge density functionals and structure-property relationships with prophet: a general framework for coupling machine learning and first-principles methods. Sci. Rep. 7(1), 1192 (2017). https://doi.org/10.1038/s41598-017-01251-z. ISSN 2045-2322

  22. L. Hu, X. Wang, L. Wong, G. Chen, Combined first-principles calculation and neural-network correction approach for heat of formation. J. Chem. Phys. 119, 11501 (2003)

  23. X. Zheng, L.H. Hu, X.J. Wang, G.H. Chen, A generalized exchange-correlation functional: the neural-networks approach. Chem. Phys. Lett. 390(1), 186–192 (2004). https://doi.org/10.1016/j.cplett.2004.04.020, URL http://www.sciencedirect.com/science/article/pii/S0009261404005603

  24. Q. Liu, J.C. Wang, D. PengLi, H. LiHong, X. Zheng, G.H. Chen, Improving the performance of long-range-corrected exchange-correlation functional with an embedded neural network. J. Phys. Chem. A 121(38), 7273–7281 (2017). https://doi.org/10.1021/acs.jpca.7b07045. PMID: 28876064

  25. J.C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, K. Burke, Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012). https://doi.org/10.1103/PhysRevLett.108.253002

  26. J.C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller, K. Burke, Orbital-free bond breaking via machine learning. J. Chem. Phys. 139(22), 224104 (2013). https://doi.org/10.1063/1.4834075

  27. L. Li, T.E. Baker, S.R. White, K. Burke, Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016). https://doi.org/10.1103/PhysRevB.94.245129

  28. T. Giamarchi, H.J. Schulz, Anderson localization and interactions in one-dimensional metals. Phys. Rev. B 37, 325–340 (1988). https://doi.org/10.1103/PhysRevB.37.325

  29. P. Schmitteckert, T. Schulze, C. Schuster, P. Schwab, U. Eckern, Anderson localization versus delocalization of interacting fermions in one dimension. Phys. Rev. Lett. 80, 560–563 (1998). https://doi.org/10.1103/PhysRevLett.80.560

  30. P. Schmitteckert, R.A. Jalabert, D. Weinmann, J.-L. Pichard, From the Fermi glass towards the Mott insulator in one dimension: delocalization and strongly enhanced persistent currents. Phys. Rev. Lett. 81, 2308–2311 (1998). https://doi.org/10.1103/PhysRevLett.81.2308

  31. R.A. Jalabert, D. Weinmann, J.-L. Pichard, Partial delocalization of the ground state by repulsive interactions in a disordered chain. Phys. E Low-Dimens. Syst. Nanostruct. 9(3), 347–351 (2001). https://doi.org/10.1016/S1386-9477(00)00226-5, URL http://www.sciencedirect.com/science/article/pii/S1386947700002265 (Proceedings of an International Workshop and Seminar on the Dynamics of Complex Systems)

  32. P. Schmitteckert, Disordered one-dimensional Fermi systems. Density Matrix Renormal. 33, 345–355 (1999). ISBN 978-3-540-66129-0

  33. I. Peschel, X. Wang, M. Kaulke, K. Hallberg (eds.), Density Matrix Renormalization (1999). ISBN 978-3-540-66129-0

  34. K. Schönhammer, O. Gunnarsson, R.M. Noack, Density-functional theory on a lattice: comparison with exact numerical results for a model with strongly correlated electrons. Phys. Rev. B 52, 2504–2510 (1995). https://doi.org/10.1103/PhysRevB.52.2504

  35. F. Evers, P. Schmitteckert, Density functional theory with exact XC potentials: lessons from DMRG studies and exactly solvable models. Phys. Status Solidi B 250, 2330 (2013)

  36. H. Robbins, S. Monro, A stochastic approximation method. Ann. Math. Statist. 22(3), 400–407 (1951). https://doi.org/10.1214/aoms/1177729586

  37. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, ed. by Y. Bengio, Y. LeCun (2015). arXiv:1412.6980. https://dblp.org/rec/journals/corr/KingmaB14.bib

  38. J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. JMLR 12, 2121–2159 (2011)

Acknowledgements

Most of the work reported here was performed while the author was at the University of Würzburg; it was supported by ERC-StG-Thomale-TOPOLECTRICS-336012 and was presented at FQMT'19 in Prague. We would like to thank Florian Eich for insightful discussions.

All authors contributed equally to the manuscript and the acquisition of the results.

Author information

Correspondence to Peter Schmitteckert.

Appendices

Appendix A: Neural networks for fitting functions

The main application of neural networks is the classification of input variables, i.e. mapping the input to a discrete set of output variables, with the standard internet example of “is it a cat or not?”. Here we provide an example of applying a neural network to fitting a function.

A.1: The network

The basic building block of a neural network (NN) is a neuron as depicted in Fig. 3a.

The neuron consists of inputs \(\{x_i\}\), weight factors \(\{w_i\}\), an offset b, a so-called activation function \(\sigma (z)\), see Fig. 3b, and the output z:

$$\begin{aligned} z \;=\; \sigma \left( b + \sum _\ell w_\ell x_\ell \right) \end{aligned}$$
(7)

Throughout this work we have always used a \(\tanh \) activation function. One now combines many neurons, Fig. 3a, into a neural network in a layered fashion by connecting the inputs of the neurons of one layer to the outputs of the neurons of the previous layer, see Fig. 4. Since from a user perspective the NN in Fig. 4 translates the input of the first layer into the output of the last layer, one calls the first layer the input layer and the last one the output layer, while the other layers are denoted as hidden layers. If each neuron is connected to each neuron of the previous layer, the network is called dense. The training of a NN is often referred to as machine learning, and in the presence of many hidden layers as deep learning.
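
As a minimal illustration (a Python/NumPy sketch, not the code used in this work, with purely hypothetical weights and offset), a single neuron of Eq. (7) with a \(\tanh\) activation reads:

import numpy as np

def neuron(x, w, b):
    # Output of a single neuron, Eq. (7): z = tanh(b + sum_l w_l * x_l).
    return np.tanh(b + np.dot(w, x))

# Example: three inputs with placeholder weights and offset.
print(neuron(np.array([0.1, 0.5, -0.3]), np.array([1.0, -2.0, 0.5]), 0.2))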

Fig. 3 The building block of a NN: a a neuron, b typical activation functions: a sigmoid and a \(\tanh \)

Fig. 4 A neural network built out of the neurons shown in Fig. 3a

Fig. 5 Fits for the function f(x), Eq. (10), using a \(\tanh \) activation function, obtained with TensorFlow/Keras for a \(1 \times 50 \times 50 \times 1\) network. a 25,000 samples, 10 repetitions of SGD; b 25,000 samples, 100 repetitions of SGD; c 25,000 samples, 1000 repetitions of SGD; d \(2\times \) 25,000 samples, 250 repetitions (\(1\times \) SGD & \(1\times \) ADAM)

In summary, the NN in Fig. 4 calculates an output z from the input \(\{x_j\}\), where one has to specify the parameters in Eq. (7) for each neuron n in layer \(\ell \),

$$\begin{aligned} z_{\ell ,n} \;=\; \sigma \left( b_{\ell ,n} + \sum _j w_{\ell ,n,j} x_j \right) \,. \end{aligned}$$
(8)

Of course, this can be extended to create multiple output variables \(z_k\) in the output layer.
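
The layered evaluation of Eq. (8) can be sketched as follows; again this is a plain NumPy illustration with randomly initialized placeholder parameters, not one of the networks trained in this work:

import numpy as np

def init_params(sizes, rng=np.random.default_rng(0)):
    # One (weight matrix, offset vector) pair per layer; values are random placeholders.
    return [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Eq. (8) applied layer by layer: z_{l,n} = tanh(b_{l,n} + sum_j w_{l,n,j} x_j).
    z = np.atleast_1d(x)
    for W, b in params:
        z = np.tanh(b + W @ z)
    return z

params = init_params([1, 50, 50, 1])   # the 1 x 50 x 50 x 1 structure used in Appendix A.3
print(forward(params, 0.3))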

In order to apply a NN to fitting functions f(x), we use a NN with a single input x and a single output z. The free parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\) are the set of fitting parameters. We would like to note that this approach is in contrast to the usual approach in physics, where one tries to fit a phenomenon with a suitable function using as few fitting parameters as possible. Instead, here we take the opposite approach by using a simple fitting function unrelated to the problem and fit the desired function with a large number of parameters and a few steps of recursion.

A.2: Training the neural network: minimizing the cost function

The idea for determining the fit parameters for fitting a function \(f(\mathbf{x})\) consists in minimizing a cost function, typically

$$\begin{aligned} C(\mathbf{x}) \;=\; \frac{1}{N} \sum _{i=1}^{N} \left| \left| f(\mathbf{x}_i) - z_{\{b_{\ell ,n}, w_{\ell ,n,j}\}}(\mathbf{x}_i) \right| \right| _2 \end{aligned}$$
(9)

where N denotes the number of training samples. Eq. (9) could in principle be minimized by a standard steepest-descent gradient search. However, due to the vast number of fit parameters this is not feasible in non-trivial examples, as the number of parameters, and therefore the dimensions of the associated matrices, gets too large. The breakthrough for neural networks was provided by the invention of the backpropagation algorithm [14], combined with a stochastic evaluation of the gradients [36,37,38], the massive computational power of graphics cards, and, for pattern recognition, the use of convolutional layers [15], see below. In the example provided in this section we used the TensorFlow [17] software package combined with the Keras [18] front end.
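
A hedged Keras sketch of such a setup is given below. Hyperparameters such as the learning rate are not taken from the paper, and the output layer is kept linear here (an assumption) so that the unbounded target range of Eq. (10) can be represented, while the hidden layers use the \(\tanh\) activation of the text:

import tensorflow as tf

# Dense 1 x 50 x 50 x 1 network with tanh hidden activations, cf. Appendix A.3.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(1),          # linear output (assumption, see above)
])

# The mean-squared error plays the role of the cost function of Eq. (9); SGD
# evaluates its gradients stochastically on mini-batches via backpropagation.
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="mse")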

Fig. 6 Fits for the function f(x), Eq. (10), using a \(\tanh \) activation function, obtained with TensorFlow/Keras for a \(1 \times 250 \times 50 \times 50 \times 1\) network, 250 repetitions (SGD & ADAM)

A.3: An example

As an example we look at the function

$$\begin{aligned} f(x) \;=\; \sin (3x) \,-\, 0.8 \cos ^2(13x)\, {\mathrm {e}}^{0.5 x} \,+\, 4\, {\mathrm {e}}^{-\frac{x^2}{0.0005}} \,+\, 6\, {\mathrm {e}}^{ -\frac{(x+0.4)^2}{0.0001}} \end{aligned}$$
(10)

which has no deeper meaning; it was just handcrafted to represent a not too trivial function combining sharp and smooth features.
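
For reference, Eq. (10) transcribes directly into Python:

import numpy as np

def f(x):
    # Target function of Eq. (10): oscillations plus two narrow Gaussian peaks.
    return (np.sin(3 * x)
            - 0.8 * np.cos(13 * x) ** 2 * np.exp(0.5 * x)
            + 4 * np.exp(-x ** 2 / 0.0005)
            + 6 * np.exp(-(x + 0.4) ** 2 / 0.0001))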

Since f is a single-valued function of a single argument, the input and output layers each consist of a single neuron only. In Fig. 5 we show the results for fitting the function f(x) in Eq. (10) with two hidden layers consisting of fifty neurons each; that is, we applied a dense NN with a \(1 \times 50 \times 50 \times 1\) structure. In order to train the system we generated 25,000 random values \(x_j\) with the corresponding \(z_j = f(x_j)\). We then trained the NN by performing a stochastic gradient descent (SGD) search with ten repetitions over the complete set of \(\{x_j, z_j\}\), and evaluated the NN on an equidistantly spaced set of \(\{x_\ell \}\). As one can see in Fig. 5a, the result is a rather smooth function that misses the sharp features. The way to improve the NN consists in learning harder, that is, increasing the repetitions of the SGD to 100, Fig. 5b, and 1000, Fig. 5c, which finally leads to a good representation of the function.
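
Continuing the Keras sketch of A.2 and using f(x) as defined above, such a training run could look as follows; the sampling interval of the \(x_j\) and the batch size are assumptions, as the text does not state them:

import numpy as np

rng = np.random.default_rng(1)

# 25,000 random training points x_j with targets z_j = f(x_j);
# the interval [-1, 1] is an assumption.
x_train = rng.uniform(-1.0, 1.0, size=(25_000, 1))
z_train = f(x_train)

# Ten passes over the full training set with SGD, as in Fig. 5a;
# raising epochs to 100 or 1000 mimics the progression of Fig. 5b, c.
model.fit(x_train, z_train, epochs=10, batch_size=32, verbose=0)

# Evaluate the trained NN on an equidistantly spaced grid for plotting.
x_grid = np.linspace(-1.0, 1.0, 2001).reshape(-1, 1)
z_fit = model.predict(x_grid, verbose=0)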

A different strategy consists in using different gradient search strategies, i.e. a different optimizer to minimize the cost function Eq. (9). In Fig. 5d we show the results where we used only 500 repetitions, however switching between an SGD and an ADAM optimizer, which performs much better than just an SGD alone. We would like to remark that a priori it is not clear which optimizer is best, and the optimizer performance seems to be rather problem dependent, see [12].
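
One possible way to alternate optimizers in Keras is to recompile the model (which keeps the trained weights) and continue training. The exact alternation schedule used for Fig. 5d is not spelled out in the text, so the split below is only indicative:

# First block of repetitions with plain SGD ...
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="mse")
model.fit(x_train, z_train, epochs=250, batch_size=32, verbose=0)

# ... then recompile with Adam (the weights are kept) and continue training.
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(x_train, z_train, epochs=250, batch_size=32, verbose=0)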

Finally, in Fig. 6 we present results obtained from a deeper network consisting of \(1 \times 250 \times 50 \times 50 \times 1\) neurons. On the right axis we show the actual error of the fit, which is below \(5 \times 10^{-3}\) over the complete range. As a result we obtained a rather good approximation to the function, at the expense of using more than 15,000 fit parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\).

Fig. 7 Layout of the convolutional network. The input layer is connected to two convolutional layers, which are then combined with the input layer to serve as the input of seven hidden full layers

We would like to point out that the approach of using 15,000 fit parameters may appear odd, as it renders an understanding of the network impossible. However, we are using the approach to construct a DFT functional. For the latter it is also fair to state that most users of the modern sophisticated DFT functionals have no understanding of the details of their construction.

Appendix B: A convolutional network

We also tested the setup of a convolutional network. There, in addition to full layers, one constructs a small kernel layer that gets convolved with the output of another layer. For details we refer to [12].

Specifically, we implemented a NN as displayed in Fig. 7, which resulted in 100,628 fit parameters. However, despite all the effort we could not improve on the results obtained from the (smaller) dense network.
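
A hedged sketch of a layout in the spirit of Fig. 7, written with the Keras functional API, is given below. The input length, the number of filters, the kernel widths and the widths of the seven full layers are not given in the text and are chosen here for illustration only; it is likewise assumed that the two convolutional layers act in parallel on the input.

import tensorflow as tf

L = 32                                     # number of input sites (assumption)
inp = tf.keras.Input(shape=(L, 1))

# Two convolutional layers connected to the input (sizes are placeholders).
c1 = tf.keras.layers.Conv1D(4, kernel_size=3, padding="same", activation="tanh")(inp)
c2 = tf.keras.layers.Conv1D(4, kernel_size=5, padding="same", activation="tanh")(inp)

# Combine the convolutional outputs with the input layer itself ...
merged = tf.keras.layers.Concatenate()(
    [tf.keras.layers.Flatten()(t) for t in (inp, c1, c2)])

# ... and feed the combination into seven hidden full (dense) layers.
h = merged
for _ in range(7):
    h = tf.keras.layers.Dense(64, activation="tanh")(h)

out = tf.keras.layers.Dense(L)(h)          # one output value per site (assumption)
conv_model = tf.keras.Model(inp, out)
conv_model.summary()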

About this article

Cite this article

Schmitteckert, P. Learning DFT. Eur. Phys. J. Spec. Top. 230, 1021–1029 (2021). https://doi.org/10.1140/epjs/s11734-021-00095-z
