Learning DFT

Regular Article · The European Physical Journal Special Topics

Abstract

We present an extension of reverse-engineered Kohn-Sham potentials from a density matrix renormalization group calculation towards the construction of a density functional theory (DFT) functional via deep learning. Instead of applying machine learning to the energy functional itself, we apply these techniques to the Kohn-Sham potentials. To this end, we develop a scheme to train a neural network to represent the mapping from local densities to Kohn-Sham potentials. Finally, we use the neural network to up-scale the simulation to larger system sizes.


References

  1. S.R. White, Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, 2863–2866 (1992). https://doi.org/10.1103/PhysRevLett.69.2863

  2. S.R. White, R.M. Noack, Real-space quantum renormalization groups. Phys. Rev. Lett. 68, 3487–3490 (1992). https://doi.org/10.1103/PhysRevLett.68.3487

  3. S.R. White, Density matrix renormalization group. Phys. Rev. B 48, 10345 (1993)

  4. R.M. Noack, S.R. Manmana, Diagonalization- and numerical renormalization-group-based methods for interacting quantum systems, in Lectures on the physics of highly correlated electron systems IX: ninth training course in the physics of correlated electron systems and high-Tc superconductors, ed. by A. Avella, F. Mancini, vol. 789, pp. 93–163, Salerno, Italy (2005)

  5. K.A. Hallberg, New trends in density matrix renormalization. Adv. Phys. 55(5), 477–526 (2006). https://doi.org/10.1080/00018730600766432

  6. P. Hohenberg, W. Kohn, Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964). https://doi.org/10.1103/PhysRev.136.B864

  7. W. Kohn, L.J. Sham, Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965). https://doi.org/10.1103/PhysRev.140.A1133

  8. R.M. Dreizler, E.K.U. Gross, Density Functional Theory (Springer, Berlin, 1990)

  9. O. Gunnarsson, K. Schönhammer, Density-functional treatment of an exactly solvable semiconductor model. Phys. Rev. Lett. 56(18), 1968–1971 (1986). https://doi.org/10.1103/PhysRevLett.56.1968

  10. P. Schmitteckert, F. Evers, Exact ground state density-functional theory for impurity models coupled to external reservoirs and transport calculations. Phys. Rev. Lett. 100(8), 086401 (2008). https://doi.org/10.1103/PhysRevLett.100.086401

  11. P. Schmitteckert, Inverse mean field theories. Phys. Chem. Chem. Phys. 20, 27600–27610 (2018). https://doi.org/10.1039/C8CP03763A

  12. M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, Baltimore, 2015)

  13. W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259

  14. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors. Nature 323, 533 (1986)

  15. Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 3361, 3539 (1995)

  16. T. Nomi et al., tiny-dnn (2019)

  17. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/. Software available from tensorflow.org

  18. F. Chollet et al., Keras. GitHub (2015). https://github.com/fchollet/keras

  19. G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019). https://doi.org/10.1103/RevModPhys.91.045002

  20. F. Brockherde, L. Vogt, L. Li, M.E. Tuckerman, K. Burke, K.-R. Müller, Bypassing the Kohn-Sham equations with machine learning. Nature Commun. 8(1), 872 (2017). https://doi.org/10.1038/s41467-017-00839-3. ISSN 2041-1723

  21. B. Kolb, L.C. Lentz, A.M. Kolpak, Discovering charge density functionals and structure-property relationships with prophet: a general framework for coupling machine learning and first-principles methods. Sci. Rep. 7(1), 1192 (2017). https://doi.org/10.1038/s41598-017-01251-z. ISSN 2045-2322

  22. L. Hu, X. Wang, L. Wong, G. Chen, Combined first-principles calculation and neural-network correction approach for heat of formation. J. Chem. Phys. 119, 11501 (2003)

  23. X. Zheng, L.H. Hu, X.J. Wang, G.H. Chen, A generalized exchange-correlation functional: the neural-networks approach. Chem. Phys. Lett. 390(1), 186–192 (2004). https://doi.org/10.1016/j.cplett.2004.04.020, URL http://www.sciencedirect.com/science/article/pii/S0009261404005603

  24. Q. Liu, J.C. Wang, D. PengLi, H. LiHong, X. Zheng, G.H. Chen, Improving the performance of long-range-corrected exchange-correlation functional with an embedded neural network. J. Phys. Chem. A 121(38), 7273–7281 (2017). https://doi.org/10.1021/acs.jpca.7b07045. PMID: 28876064

  25. J.C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, K. Burke, Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012). https://doi.org/10.1103/PhysRevLett.108.253002

  26. J.C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller, K. Burke, Orbital-free bond breaking via machine learning. J. Chem. Phys. 139(22), 224104 (2013). https://doi.org/10.1063/1.4834075

  27. L. Li, T.E. Baker, S.R. White, K. Burke, Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016). https://doi.org/10.1103/PhysRevB.94.245129

  28. T. Giamarchi, H.J. Schulz, Anderson localization and interactions in one-dimensional metals. Phys. Rev. B 37, 325–340 (1988). https://doi.org/10.1103/PhysRevB.37.325

  29. P. Schmitteckert, T. Schulze, C. Schuster, P. Schwab, U. Eckern, Anderson localization versus delocalization of interacting fermions in one dimension. Phys. Rev. Lett. 80, 560–563 (1998). https://doi.org/10.1103/PhysRevLett.80.560

  30. P. Schmitteckert, R.A. Jalabert, D. Weinmann, J.-L. Pichard, From the Fermi glass towards the Mott insulator in one dimension: delocalization and strongly enhanced persistent currents. Phys. Rev. Lett. 81, 2308–2311 (1998). https://doi.org/10.1103/PhysRevLett.81.2308

  31. R.A. Jalabert, D. Weinmann, J.-L. Pichard, Partial delocalization of the ground state by repulsive interactions in a disordered chain. Phys. E Low-Dimens. Syst. Nanostruct. 9(3), 347–351 (2001). https://doi.org/10.1016/S1386-9477(00)00226-5, URL http://www.sciencedirect.com/science/article/pii/S1386947700002265 (Proceedings of an International Workshop and Seminar on the Dynamics of Complex Systems)

  32. P. Schmitteckert, Disordered one-dimensional Fermi systems. Density Matrix Renormal. 33, 345–355 (1999). ISBN 978-3-540-66129-0

  33. I. Peschel, X. Wang, M. Kaulke, K. Hallberg (eds.), Density Matrix Renormalization (1999). ISBN 978-3-540-66129-0

  34. K. Schönhammer, O. Gunnarsson, R.M. Noack, Density-functional theory on a lattice: comparison with exact numerical results for a model with strongly correlated electrons. Phys. Rev. B 52, 2504–2510 (1995). https://doi.org/10.1103/PhysRevB.52.2504

  35. F. Evers, P. Schmitteckert, Density functional theory with exact XC potentials: lessons from DMRG studies and exactly solvable models. Phys. Status Solidi B 250, 2330 (2013)

  36. H. Robbins, S. Monro, A stochastic approximation method. Ann. Math. Statist. 22(3), 400–407 (1951). https://doi.org/10.1214/aoms/1177729586

  37. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, ed. by Y. Bengio, Y. LeCun (2015). arXiv:1412.6980. https://dblp.org/rec/journals/corr/KingmaB14.bib

  38. J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. JMLR 12, 2121–2159 (2011)

Acknowledgements

Most of the work reported here was performed while the author was at the University of Würzburg; it was supported by ERC-StG-Thomale-TOPOLECTRICS-336012 and was presented at FQMT'19 in Prague. We would like to thank Florian Eich for insightful discussions.

All authors contributed equally to the manuscript and the acquisition of the results.

Author information

Correspondence to Peter Schmitteckert.

Appendices

Appendix A: Neural networks for fitting functions

The main application of neural networks is the classification of input variables, i.e. mapping the input to a discrete set of output variables, with the standard internet example of “is it a cat or not?”. Here we provide an example of applying a neural network to fitting a function.

A.1: The network

The basic building block of a neural network (NN) is a neuron as depicted in Fig. 3a.

The neuron consists of inputs \(\{x_i\}\), weight factors \(\{w_i\}\), an offset b, a so-called activation function \(\sigma (z)\), see Fig. 3b, and the output z:

$$\begin{aligned} z \;=\; \sigma \left( b + \sum _\ell w_\ell x_\ell \right) \end{aligned}$$
(7)

Throughout this work we have always used a \(\tanh \) activation function. One now combines many neurons, Fig. 3a, into a neural network in a layered fashion by connecting the inputs of the neurons of one layer to the outputs of the neurons of the previous layer, see Fig. 4. Since from a user perspective the NN in Fig. 4 translates the input of the first layer into the output of the last layer, one calls the first layer the input layer and the last one the output layer, while the other layers are denoted as hidden layers. If each neuron is connected to each neuron of the previous layer, the network is called dense. The training of a NN is often referred to as machine learning, and in the presence of many hidden layers as deep learning.
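
As a minimal illustration (a Python/NumPy sketch, not the code used in this work, with purely hypothetical weights and offset), a single neuron of Eq. (7) with a \(\tanh\) activation reads:

import numpy as np

def neuron(x, w, b):
    # Output of a single neuron, Eq. (7): z = tanh(b + sum_l w_l * x_l).
    return np.tanh(b + np.dot(w, x))

# Example: three inputs with placeholder weights and offset.
print(neuron(np.array([0.1, 0.5, -0.3]), np.array([1.0, -2.0, 0.5]), 0.2))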

Fig. 3 The building block of a NN: a a neuron, b typical activation functions: a sigmoid and a \(\tanh \)

Fig. 4 A neural network built out of the neurons shown in Fig. 3a

Fig. 5 Fits for the function f(x), Eq. (10), using a \(\tanh \) activation function, obtained with TensorFlow/Keras for a \(1 \times 50 \times 50 \times 1\) network. a 25,000 samples, 10 repetitions of SGD; b 25,000 samples, 100 repetitions of SGD; c 25,000 samples, 1000 repetitions of SGD; d \(2\times \) 25,000 samples, 250 repetitions (\(1\times \) SGD & \(1\times \) ADAM)

In summary, the NN in Fig. 4 calculates an output z from the input \(\{x_j\}\), where one has to specify the parameters in Eq. (7) for each neuron n in layer \(\ell \),

$$\begin{aligned} z_{\ell ,n} \;=\; \sigma \left( b_{\ell ,n} + \sum _j w_{\ell ,n,j} x_j \right) \,. \end{aligned}$$
(8)

Of course, this can be extended to create multiple output variables \(z_k\) in the output layer.
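
The layered evaluation of Eq. (8) can be sketched as follows; again this is a plain NumPy illustration with randomly initialized placeholder parameters, not one of the networks trained in this work:

import numpy as np

def init_params(sizes, rng=np.random.default_rng(0)):
    # One (weight matrix, offset vector) pair per layer; values are random placeholders.
    return [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    # Eq. (8) applied layer by layer: z_{l,n} = tanh(b_{l,n} + sum_j w_{l,n,j} x_j).
    z = np.atleast_1d(x)
    for W, b in params:
        z = np.tanh(b + W @ z)
    return z

params = init_params([1, 50, 50, 1])   # the 1 x 50 x 50 x 1 structure used in Appendix A.3
print(forward(params, 0.3))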

In order to apply a NN to fitting functions f(x), we use a NN with a single input x and a single output z. The free parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\) are the set of fitting parameters. We would like to note that this approach is in contrast to the usual approach in physics, where one tries to fit a phenomenon with a suitable function using as few fitting parameters as possible. Instead, here we take the opposite approach by using a simple fitting function unrelated to the problem and fit the desired function with a large number of parameters and a few steps of recursion.

A.2: Training the neural network: minimizing the cost function

The idea for determining the fit parameters for fitting a function \(f(\mathbf{x})\) consists in minimizing a cost function, typically

$$\begin{aligned} C(\mathbf{x}) \;=\; \frac{1}{N} \sum _{i=1}^{N} \left| \left| f(\mathbf{x}_i) - z_{\{b_{\ell ,n}, w_{\ell ,n,j}\}}(\mathbf{x}_i) \right| \right| _2 \end{aligned}$$
(9)

where N denotes the number of training samples. Eq. (9) could in principle be minimized by a standard steepest-descent gradient search. However, due to the vast number of fit parameters this is not feasible in non-trivial examples, as the number of parameters, and therefore the dimensions of the associated matrices, gets too large. The breakthrough for neural networks was provided by the invention of the backpropagation algorithm [14], combined with a stochastic evaluation of the gradients [36,37,38], the massive computational power of graphics cards, and, for pattern recognition, the use of convolutional layers [15], see below. In the example provided in this section we used the TensorFlow [17] software package combined with the Keras [18] front end.
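
A hedged Keras sketch of such a setup is given below. Hyperparameters such as the learning rate are not taken from the paper, and the output layer is kept linear here (an assumption) so that the unbounded target range of Eq. (10) can be represented, while the hidden layers use the \(\tanh\) activation of the text:

import tensorflow as tf

# Dense 1 x 50 x 50 x 1 network with tanh hidden activations, cf. Appendix A.3.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(50, activation="tanh"),
    tf.keras.layers.Dense(1),          # linear output (assumption, see above)
])

# The mean-squared error plays the role of the cost function of Eq. (9); SGD
# evaluates its gradients stochastically on mini-batches via backpropagation.
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="mse")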

Fig. 6 Fits for the function f(x), Eq. (10), using a \(\tanh \) activation function, obtained with TensorFlow/Keras for a \(1 \times 250 \times 50 \times 50 \times 1\) network, 250 repetitions (SGD & ADAM)

A.3: An example

As an example we look at the function

$$\begin{aligned} f(x) \;=\; \sin (3x) \,-\, 0.8 \cos ^2(13x)\, {\mathrm {e}}^{0.5 x} \,+\, 4\, {\mathrm {e}}^{-\frac{x^2}{0.0005}} \,+\, 6\, {\mathrm {e}}^{ -\frac{(x+0.4)^2}{0.0001}} \end{aligned}$$
(10)

which has no deeper meaning; it was just handcrafted to represent a not too trivial function combining sharp and smooth features.
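
For reference, Eq. (10) transcribes directly into Python:

import numpy as np

def f(x):
    # Target function of Eq. (10): oscillations plus two narrow Gaussian peaks.
    return (np.sin(3 * x)
            - 0.8 * np.cos(13 * x) ** 2 * np.exp(0.5 * x)
            + 4 * np.exp(-x ** 2 / 0.0005)
            + 6 * np.exp(-(x + 0.4) ** 2 / 0.0001))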

Since f is a single-valued function of a single argument, the input and output layers each consist of a single neuron only. In Fig. 5 we show the results for fitting the function f(x) in Eq. (10) with two hidden layers consisting of fifty neurons each; that is, we applied a dense NN with a \(1 \times 50 \times 50 \times 1\) structure. In order to train the system we generated 25,000 random values \(x_j\) with the corresponding \(z_j = f(x_j)\). We then trained the NN by performing a stochastic gradient descent (SGD) search with ten repetitions over the complete set of \(\{x_j, z_j\}\), and evaluated the NN on an equidistantly spaced set of \(\{x_\ell \}\). As one can see in Fig. 5a, the result is a rather smooth function that misses the sharp features. The way to improve the NN consists in learning harder, that is, increasing the repetitions of the SGD to 100, Fig. 5b, and 1000, Fig. 5c, which finally leads to a good representation of the function.
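
Continuing the Keras sketch of A.2 and using f(x) as defined above, such a training run could look as follows; the sampling interval of the \(x_j\) and the batch size are assumptions, as the text does not state them:

import numpy as np

rng = np.random.default_rng(1)

# 25,000 random training points x_j with targets z_j = f(x_j);
# the interval [-1, 1] is an assumption.
x_train = rng.uniform(-1.0, 1.0, size=(25_000, 1))
z_train = f(x_train)

# Ten passes over the full training set with SGD, as in Fig. 5a;
# raising epochs to 100 or 1000 mimics the progression of Fig. 5b, c.
model.fit(x_train, z_train, epochs=10, batch_size=32, verbose=0)

# Evaluate the trained NN on an equidistantly spaced grid for plotting.
x_grid = np.linspace(-1.0, 1.0, 2001).reshape(-1, 1)
z_fit = model.predict(x_grid, verbose=0)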

A different strategy consists in using different gradient search strategies, i.e. a different optimizer to minimize the cost function Eq. (9). In Fig. 5d we show the results where we used only 500 repetitions, however switching between an SGD and an ADAM optimizer, which performs much better than just an SGD alone. We would like to remark that a priori it is not clear which optimizer is best, and the optimizer performance seems to be rather problem dependent, see [12].
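
One possible way to alternate optimizers in Keras is to recompile the model (which keeps the trained weights) and continue training. The exact alternation schedule used for Fig. 5d is not spelled out in the text, so the split below is only indicative:

# First block of repetitions with plain SGD ...
model.compile(optimizer=tf.keras.optimizers.SGD(), loss="mse")
model.fit(x_train, z_train, epochs=250, batch_size=32, verbose=0)

# ... then recompile with Adam (the weights are kept) and continue training.
model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")
model.fit(x_train, z_train, epochs=250, batch_size=32, verbose=0)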

Finally, in Fig. 6 we present results obtained from a deeper network consisting of \(1 \times 250 \times 50 \times 50 \times 1\) neurons. On the right axis we show the actual error of the fit, which is below \(5 \times 10^{-3}\) over the complete range. As a result we obtained a rather good approximation to the function, at the expense of using more than 15,000 fit parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\).

Fig. 7 Layout of the convolutional network. The input layer is connected to two convolutional layers, which are then combined with the input layer to serve as the input of seven hidden full layers

We would like to point out that the approach of using 15,000 fit parameters may appear odd, as it renders an understanding of the network impossible. However, we are using the approach to construct a DFT functional. For the latter it is also fair to state that most users of the modern sophisticated DFT functionals have no understanding of the details of their construction.

Appendix B: A convolutional network

We also tested the setup of a convolutional network. There, in addition to full layers, one constructs a small kernel layer that gets convolved with the output of another layer. For details we refer to [12].

Specifically, we implemented a NN as displayed in Fig. 7, which resulted in 100,628 fit parameters. However, despite all the effort we could not improve on the results obtained from the (smaller) dense network.
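
A hedged sketch of a layout in the spirit of Fig. 7, written with the Keras functional API, is given below. The input length, the number of filters, the kernel widths and the widths of the seven full layers are not given in the text and are chosen here for illustration only; it is likewise assumed that the two convolutional layers act in parallel on the input.

import tensorflow as tf

L = 32                                     # number of input sites (assumption)
inp = tf.keras.Input(shape=(L, 1))

# Two convolutional layers connected to the input (sizes are placeholders).
c1 = tf.keras.layers.Conv1D(4, kernel_size=3, padding="same", activation="tanh")(inp)
c2 = tf.keras.layers.Conv1D(4, kernel_size=5, padding="same", activation="tanh")(inp)

# Combine the convolutional outputs with the input layer itself ...
merged = tf.keras.layers.Concatenate()(
    [tf.keras.layers.Flatten()(t) for t in (inp, c1, c2)])

# ... and feed the combination into seven hidden full (dense) layers.
h = merged
for _ in range(7):
    h = tf.keras.layers.Dense(64, activation="tanh")(h)

out = tf.keras.layers.Dense(L)(h)          # one output value per site (assumption)
conv_model = tf.keras.Model(inp, out)
conv_model.summary()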

About this article

Cite this article

Schmitteckert, P. Learning DFT. Eur. Phys. J. Spec. Top. 230, 1021–1029 (2021). https://doi.org/10.1140/epjs/s11734-021-00095-z
