Abstract
We present an extension of reverse-engineered Kohn-Sham potentials from a density matrix renormalization group calculation towards the construction of a density functional theory functional via deep learning. Instead of applying machine learning to the energy functional itself, we apply these techniques to the Kohn-Sham potentials. To this end, we develop a scheme to train a neural network to represent the mapping from local densities to Kohn-Sham potentials. Finally, we use the neural network to up-scale the simulation to larger system sizes.
References
S.R. White, Density matrix formulation for quantum renormalization groups. Phys. Rev. Lett. 69, 2863–2866 (1992). https://doi.org/10.1103/PhysRevLett.69.2863
S.R. White, R.M. Noack, Real-space quantum renormalization groups. Phys. Rev. Lett. 68, 3487–3490 (1992). https://doi.org/10.1103/PhysRevLett.68.3487
S.R. White, Density matrix renormalization group. Phys. Rev. B 48, 10345 (1993)
R.M. Noack, S.R. Manmana, Diagonalization- and numerical renormalization-group-based methods for interacting quantum systems, in Lectures on the physics of highly correlated electron systems IX: ninth training course in the physics of correlated electron systems and high-tc superconductors, ed. by A. Avella, F. Mancini, vol. 789, pp. 93–163, Salerno, Italy (2005)
K.A. Hallberg, New trends in density matrix renormalization. Adv. Phys. 55(5), 477–526 (2006). https://doi.org/10.1080/00018730600766432
P. Hohenberg, W. Kohn, Inhomogeneous electron gas. Phys. Rev. 136, B864–B871 (1964). https://doi.org/10.1103/PhysRev.136.B864
W. Kohn, L.J. Sham, Self-consistent equations including exchange and correlation effects. Phys. Rev. 140, A1133–A1138 (1965). https://doi.org/10.1103/PhysRev.140.A1133
R.M. Dreizler, E.K.U. Gross, Density Functional Theory (Springer, Berlin, 1990)
O. Gunnarsson, K. Schönhammer, Density-functional treatment of an exactly solvable semiconductor model. Phys. Rev. Lett. 56(18), 1968–1971 (1986). https://doi.org/10.1103/PhysRevLett.56.1968
P. Schmitteckert, F. Evers, Exact ground state density-functional theory for impurity models coupled to external reservoirs and transport calculations. Phys. Rev. Lett. 100(8), 086401 (2008). https://doi.org/10.1103/PhysRevLett.100.086401
P. Schmitteckert, Inverse mean field theories. Phys. Chem. Chem. Phys. 20, 27600–27610 (2018). https://doi.org/10.1039/C8CP03763A
M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, Baltimore, 2015)
W.S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943). https://doi.org/10.1007/BF02478259
D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning representations by back-propagating errors. Nature 323, 533 (1986)
Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 3361, 3539 (1995)
T. Nomi et al., tiny-dnn (2019)
M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems (2015), https://www.tensorflow.org/. Software available from tensorflow.org
F. Chollet et al., Keras. GitHub (2015). https://github.com/fchollet/keras
G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019). https://doi.org/10.1103/RevModPhys.91.045002
F. Brockherde, L. Vogt, L. Li, M.E. Tuckerman, K. Burke, K.-R. Müller, Bypassing the Kohn-Sham equations with machine learning. Nat. Commun. 8(1), 872 (2017). https://doi.org/10.1038/s41467-017-00839-3. ISSN 2041-1723
B. Kolb, L.C. Lentz, A.M. Kolpak, Discovering charge density functionals and structure-property relationships with prophet: a general framework for coupling machine learning and first-principles methods. Sci. Rep. 7(1), 1192 (2017). https://doi.org/10.1038/s41598-017-01251-z. ISSN 2045-2322
L. Hu, X. Wang, L. Wong, G. Chen, Combined first-principles calculation and neural-network correction approach for heat of formation. J. Chem. Phys. 119, 11501 (2003)
X. Zheng, L.H. Hu, X.J. Wang, G.H. Chen, A generalized exchange-correlation functional: the neural-networks approach. Chem. Phys. Lett. 390(1), 186–192 (2004). https://doi.org/10.1016/j.cplett.2004.04.020, URL http://www.sciencedirect.com/science/article/pii/S0009261404005603
Q. Liu, J.C. Wang, D. PengLi, H. LiHong, X. Zheng, G.H. Chen, Improving the performance of long-range-corrected exchange-correlation functional with an embedded neural network. J. Phys. Chem. A 121(38), 7273–7281 (2017). https://doi.org/10.1021/acs.jpca.7b07045. PMID: 28876064
J.C. Snyder, M. Rupp, K. Hansen, K.-R. Müller, K. Burke, Finding density functionals with machine learning. Phys. Rev. Lett. 108, 253002 (2012). https://doi.org/10.1103/PhysRevLett.108.253002
J.C. Snyder, M. Rupp, K. Hansen, L. Blooston, K.-R. Müller, K. Burke, Orbital-free bond breaking via machine learning. J. Chem. Phys. 139(22), 224104 (2013). https://doi.org/10.1063/1.4834075
L. Li, T.E. Baker, S.R. White, K. Burke, Pure density functional for strong correlation and the thermodynamic limit from machine learning. Phys. Rev. B 94, 245129 (2016). https://doi.org/10.1103/PhysRevB.94.245129
T. Giamarchi, H.J. Schulz, Anderson localization and interactions in one-dimensional metals. Phys. Rev. B 37, 325–340 (1988). https://doi.org/10.1103/PhysRevB.37.325
P. Schmitteckert, T. Schulze, C. Schuster, P. Schwab, U. Eckern, Anderson localization versus delocalization of interacting fermions in one dimension. Phys. Rev. Lett. 80, 560–563 (1998). https://doi.org/10.1103/PhysRevLett.80.560
P. Schmitteckert, R.A. Jalabert, D. Weinmann, J.-L. Pichard, From the Fermi glass towards the Mott insulator in one dimension: delocalization and strongly enhanced persistent currents. Phys. Rev. Lett. 81, 2308–2311 (1998). https://doi.org/10.1103/PhysRevLett.81.2308
R.A. Jalabert, D. Weinmann, J.-L. Pichard, Partial delocalization of the ground state by repulsive interactions in a disordered chain. Phys. E Low-Dimens. Syst. Nanostruct. 9(3), 347–351 (2001). https://doi.org/10.1016/S1386-9477(00)00226-5, URL http://www.sciencedirect.com/science/article/pii/S1386947700002265 (Proceedings of an International Workshop and Seminar on the Dynamics of Complex Systems)
P. Schmitteckert, Disordered one-dimensional fermi systems. Density Matrix Renormal. 33, 345–355 (1999). ISBN 978-3-540-66129-0
I. Peschel, X. Wang, M. Kaulke, K. Hallberg (eds.), Density Matrix Renormalization (1999). ISBN 978-3-540-66129-0
K. Schönhammer, O. Gunnarsson, R.M. Noack, Density-functional theory on a lattice: comparison with exact numerical results for a model with strongly correlated electrons. Phys. Rev. B 52, 2504–2510 (1995). https://doi.org/10.1103/PhysRevB.52.2504
F. Evers, P. Schmitteckert, Density functional theory with exact xc-potentials: lessons from DMRG studies and exactly solvable models. Phys. Status Solidi B 250, 2330 (2013)
H. Robbins, S. Monro, A stochastic approximation method. Ann. Math. Statist. 22(3), 400–407 (1951). https://doi.org/10.1214/aoms/1177729586
D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings, ed. by Y. Bengio, Y. LeCun (2015). arXiv:1412.6980. https://dblp.org/rec/journals/corr/KingmaB14.bib
J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. JMLR 12, 2121–2159 (2011)
Acknowledgements
Most of the work reported here was performed while the author was at the University of Würzburg; it was supported by ERC-StG-Thomale-TOPOLECTRICS-336012 and was presented at FQMT’19 in Prague. We would like to thank Florian Eich for insightful discussions.
All authors contributed equally to the manuscript and the acquisition of the results.
Appendices
Appendix A: Neural networks for fitting functions
The main application of neural networks is the classification of input variables, i.e. mapping the input to a discrete set of output variables, with the standard internet example being “is it a cat or not?”. Here we provide an example of applying a neural network to fitting a function.
A.1: The network
The basic building block of a neural network (NN) is a neuron as depicted in Fig. 3a.
The neuron consists of an input \(\{x_i\}\), weight factors \(\{w_i\}\), an offset b, a so-called activation function \(\sigma (z)\), see Fig. 3b, and the output z:

\( z = \sigma \left( \sum _i w_i\, x_i + b \right). \)   (7)
Throughout this work we have always used a \(\tanh \) activation function. One now combines many neurons, Fig. 3a, into a neural network in a layered fashion by connecting the inputs of the neurons of one layer with the outputs of the neurons of the previous layer, see Fig. 4. Since from a user perspective the NN in Fig. 4 translates the input of the first layer into the output of the last layer, one calls the first layer the input layer and the last one the output layer, while the remaining layers are denoted as hidden layers. If each neuron is connected to every neuron of the previous layer, one calls the network dense. The training of a NN is often referred to as machine learning, and in the presence of many hidden layers as deep learning.
In summary, the NN in Fig. 4 calculates an output z from the input \(\{x_j\}\), where one has to specify the parameters in Eq. (7) for each neuron n in layer \(\ell \),

\( z_{\ell ,n} = \sigma \left( \sum _j w_{\ell ,n,j}\, z_{\ell -1,j} + b_{\ell ,n} \right). \)   (8)
Of course, this can be extended to create multiple output variables \(z_k\) in the output layer.
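As a concrete illustration, the layered forward pass of Eqs. (7) and (8) can be sketched in a few lines of plain Python; the function names and the data layout are ours and not part of the implementation used in this work:

```python
import math

def neuron(x, w, b):
    # Eq. (7): a single neuron with tanh activation
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)

def forward(x, layers):
    # Eq. (8): propagate the input through the network layer by layer;
    # each layer is a list of (weights, bias) pairs, one per neuron,
    # whose weights connect to all outputs of the previous layer.
    for layer in layers:
        x = [neuron(x, w, b) for w, b in layer]
    return x

# a dense 1 x 2 x 1 toy network with hand-picked parameters
net = [
    [([0.5], 0.1), ([-0.3], 0.0)],   # hidden layer, two neurons
    [([1.0, 1.0], 0.0)],             # output layer, one neuron
]
z = forward([0.2], net)              # single input, single output
```

Extending this to multiple output variables \(z_k\) merely means giving the last layer more than one neuron.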
In order to apply a NN to fitting functions f(x) we use a NN with a single input x and a single output z. The free parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\) are the set of fitting parameters. We would like to note that this is in contrast to the usual approach in physics, where one tries to fit a phenomenon with a suitable function using as few fitting parameters as possible. Here we take the opposite approach: we use a simple fitting function unrelated to the problem and fit the desired function with a large number of parameters and a few steps of recursion.
A.2: Training the neural network: minimize cost function
The idea behind determining the fit parameters for fitting a function \(f(\mathbf{x})\) consists in minimizing a cost function, typically

\( C = \frac{1}{2N} \sum _{j=1}^{N} \left| z(\mathbf{x}_j) - f(\mathbf{x}_j) \right| ^2, \)   (9)
where N denotes the number of training samples. Eq. (9) could in principle be minimized by a standard steepest-descent gradient search. However, due to the vast number of fit parameters this is not feasible in non-trivial examples, as the number of parameters, and therefore the dimensions of the associated matrices, become too large. The breakthrough for neural networks was provided by the invention of the backpropagation algorithm [14], combined with a stochastic evaluation of the gradients [36,37,38], the massive computational power of graphics cards, and, for pattern recognition, the use of convolutional layers [15], see below. In the example provided in this section we used the TensorFlow [17] software package combined with the Keras [18] front end.
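A minimal sketch of such a stochastic gradient search follows; the learning rate, mini-batch size, and the generic `grad_fn` callback are illustrative choices of ours, not the settings used for the results in this work:

```python
import random

def sgd(params, grad_fn, samples, lr=0.1, batch_size=5, epochs=50):
    # Stochastic approximation of a steepest-descent search on the cost
    # Eq. (9): the gradient is evaluated on small random mini-batches
    # instead of the full training set.
    params = list(params)
    for _ in range(epochs):
        random.shuffle(samples)
        for i in range(0, len(samples), batch_size):
            grads = grad_fn(params, samples[i:i + batch_size])
            params = [p - lr * g for p, g in zip(params, grads)]
    return params

# toy usage: fit the model z = p * x to data generated from f(x) = 2 x
data = [(x / 10, 2 * x / 10) for x in range(1, 11)]

def grad_fn(params, batch):
    # gradient of the mean squared error for the model z = p * x
    p = params[0]
    return [sum(2 * (p * x - y) * x for x, y in batch) / len(batch)]

fitted = sgd([0.0], grad_fn, data)  # converges towards p = 2
```

In a real NN the `grad_fn` callback is where backpropagation enters: it evaluates the gradients of Eq. (9) with respect to all \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\) by a backward sweep through the layers.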
A.3: An example
As an example we look at the function
which has no deeper meaning; it was simply handcrafted to represent a not-too-trivial function combining sharp and non-sharp features.
Since f is a single-valued function of a single argument, the input and output layers consist of a single neuron each. In Fig. 5 we show the results for fitting the function f(x) in Eq. (10) with two hidden layers of fifty neurons each, i.e. a dense NN with a \(1 \times 50 \times 50 \times 1\) structure. In order to train the network we generated 25,000 random values \(x_j\) with the corresponding \(z_j = f(x_j)\). We then trained the NN by performing a stochastic gradient descent (SGD) search with ten repetitions over the complete set of \(\{x_j, z_j\}\), and evaluated the NN on an equidistantly spaced set of \(\{x_\ell \}\). As one can see in Fig. 5a, the result is a rather smooth function that misses the sharp features. One way to improve the NN consists in training harder, that is, increasing the number of SGD repetitions to 100, Fig. 5b, and 1000, Fig. 5c, which finally leads to a good representation of the function.
A different strategy consists in using a different gradient search strategy, i.e. a different optimizer to minimize the cost function Eq. (9). In Fig. 5d we show the results where we used only 500 repetitions, but switched between an SGD and an Adam optimizer, which performs much better than SGD alone. We would like to remark that a priori it is not clear which optimizer is best; the optimizer performance seems to be rather problem dependent, see [12].
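For reference, a single Adam update step [37] can be sketched in plain Python with the default hyper-parameters proposed in that reference; this minimal implementation is ours:

```python
import math

def adam_step(params, grads, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponentially decaying averages of the gradient (m)
    # and of its square (v), with bias correction for the first steps.
    m, v, t = state
    t += 1
    m = [b1 * mi + (1 - b1) * g for mi, g in zip(m, grads)]
    v = [b2 * vi + (1 - b2) * g * g for vi, g in zip(v, grads)]
    params = [p - lr * (mi / (1 - b1**t)) / (math.sqrt(vi / (1 - b2**t)) + eps)
              for p, mi, vi in zip(params, m, v)]
    return params, (m, v, t)

# first step with a constant gradient of 1: the update is close to -lr,
# since the bias-corrected averages both equal the gradient itself
p, state = adam_step([0.0], [1.0], ([0.0], [0.0], 0))
```

The per-parameter rescaling by \(\sqrt{v}\) is what distinguishes Adam from plain SGD and is one reason its performance differs from problem to problem.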
Finally, in Fig. 6 we present results obtained from a deeper network with a \(1 \times 250 \times 50 \times 50 \times 1\) structure. On the right axis we show the actual error of the fit, which is below \(5 \times 10^{-3}\) over the complete range. In result we obtained a rather good approximation to the function, at the expense of using more than 15,000 fit parameters \(\{b_{\ell ,n}, w_{\ell ,n,j}\}\).
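The quoted parameter count is easy to verify: a dense network has one weight per connection between consecutive layers plus one bias per neuron. A small helper of ours:

```python
def n_params(widths):
    # weights (n_in * n_out) plus biases (n_out) between
    # consecutive layers of a dense network
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

n = n_params([1, 250, 50, 50, 1])  # the deeper network of Fig. 6
# 500 + 12550 + 2550 + 51 = 15651, i.e. just above 15,000
```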
We would like to point out that using more than 15,000 fit parameters may appear odd, as it renders an understanding of the network impossible. However, we are using this approach to construct a DFT functional, and for the latter it is fair to state that most users of modern sophisticated DFT functionals have no understanding of the details of their construction.
Appendix B: A convolutional network
We also tested the setup of a convolutional network. There, in addition to dense layers, one constructs a small kernel layer that gets convolved with the output of another layer. For details we refer to [12].
Specifically, we implemented a NN as displayed in Fig. 7, which resulted in 100,628 fit parameters. However, despite all the effort we could not improve on the results obtained from the (smaller) dense network.
Cite this article
Schmitteckert, P. Learning DFT. Eur. Phys. J. Spec. Top. 230, 1021–1029 (2021). https://doi.org/10.1140/epjs/s11734-021-00095-z