Introduction

Density functional theory (DFT)1,2 has become one of the most valuable computational tools for the materials research community. It has guided the discovery of new catalysts3,4, the design of materials for energy storage5,6,7,8, and the exploration of material behavior under extreme conditions9,10,11, among other applications. The success of DFT lies in the transformation of the cumbersome many-electron many-nuclear problem of quantum mechanics to an effective one-electron Kohn–Sham (KS) equation2. Solving the KS equation for a material with a given atomic configuration provides information about the ground state electronic structure of the system in the form of one-electron wave functions (or charge density) and one-electron eigenvalues (or density of states). These quantities, i.e., either the wave functions and eigenvalues or the charge density and density of states, are the most essential and complete information of the material from which a host of properties can be computed, such as the potential energy, atomic forces, and stress tensor. DFT-based research has seen several advancements over the last several decades in the areas of theory, algorithms, and computational infrastructure, instrumental in the above-mentioned discoveries. Nevertheless, practical and routine DFT calculations of complex materials involving several thousands of atoms to probe phenomena that occur over timescales of the order of nanoseconds or longer remain inaccessible.

Over the last decade, machine learning (ML) based approaches have been actively explored as a way to meet the length- and time-scale demands encountered in DFT computations. ML provides a powerful pathway to replace a cumbersome or expensive “input–output” problem with a cheap “surrogate” model. The accuracy and versatility of such models depend on the number and diversity of input–output examples the model has seen and on its internal architecture. The past decade has seen several successful ML efforts applied to various material properties and application spaces12,13,14,15,16,17,18,19,20,21,22,23,24,25,26.

The present contribution attempts to provide an efficient emulation of DFT by treating the KS equation itself as an input–output problem. Of relevance to the present contribution are our own past work27,28, in which the problem in question was addressed to a limited level, and the recent work by Brockherde et al.29. The latter work attempts to bypass explicitly solving the KS equation using a plane-wave basis representation of the electron density. While the accuracy of the model was demonstrated for small molecules, it was not transferable to large systems. Since this first attempt, two main methodologies have been investigated to predict the charge density: grid-based schemes27,30,31,32 and atom-based representations in terms of basis functions33,34,35,36,37. The main advantage of a grid-based approach is the high accuracy obtained and general applicability to localized and delocalized electron densities. However, no information about individual atomic charges can be retrieved, and the high computational cost hinders its applicability to large databases. On the other hand, predicting the charge density as atomic contributions in terms of a basis set significantly reduces the computational cost and provides information on the individual atomic charges at the cost of lower accuracy, especially in systems with a delocalized electron density. One important advantage of atom-based representations is the higher transferability to new and larger systems, which is essential for a successful deployment of the model. Nevertheless, challenges still remain with respect to achieving comparable model performance for larger systems relative to smaller systems used during the training phase. 
Although methods have been developed to varying degrees of success to predict either the electronic structure or basic atomic properties such as total potential energy, atomic forces, or stress tensor, there is yet no scheme that has successfully unified simultaneous prediction of both types of properties in a comprehensive KS-DFT emulation.

In this work, the KS equation is handled in an alternative manner using a deep learning scheme, both in terms of methodological advancements and applicability, which predicts the electron density first and then employs it as an additional descriptor of the material to further predict other electronic and atomic quantities. The electronic quantities predicted other than the charge density are the density of states, valence band maximum (VBM), conduction band minimum (CBM), and band gap (Egap). The atomic or global quantities predicted are the total potential energy, atomic forces, and stress tensor, essential for molecular dynamics (MD) simulations. Our scheme also expands the flexibility and transferability of the model, allowing for training on molecules, 2D, and 3D systems within a large chemical space composed of carbon (C), hydrogen (H), nitrogen (N), and oxygen (O). Overall, the present contribution allows for a near-complete DFT emulation within a practical context, surpassing previous works27,33,34,37,38 in terms of methodological advancements as well as expanding the portfolio of predicted quantities.

Figure 1 shows the critical components of our ML workflow. We created a reference, or “training,” dataset composed of molecules, polymer chains, and polymer crystal structures containing C, H, N, and O, together with their corresponding properties computed using traditional DFT. Each reference atomic configuration is then represented using an ML-friendly atomic fingerprinting scheme. In this work, we used the atom-centered AGNI fingerprints39,40, which represent the structural and atomic-level chemical environment of each atom in a machine-readable form that is translation, permutation, and rotation invariant. Because some of the predicted properties are not rotation invariant (namely the electron density, atomic forces, and stress tensor), we define an intermediate internal reference system that allows us to easily transform any quantity to its value in the reference system of choice. To establish a direct (and nonlinear) mapping of the fingerprints (the “input”) to the spectrum of properties mentioned earlier (the “output”), we used deep neural networks.

Fig. 1: ML-DFT database and two-step workflow.

a The reference database contains DFT data from organic molecules, polymer chains, and polymer crystals. After creating the database, the atomic configurations are fingerprinted to describe the structural and chemical environment of each atom (b). Within step 1 (c), the resulting atomic fingerprints are used as the input layer to predict the electronic charge density in terms of Gaussian-type orbital (GTO) descriptors. The projection of these GTOs onto grid points provides the charge density. In step 2 (d), the combined atomic fingerprints and charge density descriptors serve as input for the prediction of other DFT properties such as potential energy, atomic forces, stress tensor, density of states, valence band maximum, conduction band minimum, and the bandgap.

Inspired by DFT, we give particular prominence to the electronic charge density and follow a two-step learning procedure. The first learning problem (step 1) involves predicting the electronic charge density given just the atomic configuration. Our protocol employs Gaussian-type orbitals (GTOs) as descriptors of the electronic charge density, but we do not use a predefined basis set; the model learns the most optimal basis from the data examples, thus expanding the flexibility of the model. Once the electronic charge density descriptors have been predicted, they are supplied as an auxiliary input (along with the atomic configuration fingerprints) to predict all other properties (listed as step 2 of Fig. 1). This strategy is consistent with the core concept underlying DFT (that the electronic charge density determines all properties of the system), and is aligned with the first rudimentary ML attempt almost a decade ago in which a variety of properties were predicted given just the electronic charge density41. Furthermore, in practice, this 2-step route also leads to more accurate and transferable results, as we will show below.
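As a rough illustration of how atom-centered Gaussian descriptors can be turned into a charge density on grid points, the sketch below sums spherically symmetric (s-type) Gaussians only. The function name `project_density`, the restriction to s-type functions, and all coefficients and exponents are assumptions made for this example; the actual functional form used by the model is described in the SI.

```python
import math

def project_density(atoms, grid_points):
    """Sum s-type Gaussians centred on each atom at every grid point.

    atoms: list of (position, [(coefficient, exponent), ...]) tuples.
    grid_points: list of (x, y, z) tuples.
    """
    density = []
    for g in grid_points:
        value = 0.0
        for position, gaussians in atoms:
            # Squared distance between the grid point and the atom.
            r2 = sum((gi - pi) ** 2 for gi, pi in zip(g, position))
            for coeff, exponent in gaussians:
                value += coeff * math.exp(-exponent * r2)
        density.append(value)
    return density

# Hypothetical two-atom example with made-up coefficients and exponents.
atoms = [
    ((0.0, 0.0, 0.0), [(1.0, 0.5), (0.3, 2.0)]),
    ((1.0, 0.0, 0.0), [(0.8, 1.0)]),
]
grid = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]
rho = project_density(atoms, grid)
```

In this picture, the coefficient and exponent pairs play the role of the learned charge density descriptors, and the grid values are what would be compared against the reference DFT density.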

Results

Database

In this work, we focused on organic materials composed of four elements: C, H, N, and O. We created a database containing 67 molecules, 178 polymer chains, and 55 polymer crystals composed of C-C single, double, and triple bonds, as well as aromatic rings. To provide the neural network with sufficient examples of configurational diversity, we procured random snapshots of each type of structure from DFT-based MD runs. For the molecules and polymer chains, the MD runs were performed at 300 K, and for the polymer crystals, the MD runs involved temperatures from 100 to 2500 K. In total, we used over 118,000 structures for the training and testing of ML-DFT. For each type of structure, we divided the selected configurations into training and test sets, following a 90:10 split. Additionally, the models were trained using an 80:20 split of the training set between training and validation. All performance results were computed using the independent test set. More details can be found in the Supplementary Information (SI). All the DFT reference data calculations were performed using the Vienna Ab Initio Simulation Package (VASP)42,43.
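The nested splits described above can be made concrete with a short calculation. The sketch below is only illustrative: it assumes exactly 118,000 structures (the text says "over 118,000") and applies the ratios to the whole set at once, whereas the splits in this work were made per structure type. The function name `split_counts` is hypothetical.

```python
def split_counts(n_total, test_fraction=0.10, val_fraction=0.20):
    """Return (train, validation, test) sizes for a nested split."""
    # First hold out the independent test set (90:10).
    n_test = round(n_total * test_fraction)
    n_train_val = n_total - n_test
    # Then split the remainder into training and validation (80:20).
    n_val = round(n_train_val * val_fraction)
    n_train = n_train_val - n_val
    return n_train, n_val, n_test

counts = split_counts(118_000)
```

Under these assumptions the effective proportions are 72% training, 18% validation, and 10% test.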

Fingerprinting

Within this work, we have employed two different types of fingerprints or descriptors: atomic (or structural) fingerprints and electron charge density descriptors. The atomic descriptors are the AGNI atomic fingerprints, which describe the structural and chemical environment of each atomic configuration. These previously developed AGNI atomic fingerprints have been used to create ML potentials and force fields44,45,46 for a variety of materials as well as in our previous work on predicting the grid-based electronic structure27 and the atom-based density of states28. The atomic fingerprints, computed for each atom, combine scalar, vector, and tensor-like expressions by summing over various Gaussian functions, resulting in translation, permutation, and rotation invariant descriptors.

The predicted electronic charge density descriptors constitute the second type of fingerprints used in this work. Once all the configurations have been fingerprinted, the AGNI atomic fingerprints are used as input for the charge density model, which predicts the decomposition of the atomic charge density in terms of GTO basis functions. The set of optimal GTO basis functions is selected by the model in terms of the exponent of each Gaussian and the constant multiplying it; no initial decomposition of the reference charge density in terms of GTOs is performed. The training error is computed by projecting each set of atomic GTO basis functions onto the same grid points used for the reference DFT charge density. As the input atomic fingerprints are translation, permutation, and rotation invariant, the predicted constants and exponents of the GTO decomposition are expressed in the internal reference system of the atom. Because the reference electron charge density is not rotation invariant, a transformation from each atom’s internal reference system to the global reference system of the electron charge density (the Cartesian system) is required before projecting the predicted charge density onto the grid points. The vectors for the transformation matrix of each atom are defined using the two nearest neighbors (independent of the element type): the first vector points to the first nearest neighbor, the second vector is perpendicular to the plane containing the central atom and its two nearest neighbors, and the third vector is perpendicular to the first two. All vectors are normalized to obtain an orthonormal reference system (more details in the SI).
The resulting transformation matrix is used to convert from the orthonormal internal reference system of each atom onto the Cartesian reference system, allowing the transformation of any rotation-invariant value of a property (such as the decomposition onto GTOs) onto the Cartesian reference system. Because of the computational cost of projecting the GTOs onto grid points, we restricted the training to structures with up to 50 atoms per element. Once the model is trained, the predicted constants and exponents for each atomic fingerprint will be referred to as charge density descriptors. Unlike atomic fingerprints, which are determined by a set of predefined equations, these charge density descriptors are learned by the neural network and provide an electronic description of the system. More details on the charge density model can be found in the SI.
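The construction of the internal reference system from the two nearest neighbors can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`local_frame`, `to_cartesian`) are hypothetical, the handedness and sign conventions are chosen arbitrarily, and the degenerate case of collinear neighbors (zero cross product) is not handled. The conventions actually used are given in the SI.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def local_frame(center, neighbors):
    """Orthonormal axes built from the two nearest neighbours of `center`."""
    ordered = sorted(neighbors, key=lambda p: math.dist(center, p))
    d1 = sub(ordered[0], center)   # towards the first nearest neighbour
    d2 = sub(ordered[1], center)   # towards the second nearest neighbour
    e1 = normalize(d1)
    e2 = normalize(cross(d1, d2))  # normal to the plane of the three atoms
    e3 = normalize(cross(e1, e2))  # perpendicular to the first two axes
    return e1, e2, e3

def to_cartesian(frame, local_vec):
    """Express a vector given in the local frame in Cartesian coordinates."""
    return tuple(sum(local_vec[k] * frame[k][i] for k in range(3))
                 for i in range(3))

center = (0.0, 0.0, 0.0)
neighbors = [(0.0, 0.0, 3.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
frame = local_frame(center, neighbors)
v_cart = to_cartesian(frame, (1.0, 2.0, 3.0))
```

Because the axes depend only on relative atomic positions, rotating the whole structure rotates the frame with it, which is what allows rotation-invariant network outputs to be mapped back to Cartesian quantities.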

Charge density prediction

To study the performance of the charge density model, we computed the mean absolute percentage error (ϵρ) for each configuration as

$${\epsilon }_{\rho }( \% )=100\cdot \frac{{\sum }_{j}| {\rho }_{{{{\rm{DFT}}}}}({r}_{j})-{\rho }_{{{{\rm{ML}}}}}({r}_{j})| }{{\sum }_{j}{\rho }_{{{{\rm{DFT}}}}}({r}_{j})}$$
(1)

where ρDFT(rj) and ρML(rj) are, respectively, the reference DFT charge density and ML-DFT charge density at grid point j, for the same configuration.
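Equation (1) translates directly into a few lines of code. The sketch below assumes the two densities are given as flat lists sampled on the same grid; the function name `density_percentage_error` and the sample values are made up for illustration.

```python
def density_percentage_error(rho_dft, rho_ml):
    """Percentage error of Eq. (1) for densities on a common grid."""
    numerator = sum(abs(d - m) for d, m in zip(rho_dft, rho_ml))
    denominator = sum(rho_dft)
    return 100.0 * numerator / denominator

# Hypothetical three-point grid.
rho_dft = [1.0, 2.0, 1.0]
rho_ml = [1.1, 1.9, 1.0]
error = density_percentage_error(rho_dft, rho_ml)
```

For this toy example the absolute differences sum to 0.2 and the reference density sums to 4.0, giving an error of about 5%.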

The accuracy of the charge density model can be observed in Fig. 2a, where ϵρ for the test configurations of the training set ranges mainly from 1.0% to 3.0%, with a few cases extending up to 5.0%. Most notably, the performance on the new structures (more than 50 atoms per element), which were not included during training, is very similar to that on the structures used for training. An overall performance value can be calculated as the mean \({\bar{\epsilon }}_{\rho }\) computed by summing ϵρ for all configurations and normalizing by the number of electrons in each configuration. For the test configurations, \({\bar{\epsilon }}_{\rho }\) is 1.75%, increasing slightly to 1.97% for the new structures. Figure 2b presents the valence charge density difference between the reference DFT and predicted ML-DFT for various atomic structures: the cyclobutane molecule, three different polymer chains, and two polymer crystals. The cyan and yellow isosurfaces refer to an error of ± 0.005 e bohr−3. The isosurfaces occupy a very small volume due to the high accuracy of the predicted ML-DFT charge density.

Fig. 2: Performance of ML-DFT charge density model.

a Histogram of the mean absolute percentage error, ϵρ, for the charge density on the test configurations of the training set and for new larger structures. The number of structures in each bar is indicated as a percentage with respect to the total in each set. b Charge density difference between DFT and ML-DFT for a molecule (cyclobutane), three polymer chains, and two crystalline polymers (Cryst). Cyan and yellow isosurfaces refer to an error of ± 0.005 e bohr−3.

Various other charge-density-dependent properties can be calculated from the predicted charge density and charge density descriptors, such as the partial atomic charges, the reduced density gradient for the analysis of non-covalent interactions, and the dipole moment. Examples and comparisons with DFT results are included in the SI. In addition, to further extend the applicability of our method to situations requiring the full electron density, ML-DFT also provides the core electron density by mapping it to the DFT reference using 1s orbitals, with an accuracy of around \({\bar{\epsilon }}_{\rho ,{\rm{core}}}( \% )=5\cdot 1{0}^{-5}\). When the full electron density is used, the error of the predicted total electron density is reduced by 27% overall, with \({\bar{\epsilon }}_{\rho ,{\rm{full}}}=1.28 \%\) on test configurations and \({\bar{\epsilon }}_{\rho ,{\rm{full}}}=1.44 \%\) on new structures. More details are given in the SI.
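As a quick consistency check, and assuming the quoted 27% compares the full-density errors with the valence-only errors reported above (1.75% on test configurations and 1.97% on new structures), the relative reductions are:

$$\frac{1.75-1.28}{1.75}\approx 0.27,\qquad \frac{1.97-1.44}{1.97}\approx 0.27$$

Both sets give essentially the same relative reduction, in line with the overall figure.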

Important previous ML work to predict the charge density based on atomic contributions required an initial decomposition onto predefined basis sets, introducing an additional error34. The ML model was trained to predict the components of the decomposition for each atom, resulting in good predictions of the charge density for cases within the training space but leading to a lack of transferability to new cases. The authors used the full charge density instead of only the chemically active valence charge density used in our work. This difference results in their work presenting lower errors in the charge density prediction, as the high-valued core charge density is easily predicted and effectively lowers the percentage errors. Moreover, the requirement for already available basis sets to decompose the charge density can become a hindrance to the applicability of this method to some elements. Comparison with more recent work predicting the valence electron density using an atom-centered approach35,36 shows that our method has better accuracy. Another important comparison with recent work concerns the high error cases within the training space, which in this study extend up to a maximum of 5.02%, whereas in ref. 37 they reach up to 11%. Overall, our ML-DFT charge density surpasses previous methods in terms of accuracy and/or methodology, simplifying the protocol and, in the process, extending the applicability and transferability.

Total potential energy, atomic forces, and stress tensor

DFT posits that the ground state charge density has a one-to-one mapping with the ground state potential energy. Similarly, in our ML-DFT emulator, once the atomic charge density descriptors are predicted, they can be used as input (along with the AGNI atomic fingerprints) to predict the potential energy, atomic forces, and stress tensor. To evaluate the improvement in the accuracy and transferability of the potential energy model by including the charge density descriptors in the input layer, we considered three different options: using the atomic fingerprints only, the charge density descriptors only, and the atomic fingerprints and charge density descriptors together. We only used the polymer chains for training and left the polymer crystals to test the transferability to new structures. We performed fivefold cross-validation and evaluated the performance on test configurations for the polymer chains used during training (referred to as new configurations), for new polymer chains (new polymers), and for polymer crystals (new structures).
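The comparison described above can be organized as in the following sketch. It assumes that the combined input is a plain concatenation of the two descriptor vectors and that the folds are contiguous blocks of an already shuffled dataset; neither detail is specified in the text, and the names `build_input` and `k_fold_indices` are hypothetical.

```python
def build_input(fingerprint, charge_descriptor, mode):
    """Assemble the input vector for one of the three descriptor options."""
    if mode == "FP":
        return list(fingerprint)
    if mode == "CHG":
        return list(charge_descriptor)
    if mode == "FP+CHG":
        # Assumption: the two descriptors are simply concatenated.
        return list(fingerprint) + list(charge_descriptor)
    raise ValueError(f"unknown mode: {mode}")

def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, validation_indices) pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        folds.append((train, val))
        start = stop
    return folds

folds = k_fold_indices(12, k=5)
x = build_input([0.1, 0.2], [0.7], "FP+CHG")
```

Training one model per fold and per input option, and then collecting the MAE on each held-out set, yields the mean and standard deviation that a comparison of this kind reports.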

Figure 3a shows the histogram with the mean and standard deviation of the mean absolute error (MAE) value of the total potential energy per atom of each type of descriptor on the test configurations. The combined atomic fingerprints and charge density descriptors not only improve the accuracy and transferability but also reduce the deviation of the predictions, resulting in more robust models. As can be seen in the principal component analysis plots in Fig. 3b, c, the addition of the charge density descriptors results in a better separation of the structures with different potential energies, thus improving the prediction capabilities of the model.

Fig. 3: Effect of using charge density descriptors along with atomic fingerprints.

a Histogram of MAE from the fivefold cross-validation for the potential energy prediction with three different input descriptors: (1) the atomic fingerprints (FP); (2) the charge density descriptors (CHG); (3) the atomic fingerprints along with the charge density descriptors (FP + CHG). Plots of the two main principal components (PC) from only using b the atomic fingerprints and using c both the atomic fingerprints and the charge density descriptors. The points are colored with respect to the total potential energy.

In our approach, the potential energy, atomic forces, and stress tensor components are each predicted directly (without using one to derive the others) by employing the same transformation matrix from the charge density model to transform the atomic forces and stress tensor components into the Cartesian reference system. This allows a significant reduction of errors in the atomic forces and stress tensor components while also improving the transferability to new structures. More details on the model and a quantitative test showing the improved performance can be found in the SI.
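To illustrate how a per-atom transformation matrix maps predicted quantities into the Cartesian system, the sketch below rotates a vector (such as a force) and a rank-2 tensor (such as a stress contribution). It assumes that the columns of `R` hold the local axes expressed in Cartesian coordinates, so that a vector transforms as R·v and a tensor as R·T·Rᵀ; the exact convention used in this work is given in the SI, and the function names and numbers are hypothetical.

```python
def rotate_vector(R, v):
    """v_cart = R v, with R a 3x3 nested list."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]

def rotate_tensor(R, T):
    """T_cart = R T R^T for a rank-2 tensor T."""
    return [[sum(R[i][k] * T[k][l] * R[j][l]
                 for k in range(3) for l in range(3))
             for j in range(3)]
            for i in range(3)]

# Example frame: local axes rotated by 90 degrees about z.
R = [[0.0, -1.0, 0.0],
     [1.0,  0.0, 0.0],
     [0.0,  0.0, 1.0]]

force_local = [1.0, 0.0, 0.0]
stress_local = [[1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0],
                [0.0, 0.0, 3.0]]

force_cart = rotate_vector(R, force_local)
stress_cart = rotate_tensor(R, stress_local)
```

In this example the local x axis points along Cartesian y, so the force ends up along y and the first two diagonal stress components swap.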

After confirming the advantage of employing the fingerprints and the charge density descriptors together, we trained the energy, forces, and stress tensor model on the entire training set of molecules, chains, and crystals. Figure 4 shows the performance of the model on the test configurations for the potential energy per atom (Fig. 4a), the stress tensor components (Fig. 4b), and the atomic forces (Fig. 4c). Both the potential energy and stress tensor components are predicted with high accuracy, with an MAE of 3.3 meV atom−1 for the potential energy, and a mean root-mean-squared error (RMSE) of 6.42 kB for the diagonal stress components. However, the predicted atomic forces present a mean RMSE of 0.759 eV Å−1, with the C atomic forces presenting larger deviations from the reference DFT values than the other elements studied. This deviation is mainly observed in the polymer crystals. As shown by the histogram of the error in the predicted forces in Fig. 4d, most of the errors are contained within ±1 eV Å−1. There are very few instances with errors larger than ±3 eV Å−1, with the highest error obtained at ~11 eV Å−1.

Fig. 4: Performance of the energy, stress tensor components, and atomic forces from the ML-DFT model for the test configurations of the entire database of molecules, polymer chains, and polymer crystals.

a Parity plot of the potential energy per atom. b Parity plot of the six different components of the stress tensor. c Parity plot of the atomic forces for each type of element. d Histogram of the error between the reference atomic forces (Fi(DFT)) and the predicted atomic forces (Fi(ML)).

These high errors in the atomic forces could be attributed to the insufficient sampling of the highly disordered structures present in the polymer crystals, as well as to the inability of local atomic fingerprints to capture non-local effects, such as the long-range van der Waals dispersion forces. Similarly large errors in the atomic forces have been reported in previous studies on machine-learned atomic potentials and force fields for pure carbon structures47,48,49 and carbon-containing structures50, where various methods of database optimization, such as active learning, are used. These methods could be used with ML-DFT to improve the performance of the atomic forces but are outside the scope of this study.

Density of states prediction

As previously mentioned, the solution of the KS equation in DFT describes the electronic structure of the system in the form of the charge density and the density of states (DOS). The DOS is essential for computing multiple electronic properties of the system, such as the VBM, CBM, and Egap. Using an approach similar to the one described for the potential energy, the ML-DFT DOS predictor also employs both atomic fingerprints and charge density descriptors as input. Following previous work from our group27,28, the reference DOS curve is first shifted with respect to the vacuum reference energy and discretized every 0.1 eV from −33 eV to 1 eV. Due to this constraint, we only trained the model using vacuum-containing structures: molecules and polymer chains. To achieve the highest accuracy in the VBM and CBM, and consequently the bandgap, their DFT reference values are obtained directly from the (shifted) eigenvalues and not from the smeared DOS. Due to the intensive nature of the VBM and CBM, the entire DOS/VBM-CBM model first predicts the smeared DOS curve as the sum of each atomic contribution, normalizes the total DOS with respect to the number of valence electrons, and feeds it to a second sub-neural network which predicts the VBM and CBM (additional details in the SI). Figure 5a–f displays six different DOS for test configurations of different molecules and polymer chains. As can be observed, the test cases possess a large variety of DOS curves. Nevertheless, the ML-DFT DOS model can accurately predict them. Also, the accuracy of the predicted DOS is high enough to evaluate differences due to the atomic movement (see details in SI).
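The discretization of the reference DOS can be illustrated with a short sketch. It builds the energy grid from −33 eV to 1 eV in steps of 0.1 eV (341 points) and broadens a set of vacuum-shifted eigenvalues with unit-area Gaussians. The broadening width of 0.1 eV, the function names, and the eigenvalues are assumptions made for this example; the normalization with respect to the number of valence electrons used in the model is not reproduced here.

```python
import math

def energy_grid(e_min=-33.0, e_max=1.0, step=0.1):
    """Energy grid (eV) relative to the vacuum level, end points included."""
    n_points = round((e_max - e_min) / step) + 1
    return [e_min + i * step for i in range(n_points)]

def smeared_dos(eigenvalues, grid, sigma=0.1):
    """Sum of unit-area Gaussians centred on each eigenvalue."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [sum(norm * math.exp(-0.5 * ((e - ev) / sigma) ** 2)
                for ev in eigenvalues)
            for e in grid]

# Hypothetical eigenvalues (eV), already shifted to the vacuum reference.
eigenvalues = [-20.0, -10.0, -6.5]
grid = energy_grid()
dos = smeared_dos(eigenvalues, grid)

# Integrating the DOS over the grid recovers the number of states.
n_states = sum(dos) * 0.1
```

Because the VBM and CBM are read from the eigenvalues themselves rather than from such a broadened curve, they generally fall at energies where the smeared DOS is still nonzero.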

Fig. 5: Performance of the ML-DFT DOS model.

a–f DFT DOS (blue) and ML-DFT DOS (red) for test configurations of six different molecules and polymer chains. The DFT and ML-DFT VBM/CBM predictions are included. The vertical dashed dark green line indicates the vacuum energy used as the global energy reference. The gray shadow indicates the standard deviation in the predicted DOS curves due to the dropout layers employed in the ML-DFT DOS model. g–i Parity plots of the ML-DFT VBM, CBM, and resulting Egap for the test configurations. Vertical lines indicate the standard deviation.

As with electron density prediction, previous works on predicting the DOS are divided between a grid-based scheme27,32 and an atom-centered approach28,51,52,53. Focusing on the more recent studies, Ellis et al.32 predict the DOS through the LDOS for liquid and solid Al using a grid-based approach. The DOS obtained from the predicted LDOS shows very good accuracy, but the DOS of Al presents a smooth shape with very small variations, even from solid to liquid. However, the computational cost of using a grid-based approach significantly hinders the use of the method for large databases. Kong et al.53 use a significantly different and much more advanced method to represent the atomic structure based on graph neural networks along with an encoder–decoder technique. The method learns the similarity between crystalline structures, which is then translated into the DOS. One advantage of this technique is probably the lower cost when applied to diverse chemistries. However, it seems focused on crystalline structures, and its application to slightly unrelaxed structures is dependent upon the atomic coordinates being sufficiently similar to any of the DFT-relaxed structures used during training.

Figure 5a–f also includes the DFT and ML-DFT VBM and CBM. Due to the use of the DFT eigenvalues for VBM and CBM along with the smeared DOS as a reference, the locations of DFT and ML-DFT VBM and CBM are not at zero-valued DOS. As can be observed from the parity plots between the DFT reference and the predicted ML-DFT values of VBM, CBM, and Egap, in Fig. 5g–i, the ML-DFT values are in agreement with the reference DFT eigenvalues within MAEs of 0.069, 0.051, and 0.08 eV, respectively.

As a final note, in Fig. 6, we compare the computational cost of DFT with our ML-DFT approach for various structures of different sizes. Both types of calculations (DFT and ML-DFT) were performed in serial mode on one core of a Ryzen 9 5900X node. The total CPU time cost of DFT is significantly higher and has a cubic dependence on the system size. However, the total time for the electronic structure prediction with our ML-DFT is orders of magnitude lower than DFT with a linear dependence on system size. The ML-DFT model depends on the number of element types in the system: the red squares represent cases with only carbon and hydrogen; the red star is for a polymer chain with three elements (carbon, hydrogen, and oxygen); and the red cross represents the case of a polymer crystal with all four elements.
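A scaling exponent of this kind can be estimated from timing data by fitting a straight line to log(time) versus log(system size). The sketch below uses synthetic timings that follow exact cubic and linear laws; the numbers are invented for illustration and are not the measurements of Fig. 6, and the function name `scaling_exponent` is hypothetical.

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic timings (arbitrary units), not data from this work.
sizes = [10, 20, 40, 80]
dft_like = [1e-3 * n ** 3 for n in sizes]   # cubic growth
ml_like = [1e-2 * n for n in sizes]         # linear growth

dft_exponent = scaling_exponent(sizes, dft_like)
ml_exponent = scaling_exponent(sizes, ml_like)
```

A fitted slope near 3 indicates cubic scaling and a slope near 1 indicates linear scaling; real timings would also include a size-independent overhead that flattens the curve for small systems.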

Fig. 6: Total CPU time of DFT versus ML-DFT for electronic structure predictions.

DFT shows an initial high computational cost and a cubic dependence on the system size. On the other hand, ML-DFT is orders of magnitude faster than DFT and linearly dependent on the system size. Red squares: structures with carbon and hydrogen. Red star: structure with three elements. Red cross: structure with all four elements.

Discussion

This work represents an important step toward a physically-informed ML-based DFT emulator, which successfully, accurately, and simultaneously reproduces many of the outputs of the KS equation. Following the essence of DFT, material properties are determined by the descriptors of the structure and the predicted charge density, resulting in increased accuracy with respect to traditional ML potentials for a fraction of the computational cost of traditional DFT.

To represent the charge density with physically-informed descriptors, we employed GTOs, which fully adapt to each individual atom and can be used for any element type. The resulting descriptors contain physical information about the system in two ways. First, they act as descriptors of the electronic structure of the system, which can be employed to further predict density-dependent properties. We demonstrate this density-enhanced protocol for the atomic properties of energy, atomic forces, and stress tensors, obtaining better performance and more robust models than by using only atomic structure descriptors. Second, the descriptors can be used to directly calculate the partial atomic charges, obtaining high accuracy when compared to the leading methods in the field. We expect further applications of these physically-informed descriptors of the charge density for the prediction of other electron density-related properties such as the dipole moment (initial tests in SI) or polarizability.

Continued future improvements in the performance and robustness of the ML-DFT approach will allow for a wide range of applications: from electronic structure prediction to structure search and optimization, while integration with MD codes will allow for emulations of ab initio MD simulations. Large-scale dynamical simulations of disordered systems such as liquids or glasses may be performed, which are challenging for traditional DFT. Generalization of this methodology to more delocalized electron densities, such as in metals, may require modifications to the type of basis functions used in the decomposition of the charge density and will be explored in the future (see test on liquid Li in the SI). We also envision further modifications to increase the accuracy and transferability by shifting our present fingerprinting from predefined equations to allow deep learning architectures to search for the best atomic descriptors.

Methods

DFT details

All the reference data were generated from DFT-MD simulations performed with the Vienna Ab Initio Simulation Package (VASP)42,43. The exchange-correlation functional was modeled using the Perdew–Burke–Ernzerhof approximation54, and the ion–electron interaction was modeled using projector-augmented wave (PAW) potentials55. We employed a Monkhorst–Pack grid56 with a density of 0.025 Å−1 to sample the Brillouin zone. A plane wave basis set with a kinetic energy cutoff of 500 eV was used. The chosen kinetic energy cutoff and k-point sampling converged the total energy to less than 1 meV per atom. Tkatchenko and Scheffler vdW corrections were included57. Gaussian smearing of 0.1 eV was used. The MD simulations were performed in the NVT ensemble, with a time step of 1 fs for the molecules and polymer chains at 300 K. For the polymer crystals, due to the high temperatures reached, we used a time step of 0.5 fs. All structures were thermalized for 500 time steps at their initial temperature (300 or 100 K), and the snapshots were taken from the subsequent simulations spanning 1 ps for the molecules and polymer chains and 5 ps for the polymer crystals.

AGNI fingerprints

For a given atom i, three different types of AGNI fingerprints are defined (scalar, vector, and tensor)27,28. Each is evaluated for a set of Gaussian functions (indexed by k) of width σk and is expressed as a sum over the N neighboring atoms j,

$${S}_{k,i}={c}_{k}\mathop{\sum }\limits_{j=1}^{N}\exp \left(\frac{-{R}_{ij}^{2}}{2{\sigma }_{k}^{2}}\right){f}_{c}({R}_{ij})$$
(2)
$${V}_{k,i}^{\alpha }={c}_{k}\mathop{\sum }\limits_{j=1}^{N}\frac{{r}_{ij}^{\alpha }}{{R}_{ij}}\exp \left(\frac{-{R}_{ij}^{2}}{2{\sigma }_{k}^{2}}\right){f}_{c}({R}_{ij})$$
(3)
$${T}_{k,i}^{\alpha \beta }={c}_{k}\mathop{\sum }\limits_{j=1}^{N}\frac{{r}_{ij}^{\alpha }{r}_{ij}^{\beta }}{{R}_{ij}^{2}}\exp \left(\frac{-{R}_{ij}^{2}}{2{\sigma }_{k}^{2}}\right){f}_{c}({R}_{ij})$$
(4)

where ck is the normalization constant defined as \({\left(\frac{1}{\sqrt{2\pi }{\sigma }_{k}}\right)}^{3}\), Rij is the distance between atom j and the center atom i, and fc(Rij) is a cutoff function defined as \(0.5\left[\cos \left(\frac{\pi {R}_{ij}}{{d}_{c}}\right)+1\right]\) for Rij ≤ dc and equal to 0 for Rij > dc. α and β represent the x, y, or z components of the radial vector between atoms i and j. In this work, we employed 18 different Gaussian widths on a logarithmic scale (base 10) from 0.5 to 6.0 Å, with a cutoff distance of dc = 5 Å. While Sk is rotation invariant, \({V}_{k}^{\alpha }\) and \({T}_{k}^{\alpha \beta }\) are not, but they can be combined into four rotation invariant expressions27, which are employed as the fingerprints.
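The scalar fingerprint of Eq. (2) and the cutoff function can be sketched in a few lines. The example assumes that the 18 widths are evenly spaced in log10 between 0.5 and 6.0 Å with both end points included, that the neighbor list excludes the central atom, and that all neighbors are treated alike regardless of element; these choices and the function names are assumptions for illustration.

```python
import math

def cutoff(r, d_c=5.0):
    """Cosine cutoff: decays smoothly from 1 at r = 0 to 0 at r = d_c."""
    return 0.5 * (math.cos(math.pi * r / d_c) + 1.0) if r <= d_c else 0.0

def gaussian_widths(n=18, w_min=0.5, w_max=6.0):
    """Widths evenly spaced on a base-10 logarithmic scale (assumed)."""
    lo, hi = math.log10(w_min), math.log10(w_max)
    return [10 ** (lo + i * (hi - lo) / (n - 1)) for i in range(n)]

def scalar_fingerprint(center, neighbors, sigma, d_c=5.0):
    """S_k of Eq. (2) for one Gaussian width sigma."""
    c_k = (1.0 / (math.sqrt(2.0 * math.pi) * sigma)) ** 3
    total = 0.0
    for position in neighbors:
        r = math.dist(center, position)
        total += math.exp(-r * r / (2.0 * sigma ** 2)) * cutoff(r, d_c)
    return c_k * total

widths = gaussian_widths()
center = (0.0, 0.0, 0.0)
neighbors = [(1.0, 0.0, 0.0), (0.0, 7.0, 0.0)]  # the second lies beyond d_c
fingerprint = [scalar_fingerprint(center, neighbors, s) for s in widths]
```

Since S_k depends only on interatomic distances, it is unchanged by translating, rotating, or relabeling the atoms, which is the invariance property the fingerprints rely on.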

ML-DFT architecture

We used Keras58 with the TensorFlow backend to implement ML-DFT. The charge density and DOS models employ fully connected layers and are trained using a mini-batch size of 30, while the model for the potential energy, atomic forces, and stress tensor uses a mini-batch size of 100. All models use random sampling along with the Adam optimizer with a learning rate of 0.0001 and exponential decay rates for the moment estimates of β1 = 0.9 and β2 = 0.999. The mean-squared error was employed as the objective function during all training. More details about the specific architectures of each model can be found in the SI.
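For readers unfamiliar with the optimizer settings, the sketch below writes out one textbook Adam update with the hyperparameters quoted above, together with a mean-squared error function. It is a plain Python illustration of the standard update rule, not the Keras code used in this work; the stabilizing constant `eps` is an assumed value, and the parameter and gradient values are made up.

```python
import math

def mean_squared_error(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def adam_step(params, grads, m, v, t,
              lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update; t is the 1-based step counter."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g       # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g   # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)            # bias corrections
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

# Hypothetical single parameter with a gradient of 2.0.
params, m, v = [1.0], [0.0], [0.0]
params, m, v = adam_step(params, [2.0], m, v, t=1)
loss = mean_squared_error([1.0, 2.0], [1.0, 4.0])
```

On the first step the bias-corrected moments reduce to the gradient and its square, so the parameter moves by approximately the learning rate in the direction opposite to the gradient.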