Three separate genetic algorithm optimization runs were carried out for GGA-type functionals, using either no dispersion correction in the genome, the D3(BJ) atom-pairwise potential by Grimme (D3), or the density-dependent NL correction by Vydrov and Van Voorhis. The compositions and parameters obtained for the top-performing Evolutionary GGA (EG) functionals in each species (labeled EG, EG-D3, and EG-NL) are given in Table 1. The WTMAD scores of the evolved functionals computed with the quadruple-ζ basis set are shown in Fig. 4. This figure also includes the WTMAD scores of the closely related PBE and B97-D3 GGA functionals, computed in this work with the same basis set.
Comparing the functionals yielded by the genetic algorithm with one another reveals the following trend. The functionals that incorporate dispersion correction parameters in their genome during the genetic algorithm optimization (EG-D3 and EG-NL) exhibit significantly lower WTMAD scores (18.41 kJ/mol for both functionals) than the functional evolved without dispersion correction (EG with 24.27 kJ/mol). This behavior is hardly surprising. The inability of standard DFT to properly describe dispersion-type interactions has been the subject of several recent studies, and different empirical corrections have been developed to counteract this shortcoming, such as the D3(BJ) correction and the NL correction. Since the GMTKN30 reference database used in this work contains many test systems where dispersion effects are important, it can be expected that evolved functionals with dispersion corrections perform better. No difference in accuracy is observed between the D3(BJ) correction and the NL correction for the GMTKN30 database.
Regarding the composition of the various GGA functionals, we find that the genetic algorithm is able to recover several conventional functionals that are known to show excellent performance for the GMTKN30 database. The EG functional without dispersion correction essentially recovers the popular PBE functional (see Table 1). Although EG uses the PW91(C) correlation functional, this correlation functional is almost identical to PBE(C) with the exception of one additional term and is expected to exhibit almost exactly the same performance. This is indeed the case in the present work, as can be seen when comparing the WTMADs of both functionals (24.27 kJ/mol for EG versus 24.69 kJ/mol for standard PBE). Similar results are found for the functionals using dispersion correction (EG-D3 and EG-NL). In both cases, Becke’s B97-D functional is recovered, once with the D3 dispersion correction and once with the NL correction, showing the same performance as the original B97-D3 functional (18.41 kJ/mol in all cases). Since B97-D and its variants are amongst the most reliable GGA functionals incorporating long-range dispersion interactions, it is of little surprise that the genetic algorithm correctly identifies them as top performers on the GMTKN30 database. While the above findings highlight the viability of the genetic algorithm approach, the search for GGAs should only be regarded as a proof-of-principle study. Since the valid GGA genomes were only drawn from ten GGA exchange functionals (including one meta-GGA), three local correlation functionals, and nine GGA correlation functionals (including one meta-GGA), the total number of possible combinations is 270 functionals (10 × 3 × 9), which could also be explored in a more systematic manner. In the case of the GGAs incorporating dispersion, the genetic algorithm explores a slightly larger search space, since the long-range parameters are determined at the same time as the functional composition. 
This approach is unusual, as typically dispersion corrections are parametrized in an a posteriori manner. However, no difference in the performance is observed in the present work (see, e.g., EG-D3 and B97-D3).
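The remark that the discrete GGA search space could also be explored systematically can be made concrete with a short sketch. It simply forms the Cartesian product of the three component pools using the pool sizes quoted above. The labels (`X1`, `LC1`, `C1`, ...) are placeholders, since the actual members of each pool are not listed in this section, and any validity constraints between components are ignored.

```python
from itertools import product

# Placeholder labels only: the real functional names in each pool are not
# given in this section, so generic identifiers are used (assumption).
exchange = [f"X{i}" for i in range(1, 11)]     # ten exchange functionals
local_corr = [f"LC{i}" for i in range(1, 4)]   # three local correlation functionals
gga_corr = [f"C{i}" for i in range(1, 10)]     # nine GGA correlation functionals


def enumerate_compositions(xs, lcs, cs):
    """Return every (exchange, local correlation, GGA correlation) triple."""
    return list(product(xs, lcs, cs))


compositions = enumerate_compositions(exchange, local_corr, gga_corr)
print(len(compositions))  # size of the discrete composition space
```

Such an exhaustive loop covers only the discrete composition. Once continuous dispersion or mixing parameters are optimized simultaneously, a plain enumeration is no longer sufficient.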
Similar to the GGA functionals reported above, three genetic algorithm evolution runs have been carried out to identify well-performing Evolutionary Hybrid (EH) functionals, using once again genomes with no dispersion correction, as well as the D3(BJ) and NL corrections (a complete genealogy of the EH-NL functionals can be found in the supporting information, which illustrates the work of the genetic algorithm). The WTMAD scores of the best resulting functionals—termed EH, EH-D3, and EH-NL—can be found in Fig. 5. Their compositions as well as hybrid and dispersion parameters are given in Table 1.
As expected, all evolved hybrid functionals yield lower WTMAD scores than their GGA counterparts. The benefit of including a dispersion correction on the overall WTMAD scores is also observed. However, in contrast to the GGA functionals, where the D3(BJ) and NL dispersion corrections perform equally well, NL outperforms D3(BJ) in the case of hybrid functionals. The trend obtained for the evolved GGAs suggests that this difference is mainly due to the larger parameter space of D3(BJ) (four additional parameters) compared to NL (one additional parameter). The already enlarged genome of hybrid functionals in combination with the additional parameters introduced by the D3(BJ) correction complicates the search for the global optimum and, as a result, the genetic algorithm is terminated before complete convergence is reached. Further iterations until convergence were not carried out to ensure comparability with the other genetic algorithm runs (EH-NL, EH).
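The parameter-space argument can be illustrated with a minimal sketch of a genome record. The class and field names are hypothetical and do not reflect the actual implementation. The counts follow the text: three mixing parameters (a, b, c) for the hybrid form, four additional parameters for D3(BJ), and one for NL. Treating these as the only continuous genes is an assumption made for illustration.

```python
from dataclasses import dataclass

# Extra continuous parameters per dispersion scheme, as stated in the text.
DISPERSION_PARAMS = {"none": 0, "NL": 1, "D3(BJ)": 4}


@dataclass
class Genome:
    """Hypothetical genome layout: discrete components plus continuous genes."""
    exchange: str
    local_correlation: str
    gga_correlation: str
    hybrid: bool = False
    dispersion: str = "none"

    def n_continuous(self) -> int:
        """Count the continuous parameters the optimizer has to tune."""
        n_hybrid = 3 if self.hybrid else 0  # a, b, c of the three-parameter form
        return n_hybrid + DISPERSION_PARAMS[self.dispersion]


# Component labels are placeholders, not the evolved compositions.
eh = Genome("X", "LC", "C", hybrid=True)
eh_nl = Genome("X", "LC", "C", hybrid=True, dispersion="NL")
eh_d3 = Genome("X", "LC", "C", hybrid=True, dispersion="D3(BJ)")
print(eh.n_continuous(), eh_nl.n_continuous(), eh_d3.n_continuous())
```

Under these assumptions, the D3(BJ) hybrid species carries nearly twice as many continuous genes as the NL species, which is consistent with the slower convergence discussed above.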
The hybrid functional EH without explicit dispersion correction is closely related to the one-parameter hybrid B97 of Becke. The main differences between both hybrids are the change of the B97(X) exchange term for B88(X) and the introduction of two additional parameters present in the standard three-parameter hybrid form employed above (b and c). Compared to the standard B97 functional (WTMAD of 20.50 kJ/mol), the variant EH (WTMAD of 17.99 kJ/mol) shows better overall performance on the GMTKN30 database. The reason for this behavior is a combination of two effects: First, the functional form of EH is more flexible due to the two additional parameters. Second, EH is directly optimized on GMTKN30, while B97 was parametrized on a different set of molecules. The primary difference between both sets is the presence of a wide range of non-covalent interaction benchmarks in GMTKN30, which contribute to the overall WTMAD with a high weight. Based on the information contained in these benchmarks, the genetic algorithm utilizes the additional flexibility of EH to introduce a dispersion-like behavior (see Fig. 6). Consequently, the WTMAD of EH associated with non-covalent interactions (10.04 kJ/mol) is much lower than that of B97 (16.74 kJ/mol), leading to the improved WTMAD score. Considering only basic properties and reaction energies, both hybrids perform much more similarly, with B97 achieving slightly better accuracy for basic properties (19.66 kJ/mol vs. 20.50 kJ/mol) and EH for the reaction test set (24.27 kJ/mol vs. 22.18 kJ/mol). This trend is to be expected, as B97 was parametrized systematically in order to provide good performance over a wide range of model chemistries. At the same time, this finding demonstrates the power of the genetic algorithm search procedure, as it is able to utilize information in the reference data in a manner that differs from conventional parametrization strategies. 
Whether this use of information is physically founded or not remains to be addressed in future investigations. The composition of EH-D3 is similar to EH, but includes explicit dispersion correction. A direct comparison of this functional to B97 with D3(BJ) dispersion correction is not possible, as no D3(BJ) parameters have been reported for the B97 hybrid. However, given the above trends, it is expected that both functionals would exhibit a similar performance, as non-covalent interactions are now accounted for in an explicit manner in both cases. An important observation related to EH-D3 is that while the genetic algorithm is able to identify a sufficiently good solution, other conventional functionals with lower WTMADs are in principle accessible but not found during optimization (e.g., B3PW91-D3, see Fig. 5). This failure to identify the minimum corresponding to B3PW91-D3 can once again be attributed to the expanded parameter space introduced by the D3(BJ) correction, which slows down the convergence of the genetic algorithm (see above). The final hybrid functional EH-NL is indeed a reparametrized version of B3PW91 using the NL dispersion correction. However, in this case, the genetic algorithm is able to identify an improved set of parameters and EH-NL shows a significantly lower WTMAD than both its D3 and NL counterparts (11.30 kJ/mol vs. 12.55 kJ/mol and 14.23 kJ/mol, respectively). Unlike in the case of EH, this gain in performance is not achieved using the extra flexibility of the hybrid to introduce artificial dispersion behavior. Instead, EH-NL primarily improves upon the other B3PW91 versions in the basic properties (19.66 kJ/mol vs. 20.50 kJ/mol for B3PW91-D3 and 23.85 kJ/mol for B3PW91-NL) and reaction energies benchmarks (8.79 kJ/mol vs. 11.30 kJ/mol and 12.55 kJ/mol), while exhibiting nearly the same performance for non-covalent interactions (3.35 kJ/mol vs. 3.78 kJ/mol and 3.35 kJ/mol). 
The accuracy obtained for reaction energies is especially remarkable. On the whole, EH-NL shows excellent performance across the entire GMTKN30 dataset even when compared to such successful functionals as ωB97X-D (WTMAD of 11.72 kJ/mol). The close relation between the hybrid functionals found in this work and their conventional counterparts also serves as a general demonstration of the excellent performance of the latter for a wide range of chemical systems.
An ongoing discussion in the DFT community concerns the amount of exact exchange (referred to as a in this work) used in hybrid functionals. Typical values range from 10% Hartree–Fock exchange for TPSSh up to 54% for M06-2X, with the majority of standard hybrid functionals clustered around 25%, which is also the optimal value of exchange admixture suggested by theory. Analyzing the top 100 evolved hybrid functionals of the three different species with respect to the amount of exact exchange, average values of 25, 24, and 23% are obtained if no dispersion correction, the D3(BJ) correction, or the NL correction is used, respectively. This finding supports the consensus that for three-parameter hybrids an admixture of Hartree–Fock exchange close to 25% is the best compromise if good performance over a variety of different systems is desired.
Methods that were optimized using a certain basis set can show erratic behavior when paired with other basis sets (e.g., reduced accuracy, even when a larger basis set than the original is used). Consequently, parametrization is often carried out with a basis set of similar quality to the one intended for the accuracy assessment or subsequent practical applications. In this work, the WTMAD computed with a quadruple-ζ basis set (def2-QZVP) is used in the final comparison of the different functionals. However, carrying out all 2,582,000 single point calculations (as comprised in the GMTKN30 database) required over the course of a single genetic algorithm run would be too demanding with a basis set of this size. Hence, a smaller double-ζ basis set (def2-SVP) was chosen for the fitness evaluation of functionals, in order to render the computations required for the genetic algorithm tractable. To assess whether the use of this protocol is justified, the WTMADs of several evolved functionals are compared using basis sets of double-ζ, triple-ζ, and quadruple-ζ quality. The obtained values are reported in Table 2. Apart from the six functionals described above, another hybrid functional with NL dispersion correction, EH2-NL, was included.
Since the WTMAD scores of all functionals under investigation improve systematically with basis set size, no erratic basis set dependence seems to be present. Trends between functionals of different species (e.g., no dispersion, D3(BJ) correction) are also captured well, independent of the quality of the basis set. Potential problems can occur for functionals of the same species that show only small differences in their WTMAD at the double-ζ level, as the relative ordering of these functionals can invert when the basis set size is increased. One example is the pair EH-NL and EH2-NL. For the double-ζ basis, the WTMAD of EH2-NL is lower than that of EH-NL, although the difference is small (0.25 kJ/mol). This ordering is inverted if a triple-ζ or larger basis set is used, since EH-NL now lies 1.80 kJ/mol below EH2-NL, with the difference being even more pronounced for the quadruple-ζ basis.
However, since this phenomenon occurs only rarely and only for very small differences in the WTMAD score, the computations at the double-ζ level still provide a good general guideline for the genetic algorithm. A simple countermeasure is to select not only the best functional yielded by the genetic algorithm for evaluation at the quadruple-ζ level, but also those functionals whose WTMAD scores lie within a certain threshold of the best score. An advantage of this approach is that a genetic algorithm run with a double-ζ basis set followed by a re-evaluation of only a handful of functionals with the quadruple-ζ basis set is still far more efficient than a whole genetic algorithm run carried out with the larger basis set. All functionals reported above were identified in this manner.
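The countermeasure described above can be sketched in a few lines. The function name, the tolerance value, and all scores below are made up for illustration and are not taken from Table 2. The sketch shortlists every candidate whose double-ζ score lies within a tolerance of the best one and then picks the final functional from the re-evaluated quadruple-ζ scores.

```python
def select_for_reevaluation(scores, tolerance):
    """Return candidates within `tolerance` of the lowest score, best first."""
    best = min(scores.values())
    shortlisted = [name for name, s in scores.items() if s - best <= tolerance]
    return sorted(shortlisted, key=scores.get)


# Made-up double-zeta WTMAD values in kJ/mol (illustration only).
svp_scores = {"cand_A": 20.0, "cand_B": 20.25, "cand_C": 23.0}
shortlist = select_for_reevaluation(svp_scores, tolerance=0.5)

# Made-up quadruple-zeta values for the shortlisted candidates only.
qzvp_scores = {"cand_A": 14.0, "cand_B": 12.0}
final = min(shortlist, key=qzvp_scores.get)
print(shortlist, final)
```

In this toy example the ordering inverts at the larger basis set, so the candidate ranked second at the double-ζ level is the one finally selected. Only the shortlisted candidates need the expensive re-evaluation.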
Influence of RI and RIJCOSX approximations
To reduce the required computational time, use was made of the resolution of identity (RI) approximation for GGAs and the resolution of identity chain of spheres (RIJCOSX) approximation for hybrid functionals. These approximations allow for an efficient computation of Coulomb and Hartree–Fock exchange terms, respectively, and can lead to speed-ups of one order of magnitude. RI and RIJCOSX, however, introduce small errors in the electronic energies. To study the impact of these errors on the WTMAD scores, reference computations for the evolved GGA and hybrid functionals were carried out with and without RI and RIJCOSX. It was found that upon use of the RI and RIJCOSX approximations, the WTMADs increase by an average of 0.04 kJ/mol for GGA functionals and by 0.21 kJ/mol for hybrid functionals. The semi-numerical RIJCOSX yields slightly larger deviations than standard RI, but regarding the energies computed here, both approximations are safe to use and do not influence the quality of the results significantly. Hence, all WTMAD scores reported here for the evolved functionals were computed with these approximations.
Evolution for specific applications
So far, all evolved functionals presented here were obtained using the WTMAD computed over the GMTKN30 database as a performance measure. This protocol was chosen in order to study general functional patterns emerging for a chemically balanced set of problems. However, one particular strength of genetic algorithms is their flexibility with respect to the criteria to be optimized, since only the fitness function has to be adapted accordingly and no gradients of any form are required.
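The point that only the fitness function has to be exchanged can be sketched as follows. Both functions and all numbers are illustrative assumptions: `weighted_fitness` is a generic weighted mean standing in for the WTMAD (its exact definition is not reproduced here), and `restricted_fitness` averages the mean absolute deviations of a chosen group of subsets only. The subset labels are placeholders.

```python
from statistics import fmean


def weighted_fitness(subset_mads, weights):
    """Weighted mean of per-subset MADs (generic stand-in for the WTMAD)."""
    total = sum(weights[name] for name in subset_mads)
    return sum(weights[name] * mad for name, mad in subset_mads.items()) / total


def restricted_fitness(subset_mads, targets):
    """Plain mean of the MADs of the targeted subsets only."""
    return fmean(subset_mads[name] for name in targets)


# Made-up per-subset errors (kJ/mol) and weights for one candidate functional.
mads = {"thermo": 10.0, "kinetics": 20.0, "noncovalent": 4.0}
weights = {"thermo": 1.0, "kinetics": 1.0, "noncovalent": 2.0}

general = weighted_fitness(mads, weights)
targeted = restricted_fitness(mads, ["noncovalent"])
print(general, targeted)
```

Because a genetic algorithm only ranks candidates by their scalar fitness, either function can be passed to the same selection step without any further changes, and no derivative of the score is needed.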
To test this versatility of genetic algorithms in the context of density functionals, a hybrid functional was optimized targeting only the mean absolute error on those subsets of the GMTKN30 database that focus on non-covalent interactions. Moreover, no dispersion correction was included in the functional genome. The resulting functional (see Table 1) was then applied to compute the potential energy curve for the benzene dimer, a well-known example of a van der Waals bound system, where standard DFT fails (see, e.g., Ref. ). The potential energies obtained for the functional with adapted fitness function, called EH-ED (ED stands for evolved dispersion), the functionals EH and EH-NL, as well as CCSD(T) reference values taken from Ref.  are shown in Fig. 6.
While EH-NL shows only minor deviations from the CCSD(T) reference, the minimum is completely absent in the case of EH and the dimer is unbound. This result is typical for functionals not augmented by dispersion corrections, as they lack the physical capability to describe long-range interactions of the van der Waals type. Yet, compared to EH, the potential curve of EH-ED exhibits qualitatively correct behavior, possessing a distinct minimum at slightly larger distances than the CCSD(T) reference, despite the explicit absence of a dispersion correction.
This result is an excellent demonstration of the performance of the genetic algorithm in optimizing a specially designed objective. Moreover, it opens up the possibility of using the genetic algorithm to automatically create “niche” functionals tailored to specific needs. While the increased accuracy for the target properties comes at the cost of generality (e.g., reduced performance for other properties; EH-ED exhibits a WTMAD of 48.53 kJ/mol), the associated trade-off should be acceptable compared to the gains, provided the reference data and fitness function are chosen carefully.
At the same time, this finding also illustrates one of the potential drawbacks of DFT. As was shown above, even the standard three-parameter hybrid functional form is extremely flexible with respect to the combination of different functional approximations and parametrization. With enough patience and creativity, almost any desired result can be obtained. Hence, this high intrinsic flexibility of density functional approaches should always be considered carefully when searching for a universally valid functional.