The discovery of improved materials benefits science, technology, and society. While there exist many methods to uncover new materials, one promising and fairly recent approach to materials discovery uses density functional theory (DFT) calculations [1, 2] to predict properties of known and hypothetical materials across a large chemical space. This approach can be quicker and cheaper than direct experimental study, and has led to new experimental findings in fields as disparate as Li ion batteries, hydrogen storage, magnetic materials, multiferroics, and catalysts [3, 4].

One pressing societal problem is meeting world energy demand in an environmentally responsible manner. A possible contribution is to convert solar energy into hydrogen and oxygen by means of a photoelectrocatalytic solar cell. In this device, one or more photons split water into H2 and O2 gases. These gases are stored and later recombined to produce energy. An interesting class of materials for solar water splitters is the perovskite family, which consists of materials with general formula ABX3.

Recently, Castelli et al. [5, 6] used DFT to screen approximately 19,000 perovskite materials as potential solar water splitters, and 20 interesting compounds were identified for experimental followup. However, the result highlights a fundamental challenge in materials discovery: the number of interesting compounds comprises a small fraction of the total number of possible compounds. Therefore, a large number of calculations are needed to find a relatively small number of interesting materials.

While exhaustive search is sometimes achievable, search spaces for new materials might encompass on the order of millions or tens of millions of hypothetical compounds. For example, the 5-atom perovskites screened by Castelli et al. make up only a small portion of potentially promising materials for this application. Unfortunately, the number of DFT calculations that can reasonably be performed on today’s computers is limited to the order of tens of thousands. For example, the Materials Project required over 10 million CPU hours to generate structural and energetic data for about 30,000 materials [7, 8]. It is therefore essential to improve the efficiency of computational search, so that enumeration of all members of a search space is not needed to confidently uncover all good candidates.

In this study, we investigate the use of evolutionary algorithms [9] (which we subsequently refer to as “genetic algorithms”) as an optimization model to reduce the number of DFT computations needed to discover new materials. We re-examine the dataset produced by Castelli et al. [5, 6] to determine whether the same promising candidates could be discovered with fewer computations by employing a genetic algorithm (GA). Our goal is to demonstrate that optimization algorithms coupled to DFT calculations present a path forward to searching very large chemical spaces for interesting technological materials.

Several other researchers have investigated tiered or algorithmic screening processes coupled to calculation [10–13], sometimes employing GAs [14–20]. The goal of this study is not to find new materials, but rather to assess the robustness of the GA as an inverse solver for the perovskite photocatalysis problem. We compare the efficiency of GA search to competing methods of screening materials, such as random search and a chemical rule-based search. In addition, we distinguish between different forms of GA by testing the performance of 2592 parameter sets in over 50,000 GA trials. We report the most crucial parameters for achieving efficient GAs within the perovskite photocatalysis problem. Finally, we investigate transferability of optimized GA parameters by applying them to a second problem of searching for transparent photocorrosion shields.

Calculation methods

Search space and criteria for solar water splitting

Our search space consists of ABX3 cubic perovskites with 5-atom unit cells. Perovskites are an interesting search space because they display a diverse set of properties, and many are technologically useful [21, 22]. The cubic perovskite crystal structure is illustrated in Fig. 1. For the cations A and B, a set of 52 potential elements was selected as described by Castelli et al. [5, 6]. For the anion group X3, the search included seven mixtures of oxygen, nitrogen, sulfur, and fluorine: O3, O2N, ON2, N3, O2F, O2S, and OFN. In total, the search space consists of 18,928 compounds. The complete data set has been reported previously [5, 6] and is freely available at the Computational Materials Repository (CMR) web site [5, 23, 24].

Fig. 1

Two views of the cubic perovskite crystal structure for ABX3 compounds. Atoms of type A (in yellow) are positioned at the cube corners, B atoms (in blue) at the cube center, and X atoms (in red) at cube faces. The B atoms are octahedrally coordinated by X, whereas the A atoms are 12-fold coordinated by X (Color figure online)

Potential water splitting materials were identified based on band gap, enthalpy of formation, and band edge positions [5, 6]. A material is considered a solution if:

  • The band gap (either direct or indirect) falls in the range 1.5–3 eV.

  • The heat of formation is less than 0.2 eV/atom. The heat of formation is calculated using a linear programming approach that considers a reference set of approximately 400 elements and bulk single- and bi-metal oxides, fluorides, sulfides, nitrides, oxyfluorides, oxysulfides, and oxynitrides.

  • The band edges (either direct or indirect) straddle the H+/H2 and O2/H2O band level positions.

The total number of solutions within the search space, including both known and not yet examined compounds, is 20 [5, 6]. These compounds are listed in Table 1.
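As an illustration, the three screening criteria can be written as a single Python check. This is a minimal sketch rather than the code used in the study: the function name, the argument names, and the strict inequalities are assumptions, and the band edges are assumed to be expressed on a scale where the H+/H2 level is zero and the O2/H2O level lies at 1.23 eV.

```python
# Redox levels on a scale where H+/H2 is zero (eV). The 1.23 eV position of
# O2/H2O relative to H+/H2 is the standard water-splitting potential.
H2_LEVEL = 0.0
O2_LEVEL = 1.23


def is_water_splitter(gap, heat_of_formation, lower_edge, upper_edge):
    """Return True if a candidate meets all three screening criteria.

    gap               -- band gap in eV (direct or indirect)
    heat_of_formation -- heat of formation in eV/atom
    lower_edge, upper_edge -- the two band edge positions in eV on the
        assumed scale (H+/H2 at 0, O2/H2O at 1.23)
    """
    gap_ok = 1.5 <= gap <= 3.0
    stable = heat_of_formation < 0.2
    straddles = lower_edge < H2_LEVEL and upper_edge > O2_LEVEL
    return gap_ok and stable and straddles


print(is_water_splitter(2.0, 0.1, -0.3, 1.7))   # all criteria met
print(is_water_splitter(3.5, 0.1, -1.0, 2.5))   # gap too wide
```

Because either the direct or the indirect gap may satisfy the criteria, such a check would be applied to both sets of values and the candidate accepted if either passes.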

Table 1 Solutions to the solar water splitting problem, and their calculated properties

Calculation details

The DFT calculations were performed using the GPAW code [25, 26]. Total energies and structural relaxations were calculated under the RPBE approximation [27]; band gaps were calculated using the GLLB-SC semilocal functional [28, 29] that was previously demonstrated to improve the reliability of predicted gaps [6]. The band edge positions were determined by an empirical method [30, 31] that positions the center of the gap (\( E_{\text{F}} \)) at:

$$ E_{\text{F}} = E_{0} + \left( {\chi_{\text{A}} \chi_{\text{B}} \chi_{\text{X}}^{3} } \right)^{1/5} $$

where \( E_{0} \) is the difference between the hydrogen electrode level and vacuum (\( E_{0} = -4.5 \) eV) and \( \chi_{i} \) denotes the Mulliken electronegativity of the element on site i. For multiple elements composing the anion X3, the geometric mean of electronegativities was used. The band edge positions were obtained by adding and subtracting half the calculated band gap from \( E_{\text{F}} \).
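The empirical band edge estimate can be sketched in a few lines of Python. The function names are hypothetical, and the electronegativities passed in the usage example are placeholder numbers chosen for easy arithmetic, not tabulated Mulliken values.

```python
import math


def gap_center(chi_a, chi_b, chi_x, e0=-4.5):
    """Center of the gap E_F = E_0 + (chi_A * chi_B * chi_X^3)^(1/5).

    chi_x is a sequence of three Mulliken electronegativities, one per anion
    site; their geometric mean stands in for chi_X when the anions differ.
    """
    chi_x_mean = math.prod(chi_x) ** (1.0 / 3.0)
    return e0 + (chi_a * chi_b * chi_x_mean ** 3) ** (1.0 / 5.0)


def band_edges(chi_a, chi_b, chi_x, gap, e0=-4.5):
    """Return the two band edges E_F - gap/2 and E_F + gap/2 (eV)."""
    e_f = gap_center(chi_a, chi_b, chi_x, e0)
    return e_f - gap / 2.0, e_f + gap / 2.0


# Placeholder electronegativities of 5.0 eV on every site and a 2.0 eV gap.
print(band_edges(5.0, 5.0, [5.0, 5.0, 5.0], gap=2.0))
```

With all electronegativities equal to 5.0 the geometric mean is 5.0, so the gap center sits at about 0.5 eV and the edges at about -0.5 and 1.5 eV.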

All magnetic ions were initialized ferromagnetically. To break cubic symmetry, atomic positions were displaced by 0.01 Å prior to structural relaxation. However, we note that several perovskite chemistries possess large driving forces for more complex distortions [32], and any effect of these distortions on the band structure was not modeled in the data.

Genetic algorithm method

For a general introduction to GAs, we refer to several previous works [20, 33–35]. We performed the GA in Python using the open-source code Pyevolve, version 0.6rc1 [36, 37]. We modified some of the Pyevolve code such that the GA engine operates as described in the following sections. Because the entire search space has been precomputed with DFT, we fetch the result of all fitness evaluations from an internal database rather than performing a DFT calculation on-demand within the GA framework. The same material might appear in multiple generations of the GA; we only count unique materials when reporting performance.

To account for variability in GA results, we repeat the GA optimization routine 20 times (using different initializations) and report the average and standard deviation of the trials for each parameter set. In total, 2592 unique parameter set combinations were attempted, leading to 2592 × 20 = 51,840 independent GA runs. The same procedure was repeated for optimizing transparent photocorrosion shields (51,840 additional GA runs). The parameters we tested are described in more detail in the following subsections.
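The size of the parameter grid can be checked by enumerating it. The sketch below uses the parameter values given in the subsections that follow. The number of selection settings (eight) is inferred from 2592 / (3 × 3 × 3 × 4 × 3) = 8 and is not stated explicitly in the text, and the roulette scaling constant 5.0 is a placeholder for a value the text does not list.

```python
from itertools import product

population_sizes = [100, 500, 1000]
fitness_functions = ["Discontinuous", "Smooth", "Smooth Product"]
crossovers = ["single-point", "two-point", "uniform"]
elitism_fractions = [0.0, 0.10, 0.50, 0.75]
mutation_rates = [0.01, 0.05, 0.10]

# Eight selection settings are inferred from 2592 / (3*3*3*4*3) = 8.
# The roulette scaling constants 1.25, 2.5, and 10 appear in the text;
# 5.0 is a placeholder for the value not stated there.
selections = (
    [("uniform", None)]
    + [("roulette", c) for c in (1.25, 2.5, 5.0, 10.0)]
    + [("tournament", size) for size in ("2 individuals", "5 %", "10 %")]
)

grid = list(product(population_sizes, fitness_functions, selections,
                    crossovers, elitism_fractions, mutation_rates))
n_trials = 20
print(len(grid), len(grid) * n_trials)   # 2592 51840
```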

Candidate encoding

Each potential materials candidate in the ABX3 chemical space was encoded as a three-element composition vector C = [A, X3, B]. The first and third positions represent the A and B cations in the ABX3 composition, and contain one of 52 values, each of which corresponds to an element. The middle position for X3 has one of seven values, each representing a potential anion group. As an example, the vector [3, 1, 4] corresponds to the perovskite candidate “LiBeO3” (the Z values of Li and Be are 3 and 4, respectively, and X3 = 1 represents O3 in our convention). We structured the candidate vector in the order [A, X3, B].
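A minimal sketch of this encoding is given below. The names `decode`, `SYMBOLS`, and `ANION_GROUPS` are hypothetical. Only the mapping X3 = 1 → O3 is given in the text; the remaining anion indices are assumed to follow the order in which the anion groups were listed, and the symbol table is a partial, illustrative subset of the periodic table rather than the actual set of 52 cations.

```python
# Anion groups in the order listed for the search space; only index 1 -> O3
# is confirmed by the text, the remaining indices are assumed.
ANION_GROUPS = {1: "O3", 2: "O2N", 3: "ON2", 4: "N3",
                5: "O2F", 6: "O2S", 7: "OFN"}

# Partial atomic-number table, just enough for the examples below.
SYMBOLS = {3: "Li", 4: "Be", 20: "Ca", 22: "Ti", 38: "Sr"}


def decode(candidate):
    """Turn a composition vector [A, X3, B] into a formula string."""
    a, x3, b = candidate
    return SYMBOLS[a] + SYMBOLS[b] + ANION_GROUPS[x3]


def search_space_size(n_cations=52, n_anion_groups=7):
    """Number of distinct [A, X3, B] vectors."""
    return n_cations * n_anion_groups * n_cations


print(decode([3, 1, 4]))      # LiBeO3
print(search_space_size())    # 18928
```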

We note that our choice of encoding means that our technique is more appropriately described as an evolutionary algorithm rather than a GA (in the latter, binary strings are used for encodings). We examine this choice in greater detail in the “Discussion”.

Population size and initialization

We tested three population sizes: 100, 500, and 1000. The lower range of this set corresponds to approximately 0.5 % of the total search space, whereas the upper range corresponds to over 5 % of the search space. We found that reducing the population size significantly below 100 led to stagnation from insufficient diversity within the population, making it difficult to obtain converged results.

The initial population was generated using random values for each component of the composition vector. For each GA input parameter set, the same set of 20 random initial populations was repeated.

Fitness function

We note that when optimizing multiple, independent objectives (e.g., stability, gap, and band edge position), there is no “correct” way to rank materials. We tested three different strategies for assigning a numerical fitness function to individuals in the multivariate case, although other strategies such as the Pareto optimality ranking [35, 38] also exist.

The first fitness function, which we call “Discontinuous”, sums the values of a stability score, a band gap score, and two band edge position scores. These component scores are plotted in Fig. 2. The principle of the “Discontinuous” fitness function is to withhold awarding any points unless a target property is fully met.

Fig. 2

The component fitness functions for solar water splitting. The functions corresponding to the “Discontinuous” strategy are depicted in the left column, whereas the component functions for the “Smooth” strategy are plotted on the right side. We note that for the “Smooth” band gap function, we added a discontinuity for metals (gap of zero), awarding such compounds zero points. The overall fitness function involves either sums or products of the component fitness functions, as described in the text

We label the second fitness function tested as “Smooth”. This function also sums a stability score, a band gap score, and two band edge position scores. However, in contrast to the “Discontinuous” function, the “Smooth” function continuously increases the fitness score as an individual becomes closer to meeting a target property. The component scores for the “Smooth” fitness function are also plotted in Fig. 2.

The third fitness function, which we denote as “Smooth Product”, employs the same component fitness functions as the “Smooth” fitness function (Fig. 2). However, rather than summing the component fitnesses, we take a product of the stability fitness with the sum of the band gap and band edge position fitness. The principle of the “Smooth Product” function is to assign higher fitness to compounds that balance stability and desired electronic structure.

In each case, we normalize the maximum score to 30 potential points. For the band gap and band edges, we use the higher score based on independent assessments of the direct gap and indirect gap data.
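The way the three strategies combine their component scores can be sketched as follows. The component functions themselves are defined graphically in Fig. 2 and are not reproduced here; this sketch assumes that each component score has already been computed and scaled to the range 0 to 1 with equal weights, which is an assumption made for illustration. All function names are hypothetical.

```python
def summed_fitness(stability, gap, edge_low, edge_high, max_points=30.0):
    """Sum of four component scores, rescaled so the best total is max_points.

    Applies to the "Discontinuous" and "Smooth" strategies, which differ only
    in how the component scores themselves are computed.
    """
    return max_points * (stability + gap + edge_low + edge_high) / 4.0


def smooth_product_fitness(stability, gap, edge_low, edge_high,
                           max_points=30.0):
    """Stability multiplied by the sum of the electronic-structure scores."""
    return max_points * stability * (gap + edge_low + edge_high) / 3.0


def discontinuous_gap_score(gap_ev):
    """All-or-nothing score: 1 only when the 1.5-3 eV target is met."""
    return 1.0 if 1.5 <= gap_ev <= 3.0 else 0.0


# An unstable compound with ideal electronic structure scores differently
# under the two combination rules.
print(summed_fitness(0.0, 1.0, 1.0, 1.0))          # 22.5
print(smooth_product_fitness(0.0, 1.0, 1.0, 1.0))  # 0.0
```

The usage lines illustrate the design difference: the product form gives no credit to a candidate that fails on stability, whereas the summed forms still reward its electronic structure.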

Selection function and scaling factor

We tested three algorithms for selecting individuals as parents for mating:

  • Uniform selection: random individuals in the population are selected to be parents without regard to fitness score.

  • Roulette wheel selection: the probability that an individual is selected as a parent is proportional to its fitness score.

  • Tournament selection: a set of tournaments is performed. In each tournament, a sample of the population is randomly selected, and the individual with the highest fitness within that sample becomes a parent.

Whereas uniform selection involves no additional parameters, both roulette wheel and tournament selection are tunable through parameters that affect selection pressure. A high selection pressure biases selection towards the stronger individuals at the expense of population diversity.

For roulette wheel selection, we tune the selection pressure through a linear scaling of the raw fitness scores. The linear scaling approach prevents early dominance of a single individual and helps distinguish individuals in later generations (when raw fitness values might all be close to optimal). Linear scaling modifies the raw fitness values in each generation such that:

$$ f^{\prime } = af + b $$

where f′ is the scaled fitness, f is the raw fitness, and a and b are constants that are recomputed in each generation. The constants are selected such that (i) the average fitness within the generation is maintained (\( f_{\text{avg}}^{\prime } = f_{\text{avg}} \)) and (ii) the maximum scaled fitness is equal to a constant C multiplied by the average fitness (\( f_{\max}^{\prime } = C f_{\text{avg}}^{\prime } \)). The constant C is a free parameter that represents the desired selection pressure. We tested several values of C ranging from 1.25 to 10 in our study, but automatically adjust it when necessary to prevent negative scaled fitness scores.
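Solving the two conditions for the constants gives a = (C − 1) f_avg / (f_max − f_avg) and b = f_avg (1 − a). The Python sketch below applies this scaling. The fallback used when a scaled value would become negative (mapping the minimum fitness to zero while preserving the mean) is a common textbook choice and is an assumption here; the text does not specify how C is adjusted in the actual implementation.

```python
def linear_scaling(raw, c=2.0):
    """Scale raw fitness values as f' = a*f + b.

    a and b are chosen so that the mean is preserved and the scaled maximum
    equals c times the mean. If that would make the smallest scaled value
    negative, a and b are instead chosen so the minimum maps to zero while
    the mean is still preserved (this lowers the effective c).
    """
    f_avg = sum(raw) / len(raw)
    f_max, f_min = max(raw), min(raw)
    if f_max == f_avg:          # all individuals identical: nothing to scale
        return list(raw)
    a = (c - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    if a * f_min + b < 0.0:     # fall back to keep scaled fitness non-negative
        a = f_avg / (f_avg - f_min)
        b = -f_min * a
    return [a * f + b for f in raw]


print(linear_scaling([10.0, 20.0, 30.0], c=1.25))  # [15.0, 20.0, 25.0]
print(linear_scaling([10.0, 20.0, 30.0], c=10.0))  # [0.0, 20.0, 40.0]
```

In the second call the requested C = 10 would push the weakest individual below zero, so the effective selection pressure is reduced to C = 2.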

In tournament selection, the scaling parameter does not affect the results because the selected member depends only on the fitness rank rather than its absolute value. We instead tune the selection pressure through the tournament size, with larger tournaments creating greater selection pressure. We test a commonly used tournament size of two individuals, as well as tournament sizes that are 5 and 10 % of the overall population size.
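The three selection schemes can be sketched with Python's standard `random` module. This is an illustration under stated assumptions, not the Pyevolve implementation: individuals are assumed to be hashable tuples, `fitness` is assumed to be a dictionary of (already scaled) non-negative scores, and the function names are hypothetical.

```python
import random


def uniform_select(population, fitness, rng=random):
    """Pick a parent at random, ignoring fitness."""
    return rng.choice(population)


def roulette_select(population, fitness, rng=random):
    """Pick a parent with probability proportional to its (scaled) fitness."""
    weights = [fitness[ind] for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]


def tournament_select(population, fitness, size=2, rng=random):
    """Sample `size` individuals and return the fittest of the sample."""
    contestants = rng.sample(population, size)
    return max(contestants, key=lambda ind: fitness[ind])


population = [(3, 1, 4), (20, 1, 22), (38, 1, 22)]
fitness = {(3, 1, 4): 0.0, (20, 1, 22): 12.0, (38, 1, 22): 30.0}
rng = random.Random(0)
print(roulette_select(population, fitness, rng))
print(tournament_select(population, fitness, size=2, rng=rng))
```

Tournament selection only compares fitness values within the sample, which is why rescaling the fitness has no effect on it, whereas roulette selection depends directly on the scaled values. Note that `random.choices` raises an error if all weights are zero, so a real implementation would need to handle that case.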

Crossover function and rate

The crossover function determines how children are generated given two parents. We tested three crossover functions:

  • Single-point crossover—The parents swap either the A or B cation (but not both) to produce two children.

  • Two-point crossover—The parents swap the anion X3 to produce two children.

  • Uniform crossover—A, B, and X3 are randomly swapped between two parents. We explicitly prevent the children from being identical to the parents unless parents are identical.

A pictorial representation of the crossover operations is presented in Fig. 3.

Fig. 3

Representation of crossover operations. A pair of parents produces a set of two children. For single-point and uniform crossover, multiple sets of children are possible and one is selected at random
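The crossover operators acting on the [A, X3, B] vector can be sketched as below. The function names are hypothetical and parents are assumed to be tuples. For uniform crossover, the sketch enumerates all gene-swap patterns and discards those that merely reproduce the parents, which is one possible way to satisfy the stated constraint.

```python
import random
from itertools import product


def swap_genes(p1, p2, mask):
    """Exchange the genes flagged in mask between two [A, X3, B] parents."""
    c1 = tuple(g2 if m else g1 for g1, g2, m in zip(p1, p2, mask))
    c2 = tuple(g1 if m else g2 for g1, g2, m in zip(p1, p2, mask))
    return c1, c2


def single_point(p1, p2, rng=random):
    """Swap either the A gene or the B gene, but not both.

    This yields the same pair of children as cutting the vector after
    position 1 or position 2 and exchanging the tails.
    """
    mask = rng.choice([(True, False, False), (False, False, True)])
    return swap_genes(p1, p2, mask)


def two_point(p1, p2):
    """Cut on both sides of the middle gene, which swaps the anion group."""
    return swap_genes(p1, p2, (False, True, False))


def uniform(p1, p2, rng=random):
    """Swap a random subset of genes, avoiding copies of the parents."""
    options = []
    for mask in product([False, True], repeat=3):
        children = swap_genes(p1, p2, mask)
        if set(children) != {p1, p2}:
            options.append(children)
    # Parents that differ in at most one gene cannot produce anything new.
    return rng.choice(options) if options else (p1, p2)


print(two_point((3, 1, 4), (20, 2, 22)))   # ((3, 2, 4), (20, 1, 22))
```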

The crossover rate determines what percent of the parents mate to produce children; this parameter was set to 90 %, such that most parents selected for mating produce children. The remaining 10 % are passed to the next generation without modification. This choice of crossover rate is consistent with suggestions from previous studies [39, 40].


Elitism

In many optimization problems, the performance of the GA can be improved by intentionally carrying over some of the fittest individuals of the current generation to the next generation. In our implementation, such “elite” individuals replace the least fit individuals of the new population. We tested our GA with elitism turned off, and with 10, 50, and 75 % of the fittest individuals carried over to the next generation.

Mutation function and rate

Our mutation function modifies an element of the composition vector to a random value. We tested mutation rates of 1, 5, and 10 %. We note that other mutation operators are also possible, such as switching the identities of the A and B cations.

Convergence and additional mutation operators

When a single solution is targeted, typical convergence criteria for GA are stagnation of population diversity or failure of the fittest individual to improve with generation number. However, our GA problem is multimodal, i.e., there exist several individuals that maximize our fitness function. Our goal is to find all possible materials that meet our design criteria, and we aim to prevent population convergence to a single optimum rather than promote it.

To encourage multimodal optimization, we introduce two additional mutation operators. The first, which we call local mutation, mutates a single gene of any duplicated individuals within a generation. This operator can be thought of as performing a local search around a duplicated solution. In addition to local mutation, we detect when all members of the population were already explored in a previous generation. In these instances, we introduce a global mutation that mutates a single gene of every individual in the population and increases the crossover rate to 100 % for a single generation. This “resets” the search when the GA becomes stuck on solutions already explored in the past. We found that absent these operators, our GA could stagnate for several thousand generations, recycling the same individuals without producing new solutions.
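The two operators can be sketched as follows. The function names are hypothetical, and genes are assumed to be stored as indices (1 to 52 for the cations, 1 to 7 for the anion group) rather than atomic numbers. The temporary increase of the crossover rate to 100 % is not shown.

```python
import random

N_CATIONS, N_ANIONS = 52, 7   # sizes of the A/B and X3 gene alphabets


def mutate_one_gene(individual, rng=random):
    """Replace one randomly chosen gene of [A, X3, B] with a random value.

    The new value may coincide with the old one, in which case the
    individual is unchanged.
    """
    genes = list(individual)
    position = rng.randrange(3)
    upper = N_ANIONS if position == 1 else N_CATIONS
    genes[position] = rng.randint(1, upper)
    return tuple(genes)


def local_mutation(population, rng=random):
    """Mutate each repeated copy of an individual within one generation.

    The first occurrence of every individual is kept as it is.
    """
    seen, result = set(), []
    for individual in population:
        if individual in seen:
            individual = mutate_one_gene(individual, rng)
        seen.add(individual)
        result.append(individual)
    return result


def needs_global_mutation(population, explored):
    """True when every member was already evaluated in an earlier generation."""
    return all(individual in explored for individual in population)


def global_mutation(population, rng=random):
    """Mutate one gene of every individual to restart the search."""
    return [mutate_one_gene(individual, rng) for individual in population]
```

In a GA loop, `explored` would be the set of all individuals evaluated in earlier generations, which is also the set used to count unique fitness evaluations.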

Other methods to tackle multimodal problems have been developed; for example, a well-studied class of techniques to handle multimodal problems, termed niching, attempts to find specialized solutions within several regions of a problem space. However, there exist many methods of implementing niching, and additional parameters must be optimized within each niching implementation [20, 35, 38, 41–44]. Therefore, we leave a comprehensive exploration of niching to a future study.

Evaluating performance

To evaluate the performance of the GA, we tested it against other methods and also on a different (but related) application of transparent photocorrosion shields.

Chemical rule-based search method

In addition to the GA, we independently tested a rule-based method of selecting compounds. This “chemical rule-based search” provides a sense of how to prioritize a search space using empirical knowledge and scientific principles. In particular, we apply the following rules:

  1. “Valence balance rule”: the formal oxidation states of all the elements in a realistic ionic material must sum to zero, such that the overall material is valence-neutral. In situations where elements are known to display multiple oxidation states (for example, the transition metals), the condition must be met for at least one of the oxidation state combinations. Materials that cannot be valence-balanced, such as LiCaO3, are completely excluded from the search.

  2. “Even–odd electrons rule”: materials containing an odd number of electrons are excluded because they will contain a partially occupied eigenstate at the Fermi level. These materials will necessarily be metallic with zero band gap and therefore unsuitable for solar water splitting.

  3. “Goldschmidt tolerance factor ranking”: materials fulfilling the first two rules are ranked using their Goldschmidt tolerance factor [45]. The Goldschmidt tolerance factor t is based on the geometry of the perovskite cell and assesses the likelihood of a material to form the perovskite crystal structure. It is defined as:

    $$ t = \frac{{\left( {r_{\text{A}} + r_{\text{X}} } \right)}}{{\sqrt 2 \left( {r_{\text{B}} + r_{\text{X}} } \right)}} $$

    where \( r_{i} \) corresponds to the ionic radius of the i = A, B, and X sites. In an ideal perovskite, t is equal to unity (\( t_{\text{ideal}} = 1 \)). We ranked perovskites by the absolute deviation from this ideal value, that is by \( |t - t_{\text{ideal}}| \), such that compositions that more closely meet the ideal perovskite structure are tested first. For metals with multiple known ionic radii, we used an unweighted average of known radii as the radius. When the anion site X3 contains multiple elements, we used a weighted average of the individual ionic radii.

These rules might approximate the intuition of a researcher in prioritizing perovskite compounds for computation.
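The three rules can be sketched in Python as follows. The function names are hypothetical, the oxidation states and radii are supplied by the caller, and the radii in the usage example are placeholder numbers rather than tabulated ionic radii.

```python
import math
from itertools import product


def is_valence_balanced(oxidation_options):
    """True if any combination of oxidation states sums to zero.

    oxidation_options holds one list of allowed oxidation states per atom in
    the formula unit, e.g. five lists for ABX3.
    """
    return any(sum(combo) == 0 for combo in product(*oxidation_options))


def has_even_electron_count(atomic_numbers):
    """True if the formula unit contains an even number of electrons."""
    return sum(atomic_numbers) % 2 == 0


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt factor t = (r_A + r_X) / (sqrt(2) * (r_B + r_X))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def rank_by_tolerance(candidates):
    """Sort (name, r_A, r_B, r_X) tuples by |t - 1|, closest to ideal first."""
    return sorted(candidates,
                  key=lambda c: abs(tolerance_factor(*c[1:]) - 1.0))


# LiCaO3: Li(+1) + Ca(+2) + 3 x O(-2) = -3, so it cannot be balanced.
print(is_valence_balanced([[1], [2], [-2], [-2], [-2]]))   # False
# LiBeO3 has 3 + 4 + 3*8 = 31 electrons, an odd number.
print(has_even_electron_count([3, 4, 8, 8, 8]))            # False
# Placeholder radii for two hypothetical candidates.
print(rank_by_tolerance([("far", 1.0, 1.0, 1.0), ("near", 1.8, 1.0, 1.0)]))
```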

Transparent photocorrosion shield screening

In addition to solar water splitters, the perovskite dataset can also be used to screen potential transparent shields to protect against photocorrosion, as recently reported by Castelli et al. [5]. The need for a transparent protecting shield lies in the difficulty of finding stable, medium-gap perovskites needed for water splitting; usually, stable perovskites tend to also have wide gaps [5, 6]. A wide-gap shield might therefore be placed in front of a medium-gap photo-absorber to enhance protection against (photo)corrosion without affecting light capture properties.

From the point of view of the GA, the only component that requires modification to address this new problem is the fitness function. In particular, we are now screening for direct gap semiconductors with gaps greater than 3 eV in order to obtain transparency. In addition, the band edge position criterion now stipulates that the valence band of the shield must lie between the valence band position of the water splitter and the oxygen evolution potential. This corresponds to a valence band position lying between 1.7 and 2.5 eV with respect to the H+/H2 level [5]. There is no restriction on the position of the conduction band, other than that implied by the band gap and valence band criteria.

The modified component fitness functions are plotted in Fig. 4. The overall fitness functions, “Discontinuous”, “Smooth”, and “Smooth Product” are taken as sums and products of the component functions similarly to solar water splitting. There exist 8 solutions in our search space to the transparent shield screening problem (Table 2).

Fig. 4

The component fitness functions for photocorrosion transparent shields. Similar to the water splitting case, the overall fitness function involves either sums or products of the component fitness functions, as described in the text. The band edge position corresponds to the valence band, with the zero value taken to be at the H+/H2 level

Table 2 Potential photoanode shields investigated in this work, and their calculated properties

We re-tested all 2592 GA parameter sets that were examined for water splitting for the transparent shield problem, with 20 trials for each parameter set, resulting in 51,840 additional GA runs.

Efficiency metric

We evaluate the robustness of both GAs and chemical rules against a standard benchmark of random guesses within the search space. The metric used for comparing algorithms is the expected number of computations needed to produce a given number of solutions to the problem. In particular, we focus on the average number of computations needed to uncover all solutions as well as the average number of computations needed to produce any half of the solutions. We define the efficiency (or robustness) of an optimization strategy as the ratio of the average number of calculations needed for random search to the average number of calculations needed by that strategy to produce a given number of solutions:

$$ e^{n} = \frac{{c_{\text{rand}}^{n} }}{{c_{\text{opt}}^{n} }} $$

where \( e^{n} \) represents our definition of efficiency in finding n solutions, \( c_{\text{opt}}^{n} \) is the average number of calculations needed by the optimization strategy to find n solutions, and \( c_{\text{rand}}^{n} \) is the average number of calculations needed for a random search strategy to find n solutions. For random search, the average number of computations c to produce n solutions is given mathematically by [46]:

$$ c_{\text{rand}}^{n} = \frac{n(x + 1)}{(s + 1)} $$

where x is the size of the search space (18,928) and s is the total number of solutions (20 for water splitting and 8 for photocorrosion shields). The number of computations \( c_{\text{rand}}^{n} \) needed to obtain n = 10 and n = 20 solutions for water splitting is 9014 and 18,028, respectively, when randomly choosing candidates. An efficiency of 2 therefore indicates that 4507 and 9014 computations were needed to find n = 10 and n = 20 solutions, respectively.
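The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical. The value of 3100 evaluations used in the last line is the approximate cost reported for the best-performing GA in finding all 20 water splitting solutions.

```python
def random_search_cost(n, space_size=18928, n_solutions=20):
    """Expected evaluations for random search to find n of the solutions."""
    return n * (space_size + 1) / (n_solutions + 1)


def efficiency(n, cost_opt, space_size=18928, n_solutions=20):
    """Ratio of random-search cost to the strategy's cost for n solutions."""
    return random_search_cost(n, space_size, n_solutions) / cost_opt


print(round(random_search_cost(10)))   # 9014
print(round(random_search_cost(20)))   # 18028
print(round(efficiency(20, 3100), 1))  # 5.8
```

For the photocorrosion shield problem the same functions apply with `n_solutions=8`.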


To compare the contributions of each parameter choice to the GA’s efficiency, we used the analysis of variance (ANOVA) method [47–49]. ANOVA allows one to assess what factors are statistically relevant to influencing a result, the relative degree of importance of each factor, and potential interactions between parameters. We performed the ANOVA using MATLAB’s multi-way anovan function. Statistical tests were performed with a 95 % confidence level, and the multiple comparison test was performed using “Tukey’s honestly significant difference” criterion.


Now that we have introduced our GA parameter choices and efficiency measure, we compare a GA-guided search to random and chemical rule-based search. In Fig. 5, we plot the average number of fitness evaluations (DFT computations) needed to achieve a given number of solutions. Random search, depicted by a black line, requires on average over 18,000 fitness evaluations in order to find all solutions in the search space. The best-performing GA is reported in Table 3 and represented in blue in Fig. 5. On average, this GA requires fewer than 3100 calculations to find all 20 solutions in a space of almost 19,000 possibilities, making it 5.8 times as efficient at searching the perovskite chemical space compared with random search (Table 4). The variation in performance over 20 trials is small compared to the total number of evaluations (Fig. 5), with the standard deviation ranging from 81 evaluations in finding a single solution to 712 evaluations in finding all 20 solutions. Therefore, by employing a GA, one could have confidently searched the entire chemical space of ABX3 perovskites using only about one-sixth as many calculations compared with computing the entire space. Stated another way, our result suggests that given a fixed computational budget, the use of the GA allows one to search chemical spaces that are much larger than the number of available calculations.

Fig. 5

The average number of calculations needed to produce a given number of solutions to the solar water splitting problem for genetic algorithms versus a random strategy. The best-performing GA (blue) requires significantly fewer calculations than a random strategy (black) to find potential solar water splitters. The error bars represent one standard deviation from the average performance over 20 independent runs of the genetic algorithm (Color figure online)

Table 3 Parameters for the best GA in finding both 10 and 20 solutions, as described in the text
Table 4 Efficiency of chemical rules and GA in finding 10 and 20 solutions to the one-photon water splitting problem

In Fig. 6, we compare the performance of our best GA versus the chemical rule-based strategy described in “Chemical rule-based search method”. We note that our chemical rules are a difficult benchmark to surpass; rules (1) and (2) of our rule-based search eliminate 11,587 compounds, or about 61 % of the search space, from the search. In addition, rule (3) informs which of the remaining individuals are likely to be stable based on specific knowledge of the perovskite structure. In contrast, the GA must learn these types of rules dynamically over the course of optimization without any prior knowledge. The GA has no knowledge of what the genome or the fitness function represents; its only information comes from matching genome vectors to the numerical results of fitness evaluations. Despite these limitations, the best GA is comparable to search with basic chemical rules designed to tackle a specific materials problem (Fig. 6). This suggests that GAs might provide a path forward in problems where chemical rules are not available to the researcher in advance.

Fig. 6

The average number of calculations needed to produce a given number of solutions to the solar water splitting problem for genetic algorithms versus chemical search. The best GA without any chemical guidance (blue) performs comparably to chemical rules (orange). The “knowledge-directed” GA for which the search space is reduced through chemical constraints (green) significantly improves upon chemical rule-based search and uninformed GA. The error bars represent one standard deviation from the average performance over 20 independent runs of the genetic algorithm (Color figure online)

We also investigated whether the GA can benefit from knowledge of chemical rules. We re-ran the best GA but simulated a situation in which the 11,587 compounds that can be excluded based on chemical rules (1) and (2) are not calculated. The GA proceeds as before, but we return a fitness of zero for any excluded compound and do not count it as being “searched”. This method crudely approximates a “knowledge-directed” GA in which outside information is employed to guide the search. The results for this method are indicated in green in Fig. 6 and demonstrate that a knowledge-directed GA outperforms both chemical rules and uninformed GA by themselves. The knowledge-directed algorithm represents factors of 11.7, 2.6, and 2.0 improvements in finding all solutions compared with random search, chemical rules alone, and GA alone, respectively. The performance data for all methods are summarized in Table 4.

Next, we examine how the six GA parameters (crossover type, population size, selection method, mutation rate, elitism, and fitness) influence robustness of the GA in finding all 20 solutions. We first analyze the data using ANOVA without considering interactions between parameters. We find that all parameters except the mutation rate have a statistically significant influence on the GA efficiency at the 5 % significance level (the mutation rate has a p value of 0.18). It may be the case that the local and global mutation operators introduced in “Convergence and additional mutation operators” generate sufficient population diversity such that additional mutations are not needed to improve GA performance.

After removing mutation from the analysis, we assessed the contribution of each remaining parameter to the GA’s robustness through the \( \eta^{2} \) parameter. A large \( \eta^{2} \) indicates a large effect of the parameter on GA efficiency, while a small \( \eta^{2} \) suggests that the parameter (while statistically significant) produces only a small effect. The portion of the result that cannot be ascribed to a single parameter is lumped into an “error” term. This term encompasses both interactions between parameters and also randomness of the GA (e.g., due to different initial populations). Table 5 lists the \( \eta^{2} \) measure for all parameters and the error term. The two major parameters affecting the results are elitism and selection method (Table 5). The population size, crossover type, and fitness function have statistically significant but marginal effects on the results.
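To make the effect-size measure concrete, the sketch below computes the fraction of the total sum of squares attributable to a single factor, SS_factor / SS_total. It is a simplified, pure-Python illustration rather than the multi-way MATLAB analysis used in the study: it treats one factor at a time, and the function name and toy data are hypothetical.

```python
from collections import defaultdict


def eta_squared(levels, responses):
    """Fraction of total variance explained by one factor: SS_factor / SS_total.

    levels    -- the factor level used in each run (e.g. selection method)
    responses -- the measured outcome of each run (e.g. evaluations needed)
    """
    grand_mean = sum(responses) / len(responses)
    groups = defaultdict(list)
    for level, value in zip(levels, responses):
        groups[level].append(value)
    ss_total = sum((value - grand_mean) ** 2 for value in responses)
    ss_factor = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_factor / ss_total


# Toy data: the factor fully explains the outcome in the first case and
# explains none of it in the second.
print(eta_squared(["a", "a", "b", "b"], [1.0, 1.0, 3.0, 3.0]))  # 1.0
print(eta_squared(["a", "b", "a", "b"], [1.0, 1.0, 3.0, 3.0]))  # 0.0
```

The function divides by the total sum of squares, so it is undefined when all responses are identical.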

Table 5 Percentage of variance (\( \eta^{2} \)) ascribed to various GA parameters from ANOVA (run without interactions) for the problem of finding all 20 solar water splitters

We also studied an ANOVA model with pair interactions included (Table 6). Almost all interactions are very small. However, there exists one very significant interaction between elitism and the selection function. This strong interaction can be attributed to an unfavorable combination of zero elitism paired with either uniform selection or “weak” roulette selection (scaling factor of 1.25). In the case of uniform selection, the fitness function is used nowhere in the GA when elitism is absent; we are essentially performing a random search. We suspect that weak roulette selection behaves similarly, with the fitness function too weakly distinguishing good and bad individuals without the added selection pressure of elitism.

Table 6 Percentage of variance (η²) ascribed to various GA parameters and two-parameter interactions from ANOVA for the problem of finding all 20 solar water splitters
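The observation that uniform selection without elitism reduces to a random search can be made concrete with a short sketch. The generation step below is a simplified stand-in, not the implementation used in this work: it assumes a 3-gene genome, single-point crossover, and no mutation, and all function names are hypothetical. A counter on the fitness function shows that it is never consulted when elitism is zero and selection is uniform.

```python
import random


def select_uniform(population, fitness, rng):
    # Ignores fitness entirely: every individual is equally likely.
    return rng.choice(population)


def select_tournament(population, fitness, rng, size=2):
    # Picks the fittest of `size` randomly drawn individuals.
    return max(rng.sample(population, size), key=fitness)


def next_generation(population, fitness, select, elitism, rng):
    """One simplified GA step: keep elites, fill the rest with children."""
    n_elite = int(round(elitism * len(population)))
    # Only rank the population when at least one elite is kept.
    elites = sorted(population, key=fitness, reverse=True)[:n_elite] if n_elite else []
    children = []
    while len(elites) + len(children) < len(population):
        a = select(population, fitness, rng)
        b = select(population, fitness, rng)
        cut = rng.randrange(1, len(a))  # single-point crossover
        children.append(a[:cut] + b[cut:])
    return elites + children


fitness_calls = {"n": 0}


def counting_fitness(genome):
    # Hypothetical fitness; the counter records how often it is used.
    fitness_calls["n"] += 1
    return sum(genome)


rng = random.Random(0)
population = [
    (rng.randrange(52), rng.randrange(52), rng.randrange(7)) for _ in range(20)
]

next_generation(population, counting_fitness, select_uniform, 0.0, rng)
calls_without_pressure = fitness_calls["n"]  # fitness was never evaluated

next_generation(population, counting_fitness, select_tournament, 0.0, rng)
calls_with_tournament = fitness_calls["n"]  # fitness now drives selection
```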

Now that we have determined which factors and interactions are most important to the GA robustness, we examine exact parameter values that yield good or bad efficiency using a multiple comparison test. This test produces the marginal mean number of calculations required to find 20 solutions to solar water splitting along with a confidence interval. It allows us to determine which parameter values are distinct from one another, and how they affect robustness. In Fig. 7, we plot the results of the multiple comparison test for selection method, crossover, population size, elitism, and fitness function. Parameters with the same color and symbol in Fig. 7 do not differ much in their effect and can be considered equivalent.

Fig. 7 Multiple comparison test demonstrating the effect of various parameter choices. Parameters that are statistically different from one another are represented by different colors and symbols. For example, single-point and uniform crossover are statistically different from two-point crossover, but not from each other. The value on the x-axis represents average number of computations needed to reach 20 solutions to the solar water splitting problem; lower values represent better performance (Color figure online)

In terms of selection, there exist two groups of parameters (Fig. 7). The uniform and “weak” roulette (C = 1.25) methods both perform poorly compared to other selection methods. As discussed previously, these selection methods either partially or completely fail to take into account the fitness function. The best results are found for “strong” roulette (i.e., roulette with a scaling factor equal to or higher than 2.5), although similar results can also be obtained with tournament selection. While robustness slightly increases as roulette selection becomes stronger, it slightly decreases as tournament selection becomes stronger.

Figure 7 highlights that the absence of elitism is extremely undesirable. However, the positive effect of adding elitism appears to saturate somewhere around 50 %; we do not see any difference between the elitism rate set at 50 and 75 %.

When examining the crossover function, both single-point and uniform crossover significantly outperform two-point crossover (Fig. 7). The two-point crossover operator swaps X3 between parents to produce children, but our problem contains only seven potential values of X3. Many parents will share the same X3, and their children will be identical to the parents. Two-point crossover is therefore not appropriate for our problem, as it is unlikely to generate sufficient population diversity. Our results regarding crossover operations are for a 3-element genome and may not apply to the more common situation of larger genomes; they should therefore be viewed as specific to this application.
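The loss of diversity under two-point crossover can be seen in a small sketch. It assumes a genome of the form (A, B, X3) and takes the description above at face value, namely that the two-point operator exchanges only the X3 gene; the function names and the example genomes are hypothetical labels chosen for illustration.

```python
def two_point_crossover(parent1, parent2):
    """Exchange only the X3 gene between two (A, B, X3) genomes."""
    a1, b1, x1 = parent1
    a2, b2, x2 = parent2
    return (a1, b1, x2), (a2, b2, x1)


def single_point_crossover(parent1, parent2, cut):
    """Exchange everything after position `cut` (1 or 2 for a 3-gene genome)."""
    return parent1[:cut] + parent2[cut:], parent2[:cut] + parent1[cut:]


# Two hypothetical parents that share the same X3 gene.
p1 = ("La", "Ti", "O3")
p2 = ("Sr", "Nb", "O3")

children_two_point = two_point_crossover(p1, p2)
children_single_point = single_point_crossover(p1, p2, cut=1)

# With a shared X3, two-point crossover returns copies of the parents,
# while single-point crossover still recombines the A and B genes.
no_new_material = set(children_two_point) == {p1, p2}
new_material = set(children_single_point).isdisjoint({p1, p2})
```

Because only seven X3 values exist, pairs of parents with a shared X3 are common, and each such pairing spends a crossover without producing a new candidate.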

Regarding the fitness function, Fig. 7 demonstrates that the “Smooth Product” function performs the best, followed by the “Smooth” function and finally the “Discontinuous” function. These results suggest two guidelines in designing the fitness function. First, awarding partial points for partial solutions is helpful for the GA. Second, when designing multi-objective functions it appears to be beneficial to take products of individual fitness functions rather than sums.
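The second guideline can be illustrated with a toy comparison of the two ways of combining per-criterion scores. The scores and the combining functions below are hypothetical and are not the definitions of the “Smooth”, “Smooth Product”, or “Discontinuous” functions used in this work; each score is simply assumed to lie between 0 (criterion badly missed) and 1 (criterion met).

```python
from math import prod


def combine_mean(scores):
    # Additive combination: a poor score can be offset by good ones.
    return sum(scores) / len(scores)


def combine_product(scores):
    # Multiplicative combination: every criterion must be reasonably met.
    return prod(scores)


def discontinuous(scores):
    # No partial credit: a criterion counts only when fully met.
    return [1.0 if s >= 1.0 else 0.0 for s in scores]


# Hypothetical candidates scored on three criteria.
lopsided = [1.0, 1.0, 0.3]     # two criteria met, one badly missed
balanced = [0.75, 0.75, 0.75]  # all three criteria nearly met

mean_prefers_lopsided = combine_mean(lopsided) > combine_mean(balanced)
product_prefers_balanced = combine_product(balanced) > combine_product(lopsided)

# Without partial credit, the balanced near-miss receives no reward at all.
balanced_without_partial_credit = combine_product(discontinuous(balanced))
```

In this toy case the product rewards candidates that approach all criteria simultaneously, which is one plausible reading of why a product of smooth functions helped here.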

Finally, Fig. 7 suggests that the population size should not be too large. For a given number of total calculations, large population sizes involve fewer generations and therefore fewer GA operations per individual. The poorer performance of large populations reported in our study may largely be due to this discrepancy.
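The trade-off between population size and number of generations can be made explicit with simple arithmetic. The sketch below assumes that the initial population is evaluated once, that elites are carried over without being recomputed, and that every non-elite individual in each generation costs one new calculation. The budget of 5000 calculations and the population size of 1000 are hypothetical values chosen for illustration.

```python
def generations_for_budget(budget, pop_size, elitism):
    """Number of generations after the initial one that fit in `budget` calculations."""
    n_elite = int(round(elitism * pop_size))
    new_per_generation = pop_size - n_elite
    if new_per_generation <= 0:
        raise ValueError("elitism leaves no room for new individuals")
    remaining = budget - pop_size  # the initial population is evaluated first
    return max(remaining, 0) // new_per_generation


small_population = generations_for_budget(5000, pop_size=100, elitism=0.5)
large_population = generations_for_budget(5000, pop_size=1000, elitism=0.5)
```

Under these assumptions the small population passes through 98 rounds of selection and crossover, the large one through only 8, for the same number of calculations.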

A visual summary of the effects of various parameter choices is presented in Fig. 8. The diagonal elements in Fig. 8 represent the average efficiency when holding a single GA parameter constant while averaging over all potential values of the remaining parameters. Off-diagonal elements in Fig. 8 represent the average efficiency when holding two parameters constant and averaging over the remaining parameter values. Figure 8 makes many of the conclusions determined through ANOVA visible at a glance. For example, the dark row and column in the matrix where elitism is zero illustrate the strong negative effect of this parameter choice. It is also easy to pinpoint the unfavorable interaction between lack of elitism and uniform selection or weak roulette selection (dark red). However, it is difficult to assess the statistical significance of differences. Therefore, Fig. 8 should be considered a rough overview map of parameter space.

Fig. 8 Average efficiency in finding 20 solutions when constraining one or two parameters while averaging over other parameter values. The values along the diagonal constrain a single parameter; off-diagonal elements constrain two parameters. The designations M, F, E, S, P, and X refer to mutation rate, fitness function, elitism, selection function, population size, and crossover function, respectively. The designations D, SP, S, T, R, U, 2P, and 1P refer to discontinuous, smooth product, smooth, tournament selection, roulette selection, uniform, two-point, and single-point, respectively
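The averaging behind such an overview map can be sketched in a few lines: each cell is the mean efficiency over all runs that share the constrained parameter values. The record layout, the keys `E` and `S` (borrowed from the designations in the caption), and the efficiency values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def constrained_means(records, keys, response="efficiency"):
    """Mean response for each combination of values of the constrained parameters."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r[response])
    return {combo: mean(values) for combo, values in groups.items()}


# Hypothetical toy results for two elitism levels (E) and two selection rules (S).
records = [
    {"E": 0.0, "S": "U", "efficiency": 1.0},
    {"E": 0.0, "S": "T", "efficiency": 2.0},
    {"E": 0.5, "S": "U", "efficiency": 4.0},
    {"E": 0.5, "S": "T", "efficiency": 5.0},
]

diagonal = constrained_means(records, ["E"])           # one parameter held fixed
off_diagonal = constrained_means(records, ["E", "S"])  # two parameters held fixed
```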

In summary, our study suggests several guidelines when designing a GA for perovskite oxide solar water splitters. First, elitism should be set high, for example to half the population. Second, a “strong” roulette or tournament selection method should be used. While of less importance than selection and elitism, we can also recommend a population size small enough to enable several GA operations per individual (100, or 0.5 % of the search space, was optimal in our tests) and a Smooth fitness function that is the product of several individual functions.

While these recommendations pertain to finding all 20 solutions to the perovskite solar water splitting problem, it is interesting to test how they generalize to other problems. As a first example, we consider the problem of finding only half the number of solutions in the search space and re-examine our suggestions for parameter choices. The number of evaluations needed to find any 10 solutions might be an important metric in computational screening if our desire is to quickly pinpoint a few compounds for laboratory followup. Table 7 lists the η² values from single-factor ANOVA for the problem of finding 10 solutions to the solar water splitting problem.

Table 7 Percentage of variance (η²) ascribed to various GA parameters from ANOVA (run without interactions) for the problem of finding 10 solar water splitters (half of all potential solutions)

The main difference between the ANOVA results for 10 versus 20 solutions is with respect to the population size. While the population size explained only 3 % of the variance for 20 candidates, it is much more important (13.2 %) for 10 candidates. In both problems, smaller population sizes (100) are more favorable than larger ones. However, the benefits of a small population size are much more pronounced when targeting 10 candidates. This might be because small populations carry less diversity than large populations, presenting a natural disadvantage in searching globally for multiple optima. Large populations are slow to find initial solutions because of fewer GA operations for a given number of calculations, as discussed earlier. However, once these initial solutions are discovered, the greater diversity in large populations could become advantageous in searching globally for the remaining solutions.

Using a multiple comparison test (Fig. 9), we find that another major difference in finding 10 versus 20 solutions is the choice of selection method. Whereas obtaining 20 solutions favored strong roulette selection, obtaining 10 solutions favors a strong tournament selection rule (tournament size of 5 or 10 %). In both cases, binary tournament selection performs similarly to strong roulette selection. It might be the case that tournament selection overall creates more selection pressure than roulette selection. Similar to small population sizes, the very high selection pressure of strong tournament selection might be advantageous for finding solutions within a small region of chemical space but be suboptimal in finding solutions globally.

Fig. 9 Multiple comparison tests illustrating the effect of selection function in determining the number of calculations needed to find 10 solutions for water splitting (top) and all 8 solutions for photocorrosion shields (bottom). The uncertainty values for photocorrosion shields are similar in magnitude to the marker size, and are omitted. Parameters that are statistically different from one another are represented by different colors and symbols. In both problems, tournament selection with a high tournament rate outperforms roulette selection

Figure 10 plots the efficiencies of finding ten and all solutions for each of the 2592 parameter sets. The two properties are correlated, suggesting that the same parameters might be used for both problems. In particular, we label the “best” GA overall, and note that it performed optimally in finding both 10 and 20 solutions. As discussed previously, Fig. 10 indicates that large population sizes (green diamonds) are less efficient than small population sizes (blue circles and orange squares), and even more so when attempting to find only ten solutions.

Fig. 10 Efficiency of the GA in finding all 20 solutions (y-axis) versus efficiency in finding 10 solutions (x-axis) to the solar water splitting problem. Each point represents one of the 2592 parameter sets tested. The data is labeled by population size. The parameter set we consider to be the “best” exhibits optimal efficiency in finding both 10 and 20 solutions

As a second test of the transferability of our recommended GA parameters, we attempt to identify transparent photocorrosion shields as described in “Transparent photocorrosion shield screening”. In Fig. 11, we compare the efficiency of each set of GA parameters in optimizing the solar water splitter problem to the efficiency in optimizing the transparent shield problem. We note that the best performance for the transparent shield problem is approximately 8 times more efficient than random search (Fig. 11), demonstrating that the GA is also applicable to a second problem.

Fig. 11 Efficiency of the GA in finding all 8 photoanode transparent shields (y-axis) versus efficiency in finding all 20 one-photon water splitters (x-axis). Each point represents one set of GA parameters

In general, there exists a correlation between the two problems: a GA parameter set that performs well in identifying solar water splitters is also more likely to identify transparent shields efficiently (Fig. 11). However, there is considerable scatter in the relation, which suggests that unfortunately even similar problems over the same chemical space require slightly different GA parameters for optimal performance.

It should be noted that chemical search outperforms the GA in finding transparent shields, with a 13.4-fold improvement over random search in finding all 8 solutions. Chemical rules might perform better in finding fewer solutions (as in the transparent shields problem), whereas the GA might be able to outperform chemical rules when finding a greater number of solutions (as in water splitting). In Fig. 6, for example, we see a sharp dropoff in the performance of chemical rules after about 15 solutions are found.
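To give a feel for what an improvement factor over random search means in absolute terms, the sketch below computes the expected number of evaluations a random search without replacement needs to find a given number of solutions. It uses the standard result that, among N candidates containing K solutions, the m-th solution is expected at draw m(N + 1)/(K + 1). The search space size of 19,000 is a rounded value, and treating the improvement factor as the ratio of this expectation to the evaluations used by the optimizer is an assumption made for this illustration.

```python
def expected_random_evaluations(n_candidates, n_solutions, n_to_find):
    """Expected draws, without replacement, until `n_to_find` solutions are found."""
    if not 0 < n_to_find <= n_solutions <= n_candidates:
        raise ValueError("need 0 < n_to_find <= n_solutions <= n_candidates")
    return n_to_find * (n_candidates + 1) / (n_solutions + 1)


N = 19000  # rounded size of the search space (assumption)

# Random search baseline for all 8 transparent shields.
random_all_shields = expected_random_evaluations(N, 8, 8)

# Evaluations implied by an 8-fold (GA) and a 13.4-fold (chemical rules) improvement.
implied_ga_evaluations = random_all_shields / 8.0
implied_rule_evaluations = random_all_shields / 13.4
```

Finding every one of a handful of solutions by random search requires evaluating most of the space, which is why even single-digit improvement factors correspond to thousands of avoided calculations.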

We perform a single-factor ANOVA on the photocorrosion shield data set to assess any difference in important parameters compared to solar water splitting. The results, presented in Table 8, are mostly similar to the water splitting case. However, elitism is an even greater factor in the transparent shields problem. In addition, a multiple comparison test (Fig. 9) demonstrates that strong tournament selection is optimal for finding all transparent shields, whereas strong roulette selection was optimal for finding all solar water splitters (Fig. 7). In this respect, finding all transparent shields is similar to finding only 10 solutions for water splitting. This similarity might originate because there exist only 8 solutions for the transparent shield in the search space, suggesting that we should use parameters that rapidly find a small number of solutions.

Table 8 Percentage of variance (η²) ascribed to various GA parameters from ANOVA (run without interactions) for the problem of finding all 8 transparent photocorrosion shields

In conclusion, the parameters with the largest effects on the results are elitism, selection method, and population size. Zero elitism is particularly detrimental to GA performance, especially when employing weak selection methods. The exact tuning of the parameters, and in particular the choice of strong roulette versus strong tournament selection, appears to depend on the problem. For example, our results suggest that higher selection pressures should be used when targeting fewer solutions. Overall, however, parameters that perform well on one problem tend to perform well on other similar problems.


While we have so far mainly discussed the GA as a “black box” optimizer, we now consider its operation in more detail. To help understand how a GA might improve performance in our problem, we refer to a previous study by Calle-Vallejo et al. [50] on trends in perovskite stability in pure oxides. Using DFT computations similar to those employed in this work, Calle-Vallejo et al. [50] observed that the enthalpy of formation was mostly constant for a given B ion (B3+ and B4+ behave differently). This result might help explain the efficiency of the GA: we would expect that ‘fit’ parents with favorable ‘B’ genes will produce children that inherit this B-site ‘gene’ that confers good formation enthalpy. In addition, there is also evidence from Calle-Vallejo et al.’s results that perturbations to the formation enthalpy due to the A site should follow the same rank and general direction independently of the B site (although the magnitude might vary depending on B) [50]. Thus there also exist ‘desirable’ values of the A gene that could be passed between generations. We speculate that similar trends may hold true for the band gap and band position criteria. For example, there exists a weak relation between formation enthalpy and band gap that suggests that the factors that control formation enthalpy might also tune the band gap [5, 6].

We note that our choice of encoding of a material into a genome string might affect GA robustness. Our encoding employed a short genome of length 3 with a high cardinality alphabet that contained up to 52 values. The advantage of this encoding was that it was trivial to encode and decode between a perovskite material and its genomic representation. However, this might not be an optimal encoding in terms of robustness, because it treats each element in the periodic table as an independent entity. In particular, it neglects chemical relationships between elements in the periodic table. For example, our encoding prevents a crossover operation from mixing an early transition metal with a late transition metal to produce an intermediate transition metal. Such an operation might be achieved by representing each element in a binary or Gray coding that represents electronegativity or Mendeleev number. This representation would allow a child to inherit an element that is intermediate in chemical behavior to its parents. In GA terms, this would create a long genome with a low cardinality alphabet. Such representations present more opportunities to find and mix building blocks that confer fitness, thereby enhancing efficiency [9, 35, 51].
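A Gray coding of the kind mentioned above can be sketched as follows. The sketch assumes that the elements allowed on a site have been sorted by a chemical scale (for example Mendeleev number) and that each element is identified by its position in that sorted list; with up to 52 values, 6 bits suffice. The function names are hypothetical. Whether bit-level crossover actually yields chemically intermediate children depends on the ordering and the cut position, so this only illustrates the encoding itself.

```python
def to_gray(index):
    """Binary-reflected Gray code of a non-negative integer."""
    return index ^ (index >> 1)


def from_gray(gray):
    """Inverse of to_gray."""
    index = 0
    while gray:
        index ^= gray
        gray >>= 1
    return index


def encode(index, bits=6):
    """Bit-string gene for the element at position `index` in the sorted list."""
    return format(to_gray(index), f"0{bits}b")


def decode(bitstring):
    """Position in the sorted element list for a bit-string gene."""
    return from_gray(int(bitstring, 2))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


# Neighbouring elements on the chemical scale differ by exactly one bit.
neighbour_distances = {hamming(encode(i), encode(i + 1)) for i in range(51)}
round_trip_ok = all(decode(encode(i)) == i for i in range(52))
```

Since 6 bits encode 64 values but only 52 are used, a GA built on this encoding would also need a rule for bit strings that decode to an unused position, for instance rejecting or repairing them.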

Our target problem was difficult in some respects: it is a multimodal problem with several solutions and in which the relationship between formation enthalpy, band gap, and band edge position is complex and unknown. However, our search problem was also simpler than many realistic materials design scenarios because our search space only involved a single generalized composition (ABX3) and a single crystal structure (perovskites). Many important materials design studies must search over several different composition templates and structure prototypes. For example, a recent computational investigation by Berger and Neaton [52] suggested that a Cr–V mixture in a double perovskite structure might be interesting for water splitting, and a separate computational study by Wu et al. [53] found many potentially interesting water splitters by canvassing chemical substitutions into the ICSD database. Extension of our scheme to double perovskites should be straightforward by increasing the length of the feature vector, but significant changes to materials encoding and crossover operations would be needed to test all the diverse structures found in the ICSD. However, we believe that this issue does not pose a major barrier to employing GA in more sophisticated searches. In particular, there exists a rich and successful history of employing GA coupled with computation to predict new crystal structures [33], and appropriate operators for crossover, selection, etc., have already been developed for searching over both crystal structures and compositions with a GA [54, 55].

We note that integrating GAs, or any optimization algorithm, into high-throughput computational searches still requires further effort. In particular, the GA implementation tested herein relies on completing one generation of computations before beginning the next generation. The typical way to parallelize this type of GA is to assign a “controller” node to coordinate the GA engine and assign the remaining nodes as “workers” that perform fitness evaluations (DFT computations). There are at least two major limitations with this setup. The first limitation is that the number of worker nodes must always balance the number of fitness evaluations needed in each generation in order to keep the workers occupied with computing tasks. Therefore, the number of worker nodes and the parallelizability of the fitness evaluations will restrict the choice of GA parameters. More worker nodes will require larger population sizes, lower elitism, or improved parallelizability in evaluating the fitness function. A second limitation is that the controller node must wait for all fitness evaluations within a generation to complete before proceeding with selection, crossover, and mutation operations. A single DFT computation that is slow to converge might thus impede the progress of the entire GA. This is a real problem with DFT computations because time to completion can vary by days and is difficult to predict in advance.
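The cost of the second limitation can be estimated with a short calculation. The sketch assumes one fitness evaluation per worker in a generation, so that the generation lasts as long as its slowest job while the other workers sit idle once they finish. The job durations are hypothetical.

```python
def synchronous_utilization(job_hours, n_workers):
    """Fraction of worker time spent computing during one synchronized generation."""
    if len(job_hours) > n_workers:
        raise ValueError("this sketch assumes at most one job per worker")
    wall_time = max(job_hours)  # everyone waits for the slowest job
    busy_time = sum(job_hours)
    return busy_time / (n_workers * wall_time)


# Hypothetical generation: three quick calculations and one slow to converge.
job_hours = [2.0, 3.0, 2.5, 48.0]
utilization = synchronous_utilization(job_hours, n_workers=4)
```

In this example a single slow calculation leaves the four workers busy for less than a third of the generation's wall time.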

Fortunately, alternate GA models have been designed that overcome such limitations in parallelization [35, 56, 57]. For example, in an asynchronous GA, the GA operators are applied immediately after each fitness evaluation using the population available at the time. Another technique is to perform independent GAs on different processors, but to communicate the fittest individuals observed between GA instances. These methods, as well as others that have been devised [35], address both issues presented earlier by ensuring that compute nodes are never kept idle. We note that while other optimization techniques such as simulated annealing are also available [13], a major advantage of the GA is its potential for attaining high parallel performance [58] and integration into high-throughput computation. However, a necessary step toward the automated inverse design of materials is the integration of the optimizer into one of several existing high-throughput DFT frameworks [7, 23, 59, 60].
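The asynchronous model described above can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not the scheme of any cited reference: the fitness function is a cheap stand-in for a DFT calculation, selection is a binary tournament over everything evaluated so far, previously evaluated genomes are not filtered out, and all names and parameter values are hypothetical. The point is the control flow: a new child is created and submitted as soon as any worker finishes.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def toy_fitness(genome):
    # Stand-in for an expensive calculation; hypothetical objective.
    return -sum((g - 3) ** 2 for g in genome)


def make_child(evaluated, rng, n_values):
    """Binary tournament, single-point crossover, and occasional mutation."""
    def pick():
        return max(rng.sample(evaluated, 2), key=lambda entry: entry[1])[0]

    a, b = pick(), pick()
    cut = rng.randrange(1, len(a))
    child = list(a[:cut] + b[cut:])
    if rng.random() < 0.2:
        child[rng.randrange(len(child))] = rng.randrange(n_values)
    return tuple(child)


def async_ga(budget=60, n_workers=4, n_values=7, seed=0):
    rng = random.Random(seed)
    evaluated = []  # (genome, fitness) pairs, in order of completion
    pending = {}    # future -> genome currently being evaluated
    submitted = 0

    def random_genome():
        return tuple(rng.randrange(n_values) for _ in range(3))

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        def submit(genome):
            nonlocal submitted
            pending[pool.submit(toy_fitness, genome)] = genome
            submitted += 1

        # Start by giving every worker a random individual.
        for _ in range(n_workers):
            submit(random_genome())

        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                evaluated.append((pending.pop(future), future.result()))
                # Refill the freed worker immediately; no generation barrier.
                if submitted < budget:
                    if len(evaluated) >= 2:
                        submit(make_child(evaluated, rng, n_values))
                    else:
                        submit(random_genome())
    return evaluated


results = async_ga()
best_genome, best_fitness = max(results, key=lambda entry: entry[1])
```

Because workers are refilled one at a time, the number of workers no longer constrains the population size or elitism, and a slow evaluation delays only its own result.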

We hypothesize that a more advanced GA might further improve performance beyond the values reported in this work. For example, niching, the use of a Pareto optimal rank fitness function, and a more flexible encoding were already mentioned as potential enhancements [35]. In addition, previous work by Balamurugan et al. [61] suggests that a “hybrid” approach, whereby a GA is coupled to local search using alchemical derivatives [62, 63], might be a promising avenue for further performance improvements.


We demonstrated that use of a GA improves the efficiency of searching a chemical space of almost 19,000 perovskites for solar water splitters. The GA was especially useful at rapidly finding half of the solutions (an efficiency gain of almost 10 times over random search), and provided up to 5.8 times greater efficiency in finding all solutions. The performance of the best GA tested was comparable to a set of chemical rules we designed to filter and rank perovskite materials for this problem. A GA might therefore be applied in situations where chemical rules are not known in advance. Combining the GA with chemical rules further improved performance, requiring 16.9 and 11.7 times fewer fitness evaluations than random search to find 10 and all 20 solutions, respectively. We further found that in an alternate problem aimed at uncovering transparent photocorrosion shields, the GA performed 8 times more efficiently than random search.

Using ANOVA, we determined that the most important parameters for good performance were elitism and selection function. The GA performed best when the elitism was set to at least 50 %. The appropriate selection function appears to depend on the number of solutions in the search space. For finding all 20 solutions to the solar water splitting problem, strong roulette selection performs best. For finding 10 solutions to the water splitting problem or 8 solutions to the photocorrosion problem, strong tournament selection performs better. In all cases, we found small population sizes to be beneficial, although the advantage diminished as the desired number of solutions increased.

We speculate that further gains in GA performance might be obtained through niching, longer genome encodings, or a Pareto optimal fitness function. While significant work still remains to couple a GA “control loop” to an automated and rapid DFT computation framework, our results suggest that such a technique presents a viable method to rapidly screen large chemical spaces for technological materials.