Abstract
We examine the performance of genetic algorithms (GAs) in uncovering solar water splitters over a space of almost 19,000 perovskite materials. The entire search space was previously calculated using density functional theory to determine solutions that fulfill constraints on stability, band gap, and band edge position. Here, we test over 2500 unique GA implementations in finding these solutions to determine whether GAs can avoid the need for brute force search, and thereby enable larger chemical spaces to be screened within a given computational budget. We find that the best GAs tested offer almost a 6 times efficiency gain over random search, and are comparable to the performance of a search based on informed chemical rules. In addition, the GA is almost 10 times as efficient as random search in finding half the solutions within the search space. By employing chemical rules, the performance of the GA can be further improved to approximately 12–17 times better than random search. We discuss the effect of population size, selection function, crossover function, mutation rate, fitness function, and elitism on the final result, finding that selection function and elitism are especially important to GA performance. In addition, we determine that parameters that perform well in finding solar water splitters can also be applied to discovering transparent photocorrosion shields. Our results indicate that coupling GAs to high-throughput density functional calculations presents a promising method to rapidly search large chemical spaces for technological materials.
Introduction
The discovery of improved materials benefits science, technology, and society. While there exist many methods to uncover new materials, one promising and fairly recent approach to materials discovery uses density functional theory (DFT) calculations [1, 2] to predict properties of known and hypothetical materials across a large chemical space. This approach can be quicker and cheaper than direct experimental study, and has led to new experimental findings in fields as disparate as Li ion batteries, hydrogen storage, magnetic materials, multiferroics, and catalysts [3, 4].
One pressing societal problem is meeting world energy demand in an environmentally responsible manner. A possible contribution is to convert solar energy into hydrogen and oxygen by means of a photoelectrocatalytic solar cell. In this device, one or more photons split water into H_{2} and O_{2} gases. These gases are stored and later recombined to produce energy. An interesting class of materials for solar water splitters is the perovskite family, which consists of materials with general formula ABX_{3}.
Recently, Castelli et al. [5, 6] used DFT to screen approximately 19,000 perovskite materials as potential solar water splitters, and 20 interesting compounds were identified for experimental follow-up. However, the result highlights a fundamental challenge in materials discovery: the number of interesting compounds comprises a small fraction of the total number of possible compounds. Therefore, a large number of calculations are needed to find a relatively small number of interesting materials.
While exhaustive search is sometimes achievable, search spaces for new materials might encompass on the order of millions or tens of millions of hypothetical compounds. For example, the 5-atom perovskites screened by Castelli et al. make up only a small portion of potentially promising materials for this application. Unfortunately, the number of DFT calculations that can reasonably be performed on today’s computers is limited to the order of tens of thousands. For example, the Materials Project required over 10 million CPU hours to generate structural and energetic data for about 30,000 materials [7, 8]. It is therefore essential to improve the efficiency of computational search, so that enumeration of all members of a search space is not needed to confidently uncover all good candidates.
In this study, we investigate the use of evolutionary algorithms [9] (which we subsequently refer to as “genetic algorithms”) as an optimization model to reduce the number of DFT computations needed to discover new materials. We reexamine the dataset produced by Castelli et al. [5, 6] to determine whether the same promising candidates could be discovered with fewer computations by employing a genetic algorithm (GA). Our goal is to demonstrate that optimization algorithms coupled to DFT calculations present a path forward to searching very large chemical spaces for interesting technological materials.
Several other researchers have investigated tiered or algorithmic screening processes coupled to calculation [10–13], sometimes by employing GAs [14–20]. The goal of this study is not to find new materials, but rather to assess the robustness of the GA as an inverse solver for the perovskite photocatalysis problem. We compare the efficiency of GA search to competing methods of screening materials, such as random search and a chemical rule-based search. In addition, we distinguish between different forms of GA by testing the performance of 2592 parameter sets in over 50,000 GA trials. We report the most crucial parameters for achieving efficient GAs within the perovskite photocatalysis problem. Finally, we investigate transferability of optimized GA parameters by applying them to a second problem of searching for transparent photocorrosion shields.
Calculation methods
Search space and criteria for solar water splitting
Our search space consists of ABX_{3} cubic perovskites with 5-atom unit cells. Perovskites are an interesting search space because they display a diverse set of properties, and many are technologically useful [21, 22]. The cubic perovskite crystal structure is illustrated in Fig. 1. For the cations A and B, a set of 52 potential elements was selected as described by Castelli et al. [5, 6]. For the anion group X_{3}, the search included seven mixtures of oxygen, nitrogen, sulfur, and fluorine: O_{3}, O_{2}N, ON_{2}, N_{3}, O_{2}F, O_{2}S, and OFN. In total, the search space consists of 18,928 compounds. The complete data set has been reported previously [5, 6] and is freely available at the Computational Materials Repository (CMR) web site [5, 23, 24].
Potential water splitting materials were identified based on band gap, enthalpy of formation, and band edge positions [5, 6]. A material is considered a solution if:

The band gap (either direct or indirect) falls in the range 1.5–3 eV.

The heat of formation is less than 0.2 eV/atom. The heat of formation is calculated using a linear programming approach that considers a reference set of approximately 400 elements and bulk single- and bi-metal oxides, fluorides, sulfides, nitrides, oxyfluorides, oxysulfides, and oxynitrides.

The band edges (either direct or indirect) straddle the H^{+}/H_{2} and O_{2}/H_{2}O band level positions.
The total number of solutions within the search space, including both known and not-yet-examined compounds, is 20 [5, 6]. These compounds are listed in Table 1.
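The three criteria above can be read as a simple predicate. The following is a minimal sketch, not the screening code used to build the dataset. It assumes band edges are expressed relative to the H^{+}/H_{2} level with the valence band edge at the larger value, and it takes the O_{2}/H_{2}O level to lie 1.23 eV from H^{+}/H_{2}. The function names, the argument layout, and the reading that gap and edges are taken from the same (direct or indirect) data set are assumptions made for illustration.

```python
# Reference levels on a scale whose zero is the H+/H2 level (assumed convention).
H2_LEVEL = 0.0    # H+/H2
O2_LEVEL = 1.23   # O2/H2O, taken as 1.23 eV from H+/H2 (assumed value)


def straddles_water_redox(gap, vb_edge, cb_edge):
    """True if the gap is in 1.5-3 eV and the edges straddle both redox levels."""
    gap_ok = 1.5 <= gap <= 3.0
    edges_ok = cb_edge <= H2_LEVEL and vb_edge >= O2_LEVEL
    return gap_ok and edges_ok


def is_water_splitter(heat_of_formation, direct, indirect):
    """direct and indirect are (gap, vb_edge, cb_edge) tuples in eV."""
    stable = heat_of_formation < 0.2  # eV/atom
    return stable and (straddles_water_redox(*direct)
                       or straddles_water_redox(*indirect))
```

A compound passes if it is stable and either its direct or its indirect gap data satisfy the electronic criteria.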
Calculation details
The DFT calculations were performed using the GPAW code [25, 26]. Total energies and structural relaxations were calculated under the RPBE approximation [27]; band gaps were calculated using the GLLB-SC semilocal functional [28, 29], which was previously demonstrated to improve the reliability of predicted gaps [6]. The band edge positions were determined by an empirical method [30, 31] that positions the center of the gap (E_{F}) at:
$$ E_{\text{F}} = E_{0} + \left( \chi_{\text{A}} \chi_{\text{B}} \chi_{\text{X}}^{3} \right)^{1/5} $$
where E_{0} is the difference between the hydrogen electrode level and vacuum (E_{0} = −4.5 eV) and χ_{i} denotes the Mulliken electronegativity of the element on site i. For multiple elements composing the anion X_{3}, the geometric mean of electronegativities was used. The band edge positions were obtained by adding and subtracting half the calculated band gap from E_{F}.
All magnetic ions were initialized ferromagnetically. To break cubic symmetry, atomic positions were displaced by 0.01 Å prior to structural relaxation. However, we note that several perovskite chemistries possess large driving forces for more complex distortions [32], and any effect of these distortions on the band structure was not modeled in the data.
Genetic algorithm method
For a general introduction to GAs, we refer to several previous works [20, 33–35]. We performed the GA in Python using the open-source code Pyevolve, version 0.6rc1 [36, 37]. We modified some of the Pyevolve code such that the GA engine operates as described in the following sections. Because the entire search space has been precomputed with DFT, we fetch the result of all fitness evaluations from an internal database rather than performing a DFT calculation on demand within the GA framework. The same material might appear in multiple generations of the GA; we only count unique materials when reporting performance.
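The bookkeeping described above (look up a precomputed fitness, count each material only once) might be sketched as follows. This is an illustration rather than the modified Pyevolve code; the class name `PrecomputedFitness` and the use of a plain dictionary as the "internal database" are assumptions.

```python
class PrecomputedFitness:
    """Fitness lookup that records which unique candidates were requested."""

    def __init__(self, table):
        # table maps a candidate tuple, e.g. (A, X3, B), to its fitness score
        self.table = table
        self.evaluated = set()

    def __call__(self, candidate):
        key = tuple(candidate)
        self.evaluated.add(key)   # repeated requests are stored only once
        return self.table[key]

    @property
    def n_unique(self):
        """Number of distinct materials evaluated so far."""
        return len(self.evaluated)
```

With this wrapper, a candidate that reappears in later generations costs nothing extra in the reported number of calculations.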
To account for variability in GA results, we repeat the GA optimization routine 20 times (using different initializations) and report the average and standard deviation of the trials for each parameter set. In total, 2592 unique parameter set combinations were attempted, leading to 2592 × 20 = 51,840 independent GA runs. The same procedure was repeated for optimizing transparent photocorrosion shields (51,840 additional GA runs). The parameters we tested are described in more detail in the following subsections.
Candidate encoding
Each potential materials candidate in the ABX_{3} chemical space was encoded as a three-element composition vector C = [A, X_{3}, B]. The first and third positions represent the A and B cations in the ABX_{3} composition, and contain one of 52 values, each of which corresponds to an element. The middle position for X_{3} has one of seven values, each representing a potential anion group. As an example, the vector [3, 1, 4] corresponds to the perovskite candidate “LiBeO_{3}” (the Z values of Li and Be are 3 and 4, respectively, and X_{3} = 1 represents O_{3} in our convention). We structured the candidate vector in the order [A, X_{3}, B].
We note that our choice of encoding means that our technique is more appropriately described as an evolutionary algorithm rather than a GA (in the latter, binary strings are used for encodings). We examine this choice in greater detail in the “Discussion”.
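To make the encoding concrete, the sketch below decodes a composition vector and checks the size of the search space. Only the mapping X_{3} = 1 → O_{3} is stated in the text; the remaining anion codes follow the order in which the groups are listed and are an assumption, as is the truncated `SYMBOLS` lookup.

```python
N_CATIONS = 52   # elements allowed on the A and B sites
N_ANIONS = 7     # anion groups allowed on the X3 site

# Codes 2..7 are assumed to follow the order in which the groups are listed.
ANION_GROUPS = ["O3", "O2N", "ON2", "N3", "O2F", "O2S", "OFN"]

# Hypothetical, truncated lookup from atomic number to element symbol.
SYMBOLS = {3: "Li", 4: "Be", 22: "Ti", 38: "Sr"}


def decode(candidate):
    """Turn a vector [A, X3, B] into a formula string such as 'LiBeO3'."""
    a, x3, b = candidate
    return SYMBOLS[a] + SYMBOLS[b] + ANION_GROUPS[x3 - 1]


def search_space_size():
    """Every (A, B) cation pair combined with every anion group."""
    return N_CATIONS * N_CATIONS * N_ANIONS
```

Here `decode([3, 1, 4])` gives `"LiBeO3"`, and `search_space_size()` gives 52 × 52 × 7 = 18,928, matching the number of compounds quoted for the data set.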
Population size and initialization
We tested three population sizes: 100, 500, and 1000. The lower range of this set corresponds to approximately 0.5 % of the total search space, whereas the upper range corresponds to over 5 % of the search space. We found that reducing the population size significantly below 100 led to stagnation from insufficient diversity within the population, making it difficult to obtain converged results.
The initial population was generated using random values for each component of the composition vector. For each GA input parameter set, the same set of 20 random initial populations was repeated.
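One way to obtain the same 20 initial populations for every parameter set is to seed the random number generator with the trial index. The sketch below assumes this seeding scheme; the function names and the placeholder cation list are illustrative and not taken from the study.

```python
import random


def random_population(size, allowed_cations, n_anion_groups=7, seed=None):
    """Draw `size` random composition vectors [A, X3, B]."""
    rng = random.Random(seed)
    return [
        [rng.choice(allowed_cations),
         rng.randint(1, n_anion_groups),   # anion group code, 1..7 inclusive
         rng.choice(allowed_cations)]
        for _ in range(size)
    ]


def make_initial_populations(size, allowed_cations, n_trials=20):
    """One population per trial; the trial index is reused as the seed."""
    return [random_population(size, allowed_cations, seed=trial)
            for trial in range(n_trials)]
```

Because the seeds are fixed, calling `make_initial_populations` again with the same arguments returns identical populations, so differences between parameter sets are not caused by different starting points.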
Fitness function
We note that when optimizing multiple, independent objectives (e.g., stability, gap, and band edge position), there is no “correct” way to rank materials. We tested three different strategies for assigning a numerical fitness function to individuals in the multivariate case, although other strategies such as the Pareto optimality ranking [35, 38] also exist.
The first fitness function, which we call “Discontinuous”, sums the values of a stability score, a band gap score, and two band edge position scores. These component scores are plotted in Fig. 2. The principle of the “Discontinuous” fitness function is to withhold awarding any points unless a target property is fully met.
We label the second fitness function tested as “Smooth”. This function also sums a stability score, a band gap score, and two band edge position scores. However, in contrast to the “Discontinuous” function, the “Smooth” function continuously increases the fitness score as an individual becomes closer to meeting a target property. The component scores for the “Smooth” fitness function are also plotted in Fig. 2.
The third fitness function, which we denote as “Smooth Product”, employs the same component fitness functions as the “Smooth” fitness function (Fig. 2). However, rather than summing the component fitnesses, we take a product of the stability fitness with the sum of the band gap and band edge position fitness. The principle of the “Smooth Product” function is to assign higher fitness to compounds that balance stability and desired electronic structure.
In each case, we normalize the maximum score to 30 potential points. For the band gap and band edges, we use the higher score based on independent assessments of the direct gap and indirect gap data.
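The difference between the three strategies can be sketched in a few lines. The actual component scores are defined in Fig. 2 and are not reproduced here; the linear ramp, the equal weighting of components, and all function names below are assumptions chosen only to illustrate the "all or nothing", "sum", and "product" ideas with a maximum of 30 points.

```python
MAX_SCORE = 30.0


def step_score(value, lower, upper):
    """'Discontinuous' style component: full credit inside the window, else none."""
    return 1.0 if lower <= value <= upper else 0.0


def ramp_score(value, lower, upper, width):
    """'Smooth' style component: credit falls linearly to zero over `width`."""
    if lower <= value <= upper:
        return 1.0
    distance = lower - value if value < lower else value - upper
    return max(0.0, 1.0 - distance / width)


def fitness_sum(stability, gap, edge_vb, edge_cb):
    """Sum of four component scores in [0, 1], rescaled to 30 points."""
    return MAX_SCORE * (stability + gap + edge_vb + edge_cb) / 4.0


def fitness_product(stability, gap, edge_vb, edge_cb):
    """Stability multiplied by the summed electronic scores, rescaled to 30."""
    return MAX_SCORE * stability * (gap + edge_vb + edge_cb) / 3.0
```

With the product form, an unstable compound (stability score of zero) receives zero fitness no matter how favorable its electronic structure, whereas the sum form would still award it partial credit.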
Selection function and scaling factor
We tested three algorithms for selecting individuals as parents for mating:

Uniform selection—random individuals in the population are selected to be parents without regard to fitness score

Roulette Wheel selection—the probability of an individual to be selected as a parent is proportional to its fitness score

Tournament selection—a set of tournaments is performed. In each tournament, a sample of the population is randomly selected. The selected individual is the one with the highest fitness within the tournament sample.
Whereas uniform selection involves no additional parameters, both roulette wheel and tournament selection are tunable through parameters that affect selection pressure. A high selection pressure biases selection towards the stronger individuals at the expense of population diversity.
For roulette wheel selection, we tune the selection pressure through a linear scaling of the raw fitness scores. The linear scaling approach prevents early dominance of a single individual and helps distinguish individuals in later generations (when raw fitness values might all be close to optimal). Linear scaling modifies the raw fitness values in each generation such that:
$$ f^{\prime } = af + b $$
where f′ is the scaled fitness, f is the raw fitness, and a and b are constants that change upon each generation. The constants are selected such that (i) the average fitness within the generation is maintained (\( f_{\text{avg}}^{\prime } = f_{\text{avg}} \)) and (ii) the maximum fitness is equal to a constant C multiplied by the average fitness (\( f_{ \hbox{max} }^{\prime } =Cf_{\text{avg}}^{\prime } \)). The constant C is a free parameter that represents the desired selection pressure. We tested several values of C ranging from 1.25 to 10 in our study, but automatically adjust it when necessary to prevent negative scaled fitness scores.
In tournament selection, the scaling parameter does not affect the results because the selected member depends only on the fitness rank rather than its absolute value. We instead tune the selection pressure through the tournament size, with larger tournaments creating greater selection pressure. We test a commonly used tournament size of two individuals, as well as tournament sizes that are 5 and 10 % of the overall population size.
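The two conditions on the linear scaling fix the constants: a = (C − 1) f_avg / (f_max − f_avg) and b = f_avg (1 − a). The sketch below implements this together with roulette and tournament selection. The fallback used when a scaled score would become negative (map the minimum to zero while keeping the average) is a common textbook choice and an assumption here; the text does not specify how C is adjusted. Function names are illustrative.

```python
import random


def linear_scaling(raw, c):
    """Scale fitness so the mean is kept and the maximum equals c * mean."""
    f_avg = sum(raw) / len(raw)
    f_max, f_min = max(raw), min(raw)
    if f_max == f_avg:            # all individuals equal: nothing to scale
        return list(raw)
    a = (c - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    if a * f_min + b < 0.0:
        # Assumed fallback: lower the pressure so the minimum maps to zero.
        a = f_avg / (f_avg - f_min)
        b = -a * f_min
    return [a * f + b for f in raw]


def roulette_select(population, scaled, rng):
    """Pick one parent with probability proportional to scaled fitness."""
    if sum(scaled) <= 0.0:
        return rng.choice(population)
    return rng.choices(population, weights=scaled, k=1)[0]


def tournament_select(population, raw, size, rng):
    """Pick the fittest member of a random sample of `size` individuals."""
    contestants = rng.sample(range(len(population)), size)
    return population[max(contestants, key=lambda i: raw[i])]
```

Tournament selection uses only the ordering of the raw scores within the sample, which is why the scaling constant C has no effect on it.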
Crossover function and rate
The crossover function determines how children are generated given two parents. We tested three crossover functions:

Single-point crossover—The parents swap either the A or B cation (but not both) to produce two children.

Two-point crossover—The parents swap the anion X_{3} to produce two children.

Uniform crossover—A, B, and X_{3} are randomly swapped between two parents. We explicitly prevent the children from being identical to the parents unless parents are identical.
A pictorial representation of the crossover operations is presented in Fig. 3.
The crossover rate determines what percent of the parents mate to produce children; this parameter was set to 90 %, such that most parents selected for mating produce children. The remaining 10 % are passed to the next generation without modification. This choice of crossover rate is consistent with suggestions from previous studies [39, 40].
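On a three-gene genome the three crossover operators reduce to swapping different subsets of positions, which the following sketch makes explicit. It is not the Pyevolve implementation. In particular, the way uniform crossover avoids reproducing the parents (redrawing until a non-empty, incomplete subset of the differing positions is swapped) is an assumption, as is returning plain copies when the parents differ in fewer than two positions.

```python
import random

A, X3, B = 0, 1, 2   # positions in the composition vector [A, X3, B]


def _swap(p1, p2, sites):
    """Return two children with the genes at `sites` exchanged."""
    c1, c2 = list(p1), list(p2)
    for i in sites:
        c1[i], c2[i] = p2[i], p1[i]
    return c1, c2


def single_point(p1, p2, rng):
    """Swap either the A or the B cation, chosen at random."""
    return _swap(p1, p2, [rng.choice([A, B])])


def two_point(p1, p2, rng=None):
    """Swap the anion group X3 (the segment between the two cut points)."""
    return _swap(p1, p2, [X3])


def uniform(p1, p2, rng):
    """Swap a random subset of genes, avoiding copies of the parents."""
    differing = [i for i in range(3) if p1[i] != p2[i]]
    if len(differing) < 2:
        return list(p1), list(p2)   # no swap could create a new individual
    while True:
        sites = [i for i in differing if rng.random() < 0.5]
        if 0 < len(sites) < len(differing):
            return _swap(p1, p2, sites)
```

If both parents carry the same X_{3} value, `two_point` returns children identical to the parents, which is likely with only seven anion groups available.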
Elitism
In many optimization problems, the performance of the GA can be improved by intentionally carrying over some of the fittest individuals of the current generation to the next generation. In our implementation, such “elite” individuals replace the least fit individuals of the new population. We tested our GA with elitism turned off, and with 10, 50, and 75 % of the fittest individuals carried over to the next generation.
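The replacement step described above could look as follows. This is a minimal sketch with hypothetical names; rounding the elite count to the nearest integer is an assumption.

```python
def apply_elitism(old_pop, old_fit, new_pop, new_fit, fraction):
    """Replace the least fit members of new_pop with the elite of old_pop."""
    n_elite = int(round(fraction * len(old_pop)))
    if n_elite == 0:
        return list(new_pop)
    # Indices of the fittest individuals of the previous generation.
    elite = sorted(range(len(old_pop)),
                   key=lambda i: old_fit[i], reverse=True)[:n_elite]
    # Indices of the new individuals that survive (the fittest ones).
    keep = sorted(range(len(new_pop)),
                  key=lambda i: new_fit[i], reverse=True)[:len(new_pop) - n_elite]
    return [new_pop[i] for i in keep] + [old_pop[i] for i in elite]
```

The population size is unchanged by this step: every elite individual carried over displaces exactly one of the weakest new individuals.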
Mutation function and rate
Our mutation function modifies an element of the composition vector to a random value. We tested mutation rates of 1, 5, and 10 %. We note that other potential mutation operators are also possible, such as switching the identities of the A and B cation.
Convergence and additional mutation operators
When a single solution is targeted, typical convergence criteria for GA are stagnation of population diversity or failure of the fittest individual to improve with generation number. However, our GA problem is multimodal, i.e., there exist several individuals that maximize our fitness function. Our goal is to find all possible materials that meet our design criteria, and we aim to prevent population convergence to a single optimum rather than promote it.
To encourage multimodal optimization, we introduce two additional mutation operators. The first, which we call local mutation, mutates a single gene of any duplicated individuals within a generation. This operator can be thought of as performing a local search around a duplicated solution. In addition to local mutation, we detect when all members of the population were already explored in a previous generation. In these instances, we introduce a global mutation that mutates a single gene of the entire population and increases the crossover rate to 100 % for a single generation. This “resets” the search space when the GA becomes stuck on solutions already explored in the past. We found that absent these operators, our GA could stagnate for several thousand generations, recycling the same individuals without producing new solutions.
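A rough sketch of the duplicate handling and the stagnation check is given below. The names are hypothetical, and the sketch takes "duplicated individuals" to mean every copy after the first occurrence; whether the first copy is also mutated is not stated in the text.

```python
import random


def mutate_gene(ind, cations, n_anions, rng):
    """Set one randomly chosen gene of [A, X3, B] to a random allowed value."""
    child = list(ind)
    site = rng.randrange(3)
    if site == 1:
        child[site] = rng.randint(1, n_anions)
    else:
        child[site] = rng.choice(cations)
    return child


def local_mutation(population, cations, n_anions, rng):
    """Mutate one gene of each repeated individual within a generation."""
    seen, result = set(), []
    for ind in population:
        if tuple(ind) in seen:
            ind = mutate_gene(ind, cations, n_anions, rng)
        seen.add(tuple(ind))
        result.append(list(ind))
    return result


def needs_global_mutation(population, explored):
    """True when every member was already evaluated in an earlier generation."""
    return all(tuple(ind) in explored for ind in population)
```

Note that a random redraw can return the original value, so a mutated copy is not guaranteed to be new; a stricter variant would redraw until the individual changes.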
Other methods to tackle multimodal problems have been developed; for example, a well-studied class of techniques to handle multimodal problems, termed niching, attempts to find specialized solutions within several regions of a problem space. However, there exist many methods of implementing niching, and additional parameters must be optimized within each niching implementation [20, 35, 38, 41–44]. Therefore, we leave a comprehensive exploration of niching to a future study.
Evaluating performance
To evaluate the performance of the GA, we tested it against other methods and also on a different (but related) application of transparent photocorrosion shields.
Chemical rule-based search method
In addition to the GA, we independently tested a rule-based method of selecting compounds. This “chemical rule-based search” provides a sense of how to prioritize a search space using empirical knowledge and scientific principles. In particular, we apply the following rules:

1. “Valence balance rule”—the formal oxidation states of all the elements in a realistic ionic material must sum to zero, such that the overall material is valence-neutral. In situations where elements are known to display multiple oxidation states (for example, the transition metals), the condition must be met for at least one of the oxidation state combinations. Materials that cannot be valence-balanced, such as LiCaO_{3}, are completely excluded from the search.

2. “Even–odd electrons rule”—materials containing an odd number of electrons are excluded because they will contain a partially occupied eigenstate at the Fermi level. These materials will necessarily be metallic with zero band gap and therefore unsuitable for solar water splitting.

3. “Goldschmidt tolerance factor ranking”—materials fulfilling the first two rules are ranked using their Goldschmidt tolerance factor [45]. The Goldschmidt tolerance factor t is based on the geometry of the perovskite cell and assesses the likelihood of a material to form the perovskite crystal structure. It is defined as:
$$ t = \frac{{\left( {r_{\text{A}} + r_{\text{X}} } \right)}}{{\sqrt 2 \left( {r_{\text{B}} + r_{\text{X}} } \right)}} $$
where r_{i} corresponds to ionic radii of the i = A, B, and X_{3} sites. In an ideal perovskite, t is equal to unity (t_{ideal} = 1). We ranked perovskites by the absolute deviation from this ideal value, that is by |t − t_{ideal}|, such that compositions that more closely meet the ideal perovskite structure are tested first. For metals with multiple known ionic radii, we used an unweighted average of known radii as the radius. When the anion site X_{3} contains multiple elements, we used a weighted average of the individual ionic radii.
These rules might approximate the intuition of a researcher in prioritizing perovskite compounds for computation.
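Rules (2) and (3) are simple enough to sketch directly. In the code below, the electron count of a neutral 5-atom cell is taken as the sum of the atomic numbers, which is our reading of the even–odd rule; the function names and the example radii (approximate tabulated ionic radii for SrTiO_{3}, in Å) are assumptions for illustration and are not values taken from the data set.

```python
import math


def has_odd_electron_count(z_a, z_b, z_anions):
    """Electron parity of a neutral ABX3 cell from its atomic numbers."""
    return (z_a + z_b + sum(z_anions)) % 2 == 1


def tolerance_factor(r_a, r_b, r_x):
    """Goldschmidt tolerance factor t = (rA + rX) / (sqrt(2) (rB + rX))."""
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def rank_by_tolerance(candidates):
    """Sort (label, r_a, r_b, r_x) tuples by |t - 1|, most ideal first."""
    return sorted(candidates,
                  key=lambda c: abs(tolerance_factor(c[1], c[2], c[3]) - 1.0))


# Illustrative radii in angstrom (assumed): Sr 1.44, Ti 0.605, O 1.40.
t_srtio3 = tolerance_factor(1.44, 0.605, 1.40)
```

For these radii `t_srtio3` is very close to 1, so SrTiO_{3} would be ranked near the top of the list, while LiBeO_{3} (Z = 3, 4, and 3 × 8) has 31 electrons and would already be excluded by rule (2).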
Transparent photocorrosion shield screening
In addition to solar water splitters, the perovskite dataset can also be used to screen potential transparent shields to protect against photocorrosion, as recently reported by Castelli et al. [5]. The need for a transparent protecting shield lies in the difficulty of finding stable, medium-gap perovskites needed for water splitting; usually, stable perovskites tend to also have wide gaps [5, 6]. A wide-gap shield might therefore be placed in front of a medium-gap photoabsorber to enhance protection against (photo)corrosion without affecting light capture properties.
From the point of view of the GA, the only component that requires modification to address this new problem is the fitness function. In particular, we are now screening for direct gap semiconductors with gaps greater than 3 eV in order to obtain transparency. In addition, the band edge position criterion now stipulates that the valence band of the shield must lie between the valence band position of the water splitter and the oxygen evolution potential. This corresponds to a valence band position lying between 1.7 and 2.5 eV with respect to the H^{+}/H_{2} level [5]. There is no restriction on the position of the conduction band, other than that implied by the band gap and valence band criteria.
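The modified electronic criteria can be written as a short predicate. This sketch covers only the gap and valence band conditions stated in the paragraph above; any stability requirement is left out because its threshold is not restated here. The function name is hypothetical, and treating the 1.7–2.5 eV window as inclusive is an assumption.

```python
def meets_shield_electronic_criteria(direct_gap, vb_edge):
    """Gap and valence band test for a transparent shield (energies in eV).

    vb_edge is measured with respect to the H+/H2 level.
    """
    transparent = direct_gap > 3.0
    vb_in_window = 1.7 <= vb_edge <= 2.5
    return transparent and vb_in_window
```

No conduction band argument is needed: its position follows from the valence band edge and the gap.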
The modified component fitness functions are plotted in Fig. 4. The overall fitness functions, “Discontinuous”, “Smooth”, and “Smooth Product” are taken as sums and products of the component functions similarly to solar water splitting. There exist 8 solutions in our search space to the transparent shield screening problem (Table 2).
For the transparent shield problem, we retested all 2592 GA parameter sets that were examined for water splitting, with 20 trials for each parameter set, resulting in 51,840 additional GA runs.
Efficiency metric
We evaluate the robustness of both GAs and chemical rules against a standard benchmark of random guesses within the search space. The metric used for comparing algorithms is the expected number of computations needed to produce a given number of solutions to the problem. In particular, we focus on the average number of computations needed to uncover all solutions as well as the average number of computations to produce any half of the solutions. We define the efficiency (or robustness) of an optimization strategy as the ratio of the average number of calculations needed for random search to the average number of calculations needed by the optimization strategy to produce a given number of solutions:
$$ e^{n} = \frac{{c_{\text{rand}}^{n} }}{{c_{\text{opt}}^{n} }} $$
where \( e^{n} \) represents our definition of efficiency in finding n solutions, \( c_{\text{opt}}^{n} \) is the average number of calculations needed by the optimization strategy to find n solutions, and \( c_{\text{rand}}^{n} \) is the average number of calculations needed for a random search strategy to find n solutions. For random search, the average number of computations c to produce n solutions is given mathematically by [46]:
$$ c_{\text{rand}}^{n} = \frac{{n\left( {x + 1} \right)}}{s + 1} $$
where x is the size of the search space (18,928) and s is the total number of solutions (20 for water splitting and 8 for photocorrosion shields). The number of computations \( c_{\text{rand}}^{n} \) needed to obtain n = 10 and n = 20 solutions for water splitting is 9014 and 18,028, respectively, when randomly choosing candidates. An efficiency of 2 therefore indicates that 4507 and 9014 computations were needed to find n = 10 and n = 20 solutions, respectively.
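The benchmark numbers above can be checked with a few lines of code. The closed form used here, n(x + 1)/(s + 1), is the standard expectation for drawing without replacement and reproduces the 9014 and 18,028 figures quoted in the text; the function names are illustrative.

```python
def expected_random_computations(n, space_size, n_solutions):
    """Mean number of random draws (without replacement) to find n solutions."""
    return n * (space_size + 1) / (n_solutions + 1)


def efficiency(n, c_opt, space_size=18928, n_solutions=20):
    """Ratio of random-search cost to the cost of the optimization strategy."""
    return expected_random_computations(n, space_size, n_solutions) / c_opt
```

For example, a strategy that needs 3100 calculations to find all 20 water splitters has `efficiency(20, 3100)` of about 5.8.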
ANOVA
To compare the contributions of each parameter choice to the GA’s efficiency, we used the analysis of variance (ANOVA) method [47–49]. ANOVA allows one to assess what factors are statistically relevant to influencing a result, the relative degree of importance of each factor, and potential interactions between parameters. We performed the ANOVA using Matlab’s multiway anovan() method. Statistical tests were performed with a 95 % confidence level, and the multiple comparison test was performed using “Tukey’s honestly significant difference” criterion.
Results
Now that we have introduced our GA parameter choices and efficiency measure, we compare a GA-guided search to random and chemical rule-based search. In Fig. 5, we plot the average number of fitness evaluations (DFT computations) needed to achieve a given number of solutions. Random search, depicted by a black line, requires on average over 18,000 fitness evaluations in order to find all solutions in the search space. The best-performing GA is reported in Table 3 and represented in blue in Fig. 5. On average, this GA requires fewer than 3100 calculations to find all 20 solutions in a space of almost 19,000 possibilities, making it 5.8 times as efficient at searching the perovskite chemical space compared with random search (Table 4). The variation in performance over 20 trials is small compared to the total number of evaluations (Fig. 5), with the standard deviation ranging from 81 evaluations in finding a single solution to 712 evaluations in finding all 20 solutions. Therefore, by employing a GA, one could have confidently searched the entire chemical space of ABX_{3} perovskites using only about one-sixth as many calculations compared with computing the entire space. Stated another way, our result suggests that given a fixed computational budget, the use of the GA allows one to search chemical spaces that are much larger than the number of available calculations.
In Fig. 6, we compare the performance of our best GA versus the chemical rule-based strategy described in “Chemical rule-based search method”. We note that our chemical rules are a difficult benchmark to surpass; rules (1) and (2) of our rule-based search eliminate 11,587 compounds, or about 61 % of the search space, from the search. In addition, rule (3) informs which of the remaining individuals are likely to be stable based on specific knowledge of the perovskite structure. In contrast, the GA must learn these types of rules dynamically over the course of optimization without any prior knowledge. The GA has no knowledge of what the genome or the fitness function represents; its only information comes from matching genome vectors to the numerical results of fitness evaluations. Despite these limitations, the best GA is comparable to search with basic chemical rules designed to tackle a specific materials problem (Fig. 6). This suggests that GAs might provide a path forward in problems where chemical rules are not available to the researcher in advance.
We also investigated whether the GA can benefit from knowledge of chemical rules. We reran the best GA but simulated a situation in which the 11,587 compounds that can be excluded based on chemical rules (1) and (2) are not calculated. The GA proceeds as before, but we return a fitness of zero for any excluded compound and do not count it as being “searched”. This method crudely approximates a “knowledge-directed” GA in which outside information is employed to guide the search. The results for this method are indicated in green in Fig. 6 and demonstrate that a knowledge-directed GA outperforms both chemical rules and uninformed GA by themselves. The knowledge-directed algorithm represents factors of 11.7, 2.6, and 2.0 improvements in finding all solutions compared with random search, chemical rules alone, and GA alone, respectively. The performance data for all methods are summarized in Table 4.
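The accounting used for this experiment might be sketched as a thin wrapper around the fitness lookup. The names below are hypothetical, and the set of excluded compounds is assumed to have been built beforehand from rules (1) and (2).

```python
def make_knowledge_directed(fitness_table, excluded):
    """Wrap a fitness lookup so that excluded compounds cost nothing.

    Returns the wrapped fitness function and the set of candidates that
    were actually counted as calculations.
    """
    counted = set()

    def fitness(candidate):
        key = tuple(candidate)
        if key in excluded:
            return 0.0            # rejected by the chemical rules, not counted
        counted.add(key)
        return fitness_table[key]

    return fitness, counted
```

The GA itself is unchanged; it simply sees excluded compounds as maximally unfit, and only entries in `counted` contribute to the reported number of calculations.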
Next, we examine how the six GA parameters (crossover type, population size, selection method, mutation rate, elitism, and fitness) influence robustness of the GA in finding all 20 solutions. We first analyze the data using ANOVA without considering interactions between parameters. We find that all parameters except the mutation rate have a statistically significant influence on the GA efficiency at the 5 % significance level (the mutation rate has a p value of 0.18). It may be the case that the local and global mutation operators introduced in “Convergence and additional mutation operators” generate sufficient population diversity such that additional mutations are not needed to improve GA performance.
After removing mutation from the analysis, we assessed the contribution of each remaining parameter to the GA’s robustness through the η ^{2} parameter. A large η ^{2} indicates a large effect of the parameter on GA efficiency while a small η ^{2} suggests that the parameter (while statistically significant) produces only a small effect. The portion of the result that cannot be ascribed to a single parameter is lumped into an “error” term. This term encompasses both interactions between parameters and also randomness of the GA (e.g., due to different initial populations). Table 5 lists the η ^{2} measure for all parameters and the error term. The two major parameters affecting the results are elitism and selection method (Table 5). The population size, crossover type, and fitness function have statistically significant but marginal effects on the results.
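For a single factor, η ^{2} is the between-group sum of squares divided by the total sum of squares. The sketch below computes it in plain Python; it is an illustration of the quantity, not the Matlab analysis used in the study, and the function name is hypothetical. For a balanced full-factorial design, the between-group sum of squares obtained this way coincides with the main-effect sum of squares of that factor.

```python
def eta_squared(groups):
    """Share of total variance explained by one factor.

    groups maps each level of the factor (e.g. an elitism setting) to the
    list of observed results (e.g. calculations needed to find all solutions).
    """
    values = [v for obs in groups.values() for v in obs]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(obs) * (sum(obs) / len(obs) - grand_mean) ** 2
                     for obs in groups.values())
    return ss_between / ss_total
```

A value near 1 means the factor accounts for almost all of the spread in the results; a value near 0 means the group means barely differ.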
We also studied an ANOVA model with pair interactions included (Table 6). Almost all interactions are very small. However, there exists one very significant interaction between elitism and the selection function. This strong interaction can be attributed to an unfavorable combination of zero elitism paired with either uniform selection or “weak” roulette selection (scaling factor of 1.25). In the case of uniform selection, the fitness function is used nowhere in the GA when elitism is absent; we are essentially performing a random search. We suspect that weak roulette selection behaves similarly, with the fitness function too weakly distinguishing good and bad individuals without the added selection pressure of elitism.
Now that we have determined which factors and interactions are most important to the GA robustness, we examine exact parameter values that yield good or bad efficiency using a multiple comparison test. This test produces the marginal mean number of calculations required to find 20 solutions to solar water splitting along with a confidence interval. It allows us to determine which parameter values are distinct from one another, and how they affect robustness. In Fig. 7, we plot the results of the multiple comparison test for selection method, crossover, population size, elitism, and fitness function. Parameters with the same color and symbol in Fig. 7 do not differ much in their effect and can be considered equivalent.
In terms of selection, there exist two groups of parameters (Fig. 7). The uniform and “weak” roulette (C = 1.25) methods both perform poorly compared to other selection methods. As discussed previously, these selection methods either partially or completely fail to take into account the fitness function. The best results are found for “strong” roulette (i.e., roulette with a scaling factor equal to or higher than 2.5), although similar results can also be obtained with tournament selection. While robustness slightly increases as roulette selection becomes stronger, it slightly decreases as tournament selection becomes stronger.
Figure 7 highlights that the absence of elitism is extremely undesirable. However, the positive effect of adding elitism appears to saturate somewhere around 50 %; we do not see any difference between the elitism rate set at 50 and 75 %.
When examining the crossover function, both single-point and uniform crossover significantly outperform two-point crossover (Fig. 7). The two-point crossover operator swaps X_{3} between parents to produce children, but our problem contains only seven potential values of X_{3}. Many parents will therefore share the same X_{3}, and their children will be identical to the parents. Two-point crossover is thus not appropriate for our problem, as it is unlikely to generate sufficient population diversity. More generally, our results regarding crossover operations were obtained for a 3-element genome and may not apply to the more common situation of larger genomes; they should be viewed as specific to this application.
Regarding the fitness function, Fig. 7 demonstrates that the “Smooth Product” function performs the best, followed by the “Smooth” function and finally the “Discontinuous” function. These results suggest two guidelines in designing the fitness function. First, awarding partial points for partial solutions is helpful for the GA. Second, when designing multiobjective functions it appears to be beneficial to take products of individual fitness functions rather than sums.
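The two guidelines can be illustrated with a small sketch. The functional forms below are hypothetical stand-ins, not the fitness functions defined for this study: each criterion is assumed to yield a score of 1 inside a target window and an exponentially decaying partial score outside it, and the three aggregate functions only mimic the “Discontinuous”, sum-based, and product-based designs.

```python
import math

def smooth_score(value, low, high, width):
    """Return 1.0 inside [low, high]; decay smoothly with distance outside."""
    if low <= value <= high:
        return 1.0
    distance = low - value if value < low else value - high
    return math.exp(-distance / width)

def discontinuous_fitness(scores):
    """All-or-nothing: no credit unless every criterion is fully met."""
    return 1.0 if all(s == 1.0 for s in scores) else 0.0

def smooth_sum_fitness(scores):
    """Partial credit, aggregated as an average of the criterion scores."""
    return sum(scores) / len(scores)

def smooth_product_fitness(scores):
    """Partial credit, aggregated as a product of the criterion scores."""
    return math.prod(scores)

# A hypothetical candidate that meets two criteria but badly misses the third.
scores = [1.0, 1.0, 0.1]
print(discontinuous_fitness(scores))
print(smooth_sum_fitness(scores))
print(smooth_product_fitness(scores))
```

In this toy case the averaged score stays fairly high despite the failed criterion, whereas the product is pulled down to the level of the worst criterion. This is one possible reason why a product might guide a multiobjective search more effectively.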
Finally, Fig. 7 suggests that the population size should not be too large. For a given number of total calculations, large population sizes involve fewer generations and therefore fewer GA operations per individual. The poorer performance of large populations reported in our study may largely be due to this discrepancy.
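The trade-off between population size and number of generations is simple arithmetic, sketched below. The budget of 5000 evaluations is a hypothetical value chosen for illustration, and the function assumes that the initial population is evaluated in full and that elite individuals carried over unchanged require no new evaluation.

```python
def generations_within_budget(budget, population_size, elitism_rate):
    """Number of generations affordable after the initial population.

    Assumes elites are copied without re-evaluation, so each generation
    costs population_size * (1 - elitism_rate) new fitness evaluations.
    """
    new_per_generation = round(population_size * (1.0 - elitism_rate))
    if new_per_generation <= 0:
        raise ValueError("elitism leaves no new individuals to evaluate")
    remaining = budget - population_size
    return max(0, remaining // new_per_generation)

# Same hypothetical budget and elitism, two different population sizes.
small = generations_within_budget(5000, 100, 0.5)
large = generations_within_budget(5000, 1000, 0.5)
print(small, large)  # 98 8
```

With the same budget, the smaller population passes through roughly ten times as many rounds of selection, crossover, and mutation.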
A visual summary of the effects of various parameter choices is presented in Fig. 8. The diagonal elements in Fig. 8 represent the average efficiency when holding a single GA parameter constant while averaging over all potential values of the remaining parameters. Off-diagonal elements in Fig. 8 represent the average efficiency when holding two parameters constant and averaging over the remaining parameter values. By examining Fig. 8, we see visually many of the conclusions determined through ANOVA. For example, the dark row and column in the matrix where elitism is zero illustrate the strong negative effect of this parameter choice. It is also easy to pinpoint the unfavorable interaction between lack of elitism and uniform selection or weak roulette selection (dark red). However, it is difficult to assess the statistical significance of differences. Therefore, Fig. 8 should be considered a rough overview map of parameter space.
In summary, our study suggests several guidelines when designing a GA for perovskite oxide solar water splitters. First, elitism should be set high, for example to half the population. Second, a “strong” roulette or tournament selection method should be used. While of less importance than selection and elitism, we can also recommend a population size small enough to enable several GA operations per individual (100, or 0.5 % of the search space, was optimal in our tests) and a Smooth fitness function that is the product of several individual functions.
While these recommendations pertain to finding all 20 solutions to the perovskite solar water splitting problem, it is interesting to test how they generalize to other problems. As a first example, we consider the problem of finding only half the number of solutions in the search space and reexamine our suggestions for parameter choices. The number of evaluations needed to find any 10 solutions might be an important metric in computational screening if our desire is to quickly pinpoint a few compounds for laboratory follow-up. Table 7 lists the η^{2} values from a single-factor ANOVA for the problem of finding 10 solutions to the solar water splitting problem.
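For readers unfamiliar with the η^{2} statistic, the sketch below computes it for a single factor as the ratio of the between-group sum of squares to the total sum of squares. The function name and the toy data are illustrative only and are not taken from this study.

```python
def eta_squared(groups):
    """Fraction of the total variance explained by one factor.

    `groups` maps each level of the factor (e.g., a population size) to
    the list of outcomes (e.g., evaluations needed) observed at that level.
    The outcomes must not all be identical, or the ratio is undefined.
    """
    values = [v for outcomes in groups.values() for v in outcomes]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(outcomes) * (sum(outcomes) / len(outcomes) - grand_mean) ** 2
        for outcomes in groups.values()
    )
    return ss_between / ss_total

# Toy data: the factor level fully determines the outcome, so eta^2 is 1.
print(eta_squared({"low": [1.0, 1.0], "high": [3.0, 3.0]}))  # 1.0
# Toy data: both levels have the same mean, so eta^2 is 0.
print(eta_squared({"low": [1.0, 3.0], "high": [1.0, 3.0]}))  # 0.0
```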
The main difference between the ANOVA results for the 10 versus 20 solutions is with respect to the population size. While the population size explained only 3 % of the variance for 20 candidates, it is much more important (13.2 %) for 10 candidates. In both problems, smaller population sizes (100) are more favorable than larger ones. However, the benefits of a small population size are much more pronounced when targeting 10 candidates. This might be because small population sizes carry less diversity than large populations, presenting a natural disadvantage in searching globally for multiple optima. Large populations are slow to find initial solutions because of fewer GA operations for a given number of calculations, as discussed earlier. However, once these initial solutions are discovered, the greater diversity in large populations could become advantageous in searching globally for solutions.
Using a multiple comparison test (Fig. 9), we find that another major difference in finding 10 versus 20 solutions is the choice of selection method. Whereas obtaining 20 solutions favored strong roulette selection, obtaining 10 solutions favors a strong tournament selection rule (tournament size of 5 or 10 %). In both cases, binary tournament selection performs similarly to strong roulette selection. It might be the case that tournament selection overall creates more selection pressure than roulette selection. Similar to small population sizes, the very high selection pressure of strong tournament selection might be advantageous for finding solutions within a small region of chemical space but be suboptimal in finding solutions globally.
Figure 10 plots the efficiencies of finding ten and all solutions for each of the 2592 parameter sets. The two properties are correlated, suggesting that the same parameters might be used for both problems. In particular, we label the “best” GA overall, and note that it performed optimally in finding both 10 and 20 solutions. As discussed previously, Fig. 10 indicates that large population sizes (green diamonds) are less efficient than small population sizes (blue circles and orange squares), and even more so when attempting to find only ten solutions.
As a second test of the transferability of our recommended GA parameters, we attempt to identify transparent photocorrosion shields as described in “Transparent photocorrosion shield screening”. In Fig. 11, we compare the efficiency of each set of GA parameters in optimizing the solar water splitter problem to the efficiency in optimizing the transparent shield problem. We note that the best performance for the transparent shield problem is approximately 8 times more efficient than random search (Fig. 11), demonstrating that the GA is also applicable to a second problem.
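The random-search baseline behind such efficiency ratios can be estimated in closed form. The sketch below assumes that random search draws candidates without replacement from a space of N compounds containing k solutions, in which case the expected number of draws needed to find j of them is j(N + 1)/(k + 1). The function names and the numbers in the example are illustrative assumptions, not the exact values used in this work.

```python
from fractions import Fraction

def expected_random_evaluations(n_space, n_solutions, n_wanted):
    """Expected draws (without replacement) to find n_wanted solutions.

    Uses the expected position of the j-th of k marked items in a random
    ordering of N items: j * (N + 1) / (k + 1).
    """
    if not 0 < n_wanted <= n_solutions <= n_space:
        raise ValueError("need 0 < n_wanted <= n_solutions <= n_space")
    return Fraction(n_wanted * (n_space + 1), n_solutions + 1)

def efficiency(ga_evaluations, n_space, n_solutions, n_wanted):
    """Ratio of the random-search expectation to the evaluations the GA used."""
    baseline = expected_random_evaluations(n_space, n_solutions, n_wanted)
    return float(baseline) / ga_evaluations

# Hypothetical example: 8 solutions hidden in a space of 19,000 candidates,
# with a GA that needed 2500 evaluations to find all of them.
print(efficiency(2500, 19000, 8, 8))
```

Note that finding all solutions by random search requires, on average, visiting nearly the entire space, which is why even a modest optimizer can show a large gain on this metric.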
In general, there exists a correlation between the two problems: a GA parameter set that performs well in identifying solar water splitters is also more likely to identify transparent shields efficiently (Fig. 11). However, there is considerable scatter in the relation, which suggests that unfortunately even similar problems over the same chemical space require slightly different GA parameters for optimal performance.
It should be noted that chemical search outperforms the GA in finding transparent shields, with 13.4 times improvement over random search to find all 8 solutions. Chemical rules might perform better in finding fewer solutions (as in the transparent shields problem), whereas GA might be able to outperform chemical rules when finding a greater number of solutions (as in water splitting). In Fig. 6, for example, we see a sharp drop-off in the performance of chemical rules after about 15 solutions found.
We perform a single-factor ANOVA on the photocorrosion shield data set to assess any difference in important parameters compared to solar water splitting. The results, presented in Table 8, are mostly similar to the water splitting case. However, elitism is an even greater factor in the transparent shields problem. In addition, a multiple comparison test (Fig. 9) demonstrates that strong tournament selection is optimal for finding all transparent shields, whereas strong roulette selection was optimal for finding all solar water splitters (Fig. 7). In this respect, finding all transparent shields is similar to finding only 10 solutions for water splitting. This similarity might originate because there exist only 8 solutions for the transparent shield in the search space, suggesting that we should use parameters that rapidly find a small number of solutions.
In conclusion, the parameters with the largest effects on the results are elitism, selection method, and population size. Zero elitism is particularly detrimental to GA performance, especially when employing weak selection methods. The exact tuning of the parameters, and in particular the choice of strong roulette versus strong tournament selection, appears to depend on the problem. For example, our results suggest that higher selection pressures should be used when targeting fewer solutions. However, there exists overall a strong correlation between parameters that perform well on one problem versus other similar problems.
Discussion
While we have so far mainly discussed the GA as a “black box” optimizer, we now consider its operation in more detail. To help understand how a GA might improve performance in our problem, we refer to a previous study by Calle-Vallejo et al. [50] on trends in perovskite stability in pure oxides. Using DFT computations similar to those employed in this work, Calle-Vallejo et al. [50] observed that the enthalpy of formation was mostly constant for a given B ion (B^{3+} and B^{4+} behave differently). This result might help explain the efficiency of the GA: we would expect that ‘fit’ parents with favorable ‘B’ genes will produce children that inherit this B-site ‘gene’ that confers good formation enthalpy. In addition, there is also evidence from Calle-Vallejo et al.’s results that perturbations to the formation enthalpy due to the A site should follow the same rank and general direction independently of the B site (although the magnitude might vary depending on B) [50]. Thus there also exist ‘desirable’ values of the A gene that could be passed between generations. We speculate that similar trends may hold true for the band gap and band position criteria. For example, there exists a weak relation between formation enthalpy and band gap that suggests that the factors that control formation enthalpy might also tune the band gap [5, 6].
We note that our choice of encoding of a material into a genome string might affect GA robustness. Our encoding employed a short genome of length 3 with a high cardinality alphabet that contained up to 52 values. The advantage of this encoding was that it was trivial to encode and decode between a perovskite material and its genomic representation. However, this might not be an optimal encoding in terms of robustness, because it treats each element in the periodic table as an independent entity. In particular, it neglects chemical relationships between elements in the periodic table. For example, our encoding prevents a crossover operation from mixing an early transition metal with a late transition metal to produce an intermediate transition metal. Such an operation might be achieved by representing each element in a binary or Gray coding that represents electronegativity or Mendeleev number. This representation would allow a child to inherit an element that is intermediate in chemical behavior to its parents. In GA terms, this would create a long genome with a low cardinality alphabet. Such representations present more opportunities to find and mix building blocks that confer fitness, thereby enhancing efficiency [9, 35, 51].
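A Gray-coded genome of the kind suggested above might look like the following sketch. It was not tested in this work. The helper names are hypothetical, and each gene is assumed to be the integer rank of an element on some chemically ordered scale (for example, its position when sorted by Mendeleev number), stored in 6 bits so that up to 64 ranks can be represented.

```python
def gray_encode(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert gray_encode by folding in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def encode_genome(ranks, width=6):
    """Concatenate the Gray-coded bits of each gene, most significant first."""
    bits = []
    for rank in ranks:
        code = gray_encode(rank)
        bits.extend((code >> i) & 1 for i in reversed(range(width)))
    return bits

def decode_genome(bits, width=6):
    """Split the bit string into genes and recover each integer rank."""
    ranks = []
    for start in range(0, len(bits), width):
        code = 0
        for bit in bits[start:start + width]:
            code = (code << 1) | bit
        ranks.append(gray_decode(code))
    return ranks

# Hypothetical ranks for the A, B, and anion genes of one candidate.
genome = encode_genome([12, 40, 3])
print(len(genome), decode_genome(genome))  # 18 [12, 40, 3]
```

Because consecutive ranks differ by a single bit under Gray coding, a one-bit mutation can move a gene to a chemically neighboring element. One practical caveat of this sketch is that 6 bits encode 64 ranks while the alphabet holds at most 52 values, so decoded ranks outside the valid range would need to be repaired or rejected.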
Our target problem was difficult in some respects: it is a multimodal problem with several solutions and in which the relationship between formation enthalpy, band gap, and band edge position is complex and unknown. However, our search problem was also simpler than many realistic materials design scenarios because our search space only involved a single generalized composition (ABX_{3}) and a single crystal structure (perovskites). Many important materials design studies must search over several different composition templates and structure prototypes. For example, a recent computational investigation by Berger and Neaton [52] suggested that a Cr–V mixture in a double perovskite structure might be interesting for water splitting, and a separate computational study by Wu et al. [53] found many potentially interesting water splitters by canvassing chemical substitutions into the ICSD database. Extension of our scheme to double perovskites should be straightforward by increasing the length of the feature vector, but significant changes to materials encoding and crossover operations would be needed to test all the diverse structures found in the ICSD. However, we believe that this issue does not pose a major barrier to employing GA in more sophisticated searches. In particular, there exists a rich and successful history of employing GA coupled with computation to predict new crystal structures [33], and appropriate operators for crossover, selection, etc., have already been developed for searching over both crystal structures and compositions with a GA [54, 55].
We note that integrating GAs, or any optimization algorithm, into high-throughput computational searches still requires further effort. In particular, the GA implementation tested herein relies on completing one generation of computations before beginning the next generation. The typical way to parallelize this type of GA is to assign a “controller” node to coordinate the GA engine and assign the remaining nodes as “workers” that perform fitness evaluations (DFT computations). There are at least two major limitations with this setup. The first limitation is that the number of worker nodes must always balance the number of fitness evaluations needed in each generation in order to keep the workers occupied with computing tasks. Therefore, the number of worker nodes and the parallelizability of the fitness evaluations will restrict the choice of GA parameters. More worker nodes will stipulate higher population sizes, lower elitism, or improved parallelizability in evaluating the fitness function. A second limitation is that the controller node must wait for all fitness evaluations within a generation to complete before proceeding with selection, crossover, and mutation operations. A single DFT computation that is slow to converge might thus impede the progress of the entire GA. This is a real problem with DFT computations because time to completion can vary by days and is difficult to predict in advance.
Fortunately, alternate GA models have been designed that overcome such limitations in parallelization [35, 56, 57]. For example, in an asynchronous GA, the GA operators are immediately applied after each fitness evaluation using the population available at the time. Another technique is to perform independent GAs on different processors, but to communicate the fittest individuals observed between GA instances. These methods, as well as others that have been devised [35], solve both issues presented earlier by ensuring that compute nodes are never kept idle. We note that while other optimization techniques such as simulated annealing are also available [13], a major advantage of the GA is its potential for attaining high parallel performance [58] and integration into high-throughput computation. However, a necessary step forward to the automated inverse design of materials is the integration of the optimizer into one of several existing high-throughput DFT frameworks [7, 23, 59, 60].
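The asynchronous scheme can be sketched as below. This is a schematic illustration under stated assumptions rather than a tested production workflow: worker threads stand in for compute nodes, the callables `evaluate`, `random_genome`, and `breed` are hypothetical placeholders supplied by the user, and truncating the population to its fittest members is one simple way to retain elites.

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def async_ga(evaluate, random_genome, breed, budget, n_workers, pop_cap, seed=0):
    """Steady-state GA: breed a new child as soon as any evaluation finishes.

    evaluate(genome) -> fitness (stand-in for a DFT computation)
    random_genome(rng) -> genome
    breed(population, rng) -> genome, where population is a list of
        (fitness, genome) pairs sorted from fittest to least fit
    """
    rng = random.Random(seed)
    population = []
    submitted = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pending = {}
        # Occupy every worker with a random individual to start.
        while submitted < min(n_workers, budget):
            genome = random_genome(rng)
            pending[pool.submit(evaluate, genome)] = genome
            submitted += 1
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                genome = pending.pop(future)
                population.append((future.result(), genome))
                # Keep only the fittest pop_cap individuals seen so far.
                population.sort(key=lambda pair: pair[0], reverse=True)
                del population[pop_cap:]
                if submitted < budget:
                    # Use whatever population exists now; never wait for
                    # slower evaluations that are still running.
                    child = breed(population, rng)
                    pending[pool.submit(evaluate, child)] = child
                    submitted += 1
    return population
```

In this design, a single slow evaluation delays only the worker running it, and the number of workers no longer has to match the number of new individuals per generation. The sketch does not cache results, so a real workflow would also need to avoid recomputing genomes that were already evaluated.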
We hypothesize that a more advanced GA might further improve performance beyond the values reported in this work. For example, niching, the use of a Pareto optimal rank fitness function, and a more flexible encoding were already mentioned as potential enhancements [35]. In addition, previous work by Balamurugan et al. [61] suggests that a “hybrid” approach, whereby a GA is coupled to local search using alchemical derivatives [62, 63], might be a promising avenue for further performance improvements.
Conclusion
We demonstrated that use of a GA improves the efficiency of searching a chemical space of almost 19,000 perovskites for solar water splitters. The GA was especially useful at rapidly finding half of the solutions (almost 10 times efficiency gain over random search), and provided up to 5.8 times greater efficiency in finding all solutions. The performance of the best GA tested was comparable to a set of chemical rules we designed to filter and rank perovskite materials for this problem. A GA might therefore be applied in situations where chemical rules are not known in advance. Combining the GA with chemical rules further improved performance, requiring 16.9 and 11.7 times fewer fitness evaluations than random search to find 10 and all 20 solutions, respectively. We further found that in an alternate problem aimed at uncovering transparent photocorrosion shields, the GA performed 8 times more efficiently than random search.
Using ANOVA, we determined that the most important parameters for good performance were elitism and selection function. The GA performed best when the elitism was set to at least 50 %. The appropriate selection function appears to depend on the number of solutions in the search space. For finding all 20 solutions to the solar water splitting problem, strong roulette selection performs best. For finding 10 solutions to the water splitting problem or 8 solutions to the photocorrosion problem, a strong tournament selection performs better. In all cases, we found small population sizes to be beneficial, although the advantage diminished as the desired number of solutions increased.
We speculate that further gains in GA performance might be obtained through niching, longer genome encodings, or a Pareto optimal fitness function. While significant work still remains to couple a GA “control loop” to an automated and rapid DFT computation framework, our results suggest that such a technique presents a viable method to rapidly screen large chemical spaces for technological materials.
References
Hohenberg P, Kohn W (1964) Phys Rev 136:B864
Kohn W, Sham LJ (1965) Phys Rev 140:1133
Hautier G, Jain A, Ong SP (2012) J Mater Sci 47:7317. doi:10.1007/s10853-012-6424-0
Hafner J, Wolverton C, Ceder G (2006) MRS Bull 31:659
Castelli IE, Landis DD, Thygesen KS, Dahl S, Chorkendorff I, Jaramillo TF et al (2012) Energy Environ Sci 5:9034
Castelli IE, Olsen T, Datta S, Landis DD, Dahl S, Thygesen KS et al (2012) Energy Environ Sci 5:5814
Jain A, Hautier G, Moore CJ, Ong SP, Fischer CC, Mueller T et al (2011) Comput Mater Sci 50:2295
Materials Project (2011) http://www.materialsproject.org
Holland JH (1975) Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor
Hautier G, Fischer C, Ehrlacher V, Jain A, Ceder G (2011) Inorg Chem 50:656
Hautier G, Fischer CC, Jain A, Mueller T, Ceder G (2010) Chem Mater 22:3762
Balachandran PV, Broderick SR, Rajan K (2011) Proc R Soc A 467:2271
Franceschetti A, Zunger A (1999) Nature 402:60
Kim K, Graf PA, Jones WB (2005) J Comput Phys 208:735
Dudiy S, Zunger A (2006) Phys Rev Lett 97:1
d’Avezac M, Luo JW, Chanier T, Zunger A (2012) Phys Rev Lett 108:1
Johannesson G, Bligaard T, Ruban A, Skriver H, Norskov JK (2002) Phys Rev Lett 1:255506
Graf PA, Kim K, Jones WB, Hart GLW (2005) Appl Phys Lett 87:243111
Piquini P, Graf P, Zunger A (2008) Phys Rev Lett 100:1
Chakraborti N (2004) Genetic algorithms in materials design and processing. Int Mater Rev 49:246
Bhalla A, Guo R, Roy R (2000) Mater Res Innov 4:3
Peña MA, Fierro JL (2001) Chem Rev 101:1981
Landis DD, Hummelshøj JS, Nestorov S, Greeley J, Dulak M, Bligaard T et al (2012) Comput Sci Eng 14:51
Computational Materials Repository (2013) https://cmr.fysik.dtu.dk/cmr/index.php
Enkovaara J, Rostgaard C, Mortensen JJ, Chen J, Dułak M, Ferrighi L et al (2010) J Phys Condens Matter 22:253202
Mortensen J, Hansen L, Jacobsen K (2005) Phys Rev B 71:1
Hammer B, Hansen L, Nørskov J (1999) Phys Rev B 59:7413
Kuisma M, Ojanen J, Enkovaara J, Rantala T (2010) Phys Rev B 82:1
Gritsenko O, van Leeuwen R, van Lenthe E, Baerends E (1995) Phys Rev A 51:1944
Xu Y, Schoonen M (2000) Am Mineral 85:543
Butler MA, Ginley DS (1978) J Electrochem Soc 125:228
Armiento R, Kozinsky B, Fornari M, Ceder G (2011) Phys Rev B 84:04103
Oganov A, Lyakhov A, Valle M (2011) Acc Chem Res 44:227
Sastry K, Goldberg D, Kendall G (2005) In: Burke EK, Kendall G (eds) Search methodologies. Springer, New York, p 97
Goldberg D (1989) Genetic algorithms in search, optimization, and machine learning. Addison Wesley, Reading
Perone C (2012) Pyevolve software https://github.com/perone/Pyevolve
Perone CS (2009) ACM SIGEVOlution 4:12
Konak A, Coit DW, Smith AE (2006) Reliab Eng Syst Saf 91:992
Mercer RE (1977) Adaptive search using a reproductive metaplan. University of Alberta, Edmonton
Grefenstette JJ (1986) IEEE Trans Syst Man Cybern 16:122
Sastry K, Abbass H, Goldberg D, Johnson DD (2005) In: Proceedings of the 2005 conference on genetic and evolutionary computation, p 671
Perry ZA (1984) Experimental study of speciation in ecological niche theory using genetic algorithms. Doctoral Thesis, University of Michigan
Mauldin M (1984) In: Proceedings of the national conference on artificial intelligence, Austin, TX, p 247
Goldberg DE, Richardson J (1987) In: Proceedings of the second international conference on genetic algorithms, p 41
Goldschmidt VM (1926) Naturwissenschaften 14:477
Schmuland B (2012) Math Exchange Forum. http://math.stackexchange.com/questions/206798/pul
Fisher RA (1925) Math Proc Camb Philos Soc 22:700
Rojas I, González J, Pomares H, Merelo JJ, Castillo PA, Romero G (2002) IEEE Trans Syst Man Cybern Part C 32:31
Sahai H, Ageel MI (2000) The analysis of variance: fixed, random and mixed models. Birkhäuser, Boston
Calle-Vallejo F, Martínez JI, García-Lastra JM, Mogensen M, Rossmeisl J (2010) Angew Chem Int Ed 49:7699
Holland J (1968) Hierarchical descriptions of universal spaces and adaptive systems. Technical Report, University of Michigan, Department of Computer and Communication Sciences
Berger R, Neaton J (2012) Phys Rev B 86:1
Wu Y, Lazic P, Hautier G, Persson K, Ceder G (2013) Energy Environ Sci 6:157
Oganov AR, Glass CW (2006) J Chem Phys 124:244704
Woodley S (2004) Appl Evol Comput Chem 110:95
Bethke AD (1976) Comparison of genetic algorithms and gradient-based optimizers on parallel processors: efficiency of use of processing capacity. Technical Report, University of Michigan, Logic of Computers Group
Cantú-Paz E (2000) Efficient and accurate parallel genetic algorithms. Springer, New York
Bandow B, Hartke B (2006) J Phys Chem A 23:5809
Munter TR, Landis DD, Abild-Pedersen F, Jones G, Wang S, Bligaard T (2009) Comput Sci Discov 2:015006
Ortiz C, Eriksson O, Klintenberg M (2009) Comput Mater Sci 44:1042
Balamurugan D, Yang W, Beratan DN (2008) J Chem Phys 129:174105
von Lilienfeld OA (2009) J Chem Phys 131:164102
Wang M, Hu X, Beratan DN, Yang W (2006) J Am Chem Soc 128:3228
Acknowledgements
We thank Dr. Shahar Keinan, Dr. Yosuke Kanai, Dr. Jeffrey Tilson, and Dr. Robert Fowler for their thoughts and assistance in designing this study. We thank Dr. Byron Schmuland for providing an elegant derivation of the random choosing probability problem via Math Exchange. Geoffroy Hautier acknowledges the F.R.S.-FNRS Belgium for financial support under a “Chargé de Recherche” grant. Anubhav Jain acknowledges funding through the U.S. Government under Contract DE-AC02-05CH11231 and the Luis W. Alvarez Fellowship in Computational Science. Ivano E. Castelli and Karsten W. Jacobsen acknowledge support from the Danish Center for Scientific Computing through grant HDW110306, from the Catalysis for Sustainable Energy (CASE) initiative funded by the Danish Ministry of Science, Technology and Innovation, and from the Center for Atomic-scale Materials Design (CAMD) sponsored by the Lundbeck Foundation. This research is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH11231.
Cite this article
Jain, A., Castelli, I.E., Hautier, G. et al. Performance of genetic algorithms in search for water splitting perovskites. J Mater Sci 48, 6519–6534 (2013). https://doi.org/10.1007/s10853-013-7448-9