Genetic Algorithms and the Search for Viable String Vacua

Genetic Algorithms are introduced as a search method for finding string vacua with viable phenomenological properties. It is shown, by testing them against a class of Free Fermionic models, that they are orders of magnitude more efficient than a randomised search. As an example, three generation, exophobic, Pati-Salam models with a top Yukawa occur once in every 10^{10} models, and yet a Genetic Algorithm can find them after constructing only 10^5 examples. Such non-deterministic search methods may be the only means to search for Standard Model string vacua with detailed phenomenological requirements.


Introduction
Here is a physicist's estimate of the "landscape of haemoglobin": it is composed of four chains of amino acids with either 141 or 146 in each chain; there are about 500 known amino acids; therefore the haemoglobin molecule represents a solution out of a landscape of at least 500^{574} possibilities.
Such huge numbers lead one to wonder if the string landscape [1], which is positively diminutive by comparison, might seem less daunting if it were approached using some of the many heuristic search tools based on natural processes, such as simulated annealing, tabu searches and so on. In this paper we focus on the one most closely related to natural selection, namely Genetic Algorithms (GA's) [2][3][4].
GA's (and indeed natural selection) work in systems where incremental changes lead to incremental improvements. They are not very effective in "needle-in-a-haystack" situations where any solution other than the correct one is equally disfavoured. String theory seems to be of the former variety where a GA should be applicable: the anecdotal evidence that supports this is that, despite the fact that an exhaustive scan is clearly out of the question, it has been possible in many different string configurations (heterotic, intersecting branes, flux compactifications, rational CFTs for example) to piece together models that closely resemble the Standard Model.
As such the task of finding a completely viable string vacuum is likely to be what in computational complexity theory is called an NP-complete problem (where NP refers to Non-deterministic Polynomial time); that is a problem for which any given solution can be verified in a time that increases only polynomially with the difficulty, but where finding a solution by a simple deterministic search algorithm (such as exhaustive scanning) rapidly becomes computationally infeasible. Indeed a similar point was made in ref. [5], to which the reader is directed for precise definitions. NP-complete problems are precisely where heuristic search methods become effective.
The purpose of this paper is to demonstrate the efficacy of GA's in finding desirable string vacuum solutions, by examining a small sub-class of string theories, namely heterotic strings in the Free Fermionic formulation [6][7][8]. We will show that they are (many orders of magnitude) more efficient than a random search at finding string vacua with particular desirable properties. This is especially evident when one applies many phenomenological requirements and the search is multi-modal. For example GA's do not confer much advantage if one is just searching for, say, three generation models. However, in line with them being effective on NP-complete problems, they come into their own when the search is statistically very difficult (when for example only one in 10^7 models or fewer has the particular properties of interest).
Given the comments above, one thing we can conclude from the fact that GA's work so well is that finding the SM in the string landscape is precisely not like looking for a needle in a haystack: the landscape has structure and similar models have certain correlations.
We will describe exactly what these correlations are expected to be, but because the number of possible models is so huge it is not possible for us (even in this fairly restricted set of models) to check them explicitly. Nevertheless in our view the fact that GA's work is evidence that they are there.
1.1 Overview of GA's: a fake landscape of 10^{500}
Before getting to string theory, it is instructive to create a somewhat artificial optimization problem that has a similarly large landscape in order to introduce the GA technique and to make apparent its generic advantages and also its limitations. Suppose that we wish to find the supremum of some function f(x, y) in the domain x ∈ (0, 10), y ∈ (0, 10), without using calculus. One way to do this is as follows: consider writing out the possible coordinates
x = a.bcdef..., y = g.hijkl...,
where a, b, c... are digits between 0 and 9. In principle one could scan over x and y by cycling through all possible strings of digits a, b, c... To find the supremum one simply evaluates f(x, y) at each point, and chooses the largest value. Obviously this is a very labour intensive way of solving the problem (which is why calculus was invented): if the desired accuracy is high, to say 250 digits, then one has to scan 10^{500} possible values of all the digits to reproduce every possible value of x and y. By wilful incompetence the optimisation problem has been recast in a form that has a string-sized landscape of possible solutions!
The solution can be found with a GA as follows. We will describe a "simple" GA of the kind developed by Holland [2]. First introduce a reasonably large population, p, of creatures defined by their genotype. (We discuss the optimal population size below.) This is simply a string of data for each one containing all the defining properties; often it is taken to be binary, but in this case we will consider the strings of digits describing the x and y coordinates to be two chromosomes (see footnote 1) of length 250. Each possible digit is referred to as an allele, and each position as a locus. Initially the genotype of each member of the population is chosen at random, so the population is effectively sprinkled at random over the domain. Each genotype results in a physical characteristic, the so-called phenotype: in this case the phenotype can be the function f(x, y) evaluated at the point corresponding to the genotype.
The next step is to organise breeding, with pairs of creatures being chosen to reproduce at random but with a weighting proportional to their fitness; in this case "fitter" creatures are obviously those that are sitting higher on the landscape ("fitness landscape" is coincidentally the usual GA terminology), so the fitness function can simply be chosen to increase linearly with the height f(x, y) in a way to be made precise below.
The number of breeding pairs is chosen to be p so as to keep the population constant. Reproduction consists of the following crossover procedure: cut the chromosomes of a breeding pair at the same two randomly chosen positions and swap the middle section. (Usually but not always self reproduction is allowed.) At the same time introduce some mutation: reassign a tiny fraction of the digits in the offspring (less than a percent usually) to arbitrary random values (see footnote 2).
Remarkably, repeating this simplistic procedure over a number of generations results in a population that gathers with increasing accuracy around the desired global maximum value of the function. Moreover the three ingredients of selection, crossover and mutation are crucial (see footnote 3). If done correctly (see below) one can obtain a solution to any desired precision.
Footnote 1: The common terminology equates "chromosome" with "genotype", but for stringy applications we think it may be convenient to keep the distinction.
Footnote 2: In this simple GA the entire population is killed off and replaced at every generation (see [3]). There are many variations: for example the "steady state" GA continually produces new offspring, and kills off the less fit members of the population for them to replace. Likewise crossovers can be chosen to occur at one or many points on the chromosomes, and selection can be organised differently. They all have in common though the three essential ingredients of selection, crossover and mutation.
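The procedure just described can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' code: the chromosome length `DIGITS`, the test function `toy_f`, the helper names and the choice to shift the weights by the minimum height are all assumptions made for the example, and the chromosomes are far shorter than the 250 digits discussed in the text.

```python
import random

DIGITS = 6  # digits per chromosome (assumption; the text uses 250)

def decode(chrom):
    """Map digits [a, b, c, ...] to the coordinate a.bc..."""
    return sum(d * 10.0 ** (-k) for k, d in enumerate(chrom))

def toy_f(x, y):
    """Hypothetical smooth test landscape with its maximum at (3, 7)."""
    return -(x - 3.0) ** 2 - (y - 7.0) ** 2

def two_point_crossover(a, b, rng):
    """Cut both parents at the same two points and swap the middle section."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def mutate(chrom, rate, rng):
    """Reassign each digit to a random value with probability `rate`."""
    return [rng.randrange(10) if rng.random() < rate else d for d in chrom]

def run_ga(f, pop_size=60, generations=80, mutation_rate=0.01, seed=0):
    rng = random.Random(seed)
    new_chrom = lambda: [rng.randrange(10) for _ in range(DIGITS)]
    pop = [(new_chrom(), new_chrom()) for _ in range(pop_size)]
    best = None  # (height, x, y) of the fittest creature seen so far
    for _ in range(generations):
        heights = [f(decode(x), decode(y)) for x, y in pop]
        for h, (x, y) in zip(heights, pop):
            if best is None or h > best[0]:
                best = (h, decode(x), decode(y))
        # Selection weights linear in f, shifted so that none is negative.
        lowest = min(heights)
        weights = [h - lowest + 1e-9 for h in heights]
        offspring = []
        while len(offspring) < pop_size:
            (x1, y1), (x2, y2) = rng.choices(pop, weights=weights, k=2)
            cx1, cx2 = two_point_crossover(x1, x2, rng)
            cy1, cy2 = two_point_crossover(y1, y2, rng)
            offspring.append((mutate(cx1, mutation_rate, rng),
                              mutate(cy1, mutation_rate, rng)))
            offspring.append((mutate(cx2, mutation_rate, rng),
                              mutate(cy2, mutation_rate, rng)))
        pop = offspring[:pop_size]  # whole generation is replaced
    return best
```

Calling `run_ga(toy_f)` returns the best height found together with its coordinates; swapping in a more rugged function only requires changing the argument.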
It is worth noting some advantages over other techniques. First the function f can have many maxima, and yet the procedure can still find the global one: the algorithm effectively samples the whole fitness landscape. Indeed f does not even need to be differentiable, a fact that strongly suggests the technique could be powerful in the string context, where getting from vacuum to vacuum often involves topology changing transitions. In addition, the computational difficulty appears to rise roughly linearly with the length of the genotype even though the size of the fitness landscape is increasing exponentially. Finally, the process is very robust. It doesn't matter for example if we choose to flatten all the chromosomes of each creature into one long string of data and perform a single crossover for the entire genotype, or if we perform crossovers on the chromosomes individually.
Many of these properties can be understood (at least intuitively) in terms of schemata and the schema theorem which was introduced by Holland [2] and which we will describe shortly. But before we do so, it is worth seeing the procedure at work on a particular function. Consider finding the maximum of
f(x, y) = 12 cos 3y 2 sin ...
This "mogul-field" function, shown in figure 1, is clearly a hard function for the usual hill-climbing algorithms to maximize.
As mentioned above, the simplest convention for choosing breeding pairs, and the one we shall use here, is that they are weighted linearly with f(x, y) (roulette wheel selection): that is they are selected with a probability given by a fitness function for the i'th creature of the form
$p_i \propto a f_i + b$,
where $f_i$ is the height f(x, y) at the position of the i'th creature, and a and b are constants that need to be adjusted each generation. If the weighting is chosen with a too large, one finds that fitter creatures swamp the distribution very early, and there is premature convergence to the wrong solution. Once the population is all gathered around the wrong peak, the advantage of selection is lost: one has to hope for an advantageous random mutation, so in a sense the process has temporarily become no more efficient than a Monte-Carlo procedure (although once the population has "Monte-Carlo'd" out of the wrong solution, the process reverts to the preferred GA behaviour).
Footnote 3: Note crossover and mutation can be either/or; they should be present in the population but do not have to occur simultaneously in the same individuals.
There are various sophisticated techniques to counteract premature convergence, such as introducing a fitness penalty for crowding, or fitness sharing. We will not need to explore these here, but in the event of such stagnation will instead resort to momentarily enhancing the mutation rate. Why this works will become apparent when we come to discuss the schema theorem below, but generally it is already clear that selection and mutation are playing complementary roles; selection drives the system to nearby maxima, while mutation drives one away.
Conversely once the population is gathered around the correct global peak one can increase a and dial down mutation in order to distinguish between creatures that all have similar heights, and gain accuracy. The convention is to choose a and b such that a creature of average fitness breeds once, while the fittest creature breeds twice on average. If the latter multiple is α, then the fitness function becomes
$p_i \propto (\alpha - 1)(f_i - \bar{f}) + (f_{\max} - \bar{f})$,
with the average fitness $\bar{f}$ and maximum fitness $f_{\max}$ being re-evaluated for every generation. Note that the probability is invariant under rescaling and shifting of the function f(x, y) → βf(x, y) + γ. In our analysis we found that a higher value, α ≈ 3, gave a slightly better convergence rate. In this particular example (but not actually in the string theory problems we shall discuss) one final adjustment can make convergence more rapid: De Jong's Elitist Selection Scheme involves copying the fittest member of the population across to the next generation unchanged.
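A minimal sketch of roulette wheel selection with this kind of linear scaling might look as follows. The function names are hypothetical, and clipping negative weights to zero is an extra assumption of the sketch (a common convention in the GA literature), not something stated in the text.

```python
import random

def scaled_weights(fitness, alpha=2.0):
    """Linear rescaling: average fitness -> weight 1, maximum -> weight alpha."""
    mean = sum(fitness) / len(fitness)
    top = max(fitness)
    if top == mean:
        # Flat population: every creature breeds with equal probability.
        return [1.0] * len(fitness)
    raw = [1.0 + (alpha - 1.0) * (f - mean) / (top - mean) for f in fitness]
    # Assumption: very unfit creatures get zero weight rather than a negative one.
    return [max(w, 0.0) for w in raw]

def pick_parents(population, fitness, k, alpha=2.0, rng=random):
    """Draw k parents with probability proportional to the scaled weights."""
    return rng.choices(population, weights=scaled_weights(fitness, alpha), k=k)
```

Because the weights depend only on the combination (f_i − mean)/(max − mean), replacing every fitness by βf + γ with β > 0 leaves them unchanged, which is the invariance noted above.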
The evolution of a population of 60 individuals is shown in figure 2. The initial population coalesces around the maxima after only a few generations. The lower maxima are then disfavoured until the population is all gathered around the solution (which for the record is at x = 6.26347798 and y = 7.23832285). Note that around generation 10 the population seems to be favouring the wrong maximum but then corrects itself after a few generations.
There are two aspects that limit the final accuracy. The first is simply the precision to which the function f(x, y) can be evaluated when the algorithm reaches a plateau. The second is the population size required for a meaningful search, which increases with the genotype length. As discussed in ref. [3] one useful criterion to adopt is that every point in the search space could in principle be reached by cross-over only, which requires that in the initial population every allele should occur at least once at each locus with say P* = 99.9% probability. For a decimal "alphabet" this exacts a relatively high price in terms of population size as the length increases, whereas for a binary one (assuming initially randomised genotypes) the minimum population required is
$p \approx \left\lceil 1 + \frac{\log(-l/\ln P^*)}{\log 2} \right\rceil$,
where l is the genotype length, which increases very slowly. This is one reason it is considered more efficient to express the genotype in a binary form (e.g. using a binary genotype an accuracy of one part in 10^{500} would require a genotype of length l = 1661, and a minimum population of only p = 22).
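The quoted numbers can be checked with a short calculation. The sketch below assumes the criterion means that, at each of the l loci of a randomly initialised binary population of size p, both alleles are present, which happens with probability (1 − 2^{1−p})^l; the function names are illustrative.

```python
import math

def coverage_probability(pop_size, length):
    """Probability that both binary alleles occur at every one of `length` loci."""
    return (1.0 - 0.5 ** (pop_size - 1)) ** length

def min_population_binary(length, p_star=0.999):
    """Approximate smallest population meeting the coverage criterion."""
    return math.ceil(1 + math.log(-length / math.log(p_star)) / math.log(2))

# Bits needed to resolve one part in 10^500.
bits = math.ceil(500 * math.log2(10))
print(bits, min_population_binary(bits))  # 1661 22
```

The closed form follows from requiring (1 − 2^{1−p})^l ≥ P* and expanding the logarithm for small 2^{1−p}, which is why the required population grows only logarithmically with the genotype length.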
Nevertheless it is important to recall that the chromosomes' length is 250, so the process really is sampling the whole 10^{500} landscape even if the population is only 60. As already mentioned, the process can cope with complicated and non-differentiable landscapes: for example figure 3 shows the 500th generation for
$f(x, y) = 12\,(3\cos(50y)\sin(50x) + x + y) - x^2 - y^2$.
This function is extremely "choppy", having O(50^2) relatively high maxima and minima in the domain. GA's are impeded by more rugged landscapes but can still operate.
This illustrates another feature of GA's which is that because it begins widely spread over the domain, the population initially averages over rapid local fluctuations and responds to the longer wavelength features. Despite this robustness, it is clear that even though the fitness landscape does not need to be differentiable, it should certainly have structure. For example if we had a "needle in a haystack" landscape consisting of 10^{500} − 1 squares in which f = 0 and one square with f = 1, then a GA would be no better than a random scan at landing on this one square.

The schema theorem
As we mentioned, the schema theorem was introduced by Holland as a way to formalise the remarkable properties of GA's [2]. It has been criticised by many authors but nevertheless, while nowadays it is not thought to give a complete understanding, it does present useful ideas that at least partially explain how GA's work.
A schema is a representation of some crucial set of digits that is supposed to confer some favourable characteristic, and in this instance might look something like S = 3 * * * 4 * 6.
This example has 3 entries that we are interested in (hence we say it is order 3) and 4 entries that we do not care about, which are labelled with a wildcard *. It also has a defining length d(S) = 7, and we will call its order o(S). This can be formalised as follows. Suppose that mutation has just produced in the population a favourable schema, S. Let n(S, t) be the total number in the population containing it at time t. We can define the average fitness of all members of the population containing S as
$f_S(t) = \sum_{i \in S} f_i / n(S, t)$,
which is higher than the average fitness of the population as a whole, $\bar{f}$. Assuming that selection is proportional to fitness, then the expected number of offspring containing S is $\sum_{i \in S} f_i / \bar{f}$. Neglecting crossover and mutation this would be the expectation of n(S, t + 1); let us rewrite it as
$n(S, t+1) = n(S, t)\, f_S(t) / \bar{f}(t)$.
With simple probabilistic arguments one can incorporate the effect of a single-point crossover destroying S, and mutations at a rate $p_m$ per digit, to find a lower bound on n(S, t + 1). These relations capture the observed growth of favourable schemata. If our new schema gives a greatly improved fitness $\bar{f}(1 + \Delta)$ then at first we can neglect the n(S) contributions in $\bar{f}$ and also the crossover and mutation, so that the subsequent evolution is given by
$n(S, t+1) \approx (1 + \Delta)\, n(S, t)$,
that is, exponential growth. Once S has spread through the population, however, $\bar{f}$ approaches $f_S$ and the losses from crossover and mutation can no longer be neglected, so that S is not quite able to saturate the entire population.
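For orientation, the lower bound is usually quoted in textbooks in the form below. This is the standard statement of Holland's schema theorem rather than necessarily the expression used by the authors: the crossover probability $p_c$ and the genotype length $l$ are symbols introduced here, and the formula assumes the usual convention in which the defining length is the distance between the outermost fixed loci (one less than the span quoted for the example above).

```latex
n(S, t+1) \;\geq\; n(S, t)\, \frac{f_S(t)}{\bar{f}(t)}
\left[ 1 - p_c\, \frac{d(S)}{l - 1} - o(S)\, p_m \right]
```

The bracket makes the qualitative point explicit: short, low-order schemata with above-average fitness are the ones that survive crossover and mutation and therefore multiply.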
Thus we see the trade-off of GA's which is that the entire population can never quite reach the perfect solution, although some members may do. On the plus side however, it could be that schema S represents a local optimum but that our desired solution has an even better schema in mind. In that case, as we mentioned above, as well as producing new schemata, mutation plays a second role in that it prevents the GA stagnating. This is why increasing the mutation rate temporarily is sufficient to free the system if it becomes trapped at a local extremum. (Note however, and we shall see this in practice when we come to apply GA's to string vacua, that the most efficient procedure by far is to mutate roughly 10-20% of the genotype rather than to scramble them completely, which would lose all the beneficial schemata that had been acquired up to that point.) In this manner we expect the algorithm to proceed by bursts of rapid improvement followed by periods of stasis towards the desired global solutions.
Footnote 5: We should remark that we are presenting the original and most simplistic interpretation which will suffice for a qualitative understanding of the behaviour. There has been debate about the proper interpretation and in particular about how long exponential growth continues until it is affected by the change in the make-up of the population (see Reeves and Rowe in [3]). In particular it should be borne in mind that the optimal population is typically 50-100 individuals, so that the exponential growth phase can only be a short lived idealisation.
We now present the stringy problem that we will consider for this study, namely finding phenomenologically viable Pati-Salam models in the Free Fermionic Formulation of the heterotic superstring [6][7][8].
Before we describe the formalism in detail, let us briefly comment further on the relation of our approach to the landscape programme. It has been known for a long time that these and similar models lead to a huge number of possible vacua. For example [11] estimated 10^{1500} vacua in the closely related covariant lattice approach, far in excess even of the later flux vacua estimate in [1]. The approach advocated in [1] and related papers (see [12] for a recent review) was to determine correlations between physical characteristics.
Alternatively one can count the multiplicities of string vacua and regard the characteristics that occur frequently as being more natural.
Completely general computer-based searches were used to consider correlations for the Free Fermionic vacua in ref. [13]. However, there are limitations to these and similar approaches, due to the space of models being so large, and due to the time-consuming computation of the spectrum in every step of the search procedure. Importantly this leads to inevitable restrictions as to what statistical correlations can and cannot reliably be established, as discussed in ref. [14].
As we shall see, in performing a GA study one is also effectively studying correlations, but very different ones from those that were explored in the landscape programme.
In the language of GA's the difference is that essentially the latter explored phenotype-phenotype correlations, whereas the frequencies occurring in GA studies are more sensitive to genotype-phenotype correlations, in a way that will be made precise below.
In order to test GA's against a set of models where a comprehensive scan is feasible for comparison, we will use the efficient, albeit not general, hybrid method that was formulated in [15] for the case of Z_2 × Z_2 vacua. It is based on a fixed set of basis vectors, singling out models with SO(10) gauge symmetry that play the role of the "observable" gauge group, and a variable set of GSO projection coefficients. This set-up permits the derivation of analytic results for several characteristics of the models, including the number of fermion generations, the number of additional vector-like fermion/Higgs multiplets, the number of SM breaking Higgs doublets and the number of exotic multiplets. Integrating these analytical formulae into a computer programme we can achieve a much higher scan speed of the order of 10^4 models per second for a single CPU core.
The first class of models studied in this manner were Pati-Salam vacua [16]. The Pati-Salam GUT model possesses $SU(4) \times SU(2)_L \times SU(2)_R \subset SO(10)$ gauge symmetry [17]. In the supersymmetric version [18] SM quarks and leptons reside in SO(10) spinorials, $16 = (4, 2, 1) + (\bar{4}, 1, 2)$. In general, the spectrum of the model also includes a number of singlets φ(1, 1, 1) that may couple to the additional triplet and Higgs fields. Some of these singlets are likely to develop vevs and provide mass terms for the non-chiral fields of the spectrum.
Fermion masses arise from the coupling [...]. Mixing of neutrinos with some of the (heavy) singlet states is required in order to attain realistic neutrino masses.
Summarising, the spectrum of a supersymmetric PS GUT model includes [...]. Equation (2.9) can be regarded as a constraint that guarantees the integrity of fermion generations. As a result, putting aside singlet states, as far as the spectrum is concerned a PS model can be characterised by five integers, namely n_g, k_L, k_R, n_h, n_6. A minimal model has n_g = 3, k_L = 0, k_R = 1, n_h = 1, n_6 = 1; however any model with n_g = 3, k_L ≥ 0, k_R ≥ 1, n_h ≥ 1, n_6 ≥ 1 is in principle phenomenologically acceptable at this level of analysis.
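As a small illustration, the acceptability condition on the five integers can be written as a predicate. The class and function names below are hypothetical, and the sketch encodes only the inequalities quoted above; it says nothing about how the integers are computed from a given model.

```python
from typing import NamedTuple

class PSSpectrum(NamedTuple):
    """The five integers characterising a PS spectrum (singlets set aside)."""
    n_g: int
    k_L: int
    k_R: int
    n_h: int
    n_6: int

MINIMAL = PSSpectrum(n_g=3, k_L=0, k_R=1, n_h=1, n_6=1)

def is_acceptable(s: PSSpectrum) -> bool:
    """True if the spectrum passes the spectrum-level criteria quoted in the text."""
    return (s.n_g == 3 and s.k_L >= 0 and s.k_R >= 1
            and s.n_h >= 1 and s.n_6 >= 1)
```

For example, `is_acceptable(PSSpectrum(3, 2, 1, 4, 2))` is true, whereas any spectrum with `n_g != 3` or `k_R == 0` is rejected.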
Semi-realistic supersymmetric Pati-Salam models can be relatively easily constructed in the Free Fermionic Formulation [18,19]. They require exclusively periodic-antiperiodic boundary conditions on the world-sheet fermions and the corresponding GGSO phases can be written in terms of binary integers, $c\left[{v_i \atop v_j}\right] = e^{i \pi c_{ij}}$, $c_{ij} \in \{0, 1\}$. Following [16] we can systematically study a class of PS models generated by the basis vector set B = [...]. The former arise from the sectors $b^I_{pqrs}\,(+S)$, I = 1, 2, 3 and the latter from $x + b^I_{pqrs}\,(+S)$, I = 1, 2, 3, where
$b^1_{pqrs} = b_1 + p\,e_3 + q\,e_4 + r\,e_5 + s\,e_6$,
$b^2_{pqrs} = b_2 + p\,e_1 + q\,e_2 + r\,e_5 + s\,e_6$,
$b^3_{pqrs} = x + b_1 + b_2 + p\,e_1 + q\,e_2 + r\,e_3 + s\,e_4$,
with p, q, r, s ∈ {0, 1} and $x = 1 + S + \sum_{i=1}^{6} e_i + \sum_{k=1}^{2} z_k$. Additional exotic states transforming as (4, 1, 1), ($\bar{4}$, 1, 1), (1, 2, 1) and (1, 1, 2) under the observable PS gauge group may also arise from the twisted sectors $b^I + \alpha\,(+z_1)\,(+x)\,(+S)$, I = 1, 2, 3. We denote by n_e the number of these states. They carry fractional charges and in particular they include SM singlets and doublets with ±1/2 electric charge. The appearance of these states is generic in these vacua [20]. However, as shown in [16] the class of models under consideration includes "exophobic" vacua where all exotic fractionally charged states receive string scale masses.
Selecting amongst this huge number of vacua requires first the computation of the spectrum and second the introduction of a set of phenomenological criteria. As illustrated in [15] we can derive general analytic formulae regarding the main characteristics of models in this set in terms of the 51 independent GGSO phases, labelled by i = 1, ..., 51. These formulae, which involve ranks of binary matrices depending on the phases, are too lengthy to include here. However, they can be easily incorporated in a computer code. The model selection criteria can be either related to the spectrum or to the couplings of the effective low energy theory. The latter are harder to implement so we will restrict to the existence of the top quark mass coupling.
3 GAs in the fermionic string landscape

Introductory remarks
Let us now see how a GA performs in the search for viable models. First we make some general remarks. When it comes to string phenomenology any fitness landscape is composed not of continuous functions but of physical properties such as supersymmetry, number of generations, Yukawa couplings and so forth. Nevertheless the question of whether the fitness landscape defined in terms of such observables has structure remains crucial, and one of the purposes of testing GA's is therefore to address this issue.
To be more specific, suppose that one constructs a GA to converge on models with three generations. To do this would require a fitness function perhaps of the form $f(n_g) = e^{-(n_g - 3)^2}$; that is, models are weighted with a Gaussian around the desired value. Clearly the population will coalesce around n_g = 2, 3 or 4 rather than n_g = 10 but as emphasised in the Introduction, for just one parameter, this way of selecting vacua is not obviously much more beneficial than a random scan; the GA procedure only really gives advantage once the search is multi-modal with many different criteria coming into play.
We can at this point mention the two studies of the string landscape using GA's in ref. [22] (which along with ref. [4] are the only other works on GA's in the context of string phenomenology of which we are aware). These searches are also multi-modal, but mainly because there are many different fluxes contributing to the same single phenomenological trait, namely the vacuum energy. Thus multi-modality can result from either an already existing many-to-one mapping of genotype to phenotype, or from a many-to-many mapping when several phenomenological requirements are introduced. The situation here, where we are searching for discrete phenomenological constraints, is really of the latter kind; selecting on the single trait of three generations would not be multi-modal enough to make GA's beneficial, as we shall see.
For example suppose that we now add the requirement that only one coupling (i.e. the top Yukawa) is large. It could be that the schemata that favour a large top Yukawa overlap in the genotype with those that give three generations. If they do not then it is equivalent to adding a second dimension in our introductory example, and the two phenomenological requirements can be decomposed into orthogonal constraints on the definitions of the models.
If there are enough orthogonal constraints chopping up the genotype, then suitable models can conceivably be constructed by hand. Conversely, if the schemata do overlap then the fitness landscape is much more complicated. And it could also be that top Yukawas and three generations are incompatible. In this way as more search criteria are applied, one expects more structure.
A more formal discussion of structure requires some measure of distance in the search space. One method for defining closeness in search spaces is the "Hamming distance", based on the number of "moves" that have to be made on one genotype in order to make it identical to the other. (More generally one can allow for insertions and deletions -known as the Levenshtein distance. In the models we will consider, where we are adjusting only a fixed number of GGSO phases, the Hamming distance is obviously the appropriate measure.) This is a useful concept when deciding if a GA will be effective on a certain problem, and it is important to realise that what one normally considers to be close phenomenologically may be far apart as far as a GA is concerned.
For instance suppose we have constructed an SU (5) model with a certain set of basis vectors. The Standard Model is then achieved by introducing a projection onto the spectrum with an additional basis vector that encodes gauge symmetry breaking. In terms of the Hamming distance the two models are clearly far apart -the data describing the Standard Model differs by an entire basis vector from that of the GUT model. However physically speaking the models are closely related: in some more complete formalism the symmetry breaking can probably be encoded by simply turning on a modulus. Conversely models that we would usually regard as being very different (such as SU (5) and SU (7)) may differ by only a few entries in their defining vectors. Moreover it is known that identical models can be described by different sets of basis vectors. For example often new basis vectors project out states from the spectrum but at the same time give rise to new sectors in which they reappear.
Clearly one expects that if selecting preferred models gets one closer to some nearby solution, then the GA will work efficiently. This has been formalised in the fitness distance correlation (FDC), that is the correlation between the fitness and the Hamming distance to the nearest solution [9]. It has been argued that generally the better this correlation is, the more effective GA's are at finding a solution (with problems falling into one of three classes depending on the FDC: misleading, difficult, and straightforward). In the above example, it could be that SU(5) GUT models and the Standard Model are incompatible for a GA because they are far apart in terms of Hamming (Levenshtein) distance; any fitness function that favoured both models would necessarily have a poor FDC.
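To make the definition concrete, here is a minimal sketch of how an FDC could be estimated for a sample of genotypes when the solutions are known. The function names are illustrative, the correlation is taken to be the ordinary Pearson coefficient (an assumption about the precise definition in [9]), and the example landscape is a toy one.

```python
import math

def hamming(a, b):
    """Number of loci at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def fitness_distance_correlation(genotypes, fitnesses, solutions):
    """Pearson correlation between fitness and distance to the nearest solution."""
    dist = [min(hamming(g, s) for s in solutions) for g in genotypes]
    n = len(dist)
    mean_d, mean_f = sum(dist) / n, sum(fitnesses) / n
    cov = sum((d - mean_d) * (f - mean_f) for d, f in zip(dist, fitnesses))
    var_d = sum((d - mean_d) ** 2 for d in dist)
    var_f = sum((f - mean_f) ** 2 for f in fitnesses)
    return cov / math.sqrt(var_d * var_f)  # undefined if either variance is zero

# Toy landscape: fitness counts the ones, and the single solution is all ones.
sample = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
fdc = fitness_distance_correlation(sample, [sum(g) for g in sample], [[1, 1, 1, 1]])
```

In this toy case fitness falls exactly linearly with distance, so the correlation is −1; when maximising, a strongly negative value is the favourable situation in which fitter genotypes sit closer to a solution.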
Unfortunately in the present context, and in many contexts where GA's are advantageous, the FDC will be extremely difficult to establish directly. This is because in order to do so one has to first locate the position of every solution so that one can compute the Hamming distance to the nearest one. But if one can deterministically find every solution then the problem is probably not interesting for GA's anyway. In computational complexity terms, establishing the correlation is clearly at least as hard as the well-known NP-hard "closest string" problem.

The GA analysis
In summary it seems that, in many situations, examining the convergence properties of GA's may actually be the best tool we have to study the structure of the landscape. Let us therefore now turn to the Pati-Salam models outlined in the previous section and describe the GA search.
We will consider three classes of solutions, that are increasingly phenomenologically viable, and hence increasingly "statistically difficult". In the first class we ask for three complete generations of SM multiplets (n_g = 3) together with at least one Pati-Salam and SM Higgs (k_R ≥ 1, n_h ≥ 1): we call these the SM-complete models and a random search shows their number to be roughly one in 10^4. In the second class of models we also insist that there are no exotics (k_R ≥ 1, n_h ≥ 1, n_e = 0), the exophobic models. Roughly one in 2.5 × 10^6 models are in this class. Finally in the third class we also ask for a non-zero top Yukawa coupling (k_R ≥ 1, n_h ≥ 1, n_e = 0, Y_t ≠ 0). From the GA perspective this last requirement is interesting because it is a direct constraint on the schemata: namely eq. (2.14) is equivalent to the condition that one of the following schemata is present: [...]
Our main tool for examining the convergence properties of the GA is the average number of string models that one has to construct before finding a solution, which we refer to as the "call count". Obviously in a completely random scan this would be the same as the "statistical difficulty", i.e. 10^4, 2.5 × 10^6 and 10^{10} respectively. Note that this is also the expected call count for the GA if one dials the mutation rate up to 100%.
The GA is organised as follows. The genotype is taken to have length 51 and consist of the binary numbers defining the GGSO phases. The optimal population was found to be 50 members. (Convergence rate by generation stays roughly constant if the population is increased, so it merely reduces the efficiency.) Assigning each criterion a desired value µ_i and actual value ν_i, the fitness function was taken to be of the form f(µ_i, ν_i) = 10 − [...], where γ_i is a factor to give each criterion equal weight. Note that the µ_i are in this context integers (e.g. n_g for the generation number, or 0 or 1 for the existence or otherwise of a top Yukawa). In fact the precise form of f (the γ_i for example) does not significantly alter the convergence unless extreme values are taken.
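Since the explicit expression is not reproduced here, the following is only a guess at a fitness function with the stated ingredients: an offset of 10 and a weighted penalty for each criterion that misses its desired value. The absolute-value penalty and the function name are assumptions of this sketch, not the authors' formula.

```python
def fitness(desired, actual, gamma):
    """Hypothetical fitness: 10 minus a weighted sum of |desired - actual|."""
    penalty = sum(g * abs(mu - nu) for mu, nu, g in zip(desired, actual, gamma))
    return 10.0 - penalty

# Criteria: (n_g, top Yukawa present?). A model with n_g = 5 and no top Yukawa:
score = fitness(desired=[3, 1], actual=[5, 0], gamma=[1.0, 2.0])
```

With these illustrative weights the example scores 10 − (1·2 + 2·1) = 6, while a model meeting every criterion scores the maximum of 10.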
Searches were performed for different background mutation rates, µ_bgrd. The technique for tackling stagnation was as described above: if the maximum fitness was unchanged for more than 8 generations then a large mutation of 25 × µ_bgrd was carried out over the whole population (footnote 6). Finally, after each solution was found the entire population was scrambled and the search restarted to find the next solution.
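The search loop just described can be sketched as follows. This is a toy illustration under stated assumptions, not the code used for the analysis: `toy_fitness` is a hypothetical stand-in (agreement with a hidden bit string) for the string-model fitness, single-point crossover is assumed, and the call count is taken to be the number of fitness evaluations. The genotype length, population size, stagnation threshold and the factor of 25 are the values quoted in the text.

```python
import random

L = 51          # genotype length quoted in the text
POP = 50        # population size quoted in the text
STAGNATION = 8  # kick if the maximum fitness is unchanged for more than this
KICK = 25       # the kick uses a mutation rate of KICK * mu_bgrd


def toy_fitness(genotype, target):
    """HYPOTHETICAL stand-in: number of bits agreeing with a hidden target."""
    return sum(a == b for a, b in zip(genotype, target))


def mutate(genotype, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in genotype]


def roulette(pop, fits, rng):
    """Fitness-proportional selection (+1 keeps the total weight positive)."""
    return rng.choices(pop, weights=[f + 1 for f in fits], k=1)[0]


def crossover(a, b, rng):
    """Single-point crossover (an assumption of this sketch)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def ga_search(target, mu_bgrd=0.01, max_calls=200_000, seed=0):
    """Return (solution or None, call count) for one search from a scrambled start."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(L)] for _ in range(POP)]
    calls, best_seen, stale = 0, -1, 0
    while calls < max_calls:
        fits = [toy_fitness(g, target) for g in pop]
        calls += len(pop)  # one "call" per genotype evaluated
        top = max(fits)
        if top == L:       # toy definition of a solution
            return pop[fits.index(top)], calls
        stale = stale + 1 if top == best_seen else 0
        best_seen = top
        if stale > STAGNATION:
            # Stagnation: large mutation over the whole population.
            pop = [mutate(g, min(1.0, KICK * mu_bgrd), rng) for g in pop]
            stale, best_seen = 0, -1
            continue
        # Breed the next generation, then apply the background mutation.
        pop = [
            mutate(crossover(roulette(pop, fits, rng), roulette(pop, fits, rng), rng),
                   mu_bgrd, rng)
            for _ in range(POP)
        ]
    return None, calls
```

Calling `ga_search` repeatedly with different seeds mimics scrambling the population after each solution; averaging the returned call counts gives the quantity plotted against µ_bgrd. The `max_calls` cap is a safeguard of this sketch, since convergence on the toy problem is not guaranteed.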
The call count using this procedure can be drastically less than that of a random scan for the statistically more difficult searches. This is shown in figure 5, where we show the call count for the three different classes of solution. For the least constrained models we see that the GA is not much of an improvement over a random scan, as expected. However, for those searches with a statistical difficulty of 2.5 × 10^6 and 10^{10} we see that the GA is many orders of magnitude more efficient. Moreover the efficiency is extremely sensitive to the background mutation rate; it is optimised at around µ_bgrd = 0.75-1%. That is, the mutation probability per bit is optimally 0.0075-0.01. This is a clear sign that the GA is working as expected. The efficiency drops dramatically when the mutation is turned off completely (when the population is unable to discover new favourable schemata and/or stagnates) and also when the mutation is dialled up and the search becomes effectively randomised. The optimal rate is close to but slightly below the rate 1/l ≈ 0.02 which is often claimed to be the optimal rate [3].
Footnote 6: Again these are numbers that have to be optimised empirically. If stagnation is flagged too early, for example, then the algorithm is unable to converge on a solution.
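To put the quoted mutation rates in perspective, the short calculation below converts a per-bit mutation probability into the mean number of flipped bits per genotype and the probability that a genotype passes through mutation unchanged. It assumes independent bit flips and the genotype length l = 51 quoted earlier; the function names are illustrative.

```python
GENOTYPE_LENGTH = 51  # l, as quoted in the text


def expected_flips(p, l=GENOTYPE_LENGTH):
    """Mean number of flipped bits per genotype for per-bit probability p."""
    return l * p


def prob_unchanged(p, l=GENOTYPE_LENGTH):
    """Probability that no bit of the genotype is flipped."""
    return (1.0 - p) ** l


for p in (0.0075, 0.01, 1.0 / GENOTYPE_LENGTH):
    print(f"p = {p:.4f}: {expected_flips(p):.2f} flips on average, "
          f"P(no flip) = {prob_unchanged(p):.2f}")
```

At the empirically optimal rates a genotype therefore suffers roughly 0.4 to 0.5 bit flips per generation on average, compared with exactly one flip on average at the rate 1/l.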
Although there are only three points of reference, it is worth noting that the minimal call count appears to increase roughly as the log of the statistical difficulty, and more slowly than a power law; empirically we find call-count ≈ 7 × 10^3 log(diff / (4 × 10^3)). It would be of interest to make this relationship more precise.
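The empirical fit can be evaluated directly at the three reference difficulties. The sketch below assumes that the logarithm is natural; this is not stated explicitly, but it is the reading that gives a call count of order 10^5 at a difficulty of 10^{10}, in line with the figure quoted in the abstract.

```python
import math


def fitted_call_count(difficulty):
    """Empirical fit from the text, with a natural logarithm ASSUMED."""
    return 7e3 * math.log(difficulty / 4e3)


for difficulty in (1e4, 2.5e6, 1e10):
    fit = fitted_call_count(difficulty)
    print(f"difficulty {difficulty:.1e}: fitted call count {fit:.2e}, "
          f"gain over random scan {difficulty / fit:.1e}")
```

On this reading the fit gives a gain of order one for the SM-complete models and a gain of roughly five orders of magnitude for the most constrained class.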
There is one further probe of the structure we can make. Instead of completely scrambling the genotypes after a solution is discovered, one can instead perform the same mutation of 25 × µ_bgrd that one does when the population stagnates. This is equivalent to "seeding" the starting population from a previously found solution. If this yields new solutions at a faster rate (the population should of course not simply revisit the same solution), then this indicates that the solutions are "clustered" together (in terms of Hamming distance) rather than spread uniformly. This would certainly be expected if the system is modular, with different non-overlapping schemata governing different phenomenological traits. More generally it would imply that the solutions occupy a hypersurface in the search space.
Generally such a procedure is known to introduce positive and negative aspects. On the positive side the algorithm is more efficient, but on the negative side it can lead to premature convergence and may lead the algorithm to entirely miss islands of favourable solutions.
With this caveat in mind, the results for the most phenomenologically complete solutions (with a 10^{10} search difficulty) are shown in figure 4. As can be seen, the call count drops again (by a factor of 25) at the optimal mutation rate (although obviously it has to return to the same non-GA values as the mutation rate goes to 1 or 0). Although this may not seem dramatic, one should recall that when one completely randomises after finding each solution, this is the rate achieved with a much lower difficulty of 10^4. In accord with expectations, one finds that the new solutions are dominantly the same as the old ones with just one or two properties, such as the number of PS Higgses, altered.
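One simple way to quantify the clustering discussed above is to compare the mean pairwise Hamming distance of a set of solutions with the value l/2 expected for genotypes drawn uniformly at random. The sketch below is an illustration only; the function names are hypothetical and no data from the analysis are reproduced.

```python
from itertools import combinations


def hamming(a, b):
    """Number of loci at which two equal-length genotypes differ."""
    if len(a) != len(b):
        raise ValueError("genotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(genotypes):
    """Average Hamming distance over all unordered pairs of genotypes."""
    pairs = list(combinations(genotypes, 2))
    if not pairs:
        raise ValueError("need at least two genotypes")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def uniform_expectation(l):
    """Mean Hamming distance between two uniformly random bit strings of length l."""
    return l / 2.0
```

For l = 51 the uniform expectation is 25.5, so a set of solutions whose mean pairwise distance is well below this value would indicate clustering in the sense used here.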

Possible improvements
The analysis presented above is sufficiently optimised for the constrained set of models considered here. In the future, and especially to perform a more general analysis of string vacua where the genotypes defining a model might have, say, 500 loci, further optimisation may be useful.
One crucial aspect that we have so far not discussed in depth is the selection process. For this analysis we used simple roulette wheel selection, but it is known that some forms of selection may be more efficient when the population size becomes large. One can quantify the difference by measuring the "takeover time", namely the time for the best string to take over the whole population. In roulette wheel selection this typically increases as p log p with the population size p, whereas with tournament selection (in which members of the old population compete with each other for breeding) it is known to increase only as log p. Secondly, we have here assumed a constant mutation rate, except where there was premature convergence, at which point it was temporarily increased. It may be interesting to instead vary the mutation rate more continuously, and possibly to introduce a crowding penalty.
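The two selection schemes mentioned above can be contrasted in a few lines. This is a minimal sketch: the function names and the default tournament size of 2 are assumptions made for illustration, and fitnesses are assumed to be non-negative with a positive total.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one member with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


def tournament_select(population, fitnesses, rng, size=2):
    """Pick `size` distinct members at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), size)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]
```

Tournament selection depends only on the ranking of the contenders, not on the absolute fitness values, so its selection pressure is insensitive to how the fitness function is scaled.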

Concluding remarks
We see this paper as a step towards more widespread use of genetic algorithms and other heuristic methods in the search for viable string vacua. We have restricted the discussion to Pati-Salam models in the Free Fermionic Formulation of heterotic strings in order to be able to compare GA's with simple randomised scans, as the latter can be performed very quickly in this class of models.
We found that GA's are many orders of magnitude more efficient than a randomised scan at discovering solutions with desirable phenomenological characteristics. It is in the nature of heuristic search methods that rigorous expressions for the expected search time are hard to come by.
Nonetheless we find weak evidence that at least in this small subset of models the search time for GA's to find a solution increases only as the logarithm of the statistical difficulty.
It would be interesting to study this relation in more detail; if it holds it would indicate that these and similar heuristic methods would be crucial for searches with more refined criteria than the ones considered here.
It would of course also be interesting to extend the analysis to more general configurations that include the possibility of different boundary vectors as well as generalised GSO phases, and also to consider other kinds of compactification. These are much more time consuming to construct and would be a real testing ground for GA's; the search space is many orders of magnitude larger than the relatively small one of 2^{51} discussed here, and performing any kind of meaningful randomised search is not feasible.