New methods for old problems: vacua of maximal D = 7 supergravities

Finding vacua of supergravity theories is an outstanding problem which has been tackled in several ways, and with this work we add a new method to the puzzle. We analyse the scalar sector of maximal gauged supergravity theories in seven space-time dimensions. We look for vacua of the theory by varying the embedding tensor, instead of directly minimising the scalar potential. The set of quadratic constraints arising from this procedure has been solved by means of Evolution Strategies optimisation techniques, also adopted in Artificial Intelligence studies. We develop some methods to reconstruct and obtain analytical results starting from numerical outcomes, thus obtaining the complete mass spectra. In addition to some of the known vacua, we also obtain two new Minkowski vacua.


Introduction
Minima of potential functions have always been of uttermost importance in physics: from the search for stable trajectories in the gravitational field of a star to the spontaneous symmetry-breaking mechanism, potentials have been shown to be very useful in understanding the physical problem at hand. With the advent of String Theory, finding minima of scalar potentials was fundamental to study the String Landscape [1], in this framework Supergravity theories proved to be a conceptual testbed for this analysis. In fact, examining the structure of the vacua for supergravities is a crucial task in order to study the patterns of supersymmetry breaking, the processes behind the generation of critical points with non-vanishing cosmological constant, and in case the latter takes negative values the possible holographic dual theory. More specifically, the supergravity regime of the bulk theory in the AdS/CFT conjecture is described by gauged supergravities. Moreover, vacua with positive values of the cosmological constant have also recently received particular attention from the string theory community, due to the restrictions imposed upon them by the de Sitter conjecture [2] and other conjectures formulated in the framework of the Swampland programme. Vacua of gauged supergravities also have the desirable feature of giving masses to scalars, thus providing a solution to the moduli stabilisation problem. In this work, we will focus on maximal (N = 4) supersymmetric theories in D = 7 space-time dimensions. These theories are of particular interest because of their large amount of supersymmetry that completely fixes the field content of the theory and the Lagrangian, and in light of the limited number of possible deformations to which they can be subjected. For instance, dimensional reductions of string theories or M-theory on spheres produces theories whose vacua have maximally supersymmetric holographic duals. Moreover, our interest in seven dimensions arises since it has been shown that CFTs exist up to six space-time dimensions [3], and it is possible to chart most of them once we know all the vacua of Supergravity in one more dimension. Searching for minima of the scalar potentials in supergravity theories is a challenging task, and many techniques have been developed so far to tackle the problem. Progress has been achieved by pursuing three main different paths. The first consists in picking a particular theory (gaugings) and by using symmetries to truncate the field content in order to arrive at a model with a restricted number of scalars, thus making the minimisation procedure feasible. This method, introduced by N.P. Warner in [4,5] made it possible to analytically solve different minimisation conditions, thus discovering many new vacua in 4 and 5 space-time dimensions. Another proposal is to adopt a numerical approach based on machine learning libraries, and lead to the discovery of a large amount of vacua, which also allowed the analysis of mass spectra, residual supersymmetries and residual gauge groups [6][7][8][9][10][11][12][13]. Both these techniques are based on the choice of a gaug-ing, consequently, one has to scan one theory at a time, making them inconvenient for a complete cataloguing. Another approach has been formulated, based on the embedding tensor formalism. First discovered in the contest of the super-Higgs mechanism [14] it has been then introduced in the framework of the analysis of supergravity vacua [15], leading to a wealth of new results [16][17][18][19][20][21][22][23] as well as providing the first example of a family of gauged theories dependent on a continuous parameter [24]. This method does not pick a specific gauging choice, thus allowing for a simultaneous scan of different gauge theories. The last technique will produce a set of quadratic equations in n variables, with n the number of parameters in the embedding tensor, this problem is NP-hard, thus analytic solutions are expected to be found in very special cases, usually by restricting the number of variables, using some residual symmetry. In this work, we will build upon the last method and implement the scan using an evolution strategy algorithm, Covariance Matrix Adaptation (CMA-ES) [25], together with a Genetic Algorithm (GA). From the solutions we obtain, we can recover the analytic form of the vacua and their spectra.

Maximal supergravity in 7 dimensions
We recall, in this paragraph, all the notations and properties of the tensors needed in the rest of the paper. It is well known, since the seminal paper by de Wit, Samtleben and Triggiante [26], that any possible gauging of a supergravity theory is encoded in the embedding tensor Θ M α , where usually the indices M, N, P, transform in the vector representation and α, β, etc. in the adjoint of the duality group (for the case of 7 dimensions M, N, etc. will live in the fundamental representation 5 of SL(5), so the vector representation 10 will be described by a couple of antisymmetric indices M N = [M N ]). The job of the embedding tensor is to select the generators of the duality group, which in the case of 7 dimensions is SL(5), which will form the gauge group of the specified theory. Once this tensor is fixed, the Lagrangian and supersymmetry transformations follow, according to the analysis in [26]. We can write the new generators of the gauge group as where t P Q are the SL(5) generators, satisfying t M M = 0 and the algebra In seven space-time dimensions, the embedding tensor lives in the product of representations 10 ⊗ 24 = 10 + 15 + 40 + 175. However, supersymmetry constrains it to lie in the 15 + 40, therefore it is possible to parameterize it as: ,P with Z [M N,P ] = 0, that live respectiveley in the 15 and 40 representations of SL (5). Gauge symmetry imposes further constraints, these are in particular quadratic constraints that ensure the closure of the gauge algebra. They can be expressed in terms of the Y and Z tensors as These 15+40 representations can also be decomposed under the maximal compact subgroup of SL (5), that is, USp(4) SO (5). Under USp(4) they decompose as Defining the scalar matrix V M ab , anti-symmetric in a and b (a,b=1,...,4), which mediates from SL (5) to USp(4), normalised by we have: We call B, B [ab] [cd] , C [ab] and C [ab] (cd) the irreducible components that live in 1, 14, 5 and 35 of USp(4), respectively. In particular, they satisfy , Ω ab C ab = 0, [cd] , B ab cb = 0, Ω ab B ab cd = 0 = Ω ab B cd ab , C ab cd = C [ab] (cd) , C ab cb = 0, Ω ab C ab cd = 0. (2.8) In terms of these, the Y and Z tensors become [cd] , (2.9) Mapping the Θ-tensor from SL(5) to USp(4) we get the T-tensor defined as: (2.11) In terms of these new tensors, it is possible to rewrite the quadratic constraints as follows.
[cd] = 0 , Z (ab)[cd] T (ab)ef gh = 0 . Let us now come to the core of our analysis: the scalar potential and the mass matrices.
As usually happens in gauged supergravities, the scalar potential can be expressed as the difference of the squared fermion shifts, the latter are defined to be: (2.14) Then, the scalar potential is given by: (2.15) From this we can compute the variation of the scalar potential under a variation of the scalar fields given by δ Σ V M ab = Σ ab cd V M cd , where Σ ab cd is a variation along the scalar manifold SL(5)/USp(4), living in the 14 representation.
(2. 16) In order to find the vacua of the theory, we need to solve this equation of motion together with the quadratic constraints (2.12) or (2.4). Once we obtained the solutions to these quadratic equations, we studied the gauging, the residual gauge group, the residual amount of supersymmetry preserved by each vacuum, the mass spectra, and the values of the cosmological constant. On the vacuum, residual supersymmetry is given by the number of non-vanishing a parameters satisfying A 2a,bcd a = 0 . (2.17) On the other hand, for what concerns the masses, we have computed the various mass matrices. For the gravitinos, we have: The mass matrix for the fermions is given by 19) in order to compute the masses for the fermions, we must recall that χ abc transforms in the 16 of Usp(4) 20) and therefore also the mass matrix must lie in the 16 × 16 representation of Usp(4). The vector mass matrix is contained in the scalar kinetic term 1 2 P µab cd P µ cd ab , where P µab cd is defined by the gauge covariant space-time derivative of the scalar fields The P µab cd lies in the 14 representation of usp(4), while Q µa c in the 10, (these must be imposed before computing the masses). Analogously, the mass term for the 2-forms arises from the kinetic term of the vectors, namely Ω ac Ω bd H µν ab H cdµν , where H µν ab is given by the modified field strength tensor: Similarly, the mass term for the 3-forms, S N µνρ , arises from the kinetic term of the 2-forms, Ω ac Ω bd H µνρab H µνρ cd . The covariant field strength for the 2-forms is given by: Note that the kinetic term for the 3-forms is linear in the derivative, so what one really obtains from this procedure is the mass matrix, not the square mass matrix. Furthermore, this kinetic term is not canonically normalised, in fact it is in the schematic form Y M N S M DS N , so the true masses are obtained once one multiplies the mass matrix that arises from the kinetic term of the 2forms by Y −1 M N . For the scalar masses, we note that it is possible to parameterise the scalar fields in terms of the USp(4)-invariant, symmetric unimodular matrix M M N defined by The scalar potential, written in terms of M M N , is Therefore, by means of δ Σ V M ab = Σ ab cd V M cd , it is possible to calculate the second variation of the potential and, therefore, the scalar mass matrix, always recalling that Σ ab cd belongs to the 14 representation of USp(4). Eq. (2.6) has been widely used to calculate the masses. A complete study of the field content, Lagrangian, and symmetries for maximal supersymmetric gauged supergravities in 7 space-time dimensions is given in [27].

Scanning for vacua: methodologies
The approach we propose to chart the vacua of the theory is based on optimisation techniques, some of the methods we adopted are not unknown to the string theory community. Indeed, we used a combination of analytical and numerical tools; our starting point will be the analysis performed in [15]. There, the authors showed how every vacuum found in an explicit gauging of the theory can be mapped, by means of a duality transformation, to the origin of the scalar manifold, as shown in Fig. 1. This procedure is allowed by the homogeneous nature of the coset manifold, which in the case of maximal supergravities is of the form E n(n) /H (we do not include the Trombone symmetry in our analysis), with n the number of internal dimension and H the maximal compact subgroup of E n(n) . Due to this method, we will be able to scan the entire space of all gaugings at once. At the origin, the physical scalars are all set to 0, but the parameters of the embedding tensor are not fixed, so in addition to considering the minimisation conditions δ Σ V = 0, with δ Σ V given in eq. (2.16), we must also impose the quadratic constraint, eq. (2.4) or (2.12), (the linear constraint has been imposed when choosing to work with the representations (1 + 14) + (5 + 35) in (2.8)). The maximal compact subgroup of SL(5), SO(5), can be used to remove part of the variables, in this case 10, from the global system. In particular, we can always use local SO(5) to diagonalize Y M N . This leads to the problem of solving systems of multivariate quadratic equations ((MQ) problem), which is known to be a N P complete task. Indeed, various cryptosystems relies on this problem in some way, MQ is also considered a possibility for potentially post-quantum cryptosystems, namely, it is believed not to be solved by quantum algorithms as well. Therefore, we introduce a different, related, problem: we are going to solve the systems with a certain approximation, and this can always be done in polynomial time. Indeed, optimisation techniques have been widely used by physicists and mathematicians to find solutions to systems of equations. In order to transform our problem, recalling that we have a system of homogeneous equations of the form n i d i = 0, we introduce a fit function f f it defined as where i runs over the length of the system and d i represents the polynomials in the system. Obviously, the global minima of the fit function correspond to f f it = 0 and are the solutions to our system. This also implies that verifying that a minimisation point is also a solution takes no time. The first approach invented to solve a set of equations simultaneously is the Gradient Descent algorithm (GD), created by A. Cauchy, and since then many other attempts have been tried. Genetic Algorithms (GA), which are an instance of the broad class of Evolutionary Strategies algorithms, have already been widely used by theoretical physicists [28][29][30][31][32], they have the advantage to work with ill-behaved functions, while methods such as GD or stochastich GD work only on differentiable fitness functions. In this work, we are going to introduce yet another method in the set of tools used for the research of supergravity vacua, the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES). In our search of minima of the scalar potential, we used a combination of GAs and CMA-ES, we will not dive into the explanation of GAs, since they are already known by the community, a nice introduction about these topics and other Data Science applications to String Theory can be found in [33]. Instead, we present the new technique, based on CMA-ES. CMA is an Evolutionary Strategy algorithm, therefore, it consists of a series of steps (generations), at each of them some operations are carried out before going to the next step. CMA is based on the adaptation of the covariance matrix associated with the multivariate normal distribution, from which we draw a new set of candidate solution points at each generation. The covariance matrix is not the only object that needs to be adapted at each step; the procedure implies the computation of the new mean and the overall step size as well. To compute the mean, we order the candidate solution points, based on the fitness function, and select the best µ, which results in a higher "evolution pressure". On the other hand, the Covariance Matrix is adapted in order to maximise the likelihood of precedent successful steps. Thus, CMA performs an iterated analysis of the principal components at every generation, while retaining all the principal axes. A deeper analysis of the algorithm is presented in Appendix A. Let us now describe some details of the analisis. Parallelisation proved to be fundamental in making the process faster and allowing for a more global search scan. Indeed, with CMA-ES we need to choose a starting mean m 0 and step size σ 0 , together with some hyperparameters such as the time allowed to carry the computations ('timeout' parameter), the minimum value accepted for the fitness function in order to declare that a minimum was achieved ('ftarget' parameter), the precision of this result ('tolfun'), the population size λ and whether or not to activate elitist research. Parallelisation gave us the opportunity to choose more initial means m 0 at a time, thus scanning a larger area of the parameter space. Some useful techniques have been used to adapt the algorithms to our specific problem and to render the analysis of numerical results faster. First of all, we are dealing with systems of homogeneous quadratic equations, therefore, their solutions always pass from the origin, which helps us to restrict the area of research when setting the initial mean for the multivariate normal distribution. On the other hand, we must pay attention because the origin of the reference system is a trivial solution for any homogenous system of equations and, therefore, starting close to it may lead us always there. This could be avoided by modifying the fitness function. We found the following definition to be very efficient: Basically, we created a step function to avoid the algorithm from always converge in 0.
The threshold must be chosen in such a way as to leave enough parameter space near the origin to complete a full scan without hitting the barrier too often. A technique has also been implemented to avoid the algorithm from returning to the previously found minima. It consisted of adding a multivariate normal distribution function centred on the minima on top of the fitness function, as illustrated in Fig. 2. Basically what one does is to add a series of umbrellas on top of the fitness function to stop the algorithm from going in those directions already analised.  In our case, though, the solutions to the systems are manifolds in the parameter space, so an infinite number of umbrellas would be needed to prevent the algorithm from finding again the vacuum structure under consideration. Imagine having a straight line in 2dimensions and starting to cover it with 2-dimensional multivariate normal distributions. In 7 space-time dimensions, we were interested, above all, in (Anti)-de Sitter vacua, and this information can be used to further simplify the numerical search. Indeed, taking into account the potential in (2.15) written in terms of the fermion shifts (2.14), we can see that the potential is written as the difference between 2 squared terms. For AdS we want to impose V = −k, with k constant, we can always normalise k to be 1, so we need to add another equation (quadratic) to our system, analogously for dS we need to solve V = 1.
We can always make a change of variables to completely solve this new constraint, for example, considering the case of AdS vacua, calling z i the variables contained in A d,abc 2 and x i the ones in A ab 1 : This change of variables introduces one more variable, ψ, but solves the constraint V = −1, thus removing a bunch of solutions from the system. By doing so, we will be certain that the algorithm will look only for vacua with a negative cosmological constant and that the other vacua disappeared from its landscape. Analogously, we can solve the constraint V = 1 for dS vacua by exchanging sinh with cosh in eq. (3.3). For the Minkowski vacua we just add one more homogeneous quadratic equation (V = 0) to the system. The initial points have been chosen as follows, for what concerns the variables w i and u i in (3.3), we draw them from a normal distribution with mean 0 and standard deviation σ = 4, instead, the starting points for ψ have been evenly displaced in an interval from 0.1 to 0.5 (note that the initial variables scale exponentially with ψ, so there is no need to reach high values for the latter). We used GA techniques, such as mutation and crossing over, when the algorithm CMA got stuck in a local minimum, in order to add some noise to the candidate solution points and move them away from the local minima. Let us now describe the analysis of the results. First, we remove the solution points that are close to each other, up to a certain threshold that we set to be equal to 3σ, thus 3 times the step size, removing all candidate solutions corresponding to vacua already present in the set of solutions. Then, we studied the residual amount of supersymmetry, the gravitini masses, the signature of the Cartan matrix (providing information about the number of compact and non-compact generators of the gauge group of the theory), and the rank of X P Q R giving us the dimension of the gauge group. The latter will provide us with everything that is needed to understand what gauging the vacuum belongs to. With all this information, it is possible to group the solution points and extract some very useful relations among them. Indeed, by plotting a variable x i or z i , against all other variables, it has been possible to reconstruct some analytical relations, as shown in Fig. (3), even by means of linear fitting. This is mainly possible when the number of unknowns is small, on the other hand, once we know the residual symmetry group of the solution, we can always use the technique of restricting the scan only to those variables which are present when that particular residual gauge symmetry has been imposed. This can lead to a new system that may be solved completely analytically, or we can always iterate the previous steps. The pseudocode for the search for AdS vacua is reported in Algorithm 1.
Sum of squared polynomials with the variables defined as in 3.3 end function function CM A minimization(f unction, u i , w i , ψ i , StepSize) return Candidate Solution Points in R n Hyperparameters, such as 'ftarget', 'seed', 'timeout', etc., must be set as well end function

Results
In this section we are going to present the vacua found through the methods presented before. As has been explained, we have been able to reconstruct directly or obtain analytical solutions from residual symmetry considerations starting from numerical ones. We found the two well-known AdS vacua, the maximal supersymmetric one [34] and the non-supersymmetric one [35], of which we present the mass spectrum. In addition, we found 2 new Minkowski vacua. The summary of the vacua is presented in Table 1

Mass spectra
In this section we present some of the masses for all the vacua listed in table 1. The masses for the AdS backgrounds are normalised in terms of the squared AdS radius L 2 = |15/V |, so that supersymmetric gravitinos have a normalised squared mass of 25/16.  The spectrum for the vacuum A1 agrees with the one found in [34], and it is determined by the representation of the superconformal algebra in six dimensions [36,37]. The scalar masses for the A2 case are the same as those in [35], ensuring that this is the same non-supersymmetric AdS vacuum. In order to compute the masses for the fermions, we note that in the A2 case supersymmetry is completely broken, therefore a super-Higgs mechanism is in action there. Consequently, 4 fermions, which correspond to the Goldstinos, must be eaten by the corrisponding gravitinos. The relevant part of the Lagrangian is where Γ µ are the 7-dimensional Dirac Gamma matrices and obtained from (2.19). We can do a field redefinition for the gravitinos in order to gauge away the interaction term between fermions and gravitinos, this will add a new term to the mass matrix of the fermions, (2.19), rendering the Goldstinos null eigenvectors. The supersymmetric AdS vacuum, (A1), is perturbatively stable, respecting the Breitenlohner-Freedman bound [38][39][40] for scalar degrees of freedom in AdS d . On the other hand, (A2), is pertubatively unstable, as already found in [35], thus corroborating the hypothesis about the instability of nonsupersymmetric AdS spacetimes formulated in the contest of the Swampland programme [41]. For the Minkowski vacua, we obtain  In Appendix B we present the form of the representations that make up the T-tensor, for each vacua we found. It is clear that this is not an exaustive analysis, indeed some vacua are missing from this scan, for example the Scherk-Schwarz Minkowski vacuum, with gauge group U(1) × R 6 , first found in [27], did not appear in our scan. In any case, it is possible to find it, once one limits the analysis to the vacua preserving a residual symmetry, in this case U(1).

Outlook and Future Directions
In section 3 we presented a new method, based on Evolutionary Strategies optimization techniques. The power of these tools combined with the thorough analysis of the numerical data allowed the reconstruction of analytical solutions and their mass spectra. The analysis has been carried out for D = 7 space-time dimensions, for reasons already explained in the introduction, but it would be interesting to see whether our procedure can be used also for different number of dimensions and for fewer amount of supersymmetry. The number of parameters in the embedding tensor, and therefore of variables, and the number of equations grow as the number of dimensions get lower. This is illustrated in Table 6, where the representations of the embedding tensor for maximal theories in each dimension are presented, giving the number of variables in the system of equations (there is the possibility to remove some of them using the H-redundancy of the scalar coset manifold). This implies that numerical methods are necessary when dealing with lower dimensional theories, because analytical results can be achieved in very special cases, either by fixing the gauging or restricting the analysis to vacua preserving some residual symmetry. Numerical methods, instead, would allow for a more general scan. On the other hand, with a large number of non-vanishing parameters in the vacua it is no longer possible to use the reconstruction techniques presented in section 3 and more subtle ways are needed if we want to obtain analytical results.  We also compared our method to some optimization tools available from TensorFlow libraries (Adam optimizer) and to other algorithms as well, finding CMA-ES and our implementations more efficient, given the cospicuous extension of the literature in Optimization it has been impossible to test our problem against many of the optimization techniques. Another interesting step for the future is to compare our methodology with other numerical approaches; maximal supergravities in D = 7 space-time dimensions are a good testbed due to the small number of variables and the interesting physical features.

A Covariance Matrix Adaptation -Evolutionary Strategy (CMA-ES)
In this Appendix, we present some details about the CMA-ES algorithm and its operational mode. CMA-ES, in short, CMA, is an evolution strategy (ES) algorithm that, as in the case of GA, only needs the fitness function as accessible information [42][43][44][45][46][47][48][49][50]. Therefore, differently from GD or stocastic GD, we do not require the function to be differentiable, it can also be not continuous. CMA, as the name suggests, is based on the "adaption" of a normal distribution to the fitness function under consideration (with its level curves). Let us analyse the algorithm in depth and consider a multivariate normal distribution, N (m, C), which is determined by the mean m ∈ R n (in our case, n is the number of free parameters in the embedding tensor) and by the symmetric, positive definite covariance matrix C ∈ R n×n . Covariance matrices are associated with the ellipsoid {x ∈ R n |x T C −1 x = 1}, where the principal axis of the latter are the eigenvectors of C, and the lengths of the squared axis are the eigenvalues of the covariance matrix. We can always diagonalize the covariance matrix by means of an orthogonal matrix B whose columns are the eigenvectors of C with unit length, C = B(D) 2 B T . Then, it is also possible to write the normal distribution as with I the n×n identity matrix. At each step of the process, we generate a new population of points (which in GA are called offsprings) by drawing them from a multivariate normal distribution: x g+1 j ∼ N m g , (σ g ) 2 C g with j = 1, ..., λ (A.2) Superscripts g, g + 1, etc. label the generations, λ is the population size, and σ g ∈ R + is the "overall" standard deviation (step size) at generation g. Now we need to explain how the mean, the covariance matrix, and the standard deviation are computed for the next generation g + 1.

The mean
The new mean m g+1 is simply selected with a weighted average of the µ best points of the population: w j x g+1 j:λ , with w j ∈ R + with j = 1, ..., µ are positive ordered weights, that is, w 1 ≥ w 2 ≥ ... ≥ w µ > 0. If w j = 1/µ for each j, we obtain the mean value for the best µ points. x g+1 j:λ represents the j-th best individual of the population, meaning, f (x g+1 1:λ ) ≤ f (x g+1 2:λ ) ≤ ... ≤ f (x g+1 λ:λ ). An essential quantity is the variance effective selection mass It is possible to show, from the definition of w j that 1 ≤ µ ef f ≤ µ and that µ ef f = µ only in the case where all the weights are the same and equal to 1/µ. Usually, µ ≈ λ/2 and w i ∝ µ − i + 1.

The covariance matrix
First, let us define the empirical covariance matrix C g+1 emp , which is nothing more than an estimate of the covariance matrix C g : This estimator has to be modified to obtain a maximum likelihood estimator of C g , by defining The difference between C g+1 emp and C g+1 λ is what is used as the mean value. The former uses the mean obtained from the entire population, thus estimating the variance of the sampled points, and the latter instead uses the mean obtained by (A.3), therefore estimating the sampled steps, x g+1 j − m g . We are going to modify this estimator again and define µ is an estimator of the variance of selected steps (the best / successful steps µ). We have some conditions on µ ef f in order for C g+1 µ to be a reliable estimator. Indeed, µ ef f has to be large enough to prevent the condition numbers (given a matrix A and a linear system Ax=b, with x unknown, they measure how sensitive the solution of the system to a change in b is, high condition numbers imply that small changes in b generate huge modifications in the solution) of C g+1 µ to be smaller than 10 for the fitness function of the sphere: f sphere (x) = n i=1 x 2 i ; empirically, it is seen that µ ef f ≈ 10n is a good choice. To avoid this problem for a small population, we will modify the update of the covariance matrix again. In order to obtain an algorithm that converges faster, we need a small population, on the other hand, to obtain a more global search the population has to increase. For a small population, also µ ef f ≈ λ/4 (which is the choice to take to have reasonable w j ) has to be small, then C g+1 µ is not a reliable estimator, in order to circumvent this, we define a new covariance matrix that takes into consideration the information we have from previous generations. Defining C 0 = I and the learning rate 0 < c cov ≤ 1, then (A.8) Step sizes σ g have been integrated to ensure that C g µ from different generations are comparable. If c cov = 1 the covariance matrix collapses to C g+1 µ and no information from previous generations is retained, on the other hand, if c cov = 0, C g+1 = C 0 and there is no learning. This kind of update, represented in (A.8) update, is called rank-µ update, because the sum goes from 1 to µ. Eq. (A.8) is iterative and can be expanded as Picking high values for c cov leads to degenerate covariance matrices, while small values imply slow learning, a good choice is c cov ≈ µ ef f /n 2 . Small population sizes λ lead to a large number of generations and therefore to a faster adaptation for the covariance matrix. A final step is necessary to update the covariance matrix: cumulation. In fact, information about "signs" of the steps the strategy took generation after generation has not been used so far. To do so, we introduce the evolution path. An evolution path is any sequence of succesive steps taken by the strategy, taking the sum of these steps is referred as cumulation; for instance, for three steps we have (A.10) Defining the 0-th order evolution path p 0 c = 0, we use the exponential smoothing and define iteratively with 0 ≥ c c ≤ 1 a new learning rate for the evolution path, the normalisation factor c c (2 − c c )µ ef f is dictated by the demand that p g+1 c be extracted from a normal distribution N (0, C). When c c = 0 there is no learning and p g c = 0. Putting everything together, we obtain the update of the covariance matrix: with µ cov ≥ 1 and should be µ cov = µ ef f . Eq. (A.12) reduces to eq. (A.8) in the case µ cov → ∞, so information from the last generation is taken into consideration by the rank-µ update and information from previous generations, instead, is exploited by the evolution path update, which is relevant above all for small population sizes.

The step size
An evolution path is also used to update the step size σ with a method called cumulative step size adaptation: • Whenever the evolution path is long, the steps are going in the same direction (approximatively), so they are correlated. Consequently, we can cover the same distance with longer but fewer steps, and the step size must be increased.
• When the evolution path is short, the steps cancel among each other, and the step size should be decreased.
• The optimal situation is that the steps are totally uncorrelated and orthogonal with respect to the previous and following ones.
We need to define what long-and short-evolution paths mean. In this respect, we compare the latter with the expected length under random selection, which means that the steps are not correlated with each other. If our strategy finds that the evolution paths are longer than the uncorrelated ones, σ has to be increased, and vice versa. The evolution path p g+1 c depends on its direction, therefore we define the conjugate path with 0 < c σ < 1 a learning rate and (C g ) − 1 2 ≡ B g (D g ) −1 B g T . Whenever (C g ) − 1 2 = I it aligns step m g+1 − m g with the coordinate system produced by B g . In particular, B g T rotates the system so that the columns of B g become the axis. (D g ) −1 rescales the length of the axis so that they measure the distances in the same way. B g rotates everything back, allowing us to compare the directions of the various steps. By adding the matrix (C g ) − we can see that, whenever ||p g+1 σ || > E [||N (0, I)||], σ g increases and viceversa when ||p g+1 σ || < E [||N (0, I)||]. It has been proved, in a survey about Black-Box optimizations [51], that CMA-ES outranks other 31 optimisation algorithms and that its performance is outstanding for rugged and ill-conditioned functions with large search spaces.
B T-tensor at the Vacuum in the 7 dimensional Theory In this Appendix we provide an instance of the value of the irreducible USp(4) representations composing the T-tensor generating the vacua of Table 4 Again, symmetries must be imposed on these tensors.