Handbook of Heuristics, pp. 1–31

# Evolution Strategies

## Abstract

Evolution strategies are classical variants of evolutionary algorithms which are frequently used to heuristically solve optimization problems, in particular in continuous domains. In this chapter, a description of classical and contemporary evolution strategies will be provided. The review includes remarks on the history of evolution strategies and how they relate to other evolutionary algorithms. Furthermore, developments of evolution strategies for nonstandard problems and search spaces will also be summarized, including multimodal, multi-criterion, and mixed-integer optimization. Finally, selected variants of evolution strategies are compared on a representative set of continuous benchmark functions, revealing strength and weaknesses of the different variants.

## Keywords

Evolution strategy; derandomization; CMA-ES; benchmarking; theory

## Introduction

Evolution strategies (ESs) are a class of metaheuristics for optimization by means of (computer) experiments. They belong to the broader class of evolutionary algorithms and, like other heuristics from this class, mimic adaptive processes in biological evolution. The search in ESs is characterized by the alternating application of variation and selection operators. Recombination and mutation operators are the variation operators and create new individuals. The selection operator selects individuals from a population based on their fitness value, which in evolution strategies is obtained by evaluating the objective function. The selected individuals form the next population, and the process is repeated. A distinguishing feature of evolution strategies as compared to most other evolutionary algorithms is their self-adaptive mutation operators, which are capable of adapting the shape of the mutation distribution to the local topology of the landscape and thereby help ESs achieve maximal progress rates.

Before discussing technical details of evolution strategies, it is worthwhile to give a brief outline of their history: The idea of mimicking evolution in order to optimize technical systems arose in Germany, where it led to the development of evolution strategies by Ingo Rechenberg [50] and Hans-Paul Schwefel [57], and in the USA, where it led to the development of genetic algorithms [21, 34] and evolutionary programming [20]. In the mid-1990s, these strategies were unified under the common umbrella of *evolutionary algorithms* (EAs) [5]. While all these heuristics share the idea of mimicking evolution in computational algorithms, researchers in genetic algorithms and evolution strategies emphasized different aspects of algorithm design and problem domains.

Since their invention in the 1960s by Rechenberg and Schwefel at the Technical University of Berlin, evolution strategies have been used to optimize real-world systems, typically in engineering design. The first application of an evolution strategy was the design of optimal shapes in engineering, such as nozzles and wing shapes, using physical experiments. Often the evolutionary design procedure discovered high-performing structures with surprising shapes that had never been considered by engineers before [7]. Starting from sequential stochastic hill climbing strategies, evolution strategies soon advanced to more sophisticated problem-solvers, and their main application domain became the treatment of black-box continuous optimization problems on the basis of computer models.

One important development introduced adaptive step sizes or mutation distributions. Although there were some precursors to the idea of step-size adaptation in stochastic search algorithms [56], the development of flexible and efficient adaptation schemes for mutation distributions became a major point of attention in ES research. This feature distinguished them from genetic algorithms which worked commonly with constant mutation strengths.

The so-called 1/5-th success rule: the rate of generating successful mutations is monitored, and the step size is controlled so as to achieve a success rate of 1/5, which is optimal on the sphere function.

The mutative self-adaptation: it most closely resembles natural evolution, in that the step sizes themselves also undergo recombination, mutation, and selection [14].

The derandomized self-adaptation (Hansen, Ostermeier, and Gawelczyk [30]): it accumulates the standardized steps and compares the length of the cumulative vector to the length expected under random selection.
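As an illustration of the first of these schemes, the 1/5-th success rule can be sketched in a few lines of Python. The function name `one_plus_one_es` and the constant `tau` are illustrative choices, not taken from the chapter; the update factors are chosen so that the rule is in equilibrium exactly at a success rate of 1/5.

```python
import math
import random

def one_plus_one_es(f, x0, sigma=1.0, max_evals=3000, tau=0.1):
    """A minimal (1 + 1)-ES sketch with a 1/5-th success rule.

    On success the step size grows, on failure it shrinks; the exponents
    0.8*tau and -0.2*tau balance out when exactly 1 in 5 mutations succeeds.
    """
    x, fx = list(x0), f(x0)
    for _ in range(max_evals):
        y = [xi + sigma * random.gauss(0.0, 1.0) for xi in x]
        fy = f(y)
        if fy < fx:                          # successful mutation
            x, fx = y, fy
            sigma *= math.exp(0.8 * tau)     # increase step size
        else:                                # unsuccessful mutation
            sigma *= math.exp(-0.2 * tau)    # decrease step size
    return x, fx
```

On the sphere function, this sketch exhibits the linear convergence behavior discussed later in the chapter.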

A parallel development was the introduction of population-based (or multi-membered) evolution strategies. Here, the idea is to create an evolutionary algorithm that performs, as Hans-Paul Schwefel called it, a *collective hill climbing* [58] (a collection of search points, where each point performs a simple hill climb search). To categorize different population models, the notation of (*μ*, *λ*)- and (*μ* + *λ*)- schemes was introduced, in which *μ* denotes the number of individuals in the parent population and *λ* the number of individuals in the offspring population. These multi-membered strategies are able to exploit the positive effects of recombination (crossover) and are more reliable in global optimization settings and on noisy problems than the early single-membered variants. Moreover, population-based algorithms could be executed in parallel and later could more easily be extended to advanced evolution strategies for solving multi-objective and multimodal optimization tasks.

Nowadays, evolution strategies are mainly used for simulation-based optimization, i.e., for the parametric optimization of computerized models. Evolution strategies are suitable for the optimization of non-smooth functions because they do not require derivatives. While the algorithmic paradigm of ESs can in principle be extended to general metric search spaces, mainstream variants of ESs address continuous optimization problems, as opposed to genetic algorithms [21], which are more typically used on binary search spaces.

Contemporary evolution strategies have been shown to be competitive with other derivative-free optimization algorithms, both in theoretical studies [13, 65] and in benchmark comparisons on a large corpus of empirical problems [59]. Furthermore, their practical utility is underpinned by a large number of successful applications in engineering and systems optimization [6].

This chapter will give a brief introduction to classical and contemporary evolution strategies with a focus on mainstream variants. Firstly, classical (*μ*, *λ*)- and (*μ* + *λ*)- evolution strategies will be described in section “Classical Evolution Strategies”. Then, in section “Derandomized Evolution Strategies”, derandomization techniques will be discussed, the distinguishing feature of the CMA-ES. Section “Theoretical Results” addresses theoretical findings on the convergence and reliability of evolution strategies. An overview of new developments and nonstandard evolution strategies is provided in section “Nonstandard Evolution Strategies”. Moreover, this section covers adaptations of ESs that make them more suitable for multimodal optimization. Section “Benchmarks and Empirical Study” discusses empirical benchmarks used in this field and includes a comparative study of contemporary evolution strategies. Section “Conclusions” summarizes the main characteristics of ESs and highlights future research directions.

## Classical Evolution Strategies

The task considered here is the minimization of an objective function over continuous decision variables,

$$\displaystyle \begin{aligned} f(\vec{x}) \longrightarrow \min, \quad \vec{x} \in \mathbb{R}^d. \end{aligned} \qquad (1)$$

*f* can be a black-box function, and usually *f* is assumed to be nonlinear and to have a minimum. Maximization problems can be brought into the standard form of Eq. 1 by simply flipping the sign of *f*. Standard implementations also allow restricting the domain of the decision variables to interval domains and introducing constraints. For the sake of brevity, extensions of ESs for constraint handling will be largely omitted in the following discussion.

One basic data structure of an evolution strategy is the individual. An individual comprises:

- a *d*-tuple of values for the decision variables *x*_{1}, …, *x*_{ d }, representing a candidate solution for the optimization problem (see Eq. 1),
- a tuple of strategy parameters. The strategy parameters can, for instance, be the standard deviations used to generate the perturbations of variables in the mutation (step sizes) or the components of a covariance matrix used in the mutation. Strategy parameters can be adapted during evolution.
- a fitness value. It is typically based on the objective function value, and it may be altered by a penalty for constraint violations.

In (*μ*, *κ*, *λ*)- evolution strategies, the individual’s age is also maintained, that is, the number of generations that an individual has survived.
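The individual data structure above can be sketched as a small record; the field names are illustrative and not the chapter's notation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Individual:
    """A sketch of the ES individual data structure (names illustrative)."""
    x: List[float]                    # decision variables x_1, ..., x_d
    sigma: List[float]                # strategy parameters, e.g., step sizes
    fitness: Optional[float] = None   # objective-based fitness, set on evaluation
    age: int = 0                      # generations survived ((mu, kappa, lambda)-ES)
```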

Another basic data structure of an evolution strategy is a population. A population is a multi-set of individuals, i. e., of elements of \(\mathbb {I}\). One distinguishes between the parent populations *P* _{ t } consisting of *μ* individuals and the offspring populations consisting of *λ* individuals.

The basic loop of a (*μ*, *λ*)- evolution strategy and a (*μ* + *λ*)- evolution strategy is outlined in Algorithm 1. The algorithm starts with initializing a population *P*_{0} of *μ* parent individuals, for instance, by uniform random sampling in the feasible intervals for the objective variables \(\vec {x}\). Then, the fitness values of *P*_{0} are determined, and the best solution found in *P*_{0} is identified and stored in the variables \(\vec {x}^{best}_0, f^{best}_0\).

Then, the following *generational loop* is executed until a termination criterion is met. Common termination criteria are stagnation of the search process or exceeding a maximum allowed search duration.

In each iteration, the algorithm applies the stochastic variation operators, *recombination* and *mutation*, and a deterministic *selection* operator. The recombination operator, namely, \(\mathtt {Recombine}: \mathbb {I}^\mu \times \varOmega \rightarrow \mathbb {I}^\lambda \), generates from the *μ* individuals in *P*_{ t } an offspring population of *λ* individuals, which are then mutated by the mutation operator, \(\mathtt {Mutate}: \mathbb {I}^\lambda \times \varOmega \rightarrow \mathbb {I}^\lambda \). The mutated individuals are evaluated, and, if necessary, the best found solution \(\vec {x}^{best}_t, f^{best}_t\) gets updated. Then the parent population of the next round is determined by selecting the *μ* best solutions from

- in case of a (*μ*, *λ*)-ES, the *λ* offspring individuals,
- in case of a (*μ* + *λ*)-ES, the *μ* parents of the current generation and the *λ* offspring individuals,
- in case of a (*μ*, *κ*, *λ*)-ES, the *μ* parents who have not exceeded an age of *κ* and the *λ* offspring individuals.
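The three selection schemes can be sketched as follows. The representation of individuals as (fitness, age) pairs and the function name `select` are illustrative simplifications, not the chapter's notation; lower fitness is better.

```python
def select(parents, offspring, mu, scheme="comma", kappa=None):
    """Environmental selection for the comma, plus, and age-limited schemes."""
    if scheme == "comma":                 # (mu, lambda): offspring only
        pool = list(offspring)
    elif scheme == "plus":                # (mu + lambda): parents and offspring
        pool = list(parents) + list(offspring)
    elif scheme == "kappa":               # (mu, kappa, lambda): age-limited plus
        pool = [p for p in parents if p[1] < kappa] + list(offspring)
    else:
        raise ValueError("unknown scheme: " + scheme)
    survivors = sorted(pool, key=lambda ind: ind[0])[:mu]
    return [(fit, age + 1) for fit, age in survivors]   # survivors age by one
```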

It is somewhat of a “*chicken-and-egg*” dilemma whether it makes more sense to start the evolution strategy with the generation of a parent population (with *μ* individuals), as suggested in Algorithm 1, or to do so at some other stage of the evolution, i.e., starting with an offspring population. The chosen representation has the advantage that the process that generates the subsequent parent populations *P*_{0}, *P*_{1}, *P*_{2}, … can be viewed as a memoryless stochastic process or, more precisely, a Markov process: given *P*_{ t } for some *t* ≥ 1, the information of *P*_{ t−1} is irrelevant in order to determine *P*_{ t+1}; alternatively, in the terminology of stochastic processes, the state of *P*_{ t+1} is conditionally independent of the state of *P*_{ t−1} given *P*_{ t }. This so-called Markov property makes it easier to analyze the behavior of the evolution strategy. In addition, *P*_{ t } can be viewed as a checkpoint of the algorithm, and if the process stops, e.g., because of a computer crash, the process may resume by starting the loop with the last saved state of *P*_{ t }.

The main loop of the evolution strategy is inspired by the principles of evolutionary adaptation in nature that were discovered in parallel by the naturalists Alfred Russel Wallace (1823–1913) and Charles Darwin (1809–1882). In brief, a population of individuals adapts to its environment by (random) variation and selection. The reason for variability in the population was unknown to these researchers. Only much later, in the so-called modern synthesis, was it linked to the mutation and recombination of genes. The ES presented in Algorithm 1 does, however, by far not provide a complete model of evolution in nature. In fact, important driving forces of natural evolutionary processes such as the development of temporally stable species and coevolution cannot be modeled with this basic evolution strategy. On the other hand, by mimicking only the variation and selection process, one can already achieve a potent and robust optimization heuristic, and the theoretical analysis of evolution strategies can provide new insights into the dynamics of natural evolution.

There are many options to instantiate the operators of an evolution strategy, and in the literature, a certain terminology is used to refer to standard choices. Next, the most common instantiations of operators will be discussed, following the structure of Algorithm 1. The first step is the *initialization* of *P*_{0}, where the starting population is set by the user, since this allows resuming the evolution from a checkpoint. However, it is also very common to view the initialization as an integral part of the evolution strategy. Initialization procedures vary; common choices are either constant initialization, i.e., generating *μ* copies of a starting (seed) point, or random initialization, i.e., initializing the decision variables randomly within their bounds. The initialization of strategy parameters can have a significant impact on the transient behavior of an evolution strategy. In the case of step-size parameters, it is often recommended to set these to 5% of the search space size.
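A minimal initialization sketch following random initialization and the 5% rule of thumb for step sizes; the function name and signature are assumptions for illustration.

```python
import random

def init_population(mu, bounds, sigma_fraction=0.05):
    """Random initialization within interval bounds.

    The step size of each coordinate is set to a fraction of the interval
    width (here the 5% rule of thumb). Individuals are (x, sigma) pairs.
    """
    population = []
    for _ in range(mu):
        x = [random.uniform(lo, hi) for lo, hi in bounds]
        sigma = [sigma_fraction * (hi - lo) for lo, hi in bounds]
        population.append((x, sigma))
    return population
```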

A more complex operator is the recombination operator. In the nomenclature, the number of individuals that participate in the creation of a single vector is called *ρ*. The notation (*μ*/*ρ*, *λ*)- ES and, respectively, (*μ*/*ρ* + *λ*)- ES makes this number explicit. The *ρ* individuals that participate in the recombination are drawn by independent uniformly random choices from the population.

Given the *ρ* individuals, there are two common strategies to create an offspring vector – *intermediate* recombination and *dominant* recombination. The vector to be determined is commonly the vector of decision variables \(\vec {x}\), but it can also include the vector of strategy parameters:

- *Intermediate recombination* determines the offspring by averaging the components of the parent individuals. It can be applied to the object parameters and to the strategy parameters. Given a *ρ*-tuple of parent vectors \((\vec {q}^{(1)},\dots ,\vec {q}^{(\rho )}) \in (\mathbb {R}^d)^\rho \), it computes the resulting vector \(\vec {r}\) by means of \(r_j = \frac {1}{\rho }(\sum _{i=1}^{\rho } q_j^{(i)})\) for *j* = 1, …, *d*.
- *Discrete (or dominant) recombination* sets the *j*-th position of the offspring vector \(\vec {r}\) randomly to one of the values of the parents. By drawing *d* uniform random numbers *u*_{ j }, *j* = 1, …, *d* from the set {1, …, *ρ*}, the offspring individual is set to \(r_j = q_j^{(u_j)}\).

The terminology, *intermediate* and *dominant*, is borrowed from the theory of inheritance of biological traits by the botanist Gregor Mendel (1822–1884).
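The two recombination variants can be sketched as follows; the function names are illustrative.

```python
import random

def intermediate_recombination(parents):
    """Component-wise average of the rho parent vectors."""
    rho, d = len(parents), len(parents[0])
    return [sum(p[j] for p in parents) / rho for j in range(d)]

def dominant_recombination(parents):
    """Each component is copied from a uniformly drawn parent (discrete)."""
    d = len(parents[0])
    return [random.choice(parents)[j] for j in range(d)]
```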

The *mutation* operator is seen as a main driving force of the evolutionary progress in ESs. Mutation adds a small random perturbation to each component of \(\vec {x}\). The scaling of this perturbation is based on the strategy variables. A common case is to use individual so-called step-size parameters \(\sigma _1 \in \mathbb {R}^+\), …, \(\sigma _d \in \mathbb {R}^+\). The mutation then performs as indicated in the following equation:

$$\displaystyle \begin{aligned} x^\prime_i = x_i + \sigma_i \cdot \mathcal{N}_i(0,1), \quad i = 1, \dots, d. \end{aligned}$$

Here, \(\mathcal{N}_i(0,1)\) denotes an independent standard normally distributed random number, so the standard deviation of the perturbation of the *i*-th component is *σ*_{ i }, which is why the *σ*-variables are also termed standard deviations of the mutation.

A key feature of evolution strategies is the *self-adaptation* of the mutation’s parameters. In case of *d* step sizes, the mutative self-adaptation lets the step sizes themselves undergo an evolutionary process. In continuous optimization, the parameters of the multivariate Gaussian distribution can be adapted. Three levels of adaptation can be devised: Firstly, it is possible to control only a single standard deviation that is used for all decision variables (possibly with a constant scaling factor); this is called isotropic self-adaptation. Secondly, in the so-called individual step-size adaptation, a different standard deviation is maintained and adapted for each decision variable. Finally, it is also possible to learn the full covariance matrix of the multivariate Gaussian distribution that is used in the mutation. The different levels of mutation distribution adaptation are indicated in Fig. 1. As a rule of thumb, the more mutation parameters are to be adapted, the longer it takes to reach an optimal convergence behavior for a given model.

Step-size control based on the success rate: It has been shown that for the (1 + 1)-ES on two important benchmark functions – the sum of squares (sphere) and the corridor model – among all isotropic Gaussian distributions, the optimal mutation strength is obtained at a step size that yields approximately a success probability of 1/5. Because the success probability can be assessed during execution, this allows for an effective step-size control of the (1 + 1)-ES.

Mutative step-size control: The idea is to make the parameters of the mutation distribution part of the individual and let them undergo an evolutionary process themselves. Details of this strategy will be elaborated in this section.

Derandomized step-size control: Here a more efficient adaptation of the mutation distribution is derived based on cumulative information from previous successful mutation steps. Derandomized self-adaptation uses arithmetic procedures that can no longer be considered biomimetic. At the price of losing flexibility and simplicity, they gain efficiency, in particular for unconstrained continuous optimization, and allow practicable schemes for adapting a full covariance matrix of a mutation distribution. The history and details of derandomized evolution strategies will be elaborated on in the next section.

In the mutative step-size control, the step sizes are varied by multiplication with log-normally distributed random factors,

$$\displaystyle \begin{aligned} \sigma^\prime_i = \sigma_i \cdot \exp\big(\tau_{\mathrm{global}} \cdot \mathcal{N}(0,1) + \tau_{\mathrm{local}} \cdot \mathcal{N}_i(0,1)\big), \end{aligned} \qquad (5)$$

$$\displaystyle \begin{aligned} x^\prime_i = x_i + \sigma^\prime_i \cdot \mathcal{N}_i(0,1), \end{aligned} \qquad (6)$$

where *τ*_{local} and *τ*_{global} are called local and global learning rates. The simplest form of the mutative step-size control exploits only a single step size *σ* for all the coordinates. In this case, Eqs. (5) and (6) simplify to

$$\displaystyle \begin{aligned} \sigma^\prime = \sigma \cdot \exp\big(\tau_{\mathrm{global}} \cdot \mathcal{N}(0,1)\big), \qquad x^\prime_i = x_i + \sigma^\prime \cdot \mathcal{N}_i(0,1). \end{aligned}$$
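One mutation step under mutative self-adaptation (the scheme of Eqs. 5 and 6) can be sketched as follows. The default learning rates 1/√(2*d*) and 1/√(2√*d*) follow a common recommendation and are an assumption here, as is the function name.

```python
import math
import random

def self_adaptive_mutation(x, sigmas, tau_global=None, tau_local=None):
    """Log-normal variation of the step sizes (Eq. 5), then Gaussian
    mutation of the decision variables with the new step sizes (Eq. 6)."""
    d = len(x)
    if tau_global is None:
        tau_global = 1.0 / math.sqrt(2.0 * d)
    if tau_local is None:
        tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(d))
    g = random.gauss(0.0, 1.0)   # one global draw shared by all coordinates
    new_sigmas = [s * math.exp(tau_global * g + tau_local * random.gauss(0.0, 1.0))
                  for s in sigmas]
    new_x = [xi + s * random.gauss(0.0, 1.0) for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas
```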

Commonly used parameter settings for a (*μ*, *λ*)-ES are:

- Small population size: *μ* = 1, *λ* = 7, *τ*_{global} = 1.2, single step size, *ρ* = 1.
- Large population size: *μ* = 15, *λ* = 100, *τ*_{local} = 1.1, *τ*_{global} = 1.2, *ρ* = *μ*, and intermediate recombination of decision variables and step-size parameters.
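Putting the pieces together, the "large population" configuration can be sketched as a complete (*μ*/*μ*_I, *λ*)-ES loop. As an illustrative choice, log-normal learning rates following the common 1/√(2*d*), 1/√(2√*d*) heuristic are used here instead of the constants listed above; the function and variable names are assumptions.

```python
import math
import random

def mu_comma_lambda_es(f, d, bounds, mu=15, lam=100, gens=60):
    """(mu/mu_I, lambda)-ES sketch: intermediate recombination (rho = mu)
    of both decision variables and step sizes, log-normal step-size
    mutation, and comma selection."""
    tau_g = 1.0 / math.sqrt(2.0 * d)
    tau_l = 1.0 / math.sqrt(2.0 * math.sqrt(d))
    lo, hi = bounds
    pop = []
    for _ in range(mu):
        x = [random.uniform(lo, hi) for _ in range(d)]
        s = [0.05 * (hi - lo)] * d            # 5% rule of thumb
        pop.append((f(x), x, s))
    for _ in range(gens):
        # intermediate recombination over the whole parent population
        xr = [sum(p[1][j] for p in pop) / mu for j in range(d)]
        sr = [sum(p[2][j] for p in pop) / mu for j in range(d)]
        offspring = []
        for _ in range(lam):
            g = random.gauss(0.0, 1.0)
            s = [si * math.exp(tau_g * g + tau_l * random.gauss(0.0, 1.0))
                 for si in sr]
            x = [xi + si * random.gauss(0.0, 1.0) for xi, si in zip(xr, s)]
            offspring.append((f(x), x, s))
        offspring.sort(key=lambda t: t[0])
        pop = offspring[:mu]                  # comma selection
    return pop[0]
```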

## Derandomized Evolution Strategies

- **Indirect selection**. By definition, the goal of the mutation operator is to apply a stochastic variation to the decision variables, which will increase the individual’s selection probability. The selection of the *strategy parameters* setting is indirect, i.e., it is not the successful mutation vector itself that is utilized to adapt the step-size parameters, but rather the parameters of the distribution that led to this mutation vector.
- **Realization of parameter variation**. Due to the sampling from a random distribution, the *realization* of the parameter variation does not necessarily reflect the nature of the strategy parameters. Thus, the de facto difference between good and bad settings of strategy parameters is only reflected in the difference between their probabilities to be selected – which can be rather small. Essentially, this means that the selection process of the strategy parameters is *strongly disturbed*.

The *strategy parameter change rate* is defined as the difference between strategy parameters of two successive generations. Hansen and Ostermeier [29] argue that the change rate is an important factor, as it gives an indication concerning the adaptation speed, and thus it has a direct influence on the performance of the algorithm. The principal claim is that *this change rate basically vanishes* in the standard ES. The *change rate* depends on the *mutation strength* to which the strategy parameters are subject. While one aims at attaining the maximal change rate, the latter is limited by an upper bound, due to the finite selection information that can be transferred between generations. Change rates that exceed the upper bound would lead to a stochastic behavior. Moreover, the mutation strength that obtains the optimal change rate is typically smaller than the one that obtains good diversity among the mutants – a desired outcome of the mutation operator, often referred to as *selection difference*. Thus, the conflict between the objective of *optimal change rate* versus the objective of *optimal selection difference* cannot be resolved at the mutation strength level [47]. A possible solution to this conflict would be to *detach* the change rate from the mutation strength.

The concept of derandomized evolution strategies has been originally introduced by scholars at the Technical University of Berlin in the beginning of the 1990s. It was followed by the release of a new generation of successful ES variants by Hansen, Ostermeier, and Gawelczyk [28, 30, 46, 48].

The first versions of *derandomized ES algorithms* introduced a controlled global step size in order to monitor the individual step sizes by decreasing the stochastic effects of the probabilistic sampling. The selection disturbance was completely removed in later versions by omitting the adaptation of strategy parameters by means of probabilistic sampling. This was combined with individual information from the last generation (the successful mutations, i.e., of selected offspring) and then extended to *correlated mutations*. Later on, the concept of *adaptation by accumulated information* was introduced, aiming to make wise use of past information for the purpose of step-size adaptation. Rather than using the last generation’s information alone, it was successfully generalized to a weighted average of the previous generations.

Note that the different derandomized ES variants strictly follow a \(\left (1,\lambda \right )\) strategy, postponing the treatment of recombination or plus-strategies for later stages. Moreover, the different variants hold different numbers of strategy parameters to be adapted, and this is an important factor in the complexity of the optimization routine and in its learning rate. The different algorithms hold a number of strategy parameters scaling either linearly (\(\mathcal {O}(d)\) parameters responsible for individual step-sizes) or quadratically (\(\mathcal {O}(d^2)\) parameters responsible for arbitrary normal mutations) with the dimensionality *d* of the search space.

### First Level of Derandomization

The so-called first level of derandomization targeted the following desired effects: (i) a degree of freedom with respect to the mutation strength of the strategy parameters, (ii) scalability of the *ratio* between the change rate and the mutation strength, and (iii) independence of population size with respect to the adaptation mechanism. The realization of the first level of derandomization can be reviewed through *three* particular derandomized ES variants:

#### DR1

The mutation of the *k*th individual, *k* = 1, …, *λ*:

The adaptation of the strategy parameters is then based on the selected individual (subscripts *sel* refer to the selected individual):

#### DR2

The second derandomized ES variant [48] aimed to accumulate information about the correlation or anticorrelation of past mutation vectors in order to adapt the *global step size* as well as the *individual step sizes* – by introducing a *quasi-memory* vector. This accumulated information allowed omitting the stochastic element in the adaptation of the strategy parameters – updating them only by means of successful variations, rather than with random steps.

The mutation of the *k*th individual, *k* = 1, …, *λ*, reads

#### DR3

This third variant [30], usually referred to as the *Generation Set Adaptation* (GSA), considered the derandomization of arbitrary normal mutations for the first time, aiming to achieve invariance with respect to the scaling of variables and the rotation of the coordinate system. This naturally came at the cost of a quasi-memory matrix, \(\mathbf {B}\in \mathbb {R}^{r\times d}\), setting the dimension of the strategy parameter space to *d* ^{2} ≤ *r* ≤ 2*d* ^{2}. The adaptation of the global step size is *mutative* with stochastic variations, just as in **DR1**.

The mutation of the *k*th individual, *k* = 1, …, *λ*:

The update of the *memory matrix* is formulated as

### Second Level of Derandomization: CMA-ES

Following a series of successful derandomized ES variants addressing the first level of derandomization, and a continuous effort at the Technical University of Berlin, the so-called covariance matrix adaptation (CMA) evolution strategy was released in 1996 [28], as a completely derandomized evolution strategy – the *fourth* generation of derandomized ES variants. The so-called second level of derandomization targeted the following effects: (i) The probability to regenerate the same mutation step is increased, (ii) the *change rate* of the strategy parameters is subject to explicit control, and (iii) strategy parameters are stationary when subject to random selection. The second level of derandomization was implemented by means of the CMA-ES.

The CMA-ES combines the robust mechanism of ES with powerful *statistical learning* principles, and thus it is sometimes subject to informal criticism for not being a genuine biomimetic evolution strategy. In short, it aims at satisfying the *maximum likelihood principle* by applying *principal component analysis* (PCA) [35] to the successful mutations, and it uses *cumulative global step-size adaptation*.

The CMA-ES samples candidate solutions from a multivariate normal distribution, where *σ* denotes the *global step size*, and the covariance matrix **C** determines the shape of the distribution ellipsoid:

$$\displaystyle \begin{aligned} \vec{x}_k = \vec{m} + \sigma \cdot \mathcal{N}_k\big(\vec{0},\mathbf{C}\big), \quad k = 1, \dots, \lambda. \end{aligned}$$

Two adaptation mechanisms are distinguished – the adaptation of the covariance matrix **C** versus the adaptation of the global step size *σ*:

- The mean \(\vec {m}\) and the covariance matrix **C** of the normal distribution are updated according to the *maximum likelihood principle*, such that good mutations are likely to appear again. \(\vec {m}\) is updated such that

  $$\displaystyle \begin{aligned} \mathcal{P}\Big(\vec{x}_{\mathrm{sel}}|\mathcal{N}\Big(\vec{m},\sigma^2\mathbf{C}\Big)\Big) \longrightarrow \max \end{aligned}$$

  and **C** is updated such that

  $$\displaystyle \begin{aligned} \mathcal{P}\Big(\frac{\vec{x}_{\mathrm{sel}}-\vec{m}_{\mathrm{old}}}{\sigma}\Big|\mathcal{N}\Big(\vec{0},\mathbf{C}\Big)\Big) \longrightarrow \max, \end{aligned}$$

  considering the prior **C**. This is implemented through the so-called covariance matrix adaptation (CMA) mechanism.
- *σ* is updated such that consecutive steps of \(\vec {m}\) are conjugate perpendicular. This is implemented through the so-called cumulative step-size adaptation (CSA) mechanism.
### Evolution Path

The rank-one update of the covariance matrix employs a *d* × *d* *matrix analogue* to the **DR2** mechanism (see Eq. 13), with the *outer product* of the selected mutation vector \(\vec {z}_{\mathrm {sel}}\):

Rather than relying on the last generation alone, the *evolution path* accumulates the selected steps over the generations by means of an *exponentially weighted moving average*,

### The Path Length Control

If the *evolution path* is longer than expected, the steps are likely parallel, and thus the step size should be increased; alternatively, if it is shorter than expected, the steps are probably antiparallel, and the step size should be decreased accordingly. The reference magnitude is defined as the expected length of a normally distributed random vector. This evaluation is explicitly carried out by the *conjugate* evolution path:

Computing this path requires the *eigen-decomposition* of **C** in order to align all directions within the *rotated frame*. Then, the update of the step size depends on the comparison between \(\|\vec {p}_{\sigma }\|\) and the expected length of a normally distributed random vector, \(E\left [\|\mathcal {N}\left (0,\mathbf {I}\right )\|\right ]\):
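A sketch of one path length control (CSA) step, assuming the accumulated mean step `z_mean` has already been transformed into the rotated frame (i.e., whitened by **C**^{−1/2}). The constants `c_sigma` and `d_sigma` are illustrative defaults, and the *μ*_eff factor of the standard scheme is omitted for brevity.

```python
import math

def csa_update(sigma, p_sigma, z_mean, d, c_sigma=0.3, d_sigma=1.0):
    """One cumulative step-size adaptation step in sketch form."""
    p_sigma = [(1.0 - c_sigma) * p + math.sqrt(c_sigma * (2.0 - c_sigma)) * z
               for p, z in zip(p_sigma, z_mean)]
    # E||N(0,I)|| is approximated by sqrt(d) * (1 - 1/(4d) + 1/(21 d^2))
    chi_d = math.sqrt(d) * (1.0 - 1.0 / (4.0 * d) + 1.0 / (21.0 * d * d))
    norm = math.sqrt(sum(p * p for p in p_sigma))
    sigma *= math.exp((c_sigma / d_sigma) * (norm / chi_d - 1.0))
    return sigma, p_sigma
```

Consistently parallel steps make the path longer than the expectation and thus increase *σ*; vanishing or cancelling steps shrink it.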

### The (*μ* _{ W }, *λ*) Rank-*μ* CMA

The rank-*μ* covariance matrix adaptation [26] is an extension of the original update rule for larger population sizes. The idea is to use *μ* > 1 vectors in order to update the covariance matrix **C** in each generation, based on *weighted intermediate recombination*. Let \(\vec {x}_{i:\lambda }\) denote the *i*th ranked solution point, such that \(f(\vec{x}_{1:\lambda}) \leq \dots \leq f(\vec{x}_{\lambda:\lambda})\). The weighted *mean* is now defined as follows:

$$\displaystyle \begin{aligned} \vec{m} = \sum_{i=1}^{\mu} w_i \, \vec{x}_{i:\lambda}, \qquad w_1 \geq w_2 \geq \dots \geq w_{\mu} > 0, \quad \sum_{i=1}^{\mu} w_i = 1. \end{aligned}$$

The covariance matrix is then updated by the rank-*μ* update, combined with the rank-one update:
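In sketch form, the weighted mean and the combined rank-one/rank-*μ* covariance update look as follows (pure Python; names and constants are illustrative, and the log-rank weights are a common default rather than a quotation from the chapter).

```python
import math

def weighted_mean(ranked_x, mu):
    """Weighted intermediate recombination of the mu best ranked points,
    using the common log-rank weights w_i ~ ln(mu + 1/2) - ln(i)."""
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    total = sum(w)
    w = [wi / total for wi in w]                  # normalize: sum(w) == 1
    d = len(ranked_x[0])
    return [sum(w[i] * ranked_x[i][j] for i in range(mu)) for j in range(d)]

def rank_mu_update(C, y_ranked, weights, c_mu, c_1=0.0, p_c=None):
    """Rank-mu covariance update, optionally combined with a rank-one term
    from the evolution path p_c. y_i are the selected, sigma-normalized
    steps (x_{i:lambda} - m_old) / sigma; C is a nested-list matrix."""
    d = len(C)
    new_C = [[(1.0 - c_1 - c_mu) * C[a][b] for b in range(d)] for a in range(d)]
    if c_1 > 0.0 and p_c is not None:             # rank-one term
        for a in range(d):
            for b in range(d):
                new_C[a][b] += c_1 * p_c[a] * p_c[b]
    for w, y in zip(weights, y_ranked):           # rank-mu term
        for a in range(d):
            for b in range(d):
                new_C[a][b] += c_mu * w * y[a] * y[b]
    return new_C
```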

The (*μ*_{ W }, *λ*)-CMA-ES heuristic is summarized in Algorithm 1, with initCMA() referring to the parametric initialization procedure. It should be noted that a CMA variant, which resembles the DR2 and targets a vector of *d* individual step sizes (i.e., by means of a *diagonalized* covariance matrix), was released under the name sep-CMA-ES [51]. Furthermore, the CMA-ES heuristic was simplified in the form of the so-called CMSA strategy [15] and further improved for certain cases of global optimization [1].

**Algorithm 1 (μW, λ)-CMA-ES**
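As a concrete companion to the algorithm outline, the following is a compact pure-Python sketch in the spirit of the diagonal (sep-CMA-ES) variant mentioned above. All constants follow common CMA-ES default recommendations and are assumptions, as are the function and variable names; this is not the chapter's pseudocode.

```python
import math
import random

def sep_cma_es(f, x0, sigma=1.0, lam=None, gens=200):
    """Diagonal-covariance CMA-ES sketch: weighted recombination of the mu
    best samples, cumulative step-size adaptation, and a rank-one-style
    update of the per-coordinate variances only."""
    d = len(x0)
    lam = lam or 4 + int(3 * math.log(d))
    mu = lam // 2
    w = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    sw = sum(w)
    w = [wi / sw for wi in w]
    mu_eff = 1.0 / sum(wi * wi for wi in w)
    c_sigma = (mu_eff + 2.0) / (d + mu_eff + 5.0)
    d_sigma = 1.0 + c_sigma
    c_c = 4.0 / (d + 4.0)
    c_1 = 2.0 / ((d + 1.3) ** 2 + mu_eff)
    chi_d = math.sqrt(d) * (1.0 - 1.0 / (4.0 * d) + 1.0 / (21.0 * d * d))
    m = list(x0)
    C = [1.0] * d              # diagonal of the covariance matrix
    p_sigma = [0.0] * d        # conjugate evolution path (for sigma)
    p_c = [0.0] * d            # evolution path (for C)
    best = (f(x0), list(x0))
    for _ in range(gens):
        samples = []
        for _ in range(lam):
            z = [random.gauss(0.0, 1.0) for _ in range(d)]
            x = [m[j] + sigma * math.sqrt(C[j]) * z[j] for j in range(d)]
            samples.append((f(x), x, z))
        samples.sort(key=lambda t: t[0])
        if samples[0][0] < best[0]:
            best = (samples[0][0], samples[0][1])
        z_mean = [sum(w[i] * samples[i][2][j] for i in range(mu)) for j in range(d)]
        m = [m[j] + sigma * math.sqrt(C[j]) * z_mean[j] for j in range(d)]
        # cumulative step-size adaptation (CSA)
        p_sigma = [(1.0 - c_sigma) * p_sigma[j]
                   + math.sqrt(c_sigma * (2.0 - c_sigma) * mu_eff) * z_mean[j]
                   for j in range(d)]
        ps_norm = math.sqrt(sum(p * p for p in p_sigma))
        sigma *= math.exp((c_sigma / d_sigma) * (ps_norm / chi_d - 1.0))
        # rank-one-style update of the diagonal covariance
        p_c = [(1.0 - c_c) * p_c[j]
               + math.sqrt(c_c * (2.0 - c_c) * mu_eff) * math.sqrt(C[j]) * z_mean[j]
               for j in range(d)]
        C = [(1.0 - c_1) * C[j] + c_1 * p_c[j] * p_c[j] for j in range(d)]
    return best
```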

## Theoretical Results

The theory of evolution strategies is traditionally focused on questions of convergence and convergence dynamics.

Under relatively mild conditions on the mutation and recombination operators used, global convergence in probability for *t* →*∞* can be proven [54]. Basically, for continuous objective functions, it is sufficient to ascertain that for every *𝜖*-ball around the global minimizer, the probability that the mutation operator samples a point in this region is positive, regardless of the starting point. This is given, for instance, by bounding the standard deviations of the mutation from below by a small positive value. This result can easily be generalized to ESs for discrete [53] or even mixed-integer optimization problems [43].

Of more practical relevance are results on the convergence dynamics, that is, on the speed of convergence to the optimum and on local progress rates. Different approaches for analysis have been used, based on dynamical systems theory [13], stochastic process theory [54], and techniques from the asymptotic analysis of randomized algorithms [33]. It is a well-established result that most self-adaptive ES variants, if parameterized correctly, achieve a linear convergence rate on convex quadratic problems, where the condition number of the matrix and the type of step-size adaptation determine the linear factor [14, 45]. The same can be found for problems with fitness-proportional noise [2]. On the other hand, on sharp ridges and plateaus and at the boundary of constraints, classical step-size adaptation tends to fail [38]. Also, for problems with additive noise, the accuracy of the found results will be limited by the standard deviation of the noise [39]. The theory of ESs also revealed some insights regarding the manner in which different parameters are correlated with each other and devised guidelines concerning optimal parameter settings. Most prominently, the 1/5-th success rule for step-size control in the (1 + 1)-ES with isotropic mutation was developed based on theoretical studies on the sphere and the corridor model [50].

Results on the effect of genetic drift and recombination in population-based evolution strategies are also available: Beyer highlighted the so-called genetic repair effect that recombination has in ESs. When using recombination, convergent behavior and optimal convergence rates can be achieved with higher mutation step sizes. This increases the robustness in global optimization settings, as it is more likely to escape from local optima. The genetic repair effect is stronger when intermediate recombination is used, as compared to discrete recombination. Dynamical systems analysis and Markov chain analysis on simple search landscapes revealed that it is hardly possible to simultaneously explore different local optima by means of a single population [11]. Even if the recombination operator is disabled, when different attractor basins share exactly the same geometry, the population tends to quickly concentrate on a single attractor only [55]. These findings gave an incentive to the development of niching [60] and restart methods [4], which counteract this effect and perform better on multimodal landscapes.

## Nonstandard Evolution Strategies

The broad success of the family of evolution strategies provided the motivation to devise extended heuristics for treating problem instances that are beyond the canonical unconstrained single-objective, unimodal optimization formulation. Indeed, ES extensions to mixed-integer search spaces [8, 42], uncertainty handling [27], multimodal domains [60], and multi-objective Pareto optimization [31] were introduced in recent years. The goal of the current section is to provide an overview of those extensions.

### ES for Nonstandard Search Spaces

Mutation and recombination operators for nonstandard search spaces are commonly designed according to the following principles:

- 1.
Causal representation: Solutions should be represented in some metric search space such that relatively small changes with respect to the distance in the search space result, on average, in only relatively small changes of the objective function value.

- 2.
Unimodal mutation distributions: Small mutations should occur more frequently than large mutations.

- 3.
Scalability of mutation: In order to implement self-adaptive mutation operators, it is essential that mutations can be scaled in terms of the average distance between parents and offspring.

- 4.
Accessibility of points by mutation: By applying one or a chain of many mutations, it should be possible to reach every point in the search space, regardless of the starting point.

- 5.
Unbiasedness of mutation: Mutations should not introduce a bias in the search. They should be symmetric and probability distributions with maximal entropy should be preferred.

- 6.
Similarity to parents in recombination: The distance of an offspring to its parents should not exceed the distance of the parents to each other. Moreover, on average, the distance to all parents should be the same.

Following these design principles, one can expect generalized ESs to possess properties similar to those of standard ESs for continuous search spaces. A warning is in order, however, regarding the functioning of step-size adaptation. Theoretical derivations of optimal adaptation schemes for ESs often exploit the fact that differentiable problems locally resemble quadratic or linear functions. This property is lost in discrete optimization, and the generalization of such results therefore requires some caution. On the other hand, mutative self-adaptation of step sizes has been found to work in nonstandard search spaces as well, for instance for the adaptation of mutation probabilities for binary vectors or of the parameters of geometric distributions in mixed-integer evolution strategies.
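
As an illustration of the last point, the following is a minimal sketch of mutative self-adaptation for integer variables, loosely in the spirit of the mixed-integer ES literature [43, 53]; the step-size-to-probability mapping and all parameter names are assumptions made for illustration, not the exact scheme of those works:

```python
import math
import random

def mutate_integer(z, s, tau=0.3, rng=random.Random(1)):
    """Mutative self-adaptation for an integer vector: the mean step
    size s is itself mutated log-normally, and each component is then
    perturbed by the difference of two geometrically distributed
    variables, which is symmetric and unbiased (design principle 5)."""
    s_new = max(1e-3, s * math.exp(tau * rng.gauss(0.0, 1.0)))
    # map the step size to the success probability of the geometric law
    p = max(1e-12, 1.0 - s_new / (1.0 + math.sqrt(1.0 + s_new * s_new)))
    def geom():
        # draw G ~ Geometric(p) on {0, 1, 2, ...} by inversion
        u = rng.random()
        return int(math.log(1.0 - u) / math.log(1.0 - p))
    z_new = [zi + geom() - geom() for zi in z]
    return z_new, s_new

z2, s2 = mutate_integer([0, 0, 0, 0], 1.0)
```

The difference of two i.i.d. geometric variables has mean zero and a scalable spread, so the operator stays on the integer lattice while remaining self-adaptable.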

### Niching and Multi-population ES

Given multimodal search landscapes with multiple basins of attraction that are of interest, the simultaneous identification of several optima constitutes a challenge at both the theoretical and the practical level [44, 49, 60]. Within the domain of evolutionary computation, this challenge is typically treated by extending a given search heuristic to subpopulations of trial solutions that evolve in parallel toward different solutions of the problem. This idea stems from the evolutionary concept of *organic speciation*, and so-called *niching* techniques are the extension of EAs to such speciation into multiple subpopulations. The computational challenge in niching may be formulated as achieving an effective interplay between partitioning the search space into niches occupied by stable subpopulations, by means of population diversity preservation, and exploiting the search within each niche by means of a highly efficient optimizer with local search capabilities [60]. A niching framework utilizing derandomized ES was introduced in [60], proposing the CMA-ES as a niching optimizer for the first time. The underpinning of that framework was the selection of a peak individual per subpopulation in each generation, followed by sampling around it according to DES principles to produce the next dispersion of search points. The *biological analogy* of this machinery is an *alpha male winning all the imposed competitions and thereafter dominating its ecological niche*, thereby obtaining all the sexual resources therein to generate its offspring.

A common utility for defining the landscape subdomain of each subpopulation is a so-called niche radius. A radius-based framework for niching, which employs derandomized ES heuristics, has been formulated and investigated [61] and has shown broad success in tackling both synthetic and real-world multimodal optimization problems. In practice, this framework maintains multiple derandomized ES populations, which conduct heuristic search in their radius-defined subdomains and independently update their mutation distributions and step sizes. Since the partitioning is enforced in each generation according to the niche radius parameter, and since no a priori knowledge is available on the global structure of the search landscape and the spatial distribution of its basins of attraction, an adaptive niche radius approach was devised to remedy this so-called niche radius presumption [62]. The main idea of ES niching with self-adaptive niche shapes is to exploit learned landscape information, as reflected by the evolving mutation distribution, to define the niches more accurately. In particular, a Mahalanobis CMA-ES niching heuristic was formulated, which carries out the distance calculations among individuals using the Mahalanobis distance metric, utilizing the evolving covariance matrices of the CMA mechanism. This heuristic achieved successful niching on landscapes with unevenly shaped optima, on which the fixed-radius approaches performed poorly [62].
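
The distance computation underlying the Mahalanobis niching approach can be sketched as follows; plain nested lists stand in for the evolving CMA covariance machinery:

```python
import math

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance under a given inverse covariance matrix, as
    used by the Mahalanobis CMA-ES niching heuristic [62] to shape
    niches according to the evolving mutation distribution."""
    d = [xi - yi for xi, yi in zip(x, y)]
    # quadratic form d^T C^{-1} d
    n = len(d)
    q = sum(d[i] * cov_inv[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)

# With the identity matrix, the measure reduces to the Euclidean distance:
ident = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis([3.0, 0.0], [0.0, 4.0], ident))  # 5.0
```

With an anisotropic covariance, points along a flat direction of the landscape count as closer, so a niche stretches along that direction instead of being a fixed-radius ball.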

On a related note, a multi-restart with increasing population size approach was developed with the CMA algorithm, namely, IPOP-CMA-ES [4]. This heuristic aims at attaining the global optimum, while possibly visiting local optima along the process and restarting the algorithm with a larger population size and a modified initial step size. It is thus not defined as a niching technique.

### Noise Handling and Robust Optimization with ES

Robust optimization is concerned with identifying solutions that perform well even when the input parameters and/or the objective function values are systematically perturbed [10]. Most ES variants are inherently suited to noisy environments, and it was shown that a larger population size and the use of recombination are beneficial in noisy settings [2, 12]. Various techniques have been shown to further improve performance on noisy objective functions. One is the so-called thresholding operator, which accepts search points as improvements only when they improve the objective function value by more than a certain threshold [9]. Moreover, it has been suggested to evaluate individuals by the sample mean of multiple evaluations of the same individual (effective fitness). This increases the computation time, and more efficient sampling schemes have therefore been developed subsequently. Furthermore, a rank stability scheme was suggested to treat noise, exploiting the fact that ESs require a correct ranking rather than correct objective function values [27].
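
The two noise-handling devices mentioned above admit a minimal sketch; the threshold value and the number of re-evaluations are illustrative choices, not recommendations from the cited works:

```python
import random
from statistics import mean

def effective_fitness(f_noisy, x, k=100):
    """Sample mean of k re-evaluations of the same individual: trades
    extra evaluations for a noise level reduced by a factor sqrt(k)
    (the simple averaging scheme, not one of the refined samplers)."""
    return mean(f_noisy(x) for _ in range(k))

def accept_with_threshold(f_parent, f_offspring, tau=0.5):
    """Thresholding operator [9]: count an offspring as an improvement
    only if it improves on the parent by more than tau (minimization)."""
    return f_offspring < f_parent - tau

rng = random.Random(7)
noisy_sphere = lambda v: sum(c * c for c in v) + rng.gauss(0.0, 1.0)
```

For additive Gaussian noise with standard deviation 1, averaging 100 samples reduces the noise on the effective fitness to about 0.1, at a hundredfold evaluation cost.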

Knowledge of second-order Hessian information at the optimum is desirable not only as a measure of system robustness to noise in the decision variables but also as a means for dimensionality reduction and for landscape characterization. Experimental optimization of quantum systems motivated the development of an automated method to efficiently retrieve the Hessian matrix about the global optimum *without derivative evaluations* from experimental measurements [63]. The study designed a heuristic to learn the Hessian matrix based upon the CMA-ES machinery, with the necessary modifications, by exploiting an inherent relation between the covariance matrix and the inverse Hessian matrix. It then corroborated this technique, entitled forced optimal covariance adaptive learning (FOCAL), on noisy simulation-based optimization as well as on laboratory experimental quantum systems. The formal relation between the covariance matrix and the Hessian matrix is generally unknown but has been a subject of active research. A recent study rigorously showed that the accumulation of selected individuals carries the potential to reveal valuable information about the search landscape [64], as already practically utilized by derandomized ES variants. This theoretical study proved that a statistically constructed covariance matrix over selected decision vectors in the proximity of the optimum shares its eigenvectors with the Hessian matrix about the optimum. It also provided an analytic approximation of this covariance matrix for a non-elitist multi-child (1, *λ*) strategy, holding for large population sizes *λ*.
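
The covariance-Hessian relation can be illustrated with a small Monte Carlo experiment; this is a sketch of the underlying statistical effect, not the FOCAL method itself, and the test function and sample sizes are chosen purely for illustration:

```python
import random

def selected_covariance_diag(rng, n_samples=20000, frac=0.25):
    """Around the optimum of f(x) = x1^2 + 9*x2^2 (diagonal Hessian
    diag(2, 18)), the covariance of the selected best points is
    elongated along the flat x1 axis, i.e., roughly proportional to
    the inverse Hessian, as discussed above [64]."""
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
    pts.sort(key=lambda p: p[0] ** 2 + 9 * p[1] ** 2)
    sel = pts[: int(frac * n_samples)]
    m = len(sel)
    # second moments; the selected set is centered near 0 by symmetry
    var1 = sum(p[0] ** 2 for p in sel) / m
    var2 = sum(p[1] ** 2 for p in sel) / m
    return var1, var2

v1, v2 = selected_covariance_diag(random.Random(3))
# the variance along the flat x1 direction exceeds that along the steep x2 direction
```

For a diagonal Hessian the eigenvectors are the coordinate axes, so the predicted alignment reduces to the ordering of the two variances.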

For a comprehensive overview of contemporary noise handling and robust optimization in ESs, the reader is referred to the PhD dissertation of Kruisselbrink, which was also complemented by an empirical study of the most common variants. Also, theoretical limits in the precision of multi-evaluation schemes in the presence of additive noise are derived therein [39].

### Multi-criterion and Constraint-Handling ES

In practical settings, the scenario of unconstrained optimization is not very common. Rather, problems with multiple constraint functions and conflicting objective functions need to be solved.

In optimization with constraints, it is mandatory to use alternative schemes for step-size adaptation, for reasons explained in detail by Kramer and Schwefel [38], who also suggest alternative schemes that can better deal with constraints.

Pareto archived evolution strategy [37]: This classical multi-criterion optimization strategy uses an archive to maintain non-dominated points. The archive is updated based on the non-dominance and density of points.

Predator prey evolution strategy [22, 41]: In this biomimetic strategy, individuals are distributed on a grid, and a population of predator individuals performs a random walk on the grid, triggering local selection. The predators select their prey based on different objective functions or combination strategies.

Multi-objective CMA-ES [31]: This strategy seeks to improve contributions of individuals to the hypervolume indicator, which measures the size of the Pareto dominated space. Consequently, this strategy is well adapted to locate precise and regular representations of Pareto fronts for objective functions with complex shapes and correlated input variables. Another self-adaptation method, using local tournaments on hypervolume contributions, was suggested in [36] but so far received little attention.
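
For intuition, the hypervolume contribution of a point on a two-dimensional Pareto front reduces to the area of the rectangle it exclusively dominates; the following is a minimal sketch assuming minimization and a mutually non-dominated input (function name is illustrative):

```python
def hv_contributions(front, ref):
    """2-D hypervolume contributions for mutually non-dominated
    minimization points, as used for selection in the multi-objective
    CMA-ES [31] and in the scheme of [36]: each point contributes the
    rectangle it exclusively dominates w.r.t. the reference point ref."""
    pts = sorted(front)  # ascending in f1, hence descending in f2
    contrib = []
    for i, (x, y) in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        up = pts[i - 1][1] if i > 0 else ref[1]
        contrib.append((right - x) * (up - y))
    return contrib

print(hv_contributions([(1, 3), (2, 2), (3, 1)], ref=(4, 4)))  # [1, 1, 1]
```

Removing the point with the smallest contribution shrinks the dominated space the least, which is exactly the selection criterion exploited by hypervolume-based strategies.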

### Mirrored Sampling

The mirrored sampling technique is a derandomized mutation method. It was first introduced in [16] for the non-elitist (1, *λ*)-ES and then extended to the (*μ*, *λ*)-ES [3]. The idea of mirrored sampling is to generate part (normally half) of the offspring population in a derandomized way. More specifically, a single mutation vector **z** is used to generate two offspring (rather than one as in the standard ES) – one by adding **z** to the parent **x**: **x** + **z**, and another by subtracting **z** from **x**: **x** − **z**. The two offspring are thus *symmetric*, or *mirrored*, about the parental point. Mirrored sampling accelerates the convergence rate of evolution strategies, as proven theoretically in [16]. When applied in the (*μ*, *λ*)-CMA-ES with cumulative step-size adaptation, however, mirrored sampling reduces the variance of the recombined mutation vectors. Consequently, the step size is reduced more than desired, and premature convergence can occur. To address this, the concept of pairwise selection was introduced [3], in which only the better offspring of a mirrored pair may contribute to the weighted recombination. This ensures that recombination never uses both elements of a mirrored pair at the same time.
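
A sketch of mirrored sampling with pairwise selection, assuming an even offspring number *λ* (function name and interface are illustrative):

```python
import random

def mirrored_offspring(x, sigma, lam, f, rng=random.Random(5)):
    """Mirrored sampling with pairwise selection [3, 16]: each mutation
    vector z yields the mirrored pair x + sigma*z and x - sigma*z, and
    only the better member of each pair may enter recombination."""
    candidates = []
    for _ in range(lam // 2):
        z = [rng.gauss(0.0, 1.0) for _ in x]
        plus = [xi + sigma * zi for xi, zi in zip(x, z)]
        minus = [xi - sigma * zi for xi, zi in zip(x, z)]
        # pairwise selection: keep one offspring per mirrored pair
        candidates.append(min((plus, minus), key=f))
    return sorted(candidates, key=f)

sphere = lambda v: sum(c * c for c in v)
pool = mirrored_offspring([1.0, 1.0], 0.5, 10, sphere)
```

Because the two members of a pair use opposite mutation vectors, discarding the worse one prevents their contributions from canceling in the recombined step, which is the failure mode described above.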

## Benchmarks and Empirical Study

Besides a detailed description of evolution strategies and their theoretical aspects, it is instructive to look at the empirical ability of ESs to solve black-box problems. In benchmarking ESs, two difficulties arise. On the one hand, it is hard to design a set of test functions that captures the problem characteristics encountered in real-world applications. On the other hand, summarizing ES performance over a set of test functions is not straightforward, because the performance of evolution strategies varies largely on problems with different characteristics (e.g., separability). The black-box optimization benchmark (BBOB) [25] was devised to tackle these difficulties. The noiseless BBOB suite encompasses 24 noise-free real-parameter single-objective functions that are either separable, ill-conditioned, or multimodal. All test functions are defined over \(\mathbb {R}^d\), while the global optima of all test functions are initialized in [−5, 5]^{ d } [19]. In addition, BBOB introduces a proper measure to represent the performance of ESs for global optimization – the empirical cumulative distribution function (ECDF) of run lengths. ECDFs can be aggregated over multiple test functions and represented graphically, increasing the accessibility of the benchmark results.
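
The ECDF aggregation itself is simple to sketch; this is a minimal illustration, not the full BBOB post-processing:

```python
def ecdf(runtimes, budgets, n_runs):
    """Empirical cumulative distribution of run lengths: for each
    budget, the fraction of the n_runs runs that reached the target
    within that budget; unsuccessful runs (absent from runtimes)
    simply never count."""
    return [sum(1 for t in runtimes if t <= b) / n_runs for b in budgets]

# three of four runs hit the target, after 10, 100, and 1000 evaluations
print(ecdf([10, 100, 1000], budgets=[1, 10, 100, 1000, 10000], n_runs=4))
# [0.0, 0.25, 0.5, 0.75, 0.75]
```

Since the curve is a plain fraction of (run, target) pairs solved, curves from different functions and targets can be averaged into a single aggregate plot.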

The following ES variants are included in the comparison:

- (1 + 1)-ES: one-plus-one elitist evolution strategy with the 1/5-th success rule.
- (15, 100)-MSC-ES: mutative self-adaptation of individual step sizes.
- (1, 7)-MSC-ES: mutative self-adaptation of individual step sizes.
- DR2-ES: the derandomized evolution strategy using an accumulated success mutation vector for step-size adaptation.
- (*μ*/*μ*_{w}, *λ*)-CMA-ES: covariance matrix adaptation evolution strategy with weighted intermediate recombination.
- (*μ*/*μ*_{w}, *λ*_{m})-CMA-ES: CMA-ES with mirrored sampling and pairwise selection.
- IPOP-CMA-ES: a restart CMA-ES with increasing population size.
- (1, *λ*)-DR2-Niching: the niching approach based on the second derandomized ES variant.
- (1, *λ*)-CMA-Niching: CMA-ES niching with fixed niche radius.
- (1 + *λ*)-CMA-Niching: the elitist version thereof.
- (1, *λ*)-Mahalanobis-CMA-Niching: niche shape adaptation using the Mahalanobis distance.

### Experimental Settings

The BBOB parameter settings of the experiment are the same for all tested ES variants. The initial global step size *σ* is set to 1. The maximum number of function evaluations is set to 10^{4} × *d*. The initial solution vector (initial parent) is a uniformly distributed random vector restricted to the hyper-box [−4, 4]^{ d }. The algorithms are tested on problems with different numbers of input variables *d*, namely *d* ∈ {2, 3, 5, 10, 20}.

The parameter settings for the CMA-ES variants follow the values suggested in [23]; the reader is referred to it for details. For all niching ES variants, the setting *λ* = 10 is used throughout all tests. The calculation of the fixed niche radius can be found in [60].

### Results

The comparison focuses on the following groups of benchmark functions:

- 1.
*f*_{1}: Sphere function.
- 2.
*f*_{2}: Ellipsoidal function.
- 3.
*f*_{10}: Rotated ellipsoidal function.
- 4.
*f*_{8}: Rosenbrock function.
- 5.
*f*_{13}: Sharp ridge function.
- 6.
*f*_{7}: Step ellipsoidal function.
- 7.
*f*_{15}−*f*_{19}: Multimodal functions with adequate global structure.
- 8.
*f*_{20}−*f*_{24}: Multimodal functions with weak global structure.

On the multimodal functions with adequate global structure (*f*_{15}−*f*_{19}), IPOP-CMA-ES outperforms all the niching ES variants, the (1 + 1)-ES with restart, and the (15, 100)-MSC-ES. (1 + *λ*)-CMA-Niching shows the best performance among all the niching ES variants tested. Surprisingly, (*μ*, *λ*)-MSC-ES performs well even compared to the niching ES variants, which may be a consequence of its large population size. On the weakly structured multimodal functions (*f*_{20}−*f*_{24}), (1 + *λ*)-CMA-Niching catches up with the performance of IPOP-CMA-ES both in 5D and in 20D, whereas (*μ*, *λ*)-MSC-ES performs quite poorly in this case. In addition, although it is a simple strategy, the (1 + 1)-ES with restart shows good results compared to IPOP-CMA-ES and the niching ESs. Evidently, the niching ES variants spend many function evaluations on maintaining local optima and thus exhibit altogether poorer performance in terms of global convergence speed when compared to, e.g., IPOP-CMA-ES, which targets the accurate approximation of a single optimum and devotes most of its resources to this end.

## Conclusions

It has been shown in this chapter that evolution strategies are a versatile class of stochastic search heuristics for optimization. There exists a rich body of theoretical results on ESs, including global convergence conditions, results showing linear convergence rates on high dimensional functions, and findings on the stability of subpopulations and the impact of recombination on global convergence reliability. Moreover, ESs are rank-based (order-invariant) and invariant to changes of the coordinate system. The self-adaptation of the stochastic distribution is an important feature, too, as it frees the user from the burden of choosing the right parameters for mutation and it also makes highly precise approximation of optima possible.

This chapter highlighted mainstream variants of ESs for continuous optimization, including the (1 + 1)-ES with 1/5-th success rule, the (*μ*, *λ*)-ES with mutative step-size adaptation, and common variants of ES with different levels of derandomized step-size adaptation, namely, DR1, DR2, DR3, and CMA-ES. Moreover, common concepts of ESs for multimodal optimization were discussed.

All these strategies have been compared on different categories of functions. The empirical studies confirmed the superiority of covariance matrix adaptation techniques on ill-conditioned problems with correlated variables. However, if these problems do not govern the search difficulty, other evolution strategies can be highly competitive as well. Moreover, it was confirmed that multimodal optimization requires special adaptations to evolution strategies in order to achieve maximal performance.

Our literature review has shown that the algorithmic techniques developed for ES are not only fruitful in the domain of continuous optimization but can be applied to other problem classes as well. Here, the key is defining a metric representation of the search space and following a set of guidelines for the design of mutation and recombination operators, which were reviewed here in a rather informal manner.

Some prevalent topics for future research will be the integration of multiple criteria and constraints, although some first promising results are already available in this direction. Moreover, for nonstandard ES, the theoretical analysis needs to be advanced, in particular the study of convergence dynamics when the available time is limited. Finally, looking back to the original biological inspiration of evolution strategies, one might conjecture that nature has still many “tricks” in store that when well understood could lead to a further enhancement of ES-like search strategies. In this context, it will be interesting to follow recent trends in biological evolution theories [32], showing that a much broader set of mechanisms seem to govern organic evolution than those captured in the modern synthesis.


## References

- 1. Akimoto Y, Sakuma J, Ono I, Kobayashi S (2008) Functionally specialized CMA-ES: a modification of CMA-ES based on the specialization of the functions of covariance matrix adaptation and step size adaptation. In: GECCO'08: proceedings of the 10th annual conference on genetic and evolutionary computation, New York. ACM, pp 479–486
- 2. Arnold DV (2002) Noisy optimization with evolution strategies, vol 8. Springer, Boston
- 3. Auger A, Brockhoff D, Hansen N (2011) Mirrored sampling in evolution strategies with weighted recombination. In: Proceedings of the 13th annual conference on genetic and evolutionary computation, GECCO'11, New York. ACM, pp 861–868
- 4. Auger A, Hansen N (2005) A restart CMA evolution strategy with increasing population size. In: Proceedings of the 2005 congress on evolutionary computation CEC-2005, Piscataway. IEEE Press, pp 1769–1776
- 5. Bäck T (1996) Evolutionary algorithms in theory and practice. Oxford University Press, Oxford
- 6. Bäck T, Emmerich M, Shir OM (2008) Evolutionary algorithms for real world applications [application notes]. IEEE Comput Intell Mag 3(1):64–67
- 7. Bäck T, Hammel U, Schwefel H-P (1997) Evolutionary computation: comments on the history and current state. IEEE Trans Evol Comput 1(1):3–17
- 8. Bäck T, Schütz M (1995) Evolution strategies for mixed integer optimization of optical multilayer systems. In: Evolutionary programming IV – proceedings of the fourth annual conference on evolutionary programming. MIT Press, pp 33–51
- 9. Bartz-Beielstein T (2005) Evolution strategies and threshold selection. In: Hybrid metaheuristics. Springer, pp 104–115
- 10. Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust optimization. Princeton series in applied mathematics. Princeton University Press, Princeton
- 11. Beyer H-G (1999) On the dynamics of EAs without selection. Found Genet Algorithms 5:5–26
- 12. Beyer H-G (2000) Evolutionary algorithms in noisy environments: theoretical issues and guidelines for practice. Comput Methods Appl Mech Eng 186(2):239–267
- 13. Beyer H-G (2001) The theory of evolution strategies. Springer, Berlin
- 14. Beyer H-G, Schwefel H-P (2002) Evolution strategies – a comprehensive introduction. Nat Comput 1(1):3–52
- 15. Beyer H-G, Sendhoff B (2008) Covariance matrix adaptation revisited – the CMSA evolution strategy. In: Parallel problem solving from nature – PPSN X. Lecture notes in computer science, vol 5199. Springer, Berlin, pp 123–132
- 16. Brockhoff D, Auger A, Hansen N, Arnold DV, Hohm T (2010) Mirrored sampling and sequential selection for evolution strategies. In: Schaefer R, Cotta C, Kolodziej J, Rudolph G (eds) Proceedings of the 11th international conference on parallel problem solving from nature: part I, PPSN'10. Springer, Berlin, pp 11–21
- 17. Droste S, Wiesmann D (2003) On the design of problem-specific evolutionary algorithms. In: Advances in evolutionary computing. Springer, Berlin, pp 153–173
- 18. Emmerich M, Grötzner M, Schütz M (2001) Design of graph-based evolutionary algorithms: a case study for chemical process networks. Evol Comput 9(3):329–354
- 19. Finck S, Hansen N, Ros R, Auger A (2010) Real-parameter black-box optimization benchmarking 2010: presentation of the noisy functions. Technical report 2009/21, Research Center PPE
- 20. Fogel LJ (1999) Intelligence through simulated evolution: forty years of evolutionary programming. Wiley, New York
- 21. Goldberg D (1989) Genetic algorithms in search, optimization, and machine learning. Addison Wesley, Reading
- 22. Grimme C, Schmitt K (2006) Inside a predator-prey model for multi-objective optimization: a second study. In: Proceedings of the 8th annual conference on genetic and evolutionary computation. ACM, pp 707–714
- 23. Hansen N (2016) The CMA evolution strategy: a tutorial. arXiv preprint, arXiv:1604.00772
- 24. Hansen N, Arnold DV, Auger A (2015) Evolution strategies. Springer, Berlin/Heidelberg, pp 871–898
- 25. Hansen N, Auger A, Finck S, Ros R (2010) Real-parameter black-box optimization benchmarking 2010: experimental setup. Technical report RR-7215, INRIA
- 26. Hansen N, Kern S (2004) Evaluating the CMA evolution strategy on multimodal test functions. In: Parallel problem solving from nature – PPSN VIII. Lecture notes in computer science, vol 3242. Springer, Berlin/Heidelberg, pp 282–291
- 27. Hansen N, Niederberger S, Guzzella L, Koumoutsakos P (2009) A method for handling uncertainty in evolutionary optimization with an application to feedback control of combustion. IEEE Trans Evol Comput 13(1):180–197
- 28. Hansen N, Ostermeier A (1996) Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation. In: Proceedings of the 1996 IEEE international conference on evolutionary computation, Piscataway. IEEE, pp 312–317
- 29. Hansen N, Ostermeier A (2001) Completely derandomized self-adaptation in evolution strategies. Evol Comput 9(2):159–195
- 30. Hansen N, Ostermeier A, Gawelczyk A (1995) On the adaptation of arbitrary normal mutation distributions in evolution strategies: the generating set adaptation. In: Proceedings of the sixth international conference on genetic algorithms (ICGA6), San Francisco. Morgan Kaufmann, pp 57–64
- 31. Igel C, Hansen N, Roth S (2007) Covariance matrix adaptation for multi-objective optimization. Evol Comput 15(1):1–28
- 32. Jablonka E, Lamb MJ (2005) Evolution in four dimensions. MIT Press, Cumberland
- 33. Jägersküpper J (2007) Algorithmic analysis of a basic evolutionary algorithm for continuous optimization. Theor Comput Sci 379(3):329–347
- 34. Holland JH (1992) Adaptation in natural and artificial systems. MIT Press, Cambridge
- 35. Jolliffe I (2002) Principal component analysis, 2nd edn. Springer, New York
- 36. Klinkenberg J-W, Emmerich MT, Deutz AH, Shir OM, Bäck T (2010) A reduced-cost SMS-EMOA using kriging, self-adaptation, and parallelization. In: Multiple criteria decision making for sustainable energy and transportation systems. Springer, Berlin/Heidelberg, pp 301–311
- 37. Knowles JD, Corne DW (2000) Approximating the nondominated front using the Pareto archived evolution strategy. Evol Comput 8(2):149–172
- 38. Kramer O, Schwefel H-P (2006) On three new approaches to handle constraints within evolution strategies. Nat Comput 5(4):363–385
- 39. Kruisselbrink J (2012) Evolution strategies for robust optimization. PhD thesis, Leiden University, Leiden
- 40. Kursawe F (1991) A variant of evolution strategies for vector optimization. In: Parallel problem solving from nature. Springer, Berlin/Heidelberg, pp 193–197
- 41. Laumanns M, Rudolph G, Schwefel H-P (1998) A spatial predator-prey approach to multi-objective optimization: a preliminary study. In: Parallel problem solving from nature – PPSN V. Springer, Berlin/Heidelberg, pp 241–249
- 42. Li R (2009) Mixed-integer evolution strategies for parameter optimization and their applications to medical image analysis. PhD thesis, Leiden University, Leiden
- 43. Li R, Emmerich MT, Eggermont J, Bäck T, Schütz M, Dijkstra J, Reiber JH (2013) Mixed integer evolution strategies for parameter optimization. Evol Comput 21(1):29–64
- 44. Mahfoud S (1995) Niching methods for genetic algorithms. PhD thesis, University of Illinois at Urbana-Champaign
- 45. Meyer-Nieberg S, Beyer H-G (2007) Self-adaptation in evolutionary algorithms. In: Parameter setting in evolutionary algorithms. Springer, Berlin, pp 47–75
- 46. Ostermeier A, Gawelczyk A, Hansen N (1993) A derandomized approach to self adaptation of evolution strategies. Technical report TR-93-003, TU Berlin
- 47. Ostermeier A, Gawelczyk A, Hansen N (1994) A derandomized approach to self adaptation of evolution strategies. Evol Comput 2(4):369–380
- 48. Ostermeier A, Gawelczyk A, Hansen N (1994) Step-size adaptation based on non-local use of selection information. In: Parallel problem solving from nature – PPSN III. Lecture notes in computer science, vol 866. Springer, Berlin/Heidelberg, pp 189–198
- 49. Preuss M (2015) Multimodal optimization by means of evolutionary algorithms. Natural computing series. Springer International Publishing, Cham
- 50. Rechenberg I (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Friedrich Frommann Verlag, Stuttgart-Bad Cannstatt
- 51. Ros R, Hansen N (2008) A simple modification in CMA-ES achieving linear time and space complexity. In: Parallel problem solving from nature – PPSN X. Lecture notes in computer science, vol 5199. Springer, Berlin/Heidelberg, pp 296–305
- 52. Rudolph G (1992) On correlated mutations in evolution strategies. In: Parallel problem solving from nature – PPSN II. Elsevier, Amsterdam, pp 105–114
- 53. Rudolph G (1994) An evolutionary algorithm for integer programming. In: Parallel problem solving from nature – PPSN III. Springer, Berlin/Heidelberg, pp 139–148
- 54. Rudolph G (1997) Convergence properties of evolutionary algorithms. Kovac, Hamburg
- 55. Schönemann L, Emmerich M, Preuß M (2004) On the extinction of evolutionary algorithm subpopulations on multimodal landscapes. Informatica (Slovenia) 28(4):345–351
- 56. Schumer M, Steiglitz K (1968) Adaptive step size random search. IEEE Trans Autom Control 13(3):270–276
- 57. Schwefel H-P (1965) Kybernetische Evolution als Strategie der experimentellen Forschung in der Strömungstechnik. Technische Universität, Berlin
- 58. Schwefel H-P (1987) Collective phenomena in evolutionary systems. In: Checkland P, Kiss I (eds) Problems of constancy and change – the complementarity of systems approaches to complexity, proceedings of the 31st annual meeting, Budapest, vol 2. International Society for General System Research, pp 1025–1033
- 59. Schwefel H-P (1993) Evolution and optimum seeking: the sixth generation. Wiley, New York
- 60. Shir OM (2008) Niching in derandomized evolution strategies and its applications in quantum control. PhD thesis, Leiden University, Leiden
- 61. Shir OM, Bäck T (2009) Niching with derandomized evolution strategies in artificial and real-world landscapes. Nat Comput Int J 8(1):171–196
- 62. Shir OM, Emmerich M, Bäck T (2010) Adaptive niche-radii and niche-shapes approaches for niching with the CMA-ES. Evol Comput 18(1):97–126
- 63. Shir OM, Roslund J, Whitley D, Rabitz H (2014) Efficient retrieval of landscape Hessian: forced optimal covariance adaptive learning. Phys Rev E 89(6):063306
- 64. Shir OM, Yehudayoff A (2017) On the statistical learning ability of evolution strategies. In: Proceedings of the 14th ACM/SIGEVO conference on foundations of genetic algorithms, FOGA-2017. ACM Press, New York, pp 127–138
- 65. Teytaud O (2011) Lower bounds for evolution strategies. Theory Random Search Heuristics 1:327–354