Model-based evolutionary algorithms: a short survey

Evolutionary algorithms (EAs) are a family of nature-inspired algorithms widely used for solving complex optimization problems. Since the operators (e.g., crossover, mutation, selection) in most traditional EAs are designed on the basis of fixed heuristic rules or strategies, they are unable to learn the structures or properties of the problems to be optimized. To equip EAs with learning abilities, various model-based evolutionary algorithms (MBEAs) have recently been proposed. This survey briefly reviews some representative MBEAs by considering three different motivations for using models. First, the most common motivation is to estimate the distribution of the candidate solutions. Second, in evolutionary multi-objective optimization, models can be used to build inverse mappings from the objective space to the decision space. Third, when solving computationally expensive problems, models can be used as surrogates of the fitness functions. Based on the review, some further discussions are also given.


Introduction
With the development of modern science and engineering, optimization problems in various areas are becoming increasingly challenging. Without loss of generality, an optimization problem (for minimization, with box constraints) can be formulated as [1]

$$\min_{\mathbf{x} \in X} \mathbf{f}(\mathbf{x}) = \left(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x})\right), \qquad (1)$$

where X ⊂ Rⁿ and x = (x₁, x₂, ..., xₙ) ∈ X denote the decision space and the decision vector, respectively; Y ⊂ Rᵐ and f ∈ Y denote the objective space and the objective vector, respectively. The decision vector x comprises n decision variables, and the objective vector f comprises m objective functions, which map x from X to Y. If there is only one objective, i.e., m = 1, the problem is known as a single-objective optimization problem (SOP); if there is more than one objective function, i.e., m > 1, the problem is known as a multi-objective optimization problem (MOP) [2]. For SOPs, there usually exists at least one global optimal solution that optimizes the given objective function. For MOPs, by contrast, no single solution optimizes all the objectives simultaneously; instead, there exists a set of optimal solutions that trade off between different objectives. This solution set is known as the Pareto set (PS) in the decision space, and its image is known as the Pareto front (PF) in the objective space.

Fig. 1 The general framework of the evolutionary algorithms
Due to the complex properties of real-world optimization problems, mathematical methods such as Newton's method [3] and hill climbing [4] often fail to work effectively on them. By contrast, evolutionary algorithms (EAs) [5] generally show robust performance on such complex optimization problems. The family of EAs refers to population-based stochastic algorithms inspired by natural evolution, including the genetic algorithm (GA) [6], evolutionary programming (EP) [7,8], evolution strategies (ES) [9], genetic programming (GP) [10], and differential evolution (DE) [11]. Besides, more recently developed swarm intelligence (SI) algorithms such as particle swarm optimization (PSO) [12] and ant colony optimization (ACO) [13] are also regarded as members of the EA family.
Despite the different technical details adopted in different EAs, most of them share the common framework given in Fig. 1. Each generation in the main loop of a typical EA consists of three components: reproduction, fitness evaluation and selection. Specifically, the reproduction process generates new candidate solutions, often by means of so-called genetic operators such as crossover and/or mutation; the fitness evaluation process indicates the quality of the candidate solutions in the current population by assigning fitness values to them; and the selection operator determines which candidate solutions survive to the next generation. Traditionally, the operators in EAs are designed on the basis of fixed heuristic rules or strategies and do not interact with the environment. However, during the evolution process, the environment can vary rapidly due to the complicated properties of the problem to be optimized. In this case, traditional operators may not work effectively because they cannot adaptively adjust their behavior. In other words, traditional EA operators are unable to learn from the environment.
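The generational loop described above (reproduction, fitness evaluation, selection) can be sketched as follows. This is a minimal illustration rather than any specific EA from the literature: it assumes real-valued decision vectors, uniform crossover, Gaussian mutation with a fixed step size `sigma`, and (μ + λ) truncation selection, all chosen here only for brevity. The function name `evolve` and its parameter values are hypothetical.

```python
import random

def evolve(fitness, n_vars, pop_size=20, generations=50, sigma=0.1, seed=0):
    """Minimal generational EA for minimizing `fitness` over real vectors."""
    rng = random.Random(seed)
    # Initialization: random candidate solutions in [-1, 1]^n_vars
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(n_vars)] for _ in range(pop_size)]
    fits = [fitness(x) for x in pop]
    for _ in range(generations):
        # Reproduction: uniform crossover of two random parents, then Gaussian mutation
        offspring = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(pop, 2)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            offspring.append([c + rng.gauss(0.0, sigma) for c in child])
        # Fitness evaluation: assign a fitness value to every new candidate
        off_fits = [fitness(x) for x in offspring]
        # Selection: keep the pop_size best of parents + offspring ((mu + lambda) truncation)
        ranked = sorted(zip(fits + off_fits, pop + offspring), key=lambda t: t[0])[:pop_size]
        fits = [f for f, _ in ranked]
        pop = [x for _, x in ranked]
    return pop[0], fits[0]
```

For example, `evolve(lambda x: sum(v * v for v in x), n_vars=5)` minimizes a 5-dimensional sphere function. Every operator here is a fixed heuristic; the MBEAs discussed below replace one of the three steps with a learned model.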
To address this issue, a number of recent works have been dedicated to developing EAs with learning ability. The basic idea is to replace the heuristic operators with machine-learning (ML) models, where the candidate solutions in each generation are used as training data sampled from the current environment. For different purposes, the ML models can be embedded into any of the three main components of EAs, i.e., reproduction, fitness evaluation or selection. The adopted ML models can be regression models (e.g., the Gaussian process (GP) [10] and the artificial neural network (ANN) [14]), clustering models (e.g., K-means [15]), classification models (e.g., the support vector machine (SVM) [16]), dimensionality reduction models (e.g., principal component analysis (PCA) [17]), etc.
Despite the various technical details, we identify three main motivations for using ML models in EAs: (1) building estimation models in the decision space, (2) building inverse models that map from the objective space to the decision space, and (3) building surrogate models for the fitness functions. Organized by these motivations, this short survey aims not only to provide a systematic summary of some representative works but also to discuss potential future research directions in the related field. Hereafter, we refer to EAs using ML models as model-based evolutionary algorithms (MBEAs).
The rest of this survey is organized as follows. "Estimation of distribution" reviews the MBEAs that estimate the distribution in the decision space. "Inverse modeling" reviews the MBEAs that build inverse models from the objective space to the decision space. "Surrogate modeling" reviews the MBEAs that build surrogate models for the fitness functions. Finally, the last section summarizes this survey.

Estimation of distribution
Estimation of distribution algorithms (EDAs) refer to the MBEAs that estimate the distribution of promising candidate solutions by training and sampling models in the decision space [18]. As given in Algorithm 1, EDAs still follow the conventional framework of EAs, but reproduction operators such as crossover and mutation are replaced by ML models. Ideally, the ML models in EDAs are iteratively refined as the evolution proceeds and eventually converge to the global optimum. In this section, we present a brief overview of three types of representative EDAs: univariate EDAs, multivariate EDAs and multi-objective EDAs.
Algorithm 1 Pseudo code for EDA approach.
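The EDA loop summarized by Algorithm 1 can be sketched as follows for continuous variables. This is an illustrative sketch, not the algorithm of any particular paper: it assumes truncation selection and an independent Gaussian per decision variable as the model (in the spirit of continuous univariate EDAs); the function name and parameter values are hypothetical. Sampling from the fitted model takes the place of crossover and mutation.

```python
import numpy as np

def gaussian_eda(fitness, n_vars, pop_size=50, n_select=15, generations=60, seed=0):
    """Univariate Gaussian EDA for minimization (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, n_vars))
    for _ in range(generations):
        fits = np.apply_along_axis(fitness, 1, pop)
        # Selection: keep the n_select most promising candidate solutions
        elite = pop[np.argsort(fits)[:n_select]]
        # Model building: one independent Gaussian per decision variable
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-12  # small floor avoids a zero scale
        # Reproduction by sampling the model instead of crossover/mutation
        pop = rng.normal(mean, std, size=(pop_size, n_vars))
    fits = np.apply_along_axis(fitness, 1, pop)
    best = int(np.argmin(fits))
    return pop[best], float(fits[best])
```

Calling `gaussian_eda(lambda v: float(np.sum(v ** 2)), n_vars=3)` runs the loop on a sphere function. Because each variable has its own Gaussian, this model is univariate in the sense discussed next.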

Univariate EDAs
To estimate the distribution of the candidate solutions in the decision space, variable correlation is an essential factor to take into consideration in modeling. A simple approach is to adopt univariate models, which assume that the decision variables are independent. Under this assumption, the probability distribution of a candidate solution x = (x₁, x₂, ..., xₙ) can be decomposed as

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i), \qquad (2)$$

where p(x) is the probability distribution of the candidate solution x, and p(xᵢ) is the probability distribution of decision variable xᵢ. EDAs adopting such univariate distribution models are known as univariate EDAs. As a classic univariate EDA, the univariate marginal distribution algorithm (UMDA) was proposed to solve the well-known OneMax problem [19]. UMDA adopts a binary-encoded probability model with a probability vector p = (p₁, p₂, ..., pₙ), where pᵢ denotes the probability of having a 1 at position i of a candidate solution.
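A minimal UMDA for OneMax, following the probability-vector description above, might look like the sketch below. The population size, truncation ratio and the margins that keep each pᵢ inside [1/n, 1 − 1/n] are assumptions added for illustration; the margins are a common practical safeguard against premature fixation rather than part of the description here.

```python
import random

def umda_onemax(n_bits=20, pop_size=60, n_select=30, generations=40, seed=1):
    """UMDA on OneMax (maximize the number of ones) with a probability vector p."""
    rng = random.Random(seed)
    p = [0.5] * n_bits  # p[i] = probability of sampling a 1 at position i
    lo, hi = 1.0 / n_bits, 1.0 - 1.0 / n_bits  # assumed margins against fixation
    best = None
    for _ in range(generations):
        # Sample a full population from the product of univariate marginals
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        pop.sort(key=sum, reverse=True)  # OneMax fitness = number of ones
        if best is None or sum(pop[0]) > sum(best):
            best = pop[0]
        selected = pop[:n_select]
        # Re-estimate each marginal as the frequency of 1s among selected solutions
        p = [min(hi, max(lo, sum(x[i] for x in selected) / n_select)) for i in range(n_bits)]
    return best, p
```

Each pᵢ is re-estimated independently, which is exactly the factorization p(x) = ∏ p(xᵢ).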
Another representative univariate EDA is population-based incremental learning (PBIL) [20], which uses a binary-encoded probability model similar to that of UMDA. The main difference is that PBIL incrementally improves the probability model by sampling a small number of candidate solutions in each generation, whereas UMDA maintains a full population of candidate solutions.
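PBIL's incremental update can be sketched as below. The learning rate `lr`, the number of samples per generation, and the choice to move the probability vector toward only the single best sample are assumptions of this illustration; the function name is hypothetical.

```python
import random

def pbil(fitness, n_bits, n_samples=5, lr=0.1, generations=200, seed=2):
    """PBIL for maximization over bit strings (illustrative parameters)."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best = None
    for _ in range(generations):
        # Only a few samples per generation instead of a full population
        samples = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(n_samples)]
        winner = max(samples, key=fitness)
        if best is None or fitness(winner) > fitness(best):
            best = winner
        # Incremental learning: move p a step of size lr toward the best sample
        p = [(1.0 - lr) * pi + lr * bi for pi, bi in zip(p, winner)]
    return best, p
```

With `pbil(sum, 16)` the probability vector drifts toward all ones on OneMax. In contrast to the UMDA sketch, p is never re-estimated from scratch; it accumulates a moving average of good samples.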
Although univariate EDAs are computationally efficient thanks to their simple models, their performance may deteriorate sharply if there exist strong interactions between the decision variables. In the following subsection, we briefly review some multivariate EDAs designed to address this issue.

Multivariate EDAs
Among the various multivariate EDAs, the most intuitive approach is to consider the pair-wise interactions between decision variables. Generally, given a candidate solution x = (x₁, x₂, ..., xₙ), the pair-wise interactions can be represented by the conditional probability model

$$p(\mathbf{x}) = p(x_{i_1} \mid x_{i_2})\, p(x_{i_2} \mid x_{i_3}) \cdots p(x_{i_{n-1}} \mid x_{i_n})\, p(x_{i_n}), \qquad (3)$$

where p(x) is the probability distribution of the candidate solution x, p(x_{i_j} | x_{i_{j+1}}) is the conditional probability distribution of x_{i_j} given x_{i_{j+1}}, and i₁, i₂, ..., iₙ denotes a permutation of the decision variables. For example, De Bonet et al. proposed the mutual-information-maximizing input clustering (MIMIC) algorithm [21]. In MIMIC, the conditional entropy of each decision variable is used to build the conditional probability models, where a chain of dependencies is built according to the ascending order of the conditional entropy values estimated from the candidate solutions. Another representative algorithm of this type is the bivariate marginal distribution algorithm (BMDA) [22], where the dependency model at each generation is built using Pearson's chi-square statistics as the dependence measure.
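The greedy chain construction used by MIMIC can be illustrated on binary samples as follows. This is a simplified sketch assuming empirical (frequency-based) entropies: it starts the chain with the lowest-entropy variable, which plays the role of the unconditioned last variable of the chain model described above, and then repeatedly appends the remaining variable with the lowest conditional entropy given the previously chosen one. Estimating and sampling the conditional probabilities themselves is omitted.

```python
import math
from collections import Counter

def entropy(values):
    """Empirical Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def cond_entropy(a, b):
    """H(A | B) = H(A, B) - H(B), estimated from paired samples."""
    return entropy(list(zip(a, b))) - entropy(b)

def mimic_chain(samples):
    """Greedy variable ordering for a chain of pair-wise dependencies."""
    cols = [list(c) for c in zip(*samples)]
    remaining = set(range(len(cols)))
    first = min(remaining, key=lambda i: entropy(cols[i]))
    chain = [first]
    remaining.remove(first)
    while remaining:
        prev = chain[-1]
        # Next variable: the one best predicted by the previously chosen variable
        nxt = min(remaining, key=lambda i: cond_entropy(cols[i], cols[prev]))
        chain.append(nxt)
        remaining.remove(nxt)
    return chain
```

In `mimic_chain([(0, 0, 0, 1), (0, 1, 1, 0), (0, 0, 0, 0), (0, 1, 1, 1)])`, variable 0 is constant and is chosen first, and the identical variables 1 and 2 end up adjacent in the chain because one determines the other.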
While the conditional probability model in (3) can only represent pair-wise interactions, some problems may contain more complicated interactions between the decision variables. To model such interactions, a classic approach is the Bayesian optimization algorithm (BOA) [23], which adopts Bayesian networks as the multivariate models. A Bayesian network is a directed acyclic graph, where each decision variable is represented by a node, and the conditional dependencies between the decision variables are represented by the edges. Given a decision vector x = (x₁, x₂, ..., xₙ), a Bayesian network can be formulated as

$$p(\mathbf{x}) = \prod_{i=1}^{n} p\left(x_i \mid \mathrm{edge}(x_i)\right), \qquad (4)$$

where edge(xᵢ) is the set of variables (the parents of xᵢ) with edges directed to xᵢ. To build a Bayesian network, BOA starts from a network without edges and iteratively adds edges according to the Bayesian-Dirichlet metric [24]. Since Bayesian networks are able to capture complex variable interactions, many other representative multivariate EDAs are also developed on their basis. In the estimation of Bayesian network algorithm (EBNA), the Bayesian information criterion [25] is adopted in the iterative construction of the Bayesian networks. In the hierarchical Bayesian optimization algorithm (hBOA) [26], a problem is decomposed into a group of subproblems, and a hierarchical structure is adopted to deal with the subproblems at multiple levels.
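To make the Bayesian-network factorization described above concrete, the sketch below evaluates the joint probability of a small binary network as a product of conditionals given parents, and draws samples by ancestral sampling (each variable is sampled after its parents). The three-variable network and its conditional probability tables are hypothetical; how BOA learns the structure is not shown.

```python
import random

# Hypothetical network: x0 -> x1 and (x0, x1) -> x2.
# parents[i] lists the parents of x_i; cpt[i] maps parent values to P(x_i = 1).
parents = {0: (), 1: (0,), 2: (0, 1)}
cpt = {
    0: {(): 0.7},
    1: {(0,): 0.2, (1,): 0.9},
    2: {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.95},
}

def joint_prob(x):
    """p(x) as the product of p(x_i | parents of x_i)."""
    prob = 1.0
    for i, pa in parents.items():
        p1 = cpt[i][tuple(x[j] for j in pa)]
        prob *= p1 if x[i] == 1 else 1.0 - p1
    return prob

def ancestral_sample(rng, order=(0, 1, 2)):
    """Sample every variable after its parents; `order` must be topological."""
    x = {}
    for i in order:
        p1 = cpt[i][tuple(x[j] for j in parents[i])]
        x[i] = 1 if rng.random() < p1 else 0
    return [x[i] for i in sorted(x)]
```

For instance, `joint_prob([1, 1, 1])` equals 0.7 × 0.9 × 0.95, and the probabilities of all eight configurations sum to one. In an EDA, such sampling replaces crossover and mutation.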
There are also some other multivariate EDAs using different types of models, such as the Markovianity-based optimization algorithm (MOA) using Markov networks [27], the affinity propagation EDA (AffEDA) using the affinity propagation clustering method [28], etc.

Multi-objective EDAs
Apart from the single-objective EDAs discussed above, there are also EDAs tailored for solving MOPs, known as multi-objective EDAs (MEDAs). Instead of obtaining one global optimum, MEDAs are expected to obtain a set of optimal solutions as an approximation to the PF (as well as the PS).
To approximate the PF of an MOP, most MEDAs adopt special mechanisms to balance the convergence and diversity of the candidate solutions. In the Bayesian multi-objective optimization algorithm (BMOA) [29], the selection operator is based on an ε-archive, where a minimal set of candidate solutions that ε-dominates all the others is maintained over generations. In the naive mixture-based multi-objective iterated density estimation evolutionary algorithm (MIDEA) [30], a two-phase selection is adopted, where the selection pressure is tuned by a parameter δ. In the multi-objective Bayesian optimization algorithm (mBOA) [31], the selection operator is directly borrowed from the NSGA-II algorithm [32]. The multi-objective hierarchical BOA (mohBOA) also adopts the selection operator of NSGA-II, combined with a k-means clustering method.
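The ε-dominance test behind such an archive can be sketched as follows for minimization. Two simplifications are assumed: ε-dominance is taken in its additive form, and the update rejects a new objective vector that is ε-dominated by an archive member and otherwise removes the members it Pareto-dominates. This is not the exact archiving scheme of BMOA and does not guarantee a minimal archive.

```python
def dominates(a, b):
    """Pareto dominance for minimization: a is no worse everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def eps_dominates(a, b, eps):
    """Additive epsilon-dominance: a is at most eps worse than b in every objective."""
    return all(x - eps <= y for x, y in zip(a, b))

def update_archive(archive, f, eps):
    """Simplified epsilon-archive update for a new objective vector f."""
    if any(eps_dominates(a, f, eps) for a in archive):
        return archive  # f brings no improvement larger than eps
    return [a for a in archive if not dominates(f, a)] + [f]
```

With ε = 0.1 and an archive holding (1.0, 1.0), the vector (1.05, 1.05) is rejected, (0.5, 2.0) is added as a new trade-off, and a later (0.2, 0.2) replaces both. The ε parameter thus controls how finely the archive resolves the PF.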
Different from most MEDAs, which adopt new selection operators, the regularity model-based multi-objective estimation of distribution algorithm (RM-MEDA) adopts a new reproduction operator [33]. Since, under the Karush-Kuhn-Tucker optimality conditions, the PS of a continuous m-objective MOP is a piecewise continuous (m − 1)-dimensional manifold (aka the regularity property) [34], RM-MEDA reduces the dimensionality of the decision vectors using the local PCA method and then samples new candidate solutions in the latent space.
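The core idea of sampling in a PCA latent space can be sketched as follows. This is a heavily simplified illustration, not RM-MEDA itself. It assumes a single cluster (no local partitioning), a one-dimensional latent manifold (as for a bi-objective problem), a hypothetical extension ratio `extend` for the latent segment, and isotropic Gaussian noise whose variance is the mean of the remaining eigenvalues.

```python
import numpy as np

def pca_sample(X, n_new, extend=0.25, seed=0):
    """Sample new decision vectors along the first principal component of X."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    C = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    v = eigvecs[:, -1]  # principal direction = 1-D latent manifold
    t = (X - mean) @ v  # latent coordinates of the current solutions
    span = t.max() - t.min()
    # Sample latent points on a slightly extended segment to keep exploring the ends
    t_new = rng.uniform(t.min() - extend * span, t.max() + extend * span, size=n_new)
    # Residual variance: mean of the discarded eigenvalues (clipped at zero)
    sigma2 = eigvals[:-1].mean() if X.shape[1] > 1 else 0.0
    noise = rng.normal(0.0, np.sqrt(max(sigma2, 0.0)), size=(n_new, X.shape[1]))
    return mean + np.outer(t_new, v) + noise
```

If the current solutions lie on the line x₁ = x₂ = x₃, the new samples stay on that line (up to negligible noise), because the latent model has captured the one-dimensional structure.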

Discussion
As the most common MBEAs, EDAs have achieved considerable advances over the past decade. A main advantage of EDAs is their potential ability to adapt to the fitness landscape and learn the problem structures, which is helpful when the problems to be optimized have special properties. Nevertheless, some challenges remain to be addressed.
First, compared with heuristic strategies (e.g., two-point crossover), building ML models is generally more time-consuming. In practice, one should weigh whether the performance gain, which may be only incremental, justifies the additional computational cost of applying EDAs.
Second, most EDAs require sufficient training data (i.e., candidate solutions) for the models to be adequately trained. This can hardly be guaranteed during the optimization process of an EA, and ill-trained models may lead to poor EDA performance.
Third, most EDAs suffer seriously from the curse of dimensionality. As the number of decision variables increases, the performance of EDAs may deteriorate sharply due to the failure of the ML models adopted therein. This limits the robustness and applicability of EDAs in practice.
Fourth, EDAs focus on estimating the distribution in the decision space, but most of them pay little attention to the correlations among the decision variables. By contrast, covariance matrix adaptation (CMA)-based algorithms [35], e.g., the CMA evolution strategy [36] and the multi-objective CMA (MO-CMA) [37], exploit the correlation and variance information of the distribution to enhance convergence. A promising future direction is to combine EDAs and CMA so as to take full advantage of the statistical information for accelerating the convergence of EDAs.

Inverse modeling
As discussed in the previous section, the target of multi-objective optimization is to obtain a set of candidate solutions as trade-offs between different objectives. Hence, an algorithm should maintain a good balance between the convergence and diversity of the population, such that, ideally, the candidate solutions are uniformly distributed on the true PF. Although the target is to approximate the PF (in the objective space), most MEDAs still build models and sample candidate solutions in the decision space. However, as illustrated in Fig. 2, a uniformly distributed solution set in the decision space does not necessarily have an image set that is uniformly distributed on the PF. To directly control the distribution of the candidate solutions on the PF, some researchers have proposed to first sample points in the objective space and then build inverse models to map them back to the decision space. In this section, we introduce several representative MBEAs of this type.
Given an MOP f(x), the inverse modeling process builds a model that maps from the objective space to the decision space:

$$\mathbf{x} = g(\mathbf{f}), \qquad (5)$$

where g(·) denotes the inverse mapping function to be modeled. Strictly speaking, g(·) can be modeled exactly if and only if the mapping between the PS and the PF is one-to-one. In practice, however, g(·) can still be approximated even if this condition does not hold. From the machine-learning point of view, building an inverse model g(·) can be seen as a regression task.
In [38], Giagkiozis and Fleming propose using radial basis function neural networks (RBFNNs) [39] to build inverse models mapping from the objective space to the decision space for multi-objective optimization, known as the Pareto estimation method. In this method, an existing multi-objective evolutionary algorithm is first run to obtain a set of candidate solutions as an approximation to the PF. Then, this solution set is used to train the RBFNNs mapping from the objective space to the decision space. Using the trained RBFNNs, decision makers are able to sample more solutions in the region of interest on the PF without performing an additional exhaustive search.
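An inverse model of this kind can be sketched with a Gaussian RBF network fitted by ridge-regularized least squares. The toy bi-objective data (PS: x₁ ∈ [0, 1], x₂ = 0; PF: f₁ = x₁, f₂ = 1 − √x₁), the kernel width, the ridge term, and placing one centre on each training point are all assumptions for illustration, not details of [38].

```python
import numpy as np

def fit_rbf_inverse(F, X, width=0.2, ridge=1e-6):
    """Fit a Gaussian RBF network x ~ g(f) from objective vectors F to decision vectors X."""
    centers = F.copy()  # assumption: one RBF centre per training objective vector

    def design(Fq):
        # Gaussian activations of every query point with respect to every centre
        d2 = ((Fq[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * width ** 2))

    Phi = design(F)
    # Ridge-regularized least squares for the output weights (one column per variable)
    W = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(len(centers)), Phi.T @ X)
    return lambda Fq: design(np.atleast_2d(Fq)) @ W

# Usage: train on an approximated PF, then query a new point of interest on it
x1 = np.linspace(0.0, 1.0, 20)
X = np.column_stack([x1, np.zeros(20)])
F = np.column_stack([x1, 1.0 - np.sqrt(x1)])
g = fit_rbf_inverse(F, X)
pred = g(np.array([0.33, 1.0 - np.sqrt(0.33)]))
```

The query objective vector lies between training points on the PF, and the trained network returns a decision vector close to (0.33, 0) without any further search.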
While the Pareto estimation method in [38] only works with off-line training on a solution set obtained by another existing algorithm, Cheng et al. have recently proposed a multi-objective evolutionary algorithm using Gaussian process-based inverse modeling (IM-MOEA) [40,41], which adopts on-line training of the inverse models. As illustrated in Fig. 2, the inverse modeling process is embedded into the reproduction operator of the algorithm to sample new candidate solutions. To simplify the modeling process, the multivariate inverse mapping model is approximately decomposed into a number of univariate models:

$$x_i = g_i(f_j), \quad i = 1, 2, \ldots, n, \quad j \in \{1, 2, \ldots, m\}. \qquad (6)$$

Since this decomposition does not strictly hold due to variable correlations, Gaussian process (GP) [10] models are used to represent the uncertainty information (i.e., the errors of the decomposition). In addition, a random grouping method is used to increase the probability that correlated variables are considered together when training and sampling the inverse models. Compared with the RBFNN-based method in [38], IM-MOEA shows more robust performance; more importantly, it can not only be used to sample additional solutions in regions of interest, but also work as an independent EA to approximate the PF. Following the success of IM-MOEA, the idea has also been extended to MOPs with irregular PFs [42] as well as MOPs in dynamic environments [43], where [43] adopts a simple linear model instead of the GP model to simplify the inverse modeling process. Apart from the aforementioned inverse modeling-based approaches, there are also some approaches focused on PF modeling only. For example, in the Pareto-adaptive ε-dominance-based algorithm (paε-MyDE) [44,45] and the reference indicator-based MOEA (RIB-EMOA) [46], each PF is associated with one curve in the family

$$f_1^{p} + f_2^{p} + \cdots + f_M^{p} = 1, \quad p > 0, \qquad (7)$$

where fᵢ denotes the ith objective value and M denotes the number of objectives. Recently, Tian et al. proposed a robust Pareto front modeling method [47] that trains a generalized simplex model considering both the scale and the curvature of the PF. However, although these approaches are capable of capturing the approximate structures of the PFs, their models cannot be used to obtain candidate solutions in the decision space directly, which is a major difference from the inverse modeling-based approaches.
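The univariate inverse decomposition described above can be illustrated with a sketch in which each decision variable is paired with a randomly chosen objective and modeled on its own. As in the linear variant mentioned for [43], a least-squares line replaces the GP here, and the residual standard deviation serves as a crude stand-in for the GP's uncertainty estimate. The function name and the uniform sampling of target objective values within their current range are assumptions of this sketch.

```python
import numpy as np

def univariate_inverse_sample(F, X, n_new, seed=0):
    """Sample decision vectors from per-variable inverse models x_i ~ a_i * f_j + b_i."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = F.shape[1]
    X_new = np.empty((n_new, d))
    for i in range(d):
        j = rng.integers(m)  # random grouping: pair variable i with a random objective j
        A = np.column_stack([F[:, j], np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid_std = np.std(X[:, i] - A @ coef)  # crude uncertainty of the decomposition
        # Sample target objective values inside the current range of f_j
        f_new = rng.uniform(F[:, j].min(), F[:, j].max(), size=n_new)
        X_new[:, i] = coef[0] * f_new + coef[1] + rng.normal(0.0, resid_std, size=n_new)
    return X_new
```

Sampling is done in the objective space and mapped back, which is the key difference from the EDAs of the previous section. A GP would additionally make the noise level depend on the query point.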

Discussion
While EDAs focus on estimating the distribution in the decision space, inverse modeling works as a bridge between the objective space and the decision space. Building such inverse models is particularly useful when a decision-making process is involved in multi-objective optimization. Nevertheless, inverse modeling is still in its infancy, and much remains to be explored and studied.
First, inverse modeling assumes that the mapping from the objective space to the decision space is one-to-one. In practice, however, one objective vector may well correspond to more than one decision vector. It is of particular interest to see how more robust inverse models can be built for such problems.
Second, like most other MBEAs, inverse modeling-based algorithms also suffer from the curse of dimensionality. This issue is twofold. On the one hand, training ML models such as GPs can be extremely time-consuming when there is a large number of variables. On the other hand, the amount of training data required for building the inverse models increases exponentially with the number of variables, which cannot be met given the limited population size and number of fitness evaluations.

Surrogate modeling
One great challenge in solving many real-world optimization problems is that a single fitness evaluation (FE) can be computationally and/or financially expensive, since it requires time-consuming computer simulations or physical experiments [48]. For instance, computational fluid dynamics (CFD) simulations are used to estimate the quality of a design scheme in structural design, where a single simulation may take from minutes to hours [49][50][51]. Conventional model-free EAs cannot afford such expensive evaluations, as they typically require tens of thousands of real FEs. To overcome this challenge, surrogate-assisted evolutionary algorithms (SAEAs) have been developed, where computationally efficient models are introduced to replace the computationally expensive fitness evaluations [52][53][54][55]. Generally speaking, SAEAs are also a class of typical MBEAs, where the models are used in the fitness evaluation component.
In this section, we will present a brief overview of representative SAEAs of two types: the single-objective SAEAs and the multi-objective SAEAs.

Single-objective SAEAs
In expensive single-objective optimization, the surrogate model typically aims to approximate the objective function or a fitness function of a candidate solution x:

$$f^{*}(\mathbf{x}) = \hat{f}(\mathbf{x}) + \xi(\mathbf{x}), \qquad (8)$$

where f*(x) is the true objective or fitness value of the solution, f̂(x) is the approximated value, and ξ(x) is the error function, which reflects the degree of "uncertainty" of the model's approximation [56]. Model management plays a key role in making the best use of the surrogate models [57]. Existing model management strategies can be roughly classified into three categories, namely generation-based, population-based and individual-based strategies [52]. Most earlier model management strategies employ a generation-based method [58,59], where the key question is how to adapt the frequency at which the real fitness function is used. For example, Nair et al. used the average approximation error of the surrogate during one control cycle to adjust the frequency of using the real objective function [60]. In population-based approaches, more than one subpopulation co-evolves, each using its own surrogate for fitness evaluations, and the migration of individuals from one subpopulation to another is allowed. For example, Sefrioui et al. proposed a hierarchical genetic algorithm (HGA) using multiple models [61]. By contrast, individual-based model management [57,62] focuses on determining which individuals need to be evaluated with the real fitness function within each generation. The most straightforward criterion is to evaluate the solutions that have the best fitness according to the surrogate [57]. Emmerich et al. proposed a criterion that selects the solutions whose estimated fitness is the most uncertain [62,63], which can encourage exploration.

Fig. 3 An example illustrating the utility of the "uncertainty" information. The star point is the most uncertain candidate solution and the circle point is the optimal candidate solution found by the surrogate model, while the shaded area characterizes the "uncertainty" degree of the approximated function
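Individual-based model management can be sketched as pre-screening: a cheap surrogate ranks the candidate solutions, and only the k best-predicted ones are evaluated with the expensive function and added to the training archive. An inverse-distance-weighted (IDW) interpolator is used here purely as a stand-in surrogate; it is an assumption of this sketch, not the model used in [57], and the function names are hypothetical.

```python
import math

def idw_predict(x, archive, power=2.0):
    """Inverse-distance-weighted surrogate built from (solution, fitness) pairs."""
    num = den = 0.0
    for xa, fa in archive:
        d = math.dist(x, xa)
        if d == 0.0:
            return fa  # exact match with an already evaluated solution
        w = d ** -power
        num += w * fa
        den += w
    return num / den

def prescreen(candidates, archive, k):
    """Pick the k candidates with the best (lowest) predicted fitness."""
    return sorted(candidates, key=lambda x: idw_predict(x, archive))[:k]

def evaluate_selected(candidates, archive, k, true_fitness):
    """Spend only k expensive evaluations per generation and grow the archive."""
    chosen = prescreen(candidates, archive, k)
    for x in chosen:
        archive.append((x, true_fitness(x)))
    return chosen
```

Replacing the sort key with a measure of predictive uncertainty (for example, the average distance to the archived solutions) would turn this "best predicted" criterion into an exploration-oriented one of the kind described above.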
Usually, researchers tend to select "certain" candidate solutions to achieve relatively accurate predictions. However, some "uncertain" candidate solutions may be potentially good [64]. Fig. 3 gives an example of this situation: the most uncertain solution (the star point) corresponds to the global optimum of the problem, while the predicted best solution (the circle point) corresponds to a local optimum.
To estimate the uncertainty in fitness approximation, the average distance between a solution and the training data can be adopted [62]. Since Kriging models provide uncertainty information in the form of a confidence level of the predicted fitness [48], they have recently become very popular in SAEAs. To take advantage of the uncertainty information provided by Kriging models, various model management criteria, also known as infill sampling criteria in Kriging-assisted optimization, have been proposed, such as the probability of improvement (PoI) [65,66], the expected improvement (ExI) [67], the lower confidence bound (LCB) [64], and a heterogeneous ensemble-based infill criterion that enhances the reliability of ensembles for uncertainty estimation [68].
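For a Kriging (Gaussian) prediction with mean μ(x) and standard deviation σ(x), the three classic infill criteria have closed forms for minimization. Here f_min is the best fitness observed so far, z = (f_min − μ)/σ, and Φ and φ are the standard normal CDF and PDF: PoI = Φ(z), ExI = (f_min − μ)Φ(z) + σφ(z), and LCB = μ − κσ. The sketch below implements these textbook formulas with the standard library only; the weight κ = 2 is an assumed default.

```python
import math

def _pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, f_best):
    """PoI: probability that the prediction falls below the best value so far."""
    if sigma <= 0.0:
        return 1.0 if mu < f_best else 0.0
    return _cdf((f_best - mu) / sigma)

def expected_improvement(mu, sigma, f_best):
    """ExI: expected amount by which the prediction improves on f_best."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * _cdf(z) + sigma * _pdf(z)

def lower_confidence_bound(mu, sigma, kappa=2.0):
    """LCB: optimistic estimate; smaller values are more attractive."""
    return mu - kappa * sigma
```

With f_min = 1.0, a confident candidate (μ = 0.9, σ = 0.01) has an ExI of about 0.100, whereas an uncertain one (μ = 1.2, σ = 0.5) scores about 0.115. This echoes the Fig. 3 argument that uncertain solutions can be worth evaluating.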
Apart from the aforementioned single-objective SAEAs working in the context of genetic algorithms, there are also SAEAs built on other stochastic search methods, such as surrogate-assisted artificial immune systems [69], the neural network-assisted evolution strategy [57], the feasibility structure modeling-assisted memetic algorithm [70], the classification-assisted memetic algorithm [71], the surrogate-assisted cooperative particle swarm optimization [72], and the committee-based active learning surrogate-assisted particle swarm optimizer [73].

Multi-objective SAEAs
In recent years, computationally expensive MOPs have drawn increasing attention in the area of expensive optimization as they are difficult for existing SAEAs [74].
Different from single-objective SAEAs, which use surrogate models to approximate the objective function or a fitness function, multi-objective SAEAs can approximate a variety of different targets. The most intuitive idea is to approximate the objective functions by multiple surrogate models [59,75,76]. For instance, Singh et al. proposed a surrogate-assisted simulated annealing algorithm (SASA) for constrained multi-objective optimization [77], and Ahmed and Qin proposed a non-dominated sorting-based SAEA for multi-objective aerothermodynamic design [78]. Recently, Chugh et al. proposed a reference vector-guided surrogate-assisted evolutionary algorithm for solving expensive MOPs with more than three objectives [79,80], which was also applied to the design of an air intake ventilation system [81].
Another basic idea is to construct a single model of an aggregation function of the objectives. A typical algorithm is ParEGO, a hybrid algorithm with on-line landscape approximation [82], where the Kriging model is adopted as a surrogate of a scalarized fitness function, namely the augmented Tchebycheff function of randomly weighted objectives. Similarly, performance metrics can be used as the fitness function. In the S-metric selection-based SAEA (SMS-EGO), the Kriging model is used as a surrogate of the S metric (i.e., the hypervolume). By contrast, some SAEAs use classification-based surrogate models to learn the Pareto dominance relationship or the Pareto rankings [83]. In 2014, Bandaru et al. trained a multi-class surrogate classifier to determine the dominance relationship between two candidate solutions [84]. In 2015, Bhattacharjee and Ray proposed a support vector machine-based surrogate to learn the ranking of solutions for constrained multi-objective optimization problems [85]. Later, in 2017, Zhang et al. trained a classifier based on a regression tree or k-nearest neighbours (KNN) to distinguish good solutions from bad ones [86]. Recently, Pan et al. proposed a classification-based surrogate-assisted evolutionary algorithm (CSEA) to learn the dominance relationship between a candidate solution and a set of reference solutions [87].
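ParEGO's aggregation step can be sketched as follows: the normalized objective vector is weighted, and the largest weighted term is augmented by a small multiple ρ of the sum of weighted terms. The sketch assumes objectives already normalized to [0, 1], a two-objective weight lattice, and ρ = 0.05. It only illustrates the aggregation, not the Kriging model fitted to the resulting scalar values, and the function names are hypothetical.

```python
def augmented_tchebycheff(f, weights, rho=0.05):
    """Scalarize a normalized objective vector f (minimization)."""
    weighted = [w * fi for w, fi in zip(weights, f)]
    return max(weighted) + rho * sum(weighted)

def weight_lattice_2d(s):
    """Evenly spaced two-objective weight vectors (i/s, 1 - i/s), i = 0..s."""
    return [(i / s, 1.0 - i / s) for i in range(s + 1)]
```

For example, `augmented_tchebycheff((0.2, 0.8), (0.5, 0.5))` gives 0.4 + 0.05 × 0.5 = 0.425. Drawing a different weight vector from the lattice in each iteration is what lets a single scalar surrogate gradually cover the whole PF.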

Discussion
Different from the other two types of MBEAs, SAEAs are proposed for solving the computationally expensive optimization problems. They are effective in obtaining a set of acceptable candidate solutions with limited computational resources. Nevertheless, there are still some challenges to be addressed.
First, the choice of surrogate models is not straightforward. There are many different types of surrogates, but no simple rule for determining which type should be chosen. It is crucial to balance the fitting ability and the computational efficiency of a surrogate model for different problems; for example, a simple model may suffice for a simple problem, whereas a more powerful model is needed for a complex one.
Second, SAEAs also suffer from the curse of dimensionality. For example, the computation time for training a Kriging model can become unaffordable for high-dimensional data (in terms of both the dimensionality of each sample and the number of samples). It is necessary to use dimensionality reduction methods or more powerful surrogate models to deal with this issue.
Third, it is non-trivial to determine what should be predicted by the surrogate. This issue is more serious for expensive MOPs due to the existence of multiple objectives. It is interesting to design new features for distinguishing the quality of two candidate solutions, as more abstract features may help filter out local optima and thus improve the performance of SAEAs.
Fourth, the utilization of the "uncertainty" information should be further investigated. Since existing SAEAs mainly use a single type of "uncertainty" information, their performance may degrade sharply if the trained surrogate models share prediction biases in the same direction.

Summary
While evolutionary algorithms (EAs) have developed rapidly during the past two decades, model-based evolutionary algorithms (MBEAs) are attracting increasing interest. Rather than providing a comprehensive review of every single method in the literature, this survey tries to shed light on the different motivations for using models in EAs: estimation of distribution, inverse modeling, and surrogate modeling. Among the three types of MBEAs, estimation of distribution algorithms (EDAs) and surrogate-assisted evolutionary algorithms (SAEAs) are more widely studied and applied, while inverse modeling-based EAs are still in their infancy.
From the machine-learning point of view, data and models are as essential to MBEAs as they are to machine-learning algorithms, and the working mechanism of MBEAs is twofold. On the one hand, the models are iteratively built and updated using the candidate solutions and their fitness values as training data. On the other hand, the models are iteratively sampled to generate new candidate solutions, i.e., reproduction. Therefore, it is very important that suitable models are trained using suitable candidate solutions, and there is still much to be studied along this direction.
To the best of our knowledge, this is the first survey of MBEAs in the literature. We hope that it not only helps readers better understand how models enable EAs to learn, but also provides a systematic taxonomy of the related methods in this field.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.