An adaptive model switch-based surrogate-assisted evolutionary algorithm for noisy expensive multi-objective optimization

To solve noisy and expensive multi-objective optimization problems, there are only a few function evaluations can be used due to the limitation of time and/or money. Because of the influence of noises, the evaluations are inaccurate. It is challenging for the existing surrogate-assisted evolutionary algorithms. Due to the influence of noises, the performance of the surrogate model constructed by these algorithms is degraded. At the same time, noises would mislead the evolution direction. More importantly, because of the limitations of function evaluations, noise treatment methods consuming many function evaluations cannot be applied. An adaptive model switch-based surrogate-assisted evolutionary algorithm is proposed to solve such problems in this paper. The algorithm establishes radial basis function networks for denoising. An adaptive model switch strategy is adopted to select suited surrogate model from Gaussian regression and radial basis function network. It adaptively selects the sampling strategies based on the maximum improvement in the convergence, diversity, and approximation uncertainty to make full use of the limited number of function evaluations. The experimental results on a set of test problems show that the proposed algorithm is more competitive than the five most advanced surrogate-assisted evolutionary algorithms.


Introduction
Nowadays, there are many practical engineer problems that can be modeled as noisy and expensive multi-objective optimization problems (MOPs), such as the gait optimization of autonomous snake robot. It needs to maximize the speed and head stability of the sidewinding gait of a snake robot [1]. Because the snake robot achieves locomotion by twisting its  [2], its speed and head stability are a pair of conflicting objectives. The objective values can only be observed through experiments, so the cost of observations is expensive. Due to the influence of environment and robot gait, it is difficult to get accurate objective information from the speed and vision sensors. Therefore, this optimization problem can be modeled as an expensive MOP with independent noises on each objective.
MOPs usually have multiple conflicting objectives to be optimized at the same time. Evolutionary algorithms (EAs) [3] are widely used to solve MOPs because of some advantages. Those advantages include that obtaining a set of non-dominated solutions in one run [4], simultaneously handling problems with multiple local and non-convex Pareto fronts [5]. Therefore, a large number of multi-objective evolutionary algorithms (MOEAs) have proposed for MOPs in recent decades [6]. These MOEAs usually are divided into three types: algorithms based on Pareto domination, indictors and decomposition. The first type of MOEAs select candidate solutions by the Pareto dominance. For example, nondominated sorting genetic algorithm-II (NSGA-II) [7] uses the dominance relationship and crowding distance as criteria for its selection. The second type of MOEAs assign a fit-ness to each solution by a suitable indicator. Indicator-based selection evolutionary algorithm (IBEA) [8] and Hypervolume (HV) estimation algorithm (HypE) [9] use I + [10] and HV [10] as their indicators, respectively. The third type of MOEAs apply a series of vectors to decompose an MOP into multiply single objective optimization problems (SOPs). For instance, multi-objective evolutionary algorithm based on decomposition (MOEA/D) [11]. It is noteworthy that existing MOEAs need more than ten thousands of function evaluations [12]; however, there are only a few hundred function evaluations can be used due to the limitation of money and(or) time cost(s) in expensive MOPs [13].
Recently, one popular method to solve expensive MOPs is to use surrogate models [6,14] (also called meta-models [15]) to replace the real function evaluations in EAs. Specifically, modeling methods are applied to the existing data set to build a computationally cheap surrogate model to replace the expensive function evaluation, so as to reduce the computation budget. These MOEAs assisted by the surrogate are called as multi-objective surrogate-assisted evolutionary algorithms (SA-MOEAs) [16]. To improve the reliability of the surrogates and improve the optimization performance, SAEAs usually need corresponding sampling criteria which aim to select suitable solutions for evaluating. They usually need to take convergence, diversity and approximation uncertainty for expensive MOPs into consideration [13,17]. The methods to construct surrogates include response surface method (RSM) [18], Gaussian process (GP) [19], artificial neural networks (ANN) [20], support vector machines (SVM) [21], radial basis function networks (RBFN) [22]. According to the methods of constructing surrogates, these SAEAs can be divided into two classes: 1. Regression-based SAEAs: These SAEAs use surrogates to approximate objective functions, indictors, or scalar functions, etc. The examples of the first category include Kriging-assisted two-archive evolutionary algorithm (KTA2) [17], a surrogate-assisted particle swarm optimization algorithm with an adaptive dropout mechanism (ADSAPSO) [23] Kriging assisted reference vector guided evolutionary algorithm (KRVEA) [5]. KRVEA uses angle penalized distance (APD) [5] to select the solutions to keep the convergence and diversity of population. However, when the number of inactive fixed reference vectors, i.e., the reference vectors without any associated solution, the increasing number of those reference vectors between generations reaches a threshold, it would select the solutions with uncertainty. KTA2 considers convergence, diversity, and approximation uncertainty in its sampling criterion, which analyzes the optimization process need on convergence, diversity and approximation uncertainty then adaptively selects a sampling strategy based one aspect. ADSAPSO adopts the dominance relationship and crowding distance as the criteria sampling, which takes convergence and diversity into consideration. The examples of the second category include neural networks assisted steady state indicator-based evolutionary algorithm (NN-SS-IBEA) [24]. In the sampling criteria of NN-SS-IBEA, it chooses individual with a high HV contribution which considers convergence and diversity. The examples of the third category include Pareto based efficient global optimization (ParEGO) [25], efficient global optimization-assisted multi-objective evolutionary algorithm based on decomposition (MOEA/D-EGO) [26]. ParEGO and MOEA/D-EGO select the solution which has the best expected improvement (EI) [27] of the aggregation function to sample, which takes both convergence and uncertainty into account. 2. Classification-based SAEAs: These SAEAs make use of the existing data to train a classifier. It is used as the surrogate model to select candidate solutions as their function evaluations, such as classification and Paretobased MOEA (CPS-MOEA) [28], classification based on surrogate-assisted evolutionary algorithm (CSEA) [29]. CPS-MOEA selects the solutions with good fitness which mainly considers convergence. As there is no solution with good fitness, CPS-MOEA selects the solutions for function evaluations randomly which takes the approximation uncertainty into account. CSEA selects nondominated solutions with a high credibility level and dominated solutions with a low credibility level for function evaluations which takes convergence and approximation uncertainty into account. The main differences between CPS-MOEA and CSEA include their surrogates and sampling criteria. CPS-MOEA applies a classification tree as its surrogate but CSEA uses feedforward neural network (FNN) [29] as its surrogate. CPS-MOEA selects solutions with good fitness by surrogate but CSEA selects solutions by comprehensively considering the dominance relationship and credibility.
Noises would lead that the objective function cannot be accurately evaluated. Thus, the optimization direction may be misled [30]. To improve the performance of the MOEAs, we need to take some extra methods for dealing with noises for noisy MOPs. At present, the approaches for handling noises in MOEAs can be compartmentalized into three categories [31] as follow: 1. Averaging: The average value of all solutions after independently sampling several times is taken as the objective values after denoising. With this approach, resampling n times, then the standard deviation of the objectives would decrease √ n times [32]. This is a common method for decreasing noises but it requires many function evaluations [33][34][35]. For example, noise-tolerant strength Pareto evolutionary algorithm (NTSPEA) [35], each solution is assigned resampling times according to the number of dominated solutions in the population. 2. Ranking: The ranking methods are used to decrease noises include two types: ranking based on probability [36][37][38][39] and clustering [40]. For the first one, constructing a model with confidence level which is applied to predict the dominated probability between solutions with noises. For example, nondominated sorting genetic algorithm-II-A (NSGA-II-A) [37] constructs an SVM model with a confidence level by α dominance to predict the relationship between solutions with noises. The other one, the average and standard deviation of each solution are calculated by resampling, and the clustering radius is obtained by the calculation results, so as to select more reliable solutions. For instance, modified nondominated sorting genetic algorithm-II (M-NSGA-II) [40] uses this method to solve noisy MOPs. 3. Modeling: This approach considers that noises may has seriously influence on one solution but on a model can be minimal. Therefore, it uses the noisy data to construct a regression model for denoising. For example, regularity model [31], restrict Boltzmann machine [41], [42] and univariate marginal distribution algorithm [43]. They all handle noises through the idea of modeling which brings a new direction for denoising.
At present, there are some MOEAs that combine the above methods for denoising, such as Filters-based NSGA-II (FNSGA-II) [44]. It uses resampling, mean and Wiener filters for denoising, where resampling mean filter belong to averaging and Wiener filters belong to modeling. However, there are many MOEAs to solve either expensive MOPs or noisy MOPs. For noisy and expensive MOPs, these MOEAs cannot achieve satisfactory performance. According to the above analysis, we find that the MOEAs for expensive MOPs have poor performance for noisy MOPs due to the influence of noise, and the MOEAs for noisy MOPs which need a large number of evaluations cannot meet the computation budget limitation [31,33]. To solve this type of MOPs, we need to face three main challenges. First, because of the influence of noises, the reliability of surrogates would be reduced. Second, due to the limited number of function evaluations, some existing denoising methods are invalid. Third, most sampling criteria cannot make full use of limited function evaluations. Therefore, we propose an adaptive model switch-based surrogate-assisted evolutionary algorithm (MS-MOEA) in this paper. We adopt IBEA as the basic optimizer, where I + and L p -norm distance [45] are adopted as indicators to select sampling solutions and the GP and RBFN models are used as the surrogates to approximate objective functions. The main contributions of this work as follow: • A noise treatment method is proposed for decreasing the influence of noises. The approach divides all solutions into two sets: one of which all solutions greatly affected by noises, the other contains all remaining solutions. When the number of solutions in the first set reaches a certain scale, then an RBFN model is constructed by all data, and the data is smoothened by the model for denoising. • An adaptive model switch approach is designed for the proposed algorithm. Either the RBFN or GP model is chosen as the surrogate according to whether the data set is denoised or not. • We design an improved environmental selection for the MS-MOEA. It combines different selection methods to select individuals according to the impact of noise on the population. • We propose an adaptive sampling strategy in this paper.
This strategy includes three sampling methods and an approach to adaptively switch between the sampling methods.
The rest of the paper is organized as follows. An introduction to noisy MOPs, IBEA, GP and RBFN is given in the next section. The details of the proposed algorithm for noisy and expensive MOPs are described in the following section. The experimental settings, results and analysis are presented in the next section. The conclusions and future work are concluded in the last section.

Noisy multi-objective optimization problems
We consider the mathematical expression of MOPs is as follows: where x is decision vector and d is the dimension of decision vector. F(x) is the objective function which has m optimization objectives (m≥2). Since the objectives usually are conflicting, many solutions cannot be compared by objective values. The concept of the Pareto dominance [46] is to compare the solutions of MOPs. The MOPs that we need to solve have noises in the objective space, so the expression of the optimization objectives is shown in Eq. (2).
where F(x) is our optimization objectives, but we can only observe T (x). In this paper, we assume the noises are independent additive noises. β meets the Gaussian distribution. I is an identity matrix.

Indicator-based selection evolutionary algorithm
The evolutionary search of expensive MOPs need the assistance of surrogate models, which means that the approximation error of the model would have a great impact on the search process. IBEA + uses I + as selection criterion rather than the Pareto dominance relationship. I + is the minimal difference that the dominance relationship between two solution sets become one weakly dominating the other: where A and B are solution sets, respectively, and m is the number of objectives. I + (A, B) represents the minimum distance of each objective dimension when the dominance relationship between A and B is to be transformed into weak dominance. Compared with the Pareto dominance, there are two advantages of I + are as follows: it does not only represent the natural extension of the Pareto dominance but also transforms multiple objectives into one indictor which reduces the approximation errors of surrogate models to a certain extent. When the dominance relationship is adopted for ranking, it is necessary to ensure that the prediction objective error does not affect the size relation of each two individuals in every objective. However, I + can reduce the approximation errors of surrogate models to a certain extent and the reason as follow. If A (or B) only contains one solution x 1 (or x 2 ), the Eq.
(3) can be written as Thus, when I + is adpoted as the indicator, it is only necessary to ensure that the prediction objective error of the selected surrogate model does not affect the size relation of each two individuals in one objective. Therefore, the adaptive IBEA + is adopted as basic optimizer in this work. IBEA + uses I + as the indicator to assign the fitness for each individual in the population. The binary tournament selection is used to select the parents. It makes use of crossover and mutation to generate offsprings. The pseudo code of adaptive IBEA + is shown in Algorithm 1. According to the above analysis, it is proved that I + can be directly used in fitness calculation. IBEA + adopts Eq. (5) to assign fitness for individuals: where k represents a scaling factor (k > 0), c is defined to improve the robustness of the algorithm. Due to the dominance maintaining of I + . Therefore, the better solution would be allocated higher fitness, and them are easier to be selected in the population iteration. Because the size of population is fixed, we need to delete some individuals by the fitness. Therefore, when the composition of the population changes, the fitness of every individual need to be updated. The Eq. (6) is used to update the where W is the individual that needs to be deleted, other parameters same like Eq. (5). RBFN [47] is a type of artificial neural network which usually has three layers: an input layer, a hidden layer and an output layer. The structure of RBFN, as shown in Fig. 1. In the RBFN, each neuron of the hidden layer is a radial basis function that takes the distance from the input point to the corresponding center point as the input variable. The calculation formula of the RBFN output is shown in Eq. (7):

Radial basis function network
where N represents the number of neurons in hidden layer, w i is the weight of ith the hidden layer neuron and the output layer, φ i represents the output of ith the hidden layer and its input (|x − c|) is the distance of input point and ith center point. Bias is the bias of the whole network.
There are four types of kernel functions commonly used in RBFN: Gaussian, reflected sigmoid, multiquadric and inverse multiquadric kernels [48]. In this work, we use the most commonly used Gaussian kernel function as the radial basis function, as shown in Eq. (8): where r = |x − c|, σ is the spread rate used to control the smoothness of RBFN which decides the range of central point.
In this paper, generalized least square method [49] is used for calculating the weight of radial basis function network, as shown in Eq. (9). It is assumed that there are K training points X = [x 1 , x 2 , . . . , x K ] and Y = [y 1 , y 2 , . . . , y K ]. Because the decision space dimension of the MOPs is d and the objective space dimension is m, it is necessary to construct m RBFNs to replace all objective functions: where Q is the number of neuron in hidden layer, K represents the number of training data, φ i j respresents the output of the jth hidden layer neuron at sample x i . Q and the central point are determined by K-means clustering algorithm [50].
Taking the Gaussian kernel as an example, the farther the input vector from the center of RBF, the lower the activation degree of the node is, so the RBFN has the characteristics of local fitting [47]. In addition, the training speed of RBFN is fast and the amount of calculation is small, which is one of the advantages of using RBFN as a surrogate model.

Gaussian process
Gaussian process is a excellent performance modeling method. It cannot only provide the predicted value of the points needed to be predicted, but also provide the uncertainty of points. Therefore, GP has been widely used in SAEAs. In this paper, we adopt DACE [51] to construct the GP model.
It is assumed that there are K training points X = [x 1 , x 2 , . . . , x K ] and Y = [y 1 , y 2 , . . . , y K ] in data set S. Because the decision space dimension of MOPs is d and the objective space dimension is m, it is necessary to construct m GPs to replace all objective functions. The Gaussian process model based on data set S satisfies Eq. (10): where μ(x) is the predicted value of x by the model, α(x) represents an independently Gaussian distribution with 0 as the mean value, σ as the standard deviation. All training samples need to be trained to construct GP. For any two training samples x i and x j , the two random processes α(x i ) and α(x j ) are defined as Eq. (11): where R represents the correlation matrix consisting of the correlation function values of all samples. For any Surrogateassisted evaluaƟons 6 20 generaƟons ? Return 6 Y N 8 Offspring generaƟon 6 To improve the performance of the surrogate model on the data set S, we need to maximize the likehood function as shown in Eq. (12) by optimizing the hyperparameters: where 1 respresents a identity matrix of K * 1. y = Y T . To maximize the likehood function, the following conditions of μ and σ should be satisfied, as shown in Eq. (13): Finally, the calculation formula of predicted values and variance is shown in Eq. (14): where resents the correlation matrix consisting of the forecast point and all sample points. It is noteworthy that the computational complexity of training a GP is O(N 3 ) [52] and the algorithms for optimizing the hyperparameters θ would increase it. Therefore, GP is not suitable for the high-dimensional problems.

Proposed algorithm
In this work, an adaptive model switch-based SAEA (MS-MOEA) is proposed for noisy expensive multi-objective optimization. In this SAEA, we adopt IBEA + as the basic optimizer, an adaptive model switching strategy between RBFN and GP to approximate objective functions, an infill sampling criterion to select solutions for real function evaluations. The framework of the proposed MS-MOEA is shown in Fig. 2. From the figure, MS-MOEA can be divided into six main steps.
1. Initialization: An initial population P 0 in the size N is generated by Latin hypercube sampling [53]. All individuals in the P 0 have been evaluated by the real objective functions T (x) which are expensive and noisy, then the individuals of P 0 would be added to the archive A. 2. Noise treatment: A solution with the best fitness is sampled several times, and the average value is calculated as the actual objective value of the solution each objective. To ensure the convergence of population, we set the number of generations assisted by surrogates to 20. This parameter is set according to KRVEA [5]. 6. Adaptive sampling: By analyzing the improvement of population convergence and diversity, then we adopt an adaptive sampling approach to switch in three strategies based on convergence, diversity, and uncertainty. The sampled solutions would be evaluated by real objective functions, then the results of evaluations would be collected into archive A. 7. Repeat Steps 2-6 until the number of function evaluations is met.
In the following sections, we will discuss the details of Preliminaries, Proposed Algorithm, Empirical studies, and Conclusions.

Noise treatment
As mentioned in "Noisy multi-objective optimization problems", we need to handle the noises in MOPs. For the minimization MOPs, noises have slight influence to the population in the early stage of optimization, because the population has not converged and the objective values are relatively large. In addition, taking the view into consideration, this view holds that when the impact of noises is slight that is helpful to generate better solutions, but when it is too big that evolutionary search degenerates into random search [30]. Therefore, in order to reduce the influence of noises, we propose a method to estimate the impact of noises and a method to denoise.
According to the above analysis, a method is designed to estimate the influence of noise which divides the whole stage of optimization into early and late stages. When the algorithm is in the early stage of optimization, it means that the noise has little influence on the population, so the noise is not handled. Conversely, the algorithm would construct RBFNs with original data for all objectives and use the regression value as denoised objective value. The method of estimating the influence of the noise is to use the fitness of I B E A + to sort the individuals in the initial population and selects the individual with the best fitness to resample 5 times by real objective functions. Then, we calculate the average value and find out the absolute value of maximum noise E max by the average value and all sample values for each objective. Next, the Nth individual sorted is for the estimation. Then the algorithm calculates the ratio of E max to objective values and records it as λ in each objective, where N is the population size. Finally, we need to compare λ with the threshold of noises δ. As long as there is an objective λ < δ, it means that the algorithm is in the early stage of optimization, and the algorithm sets the noise flag to 0. Conversely, the algorithm would set the noise flag to 1, and once the noise flag is 1, the noise estimation method is no longer executed.
The output of RBFN consists of the product of all hidden layer neurons and corresponding weights, plus network bias. The output of all hidden layer neurons is controlled by the distance between input point and hidden layer center point. It is beneficial for denoising, because the correlation of objective values are affected by the distance in the decision space and the noises only exists in the objective space. In addition, The effect of denoising method depends on the number of center points and the value of the spread rate.

Model switch strategy
For the minimization MOPs, the noises in the early stage of optimization has little influences. Therefore, we can ignore the influence of noises for evolutionary process in the early stage of optimization. GP is a surrogate model with excellent performance which cannot only provide the prediction, but also provide the prediction error as the approximation uncertainty. It is extremely important in the early stage of evolution especially when the population is trapped in the local optimum. However, when the noises have non-ignorable influence, GP cannot handle noises well due to the fitting of GP to the objective is overfitting. Thus, It is not appropriate to use GP as the surrogate model in the late stage of evolution. RBFN has the characteristics of local fitting can smooth the noisy data, so RBFN as the surrogate model in the late stage of optimization can achieve better performance. Based on the above analysis, we propose a adaptive model switch strategy between GP and RBFN based on the influence of noises. If the noise flag is 0, it means that algorithm need to concentrate on the convergence of population and approximation uncertainty so the GP surrogate model would be constructed for MS-MOEA. If the noise flag is 1, it means that algorithm needs to concentrate on the influence of noises so the RBFN surrogate model would be constructed for MS-MOEA.

Improved environmental selection
In the offspring generation, we need to select next parent population by the environmental selection. It is very important to consider the convergence and diversity of population in environmental selection. However, the basic optimizer IBEA + cannot balance both aspects which is not very effective in maintaining diversity [17]. Therefore, we proposed an improved environmental selection. It can be seen from the above analysis that the evolutionary stage of the population can be divided into the early and late stages according to the noise flag. When the population is in the early stage, the population has not converged, it is important for environmental selection to pay attention to convergence. At the same time, considering that the population may fall into the local optimum, environmental selection needs to take uncertainty into account. Therefore, we adopt environmental selection based on convergence and uncertainty, respectively. Where convergence selection depends on the I + indictor and uncertainty selection depends on the prediction error of GP. All individuals selected by environmental selection are regarded as new parental populations. When the population is in the late stage of evolution, the population has converged well. To get better solutions, we need to improve the diversity of population. Thus, we adopt environmental selection based on convergence and diversity, respectively. Where diversity selection depends on dominance relationship and the L p -norm-based distance [54]. All individuals selected by environmental selection are regarded as new parental populations. For details of the improved environment selection are shown in Algorithm 2.

Adaptive sampling approach
Different sampling approaches can bring different improvements to the population or surrogate model. For example, the convergence sampling and diversity sampling approaches are helpful to improve the performance of population in convergence and diversity, while the uncertainty sampling approach is helpful to improve the global approximation accuracy of surrogate models. Because the MOPs studied in this work are expensive, we need to make good use of limited function evaluations and different sampling approaches. Therefore, an adaptive sampling approach is proposed for MS-MOEA. The procedure of adaptive sampling as shown in Fig. 3.
The idea of adaptive sampling approach is through preserve the populations that obtained by the convergence selection and diversity selection last time in the adaptive sampling are recorded as P to P g c , the convergence sampling would be adopted to sample. When there is not an improvement from P g−1 c to P g c , the algorithm would use the noise flag to judge the evolutionary stage of the population to make a choice from adopting the diversity sampling directly and comparing the improvement of diversity. If there is an improvement from P g−1 d to P g d , the diversity sampling would be adopted. When there is not an improvement from P g−1 d to P g d , the approach would choose uncertainty sampling.
As for the methods to judge the improvement of convergence and diversity, we adopt two indicators to measure. To measure the improvement of convergence, we calculate the distances from all solutions to the estimated ideal point. To measure the improvement of diversity, we calculate the pure diversity (PD) [55] for each population. The calculation formula of PD is shown in Eq. (15) where dis(s, s i ) is the L p norm between individual s and individual s i . d(s, P) is used to obtain the minimum L p norm. P D is an indicator to measure the diversity of population which does not require any reference, nor any parameters.
For the judgement results, the algorithm uses a specific sampling approach to select η individuals for expensive function evaluations: 1. Convergence sampling approach deletes the individual with minimum fitness calculated by I + in P g c until remaining η individuals in P g c . The details can be found in Algorithm 3. 2. Diversity sampling approach selects the nondominated individual with maximum contribution to the PD of P g d until the number of the selected individuals equals to η. The details can be found in Algorithm 4. 3. Uncertainty sampling approach randomly selects χ solutions from P g d . Next, the solution with the maximum uncertainty would be selected in χ solutions. The approach would be repeated η times and η samples are used to evaluate. The details can be found in Algorithm 5.

Empirical studies
To verify the performance of the proposed algorithm, we compare it with five SAEAs. The five algorithms include KRVEA [5], ParEGO [25], MOEA/D-EGO [26], CSEA [29], and KTA2 [17]. The characteristics of these algorithms are summarized as follows: • KRVEA: The reference vector guided evolutionary algorithm (RVEA) [56] is adopted as the basic optimizer. The Kriging model is used to approximate each objective in KRVEA. It uses APD to select solutions for evaluating, but when the diversity of population decreased, the solutions with maximum uncertainty would be selected. • ParEGO: It uses an aggregation function to scalarize the MOP into an SOP using a randomly chosen weight vector for each generation.The Kriging model is used to approximate the scalar function. It selects the solution with the best EI for evaluating. • MOEA/D-EGO: MOEA/D [11] is used as the basic optimizer. It uses a fuzzy clustering method to construct multiple GPs as local models. For each SOP, the solution with the best EI would be selected. • CSEA: It belongs to the SAEAs based on classification.
The existing data is used to train a feed-forward neural networks (FNN) [57] to classify solutions with more potentiality to evaluate. • KTA2: Two_Arch2 [54] is adopted as the basic optimizer.
It uses an influential point-insensitive Kriging model [17] to approximate each objective function. In addition, it adopts an adaptive sampling strategy to select solutions to evaluate.
This set of the experiments are conducted on DTLZ problems with two, three objectives. In the experiments, the number of decision variables of DTLZ1 and DTLZ3 is set to seven. The number of decision variables of other test problems is set to twelve. All algorithms use the Wilcoxon rank-sum (WRS) test to compare the results at a significance level of 0.05 over 30 independent runs. "+", "−" and "≈" are used to compare the proposed algorithm and compared SAEAs. Symbols "+", "−" denote that MS-MOEA performs significantly better and significantly worse than the compared SAEAs. Symbol "≈" indicates that there is no statistically significant difference between MS-MOEA and the compared SAEAs.
The inverted generational distance (IGD) is adopted for all experiments as performance indicator. IGD is an comprehensive indicator that can consider both convergence and diversity of the nondominated solutions. IGD is defined as follows: where PF is a set of points uniformly distributed on the real Pareto front and |PF| represents the number of elements in PF. For all test problems, the number of reference points for calculating IGD is set to 10,000. P is the nondominated solution set obtained by the algorithm. d min (x, P) is the minimum Euclidean distance from solution x in the PF to the solution in the P.

Parameter settings
To ensure the fairness of the experiments, the common parameter setting of all compared SAEAs are listed below: 1. The initial population size N and the number of initial data are set to 100. 2. The number of expensive function evaluations is set to 300. 3. The variance of additive noise is set to 0.2.
In addition, the other parameter settings of the five compared SAEAs are set according to the recommended settings in the original literature. The key parameters of the proposed algorithm are set as follows: 1. The fitness scaling factor of IBEA + k is set to 0.05. 2. The number of individuals for sampling η is set to 5.

The hyperparameter range of Matlab Toolbox DACE for
GP model is set to [10 −5 , 10 5 ]. We adopt the algorithm raised by Hookes and Jeeves [58] to optimize the hyperparameters. 4. The number of center points of RBFN hidden layer is set to ceil √ size(D), D is the number of the training data. 5. The spread rate σ is set to 2*Dis, Dis is the maximum distance between all center points. 6. The number of model assisted iteration w max is set to 20. 7. The threshold of noise δ is set to 0.03. 8. The parameters of crossover and mutation are set to p c = 1, η c = 20, p m = 1, η m = 20.
MS-MOEA has a parameter δ which directly affects the performance of noise treatment. It is used for estimating the influence of noise. Therefore, we set an experiment on DTLZ1 and DTLZ2 to select a proper value for the threshold of noise δ. DTLZ1 is a hard-to-converge test function. It is usually difficult to converge to the late stage of evolution when the noise has non-ignorable impact. DTLZ2 is an easy-to-converge test function. It is easy to converge to the late stage of evolution. That is the reason why we select them as the test functions. To better compare the effects of different noise thresholds on the performance of the proposed algorithm, we take a set of δ = {0.0003, 0.003, 0.03, 0.13, 0.23, 0.33, 0.43, 0.53} for experiments. For the convenience of observation, we set the x-axis coordinate to lg(x). Figure 4 is the result of DTLZ1 and DTLZ2 with two objectives which is executed by 30 runs independently. From Fig. 4, we can discover that δ determines the sensitivity of noise estimation method. The smaller this parameter δ is set, the more sensitive it is to noises. From Fig. 4a, we can discover this conclusion. When δ ≥ 0.03, it gets the familiar results in DTLZ1. When δ < 0.03, the algorithm performance is significantly reduced. The larger this parameter δ is set, the more non-sensitive it is to noises. From Fig. 4b, we can discover the conclusion. When δ = 0.03, it gets the best results in DTLZ2. When δ is set be larger, the noise needs to cause greater impact on the population before denoising. For keeping the suitable sensitivity of noise estimation method, δ is set to 0.03.

Effect of noise treatment
To test the performance of the proposed noise treatment, we design an experiment that selects DTLZ2 and DTLZ5 as the test problems. Because DTLZ2 and DTLZ5 belong to the test problems that are sensitive to noises. Every test problem would be tested 30 runs independently. MS-MOEA and the version without noise treatment, termed N-(MS-MOEA), are compared. To ensure the fairness of the experiment, the objective values without noises are used in the calculation of IGD, and the real objective values do not participate in population iteration. The results are shown in Table 1.
The experimental results are as expected, MS-MOEA have achieved the best results on all test problems. It proves the effectiveness of our proposed noise treatment method. Noises in the objective space have a direct impact on the dominance relationship between individuals in the population. It would has an adverse impact on the evolutionary search of the population, and even degenerate into random search. Therefore, when the noises has a large influence on the objective values, it is very necessary to deal with the noises.

Effect of model switch strategy
To discuss the role of model switch in MS-MOEA, we set up the following experiment. This experiment adopts DTLZ1, DTLZ2 and DTLZ5 as the test problems. Each problem would be tested 30 runs independently. According to the different models used in the experiment, the experimental variables are recorded as (MS-MOEA)-GP, (MS-MOEA)-RBFN and MS-MOEA. To ensure the fairness of the experiment, the objective values without noises are used in the calculation of IGD, and the objective values do not participate in the optimization process. In addition, the frequency of usage of the two surrogate models in MS-MOEA is used to further explain the experimental results. The meaning of FG and FR is shown below. The results are shown in Table 2.
• FG: It is the frequency of the usage of GP model. • FR: It is the frequency of the usage of RBFN model.
According to the results of experiment, we discover that the model switch strategy can find better surrogate model between GP and RBFN. It also proves that selecting an appropriate model according to the influence of noise on the objective values is conducive to evolutionary optimization. In addition, we can find that the performance of MS-MOEA and (MS-MOEA)-GP is similar in DTLZ1, it is because DTLZ1 is a hard-converging test function. Therefore, it is always in the early stage of the evolution process, when the impact of the noise is little. That is why the RBFN is never used in DTLZ1. In addition, we can discover the performance of using GP as the surrogate model is better than using RBFN as the surrogate model from the experimental results. The performance of MS-MOEA and (MS-MOEA)-RBFN is similar in DTLZ2 and DTLZ5, because both of them are sensitive to the noise. Therefore, they are always in the late stage of the evolution process, when the impact of the noise is non-ignorable. That The best result in each problem is bold. The symbols "+", "−" and "≈" indicate that MS-MOEA is statistically significantly superior to, inferior to, and almost equivalent to N-(MS-MOEA), respectively (the significance level is 0.05)

Table 2
Results of ( The best result in each problem is bold. The symbols "+", "−" and "≈" indicate that MS-MOEA is statistically significantly superior to, inferior to, and almost equivalent to the others, respectively (the significance level is 0.05) is why the GP is never used in DTLZ2 and DTLZ5. In addition, we can discover the performance of using RBFN as the surrogate model is better than using GP as the surrogate model from the experimental results. The results can prove that our model switch strategy can skillfully use the influence of noise to select the appropriate surrogate model for the evolutionary optimization.

Effect of improved environment selection
To verify the role of the improved environment selection, we design an experiment. The conventional environment selection methods used in different evolutionary stages in the algorithm are compared. This experiment adopts DTLZ1, DTLZ2, DTLZ5 and DTLZ7 as the test problems. Each problem would be tested 30 runs independently, too. To ensure the fairness of the experiment, the objective values without noises are used in the calculation of IGD, and the real objective values does not participate in all the compared algorithms. The results are shown in Table 3. We can also find that (MS-MOEA)-C+C and (MS-MOEA)-U+C have similar performance in DTLZ2 and DTLZ5. This is mainly because DTLZ2 and DTLZ5 are sensitive to noises. Therefore, the environmental selection in the late stage plays an important role. As for the reason why MS-MOEA is not significantly better than other algorithms on DTLZ2 and DTLZ5 is that the noise treatment can only deal with noises to a certain extent. Some objectives of DTLZ7 are sensitive to noise and some objectives are insensitive. Our improved environment selection uses different environmental selections for different evolutionary stages. Therefore, it has achieved satisfactory results. In summary, the selection considering convergence, diversity and uncertainty are the key to achieve the most satisfactory performance of MS-MOEA.   The best result in each problem is bold. The symbols "+", "−" and "≈" indicate that MS-MOEA is statistically significantly superior to, inferior to, and almost equivalent to the others, respectively (the significance level is 0.05)

Effect of adaptive sampling approach
To test the performance of the adaptive sampling approach in MS-MOEA, MS-MOEA is compared with MS-MOEA without adaptive sampling approach that is regarded as (MS-MOEA)-N. This experiment adopts DTLZ1, DTLZ2 and DTLZ7 as the test problems. Each problem would be tested 30 runs independently. To ensure the fairness of the experiment, the objective value without noises is used in the calculation of IGD, and the objective value does not participate in the optimization process. The results are shown in Table 4. By analyzing the experimental results, we discover that the algorithm with adaptive sampling approach have better performance than the other one. Because the adaptive sampling can find the improvement of convergence and diversity and use targeted sampling approach to select solutions to evaluate. It is beneficial to make full use of the limited function evaluations. It is the adaptive sampling approach that makes MS-MOEA get the best results on all test problems.
To study the usage of different sampling approach in every test function, we summarize the results of 30 independent runs about them and these results are displayed in Fig. 5. It is easy to be discovered that the convergence sampling approach is most frequently used in DTLZ1, the uncertainty sampling approach is next and the diversity sampling approach is the last. DTLZ1 is a multi-modal test function, so it is difficult to converge and easy to fall into a local optimum. In DTLZ2, the diversity sampling approach is also more adopted. Because DTLZ2 is an easy to converge test function, it needs to take both convergence and diversity into account. The most frequently adopted sampling approach in DTLZ7 is the diversity sampling approach. The real PF of DTLZ7 is irregular. Therefore, the algorithm needs to pay attention to the maintenance of diversity. According to Fig. 5 and Table 4, our adaptive sampling approach can select appropriate sampling approach from the convergence approach, the diversity approach and the uncertainty approach based on the characteristic of the test functions.

Comparative experiments
To test the overall performance of the proposed algorithm, we employ KRVEA, ParEGO, MOEA/D-EGO, CSEA and KTA2 as compared algorithms and carry out experiments on the test problems DTLZ1∼7. This paper uses the compared algorithms implemented in PlatEMO [59] to test. The number of objectives includes two and three in this experiment. IGD is used as performance indicator for 14 test problems, and each problem would be tested 30 runs independently. For ensuring the fairness of the experiment, the objective values without noises is used in the calculation of IGD, and the The best result in each problem is bold. The symbols "+", "−" and "≈" indicate that MS-MOEA is statistically significantly superior to, inferior to, and almost equivalent to the others, respectively (the significance level is 0.05)  Table 5.
From the experimental results, we can observe that MS-MOEA obtains the best results of 12 test problems; ParEGO obtains the best results of 1 test problems; KTA2 obtains the best result on 1 test problems; KRVEA, MOEA/D-EGO and CSEA do not get the best results on any test problem. Therefore, MS-MOEA has achieved the best performance between all compared algorithms on the DTLZ problems. The final non-dominated solutions obtained by six algorithms on two objectives DTLZ2 in the run associated with the median IGD values are plotted in Fig. 6. In addition, Fig. 7 shows the final non-dominated solutions obtained by six algorithms on three objectives DTLZ1 in the run associated with the median IGD values.
According to the characteristics of DTLZ series test problems, all the above test problems can be divided into three categories: 1. DTLZ1 and DTLZ3 are the first category. This kind of test problems are multimodal and have a large number of local optimal solutions, so it is difficult to converge to the true PF. DTLZ3 is more difficult to converge than DTLZ1. 2. DTLZ2 and DTLZ4 are the second type of test problems, which are uni-modal and easy to converge, but it is difficult to maintain the diversity of the population, and DTLZ4 is sensitive to the initial data. 3. DTLZ5∼7 are the third type. The PF of these test problems are irregular. The real PF of DTLZ5 and DTLZ6 are degenerated curves, while the real PF of DTLZ7 is discontinuous. These test problems require that the algorithm can better maintain the diversity of population.
For the first type of test problems, MS-MOEA almost achieves the best result. Through the analysis of the results, it is found that the reason for this result may be that the model switch strategy can select a suited surrogate model from GP and RBFN. At the same time, the adaptive sampling approach can make use of approximation uncertainty to get rid of the local optimum. In addition, KTA2 has the similar results with MS-MOEA when the number of objectives is three. Because both of DTLZ1 and DTLZ3 are difficult to converge, noise have little influence on them. However, when the number of objectives is two, MS-MOEA is significantly better than KTA2. Because KTA2 is designed for many-objectives problems.
For the second type of test problems, MS-MOEA achieves the best results which show that the proposed noise treatment is very effective. The adaptive sampling approach can also find the most improved aspect for sampling by analyzing the population of candidate solutions. However, other compared algorithms are greatly affected by noise, because they do not deal with the noise, resulting in the degradation of algorithms performance.
For the third type of test problems, MS-MOEA obtains the best results on DTLZ5 and DTLZ6, which benefits from the noise treatment, the improved environmental selection and adaptive sampling approach. However, the performance of DTLZ7 is inferior to ParEGO. Through the analysis of the experimental results, it is found that the reason is that the noise treatment designed in this paper is lack when some objectives are easy to converge and other objectives are dif-  The best result in each problem is bold. The symbols "+", "−" and "≈" indicate that MS-MOEA is statistically significantly superior to, inferior to, and almost equivalent to the others, respectively (the significance level is 0.05)

Conclusions
In this work, we propose an adaptive model switch-based surrogate-assisted evolutionary algorithm called MS-MOEA for noisy expensive multi-objective optimization problems. we propose a method to estimate the influence of noise and a noise reduction method based on RBFN, and the noise experiment shows that this noise treatment is effective. This algorithm adopts a model switch strategy based on noise influence to select the surrogate model that meets the needs from GP and RBFN. We find that this strategy is helpful to improve the performance of the algorithm by the experiment. The proposed improved environmental selection and adaptive sampling approach are used to obtain candidates and samples. Through the comparison between MS-MOEA and the other five SAEAs, it is found that MS-MOEA achieves better performance than the compared SAEAs on noisy DTLZ problems. However, there are still two deficiencies that can be improved in this paper. We need to improve the adaptive sampling approach for multimodal MOPs, these problems require the algorithm to use approximation uncertainty to search better solutions. For the same MOP, there are both noise sensitive and noise insensitive objectives, the noise treatment method performs poorly. In addition, the noise studied in this paper belongs to additive noises. Compared with the additive noise, the multiplicative noise is a more challenging task to be addressed.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.