Adaptive dropout for high-dimensional expensive multiobjective optimization

Lin, Jianqing; He, Cheng; Cheng, Ran

doi:10.1007/s40747-021-00362-5

Original Article
Open access
Published: 21 April 2021

Volume 8, pages 271–285, (2022)
Complex & Intelligent Systems

Abstract

Various works have been proposed in recent decades to solve expensive multiobjective optimization problems (EMOPs) using surrogate-assisted evolutionary algorithms (SAEAs). However, most existing methods focus on EMOPs with fewer than 30 decision variables, since building an accurate surrogate model for high-dimensional EMOPs requires a large number of training samples, which is unrealistic for expensive multiobjective optimization. To address this issue, we propose an SAEA with an adaptive dropout mechanism. Specifically, this mechanism exploits the statistical differences between different solution sets in the decision space to guide the selection of crucial decision variables. A new infill criterion is then proposed to optimize the selected decision variables with the assistance of surrogate models. Moreover, the optimized decision variables are extended to new full-length solutions, which are then evaluated using the expensive functions to update the archive. The proposed algorithm is compared with several state-of-the-art SAEAs on benchmark problems with up to 200 decision variables. The experimental results demonstrate the promising performance and computational efficiency of the proposed algorithm in high-dimensional expensive multiobjective optimization.

Introduction

Multiobjective optimization problems (MOPs) involve optimizing multiple conflicting objectives simultaneously, i.e., an improvement in one objective may lead to the deterioration of at least one other objective [1]. MOPs are common in science and engineering [2], e.g., neural architecture search [3], structure optimization [4], and industrial dispatch [5]. An MOP can be mathematically formulated as

$$\begin{aligned} &\text{minimize} \quad \mathbf{f}(\mathbf{x})=\left( f_1(\mathbf{x}),\ldots ,f_m(\mathbf{x})\right) \\ &\text{subject to} \quad \mathbf{x}\in X, \end{aligned} \tag{1}$$

where X is the search space of decision variables, m $(\ge 2)$ is the number of objectives, and ${\mathbf {x}} = (x_1, \ldots , x_D)$ is the decision vector with D decision variables [6].

The goal of multiobjective optimization is to obtain a set of representative Pareto-optimal solutions [7]. More specifically, no solution in the Pareto-optimal solution set (PS) is Pareto dominated by any other solution in X. Here, solution ${\mathbf {x}}_1$ Pareto dominates solution ${\mathbf {x}}_2$, denoted ${\mathbf {x}}_1\prec {\mathbf {x}}_2$, if and only if $f_i({\mathbf {x}}_1)\le f_i({\mathbf {x}}_2)$ for all $1\le i\le m$ and $f_j({\mathbf {x}}_1)<f_j({\mathbf {x}}_2)$ for at least one $1\le j\le m$ [8].
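The dominance relation above translates directly into code. The following is a minimal NumPy sketch (the function names are illustrative, not taken from the paper) that checks dominance for minimization and filters the non-dominated rows of an objective matrix with a simple O(n²m) double loop; faster schemes such as the fast non-dominated sorting of NSGA-II exist.

```python
import numpy as np


def dominates(f1, f2):
    """True if objective vector f1 Pareto-dominates f2 (all objectives minimized)."""
    f1, f2 = np.asarray(f1, dtype=float), np.asarray(f2, dtype=float)
    # no worse in every objective and strictly better in at least one
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))


def nondominated_mask(F):
    """Boolean mask marking the rows of F (n x m) not dominated by any other row."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j and dominates(F[j], F[i]):
                mask[i] = False
                break
    return mask


F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
print(nondominated_mask(F))  # [ True  True False  True]
```

Here $(3, 3)$ is dominated by $(2, 2)$, while the other three points are mutually non-dominated.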

As a challenging class of MOPs, an MOP in which the evaluation of any objective function is computationally or economically expensive is called an expensive multiobjective optimization problem (EMOP) [9]. For example, in an automobile body structural design problem, one structural simulation takes tens of minutes or even hours [10]. As a powerful tool for black-box optimization, evolutionary algorithms (EAs) have proven effective in solving MOPs. Nevertheless, EAs are ill-suited to EMOPs, since they typically require a massive number of function evaluations, leading to unbearable time consumption and computational cost [11].

Surrogate-assisted EAs (SAEAs), which incorporate surrogate modeling techniques into traditional evolutionary algorithms, are the mainstream methods for solving EMOPs. Surrogate models replace some of the computationally expensive function evaluations (FEs). Since surrogate models are computationally cheap, they can significantly reduce the required computational cost [12]. The most commonly used surrogate models include polynomial regression (PR) [13], Gaussian process (GP) [13, 14], radial basis function (RBF) [15, 16], and artificial neural networks (ANN) [11]. Apart from the adopted surrogate model(s), model management is also a crucial part of SAEAs; it mainly concerns how the surrogate model is updated and involves two aspects. First, it determines which offspring individuals are re-evaluated by the expensive function(s). Second, the infill criterion measures the quality of the candidate individuals for re-evaluation, and the re-evaluated solutions are added to the training dataset to update the surrogate model [17].

In the past decades, a number of SAEAs have been proposed to solve EMOPs effectively, the most representative of which include ParEGO [18], MOEA/D-EGO [19], K-RVEA [20, 21], and CSEA [11]. However, these SAEAs focus only on low-dimensional problems with fewer than 30 decision variables [18, 19, 21, 22]. This is mainly due to the lack of training samples for building accurate surrogate models and the unbearable time cost of training models with high-dimensional inputs. Recently, several attempts have been made to address this issue. Some methods use dimensionality reduction techniques to map decision variables from a high-dimensional space to a low-dimensional space and then optimize in the low-dimensional space [23,24,25]. Other approaches replace computationally expensive Gaussian process models with cheaper surrogate models, or use multiswarm optimization algorithms rather than conventional genetic algorithms [26]. Very recently, EDN-ARMOEA was proposed for solving high-dimensional EMOPs; it uses a dropout neural network instead of the Gaussian process [27]. Nevertheless, the performance of the above SAEAs can be further improved.

In this paper, a surrogate-assisted particle swarm optimization algorithm with an adaptive dropout mechanism, called ADSAPSO, is proposed to handle high-dimensional EMOPs. We mainly focus on selecting a small number of crucial decision variables for convergence enhancement and diversity maintenance, aiming to reduce the cost of building and training surrogate model(s). Compared with EDN-ARMOEA, which uses a dropout ANN for dimension reduction, ADSAPSO differs in two main aspects. (i) EDN-ARMOEA uses the dropout strategy to improve the computational efficiency of the surrogate model, whereas ADSAPSO adopts dropout to reduce the dimension of the optimization problem. (ii) The dropout neural network in EDN-ARMOEA randomly discards neurons with a certain probability, whereas the dropout mechanism in ADSAPSO learns a deterministic reduction via a statistical model. The main contributions of this work are summarized as follows:

(1) Inspired by the dropout strategy in deep learning, we propose an adaptive dropout mechanism that selects a subset of decision variables for dimensionality reduction. Unlike existing unsupervised dimensionality reduction approaches, the proposed mechanism learns from the statistical differences between different solution sets in the decision space for better dimension reduction.
(2) To select promising candidate solutions for re-evaluation, we propose a novel infill criterion consisting of an RBF-based search and a Replacement operation. The RBF-based search optimizes the d selected decision variables, and the Replacement operation extends the resulting low-dimensional decision vectors to full-length candidate solutions.

The rest of this paper is organized as follows. In “Background”, we provide the background of the work, including the dropout mechanism, the RBF model, and the estimation of distribution model. The details of the proposed algorithm are illustrated in “Proposed method”. The experimental studies and result analysis are given in “Experimental study”. Finally, “Conclusion” concludes this paper.

Background

In this section, we first summarize dropout in deep learning, which is extended to dimension dropout in this study. Next, the principle and components of the RBF neural network are described. Then we introduce the estimation of distribution model, which is used in the proposed adaptive dropout mechanism.

Dropout

Dropout, first proposed in Ref. [28], originally referred to temporarily discarding neurons from a deep neural network with a certain probability during forward propagation in training, which is equivalent to sampling a thinner network from the original network. An example of dropout in a neural network is shown in Fig. 1.

The original neural network is shown in Fig. 1a. After dropout, some neurons are discarded, as shown by the dotted circles in Fig. 1b. The complexity of the neural network is significantly reduced, which improves its generalization by reducing its reliance on particular local features and effectively alleviates over-fitting.

More specifically, a neural network with N neurons can be regarded under dropout as a collection of $2^N$ thinned networks that share weights, which helps prevent over-fitting while improving the training efficiency of the network.
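As an illustration of the mechanism described above, here is a minimal NumPy sketch of dropout applied to one layer's activations. It uses the "inverted dropout" convention (surviving units are scaled by 1/(1-p) during training so no rescaling is needed at test time), which is a common implementation choice and an assumption here rather than the exact formulation of Ref. [28]; the function name and the rate `p` are illustrative.

```python
import numpy as np


def dropout(h, p, rng, training=True):
    """Inverted dropout on activations h: drop each unit independently with probability p."""
    if not training or p == 0.0:
        return h, np.ones_like(h, dtype=bool)
    # Bernoulli mask: True keeps a unit, False drops it (the dotted circles in Fig. 1b)
    keep = rng.random(h.shape) >= p
    # rescale surviving units so the expected activation matches the test-time network
    return h * keep / (1.0 - p), keep


rng = np.random.default_rng(0)
h = np.ones(8)
out, keep = dropout(h, p=0.5, rng=rng)
# each distinct mask over the N units selects one of the 2**N thinned sub-networks
print(keep.astype(int), out)
```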

Radial basis function (RBF) neural network

The RBF neural network is a simple and widely used surrogate model for approximating nonlinear functions. It trains quickly because it is a local approximation network. Given training data $\{ (\mathbf{x }_i,y_i)| i=1,\ldots ,N \}$, it approximates a continuous function as follows:

$$\hat{f}(\mathbf{x})=\sum _{i=1}^{N} w_i\,\phi (\mathbf{x}_i,\mathbf{x}), \tag{2}$$

where $w_i$ is the weight coefficient and $\phi (\cdot , \cdot )$ is the kernel function, whose independent variable is the distance between the points $\mathbf{x }_i$ and $\mathbf{x }$.

Commonly used kernel functions include linear kernel function, polynomial kernel function, Gaussian kernel function, etc. For example, the Gaussian kernel function is calculated as follows:

$$\phi (\mathbf{x}_i,\mathbf{x})=\exp \left( -\frac{\Vert \mathbf{x}_i -\mathbf{x} \Vert ^2}{2\delta ^2}\right), \tag{3}$$

where $\delta $ is the expansion constant (width parameter) of the kernel function, which determines the width of the kernel: the smaller the $\delta $, the narrower the kernel.

The weight vector $\mathbf{w }=(w_1,\ldots ,w_N)$ is calculated as follows:

$$\mathbf{w} = (\varPhi ^T\varPhi )^{-1}\varPhi ^T\mathbf{y}, \tag{4}$$

where $\mathbf{y }=(y_1,\ldots ,y_N)^T$ is the output vector of the training data and $\varPhi $ is the $N\times N$ matrix computed as follows:

$$\varPhi = \begin{bmatrix} \phi (\mathbf{x}_1, \mathbf{x}_1) & \cdots & \phi (\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \ddots & \vdots \\ \phi (\mathbf{x}_N, \mathbf{x}_1) & \cdots & \phi (\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}. \tag{5}$$
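A minimal NumPy sketch of Eqs. (2)–(5) with the Gaussian kernel of Eq. (3) may help make the fitting procedure concrete. The helper names and the value δ = 0.1 are illustrative assumptions. The weights are obtained with `np.linalg.lstsq`, which returns the same least-squares solution as Eq. (4) when $\varPhi$ has full column rank while avoiding an explicit matrix inverse. Practical RBF surrogates may differ, for example by using another kernel or adding regularization.

```python
import numpy as np


def gaussian_kernel_matrix(A, B, delta):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 delta^2)), the Gaussian kernel of Eq. (3)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * delta ** 2))


def fit_rbf(X, y, delta):
    """Weights w of Eq. (4): the least-squares solution of Phi w = y."""
    Phi = gaussian_kernel_matrix(X, X, delta)  # N x N matrix of Eq. (5)
    # equals (Phi^T Phi)^{-1} Phi^T y for full-column-rank Phi, without forming the inverse
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w


def predict_rbf(X_train, w, X_new, delta):
    """Eq. (2): f_hat(x) = sum_i w_i * phi(x_i, x) for every row x of X_new."""
    return gaussian_kernel_matrix(X_new, X_train, delta) @ w


X = np.linspace(0.0, 1.0, 10).reshape(-1, 1)  # toy 1-D training inputs
y = np.sin(2 * np.pi * X[:, 0])               # toy training outputs
w = fit_rbf(X, y, delta=0.1)
print(predict_rbf(X, w, np.array([[0.25], [0.55]]), delta=0.1))
```

Because $\varPhi$ is square here, the fitted model interpolates the training data exactly (up to rounding).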

Statistical model

A statistical model is a model built with mathematical-statistical methods that captures the relationships and trends among variables through mathematical statistics [29].

Mathematical statistics has two branches: descriptive statistics and inferential statistics. Measures of central tendency are standard descriptive statistics that use a single representative value to summarize the overall level of a data set; commonly used measures include the mean, median, and mode. Measures of dispersion reflect the degree of spread of the data; commonly used indicators are the range and standard deviation. Frequency analysis describes the number of occurrences of each characteristic value of the data, reflecting the distribution of a data set.

Inferential statistics is divided into two parts: parameter estimation and hypothesis testing. The former estimates a characteristic of the entire population from a single sampling experiment. The latter tests a hypothesis and infers, from the computed results, whether the hypothesis is acceptable.

Proposed method

In this section, the main framework of the proposed ADSAPSO is introduced, followed by the details of the proposed model management, including the adaptive dropout mechanism and a new infill criterion.

Framework of ADSAPSO

The framework of the proposed ADSAPSO is presented in Algorithm 1; it mainly consists of the adaptive dropout mechanism and the infill criterion. First, N solutions are sampled using the Latin hypercube sampling (LHS) strategy [30], evaluated using the expensive functions, and stored in an initially empty archive Arc. Next, environmental selection selects $N_\alpha $ solutions from Arc to form the elite archive arc. Afterwards, the adaptive dropout mechanism and the infill criterion are applied. The adaptive dropout mechanism selects d decision variables, and the infill criterion chooses the new samples to be evaluated by the expensive functions. Specifically, we first select one well-performing solution set and one poorly-performing solution set, which are statistically different in the decision space. Accordingly, a set S of d decision variables that significantly affect convergence is selected from the full-length decision variables. In the infill criterion, the RBF-based search optimizes the selected d decision variables and obtains a set $X_d$ of k optimized d-dimensional decision vectors. Then a replacement operator selects k full-length solutions, whose corresponding decision variables are replaced by $X_d$ to form the individual set $X_{{\text {new}}}$, which is evaluated using the expensive functions to update the archive Arc. Finally, the obtained Arc is output as the final solution set. Note that ADSAPSO uses the same environmental selection operator as NSGA-II [7], so its details are omitted here.
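The initial sampling step relies on LHS [30]. The following minimal NumPy sketch shows one standard form of Latin hypercube sampling: each dimension's range is split into n equal strata, the strata are shuffled independently per dimension, and one point is drawn uniformly inside each stratum. The function name and bounds are illustrative, and the exact LHS variant used in the paper's implementation is not specified here.

```python
import numpy as np


def latin_hypercube(n, lower, upper, rng):
    """Draw n points in the box [lower, upper] by Latin hypercube sampling."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    D = lower.size
    # for every dimension, an independent random permutation of the n strata ...
    strata = np.argsort(rng.random((n, D)), axis=0)
    # ... with one point placed uniformly at random inside each stratum
    u = (strata + rng.random((n, D))) / n
    return lower + u * (upper - lower)


rng = np.random.default_rng(1)
X0 = latin_hypercube(5, lower=[0, 0, 0], upper=[1, 1, 1], rng=rng)
print(X0)
```

Every column of `X0` contains exactly one value in each of the intervals [0, 0.2), [0.2, 0.4), …, [0.8, 1.0).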

Adaptive dropout mechanism

The adaptive dropout mechanism is a crucial component of the proposed ADSAPSO (line 8 in Algorithm 1); it reduces the number of decision variables from D to d. Among the D decision variables, some have a more significant effect on convergence enhancement and are therefore important for expensive multiobjective optimization. Thus, the adaptive dropout mechanism adopts a statistical model-assisted method to select, from the original full-length decision variables, the d decision variables that most significantly affect convergence. An example of the adaptive dropout mechanism is shown in Fig. 2, where the dashed circles indicate the discarded decision variables.

Unlike the principal component analysis technique [31], which maps data from a high-dimensional space to a low-dimensional space, the proposed adaptive dropout mechanism retains the original information of the selected d dimensions. Specifically, it learns from the statistical differences between different solution sets in the decision space.

Generally, the proposed adaptive dropout mechanism consists of three steps: (1) Solutions selection; (2) Statistical models construction; (3) Dimension selection.

$\textit{(1) Solutions selection:}$ We assume that, in some dimensions of the decision space, a set of well-performing solutions differs significantly from a set of poorly-performing ones. The distribution differences between these two solution sets can reflect the importance of certain decision variables for convergence enhancement, so a suitable solution selection is needed to capture these differences. If both sets were selected from the full archive Arc, the poorly-performing solutions would remain almost the same during the update process, since their rankings in the whole Arc hardly change. In other words, the distribution of the poorly-performing solution set would provide little help in the later stages of evolution. To remedy this issue, we select both the well-performing and the poorly-performing solutions from the elite archive arc, whose solutions are selected from Arc. According to non-dominated sorting, the first $N_{\text {s}}$ solutions in arc are regarded as well-performing solutions, and the last $N_{\text {s}}$ solutions in arc are regarded as poorly-performing ones. Consequently, the situation in which the two sets can no longer be usefully distinguished is avoided.

$\textit{(2) Statistical models construction:}$ After solutions selection, two statistical models are constructed from the selected well-performing solutions and the selected poorly-performing solutions, respectively. To discover the effect of different variables, we analyze the two solution sets in the decision space via the two constructed statistical models. More specifically, for each of the two solution sets, we compute the mean value of each dimension and use it as the statistical value. Since the two selected solution sets differ significantly in the objective space, the influence of different variables on the objective values can be reflected by their statistical differences in the decision space. Thus, dimensions with significant differences in the decision space are emphasized for better convergence enhancement in the subsequent optimization.

$\textit{(3) Dimension selection:}$ After constructing the two statistical models, a difference model is obtained by computing, for each dimension, the difference between the two models. Then, the $\beta \cdot D$ dimensions with the largest absolute differences are selected to form the set S, so that $d=\beta \cdot D$.
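The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `nd_rank` is a simple (non-optimized) non-dominated sorting, solutions within the same front are ordered by index (the paper does not specify a tie-breaking rule, and NSGA-II would use crowding distance), the statistic is the per-dimension mean as described in step (2), and d is obtained by rounding β·D. All names are hypothetical.

```python
import numpy as np


def nd_rank(F):
    """Front index (0 = first front) of each row of F via simple non-dominated sorting."""
    n = len(F)
    # dom[i, j] is True if solution i dominates solution j (minimization)
    dom = np.array([[np.all(F[i] <= F[j]) and np.any(F[i] < F[j]) for j in range(n)]
                    for i in range(n)], dtype=bool)
    rank = np.full(n, -1)
    front, remaining = 0, np.ones(n, dtype=bool)
    while remaining.any():
        # remaining solutions not dominated by any other remaining one form this front
        current = remaining & ~dom[remaining].any(axis=0)
        rank[current] = front
        remaining &= ~current
        front += 1
    return rank


def adaptive_dropout(X_arc, F_arc, n_s, beta):
    """Indices S of the round(beta * D) variables whose means differ most (sketch)."""
    order = np.argsort(nd_rank(F_arc), kind="stable")    # step (1): sort arc by front
    good, bad = X_arc[order[:n_s]], X_arc[order[-n_s:]]  # first / last n_s solutions
    mu_good, mu_bad = good.mean(axis=0), bad.mean(axis=0)  # step (2): per-dimension means
    diff = np.abs(mu_good - mu_bad)                      # step (3): difference model
    d = max(1, int(round(beta * X_arc.shape[1])))
    return np.sort(np.argsort(-diff, kind="stable")[:d])


t = np.linspace(0.0, 1.0, 20)
X_arc = np.full((20, 4), 0.5)
X_arc[:, 0] = X_arc[:, 1] = t          # toy data: only x0 and x1 drive the objectives
F_arc = np.column_stack([t + t, 3 * (t + t)])
S = adaptive_dropout(X_arc, F_arc, n_s=5, beta=0.5)
print(S)
```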

Figure 3 shows an example of the adaptive dropout mechanism, and Fig. 3a illustrates the detailed process of solutions selection. In this figure, the blue squares are the “Good” solutions and the red triangles are the “Bad” ones. Notably, the black circles in the upper right corner of Fig. 3a are the solutions in Arc that are not included in arc. They are not selected as “Bad” solutions, which prevents the distribution of the “Bad” solutions from stagnating prematurely. Figure 3b presents the two statistical models, where the horizontal axis represents the dimension and the vertical axis represents the statistical value of that dimension. The red line indicates the model of the poorly-performing (“Bad”) solutions, and the blue line indicates the model of the well-performing (“Good”) solutions. Figure 3c shows the absolute difference between the two statistical models, where the horizontal axis represents the dimension and the vertical axis represents the absolute difference. The dotted line represents the selection threshold, and the shaded parts indicate the dimensions to be selected. Specifically, $d_1$, $d_2$, $d_3$, and $d_4$ denote the lower and upper boundaries of the dimension intervals lying above the threshold, so the dimensions in $[d_1,d_2]$ and $[d_3,d_4]$ are the ones ultimately selected.

For high-dimensional EMOPs, a large number of training samples is required to construct accurate surrogate models, which is unrealistic for expensive multiobjective optimization. The proposed adaptive dropout mechanism reduces the dimension of the decision variables from D to d according to the statistical results. Consequently, with the same number of training samples, it helps build relatively more accurate d-dimensional surrogate models.

Infill criterion

Once the set S of d decision variables, which have a more significant effect on convergence, has been selected by adaptive dropout, ADSAPSO optimizes these d decision variables and selects promising candidate individuals for re-evaluation.

In the proposed infill criterion, an RBF-based search (Algorithm 2) is adopted to optimize the selected d decision variables. First, m d-dimensional RBF models are trained using the individuals in ${\text {arc}}_{d}$, and these models replace the original expensive functions for evaluating offspring. RBF is used because of its insensitivity to increases in the dimension of the function to be approximated [32, 33]. Compared with the Kriging model [34], RBF is more suitable for problems with high-dimensional decision variables, since the computation time for training Kriging models becomes unbearable when the number of training samples is large. Then, the particle swarm optimizer (PSO) [35] is adopted for further optimization because of its promising capability in solving high-dimensional optimization problems, as suggested in Ref. [34]. To enhance population diversity at the late stage of evolution, we add a polynomial mutation operation (Line 6 in Algorithm 2): in the late stage, the population may easily become trapped in local optima, and thus $0.75 \times {\text {MaxFEs}}$ is empirically adopted as the threshold for applying polynomial mutation. After the optimization, k well-converged d-dimensional decision vectors are selected from the final decision vector set. Although not all decision variables are optimized in every iteration, surrogate models built on d-dimensional training samples can better guide the optimization of these d variables and accelerate convergence than surrogate models built on D-dimensional training samples. The RBF-based search in ADSAPSO thus conducts a local search in a low-dimensional space to quickly obtain better-converged solutions, which makes it naturally suitable for high-dimensional EMOPs.
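To illustrate the kind of update the RBF-based search performs in the reduced space, the sketch below shows one generic PSO velocity/position step on a swarm of d-dimensional particles. This is the textbook inertia-weight PSO update, not necessarily the exact PSO variant of [35] used in ADSAPSO: the inertia weight W = 0.5 follows the experimental settings, while the acceleration coefficients `c1`, `c2`, the box bounds, and the choice of `leader` (which in ADSAPSO would be derived from RBF predictions of the m objectives) are hypothetical.

```python
import numpy as np


def pso_step(x, v, pbest, leader, rng, w=0.5, c1=1.5, c2=1.5, lower=0.0, upper=1.0):
    """One generic PSO update for a swarm x of shape (NI, d) in the reduced space."""
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    # inertia term + cognitive pull to personal bests + social pull to the leader(s)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (leader - x)
    # move the particles and keep them inside the box bounds
    x_new = np.clip(x + v_new, lower, upper)
    return x_new, v_new


rng = np.random.default_rng(2)
x = rng.random((100, 5))   # NI = 100 particles over d = 5 selected variables (toy sizes)
v = np.zeros_like(x)
leader = x[0]              # hypothetical leader; broadcast to every particle
x1, v1 = pso_step(x, v, pbest=x.copy(), leader=leader, rng=rng)
```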

Since the k optimized d-dimensional decision vectors cannot be evaluated by the expensive functions directly, we introduce a Replacement method that extends d-dimensional decision vectors to D-dimensional ones. More specifically, k D-dimensional solutions, denoted $X_D$, are first selected from the non-dominated solution set of arc by environmental selection, which is the same as that in Algorithm 1. Next, the corresponding dimensions of $X_D$ are replaced by $X_d$ to form k new individuals $X_{{\text {new}}}$. Since the solutions in $X_d$ are well converged and those in $X_D$ have good diversity, the newly generated solutions $X_{{\text {new}}}$ can, to some extent, be considered to inherit both the convergence of $X_d$ and the diversity of $X_D$. Finally, $X_{{\text {new}}}$ is re-evaluated to update the archives. An illustrative example of the Replacement is shown in Fig. 4, where each circle denotes a decision variable, and the dashed circles denote the variables that are replaced to form full-length decision vectors for evaluation.
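A minimal NumPy sketch of the Replacement operation follows. It assumes that the i-th optimized vector in $X_d$ is written into the i-th solution of $X_D$ and that the columns of $X_d$ follow the order of the indices in S; the paper does not state how the two sets are paired, so this pairing and the function name are assumptions.

```python
import numpy as np


def replacement(X_D, X_d, S):
    """Form X_new by writing the k optimized d-dim vectors X_d into dimensions S of X_D."""
    X_new = np.array(X_D, dtype=float, copy=True)  # leave X_D itself unchanged
    X_new[:, S] = X_d                              # row i of X_d fills dimensions S of row i
    return X_new


X_D = np.zeros((2, 6))                     # k = 2 full-length solutions (toy values)
X_d = np.array([[1.0, 2.0], [3.0, 4.0]])   # k = 2 optimized vectors over d = 2 variables
S = np.array([1, 4])                       # indices chosen by adaptive dropout (toy)
print(replacement(X_D, X_d, S))
```

The two toy solutions become (0, 1, 0, 0, 2, 0) and (0, 3, 0, 0, 4, 0): only the variables in S change, while the rest are inherited from $X_D$.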

Experimental study

To empirically investigate the performance of the proposed ADSAPSO, we first discuss the impact of the parameters $N_{\text {s}}$ and $\beta $ on the algorithm. Then, three state-of-the-art algorithms, namely MOEA/D-EGO [19], K-RVEA [21], and RM-MEDA [36], are compared with ADSAPSO on the test problems. Notably, RM-MEDA is included because, like ADSAPSO, it adopts statistical models during the optimization. In the rest of this section, we first briefly introduce the experimental settings of all the compared algorithms, and then describe the test problems and performance indicator. Each algorithm is run independently 20 times on each test problem. The Wilcoxon rank sum test is used to compare the results obtained by ADSAPSO and each compared algorithm at a significance level of 0.05. The symbols “$+$”, “−”, and “$\approx $” indicate that the compared algorithm is significantly better than, significantly worse than, and approximately equal to ADSAPSO, respectively.
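The statistical comparison can be sketched in pure Python as follows. `rank_sum_pvalue` computes the two-sided Wilcoxon rank-sum p-value using the large-sample normal approximation with average ranks for ties and no tie correction of the variance. `compare_symbol` is a hypothetical helper that maps a result to “$+$”, “−”, or “$\approx $”; deciding the direction of a significant difference by comparing medians is an assumption, since the paper does not state the rule.

```python
import math
from statistics import median


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation (no tie correction)."""
    pooled = sorted((value, group) for group, sample in enumerate((a, b)) for value in sample)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    z = (r1 - n1 * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def compare_symbol(igd_compared, igd_proposed, alpha=0.05):
    """'+', '−' or '≈' for a compared algorithm vs. the proposed one (lower IGD is better)."""
    if rank_sum_pvalue(igd_compared, igd_proposed) >= alpha:
        return "≈"
    # direction of a significant difference decided by medians (assumed rule)
    return "+" if median(igd_compared) < median(igd_proposed) else "−"


proposed = [0.01 * i for i in range(1, 21)]  # toy IGD values from 20 runs
compared = [v + 1.0 for v in proposed]       # uniformly worse toy values
print(compare_symbol(compared, proposed))    # −
```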

Experimental settings

For fair comparisons, all the compared algorithms are implemented in PlatEMO [37] using MATLAB, and they are run on a computer with an Intel Core i7 processor and 32 GB of RAM.

(1) Parameter settings: In the proposed ADSAPSO, the number of samples used to build the surrogate models is set to $N_{\alpha }=200$, and the number of individuals to be re-evaluated is set to $k = 5$. The number of well-performing and poorly-performing solutions chosen in solutions selection is set to $N_{\text {s}} = 50$. In K-RVEA, the number of individuals re-evaluated and used for updating the surrogate models is also set to 5. For the other compared algorithms, the parameter settings recommended in the literature are used.
(2) Reproduction operators: The maximum number of generations in the RBF-based search is set to $\lambda _{{\text {max}}}=100$. In PSO, the population size is set to ${\text {N\!I}} = 100$ and the inertia weight to $W=0.5$.
(3) Termination condition: For all the compared algorithms except RM-MEDA, the size of the initial data is set to 500. The maximum number of expensive function evaluations (denoted as MaxFEs) is set to 1000.

Test problems and performance indicator

In this paper, we use ten IMF problems selected from [38] and seven DTLZ problems. The number of objectives is three for IMF4, IMF8, and all DTLZ problems, and two for the remaining IMF problems.

The Inverted Generational Distance (IGD) indicator [39] is a commonly used performance indicator for multiobjective optimization, which assesses the quality of the obtained solution set in terms of both convergence and distribution uniformity. Let P be a set of reference points evenly distributed on the Pareto-optimal front (PF), and let Q be the set of obtained non-dominated solutions. IGD is defined as

$$\text{IGD}(P,Q)= \frac{\sum _{\mathbf{p} \in P}\text{dis}(\mathbf{p},Q)}{|P|}, \tag{6}$$

where $\text {dis}({\mathbf {p}},Q)$ is the minimum Euclidean distance between ${\mathbf {p}}$ and points in Q, and |P| is the number of reference points in P. A smaller IGD value indicates better performance of the algorithm.
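Eq. (6) can be computed in a few lines of NumPy. The sketch below is a straightforward implementation of the definition above; the function name and the toy reference set are illustrative.

```python
import numpy as np


def igd(P, Q):
    """Eq. (6): mean, over reference points p in P, of the distance to the nearest point in Q."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    # pairwise Euclidean distances, shape (|P|, |Q|)
    dist = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return dist.min(axis=1).mean()


P = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])  # toy reference points on a linear PF
Q = np.array([[0.0, 1.0], [1.0, 0.0]])              # toy obtained non-dominated set
print(igd(P, Q))
```

Here only the middle reference point is uncovered (its nearest obtained point is at distance $\sqrt{0.5}$), so the IGD equals $\sqrt{0.5}/3$; covering it would reduce the value to 0.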

Impact of parameter $N_{\text {s}}$

In the proposed adaptive dropout mechanism, parameter $N_{\text {s}}$ is the number of well-performing solutions, and likewise of poorly-performing solutions, selected from the elite archive arc. A sensitivity analysis is conducted to investigate the impact of $N_{\text {s}}$ on the performance of ADSAPSO, comparing ADSAPSO with $N_{\text {s}}$ set to 100 and 67 against ADSAPSO with $N_{\text {s}} = 50$. These values are empirical choices derived from $N_\alpha = 200$: $N_{\text {s}} = 100$ means that half of arc is selected, $N_{\text {s}} = 67$ corresponds to dividing arc into three roughly equal parts, and $N_{\text {s}} = 50$ corresponds to a quarter of arc.

The statistical results of the IGD values achieved by ADSAPSO with different $N_{\text {s}}$ values on the IMF test suite are summarized in Table 1. ADSAPSO performs similarly with all three $N_{\text {s}}$ values, and $N_{\text {s}} = 50$ achieves the best results on these test problems. Thus, $N_{\text {s}} = 50$ is adopted in the remaining experiments.

Table 1 Statistics of IGD results obtained by ADSAPSO with different $N_{\text {s}}$ on 30 IMF test instances

Impact of parameter $\beta $

Additional comparative experiments are run to investigate the impact of parameter $\beta $, which determines the proportion of decision variables selected by the adaptive dropout mechanism. Keeping the same parameter settings as introduced in Sect. 4.1, ADSAPSO with $\beta $ values from $\{0.1, 0.3, 0.5, 0.7, 1.0\}$ is empirically compared.

The statistical results of the IGD values achieved by ADSAPSO with different $\beta $ values on the IMF test suite are summarized in Table 2.

Table 2 Statistics of IGD results obtained by ADSAPSO with different $\beta $ on 30 IMF test instances

ADSAPSO with $\beta =0.5$ achieves the best results on these test problems, whereas ADSAPSO with $\beta =0.1$ achieves the worst, indicating that the value of $\beta $ has a significant influence on performance. In subsequent experiments, we adopt $\beta =0.5$ in ADSAPSO.

Note that $\beta =1.0$ means that the adaptive dropout mechanism is not used and full-length decision vectors are optimized. ADSAPSO with $\beta =0.5$ shows an overall better performance than ADSAPSO without the adaptive dropout mechanism, although the latter is better on IMF6 and IMF8. This comparison with $\beta =1.0$ verifies the effectiveness of the proposed adaptive dropout mechanism.

General performance

In this subsection, ADSAPSO is compared with three representative algorithms, i.e., MOEA/D-EGO [19], K-RVEA [21], and RM-MEDA [36]. K-RVEA is an SAEA that relies on a set of adaptive reference vectors for expensive many-objective optimization, and MOEA/D-EGO is an MOEA/D-based SAEA with a Gaussian process model. RM-MEDA is a regularity model-based multiobjective estimation of distribution algorithm, which is also based on statistical learning methods.

Table 3 Statistics of IGD results obtained by MOEA/D-EGO, K-RVEA, RM-MEDA and ADSAPSO on 30 IMF test instances

Table 4 Statistics of IGD results obtained by MOEA/D-EGO, K-RVEA, RM-MEDA, and ADSAPSO on 21 DTLZ test instances

The IGD results obtained by MOEA/D-EGO, K-RVEA, RM-MEDA, and ADSAPSO on the IMF and DTLZ test suites are recorded in Tables 3 and 4, respectively. Notably, MOEA/D-EGO fails to solve DTLZ4 with 200 decision variables because its Kriging models fail to handle the high-dimensional data. ADSAPSO obtains better IGD values than MOEA/D-EGO, K-RVEA, and RM-MEDA on most instances of both test suites.

The convergence profiles of ADSAPSO and the three compared algorithms on DTLZ problems with 200 decision variables are given in Fig. 5.

It can be observed that our proposed ADSAPSO converges faster than the other three compared algorithms on most problems. The results have demonstrated the superiority of our proposed ADSAPSO over the three compared algorithms on EMOPs in terms of convergence speed.

Figure 6 presents the archive Arc achieved by the ADSAPSO, MOEA/D-EGO and K-RVEA on bi-objective DTLZ1 with 50 decision variables at 500 evaluations (Initial archive), 750 evaluations (Interim archive), and 1000 evaluations (Final archive) to visualize the experimental results more intuitively.

The archive obtained by ADSAPSO converges rapidly in several directions in the early stage; in the later stage, convergence slows down while population diversity increases. By contrast, MOEA/D-EGO and K-RVEA barely converge throughout the process. Overall, ADSAPSO achieves the best results on these problems, with the best-converged final archive Arc.

Conclusion

In this paper, we have proposed an SAEA with an adaptive dropout mechanism, called ADSAPSO, for solving high-dimensional EMOPs with up to 200 decision variables.

The adaptive dropout mechanism selects d decision variables from the D original decision variables. This allows relatively accurate RBF models to be built in the d-dimensional subspace, so the algorithm only needs to optimize d variables at each iteration. The infill criterion is then adopted to select candidate individuals for re-evaluation: the corresponding dimensions of k D-dimensional solutions are replaced by the k optimized d-dimensional individuals to form k new D-dimensional individuals.
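The two steps summarized above, choosing d of the D variables and writing optimized d-dimensional individuals back into full-length solutions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the selection rule used here (ranking variables by the absolute difference between their mean values in a "good" and a "poor" solution set) is a hypothetical stand-in for the statistical-difference criterion of ADSAPSO, and the RBF-assisted optimization and the infill criterion are omitted. The function names `select_dimensions` and `extend_to_full_length` are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

Solution = Sequence[float]


def select_dimensions(good: Sequence[Solution], poor: Sequence[Solution], d: int) -> List[int]:
    """Hypothetical dropout rule: keep the d variables whose means differ most
    between a 'good' and a 'poor' set of evaluated solutions."""
    n_vars = len(good[0])
    if not 0 < d <= n_vars:
        raise ValueError("d must satisfy 0 < d <= D")
    diffs = [abs(fmean(x[j] for x in good) - fmean(x[j] for x in poor))
             for j in range(n_vars)]
    # Largest difference first; ties broken by the lower variable index.
    ranked = sorted(range(n_vars), key=lambda j: (-diffs[j], j))
    return sorted(ranked[:d])


def extend_to_full_length(templates: Sequence[Solution],
                          optimized: Sequence[Solution],
                          dims: Sequence[int]) -> List[List[float]]:
    """Write each optimized d-dimensional individual into the selected
    dimensions of a copy of the matching D-dimensional template solution."""
    if len(templates) != len(optimized):
        raise ValueError("need one template per optimized individual (k each)")
    new_solutions = []
    for full, sub in zip(templates, optimized):
        if len(sub) != len(dims):
            raise ValueError("each optimized individual must have d values")
        child = list(full)  # copy so the archived template is left untouched
        for j, value in zip(dims, sub):
            child[j] = value
        new_solutions.append(child)
    return new_solutions


if __name__ == "__main__":
    good = [[0.1, 0.5, 0.9, 0.2], [0.2, 0.5, 0.8, 0.2]]
    poor = [[0.9, 0.5, 0.1, 0.3], [0.8, 0.5, 0.2, 0.3]]
    dims = select_dimensions(good, poor, d=2)
    print(dims)  # [0, 2]
    print(extend_to_full_length([good[0]], [[0.0, 1.0]], dims))  # [[0.0, 0.5, 1.0, 0.2]]
```

Copying each template keeps the archived solutions unchanged, so the k new D-dimensional candidates can be evaluated with the expensive functions and then added to the archive as separate entries.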

Systematic comparisons have been conducted on a set of EMOPs with up to 200 decision variables. First, we investigated the impact of the parameter $\beta $ on the algorithm and found $\beta =0.5$ to perform best. Then, ADSAPSO was compared with three representative algorithms, i.e., MOEA/D-EGO, K-RVEA, and RM-MEDA, on the IMF and DTLZ test problems. The experimental results indicate the superiority of ADSAPSO in solving EMOPs with high-dimensional decision spaces.

These results indicate that ADSAPSO is promising for solving high-dimensional EMOPs. In the future, it would be worthwhile to extend ADSAPSO to expensive large-scale multiobjective optimization problems [40] and expensive many-objective optimization problems, and to apply it to real-world applications.

References

1. He C, Cheng R, Zhang C, Tian Y, Chen Q, Yao X (2020) Evolutionary large-scale multiobjective optimization for ratio error estimation of voltage transformers. IEEE Trans Evol Comput 24(5):868–881
2. Pan L, He C, Tian Y, Su Y, Zhang X (2017) A region division based diversity maintaining approach for many-objective optimization. Integr Comput Aided Eng 24(3):279–296
3. Lu Z, Whalen I, Dhebar Y et al (2020) Multi-objective evolutionary design of deep convolutional neural networks for image classification. IEEE Trans Evol Comput 25(2):277–291
4. Sun G, Pang T, Fang J, Li G, Li Q (2017) Parameterization of criss-cross configurations for multiobjective crashworthiness optimization. Int J Mech Sci 124:145–157
5. Abido MA (2006) Multiobjective evolutionary algorithms for electric power dispatch problem. IEEE Trans Evol Comput 10(3):315–329
6. He C, Tian Y, Jin Y, Zhang X, Pan L (2017) A radial space division based many-objective optimization evolutionary algorithm. Appl Soft Comput 61:603–621
7. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197
8. He C, Cheng R, Yazdani D (2020) Adaptive offspring generation for evolutionary large-scale multiobjective optimization. IEEE Trans Syst Man Cybern Syst. https://doi.org/10.1109/TSMC.2020.3003926
9. He C, Cheng R, Jin Y, Yao X (2019) Surrogate-assisted expensive many-objective optimization by model fusion. In: 2019 IEEE congress on evolutionary computation (CEC), pp 1672–1679. IEEE
10. Akhtar T, Shoemaker CA (2016) Multi objective optimization of computationally expensive multi-modal functions with RBF surrogates and multi-rule selection. J Glob Optim 64(1):17–32
11. Pan L, He C, Tian Y, Wang H, Zhang X, Jin Y (2019) A classification-based surrogate-assisted evolutionary algorithm for expensive many-objective optimization. IEEE Trans Evol Comput 23(1):74–88
12. He C, Huang S, Cheng R, Tan KC, Jin Y (2020) Evolutionary multiobjective optimization driven by generative adversarial networks (GANs). IEEE Trans Cybern. https://doi.org/10.1109/TCYB.2020.2985081
13. Zhou Z, Ong Y, Nguyen M, Lim D (2005) A study on polynomial regression and Gaussian process global surrogate model in hierarchical surrogate-assisted evolutionary algorithm. In: 2005 IEEE congress on evolutionary computation, vol 3, pp 2832–2839
14. Liu B, Zhang Q, Gielen GGE (2014) A Gaussian process surrogate model assisted evolutionary algorithm for medium scale expensive optimization problems. IEEE Trans Evol Comput 18(2):180–192
15. Wang Y, Yin D, Yang S, Sun G (2019) Global and local surrogate-assisted differential evolution for expensive constrained optimization problems with inequality constraints. IEEE Trans Cybern 49(5):1642–1656
16. Regis RG (2014) Evolutionary programming for high-dimensional constrained expensive black-box optimization using radial basis functions. IEEE Trans Evol Comput 18(3):326–347
17. Jin Y (2011) Surrogate-assisted evolutionary computation: recent advances and future challenges. Swarm Evol Comput 1(2):61–70
18. Knowles J (2006) ParEGO: a hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Trans Evol Comput 10(1):50–66
19. Zhang Q, Liu W, Tsang E, Virginas B (2010) Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Trans Evol Comput 14(3):456–474
20. Cheng R, Jin Y, Olhofer M, Sendhoff B (2016) A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans Evol Comput 20(5):773–791
21. Chugh T, Jin Y, Miettinen K, Hakanen J, Sindhya K (2018) A surrogate-assisted reference vector guided evolutionary algorithm for computationally expensive many-objective optimization. IEEE Trans Evol Comput 22(1):129–142
22. Hussein R, Deb K (2016) A generative Kriging surrogate model for constrained and unconstrained multi-objective optimization. Proc Genet Evol Comput Conf 2016:573–580
23. Zhao M, Zhang K, Chen G, Zhao X, Yao C, Sun H, Huang Z, Yao J (2020) A surrogate-assisted multi-objective evolutionary algorithm with dimension-reduction for production optimization. J Pet Sci Eng 192:107192
24. Li C, Gupta S, Rana S, Nguyen V, Venkatesh S, Shilton A (2017) High dimensional Bayesian optimization using dropout. In: Proceedings of the twenty-sixth international joint conference on artificial intelligence, pp 2096–2102
25. Chen G, Zhang K, Xue X, Zhang L, Yao J, Sun H, Fan L, Yang Y (2020) Surrogate-assisted evolutionary algorithm with dimensionality reduction method for water flooding production optimization. J Pet Sci Eng 185:106633
26. Li F, Cai X, Gao L, Shen W (2021) A surrogate-assisted multiswarm optimization algorithm for high-dimensional computationally expensive problems. IEEE Trans Cybern 51(3):1390–1402
27. Guo D, Wang X, Gao K, Jin Y, Ding J, Cai T (2021) Evolutionary optimization of high-dimensional multiobjective and many-objective expensive problems assisted by a dropout neural network. IEEE Trans Syst Man Cybern Syst, pp 1–14
28. Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580
29. McCullagh P (2002) What is a statistical model? Ann Stat 30(5):1225–1310
30. Stein M (1987) Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2):143–151
31. Ma M, Li H, Huang J (2018) A multi-objective evolutionary algorithm based on principal component analysis and grid division. In: 2018 14th international conference on computational intelligence and security (CIS), pp 201–204
32. Er M, Wu S, Lu J, Toh H (2002) Face recognition with radial basis function (RBF) neural networks. IEEE Trans Neural Netw 13(3):697–710
33. Kattan A, Galvan E (2012) Evolving radial basis function networks via GP for estimating fitness values using surrogate models. In: 2012 IEEE congress on evolutionary computation, pp 1–7
34. Sun C, Jin Y, Cheng R, Ding J, Zeng J (2017) Surrogate-assisted cooperative swarm optimization of high-dimensional expensive problems. IEEE Trans Evol Comput 21(4):644–660
35. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95—international conference on neural networks, vol 4, pp 1942–1948
36. Zhang Q, Zhou A, Jin Y (2008) RM-MEDA: a regularity model-based multiobjective estimation of distribution algorithm. IEEE Trans Evol Comput 12(1):41–63
37. Tian Y, Cheng R, Zhang X, Jin Y (2017) PlatEMO: a MATLAB platform for evolutionary multi-objective optimization [educational forum]. IEEE Comput Intell Mag 12(4):73–87
38. Cheng R, Jin Y, Narukawa K, Sendhoff B (2015) A multiobjective evolutionary algorithm using Gaussian process-based inverse modeling. IEEE Trans Evol Comput 19(6):838–856
39. Yen GG, He Z (2014) Performance metric ensemble for multiobjective evolutionary algorithms. IEEE Trans Evol Comput 18(1):131–144
40. He C, Li L, Tian Y, Zhang X, Cheng R, Jin Y, Yao X (2019) Accelerating large-scale multiobjective optimization via problem reformulation. IEEE Trans Evol Comput 23(6):949–961


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61903178, 61906081, and U20A20306), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), the Shenzhen Peacock Plan (Grant No. KQTD2016112514355531), and the Program for University Key Laboratory of Guangdong Province (Grant No. 2017KSYS008).

Author information

Authors and Affiliations

Guangdong Provincial Key Laboratory of Brain-inspired Intelligent Computation, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, 518055, China
Jianqing Lin, Cheng He & Ran Cheng


Corresponding author

Correspondence to Ran Cheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Lin, J., He, C. & Cheng, R. Adaptive dropout for high-dimensional expensive multiobjective optimization. Complex Intell. Syst. 8, 271–285 (2022). https://doi.org/10.1007/s40747-021-00362-5


Received: 02 March 2021
Accepted: 29 March 2021
Published: 21 April 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s40747-021-00362-5
