## Introduction

Multiobjective optimization problems (MOPs) refer to the optimization of multiple conflicting objectives simultaneously, i.e., improvement in one objective may lead to the degeneration of at least one of the other objectives [1]. MOPs are commonly seen in the science and engineering areas [2], e.g., neural architecture search [3], structure optimization [4], and industrial dispatch [5]. An MOP can be mathematically formulated as

\begin{aligned} \begin{aligned}&\text {minimize} \quad {\mathbf {f}}({\mathbf {x}})=\left( f_1({\mathbf {x}}),\ldots ,f_m({\mathbf {x}})\right) \\&\text {subject to} \quad \ {\mathbf {x}}\in X, \\ \end{aligned} \end{aligned}
(1)

where X is the search space of decision variables, m $$(\ge 2)$$ is the number of objectives, and $${\mathbf {x}} = (x_1, \ldots , x_D)$$ is the decision vector with D decision variables [6].

The goal of multiobjective optimization is to obtain a set of representative Pareto-optimal solutions [7]. To be more specific, each solution in the Pareto-optimal solution set (PS) is not Pareto dominated by other solutions in X. Notably, the Pareto dominance between solution $${\mathbf {x}}_1$$ and solution $${\mathbf {x}}_2$$ is notated as $${\mathbf {x}}_1\prec {\mathbf {x}}_2$$ if and only if $$f_i({\mathbf {x}}_1)\le f_i({\mathbf {x}}_2)$$ for all $$1\le i\le m$$, and $$f_j({\mathbf {x}}_1)<f_j({\mathbf {x}}_2)$$ for at least one $$1\le j\le m$$ [8].
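The dominance relation above translates directly into code. The following is a minimal sketch for minimization problems; the helper names `dominates` and `non_dominated` and the small objective matrix `F` are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def dominates(f1, f2):
    """Return True if objective vector f1 Pareto-dominates f2 (minimization)."""
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    # No worse in every objective and strictly better in at least one.
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

def non_dominated(F):
    """Return the row indices of F that are not dominated by any other row."""
    F = np.asarray(F, dtype=float)
    n = len(F)
    return [i for i in range(n)
            if not any(dominates(F[j], F[i]) for j in range(n) if j != i)]

# Hypothetical objective values of four solutions (two objectives each).
F = np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]])
front = non_dominated(F)  # (3, 3) is dominated by (2, 2)
```

The pairwise check costs a quadratic number of comparisons, which is acceptable for the small archives typical of expensive optimization.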

As a complex case of MOPs, if the evaluation of any objective function in an MOP is computationally or economically expensive, the problem is called an expensive multiobjective optimization problem (EMOP) [9]. For example, in an automobile body structural design problem, one structural simulation takes tens of minutes or even hours [10]. As a powerful tool for black-box optimization, the evolutionary algorithms (EAs) are proven to be helpful in solving MOPs. Nevertheless, EAs are not competent to solve EMOPs, since the requirement of massive function evaluations leads to unbearable time consumption and computational cost [11].

Surrogate-assisted EAs (SAEAs), which apply surrogate model techniques to traditional evolutionary algorithms, are the mainstream methods to solve EMOPs. Surrogate models are used to replace part of computationally expensive function evaluations (FEs). Since the surrogate models are computationally cheap and efficient, they can significantly reduce the required computational costs [12]. The most commonly used surrogate models include polynomial regression (PR) [13], Gaussian process (GP) [13, 14], radial basis function (RBF) [15, 16], artificial neural networks (ANN) [11], etc. Apart from the adopted surrogate model(s), model management is also a crucial part of SAEAs, which mainly involves how to update the current surrogate model in two aspects. First, it includes the selection of offspring individuals to be re-evaluated by expensive function(s). Second, the infill criterion determines the quality of the candidate re-evaluated individuals, where the re-evaluated solutions will be added to the training dataset for updating the surrogate model [17].

In the past decades, a number of SAEAs have been proposed to solve EMOPs effectively. The most representative SAEAs include ParEGO [18], MOEA/D-EGO [19], K-RVEA [20, 21], CSEA [11], etc. However, the SAEAs above focus only on low-dimensional optimization with fewer than 30 decision variables [18, 19, 21, 22]. This is mainly due to the lack of training samples for building accurate surrogate models and the unbearable time cost of training the models with high-dimensional input. Recently, several attempts have been made to address this issue. Some methods use dimensionality reduction techniques to map decision variables from a high-dimensional space to a low-dimensional space and then optimize in the low-dimensional space [23, 24, 25]. Other approaches replace the computationally expensive Gaussian process models with cheaper surrogate models, or use multiswarm optimization algorithms rather than conventional genetic algorithms [26]. Very recently, a new method called EDN-ARMOEA was proposed for solving high-dimensional EMOPs, which uses a dropout neural network instead of the Gaussian process [27]. Nevertheless, the performance of the SAEAs mentioned above can be further improved.

In this paper, a surrogate-assisted particle swarm optimization algorithm with an adaptive dropout mechanism, called ADSAPSO, is proposed to handle high-dimensional EMOPs. We mainly focus on selecting a small number of crucial decision variables for convergence enhancement and diversity maintenance, aiming to reduce the cost of building and training the surrogate model(s). Compared with EDN-ARMOEA, which uses a dropout ANN for dimension reduction, the proposed ADSAPSO differs mainly in two aspects. (i) EDN-ARMOEA uses the dropout strategy to improve the computational efficiency of the surrogate model; by contrast, the dropout strategy in ADSAPSO is adopted to reduce the dimension of the optimization problem. (ii) The dropout neural network in EDN-ARMOEA discards neurons of the network randomly with a certain probability. On the contrary, the dropout mechanism in ADSAPSO learns a deterministic reduction via a statistical model. The main new contributions of this work are summarized as follows:

1. Inspired by the dropout strategy in deep learning, we propose an adaptive dropout mechanism to select several decision variables for dimensionality reduction. Unlike existing unsupervised dimensionality reduction approaches, the proposed adaptive mechanism learns from the statistical differences of different solution sets in the decision space for better dimension reduction.

2. To select promising candidate solutions to be re-evaluated, a novel infill criterion is proposed, which contains the RBF-based search and the Replacement operation. Notably, we use the RBF-based search to optimize the selected d decision variables, and the Replacement operation is used to extend the low-dimensional decision vectors to full-length candidate solutions.

The rest of this paper is organized as follows. In “Background”, we provide the background of the work, including the dropout mechanism, the RBF model, and the estimation of distribution model. The details of the proposed algorithm are illustrated in “Proposed method”. The experimental studies and result analysis are given in “Experimental study”. Finally, “Conclusion” concludes this paper.

## Background

In this section, we first summarize the dropout in deep learning, which is extended to dimension dropout in this study. Next, the detailed principle and components of the RBF are illustrated. Then we introduce the estimation of distribution model, which is used in our proposed adaptive dropout mechanism.

### Dropout

Dropout, first proposed in Ref. [28], originally referred to temporarily discarding neurons from a deep neural network with a certain probability during forward propagation. This is equivalent to sampling a thinner network from the original network. An example of dropout in a neural network is shown in Fig. 1.

The original neural network is shown in Fig. 1a. After dropout, some neurons are discarded, as shown by the dotted circles in Fig. 1b. The complexity of the neural network is significantly reduced, and the network generalizes better, since it relies less on some local features, which effectively alleviates over-fitting.

More specifically, a neural network with N neurons can be regarded as a collection of $$2^N$$ thinned networks under dropout, which helps prevent over-fitting while improving the training efficiency of the network.
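As a small illustration of the idea, the sketch below samples a binary mask over the activations of one layer and counts the possible thinned networks. The function name `dropout_mask`, the drop probability of 0.5, and the activation values are assumptions chosen for the example only.

```python
import numpy as np

def dropout_mask(n_neurons, drop_prob, rng):
    """Sample a 0/1 mask; a 0 marks a temporarily discarded neuron."""
    return (rng.random(n_neurons) >= drop_prob).astype(float)

rng = np.random.default_rng(0)
activations = np.array([0.3, 1.2, 0.7, 2.1])   # hypothetical layer outputs
mask = dropout_mask(activations.size, drop_prob=0.5, rng=rng)
thinned = activations * mask                    # forward pass of the thinned layer
n_subnetworks = 2 ** activations.size           # number of distinct masks, 2^N
```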

### Radial basis function (RBF) neural network

The RBF neural network is a simple and widely used surrogate model that can approximate nonlinear functions. It trains quickly because it is a local approximation network. On the basis of training data $$\{ (\mathbf{x }_i,y_i)| i=1,\ldots ,N \}$$, it approximates a continuous function as follows:

\begin{aligned} \ {\hat{f}}(\mathbf{x })=\sum _{i=1}^{N} w_i\phi (\mathbf{x }_i,\mathbf{x }), \end{aligned}
(2)

where $$w_i$$ is the weight coefficient, $$\phi (\cdot , \cdot )$$ represents the kernel function, and the distance between the point $$\mathbf{x }_i$$ and the point $$\mathbf{x }$$ is used as the independent variable of the kernel function.

Commonly used kernel functions include linear kernel function, polynomial kernel function, Gaussian kernel function, etc. For example, the Gaussian kernel function is calculated as follows:

\begin{aligned} \ \phi (\mathbf{x }_i,\mathbf{x })=\exp (-\frac{\Vert \mathbf{x }_i -\mathbf{x } \Vert ^2}{2\delta ^2}), \end{aligned}
(3)

where $$\delta$$ is the expansion constant of kernel function, which reflects the width of the kernel function image. The smaller the $$\delta$$, the narrower the width.

The weight vector $$\mathbf{w }=(w_1,\ldots ,w_N)$$ is calculated as follows:

\begin{aligned} \ \mathbf{w } = (\varPhi ^T\varPhi )^{-1}\varPhi ^T\mathbf{y }, \end{aligned}
(4)

where $$\mathbf{y }=(y_1,\ldots ,y_N)$$ is the output vector of the training data and $$\varPhi$$ is the matrix computed as follows:

\begin{aligned} \begin{aligned} \ \varPhi = \left[ \begin{matrix} \phi (\mathbf{x }_1, \mathbf{x }_1) &{} \cdots &{} \phi (\mathbf{x }_1, \mathbf{x }_N) \\ \vdots &{} \ddots &{} \vdots \\ \phi (\mathbf{x }_N, \mathbf{x }_1) &{} \cdots &{} \phi (\mathbf{x }_N, \mathbf{x }_N) \end{matrix} \right] . \end{aligned} \end{aligned}
(5)
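Equations (2) to (5) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names, the toy data, and the value of $$\delta$$ are assumptions. The weights are obtained with a least-squares solver, which yields the same solution as Eq. (4) when $$\varPhi$$ has full rank and avoids forming the inverse explicitly.

```python
import numpy as np

def gaussian_kernel(A, B, delta):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 delta^2)), Eq. (3)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * delta ** 2))

def fit_rbf(X, y, delta):
    """Solve for the weight vector w of Eq. (4) by least squares."""
    Phi = gaussian_kernel(X, X, delta)           # matrix of Eq. (5)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbf(X_train, w, delta, X_new):
    """Evaluate the approximation of Eq. (2) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, delta) @ w

# Toy one-dimensional training set (illustrative values only).
X = np.linspace(0.0, 1.0, 8)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
delta = 0.15
w = fit_rbf(X, y, delta)
y_hat = predict_rbf(X, w, delta, X)              # reproduces y at the training points
```

Because every training point is a kernel center, the model interpolates the training data; for multiobjective problems one such model would be fitted per objective.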

### Statistical model

Statistical model refers to a model built by mathematical-statistical methods, which obtains the relationship and the development trend between variables through mathematical statistics [29].

Mathematical statistics has two functions: descriptive statistics and inferential statistics. Central trend statistics is a standard descriptive statistical method, which uses a trend value to represent the overall level of a set of data. Commonly used trend values include average, median, and mode. Discrete trend analysis reflects the degree of dispersion of data, and its commonly used indicators are range and standard deviation. Frequency analysis mainly describes the number of occurrences of each characteristic value of the data, reflecting the distribution of a set of data.

Inferential statistics is divided into two parts: parameter estimation and hypothesis testing. The former is based on a one-time sampling experiment to estimate a certain characteristic of the entire data set. The latter tests a certain hypothesis and infers whether the hypothesis is acceptable according to the calculation results.

## Proposed method

In this section, the main framework of the proposed ADSAPSO is introduced, followed by the details of the proposed model management, including the adaptive dropout mechanism and a new infill criterion.

The framework of the proposed ADSAPSO is presented in Algorithm 1, which mainly consists of the adaptive dropout mechanism and the infill criterion. First, N solutions are sampled using the Latin hypercube sampling (LHS) strategy [30] and evaluated using the expensive functions. Then these evaluated candidate solutions are merged into an empty archive Arc. Next, the environmental selection selects $$N_\alpha$$ solutions (arc) from archive Arc. Afterwards, the adaptive dropout mechanism and the infill criterion are adopted. The adaptive dropout mechanism aims to select d decision variables, and the infill criterion is used to choose the new samples to be evaluated by the expensive functions. Notably, we first select one well-performing solution set and one poorly-performing solution set, which are statistically different in the decision space. Accordingly, a set S of d variables is selected from the full-length decision variables, where the chosen variables significantly affect the convergence. As for the infill criterion, the RBF-based search is used to optimize the selected d decision variables and obtain a set $$X_d$$ of k optimized d-dimensional decision vectors. Then a replacement operator selects k full-length solutions, whose corresponding decision variables will be replaced by $$X_d$$, to form the individual set $$X_{{\text {new}}}$$. $$X_{{\text {new}}}$$ will be evaluated using the expensive functions to update the archive Arc. Finally, the obtained Arc will be output as the final solutions. Note that ADSAPSO uses the same environmental selection operator as NSGA-II [7], so we do not describe its details.
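The initial sampling step of the framework can be sketched as follows. This is a generic Latin hypercube sampler written for illustration; the function name `latin_hypercube`, the unit bounds, and the sample sizes are assumptions, and the sampler used in the experiments may differ in detail.

```python
import numpy as np

def latin_hypercube(n, D, lower, upper, rng):
    """Draw n points in [lower, upper]^D with one point per stratum in each dimension."""
    # Place one uniform draw inside each of the n equal-width strata of [0, 1).
    u = (rng.random((n, D)) + np.arange(n)[:, None]) / n
    # Shuffle the strata independently in every dimension.
    for j in range(D):
        u[:, j] = rng.permutation(u[:, j])
    return lower + u * (upper - lower)

rng = np.random.default_rng(0)
X0 = latin_hypercube(10, 3, 0.0, 1.0, rng)   # 10 initial samples, 3 variables
```

Each column of `X0` contains exactly one value from every interval of width 1/10, which spreads the small evaluation budget evenly over every decision variable.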

The adaptive dropout mechanism is a crucial component of the proposed ADSAPSO (line 8 in Algorithm 1), which reduces the number of decision dimensions from D to d. Among the D decision variables, some have a more significant effect on convergence enhancement, and they are important for expensive multiobjective optimization. Thus, the adaptive dropout mechanism adopts a statistical model-assisted method for selecting d decision variables that significantly affect convergence from the original full-length decision variables. An example of the adaptive dropout mechanism is shown in Fig. 2, where the dashed circles indicate the discarded decision variables.

Unlike the principal component analysis technique [31], which maps data from high-dimensional space to low-dimensional space, our proposed adaptive dropout mechanism retains the original information of the selected d dimensions. Specifically, the proposed adaptive dropout mechanism learns from different solution sets’ statistical differences in the decision space.

Generally, the proposed adaptive dropout mechanism consists of three steps: (1) Solutions selection; (2) Statistical models construction; (3) Dimension selection.

$$\textit{(1) Solutions selection:}$$ We assume that a set of well-performing solutions and a set of poorly-performing ones differ significantly in some dimensions of the decision space. The distribution differences between these two solution sets can then reflect the importance of the corresponding decision variables for convergence enhancement. Thus, a suitable solution selection should be used to capture the differences. If the well-performing and poorly-performing solutions are both selected from archive Arc, the poorly-performing solutions remain the same during the update process, because their rankings in the whole Arc are almost unchanged. In other words, the distribution of the poorly-performing solution set will not help in the later stages of evolution. To remedy this issue, we select the well-performing solutions and the poorly-performing solutions from archive arc, where solutions in arc are elite solutions selected from Arc. Afterwards, according to the non-dominated sorting, the first $$N_{\text {s}}$$ solutions in arc are regarded as well-performing solutions, and the last $$N_{\text {s}}$$ solutions in arc are deemed to be poorly-performing ones. Consequently, the indistinguishable situation can be avoided.

$$\textit{(2) Statistical models construction:}$$ After solutions selection, two statistical models are constructed using the selected well-performing solutions and the selected poorly-performing solutions, respectively. To discover the effect of different variables, we analyze the two sets of solutions in the decision space via the two constructed statistical models. To be more specific, we compute the average value of each dimension of each solution set separately as its statistical value. Since the two selected sets of solutions differ significantly in the objective space, the influence of different variables on the objective values can be reflected by the statistical differences in the decision space. Thus, dimensions with significant differences in the decision space will be emphasized for better convergence enhancement in the following optimization.

$$\textit{(3) Dimension selection:}$$ After constructing the two statistical models, a difference model is obtained by calculating the difference between the two models in each dimension. Then, we select the top $$\beta \cdot D$$ dimensions with the highest absolute difference to form set S.
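Steps (2) and (3) can be sketched as follows, assuming the well-performing and poorly-performing solution sets have already been chosen in step (1). The function name `select_dimensions`, the rounding of $$\beta \cdot D$$ to the nearest integer, and the toy data are illustrative assumptions.

```python
import numpy as np

def select_dimensions(good, bad, beta):
    """Return the indices S of the dimensions with the largest mean difference.

    good, bad: arrays of shape (N_s, D) holding decision vectors.
    beta: fraction of the D dimensions to keep.
    """
    D = good.shape[1]
    d = max(1, int(round(beta * D)))         # assumed rounding of beta * D
    # Statistical models: per-dimension mean of each solution set.
    mean_good = good.mean(axis=0)
    mean_bad = bad.mean(axis=0)
    # Difference model: absolute gap between the two means.
    diff = np.abs(mean_good - mean_bad)
    top = np.argsort(-diff, kind="stable")[:d]
    return np.sort(top)

# Toy sets with D = 6: dimensions 0, 2 and 4 differ between the two sets.
good = np.array([[0.1, 0.5, 0.9, 0.5, 0.2, 0.5],
                 [0.1, 0.5, 0.9, 0.5, 0.2, 0.5]])
bad = np.array([[0.9, 0.5, 0.1, 0.5, 0.6, 0.5],
                [0.9, 0.5, 0.1, 0.5, 0.6, 0.5]])
S = select_dimensions(good, bad, beta=0.5)
```

Selecting existing coordinates, rather than projecting onto new axes, keeps the chosen variables directly interpretable in the original decision space.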

Figure 3 shows an example of the adaptive dropout mechanism, and Fig. 3a demonstrates the detailed process of solutions selection. In this figure, the blue squares are the “Good” solutions and the red triangles are the “Bad” ones. Notably, the black circles in the upper right corner of Fig. 3a are the solutions (in Arc) that are not selected into arc. They are not selected as “Bad” solutions, which prevents the distribution of “Bad” solutions from stagnating prematurely. Figure 3b presents the two statistical models, where the horizontal axis represents the dimension, and the vertical axis is the dimension value. The red line indicates the model of poorly-performing solutions (“Bad” solutions), and the blue line indicates the model of well-performing solutions (“Good” solutions). Figure 3c demonstrates the absolute difference of the above two statistical models, where the horizontal axis represents the dimension, and the vertical axis represents the absolute difference between the two statistical models. The dotted line represents the selection threshold, and the shaded parts indicate the dimensions to be selected. Specifically, $$d_1$$, $$d_2$$, $$d_3$$, and $$d_4$$ represent the lower and upper boundaries of the dimension ranges that lie above the threshold. The corresponding dimensions in $$[d_1,d_2]$$ and $$[d_3,d_4]$$ are the dimensions we ultimately choose.

For high-dimensional EMOPs, a large number of training samples are required for constructing accurate surrogate models, which is unrealistic for expensive multiobjective optimization. Our proposed adaptive dropout mechanism reduces the dimension of decision variables from D to d according to the statistical results. Generally, the proposed method helps build relatively accurate d-dimensional surrogate models with the same number of training samples.

### Infill criterion

Once the d-dimensional decision vector set S, which has a more significant effect on convergence, is selected by adaptive dropout, the proposed ADSAPSO is expected to optimize the d decision variables and select promising candidate individuals to be re-evaluated.

In our proposed infill criterion, an RBF-based search, given in Algorithm 2, is adopted to optimize the selected d-dimensional variables. First, m d-dimensional RBF models are trained using the individuals in $${\text {arc}}_{d}$$, which will be used to replace the original expensive functions for evaluating offspring. Notably, RBF is used due to its insensitivity to the increment in the dimension of the function to be approximated [32, 33]. Compared with the Kriging model [34], RBF is more suitable for problems with high-dimensional decision variables, since the computation time for training Kriging models becomes unbearable when the number of training samples is large. Then, the particle swarm optimizer (PSO) [35] is adopted for further optimization, due to its promising capability in solving high-dimensional optimization problems as suggested in Ref. [34]. To enhance population diversity at the late stage of the evolution, we add a polynomial mutation operation (Line 6 in Algorithm 2). In the late stage of the evolution, the population may easily become trapped in local optima, and thus $$0.75 \times {\text {MaxFEs}}$$ is empirically adopted as the threshold for applying polynomial mutation. After the optimization, k better-converged d-dimensional decision vectors are selected from the final decision vector set. Although not all decision variables are optimized at every iteration, the surrogate model built with d-dimensional training samples can better help optimize the d-dimensional variables and accelerate convergence than surrogate models built with D-dimensional training samples. The RBF-based search in the proposed ADSAPSO aims to conduct a local search in a low-dimensional space to quickly obtain better-converged solutions, which is naturally suitable for high-dimensional EMOPs.

Since the optimized k decision vectors of dimension d cannot be evaluated by the expensive functions directly, we introduce a Replacement method that extends d-dimensional decision vectors to D-dimensional ones. To be more specific, k solutions of dimension D, named $$X_D$$, are first selected from the non-dominated solution set of arc by the environmental selection. Note that the environmental selection adopted here is the same as that in Algorithm 1. Next, the corresponding dimensions of $$X_D$$ will be replaced by $$X_d$$ to form k new individuals $$X_{{\text {new}}}$$. Specifically, solutions in $$X_d$$ are well converged, and solutions in $$X_D$$ have good diversity. To some extent, the newly generated solutions $$X_{{\text {new}}}$$ can be considered to inherit the convergence and diversity properties of $$X_d$$ and $$X_D$$ simultaneously. Finally, we re-evaluate $$X_{{\text {new}}}$$ to update the archives. An illustrative example of the Replacement is shown in Fig. 4, where each circle denotes a decision variable, and the dashed circles will be replaced to form full-length decision vectors for evaluation.
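The Replacement operation amounts to overwriting the selected columns of the full-length solutions. A minimal sketch is given below; the function name `replacement` and the toy arrays are assumptions, and the index set `S` is taken to be the output of the adaptive dropout mechanism.

```python
import numpy as np

def replacement(X_D, X_d, S):
    """Overwrite the dimensions S of the k full-length solutions X_D with X_d.

    X_D: array of shape (k, D); X_d: array of shape (k, d); S: d column indices.
    """
    X_new = np.array(X_D, dtype=float, copy=True)  # leave X_D untouched
    X_new[:, S] = X_d
    return X_new

# Toy example with k = 2, D = 5 and selected dimensions S = [1, 3].
X_D = np.zeros((2, 5))
X_d = np.array([[1.0, 2.0], [3.0, 4.0]])
S = [1, 3]
X_new = replacement(X_D, X_d, S)
```

The i-th optimized vector is paired with the i-th full-length solution here; how the two sets are paired is an assumption of this sketch.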

## Experimental study

To empirically investigate the performance of the proposed ADSAPSO, we first discuss the impact of different values of parameter $$\beta$$ on the algorithm. Then, three state-of-the-art algorithms, namely MOEA/D-EGO [19], K-RVEA [21], and RM-MEDA [36], are compared with our proposed ADSAPSO on test problems. Notably, we compare our proposed algorithm with RM-MEDA since both algorithms adopt statistical models during the optimization. In the rest of this section, we first present a brief introduction to the experimental settings of all the compared algorithms. Then the test problems and performance indicators are described. Afterwards, each algorithm is run 20 times on each test problem independently. The Wilcoxon rank sum test is used to compare the results obtained by the proposed ADSAPSO and the compared algorithms at a significance level of 0.05. Symbols “$$+$$”, “−”, and “$$\approx$$” mean that the compared algorithm is significantly better than, significantly worse than, and approximately equal to the proposed ADSAPSO, respectively.

### Experimental settings

For fair comparisons, all the compared algorithms are implemented in PlatEMO [37] using MATLAB, and they are run on a computer with an Intel Core i7 processor and 32 GB of RAM.

1. Parameter settings: In our proposed ADSAPSO, the number of samples used to build the surrogate model is set to $$N_{\alpha }=200$$, and the number of individuals to be re-evaluated is set to $$k = 5$$. The number of well-performing solutions and of poorly-performing solutions chosen in solutions selection is set to $$N_{\text {s}} = 50$$. In K-RVEA, the number of individuals to be re-evaluated and used for updating surrogate models is also set to 5. For the other compared algorithms, the recommended parameter settings in the literature are used.

2. Reproduction operators: In this paper, the maximum number of generations in the RBF-based search is set to $$\lambda _{{\text {max}}}=100$$. In PSO, the population size is set to $${\text {NI}} = 100$$, and the inertia weight is set to $$W=0.5$$.

3. Termination condition: For all the algorithms except RM-MEDA, the size of the initial data is set to 500. The maximum number of function evaluations using the expensive function (denoted as MaxFEs) is set to 1000.

### Test problems and performance indicator

In this paper, we use ten IMF problems selected from [38] and seven DTLZ problems. Among the IMF and DTLZ test suites, the number of objectives is three in IMF4, IMF8, and all DTLZ problems, and it is two in the other problems.

The Inverted Generational Distance (IGD) indicator [39] is a commonly used performance indicator for multiobjective optimization, which can assess the quality of the obtained solution set from both the convergence and distribution uniformity. To better explain IGD, we suppose that P is a set of reference points that are evenly distributed on the Pareto-optimal Front (PF) and Q is the set of obtained non-dominated solutions. The mathematical definition of IGD is

\begin{aligned} \text {IGD}(P,Q)= \frac{\sum _{{\mathbf {p}} \in P}\text {dis}({\mathbf {p}},Q)}{|P|}, \end{aligned}
(6)

where $$\text {dis}({\mathbf {p}},Q)$$ is the minimum Euclidean distance between $${\mathbf {p}}$$ and points in Q, and |P| is the number of reference points in P. A smaller IGD value indicates better performance of the algorithm.
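Equation (6) can be computed with a few array operations. The sketch below is illustrative; the function name `igd` and the two small point sets are assumptions made for the example.

```python
import numpy as np

def igd(P, Q):
    """Inverted Generational Distance of solution set Q w.r.t. reference set P."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Pairwise Euclidean distances, shape (|P|, |Q|).
    dists = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
    # dis(p, Q) is the distance from p to its nearest point in Q.
    return float(dists.min(axis=1).mean())

# Two reference points and two obtained solutions (toy values).
P = np.array([[0.0, 1.0], [1.0, 0.0]])
Q = np.array([[0.0, 1.0], [1.0, 1.0]])
value = igd(P, Q)   # nearest distances are 0 and 1, so the mean is 0.5
```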

### Impact of parameter $$N_{\text {s}}$$

In the proposed adaptive dropout mechanism, parameter $$N_{\text {s}}$$ is the number of well-performing solutions and of poorly-performing solutions selected from the elite archive arc. A sensitivity analysis is conducted to investigate the impact of parameter $$N_{\text {s}}$$ on the performance of ADSAPSO. Here, ADSAPSO with $$N_{\text {s}}$$ set to 100 and 67 is compared with ADSAPSO with $$N_{\text {s}} = 50$$. The different $$N_{\text {s}}$$ values are empirical choices derived from $$N_\alpha = 200$$. $$N_{\text {s}} = 100$$ means that half of arc is selected, $$N_{\text {s}} = 67$$ indicates that the elite archive arc is divided into three equal parts, and $$N_{\text {s}} = 50$$ means that a quarter of the elite archive arc is selected.

The statistical results of the IGD values achieved by ADSAPSO with different $$N_{\text {s}}$$ values on the IMF test suite are summarized in Table 1. It can be observed that ADSAPSO with different $$N_{\text {s}}$$ values performs similarly, and ADSAPSO with $$N_{\text {s}} = 50$$ has achieved the best results on these test problems. Thus, $$N_{\text {s}} = 50$$ is adopted in the remaining experiments.

### Impact of parameter $$\beta$$

Additional comparative experiments are run to investigate the impact of parameter $$\beta$$ (the proportion of dimensions selected by the adaptive dropout mechanism). Keeping the same parameter settings as introduced in “Experimental settings”, ADSAPSO variants with different $$\beta$$ values from $$\{0.1, 0.3, 0.5, 0.7, 1.0\}$$ are empirically compared.

The statistical results of the IGD values achieved by ADSAPSO with different $$\beta$$ values on the IMF test suite are summarized in Table 2.

It can be observed that ADSAPSO with $$\beta =0.5$$ has achieved the best results on these test problems, and ADSAPSO with $$\beta =0.1$$ has achieved the worst results, which indicates that different values of $$\beta$$ have a significant influence on the result. In subsequent experiments, we will adopt $$\beta =0.5$$ in our proposed ADSAPSO.

Note that $$\beta =1.0$$ indicates that the adaptive dropout mechanism is not used in ADSAPSO, and full-length decision vectors are used during optimization. It can be observed that ADSAPSO with $$\beta =0.5$$ shows an overall better performance than ADSAPSO without the adaptive dropout mechanism, although the latter is better on IMF6 and IMF8. This comparison with ADSAPSO with $$\beta =1.0$$ verifies the effectiveness of our proposed adaptive dropout mechanism.

### General performance

In this subsection, ADSAPSO is compared with three representative algorithms, i.e., MOEA/D-EGO [19], K-RVEA [21], and RM-MEDA [36]. K-RVEA is an SAEA that relies on a set of adaptive reference vectors for expensive many-objective optimization, and MOEA/D-EGO is an MOEA/D-based SAEA with the Gaussian stochastic process model. RM-MEDA is a regularity model-based multiobjective estimation of distribution algorithm, which is also based on statistical learning methods.

The experimental results of the IGD values obtained by MOEA/D-EGO, K-RVEA, RM-MEDA, and ADSAPSO on the IMF test suite and the DTLZ test suite are recorded in Tables 3 and 4, respectively. Notably, MOEA/D-EGO fails to solve DTLZ4 with 200 decision variables due to the failure of Kriging models in handling high-dimensional data. It can be seen that ADSAPSO obtains better IGD values than MOEA/D-EGO, K-RVEA, and RM-MEDA on most of the test problems.

The convergence profiles of the three compared algorithms on DTLZ problems with 200 decision variables are given in Fig. 5.

It can be observed that our proposed ADSAPSO converges faster than the other three compared algorithms on most problems. The results have demonstrated the superiority of our proposed ADSAPSO over the three compared algorithms on EMOPs in terms of convergence speed.

Figure 6 presents the archive Arc achieved by the ADSAPSO, MOEA/D-EGO and K-RVEA on bi-objective DTLZ1 with 50 decision variables at 500 evaluations (Initial archive), 750 evaluations (Interim archive), and 1000 evaluations (Final archive) to visualize the experimental results more intuitively.

It can be observed that the archive obtained by ADSAPSO converges in several directions at a fast rate in the early stage; in the later stage, the convergence slows down, but the population diversity increases. MOEA/D-EGO and K-RVEA hardly converge during the whole process. Overall, ADSAPSO has achieved the best results on these problems, where the obtained final archive Arc is the best converged.

## Conclusion

In this paper, we have proposed an SAEA with adaptive dropout mechanism, called ADSAPSO, for solving high-dimensional EMOPs with up to 200 decision variables.

Generally, the adaptive dropout mechanism is used for selecting d-dimension decision variables from D-dimensional original decision variables. It can help build d-dimensional relatively high-precision RBF models, and the algorithm only needs to optimize the d-dimensional variables at each iteration. The infill criterion is adopted to select candidate individuals for re-evaluation, where the corresponding dimensions in k D-dimensional solutions are replaced by the k d-dimensional optimized individuals to form k new D-dimensional individuals.

Some systematic comparisons have been conducted on a set of EMOPs with up to 200 decision variables. First, we discuss the impact of different values of parameter $$\beta$$ on the algorithm and find that $$\beta =0.5$$ performs best. The proposed ADSAPSO is then compared with three representative algorithms, i.e., MOEA/D-EGO, K-RVEA, and RM-MEDA, on the IMF and DTLZ test problems. The experimental results indicate the superiority of ADSAPSO in solving EMOPs with high-dimensional decision variables.

This paper indicates that ADSAPSO is promising in solving high-dimensional EMOPs. Furthermore, it is promising to extend ADSAPSO to solving expensive large-scale multiobjective optimization problems [40] and expensive many-objective optimization problems. It is also desirable to apply it to real-world applications.