Introduction

Population-based evolutionary algorithms (EAs) have attracted considerable interest and have been widely applied to complex black-box system design and optimization over the past several decades. Their applications include, but are not limited to, the design optimization of spacecraft and chemical reactors [1, 2], neural network architectures [3, 4], morphological topology [5, 6], and so forth, owing to their simple principles, ease of operation, robustness, and, in particular, their weak dependence on problem attributes. Nonetheless, to locate the optimal solution of an optimization problem, EAs usually require a large number of objective function evaluations. This makes them inefficient and computationally expensive for problems involving time-consuming high-precision simulation and analysis, such as computational fluid dynamics, finite element analysis, and physical and chemical experiments, significantly hindering their practicality [7]. To break this performance bottleneck of EAs on computationally expensive optimization problems, surrogate-assisted evolutionary algorithms (SAEAs) have gained wide attention. A SAEA boosts the efficiency of an EA on such problems by constructing an inexpensive surrogate model that approximately replaces the expensive target fitness function for fitness evaluation [8,9,10]. Commonly used surrogate models include Radial Basis Functions (RBF) [11,12,13], Kriging or Gaussian Processes (GP) [12,13,14], Support Vector Machines (SVM) [15, 16], and Polynomial Regression (PR) [17].

Current research on SAEAs focuses on surrogate modeling for different types of optimization problems or underlying algorithms, and on customizing effective model management strategies that balance the frequency of target fitness function invocations against solution accuracy so as to maximize algorithm performance. Model management is critical for the correct convergence of these methods. Its primary components cover the cooperation between the surrogate model and the EA's learning operators, and the impact of infill samples on the model's accuracy during the optimization process. To improve the training quality and prediction accuracy of the surrogate model, Wang et al. [18], inspired by the Tri-Training semi-supervised learning technique, proposed a new surrogate-assisted multi-objective optimization algorithm, MOO-TTSA. At each iteration, Tri-Training is used to filter the candidate solutions with higher fitness confidence so as to enlarge the training sample set and improve the modeling quality of the surrogate. Tong et al. [19] calculated the leave-one-out cross-validation error of the training sample set and constructed an uncertainty prediction model for candidate solutions based on the RBF model; the samples with the best approximated fitness and the largest uncertainty in the design space were then chosen for infill sampling, respectively. By leveraging feature selection and feature extraction techniques in parallel, Guo et al. [20] enriched the training samples with three different feature attributes and constructed an ensemble model to approximate the target solution-space landscape; the lower confidence bound and expected improvement acquisition functions were also adapted to the prediction variance of the three base models. In addition, considering the diversity and accuracy of surrogate ensemble modeling in the solution space, Yu et al. [21] calculated the prediction error sum of squares (PRESS) for RBF models paired with five different kernel functions via cross-validation, and the two RBF base models with the best PRESS were selected to construct an ensemble model for estimating the fitness of candidate solutions. To improve the accuracy of the surrogate model and balance exploration and exploitation, the best and the worst individuals in the iterative population were selected for real evaluation at each iteration.

However, as problem scale and complexity increase, the curse of dimensionality, arising from high-dimensional complex feature spaces with multiple local optima and multivariate coupling, causes the training sample size required for surrogate modeling to grow exponentially. As a result, the training cost and overfitting risk of the surrogate model increase, which greatly limits the effectiveness of SAEAs. To improve the computational efficiency of SAEAs on high-dimensional computationally expensive optimization problems, spatial transformation and dimensional learning techniques have gained great attention. To reduce the training cost of GP models in a high-dimensional decision space, Sammon mapping was employed in [12] to project the high-dimensional feature space into a low-dimensional one, and GP models were then constructed in the reduced feature subspace to screen promising solutions in conjunction with the lower confidence bound criterion. Similarly, the high-dimensional feature space was approximately simplified by Sammon mapping in [22], and the iterative population was dynamically assigned different differential mutation operators to generate offspring based on feedback about the state of the optimal solution; meanwhile, global or local GP models were constructed in the feature subspace for the different mutation strategies. In [23], an eigen coordinate system associated with the original coordinate system was generated through spatial transformation, and the two coordinate systems were then used collaboratively with RBF-assisted multi-swarm optimization to generate new candidate populations with a certain probability. For complex large-scale expensive optimization problems, principal component analysis (PCA) was employed in [24] to simplify GP modeling by linearly mapping the training samples to a low-dimensional subspace, so that each objective function could be well approximated by a GP model to guide the optimization more accurately; the solutions with the smallest angle-penalized distances and the largest uncertainties were chosen for the subsequent refreshing of the GP models. Inspired by the divide-and-conquer philosophy, at each generation the large-scale solution space was reduced to a series of low-dimensional subspaces using PCA and a random feature selection technique in [25], and an adaptive search switch strategy regulated the search of the subspaces at different optimization stages, allowing the iterative population and surrogate model to better accommodate the potential exploration and exploitation directions offered by the subspaces of the original and mapped spaces. A transfer-optimization-based concept was adopted in [26], wherein a simplified problem space was constructed for the target problem's feature space via PCA; the simplified problem space and the mapping matrix between the two spaces are periodically reconstructed and updated according to the squared reconstruction error to ensure a positive transfer of optimal information during the bi-spatial search. The work in [27] opted for a feature selection technique to condense the large-scale decision space, and local surrogates were trained to approximate the landscape of the resulting low-dimensional feature subspace.
Unlike the aforementioned approaches, the high-dimensional feature space was compressed and reconstructed via the encoding and decoding operators of an autoencoder in [28]. Two variable-size subpopulations were co-evolved and exchanged information between the original high-dimensional feature space and the approximated low-dimensional feature subspace, significantly improving the solution efficiency of the SAEA on high-dimensional expensive optimization problems.

Using dimension reduction techniques to map the original complex high-dimensional feature space into a lower-dimensional, more tractable feature subspace makes it possible, on the one hand, to control the complexity of surrogate modeling and, on the other hand, to greatly increase the optimization efficiency of SAEAs in high-dimensional solution spaces. Nevertheless, the loss of feature information associated with dimension reduction often results in a mismatch of the optimal structural properties between the feature subspace and the original feature space, which directly affects the accuracy and precision of SAEAs. In fact, in SAEAs combined with dimension reduction techniques, the mapping relationship model between the original feature space and the feature subspace is often built from a small number of historical evolutionary samples, so the prediction accuracy of the mapping model depends strongly on the quality and distribution of those samples. Meanwhile, the candidate solutions derived from the feature subspace optimization usually undergo inverse mapping to reconstruct features and generate solutions to be evaluated in the original feature space; the prediction quality of the mapping model therefore directly determines the correctness of subsequent candidate solution screening and evaluation. Currently, existing SAEAs with dimension reduction techniques usually reuse the eigenvector matrix derived from the training samples for feature reconstruction of the newly generated candidates. Given the quality discrepancies and distribution characteristics of the training samples, directly adopting this eigenvector matrix for feature reconstruction of newly added candidates can easily lead to feature drift, deteriorating the quality of the candidate samples in the original feature space and misleading the convergence direction of the iterative population. To address the above concerns, this paper leverages two unsupervised feature learning techniques, i.e., principal component analysis and the autoencoder, to perform feature reduction and feature reconstruction of the high-dimensional solution space, and proposes a dual-drive collaboration SAEA, named DDCSAEA, for high-dimensional expensive optimization problems. The proposal’s main contributions are as follows.

  1. A new feature reduction-driven surrogate-assisted subspace search strategy, based on PCA and an RBF-assisted local search, is proposed to simplify surrogate modeling and to extract principal-component prior knowledge about the optimal solution during the iteration.

  2. A new feature reconstruction-driven infill-sampling strategy, built on differential mutation and an autoencoder, is designed to reconstruct and filter the promising solutions of the feature subspace for real evaluation in the target problem space.

  3. A comprehensive analysis of the performance discrepancy of SAEAs under the single and sequentially coupled modes of feature reduction and feature reconstruction is provided. The comparative results show that the proposed method has better robustness and remarkable performance over five state-of-the-art algorithms on high-dimensional complex problems with multi-type fitness landscapes.

The remainder of the paper is structured as follows: “Related work” briefly introduces the principles of the RBF model, PCA, Autoencoder and the underlying local search engine. “Dual-drive collaboration surrogate-assisted evolutionary algorithm by coupling feature reduction and reconstruction” provides the motivation and a detailed description of the proposed method. “Empirical study” gives the experimental results and analyses. Finally, “Conclusion” concludes the paper and discusses some future works.

Related work

Social learning particle swarm optimizer

DDCSAEA employs the social learning particle swarm optimizer (SLPSO) as the local search engine for exploring the optimal solution GbestRBF of the RBF model in the feature subspace. SLPSO, a recent PSO variant, uses randomly selected superior exemplars and the mean position of the iterative swarm in place of the personal best Pbest and the swarm best Gbest, respectively, to guide the behavior learning of particles. Specifically, during behavior learning, the swarm members are first ranked in ascending order of fitness, so a smaller rank indicates better fitness. Then, for each learner (particle), an exemplar is drawn from the swarm members ranked better than the learner to guide its behavior learning. The best particle in the current iterative swarm is retained directly in the new swarm without undergoing behavior learning. Experimental results show that SLPSO performs excellently on complex optimization problems.

Without loss of generality, for the minimization problem, SLPSO updates the velocities and positions of the particles with Eqs. (1) and (2), respectively.

$$ \Delta x_{ij}^{(t + 1)} = \gamma \cdot \Delta x_{ij}^{(t)} + c_{1} \cdot (x_{kj}^{(t)} - x_{ij}^{(t)} ) + c_{2} \cdot \varepsilon (\overline{x}_{j}^{(t)} - x_{ij}^{(t)} ), $$
(1)
$$ x_{ij}^{(t + 1)} = \left\{ {\begin{array}{*{20}c} {x_{ij}^{(t)} + \Delta x_{ij}^{(t + 1)} ,\quad {\text{if}}\;p_{i}^{(t)} \le p_{i}^{(L)} ,} \\ {x_{ij}^{(t)} ,\quad {\text{otherwise}}.} \\ \end{array} } \right. $$
(2)

where \({\varvec{x}}_{i}^{(t)} = (x_{i1}^{(t)} ,x_{i2}^{(t)} ,...,x_{iD}^{(t)} )\), \(1 \le i \le N\), denotes the position vector of the ith particle at generation t. N and D represent the swarm size and the dimensionality of the problem, respectively. \(\Delta {\varvec{x}}_{i}^{(t)} = (\Delta x_{i1}^{(t)} ,\Delta x_{i2}^{(t)} ,...,\Delta x_{iD}^{(t)} )\) denotes the behavior correction vector, which plays a role similar to the velocity correction vector in PSO. j indexes the jth decision variable of the ith particle, and \(p_{i}^{(L)}\) indicates the learning probability of the ith particle. \(\gamma\) is the inertia weight; \(\gamma\), \(c_{1}\), \(c_{2}\) and \(p_{i}^{(t)}\) are all uniformly distributed random numbers in [0, 1], regenerated at each update. \(x_{kj}^{(t)}\) represents the jth element of the kth exemplar particle, whose fitness is better than that of the ith particle, and \(\overline{x}_{j}^{(t)} = \left( {\sum\nolimits_{i = 1}^{N} {x_{ij}^{(t)} } } \right)/N\) denotes the mean value of the iterative swarm in the jth dimension. In Eq. (1), \(\varepsilon\) is the social influence factor controlling the effect of \(\overline{x}_{j}^{(t)}\) on behavior learning. In this work, \(p_{i}^{(L)}\) and \(\varepsilon\) are set to 1 and 0, respectively [29].
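For illustration, Eqs. (1) and (2) can be realized as the minimal NumPy sketch below (slpso_step is an illustrative name, not the reference SLPSO implementation; the random coefficients are drawn element-wise here and boundary handling is omitted).

```python
import numpy as np

def slpso_step(X, dX, fitness, p_learn, epsilon=0.0):
    """One behavior-learning step following Eqs. (1) and (2).
    X, dX:   (N, D) positions and behavior-correction vectors
    fitness: (N,) objective values (minimization)
    p_learn: (N,) learning probabilities P_i^(L)."""
    N, D = X.shape
    order = np.argsort(fitness)          # ascending order: best particle first
    rank = np.empty(N, dtype=int)
    rank[order] = np.arange(N)           # rank 0 = best fitness
    x_mean = X.mean(axis=0)              # mean position of the swarm

    X_new, dX_new = X.copy(), dX.copy()
    for i in range(N):
        if rank[i] == 0:                 # the best particle is kept unchanged
            continue
        k = np.random.choice(np.where(rank < rank[i])[0])   # exemplar with a better rank
        gamma, c1, c2 = np.random.rand(3, D)                 # random coefficients of Eq. (1)
        dX_new[i] = gamma * dX[i] + c1 * (X[k] - X[i]) + c2 * epsilon * (x_mean - X[i])
        if np.random.rand() <= p_learn[i]:                   # learn with probability P_i^(L), Eq. (2)
            X_new[i] = X[i] + dX_new[i]
    return X_new, dX_new
```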

Radial basis function

DDCSAEA constructs an RBF model to approximate the projected neighborhood landscape of the iterative swarm in the feature subspace during the surrogate-assisted feature subspace optimization phase, and to estimate the fitness of the new candidate samples in the feature subspace. RBF, as a single hidden-layer feed-forward neural network, is more effective at approximating target problems with different orders of nonlinearities and varying landscape characteristics than GP, PR and SVR, and has the advantage of being less sensitive to the training sample size and problem scale [30,31,32].

Formally, an RBF model can be obtained by interpolating the N pairs of training data \(({\mathbf{x}}_{1} ,f({\mathbf{x}}_{1} )),({\mathbf{x}}_{2} ,f({\mathbf{x}}_{2} ))...,({\mathbf{x}}_{N} ,f({\mathbf{x}}_{N} ))\), \({\mathbf{x}}_{i} \in {\mathbf{R}}^{d}\), \(f({\mathbf{x}}_{i} ) \in R\), \(i = 1,2,...,N\), according to Eq. (3) [33].

$$ \hat{f}({\mathbf{x}}) = \sum\limits_{i = 1}^{N} {\alpha_{i} \varphi \left( {\left\| {{\mathbf{x}} - {\mathbf{x}}_{i} } \right\|} \right) + p({\mathbf{x}})} , $$
(3)

where \(\left\| \cdot \right\|\) and \(\varphi ( \cdot )\) denote the Euclidean norm and the kernel basis function, respectively. Commonly used kernel basis functions include the cubic spline, thin-plate spline, Gaussian, linear spline, and multiquadric functions. In this work, we opt for the thin-plate spline to construct the RBF that approximates the landscape of the feature subspace, owing to its excellent smoothing performance [34]. In Eq. (3), \(\alpha_{i} \in R\) represents the interpolation weight of the kernel basis function over \({\mathbf{x}}_{i}\). \(p({\mathbf{x}})\) denotes a linear polynomial in d variables, with the weights subject to the side condition \(\sum\nolimits_{i = 1}^{N} {\alpha_{i} p({\mathbf{x}}_{i} )} = 0\). The hyperparameters in Eq. (3) can be derived from the following system of equations.

$$ \left( {\begin{array}{*{20}c} {{\varvec{\Phi}}} & {\mathbf{P}} \\ {{\mathbf{P}}^{T} } & {\mathbf{0}} \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {{\varvec{\upalpha}}} \\ {\mathbf{c}} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {\mathbf{F}} \\ {\mathbf{0}} \\ \end{array} } \right) $$
(4)

where \({{\varvec{\Phi}}} \in {\mathbf{R}}^{N \times N}\) is the kernel function matrix with entries \({{\varvec{\Phi}}}_{ij} : = \varphi \left( {\left\| {{\mathbf{x}}_{i} - {\mathbf{x}}_{j} } \right\|} \right)\), \(i,j = 1,2,...,N\). \({{\varvec{\upalpha}}} = (\alpha_{1} ,\alpha_{2} ,...,\alpha_{N} )^{T} \in {\mathbf{R}}^{N}\) denotes the weight coefficient vector. \({\mathbf{P}} \in {\mathbf{R}}^{N \times (d + 1)}\) collects the values of the basis functions of the linear polynomial \(p({\varvec{x}})\) at the interpolated sample points, and the vector \({\mathbf{c}} = (c_{1} ,c_{2} ,...,c_{d + 1} )^{T} \in {\mathbf{R}}^{d + 1}\) gathers the coefficients of the linear polynomial \(p({\varvec{x}})\). \({\mathbf{F}} = (f({\varvec{x}}_{1} ),f({\varvec{x}}_{2} ),...,f({\varvec{x}}_{N} ))^{T} \in {\mathbf{R}}^{N}\) is the vector of fitness values of the interpolated samples. A necessary condition for the coefficient matrix in Eq. (4) to be non-singular is that the training samples are affinely independent [35].
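For illustration, the system in Eq. (4) with a thin-plate spline kernel can be assembled and solved as in the minimal NumPy sketch below (rbf_fit and rbf_predict are illustrative names; the common form \(\varphi(r) = r^{2}\ln r\) is assumed, and no safeguards against an ill-conditioned system are included).

```python
import numpy as np

def tps_kernel(r):
    # thin-plate spline: phi(r) = r^2 * ln(r), with phi(0) defined as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(r > 0, r**2 * np.log(r), 0.0)

def rbf_fit(X, F):
    """Fit the RBF of Eq. (3) by solving the linear system of Eq. (4).
    X: (N, d) training samples, F: (N,) fitness values."""
    N, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Phi = tps_kernel(dist)                         # kernel matrix
    P = np.hstack([np.ones((N, 1)), X])            # linear polynomial tail p(x)
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    rhs = np.concatenate([F, np.zeros(d + 1)])
    coef = np.linalg.solve(A, rhs)                 # [alpha; c]
    return coef[:N], coef[N:]

def rbf_predict(Xq, X, alpha, c):
    """Evaluate the fitted RBF at query points Xq (m, d)."""
    dist = np.linalg.norm(Xq[:, None, :] - X[None, :, :], axis=-1)
    return tps_kernel(dist) @ alpha + np.hstack([np.ones((len(Xq), 1)), Xq]) @ c
```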

Principal component analysis for feature reduction

DDCSAEA uses PCA to perform dimension reduction, producing a low-dimensional feature subspace that retains as much principal feature information of the original high-dimensional space as possible, thereby effectively controlling the trade-off between problem complexity and surrogate modeling complexity and improving the efficiency and accuracy of surrogate-assisted optimization. PCA is an unsupervised feature learning technique often used for dimension reduction of high-dimensional data [36]. The principle of PCA is to generate a set of uncorrelated low-dimensional feature vectors by performing a singular value decomposition on the covariance matrix of the centered high-dimensional training samples. New samples are then linearly projected into the low-dimensional feature space.

DDCSAEA randomly chooses M samples \({\mathbf{X}}^{(rand)} = ({\mathbf{X}}_{1}^{(rand)} ,{\mathbf{X}}_{2}^{(rand)} ,...,{\mathbf{X}}_{M}^{(rand)} )\) from the database in the original feature space as the training samples for PCA projection modeling. Using a random sample set from the database for PCA training, on the one hand, allows the iterative population to follow the potential exploration and exploitation directions with equal probability as the optimization proceeds, thereby weakening the biased search caused by the biased distribution of the evolutionary samples. On the other hand, the prior knowledge of the target problem's optimal neighborhood, especially of the unexplored optimal regions, can be enriched through the low-dimensional subspace search over the optimal domain covered by the iterative population. The basic steps for constructing the low-dimensional feature subspace by PCA on the random sample set \({\mathbf{X}}^{(rand)}\) in the D-dimensional feature space are as follows.

  1. Centralize \({\mathbf{X}}^{(rand)} = ({\mathbf{x}}_{1}^{(rand)} ,{\mathbf{x}}_{2}^{(rand)} ,...,{\mathbf{x}}_{M}^{(rand)} )\) to yield \({\mathbf{X}}_{c}^{(rand)} = ({\mathbf{x}}_{c1}^{(rand)} ,{\mathbf{x}}_{c2}^{(rand)} ,...,{\mathbf{x}}_{cM}^{(rand)} )\), where \({\mathbf{x}}_{i}^{(rand)} = (x_{i1}^{(rand)} ,x_{i2}^{(rand)} ,...,x_{iD}^{(rand)} )^{T}\), \({\mathbf{x}}_{ci}^{(rand)} = {\mathbf{x}}_{i}^{(rand)} - {\mathbf{x}}_{c}\), \(i = 1,2,...,M\), and \({\mathbf{x}}_{c} = \frac{1}{M}\sum\nolimits_{i = 1}^{M} {{\mathbf{x}}_{i}^{(rand)} }\);

  2. Calculate the covariance matrix \({\mathbf{R}} = {\mathbf{X}}_{c}^{(rand)} ({\mathbf{X}}_{c}^{(rand)} )^{T}\) of \({\mathbf{X}}_{c}^{(rand)}\) and perform singular value decomposition on \({\mathbf{R}}\) to obtain \({\mathbf{R}} = {\mathbf{UDV}}^{T}\);

  3. Take the first \(D_{s}\) (\(D_{s} < D\)) columns of \({\mathbf{U}}\) to form the low-dimensional feature vector set \({\mathbf{U}}_{map} = ({\mathbf{u}}_{1} ,{\mathbf{u}}_{2} ,...,{\mathbf{u}}_{D_{s}} )\), which spans the \(D_{s}\)-dimensional feature subspace.

This work sets the feature subspace size \(D_{s} = 10\) to balance the optimization performance and feature information loss of the DDCSAEA. The sensitivity analysis on the parameter \(D_{s}\) is detailed in “Sensitivity analysis”. Based on the PCA model, the projection vector \({\mathbf{X}}_{new}^{(sub)}\) in the feature subspace of the newly added sample \({\mathbf{X}}_{new}\) in the original high-dimensional feature space can be derived from \({\mathbf{X}}_{new}^{(sub)} = ({\mathbf{U}}_{map} )^{T} ({\mathbf{X}}_{new} - {\mathbf{X}}_{c} )\).
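For illustration, the three steps above and the projection of a new sample amount to the following minimal NumPy sketch (pca_subspace and project are illustrative names; samples are stored as rows here, and the 1/M scaling of the covariance matrix is omitted, as in step (2), since it does not affect the eigenvectors).

```python
import numpy as np

def pca_subspace(X_rand, Ds=10):
    """Steps (1)-(3): build the Ds-dimensional feature subspace from the
    random sample set X_rand (one sample per row, shape M x D)."""
    x_c = X_rand.mean(axis=0)          # step (1): mean vector used for centering
    Xc = X_rand - x_c
    R = Xc.T @ Xc                      # step (2): covariance matrix (D x D, scaling omitted)
    U, S, Vt = np.linalg.svd(R)        # singular value decomposition R = U D V^T
    U_map = U[:, :Ds]                  # step (3): first Ds columns of U
    return U_map, x_c

def project(x_new, U_map, x_c):
    """Projection of a newly added sample into the feature subspace."""
    return U_map.T @ (x_new - x_c)
```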

Autoencoder for feature reconstruction

DDCSAEA selects the candidate solutions obtained from the low-dimensional feature subspace as input data to train an autoencoder [37, 38], which is then adopted to reconstruct these candidate solutions in the original high-dimensional feature space for infill-sampling. The autoencoder is also an unsupervised feature learning technique, relying on backpropagation and gradient-based optimization [39, 40]. As an important feature learning technique, the autoencoder has been widely used in areas such as image classification and pattern recognition [41]. It essentially consists of a symmetric two-part structure with an encoder and a decoder, as shown in Fig. 1a.

Fig. 1 Schematic diagrams for feature reduction and reconstruction: (a) feature reduction and reconstruction by a single autoencoder; (b) feature reduction and reconstruction by PCA and an autoencoder, respectively

Given p input data \({\mathbf{X}}^{input} = ({\mathbf{X}}_{1}^{input} ,{\mathbf{X}}_{2}^{input} ,...,{\mathbf{X}}_{p}^{input} )\), the autoencoder first encodes \({\mathbf{X}}^{input}\) to map it to the hidden layer’s embedding space: \(Encoder({\mathbf{X}}^{input} ) \to {\mathbf{X}}^{embedding}\), and then reconstructs \({\mathbf{X}}^{embedding}\) at the output end through decoding to map it back to the original input space: \(Decoder({\mathbf{X}}^{embedding} ) \to {\mathbf{X}}^{{{\text{output}}}}\). Mathematically, Eq. (5) formulates the encoding process of \({\mathbf{X}}^{input}\) from the input space to the embedding space of the hidden layer. The decoding process of \({\mathbf{X}}^{embedding}\) from the hidden layer’s embedding space to the output space is formulated in Eq. (6).

$$ {\mathbf{X}}_{j}^{embedding} = \sigma ({\mathbf{WX}}_{j}^{input} + {\mathbf{b}}) $$
(5)
$$ {\mathbf{X}}_{j}^{output} = \sigma ({\mathbf{W^{\prime}X}}_{j}^{embedding} + {\mathbf{b^{\prime}}}) $$
(6)

where \({\mathbf{X}}_{j}^{input}\) denotes the jth input data, and \({\mathbf{X}}_{j}^{embedding}\) and \({\mathbf{X}}_{j}^{output}\) represent the encoded vector and the decoded vector of \({\mathbf{X}}_{j}^{input}\), respectively. \(\sigma ( \cdot )\) represents the codec (encoding/decoding) function; the Sigmoid function is chosen in this work for the nonlinear codec transformation. \({\mathbf{W}} \in {\mathbb{R}}^{{D_{s} \times D}}\) and \({\mathbf{b}} \in {\mathbb{R}}^{{D_{s} }}\) are the weight matrix and bias vector of the encoding process, respectively. \({\mathbf{W^{\prime}}} \in {\mathbb{R}}^{{D \times D_{s} }}\) and \({\mathbf{b^{\prime}}} \in {\mathbb{R}}^{D}\) are the weight matrix and bias vector of the decoding process, respectively. The weights and bias units of the codec process can be derived by minimizing the reconstruction error \(L(w,b)\) between the input data \({\mathbf{X}}^{input}\) and the output data \({\mathbf{X}}^{output}\), as shown in Eq. (7).

$$ L(w,b) = \sum\limits_{i = 1}^{p} {\left\| {{\mathbf{X}}_{i}^{input} - {\mathbf{X}}_{i}^{output} } \right\|}^{2} $$
(7)

Unlike the linear reconstruction of PCA, the autoencoder learns the paired non-linear encoding and decoding transformations from input data to output data by minimizing the reconstruction error, so that it can preserve the consistency of data information between the input and output layers as much as possible, resulting in strong generalization performance [42]. By scaling the number of neuron nodes in its hidden layer, the autoencoder can achieve either feature downscaling or feature upscaling of the input data during the encoding process [43]. Therefore, DDCSAEA uses an autoencoder to encode the candidate solutions of the low-dimensional feature subspace, expanding their dimensionality to reconstruct the corresponding samples of the original high-dimensional feature space. In this way, the information about the optimal structural properties of the problem fitness landscape contained in the candidate solutions of the low-dimensional feature subspace can be transferred. To balance the complexity of autoencoder training against the efficiency of DDCSAEA, the training epoch for the autoencoder is set to \(epoch = 20\). The PCA feature reduction and autoencoder feature reconstruction are shown in Fig. 1b.
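A minimal PyTorch sketch of such a single-hidden-layer autoencoder is given below (Autoencoder and train_autoencoder are illustrative names; decision variables are assumed normalized to [0, 1] so the Sigmoid output range is valid, and in DDCSAEA the network is used in the up-scaling direction, with \(D_s\)-dimensional candidates as input and a \(D\)-dimensional hidden layer whose activations serve as the reconstructed samples).

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Single-hidden-layer autoencoder of Eqs. (5)-(7)."""
    def __init__(self, d_in, d_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(d_hidden, d_in), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_autoencoder(X_candi, D, epochs=20, lr=1e-2):
    """X_candi: (p, Ds) candidate solutions in the feature subspace,
    assumed normalized to [0, 1]; D: dimension of the original problem space."""
    model = Autoencoder(d_in=X_candi.shape[1], d_hidden=D)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                     # mean-squared version of the error in Eq. (7)
    X = torch.as_tensor(X_candi, dtype=torch.float32)
    for _ in range(epochs):                    # epoch = 20, as set in this work
        opt.zero_grad()
        loss = loss_fn(model(X), X)
        loss.backward()
        opt.step()
    return model

# Feature reconstruction: the D-dimensional hidden-layer activations are taken as
# the reconstructed samples in the original problem space, e.g.
# S_recons = model.encoder(torch.as_tensor(V_candi, dtype=torch.float32)).detach().numpy()
```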

Dual-drive collaboration surrogate-assisted evolutionary algorithm by coupling feature reduction and reconstruction

Outline of DDCSAEA

Algorithm 1 gives the pseudocode of DDCSAEA. The schema of DDCSAEA is mainly structured in two modules. One features PCA-driven feature reduction and RBF-assisted evolutionary sampling in the feature subspace. The other couples autoencoder-driven feature reconstruction with nearest-neighbor-based infill sampling in the original high-dimensional space. The two modules execute in tandem to realize the transfer and communication of the principal feature information between the original high-dimensional space and the low-dimensional feature subspace. Additionally, differential mutation is carried out on the nearest neighbors surrounding the optimal solution attained in the feature subspace optimization to fully exploit the prior knowledge of the solution space's optimal region.

[Algorithm 1: pseudocode of DDCSAEA]

Figure 2 presents the flowchart of DDCSAEA, wherein the dashed lines indicate the data flow direction. As shown in Fig. 2, DDCSAEA first uses Latin hypercube sampling (LHS) [44] to initialize the iterative swarm, whose fitness is calculated with the real objective function. The real-evaluated samples are stored in the database DB, followed by an ascending ranking of the DB samples by fitness. Afterward, PCA-driven feature reduction is performed on the original high-dimensional space to construct a low-dimensional feature subspace; a certain number of random samples from the database is utilized to construct and update the PCA model that reduces the original high-dimensional problem space to the low-dimensional feature subspace. The projections of the random sample set into the feature subspace are then used to train a local RBF model that approximates the landscape of the feature subspace, and the optimum GbestRBF of the RBF model within the projection domain of the iterative swarm in the feature subspace is located via SLPSO. Thereafter, the K nearest neighbors of GbestRBF are chosen to constitute a trial population according to the Euclidean distance criterion. The trial population then evolves for M generations with the DE/rand/1 mutation operator to generate \(K \times M\) candidate solutions. Subsequently, autoencoder-driven feature reconstruction is applied to reconstruct these candidate solutions into the original high-dimensional space one by one; a certain number of randomly selected candidate solutions is used to train the autoencoder. Finally, the Euclidean distances between the current global best solution and the reconstructed candidate solutions are computed, and the solutions with the smallest distances are chosen for real evaluation. During the iteration, the newly evaluated solutions are archived in DB to update the database and the global optimum. The above procedure repeats until the termination condition is reached, after which the optimal solution is output.

Fig. 2 The flowchart of DDCSAEA

PCA-driven feature reduction and surrogate-assisted feature subspace optimization

To alleviate the high complexity of surrogate modeling in the high-dimensional feature space and improve prediction accuracy, DDCSAEA iteratively trains a PCA model to perform feature reduction on the original high-dimensional feature space, extracting principal features to construct a simplified low-dimensional feature subspace. A local RBF model is then trained on the subspace projections of a certain number of random database samples to approximate the landscape of the feature subspace. To further control the training complexity of the RBF in the feature subspace, \(2D_{s} + 1\) database samples [45] are chosen for training the RBF model, where Ds denotes the dimension of the feature subspace. Moreover, to fully exploit the prior knowledge contained in the principal features, an RBF-assisted SLPSO search is carried out to find the best solution GbestRBF of the RBF model in the area covered by the training samples. Note that other EAs and local search methods can also be used as the local optimizer. Algorithm 2 provides the pseudocode of the surrogate-assisted feature subspace optimization based on PCA feature reduction.
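One plausible realization of this subspace optimization step, reusing the pca_subspace/project, rbf_fit/rbf_predict and slpso_step sketches above, is outlined below (subspace_local_search is an illustrative name; how the \(2D_{s}+1\) training samples are selected from the database is not fully specified here, so they are drawn at random in this sketch, and the SLPSO budget is arbitrary).

```python
import numpy as np

def subspace_local_search(DB_X, DB_F, U_map, x_c, Ds=10, swarm_size=50, iters=50):
    """Surrogate-assisted feature subspace optimization (Algorithm 2, sketched):
    project 2*Ds+1 database samples into the subspace, fit a local RBF on the
    projections, and locate its optimum GbestRBF with SLPSO (no real evaluations)."""
    idx = np.random.choice(len(DB_X), 2 * Ds + 1, replace=False)  # training samples (drawn at random here)
    Z = (DB_X[idx] - x_c) @ U_map                                 # projections into the feature subspace
    alpha, c = rbf_fit(Z, DB_F[idx])                              # local RBF of the subspace landscape

    lo, hi = Z.min(axis=0), Z.max(axis=0)                         # area covered by the training samples
    X = lo + np.random.rand(swarm_size, Ds) * (hi - lo)           # initial SLPSO swarm
    dX = np.zeros_like(X)
    p_learn = np.ones(swarm_size)                                 # P_i^(L) = 1, as set in this work
    for _ in range(iters):
        fit = rbf_predict(X, Z, alpha, c)                         # surrogate fitness only
        X, dX = slpso_step(X, dX, fit, p_learn, epsilon=0.0)
        X = np.clip(X, lo, hi)                                    # stay within the covered area
    fit = rbf_predict(X, Z, alpha, c)
    return X[np.argmin(fit)]                                      # GbestRBF
```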

[Algorithm 2: pseudocode of the surrogate-assisted feature subspace optimization]
[Algorithm 3: pseudocode of the autoencoder-driven feature reconstruction]

Autoencoder-driven feature reconstruction

After completing the surrogate-assisted feature subspace optimization, DDCSAEA further collects the candidate samples \({\mathbf{V}}^{(candi)}\) after deduplication in the feature subspace to train an autoencoder for feature reconstruction. Algorithm 3 provides the pseudocode for autoencoder-driven feature reconstruction, where \({\mathbf{S}}^{(recons)}\) indicates the set of reconstructed solutions in the original problem space \({\mathbb{R}}^{D}\) by the autoencoder on the associated candidate solution set in the feature subspace \({\mathbb{R}}^{{D_{s} }}\).

To diversify the candidate solutions for autoencoder modeling and the subsequent infill-sampling, the nearest neighbors of GbestRBF further undergo several generations of differential mutation and crossover operations to generate a set of trial solutions. In this way, it is possible to ensure the intra-domain transfer of information about the optimal structure of the feature subspace on the one hand. On the other hand, it can effectively enrich the prior landscape knowledge of the optimal region of the feature subspace and provide enough samples for the subsequent feature reconstruction. To be more specific, the Euclidean distances between the optimal solution GbestRBF and the individuals of the iterative population \({\mathbf{PoP}}^{(t)}\) are first calculated. Here \({\mathbf{PoP}}^{(t)}\) represents the projected population in the feature subspace \({\mathbb{R}}^{{D_{s} }}\), obtained by projecting the iterative population \({\mathbf{P}}^{(t)}\) of the original problem space \({\mathbb{R}}^{D}\) with the PCA model. The K individuals with the smallest distances are then selected to comprise a trial population \({\mathbf{P}}^{{{(}trial)}}\). After that, \({\mathbf{P}}^{{{(}trial)}}\) undergoes M generations of differential mutation and binomial crossover operations to generate the candidate solution set \({\mathbf{V}}^{(candi)}\). In this work, the DE/rand/1 mutation strategy shown in Eq. (8) is adopted, where \(x_{r1}\), \(x_{r2}\) and \(x_{r3}\) represent three mutually exclusive individuals in \({\mathbf{P}}^{{{(}trial)}}\) and F is the scale factor taking values in the interval \([0.4,1]\).

$$ \upsilon_{i} = x_{r1} + F(x_{r2} - x_{r3} ) $$
(8)

To balance the training efficiency and prediction accuracy of the autoencoder model, its training set size is set to \(NS\). If the number of candidate solutions in the feature subspace exceeds \(NS\), then \(NS\) candidate solutions are randomly picked from the candidate set to train the autoencoder model; otherwise, a fraction of individuals in the trial population \({\mathbf{P}}^{{{(}trial)}}\) are chosen in order of fitness priority to top up the training set. Note that the training set of the autoencoder model undergoes feature reconstruction to acquire the candidate solution set of the original high-dimensional feature space for subsequent infill sampling.
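A possible realization of this candidate-generation step is sketched below (generate_candidates is an illustrative name; since the text does not fully specify how the trial population is carried across the M generations, the trial vectors simply replace their parents here).

```python
import numpy as np

def generate_candidates(P_trial, M, F=0.8, CR=0.8):
    """Generate the candidate set V_candi from the K nearest neighbors of GbestRBF
    via M generations of DE/rand/1 mutation (Eq. 8) and binomial crossover."""
    K, Ds = P_trial.shape
    pop = P_trial.copy()
    candidates = []
    for _ in range(M):
        trials = np.empty_like(pop)
        for i in range(K):
            choices = [j for j in range(K) if j != i]
            r1, r2, r3 = np.random.choice(choices, 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])    # DE/rand/1 mutation, Eq. (8)
            mask = np.random.rand(Ds) < CR           # binomial crossover
            mask[np.random.randint(Ds)] = True       # keep at least one mutated component
            trials[i] = np.where(mask, v, pop[i])
        candidates.append(trials)
        pop = trials                                 # the trials replace their parents (assumption)
    return np.unique(np.vstack(candidates), axis=0)  # K*M candidates, deduplicated
```

If the resulting deduplicated candidate set exceeds \(NS\), \(NS\) of its members would then be sampled at random to train the autoencoder, as described above.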

It is conceivable that using the autoencoder model to nonlinearly transfer the structural prior information concerning the optimal region of the feature subspace to the original problem space, at the expense of losing some feature information, contributes to the dynamic regulation of the structural prior information of the original problem space. Meanwhile, as the newly added samples aggregate within the neighborhood of the best solution, the consistency of the optimal structural attributes between the feature subspace \({\mathbb{R}}^{{D_{s} }}\) and the original problem space \({\mathbb{R}}^{D}\) can be well balanced via the autoencoder-driven feature reconstruction.

Infill-sampling based on the nearest neighbor principle

After mapping the candidate solutions in the feature subspace back into the original problem space via the autoencoder model, the q samples \({\mathbf{nbest}} = ({\mathbf{ns}}_{1} ,{\mathbf{ns}}_{2} ,...,{\mathbf{ns}}_{q} )\) nearest to the current global best solution Gbest are picked from the candidate solution set \({\mathbf{S}}^{(recons)}\) for real evaluation. Algorithm 4 provides the pseudocode of the infill-sampling strategy. Taking the nearest neighbors of the best solution for re-evaluation and infill-sampling increases the sample density in the neighborhood where the optimal solution lies, thus enriching the landscape prior and steering the search to rapidly reach the optimum.
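The infill selection itself reduces to a nearest-neighbor query over the reconstructed set, as in the minimal sketch below (infill_select is an illustrative name).

```python
import numpy as np

def infill_select(S_recons, gbest, q=2):
    """Pick the q reconstructed candidates closest (Euclidean distance) to the
    current global best solution Gbest for expensive re-evaluation (Algorithm 4)."""
    d = np.linalg.norm(S_recons - gbest, axis=1)
    return S_recons[np.argsort(d)[:q]]
```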

[Algorithm 4: pseudocode of the infill-sampling strategy]

Empirical study

To validate the effectiveness of the proposed DDCSAEA, we test it on seven widely used benchmark problems featuring different fitness landscapes at four dimensional scales, i.e., 30, 50, 100, and 200 dimensions. Its performance is compared with eight state-of-the-art algorithms, namely SHPSO [9], TL-SSLPSO [46], SAMSO [23], DESO [47], TS-DDEO [48], TASEA [22], SADE-AMSS [25] and SAEO [28], to examine its optimization efficiency. Table 1 lists the basic characteristics of the selected benchmark problems, which cover unimodal landscapes with the zero point as the optimum, multimodal landscapes with the zero point as the optimum, and complex asymmetric multimodal landscapes with a non-zero optimum.

Table 1 Basic characteristics of selected benchmark problems

Parameter settings

In the following experiments, the population size for DDCSAEA is set to 50. \(D_{s} = 10\) principal features are extracted via PCA to structure the feature subspace \({\mathbb{R}}^{{D_{s} }}\). The scale factor F and the crossover probability CR are both set to 0.8, and the \(K = 5\) nearest neighbors of GbestRBF are picked to comprise the trial population \({\mathbf{P}}^{{{(}trial)}}\). \(NS = 80\) training samples are chosen for the autoencoder modeling. A sensitivity analysis of the parameters K and \(D_{s}\) on the performance of DDCSAEA is detailed in the subsequent “Sensitivity analysis”. To control the computational budget, \(q = 2\) solutions are chosen for real evaluation at each iteration. In addition, the parameter configurations of the compared algorithms SHPSO, TL-SSLPSO, SAMSO, DESO, TS-DDEO, TASEA, SADE-AMSS and SAEO follow the settings recommended in the corresponding literature. All algorithms involved in the experiments are run on a desktop computer with an Intel(R) Xeon(R) Gold 5218 CPU @ 2.30 GHz. Each contestant is assigned 20 independent runs, and a maximum of \(MaxFes = 1000\) real objective function evaluations is used as the termination condition.

Behavior analysis of the DDCSAEA

DDCSAEA sequentially couples the PCA model and the autoencoder model for feature reduction and feature reconstruction of the original high-dimensional problem space. To test its effectiveness, in this section we first perform a sensitivity analysis of the parameter \(D_{s}\) to gain some insight into its impact on the performance of DDCSAEA. Then, we compare the efficiency of feature reduction and reconstruction in DDCSAEA, which couples PCA with an autoencoder, against DDCSAEA variants that employ only the autoencoder or only the PCA.

Sensitivity analysis

Table 2 records the statistical results of DDCSAEA with \(D_{s} = 5,10,30\) on the 50-, 100-, and 200-dimensional benchmark problems over 20 independent runs, including the mean and standard deviation of the obtained best solutions as well as the average time cost (TC). In Table 2, the best result on each test instance is bolded and the suboptimal result is shaded. As shown in Table 2, DDCSAEA's performance improves as the feature subspace scale grows, but so does its time complexity. More specifically, DDCSAEA has a low time complexity with \(D_{s} = 5\), but it struggles to locate good solutions to the selected problems. When \(D_{s}\) reaches 30, DDCSAEA performs best on a majority of the selected problems, but its time complexity is the highest among the compared settings. In contrast, with \(D_{s} = 10\), the final results of DDCSAEA are slightly worse than those with \(D_{s} = 30\), but its time cost is greatly reduced. Meanwhile, compared to \(D_{s} = 5\), DDCSAEA with \(D_{s} = 10\) achieves better performance, and both settings have considerably lower time complexity than the \(D_{s} = 30\) case. Therefore, to strike a good trade-off between computational complexity and convergence performance, \(D_{s} = 10\) is adopted to regulate the feature subspace scale.

Table 2 The statistical results of DDCSAEA with different Ds

Comparison results of DDCSAEA with hybrid versus single feature reduction and reconstruction techniques

For simplicity, we denote the variant of DDCSAEA that employs a single autoencoder for both feature reduction and reconstruction as DDCSAEA-AE, and the variant that employs only PCA for feature reduction and reconstruction as DDCSAEA-PTP. Table 3 presents the statistical results of these three contestants on the 50-, 100-, and 200-dimensional benchmark problems. The pairwise Wilcoxon rank sum test on the final results at the 95% confidence level is also reported, where “ + ”, “–” and “≈” indicate that DDCSAEA performs significantly better than, significantly worse than, or equivalently to the compared algorithm, respectively, in terms of the final solutions. As shown in Table 3, for the selected benchmark problems, DDCSAEA significantly outperforms DDCSAEA-AE and DDCSAEA-PTP in obtaining the best solutions on at least 18 test instances, slightly underperforms DDCSAEA-PTP and DDCSAEA-AE on two test instances, and performs comparably to DDCSAEA-AE on one instance. To be more specific, DDCSAEA obtains significantly better results than DDCSAEA-PTP and DDCSAEA-AE within 1000 real fitness function evaluations for all unimodal and multimodal test problems except F6, exhibiting good robustness with respect to the problem scale. For F6, which features a complex multimodal fitness landscape, DDCSAEA performs worse than DDCSAEA-PTP and DDCSAEA-AE in the 50- and 100-dimensional cases. However, as the scale and complexity of the problem grow, the performance of DDCSAEA coupled with PCA and an autoencoder improves. Meanwhile, there is no significant difference between DDCSAEA and DDCSAEA-AE in solution quality on the 200-dimensional F6, on which the best solution obtained by DDCSAEA is slightly better than that of DDCSAEA-AE. From Table 3, we can conclude that, in contrast to directly encoding and decoding the original high-dimensional problem space via a single autoencoder or a single PCA, sequentially coupling PCA with an autoencoder for feature reduction and reconstruction of the target feature space significantly improves the performance of DDCSAEA, demonstrating the effectiveness and superiority of this hybrid codec strategy.

Table 3 Comparison results of DDCSAEA against DDCSAEA-PTP and DDCSAEA-AE on benchmarks

Comparison results of DDCSAEA with different training samples for PCA

PCA extracts the principal components of the feature space with the largest variance from the training samples; thus, different training samples for PCA modeling yield different principal components, owing to differences in their distribution and quality. This subsection explores the potential impact of different training samples for PCA modeling on the performance of DDCSAEA. Here, DDCSAEA is compared with two of its variants, DDCSAEA-TB and DDCSAEA-TW. For PCA modeling, DDCSAEA-TB chooses the best sample set of DB as the training samples, while DDCSAEA-TW uses the worst sample set of DB. Table 4 shows the statistical results of these three competitors on the selected benchmark problems, with the best results on each instance highlighted. From Table 4, one can observe that DDCSAEA obtains better results than DDCSAEA-TB and DDCSAEA-TW on all test instances except the 100- and 200-dimensional F6, indicating the remarkable superiority of using a random sample set from the database for PCA modeling. In fact, as the optimization progresses, the best samples in DB cluster in the decision space, which shrinks the variance along the principal directions and exacerbates the difficulty of exploration, while the worst samples in DB deteriorate exploitation. In contrast, using random samples for PCA training promotes search diversity and helps the search escape local optima by emphasizing exploration of unexplored areas along randomly oriented principal components.

Table 4 Comparison results of DDCSAEA against DDCSAEA-TB and DDCSAEA-TW on the selected benchmarks

Comparison results of DDCSAEA versus five advanced algorithms without feature reduction

To further investigate the computational efficiency of the proposed DDCSAEA, we compare DDCSAEA with five advanced algorithms without feature reduction, including SHPSO, TL-SSLPSO, SAMSO, DESO, and TS-DDEO. Table 5 shows the statistical results of these contestants on the selected benchmark problems. As shown in Table 5, DDCSAEA generally performs significantly better than the five competitors on a majority of the selected test instances. According to the pairwise Wilcoxon rank sum test at a 95% significance level, DDCSAEA significantly outperforms SHPSO, TL-SSLPSO, and SAMSO on at least 22 test instances and significantly outperforms DESO and TS-DDEO on at least 19 test instances. Furthermore, according to the average rankings computed by the Friedman test, the proposed DDCSAEA ranks first among the contestants, indicating its strong computational efficiency and robustness.

Table 5 Comparison results of DDCSAEA against five advanced algorithms on the selected problems

To be more specific, as shown in Table 5, DDCSAEA outperforms the other comparative algorithms on the 30-dimensional unimodal problem F1. As the dimensionality of the problem increases, DDCSAEA's performance improves and significantly exceeds that of SHPSO, TL-SSLPSO, SAMSO, and DESO, whereas its computational efficiency still needs further improvement in comparison with TS-DDEO. We speculate that this is mainly due to the surrogate-assisted dimension-by-dimension crossover strategy in TS-DDEO. By performing dimension-by-dimension crossover between the current optimal solution and a portion of the optimal sample set, TS-DDEO updates the global optimum by generating as many candidate solutions as possible in the neighborhood of the optimal solution. This enhances the algorithm's local exploitation of the optimum's neighborhood and significantly increases its convergence accuracy, especially for unimodal problems. Nonetheless, DDCSAEA obtains significantly better solutions than TS-DDEO for the multimodal problems F4, F5, and F7.

For F2, whose global optimum lies in a narrow basin, the proposed DDCSAEA yields superior results over SHPSO, TL-SSLPSO, and SAMSO in the 30- and 50-dimensional cases, while its final results are not as good as those of DESO and TS-DDEO. However, for the 100- and 200-dimensional instances, DDCSAEA achieves the best solutions among all the competitors. This indicates that, as problem complexity increases, the neighboring prior knowledge of the optimal solution in the original high-dimensional problem space can be effectively learned through feature reduction, subspace local search, and feature reconstruction.

For F3, which features a symmetric multimodal fitness landscape, DDCSAEA defeats the other five competitors in the 50-, 100-, and 200-dimensional cases and obtains significantly better results. From Table 5, one can note that DDCSAEA is slightly worse than SHPSO, DESO, and TS-DDEO on the 30-dimensional instance. This may be attributed to DDCSAEA's iterative execution of feature reduction and reconstruction on the current optimal solution and its neighboring candidate solutions for updating the iterative population, which leads to a rapid loss of population diversity. On the contrary, SHPSO, DESO, and TS-DDEO, which all involve global exploration sampling of the original problem space, can effectively maintain the iterative population's diversity as the optimization progresses. Similarly, for the multimodal problems F4 and F5, as shown in Table 5, DDCSAEA remains robust as the problem scale varies, and the best solutions it obtains are significantly better than those of the other contestants.

For the asymmetric multimodal problems F6 and F7, as shown in Table 5, DDCSAEA performs worse than the other five competitors in the 30-, 50-, and 100-dimensional cases. However, the performance of DDCSAEA improves significantly when the problem dimension reaches 200, where the best solution obtained by DDCSAEA is significantly better than those of the other five competitors. In particular, for the 200-dimensional F7, DDCSAEA finds high-quality solutions that surpass those of the competitors, indicating the high accuracy and good robustness of DDCSAEA in solving such complex problems. Note that DDCSAEA deploys PCA to reduce the complexity of the target high-dimensional problem space; in this manner, the original problem space can be well smoothed by discarding some redundant feature information.

To be more intuitive, Figs. 3, 4, 5 and 6 depict the average convergence profiles of each compared algorithm on the selected benchmark problems over 20 independent runs. In general, DDCSAEA converges rapidly in the early search stage and can quickly locate a locally optimal solution with very few real fitness function evaluations, e.g., on the 30-dimensional F1–F7. Meanwhile, compared with the other five contestants, DDCSAEA still quickly locates the best solution as the problem scale increases, such as on F2, F3, F4, F5, and F7 with 50 to 200 dimensions. In particular, for F5 and F7, DDCSAEA exhibits better convergence performance than the other five compared algorithms within 1000 real evaluations, and its superiority persists as the problem scale increases. This largely benefits from the compression and simplification of the feature space by the PCA model in DDCSAEA and from the surrogate-assisted optimization of the principal feature subspace, which greatly improve access to prior knowledge of the principal features of the original problem space in the early search stage. Consequently, the promising solutions in the optimal region can be enriched through feature reconstruction. From Fig. 3, we can also find that DDCSAEA is less capable of global exploration in the later search stage than DESO and TS-DDEO, such as on F1, F3, F4 and F6. Nevertheless, as shown in Figs. 4, 5 and 6, the convergence performance of DDCSAEA improves greatly and is significantly better than that of the other competitors on F3 and F4 in the 50- to 200-dimensional instances. Moreover, from Fig. 6, we can observe that DDCSAEA exhibits remarkable convergence performance versus the other five compared algorithms.

Fig. 3 Convergence profiles of the compared algorithms for 30-D benchmark problems

Fig. 4 Convergence profiles of the compared algorithms for 50-D benchmark problems

Fig. 5 Convergence profiles of the compared algorithms for 100-D benchmark problems

Fig. 6 Convergence profiles of the compared algorithms for 200-D benchmark problems

According to the comparative results in Table 5 and Figs. 3, 4, 5 and 6, it can be concluded that the proposed DDCSAEA converges faster than SHPSO, TL-SSLPSO, SAMSO, DESO, and TS-DDEO under a computation budget of 1000 real fitness function evaluations. In particular, for high-dimensional complex solution spaces of 100 to 200 dimensions, DDCSAEA offers significant superiority in terms of convergence performance and robustness.

Comparison results of DDCSAEA versus three advanced algorithms with feature reduction

The proposed DDCSAEA is further examined for its computational effectiveness against TASEA, SADE-AMSS and SAEO. These three contestants take advantage of dimension reduction techniques to eliminate redundant or irrelevant features of the target high-dimensional or large-scale feature space, enabling them to mine prior knowledge along the directions of the most important features within the solution space, thereby improving surrogate model performance and interpretability. TASEA employs Sammon mapping to project the high-dimensional space into a lower-dimensional subspace, constructing a global GP model and a local GP model for different mutation operators when screening candidate solutions. SADE-AMSS constructs a series of subspaces using PCA and a random feature selection strategy, adaptively switching among three subspace search strategies and surrogate models at different optimization stages. SAEO uses an autoencoder for feature reduction and feature reconstruction of the iterative population in the high-dimensional solution space, with surrogate-assisted greedy sampling in the target high-dimensional space, achieving excellent optimization performance. Table 6 presents the statistical results obtained by TASEA, SADE-AMSS, SAEO, and DDCSAEA on the selected benchmark problems of 50–200 dimensions, with the best results on each instance highlighted. From Table 6, following the pairwise Wilcoxon rank sum test at a 95% significance level, DDCSAEA significantly outperforms TASEA and SADE-AMSS on at least 19 test instances and significantly outperforms SAEO on 8 test instances. According to the Friedman test, DDCSAEA ranks first with the best average ranking of 1.50 among all competitors, followed by SAEO, SADE-AMSS and TASEA, indicating the superior performance of DDCSAEA within this category of SAEAs with feature reduction. Additionally, for a more visual representation of the comparative performance, radar charts of DDCSAEA versus TASEA, SADE-AMSS and SAEO are shown in Fig. 7. For the 50-dimensional asymmetric multimodal problem F6, DDCSAEA performs worse than TASEA, SADE-AMSS, and SAEO. However, the performance of DDCSAEA improves significantly as the problem dimension increases to 100 and 200, reaching its best in the 200-dimensional instance. To provide a clearer illustration of the performance differences between DDCSAEA and SAEO, Fig. 8 presents separate radar charts for these two contestants. From Fig. 8, one can observe that DDCSAEA exhibits significant superiority over SAEO on the selected benchmark problems in terms of comprehensive performance and maintains strong robustness with respect to the problem scale. In Fig. 8, F7 is not marked in the radar charts due to the similar performance of DDCSAEA and SAEO on it.

Table 6 Comparison results of DDCSAEA against TASEA, SADE-AMSS and SAEO
Fig. 7 Radar charts of DDCSAEA versus TASEA, SADE-AMSS and SAEO

Fig. 8 Radar charts of DDCSAEA versus SAEO

Conclusion

This paper proposes a dual-drive collaboration SAEA, named DDCSAEA, for complex high-dimensional expensive optimization problems. In the framework of DDCSAEA, two unsupervised feature learning techniques, PCA and the autoencoder, are assembled in tandem to drive its exploration and exploitation across two feature spaces through iterative feature reduction and feature reconstruction. At each iteration of DDCSAEA, a PCA model is trained to reduce the original problem space and obtain a principal feature subspace. An RBF-assisted SLPSO is then employed to exploit the optimum of the RBF model. After that, the neighboring samples of the RBF optimum in the feature subspace further undergo differential mutation and crossover to explore promising candidate solutions and enrich the structural prior information of the optimal region. An autoencoder is thereafter trained to perform feature reconstruction on these candidates and project them into the target high-dimensional space. Finally, the reconstructed candidate samples are prescreened against the current optimal sample based on the nearest neighbor principle to determine the infill samples for real evaluation. DDCSAEA is evaluated on a widely used test suite with various fitness landscapes at different dimensional scales. Experimental results demonstrate that DDCSAEA features superior convergence performance and better robustness over eight state-of-the-art algorithms in the 30-, 50-, 100-, and 200-dimensional instances.

However, we also note that DDCSAEA performs poorly on some of the medium- and low-dimensional test problems. We speculate that the fixed feature subspace scale distorts the feature subspace optimization, resulting in incorrect convergence. In future work, a mechanism that adapts the feature reduction scale to the original problem scale will be considered, in conjunction with fitness landscape analysis, to further improve the applicability of DDCSAEA. Applying DDCSAEA to real-world expensive problems is also part of our future work.