Introduction

Evolutionary Algorithms (EAs) are population-based optimization methods that are applied to many real-world problems. However, because an expensive fitness evaluation is often required in real-world applications owing to simulations or complex numerical calculations, the computational cost of EAs is typically high. To reduce the computing cost of EAs, surrogate-assisted EA (SAEA) has been studied [1, 2] and applied to real-world applications such as aerospace engineering [3, 4], vehicle design [5], and manufacturing process optimization [6]. An SAEA utilizes a surrogate model that estimates fitness instead of a computationally expensive fitness function and finds promising solutions for actual fitness evaluation. Because surrogate estimation is computationally cheaper than actual fitness evaluation, the execution time of the SAEA is reduced compared to that of conventional EAs.

Three types of surrogate models are used in SAEAs [7]: (1) a regression model that directly estimates the fitness value, (2) a classification model that estimates the relative acceptability of a solution compared to a reference value rather than its fitness value, and (3) a ranking model that estimates the relative superiority of solutions compared to each other. This study proposes a novel ranking-based surrogate model that exploits the fact that general EAs can perform parent selection and survival selection based on the superiority of solutions. Specifically, this paper proposes the Extreme Learning-machine-based DirectRanker (ELDR), which combines an extreme learning machine (ELM) [8, 9], a type of single-hidden-layer neural network (NN) with fast learning capability, with DirectRanker (DR) [10], a pairwise ranking method based on an NN developed primarily for information retrieval.

To investigate the effectiveness of ELDR, its prediction accuracy was first analyzed by comparing it with other surrogate models used in previous research. Then, to confirm its capability on an SAEA, it was incorporated into a state-of-the-art SAEA—specifically, surrogate-assisted hybrid optimization (SAHO) [11]—and the consequent ELDR-SAHO was compared with recent SAEA methods, including SAHO.

The remainder of this paper is organized as follows: “Related Work” discusses related work on SAEAs. “Proposed Method” describes the proposed pairwise ranking surrogate model, ELDR, in detail. “Example Application of ELDR to SAEA: ELDR Surrogate-Assisted Hybrid Optimization” presents the details of an ELDR-assisted SAEA built on SAHO, a state-of-the-art SAEA. “Preliminary Experiment: Accuracy of ELDR” analyzes the parameter sensitivity of ELDR and compares its estimation accuracy with that of conventional surrogate models used in existing SAEAs. “Numerical Experiments” outlines the numerical experiments conducted to investigate the effectiveness of ELDR-assisted SAHO and analyzes the results obtained. Finally, “Conclusion and Future Work” presents concluding remarks and outlines possible future work.

Related work

This section reviews surrogate models used in SAEAs in previous studies. This study focuses primarily on single-objective real-valued optimization problems but also refers to some studies on multi-objective optimization and discrete optimization, including genetic programming.

In previous studies, the mainstream approach has been SAEAs using regression models. In particular, most previous works [12,13,14,15] used the radial basis function (RBF) [16] as a surrogate model. Other SAEAs that use the Kriging model [17, 18], Gaussian process regression [19], and the nearest neighbor method [20] have also been proposed. Pavelski et al. proposed ELMOEA/D [21], a surrogate-assisted multi-objective evolutionary algorithm that uses ELM as a regression surrogate model.

Recent studies have proposed classification-based SAEAs. Pan et al. [22] proposed a classification-based SAEA (CSEA) that learns the dominance relationship between candidate solutions and reference solutions using an artificial neural network (ANN) [23]. Sonoda et al. [24] used a support vector machine (SVM) [25] in a decomposition-based MOEA (MOEA/D) to classify whether an offspring solution is better than its parent solution for each aggregation function. Wei et al. [26] proposed a classifier-assisted level-based learning swarm optimizer (CA-LLSO) that uses a multiclass gradient boosting classifier (GBC) [27] to classify the swarm (population) into different levels for applying LLSO [28].

In contrast to the regression and classification models, few studies have reported ranking-based SAEAs. Ranking models are usually applied for pre-selection [7], for example, to estimate the population ranking in CMA-ES [29] in the works of Runarsson [30] and Loshchilov et al. [31]. Lu et al. [32] proposed Differential Evolution (DE) [33] with surrogate-assisted self-adaptation (DESSA), which uses ranking SVM (RankSVM) [34] to select the most promising trial vector. More recently, Hao et al. [35] proposed a ranking model that takes a vector concatenating two solutions as input and demonstrated that it achieves higher estimation accuracy than regression and classification models when the number of training samples is small. Another work by Hao et al. [36] proposed a similar ranking model based on a neural network that estimates the dominance relationship for solving multi-objective optimization problems. However, research on high-performance SAEAs using ranking models remains limited compared with SAEAs using regression and classification models.

Because the ranking model estimates the dominance relation between solutions rather than the objective function value, it can be applied to a broader range of problem domains, including those where regression and classification models are not applicable. One example is constrained optimization, where parent or survival selection can be defined by rules such as the feasibility rules [37] or the \(\epsilon \)-constrained method [38], so that solutions can be compared without estimating all the constraint values. Another example is subjective human evaluation in interactive EAs [39], which generally relies on relative rather than quantitative assessment. Therefore, a useful ranking surrogate model would allow SAEAs to be applied to more problem domains than regression and classification models allow.

Proposed method

This section first introduces ELM [8, 9] and DR [10], which are the components of ELDR. Then, the operation of ELDR, which integrates these two techniques, is explained.

Fig. 1 A topology of ELM

Extreme learning machine

ELM is a single-hidden-layer feedforward NN whose layers are fully connected; its topology is illustrated in Fig. 1. Given a d-dimensional input \(\varvec{x}\), ELM with L hidden neurons calculates the output y as

$$\begin{aligned}&y=\sum _{i=1}^{L}\beta _ih( \varvec{x}, \varvec{w}_i, b_i)\nonumber \\&=\varvec{h}(\varvec{x}, W, \varvec{b})^T\varvec{\beta } ,\\&\varvec{h}(\varvec{x}, W, \varvec{b})=\begin{bmatrix} h(\varvec{x}, \varvec{w}_1, b_1)&\cdots&h(\varvec{x}, \varvec{w}_L, b_L) \end{bmatrix}^T\nonumber ,\\&W=\{ \varvec{w}_1, \cdots , \varvec{w}_L \}\nonumber ,\\&\varvec{b}=\{ b_1,\cdots ,b_L \}\nonumber ,\\&\varvec{\beta }=\begin{bmatrix} \beta _1&\cdots&\beta _L \end{bmatrix}^T,\nonumber \end{aligned}$$
(1)

where h indicates an activation function and \(\varvec{w}_i\) and \(b_i\) indicate the weight vector and bias value from the input to the i-th hidden neuron (\(1\le i\le L\)). The value of \(\beta _i\) represents the weight from the i-th hidden neuron to the output.

The most notable feature of ELM is that it randomly assigns hidden layer weights W and biases \(\varvec{b}\) and does not learn them, whereas the output weights \(\varvec{\beta }\) are the only parameters learned. For the training data of N input–output pairs \(D=\{(\varvec{x}_1, t_1), (\varvec{x}_2, t_2), \cdots , (\varvec{x}_N, t_N)\}\), (1) can be expressed as

$$\begin{aligned} \textbf{H}\varvec{\beta }=\textbf{T}, \end{aligned}$$

where

$$\begin{aligned} \textbf{H}&=\begin{bmatrix} \varvec{h}(\varvec{x}_1, W, \varvec{b})&\cdots&\varvec{h}(\varvec{x}_N, W, \varvec{b}) \end{bmatrix}^T\\ \textbf{T}&=\begin{bmatrix} t_1&\cdots&t_N \end{bmatrix}^T \end{aligned}$$

For this purpose, the ELM output weights \(\varvec{\beta }\) are calculated by the following equation using pseudo-inverse matrix operations with the regularization term [40]:

$$\begin{aligned} \varvec{\beta }=\textbf{H}^T\left( \frac{\textbf{I}}{C}+\textbf{H}\textbf{H}^T\right) ^{-1}\textbf{T}, \end{aligned}$$

when the number of training samples is not large. Conversely, when the number of training samples is large (\(N\gg L\)), the following alternative formula is used:

$$\begin{aligned} \varvec{\beta }=\left( \frac{\textbf{I}}{C}+\textbf{H}^T\textbf{H}\right) ^{-1}\textbf{H}^T\textbf{T}. \end{aligned}$$
(2)

The output weights of ELM can be calculated quickly using the pseudo-inverse matrix operation without any iterative procedure. The only user parameters are the activation function h, the regularization coefficient C, and the number of hidden neurons L, which makes parameter tuning easy. Furthermore, the activation function need not be differentiable, which provides flexibility. In ELM, the following activation functions are generally employed:

Sigmoid (SIG):
$$\begin{aligned} h(\varvec{x}, \varvec{w}, b)=\frac{1}{1+\exp (-(\varvec{w}^T\varvec{x}+b))} \end{aligned}$$
Gaussian (GAU):
$$\begin{aligned} h(\varvec{x}, \varvec{w}, b)=\exp (-b\Vert \varvec{x}-\varvec{w}\Vert ^2) \end{aligned}$$
Multiquadric (MQ):
$$\begin{aligned} h(\varvec{x}, \varvec{w}, b)=(\Vert \varvec{x}-\varvec{w}\Vert ^2+b^2)^{\frac{1}{2}} \end{aligned}$$
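The ELM training described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation (the experiments in this paper used MATLAB): the function names `hidden_layer`, `train_elm`, and `predict_elm` are hypothetical, the sampling ranges for W and \(\varvec{b}\) are assumptions, and switching between the two formulas for \(\varvec{\beta }\) at \(N\ge L\) is an assumed threshold, since the text only distinguishes "not large" from \(N\gg L\).

```python
import numpy as np

def hidden_layer(X, W, b, kind="gaussian"):
    """Hidden-layer output matrix H (N x L) for inputs X (N x d)."""
    if kind == "sigmoid":
        return 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    # Squared distances ||x - w_i||^2 for every sample/neuron pair.
    sq = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    if kind == "gaussian":
        return np.exp(-b * sq)
    if kind == "multiquadric":
        return np.sqrt(sq + b ** 2)
    raise ValueError(kind)

def train_elm(X, t, L, C, kind="gaussian", seed=0):
    """Return (W, b, beta); only beta is fitted, W and b stay random."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(L, X.shape[1]))  # assumed range
    b = rng.uniform(0.0, 1.0, size=L)                 # assumed range
    H = hidden_layer(X, W, b, kind)
    N = H.shape[0]
    if N >= L:
        # Eq. (2): beta = (I/C + H^T H)^{-1} H^T T
        beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ t)
    else:
        # beta = H^T (I/C + H H^T)^{-1} T
        beta = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, t)
    return W, b, beta

def predict_elm(X, W, b, beta, kind="gaussian"):
    """ELM output y = h(x)^T beta for every row of X."""
    return hidden_layer(X, W, b, kind) @ beta
```

Both formulas for \(\varvec{\beta }\) give the same result up to rounding; the choice only decides whether an \(L\times L\) or an \(N\times N\) system is solved.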

The advantages of ELM over conventional machine learning models such as NNs and SVMs have been reported in previous works [41, 42] as follows:

  • The training cost is low because ELM does not require the iterative procedures of backpropagation.

  • ELM tends to have better generalization performance because it not only minimizes the mean squared error but also favors a simpler model.

  • Because ELM does not use gradient-based learning, non-differentiable activation functions can be used.

Since these features are helpful for a surrogate model in SAEAs, this study focuses on ELM.

DirectRanker

DR is an NN-based pairwise ranking method that implements a quasi-order \(\succcurlyeq \) in the feature space F such that the ranking is unique. In particular, the quasi-order \(\succcurlyeq \) satisfies the following three conditions for all \(\varvec{x}, \varvec{y}, \varvec{z}\in F\):

  1. Reflexivity: \(\varvec{x}\succcurlyeq \varvec{x}\)

  2. Antisymmetry: \(\varvec{x}\not \succcurlyeq \varvec{y}\Rightarrow \varvec{y}\succcurlyeq \varvec{x}\)

  3. Transitivity: \((\varvec{x}\succcurlyeq \varvec{y}\wedge \varvec{y}\succcurlyeq \varvec{z})\Rightarrow \varvec{x}\succcurlyeq \varvec{z}\)

This quasi-order can be defined by the ranking function \(r: F\times F\rightarrow \mathbb {R}\) as follows:

$$\begin{aligned} \varvec{x}\succcurlyeq \varvec{y}\Leftrightarrow r(\varvec{x}, \varvec{y})\ge 0. \end{aligned}$$

The ranking function \(r(\varvec{x},\varvec{y})\) in DR is constructed using two subnetworks nn and the output layer. Subnetworks nn have the same structure and shared weights. On the other hand, the output layer calculates the output from the difference in the outputs of the two subnetworks, the output weights \(\varvec{w}\), and the activation function \(\tau \) as follows:

$$\begin{aligned} o=r(\varvec{x}, \varvec{y})=\tau \left( \varvec{w}^T(nn(\varvec{x})-nn(\varvec{y}))\right) , \end{aligned}$$
(3)

where the activation function \(\tau : \mathbb {R}\rightarrow \mathbb {R}\) satisfies \(\tau (-x)=-\tau (x)\) and \(\text {sgn}(\tau (x))=\text {sgn}(x)\); the original work [10] used the hyperbolic tangent (\(\tanh \)). The function expressed in (3) satisfies the three conditions that the quasi-order must satisfy; see the original work [10] for details.
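The three conditions can be checked directly from (3). The short derivation below is a sketch that uses only the two stated properties of \(\tau \); the scalar score \(s(\varvec{x})=\varvec{w}^T nn(\varvec{x})\) is introduced here purely as shorthand and is not notation from the original work:

$$\begin{aligned} r(\varvec{y}, \varvec{x})&=\tau \left( s(\varvec{y})-s(\varvec{x})\right) =-\tau \left( s(\varvec{x})-s(\varvec{y})\right) =-r(\varvec{x}, \varvec{y}),\\ r(\varvec{x}, \varvec{x})&=\tau (0)=0,\\ r(\varvec{x}, \varvec{y})\ge 0&\Leftrightarrow s(\varvec{x})\ge s(\varvec{y}). \end{aligned}$$

The second line gives reflexivity. The first line gives antisymmetry in the sense used above, because \(r(\varvec{x}, \varvec{y})<0\) implies \(r(\varvec{y}, \varvec{x})>0\). The third line, which follows from \(\text {sgn}(\tau (x))=\text {sgn}(x)\), reduces \(\succcurlyeq \) to a comparison of real-valued scores, which is transitive.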

Extreme learning machine-based DirectRanker (ELDR)

Assuming that DR is applied to SAEA as a ranking-based surrogate model, the problem of high computational cost for the training of DR may arise, because it optimizes the weights of NN through iterative procedures using backpropagation. To address this issue, this study proposes a novel pairwise ranking method, ELM-based DR (ELDR), which constructs the subnetwork of DR with ELM, enabling quick training by the pseudo-inverse matrix operation without the iterative procedures of backpropagation.

Fig. 2 A topology of ELDR

Figure 2 illustrates the topology of ELDR. ELDR takes two d-dimensional inputs, \(\varvec{x}\) and \(\varvec{y}\), and calculates the output r from the difference between the two hidden-layer outputs. ELDR employs the random-weighted single-hidden-layer network in (1) for the DR subnetwork and defines the ranking function \(r_{ELDR}\) as follows:

$$\begin{aligned} r_{ELDR}(\varvec{x}, \varvec{y})=\left( \varvec{h}(\varvec{x}, W, \varvec{b})-\varvec{h}(\varvec{y}, W, \varvec{b})\right) ^T \varvec{\beta }, \end{aligned}$$

where W and \(\varvec{b}\) are the randomly assigned hidden-layer weights and biases shared by the two ELM subnetworks, and \(\varvec{\beta }\) is the only parameter to be trained. Because ELDR, like DR, uses two subnetworks with the same structure, weights, and activation functions, the ranking function \(r_{ELDR}\) also satisfies the three conditions of the quasi-order.

Algorithm 1 (training procedure for ELDR)

Algorithm 1 shows the training procedure for ELDR. First, all feature vectors \(\varvec{x}_i\) are normalized in the range \([-1, 1]\) using the minimum and maximum values in the dataset D. Then, the paired dataset \(D_{pair}\) is constructed with all combinations of the two solutions in dataset D as

$$\begin{aligned}&D_{pair}=\{(\varvec{x}_1^{(1)}, \varvec{x}_1^{(2)}, t_1), \cdots , (\varvec{x}_{N_p}^{(1)}, \varvec{x}_{N_p}^{(2)}, t_{N_p})\}\\&t_j=\text {sgn}\left( f(\varvec{x}_j^{(2)})-f(\varvec{x}_j^{(1)})\right) \\&j=1,\cdots ,N_p, \end{aligned}$$

where \(\text {sgn}\) represents the sign function, which returns \(+1\), \(-1\), or 0 according to the sign of the input value. Here, minimization of f is assumed, so \(t_j\) is \(+1\) if \(\varvec{x}_j^{(1)}\) is better than \(\varvec{x}_j^{(2)}\), \(-1\) if it is worse, and 0 if the two are equal. For the constructed training dataset \(D_{pair}\), ELDR is formulated as

$$\begin{aligned}&(\mathbf {H^{(1)}}-\mathbf {H^{(2)}})\varvec{\beta }=\textbf{H}^\prime \varvec{\beta }=\textbf{T}\\&\textbf{H}^\prime =\mathbf {H^{(1)}}-\mathbf {H^{(2)}}\nonumber \\&\textbf{T}=\begin{bmatrix} t_1&\cdots&t_{N_p} \end{bmatrix}^T\nonumber , \end{aligned}$$
(4)

where \(\textbf{H}^{(k)}\) is calculated from the k-th feature vector \(\varvec{x}^{(k)}_i\) (\(k\in \{1,2\}\)) of the i-th input pair as follows:

$$\begin{aligned} \textbf{H}^{(k)}=\begin{bmatrix} \varvec{h}(\varvec{x}_1^{(k)}, W, \varvec{b})&\cdots&\varvec{h}(\varvec{x}_{N_p}^{(k)}, W, \varvec{b}) \end{bmatrix}^T. \end{aligned}$$

In this study, the weights W and bias \(\varvec{b}\) are randomly sampled from a uniform distribution in the ranges \([-1, 1]\) and [0, 1], respectively [43]. From (4), the output weights \(\varvec{\beta }\) can be calculated as follows:

$$\begin{aligned} \varvec{\beta }=\left( \frac{\textbf{I}}{C}+\textbf{H}^{\prime T}\textbf{H}^\prime \right) ^{-1}\textbf{H}^{\prime T}\textbf{T} \end{aligned}$$
(5)

with a pseudo-inverse matrix calculation, as in Eq. (2).

At prediction time, given two solutions \(\varvec{x}_1\) and \(\varvec{x}_2\), the output of ELDR is calculated as

$$\begin{aligned} \hat{t}&=\text {sgn}\left( r_{ELDR}(\varvec{x}_1, \varvec{x}_2)\right) \\&=\text {sgn}\left( (\varvec{h}(\varvec{x}_1,W, \varvec{b})-\varvec{h}(\varvec{x}_2,W, \varvec{b}))^T\varvec{\beta }\right) \end{aligned}$$

by using the weights W and biases \(\varvec{b}\) assigned in the training phase and the output weights \(\varvec{\beta }\) obtained by Eq. (5). If \(\hat{t}=+1\), ELDR predicts that \(\varvec{x}_1\) is better than \(\varvec{x}_2\); otherwise, it predicts that \(\varvec{x}_2\) is better.
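The training and prediction steps above can be sketched as follows. This is an illustrative NumPy reading of the equations under stated assumptions, not the authors' code: the names `train_eldr`, `rank_score`, and `predict_eldr` and the dictionary layout are hypothetical, only the Gaussian activation is shown, and the guard for constant columns during normalization is an added assumption that the text does not discuss.

```python
import itertools
import numpy as np

def gaussian_hidden(X, W, b):
    """Gaussian hidden-layer outputs, one row per solution."""
    sq = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-b * sq)

def train_eldr(X, f, L, C, seed=0):
    """X: (N, d) solutions, f: (N,) fitness values to be minimized."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # assumption: avoid division by zero
    Xn = 2.0 * (X - lo) / span - 1.0         # normalize to [-1, 1]
    W = rng.uniform(-1.0, 1.0, size=(L, X.shape[1]))
    b = rng.uniform(0.0, 1.0, size=L)
    H = gaussian_hidden(Xn, W, b)
    # All N(N-1)/2 unordered pairs (first index i, second index j).
    i, j = np.array(list(itertools.combinations(range(len(X)), 2))).T
    Hp = H[i] - H[j]                         # H' = H^(1) - H^(2)
    T = np.sign(f[j] - f[i])                 # +1 if the first solution is better
    beta = np.linalg.solve(np.eye(L) / C + Hp.T @ Hp, Hp.T @ T)  # Eq. (5)
    return {"lo": lo, "span": span, "W": W, "b": b, "beta": beta}

def rank_score(model, x1, x2):
    """Real-valued r_ELDR(x1, x2) before the sign is taken."""
    X = np.vstack([x1, x2])
    Xn = 2.0 * (X - model["lo"]) / model["span"] - 1.0
    H = gaussian_hidden(Xn, model["W"], model["b"])
    return float((H[0] - H[1]) @ model["beta"])

def predict_eldr(model, x1, x2):
    """+1 if x1 is predicted better than x2, -1 if worse, 0 if tied."""
    return int(np.sign(rank_score(model, x1, x2)))
```

Because both inputs pass through the same W and \(\varvec{b}\), swapping the arguments of `rank_score` only flips the sign of the result, which is the antisymmetry property inherited from DR.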

Computational complexity

This subsection discusses the computational complexity of training ELDR. Let the size of the dataset D be N, so that the size of the paired dataset \(D_{pair}\) is \(N_p= N(N-1)/2\), and let the number of hidden neurons be L. Hence, the size of the hidden-layer matrix \(\textbf{H}^\prime \) is \(N_p\times L\), and the size of the training-label vector \(\textbf{T}\) is \(N_p\times 1\).

First, the time complexity of the hidden layer calculation is O(NLd) for any activation function. Next, for the calculation of the output weights \(\varvec{\beta }\), assuming that \(N_p\gg L\) in general and computing (2) with a hidden-layer matrix \(\textbf{H}\) of size \(N_p\times L\), the time complexity can be calculated as follows: The time complexity of \(\textbf{H}^T \textbf{H}\) is \(O(N_pL^2)\), and that of \(\left( \textbf{I}/C+\textbf{H}^T \textbf{H}\right) ^{-1}\) is therefore \(O(N_pL^2+L^3)\), whereas the time complexity of \(\textbf{H}^T \textbf{T}\) is \(O(N_pL)\). Multiplying the \(L\times L\) inverse by the L-dimensional vector \(\textbf{H}^T \textbf{T}\) costs \(O(L^2)\). Therefore, the time complexity of computing \(\varvec{\beta }=\left( \textbf{I}/C+\textbf{H}^T \textbf{H}\right) ^{-1}\textbf{H}^T \textbf{T}\) is \(O(N_pL^2+L^3+N_pL+L^2)\). The dominant terms are \(N_pL^2\) and \(L^3\), but because \(N_p\gg L\), the time complexity of ELDR training finally becomes \(O(N_pL^2)=O(N^2L^2)\). Thus, the time complexity of ELDR increases with the product of the square of the dataset size N and the square of the number of hidden neurons L.
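To give a feeling for the magnitudes involved, the following small sketch counts the leading terms for one illustrative setting. The helper name `eldr_training_cost` is hypothetical, constant factors are dropped, and the values N = 100 and L = 201 are chosen only as an example (they correspond to d = 20 with N = 5d and L = 10d + 1).

```python
def eldr_training_cost(N, L):
    """Leading operation counts for the output-weight computation (constants dropped)."""
    n_pairs = N * (N - 1) // 2           # N_p, number of training pairs
    return {
        "pairs": n_pairs,
        "gram": n_pairs * L ** 2,        # forming H^T H
        "inverse": L ** 3,               # inverting the L x L matrix
        "rhs": n_pairs * L,              # forming H^T T
    }

cost = eldr_training_cost(N=100, L=201)
print(cost)
```

For this setting, \(N_p=4950\), so the \(N_pL^2\) term is about \(2.0\times 10^8\) while the \(L^3\) term is about \(8.1\times 10^6\), which illustrates why the former dominates when \(N_p\gg L\).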

Example application of ELDR to SAEA: ELDR surrogate-assisted hybrid optimization

To investigate the applicability of ELDR to SAEAs, this study incorporated ELDR into the state-of-the-art SAEA method SAHO [11] to produce ELDR-SAHO. The original SAHO utilizes RBF as its surrogate model and finds a promising solution by simultaneously using Differential Evolution (DE) [33] and Teaching-learning-based Optimization (TLBO) [44, 45]. ELDR-SAHO estimates the dominance relationship between solutions using ELDR instead of predicting the fitness using RBF. This section briefly explains the search algorithms, DE and TLBO, used in SAHO and describes in detail the algorithm employed in ELDR-SAHO.

DE

SAHO uses the DE/best/1 strategy because of its high convergence capability. The DE/best/1 strategy generates a mutant vector \(\varvec{v}_i\) for a solution \(\varvec{x}_i\) as follows:

$$\begin{aligned} \varvec{v}_i=\varvec{x}_{best}+F\cdot \left( \varvec{x}_{r1}-\varvec{x}_{r2}\right) , \end{aligned}$$
(6)

where \(\varvec{x}_{best}\) represents the best-fitted individual in the current population, r1 and r2 (\(r1\ne r2\ne i\)) denote random indices, and F is the scaling parameter. Using a mutant vector \(\varvec{v}_i\), a trial vector \(\varvec{u}_i\) for a solution \(\varvec{x}_i\) is calculated as

$$\begin{aligned} u_{i, j}={\left\{ \begin{array}{ll} v_{i, j} &{} \text {if}\,\texttt {rand}(0, 1)\le Cr \vee j=j_{rand}\\ x_{i, j} &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$
(7)

where \(\texttt {rand}(0, 1)\) produces a uniform random value in the range [0, 1], Cr is the crossover rate, and \(j_{rand}\) denotes a random index in [1, d], with d being the number of design variables. If the fitness value of the trial vector \(\varvec{u}_i\) is better than that of the current solution \(\varvec{x}_i\), then \(\varvec{x}_i\) is replaced by \(\varvec{u}_i\).
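Equations (6) and (7) can be sketched in plain Python as below. The function name `de_best_1_bin` and the default values of F and Cr are assumptions made for illustration; they are not the settings used in SAHO.

```python
import random

def de_best_1_bin(pop, best, i, F=0.5, Cr=0.9, rng=random):
    """Trial vector u_i built from Eqs. (6) and (7); pop is a list of lists."""
    d = len(pop[i])
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2 = rng.sample(candidates, 2)        # distinct indices, both different from i
    # Eq. (6): mutant vector around the best individual.
    v = [best[j] + F * (pop[r1][j] - pop[r2][j]) for j in range(d)]
    # Eq. (7): binomial crossover; j_rand guarantees one mutant component.
    j_rand = rng.randrange(d)
    return [v[j] if (rng.random() <= Cr or j == j_rand) else pop[i][j]
            for j in range(d)]
```

The caller then keeps the trial vector only if it compares favorably with the current solution, using either the true fitness or, in a surrogate-assisted setting, a model prediction.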

TLBO

TLBO is a population-based search algorithm that simulates teacher instruction and mutual learning in education and consists of a teacher phase and a learner phase.

In the teacher phase, the best solution in the current population is selected as a teacher individual \(\varvec{x}_{teacher}\), and the positions of other solutions (students) are updated using the mean of the population \(\varvec{x}_{mean}\) as follows:

$$\begin{aligned} \varvec{x}_{new,i}=\varvec{x}_{old,i}+r_i\left( \varvec{x}_{teacher}-T_F \varvec{x}_{mean}\right) , \end{aligned}$$
(8)

where \(r_i\) is a uniform random value in [0, 1], and \(T_F\) is a variable calculated as \(T_F=1+\texttt {round}\left( \texttt {rand}(0, 1)\right) \) (\(\texttt {round}\) rounds off the input value); that is, \(T_F\) takes 1 or 2 with equal probability. If the fitness value of an updated position \(\varvec{x}_{new,i}\) is better than that of the original position \(\varvec{x}_{old,i}\), then its position is updated.

By contrast, the learner phase updates each solution in the population by interacting with a random solution. For a solution \(\varvec{x}_i\), a random solution \(\varvec{x}_j\) (\(i\ne j\)) is selected, and the following equation produces a new position:

$$\begin{aligned} \varvec{x}_{new,i}={\left\{ \begin{array}{ll} \begin{aligned} \varvec{x}_{old,i}+r_i\left( \varvec{x}_{old,i}-\varvec{x}_{old,j}\right) \\ \text {if}\,f(\varvec{x}_{old,i})<f(\varvec{x}_{old,j})\\ \end{aligned}\\ \begin{aligned} \varvec{x}_{old,i}+r_i\left( \varvec{x}_{old,j}-\varvec{x}_{old,i}\right) \\ \text {otherwise} \end{aligned} \end{array}\right. }, \end{aligned}$$
(9)

where \(r_i\) denotes a uniform random value in [0, 1]. Similar to the teacher phase, the position of a solution \(\varvec{x}_{old,i}\) is updated if the fitness value of the updated position \(\varvec{x}_{new,i}\) is better. TLBO alternately repeats the teacher and learner phases until the termination condition is satisfied.
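The two phases in Eqs. (8) and (9) can be sketched in plain Python as follows. This is a minimal illustration of basic TLBO with a directly evaluated fitness function f (minimization); the function names are hypothetical, and computing the population mean once per teacher phase is an assumption of this sketch.

```python
import random

def teacher_phase(pop, fitness, f, rng=random):
    """One teacher phase (Eq. (8)) with greedy replacement, in place."""
    d = len(pop[0])
    teacher = pop[min(range(len(pop)), key=fitness.__getitem__)]
    mean = [sum(x[j] for x in pop) / len(pop) for j in range(d)]
    for i, x in enumerate(pop):
        r = rng.random()
        t_f = 1 + round(rng.random())          # teaching factor, 1 or 2
        new = [x[j] + r * (teacher[j] - t_f * mean[j]) for j in range(d)]
        f_new = f(new)
        if f_new < fitness[i]:                 # keep only improvements
            pop[i], fitness[i] = new, f_new

def learner_phase(pop, fitness, f, rng=random):
    """One learner phase (Eq. (9)) with greedy replacement, in place."""
    n, d = len(pop), len(pop[0])
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        r = rng.random()
        # Move away from a worse partner, toward a better (or equal) one.
        sign = 1.0 if fitness[i] < fitness[j] else -1.0
        new = [pop[i][k] + sign * r * (pop[i][k] - pop[j][k]) for k in range(d)]
        f_new = f(new)
        if f_new < fitness[i]:
            pop[i], fitness[i] = new, f_new
```

Calling `teacher_phase` and then `learner_phase` in a loop reproduces the alternating scheme described above.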

The ELDR-SAHO Algorithm

Algorithm 2 (ELDR-SAHO)

The procedure of the ELDR-SAHO algorithm, in which the proposed ELDR is incorporated into SAHO, is shown in Algorithm 2. ELDR-SAHO first produces ps well-distributed initial samples using Latin hypercube sampling (LHS) [46] and evaluates them using the actual (computationally expensive) fitness function. All evaluated initial samples are stored in the dataset DB. In each iteration, the initial population is generated from the top ps solutions in DB and evolved for K generations using DE or TLBO. The ELDR surrogate model is trained using a subset D extracted from DB, which consists of the n nearest neighbors in DB of each solution in the current population. Note that the design variables are normalized using the minimum and maximum values in the extracted dataset D. A variable RunDE determines the algorithm used for each search: DE is performed if \(RunDE=True\), whereas TLBO is performed if \(RunDE=False\).

After the K-generation search, a promising solution for the actual fitness evaluation is selected according to Algorithm 3. In Algorithm 3, the best solution in the population after K generations is compared with the mean of the top r solutions in the population (\(r\in [1,ps]\)), and whichever the ELDR surrogate predicts to be better is selected as the promising solution. The selected promising solution is then evaluated using the computationally expensive fitness function. If the promising solution is better than the best solution found thus far, the current search algorithm (DE or TLBO) continues to be used. Otherwise, the search algorithm is switched.

The essential procedure is the same as that of the original SAHO with the RBF surrogate model but differs in Steps 7, 10, 13, and 15 in Algorithm 2 and Steps 1 and 5 in Algorithm 3. In these steps, the ELDR surrogate model is used to predict the dominance relationship.
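The selection and switching logic described in the preceding paragraphs can be sketched as follows. This is a reading of the prose only, not a transcription of Algorithm 3: the function names are hypothetical, `predict(a, b)` stands for any pairwise predictor returning +1 when a is predicted better than b (for example, a trained ELDR model), and ranking the population by sorting with that predictor and keeping the best solution on a tie are assumptions of this sketch.

```python
from functools import cmp_to_key

def select_promising(pop, r, predict):
    """Pick the best individual or the mean of the top r, whichever is predicted better."""
    # Sort so that solutions predicted to be better come first.
    ranked = sorted(pop, key=cmp_to_key(lambda a, b: -predict(a, b)))
    best = ranked[0]
    d = len(best)
    mean_top = [sum(x[j] for x in ranked[:r]) / r for j in range(d)]
    # Assumption: on a tie the best individual is kept.
    return best if predict(best, mean_top) >= 0 else mean_top

def next_run_de(run_de, f_promising, f_best_so_far):
    """Keep the current search algorithm on improvement, otherwise switch (minimization)."""
    return run_de if f_promising < f_best_so_far else not run_de
```

Only the selected solution is passed to the expensive fitness function, so each outer iteration costs one true evaluation regardless of the population size.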

Algorithm 3 (selection of the promising solution)
Table 1 Benchmark problems

Preliminary experiment: accuracy of ELDR

This section analyzes the estimation accuracy of the proposed ELDR surrogate model. The first experiment explores the parameter sensitivity of ELDR with respect to the activation function h, number of hidden neurons L, and regularization coefficient C. Then, “Comparison with other surrogate models” compares the estimation accuracy of ELDR with that of other surrogate models, RBF and RankSVM, which are commonly used in previous SAEA research. Finally, the computation times of these surrogate models are compared in “Computation time”.

Experimental settings

This study used the eight single-objective continuous optimization benchmarks shown in Table 1. These functions were chosen because they have been widely used to investigate the performance of SAEAs in recent research and have different fitness landscape characteristics. The dimensions of the design variables were set to \(d=20\) and 100 to compare surrogate model performance on small and large problems. RBF is known to have high estimation accuracy for problems of up to 20–30 dimensions, whereas 100 dimensions is considered large-scale in the SAEA domain. Training datasets of \(N_{train}=2d\) and 5d randomly sampled points (d is the dimension) were used for training ELDR, RBF, and RankSVM. The trained models were tested on 10d random test samples generated independently of the training dataset. The two training data sizes were chosen to verify the estimation accuracy in an SAEA both when data are limited in the early stages of the search and when a certain number of data samples have been obtained in later stages.

The prediction accuracy was assessed using Kendall’s rank correlation coefficient for 20 independent pairs of training and test data, and the average rank correlation was compared. Kendall’s rank correlation coefficient (hereafter, Kendall’s \(\tau \)) returns \(\tau =+1\) if the predicted and actual ranks are identical, whereas it returns \(\tau =-1\) if the predicted ranks are the exact reverse of the actual ranks. A higher rank correlation indicates a better rank prediction. For the ranking methods (ELDR and RankSVM), the test data are sorted by the prediction, and their rank is compared with the actual rank. For the RBF model, the test data are sorted based on the predicted fitness values, and their rank is likewise compared with the actual rank (the predicted fitness values themselves are not compared with the actual fitness values). The experiments were executed on a computer with an Intel Xeon W-2295 3.00 GHz CPU with 64 GB RAM using MATLAB R2019a.
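The evaluation protocol above can be sketched in plain Python. The text does not state which variant of Kendall’s \(\tau \) was used, so the tie-free form (tau-a) below is an assumption; the function names are hypothetical, and `predict(a, b)` stands for any pairwise predictor returning +1 when a is predicted better than b.

```python
from functools import cmp_to_key

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no tie correction)."""
    n = len(rank_a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            s += (prod > 0) - (prod < 0)       # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def ranks_from_comparator(items, predict):
    """Rank positions (0 = best) obtained by sorting with a pairwise predictor."""
    order = sorted(range(len(items)),
                   key=cmp_to_key(lambda i, j: -predict(items[i], items[j])))
    ranks = [0] * len(items)
    for pos, idx in enumerate(order):
        ranks[idx] = pos
    return ranks
```

For a regression surrogate such as RBF, the predicted ranks would instead come from sorting the predicted fitness values; the same `kendall_tau` call then compares them with the actual ranks.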

Comparison of hyperparameters

This subsection reports the experiment conducted to evaluate the different hyperparameters of ELDR: activation function h, number of hidden neurons L, and regularization coefficient C. For the activation function h, the experiment used three functions: Sigmoid (SIG), Gaussian (GAU), and Multiquadric (MQ), as shown in “Extreme Learning Machine”. The number of hidden neurons was varied over \(L=\{d+1, 2d+1, 3d+1, \cdots , 10d+1\}\) according to the dimension d of the design variable. The regularization coefficient was varied over \(C=\{2^{-5}, 2^{-4}, \cdots , 2^5\}\).

Because the difficulty of estimation varies for each problem, the scale of the estimation accuracy (i.e., Kendall’s \(\tau \)) also varies. To account for these differences and investigate the overall performance across all problems, this study uses the same index as in the literature [7, 48]. Specifically, this study investigates the influence of the regularization coefficient C and the number of hidden neurons L for each activation function. For ELDR with a given activation function, the set of problem instances is denoted as \(F=\{f_k\mid k=1,2,\cdots , n\}\), the set of parameter settings (combinations of C and L) is denoted as \(S=\{s_i\mid i=1,2,\cdots , m\}\), and ELDR with the parameter setting \(s_i\) is denoted as \(A_{s_i}\). The following equation calculates the performance \(PM(s_i)\) for each parameter setting \(s_i\):

$$\begin{aligned} PM(s_i)=\frac{1}{m-1}\sum _{j=1,j\ne i}^m P\left( A_{s_i}\succ A_{s_j}\right) , \end{aligned}$$
(10)

where \(P(A_{s_i}\succ A_{s_j})\) represents the ratio that \(A_{s_i}\) outperforms \(A_{s_j}\) and is calculated as follows:

$$\begin{aligned} P\left( A_{s_i}\succ A_{s_j}\right) =\frac{1}{n}\sum _{k=1}^n P\left( q_{i, k}>q_{j,k}\mid f_k\right) , \end{aligned}$$

where \(P\left( q_{i, k}>q_{j, k}\mid f_k\right) \) represents the ratio that the estimation accuracy of \(A_{s_i}\) is better than that of \(A_{s_j}\) for a problem instance \(f_k\). \(P\left( q_{i, k}>q_{j, k}\mid f_k\right) \) is calculated using the following equation:

$$\begin{aligned} P\left( q_{i, k}>q_{j,k}\mid f_k\right) =\frac{1}{N_i\times N_j}\sum _{t_i=1}^{N_i}\sum _{t_j=1}^{N_j}\mathbb {1}\left( \tau _{i,k,t_i}>\tau _{j,k,t_j}\right) , \end{aligned}$$

where \(\tau _{i, k, t}\) represents Kendall’s \(\tau \) for the t-th dataset of a problem instance \(f_k\) obtained by \(A_{s_i}\). \(N_i\) and \(N_j\) represent the number of test datasets of a problem instance \(f_k\) for \(A_{s_i}\) and \(A_{s_j}\), respectively (in this experiment \(N_i= N_j=20\)). The function \(\mathbb {1}(x)\) is the indicator function, which returns 1 if x is true and 0 otherwise.
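The performance metric in Eq. (10) and the two ratios it is built from can be sketched in plain Python as follows. The function names and the nested-list layout `tau[i][k]` (the Kendall’s \(\tau \) values of setting \(s_i\) on problem \(f_k\)) are assumptions made for illustration.

```python
def win_ratio(taus_i, taus_j):
    """P(q_i > q_j | f_k): share of (t_i, t_j) pairs in which setting i has the larger tau."""
    wins = sum(a > b for a in taus_i for b in taus_j)
    return wins / (len(taus_i) * len(taus_j))

def performance_metric(tau):
    """PM(s_i) for every setting; tau[i][k] lists the tau values of s_i on f_k."""
    m, n = len(tau), len(tau[0])
    pm = []
    for i in range(m):
        total = 0.0
        for j in range(m):
            if j == i:
                continue
            # P(A_si > A_sj): average win ratio over the n problems.
            total += sum(win_ratio(tau[i][k], tau[j][k]) for k in range(n)) / n
        pm.append(total / (m - 1))             # Eq. (10)
    return pm
```

Because every term is a win ratio on a single problem, \(PM(s_i)\) lies in [0, 1] and is not distorted by problems on which all settings obtain a uniformly low or high \(\tau \).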

Fig. 3 Performance metric \(PM(s_i)\) calculated using Eq. (10) for each activation function

Figure 3 shows the performance metric \(PM(s_i)\) calculated using Eq. (10) for each activation function. The horizontal axis shows the regularization coefficient and the vertical axis shows the number of hidden neurons. The value in each cell indicates the \(PM(s_i)\) value for the corresponding parameter setting.

The result in Fig. 3a indicates that when using the Sigmoid function, the number of hidden neurons significantly affects the estimation accuracy, and a larger number of hidden neurons results in better performance. Conversely, the effect of the regularization coefficient is small. The results show that the combination of \(L=10d+1\) and \(C=2^{-2}\) achieves the highest estimation accuracy when using the Sigmoid function. However, the difference in the performance metric is small among the different parameter settings. This indicates that the optimal parameter setting varies depending on the characteristics of the target problem.

Table 2 Comparison of Kendall’s \(\tau \) of ELDR, RBF, and RankSVM

Next, Fig. 3b shows that when using the Gaussian function, both the number of hidden neurons and the regularization coefficient significantly affect the estimation accuracy of ELDR. Specifically, a larger number of hidden neurons tends to result in higher estimation accuracy, and with respect to the regularization coefficient, the maximum performance is obtained around \(C=2^{-3}\) to \(2^{0}\), especially when the number of hidden neurons is large. Consequently, the combination of \(L=10d+1\) and \(C=2^{-3}\) achieves the highest estimation accuracy when using the Gaussian function. The difference in the performance metric among the different parameter settings is large when using the Gaussian function. This indicates that the optimal parameter setting is largely independent of the problem, and the best parameter setting of \(L=10d+1\) and \(C=2^{-3}\) can obtain stable performance when using the Gaussian activation function.

Finally, for the Multiquadric function, Fig. 3c indicates that small regularization coefficients tend to give stable performance regardless of the number of hidden neurons. In particular, the regularization coefficient \(C=2^{-5}\) provides the highest estimation accuracy with \(L\ge 2d+1\). Although the performance difference between different values of L is slight, the combination of \(L=2d+1\) and \(C=2^{-5}\) achieves the highest estimation accuracy when using the Multiquadric function. A large difference in the performance metric between the parameter settings can be seen, and \(C=2^{-5}\) in particular stands out as the best choice. This indicates that the parameter setting of \(L=2d+1\) and \(C=2^{-5}\) can be recommended regardless of the problem for the Multiquadric function.

Comparison with other surrogate models

This subsection compares ELDR with RBF and RankSVM. ELDR employed the hyperparameters chosen from the results in the previous subsection. In particular, this comparison used ELDR with:

  • Sigmoid activation with \(L=10d+1\), \(C=2^{-2}\)

  • Gaussian activation with \(L=10d+1\), \(C=2^{-3}\)

  • Multiquadric activation with \(L=2d+1\), \(C=2^{-5}\)

RBF uses the cubic basis function (\(\phi (x)=x^3\)), which has been the most commonly used basis function in previous works that employed the RBF surrogate. The parameters of RankSVM were chosen from the literature [7].

Table 2 shows the estimation accuracies for ELDR, RBF, and RankSVM. The bottom row summarizes the average rank of all problem instances and all dimensions.

Fig. 4 Computation time of three surrogate models

First, ELDR using the Sigmoid function has a significantly lower estimation accuracy than the other methods on the first five benchmarks. However, its accuracy improves on the CEC 2005 benchmarks and is higher than that of the Gaussian activation function for some problems, e.g., the 100-dimensional cases of \(F_{10}\) and \(F_{16}\). Its average rank is the lowest (worst) among the five surrogate models. This indicates that the Sigmoid function has low estimation accuracy and limited usefulness even when the parameters are optimally configured, but it can be an option for complex composition functions.

Second, ELDR with the Gaussian or Multiquadric function outperforms RBF and RankSVM, which are used in conventional SAEAs, and achieves the best or second-best performance on most problems. In particular, ELDR with the Multiquadric function has a consistently high estimation accuracy and the highest (best) average rank among the five surrogate models. Additionally, ELDR with the Gaussian or Multiquadric function exhibits a more stable estimation accuracy than RBF and RankSVM, even when the training dataset is small (\(N_{train}=2d\)). This feature benefits SAEAs because the amount of data available for training surrogate models is generally limited.

Computation time

To evaluate the computation time of the proposed ELDR, the training times of the three methods are compared. Because the regularization coefficient and the activation function of ELDR have an extremely small influence on the computation time, only the results using the Gaussian activation function and \(C=2^{-3}\) are compared in this paper. It has been confirmed that similar results are obtained for the other activation functions and regularization coefficients. In addition, because the training time does not depend on the characteristics of the problem, this paper reports the average over all problems.

Figure 4 shows the average training time for ELDR, RBF, and RankSVM. The blue, orange, and green lines represent the results for ELDR, RBF, and RankSVM, respectively. In each figure, the horizontal axis represents the number of hidden neurons in ELDR, while the vertical axis represents the training time [s].

These figures show that the training time of ELDR increases nearly linearly with the number of hidden neurons L. This growth is slower than the quadratic growth in L implied by the computational complexity \(O(N^2L^2)\) of ELDR discussed in “Computational complexity”. This is because this paper uses MATLAB to implement ELDR, and the cost of calculating \(H^TH\), which is the most computationally expensive step, is less than that of a simple matrix product. This indicates that the practical computational cost of ELDR is smaller than the theoretical complexity suggests, and that the computation time increases approximately linearly with L in practice.
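One way to check whether measured training times grow linearly or quadratically in L is to fit the slope of log-time against log-L. The following is a minimal sketch under stated assumptions: the function name `scaling_exponent` is hypothetical, and the timing values are synthetic numbers generated from exact power laws, not measurements from this study.

```python
import math


def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size).

    A slope near 1 suggests linear growth; a slope near 2 suggests
    quadratic growth.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical hidden-layer sizes (synthetic, for illustration only).
hidden_sizes = [101, 201, 501, 1001]
# Synthetic timings: one set proportional to L, one proportional to L^2.
linear_times = [1e-3 * L for L in hidden_sizes]
quadratic_times = [1e-6 * L ** 2 for L in hidden_sizes]

linear_slope = scaling_exponent(hidden_sizes, linear_times)
quadratic_slope = scaling_exponent(hidden_sizes, quadratic_times)
```

Applied to real timing data, a fitted slope close to 1 would be consistent with the nearly linear trend described above.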

Next, the computation time of ELDR is compared with those of RBF and RankSVM. First, ELDR has the shortest training time when the number of dimensions and the amount of training data are small (\(d=20, N_{train}=2d=40\)). When the number of dimensions and the amount of training data are large, the computation time of ELDR for small L (\(L=d+1\)) is similar to or slightly longer than that of RBF and RankSVM. On the other hand, as the number of hidden neurons L increases, the training time of ELDR becomes longer than that of the other surrogate models. In particular, in the case of \(d=100, N_{train}=5d=500\), which has the highest number of dimensions and the largest amount of training data, the computation time of ELDR for \(L=10d+1\) is approximately 60 times longer than that of the other surrogate models.

These results indicate that, for low-dimensional problems with few training samples, ELDR can be trained in about the same time as existing surrogate models such as RBF and RankSVM. On the other hand, for high-dimensional problems with many training samples, ELDR requires a significantly longer training time than existing models, especially when the number of hidden neurons L is increased. However, the training time of ELDR is acceptable compared to the expensive solution evaluations targeted by SAEAs, which sometimes require several hours or days.

Numerical experiments

Numerical experiments were conducted using well-known benchmark problems to investigate the effectiveness of the proposed method. The experiments used the eight single-objective continuous optimization benchmarks listed in Table 1. The dimensions of the problems were set to \(d=\{10, 20, 30, 50, 100\}\).

Experimental settings

In the experiments, ELDR-SAHO, explained in “Example application of ELDR to SAEA: ELDR surrogate-assisted hybrid optimization”, was used as the ELDR-assisted EA. It was compared with the conventional state-of-the-art SAEA methods GORS-SSLPSO [13], FSAPSO [15], SLPSO [49], CA-LLSO [26], and SAHO [11]. GORS-SSLPSO, FSAPSO, SLPSO, and SAHO use a regression model based on the RBF, while CA-LLSO uses a classification model based on the gradient boosting classifier (GBC) [50]. The publicly available implementations of GORS-SSLPSO, FSAPSO, SLPSO, and CA-LLSO were used. The implementation of SAHO was provided by Pan et al. [11], and ELDR-SAHO was implemented based on it.

Table 3 Experimental results of the performance comparison

The experimental setup was established based on Pan et al. [11]. The population size was set to \(ps=5\times d\) when \(d=\{10, 20, 30\}\) and to \(ps=100+\lfloor d/10\rfloor \) when \(d=\{50, 100\}\). The neighbor size n used when training ELDR (Step 6 in Algorithm 2) was set to \(n=d\). The parameter values for DE were \(Cr=0.9\) and \(F=0.5\). The number of generations K for which DE or TLBO is run on the surrogate model was set to \(K=30\). The maximum number of fitness evaluations was \(MaxFE=11\times d\) when \(d=\{10, 20, 30\}\) and \(MaxFE=1000\) when \(d=\{50, 100\}\). For the other algorithms, the parameter values recommended in their respective studies were used.
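The dimension-dependent settings above can be collected in a small helper. The following is a minimal sketch: the names `ExperimentSettings` and `make_settings` are hypothetical and are not part of the SAHO or ELDR-SAHO implementations; only the numerical values come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSettings:
    dimension: int              # d
    population_size: int        # ps
    neighbor_size: int          # n, used when training ELDR
    crossover_rate: float       # Cr for DE
    scale_factor: float         # F for DE
    surrogate_generations: int  # K, generations of DE or TLBO on the surrogate
    max_evaluations: int        # MaxFE


def make_settings(d: int) -> ExperimentSettings:
    """Return the settings described in the text for dimension d."""
    if d in (10, 20, 30):
        ps, max_fe = 5 * d, 11 * d
    elif d in (50, 100):
        ps, max_fe = 100 + d // 10, 1000
    else:
        # The text only specifies settings for these five dimensions.
        raise ValueError(f"no settings specified for d={d}")
    return ExperimentSettings(
        dimension=d,
        population_size=ps,
        neighbor_size=d,
        crossover_rate=0.9,
        scale_factor=0.5,
        surrogate_generations=30,
        max_evaluations=max_fe,
    )
```

For example, `make_settings(10)` gives a population of 50 with 110 evaluations, whereas `make_settings(100)` gives a population of 110 with 1000 evaluations.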

The preliminary experimental results in “Preliminary experiment: accuracy of ELDR” revealed that ELDR using either the Gaussian or Multiquadric function has high potential as a surrogate model. However, the optimal activation function varies depending on the problem. Based on this analysis, model selection through validation is introduced into ELDR-SAHO. Algorithm 4 shows the model selection procedure of ELDR-SAHO. The model selection compares two candidates: the Gaussian function with \(L=10d+1, C=2^{-3}\) and the Multiquadric function with \(L=2d+1, C=2^{-5}\). The candidate with the higher Kendall’s \(\tau \) on the validation set is then used as the surrogate in the search. The validation set for model selection comprises a randomly selected 20% of the training dataset. This procedure is used at line 7 in Algorithm 2.
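The validation-based selection described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of Algorithm 4: the function names are hypothetical, Kendall's \(\tau \) is computed in its tau-a form, the candidates are assumed to be fitted on the remaining 80% of the data, and the two candidate "models" are toy stand-ins rather than ELDR.

```python
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(a) * (len(a) - 1) // 2
    return (concordant - discordant) / n_pairs


def select_model(candidates, xs, ys, validation_ratio=0.2, rng=None):
    """Pick the candidate with the highest tau on a random validation split.

    candidates maps a name to fit(train_x, train_y), which returns a
    scoring function; higher scores must mean larger objective values.
    """
    rng = rng or random.Random(0)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_val = max(2, round(validation_ratio * len(xs)))
    val_idx, fit_idx = indices[:n_val], indices[n_val:]
    fit_x = [xs[i] for i in fit_idx]
    fit_y = [ys[i] for i in fit_idx]
    val_x = [xs[i] for i in val_idx]
    val_y = [ys[i] for i in val_idx]
    best_name, best_tau = None, float("-inf")
    for name, fit in candidates.items():
        score = fit(fit_x, fit_y)
        tau = kendall_tau([score(x) for x in val_x], val_y)
        if tau > best_tau:
            best_name, best_tau = name, tau
    return best_name, best_tau


def sphere(x):
    return sum(v * v for v in x)


# Toy stand-ins for the two candidates (NOT ELDR): one ranks correctly,
# the other ranks in reverse order.
candidates = {
    "gaussian": lambda tx, ty: sphere,
    "multiquadric": lambda tx, ty: (lambda x: -sphere(x)),
}
data_rng = random.Random(1)
xs = [[data_rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
ys = [sphere(x) for x in xs]
best_name, best_tau = select_model(candidates, xs, ys)
```

In this toy setting, the stand-in that preserves the true ordering is selected. With real ELDR candidates, the two scoring functions would come from training with the two hyperparameter settings listed above.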

Results

Table 3 shows the mean and standard deviation of the best fitness obtained after the maximum number of fitness evaluations over 20 independent runs. For each benchmark, the best (smallest) value is highlighted in boldface, whereas the second-best is underlined. The Mann–Whitney U test with a significance level of 5% was performed, and the results were adjusted using the Bonferroni correction [51] to confirm the statistical difference between the proposed method and the other methods. Where the proposed method is significantly better than a compared method, the result is marked with “\(+\)”; “−” denotes that it is significantly worse, and “\(\approx \)” denotes that no significant difference was found. Finally, the results are summarized at the bottom of the table.
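The marking scheme can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the p-value comes from the normal approximation without tie or continuity correction (the text does not say which variant was used), the Bonferroni factor is assumed to be the number of compared methods, and all names and sample values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic of x, p-value). Ties receive average ranks;
    no tie or continuity correction is applied (a simplification).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = math.erfc(abs(u - mean) / std / math.sqrt(2))
    return u, p


def mark_results(proposed, others, alpha=0.05):
    """Return '+', '-', or '≈' per competitor (minimization assumed)."""
    m = len(others)  # assumed number of comparisons for Bonferroni
    marks = {}
    for name, results in others.items():
        u, p = mann_whitney_u(proposed, results)
        if min(1.0, p * m) >= alpha:
            marks[name] = "≈"
        elif u < len(proposed) * len(results) / 2:
            marks[name] = "+"  # proposed values tend to be smaller
        else:
            marks[name] = "-"
    return marks


# Synthetic final fitness values for 20 runs (illustration only).
proposed = [1.0 + 0.01 * i for i in range(20)]
others = {
    "worse_method": [5.0 + 0.01 * i for i in range(20)],
    "better_method": [0.1 + 0.01 * i for i in range(20)],
    "same_method": list(proposed),
}
marks = mark_results(proposed, others)
```

Here a small U for the proposed method means that its values tend to rank below those of the competitor, which corresponds to “\(+\)” under minimization.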

The experimental results show that ELDR-SAHO achieved the best mean fitness in 21 out of 40 cases and the second-best in two cases. Moreover, the proposed method is significantly better than the other algorithms on many benchmarks. However, on the \(F_{10}\) and \(F_{16}\) problems, the proposed method was inferior to SAHO, which uses RBF as a surrogate. Moreover, on the low-dimensional (\(d \le 30\)) Griewank, Rastrigin, and \(F_{19}\) problems, ELDR-SAHO does not achieve good results compared with the other methods. On the other hand, ELDR-SAHO significantly outperformed the other algorithms on the high-dimensional benchmarks except for the Rosenbrock (\(d=100\)), \(F_{10}\), and \(F_{19}\) benchmarks.

Fig. 5 Transitions of median objective function values and the interquartile ranges [ranges between the third quartile (Q3) and the first quartile (Q1)] when the number of dimensions is \(d=100\) (the vertical axis is on a logarithmic scale)

Figure 5 shows the transitions of the median objective function values and the interquartile ranges [range between the third quartile (Q3) and the first quartile (Q1)] when \(d=100\). The horizontal axis indicates the number of actual fitness evaluations, whereas the vertical axis shows the objective function value on a logarithmic scale. Note that for the CEC 2005 benchmarks, the difference between the optimum and the obtained value is plotted.
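The plotted quantities can be computed from per-run convergence histories as sketched below. This is an illustration under stated assumptions: the function name `convergence_summary` is hypothetical, the quartiles use the "inclusive" interpolation of Python's `statistics.quantiles` (the text does not state which quartile definition was used), and the histories are synthetic.

```python
import statistics


def convergence_summary(histories):
    """Median and quartiles of the best-so-far value at each evaluation.

    histories[r][t] is the best objective value of run r after t + 1
    actual fitness evaluations; all runs must have the same length.
    Returns a list of (median, q1, q3) tuples, one per evaluation count.
    """
    summary = []
    for values_at_t in zip(*histories):
        q1, median, q3 = statistics.quantiles(
            values_at_t, n=4, method="inclusive"
        )
        summary.append((median, q1, q3))
    return summary


# Synthetic best-so-far histories for five runs (illustration only).
histories = [
    [5.0, 4.0],
    [4.0, 3.0],
    [3.0, 2.0],
    [2.0, 1.0],
    [1.0, 0.5],
]
summary = convergence_summary(histories)
```

Plotting the median as a line and the band between Q1 and Q3 on a logarithmic vertical axis would then give a figure of the kind described above.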

First, focusing on the \(F_{10}\) and \(F_{16}\) problems, where ELDR-SAHO was significantly inferior to the other methods on the 100-dimensional problem, it can be seen that the results are comparable to those of the other methods until the middle stage of the search (about 400 fitness evaluations). In the subsequent search, however, the proposed method stagnates, whereas the other methods continue to decrease their objective function values. This suggests that, on these benchmarks, the proposed method becomes stuck in a local optimum from which it cannot escape.

By contrast, the figures show that ELDR-SAHO rapidly improves the objective function value from the early stage of optimization and continues the search without stagnation on the Ellipsoid, Rosenbrock, Ackley, Griewank, Rastrigin, and \(F_{19}\) problems. This can be attributed to the ability of ELDR to estimate the dominance relationship more accurately with a smaller amount of training data than RBF and GBC. In particular, although the final objective function values of ELDR-SAHO were worse than those of SAHO on the 100-dimensional Rosenbrock function (Fig. 5b), ELDR-SAHO converges to its final result in fewer evaluations than the other algorithms.

These results indicate that ELDR-SAHO has better search performance than conventional SAEAs using regression and classification surrogate models on many problems (especially high-dimensional problems). However, the results also suggest that ELDR-SAHO may fall into a local optimum on some problems. Note that although ELDR was applied to SAHO in this study, SAHO is not necessarily the most suitable framework for ELDR because SAHO was designed as an SAEA using RBF. In particular, for the \(F_{10}\) and \(F_{16}\) problems, it may be possible to enhance the performance of ELDR by designing an SAEA with a mechanism to prevent falling into (or to escape from) local optima. In addition, because ELDR enables a search that relies only on the relative superiority of solutions, it may be applicable to interactive EAs, in which the evaluation function is regarded as a model of human evaluation. Therefore, ELDR has high potential as a new surrogate model for SAEAs.

Conclusion and future work

This paper proposed a novel ranking-based surrogate model for SAEAs, called ELDR, that combines ELM and DirectRanker. Further, it was incorporated into SAHO, a state-of-the-art SAEA. To investigate the estimation accuracy of ELDR, this study compared it with that of RBF and RankSVM, which are commonly used in SAEAs. The comparison results indicated that ELDR with an appropriate hyperparameter setting exhibited higher rank estimation accuracy than RBF and RankSVM, especially when the amount of training data was small. To investigate the compatibility of ELDR with SAEAs, ELDR-SAHO was compared with existing SAEAs. The experimental results showed that ELDR-SAHO significantly outperforms existing SAEAs using regression and classification models, including RBF-based SAHO, on many problems, particularly on high-dimensional (\(d\ge 50\)) problems. Based on these results, it is concluded that ELDR is a promising surrogate model for SAEAs. However, because the combination of SAHO and ELDR may fall into a local optimum, future research should explore a different SAEA that can fully utilize the power of ELDR.

Because ELDR is an NN-based method, it has the advantage that, unlike RBF, it can be applied to discrete and mixed-variable optimization problems as well as real-valued optimization problems. In addition, a ranking-based surrogate model is useful for constrained and aesthetic optimization. Therefore, future research should examine the application of ELDR to these problem domains.