Introduction

Evolutionary computation is an important field of numerical optimization. Because evolutionary algorithms are insensitive to the characteristics of the objective function, they are considered effective for solving nonconvex optimization and NP-hard problems. The estimation of distribution algorithm (EDA) [1], one of the traditional evolutionary computation techniques, has received considerable research attention over the past two decades. In contrast to traditional evolutionary algorithms that use crossover, mutation and selection mechanisms, the EDA has a unique evolutionary process [2]: it estimates a probability distribution model from selected solutions and evolves the whole population iteratively. The Gaussian distribution model is typically used with the EDA to solve problems in the continuous domain. According to the structure of the Gaussian probability model and the relationships among variables, EDAs can be classified as univariate [3], bivariate [4], or multivariate [5] models. The multivariate Gaussian EDA and its variants are competitive in various real-world applications [6,7,8,9,10].

The EDA extracts the features of the population location from a macro perspective, and its offspring generation relies heavily on the estimated distribution model. However, the variances along the fitness descent directions shrink rapidly in the later stages of the search [11, 12]. Moreover, the basic EDA has no local search mechanism to enhance population diversity, so it is easily affected by ill-shaped distributions. In particular, when addressing multimodal problems, the EDA cannot capture the appropriate characteristics of a problem effectively through a unimodal Gaussian distribution model. Thus, the traditional EDA suffers from premature convergence and is easily trapped in stagnation. To address these drawbacks, many studies have been performed in recent decades, as reviewed in the next section. However, according to the results of the single-objective optimization competitions organized by the IEEE Congress on Evolutionary Computation (IEEE CEC) in recent years, EDA extensions have been unable to achieve top performance on those complex benchmarks. Nevertheless, due to its unique model-based features, the EDA can serve as a useful supplement to other algorithms. Note that all of the recent champion algorithms utilized the covariance matrix adaptation technique, such as HSES [13] and ELSHADE-SPACMA [14] (in CEC 2018), EBOwithCMAR [15] (in CEC 2017), and UMOEAsII [16] (in CEC 2016). In these promising hybrid methods, the EDA provides a unique search framework to capture problem features, while the other search mechanisms favorably supplement the EDA by enriching population diversity and performing efficient local searches.

However, the EDA's potential is not limited to its application in hybrid algorithms. The distribution characteristics of the population have not been fully exploited. First, the role of historical distribution information in correcting an abnormal population distribution has not been developed efficiently. Second, the use of the best, or leader, solutions to direct the search toward dominant regions has not received widespread attention. In addition, the function of inferior solutions in enhancing population diversity is often neglected. These areas of potential improvement leave room to further explore the characteristics of the EDA and make full use of the population distribution information to enhance algorithm performance.

In this study, we focus on improving the EDA distribution model from the above aspects and develop a new, improved EDA for solving single-objective optimization problems using three modification strategies. The main innovations can be summarized as follows:

  1. Archive-based population updating (APU). We propose a distribution model estimation strategy based on historical distribution information. An archive is designed to store successive generations of solutions, and promising individuals are selected from this archive to form a new population and estimate the distribution model. This operator, called archive-based population updating (APU), makes full use of historical population information. Through APU, excellent individuals selected from different generations can expand the variances of the search scope. Meanwhile, updating the archive continuously on a first-in first-out basis prevents a few best solutions from being reused excessively, which would otherwise cause the distribution model to overfit and the population diversity to be lost.

  2. Multileader-based search diversification (MSD). We develop a multileader-based search diversification (MSD) strategy that exploits the location differences within the population to diversify the search scope. The MSD contains two different search behaviors. In the first, the mean point is shifted using a leader solution randomly selected from a set that preserves several of the most promising solutions. In the second, the candidate is generated around a disturbed mean point. Using the MSD strategy, the advantage of leader solutions in directing the evolution is fully utilized. Furthermore, the sampling scopes are diversified with different mean points, which effectively enhances the algorithm's exploration behavior and prevents an ill-shaped unimodal distribution model from misleading the population into stagnation.

  3. Triggered distribution shrinkage (TDS). We design a triggered distribution shrinkage (TDS) strategy to scale the search scope when the algorithm falls into stagnation. This mechanism decreases the search scope so that the algorithm focuses on local exploitation, improving the convergence performance.

We assemble the above three search strategies into the EDA to obtain E3-EDA. The algorithm focuses on single-objective optimization, which is the basic task for a new evolutionary algorithm and the foundation of more complex extensions, such as multi-objective, constrained, and parallel algorithms. To verify the effectiveness of our proposal, the well-known CEC 2014 and CEC 2018 test suites are used for benchmarking. Both tests are carried out with 10-dimensional, 30-dimensional, 50-dimensional and 100-dimensional (10D, 30D, 50D and 100D) functions. Additionally, the top methods in the CEC 2014 and CEC 2018 competitions, including LSHADE-EpSin [17], UMOEAsII [16], L-SHADE [18], HSES [13], LSHADE-RSP [19], ELSHADE-SPACMA [14], and EBOwithCMAR [15], in addition to two promising EDA variants, MLS-EDA [20] and ACSEDA [21], are employed as competitors to E3-EDA.

This study is organized as follows. In Sect. “Related advances in EDA research”, the basic EDA is presented, and related achievements are reviewed. In Sect. “E3-EDA search behavior description”, the search behaviors of E3-EDA are described mathematically. Section “Experimental study using modern IEEE CEC benchmarks” details the experimental research. First, a parametric study is carried out to determine the optimal parameter settings of E3-EDA. Then, the influence of the different search mechanisms on the algorithm's convergence performance is revealed. Finally, the performance of E3-EDA is evaluated using the CEC 2014 and CEC 2018 benchmarks, and nonparametric tests are adopted to compare E3-EDA with the top algorithms. The findings of this study are summarized in Sect. “Conclusions”.

Related advances in EDA research

The basic EDA evolves the search scope by learning the distribution characteristics of the selected superior solutions. The mean value μ and covariance matrix C of the Gaussian distribution model are determined using maximum likelihood estimation (MLE) as follows:

$$ \boldsymbol{\mu} = \frac{1}{|\boldsymbol{S}|}\sum_{i = 1}^{|\boldsymbol{S}|} \boldsymbol{x}_{i}, \quad \boldsymbol{x}_{i} \in \boldsymbol{S} \ \text{and} \ \boldsymbol{S} \subset \boldsymbol{X}, $$
(1)
$$ \boldsymbol{C} = \frac{1}{|\boldsymbol{S}|}\sum_{i = 1}^{|\boldsymbol{S}|} \left( \boldsymbol{x}_{i} - \boldsymbol{\mu} \right)\left( \boldsymbol{x}_{i} - \boldsymbol{\mu} \right)^{\mathrm{T}}, \quad \boldsymbol{x}_{i} \in \boldsymbol{S} \ \text{and} \ \boldsymbol{S} \subset \boldsymbol{X}, $$
(2)

where the symbol S is a set containing the selected superior solutions and |S| is its cardinality. Then, population sampling is carried out using the mean point and covariance matrix as follows:

$$ \boldsymbol{x}_{i} = \boldsymbol{\mu} + \boldsymbol{y}_{i}, \quad \boldsymbol{y}_{i} \sim N\left( 0, \boldsymbol{C} \right). $$
(3)
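For concreteness, the following is a minimal sketch of one basic-EDA generation as described by Eqs. (1)–(3); the NumPy implementation, the truncation ratio and the toy sphere objective are illustrative assumptions rather than the authors' code.

```python
import numpy as np

def eda_generation(X, fitness, tau=0.5, rng=None):
    """One basic-EDA step: select, estimate the Gaussian model, resample.

    X: (NP, D) population; fitness: (NP,) objective values (minimization);
    tau: truncation ratio defining the superior set S (an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    NP, D = X.shape
    S = X[np.argsort(fitness)[: max(2, int(tau * NP))]]    # superior solutions
    mu = S.mean(axis=0)                                     # Eq. (1)
    C = (S - mu).T @ (S - mu) / len(S)                      # Eq. (2)
    return rng.multivariate_normal(mu, C, size=NP)          # Eq. (3)

# Toy usage on a 10-D sphere function.
rng = np.random.default_rng(0)
X = rng.uniform(-5.0, 5.0, size=(180, 10))
for _ in range(100):
    X = eda_generation(X, np.sum(X ** 2, axis=1), rng=rng)
print(np.sum(X ** 2, axis=1).min())
```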

The estimated distribution model is crucial to algorithm performance. When the population distribution becomes abnormal, the basic EDA has no mechanism to maintain population diversity; the main search direction gradually becomes perpendicular to the descent direction of the objective function value, causing the algorithm to fall into local stagnation. This deficiency was first noted in [22], where Cai et al. studied an adaptive variance scaling (AVS) method. In [23, 24], Grahl et al. proposed a correlation-triggered AVS (CT-AVS) mechanism and another trigger technique called SDR. On this basis, a well-known EDA named AMaLGaM was proposed in 2013 [25]. However, this method scales all variances uniformly, so the search scope cannot be readily directed toward the fitness descent directions. Ren et al. equipped the EDA with anisotropic AVS (AAVS-EDA) [11], and the advantages of AAVS-EDA over AMaLGaM were supported by their experimental study. Although AAVS-EDA can anisotropically scale the variances based on fitness landscape detection, one concern is that the trial points are not restricted to the input domain, which can make the detection unreliable. Their other work, EDA2, employs an archive to preserve the superior solutions of successive generations [12]. This mechanism uses the distribution differences among distinct generations to increase the variance along the fitness descent direction and effectively avoid a problematic distribution. Similarly, Yang et al. [21] proposed ACSEDA with an adaptive covariance scaling strategy for covariance estimation.

However, in these archive-based methods, all elements in the archive participate in the distribution estimation, which incurs additional time consumption. The most widely accepted EDA-related variant is the covariance matrix adaptation evolution strategy (CMA-ES) [26]. CMA-ES utilizes the “rank-1” and “rank-μ” updates to estimate the covariance matrix and adopts cumulative step-length adaptation to scale the distribution scope adaptively. Its promising variants, IPOP-CMA-ES [27] and NBiPOP-aCMA-ES [28], were the top methods in the CEC 2005 and CEC 2013 competitions, respectively. However, they incur a greater computational burden than other approaches due to their complex search frameworks.
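To illustrate the idea referred to above, the following schematic shows how the rank-1 and rank-μ terms are combined in the CMA-ES covariance update (step-size adaptation and the evolution-path recursion are omitted); the learning rates c1 and cmu here are placeholders, not the tuned constants of [26].

```python
import numpy as np

def cma_covariance_update(C, p_c, y_selected, weights, c1=0.1, cmu=0.3):
    """C: (D, D) covariance; p_c: (D,) evolution path;
    y_selected: (mu, D) selected mutation steps (x_i - m_old) / sigma;
    weights: (mu,) positive recombination weights summing to one."""
    rank_one = np.outer(p_c, p_c)                              # information from the path
    rank_mu = (y_selected * weights[:, None]).T @ y_selected   # from the selected steps
    return (1.0 - c1 - cmu) * C + c1 * rank_one + cmu * rank_mu
```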

In addition to modifications to the distribution model, many other search operators have been adopted in EDAs to enrich the population diversity. In our previous work [20], the MLS-EDA utilizes a multi-leader search strategy to enhance population diversity and avoid premature convergence. In [29], the EDA was equipped with a simulated annealing technique to enhance local exploration performance. Miquélez et al. [30] employed a Bayesian classifier method to estimate the probability model. Their proposal exhibited competitive performance in solving continuous optimization problems. The effectiveness of a regularized learning method in EDA modification was reported by Karshenas [31]. The copula theory [32] and a probabilistic graphical model [33] were also employed in the EDA for sampling. In RWGEDA, random walk strategies were utilized to strengthen the EDA’s exploration performance [34]. Moreover, the effects of the promising area detection technique and niching method on EDA performance improvement have been addressed by various studies [5, 35].

Combining the EDA with other optimization algorithms to fully leverage their respective advantages is regarded as another effective way to improve search performance. A hybrid of the EDA with PSO was explored by Qi et al. [36] to address real-world optimization problems. Furthermore, several studies have combined DE and the EDA [37, 38], and the statistical results demonstrated that these proposals outperformed other approaches. Moreover, in Sun's work [39], a hybrid technique combining the EDA and CS was implemented to solve a scheduling problem.

The above review indicates that EDA development has progressed along several lines. In the EDA variants discussed above, the information carried by individual solutions is not fully utilized. For example, inferior solutions that are not involved in model construction are simply discarded, and the potential of elite solutions to steer the search toward different dominant regions has not been explored. The remaining EDA studies employ additional search mechanisms or algorithms to compensate for the lack of population diversity in the basic EDA; these developments are accompanied by complex implementation frameworks and additional free parameters, which reduce the algorithm's efficiency and robustness. In contrast to previous studies, our E3-EDA utilizes the APU, MSD and TDS strategies to modify the distribution model rather than employing other search mechanisms or sub-algorithms. The proposed modifications make full use of historical and current population information to enhance the optimization and convergence performance of the basic EDA.

E3-EDA search behavior description

The E3-EDA search framework integrates APU, MSD and TDS with the basic EDA, as illustrated in Fig. 1. In each generation, the NP best-performing solutions in the archive are selected to participate in distribution model estimation. TDS is then activated to shrink the variances of the covariance matrix and strengthen the exploitation capacity, but only if the algorithm is determined to be in stagnation. In MSD, two different search behaviors are employed to modify the mean point and diversify the search scope. After sampling, the APU strategy updates the new population based on the archive to avoid the overuse of a few best solutions. The detailed descriptions of E3-EDA are given below.

Fig. 1
figure 1

Workflow diagram of E3-EDA

APU strategy

In the APU strategy, n consecutive generations of population information are retained in an archive, i.e., \(A^{t} = X^{t} \cup X^{t - 1} \cup \cdots \cup X^{t - n + 1}\). The NP best-performing individuals are selected from the archive \(\boldsymbol{A}^{t}\) as the new population to estimate the Gaussian distribution model as follows:

$$ \left\{ \begin{aligned} \boldsymbol{\mu} &= \sum_{i = 1}^{NP} \omega_{i} \boldsymbol{x}_{i} \\ \boldsymbol{C} &= \frac{1}{NP}\sum_{i = 1}^{NP} \left( \boldsymbol{x}_{i} - \boldsymbol{\mu} \right)\left( \boldsymbol{x}_{i} - \boldsymbol{\mu} \right)^{\mathrm{T}} \end{aligned} \right. , \quad \boldsymbol{x}_{i} \in \boldsymbol{A}_{\left( 1:NP \right)}. $$
(4)

In (4), ωi is the weight coefficient of the weighted maximum likelihood estimation used to calculate the mean value, which is denoted as follows:

$$ \omega_{i} = \frac{\ln \left( NP + 1 \right) - \ln \left( i \right)}{\sum_{k = 1}^{NP} \left( \ln \left( NP + 1 \right) - \ln \left( k \right) \right)}. $$
(5)

If the ith superior solution has a better fitness value than the kth, i.e., f(xi) < f(xk), then their weight coefficients satisfy ωi > ωk. Compared with the truncation selection in CMA-ES, the gaps between the weight coefficients in APU are smaller. The purpose of this setting is to avoid overreliance on the better solutions during the distribution estimation. Moreover, selecting more solutions from different generations can rebuild the search scope and thus correct an overfitted model. The new candidates are then generated from the newly estimated Gaussian distribution model as follows:

$$ \boldsymbol{x}_{i} = \boldsymbol{\mu} + \boldsymbol{y}_{i}, \quad \boldsymbol{y}_{i} \sim N\left( 0, \boldsymbol{C} \right). $$
(6)

The newly generated population is denoted as Xt+1. The archive is then updated as \(A^{t + 1} = X^{t + 1} \cup X^{t} \cup \cdots \cup X^{t - n + 2}\), meaning that the oldest population information is abandoned, and the new population is chosen from At+1 to evolve the algorithm iteratively. In this way, the oldest population is discarded during archive updating even if it contains some excellent individuals. This principle is intended to avoid the premature convergence inherent in the unimodal distribution model, as illustrated in Fig. 2. During the iterations, reusing old high-quality solutions too many times gradually makes the long axis of the distribution model perpendicular to the descent direction of the fitness value, which leads the algorithm into stagnation. This phenomenon can be avoided by properly removing these old high-quality solutions, as presented in Fig. 3. After cutting off several old candidates, the distribution of the new high-quality solutions has good diversity, which increases the sampling range.

Fig. 2
figure 2

Reusing old high-quality solutions too many times decreases the variance along the descent direction and leads to a loss of population diversity

Fig. 3
figure 3

Through the APU strategy, old solutions are abandoned, and the enlarged search scope promotes searching in different promising regions

Moreover, the distribution model for offspring sampling is calculated from the parent population, which is composed of the NP best historical solutions in the updated archive A. These high-quality solutions from different generations may be distributed in multiple dominant regions, as illustrated in Fig. 4. The new high-quality solutions in the population increase the sampling range of the distribution model, which enhances the algorithm's exploration ability and population diversity.

Fig. 4
figure 4

The locations of the high-quality solutions in the archive are diverse
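The APU step can be summarized by the short sketch below, which keeps n consecutive generations in a FIFO archive, selects the NP best members, and fits the weighted Gaussian model of Eqs. (4)–(6); the helper names are assumptions, the archive is re-evaluated here for brevity, and the weights are normalized to sum to one as in Eq. (5).

```python
import numpy as np
from collections import deque

def apu_generation(archive, fitness_fn, NP, rng):
    """archive: deque of (NP, D) arrays with maxlen = n (newest appended last)."""
    A = np.vstack(archive)                           # A^t = X^t U ... U X^{t-n+1}
    best = A[np.argsort(fitness_fn(A))[:NP]]         # NP best members (re-evaluation is a simplification)
    ranks = np.arange(1, NP + 1)
    w = np.log(NP + 1) - np.log(ranks)               # Eq. (5): log-rank weights
    w /= w.sum()
    mu = w @ best                                    # weighted mean, Eq. (4)
    C = (best - mu).T @ (best - mu) / NP             # covariance, Eq. (4)
    X_new = rng.multivariate_normal(mu, C, size=NP)  # sampling, Eq. (6)
    archive.append(X_new)                            # FIFO update: the oldest generation drops out
    return X_new

# Usage: archive = deque([X0], maxlen=3); X = apu_generation(archive, f, NP, rng)
```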

MSD strategy

In our previous work [20], we utilized the individual information within groups for local search. On this basis, the MSD strategy is developed to diversify the distribution scope by using the current best-performing solutions (called leader solutions) or the location differences within the population to modify the mean point. It provides two different search behaviors. The first utilizes multiple leader solutions to direct the mean point shift, as indicated below:

$$ \boldsymbol{\mu}_{\text{MSD}i} = \left( \boldsymbol{\mu} + \boldsymbol{L}_{i} \right)/2, $$
(7)

where Li is a randomly selected member of set L, which contains several top leader solutions. Set L has a variable capacity: initially, only the current best solution is stored in L; once the algorithm stagnates, the capacity expands to store the best two solutions, and it continues to expand upon further stagnation until it reaches its predefined maximum value, after which it no longer changes.

It is difficult for the basic EDA to capture the characteristics of a multimodal problem effectively through a unimodal distribution model, which makes the algorithm susceptible to falling into a local optimum. In this behavior, the leader solutions may be scattered across different dominant regions, so the diversified shifted mean points perform a broader global exploration of the search area; this beneficially increases the sampling range and helps avoid stagnation. Since the population of each generation is updated from archive A, the elements in set L change over time, which effectively maintains the diversity of population sampling.

In the other search behavior of MSD, the mean point of the ith solution is shifted on each dimension using its individual location information, as described below:

$$ \boldsymbol{\mu}_{\text{MSD}i} = \left( \boldsymbol{\mu} + \boldsymbol{x}_{i} \right)/2 + \left( \boldsymbol{\mu} - \boldsymbol{x}_{i} \right) \cdot \boldsymbol{B} \cdot \boldsymbol{r}_{1 \times D} \cdot \boldsymbol{B}^{\mathrm{T}}, \quad r_{j} \sim U\left( 0,1 \right), $$
(8)

where r is a 1 × D vector and D represents the problem dimensionality. B is the eigenvector matrix obtained from the decomposition of covariance matrix C.

$$ \boldsymbol{C} = \left( \boldsymbol{B} \cdot \boldsymbol{D} \right) \cdot \left( \boldsymbol{B} \cdot \boldsymbol{D} \right)^{\mathrm{T}}. $$
(9)

The μMSDi is the result of a disturbance around the estimated mean point. When solving a multimodal problem, it is difficult to determine the promising search region; a currently inferior solution may be located near the global optimum and may become a superior solution later. Thus, we utilize the location information of all individuals to disturb the mean point. The modification along the axes of the probability density ellipsoid is achieved using the eigenvector matrix, which helps shift the mean point along the descent direction of the fitness value. The advantage of using eigen coordinates has been argued in [40, 41].

After using Eq. (7) or (8) to obtain the improved mean point, the new individual is sampled as follows:

$$ \boldsymbol{x}_{i} = \boldsymbol{\mu}_{\text{MSD}i} + \boldsymbol{y}_{i}, \quad \boldsymbol{y}_{i} \sim N\left( 0, \boldsymbol{C} \right). $$
(10)
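A compact sketch of the two MSD behaviors in Eqs. (7)–(10) is given below; the leader set is passed in as an array, and the elementwise random vector r of Eq. (8) is interpreted as a diagonal matrix in the eigen basis, which is an assumption about the intended matrix product.

```python
import numpy as np

def msd_sample(mu, C, x_i, leaders, use_leader, rng):
    """Return one candidate sampled around a shifted mean point.

    mu: (D,) mean; C: (D, D) covariance; x_i: (D,) current individual;
    leaders: (k, D) array holding the set L of top solutions."""
    if use_leader:
        # Behavior 1, Eq. (7): move the mean toward a randomly chosen leader.
        L_i = leaders[rng.integers(len(leaders))]
        mu_msd = (mu + L_i) / 2.0
    else:
        # Behavior 2, Eq. (8): disturb the mean along the eigen coordinates of C.
        _, B = np.linalg.eigh(C)                    # C = B * diag(eigvals) * B^T
        r = rng.uniform(0.0, 1.0, size=mu.shape[0])
        mu_msd = (mu + x_i) / 2.0 + (mu - x_i) @ (B @ np.diag(r) @ B.T)
    y = rng.multivariate_normal(np.zeros_like(mu), C)
    return mu_msd + y                               # Eq. (10)
```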

The substantial difference between the two mean-point behaviors in MSD, as presented in Eqs. (7) and (8), lies in the direction of the mean point modification. In the first behavior, the search center moves toward the determined best-performing solutions. When the unimodal Gaussian model can capture the descent direction of the fitness value, shifting the mean point toward the elite individuals amounts to promoting the distribution model along the descent direction, which benefits the algorithm's convergence performance. In the second behavior, the search center is obtained using individual information. Since the members of the population are selected from the archive A, the solutions derived from different generations are scattered; using various solutions to disturb the mean point diversifies the distribution model, which considerably enhances the ability to explore the solution space. In other words, Eq. (7) is advantageous when the distribution model has captured the dominant evolution direction, while Eq. (8) helps the algorithm escape local stagnation when dealing with multimodal problems. However, it is difficult to determine which behavior is better during algorithm execution. Therefore, we present a mechanism that adaptively adjusts the selection probabilities of the two behaviors in MSD according to the proportion of high-quality offspring generated by Eq. (7) or (8).

Let P1 denote the probability that an individual is sampled using Eqs. (7) and (10); the probability of choosing the other behavior is P2, with P1 + P2 = 1 (initially, P1 = P2 = 0.5). After population renewal, the number of offspring generated by Eqs. (7) and (10) is NP1, the number of these offspring superior to their parents is SNP1, and the ratio of dominant offspring is SR1:

$$ {\text{SR}}_{1} = {\text{SNP}}_{1} /{\text{NP}}_{1} . $$
(11)

In the same way, SR2 is the ratio of the promising offspring generated using Eqs. (8) and (10):

$$ {\text{SR}}_{2} = {\text{SNP}}_{2} /{\text{NP}}_{2} . $$
(12)

If SR1 is greater than SR2, the first search behavior of MSD is more effective than the second. In the next generation, the value of P1 should be increased appropriately, and the selection probabilities of the two behaviors are adjusted as follows:

$$ \left\{ \begin{aligned} P_{1} &= \frac{P_{1} + \left( 1 - P_{1} \right) \cdot \text{SR}_{1} / \left( \text{SR}_{1} + \text{SR}_{2} \right)}{1 + \left( 1 - P_{1} \right) \cdot \text{SR}_{1} / \left( \text{SR}_{1} + \text{SR}_{2} \right)} \\ P_{2} &= 1 - P_{1} \end{aligned} \right. , \quad \text{if} \ \text{SR}_{1} > \text{SR}_{2}. $$
(13)

In the same way, if SR1 is smaller than SR2, sampling using Eqs. (8) and (10) is more effective. In the next generation, the value of P2 should be increased appropriately, and the selection probabilities of the two behaviors are adjusted as follows:

$$ \left\{ \begin{aligned} P_{2} &= \frac{P_{2} + \left( 1 - P_{2} \right) \cdot \text{SR}_{2} / \left( \text{SR}_{1} + \text{SR}_{2} \right)}{1 + \left( 1 - P_{2} \right) \cdot \text{SR}_{2} / \left( \text{SR}_{1} + \text{SR}_{2} \right)} \\ P_{1} &= 1 - P_{2} \end{aligned} \right. , \quad \text{if} \ \text{SR}_{2} > \text{SR}_{1}. $$
(14)

To avoid the extinction of either search behavior, the probabilities of the two behaviors in MSD are bounded as follows:

$$ \left\{ \begin{aligned} P_{1} &= \min \left( 0.95, P_{1} \right), \quad P_{1} = \max \left( 0.05, P_{1} \right) \\ P_{2} &= \min \left( 0.95, P_{2} \right), \quad P_{2} = \max \left( 0.05, P_{2} \right) \end{aligned} \right. $$
(15)
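The adaptation of P1 and P2 described by Eqs. (11)–(15) reduces to the small routine below; the per-generation counters are assumed to be collected elsewhere, and a tie (SR1 = SR2) leaves the probabilities unchanged.

```python
def update_msd_probabilities(P1, SNP1, NP1, SNP2, NP2):
    """Return the adapted (P1, P2) for the next generation."""
    SR1 = SNP1 / NP1 if NP1 > 0 else 0.0          # Eq. (11)
    SR2 = SNP2 / NP2 if NP2 > 0 else 0.0          # Eq. (12)
    P2 = 1.0 - P1
    if SR1 > SR2:                                 # Eq. (13): reward behavior 1
        share = SR1 / (SR1 + SR2)
        P1 = (P1 + (1.0 - P1) * share) / (1.0 + (1.0 - P1) * share)
    elif SR2 > SR1:                               # Eq. (14): reward behavior 2
        share = SR2 / (SR1 + SR2)
        P2 = (P2 + (1.0 - P2) * share) / (1.0 + (1.0 - P2) * share)
        P1 = 1.0 - P2
    P1 = min(0.95, max(0.05, P1))                 # Eq. (15): keep both behaviors alive
    return P1, 1.0 - P1
```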

TDS strategy

Both APU and MSD can enlarge the search scope. They can improve the algorithm’s exploration performance but contribute little to the convergence performance. In E3-EDA, the TDS strategy is adopted to narrow the search area by scaling the variances when the algorithm falls into stagnation:

$$ \left[ \boldsymbol{D}^{2} \right]_{\text{diagonal}} = \left( 1 - \text{FEs}/\text{FEs}_{\max } \right) \cdot \left[ \boldsymbol{D}^{2} \right]_{\text{diagonal}}, $$
(16)

where FEs is the number of function evaluations and FEsmax is its maximum budget. As presented in Eq. (16), the search range decreases as the iterations proceed, which makes the algorithm focus on local exploitation in the late stage. The stagnation criterion is defined as follows: if the average fitness value of the better half of the current population is not less than the corresponding average of the previous generation, the algorithm is considered to be stagnating. When the algorithm stagnates, the covariance matrix is no longer updated; only the search variances are shrunk, until the algorithm no longer meets the stagnation criterion. This setting also reduces the computational burden of the covariance matrix calculation.
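A sketch of the TDS trigger follows; the stagnation test implements the criterion stated above (the mean fitness of the better half does not improve), and the eigenvalue array stands for the diagonal of D² in Eq. (9). The function and variable names are assumptions.

```python
import numpy as np

def tds_step(eig_sq, fitness, prev_half_mean, FEs, FEs_max):
    """eig_sq: diagonal of D^2 from C = (B D)(B D)^T; fitness: current values."""
    half_mean = np.sort(fitness)[: len(fitness) // 2].mean()
    stagnated = half_mean >= prev_half_mean
    if stagnated:
        eig_sq = (1.0 - FEs / FEs_max) * eig_sq    # Eq. (16): shrink the variances
    return eig_sq, stagnated, half_mean            # C itself is frozen while stagnated
```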

Moreover, to ensure E3-EDA’s convergence performance, the best result is updated after each generation. The flowchart of the execution framework of E3-EDA is illustrated in Fig. 5.

Fig. 5
figure 5

Flowchart of E3-EDA

The differences between E3-EDA and MLS-EDA

Generally, E3-EDA is an extended study based on our previous MLS-EDA. In MLS-EDA, we found that utilizing population diversity can effectively improve EDA performance. Inspired by this, we have modified the MLS in E3-EDA; the newly proposed MSD adopts new search strategies and a statistics-based adaptive selection of search behaviors, which differ from the mechanisms adopted in MLS-EDA. Furthermore, using only the current population information still restricts the population diversity. Thus, the APU strategy is introduced to utilize historical distribution information to enhance exploration performance. Through the ensemble of APU, MSD and TDS, a more efficient E3-EDA is achieved.

Time complexity analysis of E3-EDA

The computational time complexity is an important issue for evaluating the efficiency of an evolutionary algorithm. The calculation efficiency of the EDA is affected mainly by the calculation of the covariance matrix. E3-EDA is an EDA-based algorithm, and it differs little from the basic EDA in terms of computational cost. As described above, in each iteration, all members of the population participate in the covariance matrix estimation of Eq. (4). The time complexity of this part is O(D²·NP), where D is the problem dimensionality. Moreover, the decomposition of the covariance matrix using the Jacobi method has a complexity of O(D³). In population updating, sampling is executed in each dimension for all individuals, which incurs a computational cost of O(NP·D). Therefore, the maximum time complexity of E3-EDA is determined by O{D² × max(D, NP)}. Additionally, according to the execution process in Fig. 5, the computational cost of updating the covariance matrix can be avoided when the algorithm stagnates. Under this condition, our E3-EDA achieves a per-iteration time complexity below O{D² × max(D, NP)}.

Experimental study using modern IEEE CEC benchmarks

Using benchmarks to verify performance is an indispensable part of developing a new evolutionary algorithm. Since 2005, the IEEE CEC has released several challenging benchmark suites covering different aspects and has held algorithm competitions every year. The CEC 2014 and CEC 2018 test suites are two modern and challenging benchmark sets for single-objective optimization algorithms. The CEC 2014 test suite consists of 30 functions, which can be categorized into four types: unimodal functions F1–F3, multimodal functions F4–F16, hybrid functions F17–F22 and composite functions F23–F30. Similarly, the CEC 2018 testbed contains 29 benchmarks (with F2 excluded) and is also divided into four groups: unimodal functions F1 and F3, multimodal functions F4–F10, hybrid functions F11–F20 and composite functions F21–F30. These four types of test functions have different characteristics, and the difficulty of obtaining the optimal solution gradually increases. The distinction between CEC 2014 and CEC 2018 is that different basic functions are used, which gives the problems different features. More details about these two testbeds are provided in [42] and [43], respectively.

Because the CEC test functions are treated as black-box problems, each result is recorded as the error between the best value found by the algorithm and the optimal value, i.e., f(x) − f(x*). The optimum is considered reached when the error value is less than 1e−08. The maximum number of function evaluations is related to the problem dimensionality D and is set to D × 10,000. All simulations in this section are implemented on a laptop with an i7-8700HQ processor (2.20 GHz) and 16 GB of memory. MATLAB 2018a is used for coding and running.

In this section, we first use the CEC 2014 test suite with 30D problems to determine the optimal parameter settings of E3-EDA. Then, the contributions of the proposed modifications are discussed. Finally, the performance of E3-EDA is compared with that of the top methods in CEC competitions using the CEC 2014 and CEC 2018 testbeds.

E3-EDA parametric study

The tuning of parameters plays a substantial role in the performance of an evolutionary algorithm. If an algorithm possesses many free parameters, it is difficult to determine their optimal values for different problems, which limits the algorithm's robustness and applicability. In E3-EDA, only three parameters need to be tuned: the population size (NP), the maximum size of archive A (Size|A|max), and the maximum size of set L (Size|L|max). In the basic EDA, a large population size is always desired to maintain search diversity. In E3-EDA, five population sizes are considered, namely, NP = 12·D, 15·D, 18·D, 21·D, and 24·D, where D denotes the problem dimensionality. Archive A retains population information of successive generations; therefore, its optimal size is related to the population size, i.e., Size|A|max = NP, 2·NP, 3·NP, 4·NP, and 5·NP. The set L preserves several top leader solutions used to modify the mean point; its optimal value is determined from six cases: Size|L|max = 1, 0.1·NP, 0.2·NP, 0.3·NP, 0.4·NP, and 0.5·NP. When the size of L equals 1, only the best solution is utilized; when it is set to 0.5·NP, the better half of the population is preserved. We associate the parameter values with the problem size so that E3-EDA can adaptively adjust its parameters when dealing with problems of different dimensions. This reduces the parameter sensitivity of the algorithm and avoids the burden of parameter adjustment.

In this investigation, the CEC 2014 benchmarks are employed to evaluate E3-EDA with 150 (5 × 5 × 6) different parameter settings. The experiments are carried out on the 30D problems, and each problem is run 51 times independently. The maximum number of function evaluations is fixed at 300,000. To save space, we do not provide the optimization results of all 150 configurations; instead, the Friedman test is performed to rank the results and reveal the differences. Since the other two free parameters are related to the population size, we analyze the influence of their settings on the algorithm performance for each NP value. For NP set to 12·D, 15·D, 18·D, 21·D, and 24·D, the influence of the parameters Size|A|max and Size|L|max on the algorithm performance is tabulated in Tables 1, 2, 3, 4 and 5, and the ranking differences are illustrated in Figs. 6, 7, 8, 9 and 10. For all five population sizes, most of the best ranks occur when Size|A|max equals 3·NP and Size|L|max equals 0.1·NP. The best parameter setting occurs at NP = 18·D, Size|A|max = 3·NP, and Size|L|max = 0.1·NP, as shown in Table 3, where the statistical ranking value is lowest at 38.2857.

Table 1 Rankings of algorithms with different Size|A|max and Size|L|max settings when NP = 12·D
Table 2 Rankings of algorithms with different Size|A|max and Size|L|max settings when NP = 15·D
Table 3 Rankings of algorithms with different Size|A|max and Size|L|max settings when NP = 18·D
Table 4 Rankings of algorithms with different Size|A|max and Size|L|max settings when NP = 21·D
Table 5 Rankings of algorithms with different Size|A|max and Size|L|max settings when NP = 24·D
Fig. 6
figure 6

When NP = 12·D, the best ranking value 51.9524 is obtained at Size|A| max = 5· NP and Size|L| max = 0.1· NP

Fig. 7
figure 7

When NP = 15·D, the best ranking value 43.9524 is obtained at Size|A| max = 5· NP and Size|L| max = 0.1· NP

Fig. 8
figure 8

When NP = 18·D, the best ranking value 38.2857 is obtained at Size|A| max = 5· NP and Size|L| max = 0.1· NP

Fig. 9
figure 9

When NP = 21·D, the best ranking value 42.5714 is obtained at Size|A| max = 5· NP and Size|L| max = 0.1· NP

Fig. 10
figure 10

When NP = 24·D, the best ranking value 44.6667 is obtained at Size|A| max = 5· NP and Size|L| max = 0.1· NP

We sum the data values in Tables 1, 2, 3, 4 and 5 by rows or columns. Taking the mean values, we can statistically analyze the independent influence of the parameters Size|A|max and Size|L|max on the algorithm performance under different population sizes, as presented in Figs. 11 and 12, respectively. It can be concluded that the optimal values of Size|A|max and Size|L|max are 3·NP and 0.1·NP, respectively. Moreover, the curve variations in Fig. 11 are greater than those in Fig. 12, which indicates that the value of Size|A|max has a greater influence on the algorithm performance than that of Size|L|max. By summing all the values in Tables 1, 2, 3, 4 and 5, we can compare the effects of different population sizes on the algorithm's performance. The satisfactory settings of NP are 18·D and 21·D, with a minimal difference between their ranking values of 70.619 and 70.4032, respectively. We choose 18·D as the optimal population size because a larger population size increases the computational cost, since the maximum time complexity is O{D² × max(D, NP)}. In summary, the optimal parameter settings of E3-EDA are NP = 18·D, Size|A|max = 3·NP and Size|L|max = 0.1·NP.

Fig. 11
figure 11

Ranking variations of different values of Size|A|max

Fig. 12
figure 12

Ranking variations of different values of Size|L|max

Investigation of the influence of E3-EDA search behavior

In this subsection, we investigate the influence of the three search behaviors on the optimization results and convergence performance of E3-EDA. First, four E3-EDA variants are designed, namely, E3-EDA-1, E3-EDA-2, E3-EDA-3, and E3-EDA-4; compared with the original E3-EDA, the first three algorithms remove the APU, MSD, and TDS mechanisms, respectively. In E3-EDA-4, the two behaviors in MSD are selected randomly (i.e., P1 = P2 = 0.5) instead of using the adaptive probability adjustment mechanism. Moreover, the basic multivariate Gaussian EDA, EMNAg, is included as a competitor to verify the effectiveness of the proposed modifications. The performances of the six algorithms are evaluated using the CEC 2014 testbed with 30D problems. Each benchmark is evaluated in 51 independent runs with the same parameter settings, i.e., NP = 18·D, Size|A|max = 3·NP and Size|L|max = 0.1·NP, and FEsmax is fixed at 300,000. The experimental results are recorded in Table 6, whose last row shows the rank differences among the six algorithms according to the Friedman test. E3-EDA, with all three search mechanisms integrated, obtains the best rank value of 1.9500. E3-EDA-4 is the next-best-scoring algorithm, with a rank value of 2.1833, followed by E3-EDA-2 and E3-EDA-3, with rank values of 2.9833 and 3.7167, respectively. The last two methods are E3-EDA-1 and EMNAg, with the poorest ranks of 4.6333 and 5.5333, respectively. The worse an algorithm's ranking value, the more substantial the influence of the missing component on the algorithm performance. APU is thus the most important component for ensuring the high performance of E3-EDA: it uses historical distribution information to support the estimation of the current distribution model while effectively avoiding the ill-conditioned distribution that results from using only the current solution information. Comparing the optimization results of E3-EDA and E3-EDA-3 shows that the TDS strategy enhances the algorithm's ability to solve complex multimodal problems, as TDS shrinks the search scope to help the algorithm perform local exploitation. In addition, E3-EDA performs better than E3-EDA-2 on the unimodal functions, which indicates that MSD can effectively steer the search toward the dominant region. Furthermore, the gap between E3-EDA and E3-EDA-4 is smaller than the other gaps. The adaptive changes in the selection proportions of the two MSD behaviors on different benchmarks can be found in Appendix A. According to the statistics of the test results, the mean value of P1 over the whole test is 0.4949 and that of P2 is 0.5051. Although both values are close to 0.5, Appendix A shows that the trends of P1 and P2 differ across functions as well as across stages of the same function. The advantage of E3-EDA over E3-EDA-4 verifies the rationality and effectiveness of the adaptive adjustment strategy in Eqs. (11)–(15).

Table 6 Comparison results of the four E3-EDA variants

Moreover, we study the effect of the three search mechanisms on the convergence performance of the algorithm. We select eight test functions with considerable differences in Table 6 to analyze the convergence behavior: F1, F9, F11, F15, F17, F21, F27 and F29. The convergence curves of the four algorithms are plotted in Figs. 13a, 14a, 15a, 16a, 17a, 18a, 19a and 20a. The convergence of the population distribution is analyzed using the variation of the mean Euclidean distance between the population and its current optimal individual, as presented in Figs. 13b, 14b, 15b, 16b, 17b, 18b, 19b and 20b. The mean Euclidean distance is calculated as follows:

$$ \text{Mean}D_{\text{E}} = \frac{1}{\text{NP}}\sum_{i = 1}^{\text{NP}} \left\| \boldsymbol{x}_{i} - \boldsymbol{x}_{\text{best}} \right\|. $$
(17)
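For reference, the diversity measure of Eq. (17) amounts to the following short routine, assuming it is taken as the mean Euclidean norm of the differences to the current best solution; the function name is an assumption.

```python
import numpy as np

def mean_distance_to_best(X, x_best):
    """Mean Euclidean distance between each individual and the current best."""
    return np.linalg.norm(X - x_best, axis=1).mean()
```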
Fig. 13
figure 13

Convergence analysis on F1

Fig. 14
figure 14

Convergence analysis on F9

Fig. 15
figure 15

Convergence analysis on F11

Fig. 16
figure 16

Convergence analysis on F15

Fig. 17
figure 17

Convergence analysis on F17

Fig. 18
figure 18

Convergence analysis on F21

Fig. 19
figure 19

Convergence analysis on F27

Fig. 20
figure 20

Convergence analysis on F29

Figures 13a, 14a, 15a, 16a, 17a, 18a, 19a and 20a show that the convergence speed of the different algorithms roughly matches the shrinking speed of their population distributions. Among the four algorithms, E3-EDA-2, which lacks the MSD strategy, has the fastest convergence speed. This is because the MSD strategy in the other three algorithms expands the search scope, which delays convergence. However, the absence of MSD in E3-EDA-2 leads to incomplete exploration of the solution space, which makes its optimization results on the eight benchmarks inferior to those of E3-EDA. The convergence rate and the population distribution convergence of E3-EDA-1 and E3-EDA-3 are poorer than those of the basic E3-EDA, but for different reasons. Without the APU strategy, E3-EDA-1 cannot capture the problem characteristics effectively, resulting in an abnormal population distribution; without the TDS strategy, the search range of E3-EDA-3 is too large, precluding effective convergence to the dominant region. Generally, with the ensemble of the three strategies, E3-EDA achieves the best performance in terms of both optimization results and convergence. According to the above analysis, the influence of the three strategies on the algorithm performance, from largest to smallest, is APU > TDS > MSD.

Comparison with LSHADE-EpSin, UMOEAsII, L-SHADE, MLS-EDA and ACSEDA using the CEC 2014 test suite

To assess the performance of our E3-EDA in solving numerical optimization problems, the CEC 2014 and CEC 2018 test suites are employed as benchmarks. Both sets of benchmarks are evaluated in 10·D, 30·D, 50·D, and 100·D. The maximum numbers of function evaluations are set to 10 × 10,000, 30 × 10,000, 50 × 10,000 and 100 × 10,000, respectively, and each function is run 51 times independently. The error values are summarized statistically in Appendices B and C, including the best, worst, median, mean, and standard deviation (SD) values. As the complexity and scale of the problems increase, the error values of the solutions become larger, especially for the last two types of benchmarks, the hybrid and composite functions.

To further evaluate the performance of our method, we employ several state-of-the-art algorithms that perform competitively on the CEC 2014 test: LSHADE-EpSin, UMOEAsII, L-SHADE, MLS-EDA and ACSEDA. LSHADE-EpSin and UMOEAsII took the first two places in the CEC 2014-based competition held in 2016. L-SHADE is the basic version of LSHADE-EpSin and won the competition held in 2014. MLS-EDA and ACSEDA are two recently developed, promising EDA variants. The superior performance of these five competitors was verified using the CEC 2014 benchmarks in their original works. The parameter settings of all participants follow their original literature, as tabulated in Table 7.

Table 7 Parameter settings of E3-EDA, LSHADE-EpSin, UMOEAsII, L-SHADE, MLS-EDA and ACSEDA in the CEC 2014 test

The mean and SD values obtained by the six algorithms are recorded in Tables 8, 9, 10 and 11 for the 10·D, 30·D, 50·D and 100·D tests, respectively. For the first three unimodal functions, E3-EDA, MLS-EDA and ACSEDA show their robustness by finding the optimal values in all of the 10·D, 30·D and 50·D tests. For the multimodal functions, E3-EDA, LSHADE-EpSin and UMOEAsII are the top three methods in terms of accuracy; specifically, in 30·D, E3-EDA wins more than half of these 13 problems. The remaining algorithms, L-SHADE, MLS-EDA and ACSEDA, are less competitive, but all three can reach the optimal result on F7 regardless of the dimensionality. Additionally, for the hybrid functions F17–F22, our E3-EDA exhibits its superiority by performing best on more than half of the benchmarks in every group of tests except the 10·D test. The hybrid functions are partially separable problems that are close to real-world optimization problems, so the dominance of E3-EDA on this set of tasks indicates its potential for real-world applications. For the last eight composite functions, LSHADE-EpSin and UMOEAsII are the two most competitive algorithms in all tests; MLS-EDA outperforms the other methods in the 30·D test, and our E3-EDA performs similarly to L-SHADE and surpasses ACSEDA. Overall, according to the mean error data, our E3-EDA wins 13 out of 30 benchmarks in the 10·D test, 17 in the 30·D test, 14 in the 50·D test and 6 in the 100·D test.

Table 8 Experimental results derived from six algorithms in the CEC 2014 10·D test
Table 9 Experimental results derived from six algorithms in the CEC 2014 30·D test
Table 10 Experimental results derived from six algorithms in the CEC 2014 50·D test
Table 11 Experimental results derived from six algorithms in the CEC 2014 100·D test

According to the mean error results, the performance of the six competitors on different types of test functions is statistically compared in Table 12. On this basis, E3-EDA has great advantages over other methods in solving hybrid problems and performs competitively in solving unimodal and multimodal functions. ACSEDA, UMOEAsII and LSHADE-EpSin perform best in solving unimodal, multimodal, and composite tests, respectively.

Table 12 Performance ranking of six algorithms for different types of test functions in different dimension CEC 2014 tests

To demonstrate the superiority of our algorithm more comprehensively, nonparametric tests are performed. Table 13 presents the pairwise comparison results of the Wilcoxon signed-rank test for the CEC 2014 10·D, 30·D, 50·D and 100·D tests. The symbols “+/−/≈” indicate that our E3-EDA has superior, inferior, or similar performance compared with the other methods based on the mean error values in Tables 8, 9, 10 and 11. In the following two columns, “R+” denotes the magnitude by which E3-EDA surpasses the competitor, and “R−” expresses the opposite. The rightmost column records the p value derived from the Wilcoxon signed-rank test. If the p value is greater than the significance level (α = 0.05), the comparison of the two algorithms is regarded as statistically insignificant; otherwise, E3-EDA is concluded to be better or worse based on the “R+” magnitude. As seen in Table 13, E3-EDA has notable superiority over MLS-EDA and ACSEDA in the 10·D test, outmatches L-SHADE and ACSEDA in the 30·D test, and surpasses L-SHADE and MLS-EDA in the 100·D test.

Table 13 Pairwise comparison using the Wilcoxon signed-rank test for the CEC 2014 test (α = 0.05)
Table 14 Multi-comparison using the Friedman test for the CEC 2014 test (α = 0.05)

Moreover, the Friedman test is performed as a multiple comparison to rank the six algorithms according to their comprehensive performance. The ranking results for the four tests are presented in Table 14. E3-EDA ranks best among the six algorithms, with values of 3.0167 in the 30·D test, 3.1167 in the 50·D test and 2.8500 in the 100·D test, and ranks second with a score of 3.1000 in the 10·D test. The outperformance of E3-EDA is supported by the nonparametric studies on the CEC 2014 test, which demonstrates that the three developed search components are conducive to improving the algorithm performance.

In addition to the evidence provided by these optimization results, the execution time is another crucial criterion for evaluating an algorithm's effectiveness. The average time consumption of all six participants on each benchmark of the CEC 2014 tests, measured on the same platform, is recorded in Appendix D. On this basis, Table 15 compares the average time consumption of the six algorithms in the different dimensional tests. According to these results, ACSEDA and L-SHADE are the two most efficient algorithms. Our E3-EDA has computational efficiency similar to that of MLS-EDA and is superior to UMOEAsII and LSHADE-EpSin. Although E3-EDA is not the most efficient algorithm, the recorded results show that its gap with ACSEDA is small on the low-dimensional problems.

Table 15 Comparison of the mean time costs on the CEC 2014 test

Comparison with HSES, LSHADE-RSP, ELSHADE-SPACMA, EBOwithCMAR and ACSEDA using the CEC 2018 test suite

To further verify the effectiveness of our proposal, another test using the CEC 2018 testbed is carried out. Five competitors are employed for comparison: HSES, LSHADE-RSP, ELSHADE-SPACMA, EBOwithCMAR and ACSEDA. HSES, LSHADE-RSP, and ELSHADE-SPACMA were the top three methods in the CEC 2018 competition, EBOwithCMAR is the champion of the CEC 2017 competition, and ACSEDA represents a novel EDA variant. The parameters of each algorithm are assigned according to the original works, as shown in Table 16. LSHADE-RSP is coded in C, and its experimental data are taken from its original work. For the other five algorithms, all 29 benchmarks in CEC 2018 are evaluated 51 times with a fixed evaluation budget of D × 10,000. The statistical data, including the mean error and SD values, are presented in Tables 17, 18, 19 and 20.

Table 16 Parameter settings of E3-EDA, HSES, LSHADE-RSP, ELSHADE-SPACMA, EBOwithCMAR, and ACSEDA in the CEC 2018 test
Table 17 Experimental results derived from six participants in the CEC 2018 10·D test
Table 18 Experimental results derived from six participants in the CEC 2018 30·D test
Table 19 Experimental results derived from six participants in the CEC 2018 50·D test
Table 20 Experimental results derived from six participants in the CEC 2018 100·D test

The two unimodal functions are easily addressed because all participants reach acceptable error values in all tests. For the following seven multimodal functions, E3-EDA, EBOwithCMAR and ACSEDA are the top three algorithms on the 10·D problems. Moreover, E3-EDA wins all of these benchmarks except F4 in the 30·D test and ranks first on F7, F9 and F10 in the 50·D test. HSES obtains the best results on F4 in all four groups of tests, and all six algorithms perform robustly on F6 and F9. For the hybrid functions in the 10·D test, EBOwithCMAR obtains better results than the other algorithms, while E3-EDA performs best on F14. In the 30·D test, ACSEDA obtains the best result only on F16; the best values of the remaining nine hybrid functions are achieved by our E3-EDA, and its advantage on these functions is considerable. In the 50·D test, E3-EDA is the best-performing algorithm, with the best values on F11, F12, F13, F16, F18 and F19; HSES ranks at the top on F14 and F15, while ACSEDA ranks first on F17 and F20. In the 100·D test, E3-EDA is also the best problem solver, winning on F12, F13, F15, F16 and F19. For the last group of 10 composite functions, E3-EDA wins on more than half of the functions in every test except the 10·D test.

In general, EBOwithCMAR is the top algorithm in the 10·D test, while our E3-EDA has prominent advantages over the other five algorithms in the remaining three tests. Specifically, E3-EDA obtains the best results on almost all 29 benchmarks in the 30·D test, the exceptions being F4, F16, F21, F28, F29 and F30. In the 50·D test, E3-EDA ranks first on 17 out of 29 benchmarks, and it obtains 13 results better than those of the other algorithms in the 100·D test.

To compare the performance of the different algorithms in solving different types of problems more clearly, Table 21 summarizes the ranking results of the six algorithms. E3-EDA shows the best performance in solving the multimodal, hybrid and composite problems, while HSES performs robustly in obtaining the best results on the unimodal functions.

Table 21 Performance ranking of six algorithms for different types of test functions in different dimension CEC 2018 tests

Similarly, the Wilcoxon signed-rank test is performed for a pairwise comparison between E3-EDA and each competitor. The symbols in Table 22 have the same meanings as explained in the previous section. According to the p values in the right column of Table 22, in the 10·D test, E3-EDA performs similarly to LSHADE-RSP, ELSHADE-SPACMA, and EBOwithCMAR but substantially dominates HSES and ACSEDA. In the remaining three tests, E3-EDA surpasses every competitor, except HSES and ACSEDA in the 100·D test.

Table 22 Pairwise comparison using the Wilcoxon signed-rank test for the CEC 2018 tests

Moreover, the Friedman test is carried out to distinguish the differences among the six algorithms. Table 23 provides the ranking of each participant in the four CEC 2018 tests. E3-EDA ranks at the top with the smallest ranking values of 1.9483 in the 30·D test, 2.0690 in the 50·D test and 2.500 in the 100·D test. In the 10·D test, LSHADE-RSP performs best, and E3-EDA ranks third, following EBOwithCMAR. From an overall view of the ranking results across the different dimensional tests, E3-EDA is more competitive than the other algorithms.

Table 23 Multi-comparison using the Friedman test for the CEC 2018 tests (α = 0.05)

The execution times of all competitors except LSHADE-RSP are compared in Appendix E, which shows the average time required by each of the five remaining algorithms to optimize each function. Table 24 compares the average amounts of time required by these five algorithms to solve the benchmarks of different dimensionalities. ACSEDA is the most efficient algorithm; E3-EDA is similar to ELSHADE-SPACMA in terms of time consumption and surpasses the remaining two algorithms.

Table 24 Comparison of the computational efficiency for the CEC 2018 tests

Through the above experimental studies, the competitive performance of our E3-EDA is fully demonstrated through comparisons with the top methods from the CEC competitions. In addition to its superiority in optimization accuracy, E3-EDA possesses two further advantages. First, E3-EDA has fewer free parameters than the other state-of-the-art methods, which enhances its robustness in dealing with different problems; moreover, the optimal parameter values determined on the CEC 2014 30·D test also support the promising performance of E3-EDA on the CEC 2018 problems, which demonstrates the insensitivity of the algorithm's performance to its parameter settings. Second, E3-EDA not only achieves the best performance but also shows satisfactory computational efficiency. High efficiency and accuracy are two important indexes for assessing an evolutionary algorithm in practical engineering applications. Thus, our proposal has great potential for addressing real-world optimization problems.

Conclusions

As a newly developed EDA variant, E3-EDA ensembles the APU, MSD and TDS search behaviors. APU utilizes historical population information for distribution estimation to avoid ill-conditioning, MSD utilizes the location differences among individuals to extend the search scope toward dominant regions, and TDS shrinks the search scope to help the algorithm focus on local exploitation. Through experimental studies, the optimal parameter settings of E3-EDA are established, and the influence of the proposed modifications on the optimization and convergence performance of the algorithm is discussed.

The E3-EDA performance is evaluated using the CEC 2014 and CEC 2018 test suites; the 10·D, 30·D, 50·D and 100·D tests are carried out on both sets of problems, and comparisons with the top methods in the CEC competitions and other promising EDA variants are performed. According to the nonparametric test results, E3-EDA, with APU, MSD and TDS integrated, is more competitive than the advanced algorithms on the two groups of CEC tests in terms of accuracy and efficiency, and it notably outperforms MLS-EDA, which confirms the effectiveness of extending our previous work to improve EDA performance.

Future work can address three aspects. First, the search framework of E3-EDA needs improvement, especially the MSD strategy; the random values used in the mean point modification should be determined in a more principled way. Second, E3-EDA can be further developed with ensembles of other techniques, such as fitness landscape detection and adaptive population size adjustment, and it would also be highly interesting to explore a hybrid of E3-EDA with modern DE algorithms such as L-SHADE. Finally, it is important to apply E3-EDA to challenging real-world optimization problems.