Introduction

In general, the global numerical optimization problem can be expressed as follows (without loss of generality, a minimization problem is considered here):

$$\begin{aligned}&\min f(\vec {x}), \quad \vec {x} = [{x_1},{x_2},\ldots ,{x_D}] \in {\mathbb {R}}^{D},\nonumber \\&\quad x_j \in [x_j^{\mathrm{L}},x_j^{\mathrm{U}}], \quad \forall j = 1,2,\ldots ,D \end{aligned}$$
(1)

where f is the objective function, \(\vec {x} \in {\mathbb {R}}^{D}\) is the decision vector consisting of D variables, D is the problem dimension, i.e., the number of variables to be optimized, and \(x_j^\mathrm{L}\) and \(x_j^\mathrm{U}\) are the lower and upper bounds of each decision variable, respectively. The optimization of large-scale problems of this kind (e.g., D = 1000) is considered a challenging task, since the solution space of a problem often increases exponentially with the problem dimension and the characteristics of a problem may change with the scale [1]. Generally speaking, real-world large-scale global optimization (LSGO) problems arise in many engineering, manufacturing, and economic applications, such as bio-computing, data or web mining, scheduling, and vehicle routing. To draw more attention to this optimization challenge, the first competition on LSGO was held at CEC 2008 [2]. Consequently, in recent years, LSGO has gained considerable attention and has attracted much interest from Operations Research and Computer Science professionals, researchers, and practitioners, as well as mathematicians and engineers. The challenges mentioned above have therefore motivated researchers to design and improve many kinds of efficient, effective, and robust metaheuristic algorithms that can solve LSGO problems with high-quality solutions and high convergence performance at low computational cost.

Evolutionary algorithms (EAs) have been proposed to meet these global optimization challenges. The structure of EAs is inspired by the mechanisms of natural evolution. Due to their adaptability and robustness, EAs are especially capable of solving difficult optimization problems, such as highly nonlinear, non-convex, non-differentiable, and multi-modal optimization problems. In general, the process of EAs is based on the exploration and exploitation of the search space through selection and reproduction operators [3]. Like other EAs, differential evolution (DE) is a stochastic population-based search method, proposed by Storn and Price [4]. Its advantages are its simplicity of implementation, ease of use, speed, and robustness. Due to these advantages, it has been successfully applied to many real-world applications, such as admission capacity planning in higher education [5, 6], dynamic modeling of financial markets [7], solar energy [8], and many others. In addition, many recent studies show that the performance of DE is highly competitive with, and in many cases superior to, other EAs in solving unconstrained, constrained, multi-objective, and other complex optimization problems [9]. However, like all other evolutionary search techniques, DE has weaknesses. In general, DE has a good global exploration ability that can reach the region of the global optimum, but it is slow at exploiting solutions [10]. In addition, the parameters of DE are problem-dependent and difficult to adjust for different problems. Moreover, DE performance decreases as the search space dimensionality increases [11]. Finally, the performance of DE deteriorates significantly when premature convergence and/or stagnation occur [11, 12]. The performance of DE basically depends on the mutation strategy and the crossover operator.
Besides, the intrinsic control parameters (population size NP, scaling factor F, and crossover rate CR) play a vital role in balancing the diversity of the population and the convergence speed of the algorithm. In the original DE, these parameters are user-defined and kept fixed during the run. However, many recent studies indicate that the performance of DE is highly affected by the parameter settings, and the choice of the optimal parameter values is always problem-dependent. Moreover, prior to an actual optimization process, the traditional time-consuming trial-and-error method must be used for fine-tuning the control parameters for each problem. Alternatively, to achieve acceptable results, even for the same problem, different parameter settings along with different mutation schemes may be needed at different stages of evolution. Therefore, some techniques have been designed to adjust the control parameters in an adaptive or self-adaptive manner instead of by trial-and-error, and new mutation rules have been developed to improve the search capability of DE [13,14,15,16,17,18,19,20,21,22]. Based on the above considerations, in this paper, we present a novel DE, referred to as EADE, that includes two novel modifications: a novel mutation rule and a self-adaptive scheme for the gradual change of CR values. In EADE, the novel mutation rule balances the global exploration ability and the local exploitation tendency and enhances the convergence rate of the algorithm. Furthermore, a novel adaptation scheme for CR is developed that can benefit from past experience through the generations of evolution. Scaling factors are produced according to a uniform distribution to balance global exploration and local exploitation during the evolution process. EADE has been tested on 20 benchmark test functions developed for the 2010 IEEE Congress on Evolutionary Computation (IEEE CEC 2010) [1]. Furthermore, EADE has also been tested on 7 benchmark test functions developed for the 2008 IEEE Congress on Evolutionary Computation (IEEE CEC 2008) [2]. The experimental results indicate that the proposed algorithm and its version are highly competitive for solving large-scale global optimization problems. The remainder of this paper is organized as follows. The next section briefly introduces DE and its operators, after which the related work is reviewed. In the subsequent section, the EADE algorithm is presented. The experimental results are given before the concluding section. Finally, the conclusions and future works are presented.

Differential evolution (DE)

This section provides a brief summary of the basic Differential Evolution (DE) algorithm. In simple DE, generally known as DE/rand/1/bin [23, 24], an initial population consisting of NP vectors \(\vec {X}_i ,\forall i=1,2,\ldots ,NP\), is randomly generated according to a uniform distribution within the lower and upper boundaries (\(x_j^\mathrm{L} ,x_j^\mathrm{U})\). After initialization, these individuals are evolved by DE operators (mutation and crossover) to generate a trial vector. A comparison between the parent and its trial vector is then made to select the vector which should survive to the next generation [9]. The DE steps are discussed below:

Initialization

To establish a starting point for the optimization process, an initial population P\(^{0}\) must be created. Typically, each jth component \((j=1,2,\ldots ,D)\) of the ith individual \((i=1,2,\ldots ,NP)\) in P\(^{0}\) is obtained as follows:

$$\begin{aligned} x_{j,i}^0 =x_{j,L} +\mathrm{rand}(0,1)\cdot (x_{j,U} -x_{j,L} ) \end{aligned}$$
(2)

where rand (0,1) returns a uniformly distributed random number in [0, 1].
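To make this step concrete, the following is a minimal NumPy sketch of the initialization in Eq. (2); the function name, array shapes, and use of a NumPy random generator are our own illustrative choices:

```python
import numpy as np

def initialize(NP, D, lower, upper, rng):
    """Create an initial population P^0 of NP vectors sampled uniformly
    within [x_{j,L}, x_{j,U}] for each dimension j (Eq. 2)."""
    lower = np.asarray(lower, dtype=float)   # per-dimension lower bounds x_{j,L}
    upper = np.asarray(upper, dtype=float)   # per-dimension upper bounds x_{j,U}
    return lower + rng.random((NP, D)) * (upper - lower)
```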

Mutation

At generation G, for each target vector \(x_i^G \), a mutant vector \(v_i^G \) is generated according to the following:

$$\begin{aligned} v_i^G =x_{r_1 }^G +F\cdot (x_{r_2 }^G -x_{r_3 }^G ), \quad r_1 \ne r_2 \ne r_3 \ne i \end{aligned}$$
(3)

with randomly chosen indices \(r_1 ,r_2 ,r_3 \in \{1,2,\ldots ,NP\}\). F is a real number that controls the amplification of the difference vector \((x_{r_2 }^G -x_{r_3 }^G )\). According to Storn and Price [4], the range of F is [0, 2]. In this work, if a component of a mutant vector violates the search space, then the value of this component is generated anew using (2). The other most frequently used mutation strategies are

$$\begin{aligned}&``\mathrm{DE}/\mathrm{best}/1''[17]: v_i^G =x_\mathrm{best}^G +F\cdot (x_{r_1 }^G -x_{r_2 }^G ) \end{aligned}$$
(4)
$$\begin{aligned}&``\mathrm{DE}/\mathrm{best}/2''[17]: v_i^G =x_\mathrm{best}^G +F\cdot (x_{r_1 }^G -x_{r_2 }^G)\nonumber \\&\quad +\,F\cdot (x_{r_3 }^G -x_{r_4 }^G ) \end{aligned}$$
(5)
$$\begin{aligned}&``\mathrm{DE}/\mathrm{rand}/2''[17]: v_i^G =x_{r_1 }^G +F\cdot (x_{r_2 }^G -x_{r_3 }^G )\nonumber \\&\quad +F\cdot (x_{r_4 }^G -x_{r_5 }^G ) \end{aligned}$$
(6)
$$\begin{aligned}&``\mathrm{DE}/\text {current-to-best}/1''[17]: v_i^G =x_i^G\nonumber \\&\quad +\,F\cdot (x_\mathrm{best}^G -x_i^G ) +\,F\cdot (x_{r_1 }^G -x_{r_2 }^G ) \end{aligned}$$
(7)
$$\begin{aligned}&``\mathrm{DE}/\text {current-to-rand}/1''[17]: v_i^G =x_i^G \nonumber \\&\quad +\,F\cdot (x_{r_1 }^G -x_i^G )+F\cdot (x_{r_2 }^G -x_{r_3 }^G ). \end{aligned}$$
(8)

The indices \(r_1 , r_2 , r_3 , r_4 , r_{5}\) are mutually different integers randomly generated within the range [1, NP], which are also different from the index i. These indices are randomly generated once for each mutant vector. The scale factor F is a positive control parameter for scaling the difference vector. \(x_\mathrm{best}^G \) is the individual vector with the best fitness value in the population at generation G.
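For illustration, a hedged NumPy sketch of two of these strategies follows (the index sampling is simplified and the helper names are ours):

```python
import numpy as np

def mutate_rand_1(pop, i, F, rng):
    """DE/rand/1 (Eq. 3): v = x_r1 + F * (x_r2 - x_r3), r1 != r2 != r3 != i."""
    r1, r2, r3 = rng.choice([r for r in range(len(pop)) if r != i],
                            size=3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def mutate_best_1(pop, fitness, i, F, rng):
    """DE/best/1 (Eq. 4): v = x_best + F * (x_r1 - x_r2)."""
    best = pop[np.argmin(fitness)]           # minimization: lowest f is best
    r1, r2 = rng.choice([r for r in range(len(pop)) if r != i],
                        size=2, replace=False)
    return best + F * (pop[r1] - pop[r2])
```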

Crossover

There are two main crossover types: binomial and exponential. Here, we elaborate on the binomial crossover. In the binomial crossover, the target vector is mixed with the mutant vector, using the following scheme, to yield the trial vector \(u_i^G \):

$$\begin{aligned} u_{j,i}^G =\left\{ \begin{array}{ll} v_{j,i}^G , &{} \quad \hbox {if } (\hbox {rand}_{j,i} \le \hbox {CR} \hbox { or } j=j_{\mathrm{rand}} ) \\ x_{j,i}^G , &{} \quad \hbox {otherwise} \end{array} \right. \end{aligned}$$
(9)

where \(\mathrm{rand}_{j,i} \, (i\in [1,NP]\) and \(j\in [1,D])\) is a uniformly distributed random number in [0,1], \(\hbox {CR}\in [0,1]\) is called the crossover rate and controls how many components are inherited from the mutant vector, and \(j_\mathrm{rand} \) is a uniformly distributed random integer in [1, D] that makes sure at least one component of the trial vector is inherited from the mutant vector.
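A minimal sketch of the binomial crossover in Eq. (9), under the same assumptions as the sketches above:

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng):
    """Build the trial vector u from target x and mutant v (Eq. 9)."""
    D = target.size
    mask = rng.random(D) <= CR        # rand_{j,i} <= CR, per component
    mask[rng.integers(D)] = True      # j_rand: inherit at least one mutant component
    return np.where(mask, mutant, target)
```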

Selection

DE adopts a greedy selection strategy. If and only if the trial vector \(u_i^G \) yields a fitness value as good as or better than that of \(x_i^G \), then \(u_i^G \) is set to \(x_i^{G+1} \). Otherwise, the old vector \(x_i^G \) is retained. The selection scheme is as follows (for a minimization problem):

$$\begin{aligned} x_i^{G+1} =\left\{ \begin{array}{ll} u_i^G , &{} \quad \hbox {if } f(u_i^G ) \le f(x_i^G ) \\ x_i^G , &{} \quad \hbox {otherwise} \end{array} \right. \end{aligned}$$
(10)

A detailed description of standard DE algorithm is given in Fig. 1.

Fig. 1

Description of standard DE algorithm. rand [0,1) is a function that returns a real number between 0 and 1. randint (min, max) is a function that returns an integer number between min and max. NP, \(G_\mathrm{max}\), CR, and F are user-defined parameters. D is the dimensionality of the problem
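For completeness, a compact, self-contained sketch of the standard DE/rand/1/bin loop described in Fig. 1; the parameter defaults here are illustrative, not the paper's settings:

```python
import numpy as np

def de_rand_1_bin(f, lower, upper, NP=50, F=0.5, CR=0.9, G_max=1000, seed=0):
    """Minimal DE/rand/1/bin minimizing f over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size
    pop = lower + rng.random((NP, D)) * (upper - lower)        # Eq. (2)
    fit = np.array([f(x) for x in pop])
    for _ in range(G_max):
        for i in range(NP):
            r1, r2, r3 = rng.choice([r for r in range(NP) if r != i],
                                    size=3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])              # Eq. (3)
            bad = (v < lower) | (v > upper)                    # repair via Eq. (2)
            v[bad] = lower[bad] + rng.random(int(bad.sum())) * (upper - lower)[bad]
            mask = rng.random(D) <= CR                         # Eq. (9)
            mask[rng.integers(D)] = True                       # ensure j_rand
            u = np.where(mask, v, pop[i])
            fu = f(u)
            if fu <= fit[i]:                                   # greedy selection, Eq. (10)
                pop[i], fit[i] = u, fu
    best = int(np.argmin(fit))
    return pop[best], fit[best]

# Example: minimize the 10-D sphere function.
# x_best, f_best = de_rand_1_bin(lambda x: np.sum(x**2), [-5.0]*10, [5.0]*10)
```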

Related work

As previously mentioned, during the past few years, LSGO has attracted much attention from researchers due to its significance, as many real-world problems and applications are high-dimensional in nature. Basically, current EA-based LSGO research can be classified into two categories:

  • Cooperative Co-evolution (CC) framework algorithms, i.e., divide-and-conquer methods.

  • Non-Cooperative Co-evolution (non-CC) framework algorithms, i.e., methods without divide-and-conquer.

Cooperative Co-evolution (CC) has become a popular and effective technique in Evolutionary Algorithms (EAs) for large-scale global optimization since its introduction in the publication of Potter and De Jong [26]. The main idea of CC is to partition the LSGO problem into a number of sub-problems, i.e., the decision variables of the problem are divided into smaller subcomponents, each of which is optimized using a separate EA. Using this divide-and-conquer method, classical EAs are able to effectively solve many separable problems [26]. CC shows good performance on separable problems, but deteriorates on non-separable problems, because the interacting variables cannot be grouped into one subcomponent. Recently, different versions of CC-based EAs have been developed and have shown excellent performance. Yang et al. [27] proposed a new decomposition strategy called random grouping as a simple way of increasing the probability of grouping interacting variables in one subcomponent. This strategy, without any prior knowledge of the non-separability of a problem, subdivides an n-dimensional decision vector into m s-dimensional subcomponents. Later, Omidvar et al. [28] proposed the DECC-DML algorithm, a differential evolution algorithm adopting the CC framework. They suggested a new decomposition strategy called delta grouping. The central idea of this technique is that the improvement interval of interacting variables would be limited if they were placed in different subcomponents. The delta method measures the averaged difference in a certain variable across the entire population and uses it to identify interacting variables. The experimental results show that this method is more effective than the existing random grouping method. However, DECC-DML is less efficient on non-separable functions with more than one group of rotated variables. Many other CC-based algorithms have been developed during the past decade, such as FEPCC [29], DECC-I, DECC-II [30], MLCC [31], SEE [32], and AM-CCPSO [33].

On the other hand, there are many approaches that optimize LSGO problems as a whole, that is, without divide-and-conquer. This is considered a challenging task, as it requires novel evolutionary operators that can promote and strengthen the capability of the algorithms in high-dimensional search spaces. Takahama and Sakai [34] proposed DE with landscape modality detection and a diversity archive (LMDEa). In this method, the landscape modality is observed at every fixed number of generations and, based on the modality detection, F is controlled dynamically. LMDEa showed excellent performance on large-scale optimization problems. Brest et al. [35] presented a self-adaptive Differential Evolution algorithm (jDElsgo). In this approach, self-adaptive F and CR control parameters and the "rand/1/bin" strategy are used, along with a population size reduction mechanism. Similarly, Wang et al. [36] introduced a sequential Differential Evolution algorithm enhanced by neighborhood search (SDENS), where the hybrid crossover strategies "rand/1/bin" and "rand/1/exp" are used. To search the neighborhood of each individual, two trial individuals are created by local and global neighborhood search strategies. Then, the fittest among the current individual and the two created trial individuals is selected as the new current individual. Molina et al. [37] put forward a memetic algorithm based on local search chains, named MA-SW-Chains, which assigns local search intensity to each individual depending on its features by chaining different local search applications. Kabán et al. [38] analyzed some fundamental roots of the problem and made a start at developing a new and generic framework to yield effective and efficient estimation of distribution algorithm (EDA)-type algorithms for large-scale continuous global optimization problems. Cheng and Jin [39] introduced a novel competitive swarm optimizer (CSO). The algorithm is fundamentally inspired by particle swarm optimization but is conceptually very different. In CSO, neither the personal best position of each particle nor the global best position (or neighborhood best positions) is involved in updating the particles. Instead, a pairwise competition mechanism is introduced, where the particle that loses the competition updates its position by learning from the winner. Recently, further non-CC-based EAs have been developed and have shown excellent performance, such as SL-PSO [40], GODE [41], and EDA-MCC [42]. In general, the proposed EADE algorithm belongs to this category.

EADE algorithm

In this section, we outline a novel DE algorithm, EADE, and explain the steps of the algorithm in detail.

Novel mutation scheme

DE/rand/1 is the fundamental mutation strategy developed by Storn and Price [23, 25], and it is reported to be the most successful and widely used scheme in the literature [9]. In this strategy, three vectors are chosen from the population at random for mutation, and the base vector is selected at random among the three. The other two vectors form the difference vector that is added to the base vector. Consequently, it is able to maintain population diversity and global search capability with no bias toward any specific search direction, but it slows down the convergence speed of DE algorithms [15]. The DE/rand/2 strategy is like the former scheme, with two extra vectors that form another difference vector, which might lead to better perturbation than one-difference-vector-based strategies [15]. Furthermore, it can provide more varied differential trial vectors than the DE/rand/1/bin strategy, which increases its exploration ability of the search space. On the other hand, greedy strategies like DE/best/1, DE/best/2, and DE/current-to-best/1 incorporate the information of the best solution found so far in the evolutionary process to increase the local search tendency, which leads to a fast convergence speed. However, the diversity of the population and the exploration capability of the algorithm may deteriorate or be completely lost within a very small number of generations, i.e., at the beginning of the optimization process, causing problems such as stagnation and/or premature convergence. Consequently, to overcome the shortcomings of both types of mutation strategies, most recent successful algorithms utilize a strategy candidate pool that combines different trial vector generation strategies with diverse characteristics and distinct optimization capabilities, together with different control parameter settings, to be able to deal with a variety of problems with different features at different stages of evolution [15, 17, 41]. In contrast, taking into consideration the weaknesses of existing greedy strategies, the authors of [16] introduced a new differential evolution (DE) algorithm, named JADE, to improve optimization performance by implementing a new mutation strategy, "DE/current-to-pbest", with an optional external archive and by updating the control parameters in an adaptive manner. Consequently, proposing new mutation strategies that can considerably improve the search capability of DE algorithms and increase the possibility of achieving promising results on complex and large-scale optimization problems is still an open challenge for evolutionary computation research. Therefore, this research uses a new mutation rule with a view to balancing the global exploration ability and the local exploitation tendency and enhancing the convergence rate of the algorithm. The proposed mutation strategy uses two randomly chosen vectors from the top and bottom 100p% individuals of the current population of size NP, while the third vector is selected randomly from the middle (NP − 2(100p%)) individuals. The proposed mutant vector is generated in the following manner:

$$\begin{aligned} v_i^{G+1} =x_r^G +F1\cdot (x_{p\_\mathrm{best}}^G -x_r^G )+F2\cdot (x_r^G -x_{p\_\mathrm{worst}}^G ) \end{aligned}$$
(11)

where \(x_r^G \) is a randomly chosen vector from the middle (NP − 2(100p%)) individuals, \(x_{p\_\mathrm{best}}^G \) and \(x_{p\_\mathrm{worst}}^G\) are randomly chosen from the top and bottom 100p% individuals in the current population, respectively, with \(p\in ( {0,1} ]\), and F1 and F2 are the mutation factors that are independently generated according to a uniform distribution in (0,1). The main idea of the proposed novel mutation is that each vector learns from the positions of the top best and the bottom worst individuals of the entire population in a particular generation. From mutation Eq. (11), it can be observed that the incorporation of the objective function value into the mutation scheme has two benefits. First, the target vectors are not always attracted toward the same best position found so far by the entire population; thus, premature convergence at local optima can be largely avoided by following the directions of different top-ranked vectors, which preserves the exploration capability. Second, moving away from the direction of the bottom worst vectors enhances the exploitation tendency by guiding the search process toward the promising regions of the search space, i.e., it concentrates the exploitation on some sub-regions of the search space. Therefore, the directed perturbations in the proposed mutation resemble the concept of a gradient, as the difference vectors are directed from the worst vectors toward the best vectors [43]. They thus help considerably to explore the landscape of the objective function being optimized in different sub-regions around the best vectors within the search space during the optimization process. By utilizing and sharing the best and worst information of the DE population, the proposed directed mutation balances the global exploration capability and the local exploitation tendency. The new mutation strategy is embedded into the DE algorithm and combined with the basic mutation strategy DE/rand/1/bin, where only one of the two mutation rules is applied with a probability of 0.5.
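A hedged NumPy sketch of the proposed mutation of Eq. (11) follows; the sorting convention and helper names are our own, and minimization is assumed:

```python
import numpy as np

def eade_mutation(pop, fitness, p, rng):
    """Eq. (11): v = x_r + F1*(x_pbest - x_r) + F2*(x_r - x_pworst)."""
    NP = len(pop)
    k = max(1, int(round(p * NP)))      # size of the top/bottom 100p% groups
    order = np.argsort(fitness)         # ascending: best individuals first
    x_pbest = pop[rng.choice(order[:k])]        # random top-100p% vector
    x_pworst = pop[rng.choice(order[NP - k:])]  # random bottom-100p% vector
    x_r = pop[rng.choice(order[k:NP - k])]      # random middle (NP - 2k) vector
    F1, F2 = rng.random(2)              # F1, F2 drawn uniformly per mutant vector
    return x_r + F1 * (x_pbest - x_r) + F2 * (x_r - x_pworst)
```

In EADE, this rule and the basic DE/rand/1 would each be applied with probability 0.5 when generating a mutant vector.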

Parameter adaptation schemes in EADE

The successful performance of the DE algorithm significantly depends on the choice of its three control parameters: the scaling factor F, the crossover rate CR, and the population size NP [23, 25]. They play a vital role, because they greatly influence the effectiveness, efficiency, and robustness of the algorithm. Furthermore, it is difficult to determine the optimal values of the control parameters for a variety of problems with different characteristics at different stages of evolution. In the proposed EADE algorithm, NP is kept as a user-specified parameter, since it highly depends on the problem complexity. Generally speaking, F is an important parameter that controls the evolving rate of the population, i.e., it is closely related to the convergence speed [15]. A small F value encourages the exploitation tendency of the algorithm, making the search focus on the neighborhood of the current solutions; hence, it can enhance the convergence speed. However, it may also lead to premature convergence [43]. On the other hand, a large F value improves the exploration capability of the algorithm; it can make the mutant vectors distribute widely in the search space and can increase the diversity of the population [43]. However, it may also slow down the search [43]. With respect to the scaling factors in the proposed algorithm, at each generation G, the scale factors F1 and F2 of each individual target vector are independently generated according to a uniform distribution in (0,1) to enrich the search behavior. The crossover rate CR reflects the probability with which the trial individual inherits the actual individual's genes, i.e., which and how many components are mutated in each element of the current population [17, 43]. CR practically controls the diversity of the population [44]. As a matter of fact, if CR is high, the population diversity increases, but the stability of the algorithm may be reduced. On the other hand, small values of CR increase the possibility of stagnation, which may weaken the exploration ability of the algorithm to open up new regions of the search space. In addition, CR is usually more sensitive to problems with different characteristics, such as unimodality and multi-modality, and separability and non-separability. For separable problems, CR in the range (0, 0.2) is best, while for multi-modal, parameter-dependent problems, CR in the range (0.9, 1) is suitable [45]. There is a wide variety of approaches for adapting or self-adapting control parameter values through the optimization process. Most of these methods are based on generating random values from uniform, normal, or Cauchy distributions, or on generating different values from a pre-defined parameter candidate pool, besides using previous experience (of generating better solutions) to guide the adaptation of these parameters [11, 15,16,17, 19, 45,46,47,48,49]. The present work proposes a novel self-adaptation scheme for CR. The core idea of the proposed self-adaptation scheme for the crossover rate CR is based on the following fundamental principle. In the initial stage of the search process, the difference among individual vectors is large, because the vectors in the population are completely dispersed and the population diversity is large due to the random distribution of the individuals in the search space, which requires a relatively small crossover value.
Then, as the population evolves through generations, the diversity of the population decreases as the vectors in the population cluster together, because each individual gets closer to the best vector found so far. Consequently, to maintain the population diversity and improve the convergence speed, larger crossover values should gradually be used as the generations of evolution increase, to preserve good genes as far as possible and promote the convergence performance. In this way, the population diversity can be greatly enhanced through generations. However, there is no single appropriate CR value that balances both the diversity and the convergence speed when solving a given problem during the overall optimization process. Consequently, to address this problem, and following the SaDE algorithm [15], a novel adaptation scheme for CR is developed in this paper that can benefit from past experience through the generations of evolution.

Crossover rate adaptation At each generation G, the crossover probability CR\(_{i}\) of each individual target vector is independently generated at random from pool A according to a uniform distribution, and the following procedure operates through the generations. A is the pool of crossover rate values that changes during and after the learning period LP; we set LP \(=\) 10% of GEN, where G is the current generation number and GEN is the maximum number of generations. The lower and upper limits of the ranges for G are experimentally determined. CR_Flag_List[i] is the list that contains one of two binary values (0, 1) for each individual i through generation G, where 0 represents failure (no improvement, when the target vector is better than the trial vector during and after the learning period) and 1 represents success (improvement, when the trial vector is better than the target vector during and after the learning period). failure_counter_list[i] is the list that monitors the performance of the individuals in terms of fitness function value during the generations after completion of the learning period; if there is no improvement in fitness, then the failure counter of this target vector is increased by one. This process is repeated until the counter reaches a pre-specified value Max_failure_counter, which is assigned a value of 20 that was determined experimentally. CR_Ratio_List[k] is the list that records the relative change improvement ratios between the trial and target objective function values with respect to each value k of the pool A of CR values through generation G. It can be clearly seen from procedure 1 that, at \(G=1\), CR \(=\) 0.05 for each target vector, and then, at each generation G, if the generated trial vector is better than the target vector, the relative change improvement ratio (RCIR) associated with this CR value is computed and the corresponding ratio is updated. On the other hand, during the learning period, if the target vector is better than the trial vector, then the CR value is chosen randomly from the associated pool A of CR values, to which more values are gradually added according to the generation number; hence, for this CR value, there is no improvement and its ratio remains unchanged. However, after termination of the learning period, if the target vector is better than the trial vector, i.e., if there is no improvement in fitness, then the failure counter is increased by one in each generation until it reaches the pre-specified value Max_failure_counter of 20; then, this CR value is changed to a new value that is randomly selected from the pool A of CR values, which covers the range 0.1–0.9 in steps of 0.1, with 0.05 and 0.95 also included as lower and upper values, respectively. Note that the RCIR is only updated if there is an improvement; otherwise, it remains constant. Thus, the CR value with the maximum ratio changes continuously according to the evolution process at each subsequent generation. In fact, although all test problems included in this study have an optimum of zero, the absolute value is used in calculating the RCIR as a general rule, to deal with positive, negative, or mixed values of the objective function. Concretely, Fig. 2 shows that, during the first half of the learning period, the construction of pool A of CR values ensures the diversity of the population, such that the crossover probability for the ith individual target increases gradually in a staircase fashion as the evolution process proceeds.
This takes into consideration that the probability of choosing small CR values is greater than that of choosing larger CR values while the diversity of the population is still large. In addition, in the second half of the learning period, the larger values 0.9 and 0.95 are added to the pool, as they favor non-separable functions. However, all the values have an equally likely chance of occurrence, to maintain diversity across different values of CR. Consequently, the successful CR values with high relative change improvement ratios in this period survive to be used in the next generations of the optimization process until they fail to achieve improvement 20 times in succession, after which they are replaced randomly by new values. Thus, the value of CR is adaptively changed as the diversity of the population changes through generations. Distinctly, it varies from one individual to another during the generations, and it also differs from one function being optimized to another. In general, adaptive control parameters taking different values during the optimization process in successive generations enrich the algorithm with controlled randomness, which enhances the global optimization performance of the algorithm in terms of exploration and exploitation capabilities. Therefore, it can be concluded that the proposed novel scheme for the gradual adaptation of the crossover rate values can benefit greatly from the past experience of the individuals in the search space during the evolution process, which, in turn, can considerably balance the common trade-off between population diversity and convergence speed. The pseudocode of EADE is presented in Fig. 3.
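To convey the flavor of this adaptation, the sketch below implements a simplified version of the bookkeeping; the pool growth during the learning period and several details of procedure 1 are abstracted away, so this is an assumption-laden illustration rather than the exact scheme:

```python
import numpy as np

# Candidate CR values: 0.05, then 0.1-0.9 in steps of 0.1, then 0.95.
CR_POOL = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])

class CRAdapter:
    """Simplified per-individual CR adaptation in the spirit of EADE's scheme."""

    def __init__(self, NP, max_failures=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.cr = np.full(NP, 0.05)              # CR = 0.05 for every vector at G = 1
        self.failures = np.zeros(NP, dtype=int)  # failure counter per individual
        self.rcir = np.zeros(len(CR_POOL))       # improvement ratio per CR value
        self.max_failures = max_failures         # Max_failure_counter (20 in the paper)

    def update(self, i, f_target, f_trial):
        """Record the outcome for individual i after one generation."""
        k = int(np.argmin(np.abs(CR_POOL - self.cr[i])))
        if f_trial < f_target:                   # success: credit this CR value
            self.rcir[k] += abs(f_target - f_trial) / max(abs(f_target), 1e-30)
            self.failures[i] = 0
        else:                                    # failure: count it; reset CR at limit
            self.failures[i] += 1
            if self.failures[i] >= self.max_failures:
                self.cr[i] = self.rng.choice(CR_POOL)
                self.failures[i] = 0
```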

Fig. 2

Pseudocode of crossover rate CR

Fig. 3

Description of EADE algorithm

Experimental study

Benchmark functions

The performance of the proposed EADE algorithm has been tested on 20 scalable optimization functions from the CEC 2010 special session and competition on Large-Scale Global Optimization. A detailed description of these test functions can be found in [1]. These 20 test functions can be divided into four classes:

  1. Separable functions: \(F_{1}\)–\(F_{3}\);

  2. Partially separable functions, in which a small number of variables are dependent, while all the remaining ones are independent (\(m=50\)): \(F_{4}\)–\(F_{8}\);

  3. Partially separable functions that consist of multiple independent subcomponents, each of which is m-non-separable (\(m=50\)): \(F_{9}\)–\(F_{18}\);

  4. Fully non-separable functions: \(F_{19}\)–\(F_{20}\);

where the sphere function, the rotated elliptic function, Schwefel's Problem 1.2, Rosenbrock's function, the rotated Rastrigin's function, and the rotated Ackley's function are used as the basic functions. The control parameter used to define the degree of separability of a given function in the given test suite is set to \(m = 50\). The dimension (D) of the functions is 1000. In addition, the performance of the proposed EADE algorithm has also been tested on 7 scalable optimization functions from the CEC 2008 special session and competition on Large-Scale Global Optimization. A detailed description of these test functions can be found in [2]. These 7 test functions are the Shifted Sphere Function (\(F_{1})\), Shifted Schwefel's Problem 2.21 (\(F_{2})\), Shifted Rosenbrock's Function (\(F_{3})\), Shifted Rastrigin's Function (\(F_{4})\), Shifted Griewank's Function (\(F_{5})\), Shifted Ackley's Function (\(F_{6})\), and the FastFractal "DoubleDip" Function (\(F_{7})\). The dimensions (D) of the functions are 100, 500, and 1000. These functions can be divided into two classes:

  1. Separable functions: \(F_{1}, F_{4}, F_{5}\), and \(F_{6}\);

  2. Non-separable functions: \(F_{2}, F_{3}\), and \(F_{7}\).

Note that \(F_{5}\) is grouped as a non-separable function, because the product component becomes less significant with the increase of dimension [39].

Parameter settings and involved algorithms

To evaluate the performance of the algorithm, experiments were conducted on these two test suites. We adopt the solution error measure \((f(x) - f(x^*))\), where f(x) is the best solution obtained by the algorithm in one run and \(f(x^*)\) is the well-known global optimum of each benchmark function; the error is recorded after 1.2e\(+\)05, 6.0e\(+\)05, and 3.0e\(+\)06 function evaluations (FEs) for CEC'2010 and after 5.0e\(+\)03*D FEs for CEC'2008, respectively. All experiments were run 25 times independently for each function, and statistical results are provided, including the best, median, mean, and worst results and the standard deviation. The population size in EADE was set to 50 for CEC'2010 and, for CEC'2008, to 50 for \(D=100\) and 100 for \(D=500\) and \(D=1000\), respectively. The p parameter is set to 0.1, i.e., the top 10% high-quality and bottom 10% low-quality solutions are considered in the mutation. The learning period (LP) and the maximum failure counter (MFC) are set to 10% of the total generations and 20 generations, respectively. This is a comparatively good parameter combination that has been experimentally investigated and tuned by us. For the separable functions \(F_{1}\)–\(F_{3}\) in CEC'2010 and \(F_{1}, F_{4}, F_{5}\), and \(F_{6}\) in CEC'2008, CR is chosen to be 0.05. Regarding CEC'2010, EADE was compared to the DE-based algorithms that were all tested on this test suite in this competition. These algorithms are:

  • Cooperative Co-evolution with Delta Grouping for Large-Scale Non-separable Function Optimization (DECC-DML) [28].

  • Large-scale optimization by Differential Evolution with Landscape modality detection and a diversity archive (LMDEa) [34].

  • Large-Scale Global Optimization using Self-adaptive Differential Evolution Algorithm (jDElsgo) [35].

  • DE Enhanced by Neighborhood Search for Large-Scale Global Optimization (SDENS) [36].

Besides, the memetic algorithm based on local search chains for large-scale continuous global optimization (MA-SW-Chains) [37], a non-DE-based algorithm, was included, as it won the CEC'2010 LSGO competition.

On the other hand, regarding CEC’2008, EADE was compared to different evolutionary algorithms that were all tested on this test suite in this competition or recently. These algorithms are:

  • A competitive swarm optimizer for large-scale optimization (CEO) [39].

  • A social learning particle swarm optimization algorithm for scalable optimization (SL-PSO) [40].

  • Cooperatively co-evolving particle swarms for large-scale optimization (CCPSO2) [50].

  • A simple modification in CMA-ES achieving linear time and space complexity (sep-CMA-ES) [51].

  • Solving large-scale global optimization using improved particle swarm optimizer (EPUS-PSO) [52].

  • Multilevel cooperative co-evolution for large-scale optimization (MLCC) [31].

  • Dynamic multi-swarm particle swarm optimizer with local search for large-scale global Optimization (DMS-L-PSO) [53].

Among these seven algorithms, CEO [39] is the most recently proposed state-of-the-art algorithm for large-scale optimization. In CEO, neither the personal best position of each particle nor the global best position (or neighborhood best positions) is involved in updating the particles. Instead, a pairwise competition mechanism is introduced, where the particle that loses the competition updates its position by learning from the winner. To understand the search behavior of the algorithm, a theoretical proof of convergence is also provided. Similarly, a social learning PSO (SL-PSO) [40] has also been proposed. Unlike classical PSO variants, each particle in SL-PSO learns from any better particles (termed demonstrators) in the current swarm. In addition, to ease the burden of parameter settings, SL-PSO adopts a dimension-dependent parameter control method. CCPSO2 [50] belongs to the cooperative co-evolution (CC) framework [26] for large-scale optimization, where a random grouping strategy is adopted based on the idea of divide-and-conquer. The sep-CMA-ES is an extension of the original CMA-ES algorithm [51] that has been shown to be more efficient and fairly scalable on some high-dimensional test functions of up to 1000-D. EPUS-PSO is another PSO variant, which adjusts the swarm size according to the search results [52], and DMS-L-PSO is DMS-PSO enhanced with a local search operator [53].

To compare the solution quality of different algorithms from a statistical angle and to check the behavior of the stochastic algorithms [54], the results are compared using the multi-problem Wilcoxon signed-rank test at a 0.05 significance level. The Wilcoxon signed-rank test is a non-parametric statistical test that allows us to judge the difference between paired scores when the assumptions required by the paired-sample t test, such as normally distributed populations, cannot be made. R\(^{+}\) denotes the sum of ranks for the test problems in which the first algorithm performs better than the second algorithm (in the first column), and R\(^{-}\) represents the sum of ranks for the test problems in which the first algorithm performs worse than the second algorithm (in the first column). Larger rank sums indicate a larger performance discrepancy. The numbers in the Better, Equal, and Worse columns denote the number of problems in which the first algorithm is better than, equal to, or worse than the second algorithm. The null hypothesis is that there is no significant difference between the mean results of the two samples, whereas the alternative hypothesis is that there is a significant difference. The number of test problems is \(N=20\) at 1.25e\(+\)05, 6.00e\(+\)05, and 3.00e\(+\)06 function evaluations for CEC'2010, while \(N=7\) at 5.00e\(+\)05, 2.50e\(+\)06, and 5.00e\(+\)06 function evaluations with \(D=100\), \(D=500\), and \(D=1000\) for CEC'2008, at a 5% significance level. The smaller of the two rank sums is used as the test value and compared with the critical value, or the p value is compared with the significance level. The null hypothesis is rejected if the test value is less than or equal to the critical value or if the p value is less than or equal to the significance level (5%). Based on the result of the test, one of three signs (\(+\), −, and \(\approx \)) is assigned for the comparison of any two algorithms (shown in the last column), where (\(+\)) means the first algorithm is significantly better than the second, (−) means the first algorithm is significantly worse than the second, and (\(\approx \)) means that there is no significant difference between the two algorithms. In addition, to obtain the final rankings of the different algorithms over all functions, the Friedman test is used at a 0.05 significance level. All the p values in this paper were computed using SPSS (version 20.00).
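As an illustration of this test procedure, a minimal SciPy sketch follows; the error values below are placeholders, not results from the paper:

```python
from scipy.stats import wilcoxon

# Hypothetical mean errors of two algorithms on the same N = 7 functions.
errors_a = [1.2e-3, 4.5e+1, 3.0e-8, 3.3e+2, 9.1e-5, 2.0e+0, 7.7e+1]
errors_b = [2.6e-3, 3.9e+1, 1.1e-8, 5.0e+2, 8.8e-5, 6.4e+0, 9.9e+1]

# Multi-problem Wilcoxon signed-rank test at the 0.05 significance level.
stat, p = wilcoxon(errors_a, errors_b)
print(f"W = {stat:.3f}, p = {p:.4f} ->",
      "significant difference" if p <= 0.05 else "no significant difference")
```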

To perform a comprehensive evaluation and to assess the effectiveness of the proposed self-adaptive crossover rate scheme and the new mutation scheme, another version of EADE, named EADE*, has been tested and compared against EADE and the other DE-based algorithms. EADE* is identical to EADE except that only the new mutation scheme is used.

Experimental results and discussions

In this section, we directly compare the mean results obtained by EADE and EADE* with those obtained by LMDEa [34], SDENS [36], jDElsgo [35], DECC-DML [28], and MA-SW-Chains [37] on CEC'2010. Tables 1, 2, and 3 contain the results obtained by all algorithms at 1.2e\(+\)05, 6.0e\(+\)05, and 3.0e\(+\)06 function evaluations (FEs), respectively. To mark the best algorithm, the best mean for each function is highlighted in boldface. From these tables, we highlight the following direct comparisons and conclusions:

  • For many test functions, the worst results obtained by the proposed algorithms are better than the best results obtained by other algorithms with all FEs.

  • For many test functions, there is continuous improvement in the results obtained by our proposed algorithms, especially EADE and EADE*, at all FEs, while the results at FEs = 6.0E\(+\)05 are very close to the results at FEs = 3.0E\(+\)06 obtained by some of the compared algorithms, which indicates that our proposed approaches are scalable enough and can greatly balance the exploration and exploitation abilities for solving high-dimensional problems until the maximum FEs are reached.

  • For many functions, the remarkable performance of EADE and EADE* at FEs = 1.20E\(+\)05 and FEs = 6.0E\(+\)05 compared to the performance of the other algorithms shows their fast convergence behavior. Thus, our proposed algorithms can perform well and achieve good results within a limited number of function evaluations, which is a very important issue when dealing with real-world problems.

  • EADE and EADE* got very close to the optimum of the single-group m-non-separable multi-modal function F\(_{6}\) in all statistical results with 1.20E\(+\)05 FEs.

  • EADE and LMDEa, alone among all the algorithms, got very close to the optimum in all runs of the single-group m-non-separable multi-modal function F\(_{8}\) with 3.0E\(+\)06 FEs.

  • EADE and EADE* perform well on all types of problems, which indicates that they are less affected than most of the other algorithms by the characteristics of the problems.

Table 1 Experimental comparisons between EADE, EADE*, and state-of-the-art algorithms, FES \(=\) 1.20E\(+\)05
Table 2 Experimental comparisons between EADE, EADE*, and state-of-the-art algorithms, FES \(=\) 6.00E\(+\)05
Table 3 Experimental comparisons between EADE, EADE*, and state-of-the-art algorithms, FES \(=\) 3.0E\(+\)06
Table 4 Results of multiple-problem Wilcoxon’s test for EADE and EADE* versus LMDEa, SDENS, jDElsgo, and DECC-DML over all functions at a 0.05 significance level with (1.25E\(+\)05 FES)
Table 5 Results of multiple-problem Wilcoxon’s test for EADE and EADE* versus LMDEa, SDENS, jDElsgo, and DECC-DML over all functions at a 0.05 significance level with (6.00E\(+\)05 FES)
Table 6 Results of multiple-problem Wilcoxon’s test for EADE and EADE* versus LMDEa, SDENS, jDElsgo, and DECC-DML over all functions at a 0.05 significance level with (3.00E\(+\)06 FES)
Table 7 Average ranks for all algorithms across all problems and 1.2e\(+\)05, 6.0e\(+\)05, and 3.0e\(+\)06 function evaluations (FEs)
Table 8 Experimental comparisons between EADE and state-of-the-art algorithms, \(D = 100\)
Table 9 Experimental comparisons between EADE and state-of-the-art algorithms, \(D = 500\)
Table 10 Experimental comparisons between EADE and state-of-the-art algorithms, \(D= 1000\)
Table 11 Results of multiple-problem Wilcoxon’s test for EADE versus state-of-the-art algorithms over all functions at a 0.05 significance level with (\(D = 100\))

Furthermore, compared to the complicated structures and the number of methods and control parameters used in the other algorithms, our proposed EADE and EADE* are very simple and easy to implement and program in many programming languages. They only use a very simple self-adaptive crossover rate scheme with two parameters and a novel mutation rule with one parameter, besides the basic mutation. Thus, they increase neither the complexity of the original DE algorithm nor the number of control parameters. To investigate and compare the performance of the proposed algorithms EADE and EADE* against the other algorithms in a statistical sense, the multi-problem Wilcoxon signed-rank test at a 0.05 significance level is performed on the mean errors of all problems at 1.25E\(+\)05, 6.00E\(+\)05, and 3.00E\(+\)06 FES, and the results are presented in Tables 4, 5, and 6, respectively, where R\(^{+}\) is the sum of ranks for the functions in which the first algorithm outperforms the second algorithm in the row, and R\(^{-}\) is the sum of ranks for the opposite. From Table 4, it can be clearly seen that EADE and EADE* are significantly better than the SDENS, jDElsgo, and DECC-DML algorithms. Moreover, there is no significant difference between EADE*, LMDEa, and EADE. However, MA-SW-Chains is significantly better than the EADE and EADE* algorithms. From Table 5, it can be seen that EADE and EADE* are significantly better than the SDENS and DECC-DML algorithms, while EADE* is significantly worse than the LMDEa algorithm. Besides, there is no significant difference between EADE and the EADE*, LMDEa, and jDElsgo algorithms. From Tables 4 and 5, it is noteworthy that EADE* is better than all the DE-based algorithms (LMDEa, SDENS, jDElsgo, and DECC-DML). Moreover, from Table 6, EADE* outperforms the SDENS and DECC-DML algorithms and is competitive with the jDElsgo algorithm, which indicates that the new mutation scheme helps to effectively maintain the balance between the global exploration and local exploitation abilities of DE during the search process. EADE outperforms the SDENS and DECC-DML algorithms, and it is competitive with the jDElsgo, LMDEa, and MA-SW-Chains algorithms. Furthermore, the performance of all algorithms is analyzed using all function evaluations (FEs) and different categories of functions. The mean aggregated rank of all 6 algorithms across all 20 problems and all of the 1.2e\(+\)05, 6.0e\(+\)05, and 3.0e\(+\)06 FEs is presented in Table 7. The best ranks are marked in bold and the second-best ranks are underlined. From Table 7, it can be clearly concluded that MA-SW-Chains is the best, followed by EADE as second best among all algorithms, while EADE* is ranked third. Note that the main contribution of this study is to propose a DE framework, not to propose a "best" algorithm or competitor to defeat other state-of-the-art algorithms. However, it is worth mentioning that the performance of EADE considerably increases as the number of function evaluations increases from 1.25E\(+\)05 to 3.00E\(+\)06, which means that it benefits from extra FES. It can be observed from Tables 4, 5, and 6 that EADE is inferior to MA-SW-Chains on 17, 13, and 10 functions at 1.25E\(+\)05, 6.00E\(+\)05, and 3.00E\(+\)06 FES, respectively. Thus, it can be concluded that the inferiority of the EADE algorithm against the MA-SW-Chains algorithm considerably decreases as the FES increase.

On the other hand, regarding the CEC'2008 benchmark functions, Tables 8, 9, and 10 contain the results obtained by all algorithms with \(D=100\), \(D=500\), and \(D=1000\), respectively. They include the obtained best results and the standard deviations of the error from the optimal solution for EADE and the other seven state-of-the-art algorithms over 25 runs on all 7 benchmark functions. The results of these approaches were taken directly from references [39, 40]. To mark the best algorithm, the best mean for each function is highlighted in boldface.

As shown in Table 8, EADE is able to find the global optimal solution consistently on 4 out of 7 test functions over 25 runs, the exceptions being test functions F\(_{2}\), F\(_{3}\), and F\(_{7}\). With respect to F\(_{2}\), although the optimal solution is not reached, the best result achieved is very close to the global optimal solution, which can be verified by the very small function error and standard deviation. Regarding F\(_{3}\), the narrow valley between the local optimum and the global optimum presents a challenge that prevents EADE, like all the other algorithms, from finding the global solution. In addition, EADE gets trapped in a local optimum on F\(_{7}\), as do all the other compared algorithms except CEO, which provides the best mean, although the global optimal solution of this function is unknown. From the results presented in Tables 9 and 10, for the 500D and 1000D problems, a trend similar to that observed for 100D continues: EADE is still able to provide the same competitive results. In general, it can be observed that EADE, CEO, and SL-PSO do significantly better than the others on most functions in different dimensions. On the other hand, EPUS-PSO performs poorly on all the functions in all dimensions.

Table 12 Results of multiple-problem Wilcoxon's test for EADE versus state-of-the-art algorithms over all functions at a 0.05 significance level with (\(D = 500\))
Table 13 Results of multiple-problem Wilcoxon's test for EADE versus state-of-the-art algorithms over all functions at a 0.05 significance level with (\(D = 1000\))

In addition, on all functions for all three dimensionalities, EADE provides very small standard deviations, which means that the differences between the mean and median are small even in cases where the final results are far away from the optimum, regardless of the dimension. This implies that EADE is a robust algorithm. Moreover, due to the insignificant differences between the results in the three dimensions, it can be concluded that the performance of the EADE algorithm diminishes only slightly and remains stable and robust against the curse of dimensionality, i.e., it is steady overall as the dimensions of the problems increase. Obviously, the performance of the proposed EADE on large-scale optimization problems is surprisingly good, because no specific mechanism for large-scale optimization, such as divide-and-conquer or the CC framework, is adopted in EADE. The good scalability of EADE is due to the following two reasons. First, the new mutation scheme helps to effectively maintain the balance between the global exploration and local exploitation abilities of the DE search process needed for handling large-scale problems. Second, the proposed novel self-adaptive scheme for the gradual change of the crossover rate values, which considerably balances the common trade-off between population diversity and convergence speed, might have contributed to the scalability. In fact, further investigation and experimental analysis of the performance of EADE in solving large-scale optimization problems are needed. Furthermore, to investigate and compare the performance of the proposed EADE algorithm against the other algorithms in a statistical sense, the multi-problem Wilcoxon signed-rank and Friedman tests between EADE and the others for 100D, 500D, and 1000D are summarized in Tables 11, 12, 13, and 14, respectively, where R\(^{+}\) is the sum of ranks for the functions in which the first algorithm outperforms the second algorithm in the row, and R\(^{-}\) is the sum of ranks for the opposite.

Table 14 Average ranks for all algorithms across all problems with D = 100, D = 500 and D = 1000

From Table 11, we can see that EADE obtains higher R\(^{+}\) values than R\(^{-}\) values in all cases, with only a slightly lower R\(^{+}\) value than R\(^{-}\) value in comparison with SaDE. However, from Tables 12 and 13, in the cases of EADE versus CEO, SL-PSO, and CCPSO2, the latter obtain higher R\(^{-}\) than R\(^{+}\) values. The reason is that EADE performs far worse than these three algorithms on function F\(_{7}\), resulting in higher ranking values. According to the Wilcoxon test at \(\alpha = 0.05\), a significant difference can only be observed in the EADE versus EPUS-PSO case. Besides, Table 14 lists the average ranks of EADE and the other algorithms according to the Friedman test for D = 100, 500, and 1000, respectively. The best ranks are marked in bold and the second-best ranks are underlined. The p values computed through the Friedman test are 0.01, 0.48, and 0.47, respectively. Thus, a significant difference between the performances of the algorithms can only be concluded for the 100-dimensional functions. It can be clearly seen from Table 14 that EADE gets the first ranking among all algorithms on the 100-dimensional functions, followed by CEO and SL-PSO. Regarding the 500D and 1000D problems, CEO gets the first ranking, followed by SL-PSO and EADE. Furthermore, the performance of all algorithms is analyzed using all dimensions and different categories of functions. The mean aggregated rank of all 8 algorithms across all 7 problems and all dimensions (100D, 500D, and 1000D) is presented in Table 14, from which it can be clearly concluded that CEO is the best, followed by EADE as second best among all algorithms, while SL-PSO is ranked third. Finally, it is worth highlighting that EADE has shown comparable performance to MLCC, CCPSO2, and DMS-L-PSO, three algorithms originally designed for solving large-scale optimization problems. In addition, it significantly outperforms the sep-CMA-ES and EPUS-PSO algorithms.

Overall, from the above results, comparisons, and discussion, the proposed EADE algorithm shows better search quality, efficiency, and robustness for solving unconstrained large-scale global optimization problems. It is clear that the proposed EADE and EADE* algorithms perform well and have shown outstanding superiority on separable, non-separable, unimodal, and multi-modal functions with shifts in dimensionality, rotation, multiplicative noise in fitness, and composition of functions. Consequently, their performance is not influenced by these obstacles; on the contrary, they largely keep the balance between the local optimization speed and the global optimization diversity in challenging optimization environments with invariant performance. Besides, it can be clearly concluded from the direct and statistical results that EADE and EADE* are powerful algorithms, and their performance is superior to, or competitive with, that of the well-known state-of-the-art DE-based algorithms.

Conclusion

To efficiently concentrate the exploitation tendency on some sub-regions of the search space and to significantly promote the exploration capability over the whole search space during the evolutionary process of the conventional DE algorithm, an enhanced adaptive Differential Evolution (EADE) algorithm for solving large-scale global numerical optimization problems over continuous space was presented in this paper. The proposed algorithm introduces a new mutation rule that uses two randomly chosen vectors from the top and bottom 100p% individuals of the current population of size NP, while the third vector is selected randomly from the middle (NP − 2(100p%)) individuals. The mutation rule is combined with the basic mutation strategy DE/rand/1/bin, where only one of the two mutation rules is applied with a probability of 0.5. Furthermore, we propose a novel self-adaptive scheme for the gradual change of the crossover rate values that can benefit greatly from the past experience of the individuals in the search space during the evolution process, which, in turn, can considerably balance the common trade-off between population diversity and convergence speed. The proposed mutation rule was shown to enhance the global and local search capabilities of the basic DE and to increase the convergence speed. The algorithm was evaluated on the standard high-dimensional benchmark problems. The comparison results between EADE and EADE* and the other state-of-the-art DE-based algorithms that were all tested on the test suites of the IEEE Congress on Evolutionary Computation competitions of 2008 and 2010 indicate that the proposed algorithm and its version are highly competitive algorithms for solving large-scale global optimization problems. The experimental results and comparisons showed that the EADE and EADE* algorithms performed better on large-scale global optimization problems of different types and complexity; they performed better with regard to the efficiency of the search process, the final solution quality, the convergence rate, and robustness, when compared with the other algorithms. In fact, the performance of the EADE and EADE* algorithms was statistically superior to and competitive with that of other recent and well-known DE algorithms. Finally, to the best of our knowledge, this is the first study to use all these different types of approaches (12 in total) to carry out evaluations and comparisons on the CEC'2008 and CEC'2010 benchmark problems. Ultimately, this study aims to show that EADE is a competitive and efficient approach, as well as being superior to the most recent techniques in the field of large-scale optimization. Several current and future works can be developed from this study. First, current research efforts focus on how to control the scaling factors by a self-adaptive mechanism and on developing another self-adaptive mechanism for the crossover rate. In addition, a new version of EADE combined with the Cooperative Co-evolution (CC) framework is being developed and will be experimentally investigated soon. Moreover, future research will investigate the performance of the EADE algorithm in solving constrained and multi-objective optimization problems, as well as real-world applications such as data mining and clustering problems. In addition, large-scale combinatorial optimization problems will be taken into consideration.
Another possible direction is to integrate the proposed novel mutation scheme with the compared and other self-adaptive DE variants, as well as to combine the proposed self-adaptive crossover with other DE mutation schemes. In addition, a promising research direction is to join the proposed mutation with other evolutionary algorithms, such as genetic algorithms, harmony search, and particle swarm optimization, as well as with foraging algorithms such as the artificial bee colony, the bees algorithm, and ant colony optimization. The MATLAB source code of EADE is available upon request.