Adaptive differential evolution with a new joint parameter adaptation method

Differential evolution (DE) is a population-based metaheuristic algorithm that has proved powerful in solving a wide range of real-parameter optimization tasks. However, the selection of the mutation strategy and control parameters in DE is problem dependent, and an inappropriate specification will lead to poor performance of the algorithm, such as slow convergence and early stagnation in a local optimum. This paper proposes a new method termed Joint Adaptation of Parameters in DE (JAPDE). The key idea lies in dynamically updating the selection probabilities for a complete set of pairs of parameter generating functions based on feedback information acquired during the search by DE. Further, for mutation strategy adaptation, the Rank-Based Adaptation (RAM) method is utilized to facilitate the learning of multiple probability distributions, each of which corresponds to an interval of fitness ranks of individuals in the population. The coupling of RAM with JAPDE results in the new RAM-JAPDE algorithm, which enables simultaneous adaptation of the selection probabilities for pairs of control parameters and mutation strategies in DE. The merit of RAM-JAPDE has been evaluated on the benchmark test suite proposed in CEC2014 in comparison to many well-known DE algorithms. The results of the experiments demonstrate that the proposed RAM-JAPDE algorithm outperforms or is competitive with the other related DE variants that perform mutation strategy and control parameter adaptation.


Introduction
Differential evolution (DE) is a population-based algorithm that belongs to the family of Evolutionary Algorithms (Storn and Price 1997; Xiong et al. 2015). In DE, the search is driven by calculating differences between vectors. DE is an attractive tool for real-parameter optimization due to its high performance and easy implementation. It has been successfully employed in many real-world applications, such as furnace optimized control systems (Leon et al. 2015), process modelling for greenhouses (Perez-Gonzalez et al. 2018) (Suresh and Lal 2017), and optimal control and design of electric systems (Bakare et al. 2007; Leon et al. 2016).
However, DE, like many other metaheuristic algorithms, has two main issues: a) stagnation in local optima; b) slow convergence speed (Leon and Xiong 2019). If the algorithm exploits promising regions too much by having a greedy setup, it will more likely stagnate in local optima. On the other hand, if the algorithm, with the purpose of avoiding stagnation in local optima, excessively explores the search space, it will encounter the problem of slow convergence speed. For this reason, a proper trade-off between exploration and exploitation is very much needed in DE.
The exploration/exploitation behaviour of DE is largely dependent on its two control parameters: mutation factor (F) and crossover rate (CR) (Yildiz et al. 2015). Manually finding the best values of the parameters for real applications usually involves a time-consuming trial-and-error procedure. Moreover, in different stages of the search, we would need different parameters to suit the characteristics of the varying landscape. Inappropriate setting of the control parameters will lead to low convergence speed or stagnation in local optima during the search.
Adaptation of the control parameters during the execution of DE has received much attention for more than a decade. In the works presented in (Qin et al. 2009; Zhang and Sanderson 2009; Islam et al. 2012), the F and CR values for each individual of the population are generated by Gaussian and Cauchy probability density functions separately. The mean values of these probability functions are updated based on the F and CR values that succeeded in producing better trial solutions (than the target vectors) in the previous generation. The SHADE algorithm (Tanabe and Fukunaga 2013) relies on previous experience to calculate the weighted Lehmer mean of successful F values and the weighted arithmetic mean of successful CR values, which are then utilized to replace an entry of the memories for F and CR generation, respectively.
This paper proposes a new method called Joint Adaptation of Parameters in DE (JAPDE), which emphasizes a more reliable assessment of the combined effect of the F and CR parameter generating functions. As in the SaDE (Qin et al. 2009), JADE (Zhang and Sanderson 2009) and SHADE algorithms, JAPDE uses two different probability functions to generate the F and CR values, respectively. What is novel is that JAPDE deals with pairs of mean values of the two probability functions instead of treating each mean separately from the other. A pair of mean values is randomly selected from a pool of candidates for each target vector during the run of JAPDE. We use a matrix to represent the selection probabilities for a complete set of pairs of the two mean values. This probability matrix is then dynamically updated via performance evaluation using feedback acquired during the search, such that the pairs that were more successful in producing better trial solutions in the last period increase their probabilities of being selected, and vice versa.
Further, JAPDE has been combined with the Rank-Based Mutation Adaptation (RAM) method, which was proposed in our recent study (Leon and Xiong 2018). The reason for this combination is that the different mutation strategies also strongly affect the exploration/exploitation behaviour of DE. The main idea of RAM is to maintain and adapt multiple probability distributions such that individuals of distinct fitness ranks can obtain different probabilities in selecting the mutation strategy. The combination of RAM and JAPDE gives rise to the RAM-JAPDE algorithm, which offers a new and probably more synergistic manner for simultaneously adapting the selection probabilities for mutation strategies and pairs of control parameters in DE. RAM-JAPDE has been tested on the set of benchmark problems from CEC'14 (Liang et al. 2013a), and it has been shown to outperform or be competitive with many state-of-the-art DE variants that also adapt their control parameters and mutation strategies in the optimization process.
The rest of the paper is organized as follows. First, a description of the conventional DE is given in Sect. 2. In Sect. 3, a review of well-known DE algorithms with adaptation in both control parameters and mutation strategies is given. In Sect. 4, all the details of the proposed RAM-JAPDE algorithm are presented. The experiments and results are given in Sect. 5. Finally, the conclusion is given in Sect. 6.

Differential evolution
Differential Evolution (DE) was proposed by R. Storn and K. Price (Storn and Price 1995) in 1995. The population X is formed by ps (population size) individuals, in which the i-th individual is represented as X_i = {x_1, x_2, ..., x_j, ..., x_n}, where n is the dimension of the problem and x_j ∈ R is inside a range dependent on the specifications of the problem. The initial population of DE is randomly generated inside the search space. These individuals are transformed in evolution by three different operations: Mutation, Crossover and Selection. Conducting all these operations sequentially constitutes one cycle of the DE algorithm, which leads to a new generation. These operations are described as follows:
MUTATION: The first step inside the DE cycle is called Mutation. In Mutation, a new population of mutated individuals, V, is generated by using differences of individuals of the population. There are different ways of calculating the i-th individual of V in generation g (V_{i,g}). Some common mutation strategies are described below. More variants of mutation strategies can be found in the literature (Leon and Xiong 2014; Zhang and Sanderson 2009; Leon and Xiong 2017).
- DE/current-to-rand/1
where r_1, r_2 and r_3 represent random indexes between 1 and the maximum number of individuals in the population (ps), X_{best,g} represents the best individual from the population at generation g and F ∈ [0, 2]. Additionally, the last position in the name of the mutation strategies indicates the number of subtractions between random individuals from the population.
CROSSOVER: The second step is called Crossover. In Crossover, a mixture of the population, X, and the mutated population, V, is performed in order to generate the set of offspring, U. To obtain the j-th parameter of the i-th offspring, U_{i,g}[j], in generation g, the following calculation is performed:
$$U_{i,g}[j] = \begin{cases} V_{i,g}[j] & \text{if } rand < CR \text{ or } j = j_{rand} \\ X_{i,g}[j] & \text{otherwise} \end{cases}$$
where rand is a uniform random number in the range [0, 1]. CR, called crossover rate, is a value inside the range [0, 1], which decides the probability of selecting values from the mutated individual. Additionally, j_rand is an index between 1 and n, to ensure that at least one value is taken from V_i.
SELECTION: The last step is called Selection. In Selection, the fitness of the i-th offspring (f(U_i)) is compared against the fitness of the i-th individual from the population (f(X_i)) and the best one will become a member of the population for the next generation g + 1. This operation is decided by Eq. (6) for a minimization problem.
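The three operations above can be illustrated with a minimal Python sketch of one DE cycle. The mutation formulas are not recoverable from the text here, so the sketch assumes the commonly used DE/rand/1 form V_i = X_r1 + F (X_r2 - X_r3) with mutually distinct random indexes; accepting the offspring on ties (<=) is also an assumption. The function name `de_generation` and its arguments are illustrative only.

```python
import random

def de_generation(pop, fitness, objective, F=0.5, CR=0.9, rng=random):
    """One DE cycle (mutation, crossover, selection) for a minimization problem."""
    ps, n = len(pop), len(pop[0])
    new_pop, new_fit = [], []
    for i in range(ps):
        # Mutation (assumed DE/rand/1): three distinct indexes, all different from i.
        r1, r2, r3 = rng.sample([k for k in range(ps) if k != i], 3)
        v = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(n)]
        # Binomial crossover: j_rand guarantees at least one value taken from v.
        j_rand = rng.randrange(n)
        u = [v[j] if (rng.random() < CR or j == j_rand) else pop[i][j]
             for j in range(n)]
        # Selection: the better of offspring and parent survives (ties favour the offspring).
        fu = objective(u)
        if fu <= fitness[i]:
            new_pop.append(u)
            new_fit.append(fu)
        else:
            new_pop.append(pop[i])
            new_fit.append(fitness[i])
    return new_pop, new_fit
```

Because selection is one-to-one, the fitness of every population slot is non-increasing from one generation to the next, which is what makes the later success counting (offspring better than parent) well defined.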

Related work
In this section, an overview of different DE algorithms that adapt the control parameters (Sect. 3.1) and DE algorithms that adapt different mutation strategies (Sect. 3.2) is presented.

Adaptation of parameters
As previously mentioned, finding the best combination of parameters in DE is an expensive task. According to (Eiben et al. 1999), there are three types of methods to control the parameters:
- Deterministic parameter control: A deterministic rule is applied to the different parameters in order to change them through the evolutionary process.
- Adaptive parameter control: Feedback from the search is used to adapt the different parameters to respond to changing properties in various stages of the search.
- Self-adaptive parameter control: The parameters are embedded into the individuals, and thereby, the parameters will undergo different evolutionary operations.
The most common type is adaptive parameter control. The algorithms can be divided into two subcategories: individual parameter control and joint parameter control. Within the first subcategory, the first algorithm, SaDE (Qin et al. 2009), creates random F and CR values using two normal distributions, respectively. The novelty of this algorithm is that the mean of the normal distribution for CR, meanCR, is adapted. To do so, the successful CR values during the last learning period are used to calculate the new value of meanCR. A similar algorithm, named JADE, was proposed by Zhang and Sanderson (Zhang and Sanderson 2009). Differently from SaDE, both the Cauchy distribution for F and the normal distribution for CR are adapted. In this case, the successful F and CR values in the last generation are utilized to calculate their Lehmer and arithmetic means, respectively, for updating the mean values of both distributions. Similarly, in MDE_pBX (Islam et al. 2012) the Lehmer mean is replaced by a power mean. A different algorithm, named SHADE, was proposed by Tanabe and Fukunaga (Tanabe and Fukunaga 2013); it maintains memories of F and CR values calculated as the weighted Lehmer mean and the weighted arithmetic mean of successful F and CR values from the last generation. An improved version of SHADE was proposed in 2016 (Viktorin et al. 2016), in which a multichaotic framework is used to select the parents that will be used during the mutation phase.
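To make the difference between the two means concrete, the following Python sketch shows a JADE-style update of the distribution centres. The Lehmer mean is sum(F^2)/sum(F), which weights larger successful F values more heavily than the arithmetic mean does. The blending rule with a learning rate `c` (and its default value) is stated here as an assumption about how such updates are typically written, not as a formula taken from this paper.

```python
def lehmer_mean(values):
    """Lehmer mean sum(v^2) / sum(v); biased towards the larger values."""
    return sum(v * v for v in values) / sum(values)

def arithmetic_mean(values):
    return sum(values) / len(values)

def update_means(mean_F, mean_CR, successful_F, successful_CR, c=0.1):
    """Blend the old centres with the means of the successful values.

    c is an assumed learning rate; empty success lists leave a centre unchanged.
    """
    if successful_F:
        mean_F = (1 - c) * mean_F + c * lehmer_mean(successful_F)
    if successful_CR:
        mean_CR = (1 - c) * mean_CR + c * arithmetic_mean(successful_CR)
    return mean_F, mean_CR
```

For the successful values 0.2 and 0.8, the arithmetic mean is 0.5 while the Lehmer mean is 0.68, so the F centre is pulled towards larger mutation factors. Note that both centres are updated independently, which is the separation that joint adaptation methods aim to avoid.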
Within the second subcategory there are not as many algorithms as in the first one. A well-known algorithm that keeps good combinations of F and CR values is EPSDE (Mallipeddi et al. 2011). EPSDE creates a pool with CR values from the range 0.1-0.9 and with F values from the range 0.4-0.9 in steps of 0.1. Random combinations are assigned to different individuals. If a combination successfully creates an offspring that is better than its parent, the same combination will be used for that offspring during the next generation. It will also be added to the pool of successful combinations. On the contrary, if the offspring is not better, then a new combination is assigned to that individual from either the pool of all combinations or the pool of successful combinations. A similar algorithm was proposed in (Wang et al. 2011), but with a reduced number of combinations in consideration. A different alternative, named ZEPDE, was proposed in (Fan and Yan 2016), in which the total region of parameters is divided into 4 zones. Then, the combinations of values inside these zones are changed using a weighted mean of the successful combinations within each zone.
Even though adaptive parameter control within differential evolution algorithms is more common, there are some algorithms that belong to deterministic parameter control or self-adaptive parameter control. One good example of the former is sinDE (Draa et al. 2015), which uses sinusoidal functions to change F and CR values over time without any feedback from the search. Similarly, a deterministic rule is proposed in (Sun et al. 2018) that also uses a sinusoidal function, with the F and CR calculations depending on the performance ranking of the individual. More algorithms belong to the latter category. The first algorithm belonging to this category is jDE (Brest et al. 2006). In jDE, F and CR are embedded into the individuals. If a new offspring successfully outperforms its parent, then its associated parameters will survive as well, so that good combinations of parameters are available within the population. A different algorithm, named DESAP, is proposed in (Teo 2006), in which an additional parameter, the population size, is adapted along with F and CR. The three parameters will undergo mutation with the individuals where they are embedded. Then, instead of crossover, a small perturbation is applied to the new values.

Adaptation of mutation strategies
There are different DE algorithms that use different mutation strategies. In SaDE (Qin et al. 2009), 4 different mutation strategies are used: DE/rand/1/bin, DE/rand-to-best/2/bin, DE/rand/2/bin and DE/current-to-rand/1. Each of them has a probability of being selected for mutation. After a learning period, these probabilities of the mutation strategies are adjusted according to their success rates. Similarly to SaDE, ZEPDE (Fan and Yan 2016) adapts the selection among 5 different mutation strategies: DE/rand/1, DE/current-to-best/2, DE/current-to-best/1, DE/best/2 and DE/rand/2, each of which has a probability of being selected. After each generation, these probabilities are adjusted based on how much each created offspring improves with respect to the worst one in the population. Note that all this is performed only after a certain number of generations (G_s), since at the beginning of the search only DE/rand/1 is used.
Differently from SaDE and ZEPDE, CoDE (Wang et al. 2011) uses three different mutation strategies (DE/rand/1/bin, DE/rand/2/bin and DE/current-to-rand/1) at the same time to create three different offspring per parent. Then, the best offspring will compete with the parent to become a member of the population. An improved version of the algorithm was proposed in (Deng et al. 2019), which introduced a new operation before applying mutation. An even simpler adaptation of mutation strategies was proposed by Mallipeddi et al. (Mallipeddi et al. 2011), in which DE/best/2/bin, DE/rand/1/bin, DE/current-to-rand/1/bin are used as mutation strategies. They are then randomly selected for being assigned to the different individuals. If a mutation strategy successfully creates an offspring that will replace its parent, then the same mutation strategy will be used for that individual in the next generation. If the mutation strategy is not successful, a new mutation strategy will be randomly assigned.
More recently, SA-SHADE was proposed as an improved version of SHADE (Dawar and Ludwig 2018). SA-SHADE uses 5 different mutation strategies: DE/rand/1, DE/rand/2, DE/best/2, DE/current-to-pbestWithArchive/ and DE/current-rand-to-pBest, all of which are described in (Dawar and Ludwig 2018). Similar to the parameter adaptation in SHADE, a memory of successful mutation strategies is maintained. In order to create an offspring, a random mutation strategy is selected from the memory. After each generation, the memory is updated with the mutation strategy that was successful the highest number of times. In (Mohamed and Suganthan 2018), two mutation strategies are used as candidates: DE/rand/1 and the new triangular mutation strategy proposed to balance the global exploration. DE/rand/1 is selected with higher probability at the beginning of the search, while both are used with the same probability at the end.

JAPDE: Joint Adaptation of Parameters in Differential Evolution
In most other adaptive DE algorithms, the parameters F and CR are considered separately when calculating the new centres of the Normal/Cauchy distributions. However, F and CR are inherently related; e.g. for some functions, only small values of F work if large values of CR are taken. As in some other adaptive DE algorithms, the F and CR values are generated in JAPDE using Cauchy and Normal distributions, respectively. Thus the mutation factor F_i and crossover rate CR_i of the i-th individual are created by following Eq. (7) and Eq. (8).
$$F_i = Cauchy(meanF_i, 0.05) \qquad (7)$$
$$CR_i = Normal(meanCR_i, 0.05) \qquad (8)$$
where Cauchy(meanF_i, 0.05) is a random value generated by the Cauchy distribution with meanF_i as its mean and std equal to 0.05. Similarly, Normal(meanCR_i, 0.05) is a random value generated by the Normal distribution with meanCR_i as its mean and std equal to 0.05. If F_i is greater than 1 then it is set to 1. On the contrary, F_i is generated again if it is smaller than 0. CR_i is set to 0 or 1 if it is outside the range [0, 1].
Differently from other variants of adaptive DE, JAPDE maintains a probability distribution matrix (M) based on which pairs of meanF_i and meanCR_i are selected. The size of M is equal to 11 × 11, in which the columns represent the different meanF values (from 0 to 1 in steps of 0.1) and the rows represent the different meanCR values (from 0 to 1 in steps of 0.1). The probability of selecting the pair of the cr-th meanCR and the f-th meanF is given by M_{cr,f}. The pseudocode of how a pair of meanF_i and meanCR_i is selected is given in Algorithm 1. In this algorithm, zeros(ps,1) creates an array of ps zeros, sumAll(M) adds all the values of M and sumPerRow(M) returns an array in which each value is the sum of the elements of an entire row of M.
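The selection and sampling steps described above can be sketched in Python as follows. Algorithm 1 itself is not reproduced in this text, so the roulette-wheel walk below (first over the row sums, then inside the chosen row) is only an assumed reading of the description of sumAll and sumPerRow. The names `select_pair`, `sample_F` and `sample_CR` are illustrative, and the Cauchy sample is drawn by inverse transform with location meanF and scale 0.05.

```python
import math
import random

CANDIDATES = [k / 10 for k in range(11)]  # 0.0, 0.1, ..., 1.0

def select_pair(M, rng=random):
    """Roulette-wheel choice of (meanF, meanCR); rows index meanCR, columns meanF."""
    row_sums = [sum(row) for row in M]
    r = rng.uniform(0, sum(row_sums))
    cr = 0
    while cr < len(M) - 1 and r >= row_sums[cr]:  # pick the row first
        r -= row_sums[cr]
        cr += 1
    f = 0
    while f < len(M[cr]) - 1 and r >= M[cr][f]:   # then the column inside that row
        r -= M[cr][f]
        f += 1
    return CANDIDATES[f], CANDIDATES[cr]

def sample_F(mean_F, rng=random):
    """Cauchy sample around mean_F; regenerated if negative, truncated to 1."""
    while True:
        F = mean_F + 0.05 * math.tan(math.pi * (rng.random() - 0.5))
        if F >= 0:
            return min(F, 1.0)

def sample_CR(mean_CR, rng=random):
    """Normal sample around mean_CR, clipped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(mean_CR, 0.05)))
```

A typical use would call `select_pair(M)` once per target vector and then draw F_i and CR_i from the returned centres, remembering the (cr, f) cell so that its success can be credited later.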
where E_PA is the evaporation rate used to update the parameters, successPC_{cr,f} is the number of times in which the combination f-cr (the f-index representing the closest candidate to the used F, the cr-index representing the closest candidate to the used CR) was successful, triesPC_{cr,f} is the number of times in which the combination f-cr was tried and EP_f is a value from the exploration plane. EP_f is calculated as indicated in Eq. (11). If f is equal to 0, then 0.01 is used instead.
where NFES is the current number of evaluations and NFES_MAX is the maximum number of evaluations. The reason for using the evaporation rate and the exploration plane is to avoid fast convergence to pairs of meanF and meanCR that work well in early stages of the search but become inappropriate in later stages.

Adaptation of mutation strategies
JAPDE is combined with RAM, a method that adapts the selection between different mutation strategies. As previously mentioned, there are many different mutation strategies with various exploration-exploitation behaviours. The highly exploitative mutation strategies include DE/best/1 and DE/current-to-best, which utilize the best individual in the population in creating a new mutant vector. The highly exploratory strategies include DE/rand/1 and DE/current-to-rand/1, which completely rely on randomly selected individuals from the population in the mutation operation. Further, by considering the structure of the mutation strategies, DE/best/1 can be viewed as highly exploitative as opposed to DE/rand/1, and DE/current-to-best as highly exploitative as opposed to DE/current-to-rand/1. A compromise between two opposite mutation strategies can be implemented by two new strategies: DE/pbest/1 (Eq. (12)) and DE/current-to-pbest/1 (Eq. (13)), which can be used to produce an effect that lies in between highly exploratory and exploitative behaviours. How much these two new mutation strategies are going to explore or exploit the search space is determined by the parameter p. If p is set to a very high percentage, these two parameterized strategies will behave similarly to the exploratory mutation strategies; otherwise they will function more similarly to the exploitative mutation strategies. The relation between the 6 different mutation strategies can be found in Fig. 1.
where X_{pBest,g} is a random individual selected from the p best individuals, with p being a percentage value of the population size. X_{r1,g} and X_{r2,g} are randomly selected individuals from the population.
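As an illustration of the two parameterized strategies, the sketch below uses the forms in which they are commonly written in the adaptive DE literature: V_i = X_pBest + F (X_r1 - X_r2) for DE/pbest/1 and V_i = X_i + F (X_pBest - X_i) + F (X_r1 - X_r2) for DE/current-to-pbest/1. These forms are an assumption here, since Eqs. (12) and (13) are not legible in this text; drawing r1 and r2 as two distinct indexes and the helper `pbest_index` are also illustrative choices.

```python
import random

def pbest_index(fitness, p, rng=random):
    """Index of a random individual among the best p fraction (minimization)."""
    ps = len(fitness)
    top = max(1, int(p * ps))  # always keep at least one candidate
    ranked = sorted(range(ps), key=lambda i: fitness[i])
    return rng.choice(ranked[:top])

def de_pbest_1(pop, fitness, F, p, rng=random):
    """Assumed form: V = X_pBest + F * (X_r1 - X_r2)."""
    pb = pbest_index(fitness, p, rng)
    r1, r2 = rng.sample(range(len(pop)), 2)
    return [pop[pb][j] + F * (pop[r1][j] - pop[r2][j])
            for j in range(len(pop[0]))]

def de_current_to_pbest_1(pop, fitness, i, F, p, rng=random):
    """Assumed form: V = X_i + F * (X_pBest - X_i) + F * (X_r1 - X_r2)."""
    pb = pbest_index(fitness, p, rng)
    r1, r2 = rng.sample(range(len(pop)), 2)
    return [pop[i][j] + F * (pop[pb][j] - pop[i][j]) + F * (pop[r1][j] - pop[r2][j])
            for j in range(len(pop[0]))]
```

With p close to 1 the pool of candidates for X_pBest covers almost the whole population, so both strategies approach their random counterparts; with p close to 1/ps only the best individual can be chosen and they behave like DE/best/1 and DE/current-to-best.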
In RAM-JAPDE, both strategies, DE/pbest/1 and DE/current-to-pbest/1, are used. Firstly, the parameter p is decided by the state of the search. A high value of p is enforced at the beginning of the search to make the algorithm explore the search space more, while a small value of p is used at the end of the search process to enable the algorithm to converge faster. How p changes during the search process is given in Eq. (14).
where ps is the population size, NFES is the current number of evaluations and NFES_MAX is the maximum number of evaluations. The evaluation maximum is used to ensure that at least one individual from the population is always taken. Secondly, in order to decide which strategy is used to create the different mutant vectors, a method that we previously proposed, called Rank-based Mutation Adaptation (RAM), is used (Leon and Xiong 2018). In RAM, the individuals from the population are first ordered according to their fitness values, and then the population is divided into different groups. The number of groups (gr) is a parameter to be decided by the user. The group size (GS) is calculated as stated in Eq. (15).
where ps is the population size. The number of probabilities per group is set depending on the number of mutation strategies used by the algorithm, which is two in this algorithm (pBest/1 and current-to-pBest/1). The total probability profile is denoted as P, where P_{k,l} stands for the probability of selecting the l-th mutation strategy for individuals of the k-th group. The pseudocode showing how the mutation strategies are selected is given in Algorithm 2. In Algorithm 2, ceil(i/GS) refers to the group id to which the i-th individual belongs, zeros(ps, 1) creates a vector of zeros with size ps, and rand(0, a) returns a random value in the range [0, a].
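The schedule for p and the rank-based grouping can be sketched as below. Eqs. (14) and (15) are not legible in this text, so both formulas are assumptions: p is taken to decrease linearly with the consumed evaluations and to be bounded below by 1/ps (so that at least one individual is always eligible), and the group size is taken to be GS = ps/gr. Only `ceil(i/GS)` for the group id is stated explicitly in the description of Algorithm 2.

```python
import math

def p_schedule(nfes, nfes_max, ps):
    """Assumed linear reduction of p from 1 down to 1/ps."""
    return max(1.0 / ps, 1.0 - nfes / nfes_max)

def group_size(ps, gr):
    """Assumed group size GS = ps / gr."""
    return ps / gr

def group_of(i, gs):
    """Group id (1-based) of the individual with fitness rank i (1-based)."""
    return math.ceil(i / gs)
```

With ps = 100 and gr = 10, ranks 1 to 10 fall into group 1 and ranks 91 to 100 into group 10, so the best and worst individuals learn separate strategy probabilities.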
Algorithm 2 function selectMutationStrategy(GS, P)
    mutationToUse = zeros(ps, 1);
    for i = 1 to ps do
        cumulativeP = 0;
        k = ceil(i / GS);
        r = rand(0, 1);
        for l = 1 to 2 do
            cumulativeP = cumulativeP + P_{k,l};
            if r <= cumulativeP && mutationToUse(i) == 0 then
                mutationToUse(i) = l;
            end if
        end for
    end for
    return mutationToUse;
The different probabilities are updated according to their performance during a period of time, called the learning period (LP_RAM). Hence, the new probability of using the l-th mutation strategy in the k-th group (P_{k,l}) is calculated as
where E_RAM is the evaporation rate. This is used in order to avoid a big change in the probabilities. successMS_{k,l} and triesMS_{k,l} are, respectively, the number of times in which the l-th mutation strategy was successfully used for the k-th group in creating an offspring better than the parent, and the number of times that the same mutation strategy was used for the same group. A complete pseudocode of RAM-JAPDE is provided in Algorithm 3.
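The role of the evaporation rate can be illustrated with a small Python sketch. The update equation itself is not legible in this text, so the rule below is only one plausible form, stated as an assumption: the old probability is kept with weight (1 - E_RAM) and the normalized success ratio of the last learning period enters with weight E_RAM. The function name `update_probabilities` is illustrative.

```python
def update_probabilities(P, success, tries, E):
    """Assumed evaporation-style update of the per-group strategy probabilities.

    P[k][l], success[k][l] and tries[k][l] are indexed by group k and strategy l.
    """
    new_P = []
    for k, row in enumerate(P):
        # Success ratio of each strategy in group k (0 if it was never tried).
        rates = [success[k][l] / tries[k][l] if tries[k][l] > 0 else 0.0
                 for l in range(len(row))]
        total = sum(rates)
        if total == 0:
            new_P.append(list(row))  # no successes observed: keep the old values
            continue
        new_P.append([(1 - E) * row[l] + E * rates[l] / total
                      for l in range(len(row))])
    return new_P
```

Under this form each row of P still sums to 1 after the update, and a small E makes the probabilities drift slowly towards the recently more successful strategy rather than jump to it.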

Experiments and results
In this section, the strength of RAM-JAPDE is tested. First, in Sect. 5.1, the experimental settings are described. Second, a study of the different parameters and parts of RAM-JAPDE is performed in Sect. 5.2. Third, in Sect. 5.3, the strength of the JAPDE method is studied. Fourth, in Sect. 5.4, the performance of RAM-JAPDE is compared with other adaptive DE algorithms. Fifth, the convergence speed of RAM-JAPDE is compared with other DE algorithms in Sect. 5.5. Sixth, the performance of RAM-JAPDE in problems of different dimensions is given in Sect. 5.6. Seventh, a discussion on the evolution of M is carried out in Sect. 5.7. Eighth, a comparison of the F and CR values used by RAM-JAPDE and SHADE is performed in Sect. 5.8. Finally, RAM-JAPDE is compared with L-SHADE, the winner of the CEC2014 competition, in Sect. 5.9.

Experimental settings and comparative measures
In this paper, the benchmark proposed in CEC'14 conference is used for comparisons. This benchmark is composed of 30 functions, which are grouped into four categories: Unimodal functions (F1-F3), multimodal functions (F4-F16), Hybrid Functions (F17-F22) and Composition functions (F23-F30).
For each function, all the algorithms compared in this paper have been tested 50 independent times with a maximum number of evaluations of 10000 × dimension. If an algorithm obtained an error below 1.00E-08, the optimal solution was considered to be found.
Algorithm 3 (fragment): if mutationToUse(i) == 1 then ... end if ... //Crossover: for j = 1 to dimension do, if rand(0,1) < CR_i or j == j_rand then ...
Additionally, in order to make a fair comparison between the different algorithms, two measures and one statistical test have been used:
- F.A.R.: the average ranking according to Friedman statistical test.
- S.E.R.: the summation of the relative errors with respect to the worst per function.
- Wilcoxon's signed rank statistical test (α < 0.05).
F.A.R. is an N-to-N comparison, in which a smaller value represents a better algorithm. In order to calculate the F.A.R. values, the following two steps have to be followed: 1) calculate the rank of each algorithm on each separate problem and 2) calculate the average rank value.
Similar to F.A.R., S.E.R. is a N to N comparison in which a smaller value represents a better algorithm. When calculating the S.E.R. value, two steps have to be followed: 1) The performance of each algorithm on each problem is divided by the performance of the worst performing algorithm on the exact same problem and 2) The summation of the values across all functions is performed.
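The two measures can be computed as in the following Python sketch. It assumes that `errors[a][f]` holds the (mean) error of algorithm a on function f, that tied algorithms share the average of their ranks, and that a function on which even the worst algorithm has zero error contributes nothing to S.E.R.; these conventions and the function names are illustrative choices rather than details given in the text.

```python
def average_ranks(errors):
    """F.A.R.-style score: mean rank of each algorithm over all functions."""
    n_alg, n_fun = len(errors), len(errors[0])
    totals = [0.0] * n_alg
    for f in range(n_fun):
        col = [errors[a][f] for a in range(n_alg)]
        for a in range(n_alg):
            smaller = sum(1 for e in col if e < col[a])
            equal = sum(1 for e in col if e == col[a])  # includes a itself
            totals[a] += smaller + (equal + 1) / 2      # average rank for ties
    return [t / n_fun for t in totals]

def summed_relative_errors(errors):
    """S.E.R.-style score: sum over functions of error / worst error."""
    n_alg, n_fun = len(errors), len(errors[0])
    totals = [0.0] * n_alg
    for f in range(n_fun):
        col = [errors[a][f] for a in range(n_alg)]
        worst = max(col)
        if worst > 0:  # skip functions that every algorithm solved exactly
            for a in range(n_alg):
                totals[a] += col[a] / worst
    return totals
```

For three algorithms with errors [1, 4], [2, 2] and [2, 0] on two functions, the average ranks are 2.0, 2.25 and 1.75 and the summed relative errors are 1.5, 1.5 and 1.0, so the third algorithm is the best under both measures.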
Wilcoxon's signed rank statistical test is a 1-to-1 statistical test. The first step is to calculate the differences between the two compared algorithms. Then, these differences are ranked by absolute value from smallest to largest. The ranks are equal to the points that an algorithm receives when it is the better one. In order to conclude that one algorithm is better than the other, the p value has to be smaller than the significance level (α = 0.05 in our case); in that case, the differences between the algorithms are statistically significant. We have used the signrank function in MATLAB.
More information can be found in (Derrac et al. 2011).
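The ranking step of the signed rank test can be sketched in Python as follows. The sketch only computes the two rank sums (R+ for positive differences, R- for negative ones) and omits the p value, which in the experiments is obtained from MATLAB. Dropping zero differences and averaging the ranks of tied absolute differences are common conventions assumed here; the function name is illustrative.

```python
def signed_rank_sums(a, b):
    """Return (R_plus, R_minus) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank(v):
        first = abs_sorted.index(v)      # 0-based position of the first tie
        count = abs_sorted.count(v)
        return first + (count + 1) / 2   # average 1-based rank of the tie group

    r_plus = sum(rank(abs(d)) for d in diffs if d > 0)
    r_minus = sum(rank(abs(d)) for d in diffs if d < 0)
    return r_plus, r_minus
```

For the paired errors a = [1, 5, 3, 4] and b = [2, 3, 3, 8], the nonzero differences are -1, 2 and -4 with ranks 1, 2 and 3, giving R+ = 2 and R- = 4. When a and b are errors to be minimized, a large R- means that a was the better algorithm on the functions with the largest differences.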

Parameters study of RAM-JAPDE
In order to create RAM-JAPDE, 4 different parts have been combined together. Firstly, a matrix containing selection probabilities of all combinations of parameters is introduced. Secondly, the RAM method is used to adapt the selection between two different mutation strategies. Thirdly, an exploration plane is added to the probability matrix. Lastly, a linear reduction of the parameter p is used. In this section, a study of the different components is performed using PS = 100 and gr = 10.
There are four constants that will affect the adaptation of the probability matrix and the usage of RAM: the learning period of RAM (LP_RAM), the learning period of parameter adaptation (LP_PA), the evaporation rate of RAM (E_RAM) and the evaporation rate of parameter adaptation (E_PA). A comparison between different combinations of parameters, on the benchmark proposed in the CEC'13 conference (Liang et al. 2013b), is performed (Fig. 2). In order to make the comparison easier, we will assume that E_RAM = E_PA (E) and LP_RAM = LP_PA (LP). As expected, if we have an aggressive evaporation with a short learning period, the matrix will change too much and it will switch too quickly from one combination of parameters to another. On the contrary, if we set a small evaporation rate with a long learning period, it will shift too slowly to a new combination of parameters. According to the results, the best combination is E = 0.2 and LP = 80.
Fig. 2 (caption fragment): The best result is represented with white, while the worst is represented with dark blue.
As was mentioned above, RAM is used to adapt the selection between DE/current-to-pBest/1 (DE/ctpb/1) and DE/pBest/1. A comparison between RAM-JAPDE and JAPDE using only one of the two mutation strategies is performed (Fig. 3). In order to perform this comparison, the previously found parameters are used (LP = 80 and E = 0.2). From the figure, it can be observed that using RAM to adapt the selection between the two different mutation strategies did not obtain the best performance on every function. However, its results on each function are closer to those of the mutation strategy with the better results. The good performance of RAM-JAPDE is also verified by the results shown in Table 1, demonstrating that using RAM provides a good coupling between the two mutation strategies.
Additionally, two extra parts are included into the algorithm: the exploration plane and the linear reduction of p.
RAM-JAPDE has been tested without the exploration plane (RAM-JAPDE no plane) and with a random p instead of the linear reduction (RAM-JAPDE random p). Wilcoxon's statistical test, together with F.A.R. and S.E.R., shows that adding the exploration plane and having a linear reduction of p provide a better performance due to a good exploration/exploitation balance (Table 1).

Comparison of JAPDE with other parameter adaptation methods
In order to evaluate the strength of the proposed joint parameter adaptation method, JAPDE is compared with 4  The results (Table 2) show that JAPDE is stronger in all the dimensions than its counterparts, except jDE, which is statistically similar to JAPDE in dimensions 30, 50 and 100. Besides, JAPDE and SHADE are statistically similar in dimension 30.
The reason for the stronger performance of JAPDE lies in its ability to adapt the control parameters by treating F and CR in a combined manner, while the other adaptation methods (except jDE) adjust the two parameters separately. In jDE, F and CR are evolved at the same time during the run of the algorithm. That is why jDE obtained results competitive with JAPDE in the experiments.

Comparison of RAM-JAPDE with other adaptive Differential Evolution Algorithms
In order to evaluate the strength of RAM-JAPDE, two different comparisons in dimension 30 have been performed. Firstly, RAM-JAPDE has been compared with 6 different DE algorithms that also adapt the selection between different mutation strategies: SaDE (Qin et al. 2009 In Table 3, a comparison between RAM-JAPDE and the previously mentioned algorithms is performed. It can be seen that RAM-JAPDE obtained the best results in half of all the functions. Additionally, Wilcoxon's statistical test has been performed, showing that RAM-JAPDE is statistically better than its counterparts. Moreover, RAM-JAPDE obtained the best F.A.R. and S.E.R. scores. It is worth mentioning that CoDE, EPSDE and ZEPDE also adapt F and CR in a joint manner and that RAM-JAPDE outperforms all of them.
Secondly The results of this comparison are presented in Table 4. It can be observed from the table that RAM-JAPDE is statistically better than its counterparts, except η_CODE, which is statistically similar. Additionally, if the F.A.R. and S.E.R. scores are considered, RAM-JAPDE obtained the best results among all the compared algorithms.

Convergence speed analysis
In this subsection, the convergence speed of RAM-JAPDE is analysed and compared with algorithms that either jointly adapt the parameters of DE, such as CoDE, EPSDE and ZEPDE, or are well-known adaptive DE algorithms, such as SaDE, JADE, jDE and SHADE. In order to study the convergence speed of RAM-JAPDE and compare it with other algorithms, some convergence graphs are presented in Fig. 4. It can be observed that on unimodal functions RAM-JAPDE is not the most exploitative algorithm, owing to its exploratory behaviour at the beginning of the search. In F1, although RAM-JAPDE did not converge fast at the beginning, it behaved better than the other algorithms once it shifted to exploitative behaviour. The same result can be observed in multimodal functions, especially in F4 and F6.
A different observation can be made in hybrid functions, where the algorithms appear to fall into three groups. The first group consists of a single algorithm, JADE, which gets stuck in local optima faster than the other algorithms due to its use of DE/ctpb/1 with a small p. The second group is formed by jDE and ZEPDE, which behave in an exploratory manner because they use DE/rand/1. When ZEPDE later turns to adapting mutation strategies, its performance improves. The third group is formed by all the other algorithms, where the differences between them depend on how much they explore/exploit at the beginning. Within this group, RAM-JAPDE was the most exploratory algorithm at the beginning of the search, but as the search advanced it reached better solutions than its counterparts.
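The contrast between the two mutation strategies mentioned above can be sketched as follows, assuming that DE/ctpb/1 denotes the current-to-pbest/1 strategy without an external archive and that the problem is a minimization task. The function names and the list-of-lists population layout are illustrative assumptions.

```python
import random


def de_rand_1(pop, i, f, rng):
    """DE/rand/1: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    candidates = [k for k in range(len(pop)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    return [a + f * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]


def de_current_to_pbest_1(pop, fitness, i, f, p, rng):
    """DE/current-to-pbest/1: v = x_i + F * (x_pbest - x_i) + F * (x_r1 - x_r2)."""
    n = len(pop)
    n_best = max(1, round(p * n))                      # size of the pBest pool
    ranked = sorted(range(n), key=lambda k: fitness[k])  # best (lowest) first
    pbest = rng.choice(ranked[:n_best])
    candidates = [k for k in range(n) if k != i]
    r1, r2 = rng.sample(candidates, 2)
    return [x + f * (xb - x) + f * (a - b)
            for x, xb, a, b in zip(pop[i], pop[pbest], pop[r1], pop[r2])]
```

With a small p, the pBest pool shrinks to a handful of top individuals, so every mutant is pulled towards nearly the same points. This is consistent with the fast stagnation attributed to JADE above, whereas DE/rand/1 has no such attraction term and therefore explores more.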
A similar situation occurred in composite functions, except that only two groups existed: the first formed by jDE and ZEPDE, and the second by all the other algorithms. In these functions, the differences between the algorithms are smaller.

Dimensionality study
In this section, we study the performance of RAM-JAPDE on problems of different dimensions. To this end, we have compared RAM-JAPDE against CoDE, EPSDE, ZEPDE, SaDE, SA-SHADE, EFADE, JADE, jDE and SHADE on the CEC2014 benchmark functions, but with lower dimensionality (dimension 10) and higher dimensionalities (dimensions 50 and 100).
Firstly, a summary of the results obtained by the mentioned algorithms, for the problems in dimension 10, is presented (Table 5). It can be observed that when the dimensionality of the problem is reduced, RAM-JAPDE outperformed all its counterparts.
Secondly, summaries of the results obtained by the different algorithms in problems of dimension 50 (Table 6) and dimension 100 (Table 7) are presented. It can be observed that RAM-JAPDE outperformed all its counterparts, except CoDE in dimension 50 and SHADE in dimensions 50 and 100, to which it is statistically similar. These results indicate that RAM-JAPDE can perform well in higher dimensions.
The superiority of RAM-JAPDE over its counterparts is owing to the combined strength of the two adaptation methods, RAM and JAPDE. In RAM, the selection probabilities of mutation strategies are adjusted specifically for individuals of different fitness ranks. In JAPDE, joint parameter adaptation is achieved by considering a complete set of pairs of F and CR values. Such advantages for both parameter and mutation strategy adaptation are not available in other adaptive DE algorithms.
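The two mechanisms summarised above can be sketched in a simplified form. In this sketch, the matrix M is assumed to hold one selection weight per (F, CR) cell, the cells are assumed to map to fixed grid values rather than to the parameter-generating functions used in the paper, and the rank intervals are assumed to be of equal width. All names are hypothetical, and the feedback-based update of the probabilities is omitted.

```python
import random


def sample_pair(M, f_values, cr_values, rng):
    """Roulette-wheel selection of one (F, CR) cell from the weight matrix M."""
    cells = [(r, c) for r in range(len(M)) for c in range(len(M[0]))]
    weights = [M[r][c] for r, c in cells]
    r, c = rng.choices(cells, weights=weights, k=1)[0]
    return f_values[r], cr_values[c]


def pick_strategy(rank, pop_size, group_probs, rng):
    """Choose a mutation strategy index using the distribution of the rank group.

    rank is 0 for the best individual; group_probs[g] holds the strategy
    selection probabilities learned for rank interval g.
    """
    n_groups = len(group_probs)
    group = min(rank * n_groups // pop_size, n_groups - 1)
    probs = group_probs[group]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Selecting a cell rather than selecting F and CR independently is what makes the adaptation joint: a pair that succeeded together is reinforced together. Likewise, keeping one distribution per rank interval allows well-ranked and poorly ranked individuals to favour different mutation strategies.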

Discussion of the adaptive probability distribution matrix (M)
Different parameters that affect the probability matrix M have been defined within RAM-JAPDE in the previous sections. In Fig. 5, the probabilities of creating a pair of f-cr values inside the different groups, using mean F and mean CR equal to 0.5, are shown. It can be observed that, although mean values of 0.5 are used, f-cr combinations from different groups can be created as well, with smaller probabilities. Moreover, combinations, in which F is equal to 1, will have    Fig. 6. Different observations can be made from this figure. First, a lot of high values inside M mean that many combinations will have a high probability of being selected. Second, there is a big change from generation 300 to generation 1200, in which bad combinations had a really low probability of being used again. Third, from generation 1200 to generation 2400

Comparison of the used F and CR values by RAM-JAPDE and SHADE
In this subsection, a comparison between RAM-JAPDE and SHADE, with regard to the F and CR values used, is performed (Fig. 7). Usually, two different situations are found with regard to SHADE and CR values. Firstly, F5 and F15 are considered (see Fig. 7a, b). In these two functions, SHADE tends to use CR equal to 0, while RAM-JAPDE mostly uses small values of CR in F5 and both small and high values in F15. SHADE obtained a better performance in F5, since CR equal to 0 is the best option for this problem. However, in F15, RAM-JAPDE outperformed SHADE thanks to its ability to adapt to different search stages. It can be seen from Fig. 7b that in later stages RAM-JAPDE started using high values (as well as small values), which seems beneficial for its performance, while SHADE got stuck on CR values equal to 0 and consequently obtained poorer performance.
Secondly, F18 and F30 are considered (Fig. 7c, d). In these two functions, high values of CR are beneficial for both algorithms. The difference is that SHADE took a greedy approach and moved towards high CR values, while RAM-JAPDE used both high and small values (which shows that both are beneficial for RAM-JAPDE). Over time, SHADE was penalized for that greedy decision and stopped improving (see Fig. 4h, l, respectively). In contrast, because RAM-JAPDE uses small CR values in addition to high ones, it is able to keep improving for a longer time.
On the other hand, if F values are considered, it can be observed that a larger range of successful values exists. RAM-JAPDE maintains a high variability in the F values used throughout the generations. In contrast, SHADE tends to focus on a specific range of F values, and this focus changes through the generations.

Comparison of RAM-JAPDE with L-SHADE
In this subsection, RAM-JAPDE is compared with L-SHADE, the winner of the CEC2014 competition. The parameters for the algorithms are listed below:
- L-SHADE: N_init = 18 × n, N_arc = 1.4 × PS, p = 0.11 and H = 6, as specified in their paper.
- RAM-JAPDE: PS = 100, A = 100, gr = 10, LP_RAM = LP_PA = 30 and E_RAM = E_PA = 0.05.
The results (Table 8) show that L-SHADE (Tanabe and Fukunaga 2014) is statistically better than RAM-JAPDE in dimensions 30 and 50, while both are statistically similar in dimensions 10 and 100. This difference is caused by the linear reduction of the population size implemented in L-SHADE. Besides, RAM-JAPDE uses a constant population size independent of the dimension of the problem, while L-SHADE uses varying population sizes dependent on the dimension, thereby better adapting itself to changes of dimensionality.
In order to further study the capability of RAM-JAPDE, we applied to it the same linear reduction of the population size as implemented in L-SHADE. The new version of the algorithm is referred to as L-RAM-JAPDE. L-RAM-JAPDE uses the same parameters as RAM-JAPDE, except for the population size and the archive size, which are the same as in L-SHADE, and LP, which is set to 3000 fitness evaluations. The results (Table 9) show that L-RAM-JAPDE is statistically similar to L-SHADE in dimensions 10, 30, 50 and 100.
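The linear population size reduction borrowed from L-SHADE can be sketched as below. The schedule follows the usual L-SHADE formulation, in which the target size is interpolated linearly in the number of fitness evaluations consumed. The minimum size of 4 and the budget of 10000 × n evaluations used in the example are assumptions of this sketch, not values stated in this section.

```python
def lpsr_pop_size(nfe: int, max_nfe: int, n_init: int, n_min: int = 4) -> int:
    """Target population size after nfe of max_nfe fitness evaluations.

    Decreases linearly from n_init (at nfe = 0) to n_min (at nfe = max_nfe).
    """
    return round((n_min - n_init) / max_nfe * nfe + n_init)


# Example for n = 30: N_init = 18 * n = 540, assumed budget of 10000 * n evaluations.
n = 30
schedule = [lpsr_pop_size(nfe, 10000 * n, 18 * n) for nfe in (0, 150000, 300000)]
```

After each generation, the worst-ranked individuals are removed until the population matches the target size, so the population is large during early exploration and small during late exploitation.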
The results of L-RAM-JAPDE also suggest that reducing the population during the search does not show strong potential to further enhance the performance of RAM-JAPDE. This can be attributed to the increasingly smaller populations produced by the population size reduction scheme, which result in too many generations in a learning period and thereby delay the response of parameter adaptation given the accelerated convergence. In addition, a smaller population makes it less important to have multiple probability distributions in the adaptation of mutation strategies.
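The effect described above can be illustrated with simple arithmetic, assuming one fitness evaluation per individual per generation and a learning period fixed at 3000 evaluations. The population sizes in the loop are illustrative (540 = 18 × 30 for n = 30; 4 is an assumed final size), and the function name is hypothetical.

```python
def generations_per_period(lp_evaluations: int, pop_size: int) -> float:
    """Generations needed to spend one learning period's evaluation budget."""
    return lp_evaluations / pop_size


for size in (540, 100, 20, 4):
    print(size, round(generations_per_period(3000, size), 1))
```

Under these assumptions, a learning period lasts fewer than 6 generations at a population of 540 and 30 generations at 100, but 150 generations at 20 and 750 generations at 4. The probabilities are therefore refreshed far less often, in terms of generations, exactly when the shrinking population converges fastest.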

Conclusion
Selecting the best combination of control parameters and the best mutation strategies in DE can be an arduous task. This paper proposes a new parameter adaptation method called Joint Adaptation of Parameters in DE (JAPDE), which jointly adapts the generation of the mutation factor and crossover rate during the run of DE. The key idea is to maintain and dynamically update a matrix of selection probabilities for a set of pairs of parameter-generating functions, which covers the whole range of both parameters. Additionally, it adds an exploration plane that steers the search towards more exploration. Further, we combine JAPDE with the Rank-Based Adaptation (RAM) method developed in our recent study (Leon and Xiong 2018), leading to the new RAM-JAPDE algorithm. RAM is able to adapt the selection between two strong mutation strategies (DE/pBest/1 and DE/current-to-pBest/1) using different probability distributions for individuals belonging to distinct rank intervals. Moreover, RAM-JAPDE linearly decreases the parameter p inside both mutation strategies to allow for more exploration at the start of the search and more exploitation at the end.
RAM-JAPDE has been evaluated on the CEC2014 benchmark suite and compared with well-known DE algorithms. From the experiments, it was found that RAM-JAPDE was the best algorithm among those that also perform adaptation of mutation strategies, as well as among those that adapt the mutation factor and crossover rate. Moreover, RAM-JAPDE was tested on different problem dimensions, where it maintained its superiority over its counterparts. Additionally, the convergence speed of RAM-JAPDE was evaluated, and it can be concluded that RAM-JAPDE enables a better exploration/exploitation trade-off than its counterparts.
A practical limitation of the proposed RAM-JAPDE algorithm is that it cannot be expected to be highly effective when the population size is small. One reason is that a small population entails more generations in a learning period to evaluate all combinations of F and CR values. This would likely prevent parameter adaptation from being conducted in a timely manner, considering the fast convergence of DE with a small population. Furthermore, a small number of individuals gives less motivation to maintain multiple probability distributions in mutation strategy selection, which would diminish the strength of RAM as used in our algorithm.
Ethical approval This article does not contain any studies with human participants or animals performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.